Solutions For Homework 4: Two-Way ANOVA: Response Versus Solution, Days


Solutions for Homework 4

Note: Some of the problems may not be exactly the same as in the text book.

4.4. Three different washing solutions are being compared to study their
effectiveness in retarding bacteria growth in five-gallon milk containers. The analysis
is done in a laboratory, and only three trials can be run on any day. Because days
could represent a potential source of variability, the experimenter decides to use a
randomized block design. Observations are taken for four days, and the data are
shown here.

Days
Solution 1 2 3 4
1 13 22 18 39
2 16 24 17 44
3 5 4 1 22

(a) Analyze the data using the two-way command, and also using GLM. Compare the
results.

Minitab Output

Two-way ANOVA: Response versus Solution, Days

Source DF SS MS F P
Solution 2 703.50 351.750 40.72 0.000
Days 3 1106.92 368.972 42.71 0.000
Error 6 51.83 8.639
Total 11 1862.25

S = 2.939 R-Sq = 97.22% R-Sq(adj) = 94.90%
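The sums of squares in the table above can be reproduced by hand from the data. The following is a minimal sketch in plain Python, not Minitab's own algorithm; the function name `rcbd_anova` is made up for illustration.

```python
def rcbd_anova(table):
    """Sums of squares for a randomized complete block design.

    table[i][j] is the response for treatment i in block j.
    """
    a = len(table)        # number of treatments (solutions)
    b = len(table[0])     # number of blocks (days)
    grand = sum(sum(row) for row in table)
    cf = grand ** 2 / (a * b)   # correction factor

    ss_total = sum(y * y for row in table for y in row) - cf
    ss_trt = sum(sum(row) ** 2 for row in table) / b - cf
    col_totals = [sum(row[j] for row in table) for j in range(b)]
    ss_blk = sum(t * t for t in col_totals) / a - cf
    ss_err = ss_total - ss_trt - ss_blk

    df_trt, df_blk = a - 1, b - 1
    df_err = df_trt * df_blk
    ms_err = ss_err / df_err
    return {
        "ss_trt": ss_trt,
        "ss_blk": ss_blk,
        "ss_err": ss_err,
        "ss_total": ss_total,
        "df_err": df_err,
        "ms_err": ms_err,
        "f_trt": (ss_trt / df_trt) / ms_err,
        "f_blk": (ss_blk / df_blk) / ms_err,
    }


# Rows are solutions 1-3, columns are days 1-4 (data from problem 4.4).
data = [[13, 22, 18, 39], [16, 24, 17, 44], [5, 4, 1, 22]]
res = rcbd_anova(data)
print({name: round(value, 3) for name, value in res.items()})
```

The printed values should agree with the Solution, Days, Error and Total rows of the Minitab table up to rounding.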

The Model F-value of 40.72 implies the model is significant. There is only a 0.03%
chance that a "Model F-Value" this large could occur due to noise.
Individual 95% CIs For Mean Based on
Pooled StDev
Solution Mean ----+---------+---------+---------+-----
1 23.00 (----*----)
2 25.25 (----*----)
3 8.00 (----*-----)
----+---------+---------+---------+-----
7.0 14.0 21.0 28.0

There is a difference between the means of the three solutions. The CIs indicate that
solution 3 is significantly different from the other two.

General Linear Model: Response versus Solution, Days

Factor Type Levels Values


Solution fixed 3 1, 2, 3
Days fixed 4 1, 2, 3, 4
Analysis of Variance for Response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Solution 2 703.50 703.50 351.75 40.72 0.000
Days 3 1106.92 1106.92 368.97 42.71 0.000
Error 6 51.83 51.83 8.64
Total 11 1862.25

S = 2.93920 R-Sq = 97.22% R-Sq(adj) = 94.90%

As in the two-way analysis of variance, both the treatment factor and the block factor
appear to be significant.

(b) Use a multiple comparisons procedure to determine which solutions differ
significantly. Which procedure did you use and what is the interpretation of
“significantly different” when that procedure is used?

I used the Tukey method. It shows that the effectiveness of solution 3 differs
significantly from that of solutions 1 and 2. There is not enough evidence to conclude
that the mean bacteria growth for solution 1 and solution 2 differ.

Minitab Output

Grouping Information Using Tukey Method and 95.0% Confidence

Solution N Mean Grouping


2 4 25.250 A
1 4 23.000 A
3 4 8.000 B

Means that do not share a letter are significantly different.

(c) Try to analyze the data with the last observation (Day=4, Solution=3) missing
with the two-way command (replace the observation 22 with an asterisk in Minitab)
and note the error message. Now, approximate the missing data value from the
remaining data (see 4.1.3) and replace the asterisk with the estimated value. Then
use the two-way command to analyze the data (remember to reduce the error
degrees of freedom to account for the estimated data value).

The two-way ANOVA command results in an error because the data set is incomplete.

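Using equation (4.21) from the textbook, we can estimate the missing value as follows.
Here 10 is the total for solution 3, 83 is the total for day 4, and 203 is the grand
total, all computed without the missing observation:

$$x = \frac{3(10) + 4(83) - 203}{(2)(3)} = 26.5$$
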
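The same arithmetic can be sketched in plain Python. The helper name `estimate_missing` is hypothetical, and the formula used is the one implied by the numbers above: (a × treatment total + b × block total − grand total) / ((a − 1)(b − 1)), with all totals taken over the observed values only.

```python
def estimate_missing(table, i, j):
    """Estimate the single missing cell table[i][j] (stored as None) in an RCBD."""
    a = len(table)       # treatments
    b = len(table[0])    # blocks
    trt_total = sum(y for y in table[i] if y is not None)
    blk_total = sum(row[j] for row in table if row[j] is not None)
    grand = sum(y for row in table for y in row if y is not None)
    return (a * trt_total + b * blk_total - grand) / ((a - 1) * (b - 1))


# Problem 4.4 data with the (Solution 3, Day 4) observation removed.
data = [[13, 22, 18, 39], [16, 24, 17, 44], [5, 4, 1, None]]
x = estimate_missing(data, 2, 3)
print(x)
```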
Now we are going to perform two-way ANOVA:
Two-way ANOVA: Response versus Solution, Days

Source DF SS MS F P
Solution 2 610.13 305.063 43.89 0.000
Days 3 1258.23 419.410 60.33 0.000
Error 6 41.71 6.951
Total 11 1910.06

S = 2.637 R-Sq = 97.82% R-Sq(adj) = 96.00%

Individual 95% CIs For Mean Based on Pooled StDev


Solution Mean +---------+---------+---------+---------
1 23.000 (----*-----)
2 25.250 (----*----)
3 9.125 (----*-----)
+---------+---------+---------+---------
6.0 12.0 18.0 24.0

Because the missing value was estimated, the degrees of freedom for error are reduced
from 6 to 5. MS(error) and the F-value must therefore be recalculated:

$$MS_{Error} = \frac{41.71}{5} = 8.342$$

$$F_{Trt} = \frac{305.06}{8.342} = 36.57$$

$$p\text{-value} = 0.001$$
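The adjustment can be checked numerically. The sketch below relies on a property of the F distribution with 2 numerator degrees of freedom, whose upper-tail probability has the closed form (1 + 2f/v)^(-v/2). It only applies here because the treatment factor has 2 degrees of freedom; the function name is made up for illustration.

```python
def f_upper_tail_df1_2(f, df2):
    """P(F > f) for an F distribution with (2, df2) degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


ss_err = 41.71       # Error SS from the two-way output with the estimated value
ms_trt = 305.063     # Solution MS from the same output
df_err = 6 - 1       # one degree of freedom lost to the estimated observation

ms_err = ss_err / df_err
f_trt = ms_trt / ms_err
p_value = f_upper_tail_df1_2(f_trt, df_err)
print(round(ms_err, 3), round(f_trt, 2), round(p_value, 3))
```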

(d) Also use the GLM command on the data set with the missing value replaced with
an asterisk. Fit the reduced model with just the block factor and then fit the full
model with both the block and treatment factor. Compute the general linear test
using the full and reduced model by hand. Compare it with the test for treatment
from the GLM output from the full model.

Reduced Model (only block factor)

General Linear Model: Response versus Days

Factor Type Levels Values


Days fixed 4 1, 2, 3, 4

Analysis of Variance for Response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Days 3 1348.89 1348.89 449.63 6.27 0.021
Error 7 501.83 501.83 71.69
Total 10 1850.73

S = 8.46702 R-Sq = 72.88% R-Sq(adj) = 61.26%


Full Model

General Linear Model: Response versus Solution, Days

Factor Type Levels Values


Solution fixed 3 1, 2, 3
Days fixed 4 1, 2, 3, 4

Analysis of Variance for Response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Solution 2 953.31 460.12 230.06 27.58 0.002
Days 3 855.71 855.71 285.24 34.19 0.001
Error 5 41.71 41.71 8.34
Total 10 1850.73

S = 2.88819 R-Sq = 97.75% R-Sq(adj) = 95.49%

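Next we calculate the general linear test statistic from the error sums of squares and
error degrees of freedom of the reduced and full models:

$$F^* = \frac{(501.83 - 41.71)/(7 - 5)}{41.71/5} = 27.58$$

Since F* = 27.58 exceeds F(0.95; 2, 5) = 5.79, there is significant evidence that the
treatment factor has an effect. This agrees with the full general linear model, whose
test for Solution gives the same F = 27.58.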
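The general linear test is simple enough to sketch in a few lines of Python. The function names are hypothetical. The critical value uses the closed-form quantile of an F distribution with 2 numerator degrees of freedom, so it is only valid for this special case.

```python
def general_linear_test(sse_reduced, df_reduced, sse_full, df_full):
    """F* = [(SSE_R - SSE_F) / (df_R - df_F)] / (SSE_F / df_F)."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    return numerator / (sse_full / df_full)


def f_critical_df1_2(alpha, df2):
    """Upper-alpha critical value of F with (2, df2) degrees of freedom."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


# Error SS and DF taken from the reduced (Days only) and full GLM outputs.
f_star = general_linear_test(501.83, 7, 41.71, 5)
f_crit = f_critical_df1_2(0.05, 5)
print(round(f_star, 2), round(f_crit, 2), f_star > f_crit)
```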

4.9 The effect of three different lubricating oils on fuel economy in diesel truck engines
is being studied. Fuel economy is measured using brake-specific fuel consumption after
the engine has been running for 15 minutes. Five different truck engines are available for
the study, and the experimenters conduct the following randomized complete block
design.

Truck
Oil 1 2 3 4 5
1 0.500 0.634 0.487 0.329 0.512
2 0.535 0.675 0.520 0.435 0.540
3 0.513 0.595 0.488 0.400 0.510

(a) Analyze the data from this experiment.

From the analysis below, there is a significant difference between lubricating oils with
regard to fuel economy.

Minitab Output

ANOVA: Response versus Oil, Truck


Factor Type Levels Values
Oil fixed 3 1, 2, 3
Truck fixed 5 1, 2, 3, 4, 5

Analysis of Variance for Response

Source DF SS MS F P
Oil 2 0.006706 0.003353 6.35 0.022
Truck 4 0.092100 0.023025 43.63 0.000
Error 8 0.004222 0.000528
Total 14 0.103028

S = 0.0229735 R-Sq = 95.90% R-Sq(adj) = 92.83%

The Model F-value of 6.35 implies the model is significant. There is only a 2.2% chance
that a "Model F-Value" this large could occur due to noise.

(b) Use the Bonferroni method to make comparisons among the three lubricating
oils to determine specifically which oils differ in brake-specific fuel
consumption. (Perform the test and report the respective confidence intervals
for the differences of the means.)

Based on the Bonferroni method reported below, the mean fuel consumption for oils 1 and 2
differs. However, the fuel consumption for oil 3 is not significantly different from
that of oil 1 or oil 2.

Bonferroni 95.0% Simultaneous Confidence Intervals


Response Variable Response
All Pairwise Comparisons among Levels of Oil
Oil = 1 subtracted from:

Oil Lower Center Upper -------+---------+---------+---------


2 0.00478 0.048600 0.09242 (--------*-------)
3 -0.03502 0.008800 0.05262 (--------*--------)
-------+---------+---------+---------
-0.050 0.000 0.050

Oil = 2 subtracted from:

Oil Lower Center Upper -------+---------+---------+---------


3 -0.08362 -0.03980 0.004018 (--------*--------)
-------+---------+---------+---------
-0.050 0.000 0.050

Bonferroni Simultaneous Tests


Response Variable Response
All Pairwise Comparisons among Levels of Oil
Oil = 1 subtracted from:

Difference SE of Adjusted
Oil of Means Difference T-Value P-Value
2 0.048600 0.01453 3.3449 0.0305
3 0.008800 0.01453 0.6057 1.0000
Oil = 2 subtracted from:

Difference SE of Adjusted
Oil of Means Difference T-Value P-Value
3 -0.03980 0.01453 -2.739 0.0764
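The Bonferroni intervals and adjusted p-values can be approximated without a statistics package. The sketch below is not Minitab's implementation: it integrates the t density with Simpson's rule and finds the critical value by bisection, and all helper names are invented for illustration. It takes the Error SS (0.004222) and error DF (8) from the ANOVA table above.

```python
import math
from itertools import combinations


def t_pdf(x, df):
    """Density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, n=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (n must be even)."""
    h = t / n
    s = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - s * h / 3


def t_quantile_upper(alpha, df):
    """Value t with P(T > t) = alpha, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Fuel consumption by oil across the five trucks (problem 4.9).
oils = {
    1: [0.500, 0.634, 0.487, 0.329, 0.512],
    2: [0.535, 0.675, 0.520, 0.435, 0.540],
    3: [0.513, 0.595, 0.488, 0.400, 0.510],
}
b = 5                      # blocks (trucks)
df_err = 8
mse = 0.004222 / df_err    # Error SS / DF from the ANOVA table
se = math.sqrt(2 * mse / b)

means = {oil: sum(values) / b for oil, values in oils.items()}
pairs = list(combinations(sorted(oils), 2))
g = len(pairs)             # number of comparisons
t_crit = t_quantile_upper(0.05 / (2 * g), df_err)

results = {}
for i, j in pairs:
    diff = means[j] - means[i]
    t_val = diff / se
    p_adj = min(1.0, g * 2 * t_upper_tail(abs(t_val), df_err))
    results[(i, j)] = (diff - t_crit * se, diff, diff + t_crit * se, t_val, p_adj)

for pair, (lower, center, upper, t_val, p_adj) in results.items():
    print(pair, round(lower, 5), round(center, 5), round(upper, 5),
          round(t_val, 3), round(p_adj, 4))
```

Each printed row corresponds to one "subtracted from" comparison in the Minitab output; small differences in the last digit are expected from the numerical integration.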

(c) Analyze the residuals from this experiment

The residual plots below do not identify any violations to the assumptions.

[Figure: Normal Probability Plot (response is Response); Percent versus Residual]

[Figure: Versus Fits (response is Response); Residual versus Fitted Value]

[Figure: Residuals Versus Oil (response is Response); Residual versus Oil]

(d) If the experimenter would like to detect a difference in fuel consumption of
0.04 between the most extreme of the three lubricating oils, with power 0.90 and
an alpha 0.05 level test, how many trucks should be included in the study?

10 trucks are needed for each oil level.

Minitab Output
One-way ANOVA
Alpha = 0.05 Assumed standard deviation = 0.02297
Factors: 1 Number of levels: 3
Maximum Sample Target
Difference Size Power Actual Power
0.04 10 0.9 0.919122

The sample size is for each level.

Hand Calculation

$$S^2 = MS_E = 0.000528$$

$$\Phi^2 = \frac{b\,\delta^2}{2a\sigma^2} = \frac{b\,(0.04)^2}{2(3)(0.000528)} = 0.5050505\,b$$

OC curves at alpha = 0.05, beta = 0.1, v1 = 3 - 1, v2 = (3 - 1)(b - 1)

b     Φ²       Φ       (a-1)(b-1)   beta   1 - beta = power
9     4.5454   2.132   16           0.15   0.85
10    5.0505   2.247   18           0.07   0.93
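The Φ values in the hand calculation follow directly from the formula Φ² = bδ²/(2aσ²). A short Python sketch, with a made-up function name, is given here. The beta values themselves still have to be read from the OC curves, so they are not computed.

```python
import math


def oc_curve_inputs(b, delta, a, sigma2):
    """Return (phi squared, phi, error df) for b blocks in an RCBD."""
    phi2 = b * delta ** 2 / (2 * a * sigma2)
    return phi2, math.sqrt(phi2), (a - 1) * (b - 1)


delta = 0.04        # difference to detect
a = 3               # number of oils
sigma2 = 0.000528   # MSE from the ANOVA table

rows = {b: oc_curve_inputs(b, delta, a, sigma2) for b in (9, 10)}
for b, (phi2, phi, df_err) in rows.items():
    print(b, round(phi2, 4), round(phi, 3), df_err)
```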

4.21 An industrial engineer is investigating the effect of four assembly methods (A, B, C,
D) on the assembly time for a color television component. Four operators are selected for
the study. Furthermore, the engineer knows that each assembly method produces such
fatigue that the time required for the last assembly may be greater than the time required
for the first, regardless of the method. That is, a trend develops in the required assembly
time. To account for this source of variability, the engineer uses the Latin square design
shown below. Analyze the data from this experiment (α = 0.05) and draw appropriate
conclusions.

Order of Operator
Assembly 1 2 3 4
1 C=10 D=14 A=7 B=8
2 B=7 C=18 D=11 A=8
3 A=5 B=10 C=11 D=9
4 D=10 A=10 B=12 C=14

The Minitab output below identifies assembly method as having a significant effect on
assembly time.

Minitab Output

General Linear Model: Response versus Order, Operator, Method

Factor Type Levels Values


Order random 4 1, 2, 3, 4
Operator random 4 1, 2, 3, 4
Method fixed 4 1, 2, 3, 4

Analysis of Variance for Response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Order 3 18.500 18.500 6.167 3.52 0.089
Operator 3 51.500 51.500 17.167 9.81 0.010
Method 3 72.500 72.500 24.167 13.81 0.004
Error 6 10.500 10.500 1.750
Total 15 153.000

S = 1.32288 R-Sq = 93.14% R-Sq(adj) = 82.84%
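The Latin square sums of squares can be verified by hand. The following is a minimal Python sketch (the function name `latin_square_anova` is hypothetical) that partitions the total variation into order (rows), operator (columns), method (letters) and error.

```python
def latin_square_anova(square):
    """square[i][j] = (treatment letter, response) for row i, column j."""
    p = len(square)
    grand = sum(y for row in square for _, y in row)
    cf = grand ** 2 / (p * p)   # correction factor

    ss_total = sum(y * y for row in square for _, y in row) - cf
    ss_rows = sum(sum(y for _, y in row) ** 2 for row in square) / p - cf
    ss_cols = sum(
        sum(square[i][j][1] for i in range(p)) ** 2 for j in range(p)
    ) / p - cf

    trt_totals = {}
    for row in square:
        for letter, y in row:
            trt_totals[letter] = trt_totals.get(letter, 0) + y
    ss_trt = sum(t * t for t in trt_totals.values()) / p - cf

    ss_err = ss_total - ss_rows - ss_cols - ss_trt
    ms_err = ss_err / ((p - 1) * (p - 2))
    return {
        "ss_rows": ss_rows,
        "ss_cols": ss_cols,
        "ss_trt": ss_trt,
        "ss_err": ss_err,
        "f_trt": (ss_trt / (p - 1)) / ms_err,
    }


# Rows are order of assembly, columns are operators (problem 4.21).
square = [
    [("C", 10), ("D", 14), ("A", 7), ("B", 8)],
    [("B", 7), ("C", 18), ("D", 11), ("A", 8)],
    [("A", 5), ("B", 10), ("C", 11), ("D", 9)],
    [("D", 10), ("A", 10), ("B", 12), ("C", 14)],
]
ls = latin_square_anova(square)
print({name: round(value, 3) for name, value in ls.items()})
```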


4.38 Consider the data in Problems 4.23 and 4.36. Suppressing the Greek letters, analyze
the data using the method developed in Problem 4.32. However, consider carefully
whether the row and column factors in the Latin Squares should be crossed or nested.

Square 1
                    Operator
Order          1      2      3      4      Row Total
1              C=10   D=14   A=7    B=8    (39)
2              B=7    C=18   D=11   A=8    (44)
3              A=5    B=10   C=11   D=9    (35)
4              D=10   A=10   B=12   C=14   (46)
Column Total   (32)   (52)   (41)   (39)   164=y…1

Square 2
                    Operator
Order          1      2      3      4      Row Total
1              C=11   B=10   D=14   A=8    (43)
2              B=8    C=12   A=10   D=12   (42)
3              A=9    D=11   B=7    C=15   (42)
4              D=9    A=8    C=18   B=6    (41)
Column Total   (37)   (41)   (49)   (41)   168=y…2

Assembly Methods   Totals
A                  y.1..=65
B                  y.2..=68
C                  y.3..=109
D                  y.4..=90

Minitab output 1 shows the model in which the two variables (operator and order) are
nested within the variable 'Square'. This is the model suggested in question 4.32.

Minitab output 1
General Linear Model: response versus method, Square, order, operator
Factor Type Levels Values
method fixed 4 1, 2, 3, 4
Square fixed 2 1, 2
order(Square) fixed 8 1, 2, 3, 4, 1, 2, 3, 4
operator(Square) fixed 8 1, 2, 3, 4, 1, 2, 3, 4

Analysis of Variance for response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


method 3 159.250 159.250 53.083 14.00 0.000
Square 1 0.500 0.500 0.500 0.13 0.723
method*Square 3 8.750 8.750 2.917 0.77 0.533
order(Square) 6 19.000 19.000 3.167 0.84 0.566
operator(Square) 6 70.500 70.500 11.750 3.10 0.045
Error 12 45.500 45.500 3.792
Total 31 303.500

S = 1.94722 R-Sq = 85.01% R-Sq(adj) = 61.27%

Minitab output 2 shows the model without nesting.

Minitab output 2
General Linear Model: response versus method, Square, order, operator
Factor Type Levels Values
method fixed 4 1, 2, 3, 4
Square fixed 2 1, 2
order fixed 4 1, 2, 3, 4
operator fixed 4 1, 2, 3, 4

Analysis of Variance for response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


method 3 159.250 159.250 53.083 11.51 0.000
Square 1 0.500 0.500 0.500 0.11 0.746
method*Square 3 8.750 8.750 2.917 0.63 0.604
order 3 7.750 7.750 2.583 0.56 0.648
operator 3 44.250 44.250 14.750 3.20 0.048
Error 18 83.000 83.000 4.611
Total 31 303.500

S = 2.14735 R-Sq = 72.65% R-Sq(adj) = 52.90%

Note that the conclusions from the two models are the same: 'method' is significant
at the 0.01 level of significance. However, the second model, without nesting, is the
appropriate one in this case, under the assumption that order means the same in both
squares and that the same 4 operators are used in both squares; there should therefore
be only 3 degrees of freedom for each of these factors. If there were 4 different
operators in each square, then nesting operators would be correct. See Minitab output 3 below.
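The difference between nesting and crossing shows up directly in how the sums of squares are formed. In the sketch below (plain Python, helper names invented for illustration), a nested factor gets its sum of squares computed separately within each square and then added, giving 2 × 3 = 6 degrees of freedom, while a crossed factor pools its totals over both squares, giving 3 degrees of freedom.

```python
# Responses only; rows are order, columns are operator (problem 4.38).
square1 = [[10, 14, 7, 8], [7, 18, 11, 8], [5, 10, 11, 9], [10, 10, 12, 14]]
square2 = [[11, 10, 14, 8], [8, 12, 10, 12], [9, 11, 7, 15], [9, 8, 18, 6]]
squares = [square1, square2]
p = 4


def row_totals(sq):
    return [sum(row) for row in sq]


def col_totals(sq):
    return [sum(row[j] for row in sq) for j in range(len(sq))]


def ss_from_totals(totals, n_per_total):
    """Sum of squares for a factor, given its level totals."""
    grand = sum(totals)
    return sum(t * t for t in totals) / n_per_total - grand ** 2 / (n_per_total * len(totals))


def nested_ss(totals_fn):
    # Within-square sums of squares, added over squares.
    return sum(ss_from_totals(totals_fn(sq), p) for sq in squares)


def crossed_ss(totals_fn):
    # Level totals pooled over both squares before computing the SS.
    pooled = [sum(level) for level in zip(*(totals_fn(sq) for sq in squares))]
    return ss_from_totals(pooled, p * len(squares))


order_nested, operator_nested = nested_ss(row_totals), nested_ss(col_totals)
order_crossed, operator_crossed = crossed_ss(row_totals), crossed_ss(col_totals)
print(order_nested, operator_nested, order_crossed, operator_crossed)
```

The nested values correspond to the order(Square) and operator(Square) rows of Minitab output 1, and the crossed values to the order and operator rows of Minitab output 2.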

Minitab output 3 shows Order as crossed and Operator nested in Square

General Linear Model: Time versus Square, Order, Method, Operator

Factor Type Levels Values


Square fixed 2 1, 2
Order fixed 4 1, 2, 3, 4
Operator(Square) fixed 8 1, 2, 3, 4, 1, 2, 3, 4
Method fixed 4 A, B, C, D

Analysis of Variance for Time, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Square 1 0.500 0.500 0.500 0.13 0.721
Order 3 7.750 7.750 2.583 0.68 0.576
Operator(Square) 6 70.500 70.500 11.750 3.11 0.035
Method 3 159.250 159.250 53.083 14.03 0.000
Square*Method 3 8.750 8.750 2.917 0.77 0.528
Error 15 56.750 56.750 3.783
Total 31 303.500

S = 1.94508 R-Sq = 81.30% R-Sq(adj) = 61.36%

4.42 Seven different hardwood concentrations are being studied to determine their effect
on the strength of the paper produced. However, the pilot plant can only produce three
runs each day. As days may differ, the analyst uses the balanced incomplete block design
that follows. Analyze this experiment (use α = 0.05) and draw conclusions.

Hardwood                          Days
Concentration (%)   1     2     3     4     5     6     7
2                   114   -     -     -     120   -     117
4                   126   120   -     -     -     119   -
6                   -     137   117   -     -     -     134
8                   141   -     129   149   -     -     -
10                  -     145   -     150   143   -     -
12                  -     -     120   -     118   123   -
14                  -     -     -     136   -     130   127

There are several computer software packages that can analyze the incomplete block
designs discussed in this chapter. The Minitab General Linear Model procedure is a
widely available package with this capability. The adjusted sums of squares are the
appropriate sums of squares to use for testing the difference between the means of the
hardwood concentrations.

Minitab Output

General Linear Model: Response versus Concentration, Day

Factor Type Levels Values


Concentration fixed 7 2, 4, 6, 8, 10, 12, 14
Day random 7 1, 2, 3, 4, 5, 6, 7

Analysis of Variance for Response, using Adjusted SS for Tests

Source DF Seq SS Adj SS Adj MS F P


Concentration 6 2037.62 1317.43 219.57 10.42 0.002
Day 6 394.10 394.10 65.68 3.12 0.070
Error 8 168.57 168.57 21.07
Total 20 2600.29
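The adjusted treatment sum of squares can be reproduced by hand with the usual balanced incomplete block formulas: Q_i = (treatment total) − (1/k)(sum of the totals of the blocks containing that treatment), and SS(adjusted) = k ΣQ_i² / (λa). The sketch below is plain Python. The assignment of observations to days is an assumption inferred from the balanced layout (each pair of concentrations meeting on exactly one day), and the variable names are illustrative only.

```python
# obs[concentration][day] = strength; day assignment is an inferred assumption.
obs = {
    2: {1: 114, 5: 120, 7: 117},
    4: {1: 126, 2: 120, 6: 119},
    6: {2: 137, 3: 117, 7: 134},
    8: {1: 141, 3: 129, 4: 149},
    10: {2: 145, 4: 150, 5: 143},
    12: {3: 120, 5: 118, 6: 123},
    14: {4: 136, 6: 130, 7: 127},
}
a = len(obs)                                   # treatments
days = sorted({d for row in obs.values() for d in row})
b = len(days)                                  # blocks
r = len(next(iter(obs.values())))              # replicates per treatment
k = a * r // b                                 # treatments per block
lam = r * (k - 1) // (a - 1)                   # times each pair appears together
n = a * r

grand = sum(sum(row.values()) for row in obs.values())
cf = grand ** 2 / n
block_total = {d: sum(row[d] for row in obs.values() if d in row) for d in days}

ss_total = sum(y * y for row in obs.values() for y in row.values()) - cf
ss_blocks = sum(t * t for t in block_total.values()) / k - cf   # unadjusted
q = {
    conc: sum(row.values()) - sum(block_total[d] for d in row) / k
    for conc, row in obs.items()
}
ss_trt_adj = k * sum(v * v for v in q.values()) / (lam * a)
ss_err = ss_total - ss_trt_adj - ss_blocks
df_err = n - a - b + 1
f_trt = (ss_trt_adj / (a - 1)) / (ss_err / df_err)
print(round(ss_trt_adj, 2), round(ss_err, 2), df_err, round(f_trt, 2))
```

With this layout the results agree with the Adj SS for Concentration, the Error SS and the F-value in the Minitab table, which supports the assumed day assignment.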

4.48 (Optional) Perform the interblock analysis for the design in Problem 4.35.

The interblock analysis for problem 4.35 uses $\hat\sigma^2 = 21.07$ and

$$\hat\sigma_\beta^2 = \frac{\left[MS_{Blocks(adj)} - MS_E\right](b-1)}{a(r-1)} = \frac{(65.68 - 21.07)(6)}{7(2)} = 19.12.$$

A summary of the interblock, intrablock, and combined estimates is given below:

Parameter   Intrablock   Interblock   Combined
τ1          -12.43       -11.79       -12.38
τ2          -8.57        -4.29        -7.92
τ3          2.57         -8.79        1.76
τ4          10.71        9.21         10.61
τ5          13.71        27.21        14.67
τ6          -5.14        -22.29       -6.36
τ7          -0.86        10.71        -0.03

Discussion Activity: (Add this answer to your homework submission for grading.)

Analyze the data in problem 2.28 (7th edition) or 2.34 (8th edition) using both the
paired t-test command and the two-way ANOVA with girder as the block factor.
Hint: Consider the nine girders all distinct, despite the labels, two of which are the
same. This exercise is intended simply to emphasize the power of blocking.

Paired T-Test and CI: Karlsruhe, Lehigh

Paired T for Karlsruhe - Lehigh

N Mean StDev SE Mean


Karlsruhe 9 1.3401 0.1460 0.0487
Lehigh 9 1.0662 0.0494 0.0165
Difference 9 0.2739 0.1351 0.0450

95% CI for mean difference: (0.1700, 0.3777)


T-Test of mean difference = 0 (vs not = 0): T-Value = 6.08 P-Value = 0.000

Two-way ANOVA: Strength versus Method, Girder

Source DF SS MS F P
Method 1 0.337568 0.337568 36.99 0.000
Girder 8 0.117101 0.014638 1.60 0.260
Error 8 0.073007 0.009126
Total 17 0.527676

S = 0.09553 R-Sq = 86.16% R-Sq(adj) = 70.60%

The conclusions of the two analyses are the same: the two methods, Karlsruhe and Lehigh,
are significantly different.

Brief comments.

1. Different standard deviation (the error)

The estimated variance in the paired t-test is double the ANOVA error mean square:
0.1351^2 = 2*0.009126. This is because the two analyses work with different quantities.
ANOVA uses each single observation, whereas the paired t-test uses the differences
di = xi - yi. Under the assumption that the two methods are independent,
var(d) = var(x - y) = var(x) + var(y).

2. These two methods give the same p-value.

Recall from Stat 501 that the square of a t-statistic is an F-statistic with 1 d.f. in
the numerator and the t-statistic's d.f. in the denominator. Here (the observed
T-value)^2 = (6.08)^2 is about 37, which matches the ANOVA F-value of 36.99 for Method.
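Both comments can be checked with a few lines of arithmetic. The sketch below only uses the summary numbers printed in the two Minitab outputs above; the variable names are illustrative.

```python
import math

n = 9                 # girders
mean_diff = 0.2739    # mean of Karlsruhe - Lehigh differences
sd_diff = 0.1351      # standard deviation of the differences
mse = 0.009126        # Error MS from the two-way ANOVA
f_method = 36.99      # F for Method from the two-way ANOVA

# Paired t statistic recomputed from the summary statistics.
t_value = mean_diff / (sd_diff / math.sqrt(n))

# Comment 1: the variance of the differences is twice the ANOVA error mean square.
variance_ratio = sd_diff ** 2 / mse

# Comment 2: t squared equals the ANOVA F for Method (up to rounding).
t_squared = t_value ** 2

print(round(t_value, 2), round(variance_ratio, 3), round(t_squared, 2), f_method)
```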
