Multifactor ANOVA
7/24/2009
Summary
The Multifactor ANOVA procedure is designed to construct a statistical model describing the
impact of two or more categorical factors Xj on a dependent variable Y. Tests are run to
determine whether or not there are significant differences between the means of Y at the
different levels of the factors and whether or not there are interactions between the factors. In
addition, the data may be displayed graphically in various ways, including a multiple scatterplot,
a means plot, and an interaction plot.
This procedure is designed for relatively simple experiments, such as factorial experiments with
fixed effects. The General Linear Models procedure should be used for more complicated
situations.
Sample Data:
The file stresstest.sgd contains data from a stress test of n = 36 individuals, reported by Kutner et
al. (1996). In the study, each subject exercised on a treadmill and the number of minutes required
to reach a predefined level of stress was recorded. The table below shows a partial list of the
data in that file:
Data Input
The data consist of a single column containing the measurements and multiple columns
indicating the levels of the experimental factors.
Covariates: optional numeric columns containing the values of quantitative variables that
vary together with the response and whose effects should be adjusted for before comparing
levels of the categorical factors.
Analysis Summary
The Analysis Summary shows the number of factors and the total number of observations n.
Scatterplot
The Scatterplot pane plots the data by levels of a selected factor.
[Figure: scatterplot of minutes (vertical axis) by body fat (high, low)]
If there are many common values, you may wish to add a small amount of horizontal jitter to the
plot by pressing the Jitter button on the analysis toolbar:
This offsets each point randomly in the horizontal direction so that identical values do not plot on
top of each other:
[Figure: jittered scatterplot of minutes (vertical axis) by body fat (high, low)]
The above plot suggests that there are differences between individuals with high body fat and
those with low body fat.
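The jittering described above can be sketched in a few lines. This is only an illustration, not the STATGRAPHICS implementation: the function name `jitter`, the `width` parameter, and the choice of a uniform offset are assumptions, and the sample values are hypothetical.

```python
import random

def jitter(positions, width=0.2, seed=0):
    """Return copies of the horizontal positions with a small random offset.

    Assumption: offsets are uniform on [-width/2, +width/2]; the distribution
    actually used by the Jitter button is not documented here.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [x + rng.uniform(-width / 2, width / 2) for x in positions]

# Hypothetical data: factor levels mapped to x positions (high = 0, low = 1),
# with several identical response values that would otherwise overplot.
x = [0, 0, 0, 1, 1, 1]
minutes = [12, 12, 12, 25, 25, 25]

x_jittered = jitter(x)
for xj, y in zip(x_jittered, minutes):
    print(f"x = {xj:+.3f}, minutes = {y}")
```

Only the horizontal coordinate is perturbed, so the plotted response values are unchanged.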
Pane Options
ANOVA Table
In order to determine whether or not the factors have a significant effect on the dependent
variable, an analysis of variance is performed. The results are displayed in the ANOVA Table:
The table divides the overall variability among the n measurements into several components:
2009 by StatPoint Technologies, Inc. Multifactor ANOVA - 4
STATGRAPHICS – Rev. 7/24/2009
1. A component attributable to the Main Effect of each factor, which measures the
variability amongst the mean responses at each level of the factor.
Of particular importance are F-ratios and their associated P-Values. Small P-Values (less than
0.05 if operating at the 5% significance level) correspond to significant effects.
In the current example, all of the main effects are statistically significant as is the interaction
between factors A and C (Body fat and Smoking).
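In a balanced design, the main-effect sum of squares for a factor equals the sum over its levels of the level count times the squared deviation of the level mean from the grand mean. The sketch below applies that identity to the level means reported in the Table of Least Squares Means. The mean squared error is not quoted in this text, so the value used here is an assumption back-calculated from the tabulated standard errors; the variable and function names are illustrative.

```python
# Level counts and means copied from the Table of Least Squares Means.
grand_mean = 19.1389
levels = {
    "body fat": [(18, 14.7222), (18, 23.5556)],
    "gender": [(18, 16.7222), (18, 21.5556)],
    "smoking": [(12, 15.6667), (12, 18.5833), (12, 23.1667)],
}

# Assumption: MSE is not quoted in the text; this value is back-calculated
# from the tabulated standard errors (0.698361**2 * 18).
mse = 8.7787

def main_effect_ss(counts_and_means, grand):
    """Sum of squares for a main effect in a balanced design."""
    return sum(n * (m - grand) ** 2 for n, m in counts_and_means)

results = {}
for name, cells in levels.items():
    ss = main_effect_ss(cells, grand_mean)
    df = len(cells) - 1          # degrees of freedom = number of levels - 1
    ms = ss / df                 # mean square for the effect
    results[name] = (ss, df, ms, ms / mse)
    print(f"{name:10s} SS = {ss:8.2f}  df = {df}  MS = {ms:8.2f}  F = {ms / mse:6.2f}")
```

Each F-ratio is the effect mean square divided by the error mean square. Converting it to a P-value requires the F distribution, which is left out of this sketch.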
Pane Options
The Pane Options dialog box controls how the F-tests are calculated:
Sums of Squares: the type of decomposition used to calculate the sums of squares in the
ANOVA table. The default selection is Type III, which quantifies the increase in the error
sum of squares that would occur if each effect were removed from the analysis, given that all
of the other effects remain. In contrast, Type I sums of squares represent the reduction in the
error sum of squares that occurs as each variable is added to the model, in the order shown in
the ANOVA table. In a balanced experiment (an experiment with equal numbers of
observations at all combinations of the factors) such as the current example, both types of
sums of squares yield identical results. In unbalanced cases, there will be a difference. Type
III is the default since it quantifies the marginal contribution of each effect given that all of
the other effects have been accounted for.
Error Term: the mean square to be used as the denominator of the F-test when testing the
significance of each effect. In a design in which all factors are crossed and non-random, the
default selection of Residual is correct. For more complicated types of designs, the analyst
may wish to specify another denominator for certain effects. Note: the General Linear
Models procedure automatically determines the proper denominator for many types of
models involving random and nested factors and should normally be used to analyze those
types of experimental designs.
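The difference between Type I and Type III sums of squares can be illustrated with a small main-effects model fitted by least squares. The sketch below uses hypothetical data and helper names (`solve`, `sse`, `ss_for_A`) and is not the STATGRAPHICS algorithm. It computes both quantities for factor A, once for a balanced layout and once after dropping an observation.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(columns, y):
    """Error sum of squares after regressing y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(X, y))

def ss_for_A(a, b, y):
    """Type I (A entered first) and Type III sums of squares for factor A."""
    type1 = sse([], y) - sse([a], y)        # reduction when A is added first
    type3 = sse([b], y) - sse([a, b], y)    # increase if A is removed last
    return type1, type3

# Hypothetical two-level factors coded +1 / -1, two observations per cell.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
y = [10, 12, 14, 15, 20, 21, 27, 29]

balanced = ss_for_A(a, b, y)
# Dropping the last observation makes the design unbalanced.
unbalanced = ss_for_A(a[:-1], b[:-1], y[:-1])

print("balanced:   Type I = %.4f, Type III = %.4f" % balanced)
print("unbalanced: Type I = %.4f, Type III = %.4f" % unbalanced)
```

In the balanced layout the two factor columns are orthogonal, so the order of entry does not matter and the two values agree. Once the design is unbalanced the columns are correlated and the values differ.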
Analysis Options
The Analysis Options dialog box specifies the interactions to be included in the analysis.
Maximum Order Interaction: maximum number of factors for which an interaction will be
estimated.
Exclude: Press this button to remove one or more interactions from the analysis.
By double-clicking on any interaction, it can be moved from the left to the right or back again.
Any interactions specified in the Exclude field will not be estimated.
After removing the two insignificant effects from the stress test data, the ANOVA table shows
the remaining effects:
Graphical ANOVA
The Graphical ANOVA plot, developed by Hunter (2005), is a technique for displaying
graphically the importance of each factor in the analysis. It is a plot of the scaled effects of each
factor, where the “effect” of a factor equals the difference between the least squares mean for a
level of that factor and the estimated grand mean. Each of the effects is multiplied by a scaling
factor
$$\sqrt{\frac{\nu_R \, n_i}{\nu_T \, \bar{n}}} \qquad (1)$$
where $\nu_R$ is the residual degrees of freedom, $\nu_T$ is the degrees of freedom for the main effect of
the factor, $n_i$ equals the number of observations in the i-th level of the factor, and $\bar{n}$ is the
average number of observations at all levels of the factor. This scales the effects so that their
variability can be compared directly with that of the residuals.
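[Figure: Graphical ANOVA for minutes, showing scaled effects for gender (female, male; P = 0.0000) and body fat (high, low; P = 0.0000) above the residuals, on a common horizontal axis running from -24 to 26]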
Along the right-hand side of the display are the P-values for the main effects, taken from the
ANOVA table.
By comparing the variability amongst the treatment effects in the above plot to that of the
residuals, it is easy to see that all of the factors show differences of a greater magnitude than
could be accounted for solely by experimental error. Depending upon the relative location of the
effects, it may also be possible in some cases to visually identify which levels are significantly
different from which other levels, which is done formally by the Multiple Range Tests described
below.
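The scaled effects plotted in the Graphical ANOVA can be approximated from the level means in the Table of Least Squares Means. The sketch below assumes that the scaling factor in equation (1) is the square root of (residual d.f. × n_i) / (effect d.f. × average n), and that the residual degrees of freedom equal 29. The value 29 is inferred from the effects listed in that table rather than quoted in the text, and the function name is illustrative.

```python
import math

def scaled_effects(level_means, counts, grand_mean, df_effect, df_resid):
    """Scale each effect (LS mean - grand mean) as in equation (1).

    Assumption: the scaling factor is sqrt(df_resid * n_i / (df_effect * n_bar)).
    """
    n_bar = sum(counts) / len(counts)   # average count over the levels
    return [
        (m - grand_mean) * math.sqrt(df_resid * n / (df_effect * n_bar))
        for m, n in zip(level_means, counts)
    ]

grand = 19.1389
# Assumption: 29 residual degrees of freedom (36 - 1 - 6 model terms),
# inferred from the effects listed in the Table of Least Squares Means.
df_resid = 29

body_fat = scaled_effects([14.7222, 23.5556], [18, 18], grand, 1, df_resid)
smoking = scaled_effects([15.6667, 18.5833, 23.1667], [12, 12, 12], grand, 2, df_resid)

print("body fat:", [round(v, 2) for v in body_fat])
print("smoking: ", [round(v, 2) for v in smoking])
```

Under these assumptions the scaled body fat effects fall near the two ends of the horizontal axis shown in the plot, which supports the square-root form of the scaling factor.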
Multiple Range Tests
The top half of the table displays each of the estimated least squares means in increasing order of
magnitude. It shows:
LS Mean - the estimated least squares mean. In the case of a balanced design, the least
squares mean is equivalent to the average of all observations at the indicated factor level.
In unbalanced designs, the least squares mean is the predicted value of the dependent
variable when the specified factor is set to a particular level while all other factors are set
equal to their mean levels. The least squares means adjust for any imbalance in the data
by making predictions at a common level of all the factors.
The second half of the table displays a comparison between each pair of level means.
Limits - an interval estimate of that difference, using the currently selected multiple
comparisons procedure.
Pane Options
LSD - forms a confidence interval for each pair of means at the selected confidence level
using Student’s t distribution. This procedure is due to Fisher and is called the Least
Significant Difference procedure, since the magnitude of the limits indicates the smallest
difference between any two means that can be declared to represent a statistically
significant difference. It should only be used when the F-test in the ANOVA table
indicates significant differences amongst the sample means. The probability of making a
Type I error applies to each pair of means separately. If making more than one
comparison, the overall probability of calling at least one pair of means significantly
different when they are not may be considerably larger than α.
Tukey HSD - widens the intervals to allow for multiple comparisons amongst all pairs
of means using Tukey’s T. Tukey called his procedure the Honestly Significant
Difference procedure since it controls the experiment-wide error rate at α. If all of the
means are equal, the probability of declaring any of the pairs to be significantly different
in the entire experiment equals α. Tukey’s procedure is more conservative than Fisher’s
LSD procedure, since it makes it harder to declare any particular pair of means to be
significantly different.
Scheffe - designed to permit the estimation of all possible contrasts amongst the sample
means (not just pairwise comparisons). In the current instance, this procedure is likely to
be very conservative, since only pairs are being estimated.
Student-Newman-Keuls - Unlike the previous methods, this method does not create
intervals for the pairwise differences. Instead, it sorts the means in increasing order and
then begins to separate them into groups according to values of the Studentized range
distribution. Eventually, the means are separated into homogeneous groups within which
there are no significant differences.
The choice between the LSD procedure and a multiple comparisons procedure such as Tukey’s
HSD should depend on the relative cost of making a Type I error (calling a pair of means
different when they’re really not) versus the cost of making a Type II error (not calling a pair of
means different when they really are). In early stages of an investigation, one may not want to
be as conservative as when final verifications are being made.
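To see why the overall Type I error rate can grow well beyond α when many pairs are compared with the LSD procedure, consider the simplified case in which the comparisons are treated as independent tests. That independence is an assumption made only for illustration, since pairwise comparisons that share a mean are correlated. The function names are hypothetical.

```python
def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons

def n_pairs(k):
    """Number of pairwise comparisons among k level means."""
    return k * (k - 1) // 2

for k in (2, 3, 6):
    m = n_pairs(k)
    rate = familywise_error_rate(0.05, m)
    print(f"{k} means -> {m:2d} pairs, approximate overall error rate {rate:.4f}")
```

With a single pair the rate equals α, and it rises quickly as the number of pairs grows. Procedures such as Tukey's HSD are designed to control this inflation.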
Table of Means
This table displays the least squares mean for each level of the factors and for pairs of levels for
any included two-factor interactions. Each mean is shown together with its estimated standard error
and a confidence interval:
Table of Least Squares Means for minutes with 95.0 Percent Confidence Intervals
                                        Stnd.       Lower      Upper
Level                 Count   Mean      Error       Limit      Limit
GRAND MEAN            36      19.1389
body fat
  high                18      14.7222   0.698361    13.2939    16.1505
  low                 18      23.5556   0.698361    22.1272    24.9839
gender
  female              18      16.7222   0.698361    15.2939    18.1505
  male                18      21.5556   0.698361    20.1272    22.9839
smoking
  heavy               12      15.6667   0.855314    13.9174    17.416
  light               12      18.5833   0.855314    16.834     20.3326
  none                12      23.1667   0.855314    21.4174    24.916
body fat by smoking
  high,heavy          6       14.1667   1.2096      11.6928    16.6406
  high,light          6       14.1667   1.2096      11.6928    16.6406
  high,none           6       15.8333   1.2096      13.3594    18.3072
  low,heavy           6       17.1667   1.2096      14.6928    19.6406
  low,light           6       23.0      1.2096      20.5261    25.4739
  low,none            6       30.5      1.2096      28.0261    32.9739
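The standard errors and confidence limits in the table can be reproduced approximately from the mean squared error. In this balanced example each standard error behaves like the square root of MSE divided by the level count. The MSE and the t critical value below are assumptions: neither is quoted in this text. The MSE is back-calculated from the tabulated standard errors, and the critical value is the standard two-sided 95% Student's t value for 29 degrees of freedom.

```python
import math

# Assumptions (not quoted in the text): MSE back-calculated from the tabulated
# standard errors, and the two-sided 95% Student's t critical value for
# 29 residual degrees of freedom taken from a standard t table.
MSE = 8.77875
T_CRIT = 2.04523

def ls_mean_interval(mean, count, mse=MSE, t_crit=T_CRIT):
    """Standard error and confidence limits for a level mean (balanced design)."""
    se = math.sqrt(mse / count)
    return se, mean - t_crit * se, mean + t_crit * se

rows = [("high body fat", 14.7222, 18), ("heavy smoking", 15.6667, 12),
        ("low,none", 30.5, 6)]
for label, mean, count in rows:
    se, lower, upper = ls_mean_interval(mean, count)
    print(f"{label:14s} SE = {se:.6f}  limits = ({lower:.4f}, {upper:.4f})")
```

The interaction cells have wider intervals than the main-effect levels simply because each cell contains fewer observations.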
Pane Options
Means Plot
The level means may be plotted together with uncertainty intervals:
[Figure: means plot with uncertainty intervals by body fat (high, low)]
Provided all of the sample sizes are the same (or close), the analyst can determine which means
are significantly different from which others using the LSD, Tukey, Scheffe, or Bonferroni
procedure simply by looking at whether or not a pair of intervals overlap in the vertical direction.
A pair of intervals that do not overlap indicates a statistically significant difference between the
means at the selected confidence level. In this case, note that the interval for high body fat does
not overlap the interval for low body fat, indicating a statistically significant difference between
the means at those two levels.
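The overlap rule can be made concrete for LSD intervals. One common construction, assumed here because the text does not give the formula, sets each half-width to half of the least significant difference. With equal sample sizes, two such intervals fail to overlap exactly when the difference between the means exceeds the LSD. The t critical value is also an assumption (two-sided 95%, 29 degrees of freedom), and the function names are illustrative.

```python
import math

T_CRIT = 2.04523   # assumed: 95% two-sided t value for 29 residual d.f.

def lsd_interval(mean, se, t_crit=T_CRIT):
    """Interval whose half-width is half the least significant difference."""
    half_width = t_crit * math.sqrt(2.0) * se / 2.0
    return mean - half_width, mean + half_width

def overlap(a, b):
    """True if two (lower, upper) intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]

# Means and standard errors from the Table of Least Squares Means.
high = lsd_interval(14.7222, 0.698361)
low = lsd_interval(23.5556, 0.698361)

print("high:", high)
print("low: ", low)
print("significantly different:", not overlap(high, low))
```

These intervals are narrower than the confidence intervals for the individual means, because they are scaled for comparing pairs rather than for locating a single mean.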
Pane Options
Confidence intervals - displays confidence intervals for the level means using the mean
squared error from the ANOVA table.
LSD intervals - designed to compare any pair of means with the stated confidence level.
Tukey HSD Intervals - designed for comparing all pairs of means. The stated
confidence level applies to the entire family of pairwise comparisons.
Scheffe Intervals - designed for comparing all contrasts. Not usually relevant here.
Interaction Plot
When one or more significant interactions exist, they should be examined together using the
Interaction Plot.
[Figure: Interaction Plot of minutes (vertical axis) versus smoking (heavy, light, none), with separate lines for body fat (high, low)]
The interaction plot displays the least squares means at all combinations of two factors. If the
factors do not interact, the lines on the plot should be approximately parallel. If they are not, then
the effect of one factor depends upon the level of the other, which is the definition of an
interaction.
Notice that the effect of smoking is much greater on individuals with low body fat than on those
with high body fat.
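The lack of parallelism can be checked numerically from the body fat by smoking cell means in the Table of Least Squares Means. The short sketch below compares the change from heavy smokers to non-smokers within each body fat level. The dictionary layout and function name are illustrative choices.

```python
# Cell means copied from the "body fat by smoking" rows of the
# Table of Least Squares Means.
cell_means = {
    ("high", "heavy"): 14.1667, ("high", "light"): 14.1667, ("high", "none"): 15.8333,
    ("low", "heavy"): 17.1667, ("low", "light"): 23.0, ("low", "none"): 30.5,
}

def smoking_effect(body_fat):
    """Change in mean minutes from heavy smokers to non-smokers."""
    return cell_means[(body_fat, "none")] - cell_means[(body_fat, "heavy")]

effect_high = smoking_effect("high")
effect_low = smoking_effect("low")

print(f"high body fat: {effect_high:.4f}")
print(f"low body fat:  {effect_low:.4f}")
# With no interaction the two effects would be about equal (parallel lines).
print(f"difference:    {effect_low - effect_high:.4f}")
```

A difference near zero would correspond to parallel lines. Here the effect of smoking among low body fat individuals is several times larger than among high body fat individuals.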
Pane Options
Interval - the type of interval (if any) to be placed around each mean.
Interaction - the interaction to be plotted. A point will be displayed showing the predicted
mean value for each combination of the factors in the selected interaction.
Plot on Axis - the factor within the selected interaction that will be used to define the
horizontal axis. Separate lines will be drawn for each level of the other factor.
[Figure: plot of means by smoking (heavy, light, none)]
Group 2: light-smoking, low body fat individuals, whose time in the test is less than
Group 1 but significantly longer than everyone else.
Group 3: everyone else. Note that their intervals all overlap, indicating no statistically
significant differences amongst any of the remaining individuals.
Residual Plots
As with all statistical models, it is good practice to examine the residuals. The residuals are equal
to the observed data values minus the values predicted by the underlying statistical model.
[Figure: residual plot, residual versus smoking (heavy, light, none)]
Pane Options
[Figure: residual plot, residual versus predicted minutes]
Heteroscedasticity occurs when the variability of the data changes as the mean changes, and
might necessitate transforming the data before performing the ANOVA. It is usually evidenced
by a funnel-shaped pattern in the residual plot.
[Figure: residual plot, residual versus row number]
If the data are arranged in chronological order, any pattern in the data might indicate an outside
influence. No such pattern is evident in the above plot.
Save Results
The following results can be saved to the datasheet:
1. Level Counts – the number of observations at each level of the factors and each pair of
factors.
2. Level Means – the mean response at each level of the factors and each pair of factors.
3. Level Standard Errors – the standard error at each level of the factors.
4. Least Squares Mean – the least squares mean at each level of the factors.
5. Residuals – the n residuals.
Calculations
Statistical Model
In order to fit a model to the data, STATGRAPHICS constructs an n by p matrix of independent
variables X. The matrix includes:
Indicator variables for each factor. For a factor with k levels, k – 1 indicator variables are
constructed. The j-th indicator variable for a factor contains the value 1 for each observation
equal to the j-th level of the factor, –1 for each observation equal to the k-th level of the
factor, and 0 otherwise.
Cross-products of the indicator variables and covariate columns to represent any interactions.
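The indicator coding described above can be sketched as follows. The helper names, the column ordering, and the leading constant column are assumptions made for illustration, and the three observations are hypothetical rather than rows of the stress test file.

```python
def effect_code(value, levels):
    """Return the k - 1 indicator values for one observation of a factor.

    The j-th indicator is 1 at level j, -1 at the last (k-th) level, else 0.
    """
    if value == levels[-1]:
        return [-1] * (len(levels) - 1)
    return [1 if value == lev else 0 for lev in levels[:-1]]

def interaction_columns(codes_a, codes_b):
    """Cross-products of two sets of indicator values for one observation."""
    return [ca * cb for ca in codes_a for cb in codes_b]

body_fat_levels = ["high", "low"]
smoking_levels = ["heavy", "light", "none"]

# Hypothetical observations (body fat, smoking), for illustration only.
observations = [("high", "heavy"), ("low", "light"), ("low", "none")]

design_rows = []
for fat, smoke in observations:
    a = effect_code(fat, body_fat_levels)
    c = effect_code(smoke, smoking_levels)
    # Row layout (an assumption): constant, body fat, smoking, interaction.
    design_rows.append([1] + a + c + interaction_columns(a, c))

for obs, row in zip(observations, design_rows):
    print(obs, row)
```

With this coding the indicator values for each factor sum to zero across its levels, which is why setting an indicator to 0 corresponds to averaging over the levels of that factor.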
$$\hat{\beta} = (X'X)^{-1} X'Y \qquad (2)$$
$$\hat{Y}_p = X_p (X'X)^{-1} X'Y \qquad (3)$$
where $X_p$ is the vector of independent variables in which each indicator variable corresponding
to factors not included in the specified effect is set to 0 and each covariate is set to its observed
mean level.
where MSE equals the mean squared error in the ANOVA table.