Analysis of Variance
Circulation
Martin G. Larson, SD
Analysis of variance (ANOVA) is a statistical technique to analyze variation in a response
variable (continuous random variable) measured under conditions defined by discrete
factors (classification variables, often with nominal levels). Frequently, we use ANOVA to
test equality among several means by comparing variance among groups relative to variance
within groups (random error).
Sir Ronald Fisher pioneered the development of ANOVA for analyzing results of agricultural
experiments.1 Today, ANOVA is included in almost every statistical package, which makes it
accessible to investigators in all experimental sciences. It is easy to input a data set and run a
simple ANOVA, but it is challenging to choose the appropriate ANOVA for different experimental
designs, to examine whether data adhere to the modeling assumptions, and to interpret the results
correctly. The purpose of this report, together with the next 2 articles in the Statistical Primer for
Cardiovascular Research series, is to enhance understanding of ANOVA and to promote its
successful use in experimental cardiovascular research. My colleagues and I attempt to
accomplish those goals through examples and explanation, while keeping within reason the
burden of notation, technical jargon, and mathematical equations.
Here, I introduce the ANOVA concept and provide details for 2 common models. The first
model, 1-way fixed-effects ANOVA, is an extension of the Student 2-independent-samples t test
that lets us simultaneously compare means among several independent samples. The second
model, 2-way fixed-effects ANOVA, has 2 factors, A and B, and each level of factor A appears in
combination with each level of factor B. This model lets us compare means among levels of factor
A and among levels of factor B; furthermore, we may examine whether combined factors induce
interaction effects (synergistic or antagonistic) on the response.
In the second ANOVA article, the author reviews several multiple-comparisons procedures for
analysis of differences among means, including comparisons between pairs of group means and
more general contrasts among group means. Usually, multiple-comparisons procedures are used
to control type I error rate across numerous hypothesis tests. In the third ANOVA report, the author
introduces repeated-measures ANOVA for use when each experimental unit contributes response
data at each level of a fixed factor (eg, different treatment doses). Statistical textbooks2,3 and
online documents4,5 provide readers with more technical detail for similar material or with broader
coverage of topics beyond the scope of these articles.
BACKGROUND
https://www.ahajournals.org/doi/epub/10.1161/CIRCULATIONAHA.107.654335 1/12
5/27/2021 Analysis of Variance
In this section, I briefly review key terminology for defining experimental design and ANOVA. An
“experimental unit” is the smallest unit of experimental material to which a factor or combination of
factors may be applied. Typically, each experimental unit is a whole organism (eg, human, mouse,
or rat), but it may be at the suborganism level (eg, individual myocytes) or supraorganism level
(eg, an institution). To determine the appropriate ANOVA model, we must know the relations
between factors and experimental units.
Statisticians distinguish 2 types of factors in experimental design and ANOVA: “fixed factors”
and “random factors.” A “fixed factor” is one for which the specific levels are of interest. An
investigator could repeat the entire experiment using identical factor levels both times.
Conceptually, each level of a fixed factor represents a distinct population with a unique response
mean. When an investigator deliberately arranges or modifies the levels of a fixed factor, we call
those levels treatments. The primary ANOVA objective is to test whether response means are
identical across factor levels. In contrast to a fixed factor, the levels of a “random factor” represent
a random sample from a potentially infinite number of levels. Different factor levels would be
chosen randomly if the experiment were redone. With random factors, the ANOVA objective is to
make an inference about random variation within a population.
When a factor level is applied to 2 or more independent experimental units, it is “replicated.” If
replicates are equal in number for each factor level, the experimental design is “balanced.” These
concepts generalize to combinations of factor levels.
An experiment may contain 2 or more factors combined in 2 different ways, either “crossed” or
“nested.” With crossed factors, each level of factor A is present in combination with each level of
factor B. For instance, each of 2 different medications (factor A levels are drugs X and Y) could be
administered at either of 2 doses (factor B levels are low or high), with each experimental unit
receiving 1 drug at 1 dose. In contrast to the situation with crossed factors, each level of a nested
factor occurs in just 1 level of the factor within which it is nested. In a study to compare for-profit
versus nonprofit institutions with respect to patients’ length of stay after coronary artery bypass,
institutional status (for-profit/nonprofit) is a fixed factor, “hospital” is a random factor
(specific hospitals are its levels) nested within the fixed-factor levels, and individual patients are
the experimental units.
We usually refer to ANOVA models using the terms “fixed effects” or “random effects.” This
should not cause confusion, because fixed factors correspond with fixed effects among factor
levels (that is, between-population mean differences), and random factors correspond with random
effects among levels (that is, within-population random differences). If the experimental design
includes fixed and random effects, then we use a “mixed-effects” ANOVA model.
Figure 1. Illustration of treatment effects. Left, Small treatment differences relative to error variance. Right, Large
treatment differences relative to error variance.
Assumptions
When we model data using 1-way fixed-effects ANOVA, we make 4 assumptions: (1) individual
observations are mutually independent; (2) the data adhere to an additive statistical model
comprising fixed effects and random errors; (3) the random errors are normally distributed; and (4)
the random errors have homogeneous variance. Violations of these assumptions may compromise
or invalidate the ANOVA results, so let us examine each individually.
Independence The value of 1 observation must not influence the value of other observations. All
experimental units must be independent, and each experimental unit must contribute only 1
response value.
Additivity We can represent the data using a statistical model with additive components. The
model for 1-way fixed-effects ANOVA may be written as follows: individual response= (grand
mean)+(treatment effect)+(random error).
Normality We assume that the random errors within each treatment group, the deviations from
each group mean, have a normal, or gaussian, probability distribution.
Homogeneous Variance Finally, we assume that the within-group random errors have identical
variance across all treatment groups, represented by the parameter σ².
Together, assumptions of independence, homogeneous variances, and normality imply that
residual errors are a sample of independently and identically distributed normal deviates.
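The additive model above can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the grand mean, the treatment effects, the group size, and the error SD are all hypothetical values chosen for the demonstration, not quantities from this article.

```python
import random
import statistics

def simulate_one_way(grand_mean, effects, n_per_group, sigma, seed=0):
    """Draw responses as (grand mean) + (treatment effect) + (random error).

    Errors are independent Normal(0, sigma) draws with the same sigma in
    every group, matching the independence, normality, and
    homogeneous-variance assumptions.
    """
    rng = random.Random(seed)
    return [
        [grand_mean + effect + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        for effect in effects
    ]

# Hypothetical settings: 3 treatments whose effects sum to zero.
groups = simulate_one_way(grand_mean=10.0, effects=[-1.0, 0.0, 1.0],
                          n_per_group=200, sigma=1.0)
group_means = [statistics.fmean(g) for g in groups]
print([round(m, 2) for m in group_means])
```

With these settings, each sample mean should fall close to its population mean (9, 10, and 11), and the within-group scatter should look alike in all 3 groups.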
ANOVA Calculations
Without going into mathematical details, the calculations proceed as follows. For each
observation, we write: deviation from overall mean=individual value−overall mean. Squaring each
deviation and summing over all observations yields the “total sum of squares” (SST). SST
represents total variability of observations from their overall mean, quantified by the sum of their
squared differences. An individual deviation also can be written as: deviation from overall mean=
(treatment mean−overall mean)+(individual value−treatment mean). With some algebra found in
statistical textbooks, SST partitions into 2 independent parts. The 2 parts are (1) “sum of squares
between treatments” (SSA), which is obtained by summing the terms (treatment mean−overall
mean)² over all observations, and (2) “sum of squares within treatments” (SSE), which is obtained by summing the
terms (individual value−treatment mean)² over all observations. SSA represents variability among group means, and
SSE represents within-group residual variability. Each sum of squares has its corresponding
“degrees of freedom” (abbreviated df), which is the effective number of independent observations
used in forming that sum of squares. With N observations, the total sum of squares, SST, has N−1
df; with a ≥2 treatment groups, the “between-treatments” sum of squares, SSA, has (a−1) df;
finally, the “within-treatments” residual sum of squares has (N−1)−(a−1)=(N−a) df. Interested
readers can find rules for determining df in standard statistical texts or online.2–4
Dividing each sum of squares by its df yields a quantity called a “mean square.” The residual
mean square, MSE=SSE/(N−a), estimates the error variance, σ². If the “null hypothesis” is correct,
such that all treatments have the same population mean, then the between-treatments mean
square, MSA=SSA/(a−1), also estimates σ². In that situation, the ratio of the 2 variance estimates,
denoted by F=MSA/MSE, has the statistical distribution called the ℱ distribution, with (a−1) and
(N−a) df. (The ℱ distribution was named in honor of Fisher.) Large values of the F ratio provide
evidence against the null hypothesis of equal treatment population means. The probability value is
the probability that a random variable selected from an ℱ distribution with (a−1) and (N−a) df will
exceed the observed F value.
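The calculations just described can be sketched in a few lines of plain Python. The function name and the 3 small groups of values are hypothetical and serve only to show the partition SST=SSA+SSE, the df, the mean squares, and the F ratio.

```python
from statistics import fmean

def one_way_anova(groups):
    """Partition the total sum of squares for a 1-way fixed-effects layout."""
    values = [y for g in groups for y in g]
    n_total, a = len(values), len(groups)
    overall = fmean(values)

    # Total: squared deviations of every observation from the overall mean.
    sst = sum((y - overall) ** 2 for y in values)
    # Between treatments: (treatment mean - overall mean)^2, once per observation.
    ssa = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups)
    # Within treatments: (individual value - treatment mean)^2.
    sse = sum((y - fmean(g)) ** 2 for g in groups for y in g)

    df_a, df_e = a - 1, n_total - a
    msa, mse = ssa / df_a, sse / df_e
    return {"SST": sst, "SSA": ssa, "SSE": sse,
            "df_A": df_a, "df_E": df_e,
            "MSA": msa, "MSE": mse, "F": msa / mse}

# Hypothetical data: 3 treatment groups with 3 observations each.
example = one_way_anova([[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [8.0, 9.0, 10.0]])
print(example)
```

For these made-up values, SSA=24 and SSE=6 add up to SST=30, and F=12 on 2 and 6 df. The probability value would then be taken from the upper tail of the corresponding ℱ distribution, for example with a statistical library.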
Table 1 displays calculations for the 1-factor fixed-effects model. Scientific journals usually do
not publish the full ANOVA table due to limited space; some journals report the F statistic, its df,
and probability value, whereas others report only the probability value. Subsequent to a
“statistically significant” result (that is, obtaining P<α, where α is the prespecified type I error rate),
one may explore differences in treatment means using multiple-comparisons methods covered in
the next article in the present series on statistics.
Figure 2. Box plots of log(sOB-R) levels by BMI group for 188 men in the Framingham Third Generation Cohort.
Box width is proportional to sample size (Table 2). Units are ng/mL for sOB-R and kg/m² for BMI.
Table 3 displays calculations from 1-way ANOVA (SAS procedure ANOVA).7 With N=188 men
in 4 BMI categories, there are (4−1)=3 df among groups and (188−4)=184 df within groups. The
sum of squares among BMI groups (SSA=4.24) is 19.1% of the total sum of squares (SST=22.17),
and the ratio of mean squares is highly statistically significant (F=14.50, df=3 and 184, P<0.0001).
These data provide strong evidence against the null hypothesis that the BMI groups have the
same population mean level of log(sOB-R). Inspection of Table 2 or Figure 2 suggests an inverse
association, with decreasing log(sOB-R) as BMI increases.
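The arithmetic behind these reported values can be retraced from the 2 published sums of squares. The short check below uses only the numbers quoted in the text (SSA=4.24, SST=22.17, N=188, 4 groups); the variable names are illustrative, and small discrepancies are expected because the published sums of squares are rounded.

```python
# Values quoted in the text for the log(sOB-R) example.
ssa, sst = 4.24, 22.17      # among-groups and total sums of squares
n_total, a = 188, 4         # number of men and number of BMI categories

sse = sst - ssa             # within-groups sum of squares, by subtraction
df_a, df_e = a - 1, n_total - a
msa, mse = ssa / df_a, sse / df_e
f_ratio = msa / mse
share = ssa / sst           # fraction of total variability among groups

print(f"df = {df_a} and {df_e}")
print(f"SSA/SST = {share:.3f}")
print(f"F = {f_ratio:.2f}")
```

The recomputed proportion and F ratio agree with the reported 19.1% and F=14.50 to the precision of the inputs.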
Table 3. Results of 1-Way ANOVA for Log(sOB-R) by BMI Category
Checking Assumptions
If the design is balanced and the sample is large, ANOVA is robust with regard to moderate
deviations from assumptions of homogeneous variances and normal error. The calculated F
statistic still has approximately an ℱ distribution. In contrast, fixed-effects 1-way ANOVA is invalid
if the observations are not independent.3 It is important to check and report whether one’s data
adhere to the assumptions and to perform supplementary analyses if serious violations exist.
Additivity In the 1-way ANOVA model, failure to satisfy the additive assumption often leads to
nonhomogeneous variances, which are covered next.
Homogeneous Variance Levene’s test8 is widely used to test the null hypothesis that variances
are homogeneous. An alternative procedure, Bartlett’s test, performs poorly with nonnormal data3
and should not be used unless normality has been validated. Visual inspection of Table 2 and
Figure 2 suggests that the spread of log(sOB-R) is similar in all BMI categories, and this is
supported by Levene’s test (P=0.94), so we find no evidence that variances are heterogeneous.
Normality The Shapiro-Wilk procedure9 may be used to test normality in samples with fewer than
2000 observations. In this example, log-transformed sOB-R data are approximately normally
distributed in each BMI category (Shapiro-Wilk test P=0.11, 0.52, 0.17, and 0.91, respectively).
The raw data deviate severely from normality (at P<0.001 in 3 BMI categories) with right skewness
and/or high kurtosis, and this justifies application of the normalizing logarithmic transformation.
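The same kinds of checks can be run outside SAS. The sketch below assumes that NumPy and SciPy are available and uses SciPy's `levene` and `shapiro` functions on synthetic right-skewed data; the group locations, spread, and sample sizes are hypothetical and are not the sOB-R measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2007)

# Synthetic "raw" values for 4 hypothetical groups (log-normal, right-skewed).
raw_groups = [rng.lognormal(mean=m, sigma=0.5, size=40)
              for m in (3.4, 3.3, 3.2, 3.1)]
log_groups = [np.log(g) for g in raw_groups]

# Levene's test on the log scale. center="mean" gives the original Levene
# form; SciPy's default, center="median", is the Brown-Forsythe variant.
levene_stat, levene_p = stats.levene(*log_groups, center="mean")
print(f"Levene: W = {levene_stat:.2f}, P = {levene_p:.2f}")

# Shapiro-Wilk test within each group, before and after the transformation.
shapiro_raw_p = []
shapiro_log_p = []
for raw, logged in zip(raw_groups, log_groups):
    _, p_raw = stats.shapiro(raw)
    _, p_log = stats.shapiro(logged)
    shapiro_raw_p.append(p_raw)
    shapiro_log_p.append(p_log)

print("Shapiro-Wilk P, raw:", [round(p, 3) for p in shapiro_raw_p])
print("Shapiro-Wilk P, log:", [round(p, 3) for p in shapiro_log_p])
```

Because the synthetic data are log-normal by construction, the log-transformed groups should usually look compatible with normality whereas the raw groups often will not, which mirrors the reasoning applied to the sOB-R example.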
The ANOVA model is just an approximation for the data, and ANOVA assumptions may not be
satisfied completely. With normal data but heterogeneous variances, ANOVA is robust for
balanced or nearly balanced designs but not for highly unbalanced designs.3 In the setting of
normal data, heterogeneous variances, and an unbalanced design, one might use Welch’s
ANOVA to accommodate unequal variances.10 With homogeneous variances but nonnormal data,
ANOVA is robust for balanced designs with large samples but not for unbalanced designs or small
samples (n<5 per group). In the setting of nonnormal data, homogeneous variances, and a small
sample or highly unbalanced design, a nonparametric procedure such as the Kruskal-Wallis test11
may be preferred over 1-way ANOVA. If the data are not normally distributed and variances are
heterogeneous, a transformation may be necessary. At the research design stage, an investigator
must realize the importance of a balanced design and large sample.
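For readers who want to try the 2 alternatives named above, the following sketch may help. It assumes SciPy is available. The Welch statistic is coded directly from the standard textbook formula rather than taken from a library routine, the Kruskal-Wallis test uses SciPy's `kruskal`, and the 3 groups of values are hypothetical.

```python
from statistics import fmean, variance
from scipy import stats

def welch_anova(groups):
    """Welch's heteroscedastic 1-way test: returns F, df1, df2, P."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [fmean(g) for g in groups]
    # Each group is weighted by n_i / s_i^2, the inverse variance of its mean.
    w = [ni / variance(g) for ni, g in zip(n, groups)]
    w_sum = sum(w)
    weighted_mean = sum(wi * mi for wi, mi in zip(w, means)) / w_sum

    numerator = sum(wi * (mi - weighted_mean) ** 2
                    for wi, mi in zip(w, means)) / (k - 1)
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    denominator = 1 + 2 * (k - 2) * lam / (k ** 2 - 1)

    f_stat = numerator / denominator
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2, float(stats.f.sf(f_stat, df1, df2))

# Hypothetical unbalanced groups with visibly different spreads.
groups = [
    [5.1, 4.9, 5.3, 5.0, 5.2],
    [6.0, 7.5, 4.8, 8.1, 6.6, 5.9, 7.2],
    [9.5, 4.0, 12.1, 7.3],
]
f_stat, df1, df2, p_welch = welch_anova(groups)
h_stat, p_kw = stats.kruskal(*groups)
print(f"Welch: F = {f_stat:.2f}, df = {df1} and {df2:.1f}, P = {p_welch:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, P = {p_kw:.3f}")
```

With only 2 groups, this Welch statistic reduces to the square of the unequal-variance t statistic, which offers a convenient check on the implementation.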
Confidence Intervals
After 1-way ANOVA, one may wish to estimate a confidence interval (CI) for a population mean or
for the difference between 2 population means. The form of the CI is (sample estimate)±
Figure 3. Illustration of interaction effects. Top, No interaction between factors A and B. Bottom, Interaction
(synergistic) between factors A and B. Open squares (factor A, level 1) and solid circles (factor A, level 2) represent
population mean values at 3 levels of factor B.
Formal definition of the factorial 2-way fixed-effects ANOVA model requires statistical notation
to identify specific levels of A and B and of their combination, as well as to denote each replicate
within each combination. Conceptually, the model for each observation is as follows: individual response=(grand mean)+(factor A main effect)+(factor B main effect)+(A×B interaction effect)+(random error).
As with 1-way ANOVA, deviations from the grand mean, when expanded algebraically, squared,
and summed across levels of both factors, produce sums of squares associated with main effects
for factor A, main effects for factor B, interaction effects due to combinations of A and B, and
random error. Corresponding df, mean squares, and F ratios and probability values from
hypothesis tests are displayed in Table 4.
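For a complete, balanced factorial design, the partition into main-effect, interaction, and error sums of squares can be sketched as follows. The function name and the 2×2 data set with 2 replicates per cell are hypothetical, and the formulas are the standard balanced-design ones, so this sketch does not apply to unbalanced data.

```python
from statistics import fmean

def two_way_anova_balanced(data):
    """data[i][j] holds the replicates for level i of A and level j of B."""
    a, b, n = len(data), len(data[0]), len(data[0][0])
    grand = fmean(y for row in data for cell in row for y in cell)
    a_means = [fmean(y for cell in row for y in cell) for row in data]
    b_means = [fmean(y for row in data for y in row[j]) for j in range(b)]
    cell_means = [[fmean(cell) for cell in row] for row in data]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    # Interaction: what remains of each cell mean after both main effects.
    ss_ab = n * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_e = sum((y - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for y in data[i][j])
    ss_t = sum((y - grand) ** 2 for row in data for cell in row for y in cell)

    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "E": a * b * (n - 1)}
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "E": ss_e, "T": ss_t}
    mse = ss_e / df["E"]
    f = {key: (ss[key] / df[key]) / mse for key in ("A", "B", "AB")}
    return ss, df, f

# Hypothetical 2x2 design with 2 replicates per cell.
data = [[[1.0, 3.0], [5.0, 7.0]],
        [[2.0, 4.0], [10.0, 12.0]]]
ss, df, f = two_way_anova_balanced(data)
print(ss, df, f)
```

In this made-up example, the 4 component sums of squares (18, 72, 8, and 8) add up to the total of 106, and each F ratio is the corresponding mean square divided by the residual mean square.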
Some computational algorithms for 2-way ANOVA use formulas that are valid only for
complete, balanced factorial designs. In practice, it is common to have unequal numbers in each
group, either because the study does not control the numbers of observations or because some
response data are missing. When confronted with data from incomplete or unbalanced factorial
designs, an investigator must choose a statistical software package that correctly handles the
calculations.
Table 5. Descriptive Statistics for Log(sOB-R) by BMI and HDL Cholesterol Categories
Figure 4. Box plots of log(sOB-R) levels by HDL cholesterol group and BMI group for 188 men in the Framingham
Third Generation Cohort. Box width is proportional to sample size (Table 5). Units are ng/mL for sOB-R, mg/dL for
HDL, and kg/m² for BMI.
Table 6 shows results from the 2-way ANOVA model with interaction that was fitted with the
SAS GLM (general linear model) procedure.12 Because of the highly unbalanced design, typical
ANOVA calculations (eg, SAS ANOVA procedure7) would produce incorrect results. Table 6
displays type III sums of squares and F tests. Type III sums of squares are preferred in analyses
of unbalanced designs, because these statistics are calculated for each factor or interaction after
adjustment for all other effects in the model; they do not depend on the ordering of variables. In
this example, the main effects are highly statistically significant with regard to both the BMI group
(F=9.18, 3 and 180 df, P<0.0001) and the HDL group (F=14.67, 1 and 180 df, P=0.0002), but the
BMI×HDL interaction is not significant (F=0.51, 3 and 180 df, P=0.67). When the interaction is not
statistically significant, it is common to refit the model with the exclusion of the interaction term to
simplify the interpretation of main effects. Here, one concludes that levels of log(sOB-R) are lower
in men with low HDL than in men who have higher HDL, that levels of log(sOB-R) tend to
decrease across BMI groups, and that the pattern of decrease in log(sOB-R) across BMI groups is
similar in both HDL groups.
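The dependence of sequential sums of squares on variable ordering, and the way a type III sum of squares avoids it, can be illustrated with ordinary least squares. The sketch below assumes NumPy is available and is not the algorithm used by SAS GLM: it codes two 2-level factors as +1/−1 (sum-to-zero coding), uses a hypothetical unbalanced data set, and obtains each sum of squares for factor A as a difference in residual sums of squares between 2 fitted models.

```python
import numpy as np

def rss(y, *columns):
    """Residual sum of squares from least squares on an intercept plus columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def ss_for_a(y, a, b):
    """Three sums of squares for factor A (a and b coded +1/-1)."""
    ab = a * b  # interaction column under sum-to-zero coding
    return {
        "A entered first": rss(y) - rss(y, a),
        "A after B": rss(y, b) - rss(y, a, b),
        # Adjusted for B and for the interaction, as in a type III analysis.
        "A type III": rss(y, b, ab) - rss(y, a, b, ab),
    }

# Hypothetical unbalanced 2x2 layout with cell sizes 4, 1, 1, and 4.
a = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1], dtype=float)
b = np.array([1, 1, 1, 1, -1, 1, -1, -1, -1, -1], dtype=float)
y = np.array([10.2, 9.8, 10.5, 9.9, 6.1, 9.0, 5.2, 4.8, 5.5, 5.1])

unbalanced = ss_for_a(y, a, b)
for label, value in unbalanced.items():
    print(f"{label}: {value:.2f}")
```

In this unbalanced layout, the sum of squares for A changes markedly depending on whether A enters the model before or after B, whereas in a balanced layout all 3 quantities coincide.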
Table 6. Results of 2-Way ANOVA for Log(sOB-R) by BMI and HDL Cholesterol Categories
higher power. Sample size n=50 per group provides good power (say, 0.80) if true effect size is
δ=0.63, but a study with n=15 per group has power 0.80 only if effect size is δ=1.18, and a study
with n=5 per group has power 0.80 only if effect size is very large, δ=2.24 (not shown on graph).
Also, power is higher for balanced designs than for unbalanced designs, and with few rather than many
treatment groups. Experiments should be designed to have reasonable power (typically set at
0.80) to detect realistic treatment differences, because inadequately powered experiments usually
yield inconclusive results.
Figure 5. Power in 1-way ANOVA as a function of sample size (n per group) and effect size (δ). Significance level is
0.05; population means are −δ/2, 0, and δ/2; and σ=1.
ARTICLE INFORMATION
Correspondence
Correspondence to Martin Larson, SD, Framingham Heart Study, 73 Mount Wayte Ave, Framingham, MA
01702. E-mail [email protected]
Affiliations
From the Department of Mathematics and Statistics, Boston University, Boston, Mass, and the Framingham
Heart Study of the National Heart, Lung, and Blood Institute, Framingham, Mass.
Acknowledgments
Data on sOB-R levels were kindly provided by Dr Vasan S. Ramachandran.
Sources of Funding
Salary support and examination data were provided by contract NO1 HC 25195 (Principal Investigator
P.A. Wolf) from the National Heart, Lung, and Blood Institute, National Institutes of Health. sOB-R levels were
measured with support from grant K24 HL04334 (Principal Investigator V.R. Ramachandran), National Heart,
Lung, and Blood Institute, National Institutes of Health.
Disclosures
None.
REFERENCES
1. Fisher RA. Statistical Methods for Research Workers. Edinburgh, United Kingdom: Oliver & Boyd; 1925.
2. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods.
2nd ed. Boston, Mass: PWS-Kent Publishing; 1988.
3. Zar JH. Biostatistical Analysis. Upper Saddle River, NJ: Prentice Hall; 1999.
4. Sit V. Analyzing ANOVA Designs: Biometrics Information Handbook No. 5. Province of British Columbia,
Ministry of Forests Research Program. Working paper 07/1995. Available at:
http://www.for.gov.bc.ca/hfd/pubs/docs/Wp/Wp07.pdf. Accessed July 25, 2007.
5. National Institute of Standards and Technology, Information Technology Library. NIST/SEMATECH e-
Handbook of Statistical Methods. Available at:
http://www.itl.nist.gov/div898/handbook/prc/section4/prc43.htm. Accessed July 25, 2007.
6. Splansky GL, Corey D, Yang Q, Atwood LD, Cupples LA, Benjamin EJ, D’Agostino RB Sr, Fox CS,
Larson MG, Murabito JM, O’Donnell CJ, Vasan RS, Wolf PA, Levy D. The Third Generation Cohort of the
National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial
examination. Am J Epidemiol. 2007; 165: 1328–1335.
7. SAS Institute Inc. SAS/STAT User’s Guide, Version 8. Cary, NC: SAS Institute; 1999: 337–392.
8. Levene H. Robust tests for the equality of variance. In: Olkin I, ed. Contributions to Probability and
Statistics. Palo Alto, Calif: Stanford University Press; 1960: 278–292.
9. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;
52: 591–611.
10. Welch BL. On the comparison of several mean values: an alternative approach. Biometrika. 1951; 38:
330–336.
11. Kruskal WH, Wallis WA. Use of ranks in one-criterion analysis of variance. J Am Stat Assoc. 1952; 47:
583–621.
12. SAS Institute Inc. SAS/STAT User’s Guide, Version 8. Cary, NC: SAS Institute; 1999: 1465–1636.
13. Stanley K. Design of randomized controlled trials. Circulation. 2007; 115: 1164–1169.
14. Friendly M. Power Computations for ANOVA Designs [computer software]. Version 1.2. Toronto, Canada:
York University; 2006. Available at: http://www.math.yorku.ca/SCS/sasmac/fpower.html. Accessed July
26, 2007.
15. Lenth RV. Java Applets for Power and Sample Size [computer software]. Iowa City, Iowa: University of
Iowa; 2006. Available at: http://www.stat.uiowa.edu/rlenth/Power. Accessed July 26, 2007.