Multivariate Analysis of Variance (MANOVA)
Chapter 415
Multivariate Analysis of Variance (MANOVA)
Introduction
Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In
ANOVA, differences among various group means on a single-response variable are studied. In MANOVA, the
number of response variables is increased to two or more. The hypothesis concerns a comparison of vectors of
group means. When only two groups are being compared, the results are identical to Hotelling’s T² procedure.
The multivariate extension of the F-test is not straightforward. Instead of a single F-test, several test statistics are available, such as Wilks’ Lambda, the Hotelling-Lawley trace, Pillai’s trace, and Roy’s largest root. The exact distributions of these statistics are difficult to calculate, so we rely on approximations based on the F-distribution.
Technical Details
A MANOVA has one or more factors (each with two or more levels) and two or more dependent variables. The
calculations are extensions of the general linear model approach used for ANOVA.
Unlike the univariate situation in which there is only one statistical test available (the F-ratio), the multivariate
situation provides several alternative statistical tests. We will describe these tests in terms of two matrices, H and
E. H is called the hypothesis matrix and E is the error matrix. These matrices may be computed using a number of
methods. In NCSS, we use the standard general linear models (GLM) approach in which a sum of squares and
cross-products matrix is computed. This matrix is based on the dependent variables and independent variables
generated for each degree of freedom in the model. It may be partitioned according to the terms in the model.
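Let p denote the number of response variables, h the degrees of freedom of the hypothesis being tested, and e the error degrees of freedom. The test statistics are functions of the eigenvalues θ_i of H(E + H)^{-1}, λ_i of E(E + H)^{-1}, and φ_i of HE^{-1}, which are related by
$$\phi_i = \frac{\theta_i}{1-\theta_i} = \frac{1-\lambda_i}{\lambda_i}, \qquad \lambda_i = 1-\theta_i = \frac{1}{1+\phi_i}$$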
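As a sketch of how these quantities relate, the Python below computes the φ_i for the two-response case (p = 2) directly from 2×2 matrices H and E, converts them to θ_i and λ_i, and checks that the product of the λ_i equals |E|/|E + H|. The matrices are made-up toy values, and the closed-form 2×2 eigenvalue formula is an illustrative shortcut, not the method NCSS uses.

```python
import math

def phi_eigenvalues_2x2(H, E):
    """Eigenvalues phi_i of H E^-1 when there are two responses (p = 2).

    H E^-1 and E^-1 H are similar matrices, so they share eigenvalues;
    for a 2x2 matrix these follow from its trace and determinant.
    """
    (h11, h12), (h21, h22) = H
    (e11, e12), (e21, e22) = E
    det_e = e11 * e22 - e12 * e21
    # trace(E^-1 H), using E^-1 = [[e22, -e12], [-e21, e11]] / det(E)
    tr = (e22 * h11 - e12 * h21 - e21 * h12 + e11 * h22) / det_e
    det = (h11 * h22 - h12 * h21) / det_e
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))  # guard against tiny negative rounding
    return [(tr + root) / 2.0, (tr - root) / 2.0]   # largest first

def theta_and_lambda(phis):
    """theta_i = phi_i / (1 + phi_i) and lambda_i = 1 / (1 + phi_i)."""
    thetas = [phi / (1.0 + phi) for phi in phis]
    lambdas = [1.0 / (1.0 + phi) for phi in phis]
    return thetas, lambdas

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical hypothesis and error matrices for two responses
H = [[4.0, 0.0], [0.0, 1.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
phis = phi_eigenvalues_2x2(H, E)
thetas, lambdas = theta_and_lambda(phis)
E_plus_H = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
print(phis, thetas, lambdas)
print(math.prod(lambdas), det2(E) / det2(E_plus_H))  # both are |E| / |E + H|
```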
© NCSS, LLC. All Rights Reserved.
NCSS Statistical Software NCSS.com
Wilks’ Lambda
Define Wilks’ Lambda as follows:
$$\Lambda_{p,h,e} = \frac{|E|}{|E+H|} = \prod_{j=1}^{p} \left(1-\theta_j\right)$$
with e ≥ p.
The following approximation based on the F-distribution is used to determine significance levels:
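$$F_{ph,\,ft-g} = \frac{(ft-g)\left(1-\Lambda^{1/t}\right)}{ph\,\Lambda^{1/t}}$$
where
$$f = e - \frac{1}{2}(p-h+1), \qquad g = \frac{ph-2}{2}$$
$$t = \begin{cases} \sqrt{\dfrac{p^2h^2-4}{p^2+h^2-5}} & \text{if } p^2+h^2-5 > 0 \\[4pt] 1 & \text{otherwise} \end{cases}$$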
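A minimal Python sketch of the approximation above, written directly from the formulas for f, g, and t. The function name and its return layout are illustrative assumptions; converting the F-ratio into a probability is a separate step.

```python
import math

def wilks_f_approx(wilks_lambda, p, h, e):
    """F approximation for Wilks' Lambda using f, g, and t as defined above.

    p = number of responses, h = hypothesis DF, e = error DF.
    Returns (F, numerator DF, denominator DF).
    """
    f = e - (p - h + 1) / 2.0
    g = (p * h - 2) / 2.0
    denom = p * p + h * h - 5
    t = math.sqrt((p * p * h * h - 4) / denom) if denom > 0 else 1.0
    root = wilks_lambda ** (1.0 / t)   # Lambda^(1/t)
    df1 = p * h
    df2 = f * t - g
    F = df2 * (1.0 - root) / (df1 * root)
    return F, df1, df2

print(wilks_f_approx(0.25, p=2, h=2, e=10))
```

With p = 2, h = 2, e = 10, and Λ = 0.25 the sketch gives t = 2 and F = 4.5 on (4, 18) degrees of freedom. When min(p, h) ≤ 2, as here, this approximation is known to be exact.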
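Hotelling-Lawley Trace
Define the Hotelling-Lawley trace as follows:
$$T_g^2 = e \sum_{j=1}^{s} \phi_j = e\,\mathrm{tr}\left(HE^{-1}\right)$$
where
s = min(p, h)
The following approximation based on the F-distribution is used to determine significance levels: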
$$F_{a,b} = \frac{T_g^2}{ce}$$
where
a = ph
b = 4 + (a + 2)/(B - 1)
$$c = \frac{a(b-2)}{b(e-p-1)}$$
$$B = \frac{(e+h-p-1)(e-1)}{(e-p-3)(e-p)}$$
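The F approximation for T_g^2 can be sketched the same way. The function below is a hypothetical helper that follows the formulas for a, b, c, and B above; it assumes e > p + 3 so that the denominator of B is positive, and raises an error otherwise.

```python
def hotelling_lawley_f_approx(t2_g, p, h, e):
    """F approximation for the Hotelling-Lawley trace T_g^2 (formulas above).

    Assumes e > p + 3 so that the denominator of B is positive.
    Returns (F, numerator DF a, denominator DF b).
    """
    if e <= p + 3:
        raise ValueError("this sketch requires e > p + 3")
    a = p * h
    B = (e + h - p - 1) * (e - 1) / ((e - p - 3) * (e - p))
    b = 4.0 + (a + 2) / (B - 1.0)
    c = a * (b - 2.0) / (b * (e - p - 1))
    F = t2_g / (c * e)
    return F, a, b

print(hotelling_lawley_f_approx(10.0, p=2, h=2, e=20))
```

Note that b, the denominator degrees of freedom, is generally not an integer.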
Pillai’s Trace
Pillai’s trace statistic, V(s), is defined as follows:
$$V^{(s)} = \sum_{j=1}^{s} \theta_j = \mathrm{tr}\left(H(E+H)^{-1}\right)$$
where
s = min(p, h)
The following approximation based on the F-distribution is used to determine significance levels:
$$F_{s(2m+s+1),\,s(2n+s+1)} = \frac{(2n+s+1)\,V^{(s)}}{(2m+s+1)\left(s-V^{(s)}\right)}$$
where
s = min(p, h)
m = (| p - h | -1)/2
n = (e - p - 1)/2
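The Pillai approximation translates just as directly. In this hypothetical Python helper, s, m, and n are computed exactly as defined above.

```python
def pillai_f_approx(v, p, h, e):
    """F approximation for Pillai's trace V^(s).

    Returns (F, numerator DF, denominator DF); requires V^(s) < s.
    """
    s = min(p, h)
    m = (abs(p - h) - 1) / 2.0
    n = (e - p - 1) / 2.0
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n + s + 1)
    F = (2 * n + s + 1) * v / ((2 * m + s + 1) * (s - v))
    return F, df1, df2

print(pillai_f_approx(0.5, p=3, h=1, e=20))
```

With h = 1 (so s = 1), as in this example, the degrees of freedom reduce to p and e - p + 1, here 3 and 18, and the sketch gives F = 6.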
Roy’s Largest Root
Roy’s largest root, φ_max, is defined as the largest of the φ_i’s. The following approximation based on the F-distribution is used to determine significance levels:
$$F_{(2\nu_1+2),\,(2\nu_2+2)} = \frac{2\nu_2+2}{2\nu_1+2}\,\phi_{\max}$$
where
s = min(p, h)
ν_1 = (|p - h| - 1)/2
ν_2 = (e - p - 1)/2
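The Roy approximation above, written as a hypothetical Python helper; ν_1 and ν_2 follow the definitions just given.

```python
def roy_f_approx(phi_max, p, h, e):
    """F approximation for Roy's largest root phi_max.

    Returns (F, numerator DF, denominator DF).
    """
    nu1 = (abs(p - h) - 1) / 2.0
    nu2 = (e - p - 1) / 2.0
    df1 = 2 * nu1 + 2
    df2 = 2 * nu2 + 2
    F = df2 / df1 * phi_max
    return F, df1, df2

print(roy_f_approx(1.0, p=3, h=1, e=20))
```

With p = 3, h = 1, e = 20, and φ_max = 1 the sketch gives F = 6 on (3, 18) degrees of freedom.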
Linearity
MANOVA assumes linear relationships among the dependent variables within a particular cell. You should study
scatter plots of each pair of dependent variables using a different color for each level of a factor. Look carefully
for curvilinear patterns and for outliers. The occurrence of curvilinear relationships will reduce the power of the
MANOVA tests.
Data Structure
The data must be entered in a format that places the dependent variables and values of each factor side by side. An
example of the data for a MANOVA design is shown in the table below. In this example, WRATR and WRATA are
the two dependent variables. Treatment and Disability are two factor variables. This database is stored in the file
MANOVA1.
MANOVA1 dataset (subset)
WRATR WRATA Treatment Disability
115 108 1 1
98 105 1 1
107 98 1 1
90 92 2 1
85 95 2 1
80 81 2 1
100 105 1 2
105 95 1 2
95 98 1 2
70 80 2 2
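A quick way to check this layout is to compute cell means from the rows. The Python sketch below does that for the ten-row subset shown above; cell (Treatment 2, Disability 2) has only one row in the subset, so its mean is not a full-data value.

```python
from collections import defaultdict

# Rows of the MANOVA1 subset shown above: (WRATR, WRATA, Treatment, Disability)
rows = [
    (115, 108, 1, 1), (98, 105, 1, 1), (107, 98, 1, 1),
    (90, 92, 2, 1), (85, 95, 2, 1), (80, 81, 2, 1),
    (100, 105, 1, 2), (105, 95, 1, 2), (95, 98, 1, 2),
    (70, 80, 2, 2),
]

def cell_means(rows):
    """Mean WRATR and WRATA within each (Treatment, Disability) cell, plus the count."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for wratr, wrata, treatment, disability in rows:
        cell = (treatment, disability)
        sums[cell][0] += wratr
        sums[cell][1] += wrata
        counts[cell] += 1
    return {cell: (s[0] / counts[cell], s[1] / counts[cell], counts[cell])
            for cell, s in sums.items()}

for cell, (mean_r, mean_a, n) in sorted(cell_means(rows).items()):
    print(cell, n, round(mean_r, 4), round(mean_a, 4))
```

The WRATR means for cells (1,1), (2,1), and (1,2), namely 106.6667, 85, and 100, agree with the corresponding Treatment,Disability rows of the means report later in this chapter.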
Procedure Options
This section describes the options available in this procedure.
Variables Tab
These options control which variables are used in the analysis.
Response Variables
Response Variables
Specifies the response (dependent) variables to be analyzed.
Factor Specification
Factor Variable (1-10)
At least one factor variable must be specified. This variable’s values indicate how the values of the response variables should be categorized. Examples of factor variables are gender, age groups, ‘yes’ or ‘no’ responses, etc.
Note that the values in the variable may be either numeric or text. The treatment of text variables is specified for each variable by the Data Type option on the database.
Type
This option specifies whether the factor is fixed or random.
• Fixed
The factor includes all possible levels, like male and female for gender, includes representative values across
the possible range of values, like low, medium, and high temperatures, or includes a set of values to which
inferences will be limited, like New York, California, and Maryland.
• Random
The factor is one in which the chosen levels represent a random sample from the population of values. For
example, you might select four classes from the hundreds in your state or you might select ten batches from
an industrial process. The key is that a random sample is chosen. In NCSS, a random factor is “crossed” with
other random and fixed factors. Two factors are crossed when each level of one includes all levels of the
other.
Model Specification
This section specifies the experimental design model.
Which Model Terms
A design that includes all main effects and all interaction terms is called a saturated model. Often, it is useful to omit some interaction terms from the model. This option makes it easy to specify which interactions to keep. If the choices provided here are not flexible enough for your needs, select Custom and enter the model directly in Custom Model.
The options included are as follows.
• Full Model
The complete, saturated model is analyzed. This option requires that you have no missing cells, although you
can have an unbalanced design. Hence, you cannot use this option with Latin square or fractional factorial
designs.
• Up to 1-Way
A main-effects only model is run. All interactions are omitted.
• Up to 2-Way
All main-effects and two-way interactions are included in the model.
• Up to 3-Way
All main-effects, two-way, and three-way interactions are included in the model.
• Up to 4-Way
All main-effects, two-way, three-way, and four-way interactions are included in the model.
• Custom
This option indicates that you want the Custom Model (given in the next box) to be used.
Custom Model
When a Custom Model is selected (see Which Model Terms), the model itself is entered here. If all main effects
and interactions are desired, you can enter the word “ALL” here. For complicated designs, it is usually easier to
check the next option, Write Model in ‘Custom Model’ Field, and run the procedure. The appropriate model will
be generated and placed in this box. You can then edit it as you desire.
The model is entered using letters separated by the plus sign. For example, a three-factor factorial in which only
two-way interactions are needed would be entered as follows:
A+B+AB+C+AC+BC
Note that repeated-measures designs are not allowed.
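To illustrate the term ordering used in the example above, the following hypothetical Python helper (not part of NCSS) builds an "up to k-way" model string from single-letter factor names, listing each new factor followed by its interactions with the factors before it.

```python
from itertools import combinations

def up_to_k_way(factors, k):
    """Model string with all main effects and interactions up to order k.

    factors: single-letter factor names, e.g. "ABC" (hypothetical helper).
    Each factor is followed by its interactions with earlier factors.
    """
    terms = []
    for i, letter in enumerate(factors):
        earlier = factors[:i]
        # r = number of earlier factors joined with this one (0 gives the main effect)
        for r in range(0, k):
            for combo in combinations(earlier, r):
                terms.append("".join(combo) + letter)
    return "+".join(terms)

print(up_to_k_way("ABC", 2))  # A+B+AB+C+AC+BC
print(up_to_k_way("ABC", 3))  # A+B+AB+C+AC+BC+ABC
```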
Write Model in ‘Custom Model’ Field
When this option is checked, no analysis is performed when the procedure is run. Instead, a copy of the full model
is stored in the Custom Model box. You can then delete selected terms from the model without having to enter all
the terms you want to keep.
Reports Tab
The following options control which reports are displayed.
Select Reports
EMS Report ... Univariate F's
Specify whether to display the indicated reports.
Test Alpha
The value of alpha for the statistical tests and power analysis. Usually, this number will range from 0.1 to 0.001.
A common choice for alpha is 0.05, but this value is a legacy from the age before computers when only printed
tables were available. You should determine a value appropriate for your particular study.
Report Options
Precision
Specify the precision of numbers in the report. Single precision will display seven-place accuracy, while double
precision will display thirteen-place accuracy.
Variable Names
Indicate whether to display the variable names or the variable labels.
Value Labels
Indicate whether to display the data values or their labels.
Plots Tab
The following options specify the plot(s) of group means.
Select Plots
Means Plot(s) and Subject Plot
Specify whether to display the indicated plots. Click the plot format button to change the plot settings.
Y-Axis Scaling
Specify the method for calculating the minimum and maximum along the vertical axis. Separate means that each
plot is scaled independently. Uniform means that all plots use the overall minimum and maximum of the data.
This option is ignored if a minimum or maximum is specified.
The Expected Mean Square expressions are provided to show the appropriate error term for each factor. The
correct error term for a factor is the term whose expected mean square is identical except for the factor being tested.
Source Term
The source of variation or term in the model.
DF
The degrees of freedom. The number of observations “used” by this term.
Term Fixed?
Indicates whether the term is “fixed” or “random.”
Denominator Term
Indicates the term used as the denominator in the F-ratio.
Expected Mean Square
This expression represents the expected value of the corresponding mean square if the design were completely
balanced. “S” represents the expected value of the mean square error (sigma squared). The uppercase letters represent
either the adjusted sum of squared treatment means if the factor is fixed, or the variance component if the factor is
random. The lowercase letter represents the number of levels for that factor, and “s” represents the number of
replications of the experimental layout.
These EMS expressions are provided to determine the appropriate error term for each factor. The correct error
term for a factor is that term whose EMS is identical except for the factor being tested.
This report gives the results of the various significance tests. Usually, the four multivariate tests will lead to the
same conclusions. When they do not, refer to the discussion of these tests found earlier in this chapter. Once a
multivariate test has found a term significant, use the univariate ANOVA to determine which of the variables and
factors are “causing” the significance.
Term(DF)
The term in the design model with the degrees of freedom of the term in parentheses.
Test Statistic
The name of the statistical test shown on this row of the report. The four multivariate tests are followed by the
univariate F-tests of each variable.
Test Value
The value of the test statistic.
DF1
The numerator degrees of freedom of the F-ratio corresponding to this test.
DF2
The denominator degrees of freedom of the F-ratio corresponding to this test.
F-Ratio
The value of the F-test corresponding to this test. In some cases, this is an exact test. In other cases, this is an
approximation to the exact test. See the discussion of each test to determine if it is exact or approximate.
Prob Level
The significance level of the F-ratio above: the probability, assuming the null hypothesis is true, of an F-ratio larger than the one obtained by this analysis.
For example, to test at an alpha of 0.05, this probability would have to be less than 0.05 to make the F-ratio
significant.
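Prob Level is the upper-tail area of the F distribution beyond the observed F-ratio. The chapter does not say how NCSS evaluates this area, so the following is only a sketch: it uses the standard identity P(F > f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1·f), and evaluates the regularized incomplete beta function I_x with the continued-fraction method described in Numerical Recipes. In practice, a library routine such as scipy.stats.f.sf(f, df1, df2) returns the same quantity.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_prob_level(f_ratio, df1, df2):
    """P(F > f_ratio) for an F(df1, df2) random variable."""
    if f_ratio <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_ratio)
    return regularized_beta(df2 / 2.0, df1 / 2.0, x)

print(f_prob_level(3.0, 2, 2))
```

For instance, an F-ratio of 3 on (2, 2) degrees of freedom has a tail probability of exactly 0.25, which the sketch reproduces; at an alpha of 0.05 that term would not be significant.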
Decision(0.05)
The decision to accept or reject the null hypothesis at the given level of significance. Note that you specify the
level of significance with the Test Alpha option.
WRATR WRATA
WRATR 45.33333 2.583333
WRATA 0.0572313 44.94444
This report displays the correlations and covariances formed by averaging across all the individual group
covariance matrices. The correlations are shown in the lower-left half of the matrix. The within-group covariances
are shown on the diagonal and in the upper-right half of the matrix.
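Each correlation in the lower-left half is the covariance divided by the product of the two standard deviations, r_ij = s_ij / sqrt(s_ii · s_jj). The Python sketch below applies that conversion to the within-group covariances printed above and reproduces the reported correlation of 0.0572313.

```python
import math

def cov_to_corr(cov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]

# Within-group covariances of WRATR and WRATA from the report above
within_cov = [[45.33333, 2.583333],
              [2.583333, 44.94444]]
corr = cov_to_corr(within_cov)
print(round(corr[1][0], 7))
```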
This report analyzes the within-cell correlation matrix. It lets you diagnose multicollinearity problems as well as
determine the number of dimensions that are being used. This is useful in determining if Pillai’s trace should be
used.
R-Squared Other Y’s
This is the R-Squared obtained when this variable is regressed on the other response variables. When this value is larger than 0.99, severe
multicollinearity problems exist. If this happens, you should remove the variable with the largest R-Squared and
re-run your analysis.
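The chapter does not give a formula for this column. A common way to obtain it, assumed here, uses the inverse of the within-cell correlation matrix R: the R-Squared of response j on the other responses is 1 - 1/(R⁻¹)_jj. The Python sketch below applies that identity with a small Gauss-Jordan inverse; with only two responses it reduces to the squared within-cell correlation, so the 0.0572313 reported earlier corresponds to an R-Squared of about 0.0033, far below the 0.99 warning level.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fine for small matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row_idx: abs(aug[row_idx][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for row in range(n):
            if row != col:
                factor = aug[row][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [row[n:] for row in aug]

def r_squared_other_ys(corr):
    """R-squared of each response regressed on the others: 1 - 1 / (R^-1)_jj."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[j][j] for j in range(len(corr))]

r = 0.0572313  # within-cell correlation of WRATR and WRATA reported earlier
print(r_squared_other_ys([[1.0, r], [r, 1.0]]))  # each equals r**2 for two responses
```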
Canonical Variate
The identification numbers of the canonical variates that are generated during the analysis. The total number of
variates is the smaller of the number of variables and the number of degrees of freedom in the model.
Eigenvalue
The eigenvalues of the within correlation matrix. Note that this value is not associated with the variable at the
beginning of the row, but rather with the canonical variate number directly to the left.
Percent of Total
The eigenvalue expressed as a percentage of the sum of all the eigenvalues. Note that the sum of the eigenvalues will equal the number of
variates. If the percentage accounted for by the first eigenvalue is relatively large (70 or 80 percent), Pillai's trace
will be less powerful than the other three multivariate tests.
Cumulative Total
The cumulative total of the Percent of Total column.
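Taking the description above at face value, with two responses the within-cell correlation matrix is [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r. The sketch below handles only this closed-form symmetric 2×2 case, chosen for brevity, and builds the Eigenvalue, Percent of Total, and Cumulative Total columns from the correlation 0.0572313 reported earlier. The first eigenvalue accounts for about 52.9 percent, well short of the 70 to 80 percent level mentioned above.

```python
import math

def eig_sym_2x2(m):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    (a, b), (_, c) = m
    mid = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return [mid + rad, mid - rad]

def percent_table(eigenvalues):
    """Rows of (eigenvalue, percent of total, cumulative percent)."""
    total = sum(eigenvalues)
    rows, cum = [], 0.0
    for ev in eigenvalues:
        pct = 100.0 * ev / total
        cum += pct
        rows.append((ev, pct, cum))
    return rows

r = 0.0572313  # within-cell correlation reported earlier
table = percent_table(eig_sym_2x2([[1.0, r], [r, 1.0]]))
for ev, pct, cum in table:
    print(f"{ev:.6f} {pct:.2f} {cum:.2f}")
```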
This is the standard ANOVA report as documented in the General Linear Models chapter. A separate report is
displayed for each of the dependent variables.
Term                      Count   Mean       Standard Error
All                       18      89.11111
A: Treatment
  1                       9       99.88889   2.234687
  2                       9       78.33334   2.234687
B: Disability
  1                       6       95.83334   2.736922
  2                       6       88.83334   2.736922
  3                       6       82.66666   2.736922
AB: Treatment,Disability
  1,1                     3       106.6667   3.870592
  1,2                     3       100        3.870592
  1,3                     3       93         3.870592
  2,1                     3       85         3.870592
  2,2                     3       77.66666   3.870592
  2,3                     3       72.33334   3.870592
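Because this layout is balanced (three observations per cell), the standard error of each least-squares mean in the table above is sqrt(MSE / n), where n is the Count in that row. The sketch below assumes an MSE of 44.94444, the value implied by the reported standard errors (SE² × n gives about 44.944 for all three terms), and reproduces the three standard errors.

```python
import math

def ls_mean_se(mse, n):
    """Standard error of a least-squares mean in a balanced design: sqrt(MSE / n)."""
    return math.sqrt(mse / n)

MSE = 44.94444  # assumption: MSE implied by the standard errors in the table above
reported = {"Treatment": (9, 2.234687),
            "Disability": (6, 2.736922),
            "Treatment,Disability": (3, 3.870592)}
for term, (n, se) in reported.items():
    print(f"{term:22s} n={n}  computed={ls_mean_se(MSE, n):.6f}  reported={se}")
```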
This report provides the least-squares means and standard errors for each variable. Note that the standard errors
are calculated from the mean square error of the ANOVA table. They are not the standard errors that would be
calculated from the individual cells.
Plots Section