An Introduction to Multivariate Statistical Analysis
T. W. ANDERSON
Stanford University
Department of Statistics
Stanford, CA
WILEY-INTERSCIENCE
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2003 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any
form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise,
except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without
either the prior written permission of the Publisher, or authorization through payment of the
appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive,
Danvers, MA 01923, 978-750-8400, fax 978-750-4470, or on the web at www.copyright.com.
Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, e-mail: [email protected].
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best
efforts in preparing this book, they make no representations or warranties with respect to
the accuracy or completeness of the contents of this book and specifically disclaim any
implied warranties of merchantability or fitness for a particular purpose. No warranty may be
created or extended by sales representatives or written sales materials. The advice and
strategies contained herein may not be suitable for your situation. You should consult with
a professional where appropriate. Neither the publisher nor author shall be liable for any
loss of profit or any other commercial damages, including but not limited to special,
incidental, consequential, or other damages.
For general information on our other products and services please contact our Customer
Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993 or
fax 317-572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears
in print, however, may not be available in electronic format.
10 9 8 7 6 5 4 3 2 1
To
DOROTHY
Contents
1 Introduction 1
2 The Multivariate Normal Distribution 6
2.1. Introduction, 6
2.2. Notions of Multivariate Distributions, 7
2.3. The Multivariate Normal Distribution, 13
2.4. The Distribution of Linear Combinations of Normally
Distributed Variates; Independence of Variates;
Marginal Distributions, 23
2.5. Conditional Distributions and Multiple Correlation
Coefficient, 33
2.6. The Characteristic Function; Moments, 41
2.7. Elliptically Contoured Distributions, 47
Problems, 56
3 Estimation of the Mean Vector and the Covariance Matrix 66
3.1. Introduction, 66
4 The Distributions and Uses of Sample Correlation Coefficients 115
4.1. Introduction, 115
4.2. Correlation Coefficient of a Bivariate Sample, 116
4.3. Partial Correlation Coefficients; Conditional Distributions, 136
4.4. The Multiple Correlation Coefficient, 144
4.5. Elliptically Contoured Distributions, 158
Problems, 163
5 The Generalized T²-Statistic 170
5.1. Introduction, 170
5.2. Derivation of the Generalized T²-Statistic and Its Distribution, 171
5.3. Uses of the T²-Statistic, 177
5.4. The Distribution of T² under Alternative Hypotheses; The Power Function, 185
5.5. The Two-Sample Problem with Unequal Covariance Matrices, 187
5.6. Some Optimal Properties of the T²-Test, 190
5.7. Elliptically Contoured Distributions, 199
Problems, 201
References 687
Index 713
Preface to the Third Edition
For some forty years the first and second editions of this book have been
used by students to acquire a basic knowledge of the theory and methods of
multivariate statistical analysis. The book has also served a wider community
of statisticians in furthering their understanding and proficiency in this field.
Since the second edition was published, multivariate analysis has been
developed and extended in many directions. Rather than attempting to cover,
or even survey, the enlarged scope, I have elected to elucidate several aspects
that are particularly interesting and useful for methodology and comprehen-
sion.
Earlier editions included some methods that could be carried out on an
adding machine! In the twenty-first century, however, computational tech-
niques have become so highly developed and improvements come so rapidly
that it is impossible to include all of the relevant methods in a volume on the
general mathematical theory. Some aspects of statistics exploit computational
power such as the resampling technologies; these are not covered here.
The definition of multivariate statistics implies the treatment of variables
that are interrelated. Several chapters are devoted to measures of correlation
and tests of independence. A new chapter, "Patterns of Dependence; Graph-
ical Models" has been added. A so-called graphical model is a set of vertices
or nodes identifying observed variables together with a new set of edges
suggesting dependences between variables. The algebra of such graphs is an
outgrowth and development of path analysis and the study of causal chains.
A graph may represent a sequence in time or logic and may suggest causation
of one set of variables by another set.
Another new topic systematically presented in the third edition is that of
elliptically contoured distributions. The multivariate normal distribution,
which is characterized by the mean vector and covariance matrix, has a
limitation that the fourth-order moments of the variables are determined by
the first- and second-order moments. The class of elliptically contoured
T. W. ANDERSON
Stanford, California
February 2003
Preface to the Second Edition
Twenty-six years have passed since the first edition of this book was pub-
lished. During that tim~ great advances have been made in multivariate
statistical analysis-particularly in the areas treated in that volume. This new
edition purports to bring the original edition up to date by substantial
revision, rewriting, and additions. The basic approach has been maintained,
namely, a mathematically rigorous development of statistical methods for
observations consisting of several measurements or characteristics of each
subject and a study of their properties. The general outline of topics has been
retained.
The method of maximum likelihood has been augmented by other consid-
erations. In point estimation of the mean vector and covariance matrix
alternatives to the maximum likelihood estimators that are better with
respect to certain loss functions, such as Stein and Bayes estimators, have
been introduced. In testing hypotheses likelihood ratio tests have been
supplemented by other invariant procedures. New results on distributions
and asymptotic distributions are given; some significant points are tabulated.
Properties of these procedures, such as power functions, admissibility, unbi-
asedness, and monotonicity of power functions, are studied. Simultaneous
confidence intervals for means and covariances are developed. A chapter on
factor analysis replaces the chapter sketching miscellaneous results in the
first edition. Some new topics, including simultaneous equations models and
linear functional relationships, are introduced. Additional problems present
further results.
It is impossible to cover all relevant material in this book; what seems
most important has been included. For a comprehensive listing of papers
until 1966 and books until 1970 the reader is referred to A Bibliography of
Multivariate Statistical Analysis by Anderson, Das Gupta, and Styan (1972).
Further references can be found in Multivariate Analysis: A Selected and
T. W. ANDERSON
Stanford, California
June 1984
Preface to the First Edition
This book has been designed primarily as a text for a two-semester course in
multivariate statistics. It is hoped that the book will also serve as an
introduction to many topics in this area to statisticians who are not students
and will be used as a reference by other statisticians.
For several years the book in the form of dittoed notes has been used in a
two-semester sequence of graduate courses at Columbia University; the first
six chapters constituted the text for the first semester, emphasizing correla-
tion theory. It is assumed that the reader is familiar with the usual theory of
univariate statistics, particularly methods based on the univariate normal
distribution. A knowledge of matrix algebra is also a prerequisite; however,
an appendix on this topic has been included.
It is hoped that the more basic and important topics are treated here,
though to some extent the coverage is a matter of taste. Some of the more
recent and advanced developments are only briefly touched on in the later
chapters.
The method of maximum likelihood is used to a large extent. This leads to
reasonable procedures; in some cases it can be proved that they are optimal.
In many situations, however, the theory of desirable or optimum procedures
is lacking.
Over the years this manuscript has been developed, a number of students
and colleagues have been of considerable assistance. Allan Birnbaum, Harold
Hotelling, Jacob Horowitz, Howard Levene, Ingram Olkin, Gobind Seth,
Charles Stein, and Henry Teicher are to be mentioned particularly. Acknowl-
edgements are also due to other members of the Graduate Mathematical
T. W. ANDERSON
CHAPTER 1

Introduction

†When data are listed on paper by individual, it is natural to print the measurements on one
individual as a row of the table; then one individual corresponds to a row vector. Since we prefer
to operate algebraically with column vectors, we have chosen to treat observations in terms of
column vectors. (In practice, the basic data set may well be on cards, tapes, or disks.)
The statistical methods treated in this book can be developed and evaluated
in the context of the multivariate normal distribution, though many of the
procedures are useful and effective when the distribution sampled is not
normal. A major reason for basing statistical analysis on the normal distribu-
tion is that this probabilistic model approximates well the distribution of
continuous measurements in many sampled populations. In fact, most of the
methods and theory have been developed to serve statistical analysis of data.
Mathematicians such as Adrian (1808), Laplace (1811), Plana (1813), Gauss
(1823), and Bravais (1846) studied the bivariate normal density. Francis
Galton, the geneticist, introduced the ideas of correlation, regression, and
homoscedasticity in the study of pairs of measurements, one made on a
parent and one on an offspring. [See, e.g., Galton (1889).] He enunciated the
theory of the multivariate normal distribution as a generalization of observed
properties of samples.
Karl Pearson and others carried on the development of the theory and use
of different kinds of correlation coefficients† for studying problems in genet-
ics, biology, and other fields. R. A. Fisher further developed methods for
agriculture, botany, and anthropology, including the discriminant function for
classification problems. In another direction, analysis of scores on mental
tests led to a theory, including factor analysis, the sampling theory of which is
based on the normal distribution. In these cases, as well as in agricultural
experiments, in engineering problems, in certain economic problems, and in
other fields, the multivariate normal distributions have been found to be
sufficiently close approximations to the populations so that statistical analy-
ses based on these models are justified.
The univariate normal distribution arises frequently because the effect
studied is the sum of many independent random effects. Similarly, the
multivariate normal distribution often occurs because the multiple measure-
ments are sums of small independent effects. Just as the central limit
theorem leads to the univariate normal distribution for single variables, so
does the general central limit theorem for several variables lead to the
multivariate normal distribution.
Statistical theory based on the normal distribution has the advantage that
the multivariate methods based on it are extensively developed and can be
studied in an organized and systematic way. This is due not only to the need
for such methods because they are of practical use, but also to the fact that
normal theory is amenable to exact mathematical treatment. The suitable
methods of analysis are mainly based on standard operations of matrix
algebra; the distributions of many statistics involved can be obtained exactly
or at least characterized; and in many cases optimum properties of proce-
dures can be deduced.
The point of view in this book is to state problems of inference in terms of
the multivariate normal distributions, develop efficient and often optimum
methods in this context, and evaluate significance and confidence levels in
these terms. This approach gives coherence and rigor to the exposition, but,
by its very nature, cannot exhaust consideration of multivariate statistical
analysis. The procedures are appropriate to many nonnormal distributions,
†For a detailed study of the development of the ideas of correlation, see Walker (1931).
but their adequacy may be open to question. Roughly speaking, inferences
about means are robust because of the operation of the central limit
theorem, but inferences about covariances are sensitive to normality, the
variability of sample covariances depending on fourth-order moments.
This inflexibility of normal methods with respect to moments of order
greater than two can be reduced by including a larger class of elliptically
contoured distributions. In the univariate case the normal distribution is
determined by the mean and variance; higher-order moments and properties
such as peakedness and long tails are functions of the mean and variance.
Similarly, in the multivariate case the means and covariances or the means,
variances, and correlations determine all of the properties of the distribution.
That limitation is alleviated in one respect by consideration of a broad class
of elliptically contoured distributions. That class maintains the dependence
structure, but permits more general peakedness and long tails. This study
leads to more robust methods.
The development of computer technology has revolutionized multivariate
statistics in several respects. As in univariate statistics, modern computers
permit the evaluation of observed variability and significance of results by
resampling methods, such as the bootstrap and cross-validation. Such
methodology reduces the reliance on tables of significance points as well as
eliminates some restrictions of the normal distribution.
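A minimal sketch of the bootstrap idea mentioned above, in Python; the sample, the number of resamples, and the statistic (the sample mean) are illustrative assumptions, not from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)   # illustrative sample

# Bootstrap the sampling variability of the mean: resample with replacement
# B times and look at the spread of the resampled means.
B = 2000
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(B)])
print("estimate:", x.mean())
print("bootstrap standard error:", boot_means.std(ddof=1))
print("95% percentile interval:", np.percentile(boot_means, [2.5, 97.5]))
```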
Nonparametric techniques are available when nothing is known about the
underlying distributions. Space does not permit inclusion of these topics as
well as other considerations of data analysis, such as treatment of outliers
and transformations of variables to approximate normality and homoscedas-
ticity.
The availability of modem computer facilities makes possible the analysis
of large data sets and that ability permits the application of multivariate
methods to new areas, such as image analysis, and more effective analysis of
data, such as meteorological data. Moreover, new problems of statistical analysis
arise, such as sparseness of parameter or data matrices. Because hardware
and software development is so explosive and programs require specialized
knowledge, we are content to make a few remarks here and there about
computation. Packages of statistical programs are available for most of the
methods.
CHAPTER 2
The Multivariate
Normal Distribution
2.1. INTRODUCTION
defined for every pair of real numbers (x, y). We are interested in cases
where F(x, y) is absolutely continuous; this means that the following partial
derivative exists almost everywhere:
(2) \frac{\partial^2 F(x, y)}{\partial x \, \partial y} = f(x, y),
and
(3) F(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f(u, v) \, du \, dv.
The nonnegative function f(x, y) is called the density of X and Y. The pair
of random variables (X, Y) defines a random point in a plane. The probabil-
ity that (X, Y) falls in a rectangle is
(4) \Pr\{x \le X \le x + \Delta x,\; y \le Y \le y + \Delta y\} = \int_y^{y+\Delta y} \int_x^{x+\Delta x} f(u, v) \, du \, dv
(\Delta x > 0, \Delta y > 0). The probability of the random point (X, Y) falling in any
set E for which the following integral is defined (that is, any measurable set
E) is
(5) \Pr\{(X, Y) \in E\} = \int\!\!\int_E f(u, v) \, du \, dv.
tIn Chapter 2 we shall distinguish between random variables and running variables by use of
capital and lowercase letters, respectively. In later chapters we may be unable to hold to this
convention because of other complications of notation.
This follows from the definition of the integral [as the limit of sums of the
sort (4)]. If f(x, y) is continuous in both variables, the probability element
f(x, y) \Delta y \Delta x is approximately the probability that X falls between x and
x + \Delta x and Y falls between y and y + \Delta y, since
(6) \Pr\{x \le X \le x + \Delta x,\; y \le Y \le y + \Delta y\} = \int_y^{y+\Delta y} \int_x^{x+\Delta x} f(u, v) \, du \, dv = f(x_0, y_0) \Delta x \Delta y
for some x_0, y_0 (x \le x_0 \le x + \Delta x, y \le y_0 \le y + \Delta y) by the mean value theo-
rem of calculus. Since f(u, v) is continuous, (6) is approximately f(x, y) \Delta x \Delta y.
In fact,
(7) \lim_{\Delta x, \Delta y \to 0} \frac{1}{\Delta x \Delta y}\left|\Pr\{x \le X \le x + \Delta x,\; y \le Y \le y + \Delta y\} - f(x, y) \Delta x \Delta y\right| = 0.
Now we consider the case of p random variables X_1, X_2, \dots, X_p. The
cdf is
(8) F(x_1, \dots, x_p) = \Pr\{X_1 \le x_1, \dots, X_p \le x_p\},
defined for every set of real numbers x_1, \dots, x_p. The density function, if
F(x_1, \dots, x_p) is absolutely continuous, is
(9) f(x_1, \dots, x_p) = \frac{\partial^p F(x_1, \dots, x_p)}{\partial x_1 \cdots \partial x_p}.
The probability of falling in any (measurable) set R in the p-dimensional
Euclidean space is
(11) \Pr\{(X_1, \dots, X_p) \in R\} = \int \cdots \int_R f(x_1, \dots, x_p) \, dx_1 \cdots dx_p.
The probability element f(x_1, \dots, x_p) \Delta x_1 \cdots \Delta x_p is approximately the
probability \Pr\{x_1 \le X_1 \le x_1 + \Delta x_1, \dots, x_p \le X_p \le x_p + \Delta x_p\} if f(x_1, \dots, x_p) is
We call
(15) f(u) = \int_{-\infty}^{\infty} f(u, v) \, dv
the marginal density of X.
In a similar fashion we define G(y), the marginal cdf of Y, and g(y), the
marginal density of Y.
Now we turn to the general case. Given F(x_1, \dots, x_p) as the cdf of
X_1, \dots, X_p, we wish to find the marginal cdf of some of X_1, \dots, X_p, say, of
X_1, \dots, X_r (r < p). It is
F(x_1, \dots, x_r, \infty, \dots, \infty).
The marginal distribution and density of any other subset of Xl"'" Xp are
obtained in the obviously similar fashion.
The joint moments of a subset of variates can be computed from the
marginal distribution; for example,
(21) f(x, y) = \frac{\partial^2 F(x, y)}{\partial x \, \partial y} = \frac{\partial^2 F(x)G(y)}{\partial x \, \partial y} = \frac{dF(x)}{dx}\,\frac{dG(y)}{dy} = f(x)g(y).
Conversely, if f(x, y) = f(x)g(y), then
(24)
where F_i(x_i) is the marginal cdf of X_i, i = 1, \dots, p. The set X_1, \dots, X_r is said
to be independent of the set X_{r+1}, \dots, X_p if
(25) F(x_1, \dots, x_p) = F(x_1, \dots, x_r, \infty, \dots, \infty) \cdot F(\infty, \dots, \infty, x_{r+1}, \dots, x_p).
(30)
It will be noticed that for fixed y and \Delta y (> 0), the integrand of (30) behaves
as a univariate density function. Now for y such that g(y) > 0, we define
\Pr\{x_1 \le X \le x_2 \mid Y = y\}, the probability that X lies between x_1 and x_2, given
that Y is y, as the limit of (30) as \Delta y \to 0. Thus
(31) \Pr\{x_1 \le X \le x_2 \mid Y = y\} = \int_{x_1}^{x_2} f(u \mid y) \, du,
where f(u \mid y) = f(u, y)/g(y). For given y, f(u \mid y) is a density function and is
called the conditional density of X given y. We note that if X and Y are
independent, f(x \mid y) = f(x).
In the general case of X_1, \dots, X_p with cdf F(x_1, \dots, x_p), the conditional
density of X_1, \dots, X_r, given X_{r+1} = x_{r+1}, \dots, X_p = x_p, is
(32) \frac{f(x_1, \dots, x_p)}{f(x_{r+1}, \dots, x_p)}.
tMore precisely. we assume this is true for the part of the x-space for which f(x 1, ... , x p) is
positive.
We assume the derivatives exist, and "mod" means modulus or absolute value
of the expression following it. The probability that (Xl"'" Xp) falls in a
region R is given by (11); the probability that (Y1, ••• , Yp) falls in a region S is
(2) x = (x_1, x_2, \dots, x_p)',
(3) b = (b_1, b_2, \dots, b_p)',
and the positive definite matrix
(4) A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}.
The square \alpha(x - \beta)^2 = (x - \beta)\alpha(x - \beta) is replaced by the quadratic form
(5) (x - b)'A(x - b) = \sum_{i,j=1}^{p} a_{ij}(x_i - b_i)(x_j - b_j).
(6) f(x_1, \dots, x_p) = K e^{-\frac{1}{2}(x - b)'A(x - b)},
where K (> 0) is chosen so that the integral over the entire p-dimensional
Euclidean space of x_1, \dots, x_p is unity.
Written in matrix notation, the similarity of the multivariate normal
density (6) to the univariate density (1) is clear. Throughout this book we
shall use matrix notation and operations. Th; reader is referred to the
Appendix for a review of matrix theory and for definitions of our notation for
matrix operations.
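The following Python sketch evaluates a density of the form (6); the normalizing constant K = |A|^{1/2}/(2\pi)^{p/2} anticipates the value of K derived in the next few pages, and the example matrix A is an illustrative assumption.

```python
import numpy as np

def normal_density(x, b, A):
    """Density (6): K * exp(-0.5 * (x - b)' A (x - b)),
    with K = |A|**0.5 / (2*pi)**(p/2)."""
    x = np.asarray(x, dtype=float)
    b = np.asarray(b, dtype=float)
    A = np.asarray(A, dtype=float)
    p = b.size
    K = np.sqrt(np.linalg.det(A)) / (2 * np.pi) ** (p / 2)
    d = x - b
    return K * np.exp(-0.5 * d @ A @ d)

# Example: p = 2, A the inverse of a covariance matrix with unit variances
# and correlation 0.5.
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
A = np.linalg.inv(Sigma)
print(normal_density([0.2, -0.1], b=np.zeros(2), A=A))
```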
We observe that f(x_1, \dots, x_p) is nonnegative. Since A is positive definite,
(7) (x - b)'A(x - b) \ge 0,
and therefore the density is bounded.
Now let us determine K so that the integral of (6) over the p-dimensional
space is one. We shall evaluate
(9) K^* = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - b)'A(x - b)} \, dx_p \cdots dx_1.
We use the fact (see Corollary A.1.6 in the Appendix) that if A is positive
definite, there exists a nonsingular matrix C such that
(10) C'AC=I,
( 11) x - b = Cy,
where
(12)
Then
(14) J = \operatorname{mod}|C|,
where \operatorname{mod}|C| indicates the absolute value of the determinant of C. Thus (9)
becomes
We have
(16) K^* = \operatorname{mod}|C| \prod_{i=1}^{p} \int_{-\infty}^{\infty} e^{-\frac{1}{2}y_i^2} \, dy_i = \operatorname{mod}|C| \prod_{i=1}^{p} \sqrt{2\pi} = \operatorname{mod}|C|\,(2\pi)^{p/2}.
by virtue of
(22)
(23)
We shall now show the significance of b and A by finding the first and
second moments of X_1, \dots, X_p. It will be convenient to consider these
random variables as constituting a random vector
(24) X = (X_1, \dots, X_p)'.
If the random variables Z_{11}, \dots, Z_{mn} can take on only a finite number of
values, the random matrix Z can be one of a finite number of matrices, say
Z(1), \dots, Z(q). If the probability of Z = Z(i) is p_i, then we should like to
define \mathscr{E}Z as \sum_{i=1}^{q} Z(i) p_i. Then \mathscr{E}Z = (\mathscr{E}Z_{gh}). If the random variables
Z_{11}, \dots, Z_{mn} have a joint density, then by operating with Riemann sums we
can define \mathscr{E}Z as the limit (if the limit exists) of approximating sums of the
kind occurring in the discrete case; then again \mathscr{E}Z = (\mathscr{E}Z_{gh}). Therefore, in
general we shall use the following definition:
(27) \mathscr{E}X = (\mathscr{E}X_1, \dots, \mathscr{E}X_p)'
is the mean or mean vector of X. We shall usually denote this mean vector by
\mu. If Z is (X - \mu)(X - \mu)', the expected value is
The operation of taking the expected value of a random matrix (or vector)
satisfies certain rules which we can summarize in the following lemma:
Proof. The element in the ith row and jth column of \mathscr{E}(DZE + F) is
\mathscr{E}\left(\sum_{g,h} d_{ig} Z_{gh} e_{hj} + f_{ij}\right) = \sum_{g,h} d_{ig}(\mathscr{E}Z_{gh}) e_{hj} + f_{ij},
which is the element in the ith row and jth column of D(\mathscr{E}Z)E + F.
•
Lemma 2.3.2. If Y = DX + f, where X is a random vector, then
(32) \mathscr{E}Y = D\,\mathscr{E}X + f,
(33) \mathscr{C}(Y) = D\,\mathscr{C}(X)\,D'.
Proof. The first assertion follows directly from Lemma 2.3.1, and the
second from
(35)
(36) \mathscr{E}Y_i = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y_i\, e^{-\frac{1}{2} y_i^2} \, dy_i = 0.
The last equality follows because y_i e^{-\frac{1}{2}y_i^2} is an odd function of y_i. Thus
\mathscr{E}Y = 0. Therefore, the mean of X, denoted by \mu, is
(37) \mu = \mathscr{E}X = b.
From (33) we see that \mathscr{C}(X) = C(\mathscr{E}YY')C'. The i,jth element of \mathscr{E}YY' is
(38) \mathscr{E}Y_i Y_j = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} y_i y_j \prod_{h=1}^{p} \left\{\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y_h^2}\right\} dy_1 \cdots dy_p.
For i = j, (38) becomes
(39) \mathscr{E}Y_i^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y_i^2\, e^{-\frac{1}{2} y_i^2} \, dy_i = 1.
The last equality follows because the next to last expression is the expected
value of the square of a variable normally distributed with mean 0 and
variance 1. If i \ne j, (38) becomes
(40) \mathscr{E}Y_i Y_j = 0, \qquad i \ne j,
since the first integration gives 0. We can summarize (39) and (40) as
(41) \mathscr{E}YY' = I.
Thus
tAlternatively, the last equality follows because the next to last expression is the expected value
of a normally distributed variable with mean O.
gives us
(43) \mathscr{C}(X) = \mathscr{E}(X - \mu)(X - \mu)' = CC' = \Sigma,
say, and
(44) A = \Sigma^{-1}.
From (43) we see that \Sigma is positive definite. Let us summarize these results.
(45) f(x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)}
such that the expected value of the vector with this density is \mu and the covariance
matrix is \Sigma.
We shall denote the density (45) as n(x \mid \mu, \Sigma) and the distribution law as
N(\mu, \Sigma).
The ith diagonal element of the covariance matrix, \sigma_{ii}, is the variance of
the ith component of X; we may sometimes denote this by \sigma_i^2. The
correlation coefficient between X_i and X_j is defined as
(46) \rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{jj}}}.
(47)
(48) \sigma_i^2 \sigma_j^2 (1 - \rho_{ij}^2)
is positive. Therefore, -1 < \rho_{ij} < 1. (For singular distributions, see Section
2.4.) The multivariate normal density can be parametrized by the means \mu_i,
i = 1, \dots, p, the variances \sigma_i^2, i = 1, \dots, p, and the correlations \rho_{ij}, i < j,
i, j = 1, \dots, p.
In the bivariate case (p = 2) the mean vector and covariance matrix are
(49) \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix},
(50) \Sigma = \mathscr{E}\begin{pmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_1\sigma_2\rho \\ \sigma_1\sigma_2\rho & \sigma_2^2 \end{pmatrix},
(51) \Sigma^{-1} = \frac{1}{1 - \rho^2}\begin{pmatrix} \dfrac{1}{\sigma_1^2} & \dfrac{-\rho}{\sigma_1\sigma_2} \\[1ex] \dfrac{-\rho}{\sigma_1\sigma_2} & \dfrac{1}{\sigma_2^2} \end{pmatrix}.
The density (45) for p = 2 is then
(52) f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\frac{1}{2(1 - \rho^2)}\left[\frac{(x_1 - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2}\right]\right\}.
Proof. The variance of X_i^* is b_i^2\sigma_i^2, i = 1, 2, and the covariance of X_1^* and
X_2^* is b_1 b_2 \sigma_1\sigma_2\rho by Lemma 2.3.2. Insertion of these values into the
definition of the correlation between X_1^* and X_2^* shows that it is \rho. If
f(\mu_1, \mu_2, \sigma_1, \sigma_2, \rho) is invariant with respect to such transformations, it must
be f(0, 0, 1, 1, \rho) by choice of b_i = 1/\sigma_i and c_i = -\mu_i/\sigma_i, i = 1, 2. •
(53)
The smaller (53) is (that is, the larger \rho is), the more similar Y_1 and Y_2 are. If
\rho > 0, X_1 and X_2 tend to be positively related, and if \rho < 0, they tend to be
negatively related. If \rho = 0, the density (52) is the product of the marginal
densities of X_1 and X_2; hence X_1 and X_2 are independent.
It will be noticed that the density function (45) is constant on ellipsoids
(54) (x - \mu)'\Sigma^{-1}(x - \mu) = c
for every positive value of c. In the bivariate case, in terms of y_i = (x_i - \mu_i)/\sigma_i, (54) is
(55) \frac{1}{1 - \rho^2}\left(y_1^2 - 2\rho y_1 y_2 + y_2^2\right) = c.
The intercepts on the y_1-axis and y_2-axis are equal. If \rho > 0, the major axis of
the ellipse is along the 45° line with a length of 2\sqrt{c(1 + \rho)}, and the minor
axis has a length of 2\sqrt{c(1 - \rho)}. If \rho < 0, the major axis is along the 135° line
with a length of 2\sqrt{c(1 - \rho)}, and the minor axis has a length of 2\sqrt{c(1 + \rho)}.
The value of \rho determines the ratio of these lengths. In this bivariate case we
can think of the density function as a surface above the plane. The contours
of equal density are contours of equal altitude on a topographical map; they
indicate the shape of the hill (or probability surface). If \rho > 0, the hill will
tend to run along a line with a positive slope; most of the hill will be in the
first and third quadrants. When we transform back to x_i = \sigma_i y_i + \mu_i, we
expand each contour by a factor of \sigma_i in the direction of the ith axis and
shift the center to (\mu_1, \mu_2).
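A short Python check of the stated axis lengths, assuming an illustrative \rho and c; the eigenvalues of the matrix of the quadratic form in (55) give the same lengths 2\sqrt{c(1+\rho)} and 2\sqrt{c(1-\rho)}.

```python
import numpy as np

def contour_axis_lengths(rho, c):
    """Axis lengths of the ellipse (y1^2 - 2*rho*y1*y2 + y2^2)/(1 - rho^2) = c.

    The quadratic form has matrix Q = [[1, -rho], [-rho, 1]] / (1 - rho^2);
    the full axis along an eigenvector with eigenvalue lam is 2*sqrt(c/lam).
    """
    Q = np.array([[1.0, -rho], [-rho, 1.0]]) / (1.0 - rho ** 2)
    eigvals = np.linalg.eigvalsh(Q)
    return 2 * np.sqrt(c / eigvals)

rho, c = 0.6, 1.0
print(sorted(contour_axis_lengths(rho, c)))                               # numerical
print(sorted([2 * np.sqrt(c * (1 - rho)), 2 * np.sqrt(c * (1 + rho))]))   # text formulas
```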
The numerical values of the cdf of the univariate normal variable are
obtained from tables found in most statistical texts. The numerical values of
(56)
where the so-called tetrachoric functions \tau_j(y) are tabulated in Pearson (1930)
up to \tau_{19}(y). Harris and Soms (1980) have studied generalizations of (57).
Proof. The density of Y is obtained from the density of X, n(x \mid \mu, \Sigma), by
replacing x by
(2) x = C^{-1}y,
(4)
The transformation (2) carries Q = (x - \mu)'\Sigma^{-1}(x - \mu) into
(5) Q = [C^{-1}(y - C\mu)]'\Sigma^{-1}[C^{-1}(y - C\mu)] = (y - C\mu)'(C\Sigma C')^{-1}(y - C\mu),
since (C^{-1})' = (C')^{-1} by virtue of transposition of CC^{-1} = I. Thus the
density of Y is
(6) g(y) = n(y \mid C\mu, C\Sigma C'). •
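A Monte Carlo sketch of this result in Python: samples of X \sim N(\mu, \Sigma) are transformed by a nonsingular C, and the empirical mean and covariance of Y = CX are compared with C\mu and C\Sigma C'. The particular \mu, \Sigma, and C below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.1],
                  [0.3, 1.0, -0.2],
                  [0.1, -0.2, 0.5]])
C = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0],
              [1.0, 0.0, 3.0]])        # nonsingular

X = rng.multivariate_normal(mu, Sigma, size=200_000)   # rows are observations
Y = X @ C.T                                            # each row is C x_alpha

print("mean of Y ~", Y.mean(axis=0))
print("C mu       ", C @ mu)
print("cov of Y  ~\n", np.cov(Y, rowvar=False))
print("C Sigma C' \n", C @ Sigma @ C.T)
```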
(7)
(8) X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}.
Now let us assume that the p variates have a joint normal distribution with
mean vectors
(9)
We say that the random vector X has been partitioned in (8) into subvectors, and
that
(13) \mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix},
(14) \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
have been partitioned similarly into subvectors and submatrices. Here \Sigma_{21} = \Sigma_{12}'. (See Ap-
pendix, Section A.3.)
We shall show that X^{(1)} and X^{(2)} are independently normally distributed
if \Sigma_{12} = \Sigma_{21}' = 0. Then
(15) \Sigma = \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{pmatrix}.
Its inverse is
(16) \Sigma^{-1} = \begin{pmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix},
say, where
(19)
Thus the marginal distribution of X^{(1)} is N(\mu^{(1)}, \Sigma_{11}); similarly the marginal
distribution of X^{(2)} is N(\mu^{(2)}, \Sigma_{22}). Thus the joint density of X_1, \dots, X_p is the
product of the marginal density of X_1, \dots, X_q and the marginal density of
X_{q+1}, \dots, X_p, and therefore the two sets of variates are independent. Since
the numbering of variates can always be done so that X^{(1)} consists of any
subset of the variates, we have proved the sufficiency in the following
theorem:
The necessity follows from the fact that if X_i is from one set and X_j from
the other, then for any density (see Section 2.2.3)
\sigma_{ij} = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} (x_i - \mu_i)(x_j - \mu_j) f(x_1, \dots, x_q)
\cdot f(x_{q+1}, \dots, x_p) \, dx_1 \cdots dx_p = 0.
Since \sigma_{ij} = \sigma_i\sigma_j\rho_{ij} and \sigma_i, \sigma_j \ne 0 (we tacitly assume that \Sigma is nonsingular),
the condition \sigma_{ij} = 0 is equivalent to \rho_{ij} = 0. Thus if one set of variates is
uncorrelated with the remaining variates, the two sets are independent. It
should be emphasized that the implication of independence by lack of
correlation depends on the assumption of normality, but the converse is
always true.
Let us consider the special case of the bivariate normal distribution. Then
X^{(1)} = X_1, X^{(2)} = X_2, \mu^{(1)} = \mu_1, \mu^{(2)} = \mu_2, \Sigma_{11} = \sigma_{11} = \sigma_1^2, \Sigma_{22} = \sigma_{22} = \sigma_2^2,
and \Sigma_{12} = \sigma_{12} = \sigma_1\sigma_2\rho_{12}.
Now let us show that the corollary holds even if the two sets are not
independent. We partition X, \mu, and \Sigma as before. We shall make a
nonsingular linear transformation to subvectors
(26) Y^{(1)} = X^{(1)} + BX^{(2)}, \qquad Y^{(2)} = X^{(2)},
choosing B so that Y^{(1)} and Y^{(2)} are uncorrelated, that is, so that
0 = \Sigma_{12} + B\Sigma_{22},
or B = -\Sigma_{12}\Sigma_{22}^{-1}. The vector of means of Y is
(27) \mathscr{E}Y = \begin{pmatrix} \mu^{(1)} - \Sigma_{12}\Sigma_{22}^{-1}\mu^{(2)} \\ \mu^{(2)} \end{pmatrix} = \nu,
say, and
(28) \mathscr{C}(Y) = \mathscr{E}(Y - \nu)(Y - \nu)',
where the covariance matrix of Y^{(1)} is
\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.
Thus Y^{(1)} and Y^{(2)} are independent, and by Corollary 2.4.1 X^{(2)} = Y^{(2)} has
the marginal distribution N(\mu^{(2)}, \Sigma_{22}). Because the numbering of the compo-
nents of X is arbitrary, we can state the following theorem:
(30) Z = DX,
where Z has q components and D is a q \times p real matrix. The expected value
of Z is
(31) \mathscr{E}Z = D\mu,
and the covariance matrix is
(32) \mathscr{C}(Z) = D\Sigma D'.
The case q = p and D nonsingular has been treated above. If q \le p and D is
of rank q, we can find a (p - q) \times p matrix E such that
(33) \begin{pmatrix} Z \\ W \end{pmatrix} = \begin{pmatrix} D \\ E \end{pmatrix} X
is a nonsingular transformation. (See Appendix, Section A.3.) Then Z and W
have a joint normal distribution, and Z has a marginal normal distribution by
Theorem 2.4.3. Thus for D of rank q (and X having a nonsingular distribu-
tion, that is, a density) we have proved the following theorem:
(34) X = AY + \lambda,
say. It should be noticed that if p > q, then \Sigma is singular and therefore has
no inverse, and thus we cannot write the normal density for X. In fact, X
cannot have a density at all, because the fact that the probability of any set
not intersecting the q-set is 0 would imply that the density is 0 almost
everywhere.
Now, conversely, let us see that if X has mean \mu and covariance matrix \Sigma
of rank r, it can be written as (34) (except for 0 probabilities), where X has
an arbitrary distribution, and Y of r (\le p) components has a suitable
distribution. If \Sigma is of rank r, there is a p \times p nonsingular matrix B such
that
where the identity is of order r. (See Theorem A.4.1 of the Appendix.) The
transformation
(37) BX = V = \begin{pmatrix} V^{(1)} \\ V^{(2)} \end{pmatrix}
defines a random vector V with covariance matrix (36) and a mean vector
(38) \mathscr{E}V = B\mu = \begin{pmatrix} \nu^{(1)} \\ \nu^{(2)} \end{pmatrix},
say. Since the variances of the elements of V^{(2)} are zero, V^{(2)} = \nu^{(2)} with
probability 1. Now partition
(39) B^{-1} = (C \quad D),
where C consists of r columns. Then
(40) X = B^{-1}V = CV^{(1)} + DV^{(2)},
and with probability 1
(41) X = CV^{(1)} + D\nu^{(2)},
(42)
(43) Z = DAY + D\lambda,
(45)
such that
F EF'
(46) FEF' = ( I I
F2EF;
(48)
is constant on ellipsoids
(49)
The marginal distribution of X^{(1)} is the projection of the mass of the
distribution of X onto the q-dimensional space of the first q coordinate axes.
The surfaces of constant density are again ellipsoids. The projection of mass
on any line is normal.
(1) X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}
The density of X^{(1)} and X^{(2)} then can be obtained from this expression by
substituting x^{(1)} - \Sigma_{12}\Sigma_{22}^{-1}x^{(2)} for y^{(1)} and x^{(2)} for y^{(2)} (the Jacobian of this
transformation being 1); the resulting density of X^{(1)} and X^{(2)} is
(2)
where
(3)
This density must be n(x \mid \mu, \Sigma). The conditional density of X^{(1)} given that
X^{(2)} = x^{(2)} is the quotient of (2) and the marginal density of X^{(2)} at the point
x^{(2)}, which is n(x^{(2)} \mid \mu^{(2)}, \Sigma_{22}), the second factor of (2). The quotient is
(4) f(x^{(1)} \mid x^{(2)}) = n\!\left[x^{(1)} \mid \mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(x^{(2)} - \mu^{(2)}),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right].
It should be noted that the mean of X^{(1)} given x^{(2)} is simply a linear function
of x^{(2)}, and the covariance matrix of X^{(1)} given x^{(2)} does not depend on x^{(2)}
at all.
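A small Python sketch of this computation, returning the conditional mean \mu^{(1)} + \Sigma_{12}\Sigma_{22}^{-1}(x^{(2)} - \mu^{(2)}) and the conditional covariance \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}; the numerical values are illustrative assumptions.

```python
import numpy as np

def conditional_normal(mu, Sigma, q, x2):
    """Mean and covariance of X^(1) | X^(2) = x2 for X ~ N(mu, Sigma),
    where X^(1) holds the first q components."""
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    B = S12 @ np.linalg.inv(S22)          # regression coefficient matrix beta
    cond_mean = mu1 + B @ (x2 - mu2)
    cond_cov = S11 - B @ S21
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
m, V = conditional_normal(mu, Sigma, q=1, x2=np.array([2.0, 0.0]))
print("conditional mean:", m, "\nconditional covariance:\n", V)
```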
The element in the ith row and (k - q)th column of \beta = \Sigma_{12}\Sigma_{22}^{-1} is often
denoted by
(7) \beta_{ik \cdot q+1, \dots, k-1, k+1, \dots, p}, \qquad i = 1, \dots, q, \quad k = q+1, \dots, p.
Definition 2.5.2. The quantity
\rho_{ij \cdot q+1, \dots, p} = \frac{\sigma_{ij \cdot q+1, \dots, p}}{\sqrt{\sigma_{ii \cdot q+1, \dots, p}}\sqrt{\sigma_{jj \cdot q+1, \dots, p}}}
is the partial correlation between X_i and X_j holding X_{q+1}, \dots, X_p fixed.
The density of X_1 given x_2 is n[x_1 \mid \mu_1 + (\sigma_1\rho/\sigma_2)(x_2 - \mu_2),\; \sigma_1^2(1 - \rho^2)].
The mean of this conditional distribution increases with x_2 when \rho is
positive and decreases with increasing x_2 when \rho is negative. It may be
noted that when \sigma_1 = \sigma_2, for example, the mean of the conditional distribu-
tion of X_1 does not increase relative to \mu_1 as much as x_2 increases relative to
\mu_2. [Galton (1889) observed that the average heights of sons whose fathers'
heights were above average tended to be less than the fathers' heights; he
called this effect "regression towards mediocrity."] The larger |\rho| is, the
smaller the variance of the conditional distribution, that is, the more infor-
mation x_2 gives about x_1. This is another reason for considering \rho a
measure of association between X_1 and X_2.
A geometrical interpretation of the theory is enlightening. The density
f(x_1, x_2) can be thought of as a surface z = f(x_1, x_2) over the x_1, x_2-plane. If
we intersect this surface with the plane x_2 = c, we obtain a curve z = f(x_1, c)
over the line x_2 = c in the x_1, x_2-plane. The ordinate of this curve is
Theorem 2.5.2. The components of X^{(1 \cdot 2)} are uncorrelated with the compo-
nents of X^{(2)}.
Let \sigma_{(i)} be the ith row of \Sigma_{12}, and \beta_{(i)} the ith row of \beta (i.e., \beta_{(i)} =
\sigma_{(i)}\Sigma_{22}^{-1}). Let \mathscr{V}(Z) be the variance of Z.
This leads to
(14)
J (J""l r(p(l)x(2») ..; Oii r( a X(2»
I
•
Definition 2.5.4. The maximum correlation between X_i and the linear com-
bination \alpha'X^{(2)} is called the multiple correlation coefficient between X_i and X^{(2)}.
(15)
A useful formula is
(16) 1
(17)
Since
(18) \sigma_{ii \cdot q+1, \dots, p} = \sigma_{ii} - \sigma_{(i)}\Sigma_{22}^{-1}\sigma_{(i)}',
it follows that
(19)
(20)
this follows from (8) when p = 3 and q = 2. We shall now find a generaliza-
tion of this result. The derivation is tedious, but is given here for complete-
ness.
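Since the displayed formulas above were partly lost, the following Python sketch simply computes the multiple correlation coefficient from a covariance (or correlation) matrix as R = [\sigma_{(i)}\Sigma_{22}^{-1}\sigma_{(i)}'/\sigma_{ii}]^{1/2}; the matrix used below is the one appearing in Problem 2.29.

```python
import numpy as np

def multiple_correlation(Sigma, i, rest):
    """Multiple correlation between X_i and the set X^(2) indexed by `rest`:
    R = sqrt( sigma_(i) Sigma22^{-1} sigma_(i)' / sigma_ii )."""
    sigma_i = Sigma[i, rest]                 # row sigma_(i)
    S22 = Sigma[np.ix_(rest, rest)]
    num = sigma_i @ np.linalg.solve(S22, sigma_i)
    return np.sqrt(num / Sigma[i, i])

Sigma = np.array([[ 1.00,  0.80, -0.40],
                  [ 0.80,  1.00, -0.56],
                  [-0.40, -0.56,  1.00]])    # the matrix of Problem 2.29
print(multiple_correlation(Sigma, i=0, rest=[1, 2]))
```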
Let
(21) X = \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ X^{(3)} \end{pmatrix},
f(x^{(1)} \mid x^{(2)}, x^{(3)}) = \frac{f(x^{(1)}, x^{(2)} \mid x^{(3)})}{f(x^{(2)} \mid x^{(3)})}.
In the case of normality the conditional covariance matrix of X^{(1)} and X^{(2)}
given X^{(3)} = x^{(3)} is
\begin{pmatrix} \Sigma_{11 \cdot 3} & \Sigma_{12 \cdot 3} \\ \Sigma_{21 \cdot 3} & \Sigma_{22 \cdot 3} \end{pmatrix},
say, where
(24) \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ \Sigma_{21} & \Sigma_{22} & \Sigma_{23} \\ \Sigma_{31} & \Sigma_{32} & \Sigma_{33} \end{pmatrix}.
The conditional covariance of X^{(1)} given X^{(2)} = x^{(2)} and X^{(3)} = x^{(3)} is calcu-
lated from the conditional covariances of X^{(1)} and X^{(2)} given X^{(3)} = x^{(3)} as
\Sigma_{11 \cdot 2,3} = \Sigma_{11 \cdot 3} - \Sigma_{12 \cdot 3}\Sigma_{22 \cdot 3}^{-1}\Sigma_{21 \cdot 3}.
This result permits the calculation of \sigma_{ij \cdot p_1+1, \dots, p}, i, j = 1, \dots, p_1, from
\sigma_{ij \cdot p_1+p_2+1, \dots, p}, i, j = 1, \dots, p_1 + p_2.
In particular, for p_1 = q, p_2 = 1, and p_3 = p - q - 1, we obtain
Since
we obtain
(2)
Lemma 2.6.1. Let X' = (X^{(1)\prime}, X^{(2)\prime}). If X^{(1)} and X^{(2)} are independent and
g(x) = g^{(1)}(x^{(1)})g^{(2)}(x^{(2)}), then
(4) \mathscr{E}g(X) = \mathscr{E}g^{(1)}(X^{(1)})\,\mathscr{E}g^{(2)}(X^{(2)}).
If g(x) is complex-valued,
Then
(9)
( 10)
Thus
(11)
Let
(12) X - \mu = CY.
(14)
Thus
(15) \mathscr{E}e^{it'X} = e^{it'\mu}\,\mathscr{E}e^{it'CY} = e^{it'\mu}\,e^{-\frac{1}{2}(t'C)(t'C)'}
for u' = t'C; the third equality is verified by writing both sides of it as
integrals. But this is
(16) \mathscr{E}e^{it'X} = e^{it'\mu - \frac{1}{2}t'\Sigma t},
(17) \mathscr{E}e^{it'DX} = e^{it'(D\mu) - \frac{1}{2}t'(D\Sigma D')t},
which is the characteristic function of N(D\mu, D\Sigma D') (by Theorem 2.6.1).
It is interesting to use the characteristic function to show that it is only the
multivariate normal distribution that has the property that every linear
combination of variates is normally distributed. Consider a vector Y of p
components with density f(y) and characteristic function
(18) \phi(u) = \mathscr{E}e^{iu'Y},
and suppose the mean of Y is \mu and the covariance matrix is \Sigma. Suppose u'Y
is normally distributed for every u. Then the characteristic function of such a
linear combination is
(19) \mathscr{E}e^{itu'Y} = e^{itu'\mu - \frac{1}{2}t^2 u'\Sigma u}.
Now set t = 1. Since the right-hand side is then the characteristic function of
N(\mu, \Sigma), the result is proved (by Theorem 2.6.1 above and 2.6.3 below).
Figure 2.1
but the pair need not have a joint normal distribution and need not be
independent. This is done by choosing the rectangles so that for the resultant
distribution the expected value of Y1Y2 is zero. It is clear geometrically that
this can be done.
For future reference we state two useful theorems concerning characteris-
tic functions.
Theorem 2.6.3. If the random vector X has the density f(x) and the
characteristic function \phi(t), then
(20) f(x) = \frac{1}{(2\pi)^p}\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{-it'x}\phi(t) \, dt_1 \cdots dt_p.
This shows that the characteristic function determines the density function
uniquely. If X does not have a density, the characteristic function uniquely
defines the probability of any continuity interval. In the univariate case a
continuity interval is an interval such that the cdf does not have a discontinu-
ity at an endpoint of the interval.
Theorem 2.6.4. Let \{F_j(x)\} be a sequence of cdfs, and let \{\phi_j(t)\} be the
sequence of corresponding characteristic functions. A necessary and sufficient
condition for F_j(x) to converge to a cdf F(x) is that, for every t, \phi_j(t) converges
to a limit \phi(t) that is continuous at t = 0. When this condition is satisfied, the
limit \phi(t) is identical with the characteristic function of the limiting distribution
F(x).
For the proofs of these two theorems, the reader is referred to Cramér
(1946), Sections 10.6 and 10.7.
(21) \mathscr{E}X_i = \mu_i.
The second moment is
(22) \mathscr{E}X_i X_j = \sigma_{ij} + \mu_i\mu_j.
Thus
(23) \operatorname{Variance}(X_i) = \mathscr{E}(X_i - \mu_i)^2 = \sigma_{ii}.
(25)
Definition 2.6.3. If all the moments of a distribution exist, then the cumu-
lants are the coefficients \kappa in
(27)
(1)
(2)
(3)
(4) y_1 = r\sin\theta_1,
y_2 = r\cos\theta_1\sin\theta_2,
y_3 = r\cos\theta_1\cos\theta_2\sin\theta_3,
\;\vdots
y_p = r\cos\theta_1\cos\theta_2 \cdots \cos\theta_{p-1}.
(5)
(6)
(7)
where
(8)
and the density of R = (Y'Y)^{1/2} is r^{p-1}\exp(-\tfrac{1}{2}r^2)/[2^{\frac{1}{2}p-1}\Gamma(\tfrac{1}{2}p)]. The density
of R^2 = v is v^{\frac{1}{2}p-1}e^{-\frac{1}{2}v}/[2^{\frac{1}{2}p}\Gamma(\tfrac{1}{2}p)]. This is the \chi^2-density with p degrees of
freedom.
The constant C(p) is the surface area of a sphere of unit radius in p
dimensions. The random vector U with coordinates \sin\Theta_1, \cos\Theta_1\sin\Theta_2, \dots,
\cos\Theta_1\cos\Theta_2 \cdots \cos\Theta_{p-1}, where \Theta_1, \dots, \Theta_{p-1} are independently distributed,
each with the uniform distribution over (-\pi/2, \pi/2) except for \Theta_{p-1} having
the uniform distribution over (-\pi, \pi), is said to be uniformly distributed on
the unit sphere. (This is the simplest example of a spherically contoured
distribution not having a density.) A stochastic representation of Y with the
density g(y'y) is
(9) Y \overset{d}{=} RU.
Then
(10) \mathscr{E}U = 0
and
(11) \mathscr{E}Y = 0
if \mathscr{E}R^2 < \infty. By symmetry \mathscr{E}U_1^2 = \cdots = \mathscr{E}U_p^2 = 1/p because \sum_{i=1}^{p} U_i^2 = 1.
Again by symmetry \mathscr{E}U_1U_2 = \mathscr{E}U_1U_3 = \cdots = \mathscr{E}U_{p-1}U_p. In particular, \mathscr{E}U_1U_2
= \mathscr{E}\sin\Theta_1\cos\Theta_1\sin\Theta_2, the integrand of which is an odd function of \theta_1 and
of \theta_2. Hence, \mathscr{E}U_iU_j = 0, i \ne j. To summarize,
(14) \mathscr{E}UU' = \frac{1}{p}I.
Theorem 2.7.1. If Y has the density g(y' y), then Z = OY, where 0'0 = I,
has the density g(z' z).
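A Python sketch of the stochastic representation (9), Y = RU. Here U is generated by normalizing standard normal vectors, a standard device rather than the angular construction used above, and R is chosen so that R² has the \chi^2_p distribution, which makes Y standard normal; both choices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 200_000

# U uniform on the unit sphere: normalize standard normal vectors.
Z = rng.standard_normal((n, p))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)

# Choose R so that R^2 ~ chi^2_p; then Y = R U is spherically contoured,
# in fact N(0, I_p) for this particular choice of R.
R = np.sqrt(rng.chisquare(df=p, size=n))
Y = R[:, None] * U

print("E U      ~", U.mean(axis=0))           # approximately 0, cf. (10)
print("E U U'   ~\n", (U.T @ U) / n)          # approximately I/p, cf. (14)
print("cov of Y ~\n", np.cov(Y, rowvar=False))
```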
Theorem 2.7.3. If X has the density (2), $R2 < 00, and f[c$(X)] ==
f[$(X)] for all c > 0, then f[$(X)] = f(I).
In particular \rho_{ij}(X) = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}} = \lambda_{ij}/\sqrt{\lambda_{ii}\lambda_{jj}}, where \Sigma = (\sigma_{ij}) and \Lambda = (\lambda_{ij}).
(16)
(17) g_2(y_2'y_2) = C(q)\int_0^{\infty} g(r_1^2 + y_2'y_2)\, r_1^{q-1}\, dr_1.
Note that Z^{(1)} and Z^{(2)} are uncorrelated even though possibly dependent.
Let C_1 and C_2 be q \times q and (p - q) \times (p - q) matrices satisfying C_1\Lambda_{11 \cdot 2}C_1'
= I_q and C_2\Lambda_{22}C_2' = I_{p-q}. Define Y^{(1)} and Y^{(2)} by Z^{(1)} - \nu^{(1)} = C_1Y^{(1)} and
Z^{(2)} - \nu^{(2)} = C_2Y^{(2)}. Then Y^{(1)} and Y^{(2)} have the density g(y^{(1)\prime}y^{(1)} + y^{(2)\prime}y^{(2)}).
The marginal density of Y^{(2)} is (17), and the marginal density of X^{(2)} = Z^{(2)} is
where Y^{(1)} and U^{(1)} have q components and Y^{(2)} and U^{(2)} have p - q
components. Then R_2^2 = Y^{(2)\prime}Y^{(2)} has the distribution of R^2 U^{(2)\prime}U^{(2)}, and
In the case Y \sim N(0, I_p), (22) has the beta distribution, say B(p - q, q), with
density
Hence, in general,
(24)
where R_2^2 \overset{d}{=} R^2 b, b \sim B(p - q, q), V has the uniform distribution on v'v = 1
in p - q dimensions, and R^2, b, and V are independent. All marginal distribu-
tions are elliptically contoured.
where the marginal density g_2(y_2'y_2) is given by (17) and r_2^2 = y_2'y_2. In terms
of y_1, (25) is a spherically contoured distribution (depending on r_2^2).
Now consider X = (X^{(1)\prime}, X^{(2)\prime})' with density (2). The conditional density of
X^{(1)} given X^{(2)} = x^{(2)} is
(26) \frac{|\Lambda_{11 \cdot 2}|^{-\frac{1}{2}}\, g\{[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})]'\Lambda_{11 \cdot 2}^{-1}[x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)})] + r_2^2\}}{g_2(r_2^2)},
where r_2^2 = (x^{(2)} - \nu^{(2)})'\Lambda_{22}^{-1}(x^{(2)} - \nu^{(2)}) and B = \Lambda_{12}\Lambda_{22}^{-1}. The density (26) is
elliptically contoured in x^{(1)} - \nu^{(1)} - B(x^{(2)} - \nu^{(2)}) as a function of x^{(1)}. The
conditional mean of X^{(1)} given X^{(2)} = x^{(2)} is
(27) \mathscr{E}(X^{(1)} \mid x^{(2)}) = \nu^{(1)} + B(x^{(2)} - \nu^{(2)})
if \mathscr{E}(R_1^2 \mid Y_2'Y_2 = r_2^2) < \infty in (25), where R_1^2 = Y_1'Y_1. Also the conditional covari-
ance matrix is (\mathscr{E}r_1^2/q)\Lambda_{11 \cdot 2}. It follows that Definition 2.5.2 of the partial
correlation coefficient holds when (\sigma_{ij \cdot q+1, \dots, p}) = \Sigma_{11 \cdot 2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}
and \Sigma is the parameter matrix given above.
Theorems 2.5.2, 2.5.3, and 2.5.4 are true for any elliptically contoured
distribution for which \mathscr{E}R^2 < \infty.
(28)
where Z = OY also has the density g(y'y). The equality (28) for all orthogo-
nal O implies \mathscr{E}e^{it'Z} is a function of t't. We write
(33) \mathscr{E}(X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l)
= \frac{\mathscr{E}R^4}{p(p+2)}\left(\lambda_{ij}\lambda_{kl} + \lambda_{ik}\lambda_{jl} + \lambda_{il}\lambda_{jk}\right)
= \frac{\mathscr{E}R^4}{(\mathscr{E}R^2)^2}\,\frac{p}{p+2}\left(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}\right).
The kurtosis parameter \kappa is defined by
\frac{\dfrac{3\,\mathscr{E}R^4}{p(p+2)} - 3\left(\dfrac{\mathscr{E}R^2}{p}\right)^2}{\left(\dfrac{\mathscr{E}R^2}{p}\right)^2} = 3\kappa.
(37) \kappa_{ijkl} = \mathscr{E}(X_i - \mu_i)(X_j - \mu_j)(X_k - \mu_k)(X_l - \mu_l) - (\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk})
(38)
2.7.5. Examples
(1) The multivariate t-distribution. Suppose Z \sim N(0, I_p), ms^2 \overset{d}{=} \chi_m^2, and Z
and s^2 are independent. Define Y = (1/s)Z. Then the density of Y is
(39) \frac{\Gamma[\frac{1}{2}(m+p)]}{(m\pi)^{p/2}\,\Gamma(\frac{1}{2}m)}\left(1 + \frac{y'y}{m}\right)^{-\frac{1}{2}(m+p)},
and
(40)
( 41)
(42)
+.... ] (,-(1/2<)(X-fL)'I\-I(X-fL)
. (27T)1'/2IcAI~ ,
(43)
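A Python sketch of Example (1): Y = (1/s)Z with Z \sim N(0, I_p) and ms^2 distributed as \chi^2_m independently of Z. The sample covariance is compared with [m/(m-2)]I (cf. Problem 2.68 with \Lambda = I); the values of p, m, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 2, 5, 300_000

Z = rng.standard_normal((n, p))            # Z ~ N(0, I_p)
s2 = rng.chisquare(df=m, size=n) / m       # m s^2 ~ chi^2_m, independent of Z
Y = Z / np.sqrt(s2)[:, None]               # Y = (1/s) Z, multivariate t, m d.f.

print("sample covariance of Y ~\n", np.cov(Y, rowvar=False))
print("m/(m-2) * I =\n", m / (m - 2) * np.eye(p))
```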
PROBLEMS
Find:
2.3. (Sec. 2.2) Let f(x, y) = C for x^2 + y^2 \le k^2 and 0 elsewhere. Prove C =
1/(\pi k^2), \mathscr{E}X = \mathscr{E}Y = 0, \mathscr{E}X^2 = \mathscr{E}Y^2 = k^2/4, and \mathscr{E}XY = 0. Are X and Y
independent?
2.4. (Sec. 2.2) Let F(x_1, x_2) be the joint cdf of X_1, X_2, and let F_i(x_i) be the
marginal cdf of X_i, i = 1, 2. Prove that if F_i(x_i) is continuous, i = 1, 2, then
F(x_1, x_2) is continuous.
2.5. (Sec. 2.2) Show that if the set X_1, \dots, X_r is independent of the set
X_{r+1}, \dots, X_p, then
2.6. (Sec. 2.3) Sketch the ellipses f(x, y) = 0.06, where f(x, y) is the bivariate
normal density with
2.7. (Sec. 2.3) Find b and A so that the following densities can be written in the
form of (23). Also find \mu_x, \mu_y, \sigma_x, \sigma_y, and \rho_{xy}:
(b) \frac{1}{2.4\pi}\exp\left[-\frac{1}{0.72}\left(\frac{x^2}{4} - \frac{1.6xy}{2} + y^2\right)\right].
2.8. (Sec. 2.3) For each matrix A in Problem 2.7 find .C so that C'AC = I.
2.10. (Sec. 2.3) Prove that the principal axes of (55) of Section 2.3 are along the 45°
and 135° lines with lengths 2\sqrt{c(1+\rho)} and 2\sqrt{c(1-\rho)}, respectively, by
transforming according to y_1 = (z_1 + z_2)/\sqrt{2}, y_2 = (z_1 - z_2)/\sqrt{2}.
2.11. (Sec. 2.3) Suppose the scalar random variables X_1, \dots, X_n are independent
and have a density which is a function only of x_1^2 + \cdots + x_n^2. Prove that the X_i
are normally distributed with mean 0 and common variance. Indicate the
mildest conditions on the density for your proof.
2.13. (Sec. 2.3) Prove that if \rho_{ij} = \rho, i \ne j, i, j = 1, \dots, p, then \rho \ge -1/(p-1).
2.15. (Sec. 2.4) Show that when X is normally distributed the components are
mutually independent if and only if the covariance matrix is diagonal.
2.16. (Sec. 2.4) Find necessary and sufficient conditions on A so that AY + A has a
continuous cdf.
2.17. (Sec. 2.4) Which densities in Problem 2.7 define distributions in which X and
Yare independent?
(a) Write the marginal density of X for each case in Problem 2.6.
(b) Indicate the marginal distribution of X for each case in Problem 2.7 by the
notation N(a, b).
(c) Write the marginal density of X_1 and X_2 in Problem 2.9.
2.19. (S\:c. 2.4) What is the distribution or Z = X - Y whcn X and Y have each of
the densities in Problem 2.6?
2.20. (Sec. 2.4) What is the distribution of XI + 2X 2 - 3X3 when X" X 2 , X3 halje
the distribution defined in Problem 2.97
2.21. (Sec. 2.4) Let X = (XI' X~)'. where XI = X and X 2 = aX + b and X has the
distribution N(O,1). Find the cdf of X.
2.22. (Sec. 2.4) Let X_1, \dots, X_N be independently distributed, each according to
N(\mu, \sigma^2).
(a) What is the distribution of X = (X_1, \dots, X_N)'? Find the vector of means
and the covariance matrix.
(b) Using Theorem 2.4.4, find the marginal distribution of \bar{X} = \sum X_i/N.
2.23. (Sec. 2.4) Let XI' ... ' X N be independently distributed with X, having distri-
bution N( f3 + ')'Zj, (]"2), where ~i is a given number, i = 1, ... , N, and EiZ; = O.
2.24. (Sec. 2.4) Let (XI' Y I )',(X2 , YZ)',(X3, Y3)' be independently distributed,
(X" Y,)' according to
i = 1,2,3.
2.25. (Sec. 2.4) Let X have a (singular) normal distribution with mean 0 and
covariance matrix
-1
-3
5
-il
*"
(a) Find a vector u \ne 0 so that \Sigma u = 0. [Hint: Take cofactors of any column.]
(b) Show that any matrix of the form G = (H \;\; u), where H is 3 \times 2, has the
property
2.27. (Sec. 2.4) Prove that if the joint (marginat) distribution of XI and X z is
singular (that is, degenerate), then the joint distribution of XI' X z, and X3 is
Singular.
2.28. (Sec. 2.5) In each part of Problem 2.6, find the conditional distribution of X
given Y = y, find the conditional distribution of Y given X = x, and plot each
regression line on the appropriate graph in Problem 2.6.
\Sigma = \begin{pmatrix} 1.00 & 0.80 & -0.40 \\ 0.80 & 1.00 & -0.56 \\ -0.40 & -0.56 & 1.00 \end{pmatrix}.
2.30. (Sec. 2.5) In Problem 2.9, find the conditional distribution of Xj and X 2 given
XJ =X3'
(a) Show that finding \alpha to maximize the absolute value of the correlation
between X_i and \alpha'X^{(2)} is equivalent to maximizing (\sigma_{(i)}\alpha)^2 subject to
\alpha'\Sigma_{22}\alpha constant.
(b) Find \alpha by maximizing (\sigma_{(i)}\alpha)^2 - \lambda(\alpha'\Sigma_{22}\alpha - c), where c is a constant and
\lambda is a Lagrange multiplier.
2.33. (Sec. 2.5) Invariance of the multiple correlation coefficient. Prove that R_{i \cdot q+1, \dots, p}
is an invariant characteristic of the multivariate normal distribution of X_i and
X^{(2)} under the transformation X_i^* = b_i X_i + c_i for b_i \ne 0 and X^{(2)*} = HX^{(2)} + k
for H nonsingular, and that every function of \mu_i, \sigma_{ii}, \sigma_{(i)}, \mu^{(2)}, and \Sigma_{22} that is
invariant is a function of R_{i \cdot q+1, \dots, p}.
2.35. (Sec. 2.5) Find the multiple correlation coefficient between Xl and (X 2 , X 3 )
in Problem 2.29.
[Hint: Using Problem 2.36, prove III $ 0'111I 22 1, where :t22 is (p -1) X
(p - 1), and apply induction.]
2.38. (Sec. 25) Prove equality holds in Problem 2.37 if and only if :t is diagonal.
2.40. (Sec. 2.5) Let (X_1, X_2) have the density n(x \mid 0, \Sigma) = f(x_1, x_2). Let the density
of X_2 given X_1 = x_1 be f(x_2 \mid x_1). Let the joint density of X_1, X_2, X_3 be
f(x_1, x_2)f(x_3 \mid x_2). Find the covariance matrix of X_1, X_2, X_3 and the partial
correlation between X_2 and X_3 for given X_1.
2.41. (Sec. 2.5) Prove 1 - R_{1 \cdot 23}^2 = (1 - \rho_{13}^2)(1 - \rho_{12 \cdot 3}^2). [Hint: Use the fact that the
variance of X_1 in the conditional distribution given X_2 and X_3 is (1 - R_{1 \cdot 23}^2)\sigma_{11}.]
2.42. (Sec. 25) If P = 2, c~n there be a difference between the simple correlation
between Xl and X z and the multiple correlation between XI and X(2) = X 2?
Explain.
2.44. (Sec. 2.5) Give a necessary and sufficient condition for R_{i \cdot q+1, \dots, p} = 0 in terms
of \sigma_{i,q+1}, \dots, \sigma_{ip}.
[Hint: Apply Theorem A.3.2 of the Appendix to the cofactors used to calculate
\sigma^{ij}.]
2.48. (Sec. 2.5) Show that for any joint distribution for which the expectations exi.;;t
and any function h(x(2)) that
[Hillt: In the above take the expectation first with respect to Xi conditional
all X( 2).]
2.49. (Sec. 2.5) Show that for any function h(x^{(2)}) and any joint distribution of X_i
and X^{(2)} for which the relevant expectations exist, \mathscr{E}[X_i - h(X^{(2)})]^2 = \mathscr{E}[X_i -
g(X^{(2)})]^2 + \mathscr{E}[g(X^{(2)}) - h(X^{(2)})]^2, where g(x^{(2)}) = \mathscr{E}[X_i \mid x^{(2)}] is the conditional
expectation of X_i given X^{(2)} = x^{(2)}. Hence g(X^{(2)}) minimizes the mean squared
error of prediction. [Hint: Use Problem 2.48.]
2.50. (Sec. 2.5) Show that for any function h(x^{(2)}) and any joint distribution of X_i
and X^{(2)} for which the relevant expectations exist, the correlation between X_i
and h(X^{(2)}) is not greater than the correlation between X_i and g(X^{(2)}), where
g(x^{(2)}) = \mathscr{E}[X_i \mid x^{(2)}].
2.51 .. (Sec. 2.5) Show that for any vector functicn h(X(2))
is positive semidefinite. Note this generalizes Theorem 2.5.3 and Problem 2.49.
2.52. (Sec. 2.5) Verify that \Sigma_{12}\Sigma_{22}^{-1} = -\Psi_{11}^{-1}\Psi_{12}, where \Psi = \Sigma^{-1} is partitioned
similarly to \Sigma.
\Sigma^{-1} = \begin{pmatrix} 0 & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} + \begin{pmatrix} I \\ -\beta' \end{pmatrix}\Sigma_{11 \cdot 2}^{-1}\begin{pmatrix} I & -\beta \end{pmatrix},
where \beta = \Sigma_{12}\Sigma_{22}^{-1}. [Hint: Use Theorem A.3.3 of the Appendix and the fact
that \Sigma^{-1} is symmetric.]
\Sigma_{11} - (\Sigma_{12} \;\; \Sigma_{13})\begin{pmatrix} \Sigma_{22} & \Sigma_{23} \\ \Sigma_{32} & \Sigma_{33} \end{pmatrix}^{-1}\begin{pmatrix} \Sigma_{21} \\ \Sigma_{31} \end{pmatrix} = \Sigma_{11} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{31}
- (\Sigma_{12} - \Sigma_{13}\Sigma_{33}^{-1}\Sigma_{32})(\Sigma_{22} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{32})^{-1}(\Sigma_{21} - \Sigma_{23}\Sigma_{33}^{-1}\Sigma_{31}).
where
Q = ( X(I) - ....(1»). A II( X(I) - ....(1») + ( x(l) - .... (1»). A 12( x(2) (2»)
_ ....
IA22-A21Aii1A121t
~=-~~~-=-e
_!Q
2 2
(27r)~(p-q)
2.60. (Sec. 2.6) Let X be distributed according to N(0, \Sigma). Differentiating the
characteristic function, verify (25) and (26).
2.61. (Sec. 2.6) Verify (25) and (26) by using the transformation X - \mu = CY, where
\Sigma = CC', and integrating the density of Y.
o otherwise.
2.63. (Sec. 2.6) Suppose X is distributed according to N(O, X). Let X = «(1 l> ••• , (1p)'
Prove
(11(1'1
[
(11(1p
£1.£11
K= :
[
£1£~
and £i is a column vector with 1 in the ith position and O's elsewhere.
2.64. Complex normal distribution. Let (X', Y')' have a normal distribution with mean
vector (\mu_X', \mu_Y')' and covariance matrix
\Sigma = \begin{pmatrix} \Gamma & -\Phi \\ \Phi & \Gamma \end{pmatrix},
where \Gamma is positive definite and \Phi = -\Phi' (skew symmetric). Then Z = X + iY
is said to have a complex normal distribution with mean \theta = \mu_X + i\mu_Y and
covariance matrix \mathscr{E}(Z - \theta)(Z - \theta)^* = P = Q + iR, where Z^* = X' - iY'. Note
that P is Hermitian and positive definite.
(c) Show
Problem 2.64, show that W'= AZ, where A is a nonsingular complex matrix, has
the complex normal distribution with mean A6 and covariance matrix €(W) =
APA*.
\mathscr{E}e^{i\mathfrak{R}(u^*Z)} = e^{i\mathfrak{R}(u^*\theta) - \frac{1}{4}u^*Pu},
where \mathfrak{R}(x + iy) = x.
2.68. (Sec. 2.7) For the multivariate t-distribution with density (41) show that
\mathscr{E}X = \mu and \mathscr{C}(X) = [m/(m-2)]\Lambda.
CHAPTER 3

Estimation of the Mean Vector
and the Covariance Matrix

3.1. INTRODUCTION
3.2. THE MAXIMUM LIKELIHOOD ESTIMATORS OF THE MEAN VECTOR AND THE COVARIANCE MATRIX
In the likelihood function the vectors x_1, \dots, x_N are fixed at the sample
values and L is a function of \mu and \Sigma. To emphasize that these quantities
are variables (and not parameters) we shall denote them by \mu^* and \Sigma^*. Then
the logarithm of the likelihood function is
(2) \log L = -\tfrac{1}{2}pN\log 2\pi - \tfrac{1}{2}N\log|\Sigma^*| - \tfrac{1}{2}\sum_{\alpha=1}^{N}(x_\alpha - \mu^*)'\Sigma^{*-1}(x_\alpha - \mu^*).
The sample mean vector is
(3) \bar{x} = \frac{1}{N}\sum_{\alpha=1}^{N} x_\alpha = (\bar{x}_1, \dots, \bar{x}_p)',
where x_\alpha = (x_{1\alpha}, \dots, x_{p\alpha})' and \bar{x}_i = \sum_{\alpha=1}^{N} x_{i\alpha}/N, and let the matrix of sums
of squares and cross products of deviations about the mean be
(4) A = \sum_{\alpha=1}^{N} (x_\alpha - \bar{x})(x_\alpha - \bar{x})' = (a_{ij}), \qquad i, j = 1, \dots, p.
Lemma 3.2.1. Let x_1, \dots, x_N be N vectors of p components, and let \bar{x} = (1/N)\sum_{\alpha=1}^{N} x_\alpha. Then for any vector b,
\sum_{\alpha=1}^{N}(x_\alpha - b)(x_\alpha - b)' = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' + N(\bar{x} - b)(\bar{x} - b)'.

Proof.
(6) \sum_{\alpha=1}^{N}(x_\alpha - b)(x_\alpha - b)' = \sum_{\alpha=1}^{N}[(x_\alpha - \bar{x}) + (\bar{x} - b)][(x_\alpha - \bar{x}) + (\bar{x} - b)]'
= \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' + \left[\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})\right](\bar{x} - b)'
+ (\bar{x} - b)\sum_{\alpha=1}^{N}(x_\alpha - \bar{x})' + N(\bar{x} - b)(\bar{x} - b)'.
The second and third terms on the right-hand side are 0 because \sum(x_\alpha - \bar{x}) =
\sum x_\alpha - N\bar{x} = 0 by (3). •
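A numerical sketch in Python of the quantities defined in (3) and (4) and of A/N, the maximum likelihood estimator of \Sigma derived in this section; the true \mu and \Sigma used to generate the sample are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true = np.array([1.0, -0.5, 2.0])
Sigma_true = np.array([[1.0, 0.3, 0.0],
                       [0.3, 2.0, 0.4],
                       [0.0, 0.4, 1.5]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)   # rows are x_alpha'

N = X.shape[0]
x_bar = X.mean(axis=0)                 # (3): sample mean vector
D = X - x_bar
A = D.T @ D                            # (4): sums of squares and cross products
Sigma_hat = A / N                      # maximum likelihood estimator of Sigma

print("x_bar     :", x_bar)
print("Sigma_hat :\n", Sigma_hat)
```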
(7) \sum_{\alpha=1}^{N}(x_\alpha - \mu^*)(x_\alpha - \mu^*)' = \sum_{\alpha=1}^{N}(x_\alpha - \bar{x})(x_\alpha - \bar{x})' + N(\bar{x} - \mu^*)(\bar{x} - \mu^*)'.
Using this result and the properties of the trace of a matrix (\operatorname{tr} CD = \sum c_{ij}d_{ji}
= \operatorname{tr} DC), we have
(8) \sum_{\alpha=1}^{N}(x_\alpha - \mu^*)'\Sigma^{*-1}(x_\alpha - \mu^*) = \operatorname{tr}\sum_{\alpha=1}^{N}(x_\alpha - \mu^*)'\Sigma^{*-1}(x_\alpha - \mu^*)
= \operatorname{tr}\sum_{\alpha=1}^{N}\Sigma^{*-1}(x_\alpha - \mu^*)(x_\alpha - \mu^*)'.
Proof. Let D = EE' and E'G^{-1}E = H. Then G = EH^{-1}E', and |G| = |E| \cdot
|H^{-1}| \cdot |E'| = |H^{-1}| \cdot |EE'| = |D|/|H|, and \operatorname{tr} G^{-1}D = \operatorname{tr} G^{-1}EE' =
\operatorname{tr} E'G^{-1}E = \operatorname{tr} H. Then the function to be maximized (with respect to posi-
tive definite H) is
(12) f = -N\log|D| + N\log|H| - \operatorname{tr} H.
Let H = TT', where T is lower triangular (Corollary A.1.7). Then the
maximum of
Corollary 3.2.1. If on the basis of a given sample \hat\theta_1, \dots, \hat\theta_m are maximum
likelihood estimators of the parameters \theta_1, \dots, \theta_m of a distribution, then
\phi_1(\hat\theta_1, \dots, \hat\theta_m), \dots, \phi_m(\hat\theta_1, \dots, \hat\theta_m) are maximum likelihood estimators of
\phi_1(\theta_1, \dots, \theta_m), \dots, \phi_m(\theta_1, \dots, \theta_m) if the transformation from \theta_1, \dots, \theta_m to
\phi_1, \dots, \phi_m is one-to-one.† If the estimators of \theta_1, \dots, \theta_m are unique, then the
estimators of \phi_1, \dots, \phi_m are unique.
(17) \hat\rho_{ij} = \frac{a_{ij}}{\sqrt{a_{ii}}\sqrt{a_{jj}}}.
Proof. The set of parameters \mu_i = \mu_i, \sigma_i^2 = \sigma_{ii}, and \rho_{ij} = \sigma_{ij}/\sqrt{\sigma_{ii}\sigma_{jj}} is a
one-to-one transform of the set of parameters \mu_i and \sigma_{ij}. Therefore, by
Corollary 3.2.1 the estimator of \mu_i is \hat\mu_i = \bar{x}_i, of \sigma_i^2 is \hat\sigma_{ii} = a_{ii}/N, and of \rho_{ij} is
(18) \hat\rho_{ij} = \frac{\hat\sigma_{ij}}{\sqrt{\hat\sigma_{ii}}\sqrt{\hat\sigma_{jj}}} = \frac{a_{ij}}{\sqrt{a_{ii}}\sqrt{a_{jj}}}. •
Pearson (1896) gave a justification for this estimator of \rho_{ij}, and (17) is
sometimes called the Pearson correlation coefficient. It is also called the
simple correlation coefficient. It is usually denoted by r_{ij}.
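A short Python sketch of the coefficient r_{ij} = a_{ij}/\sqrt{a_{ii}a_{jj}}, compared with numpy's built-in corrcoef; the generated sample is an illustrative assumption.

```python
import numpy as np

def sample_correlation(X):
    """Matrix of Pearson correlation coefficients r_ij = a_ij / sqrt(a_ii a_jj),
    where A = (a_ij) is the matrix (4) of sums of squares and cross products."""
    D = X - X.mean(axis=0)
    A = D.T @ D
    d = np.sqrt(np.diag(A))
    return A / np.outer(d, d)

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=100)
print(sample_correlation(X))
print(np.corrcoef(X, rowvar=False))   # should agree
```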
†The assumption that the transformation is one-to-one is made so that the set \phi_1, \dots, \phi_m
uniquely defines the likelihood. An alternative in case \theta^* = \phi(\theta) does not have a unique inverse
is to define S(\theta^*) = \{\theta : \phi(\theta) = \theta^*\} and g(\theta^*) = \sup f(\theta), \theta \in S(\theta^*), which is considered the
"induced likelihood" when f(\theta) is the likelihood function. Then \hat\theta^* = \phi(\hat\theta) maximizes g(\theta^*),
for g(\theta^*) = \sup f(\theta), \theta \in S(\theta^*) \le \sup f(\theta), \theta \in S = f(\hat\theta) = g(\hat\theta^*) for all \theta^* \in S^*. [See, e.g.,
Zehna (1966).]
Figure 3.1
(19) X = \begin{pmatrix} x_{11} & \cdots & x_{1N} \\ \vdots & & \vdots \\ x_{p1} & \cdots & x_{pN} \end{pmatrix} = \begin{pmatrix} u_1' \\ \vdots \\ u_p' \end{pmatrix};
that is, u_i' is the ith row of X. The vector u_i can be considered as a vector in
an N-dimensional space with the \alphath coordinate of one endpoint being x_{i\alpha}
and the other endpoint at the origin. Thus the sample is represented by p
vectors in N-dimensional Euclidean space. By definition of the Euclidean
metric, the squared length of u_i (that is, the squared distance of one
endpoint from the other) is u_i'u_i = \sum_{\alpha=1}^{N} x_{i\alpha}^2.
Now let us show that the cosine of the angle between u_i and u_j is
u_i'u_j/\sqrt{u_i'u_i\, u_j'u_j} = \sum_{\alpha=1}^{N} x_{i\alpha}x_{j\alpha}/\sqrt{\sum_{\alpha=1}^{N} x_{i\alpha}^2 \sum_{\alpha=1}^{N} x_{j\alpha}^2}. Choose the scalar d so
the vector du_j is orthogonal to u_i - du_j; that is, 0 = du_j'(u_i - du_j) = d(u_j'u_i -
du_j'u_j). Therefore, d = u_j'u_i/u_j'u_j. We decompose u_i into u_i - du_j and du_j
[u_i = (u_i - du_j) + du_j] as indicated in Figure 3.1. The absolute value of the
cosine of the angle between u_i and u_j is the length of du_j divided by the
length of u_i; that is, it is \sqrt{du_j'(du_j)/u_i'u_i} = \sqrt{du_j'u_j d/u_i'u_i}; the cosine is
u_i'u_j/\sqrt{u_i'u_i\, u_j'u_j}. This proves the desired result.
To give a geometric interpretation of a_{ij} and a_{ij}/\sqrt{a_{ii}a_{jj}}, we introduce
the equiangular line, which is the line going through the origin and the point
(1, 1, \dots, 1). See Figure 3.2. The projection of u_i on the vector \epsilon = (1, 1, \dots, 1)'
is (\epsilon'u_i/\epsilon'\epsilon)\epsilon = (\sum_\alpha x_{i\alpha}/N)\epsilon = \bar{x}_i\epsilon = (\bar{x}_i, \bar{x}_i, \dots, \bar{x}_i)'. Then we decompose
u_i into \bar{x}_i\epsilon, the projection on the equiangular line, and u_i - \bar{x}_i\epsilon, the
projection of u_i on the plane perpendicular to the equiangular line. The
squared length of u_i - \bar{x}_i\epsilon is (u_i - \bar{x}_i\epsilon)'(u_i - \bar{x}_i\epsilon) = \sum_\alpha(x_{i\alpha} - \bar{x}_i)^2; this is
N\hat\sigma_{ii} = a_{ii}. Translate u_i - \bar{x}_i\epsilon and u_j - \bar{x}_j\epsilon so that each vector has an end-
point at the origin; the \alphath coordinate of the first vector is x_{i\alpha} - \bar{x}_i, and of
the second is x_{j\alpha} - \bar{x}_j. The cosine of the angle between these two vectors is
(20) \frac{\sum_{\alpha=1}^{N}(x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j)}{\sqrt{\sum_{\alpha=1}^{N}(x_{i\alpha} - \bar{x}_i)^2 \sum_{\alpha=1}^{N}(x_{j\alpha} - \bar{x}_j)^2}}.
sedative B, and so on. Assuming that each pair (i.e., each row in the table) is
an observation from N(\mu, \Sigma), we find that
\bar{x} = \begin{pmatrix} 2.33 \\ 0.75 \end{pmatrix},
(1) \mathscr{E}Y_\alpha = \mathscr{E}\sum_{\beta=1}^{N} c_{\alpha\beta}X_\beta = \sum_{\beta=1}^{N} c_{\alpha\beta}\mathscr{E}X_\beta = \sum_{\beta=1}^{N} c_{\alpha\beta}\mu_\beta = \nu_\alpha,
and
\mathscr{E}(Y_\alpha - \nu_\alpha)(Y_\gamma - \nu_\gamma)' = \mathscr{E}\left[\sum_{\beta=1}^{N} c_{\alpha\beta}(X_\beta - \mu_\beta)\right]\left[\sum_{\varepsilon=1}^{N} c_{\gamma\varepsilon}(X_\varepsilon - \mu_\varepsilon)\right]'
= \sum_{\beta,\varepsilon=1}^{N} c_{\alpha\beta}c_{\gamma\varepsilon}\mathscr{E}(X_\beta - \mu_\beta)(X_\varepsilon - \mu_\varepsilon)'
= \sum_{\beta,\varepsilon=1}^{N} c_{\alpha\beta}c_{\gamma\varepsilon}\delta_{\beta\varepsilon}\Sigma = \delta_{\alpha\gamma}\Sigma,
where \delta_{\alpha\gamma} is the Kronecker delta (= 1 if \alpha = \gamma and = 0 if \alpha \ne \gamma).
This shows that Y_\alpha is independent of Y_\gamma, \alpha \ne \gamma, and Y_\alpha has the covariance
matrix \Sigma. •
Proof.
(3) \sum_{\alpha=1}^{N} Y_\alpha Y_\alpha' = \sum_{\alpha}\left(\sum_{\beta} c_{\alpha\beta}X_\beta\right)\left(\sum_{\gamma} c_{\alpha\gamma}X_\gamma\right)'
= \sum_{\beta,\gamma}\left(\sum_{\alpha} c_{\alpha\beta}c_{\alpha\gamma}\right)X_\beta X_\gamma'
= \sum_{\beta,\gamma}\delta_{\beta\gamma}X_\beta X_\gamma'
= \sum_{\beta=1}^{N} X_\beta X_\beta'. •
Let X_1, \dots, X_N be independent, each distributed according to N(\mu, \Sigma).
There exists an N \times N orthogonal matrix B = (b_{\alpha\beta}) with the last row
(1/\sqrt{N}, \dots, 1/\sqrt{N}).
Then
(6) A = \sum_{\alpha=1}^{N} Z_\alpha Z_\alpha' - Z_N Z_N',
\mathscr{E}Z_\alpha = \sum_{\beta=1}^{N} b_{\alpha\beta}\mu = \sum_{\beta=1}^{N} b_{\alpha\beta}b_{N\beta}\sqrt{N}\,\mu = 0, \qquad \alpha \ne N.
3.3.2. Tests and Confidence Regions for the Mean Vector When the
Covariance Matrix Is Known
A statistical problem of considerable importance is that of testing the
hypothesis that the mean vector of a normal distribution is a given vector.
and a related problem is that of giving a confidence region for the unknown
vector of means. We now go on to study these problems under the assump-
tion that the covariance matrix \Sigma is known. In Chapter 5 we consider these
problems when the covariance matrix is unknown.
In the univariate case one bases a test or a confidence interval on the fact
that the difference between the sample mean and the population mean is
normally distributed with mean zero and known variance; then tables of the
normal distribution can be used to set up significance points or to compute
confidence intervals. In the multivariate case one uses the fact that the
difference between the sample mean vector and the population mean vector
is normally distributed with mean vector zero and known covariance matrix.
One could set up limits for each component on the basis of the distribution,
but this procedure has the disadvantages that the choice of limits is some-
what arbitrary and in the case of tests leads to tests that may be very poor
against some alternatives, and, moreover, such limits are difficult to compute
because tables are available only for the bivariate case. The procedures given
below, however, are easily computed and furthermore can be given general
intuitiw and theoretical justifications.
The procedures and evaluation of their properties are based on the
following theorem:
Thus
(14)
To test the hypothesis that IJ. = lJ.o' where lJ.o is a specified vector, we use as
our critical region
(15)
If we obtain a sample such that (15) is satisfied, we reject the null hypothe"is.
It can be seen intuitively that the probability is greater than a of rejecting
the hypothesis if IJ. is very different from lJ.o. since in the space of x (15)
defines an ellipsoid with center at lJ.o' and when IJ. is far from lJ.o the density
of x will be concentrated at a point near the edge or outside of the ellipsoid.
The quantity N(i lJ.o)'I -I(i - lJ.o) is distributed as a noncentral X 2 with
p degrees of freedom and noncentrality parameter N(IJ. - lJ.o)'~ -1(1J. - lJ.o)
when i is the mean of a sample of N from N(IJ.,~) [given by Bose
(1936a),(1936b)]. Pearson (1900) first proved Theorem 3.3.3 for v = O.
Now consider the following statement made on the basis of a sample with
mean i: 'lThe mean of the distribution satisfies
(16)
as an inequality on IJ.*." We see from (14) that the probability that a sample
will be drawn such that the above statement is true is 1- a because the
event in (14) is equivalent to the statement being false. Thus, the set of IJ.*
satisfying (16) is a confidence region for IJ. with confidence 1 - a.
In the p-dimensional space of i, (15) is the surface and exterior of an
ellipsoid with center lJ.o, the shape of the ellipsoid depending" on ~ -1 and
the size on (1/N)x;(a) for given ~-l. In the p-dimensional space of IJ.*
(16) is the surface and interior of an ellipsoid with its center at X. If ~ -I = I,
then (14) says that the robability is a that the distance between x and IJ. is
greater than X;( a )/N.
80 . ESTIMATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
Theorem 3.3.4. Ifi is the mean of a sample of N drawn from N(v., I) and
I is known, then (15) gives a critical region of size a for testing the hypothesis
V. = v'o, and (16) gives a confidence region for V. I)f confidence 1 - a. Here
Xp2( a) bi chosen to satisfy (13).
The same technique can be used for the corresponding two-sample prob-
lems. Suppose we have a sample {X~l)}, a = 1, ... , N 1, from the dIstribution
N( p(l), I), and a sample {x~:)}, a = 1, ... , N 2 , from a second normal popula-
tion N( p(2l, ~) with the same covariance matrix. Then the two sample
means
N, 1 N~
(17) -(1)
X
= _1_"
N L.....
(I)
xa , x(2) =N L x~2)
I a= I 2 a= 1
( 18) NNIN2
+N (y - V
)'~k -l( Y - V
) ~ X" 2()
a
I 2
is a confidence region for the difference v of th~ two mean vectors, and a
critical region for testing the hypothesis v.(l) = f.1.(2) is given by
(19)
(20) i = 1, . .. ,p.
( 21)
--1 e -{(Z-1")~
- I 1 ~(p-I)-l e -~K'
-
(22) w-
J27r 2~(p-l)r[ H p - 1)]
:;c a a
_.!.("t2+u)( 2)t(P-3) " T Zl
(23) Ce Z V-Z 1 t..J -,-.
a.
a=O
=B[1(p-l),,8+1J
r[!(p-l)]r(,8+1)
r(!p+,8)
by the usual properties of the beta and gamma functions. Thus the density of
V is
(26)
We ~an use the duplication formula for the gamma function r(2,8 + 1) = (2,8)!
(Problem 7.37),
to rewrite (26) as
(28)
To obtain the power function of the test (15), we note that /N(X - ..... 0)
has the distribution N[/N ( ..... - ..... 0)' I]. From Theorem 3.3.3 we obtain the
following corollary:
Sufficiency
A statistic T is sufficient for a family of distributions of X or for a parameter
9 if the conditional distribution of X given T = t does not depend on 8 [e.g.,
Cramer (1946), Section 32.4]. In this sense the statistic T gives as much
information about 8 as the entire sample X. (Of course, this idea depends
strictly on the assumed family of distributions')
(1 ) f(YI8) =g[t(y),8]h(y),
where g[t(y), 8] and hey) are nonnegative and hey) does not depend on 8.
Theorem 3.4.1. If Xl"'.' XN are observations from N(JL, I), then x and S
are sufficient for JL and I. If fL is given, L ~"I (x a - JL)( Xa - JL)' is sufficient for
I. If I is given, x is sufficient for JL.
The right-hand side of(2) is in the form of(I) for x, S, JL, I, and the middle
is in the form of (1) for L~=I(Xa - JLXXa - JLY, I; in each case h(x I, ... ,x N )
= 1. The right-hand side is in the form of (1) for x, JL with h(x l ,···, XN) =
exp{ - teN - 1) tr I -IS}. •
Completeness
To prove an optimality property of the T 2-test (Section 5.5), we need the
result that (i, S) is a complete sufficient set of statistics for ( .... , I).
Theorem 3.4.2. The sufficient set of statistics i, S is complete for ...., I when
the sample is drawn from N( .... , I).
Proof We can define the sample in terms of i and Z I' ... , Zn as in Section
3.3 with n = N - 1. We assume for any function g(x, A) = g(i, nS) that
Efficiency
If a q-component random vector Y has mean vector GY = v and covariance
matrix G(Y - v Xy - v)' = W, then
(7)
This is the information matrix for one observation. The Cramer-Rao lower
86 ESTIMATION OF THE MEAN VEcrOR AND THE COVARIANCE MATRIX
, 2log f ]-1
( 10) N$(t-O)(t-O) - [-$ rIaoao'
is positive semidefinite. (Other lower bounds can also be given.)
Consis te1lc.v
1 N N _ _
(11) S'I = N_ I L (x", - ,u,)(x}<y - ,u}) - N -1 (Xi - ,u,)( X, - ,u,)
<> ~ I
Asymptotic Nonnality
First we prove a multivariate central limit theorem.
Proof Let
the univariate central limit theorem [Cramer (1946), p. 21S]' the limiting
distribution is N(O, t ' Tt). Therefore (Theorem 2.6.4),
(13)
for every u and t. (For t = 0 a special and obvious argument is used.) Let
u = 1 to obtain
(IS)
(16) y=
a
and the elements of T being .given above. If the elements of A(n) are
arranged in vector form similar to (16), say the vector Wen), then Wen) - nv
= L~"I(Ya - v). By Theorem 3.4.3, (1/ v'n)[W(n) - nv] has a limiting normal
distribution with mean 0 and the covariance matrix of Ya . II
The elements of B(n) will have a limiting normal distribution with mean .0
if x I' x 2 , ••• are independently and identically distributed with finite fourth-
order momcnts, bUl thc covariance slructure of B(n) will depend on thc
fourth-order moments.
For example, if d and 0 are univariate, the loss may be squared error,
= (0 - d)2, and the risk is the mean squared error $8[ D( X) - 0 F.
L( 0, d)
A decision procedure D(x) is as good as a procedure D*(x) if
D(X) is better than D*(x) if (18) holds with a strict inequality for at least one
value of O. A procedure D*(x) is inadmissible if there exists another proce-
dure D(x) that is better than D*(x). A procedure is admissible if it is not
inadmissible (Le., if there is no procedure better than it) in terms of the given
loss function. A class of procedures is complete if for any procedure not in
the class there is a better procedure in the class. The class is minimal
complete if it does not contain a proper complete subclass. If a minimal
complete class exists, it is identical to the class of admissible procedures.
When such a class is avai lable, there is no (mathematical) need to use a
procedure outside the minimal complete class. Sometimes it is convenient to
refer to an essentially complete class, which is a class of procedures o;uch that
for every procedure outside the class there is one in the class that is just as
good.
3.4 THEORETICAL PROPERTIES OF ESTIMATORS OF THE MEAN VECTOR 89
Given the a priori density p, the decision procedure o(x) that minimizes
r( p, 0) is the Bayes procedure, and the resulting minimum of r( p, 0) is the
Bayes ri<;k. Under general conditions Bayes procedures are admissible and
admissible procedures are Bayes or limits of Bayes procedur~s. If the dt:nsity
of X given 8 is f(xi 8), the joint density of X and 8 is fixl 8)p(O) and the
average risk of a procedure Sex) is
= fx{feL[8,0(X)]g(8IX)d8}f(X) dx;
here
are the marginal density of X and the a posteriori density of e given x. The
procedure that minimizes r( p, 0) is one that for each r minimizes the
expression in braces on the right-hand sid~ or (20), that is, the expectation of
L[O, Sex)] with respect to the a posteriori distrib·Jtion. If 8 and d are vectors
(0 and d) and L(O, d) = (0 - d)'Q(O - d), where Q is positive definite, then
(23)
(24)
90 ESTI~ATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
Proof Since x is sufficient for IJ., we need only consider x, which has the
distribution of IJ. + v, where v has the distribution N[O,(ljN)I] and is
independent of IJ.. Then the joint distribution of IJ. and x is
(25)
The mean of the conditional distribution of IJ. given x is (by Theorem 2.5.1)
(26) ( 1)-1 ( x - v) ,
v + <I> <I> + N I
3.5.1. Introduction
The sample mean x seems the natural estimator of the population mean Il-
based on a sample from N(Il-, 1:). It is the maximum likelihood estimator, a
sufficient statistic when ~ is known, and the minimum variance unbiased
estimator. Moreover, it is equivariant in the sense that if an arbitrary vector v
is added to each observation vector and to Il-. the error of estimation
(x + v) - (Il- + v)::= X - Il- is independent of v; in other words, the error
does not depend on the choice of origin. However. Stein (1956b) showed the
startling fact that this conventional estimator is not admissible with respect to
the loss function that is the sum of mean squared errors of the components
when ~ = I and p? 3. James and Stein (1961) produced an estimator whiCh
has a smaller sum of mean squared errors; this estimator will be studied in
Section 3.5.2. Subsequent studies have shown that the phenomenon is
widespread and the implications imperative.
displaying an alternative estimator that has a smaller expected loss for every
mean vector IJ.. We assume that the nonnal distribution sampled has covari-
ance matrix proportional to I with the constant of proportionality known. It
will be convenient to take this constant to be such that Y = (1 / N)E~ '" 1 Xa = X
has the distribution N(IJ.. I). Then the expected loss or risk of the estimator
Y is simply GIIY - 1J.1I2 = tr 1= p. The estimator proposed by James and Stein
is (essentially)
(2) - 2 ) (y-v)+v,
lIy-v
where v is an arbitrary flxed vector and p;;::: 3. This estimator shrinks the
observed y toward the specified v. The amount of shrinkage is negligible if y
is very different from v and is considerable if y is close to v. In this sense v
is a favored point
Theorem 3.5.1. With respect to the loss function (1). the risk of the estima-
tor (2) is less than the risk of the estimator Y for p ;;:: 3.
We shall show that the risk of Y minus the risk of (2) is positive by
applying the following lemma due to Stein (1974).
( 4)
then
oo 1 joo f'(x) 1
J
I2
(5) f(x)(x- 8)-e-"2(x-(J) dx=
_00 V27r _00
3.5 IMPROVED ESTIMATION OF THE MEAN 93
OO 1 I 2
(6) f9
[f(x) -f(8)](x- 8)-e-,(x-6) dx
'/27r
9 1
+
f
I '
[f(x) -f(O)](x- 8)-e-~(x-ln- dx
-00 /27r
OOfXf'(y)(x - 8)-e-,(J.-e)
1 I 1
=
f9 9 /27r
dydx
OOfOOf'(y)(x - O)-e-!(x-tJ)
1
f
2
= dxdy I
9 Y /27r
which yields the right-hand side of (5). Fubini's theorem justifies the in ter-
change of order of integration. (See Problem 3.22.) •
= $ f2 P- 2 ;. (Y _ )( Y _ v) _ (p - 2)1 }
fL\ IIY-vIl2 j":; I J-L, I I IIY'-vIl2'
94 ESTIMATION OF THE MEAN VECTOR AND THE COVARIANCE MATRIX
.\",- V,
p , f' (Y,) = ---::'""p--~
J=I J=I
mLxal
vr-------~~----~----~------~-------
(10)
(11) 1/ 1 = 1/2V- 1
... lIf- .... T
=e 2 - -,II} ; . (1'4
- .t....
2
) f3 _ _-;-;-1_--::-100
+ f-!) 0 V
2 +n - 2
lp
e
_!v d
v
~-o
'" 2
f3 t-'
=e
= le-
2
£ )f3_-:--_
~-o
(1'2
f3!( + f3-1)
1_ _
for p 2 3. Note that for .... = .... , that is, 1'2 = 0, (11) is 1/(p - 2) and the mean
squared error (10) is 2. For large f. the reduction in risk is considerable.
Table 3.2 gives values of the risk for p = 10 and u 2 = 1. For example, if
1'2 = II .... - .... 11 2 is 5, the mean squared error of the James-Stein estimator is
8.86, compared to 10 for the natural estimator; this is the case jf Jii - V t =
1/12 = 0.707, i = 1, ... ,10, for instance.
(12) g(u)~O,
(14)
plus 2 times
. 1
(21T)iP
1 exp { t[ f: yl-
t 1
2y 1 1l .... 11 + 11 .... 11
2
]) dy,
where y' =x'P, (11 ....11,0, ... ,0) = ....'p, and PP' =1. [The first column of P is
(1/II .... ID ..... ] Then (15) is 11 .... 11 times
1
--,,..-e
(21T)iP
i
'Ef, 2
-IY, du du ... dy
:.rl 'J2 p
~ °
(by replacing YI by -YI for Yl < 0). •
3.5 IMPROVED ESTIMATION OF THE MEAN 97
The theorem shows that m(Y) is not admissible. However, it is known that
m+(Y) is also not admissible, but it is believed that not much further
improvement is possible.
This approach is easily extended to the case where one observes x I' ••• , x,.,
from N(j.I., I) with loss function L(j.I., m) = (m - j.I.)/I -I(m - j.I.). Let I =
CC' for some nonsingular C, xQ = Cx!, a = 1, ... , N, j.I. = Cj.I.*, and
L*(m*, j.I.*) = IIm* - j.I.* 112. Then xi, ... , x~ are observations from N(j.I.*, 1),
and the problem is reduced to the earlier one. Then
( 18)
p-2 )+ _
( 1- N(x-v)'I-l(i-v) (x-v)+v
( 19)
Theorem 3.5.3. Let /'( z ). 0 :5 ::: < ce, he II /lOl1dccreasing differentiable func-
tion such that 0.::; /,(z):::;; 2(p - 2). Then for p ~ 3
(22) C(m*, fJ..*) =:: (m* - j.L* )'(m* - j.L*) = Ilm* - fJ..*1I2.
Thr..: r..:stimalor (2]) of fJ.. is tranSrmlllr..:J to thr..: r..:stimator of fJ..* = C-1(fJ.. - v),
(23)
(24)
Since r(z) is differentiable, we use Lemma 3.5.1 with (x - IJ) = (Yi - I-Lt )0.
and
( 25)
(26)
3.5 IMPROVED ESTIMATION OF THE MEAN 99
Then
(27)
y
tlR( *)=G*{2( _2)r(y,,&-2 ) +4r'(y',&-2y)- r2 y',&-2
... ... P Y',& 2 Y Y' ,& - Y
(28)
z laU-W-C+I e- iuz du
(29) r( z) = --=.0_ _ _ _ __
fot:ru iP - e ~uz du
C
i=1 j=i
p j
= E D:, E (mf - J.L'!)2
,= 1 i= 1
p
= E D:jllm*(J) - fL*U)\l2,
j= 1
where D:j=qj-qj+I' j=1, ... ,p-1, D:p=q;, m*U)=(mj, ... ,mj)', and
fL* U) = ( J.Li, ... , J.Lj)', j = 1, ... , p. This decomposition of the loss function
suggests combining minimax estimators of the vectors fL*U>, j = 1, ... , p. Let
y<n= (YI'''''Y/'
Theorem 3.5.4. If h(J)(y(f)) = [h\J)(yU»), . .. , h)j)(y(j»)], is a minimax esti-
matorof fL*U) under the loss function IIm*(j)- fL*U)1I 2, j= 1, ... ,p, then
1 p
(31) -=!ED:,h~,)(y(J»), i=1, ... ,p,
q. j=1
P
= E D:, $1I-.llh U)( y<J») - fL*(j)1I 2
j=l
P P
::; E D:,j = E qj
j-I j= 1
Since the expected value of G.(Y) with respect to (32) is (31) and the loss
function is convex, the risk of the estimator (31) is less than that of the
randomized estimator (by Jensen's inequality). •
i,j.k,l, = l, .... p
= N(1 -I- K)( 0".) O"kl + O"jk O"}I + 0",1 O"}~) + N( N - 1) O",} O"u.
102 ESTIMATION OF THE MEAN VECTOR AND THE COVARIANCE MATRIX
.'1
1 ~'
=-~'
N'2 E (XJ a - iLJ( Xl (3 - iLl )( Xkl - iLl )( X,6 - iLl)
(I:, {J. y. 6'= 1
N 1 N
( 8) cf E (XICC - iL,)( X}U - iL}) N E (Xk{J - iLd (Xll - IL-r)
a~l {J.,'=l
~cB-vec(bl.···.b.) ~ (H
( to)
See. e.g., Magnus and Neudecker (1979) or Section A.5 of the Appendix. We
can rewrite (4) as
Theorem 3.6.2
( 13)
3.6 ELLIPTICALLY CONTOURED DISTRIBUTIONS 103
This theorem follows from the central limit theorem for independent
identically distributed random vectors (with finite fourth moments). The
theorem forms the basis for large-sample inference.
N
(15) ~ E [(xa _x)'S-I(xa - X)]2 ~ p(p + 2)(1 + K).
a=!
A consistent estimator of K is
( 16)
(17)
104 ESTIMATION OF THE MEAN VECTOR AND THE COVARIANCE MATRIX
(18)
)1 Q f.L Q f.L
The estimator A
is a kind of weighted average of the rank 1 matrices
(xQ
~XxQ - ~)'. In the normal case the weights are liN. In most cases
-
(19) and (20) cannot be solved explicitly, but the solution may be approxi-
mated by iterative methods.
The covariance matrix of the limiting normal distribution of -IN (vec A-
vec A) is
where
p( p + 2)
(22)
O'[g = 4,s[g'(R:J.. 212 '
R
g(R2)
20'1g(1 - O'lg)
(23) 0' 2g = 2 + P (1 - (T 1g) •
(24)
3.6 ELLIPTICALLY CONTOURED DISTRIBUTIONS 105
be an N X P random matrix with density g(Y'Y) = g(L~= I y" y~). Note lhat
the density g(Y'Y) is invariant with respect to orthogonal transformations
y* :::: ON Y. Such densities are known as left spherical matrix densities. An
example is the density of N observations from NCO,lp)'
(25)
d
(29) vee Y= R vee U,
~NJ1
(30) r(~PI2) w1N p-l g(w),
vee U has the uniform distribution on L~= I L;= I u;Q = 1, and R and vee U are
independent. The covariance matrix of vee Y is
(31)
106 ESTIMATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
Since vec FGH = (H' ® F)vec G for any conformable matrices F, G, and H,
we can write (27) as
(32)
Thus
(33)
$R2
(34) -€(vec X) = (C ® IN )-€(vec Y)( C' (~IN) = Np A ® IN'
where x = (l/N)X' E N' This shows that a sufficient set of statistics for JI- and
A is x and nS = (X - ENX')'(X - ENX'), as for the normal distribution. The
maximum likelihood estimators can be derived from the following theorem,
which will be used later for other models.
'" - A m-
( 40) v=v, '1>= -~
wh '
3.6 ELLIPTICALLY CONTOURED DISTRIBUTIONS 107
and t~ maximum of the likelihood is 14>1 ih(w l, ) [Anderson, Fang, and Hsu
(1986)].
(42)
Under normality h(d) = (27r)- ~me- ~d, and the maximum of (42) is attained
at v = ii, 'It = 'It I~I -11 m ~, and d = m. For arbitrary h(') the maximum
=.!:
v
of (42) is attained at = ii, B= ii, and J = who Then the maximum likeli~
hood estimator of 4> is
( 43)
Theorem 3.6.4. Let X (N Xp) have the density (28), where wiNPg(w) has
a finite positive maximum at Wg' Then the maximum likelihood estimators of IL
and A are
(44) iJ.=x,
Corollary 3.6.1. Let X (N X p) have the density (28). Then the maximum
likelihood estimators of v, (Au,'''' App), and PiP i,j=l, .•. ,p, are x,
(p/wg)(aw ... ,a pp ), andai}/Valian" i,j= l, ... ,p.
Proof. Corolhlry 3.6.1 follows from Theorem 3.6.3 and Corollary 3.2.1. •
(45)
108 ESTIMATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
for all c. Then the distribution of f(X) where X h2s an arbitrary density (28) is
the same as its distribution where X has the normal density (28).
( 47)
by (45). Let f(X) = h(vec X). Then by (46), h(cX) = h(X) and
Any statistic satisfying (45) and (46) has the same distribution for all g(.).
Hence) if its distribution is known for the normal case, the distribution is
valid for all elliptically contoured distributions.
Any function of the sufficient set of statistics that is translation-invariant,
that is, that satisfies (45), is a function of S. Thus inference concerning 'I can
be based on S.
PROBLEMS
3.1. (Sec. 3.2) Find ii, t, and (PI) for the data given in Table 3.3, taken from
Frets (1921).
3,3, (Sec. 3.2) Compute ii, t, S, and P for the following pairs of observations:
(34,55),(12,29), (33, 75), (44, 89), (89,62), (59, 69),(50,41),(88, 67). Plot the obser~
vations.
3.4. (Sec. 3.2) Use the facts that I C* I = nAp tr c* = EAI , and C* = I if Al
Ap = 1, where A., ... , Ap are the characteristic roots of C*, to prove Lemma
3.2.2. [Hint: Use f as given in (12).}
PROBLEMS 109
tThese data, used in examples in the first edition of this book, came from Rao
(1952), p. 2_45. Izenman (1980) has indicated some entries were apparenrly
incorrectly copied from Frets (1921) and corrected them (p. 579).
3.S. (Sec. 3.2) Let be the body weight (in kilograms) of a cat and
Xl X2 the heart
weight (in grams). [Data from Fisher (1947b)']
5.0 3.0 1.6 0.2 6.6 3.0 4.4 1.4 7.2 3.2 6.0 1.8
5.0 3.4 1.6 0.4 6.8 2.8 4.8 1.4 6.2 2.8 4.8 1.8
5.2 3.5 1.5 0.2 6.7 3.0 5.0 1.7 6.1 3.0 4.9 1.8
5.2 3.4 1.4 0.2 6.0 2.9 4.5 1.5 6.4 2.8 5.6 2.1
4.7 3.2 1.6 0.2 5.7 2.6 3.5 1.0 7.2 3.0 5.8 1.6
4.8 3.1 1.6 0.2 5.5 2.4 3.8 1.1 7.4 2.8 6.1 1.9
504 3.4 1.5 0.4 5.5 2.4 3.7 1.0 7.9 3.8 6.4 2.0
5.2 4.1 1.5 0.1 5.8 2.7 3.9 1.2 6.4 2.8 5.6 2.2
5.5 4.2 1.4 0.2 6.0 2.7 5.1 1.6 6.3 2.8 5.1 1.5
4.9 3.1 1.5 0.2 5.4 3.0 4.5 1.5 6.1 2.6 5.6 1.4
5.0 3.2 1.2 0.2 6.0 3.4 4.5 1.6 7.7 3.0 6.1 2.3
5.5 3.5 1.3 0.2 6.7 3.1 4.7 1.5 6.3 3.4 5.6 2.4
4.9 3.6 1.4 0.1 6.3 2.3 4.4 1.3 6.4 3.1 5.5 1.8
4.4 3.0 1.3 0.2 5.6 3.0 4.1 1.3 6.0 3.0 4.8 1.8
5.1 3.4 1.5 0.2 55 25 4.0 1.3 6.9 3.1 5.4 2.1
PROBLEMS 111
5.0 3.5 1.3 0.3 5.5 2.6 4.4 1.2 6.7 3.1 5.6 2.4
4.5 2.3 1.3 0.3 6.1 3.0 4,6 1.4 6.9 3.1 5.1 2.3
4.4 3.2 1.3 0.2 5.8 2.6 4.0 1.2 5.8 2.7 5.1 1.9
5.0 3.5 1.6 0.6 5.0 2.3 3.3 1.0 6.8 3.2 5.9 2.3
5.1 3.8 1.9 0.4 5.6 2.7 4.2 1.3 6.7 3.3 ;).7 1..5
4.8 3.0 1.4 0.3 5.7 3.0 4.2 1.2 6.7 3.0 5.2 2.1
5.1 3.8 1.6 0.2 5.7 2.9 4.2 1.3 6.3 2.5 5.J 1.9
4.6 3.2 1.4 0.2 6,2 2.9 4.3 1.3 6.5 3.0 5.2 2.0
5.3 3.7 1.5 0.2 5.1 2.5 3.0 1.1 6.2 3.4 5,4 2.3
5.0 3.3 1.4 0.2 5.7 2.8 4.1 1.3 5.9 3.0 5.1 1.8
281.3) 3275.55 )
[xu = ( 1098.3 ' 13056.17 .
3.7. (Sec. 3.2) Invariance of the sample co"elation coefficient. Prove that T12 is an
invariant characteristic of the sufficient statistics i and S of a bivariate sample
under location and scale transformations (x1a = b/xia + C il b, > 0, i = 1,2, a =
1, ' .. , N) and that every function of i and S that is invariant is a function of
T 12' [Hint: See Theorem 2.3.2.}
3.8. (Sec. 3.2) Prove Lemma 3.2.2 by induction. [Hint: Let HI = h II,
H,-
_(Hh' I _ 1
i=2, ... ,p,
(1)
N
N(~-l) E (xa -x13 )(xa -x13 Y= N~ 1 E (xa-i)(xa-xy·
a<~ a-I
(Note: When p = 1, the left-hand side is the average squared differences of the
observations.)
112 ESTIMATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
3.10. (Sec. 3.2) Estimation of I when fJ. is known. Show that if Xl'·'.' XN constitute
a sample from N(fJ., I) and fJ. is known, then (l/N)'f.~_l(Xa - fJ.Xxa - fJ.)' is
the maximum likelihood estimator of I.
A 1 N
9 =z= N E za'
a~1
3.12. (Sec. 3.2) Prove Lemma 3.2.2 by using Lemma 3.2.3 and showing N logl CI -
tr CD has a maximum at C = ND- l by setting the derivatives of this function
with respect to the elements of C = I -1 equal to O. Show that the function of C
tends to - 00 as C tends to a singular matrix or as one or mOre elements of C
tend to 00 and/or - 00 (nondiagonal elements); for this latter, the equivalent
of (13) can be used.
3.13. (Sec. 3.3) Let Xa be distributed according to N( 'YCa, I), a = 1, ... , N, where
'f.c;>O. Show that the distribution of g=(l/'f.c~)'f.caxa is N['Y,(l/'f.c~)I].
Show that E = 'f.a(Xa - gcaXXa - gc a )' is independently distributed as
'f.~::: Za Z~, where Zl' ... ' ZN are independent, each with distribution MO, I).
[Hint: Let Za = 'f.b a {3X{3' where b N {3 = c{3/ J'f.c~ and B is orthogonal.]
3.14. (Sec. 3.3) Prove that the power of the test in (19) is a function only of p and
[N 1N 2 /(N l + N2)](fJ.(I) - fJ.(2», I -1(fJ.(I) - fJ.(2»), given u.
3.15. (~ec.. 3.3) Efficiency of the mean. Prove that i is efficient for estimating fJ..
3.16. (Sec. 3.3) Prove that i and S have efficiency [(N - l)/N]P(I'+ll/2 for estimat-
ing fJ. and :t.
3.17. (Sec. J.2) Prove that Pr{IA I = O} = 0 for A defined by (4) when N > p. [Hint:
Argue that if Z! = (Zl' ... ,Zp), then IZ~ I "* 0 implies A = Z; Z; I +
'f.~':).HZaZ~ is positive definite. Prove PdlZjl =ZnIZj-11 +'f.!:IIZ,jcof(ZjJ)
= O} = 0 by induction, j = 2, ... , p.]
PROBLEMS 113
l-~(~+l:)-I = l:(~+.l:)-I.
~-~(~+l:)-I~=(~-I +l:-I)-I.
fX 1f'(y)I--e-:()'-~)
f OOfOO\ f'(y)(x- B)--e-;(.'-O)
tly &
1 I : \
dxdy:.::
n
1
&
I :
dy,
(J J.'" \f'(y)(B-x)--e-,(\-H)
I ~ I \ d\'dy= J H I
1f'(Y)I--e-'(\-~)
I : dy.
J _00 _::0 & -x &
3.23. Let Z(k) = (Z;/kH, where i= 1, ... ,p, j= l.. .. ,q and k= 1.2..... be: a
sequence of random matrices. Let one norm of a matrix A be N1(A) =
max;.) mod(a'j)' and another he N::(A) == [,.) a~ = tr AA'. Some alternative
ways of defining stochastic convergence of Z(k) to B (p x q) are
Prove that these three definitions are equivalent. Note that the definirion of
X(k) converging stochastically 10Cl is thaI for every arhilrary positive 8 and r..
we can find K large enough so that for k > K
3.24. (Sec. 3.2) Covariance matrices with linear structure [Anderson (1969)1. Let
'I
(i) l: = L. (]"~Gg.
~ II
ESTIMATION OF THE MEAN VECfOR AND THE COVARIANCE MATRIX
where GO, ...• Gq are given symmetric matrices such that there exists at least
one (q + l).. tuplet 0'0.0'1 ••••• O'q such that (i) is positive definite. Show that the
likelihood equations based on N observations are
(iii) ~ i- -1 G i- -1 G .. (i) - ~ :. -1 G :. -1 A
L. tr -.-1 g-j-1 hO'h - N tr ':'j-1 g':'j-1 , g=0,1, ••• ,q,
iI-O
~
I
CHAPTER 4
4.1. INTRODUCTION
115
116 SAMPLE CORRE LATION COEFFICIENTS
( 1)
4.2 CORRElATION r.OEFFICIENT OF A BIVARIATE SAMPLE 117
(2)
(3)
We shall consider
( 4)
where
N
(5) a jj = ~ (Xja -x,)(Xja -x,), i.j= 1,2.
a=1
(7)
and the pairs (zl\, z4 1 ),'· .,(ZIN' Z2N) are independently distributed.
118 SAMPLE CORRELATION COEFFICIENTS
Figure 4.1
( 8)
If VI is fixed, we can rotate coordinate axes so that the first coordinate axis
lies along L'l. Then bV I has only the first coordinate different from zero, and
L'~ - bL't has this first coordinate equal to zero. We shall show that cot 8 IS
proportional to a t-variable when p = o.
We usc thL following lemma.
is the marginal density of y~ I), and the conditional density of YF), ... , y~2)
given l"1( I) = yjl), ... , y~l) = y~l) is
(9)
•
4.2 CORRElATION COEFFICIENT OF A BIVARIATE SAMPLE 119
Write V, = (Z;], ... , Z;,)', i = 1,2, to denote random vectors. The condi-
tional distribution of ZZa given Zla = z!a is N( {3z1a' O'Z), where {3 = fJlTz/ (It
and O'z = (Il(1 - pZ). (See Sei,;tion 2.5J The density of Vz given V[ = VI is
N( {3t'I' O'z1) since the ZZ<r are independent. Let b = V2VJV'IV t (= aZl/a ll ),
so that bV;(Vz bv1 ) = 0, and let U (VZ - bvl)'(Vz - bv l ) = ViVz - bZulv t
(= all - arz/au). Then cot tJ = b"";an/U. The rotation of coordinate axes
involves choosing an n X n orthogonal matrix C with first row (1/c)v~, where
Z_ ,
c - VtV t •
We now apply Theorem 3.3.1 ,vith Xa = ZZa. Let Ya = r.{>ca{>Zz{>, a =
1, ... , n. Then Y1, ••• , Y,i are independently normally distributed with vari-
ance 0' 2 and means
(10)
n n
(ll) GYa = 2: ca1' {3z 11' = {3c 2: Ca1'c l1' = 0, a:l=!.
y=! 1'= 1
" n ~
(12) U= 2: Zia- b2 2: zta= 2: yaz _y l2
a=1 a I a=1
= "
'-' a'
y2
a 2
Lemma 4.2.2. If (Z1a' Z20.)' a = 1, ... , n, are independent, each pair with
density (7), then the conditional distributions of b = r.~= 1 ZZo. Zla/r.~= 1 Z;o.
and U/O'2=r.:=I(ZZa-bZla)2/O'Z given Zla=z!o.' a=1, ... ,n, are
N( {3, 0'2 /c 2 ) (c 2 = r.:_ 1 zia) and X Z with n - 1 degrees of freedom, respec-
tively; and band U are independent.
(13)
120 SAMPLE CORRELATION COEFFICIENTS
(14)
(16)
Since w = r(l- r2)- t, we have dwldr = (1- r2)- ~. Therefore the density of
r is (replacing n by N - 1)
(17)
From (I7) we see that the density is symmetric about the origin. For
N> 4, it has a mode at r = 0 and its order of contact with the r-axis at ± 1 is
HN - 5) for N odd and IN -3 for N even. Since the density is even, the
odd moments are zerO; in particular. the mean is zero. The even moments
are found by integration (letting x = r2 and using the definition of the beta
function). That tCr 2m = nHN - O]r(m + 4)/{J;"n4(N - 0 + m]} and in
particular that the variance is 1/(N - 0 may be verified by the reader.
The most important use of Theorem 4.2.1 is to find significance points for
testing the hypothesis that a pair of variables are not correlated. Consider the
4.2 CORRELATION COEFFICIENT OF A BIVARIATE SAMPLE 121
hypothesis
(IS)
for some particular pair (l, j). It would seem reasonable to reject this
hypothesis if the corresponding sample correlation coefficient were very
different from zero. Now how do we decide what we mean by "very
different"?
Let us suppose we are interested in testing H against the alternative
hypotheses P,} > O. Then we reject H if the sample correlation coefficient 'I}
is greater than some number '0' The probability of rejecting H when H is
true is
(19)
(20)
(21 )
(22)
122 SAMPLE CORRELATION COEFFICIENTS
(25)
1
o
(26) a(b,u) 1- all =-
1
1a(a I2 , an) - -2~ 1
all
Thus the density of all' a 12' and an for all ~ 0, a 22 ~ 0, and all an - a~2 ;;::: 0
IS
(27)
where
124 SAMPLE CORRELATION COEFFICIENI'S
(29)
for A positive definite, and. 0 otherwise. This is a special case of the Wishart
density derived in Chapter 7.
We want to find the density of
(30)
where arL = aLL 1a}, a~2 = a22 /0"22, and a72 = a L2/( 0"10"2)· The tra:tsformation
is equivalent to setting O"L = 0"2 = 1. Then the density of aLL' a22' and
r = a 12 /Ja ll a22 (da L2 = drJa ll a 22 ) is
(31)
where
To find the density of r, we must integrate (31) with respect to all and a22
over the range 0 to 00. There are various ways of carrying out the integration,
which result in different expressions for the density. The method we shall
indicate here is straightforward. We expand part of the exponential:
(34)
Since
(36)
(1 - p2)tn2nV-;r(tn)r[~{n -1)1
.£a~O a
l{pr)"2
.(1 - p )
"r2[~{n+a)]2n+a(1_p~)II+a
r{ z)( z + ~)
2z 1
(37) r(2z) =2 -
{;
(38)
where n =N -1.
(39)
( 4,0)
. (1 - pr) -n + ~. F (I2,2";
I n + 2;1 1 +2 pr ) '
where
. . . _ ~ f( a + j) f( b + j) f( c) xi
( 41) F(a,b,c,x)- J~O f(a) f(b) f(c+j) j!
(43) H: p= Po'
If the alternatives are p> Po' we reject the hypothesis if the sample correla-
tion coefficient is greater than ro, where ro is chosen so 1 - F(roIN, Po) = a,
t he significance level. If the alternatives are p < Po, we reject the hypothesis
if the sample correlation coefficient is less than r;), where r~ is chosen so
FU''oIN, Po) = a. If the alternatives arc P =1= Po' the region of rejection is r > rl
and r <r'" where r l and r~ are chosen SO [1- F(r,IN, Po)] + F(r;IN. Po) = a.
David suggests that r I and r; be chosen so [1 - F(r II N. Po)] = F(r; IN, Po)
= 1a. She has shown (1937) that for N'?. 10, Ipl s;; 0.8 this critical region is
nearly the region of an unbiased test of H, that is, a test whose power
function has its minimum at Po.
It should be pointed out that any test based on r is invariant under
transformations of location and scale, that is, x;a = bix ra + Ci' b, > 0, i = 1,2,
inverse of r = fie p), i = 1,2, then the inequality fl( p) < r is equivalent tot
P <fI I(r), and r <fzC p) is equivalent to hl(,) < p. Thus (44) can be written
(45)
This equation says that the probability is 1 - a that we draw a sample such
that the inteIVal (f2 I (r),J'i l (r)) covers the parameter p. Thus this inteIVal is
a confidence inteIVal for p with confidence coefficient 1 - a. For a given N
and a the CUIVes r = fl( p) and r = f2( p) appear as in Figure 4.3. In testing
the hypothesis p = Po, the intersection of the line p = Po and the two CUIVes
gives the significance points rl and r~. In setting up a confidence region for p
on the basis of a sample correlation r*, we find the limits 1:;1 (r*) and
Figure 4.3
tThe point (fl( p), p) on the first curve is to the left of (r, p), and thl' point (r,[ll(r» i~ above
(r, p).
4.2 CORRELATION COEFFICIENT OF A BIVARIATE SAMPLE 129
fll(r* ) by the intersection of the line r = r* with the two curves. David gives
these curves for a = 0.1, 0.05, 0.02, and 0.01 for various values of N. One-
sided confidence regions can be obtained by using only one inequality above.
The tables of F(r\N, p) can also be used instead of the curves for finding
the confidence interval. Given the sample value r*, fll(r*) is the value of p
such that ~a = Pr{r ~ r*\ p} = F("* IN, p), and similarly [21(r*) is the value
of P such that ~a = Pr{r 2. r* I p} = 1 - F(r* IN, p). The interval between
these two values of p, (fi. I (r*), fli (r* », is the confidence interval.
As an example, consider the confidence interval with confidence coeffi-
cient 0.95 based on the correlation of 0.7952 observed in a sample of 10.
UsL1g Graph II of David, we 1ind the two limits are 0.34 and 0.94. Hence we
state that 0.34 < P < 0.94 with confidence 95%.
SUPOE w L (x,8)
(46) A(x) = ----'=----=-=--=------'-----'-
supo-= u L (x,8) .
The likelihood ratio test is the procedure of rejecting the null hypothesis .... -/zen
A(x) is less than a predetermined constallI.
Intuitively, one rejects the null hypothesis if the density of the observa-
tions under the most favorable choice of parameters in the null hypothesi~ is
much less than the density under the most favorable unrestricted choice of
the parameters. Likelihood ratio tets have some desirable featlJres~ ~ee
Lehmann (195~), for example. Wald (1943) has proved some favorable
asymptotic properties. For most hypotheses conceming the multivariate
normal distribution, likelihood ratio tests are appropriate and often are
optimal.
Let us consider the likelihood ratio test of the hypothesis that p = Po
based on a sample xl"'" x N from the bivariate normal distribution. The set
n consists of iJ.l' iJ.2' (Tl' (T2' and p such that (Tl > 0, (T'2. > O. - I < P < L
The set w is the subset for which p = Po. The likelihood maximized in n is
(by Lemmas 3.2.2 and 3.2.3)
(47)
130 SAMPLE CORRELATION COEFFICIENTS
(48)
where (Tz= (Tl(TZ and T= (T1/(T2' The maximum of (48) with respect to T
occurs at T = va:: va
I :'2 • The concentrated likelihood is
( 49)
(51 )
The likelihood ratio test is (1- pJXl - ,2Xl - PO,)-2 < c, where c is chosen
so the probability of the inequality when samples are drawn from normal
populations with correlation Po is the prescribed significanc(' leveL The
critical region can be written equivalently as
( 52)
or
(53)
r < Poc - (1 - pJ)v'i"=C"
pJc + 1 - pJ
(54) r(n)
(55)
where Cgh(n) =A/1h(n)/ ~ G"gg G"hll . The set Cin), C)in), and Cin) is dis-
tributed like the distinct elements of the matrix
where
Oi)
p
Let
(57)
(58)
b= l~).
132 SAMPLE CORRELATION COEFFICIENTS
2p
(59) 2p
1 + p2
Proof See Serfling (1980), Section 3.3, or Rao (1973), Section 6a.2. A
function g(u) is said to have a differential at b or to be totally differentiable
at b if the partial derivatives ag(u)/ au, exist at u = b and for every e> 0
there exists a neighborhood Ne(b) such that
(60)
It is cleJr that U(n) defined by (57) with band T defined by (58) and (59),
respectively, satisfies the conditions of the theorem. The function
(61)
(62)
4.2 CORRElATION COEFFICIENT OF A BIVARIATE SAMPLE 13J
(63)
2p
2p
l(-~p
-"'iP
1 + p2 1
I
- iP
= ( p _ p .. , p - p~, 1 - p'2) I
- iP
.., ..,
=(l-p-f·
1
(64) f ( p)
I
= 1 _ p2 ="21( 1 +1 p + 1 -1p) .
I 1+r -1
( 65) z = -log-- = tanh r
2 1- r '
1 1+ p
(66) ,=-log--.
2 1- p
134 SAMPLE CORRELATION COEFFICIENTS
( 67)
( 68)
., 1 8 - p2
(69) G(z {r=-+
1] 4n-
? + ...
and holds good for p2/n 2 small. Hotelling (1953) gives moments of z to order
n -~. An important property of Fisher's z is that the approach to normality is
much more rapid than for r. David (938) makes some comparisons between
the labulat~d probabilities and the probabilities computed by assuming z is
normally distributed. She recommends that for N> 25 one take z as nor-
mally distributed with mean and variance given by (67) and (68). Konishi
(1978a, 1978b, 1979) has also studied z. [Ruben (1966) has suggested an
alternative approach, which is more complicated, but possibly more accurate.]
We shall now indicate how Theorem 4.2.5 can be used.
( 1 + Po
(70) '0 = '2log 1 - Po .
(72) ./
v N - 3 1z '0 Nipo
_ f I > 1.96.
(73)
1
(77) Pr{X=x a } =N'
- a=l, ... ,N
136 SAMPLE CORRELATION COEFFICIENTS
A rand Jm ~ample of size N drawn from this finite population ha.; a probabil-
ity distribution, and the correlation coefficient calculated from such a sample
has a (discrete) probability distribution, say PN(r). The bootstrap proposes to
use this distribution in place of the unobtainable distribution of the correla-
tion coefficient of random samples from the parent population. However, it is
prohibitively expensive to compute; instead PN(r) is estimated by the empiri.
cal distribution of r calculated from a large number of random samples from
(77). Diaconis and Efron (1983) have given an example of N = 15; they find
the empirical distribution closely resembles the actual distribution of r
(essentially obtainable in this special case). An advantage of this approach is
that it is not necessary to assume knowledge of the parent population;
a disadvantage is the massive computation.
(1) x(1)
X= ( X(2)
1
,
then the conditional distribution of X(l) given X(2) = X(2) is N[IJ,(l) + p(X(2) -
1J,(2), I U . 2 ]' where
(2)
(3)
N
(4) A= E (xa-i)(xa- i )'
a=1
We can now apply Corollary 3.2.1 to the effect that maximum likelihood
estlmators of functions of pa rameters are those functions of the maximum
likelihood estimators of those parameters.
Theorem 4.3.1. Let XI"'" X N be a sample from N( jJ., I), where jJ.
and I are partitioned as in (1). Define A by (4) and (i(I)' i(~)') =
(1/N)LZ=I(X~)' X~2)'). Then the maximltm likelihood estinJatOl:'1 of jJ.(l). jJ.(21.
13. I ll .2 • and I22 are fJ.,(I) = i(I), fJ.,(2) = i<2l,
(7)
where
( 10)
The estimator P,),q + I, ... , p' denoted by r,),q +I. . '" p' is called the sample
paltiai co"elatiOIl coefficient between X, and X) holding X q+ I' ... , Xp fixed. It is
also called the sample partial correlation coefficient between Xi and Xi
having taken account of Xq+ I " " , Xp. Note that the calculations can be done
in terms of (r,)-
The matrix A 11.2 can also be represented as
N
(11) All ~ = E
a=1
[X~I) - x(l) - P{ X~2) - X(2)) 1[X~l) - x(l) - P{ X~2) - x(2)) l'
A A
= All - PA 22 P I.
(12)
P
( 13) X,=X,+ E ~,)(x)-x)), i = 1, ... , q,
j=q+l
where X" x) are running variables. Here ~i) is an element of = 112 1221 = P
A 12 A 221 . The ith row of P
is ( ~/ q+ I' ••. , ~/P)' Each right-hand side of (13) is
the least squares regression function of x, on Xq+I""'Xp; that is, if we
project the points XI"'" x N on the coordinate hyperplane of Xi' Xq+ I " ' " Xp'
4.3 PARTIAL CORRELATION COEFF"ICIENTS 139
N
Figure 4.4
is on the hyperplane (13). The difference in the ith coordinate of Xa Hnd the
point (14) is Yia = Xja - (Xi + Lr~q + I ~ixJa - i J)] for i = 1, ... , q anJ 0 for
the other coordinates. Let y:. = (y 1",' .•. , Yq<l). These points can be repre-
sented as N points in a q-dimensional space. Then A ll .2 = L~=IYaY~.
We can also interpret the sample as p points in N-space (Figure 4.4). Let
u J =(x;I' ... ,xi1Y )' be the jth point, and let E = 0, ... , 1)' be another point.
The point with coordinates ii' ... ' Xi is XiE. The projection of u, on the
hyperplane spanned by u q+ I, •.• , up, E is
p
(15) u =i;E + E
j ~tl uJ - XjE);
j=q+l
this is the point on the hyperplane that is at a minimum distance from U j • Let
uj be the vector from u; to u" that is, Uj - U" or, equivalently, this vector
translated so that one endpoint is at the origin. The set of vectors ut, ... , u~
are the projections of U l ' ... ' u q on the hyperplane orthogonal to
140 SAMPLE CORRELATION COEFFICIENTS
Up, E. Then u;'ui = a,'.qtl ..... p, the length squared of uj (Le., the
U qtl , ••• ,
square of the distance of u from 12). Then Uf 'u; / ,ju; 'UfU; 'u; = r ij .qt I • .... p
is the cosine of the angle between u; and uj.
As an example of the use of partial correlations we consider some data
[Hooker (1907)] on yield of hay (XI) in hundredweights per acre, spring
rainfall (X2 ) in inches, and accumulated temperature above 42°P in the
spring (X3 ) for an English area over 20 years. The estimates of /-L" (T,
(= F,;), and P,} are
( 28.02)
91
fJ, = i = 59:. ,
(16)
(::) ~ 4.42)
1.10 ,
85
1 0.80 40
PI2
1 P") (
P23::::;
1.00
0.80 1.00
-0. 1
-0.56 .
P21
From the correlations we observe that yiel.d and rainfall are positively
related, yield and temperature are negatively related, and rainfall and tem-
perature are negatively related. What interpretation is to be given to the
apparent negative relation between yield and temperature? Does high tem-
perature tend to cause low yield, or is high temperature associated with low
rainfall and hence with low yield? To answer this question we consider the
correlation between yield and temperature when rainfall is held ftxed; that is,
we use the data given above to estimate the partial correlation between Xl
and X3 with X 2 held fixed. It is t
(17)
Thus, 'f thl.! effect of rainfall is removed, yield and temperature :ue positively
correlated. The conclusion is that both hif;h raninfall and high temperature
increase hay yield, but in most years high rainfall occurs with low tempera-
ture and vice versa.
(%""'1 a=l
=FHF' =1.
(20) E= (!:)
142 SAMPLE CORRELATION COEFFICIENTS
Note that
~ ul!}:,p
== V( ~ )F= U(2) P,
m
(24)
a=m-rt I
Thus C is
II! m m m-r
L: }~, 1:; - GHG' = E v.,U{; - E VnU,: == E V"a.:.
" I " I ,,-III-reI l\'=1
(26) Za =
z(I)
Z~2)
1.
(
x(2) = (X 2 , •• • , X p )'; we shall not need subscripts on R.' The variables can
always be numbered so that the desired multiple correlation is this one (any
irrelevant variables being omitted). Then the multiple correlation in the
population is
( 1)
(2)
(3)
( 4)
(5) R=
( 6)
IAI
The quantrtles R and ~ have properties in the sample that are similar to
those R and ~ have in the population. We have analogs of Theorems 2.5.2,
2.5.3, and 2.5.4. Let xla=xl +~'(X~2)_X(2)), and xTa=x1a-x la be the
residual.
Theorem 4.4.1. Tile residuals xfa are unco"elated in the sample with the
components of X~2). a = 1, ... , N. For every ve( tor a
N 2 N ,
(7) E [Xla-Xl-~'(X~)_x(2))] ::; E [xlll-il-a'(x~2)-ir~\)r·
a=l a=l
146 SAMPLE CORRELATION COEFFICIENTS
The sample correlation between x I" and a' X~,2), a = I, ... , N, is maximized for
a :::;; ~. and that ma:x:imum co"elacion is R.
Proo,t: Since the sample mean of the residuals is 0, the vectOr of sample
covariances between :rt and X~2) is proportional to
Q
\'
\ 8) t [( .\'\", -.\' I) - ~' (X~2) - .r(2))]( X~2) - j(2»), = a(l) - P' A22 = O.
'" 1
The right-han,d side of (7) can be written as the left-hand side plus
N ,
(9) 1: [(P - a)'( x;) - j(2»)r
"'''I
N
= (P - a) 1: (x~) -
I j(2») ( x~) - i(2»), (P - a),
","'1
N N
(10) all 2 1: (XI" idP'(x;;) -i(2)) + 1: (p'(x~) -i(2»)r
0-1 aal
N N
saIl 2 1: (XI" -ida'(x;) -i(:!») + 1: [al(x~) j(2»)f,
a-I ",=1
( \1)
Thus .t\ + ~'(X~:) -1(2)) is the best linear predictor of Xl", in the sample,
and P'X~I is the linear function of X~2) that has maximum sample correlation
4.4 THE MULTIPLE CORRELATION COEFFICIENT 147
with Xl 0 ' The minimum Sum of squares of deviations [the left-hand side of
(7)] is
N
(12) L [(x la -XI) - ~I(X~) _i(2))]2 =a 11 - ~'A22~
a=l
(13)
N ....
( 14) '\'
i-.J [a
p
l
(
xa(2) _ -(2))] 2--.p
X
.......
a'A 22P-a(l)
a - , A-I
22 a (l)'
a=1
and the length squared of the first vector is L~=I(Xla -i\)2 = all' Thus R is
the cosine of the angle between the first vector and its projection.
148 SAMPLE CORRELATION COEFFICIENTS
~~--------------------------l
Figure 4.5
In Section 3.2 we saw that the simple correlation coefficient is the cosine
of the angle between the two vectors involved (in the plane orthogonal to the
equiangular line). The property of R that it is the maximum correlation
between X Ia and linear combinations of the components of X~2) ~orresponds
to the geometric property that R is the cosine of the smallest angle between
the vector with components x I a - XI and a vector in the hyperplane spanned
by the other p - 1 vectors.
The geometric interpretations are in terms of the vectors in the (N - I)-
dimensional hyperplane orthogonal to the equiangular line. It was shown in
Section 3.3 that the vector (XII - XI" .. , XiN - X) in this hyperplane can be
designated as (ZiP"" Zi N-I), where the zia are the coordinates referred to
an (N - I)-dimensional 'coordinate system in the hyperplane. It was shown
that the new coordinates are obtained from the old by the transformation
Zia = L.~,",l ba {3xi{3' a = 1, ... , N, where B = (b a {3) is an orthogonal matrix
with last row OlIN, ... , 11 IN). Then
N N-I
(15) aij = E (xia -Xi)(X ja -XJ) = E ZiaZja'
a=1 a=1
sufficient set of statistics for jJ. and ~, thaI i~ invariant under these transfor-
mations. Just as the simple correlation,. is a measure of association between
two scalar variables in a sample, the multiple correlation R is a measure of
association between a scalar \ ariable and a vector variable in a sample.
(16)
then
(17)
and
(18)
For q = 1, Corollary 4.3.2 states that when ~ = 0, that is, when R = O. all 2 is
· 'b uted as L,<;"N-P
d Istn V2 d a(l)A22 -1 . d'
Istn'b ute d as Lo.=N-p+1
,<;"N-[ V' h
a =[ a an all) IS 0.-. were
I
VI' ., . ,VN - 1 are independent, each with distribution N(O, (Tn :!). Then
a n . 2 / (T11.2 and a'(I)A2i a(\/ (TIl 2 are distributed independently as xl-varia-
1
(19)
~
Xp--l N-p
=-'~'-p 1
XN-p -
p-l
- - Fp-I.N-p
N-p
(~l) R= p-l
1 + N_pFp-I,N-P
IS
The observations are given; L is a function of the indeterminates fL*, I*. Let
(u be the region in the parameter space n
specified by the null hypothesis.
The likelihood ratio criterion is
(24)
4.4 THE MULTIPLE CORRELATION COEFFICIENT 151
(25)
(26)
The first factor is maximized at J.l.i = III Xl and 0"1) 0"1) (1 j N)a u ' and
the second factor is maximized at j.l(2)* = fJ,(2) = f(2) and ~~2 == i22 =
(ljN)A 22 . The value of the maximized function is
(28)
The likelihood ratio test consists of the critical region A < Ao, where Ao is
chosen so the probability of this inequality when R 0 is the significance
level a. An equivalent test is
(29)
(30)
Theorem 4.4.3. Given a sample x I' ... , X N from N(JL, I), the likelihood
ratio test at significance level a for the hypothesis R = 0, where R is the
population multiple co"elation coefficient between XI and (X2 , •.. , X p )' is given
by (30), where R is the sample multiple co"elation coefficient defined by (5).
As an example consider the data given at the end of Section 4.3.1. The
sample multiple correlation coefficient is found from
1 rJ2 r l3
1.00 0.80 -0.40
r 21 1 r23
0.80 1.00 -0.56
r 31 r32 1 -0.40 -0.56 1.00
(31) 1- R2 = = 0.357.
1 r2;l 1 1.00 -0. 56 1
~0.56 1.00
r32 1
Thus R is 0.802. If we wish to test the hypothesis at the 0.01 level that hay
yield is independent of spring rainfall and temperature, we compare the
observed [R 2 /(l-R 2 )][(20-3)/(3-1)]=15.3 with F 2,I7(0.01) = 6.11 and
find the result significant; that is, we reject the null hypothesis.
The test of independence between XI and (X z'" ., Xp) =X(2)' is equiva-
lent to the test that if the regression of XI on x(Z) (that is, the conditional
.
expecte d va Iue 0 f X I gIVen X 2 -x
- 2 ,···, X p --;1' ).IS #LI + f,lr(
t" X
(2) - (2))
JL, t he
vector of regression coefficients is O. Here p = A 22 a(l) is the usual least
I
squal es estimate of ~ with expected value p and covariance matrix 0"11-2 A 221
(when the X~2) are fixed), and all _2/(N - p) is the usual estimate of O"U'2'
Thus [see (18)]
(32)
is the usual F-statistic for testing the hypothesis that the regression of XI on
x Z"'" xp is O. In this book we are primarily interested in the multiple
correlation coefficient as a measure of association between one variable and
a vector of variables when both are random. We shall not treat problems of
univariate regression. In Chapter 8 we study regression when the dependent
variable is a vector.
(33) 1
which is equivalent to
(34)
where ~ I221 U(1) and (J"1\.2 = (J"n - u(I)I 2:!IU(l)' The conditions are those
of Theorem 4.3.3 with Ya=Zla, r W, MIa Z~), r=p 1, <P==(J"Il.:!,
I
m = n. Then a ll . 2 = all - a(I)A 22 a(JI cOrresponds to L~-l Y" Y,; GHG'. and
a ll .2/(J"lJ.2 has a X2-distribution with n - (p 1) degrees of freedom.
I
a(l) A 221 a(l) (A 22 a(I»)' A 22 (A 2la(l») corresponds to GHGt and is distributed
as LO/Ual, a = n - (p - I) + 1, ... , n, where Var(Ua ) (J"1t.:! and
(35)
(36)
154 SAMPLE CORRELATION COEFFICIENTS
(p-l)exp[ -!P'A22~/O"1l'2]
(37) (N-p)r[HN-p)]
cxp[ lUI
21-'
u/O"]
A 221-' 112 (1 _
,
wr~(N-p)-1
(38)
r[t(N-p)]
P'L~= I Z~)Z~)I~
(39)
0"11-2
4.4 THE MULTIPLE CORRELATION COEFFICIENT 155
Since the distribution of Z~2) is N(O, In), the distribution of WZ~2) / ~ (Til· 2
is normal with mean zero and variance
$WZ~2)Z~2) /~
( 40)
(T1I.2
WI22~ WI22~/(T1I
= (Til - WI22~ = 1- WI22~/(Ttl
iP
1- R 2 '
2!n+" rOn + a)
V2 2
4>a rOn+a)
(1 + 4»1 n a
+ rOn)
Applying this result to (38), we obtain as the density of R2
(1 - R2) i(n-p- 1)(1 _ R2) in 00 (iP( (R2) tfp-I )+r I r20n + IL)
( 42)
r[~(n -p + 1)]rOn) p.f:o IL!r(~(p -1) + IL]
Fisher (1928) found this di~tribution. It can also be written
(43)
'F[ln 1
2 '2 (p-1)·R
2 , In· , R],
2 2
(44)
a) t( /I - p + I) L
=
( at
_ {'i n - I
1=1
(1_R2)~n(R2)1(P-3)(1_ R2)i(n- p -n
( 45)
r[~(n-p+l)]
Theorem 4.4.5. The density of the square of the multiple co"elation coeffi-
cient, R\ between XI and X 2 , •.. , Xp based on a sample of N = n + 1 is given
by (42) or (43) [or (45) in the case of n - p + 1 even], where iP is the
co"esponding population multiple correlation coefficient.
'1 (1-
1
R 2 )1(n- p+O-I(R 2 )i(p+h-I)+r I d(R 2 )
o
_ (1_R2)tn E (R2)i-Lr20n+JL)r[Hp+h-l)+JL]
- rOn) 1-'=0 JL!f[~(p-l) +JL]r[~(n+h) +JL]
than the sample correlation between XI and WX(2); however, the latter is the
simple sample correlation corresponding to the simple population correlation
between XI and (l'X(2\ which is R, the population multiple correlation.
Suppose Rl is the multiple correlation in the first of two samples and ~I
is the estimate of (3; then the simple correlation between X 1 and ~/I X(2) in
the second sample will tend to be less than RI and in particular will be less
than R 2 , the multiple ccrrelation in the second sample. This has been called
"the shrinkage of the multiple correlation."
Kramer (1963) and Lee (1972) have given tables of the upper significance
points of R. Gajjar (1967), Gurland (1968), Gurland and Milton (1970),
Khatri (1966), and Lee (1917b) have suggested approximations to the distri-
butions of R 2/(1 - R2) and obtained large-sample results.
Theorem 4.4.6. Given the observations Xl' ... , X N from N( j-L, I), of all tests
oj'/? = 0 at a given significance level based on i and A = E~_l(Xa -iXx a -x)'
that are invariant with respect to transfonnations
( 47)
any critical rejection region given by R greater than a constant is unifomlly most
powerful.
( 48)
Theorem 4.4.7 follows from Theorem 4.4.6 in the same way that Theorem
5.6.4 follows from Theorem 5.6.1.
( 1)
for all c > 0 and all positive definite S and the conditions of Theorem 4.5.1 hold,
then
af(a) -0
(7) aa' a - . •
The conclusion of Corollary 4.5.1 can be framed as
The limiting normal distribution in (8) holds in particular when the sample is
drawn from the normal distribution. The corollary holds true if K is replaced
by a consistent estimator R. For example, a consistent estimator of 1 + R
given by (16) of Section 3.6 is
N
(9) l+R= L [(xa-x)'S-I(xa i)r/[Np(p+2)].
lX==1
(10)
Now let us consider the asymptotic distribution of R2, the squa.e of the
multiple correlation, when IF, the square of the population multiple correla-
tion, is O. We use the notation of Section 4.4. R2 = 0 is equvialent to 0'(1) = O.
Since the sample and population multiple correlation coefficients between
Xl and X(2) = (X2 , . •• ,Xp)I are invariant with respect to linear transforma-
tions (47) of Section 4.4, for purposes of studying the distribution
of R2 we can assume .... = 0 and I = lp. In that case SlI .4 1, S(I) .4 0, and
*"
S22 .4lp _ I ' Furthermore, for k, i 1 and j = l = 1, Lemma 3.6.1 gives
(14)
NR 2 N.S(I) S-I
I
22 S(I) d 2
(15) 1+ R= (1 + R) sll ~ Xp-I .
(16)
based on the vector spherical model g(tr yIy). The unbiased estimators
of v and I = (cfR 2 /p)A are x = (l/N)X'E N and S = (l/n)A, where A =
(X - Ellx')'lX - E NX ' ).
Since
Theorem 4.5.3. When X has the vector elliptical density (6), rite distribu-
tions of rll , r,rq + l' and R2 are the distlibutions derived for normally distributed
observations .
(18)
l. --_
( 19) WI =v- ?, ...• p.
I
(20) V=UT',
The proof of the lemma is given in the first part of Section 7.2 and as the
Gram-Schmidt orthogonalization in the Appendix (Section A.S.I). This
lemma generalizes the construction in Section 3.2; see Figure 3.1. See also
Figure 7.1.
Note that' T is lower triangulul', U'U=I", and V'V= TT'. The last
equation, til ~ 0, i = 1, ... , p, and I" = 0, i <j, can be solved uniquely for T.
Thus T is a function of V' V (and the restrictions).
Let Y CN xp) have the density gCY'Y). and let 0" he an orthogonal
N X N matrix. Then y* = ON Y has the density g(Y* 'Y*). Hence y* =
- U* T* , w1lere til* > O·
- vI . Le t y* -
N Y !L * -- 0 ,I. ......-- ].. F rom
, I -- I , " " p, an d t'J
°
I
U* = 0NU fb. U. Let the space of U (N xp) such that U'U = I" be denoted
O(Nxp).
The proof of Corollary 7.2.1 shows that for arbitrary gO the density of
T is
p
(21) n {cB( N + 1- i)] t/~-/}g(tr IT'),
i~ 1
~22) Y=VT',
(24) f( XG') = f( X)
for all G (p xp). Then the distribution of f(X) where X has an arbitrary density
(18) is the same as the distribution of f(X) where X has the nonnal density (18).
Proof. From (23) we find that f(X) = j(YC'), and from (24) we find
j(YC') = j(VT'C') = f(V), which is the same for arbitrary and normal densi-
ties (18). •
The condition (24) of Corollary 4.5.5 is that I(X) is invariant with respect
to linear transformations X ~ XG.
The density (18) can be written as
which shows that A and x are a complete set of sufficient statistics for
A=CC 1 and v.
PROBLEMS
4.2. (Sec.4.2.1) Using the data of Problem 3.1, test the hypothesis that Xl and X 2
are independent against all alternatives of dependence at significance level 0.01.
4.3. (Sec. 4.2.1) Suppose a sample correlation of 0.65 is observed in a sample of 10.
Test the hypothesis of independence against the alternatives of positive correla-
tion at significance level 0.05.
4.4. (Sec. 4.2.2) Suppose a sample correlation of 0.65 is observed in a sample of 20.
Test the hYpothesis that the population correlation is 0.4 against the alternatives
that the population correlation is greater than 0.4 at significance level 0.05.
4.5. (Sec.4.2.1) Find the significance points for testing p = 0 at the 0.01 level with
*'
N = 15 observations against alternatives (a) p 0, (b) p> 0, and (c) p < O.
4.6. (Sec. 4.2.2) Find significance points for testing p = 0.6 at the 0.01 level with
*'
N = 20 observations against alternatives (a) p 0.6, (b) p> 0.6, and (c) p < 0.6.
4.7. (Sec. 4.2.2) Tablulate the power function at p = -1(0.2)1 for the tests in
Probl~m 4.5. Sketch the graph of each power function.
4.8. (Sec. 4.2.2) Tablulate the power function at p = -1(0.2)1 for the tests in
Problem 4.6. Sketch the graph of each power function.
4.9. (Sec. 4.2.2) Using the data of Problem 3.1, find a (two-sided) confidence
interval for p 12 with confidence coefficient 0.99.
4.10. (Sec. 4.2.2) Suppose N = 10, , = 0.795. Find a one-sided confidence interval
for p [of the form ('0> 1)] with confidence coefficient 0.95.
164 SAMPLE CORRELATION COEFFICIENTS
4.11. (Sec. 4.2.3) Use Fisher's z to test the hypothesis P = 0.7 against alternatives
1-' ¢ O. i at the 0.05 level with r = 0.5 and N = 50.
4.12. (Sec. 4.2.3) Use Fisher's z to test the hypothesis PI = P2 against the alterna-
tives PI ¢ P2 at the 0.01 level with r l = 0.5, Nl = 40, r2 = 0.6, Nz = 40.
4.14. (Sec. 4.2.3) Use Fisher's z to obtain a confidence interval for p with confi~
dence 0.95 based on a sample correlation of 0.65 and a sample size of 25.
4.15. (Sec. 4.2.2). Prove that when N = 2 and P = 0, Pr{r = l} = Pr{r = -l} = !.
4.16. (Sec. 4.2) Let kN(r, p) be the density of thl sample corrclation coefficient r
for a given value of P and N. Prove that r has a monotone likelihood ratio; that
is, show that if PI> P2, then kN(r, p,)jkN(r, P2) is monotonically increasing in
r. [Hint: Using (40), prove that if
OQ
F[!,!;n+h~(1+pr)]= E ca (1+pr)a=g(r,p)
a=O
4.17. (Sec. 4.2) Show that of all tests of Po against a specific PI (> Po) based on r,
the procedures for which r> C implies rejection are the best. [Hint: This follows
from Problem 4.16.}
4.18. (Sec. 4.2) Show that of all tests of p = Po against p> Po bas~d on r, a
procedure for which r> C implies rejection is uniformly most powerful.
4.19. (Sec. 4.2) Prove r has a monotone likelihood ratio for r> 0, p> 0 by proving
h(r) = kN(r, PI)jkN(r, P2) is monotonically increasing for PI > P2. Here h(r) is
a constant times Cr:~Oca PLra)j(t:~Oca pzr a ). In the numerator of h'(r),
show that the coefficient of r f3 is positive.
4.20. (Sec. 4.2) Prove that if I is diagonal, then the sets r 'j and all are indepen-
dently distributed. [Hint: Use the facts that rij is invariant under scale transfor-
mations and that the density of the observations depends only on the a,i']
PROBLEMS 165
tC r
2
m =
r[t(N-l)]r(m+i)
---=::=="--=-:--.:.-"---'----'::-'-
j;r[t(N - 1) + m]
4.22. (Sec. 4.2.2) Prove fl( p) and f2( p) are monotonically increasing functions
of p.
4.23. (Sec. 4.2.2) Prove that the density of the sample correlation r [given by
(38)] is
[Hint: Expand (1 - prx)-n in a power series, integrate, and use the duplication
formula for the gamma fLnction.]
4.24. (Sec. 4.2) Prove that (39) is the density of r. [Hint: From Problem 2.12 show
Then argue
Finally show that the integral 0[(31) with respect to all ( = y 2) and a 22 ( = z2) is
(39).]
4.25. (Sec. 4.2) Prove that (40) is the density d r. [Hint: In (31) let all = ue- L- and
u
a22 = ue ; show that the density of u (0 ~ u < 00) and r ( - 1 ~ r ~ 1) is
;, r(j+1) j
L- r( 1)'1 Y .
j=O 2 J.
4.26. (Sec. 4.2) Prove that

𝓔 r^{2h+1} = [(1 − ρ²)^{½n}/(√π Γ(½n))] Σ_{β=0}^∞ [(2ρ)^{2β+1}/(2β + 1)!] Γ²[½(n + 1) + β] Γ(h + β + 3/2) / Γ(½n + h + β + 1),

𝓔 r^{2h} = [(1 − ρ²)^{½n}/(√π Γ(½n))] Σ_{β=0}^∞ [(2ρ)^{2β}/(2β)!] Γ²(½n + β) Γ(h + β + ½) / Γ(½n + h + β).
".27. lSec. 4.2) The I-distribution. Prove that if X and Yare independently dis-
tributed, X having the distribution N(O,1) and Y having the X 2-distribution
with m degrees of freedom, then W = XI JY1m has the density
[Hint: In the joint density of X and Y, let x = twlm- t and integrate out w.]
[Him: Use Problem 4.26 and the duplication formula for the gamma function.]
4.29. (Sec. 4.2) Show that √n(r_ij − ρ_ij), (i, j) = (1, 2), (1, 3), (2, 3), have a joint limiting distribution with variances (1 − ρ_ij²)² and covariances of r_ij and r_ik, j ≠ k, being ½(2ρ_jk − ρ_ij ρ_ik)(1 − ρ_ij² − ρ_ik² − ρ_jk²) + ρ_jk³.
4.30. (Sec. 4.3.2) Find a confidence interval for ρ₁₃·₂ with confidence 0.95 based on r₁₃·₂ = 0.097 and N = 20.
4.31. (Sec. 4.3.2) Use Fisher's z to test the hypothesis ρ₁₂·₃₄ = 0 against alternatives ρ₁₂·₃₄ ≠ 0 at significance level 0.01 with r₁₂·₃₄ = 0.14 and N = 40.
4.32. (Sec. 4.3) Show that the inequality r₁₃·₂² ≤ 1 is the same as the inequality |r_ij| ≥ 0, where |r_ij| denotes the determinant of the 3 × 3 correlation matrix.
4.33. (Sec. 4.3) Invariance of the sample partial correlation coefficient. Prove that r₁₂·₃,…,p is invariant under the transformations x*_{iα} = a_i x_{iα} + b_i′x_α^{(3)} + c_i, a_i > 0, i = 1, 2, x_α^{(3)*} = Cx_α^{(3)} + b, α = 1, …, N, where x_α^{(3)} = (x_{3α}, …, x_{pα})′, and that any function of x̄ and S that is invariant under these transformations is a function of r₁₂·₃,…,p.
4.34. (Sec. 4.4) Invariance of the sample multiple correlation coefficient. Prove that R is a function of the sufficient statistics x̄ and S that is invariant under changes of location and scale of x_{1α} and nonsingular linear transformations of x_α^{(2)} (that is, x*_{1α} = cx_{1α} + d, x_α^{(2)*} = Cx_α^{(2)} + d, α = 1, …, N) and that every function of x̄ and S that is invariant is a function of R.
4.35. (Sec. 4.4) Prove that, conditional on z_{1α} = z_{1α}, α = 1, …, n, R²/(1 − R²) is distributed like T²/(N* − 1), where T² = N* x̄′S⁻¹x̄ based on N* = n observations on a vector X with p* = p − 1 components, with mean vector (c/σ₁₁)σ_{(1)} (nc² = Σ z²_{1α}) and covariance matrix Σ₂₂·₁ = Σ₂₂ − (1/σ₁₁)σ_{(1)}σ_{(1)}′. [Hint: The conditional distribution of Z_α^{(2)} given z_{1α} = z_{1α} is N[(1/σ₁₁)σ_{(1)}z_{1α}, Σ₂₂·₁]. There is an n × n orthogonal matrix B which carries (z₁₁, …, z₁ₙ) into (c, …, c) and (z_{i1}, …, z_{in}) into (y_{i1}, …, y_{in}), i = 2, …, p. Let the new X_α be (y_{2α}, …, y_{pα})′.]
4.36. (Sec. 4.4) Prove that the noncentrality parameter in the distribution in Problem 4.35 is (a₁₁/σ₁₁)R̄²/(1 − R̄²).
4.37. (Sec. 4.4) Find the distribution of R²/(1 − R²) by multiplying the density of Problem 4.35 by the density of a₁₁ and integrating with respect to a₁₁.
4.38. (Sec. 4.4) Show that the density of r² derived from (38) of Section 4.2 is identical with (42) in Section 4.4 for p = 2. [Hint: Use the duplication formula for the gamma function.]
4.39. (Sec. 4.4) Prove that (30) is the uniformly most powerful test of R̄ = 0 based on r. [Hint: Use the Neyman–Pearson fundamental lemma.]
4.40. (Sec. 4.4) Prove that (47) is the unique unbiased estimator of R̄² based on R².
4.43. (Sec. 4.3) Prove that if ρ_{ij·q+1,…,p} = 0, then √(N − 2 − (p − q)) r_{ij·q+1,…,p}/√(1 − r²_{ij·q+1,…,p}) is distributed according to the t-distribution with N − 2 − (p − q) degrees of freedom.
4.44. (Sec. 4.3) Let X′ = (x₁, x₂, X^{(2)′}) have the distribution N(μ, Σ). The conditional distribution of x₁ given x₂ = x₂ and X^{(2)} = x^{(2)} is

where

Show c₂ = σ₁₂·₃,…,p/σ₂₂·₃,…,p. [Hint: Solve for c in terms of c₂ and the σ's, and substitute.]
Hint: Use
a.,.,
a.
II 2•.. "P =a II - (c.,
f
C )
--
- ( a (2)
4.46. (Sec. 4.3) Prove that 1/σ₂₂·₃,…,p is the element in the upper left-hand corner of
4.47. (Sec. 4.3) Using the results in Problems 4.43–4.46, prove that the test for ρ₁₂·₃,…,p = 0 is equivalent to the usual t-test for γ₂ = 0.
4.48. Missing observations. Let X = (Y′, Z′)′, where Y has p components and Z has q components, be distributed according to N(μ, Σ), where

Let M observations be made on X, and N − M additional observations be made on Y. Find the maximum likelihood estimates of μ and Σ. [Anderson (1957).] [Hint: Express the likelihood function in terms of the marginal density of Y and the conditional density of Z given Y.]
I= (~,
p'
Show that on the basis of one observation, x′ = (x₁, x₂, x₃), we can obtain a confidence interval for ρ (with confidence coefficient 1 − α) by using as endpoints of the interval the solutions in t of

where χ₃²(α) is the significance point of the χ²-distribution with three degrees of freedom at significance level α.
CHAPTER 5
5.1. INTRODUCTION
(1) t = √N (x̄ − μ)/s,
(2)
5.2 DERIVATION OF THE T ·STATISTIC AND ITS DISTRIBUTION 171
the population mean. Hotelling (1931) proposed the T²-statistic for two samples and derived the distribution when μ is the population mean.
In Section 5.3 various uses of the T²-statistic are presented, including simultaneous confidence intervals for all linear combinations of the mean vector. A James–Stein estimator is given when Σ is unknown. The power function of the T²-test is treated in Section 5.4, and the multivariate Behrens–Fisher problem in Section 5.5. In Section 5.6, optimum properties of the T²-test are considered, with regard to both invariance and admissibility. Stein's criterion for admissibility in the general exponential family is proved and applied. The last section is devoted to inference about the mean in elliptically contoured distributions.
(2) λ = max_Σ L(μ₀, Σ) / max_{μ,Σ} L(μ, Σ),
that is, the numerator is the maximum of the likelihood function for μ, Σ in the parameter space restricted by the null hypothesis (μ = μ₀, Σ positive definite), and the denominator is the maximum over the entire parameter space (Σ positive definite). When the parameters are unrestricted, the maximum occurs when μ, Σ are defined by the maximum likelihood estimators
(3) μ̂_Ω = x̄,
(4)
( 6)
(7)
(8) λ = |Σ̂_Ω|^{½N}/|Σ̂_ω|^{½N} = |Σ_α(x_α − x̄)(x_α − x̄)′|^{½N} / |Σ_α(x_α − μ₀)(x_α − μ₀)′|^{½N} = |A|^{½N} / |A + N(x̄ − μ₀)(x̄ − μ₀)′|^{½N},
where
(9) A = Σ_{α=1}^N (x_α − x̄)(x_α − x̄)′ = (N − 1)S.
where
( 12)
where λ₀ is chosen so that the probability of (12) when the null hypothesis is true is equal to the significance level. If we take the ½Nth root of both sides of (12) and invert, subtract 1, and multiply by N − 1, we obtain
(13) T² ≥ T₀²,
where
(14) T₀² = (N − 1)(λ₀^{-2/N} − 1).
Theorem 5.2.1. The likelihood ratio test of the hypothesis μ = μ₀ for the distribution N(μ, Σ) is given by (13), where T² is defined by (11), x̄ is the mean of a sample of N from N(μ, Σ), S is the covariance matrix of the sample, and T₀² is chosen so that the probability of (13) under the null hypothesis is equal to the chosen significance level.
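As a numerical illustration (my own sketch, not from the text), the test of Theorem 5.2.1 can be carried out through the F form of the distribution of T² derived later in this section; the data array X below is a hypothetical N × p sample.

    # Sketch: Hotelling T^2 test of mu = mu0 and its F form; X is a hypothetical sample.
    import numpy as np
    from scipy import stats

    def hotelling_t2_test(X, mu0, alpha=0.05):
        N, p = X.shape
        xbar = X.mean(axis=0)
        S = np.cov(X, rowvar=False)                 # sample covariance (divisor N - 1)
        d = xbar - mu0
        T2 = N * d @ np.linalg.solve(S, d)          # T^2 = N (xbar - mu0)' S^{-1} (xbar - mu0)
        F = T2 * (N - p) / ((N - 1) * p)            # [T^2/(N-1)][(N-p)/p] ~ F_{p, N-p} under H0
        p_value = stats.f.sf(F, p, N - p)
        return T2, F, p_value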
The Student t-test has the property that when testing μ = 0 it is invariant with respect to scale transformations. If the scalar random variable X is distributed according to N(μ, σ²), then X* = cX is distributed according to N(cμ, c²σ²), which is in the same class of distributions, and the hypothesis 𝓔X = 0 is equivalent to 𝓔X* = 𝓔cX = 0. If the observations x_α are transformed similarly (x*_α = cx_α), then, for c > 0, t* computed from x*_α is the same as t computed from x_α. Thus, whatever the unit of measurement, the statistical result is the same.
The generalized T²-test has a similar property. If the vector random variable X is distributed according to N(μ, Σ), then X* = CX (for |C| ≠ 0) is distributed according to N(Cμ, CΣC′), which is in the same class of distributions. The hypothesis 𝓔X = 0 is equivalent to the hypothesis 𝓔X* = 𝓔CX = 0. If the observations x_α are transformed in the same way, x*_α = Cx_α, then T*² computed on the basis of x*_α is the same as T² computed on the basis of x_α. This follows from the facts that x̄* = Cx̄ and A* = CAC′ and the following lemma:
(15)
(17) λ^{2/N} = |Σ_{α=1}^N (x_α − x̄)(x_α − x̄)′| / |Σ_{α=1}^N (x_α − μ₀)(x_α − μ₀)′|
in terms of parallelotopes. (See Section 7.5.) In the p-dimensional representation the numerator of λ^{2/N} is the sum of squares of volumes of all parallelotopes with principal edges p vectors, each with one endpoint at x̄ and the other at an x_α. The denominator is the sum of squares of volumes of all parallelotopes with principal edges p vectors, each with one endpoint at μ₀ and the other at x_α. If the sum of squared volumes involving vectors emanating from x̄, the "center" of the x_α, is much less than that involving vectors emanating from μ₀, then we reject the hypothesis that μ₀ is the mean of the distribution.
There is also an interpretation in the N-dimensional representation. Let y_i = (x_{i1}, …, x_{iN})′ be the ith vector. Then
(18) √N x̄_i = Σ_{α=1}^N (1/√N) x_{iα}
is the distance from the origin of the projection of y_i on the equiangular line (with direction cosines 1/√N, …, 1/√N). The coordinates of the projection are (x̄_i, …, x̄_i). Then (x_{i1} − x̄_i, …, x_{iN} − x̄_i) is the projection of y_i on the plane through the origin perpendicular to the equiangular line. The numerator of λ^{2/N} is the square of the p-dimensional volume of the parallelotope with principal edges the vectors (x_{i1} − x̄_i, …, x_{iN} − x̄_i). A point (x_{i1} − μ_{0i}, …, x_{iN} − μ_{0i}) is obtained from y_i by translation parallel to the equiangular line (by a distance √N μ_{0i}). The denominator of λ^{2/N} is the square of the volume of the parallelotope with principal edges these vectors. Then λ^{2/N} is the ratio of these squared volumes.
N(0, Σ). The T² defined in Section 5.2.1 is a special case of this with Y = √N(x̄ − μ₀) and ν = √N(μ − μ₀) and n = N − 1. Let D be a nonsingular matrix such that DΣD′ = I, and define
U QY*,
(21)
B = QnS*Q'.
From the way Q was defined,
vI = ~q
i... 11 y*
I
= Vy*'y* ,
(22)
Then
btl b I2 blp VI
b 21 b 22 b 2p
°
r2
(23) n = U'B-1U= (VI,O, ... ,0)
bpI b P2
= V 12b ll ,
b PP
°
where (b l }) = B- 1 • By Theorem AJ.3 of the Appendix, l/b ll = b ll
b(l)B 221 b(l) = b ll .2•...• p, where
(24)
independent, each with distribution N(0, 1). By Theorem 4.3.3, b₁₁·₂,…,p is conditionally distributed as Σ_{α=1}^{n-(p-1)} W_α², where conditionally the W_α are independent, each with the distribution N(0, 1); that is, b₁₁·₂,…,p is conditionally distributed as χ² with n − (p − 1) degrees of freedom. Since the conditional distribution of b₁₁·₂,…,p does not depend on Q, it is unconditionally distributed as χ². The quantity Y*′Y* has a noncentral χ²-distribution with p degrees of freedom and noncentrality parameter ν*′ν* = ν′Σ⁻¹ν. Then T²/n is distributed as the ratio of a noncentral χ² and an independent χ².
Corollary 5.2.1. Let x₁, …, x_N be a sample from N(μ, Σ), and let T² = N(x̄ − μ₀)′S⁻¹(x̄ − μ₀). The distribution of [T²/(N − 1)][(N − p)/p] is noncentral F with p and N − p degrees of freedom and noncentrality parameter N(μ − μ₀)′Σ⁻¹(μ − μ₀). If μ = μ₀, then the F-distribution is central.
Proof. By the central limit theorem (Theorem 4.2.3) the limiting distribution of √N(x̄_N − μ) is N(0, Σ). The sample covariance matrix converges stochastically to Σ. Then the limiting distribution of T² is the distribution of Y′Σ⁻¹Y, where Y has the distribution N(0, Σ). The theorem follows from Theorem 3.3.3.
this is the density of the beta distribution with parameters ½p and ½(n − p + 1).

5.3. USES OF THE T²-STATISTIC
(1 )
as given in Section 5.2.1. If the significance level is α, then the 100α% point of the F-distribution is taken, that is,
(2)
say. The choice of significance level may depend on the power of the test. We shall discuss this in Section 5.4.
The statistic T² is computed from x̄ and A. The vector b = A⁻¹(x̄ − μ₀) is the solution of Ab = x̄ − μ₀. Then T²/(N − 1) = N(x̄ − μ₀)′b.
Note that T²/(N − 1) is the nonzero root of
Proof. The nonzero root, say λ₁, of (4) is associated with a characteristic vector satisfying
(5)
( 6)
•
In the case above v = √N(x̄ − μ₀) and B = A.
(7)
( 8)
(9)
holds for all γ with probability 1 − α. Thus we can assert with confidence 1 − α that the unknown parameter vector satisfies simultaneously for all γ the inequalities
( 12)
+ E(yi
a-"l
2) - ,(2)) (yi2) _ ,(2))') ,
(14)
(15)
taken from the population Iris versicolor (1) and 50 from the population Iris
setosa (2). See Table 3.4. The data may be summarized (in centimeters) as
(18) x̄^{(1)} = (5.936, 2.770, 4.260, 1.326)′,

(19) x̄^{(2)} = (5.006, 3.428, 1.462, 0.246)′.
The value of T²/98 is 26.334, and (T²/98)(95/4) = 625.5. This value is highly significant compared to the F-value for 4 and 95 degrees of freedom of 3.52 at the 0.01 significance level.
Simultaneous confidence intervals for the differences of component means μ_i^{(1)} − μ_i^{(2)}, i = 1, 2, 3, 4, are 0.930 ± 0.337, -0.658 ± 0.265, 2.798 ± 0.270, and 1.080 ± 0.121. In each case 0 does not lie in the interval. [Since t²₉₈(.01) < T²₄,₉₈(.01), a univariate test on any component would lead to rejection of the null hypothesis.] The last two components show the most significant differences from 0.
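A computation of this kind can be sketched as follows (my own illustration; the arrays x1 and x2 stand for hypothetical 50 × 4 samples, not the data of Table 3.4, and the intervals are the simultaneous ones obtained by taking γ equal to the unit vectors).

    # Sketch: two-sample T^2 and simultaneous confidence intervals for component
    # mean differences; x1, x2 are hypothetical (N1 x p) and (N2 x p) samples.
    import numpy as np
    from scipy import stats

    def two_sample_t2_intervals(x1, x2, alpha=0.01):
        N1, p = x1.shape
        N2 = x2.shape[0]
        n = N1 + N2 - 2
        d = x1.mean(axis=0) - x2.mean(axis=0)
        S = ((N1 - 1) * np.cov(x1, rowvar=False) +
             (N2 - 1) * np.cov(x2, rowvar=False)) / n
        c = 1.0 / N1 + 1.0 / N2
        T2 = d @ np.linalg.solve(c * S, d)
        # critical value: T^2_{p,n}(alpha) = [p n / (n - p + 1)] F_{p, n-p+1}(alpha)
        T2_crit = p * n / (n - p + 1) * stats.f.ppf(1 - alpha, p, n - p + 1)
        half = np.sqrt(T2_crit * c * np.diag(S))    # half-widths for the component means
        return T2, d, np.column_stack((d - half, d + half))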
(21 )
where β₁, …, β_q are given scalars and μ is a given vector. The criterion is
(23)
where
(24)
( 25)
(26)
(27) Cε = 0,
has mean Cμ and covariance matrix CΣC′. The hypothesis H is Cμ = 0.
(29)
where
(30) ȳ = (1/N) Σ_{α=1}^N y_α = Cx̄,
(31) S_y = [1/(N − 1)] Σ_{α=1}^N (y_α − ȳ)(y_α − ȳ)′ = [1/(N − 1)] C Σ_{α=1}^N (x_α − x̄)(x_α − x̄)′ C′.
This statistic has the T²-distribution with N − 1 degrees of freedom for a (p − 1)-dimensional distribution. This T²-statistic is invariant under any linear transformation in the p − 1 dimensions orthogonal to ε. Hence the statistic is independent of the choice of C.
An example of this sort has been given by Rao (1948b). Let N be the
amount of cork in a boring from the north into a cork tree; let E, S, and W
be defined similarly. The set of amounts in four borings on one tree is
(33) ȳ = (8.86, 4.50, 0.86)′;
the covariance matrix for y is
The value of T 2 /(N -1) is 0.768. The statistic 0.768 x 25/3 = 6.402 is to be
compared with the F-significance point with 3 and 25 degrees of freedom. It
is significant at the 1% level.
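A sketch of this kind of contrast-based test follows (my own illustration; the particular contrast matrix C shown is one possible choice with rows summing to zero and is an assumption, as is the hypothetical data array X of the four boring directions).

    # Sketch: T^2 test that the four boring means (N, E, S, W) are equal, using a
    # contrast matrix C satisfying C*eps = 0; C and X are hypothetical stand-ins.
    import numpy as np
    from scipy import stats

    C = np.array([[1., -1., 0., 0.],
                  [0., 1., -1., 0.],
                  [0., 0., 1., -1.]])       # any (p-1) x p matrix of contrasts works

    def contrast_t2(X, C, alpha=0.01):
        Y = X @ C.T                          # y_alpha = C x_alpha
        N, q = Y.shape                       # q = p - 1
        ybar = Y.mean(axis=0)
        Sy = np.cov(Y, rowvar=False)
        T2 = N * ybar @ np.linalg.solve(Sy, ybar)
        F = T2 * (N - q) / ((N - 1) * q)     # compare with F_{q, N-q}
        return T2, F, stats.f.sf(F, q, N - q)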
(35) [1 − (p − 2)/{N(x̄ − ν)′Σ⁻¹(x̄ − ν)}](x̄ − ν) + ν
is a minimax estimator of μ for any ν and has a smaller risk than x̄ when p ≥ 3. When Σ is unknown, we consider replacing it by an estimator, namely, a multiple of A = nS.
Theorem 5.3.1. When the loss is (m − μ)′Σ⁻¹(m − μ), the estimator for p ≥ 3 given by
(36) [1 − a/{N(x̄ − ν)′A⁻¹(x̄ − ν)}](x̄ − ν) + ν
has smaller risk than x̄ and is minimax for 0 < a < 2(p − 2)/(n − p + 3), and the risk is minimized for a = (p − 2)/(n − p + 3).
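A minimal sketch of the estimator (36), with the risk-minimizing choice of a stated in the theorem (the sample array X and the shrinkage target v are hypothetical inputs):

    # Sketch of the shrinkage estimator (36) with Sigma replaced by A = nS.
    import numpy as np

    def stein_estimator(X, v):
        N, p = X.shape                                  # requires p >= 3
        n = N - 1
        xbar = X.mean(axis=0)
        A = n * np.cov(X, rowvar=False)
        a = (p - 2) / (n - p + 3)                       # risk-minimizing choice in the theorem
        d = xbar - v
        shrink = 1.0 - a / (N * d @ np.linalg.solve(A, d))
        return shrink * d + v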
(37)
2
tlR _ 2a Xn-p+l
2 P
y y a2( Xn-p+1 ) 2}
( /lY-vI1 iE(
$
(38) (j.L)- np' 2 ,-,ul)( , - V1) - IIY-vI1 2
2 2
=$ (2a(p-2)xn _p+1 _ a (xLp+l)2)
p. /lY - vll 2 IIY - vl1 2
= {2( P - 2) (n - P + 1) a
has smaller risk than (36) and is minimax for 0 < a < 2(p − 2)/(n − p + 3).
Proof. This corollary follows from Theorem 5.3.1 and Lemma 3.5.2.
In Section 5.2.2 we showed that (T²/n)(N − p)/p has a noncentral F-distribution. In this section we shall discuss the noncentral F-distribution, its tabulation, and applications to procedures based on T².
The noncentral F-distribution is defined as the distribution of the ratio of a noncentral χ² and an independent χ² divided by the ratio of corresponding degrees of freedom. Let V have the noncentral χ²-distribution with p degrees of freedom and noncentrality parameter τ² (as given in Theorem 3.3.5), and let W be independently distributed as χ² with m degrees of freedom. We shall find the density of F = (V/p)/(W/m), which is the noncentral F with noncentrality parameter τ². The joint density of V and W is (28) of Section 3.3 multiplied by the density of W, which is 2^{-½m}Γ⁻¹(½m)w^{½m-1}e^{-½w}. The joint density of F and W (dv = pw df/m) is
(1)
.~
m
r. (T2)~~_
{3=!) 4 {3
(2)
e- x (T2/2)i3[t2j(N-1)}!P+i3-lr(!N+.8)
(3) (N-1)r(!<N-p)] (3~O t3!r(tp+.8)[l+t2/(N-1)ltN+~
where
.. _ ~ r(a+,3)r(b)x~
(4) lF1(a,b,x) - {3~O r(a)r(b+ .8).8!'
(5)
His accompanying tables of significance points are for T²/(T² + N − 1).
As an example, suppose p = 4, n − p + 1 = 20, and consider testing the null hypothesis μ = 0 at the 1% level of significance. We would like to know the probability, say, that we accept the null hypothesis when φ = 2.5 (τ² = 31.25). It is 0.227. If we think the disadvantage of accepting the null hypothesis when N, μ, and Σ are such that τ² = 31.25 is less than the disadvantage of rejecting the null hypothesis when it is true, then we may find it
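A calculation of this kind can be reproduced directly from the noncentral F-distribution (a sketch of my own, assuming Tang's parameter satisfies φ² = τ²/(p + 1), so that φ = 2.5 corresponds to τ² = 31.25 here; the value obtained should be close to the 0.227 quoted from the tables).

    # Sketch: probability of accepting mu = 0 at the 1% level when tau^2 = 31.25,
    # with p = 4 numerator and m = n - p + 1 = 20 denominator degrees of freedom.
    from scipy import stats

    p, m = 4, 20
    tau2 = 31.25                                      # noncentrality parameter
    f_crit = stats.f.ppf(0.99, p, m)
    prob_accept = stats.ncf.cdf(f_crit, p, m, tau2)   # should be near 0.227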
5.5. THE TWO-SAMPLE PROBLEM WITH UNEQUAL COVARIANCE MATRICES
If the covariance matrices are not the same, the T²-test for equality of mean vectors has a probability of rejection under the null hypothesis that depends on these matrices. If the difference between the matrices is small or if the sample sizes are large, there is no practical effect. However, if the covariance matrices are quite different and/or the sample sizes are relatively small, the nominal significance level may be distorted. Hence we develop a procedure with assigned significance level. Let {x_α^{(i)}}, α = 1, …, N_i, be samples from N(μ^{(i)}, Σ_i), i = 1, 2. We wish to test the hypothesis H: μ^{(1)} = μ^{(2)}. The mean x̄^{(1)} of the first sample is normally distributed with expected value
(2)
Similarly, the mean x̄^{(2)} of the second sample is normally distributed with expected value
(3)
(4)
Thus x̄^{(1)} − x̄^{(2)} has mean μ^{(1)} − μ^{(2)} and covariance matrix (1/N₁)Σ₁ + (1/N₂)Σ₂. We cannot use the technique of Section 5.2, however, because
(5) Σ_{α=1}^{N₁} (x_α^{(1)} − x̄^{(1)})(x_α^{(1)} − x̄^{(1)})′ + Σ_{α=1}^{N₂} (x_α^{(2)} − x̄^{(2)})(x_α^{(2)} − x̄^{(2)})′
does not have the Wishart distribution with covariance matrix a multiple of (1/N₁)Σ₁ + (1/N₂)Σ₂.
If N₁ = N₂ = N, say, we can use the T²-test in an obvious way. Let y_α = x_α^{(1)} − x_α^{(2)} (assuming the numbering of the observations in the two samples is independent of the observations themselves). Then y_α is normally distributed with mean μ^{(1)} − μ^{(2)} and covariance matrix Σ₁ + Σ₂, and y₁, …, y_N are independent. Let ȳ = (1/N)Σ_{α=1}^N y_α = x̄^{(1)} − x̄^{(2)}, and define S by
(6) (N − 1)S = Σ_{α=1}^N (y_α − ȳ)(y_α − ȳ)′ = Σ_{α=1}^N (x_α^{(1)} − x_α^{(2)} − x̄^{(1)} + x̄^{(2)})(x_α^{(1)} − x_α^{(2)} − x̄^{(1)} + x̄^{(2)})′.
Then
(7) T² = Nȳ′S⁻¹ȳ
is suitable for testing the hypothesis μ^{(1)} − μ^{(2)} = 0, and has the T²-distribution with N − 1 degrees of freedom. It should be observed that if we had known Σ₁ = Σ₂, we would have used a T²-statistic with 2N − 2 degrees of freedom; thus we have lost N − 1 degrees of freedom in constructing a test which is independent of the two covariance matrices. If N₁ = N₂ = 50 as in the example in Section 5.3.4, then T²₄,₄₉(.01) = 15.93 as compared to T²₄,₉₈(.01) = 14.52.
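A minimal sketch of the equal-sample-size procedure (6)–(7), working with the paired differences (the arrays x1 and x2 are hypothetical N × p samples):

    # Sketch of (6)-(7): form y_alpha = x_alpha^(1) - x_alpha^(2) and the one-sample T^2.
    import numpy as np

    def paired_t2(x1, x2):
        Y = x1 - x2
        N = Y.shape[0]
        ybar = Y.mean(axis=0)
        S = np.cov(Y, rowvar=False)
        return N * ybar @ np.linalg.solve(S, ybar)    # T^2 with N - 1 degrees of freedom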
Now let us turn our attention to the case of N₁ ≠ N₂. For convenience, let N₁ < N₂. Then we define
(8) y_α = x_α^{(1)} − √(N₁/N₂) x_α^{(2)} + (1/√(N₁N₂)) Σ_{β=1}^{N₁} x_β^{(2)} − (1/N₂) Σ_{γ=1}^{N₂} x_γ^{(2)},   α = 1, …, N₁.
(10)
Thus a suitable statistic for testing μ^{(1)} − μ^{(2)} = 0, which has the T²-distribution with N₁ − 1 degrees of freedom, is
(11) T² = N₁ ȳ′S⁻¹ȳ,
where
(12) ȳ = (1/N₁) Σ_{α=1}^{N₁} y_α = x̄^{(1)} − x̄^{(2)}
and
(13) (N₁ − 1)S = Σ_{α=1}^{N₁} (y_α − ȳ)(y_α − ȳ)′ = Σ_{α=1}^{N₁} (u_α − ū)(u_α − ū)′.
where β₁, …, β_q are given scalars and μ is a given vector. If the N_i are unequal, take N₁ to be the smallest. Let
(15) y_α = β₁x_α^{(1)} + Σ_{i=2}^q β_i √(N₁/N_i) [x_α^{(i)} − (1/N₁) Σ_{β=1}^{N₁} x_β^{(i)} + (1/√(N₁N_i)) Σ_{γ=1}^{N_i} x_γ^{(i)}].
. 1 Nt
( 17) j(l) = N I: x~),
I {3- I
NJ
(IX) (Nt -I)S= I: (Yu-y)(y.. -j),·
a"'"
Then
(19) T² = N₁(ȳ − μ)′S⁻¹(ȳ − μ)
is suitable for testing H, and when the hypothesis is true, this statistic has the T²-distribution for dimension p with N₁ − 1 degrees of freedom. If we let u_α = Σ_{i=1}^q β_i√(N₁/N_i) x_α^{(i)}, α = 1, …, N₁, then S can be defined as
(20) (N₁ − 1)S = Σ_{α=1}^{N₁} (u_α − ū)(u_α − ū)′.
(21)
We assume that x^{(1)} and x^{(2)} are each of q components. Then y = x^{(1)} − x^{(2)} is distributed normally with mean μ^{(1)} − μ^{(2)} and covariance matrix Σ_y = Σ₁₁ − Σ₂₁ − Σ₁₂ + Σ₂₂. To test the hypothesis μ^{(1)} = μ^{(2)} we use a T²-statistic Nȳ′S_y⁻¹ȳ, where the mean vector and covariance matrix of the sample are partitioned similarly to μ and Σ.
5.6. SOME OPTIMAL PROPERTIES OF THE T²-TEST

statistics A = Σ(x_α − x̄)(x_α − x̄)′ and x̄, which are invariant with respect to the transformations A* = CAC′ and x̄* = Cx̄, where C is nonsingular. The transformation x*_α = Cx_α leaves the problem invariant; that is, in terms of x*_α we test the hypothesis 𝓔x*_α = 0 given that x*₁, …, x*_N are N observations from a multivariate normal population. It seems reasonable that we require a solution that is also invariant with respect to these transformations; that is, we look for a critical region that is not changed by a nonsingular linear transformation. (The definition of the region is the same in different coordinate systems.)
Theorem 5.6.1. Given the observations x₁, …, x_N from N(μ, Σ), of all tests of μ = 0 based on x̄ and A = Σ(x_α − x̄)(x_α − x̄)′ that are invariant with respect to transformations x̄* = Cx̄, A* = CAC′ (C nonsingular), the T²-test is uniformly most powerful.
(1)
II
I 2 In)!P- I (1 + t 2I n( t(
f(tp)
II + I)f [ Hn + 1)]
Corollary 5.6.1. On the basis of observations x₁, …, x_N from N(μ, Σ), of all randomized tests based on x̄ and A that are invariant with respect to transformations x̄* = Cx̄, A* = CAC′ (C nonsingular), the T²-test is uniformly most powerful.
Proof. Let ψ(x₁, …, x_N) be the critical function of an invariant test. Then, since x̄ and A are sufficient statistics for μ, Σ, the expectation 𝓔[ψ(x₁, …, x_N)|x̄, A] depends only on x̄, A. It is invariant and has the same power as ψ(x₁, …, x_N). Thus each test in this larger class can be replaced by one in the smaller class (depending only on x̄ and A) that has identical power. Corollary 5.6.1 completes the proof.
Theorem 5.6.3. Given observations x₁, …, x_N from N(μ, Σ), of all tests of μ = 0 based on x̄ and A = Σ(x_α − x̄)(x_α − x̄)′ with power depending only on Nμ′Σ⁻¹μ, the T²-test is uniformly most powerful.
for all x₁, …, x_N except for a set of x₁, …, x_N of Lebesgue measure zero; this exception set may depend on C.
It is clear that Theorems 5.6.1 and 5.6.2 hold if we extend the definition of
invariant test to mean that (3) holds except for a fixed set of .:1"'" X,' of
measure 0 (the set not depending on C). It has been shown by Hunt and
Stein [Lehmann (1959)] that in our problem almost invariancc implies invari-
ance (in the broad sense).
Now we wish to argue that if !/I(x, A) has power depending only on
N~'I. I~, it is almost invariant. Since the power of !/ICx, A) depends only on
N~'I. -I~, I.he power is
== GfL.I.!/I(CX,CAC').
The second and third terms of (4) are merely different ways of writing the
same integral. Thus
Theorem 5.6.4 was first proved by Simaika (1941). The results and proofs given in this section follow Lehmann (1959). Hsu (1945) has proved an optimal property of the T²-test that involves averaging the power over μ and Σ.
mch that
(8) WEn,
A E@.
(9)
We map from ,1'" to 0..111; the vector y = (y(1)" y(2),), is composed of y(1; = x
2 2 X1X2'
an d y (2) -- ( Xl> . 2 2 2), 1~h e vecor
••• ' X 1X p ,X 2 " . " X p •
t - (1)1 ,00(2),),'"
00- 00 1"
1
composed of w(1)=:1:- v. and 00(2)= - t(ull,U12, ... ,ulp,u22, ... ,uPPY,
where (U 11 ) = ~ -1; the transformation of parameters is one to one. The
measure meA) of a set A E::J23 is the ordinary Lebesgue measure of the se'i of
x that map~ into the ~et A. (Note that the prohability mea!;Ure in all is not
defined by a density.)
Figure 5.2
The cond~tions of the theorem are illustrated in Figure 5.2, which is drawn
simultaneously in the space OY and the set fl.
with strict inequality for some 00; we shall show that this assumption leads to
a contradiction. Let B = {yl cp(y) < I}. (If the competing test is nonrandom-
ized, B is its acceptance region.) Then
(13)
Then
+f [<PA(Y) - <p(y)]eA(w':'-C)dPwlY)}'
w'Y;$C
For 00') > C we have tfliy) 1 and <piy) <p(y) ~ 0, and (yl <piy) - <p(y)
> O} has positive measure; therefore, the first integral in the braces ap-
proaches 00 as A -+ 00. The second integral is bounded because the integrand
is bounded by 1, and hence the last expression is positive for sufficiently large
A. This contradicts (11). •
Proof. The closure of A is convex (Problem 5.18), and the test with
acceptance region equal to the closure of- A differs from A by a set of
probability 0 for all 00 E n. Furthermore,
(15) An{ylw'y>c}=0 ~ Ac{ylw'ys;c}
~ closure A c {yl 00' y s; c}.
(16)
(17) B = Σ_{α=1}^N z_α z_α′.
(18) Nx̄′A⁻¹x̄ = Nx̄′B⁻¹x̄ / (1 − Nx̄′B⁻¹x̄)
Proof of Lemma. If we let B =A + fNxfNx' in (10) of Section 5.2. we
obtain by Corollary A.3.1
for a suitable k.
The function Z;"'B-IZ N is conveX in (Z, B) for B positive definite (PlOblem
5.17). Therefore, the set zNB-IZ N ::;; k is convex. This shows that the set A is
convex. Furthermore, the closure of A is convex (Problem 5.18). and the
probability of the boundary of A is O.
Now consider the other condition of Theorem 5.6.5. Suppose A is disjoint
with the half-space
o
(22) -I
o
where D is nonsingular. If A is not positive semidefinite, - / is not vacuous,
because its order is the number of negative characteristic roots of A. Let
.:, = (l/Y}~1l and
B~(D')-I[~ ~]D-I,
0
(23 ) yl
0
Then
(24 ) oo'y =
1
_V'ZO
y
1 [-/
+ '2tr 0
0
0
yT
0 n
which is greater than c for sufficiently large y. On the other hand
(2~)
which is less than k for sufficiently large y. This contradicts the fact that (20)
and (2l) are disjoint. Thus the conditions of Theorem 5.6.5 are satisfied and
the theorem is proved. •
function) is to reject H 0 if
f f( xl 00 )fIl( doo)
(26)
f f( xl (0) fIo( doo )
for some c (0 :$; C :$; 00). If equality in (26) occurs with probability 0 for all
00 E no, then the Bayes procedure is unique and hence admissible. Since the
meaSUre~ are finite, they can be normed to be probability meaSUres. For the
T2-test of Ho: IJ.. = 0 a pair of measures is suggested in Problem 5.15. (This
pair is not unique.) Th(" reader can verify that with these measures (26)
reduces to the complement of (20).
Among invariant tests it was shown that the T²-test is uniformly most powerful; that is, it is most powerful against every value of μ′Σ⁻¹μ among invariant tests of the specified significance level. We can ask whether the T²-test is "best" against a specified value of μ′Σ⁻¹μ among all tests. Here "best" can be taken to mean admissible minimax; and "minimax" means
(1)
the sample mean x̄ and covariance S are unbiased estimators of the distribution mean μ = ν and covariance matrix Σ = (𝓔R²/p)Λ, where R² = (X − ν)′Λ⁻¹(X − ν) has finite expectation. The T²-statistic, T² = N(x̄ − μ)′S⁻¹(x̄ − μ), can be used for tests and confidence regions for μ when Σ (or Λ) is unknown, but the small-sample distribution of T² in general is difficult to obtain. However, the limiting distribution of T² when N → ∞ is obtained from the facts that √N(x̄ − μ) → N(0, Σ) in distribution and S → Σ in probability (Theorem 3.6.2).
Theorem 5.7.1. Let x₁, …, x_N be a sample from (1). Assume 𝓔R² < ∞. Then T² converges in distribution to χ²_p.
Proof. Theorem 3.6.2 implies that N(x̄ − μ)′Σ⁻¹(x̄ − μ) converges in distribution to χ²_p and that N(x̄ − μ)′Σ⁻¹(x̄ − μ) − T² converges in probability to 0.
Theorem 5.7.1 implies that the procedures in Section 5.3 can be done on an asymptotic basis for elliptically contoured distributions. For example, to test the null hypothesis μ = μ₀, reject the null hypothesis if
(2) T² = N(x̄ − μ₀)′S⁻¹(x̄ − μ₀) > χ²_p(α),
where χ²_p(α) is the α-significance point of the χ²-distribution with p degrees of freedom; the limiting probability of (2) when the null hypothesis is true and N → ∞ is α. Similarly the confidence region N(x̄ − m)′S⁻¹(x̄ − m) ≤ χ²_p(α) has limiting confidence 1 − α.
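A sketch of the asymptotic test (2) (my own illustration; X is a hypothetical sample from an elliptically contoured distribution with 𝓔R² finite):

    # Sketch of (2): reject mu = mu0 if T^2 exceeds the chi-square point.
    import numpy as np
    from scipy import stats

    def asymptotic_t2_test(X, mu0, alpha=0.05):
        N, p = X.shape
        d = X.mean(axis=0) - mu0
        S = np.cov(X, rowvar=False)
        T2 = N * d @ np.linalg.solve(S, d)
        return T2, T2 > stats.chi2.ppf(1 - alpha, p)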
based on the left spherical density g(Y′Y). Here Y has the representation Y ≝ UR′, where U (N × p) has the uniform distribution on O(N × p), R is lower triangular, and U and R are independent. Then X ≝ ε_Nν′ + UR′C′. The T²-criterion to test the hypothesis ν = 0 is Nx̄′S⁻¹x̄, which is invariant with respect to transformations X → XG. By Corollary 4.5.5 we obtain the following theorem.
( 4)
since X 4: UR'C',
(5)
and
(7)
(9)
PROBLEMS
5.1. (Sec. 5.2) Let x_α be distributed according to N(μ + β(z_α − z̄), Σ), α = 1, …, N, where z̄ = (1/N)Σ z_α. Let b = [1/Σ(z_α − z̄)²] Σ x_α(z_α − z̄), (N − 2)S = Σ[x_α − x̄ − b(z_α − z̄)][x_α − x̄ − b(z_α − z̄)]′, and T² = Σ(z_α − z̄)² b′S⁻¹b. Show that T² has the T²-distribution with N − 2 degrees of freedom. [Hint: See Problem 3.13.]
5.2. (Sec. 5.2.2) Show that T²/(N − 1) can be written as R²/(1 − R²) with the correspondences given in Table 5.1.
Table 5.1
Section 52 Section 4.4
Zla
Z~l
{Fix Q(l) = !:zlaZ~)
B !:xax'a A
22
= !:Z(2)Z(2),
a a
1 == !:xga all = !:zfa
T2 R2
N-l 1-
p P -1
N n
where u₁, …, u_N are N numbers and x₁, …, x_N are independent, each with the distribution N(0, Σ). Prove that the distribution of R²/(1 − R²) is independent of u₁, …, u_N. [Hint: There is an orthogonal N × N matrix C that carries (u₁, …, u_N) into a vector proportional to (1/√N, …, 1/√N).]
5.4. (Sec. 5.2.2) Use Problems 5.2 and 5.3 to show that [T²/(N − 1)][(N − p)/p] has the F_{p,N-p}-distribution (under the null hypothesis). [Note: This is the analysis that corresponds to Hotelling's geometric proof (1931).]
5.5. (Sec. 5.2.2) Let T² = Nx̄′S⁻¹x̄, where x̄ and S are the mean vector and covariance matrix of a sample of N from N(μ, Σ). Show that T² is distributed the same when μ is replaced by λ = (τ, 0, …, 0)′, where τ² = μ′Σ⁻¹μ, and Σ is replaced by I.
5.6. (Sec. 5.2.2) Let u = [T²/(N − 1)]/[1 + T²/(N − 1)]. Show that u = γV′(VV′)⁻¹Vγ′, where γ = (1/√N, …, 1/√N) and
i * 1,
2 P )
v*v*'
II'"
(J
~*'
I'
1 0 o
V2 V 'l
---,
V1Vl
o
E=
,
vpvl
---, 0 1
vlv 1
5.8. (Sec. 5.2.2) Prove that w has the distribution of the square of a multiple
correlation between One vector and p - 1 vectors in (N - l}-space without
subtracting means; that is, it has density
[Hint: The transformation o~ Problem 5.7 is a projection of V 2' •••• V p' 'Y on the
(N - I)-space orthogonal to VI']
5.9. (Sec. 52.2) Verify that r = s/(1 - s) multiplied by (N -1)/1 has the noncen-
tral F-distribution with 1 and N - 1 degrees of freedom and noncentrality
parameter NT2.
5.11. (Sec. 5.3) Use the data in Section 3.2 to test the hypothesis that neither drug has a soporific effect at significance level 0.01.
5.12. (Sec. 5.3) Using the data in Section 3.2, give a confidence region for μ with confidence coefficient 0.95.
5.13. (Sec. 5.3) Prove the statement in Section 5.3.6 that the T²-statistic is independent of the choice of C.
5.14. (Sec. 5.5) Use the data of Problem 4.41 to test the hypothesis that the mean head length and breadth of first sons are equal to those of second sons at significance level 0.01.
5.15. (Sec. 5.6.2) T²-test as a Bayes procedure [Kiefer and Schwartz (1965)]. Let x₁, …, x_N be independently distributed, each according to N(μ, Σ). Let Π₀ be defined by [μ, Σ] = [0, (I + ηη′)⁻¹] with η having a density proportional to |I + ηη′|^{-½N}, and let Π₁ be defined by [μ, Σ] = [(I + ηη′)⁻¹η, (I + ηη′)⁻¹] with η having a density proportional to
(a) Show that the measures are finite for N > p by showing η′(I + ηη′)⁻¹η ≤ 1 and verifying that the integral of |I + ηη′|^{-½N} = (1 + η′η)^{-½N} is finite.
(b) Show that the inequality (26) is equivalent to Nx̄′(Σ_{α=1}^N x_α x_α′)⁻¹x̄ ≥ k. Hence the T²-test is Bayes and thus admissible.
5.16. (Sec. 5.6.2) Let g(t) = f[ty₁ + (1 − t)y₂], where f(y) is a real-valued function of the vector y. Prove that if g(t) is convex, then f(y) is convex.
5.17. (Sec. 5.6.2) Show that z′B⁻¹z is a convex function of (z, B), where B is a positive definite matrix. [Hint: Use Problem 5.16.]
5.18. (Sec. 5.6.2) Prove that if the set A is convex, then the closure of A is convex.
5.19. (Sec. 5.3) Let x̄ and S be based on N observations from N(μ, Σ), and let x be an additional observation from N(μ, Σ). Show that x − x̄ is distributed according to N[0, (1 + 1/N)Σ].
5.20. (Sec. 5.3) Let x_α^{(i)} be observations from N(μ^{(i)}, Σ), α = 1, …, N_i, i = 1, 2. Find the likelihood ratio criterion for testing the hypothesis μ^{(1)} = μ^{(2)}.
5.21. (Sec. 5.4) Prove that μ′Σ⁻¹μ is larger for μ′ = (μ₁, μ₂) than for μ = μ₁ by verifying
Discuss the power of the test μ₁ = 0 compared to the power of the test μ₁ = 0, μ₂ = 0.
(a) Using the data of Section 5.3.4, test the hypothesis μ₁^{(1)} = μ₁^{(2)}.
(b) Test the hypothesis μ₁^{(1)} = μ₁^{(2)}, μ₂^{(1)} = μ₂^{(2)}.
5.24. Let XC!), = (y(i\ Z(i)I), i"", 1,2, where y(l) has p components and Zld has q
components, be distributed according to N(1l-(1),:I}, where
(i) =
( (,)
I-Ly 1 :I=(:In i = 1,2,
I-L Il-~)' 'II)'
Find the likelihood ratio criterion (or eqllivalent T 2-criterion) for testing Il-~'\ =
Il-(;) given Il-~) = Il-~;) on the basis of a sample of N j on X(I\ i = 1,~, l Him:
Express the likelihood in terms of the marginal density of Yl') and the
conditio.lar density of Z(i) given y(i).]
5.25. Find the distribution of the criterion in the preceding problem under the null hypothesis.
5.26. (Sec. 5.5) Suppose x_α^{(g)} is an observation from N(μ^{(g)}, Σ_g), α = 1, …, N_g, g = 1, …, q.
(a) Show that the hypothesis p..(I) = ... = p..(q) is equivalent to £y~) = 0,
i = 1, ... , q - 1, where
5.27. {Sec. 5.2} Prove (25) is the density of V = x~""/( X; + xl}. [Hint: In the joint
density of U = Xa2 and W ... xC make the transformation u = uw(l - U)-I, W = w
and integrate out w.]
CHAPTER 6
Classification of Observations
Table 6.1

                         Statistician's Decision
                         π₁           π₂
Population    π₁         0            C(2|1)
              π₂         C(1|2)       0
statistician can classify him or her as coming from population π₂; if from π₂, the statistician can classify him or her as from π₁. We need to know the relative undesirability of these two kinds of misclassification. Let the cost of the first type of misclassification be C(2|1) (> 0), and let the cost of misclassifying an individual from π₂ as from π₁ be C(1|2) (> 0). These costs may be measured in any kind of units. As we shall see later, it is only the ratio of the two costs that is important. The statistician may not know these costs in each case, but will often have at least a rough idea of them.
Table 6.1 indicates the costs of correct and incorrect classification. Clearly, a good classification procedure is one that minimizes in some sense or other the cost of misclassification.
(3) P(1|2, R) = ∫_{R₁} p₂(x) dx,
It is this average loss that we wish to minimize. That is, we want to divide our
space into regions RI and R2 such that the expected loss is as small as
possible. A procedure that minimizes (5) for given qI and q2 is called a Bayes
procedure.
In the example of admission of students, the undesirability of misclassification is, in one instance, the expense of teaching a student who will not complete the course successfully and is, in the other instance, the undesirability of excluding from college a potentially good student.
The other case we shall treat is that in which there are no known a priori probabilities. In this case the expected loss if the observation is from π₁ is
(1 )
We can also define the conditional probability that an observation came from a certain population given the values of the observed variates. For instance, the conditional probability of coming from population π₁, given an observation x, is
(2)
Suppose for a moment that C(112) = C(21l) = 1. Then the expected loss is
(3)
(4) q₁p₁(x)/[q₁p₁(x) + q₂p₂(x)],   q₂p₂(x)/[q₁p₁(x) + q₂p₂(x)].
(6)
On the right-hand side the second term is a given number; the first term is minimized if R₂ includes the points x such that q₁p₁(x) − q₂p₂(x) < 0 and excludes the points for which q₁p₁(x) − q₂p₂(x) > 0. If
(7) i = 1, 2,
then the Bayes procedure is unique except for sets of probability zero.
Now we notice that mathematically the problem was: given nonnegative constants q₁ and q₂ and nonnegative functions p₁(x) and p₂(x), choose regions R₁ and R₂ so as to minimize (3). The solution is (5). If we wish to minimize (5) of Section 6.2, which can be written
(11) i = 1, 2,
Suppose 0 < q₁ < 1. Then if P(1|2, R*) < P(1|2, R), the right-hand side of (13) is less than zero and therefore P(2|1, R) < P(2|1, R*). Then P(2|1, R*) < P(2|1, R) similarly implies P(1|2, R) < P(1|2, R*). Thus R* is not better than R, and R is admissible. If q₁ = 0, then (13) implies 0 ≤ P(1|2, R*) ≤ P(1|2, R). For a Bayes procedure, R₁ includes only points for which p₂(x) = 0. Therefore, P(1|2, R) = 0, and if R* is to be better, P(1|2, R*) = 0. If Pr{p₂(x) = 0 | π₁} = 0, then P(2|1, R) = Pr{p₂(x) > 0 | π₁} = 1. If P(1|2, R*) = 0, then R₁* contains only points for which p₂(x) = 0. Then P(2|1, R*) = Pr{R₂* | π₁} = Pr{p₂(x) > 0 | π₁} = 1, and R* is not better than R.
Now let us prove the converse, namely, that every admissible procedure is a Bayes procedure. We assume
(14) Pr{p₁(x)/p₂(x) = k | π_i} = 0, i = 1, 2, 0 ≤ k ≤ ∞.
Then for any q₁ the Bayes procedure is unique. Moreover, the cdf of p₁(x)/p₂(x) for π₁ and π₂ is continuous.
Let R be an admissible procedure. Then there exists a k such that
= P(2|1, R*).
The proof of Theorem 6.3.3 shows that the class of Bayes procedures is complete. For if R is any procedure outside the class, we construct a Bayes procedure R* so that P(2|1, R) = P(2|1, R*). Then, since R* is admissible, P(1|2, R) ≥ P(1|2, R*). Furthermore, the class of Bayes procedures is minimal complete since it is identical with the class of admissible procedures.
Finally, let us consider the minimax procedure. Let P(i|j, q₁) = P(i|j, R), where R is the Bayes procedure corresponding to q₁. P(i|j, q₁) is a continuous function of q₁. P(2|1, q₁) varies from 1 to 0 as q₁ goes from 0 to 1; P(1|2, q₁) varies from 0 to 1. Thus there is a value of q₁, say q₁*, such that P(2|1, q₁*) = P(1|2, q₁*). This is the minimax solution, for if there were another procedure R* such that max{P(2|1, R*), P(1|2, R*)} ≤ P(2|1, q₁*) = P(1|2, q₁*), that would contradict the fact that every Bayes solution is admissible.
Now we shall use the general procedure outlined above in the case of two multivariate normal populations with equal covariance matrices, namely, N(μ^{(1)}, Σ) and N(μ^{(2)}, Σ), where μ^{(i)′} = (μ₁^{(i)}, …, μ_p^{(i)}) is the vector of means of the ith population, i = 1, 2, and Σ is the matrix of variances and covariances of each population. [This approach was first used by Wald (1944).] Then the ith density is
(1) p_i(x)
(2)
The region of classification into π₁, R₁, is the set of x's for which (2) is greater than or equal to k (for k suitably chosen). Since the logarithmic function is monotonically increasing, the inequality can be written in terms of the logarithm of (2) as
Theorem 6.4.1. If π_i has the density (1), i = 1, 2, the best regions of classification are given by
(7)
In the particular case of the two populations being equally likely and the costs being equal, k = 1 and log k = 0. Then the region of classification into π₁ is
and variance
(11) Var I( U) = tB'1( fL(l) - fL(2»)'I - I (X - fL(l»)( X - fLO», I ·-1 (fLO) - fL(Z)
(12)
= _11\2
2 '-l •
(14) P(211)
(15) P(1|2)
Figure 6.1 indicates the two probabilities as the shaded portions in the tails
Figure 6.1
Theorem 6.4.2. If the π_i have densities (1), i = 1, 2, the minimax regions of classification are given by (6), where c = log k is chosen by the condition (16) with C(i|j) the two costs of misclassification.
x 1 _l\.~ d
y.
(17)
J.l/2 -J27f
-e',
( 18)
( 19)
(20) [𝓔₁(X′d) − 𝓔₂(X′d)]² / Var(X′d);
the denominator is
The derivatives of (23) with respect to the components of d are set equal to
zero to obtain
(24)
(25)
(1) (N₁ + N₂ − 2)S = Σ_{α=1}^{N₁} (x_α^{(1)} − x̄^{(1)})(x_α^{(1)} − x̄^{(1)})′ + Σ_{α=1}^{N₂} (x_α^{(2)} − x̄^{(2)})(x_α^{(2)} − x̄^{(2)})′.
The first term of (2) is the discriminant function based on two samples
[suggested by Fisher (1936)]. It is the linear function that has greatest
variance between samples relative to the variance within samples (Problem
6.12). We propose that (2) be used as the criterion of classification in the
same way that (5) of Section 6.4 is used.
When the populations are known, we can argue that the classification criterion is the best in the sense that its use minimizes the expected loss in the case of known a priori probabilities and generates the class of admissible procedures when a priori probabilities are not known. We cannot justify the use of (2) in the same way. However, it seems intuitively reasonable that (2) should give good results. Another criterion is indicated in Section 6.5.5.
Suppose we have a sample x₁, …, x_N from either π₁ or π₂, and we wish to classify the sample as a whole. Then we define S by
(3) (N₁ + N₂ + N − 3)S = Σ_{α=1}^{N₁} (x_α^{(1)} − x̄^{(1)})(x_α^{(1)} − x̄^{(1)})′ + Σ_{α=1}^{N₂} (x_α^{(2)} − x̄^{(2)})(x_α^{(2)} − x̄^{(2)})′ + Σ_{α=1}^{N} (x_α − x̄)(x_α − x̄)′,
where
( 4)
(5)
(7)
(8)
(9)
Then
(10) W=
The density of W has been given by Sitgreaves (1952). Anderson (1951a) and Wald (1944) have also studied the distribution of W.
If N₁ = N₂, the distribution of W for X from π₁ is the same as that of -W for X from π₂. Thus, if W ≥ 0 is the region of classification as π₁, then the probability of misclassifying X when it is from π₁ is equal to the probability of misclassifying it when it is from π₂.
(See Problem 3.23.) This can be proved by using the Tchebycheff inequality.
Similarly_
and
(14) plim S = Σ,
and
(17) plim_{N₁,N₂→∞} (x̄^{(1)} + x̄^{(2)})′S⁻¹(x̄^{(1)} − x̄^{(2)}) = (μ^{(1)} + μ^{(2)})′Σ⁻¹(μ^{(1)} − μ^{(2)}).
Theorem 6.5.1. Let W be given by (6) with x̄^{(1)} the mean of a sample of N₁ from N(μ^{(1)}, Σ), x̄^{(2)} the mean of a sample of N₂ from N(μ^{(2)}, Σ), and S the estimate of Σ based on the pooled sample. The limiting distribution of W as N₁ → ∞ and N₂ → ∞ is N(½Δ², Δ²) if X is distributed according to N(μ^{(1)}, Σ) and is N(-½Δ², Δ²) if X is distributed according to N(μ^{(2)}, Σ).
2 N,
(19) E E (y~I)-bl(X~)~i)]2,
i= I 0:= I
where
(20)
2N{ 2N;
(21) ~}2 (x~)-x)(x~)~x)'b= ~ ~ y~i)(x~)~x)
j=lo:=l j=la=1
'2 N,
(22) L E (x~) ~ x)(x~) ~i)'
i=lo:=1
2 N,
= ~ E (x~) ~ i(I»)( x~) - X(I))'
i '" I 0: -= I
2 N,
= ~ E (x~)~x(i))(X~)_i(I))'
i-I 0:-=1
(23)
224 CLASSIFICATION OF OBSERVATIONS
where
(24) A = Σ_{i=1}^{2} Σ_{α=1}^{N_i} (x_α^{(i)} − x̄^{(i)})(x_α^{(i)} − x̄^{(i)})′.
Since (x̄^{(1)} − x̄^{(2)})′b is a scalar, we see that the solution b of (23) is proportional to S⁻¹(x̄^{(1)} − x̄^{(2)}).
A(1)
(25) fJ.l
r.(2)
.... 1
= i(2) ,
N2
+ L (X~2) fi\2»)( x~)
u=1
Since
NI
(26) L (x~) fa.V»)(x~l)- fa.\l)) , + (x- fa.\l))(x fi(I!))'
ao:l
NI
L (x~l) - i(O)( x~l) - i(l))' + N 1( i(l) fa.\I») (i(l) fa.\l))'
a=1
+ (x - fiV»( ~ - fa.\l»)'
NI h
L (X~I) - i(l))( x~l) - i(O)' + Nt ~ 1(x - i(l»)( x - i(l»)',
a~1
we can write i [ as
(27) ~~ I -- N[ + 1N2 + 1 [A + NI N
+ 1 (x - x-(1))( x - x -(I)),] ,
where A is given by (24). Under the assumptions of the alternative hypothesis
we find (by considerations of symmetry) that the maximum likelihood estima-
tors of the parameters are
A(2) _
u -(2)
1V2X
+x
(28) lL2 - N2 + 1 '
~~2-- 1 +1
N 1 +N N2 ( x-x-(2))( x-x-r:l) '1 .
[A + N~+l
2
(29)
(30) N
1+ I (x-x(I))'A- 1 (x-i(I))
N[ +1
n+ N2 (X_i(2))'S-\(x-i(~))
N2 + 1
N~ ( _ ~) _\ _(;)
(31) R . Il
l'
+ N+l- X - Xl- )' S (x - x - )
2
~ Nl~ 1 (X~i(l))'S-I(X-i(l))],
which has the probability limit 0 as N 1, N2 -+ 00. The probabilities of misclas-
sification with Ware equivalent asymptotically to those with Z for large
samples.
Note that for Nl = N 2 - Z = [N1/(N1+ l)]W. Then the symmetric test
based on the cutoff c = 0 is the same for Z and W.
6.5.6. Invariance
The classification problem is invariant with respect to transformations
(34) x_α^{(1)*} = Bx_α^{(1)} + c, α = 1, …, N₁,
x_α^{(2)*} = Bx_α^{(2)} + c, α = 1, …, N₂,
x* = Bx + c,
where B is nonsingular and c is a vector. This transformation induces the following transformation on the sufficient statistics:
+ (1/(2N₂Δ))[u³ + 2Δu² + (p − 3 + Δ²)u + (p − 2)Δ]
+ (1/(4n))[4u³ + 4Δu² + (6p − 6 + Δ²)u + 2(p − 1)Δ]} + O(n⁻²),
and Pr{-(W + ½Δ²)/Δ ≤ u | π₂} is (1) with N₁ and N₂ interchanged.
The rule using W is to assign the observation x to 1T1 if W(x) > c and to
1T2 if W(x):::; c. The probabilities of misclassification are given by Theorem
Corollary 6.6.1
Note tha·l. the correction term is positive, as far as this correction goes;
that is, the probability of misclassification is greater than the value of the
normal approximation. The correction term (to order n -I) increases with p
for given ~ and decreases with ~ for given p.
Since ~ is usually unknown, it is relevant to Studentize W. The sample
Mahalanobis squared distance
(3)
( 4)
(8)
Note that these means and variance are functions of the samples with
probability limits
( 13) P(ll?
. -, c, x-(I) ,X-(2) , S)=l-'+'
'¥
C-J.lX
(2)(-(1) -(2)
,x ,
_(I) -(2)
S) 1
•
[
u(x ,x .S)
In (12) y,.rite c as DU I + tDz. Then the argument of ¢(.) in (12) is
lit Diu + (i(l) - i(:!l)' S -1 (i( I) ..,(1) I u; the first term converges in probabil-
ity to u l ' the second term tends to 0 as Nl -;. 00, N2 -;. 00, and (12) to 4>(u l ).
In (13) write t' as Duz !Dz. Then the argument of q:,(.) in (13) is
It;Dlu+ (i{l) _XI~))'S-l(i(2) -..,(2)/u. The first term converges in proba~
bility to 1I2 and thc second term to 0; (13) converges to 1 ¢(u 2 ).
For given x̄^{(1)}, x̄^{(2)}, and S the (conditional) probabilities of misclassification (12) and (13) are functions of the parameters μ^{(1)}, μ^{(2)}, Σ and can be estimated. Consider them when c = 0. Then (12) and (13) converge in probability to Φ(-½Δ); that suggests Φ(-½D) as an estimator of (12) and (13). A better estimator is Φ(-½D̂), where D̂² = (n − p − 1)D²/n, which is closer to being an unbiased estimator of Δ². [See (4).] McLachlan (1973, 1974a, 1974b, 1974c) gave an estimator of (12) whose bias is of order n⁻²; it is
Theorem 6.6.4. As N\ -;. 00, N!. -;. co, and NIIN2 -;. a pOSitive limit,
Theorem 6.6.6
+ 4~ [u 3 + (4 P - 3) u] } + 0 ( n- 2 ) ,
(18) pr{ - Z +JD ~ U)7T 2}
1
+ 4 n [U6 + ( 4P - 5) u o] .
Then as Nl ~ 00, N2 ~ 00, and Nli N2 ~ a positive limit,
(20)
Suppose we have a priori probabilities of the populations, q₁, …, q_m. Then the expected loss is
q_i p_i(x)
(3)
(4)
(5) Σ_{i=1, i≠j}^{m} q_i p_i(x) C(j|i)
for all j and select that j that gives the minimum. (If two different indices give the minimum, it is irrelevant which index is selected.) This procedure assigns the point x to one of the R_j. Following this procedure for each x, we define our regions R₁, …, R_m. The classification procedure, then, is to classify an observation as coming from π_j if it falls in R_j.
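The rule just described can be sketched directly (my own illustration; the densities, prior probabilities q, and cost matrix are generic user-supplied inputs, not quantities from the text).

    # Sketch of the Bayes rule: assign x to the j that minimizes sum_{i != j} q_i p_i(x) C(j|i).
    import numpy as np

    def bayes_classify(x, densities, q, cost):
        # densities: list of callables p_i(x); q: prior probabilities; cost[j, i] = C(j|i)
        p = np.array([pi(x) for pi in densities])
        losses = [sum(q[i] * p[i] * cost[j, i] for i in range(len(p)) if i != j)
                  for j in range(len(p))]
        return int(np.argmin(losses))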
(6) Σ_{i=1, i≠k}^{m} q_i p_i(x)C(k|i) < Σ_{i=1, i≠j}^{m} q_i p_i(x)C(j|i), j = 1, …, m, j ≠ k.
[If (6) holds for all j (j ≠ k) except for h indices and the inequality is replaced by equality for those indices, then this point can be assigned to any of the h + 1 π's.] If the probability of equality between the right-hand and left-hand sides of (6) is zero for each k and j under π_i (each i), then the minimizing procedure is unique except for sets of probability zero.
where h(x|R) = h_j(x) for x in R_j. For the Bayes procedure R* described in the theorem, h(x|R) is h(x|R*) = min_j h_j(x). Thus the difference between the expected loss for any procedure R and for R* is
≥ 0.
Equality can hold only if h_j(x) = min_i h_i(x) for x in R_j except for sets of probability zero.
Let us see how this method applies when CCjli) = 1 for all i and j, i ¢ j.
Then in Rk
m m
(10) E qiPI(X) < E qiPi(X), j ¢k.
r= 1
I .. j
(11)
'*
We shall now assume that CCiIj) = 1, i j, and Pdpi(X) = OI7Tj } = O. The
latter condition implies that all Pj(x) are positive on the same set (except fer
a set of measure 0). Suppose qj = 0 for i = 1, ... , t, and ql > 0 for i = t +
1, ... , m. Then for the Bayes solution RI , i = 1, ... , t, is empty (except for
a set of probability 0), as seen from (11) [that is, Pm(x) = 0 for x in R J
It follows that rG, R) = L,*jP(jli, R) = 1- PGli, R) = 1 for i = 1, . .. ,t.
Then (R r +], ..• , Rm) is a Bayes solution for the problem involving
Pr+](x), ... ,Pm(x) and ql+], ... ,qm' It follows from Theorem 6.7.2 that no
procedure R* for which PUI i, R*) = 0, i = 1, ... , t, can be better than the
Bayes procedure. Now consider a procedure R* such that Rf includes a set
of positive probability so that POll, R*) > O. For R* to be better than R,
In such a case a procedure R** where Rj* is empty, i = 1, ... , t, Rj* = Rj,
i = t + 1, ... , m - 1, and R':n* = R':n URi U ... URi would give risks such
that
Then Ri:l, ... ,R~*) would be better than (R1+1, ... ,R m) for the (m-t)-
decision problem, which contradicts the preceding discussion.
The conVerse is true without conditions (except that the parameter space
is finite).
We shall now apply the theory of Section 6.7 to the case in which each population has a normal distribution. [See von Mises (1945).] We assume that the means are different and the covariance matrices are alike. Let N(μ^{(i)}, Σ) be the distribution of π_i. The density is given by (1) of Section 6.4. At the outset the parameters are assumed known. For general costs with known a priori probabilities we can form the m functions (5) of Section 6.7 and define the region R_j as consisting of points x such that the jth function is minimum.
In the remainder of our discussion we shall assume that the costs of misclassification are equal. Then we use the functions
The constants c_k can be taken nonnegative. These sets of regions form the class of admissible procedures. For the minimax procedure these constants are determined so that all P(i|i, R) are equal.
We now show how to evaluate the probabilities of correct classification. If X is a random observation, we consider the random variables
(4)
Here u_jk = -u_kj. Thus we use m(m − 1)/2 classification functions if the means span an (m − 1)-dimensional hyperplane. If X is from π_j, then u_jk is distributed according to N(½Δ²_{jk}, Δ²_{jk}), where
(5)
( 6)
f··· f
:x: :x:
(7) P(jlj. R) = f; dUJI .. , dUJ.J-1 dUj,j+1 ." dUjm'
c)-c m c)-c.
(8)
and Σ by S defined by
(9)
(10)
If the variables above are random, the distributions are different from those of the u_jk. However, as N_i → ∞, the joint distributions approach those of the u_jk. Hence, for sufficiently large samples one can use the theory given above.
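A minimal sketch of the resulting rule with estimated means and pooled covariance follows (my own illustration for the equal-prior case, in which the constants c_k are zero; ties and boundary points are ignored).

    # Sketch: pairwise classification functions u_jk(x) for several normal populations
    # with a common covariance matrix, using sample means and the pooled estimate S.
    import numpy as np

    def classify_several(x, means, S):
        m = len(means)
        Sinv_means = [np.linalg.solve(S, mu) for mu in means]
        for j in range(m):
            ok = True
            for k in range(m):
                if k == j:
                    continue
                diff = Sinv_means[j] - Sinv_means[k]
                u_jk = x @ diff - 0.5 * (means[j] + means[k]) @ diff
                if u_jk < 0:
                    ok = False
                    break
            if ok:
                return j            # first j with all u_jk >= 0
        return None                 # ties/boundaries ignored in this sketch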
Table 6.2
Mean measurements in the three populations: Brahmin (π₁), Artisan (π₂), Korwa (π₃)
measurements for each individual of a caste are stature (Xl), sitting height
(x 2), nasal depth (x 3), and nasal height (x 4). The means of these variables in
the three populations are given in Table 6.2. The matrix of correlations for
all the populations is
The standard deviations are σ₁ = 5.74, σ₂ = 3.20, σ₃ = 1.75, σ₄ = 3.50. We assume that each population is normal. Our problem is to divide the space of the four variables x₁, x₂, x₃, x₄ into three regions of classification. We assume that the costs of misclassification are equal. We shall find (i) a set of regions under the assumption that drawing a new observation from each population is equally likely (q₁ = q₂ = q₃ = ⅓), and (ii) a set of regions such that the largest probability of misclassification is minimized (the minimax solution).
We first compute the coefficients of Σ⁻¹(μ^{(1)} − μ^{(2)}) and Σ⁻¹(μ^{(1)} − μ^{(3)}). Then Σ⁻¹(μ^{(2)} − μ^{(3)}) = Σ⁻¹(μ^{(1)} − μ^{(3)}) − Σ⁻¹(μ^{(1)} − μ^{(2)}). Then we calculate ½(μ^{(i)} + μ^{(j)})′Σ⁻¹(μ^{(i)} − μ^{(j)}). We obtain the discriminant functions
Table 6.3
Means, standard deviations, and correlation of the pairs of u's under each population of x
The other three functions are u₂₁(x) = -u₁₂(x), u₃₁(x) = -u₁₃(x), and u₃₂(x) = -u₂₃(x). If there are a priori probabilities and they are equal, the best set of regions of classification are R₁: u₁₂(x) ≥ 0, u₁₃(x) ≥ 0; R₂: u₂₁(x) ≥ 0, u₂₃(x) ≥ 0; and R₃: u₃₁(x) ≥ 0, u₃₂(x) ≥ 0. For example, if we obtain an individual with measurements x such that u₁₂(x) ≥ 0 and u₁₃(x) ≥ 0, we classify him as a Brahmin.
To find the probabilities of misclassification when an individual is drawn from population π_g we need the means, variances, and covariances of the proper pairs of u's. They are given in Table 6.3.
The probabilities of misclassification are then obtained by use of the tables for the bivariate normal distribution. These probabilities are 0.21 for π₁, 0.42 for π₂, and 0.25 for π₃. For example, if measurements are made on a Brahmin, the probability that he is classified as an Artisan or Korwa is 0.21.
The minimax solution is obtained by finding the constants c₁, c₂, and c₃ for (3) of Section 6.8 so that the probabilities of misclassification are equal. The regions of classification are
† Some numerical errors in Anderson (1951a) are corrected in Table 6.3 and (3).
(3) Σ̂₁(1) = [1/(N₁ + 1)] [A₁ + (N₁/(N₁ + 1))(x − x̄^{(1)})(x − x̄^{(1)})′],
Σ̂₂(2) = [1/(N₂ + 1)] [A₂ + (N₂/(N₂ + 1))(x − x̄^{(2)})(x − x̄^{(2)})′].
IIl(2)I~Nqi2(2)lk(Nl+1) [1 + (X_i(2»)IA21(X-i(2»)]~(N2+1)
(4) =~--------------------~-----
Itl(1)I¥N,+1lIi2(1)I~N2 [1 + (x-x(1»)'All(x _X(l»)]~(NI+1)
(Nl + 1)~(N,+I)PMN2PIA21~
Nt N'P (N2 + 1)~(N2+I)pIAIII; .
The observation x is classified into π₁ if (4) is greater than 1 and into π₂ if (4) is less than 1.
An alternative criterion is to plug estimates into the logarithm of (1). Use
to classify into π₁ if (5) is large and into π₂ if (5) is small. Again it is difficult to evaluate the probabilities of misclassification.
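A sketch of this plug-in quadratic criterion follows (my own illustration of substituting sample means and covariance estimates into the log ratio of the two normal densities; x1 and x2 are hypothetical training samples, and any constant term from the priors is omitted).

    # Sketch: plug-in quadratic (unequal-covariance) classification score.
    import numpy as np

    def quadratic_score(x, x1, x2):
        m1, m2 = x1.mean(axis=0), x2.mean(axis=0)
        S1, S2 = np.cov(x1, rowvar=False), np.cov(x2, rowvar=False)
        d1, d2 = x - m1, x - m2
        score = (0.5 * d2 @ np.linalg.solve(S2, d2)
                 - 0.5 * d1 @ np.linalg.solve(S1, d1)
                 + 0.5 * (np.linalg.slogdet(S2)[1] - np.linalg.slogdet(S1)[1]))
        return score               # classify into pi_1 when the score is large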
We may wish to divide the sample space into the two regions of classification by some simple curve or surface. The simplest is a line or hyperplane; the procedure may then be termed linear.
Let b (≠ 0) be a vector (of p components) and c a scalar. An observation x is classified as from the first population if b′x ≥ c and as from the second if b′x < c. We are primarily interested in situations where the important difference between the two populations is the difference between the centers; we assume μ^{(1)} ≠ μ^{(2)} as well as Σ₁ ≠ Σ₂, and that Σ₁ and Σ₂ are nonsingular.
When sampling from the ith population, b′x has a univariate normal distribution with mean 𝓔(b′x | i) = b′μ^{(i)} and variance
(8)
(10)
where "I = fLU) - fL(2). To maximize YI for given Y:. we differentiate YI with
respect to b to obtain
-[b''Y -Y2(b'.I2b)~](b'.Ilb)-~.Ilb.
If we let
(12)
(13)
(15)
Then from (9), (12), and (13)
b'fL(l) (t 2 b'.I 2 b + b'f.l(:!l) r.-:-:;,:;;--::-
(16) YI= Vb'.I1b =tIVb.I1b.
= 2t'Y t .I I + (1 - t) .I2 r
I [ 1 .I I [ t .I 1 + ( 1 - t) .I 2 r I "I
-t2'Y'[t.I1 + (1- t).I2r\.Il - .I::)[t.I 1 + (1 t).I 2 ] I
. .II[t.I 1 + (l-t).I:d I",
1
- t2 'Y'[t.I 1 + (1-t).I 2r .II[t.I 1 + (l-t).I:d I
1
·(.II .I 2)[t.I 1 + (1-t}.I::.1- ",
1
+.I1[t.I! +(1-t).I 2 r .I 2 }[t.I 1 +(I-t).I:,]-I",
Lemma 6JO.1. If I I and I z are positive definite and tl > 0, t2 > 0, then
is posiriue definire.
(19)
•
Similarly dvi!dt < O. Since VI ~ 0, V2 ~ 0, we see that VI increases with t
from 0 at t=O to V-y'I1I'Y at t= 1 and V2 decreases from V'Y''i;i. 1 'Y at
I = 0 to 0 at I = 1. The coordinates v, and v2 are continuous functions of t.
Fot" given Y2' O~r,;s Vy'I 2I y, there is a t such that Y2=v 2 ==t zyb'I 2b
and b satisfies (14) for t, =t and t~ = I-t. Then' y, =v I =t,yb''i'lb maxi-
mizes y, for tLat value of h. Similarly given y" 0 SYI S J'Y'I~I'Y, there is
a I such that YI = v, = t,yb'I ,b and b satisfies (14) for tl = t and t2 = 1 - t,
and Y2 = V2 = t 2 yb'I 2 b maximizes Y2' Note that y, ~ 0, Y2 ~ 0 implies the
crror5 of misclassification are not greater Lhan ~.
We now argue that the set of y₁, y₂ defined this way corresponds to
admissible linear procedures. Let x₁, x₂ be in this set, and suppose another
procedure defined by z₁, z₂ were better than x₁, x₂, that is, x₁ ≤ z₁, x₂ ≤ z₂
with at least one strict inequality. For y₁ = z₁ let y₂* be the maximum y₂
among linear procedures; then z₁ = y₁, z₂ ≤ y₂*, and hence x₁ ≤ y₁, x₂ ≤ y₂*.
However, this is possible only if x₁ = y₁, x₂ = y₂*, because dy₁/dy₂ < 0. Now
we have a contradiction to the assumption that z₁, z₂ was better than x₁, x₂.
Thus x₁, x₂ corresponds to an admissible linear procedure.
Since y₁ increases with t and y₂ decreases with increasing t, there is one and
only one solution to (20), and this can be approximated by trial and error by
guessing a value of t (0 < t < 1), solving (14) for b, and computing the
quadratic form on the right of (20). Then another t can be tried.
An alternative approach is to set y₁ = y₂ in (9) and solve for c. Then the
common value of y₁ = y₂ is
$$(21)\qquad y_1 = y_2 = \frac{b'\gamma}{\sqrt{b'\Sigma_1 b} + \sqrt{b'\Sigma_2 b}}.$$
(22)
$$(24)\qquad q_1\phi(y_1)\frac{dy_1}{dt} + q_2\phi(y_2)\frac{dy_2}{dt} = 0,$$
where φ(u) = (2π)^{−1/2}e^{−u²/2}. There does not seem to be any easy or direct
way of solving (24) for t. The left-hand side of (24) is not necessarily
monotonic. In fact, there may be several roots to (24). If there are, the
absolute minimum will be found by putting the solutions into (23). (We
remind the reader that the curve of admissible error probabilities is not
necessarily convex.)
Anderson and Bahadur (1962) studied these linear procedures in general,
including y₁ < 0 and y₂ < 0. Clunies-Ross and Riffenburgh (1960) approached
the problem from a more geometric point of view.
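To make the procedure concrete, the following Python sketch (not from the text; the parameter values, function names, and bisection scheme are illustrative) solves (14) for b at a trial t and adjusts t until the two error probabilities Φ(−y₁) and Φ(−y₂) agree, as described above.

import numpy as np
from scipy.stats import norm

def ab_direction(t, mu1, mu2, sigma1, sigma2):
    """Direction b = [t*Sigma1 + (1-t)*Sigma2]^{-1} (mu1 - mu2), as in (14)."""
    return np.linalg.solve(t * sigma1 + (1 - t) * sigma2, mu1 - mu2)

def error_gap(t, mu1, mu2, sigma1, sigma2):
    """y1 - y2 with y1 = t*sqrt(b'S1 b), y2 = (1-t)*sqrt(b'S2 b); zero at the minimax t."""
    b = ab_direction(t, mu1, mu2, sigma1, sigma2)
    return t * np.sqrt(b @ sigma1 @ b) - (1 - t) * np.sqrt(b @ sigma2 @ b)

# Illustrative parameters (not from the text).
mu1, mu2 = np.array([1.0, 0.5]), np.array([-0.5, 0.0])
sigma1 = np.array([[2.0, 0.3], [0.3, 1.0]])
sigma2 = np.array([[1.0, -0.2], [-0.2, 1.5]])

# Bisection on t in (0,1): y1 increases and y2 decreases with t, so the root is unique.
lo, hi = 1e-6, 1 - 1e-6
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if error_gap(mid, mu1, mu2, sigma1, sigma2) < 0:
        lo = mid
    else:
        hi = mid
t = 0.5 * (lo + hi)
b = ab_direction(t, mu1, mu2, sigma1, sigma2)
c = b @ mu1 - t * (b @ sigma1 @ b)           # cutoff: classify into pi_1 if b'x >= c
common_error = norm.sf(t * np.sqrt(b @ sigma1 @ b))   # equal probability of misclassification
print(t, b, c, common_error)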
PROBLEMS
6.1. (Sec. 6.3) Let πᵢ be N(μ, Σᵢ), i = 1, 2. Find the form of the admissible
classification procedures.
6.2. (Sec. 6.3) Prove that every complete class of procedures includes the class of
admissible procedures.
6.3. (Sec. 6.3) Prove that if the class of admissible procedures is complete, it is
minimal complete.
6.5. (Sec. 6.3) When p(x) = n(x | μ, Σ), find the best test of μ = 0 against μ = μ*
at significance level ε. Show that this test is uniformly most powerful against all
alternatives μ = cμ*, c > 0. Prove that there is no uniformly most powerful test
against μ = μ⁽¹⁾ and μ = μ⁽²⁾ unless μ⁽¹⁾ = cμ⁽²⁾ for some c > 0.
6.6. (Sec. 6.4) Let P(2|1) and P(1|2) be defined by (14) and (15). Prove that if
−½Δ² < c < ½Δ², then P(2|1) and P(1|2) are decreasing functions of Δ.
6.7. (Sec. 6.4) Let x′ = (x⁽¹⁾′, x⁽²⁾′). Using Problem 5.23 and Problem 6.6, prove
that the class of classification procedures based on x is uniformly as good as
the class of procedures based on x⁽¹⁾.
6.8. (Sec. 6.5.1) Find the criterion for classifying irises as Iris setosa or Iris
versicolor on the basis of data given in Section 5.3.4. Classify a random sample
of 5 Iris virginica in Table 3.4.
6.9. (Sec. 6.5.1) Let W(x) be the classification criterion given by (2). Show that the
T²-criterion for testing N(μ⁽¹⁾, Σ) = N(μ⁽²⁾, Σ) is proportional to W(x̄⁽¹⁾) and
W(x̄⁽²⁾).
6.10. (Sec. 6.5.1) Show that the probabilities of misclassification of x₁, …, x_N (all
assumed to be from either π₁ or π₂) decrease as N increases.
6.Il. (Sec. 6.5) Show that the elements of M are invariant under the transforma-
tion (34) and that any function of the sufficient statistics that is invariant is a
function of M.
6.12. (Sec. 6.5) Consider d' xU). Prove that the ratio
NI ~ N! ,
L (d'X~I) - d'i(l'f + L (d'X~2\ - d'i(2)f
a=I a~l
6.13. (Sec. 6.6) Show that the derivative of (2) to terms of order n⁻¹ is
6.14. (Sec. 6.6) Show ℰD² is (4). [Hint: Let Σ = I and show that ℰ(S⁻¹ | Σ = I) =
[n/(n − p − 1)]I.]
$$\;+\;\frac{1}{4n}\bigl[3u^3 + 4\Delta u^2 + (2p - 3 + \Delta^2)u + 2(p-1)\Delta\bigr]\Bigr\} + O(n^{-2}).$$
6.16. (Sec. 6.8) Let πᵢ be N(μ⁽ⁱ⁾, Σ), i = 1, …, m. If the μ⁽ⁱ⁾ are on a line (i.e.,
μ⁽ⁱ⁾ = μ + νᵢβ), show that for admissible procedures the Rᵢ are defined by
parallel planes. Thus show that only one discriminant function u₁₂(x) need be
used.
6.17. (Sec. 6.8) In Section 8.8 data are given on samples from four populations of
skulls. Consider the first two measurements and the first three samples.
Construct the classification functions u_{ij}(x). Find the procedure for qᵢ =
Nᵢ/(N₁ + N₂ + N₃). Find the minimax procedure.
6.18. (Sec. 6.10) Show that b′x = c is the equation of a plane that is tangent to an
ellipsoid of constant density of π₁ and to an ellipsoid of constant density of π₂
at a common point.
6.19. (Sec. 6.8) Let x₁⁽ⁱ⁾, …, x_{Nᵢ}⁽ⁱ⁾ be observations from N(μ⁽ⁱ⁾, Σ), i = 1, 2, 3, and let
x be an observation to be classified. Give explicitly the maximum likelihood
rule.
7.1. INTRODUCTION
(1 )
(2)
(3) i = 2, …, p.
Note that t_{ij}, j = 1, …, i − 1, are the first i − 1 coordinates of vᵢ in the
coordinate system with w₁, …, w_{i−1} as the first i − 1 coordinate axes. (See
Figure 7.1.) The sum of the other n − i + 1 coordinates squared is ‖vᵢ‖² −
Σ_{j=1}^{i−1} t_{ij}² = t_{ii}² = ‖wᵢ‖²; wᵢ is the vector from vᵢ to its projection on w₁, …, w_{i−1}
(or equivalently on v₁, …, v_{i−1}).
~ __ ~------------------l
i
(7) t~ = L Clkt k], i ~j,
k=]
=0, i <j,
can be written
t71 c ll 0 0 o 0 tIl
ttl X C22 0 o 0 t2l
tr2 x x cn o 0 t22
( 8) =
'~I X X x
t;p x x x
p
(9) L L tl~· = tr TT'
1= 1 j= 1
( 10)
$$(11)\qquad \frac{\partial a_{hi}}{\partial t^*_{kl}} = 0,\qquad k>h,$$
that is, ∂a_{hi}/∂t*_{kl} = 0 if k, l is beyond h, i in the lexicographic ordering. The
Jacobian of the transformation from A to T* is the determinant of the lower
triangular matrix with diagonal elements
$$(12)\qquad \frac{\partial a_{hh}}{\partial t^*_{hh}} = 2t^*_{hh},$$
$$(13)\qquad \frac{\partial a_{hi}}{\partial t^*_{hi}} = t^*_{ii},\qquad h>i.$$
The Jacobian is therefore $2^p\prod_{i=1}^{p} t_{ii}^{*\,p+1-i}$. The Jacobian of the transformation
from T* to A is the reciprocal.
$$(14)\qquad w(A\mid\Sigma,n) = \frac{|A|^{\frac12(n-p-1)}\,\exp\bigl(-\tfrac12\operatorname{tr}\Sigma^{-1}A\bigr)}{2^{\frac12 np}\,|\Sigma|^{\frac12 n}\,\Gamma_p\!\left(\tfrac12 n\right)}.$$
The density (14) will be denoted by w(A | Σ, n), and the associated distribution
will be termed W(Σ, n). If n < p, then A does not have a density, but
its distribution is nevertheless defined, and we shall refer to it as W(Σ, n).
(15)
(16)
Then the product of (15) and (16) for i = 2, ... , p is (6) times dtu ... dtpp •
This analysis, which exactly parallels the geometric derivation by Wishart
[and later by Mahalanobis, Bose, and Roy (1937)], was given by Sverdrup
†In the first edition of this book, the derivation of the Wishart distribution and its geometric
interpretation were in terms of the nonorthogonal vectors v₁, …, v_p.
(1947) [and by Fog (1948) for p = 3]. Another method was used by Madow
(1938), who drew on the distribution of correlation coefficients (for Σ = I)
obtained by Hotelling by considering certain partial correlation coefficients.
Hsu (1939b) gave an inductive proof, and Rasch (1948) gave a method
involving the use of a functional equation. A different method is to obtain the
characteristic function and invert it, as was done by Ingham (1933) and by
Wishart and Bartlett (1933).
Cramér (1946) verified that the Wishart distribution has the characteristic
function of A. By means of alternative matrix transformations Elfving (1947),
Mauldon (1955), and Olkin and Roy (1954) derived the Wishart distribution
via the Bartlett decomposition; Kshirsagar (1959) based his derivation on
random orthogonal transformations. Narain (1948), (1950) and Ogawa (1953)
used a regression approach. James (1954), Khatri and Ramachandran (1958),
and Khatri (1963) applied different methods. Giri (1977) used invariance.
Wishart (1948) surveyed the derivations up to that date. Some of these
methods are indicated in the problems.
The relation A = TT′ is known as the Bartlett decomposition [Bartlett
(1939)], and the (nonzero) elements of T were termed rectangular coordinates
by Mahalanobis, Bose, and Roy (1937).
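As an aside, the Bartlett decomposition gives a direct way to simulate W(Σ, n). The Python sketch below (illustrative; not part of the text) draws t_{ii}² from χ²_{n+1−i}, the below-diagonal t_{ij} from N(0, 1), and sets A = C(TT′)C′ with Σ = CC′.

import numpy as np

rng = np.random.default_rng(0)

def sample_wishart_bartlett(sigma, n, rng):
    """Draw A ~ W(Sigma, n) via the Bartlett decomposition A* = T T',
    with t_ii^2 ~ chi^2_{n+1-i}, t_ij ~ N(0,1) for i > j, and A = C A* C',
    where Sigma = C C' (lower-triangular Cholesky factor)."""
    p = sigma.shape[0]
    T = np.zeros((p, p))
    for i in range(p):
        T[i, i] = np.sqrt(rng.chisquare(n - i))   # chi^2 with n+1-(i+1) = n-i degrees of freedom
        T[i, :i] = rng.standard_normal(i)         # below-diagonal N(0,1) entries
    C = np.linalg.cholesky(sigma)
    return C @ (T @ T.T) @ C.T

sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
n = 20
draws = np.stack([sample_wishart_bartlett(sigma, n, rng) for _ in range(20000)])
print(draws.mean(axis=0))   # should be close to n * Sigma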
Corollary 7.2.4
$$(17)\qquad \int_{B>0} |B|^{\,t-\frac12(p+1)}\,e^{-\operatorname{tr}B}\,dB = \pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\!\left[t - \tfrac12(i-1)\right].$$
Proof Here B > 0 denotes B positive definite. Since (14) is a density. its
integral for A > 0 1S 1. Let 'I = J, A = 2B (dA = 2 dB), and 12 = 2t. Then the
fact that the integral is 1 is identical to (17) for t a half integer. However. if
we derive (14) from (6), we can let n be any real number greater than p .- 1.
In fact (17) holds for complex t such that !Ilt > p - 1. (/I/t means the real
part of t.) _
$$(18)\qquad \Gamma_p(t) = \pi^{p(p-1)/4}\prod_{i=1}^{p}\Gamma\!\left[t - \tfrac12(i-1)\right].$$
$$(19)\qquad w(A\mid\Sigma,n) = \frac{|A|^{\frac12(n-p-1)}\,e^{-\frac12\operatorname{tr}\Sigma^{-1}A}}{2^{\frac12 np}\,|\Sigma|^{\frac12 n}\,\Gamma_p\!\left(\tfrac12 n\right)}.$$
( 1)
Let
(2)
a=1
= sexp(i i: Z~(H)Za).
a=1
where Z has the density (1). For (H) real, there is a real nonsingular matrix B
such that
(5) B'I-IB =1,
(6) B'0B=D,
where D is a real diagonal matrix (Theorem A.: .2 of the Appendix). Jf we let
z = By, then
(7) cS' exp( iZ'0Z) = S exp( if' Df)
p
= sO exp(id"Y,z)
,= I
p
=
,=0 1 S exp(id"Y/)
p
(8) tC exp( iZ'0Z) = n (1- 2id,,)
,= I
-!
: = II - 2iDI- z
,
since 1- 2iD is a diagonal matrix. From (5) and (6) we see that
= IB'(I-I - 2i0)BI
= IB'I'II-' -2i01·IBI
= IBI 2'II- 1 -2i01,
(10)
It can be shown that the result is valid provided the matrix (ℜ(σ^{jk} − 2iθ_{jk})) is positive
definite. In particular, it is true for all real Θ. It also holds for Σ singular.
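A quick Monte Carlo check of the characteristic function ℰ exp(i tr ΘA) = |I − 2iΘΣ|^{−n/2} (a sketch with illustrative parameter values, not from the text):

import numpy as np

rng = np.random.default_rng(1)
p, n = 2, 8
sigma = np.array([[1.5, 0.4], [0.4, 1.0]])
theta = np.array([[0.10, 0.03], [0.03, 0.05]])   # symmetric Theta, small entries for a stable estimate

# Monte Carlo estimate of E exp(i tr(Theta A)), where A = sum Z_a Z_a' and Z_a ~ N(0, Sigma)
reps = 200000
Z = rng.multivariate_normal(np.zeros(p), sigma, size=(reps, n))
A = np.einsum('rai,raj->rij', Z, Z)
mc = np.mean(np.exp(1j * np.einsum('ij,rji->r', theta, A)))   # tr(Theta A) for each replicate

# Closed form |I - 2i Theta Sigma|^{-n/2}
exact = np.linalg.det(np.eye(p) - 2j * theta @ sigma) ** (-n / 2)
print(mc, exact)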
(11)
(12) A =CBC',
(13)
(15)
(16)
Theorem 7.3.5. Let A and I be pal1ilioned into PI,P2, ... ,Pq rows and
columns (PI + ... +P q = p),
(17) A=
Proof. A is distributed as Σ_{α=1}^n Z_α Z_α′, where Z₁, …, Z_n are independently
distributed, each according to N(0, Σ). Let Z_α be partitioned
$$(18)\qquad Z_\alpha = \begin{pmatrix} Z_\alpha^{(1)}\\ \vdots\\ Z_\alpha^{(q)}\end{pmatrix}$$
as A and Σ have been partitioned. Since Σ_{ij} = 0, the sets Z₁⁽¹⁾, …, Z_n⁽¹⁾; …;
Z₁⁽ᑫ⁾, …, Z_n⁽ᑫ⁾ are independent. Then A₁₁ = Σ_{α=1}^n Z_α⁽¹⁾Z_α⁽¹⁾′, …,
A_{qq} = Σ_{α=1}^n Z_α⁽ᑫ⁾Z_α⁽ᑫ⁾′ are independent. The rest of Theorem 7.3.5 follows from
Theorem 7.3.4. ∎
(1)
then
m
(2) I: r, = N
1=1
o
(3) I
o
where I is of order r" the upper left-hand 0 is square of order Lj-== ~ r; (which is
vacuous for i = 1), and the lower-right hand 0 is square of order L~= i + 1 r; (which
is vacuous for i = m).
Proof The necessity follows from the fact that (1) implies that the sum of
(3) over i = 1, ... , m is IN' Now let us prove the sufficiency; we assume (2).
o
(4) ~,
o
where the partitloning is according to (3), and ~, is diagonal of order ri' This
is possible in view of (2). Then
o
(5) I-~ ,
o
Since the rank of (5) is not greater than L:~~ 1 r, - r, = N - r" whlch is the sum
of the orders of the upper left-hand and lower right-hand 1 's in (5). the rank
of I-~, is 0 and ~,= I, (Thus the r, nonzero rootS of C, are 1, and C1 is
positive semidefinite.) From (4) we obtain
o
(6) 1
o
where B, con~ists of the rl columns of P; corresponding to 1 in (6). From (1)
we obtain
B,
B'2
(7) =P'P,
B'm
N
(8) Q,= E c~pYaY;, i= l •.. "m,
()(. {3 '" I
(9)
;=1 a=1
It follows from (3) that Ci is idempotent. See Section A.2 of the Appendix.
This theorem is useful in generalizing results from the univariate analysis
of variance. (See Chapter 8.) As an example of the use of this theorem, let us
prove that the mean of a sample of size N times its transpose and a multiple
of the sample covariance matrix are independently distributed with a singular
and a nonsingular Wishart distribution, respectively. Let Y₁, …, Y_N be independently
distributed, each according to N(0, Σ). We shall use the matrices
C₁ = (c^{(1)}_{αβ}) = (1/N) and C₂ = (c^{(2)}_{αβ}) = [δ_{αβ} − (1/N)]. Then
$$(10)\qquad Q_1 = \sum_{\alpha,\beta=1}^{N}\frac1N\,Y_\alpha Y_\beta' = N\bar Y\bar Y',$$
$$(11)\qquad Q_2 = \sum_{\alpha,\beta=1}^{N}\Bigl[\delta_{\alpha\beta}-\frac1N\Bigr]Y_\alpha Y_\beta' = \sum_{\alpha=1}^{N}Y_\alpha Y_\alpha' - N\bar Y\bar Y' = \sum_{\alpha=1}^{N}(Y_\alpha-\bar Y)(Y_\alpha-\bar Y)'.$$
1 N
(1) lSI N=t L (xa i)(xa- x)'.
a-I
where e = (1, ... , 1)', are orthogonal to the equiangular line (through the
origin and e); see Figure 3.2. Then the entries of
(4)
We now apply this theorem to the parallelotope having the rows of (2)
as principal edges. The dimensionality in Theorem 7.5.1 is arbitrary (but at
least p).
( 6) IAI =
LYp-l.aYla LY;-l.a LYp- I, .sYP I3
a a {3
(8)
where the sum on f3 is over ('Y I' ... , 'Yp). If we now expand this determinant
in the manner used for IA I, we obtain
(9)
where the sum is for each f3; over the range ('Y I' ... , 'Yp )' Summing (9) over all
different sets ('YI < .. ' 'Yp )' we obtain (7). (IYi.BjY'.B) = 0 if two or more f3; are
equal.) Thus IAI is the sum of volumes squared of all different parallelotopes
formed by sets of p of the vectors yO'. as principal edges. If we replace YOi by
xa - i, we can state the following theorem:
Theorem 7.5.2. Let |S| be defined by (1), where x₁, …, x_N are the N
vectors of a sample. Then |S| is proportional to the sum of squares of the
volumes of all the different parallelotopes formed by using as principal edges p
vectors with p of x₁, …, x_N as one set of endpoints and x̄ as the other, and the
factor of proportionality is 1/(N − 1)^p.
(11 )
The v~lume of this ellipsoid is C(p)1 II t[ Xp2(ex)]W Ip, where C(p) is defined
in Problem 7.3.
then |A| = |C|·|B|·|C′| = |B|·|Σ|. By the development in Section 7.2 we
see that |B| has the distribution of ∏_{i=1}^p t_{ii}² and that t₁₁², …, t_{pp}² are independently
distributed with χ²-distributions.
If p = 1, |S| has the distribution of |Σ|·χ²_{N−1}/(N − 1). If p = 2, |S| has
the distribution of |Σ|·χ²_{N−1}·χ²_{N−2}/(N − 1)². It follows from Problem 7.15 or
7.37 that when p = 2, |S| has the distribution of |Σ|(χ²_{2N−4})²/(2N − 2)².
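A minimal simulation check of this product-of-chi-squares representation of the generalized variance (the parameters p, n, Σ below are illustrative, not from the text):

import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 3, 12, 100000
sigma = np.array([[2.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.1, 0.2, 1.5]])

# |A| for A ~ W(Sigma, n), simulated directly from normal samples
Z = rng.multivariate_normal(np.zeros(p), sigma, size=(reps, n))
A = np.einsum('rai,raj->rij', Z, Z)
detA = np.linalg.det(A)

# |Sigma| * chi^2_n * chi^2_{n-1} * ... * chi^2_{n-p+1}
chi_prod = np.ones(reps)
for i in range(p):
    chi_prod *= rng.chisquare(n - i, size=reps)
ref = np.linalg.det(sigma) * chi_prod

print(detA.mean(), ref.mean())   # both means equal |Sigma| * n(n-1)...(n-p+1)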
can write
i
nV;(n) - (n--y +i) = In V;(n) -1 + n
(13)
"';2( n - p f-i) 12 VI _P ; i
We have
Vl(n)]
( 19) U(n)= : ,
(~(n)
IBI In P = w = f(u!> .. " up) = U t U 2 ••• up> T = 21, afl auil ll _ b = 1, and CP~TCPb
= 2p. Thus
(20)
since
$$(2)\qquad |\Sigma| = \begin{vmatrix} \sigma_{11} & 0 & \cdots & 0\\ 0 & \sigma_{22} & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & \sigma_{pp}\end{vmatrix} = \prod_{i=1}^{p}\sigma_{ii}.$$
We make the transformation
We make the transformation
(3) i:;:},
The Jacobian is the product of the Jacobian of (4) and that of (3) for a_{ii}
fixed. The Jacobian of (3) is the determinant of a p(p − 1)/2-order diagonal
matrix with diagonal elements √(a_{ii}a_{jj}). Since each particular subscript k,
say, appears in the set r_{ij} (i < j) p − 1 times, the Jacobian is
$$(5)\qquad J = \prod_{i=1}^{p} a_{ii}^{\frac12(p-1)}.$$
If we substitute from (3) and (4) into w[A | (σ_{ii}δ_{ij}), n] and multiply by (5), we
obtain as the joint density of {a_{ii}} and {r_{ij}}
(6)
since
(7)
where r_{ii} = 1. In the ith term of the product on the right-hand side of (6), let
a_{ii}/(2σ_{ii}) = uᵢ; then the integral of this term is
(8)
by definition of the gamma function (or by the fact that a_{ii}/σ_{ii} has the
χ²-density with n degrees of freedom). Hence the density of r_{ij} is
Theorem 7.7.1. If A has the distribution W(Σ, m), then B = A⁻¹ has the
density
$$(1)\qquad \frac{|\Psi|^{\frac12 m}\,|B|^{-\frac12(m+p+1)}\,e^{-\frac12\operatorname{tr}\Psi B^{-1}}}{2^{\frac12 mp}\,\Gamma_p\!\left(\tfrac12 m\right)},$$
where Ψ = Σ⁻¹.
We shall call (1) the density of the inverted Wishart distribution with m
degrees of freedom,† and denote the distribution by W⁻¹(Ψ, m) and the
density by w⁻¹(B | Ψ, m). We shall call Ψ the precision matrix or concentration
matrix.
†The definition of the number of degrees of freedom differs from that of Giri (1977), p. 104, and
Muirhead (1982), p. 113.
Theorem 7.7.2. If A has the distribution W(Σ, n) and Σ has the a priori
distribution W⁻¹(Ψ, m), then the conditional distribution of Σ is W⁻¹(A +
Ψ, n + m).
for A and Σ positive definite. The marginal density of A is the integral of (2)
over the set of Σ positive definite. Since the integral of (1) with respect to B
is 1 identically in Ψ, the integral of (2) with respect to Σ is
(4)
2t(Il+ml P rJHn +m)]
Corollary 7.7.1. If nS has the distribution W(Σ, n) and Σ has the a priori
distribution W⁻¹(Ψ, m), then the conditional distribution of Σ given S is
W⁻¹(nS + Ψ, n + m).
Corollary 7.7.2. If nS has the distribution W(Σ, n), Σ has the a priori
distribution W⁻¹(Ψ, m), and the loss function is tr(D − Σ)G(D − Σ)H, where
G and H are positive definite, then the Bayes estimator for Σ is
$$(5)\qquad \frac{1}{n+m-p-1}\,(nS + \Psi).$$
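A minimal sketch of this Bayes estimator in Python (the prior parameters Ψ and m below are arbitrary illustrative choices, not from the text):

import numpy as np

rng = np.random.default_rng(4)
p, N = 3, 40
true_sigma = np.diag([2.0, 1.0, 0.5])
X = rng.multivariate_normal(np.zeros(p), true_sigma, size=N)

n = N - 1
S = np.cov(X, rowvar=False)        # unbiased sample covariance, so n S ~ W(Sigma, n)

# Inverse-Wishart prior W^{-1}(Psi, m); Psi and m are illustrative choices
Psi = np.eye(p)
m = p + 4

# Posterior is W^{-1}(n S + Psi, n + m); Bayes estimator under the quadratic loss
# of Corollary 7.7.2 is (n S + Psi) / (n + m - p - 1)
sigma_bayes = (n * S + Psi) / (n + m - p - 1)
print(S)
print(sigma_bayes)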
The marginal density of x̄ and A is the integral of (8) with respect to μ and
Σ. The exponential in (8) is −½ times
$$(9)\qquad (N+K)\,\mu'\Sigma^{-1}\mu - 2(N\bar x + K\nu)'\Sigma^{-1}\mu + N\bar x'\Sigma^{-1}\bar x + K\nu'\Sigma^{-1}\nu + \operatorname{tr}(A+\Psi)\Sigma^{-1}.$$
(11) 1Ttvr~[-!(N-1)]rp(!m)(N+K)1P
·IAI !(N-p-2)1 Wllml 'I' + A + N:KK (i - v )(i - v)'1- 1(N+m).
The conditional density of j-L and I given i and A is the ratio of (8) to (11),
namely,
Corollary 7.7.3. If x₁, …, x_N are observations from N(μ, Σ), if μ and Σ
have the a priori density n[μ | ν, (1/K)Σ] × w⁻¹(Σ | Ψ, m), and if the loss
function is (d − μ)′J(d − μ) + tr(D − Σ)G(D − Σ)H, then the Bayes estimators
of μ and Σ are
$$(13)\qquad \frac{1}{N+K}\,(N\bar x + K\nu)$$
and
respectively.
The estimator of j-L is a weighted average of the sample mean i and the
a priori mean v. If N is large, the a priori mean has relatively little weight.
Theorem 7.7.4. If x₁, …, x_N are observations from N(μ, Σ) and if μ and
Σ have the a priori density n[μ | ν, (1/K)Σ] × w⁻¹(Σ | Ψ, m), then the marginal
a posteriori density of μ given x̄ and S is
(15)
(16)
( 17)
1TWrp[ !(N + m)] IB + (N + K)(JL - JL* )(JL - JL* )'1 t(N+m+l) •
(x̄ → Cx̄, S → CSC′, Σ → CΣC′). We consider two loss functions which are
invariant with respect to such transformations.
One loss function is quadratic:
(1)
where G is a positive definite matrix. The other is based on the form of the
lik.elihood function:
(2)
(See Lemma 3.2.2 and alternative proofs in Problems 3.4, 3.8, and 3.12.) Each
of these is 0 when G = I and is positive when G ¢ I. The second loss
function approaches 00 as G approaches a singular matrix or when one or
more elements (or one or more characteristic roots) of G approaches x. (See
proof of Lemma ~.2.2.) Each is invariant with respect to transformations
G* = CGC', I* = CIC'. We can see some properties of the loss fUllctions
from L/[, D) = Lf. [(d/i - 1)2 and L I ([, D) = L;= I(d ll - log d j , - n, where
D is diagonal. (By Theorem A.2.2 of the Appendix for arbitrary positive
definite I and symmetric G, there exists a nonsingular C such that CIC' = I
and CGC' = D.) If we let g = (gIl' . " , gPP' g12' ... , gp_ I.p)', s =
(su, ... ,Spp,S[2,""Sp_[,p)" fT=(u[[, ... 'Upp '0"12 ••••• 0"p-I.P)', and <1>=
S(s - fT )(s - fT)', then Lq(I G) is a constant multiple of (g- fT )''1>- [(g- fT).
(See Problem 7.33.)
The maximum likelihood estimator Σ̂ and the unbiased estimator S are of
the form aA, where A has the distribution W(Σ, n) and n = N − 1.
2
= $/ tr ( aA. * -[)
= $/ (a
2
.~
',J=[
a7/ - 2a t a~, + pl
,=[
= cF1{atrA* -logIA*I-ploga-p}
Although the minimum risk of the estimator of the form aA is constant for
its loss function, the estimator is not minimax. We shall now consider
estimators G(A) such that
for lower triangular matrices H. The two loss functions are invariant with
respect to transformations G* = HGH′, Σ* = HΣH′.
Let A = I and H be the diagonal matrix Dᵢ with −1 as the ith diagonal
element and 1 as each other diagonal element. Then HAH′ = I, and the
i, jth component of (5) is
(6) j ≠ i.
Hence g_{ij}(I) = 0, i ≠ j, and G(I) is diagonal, say D. Since A = TT′ for T
lower triangular, we have
= TG(I)T′
= TDT′,
(8)
c8'~L[I,G(A)] = jL[I,G(A)]C(p,n)III-~nIAIHn-p-l)e-t!rrlAdA
= jL[KK',G(A)]C(p,n)IKJ('I-tnIAIHn-P-l)
= $JL[KK'.KG(A*)K']
= $JL[I.G(A*)]
$J tr (TDT' - 1)2
The expectations can be evaluated by using the fact that the (nonzero)
elements of T are independent, t_{ii}² has the χ²-distribution with n + 1 − i
degrees of freedom, and t_{ij}, i > j, has the distribution N(0, 1). Then
(10)
fij=n+p-2j+l, i <},
f;=n+p+2i+l,
Theorem 7.8.2. With respect to the quadratic loss function the best estimator
invariant with respect to linear transformations Σ → HΣH′, A → HAH′, where
H is lower triangular, is G(A) = TDT′, where D is the diagonal matrix whose
diagonal elements compose d = F⁻¹f, F and f are defined by (11), and A = TT′
with T lower triangular.
In the case of p = 2
~
(12)
d = (n+lr-(n-l) d = (~ + 1)( n + 2) .
I ~
(n + 1r( n + 3) - (n - 1)
2
r
(n + 1 (n + 3) - (n - 1)
The risk is
2
2 + Sn + 4
3n
(13)
n + 5n 2 + 6n + 4 .
3
The difference between the risks of the best estimator aA. and the best
estimator TDT' is
The difference is 5/15 for n = 2 (relative to ⅔) and 4/17 for n = 3 (relative to 1);
it is of the order 2/n²; the improvement due to using the estimator TDT′ is
not great, at least for p = 2.
For the likelihood loss function we calculate
$$(15)\qquad \mathscr{E}_I L_l[I,G(A)] = \mathscr{E}_I L_l[I,TDT'] = \mathscr{E}_I\bigl[\operatorname{tr}TDT' - \log|TDT'| - p\bigr]$$
†The essential condition is that the group is solvable. See Kiefer (1966) and Kudo (1955).
$$= \mathscr{E}_I\Bigl[\sum_{j=1}^{p}\sum_{i=j}^{p} t_{ij}^2\,d_j - \sum_{i=1}^{p}\log t_{ii}^2 - \sum_{i=1}^{p}\log d_i - p\Bigr]$$
$$= \sum_{j=1}^{p}(n+p-2j+1)\,d_j - \sum_{j=1}^{p}\log d_j - \sum_{j=1}^{p}\mathscr{E}\log\chi^2_{n+1-j} - p.$$
Theorem 7.8.4. With respect to the likelihood loss function, the best estimator
invariant with respect to linear transformations Σ → HΣH′, A → HAH′,
where H is lower triangular, is G(A) = TDT′, where the jth diagonal element of
the diagonal matrix D is 1/(n + p − 2j + 1), j = 1, …, p, and A = TT′, with T
lower triangular. The minimum risk is
$$(16)\qquad \mathscr{E}_\Sigma L_l[\Sigma,G(A)] = \sum_{j=1}^{p}\log(n+p-2j+1) - \sum_{j=1}^{p}\mathscr{E}\log\chi^2_{n+1-j}.$$
James and Stein (1961) gave this estimator. Note that the reciprocals of
the weights 1/(n + p − 1), 1/(n + p − 3), …, 1/(n − p + 1) are symmetrically
distributed about n, the reciprocal of 1/n.
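The estimator of Theorem 7.8.4 is easy to compute from the Cholesky factorization A = TT′. A Python sketch (illustrative; the function name and simulated data are not from the text):

import numpy as np

def james_stein_covariance(A, n):
    """G(A) = T D T' with A = T T' (T lower triangular) and
    d_j = 1/(n + p - 2j + 1), j = 1, ..., p (Theorem 7.8.4).
    Note: the estimator depends on the ordering of the coordinates (see text)."""
    p = A.shape[0]
    T = np.linalg.cholesky(A)
    d = 1.0 / (n + p - 2 * np.arange(1, p + 1) + 1)
    return T @ np.diag(d) @ T.T

rng = np.random.default_rng(5)
p, N = 4, 15
X = rng.standard_normal((N, p))                      # sample from N(0, I) for illustration
n = N - 1
A = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))    # A ~ W(I, n)
print(A / n)                                          # usual unbiased estimator S
print(james_stein_covariance(A, n))                   # improved estimator T D T'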
If p = 2,
(17) G(A) =
1
n + 1A + 0
(0 ~
o 1
f...
IAI
a '
-
n -1
2 II
(18) SG(A) = n + 1 1+
n
fl
2 (0
+1 0
The difference between the risks of the best estimator aA and the best
estimator TDT′ is
$$(19)\qquad p\log n - \sum_{j=1}^{p}\log(n+p-2j+1) = -\sum_{j=1}^{p}\log\left(1 + \frac{p-2j+1}{n}\right).$$
If p = 2, the improvement is
which is 0.288 for n = 2, 0.118 for n = 3, 0.065 for n = 4, etc. The risk (19) is
O(1/n²) for any p. (See Problem 7.31.)
An obvious disadvantage of these estimators is that they depend on the
coordinate system. Let P, be the ith permutatioll matrix, i = 1, ... , pI, and iet
P,AP; = Ti~" where T; is lower triangular and tj) > 0, j = 1, ... , p. Then a
randomized estimator that does not depend on the numbering of coordinates
is to let the estimator be P,'TjDT,'P, with probability l/p!; this estimator has
the same risk as the estimaLor for the original numbering of coordinates.
Since the loss functions are convex, (l/p!)LjP,'T,DT/Pj will have at least as
good a risk function; in this case the risk will depend on I.
Haff (1980) has shown that G(A) = [1/(n + p + 1)](A + γuC),
where γ is a constant, 0 ≤ γ ≤ 2(p − 1)/(n − p + 3), u = 1/tr(A⁻¹C), and C is
an arbitrary positive definite matrix, has a smaller quadratic risk than
[1/(n + p + 1)]A. The estimator G(A) = (1/n)[A + ut(u)C], where t(u) is an
absolutely continuous, nonincreasing function, 0 ≤ t(u) ≤ 2(p − 1)/n, has a
smaller likelihood risk than S.
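A sketch of Haff's estimator with C = I and γ at the upper end of its range (these choices, and the simulated data, are illustrative, not from the text):

import numpy as np

def haff_estimator(A, n, C=None, gamma=None):
    """G(A) = [1/(n + p + 1)] (A + gamma * u * C), u = 1 / tr(A^{-1} C),
    with 0 <= gamma <= 2(p - 1)/(n - p + 3)  (Haff, 1980)."""
    p = A.shape[0]
    if C is None:
        C = np.eye(p)                        # arbitrary positive definite matrix
    if gamma is None:
        gamma = 2 * (p - 1) / (n - p + 3)    # upper end of the allowed range
    u = 1.0 / np.trace(np.linalg.solve(A, C))
    return (A + gamma * u * C) / (n + p + 1)

rng = np.random.default_rng(6)
p, n = 5, 20
Z = rng.standard_normal((n, p))
A = Z.T @ Z                                   # A ~ W(I, n)
print(haff_estimator(A, n))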
(I)
S~I and T~/, /N(S-I) and /N(T-I) have limiting normal distribu-
tions, and
Let X = ν + CY, where Y has the density g(y′y), A = CC′, and Σ = ℰ(X
− ν)(X − ν)′ = (ℰR²/p)A = ΓΓ′, and C and Γ are lower triangular. Let S
be the sample covariance matrix of a sample of N on X. Let S = TT′. Then S →_p Σ,
T →_p Γ, and
Theorem 7.9.2. Define T = (t_{ij}) by Y′Y = TT′, t_{ij} = 0, i < j, and t_{ii} ≥ 0. If
the density of Y is g(Y′Y), then the density of T is
Proof. Let f= (vl, ••• ,vp )' Define Wi and Wi' recursively by WI = VI' U I =
w';lIwlll,
( 6)
(7)
(See Lemma AA.2.) Define
(8) z, = Q"v·=
t,,'_1
z* I
j
( 11) I'; 1', = E t)/.. t ,k , j<i.
k=1
The transformation from f= (vp ...• v p ) to Zl' ••• ,zp has JacoDian 1.
To obtain the density of T convert zi to polar coordinates and integrate
with respect to the angular coordinates. (See Section 2.7.1.) •
The above proof follows the lines of the proof of (6) in Section 7.2, but
does not use information about the normal distribution, such as t_{ii}² ∼ χ²_{n+1−i}.
See also Fang and Zhang (1990), Theorem 3.4.1.
Let C be a lower triangular matrix such that A = CC'. Define X = fC'.
(12) lCI-Ng[C-1X'X(C')-I],
then the lower triangular matrix T* satisfying X' X = T* T*' and til ~ 0 has the
density
(13)
Theorem 7.9.4. If X has the density (12), then A = X' X has the density
(14)
(15)
Theorem 7.9.5. Let X have the density (12) where A is diagonal. Let
S (N - 1)-1 (X - € Ni')'(X - I:; Ni') and R = (diag S)- ~S(diag S)-~. Then
the density of R is (9) of Section 7.6.
PROBLEMS
y₁ = w sin θ₁,
y₂ = w cos θ₁ sin θ₂,
where −½π < θᵢ ≤ ½π, i = 1, …, n − 2, −π < θ_{n−1} ≤ π, and 0 ≤
w < ∞.
(a) Prove w 2
so forth.]
(b) Show that the Jacobian is w ll - J COS,,-2 (JI COSIl- 3 (J2 ... cos 811 - [Hint: Prove
2,
COS 81 0 0 0
0 0 cos (JII_ J 0
wsin 9\ w sin 9, w sin (J"_I 1
W x x x
0 wcos 9 1 X X
7.3. (Sec. 7.2) Use Problems 7.1 and 7.2 to prove that the surface area of a sphere of
unit radius in n dimensions is
$$C(n) = \frac{2\pi^{\frac12 n}}{\Gamma\!\left(\tfrac12 n\right)}.$$
7.4. (Sec. 7.2) Use Problems 7.1, 7.2, and 7.3 to prove that if the density of
y′ = (y₁, …, y_n) is f(y′y), then the density of u = y′y is ½C(n)f(u)u^{½n−1}.
7.5. (Sec. 7.2) χ²-distribution. Use Problem 7.4 to show that if Y₁, …, Y_n are
independently distributed, each according to N(0, 1), then U = Σ_{i=1}^n Yᵢ² has the
density u^{½n−1}e^{−½u}/[2^{½n}Γ(½n)], which is the χ²-density with n degrees of
freedom.
7.6. (Sec. 7.2) Use (9) of Section 7.6 to derive the distribution of A.
7.7. (Sec. 7.2) Use the proof of Theorem 7.2.1 to demonstrate Pr{IAI = O} = O.
7.8. (Sec, 7.2) Independence of estimators of the parameters of the complex normal
distribution Let ZI" .. ' z,v be N observations from the complex normal distribu-
tion with mean () and covariance matrix P. (See Problem 2.64.) Show that Z
and A = Z~~I(Za - Z)(Za - Z)* are independently distributed, and show that
A has the distribution of L~= I Wa Wa*' where WI" .. ' Wn are independently
distributed, each according to the complex normal distribution with mean (t and
covariance matrix P.
7.9. (Sec, 7.2) The complex Wishart distribution. Let WI"'" Wn be independently
distributed, each according to the complex normal distribution with mean 0 and
covariance matrix P. (See Problem 2.64.) Show that the density of B =
L~= I Wa Wa* is
7.10. (Sec. 7.3) Find the characteristic function of A from WeI, n). [Hint: From
fw(AI I, n)dA =.,
one derives
as an identity in 4J.} Note that comparison of this result with that of Section
7.3.1 is a proof of the Wishart distribution.
7.12. (Sec, 7.3.1) Find the first two moments of the elements of A by differentiating
the characteristic function (11).
"'.13. (Sec. 7.3) Let Zl' ... ' Z" be independently distributed, each according to
N(O,1). Let W= L~.f3=lbaf3ZaZ~. Prove that if a'Wa = X~ for all a such that
a'a = 1, then W is distributed according to WO, m). [Hint: Use the characteris-
tic function of a'Wa.}
7.14. (Sec. 7.4) Let Xu be an observation from N()3z,. . ,I), a= I, ... , N, where za is
a scalar. Let b= LaZaXa/raz~. Use Theorem 7.4.1 to show that LaXax'a-
bb'L",Z~ and bb' are independent.
$$\mathscr{E}\bigl(\chi^2_{N-1}\,\chi^2_{N-2}\bigr)^h = \mathscr{E}\bigl(\chi^4_{2N-4}/4\bigr)^h,\qquad h\ge 0,$$
by use of the duplication formula for the gamma function; χ²_{N−1} and χ²_{N−2} are
independent. Hence show that the distribution of χ²_{N−1}χ²_{N−2} is the distribution
of χ⁴_{2N−4}/4.
7.16. (Sec. 7.4) Verify that Theorem 7.4.1 follows from Lemma 7.4.1. [Hint: Prove
that Qj having the distribution WeI, rl) implies the existence of (6) where I is
of order r, and that the independence of the Q/s implies that the I's in (6) do
not overlap.]
7.17. (Sec. 7.5) Find GIAlh directly from W(1;.n). [Hint: The fact that
jw(AII,n)dA:;. I
shows
as zn identity in n.]
$$N(\bar x - \mu^*)'S^{-1}(\bar x - \mu^*) \le \frac{(N-1)p}{N-p}\,F_{p,N-p}(\varepsilon),$$
where x̄ and S are based on a sample of N from N(μ, Σ). Find the expected
value of the volume of the confidence region.
7.19. (Sec. 7.6) Prove that if 1; =I, the joint density of ri/,p' i, j = 1, ... , p - 1, and
r Jp , ... ,rp _I.I' is
where R u .p (rlj'p)' [Hint: rii'p = (rij - r/prjp)/( VI - ri~ VI - rJ~ ) and Irljl ='
.n 2
r{ Hn - (p - 3)]} (1 _ r2
lr{ 1[ ( 2)]}
)~(n-p)
1~ I 17'2 '2 n - p- i3·4 ..... p
2
... Pn- r[ ~(n -1)] (1- 2 )!(n-4)
i-I 17' r I'2(n -
2I I
2) ] rl p-I·p
.
.n1'-1
r (1'in ) (1 _ 2 )i(n-31
i=117'trI!cn-l)] r,p .
7.21. (Sec. 7.6) Prove (without the use of Problem 7.20) that if :I = 1, then
rip"'" r p _ I, p are independently distributed. [Hint: rip = a IP /(";;;:; va
pp ). Prove
that the pairs (alp, all)'" " (a p _ l . p, ap-l. p _ l ) are independent when
(Zip"", Z"P) are fixed, and note from Section 4.2,1 that the marginal distribu-
tion of rip' conditional on zap' does not depend on zap']
7.22. (Sec. 7.6) Prove (without the use of Problems 7.19 and 7.20) that if 1: = I. then
the set r ll" ... , r ll _ I, p is independent of the set rif'{" i. j = l. .... " _. l. l Hint:
From Seetion 4.3.2 a pp ' and (alP) are indepelldent or (£lIn,)' Prov..: thaI
aIJII,(ail)' and aii' i = 1, .... P - 1, arc independent of (rl , ) hy proving that
a/j.p are independent of (ri;-p)' See Problem 4.21.]
7.23. (Sec. 7.6) Prove the conclusion of Problem 7.20 by using Problems 7.21 and
7.22.
7.24. (Sec. 7.6) Reverse the steps in Problem 7.20 to derive (9) of Section 7.6.
7.25. (Sec. 7.6) Show that when p = 3 and 1: is diagonal r lJ , r 13 , r13 are not
mutually independent.
7.26. (Sec. 7.6) Show that when :I is diagonal the set rlJ are pairwise independent.
rIhn+p)]
7.30. (Sec. 7.8) Verify (17) and ([8). [Hint: To verify (18) let l: = KK', A ~I KA.* K',
and A* = T*T*, where K and T* are lower triangular.]
P even,
::: - ~(p-I)
E log [ 1 ~ (P~2i+l)2]
n ' P odd.
1= I
7.32. (Sec. 7.8) Prove L/I., G) and L,(I., G) are invariant with respect to transfor-
mations G* := CGC'. I.* := CI.C' for C nonsingular.
7.33. (Sec. 7.8) Prove L,,(I..G) is <t mUltiple of (g~ (1)IcJ>-I(g - (1). Hint: Trans·
form so 1 = I. Then show
<fl = _~
II
(210 0) I'
7.36. (Sec. 7.2) Dirichlet distn·butiorI. Let Y1, ••• , Y,n be independently distributed as
x:'.-variables with Pl •... ' Pm degrees of freedom, respectively. Define Zj =
YjL~'~I~' i= I .... ,m. Show that the density I)f ZI,,,,,Zm-1 is
for z, ~ 0, i = 1, .... m.
7.37. (Sec. 7.5) Show that if X~-I and X~-2 are independently distributed, then
x~ I x~ ~ is distrihuted as (X?'V ~ f'- /4. [Hint: In the joint density of x = X~-l
and S = x.~'-~ substitute z := 2!J..Y. X == x. and expl<.!ss the marginal density of Z
a~ zX. lt(z). where h(z) is an integral with respect to x. Find h'(z), and solve
l
Ihl' dilTt'n':lllial l'qUiILioll. Sec Sriva~tavH and Khatri (1979). ChapLer 3.]
CHAPTER 8
8.1. INTRODUCTION
In this chapter we generalize the univariate least squares theory (i.e., regres-
sion analysIs) and the analysis of variance to vector variates. The algebra of
the multivariate case is essentially the same as that of the univariate case.
This leads to distribution theory that is analogous to that of the univariate
case and to test criteria that are analogs of F-statistics. In fact, given a
univariate test, we shall be able to write down immediately a corresponding
multivariate test. Since the analysis of variance b~sed on the model of fixed
effects can be obtained from least squares theory, we obtain directly a theory
of multivariate analysis of variance. However, in the multivariate case there is
more latitude in the choice of tests of significance.
In univariate least squares we consider scalar dependent variates x₁, …, x_N
drawn from populations with expected values β′z₁, …, β′z_N, respectively,
where β is a column vector of q components and each of the z_α is a column
vector of q known components. Under the assumption that the variances in
the populations are the same, the least squares estimator of β′ is
$$(1)\qquad b' = \left(\sum_{\alpha=1}^{N} x_\alpha z_\alpha'\right)\left(\sum_{\alpha=1}^{N} z_\alpha z_\alpha'\right)^{-1}.$$
If the populations are normal, the vector is the maximum likelihood estimator
of β. The unbiased estimator of the common variance σ² is
$$(2)\qquad s^2 = \sum_{\alpha=1}^{N}(x_\alpha - b'z_\alpha)^2/(N-q),$$
$$(3)\qquad \frac{1}{[q/(N-q)]F + 1} = \frac{\hat\sigma^2}{\hat\sigma_0^2}.$$
(1)
(3)
$$(4)\qquad \sum_{\alpha=1}^{N}(x_\alpha - Fz_\alpha)(x_\alpha - Fz_\alpha)' = \sum_{\alpha=1}^{N}(x_\alpha - Bz_\alpha)(x_\alpha - Bz_\alpha)' + (B-F)\sum_{\alpha=1}^{N}z_\alpha z_\alpha'\,(B-F)'.$$
$$(5)\qquad \sum_{\alpha=1}^{N}\bigl[(x_\alpha - Bz_\alpha) + (B-F)z_\alpha\bigr]\bigl[(x_\alpha - Bz_\alpha) + (B-F)z_\alpha\bigr]',$$
$$(6)\qquad \sum_{\alpha=1}^{N}z_\alpha(x_\alpha - Bz_\alpha)' = 0$$
by virtue of (3). ∎
$$(7)\qquad \operatorname{tr}\Sigma^{*-1}\sum_{\alpha=1}^{N}(x_\alpha - \beta^*z_\alpha)(x_\alpha-\beta^*z_\alpha)' = \operatorname{tr}\Sigma^{*-1}\sum_{\alpha=1}^{N}(x_\alpha - Bz_\alpha)(x_\alpha - Bz_\alpha)' + \operatorname{tr}\Sigma^{*-1}(B-\beta^*)A(B-\beta^*)',$$
where
(8)
The likelihood is maximized with respect to β* by minimizing the last term
in (7).
Lemma 8.2.2. If A and G are positive definite, tr FAF′G > 0 for F ≠ 0.
Proof. Let A = HH′, G = KK′. Then
for F ≠ 0, because then K′FH ≠ 0 since H and K are nonsingular. ∎
It follows from (7) and the lemma that L is maximized with respect to β*
by β* = B, that is,
(10)
where
(11)
$$(12)\qquad \hat\Sigma = \frac1N\sum_{\alpha=1}^{N}(x_\alpha - \hat\beta z_\alpha)(x_\alpha - \hat\beta z_\alpha)'.$$
$$(14)\qquad \mathscr{E}\hat\beta = \mathscr{E}\sum_{\alpha=1}^{N}X_\alpha z_\alpha' A^{-1} = \sum_{\alpha=1}^{N}\beta z_\alpha z_\alpha' A^{-1} = \beta A A^{-1} = \beta.$$
(15)
N N
~(~/- (3/)(~) - (3;)' =A- I $ L (X,a - ($"Xra)za L (X/y- (,s"X)y)Z;A-1
~=I y=1
N
=A-
I
L cff(Xja- c,fXja)(X/y- 0'Xjy)zaZ~A-1
a,y= I
N
= A -I ~
i..J
~
uay 0;) za Zy
, A-I
(x. y= I
To summarize, the vector of pq components (β̂′₁, …, β̂′_p)′ = vec β̂′ is normally
distributed with mean (β′₁, …, β′_p)′ = vec β′ and covariance matrix
$$(16)\qquad \begin{pmatrix}\sigma_{11}A^{-1} & \sigma_{12}A^{-1} & \cdots & \sigma_{1p}A^{-1}\\ \sigma_{21}A^{-1} & \sigma_{22}A^{-1} & \cdots & \sigma_{2p}A^{-1}\\ \vdots & \vdots & & \vdots\\ \sigma_{p1}A^{-1} & \sigma_{p2}A^{-1} & \cdots & \sigma_{pp}A^{-1}\end{pmatrix}.$$
The matrix (16) is the Kronecker (or direct) product of the matrices Σ and
A⁻¹, denoted by Σ ⊗ A⁻¹.
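A small simulation illustrating that the covariance matrix of vec β̂′ is Σ ⊗ A⁻¹ (the dimensions and parameter values are illustrative, not from the text):

import numpy as np

rng = np.random.default_rng(7)
p, q, N, reps = 2, 3, 30, 5000
beta = rng.standard_normal((p, q))           # true coefficient matrix
Z = rng.standard_normal((q, N))              # fixed regressors z_1, ..., z_N (columns)
sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
A = Z @ Z.T                                   # A = sum z_a z_a'

bhats = np.empty((reps, p * q))
for r in range(reps):
    E = rng.multivariate_normal(np.zeros(p), sigma, size=N).T   # errors, p x N
    X = beta @ Z + E
    bhat = X @ Z.T @ np.linalg.inv(A)        # least squares / maximum likelihood estimator
    bhats[r] = bhat.reshape(-1)              # vec of the rows: (beta_1', ..., beta_p')'

emp_cov = np.cov(bhats, rowvar=False)
theory = np.kron(sigma, np.linalg.inv(A))    # Sigma (x) A^{-1}, as in (16)
print(np.abs(emp_cov - theory).max())        # small for large reps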
(17)
N N N p q
N
(19) L f;aZha = 1, j=i, It=g,
a=1
= 0, otherwise.
A linear unbiased estimator is best if it has minimum variance over all linear
unbiased estimators; that is, if (1;'(F - (3,g)~ ~ t'(G - {3,gf!. for G = L:~= I g~ xa
and $0 = {3lg'
Theorem 8.2.4. The least squares estimator is the best linear unbiased
estimator of {3,g.
(21)
=0.
(1)
so that β₁ has q₁ columns and β₂ has q₂ columns. We shall derive the
likelihood ratio criterion for testing the hypothesis
(2)
where β₁* is a given matrix. The maximum of the likelihood function L for
the sample x₁, …, x_N is
= (C2 -l3iAI2)A221
(8)
(10)
The likelihood ratio criterion for testing H is (10) divided by (3), namely,
$$(11)\qquad \lambda = \frac{|\hat\Sigma_\Omega|^{\frac12 N}}{|\hat\Sigma_\omega|^{\frac12 N}}.$$
In testing H, one rejects the hypothesis if λ < λ₀, where λ₀ is a suitably
chosen number.
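A Python sketch of the computation of (11) for testing β₁ = 0 (with an intercept as the unrestricted block; the data and dimensions are illustrative, not from the text):

import numpy as np

rng = np.random.default_rng(8)
p, q1, q2, N = 2, 3, 1, 40
Z1 = rng.standard_normal((q1, N))
Z2 = np.ones((q2, N))                        # intercept block (beta_2 unrestricted)
Z = np.vstack([Z1, Z2])
beta = np.zeros((p, q1 + q2))                # null hypothesis beta_1 = 0 holds here
X = beta @ Z + rng.standard_normal((p, N))

def residual_ssp(X, Z):
    """N * Sigma_hat = X X' - X Z'(Z Z')^{-1} Z X' for the regression of X on Z."""
    B = X @ Z.T @ np.linalg.inv(Z @ Z.T)
    R = X - B @ Z
    return R @ R.T

n_sig_omega = residual_ssp(X, Z)             # unrestricted: regression on (Z1, Z2)
n_sig_w     = residual_ssp(X, Z2)            # restricted:   regression on Z2 only

U = np.linalg.det(n_sig_omega) / np.linalg.det(n_sig_w)   # = lambda^(2/N)
lam = U ** (N / 2)                            # likelihood ratio criterion (11)
print(U, lam)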
A special case of this problem led to Hotelling's T²-criterion. If q = q₁ = 1
(q₂ = 0), z_α = 1, α = 1, …, N, and β = β₁ = μ, then the T²-criterion for
testing the hypothesis μ = μ₀ is a monotonic function of (11) for β₁* = μ₀.
The hypothesis μ = 0 and the T²-statistic are invariant with respect to the
transformations X* = DX and x*_α = Dx_α, α = 1, …, N, for nonsingular D.
Similarly, in this problem the null hypothesis β₁ = 0 and the likelihood ratio
criterion for testing it are invariant with respect to nonsingular linear
transformations.
Theorem 8.3.1. The likelihood ratio criterion (11) for testing the null
hypothesis β₁ = 0 is invariant with respect to transformations x*_α = Dx_α, α =
1, …, N, for nonsingular D.
$$(12)\qquad \hat\beta^* = DCA^{-1} = D\hat\beta,$$
$$(13)\qquad \hat\Sigma_\Omega^* = \frac1N\sum_{\alpha=1}^{N}(Dx_\alpha - D\hat\beta z_\alpha)(Dx_\alpha - D\hat\beta z_\alpha)' = D\hat\Sigma_\Omega D',$$
$$(15)\qquad \hat\Sigma_\omega^* = \frac1N\sum_{\alpha=1}^{N}\bigl(Dx_\alpha - D\hat\beta_{2\omega}z_\alpha^{(2)}\bigr)\bigl(Dx_\alpha - D\hat\beta_{2\omega}z_\alpha^{(2)}\bigr)' = D\hat\Sigma_\omega D'.$$
Lemma 8.3.1.
(16)
Figure 8.1
(22)
Thus, t, Pin, and P2 w form a sufficient set of statistics for I, Pl' and 13 2 ,
Wilks (1932) first gave the likelihood ratio criterion for testing the equality
of mean vectors from several populations (Section 8.8). Wilks (1934) and
Bartlett (1934) extended its use to regression coefficients.
(23)
(25)
Let
(33) Pln(AIl-AI2AZiIA21)=CI-C2A22IA21
= X( Z'I - Z~ AZi IA 21 )
= WI r-I ' ·,
that is, WI = PlnAll'2P; = PlnP~ Land fl = PIP1 1. Similarly, from (6) we
obtain
(34)
(1)
where A₁₁·₂ = A₁₁ − A₁₂A₂₂⁻¹A₂₁. We shall study the distribution and the
moments of U when β₁ = β₁*. It has been shown in Section 8.2 that NΣ̂_Ω is
distributed according to W(Σ, n), where n = N − q, and the elements of
β̂_Ω − β have a joint normal distribution independent of NΣ̂_Ω.
$$(3)\qquad U = \frac{|G|}{|G+H|},$$
where G is distributed according to W(Σ, n), H is distributed according to
W(Σ, m), where m = q₁, and G and H are independent.
Let
$$(4)\qquad G = N\hat\Sigma_\Omega = XX' - XZ'(ZZ')^{-1}ZX',$$
(7)
(8) r. -- _
)
..... p,
yy' yV'
VJ vV'
( 9)
IVV'I
Y, Y,' lCZ'j
~Z' IZZ'I
ZY,'
Y; - I Y,'- I Y,_I Z '
ZY;'_I zz' jlZZ'1
Y, - I Y,'- I ,,
Yi'" Yi_IZ'
y,*y',-I y(y,*' Y(Z'
ZY,'_I Zy,*' ZZ'
=~l
[If;l k [Ii;' j[Yf-,
__~______________~__~
z']
[Y,;ljPi'_1 Z'1
=y~y~'
"
-y~
I
(Y.'I-I Z' )[~-IY;-I
ZY,'_I
The ratio Vᵢ is the 2/Nth power of the likelihood ratio criterion for
testing the hypothesis that the regression of yᵢ* = xᵢ′ − β*_{i1}Z₁ on Z₁ is 0 (in
the presence of regression on Y_{i−1} and Z₂); here β*_{i1} is the ith row of β₁*. For
i = 1, g₁₁ is the sum of squares of the residuals of y₁* = (y₁₁, …, y₁N) from its
regression on Z, and g₁₁ + h₁₁ is the sum of squares of the residuals from Z₂.
The ratio V₁ = g₁₁/(g₁₁ + h₁₁), which is appropriate to test the hypothesis
that the regression of y₁* on Z₁ is 0, is distributed as χ²_n/(χ²_n + χ²_m) (by Lemma
8.4.2) and has the beta distribution β(v; ½n, ½m). (See Section 5.2, for
example.) Thus Vᵢ has the beta density
$$(11)\qquad \beta\bigl[v;\ \tfrac12(n+1-i),\ \tfrac12 m\bigr]$$
for 0 ≤ v ≤ 1 and 0 for v outside this interval. Since this distribution does not
depend on Y_{i−1}, we see that the ratio Vᵢ is independent of Y_{i−1}, and hence
independent of V₁, …, V_{i−1}. Then V₁, …, V_p are independent.
The cdf of U can be found by integrating the joint density of V₁, …, V_p
over the range
$$(12)\qquad \prod_{i=1}^{p} V_i \le u.$$
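Since V₁, …, V_p are independent beta variables, U_{p,m,n} is easy to simulate and its moments are products of beta moments. A sketch (the values of p, m, n are illustrative, not from the text):

import numpy as np
from scipy.special import gamma as G

rng = np.random.default_rng(9)
p, m, n, reps = 3, 4, 20, 200000

# Simulate U as the product of independent Beta((n+1-i)/2, m/2) variables, i = 1, ..., p
V = np.column_stack([rng.beta((n + 1 - i) / 2, m / 2, size=reps)
                     for i in range(1, p + 1)])
U = V.prod(axis=1)

def U_moment(h):
    """h-th moment of U as the product of the beta moments of V_1, ..., V_p."""
    out = 1.0
    for i in range(1, p + 1):
        a = (n + 1 - i) / 2
        out *= G(a + h) * G(a + m / 2) / (G(a) * G(a + m / 2 + h))
    return out

print(U.mean(), U_moment(1))   # simulated and exact first moments should agree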
We shall now show that for given N - q2 the indices p and ql can be
interchanged; that is, the distributions of Up qI' N-q 2 _q I = Up •m• n and of
t
(13)
Ip V 1m V'
= V'
= V
= 11m - V'VI;
1m Ip
the fourth and sixth equalities follow from Theorem A.3.2 of the Appendix,
and the: fifth from permutation of rows and columns. Since the Jacobian of
WI = CV is model CI m = 111 tm, the joint density of 1 and V is
(15)
for 1 and Ip - VV' positive definite, and 0 otherwise. Thus 1 and V are
independently distributed; the density of 1 is the first term in (15), namely,
w(lllp,n + m), and the density of V is the second term, namely, of the form
8.4.2. Moments
Since (11)'is a density and hence integrates to I! by change of notation
(22)
t 23 )
~U~'
~r.,".'1
= n
j.,1
J r[Hm +n +2)-j] r[i(m +n + 1) -j]
\r[!(m+n+2)-j+h] r[Hm+n+l)-j+h]
.r[}(n+2)-j+h]r[t(n+l)-j+h] )
r[i(n+2) -j]r[t(n+l) -j]
_
-J]
r {f(m+n+I-2 nr(n+I-2 j +2h)}
r(m+n-I-2j+2h)r(n+I-2j) ,
(24) n(f
1",1 \ (1
1 r (m + n + ~ -
r(n + 1 - 2j)r(m)
2 j) l" + I - 2 Jl + 2" - I (1 _ y) m - I dY }
\\ here the Y, are independent and ~ has density (3(y: n + 1 - 2j, m).
Suppose p is odd; that is, p = 2s + 1. Then
( 25) <-,"
(£>
_C<'
U2s+l.m.n - (£J
(
n ZI Zs+
S
1=1
2
I
)"
,
where the Z, are independent and Z, has density (3(z; n + 1 - 2i, m) for
i = 1.. .. , sand ZH I is distributed with density {3 [z; (n + 1 - p )/2, m /2].
P6)
r[Hn +m)] ~n-I(I- )~m-I
r(in)f(tm) u u.
1 1
(27) U = = -----,------:--=--
l,m,ll 1 + Li"l 1 + (mjn)Fm,n '
where gIl is the one element of G = NIn and Fm,n is an F-statistic. Thus
1- Ut,nt,n . ~ =F
(28) Ul,m,n m m.n·
p-2
From Theorem 8.4.4, we see that the density of VU2,m,n IS
(30)
r(n + m -1) i tn - 3 )(I_ c)m-J
2r(n-1)f(m)u vu.
p Even
Wald and Brookner (1941) gave a method for finding the distribution of
U_{p,m,n} for p or m even. We shall present the method of Schatzoff (1966a). It
will be convenient first to consider U_{p,m,n} for m = 2r. We can write the event
∏_{i=1}^p Vᵢ ≤ u as
$$(32)\qquad Y_1 + \cdots + Y_p \ge -\log u,$$
where Y₁, …, Y_p are independent and Yᵢ = −log Vᵢ has the density
(34)
r[Hn+l-i)+r]
K{= r[Hn+l-i)]f(r)
1
= (r-l)!1=0
il n + 1 - i + 2j
2
(35)
k k' k-h
cw "" ( 1 h . wi
= e Jh ':0 - ) ( k - h)! -(c~--=-a-J)-:h"""'+-:-I
(36)
where
(38)
c= r(n+m-l)r(n+m-3)
r(n - l)r(n - 3)r2(m)
m-I [(m-l)!]2(-I)'+1
(39) Pr{ U4 , m , n ~ u} = C . ~ 0 ( m - l. - 1) '( . 1) ,.1.]
. m - ]-
., . ,.
1,1=
m-I [(m-l)!]2(-I)'+!
= c,,~o (m - i-I) '(m - j - 1) 'i!j!(n - 3 + j)
The last step of the integration yields powers of ru and products of powers
of ru
and log u (for 1 + i - j = - 1).
Particular Values
Wilks (1935) gives explicitly the distributions of U for p = 1, p = 2, p = 3
with m = 3; p = 3 with m = 4; and p = 4 with m = 4. Wilks's formula for
p = 3 with m = 4 appears to be incorrect; see the first edition of this book.
Consul (1966) gives many distributions for special cases. See also Mathai
(1971).
It is shown in Section 8.5 that −[n − ½(p − m + 1)] log U_{p,m,n} has a limiting
χ²-distribution with pm degrees of freedom. Let χ²_{pm}(α) denote the α
significance point of χ²_{pm}, and let
$$(41)\qquad C_{p,m,n-p+1}(\alpha) = \frac{-\bigl[n - \tfrac12(p-m+1)\bigr]\log u_{p,m,n}(\alpha)}{\chi^2_{pm}(\alpha)}.$$
Table 8.1 [from Pearson and Hartley (1972)] gives values of C_{p,m,M}(α) for
α = 0.10 and 0.05, p = 1(1)10, various even values of m, and M = n − p + 1
= 1(1)10(2)20, 24, 30, 40, 60, 120.
To test a null hypothesis one computes U_{p,m,n} and rejects the null
hypothesis at significance level α if
$$(42)\qquad -\bigl[n - \tfrac12(p-m+1)\bigr]\log U_{p,m,n} > C_{p,m,n-p+1}(\alpha)\,\chi^2_{pm}(\alpha).$$
Since C_{p,m,n−p+1}(α) > 1, the hypothesis is accepted if the left-hand side of (42) is
less than χ²_{pm}(α).
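A sketch of the test in (42) using only the leading χ² term, i.e., taking the correction factor C_{p,m,M}(α) as 1 (the tabulated factors are not reproduced here; the observed U below is hypothetical):

import numpy as np
from scipy.stats import chi2

def bartlett_chi2_test(U, p, m, n, alpha=0.05):
    """Asymptotic test of the general linear hypothesis based on
    -[n - (p - m + 1)/2] log U compared with chi^2_{pm}(alpha)."""
    stat = -(n - 0.5 * (p - m + 1)) * np.log(U)
    crit = chi2.ppf(1 - alpha, p * m)
    return stat, crit, stat > crit

# Illustrative use with a hypothetical observed U
print(bartlett_chi2_test(U=0.42, p=3, m=4, n=30))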
The purpose of tabulating C_{p,m,M}(α) is that linear interpolation is reasonably
accurate because the entries decrease monotonically and smoothly to 1
as M increases. Schatzoff (1966a) has recommended interpolation for odd p
by using adjacent even values of p and displays some examples. The table
also indicates how accurate the χ²-approximation is. The table has been
extended by Pillai and Gupta (1969).
The criterion U has been expressed in (7) as the product of independent beta
variables VI' V2 , •. • , Vp' The ratio v:
is a least squares criterion for testing the
null hypothesis that in the regression of x~ - ll;rZr on Z = (Z~ Zz)' and
I-V,n-i+l
( 43)
~ m
t In Some cases the ordering of variables may be imposed; for example, XI might be an
observation at the first time point, x2 at the second time point, and so on.
1 - v.( e.) n - i + 1
(44) 1 I • =F (e)
vi(el ) m m,n-t+l I'
x:t'x'!"
1 , XI*X't-I x*Z'
t
Xi-I Xi *, Xi-I X;_I Xi_IZ'
Zx*'
1
ZX;_I ZZ'
(45)
(xi - IlH Z I)( xi - il,iZI)' (xi - il,iZI )X;_I (xi - llilZI )Z;
X'_I (xi - lliI Z I)' X,_IX;_I Xi_IZ;
Z2( xi - Pil ZI)' Z2 X;_1 Z2Z;
X t - I X:_ I Zi-I Z!
Z2 X :_1 Z2 X 2
~ vi ( ei )·
Xi-I X;_I X,_IZ'
ZX:_ I ZZ'
( 1) h=O,l, ... ,
t [n all cases where we apply this result, the parameters x k , ~k> YJ • and 1)) will be such that there
is a distribution with such moments.
a b
(2) ~ Xk = ~ Y)"
k~l ,=1
(3) M= -210gW,
= £'W- 211 P
Here p is arbitrary; later it will depend on N. If a = b, xI.: = Yk' gl.:::; Tl~, then
(I) is the hth moment of the product of powers of variables with beta
distributions, and then (I) holds for all h for which the gamma functions
exist. In this case (4) is valid for all real t. We shall assume here that (4) holds
for all real t, and in each case where we apply the result we shan verify this
assumption.
Let
where
get) =2it P [ t
k=1
xklogxk - EY)IOgy)]
F 1
a
+ E log r[ px k( 1 - 2it) + (3k + ek]
k=l
b
- E log r[ py){1- 2it) + 8) + Tl)],
j= I
expansion formula for the gamma function [Barnes (1899), p. 64] which is
asymptotic in x for bounded h:
(7)
BI(h) =h-~,
Taking x = px k(1 - 2it), Py/l - 2it) and h = 13k + gk' Sj + Tlj in tum, we
ohtain
(9) ¢(t)=Q-g(O)-~flog(I-2it)
m a h
+ E wr(1-2it)-r + E O(x;(m+l)) + E O(Yj(m+l)).
r~1 k=l j=l
where
( 10) f= - 2{ ~ gk - ~ Tlj - Ha - b) }.
*This definition differs slightly from that of Whittaker and Watson [(1943), p. 126], who expand
r(c· hT - ])/(e T -. n. If B:(h) is this .~econd tyre of ro1ynomiill. B 1(h)=Bf(h)- B 2r (h)= t.
B!.(h) + ( - 1)' + I B,. where B, is the rth Bernoulli number, and B 2,+ I( h) = B~r+ l(h).
One resulting form for ep(t) (which we shall not use here) is
I m
(13) ep(t) =l'<1>(I) = eQ - g (O)(I- 2it) -"if E a v (1- 2itfU +R~+I'
U~O
where E;=oa uz-u is the sum of the first m + 1 terms in the series expansion
of exp( - E~~o wrz- r ), and R~ +. is a remainder term. Alternatively,
m
(14) <I>(t) = -tflog(I-2it) + E w [(1-2it)-r -1] +R'm+I'
r
r=1
where
In (14) we have expanded gCO) in the same way we expanded get) and have
collected similar terms.
Then
where Tr(t) is the term in the expansion with terms Wfl ... w:', Eis f = r; for
example,
(17)
(19)
=/ 00 1
_00 ... 7r
I
-;:;-(1-2it)-lue-,lzdt.
Let
(20)
Iv
Rill + I =
Joo_00 27r
1
(1 -
. -tJ",
211) Rm + I ('
-lIZ
dt .
00 1 m
-cb(t)e-i'Zdt= ESr(z)+R~t-l
(21) / _0027r
r=O
+{ W2[gf+4(z) -gf(z)]
+ ~~ [g[+4(Z) - 2g[+z(z) +g[(Z)]}
Let
(22)
The cdf of M is written in terms of the cdf of pM, which is the integral of
.
Pr{ xl $ pMo}) + ~I (Pr{ xl~~ s pMo}
following the remainder terms along. (In fact, to make the proof rigorous one
needs to verify that each remainder is of th,= proper order in a uniform
sense.)
In many cases it is desirable to choose p so that WI = O. In such a case
using only the first term of (23) gives an error of order 8- J .
Further details of the ex! lansion can be found in Box's paper (1949).
Theorem 8.5.1. Suppose.' that $W h is given by (I) for all pure(v imaginal}'
h, with (2) holding. Then the edf of - 2p log W Is given by (23). The error,
R"m+l' is O(8- cm + I ») ifxk'C:~ck8, YJ "2!.d,8 (ck>O, d,>O), and if (l-p)x",
(1 - p)Y, have limits, where p may depend on 8.
(24)
K nk~ I r[ t( N - q + 1 - k ;.- Nil)]
nf~lr[}(N-q1+ l-j+Nh)]'
and this holds for all h fm which the gamma functions exist, including purely
imaginary h. We let a = b = p,
We observe that
(26)
\ 2w 1 = L
p {U[(1-P)N-q+l-k 1fIN- H(1 p)N-q+l-kl
k-I '2P
2
{1[( 1 - p) N - q2 + 1 - k] } - {[ (1 p) N q, + 1 - k 1}
=~
pN
t [-2[(1-P)N-q2+1-k]ql+q~ +9..!.]
k=l 4 2
= :;N [- 2( 1 - p) N + 2q2 - 2 + ( P + 1) + q 1 + 2] .
(27)
Then
( 28 ) Pr { - 2 ~ log A ~ z}
=Pr{-klogUp.q1 ,N_c ~z}
=Pr{Xp?.q,~z}
where
pql(p2 + q? - 5)
(30) 'Y2 = 48 '
2
(31) 'Y4 = 'Yl + ~io [3p4 + 3q~ + lOp2q; - 50(p2 + qn + 159].
Since it = U}~I,n' where n = N - q, (28) gives Pr{ -k log Up. ql • n ::; z}.
t Box has shown th..J.t the term of order N- 5 is 0 and gives the coefficients to be used in the term
of order N- 6 •
(33)
2
h(h-l)(h-2) 4cPk- 3(h-3)(Uk //-tk)2 O( -3)
+'4 2 + /-tk ,
- /-tk
where cPk = cC'(Yk - /-tk)"!' / /-tk' assumed bounded. The rth moment of Zk is
expressed by replacment of h by rh in (34). The central moments of Zk are
(36)
(37)
Now we consider -log Up,m.n = - r.f~1 log v" where Vi"'" Vp are inde-
pendent and Vi has the density {3(x; (n + 1 - ;)/2, m/2), i = 1,.,., p. As
n --+ 00 and m --+ 00, -log 1~ tends to normality. If V has the density
(3(x; a/2, b/2), the moment generating function of -logV is
$ e
-II V
=
r[(a +b)/2]r(a/2-t)
=-7-'---"""---:~---=-~"":""---T-
(38) og
rCa/2)r[(a +b)/2 -1] .
where ",(w) == d log f(w)/dw. [See Abramovitz and Stegun (1972), p. 258, for
elCample,] From few + 1) = wf(w) we obtain the recursion relation tb(w + 1)
= ",(w) + l/w. This yields for s = 0 and l an integer
(40)
The validity of (40) for s = 1,2, ... is verified by differentiation. [The expre~
sion for ",'(Z) in the first line of page 223 of Mudholkar and Trivedi (1981) is
incorrect.] Thus for b = 2l
( 41)
From these results we obtain as the rth cumulant of -log Up. 21.11
P I-I 1
(42) Kr (-logUp ,21,n)=Y(r-1)! E E
I~IJ~O
" /'
(n-l,1-_J)
r'
As l --+ 00 the series diverges for r = 1 and converges for r = 2,3, and hence
Kr/ K I --+ 0, r = 2,3. The same is true as p --+ 00 (if n /p approaches a positive
constant).
Given n, p, and l, the first three cumulants arc calculatcd from (42). Then
ho is determined from (37), and ( -log Up • 21.,,)"" is treated as approximately
normally distributed with mean and variance calculated from (34) and (3))
for h = hI).
Mudholkar and Trivedi (1980) calculated the error of approximation fllr
significance levels of 0.01 and 0.05 for n from 4 to 66, p = 3,7, and
q = 2,0,10. The maximum error is less than 0.0007; in most cases the error is
considerably less. The error for the xl-approximation is much larger, espe-
cially for small values of 11.
In case of m odd the rth cumulant can be approximated by
p [ ~(f/l - 3) I I I ]
(-B) 2'(r-J)!E E -- r+- r'
,,,,I ,=0 (11-1+1-2]) 2(n-i+m)
8.5.4. An F-Approximation
Rao (1951) has used the expansion of Section 8.5.2 to develop an expansion
of the distribution of another function of U_{p,m,n} in terms of beta distributions.
The constants can be adjusted so that the term after the leading one is
of order m⁻². A good approximation is to consider
$$(44)\qquad \frac{1-U^{1/s}}{U^{1/s}}\cdot\frac{ks-r}{pm}$$
as F with pm and ks − r degrees of freedom, where
$$(45)\qquad s = \sqrt{\frac{p^2m^2-4}{p^2+m^2-5}},\qquad k = n - \tfrac12(p-m+1),\qquad r = \tfrac12 pm - 1.$$
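A sketch implementing (44)–(45) (the observed U below is hypothetical; the degenerate case p² + m² − 5 ≤ 0, where s is taken as 1, is handled separately):

import numpy as np
from scipy.stats import f as f_dist

def rao_F_approx(U, p, m, n):
    """Rao's F approximation (44)-(45):
    F = [(1 - U^(1/s)) / U^(1/s)] * (k*s - r) / (p*m), approximately F_{pm, ks-r},
    with k = n - (p - m + 1)/2, s = sqrt((p^2 m^2 - 4)/(p^2 + m^2 - 5)), r = pm/2 - 1."""
    k = n - (p - m + 1) / 2
    s = np.sqrt((p**2 * m**2 - 4) / (p**2 + m**2 - 5)) if p**2 + m**2 - 5 > 0 else 1.0
    r = p * m / 2 - 1
    F = (1 - U**(1 / s)) / U**(1 / s) * (k * s - r) / (p * m)
    pval = f_dist.sf(F, p * m, k * s - r)
    return F, pval

# Illustrative use with a hypothetical observed U
print(rao_F_approx(U=0.42, p=3, m=4, n=30))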
(1) PZa=PIZ~)+P2Z~)
= PI Z~(I) + Pi z~),
where I: a Z*(I)Z(2)
A a a ' = 0 and r. a z*(I)Z*(I),
a a = A I .2. Then
I AI = A ln and 8*2
........ P2 =
P2w'
We shall use the principle of invariance to reduce the set of tests to be
considered. First, if we make the transformation X; = Xa + rz~\ we leave
the null hypothesis invariant, since $ X:
= p l z!(1) + (13; + r)z~) and P=; + r
is unspecified. The only invariants of the sufficient statistics are i and PI
(since for each, P';, there is a r that transforms it to 0, that is, - P'; ).
Second, the n.lll hypothesis is invariant under the transformation z! *(1) =
Cz!(1) (C nonsingular); the transformation carries PI to PIC-I. Under this
transformation i and PIAII.2P; are invariant; we consider A II .2 as informa-
tion relevant to inference. However, these are the only invariants. For
consider a function of PI and A II .2, say f(PI' A II .2). Then there is a C* that
carries this into f(P I C* -I, I), and a further orthogonal transformation
carries this into f(T, I), where tiv = 0, i < V, tif ~ O. (If each row of T is
considered a vector in q.-space, the rotation of coordinate axes can b(. done
so the first vector is along the first coordinate axis, the second vector is in the
plane determined by the first two coordinate axes, and so forth). But T is a
function of IT' = PI A 1I.2P~; that is, the elements of T are uniquely deter-
mined by this equation and the preceding restrictions. Thus our tests will
A ,.. ", A,..",
depend on I and PI A II .2PI. Let NI = G and PI A II . 2PI =H.
Third, the null hypothesis is invariant when x_α is replaced by Kx_α, for Σ
and β₂* are unspecified. This transforms G to KGK′ and H to KHK′. The
only invariants of G and H under such transformations are the roots of
$$(2)\qquad |H - lG| = 0,$$
since
$$|KHK' - lKGK'| = |K(H-lG)K'| = |K|\cdot|H-lG|\cdot|K'|.$$
On the other hand, these are the only invariants, for given G and H there is
$$(4)\qquad KHK' = L = \begin{pmatrix} l_1 & 0 & \cdots & 0\\ 0 & l_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & l_p\end{pmatrix},$$
where l₁ ≥ ⋯ ≥ l_p are the roots of (2). (See Theorem A.2.2 of the Appendix.)
Theorem 8.6.1. Let x" be all observatioll ji"011l N(PIZ~(I) + ~z~~\ 1),
where LaZ:(I)z~)' = 0 and LaZ:(l)Z:(I), = A ll . 2 • The only functions of the
sufficient statistics and A II 2 invariant under Ihe transfonnations x~ = Xu +
fZ(2) _**(I) = Cz*(I)
a ,Awa a ' and x*a =](x a are the roots o(
'J(2), where G = N~
~
and
A A I
H = ~IAll.?PI'
$$(5)\qquad U = \frac{|G|}{|G+H|} = \frac{|KGK'|}{|KGK'+KHK'|} = \frac{|I|}{|I+L|} = \prod_{i=1}^{p}\frac{1}{1+l_i}.$$
$$(6)\qquad \sum_{i=1}^{p} l_i = \operatorname{tr}L = \operatorname{tr}KHK'(KGK')^{-1} = \operatorname{tr}HK'K = \operatorname{tr}HG^{-1}.$$
This criterion was suggested by Lawley (1938), Bartlett (1939), and Hotelling
(1947), (1951). The test procedure is to reject the hypothesis if (6) is greater
than a constant depending on p, m, and n.
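For comparison, the following sketch computes from G and H the four criteria discussed in this chapter: U = |G|/|G + H|, the Lawley–Hotelling trace tr HG⁻¹, the Pillai trace tr H(G + H)⁻¹, and Roy's maximum root (the data are simulated under the null hypothesis with Σ = I and are illustrative only):

import numpy as np
from scipy.linalg import eigvalsh

rng = np.random.default_rng(10)
p, m, n = 3, 4, 25
# Under the null hypothesis G ~ W(Sigma, n) and H ~ W(Sigma, m), independent;
# simulate with Sigma = I for illustration.
Zg = rng.standard_normal((n, p)); G = Zg.T @ Zg
Zh = rng.standard_normal((m, p)); H = Zh.T @ Zh

roots = eigvalsh(H, G)                                 # roots l_i of |H - l G| = 0
U      = np.linalg.det(G) / np.linalg.det(G + H)       # Wilks (likelihood ratio)
lawley = np.trace(H @ np.linalg.inv(G))                # Lawley-Hotelling trace = sum l_i
pillai = np.trace(H @ np.linalg.inv(G + H))            # Pillai trace = sum l_i/(1+l_i)
roy    = roots.max()                                    # Roy's maximum root
print(U, lawley, pillai, roy)
print(np.isclose(lawley, roots.sum()), np.isclose(pillai, (roots / (1 + roots)).sum()))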
J;r[Hm+n-1)] -~(n-l) [I 1 ]
- rUm)r(~n) (1 +w) /~'~;(~+"'): }(m - l),~(n --1) .
V',here I/a, b) is the incomplete beta function, that is, the integral of (3(y: a. b)
from 0 to x.
Constantine (1966) expressed the density of tr HG - 1 as an infinite series
in generalized Laguerre polynomials and as an infinite series in zonal
polynomials; these series, however, converge only for tr HG - I < 1. Davis
(1968) showed that the analytic continuation of these series satisfies a system
of linear homogeneous differential equations of order p. Davis (1970a,
1970b) used a solution to compute tables as given in Appendix B.
Under the null hypothesis, G is distributed as L'~"'-IZaZ~ (n = N - q) and
H is distributed as L~t..1 Y"Y,:, where the Zcr. and Y,J are independent, each
with distribution N(O, ~). Since the roots are invariant under the previously
specified linear transformation, we can choose K so that K 'IK' = 1 and let
G* = KGK' [= L(KZ"XKZ)'] and H* = KHK'. This is equivalent to assum-
ing at the outset that 'I = I.
Now
(8) · N
P11m 1 G = PI'1m -+
II
-I ~
i-.J Z Z'a Ci = 1,
N .... oo n .... x n q n a~l
This result follows applying the (weak) law of large numbers to each element
of (l/n)G,
(9) plim
n .... OO
t ZlaZ,a =
~ ,y=l J'ZlaZ,a = 01] '
lLawIcy (I 93H) purp()f!ed 10 derive the exact di!.!nhulion, hlll the rCl'uh i ... in crror
p p q.
( 10) tr H= L hll = L L y,~.
1=1 1=1 V= 1
~ 11 )
and let xl( ex) be the ex·significance point of the x2-distribution with k
degrees of freedom. Then
'(
(11) m~'l'.m.n( ex) = Xpm ex)
l[p+m+14(
+ 2n pm + 2 Xpm ex)
+(p-m+l)X;m(ex)] +O(n- 2 ).
Ito also gives the term of order n- 2 • See also Muirhead (1970). Davis
(1 ~70a), (1970b) evaluated the aCCUl :lcy of the approximation (12). Ito also
founJ
1 [p+m+l 2
(13) Pr{tltrHG-I~z}=Gpm(Z)- 2n pm+2 Z
where G/z)= pr{xl ~z} and gk(Z) = (d/dz)Gk(z). Pillai (1956) suggested
another approximation to nWp,n,j ex), and Pillai and Samson (1959) gave
moments of tr HG -1. PilIai and Young (1971) and Krishnaiah and Chang
(972) evaluated the Laplace transform of tr HG- 1 and showed how to invert
$$(14)\qquad V = \sum_{i=1}^{p}\frac{l_i}{1+l_i} = \operatorname{tr}L(I+L)^{-1} = \operatorname{tr}KHK'(KGK'+KHK')^{-1} = \operatorname{tr}HK'\bigl[K(G+H)K'\bigr]^{-1}K = \operatorname{tr}H(G+H)^{-1},$$
where as before K is such that KGK' = I and (4) holds. In terms of the roots
f = Ij(1 + It>, i = 1, ... , p, of
(15) IH-f(H+G)1 =0,
the criterion is Er.1 fi. In principle, the cdf, density, and moments under the
null hypothesis can be found from the density of the roots (Sec. 13.2.3),
p p
(16) c nftt(m-p-I) n
(= 1 1=1
(1 - It )~(n-p-I) n
i <J
eli - 1;),
where
7Ttp2fp[!(m + n)]
( 17)
C= fp( tn )fp( tm )fpdp)
for 1 >f[ > ... > fp > 0, and 0 otherwise. If m - p and n - p are odd, the
density is a polynomial in fI'.'.' fp. Then the density and cdf of the sum of
the roots are polynomials.
Many authors have written about the moments, Laplace transforms, densi-
ties, and cdfs, using various approaches. Nanda (1950) derived the distribu-
tion for p = 2,3,4 and m = p + 1. Pilla i (1954), (1956), (1960) and PilIai and
Mijares (1959) calculated the fi~st four moments of V and proposed approxi-
mating the distribution by a beta distribution based on the first four mo-
ments. Pillai and Jayachandran (1970) show how to evaluate the moment
generating function as a weighted sum of determinants whose elements are
incomplete gamma functions; they derive exact densities for some special
cases and use them for a table of significance point!'. Krishnaiah and Chang
(1972) express the distributions as linear combinations of inverse Laplace
transforms of the products of certain double integrals and further develop
this technique for finding the distribution. Davis (1972b) showed that the
distribution satisfies a differential equation and showed the nature of the
solution. Khatri and Pillai () 968) obtained the (nonnul)) distributions in
series forms. The characteristic function (under the null hypothesis) was
given by James (1964). Pillai and Jayachandran (1967) found the nonnull
distribution for p = 2 and computed power functions. For an extensive
bibliography see Krishnaiah (1978).
We now turn to the asymptotic theory. It follows from Theorem 8.6.2 that
nV or NV has a limiting X2-distribution with pm degrees of freedom.
Let Up, Ill, / a) be defined hy
Then Davis (1970a), (1970b), Fujikoshi (1973), and Rothenberg (1977) have
shown that
2 1 [ p+m+l 4
(19) nUp,m,n(a) =Xpm(a) + 2n - pm +2 Xl'm(a)
1 p +m + 1 4 ( ( -2
(22) nUp,m,n(a)=nup.m.n(a) + 2n' pm+2 Xpm a)+O n ).
(24)
The density of the roots f₁, …, f_p for p ≤ m under the null hypothesis is given in (16). The cdf of R = f₁, Pr{f₁ ≤ f*}, can be obtained from the joint density by integration over the range 0 ≤ f_p ≤ ⋯ ≤ f₁ ≤ f*. If m − p and n − p are both odd, the density of f₁, …, f_p is a polynomial; then the cdf of f₁ is a polynomial in f* and the density of f₁ is a polynomial. The only difficulty in carrying out the integration is keeping track of the different terms.
Roy [(1945), (1957), Appendix 9] developed a method of integration that results in a cdf that is a linear combination of products of univariate beta densities and beta cdfs. The cdf of f₁ for p = 2 is
(25)  Pr{f₁ ≤ f} = I_f(m − 1, n − 1) − [ Γ(½(m + n − 1)) / (Γ(½m)Γ(½n)) ] f^{(m−1)/2} (1 − f)^{(n−1)/2} I_f[½(m − 1), ½(n − 1)].
This is derived in Section 13.5. Roy (1957), Chapter 8, gives the cdfs for
p = 3 and 4 also.
By Theorem 8.6.2 the limiting distribution of the largest characteristic root of nHG^{-1}, NHG^{-1}, nH(H+G)^{-1}, or NH(H+G)^{-1} is the distribution of the largest characteristic root of H having the distribution W(I, m). The densities of the roots of H are given in Section 13.3. In principle, the marginal density of the largest root can be obtained from the joint density by integration, but in actual fact the integration is more difficult than that for the density of the roots of HG^{-1} or H(H + G)^{-1}.
The literature on this subject is too extensive to summarize here. Nanda
(1948) obtained the distribution for p = 2, 3, 4, and 5. Pillai (1954), (1956),
(1965), (1967) treated the distribution under the null hypothesis. Other
results were obtained by Sugiyama and Fukutomi (1966) and Sugiyama
(1967). Pillai (1967) derived an approximate distribution as a linear combination of incomplete beta functions. Davis (1972a) showed that the density of a single ordered root satisfies a differential equation and (1972b) derived a recurrence relation for it. Hayakawa (1967), Khatri and Pillai (1968), Pillai and Sugiyama (1969), and Khatri (1972) treated the noncentral case. See
Krishnaiah (1978) for more references.
( 26)
The distribution of
( 27)
+ E v gpm
j
2
+6 [ X;m( a)]
i=I
p
+ L vlgpm + 6 [ X;m( ex)]
i= I
(31)
~ 2_ P+m +1 (~ )2 ~ ( __ )2 _ p( P - 1)( P + 2) _ 2
i-.J V, +2 i-.J V, = i-.J V, V pm + 2 V ,
i=1 pm i=1 i=1
0'" (p-1){p+2)
(32) -=->
V pm+2
(1)  H: β₁ = β₁*,
(2)
and compare this number with U_{p,q₁,n}(α), the α significance point of the U_{p,q₁,n}-distribution. For p = 2, …, 10 and even m, Table 1 in Appendix B can be used. For m = 2, …, 10 and even p the same table can be used with m replaced by p and p replaced by m. (M as given in the table remains unchanged.) For p and m both odd, interpolation between even values of either p or m will give sufficient accuracy for most purposes. For reasonably large n, the asymptotic theory can be used. An equivalent procedure is to calculate Pr{U_{p,m,n} ≤ U}; if this is less than α, the null hypothesis is rejected.
Alternatively one can use the Lawley–Hotelling trace criterion

(3)  W = tr (NΣ̂_ω − NΣ̂_Ω)(NΣ̂_Ω)^{-1} = tr (β̂_{1Ω} − β₁*) A_{11·2} (β̂_{1Ω} − β₁*)′ (NΣ̂_Ω)^{-1},

the Pillai trace criterion

(4)  V = tr (β̂_{1Ω} − β₁*) A_{11·2} (β̂_{1Ω} − β₁*)′ (NΣ̂_ω)^{-1},

or the Roy maximum root criterion R, where R is the maximum root of
(6 )
( 7)
(8)
the first qz rows and columns of A* and of A** are the same as the result of
applying the forward solution to the left-hand side of
(9)
and the first qz rows of C* and C** are the same as the result of applying
the forward solution to the right-hand side of(9). Thus t3zwAzZt32w = CiC!*',
where C*, = (C;' Ci t) and C**, = (C2*' Ci* t).
The method implies a method for computing a determinant. In Section A.5 of the Appendix it is shown that the result of the forward solution is FA = A*. Thus |F|·|A| = |A*|. Since the determinant of a triangular matrix is the product of its diagonal elements, |F| = 1 and |A| = |A*| = ∏_i a*_{ii}. This result holds for any positive definite matrix in place of A (with a suitable modification of F) and hence can be used to compute |NΣ̂_Ω| and |NΣ̂_ω|.
(10)
(11)  |NΣ̂_Ω| / |NΣ̂_Ω + (β̂_{1Ω} − β₁) A_{11·2} (β̂_{1Ω} − β₁)′| ≥ U_{p,q₁,n}(α).

Theorem 8.7.1. The region (11) in the β₁-space is a confidence region for β₁ with confidence coefficient 1 − α.
=trAY'G-1y- (tr<lt,y)2
tr A -1 <ltJG<It '
The confidence region (12) can be explored by use of (16) for various Φ. If φ_{ik} = 1 for some pair (I, K) and 0 for other elements, then (16) gives an interval for β_{IK}. If φ_{ik} = 1 for a pair (I, K), −1 for (I, L), and 0 otherwise, the interval pertains to β_{IK} − β_{IL}, the difference of coefficients of two independent variables. If φ_{ik} = 1 for a pair (I, K), −1 for (J, K), and 0 otherwise, one obtains an interval for β_{IK} − β_{JK}, the difference of coefficients for two dependent variables.
(17)
(18)
with probability 1 - ex; the second inequality follows from Theorem A.2.4 of
the Appendix. Then a set of confidence intervals on all linear comhinations
a'p i b holding with confidence I - ex is
We can compare these intervals with (16) for <I> = ab', which is of rank 1.
The term suhtracted from and added to tr <I>'PtH =a'Plnb is the square root
of
This is greater than the term subtracted and added to a'P1ob in (19) because
I~'p.m,"(a), pertaining to the sum of the roots, is greater than 'p.m.n(a),
relating to one root. The bounds (16) hold for all p x m matrices <1>, while
(19) holds only for matrices ab' of rank L
Mudholkar (1966) gives a very general method of constructing simultane-
ous confidence intervals based on symmetric gauge functions. Gabriel (1969)
relates confidence bounds to simultaneous test procedures. Wijsman (1979)
showed that under certain conditions the confidence sets based on the
maximum root are smallest. [See also Wijsman (1980).]
In univariate analysis it is well known that many hypotheses can be put in the form of hypotheses concerning regression coefficients. The same is true for the corresponding multivariate cases. As an example we consider testing the hypothesis that the means of, say, q normal distributions with a common covariance matrix are equal.
Let y_α^{(i)} be an observation from N(μ^{(i)}, Σ), α = 1, …, N_i, i = 1, …, q. The
null hypothesis is
To put the problem in the form considered earlier ir this chapter, let
( 2) X=(x 1 x 1 "'x,
"I
x N1+I .. ·x)
N
= (y(1)y(2)
1 2
"'y(l)y(1) ... y(q))
NI 1 Nq
0 0
0 0 0 I 0
0 0 0 0 0
=
0 0 0 0 0
1 1 1 1 1
that is, z_{iα} = 1 if N₁ + ⋯ + N_{i−1} < α ≤ N₁ + ⋯ + N_i and z_{iα} = 0 otherwise, for i = 1, …, q − 1, and z_{qα} = 1 (all α). Let β = (β₁  β₂), where
(5)  A = Σ_{α=1}^{N} z_α z_α′ = ( N₁   0   ⋯   0        N₁
                                  0   N₂   ⋯   0        N₂
                                  ⋮                      ⋮
                                  0    0   ⋯   N_{q−1}   N_{q−1}
                                  N₁  N₂   ⋯   N_{q−1}   N   ),

(6)  C = Σ_{α=1}^{N} x_α z_α′ = ( Σ_α y_α^{(1)}   Σ_α y_α^{(2)}   ⋯   Σ_α y_α^{(q−1)}   Σ_{i,α} y_α^{(i)} ),
say, and

(7)  NΣ̂_ω = Σ_{i,α} (y_α^{(i)} − ȳ)(y_α^{(i)} − ȳ)′ = Σ_{i,α} y_α^{(i)} y_α^{(i)′} − N ȳ ȳ′.

For Σ̂_Ω we use the formula NΣ̂_Ω = Σ_α x_α x_α′ − β̂_Ω A β̂_Ω′ = Σ_α x_α x_α′ − C A^{-1} C′.
Let
1 0 0 0
0 1 0 0
(8) D=
0 0 1 0
-1 -1 -1 1
then
1 0 0 0
o 1 0 0
(9)
o 0 I 0
1 1 1 1
Thus
=CD'(DAD') 1 DC'
-1
N, o
o o
= ( Lyil ) ... Ly~q»)
a ex
o o
= L Niyi)j(l)' •
i
(11)
I, ex
(12)
as implied by (4) and (5) of Section 8.4. Here H has the distribution W(Σ, q − 1). It will be seen that when p = 1, this test reduces to the usual F-test

(14)  [ Σ_i N_i (ȳ^{(i)} − ȳ)² / Σ_{i,α} (y_α^{(i)} − ȳ^{(i)})² ] · n/(q − 1) > F_{q−1,n}(α).
We give an example of the analysis. The data are taken from Barnard's study of Egyptian skulls (1935). The 4 (= q) populations are Late Predynastic (i = 1), Sixth to Twelfth (i = 2), Twelfth to Thirteenth (i = 3), and Ptolemaic Dynasties (i = 4). The 4 (= p) measurements (i.e., components of y_α^{(i)}) are maximum breadth, basialveolar length, nasal height, and basibregmatic height. The numbers of observations are N₁ = 91, N₂ = 162, N₃ = 70, N₄ = 75. The data are summarized as
(16) Nin
_
9661.997470
445.573301
445.573301
9073.115027
1130.623900
1239.211990 2255.812722
1
2 148.584 :2 10
(18)  U = |NΣ̂_Ω| / |NΣ̂_ω| = 2.4269054 × 10⁵ / (2.9544475 × 10⁵) = 0.8214344.
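A sketch of the computation that produces a criterion like (18) for testing equality of q mean vectors. The data below are simulated placeholders (not Barnard's measurements), and only standard numpy operations are assumed.

```python
import numpy as np

def wilks_U(samples):
    """samples: list of (N_g x p) arrays, one per population.
    Returns U = |within SS| / |total SS| for testing equal mean vectors."""
    X = np.vstack(samples)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    G = np.zeros((p, p))   # within-groups (error) sum of squares, N*Sigma_Omega
    H = np.zeros((p, p))   # between-groups sum of squares
    for Y in samples:
        m = Y.mean(axis=0)
        d = Y - m
        G += d.T @ d
        H += len(Y) * np.outer(m - grand_mean, m - grand_mean)
    return np.linalg.det(G) / np.linalg.det(G + H)

rng = np.random.default_rng(1)
samples = [rng.standard_normal((N, 4)) + 0.2 * g
           for g, N in enumerate([91, 162, 70, 75])]
print(wilks_U(samples))   # distributed as U_{p, q-1, N-q} under the null
```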
r c
(2) E AI = E lI} = 0,
i'" 1 } '" 1
that the variance of Y;} is a 2, and that the Y;} are independently normally
distributed. To test that column effects are zero is to test that
( 3) v·=
} 0, j=1" .. ,c.
k =i,
=0, k *- i,
ZOk,Ij = 1, k=j,
=0, k*-j.
,. c
(5) ct"Y;j = /-LZ!X),lj + E AkZkO,iJ + E VkZOk,lj'
k-l k=1
The hypothesi::, is that the coefficients of ZOk. iJ are zero, Since the matrix of
fixed variates here,
ZOJ,ll ZOO,FC
ZlO,ll zlO,rc
ZOe. II ZOc. re
is singular (for example, row 00 is the sum of rows 10,20, ... , rO), one must
elaborate the regression theory. When one does, one finds that the test
criterion indicated by the regression theory is the usual F-test of analysis of
variance.
Let
y 1 ~y.
.. = -rc i..J IJ'
i.j
(7)
and let
a= "( y
'-' I)
_ yI. _ y . }. + y •• )2
i. i
= "y2-e"y2-r'y2.+rey2
'-' I} '-' I. '-'.} •• '
t.j I j
(8)
b=rE(y:}-y..)2
j
=r Ey2. } -rey2
., .
}
(9) F=!!.'
a
{e-l){r-l}
e 1 .
Under the null hypothesis, this has the F-distribution with e - land (r - 1).
(c - 1) degrees of freedom. The likelihood ratio criterion for the hypothesis
is the rel2 power of
a 1
(1O)
a+5 = 1 + {{e -l)/[{r -l){e - l)]}F'
Table 8.1
Varieties
Location M S V T P Sums
UF 81 105 120 110 98 514
81 82 80 87 84 414
W 147 142 151 192 146 778
100 116 112 148 108 584
M 82 77 78 131 90 458 .
103 105 117 140 130 595
C 120 121 124 141 125 631
99 62 96 126 76 459
GR 99 89 69 89 104 450
66 50 97 62 80 355
D 87 77 79 102 96 441
68 67 67 92 94 338
Sums 616 611 621 765 659 3272
517 482 569 655 572 2795
(12)  |A| / |A + B|.

Under the null hypothesis, this has the distribution of U for p, n = (r − 1)(c − 1), and q₁ = c − 1 given in Section 8.4. In order for A to be nonsingular (with probability 1), we must require p ≤ (r − 1)(c − 1).
As an example we use data first published by Immer, Hayes, and Powers (1934), and later used by Fisher (1947a), by Yates and Cochran (1938), and by Tukey (1949). The first component of the observation vector is the barley yield in a given year; the second component is the same measurement made the following year. Column indices run over the varieties of barley, and row indices over the locations. The data are given in Table 8.1 [e.g., the entry in the upper left-hand corner indicates a yield of 81 in each year of variety M in location UF]. The numbers along the borders are sums.
We consider the square of (147, 100) to be
Then
3279 802\
( 20)
IAI 1 17-io =
= .;--80_2_4_0_ 0 4107
IA+BI 6U67 33521 . .
13352 6880
This result is to be compared with the significance point for U_{2,4,20}. Using the result of Section 8.4, we see that

[(1 − √0.4107) / √0.4107] · (19/4) = 2.66
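A sketch of this check in code, using the p = 2 relation of Section 8.4 that [(1 − √U)/√U]·(n − 1)/q₁ is distributed as F_{2q₁, 2(n−1)} under the null hypothesis; the numbers are those of the barley example.

```python
from scipy.stats import f as f_dist

U, q1, n = 0.4107, 4, 20                  # |A|/|A+B|, hypothesis df, error df
F = (1 - U**0.5) / U**0.5 * (n - 1) / q1  # about 2.66
p_value = f_dist.sf(F, 2 * q1, 2 * (n - 1))
print(F, p_value)
```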
Now let us see that each F-test in the univariate analysis of variance has
analogous tests in the multivariate analysis of variance. In the linear hypothe-
sis model for the univariate analysis of variance, one assumes that the
random variables YI , ... , YN have expected values that are linear combina-
tions of unknown parameters
(21)
where the {3's are the parameters and the z's are the known coefficients. The
variables {Y_α} are assumed to be normally and independently distributed with common variance σ². In this model there is a set of linear combinations, say Σ_{α=1}^{N} γ_{iα} Y_α, where the γ's are known, such that
(22)
~23)
b n ECa.BYal{, n
(24) a m Eda.B ~ l{, . m
(25)
has the distribution W(~, 11). When the null hypothesis is true,
(26)
(27)
IA\
IA+BI
because it follows from (28) that La. "d a /3 ~ay ~"B = 1, 'Y = D ~ n, and = 0
otherwise, and La. /3 ca {3l/1..,y ~{3B = 1, n + 1 ~ 'Y D =:; n + m, and = 0 other-
wise. Since W" is orthogonal, the {Za) are ind~pendently normally distributed
with covariance matrix :~. The same argument shows ttz a = 0, a = 1, ....
n + m, under the null hypothesis. Thus A and B are independently dis-
tributed according to W(~. n) and W(~, m), respectively.
( 1) IH-A(H+G)I=o,
where U = XX′ + YY′ + ZZ′. Except for roots that are identically zero, the roots of (2) coincide with the nonzero characteristic roots of X′(U − YY′)^{-1}X.
Let V= (X, Y, U) and
(4)
The main result, first proved by Schwartz (1967), is the following theorem:
where A.ril and v[/I' i = 1, ... , m, are the coordinates rea"anged in nonascending
order.
),.(1) _ ),.(M(V\»)
),.(21 _ )"(M( V2 »
A(M(pYI + qY 2)
Theorem 8.10.3.
The proof of Theorem 8.10.3 (Figure 8.4) follows from the pair of
majorizations
k k
= E Aj(A) + E A,(B)
1=1 1=1
k
= E{A,(A)+A,(B)},
;=1
k=l, .... p.
•
Lemma 8.10.2
Lemma 8.10.6.
( 19)
where VI=(XI,YI,U I), V2 =(X 2 ,Y2 ,U2 ), UJ-Y1Y~>0, U2 -Y2 Y;>0, O::;,p
=l-q::::;;1.
This implies
Then Lemma 8.10.5 implies that the right-hand side of (21) is less than or
equal to
•
Lemma 8.10.7. If A ::::;; B, then X(A) -< w ACB).
k= 1, ... ,p. •
From Lemma 8.10.7 we obtain the first majorization in (12) and hence
Theorem 8.10.3, which in turn implies the convexity of A. Thus the accep-
tance region satisfies condition (i) of Stein'S theorem.
(24)
= tr (j)'X + tr 'IT'Y- ltr
2 ~U ,
(25)
(26)
and V= ex, Y, U), where Xo, Yo are fixed matrices and 'Y is a positive
number. Then
o
(27) 1 4>'X + -tr
oo'y = -tr o
1 'V'Y + 2"tr
o
1 (-I0 'YI
'Y 'Y 0 o
for sufficiently large 'Y. On the other hand,
= :,X{Xo[W)-'(: Y! ;)D-'
--+0
as 'Y --+ 00. Therefore, V E A for sufficiently large 'Y. This is a contradiction.
Hence 0 is positive semidefinite.
*
Now let 00 l correspond to (4) 1,0, 1), where 4> I O. Then 1 + A0 is
positive definite and 4>1 + A4> * 0 for sufficiently large A. Hence 00 1 + Aoo E
o - 0 0 for sufficiently large A. •
e_ extreme points
C(A)
D(A) (AI. Q)
Figure 8.5
It will be proved in Lemma 8.10.10, Lemma 8.10.11, and its l'orullary that
monotonicity of A and convexity of A* implies COdcA*. Then D(A) =
C( A) n R"::, cA* n R"::, = A. Now suppose v E R'~ and v --< '" A. Then v E
D( A) CA. This shows that A is monotone in majorization. Fmthermore. if
A* is convex, then A = R"::, nA* is convex. (See Figure 8.5.) •
Lemma 8.10.10. Let C be compact and convex, and let D be conve;r. fr the
extreme points of C are contained in D, then C cD.
Proof Obvious. •
where 7T' is a perrnutation of (l, ... ,m) and 01 = ... = 01. = 1,01.+1 = ." = 8",
= 0 for some k.
E"1.: x ,=O,
(31)
Em: \111 = O.
Suppose that k is the first index such that Ek holds. Then x E Rn~ implies
o=x" ?:xl+I ;::: •.. 2X m 2 O. Therefore, E/p"" E", hold. The remaining
k - 1 = ttl - (m k + 1) or more equations are among the F's. We order
them as F,I'"'' Fv where i l < ... < ii, 12 k - 1. Now i l < ." < i l impHes
i,?:1 with equality if and only if i l = 1, ... ,i,=I. In this case FI, ... ,Fk _ 1
hold (l?: k n Now suppose i , > t. Since x.I. = ". = xm = 0,
But x I + .•. +X" -I .:s;: Al + ... + A.I. _ I' and we have Ak + ... + A'l = O. There-
fore. 0 = A" + .. ' + A'l;::: A" ;::: ... ;::: A'l! ?: O. In this case FI<. -1' ... , F,n reduce
to the same equation Xl + ... +X k - 1 = Al + ". +AI<._I' It follows that X
satisfies k 2 more equations, which have tf. be F I , ••• , FI<._ 2' We have
shown that in either case E", ...• EII/' Fb"" f, -I hold and thi5 gives the
point f3 = (AI" •.• Ak 1: 0, ... ,0), which is in R'~ n C( A). Therefore, f3 is an
extreme point. •
Then a test with the acceptance region A = {A Ife A) :::; c} i..'1 admissible.
We shall only sketch the proof of this theorem [following Schwartz (1967)].
Let ,;>:;
= d,-, i = 1, ... , t, and let the density of d I' ... , d r be fedl v), where
v = (vL,"" v)' is defined in Section 8.6.5 and f(dl v) is given in Chapter 13.
The ratio f(dl v) If(dl 0) can be extended symmetrically to the unit cube
(0:::;; d;:::;; 1, i = 1, ... , t). The extended ratio is then a convex function and is
strictly increasing in each d i • A proper Bayes procedure has an acceptance
region
f(dlv)
(36)
f (dIO) dI1(v) ~C,
Figure 8.6. Three probabilities of acceptance.
(37)
for 0 ~k.:s;; 1.
Lemma 8.10.12. Let E, F be convex and symmetric about the origin. Then
(39)
Hence by convexity of F
and
(40) V{ao[(E +y) nF] + (1- O'o)[(E-y) nF]} s; V{(E +ky) nF}.
- V 1/ n { ( E + y) n F} .
Then
;::c
= In H * ( u) du.
Similarly,
f f( x + Icy) dx 1 H( u) du.
:;c
(45)
£ 0
By Lemma 8.10.12, H(u) 2 H*(u). Hence Theorem 8.10.5 fOllows from (44)
and (45). •
= (~!), p>m,
Proof We prove this for the case p ~ m and vp > O. Other cases can be
proved similarly. By Theorem A.2.2 of the Appendix there is a matrix B such
that
(47) B:IB' == I,
Let
(48)
Then
(49) ..Ti' I F'1-- Ip'
(50)
and
tlU (D!,O),
(53)
tlV=O.
Invariant tests are given in terms of characteristic roots II," . , I r (ll ~ ".. ~ Ir)
of U '(W')- I U. Note that for the admissibility we used the characteristic
roots of Ai of U'(UU' + W,)-IU rather than Ii = Ar/o - A). Here it is more
natural to use Ii' which corresponds to the parameter value Vi' The following
theorem is given by Das Gupta, Anderson, and Mudholkar (1964).
(54) I{ u, V)
= (27T)
Applying Theorem 8.10.5 to (54), we see that the power increases monotoni-
cally in each F." •
Since the section of a convex set is convex, we have the following corollary.
From this we see that Roy's maximum root test A: II ::;; K and the
Lawley-Hotelling trace test A: tr U'(W,)-I U::;; K have power functions that
are monotonically increa!'>ing in each VI'
To see that the acceptance region of the likelihood ratio test
r
(55) A: O{1+I,)::;;K
i- I
Then
r
(57) fl(1 +1;) ==IU'(W,)-l U + 1 1=\u*'u* +/1
i= I
Figure 8.8. Contours of power functions.
and similarly the Eaton-Perlman result does not exclude (b). The last result
guarantees that the contour looks like (c) for Roy's maximum root test and
the Lawley-Hotelling trace test. These results relate to the fact that these
two tests are more likely to detect alternative hypotheses where few v/s are
far from zero. In contrast with this, the likelihood ratio test and the
Bartlett-Nanda-Pillai trace test are sensitive to the overall departure from
the null hypothesis. It might be noted that the convexity in rv-space cannot
be translated into the convexity in v-space.
By using the noncentral density of the l_i's, which depends on the parameter values ν₁, …, ν_t, Perlman and Olkin (1980) showed that any invariant test with monotone acceptance region (in the space of roots) is unbiased. Note that this result covers all the standard tests considered earlier.
( 1) a= 1, ... ,N,
Theorem 8.11.1. Suppose O/N)A --+ Ao, z~ ZC'( < constant, a = 1,2, ... ,
and either the eu's are independent identically distributed or the ea's are indepen-
dent with (},l'l e~eaI2+e < constant for some 8> O. Then B.4 t3 and IN vec(B-
t3) has a limiting normal distribution with mean 0 and covariance matrix
"k ""
'01.'t (\- 1.
(2) H:t3=t3*,
Lemma 8.11.1. Under the conditions of Theorem 8.11.1 the limiting distribution of H is W(Σ, q).
(5)
Then the lemma fol1ows from Theorem 8.11.1 and (4) of Section 8.4. •
1 1 ) -I I
I
=Nlog 1+ N(NG H.
Theorem 8.11.2. Under the conditions of Theorem 8.11.1, when the null hypothesis is true,

(7)  −2 log λ →_d χ²_{pq}.
We have
Theorem 8.11.2 agrees with the first term of the asymptotic expansion of
- 210g A given by Theorem 8.5.2 for sampling from a normal distribution.
The test and confidence procedures discussed in Sections 8.3 and 8.4 can be
applied using this X 2-distribution.
The criterion U = λ^{2/N} can be written as U = ∏_{i=1}^{p} V_i, where V_i is defined in (8) of Section 8.4. The term V_i has the form of U; that is, it is the ratio of the sum of squares of residuals of x_{iα} regressed on x_{1α}, …, x_{i−1,α}, z_α to the sum regressed on x_{1α}, …, x_{i−1,α}. It follows that under the null hypothesis V₁, …, V_p are asymptotically independent and −N log V_i →_d χ²_q. Thus −N log U = −N Σ_{i=1}^{p} log V_i →_d χ²_{pq}. This argument justifies the step-down procedure asymptotically.
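A minimal sketch of the asymptotic test just described, assuming U has already been computed as in Section 8.4; the statistic −N log U is referred to the χ² distribution with pq degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2

def asymptotic_lr_test(U, N, p, q):
    """Refer -N log U to chi^2 with p*q degrees of freedom (Theorem 8.11.2)."""
    stat = -N * np.log(U)
    return stat, chi2.sf(stat, p * q)

print(asymptotic_lr_test(U=0.82, N=398, p=4, q=3))   # hypothetical values
```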
Section 8.6 gave several other criteria for the general linear hypothesis:
the Lawley- Hotelling trace tr HG - I, the Bartlett-Nand a-Pillai trace tr H( G
+ H) - I, and the Roy maximum root of HG - 1 or H( G + H) -I. The limiting
distributions of N tr HG -I and N tr H(G + H)-I are again X;q. The limiting
distribution of the maximum characteristic root of NHG- 1 or NH(G +H)-I
is the distribution of the maximum characteristic root of H having the
distributions W(I, q) (Lemma 8.11.1). Significance points for these test crite-
ria are available in Appendix B.
where C =XZ' = L~=IXaZ~ and A = ZZ' = L~=IZaZ~. Note that the density
of eis invariant with respect to multiplication on the right by N X N
orthogonal matrices; that is, E' is left spherical. Then E' has the stochastic
represp-ntation
d
(11) E' = UTF',
where U has the uniform distribution on U'U = Ip, T is the lower triangular
matrix with nonnegative diagonal elements satisfying EE' = IT', and F
is a lower triangular matrix with nonnegative diagonal elements satisfying
FF' = I. We can write
It was shown in Section 8.6 that the likelihood ratio criterion for H: P = O.
the Lawley-Hotelling trace criterion, the Bartlett-Nanda-Pillai trace crite-
rion, and the Roy maximum root test are invariant with respect to linear
transformations x ~ Kx. Then Corollary 4.5.5 implies the following theorem.
Theorem 8.11.3. Under the null hypothesis P = O. the distribution o.f each
inuadant criterion when the distribution of E' is left spherical is the same (IS the
distribution under nonnality.
Thus the tests and confidence regions described in Section 8.7 are valid
for left-spherical distributions E /.
The matrices Z' A-1Z and IN - Z' A -I Z are idempotent of ranks q and
N - q. There is an orthogonal matrix ON such that
(16) O]l'
o X' ' o
I."'-q
]l'X' ,
where K = FT'.
The trace criterion tr HG-I, for example, is
(17)
The distribution of any invariant criterion depends only on U (or V), not
on T.
PROBLEMS
Weight of grain 40 17 9 15 6 12 5 9
Weight of straw 53 19 10 29 13 27 ]9 30
Amount of fertilizer 24 11 5 12 7 14 11 18
Let Z1Q = 1, and let zla be the amount of fertilizer on the ath plot. Estimatel:J
for this sample. Test the hypothesis I:J I = 0 at fie 0.01 significance level.
8.2. (Sec. 8.2) Show that Theorem 3.2.1 is a special case of Theorem 8.2.1.
[Him: Let q = 1, za = 1, I:J = It.]
N
E (x" I:Jz")(x,, -l:Jz,,), .
u-I
8.5. (Sec. 8.3) In the following data [Woltz, Reid, and Colwell (1948), used by
R. L Anderson and Bancroft (1952)] the variables are Xl' rate of cigarette bum;
).'~. the percentage of nicotine; Zl' the percentage of nitrogen; z2' of chlorine;
z~. of potassium; Z4' of phosphorus; Z5' of calcium; and z6' of magnesium; and
Z 7 = 1; and N = 25:
53.92
62.02
56.00
(42.20)
EN xa =
a=l
54.03 '
Q=I
12.25 ,
89.79
24.10
25
~
"""I ( x" - _)(
X x" - x
_)'
=
(0.6690
0.4527
0.4527)
6.5921 '
"
N
E (za -Z;)(za -z)'
a=1
I
1.8311 -0.3589 -0.0125 -0.0244 1.6379 0.5057 0
-0.3589 8.8102 -0.3469 0.0352 0.7920 0.2173 0
-0.0125 -0.3469 1.5818 -0.0415 -1.4278 -0.4753 0
= -0.0244 0.0352 -0.0415 0.0258 0.0043 0.0154 0
1.6379 0.7920 -1.4278 0.0043 3.7248 0.9120 0
0.5057 0.2173 -0.4753 0.0154 0.9120 03828 0
0 0 0 0 0 0 0
0.2501 2.6691
-1.5136 -2.0617
N 0.5007 -0.9503
E (za - Z){ xa - i)' = -0.0421 -0.0187
Q""I -0.1914 3.4020
-0.1586 1.1663
0 0
(a) Estimate the regression of xl and Xz on ZI' z5' z6' and Z"
(b) Estimate the regression on all seven variables.
(c) Test the hypothesis that t"te regression on Z2' Z3' and Z4 is O.
8.6. (Sec. 8.3) Let q = 2, Z La = wa (scalar), zZa = 1. Show that the U-statistic for
testing the hypothesis 131 = 0 is a monotonic function of a T2- statistic, and give
the TZ~stalislic in a simple form. (See Problem 5.1.)
Prove that
8.8. (Sec. 8.3) Let q, = qz. How do you test the hypothesis 131 = 132?
8.10. (Sec. 8.4) By comparing Theorem 8.2.2 and Problem 8.9, prove Lemma 8.4.1.
8.11. (Sec. 8.4) Prove Lemma 8.4.1 by showing that the density of ~lfi and ~2'" is
1 -11) r(n+2)r[t(n+1)]
/u( 2 n '2 + r(n-l)r(zn-l)v1T
1 r=
2u!n I ~ ut<n-I) 1
.( n(n-l) + n-l [arcsin(2u-1)-2'1T]
2u~n
+ -n- Iog
(1 + {U.;r-=u) + 2u~n-l(1-
3( n + 1)
U)t}
.
[Hint: Use Theorem 8.4.4. The region {O ~ Zl ~ 1,0 :s;.zz :::;.1, ztzz ~ u} is the
union of {O'::;;ZI :::;.1,0 :::;'z2'::;; u} and {O :s;.ZI ~U/z2' u ,::;;zz :s;.1}.]
8.1S. (Sec. 8.4) For p s m find ooEU h from the density of G and H. [Hint: Use the
fact that the density of K + [i= 1 V.V;' is W( I" s + t) if the density of K is
WeI, s) and VI"'" V. are independently distributed as N(O, I,).]
(a) Show that w.hen p is even, the characteristic function of y,.. log Up.In,n, say
4>(0 = ooE e"Y, is the reciprocal Of a polynomial.
(b) Sketch a method of inverting the characteristic function of Y by the
method of residues.
(c) Show that the resulting, denSity of U is a polynomial in Iii
and log u with
possibly a factor of u - i'.
8.17. (Sec.8.5) Usc the asymptotic expansion of the distribution to compute pr{-k
log U3,3,n :s;. M*} for
(a) n = 8, M* = 14.7,
(b) n 8, M* = 21.7,
(c) n = 16, M* = 14.7,
(d) n"'" 16, M* = 21.7.
(Either compute to the third decimal place or use the expansion to the k- 4
term.)
8.18. (Sec. 8.5) In case p = 3, ql = 4, and n = N - q = 20, find the 50~ significance
point for k log U (a) using -210g A as X 2 and (b) using -k log U as X 2 • Using
more terms of this expansion, evaluate the exact significance levels for your
answers to (a) and (b).
p /. p p
L
i= I
1 ~ /.
'
s: log n
,= I
(I + I,) ~ L I"
,= I
8.20. (Sec. 8.6) The multivariaIe bela density. Let Hand G be independently dis·
tributed according to W(l:, m) and WCI, n), respectively. Let C be a matrl"
such that CC ' = H + G, and let
8.22. (Sec. 8.9) The Latin square. Let Y;j' i, j = 1, ... , r, be distributed according to
N(fl.ij> :1:), where cx:Elj) = fl.ij = "I + A, + Vj + fl.k and k = j - i + 1 (mod r) with
LAi = LV} = Lfl.k = O.
(a) Give the univariate analysis of variance table for main effects and errOr
(including sums of squares, numbers of degrees of freedom. ,mo mean
squares).
(b) Give the table for the vector case.
(c) Indicate in the vector case how to test the hypothesis A, = 0, i = 1. .... r.
8.23. (Sec. 8.9) Let XI he the yield of a process and x~ a quality measure. Ld
zi = 1, Z2 = ± 10° (temperature relative to average) = ±O.75 lndatiw: nwa· =,
sure of flow of one agent), and z.. = ± 1.50 (relative measure of flow of another
agent). [See Anderson (1955a) for details.] Three ohs~rvations were made on x.
and x:! for each possihle triplet of values of Z2' Z3' and %4' The estimate of Pis
8.24. (Sec. 8.6) Interpret the transformations referred to in Theorem 8.6.1 in the
original terms; that is, H: PI = 13! and z~).
8.25. (Sec. 8.6) Find the cdf of tr HG - I for p = 2. [Hint: Use the distribution of the
roots given in Chapter 13.]
(a) Show that the measures are finite for n ';?:;p by showing tr C'(I + CC,)-IC
< m and verifying that the integral of II + CC'I- ~(n+m) is finite. [Hint: Let
C= (c1, ... ,cm ), Dj=l+ Ei_lcrc; =EJEj, cj=E;_ld),i= 1, ... ,m(Eo =J).
Show ID) = IDj_,IO +d;d) and hence IDml = n}:,(1 +l;d/ Then refeI
to Problem 5.15.]
(b) Show that the inequality (26) of Section 5.6 is equivalent to
8.27. (Sec. 8.10.1) Likelihood ratio lest as a Bayes procedure. Let wI'"'' wm+n be
independently normally distributed with covariance matrix I and means ooEw{
,. 'YI' i"", 1, ... ,m, oc.Ew, = 0, i-m + l, ... ,m +n, with n';?:;m +p. Let be no
defined by [f I' I] = [0, (I + CC') -I], where the p X m matrix C has a deru:ity
proportional to II + CC'I- ,(Il+n1) and fl = ('YI,"" 'Ym); let I be defined by n
(a) SIIOW the measures are finite. [Hint: See Problem 8.26.]
(b) Show that the inequality (26) of Section 5.6 is equivalent to
8.28. (Sec. 8.10.1) Admissibility of the likelihood ratio test. Show that tlle acceptance
region Izz' 1/ IZZ' + XX' I :::: c satisfies the conditions of Theorem 8.1 0.1. [Hint:
The acceptance region can be written n~~lml>c, where m i =l-A" i=
1,,,.,t.]
8.29. (Sec. 8.10.1) Admissibility of the Lawley-Hotelling test. Show that the accep-
tance region tr XX'(ZZ') -I ~ c satisfies the conditions of Theorem 8.10.1.
8.30. (Sec. 8.10.1) Admissibility of the Bartlett-Nanda-Pillai trace test. Show that the
acceptance region tr X'(ZZ' +XX,)-IX $ c satisfies the conditions of Theorem
8.10.1.
8.31. (Sec. 8.10.1) Show that if A and B are positive definite and A - B is positive
semidefinite, then B- 1 - A -I is positive semidefinite.
8.32. (Sec.8.10.1) Show that the boundary of A has m-measure 0. [Hint: Show that
(closure of A) c.1 U C, where C = (JI1 u - Yl" is singular}.]
where
8.34. (Sec. 8.10.1) Show that C( A) is convex. [Hint: Follow the solution of Problem
8.33 to show ( px + qy) -< w A if x -< w A and y -< w A.]
8.35. (Sec. 8.10.1) Show that if A is monotone, then A* is monotone. [Hint; Use
the fact that
=(B+W)-I- 1 _\ (B+W)·IUU'(B+W)-'.
l+u'(B+W) u
The resulting quadratic form in u involves the matrix (tr A)/- A for A =
I I
(B + W)- >B(B + W)- >; show that this matrix is positive semidefinite by diago-
nalizing A.]
8.37. (Sec. 8.8) Let x~,), Q' = 1, ... , Np , be observations from N(f.L<p), :I), v = 1, ... , q.
What criterion may be used to test the hypothesis that
m
f.L<~) = E 'Y;,Chll + f.L,
h=\
where Chp arc given numbers and 'Ypl f.L arc unknown vectors? [Note: This
hypothesis (that the means lie on an m-dimensional hyperplane with ratios of
distances known) can be put in the form of the general linear hypothesis.]
8.38. (Sec. 8.2) Let xa be an observation from N(pz" I :I), Q' = 1, ... , N. Suppose
there is a known fixed vector 'Y such that P'Y = O. How do you estimate P?
8.39. (Sec. 8.8) What is the largest group of transformations on y~l), Q' = 1, ... , Nfl
i = 1, ... , q, that leaves (1) invariant? Prove the test (12) is invariant under this
group.
CHAPTER 9
Testing Independence of
Sets of Variates
9.1. INTRODUCTION
that is,

(1)  X = (X^{(1)′}, X^{(2)′}, …, X^{(q)′})′.

The vector of means μ and the covariance matrix Σ are partitioned similarly,

(2)  μ = (μ^{(1)′}, μ^{(2)′}, …, μ^{(q)′})′,

(3)  Σ = ( Σ₁₁  Σ₁₂  ⋯  Σ_{1q}
           Σ₂₁  Σ₂₂  ⋯  Σ_{2q}
            ⋮
           Σ_{q1} Σ_{q2} ⋯ Σ_{qq} ).

The null hypothesis we wish to test is that the subvectors X^{(1)}, …, X^{(q)} are mutually independently distributed, that is, that the density of X factors into the densities of X^{(1)}, …, X^{(q)}. It is

(4)  H: n(x|μ, Σ) = ∏_{i=1}^{q} n(x^{(i)}|μ^{(i)}, Σ_{ii}).
If (4) is true, then

(5)  Σ_{ij} = E(X^{(i)} − μ^{(i)})(X^{(j)} − μ^{(j)})′ = 0,   i ≠ j.

(See Section 2.4.) Conversely, if (5) holds, then (4) is true. Thus the null hypothesis is equivalently H: Σ_{ij} = 0, i ≠ j. This can be stated alternatively as the hypothesis that Σ is of the form

(6)  Σ₀ = ( Σ₁₁   0    ⋯   0
             0   Σ₂₂   ⋯   0
             ⋮
             0    0    ⋯  Σ_{qq} ).
criterion is
where
(8) L(I-I-, I) = nN
0%'" 1
!
(2'IT) zPI IP
I e- }(xn- ... n; .I(xn - ... )
and L(I-I-, 1: 0) is L(I-I-, 1:) with 1: ij = 0, i:# j, and where the maximum is taken
with respect to all vectors 1-1- and positive definite I and 1:0 (i.e., 1:(1)' As
derived in Section 5.2, Equation (6),
(9)
where
(10)
where
(12) (f)
L {(. .II,.
~
..... '
) -
-
nN
1
(2 'IT ) !PII ~
e _l(x(l)_ .. (I»)·I.-~x(l)-
2 ...... 1/........ (I))
, II
0%=1 ..... ri 112 .
Clearly
q
(13) maxL(I-I-' Io)
fL. I.o
= n max L/(..-.(,), 1: u)
i= 1 fLit), I."
where
(14)
where ,l(e) is a number such that the prob~bility of (17) is e with "I = Io. (It
remains to show that such a number c~n be found.) Let
( 18)
(20)  |A| = |R| ∏_{i=1}^{p} a_{ii},
where

(21)  R = (r_{ij}) = ( R₁₁  R₁₂  ⋯  R_{1q}
                       R₂₁  R₂₂  ⋯  R_{2q}
                        ⋮
                       R_{q1} R_{q2} ⋯ R_{qq} )

and R_{ij} is the matrix of sample correlations between the ith set of variates (components p₁ + ⋯ + p_{i−1} + 1, …, p₁ + ⋯ + p_i) and the jth set. Thus

(23)  V = |A| / ∏_{i=1}^{q} |A_{ii}| = |R| / ∏_{i=1}^{q} |R_{ii}|.

That is, V can be expressed entirely in terms of sample correlation coefficients.
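A small sketch of (23): V computed from a sample correlation matrix R partitioned into q sets. The matrix and partition sizes below are hypothetical.

```python
import numpy as np

def independence_criterion(R, sizes):
    """V = |R| / prod |R_ii| for the partition of R given by `sizes`."""
    idx = np.cumsum([0] + list(sizes))
    denom = 1.0
    for a, b in zip(idx[:-1], idx[1:]):
        denom *= np.linalg.det(R[a:b, a:b])
    return np.linalg.det(R) / denom

# Example with p = 5 variables split into sets of sizes 2 and 3
R = np.array([[1.0, 0.3, 0.2, 0.1, 0.0],
              [0.3, 1.0, 0.1, 0.2, 0.1],
              [0.2, 0.1, 1.0, 0.4, 0.3],
              [0.1, 0.2, 0.4, 1.0, 0.2],
              [0.0, 0.1, 0.3, 0.2, 1.0]])
print(independence_criterion(R, [2, 3]))
```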
We can interpret the criterion V in terms of generalized variance. Each set (x_{i1}, …, x_{iN}) can be considered as a vector in N-space; the set (x_{i1} − x̄_i, …, x_{iN} − x̄_i) = z_i, say, is the projection on the plane orthogonal to the equiangular line. The determinant |A| is the p-dimensional volume squared of the parallelotope with z₁, …, z_p as principal edges. The determinant |A_{ii}| is the p_i-dimensional volume squared of the parallelotope having as principal edges the ith set of vectors. If each set of vectors is orthogonal to each other set (i.e., R_{ij} = 0, i ≠ j), then the volume squared |A| is the product of the volumes squared |A_{ii}|. For example, if p = 2, p₁ = p₂ = 1, this statement is that the area of a parallelogram is the product of the lengths of the sides if the sides are at right angles. If the sets are almost orthogonal, then |A| is almost ∏|A_{ii}|, and V is almost 1.
The criterion has an invariance property. Let C, be an arbitrary nonsingu-
lar matrix of order PJ and let
C1 0 0
0 C2 0
(24) C=
0 0 Cq
(25)  A*_{ij} = Σ_α (x*_α^{(i)} − x̄*^{(i)})(x*_α^{(j)} − x̄*^{(j)})′ = C_i Σ_α (x_α^{(i)} − x̄^{(i)})(x_α^{(j)} − x̄^{(j)})′ C_j′ = C_i A_{ij} C_j′.

Then

(26)  V* = |A*| / ∏_i |A*_{ii}| = |CAC′| / ∏_i |C_i A_{ii} C_i′| = |C|·|A|·|C′| / ∏_i (|C_i|·|A_{ii}|·|C_i′|) = |A| / ∏_i |A_{ii}| = V

for |C| = ∏_i |C_i|. Thus the test is invariant with respect to linear transformations within each set.
Narain (1950) showed that the test based on V is strictly unbiased; that is, the probability of rejecting the null hypothesis is greater than the significance level if the hypothesis is not true. [See also Daly (1940).]
(1)  V_i = det(A_{rs})_{r,s=1,…,i} / [ det(A_{rs})_{r,s=1,…,i−1} · |A_{ii}| ],   i = 2, …, q,

where (A_{rs})_{r,s=1,…,i} denotes the matrix of blocks A_{rs} for r, s = 1, …, i.
Then V = V₂V₃ ⋯ V_q. Note that V_i is the N/2th root of the likelihood ratio criterion for testing the null hypothesis

(2)  H_i: Σ_{i1} = 0, …, Σ_{i,i−1} = 0,

that is, that X^{(i)} is independent of (X^{(1)′}, …, X^{(i−1)′})′. The null hypothesis H is the intersection of these hypotheses.

Theorem 9.3.1. When H_i is true, V_i has the distribution of U_{p_i, p̄_i, n−p̄_i}, where n = N − 1 and p̄_i = p₁ + ⋯ + p_{i−1}, i = 2, …, q.
(3)
( 4)
where
(5)
Ii-I.I
When the null hypotheris is not assumed, the estimator of P, is (5) with Ijk
replaced by A ik , and the estimator of (4) is (4) with Ijk replaced by (l/n)A jk
and P, replaced by its estimator. Under H j : Pi = 0 and the covariance matrix
(4) is Iii' which is estimated by (l/n)A ij • The N /2th root of the likelihood
( 6)
IAi,1
Ai_I , I A,-I.i-I
Ail A",_I
=~----------------~----~
All
'IAiil
Ai-I,i-I
which is V_i. This is the U-statistic for p_i dimensions, p̄_i components of the conditioning vector, and n − p̄_i degrees of freedom in the estimator of the covariance matrix. •
Proof From the proof of Theorem 9.3.1, we see that the distribution of V;
is that of Upl,PI. n _P, not depending on the conditioning Z~k), k = 1, ... , i-I,
a = 1, ... , n. Hence the distribution of V; does not depend on V2 ' " ' ' V,-I'
•
Theorem 9.3.3. Under the null hypothesis V is distributed as ∏_{i=2}^{q} ∏_{j=1}^{p_i} X_{ij}, where the X_{ij}'s are independent and X_{ij} has the density β[x; ½(n − p̄_i + 1 − j), ½p̄_i].
9.3.2. Moments
Theorem 9.3.4. When the null hypothesis is true, the h th moment of the
criterion is
(8)
If the PI are even, say p, = 2 r l , i > I, then by using the duplication formula
rcex + ~)I'( ex + 1) = .;; f(2 ex + 1)2 - ~ for the gamma function WI: can rl:-
II
(9) (%V , =
'
Ii {n r(n + 1- PI - 2k + 2h)rCn + 1 - 2k) )
i=2 k=1 r{n + 1- PI - 2k)f(n + 1 - 2k + 211)
=
q
[! (
Dr,
S--I(n + 1- p, - 2k,Pi)
-j~IX11+I-P,-2k+211-1(1_X)P,-1 dt).
Thus V is distributed as f1{_2{f1~'~ I Y,n, where the Y,k are independent. and
Y,k has density /3(y; n + 1 - PI - 2k, p,).
In general, the duplication formula for the gamma function can be used to
reduce the moments as indicated in Section 8.4.
(1)
(.~AIt=K o;_Ir{HN(I+h)-i]}
ni-l{nJ~lr{HN(1 +h) _ill} 1
where K is chosen so that $Ao = 1. This is of the form of (1) of Section 8.5
with
a=p, b =p,
( 2) 1})
-i +
= ----"--"-,.--.;;...;....,.;..
(3)
Let
(p3 _ Ep;)2
(5)
72(p2 - Ep,2) .
(6)  Pr{−k log V ≤ v} = Pr{χ²_f ≤ v} + (γ₂/k²)[Pr{χ²_{f+4} ≤ v} − Pr{χ²_f ≤ v}] + ⋯
Table 9.1
Second
p f v 'Y2 N k 'Y2/ k2 Term
11 71
4 6 12.592 24 15 6" 0.0033 0.0007
IS 69
5 10 18.307 a- 15 T 0.0142 -v.OO21
6 15 24.996 235
15 fil 00393 -0.0043
48 6
7:1
16 '6 0.0331 -0.0036
(7)  γ₂ = [p(p − 1)/288](2p² − 2p − 13),

f = 2q(q − 1),

(8)  k = N − (4q + 13)/6,
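A sketch of the approximation (6), assuming V has been computed as in Section 9.2. The general choices of f and k below are the usual ones for this criterion (for p_i = 1 and N = 15 they reproduce the k values appearing in Table 9.1); the specific data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def neg_k_log_V_test(V, N, sizes):
    """Chi-square approximation for -k log V with the usual f and k."""
    p = sum(sizes)
    s2 = sum(pi**2 for pi in sizes)
    s3 = sum(pi**3 for pi in sizes)
    f = (p * (p + 1) - sum(pi * (pi + 1) for pi in sizes)) // 2
    k = N - 1.5 - (p**3 - s3) / (3 * (p**2 - s2))
    stat = -k * np.log(V)
    return stat, f, chi2.sf(stat, f)

# e.g. p_i = 1 for all i: test that all p variables are mutually independent
print(neg_k_log_V_test(V=0.55, N=15, sizes=[1, 1, 1, 1]))
```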
BqqAqLB'LL BqqAq2B~ 0
q
=~ E trA,)A)~LAi,A~l.
I,)=1
;*J
= Pr{ xl ~x}
1 (3
+ n1 [ 12 p - ~ 2+ 2;~
3P ;:-; Pi ~ p;3) Pr {2
X/+6 ~x
}
and reject the null hypothesis if V_i < v_i(ε_i) for any i. Here v_i(ε_i) is the number such that the probability of (1) when H_i is true is 1 − ε_i. The significance level of the procedure is ε satisfying

(2)  1 − ε = ∏_{i=2}^{q} (1 − ε_i).
The subtests can be done sequentially, say, in the order 2, ... , q. As soon as a
subtest calls for rejection, the procedure is terminated; if no subtest leads to
rejection, H is accepted. The ordering of the subvectors is at the discretion
of the investigator as well as the ordering of the tests.
Suppose, for example, that measurements on an individual are grouped
i.nto physiological measurements, measurements of intelligence, and mea-
surements of emotional characteristics. One could test that intelligence is
independent of physiology and then that emotions are independent of
physiology and intelligence, or the order of these could be reversed, Alterna-
tively, one could test that intelligence is independent of emotions and then
that physiology is independent of these two aspects, or the order reversed.
There is a third pair of procedures.
Other' criteria for the linear hypothesis discussed in Section 8.6 can be
used to test the compo nent hypotheses H 2' .•• , H q in a similar fashion.
When HI is true, the criterion is distributed independently of X~l), •• " X~i-l),
ex = L ... , N, and hence independently of the criteria for H 2" " , H,-l'
( 4)
All
X - A(l) - (A(I) ... A(I») •
,I- /I " ",-I .
(
A'_I.I
A(I)
Ai-II A,-I,,-I ,-I,'
A(l) A(l) A(l)
il "i-I II
All Al,,-l
A(I)
/I
A,_1.1 A,-l.,-l
For k> 1, the criterion for testing that the kth row of the matrix in (3) is 0 is
[see (8) in Section 8.4]
(5)
Ai 1,1
A(k)
Ai-I,l Ai-l,,-I i-l,(
(k) A(k) A(k)
A il i,I-1 Ii IA~~-l)1
=+------------------------+ ,
t(k-l)
Al,l-I • Ii IA~7)1
A(k-l)
Ai-I,I Ai-I,i-I i-I. i
A(k-I) A(k-I) A<~-l)
it 1,1-1 /I
(6)  [(1 − X_{ij})/X_{ij}] · [(n − p̄_i + 1 − j)/p̄_i] > F_{p̄_i, n−p̄_i+1−j}(ε_{ij}).
(7)
9.7. AN EXAMPLE
9.47
25.56
(1) X= 13.25
31.44 .
27.29
8.80
2.57 0.85 1.56 1.79 1.33 0,42
0.85 37.00 3.34 13.47 7.59 0.52
1.5b 3.34 8.44 5.77 2.00 0.50
(2) s= 1.79 13.47 5.77 34.01 10.50 1.77
1.33 7.59 2.00 10.50 23.01 3.43
0,42 0.52 0.50 1.77 3.43 4.59
In the case of two sets of variates (q = 2), the random vector X, the
observation vector x a ' the mean vector .." and the covariance matrix l: are
partitioned as follows:
X=
( x(l)
X(2)
1
, x" = x~) 1,
(x'"
(1) ,
.., (~'l)
..,
(2)
)
, 1:~l1:11 1:" ).
:t21 l:22
(2)
(3)
It was shown in Section 9.3 that when the null hypothesis is true, this
criterion is distributed as UPI • p~. N-l-p 2' the criterion for testing a hypothesis
about regression coefficients (Chapter 8). We now wish to study further the
relationship between testing the hypothesis of independence of two sets and
testing the hypothesis that regression of one set on the other is zero.
The conditional distribution of X_α^{(1)} given X_α^{(2)} = x_α^{(2)} is N[μ^{(1)} + β(x_α^{(2)} − μ^{(2)}), Σ_{11·2}] = N[β(x_α^{(2)} − x̄^{(2)}) + ν, Σ_{11·2}], where β = Σ₁₂Σ₂₂^{-1}, Σ_{11·2} = Σ₁₁ − Σ₁₂Σ₂₂^{-1}Σ₂₁, and ν = μ^{(1)} + β(x̄^{(2)} − μ^{(2)}). Let X*_α = X_α^{(1)}, z_α′ = [(x_α^{(2)} − x̄^{(2)})′, 1], β* = (β, ν), and Σ* = Σ_{11·2}. Then the conditional distribution of X*_α is N(β*z_α, Σ*). This is exactly the distribution studied in Chapter 8.
The null hypothesis that Σ₁₂ = 0 is equivalent to the null hypothesis β = 0. Considering x_α^{(2)} fixed, we know from Chapter 8 that the criterion (based on the likelihood ratio criterion) for testing this hypothesis is
(4)
where
(5)
= ( LX* z*(I),
\ Q (r
. -,
mil)) (A~2
~)
= (A 12 An' f(I)),
N
( 6) '" (x P ) - xtl))(x(l) - x(l»)' == A II'
'-' '" <r
a"l
(7)  Σ_{α=1}^{N} [x_α^{(1)} − x̄^{(1)} − A₁₂A₂₂^{-1}(x_α^{(2)} − x̄^{(2)})][x_α^{(1)} − x̄^{(1)} − A₁₂A₂₂^{-1}(x_α^{(2)} − x̄^{(2)})]′ = A₁₁ − A₁₂A₂₂^{-1}A₂₁ = A_{11·2}.

Therefore,

(8)  U = |A_{11·2}| / |A₁₁| = |A| / (|A₁₁| |A₂₂|),

which is exactly V.
Now let us see why it is tha~ when the null hypothesis ls true the
distribution of U = V does not depend on whether the Xi2) are held fIx·!d. It
was shown in Chapter 8 that when the null hypothesis is true the di3tribu"Lion
of U depends only on p, q}> and N - q2' not on Za' Thus the conditional
distribution of V given X~2) = x~"2) does not depend on xi2); the joint dlstribu·
tion of V and X~2) is the product of the distribution of V and the distribution
of X~2), and the marginal distribution of V is this conditional distribution.
This shows that the distribution of V (under the null hypothesis) does not
depend on whether the Xi2) are fIxed or have any distribution (normal or
not).
We can extend this result to show that if q> 2, the distribution of V
under the null hypothesis of independence does not depend on the distribu·
tion of one set of variates, say X~1). We have V = V2 ••• Vq , where V. is
defined in (1) of Section 9.3. When the null hypothesis is true, Vq is
distributed independently of X~t), ... , X~q-l) by the previous result. In turn
we argue that l-j is distributed independently of X~l), ... , X~) -1). Thus
V2 ... Vq is distributed independently of X~l).
(9)
IJ'PX(2)(I3X(2»/1 IP~22P/1 II 12 ~2i 121 I
I~lll = 11111 = IIIlI
o 112
~21 I22
=( 1) P 1 -'--.,--,-......,.--.""':'
(10)
In a sense this measure shows how well X^{(1)} can be predicted from X^{(2)}. In the case of two scalar variables X₁ and X₂ the coefficient of alienation is σ²_{1·2}/σ₁², where σ²_{1·2} = E(X₁ − βX₂)² is the variance of X₁ about its regression on X₂ when EX₁ = EX₂ = 0 and E(X₁|X₂) = βX₂. In the case of two vectors X^{(1)} and X^{(2)}, the regression matrix is β = Σ₁₂Σ₂₂^{-1}, and the generalized variance of X^{(1)} about its regression on X^{(2)} is

(11)  |E(X^{(1)} − βX^{(2)})(X^{(1)} − βX^{(2)})′| = |Σ₁₁ − Σ₁₂Σ₂₂^{-1}Σ₂₁| = |Σ_{11·2}|.
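A sketch relating the two-set criterion to sample canonical correlations: V = |A|/(|A₁₁||A₂₂|) equals ∏(1 − r_i²), where the r_i are the sample canonical correlations. The input matrix below is simulated, not taken from the text.

```python
import numpy as np

def alienation_and_canonical(A, p1):
    """Split A into blocks for sets of sizes p1 and p - p1; return
    V = |A|/(|A11||A22|) and prod(1 - r_i^2) over the canonical correlations."""
    A11, A12 = A[:p1, :p1], A[:p1, p1:]
    A21, A22 = A[p1:, :p1], A[p1:, p1:]
    V = np.linalg.det(A) / (np.linalg.det(A11) * np.linalg.det(A22))
    M = np.linalg.solve(A11, A12) @ np.linalg.solve(A22, A21)
    r2 = np.linalg.eigvals(M).real          # squared canonical correlations
    return V, np.prod(1 - r2)

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 5))
A = (X - X.mean(0)).T @ (X - X.mean(0))
print(alienation_and_canonical(A, 2))       # the two numbers coincide
```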
The admissibility of the likelihood ratio test in the case of the 0-1 loss
function can be proved by showing that it is the Bayes procedure with respect
to an appropriate a priori distribution of the parameters. (See Section 5.6.)
Theorem 9.9.1. The likelihood ratio test of the hypothesis that Σ is of the form (6) of Section 9.2 is Bayes and admissible if N > p + 1.
Proof. We shall show that the likelihood ratio test is equivalent to rejec-
tion of the hypothesis when
(1) ------~c,
where x represents the sample, 8 reprc:-;cnts the paramcter~ (Il- and I),
f(xi8) is the density, and ITl and no
are proportional to probability mea-
sures of 8 under the alternative and null hypotheses, respectively. Specifi.
cally, the left-hand side is to be proportional to the square root of n; 'liA"i /
IAI.
To define ITl' let
. exp { - ~ E
a~]
[x a - ( I + vv' r ry] ,(I +
1 Vt") [ x" - (I + vv' r :l~} ])
(4)
a=1 a""l
N N
= E X~Xa+V' E xax~v-2yv'NX+Ny2
a-I a==1
- :x:
:x:
-:xl
~e-ilrA-tN.n'.
L et Z{, -- [Z(l)'
a ' Z(2)'],
a ,a -- 1,.,., n, be d'Ist£1'bute d aceo rd'mg to
( 1)
Lemma 9.10.1. There exist matrices BI (PI XPI)' B2 (P2 XP2) such that
(2)
(3)
M=(D,O),
(4) D = diag( 8 1" " , 8p ,),
8i = pJ (1 - Pi2)! , i=l,,,,,PI'
Then
(7)
Now given Y, the problem reduces to the MANOV A problem and we can
apply Theorem 8.10.6 as follows. There is an orthogonal transformation
(Section 8.3.3) that carries X* to (V, V) such that Sh = VV', Se = W',
V=(UI' ... ,u/l ) . V is PI x(n -P2)' u , has the distrihution N(DiE/T),
i= 1, ... ,PI (E I being the ith column of I), and N(O, I), i=PI + 1",.,P2'
and the columns of V are independently clistributed according to N(O, I).
Then cl' ... ,c p1 are the characteristic roots of VV'(W')-l, and their distri-
bution depends on the characteristic roots of MYY' M', say, 'Tl, ... , 'Tp~. Now
from Theorem 8.10.6, we obtain the foUowing lemma.
Now Lemma 9.10.3 applied to MYY' M' shOWS that for every j, 'T/ is an
increasing function of 5( = pjO - pl) ~ and hence of Pi. Since the marginal
distribution of Y does not depend on the p/s, by taking the unconditional
power we obtain the following theorem.
Theorem 9.10.1. An invariant test for which the acceptance region is convex
in each column of V for each set of fixed V and other columns of V has a power
function that is monotonically increasing in each PI'
Then
where 1 + κ = p E(R⁴) / [(p + 2)(E R²)²].
The likelihood ratio criterion for testing the null hypothesis Σ_{ij} = 0, i ≠ j, is the N/2th power of U = ∏_{i=2}^{q} V_i, where V_i is the U-criterion for testing the null hypothesis Σ_{i1} = 0, …, Σ_{i,i−1} = 0 and is given by (1) and (6) of Section 9.3. The form of V_i is that of the likelihood ratio criterion U of Chapter 8 with X replaced by X^{(i)}, β by β_i given by (5) of Section 9.3, Z by
( 4) X(I-I) =
x(1)
: 1 '
r X(I-I)
(5)
with similar definitions of :£(1-1), :£U.I I>, S(I-1). and S(I,I-1). We write
Vi = 1Gilll G( + Hil , where
(7)
- - -I -
= (N -l)S(I·,-I)(S(t-I») S(I-I.,),
Theorem 9.11.1. When X has the density (1) and the null hypothesis is true, the limiting distribution of H_i is W[(1 + κ)Σ_{ii}, p̄_i], where p̄_i = p₁ + ⋯ + p_{i−1}
(9)
if i.1 5:p, and k, m > Pi or if j, I> p,. and k, m 5:Pi' and <!SjkSlm = 0 other-
wise (Theorem 3.6.1). We can write
Theorem 9.11.2. Under the conditions of Theorem 9.11.1 when the null hypothesis is true

(11)  −N log V_i →_d (1 + κ) χ²_{p_i p̄_i},

(13)  −N log V = −N Σ_{i=2}^{q} log V_i →_d (1 + κ) χ²_f,

where f = Σ_{i=2}^{q} p_i p̄_i = ½[p(p + 1) − Σ_{i=1}^{q} p_i(p_i + 1)]. The likelihood ratio test of Section 9.2 can be carried out on an asymptotic basis.
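A sketch of the adjustment implied by Theorem 9.11.2: −N log V is referred to (1 + κ)χ²_f. The moment estimator of κ used below is an assumption (the usual Mardia-type estimator of the kurtosis parameter), not a formula taken from this section.

```python
import numpy as np
from scipy.stats import chi2

def elliptical_adjusted_test(X, sizes):
    """Independence test with the (1 + kappa) kurtosis correction; kappa is
    estimated by 1 + kappa = p*mean(d^2)/[(p+2)*mean(d)^2], d = squared
    Mahalanobis distances (an assumed estimator)."""
    N, p = X.shape
    Xc = X - X.mean(0)
    A = Xc.T @ Xc
    S = A / (N - 1)
    d = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
    kappa = p * np.mean(d**2) / ((p + 2) * np.mean(d)**2) - 1
    idx = np.cumsum([0] + list(sizes))
    V = np.linalg.det(A) / np.prod([np.linalg.det(A[a:b, a:b])
                                    for a, b in zip(idx[:-1], idx[1:])])
    f = (p**2 - sum(pi**2 for pi in sizes)) // 2
    stat = -N * np.log(V)
    return stat, chi2.sf(stat / (1 + kappa), f)
```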
g
(14) ~tr(AA() I -If = t t tr A'JAj~lAJiA,~1
i,/=-1
i¢f
The matrix A and the vector x̄ are sufficient statistics, and the likelihood ratio criterion for the hypothesis H is (|A|/∏_{i=1}^{q}|A_{ii}|)^{N/2}, the same as for normality. See Anderson and Fang (1990b).
(18)
./
for all v and
for all K = diag(K ll , ... , Kqq). Then the distribution of J(X), where X has the
arbitrary density (16), is the same as the distribution of J(X), where X has the
nonnal density (16).
It follows from Theorem 9.11.3 that V has the same distribution under the
null hypothesis H when X has the density (16) and for X normally dis-
tributed since V is invariant under the transformation X -+ KX. Similarly, J.t:
and the criterion (14) are invariant, and hence have the distribution under
normality.
PROBLEMS
q
rev h
=
K(Io,n)
K(I +2h)
j ".
jn IA;,I
<
-h
w(A,Io,n+2h)dA,
o,n 1=1
~V
II
=
K(Io,n)
K(I + 2h)
n [K(IIl,n
q
K(I)
+2h) j j ( ]
", w Alii Iii' n) dA/i .
0' n n
<
1= I II'
Pr(V:=;;u} =1.;u{n-5,4)
9.4. (Sec. 9.3) Derive some of the distributions obtained by Wilks t 1l)~5) and
referred to at the end of Section 9.3.3. [Hint: In addition to the results for
Problems 9.2 and 9.3, use those of Section 9.3.2.]
9.5. (Sec. 9.4) For the case p, = 2, express k and Y2' Compute the second term or
(6) when u is chosen so that the first term is 0.95 fOr P = 4 and 6 and N = 15.
9.6. (Sec. 9.5) Prove that if BAR = CAC' = I for A positive definite and Band C
I
9.7. (Sec. 9.5) Prove N times (2) has a limiting X2-distribution with f degrees of
freedom under the null hypothesis.
9.8. (Sec. 9.8) Give the sample vector coefficient of alienation and the vector
correlation coefficient.
9.9. (Sec. 9.8) If Y is the sample vector coefficient of alienation and z the square
of the vector correlation coefficient. find ~ yR z I, when ~ I ~ "" o.
1
f ... f
00 00
I dUI'" du <x
,_00 _00 (1 + "P
'-1= I
u~),r
I
p
I'f P < n. [H'mt.. Le t Yj -- w) Y1 + "(' '-,=j+ I Yi2, 'J -- 1, ...• p- , Il l'. turn. 1
9.11. Let xI;::' arithmetic speed, x2::: arithmetic power, x~ = intellectual interest.
x 4 = soc al interest, Xs = activity interest. Kelley (1928) observed the following
correlations between batteries of tests identified as above, based on 109 pupils:
Let x tl }, = (.r l • .1: 2 ) and x(~\r = (.1: 3 , X-l' x.:J Test the hypothesis that x(1) is
independent of Xl::} at the 1% significance level.
9.12. Cany out the same exercise on the dutu in Prohlem 3.42.
9.13. Another set of time-study data [Abruzzi (1950)] is summarized by the correla-
tion matrix based on 188 observations:
10.1. INTRODUCTION
In this section we study several normal distributions and consider using a set
of samples, one from each population, to test the hypothesis that the
covariance matrices of these populations are equal. Let x_α^{(g)}, α = 1, …, N_g, g = 1, …, q, be an observation from the gth population N(μ^{(g)}, Σ_g). We wish to test the hypothesis

(1)  H₁: Σ₁ = Σ₂ = ⋯ = Σ_q.

Let

(2)  A_g = Σ_{α=1}^{N_g} (x_α^{(g)} − x̄^{(g)})(x_α^{(g)} − x̄^{(g)})′,   g = 1, …, q.
First we shall obtain the likelihood ratio criterion. The likelihood function is

(3)  L = ∏_{g=1}^{q} ∏_{α=1}^{N_g} (2π)^{-p/2} |Σ_g|^{-1/2} exp[ −½ (x_α^{(g)} − μ^{(g)})′ Σ_g^{-1} (x_α^{(g)} − μ^{(g)}) ].

The space Ω is the parameter space in which each Σ_g is positive definite and μ^{(g)} is any vector. The space ω is the parameter space in which Σ₁ = Σ₂ = ⋯ = Σ_q (positive definite) and μ^{(g)} is any vector. The maximum likelihood estimators of μ^{(g)} and Σ_g in Ω are given by

(4)  μ̂_Ω^{(g)} = x̄^{(g)},   Σ̂_{gΩ} = (1/N_g) A_g.

The maximum likelihood estimators of μ^{(g)} in ω are given by (4), μ̂_ω^{(g)} = x̄^{(g)}, since the maximizing values of μ^{(g)} are the same regardless of Σ_g. The function to be maximized with respect to Σ₁ = ⋯ = Σ_q = Σ, say, is

(5)
(6)
(7)
(8)
ll qg""l N1pN,'
g
(9)  λ₁ ≤ λ₁(ε),

where λ₁(ε) is defined so that (9) holds with probability ε when (1) is true.
Bartlett (1937a) has suggested modifying λ₁ in the univariate case by replacing sample numbers by the numbers of degrees of freedom of the A_g. Except for a numerical constant, the statistic he proposes is

(10)  V₁ = ∏_{g=1}^{q} |A_g|^{n_g/2} / |A|^{n/2},

where n_g = N_g − 1 and n = Σ_g n_g = N − q.

(11)

where s₁² and s₂² are the usual unbiased estimators of σ₁² and σ₂² (the two population variances) and

(12)
(13)
(14)
Brown (1939) and Scheffé (1942) have shown that (14) yields an unbiased test.
Bartlett gave a more intuitive argument for the use of V₁ in place of λ₁. He argues that if N₁, say, is small, A₁ is given too much weight in λ₁, and other effects may be missed. Perlman (1980) has shown that the test based on V₁ is unbiased.
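A sketch computing the modified criterion V₁ of (10) on the log scale from a list of cross-product matrices A_g and their degrees of freedom; the inputs below are simulated placeholders.

```python
import numpy as np

def log_V1(A_list, df_list):
    """log V1 = sum (n_g/2) log|A_g| - (n/2) log|A|, n_g = N_g - 1."""
    A = sum(A_list)
    n = sum(df_list)
    log_det = lambda M: np.linalg.slogdet(M)[1]
    return sum(0.5 * ng * log_det(Ag) for Ag, ng in zip(A_list, df_list)) \
           - 0.5 * n * log_det(A)

rng = np.random.default_rng(3)
groups = [rng.standard_normal((Ng, 3)) for Ng in (20, 25, 30)]
A_list = [(g - g.mean(0)).T @ (g - g.mean(0)) for g in groups]
print(log_V1(A_list, [len(g) - 1 for g in groups]))
```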
If one assumes
(15)
N~
( 16) A"_"' = "i...J (x(g) -
(f
a""~t; z(g))(x u - aI-"g z(g)),
cr a ,
a"'-I
l17) g= 1, ... , q,
(1 )
The test of the assumption i.l H2 was considered in Section 10.2. Now let us
consider the hypothesis that both means and covariances are the same; this is
a combination of HI and H 2. We test
(2)
As in Section 10.2, let x~g\ a = 1, ... , Ng• be an observation from N(JL(g\ Ig),
g = 1, ... , q. Then il is the unrestricted parameter space of {JL(g), I g}, g =
1, ... , q, where I g is positive definite, and w* consists of the space restricted
by (2).
The likelihood function is given by (3) of Section 10.2. The hypothesis HI
of Section 10.2 is that the parameter point faUs in w; the hypothesis H2 of
Section 8.8 is that the parameter point falls in w* given it falls in w ~ w*;
and the hypothesis H here i~; that the parameter point falls in w* given that
it is in il.
We use the following lemma:
(3)
maxOefi. f(y,6)
(4) Aa = maxOefi f( y, 6)'
A _ max Oe fi b f(y,6)
(5) b-
maxOefi. f(y,6)'
(7)
where

(8)  B = Σ_{g=1}^{q} Σ_{α=1}^{N_g} (x_α^{(g)} − x̄)(x_α^{(g)} − x̄)′ = A + Σ_{g=1}^{q} N_g (x̄^{(g)} − x̄)(x̄^{(g)} − x̄)′.
( 10)
(11)
However, Perlman (1980) has shown that the likelihood ratio test is unbiased.
then
(2)
IBlblCI C
( 5)
IB + Cl h + C
•
=
n
(=1
IB(lblC,l c
IBI_llbIC,_llc
IB(_I +C,_1Ib+c
IB, + C(l b+c
p b c
bif.,-ICI/·{-1
=0I 1=
b+c
(b" I-I + C,/,,_I )
where bi/. i _I = b(( - b(()B1-_11b(l) and CI{'I-l = Cfl - C(f)Ci_11 C(I)' The second term
for i = 1 is defined as 1.
Now we want to argue that the ratios on the right-hand side of (7) are
statistically independent when Band C are independently distributed ac-
cording to we£., m) and WeI, n), respectively. It follows from Theorem 4.3.3
that for B f _ 1 fixed bU) and ba.l - I are independently distributed according to
N(l3 u), O"U.(_I B;-:..II) and O"U'I I X Z with m - (i - 1) degrees of freedom, re-
spectively. Lemma 1004.1 implies that the first term (which is a function of
b"'f-l/Cji.i-l) is independent of bU•t - 1 +CiI-f_I'
We apply the following lemma:
(8) b(i)B;=.ll b(f) + C(f)Ci-\ C(I) - (b(n + C(I)Y (BI _ I + Ci - 1) I(bU) + C(i»)
(9)
b'B-'(B- I +C-I)-I(B- I +C-I)b+c'(B-1 +C-I)(B- 1 +C- I ) IC-IC
The denominator of the ith second term in (7) is the numerator plus (8),
The conditional distribution of Bi-II b(i) e;_'1 C(I) is normal with mean
i-I + e-
, O"il'i-I (B-
-1 a. C- I d ' 1 I ) Th .
Bi-II"'(I) - (-,'Y(l) an covarIance matrlX (-I' e covarI-
ance matrix is 0"(('(-1 times the inverse of the second matrix on the right-hand
side of (8). Thus (8) is distributed as O"IN-l X Z with i - 1 degrees of freedom,
independent of BI_I> CI- 1' bit'i-I> and Cji'i-I'
Then
(10)
is distributed as Yib+C', where 1'; has the ,B[i(m + n) - i + 1,!(i - 1)] distribu-
tion. Then (5) is distributed as nr= I Xib(1 - X)Cnf=2 l';:b+c, and the factors
are mutually independent.
Theorem 10.4.2.
(12)
where the X's and Y's are independent, X ig has the J3H(n 1 + .,. + n g_1 - i + 1),
i(n g -i + 1)] distribution, and 1';g has the J3[~(nl + ... +ng) -i + 1,~(i -1)]
distribution.
Proof. The factors VI2 , ••• , V1q are independent by Theorem 10.4.1. Each
term V1g is decomposed according to (7), and the factors are independent .
•
The factors of VI can be interpreted as test criteria for subhypotheses.
The term depending on X i2 is the criterion for testing the hypothesis that
Ui~,If- I = Ui~~f- I, and the term depending on Yi2 is the criterion for testing
(I) - (2)' (I) - (2)
O'(i) - O'(i) given U{j'i- t - Uij'i-I' an
d "'""i-I.I
" "'"
""1-1,2'
Th e terms d epen.d'109
on X'g and l';:g similarly furnish criteria for testing I I = Ig given I I = ... =
Ig_I'
Now consider the likelihood ratio criterion λ given by (7) of Section 10.3 for testing the hypothesis μ^{(1)} = ⋯ = μ^{(q)} and Σ₁ = ⋯ = Σ_q. It is equivalent to the criterion

(13)

The two factors of (13) are independent because the first factor is independent of A₁ + ⋯ + A_q (by Lemma 10.4.1 and the proof of Theorem 10.4.1) and of x̄^{(1)}, …, x̄^{(q)}.
Theorem 10.4.3
(14) W= nq
g~2
{ p
n}(J(NI+"'+N~-I)(l-X )'iN~nYl(NI+
i= I
I
Ig I~
I
1=2
P I
I~
where the X's, Y's, and Z's are independent, XIII has the ,B[~(111 + '" +11/1- I -
i + 1),!(n g - i + 1)] distribution, Y{g has the ,B[!(n, + '" +n g) - i + I ,~(i - I)]
distribution, and Zi has the ,B[hn + 1 - i),~(q - 1)] distribution.
(15)
.n r[4(nJ
1~2
r[!(nJ+"'+ng)(I+h)-i+1]r[~(n'+"'+llg-i+1)] }
+ ... +ng) - i + l1r[1(n J+ ... +ng)(l + h) - ~(i -1)]
=n{
i= J
r[Hn+1-i)]
r [ ~ (n + hn + 1 - i)]
n
g= J
r [Hn g +hn g +1-i)]}
-H
r [ n g + 1 - i) ]
rp(!n)
rp[Hn +hn)]
n
}:=,
1~[Hng +Im g)]
rpUn,,)
(16)
= nq ( np r[l(n 2 1
+ ... +n g-I + 1- i) + lh(N
2 1
+ ... +Ng-I )]
s~1 I~I r[HIl 1 +"'+ll r l +l-i)]r[Hn g +1-i)]
.n r [1 (n
P
2 I
+ .. , + n j: ) + lh
2
(N I + '" + Ng ) + 1 - i]
1=2 r[ h n l + .,. +ng) + 1 - i]
1'[1(111 + ... +11, + [- ill }
. r[ Hn I + .. , + ".~ + 1 - i) + ~ h ( N I + ... + Nx)]
p r [ ~ (n + 1 - i + hN)] r [H N - i) ]
)J r [ ~ (11 + 1 - i) ] r [1(N + hN - i) ]
_ fP
-I1 \g=1
n r[HNg+hNg-i)]}
r[HN/:-i)]
1'[~(N-i)]
r[t(N+hN-i)]
rp(!n) q fp[Hng+hNg)]
rpOn + thN) }] rgOn g)
Theorem 10.4.4. Let V₁ be the criterion defined by (10) of Section 10.2 for testing the hypothesis H₁: Σ₁ = ⋯ = Σ_q, where A_g is n_g times the sample covariance matrix and n_g + 1 is the size of the sample from the gth population; let W be the criterion defined by (13) for testing the hypothesis H: μ^{(1)} = ⋯ = μ^{(q)} and H₁, where B = A + Σ_g N_g(x̄^{(g)} − x̄)(x̄^{(g)} − x̄)′. The hth moment of V₁ when H₁ is true is given by (15). The hth moment of W, the criterion for testing H, is given by (16).
This theorem was first proved by Wilks (1932). See Problem 10.5 for an
alternative approach.
If p is even, say p = 2r, we can use the duplication formula for the gamma function [Γ(α + ½)Γ(α + 1) = √π Γ(2α + 1)2^{-2α}]. Then
(17)  𝔼V₁^h = ∏_{j=1}^r { ∏_{g=1}^q [Γ(n_g + hn_g + 1 − 2j)/Γ(n_g + 1 − 2j)] · Γ(n + 1 − 2j)/Γ(n + hn + 1 − 2j) }

and

(18)  𝔼W^h = ∏_{j=1}^r { ∏_{g=1}^q [Γ(n_g + hN_g + 1 − 2j)/Γ(n_g + 1 − 2j)] · Γ(N − 2j)/Γ(N + hN − 2j) }.
In principle the distributions of the factors can be integrated to obtain the
distributions of VI and W. In Section 10.6 we consider VI when p = 2, q = 2
(the case of p = 1, q = 2 being a function of an F-statistic). In other cases,
the integrals become unmanageable. To find probabilities we use the asymp-
totic expansion given in the next section. Box (1949) has given some other
approximate distributions.
(g) _ X(g)
(I-I) ) (g) _ IJ.(ig)- I) ) I~g) =
I(?)
(I-I)
(J'(W]
(19) XCi) - (
X/g) , IJ.(I) - ( /-t~g) , I
[ ....,.(g), (g) ,
v (i) aii
(20)
where a:(g) /1'1-1
= a/g)
II
- fI(~)'I(
(I)
0 fI(~)
I-I (I)'
It is assumed that the components of X
have been numbered in descending order of importance. At the ith step the
component hypothesis aJ?_1 = ai}~!_1 is tested at significance level ej by
means of an F-test based on sgll_1 /sm-I; SI and S2 are partitioned like I(l)
and I(2). If that hypothesis is accepted, then the hypothesis fleW = (or fill?
I(I)
i-I
- I fI (1) = I (2) - I fI (2»
(l) {-1 (f)
is tested at significance level o.I on the assumption
that I}~ I = I}~ I (a hypothesis pnviously accepted). The criterion is
given xU~ I) and xU~ 1)' Alternatively one can test in sequence the equality of
the conditional distributions of X,o) and xl 2) given xU~ I) and x~~~ I)'
For q > 2, the hypothesis II = ... = Iq can be tested, and then j.11 = ...
= j.1q. Alternatively, one can test [l/(g - 1)](1 1 + .. ' + Ig-I) = Ig and
[l/(g-l)](j.1(l) + ... + j.1(g-l)) = j.1(g).
bon
n<r q n -pn
2I I [
q 1 k' jwn V
(1) A*1 -- V I . n g=l
q
n wng
g
'= V1 . I] (n)
g-I g
- - g-I
- I] (k g ) - I
IS
(2)
b=p. - In
YJ - '2 • TI, = t(1- j), j= l, .... p,
Then
= - rq
l
t
i"" I
(1 - i) - t(
) '" I
1 - j) - (qp - p) 1
=- [-qtp(p 1) + 1p(p -1) - (q -l)p]
=±<q-l)p(p+l},
8
J
= ~(1 - p)n, j = 1, ... , p. and {3k = !O - p)ni: = 10 - p)k.~n, k = (g.- 1)p
+ 1, ... ,gp.
In order to make the sec<?nd term in the expansion vanish, we take p as
q 1 1 \ 2p~ + 3p - 1
(5) p=l- ( g~l ng - nJ6(p+l)(q-l)'
Then
Thus
(8)
Then
( 11) W2 = p( p +2
48p
3) [ 1: [1
q
g-I
-2 -
Ng
I) (p + 1)( P + 2) -
-2
N
6(1 - p) 2(q - 1) 1
The asymptotic expansion of the distribution of - 2 P log A is
(12) Pr{ - 2p log A:::; z}
A₁ = (  78.948   214.18 )        A₂ = ( 223.695   657.62 )
     ( 214.18   1247.18 ),            ( 657.62   2519.31 ),

A₄ = ( 187.618   375.91 )        A₅ = (  88.456   259.18 )
     ( 375.91   1473.44 ),            ( 259.18   1171.73 ).
where C is nonsingular. The maximal invariant of the parameters under the
transformation of locations (C = I) is the pair of covariance matrices Σ₁, Σ₂,
and the maximal invariant of the sufficient statistics x̄^(1), S₁, x̄^(2), S₂ is the
pair of matrices S₁, S₂ (or equivalently A₁, A₂). The transformation (1)
induces the transformations Σ₁* = CΣ₁C', Σ₂* = CΣ₂C', S₁* = CS₁C', and
S₂* = CS₂C'. The roots λ₁ ≥ λ₂ ≥ ··· ≥ λ_p of
(2)
Moreover, the roots are the only invariants because there exists a nonsingular
matrix C such that
(4)
(5)
( 6)
where L is the diagonal matrix with Ii as the ith diagonal element. The null
hypothesis is rejected if the smaller roots are too small or if the larger roots
are too large, or both.
The null hypothesis is that Al = ... = Ap = 1. Any useful in variant test of
the null hypothesis has a rejection region in the space of II"'" I p that
include~ the points that in some sense are far from II = ... = Ip = 1. The
power of an invariant test depends on the parameters through the roots
AI"'" Ap.
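For readers who want to compute these maximal invariants numerically, the following is a minimal sketch (not from the text); the two covariance matrices are hypothetical, and SciPy's symmetric generalized eigensolver is used to obtain the roots of |S₁ − λS₂| = 0.

```python
# Minimal sketch (hypothetical data): the maximal invariants l_1 >= ... >= l_p
# are the roots of |S_1 - l S_2| = 0 for two sample covariance matrices.
import numpy as np
from scipy.linalg import eigh

S1 = np.array([[2.0, 0.5],
               [0.5, 1.0]])        # hypothetical sample covariance, population 1
S2 = np.array([[1.5, 0.2],
               [0.2, 0.8]])        # hypothetical sample covariance, population 2

# eigh(S1, S2) solves S1 v = l S2 v, which is the root problem |S1 - l S2| = 0.
roots = eigh(S1, S2, eigvals_only=True)[::-1]     # descending order
print(roots)   # invariant tests reject when these roots are far from 1
```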
The criterion (19) of Section 10.2 is (with nS = n l SI + n2S2)
(9)
= II.
r
+ 11g + U(g)
a' a=l .... ,M, g=l, .... q.
where GU(g) = 0 and GU(g)U(8)1 = :I. Vg = 11-(8) - 11-, and II- = (llq)EZ .. 1 II-(g)
O:::~, IV.~::::: 0). The null hypothesis of no effect is VI = ... Vq = O. Let
Degrees of
Source Sum of Squares Freedom
q
Effect H=M E (i(g)-i)(i(g)-i)' q-1
gml
q AI
Error G= E E (x~) - i(g)Xx~g) - i(g», q(M 1)
g"'l a=l
q .\f
Total E E (x~) - i)(x~g) - i)' qM 1
g=l a=I
Invariant tests of the null hypothesis of no effect are b,.sed on the roots of
\H-mGI 0 or of ISh-ISel=O, where Sh=[1/(q O]H and S..,=
[J/q(M -1)]G. The null hypothesis is rejected if one or more of the roots is
too large. The error matrix G has the distribution W(l., q(M - 1». The
effects matrix H has the distribution W(:I, q - 1) when the null hypothesis is
true and has the noncentral Wishart distribution when the null hypothesis is
not true; ils expected value is
q
( 10) IiH = (q - 1)I + ME (11-(11) - II-)(II-(g) - 11-)'
8'=1
q •
= (q - I) l + M E Vg v;.
g=l
where Vg hac:; the distribution N(O, e). Then x~g) has the distribution
N( f.1. I + 0). The null hypothesis of no effect is
(12) 0==0.
In this model G again has the distribution W(:I, q(M - 1». Since j(g) = II- +
Jljl + U(g) has the distribution N(II-, (l/M):I + e), H has the distribution
W(:£ + M@, q - n. The null hypothesis (12) is equivalent to the equality of
(13)
(Note {lj[q(M - l)]}G and (1jq)H maximize the likelihood without regard
to @ being positive definite.) Let It = Ii if II> 1, and let Ii = 1 if Ii'::;; 1.
Then the likelihood ratio criterion for testing the hypothesis 9 = 0 against
the alternative @ positive semidefinite and 0 =1= 0 is
p I*!tl k
(14) kfiqMp Il i 1 = MtqMk Il----'---:--
i=l (17 + M - 1),qM 1-1 (I, + M - 1)
where k is the number of roots of (13) greater than 1. [See Anderson (1946b),
(1984a), (1989a), Morris and Olkin (1964), and Klotz and Putter (1969).]
(1)
The hypothesis is true if and only if all the roots of (1) are equal.† Another
way of putting it is that the arithmetic mean of the roots φ₁, ..., φ_p is equal to
the geometric mean, that is,

(2)  ∏_{i=1}^p φ_i^{1/p} / [ (1/p) Σ_{i=1}^p φ_i ] = |Σ|^{1/p} / (tr Σ/p) = 1.

The lengths squared of the principal axes of the ellipsoids of constant density
are proportional to the roots φ_i (see Chapter 11); the hypothesis specifies
that these are equal, that is, that the ellipsoids are spheres.
The hypothesis H is equivalent to the more general form Ψ = σ²Ψ₀, with
Ψ₀ specified, having observation vectors y₁, ..., y_N from N(ν, Ψ). Let C be
a matrix such that

(3)  CΨ₀C' = I,

and let μ* = Cν, Σ* = CΨC', x_α = Cy_α. Then x₁, ..., x_N are observations
from N(μ*, Σ*), and the hypothesis is transformed into H: Σ* = σ²I.
(4)
where
(5)  A = Σ_{α=1}^N (x_α − x̄)(x_α − x̄)' = (a_ij),

and r_ij = a_ij/√(a_ii a_jj). We use the results of Section 10.2 to obtain λ₂ by
considering the ith component of x_α as the αth observation from the ith
population. (p here is q in Section 10.2; N here is N_g there; pN here is N

†This follows from the fact that Σ = ΘΦΘ', where Φ is a diagonal matrix with the roots as diagonal
elements and Θ is an orthogonal matrix.
there.) Thus

(6)  λ₂ = ∏_{i=1}^p a_ii^{N/2} / (tr A/p)^{pN/2}.
(7)
(8) Is -Ill = 0,
(9)
N
(10) A* = L (x: -i*)(x: -i*)'
a=l
N
=C L (ya-ji)(Ya-ji)'C
a=l
=CBC',
where
N
(11) B= L (Ya-ji)(ya-ji)'·
a=l
( 13)
Mauchly (1940) gave this criterion and its moments under the null
hypothesis.
The maximum likelihood estimator of σ² under the null hypothesis is
tr BΨ₀⁻¹/(pN), which is tr A/(pN) in canonical form; an unbiased estimator
is tr BΨ₀⁻¹/[p(N − 1)] or tr A/[p(N − 1)] in canonical form [Hotelling
(1951)]. Then tr BΨ₀⁻¹/σ² has the χ²-distribution with p(N − 1) degrees of
freedom.
(14)
(IS)
It follows that
( 16)
For p = 2 we have
(17)  𝔼W^h = 4^h [Γ(n)/Γ(n + 2h)] ∏_{i=1}^2 Γ[½(n + 1 − i) + h]/Γ[½(n + 1 − i)]

            = Γ(n)Γ(n − 1 + 2h)/[Γ(n + 2h)Γ(n − 1)] = (n − 1)/(n − 1 + 2h).
This result can also be found from the joint distribution of l₁, l₂, the roots of
(8). The density for p = 3, 4, and 6 has been obtained by Consul (1967b). See
also Pillai and Nagarsenkar (1971).
(19)
(21)  ρ = 1 − (2p² + p + 2)/(6pn).

Then

      Pr{−nρ log W ≤ z} = Pr{χ²_f ≤ z} + ω₂(Pr{χ²_{f+4} ≤ z} − Pr{χ²_f ≤ z}) + O(n⁻³).
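A minimal sketch of this test in code follows (not from the text). It assumes the usual sphericity criterion W = λ^{2/N} = |A|/(tr A/p)^p, takes n as the degrees of freedom of A (here n = N − 1, a convention assumed for the sketch), uses the correction factor (21), and refers −nρ log W to a χ² limit with f = ½p(p + 1) − 1 degrees of freedom; the data are hypothetical.

```python
# Minimal sketch of the sphericity test; data and the n = N - 1 convention are assumptions.
import numpy as np
from scipy.stats import chi2

def sphericity_test(x):
    N, p = x.shape
    n = N - 1
    A = n * np.cov(x, rowvar=False)                 # sums of squares and cross products
    W = np.linalg.det(A) / (np.trace(A) / p) ** p   # W = lambda^(2/N)
    rho = 1.0 - (2 * p**2 + p + 2) / (6.0 * p * n)  # correction factor (21)
    f = p * (p + 1) // 2 - 1
    stat = -n * rho * np.log(W)
    return stat, chi2.sf(stat, f)

rng = np.random.default_rng(0)
print(sphericity_test(rng.normal(size=(50, 4))))
```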
Factors c(n, p, e) have been tabulated in Table B.6 such that
(25)  ½n tr[ (pS/tr S − I)² ] = ½n [ p² tr S²/(tr S)² − p ]
                              = ½n p² Σ_{i=1}^p (l_i − l̄)² / (Σ_{i=1}^p l_i)²,

where l̄ = Σ_{i=1}^p l_i/p. The left-hand side of (25) is based on the loss function
L(Σ, G) of Section 7.8; the right-hand side shows it is proportional to the
square of the coefficient of variation of the characteristic roots of the sample
covariance matrix S. Another criterion is l₁/l_p. Percentage points have been
given by Krishnaiah and Schuurmann (1974).
Given observations y₁, ..., y_N from N(ν, Ψ), we can test Ψ = σ²Ψ₀ for any
specified Ψ₀. From this family of tests we can set up a confidence region for
Ψ. If any matrix is in the confidence region, all multiples of it are. This kind
of confidence region is of interest if all components of y_α are measured in
the same unit, but the investigator wants a region independent of this
common unit. The confidence region of confidence 1 − ε consists of all
matrices Ψ* satisfying
(26)
(28)
( I)
(3)
where
(4 )
a
Sugiura and Nagao (1968) have shown that the likelihood ratio test is biased,
but the modified likelihood ratio test based on
(5)
(6)  −(2/n) log λ₁* = tr S − log|S| − p = L(I, S),
(8)
where
N
(9) B= E (Ya - Y)(Ya - Y)'.
a" I
(to)
which is the likelihood ratio criterion for testing the hypothesis σ² = 1 given
Σ = σ²I. The modified criterion λ₁* is the product of |A|^{n/2}/(tr A/p)^{pn/2} and

(11)  [tr A/(pn)]^{pn/2} e^{−(tr A)/2 + pn/2}.

      𝔼λ₁*^h = (e^{pnh/2}/n^{pnh/2}) ∫ |A|^{nh/2} e^{−(h/2) tr A} w(A | I, n) dA.
Since
(13)
IAI ten +n h-p-l) e-1(tr I. -IA+lr hA)
IAltnh e-!hlrAw(AII,n) = ----;------;------
2wn III tnrp(!n)
2kp'I'Tp[~n(l +h)]
= ------~------~----
11- + hli t(,· t,ddl ~I ~''f"On)
1
2tpnh\II~nhrp[~n(l +h)]
=
11 + hII ~( .. +nh)rpOn)
(14)
(16)
t(II-) )-III(
, (' 1) \ ',I" I I ,)
. tI lI}-- J'
=(1-2u) ( 1-
1(n j+l)(1-2it}
2j - 1 ) - I fII
. ( 1 - --"------
n(1 - 2it)
As n → ∞, φ_j(t) → (1 − 2it)^{−j/2}, which is the characteristic function of χ²_j
(χ² with j degrees of freedom). Thus −2 log λ₁* is asymptotically distributed
as Σ_{j=1}^p χ²_j, which is χ² with Σ_{j=1}^p j = ½p(p + 1) degrees of freedom. The
distribution of λ₁* can be further expanded [Korin (1968), Davis (1971)] as

(19)  Pr{−2ρ log λ₁* ≤ z}
(20) p= 1- -"'-~--'---
Under the null hypothesis this criterion has a limiting χ²-distribution with
½p(p + 1) degrees of freedom.
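A minimal sketch in code (not from the text): after the transformation that reduces the hypothesis Σ = Σ₀ to Σ = I, the statistic (6) multiplied by n is referred to the χ² limit with ½p(p + 1) degrees of freedom stated above. Σ₀, the data, and the n = N − 1 convention for S are assumptions of the sketch.

```python
# Minimal sketch of the test that Sigma equals a given Sigma0 (hypothetical inputs).
import numpy as np
from scipy.stats import chi2

def test_sigma_equals(x, sigma0):
    N, p = x.shape
    n = N - 1
    S = np.cov(x, rowvar=False)
    C = np.linalg.inv(np.linalg.cholesky(sigma0))   # C Sigma0 C' = I
    S_star = C @ S @ C.T                            # S in the canonical (Sigma = I) form
    stat = n * (np.trace(S_star) - np.log(np.linalg.det(S_star)) - p)
    return stat, chi2.sf(stat, p * (p + 1) // 2)

rng = np.random.default_rng(1)
p = 3
print(test_sigma_equals(rng.normal(size=(40, p)), np.eye(p)))
```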
Roy (1957), Section 6.4, proposed a test based on the largest and smallest
characteristic roots 11 and I p: Reject the null hypothesis if
(24)
and e is the significance level. Clemm, Krishnaiah, and Waikar (1973) give
tables of u = 1/1. See also Schuurman and Waikar (1973).
(25)  Pr{ l ≤ a'S*a/(a'a) ≤ u  for all a ≠ 0 } = 1 − ε,

where

(26)  Pr{ l ≤ l_p ≤ l₁ ≤ u } = 1 − ε.
Let a = Cb. Then a'a = b'C'Cb = b'Σb and a'S*a = b'C'S*Cb = b'Sb. Thus
(25) is

(27)  1 − ε = Pr{ l ≤ b'Sb/(b'Σb) ≤ u  for all b ≠ 0 },

that is,

(28)  b'Sb/u ≤ b'Σb ≤ b'Sb/l  for all b ≠ 0

with confidence 1 − ε.
If b has 1 in the ith position and 0's elsewhere, (28) is s_ii/u ≤ σ_ii ≤ s_ii/l. If
b has 1 in the ith position, −1 in the jth position, i ≠ j, and 0's elsewhere,
then (28) is
(29)  (s_ii − 2s_ij + s_jj)/u ≤ σ_ii − 2σ_ij + σ_jj ≤ (s_ii − 2s_ij + s_jj)/l.

(30)  s_ij/l − [(s_ii + s_jj)/2](1/l − 1/u) ≤ σ_ij ≤ s_ij/u + [(s_ii + s_jj)/2](1/l − 1/u),   i ≠ j.
We can obtain simultaneous confidence intervals on all elements of Σ.
From (27) we can obtain

(31)  1 − ε = Pr{ (1/u) b'Sb/(b'b) ≤ b'Σb/(b'b) ≤ (1/l) b'Sb/(b'b)  for all b ≠ 0 }
            ≤ Pr{ l_p/u ≤ λ_p ≤ λ₁ ≤ l₁/l },

where l₁ and l_p are the largest and smallest characteristic roots of S and λ₁
and λ_p are the largest and smallest characteristic roots of Σ. Then
(32)
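The following is a minimal sketch (not from the text) of these simultaneous intervals: given values l and u for which (28) holds with confidence 1 − ε, each variance is bracketed as in the sentence following (28), each covariance as in (30), and the roots of Σ as in (31). The matrix S and the values of l and u are hypothetical placeholders.

```python
# Minimal sketch of the simultaneous intervals (28)-(31); S, l, u are hypothetical.
import numpy as np

def simultaneous_intervals(S, l, u):
    p = S.shape[0]
    var_int = [(S[i, i] / u, S[i, i] / l) for i in range(p)]          # sigma_ii bounds
    c = lambda i, j: 0.5 * (S[i, i] + S[j, j]) * (1 / l - 1 / u)
    cov_int = [(S[i, j] / l - c(i, j), S[i, j] / u + c(i, j))         # sigma_ij bounds (30)
               for i in range(p) for j in range(i + 1, p)]
    roots = np.linalg.eigvalsh(S)[::-1]
    root_int = (roots[-1] / u, roots[0] / l)                          # l_p/u <= lambda_p, lambda_1 <= l_1/l
    return var_int, cov_int, root_int

S = np.array([[2.0, 0.3],
              [0.3, 1.2]])
print(simultaneous_intervals(S, l=0.6, u=1.8))
```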
(2)
(3) x=C(Y-Vo),
where
( 4)
Then x l' ...• X N constitutes a sample from N( f.L, 1), and the hypothesis is
(6) A2 = e- 2"\N-'-
xx.
(7)
(S)
(12)  ρ = 1 − (2p² + 9p − 11)/[6N(p + 3)].

Nagarsenker and Pillai (1974) used the moments to derive exact distributions
and tabulated the 5% and 1% significance points for p = 2(1)6 and N =
4(1)20(2)40(5)100.
Now let us return to the observations y₁, ..., y_N. Then

      Σ_{α=1}^N x_α'x_α = tr A + N x̄'x̄ = tr(BΨ₀⁻¹) + N(ȳ − ν₀)'Ψ₀⁻¹(ȳ − ν₀),

and

(15)

Theorem 10.9.1. Given the p-component observation vectors y₁, ..., y_N from
N(ν, Ψ), the likelihood ratio criterion for testing the hypothesis H: ν = ν₀,
(1)
( 2) g= 1, ... ,q,
where the p x'l{ matrix Cg has density proportional to 11+ C" C~I- in!,
ng = Ng - 1, the 'g-component random vector y(g) has the conditional normal
(4)
(6)
(7)
For invariance we want t = 'Li-I t g' If t I' ... , t q are taken so rg + t g = kNg and
p - 1 < kNg < N g - p, g = 1, ... , q, for some k, then (7) is the likelihood ratiu
test; if rg+tg=kn g and p-l <kng<n g + I-p, g= 1, ... ,q, for some k,
then (7) is the modified test [I.e., (p -l)/min g Ng <k < 1 - p/ming N!).
(8)
The alternative hypothesis has been treated before. For the null hypothesis
let
(9)
(10)
( 1)
= [,t I
N, log I + (i'H -/)1- N logll + (i. -I) I]
= -( ~ Ng[tr(1g0-/)-~tr(igo-/)2 +OANg- 3
)}
g I
By Theorem 3.6.2

(3)  √N_g vec(S_g − I_p) →d N[0, (κ + 1)(I_{p²} + K_{pp}) + κ vec I_p (vec I_p)'],

and n_g S_g = N_g Σ̂_g, g = 1, ..., q, are independent. Let N_g = k_g N, g = 1, ..., q,
Σ_{g=1}^q k_g = 1, and let N → ∞. In terms of this asymptotic theory the limiting
distribution of vec(S₁ − I), ..., vec(S_q − I) is the same as the distribution of
Y^(1), ..., Y^(q) of Section 8.8, with Σ of Section 8.8 replaced by (κ + 1)(I_{p²} +
K_{pp}) + κ vec I_p (vec I_p)'.
V
When I = I, the variance of the limiting distribution of Ng (s~f) - 1) is
3 K + 2; the covariance of the limiting distribution of ...[N; (s}f) - 1) and
..jNg ~S~jli) - 1), i i= j, is the variance of s~y>, i =1= j, is K + 1~ the set {N(sW
K;
q
(4) -210gt\=1 LNg(Yg-Y)'(Yg-Y)
g=1
q
= td L Ng(Yg - Y)(Yg -j)'
g=1
(5)
In these terms
q-l
(7) -210g AI = 1 L W~Wg + Op(N-3)~
g"'l
Theorem 10.11.1. When sampling from (1) and the null hypothesis is true,
The likelihood criteria for testing the null hypothesis (2) of Section 10.3 is
the product Al A2 or VI V2 • Lemma 1004.1 states that under normality VI and
V2 (or equivalently Al and A2) are independent. In the elliptically contoured
case we want to show that log VI and log V2 are asymptotically independent.
Then
(11)  −2 log λ →d (κ + 1) χ²_{(q−1)(p−1)(p+2)/2} + [(κ + 1) + ½pκ] χ²_{q−1} + χ²_{p(q−1)}.
( 13)
The first factor is the criteron for independence of the components of X, and
the second is that the variances of the components are equal. For the first we
set q = P and Pr = 1 in Theorem 9.10, and for the second we set q = p and
p = 1. Thus
In this density (A c' ig), g = 1, ... , q, is a sufficient set of statistics. and the
likelihood ratio criterion is (8) of Section 10.2, the same as for normality
[Anderson and Fang (1990b)1.
(17)

for every nonsingular C. Then the distribution of f(X) where X has the arbitrary
density (15) with Λ₁ = ··· = Λ_q is the same as the distribution of f(X) where X
has the normal density (15).
PROBLEMS
10.1. (Sec. 10.2) Sums of squares and cross-products of deviations from the means
of four measurements are given below (from Table 3.4). The populations are
Iris versicolor (1), Iris setosa (2), and Iris virginica (3); each sample consists of 50
observations:

       ( 13.0552   4.1740    8.9620   2.7332 )
A₁ =   (  4.1740   4.8250    4.0500   2.0190 )
       (  8.9620   4.0500   10.8200   3.5820 )
       (  2.7332   2.0190    3.5820   1.9162 ),

       (  6.0082   4.8616    0.8014   0.5062 )
A₂ =   (  4.8616   7.0408    0.5732   0.4556 )
       (  0.8014   0.5732    1.4778   0.2974 )
       (  0.5062   0.4556    0.2974   0.5442 ),

       ( 19.8128   4.5944   14.8612   2.4056 )
A₃ =   (  4.5944   5.0962    3.4976   2.3338 )
       ( 14.8612   3.4976   14.9248   2.3924 )
       (  2.4056   2.3338    2.3924   3.6962 ).
4'y(g) = 0,
Define
q
Z(g) = "c
J..., gil
ylh)
' g= l, ... ,q.
Show that
(b) Let x~g), 0: 1, ... , N, be a random sample from N(p,(g),I g ), g= l, ... ,q.
Use the result from (a) to construct a test of the hypothesis
H:I 1 = ..• =I q ,
based un a test of independence of Z(q) and the set Z(I), .•. , Z(q -I). Find
the exact distribution of the criterion for the case p = 2.
10.3. (Sec. 10.2) Unbiasedness of the modified likelihood ratio test of Show u? uf.
,
that (14) is unbiased. [Hint: Let
, G = n 1 F In2, r = lul, and c 1 u?
< c2 be the
sOlutions to Gi"' (1 + G r '[(fI, 1<112) = k, the critical value for the modified
likelihood ratio criterion. Then
Show that the derivative of the above with respect to r is positive for 0 <' r < 1,
o for r 1, and negative for r> 1.1
10.4. (Sec. 10.2) Prove that the limiting distribution of (19) is χ²_f, where f =
½p(p + 1)(q − 1). [Hint: Let Σ = I. Show that the limiting distribution of
(19) is the limiting distribution of

10.5. (Sec. 10.4) Prove (15) by integration of Wishart densities. [Hint: 𝔼V₁^h =
𝔼 ∏_{g=1}^q |A_g|^{½hn_g} |A|^{−½hn} can be written as the integral of a constant times
|A|^{−½hn} ∏_{g=1}^q w(A_g | I, n_g + hn_g). Integration over Σ_g A_g = A gives a constant
times w(A | I, n).]

10.6. (Sec. 10.4) Prove (16) by integration of Wishart and normal densities. [Hint:
Σ_{g=1}^q N_g(x̄^(g) − x̄)(x̄^(g) − x̄)' is distributed as Σ_{f=1}^{q−1} Y_f Y_f'. Use the hint of Prob-
lem 10.5.]
10.7. (Sec. 10.6) Let x\"l, ... , x~) be observations from N(p.(~), I~), ~,= 1,2, and
let A.= Dx~~) _i(·l)(x~,") -x(~))'.
(b) Let dr, di, ... , d~ be the roots of I I I - AI21 = 0, and let
d1 0 0
0 d2 0
D=
0 0 dp
2
Show that T is distributed as IBt!'IBzl/IB I +B 2 1 , where BI is dis-
tributed according to W(D 2 , N - 1) and B2 is distributed according to
2
W(J, N - O. Show that T is distributed as IDC1DI'1 C:!,I /IDC1D + C2 1 ,
whcre C, i:-; di:-;trihuted according to W(J, N - 1).
Pr{VI 5, v} = la(nl - 1, n2 - 1)
10.11. (Sec. 10.7) Let x₁, ..., x_N be a sample from N(μ, Σ). What is the likelihood
ratio criterion for testing the hypothesis μ = kμ₀, Σ = k²Σ₀, where μ₀ and Σ₀
are specified and k is unspecified?

In each case k² is unspecified, but Σ₀ is specified. Find the likelihood ratio
criterion λ₂ for testing H₂. Give the asymptotic distribution of −2 log λ₂
under H₂. Obtain the exact distribution of a suitable monotonic function of λ₂
under H₂.

10.14. (Sec. 10.7) Find the likelihood ratio criterion λ for testing H of Problem
10.13 (given x₁, ..., x_N). What is the asymptotic distribution of −2 log λ under
H?

10.15. (Sec. 10.7) Show that λ = λ₁λ₂, where λ is defined in Problem 10.14, λ₂ is
defined in Problem 10.13, and λ₁ is the likelihood ratio criterion for H₁ in
Problem 10.13. Are λ₁ and λ₂ independently distributed under H? Prove your
answer.
10.16. (Sec. 10.7) Verify that tr B 'Ito" I has the x1Hdil,tribution with p( N - 1) de-
grees of freedom.
10.17. lSec. 10.7,1) Admissibility of sphericity leSl. Prov~ that the likelihood ratio test
of sphericity is admissible. [Him: Under the null hypothesis let I = U/(1 +
1]~)]I. and let 11 have the density (1 + 11~)- !"p(11 2 ) r t.J
nile. Also, IE~-",x,xil has the distribution of Xr1X?_! ... X?-p+1 if XI""'X,
are independently distributed according to MO, n.1
where C if; P X r. [Hint: ce' has the di~tribution WCA-l,r) if C has a density
proportional to e - ~tr C.4('.J
10.20. (Sec. 10.10.1) Using Problem 10.18, complete the proof of Theorem 10.10.1.
CHAPTER 11
Principal Components
11.1. INTRODUCTION
Suppose the random vector X of p components has the covariance matrix 'I.
Since we shall be interested only in variances and covariances in this chapler,
we shall assume that the mean vector is O. Moreover, in developing the ideas
and algebra here, the actual distribution of X is irrelevant except for the
covariance matrix; however, if X is normally distributed, more meaning can
be given to the principal components.
In the following treatment we shall not u~.e the usual theory of characteris-
tic roots and vectors; as a matter of fact, that theory will be derived implicitly.
The treatment will include the cases where 'I is singular (i.e., positive
semidefinite) and where 'I has multiple roots.
Let 11 be a p-component column vector such that P'1l = 1. The variance of
11' X is
(1)
~by Theorem AA.3 of the Appendix). Since (3'I (3 and (3'(3 have deI'ivatives
everywhere in a region containing (3'(3 = 1, a vectOr (3 maximizing (3' I (3
must satisfy the expression (3) set equal to 0; that is
( 4) (I - A/)(3 = O.
(5) II-·All=O.
(7)
( 8)
where A and "I are Lagrange multiplers. The vector of partial derivatives is
(9) BcP2
BI3 = 2In _ 2An - 2" In(l)
P PIP,
and we set this equal to O. From (9) we obtain by multiplying on the left by
13(1)'
°
by (7). Therefore, "I = and 13 must satisfy (4), and therefore A must satisfy
(5). Let A(2) be the maximum of AI"'" Ap such that there is a vector 13
~atisfying (l: - A(2)])13 = 0, 13' 13 = 1, and (7); call this vector 13(2) and the
corresponding linear combination U2 = 13(2) X. (It will be shown eventually
I
i=l,oo.,r.
We want to maximize
r
(12) cPr + I = WI 13 - A{13'13 -1) - 2 ~ "II3'II3(I),
i= I
where A and "1"'" "r are Lagrange multipliers. The vector of partial
derivatives is
(13)
and we set this equal to O. Multiplying (13) on the left by 13(j)·, we obtain
(14)
o o
o
(15) A=
o o
The equations I ~(r) = A(r)~(r) can be written in matrix form as
(17) p'P=I.
From (16) and (17) we obtain
(18) P'IP=A.
Theorem 11.2.1. Let the p-component random vector X have ,eX = 0 and
,eXX' = 1. Then there exists an orthogonal linear transformation
(20) u=p'X
such that the covariance matrix of U is ,e UU' = 'l and
o
o
(21) A=
where Al 2. A2 2. ... 2. Ap 2. 0 are the roots of (5). The rth column of 13, ~('\
satisfies (I - A, J)~(r) = O. The rth component of U, U, = ~(,), X, has maximum
variance of all normalized linear combinations unc01Telated with VI" .. , V, -I'
Proof. Let GX = 0 and GXX' = I. Then GV= 0 and cCW' = CIC'. The
generalized variance of V is
(24)
(25)
(27)
or
(28)
Multiplication by I gives
This equation is the same as (4) and the same algebra can be developed.
Thus the vectors ~(l\ ... , ~(p) give the principal axis of the ellipsoid. The
transformation u = 13' X is a rotation of the coordinate axes so that the new
axes are in the direction of the principal axes of the ellipsoid. In the new
coordinates the ellipsoid is
(30)
(31)
Let 11'Y = a; that is, 'Y 'Y = 'Y' 11 X = a' X. Then (31) results from maximizing
$(a' X)2 = a I I a relative to a' 11-2 a. This last quadratic form is a weighted
sum of squares, the weights being the diagonal elements of 11-2.
It might be noted that if 11-2 is taken to be the matrix
(Til 0 0
0 (T22 0
(32) 11-2 =
0 0 (Tpp
( 1) II -kIl =0
(2)
(3)
Proof. When the roots of II - AI! = 0 are different, each vector Il(i) is
uniquely defined except that 1l~/) can be replaced by -Il(i). If we require that
the first nonzero component of Il(i) be positive, then Il(l) is uniquely defined,
and lL, A, P is a single-valued function of lL, I. By Corollary 3.2.1, the set of
maximum likelihood estimates of lL, A, P is the same function of p" i. This
function is defined by (1), (2), and (3) with the corresponding restriction that
the first nonzero component of b(i) must be positive. [It can be shown that if
*
III 0, the probability is 1 that the roots of (1) are different, because the
conditions on i for the roots to have multiplicities higher than 1 determine a
region in the space of i of dimensionality less than W(p + 0; see Okamoto
(1973).] From (1S) of Section 11.2 we see that
(4)
(5)
Replacing b(l) by -b(i) clearly does not Change I:kjb(l)b(i),. Since the
likelihood function depends only on i (see Section 3.2), the maximum of the
likelihood function is attained by taking any set of solutions of (2) and (3) .
•
It is possible to assume explicitly arbitrary multiplicities of roots of I. If
these multiplicities are not all unity, the maximum likelihood estimates are
not defined as in Theorem 11.3.1. [See Anderson (1963a).] As an example
suppose that we aSsume that the equation II - All = 0 has one root of
mUltiplicity p. Let this root be AI' Then by Corollary 11.2.1, ~ - All is of
rank 0; that is, I - All = 0 or I = All. If X is distributed according to
N(lL, I) = NllL, AlI), the components of X are independently distributed
with variance AI' Thus the maximum likelihood esdmator of At is
(6)
and i = All, and P can be any orthogonal matrix. It might be pointed out
that in Section 10.7 we considered a test of the hypothesis that I = All (with
Al unspecified), that is, the hypothesis is that I has one characteristic root of
multiplicity p.
In most applications of principal component analysis it can be assumed
that the roots of I are different. It might also be pointed out that in some
uses of this method the algebra is applied to the matrix of correlation
There are several ways of computing the characteristic roots and characteris-
tic vectors (principal components) of a matrix ~ or i. We shall indicate
some of them.
One method for small p involves expanding the determinantal equation

(1)  0 = |Σ − λI|

and solving the resulting pth-degree equation in λ (e.g., by Newton's method
or the secant method) for the roots λ₁ > λ₂ > ··· > λ_p. Then Σ − λ_i I is of
rank p − 1, and a solution of (Σ − λ_i I)β^(i) = 0 can be obtained by taking β_j^(i)
as the cofactor of the element in the first (or any other fixed) column and jth
row of Σ − λ_i I.
The second method iterates using the equation for a characteristic root
and the corresponding characteristic vector
The rate of convergence depends on the ratio A2/ Al ; the closer this ratio is
to 1, the slower the convergence.
To find the second root and vector define
(5)
Then
(6)
if i ≠ 1, and

(7)

Thus λ₂ is the largest root of Σ₂ and β^(2) is the corresponding vector. The
iteration process is now applied to Σ₂ to find λ₂ and β^(2). Defining Σ₃ = Σ₂
− λ₂β^(2)β^(2)', we can find λ₃ and β^(3), and so forth.
There are several ways in which the labor of the iteration procedure may
be reduced. One is to raise Σ to a power before proceeding with the
iteration. Thus one can use Σ², defining

This procedure will give twice as rapid convergence as the use of (3). Using
Σ⁴ = Σ²Σ² will lead to convergence four times as rapid, and so on. It should
be noted that since Σ² is symmetric, there are only p(p + 1)/2 elements to
be found.
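A minimal sketch (not from the text) of the iteration just described: repeated multiplication by Σ yields the largest root and its vector, and deflation Σ₂ = Σ − λ₁β^(1)β^(1)' yields the next root. The matrix below is hypothetical.

```python
# Minimal sketch of power iteration with deflation for a symmetric (covariance) matrix.
import numpy as np

def power_iteration(sigma, tol=1e-10, max_iter=1000):
    b = np.ones(sigma.shape[0])
    b /= np.linalg.norm(b)
    for _ in range(max_iter):
        z = sigma @ b
        lam = np.linalg.norm(z)          # for a PSD matrix this converges to the top root
        z /= lam
        if np.linalg.norm(z - b) < tol:
            break
        b = z
    return lam, z

sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.2],
                  [0.5, 0.2, 1.0]])      # hypothetical Sigma
l1, b1 = power_iteration(sigma)
sigma2 = sigma - l1 * np.outer(b1, b1)   # deflation, as in the text
l2, b2 = power_iteration(sigma2)
print(l1, l2)
```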
Efficient computation, however, uses other methods. One method is the
QR or QL algorithm. Let Io = I. Define recursively the orthogonal Q/ and
lower triangular L, by I, = Q,L, and I i + 1 = LiQi (= Q~I/Q/), i = 1,2, ....
(The Gram-Schmidt orthogonalization is a way of finding Q/ and Lj; the QR
method replaces a lower triangular matrix L by an upper trhmgular matrix
R.) If the characte ristic roots of I a re distinct, lim i ... ooI /+ 1 = A* , w he re A*
is the diagonal matrix with the roots usually ordered in ascending order. The
characteristic vectors are the columns of lim, _ cc Q: Q: -1 .,. Q'1 (which is com·
puted recursively).
A more efficient algorithm (for the symmetric Σ) uses a sequence of
Householder transformations to carry Σ to tridiagonal form. A Householder
matrix is H = I − 2aa', where a'a = 1. Such a matrix is orthogonal and
symmetric. A Householder transformation of the symmetric matrix Σ is
HΣH. It is symmetric and has the same characteristic roots as Σ; its
characteristic vectors are H times those of Σ.
A tridiagonal matrix is one with all entries 0 except on the main diagonal,
the first superdiagonal, and the first subdiagonal. A sequence of p − 2
Householder transformations carries the symmetric Σ to tridiagonal form.
(The first one inserts 0's into the last p − 2 entries of the first column and
row of H₁ΣH₁, etc. See Problem 11.13.)
The QL method is applied to the tridiagonal form. At the ith step let the
tridiagonal matrix be TJi); let ~.(I) be a block-diagonal matrix (Givens matrix)
I 0 0 0
0 cos OJ' - sin Of 0
(9) p.(i) = ,
)
0 sin Of cos OJ 0
0 0 0 I
where cos OJ is the jth and j + 1st diagonal element; and let ~.(i) = p~~}~.~\,
j = 1, ... ,p - 1. Here OJ' is chosen so that the element in position j, j + 1 in ~.
is O. Then pO) = pfl)p~l) ... P~~l is orthogonal and P(i)TJI) = R(i) is lower
triangular. Then TJi+l) =R(i)P(i), (=P(i)TJI)P(i)') is symmetric and tridiago-
nal. It converges to A* (if the roots are all different). For more details see
Chapters 11/2 and 11/3 of Wilkinson and Reinsch (1971), Chapt~r 5 of
Wilkinson (1965), and Chapters ), 7, and 8 of Golub and Van Loan (1989).
A sequence of one-sided Householder transformation (H I) can carry I to
R (upper triangular), thus effecting the QR decomposition.
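A minimal sketch (not from the text) of the Householder reduction described above: each step applies H = I − 2aa' (a'a = 1) to zero out the entries below the first subdiagonal of a column. In practice a library routine such as numpy.linalg.eigh carries out this reduction and the subsequent iteration internally; the matrix below is hypothetical.

```python
# Minimal sketch of Householder reduction of a symmetric matrix to tridiagonal form.
import numpy as np

def householder_tridiagonalize(sigma):
    T = np.array(sigma, dtype=float)
    p = T.shape[0]
    for k in range(p - 2):
        x = T[k + 1:, k]
        v = x.copy()
        v[0] += np.sign(x[0] if x[0] != 0 else 1.0) * np.linalg.norm(x)
        if np.linalg.norm(v) < 1e-15:
            continue
        a = v / np.linalg.norm(v)              # unit vector defining H = I - 2aa'
        H = np.eye(p)
        H[k + 1:, k + 1:] -= 2.0 * np.outer(a, a)
        T = H @ T @ H                          # H is orthogonal and symmetric
    return T

sigma = np.array([[4.0, 1.0, 0.5, 0.2],
                  [1.0, 3.0, 0.2, 0.1],
                  [0.5, 0.2, 2.0, 0.3],
                  [0.2, 0.1, 0.3, 1.0]])
T = householder_tridiagonalize(sigma)
print(np.round(T, 6))                                    # tridiagonal to rounding error
print(np.linalg.eigvalsh(sigma), np.linalg.eigvalsh(T))  # same characteristic roots
```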
11.5. AN EXAMPLE
and an estimate of Σ is

(2)  S = (1/49)A = ( 0.266433   0.085184   0.182899   0.055780 )
                   ( 0.085184   0.098469   0.082653   0.041204 )
                   ( 0.182899   0.082653   0.220816   0.073102 )
                   ( 0.055780   0.041204   0.073102   0.039106 ).
(3)  b^(1) = ( 0.6867244 )
             ( 0.3053463 )
             ( 0.6236628 )
             ( 0.2149837 ).

This vector agrees with the normalized seventh iterate to about one unit in
the sixth place. It should be pointed out that l₁ and b^(1) have to be calculated
more accurately than l₂ and b^(2), and so forth. The trace of S is 0.624824,
which is the sum of the roots. Thus l₁ is more than three times the sum of
the other roots.
We next compute
and iterate z^(i) = S₂ z^(i−1), using z^(0) = (0, 1, 0, 0)'. (In the actual computation
S₂ was multiplied by 10 and the first row and column were multiplied by −1.)
In this case the iteration does not proceed as rapidly; as will be seen, the
ratio of l₂ to l₃ is approximately 1.32. On the last iteration, the ratios agree
to within four units in the fifth significant figure. We obtain l₂ = 0.0723828
and

(5)  b^(2) = ( −0.669033 )
             (  0.567484 )
             (  0.343309 )
             (  0.335307 ).
The sum of the four roots is Σ_i l_i = 0.6249, compared with the trace of the
sample covariance matrix, tr S = 0.624824. The first root accounts for 78% of the
total variance in the four measurements; the last accounts for a little more
than 1%. In fact, the variance of 0.7x₁ + 0.3x₂ + 0.6x₃ + 0.2x₄ (an approxi-
mation to the first principal component) is 0.478, which is almost 77% of the
total variance. If one is interested in studying the variations in conditions that
lead to variations of (x₁, x₂, x₃, x₄), one can look for variations in conditions
that lead to variations of 0.7x₁ + 0.3x₂ + 0.6x₃ + 0.2x₄. It is not very impor-
tant if the other variations in (x₁, x₂, x₃, x₄) are neglected in exploratory
investigations.
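The quantities in this example can be recomputed directly from the matrix S in (2); the following minimal sketch uses NumPy's eigenvalue routine rather than the iteration of Section 11.4, so small discrepancies in the last digits (and possibly the sign of a vector) are to be expected.

```python
# Minimal sketch: principal components of the sample covariance matrix S of this example.
import numpy as np

S = np.array([[0.266433, 0.085184, 0.182899, 0.055780],
              [0.085184, 0.098469, 0.082653, 0.041204],
              [0.182899, 0.082653, 0.220816, 0.073102],
              [0.055780, 0.041204, 0.073102, 0.039106]])

roots, vectors = np.linalg.eigh(S)
order = np.argsort(roots)[::-1]
roots, vectors = roots[order], vectors[:, order]
print(roots)                 # the roots sum to tr S = 0.624824
print(roots / roots.sum())   # proportions of total variance (about 0.78 for l_1)
print(vectors[:, 0])         # first principal component, approx. (0.69, 0.31, 0.62, 0.21), up to sign
```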
the limiting distribution N(O,2A;>. The covariances of g(1), ••• , g{P) in the
limiting distribution are
(I)
(2)
(3)  √n (l_i − λ_i)/(√2 λ_i)

(4)  Pr{ −z(ε) ≤ √n (l_i − λ_i)/(√2 λ_i) ≤ z(ε) },

where the value of the N(0, 1) distribution beyond z(ε) is ½ε. The interval
(4) can be inverted to give a confidence interval for λ_i with confidence 1 − ε:
( 6)
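The inversion can be sketched in code as follows (not from the text): with √(n/2)(l_i − λ_i)/λ_i approximately N(0, 1), the two-sided interval of level 1 − ε is l_i/(1 + √(2/n) z) ≤ λ_i ≤ l_i/(1 − √(2/n) z), valid when √(2/n) z < 1. The numerical inputs below (l₁ ≈ 0.49, n = 49, roughly the example of Section 11.5) are illustrative only.

```python
# Minimal sketch of the large-sample confidence interval for a characteristic root.
import numpy as np
from scipy.stats import norm

def root_confidence_interval(l_i, n, eps=0.05):
    z = norm.ppf(1 - eps / 2)        # point exceeded with probability eps/2
    half = np.sqrt(2.0 / n) * z
    return l_i / (1 + half), l_i / (1 - half)

print(root_confidence_interval(l_i=0.487, n=49))
```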
where 11 I is the p X P diagonal matrix with a as the ith diagonal element and
J A;Aj j(A 1 - Aj) as the jth diagonal element, j -+ i; 11~ is the (p - 1) X (p 1)
diagonal matrix obtained from 11, by deleting the ith row and column; and
Pi is the p X (p 1) matrix formed by deleting the lth column from 13. Then
1
h(l) = 111- 13i '{ii (b(i) - p(l» has a limiting normal distribution with mean 0
(7)
end
(8)
because I3A- II3' =l:-I, 1313' =1, and I3AI3' =l:. Then (8) is
This approach also provides a test of the null hypothesis that the ith
characteristic vector is a specified 13~) (13~)'13~) = 1). The hypothesis is
rejected if the right-hand side of (11) with 13(1) replaced by 131:) exceeds
X;- I(e).
Mallows (961) suggested a test of whether some characteristic vector of
'I is 130' Let Po be p X (p - 1) matrix such that 130130 = O. if the null
hypothesis is true, WoX and P;,X are independent (because Po is.l nonsingu-
lar transform of the set of other characteristic vectors). The test is based on
the multiple correlation between 13~)X and p;,X. In principle, the test
procedure can be inverted to obtain a confidence region. The usefulness of
these procedures is limited by the fact that the hypothesized vector is not
attached to a characteristic root; the interpretation depends on the root (e.g.,
largest versus smallest).
Tyler (981), C1983b) has generalized the confidence regicm (] 1) to includ~
the vectors in a linear subspace. He has also ~tudied casing the restrictions of
a normally distributed parent population.
( 12)
Then
(13)
b'Sb b'Sb}
~ Pr { min - - s; Ap ' Al S; max -1-
b'b=1 u b'b"" I
(14)
where H- I = (h ij ) and ch/H) and chl(H) are the minimum and maximum
characteristic roots of H, respectively.
(16) 1= 1. .. ., p.
Since ch/H) l/ch\(H- 1 ) and chl(H) = l/ch p (H- 1 ), the lemma follows.
•
The argument for Theorem 5.2.2 shows that 1/(A p h PP ) is distributed a3
x 2 with n - p + 1 degrees of freedom, and Theorem 4.3.3 shows that h PP is
independent of h n . Let I' and u' be two numbers such that
Then
(18) 1
[ [
(19) ....l!.-<..\<\<...l
u' - I\p - 1\1 - [' ,
(20)
(21 )
The homogeneity condition means that the confidence bounds are multiplied
by c~ if the observed vectors are multiplied by c (which is a kind of scale
invariance). The monotonicity conditions imply that an increase in the size of
S results in an increase in the limits for I (which is a kind of consistency).
The confidence bounds given in (31) of Section 10.8 for the roots of I
based on the distribution of the roots of S when I = I are greater.
(1 )
where,), is specified, against the alternative that the sum is less than ')'. If the
characteristic roots of I are different, it follows from Theorem 13.5.1 that
(2)

has a limiting normal distribution with mean 0 and variance 2Σ_{i=m+1}^p λ_i². The
variance can be consistently estimated by 2Σ_{i=m+1}^p l_i². Then a rejection region
with (large-sample) significance level ε is

(3)

where z(2ε) is the upper significance point of the standard normal distribu-
tion for significance level ε. The (large-sample) probability of rejection is ε if
equality holds in (1) and is less than ε if inequality holds.
The investigator may alternatively want an upper confidence interval for
Σ_{i=m+1}^p λ_i with at least approximate confidence level 1 − ε. It is

(4)  Σ_{i=m+1}^p λ_i ≤ Σ_{i=m+1}^p l_i + √(2 Σ_{i=m+1}^p l_i²) z(2ε)/√n.

If the right-hand side is sufficiently small (in particular less than γ), the
investigator has confidence that the sum of the variances of the smallest
p − m principal components is so small they can be neglected. Anderson
(1963a) gave this analysis also in the case that λ_{m+1} = ··· = λ_p.
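A minimal sketch of the bound (4) in code follows (not from the text); z(2ε) is taken as the one-sided upper ε point of the standard normal, and the sample roots and n are hypothetical values only loosely patterned on the Section 11.5 example.

```python
# Minimal sketch of the approximate upper confidence bound (4) for the sum of the
# p - m smallest characteristic roots of Sigma.
import numpy as np
from scipy.stats import norm

def upper_bound_smallest_roots(sample_roots, m, n, eps=0.05):
    tail = np.sort(np.asarray(sample_roots))[::-1][m:]   # l_{m+1}, ..., l_p
    z = norm.ppf(1 - eps)                                 # z(2*eps): upper eps point
    return tail.sum() + np.sqrt(2.0 * np.sum(tail**2) / n) * z

roots = np.array([0.487, 0.072, 0.054, 0.012])            # hypothetical sample roots
print(upper_bound_smallest_roots(roots, m=1, n=49))
```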
where δ is specified, against the alternative that f(λ) < δ. We use the fact
that

(6)  ∂f(λ)/∂λ_i = (λ_{m+1} + ··· + λ_p)/(λ₁ + ··· + λ_p)²,      i = 1, ..., m,

     ∂f(λ)/∂λ_i = −(λ₁ + ··· + λ_m)/(λ₁ + ··· + λ_p)²,      i = m + 1, ..., p.

(8)

(9)  ∏_{i=m+1}^p l_i / [ Σ_{i=m+1}^p l_i/(p − m) ]^{p−m}.
11.7 CHARACTERISTIC ROOTS OF A COVARIANCE MATRIX 481
It is also the likelihood ratio criterion, but we shall not derive it. [See
Anderson (1963a).] Let √n (l_i − λ_{m+1}) = d_i, i = m + 1, ..., p. The logarithm
of (9) multiplied by −n is asymptotically equivalent under the null hypothesis
to

(10)  −n Σ_{i=m+1}^p log l_i + n(p − m) log[ Σ_{i=m+1}^p l_i/(p − m) ]

      = −n Σ_{i=m+1}^p log(λ_{m+1} + d_i/√n) + n(p − m) log[ λ_{m+1} + Σ_{i=m+1}^p d_i/((p − m)√n) ]

      = n{ −Σ_{i=m+1}^p log[1 + d_i/(λ_{m+1}√n)] + (p − m) log[1 + Σ_{i=m+1}^p d_i/((p − m)λ_{m+1}√n)] }

      = [1/(2λ²_{m+1})] [ Σ_{i=m+1}^p d_i² − (Σ_{i=m+1}^p d_i)²/(p − m) ] + o_p(1).
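The criterion (9) can be sketched in code as follows (not from the text). Referring −n times its logarithm to a χ² limit with q(q + 1)/2 − 1 degrees of freedom corresponds to the normal case κ = 0 of the limiting result quoted in Section 11.8; the inputs below are hypothetical.

```python
# Minimal sketch of the test that the q = p - m smallest roots of Sigma are equal.
import numpy as np
from scipy.stats import chi2

def test_equal_smallest_roots(sample_roots, m, n):
    tail = np.sort(np.asarray(sample_roots))[::-1][m:]
    q = tail.size
    crit = np.prod(tail) / (tail.mean() ** q)       # criterion (9)
    stat = -n * np.log(crit)
    df = q * (q + 1) // 2 - 1
    return stat, chi2.sf(stat, df)

print(test_equal_smallest_roots(np.array([0.487, 0.072, 0.054, 0.012]), m=1, n=49))
```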
(1)  𝔼(X − ν)(X − ν)' = (𝔼R²/p)Ψ = Σ.

The maximum likelihood estimators of the principal components of Σ are
the characteristic roots and vectors of Σ̂ given by (20) of
Section 3.6. Alternative estimators are the characteristic roots and vectors of
S, the unbiased estimator of Σ. The asymptotic normal distributions of these
estimators are derived in Section 13.7 (Theorem 13.7.1). Let Σ = βΛβ' and
S = BLB', where Λ and L are diagonal and β and B are orthogonal. Let
D = √N(L − Λ) and G = √N(B − β). Then the limiting distribution of D
and G is normal with G and D independent.
The variance of d_i is (2 + 3κ)λ_i², and the covariance of d_i and d_j (i ≠ j) is
κλ_iλ_j. The covariance of g_i is

(2)
(3)
For inference about a single ordered root A/ the limiting standard normal
distribution of IN ([i - Ai) /( /2(2 + 3 K) 1/) can be used.
For inference about a single vector the right-hand side of (11) in Section
11.6.2 can be used with S replaced by (1 + K)S and S-I by S 1/(1 + K).
It is shown in Section 13.7.1 that the limiting distribution of the logarithm
of the likelihood ratio criterion for testing the equality of the q = p - m
smallest roots is the distribution of (1 + IdXq2(Q 1)/2-1'
PROBLEMS
(Vii)
1/1i and
( 1/1i 1
-1/1i'
11.2. (Sec. 11.2) Verify tha t the proof of Theorem 11.2.1 yields a proof of Theorem
A.2.1 of the Appendix for any real symmetric matrix.
11.3. (Sec. 11.2) Let z = y +x, where $y $x 0, $yy' = «fl., $xx' = (J'21, Iyx'
= O. The p components of y can be called systematic parts, and the compo-
nents of x errors.
(a) Find the linear combination 'Y' z of unit variance that has minimum error
variance (i.e., 'Y' x has minimum variance).
(b) Suppose <Pff + (J' 2 = 1, 1= 1, ... , p. Find the linear function 'Y' Z of unit
variance that maximizes the sum of squares of the correlations between II
and 'Y'z, i = 1,. .. , p.
(c) Relate these results to principal components.
11.4. (Sec. 11.2) Let l: = 4> + (J' 21, where 4> is positive semidefinite of rank m.
Prove that each characteristic vector of 4> is a vector of l: and each root of l:
is a root of 4> plus (J' 2.
where E = (1, ... , 1)'. Show that for p > 0, the largest characteristic root is
(J' 2[1 + (p -l)p] and the corresponding characteristic vector is E. Show that if
E' x = 0, then x is a characteristic vector corresponding to the root (J' 2(1 - p).
Show that the root (J' 2(1 - p) has multiplicity p - 1.
11.7. (Sec. 11.3) In the example of Section 9.6, consider the three pressing opera-
tions (X2' X4' xs)' Find the first principal component of this estimated covari-
ance matrix. [Hint: Start with the vector (1,1,1) and iterate.)
11.8. (Sec. 11.3) Prove directly the sample analog of Theorem 11.2.1, where LX,.. =
0, LXax~=A.
U.9. (Sec. 11.3) Let II and Ip be the largest and smallest characteristic roots of S,
respectively. PrOve $11 ~ AI and $Ip::;; Ap'
11.10. (Sec. 11.3) Let UI = ~(lll X be the first population principal component with
variance r(ul ) = A., and let VI =h(l)lX be the first sample principal compo-
nent with (sample) variance II (based on S). Let S* be the covariance matrix
of a second (independent) sample. Show G h(l), S* h O ) :s;; AI'
11.11. (Sec. 1 L3) Suppose that O"ij> 0 for every i, j [I = (0"1;)]' Show that (a) the
coefficients of the first principal component are all of the same sign. and
(b) the coefficients of ea( h other principal component cannot be all of the
same sign.
. (1 )' =E
11m
'-ox
TA
1\1
11 •
where Ell has 1 in the upper left-hand position and O's elsewhere.
(d) Show lim, JtiAD2 = I I(P(l It XliII f!.
-0
11.15. (Sec. 11.6) Prove that lI' < II if I' = 1 and p> 2.
11.16. (Sec. 11.6) Prove that u < u* if 1 = 1* and p> 2. where rand u" are the 1
and u of Section 10.8.4.
11.17. The lengths, widths, and heights (in millimeters) of 24 male painted turtles
[Jolicoeur and Mosimann (1960)] are given below. Find the (sample) principal
components and their variances.

Case                                Case
No.    Length  Width  Height       No.    Length  Width  Height
 1       93     74      37         13      116     90      43
 2       94     78      35         14      117     90      41
 3       96     80      35         15      117     91      41
 4      101     84      39         16      119     93      41
 5      102     85      38         17      120     89      40
 6      103     81      37         18      120     93      44
 7      104     83      39         19      121     95      42
 8      106     83      39         20      125     93      45
 9      107     82      38         21      127     96      45
10      112     89      40         22      128     95      46
11      113     88      40         23      131     95      46
12      114     86      40         24      135    106      47
CHAPTER 12
Canonical Correlations
and Canonical Variables
12.1. INTRODUCTION
In this section we consider two sets of variates with a joint distribution, and
we analyze the correlations between the variables of one set and those of the
other set. We find a neW coordinate system in the space of each set of
variates in such a way that the new coordinates display unambiguously the
system of correlation. More precisely, we find linear combinations of vari-
ables in the sets that have maximum correlation; these linear combinations
are the first coordinates in the new systems. Then a second linear combina-
tion in each set is sought such that the correlation between these is the
maximum of correlations between such linear combinations as are uncorre-
lated with the first linear combinations. The procedure is continued until the
two new coordinate systems are completely specified.
The statistical methOd outlined is of particular usefulness in exploratory
studies. The investigator may have two large sets of variates and may want
to study the interrelations. H the two sets are very large, he may want
to consider only a few linear combinations of each set. Then he will want to
study those l.near combinations most highly correlated. For example, one set
of variables may be measurements of physical characteristics, such as various
lengths and breadths of skulls; the other variables may be measurements of
mental characteristics, such as scores on intelligence tests. H the investigator
is interested in relating these, he may find that the interrelation is almost
487
( 1) X=
X(I)
( X(2)
1
•
For convenience we shall assume PIS [.2' The covariance matrix is parti-
tioned similarly into P I and P2 rows and columns,
(2)
(J)
( 4)
We note that ffU = ,sa' XII) = a' ff XII) = 0 and similarly ff V = O. Then the
correlation between U and V is
(5)
Thus the algebraic problem is to find a and 'Y to maximize (5) subject to (3)
and (4).
Let
where A and J.-L are Lagrange multipliers. We differentiate '" with respect to
the elements of a and 'Y. The vectors of derivatives set equal to zero are
(7)
(8)
Multiplication of (7) on the left by a' and (8) On the left by 'Y' gives
(10)
Since Of'I II a = 1 and 'Y 'I 22 '), = 1, this shows that A = J.-L = a 'I I~ 'Y. Thus (7)
(13) (
In order that there be a nontrivial solution [which is necessary for a solution
satisfying (3) and (4)], the matrix on the left must be singular; that is,
(14)
'I
in the first PI columns does not contain A. Since I is positive definite.
III ., 122\ :;. 0 (Corollary A.1.3 of the Appendix). This shows that (14) is a
polynomial equation of degree P and has P roots, say AI ~ A2 ~ ... ~ Ap. [a'
and 1" complex conjugate in (9) and (10) prove A real.]
From (9) we see that A = a'I l2 y is the correlation between U = a'X(I)
and V = 1" Xl Z) when a and l' satisfy (13) for some vulue of A. Since we
want the maximum correlation, we take A = AI' Let a solution to (13) for
A = A, be 0'('>' y(ll, and let Ul = 0'0)1 X(I) and V l = 1'(1), X(2) Then U I and VI
are normalized linear combinations of X(l) and X(2). respectively, with maxi-
mum correlation.
We now consider finding a second linear combination of X(l), say c: =
a X (I) , and a second linear combination of X(2), say V = 1" X(2), such that of
I
tion of X(2), V = 1" X (2), that among all linear combinations uncorrelated with
U I ' VI"'" Ur • v,., have maximum correlation. The condition that U be uncor-
related with U, is
(15)
Then
(16)
(17)
(18)
We now maximize cC' Ur +I v,+ I, choosing a and "'I to satisfy (3), (4), (15),
and (17) for i = 1,2, ... , r. Consider
, ,
+ "t... va'I
I II aU) + "t... 0."'1
I 'I 22 'Y U),
i= I i= 1
°
where A, /-L, VI> ••• , Vr' 1, ••• , Or are Lagrange multipliers. The vectors of
partial derivatives of I/Ir+ I with respect to the elemellts of a and "'I are set
equal to zero, giving
(20)
(21)
Multiplication of (20) on the left by a(j)' and (21) on the left by 'Y(j), gives
(22)
(23)
Thus (20) and (21) are simply (11) and (12) or alternatively (13). We therefore
take the largest Ai' say, A(r+ I), such that there is a solution to (13) satisfying
(1), (4), (15), and (17) for i = 1, ... , r. Let this solution be a(r+ 1\ 'Y(r+ 1\ and
let Ur+1 = a(r+ I), X(l) and V.r =+ 'Y(r+ I), x (2)
l·
This procedure is continued step by step as long as successive solutions
can be found which satisfy the conditions, namely, (13) for some AI' (3), (4),
(15), and (17). Let m be the number of steps for which this can be done. Now
we shall show that m=PI' (s;.P2)' Let A=(a(l) '" a(m)), 1'1 =(')'(1) ...
,),(m)), and
o
(24)
o
o o
The conditions (3) and (15) can be summarized as
(25) A'IllA = I.
(26)
(29)
(30)
Since E'I II E is nonsingular, comparison of (27) and (30) shows that c = va,
(31)
(32)
(33) A'IIIA=I,
(34) A'Il2fl =A,
(35) f;I22fl = I.
(36) f;I22fl = 0,
(37) f~I22f2 = I.
Any f2 can be multiplied on the right by an arbitrary (P2 - PI) X (P2 - PI)
orthogonal matrix. This matrix can be formed one column at a time: 'Y(P' + Il
is a vector orthogonal to I22fl and normalized so 'Y(p,+I)'I 22 'Y(P,+I\ = 1;
",/PI+2) is a vector orthogonal to I 22 (f, 'Y(PI+ 1») and normalized so
'f (PI +2) '1 22 'Y (PI +2) = 1; and so forth. Let f = (f I f 2); this square matrix is
nonsingular since f' I 22 f = I. Consider the determinant
A'
(38) o
o
-AI A 0
= A - AI 0
o 0 -AI
= (-A)P2-PII- I
A
-~\
=(-A)P2- P' I-AII·I-AI-A(-AI)-I A I
-AlII Il2
(39)
I:lI - AI22
Thus the roots of (14) are the roots of (38) set equal to zero, namely,
A = ± AI'), i,:: 1, ... ,Pl' and A = 0 (of multiplicity P2 - PI)' Thus (AI'"'' Ap)
= (AI"'" Apt' 0, ... ,0, - ApI"'" - Al ), The set {AU)2}, i = 1, ... , Pl' is the set
{An, i = 1, ... , Pl' To show that the set {A(I)}, i = 1, ... , PI' is the set {A,,},
i = 1, ... , Pl, we only need to show that A(I) is nonnegative (and therefore is
one of the A" i = 1, ... , PI)' We observe that
thus, if A(r), a(r" 'Y(r) is a solution, so is - A(r), - a(r>, 'Y(r). If A(r) were
negative, then - A(r) would be nonnegative and - A(r) ~ A(r). But since A(r)
was to be maximum, we must have A(r) ~ - A(r) and therefore A(r) ;;:: O. Since
the set {A(I)} is the same as {A,}, i = 1, ... , PI' we must have AU) = A,.
Let
VI
(42) U= =A'X(I),
VPt
Vj
(44) V(2) =
V
p
,.,
:
1 r;x(2).
=
VP2
The components of U are one set of canonical variates, and the components
$(~(+U' ~, )(~"
A'
(45) V(l)' V (2) , ) = 0 ~I2W 0
r ~2)
V(2) 0 r ~21
2
122 0 l
[PI A 0
= A [PI 0
0 0 [
PZ-PI
where
AI a a
a A2 a
(46) A=
a a ApI
with respect to b is fOl bi = c i , since Ectb, is the cosine of the angle between
the vector b and (c1 .... ,c,JI'0, ... ,0). Then (47) is
(50)
or
(51)
(52)
'fy(51)&lor 1\\2_\2
an d ex (I) , ••• , ex (PI) salIs \2 respectIve
- 1\1'" ., I\PI' "1ar
' Iy. Th e SImI
equations for y(l), ... ,y(P2) occur when >t.2 = Ar .... , A;2 are substituted with
(54)
-AC I I. II Cj C I I. 12 C; CI 0 -AI. II I. 12 C'I 0
0= C'z ,
C 2I. 21 Ci -AC2I. 22 C Z 0 C2 I.2! -AI. 22 0
and hence the roots are unchanged. Conven;dy, let f( Y.. ll , ~ 12, Y.. ~2) be a
vector-valued function of I such that f(CIIIIC~,CIII~C~.C:;I~:!C;)=
f(I", II2' 1: 22 ) for all non~inglliar C, and C~. If C~ = A and C:! = ('I, then
(54) is (~8), which depends only on the canonical correlations. Then f=
fCI, (1\,0), I). •
and the relative mean squared effect can be measured by the ratio
tf(bV)2/lffU:! = pZ. Thus maximum effect of a linear combination of X(:!) on
a linear combination of XU) is made by "'1(1)1 Xc::!) on «(1)1 XO),
In the special case of p, = 1, the one canonical correlation is the multiple
correlation between XU) = XI and X (::!).
The definition of canonical variates and correlations was made in terms of
the covariance matrix I = Iff(X - Iff XXX - Iff X)'. We could extend this
treatment by starting with a normally distributed vector Y with p + P.l
components and define X as the vector having the conditional distribution of
the first p components of Y given the value of the last P.; components. This
\\ould n1~al1 treating X", wi' h mean S Xrp = ey~); the elements of the
covariance matrix would be the partial covariances of the first P elements
of Y.
The interpretation of canonical variates may be facilitated by considering
the correlations between the canonical variates and the components of the
original vectors [e.g., Darlington, Weinberg, and Wahlberg (1973)]. The
covariance between the jth canonical variate U_j and X_i is

(57)  𝔼U_j X_i = 𝔼 Σ_{k=1}^{p₁} α_k^(j) X_k X_i = Σ_{k=1}^{p₁} σ_ik α_k^(j).
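A minimal sketch of (57) in code follows (not from the text): the covariances of U_j with the components of X^(1) are Σ₁₁α^(j), and dividing by the standard deviations of the X's gives the correlations used to interpret the variate (recall that α^(j)'Σ₁₁α^(j) = 1). The matrix and coefficient vector below are hypothetical.

```python
# Minimal sketch: covariances and correlations between one canonical variate and X^(1).
import numpy as np

sigma11 = np.array([[1.00, 0.73],
                    [0.73, 1.00]])          # hypothetical Sigma_11
alpha_j = np.array([0.55, 0.52])            # hypothetical coefficient vector alpha^(j)

cov_with_x = sigma11 @ alpha_j              # cov(U_j, X_i), i = 1, ..., p1, as in (57)
corr_with_x = cov_with_x / np.sqrt(np.diag(sigma11))   # var(U_j) = 1
print(cov_with_x, corr_with_x)
```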
(nO) I 12 = A( A 0)r I ,
12.3.1. Estimation
Let x I'.'. X" be N observations from N(f.L, I). Let xa be partitioned into
I
(2)
(3)
(4)
t 2 satisfies
(6)
(7)
Theorem 12.3.1. Let XI' ••• , X N be N observations from N(JL, I). Let I be
partitioned into PI and P2 (PI :5P2) rows and columns as in (2) in Section 12.2,
and let X" be similarly partitioned as in (1). The maximum likelihood estimators
of the canonical correlations are the roots of (3), where iif are defined by (2).
Th:: maximum likelihood estimators of the coefficients of tile jth canonical
compf)nents satisfy (4) and (5), j = 1, ... , PI; the remaining components satisfy
(6) and (7).
(10)
We shall call the linear combinations a(j) x(l) and C(j)1 X(2) the sample
" a
canonical variates.
We can also derive the sample canonical variates from the sample correla-
tion matrix,
21
Let
/s-:; o o
o J S22 o
( 12)
o o
VSPI+I,PI+I 0 o
(13) S2 =
0 Is PI +~ ~
~.PI +~
o
0 0
angle between these two vectors is the correlation between u a = 0:' X~I) and
va = 'Y 'X~2), a = 1, ... , N .. Finding 0: and 'Y to maximize the correlation is
equivalent to finding the veetors in the PI-space and the P2-space such that
the angle between them is least (Le., has the greatest cosine). This gives the
first canonical variates, and the first canonical correlation is the cosine of the
angle. Similarly, the second canonical variates correspond to vectors orthogo~
nal to the first canonical variates and with the angle minimized.
12.3.2. Computation
We shall discuss briefly computation in terms of the population quantities.
Equations (50), (50, or (52) of Section 12.2 can be used. The computation of
II2I221 I21 can be accomplished by solving I21 = I22F for IZ21 I21 and
then multiplying by I12' If PI is sufficiently small, the determinant
II l2 I;;/I 21 - vIlli can be expanded into a polynomial in v, and the
polynomial equation may be solved for v. The solutions are then inserted
iuto (50 to arrive at the vectors 0:.
In many cases P I is too large for this procedure to be efficient. Then one
can use an iterative procedure
(17)
U8 )
The A2(i + 1) converges to At and aU + 1) converges to a(l) (if AI> A2).
This can be demonstrated in a fashion similar to that used for principal
components, using
(19) ""
~ll-I "" "" - 1 ""
~12~22 -
~21- A A 2A- 1
PI
(20) 1"1/1121221121 - A~a(I)«(I), = E a(()A;ii(i)'
;=2
o o o
o A~ o
=A
o 0
The maximum characteristic root of this matrix is A~. If we now use this
matrix for iteration, we will obtain A~ and a(2). The procedure is continued
to find as many A; and a(i) as desired.
Given λ_i and α^(i), we find γ^(i) from Σ₂₁α^(i) = λ_i Σ₂₂γ^(i), or equivalently
(1/λ_i)Σ₂₂⁻¹Σ₂₁α^(i) = γ^(i). A check on the computations is provided by com-
paring Σ₁₂γ^(i) and λ_i Σ₁₁α^(i).
For the sample we perform these calculations with Σ̂_ij or S_ij substituted
for Σ_ij. It is often convenient to use R_ij in the computation (because
−1 < r_ij < 1) to obtain S₁a^(j) and S₂c^(j); from these a^(j) and c^(j) can be
computed.
Modern computational procedures are available for canonical correlations
and variates similar to those sketched for principal components. Let

(21)  Z₁ = (x₁^(1) − x̄^(1), ..., x_N^(1) − x̄^(1)),

(22)
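A minimal sketch (hypothetical data, not from the text) of the computation described in this subsection: the squared sample canonical correlations are the roots of |S₁₂S₂₂⁻¹S₂₁ − νS₁₁| = 0, the coefficient vectors a are normalized so that a'S₁₁a = 1, and the second set of coefficients is recovered from c^(j) = (1/r_j)S₂₂⁻¹S₂₁a^(j).

```python
# Minimal sketch of sample canonical correlations and variates (hypothetical data).
import numpy as np
from scipy.linalg import eigh

def sample_canonical(x1, x2):
    p1 = x1.shape[1]
    S = np.cov(np.hstack([x1, x2]), rowvar=False)
    S11, S12 = S[:p1, :p1], S[:p1, p1:]
    S21, S22 = S[p1:, :p1], S[p1:, p1:]
    M = S12 @ np.linalg.solve(S22, S21)          # S_12 S_22^{-1} S_21
    nu, A = eigh(M, S11)                          # generalized eigenproblem; a' S_11 a = 1
    order = np.argsort(nu)[::-1]
    r = np.sqrt(np.clip(nu[order], 0, None))      # sample canonical correlations
    A = A[:, order]
    C = np.linalg.solve(S22, S21 @ A) / r         # c^(j) = (1/r_j) S_22^{-1} S_21 a^(j)
    return r, A, C

rng = np.random.default_rng(2)
z = rng.normal(size=(100, 1))
x1 = z + 0.5 * rng.normal(size=(100, 2))
x2 = z + 0.5 * rng.normal(size=(100, 3))
print(sample_canonical(x1, x2)[0])
```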
An A 1z
A
A' 0 Au A 1z "
A 0
AZI An 0
A
I A 0
A I 0
PI
= |I − Λ²| = ∏_{i=1}^{p₁} (1 − r_i²),

where r₁ = l₁ ≥ ··· ≥ r_{p₁} = l_{p₁} ≥ 0 are the p₁ possibly nonzero sample canoni-
cal correlations. Under the null hypothesis, the limiting distribution of
Bartlett's modification of −2 times the logarithm of the likelihood ratio
criterion, namely,

(2)  −[N − ½(p + 3)] Σ_{i=1}^{p₁} log(1 − r_i²),
(3)  N Σ_{i=1}^{p₁} r_i² = N tr A₁₁⁻¹A₁₂A₂₂⁻¹A₂₁.

(4)  −[N − ½(p + 3)] Σ_{i=k+1}^{p₁} log(1 − r_i²)

Yet another procedure (which can only be carried out for small p₁) is to test
ρ_{p₁} = 0, then ρ_{p₁−1} = 0, and so forth. In this procedure one would use r_i² to
test the hypothesis ρ_i = 0. The relevant asymptotic distribution will be
discussed in Section 12.4.2.
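A minimal sketch of the large-sample test based on (4) follows (not from the text): to test that only the first k population canonical correlations are nonzero, −[N − ½(p + 3)] Σ_{i>k} log(1 − r_i²) is referred to a χ² distribution; the degrees of freedom (p₁ − k)(p₂ − k) are quoted here from standard usage, not from the excerpt above, and the inputs are hypothetical.

```python
# Minimal sketch of Bartlett's sequential test for the number of nonzero canonical correlations.
import numpy as np
from scipy.stats import chi2

def bartlett_test(r, k, N, p1, p2):
    r = np.sort(np.asarray(r))[::-1]
    stat = -(N - (p1 + p2 + 3) / 2.0) * np.sum(np.log(1 - r[k:]**2))
    df = (p1 - k) * (p2 - k)         # assumed degrees of freedom (standard usage)
    return stat, chi2.sf(stat, df)

print(bartlett_test([0.789, 0.054], k=1, N=25, p1=2, p2=2))
```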
r2
Z, = IN I( i=l. .... k.
(5) 2p, 1
.,
z(=Nr,~, i=k+l,····PI·
Then in the limiting distribution ZI" .• ,21. Hnd the set :::, t I ' " ' ' ':"1 arc
mutually independent, z, haf\ the limiting distribution N(O. I). 1.... , k.
and the density of the limiting distribution of zk+ I I " " zp, if\
(6)
.n PI
;=k+l
zt(P~-PI I) n PI
i.J""t.+l
(2, ZJ)'
I <I
12.5. AN EXAMPLE
be the head length of the first son in the αth family, x_{2α} be the head breadth
of the first son, x_{3α} be the head length of the second son, and x_{4α} be the
head breadth of the second son. We shall investigate the relations between
the measurements for the first son and for the second. Thus x_α^(1)' = (x_{1α}, x_{2α})

(2)

All of the correlations are about 0.7 except for the correlation between the
two measurements on second sons. In particular, R₁₂ is nearly of rank one,
and hence the second canonical correlation will be near zero.
We compute

(4)  R₁₂R₂₂⁻¹R₂₁ = ( 0.544311   0.538841 )
                   ( 0.538841   0.534950 ).

(6)  S₁a^(1) = ( 0.552166 ),     S₁a^(2) = (  1.366501 ),
               ( 0.521548 )                ( −1.378467 )

where

(7)
(9)  ( √s₃₃    0   ) = ( 10.0402     0    )
     (   0    √s₄₄ )   (    0      6.7099 ).

(10)  (1/l₁) R₁₁⁻¹R₁₂(S₂c^(1)) = ( 0.552157 ),     (1/l₂) R₁₁⁻¹R₁₂(S₂c^(2)) = (  1.365151 ).
                                 ( 0.521560 )                                  ( −1.376741 )
The first vector in (10) corresponds closely to the first vector in (6); in fact, it
is a slight improvement, for the computation is equivalent to an iteration on
S₁a^(1). The second vector in (10) does not correspond as closely to the second
vector in (6). One reason is that l₂ is correct to only four or five significant
figures (as is 1/l₂), and thus the components of S₂c^(2) can be correct to
only as many significant figures; secondly, the fact that S₂c^(2) corresponds
to the smaller root means that the iteration decreases the accuracy instead of
increasing it. Our final results are

l₁ = 0.789,      l₂ = 0.054,
The larger of the two canoni:al correlations, 0.789, is larger than any of
the individual correlations of a variable of the first set with a variable of the
other. The second canonical correlation is very near zero. This means that to
study the relation between two head dimensions of first sons and second sons
we can confine our attention to the first canonical variates; the second
canonical variates are correlated only slightly. The first canonical variate in
each set is approximately proportional to the sum of the two measurements
divided by their respective standard deviations; the second canonical variate
in each set is approximately proportional to the difference of the two
standardized measurements.
(l)
(2)
Since we consider a set of random vectors X~ I), ••• , X;'P with expected values
depending on X~2\ ..• , x~) (nonstochastic), we em write the conditional
expected value of X,~I) as T + P(X~2) - X(2)), where T = JL{I) + p(X(2) - JL(2))
can be considered as a parameter vector. This is the model of Section 8.2
with a slight change of notation.
The model of this section is
where X~2\ .•• , x~) are a set of nonstochastic vectors (q X 1) and .f(2) =
N-IL~"'lX~). The covariance matrix is
(4) rS(X(L)
q, - rSX(I))
q, (x(L)
q, - rSX(I))'
<b = 'V •
Consider a linear combination of X~L), say Uq, = a 'X~I). Tht;n U.p has
variance a ''Va and expected value
(5)
12.6 LINEARLY RELATED EXPECTED VALUES 509
The mean expectrd value is (1/N)L~=1 @U4>= a'T, and the mean sum of
squares due to X(2) is
N N
(6) ~ E (SU4>-a'T)2= ~ E a'p(x;)_i(2))(x~2)-i(2))'p'a
</;>=1 </;>= I
= a'pS22p'a.
We can ask for the linear combination that maximizes the mean sum of
squares relative to its variance; that is, the linear combination of dependent
variables on which the independent variables have greatest effect. We want
to maximize (6) subject to a' '\(I a = L That leads to the vector equation
(7)
for K satisfying
(8)
Multiplication of (7) on the left by a' shows that a 'pS 22 p' a = K for a and
K satisfying a' '\(I a = 1 and (7); to obtain the maximum we take the largest
root of (8), say KI' Denote this vector by a(l), and the corresponding random
variable by UI 4> = a(l), X~I). The expected value of this first canonical vari~
able is SUI4>=a(l)'[p(x~)-i(2))+T]. Let a(l)'P=k;r(l)', where k is
determined so
(9) 1=
N' ( ;r(l)'x~) - ~ E
~ E N )2
;r(l)'x~~)
</;>~I 1)=1
= kE ;r(l),(x~) _i(2))(X~2)
N
</;>= 1
-i(2))';r(l)
Then k = {K;. Let UI 4> = ;r(l),(x~) - i(2)). Then SUI 4> = {K; v~1) + a(1)'T.
Next let us obtain a linear combination U4> = a' X~I) that has maximum
effect sum of squares among all linear combinations with variance 1 and
uncorrelated with UI 4>' that is, 0 = S(U4> - SU4>XU 1cb - SU I 4»' = a ''\(Ia(l).
As in Section 12.2, we can set up this maximization problem with Lagrange
multipliers and find that a satisfies (7) for some K satisfying (8) and
a' '\(I a = L The process is continued in a manner similar to that in Section
12.2. We summarize the results.
The jth canonical random variable is ~cb = aU)'Xil), where aU) satisfies
(7) for K = K) and a(f)''\(Ia U ) = 1; Kl ~ K2 ~ ... ~ Kp , are the roots of (8).
510 CANONICAL CORREIATIONS AND CANONICAL VARIABLES
We shall assume the rank of f3 is PI :S Pz' (Then Kpl > 0,) 01/> has the largest
effect sum of squares of linear combinations that have unit variance and are
uncorrelated with U'cb"'" ~-l. cb' Let 'Y U ) = (1/ ji;)f3'a(J), Vj = a(J)/T , and
V)cb = 'Y(J)'(xCj;) - i(~)), Then
(10)
(11 )
(12) i =1= j,
(13)
(IS) n1 1/>='
EN ( 1',,, - 1
N
N)(
E l"1 l'. - 1
N
N v" 1'= /,
E
~=l ~=,
The random canonical variates are uncorrelatcd and have variance I, The
expected value of each random canonical variate is a multiple of the corre
sponding nonstochastic canonical variate plus a constant. The nonstochastic
canonical variates have sample variance 1 and are uncorrelated in the
sample,
If PI > P2' the maximum rank of f3 is P2 and Kp2+1 = ... = Kpl = O. In that
case we define A,=(a(I)" .. ,a(P2)) and A 2 =(a(P2+1) .. "a(PI)), where
a 0 '. ... ,a(pz) (corresponding to positive K'S) are defined as before and
a' p:-'-ll, ... , a(pd are any vectors satisfying aU)' WaUl = 1 and a(j)'Wa U ) ==
to _il) +
O, l. =1= j,. Th en (()J:OU(')
I/> -- u,v,t, v" l' -- 1, .•• , P2' an d (()J:OU("
I/> -- v,. 1• -- P2 +
1,,,,,PI'
In either case if the rank of f3 is r s; min(PI' p), there are r roots of (8)
that are nonLero and hence GUJI) = {jIV~) + v, for i = 1,,,,, r.
12.6 LINEARLY RELATED EXPECfED VALUES Sl1
12.6.2. Estimation
Let X~I), •.• , x~) be a set of observations on xf'\ ... , X~) with the probability
structure developed in Section 12.6.1, and let X~2\ ... , x~) be the set of
corresponding independent variates. Then we can estimate T, 13, and '\(I by
N
(16) T= ~ E x~) = x(l),
<f.>=1
( 18) W =! E [x~)
N
<f.>=l
- j(1) - 13 (x~) - j(2))][ X~l) - j(l) - 13 (x~) - j(2)) r
= !(All-AI2A221A21)=Sll-SI2S22IS21'
where the A's and S's are defined as before. (It is convenient to divide by
n = N - 1 lnstead of by N; the latter would yield maximum likelihood
estimators.)
The sample analogs of (7) and (8) are
Thc l'Oots k I ~ ." ~ kp, of (20) e~timate the roots KI ~ .,. ~ Kf'1 of (8), and
the corresponding solutions a(l), ... , ii(pd of (19), normalized by a(I)'Wa(i) = 1,
estimate a(I), ••• , a(P'). Then c<J) = (1/ ,[k;)P'a(j) estimates 'Y<J1, and nj =
a(j)'j(l) estimates vf' The sample canonical variates are a(j)'x~) and c(j)'(x~)
- j(2)), j = 1, ... , PI' cP = 1, ... , N. If PI> P2' then PI - P2 more a(j)'s can
be defined satisfying a(j)'Wa(J) = 1 and a(j)'Wa(l) = 0, i:fo j.
(21)
512 CANONICAL CORRELATIONS AND CANONICAL VARIABLES
(22)
we see that 11 = k,/O + k j ) and k, = 1;/0 -I;), i = 1, ... , PI' The vector a(!)
in Section 12.3 satisfies
(24)
Note that this is the same criterion as for the case of both vectors stochastic
(Section 12.4). Then
PI
(26) -[N-Hp+3)] L:
/=k+1
10g(1- l n
12.6 LINEARLY RELATED EXPECt'ED VALUES 513
m I
(31) G= E E (YaJ - Ya )(YaJ - Ya)' = nW
a= I J = I
(32) ~=(a-(r+l)
u )~~.,
a-(Pil)' ,
514 CANONICAL CORRELATIONS AND CANONICAL VARIABLES
the rank of t3 is specified to be k (:5 PI)' the vectors xF), ... , x~) are
nonstochastic, and Za is normally distributed. On the basis of a sample
XI"'" x N , define i by (2) of Section 12.3 and A, A, and t by (3), (4), and
(5). Partition A = diag( A I' A2)' A= (A I' A2 ), and r
"2 _1.
= (t p t 2), where AI'
AI' and r l have k columns. Let <1>1 = AI(Ik - AI) '.
'" A A. '"
( 2)
The maximum likelihood estimator of t3 of rank k is the same for X(l) and
X'z) normally distributed because the density of X = (X( 1)', X(2)'), factors as
(2)
We require B to be nonsingular. This model wa~ initiated by Haavelmo
(1944) and was developed by Koopmans, Marschak, Hurwicz, Anderson,
Rubin, Leipnik, et ai., 1944-1954, at the Cowles Commission for Research in
Economics. Each component equation represents the behavior of some group
(such as consumers or producers) and has economic meaning.
The set of structural equations (1) can be solved for y/ (because B is
nonsingular )~
(3) y/ = Uz/ + v"
where
(4) 0=-B-1f,
with
(5)
516 CANONICAL CORRELATIONS AND CANONICAL VARIABLES
say. The equation (3) is called the reduced form of the model. It is a
multivariate regression model. In principle, it is observable.
f)=[~1 o
(6) (B "I'
~l,
where the vectors ~, 0, "I, and 0 have G 1 , G 2, Kh and K2 components,
respectively. The reduced form is partitioned wnformally into G 1 and G 2
sets of rOws and KI and K2 sets of columns:
(7)
(9)
This implies
(11 )
( 12)
(14)
T
P= EYlz~ Ez,z~
(T )-1 ,
(=1 (= 1
1 T
E (Y, -
A
where
T
(17) A= E Z,Z~
1""1
and vec{d l1 •• " d m ) = (d~,. ,., d'm)'. If, furthermore, the I', are normal, then
P is normal and Tn has the Wishart distribution with covariance matrix n
and T - K degrees of freedom.
518 CANON ICAL CORRELATIONS AND CANONICAL VARIABLES
(18)
(19)
( ~O)
(21 )
(22)
(23)
( 24)
12.8 SIMULTANEOUS EQUATIONS MODELS 519
P'2 A n ·1 Pi2
T
Error E (y~') - PIlZ~') - P'2Z~2))(y}') - PllzP) - P'2Z~2))'
1=1
T
Total E y,(l)y}I),
1=1
The first term in the table is the (vector) sum of squares of y~l) due to the
effect of zP). The second term is due to the effect of Z~2) beyond the effect of
zP). The two add to (PAP')w which is the total effect of Z,' the predeter-
mined variables.
We propose to find the vector ~ such that effect of Z~2) and ~' y~l) beyond
the effect of z)l) is minimized relative to the error sum of squares of ~'YI(l).
We minimize
(25)
~'( PI2S22"P;2)~ = (~' P'2 )S22"(~' P'2 ),
~'nll~ ~'nll~
where Tn = r.;""y,y,- P/:.'P'. This estimator has been called the least vari-
ance ratio estimator. Under normality and based only on the 0 restrictions on
the coefficients of this single equation, the estimator is maximum likelihood
and is known as the limited-information maximum likelihood (LIML) estimator
[Anderson and Rubin (1949)].
The algebra of minimizing (25) is to find the smallest root, say v, of
(26)
(27)
The first component equation in (27) has been dropped because it is linearly
dependent on the other equations [because v is a root of (26)].
where
(33)
and
( 35)
(36) (a,j)~t,
(37) '\}I'~!l,
We can write the model for the linear functional relationship with dummy
12.8 SIMULTANEOUS EQUATIONS MODELS 521
variables. Define
o
(38) Sal = 1 - ath position, a=i •... ,Il-i.
(39)
Then
The correspondence is
(42) ,,
1 ~Z(I) S Ct} ~ ..
.. (~)
,.
(46)
( 47)
n
(48) H=k E (xa-x)(ia-x)' ~P:!A22IP;,
a"'I
522 CANONICALCORRELATJONS AND CANONICAL VARIABLES
n k T
(49) G= L L (xa)-ia)(xa)-ia)'~TO= L(Yt-PZI)(YI-PZ I )'.
cr=1 )=1 1=1
We shall find the limiting distribution of ';1\~* - (3*) defined by (28) and
(31) by showing that ~* is asymptotically equivalent to
( 50) a* -
PTSLS - ~
(p*S p*,)-lp*S
12 22·[ 12
'
22 22·1 P 12 •
This derivation is essentially the same as that given in Anderson and Rubin
(1950) except for notation. The estimator defined by (50), known as the two
stage least squares (TSLS) estimator, is an approximation to the LIML
estimator obtained by dropping the terms vOrl and vW(l) from (31). Let
~* = ~tlML' We assume the conditions for tt(P - IT) having a limiting
normal distribution. (See Theorem 8.11.1.)
(51)
Since
(52)
The import of Lemma 12.8.1 is that the difference between the LIML
estimator and the TSLS estima.tor is OpC1/T). We have
Consider
T T
' +P*'fl*=P' fl=A- 1 "Z(2.I)y(I)fl=A- 1 "Z(2'1)U
(54) P 12 12.... 12.... 22.1 ~ 1 I.... 22.1 ~ 1 II'
,""1
where Z(H)
1
= Z(2)
1
- A 2. A -I
II z(I)
1 •
Thus $(p'12 + p*' fl*) = $ P'12 ....
12 .... fl = ° and
Note that ~~LS - (3* = - (Pi2 S22.1 p~)-I Pi'zS22'1 P{2(3 and ((3', O)Y, +
(y',O)z, = u ll •
p
Proof The theorem follows from (55), S22'1 ~ S~2'1' and P I2 ~ ll12'
•
Because of the correspondence between the LIML estimator and the
maximum likelihood estimator for the linear functional relationship as out-
lined in Section 12.7.5, this asymptotic theory Can be tr<1nslated for the latter.
524 CANONICALCORRELA nONS AND CANONICAL VARIABLES
(57) a = 1, ... , n,
where
(58) a = 1, ... , n.
(59)
Although Anderson and Rubin (1950) showed that Vn.~1 and vW(l) could
be dropped from (31) defining ~tlML and hence that ~tsLS was asymptoti-
cally equivalent to ~tsLS' they did not explicitly propose ~tsLS' [As part of
the Cowles Commission program, Chernoff and Divinsky (1953) developed a
computational program of ~LlML'] The TSLS estimator was proposed by
Basmann (J 957) and Theil (J 96 I), It corresponds in the linear functional
relationship setup to ordinary least squares on the first coordinate. If some
other coefficient of ~ were set equal to one, the minimization would be in
the direction of that coordinate.
Con~ider thc general linear functional relationship when the error covari-
ance matrix is unknown and there are replications. Constrain B to be
(60)
Partition
( 61)
~64)
PROBLEMS
12.1. (Sec. 12.2) Let Za = Z!a = 1, a = 1, ... ,n, and P = 13. Verify that a(l) = I-II3.
Relate this result to the discriminant function (Chapter 6).
12.2. (Sec. 12.2) Prove that the roots of (14) are real.
where yO>, y(2), Z are independent with mean zero and covariance matrices 1
with appropriate dimensionalities. Let A=(al, ... ,ak ), B=(bl" .. ,bk ). and
suppose that A'A, B' B are diagonal with positive diagonal elements. Show
that the canonical variables for nonzero canonical correlations are propor-
tional to a~X(1),b~X(2). Obtain the canonical correlation coefficients and ap-
propriate normalizing coefficients for the canonical variables.
12.5. (Sec. 12.2) Let Al ~ A2 2: ... ~ Aq > 0 be the positive roots of (14), where III
and I22 are q X q nonsingular matrices.
12.7. (Sec. 12.3) Find the canonical correlations and canonical variates between
the first two variables and the last three in Problem 4.42.
12.8. (Sec. 12.3) Prove directly the sample analog of Theorem 12.2.1.
12.9. (Sec. 12.3) Prove that Ar(f + 1) -? Ai and a(f + 1) -? a(l) if a(O) is such that
a'(O}Illa(l) =1= O. [Him: U~O IIIII'2IitI2! = AA2A- I.]
12.11. Let Al 2: A2 ~ ... :2: Aq be the roots of I I I - AI21 = 0, where I I and I2 are
q X q positive definite covariance matrices.
12.12. (Sec. 12.4) For q = ~ express the criterion (2) of Section 9.5 in terms of
canonical correlations.
12.13. Find the canonical correlations for the data in Problem 9.11.
CHAPTER 13
13.1. INTRODUCTION
(1)
If the hypothesis is true, the roots have the distribution given in Theorem
13.2.2 or 13.2.3. Thus the significance level of any invariant test of the
general linear hypothesis can be obtained from the distribution derived in the
next section. If the test criterion is one of the ordered roots (e.g., the largest
root), then the desired distribution is a marginal distribution of the joint
distribution of roots.
The limiting distributions of the roots are obtained under fairly general
conditions. These are needed to obtain other limiting distributions, such as
the distribution of the criterion for testing that the smallest variances of
528
13.2 THE CASE OF TWO WISHART MATRICES 529
principal components are equaL Some limiting distributions are obtained for
elliptically contoured distributions.
(4) IA -IBI = o.
The corresponding vectors satisfying
(5) (A-lB)x=O
satisfy
(6) o C- J ( A - IB) x
= C- I ( CA*C' -ICB*C')x
= ( A * - IB *) C I X.
(7) IA - f( A + B) I = 0
and the vectors y satisfying
( 8) [A - f(A +B)]y = O.
( 10)
Thus the roots of (4) are related to the roots of (7) by 1= f /(1 - f) or
f= I/(i + I), and. the vectors satisfying (5) are equal (or proportional) to
those satisfying (8).
We now consider finding the distribution of the roots and vectors satisfy-
ing (7) and (8). Let the roots be ordered fi > f2 > ... > fp > 0 since the
probability of two roots being equal is 0 [Okamoto (1973)]. Let
(ll) F=
(12) y'(A+B)y= I
( U)
because y;AYj = f~y;(A + B)Yj and y;AYj = f,y;(A + B)Yj' and this can be only
*
if (13) holds (f, fj)'
Let the p x p matrix Y be
P4)
13.2 THE CASE OF TWO WISHART MATRICES 531
(20) O(A,B) I
I a(E,F) .
(21) O(A,G)I=IO(A,B)1
Io(E,F) a(E,F)'
532 THE DISTRIBU1l0NSOF CHARACfERISTICROOTS AND VECTORS
(22)
where dx a and dYf3 are only formally differentials (i.e., we write these as a
mnemonic device). If fa(YI"'" Yn) is a polynomial, then afal aY/3 is the
coefficient of y3 in the expansion of fa(YI + y~ , ... ,Yn + Y:) [in fact the
coefficient in the expansion of fa(YI"'" Yp-l' Y/3 + y3, YP+ l' ... , Yn)]'
The elements of A and G are polynomials in E and F. Thus the cerivative of
an element of A is the coefficient of an element of E* and F* in the
expansion of (E + E*)'(F + F*)(E + E*) and the derivative of an element of
G is the coefficient of an clemcnt of E* and F* in thc expansion of
(E + E* Y( E + E*). Thus the Jacobian of the transformation from A, G
to E, F is the determinant of the linear transformation
(23) dA (dE)'FE+E'(dF)E+E'F(dE),
(24) dG = (dE)'E + E'(dE).
Since A and G (dA and dG) are symmetric, only the functionally indepen-
dent component equations above are used.
Multiply (23) and (24) on the left by E' -1 and on the right by E-l to
obtain
It should be kept in mind that (23) and (24) are now considered as a linear
transformation without regard to how the equations were obtained.
Let
Then
The determinant is
dI I
dW II dw,} (i <j) dw,} (i > j)
da jl 1 2F 0 0
(33) dgil 0 21 0 0
da,} (i <j) 0 0 M N
dg,) (i <j) 0 0 1 1
=l~ 2Fl·\M
21 1 ~l =2 P
!M NI,
where
(34) dW 12
o o o : 0
I
1 \
I I
dall' o II : 0 o i 0
---- -- - ,I / , -1- - - - - - - - - - - - - -
1
r - '-T-
I
--
da2j o 0 \ -
... 0 J
I
I
I
0
M= I I
I I
I I
o o : 0 to!: I 0
-1- - - - - - - - - - - - - -I- - - - - ... - --
I 1 I
__________ I
J____ ________ I _ ___ 1f __ _
o 0 \ 0
:
0 IF
: lp-l
534 THE DISTRIBUTIONS OF CHARACfERISTIC ROOTS AND VECfORS
and
(35)
f2 0 : 0 o o
I
1 1
I I
o fp 1 0 o I
: 0
da::: 3
------------,--
o ... 0: f3 ----------r----,----
'" a 0
I
I
I
I
I I
N= . j I
I I
I I
o fp i t
I I
a
---------~----~----
1 1 I
1 _ ____ IL ____ ..lI ___ _
_ _ _ _ _ _ _ .___________ L_
da p _
LP
a 0 1 0 0: I
: fp
1
Then
where
(39)
( 41)
p
2P CI IE' EI ~(m+n-p )e- ftr E'E nfi"i(m-p-l)
j", 1
n (1 -
p
; .. 1
,
li)'i(n-p-l) n (); .- Ii)'
i<j
( 42)
where the integration is 0 < eil < 00, - 00 < e(J < 00, j '* 1_ The value of (42) is
unchanged if we let - 00 < eiI < 00 and multiply by 2- p • Thus (42) is
Except for the constant (2.". )iP\ (43) is a definition of the expectation of the
!(m + n - p)th power of IE' EI when the ejJ have as density the function
within brackets. This expected value is the 4(m + n - p)th moment of the
r,eneralized variance IE' EI when E' E has the distribution W(I, pl. (See
Section 7.5.) Thus (43) is
( 44)
(45)
(47)
Ii
(48) fl = T+T; /
we have
df. 1
dl j = (//+1
I -
(49) I, - jj = (I, + ~) + 1)'
1
1-1,=1+1'
J
The joint density of Y can be found from (45) and the fact that the
Jacobian is IY1- zP. (See Theorem A.4.6 of the Appendix.)
We shall show that the nonzero roots fl > .. , > fm (these roots being distinct
with probability 1) are the roots of
(53)
(54)
1Ttm2rJ h m + n)]
(57)
rm( ~m)rJ tern + n - p)]rm(~P)
.n [If(p-m-l l(
m
1-1
1 - J,) 1(n-p-l l] n(J, - f).
I~
(1) IA -lII = 0,
where the matrix A has the distribution W(I, n). It will be obseIVed that th(.
variances of the principal components of a sample of n + 1 from N(JL, I) are
1In times the roots of(I). We shall find the following theorem useful:
1T-Wlg(ll,···,lp)O'<l(1,. -11 )
(2)
fpOp)
Proof From Theorem A.2.l of the AppendiK. we know that there exists an
orthogonal matrix C such that
(3) B = C'LC,
where
It 0 0
0 12 0
( 4) L=
0 0 Ip
If the l's are numbered in descending order of magnitude and if Cil ~ 0, then
(with probability 1) the transformation from B to Land C is unique. Let the
matrix C be given the coordinates c I, .. " C p( p_1 )/2' and let the Jacobian of
the transformation be f(L, C). Then the joint density of Land C is
gU I , ... ,I p) f( L, C). To prove the theorem we must show that
~5)
( 6)
13.3 THE CASE OF ONE NONSINGULAR WISHART MATRIX 539
~rhen by Lemma 13.3.1, which will be stated below, B has the density
f [l(m + n)]
(7) p ~ ! II ~- BI Hn -p 1)1 BI t(m-p 1)
fp('2m)fp('2n)
f[l(m+n)] p
fl(1-1,r(n- p -nfl 1t(m- p -n
1 P
= f1 ~ !
r;, (2: m) f,) ( 2 n) j - I I '" 1
The joint density of Land C is f(L, C)g*(ll" .. , lp). In the preceding section
we proved that the marginal density of L is (50). Thus
IBlt(m- p l)f(B)7Twm
(9)
fp(tm)
Now let u:., find the dcnsity of the roots of (D. The density of A is
(11)
S40 THE DISfRIBUTIONS OF CHARACfERISTICROOTS AND VECTORS
Theorem 13.3.2. If A (p x p) has the distribution W(J, n), then the charac-
teristic roots (II ~ l2 ~ ... ~ l p ~ 0) have the density (11) over the range where
the density is not O.
since the roots are different with probability 1. Let the vectors with Yli ~ 0 be
(13)
Then
(15) Y'Y=I.
(16) A = YLY'.
( 17)
where the XO/ are independently distributed, each according to N(O, I). Let
(18)
is distributed according to W(I, n). The roots of A* are the roots of ,4; thus
(20) A* = C**'LC**,
(21) C** 'C** = I
Let
clJ* 0 0
Icfll
*
C 2l
0 0
(23) J( C*) = Icf11
*
Cl'l
0 0
1e;1 1
The distribution of C** is the same as that of C. We now shall show that
this fact defines the distribution of C.
The definition is possible because it has been proved that there is only one
distribution with the required invariance property [Halmos 095u)]. It has also
been shown that this distribution is the only one invariant under multiplica-
tion on the left by an orthogonal matrix (i.e., the distribution of QE is the
same as that of E). From this it follows that the probability is 1/2 P that E is
such that e il ;?: O. This can be seen as follows. Let 1 I, ... , 12 p be the 2 p
diagonal matrices with elements + 1 and - 1. Since the distribution of I, E is
the same as that of E, the probability that ell;?: 0 is the same as the
probability that the elements in the first column of I, E are nonnegative.
These events for i = 1, ... , 2 P are mutually exclusive and exhaustive (except
for elements being 0, which have probability 0), and thus the probability of
anyone is 1/2 p.
542 THE DISTRIBUTIONS OF CHARACfERISTICROOTS AND VECfORS
From the lemma we see that the matrix C has the conditional Haar
invariant distribution. Since the distribution of C conditional on L is the
same, C and L are independent.
Theorem 13.3.3. If C = yl. where Y = (y l' ... , y p) are the normalized char·
I1cceristic vectors of A with Y II ~ 0 and where A is distributed according te
W([, n), then C has the conditional Haar invariant distribution and C is
distributed independently of the charactelistic roots.
Proof. The density of QBQ I . where QQ' = I. is the same as that of B (for
the roots are invariant). and therefore the distribution of J(Y'Q I)Y'Q I is the
same as that of yl. Then Theorem 13.3.4 follows from Lemma 13.3.2. ,.
(25)
Then the characteristic roots II > ... > lp of B have the density
Proof Since the characteristic roots 0 f B 2 are I ~ , ... ,I; and tr B 2 = '£1;,
the theorem follows directly. •
Corollary 13.3.2. Let nS be distributed according to W(I, n), and define the
diagonal matrix Land B by S=C'LC, C'C=I, II> '" >lp, and C£l~O,
i = 1, ... , p. Then the density of the limiting distribution of fn- (l, - I) = D
diagonal is (26) with lj replaced by d l , and the matrix C is independently
distributed according to the conditional Haar measure.
Proof The density of the limiting distribution of Iii (S - I) is (25), and the
diagonal elements of D are the characteristic roots of Iii (S - I) and the
columns of C' are the characteristic vectors. •
(1)
where
N
(2) Aij= 1: (x~I)-i(i))(x~J)-i(f))', i,j=I,2,
a=l
(3) X
= (X(l))
X(2)
544 THE DISTRIBUTIONS OF CHARACfERISTIC ROOTS AND VECfORS
(4)
From Section 3.3 we know that the distribution of A lj is the same as that of
n
(5) A lJ = '"
i....J yU)y(j)'
Of a , i,j = 1,2,
asl
where n =N -1 and
( 6) Y
= ( yO)
y(Z)
1
is distributed according to N(O, ~), Let us aSSume that the dimensionality of
yO), say PI' is not greater than the dimensionality of y(2), say PZ' Then there
are PI nonzero roots of (1), say
(7)
(9)
and
(ll)
( 12) IQ f{All'Z+Q)I-O.
13.5 ASYMPTOTIC DISTRIBUTIONS IN CASE OF ONE WISHART MATRIX 545
The distribution of f.-, i = 1, _.. , PI' is the distribution of the nonzero roots of
(:i2), and the density is given by (see Section 13.2)
(13)
.n
p,
1='
{J.~(PJ-P' -I )(1- J.) 1(f,T-p )-p, - ~)} n Cf, - f;).
Pl
1<.1
Since the conditional density (13) does not depend upon y(2), (13) is the
unconditional density of the squares of the sample canonical correlation
coefficients of the two sets X~l) and X~~), ex = 1, ... , N. The density (13) also
holds when the X(2) are actually fixed variate vectors or have any distribu-
tion, so long as X(1) and X(2) are independently distributed and X(1) has a
multivariate normal distribution.
In the special case when PI = 1, P2 = P - 1, (13) reduces to
(14)
which is the density of the square of the sample multiple correlation coeffi-
cient between X( I) (p I = 1) and X(~) (p~ = p - 1).
A1 >A 2 >'" >Ap, 1,2:./ 2 2:. ... 2:.lp' (31,2:.0, b" 2:. 0, i=I, ... ,p. Define
G = m(B - 13) and diagonal D = m(L - A). Then the limiting distribution of
546 THE DISTRIBUTIONS OF CHARACTERISTIC ROOTS AND VECTORS
_ ~ A, Ak ,
( 2) ccY0'(g,) - '-.J 2 (3k(3k'
k=l (>..,-A k )
k*,
where p = ((31' ... , (3). The covariance matrix of gj and g} in the limiting
distrihution is
(3)
(4) T= YLY',
1 1
(b) U= WA + D + AW' + r (WD + WAW' +DW') + -WDW'.
vn n
(8) U= WA +D + AW',
(9) O=W+W'.
When we substitute W' = - W from (9) into (8) and write the result in
components. we obtain w" = 0,
(Note wij = -w,r) From Theorem 3.4.4 we know that in the limiting normal
distribution of U the functionally independent elements are statistically
independent with means 0 and variances dr(uu) = 2A; and d'Y(u ij ) =
*"
A,.Aj , i j. Then the limiting distribution of D and W is normal, and
d 1 , •.• ,dp ,WI2,W13""'wp _ 1,p are independent with means 0 and variances
dr(d,.) = 2A;, i = 1, ... , p, and dr(w,.,) = A;Aj/(Aj - AY, j = i + 1, ... , p,
i = 1, ... , p - 1. Each column of B is ± the corresponding column of pY;
since Y ~ I, we have PY ~ p, and with arbitrarily high probability each
column of B is nearly identical to the corresponding column of pY. Then
G = /n (B - P) has the limiting distribution of P/n (Y - I) = pW. The
asymptotic variances and covariances follow.
Now we justify the limiting distribution of D and W. The equations
T = YLY' and I = IT' and conditions 11> ... > Ip, Yil> 0, i = 1, ... , p, define
a 1-1 transformation of T to Y, L except for a set of measure O. The
transformation from Y, L to T is continuously differentiable. The inverse is
continuously differentiable in a neighborhood of Y = I and L = A, since the
equations (8) and (9) can be solved uniquely. Hence Y, L as a function of T
satisfies the conditions of Theorem 4.2.3. •
(12)
where the diagonal elements of the diagonal matrix A I are different and are
larger than A* (> 0). Let
( 13)
( 14) (~' o
A* Iq
1+ Iii1 (U"
U21
UIZ )
Un
= [(\' C2
o 1+ Iii
1 (W"
W21 WI2) 1
W22
= (~' A*Iq
1 [(D'0
0) + Iii c,;,c; 1
+(WIIA' A*W C; 1 (A'W!'
12 A,W;, )1+ riM,
1
W21 A 1 A*W C; + A*C W
22 2 12
A*C2W~
where the submatrices of M are sums of products of C2, A I' A*Iq' Dk , Wkl ,
and 1/ Iii. The orthogonality of Y Up = YY') implies
where the submatrices of N are sums of products of Wkl • From (4) and (15)
we find that
( 16)
The limiting distribution of O/A*)U22 has the density (25) of Section 13.3
with p replaced by q. Then the limiti ng distribution of D2 and C2 is the
distribution of Di and Y2~ defined by Uti = Y22 Di Y:z2' , where (1/ A* )Ui2 has
the density (25) of Section 13.3.
13.6 ASYMPTOTIC DISTRIBUTIONS IN CASE OF TWO WISHART MATRICES 549
To justify the preceding derivation we note that D z and Y22 are functions
of U depending on n that converge to the solution of U~ = Y2i Di2 Y~i'. We
can use the following theorem given by Anderson (1963a) and due to Rubin.
(18) lim Gn ( v) = G( v)
n-+oo
(19)
satisfy the conditions of the theorem have been given by Anderson (1963a).
(3) I«I>-AII=O
"1'1"1 = 1, and I'll ~ 0, and let r = ('YI>"" 'Y p )' Letl l ~ ••• ~ lp (> 0) be the
roots of (1), and let L be the diagonal matrix with the roots as diagonal elements
in descending order; let xi, ... ,x; be the solutions to (2) for I = I;, i = 1, ... , p,
x* 'T* x* = 1, and xi,. > 0, and let X* = (xi, . .. , x;). Define Z* = in (X* - r)
and diagonal D = Iii (L - A). Then the limiting distn'bution of D and Z* is
nonnal with means 0 as n -+ 00, m -+ 00, and mjn -+ 7J (> 0). The asymptotic
variances and covariances that are not 0 are
(5)
(6)
(8) i =1= j.
Proof Let
(9) S= r's*r, T= r'T*r.
Then mS and nT are distributed independently according to W( A, m) and
W(J, n), respectively (Section 7.3.3). Then 11"'" lp are the roots of
( 10) IS -ITI = O.
13.6 ASYMPTOTIC DISTRIBUTIONS IN CASE OF TWO WISHART MATRICES 551
(13) A= _1_U
in
= (I + _I_HI) (A + _I_D) (I + _I_H),
in in in
(14) 1+ ~V=(/+ ~H')(/+ ~H).
These can be rewritten
1 1
(15) U=D+AH+H'A+ r(DH+H'D+H'AH) + -H"1)H,
yn n
If we neglect the termS of order 1/,;n and 1In (as in Section 13.5), we
can write
(17) U=D+AH+H'A,
( 18) V=H+H ' ,
(19) U - VA = D + A H - H A.
552 THE DISTRIBUTIONS OF CHARACTERISTIC ROOTS AND VECTORS
(20)
(21)
(22)
(23) i :1= j,
and covariances
(24) i:1= j.
The pairs (d" hit) of diagonal elements of D and H are independent with
variances (5),
(25)
and covariance
(26)
(27) r/~r=l.
(28)
p p p
where Sw Til' L 1 , and GIl are k X k. Then G ll -+ lk' G 12 -+ 0, and G:!l - O.
but G 22 does not have a probability limit. Instead G~2G::'2 ~ lp_k' Let the
singular value decomposition of G 22 be EJF, where E and F are orthogonal
and J is diagonal. Let C z EF.
The limiting distribution of U fii(S - A) and V= fii(T - J) is normal
with the covariance structure given above (12) with Ak + I = .. , :. AI' = A*.
Define D = fii (L - A), H11 fii (GIl - I), H lz = In G}2, H:'I = /11 G:: 1• and
H22 = fii(G zz - Cz) = fiiE(J -1p-k)F. Then (13) and (15) are replaced by
(29) AOI
[
o
= l
1 I
fii H IZ o A* lp_'
• + -;::=::-
vn D~-
554 THE DISTRIBUTIONS OF CHARACTERIST!CROOTS AND VE.CroRS
(
CH~I
Vt1
(30)
[~ o
Ip - k
1+ 1
In
[V"V21 VIOl
v"
1+
Vtl
J- H;1 1
InH'21
I 1
1+ In HII
1
rn H12
=
1 , 1 1
C2 + rn H22
r:; Hl~ C'
'2
1 H'22
+ r;; ,r,; H21
\tl t1 t1
If We neglect the terms of order 11m and liN, instead of (11) we can write
Then vlI =2h ll , i=1, ... ,k; Li,,-\v,,=d l , i=1, ... ,k; ur,-v;,A j =
(AI - A,)h l ), i =1= j, i, j = 1, ... , k; U22 - A* V22 = C~D2C2; CzCU21 - V21 AI) =
H21(A~'I-Al); and (UI2-A*VI2)C;=(A*I-AI)HI2' The limiting distri-
buti~n of U22 - A* V22 is normal with mean 0; cS'(u" - A* v;Y = $d~ =
2A*-0 + 7])/7], i = k + 1, ... , p; and (f-(u i ) - A* V lj )2 = A*2(1 + 7])/7], i =1= j,
i. j = k + 1, ... , p. The limiting distribution of D ~ and C2 is the distribw ion
of D2 and C 2 defined by U22 - A* V 22 = C~D2C2 where 01 A* )(U22 - V22 ) has
the d~nsity of (25) of Sectio 11 13.3.
13.7 ASYMPTOTIC DISTRIBUTION IN A REGRESSION MODEL 555
(1 ) --IS
SZI S11 A S Al2
12 "I = 22 "I , "IAIS22"1
A
=
1
.
(3) U=@V+W,
where @=A1p(r')-l, f!UU = Iuu =lpl ' ,RW' Ivv =lp1 ' GUV' = Iuv
1
1
= (t\, 0) = t\, $WW = Iww = Ipi - t\z, and $VW' = O. [See (33) to (37)
and (45) of Section 12.2.] Let the sample covariance matrices be Suu =
A'SuA', SUI< = A'Sur, and Svv= rJszzr. Let the sample vectors consti-
tute H = r- I f' = r-I(.y I""'.y p). Then H satisfies
(4)
(5) HIZ]
H zz '
556 THE DISTRIBUTIONS OF CHARACfERISTICROOTS AND VECfORS
(6)
the last P2 - PI columns of (4) are SUVH2 = 0. Then Hll ~/pl' H12 ~ 0, and
H2I ~ 0, but the probability limit of (4) only implies H~2H22 ~/p2-PI' Let
the singular value decomposition of H22 be H22 = EJF.
Define SUu = m(Suu -Ip), Stv = m(Svv - lp2 )' SUv= mCSuv - A),
H:=m(HI-/(PI)' and A*=[m(A-A),O], where I(PI)=Up1'0)'. Then
expansion of (6) yields
(7) (A' + )" stu )( Ip, + }" Suu r' (A + )" stv )( I,p,) + }" Ht 1
~ (I" + )" stv )( I,,,) + }" Ht )( A + ~ A' r
From (7) we obtain
(9)
(10)
(13)
( A~ - A7)h;j = ha~l n
( AjV,auja + A,u,avja - A, \u,aUja - A7 V,aVja) + 0,,( 1).
Anderson (1999a), has also given the asymptotic covariances of a j and of lit)
and a l • Note that hij depends linearly on (u ra ' V, a> and that the pairs
(U'-a' Via) and (U ja , vja ), i =1= j, are uncorrelated. The covariances (14) do not
depend on (U, V) being normal.
Now suppose that the rank of f1:? is k < Pl' Define HI = (Hi 1_ H;I)' as
tbe first k columns of H satisfying (4), and define AI = diag( AI"'" A~).
Then HI satisfies (6), and Ht satisfies (8), (9), (10), and (1). Then A;' is
given by (12) for i = 1, ... , k.
558 1 HE DISTRIBUTIONS OF CHARACfERISTIC ROOTS AND VECfORS
Hence
13.7.2. One Set of Variates Stochastic and the Other Set Nonstochastic
Now consider the case that X(2) in (2) is nonstochastic, where J.·Za; = 0 and
,{ZIXZ~ = I zz. We observe X =X 1..... xII" We aSSUme
( 19)
1 1/
(20) SII = n [, x~')x~I), = I3S2213 + SZ213' + I3S2z + Szz ~ I3I2213' + I zz ,
IX'"'I
-A(Izz + I3S2213')
~22)
( S2213
( 23)
We shall first assume PI = pz and At> ... '> ApI> O. Then (22) and (23)
and a" > 0 define
(25)
(26)
13.7 ASYMPTOTIC DISTRIBUTION IN A REGRESSION MODEL 559
2 )-1 "2
(27) ( A + S v w) ( A + A S v w + Sw v A + S w w ( A + Sw v ) H = H A .
Since
(32)
(AJ - A;)h;j = ~
vn
t [(1 -
a=1
Anv,.awja\ + A,.wia vja (l - An - AiW,aWjaAj]
Then
(34)
The ith diagonal term of (34) is (29) for i = 1, ... , k. The i, jth element of the
upper left-hand submatrix is (32) for i =F j and i, j = ] k. Two other
J" .,
(37)
H~ + Hti
[ C2H~ +Hiz'
Hiz + HiJ 'C2
CH* H*'C =O+Op(l).
1
2 22 + 22 2
(38)
(39)
(40)
A
0*-
k-
(suv + A S*21
(Ht1 + Htn
I
uv
S*ll S*12)
wv wv
- ( S*21 o +op(l).
wv
( 41)
since Svv ~ I. The effect of the rank restriction is to replace the lower
right-hand submatrix of SWv by 0 (the parameter value).
Since S~ v = (l/!nrr.:. 1Wa V~, we have vec Sw v = (1/";-; n:::~ = I (V, ® Wa )·
Because Vcr and W" are independent,
(43) A 1 n [
. r '-.J Wa V(l)'
vec€>*k =vec-" a ,
( W(a 1)
0
1 1+0 (1)
V(2),
a p
'In a~l
where Va = (V(l),
a'a
V(2),), and W
a
= (W(l)'
cr'U·
W(2),), Then
( 44)
o
IP
~
-10 ® [ Ik-A~
0
(46) ttfveen(Bk-p)[vee(Bk-p)]t
(
=[r,r;®I zz ]+ [r2r;®IzzA,lk Al2)-' A',I zz ]
lfwe define 0 = Ivxr, = IzzAI A1(I - A2,)-1 and 0 = r" then p = 00'.
We have
Corollary 13.7.1. xi
Let 2), •.. , X~2) be a set of vectors such that (19) holds.
L e t xa(I) -- Q........a(2) + Za' a -- 1, ... , n, were
h za lS. an 0 bserva t'tOn on a ran dom
vector Z with rfZ = 0 and rfZZ' = I zz . Suppose 13 has rank k. Then the
limiting distribution of ,;nvec(Bk - 13) is normal with mean 0 and covariance
(46) or (49).
A[ > '" > Ap, II > ". > Ip, (3il ~ 0, b,-[ ~ 0, i = 1, .. . ,p. As in Section 13.5.1,
define T = 13' Sp = YLY', whe re Y = 13' B is orthogonal and Yil ~ O. Then
rfT = p'IP = A.
The limiting covariances of /Nvec(S - I) and /Nvec(T - A) are
(5)
(6)
(7)
Proof The proof is the same as for Theorem 13.5.1 except that (4) is used
instead of (4) with K = O. •
(8)
In Section 11.7.3 it was shown that - N times the logarithm of (8) has the
limiting distribution of
= 1 [2
--;z Ep u;," + EP u,~1 - -1 (P E U/I
) -] .
2A i=p-lJ,+l '=p-q+ I q '=p-q-..I
, <)
The term Li < ,U;j has the limiting distribution of (1 + K) A* ~Xq'!.(q _ I) /~. The
limiting distribution of (u p _ q _1. p_q+ I " ' " U PP ) is normal with mean 0 and
covariance matrix A*2[2(1 + K)lq + KEE']A*2. The limiting distribution of
[LU~i - (LUji)2 Iq ]A*2 is 2(1 + K)A*2Xq'2_ I • Hence, the limiting distribution of
(9) is the distribution of (1 + K) X;(q + I) /2-1 .
We are also interested in the characteristic roots and vectors or one
covariance matrix in the metric of another covariance matrix.
(11)
(12)
566 THE DISTRIBl.JTIONS OF CHARACTERISTIC ROOTS AND VECTORS
( 13)
~ 7'1
I' d 2+3K
(I-t} .(Y('( 1':1)= J \"1,.
(15) i =1= j,
(16)
( 17)
Let A = BLB', where L is di<lgonal with diagonal clements II > ... > I" and
B is onhogon,lI with bll ~ O. Since g(tr A) = gO:::[~ II), the density of II"'" lp
is (Theorem 13.3.4)
(m,n>p).
Let C be a m,mix such that C'lfC' = I. Then Y = CY* and Z = CZ* have
the density g[tr( Y Y' + ZZ')]' Let A* = y* y* " B* = Z* Z* " A = YY', and
B = ZZ'. The roots of IA* -IB* I = 0 are the roots of (A -IBI = O. Let the
PROBLEMS 567
roots of IA - f(A + B)I = 0 be fl > ... > fp, and let F = diag(f1,···, f p)'
Define E (p X p) by A + B = E' E, and A = E' FE, and eil ~ 0, i = 1, ... , p.
( 19)
the density of E is
(20)
and in Section 13.7 g[tr(Y'Y + Z 'Z)] = g[tr(A + B)]. The distribution of the
roots does not depend on the form of gO; the distribution of E depends
only on E' E = A + B. The algebra in Section 13.2 carries over to this more
general case.
PROBLEMS
13.1. (Sec. 1.1.2) Prove Theorem 13.2.1 for p = 2 hy calculating the Jacobian
directly.
13.2. (Sec. 13.2) Prove Theorem 13.3.2 for p = 2 directly by representing the
orthogonal matrix C in terms of the cosine and sine of an angle.
13.3. (Sec. 13.2) Consider the distribution of the roots of IA -IBI = 0 when A and
B are of order two and are distributed according to WCI, m) and WCI, n),
respectively.
13.4. (Sec. 13.2) Prove that the Jacobian 1 a(G, A)/ aCE, F)I is nUl -~) times a
function of E by showing that the Jacobian vanishes for f, = It
and that its
degree in f,. is the same as that of n(f, - ~).
13.5. (Sec. 13.3) Give the Haar invariant distribution explicitly for the 2 x 2 orthog-
onal matrix represented in terms of the cosine and sine of an angle.
568 THE DISTRIBUTIONS OF CHARACl ERISTICROOTS AND VEcrORS
13.6. (Sec. 13.3) Let A and B be distributed according to weI, m) and weI, n)
respectively. Let 11> '" :> Ip be the roots of IA -IBI = 0 and m 1 > ... > mp
be the roots of IA - mIl = O. Find the distribution of the m's from that of the
I's by letting n - t 00.
13.7. (Sec. 13.3) Prove Lemma 13.3.1 in as much detail as Theorem 13.3.1.
13.8. Let A be distributed according to WeI, n). In case of p = 2 find the distribu-
tion of the characteristic roots of A. [Hint: Transform so that I goes into a
diagonal matrix.]
13.9. From the result in Problem 13.6 find the di!'tribution of the sphericity criterion
(when the null hypothesi!; is not true).
13.10. (Sec. 13.3) Show that X (p x n) has the density ix(X' X) if and only if T has
the density
where T is the lower triangular matrix with positive diagonal elements such
that TT' =X' X. [Srivastava and Khatri (1979)]. [Hint: Compare Lemma 13.3.1
with Corollary 7.2.1.]
13.11. (Sec. 13.5.2) In the ca<;e that the covariance matrix is (t 2) find the limiting
distribution of D I , WII , W12 , and W21 •
Factor Analysis
14.1. INTRODUCTION
Factor analysis is based Oil a model in which the observed vector is parti-
tioned into an unobserved systematic part and an unobserved error part The
components of the errOr vector are considered as uncorrelated or indepen-
dent, while the systematic part is taken as a linear combination of a relatively
small number of unobserved factor variables. The analysis separates the
effects of the factors, which are of bnsic interest, from the error~. From
another point of view the analysis gives a description or explanation of the
interdependence of a set of variables in terms of the factors without regard to
th,~ observed variability, This approach is to be compared with principal
component analysis, which describes or «explains" the variability observed.
Factor analysis was developed originally for the analysis of scores on mental
tests; however, the methods are useful in a much wider range of situations,
for example, analyzing sets of tests of attitudes, sets of physical measure-
m.ents, and sets of economic quantities. When a battery of tests is given to a
group of individuals, it is observed that the score of an individual on a given
test is more related to his scores on other tests than to the sCores of other
individuals on the other tests; that is, usually the scores for any particular
individual are interrelated to some degree. This interrelation is "explained"
oy considering a test score of an individual as made up of a part which is
peculiar to this particular test (called error) and a part which is a function of
mOre fundamental quantities called scores of plimmy abilities or factor scores.
Since they enter several test scores, it is their effect that connects the various
569
570 FACfOR ANALYSIS
test scores. Roughly, the idea is that a person who is more intelligent in some
respects will do better on many tests than someone who is less intelligent.
The model for factor analysis is defined and discussed in Section 14.2.
Maximum likelihood estimators of the parameters are derived in the case
that the factor scores and errors are normally distributed, and a test that the
model fits is developed. The large·sample distribution theory is given for the
estimators and test criterion (Section 14.3). Maximum likelihood estimators
for fixed factors do not exist, but alternative estimation procedures are
suggested (Section 14.4). Some aspects of interpretation are treated in
Section 14.5. The maximum likelihood estimators are derived when the
factors are normal and identification is effected by specified zero loadings.
Finally the estimation of factor scores is considered. Anderson (19843)
discusses the relationship of factor analysis to prinCipal components and
linear functional and structural relationships.
(1) X=Af+U+j.L,
( 4) :.t=AA'+'If.
If f and U are normal, all the information about the structure comes from
(3) [or (4)] and G X = fl..
14.2.2. Identification
Given a covariance matrix I and a number m of factors, we Cl:..n ask whether
there exist a triplet A, <l> positive definite, and 'If positive definite and
diagonal to satisfy (3); if so, is the triplet unique? Since any triplet can be
transformed into un equivalent structure A C, C- 1 4:>C,-I, and 'If, we can
put m 2 independent conditions on A and <f> to rule out this indetermin~cy.
The number of component~ in the ohservahle :£ and the number of condi-
tions (for uniqueness) is !PCp + I) +m 2 ; the numbers of parameters in A,
4:>, and 'If are pm, ~mCm + 1), and p, respectively. If the excess of observed
quantities and conditions over number of parameters, namely, H(p - m)2
-p - m], is positive, we can expect a problem of existence but can anticipate
uniqueness if a set of parameters does exist. If the excess i .. negative, we can
expect existence but possibly not uniqueness; if the excess is 0, we can hope
for both existence and uniqueness (or at least a finite number of solutions).
The question of existence of a solution is whether there exists a diagonal
14.2 THE MODEL 573
(5)
iR diagonal. If the diagonal elements of r are ordered and different ('Y II >
122 > ... > 'Ymm), A is uniquely determined. Alternative conditions are that
the first m rOWS of A form a lower triangular matrix. A generalization of this
c.Jndition is to require that the first m rows of B A form a lower triangular
matrix, where B is given in advance. (This condition is implied by the
SCI-called centroid method.)
Simple Structure
These are conditions proposed by Thurstone (1947, p. 335) for choosing a
matrix out of the class AC that will have particular psychological meaning. If
Aia = 0, then the ath factor does not enter into the ith test. The general idea
of simple structure is that many tests should not depend on all the factors
when the factors have real psychological meaning. This suggests that. given a
A, one should consider all rotations, that is, all matrices A C where C is
orthogonal, and choose the one giving most 0 coefficients. This matrix can be
considered a,~ giving the simple.~t structure and presumably the one with most
meaningful psychological interpretation. It should be remembered that the
psychologist can construct his or her tests so that they depend on the
assumed factors in different ways.
The positions of the O's are not chosen in advance, but rotations Care
tried until a A is found satisfying these conditions. It is not clear that these
conditions effect identification. Reiers¢1 (1950) modified Thurstone's condi~
tions so that there is only one rotation that satisfies the conditions. thus
effecting identification.
test score. Then we do not assume that #;'ff' = I. These conditions are
similar to some llsed in econometric models. The coefficients of the a th
column are identified except for multiplication by a scale factor if (a) there
are at least m - 1 zero elements in that column and if (b) the rank of Ala) is
111 - 1. where At") is the matrix composed of the rowS containing the
assigned O's in the a th column with those assigned O's deleted (Le., the a th
column deleted). (See Problem 14.1.) The multiplication of a column by a
scale constant can be eliminated by a normalization, such as <baa = 1 or
ArQ = 1 for some i for each a. If <baa = 1, a = 1, ... ,m, then «I> is a
correlation matrix.
It will be seen that there are m normalizations and a minimum of
m(m - 1) zero conditions. This is equal to the number of elements of C. If
there are more than m - 1 zerO elements specified in one or more columns
of A. then there may be more conditions than are required to take out the
indeterminacy in A C; in this case thc condition!; may restrict A «I> A'.
As an example, consider the model
0
AZI 0
(6) X=j.L+ A31 A32 [~] + V
0 A42
0 1
V
A21 V
A31 v + A32 a
=j.L+ +V
A42 a
a
for the scores on five tests, where v and a are meaSUres of verbal and
arithmetic ability. The fir!;t tWo tests are specified to depend only on verbal
ability While the last two tests depend only on arithmetic ability. The
normalizations put verbal ability into the scale of the first test and arithmetic
ability into the scale of the fifth test.
Koopmans and Reiers9l1 (1950), Anderson and Rubin (1956), and Howe
(1955) suggested the use of preassigned O's for identification and developed
maximum likelihood estimation under normality for this case. [See also
Lawley (1958).] Joreskog (1969) called factor analysis under these identifica-
tion conditions confirmatory factor analysis; with arbitrary conditions or with
rotation to simple structure, it has been called exploratory factor analysis.
14.2 THE MODEL 575
Other CondUlo1ZS
A convenient set of conditions is to require the upper square submatrix of A
to be the identity. This assumes that the upper f:quare matrix without this
condition is nonsingular. In fact, if A* = (A~', A*2')' is an arbitrary p X m
matrix with Ai square and nonsingular, then A = A*Ai 1 (1m' A'2)' satis-
fies the co~dition. (This specification of the leading m X m submatrix of A
as 1m is a convenient identification condition and does not imply any
substantive meaning.)
Next we shall maximize the logarithm of (I) with .... replaced by fl.; this ist
(3)
(This is the logarithm of the concentrated likelihood.) From 1 I-I = I, we
obtain for any parameter fJ
( 4)
Then the partial derivative of (3) with regard to 1/111' a diagonal element of
'It, is - N /2 times
P
(5) (Tii - E CkJ(TJb-
ik
,
k.J=D
tWe could add the restriction that the off-diagonal elements of A''I' - I A are 0 with Lagrange
mUltipliers, but then the Lagrange multipliers become 0 when the derivatives are set equal to O.
Such restrictions do not affect the maximum.
]4.3 ESTIMATORS FOR RANDOM ORTHOGONAL FACTORS 577
where I -I = ({TIJ) and (C ,j ) = C = (1/ N)A. In matrix notation. (5) set equal
to 0 yields
(6)
(8)
We have
From this we obtain 'I'-lA(f +1)-1 = I-IA. Multiply (8) by I and use the
above to obtain
(10)
or
(11)
(12)
I ",-1 (I - C) W- I I = ( A A' + W) W- I ( W + A A' - C) '1'- I ( A A' + W)
=W+AA'-C
because
(14)
578 FACfOR ANALYSIS
Then (6) is equhulent to diag 'I'-I(I - C)'I'-I = diagO. Since 'I' is diago-
nal. this equation is equivalent to
The e~timators .\ and ~ are determined by (0), (15), and the requiremert
that A' 'I' -I '\ is diagonal.
We can multiply (11) on the left by '1'- t to obtain
( 16)
The third equality follows from (8) multiplied on the left by I; the fourth
equality follows from (5) and the fact that ,,it is diagonal. Next we find
The scc()t1d equality is I UU' + T{,I = I U'U + 1",1 for Up X m, which is proved
as in ()4) Dr Section HA. From tht.: fact that the charuclcri~tic roots of
14.3 ESTIMATORS FOR RANDOM ORTHOGONAL FACTORS 579
I I
'1'- 2"(C - '1')'1'- 2" are the roOlS 1'1> 1'2> ." > I'p of 0 = I C - qr - 1''1'1 =
IC-(1+I')qrl,
(19)
ICI
-,,- =
1'1'1
n (1 + 1',).
p
i=1
"
[Note that the roots 1 + 1', of '1'- tCqr - tare poslllve. The roots I'i of
'1'- ~(C - '1')'1''' t are not necessarily positive; usually some will be negative.]
Then
(20)
The largest roots 1'1 > ... > 1'm should be selected for diagonal elements of
t. Then S = {l, ... , m}. The logarithm of the concentrated likelihood (3) is a
function of I = A A' + qr. This matrix is positive definite for every A and
every diagonal 'I' that is positive definite; it is also positive definite for some
diagonal 'I' 's that are not positive definite. Hence there is not necessarily a
relative maximum for qr positive definite. The concentrated likelihood
function may increase as one or more diagonal elements of qr approaches O.
In that case the derivative equations may not be satisfied for qr positive
definite.
The equations for the estinlators (11) and (15) can be written as polyno-
mial equations [multiplying (11) by 1'1'1], but cannot be solved directly. There
are various iterative procedures for finding a maximum of the likelihood
function, including steepest descent, Newton-Raphson, scoring (using the
information matrix), and Fletcher-Powell. [See Lawley and Maxwell (1971),
Appendix II, for a discussion.]
Since there may not be a relativ~ maximum in the region for which I/I/i > 0,
i = 1, ... , p, an iterative procedure may define a sequence of values of A and
q, that includes ~i1 < 0 for some indices i. Such negative values are inadmis-
sible because I/Iii is interpreted as the variance of an error. One may impose
the condition that I/Ii' ~ 0, i = 1, ... ,p. Then the maximum may occur on the
boundary (and not all of the derivative equations will be satisfied). For some
indices i the estimated variance of the error is 0; that is, some test scores are
exactly linear combinations of factor scores. If the identification conditions
580 FACfOR ANALYSIS
ct> = I and A ''1'-1 A diagonal are dropped, we can find a coordinate system
for the factors such that the test scores with a errOr variance can be
interpreted as (transformed) factor scores. That interpretation does not seem
useful. [See Lawley and Maxwell (1971) for further discussion.]
An alternative to requiring I/Ili to be positive is to require 1/11/ to be
bounded away from O. A possibility is l/lli ~ eUri for some small e, such as
0.005. Of course, the value of e is arbitrary; increasing e will decrease the
value of the maximum if the maximum is not in the interior of the restricted
region, and the derivative equations will not all be satisfied.
The nature of the concentrated likelihood is such that more than one
relative maximum may be possible. Which maximum an iterative procedure
approaches will depend on the initial val ues. Rubin and Thayer (1982) have
given an example of three sets of estimates from three different initial
estimates using the EM algorithm.
The EM (expectation-maximization) algorithm is a possible computational
device for maximum likelihood estimation [Dempster, Laird, and Rubin
(1977), Rubin and Thayer (1982)]. The idea is to treat the unobservable f's as
missing data. Under the assumption that f and U have a joint normal
distribution, the sufficient statistics are the means and covariances of the X's
and f's. The E-step of the algorithm is to obtain the expectation of the
covariances on the basis of trial values of the param".!ters. The M-step is to
maximize the likelihood function on the basis of these covariances; this step
provides updated values of the parameters. The steps alternate, and the
procedure usually converges to the maximum likelihood estimators. (See
Problem 14.3.)
As noted in Section 14.2, the structure is equivariant and the factor scores
are invariant under changes in the units of measurement of the observed
variables X --+ DX, where D is a diagonal matrix with positive diagonal
elements and A is identified by A''I'- l A is diagonal. If we let DA= A*,
D 'if J) = 'I' * , and DCD = C* , then the logarithm of the likelihood function is
a constant plus a constant times
The fact that the factors do not depend on the location and seale factors is
one reason for considering factor analysis as an analysis of interdependence.
It is convenient to give some ruk:-; or thumh for initial c:-;timatc:-; of the'
cotnmunalitie~, E~tI~ I ),.7J = 1 - Ifr".
in terms of observed correlations. One rlile
is to use the R~'I ..... i~I.Jtl. ..• I" Another is to tll'C max" I,II'"J
diagonal positive definite 'It and some p x m matrix A. The likelihood ratio
criterion' is
P "1 I'
"
== E Yj- E -Yi= "\....
L.J ')I"
!= 1 1= I i=m+l
tThis factor is heuristic. If m = O. the factor from Chaplcr 9 i~.V elp + 11 )/6: B,mkfl
suggested replacing Nand p by N - m and p - m. re~pective1y.
582 FACfOR ANALYSIS
(27)
(28)
If ( Ol~) is nonsingular, if A and l{r are identified by the condition that A ",-I A
I
is diagonal and the diagonal elements are different and ordered, if C ~ '" + A A I,
and if {N (C - I) has a limiting normal distribution, then {N ( A - A) and
{N (.q, - "') have a limiting normal distribution.
where (~i}) = (Oi7)-I. The other covariances are too involved to give here.
While the asymptotic covariances are too complicated to give insight into
the sampling variability, they can be programmed for computation. In that
case the parameters are replaced by their consistent estimators.
(32)
(33)
This can be rewritten as
(34)
Multiplication on the left by l: -I Cl: -I yields (8\ which leads to (10). This
estimator of A given '11 is the same as the maximum likelihood estimator
except for normalization of columns. The equation obtained by setting the
derivatives of (32) with respect to '11 equal to 0 is
(35)
An alt~rnative is to minimize
(36)
(37)
Browne (1974) showed that the generalized least squares estimator of '11 has
the same asymptotic distribution as the maximum likelihood estimator. Dahm
and Fuller (1981) showed that if Cove in (31) is replaced by a matrix
converging to cove and '11, A, and <I> depend on some parameters, then the
asymptotic distributions are the same as for maximum likelihood.
where T is the diagonal matrix with diagonal dements 'I"'" '", the charac~
teristic roots of C. If tm+I, •.• ,tp are small, C can be approximated by
III
where Tl is the diagonal matrix with diagonal elements t I' ... ,tm , and X is
approximated by
III
(40)
,"'I
Then the sample covariance of the difference between X and the approxima~
don (40).is the sample covariance of
(41)
which is B:.T2B; = Ef=m + I b,t,b;, and the sum of the variances of tht: compo~
nents is Er=m + 1 t l , Here 1'2 is the diagonal matrix with t l1/ + It' •• , (p as:
diagonal elements.
This analysis is in terms of some common unit of measurement. The first
m components "explain" a large proportion of the "variance/' tr C. When the
units of measurement are not 1he same (e.g., when the units are arbitrary),
it is customary to standardize each measurement to (sample) variance 1.
However, then the principal components do not have the interpretation in
telms of variance.
Another difference between principal component analysis and factor anal~
ysis is that the former does not separate the error from the systematic part.
This fault is easily remedied, however, Thomson (1934) proposed the follow.
ing estimation procedure for the factor analysis model. A diagonal matrix qt
is subtracted from C. and the principal component analysis is carried out on
C - qt. However, qt is determined so C - qt is close to rank m. The
equations are
(42) (C qt)A= AL,
(43) diag( qt + A A) = diag C,
(44) N. A = L diagonal.
The last equation is a normalization and takes out the indeterminacy in :\.
This method allows for the error terms, but still depends on the units of
586 FACfOR ANALYSIS
1 n cxp
P ( N
_1.2 '\' ( _ _ '\"m A f. )2 \
f
(2) L = N /2 '-.J
x/a /1-/ L..j
,I,
=I '/ fa
.
[( 27T)PflP ",.]
/=1 '1-'11
;=1 a=1 '1-'1/
This likelihood function does not have a maximum. To show this fact, let
/1-1=0, Al1 =l, A1/=0 (j=t=I), flu=X 1u ' Then X[Cl-/1-I-r./~IAij~a=O,
and I/III does not appear in the exponent but appears only in the constant.
As I/Il! -+ 0, L -+ 00. Thus the likelihood does not have a maximum, and the
maximum 1ik:elihood estimators do not exist [Anderson and Rubin (1956)].
Lawley (1941) set the partial derivatives of the likelihood equal to 0, but
Solari (1969) showed that the solution is only a stationary value, not a
maxImum.
Since maximum liLelihood estimators do not exist in the case of fixed
factors, what estimation methods can be used? One possibility is to use the
maximum likelihood method appropriate for random factors. It was stated by
Anderson and Rubin (1956) and proved by Fuller, Pantula, and Amemiya
(1982) in the case of identification by O's that the asymptotic normal distribu-
tion of the maximum likelihood estimators for the random case is the same as
for fixed factors.
The sample covariance matrix under normality ha:-> the noncentral Wishart
distribution [Anderson (l946a)] depending on'll, A<I>A', and N -1. Ander-
son and Rubin (1956) proposed maximizing this likelihood function. How-
ever, one of the equations is difficult to solve. Again the estimators are
asymptotically equivalent to the maximum likelihood estimators for the
nndom-factor case.
14.5.1. Interpretation
The identification restrictions of A ''11- IA diagonal or the first m roWs of A
being 1m may be convenient for computing the maximum likelihood estima-
tors, but the components of the factor score vector may not have any intrinsic
meaning. We saw in Section 14.2 that 0 coefficients may give meaning to a
factor by the fact that this factor does not affect certain tests. Similarly, large
factor loadings may help in interpreting a factor. The coefficient of verbal
ability, for example, should be large on tests that look like they are verbal.
In psychology each variable Or factor usually has a natural positive direc-
tion: more answers right on a test and more of the ability represented by the
factor. It is usuaUy expected tl.at more ability leads to higher performance;
that is, the factor loading should be positive if it is not O. Therefore, roughly
588 FACfOR ANALYSIS
. .
·('\21. '\22)
. "
·('\31. '\32)
------------1------------ '\il
speaking, for the sake of interpretation, one may look for factor loadings that
are either 0 or positive and large.
14.5.2. Transfonnations
The maximum likelihood estimators on the basis of some arbitrary identifica·
tion conditions including ~ = I are A and W. We consider transformations
( 1)
( 4)
given A. <1» as the one with the most meaningful interpretation in terms of
the subject matter of the tests. The idea of simple structure is that with 0
fUl:tor loadiIlgs in l:crtain patterns the component factor SCores can be given
meaning regardless of the moment matrix. Permitting <I> to be an arbitrary
positive definite matrix allows more O's in A.
Another consideration 1n selecting transformations or identification conJi~
lions is autonomy, or permanence, or invariance with regard to certain
changes. For example, what happens if a selection of the constituents of a
population is made? In case of intelligence tests, suppose a selection is made,
such a~ college admittees out of high school seniors, that can be assumed to
involve the primary abilities. One can envisage that the relation between
unob~e1ved factor SCores f and observed test scores x is unaffected by the
selection, that is. that the mdtrix of factor loadings A is unchanged. The
variance of the errors (and specific factors), the diagonal elements of '11, may
also be comidered as unchanged by the selection because the errors are
uncorrelated with the factors (primary abilities).
SlIppose there is a {rue model, A, <1>, '11, and the investigator applies
identification conditions that permit him to discover it. Next, suppose there is
a selection that results in a new population of factor SCores so that their
l:ovariancc matrix is <1>*. When the inve!>tigator analyzes the new observed
covariance matrix '11 + A <1>* A', will he find A again? If part of the identifi-
cation conditions are that the factor moment matrix is I, then he wiil obtain
a different factor loading matrix. On the oH er hand, if the identification
conditions are entirely on the factor loadings (specified O's and 1's), the factor
loading matrix from the analysis is the same as before.
The same consideration is relevant in comparing two populations. It may
be reasonable to consider that WI = '112 ' A 1= A 2, but <1>1 *" <1>2' To test the
hypothesis that <1>1 = <1>2' one wants to u::;e identification conditions that
agree with A I = A 2 (ratha than A I = A ~C). The condition should be on
the factor loadings.
What happens if more tests are added (or deleted)? In addition to
observing X = A f + fJ.. + U. 5iuppose one observes X* = A*f + fJ..* + U*,
when: U t- iJo; 1Il1corrl'lakd with U. Since the common factor!> f are un-
changed, cI> is unchanged. However, the (arbitrary) condition that A' '11- I A
is diagonal is changed; use of this type of condition would lead to a rotation
onA' A*').
We now consider estimation of A, '11, and <I> when <I> is unrestricted and A
is identified by specified O's and 1'so We assume that each column of A has at
14.7 ESTIMATION OF FACfOR SCORES 591
least m + 1 O's in specified positions and that the submatrix consisting of the
rows of A containing the O's specified for a given column is of rank m - 1.
(See Section 14.2.2.) We further assume that each column of A has 1 in a
specified position or, alternatively, that the diagonal element of <fJ corre-
sponding to that column is 1. Then the model is identified.
The likelihood function is given by (0 of Section 14.3. The derivatives of
the likelihood function set equal to 0 are
(3)
( 4) I=W+A<fJA.
( 1) f:= (A'W- A) 1
-1 A'W- 1 (x a - fJ.)
= r- 1A'W- 1 (xa - f1),
(2)
(3) .fi(X)
f
= (W+Atf>A'
tPAt
Atf»
tf> .
(5)
If tP = I, the predictor is
(6)
When r is also diagonal, the jth element of (6) is 'Yj/(1 + 'YJ) times the jth
element of (1). In the conditional distribution of xa given fa (for tf> = J)
(7) S ( f: Ifa ) = ( I + r) -! r fa ,
This last matrix, describing the mean squared error. is smaller than (2)
describing the unbiased estimator. The estimator (5) or (6) is a Bayes
estimator and is appropriate when fa is treated as random.
rROBLEMS
A- (0 Nil)
- X(I) AI) ,
implies
ell
C= ( o
14.2. (Sec. 14.3) For p =:;, m = 1. and A = A. prove 10,;1 = n;~ I( A~ /1]1,,).
14.3. (Sec. 14.3) The EM algorithm.
(a) Iff and U are normal and f and X are obseIVed, show that the likelihood
function based on (xl,.fl), ... ,(x,v,J\,) is
594 fACfOR ANALYSIS
(b) Sho\\. thai when the factor scores are included as data the sufficient set of
statistics is X, j, C u = C,
1 N _
C tf = N L (xa-i)(fa-f)',
cr= I
N
CfJ'= ~ L (fa-j)(f,,-j)'.
a=\
(c.:l Show that the conditional expectations of the covariances in (b) given
X=(xl' .... x N ), A, «1>, and 'I' are
ct· = $( CffIX, A, «1>, '1') = «I> A '('I' + A «I> A ') -ICxX<W + A«I>A') -I A«I>
(d) Show that the maximum likelihood estimators of A and 'I' given «I> = 1 are
A~ -- C* C*-I ,
.,[ [[
~ = C*xx - C*
x[ C*-l
[[ C*'
x['
CHAPTER 15
Patterns of Dependence;
Graphical Models
15.1. INTRODUCTION
595
596 [>A·lTIc.RN~ 01· DEPIc.NLJENCE; GRAPHICAL MODELS
A graph is a set of vertices and edges, G == (V, E). Each vertex is identified
with a random vector. ]n this chapter the random variables have a joint
normal distribution. Each undirected edge is a line connecting two vertices. It
is designated by its two end points; (u, v) is the same as (v, u) in an
undirected graph (but not in directed graphs).
Two vertices connected by an edge are called adjacent; if not connected by
an edge, they are called nonadjacent. In Figure 15.l(a) all vertices are
15.2 UNDIRECrED GRAPHS 597
jb
L Ii
.b
a. • a • a a
c c c c
nonadjacent; in (b) a and b ale adjacent; in (c) the pair a and b and the pair
a and c are adjacent; in (d) every pair of vertices are adjacent.
The family of (normal) distributions associated with G is defined by a set
of requirements on conditional distributions, known as Markov properties,
Since the distributions considered here are normal, the conditions have to do
with the covariance matrix ~ and its inverse A = ~-I, which is known as the
concentration matrix. However, many of the lemmas and theorems hold for
nonnormal distributions. We shall consider three definitions of Markov and
then show that they are equivalent.
In symbols
(1)
where .1l means independence and V\ Cu, v) indicates the set V with II and v
deleted. The definition of pairwise Markov is that PUlJ V\II',I) = 0 for all pairs
for which (u, v) ff:. E. We may also write u.1l vI V\ Cu, v).
Let ~ and A >= ~ - I be partitioned as
(2) A= [A'4A
ABA
(3)
The condition d covariance matrix is
( 4)
598 PATfERNS OF DEPENDENCE; GRAPHICAL MODELS
If A =(1. 2) and B = (3, ... , p), the covariance of XI and X 2 given X 3 , ••• , Xp
is al~.~ p in LA B = (a ],3 .. p), This is 0 if and only if AI2 = 0; that is, 'I A . B
'
is diagonal if and only if A A A is diagonal.
(5)
Proof The relations (6) imply that the density of X, Y, and Z can be
written as
which is the density generating (7). Conversely, (9) can be written as either
form in (8), implying (7). •
The relations in Theorem 15.2.2 and Corollary 15.2.1 are sometimes called
the block independence theorem. They are based on positive densities, that is,
nonsingular normal distributions.
Proof Suppose the graph is locally Markov (Definition 15.2.3). Let u and
v be nonadjacent vertices. Because v is not adjacent to u, it is not in bd(u);
hence,
(12)
(14)
•
Theorem 15.2.4. A pairwise Markov distribution on a graph is locally
Markov.
(16)
(17)
imply
(18)
(19)
•
A third notion of Markov, namely, global, requires some definitions.
600 PATIERNS OF DEPENDENCE; GI~APHICAL MODELS
Thus S separates Band C if for every sequence of vertices vo, VI' ... , vn
with Vo E B and v" E C at least one of VI"'" V,,-l is a vertex in S. Here B
and/or Care nonempty, but S can be empty.
(2Q)
(21) l: _
(A.B) S -
[l:AA Il:AB ]_ [l:A'
IBA I ]l:-I[l:
SS SA
I SB ]
BB BS
~ [AAA AABr
ABA ABB
Proof Let the set B be i, the set C be j not adjacent to i. and the set A
the rest of t he variables. Any path from B to C must inel ude elemt~nts of A,
Hence i is independent of j in the distribution conditioned on the other
variables. •
By Corollary 15.2.1
(23)
(24)
Theorems 15.2.3, 15.2.5, and 15.2.6 show that the three Markov properties
are equivalent: anyone implies the other two. The proofs here hold fairly
generally, but in this chapter a nonsingular mult'variate normal distribution is
assumed: thus all densities are positive.
.. Maximal"' means that if another vertex from V is added to the set, the set
will 110 longer be complete. A clique can be constructed by starting with one
vertex. say VI' If it is not adjacent to any other vertex, VI alone constitutes a
clique. If u: is adjacent to VI [( VI' v:!) E E], continue constructing a clique
with VI and v'2 ill it until a maximal complete subset is obtained. Thus every
vertex is a member of at least one clique, and every edge is included in at
least one clique.
(28)
(29)
(30)
(31)
(32)
(33)
604 PATrERNS OF' DEPENDENCE; GRAPHICAL MODELS
.
difficulty grade recommendation
.-----.... 3 - - - - - - - + •4
1
~
2.
~
---------+~ • 5
IQ SAT
Figure 15.2
where
(34)
•
In turn fAUS<X AUS ) and fBus(x BUS ) can be factorized, leading to (28).
(1)
(2)
15.3 DIRECfED GRAPHS 605
(3) X3 = f3:11 XI + f3 32 X 2 + u 3 ,
( 4) X 4 = f3 43 X ~ + II 4 •
(5) X5 = f3 52 X 2 + 115'
where u I' u 2 ' ll;., U 4 ' u 5 arc mutlJally i ndepe nd~n t 1I nohseIVcu variahlcs.
Wold (1960) called such models causal chaills. Note that the matrix of
coefficients is lower triangular. In general X, may depend on Xl ..... X, I '
The recursive linear system (J) to (5) ge nerates the reeur~ive factorizatilHl
(6) fm45(xl,x2,x;.,x4'x~)
(10) de(u)={vlu<v},
and u is an ancestor of v,
pa(v)
C.w
w
•
Figure 15.3
Note that
(13 ) vlLwlnd(v)\w.
(14) vlL[nd(v)\pa(v)]lpa(v).
where Uh ••• ' UT are independent N(O, 0' 2) variables and Yo has distrioution
N[O, 0'2/0 - p)2)]. In this case given Yt. the future Yf+l> ... , YT is indepen-
dent of the past Yo" .. ,Yt-I'
Proof The proof is the same as the proof of Theorem 15.2.3 for undi-
rected graphs. •
Theorem 15.3.3. A finite ordered set (V,:s;) admits at least one well-
numbering.
Lemma 15.3.1. A finite, partially ordered set (V, :s;) has at least one
maximal element a* .
u = V(IO) -l> V(ll) -l> '" - . V(I.) = v the indices satisfy io ~ i l .$ ".$ in' The
well~numbering is not necessarily unique. Since V is finite, a maximal
element can be found by comparing v, and v} for at most n(n - 1)/2 pairs.
(16) vf JL ( VI,' . " V, I) \ pa( v() I pa( v,), i=3, ... ,I1.
Recursive Factorization
The recursive aspect of the acyclic directed graph permits a systematic
factorization of the density. Use the construction of Theorem 15,3.4. Let
n = I VI ; then v" i.s a maximal element of V. Then
(17)
At the jth step let v,,_) +I be a maximal element of V\ (vn , .. , , v,,_, +: ), Then
(20)
Thus
(21)
(22)
(23)
(24)
whae E 1, ..• , Ell are independent random vectors with 0E j Ej = I). In matrix
form (23) to (26) are
1 0 0 0 El
-B21 1 0 0 E2
B= -B 31 - B;.2 1 0 , E= E3
(29)
A chain graph includes both directed and undirected edges; however, only
certain patterns of vertices and edges are permitted. Suppose the set of
vertices V of the graph G = (V, E) can be partitioned into subsets V =
V(1) U ", u V( T) so that within a subset the lertices are joined by undi-
rected edges and directed edges join vertices in different subsets. Let greG)
be the set of vertices 1, ... , T and let $(G) be the (directed) edge set such
that T -+ (J" if and only if there is at least one element U E V( T) and at least
one element v E V( (J") such that u -+ v is in E, the edge set of G. Then
~v(G) = [:T(G), J."(G)] is an acyclic directed graph; we can define pa q>( T),
etc., for ~ 1) (G).
Let XT=- (X"lll E V(T)}. Within a set the vertices form an undirected
graph rdative to tht: probability distribution conditional on thc pa~t (that is,
earlier sets). See Figure 15.4 [Lauritzen (1996)] and Figure 15.5.
We now define the Markov properties as specified by Lauritzen and
Wermuth (1989) and Frydenberg (1990):
V(1)
I:
V(3)
V(2)
Figure 15.4. A chain graph.
V(2)
Figure 15.5. The corresponding induced acyclic directed graph on V VO) U V(2) U V(3).
(C2) For each T the conditional distribution of Xr given Xpa til(T) is globally
Markov with respect to the undirected graph on V( T).
(C3)
pao(U)
V('t-l) V(-r:)
1 3
2 4
V(1) V(2)
Figure 15.7. A chdin graph.
(C3*)
(3)
( 4) LWF:
(5) AMP:
( 6)
(7)
(8)
(9)
where (e3' e4) has an arbitrary distribution independent of ( el' e), and
hence independent of (X" X 2 ),
In general the AMP model can be expressed as (26) of Section 15.3,
The above form shows t'1at i and S are a pair of sufficient statistics for fJ-
and I, and they are independently distributed, The interest in this chapter is
on the dependences, which depend only on the covariance matrix ~. not fJ-,
For the rest of this chapter we shall suppress the mean, Accordingly. we
suppose that the parent distribution is NCO, I) and the sample is X I' . , , , X'I),
and S = (l/n)'[:=l xa x~. Tt e likelihood function can be written
(2)
= ex p [ --we A) - i .t
1=1
AI/'I/ - E
1<1
AIII II ],
where A satisfies the condition All = 0, (t, j) ~ E. In ihis form the canonical
parameters are All"'" App and AI), (I, j) E E. The canonical variables are
sn,""spp and S'l' (i,j)EE; these form a sufficient set of statistics. To
maximize the likelihood function we differentiate (3) with respect to Ali'
i = L .... p, and All' (i, j) EE, to obtain the equations (4) and (5).
where A = i-I.
This result follows from the general theory of exponential families. See
Lauritzel1 (1996), Theorem 5.3 and Appendix D.l.
Here we shall show that for a decomposable graph the equations (4) and
(5) have a unique positive definite solution by developing Gon algorithm for its
computation. We follow Speed and Kiiveri (1986).
I n(xIO,P)
@
(8) I( PR
I ) =0p ogn(xIO,R)
:. ~HlogIPR-11 +tr(I~PR-l)].
•
Lemma 15.5.2. Let
( to) R12]
R22 •
Then
CD The matrix
(11 )
(12)
(13)
I-P R-l
(14) PQ-I =PR-1 + [. ~I It
(20)
- -:- - - - - -
Note that I II = nf=1 o)RI, where R = (p,,). Given that a" = s", the selec-
tion of PI; to maximize the entropy of the fitted normal distribution satisfying
the requirements also minimizes IRI [Demspter (1972)].
618 PATIERNS OF DEPENDENCE; GRAPHICAL MODEl.}
AAC]
(23) ABC,
Acc
(24)
and
(25)
(26) S(ABlC] = S
S CC ABC'
where
SAA C
(27) S(AB)'C = [ S
BkC
A [(SAA'C
(28) I w
= 0
The matrix S(AB) C has tht: Wishart distribution W[I(ABl'C' n - (PA + PB)]'
where 11.-1 and PH arc the number of componcnts of X A and X n, respectively
15.5 STATISTICAL INFERENCE 619
(29) L],
and Sec has the Wishart distribution W(I ee , n). The matrix S( AB).e and the
matrix B(A8)'e' are independent (Chapter 8).
Consider testing the null hypothesis (25). This is testing the null hypothesis
*'
~jlB.e = 0 against the alternative I A8 .e O. Th: determinant of (26) is
1~nl = ISAB.el . 1Sec! ; the determinant of(28) is IIJ ISAAI ·IS8B I ·ISed·
The likelihood ratio criterion is
(3D)
Since the sample covariance matrix S( A 8).e has the Wishart distribution
W[~(AB)'C' n - (PA + PB)]' where PA and PB are the numbers of components
of XA and X8 (Section 8.2), the criterion is, in effect, upA.PB,n-(PA+PB)'
studied in S0ctions 8.4 and 8.5.
As another example, consider the graph in Figure 15.9. Note that node 4
separates (1,2,3) and (5,6); nodes 1,4 separate 2 and 3; and node 4 separates
5 and 6. These separations imply three conditional independences:
(Xl' X 2 , X 3 ) JL(Xs, X 6 )IX4 • X 2 JL X 3 1(X 1, X 4 ), and Xs JL X 6 1X4 • In terms of
covariances these conditional independences are
~(123)(56H
1
(31) I(123)(56) 1(123)4 I;4 I4(56) = 0,
(32) I 2).(14) I23 - I2(14) 1(i~)(l4) 1(14)3 = 0,
2 5
6
3
Figure 15.9
620 PAITERNSOF DEPENDENCE; GRAPHICAL MODELS
(34)
(35) S=
S62 S66 S64
S42 S46 S44
[S2:" S~')
:
[S~)_,
+ : S44 [S42
=
S22>4 S(l("4 S64
[ S42 S4b I
= s(1...6)(1...6)"4 +s
S (2 •. 6)4 S-IS
44 4·(1...6) S(2">6)4]
[ 4{2 .•. 6) S44 -
The determinant of S is
(36)
If the condition (Xl' X 2 • X 3 )JL(X5 • X 6 )IX4 i<.; imposed, the maximum likeli-
hood estimator is (35) with S( 125)(56)'4 replaced by 0 to obtain
(37)
(39)
Here U(231)(56H has the distribution of UP1 +P,+P3,P5+Pb,n-p (Section 8.4) since
the distribution of S(L.6)(L.6H is W(I(~ .. 6)(~ ... 6H' n - p), independent
of 8 w
The first three rOWs and columns of S(l .. . 6)(2 ... 6)-4 constitute the matrix
(40)
S23'4
8 33 '4
S13'4
(41)
(42)
!::: '
( S21-4) ]
(43)
622 PA TIERNS OF DEPENDENCE; GRAPHICAL MODELS
II / 2
(44) U23-14 •
The statistic U:'I l~ has the distribution of Up2.1'~.II-( PI +Pl+P,+P4) (Section 8.4)
since S(23){:!3) 14 has the distribution W[I(23)(23)<14' n - (PI + pJ] independent
of S(l-t)(l-l)'
The estimator of IIS6)(~6H with I 56 ' 4 = 0 imposed is S(56)(56)-4 with S56'4
replaced by 0 to obtain
(45)
(40)
( I S(56)(56)-4 1 ]11/2 = U /2 1/
The statistic 11 5 4 has the distribution of Up;. Po,II-( P4 + P5 +P6) since S(56)(56)o4
(1
The likelihood ratio criterion for testing the three null hypotheses is
( 48)
When the null hypothcscs are true, the factors L.h3l)(56)-4' U23 <14' and U56 -4 are
independ..;nt. Their distributions are discussed in Sections 8.4 and 8.5. In
particular the moments of these factors are given and asymptotic expansions
of distributions are described.
(50) a 2 =i z +B 2 i,.
In general
N
(51) Bj:=o: E [xJ(cr) - i j ][ xpa(u})(cr) -xpa(u}l]'
a=I
(52) aj=xj+BJipa(u}).
ACKNOWLEDGMENTS
Matrix Theory
(1) A=
(2)
(3)
624
A.1 DEFINITION OF A MATRIX AND OPERATIONS ON MATRICES 625
(4) A+B=B+A,
The matrix (0) with all elements 0 is denoted as O. The operation A + (-IB)
is denoted as A-B.
If A has the same number of columns as B has rows, that is, A = (a l ) ,
i = 1, ... , I, j = 1, ... , m, B = (bjk ), j = 1,,,., m, k 1, ... , n, then A and B
can be multiplied according to the rule
i=l,,,.,/, k=l"."n;
that is, AB is a matrix with 1 rows and n columns, the element in the ith row
and kth column being 'f.7=la 1j bJk • The matrix product has the properties
The relationships (11)-(13) hold provided one side is meaningful (i.e., the
numbers of rows and columns are such that the operations can be performed);
it follows then that the other side is also meaningful. Because of (11) we can
write
the ith row and Jth column. The operation of transposition has the proper-
ties
again with the restriction (which is understood throughout this book) that at
least one side is meaningful.
A vector x with m components Can be treated as a matrix with m rows
and one column. Therefore, the above operations hold for vectors.
We shall now be concerned with square matrices of the same size, which
r.:an be added and multiplied at wilL The number of rows and columns will be
taken to be p. A is called symmetric if A = A'. A particular matrix of
cunsiderable interest is the identity matrix
1 a a a
a 1 a a
( 18) /= a a 1 a = (Sj]),
a a a 1
(19) i =},
=0, i -+ j.
The identity matrix satisfies
(20) LA=A1=A.
where the summation is taken over all permutations (}[, ... , }p> of the set of
integers (1" .. , p), and JUI,' .. ' Jp ) is the number of transpositions required
to change (1,., ., p) into <il' ... '} p). A transposition consists of interchanging
two numbers, and it can be s~own that, although one can transform (1, ... , p)
into Cil" .. ' Jp ) by transpositions in many different ways, the number of
A.I DEFINITION OF A MATRIX AND OPERATIONS ON MATRICES 627
(25)
(26)
since
(28) Az=O
is the trivial one z = 0 [by multiplication of (28) on the left by AI]. If
IAI = 0, there is at least one nontrivial solution (that is, z >1= 0). Thus an
equivalent definition of A being nonsingular is that (28) have only the trivial
solution.
A set of vectors Zl>'" > Zr is said to be linearly independent if there exists
no set of scalars cp ... ,c" not all zero, such that E.~_lCiZi=O. A qXp
628 MA fR IX THEOR Y
p
(29) x'Ax= ~ a/lx/x;,
i,; = 1
where x' = (Xl' ... ' Xp) and A = (a ij ) is a symmetric matrix. This matrix A
and the quadratic form are called positive semidefinite if x 'Ax ~ 0 for all x. If
x' Ax> 0 for all x"* 0, then A and the quadratic form are called positive
definite. In this book positive definite implies the matrix is symmetric.
(32) tr An tr BA.
Then
(36) d!')
1/
= 0' i=j+1, ... ,p, j=l, ... ,g-l,
(37) I
ll.~)
')
= (I(g-II
I} • i=l, ... ,g-l, j=l, ... ,p,
a(l)
11 0 0 0
a(2)
0 :!2 0 0
0 a(3)
(39) FAF'= 0 33 0
a(p)
0 0 0 pp
Proof. Let FAP' = D2, and let D be the diagonal matrix whose diagonal
elements are the positive square roots of the diagonal elements of D2. Then
C = D- 1 F serves the purpose. •
The characteristic roots of a square matrix B are defined as the roots of the
characteristic equation
(1) IB - All = o.
Alternative terms are latent roots and eigenvalues. For example, with
we have
The degree of the polynomial equation (1) is the order of the matrix Band
the constant term is IB I.
A matrix C is said to be orthogonal if C'C = I; it follows that CC' = I. Let
the vectors ;)" = (xl"'" xp) and y' = (Y1"'" yp) represent two points in a
p-dimensional Euclidean space. The distance squared between them is
D( x, y) = (x - y y(x - y). The transformation z = Cx Can be thought of as a
change of coordinate axes in the p-dimensional space. If C is orthogona!, the
632 MATRIX THEORY
dl 0 0
0 dz 0
( 4) CIBC=D=
0 0 dp
Thus the characteristic roots of B are the diagonal elements of the trans~
formed matrix D.
If .\ is a characteristic root of B, then a vector XI not icentically 0
satisfying
( 6) (B = AJ)x J
0
characteristic vector, and let C=(CI' ... ,cp )' Then C'C=I and BC=CD.
These lead to (4). If a characteristic root has multiplicity m, then a set of m
corresponding characteristic vectors can be replaced by m linearly indepen.
dent linear combinations of them. The vectors can be chosen to satisfy (6)
and xjxi = 0 and xjBx; = 0, i j.*"
A characteristic vector lies in the direction of the principal axis (see
Chapter 11). The characteristic roots of B are proportional to the squares of
the reciprocals of the lengths of the principal axes of the ellipsoid
(7) x' Ilx = I
The roots of such equations ar':! of interest because of their invariance under
certain transformations. In fac , for nonsingular C, the roots of
(10) IC'BC-A(C'AC)I =0
*"
and IC'I = ICI o.
By Corollary A 1.6 we have that if A is positive definite there is a matrix
E such that E' AE = I. Let E' BE = B*. From Theorem A.2.1 we deduce that
there exists an orthogonal matrix C such that C' B* C = D, where D is
diagonal. Defining EC as F, we have the folloVo ing theorem:
At 0 0
0 A2 0
(12) F'BF=
0 0 Ap
(13) F'AF=I,
where Al ~ ... ~ Ap (~ 0) are the roots of (9). If B is positive definite. \ > 0
i= 1, ... ,p.
634 MATR IX THEOR Y
(14)
(15) X=PDQ.
( 16)
where EI is diagonal and positive definite. Let XQ' = Y = (YI Y2 ), where the
number of columns of YI is the order of E1• Then YZY2 = 0, and hence
Y2 = O. Let P! = YI E1 4. Then PIP! = 1. An t1 X n orthogonal matrix P =
(PI pJ satisfying the theorem is obtained by adjoining P 2 to make
P orthogonal. Then the upper left-hand corner of D is EI~' and the rest of iJ
consists of zeros. •
where Al and Ap are the largest and smallest roots of (1), and
(18)
Proof: The inequalities (17) were essentially proved in Section 11.2, and
can also be derived from (4). The inequalities (18) follow from Theorem
A.2.2. •
A.3 PARTITIONED VECfORS AND MATRICES 635
(2)
A12 + B12 )
(3)
A22 + B22 '
Now partition C (n X r) as
(4) C'2 )
C 22 '
where Cll and C'2 have q rowS and Cll and C21 have s columns. Then
12
(5) C )
C22
= (AlICn +A 12 C 21 A ll C 12 + A 12 C22 )
A 21 C ll +A 22 C 21 A 2,C'2 + An C 22 .
636 MATRIX THEORY
To verify this, consider an element in the first p rows and first s columns of
AC. The t, jth element is
The first sum is the i, jth element of ABC II , the second sum is the i, jth
element of A 12 C Z1 ' and therefore the entire sum (6) is the t, }th element of
AIlC Il +A 1Z C 21 , In a similar fashion we can verify that the other submatrices
of AC can be written as in (5).
We note in passing that if A is partitioned as in (2), then the transpose of
A can be written
AI
A' = A'Z! )
(8) ( A'12
II
A' .
22
If A12 .;.., 0 and A21 0, then for A positive definite and All square,
(9) :-1) .
22
The matrix on the right exists because An and An are nonsingular. That the
right-hand matrix is the inverse of A is verified by multiplication:
(10) All
(o
0 ) (A11l 0 )=(1 0J\,
An 0 Ai} 0 1
o 1
( 11)
1 o
The evaluation of the first determinant in the middle is made by expanding
according to minors of the last row; the only nonzero element in the sum is
the last, which is 1 times a determinant of the same form with 1 of order one
A.3 PARTITIONED VECfORS AND MATRICES 637
I o
(12) =
o
(13)
(14) A121=IA
I II I,
Theorem A.J.1. Let the square matrix A be partitioned as in (2) so that A22
is square. If An is nonsingular, let
(15)
Then
(16)
Theorem A.3.2. Let the square matrix A be partitioned as in (2) so that A:2
is square. If A22 is nonsingular,
(18)
638 MATRIX THEORY
c y
IC- yy'l
1 y'
(9) = = C = ICI(1- y'C-Iy).
y' 1 y
(20)
PI) o
A22
)C- 1•
Hence
(23) x' A - I X = (X(I) - A A-I x(2))'A- 1 (X(I) - A A- 1x(2») + X(2), A-I X(2)
12 22 11-2 12 22 22'
(24)
x' A -I X = x(l), A~112X(l) - x(l), A;I~2AI2A221 x(:~)
(26)
Proof We have
(2)
Choose B (r x p) such that
(3)
640 MATRIX THEORY
is nonsingular. Then
(4)
(5)
(6)
where tlie order of I is the number of positive characteristic roots of E and the
order of -I is the number of negative characteristic roots of E.
hI 0 0
0 hz 0
(7) GEGI=
0 0 hp
where h l :2:': ... :2:': hq > 0 > hq+l ~ ..• ~ hp are the characteristic roots of E.
Let
o o o
o l/{ii; 0 o
(8) K=
o o l/"j-hq+1 o
o o o l/"j-hp
AA SOME MISCELLANEOUS RESULTS 641
Then
o
(to) -I
o
where the order of I is the number of positive characteristic roots of C and the
order of - / is the number of negative charactedstic roots, the slim of the orders
being r.
Proof. The proof is the same as that of Theorem A4.l except that Lemma
A.4.l is used instead of Corollary A 1.6. •
( 12)
ox-liJ,
where c = ,;x;-;.
Proof Let the first row of 0 be (I Ic )x'. The other rows may be chosen in
any way to make the matrix orthogonal. •
illBI
(13) ~=B'J'
IJ
i,j=l .... ,p.
642 MATRIX THEORY
Lemma A.4.5. Let b,j= f3,/c I , ... ,cn ) be the i,jth element of a p Xp
matrix B. Then for g = 1, ... , n,
(15)
alBI = .f. alBI. af3ih(c\ ... "c n ) = f. af3/h(c p •.. ,cn )
a g '-- ab iJ g '-- Bih iJc •
c l,h=l Ih c i,I1=I g
alAI
(16) aa=A jp
II
(18)
Since IAI = IBI and B,) = B)I =A!) =A)" (17) follows. •
Theorem A.4.3.
where a/ax denotes taking panial derivatives with respect to each component ofx
and arranging the panial derivatives in a column.
(21) A®B=
Some properties are the following when the orders of matrices permit the
indicated operations:
Theorem A.4.4. Let the 1th characteristic root of A (p Xp) be Ai and the
corresponding characteristic vector be XI = (XII"'" Xpi )', and let the ath root of
B (q X q) be Va and the corresponding cha racterntic vector be Ya ' a = 1, ... , q.
Then the i, a th root of A ® B is Ai va' and the corresponding characteristic vector
isxi®Ya=(XIiY:, ... ,xpIY:)', i= 1, ... ,p, a= 1, ... ,q.
Proof.
AixplBYa
= AlVa
XpiYa
•
644 MATRIX THEORY
Theorem AA.S
(25)
Proof The determinant of any matrix is the product of its roots; therefore
(26)
•
Definition A.4.2. If the p X m matrix A = (a), ... , am)' then vee A =
( ap ..• ,a m
t t ),
.
(29)
where
Then
(31) (aoa)
E .... -.E (a)
ao y E = - y (aao Y)Y_[ •
-I
If 0 = YO:/P then
(32)
A.4 SOME MISCELlANEOUS RESULTS 645
where Ea{3 is a p X P matrix with all elements 0 except the element in the
ath row and 13th column, \\hich is 1; and e. a is the ath column of E and e{3.
is its 13th row_ Thus Be,)! BYa {3 = -e ,a e(3)- Then the Jacobian is the determi-
nant of a p2 X p2 matrix
(33) •
Theorem A.4.7. Let A and B be symmetric matrices with charactensric roots
a l ~ a2 ~ ••• ~ ap and b l ~ b 2 ~ .•. ~ bp , respectively, and let H be a p X P
onhogonal matrix. Then
p p
(34) max tr HAH'B = E alb!, min HA'H'B = E a)b p + I _J •
H j=l H )=1
(35) max
H*
tr H*AH*'B = max
H*
tr H*H0 D 0 H'0 H*'H h D h H'b
p
(36) tr HD" H' Db = E (HDoH') IIb,
i= 1
p-I i p
= E E (HDoH'h(b, - b, + I ) + bp E (HD"H'}))
i=] /=1 )=1
p-I I P
:::; E E a) ( b, - b, + 1 ) + bp E a)
i=l )=1 )=1
by Lemma A.4.6 below. The minimum in (34) is treated as the negative of the
maximum with B replaced by - B [von Neumann (937)]. •
646 MATRIX THEORY
Proof
k p p
(38) 1: 1: PljY} = 1: g,Yj'
i-1j=1 j=I
k p
= 1: (y) Yk)(gj 1)+ 1: (Yj-Yk)g;
j=l j=k+l
:s O. •
Corollary A.4.2. Let A be a symmetric matrix with characteristic roots
a l 2::. a 2 2. ... a p' Then
k
(40) max trR'AR= 1:a i •
R'R=I. i I
(41 )
•
Theorem A.4.S.
(42) 11 + xCI = 1 +x tr C + D( x 2 ).
(43)
•
A.5 ORTHOGONALIZATION AND SOLUTION OF LINEAR EQUATIONS 647
(1) i = 2, . .. ,p.
that is,
(4)
Then
(5) A = V'V= TU'UT' = TT'
as shown in Section 7.2. Note that if V is square. we have decomposed an
arbitrary nonsingular matrix into the product of an orthogonal matrix and an
upper triangular matrix with positive diagonal elements; this is sometimes
known as the QR decomposition. The matrices U and T in (4) are unique.
These operations can be done in a different order. Let V = (v~O), •.. , v~O»).
For k = 1, ... , p - 1 define recursively
t -llv(k-l) II U =
1 V(k-l)
1V ( k - I)
= -tkk
(6) kk - k , k IIvik-I)11 k k ,
such that
i-I
(10) 0= VhW, = V~Vi + L fi)vhVi
/"'1
i-I
( 11)
Let D, be the diagonal matrix with IIw/11 == tif as the jth diagonal element.
Then U = WD t- 1 = VF' Dt- I • Comparison with V = UT' shows that F = DT- I•
Since A = IT', we see that FA = DT' is upper triangular. Helice F is the
matrix defined in Theorem A. 1.2.
There are other methods of accomplishing the QR decomposition that
may be computationally more efficient Or more stable. A Householder matrix
has the form H = In - 2aa', where a'a = 1, and is orthogonal and symmet-
ric. Such a matrix HI (i.e., a vector a) can be selected so that the first
column of H1V has O's in all positions except the fLrst, which is positive. The
next matrix has the form
(12) o
In_1
)-2(0)(0
a
, _(10 a ) -
(13)
A.5 ORTHOGONALIZATION AND SOLUTION OF LINEAR EQUATIONS 649
(14)
where H(I) has p columns. Then from (13) we obtain V = HO)T'. Since the
decomposition is unique, H(l) = u.
Another procedure uses Givens matrices. A Givens matrix G1) is I except
for the elements gil = cos 8 = gii and gi) = sin 8 = -g)i' i =1= j. It is orthogonal.
Multiplication of V on the left by such a matrix leaves all rows unchanged
except the ith and jth; 8 can be chosen so that the i. jth element of G1) V
.v
is O. Givens matrices G 21 , ••. ,Gnl can be chosen in turn so G,Il ... G 2 has
all O's in the first column except the first element, which is positive. Next
G n ,· .. , Gn2 can be selected in turn so that when they are applied the
resulting matrix has O's in the second column except for the first two
elements. Let
(15)
Then we obtain
( 16)
and G(l) = u.
(17) Ax=y,
(18) A*x=y*.
p
( ~O) r = .\.**
. I ,
- ~ *:J. \..
L... (I,) . I'
} =i+ I
these equations are to be solved successively for x P ' x p _ P"" Xl' The calcula-
tion of FA =A* is known as the tOlward solution, and the solution of (18) as
the backward solution.
Since FAF' =A*F' =D2 diagonal, (20) is A**x=y**,where A** =D- 2A*
and y** = D-~ y*. Solving this equation gives
The computation is
(~2)
which is the product of the diagonal elements ot A*, resulting from the
forward solution. We also have
(24)
The forward solution gives a computation for the quadratic form which
occurs in T~ and other statistics.
For more on matrix computations ctmsult Golub and Von Loan (I 989).
APPENDIX B
Tables
TABLEB.l
Wll..KS'LIKEUHOOD CRITERION: FACTORS C(p, m, M)
TO ADJUST TO xi, 1ft' WHERE M = n - p +1
5% Significance Level
p-3
M\m 2 4 6 8 10 12 14 16 I~
5% Significance Level
p=3 p=4
M\m 20 22 2 4 6 8 10 12 14
1 2.021 2.067 1.407 1.451 1.517 1.583 1.644 1.700 l.751
2 1.580 1.616 1.161 1.194 1.240 1.286 1.331 l.373 1.413
3 1.408 1.438 1.089 1.114 1.148 1.183 1.218 1.252 1.284
4 1.313 1.338 1.057 1.076 1.102 1.130 1.159 1.186 l.213
5 1.251 1.273 l.040 1.055 1.076 1.099 1.122 1.145 1.168
6 1.208 1.227 1.030 1.042 1.('59 1.078 1.097 1.118 1.137
7 1.176 1.193 1.023 1.033 1.047 1.063 1.080 1.097 1.115
8 1.151 1.167 1.018 1.027 1.038 1.052 1.067 l.082 1.097
9 1.132 1.147 1.015 1.022 1.032 1.044 1.057 1.070 1.084
10 1.116 1.129 1.012 l.018 1.027 1.038 1.049 1.061 1.073
12 1.092 1.103 1.009 1.014 1.020 1.029 1.038 1.047 1.058
15 1.069 1.078 1.006 1.009 1.014 1.020 1.027 1.035 1.042
20 1.046 1.052 1.003 1.006 1.009 1.013 1.017 1.022 1.027
30 1.025 1.029 1.002 1.003 1.004 1.006 1.009 1.011 1.014
60 l.008 1.009 1.000 1.001 1.001 1.002 1.003 1.003 1.004
00 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 l.000
2
Xpm 79.0819 85.9649 15.5073 26.2962 36.4150 46.1943 55.7585 65.1708 74.4683
652
TABLE B.l (Continued)
- 5% significance Level
p""'5 p=6 p-7
M\m 14 16 2 6 8 10 12 2 4
12 1.029 1.034 ~
1.042 1.028 1.038 Lo34 LoB 1.035 1.039
IS 1.020 1.024 1.031 1.019 1.027 1.023 1.023 1.025 1028
20 1.0n 1.016 1.019 1.012 1.017 1.014 1.015 1.016 1.017
30 1.006 1.008 1.010 1.006 1.009 l.(lO7 1.007 1.008 1.009
60 1.002 1.002 1.003 1.001 1.003 1.002 1.002 1.002 1.002
00 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
2 58.1240 74.4683 90.5312 26.2962 83.6753 28.8693 50.9985 72.15n 31.4104
Xpm
653
TABLEB.l ( Continued)
1 % Significance Level
p=3
M\m 2 4 6 8 10 12 14 16
1 U56 1.514 1.649 1.763 1.862 1.949 2.026 2.095
2 1.131 1.207 1.282 1.350 1.413 1.470 1.523 1.571
3 1.070 1.116 1.167 1.216 1.262 1.306 1.346 1.384
4 1.043 1.076 1.113 1.150 1.187 1.221 1.254 1.285
5 1.030 1.054 1.082 1.112 1.141 1.170 1.198 1.224
6 1.022 1.040 1.063 1.087 1.112 1.136 1.159 1.182
7 1.016 1.031 1.050 1.070 1.091 1.111 1.132 1.152
8 1.013 1.025 1.041 1.058 1.075 1.093 1.111 1.129
9 1.010 1.021 1.034 1.048 1.064 1.080 1.095 1.111
10 1.009 l.017 1.028 1.041 1.055 1.069 1.082 1.097
12 1.006 1.012 1.021 1.031 1.042 1.053 1.064 1.076
15 1.004 1.009 1.014 1.021 1.030 1.038 1.047 1.056
20 1.002 1.005 1.009 1.013 1.019 1.024 1.030 1.036
30 1.001 1.002 1.004 1.007 1.009 1.012 1.016 1.019
60 1.000 1.001 1.001 1.002 1.003 1.004 1.005 1.006
00 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
2
Xpm 16.8119 26.2170 34.8053 42.9798 50.8922 58.6192 66.2062 73.6826
TABLEB.l (Continued)
1 % Significance Level
p~3 p-4
M\m 18 20 22 2 4 6 8 10
1 2.158 2.216 2.269 1.490 1.550 1.628 1.704 1.774
2 1.616 1.657 1.696 1.192 1.229 1.279 1.330 1.379
3 1.420 1.453 1.485 1.106 1.132 1.168 1.207 1.244
4 1.315 1.344 1.371 1.068 1.088 1.115 1.146 1.176
5 1.249 1.274 1.297 1.047 1.063 1.085 1.109 1.134
6 1.204 1.226 1.246 1.035 1.048 1.066 1.086 1.107
7 1.171 1.190 1.209 1.027 1.037 1.052 1.070 1.088
8 1.146 1.163 1.180 1.021 1.030 1.043 1.053 1.073
q 1.127 1.142 1.157 1.017 1.025 1.036 1.048 1.062
10 1.111 1.125 1.139 1.014 1.021 1.030 1.041 1.054
12 1.087 1.099 1.110 1.010 1.015 1.023 1.031 1.041
15 1.065 1.074 1.083 1.007 1.010 1.016 1.022 1.029
20 1.043 1.049 1.056 1.004 1.006 1.010 1.014 1.019
30 1.023 1.027 1.031 1.002 1.003 1.005 1.007 1.009
60 1.007 1.009 1.010 1.000 1.001 1.001 1.002 1.003
00 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
2
Xpm 81.0688 88.3794 95.6257 20.0902 31.9999 42.9798 53.4858 63.6907
654
TABLE B.1 (Continued)
1 % Significance Level
p-4 p-5
M\m 12 14 16 18 20 2 4 6
1 1.838 1.896 1.949 1.999 2.045 1.606 1.589 1.625
2 1.424 1.467 1.507 1.545 1.580 1.248 1.253 1.284
3 1.280 1.314 1.347 1.378 1.408 1.141 1.150 1.175
4 1.205 1.234 1.261 1.287 1.313 1.092 1.101 1.121
5 1.159 1.183 1.207 1.230 1.252 1.065 1.074 1.090
6 1.128 1.149 1.169 1.189 1.208 1.049 1.056 1.070
7 1.106 1.124 1.142 1.160 1.177 1.038 1.044 1.056
8 1.089 1.105 1.121 1.137 1.153 1.031 1.036 1.046
9 1.076 1.091 1.105 1.119 1.133 1.025 1.030 1.039
10 1.066 1.079 1.092 1.105 1.118 1.021 1.025 1.033
12 1.0S1 1.062 1.073 1.083 1.094 1.015 1.019 1.025
15 1.017 1.045 1.053 1.062 1.071 1.010 LOU 1.017
20 1.024 1.029 1.035 1.041 1.047 1.006 1.008 1.011
30 1.012 1.015 1.019 1.022 1.026 1.003 1.004 1.005
60 1.004 1.005 1.006 1.007 1.008 1.001 1.001 1.001
00 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
2
Xpm 73.6826 83.5134 93.2168 102.8168 112.3292 23.2093 37.5662 50.8922
655
TABLE B.l (Cantin/led)
1% Significance Level
p-6 p-7
M\m 10 12 2 4 6 8 10
1 1.687 1.722 1.797 1.667 1.642 1.648 1.660
2 1.348 1.378 1.348 1.305 1.306 1.321 L342
3 1.230 1.255 1.207 u8B 1.194 1.210 1.229
4 1.169 1.191 1.140 1.130 1138 1.152 1.169
5 1.131 1.150 1.102 1.097 1.105 1.117 1.132
6 1.106 1.122 1.078 1.076 1.083 1.094 1.107
7 1.087 1.102 1.062 1.061 1.067 1.077 1.089
8 1.074 1.086 1.050 1.050 1.056 1.065 1.075
9 1.063 1.075 1.042 1.042 1.047 1.055 1.065
10 1.055 1.065 1.035 1.036 1.041 1.048 1.056
12 1.042 1.051 1.026 1.027 1.031 1.037 1.044
15 1.030 1.037 1.018 1.019 1.022 1.025 1.032
20 1.019 1.024 1.011 1.012 1.014 1.017 1,020
30 1.010 1.013 1.005 1.006 1.007 1.00t) 1.011
60 1.003 1.004 1.001 1.002 1.002 1.003 1.003
IX) 1.000 1.000 1.000 1.000 1.000 1.000 1.000
2 29.1412
Xpm 88.3794 102.816 48.2782 66.2062 83.5134 100.425
656
TABLE B.2
TABLES OF SIGNIFICANCE POINTS FOR THE LAWLEy-HOTELLING TRACE TEST
Pr{ ~ W ~ X .. } ... a
5% Significance level
p-2
n\m 2 3 4 5 6 8 10 12 15
2 9.859· 10.659· 11.098· 11.373· 11.562· 11.804· 11.952· 12.052· 12.153·
3 58,428 58.915 59.161 59.308 59.407 59.531 59.606 59.655 59.705
4 23.999 23.312 22.918 22.663 22.484 22.250 22.104 22.003 21.901
5 15.639 14.864 14.422 14.135 13.934 13.670 13.504 13.391 13.275
6 12.175 11.411 10.975 10.691 10.491 10.228 10.063 9.949 9.832
7 10.334 9.594 9.169 8.893 8.697 8.440 8.277 8.164 8.048
8 9.207 8.488 8.075 7.805 7.614 7.361 7.201 7.090 6.975
10 7.909 7.224 6.829 6.570 6.386 6.141 5.<J84 5.875 5.761
12 7.190 6.528 6.146 5.894 5.715 5.474 5.320 5.212 5.100
14 6.735 6.090 5.717 5.470 5.294 5.057 4.905 4.798 4.686
18 6.193 5.571 5.209 4.970 4.798 4.566 4.416 4.309 4.198
20 6.019 5.405 5.047 4.810 4.640 4.410 4.260 4.154 4.042
25 5.724 5.124 4.774 4.542 4.374 4.147 3.998 3892 3.780
30 5.540 4.949 4.604 4.374 4.209 3.983 3.835 3.729 3.617
35 5.414 4.829 4.488 4.260 4.096 3.872 3.724 3.618 3.505
40 5.322 4.742 4.404 4.178 4.014 3.791 3.643 3.538 3.425
50 5.198 4.625 4.290 4.066 3.904 3.682 3.535 3.429 3.315
60 5.118 4.549 4.217 3.994 3.833 3.611 3.465 3.359 3.245
70 5.062 4.496 4.165 3.944 3.783 3.562 3.416 3.310 3196
80 5.020 4.457 4.127 3.907 3.747 3.526 3.380 3.274 3.159
100 4.963 4.403 4.075 3.856 3.696 3.476 3.330 3.224 3.109
200 4.851 4.298 3.974 3.757 3.598 3.380 3.234 3.127 3.012
00 4.744 4.197 3.877 3.661 3.504 3.287 3.141 3.035 2.918
-Mulliplyby 10 2•
657
TABLE B.2 (Continued)
1% Significance Level
p-2
n\m 1. 3 4 5 6 8 10 12 15
tMuJtiply by 10 4
'Multiply by lO~
658
TABLE B.2 ( Continued)
5% Significance Level
p=3
n\m 3 4 5 6 8 10 12 15 20
659
TABLE 8.2 (Continued)
100 7.793 7.081 6.614 6.281 5.830 5.534 5.323 5.096 4.850
200 7.498 6.808 6.356 6.032 5.593 5.304 5.096 4.873 4.627
00 7.222 6.554 6.116 5.801 5.373 5.089 4.885 4.664 4.419
tMultiply by 10 4 •
*Multiply by 10 2 •
660
TABLE B.2 (Continued)
5% Significance Level
p=4
n\m 4 5 6 8 10 12 15 20 25
• Multiply by 10 2 .
661
TABLE B.2 ( Continlled)
1% Significance Level
p=4
n\m 4 5 6 8 10 12 IS 20 25
4 12.491t 12.800t n012t 13.283 t 13.449 t 13. 561t . 13.67 t 13.79 t 13.87 t
5 9.999· 10.004· 10.008· 10.012· 10.014· 10.016· 10.018· 10.02· 10.02·
6 1.938· 1.906· 1.885· 1.857 • 1.840· ·L828· 1.816 • 1.804· 1.797 •
7 85.053 82.731 81.125 79.047 77.759 76.882 75.989 75.082 74.522
8 51.991 SO.178 48.921 47.290 46.276 45.583 44.877 44.156 43.715
10 29.789 28.478 27.566 26.376 25.632 25.121 24.597 24.060 23.731
12 21.965 20.889 20.138 19.154 18.534 18.108 17.668 17.215 16.936
14 18.142 17.199 16.539 15.670 15.121 14.742 14.349 13.943 13.691
16 15.916 15.059 14.457 13.662 13.157 12.807 12.444 12.066 11.831
18 14.473 13.674 13.112 12.368 11.894 11.564 11.221 10.863 10.639
20 13.466 12.710 12.177 11.470 11.018 10.703 10.374 10.030 9.814
100 8.739 8.211 7.833 7.321 6.985 6.744 6.486 6.204 6.019
200 8.354 7.848 7.484 6.990 6.664 6429 6.176 5.898 5.714
00 8.000 7.513 7.163 6.686 6.369 6.140 5.892 5.616 5.432
tMultiply by 10'
• Multiply by 10~
662
TABLE 8.2 (Continued)
5% Significance Level
p=5
n\m 5 6 8 10 12 15 20 25 40
100 8.197 7.945 7.597 7.365 7.197 7.014 6.813 6.680 6.455
200 7.850 7.607 7.271 7.045 6.881 6.702 6.503 6.370 6.142
00 7.531 7.295 6.970 6.750 6.590 6.414 6.217 6.084 5.850
tMultiply by 10 4 •
• Multiply by 10 2 •
663
TABLE B.2 ( Continued)
1% Significance Level
p==5
n\m 5 6 8 10 12 15 20 25 40
100 9.793 9.374 8.806 8.432 8.164 7.876 7.561 7.355 7.009
200 9.306 8.907 8.363 8.004 7.745 7.465 7.157 6.953 6.606
00 8.863 8.482 7.961 7.6l5 7.365 7.093 6.790 6.588 6.236
·Multiply by 10 2 •
664
TABLE B.2 ( Contin1led)
5% Significance Level
p=6
n\m 6 8 10 12 15 20 25 30 35
100 9.360 8.976 8.720 8.534 8.333 8.110 7.963 7.857 7.777
200 8.910 8.542 8.295 8.115 7.919 7.701 7.555 7.449 7.369
500 8.659 8.300 8.059 7.882 7.689 7.473 7.328 7.222 7.140
1000 8.579 8.223 7.983 7.808 7.616 7.400 7.255 7.149 7.067
1% Significance Level
p-6
n\m 6 8 10 12 15 20 25 30 35
10 86.397 83.565 81.804 80.602 79.376 78.124 77.360 76.845 76.474
12 46.027 44.103 42.899 42.073 41.227 40.359 39.826 39.466 39.206
14 32.433 30.918 29.966 29.309 28.634 27.936 27.507 27.215 27.004
16 25.977 24.689 23.875 23.311 22.729 22.126 21.753 21.498 21.314
18 22.292 21.146 20.418 19.913 19.389 18.844 18.505 18.273 18.105
20 19.935 18.886 18.217 17.752 17.267 16.761 16.445 16.229 16.071
100 10.917 10.295 9.886 9.592 9.276 8.930 8.703 8.541 8.419
200 10.312 9.723 9.333 9.052 8.748 8.412 8.190 8.030 7.908
500 9.980 9.409 9.030 8.755 8.458 8.128 7.907 7.747 7.625
1000 9.874 9.308 8.933 8.661 8.365 8.037 7.817 7.657 7.534
666
TABLE B.2 ( ContinI/cd)
5% Significance Level
p-7
n\m c.; 10 12 15 20 25 30 35
10 85.040 84.082 83.426 82.755 82.068 81.648 81.364 81.159
12 42.850 42.126 41.627 41.113 40.583 40.257 40.037 39.877
14 29.968 29.373 28!~61 28.534 28.091 27.817 27.631 27.49j
16 24.038 23.519 23.158 22.781 22.389 22.145 21.978 21.857
18 20.692 20.222 19.893 19549 19.189 18.964 18.809 18696
20 18.561 18.125 17.819 17.498 17.159 16.947 16.800 16.694
25 15.587 15.202 14.930 14.642 14.337 14.143 14.009 13.9.tl
30 14.049 13.693 13.440 13.172 12.884 12.701 12.573 12.478
35 13.113 12.776 12.535 12.278 12.002 11.825 11.700 11.608
40 12.485 12.160 11.927 11.679 11.411 11.237 11.115 11.025
SO 11.695 11.386 11.165 10.927 10.668 10.500 10.381 10.292
60 11.219 10.921 10.706 10.475 10.221 10.056 9.938 9.850
70 10.901 10.610 10.400 10.173 9.923 9.760 9.643 9.555
80 10.674 10.388 10.181 9.957 9.710 9.548 9.432 9.344
100 10.371 10.091 9.889 9.669 9.426 9.265 9.150 9.062
200 9.812 9.545 9.350 9.138 8.902 8.744 8.629 8.542
500 9.504 9.244 9.054 8.846 8.613 8.456 8.342 8.254
1000 9.405 9.148 8.959 8.753 8.521 8.365 8.250 8.162
00 9.308 9.053 8.866 8.661 8.431 8.275 8.160 8.072
667
TABLEB.2 (Continlled)
1% Significance Level
p-7
n\m 8 10 12 15 20 25 30 35
10 185.93 182.94 180.90 178.83 176.73 175.44 174.57 173.92
12 71.731 69.978 68.779 67.552 66.296 65.528 65.010 64.636
14 44.255 42.978 42.099 41.197 40.269 39.698 39.311 39.032
16 33.097 32.057 31.339 30.599 2~.834 29.361 29.039 28.806
18 27.273 26.374 25.750 25.105 24.435 24.019 23.735 23.529
20 23.757 22.949 22.388 21.804 2l.195 20.816 20.556 20.367
668
TABLE 8.2 (Conrinued)
5% Significance Level
p=8
n\m 8 10 12 15 20 25 30 35
14 42.516 41.737 41.198 40.641 40.066 39.711 39.470 39.296
16 31.894 31.242 30.788 30.318 29.829 29.525 29.318 29.167
18 26.421 25.847 25.446 25.028 24.591 24.319 24.132 23.996
20 23.127 22.605 22.239 21.856 21.454 21.201 21.028 20.902
669
TABLE B.2 (Conrinued)
1% Significance Level
p=
n\m 8 10 12 15 20 25 30 35
14 65.793 64.035 62.828 61.592 60.323 59.545 59.019 58.639
16 44.977 43.633 42.707 41.754 40.771 40.164 39.753 39.456
18 35.265 34.146 33.373 32.573 31.745 31.232 30.882 30.629
20 29.786 28.808 28.129 27.425 26.691 26.235 25.924 25.697
67[1
TABLE 8.2 ( Conrinued)
5% Significance Level
p = 10
n\m 10 12 15 20 25 30 35
671
TABLEB.2 ( Continued)
1% Significance Level
p = 10
n\m 10 12 15 20 25 30 35
14 180.90 178.28 175.62 172.91 171.24 170
16 89.068 87.414 85.270 83.980 82.91 82.2 81.7
18 59.564 58.328 57.055 55.742 54.933 54.384 53.990
20 45.963 44.951 43.905 42.821 42.150 41.693 41.362
672
TABLE 8.3
TABLES OF SIGNIFICANCE POHrTS FOR THE BARTLETT-NANDA-PlLLAl TRACE TEST
----------------
tJ
I~ I 2 3 ~ 5 6 7 8 9 10 15 20
13 5.~9 ~. 250 3.730 3.~3O 3.229 3.OB2 2.970 2,581 2.808 2. 7~7 2.545 2.0431
15 5.567 ~.310 3.782 3. ~76 3.271 3.122 3.008 2.917 2.SA2 2.779 2.572 2.455
19 5.659 ~.396 3.858 3.5~6 3.336 3.183 3.066 2.972 2.895 2831 2.616 2 ~93
23 5.718 ~ . ...s3 3.91\ 3.595 3383 3.228 3.109 3013 2.935 2.669 2.650 2.52~
27 5.759 ~. ~95 3.950 3.632 3. AI8 3.261 3. I~I 3.045 2.966 2.899 2.677 2.5A8
05 33 5.801 ~.539 3.992 3.672 3.~56 3.299 3. 178 3. OBI 3.001 2.934 2.709 2.578
~3 5.SA5 ~.586 ~.037 3.716 3.~99 3.3~1 3.219 3.122 3.041 2. 97~ 2. 7~6 2.613
63 5.891 ~.6J5 ~.086 3. 76~ 3.5A7 3.389 3266 3.169 3.088 3.020 2791 2.657
83 5. 91~ ~. 661 ~. 112 3.790 3.573 3 ~15 3 293 3.195 3. II~ 3.046 2.818 2683
123 5.938 ~. 6SS ~. 139 3.818 3601 3. ~~3 3.321 3.223 3 1~3 3 075 2. SAl 2 713
2~3 5.962 ~. 715 ~. 168 3 SA6 3630 3 ~72 3351 3 25~ 3. 17~ 3.106 2. sao 2748
5.991 ~. 74-1 ~. 197 3.877 3.661 3.5O~ 3.J8.C 3.287 3.208 3. I~I 2.918 2.7SS
""
13 7.~99 5.~09 ~.S70 ~.09~ 3.780 3.555 3383 3.2.e 3.138 3.047 2.751 2.587
I 15 7.710 5.539 ~. 671 ~.I80 3.857 3.625 3.~.e 3.309 3.196 3. 101. 2.795 2 625
19 8.007 5.732 ~. 82~ ~. 3123.976 3.734 3.550 3.~05 3.287 3 ISS 2667 2.686
23 8.206 5.868 ~.935 ~.~09 ~ 06~ 3.815 3.627 3. ~78 3.356 3.255 2.923 2.735
27 8.3~9 5.970 5.019 ~. 131 3 878 3.686
~. -48J 3.53~ 3.~10 3.307 2.968 2.775
,01 33 8.500 6.080 5.111 ~.566 A.207 3950 3 75~ 3600 3. ~73 3.368 3.021 2 823
~3 8. 660 6.201 5.21~ ~.659 ~. 29~ ~.032 3.833 3.675 3. 5~7 3 ~39 3.085 2 SSI
63 8.831 6.333 5 329 ~ 76A ~ 393 A 127 3.925 3.765 3.63~ 3.525 3.163 2.955
83 8.920 6. ~04 5.392 ~ 823 A ~~9 A.181 3.977 3.815 3.68~ 3.57~ 3.210 3.000
123 9.012 6. ~7~ 5459 ~.885 A.5OB ~ 238 ~ 033 3 871 3.739 3.628 3263 3.052
2~3 9. lOB 6.556 5.529 ~.951 A.572 ~ 301 ~.095 3 932 3800 3.689 3323 3 113
"" 9.210 6.638 5 60A 5.023 ~ 6~2 ~.369 ~. 163 ~.OOO 3.867 3.757 3.393 3.185
tJ
~ 1 2 3 ~ 5 6 7 8 9 10 IS 20
I~ 6.989 5.595 5.019 ~.684 ~. ~58 ~ 293 ~.I65 ~.063 3.979 3.901! 3.672 3.537
16 7.095 5.673 5.OB2 A.738 ~.507 ~.338 ~. 207 ~. 103 ~.017 3.9~~ 3.702 3563
20 7.2043 5.787 5.177 ~.822 ~.583 ~.~09 ~. 27~ ~. 166 ~.on ~.002 3.751 3.606
2~ 7.341 5.866 5. 2~5 ~.883 A.639 ~. ~61 ~.323 ~.213 ~.I22 ~.046 3.790 36~
28 7. ~IO 5.925 5.295 ~ 929 ~ 682 ~.501 ~.362 ~. 250 ~.I58 ~.OBI 3.821 3.668
34 7. .e2 5.987 5.351 ~. 980 ~.73O L5~7 ~. ~06 ~.293 ~.200 ~. I"!I 3.857 3.'702
.05 ~~ 7.559 6.055 5. ~12 5.037 ~. 7SA ~.599 ~. 457 ~.342 ~.2.e ~. 169 3.901 3.7043
6.4 7.639 6.129 5.A80 5.101 ~.SA6 ~. 660 ~.516 ~.~ ~.305 ~.225 3.955 3.795
SA 7.681 6.168 5.517 5.137 ~. sao ~. 693 ~ . .s.9 ~.433 ~.338 ~. 257 3.986 3.826
12~ 7. 72~ 6.209 5.556 5. 17~ ~.917 ~.730 ~.585 ~.~9 ~. 37~ ~.293 ~.022 3.662
2~~ 7.768 6.251 5.597 5. 21~ ~. 957 ~. 769 ~. 62~ ~.508 ~. ~13 ~.333 ~.063 3.~
7 815 6.296 5.6.40 5.257 ~.m ~.812 ~.667 ~.552 ~, 457 ~.377 ~. 110 3.9.s.
""
I~ 8.971 6.855 5.970 5.457 5. 112 ~.662 ~.669 ~.516 ~.39O ~.285 3.939 3.7043
16 9.245 7.006 6.083 5.551 5. 195 ~.937 ~. 738 ~. 581 ~.451 ~.3A3 3.986 3.783
20 9.639 7.236 6258 5.698 5.326 5.056 ~.SA9 ~. 684 ~ . .s.9 ~.436 ~.063 USo
2~ 9.910 7.~03 6.387 5.808 5.-42~ 5. 1~6 ~.933 ~. 76.4 ~.625 ~.509 ~. 12~ 3.9OJ
28 10.106 7.528 6. -486 5.893 5.501 5.217 ~.m ~.827 ~.685 ~.567 ~. 17~ 3.9.e
34 10.317 7.667 6.598 5.990 5.588 5.298 5.076 4.900 ~. 756 ~.6J5 ~.233 ~.ool
.01 ~~ 10.545 7.821 6. 72~ 6.101 5.690 5.393 5.167 ~. 986 ~.839 ~. 715 ~.305 ~. 067
6.4 10.790 7.99~ 6.667 6230 5.809 5.505 5.27~ 5.090 ~.939 ~.813 ~.39~ ~. 151
SA 10. 920 8.088 6. 9~7 6.301 5.876 5.569 5.335 5.150 ~.998 ~.871 ~.~ ~. 203
12~ II. 056 8. ISS 7.032 6.379 5.9.e 5.639 5.~03 5.216 5.063 ~.935 ~.510 ~.26'3
2« II. 196 8.29~ 7. 12~ 6. ~63 6.028 5.716 5.~78 5.290 5.136 5.007 ~. 581 ~~
"" I\. 345 8.~06 7.222 6. s.s. 6.116 5.801 5.562 5.372 5.218 5.089 ~. 66-4 ~. ~19
673
TABLE B.3 (Continued)
-~.-.-- --------~.~ - ---~ - --.
.. ...... ..------~-~-.--
P-4
a i'0~ml
I '...
2 3 4 5 6 7 B 9 10 15 20
~--l-
15 B.331 6.859 6.245 5.885 5.642 5.462 5.323 5.; 12 5.119 5.04\ 4.119 4.6'11
.I 17
21
B.472
B.671
6952
7.091
6.31B
6.429
5.947
6.00
5.696
5.782
5512 5.369 5.:55 5.160
5.591 '5.443 5. ~2A 5.225
5.080
5.143
4.BII 4.654
4. f36.4 4.701
25 B.805 7.190 6.510 6.114 5. e.46
5.6-50 5.A9B 5.377 5.276 5.191 4.906 4.738
I 29 B.901 7.263 6.571 6.168 5.B96 5.696 5.542 5.41B 5.316 5.230 4.939 4.768
.OS I 35
45
OS
9. QO.l
9.113
9.229
7.~3
7.431
7.52B
6.640
6.716
6.&02
6.229
6.298
6.37B
5.952
6.01l
6,092
5.7<49 5.593 5.467 5.363
5 811 5.6$2 5.524 5.41B
5.663 5.721 5.592 5. .c&5
5.275
5.329
5.395
4.9&0 4.805
5.029 4.851
5.090 4.910
85 9.291 7.580 6.SA9 6.42' 6.1~ 5.923 5.761 5.631 5.523 5.432 5. 127 4.945
125 9.35A 7.635 6.899 6.1.6'1 6.179 5.968 5.~ 5.67. '<:.566 5.475 5.168 4.987
245 9.419 7.693 6.952 6.519 6.22B 6.016 5.852 5.721 5.613 5.522 5.216 5.0'lS
9.486 7.754 7.009 6.574 6.282 6.069 5.905 5.774 5.1.67 5. 576 5. '112 5.094
15 10,293 8. 188 7.276 6.737 6.373 6. lOS 5.89B 5.731 5.594 5.479 5.095 4.B74
17 10.619 8.360 7.401 6 SAO 6.462 6.ISA 5,971 5.799 5.658 5.539 5. I.... 4.916
21 II. 095 B,625 7.598 7,003 6 60A 6.313 6.089 5.909 5.762 5.638 5.225 4.987
25 11.42B B,818 7.744 7. 126 6712 6. All 6.1&0 5.995 5.W 5.716 5.290 5.044
I 29 11.672 B,966 7.858 7,222 6.798 6.490 6253 6.064 5.909 5.779 5.344 5.091
,01
II 35 H.938 9.131 7.987 7.332 6.897 6,581 6.338 6. 145 5.986 5.853 5.408 5. 149
t5 12.228 9,31B B.I35 7,460 7.012 6.688 6.439 6.241 6.079 5.942 5.47 5.221
&5 12.545 9,529 B.306 7.610 7,149 6.816 6.561 6.358 6.192 6.OS2 5.586 5.314
85 12.715 9.645 B, A02 7.695 7.227 6.890 6.632 6.426 6.258 6.117 5.646 5.371
125 12.893 9.769 8.505 7.7B7 7,313 6.971 6.710 6.503 6.333 6.190 5.715 5.439
! 245 13.080 9.902 8.617 7.889 7.408 7.062 6. 798 6.588 6.417 6. '113 5.796 5.519
13.277 10,045 8,739 8.000 7.513 7. 163 6.897 6.686 6.513 6.369 5.B92 5.616
""
p-5
II'
I~I 2 3 4 5 6 7 8 9 10 15 20
I 16
IB
9.589
9.761
B.071
B.179
7.430
7.512
7.OS2
7.120
6,795
6.854
6. 60s
6.659
6.457 6,338 6.239
6.SIU 6. J8.4 6.2B2
6. ISS
6..196
5.B73
5.906
5.7!U
5.735
I 22
26
10.007
10.176
B.340
B.457
7.639
7.732
7. 22B
7.308
6.949
7.021
6.745
6.B10
6,58cS 6,458 6.352
6.647 6.516 6.407
6.263
6.316
5.961
6.006
5.7SA
5.B23
30 10.29B B.5.... 7.803 7.370 7.0n 6.662 6.696 6.562 6.451 6.358 6.042 5.856
.OS 36 10.429 B.641 7.663 7. AAO 7.141 6.92.2 6.752 6.616 6.503 6.408 6.086 5.B96
1.6 10.571 8.7.a 7.974 7.521 7.216 6.991 6.B19 6.680 6.565 6.<4686, 140 5.'45
66 10.724 B.B68 un 7.615 7,31,)3 7.075 6.899 6.757 6.640 6.541 6.208 6.009
66 10.805 8.933 8.1~ 7.667 7.353 7. 122 6. 94. 6·&02 USA 6.584 6.2.a 6,049
126 10.B9O 9.002 8.195 7.724 7.407 7.174 6.995 6.851 6.733 6.633 6.295 6.095
21.6 10.978 9.076 B. 261 7.786 7.466 7.232 7.OS2 6.907 6. 788 6.688 6.350
... 1\.071 9.154 8.332 7.853 7.531 7.296 7.115 6.970 6.851 6.750 6.414
6.150
6.217
16 Il.~ 9,451 8.521 7.966 7587 7.306 7088 6.912 6.767 6.&44 6.230 5.9B9
16 II. 902 9.642 8.658 8.077 7.682 7.39\ 7.165 6.983 6.833 6.707 6.2BI 6.D33
22 12.449 9.939 8.876 B.2S5 7.835 7,52B 7.291 7.100 6.943 6.B10 6.366 6.1!U
26 12.837 10.159 9.040 B.39O 7.95A 7.635 7.389 7.192 7.030 6.B93 6. .& 6.166
30 13.125 10.328 9.168 B.497 B.O.a 7.720 7.A68 7.266 7.100 6.961} 6.491 6.216
.01 I 36
46
13.442
13.790
10518 9.314 8.621
10.735 9. .c83 B.765
8.158 7.820 7.561 7354
B.2B7 7.939 7.673 7.460
7.183
7.2SA
7.040
7.137
6.644
6.SS9 6.217
6.355
I 66
66
14.176
14.385
10.9SA 9.681 B 936
11.122 9.793 9. 0J.4
B.4A2 8,083 7.608 7.589
B.531 8.167 7.888 7.666
7.409
7.483
7.258
6.752
7.330
6.BIB
6.m
6.51B
126 14.6!U 11. 270 9.914 9.1.2 B.63O B.260 7.977 7.752 7.567 7.412
6.895 6.592
2A6 14,839 II. 431 10.047 9.260 8.7AO B.J64 B.on 7.SA9 7.663 7.506 6.985 6.681
15.086 II. 60s 10.193 9.392 B863 B. &112 B.l92 7.96\ 7.773 7.615 7.093 6.790
""
674
TABU~ B,3 (Continued)
n! 17
19
I
10.794
10.993
2
9.247
9,367
3
8.585
8.676
4
8. 193
8.268
5
7.926
7.990
6
7.728
7.785
7 8
6.9511 6.778
6.990 6,808
23 11.282 9.556 8.817 8.386 8.093 7.878 7.711 7.576 7.464 7.369 7.048 6.858
27 IUM 9.684 8.922 8.475 81n 7.950 7.m 7.638 7.523 7.425 7.095 6,899
31 It. 630 9.78<& 9.003 8.545 8.234 8.007 7.830 7.688 7.570 7.471 7.134 6.934
.05 37 I\. 790 9.897 9.094 8.624 8.306 8.073 7.892 7.747 7.627 7.525 7.181 6.976
47 11.964 10.024 9.199 8.716 8.390 8.151 7.966 7.817 7.694 7.590 7.239 urn
67 I?. IS. 10.166 9.319 8.824 8.490 8.245 8.056 7.903 7.778 7.612 7.312 7.099
87 12. 255 .10.245 9.387 8.885 8.S.7 8.299 8.108 7.9S. 7.827 7.720 7.357 7.142
127 12.362 10.328 9.459 8.951 8.609 8.359 8.165 8.010 7.882 7.774 7.¥YI 7.193
247 12.474 10.417 90538 9,024 8. 678 8.425 8.230 8.074 7.945 1.836 7.410 7,254
co 12.592 10.513 9.623 9. 11)4 8.755 8.500 8.303 8.146 8.017 7.908 7.543 7.328
17 12.722 10.664 9.72t 9.157 8.767 8.478 8.252 8.069 7.917 7.788 7.351 7.093
19 13.126 10.874 9.873 9.277 8.869 8.567 8.332 8. 143 7.986 7.653 7.403 7.137
23 13.736 IL202 10.IH 9.469 9.034 8.714 8.465 8.266 8.100 7.961 7.490 7.213
27 14.173 11.«6 10.192 9.617 9.162 8.828 8.570 8.363 8.192 8.048 7.5111 7.275
31 14.501 11.635 10.433 9.734 9.264 8.921 8.655 8.442 8.267 8.119 7.621 7.328
.01 37 14.W 11.850 10.596 9.871 9.384 9.030 8. 7511 8.537 8.356 8.204 7.693 7.392
47 15.270 12.097 10.787 10. OJ2 9.527 9.160 8.878 8.652 8.466 8.309 7.783 7.474
67 15.723 12.382 H.OII 10.224 9.700 9.319 9.027 8.794 8.602 8.440 7.899 7.58\
87 15,970 12.542 II. 138 10.33.S 9.800 9.413 9.115 8.878 8.683 8.520 7.971 7,649
127 16.233 12. 715 11.278 10.457 9.912 9.517 9.215 8.974 8.776 8.610 8.05.5 7.129
247 16.513 12.903 I L 432 10.593 10. OJ7 9.635 9.328 9.084 8.883 8.715 8. IS. 7.W
co 16.812 13.108 11.602 10.745 10.178 9.770 9.458 9.210 9.008 8.838 8.274 7.948
2 3 5 6 7 8 9 10 15 20
18 1l.961 10.396 9.n9 9.316 9.o.w 8.835 8.675 8.545 8.437 8.345 8.031 7.843
---
20 12. 18.4 10.528 9.817 9.396 9.109 8,896 8. 730 8.596 8.44 8.390 8.0~7 7.1Il4
2~ 12.513 10.731 9.972 9.525 9.~ 8. 996 8.821 8.680 8.563 8.4M 8.127 7.926
28 12,7'" 10.680 10.088 9.622 9.306 9.073 8.892 8.746 8.626 8.523 8.176 7.969
32 12.915 10.99" 10.178 9,699 9.374 9. 135 8.950 8.800 8.676 8.572 8,21(. 8. DO.!.
.05 38 13.102 11.123 10.281 9.787 9.453 9.20& 9.017 8.864 8.737 8.630 8.266 8.049
48 13.308 11.267 10.399 9.890 9.547 9.294 9.098 8.9~1 8.811 8.701 8.328 8.106
68 13.534 II. 433 10.537 10.012 9.658 9.398 9.197 9.036 8.902 8.789 8.407 8.160
88 13.657 11.52~ 10.61" 10.082 9.722 9.459 9.255 9.092 8.9511 8.BA2 8.456 8.226
128 IZ.786 11.623 10.698 10. 158 9.793 9.526 9.320 9.155 ~.018 8.903 8.513 8.:.:81
248 13.923 11.728 10.790 10.242 9.872 9.602 9.394 9.226 9.089 8.972 8.580 8.348
co 14,067 II. 8<&2 10.890 10.33.c 9.960 9.687 9.477 9.309 9.170 9.053 8.661 8.431
13..'314 II. 8<&1
10.895 10.321 9.923 9.627 9.395 9.206 9.049 8.915 8.440 8.188
i
32
38
\4.310
1·4097..
15.456
15.822
16.230
12.069
12."26
12.694
12.902
13.141
11.056
11.314
11.510
11.665
11.&45
10.448
10.655
10.815
10.943
!t.092
10. OJI
10.207
10.344
10.455
10.586
9.721
9.876
9.999
10.098
10.215
9.479
9.619
9.731
9.821
9.930
9.283
9."12
9.515
9.599
9.700
9.12\ 8.962 8.512
9.240 9.095 8.602
9.337 9.186 8.676
9.416 9.261 8. 738
9.511 9.351 8. 814
8.233
8.310
8.374
8.429
8.496
.01 16.688 13.416 12.056 11.269 10.742 10.357 111.061
48 9.824 9.628 9.463 8.9119 8.582
68 17.206 13.737 12.306 11.-'82 10.932 10. S32 10.224 9.978 9.776 9.605 9.033 8. 696
88 17.491 13.919 12 ....9 II. 605 It. Q.43 10.635 10.321 1Jl.070 9.W 9.691 9.110 8.768
128 17.796 1~.116 12.607 11.743 II. 168 10.751 10.431 10. 176 9.967 9.790 9.201 8.854
248 18.124 14.3J3 12.782 11.897 11.308 10.883 10.557 10.298 10.085 9.906 9.309 8.940
co 18.475 14.571 12.977 12.070 1l..c68 11,034 10.7OJ 10.439 10.223 10. Q.43 9.... ' 9.092
675
TABLE B.3 (Continued)
• ~ I 2 3 4 5 6 7 8 9 10 15 20
19 13. 101 11.524 10.83.5 10.423 10.1"1 9.930 9.766 9.&32 9,521 9.0426 9. 100 8.904
21 13. 34& ll.W 10.9"1 10.509 10.21 .. 9.995 9.8204 9.685 9.570 9."12 9.136 8.935
25 13.710 IL889 1l.109 10, &47 10.333 10.101 9,920 9.714 9.652 9.550 9.198 8• •
29 13.m 12. 054 11.23510.75310• .42510.1114 9.996 9.844 9.718 9.612 9.249 9.032
33 1.... 163 12.180 11.334 10.837 10. ..99 10.250 10.057 9.902 9.m 9.663 9.292 9.070
.OS 39 1".371 12.323 11.4048 10.9~ 10.S6S 10.329 10.130 9.970 9.837 9.725 9.~ 9.116
49 1".61" 12..u 11.580 H.a..a 10.688 10... 23 10.218 10.053 9.917 9.801 9.409 9.176
69 IA.m 12. 67.. 11.7301 11.183 10.811 10.538 10.326 10.156 10.016 9.897 9...9<1 9.~
89 15.021 12.179 11.822 IU61 10.883 10.605 10.390 10.21810.075 9.955 9.SA7 9.3Oot
129 15.173 12.892 11.918 II.:W 10.962 10.680 10. ~ 10.288 10. 143 10.021 9.609 9.363
2049 15.335 13.015 12.023 11.442 11.05110.765 10.S44 10.367 10.221 11).098 9. .t82 9,436
... 15.507 13.148 12.13& II.SA9 1l.l52 10.662 10.638 10.45910.31210.1111 9.711 9.526
19 1... 999
12. 992 12.043 11.463 1l.060 10.75& 10.521 10.328 10.167 10.030 9.558 9.275
21 15.463 13.235 12.215 ILS98 lI.17A 10.857 10.610 10.409 10.2<11 1000W 9.612 9.321
25 16. 17113.620 12."91 IL819 il.360 11.021 10.757 10.543 10.366 10.216 9.704 9.399
29 16.701'1
13.910 12.703 11.991 IL507 11.15110.81" 10.651 10.467 10.310 9.7eo 9.445
33 17.100 1".137 12.81112.12811.62611.256 10.970 10.740 10.5S0 10.3&9 9.844 9.521
.01 39 17.5A9 1".398 13.067 12.290 IUM IU63 11.Q66 10.648 10.651 10.45 9.92" 9.591
<19 18. 0581".702 13.297 12.482 II. 935 11.536 11.227 II). 980 10.716 10,60' 10.0204 9.681
IS.058
"
£9
129
18. 640
18.962
\9.310
15. 261
IS."
13. 573 12. 716 12.143 11.725 11.403
13.733 12.853 12.265 11.838 11.509
13.909 13.005 12.403 11.965 11.630
11.146
11.2"7
11.362
10.934 II). 756 10.155
11.030 11).849 11).238
11.1"210.956 10.335
9.801
9.811
9.970
2049 19.684 15.729 1<1.106 13.17112.55912.11211.769 1\."9S II.VI 1l.083 \0.453 10.083
00 20.090 16. oao 1... 327 13.37112.73& 12.280 11.930 11.652 II.m 11.23310.597 \0.227
α        1      2      3      4      5      6      7      8      9     10     15     20
2\ 15.322 13.733 13.027 12.603 12.311 12.093 1\.922 11.782 11,666 11.56611.221 1l.013
23 15..04 13.897 13.1"7 12.700 12.393 12.1&4 11.965 II.IUG 11.719 11.616 11.2~ 1l.0.t5
27 16.033 I".ISA 13.3010 12.857 12.526 12.282 12.091 11.f37 11.8OB 11.699 1\.325 11.100
31 16.:wA 1... ~7 13.<117 12.978 12.631 12.37S 12.176 12.015 11.881 11.768 11.380 11.146
35 16. sao 1<1.<191 13.603 13.07S 12.716 12.451 12.245 12.079 1I.9AI 11.821 II.A26 ll.l66
.05 AI 16.8<13 1",669 13.737 13. 188 12. 816 12. SAl 12.328 12.156 12.01" 11.893 11. <182 1l.236
SI 17.140 1".Ma 13.895 13.323 12.936 12.651 12.430 12.251 12.104 1l.979 II.SSS 11.301
71 17.416 15.100 1<1.083 13. 486 13.083 12.786 12.556 12.371 12.21812.089 11.&50 11.388
91 17.662 IS. 23\ 1".191 13.5&1 13. 169 12.866 12.632 12."43 12. 288 12. 156 IUIO H.443
131 17.861 15.37S 1<1.31013.687 13.266 12.957 12.718 12.S26 12.368 12.2~ IU81 lUll
IS. 532
...
251 18.076
18.307 IS. 705
1<1.443 13.806
I4.S91 13.940
13.37613.061
13.501 13.180
12.81812.622
12.933 12.735
12.461 12.325 11.667
12.572 12.~ 11.972
11.59.(
11.700
21 17.197 IS.2~ 1".28<1 13.698 13.2158 12.980 12.736 12.537 12.371 12.22811.733 Il.m
23 17.707 IS. 507 1.....7613.84913, .. 13 13.088 12.832 12.624 12.44912.301 11.78911.478
27 18.505 15.9<1\ 14.788 1".096 13.621 13.268 12.'193 n.769 12•.sa.t 12.0426 11.885 11.559
31 19.101 16. 273 15.029 14.290 13.766 13."1313.123 12.888 12.693 12.52811.965 11.628
35 19.562 16.535 15.222 14.447 13.920 13.531 13.230 12.986 12.785 12.61<1 12.0l4 p.687
.01 .. I 20.088 16.839 15.448 1".6321".080 13.67"13.35913.10612.89112.720 12.\191\.761
SI 20.692 17.196 15.7181<1.855 14.27 .. 13.648 13.519 13.255 13.037 12.85212.229 II.MB
71 21. 39.. 17.623 16.0.t5 IS. 130 1".SI6 1".06713.722 13.445 13.21613.0204 12.37" 11.989
91 21.790 17.868 16.236 15.292 1<1.660 1".199 13.845 13.561 13.327 13.130 12.466 12.07"
13\ 22.221 18. 1.. 1 16.450 IS."7S 1".82" 1... 350 13.986 13.69613.456 13.25A 12.S77 12.171
...
251 22.692
23.209
18.444
18.783
16.690 15.683 15.012
16.9&4 IS.923 IS. 231
1".S25 1... 152 13.853 13.608 13.402 12.712 12.307
14.730 1".346 1".041 13. 791 IJ.5&1 12.1581 12... n
TABLE B.4
TABLES OF SIGNIFICANCE POINTS FOR THE ROY MAXIMUM ROOT TEST
Pr{(m/n)R ≥ x_α} = α
p=2
α        1      2      3      4      5      6      7      8      9     10     15     20
13 5.-499 3.736 3.011 2.605 2.3-42 2.157 2.018 I. 910 1.823 1.752 1.527 I -407
15 5.56(' 3,807 3.078 2.668 2.-401 2211 2.069 l. 959 1.869 1.796 I 562 1<436
19 5.659 3.905 3.173 2.759 2.<487 2.293 2.148 2.033 1.9<40 1.86<4 I 618 148<4
23 5.718 3,971 3.239 2.822 2.548 2352 2.204 2.087 \.993 I. 915 1.66\ I 521
27 5.759 <4.018 3.286 2.868 2.593 2.396 2.2<47 2.129 2.033 1.95<4 1 696 I.SS2
.05 33 5.801 <4.068 3.336 2.918 2.6<43 2.<4<45 2.29<4 2.175 2.079 i. 998 1.736 I 588
<43 5.8<45 <4,120 3.391 2.973 2.697 2. <498 2.3<47 2.228 2.131 2.0.&9 I. 783 I. 631
63 5,891 <4. 176 3.<4<49 3,032 2.757 2.SS8 2.<407 2.288 2.190 2.109 1.640 1686
83 5.91<4 <4.205 3.eo 3.06<4 2.789 2.591 2.<4<40 2.321 2.223 2. 1<42 1.873 1.718
123 5.938 035 3.512 3.097 2.823 2.626 2.<476 2.356 2.259 2.178 1.909 1.7SS
2<43 5.962 <4.265 3 5-45 3.132 2.859 2663 2513 2.395 2298 2,217 1951 I 797
co 5.991 <4.297 3.580 3.169 2,897 2.702 2 SS<4 2.436 2.3<40 2.261 l. 998 1 8<47
13 7.<499 <4.675 3,610 3. o.ao 2.681 2.<432 1.2<49 2.109 l. 997 1907 1. 625 1<478
15 7,710 <4.834 3. 7<42 3.15-4 2.782 2.523 2.333 2.184 2.069 l. 973 I 676 1 519
19 8,007 5.06<4 3.937 3.325 2.936 2.66<4 2.<463 2.307 2.182 2080 I 758 I 587
23 8.206 5.223 <4.07<4 3."8 3.0<48 2.761l 2.559 2.397 2.268 2 161 I 823 I 6-41
27 8.3<49 5.339 <4.176 3 s.co 3.133 2.8<47 '1.63<4 2.468 2.335 2.225 I. 876 1.686
.01 33 8,500 5.<465 <4.287 3.6-42 3.221l 2.936 2.7\8 2.548 2.<412 2.299 1,938 17<40
43 8,660 5,600 4.<409 3755 3.33<4 3.037 2.IlIS 2.6<4\ 2.501 2. 186 2.0\3 \.607
63 8.831 5.7<47 <4.5<43 3.881 3.45<4 3.153 2.926 2.7049 2.607 2.488 2.105 1.891
83 8920 5.825 <4.616 3.950 3520 3217 2.989 :2 810 2.666 2.5-47 2.160 1.9<43
123 9.012 5,906 <4.692 <4.022 3,59\ 3.285 3.056 2,877 2.732 2.612 2.222 2002
2<43 9,108 5991 <4.772 <4. 100 3.666 3.360 3.130 2950 2.80-4 2.683 2.292 2072
co 9.210 6.080 -4.856 -4.182 3.7~7 3."0 3209 3029 2.&\4 2.763 2,373 2. 15-4
p=3
α        1      2      3      4      5      6      7      8      9     10     15     20
"- 3.5-4-4 2.-430
1-4 6.989 -4.5\7 3.010 2.669 2.25-4 2.m 2.008 I. 919 1 639 1-491
\6 7.095 ... 617 3.63<4 3. 092 2.7-45 2·501 2.319 2.178 2,065 l. 973 1 682 I 526
20 7.243 ~. 760 3.767 3.215 2.859 2.608 2.-420 2.27-4 2. 156 2.059 1.751 lsas
2-4 7.~1 -4.a58 3.859 3.302 2.9-42 2.686 2.-495 2.~5 2.22-4 2.12-4 1.805 1.631
28 7.-410 -4.929 3.927 3.367 3.00-4 2.7-46 2.552 2.-400 2.277 2.176 I. 8<49 I. 669
.05 3-4 7.<182 5.00-4 -4.001 3.-439 3,073 2812 2.6\6 2,-462 2.338 2.23-4 1'(01 \.715
-4-4 7.559 5.086 -4.08l 3.517 3. 1-49 2.887 2.689 2.53-4 2.-408 2.303 \. 962 \.771
6-4 7.639 5. J73 -4. 169 3.60-4 3.235 2.972 2773 2.616 2.489 2.383 2,038 I 8<42
8-4 7.681 5.220 -4.216 3.651 3.282 3019 2.820 2.663 2.535 2.-429 2.082 \.885
12-4 7.72-4 5.268 -4,265 3.701 3.332 3.069 2.870 2.713 2..586 2.-479 2. \32 \.9~
2" 7.768 5.318 -4,317 3.75-4 3.386 3123 2.92-4 2.768 2.6<4\ 2.535 2.\89 L991
co 7.815 5.370 -4.371 3.810 3.-4-43 3.181 2.983 2.828 2.701 2.596 2.253 2.059
1-4 8.971 5.-416 -4. lilt :'.-412 2.978 2.680 2.<462 2.295 2.163 2.055 1.72-4 I.SS2
16 9.245 5.613 -4.265 3.5<18 3.098 2.787 2. SS9 2.38-4 2.245 2.\32 1.782 \. 598
20 9.639 5,905 -4.507 3. 757 3.28<4 2.956 2.}1-4 2.527 2.378 2.257 1.877 \. 676
2-4 9.910 6. III -4.681 3.910 3.-422 3.082 2831 2.636 2.<18\ 2.35.c 1.9,SA \. 7-40
28 10.106 6. 26.4 -4.81\ -4.026 3.528 3.180 2.922 2.722 7,562 2.-431 2.016 \.792
.01 ~ 10.3\7 6.-431 -4.955 -4.156 3.6-47 3.192 3.027 2.821 2.657 2.521 2.091 1.857
-4-4 10.5-45 6.61-4 50 1t6 -4.303 3,764 3.-420 3.1<18 2.938 2.768 2.628 2.182 1·937
6.4 10.790 6.at5 50 296 -4.169 3,9-40 3.568 3.291 3.075 2.901 2.757 2.295 2.0-40
8<4 10.920 6.923 5. 39} -4.560 -4.027 3.652 3,372 3. 153 2.977 2.832 2.363 2.103
12-4 II. 0S6 7.037 5.-49' -4,658 -4,120 3.742 3460 3,239 3.062 2.915 2.4-4\ .2. 117
2" I\. 196 7.m 5.601 4.763 4.221 3.641 3.556 3.33-4 3.155 3.008 2.530 226-4
co I\. 345 7.28<4 5,72S 4.875 4.331 3.948 3.663 3."0 3260 3.1\2 2.634 2.369
TABLE B.4 (Continued)
p=4
α        1      2      3      4      5      6      7      8      9     10     15     20
IS 8,33\ 5.211 <4.013 3.365 2.955 2.670 2.<459 2.297 2.168 2.063 \.736 \. S04
17 8.<472 5,336 <4,123 3,<46<4 3.045 2.752 2.536 2.368 2.235 2.126 I. 78A U03
21 8.671 5,517 4,287 3,613 3,182 2.880 2,655 2,481 2.341 2.121 1.86<4 1.670
25 8,805 5.6<4<4 <4,403 3.721 3,283 2.975 2.745 2.566 2,<422 2.30" 1.928 1-724
29 8.901 5,736 <4. <490 3.802 3.360 3,0<48 2,81<4 2.632 2, <486 2.365 1.979 l. 769
.05   35   9.004   5.837   4.585   3.893   3.446   3.130   2.893   2.708   2.559   2.436   2.040   1.822
<4S 9. \13 5,9<46 <4.690 3,993 3,543 3.22<4 2.98A 2.796 2.645 2.520 2.11<4 1.889
65   9.229   6.065   4.806   4.106   3.653   3.332   3.090   2.900   2.746   2.619   2.206   1.974
85   9.291   6.128   4.869   4.168   3.714   3.392   3.149   2.959   2.804   2.677   2.261   2.026
125 9,35.c 6.195 <4.935 <4,234 3. 779 3...57 3.21.. 3.023 2.868 27..0 2.322 2.087
2.. 5 9 ... 19 6.265 5,005 <4.304 3,850 3,527 3.28A 3.093 2.939 2.811 2.393 2.157
00 9...88 6.338 5,080 ... 380 3.926 3.603 3.361 3,171 3.017 2.890 2."75 2.2..2
15 10,293 6.080 <4,5.c9 3. 7<4<4 3,2<4" 2.901 2.651 2. .c61 2.310 2188 1.812 1.618
17 10.619 6,308 <4.731 3,898 3,378 3.021 2.760 2.559 2 ..01 2.273 1.875 \.668
21 11.095 6,650 5,010 <4.137 3.5&9 3.2it 2,933 2.720 2.550 2.<412 1.981 \.754
25 11 ..28 6896 5.213 <4,315 3,748 3.356 3,067 2,8A.c 2,666 2.521 2.066 1.825
29 11.672 7,OBO 5,368 <4, .. 51 3.872 3,469 3,172 2.9"2 2.759 2,609 :- 137 \.BSA
011 35 1l.938 7,28A 5,542 ... 606 4,013 3.600 3,29.. 3.057 2.868 2.713 2.221 1.956
45 12.228 7.510 5.737 <4.782 <4.175 3.752 3..a7 3.193 2,998 2.837 2.326 2.0<48
65 12,545 7,762 5,95& <4.98A ",36" 3. 930 3. 607 3,356 3.155 2.989 2,<458 2.1"
85 12,715 7,899 6,080 5.096 .....70 ",031 3.704 3."50 3.2.. 6 3.078 2.536 2.2040
125 12.893 8.0<45 6.210 5,218 ",585 .. , \042 3.812 3.555 3.348 3,178 2.630 2,326
245 , 13. OB( 8,199 6,350 5.349 ",710 <4,263 3.930 3.67\ 3. <462 3.290 2.737 2.<430
.., II 13,m 8.363 6.500 5, ..91 ",848 ... 397 ".062 3.80\ 3. 59\ 3. .. 18 2.863 2.555
p=5
α        1      2      3      4      5      6      7      8      9     10     15     20
16 9.5&9 5,856 ","48 3,218 2.890 2,648
3,69" 2,"63 2.316 2,196 1.82.. 1.630
I 18 9,761 6.002 ".575 3.806
3.320 2.982 2.73" 2,542 2.390 2.265 1878 1. 67..
I 22 10.007 6.218 ... 765 3. ..n 3.128 2.869
3.978 2.669 2.509 2.378 I. 967 1.7 ..7
26 10.176 6.370 ... 903 3.593 3.237 2971
".IQ4 2.765 2,601 2.465 2.037 1.807
I 30 10.298 6. .c83 5.006 ... 200
3.683 3.321 3.051 2.842 2.67" 2.535 2.095 1857
36 10... 29 6,606 5.121 3,785 3."18 3,14oC
".307 2.930 2.759 2.617 2.165 1.918
osl, .c6 10.571 6.7 ..2 5.2"9 3.901 3.529 3,251
.....28 3. 03.c 2.859 2.7\" 2.250 1.99"
66 10.72<4 6.892 5.392 ",03.c 3.658 3,3n
<4.566 3.156 2.979 2.832 2.357 2.092
86 10.805 6.973 5.471 ... ~3
.... IOB 3.731 3...48 3.227 3.048 2.900 2. "21 2.152
126 10.890 7.056 5.55.. ".72.. .. 189 3.810 3,527 3. 30.. 3, 125 2.976 2.49<4 2.123
2..6 10.978 7,148 5,~3 ".812 ".276 3.897 3.613 3.390 3.210 3,061 2.578 2.306
.,., 11.071 7.2.... 5,738 ".907 ".371 3.992 3.7OB 3.485 3.304 3.157 2.676 2."07
II. 11. 53.c 6.701 ... 963 ".055 3."92 3. lOB 2.829 2.616 2.448 2.312 \.895 1.680
18 lL902 6.95.c 5. 163 ... 123 3.638 3.238 2.9-45 2.722 2.546 2."03 I. 962 1.733
22 12,4"9 7,340 5.473 ... <487 3.870 3.4oC6 3.135 2.896 2.707 2.554 2.076 1.825
26 12,837 7.620 5,703 4.685 <4.0<47 3.606 3.282 3.033 2.835 2.673 2. 169 \.902
30 13,125 7.632 5.880 4,8<40 <4. 186 3.733 3."00 3.1"2 2.938 2. no 2.2"6 1.9"
01 36 13.4oC2 8.069 6,079 5.016 4.346 3 88\ 3.537 3.271 3.060 2.886 2.339 2.046
~6 13790 8.335 6,306 5.2\9 ".532 ".053 3.699 3,"25 3.206 3.026 2,"56 2. , ..7
66 1".176 8.635 6.566 5.-45.. ... 750 ".259 3.89" 3.61\ 3.385 3.\98 2.604 2.279
86 14385 8.800 6.711 5.5U ".67" .. 377 ".007 3.720 3. ..90 3.30\ 2.695 2.362
126 1".606 8.9n 6.867 5.731 5.010 ".506 ".132 3.8A1 3.6OB 3. .. 16 2.800 2.461
2<46 14.839 9,165 7,036 5.888 5,159 ".650 ".272 3,978 3,7"1 3,547 2.92.. 2.579
00 15.086 9.367 7,218 6.060 5.32.. ",810 ",~8 ".132 3. 893 3.697 3.070 2.72..
TABLE B.4 (Continued)
α        1      2      3      4      5      6      7      8      9     10     15     20
\7 10.79<4 6. <470 <4.86\ <4.005 3.<468 3.098 2.827 2620 2<455 2.32J 1908 l. 693
19 10.993 6.634 5.001 <4.128 3.579 3.199 2.920 VOS 2.535 2.396 \. 965 \.7<40
23 \1.282 U80 5.216 <4.320 3.753 3.359 3.068 2.W 2.665 2.519 2.062 1.8\9
27 11.<463 7.056 5.372 <4. <462 3./l8.l 3. .c81 3.182 2.95\ 2.767 2.615 2.139 1.8&4
3\ \\.630 7.158 5. .c9I <4.57\ 3.9as 3.576 3.272 3.036 2.8<48 2.693 2.203 \.939
.OS 37 1\. 790 7.lJ<4 5.625 <4.695 <4.101 3.686 3.376 3. 136 2.9<43 2.7as 2.280 2.006
1,7 1 I. 961, 7. <495 5..77<4 <4.~ <4.235 3.813 3. <498 3.253 3.OS7 2.89<4 2.375 2.090
67 12.151, 7.675 5.9<4<4 <4.997 <4.390 3.963 3.61,3 3. 39<4 3. 191, 3.028 2.1,95 2.200
87 12.2'ss 7.711, 6.038 5.068 <4.<477 <4.0<48 3. 727 3. <476 3.27<4 .1. \07 2.568 2.268
127 12.362 7.878 6.138 5. las <4.573 <4.W 3. 818 3.566 3.363 3.195 2.652 2.3.c8
21,7 12.<47<4 7.989 6.2<46 5..291 <4.676 <4.21,<4 3. 920 3.667 3.<463 3.29<4 2.7<49 2.<4<41.
00 12.592 8.107 6.362 5.<4OS 1,.790 <4.357 <4.033 3.780 3.576 3. .c08 2.86-4 2.56\
17 12.722 7.296 5..360 <4.352 3.730 3.306 2.998 2.76<4 1,580 2.<431 I. 97<4 1.739
19 13.126 7.570 5.57<4 <4.53\ 3.885 3. <4<4<4 3.122 2.877 2.663 2.526 2.0<41. 1.795
23 13. 736 7.992 5..912 <4.817 ... 135 3.667 3.325 3.063 2· ass2.687 2. \65 1.892
27 14.173 8.303 6.16<4 5.03<4 1,.328 3.81,1 3. .c81, 3.210 2.993 2.816 2: 261, I. 97<4
31 I... S(U 8.. 5<4 I 6.360 5..20<4 ... 480 3.980 3.612 3.329 3.101,2.921 2.3<47 2.043
.01 37 1<4.865 8.808 6.583 5.<400 <4.657 <4.11,2 3.763 3.1,70 3.237 3.01,7 2.1,.c8 2.128
1,7 15.270 9.112 6.639 5.628 1,.86<4 <4.33<4 3.9<43 3.61.0 3.399 3.201 2.576 2.238
67 15..723 9• .csa 7.136 5.895 5..11\ <4.565 1,.161 3.81,8 3. 5983.393 2.739 2.363
87 15..970 9.650 7.303 6.01,7 5.252 <4.699 1,.289 3.971 3.716 3.507 2.81.0 2.<475
127 16.233 9.as6 7.1.8<4 6.213 5. .c08 <4.81,7 1,. <431 1,.108 3.aso 3.637 2958 2. ,58.C
21,7 16.513 10.079 7.682 6.395 5.580 5..012 <4.591 1,.261, 1,.0023.786 3.097 2.717
00 16.812 10.319 7.897 6.596 5.772 5.198 <4.772 <4.1,1.2 "'\77 3.959 3.261, 2.582
α        1      2      3      4      5      6      7      8      9     10     15     20
18  11.961   7.063   5.258   4.304   3.708   3.298   2.999   2.770   2.589   2.442   1.989   1.753
20 12. 18-4 7.2<43 5.1,11 1,. <437 3.827 3.1,06 3.098 2.861 2.671, 2.522 2.0<49 1.8112
2<4 12.513 7.516 5.61,7 1,.61,7 1,.016 3.580 3.258 3.011 2.81<4 2.653 '2.151 1.587
28 12.7<4<4 7.71<4 5.821 1,.803 <4.160 3.713 3.382 3.127 2.921, 2.757 2.235 1.956
32 12.915 7.863 5.95<4 <4.m <4.272 3.817 3."81 3.220 3.012 2.642 2.30<4 2.015
.OS 38 13.102 8.030 6.10<4 5..063 <4.<40\ 3.939 3.596 3.330 3.117 2.9<42 2.388 2.068
.c8 13.308 8.216 6.275 5.222 <4.'ss1 <4.081 3. 732 3.<461 3.2<43 3.063 2.1,92.2.180
68 13.53<4 8. ..26 6.<471 5..1.07 <4.727 <4.25\ 3.895 3.619 3.396 3.213 2.625 2.300
58 13.657 8. 5.. 1 6.579 5.511 <4.827 <4.3<48 3.990 3. 711 3.<486 3.301 2.706 2.376
128 13.786 8.665 6.697 5.62<4 <4.937 <4.<4'ss ".095 3.81<4 3.588 3."01 2.800 2. <465
2.c8 13.923 8.797 6.82<4 5..7<47 5.058 <4.573 ·UII 3.929 :".702 3.51<4 2.910 2.573
1<4.067 8.938 6.961 5..582 5.191 <4. 70s 1,.3..3 ... 060 3.833 3.6045 3. 0<41 2. 70s
""
18 \3.81<4 7.812 5.7<4" <4.6"0 3. 960 3.<498 3. \63 2.908 2.707 2.5045 2.050 1.797
20 , ... 310 8.164 5.971 <4.829 <4.124 3.6<42 3.292 3. 026 2.816 2.6<46 2.12<4 I. ass
21, \ ... 974 8.619 6.332 5.llJ 4.389 3.879 3.5D7 3.222 2.997 2.8\1, 2.250 t.956
28 15.~ 8.957 6. 60s 5.367 <4.595 .c.o65 3. 676 3.379 3.\43 2.951 2.355 2.0<42
32 15.822 9.218 6.818 5.'ss1 4.759 4.21 .. 3.814 3.506 3.262 3.063 2.443 2. \15
.0\ 38 \6.230 9.51<4 7. D63 5.765 1,.952 <4.390 3.rn 3.659 3.406 3. 199 2.,SS1 2.206
.c8 \6. 688 9.as2 7.3<46 6.015 5.179 <4.600 <4.173 3.8<43 3.581 3.366 2.688 2.32<4
68 17.206 10.243 7.679 6.313 5.0452 <4. ass <4.<413 <4.072 3.799 3.575 2. 866 2. .c81
58 17. <491 10."61 7.867 6.463 5.610 5.003 <4.55<4 <4.207 3.m 3.701 2.976 2.581
128 17.196 10.697 8.073 6.670 5.7as 5.169 <4.713 <4.360 <4.078 3.8.c6 3. 106 2.700
2<48 \8.124 10.95<4 8.298 6.878 5..980 5.356 <4.894 <4.535 <4.2.c8 <4.012 .3. 260 2.647
go \8• .c75 11.2lJ 8.5.46 7.108 6.200 5.567 5.099 <4.736 <4. <4<46 <4.207 3.<41.7 3.030
TABLE B.4 (Continued)
α        1      2      3      4      5      6      7      8      9     10     15     20
19 13.101 7. 6~0 5.645 4. 59~ 3.9~1 3 ~93 3.166 2.916 2.719 2.559 2.067 1.812
21 13. 3~6 7.~ 5.8OB ~.737 ~. 067 3. 607 3.270 3.012 2.808 2.6043 2.130 1.863
25 13.710 8.132 6.063 ~.962 ~.270 3.792 3.~~1 3.171 2.956 2.782 2.238 1.9.)2
29 13.970 8.350 6.253 5.131 ~. ~25 3.935 3. 57~ 3.295 3.07~ 2.893 2.326 2.025
33 I~. 163 8.515 6.399 5.26~ ~. 5.47 ~.049 3.680 3.396 3. 169 2.96-C 2.~ 2.088
.OS 39 1~.377 8.701 6.566 5. ~16 ~. 688 ~. 181 3.806 3.515 3.283 3.092 2.~9O 2. 165
~9 I~. 61~ 8.912 6.757 5.593 ~.85~ ~.338 3955 3.658 3.~20 3.22~ 2.603 2.2~
69 1~.877 9.151 6.9T7 5.800 5. OSO ~. 5:16 ~. 136 3.832 3.589 3.388 2. 7~7 2.395
89 15.021 9.283 7.101 5.917 5.163 ~.6~ ~. 2~1 3.935 3.689 3.0486 2.836 2. ~78
129 15.173 9.~26 7.235 6.0~6 5.287 ~.755 ~.358 ~.050 3.802 3.597 2. 9~0 2.576
2~9 15.335 9.579 7.381 6.187 5.~2~ ~. t.89 ~.m ~. 180 3. 931 3. 725 3.063 2.695
00 15.507 9.745 7.5~1 6.342 5.577 5.040 ~. 6~0 ~.329 ~. 078 3. 872 3.210 2.8(5
19 I~. 999 8.0435 6. tl9~. 921 4. 185 3.685 3.323 3.().48 2.832 2.658 2.125 1.1!53
21 15. 463 8.7~3 6.357 5.119 ~.355 3.836 3.~58 3. 171 2.9~~ 2.762 2.201 I. 913
25 16.m 9.226 6.739 5.0439 ~. 634 ~. 08-4 3.682 3.376 3. 134 2.938 2332 2.018
29 16.700 9.589 7.030 5.687 ~.1!53 ~.280 3.861 3.5041 3.287 3. OBI 2.442 2.107
33 17.100 9.871 7.259 5.865 5.028 ~ . .u9 Hot 3.676 3.~1~ 3.200 2.535 2. 16-C
.01 39 17. 5~9 10. 19~ 7.52~ 6.115 5.2~ ~. 627 ~. 181 3.839 3.566 3.344 2.~9 2.280
~9 18. 058 10.565 7.833 6 387 5 . .a0 ~.85~ ~. 392 ~.037 3. 75~ 3. 523 2.795 2.~OS
69 18.~0 10.998 8.199 6.713 5.778 5.131 ~. 653 ~.26-C 3.990 3.7~9 2.986 2.573
89 18.962 II. 2~2 8.~OB 6 901 5.952 5. 29~ ~ 808 ~.0432 ~. 132 3.886 3. lOS 2.68()
129 19.310 11.5OB 8.638 7.109 6. 1~6 5. ~78 ~. 983 ~. 601 ~. 295 ~.~ 3.246 2.810
2~9 19.684 11.798 8.891 7.3~1 6.36~ 5.685 5 183 ~. 79~ ~. "83 ~.228 3. ~15 2.970
00 20.090 12.117 9.173 7.601 6.610 5.922 5.~13 5.019 ~. 703 ~. ~45 3.622 3.171
α        1      2      3      4      5      6      7      8      9     10     15     20
21 15. 322 8.761 6.395 5.158 ~. 392 3.869 3.489 3.199 2.970 2.785 2 217 I. 925
23 15. 604 8.979 6.577 5.315 ~. 531 3. 99~ 3.602 3.303 3. 067 2.875 2.285 1.980
27 16.033 9.320 6. 86~ 5.566 ~. 756 ~.199 3.790 3. ~77 3.229 3.028 2.~03 2.075
31 16.3~~ 9.573 7.OB2 5.759 ~. 931 ~.359 3.939 3.616 3.360 3.151 2.500 2.156
35 16.580 9.769 7.252 5.912 5.071 ~. ~88 ~.060 3.730 3.~67 3.253 2.581 2.225
~I 16. 6-C3 9.992 7.~48 6.090 5.23~ ~. 6~1 ~.203 3.865 3.596 3.376 2:682 2.311
.OS 51 17. I~O 10. 2~7 7.676 6.299 5. ~29 ~.82~ ~.376 ~.030 3. 75~ 3.5'0 2.810 2.~23
71 17. ~76 10.5043 7.9~~ 6• .s.a 5.663 5.0~7 ~.590 ~.235 3.95' 3.719 29n 2.572
91 17.M2 10.709 B. 097 6.691 5.800 5. 178 ~. 716 ~.358 ~.070 3.83.4 3.OBI 2.668
131 17.861 10.890 8. 265 6.850 5. 952 5. 32~ ~.&58 ~.~96 ~.206 3. 967 3.204 2.783
251 18. 076 It OB7 8.<150 7.027 6.122 5. ~90 5.021 ~.656 ~.363 ~ 122 3.350 2.925
00 18. 307 11.303 8.~ 7.22~ 6.315 5. ~79 5.207 ~. 6-C0 ~.5~6 ~.304 3.529 3. 102
21 17.197 9.534 6.851 5. ~70 ~. 62~ ~.051 3.636 3.322 3.075 2.8n 2.271 I. 962
23 17.707 9.867 7.107 5.682 ~.806 ~. 211 3.779 3.452 3.19~ 2.987 2.351 2.025
27 18.505 10.399 7.523 6.029 5. 107 ~. ~78 ~.021 3.672 3.397 3.175 2.m 2.137
31 19.101 10.805 7.8-46 6.302 5.346 ~.693 ~. 216 3.851 3.~ 3.330 2.6OB 2.233
35 19.562 11.125 8.103 6.522 5.5041 ~.868 ~. 376 ~.OOO 3.702 3.~60 2.709 2.315
~l 20.088 II. ~95 8.~OS 6.782 5.772 5.078 ~.570 ~. 180 3.871 3.619 2.835 2.~20
.01 51 20.692 11. 928 8. 761 7.093 6.OS2 5.335 ~.808 ~. ~03 ~. OBl 3.819 2.996 2.558
71 21. 39~ 12.441 9.190 7. ~73 6.397 5.~ 5.107 ~.686 ~.350 ~.076 3.211 2.745
9' 21. 790 12.735 9.0439 7.695 6.6()1 5.8.45 5.287 ~.857 ~.515 ~.234 3. 347 2.867
131 22.221 13.OS9 9.716 7. 9~~ 6.832 6.062 5. ~9~ 5.055 ~.7OS ~. ~18 3.510 3.016
251 22. 692 13.~17 10.025 8.225 7.09~ 6.310 5.732 5.285 ~.928 ~.636 3.707 3.201
00 23. 209 13.816 10.373 8.545 7.395 6.598 6.010 5.556 5.193 ~.895 3.952 3.~
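As an illustration of how the statistic referred to Table B.4 might be computed in practice, the following sketch (in Python, assuming numpy and scipy are available) finds the largest characteristic root from a hypothesis sum-of-squares matrix H and an error sum-of-squares matrix E. The function name, the use of the roots of HE⁻¹, and the absence of any normalizing factor in m and n are assumptions made for illustration only; the definition of R and x_α accompanying the table in the text is authoritative.

    # Illustrative sketch only: a largest-root statistic of the kind used by Roy's test.
    # H: hypothesis sum-of-squares-and-products matrix (p x p, symmetric),
    # E: error sum-of-squares-and-products matrix (p x p, positive definite).
    # Any normalization in m and n must be matched to the definition used with the table.
    import numpy as np
    from scipy.linalg import eigh

    def roy_largest_root(H, E):
        # Largest eigenvalue of E^{-1} H, via the symmetric-definite generalized
        # eigenproblem H v = lambda E v (eigenvalues are returned in ascending order).
        return eigh(H, E, eigvals_only=True)[-1]

    # Small synthetic example (not data from the text):
    rng = np.random.default_rng(0)
    H = np.cov(rng.standard_normal((30, 3)), rowvar=False) * 29
    E = np.cov(rng.standard_normal((40, 3)), rowvar=False) * 39
    print(roy_largest_root(H, E))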
TABLE B.5
SIGNIFICANCE POINTS FOR THE MODIFIED LIKELIHOOD RATIO TEST OF
EQUALITY OF COVARIANCE MATRICES BASED ON EQUAL SAMPLE SIZES
Pr{−2 log λ* ≥ x} = 0.05
n_g\q      2      3      4      5      6      7      8      9     10
p=2
3 12.18 18.70 24.55 30.09 35.45 40.68 45.81 50.87 55.87
4 10.70 16.65 22.00 27.07 31.97 36.76 41.45 46.07 50.64
5 9.97 15.63 20.73 25.56 30.23 34.79 39.26 43.67 48.02
p=3
5 19.2 30.5 41.0 51.0 60.7 70.3 79.7 89.0 98.3
p=4
6 30.07 48.63 65.91 82.6 98.9 115.0 131.0
7 27.31 44.69 60.90 76.56 91.89 107.0 121.9 137.0 152.0
8 25.61 42.24 57.77 72.78 87.46 101.9 116.2 130.4 144.6
9 24.46 40.56 55.62 70.17 84.42 98.45 112.3 126.1 139.8
10 23.62 39.34 54.05 68.27 82.19 95.91 109.5 122.9 136.3
TABLE B.5 (Continued)
n_g\q      2      3      4      5      6      7          n_g\q      2      3      4      5
p=5                                                       p=6
8 39.29 65.15 89.46 113.0 10 49.95 84.43 117.0
9 36.70 61.40 84.63 107.2 129.3 151.5
10 34.92 58.79 81.25 103.1 124.5 145.7    11 47.43 80.69 112.2 142.9
12 45.56 77.90 108.6 138.4
11 33.62 56.86 78.76 100.0 120.9 141.6    13 44.11 75.74 105.7 135.0
12 32.62 55.37 76.83 97.68 118.2 138.4    14 42.96 74.01 103.5 132.2
13 31.83 54.19 75.30 95.81 116.0 135.9    15 42.03 72.59 101.6 129.9
14 31.19 53.24 74.06 94.29 114.2 133.8
15 30.66 52.44 73.02 93.03 112.7 132.1 16 41.25 71.41 100.1 128.0
17 40.59 70.41 98.75 126.4
16 30.21 51.77 72.14 91.95 111.4 130.6 18 40.02 69.55 97.63 125.0
19 39.53 68.80 96.64 123.8
20 39.11 68.14 95.78 122.7
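As a companion to Table B.5, the following sketch computes −2 log λ* in a commonly used form for q samples of equal size, each yielding a sample covariance matrix with n_g degrees of freedom. The function name and the exact multiplier are illustrative assumptions; the modified likelihood ratio criterion as defined in Chapter 10 is the authoritative form.

    # Illustrative sketch: -2 log(lambda*) for testing equality of q covariance
    # matrices from equal-size samples, in a common Bartlett-type form.
    # S_list: the q sample covariance matrices, each with n_g degrees of freedom.
    import numpy as np

    def minus_two_log_lambda_star(S_list, n_g):
        q = len(S_list)
        S_pooled = sum(S_list) / q                       # pooled estimate (equal sizes)
        logdet_pooled = np.linalg.slogdet(S_pooled)[1]
        logdets = [np.linalg.slogdet(S)[1] for S in S_list]
        return n_g * (q * logdet_pooled - sum(logdets))

    # Reject at the 5% level when this statistic exceeds the Table B.5 entry
    # for the given p, q, and n_g.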
TABLE B.6
CORRECTION FACTORS FOR SIGNIFICANCE POINTS FOR THE SPHERICITY TEST
5% Significance Level
n\p 3 4 5 6 7 8
4 1.217
5 1.074 1.322
6 1.038 1.122 1.383
7 1.023 1.066 1.155 1.420
8 1.015 1.041 1.088 1.180 1.442
9 1.011 1.029 1.057 1.098 1.199 1.455
10 1.008 1.021 1.040 1.071 1.121 1.214
TABLE B.6 (Continued)
1% Significance Level
n\p 3 4 5 6 7 8
4 1.266
5 1.091 1.396
6 1.046 1.148 1.471
7 1.028 1.079 1.186 1.511
8 1.019 1.049 1.103 1.213 1.542
9 1.013 1.034 1.067 1.123 1.234 1.556
10 1.010 1.025 1.047 1.081 1.138 1.250
12 1.006 1.015 1.027 1.044 1.068 1.104
14 1.004 1.010 1.018 1.028 1.041 1.060
16 1.003 1.007 1.012 1.019 1.028 1.039
18 1.002 1.005 1.009 1.014 1.020 1.028
20 1.002 1.004 1.007 1.011 1.015 1.021
24 1.001 1.003 1.005 1.007 1.010 1.013
28 1.001 1.002 1.003 1.005 1.007 1.009
34 1.001 1.001 1.002 1.003 1.004 1.006
42 1.000 L001 1.001 1.002 1.003 1.003
50 1.000 1.001 L001 1.001 1.002 1.002
100 1.000 1.000 1.000 1.000 1.000 1.001
χ²      15.0863   21.6660   29.1412   37.5662   46.9629   57.3421
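The χ² row above is consistent with the 1% points of a chi-square distribution with p(p+1)/2 − 1 degrees of freedom, which the tabulated factors multiply. The sketch below is offered only as an illustration under that assumption, using the common sphericity criterion W = |S|/((tr S)/p)^p for a sample covariance matrix S with n degrees of freedom; the conventions of Chapter 10 govern.

    # Illustrative sketch (not the book's prescription): sphericity criterion and a
    # corrected significance point. Assumes W = |S| / (tr(S)/p)^p and that the factor
    # from Table B.6 multiplies the chi-square point with f = p(p+1)/2 - 1 degrees of
    # freedom; verify both conventions against Chapter 10 before relying on them.
    import numpy as np
    from scipy.stats import chi2

    def sphericity_statistic(S, n):
        p = S.shape[0]
        log_W = np.linalg.slogdet(S)[1] - p * np.log(np.trace(S) / p)
        return -n * log_W

    def corrected_point(p, factor, alpha):
        f = p * (p + 1) // 2 - 1
        return factor * chi2.ppf(1.0 - alpha, f)

    # Example: for p = 3 and the row labeled 9 in the 1% table, the factor is 1.013,
    # so the corrected point is 1.013 * chi2.ppf(0.99, 5).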
TABLE B.7
SIGNIFICANCE POINTS FOR THE MODIFIED LIKELIHOOD RATIO TEST Σ = Σ₀
Pr{−2 log λ* ≥ x} = α
n 5% 1% n 5% 1% n 5% 1% n 5% 1%
p=2 p=3 p=5 p=6
2 13.50 19.95 4 18.8 25.6 9 32.5 40.0 12 40.9 49.0
3 10.64 15.56 5 16.82 22.68 10 31.4 38.6 13 40.0 47.8
4 9.69 14.13 14 39.3 47.0
5 9.22 13.42 6 15.81 21.23 11 30.55 37.51 15 38.7 46.2
7 15.19 20.36 12 29.92 36.72
6 8.94 13.00 8 14.77 19.78 13 29.42 36.09 16 38.22 45.65
7 8.75 12.73 9 14.47 19.36 14 29.02 35.57 17 37.81 45.13
8 8.62 12.53 10 14.24 19.04 15 28.68 35.15 18 37.45 44.70
9 8.52 12.38 19 37.14 44.32
10 8.44 12.26 11 14.06 18.80 16 28.40 34.79 20 36.87 43.99
12 13.92 18.61 17 28.15 34.49 21 36.63 43.69
--
p=4 13 13.80 18.45 18 27.94 34.23
7 25.8 30.8 14 13.70 18.31 19 27.76 34.00 22 36.41 43.43
8 24.06 29.33 15 13.62 18.20 20 27.60 33.79 24 36.05 42.99
9 23.00 28.36 26 35.75 42.63
10 22.28 27.66 28 35.49 42.32
30 35.28 42.07
11 21.75 27.13
12 21.35 26.71
13 21.03 26.38
14 20.77 26.10
15 20.56 25.87
p=7 p=8 p=9 P = 10
18 48.6 56.9 24 58.4 67.1 28 70.1 79.6 34 (82.3) (92.4)
19 48.2 56.3 26 57.7 66.3 30 69.4 78.8 36 81.7 91.8
20 47.7 55.8 28 57.09 65.68 38 81.2 91.2
21 47.34 55.36 30 56.61 65.12 32 68.8 78.17 40 80.7 90.7
22 47.00 54.96 34 68.34 77.60
32 56.20 64.64 36 (67.91) (77.08) 45 79.83 89.63
24 46.43 54.28 34 55.84 64.23 38 (67.53) (76.65) 50 79.13 88.83
26 45.97 53.73 36 55.54 63.87 40 67.21 76.29 55 78.57 88.20
28 45.58 53.27 38 55.26 63.55 60 78.13 87.68
30 45.25 52.88 40 55.03 63.28 45 66.54 75.51 65 77.75 87.26
32 44.97 52.55 50 66.02 74.92
34 44.73 52.27 55 65.61 74.44 70 77.44 86.89
60 65.28 74.06 75 77.18 86.59
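To relate Table B.7 to a computation, the sketch below evaluates a standard form of −2 log λ* for the hypothesis Σ = Σ₀; the expression used and the role of n (the degrees of freedom of the sample covariance matrix S) are assumptions to be checked against the criterion defined in Chapter 10.

    # Illustrative sketch: a standard form of -2 log(lambda*) for Sigma = Sigma_0,
    # namely n * [tr(S Sigma_0^{-1}) - log|S Sigma_0^{-1}| - p], where S is the sample
    # covariance matrix with n degrees of freedom. Compare with the 5% or 1% entry
    # of Table B.7 for the given p and n.
    import numpy as np

    def minus_two_log_lambda_star(S, Sigma0, n):
        p = S.shape[0]
        A = np.linalg.solve(Sigma0, S)        # Sigma_0^{-1} S
        logdetA = np.linalg.slogdet(A)[1]
        return n * (np.trace(A) - logdetA - p)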
REFERENCES
Anderson, T. W., and Herman Rubin (1949), Estimation of the parameters of a single
equation in a complete system of stochastic equations, Annals of Mathematical
Statistics, 20, 46-63. [Reprinted in Readings in Econometric Theory (J. Malcolm
Dowling and Fred R. Glahe, eds.), Colorado Associated University, 1970,
358-375.] [12.8]
Anderson, T. W., and Herman Rubin (1950), The asymptotic properties of estimates
of the parameters of a single equation in a complete system of stochastic
equations, Annals of Mathematical Statistics, 21, 570-582. [Reprinted in Readings
in Econometric Theory (J. Malcolm Dowling and Fred R. Glahe, eds.), Colorado
Associated University, 1970, 376-388.] [12.8]
Anderson, T. W., and Herman Rubin (1956), Statistical inference in factor analysis,
Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Proba-
bility (Jerzy Neyman, ed.), Vol. V, University of California, Berkeley and Los
Angeles, 111-150. [14.2, 14.3, 14.4, 14.6]
Anderson, T. W., and Takamitsu Sawa (1973), Distributions of estimates of coeffi-
cients of a single equation in a simultaneous system and their asymptotic
expansions, Econometrica, 41, 683-714. [12.7]
Anderson, T. W., and Takamitsu Sawa (1977), Tables of the distribution of the
maximum likelihood estimate of the slope coefficient and approximations, Tech-
nical Report No. 234, Economics Series, Institute for Mathematical Studies in
the Social Sciences, Stanford University, April. [12.7]
Anderson, T. W., and Takamitsu Sawa (1979), Evaluation of the distribution function
of the two-stage least squares estimate, Econometrica, 47, 163-182. [12.7]
Anderson, T. W., and Takamitsu Sawa (1982), Exact and approximate distributions of
the maximum likelihood estimator of a slope coefficient, Journal of the Royal
Statistical Society B, 44, 52-62. [12.7]
Anderson, T. W., and George P. H. Styan (1982), Cochran's theorem, rank additivity
and tripotent matrices, Statistics and Probability: Essays in Honor of C. R. Rao
(G. Kallianpur, P. R. Krishnaiah, and J. K. Ghosh, eds.), North-Holland, Amster-
dam, 1-23. [7.4]
Anderson, T. W., and Akimichi Takemura (1982), A new proof of admissibility of
tests in multivariate analysis, Journal of Multivariate Analysis, 12, 457-468. [8.10)
Andersson, Steen A., David Madigan, and Michael D. Perlman (2001), Alternative
Markov properties for chain graphs, Scandinavian Journal of Statistics, 28, 33-85.
[15.2]
Barnard, M. M. (1935), The secular variations of skull characters in four series of
Egyptian skulls, Annals of Eugenics, 6, 352-371. [8.8]
Barndorff-Nielsen, O. E. (1978), Information and Exponential Families in Statistical
Theory, John Wiley & Sons, New York. [15.5]
Barnes, E. W. (1899), The theory of the gamma function, Messenger of Mathematics,
29, 64-129. [8.5]
Bartlett, M. S. (1934), The vector representation of a sample, Proceedings of the
Cambridge Philosophical Society, 30, 327-340. [8.3]
Bartlett, M. S. (1937a), Properties of sufficiency and statistical tests, Proceedings of the
Royal Society of London A, 160, 268-282. [10.2)
Giri, N., J. Kiefer, and C. Stein (1963), Minimax character of Hotelling's T2 test in
the simplest case, Annals of Mathematical Statistics, 34, 1524-1535. [5.6]
Girshick, M. A. (1939), On the sampling theory of roots of determinantal equations,
Annals of Mathematical Statistics, 10, 203-224. [13.2]
Gleser, Leon Jay (1981), Estimation in a multivariate "errors in variables" regression
model: large sample results, Annals of Statistics, 9, 24-44. [12.7]
Glynn, W. J., and R. J. Muirhead (1978), Inference in canonical correlation analysis.
Journal of Multivariate Analysis. 8, 468-478. [12.4]
Golub, Gene H., and Franklin T. Luk (1976), Singular value decomposition: applica-
tions and computations, unpublished. [12.3]
Golub, Gene H., and Charles F. Van Loan (1989), Matrix Computations (2nd ed.),
Johns Hopkins University Press, Baltimore. [11.4, 12.3, A.5]
Grubbs, F. E. (1954), Tables of 1% and 5% probability levels of Hotelling's general-
ized T2 statistics, Technical Note No. 926, Ballistic Research Laboratory,
Aberdeen Proving Ground, Maryland. [8.6]
Gupta, Shanti S. (1963), Bibliography on the multivariate normal integrals and related
topics, Annals of Mathematical Statistics, 34, 829-838. [2.31
Gurland, John (1968), A relatively simple form of the distribution of the multiple
correlation coefficient, Journal of the Royal Statistical Society B, 30. 276-283. [4.4]
Gurland, J., and R. Milton (1970), Further consideration of the distribution of the
multiple correlation coefficient, Journal of the Royal Statistical Society B, 32,
381-394. [4.4]
Haavelmo, T. (1944), The probability approach in econometrics, Econometrica, 12.
Supplement, 1-118. [12.7]
Haff, L. R. (1980), Empirical Bayes estimation of the multivariate normal covariance
matrix, Annals of Statistics, 8, 586-597. [7.8]
Halmos, P. R. (1950), Measure Theory, D. Van Nostrand, New York. [4.5, 13.3]
Harris, Bernard, and Andrew P. Soms (1980), The use of the tetrachoric series for
evaluating multivariate normal probabilities, Journal of Multivariate Analysis, 10,
252-267. [2.3]
Hayakawa, Takesi (1967), On the distribution of the maximum latent root of a
positive definite symmetric random matrix, Annals of the Institute of Statistical
Mathematics, 19, 1-17. [8.6]
Heck, D. L. (1960), Charts of some upper percentage points of the distribution of the
largest characteristic root, Annals of Mathematical Statistics, 31, 625-642. [8.6]
Hickman, W. Braddock (1953), The Volume of Corporate Bond Financing Since 1900,
Princeton University, Princeton, 82-90. [10.7]
Hoel, Paul G. (1937), A significance test for component analysis, Annals of Mathemat-
ical Statistics, 8, 149-158. [7.5]
Hooker, R. H. (1907), The correlation of the weather and crops, Journal of the Royal
Statistical Society, 70, 1-42. [4.2]
Hotelling, Harold (1931), The generalization of Student's ratio, Annals of Mathemati-
cal Statistics, 2, 360-378. [5.1, 5.P]
Hotelling, Harold (1933), Analysis of a complex of statistical variables into principal
components, Journal of Educational Psychology, 24, 417-441, 498-520. [11.2, 14.3]
Hotelling, Harold (1936), Relations between two sets of variates, Biometrika, 28,
321-377. [12.1]
Hotelling, Harold (1947), Multivariate quality control, illustrated by the air testing of
sample bombsights, Techniques of Statistical Analysis (C. Eisenhart, M. Hastay,
and W. A. Wallis, eds.), McGraw-Hill, New York, 111-184. [8.6]
Hotelling, Harold (1951), A generalized T test and measure of multivariate disper-
sion, Proceedings of the Second Berkeley Symposium on Mathematical Statistics
and Probability (Jerzy Neyman, ed.), University of California, Los Angeles and
Berkeley, 23-41. [8.6, 10.7]
Hotelling, Harold (1953), New light on the correlation coefficient and its transforms
(with discussion), Journal of the Royal Statistical Society B, 15, 193-232. [4.2]
Howe, W. G. (1955), Some Contributions to Factor Analysis, U.S. Atomic Energy
Commission Report, Oak Ridge National Laboratory, Oak Ridge, Tennessee.
[14.2, 14.6]
Hsu, P. L. (1938), Notes on Hotelling's generalized T, Annals of Mathematical
Statistics, 9, 231-243. [5.4]
Hsu, P. L. (1939a), On the distribution of the roots of certain determinantal equa-
tions, Annals of Eugenics, 9, 250-258. [13.2]
Hsu, P. L. (1939b), A new proof of the joint product moment distribution, Proceedings
of the Cambridge Philosophical Society, 35, 336-338. [7.2]
Hsu, P. L. (1945), On the power functions for the E2-test and the T2-test, Annals of
Mathematical Statistics, 16, 278-286. [5.6]
Hudson. M. (1974), Empirical Bayes estimation, Technical Report No. 58, NSF
contract GP 3071IX-2, Department of Statistics, Stanford University. [3.5]
Immer, F. R., H. D. Hayes, and LeRoy Powers (1934), Statistical determination of
barley varietal adaptation, Journal of the American Society of Agronomy, 26,
403-407. [8.9]
Ingham, A. E. (1933), An integral which occurs in statistics, Proceedings of the
Cambridge Philosophical Society. 29, 271-276. [7.2]
Ito, K. (1956), Asymptotic formulae for the distribution of Hotelling's generalized T₀²
statistic, Annals of Mathematical Statistics, 27, 1091-1105. [8.6]
Ito, K. (1960), Asymptotic formulae for the distribution of Hotelling's generalized T₀²
statistic, II, Annals of Mathematical Statistics, 31, 1148-1153. [8.6]
Izenman, A. J. (1975), Reduced-rank regression for the multivariate linear model,
Journal of Multivariate Analysis, 5, 248-264. [12.7]
Izenman, Alan Julian (1980), Assessing dimensionality in multivariate regression,
Analysis of Variance, Handbook of Statistics, Vol. 1 (P. R. Krishnaiah, ed.),
North-Holland, Amsterdam, 571-591. [3.P]
James, A. T. (1954), Normal multivariate analysis and the orthogonal group, Annals of
Mathematical Statistics, 25, 40-75. [7.2]
James, A. T. (1964), Distributions of matrix variates and latent roots derived from
normal samples, Annals of Mathematical Statistics, 35, 475-501. [8.6]
James, W., and C. Stein (1961), Estimation with quadratic loss, Proceedings of the
Fourth Berkeley Symposium on Mathematical Statistics and Probability (Jerzy
Neyman, ed.), Vol. I, 361-379. University of California, Berkeley. [3.5, 7.8]
Japanese Standards Association (1972), Statistical Tables and Formulas with Computer
Applications. [Preface]
Jennrich, Robert I.. and Dorothy T. Thayer (1973), A note on Lawley's formulas for
standard errors in maximum likelihood factor analysis, Psychometrika, 38,
571-580. [14.3]
Johansen, S. (1995), Likelihood-based Inference in Cointegrated Vector Autoregressive
Models, Oxford University Press. [12.7]
Jolicoeur, Pierre, and J. E. Mosimann (1960), Size and shape variation in the painted
turtle, a principal component analysis, Growth, 24, 339-354. Also in Benchmark
Papers in Systematic and Evolutionary Biology (E. H. Bryant and W. R. Atchley,
eds.), 2 (1975), 86-101. [11.P]
Joreskog, K. G. (1969), A general approach to confirmatory maximum likelihood
factor analysis, Psychometrika, 34, 183-202. [14.2]
Joreskog, K. G., and Arthur S. Goldberger (1972), Factor analysis by generalized least
squares, Psychometrika, 37, 243-260. [14.3]
Kaiser, Henry F. (1958), The varimax criterion for analytic rotation in factor analysis,
Psychometrika, 23, 187-200. [14.5]
Kanazawa, M. (1979), The asymptotic cut-off point and comparison of error probabili-
ties in covariate discriminant analysis, Journal of the Japan Statistical Society, 9,
7-17. [6.6]
Kelley, T. L (1928), Crossroads in the Mind of Man, Stanford University, Stanford.
[4.P,9.P]
Kendall, M. G., and Alan Stuart (1973), The Advanced Theory of Statistics (3rd ed.),
VoL 2, Charles Griffin, London. [12.6]
Kennedy, William J., Jr., and James E. Gentle (1980), Statistical Computing, Marcel
Dekker, New York. [12.3]
Khatri, C. G. (1963), Joint estimation of the parameters of multivariate normal
populations, Journal of Indian Statistical Association, 1, 125-133. [7.2]
Khatri, C. G. (1966), A note on a large sample distribution of a transformed multiple
correlation coefficient, Annals of the Institute of Statistical Mathematics, 18,
375-380. [4.4]
Khatri, C. G. (1972), On the exact finite series distribution of the smallest or the
largest root of matrices in three situations, Journal of Multivariate Analysis, 2,
201-207. [8.6]
Khatri, C. G., and K. C. Sreedharan Pillai (1966), On the moments of the trace of a
matriX and approximations to its non-central distribution, Annals of Mathematical
Statistics, 37, 1312-1318. [8.6]
Khatri, C. G., and K. C. S. Pillai (1968), On the non-central distributions of two test
criteria in multivariate analysis of variance, Annals of Mathematical Statistics, 39,
215-226. [8.6]
Khatri, C. G., and K. V. Ramachandran (1958), Certain multivariate distribution
problems, I (Wishart's distribution), Journal of the Maharaja Sayajirao University
of Baroda, 7, 79-82. [7.2]
Kiefer, J. (1957), Invariance, minimax sequential estimation, and continuous time
processes, Annals of Mathematical Statistics, 28, 573-601. [7.8]
Kiefer, J. (1966), Multivariate optimality results, Multivariate Analysis (Paruchuri R.
Krishnaiah, ed.), Academic, New York, 255-274. [7.8]
Kiefer, J., and R. Schwartz (1965), Admissible Bayes character of T2-, R2-, and other
fully invariant tests for classical multivariate normal problems, Annals of Mathe-
matical Statistics, 36, 747-770. [5.P, 9.9, 10.10]
Klotz, Jerome, and Joseph Putter (1969), Maximum likelihood estimation of the
multivariate covariance components for the balanced one-way layout, Annals of
Mathematical Statistics, 40, 1100-1105. [10.6]
Kolmogorov, A. (1950), Foundations of the Theory of Probability, Chelsea, New York.
[2.2]
Konishi, Sadanori (1978a), An approximation to the distribution of the sample
correlation coefficient, Biometrika, 65, 654-656. [4.2]
Konishi, Sadanori (1978b), Asymptotic expansions for the distributions of statistics
based on a correlation matrix, Canadian Journal of Statistics, 6, 49-56. [4.2]
Konishi, Sadanori (1979), Asymptotic expansions for the distributions of functions of a
correlation matrix, Journal of Multivariate Analysis, 9, 259-266. [4.2]
Koopmans, T. C., and Olav Reiersøl (1950), The identification of structural character-
istics, Annals of Mathematical Statistics, 21, 165-181. [14.2]
Korin, B. P. (1968), On the distribution of a statistic used for testing a covariance
matrix, Biometrika, 55, 171-178. [10.8]
Korin, B. P. (1969), On testing the equality of k covariance matrices, Biometrika, 56,
216-218. [10.5]
Kramer, K. H. (1963), Tables for constructing confidence limits on the multiple
correlation coefficient, Journal of the American Statistical Association, 58,
1082-1085. [4.4]
Krishnaiah, P. R. (1978), Some recent developments on real multivariate distributions,
Developments in Statistics (P. R. Krishnaiah, ed.), Vol. 1, Academic, New York,
135-169. [8.6]
Krishnaiah, P. R. (1980), Computations of some multivariate distributions, Analysis of
Variance, Handbook of Statistics, Vol. 1 (P. R Krishnaiah, ed.), North-Holland,
Amsterdam, 745-971.
Krishnaiah, P. R., and T. C. Chang (1972), On the exact distributions of the traces of
S₁(S₁ + S₂)⁻¹ and S₁S₂⁻¹, Sankhya A, 34, 153-160. [8.6]
Krishnaiah, P. R, and F. J. Schuurmann (1974), On the evaluation of some distribu-
tions that arise in simultaneous tests for the equality of the latent roots of the
covariance matrix, Journal of Multivariate Analysis, 4, 265-282. [10.7]
Kshirsagar, A. M. (1959), Bartlett decomposition and Wishart distribution, Annals of
Mathematical Statistics, 30, 239-241. [7.2]
Kudo, H. (1955), On minimax invariant estimates of the transformation parameter,
Natural Science Report, 6, 31-73, Ochanomizu University, Tokyo, Japan. [7.8]
Kunitomo, Naoto (1980), Asymptotic expansions of the distributions of estimators in a
linear functional relationship and simultaneous equations, Journal of the Ameri-
can Statistical Association, 75, 693-700. [12.7]
Loeve, M. (1977), Probability Theory I (4th ed.), Springer-Verlag, New York. [2.2]
Loeve, M. (1978), Probability Theory II (4th ed.), Springer-Verlag, New York. [2.2]
Madow, W. G. (1938), Contributions to the theory of multivariate statistical analysis,
Transactions of the American Mathematical Society, 44, 454-495. [7.2]
Magnus, Jan R. (1988), Linear Structures, Charles Griffin and Co., London. [A.4]
Magnus, J. R, and H. Neudecker (1979), The commutation matrix: some properties
and applications, The Annals of Statistics, 7, 381-394. [3.6]
Mahalanobis, P. C. (1930), On tests and measures of group divergence, Journal and
Proceedings of the Asiatic Society of Bengal, 26, 541-588. [3.3]
Mahalanobis, P. C., R. C. Bose, and S. N. Roy (1937), Normalisation of statistical
variates and the use of rectangular co-ordinates in the theory of sampling
distributions, Sankhya, 3, 1-40. [7.2]
Mallows, C. L. (1961), Latent vectors of random symmetric matrices, Biometrika, 48,
133-149. [11.6]
Mardia, K. V. (1970), Measures of multivariate skewness and kurtosis with applica-
tions, Biometrika, 57, 519-530. [3.6]
Mariano, Roberto S., and Takamitsu Sawa (1972), The exact finite-sample distribu-
tion of the limited-information maximum likelihood estimator in the case of two
included endogenous variables, Journal of the American Statistical Association, 67,
159-163. [12.7]
Maronna, R. A. (1976), Robust M-estimators of multivariate location and scatter,
Annals of Statistics, 4, 51-67. [3.6]
Marshall, A. W., and I. Olkin (1979), Inequalities: Theory of Majorization and Its
Applications, Academic, New York. [8.10]
Mathai, A. M. (1971), On the distribution of the likelihood ratio criterion for testing
linear hypotheses on regression coefficients, Annals of the Institute of Statistical
Mathematics. 23, 181-197. [8,4]
Mathai, A. M., and R. S. Katiyar (1979), Exact percentage points for testing indepen-
dence, Biometrika, 66, 353-356. [9.3]
Mathai, A. M., and P. N. Rathie (1980), The exact non-null distribution for testing
equality of covariance matrices, Sankhya A, 42, 78-87. [10.4]
Mathai, A. M., and R. K. Saxena (1973), Generalized Hypergeometric Functions with
Applications in Statistics and Physical Sciences, Lecture Notes No. 348, Springer-
Verlag, New York. [9.3]
Mauchly, J. W. (1940), Significance test for sphericity of a normal n-variate distribu-
tion, Annals of Mathematical Statistics, 11, 204-209. [10.7]
Mauldon, J. G. (1955), Pivotal quantities for Wishart's and related distributions, and a
paradox in fiducial theory, Journal of the Royal Statistical Society B, 17, 79-85.
[7.2]
McDonald, Roderick P. (2002), What can we learn from path equations: identifiabil-
ity, constraints, equivalence, Psychometrika, 67, 225-249. [15.1]
McLachlan, G. J. (1973), An asymptotic expansion of the expectation of the estimated
error rate in discriminant analysis, Australian Journal of Statistics, 15, 210-214.
[6.6]
Nagarsenker, B. N., and K. C. S. Pillai (1972), The Distribution of the Sphericity Test
Criterion, ARL 72-0154, Aerospace Research Laboratories. [Preface]
Nagarsenker, B. N., and K. C. S. Pillai (1973a), The distribution of the sphericity test
criterion, Journal of Multivariate Analysis, 3, 226-235. [10.7]
Nagarsenker, B. N., and K. C. S. Piliai (1973b), Distribution of the likelihood ratio
criterion for testing a hypothesis specifying a covariance matrix, Biometrika, 60,
359-364. [10.8]
Nagarsenker, B. N., and K. C. S. Pillai (1974), Distribution of the likelihood ratio
criterion for testing Σ = Σ₀, μ = μ₀, Journal of Multivariate Analysis, 4, 114-122.
[10.9]
Nanda, D. N. (1948), Distribution of a root of a determinantal equation, Annals of
Mathematical Statistics, 19, 47-57. [8.6]
Nanda, D. N. (1950), Distribution of the sum of roots of a determinantal equation
under a certain condition, Annals of Mathematical Statistics, 21, 432-439. [8.6]
Nanda, D. N. (1951), Probability distribution tables of the larger root of a determinan-
tal equation with two roots, Journal of the Indian Society of Agricultural Statistics,
3, 175-177. [8.6]
Narain, R. D. (1948), A new approach to sampling distributions of the multivariate
normal theory, I, Journal of the Indian Society of Agricultural Statistics, 1, 59-69.
[7.2]
Narain, R. D. (1950), On the completely unbiased character of tests of independence
in multivariate normal systems, Annals of Mathematical Statistics, 21, 293-298.
[9.2]
National Bureau of Standards, United States (1959), Tables of the Bivariate Normal
Distribution Function and Related Functions, U.S. Government Printing Office,
Washington, D.C. [2.3]
Neveu, Jacques (1965), Mathematical Foundations of the Calculus of Probability,
Holden-Day, San Francisco. [2.2]
Ogawa, J. (1953), On the sampling distributions of classical statistics in multivariate
analysis, Osaka Mathematics Journal, 5, 13-52. [7.2]
Okamoto, Masashi (1963), An asymptotic expansion for the distribution of the linear
discriminant function, Annals of Mathematical Statistics, 34, 1286-1301. (Correc-
tion, 39 (1968), 1358-1359.) [6.6]
Okamoto, Masashi (1973), Distinctness of the eigenvalues of a quadratic form in a
multivariate sample, Annals of Statistics, 1, 763-765. [13.2]
Olkin, Ingram, and S. N. Roy (1954), On multivariate distribution theory, Annals of
Mathematical Statistics, 25, 329-339. [7.2]
Olson, C. L. (1974), Comparative robustness of six tests in multivariate analysis of
variance, Journal of the American Statistical Association, 69, 894-908. [8.6]
Pearl, Judea (2000), Causality: Models, Reasoning, and Inference, Cambridge Univer-
sity Press, Cambridge. [15.1]
Pearson, E. S., and H. O. Hartley (1972), Biometrika Tables for Statisticians, Vol. II,
Cambridge (England), Published for the Biometrika Trustees at the University
Press. [Preface, 8.4]
Pearson, E. S., and S. S. Wilks (1933), Methods of statistical analysis appropriate for k
samples of two variables, Biometrika, 25, 353-378. [10.5]
Pillai, K. C. S., and K. Jayachandran (1970), On the exact distribution of Pillai's V(s)
criterion, Journal of the American Statistical Association, 65, 447-454. [8.6]
Pillai, K. C. S., and T. A. Mijares (1959), On the moments of the trace of a matrix and
approximations to its distribution, Annals of Mathematical Statistics, 30,
1135-1140. [8.6]
Pillai, K. C. S., and B. N. Nagarsenker (1971), On the distribution of the sphericity
test criterion in classical and complex normal populations having unknown
covariance matrices, Annals of Mathematical Statistics, 42, 764-767. [10.7]
Pillai, K. C. S., and P. Samson, Jr. (1959), On Hotelling's generalization of T2,
Biometrika, 46, 160-168. [8.6]
Pillai, K. C. S., and T. Sugiyama (1969), Non-central distributions of the largest latent
roots of three matrices in multivariate analysis, Annals of the Institute of Statistical
Mathematics, 21, 321-327. [8.6]
Pillai, K. C. S., and D. L. Young (1971), On the exact distribution of Hotelling's
generalized T₀², Journal of Multivariate Analysis, 1, 90-107. [8.6]
Plana, G. A. A. (1813), Mémoire sur divers problèmes de probabilité, Mémoires de
l'Académie Impériale de Turin, pour les Années 1811-1812, 20, 355-408. [1.2]
Pólya, G. (1949), Remarks on computing the probability integral in one and two
dimensions, Proceedings of the Berkeley Symposium on Mathematical Statistics and
Probability (J. Neyman, ed.), 63-78. [2.P]
Rao, C. R. (1948a), The utilization of multiple measurements in problems of biologi-
cal classification, Journal of the Royal Statistical Society B, 10, 159-193. [6.9]
Rao, C. R. (1948b), Tests of significance in multivariate analysis, Biometrika, 35,
58-79. [5.3]
Rao, C. Radhakrishna (1951), An asymptotic expansion of the distribution of Wilks's
criterion, Bulletin of the International Statistical Institute, 33, Part 2, 177-180. [8.5]
Rao, C. R. (1952), Advanced Statistical Methods in Biometric Research, John Wiley &
Sons, New York. [12.5]
Rao, C. R. (1973), Linear Statistical Inference and Its Applications (2nd ed.), John
Wiley & Sons, New York. [4.2]
Rasch, G. (1948), A functional equation for Wishart's distribution, Annals of Mathe-
matical Statistics, 19, 262-266. [7.2]
Reiersøl, Olav (1950), On the identifiability of parameters in Thurstone's multiple
factor analysis, Psychometrika, 15, 121-149. [14.2]
Reinsel, G. C., and R. P. Velu (1998), Multivariate Reduced-rank Regression, Springer,
New York. [12.7]
Richardson, D. H. (1968), The exact distribution of a structural coefficient estimator,
Journal of the American Statistical Association, 63, 1214-1226. [12.7]
Rothenberg, Thomas J. (1977), Edgeworth expansions for multivariate test statistics,
IP-255, Center for Research in Management Science, University of California,
Berkeley. [8.6]
Roy, S. N. (1939), p-statistics or some generalisations in analysis of variance appropri-
ate to multivariate problems, Sankhya, 4, 381-396. [13.2]
Roy, S. N. (1945), The individual sampling distribution of the maximum, the minimum
and any intermediate of the p-statistics on the null-hypothesis, Sankhya, 7,
133-158. [8.6]
Sugiyama, T., and K. Fukutomi (1966), On the distribution of the extreme characteris-
tic roots of the matrices in multivariate analysis, Reports of Statistical Application
Research, Union of Japanese Scientists and Engineers, 13.
Sverdrup, Erling (1947), Derivation of the Wishart distribution of the second order
sample moments by straightforward integration of a multiple integral, Skandi-
navisk Aktuarietidskrift, 30, 151-166. [7.2]
Tang, P. C. (1938), The power function of the analysis of variance tests with tables
and illustrations of their use, Statistical Research Memoirs, 2, 126-157. [5.4]
Theil, H. (assisted by J. S. Cramer, H. Moerman, and A. Russchen) (1961), Economic
Forecasts and Policy (2nd rev. ed.), North-Holland, Amsterdam, Contributions to
Economic Analysis No. XV (first published 1958). [12.8]
Thomson, Godfrey H. (1934), Hotelling's method modified to give Spearman's g,
Journal of Educational Psychology, 25, 366-374. [14.3]
Thomson, Godfrey H. (1951), The Factorial Analysis of Human Ability (5th ed.),
University of London, London. [14.7]
Thurstone, L. L. (1947), Multiple-Factor Analysis, University of Chicago, Chicago.
[14.2, 14.5]
Tsay, R. S., and G. C. Tiao (1985), Use of canonical analysis in time series model
identification, Biometrika, 72, 299-315. [12.7]
Tukey, J. W. (1949), Dyadic anova, an analysis of variance for vectors, Human Biology,
21, 65-110. [8.9]
Tyler, David E. (1981), Asymptotic inference for eigenvectors, Annals of Statistics, 9,
725-745. [11.7]
Tyler, David E. (1982), Radial estimates and the test for sphericity, Biometrika, 69,
429-436. [3.6]
Tyler, David E. (1983a), Robustness and efficiency properties of scatter matrices,
Biometrika, 70, 411-420. [3.6]
Tyler, David E. (1983b), The asymptotic distribution of principal component roots
under local alternatives to multiple roots, Annals of Statistics, 11, 1232-1242.
[11.7]
Tyler, David E. (1987), A distribution free M-estimator of multivariate scatter, Annals
of Statistics, 15, 234-251. [3.6]
Velu, R. P., G. C. Reinsel, and D. W. Wichern (1986), Reduced rank models for
multiple time series, Biometrika, 73, 105-118. [12.7]
von Mises, R. (1945), On the classification of observation data into distinct groups,
Annals of Mathematical Statistics, 16, 68-73. [6.8]
von Neumann, J. (1937), Some matrix-inequalities and metrization of matric-space,
Tomsk University Review, 1, 286-300. Reprinted in John von Neumann Collected
Works (A. H. Taub, ed.), 4 (1962), Pergamon, New York, 205-219. [A.4]
Wald, A. (1943), Tests of statistical hypotheses concerning several parameters when
the number of observations is large, Transactions of the American Mathematical
Society, 54, 426-482. [4.2]
Wald, A. (1944), On a statistical problem arising in the classification of an individual
into one of two groups, Annals of Mathematical Statistics, 15, 145-162. [6.4, 6.5]
Wald, A. (1950), Statistical Decision Functions, John Wiley & Sons, New York. [6.2,
6.7, 8.10]
Wald, A., and R. Brookner (1941), On the distribution of Wilks' statistic for testing
the independence of several groups of variates, Annals of Mathematical Statistics,
12, 137-152. [8.4, 9.3]
Walker, Helen M. (1931), Studies in the History of Statistical Method, Williams and
Wilkins, Baltimore. [1.1]
Welch, P. D., and R. S. Wimpress (1961), Two multivariate statistical computer
programs and their application to the vowel recognition problem, Journal of the
Acoustical Society of America, 33, 426-434. [6.10]
Wermuth, N. (1980), Linear recursive equations, covariance selection and path
analysis, Journal of the American Statistical Association, 75, 963-972. [15.5]
Whittaker, E. T., and G. N. Watson (1943), A Course of Modem Analysis, Cambridge
University, Cambridge. [8.5]
Whittaker, Joe (1990), Graphical Models in Applied Multivariate Statistics, John Wiley
& Sons, Inc., Chichester. [15.1]
Wijsman, Robert A. (1979), Constructing all smallest simultaneous confidence sets in
a given class, with applications to MANOVA, Annals of Statistics, 7, 1003-1018.
[8.7]
Wijsman, Robert A. (1980), Smallest simultaneous confidence sets with applications
in multivariate analysis, Multivariate Analysis V, 483-498. [8.7]
Wilkinson, James Hardy (1965), The Algebraic Eigenvalue Problem, Clarendon, Oxford.
[11.4]
Wilkinson, J. H., and C. Reinsch (1971), Linear Algebra, Springer-Verlag, New York.
[11.4]
Wilks, S. S. (1932), Certain generalizations in the analysis of variance, Biometrika, 24,
471-494. [7.5, 8.3, 10.4]
Wilks, S. S. (1934), Moment-generating operators for determinants of product mo-
ments in samples from a normal system, Annals of Mathematics, 35, 312-340.
[8.3]
Wilks, S. S. (1935), On the independence of k sets of normally distributed statistical
variables, Econometrica, 3, 309-326. [8.4, 9.3, 9.P]
Wishart, John (1928), The generalised product moment distribution in samples from a
normal multivariate population, Biometrika, 20A. 32-52. [7.2]
Wishart, John (1948), Proofs of the distribution law of the second order moment
statistics, Biometrika, 35, 55-57. [7.2]
Wishart, John, and M. S. Bartlett (1933), The generalised product moment distribu-
tion in a normal system, Proceedings of the Cambridge Philosophical Society, 29,
260-270. [7.2]
Wold, H. D. A. (1954), Causality and econometrics, Econometrica, 22, 162-177. [15.1]
Wold, H. D. A. (1960), A generalization of causal chain models, Econometrica, 28,
443-463. [15.1]
Woltz, W. G., W. A. Reid, and W. E. Colwell (1948), Sugar and nicotine in cured
bright tobacco as related to mineral element composition, Proceedings of the Soil
Sciences Society of America, 13, 385-387. [8.P]
Wright, Sewall (1921), Correlation and causation, Journal of Agricultural Research, 20,
557-585. [15.1]
INDEX
O(N × p), 161
Orthonormal vectors, 647
Parallelotope, 266
   volume of, 266
Partial correlation coefficient
   computational formulas for, 39, 40, 41
   confidence intervals for, 143
   distribution of sample, 143
   geometric interpretation of sample, 138
   invariance of population, 63
   invariance of sample, 166
   maximum likelihood estimator of, 138
   in the population, 35
   recursion formula for, 41
   sample, 138
   tests about, 144
Partial covariance, 34
   estimator of, 137
Partial variance, 34
Partitioning of a matrix, 635
   addition of, 635
   of a covariance matrix, 25
   determinant of, 637
QL algorithm, 471
QR algorithm, 471
QR decomposition, 647
Quadratic form, 628
Quadratic loss function for covariance matrix, 276
r, 71
ℜ (real part), 257
Random matrix, 16
   expected value of, 17
Random vector, 16
Randomized test, definition of, 192
Rectangular coordinates, 257
   distribution of, 255, 257
Reduced rank regression, 514
   estimator, asymptotic distribution of, 550
Regression coefficients and function, 34
   confidence regions for, 339
   distribution of sample, 297
   geometric interpretation of sample, 138
   maximum likelihood estimator of, 294
   partial correlation, connection with, 61
   sample, 294