A COURSE IN MULTIVARIATE ANALYSIS

Lecture notes for Mathematics 204 A-B at the University of California, Irvine

by Howard G. Tucker*

July, 1993

* © University of California, Irvine

TABLE OF CONTENTS

CHAPTER 1. MULTIVARIATE NORMAL DISTRIBUTION
1. Transformations and the multivariate normal distribution
2. Conditional densities and conditional expectations
3. Regression and independence
4. (Random) orthogonal matrices

CHAPTER 2. THE WISHART DISTRIBUTION
1. Samples on a normal population
2. Lemmas for the Wishart matrix
3. The Wishart distribution

CHAPTER 3. HOTELLING'S $T^2$-STATISTIC
1. Relationships between $\bar X$ and $S$
2. Simultaneous confidence intervals
3. Application #1: Test of hypothesis for the mean vector
4. Application #2: Multivariate paired comparisons
5. Application #3: The multivariate two-sample $T^2$-test
6. Application #4: Linear hypotheses
7. Application #5: Growth curves

CHAPTER 4. INFERENCE ON MULTIVARIATE LINEAR MODELS
1. The multivariate linear model
2. The test statistic for the multivariate general linear hypothesis
3. One-way MANOVA
4. Application of MANOVA

CHAPTER 5. DISCRIMINANT ANALYSIS
1. The Fisher linear discriminant function
2. Discriminant analysis
3. Inequality of P-values
4. Re: The Fisher linear discriminant function

CHAPTER 1. MULTIVARIATE NORMAL DISTRIBUTION

§1. Fundamental Facts Concerning the Multivariate Normal Distribution. The strong prerequisite for multivariate analysis is a knowledge of the multivariate normal distribution and its properties. In this section we present this strong prerequisite in considerable detail.

DEFINITION: The $n$ random variables $X_1,\dots,X_n$, where $n \ge 2$, are said to be jointly normal (or Gaussian), or are said to have a multivariate normal (Gaussian) distribution, if there exist $n$ independent random variables $Z_1,\dots,Z_n$, each $N(0,1)$, constants $\mu_1,\dots,\mu_n$, and an $n\times n$ nonsingular matrix $A=(a_{ij})$ of real numbers such that
$$\begin{pmatrix} X_1\\ \vdots\\ X_n\end{pmatrix} = \begin{pmatrix}\mu_1\\ \vdots\\ \mu_n\end{pmatrix} + \begin{pmatrix} a_{11}&\cdots&a_{1n}\\ \vdots& &\vdots\\ a_{n1}&\cdots&a_{nn}\end{pmatrix} \begin{pmatrix} Z_1\\ \vdots\\ Z_n\end{pmatrix}.$$
This can be written in vector form as
$$X = \mu + AZ,$$
where $X=(X_1,\dots,X_n)^t$, $\mu=(\mu_1,\dots,\mu_n)^t$, $A=(a_{ij})$ and $Z=(Z_1,\dots,Z_n)^t$.

We shall adhere to the following notation. A point or vector $x$ in $\mathbb{R}^n$ will be a column vector, $x=(x_1,\dots,x_n)^t$. It will be treated like a matrix, so that $x^t$ denotes its transpose $(x_1,\dots,x_n)$. In general, if $B$ is any matrix, then $B^t$ denotes its transpose. We shall have use for the following three lemmas.

LEMMA 1. If $X_1,\dots,X_n$ are random variables with a joint absolutely continuous distribution with joint density $f(x_1,\dots,x_n)$, and if $W_i = a_iX_i + b_i$, $1\le i\le n$, where $a_i>0$ and $b_i$ are constants, then $W_1,\dots,W_n$ have a joint absolutely continuous distribution with density
$$f_{W_1,\dots,W_n}(w_1,\dots,w_n) = \frac{1}{a_1\cdots a_n}\, f\!\left(\frac{w_1-b_1}{a_1},\dots,\frac{w_n-b_n}{a_n}\right).$$

Proof: We note that
$$P[W_1\le w_1,\dots,W_n\le w_n] = P\!\left[X_1\le \tfrac{w_1-b_1}{a_1},\dots,X_n\le \tfrac{w_n-b_n}{a_n}\right] = \int_{-\infty}^{(w_1-b_1)/a_1}\!\!\cdots\int_{-\infty}^{(w_n-b_n)/a_n} f(x_1,\dots,x_n)\,dx_n\cdots dx_1.$$
Now make the change of variables (one at a time) $t_i = a_ix_i + b_i$, $1\le i\le n$, and obtain
$$P[W_1\le w_1,\dots,W_n\le w_n] = \int_{-\infty}^{w_1}\!\!\cdots\int_{-\infty}^{w_n} \frac{1}{a_1\cdots a_n}\, f\!\left(\frac{t_1-b_1}{a_1},\dots,\frac{t_n-b_n}{a_n}\right)dt_n\cdots dt_1.$$
By the definition of density we conclude that $f_{W_1,\dots,W_n}$ is as displayed above. QED.

LEMMA 2. Let $A$ be a non-singular $n\times n$ matrix, and denote $S=\{x\in\mathbb{R}^n: Ax\in I\}$, where $I=\times_{j=1}^n[a_j,b_j]$. Let $H$ be an integrable function over $S$. Then
$$\int\cdots\int_S H(Au)\,|\det A|\,du_1\cdots du_n = \int\cdots\int_I H(x)\,dx.$$
This lemma is a special case of a deep theorem in multivariable calculus whose proof is beyond the scope of this course.
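To make the defining representation $X=\mu+AZ$ concrete, here is a minimal numerical sketch (not part of the notes, in Python/NumPy, with an arbitrarily chosen nonsingular $A$): it draws a large sample of $X = \mu + AZ$ and compares the sample mean and sample covariance with $\mu$ and $AA^t$, the quantities identified in Theorem 1 below.

```python
import numpy as np

# Illustrative sketch: simulate X = mu + A Z with Z_1, ..., Z_n i.i.d. N(0,1)
# and compare the sample mean and covariance of X with mu and A A^t.
rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0, 0.5])
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.5, 0.0],
              [0.5, -1.0, 1.0]])   # any nonsingular 3 x 3 matrix will do

N = 200_000
Z = rng.standard_normal((N, 3))    # each row is one Z vector
X = mu + Z @ A.T                   # each row is one X = mu + A Z

print("sample mean:", X.mean(axis=0))                     # ~ mu
print("sample covariance:\n", np.cov(X, rowvar=False))    # ~ A A^t
print("A A^t:\n", A @ A.T)
```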
LEMMA 3. Let $X_1,\dots,X_n$ be random variables with a joint absolutely continuous distribution with joint density $f_X(x)$. Let $A$ be a non-singular $n\times n$ matrix, and define $U = AX$. Then $U$ has a joint absolutely continuous distribution with joint density given by
$$f_U(u) = f_X(A^{-1}u)\,|\det A^{-1}|.$$

Proof: For arbitrary $u\in\mathbb{R}^n$, define $S=\{x\in\mathbb{R}^n: Ax\in I\}$, where $I=\times_{i=1}^n(-\infty,u_i]$. Then by Lemma 2 above, we have
$$F_U(u) = P[U\in I] = P[X\in S] = \int\cdots\int_S f_X(x)\,dx = \int\cdots\int_I f_X(A^{-1}v)\,|\det A^{-1}|\,dv.$$
By the definition of joint density, it follows that $f_X(A^{-1}u)\,|\det A^{-1}|$ is the density of $U$. QED.

DEFINITION: If $U=(U_{ij})$ is a matrix of random variables, each having finite expectation, then we define $EU$ as the matrix of expectations $(E(U_{ij}))$. If $G(x)=(g_{ij}(x))$ is a matrix of integrable functions defined over some interval $[a,b]$, then $\int_a^b G(x)\,dx$ will denote the matrix of integrals $\big(\int_a^b g_{ij}(x)\,dx\big)$.

LEMMA 4. If $U=(U_{ij})$ is an $m\times n$ matrix of random variables, and if $A=(a_{ij})$ and $B=(b_{ij})$ are $r\times m$ and $n\times s$ matrices respectively of real numbers, then $E(AUB) = A\,E(U)\,B$.

Proof: The element in the $i$th row and $k$th column of $AU$ is $\sum_r a_{ir}U_{rk}$, and the element in the $i$th row and $j$th column of $AUB$ is $\sum_k\sum_r a_{ir}U_{rk}b_{kj}$. Hence by the definition given above of the expectation of a matrix of random variables, the element in the $i$th row and $j$th column of $E(AUB)$ is $\sum_k\sum_r a_{ir}E(U_{rk})b_{kj}$. By standard matrix multiplication, this is the element in the $i$th row and $j$th column of $A\,E(U)\,B$. QED.

DEFINITION: If $U$ and $V$ are $m$- and $n$-dimensional random vectors respectively, and if each of the coordinates of each of them has finite second moment, then we define the covariance matrix of $U,V$ to be
$$\mathrm{Cov}(U,V) = E\big((U-E(U))(V-E(V))^t\big),$$
i.e., $\mathrm{Cov}(U,V)=(c_{ij})$, an $m\times n$ matrix, where $c_{ij}=\mathrm{Cov}(U_i,V_j)$. The covariance matrix of the random vector $U$ is defined to be $\mathrm{Cov}(U)=\mathrm{Cov}(U,U)$; i.e., $\mathrm{Cov}(U)=(d_{ij})$ is an $m\times m$ matrix, where $d_{ij}=\mathrm{Cov}(U_i,U_j)$ for $i\ne j$, and $d_{ii}=\mathrm{Var}(U_i)$.

THEOREM 1. If $X_1,\dots,X_n$ are multivariate normal, then their joint density is
$$f_X(x) = \frac{1}{(2\pi)^{n/2}|C|^{1/2}}\exp\!\left\{-\tfrac12 (x-\mu)^t C^{-1}(x-\mu)\right\},$$
where $\mu = EX$ and $C$ is the covariance matrix, i.e., $C=E\big((X-\mu)(X-\mu)^t\big)$.

Proof: From the definition of joint normality of $X$, we know $X=\mu+AZ$, where the $Z_i$ are independent and $N(0,1)$, and $A$ is a non-singular matrix of real numbers. Thus
$$f_Z(z) = (2\pi)^{-n/2}\exp\{-\tfrac12 z^tz\}.$$
Note that $Z = A^{-1}(X-\mu)$; hence by Lemma 1 and Lemma 3 above,
$$f_X(x) = (2\pi)^{-n/2}\exp\{-\tfrac12 (x-\mu)^t (A^{-1})^t A^{-1}(x-\mu)\}\,|\det A^{-1}|.$$
Let us define the matrix $C$ by $C = AA^t$. Since $(A^t)^{-1}=(A^{-1})^t$, we have $C^{-1}=(AA^t)^{-1}=(A^{-1})^tA^{-1}$. (From here on, $|D|$ will denote the determinant of a matrix $D$, and $|\det D|$ will denote the absolute value of the determinant of $D$.) Also $|C| = |A|\,|A^t| = |A|^2$, and hence $|\det A^{-1}| = |C|^{-1/2}$. We now have
$$f_X(x) = \frac{1}{(2\pi)^{n/2}|C|^{1/2}}\exp\!\left\{-\tfrac12 (x-\mu)^t C^{-1}(x-\mu)\right\}.$$
We have yet to determine $\mu$ and $C$. We easily check from the definition that $E(X)=\mu$. Thus
$$\mathrm{Cov}(X) = E\big((X-\mu)(X-\mu)^t\big) = E(AZZ^tA^t) = A\,E(ZZ^t)\,A^t = AI_nA^t = AA^t = C.$$
QED.

From Theorem 1, we see that the joint density, and hence the distribution, of a multivariate normal random vector is determined by its expectation vector $\mu$ and covariance matrix $C$. Thus we shall write: $X$ is $N_n(\mu,C)$, which means: $X$ is an $n$-dimensional random vector whose expectation (or mean) vector is $\mu$ and whose covariance matrix is $C$. We now relate the usual definition of the multivariate normal distribution to that given above. Also, a number of properties of the multivariate normal distribution that are most frequently used in multivariate analysis and linear regression analysis will be obtained.

LEMMA 5. If $X$ is $N_n(\mu,C)$, then $C$ is positive definite.

Proof: By definition, $X=AZ+\mu$, where $Z_1,\dots,Z_n$ are i.i.d. $N(0,1)$ and $A$ is a non-singular $n\times n$ matrix. By the proof of Theorem 1 in §1, $C=AA^t$. Let $x\in\mathbb{R}^n$, $x\ne 0$. Then $A^tx\ne 0$ and $x^tCx = x^tAA^tx = (A^tx)^t(A^tx) > 0$. QED.
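As a quick illustration of Theorem 1 (not part of the notes), the following sketch evaluates the density formula directly for $C=AA^t$; the matrix, mean vector and evaluation point are arbitrary choices.

```python
import numpy as np

# Illustrative sketch: evaluate the density of Theorem 1 directly,
# f_X(x) = (2 pi)^(-n/2) |C|^(-1/2) exp(-(x-mu)^t C^{-1} (x-mu) / 2),
# for C = A A^t with an arbitrarily chosen nonsingular A.
def mvn_density(x, mu, C):
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(C, diff)      # (x-mu)^t C^{-1} (x-mu)
    return np.exp(-0.5 * quad) / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(C))

A = np.array([[2.0, 0.0],
              [1.0, 1.5]])
mu = np.array([1.0, -1.0])
C = A @ A.T

print(mvn_density(np.array([0.0, 0.0]), mu, C))
```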
THEOREM 2. If $X_1,\dots,X_n$ are random variables with a joint absolutely continuous distribution with density
$$f_X(x) = K\exp\{-\tfrac12 (x-\mu)^t D (x-\mu)\}\quad\text{for all } x\in\mathbb{R}^n,$$
where $\mu$ is a vector of constants and $D$ is a symmetric $n\times n$ positive definite matrix, then $X$ is $N_n(\mu,D^{-1})$.

Proof: Since $D$ is positive definite, so is $D^{-1}$. Now given the positive definite symmetric matrix $D^{-1}$, it is known that there exists a symmetric positive definite matrix $D^{-1/2}$ which satisfies $D^{-1/2}D^{-1/2}=D^{-1}$. Let $Z_1,\dots,Z_n$ be $n$ independent $N(0,1)$ random variables, and let us write $Y=D^{-1/2}Z+\mu$. By Theorem 1 of §1, we know that $Y$ is multivariate normal with density
$$f_Y(y) = \left(\frac{|D|}{(2\pi)^n}\right)^{1/2}\exp\{-\tfrac12(y-\mu)^tD(y-\mu)\}.$$
This is the density $f_X$; hence $X_1,\dots,X_n$ are multivariate normal, $X$ is $N_n(\mu,D^{-1})$, and $K=(|D|/(2\pi)^n)^{1/2}$. QED.

LEMMA 6. Let $C$ be the covariance matrix of jointly normal random variables, and let $C$ be partitioned as follows:
$$C = \begin{pmatrix} C_{11} & C_{12}\\ C_{21} & C_{22}\end{pmatrix},$$
where $C_{11}$ is a $k\times k$ submatrix, $1\le k < n$. …

THEOREM 6. If $\binom{U}{V}$ is $N_n(\mu,C)$, where $C$ is partitioned as above with $C_{11}$ of order $m\times m$ and $U$ of dimension $m$, then $U,V$ are independent if and only if $C_{12}=C_{21}^t=0$, i.e., $\mathrm{Cov}(U,V)=0$.

Proof: By Theorem 3, $U$ is $N_m(\mu_U,C_{11})$ and $V$ is $N_{n-m}(\nu,C_{22})$. If $U$ and $V$ are independent, then their joint density equals the product of the two densities, i.e., the product of
$$\frac{\exp\{-\tfrac12(u-\mu_U)^tC_{11}^{-1}(u-\mu_U)\}}{(2\pi)^{m/2}|C_{11}|^{1/2}}\quad\text{and}\quad \frac{\exp\{-\tfrac12(v-\nu)^tC_{22}^{-1}(v-\nu)\}}{(2\pi)^{(n-m)/2}|C_{22}|^{1/2}},$$
which is easily seen to be the density of $N_n\!\left(\binom{\mu_U}{\nu},\begin{pmatrix}C_{11}&0\\0&C_{22}\end{pmatrix}\right)$; this proves that $C_{12}=C_{21}^t=0$. If, conversely, $C_{12}=C_{21}^t=0$, then by a little algebra one shows that the joint density of $U$ and $V$ factors into the product of their densities. QED.

THEOREM 7. If $\binom{X}{Y}$ is $N_2(\mu,C)$, then there exist constants $a$ and $b$ and a random variable $Z$ such that
$$Y = a + bX + Z,$$
where $E(Z)=0$ and $X$ and $Z$ are independent.

Proof: Whatever the value of $b$, we see that the value of $a$ must be taken as $a=E(Y)-bE(X)$. Let us define $Z$ by
$$Z = Y - E(Y) - b\,(X-E(X)),\qquad b=\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}.$$
By Lemma 7, the random vector $\binom{X}{Z}$ is bivariate normal. An easy computation yields $\mathrm{Cov}(X,Z)=0$. By Theorem 6, we obtain that $Z$ and $X$ are independent. Thus if we take $b=\mathrm{Cov}(X,Y)/\mathrm{Var}(X)$ and $a=E(Y)-bE(X)$, we obtain the conclusion. QED.

If we denote by $\sigma_X$ and $\sigma_Y$ the standard deviations of $X$ and $Y$ respectively, and if $\rho_{XY}$ denotes the correlation coefficient of $X$ and $Y$, then, if $\binom{X}{Y}$ is $N_2(\mu,C)$, we may write
$$Y - E(Y) = \rho_{XY}\,\frac{\sigma_Y}{\sigma_X}\,(X - E(X)) + Z,$$
where $Z$ and $X$ are independent and $E(Z)=0$. This is a restatement of Theorem 7.
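The decomposition of Theorem 7 is easy to check by simulation. The sketch below (illustrative only; the covariance matrix and means are made up) computes $b=\mathrm{Cov}(X,Y)/\mathrm{Var}(X)$ and $a=E(Y)-bE(X)$ from a large bivariate normal sample and verifies that the residual $Z=Y-a-bX$ has mean approximately $0$ and is approximately uncorrelated with $X$.

```python
import numpy as np

# Illustrative Monte Carlo check of Theorem 7.
rng = np.random.default_rng(1)
C = np.array([[2.0, 1.2],
              [1.2, 3.0]])
L = np.linalg.cholesky(C)
XY = rng.standard_normal((100_000, 2)) @ L.T + np.array([1.0, -1.0])
X, Y = XY[:, 0], XY[:, 1]

S = np.cov(X, Y)                 # sample covariance matrix of (X, Y)
b = S[0, 1] / S[0, 0]            # b = Cov(X, Y) / Var(X)
a = Y.mean() - b * X.mean()      # a = E(Y) - b E(X)
Z = Y - a - b * X                # residual of Theorem 7

print("E(Z)      ~", Z.mean())
print("Cov(X, Z) ~", np.cov(X, Z)[0, 1])
```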
We conclude this long section with a simple application. The following problem frequently arises in bio-medical research. One has $n$ independent observations $\binom{X_1}{Y_1},\dots,\binom{X_n}{Y_n}$ on a random vector $\binom{X}{Y}$ whose joint distribution is bivariate normal with mean vector $\binom{\xi}{\eta}$. One wishes to test the null hypothesis $H_0:\xi=\eta$ against the alternative hypothesis $H_1:\xi\ne\eta$ with level of significance $\alpha$. A common mistake made is that of doing a two-sample t-test. However, for each $i$, $X_i$ and $Y_i$ are not necessarily independent; this happens when $X_i$ and $Y_i$ are, e.g., two particular measurements on the same patient. In such a case one must do a paired comparison test.

PROPOSITION 1. If $\binom{X_1}{Y_1},\dots,\binom{X_n}{Y_n}$ are independent $N_2(\mu,C)$ random vectors, then $X_1-Y_1,\dots,X_n-Y_n$ are independent $N(\mu_1-\mu_2,\tau^2)$ random variables, where $\tau^2=\mathrm{Var}(X_i)+\mathrm{Var}(Y_i)-2\,\mathrm{Cov}(X_i,Y_i)$.

Proof: Let $A$ be the $n\times 2n$ matrix defined as follows:
$$A=\begin{pmatrix} 1&-1&0&0&\cdots&0&0\\ 0&0&1&-1&\cdots&0&0\\ \vdots& & & & &\vdots&\vdots\\ 0&0&0&0&\cdots&1&-1\end{pmatrix},$$
and let $W$ be the $2n$-dimensional random vector defined by $W^t=(X_1\; Y_1\; X_2\; Y_2\;\cdots\;X_n\;Y_n)$. Then
$$AW = \begin{pmatrix} X_1-Y_1\\ \vdots\\ X_n-Y_n\end{pmatrix},$$
and $W$ is $N_{2n}(1_n\otimes\mu,\ I_n\otimes C)$, where $1_n\otimes\mu$ is the $2n$-dimensional vector $(\mu^t\;\mu^t\;\cdots\;\mu^t)^t$ and $I_n\otimes C$ is the $2n\times 2n$ block-diagonal matrix
$$I_n\otimes C=\begin{pmatrix} C& & \\ &\ddots& \\ & &C\end{pmatrix}.$$
It is easy to see that $\mathrm{rank}(A)=n$, and thus by Lemma 6 we have that $AW$ is $N_n\big((\mu_1-\mu_2)1_n,\ A(I_n\otimes C)A^t\big)$. It is easy to verify that $A(I_n\otimes C)A^t=\mathrm{diag}\{\tau^2,\dots,\tau^2\}$. Thus $X_1-Y_1,\dots,X_n-Y_n$ are independent, each being $N(\mu_1-\mu_2,\tau^2)$, and $H_0$ is true if and only if $\mu_1-\mu_2=0$. QED.

If we let $Z_i=X_i-Y_i$, $1\le i\le n$, $\bar Z=\frac1n\sum_{i=1}^n Z_i$ and
$$s^2=\frac{1}{n-1}\sum_{i=1}^n (Z_i-\bar Z)^2,$$
then, if $H_0$ is true, the statistic $T$ defined by
$$T=\frac{\bar Z}{s/\sqrt{n}}$$
has the t-distribution with $n-1$ degrees of freedom. We would accordingly reject $H_0$ if $|T|\ge C$, where $C$ is obtained from the $(n-1)$th row and the $1-\alpha/2$ column of the tables of the t-distribution.
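A hedged sketch of the paired comparison test just described, with fabricated data purely for illustration: the pairs are reduced to differences $Z_i=X_i-Y_i$ and a one-sample t statistic with $n-1$ degrees of freedom is computed. SciPy is used only for the t tail probability; `stats.ttest_rel(X, Y)` is an equivalent shortcut.

```python
import numpy as np
from scipy import stats

# Illustrative paired comparison test (Proposition 1) on simulated data.
rng = np.random.default_rng(2)
n = 25
C = np.array([[1.0, 0.7],
              [0.7, 1.5]])          # X_i and Y_i are correlated
L = np.linalg.cholesky(C)
data = rng.standard_normal((n, 2)) @ L.T + np.array([0.3, 0.0])
X, Y = data[:, 0], data[:, 1]

Z = X - Y                           # differences Z_i = X_i - Y_i
T = Z.mean() / (Z.std(ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(T), df=n - 1)
print("T =", T, " two-sided p-value =", p_value)
```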
EXERCISES

1. Prove: $X$ is $N_n(0,I_n)$, where $I_n$ is the $n\times n$ identity matrix, if and only if $X_1,\dots,X_n$ are independent and each $N(0,1)$.
2. Prove: If $A$ is an $n\times n$ non-singular matrix, then $(A^t)^{-1}=(A^{-1})^t$.
3. Prove: $(2\pi)^{-n/2}\int_{\mathbb{R}^n}\exp\{-\tfrac12 x^tx\}\,dx = 1$.
4. Let $A$ be an $n\times n$ symmetric non-singular matrix, and let $P$ be an $n\times n$ orthogonal matrix such that $P^tAP$ is a diagonal matrix, i.e., $P^tAP=(\lambda_i\delta_{ij})$, where $\lambda_1,\dots,\lambda_n$ are the characteristic roots of $A$ and $\delta_{ij}$ is the Kronecker delta. Prove that $A^{-1}=P(\lambda_i^{-1}\delta_{ij})P^t$.
5. If $X_1,\dots,X_n$ is multivariate normal, and if $\{k_1,\dots,k_n\}$ is a permutation of the integers $\{1,\dots,n\}$, then $X_{k_1},\dots,X_{k_n}$ is multivariate normal.
6. Prove: An $n$-dimensional random vector $X$ has a multivariate normal distribution if and only if there exist a vector $\mu\in\mathbb{R}^n$, a positive definite (symmetric) $n\times n$ matrix $A$ and independent random variables $Z_1,\dots,Z_n$, each being $N(0,1)$, such that $X=\mu+AZ$.
7. Verify the statements made about the example given after Theorem 3.
8. Let $U,V$ be two independent $N(0,1)$ random variables. Define $X,Y$ by $X=U$, $Y=\dots$ Find the joint density of $X,Y$.
9. Prove: If $A$ is a positive definite $n\times n$ matrix, then $\int_{\mathbb{R}^n}\exp\{-\tfrac12 x^tAx\}\,dx = (2\pi)^{n/2}|A|^{-1/2}$.
10. Prove: If $Z_1,\dots,Z_n$ are independent $N(0,1)$ random variables, if $P$ is an $n\times n$ orthogonal matrix, and if $W_1,\dots,W_n$ is defined by $W=PZ$, then $W_1,\dots,W_n$ are independent and $N(0,1)$.
11. Prove: If $X_1,\dots,X_n$ are multivariate normal, and if $c_1,\dots,c_n$ are constants, not all zero, then $c_1X_1+\dots+c_nX_n$ has a normal distribution.
12. Prove the converse of Problem 10: If $Z$ is $N_n(0,I_n)$, and if $P$ is an $n\times n$ matrix such that $W$ defined by $W=PZ$ is $N_n(0,I_n)$, then $P$ is an orthogonal matrix.
13. Prove: If $X$ is $N_n(\mu,C)$ and if $c\in\mathbb{R}^n$, $c\ne 0$, then $c^tX$ is $N(c^t\mu, c^tCc)$.
14. Prove: If $X$ is $N_n(\mu,C)$, then $X_1,\dots,X_n$ are independent if and only if $C$ is a diagonal matrix.
15. Let $X$ be $N_n(\mu,C)$ and $Y$ be $N_n(\nu,D)$, and assume that $X$ and $Y$ are independent. By the definition of multivariate normal distribution one can write $X=AU+\mu$ and $Y=BV+\nu$, where $A$ and $B$ are non-singular $n\times n$ matrices, $U$ is $N_n(0,I_n)$ and $V$ is $N_n(0,I_n)$. Prove that $U$ and $V$ are independent and that $\binom{U}{V}$ is $N_{2n}(0,I_{2n})$.
16. If $X_1,\dots,X_n$ are independent $N_p(\mu,C)$ random vectors, and if $a\in\mathbb{R}^p\setminus\{0\}$, then $a^tX_1,\dots,a^tX_n$ are independent $N(a^t\mu, a^tCa)$ random variables.
17. Prove: If $\binom{X}{Y}$ is $N_2(\mu,C)$, then $\mathrm{Var}(X)=\mathrm{Var}(Y)$ if and only if $X-Y$ and $X+Y$ are independent.

§2. Conditional Densities and Conditional Expectations. We develop these topics in this section only for random variables which have a joint absolutely continuous distribution function, i.e., which have a joint density.

DEFINITION 1. If $X_1,\dots,X_m,Y_1,\dots,Y_n$ are random variables with a joint absolutely continuous distribution function and joint density $f_{X,Y}(x,y)$, then we define the conditional density of $X$ given $Y=y$ by
$$f_{X\mid Y}(x\mid y)=\begin{cases} f_{X,Y}(x,y)/f_Y(y) & \text{if } f_Y(y)>0\\ 0 & \text{otherwise.}\end{cases}$$

PROPOSITION 1. In Definition 1, the conditional density $f_{X\mid Y}(x\mid y)$ is a density in $x$ for every fixed $y$ at which $f_Y(y)>0$.

Proof: Since $f_Y(y)>0$, it follows that $f_{X\mid Y}(x\mid y)\ge 0$. We need only prove that $\int_{\mathbb{R}^m} f_{X\mid Y}(x\mid y)\,dx = 1$. But
$$\int_{\mathbb{R}^m} f_{X\mid Y}(x\mid y)\,dx = \frac{1}{f_Y(y)}\int_{\mathbb{R}^m} f_{X,Y}(x,y)\,dx,$$
and the conclusion follows from the fact that $\int_{\mathbb{R}^m} f_{X,Y}(x,y)\,dx = f_Y(y)$. QED.

PROPOSITION 2. In Definition 1, $X$ and $Y$ are independent if and only if $f_{X\mid Y}(x\mid y)=f_X(x)$ for all $x$ and all $y$ at which $f_Y(y)>0$.

Proof: $X$ and $Y$ are independent if and only if $f_{X,Y}(x,y)=f_X(x)f_Y(y)$ at all $x,y$, which is true if and only if
$$f_{X\mid Y}(x\mid y)=\frac{f_{X,Y}(x,y)}{f_Y(y)}=f_X(x)$$
at all $x$ and all $y$ at which $f_Y(y)>0$. QED.

The following corollary will be used frequently.

COROLLARY TO PROPOSITION 2. In Proposition 2, $X$ and $Y$ are independent if and only if $f_{X\mid Y}(x\mid y)$ does not depend on $y$ for each fixed $x$.

Proof: If $X$ and $Y$ are independent, then, by Proposition 2, $f_{X\mid Y}(x\mid y)=f_X(x)$, which does not depend on $y$. Conversely, if $f_{X\mid Y}(x\mid y)$ does not depend on $y$, then
$$f_X(x)=\int_{\mathbb{R}^n} f_{X,Y}(x,y)\,dy=\int_{\mathbb{R}^n} f_{X\mid Y}(x\mid y)f_Y(y)\,dy=f_{X\mid Y}(x\mid y)\int_{\mathbb{R}^n} f_Y(y)\,dy=f_{X\mid Y}(x\mid y),$$
and thus by Proposition 2, $X$ and $Y$ are independent. QED.

DEFINITION 2. If, in Definition 1, $m=1$, we define the conditional expectation of $X_1$ given $Y=y$ by
$$E(X_1\mid Y=y)=\int_{-\infty}^{\infty} x\,f_{X_1\mid Y}(x\mid y)\,dx$$
at all $y\in\mathbb{R}^n$ at which $f_Y(y)>0$.

PROPOSITION 3. If, in Definition 1, $m=2$, and if $\alpha\ne 0$ is a constant, then
$$E(\alpha X_1+X_2\mid Y=y)=\alpha E(X_1\mid Y=y)+E(X_2\mid Y=y).$$

Proof: Let $U_1=\alpha X_1+X_2$, $U_2=X_2$ and $V=Y$. Then by Lemma 3 in Section 1 we have
$$f_{U_1,U_2,V}(u_1,u_2,y)=f_{X_1,X_2,Y}\!\left(\frac{u_1-u_2}{\alpha},u_2,y\right)\frac{1}{|\alpha|}.$$
Thus
$$E(\alpha X_1+X_2\mid Y=y)=E(U_1\mid V=y)=\int\!\!\int u_1\,f_{X_1,X_2\mid Y}\!\left(\frac{u_1-u_2}{\alpha},u_2\,\Big|\,y\right)\frac{1}{|\alpha|}\,du_2\,du_1.$$
Now make the following change of variables: $s=(u_1-u_2)/\alpha$, $t=u_2$. If this transformation is denoted by $g$, then $|\det g'(s,t)|=|\alpha|$. Hence
$$E(\alpha X_1+X_2\mid Y=y)=\int\!\!\int (\alpha s+t)\,f_{X_1,X_2\mid Y}(s,t\mid y)\,ds\,dt=\alpha\int s\,f_{X_1\mid Y}(s\mid y)\,ds+\int t\,f_{X_2\mid Y}(t\mid y)\,dt=\alpha E(X_1\mid Y=y)+E(X_2\mid Y=y).$$
QED.

DEFINITION 3. Suppose $m=1$ in Definition 1, and define $\varphi(y)=E(X_1\mid Y=y)$. We define the conditional expectation of $X_1$ given $Y$, denoted by $E(X_1\mid Y)$, by $E(X_1\mid Y)=\varphi(Y)$.

It should be noted that $E(X_1\mid Y)$ is not a number, as $E(X_1\mid Y=y)$ was, but is a function of $Y$.

PROPOSITION 4. If $m=1$ in Definition 1, and if $E|X_1|<\infty$, then $E(E(X_1\mid Y))=E(X_1)$.

Proof: Let $\varphi(\cdot)$ be as defined in Definition 3. Then
$$E(E(X_1\mid Y))=E(\varphi(Y))=\int_{\mathbb{R}^n}\varphi(y)f_Y(y)\,dy=\int_{\mathbb{R}^n}\left(\int_{-\infty}^{\infty} x\,f_{X_1\mid Y}(x\mid y)\,dx\right)f_Y(y)\,dy=\int_{-\infty}^{\infty} x\left(\int_{\mathbb{R}^n} f_{X_1,Y}(x,y)\,dy\right)dx=\int_{-\infty}^{\infty} x\,f_{X_1}(x)\,dx=E(X_1).$$
QED.

PROPOSITION 5. If $X$ is a random variable and $Y$ is an $n$-dimensional random vector, if $X$ and $Y$ are independent, and if $X,Y$ have a joint absolutely continuous distribution function, then $E(X\mid Y=y)=E(X)$ and $E(X\mid Y)=E(X)$.

Proof: By the hypothesis of independence, $f_{X,Y}(x,y)=f_X(x)f_Y(y)$, which implies $f_{X\mid Y}(x\mid y)=f_X(x)$. Now by the definition,
$$E(X\mid Y=y)=\int x\,f_{X\mid Y}(x\mid y)\,dx=\int x\,f_X(x)\,dx=E(X),$$
from which the conclusion follows. QED.

The proof of the next result is beyond the scope of this course.

PROPOSITION 6. If $X$ and $Y$ are independent $m$- and $n$-dimensional random vectors, if $g(X,Y)$ is a function of $X$ and $Y$ such that $g(X,Y)$ is a random variable and such that the joint distribution function of $g(X,Y),Y$ is absolutely continuous, and if $E|g(X,Y)|<\infty$, then
$$E(g(X,Y)\mid Y=y)=E\,g(X,y).$$
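Proposition 1 of this section can be checked numerically. The sketch below (illustrative only; an arbitrarily chosen bivariate normal density stands in for $f_{X,Y}$) forms $f_{X\mid Y}(x\mid y)=f_{X,Y}(x,y)/f_Y(y)$ on a grid and verifies that it integrates to approximately 1 in $x$ for a fixed $y$.

```python
import numpy as np

# Illustrative numerical check of Proposition 1 of Section 2.
def f_xy(x, y, rho=0.6):
    # bivariate normal density with standard margins and correlation rho
    q = (x ** 2 - 2.0 * rho * x * y + y ** 2) / (1.0 - rho ** 2)
    return np.exp(-q / 2.0) / (2.0 * np.pi * np.sqrt(1.0 - rho ** 2))

y0 = 0.8
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

f_y = np.sum(f_xy(x, y0)) * dx          # marginal density f_Y(y0)
cond = f_xy(x, y0) / f_y                # conditional density f_{X|Y}(x | y0)
print("integral of f_{X|Y}(. | y0):", np.sum(cond) * dx)   # ~ 1.0
```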
Although a proof of this proposition is beyond the scope of this course, we are able to motivate it by presenting a short proof of it in the case that the joint distribution of $X,Y$ is discrete. In this case,
$$E(g(X,Y)\mid Y=y)=\sum_z z\,P([g(X,Y)=z]\mid[Y=y])=\sum_z z\,P([g(X,y)=z]\mid[Y=y]).$$
Since $X$ and $Y$ are independent, we obtain
$$E(g(X,Y)\mid Y=y)=\sum_z z\,P([g(X,y)=z])=E\,g(X,y),$$
which concludes the proof.

PROPOSITION 7. If $X,Y$ are random variables with a joint absolutely continuous distribution function, and if $E|XY|<\infty$, then $E(XY\mid Y)=Y\,E(X\mid Y)$.

Proof: Let us consider the random variables $Z=XY$ and $W=Y$ and the corresponding one-to-one, continuously differentiable mapping of $\mathbb{R}^2\to\mathbb{R}^2$ defined by $z=xy$, $w=y$. Then $x=z/w$, $y=w$, and $|\det\,\partial(x,y)/\partial(z,w)|=1/|w|$ for $w\ne 0$. Hence $f_{Z,W}(z,w)=f_{X,Y}(z/w,w)\,\frac{1}{|w|}$, and
$$E(XY\mid Y=y)=E(Z\mid W=y)=\int z\,f_{Z\mid W}(z\mid y)\,dz=\int z\,\frac{f_{X,Y}(z/y,\,y)}{|y|\,f_Y(y)}\,dz.$$
Making the change of variable $z=xy$ in this integral (remember: $y$ is fixed), we have
$$E(XY\mid Y=y)=\int xy\,\frac{f_{X,Y}(x,y)}{f_Y(y)}\,dx=y\int x\,f_{X\mid Y}(x\mid y)\,dx=y\,E(X\mid Y=y).$$
Hence $E(XY\mid Y=y)=y\,E(X\mid Y=y)$, from which we obtain $E(XY\mid Y)=Y\,E(X\mid Y)$. QED.

The following proposition is a generalization of Proposition 7. Its proof is beyond the scope of this course. The student should supply a proof of it in the discrete case.

PROPOSITION 8. If $X$ is a random variable and $Y$ is a $p$-dimensional random vector, if $f:\mathbb{R}^p\to\mathbb{R}^1$ is a function such that $f(Y)$ is a random variable, and if $E|Xf(Y)|<\infty$, then $E(Xf(Y)\mid Y)=f(Y)\,E(X\mid Y)$.

The important property about $E(X\mid Y)$ is that it is that unique function of $Y$ which minimizes $E((X-f(Y))^2)$. Let us state this more precisely.

PROPOSITION 9. If $X,Y$ and $f$ are as in Proposition 8, then $E((X-E(X\mid Y))^2)\le E((X-f(Y))^2)$.

Proof: We first observe that
$$E((X-f(Y))^2)=E\big(((X-E(X\mid Y))+(E(X\mid Y)-f(Y)))^2\big)=E((X-E(X\mid Y))^2)+E((E(X\mid Y)-f(Y))^2)+2E\big((X-E(X\mid Y))(E(X\mid Y)-f(Y))\big).$$
Note that $E(X\mid Y)-f(Y)$ is a function of $Y$. Hence by Propositions 4 and 8 we have
$$E\big((X-E(X\mid Y))(E(X\mid Y)-f(Y))\big)=E\Big(E\big((X-E(X\mid Y))(E(X\mid Y)-f(Y))\mid Y\big)\Big)=E\big((E(X\mid Y)-f(Y))\{E(X\mid Y)-E(E(X\mid Y)\mid Y)\}\big).$$
By Proposition 8, $E(E(X\mid Y)\mid Y)=E(X\mid Y)$. Thus
$$E((X-f(Y))^2)=E((X-E(X\mid Y))^2)+E((E(X\mid Y)-f(Y))^2)\ge E((X-E(X\mid Y))^2).$$
QED.

EXERCISES

1. Let $X,Y,Z$ be independent random variables, each one being $N(0,1)$. Find $E(X^2+Y^2+Z^3\mid Z=z)$.
2. Let $X,Y$ be random variables whose joint distribution is uniform over the unit disk in $\mathbb{R}^2$, i.e., their joint density is
$$f_{X,Y}(x,y)=\begin{cases}\frac{1}{\pi} & \text{if } x^2+y^2\le 1\\ 0 & \text{if } x^2+y^2>1.\end{cases}$$
(i) Prove that $X$ and $Y$ are not independent. (ii) Determine $f_{X\mid Y}(x\mid y)$ when $-1<y<1$.
… $>0$, then for every fixed value of $y$, the conditional density $f_{X\mid Y}(x\mid y)$ is (in $x$) the density of a random variable which is $(\alpha,\beta)$.
3. If $Z$ is an $n$-dimensional random vector with an absolutely continuous joint distribution, and if its density is
$$f_Z(\zeta)=K\exp\{-\tfrac12(\zeta-a)^tT(\zeta-a)\}\quad\text{for all }\zeta\in\mathbb{R}^n,$$
where $T$ is a positive definite $n\times n$ matrix, prove that $K=\sqrt{\det T}/(2\pi)^{n/2}$.

§4. (Random) Orthogonal Matrices. We explore independence further through the use of (random) orthogonal matrices.

LEMMA 1. If $X_1,\dots,X_n$ are independent $N(0,\sigma^2)$ random variables, if $A$ is an $n\times n$ orthogonal matrix, and if $Y=AX$, then $Y_1,\dots,Y_n$ are i.i.d. $N(0,\sigma^2)$.

Proof: Since $A$ is non-singular, and since $X$ is $N_n(0,\sigma^2I_n)$, then $AX$ is $N_n(A0,\,\sigma^2AI_nA^t)=N_n(0,\sigma^2I_n)$. QED.
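Lemma 1 lends itself to a quick Monte Carlo check. The sketch below (not part of the notes) builds an orthogonal matrix from a QR factorization, applies it to i.i.d. $N(0,\sigma^2)$ vectors, and confirms that the sample covariance of the transformed vectors is close to $\sigma^2I_n$.

```python
import numpy as np

# Illustrative check of Lemma 1: Y = A X is again N_n(0, sigma^2 I_n)
# when A is orthogonal and X is N_n(0, sigma^2 I_n).
rng = np.random.default_rng(3)
n, sigma, N = 4, 2.0, 200_000

A, _ = np.linalg.qr(rng.standard_normal((n, n)))   # an orthogonal matrix
X = sigma * rng.standard_normal((N, n))            # rows are X vectors
Y = X @ A.T                                        # each row is A x

print("sample covariance of Y:\n", np.cov(Y, rowvar=False))   # ~ sigma^2 I_n
```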
THEOREM 1. Let $Z_1,\dots,Z_n$ be i.i.d. $N(0,\sigma^2)$ random variables, and let $P=(P_{ij})$ be an $n\times n$ matrix of random variables which is an orthogonal matrix with probability 1, i.e., $P[P^tP=I_n]=1$, and such that the joint distribution function of the $P_{ij}$'s is absolutely continuous. Assume that $Z$ and $P$ are independent, and let $W=PZ$. Then $W_1,\dots,W_n$ are i.i.d. $N(0,\sigma^2)$.

Proof: Let us denote $[W\le w]=\bigcap_{i=1}^n[W_i\le w_i]$. Then, letting $\mathcal{P}$ denote the set of all $n\times n$ orthogonal matrices, we have
$$P[W\le w]=P[PZ\le w]=E\big(E(I_{[PZ\le w]}\mid P)\big)=\int_{\mathcal{P}} E(I_{[PZ\le w]}\mid P=p)\,f_P(p)\,dp.$$
Now by Proposition 6 of Section 2, and by Lemma 1 above, it follows that
$$P[W\le w]=\int_{\mathcal{P}} E(I_{[pZ\le w]})\,f_P(p)\,dp=\int_{\mathcal{P}} P[pZ\le w]\,f_P(p)\,dp=(2\pi\sigma^2)^{-n/2}\int_{-\infty}^{w_1}\!\!\cdots\int_{-\infty}^{w_n}\exp\!\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n z_i^2\right\}dz_n\cdots dz_1.$$
QED.

THEOREM 2. If $U$ is $N_n(0,\sigma^2I_n)$, if $P_0$ is a $k\times n$ matrix with $k$ …

… $N>0$, then $f$ achieves its unique maximum over $\mathcal{Q}$ at $Q=NI_n$.

Proof: Let $Q\in\mathcal{Q}$, and let $\lambda_1,\dots,\lambda_n$ denote the (necessarily real) eigenvalues of $Q$. Since $Q$ is positive-definite, all $\lambda_i>0$, and by Lemma 5, $\mathrm{tr}\,Q=\sum_{i=1}^n\lambda_i$. Also $\det Q=\prod_{i=1}^n\lambda_i$. Thus
$$f(Q)=\tfrac{N}{2}\log\prod_{i=1}^n\lambda_i-\tfrac12\sum_{i=1}^n\lambda_i=\tfrac12\sum_{i=1}^n\big(N\log\lambda_i-\lambda_i\big).$$
Now $N\log\lambda-\lambda$ is maximized at the $\lambda$ for which $(N/\lambda)-1=0$, i.e., when $\lambda=N$; hence $f$ is maximized at any $Q\in\mathcal{Q}$ for which all its eigenvalues are equal to $N$. By Lemma 4, there is only one such $Q\in\mathcal{Q}$, namely, $Q=NI_n$. QED.

LEMMA 7. Let $\mathcal{Q}$ be as in Lemma 6, and let $D\in\mathcal{Q}$ be fixed. If, for each $C\in\mathcal{Q}$, $f(C)$ is defined by
$$f(C)=\tfrac{N}{2}\log\det C-\tfrac12\,\mathrm{tr}(CD),$$
then $f$ achieves exactly one maximum value over $\mathcal{Q}$, namely at $C=ND^{-1}$, and this maximum value is $\tfrac12 Nn\log N-\tfrac12 N\log\det D-\tfrac12 nN$.

Proof: There exists a positive-definite symmetric matrix $D^{1/2}$ such that $D=D^{1/2}D^{1/2}$. By the properties of determinants and trace, we have
$$f(C)=\tfrac{N}{2}\log\frac{\det(D^{1/2}CD^{1/2})}{\det D}-\tfrac12\,\mathrm{tr}(D^{1/2}CD^{1/2})=-\tfrac{N}{2}\log\det D+\left\{\tfrac{N}{2}\log\det(D^{1/2}CD^{1/2})-\tfrac12\,\mathrm{tr}(D^{1/2}CD^{1/2})\right\}.$$
Now $-\tfrac{N}{2}\log\det D$ does not depend on $C$, so by Lemma 6, $f$ achieves its unique maximum when $D^{1/2}CD^{1/2}=NI_n$, or $C=ND^{-1}$. Moreover,
$$f(ND^{-1})=\tfrac{N}{2}\log\det(ND^{-1})-\tfrac12\,\mathrm{tr}(ND^{-1}D)=\tfrac{N}{2}\log(N^n\det D^{-1})-\tfrac12 nN=\tfrac12 Nn\log N-\tfrac12 N\log\det D-\tfrac12 nN.$$
QED.

LEMMA 8. Let $Z=(Z_{ij})$ be an $n\times n$ matrix of independent random variables, each having an absolutely continuous distribution function. Then $P[\det Z\ne 0]=1$.

Proof: We prove this by induction on $n$. The lemma is clearly true for $n=1$. Assuming it to be true for $n-1$, where $n\ge 2$, we shall prove it true for $n$. Let $Q_{1j}$ denote the cofactor of $Z_{1j}$ in $Z$, and consider the following change of variables:
$$W_{11}=Z_{11}Q_{11}+Z_{12}Q_{12}+\cdots+Z_{1n}Q_{1n},\qquad W_{ij}=Z_{ij}\ \text{ otherwise.}$$
By the induction hypothesis, $P[Q_{11}=0]=0$. One should note that $Q_{11},\dots,Q_{1n}$ do not depend on $Z_{11},\dots,Z_{1n}$. Hence we can solve for $Z_{11},\dots,Z_{1n}$ in terms of $W_{11},\dots,W_{nn}$ and obtain the absolute value of the Jacobian of the $Z$'s with respect to the $W$'s, which equals $1/|Q_{11}|\ne 0$ with probability one. Hence we can find the joint density of the $W$'s, since we do know that the $Z$'s have a joint density. We integrate out $w_{12},\dots,w_{nn}$ to find the marginal density of $W_{11}$. But $W_{11}=\det Z$, and since $W_{11}$ has a density and hence a continuous distribution function, it follows that $P[W_{11}=0]=0$. QED.

LEMMA 9. If $X_1,\dots,X_n$ are all $N_p(\mu,\Sigma)$ and independent, with $1\le p$ … for $i>j$, and $b_{ii}^2$ has the $\chi^2_{n-i+1}$-distribution.

Let $X_1,\dots,X_n$ be i.i.d. $N_p(0,I_p)$, where $1\le p$ …
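The maximization in Lemma 7 can be illustrated numerically. The sketch below (all specific values made up) evaluates $f(C)=\tfrac{N}{2}\log\det C-\tfrac12\mathrm{tr}(CD)$ at $C=ND^{-1}$ and at small symmetric perturbations of it, which can only decrease $f$.

```python
import numpy as np

# Illustrative numerical check of Lemma 7: f is maximized at C = N D^{-1}.
rng = np.random.default_rng(4)
N, p = 50, 3
M = rng.standard_normal((p, p))
D = M @ M.T + p * np.eye(p)              # an arbitrary positive-definite D

def f(C):
    return 0.5 * N * np.log(np.linalg.det(C)) - 0.5 * np.trace(C @ D)

C_star = N * np.linalg.inv(D)
print("f at N * D^{-1}  :", f(C_star))
for _ in range(3):
    E = rng.standard_normal((p, p))
    perturbed = C_star + 0.05 * (E + E.T)   # small symmetric perturbation,
    print("f at perturbation:", f(perturbed))  # stays positive definite here
```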

LEMMA 2. If $C=(c_{ij})$ is a lower triangular matrix of real numbers with $c_{ii}>0$ for all $i$ (as in the hypothesis), then $C$ is non-singular, and $C^{-1}$ is lower triangular with positive diagonal elements.

Proof: Since $C$ is lower triangular, $\det C=\prod_{i=1}^n c_{ii}$. Since $c_{ii}>0$ for all $i$, then $\det C>0$, and $C$ is non-singular. Note that for each $i$ the cofactor of $c_{ii}$ is the determinant of a lower triangular matrix, all of whose diagonal elements are positive. Hence all the diagonal elements of $C^{-1}$ are positive. If $j>i$, then the cofactor of $c_{ij}$ is $(-1)^{i+j}$ times the determinant of a lower triangular matrix with $j-i$ diagonal elements being zeros. Thus if $C^{-1}=(b_{ij})$, then $b_{ij}=0$ if $j>i$. QED.

LEMMA 3. If $B$ is a positive-definite symmetric matrix, then there exists a lower triangular matrix $D$ which satisfies $DD^t=B$.

Proof: Let $B^{-1/2}$ be a positive-definite matrix satisfying $B^{-1/2}B^{-1/2}=B^{-1}$, and let $B^{1/2}=(B^{-1/2})^{-1}$. Note that $B^{1/2}$ is positive definite, and $B=B^{1/2}B^{1/2}$. By Lemma 1 there exists a lower triangular matrix $C$ such that $CB^{1/2}$ is an orthogonal matrix. Then $CB^{1/2}B^{1/2}C^t=I$, or $B=C^{-1}(C^{-1})^t$. By Lemma 2, $C^{-1}$ is lower triangular. Thus we may take $D=C^{-1}$. QED.

LEMMA 4. The matrix $A=XX^t$ is positive definite with probability one.

Proof: Since $n>p$, $X$ has rank $p$ with probability one. Hence for all $\zeta\in\mathbb{R}^p\setminus\{0\}$, $\zeta^tX\ne 0^t$ with probability one, which implies $\zeta^tXX^t\zeta>0$ with probability one. QED.

Notation: Let $X_i^t$ denote the $i$th row of $X$, i.e., $X^t=(X_1\ \cdots\ X_p)$. Note that by Lemma 8 of Section 1 the vectors $X_1,\dots,X_p$ are linearly independent with probability one.

LEMMA 5. There exists a $p\times p$ lower triangular matrix of random variables, denoted by $B^\circ=(b^\circ_{ij})$, such that, for each $i$, $b^\circ_{i1},\dots,b^\circ_{ii}$ are measurable functions of $X_1,\dots,X_i$ only, and such that if
$$Y_1=b^\circ_{11}X_1,\quad Y_2=b^\circ_{21}X_1+b^\circ_{22}X_2,\quad\dots,\quad Y_p=b^\circ_{p1}X_1+\cdots+b^\circ_{pp}X_p,$$
then $Y_1,\dots,Y_p$ form an orthonormal system in $\mathbb{R}^n$ with probability 1. In addition, $b^\circ_{ii}>0$ with probability one for $1\le i\le p$.

Proof: Apply the Gram–Schmidt process to $X_1,\dots,X_p$; … $b^\circ_{ii}>0$ with probability one.

Thus the inverse of $B^\circ$ exists with probability one, and we may define $B$ by $B=(B^\circ)^{-1}$. We denote $B=(b_{ij})$ and observe that, by Lemma 2, $B$ is lower triangular. We collect all this in the following lemma.

LEMMA 6. The following hold: (1) $Y=B^\circ X$, (2) $YY^t=I_p$ with probability one, (3) $X=BY$, (4) $B=XY^t$, (5) $A=BB^t$, and (6) $b_{11}=X_1^tY_1$.

Proof: (1) follows from Lemma 5. (2) is true because $Y_1,\dots,Y_p$ are orthonormal with probability one. (3) follows from (1). (4) follows from (2) and (3). (5) follows from (2), (4) and the definition of $A$. (6) follows from (4). QED.

LEMMA 7. For $2\le i\le p$, the sets of random vectors $\{Y_1,\dots,Y_{i-1}\}$ and $\{X_i,\dots,X_p\}$ are independent.

Proof: Recall that $Y_1,\dots,Y_p$ form an orthonormal system, where $Y_j$ (by the Gram–Schmidt process that occurred to make $Y$ out of $X$) is a function of $X_1,\dots,X_j$ only. Since all $np$ random variables in $X$ are independent, then $\{X_i,\dots,X_p\}$ and $\{Y_1,\dots,Y_{i-1}\}$ are independent. QED.

LEMMA 8. For $2\le i\le p$, the matrix
$$\begin{pmatrix} Y_1^t\\ \vdots\\ Y_{i-1}^t\end{pmatrix}$$
constitutes the first $i-1$ rows of a random orthogonal matrix, all of whose entries are random variables that are independent of $X_i,\dots,X_p$.

Proof: This follows from Lemma 2 in Section 4 of Chapter 1 and from Lemma 7 above. QED.

LEMMA 9. For $2\le i\le p$,
$$(b_{i1},\dots,b_{i,i-1})^t=(Y_1^tX_i,\dots,Y_{i-1}^tX_i)^t.$$

Proof: By Lemma 6(4), $B=XY^t$, or $B^t=YX^t$. The first $i-1$ terms of the $i$th column of both sides of this equation yield our result. QED.

LEMMA 10. For $2\le i\le p$, … are independent, and each …

… $P[b_{ii}>0]=1$. … Also one easily proves that $b_{ii}=1/b^\circ_{ii}$.
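The lower triangular factorization $A=BB^t$ of this section can be illustrated with NumPy's Cholesky routine standing in for the Gram–Schmidt construction (an assumption of convenience, not the construction used in the notes): for $X$ a $p\times n$ matrix of independent $N(0,1)$ variables, the averages of $b_{ii}^2$ over many replications should be close to $n-i+1$, the means of the chi-square distributions that appear in these lemmas.

```python
import numpy as np

# Illustrative Monte Carlo sketch: A = X X^t, B lower triangular with
# A = B B^t (computed here via Cholesky); the averages of b_ii^2 should
# be near n - i + 1.
rng = np.random.default_rng(5)
p, n, reps = 3, 10, 20_000
diag_sq = np.zeros(p)
for _ in range(reps):
    X = rng.standard_normal((p, n))
    B = np.linalg.cholesky(X @ X.T)       # lower triangular, A = B B^t
    diag_sq += np.diag(B) ** 2
print("average of b_ii^2:", diag_sq / reps)    # ~ [n, n-1, n-2]
```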
LEMMA 13. The rows of $B$ are stochastically independent.

Proof: Let us denote
$$P_{i-1}=\begin{pmatrix} X_1^t\\ \vdots\\ X_{i-1}^t\end{pmatrix}\quad\text{and}\quad Q_{i-1}=\begin{pmatrix} Y_1^t\\ \vdots\\ Y_{i-1}^t\end{pmatrix},$$
and let $B^\circ_{i-1}$ denote the first $i-1$ rows and first $i-1$ columns of $B^\circ$. Because of the fact that $B^\circ$ is lower triangular, it follows that $Q_{i-1}=B^\circ_{i-1}P_{i-1}$. But note that all of the random variables in $B^\circ_{i-1}$ are Borel-measurable functions of those in $P_{i-1}$, the first $i-1$ rows of $X$, and thus we may write $Q_{i-1}=\psi(P_{i-1})$, where $\psi$ is a Borel-measurable function. Let us agree on the following notation: if $V=(V_1,\dots,V_k)^t$ is a random vector, and if $v=(v_1,\dots,v_k)^t\in\mathbb{R}^k$, then we shall denote $[V\le v]=\bigcap_{j=1}^k[V_j\le v_j]$. Let $p_{i-1}$ be an $(i-1)\times n$ matrix of linearly independent rows of real numbers. Then for $x\in\mathbb{R}^{i-1}$ we have, for $y>0$,
$$P\big([b_{i1}\le x_1,\dots,b_{i,i-1}\le x_{i-1},\,b_{ii}^2\le y]\,\big|\,P_{i-1}=p_{i-1}\big)=P\big([Q_{i-1}X_i\le x,\ X_i^tX_i-X_i^tQ_{i-1}^tQ_{i-1}X_i\le y]\,\big|\,P_{i-1}=p_{i-1}\big)$$
$$=P\big([\psi(p_{i-1})X_i\le x,\ X_i^tX_i-X_i^t\psi(p_{i-1})^t\psi(p_{i-1})X_i\le y]\big)=\int_{-\infty}^{x_1}\!\!\cdots\int_{-\infty}^{x_{i-1}}\prod_{j=1}^{i-1}\phi(t_j)\,dt\ \cdot\int_0^y\chi^2_{n-i+1}(t)\,dt,$$
where $\phi$ is the density of the $N(0,1)$ distribution and $\chi^2_{n-i+1}(t)$ is the density of the $\chi^2_{n-i+1}$-distribution. Thus we see that the conditional joint distribution of the $i$th row of $B$, namely $b_{i1},\dots,b_{ii}$, given the values of the first $i-1$ rows of $X$, is the same as its unconditional distribution. We use this just-proved fact to prove that the rows of $B$ are independent: applying it successively, the joint distribution of the rows of $B$ factors into the product of the distributions of the individual rows. QED.

LEMMA 1. If $D$ has the $W_p(n,\Sigma)$ distribution, and if $H$ is an $m\times p$ matrix of real numbers of rank $m$ ($m\le p$), then $HDH^t$ has the $W_m(n,H\Sigma H^t)$ distribution.

Proof: Let $X$ be a sample of size $n$ on a $N_p(\mu,\Sigma)$ distribution. Then $Y=HX$ is a sample of size $n$ on a $N_m(H\mu,H\Sigma H^t)$ distribution. We may write
$$D=(X-\mu 1^t)(X-\mu 1^t)^t.$$
Then, clearly,
$$HDH^t=(Y-H\mu 1^t)(Y-H\mu 1^t)^t.$$
Thus $HDH^t$ has the $W_m(n,H\Sigma H^t)$ distribution. QED.

COROLLARY 1 TO LEMMA 1. If $D$ is $W_p(n,\Sigma)$, if
$$D=\begin{pmatrix} D_{11}&D_{12}\\ D_{21}&D_{22}\end{pmatrix}$$
is a partition of $D$ where $D_{11}$ is an $m\times m$ submatrix, and if
$$\Sigma=\begin{pmatrix} \Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}\end{pmatrix}$$
is a partition of $\Sigma$ where $\Sigma_{11}$ is an $m\times m$ submatrix, then $D_{11}$ has the $W_m(n,\Sigma_{11})$ distribution.

Proof: Let $H=(I_m\mid 0)$ be an $m\times p$ matrix. Then Lemma 1 yields the result. QED.

COROLLARY 2 TO LEMMA 1. If $h\in\mathbb{R}^p$ is a constant vector, if $h\ne 0$, and if $D$ is $W_p(n,\Sigma)$, then $h^tDh/h^t\Sigma h$ has the $\chi^2_n$-distribution.

Proof: By Lemma 1, $h^tDh$ has the $W_1(n,h^t\Sigma h)$ distribution, i.e., $h^tDh$ has the same distribution as does $\sum_{i=1}^n Y_i^2$, where $Y_1,\dots,Y_n$ are independent, each being $N(0,h^t\Sigma h)$. Hence $h^tDh/h^t\Sigma h$ has the $\chi^2_n$-distribution. QED.

LEMMA 2. If $h$ is a $p$-dimensional random vector such that $P[h\ne 0]=1$, if $D$ is $W_p(n,\Sigma)$, and if $h$ and $D$ are independent, then (i) $h^tDh/h^t\Sigma h$ has the $\chi^2_n$-distribution, and (ii) $h^tDh/h^t\Sigma h$ and $h$ are independent.

Proof: We shall prove this lemma only in the case where $h$ has a joint absolutely continuous distribution. By Corollary 2 to Lemma 1 and by Propositions 4 and 6 in Chapter 1 we have, for $z>0$,
$$P[h^tDh/h^t\Sigma h\le z]=\int_{\mathbb{R}^p} P\big([r^tDr/r^t\Sigma r\le z]\big)f_h(r)\,dr=\int_{\mathbb{R}^p}\left(\int_0^z\chi^2_n(t)\,dt\right)f_h(r)\,dr=\int_0^z\chi^2_n(t)\,dt,$$
where $\chi^2_n(t)$ denotes the density of the $\chi^2_n$-distribution. This proves (i). In order to prove (ii), we first note that in our proof of part (i) we showed that
$$P\big([h^tDh/h^t\Sigma h\le z]\,\big|\,h=r\big)=\int_0^z\chi^2_n(t)\,dt$$
for all $z>0$ and all $r\in\mathbb{R}^p$. Hence, for all Borel sets $A$ in $\mathbb{R}^p$,
$$P\big([h^tDh/h^t\Sigma h\le z]\cap[h\in A]\big)=\int_A P\big([r^tDr/r^t\Sigma r\le z]\big)f_h(r)\,dr=P\big([h^tDh/h^t\Sigma h\le z]\big)\int_A f_h(r)\,dr=P\big([h^tDh/h^t\Sigma h\le z]\big)\,P([h\in A]).$$
QED.

We shall henceforth use the following notation for elements of the matrices $D$, $D^{-1}$, $\Sigma$ and $\Sigma^{-1}$:
$$D=(d_{ij}),\quad D^{-1}=(d^{ij}),\quad \Sigma=(\sigma_{ij}),\quad \Sigma^{-1}=(\sigma^{ij}).$$
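Corollary 2 to Lemma 1 invites a Monte Carlo check. The sketch below (illustrative parameters only) simulates $D=XX^t$ with the columns of $X$ i.i.d. $N_p(0,\Sigma)$ and verifies that $h^tDh/h^t\Sigma h$ has approximately the mean $n$ and variance $2n$ of the $\chi^2_n$ distribution.

```python
import numpy as np

# Illustrative Monte Carlo sketch of Corollary 2 to Lemma 1.
rng = np.random.default_rng(6)
p, n, reps = 3, 12, 20_000
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)            # an arbitrary positive-definite Sigma
L = np.linalg.cholesky(Sigma)
h = np.array([1.0, -2.0, 0.5])

ratios = np.empty(reps)
for k in range(reps):
    X = L @ rng.standard_normal((p, n))   # columns are i.i.d. N_p(0, Sigma)
    D = X @ X.T
    ratios[k] = (h @ D @ h) / (h @ Sigma @ h)
print("mean:", ratios.mean(), " variance:", ratios.var())   # ~ n and 2n
```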
LEMMA 3. If $D$ is $W_p(n,\Sigma)$, then $\det D/\det\Sigma$ has the same distribution as the product of $p$ independent random variables whose distributions are $\chi^2_n,\chi^2_{n-1},\dots,\chi^2_{n-p+1}$, and $\sigma^{pp}/d^{pp}$ has the $\chi^2_{n-p+1}$-distribution.

Proof: Let $C$ be a lower triangular matrix of constants such that $\Sigma=CC^t$, and define $A=C^{-1}D(C^{-1})^t$. Then, by Lemma 1, $A$ is $W_p(n,C^{-1}\Sigma(C^{-1})^t)$, i.e., $A$ is $W_p(n,I_p)$. Thus $A$ has the same joint distribution as does $XX^t$, where $X$ is a $p\times n$ matrix of independent $N(0,1)$ random variables. For distributional purposes we may define $A=XX^t$. Now with respect to $X$ let $B$ be as defined in Section 2, i.e., $B=(b_{ij})$ is a $p\times p$ lower triangular matrix whose rows are independent, $b_{ij}$ is $N(0,1)$ if $j<i$, $b_{ii}^2$ is $\chi^2_{n-i+1}$, and $A=BB^t$. Let us define $T=CB$. Clearly, $T$ is lower triangular. We next note that
$$TT^t=CBB^tC^t=CAC^t=CC^{-1}D(C^{-1})^tC^t=D,$$
i.e., $TT^t=D$. Thus
$$\det D=\det(TT^t)=\det(CBB^tC^t)=\det(CC^t)\det(BB^t)=\det\Sigma\prod_{i=1}^p b_{ii}^2.$$
By Lemma 13 in Section 2 of Chapter 2, we obtain the first conclusion of our lemma.

In order to obtain the second conclusion, let $D_{p-1},T_{p-1},A_{p-1},B_{p-1},\Sigma_{p-1}$ and $C_{p-1}$ be the $(p-1)\times(p-1)$ matrices obtained from the first $p-1$ rows and the first $p-1$ columns of $D,T,A,B,\Sigma$ and $C$ respectively. From the relations established above it easily follows that $D_{p-1}=T_{p-1}T_{p-1}^t$, $T_{p-1}=C_{p-1}B_{p-1}$, $A_{p-1}=B_{p-1}B_{p-1}^t$, $\Sigma_{p-1}=C_{p-1}C_{p-1}^t$ and $D_{p-1}=C_{p-1}A_{p-1}C_{p-1}^t$. From these and the fact that $B$ is lower triangular we get
$$b_{pp}^2=\frac{\det A}{\det A_{p-1}}=\frac{\det(BB^t)}{\det(B_{p-1}B_{p-1}^t)}.$$
Hence
$$b_{pp}^2=\frac{\det A\,\det C\,\det C^t\,\det C_{p-1}\,\det C_{p-1}^t}{\det A_{p-1}\,\det C_{p-1}\,\det C_{p-1}^t\,\det C\,\det C^t}=\frac{\det(CAC^t)\,\det(C_{p-1}C_{p-1}^t)}{\det(C_{p-1}A_{p-1}C_{p-1}^t)\,\det(CC^t)}=\frac{\det D\,\det\Sigma_{p-1}}{\det D_{p-1}\,\det\Sigma}.$$
Note that $D^{-1}=(d^{ij})$, where $d^{ij}=(\text{cofactor of }d_{ji})/\det D$. Since $\det D_{p-1}$ is the cofactor of $d_{pp}$, it follows that $d^{pp}=(\det D_{p-1})/\det D$. Similarly, $\sigma^{pp}=(\det\Sigma_{p-1})/\det\Sigma$. Hence $b_{pp}^2=\sigma^{pp}/d^{pp}$, which has the $\chi^2_{n-p+1}$-distribution. QED.

LEMMA 4. Let $D$ be $W_p(n,\Sigma)$, let $h$ be a $p$-dimensional random vector with a joint absolutely continuous distribution function, and assume that $D$ and $h$ are independent. Then (i) $\dfrac{h^t\Sigma^{-1}h}{h^tD^{-1}h}$ has the $\chi^2_{n-p+1}$-distribution, and (ii) $\dfrac{h^t\Sigma^{-1}h}{h^tD^{-1}h}$ and $h$ are independent.

Proof: Let $H$ be a $p\times p$ orthogonal matrix of random variables whose $p$th row is $(1/\|h\|)h^t$ and such that all entries in $H$ are measurable functions of $h$. We may write $H=H(h)$. Let $h_0\in\mathrm{range}(h)$, and let $H_0=H(h_0)$. Now $H_0$ is an orthogonal $p\times p$ matrix of numbers. Note that as a consequence of our hypotheses, $H$ and $D$ are independent. Define $D^*=H_0DH_0^t$ and $\Sigma^*=H_0\Sigma H_0^t$. By Lemma 1 and the fact that $H$ and $D$ are independent, the conditional distribution of $HDH^t$ given $H=H_0$ is the unconditional distribution of $H_0DH_0^t$, which is $W_p(n,H_0\Sigma H_0^t)$. Let us denote $D^*=(d^*_{ij})$, $D^{*-1}=(d^{*ij})$, $\Sigma^*=(\sigma^*_{ij})$ and $\Sigma^{*-1}=(\sigma^{*ij})$. By Lemma 3, $\sigma^{*pp}/d^{*pp}$ has the $\chi^2_{n-p+1}$-distribution. Note that $H_0D^{-1}H_0^t=D^{*-1}$ and $H_0\Sigma^{-1}H_0^t=\Sigma^{*-1}$; in particular,
$$d^{*pp}=\frac{1}{\|h_0\|^2}\,h_0^tD^{-1}h_0\quad\text{and}\quad \sigma^{*pp}=\frac{1}{\|h_0\|^2}\,h_0^t\Sigma^{-1}h_0,$$
so that $\sigma^{*pp}/d^{*pp}=(h_0^t\Sigma^{-1}h_0)/(h_0^tD^{-1}h_0)$. Now let $F(z)$ be the distribution function of the $\chi^2_{n-p+1}$-distribution, and let $g(\cdot)$ be the joint density of $h$. Then, since $h$ and $D$ are independent, we have
$$P\!\left[\frac{h^t\Sigma^{-1}h}{h^tD^{-1}h}\le z\right]=\int_{\mathbb{R}^p} P\!\left(\left[\frac{h_0^t\Sigma^{-1}h_0}{h_0^tD^{-1}h_0}\le z\right]\right)g(h_0)\,dh_0=\int_{\mathbb{R}^p} F(z)\,g(h_0)\,dh_0=F(z),$$
which proves (i). The proof of (ii) follows the same steps as the proof of conclusion (ii) in Lemma 2. QED.

LEMMA 5. If $U$ is $N_p(\mu,\Sigma)$, then $(U-\mu)^t\Sigma^{-1}(U-\mu)$ has the $\chi^2_p$-distribution.

Proof: Since $\Sigma$, and therefore $\Sigma^{-1}$, is positive definite, there exists a positive definite matrix $\Sigma^{-1/2}$ such that $\Sigma^{-1/2}\Sigma^{-1/2}=\Sigma^{-1}$. One easily verifies that $\Sigma^{-1/2}(U-\mu)$ is $N_p(0,I_p)$. Hence $(U-\mu)^t\Sigma^{-1}(U-\mu)$ is the sum of squares of $p$ independent $N(0,1)$ random variables. QED.

THEOREM 1. If $U$ is $N_p(\mu,\Sigma)$, if $D$ is $W_p(n,\Sigma)$, if $p$ …
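Lemma 5 above is easy to verify by simulation. The following sketch (with made-up $\mu$ and $\Sigma$) checks that the quadratic form $(U-\mu)^t\Sigma^{-1}(U-\mu)$ has approximately mean $p$ and variance $2p$, as the $\chi^2_p$ distribution requires.

```python
import numpy as np

# Illustrative Monte Carlo sketch of Lemma 5.
rng = np.random.default_rng(7)
p, reps = 4, 200_000
M = rng.standard_normal((p, p))
Sigma = M @ M.T + np.eye(p)            # an arbitrary positive-definite Sigma
mu = np.arange(1.0, p + 1.0)

L = np.linalg.cholesky(Sigma)
U = mu + rng.standard_normal((reps, p)) @ L.T     # rows are N_p(mu, Sigma)
diff = U - mu
Q = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
print("mean:", Q.mean(), " variance:", Q.var())   # ~ p and 2p
```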
5-1(U~p) is the sum of squares of p independent W/(0, 1) random variables. QED. THEOREM 1. I/U is Nj(4,D), if D is Wp(r4D), ifp
