Multivariate Material
Multivariate Methods
Kasim M.
Department of Statistics
University of Gondar
Gondar, Ethiopia
Chapter 1
The study of multivariate methods is greatly facilitated by the use of matrix algebra. This
chapter presents a review of basic concepts of matrix algebra which are essential to both
geometrical interpretations and algebraic explanations of subsequent multivariate statisti-
cal techniques.
A vector has both magnitude (length) and direction. The length of a vector $x' = (x_1, x_2, \ldots, x_n)$ is defined by
\[ L_x = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{x'x}. \]
The length of a vector can be expanded or contracted by multiplying it by a constant $a$:
\[ ax = \begin{pmatrix} ax_1 \\ ax_2 \\ \vdots \\ ax_n \end{pmatrix} \]
Such multiplication of a vector $x$ by a scalar $a$ changes the length as
\[ L_{ax} = \sqrt{a^2x_1^2 + a^2x_2^2 + \cdots + a^2x_n^2} = |a|\sqrt{x'x} = |a|L_x. \]
When $|a| > 1$, vector $x$ is expanded. When $|a| < 1$, vector $x$ is contracted. When $|a| = 1$, the length is unchanged. If $a < 0$, the direction of vector $x$ is reversed.
Choosing $a = L_x^{-1}$, we obtain the unit vector $L_x^{-1}x$, which has length 1 and lies in the direction of $x$.
Example 1.1. If $n = 2$, consider the vector $x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$. The length of $x$ is $L_x = \sqrt{x_1^2 + x_2^2}$. Geometrically, the length of a vector in two dimensions can be viewed as the hypotenuse of a right triangle.
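As a quick numeric illustration of these definitions, here is a minimal Python/numpy sketch (the vector $(3, 4)'$ and the scalar $a = 2$ are illustrative choices, not values from the notes):

```python
import numpy as np

x = np.array([3.0, 4.0])
L_x = np.sqrt(x @ x)                 # length sqrt(x'x) = 5
a = 2.0
L_ax = np.sqrt((a * x) @ (a * x))    # equals |a| * L_x = 10
e = x / L_x                          # choosing a = 1/L_x gives a unit vector
print(L_x, L_ax, np.sqrt(e @ e))     # 5.0 10.0 1.0
```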
Example 1.3. Let $x_1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$, $x_2 = \begin{pmatrix} 1 \\ 5 \\ 1 \end{pmatrix}$, $x_3 = \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}$. Setting $a_1x_1 + a_2x_2 + a_3x_3 = 0$ gives
\begin{align*}
a_1 + a_2 + a_3 &= 0 \\
2a_1 + 5a_2 - a_3 &= 0 \\
a_2 - a_3 &= 0
\end{align*}
The third equation gives $a_2 = a_3$, and substituting into the first gives $a_1 + 2a_2 = 0$. Hence non-trivial solutions exist: for example, if $a_1 = 1$, then $a_2 = a_3 = -0.5$. Therefore, $x_1$, $x_2$ and $x_3$ are not linearly independent.
– The row rank and the column rank of a matrix are equal.
∗ Rank$(A) \geq 0$
∗ Rank$(A) \leq \min(n, p)$
∗ Rank$(A) =$ Rank$(A')$
∗ Rank$(A) =$ Rank$(A'A) =$ Rank$(AA')$
– Trace: the trace of a $k \times k$ matrix is the sum of its diagonal elements: $\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}$.
– $|aA| = a^n|A|$ for an $n \times n$ matrix $A$
– $|AB| = |BA| = |A||B|$ for square matrices $A$ and $B$
Example 1.4. Let $A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$ and $x = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$. Then
\[ Q(x) = x'Ax = (2, -1)\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}\begin{pmatrix} 2 \\ -1 \end{pmatrix} = 0 \]
$\Rightarrow A$ is not positive definite.
The solutions $\lambda_1, \lambda_2, \ldots, \lambda_k$ of the equation
\[ |A - \lambda I| = 0 \]
are called the eigen values (characteristic roots) of the matrix $A$. These eigen values are unique unless two or more of them are equal.
• The eigen values of a symmetric matrix with real elements are real. λi ’s can be
complex numbers if the matrix is not symmetric.
• The eigen values of a positive definite matrix are all positive. If a k × k symmetric
matrix is positive semi-definite of rank r (r < k), then it has r positive and (k − r)
zero eigen values.
• The eigen values of a diagonal matrix are the diagonal elements themselves.
Associated with every eigen value $\lambda_i$ of a square matrix $A$, there is an eigen vector $x_i$ whose elements satisfy the homogeneous system of equations:
\[ (A - \lambda_i I)x_i = 0 \Leftrightarrow Ax_i = \lambda_i x_i \]
• The elements of the vector $x_i$ are determined only up to a scale factor. Because the system is homogeneous, the number of independent equations is less than the number of unknowns, so we obtain only relationships among the elements, such as $x_{1i} = 5x_{2i}$.
– Since an eigen vector is determined only up to a scale factor, normalizing it to unit length makes it unique.
Example 1.5. Find the eigen values and the corresponding eigen vectors of $A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}$.
\[ |A - \lambda I| = 0 \Rightarrow (1 - \lambda)(2 - \lambda) - 6 = 0 \Rightarrow \lambda^2 - 3\lambda - 4 = 0 \]
Thus, the eigen values of $A$ are $\lambda_1 = 4$ and $\lambda_2 = -1$. The corresponding eigen vectors satisfy $Ax_i = \lambda_i x_i$, $i = 1, 2$.
• For $\lambda_1 = 4$,
\[ Ax_1 = \lambda_1 x_1 \Rightarrow \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} = 4\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} \Rightarrow x_{11} + 2x_{21} = 4x_{11} \Rightarrow x_{21} = \frac{3}{2}x_{11} \]
Let $x_{11} = 2 \Rightarrow x_{21} = 3$. Thus, $x_1 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$ -- not unique. The normalized eigen vector is
\[ e_1 = \frac{1}{\sqrt{x_1'x_1}}x_1 = \frac{1}{\sqrt{4 + 9}}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2/\sqrt{13} \\ 3/\sqrt{13} \end{pmatrix}. \]
• For $\lambda_2 = -1$,
\[ Ax_2 = \lambda_2 x_2 \Rightarrow \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix} = -1\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix} \Rightarrow x_{12} + 2x_{22} = -x_{12} \Rightarrow x_{22} = -x_{12} \]
Let $x_{12} = 1 \Rightarrow x_{22} = -1$. Thus, $x_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$ -- not unique. The normalized eigen vector is
\[ e_2 = \frac{1}{\sqrt{x_2'x_2}}x_2 = \frac{1}{\sqrt{1 + 1}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}. \]
Note that $e_2'e_2 = 1$. However, since $A$ is not symmetric, $e_1$ and $e_2$ need not be orthogonal; here $e_1'e_2 = -1/\sqrt{26} \neq 0$. (For a symmetric matrix, eigen vectors corresponding to distinct eigen values are orthogonal, i.e., $e_1'e_2 = 0$.)
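The computation above can be verified with numpy; np.linalg.eig returns the eigen values together with unit-length eigen vectors as columns (possibly in a different order and with flipped signs):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])
vals, vecs = np.linalg.eig(A)
print(vals)                      # 4 and -1 (order may differ)
print(vecs)                      # columns proportional to (2,3)/sqrt(13) and (1,-1)/sqrt(2)
# A is not symmetric, so the eigen vectors are not orthogonal:
print(vecs[:, 0] @ vecs[:, 1])   # nonzero (about -0.196 = -1/sqrt(26))
```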
Example 1.6. Find the eigen values and corresponding eigen vectors of the following two matrices:
\[ A = \begin{pmatrix} 1 & -5 \\ -5 & 1 \end{pmatrix} \Rightarrow \lambda_1 = 6, \lambda_2 = -4 \quad \text{and} \quad B = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix} \Rightarrow \lambda_1 = 18, \lambda_2 = 9, \lambda_3 = 9 \]
Example 1.7. Consider the matrix $A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}$, whose eigen values are $\lambda_1 = 2$ and $\lambda_2 = -3$.
– For $\lambda_1 = 2$,
\[ Ax_1 = \lambda_1 x_1 \Leftrightarrow \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} = 2\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} \Rightarrow x_{21} = \frac{1}{2}x_{11} \Rightarrow x_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]
The normalized eigen vector corresponding to $\lambda_1 = 2$ is $e_1 = \begin{pmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{pmatrix}$.
– For $\lambda_2 = -3$,
\[ Ax_2 = \lambda_2 x_2 \Rightarrow \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix} = -3\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix} \Rightarrow x_{22} = -2x_{12} \Rightarrow x_2 = \begin{pmatrix} 1 \\ -2 \end{pmatrix} \]
The normalized eigen vector corresponding to $\lambda_2 = -3$ is $e_2 = \begin{pmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{pmatrix}$.
A symmetric matrix $A$ can be written as a function of its eigen values and normalized eigen vectors (the spectral decomposition):
\[ A = O\Lambda O' \]
where $O = (e_1, e_2, \cdots, e_k)$ and $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \cdots, \lambda_k) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{pmatrix}$.
Note here that $O'O = OO' = I_{k \times k}$ ($O$ is orthogonal, $O' = O^{-1}$).
\[ \Rightarrow A = O\Lambda O' = \sum_{j=1}^{k} \lambda_j e_j e_j'. \]
The inverse of $A$ (if it exists) is
\[ A^{-1} = O\Lambda^{-1}O' \]
where $O = (e_1, e_2, \cdots, e_k)$ and $\Lambda^{-1} = \mathrm{diag}\left(\frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \cdots, \frac{1}{\lambda_k}\right)$.
\[ \Rightarrow A^{-1} = \sum_{j=1}^{k} \frac{1}{\lambda_j}e_j e_j'. \]
Also, the square root matrix of a positive definite $A$ is
\[ A^{\frac{1}{2}} = O\Lambda^{\frac{1}{2}}O' \]
where $O = (e_1, e_2, \cdots, e_k)$ and $\Lambda^{\frac{1}{2}} = \mathrm{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \cdots, \sqrt{\lambda_k}\right)$.
\[ \Rightarrow A^{\frac{1}{2}} = \sum_{j=1}^{k} \sqrt{\lambda_j}e_j e_j'. \]
Example 1.8. Find $A^{-1}$ and $A^{\frac{1}{2}}$ for $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$ and $A = \begin{pmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{pmatrix}$.
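A sketch of how Example 1.8 can be checked numerically for the $3 \times 3$ matrix, building $A^{-1}$ and $A^{\frac{1}{2}}$ directly from the spectral decomposition (np.linalg.eigh is used because the matrix is symmetric; numpy is assumed available):

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])
lam, O = np.linalg.eigh(A)                 # eigen values 9, 9, 18 and orthogonal O
A_inv  = O @ np.diag(1.0 / lam) @ O.T      # A^{-1} = O Lambda^{-1} O'
A_half = O @ np.diag(np.sqrt(lam)) @ O.T   # A^{1/2} = O Lambda^{1/2} O'
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
print(np.allclose(A_half @ A_half, A))        # True
```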
Singular value decomposition (SVD): any $m \times k$ matrix $A$ can be expressed as $A = U\Lambda V'$ where:
• $U = (e_1, e_2, \cdots, e_{\min(m,k)})$, where $e_i$ is the normalized eigen vector corresponding to the eigen value $\lambda_i$ of the matrix $AA'$.
• $V = (e_1^*, e_2^*, \cdots, e_{\min(m,k)}^*)$, where $e_i^*$ $(i = 1, 2, \cdots, \min(m,k))$ is the normalized eigen vector corresponding to $\lambda_i$ of the matrix $A'A$.
• $\Lambda = \begin{pmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_{\min(m,k)}} \end{pmatrix}$
Note that the $\sqrt{\lambda_i}$ (the singular values of $A$) are the square roots of the $\lambda_i$, the common nonzero eigen values of $A'A$ and $AA'$.
Example 1.9. Find the singular value decomposition of $A = \begin{pmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{pmatrix}$.
\[ AA' = \begin{pmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 11 & 1 \\ 1 & 11 \end{pmatrix} \quad \text{and} \quad A'A = \begin{pmatrix} 3 & -1 \\ 1 & 3 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 3 & 1 & 1 \\ -1 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 10 & 0 & 2 \\ 0 & 10 & 4 \\ 2 & 4 & 2 \end{pmatrix} \]
• Eigen values and eigen vectors of $AA'$:
\[ |AA' - \lambda I| = 0 \Rightarrow \begin{vmatrix} 11 - \lambda & 1 \\ 1 & 11 - \lambda \end{vmatrix} = 0 \Rightarrow (11 - \lambda)^2 - 1 = 0 \Rightarrow \lambda^2 - 22\lambda + 120 = 0 \Rightarrow \lambda = 12 \text{ or } \lambda = 10. \]
The nonzero eigen values of $AA'$ (or $A'A$) are $\lambda_1 = 12$ and $\lambda_2 = 10$, so the singular values of $A$ are $\sqrt{12}$ and $\sqrt{10}$.
– Eigen vector corresponding to $\lambda_1 = 12$:
\[ AA'x_1 = \lambda_1 x_1 \Rightarrow \begin{pmatrix} 11 & 1 \\ 1 & 11 \end{pmatrix}\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} = 12\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} \Rightarrow x_{21} = x_{11} \]
Let $x_{11} = 1 \Rightarrow x_{21} = 1 \Rightarrow x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \Rightarrow e_1 = \begin{pmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}$. Similarly, for $\lambda_2 = 10$, $e_2 = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}$. Hence
\[ U = (e_1, e_2) = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} \quad \text{and} \quad \Lambda = \mathrm{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}\right) = \begin{pmatrix} \sqrt{12} & 0 \\ 0 & \sqrt{10} \end{pmatrix} \]
• For $A'A$: $|A'A - \lambda I| = 0 \Rightarrow \lambda^3 - 22\lambda^2 + 120\lambda = 0 \Rightarrow \lambda = 12$, $\lambda = 10$ or $\lambda = 0$. The normalized eigen vectors corresponding to $\lambda_1 = 12$ and $\lambda_2 = 10$ form $V$.
" # √ " #
√1 √1 √1 √2 √1
0 2 2 12 √0 6 6 6 3 1 1
A = U ΛV = √1
=
2
− √12 0 10 √2
5
− √15 0 −1 3 1
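numpy's SVD routine reproduces this factorization (up to the signs of the singular vectors); a short sketch:

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s ** 2)                        # [12. 10.]: eigen values of AA' (or A'A)
print(np.allclose((U * s) @ Vt, A))  # True: A = U Lambda V'
```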
Chapter 2
2.1 Introduction
Multivariate statistical analysis is concerned with data collected on several variables (dimensions) for the same individual (subject or experimental unit). Using multivariate analysis, the variables can be examined simultaneously in order to assess the key features of the process that produced them. It enables us to do the following.
2. Sorting and grouping. Groups of ”similar” objects or variables are created, based
upon measured characteristics. Example: discriminant analysis.
3. Investigation of the dependence among variables. In a multivariate study, the interest is on the off-diagonals (covariances): are all the variables mutually independent, or are one or more variables dependent on the others? If so, how? Example: canonical correlation analysis.
4. Prediction. The relationship between variables can be determined for the purpose
of predicting the values of one or more variables on the basis of observations on
the other variables. Example: multivariate linear regression, multivariate analysis of
variance.
Descriptive Statistics
A large data set is bulky, and its very mass poses a serious obstacle to any attempt to visually extract pertinent information. Much of the information contained in the data can be assessed by calculating certain summary numbers, known as descriptive statistics. For example, the arithmetic average, or sample mean, is a descriptive statistic that provides a measure of location, that is, a "central value" for a set of numbers. The average of the squares of the distances of all of the numbers from the mean provides a measure of the spread, or variation, in the numbers.
• Sample mean: $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$; $j = 1, 2, \cdots, p$
• Sample variance: $s_j^2 = s_{jj} = \frac{1}{n}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2$; $j = 1, 2, \cdots, p$
• Sample covariance between $X_j$ and $X_k$: $s_{jk} = \frac{1}{n}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$; $j, k = 1, 2, \cdots, p$; $j \neq k$. Note $s_{jk} = s_{kj}$ for all $j$ and $k$.
• Sample correlation coefficient between variables $j$ and $k$: $r_{jk} = \frac{s_{jk}}{\sqrt{s_{jj}}\sqrt{s_{kk}}}$; $j, k = 1, 2, \cdots, p$. Note $r_{jk} = r_{kj}$ and $r_{jk} = 1$ if $j = k$.
Although the sample correlation and the sample covariance have the same sign, the correlation is ordinarily easier to interpret because:
– its magnitude is bounded, that is, $-1 \leq r_{jk} \leq 1$ for all $j$ and $k$;
– it is unitless;
– it takes the variability into account.
A major disadvantage of the correlation, however, is that it does not measure non-linear association.
The descriptive statistics for all the p variables in terms of vector and matrix operations
are:
• Sample mean vector: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{pmatrix}_{p \times 1}$
• Sample variance-covariance matrix: $S_n = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})'$
\[ \Rightarrow S_n = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}_{p \times p} \]
Consequently, the sample standard deviation matrix is written as:
\[ V^{\frac{1}{2}} = \begin{pmatrix} \sqrt{s_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{s_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{s_{pp}} \end{pmatrix}_{p \times p} \]
• Sample correlation matrix: $R = (V^{\frac{1}{2}})^{-1}S_n(V^{\frac{1}{2}})^{-1}$
\[ \Rightarrow R = \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{pmatrix} = \begin{pmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{pmatrix}_{p \times p} \]
Note $S_n = V^{\frac{1}{2}}RV^{\frac{1}{2}}$. Note also that $S_n$ and $R$ are symmetric and positive definite.
Example 2.1. Find the sample mean vector, covariance and correlation matrices for the following data matrix.
\[ X = \begin{pmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{pmatrix} \]
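A sketch of the computations for Example 2.1 in numpy, using the $1/n$ definition of $S_n$ given above:

```python
import numpy as np

X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0]])
n = X.shape[0]
xbar = X.mean(axis=0)              # sample mean vector: (2, 3)'
D = X - xbar                       # matrix of deviations from the mean
Sn = (D.T @ D) / n                 # covariance matrix with divisor n
V_inv = np.diag(1.0 / np.sqrt(np.diag(Sn)))
R = V_inv @ Sn @ V_inv             # sample correlation matrix
print(xbar, Sn, R, sep="\n\n")
```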
Population Moments
• Mean of $X_j$: $\mu_j = E(X_j)$; $j = 1, 2, \cdots, p$
Let $X = (X_1, X_2, \cdots, X_p)'$ be a $p \times 1$ random vector. Then the mean vector is:
\[ E(X) = E\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_p) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix} = \mu \]
The population variance-covariance matrix is $\Sigma = \mathrm{Cov}(X) = E[(X - \mu)(X - \mu)']$:
\[ \Sigma = E\begin{pmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \cdots & (X_1 - \mu_1)(X_p - \mu_p) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 & \cdots & (X_2 - \mu_2)(X_p - \mu_p) \\ \vdots & \vdots & \ddots & \vdots \\ (X_p - \mu_p)(X_1 - \mu_1) & (X_p - \mu_p)(X_2 - \mu_2) & \cdots & (X_p - \mu_p)^2 \end{pmatrix} \]
\[ = \begin{pmatrix} E(X_1 - \mu_1)^2 & E(X_1 - \mu_1)(X_2 - \mu_2) & \cdots & E(X_1 - \mu_1)(X_p - \mu_p) \\ E(X_2 - \mu_2)(X_1 - \mu_1) & E(X_2 - \mu_2)^2 & \cdots & E(X_2 - \mu_2)(X_p - \mu_p) \\ \vdots & \vdots & \ddots & \vdots \\ E(X_p - \mu_p)(X_1 - \mu_1) & E(X_p - \mu_p)(X_2 - \mu_2) & \cdots & E(X_p - \mu_p)^2 \end{pmatrix} \]
Thus,
\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}_{p \times p} \]
Also, the population correlation matrix is $\rho = (V^{\frac{1}{2}})^{-1}\Sigma(V^{\frac{1}{2}})^{-1}$, that is,
\[ \rho = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{pmatrix}_{p \times p} \]
Note: $\Sigma = V^{\frac{1}{2}}\rho V^{\frac{1}{2}}$.
Distance
All points $(x_1, x_2)$ that lie a constant distance, say $c$, from the origin satisfy the equation
\[ c = \sqrt{x_1^2 + x_2^2} \Leftrightarrow c^2 = x_1^2 + x_2^2, \]
the equation of a circle with radius $c$.
The Euclidean distance between two points $P = (x_1, x_2)$ and $Q = (y_1, y_2)$ in two-dimensional space is
\[ d_E(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}. \]
Similarly, the Euclidean distance between $P = (x_1, x_2, \cdots, x_p)$ and $Q = (y_1, y_2, \cdots, y_p)$ in $p$-dimensional space is
\[ d_E(P, Q) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_p - y_p)^2} = \sqrt{(x - y)'(x - y)}. \]
Straight-line, or Euclidean, distance is unsatisfactory for most statistical purposes because each coordinate contributes equally to its calculation. This suggests a statistical measure of distance.
A statistical distance, unlike the Euclidean distance, takes into account the variability of, and the correlation among, the variables. Suppose $X' = (X_1, X_2, \cdots, X_p)$ follows a $p$-dimensional distribution with $E(X) = \mu$ and variance-covariance matrix $\mathrm{Cov}(X) = \Sigma$. Suppose $\bar{x} = (\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_p)'$ is a vector of means based on an $n \times p$ observed data matrix. The statistical distance between $\bar{x}$ and $\mu$ is given by
\[ d_S(\bar{x}, \mu) = \sqrt{(\bar{x} - \mu)'\Sigma^{-1}(\bar{x} - \mu)} \]
\[ \Rightarrow d_S(\bar{x}, \mu) = \sqrt{(\bar{x}_1 - \mu_1, \bar{x}_2 - \mu_2, \cdots, \bar{x}_p - \mu_p)\begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}^{-1}\begin{pmatrix} \bar{x}_1 - \mu_1 \\ \bar{x}_2 - \mu_2 \\ \vdots \\ \bar{x}_p - \mu_p \end{pmatrix}}. \]
If one component has a much larger variance than another, it will contribute less to the squared distance. Likewise, two highly correlated variables will contribute less than two variables that are nearly uncorrelated. Essentially, the use of the inverse of the covariance matrix standardizes all of the variables and eliminates the effect of correlation.
Example 2.2. Let $\bar{x} = \begin{pmatrix} \bar{x}_1 \\ \bar{x}_2 \end{pmatrix}$, $\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}$. The variability in the $x_1$ direction is greater than that in the $x_2$ direction since $\sigma_{11} = 4 > \sigma_{22} = 1$.
Euclidean distance: $d_E = \sqrt{(x_1 - \mu_1)^2 + (x_2 - \mu_2)^2}$.
Statistical distance: $d_S = \sqrt{(x - \mu)'\Sigma^{-1}(x - \mu)}$.
\[ \Rightarrow d_S = \sqrt{(x_1 - \mu_1, x_2 - \mu_2)\begin{pmatrix} \frac{1}{4} & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}} = \sqrt{\frac{(x_1 - \mu_1)^2}{4} + \frac{(x_2 - \mu_2)^2}{1}} \Rightarrow \text{equation of an ellipse} \]
All points that lie a constant distance, say $c = 2$, from the theoretical mean $(\mu_1, \mu_2)$ satisfy the equation:
\[ \frac{(x_1 - \mu_1)^2}{4} + \frac{(x_2 - \mu_2)^2}{1} = c^2 = 4. \]
At $x_1 = \mu_1$: $(x_2 - \mu_2)^2 = 4 \Rightarrow x_2 - \mu_2 = \pm 2 \Rightarrow x_2 = \mu_2 \pm 2$.
At $x_2 = \mu_2$: $(x_1 - \mu_1)^2 = 16 \Rightarrow x_1 - \mu_1 = \pm 4 \Rightarrow x_1 = \mu_1 \pm 4$.
[Plot of the ellipse]
The ellipse is stretched in the $x_1$ direction compared to the $x_2$ direction because of the larger variance of $x_1$ (its major axis is parallel to the $x_1$ axis). If the variances were equal, the equation would simply describe a circle.
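The contrast between the two distances can be checked numerically; in the sketch below, the zero mean vector is an illustrative assumption and the point $\mu + (4, 0)'$ lies on the $c = 2$ ellipse:

```python
import numpy as np

mu = np.array([0.0, 0.0])                  # assumed mean; any value works
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
x = mu + np.array([4.0, 0.0])              # a point on the c = 2 ellipse

d = x - mu
d_E = np.sqrt(d @ d)                         # Euclidean distance: 4.0
d_S = np.sqrt(d @ np.linalg.inv(Sigma) @ d)  # statistical distance: 2.0
print(d_E, d_S)
```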
Linear Combinations of Random Variables
1. Univariate case:
• $E(a_1X_1) = a_1E(X_1) = a_1\mu_1$, $a_1 \in \mathbb{R}$.
• $\mathrm{Var}(a_1X_1) = a_1^2\mathrm{Var}(X_1) = a_1^2\sigma_{11}$, $a_1 \in \mathbb{R}$.
2. Bivariate case:
• $E(a_1X_1 + a_2X_2) = a_1\mu_1 + a_2\mu_2$
• $\mathrm{Var}(a_1X_1 + a_2X_2) = a_1^2\sigma_{11} + a_2^2\sigma_{22} + 2a_1a_2\sigma_{12}$
3. General case: a linear combination of the $p$ random variables is
\[ a_1X_1 + a_2X_2 + \cdots + a_pX_p = (a_1, a_2, \cdots, a_p)\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = a'X. \]
• $E(a'X) = a'E(X) = a'\mu$
• $\mathrm{Var}(a'X) = a'\Sigma a$
Here $\Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}$.
4. Consider $q$ linear combinations of the $p$ random variables:
\begin{align*}
Z_1 &= a_{11}X_1 + a_{12}X_2 + \cdots + a_{1p}X_p = \sum_{j=1}^{p} a_{1j}X_j = a_1'X \\
Z_2 &= a_{21}X_1 + a_{22}X_2 + \cdots + a_{2p}X_p = \sum_{j=1}^{p} a_{2j}X_j = a_2'X \\
&\vdots \\
Z_q &= a_{q1}X_1 + a_{q2}X_2 + \cdots + a_{qp}X_p = \sum_{j=1}^{p} a_{qj}X_j = a_q'X
\end{align*}
In matrix form:
\[ \begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_q \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{q1} & a_{q2} & \cdots & a_{qp} \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} \Leftrightarrow Z = AX \]
• $E(Z) = E(AX) = AE(X) = A\mu$
• $\mathrm{Cov}(Z) = \mathrm{Cov}(AX) = A\Sigma A'$
Example 2.3. Find the mean vector and covariance matrix for the linear combinations $Z_1 = X_1 - X_2$ and $Z_2 = X_1 + X_2$.
\[ Z = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = AX \]
• $E(Z) = AE(X) = A\mu = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 + \mu_2 \end{pmatrix}$
• $\mathrm{Cov}(Z) = A\mathrm{Cov}(X)A'$
\[ \Rightarrow \mathrm{Cov}(Z) = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22} \\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22} \end{pmatrix} \]
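A numeric check of Example 2.3; the particular $\mu$ and $\Sigma$ below are assumed for illustration only:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  1.0]])
mu = np.array([2.0, 1.0])          # assumed mean vector
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])     # assumed covariance matrix

EZ = A @ mu                        # (mu1 - mu2, mu1 + mu2)' = (1, 3)'
CovZ = A @ Sigma @ A.T             # [[s11-2s12+s22, s11-s22], [s11-s22, s11+2s12+s22]]
print(EZ)                          # [1. 3.]
print(CovZ)                        # [[3. 1.], [1. 7.]]
```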
Chapter 3
Most of the techniques of multivariate statistical analysis are based on the assumption that the data were generated from a multivariate normal distribution.
Univariate case:
Let $X$ be a random variable with $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. If $X \sim N(\mu, \sigma^2)$, its pdf is given by
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right]. \]
Note the term $\frac{(x - \mu)^2}{\sigma^2} = (x - \mu)(\sigma^2)^{-1}(x - \mu)$ measures the squared statistical distance from $x$ to $\mu$ in standard deviation units.
Multivariate case:
If $X_1, X_2, \cdots, X_p$ are independent with $X_j \sim N(\mu_j, \sigma_j^2)$, their joint density is
\[ f(x) = f(x_1, x_2, \cdots, x_p) = f(x_1) \cdot f(x_2) \cdots f(x_p) = \frac{1}{\sqrt{2\pi}\sigma_1}e^{-\frac{1}{2}\frac{(x_1 - \mu_1)^2}{\sigma_1^2}} \cdot \frac{1}{\sqrt{2\pi}\sigma_2}e^{-\frac{1}{2}\frac{(x_2 - \mu_2)^2}{\sigma_2^2}} \cdots \frac{1}{\sqrt{2\pi}\sigma_p}e^{-\frac{1}{2}\frac{(x_p - \mu_p)^2}{\sigma_p^2}} \]
Since $\Sigma = \mathrm{diag}(\sigma_{11}, \sigma_{22}, \cdots, \sigma_{pp})$, $|\Sigma| = \sigma_{11}\sigma_{22}\cdots\sigma_{pp}$, which implies $\frac{1}{|\Sigma|^{\frac{1}{2}}} = \frac{1}{\sigma_1}\cdot\frac{1}{\sigma_2}\cdots\frac{1}{\sigma_p}$.
Also,
\[ \sum_{j=1}^{p}\frac{(x_j - \mu_j)^2}{\sigma_j^2} = \sum_{j=1}^{p}(x_j - \mu_j)(\sigma_j^2)^{-1}(x_j - \mu_j) = (x - \mu)'\Sigma^{-1}(x - \mu) \]
so that $f(x) = (2\pi)^{-\frac{p}{2}}|\Sigma|^{-\frac{1}{2}}\exp\left[-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right]$.
The general $p$-dimensional normal density, written $X \sim N_p(\mu, \Sigma)$, is obtained by letting $\Sigma$ be any $p \times p$ symmetric positive definite matrix,
\[ \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}. \]
Here, the $j$th element of $\mu$ is still $E(X_j) = \mu_j$ and the $j$th diagonal element of $\Sigma$ is still $\sigma_{jj} = E(X_j - \mu_j)^2$, but the $(j, k)$th element is now $\sigma_{jk} = E(X_j - \mu_j)(X_k - \mu_k)$, $j \neq k$.
If all the $\frac{p(p-1)}{2}$ covariances in $\Sigma$ are zero, then the $p$ components are independently distributed (which rarely happens in practice). In that case there is no need for multivariate analysis; each component can be analysed with univariate methods.
For a $p \times 1$ vector $X$, $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$ is the squared statistical distance from $x$ to $\mu$.
Note that the symmetric matrix $\Sigma$ is positive definite (all its eigen values are positive). To see this, let $a' = (a_1, a_2, \cdots, a_p) \neq 0'$; we need to show $a'\Sigma a > 0$. Since $\Sigma = E(X - \mu)(X - \mu)'$, we have $a'\Sigma a = a'E[(X - \mu)(X - \mu)']a = E[a'(X - \mu)(X - \mu)'a]$. Since $(X - \mu)'a$ is a scalar, it equals its own transpose $a'(X - \mu)$, so $a'\Sigma a = E[a'(X - \mu)]^2 > 0$ (provided no component of $X$ is a linear combination of the others). Therefore, $\Sigma$ is positive definite.
Example 3.1. Bivariate normal distribution ($p = 2$).
\[ X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} \]
\[ \rho_{12} = \frac{\sigma_{12}}{\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}} \Rightarrow \sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} \]
\[ \Rightarrow \Sigma = \begin{pmatrix} \sigma_{11} & \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} \\ \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{22} \end{pmatrix} \Rightarrow |\Sigma| = \sigma_{11}\sigma_{22} - \rho_{12}^2\sigma_{11}\sigma_{22} = (1 - \rho_{12}^2)\sigma_{11}\sigma_{22} \]
\[ \Rightarrow \Sigma^{-1} = \frac{1}{(1 - \rho_{12}^2)\sigma_{11}\sigma_{22}}\begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{11} \end{pmatrix} \]
The squared statistical distance is $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$:
\begin{align*}
c^2 &= (x_1 - \mu_1, x_2 - \mu_2)\frac{1}{(1 - \rho_{12}^2)\sigma_{11}\sigma_{22}}\begin{pmatrix} \sigma_{22} & -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} \\ -\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}} & \sigma_{11} \end{pmatrix}\begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \\
&= \frac{1}{(1 - \rho_{12}^2)\sigma_{11}\sigma_{22}}\left[\sigma_{22}(x_1 - \mu_1)^2 - 2\rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_{11}(x_2 - \mu_2)^2\right] \\
&= \frac{1}{1 - \rho_{12}^2}\left[\frac{(x_1 - \mu_1)^2}{\sigma_{11}} - 2\rho_{12}\frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}}\cdot\frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} + \frac{(x_2 - \mu_2)^2}{\sigma_{22}}\right]
\end{align*}
Eigen vectors: The axes of the constant-density ellipse lie in the directions of the eigen vectors of $\Sigma$, and their half-lengths are proportional to the square roots of the corresponding eigen values.
Note that the ellipse is not parallel to the $x_1$ or $x_2$ axis, as the off-diagonal element of $\Sigma$ is not zero. (Had the off-diagonal element been zero and the variances equal, the ellipse would be a circle.)
[Insert the plot here]
Along the ellipse shown above (on the boundary of the ellipse), the bivariate normal density is constant. This path is called a contour.
Note:
• If $\sigma_{12} = 0$, the axes of the ellipse are parallel to the coordinate axes, and the roles of the major and minor axes are interchanged according to whether $\sigma_{11} > \sigma_{22}$ or $\sigma_{11} < \sigma_{22}$.
• If $\sigma_{12} = 0$ (equivalently $\rho_{12} = 0$) and $\sigma_{11} = \sigma_{22}$, the concentration ellipse is simply a circle, and infinitely many pairs of perpendicular axes could serve as "principal" axes.
Remarks: Recall the spectral decomposition. Let $O$ be a matrix whose columns are the normalized eigen vectors of $\Sigma$ and let $\Lambda$ be a diagonal matrix whose diagonal elements are the eigen values of $\Sigma$. Then,
• $\Sigma = \sum_{j=1}^{p} \lambda_j e_j e_j' = O\Lambda O'$
• $\Sigma^{-1} = \sum_{j=1}^{p} \frac{1}{\lambda_j}e_j e_j' = O\Lambda^{-1}O'$
• $\Sigma^{\frac{1}{2}} = \sum_{j=1}^{p} \sqrt{\lambda_j}e_j e_j' = O\Lambda^{\frac{1}{2}}O'$
Generally, if $X \sim N_p(\mu, \Sigma)$, the following properties hold:
3. Zero covariance implies that the corresponding components are independently distributed (this holds for the normal distribution only): $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = 0$, in which case $f(x_1, x_2) = f(x_1) \cdot f(x_2)$.
Multivariate case:
Let $X_1, X_2, \cdots, X_n$ be a random sample from $N_p(\mu, \Sigma)$. Then:
a. $\bar{X} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right)$.
b. $(n - 1)S$ has a Wishart distribution (a random matrix) with $n - 1$ degrees of freedom.
The sampling distribution of the sample covariance matrix is called the Wishart distribution. It is defined as the distribution of a sum of independent products of multivariate normal random vectors $Z_i$: the Wishart distribution $W_n(\cdot|\Sigma)$ with $n$ degrees of freedom is the distribution of $\sum_{i=1}^{n} Z_iZ_i'$. (Note that in the univariate case, $\sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$.)
Let $X \sim N_p(\mu, \Sigma)$. Then:
• $(X - \mu) \sim N_p(0, \Sigma)$
• $Z = \Sigma^{-\frac{1}{2}}(X - \mu) \sim N_p(0, I_p)$
• $Z'Z = (X - \mu)'\Sigma^{-\frac{1}{2}}\Sigma^{-\frac{1}{2}}(X - \mu) = (X - \mu)'\Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
Let $X_i$; $i = 1, 2, \cdots, n$ be a random sample from any distribution with mean $\mu$ and finite covariance $\Sigma$. Then, by the central limit theorem:
• $(\bar{X} - \mu) \sim N_p\left(0, \frac{1}{n}\Sigma\right) \Rightarrow \sqrt{n}(\bar{X} - \mu) \sim N_p(0, \Sigma)$ for large $n$. Since for large $n$, $S$ is close to $\Sigma$ with high probability, $\sqrt{n}(\bar{X} - \mu) \sim N_p(0, S)$ approximately.
• $Z = \sqrt{n}\Sigma^{-\frac{1}{2}}(\bar{X} - \mu) \sim N_p(0, I_p)$
Chapter 4
One of the central messages of multivariate analysis is that the p correlated variables must
be analysed jointly.
Univariate case: Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(\mu, \sigma^2)$. To test $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$, the test statistic is
\[ t = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \sim t(n - 1). \]
The null hypothesis is rejected if $|t|$ is large. Rejecting $H_0$ when $|t|$ is large is equivalent to rejecting $H_0$ when
\[ t^2 = \left(\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)^2 = (\bar{X} - \mu_0)\left(\frac{s^2}{n}\right)^{-1}(\bar{X} - \mu_0) = n(\bar{X} - \mu_0)(s^2)^{-1}(\bar{X} - \mu_0) \]
is large. Note: $t^2 = \left(\frac{\bar{X} - \mu_0}{s/\sqrt{n}}\right)^2 \sim t^2(n - 1) = F(1, n - 1)$.
Given a sample of $n$ observations $x_1, x_2, \cdots, x_n$, $H_0$ (that $\mu_0$ is a plausible value for $\mu$) should be rejected if the observed
\[ |t| = \left|\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\right| \]
exceeds $t_{\alpha/2}(n - 1)$, or equivalently if the observed $t^2 = n(\bar{x} - \mu_0)(s^2)^{-1}(\bar{x} - \mu_0) > t_{\alpha/2}^2(n - 1)$.
If $\sigma^2$ is known,
\[ Z^2 = \left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1). \]
Multivariate case:
Let $X_1, X_2, \cdots, X_n$ be a random sample from $N_p(\mu, \Sigma)$. The hypothesis to be tested is $H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$. The test statistic, the analog of the univariate $t^2$, is:
\[ T^2 = n(\bar{X} - \mu_0)'S^{-1}(\bar{X} - \mu_0) \sim \frac{(n - 1)p}{n - p}F(p, n - p) \]
where
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad S = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})'. \]
This test statistic is called Hotelling's $T^2$ statistic. If $T^2$ is "too large", i.e., $\bar{x}$ is "too far" from $\mu_0$, then $H_0: \mu = \mu_0$ is rejected, which means $\mu_0$ is not a plausible value for $\mu$.
Example 4.1. The amounts of two nutrients, A ($X_1$) and B ($X_2$), were measured for $n = 10$ subjects. Test $H_0: \mu = (3, 5)'$ at the 5% level of significance.
Summary statistics ($p = 2$, $n = 10$):
\[ \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}; \quad j = 1, 2 \]
\[ \Rightarrow \bar{x}_1 = \frac{1}{10}\sum_{i=1}^{10} x_{i1} = \frac{1}{10}(3.17 + 3.45 + \cdots + 2.05) = 3.20 \]
\[ \Rightarrow \bar{x}_2 = \frac{1}{10}\sum_{i=1}^{10} x_{i2} = \frac{1}{10}(3.45 + 2.35 + \cdots + 4.36) = 3.80 \]
Thus, the sample mean vector is $\bar{x} = \begin{pmatrix} 3.20 \\ 3.80 \end{pmatrix}$.
\[ s_{jk} = \frac{1}{n - 1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k); \quad j, k = 1, 2 \]
\[ \Rightarrow s_{11} = \frac{1}{10 - 1}\sum_{i=1}^{10} (x_{i1} - \bar{x}_1)^2 = 0.678, \quad s_{22} = \frac{1}{10 - 1}\sum_{i=1}^{10} (x_{i2} - \bar{x}_2)^2 = 0.645 \]
\[ \Rightarrow s_{12} = \frac{1}{10 - 1}\sum_{i=1}^{10} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) = -0.109 \]
Thus, the sample covariance matrix is:
\[ S = \begin{pmatrix} 0.678 & -0.109 \\ -0.109 & 0.645 \end{pmatrix} \Rightarrow S^{-1} = \begin{pmatrix} 1.517 & 0.257 \\ 0.257 & 1.594 \end{pmatrix} \]
1. Hypothesis: $H_0: \mu = \begin{pmatrix} 3 \\ 5 \end{pmatrix}$ versus $H_1: \mu \neq \begin{pmatrix} 3 \\ 5 \end{pmatrix}$.
2. Test statistic: $T^2 = n(\bar{x} - \mu_0)'S^{-1}(\bar{x} - \mu_0)$
\[ \Rightarrow T^2 = 10\begin{pmatrix} 3.20 - 3 \\ 3.80 - 5 \end{pmatrix}'\begin{pmatrix} 1.517 & 0.257 \\ 0.257 & 1.594 \end{pmatrix}\begin{pmatrix} 3.20 - 3 \\ 3.80 - 5 \end{pmatrix} = 10(0.2, -1.2)\begin{pmatrix} 1.517 & 0.257 \\ 0.257 & 1.594 \end{pmatrix}\begin{pmatrix} 0.2 \\ -1.2 \end{pmatrix} = 22.322 \]
3. Critical value: $c^* = \frac{(n - 1)p}{n - p}F_\alpha(p, n - p) = \frac{(10 - 1)2}{10 - 2}F_{0.05}(2, 8) = 10.032$. Since $T^2 = 22.322 > 10.032$, $H_0$ is rejected at the 5% level of significance.
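A sketch reproducing the test from the summary statistics (numpy and scipy are assumed available; scipy's f.ppf supplies the F quantile, and the last two lines also give the simultaneous confidence intervals used in Example 4.3 below):

```python
import numpy as np
from scipy.stats import f

n, p = 10, 2
xbar = np.array([3.20, 3.80])
S = np.array([[ 0.678, -0.109],
              [-0.109,  0.645]])
mu0 = np.array([3.0, 5.0])

d = xbar - mu0
T2 = n * d @ np.linalg.inv(S) @ d                        # about 22.3
c_star = (n - 1) * p / (n - p) * f.ppf(0.95, p, n - p)   # about 10.03
print(T2, c_star, T2 > c_star)                           # reject H0

half = np.sqrt(c_star) * np.sqrt(np.diag(S) / n)         # simultaneous CI half-widths
print(xbar - half, xbar + half)
```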
Confidence Regions
Univariate case: A $(1 - \alpha)100\%$ confidence interval for $\mu$ consists of all values $\mu$ such that $|t| \leq t_{\alpha/2}(n - 1)$, which is equivalent to
\[ t^2 = \frac{(\bar{x} - \mu)^2}{s^2/n} = (\bar{x} - \mu)(s^2)^{-1}(\bar{x} - \mu)n \leq F_\alpha(1, n - 1). \]
Multivariate case: A $(1 - \alpha)100\%$ confidence region for $\mu$ consists of all $\mu$ satisfying
\[ n(\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \leq \frac{(n - 1)p}{n - p}F_\alpha(p, n - p). \]
The confidence region is an ellipsoid centered at the sample mean vector $\bar{x} = (\bar{x}_1, \bar{x}_2, \cdots, \bar{x}_p)'$. The boundary of the ellipsoid is
\[ (\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) = \frac{c^*}{n} \quad \text{where} \quad c^* = \frac{(n - 1)p}{n - p}F_\alpha(p, n - p). \]
Beginning at the center $\bar{x}$, the half-lengths of the axes are given by
\[ \sqrt{\lambda_j}\sqrt{\frac{c^*}{n}} = \sqrt{\lambda_j}\sqrt{\frac{(n - 1)p}{(n - p)n}F_\alpha(p, n - p)} \]
in the direction of $e_j$, the normalized eigen vector corresponding to the eigen value $\lambda_j$; $j = 1, 2, \cdots, p$ of $S$.
Example 4.2. Recall Example 4.1. The 95% confidence region for $\mu = (\mu_1, \mu_2)'$ is given by:
\[ (\bar{x} - \mu)'S^{-1}(\bar{x} - \mu) \leq \frac{c^*}{n} \Rightarrow \begin{pmatrix} 3.20 - \mu_1 \\ 3.80 - \mu_2 \end{pmatrix}'\begin{pmatrix} 1.517 & 0.257 \\ 0.257 & 1.594 \end{pmatrix}\begin{pmatrix} 3.20 - \mu_1 \\ 3.80 - \mu_2 \end{pmatrix} \leq \frac{10.032}{10} \]
This confidence region for $\mu = (\mu_1, \mu_2)'$ is an ellipse. For all points inside the ellipse, $H_0$ will not be rejected. For example, you can easily check that $\mu = (3, 5)'$ does not lie in the region.
To plot the confidence region, the eigen values of $S = \begin{pmatrix} 0.678 & -0.109 \\ -0.109 & 0.645 \end{pmatrix}$ are needed:
\[ |S - \lambda I| = 0 \Rightarrow \begin{vmatrix} 0.678 - \lambda & -0.109 \\ -0.109 & 0.645 - \lambda \end{vmatrix} = 0 \Rightarrow \lambda^2 - 1.323\lambda + 0.425 = 0 \Rightarrow \lambda_1 = 0.774, \quad \lambda_2 = 0.550. \]
[Insert the plot here]
It would be erroneous to carry out separate univariate $t$ tests for this purpose, because the number of tests and the correlation among the responses would make the overall significance level quite different from the $\alpha$ chosen for each individual $t$ test. For example, let $X \sim N_6(\mu, \Sigma)$ and suppose each component mean is tested against a specified value, giving $p = 6$ univariate $t$-tests. Let $\alpha = 0.05$. Then the probability of not rejecting the hypothesis of no difference from the specified value in each case would be $1 - 0.05 = 0.95$. If the tests are independent of each other, the probability of not rejecting $H_0$ in all 6 cases is $(0.95)(0.95)\cdots(0.95) = (0.95)^6 = 0.7351$, so the probability of rejecting at least one hypothesis of no difference is $1 - 0.7351 = 0.2649$, far larger than the nominal $\alpha = 0.05$. This means a type I error is committed about 26% of the time when all 6 univariate tests are carried out. In general, the probability of committing a type I error increases as the number of components grows.
In constructing simultaneous confidence statements, all the separate confidence intervals must hold simultaneously with a specified high confidence level (low significance level). Let $X_1, X_2, \cdots, X_n$ be a random sample from $N_p(\mu, \Sigma)$. Then the linear combination $a'X = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ has a normal distribution with mean $a'\mu$ and variance $a'\Sigma a$, that is, $a'X \sim N(a'\mu, a'\Sigma a)$.
A simultaneous confidence region is given by the set of $a'\mu$ values such that the observed $t^2$ is relatively small for all choices of $a$. A $(1 - \alpha)100\%$ simultaneous confidence interval for $a'\mu$ is:
\[ \frac{(a'\bar{x} - a'\mu)^2}{a'Sa/n} \leq c^* \Rightarrow \left|\frac{a'\bar{x} - a'\mu}{\sqrt{a'Sa/n}}\right| \leq \sqrt{c^*} \Rightarrow a'\bar{x} \pm \sqrt{c^*}\sqrt{\frac{a'Sa}{n}} \]
where $c^* = \frac{(n - 1)p}{n - p}F_\alpha(p, n - p)$. In particular, if $a' = (0, 0, \cdots, 1, \cdots, 0)$ with the $1$ in the $j$th position, then the confidence interval for $a'\mu = \mu_j$ is $\bar{x}_j \pm \sqrt{c^*}\frac{\sqrt{s_{jj}}}{\sqrt{n}}$.
Example 4.3. Consider again Example 4.1. Find the 95% simultaneous confidence intervals for the mean levels of nutrients A and B. The sample mean vector and sample variance-covariance matrix, respectively, were:
\[ \bar{x} = \begin{pmatrix} 3.20 \\ 3.80 \end{pmatrix} \quad \text{and} \quad S = \begin{pmatrix} 0.678 & -0.109 \\ -0.109 & 0.645 \end{pmatrix}. \]
Also, the critical value for Hotelling's $T^2$ was $c^* = \frac{(10 - 1)2}{10 - 2}F_{0.05}(2, 10 - 2) = 10.032$.
A 95% simultaneous confidence interval for $\mu_1$ is:
\[ \bar{x}_1 \pm \sqrt{c^*}\sqrt{\frac{s_{11}}{n}} = 3.20 \pm \sqrt{10.032}\sqrt{\frac{0.678}{10}} = (2.375, 4.025). \]
Similarly, a 95% simultaneous confidence interval for $\mu_2$ is:
\[ \bar{x}_2 \pm \sqrt{c^*}\sqrt{\frac{s_{22}}{n}} = 3.80 \pm \sqrt{10.032}\sqrt{\frac{0.645}{10}} = (2.996, 4.604). \]
Note that $\mu_{01} = 3$ falls inside the confidence interval for $\mu_1$, while $\mu_{02} = 5$ falls outside the confidence interval for $\mu_2$. Hence, the second component (nutrient B) is responsible for the rejection of $H_0: \mu = (3, 5)'$.
\[ \mu_1: 3.2 \pm 3.111\sqrt{\frac{0.678}{10}} = (2.39, 4.01) \quad \text{and} \quad \mu_2: 3.8 \pm 3.111\sqrt{\frac{0.645}{10}} = (3.01, 4.59) \]
Likelihood Ratio Test
Recall for a random sample $X_i$; $i = 1, 2, \cdots, n$ from $N_p(\mu, \Sigma)$, the likelihood function is:
\[ L(\mu, \Sigma) = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\exp\left[-\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right]. \]
Also recall the ML estimate of $\mu$ is $\hat{\mu} = \bar{x}$ and that of $\Sigma$ is $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'$.
The exponent of the likelihood function can be written as:
\begin{align*}
\sum_{i=1}^{n}\underbrace{(x_i - \mu)'}_{A_{1 \times p}}\underbrace{\Sigma^{-1}(x_i - \mu)}_{B_{p \times 1}} &= \sum_{i=1}^{n}\mathrm{tr}\left[(x_i - \mu)'\Sigma^{-1}(x_i - \mu)\right] \\
&= \sum_{i=1}^{n}\mathrm{tr}\left[\Sigma^{-1}(x_i - \mu)(x_i - \mu)'\right] \quad \text{as } \mathrm{tr}(AB) = \mathrm{tr}(BA) \\
&= \mathrm{tr}\left[\Sigma^{-1}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)'\right]
\end{align*}
Thus,
\[ L(\mu, \Sigma) = \frac{1}{(2\pi)^{\frac{np}{2}}|\Sigma|^{\frac{n}{2}}}\exp\left[-\frac{1}{2}\mathrm{tr}\left\{\Sigma^{-1}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)'\right\}\right]. \]
The likelihood ratio test of $H_0: \mu = \mu_0$ leads to
\[ \Lambda^{\frac{2}{n}} = \frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|} \quad \text{where} \quad \hat{\Sigma}_0 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu_0)(x_i - \mu_0)'. \]
The likelihood-ratio test statistic $\Lambda^{\frac{2}{n}}$ is called Wilks' Lambda. The null hypothesis $H_0: \mu = \mu_0$ should be rejected if the value of $\Lambda$ is too small, i.e., if:
\[ \Lambda = \left[\frac{|\hat{\Sigma}|}{|\hat{\Sigma}_0|}\right]^{\frac{n}{2}} < c_\alpha \]
where $c_\alpha$ is the lower $(100\alpha)$th percentile of the distribution of $\Lambda$. But,
\[ \Lambda^{\frac{2}{n}} = \left(1 + \frac{T^2}{n - 1}\right)^{-1} \]
where $T^2 \sim \frac{(n - 1)p}{n - p}F(p, n - p)$. Rejecting $H_0$ for small values of $\Lambda^{\frac{2}{n}}$ is therefore equivalent to rejecting $H_0$ for large values of $T^2$.
Chapter 5
Paired Comparisons
When two treatments (e.g., measurements before and after a treatment) are applied to the same $n$ units, the analysis is based on the $p \times 1$ vectors of differences $d_i = x_{1i} - x_{2i}$; $i = 1, 2, \cdots, n$, assumed to be a random sample from $N_p(\mu_d, \Sigma_d)$. The hypothesis $H_0: \mu_d = 0$ versus $H_1: \mu_d \neq 0$ is tested with $T^2 = n\bar{d}'S_d^{-1}\bar{d}$, rejecting $H_0$ if the observed $T^2 > c^* = \frac{(n - 1)p}{n - p}F_\alpha(p, n - p)$.
Example 5.1. It is felt that three drugs ($X_1$, $X_2$ and $X_3$) may lead to changes in the level of a certain biochemical compound found in the brain. Thirty mice of the same strain were randomly divided into three groups and received the drugs. The amount of the compound (in micrograms per gram of brain tissue) was recorded before and after the treatments. The responses are given in the following table. Test the hypothesis of no treatment effect at the 5% level of significance.

Before treatment              After treatment
x1i1    x1i2    x1i3          x2i1    x2i2    x2i3
1.21    0.61    0.70          1.26    0.50    0.81
0.92    0.43    0.71          1.07    0.39    0.69
0.80    0.35    0.71          1.33    0.24    0.70
0.85    0.48    0.68          1.39    0.37    0.72
0.98    0.42    0.71          1.38    0.42    0.71
1.15    0.52    0.72          0.98    0.49    0.70
1.10    0.50    0.75          1.41    0.41    0.70
1.02    0.53    0.70          1.30    0.47    0.67
1.18    0.45    0.70          1.22    0.29    0.68
1.09    0.40    0.69          1.00    0.30    0.70

The necessary calculations on the differences $d_{ij} = x_{1ij} - x_{2ij}$ are as follows.
\[ \bar{d}_1 = \frac{1}{10}\sum_{i=1}^{10} d_{i1} = \frac{1}{10}(-2.04) = -0.204, \quad \bar{d}_2 = \frac{1}{10}\sum_{i=1}^{10} d_{i2} = \frac{1}{10}(0.81) = 0.081, \quad \bar{d}_3 = \frac{1}{10}\sum_{i=1}^{10} d_{i3} = \frac{1}{10}(-0.01) = -0.001 \]
\[ \Rightarrow \bar{d} = \begin{pmatrix} -0.204 \\ 0.081 \\ -0.001 \end{pmatrix} \]
\[ s_{d_jd_k} = \frac{1}{n - 1}\sum_{i=1}^{n} (d_{ij} - \bar{d}_j)(d_{ik} - \bar{d}_k) = \frac{1}{10 - 1}\sum_{i=1}^{10} (d_{ij} - \bar{d}_j)(d_{ik} - \bar{d}_k); \quad j, k = 1, 2, 3 \]
\begin{align*}
\Rightarrow s_{d_1d_1} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i1} - \bar{d}_1)^2 = \frac{1}{9}(0.55444) = 0.06160 \\
\Rightarrow s_{d_2d_2} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i2} - \bar{d}_2)^2 = \frac{1}{9}(0.02049) = 0.00228 \\
\Rightarrow s_{d_3d_3} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i3} - \bar{d}_3)^2 = \frac{1}{9}(0.01849) = 0.00205 \\
\Rightarrow s_{d_1d_2} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i1} - \bar{d}_1)(d_{i2} - \bar{d}_2) = \frac{1}{9}(-0.00096) = -0.00011 \\
\Rightarrow s_{d_1d_3} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i1} - \bar{d}_1)(d_{i3} - \bar{d}_3) = \frac{1}{9}(-0.00544) = -0.00060 \\
\Rightarrow s_{d_2d_3} &= \frac{1}{9}\sum_{i=1}^{10} (d_{i2} - \bar{d}_2)(d_{i3} - \bar{d}_3) = \frac{1}{9}(-0.00469) = -0.00052
\end{align*}
\[ S_d = \begin{pmatrix} 0.06160 & -0.00011 & -0.00060 \\ -0.00011 & 0.00228 & -0.00052 \\ -0.00060 & -0.00052 & 0.00205 \end{pmatrix} \Rightarrow S_d^{-1} = \begin{pmatrix} 16.28866 & 1.98818 & 5.27173 \\ 1.98818 & 465.77088 & 118.72867 \\ 5.27173 & 118.72867 & 519.46436 \end{pmatrix} \]
The hypothesis to be tested is:
\[ H_0: \mu_d = 0 \quad \text{versus} \quad H_1: \mu_d \neq 0, \quad \text{using} \quad T^2 = n\bar{d}'S_d^{-1}\bar{d} \]
\[ \Rightarrow T^2 = 10\begin{pmatrix} -0.204 \\ 0.081 \\ -0.001 \end{pmatrix}'\begin{pmatrix} 16.28866 & 1.98818 & 5.27173 \\ 1.98818 & 465.77088 & 118.72867 \\ 5.27173 & 118.72867 & 519.46436 \end{pmatrix}\begin{pmatrix} -0.204 \\ 0.081 \\ -0.001 \end{pmatrix} = 36.515 \]
The critical value is $c^* = \frac{(n - 1)p}{n - p}F_\alpha(p, n - p) = \frac{(10 - 1)3}{10 - 3}F_{0.05}(3, 10 - 3) = 16.779$. Since $T^2 = 36.515 > 16.779$, there is a significant treatment effect at the 5% level of significance.
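The whole calculation can be reproduced from the raw table with a short script (a sketch; numpy and scipy are assumed available):

```python
import numpy as np
from scipy.stats import f

before = np.array([[1.21, 0.61, 0.70], [0.92, 0.43, 0.71], [0.80, 0.35, 0.71],
                   [0.85, 0.48, 0.68], [0.98, 0.42, 0.71], [1.15, 0.52, 0.72],
                   [1.10, 0.50, 0.75], [1.02, 0.53, 0.70], [1.18, 0.45, 0.70],
                   [1.09, 0.40, 0.69]])
after = np.array([[1.26, 0.50, 0.81], [1.07, 0.39, 0.69], [1.33, 0.24, 0.70],
                  [1.39, 0.37, 0.72], [1.38, 0.42, 0.71], [0.98, 0.49, 0.70],
                  [1.41, 0.41, 0.70], [1.30, 0.47, 0.67], [1.22, 0.29, 0.68],
                  [1.00, 0.30, 0.70]])
D = before - after
n, p = D.shape
dbar = D.mean(axis=0)                     # (-0.204, 0.081, -0.001)
Sd = np.cov(D, rowvar=False)              # covariance of differences (divisor n-1)
T2 = n * dbar @ np.linalg.inv(Sd) @ dbar  # about 36.5
c_star = (n - 1) * p / (n - p) * f.ppf(0.95, p, n - p)   # about 16.78
print(T2, c_star, T2 > c_star)
```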
The next question is which of the three drugs ($X_1$, $X_2$ or $X_3$) leads to changes in the level of the biochemical compound found in the brain. To answer this question, simultaneous confidence intervals for the individual mean differences $\mu_{d_j}$ need to be constructed, given by:
\[ \bar{d}_j \pm \sqrt{c^*}\sqrt{\frac{s_{d_jd_j}}{n}}; \quad j = 1, 2, 3 \]
Hence, the 95% simultaneous confidence intervals are:
\begin{align*}
\mu_{d_1}&: \bar{d}_1 \pm \sqrt{c^*}\sqrt{\frac{s_{d_1d_1}}{n}} = -0.204 \pm \sqrt{16.779}\sqrt{\frac{0.06160}{10}} = (-0.5255, 0.1175) \\
\mu_{d_2}&: \bar{d}_2 \pm \sqrt{c^*}\sqrt{\frac{s_{d_2d_2}}{n}} = 0.081 \pm \sqrt{16.779}\sqrt{\frac{0.00228}{10}} = (0.0191, 0.1429) \\
\mu_{d_3}&: \bar{d}_3 \pm \sqrt{c^*}\sqrt{\frac{s_{d_3d_3}}{n}} = -0.001 \pm \sqrt{16.779}\sqrt{\frac{0.00205}{10}} = (-0.0596, 0.0576)
\end{align*}
Only the confidence interval for $\mu_{d_2}$ excludes zero. Thus, $H_0: \mu_d = 0$ was rejected due to the second component ($X_2$). In other words, it is the second drug ($X_2$) that led to a significant change in the level of the biochemical compound found in the brain at the 5% level of significance.
Contrast Matrices: The hypothesis of equal treatment means, $H_0: \mu_1 = \mu_2 = \cdots = \mu_q$, can be expressed in terms of contrasts, for example
\[ \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \\ \vdots \\ \mu_1 - \mu_q \end{pmatrix}_{(q-1) \times 1} = \underbrace{\begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}}_{(q-1) \times q}\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}_{q \times 1} = A\mu \]
or
\[ \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_2 - \mu_3 \\ \vdots \\ \mu_{q-1} - \mu_q \end{pmatrix}_{(q-1) \times 1} = \underbrace{\begin{pmatrix} 1 & -1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & -1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & -1 \end{pmatrix}}_{(q-1) \times q}\begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_q \end{pmatrix}_{q \times 1} = B\mu \]
Since each row is a contrast and the $q - 1$ rows are linearly independent, both $A$ and $B$ are contrast matrices. If $A\mu = B\mu = 0$, then $\mu_1 = \mu_2 = \cdots = \mu_q$. Hence, the hypothesis of no difference in treatments (equal treatment means) is $A\mu = 0$ for any choice of contrast matrix $A$.
The test statistic is
\[ T^2 = n(A\bar{X})'(ASA')^{-1}(A\bar{X}) \sim \frac{(n - 1)(q - 1)}{n - (q - 1)}F[q - 1, n - (q - 1)]. \]
Note that $T^2$ does not depend on the particular choice of the contrast matrix $A$. As usual, reject $H_0$ if the observed $T^2 = n(A\bar{x})'(ASA')^{-1}(A\bar{x}) > c^*$.
The $(1 - \alpha)100\%$ simultaneous confidence intervals for a single contrast $a'\mu$, for any contrast vector $a$ of interest, are:
\[ a'\bar{x} \pm \sqrt{c^*}\sqrt{\frac{a'Sa}{n}} \]
where $c^* = \frac{(n - 1)(q - 1)}{n - (q - 1)}F_\alpha[q - 1, n - (q - 1)]$. In particular, the confidence interval for $\mu_j - \mu_k$ is obtained by letting $a' = (0, \cdots, 0, 1, 0, \cdots, 0, -1, 0, \cdots, 0)$, with $1$ in the $j$th position and $-1$ in the $k$th position:
\[ (\bar{x}_j - \bar{x}_k) \pm \sqrt{c^*}\sqrt{\frac{s_{jj} - 2s_{jk} + s_{kk}}{n}}; \quad j \neq k. \]
Example 5.2. A researcher considered three indices measuring the severity of heart attacks. The values of the indices for $n = 40$ heart-attack patients arriving at a hospital emergency room produced the following summary statistics:
\[ \bar{x} = \begin{pmatrix} 46.1 \\ 57.3 \\ 50.4 \end{pmatrix} \quad \text{and} \quad S = \begin{pmatrix} 101.3 & 63.0 & 71.0 \\ 63.0 & 80.2 & 55.6 \\ 71.0 & 55.6 & 97.4 \end{pmatrix} \]
Test the equality of the mean indices and judge the differences in pairs of mean indices.
Since there are $q = 3$ treatments, let $A = \begin{pmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}$. Then the hypothesis to be tested is
\[ H_0: A\mu = 0 \Rightarrow H_0: \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \quad \text{versus} \quad H_1: A\mu \neq 0 \Rightarrow H_1: \begin{pmatrix} \mu_1 - \mu_2 \\ \mu_1 - \mu_3 \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
The test statistic is $T^2 = n(A\bar{x})'(ASA')^{-1}(A\bar{x})$.
\[ A\bar{x} = \begin{pmatrix} -11.2 \\ -4.3 \end{pmatrix}, \quad ASA' = \begin{pmatrix} 55.5 & 22.9 \\ 22.9 & 56.7 \end{pmatrix} \Rightarrow (ASA')^{-1} = \begin{pmatrix} 0.02162 & -0.00873 \\ -0.00873 & 0.02116 \end{pmatrix} \]
\[ T^2 = 40(-11.2, -4.3)\begin{pmatrix} 0.02162 & -0.00873 \\ -0.00873 & 0.02116 \end{pmatrix}\begin{pmatrix} -11.2 \\ -4.3 \end{pmatrix} = 90.49 \]
\[ c^* = \frac{(n - 1)(q - 1)}{n - (q - 1)}F_\alpha[q - 1, n - (q - 1)] = \frac{(40 - 1)(3 - 1)}{40 - (3 - 1)}F_{0.05}[3 - 1, 40 - (3 - 1)] = 6.66 \]
Since $T^2 = 90.49 > 6.66$, $H_0: A\mu = 0$ is rejected. The mean indices are not all equal.
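A sketch of the contrast-based test in Example 5.2 (numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import f

n, q = 40, 3
xbar = np.array([46.1, 57.3, 50.4])
S = np.array([[101.3, 63.0, 71.0],
              [ 63.0, 80.2, 55.6],
              [ 71.0, 55.6, 97.4]])
A = np.array([[1.0, -1.0,  0.0],
              [1.0,  0.0, -1.0]])   # a contrast matrix; the choice does not matter
Ax = A @ xbar
T2 = n * Ax @ np.linalg.inv(A @ S @ A.T) @ Ax              # about 90.5
c_star = (n - 1) * (q - 1) / (n - q + 1) * f.ppf(0.95, q - 1, n - q + 1)
print(T2, c_star)                                          # 90.49 > 6.66: reject H0
```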
" r #
√ s11 − 2s12 + s22
µ1 − µ2 : (x̄1 − x̄2 ) ± c∗
n
" r #
√
101.3 − 2(63.0) + 80.2
−11.2 ± 6.66 = (−14.23986, −8.16014)
40
" r #
√ s 11 − 2s 13 + s 33
µ1 − µ3 : (x̄1 − x̄3 ) ± c∗
n
" r #
√ 101.3 − 2(71.0) + 97.4
−4.3 ± 6.66 = (−7.37255, −1.22745)
40
" r #
√ s 22 − 2s 23 + s 33
µ2 − µ3 : (x̄2 − x̄3 ) ± c∗
n
" r #
√ 80.2 − 2(55.6) + 97.4
6.9 ± 6.66 = (3.57500, 10.22500)
40
All the intervals do not contain zero. Thus, all mean indices are significantly different from
each other (µ2 > µ3 > µ1 ).
Comparing Mean Vectors from Two Independent Populations
Assumptions:
• $X_{11}, X_{12}, \cdots, X_{1n_1}$ is a random sample of size $n_1$ from $N_p(\mu_1, \Sigma_1)$, and $X_{21}, X_{22}, \cdots, X_{2n_2}$ is an independent random sample of size $n_2$ from $N_p(\mu_2, \Sigma_2)$.
• $\Sigma_1 = \Sigma_2 = \Sigma$, estimated by the pooled covariance matrix $S_{pooled} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2}$.
To test $H_0: \mu_1 = \mu_2$, the test statistic is $T^2 = (\bar{x}_1 - \bar{x}_2)'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{pooled}\right]^{-1}(\bar{x}_1 - \bar{x}_2)$.
Since $T^2 \sim \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F(p, n_1 + n_2 - p - 1)$, $H_0$ will be rejected if the observed $T^2 > c^*$, where $c^* = \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_\alpha(p, n_1 + n_2 - p - 1)$.
A $(1 - \alpha)100\%$ confidence region for $\mu_1 - \mu_2$ is given by:
\[ [(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]'\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{pooled}\right]^{-1}[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)] \leq c^* \]
which is an ellipsoid centered at $(\bar{x}_1 - \bar{x}_2)$. The half-lengths of the axes are
\[ \sqrt{\lambda_j}\sqrt{c^*\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}; \quad j = 1, 2, \cdots, p \]
in the direction of $e_j$, the normalized eigen vector associated with the eigen value $\lambda_j$ of $S_{pooled}$.
If $a' = (0, 0, \cdots, 1, \cdots, 0)$ with $1$ in the $j$th position, then $a'(\mu_1 - \mu_2) = \mu_{1j} - \mu_{2j}$, $a'(\bar{x}_1 - \bar{x}_2) = \bar{x}_{1j} - \bar{x}_{2j}$ and $a'S_{pooled}a = s_{jj}$. Thus, a $(1 - \alpha)100\%$ simultaneous confidence interval for $\mu_{1j} - \mu_{2j}$ is
\[ (\bar{x}_{1j} - \bar{x}_{2j}) \pm \sqrt{c^*}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{jj}} \]
where $s_{jj}$ is the $j$th diagonal entry of the pooled covariance matrix $S_{pooled}$.
Example 5.3. The following hypothetical data give the academic performance of students (in preparatory school, out of 100, and in university, out of 4.00). Test the equality of the population mean vectors between the two groups.

Female (n1 = 3)               Male (n2 = 5)
Preparatory   University      Preparatory   University
97            3.40            86            3.90
95            3.45            84            3.75
85            3.50            70            2.25
                              80            3.05
                              75            2.80

Summary statistics:
\[ \text{Female:} \quad \bar{x}_1 = \begin{pmatrix} 92.3333 \\ 3.4500 \end{pmatrix} \quad \text{and} \quad S_1 = \begin{pmatrix} 41.3333 & 0.2500 \\ 0.2500 & 0.0025 \end{pmatrix} \]
\[ \text{Male:} \quad \bar{x}_2 = \begin{pmatrix} 79.0000 \\ 3.1500 \end{pmatrix} \quad \text{and} \quad S_2 = \begin{pmatrix} 43.0000 & 4.4125 \\ 4.4125 & 0.4663 \end{pmatrix} \]
\[ S_{pooled} = \begin{pmatrix} 42.4444 & 3.0250 \\ 3.0250 & 0.3117 \end{pmatrix} \Rightarrow S_{pooled}^{-1} = \begin{pmatrix} 0.0764 & -0.7415 \\ -0.7415 & 10.4048 \end{pmatrix} \]
The observed test statistic is
\[ T^2 = (\bar{x}_1 - \bar{x}_2)'\left[\left(\frac{1}{3} + \frac{1}{5}\right)S_{pooled}\right]^{-1}(\bar{x}_1 - \bar{x}_2) = 16.10 \]
and the critical value is $c^* = \frac{(6)2}{5}F_{0.05}(2, 5) = 13.8866$. Since $T^2 = 16.10 > 13.8866$, reject $H_0: \mu_1 = \mu_2$.
A $(1 - \alpha)100\%$ simultaneous confidence interval for $\mu_{1j} - \mu_{2j}$ is
\[ (\bar{x}_{1j} - \bar{x}_{2j}) \pm \sqrt{c^*}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{jj}} \]
where $s_{jj}$ is the $j$th diagonal entry of $S_{pooled}$. Hence, the 95% simultaneous confidence intervals are:
\begin{align*}
\mu_{11} - \mu_{21}&: (\bar{x}_{11} - \bar{x}_{21}) \pm \sqrt{c^*}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{11}} = 13.3333 \pm \sqrt{13.8866}\sqrt{\left(\frac{1}{3} + \frac{1}{5}\right)42.4444} = (-4.3967, 31.0633) \\
\mu_{12} - \mu_{22}&: (\bar{x}_{12} - \bar{x}_{22}) \pm \sqrt{c^*}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{22}} = 0.3000 \pm \sqrt{13.8866}\sqrt{\left(\frac{1}{3} + \frac{1}{5}\right)0.3117} = (-1.2194, 1.8194)
\end{align*}
Both simultaneous confidence intervals contain zero, indicating that there is no significant difference in the means between females and males, which contradicts the rejection of $H_0: \mu_1 = \mu_2$. The possible reasons may be:
• The multivariate normality of the observation vectors might be violated because of the small sample sizes.
• The assumption of equal covariance matrices ($\Sigma_1 = \Sigma_2$) may not hold.
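A sketch of the two-sample computation from the summary statistics of Example 5.3 (numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import f

n1, n2, p = 3, 5, 2
xbar1 = np.array([92.3333, 3.45])
xbar2 = np.array([79.0, 3.15])
S1 = np.array([[41.3333, 0.25], [0.25, 0.0025]])
S2 = np.array([[43.0, 4.4125], [4.4125, 0.4663]])
Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)      # pooled covariance

d = xbar1 - xbar2
T2 = d @ np.linalg.inv((1 / n1 + 1 / n2) * Sp) @ d        # about 16.1
df2 = n1 + n2 - p - 1
c_star = (n1 + n2 - 2) * p / df2 * f.ppf(0.95, p, df2)    # about 13.89
print(T2, c_star, T2 > c_star)                            # reject H0
```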
Univariate ANOVA:
Let $X_{\ell 1}, X_{\ell 2}, \cdots, X_{\ell n_\ell}$ be a random sample from $N(\mu_\ell, \sigma^2)$; $\ell = 1, 2, \cdots, g$. The hypothesis of equal treatment means can be reparametrized as
\[ \mu_\ell = \mu + (\mu_\ell - \mu) = \mu + \tau_\ell \]
where $\tau_\ell = \mu_\ell - \mu$ is the $\ell$th population (treatment) effect. The null hypothesis now becomes $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$. The model is
\[ X_{\ell i} = \mu_\ell + e_{\ell i} = \mu + \tau_\ell + e_{\ell i} \]
where the random errors $e_{\ell i}$ are independent $N(0, \sigma^2)$. A constraint $\sum_{\ell=1}^{g} n_\ell\tau_\ell = 0$ is imposed to define the model parameters uniquely.
The analysis of variance is based on the decomposition of each observed value $x_{\ell i}$:
\[ x_{\ell i} - \bar{x} = (\bar{x}_\ell - \bar{x}) + (x_{\ell i} - \bar{x}_\ell) \]
\[ \Rightarrow \sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x})^2 = n_\ell(\bar{x}_\ell - \bar{x})^2 + \sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x}_\ell)^2 \quad \text{since} \quad \sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x}_\ell) = \sum_{i=1}^{n_\ell}x_{\ell i} - n_\ell\bar{x}_\ell = 0. \]
Now taking the summation over $\ell$,
\[ \underbrace{\sum_{\ell=1}^{g}\sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x})^2}_{SS_{corrected}} = \underbrace{\sum_{\ell=1}^{g}n_\ell(\bar{x}_\ell - \bar{x})^2}_{BSS \; (SS_{treatment})} + \underbrace{\sum_{\ell=1}^{g}\sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x}_\ell)^2}_{WSS \; (SS_{residuals})} \]
The null hypothesis $H_0$ is rejected if $F = \frac{BSS/(g - 1)}{WSS/(n - g)} > F_\alpha(g - 1, n - g)$. Rejecting $H_0$ when $F$ is too large is equivalent to rejecting $H_0$ if:
\[ \frac{BSS}{WSS} \text{ is too large} \Rightarrow \frac{BSS}{WSS} + 1 \text{ is too large} \Rightarrow \frac{1}{\frac{BSS}{WSS} + 1} = \frac{WSS}{BSS + WSS} \text{ is too small.} \]
This form is used for the multivariate generalization.
Multivariate ANOVA - MANOVA:
Let $X_{\ell 1}, X_{\ell 2}, \cdots, X_{\ell n_\ell}$; $\ell = 1, 2, \cdots, g$ be a random sample of size $n_\ell$ from $N_p(\mu_\ell, \Sigma)$. The random samples from the different populations are independent.
As in the univariate case, each observation vector is decomposed as $x_{\ell i} - \bar{x} = (\bar{x}_\ell - \bar{x}) + (x_{\ell i} - \bar{x}_\ell)$. When taking the summation over $i$, the two middle cross-product terms become zero vectors. Then, taking the summation over $\ell$ gives:
\[ \sum_{\ell=1}^{g}\sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x})(x_{\ell i} - \bar{x})' = \underbrace{\sum_{\ell=1}^{g}\sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x}_\ell)(x_{\ell i} - \bar{x}_\ell)'}_{W} + \underbrace{\sum_{\ell=1}^{g}n_\ell(\bar{x}_\ell - \bar{x})(\bar{x}_\ell - \bar{x})'}_{B} \]
The sample mean vector of the $\ell$th group is $\bar{x}_\ell = \frac{1}{n_\ell}\sum_{i=1}^{n_\ell}x_{\ell i}$ and the sample covariance matrix of the $\ell$th group is $S_\ell = \frac{1}{n_\ell - 1}\sum_{i=1}^{n_\ell}(x_{\ell i} - \bar{x}_\ell)(x_{\ell i} - \bar{x}_\ell)'$; $\ell = 1, 2, \cdots, g$. This implies
\[ \bar{x} = \frac{1}{n}\sum_{\ell=1}^{g}n_\ell\bar{x}_\ell \quad \text{and} \quad S_{pooled} = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g}{n - g}. \]
Note that $W = \sum_{\ell=1}^{g}(n_\ell - 1)S_\ell = (n - g)S_{pooled} \Rightarrow S_{pooled} = \frac{1}{n - g}W$.
Recall that in the univariate case, $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ is rejected if $\frac{WSS}{BSS + WSS}$ is small. Likewise, the null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ is rejected if
\[ \Lambda^* = \frac{|W|}{|B + W|} \text{ is too small.} \]
The statistic $\Lambda^*$ is known as Wilks' Lambda. The exact distribution of $\Lambda^*$ for special cases is given in the textbook (Table 6.3, page 303).
Example 5.4. Consider $g = 3$ treatments with $n_1 = 5$, $n_2 = 3$ and $n_3 = 4$ observations ($n = 12$) on $p = 2$ response variables, for which
\[ B = \begin{pmatrix} 36 & 48 \\ 48 & 84 \end{pmatrix}, \quad W = \begin{pmatrix} 18 & -13 \\ -13 & 38 \end{pmatrix} \quad \text{and} \quad B + W = \begin{pmatrix} 54 & 35 \\ 35 & 122 \end{pmatrix}. \]
The MANOVA table is:

Source of variation             Matrix of SS and cross-products (SSP)     df
Between groups (Treatment)      B = [36 48; 48 84]                        3 - 1 = 2
Within groups (Residual)        W = [18 -13; -13 38]                      12 - 3 = 9
Total                           B + W = [54 35; 35 122]                   12 - 1 = 11

\[ \Lambda^* = \frac{|W|}{|B + W|} = \frac{515}{5363} = 0.096 \]
For $p = 2$ and $g = 3$, the exact distribution is:
\[ \left(\frac{n - g - 1}{g - 1}\right)\left(\frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}}\right) \sim F[2(g - 1), 2(n - g - 1)] \]
\[ \Rightarrow \left(\frac{12 - 3 - 1}{3 - 1}\right)\left(\frac{1 - \sqrt{0.096}}{\sqrt{0.096}}\right) = 8.908 \quad \text{and} \quad F_{0.05}[4, 16] = 3.01 \]
Therefore, $H_0: \tau_1 = \tau_2 = \tau_3 = 0$ should be rejected.
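A sketch of the Wilks' lambda computation for this example (numpy and scipy assumed):

```python
import numpy as np
from scipy.stats import f

n, g = 12, 3
B = np.array([[36.0, 48.0], [48.0, 84.0]])
W = np.array([[18.0, -13.0], [-13.0, 38.0]])
lam = np.linalg.det(W) / np.linalg.det(B + W)              # 515/5363 = 0.096
F_stat = (n - g - 1) / (g - 1) * (1 - np.sqrt(lam)) / np.sqrt(lam)
crit = f.ppf(0.95, 2 * (g - 1), 2 * (n - g - 1))           # F(4, 16) = 3.01
print(lam, F_stat, crit)                                   # 0.096, 8.91 > 3.01
```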
Next, for the simultaneous confidence intervals, $\alpha/[pg(g - 1)] = 0.05/[2(3)(2)] = 0.004167$ and $t_{0.004167}(n - g) = t_{0.004167}(9) = 3.808$. A $(1 - \alpha)100\%$ simultaneous confidence interval for the difference $\tau_{\ell j} - \tau_{kj}$ is:
\[ (\bar{x}_{\ell j} - \bar{x}_{kj}) \pm t_{[\alpha/pg(g-1)]}(n - g)\sqrt{\left(\frac{1}{n_\ell} + \frac{1}{n_k}\right)\frac{w_{jj}}{n - g}} \]
For component 1:
\begin{align*}
\tau_{11} - \tau_{21}&: (\bar{x}_{11} - \bar{x}_{21}) \pm 3.808\sqrt{\left(\frac{1}{5} + \frac{1}{3}\right)\frac{18}{12 - 3}} = 4 \pm 3.933 = (0.067, 7.933) \\
\tau_{11} - \tau_{31}&: (\bar{x}_{11} - \bar{x}_{31}) \pm 3.808\sqrt{\left(\frac{1}{5} + \frac{1}{4}\right)\frac{18}{12 - 3}} = 3 \pm 3.613 = (-0.613, 6.613) \\
\tau_{21} - \tau_{31}&: (\bar{x}_{21} - \bar{x}_{31}) \pm 3.808\sqrt{\left(\frac{1}{3} + \frac{1}{4}\right)\frac{18}{12 - 3}} = -1 \pm 4.113 = (-5.113, 3.113)
\end{align*}
For component 2:
\begin{align*}
\tau_{12} - \tau_{22}&: (\bar{x}_{12} - \bar{x}_{22}) \pm 3.808\sqrt{\left(\frac{1}{5} + \frac{1}{3}\right)\frac{38}{12 - 3}} = 4 \pm 5.714 = (-1.714, 9.714) \\
\tau_{12} - \tau_{32}&: (\bar{x}_{12} - \bar{x}_{32}) \pm 3.808\sqrt{\left(\frac{1}{5} + \frac{1}{4}\right)\frac{38}{12 - 3}} = 6 \pm 5.249 = (0.751, 11.249) \\
\tau_{22} - \tau_{32}&: (\bar{x}_{22} - \bar{x}_{32}) \pm 3.808\sqrt{\left(\frac{1}{3} + \frac{1}{4}\right)\frac{38}{12 - 3}} = 2 \pm 5.976 = (-3.976, 7.976)
\end{align*}
There is a significant difference between treatment 1 and treatment 2 in component 1.
Also, there is a significant difference between treatment 1 and treatment 3 in component
2.