
Lecture Note

Multivariate Methods

Kasim M.
Department of Statistics
University of Gondar
Gondar, Ethiopia
Contents

1 Review of Matrix Algebra 1


1.1 Definition of Matrix and Vector . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Matrix Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Eigen Values and Eigen Vectors . . . . . . . . . . . . . . . . . . . . 4
1.2 Spectral Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 8

2 Aspects of Multivariate Analysis 11


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Objectives of Multivariate Analysis . . . . . . . . . . . . . . . . . . 11
2.1.2 Organization of Multivariate Data . . . . . . . . . . . . . . . . . . . 12
2.2 Random Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Distance of Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Linear Combinations of Random Vectors . . . . . . . . . . . . . . . . . . . 18
2.5 Expected Value of the Sample Mean and Covariance Matrix . . . . . . . . 20

3 The Multivariate Normal Distribution 21


3.1 The Multivariate Normal Density and Its Properties . . . . . . . . . . . . . 21
3.1.1 Principal Axis of the Multivariate Normal Density . . . . . . . . . . 23
3.1.2 Further Properties of the Multivariate Normal Density . . . . . . . 27
3.2 Sampling from the Multivariate Normal Distribution . . . . . . . . . . . . 27
3.2.1 The Multivariate Normal Likelihood . . . . . . . . . . . . . . . . . 27
3.2.2 The Sampling Distribution of X̄ and S . . . . . . . . . . . . . . . . 28
3.2.3 Large Sample Behaviour of X̄ and S . . . . . . . . . . . . . . . . . 29

4 Inference about a Mean Vector 30


4.1 The Plausibility of µ0 as a Value for a Normal Population Mean µ. . . . . 30
4.2 Confidence Region for the Mean Vector µ . . . . . . . . . . . . . . . . . . 32
4.3 Simultaneous Confidence Statements . . . . . . . . . . . . . . . . . . . . . 34
4.4 The Bonferroni Method of Multiple Comparisons . . . . . . . . . . . . . . 36
4.5 Likelihood-Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.6 Large Sample Inference about µ . . . . . . . . . . . . . . . . . . . . . . . . 38


5 Comparison of Several Multivariate Means 39


5.1 Dependent Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1.1 Paired Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.1.2 A Repeated Measures Design for Comparing Treatments . . . . . . 43
5.2 Independent Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Comparing Mean Vectors from Two Populations . . . . . . . . . . . 46
5.2.2 Comparison of Several Multivariate Population Means . . . . . . . 49

Chapter 1

Review of Matrix Algebra

The study of multivariate methods is greatly facilitated by the use of matrix algebra. This
chapter presents a review of basic concepts of matrix algebra which are essential to both
geometrical interpretations and algebraic explanations of subsequent multivariate statisti-
cal techniques.

1.1 Definition of Matrix and Vector


A rectangular array of numbers with, for instance, n rows and p columns is called a matrix
of dimension n × p. It is written as:
 
X = [ x11  x12  · · ·  x1p
      x21  x22  · · ·  x2p
       ⋮     ⋮    ⋱     ⋮
      xn1  xn2  · · ·  xnp ]

A vector is an n × 1 matrix of real numbers x1, x2, . . . , xn and it is written as:

x = ( x1, x2, · · · , xn )'   (a column), or equivalently x' = (x1, x2, · · · , xn) (a row).

A vector has both magnitude (length) and direction. The length of a vector x' = (x1, x2, . . . , xn) is defined by

Lx = √(x1² + x2² + · · · + xn²) = √(x'x).


The length of a vector can be expanded or contracted by multiplying it by a constant a:

ax = (ax1, ax2, · · · , axn)'.

Such multiplication of a vector x by a scalar a changes the length as

Lax = √(a²x1² + a²x2² + · · · + a²xn²) = |a| √(x'x).

When |a| > 1, vector x is expanded. When |a| < 1, vector x is contracted. When |a| = 1, there is no change. If a < 0, the direction of vector x is reversed.

Choosing a = Lx⁻¹, we obtain the unit vector ax = x/Lx, which has length 1 and lies in the direction of x.

Example 1.1. If n = 2, consider the vector x = (x1, x2)'. The length of x is Lx = √(x1² + x2²). Geometrically, the length of a vector in two dimensions can be viewed as the hypotenuse of a right triangle.

1.1.1 Matrix Characteristics


• Rank: The rank of a matrix A is the maximum number of linearly independent rows
(columns).

– A set of k vectors x1, x2, · · · , xk is said to be linearly independent if

  a1 x1 + a2 x2 + · · · + ak xk = Σ_{i=1}^{k} ai xi = 0

  only when a1 = a2 = · · · = ak = 0. That is, if the only solution is every ai equal to zero, then x1, x2, · · · , xk are linearly independent. Linear independence implies that no vector can be written as a linear combination of the other vectors. Vectors of the same dimension that are not linearly independent are said to be linearly dependent, which means at least one vector can be written as a linear combination of the other vectors.

  Example 1.2. x1 = (3, 4)', x2 = (2, 1)'.
  a1 x1 + a2 x2 = 0 ⇒
      3a1 + 2a2 = 0
      4a1 + a2 = 0
  holds only if a1 = a2 = 0. This confirms that x1 and x2 are linearly independent. In other words, the columns of the matrix A = [ 3 2; 4 1 ] are linearly independent.


    
Example 1.3. x1 = (1, 2, 0)', x2 = (1, 5, 1)', x3 = (1, −1, −1)'.
a1 x1 + a2 x2 + a3 x3 = 0 ⇒

    a1 + a2 + a3 = 0
    2a1 + 5a2 − a3 = 0
    a2 − a3 = 0

The third equation gives a2 = a3, and substituting into the first gives a1 = −2a2; the second equation is then satisfied automatically. Hence non-trivial solutions exist, for example a1 = −2, a2 = a3 = 1. Therefore, x1, x2 and x3 are not linearly independent (they are linearly dependent). A numerical check of this conclusion is sketched below.
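The conclusion of Example 1.3 can be checked numerically. The following is a minimal sketch (not part of the original note) assuming NumPy is available: the vectors are independent exactly when the matrix having them as columns has full column rank.

```python
import numpy as np

# Columns are the vectors x1, x2, x3 from Example 1.3
X = np.array([[1.0, 1.0,  1.0],
              [2.0, 5.0, -1.0],
              [0.0, 1.0, -1.0]])

print(np.linalg.matrix_rank(X))   # 2 < 3, so the columns are linearly dependent

# A non-trivial solution of X a = 0 confirms the dependence: a = (-2, 1, 1)'
a = np.array([-2.0, 1.0, 1.0])
print(X @ a)                      # [0. 0. 0.]
```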
– The row and column rank of a matrix are equal.
∗ Rank (A) ≥ 0
∗ Rank (A) ≤ min(n, p)
∗ Rank (A) = Rank (A0 )
∗ Rank (A) = Rank (A0 A) = Rank (AA0 )
– Trace: The trace of a k × k matrix A is the sum of its diagonal elements: tr(A) = Σ_{i=1}^{k} aii.

∗ tr(A ± B) = tr(A) ± tr(B)


∗ tr(cA) = c tr(A)
∗ tr(An×p Bp×n ) = tr(BA)
∗ tr(An×p Bp×q Cq×n ) = tr(CAB) = tr(BCA)

• Determinant: Det (A) = |A|

– |aA| = aⁿ |A| for an n × n matrix A
– |AB| = |BA| = |A||B| for square matrices A and B of the same order

• Inverse: A square matrix A is said to be non-singular if its rank is equal to the


number of rows (columns).

– If a k × k matrix A is non-singular, then there exists a unique k × k matrix B such that AB = BA = Ik×k.
  ∗ The matrix B is called the inverse of A, denoted by A⁻¹.
  ∗ A⁻¹ exists if and only if the determinant of A is non-zero, and hence |A⁻¹| = |A|⁻¹.

• Positive Definite Matrix: A symmetric matrix A is said to be positive definite if the quadratic form Q(x) = x'Ax > 0 for all x ≠ 0, where x' = (x1, x2, · · · , xn).


   
Example 1.4. A = [ 1  2
                    2  4 ],  x = (2, −1)'.

Q(x) = x'Ax = (2, −1) [ 1 2; 2 4 ] (2, −1)' = 0

⇒ A is not positive definite.

– A symmetric matrix A is said to be positive semi-definite if x'Ax ≥ 0 for all x ≠ 0.

1.1.2 Eigen Values and Eigen Vectors


Let A be a k × k matrix and I be a k × k identity matrix. The scalars λ1 , λ2 , · · · , λk
satisfying the polynomial equation:

|A − λI| = 0

are called the eigen values (characteristic roots) of matrix A. The eigen values need not be distinct; two or more of them may be equal.

• The equation |A − λI| = 0 as a function of λ is called characteristic equation.

• The eigen values of a symmetric matrix with real elements are real. λi ’s can be
complex numbers if the matrix is not symmetric.

• The eigen values of a positive definite matrix are all positive. If a k × k symmetric
matrix is positive semi-definite of rank r (r < k), then it has r positive and (k − r)
zero eigen values.

• The eigen values of a diagonal matrix are the diagonal elements themselves.

• The eigen values of an idempotent matrix A, that is, A = A2 are 1 and 0.

Associated with every eigen value λi of a square matrix A, there is an eigen vector xi
whose elements satisfy the homogenous system of equations:

(A − λi I)xi = 0 ⇔ Axi = λi xi

• If |A − λi I| = 0, there exists at least one non-trivial solution (xi ≠ 0).

• The elements of the vector xi are determined only up to a scale factor: because the system is homogeneous, the number of unknowns exceeds the number of independent equations, so we obtain only relationships such as x1i = 5x2i.

  – Since the eigen vectors are not unique, normalizing them (giving them unit length) makes them unique up to sign.


– The normalized eigen vector, ei, of xi is:

  ei = (1/Lxi) xi = xi / √(xi'xi)

  ∗ ||ei||² = ei'ei = 1, for all i.
  ∗ ei'ej = xi'xj / (√(xi'xi) √(xj'xj)) = 0 for all i ≠ j, when A is symmetric.

– For a symmetric matrix, the normalized eigen vectors are chosen to satisfy e1'e1 = e2'e2 = · · · = ek'ek = 1 and to be mutually perpendicular, ei'ej = 0, i ≠ j.
 
Example 1.5. Find the eigen values and eigen vectors of A = [ 1  2
                                                               3  2 ].

|A − λI| = 0 ⇒ | 1−λ   2 ; 3   2−λ | = 0

(1 − λ)(2 − λ) − 6 = 0 ⇒ λ² − 3λ − 4 = 0

Thus, the eigen values of A are λ1 = 4 and λ2 = −1.

To find the corresponding eigen vectors: Axi = λi xi, i = 1, 2.

• For λ1 = 4,

  Ax1 = λ1 x1 ⇒ [ 1 2; 3 2 ] (x11, x21)' = 4 (x11, x21)'
  ⇒ x11 + 2x21 = 4x11
  ⇒ x21 = (3/2) x11

  Let x11 = 2 ⇒ x21 = 3. Thus, x1 = (2, 3)' (not unique).

  The normalized eigen vector of x1 = (2, 3)' is

  e1 = x1 / √(x1'x1) = (1/√(4 + 9)) (2, 3)' = (2/√13, 3/√13)'.

  Note that e1'e1 = 1.


• For λ2 = −1,

  Ax2 = λ2 x2 ⇒ [ 1 2; 3 2 ] (x12, x22)' = −1 (x12, x22)'
  ⇒ x12 + 2x22 = −x12
  ⇒ x22 = −x12

  Let x12 = 1 ⇒ x22 = −1. Thus, x2 = (1, −1)' (not unique).

  The normalized eigen vector of x2 = (1, −1)' is

  e2 = x2 / √(x2'x2) = (1/√(1 + 1)) (1, −1)' = (1/√2, −1/√2)'.

  Note that e2'e2 = 1. Since A is not symmetric here, e1 and e2 need not be perpendicular (indeed e1'e2 = (2 − 3)/√26 ≠ 0); for a symmetric matrix, eigen vectors belonging to distinct eigen values are orthogonal.
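A minimal sketch of Example 1.5 using NumPy (the library call, not the example itself, is an addition to the note) is shown below; np.linalg.eig returns unit-length eigen vectors, possibly in a different order or with signs flipped.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` have unit length
print(eigenvalues)          # [ 4. -1.]  (order may differ)
print(eigenvectors[:, 0])   # proportional to (2, 3)/sqrt(13)
print(eigenvectors[:, 1])   # proportional to (1, -1)/sqrt(2)
# A is not symmetric, so these eigenvectors are not orthogonal to each other.
```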

Example 1.6. Find the eigen values and corresponding eigen vectors of the following two matrices:

A = [ 1  −5
     −5   1 ]  ⇒ λ1 = 6, λ2 = −4,    and    B = [ 13  −4   2
                                                   −4  13  −2
                                                    2  −2  10 ]  ⇒ λ1 = 18, λ2 = 9, λ3 = 9.

1.2 Spectral Decomposition

Any symmetric square matrix can be reconstructed from its eigen values and eigen vectors. Let A be a k × k symmetric matrix having eigen values λ1, λ2, · · · , λk with normalized eigen vectors e1, e2, · · · , ek. Then, the spectral decomposition of A is given by:

A = λ1 e1 e1' + λ2 e2 e2' + · · · + λk ek ek' = Σ_{j=1}^{k} λj ej ej'.

Example 1.7. A = [ 1   2
                    2  −2 ]

• The eigen values: |A − λI| = 0 ⇒ | 1−λ   2; 2   −2−λ | = 0
  ⇒ λ² + λ − 6 = 0. Thus, the eigen values of A are λ1 = 2 and λ2 = −3.

• The eigen vectors are:


– For λ1 = 2,

  Ax1 = λ1 x1 ⇔ [ 1 2; 2 −2 ] (x11, x21)' = 2 (x11, x21)'
  ⇒ x21 = (1/2) x11 ⇒ x1 = (2, 1)'

  The normalized eigen vector corresponding to λ1 = 2 is e1 = (2/√5, 1/√5)'.

– For λ2 = −3,

  Ax2 = λ2 x2 ⇒ [ 1 2; 2 −2 ] (x12, x22)' = −3 (x12, x22)'
  ⇒ x22 = −2x12 ⇒ x2 = (1, −2)'

  The normalized eigen vector corresponding to λ2 = −3 is e2 = (1/√5, −2/√5)'.

Note that e1'e1 = e2'e2 = 1 and e1'e2 = e2'e1 = 0.

We need to show A = λ1 e1 e1' + λ2 e2 e2':

λ1 e1 e1' + λ2 e2 e2' = 2 · (1/5) [ 4 2; 2 1 ] − 3 · (1/5) [ 1 −2; −2 4 ]
                      = (1/5) [ 8−3   4+6; 4+6   2−12 ] = [ 1  2; 2  −2 ] = A.

Thus the matrix is written as a function of its eigen values and normalized eigen vectors.

In matrix form, the spectral decomposition of A is:

A = O Λ O'

where O = (e1, e2, · · · , ek) and Λ = diag(λ1, λ2, · · · , λk). Note here that O'O = OO' = Ik×k (O is orthogonal, O⁻¹ = O').

In the above example,

O = (e1, e2) = [ 2/√5   1/√5
                 1/√5  −2/√5 ],    Λ = [ 2   0
                                          0  −3 ]

⇒ A = O Λ O'.
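The decomposition of Example 1.7 can be verified numerically. The following is a minimal sketch (an addition to the note) assuming NumPy is available; np.linalg.eigh is the routine for symmetric matrices and returns an orthogonal matrix of eigen vectors.

```python
import numpy as np

A = np.array([[1.0,  2.0],
              [2.0, -2.0]])

lam, O = np.linalg.eigh(A)      # eigenvalues (ascending) and orthogonal eigenvector matrix
Lam = np.diag(lam)

# Reassemble A from its eigenvalues and normalized eigenvectors
print(np.allclose(O @ Lam @ O.T, A))      # True:  A = O Lambda O'
print(np.allclose(O.T @ O, np.eye(2)))    # True:  O'O = I
```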


Again, using the spectral decomposition, for a positive definite matrix A

A⁻¹ = O Λ⁻¹ O'

where O = (e1, e2, · · · , ek) and Λ⁻¹ = diag(1/λ1, 1/λ2, · · · , 1/λk), so that

A⁻¹ = Σ_{j=1}^{k} (1/λj) ej ej'.

Also,

A^{1/2} = O Λ^{1/2} O'

where Λ^{1/2} = diag(√λ1, √λ2, · · · , √λk), so that

A^{1/2} = Σ_{j=1}^{k} √λj ej ej'.

Example 1.8. Find A⁻¹ and A^{1/2} for

A = [ 1  2        and    A = [ 13  −4   2
      2  1 ]                    −4  13  −2
                                 2  −2  10 ].
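A minimal numerical sketch for Example 1.8 (an addition to the note, assuming NumPy) is given below for the second matrix, which is positive definite (eigen values 18, 9, 9), so its inverse and square root both follow from the spectral decomposition.

```python
import numpy as np

B = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

lam, O = np.linalg.eigh(B)                     # eigenvalues and orthogonal eigenvectors

B_inv  = O @ np.diag(1.0 / lam) @ O.T          # B^{-1}  = O Lambda^{-1}  O'
B_half = O @ np.diag(np.sqrt(lam)) @ O.T       # B^{1/2} = O Lambda^{1/2} O'

print(np.allclose(B_inv, np.linalg.inv(B)))    # True
print(np.allclose(B_half @ B_half, B))         # True
```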

1.3 Singular Value Decomposition

Let A be an m × k matrix. Then there exist an m × m orthogonal matrix U (i.e., UU' = I) and a k × k orthogonal matrix V (i.e., VV' = I) such that A = U Λ V', where Λ is an m × k matrix whose (i, i) entry is √λi ≥ 0 for i = 1, 2, · · · , min(m, k) and whose other entries are zero.

• U = (e1, e2, · · · , e_min(m,k)) where ei (i = 1, 2, · · · , min(m, k)) is the normalized eigen vector corresponding to λi of the matrix AA'.

• V = (e1*, e2*, · · · , e*_min(m,k)) where ei* (i = 1, 2, · · · , min(m, k)) is the normalized eigen vector corresponding to λi of the matrix A'A.

• Λ = diag(√λ1, √λ2, · · · , √λ_min(m,k)).

Note that the singular values of A are √λi, where λi is an eigen value of A'A or of AA' (the two matrices share the same non-zero eigen values).


 
Example 1.9. A = [ 3   1   1
                   −1   3   1 ]

AA' = [ 3 1 1; −1 3 1 ] [ 3 −1; 1 3; 1 1 ] = [ 11   1
                                                 1  11 ]
and
A'A = [ 3 −1; 1 3; 1 1 ] [ 3 1 1; −1 3 1 ] = [ 10   0   2
                                                 0  10   4
                                                 2   4   2 ].

• Eigen values and eigen vectors of AA'.

  |AA' − λI| = 0 ⇒ | 11−λ   1; 1   11−λ | = 0 ⇒ (11 − λ)² − 1 = 0
  ⇒ λ² − 22λ + 120 = 0 ⇒ λ = 12 or λ = 10.

  The non-zero eigen values of AA' (and of A'A) are λ1 = 12 and λ2 = 10, so the singular values of A are √12 and √10.

  – Eigen vector corresponding to λ1 = 12:
    AA'x1 = λ1 x1 ⇒ [ 11 1; 1 11 ] (x11, x21)' = 12 (x11, x21)' ⇒ x21 = x11
    Let x11 = 1 ⇒ x21 = 1 ⇒ x1 = (1, 1)' ⇒ e1 = (1/√2, 1/√2)'

  – Eigen vector corresponding to λ2 = 10:
    AA'x2 = λ2 x2 ⇒ [ 11 1; 1 11 ] (x12, x22)' = 10 (x12, x22)' ⇒ x22 = −x12
    Let x12 = 1 ⇒ x22 = −1 ⇒ x2 = (1, −1)' ⇒ e2 = (1/√2, −1/√2)'

  U = (e1, e2) = [ 1/√2   1/√2
                   1/√2  −1/√2 ]    and    Λ = diag(√λ1, √λ2) = [ √12    0
                                                                     0  √10 ].

• Eigen values and eigen vectors of A'A.

  |A'A − λI| = 0 ⇒ | 10−λ  0  2; 0  10−λ  4; 2  4  2−λ | = 0

  Expanding the determinant gives (10 − λ)(λ² − 12λ) = 0 ⇒ λ = 12, λ = 10 or λ = 0.

  – Eigen vector corresponding to λ1 = 12:
    A'Ax1 = λ1 x1 ⇒
      10x11 + 0x21 + 2x31 = 12x11
      0x11 + 10x21 + 4x31 = 12x21
      2x11 + 4x21 + 2x31 = 12x31
    ⇒ x21 = 2x11 and x31 = x11
    Let x11 = 1 ⇒ x21 = 2 and x31 = 1 ⇒ x1 = (1, 2, 1)' ⇒ e1* = (1/√6, 2/√6, 1/√6)'

  – Eigen vector corresponding to λ2 = 10:
    A'Ax2 = λ2 x2 ⇒
      10x12 + 0x22 + 2x32 = 10x12
      0x12 + 10x22 + 4x32 = 10x22
      2x12 + 4x22 + 2x32 = 10x32
    ⇒ x32 = 0 and x22 = −(1/2) x12
    Let x12 = 2 ⇒ x22 = −1 ⇒ x2 = (2, −1, 0)' ⇒ e2* = (2/√5, −1/√5, 0)'

  V = (e1*, e2*) = [ 1/√6   2/√5
                     2/√6  −1/√5
                     1/√6    0   ].

Therefore,

A = U Λ V' = [ 1/√2  1/√2; 1/√2  −1/√2 ] [ √12  0; 0  √10 ] [ 1/√6  2/√6  1/√6; 2/√5  −1/√5  0 ] = [ 3  1  1; −1  3  1 ].
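Example 1.9 can be reproduced with a library SVD routine. The following minimal sketch (an addition to the note, assuming NumPy) also confirms that the squared singular values are the non-zero eigen values of AA'.

```python
import numpy as np

A = np.array([[ 3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U diag(s) V'
print(s**2)                                        # [12. 10.]  eigenvalues of AA'
print(np.allclose(U @ np.diag(s) @ Vt, A))         # True
```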

Chapter 2

Aspects of Multivariate Analysis

2.1 Introduction
Multivariate statistical analysis is concerned with data collected on several variables (dimensions) for the same individual (subject or experimental unit). Using multivariate analysis, the variables can be examined simultaneously in order to assess the key features of the process.
It enables us to

• explore the joint performance of the variables and

• determine the effect of each variable in the presence of the others.

As in the univariate case, it is assumed that a random sample of multi-component observations has been collected from different individuals. The data consist of simultaneous measurements on many response variables. The common source of each individual observation will generally lead to dependence or correlation among the dimensions (components), and this is the feature that distinguishes multivariate data and techniques from their univariate counterparts.

2.1.1 Objectives of Multivariate Analysis


The objectives of scientific investigations to which multivariate methods most naturally
lend themselves include the following:

1. Data reduction or structural simplification. The phenomenon being studied is


represented as simply as possible without sacrificing valuable information. This will
make interpretation easier. Example: principal component analysis.

2. Sorting and grouping. Groups of "similar" objects or variables are created, based upon measured characteristics. Example: discriminant analysis.

3. Investigation of the dependence among variables. Studying the covariance


structure will help to determine the nature of the relationships among variables.


In multivariate study, the interest is on the off-diagonals (covariances). Are all the
variables mutually independent or are one or more variables dependent on the others?
If so, how? Example: canonical correlation analysis.

4. Prediction. The relationship between variables can be determined for the purpose
of predicting the values of one or more variables on the basis of observations on
the other variables. Example: multivariate linear regression, multivariate analysis of
variance.

5. Hypothesis testing. Specific statistical hypotheses can be tested to validate as-


sumptions or to reinforce prior convictions.

2.1.2 Organization of Multivariate Data


Most multivariate data sets can be represented in a rectangular format, in which the
elements of each row correspond to the variable values of a particular unit in the data set
and the elements of the columns correspond to the values taken by a particular variable.
Suppose there are p ≥ 2 variables (characteristics) measured from n items. Let xij denote
the value of the j th variable on the ith item (i = 1, 2, · · · , n and j = 1, 2, · · · , p, n >> p).
Consequently, the data can be displayed as follows:

          Variable 1   Variable 2   · · ·   Variable j   · · ·   Variable p
Item 1       x11          x12       · · ·      x1j       · · ·      x1p
Item 2       x21          x22       · · ·      x2j       · · ·      x2p
  ⋮            ⋮            ⋮                    ⋮                    ⋮
Item i       xi1          xi2       · · ·      xij       · · ·      xip
  ⋮            ⋮            ⋮                    ⋮                    ⋮
Item n       xn1          xn2       · · ·      xnj       · · ·      xnp

This can be written as a rectangular array (matrix) X of n rows and p columns:

X = [ x11  x12  · · ·  x1j  · · ·  x1p
      x21  x22  · · ·  x2j  · · ·  x2p
       ⋮     ⋮           ⋮           ⋮
      xi1  xi2  · · ·  xij  · · ·  xip
       ⋮     ⋮           ⋮           ⋮
      xn1  xn2  · · ·  xnj  · · ·  xnp ]   (n × p)

A single multivariate observation is the collection of measurements on p different variables


on the same item. Each row of X represents a multivariate observation.


X = [ x1'     ← 1st multivariate observation
      x2'     ← 2nd multivariate observation
       ⋮
      xi'     ← ith multivariate observation
       ⋮
      xn' ]   ← nth multivariate observation

where xi' = (xi1, xi2, · · · , xip).

Descriptive Statistics
A large data set is bulky, and its very mass poses a serious obstacle to any attempt to
visually extract pertinent information. Much of the information contained in the data can
be assessed by calculating certain summary numbers, known as descriptive statistics. For
example, the arithmetic average, or sample mean, is a descriptive statistic that provides
a measure of location, that is, a "central value" for a set of numbers. And the average of
the squares of the distances of all of the numbers from the mean provides a measure of the
spread, or variation, in the numbers.
• Sample mean: x̄j = (1/n) Σ_{i=1}^{n} xij ;  j = 1, 2, · · · , p.

• Sample variance: sj² = sjj = (1/n) Σ_{i=1}^{n} (xij − x̄j)² ;  j = 1, 2, · · · , p.

• Sample covariance between Xj and Xk: sjk = (1/n) Σ_{i=1}^{n} (xij − x̄j)(xik − x̄k) ;  j, k = 1, 2, · · · , p; j ≠ k. Note sjk = skj for all j and k.

• Sample correlation coefficient between variables j and k: rjk = sjk / (√sjj √skk) ;  j, k = 1, 2, · · · , p. Note rjk = rkj and rjk = 1 if j = k.

  Although the sign of the sample correlation and of the sample covariance are the same, the correlation is ordinarily easier to interpret because:
  – its magnitude is bounded, that is, −1 ≤ rjk ≤ 1 for all j and k;
  – it is unitless;
  – it takes the variability of the components into account.
  A major limitation of the correlation is that it only measures linear association; it does not capture non-linear associations.
The descriptive statistics for all the p variables in terms of vector and matrix operations
are:


   
• Sample mean vector: x̄ = (1/n) Σ_{i=1}^{n} xi = (x̄1, x̄2, · · · , x̄p)'   (p × 1).

• Sample variance-covariance matrix: Sn = (1/n) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)'.

  ⇒ Sn = [ s11  s12  · · ·  s1p
            s21  s22  · · ·  s2p
             ⋮     ⋮    ⋱     ⋮
            sp1  sp2  · · ·  spp ]   (p × p)

  Consequently, the sample standard deviation matrix is written as:

  V^{1/2} = diag(√s11, √s22, · · · , √spp)   (p × p).

• Sample correlation matrix: R = (V^{1/2})⁻¹ Sn (V^{1/2})⁻¹

  ⇒ R = [ 1    r12  · · ·  r1p
          r21  1    · · ·  r2p
           ⋮     ⋮    ⋱     ⋮
          rp1  rp2  · · ·  1   ]   (p × p)

  Note Sn = V^{1/2} R V^{1/2}. Note also that Sn and R are symmetric and positive definite.

Example 2.1. Find the sample mean vector, covariance and correlation matrices for the following data matrix.

X = [  4   1
      −1   3
       3   5 ]

We observe three people, and here is what we record (in the notation above):
x11 = 4, x21 = −1, x31 = 3 and x12 = 1, x22 = 3, x32 = 5. The data array would then look like:

X = [ x11  x12        [  4   1
      x21  x22    =    −1   3
      x31  x32 ]        3   5 ].
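A minimal computational sketch of Example 2.1 (an addition to the note, assuming NumPy) follows; it uses the divisor n for Sn, matching the definitions above.

```python
import numpy as np

X = np.array([[ 4.0, 1.0],
              [-1.0, 3.0],
              [ 3.0, 5.0]])
n = X.shape[0]

x_bar = X.mean(axis=0)                        # sample mean vector (2, 3)'
S_n   = (X - x_bar).T @ (X - x_bar) / n       # covariance matrix with divisor n
D     = np.diag(1.0 / np.sqrt(np.diag(S_n)))  # (V^{1/2})^{-1}
R     = D @ S_n @ D                           # sample correlation matrix

print(x_bar)   # [2. 3.]
print(S_n)     # [[ 4.667 -0.667], [-0.667  2.667]]
print(R)       # ones on the diagonal, off-diagonal about -0.189
```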


2.2 Random Vectors and Matrices


A random vector (matrix) is a vector (matrix) whose elements are random variables. Let
Xj be the j th variable, then

• Mean(Xj) = µj = E(Xj);  j = 1, 2, · · · , p

• Var(Xj) = σj² = σjj = E(Xj − µj)²;  j = 1, 2, · · · , p

• Cov(Xj, Xk) = σjk = E(Xj − µj)(Xk − µk);  j, k = 1, 2, · · · , p

• Cor(Xj, Xk) = ρjk = σjk / (√σjj √σkk);  j, k = 1, 2, · · · , p

Let X be a p × 1 random vector, i.e., X = (X1, X2, · · · , Xp)'. Then the mean vector is:

E(X) = ( E(X1), E(X2), · · · , E(Xp) )' = ( µ1, µ2, · · · , µp )' = µ.

The population variance-covariance matrix is:

Σ = E(X − µ)(X − µ)'
  = E[ (X1 − µ1, X2 − µ2, · · · , Xp − µp)' (X1 − µ1, X2 − µ2, · · · , Xp − µp) ]

  = [ E(X1 − µ1)²           E(X1 − µ1)(X2 − µ2)   · · ·  E(X1 − µ1)(Xp − µp)
      E(X2 − µ2)(X1 − µ1)   E(X2 − µ2)²           · · ·  E(X2 − µ2)(Xp − µp)
            ⋮                      ⋮                ⋱           ⋮
      E(Xp − µp)(X1 − µ1)   E(Xp − µp)(X2 − µ2)   · · ·  E(Xp − µp)²          ].

Thus,

Σ = [ σ11  σ12  · · ·  σ1p
      σ21  σ22  · · ·  σ2p
       ⋮     ⋮    ⋱     ⋮
      σp1  σp2  · · ·  σpp ]   (p × p).


Consequently, the population standard deviation matrix is written as:

V^{1/2} = diag(√σ11, √σ22, · · · , √σpp)   (p × p).

Also, the population correlation matrix is ρ = (V^{1/2})⁻¹ Σ (V^{1/2})⁻¹, that is,

ρ = [ 1    ρ12  · · ·  ρ1p
      ρ21  1    · · ·  ρ2p
       ⋮     ⋮    ⋱     ⋮
      ρp1  ρp2  · · ·  1   ]   (p × p).

Note: Σ = V^{1/2} ρ V^{1/2}.

2.3 Distance of Vectors


Most multivariate techniques are based upon the simple concept of distance. If P = (x1, x2) is a point on the plane, then the Euclidean (straight line) distance from P to the origin O = (0, 0) is given by the Pythagorean theorem. That is,

dE(O, P) = √((x1 − 0)² + (x2 − 0)²) = √(x1² + x2²).

All the points (x1, x2) that lie a constant distance, say c, from the origin satisfy the equation

c = √(x1² + x2²) ⇔ c² = x1² + x2²,

which is the equation of a circle with radius c. The Euclidean distance between two points P = (x1, x2) and Q = (y1, y2) in two-dimensional space is

dE(P, Q) = √((x1 − y1)² + (x2 − y2)²).

Similarly, the Euclidean distance between P = (x1, x2, · · · , xp) and Q = (y1, y2, · · · , yp) in p-dimensional space is

dE(P, Q) = √((x1 − y1)² + (x2 − y2)² + · · · + (xp − yp)²) = √((x − y)'(x − y)).

Straight line or Euclidean distance is unsatisfactory for most statistical purposes. This is
because each co-ordinate contributes equally to the calculation of Euclidean distance. This
suggests a statistical measure of distance.


A statistical distance takes into account the variability as well as the correlation, unlike the Euclidean distance. Suppose X' = (X1, X2, · · · , Xp) follows a p-dimensional distribution with E(X) = µ and variance-covariance matrix Cov(X) = Σ. Suppose x̄ = (x̄1, x̄2, · · · , x̄p)' is a vector of means based on an n × p observed data matrix. The statistical distance between x̄ and µ is given by

dS(x̄, µ) = √((x̄ − µ)' Σ⁻¹ (x̄ − µ)).

• If Σ = I, the Euclidean and statistical distances are equal.

• If σij = 0 for i ≠ j, the statistical distance reduces to:

  dS(x̄, µ) = √( (x̄1 − µ1)²/σ11 + (x̄2 − µ2)²/σ22 + · · · + (x̄p − µp)²/σpp ).

If one component has much larger variance than another, it will contribute less to the
squared distance. Two highly correlated variables will contribute less than two variables
that are nearly uncorrelated. Essentially, the use of the inverse of the covariance matrix
eliminates the effect of correlation and standardizes all of the variables.
     
Example 2.2. Let x = (x1, x2)', µ = (µ1, µ2)' and Σ = [ 4  0
                                                         0  1 ].
The variability in the x1 direction is greater than that in the x2 direction since σ11 = 4 > σ22 = 1.

Euclidean distance: dE = √((x1 − µ1)² + (x2 − µ2)²).

Statistical distance: dS = √((x − µ)' Σ⁻¹ (x − µ))

⇒ dS = √( (x1 − µ1, x2 − µ2) [ 1/4  0; 0  1 ] (x1 − µ1, x2 − µ2)' )

⇒ dS = √( (x1 − µ1)²/4 + (x2 − µ2)²/1 )   ⇒ equation of an ellipse.

All points that lie a constant distance, say c = 2, from the theoretical mean (µ1, µ2) satisfy the equation:

(x1 − µ1)²/4 + (x2 − µ2)²/1 = c² = 4.

At x1 = µ1, (x2 − µ2)² = 4 ⇒ x2 − µ2 = ±2 ⇒ x2 = µ2 ± 2.


At x2 = µ2 , (x1 − µ1 )2 = 16 ⇒ x1 − µ1 = ±4 ⇒ x1 = µ1 ± 4.

(Plot of the ellipse.)

The ellipse stretches in the x1 direction as compared to the x2 direction because of the larger variance in x1 (the major axis of the ellipse is parallel to the x1-axis). Had the variances been equal in both directions, the equation would simply describe a circle.
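A minimal numerical sketch (an addition to the note, assuming NumPy) comparing the Euclidean and statistical distances for the covariance matrix of Example 2.2 is shown below; the point x = (2, 1)' is an illustrative choice, not taken from the note.

```python
import numpy as np

Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])
mu = np.array([0.0, 0.0])
x  = np.array([2.0, 1.0])

d_E = np.sqrt((x - mu) @ (x - mu))                         # Euclidean distance
d_S = np.sqrt((x - mu) @ np.linalg.inv(Sigma) @ (x - mu))  # statistical distance

print(d_E)   # about 2.236
print(d_S)   # about 1.414: the x1 coordinate is down-weighted by its larger variance
```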

2.4 Linear Combinations of Random Vectors


1. Univariate case:

• E(a1 X1) = a1 E(X1) = a1 µ1,  a1 ∈ R.
• Var(a1 X1) = a1² Var(X1) = a1² σ11,  a1 ∈ R.

2. Bivariate case:

   • Cov(a1 X1, a2 X2) = a1 a2 Cov(X1, X2) = a1 a2 σ12,  a1, a2 ∈ R.

   • Given X = (X1, X2)',

     a1 X1 + a2 X2 = (a1, a2) (X1, X2)' = a'X

     – E(a'X) = E(a1 X1 + a2 X2) = a1 E(X1) + a2 E(X2) = a1 µ1 + a2 µ2
       ⇒ E(a'X) = (a1, a2) (µ1, µ2)' = a'µ

     – Var(a'X) = Var(a1 X1 + a2 X2) = Var(a1 X1) + Var(a2 X2) + 2 Cov(a1 X1, a2 X2)

       Var(a'X) = a1² σ11 + a2² σ22 + 2 a1 a2 σ12
                = (a1, a2) [ σ11  σ12
                             σ12  σ22 ] (a1, a2)'
                = a'Σa

3. Multivariate case: If X is a p-dimensional random vector and a ∈ Rᵖ, then the linear combination a'X is a one-dimensional random variable. That is, for X = (X1, X2, · · · , Xp)',

   a1 X1 + a2 X2 + · · · + ap Xp = (a1, a2, · · · , ap) X = a'X.

   • E(a'X) = a'E(X) = a'µ
   • Var(a'X) = a'Σa, where Σ = (σjk) is the p × p covariance matrix of X.
4. Consider q linear combinations of the p random variables:

   Z1 = a11 X1 + a12 X2 + · · · + a1p Xp = Σ_{j=1}^{p} a1j Xj = a1'X
   Z2 = a21 X1 + a22 X2 + · · · + a2p Xp = Σ_{j=1}^{p} a2j Xj = a2'X
    ⋮
   Zq = aq1 X1 + aq2 X2 + · · · + aqp Xp = Σ_{j=1}^{p} aqj Xj = aq'X

   In matrix form:

   (Z1, Z2, · · · , Zq)' = [ a11  a12  · · ·  a1p
                             a21  a22  · · ·  a2p
                              ⋮     ⋮    ⋱     ⋮
                             aq1  aq2  · · ·  aqp ] (X1, X2, · · · , Xp)'   ⇔   Z = AX

   • E(Z) = E(AX) = A E(X) = Aµ
   • Cov(Z) = Cov(AX) = AΣA'
Example 2.3. Find the mean vector and covariance matrix for the linear combinations Z1 = X1 − X2 and Z2 = X1 + X2.

Z = [ 1  −1
      1   1 ] (X1, X2)' = AX

• E(Z) = A E(X) = Aµ = [ 1 −1; 1 1 ] (µ1, µ2)' = (µ1 − µ2, µ1 + µ2)'

• Cov(Z) = A Cov(X) A'

  ⇒ Cov(Z) = [ 1 −1; 1 1 ] [ σ11  σ12; σ12  σ22 ] [ 1  1; −1  1 ]
           = [ σ11 − 2σ12 + σ22    σ11 − σ22
               σ11 − σ22           σ11 + 2σ12 + σ22 ]
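A minimal numerical sketch of Example 2.3 (an addition to the note, assuming NumPy) is shown below; the values µ = (1, 2)' and Σ = [2 1; 1 3] are illustrative and not from the note.

```python
import numpy as np

A     = np.array([[1.0, -1.0],
                  [1.0,  1.0]])
mu    = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

mean_Z = A @ mu            # (mu1 - mu2, mu1 + mu2)'
cov_Z  = A @ Sigma @ A.T   # [[s11 - 2 s12 + s22, s11 - s22], [s11 - s22, s11 + 2 s12 + s22]]

print(mean_Z)   # [-1.  3.]
print(cov_Z)    # [[ 3. -1.], [-1.  7.]]
```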


2.5 Expected Value of the Sample Mean and Covariance Matrix

Let X be a random matrix given by:

X = [ X11  X12  · · ·  X1p        [ X1'
      X21  X22  · · ·  X2p    =     X2'
       ⋮     ⋮    ⋱     ⋮            ⋮
      Xn1  Xn2  · · ·  Xnp ]        Xn' ].

If X1, X2, · · · , Xn is a random sample from some joint distribution with mean vector µ and covariance matrix Σ, then

• E(X̄) = (1/n) Σ_{i=1}^{n} E(Xi) = (1/n) Σ_{i=1}^{n} µ = µ. Thus, X̄ is an unbiased estimator of the mean vector µ.

• Cov(X̄) = E(X̄ − µ)(X̄ − µ)'. The (j, k)th element of this matrix is E(X̄j − µj)(X̄k − µk) = (1/n) σjk, so that

  Cov(X̄) = (1/n) Σ.
Recall the sample variance-covariance matrix Sn = (1/n) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)'. It can be shown that E(Sn) = ((n − 1)/n) Σ. Thus, Sn is a biased estimator of Σ. This implies that

S = (n/(n − 1)) Sn = (1/(n − 1)) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)'

is an unbiased estimator of Σ, i.e.,

E(S) = (n/(n − 1)) E(Sn) = (n/(n − 1)) · ((n − 1)/n) Σ = Σ.
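The bias of Sn and the unbiasedness of S can be illustrated by simulation. The following is a minimal Monte Carlo sketch (an addition to the note, assuming NumPy); the values of µ, Σ, n and the number of replications are arbitrary illustrative choices.

```python
import numpy as np

rng   = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu    = np.zeros(2)
n, reps = 5, 20000

S_sum, Sn_sum = np.zeros((2, 2)), np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    d = X - X.mean(axis=0)
    S_sum  += d.T @ d / (n - 1)   # unbiased estimator S
    Sn_sum += d.T @ d / n         # biased estimator S_n

print(S_sum / reps)    # close to Sigma
print(Sn_sum / reps)   # close to (n-1)/n * Sigma = 0.8 * Sigma
```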
n−1 n−1 n

Chapter 3

The Multivariate Normal Distribution

Most of the techniques on multivariate statistical analysis are based on the assumption
that the data were generated from a multivariate normal distribution, because

• the multivariate normal distribution is mathematically tractable and ”nice” results


can be obtained. Mathematical complexity of other data generating distributions
may prevent the development of sampling distribution of the usual test statistics and
estimators.

• the sampling distributions of many multivariate statistics are approximately normal, regardless of the form of the parent population, because of the central limit effect; i.e., as the number of source random vectors increases without bound, their (suitably standardized) sum tends toward normality.

3.1 The Multivariate Normal Density and Its Properties
Just as the normal distribution dominates univariate techniques, the multivariate normal
distribution plays an important role in most multivariate procedures.

Univariate case:

Let X be a random variable with E(X) = µ and Var(X) = σ². Then if X ∼ N(µ, σ²), its pdf is given by

f(x) = (1/√(2πσ²)) exp[ −(1/2) (x − µ)²/σ² ].

Note the term (x − µ)²/σ² = (x − µ)(σ²)⁻¹(x − µ) measures the squared statistical distance from x to µ in standard deviation units.


Multivariate case:

Suppose X' = (X1, X2, · · · , Xp) is a p × 1 vector with E(X) = µ and Cov(X) = Σ. The joint density of p independent normal variates is:

f(x) = f(x1, x2, · · · , xp) = f(x1) · f(x2) · · · f(xp)
     = Π_{j=1}^{p} (1/√(2πσj²)) exp[ −(1/2) (xj − µj)²/σj² ]
     = (2π)^{−p/2} (σ1 σ2 · · · σp)⁻¹ exp[ −(1/2) Σ_{j=1}^{p} (xj − µj)²/σj² ].

In the independence case Σ = diag(σ11, σ22, · · · , σpp), so |Σ| = σ11 σ22 · · · σpp and therefore |Σ|^{−1/2} = (σ1 σ2 · · · σp)⁻¹. Also,

Σ_{j=1}^{p} (xj − µj)²/σj² = Σ_{j=1}^{p} (xj − µj)(σj²)⁻¹(xj − µj) = (x − µ)'Σ⁻¹(x − µ).

Therefore, the joint density is:

f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp[ −(1/2) (x − µ)'Σ⁻¹(x − µ) ].

The general p-dimensional normal density function, written as X ∼ Np(µ, Σ), is obtained by letting Σ be any p × p symmetric positive definite matrix,

Σ = [ σ11  σ12  · · ·  σ1p
      σ21  σ22  · · ·  σ2p
       ⋮     ⋮    ⋱     ⋮
      σp1  σp2  · · ·  σpp ].

Here, the jth element of µ is still E(Xj) = µj and the jth diagonal element of Σ is still σjj = E(Xj − µj)², but the (j, k)th element is now σjk = E(Xj − µj)(Xk − µk), j ≠ k.

Σ contains p(p − 1)/2 distinct covariances. If all the covariances are zero, then the p components are independently distributed (which rarely happens in practice); in that case there is no need for a multivariate analysis, and univariate methods suffice.

For a p × 1 vector x, (x − µ)'Σ⁻¹(x − µ) = c² defines the squared statistical distance from x to µ.


Note that the symmetric matrix Σ is positive definite (all eigen values are positive) whenever the components of X are not linearly related. Let a' = (a1, a2, · · · , ap) with a ≠ 0. We need to show a'Σa > 0. Since Σ = E(X − µ)(X − µ)', we have a'Σa = E[a'(X − µ)(X − µ)'a]. Because a'(X − µ) is a scalar, this equals E[(a'(X − µ))²] ≥ 0, with strict inequality unless a'X is degenerate (has zero variance). Therefore, Σ is positive definite in the non-degenerate case.

Example 3.1. Bivariate normal distribution (p = 2).

X = (X1, X2)', µ = (µ1, µ2)' and Σ = [ σ11  σ12
                                        σ21  σ22 ].

ρ12 = σ12/(√σ11 √σ22) ⇒ σ12 = ρ12 √σ11 √σ22

⇒ Σ = [ σ11              ρ12 √σ11 √σ22
        ρ12 √σ11 √σ22    σ22           ]   ⇒ |Σ| = σ11 σ22 − ρ12² σ11 σ22 = (1 − ρ12²) σ11 σ22

⇒ Σ⁻¹ = (1/((1 − ρ12²) σ11 σ22)) [ σ22              −ρ12 √σ11 √σ22
                                    −ρ12 √σ11 √σ22    σ11            ].

The squared statistical distance is (x − µ)'Σ⁻¹(x − µ):

(x − µ)'Σ⁻¹(x − µ)
  = (1/((1 − ρ12²) σ11 σ22)) [ σ22 (x1 − µ1)² − 2 ρ12 √σ11 √σ22 (x1 − µ1)(x2 − µ2) + σ11 (x2 − µ2)² ]
  = (1/(1 − ρ12²)) [ (x1 − µ1)²/σ11 − 2 ρ12 ((x1 − µ1)/√σ11)((x2 − µ2)/√σ22) + (x2 − µ2)²/σ22 ]
  = A.

Therefore, the bivariate normal density is given by:

f(x1, x2) = (1/(2π √((1 − ρ12²) σ11 σ22))) exp(−A/2).
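A minimal sketch (an addition to the note, assuming NumPy and SciPy are available) evaluating the multivariate normal density both directly from the formula above and with a library routine; the parameter values are illustrative only.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu    = np.array([0.0, 0.0])
s11, s22, rho = 1.0, 2.0, 0.5
Sigma = np.array([[s11,                  rho*np.sqrt(s11*s22)],
                  [rho*np.sqrt(s11*s22), s22]])
x = np.array([1.0, 1.0])
p = 2

# Density from f(x) = (2*pi)^{-p/2} |Sigma|^{-1/2} exp(-(x-mu)' Sigma^{-1} (x-mu) / 2)
quad = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
f_manual = np.exp(-0.5 * quad) / ((2*np.pi)**(p/2) * np.sqrt(np.linalg.det(Sigma)))

print(f_manual)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # same value
```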

3.1.1 Principal Axis of the Multivariate Normal Density


The component (x − µ)'Σ⁻¹(x − µ) specifies the equation of an ellipsoid in the p-dimensional space when it is set equal to some positive constant c. The family of ellipsoids generated by varying c has common center µ (for p = 2 the family is a set of concentric ellipses). The first principal axis of each ellipsoid is the line through µ along its largest dimension; the second principal axis is perpendicular to the first.

If a point on the surface of an ellipsoid is represented by its coordinates x, then the first principal axis has the coordinates that maximize its squared half length, (x − µ)'(x − µ):

• Length of axis: 2d
• Half length: d = √((x − µ)'(x − µ))
• Squared half length: d² = (x − µ)'(x − µ)

So, to maximize (x − µ)'(x − µ) subject to the constraint (x − µ)'Σ⁻¹(x − µ) = c (c fixed), form the Lagrangian

f(x) = (x − µ)'(x − µ) − λ[(x − µ)'Σ⁻¹(x − µ) − c]

where λ is the Lagrange multiplier.

∂f(x)/∂x = 0 ⇒ 2(x − µ) − 2λΣ⁻¹(x − µ) = 0
⇒ (I − λΣ⁻¹)(x − µ) = 0.

Thus, the coordinates of the longest axis must satisfy this equation. Pre-multiplying by Σ,

Σ(I − λΣ⁻¹)(x − µ) = 0 ⇒ (Σ − λI)(x − µ) = 0,

which has the trivial solution x − µ = 0. In order to have a non-trivial solution we need |Σ − λI| = 0. Hence λ is an eigen value of Σ! But to which of the p eigen values of Σ does the vector x correspond? From above,

(I − λΣ⁻¹)(x − µ) = 0 ⇒ (x − µ) = λΣ⁻¹(x − µ)
⇒ d² = (x − µ)'(x − µ) = λ(x − µ)'Σ⁻¹(x − µ) = λc.

For a fixed c, the length of the principal axis is maximized by taking λ as the largest eigen value λ1 of Σ. Thus, the half length of the major (principal) axis is equal to d1 = √(λ1 c), in the direction of e1 (where e1 is the normalized eigen vector corresponding to the eigen value λ1 of Σ). Consequently, the full length of the major axis is equal to 2d1 = 2√(λ1 c).
Example 3.2. Consider the bivariate case, p = 2, with

Σ = [ σ11  σ12
      σ21  σ22 ],   where σ11 = σ22 and σ12 > 0.

Eigen values: |Σ − λI| = 0 ⇒ | σ11−λ  σ12; σ12  σ11−λ | = 0 ⇒ λ² − 2σ11 λ + σ11² − σ12² = 0.

This equation is quadratic in λ, with roots λ1 = σ11 + σ12 and λ2 = σ11 − σ12.

Eigen vectors:

• For λ1 = σ11 + σ12 ⇒ Σx1 = λ1 x1:

  [ σ11  σ12; σ12  σ11 ] (x11, x21)' = (σ11 + σ12)(x11, x21)'

  σ11 x11 + σ12 x21 = σ11 x11 + σ12 x11
  σ12 x11 + σ11 x21 = σ11 x21 + σ12 x21
  ⇒ x21 = x11.

  The major axis is parallel to the line x21 = x11. Let x11 = 1 ⇒ x21 = 1 ⇒ x1 = (1, 1)'. The normalized eigen vector corresponding to λ1 = σ11 + σ12 is e1 = (1/√2, 1/√2)', which gives the direction of the major axis. Recall that the center does not change: the first principal axis lies along the 45° line through the center point µ = (µ1, µ2)'.

• For λ2 = σ11 − σ12 ⇒ Σx2 = λ2 x2:

  [ σ11  σ12; σ12  σ11 ] (x12, x22)' = (σ11 − σ12)(x12, x22)'

  σ11 x12 + σ12 x22 = σ11 x12 − σ12 x12
  σ12 x12 + σ11 x22 = σ11 x22 − σ12 x22
  ⇒ x22 = −x12.

  The minor axis is parallel to the line x22 = −x12. Let x12 = 1 ⇒ x22 = −1 ⇒ x2 = (1, −1)'. The normalized eigen vector corresponding to λ2 = σ11 − σ12 is e2 = (1/√2, −1/√2)', which gives the direction of the minor axis.

Note that e1'e2 = e2'e1 = 0.

Half lengths of the axes:

• Half length of the major axis: √(λ1 c) = √((σ11 + σ12) c).
• Half length of the minor axis: √(λ2 c) = √((σ11 − σ12) c).


Note that the ellipse is not parallel to the coordinate axes because the off-diagonal element of Σ is not zero. (Had the off-diagonal element been zero with equal variances, the ellipse would be a circle.)

(Plot of the ellipse.)

Along the ellipse shown above (on the surface of the ellipse) the bivariate normal density
is constant. This path along the surface is called a contour.

Note:

• If σ12 < 0, then the roles of the major and minor axes are reversed, i.e.,
  – major axis: along the line x21 = −x11
  – minor axis: along the line x22 = x12
• If σ12 = 0 (equivalently ρ12 = 0) and the variances are equal, then the concentration ellipse is simply a circle and infinitely many pairs of perpendicular axes could be labelled "principal".

Remarks: Recall the spectral decomposition. Let O be a matrix whose columns are the normalized eigen vectors of Σ and let Λ be a diagonal matrix whose diagonal elements are the eigen values of Σ. Then,

• Σ = Σ_{j=1}^{p} λj ej ej' = OΛO'
• Σ⁻¹ = Σ_{j=1}^{p} (1/λj) ej ej' = OΛ⁻¹O'
• Σ^{1/2} = Σ_{j=1}^{p} √λj ej ej' = OΛ^{1/2}O'

Generally,

• (x − µ)'Σ⁻¹(x − µ) = c defines ellipsoids of different sizes depending on c.
• Each ellipsoid is centered at µ = (µ1, µ2, · · · , µp)'.
• The half lengths of the axes are √(λj c) in the direction of ej; j = 1, 2, · · · , p.


3.1.2 Further Properties of the Multivariate Normal Density

Let X ∼ Np(µ, Σ). Then:

1. Linear combinations of the components of X are also normally distributed. If X ∼ Np(µ, Σ), then a'X = a1 X1 + a2 X2 + · · · + ap Xp has a univariate normal distribution; that is, a'X ∼ N(a'µ, a'Σa). More specifically, the marginal distribution of any component Xj of X is N(µj, σjj): take a' = (0, 0, · · · , 1, · · · , 0) with the 1 in the jth position and µ = (µ1, µ2, · · · , µj, · · · , µp)'; then a'X = Xj ∼ N(µj, σjj).

   Similarly, if X ∼ Np(µ, Σ), the q linear combinations

   Z = AX = [ a11 X1 + a12 X2 + · · · + a1p Xp
              a21 X1 + a22 X2 + · · · + a2p Xp
                ⋮
              aq1 X1 + aq2 X2 + · · · + aqp Xp ]   ∼ Nq(Aµ, AΣA').

2. All subsets of the components of X have a (multivariate) normal distribution. If X is partitioned as X = (X(1)', X(2)')', with X(1) of dimension q × 1 and X(2) of dimension (p − q) × 1, then µ and Σ are partitioned conformably as

   µ = ( µ(1)               Σ = [ Σ11  Σ12
         µ(2) )   and             Σ21  Σ22 ],

   where Σ11 is q × q and Σ22 is (p − q) × (p − q), and X(1) ∼ Nq(µ(1), Σ11).

3. Zero covariance implies that the corresponding components are independently distributed (for the normal distribution only). X1 and X2 are independent if and only if Cov(X1, X2) = 0. That is, if Cov(X1, X2) = 0, then f(x1, x2) = f(x1) · f(x2).

4. The conditional distributions of the components are (multivariate) normal. In the bivariate case,

   f(x1 | x2) = f(x1, x2)/f(x2) ∼ N( µ1 + (σ12/σ22)(x2 − µ2),  σ11 − σ12²/σ22 ).

3.2 Sampling from the Multivariate Normal Distribution

3.2.1 The Multivariate Normal Likelihood

Recall if X ∼ Np(µ, Σ), then the multivariate normal density is given by:

f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp[ −(1/2)(x − µ)'Σ⁻¹(x − µ) ].

Let Xi; i = 1, 2, · · · , n represent a (vector) random sample from Np(µ, Σ). Since X1, X2, · · · , Xn are mutually independent and each is distributed as Np(µ, Σ), the joint density is:

f(x1, x2, · · · , xn) = Π_{i=1}^{n} f(xi)
  = (2π)^{−np/2} |Σ|^{−n/2} exp[ −(1/2) Σ_{i=1}^{n} (xi − µ)'Σ⁻¹(xi − µ) ].

This expression, viewed as a function of µ and Σ for a fixed set of observations x1, x2, · · · , xn, is called the likelihood function, denoted by ℓ(µ, Σ | x1, x2, · · · , xn). To get the ML estimates of µ and Σ, set

∂ log ℓ/∂µ = 0   and   ∂ log ℓ/∂Σ = 0.

Thus,

µ̂ = (1/n) Σ_{i=1}^{n} xi = x̄

Σ̂ = (1/n) Σ_{i=1}^{n} (xi − µ̂)(xi − µ̂)' = (1/n) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)' = ((n − 1)/n) S.

3.2.2 The Sampling Distribution of X̄ and S

Univariate case:

Let Xi; i = 1, 2, · · · , n be a random sample from N(µ, σ²). Then

a. X̄ ∼ N(µ, σ²/n).

b. (n − 1)S²/σ² ∼ χ²(n − 1), where n > 1 and σ² > 0.

c. If n > 1, then X̄ and S² are independent, where X̄ = (1/n) Σ_{i=1}^{n} Xi and S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².

Multivariate case:

Let Xi; i = 1, 2, · · · , n be a random sample of (vectors) from Np(µ, Σ). Then,

a. X̄ ∼ Np(µ, (1/n)Σ).

b. (n − 1)S has a Wishart (matrix-valued) distribution with n − 1 degrees of freedom.

c. X̄ and S are independent.

The sampling distribution of the sample covariance matrix is called the Wishart distribution. It is defined as the distribution of a sum of independent products of multivariate normal random vectors Zi: Wn(· | Σ), the Wishart distribution with n degrees of freedom, is the distribution of Σ_{i=1}^{n} Zi Zi'. (Note that in the univariate case, Σ_{i=1}^{n} Zi² ∼ χ²(n) when Zi ∼ N(0, 1).)

3.2.3 Large Sample Behaviour of X̄ and S

Recall if X ∼ Np(µ, Σ) with |Σ| > 0, then

• (X − µ) ∼ Np(0, Σ)
• Z = Σ^{−1/2}(X − µ) ∼ Np(0, Ip)
• Z'Z = (X − µ)'Σ^{−1/2}Σ^{−1/2}(X − µ) = (X − µ)'Σ⁻¹(X − µ) ∼ χ²(p).

Let Xi; i = 1, 2, · · · , n be a random sample from any distribution with mean µ and finite covariance Σ. Then, for large n:

• (X̄ − µ) ∼ Np(0, (1/n)Σ) approximately, i.e., √n(X̄ − µ) ∼ Np(0, Σ). Since for large n, S is close to Σ with high probability, also √n(X̄ − µ) ∼ Np(0, S) approximately.

• Z = √n Σ^{−1/2}(X̄ − µ) ∼ Np(0, Ip) approximately.

• n(X̄ − µ)'Σ⁻¹(X̄ − µ) ∼ χ²(p) approximately when n − p is large, since

  √n(X̄ − µ)'Σ^{−1/2} · √n Σ^{−1/2}(X̄ − µ) = Z'Z = Z1² + Z2² + · · · + Zp² ∼ χ²(p).
Chapter 4

Inference about a Mean Vector

One of the central messages of multivariate analysis is that the p correlated variables must
be analysed jointly.

4.1 The Plausibility of µ0 as a Value for a Normal Population Mean µ

Univariate case:

Suppose a random sample X1, X2, · · · , Xn is drawn from a normal population with mean µ and variance σ² (in practice σ² is unknown, so s² is used instead). Consider H0: µ = µ0 versus H1: µ ≠ µ0. The test statistic is:

t = (X̄ − µ0)/(s/√n) ∼ t(n − 1).

The null hypothesis is rejected if |t| is large. Rejecting H0 when |t| is large is equivalent to rejecting H0 when

t² = [(X̄ − µ0)/(s/√n)]² = (X̄ − µ0)(s²/n)⁻¹(X̄ − µ0) = n(X̄ − µ0)(s²)⁻¹(X̄ − µ0)

is large; note that t² ∼ t²(n − 1) = F(1, n − 1). (If σ² were known, Z² = [(X̄ − µ0)/(σ/√n)]² ∼ χ²(1).)

Given a sample of n observations x1, x2, · · · , xn, H0 should be rejected (that is, µ0 is judged not to be a plausible value for µ) if the observed

|t| = |x̄ − µ0|/(s/√n)

exceeds tα/2(n − 1), or equivalently if the observed t² = n(x̄ − µ0)[s²]⁻¹(x̄ − µ0) exceeds t²α/2(n − 1).

Multivariate case:

Let X1, X2, · · · , Xn be a random sample from Np(µ, Σ). The hypothesis to be tested is H0: µ = µ0 versus H1: µ ≠ µ0. The test statistic, which is the analog of the univariate t², is:

T² = n(X̄ − µ0)'S⁻¹(X̄ − µ0) ∼ [(n − 1)p/(n − p)] F(p, n − p)

where

X̄ = (1/n) Σ_{i=1}^{n} Xi   and   S = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)(Xi − X̄)'.
This test statistic is called Hotelling’s T 2 statistic. If T 2 is ”too large”, i.e., x̄ is ”too far”
from µ0 , then H0 : µ = µ0 is rejected which means µ0 is not a plausible value for µ.

If n independent observation vectors x1, x2, · · · , xn are collected, then H0: µ = µ0 is rejected if T² = n(x̄ − µ0)'S⁻¹(x̄ − µ0) > c*, where c* = [(n − 1)p/(n − p)] Fα(p, n − p).
Example 4.1. Laboratory analysis of two different nutrients (A and B) for each of a sample
of size n = 10 of the same food (in mg per 100 gram portion) revealed the following.
A 3.17 3.45 3.73 1.82 4.39 2.91 3.54 4.09 2.85 2.05
B 3.45 2.35 5.09 3.88 3.64 4.63 2.88 3.98 3.74 4.36
Does it appear that the sample comes from a food with mean nutrient amount vector µ0 = (3, 5)'?

Summary statistics: p = 2, n = 10.

x̄j = (1/n) Σ_{i=1}^{n} xij ;  j = 1, 2

⇒ x̄1 = (1/10)(3.17 + 3.45 + · · · + 2.05) = 3.20
⇒ x̄2 = (1/10)(3.45 + 2.35 + · · · + 4.36) = 3.80

Thus, the sample mean vector is x̄ = (3.20, 3.80)'.

sjk = (1/(n − 1)) Σ_{i=1}^{n} (xij − x̄j)(xik − x̄k) ;  j, k = 1, 2

⇒ s11 = (1/9) Σ_{i=1}^{10} (xi1 − x̄1)² = 0.678,   s22 = (1/9) Σ_{i=1}^{10} (xi2 − x̄2)² = 0.645
⇒ s12 = (1/9) Σ_{i=1}^{10} (xi1 − x̄1)(xi2 − x̄2) = −0.109

Thus, the sample covariance matrix and its inverse are:

S = [ 0.678  −0.109          S⁻¹ = [ 1.517  0.257
     −0.109   0.645 ]   and          0.257  1.594 ].
   
3 3
1. Hypothesis: H0 : µ = vs H1 : µ 6= .
5 5

2. T 2 = n(x̄ − µ0 )0 S −1 (x̄ − µ0 )
 0   
2 3.20 − 3 1.517 0.257 3.20 − 3
⇒ T = 10
3.80 − 5 0.257 1.594 3.80 − 5
  
  1.517 0.257 0.2
= 10 0.2 −1.2
0.257 1.594 −1.2
= 22.322

(n − 1)p (10 − 1)2


3. The critical value is c∗ = Fα (p, n − p) = F0.05 (2, 10 − 2) = 2.25() =
n−p 10 − 2
10.032.
4. Since T² = 22.322 > c* = 10.032, H0 is rejected. Thus, the sample does not appear to come from a food with mean nutrient vector (3, 5)' at the 5% level of significance.
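The calculation in Example 4.1 can be reproduced with a few lines of code. The following minimal sketch (an addition to the note, assuming NumPy and SciPy) computes the Hotelling T² statistic and the F-based critical value.

```python
import numpy as np
from scipy.stats import f

A = [3.17, 3.45, 3.73, 1.82, 4.39, 2.91, 3.54, 4.09, 2.85, 2.05]
B = [3.45, 2.35, 5.09, 3.88, 3.64, 4.63, 2.88, 3.98, 3.74, 4.36]
X = np.column_stack([A, B])
n, p = X.shape
mu0 = np.array([3.0, 5.0])

x_bar = X.mean(axis=0)
S = np.cov(X, rowvar=False)                      # divisor n - 1
T2 = n * (x_bar - mu0) @ np.linalg.inv(S) @ (x_bar - mu0)

crit = (n - 1) * p / (n - p) * f.ppf(0.95, p, n - p)
print(T2, crit)        # roughly 22.3 and 10.0, so H0 is rejected
```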

4.2 Confidence Region for the Mean Vector µ

Ordinarily, instead of testing H0: µ = µ0, it is preferable to find regions of µ values that are plausible in the light of the observed data.

Univariate case:

For a random sample of n observations x1, x2, · · · , xn drawn from a normal population with mean µ and variance σ², the (1 − α)100% confidence interval for µ consists of the µ values satisfying

|x̄ − µ|/(s/√n) ≤ tα/2(n − 1),

which is equivalent to

t² = (x̄ − µ)²/(s²/n) = (x̄ − µ)(s²/n)⁻¹(x̄ − µ) ≤ Fα(1, n − 1).

Multivariate case:

A (1 − α)100% confidence region for the mean µ of a p-dimensional multivariate normal population is given by the set of µ values satisfying

n(x̄ − µ)'S⁻¹(x̄ − µ) ≤ [(n − 1)p/(n − p)] Fα(p, n − p).

The confidence region is an ellipsoid centered at the sample mean vector x̄ = (x̄1, x̄2, · · · , x̄p)'. This implies the boundary of the ellipsoid is

(x̄ − µ)'S⁻¹(x̄ − µ) = c*/n   where   c* = [(n − 1)p/(n − p)] Fα(p, n − p).

Beginning at the center x̄, the half lengths of the axes are given by

√(λj c*/n) = √( λj [(n − 1)p/((n − p)n)] Fα(p, n − p) )

in the direction of ej, the normalized eigen vector corresponding to the eigen value λj; j = 1, 2, · · · , p of S.

Example 4.2. Recall Example 4.1. The 95% confidence region for µ = (µ1, µ2)' is given by:

(x̄ − µ)'S⁻¹(x̄ − µ) ≤ c*/n

⇒ (3.20 − µ1, 3.80 − µ2) [ 1.517  0.257; 0.257  1.594 ] (3.20 − µ1, 3.80 − µ2)' ≤ 10.032/10.

This confidence region for µ = (µ1, µ2)' is bounded by an ellipse of the form

ℓ1 (3.20 − µ1)² + ℓ2 (3.80 − µ2)² + ℓ3 (3.20 − µ1)(3.80 − µ2) = ℓ4.

For all points µ inside the ellipse, H0 would not be rejected. For example, you can easily check that µ = (3, 5)' does not lie in the region.

Plot of the confidence region: it is already given that S = [ 0.678  −0.109; −0.109  0.645 ].

Eigen values: |S − λI| = 0 ⇒ | 0.678−λ  −0.109; −0.109  0.645−λ | = 0
⇒ λ² − 1.323λ + 0.425 = 0 ⇒ λ1 = 0.774 and λ2 = 0.550.

Eigen vectors: Sxj = λj xj; j = 1, 2.

For λ1 = 0.774: Sx1 = λ1 x1
⇒ [ 0.678  −0.109; −0.109  0.645 ] (x11, x21)' = 0.774 (x11, x21)' ⇒ x1 = (1, −0.88)'.

Thus, the orientation of the major axis is e1 = (0.751, −0.661)'.

For λ2 = 0.550: Sx2 = λ2 x2
⇒ [ 0.678  −0.109; −0.109  0.645 ] (x12, x22)' = 0.550 (x12, x22)' ⇒ x2 = (1.000, 1.174)'.

Thus, the orientation of the minor axis is e2 = (0.648, 0.761)'.

The half lengths of the major and minor axes are √(λ1 c*/n) = √(0.774(1.0032)) = 0.881 and √(λ2 c*/n) = √(0.550(1.0032)) = 0.743, respectively.

Beginning at x̄ = (3.20, 3.80)', the confidence ellipse can then be drawn. (Plot of the confidence region.)
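A minimal numerical sketch of the axis computation in Example 4.2 (an addition to the note, assuming NumPy) follows; np.linalg.eigh returns the eigen values in ascending order.

```python
import numpy as np

S = np.array([[ 0.678, -0.109],
              [-0.109,  0.645]])
n, c_star = 10, 10.032

lam, E = np.linalg.eigh(S)            # eigenvalues ascending; columns of E are e_j
half_lengths = np.sqrt(lam * c_star / n)

print(lam)            # about [0.550, 0.774]
print(half_lengths)   # about [0.743, 0.881], in the directions of the columns of E
```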

4.3 Simultaneous Confidence Statements


Once the null hypothesis H0 : µ = µ0 is rejected, then the component which is responsible
for rejection has to be determined.

It would be erroneous to carry out separate univariate t tests for this purpose, because the number of tests and the correlation among the responses would make the overall significance level quite different from the α chosen for each individual t test.
For example, let X ∼ N6 (µ, Σ) and assume each component mean equals to a specified
value. There would be p = 6 univariate t-tests. Let α = 0.05. Then, the probabil-
ity of not rejecting the hypothesis of no difference from the specified value in each case
would be 1 − 0.05 = 0.95. If the tests are independent of each other, the probability of
not rejecting H0 in all of the 6 cases is (0.95)(0.95) · · · (0.95) = (0.95)6 = 0.7351. The
probability of rejecting at least one hypothesis of no difference from the specified value is 1 − 0.7351 = 0.2649, compared with α = 0.05 for a single univariate t-test. This means a type I error is committed about 26% of the time when carrying out all 6 univariate tests. In general, the probability of committing a type I error increases as the number of components increases.


In constructing simultaneous confidence statements, all the separate confidence intervals are required to hold simultaneously at a specified high confidence level (low significance level). Let X1, X2, · · · , Xn be a random sample from Np(µ, Σ). Then, the linear combination

a1 Xi1 + a2 Xi2 + · · · + ap Xip = a'Xi ;  i = 1, 2, · · · , n

has a normal distribution with mean a'µ and variance a'Σa, that is, a'X ∼ N(a'µ, a'Σa). A simultaneous confidence region is given by the set of a'µ values such that the observed t² is relatively small for all choices of a. A (1 − α)100% simultaneous confidence interval for a'µ is

(a'x̄ − a'µ)²/(a'Sa/n) ≤ c*   ⇒   |a'x̄ − a'µ|/√(a'Sa/n) ≤ √c*   ⇒   a'x̄ ± √c* √(a'Sa/n),

where c* = [(n − 1)p/(n − p)] Fα(p, n − p). In particular, if a' = (0, 0, · · · , 1, · · · , 0) with the 1 in the jth position, then the confidence interval for a'µ = µj is

x̄j ± √c* √(sjj/n)   where   c* = [(n − 1)p/(n − p)] Fα(p, n − p).
Example 4.3. Consider again Example 4.1. Find the 95% simultaneous confidence intervals for the mean amounts of nutrients A and B. The sample mean vector and sample variance-covariance matrix, respectively, were:

x̄ = (3.20, 3.80)'   and   S = [ 0.678  −0.109
                                 −0.109   0.645 ].

Also, the critical value for the Hotelling T² was c* = [(10 − 1)2/(10 − 2)] F0.05(2, 8) = 10.032.

A 95% simultaneous confidence interval for µ1 is:

x̄1 ± √c* √(s11/n) = 3.20 ± √10.033 √(0.678/10) = (2.375, 4.025).

Similarly, a 95% simultaneous confidence interval for µ2 is:

x̄2 ± √c* √(s22/n) = 3.80 ± √10.033 √(0.645/10) = (2.995, 4.605).

Note that µ01 = 3 lies inside the confidence interval for µ1 while µ02 = 5 lies outside the confidence interval for µ2. Hence, the second component (nutrient B) is responsible for the rejection of H0: µ = (3, 5)'.


4.4 The Bonferroni Method of Multiple Comparisons

The Bonferroni confidence intervals make an adjustment to the univariate t critical value, so as not to inflate the overall type I error, by taking into account the total number of confidence intervals required. The (1 − α)100% Bonferroni confidence interval for µj is:

x̄j ± tα/(2p)(n − 1) √(sjj/n)

where p is the number of confidence intervals required.

Example 4.4. Find the Bonferroni confidence intervals based on the data given in Example 4.1. Here t0.05/(2·2)(10 − 1) = t0.0125(9) = 3.111, so

µ1: 3.2 ± 3.111 √(0.678/10) = (2.39, 4.01)   and   µ2: 3.8 ± 3.111 √(0.645/10) = (3.01, 4.59).

4.5 Likelihood-Ratio Test

The likelihood-ratio test is a general principle for constructing test procedures. The test statistic is the ratio of the maximized restricted likelihood (maximized under H0) to the maximized unrestricted likelihood.

Recall for a random sample Xi; i = 1, 2, · · · , n from Np(µ, Σ), the likelihood function is:

ℓ(µ, Σ) = (2π)^{−np/2} |Σ|^{−n/2} exp[ −(1/2) Σ_{i=1}^{n} (xi − µ)'Σ⁻¹(xi − µ) ].

Also recall the ML estimate of µ is µ̂ = x̄ and that of Σ is Σ̂ = (1/n) Σ_{i=1}^{n} (xi − x̄)(xi − x̄)'. The exponent of the likelihood function can be rewritten using the trace:

Σ_{i=1}^{n} (xi − µ)'Σ⁻¹(xi − µ) = Σ_{i=1}^{n} tr[ (xi − µ)'Σ⁻¹(xi − µ) ]
  = Σ_{i=1}^{n} tr[ Σ⁻¹(xi − µ)(xi − µ)' ]   (since tr(AB) = tr(BA))
  = tr[ Σ⁻¹ Σ_{i=1}^{n} (xi − µ)(xi − µ)' ].

Thus,

ℓ(µ, Σ) = (2π)^{−np/2} |Σ|^{−n/2} exp[ −(1/2) tr{ Σ⁻¹ Σ_{i=1}^{n} (xi − µ)(xi − µ)' } ].

The (unrestricted) maximum of the likelihood function is:

ℓ(µ̂, Σ̂) = (2π)^{−np/2} |Σ̂|^{−n/2} exp[ −(1/2) tr{ Σ̂⁻¹ Σ_{i=1}^{n} (xi − x̄)(xi − x̄)' } ]
        = (2π)^{−np/2} |Σ̂|^{−n/2} exp[ −(1/2) tr( Σ̂⁻¹ n Σ̂ ) ]
        = (2π)^{−np/2} |Σ̂|^{−n/2} exp[ −(1/2) n tr(Ip) ]
        = (2π)^{−np/2} |Σ̂|^{−n/2} exp[ −np/2 ].

When the null hypothesis holds, there is no need to maximize over µ because it is fixed at µ0. Hence, under H0: µ = µ0, the restricted likelihood is maximized at Σ̂0 = (1/n) Σ_{i=1}^{n} (xi − µ0)(xi − µ0)', and

ℓ(µ0, Σ̂0) = (2π)^{−np/2} |Σ̂0|^{−n/2} exp[ −(1/2) tr{ Σ̂0⁻¹ Σ_{i=1}^{n} (xi − µ0)(xi − µ0)' } ]
          = (2π)^{−np/2} |Σ̂0|^{−n/2} exp[ −(1/2) tr( Σ̂0⁻¹ n Σ̂0 ) ]
          = (2π)^{−np/2} |Σ̂0|^{−n/2} exp[ −np/2 ].

Therefore, the likelihood ratio is:

Λ = ℓ(µ0, Σ̂0)/ℓ(µ̂, Σ̂) = [ |Σ̂|/|Σ̂0| ]^{n/2}   ⇒   Λ^{2/n} = |Σ̂|/|Σ̂0|.

The statistic Λ^{2/n} is called Wilks' Lambda. The null hypothesis H0: µ = µ0 should be rejected if the value of Λ is too small, i.e., if

Λ = [ |Σ̂|/|Σ̂0| ]^{n/2} < cα

where cα is the lower (100α)th percentile of the distribution of Λ. But

Λ^{2/n} = ( 1 + T²/(n − 1) )⁻¹

where T² ∼ [(n − 1)p/(n − p)] F(p, n − p). Rejecting H0 for small values of Λ^{2/n} is therefore equivalent to rejecting H0 for large values of T².


4.6 Large Sample Inference about µ


When the sample size is large, tests of hypothesis and confidence regions for µ can be
constructed without the assumption of a normal population. All large sample inferences
are based on the χ2 distribution.

When n − p is large, H0: µ = µ0 will be rejected if T² = n(x̄ − µ0)′ S⁻¹ (x̄ − µ0) > χ²α(p),
since [(n − 1)p/(n − p)] Fα(p, n − p) and χ²α(p) are approximately equal for large sample sizes.
Also, the (1 − α)100% simultaneous confidence interval for a′µ is a′x̄ ± √χ²α(p) √(a′Sa/n).
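A sketch of the large-sample test (Python with NumPy/SciPy; the simulated data at the end are placeholders) is given below.

import numpy as np
from scipy import stats

def large_sample_mean_test(x, mu0, alpha=0.05):
    """Large-sample test of H0: mu = mu0 using the chi-square critical value."""
    n, p = x.shape
    xbar = x.mean(axis=0)
    S = np.cov(x, rowvar=False)
    T2 = n * (xbar - mu0) @ np.linalg.solve(S, xbar - mu0)
    crit = stats.chi2.ppf(1 - alpha, df=p)
    return T2, crit, T2 > crit

# Illustration with simulated data
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 4)) + np.array([0.1, 0.0, 0.0, 0.2])
print(large_sample_mean_test(x, mu0=np.zeros(4)))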

Chapter 5

Comparison of Several Multivariate


Means

5.1 Dependent Samples


5.1.1 Paired Comparison
In a paired comparison, two treatments, or the presence and absence of a single treatment, are
compared by assigning both treatments to the same (e.g., persons) or identical (e.g., plots)
experimental units. The paired responses are then analysed by computing their differences.
Univariate case:
Let Xi1 and Xi2 denote the responses to treatment I (response before treatment) and to
treatment II (after treatment) for the ith ; i = 1, 2, · · · , n trial (experimental unit). That
is, (Xi1, Xi2) are responses recorded on the ith pair of like units. The differential effect of
the treatments is:
Di = Xi1 − Xi2 ; i = 1, 2, · · · , n.
Let the differences Di ; i = 1, 2, · · · , n represent independent observations from N (µd , σd2 )
(σd2 is unknown). The hypothesis to be tested is H0 : µd = 0 versus H1 : µd 6= 0. Then, the
test statistic is:
t = (D̄ − µd) / (sd/√n) ∼ t(n − 1)

where D̄ = (1/n) Σ_{i=1}^{n} Di and s²d = [1/(n − 1)] Σ_{i=1}^{n} (Di − D̄)². Consequently, H0 should be rejected if
|t| > tα/2(n − 1).
Multivariate case:
Given p responses, 2 treatments and n experimental units. Let X1ij denote the j th response
of the ith unit to treatment I (response before treatment) and let X2ij denote the j th
response of the ith unit to treatment II (response after treatment).


Pre-treatment matrix                       Post-treatment matrix

Var 1   Var 2   · · ·   Var p              Var 1   Var 2   · · ·   Var p
X111    X112    · · ·   X11p               X211    X212    · · ·   X21p
X121    X122    · · ·   X12p               X221    X222    · · ·   X22p
  ⋮       ⋮               ⋮                  ⋮       ⋮               ⋮
X1n1    X1n2    · · ·   X1np               X2n1    X2n2    · · ·   X2np
Taking the differences (before treatment − after treatment) of the type:
Dij = X1ij − X2ij; i = 1, 2, · · · , n; j = 1, 2, · · · , p
The hypothesis of interest is: H0 : µd = 0 (no treatment effect for all p components)
versus H1 : µd 6= 0. If D1 , D2 , · · · , Dn are independent random vectors distributed as
Np (µd , Σd ), then the test statistic is:
T² = n(D̄ − µd)′ (Sd)⁻¹ (D̄ − µd) ∼ [(n − 1)p/(n − p)] F(p, n − p)

where D̄ = (1/n) Σ_{i=1}^{n} Di and Sd = [1/(n − 1)] Σ_{i=1}^{n} (Di − D̄)(Di − D̄)′.
Given the observed differences d′i = (di1, di2, · · · , dip); i = 1, 2, · · · , n, H0: µd = 0 is
rejected if the observed

T² = n d̄′ (Sd)⁻¹ d̄ > [(n − 1)p/(n − p)] Fα(p, n − p)

where d̄ = (1/n) Σ_{i=1}^{n} di and Sd = [1/(n − 1)] Σ_{i=1}^{n} (di − d̄)(di − d̄)′.
Note that d̄ = (d̄1, d̄2, · · · , d̄p)′ and

Sd = [ sd1d1  sd1d2  · · ·  sd1dp
       sd2d1  sd2d2  · · ·  sd2dp
         ⋮      ⋮      ⋱      ⋮
       sdpd1  sdpd2  · · ·  sdpdp ]
A (1 − α)100% confidence region for µd is n(d̄ − µd)′ (Sd)⁻¹ (d̄ − µd) ≤ c∗, which is an
ellipsoid centered at d̄. To plot the confidence ellipsoid, the sample covariance matrix
of the sample differences, Sd, is used.

Also, a (1 − α)100% simultaneous confidence interval for a linear combination a′µd is given by:

a′d̄ ± √c∗ √(a′Sd a/n)

In particular, the (1 − α)100% simultaneous confidence intervals for the individual mean
differences µdj are given by:

d̄j ± √c∗ √(sdjdj/n); j = 1, 2, · · · , p


Example 5.1. It is felt that three drugs (X1, X2 and X3) may lead to changes in the level
of a certain biochemical compound found in the brain. Thirty mice of the same strain were
randomly divided into three groups, and each group received one of the drugs. The amount of
the compound (in micrograms per gram of brain tissue) was recorded before and after the
treatments. The responses are given in the following table. Test the hypothesis of no
treatment effect at the 5% level of significance.
Before treatment After treatment
x1i1 x1i2 x1i3 x2i1 x2i2 x2i3
1.21 0.61 0.70 1.26 0.50 0.81
0.92 0.43 0.71 1.07 0.39 0.69
0.80 0.35 0.71 1.33 0.24 0.70
0.85 0.48 0.68 1.39 0.37 0.72
0.98 0.42 0.71 1.38 0.42 0.71
1.15 0.52 0.72 0.98 0.49 0.70
1.10 0.50 0.75 1.41 0.41 0.70
1.02 0.53 0.70 1.30 0.47 0.67
1.18 0.45 0.70 1.22 0.29 0.68
1.09 0.40 0.69 1.00 0.30 0.70

The necessary calculations are as follows. Here d∗ij = dij − d¯j . Also the last row is the sum.

di1      di2      di3      d*²i1      d*²i2      d*²i3      d*i1·d*i2      d*i1·d*i3      d*i2·d*i3
-0.050 0.110 -0.110 0.023716 0.000841 0.011881 0.004466 -0.016786 -0.003161
-0.150 0.040 0.020 0.002916 0.001681 0.000441 -0.002214 0.001134 -0.000861
-0.530 0.110 0.010 0.106276 0.000841 0.000121 -0.009454 -0.003586 0.000319
-0.540 0.110 -0.040 0.112896 0.000841 0.001521 -0.009744 0.013104 -0.001131
-0.400 0.000 0.000 0.038416 0.006561 0.000001 0.015876 -0.000196 -0.000081
0.170 0.030 0.020 0.139876 0.002601 0.000441 -0.019074 0.007854 -0.001071
-0.310 0.090 0.050 0.011236 0.000081 0.002601 -0.000954 -0.005406 0.000459
-0.280 0.060 0.030 0.005776 0.000441 0.000961 0.001596 -0.002356 -0.000651
-0.040 0.160 0.020 0.026896 0.006241 0.000441 0.012956 0.003444 0.001659
0.090 0.100 -0.010 0.086436 0.000361 0.000081 0.005586 -0.002646 -0.000171
-2.040 0.810 -0.010 0.554440 0.020490 0.018490 -0.000960 -0.005440 -0.004690
d̄j = (1/n) Σ_{i=1}^{n} dij = (1/10) Σ_{i=1}^{10} dij; j = 1, 2, 3


⇒ d̄1 = (1/10) Σ_{i=1}^{10} di1 = (1/10)(−2.04) = −0.204
⇒ d̄2 = (1/10) Σ_{i=1}^{10} di2 = (1/10)(0.81) = 0.081
⇒ d̄3 = (1/10) Σ_{i=1}^{10} di3 = (1/10)(−0.01) = −0.001

⇒ d̄ = (−0.204, 0.081, −0.001)′

sdjdk = [1/(n − 1)] Σ_{i=1}^{n} (dij − d̄j)(dik − d̄k) = [1/(10 − 1)] Σ_{i=1}^{10} (dij − d̄j)(dik − d̄k); j, k = 1, 2, 3

⇒ sd1d1 = (1/9) Σ (di1 − d̄1)² = (1/9)(0.55444) = 0.06160
⇒ sd2d2 = (1/9) Σ (di2 − d̄2)² = (1/9)(0.02049) = 0.00228
⇒ sd3d3 = (1/9) Σ (di3 − d̄3)² = (1/9)(0.01849) = 0.00205
⇒ sd1d2 = (1/9) Σ (di1 − d̄1)(di2 − d̄2) = (1/9)(−0.00096) = −0.00011
⇒ sd1d3 = (1/9) Σ (di1 − d̄1)(di3 − d̄3) = (1/9)(−0.00544) = −0.00060
⇒ sd2d3 = (1/9) Σ (di2 − d̄2)(di3 − d̄3) = (1/9)(−0.00469) = −0.00052
   
Sd = [  0.06160  −0.00011  −0.00060
       −0.00011   0.00228  −0.00052
       −0.00060  −0.00052   0.00205 ]

⇒ Sd⁻¹ = [ 16.28866    1.98818    5.27173
            1.98818  465.77088  118.72867
            5.27173  118.72867  519.46436 ]
The hypothesis to be tested is:
H0 : µd = 0
H1 : µd 6= 0
T 2 = nd¯0 (Sd )−1 d¯


 0   
−0.204 16.28866 1.98818 5.27173 −0.204
⇒ T 2 = 10  0.081   1.98818 465.77088 118.72867   0.081  = 36.515
−0.001 5.27173 118.72867 519.46436 −0.001
(n − 1)p (10 − 1)3
The critical value is c∗ = Fα (p, n − p) = F0.05 (3, 10 − 3) = 16.779. There
n−p 10 − 3
is a significant treatment effect at 5% level of significance.
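The hand computations above can be reproduced in a few lines. The following sketch (Python with NumPy/SciPy) recomputes d̄, Sd, T² and the critical value from the before/after table; the values agree with the worked figures up to rounding.

import numpy as np
from scipy import stats

before = np.array([[1.21, 0.61, 0.70], [0.92, 0.43, 0.71], [0.80, 0.35, 0.71],
                   [0.85, 0.48, 0.68], [0.98, 0.42, 0.71], [1.15, 0.52, 0.72],
                   [1.10, 0.50, 0.75], [1.02, 0.53, 0.70], [1.18, 0.45, 0.70],
                   [1.09, 0.40, 0.69]])
after = np.array([[1.26, 0.50, 0.81], [1.07, 0.39, 0.69], [1.33, 0.24, 0.70],
                  [1.39, 0.37, 0.72], [1.38, 0.42, 0.71], [0.98, 0.49, 0.70],
                  [1.41, 0.41, 0.70], [1.30, 0.47, 0.67], [1.22, 0.29, 0.68],
                  [1.00, 0.30, 0.70]])

d = before - after                        # paired differences
n, p = d.shape
dbar = d.mean(axis=0)
Sd = np.cov(d, rowvar=False)              # divisor n - 1

T2 = n * dbar @ np.linalg.solve(Sd, dbar)                    # Hotelling's T^2
c_star = (n - 1) * p / (n - p) * stats.f.ppf(0.95, p, n - p)
print(T2, c_star)                                            # about 36.5 > 16.8, so reject H0

# 95% simultaneous confidence intervals for the individual mean differences
half = np.sqrt(c_star) * np.sqrt(np.diag(Sd) / n)
for j in range(p):
    print(f"mu_d{j + 1}: ({dbar[j] - half[j]:.4f}, {dbar[j] + half[j]:.4f})")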

The next question is which of the three drugs (X1 , X2 or X3 ) leads to changes in the level of
the biochemical compound found in the brain? To answer this question, the simultaneous
confidence intervals for the individual mean differences µdj need to be constructed, which
are given by:

d̄j ± √c∗ √(sdjdj/n); j = 1, 2, 3
Hence, the 95% confidence intervals are:

µd1: d̄1 ± √c∗ √(sd1d1/n) = −0.204 ± √16.779 √(0.06160/10) = (−0.5255, 0.1175)

µd2: d̄2 ± √c∗ √(sd2d2/n) = 0.081 ± √16.779 √(0.00228/10) = (0.0191, 0.1429)

µd3: d̄3 ± √c∗ √(sd3d3/n) = −0.001 ± √16.779 √(0.00205/10) = (−0.0596, 0.0576)

The confidence interval for µd2 does not include zero. Thus, H0 : µd = 0 was rejected due
to the second component (X2 ). In other words, it is the second drug (X2 ) that led to a
significant change in the level of the biochemical compound found in the brain at 5% level
of significance.

5.1.2 A Repeated Measures Design for Comparing Treatments


A repeated measures design is another generalization of the univariate t statistic in which q
treatments are compared with respect to a single response measured from the same (iden-
tical) sampling units over time or space. Each experimental unit receives each treatment
once over successive periods of time. The name repeated measures stems from the fact that
all treatments are administered to each unit.

Let Xik be the response to the kth; k = 1, 2, · · · , q treatment on the ith; i = 1, 2, · · · , n unit.


Item    Treatment 1    Treatment 2    · · ·    Treatment q
1       X11            X12            · · ·    X1q          → X′1
2       X21            X22            · · ·    X2q          → X′2
⋮         ⋮              ⋮                        ⋮
n       Xn1            Xn2            · · ·    Xnq          → X′n

The hypothesis of interest is whether µ1 = µ2 = · · · = µq (no treatment effect). For


comparative purposes, contrasts of the components of µ = E(Xi ) are considered. These
could be:
    
(µ1 − µ2, µ2 − µ3, µ3 − µ4, · · · , µq−1 − µq)′ = A µ,   where A is the (q − 1) × q matrix

A = [ 1  −1   0   0  · · ·  0   0
      0   1  −1   0  · · ·  0   0
      0   0   1  −1  · · ·  0   0
      ⋮    ⋮    ⋮    ⋮   ⋱    ⋮    ⋮
      0   0   0   0  · · ·  1  −1 ]

or

(µ1 − µ2, µ1 − µ3, µ1 − µ4, · · · , µ1 − µq)′ = B µ,   where B is the (q − 1) × q matrix

B = [ 1  −1   0   0  · · ·  0   0
      1   0  −1   0  · · ·  0   0
      1   0   0  −1  · · ·  0   0
      ⋮    ⋮    ⋮    ⋮   ⋱    ⋮    ⋮
      1   0   0   0  · · ·  0  −1 ]

Since each row is a contrast and the q − 1 rows are linearly independent, both A and B are
contrast matrices. If Aµ = Bµ = 0, then µ1 = µ2 = · · · = µq . Hence, the hypothesis of
no difference in treatments (equal treatment means) is Aµ = 0 for any choice of contrast
matrix A.

Consider an Nq(µ, Σ) population. If A is a contrast matrix, then AX̄ ∼ N(Aµ, (1/n)AΣA′).


For testing H0 : Aµ = 0 vs H1 : Aµ 6= 0, the test statistic is:

T² = n(AX̄)′ (ASA′)⁻¹ (AX̄) ∼ [(n − 1)(q − 1)/(n − (q − 1))] F[q − 1, n − (q − 1)].

Note that T² does not depend on the particular choice of A. As usual, reject H0 if the
observed T² = n(Ax̄)′ (ASA′)⁻¹ (Ax̄) > c∗.
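The fact that T² does not depend on the chosen contrast matrix can be checked numerically. The sketch below (Python with NumPy, simulated data for illustration only) computes the statistic with the successive-differences matrix and with the differences-from-treatment-1 matrix; the two values are identical.

import numpy as np

def repeated_measures_T2(x, C):
    """T^2 = n (C xbar)' (C S C')^{-1} (C xbar) for a (q-1) x q contrast matrix C."""
    n = x.shape[0]
    xbar = x.mean(axis=0)
    S = np.cov(x, rowvar=False)
    y = C @ xbar
    return n * y @ np.linalg.solve(C @ S @ C.T, y)

rng = np.random.default_rng(2)
q, n = 4, 30
x = rng.normal(size=(n, q)) + np.array([0.0, 0.5, 0.2, 0.1])   # simulated repeated measures

# Successive differences: mu1-mu2, mu2-mu3, mu3-mu4
A = np.array([[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]], dtype=float)
# Differences from treatment 1: mu1-mu2, mu1-mu3, mu1-mu4
B = np.array([[1, -1, 0, 0], [1, 0, -1, 0], [1, 0, 0, -1]], dtype=float)

print(repeated_measures_T2(x, A), repeated_measures_T2(x, B))  # identical values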

The (1 − α)100% simultaneous confidence intervals for a single contrast a′µ, for any contrast
vector a of interest, are:

a′x̄ ± √c∗ √(a′Sa/n)


where c∗ = [(n − 1)(q − 1)/(n − (q − 1))] Fα[q − 1, n − (q − 1)]. In particular, the confidence interval for
µj − µk is obtained by letting a′ = (0, · · · , 0, 1, 0, · · · , 0, −1, 0, · · · , 0), with 1 in the jth position
and −1 in the kth position:

(x̄j − x̄k) ± √c∗ √[(sjj − 2sjk + skk)/n]; j ≠ k.

Example 5.2. A researcher considered three indices measuring the severity of heart attacks.
The values of the indices for n = 40 heart-attack patients arriving at a hospital emergency
room produced the following summary statistics:

x̄ = (46.1, 57.3, 50.4)′  and  S = [ 101.3  63.0  71.0
                                     63.0  80.2  55.6
                                     71.0  55.6  97.4 ]

Test the equality of the mean indices and judge the differences in pairs of mean indices.
 
Since there are q = 3 treatments, let

A = [ 1  −1   0
      1   0  −1 ].

Then the hypothesis to be tested is

H0: Aµ = 0, i.e., (µ1 − µ2, µ1 − µ3)′ = (0, 0)′
H1: Aµ ≠ 0, i.e., (µ1 − µ2, µ1 − µ3)′ ≠ (0, 0)′
The test statistic is T² = n(Ax̄)′ (ASA′)⁻¹ (Ax̄).

Ax̄ = (−11.2, −4.3)′

ASA′ = [ 55.5  22.9      ⇒ (ASA′)⁻¹ = [  0.02162  −0.00873
         22.9  56.7 ]                   −0.00873   0.02116 ]

T² = 40 (−11.2, −4.3) (ASA′)⁻¹ (−11.2, −4.3)′ = 90.49

c∗ = [(n − 1)(q − 1)/(n − (q − 1))] Fα[q − 1, n − (q − 1)] = [(40 − 1)(3 − 1)/(40 − (3 − 1))] F0.05[3 − 1, 40 − (3 − 1)] = 6.66

Since T² = 90.49 > 6.66, H0: Aµ = 0 is rejected; the mean indices are not all equal.

The 95% simultaneous confidence interval for µj − µk is:

(x̄j − x̄k) ± √c∗ √[(sjj − 2sjk + skk)/n]; j ≠ k

µ1 − µ2: (x̄1 − x̄2) ± √c∗ √[(s11 − 2s12 + s22)/n] = −11.2 ± √6.66 √[(101.3 − 2(63.0) + 80.2)/40] = (−14.240, −8.160)

µ1 − µ3: (x̄1 − x̄3) ± √c∗ √[(s11 − 2s13 + s33)/n] = −4.3 ± √6.66 √[(101.3 − 2(71.0) + 97.4)/40] = (−7.373, −1.227)

µ2 − µ3: (x̄2 − x̄3) ± √c∗ √[(s22 − 2s23 + s33)/n] = 6.9 ± √6.66 √[(80.2 − 2(55.6) + 97.4)/40] = (3.575, 10.225)

None of the intervals contains zero. Thus, all mean indices are significantly different from
each other (µ2 > µ3 > µ1).
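These figures can be verified directly from the reported summary statistics. A sketch (Python with NumPy/SciPy), taking x̄, S and n = 40 as given above:

import numpy as np
from scipy import stats

n, q, alpha = 40, 3, 0.05
xbar = np.array([46.1, 57.3, 50.4])
S = np.array([[101.3, 63.0, 71.0],
              [63.0, 80.2, 55.6],
              [71.0, 55.6, 97.4]])
A = np.array([[1, -1, 0], [1, 0, -1]], dtype=float)   # contrast matrix

y = A @ xbar
T2 = n * y @ np.linalg.solve(A @ S @ A.T, y)
c_star = (n - 1) * (q - 1) / (n - q + 1) * stats.f.ppf(1 - alpha, q - 1, n - q + 1)
print(T2, c_star)             # about 90.5 > 6.66, so the mean indices are not all equal

# Simultaneous confidence intervals for mu_j - mu_k
for j, k in [(0, 1), (0, 2), (1, 2)]:
    half = np.sqrt(c_star * (S[j, j] - 2 * S[j, k] + S[k, k]) / n)
    est = xbar[j] - xbar[k]
    print(f"mu{j + 1} - mu{k + 1}: ({est - half:.3f}, {est + half:.3f})")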

5.2 Independent Samples


5.2.1 Comparing Mean Vectors from Two Populations
Univariate case:
• X11 , X12 , · · · , X1n1 ∼ N (µ1 , σ12 )
• X21 , X22 , · · · , X2n2 ∼ N (µ2 , σ22 )
The hypothesis to be tested is H0: µ1 = µ2 vs H1: µ1 ≠ µ2. Assuming σ²1 = σ²2, the test statistic is:

t = [(X̄1 − X̄2) − (µ1 − µ2)] / √[(1/n1 + 1/n2) s²pooled] ∼ t(n1 + n2 − 2)

where s²pooled = [(n1 − 1)s²1 + (n2 − 1)s²2]/(n1 + n2 − 2). Reject H0 if |t| > tα/2(n1 + n2 − 2).
Multivariate case: Data layout

Population 1                                 Population 2
X111   X112   · · ·  X11p    → X′11          X211   X212   · · ·  X21p    → X′21
X121   X122   · · ·  X12p    → X′12          X221   X222   · · ·  X22p    → X′22
  ⋮      ⋮              ⋮                      ⋮      ⋮              ⋮
X1n11  X1n12  · · ·  X1n1p   → X′1n1         X2n21  X2n22  · · ·  X2n2p   → X′2n2


Assumptions

• X11 , X12 , · · · , X1n1 ∼ Np (µ1 , Σ1 )

• X21 , X22 , · · · , X2n2 ∼ Np (µ2 , Σ2 ).

• X11 , X12 , · · · , X1n1 are independent of X21 , X22 , · · · , X2n2 .

• If n1 and n2 are small, then

– both populations are assumed to be multivariate normal, and


– they are assumed to have the same covariance matrix (i.e., Σ1 = Σ2 ). This
assumption is much stronger than its univariate counterpart because the several
pairs of variances and covariances must be nearly equal.
If Σ1 = Σ2 = Σ, then both S1 = [1/(n1 − 1)] Σ_{i=1}^{n1} (x1i − x̄1)(x1i − x̄1)′ and
S2 = [1/(n2 − 1)] Σ_{i=1}^{n2} (x2i − x̄2)(x2i − x̄2)′ estimate Σ. Consequently, both samples can be
pooled to estimate the common covariance Σ. That is, Spooled = [(n1 − 1)S1 + (n2 − 1)S2]/(n1 + n2 − 2)
estimates Σ.
To test H0: µ1 = µ2 vs H1: µ1 ≠ µ2, the squared statistical distance from x̄1 − x̄2 to
µ1 − µ2 is considered.

• Since E(X̄1 − X̄2 ) = (µ1 − µ2 ), (x̄1 − x̄2 ) estimates (µ1 − µ2 ).

• Since the two samples are independent, Cov(X̄1, X̄2) = 0. This implies
  Cov(X̄1 − X̄2) = Cov(X̄1) + Cov(X̄2) = Σ/n1 + Σ/n2 = (1/n1 + 1/n2)Σ. Thus,
  (1/n1 + 1/n2) Spooled estimates Cov(X̄1 − X̄2) = (1/n1 + 1/n2)Σ.
The test statistic is:

T² = [(X̄1 − X̄2) − (µ1 − µ2)]′ [(1/n1 + 1/n2) Spooled]⁻¹ [(X̄1 − X̄2) − (µ1 − µ2)].

Since T² ∼ [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] F(p, n1 + n2 − p − 1), H0 will be rejected if the observed
T² > c∗, where c∗ = [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] Fα(p, n1 + n2 − p − 1).

A (1 − α)100% confidence region for µ1 − µ2 is given by:

[(x̄1 − x̄2) − (µ1 − µ2)]′ [(1/n1 + 1/n2) Spooled]⁻¹ [(x̄1 − x̄2) − (µ1 − µ2)] ≤ c∗.
n1 n2


which is an ellipsoid centered at (x̄1 − x̄2). The half-lengths of the axes are

√λj √[c∗ (1/n1 + 1/n2)]; j = 1, 2, · · · , p

in the direction of ej, the normalized eigenvector associated with the eigenvalue λj of Spooled.

A (1 − α)100% simultaneous confidence interval for a′(µ1 − µ2) is

a′(x̄1 − x̄2) ± √[c∗ (1/n1 + 1/n2) a′Spooled a].

If a′ = (0, 0, · · · , 1, · · · , 0) with 1 in the jth position, then a′(µ1 − µ2) = µ1j − µ2j,
a′(x̄1 − x̄2) = x̄1j − x̄2j and a′Spooled a = sjj. Thus, a (1 − α)100% simultaneous confidence
interval for µ1j − µ2j is

(x̄1j − x̄2j) ± √c∗ √[(1/n1 + 1/n2) sjj]

where sjj is the j th diagonal entry of the pooled covariance matrix, Spooled .

Example 5.3. Given the following hypothetical data on academic performance of students
(in preparatory school out of 100 and in university out of 4.00). Test the equality of the
population mean vectors between the two groups.
           Female                              Male
Preparatory    University         Preparatory    University
    97            3.40                86             3.90
    95            3.45                84             3.75
    85            3.50                70             2.25
                                       80             3.05
                                       75             2.80

Summary statistics

Female: x̄1 = (92.3333, 3.4500)′ and S1 = [ 41.3333  0.2500
                                             0.2500  0.0025 ]

Male: x̄2 = (79.0000, 3.1500)′ and S2 = [ 43.0000  4.4125
                                           4.4125  0.4663 ]

Spooled = [ 42.4444  3.0250     ⇒ S⁻¹pooled = [  0.0764  −0.7415
             3.0250  0.3117 ]                   −0.7415  10.4048 ]


The observed test statistic is:

T² = (1/n1 + 1/n2)⁻¹ (x̄1 − x̄2)′ S⁻¹pooled (x̄1 − x̄2)
   = (1/3 + 1/5)⁻¹ (13.3333, 0.3000) S⁻¹pooled (13.3333, 0.3000)′
   = 16.0999

Critical value: c∗ = [(n1 + n2 − 2)p/(n1 + n2 − p − 1)] F0.05(p, n1 + n2 − p − 1) = [(6)(2)/5] F0.05(2, 5) = 13.8866.
Since T² = 16.0999 > 13.8866, reject H0: µ1 = µ2.
A (1 − α)100% simultaneous confidence interval for µ1j − µ2j is

(x̄1j − x̄2j) ± √c∗ √[(1/n1 + 1/n2) sjj]

where sjj is the jth diagonal entry of the pooled covariance matrix, Spooled.

µ11 − µ21: (x̄11 − x̄21) ± √c∗ √[(1/n1 + 1/n2) s11] = 13.3333 ± √13.8866 √[(1/3 + 1/5) 42.4444] = (−4.3967, 31.0633)

µ12 − µ22: (x̄12 − x̄22) ± √c∗ √[(1/n1 + 1/n2) s22] = 0.3000 ± √13.8866 √[(1/3 + 1/5) 0.3117] = (−1.2194, 1.8194)

Both simultaneous confidence intervals contain zero, indicating no significant difference
between females and males in either component. But this contradicts the rejection of
H0: µ1 = µ2 by the T² test. The possible reasons may be:
• The multivariate normality of the observation vectors might be violated because of
the small sample sizes.

• The assumption of equality of the covariance matrices (Σ1 = Σ2 ) may not hold.
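The computations in this example can be reproduced from the summary statistics alone. A sketch (Python with NumPy/SciPy), taking n1 = 3, n2 = 5 and the reported x̄1, x̄2, S1, S2 as inputs:

import numpy as np
from scipy import stats

n1, n2, p, alpha = 3, 5, 2, 0.05
xbar1 = np.array([92.3333, 3.45])
xbar2 = np.array([79.0, 3.15])
S1 = np.array([[41.3333, 0.25], [0.25, 0.0025]])
S2 = np.array([[43.0, 4.4125], [4.4125, 0.4663]])

S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
diff = xbar1 - xbar2

T2 = diff @ np.linalg.solve((1 / n1 + 1 / n2) * S_pooled, diff)
c_star = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * stats.f.ppf(1 - alpha, p, n1 + n2 - p - 1)
print(T2, c_star)        # about 16.10 > 13.89, so H0: mu1 = mu2 is rejected

# Simultaneous confidence intervals for the component-wise mean differences
half = np.sqrt(c_star * (1 / n1 + 1 / n2) * np.diag(S_pooled))
for j in range(p):
    print(f"mu1{j + 1} - mu2{j + 1}: ({diff[j] - half[j]:.4f}, {diff[j] + half[j]:.4f})")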

5.2.2 Comparison of Several Multivariate Population Means


Often, more than two populations need to be compared. Random samples are collected
from each of g populations.


Population 1: X11 , X12 , · · · , X1n1

Population 2: X21 , X22 , · · · , X2n2


..
.

Population g: Xg1 , Xg2 , · · · , Xgng

Univariate ANOVA:

• Let X`1 , X`2 , · · · , X`n` be a random sample from an N (µ` , σ 2 ); ` = 1, 2, · · · , g.

• The samples from different populations are independent.

• All populations have a common variance, σ 2 .

The null hypothesis of equality of means is H0 : µ1 = µ2 = · · · = µg . Each population mean


µ` ; ` = 1, 2, · · · , g can be considered as a sum of an overall mean (µ) and a component
specific to each population (τ` ):

µ` = µ + (µ` − µ)
⇒ µ` = µ + τ `

where τ` = µ` −µ is the `th population (treatment) effect. The null hypothesis now becomes
H0 : τ1 = τ2 = · · · = τg = 0.

Since the response X`i ∼ N (µ + τ` , σ 2 ), it can be expressed as:

X`i = µ` + e`i
⇒ X`i = µ + τ` + e`i .
where the random errors e`i are independent N(0, σ²). The constraint Σ_{`=1}^{g} n` τ` = 0 is imposed
to define the model parameters uniquely.

The analysis of variance is based on the decomposition of each observed value x`i ,

x`i = x̄ + (x̄` − x̄) + (x`i − x̄` )


x`i − x̄ = (x̄` − x̄) + (x`i − x̄` )
(x`i − x̄)2 = (x̄` − x̄)2 + 2(x̄` − x̄)(x`i − x̄` ) + (x`i − x̄` )2

Taking the summation over i,


Σ_{i=1}^{n`} (x`i − x̄)² = n`(x̄` − x̄)² + 2(x̄` − x̄) Σ_{i=1}^{n`} (x`i − x̄`) + Σ_{i=1}^{n`} (x`i − x̄`)²


⇒ Σ_{i=1}^{n`} (x`i − x̄)² = n`(x̄` − x̄)² + Σ_{i=1}^{n`} (x`i − x̄`)²,  since Σ_{i=1}^{n`} (x`i − x̄`) = Σ_{i=1}^{n`} x`i − n` x̄` = 0.

Now taking the summation over `,

Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄)² = Σ_{`=1}^{g} n`(x̄` − x̄)² + Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄`)²
        SScorrected                  BSS (SStreatment)            WSS (SSresiduals)

The ANOVA table:

Sources of variation          Sum of squares (SS)                              Degrees of freedom (df)
Between Group (Treatment)     BSS = Σ_{`=1}^{g} n`(x̄` − x̄)²                    g − 1
Within Group (Residual)       WSS = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄`)²        n − g
Total                         TSScor = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄)²      n − 1

The null hypothesis H0 is rejected if F = [BSS/(g − 1)]/[WSS/(n − g)] > Fα(g − 1, n − g). Rejecting H0
when F is too large is equivalent to rejecting H0 if:

BSS/WSS is too large ⇒ BSS/WSS + 1 is too large ⇒ 1/(BSS/WSS + 1) is too small
⇒ WSS/(BSS + WSS) is too small.

This is used for the multivariate generalization.
Multivariate ANOVA - MANOVA:
Let X`1, X`2, · · · , X`n`; ` = 1, 2, · · · , g be a random sample of size n` from Np(µ`, Σ). The
random samples from the different populations are independent.

The null hypothesis of equality of means is H0: µ1 = µ2 = · · · = µg. The model is
X`i = µ + τ` + e`i where τ` = µ` − µ = (τ`1, τ`2, · · · , τ`p)′ is the `th group (treatment)
effect with Σ_{`=1}^{g} n` τ` = 0 and e`i ∼ Np(0, Σ).

Now, to decompose the sums of squares, matrix manipulation is used.

(x`i − x̄)(x`i − x̄)′ = [(x`i − x̄`) + (x̄` − x̄)][(x`i − x̄`) + (x̄` − x̄)]′
                    = [(x`i − x̄`) + (x̄` − x̄)][(x`i − x̄`)′ + (x̄` − x̄)′]
                    = (x`i − x̄`)(x`i − x̄`)′ + (x`i − x̄`)(x̄` − x̄)′ + (x̄` − x̄)(x`i − x̄`)′ + (x̄` − x̄)(x̄` − x̄)′


When taking the summation over i, the middle two cross-products become zero vectors.
Then, taking the summation over ` gives:
Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄)(x`i − x̄)′ = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄`)(x`i − x̄`)′ + Σ_{`=1}^{g} n`(x̄` − x̄)(x̄` − x̄)′
                                                              W                                     B

Therefore, the MANOVA table is:

Sources of variation          Matrix of SS and cross-products (SSP)                          df
Between Group (Treatment)     B = Σ_{`=1}^{g} n`(x̄` − x̄)(x̄` − x̄)′                           g − 1
Within Group (Residual)       W = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄`)(x`i − x̄`)′              n − g
Total                         B + W = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄)(x`i − x̄)′            n − 1

The sample mean of the `th group is x̄` = (1/n`) Σ_{i=1}^{n`} x`i and the sample covariance matrix
of the `th group is S` = [1/(n` − 1)] Σ_{i=1}^{n`} (x`i − x̄`)(x`i − x̄`)′; ` = 1, 2, · · · , g. This implies

x̄ = (1/n) Σ_{`=1}^{g} n` x̄`  and  Spooled = [(n1 − 1)S1 + (n2 − 1)S2 + · · · + (ng − 1)Sg]/(n − g).

Note that W = Σ_{`=1}^{g} (n` − 1)S` = (n − g)Spooled ⇒ Spooled = W/(n − g).
Recall that in the univariate case, H0: τ1 = τ2 = · · · = τg = 0 is rejected if WSS/(BSS + WSS) is
small. Likewise, the null hypothesis H0: τ1 = τ2 = · · · = τg = 0 is to be rejected if:

Λ∗ = |W|/|B + W| is too small.

The statistic Λ∗ is known as Wilks' Lambda. The exact distribution of Λ∗ for special cases
is given in the textbook (Table 6.3, page 303).

If n is large, Bartlett has shown that

−[n − 1 − (p + g)/2] log(Λ∗) ∼ χ²[p(g − 1)]  (approximately).
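A compact sketch of the full one-way MANOVA computation (Python with NumPy/SciPy) is given below; it assembles B and W, forms Wilks' Λ∗, and applies Bartlett's chi-square approximation. The grouped data at the end are simulated placeholders.

import numpy as np
from scipy import stats

def one_way_manova(groups, alpha=0.05):
    """groups: list of (n_l x p) data matrices, one per population."""
    all_x = np.vstack(groups)
    n, p = all_x.shape
    g = len(groups)
    grand_mean = all_x.mean(axis=0)

    B = np.zeros((p, p))
    W = np.zeros((p, p))
    for x in groups:
        m = x.mean(axis=0)
        B += len(x) * np.outer(m - grand_mean, m - grand_mean)
        W += (x - m).T @ (x - m)

    wilks = np.linalg.det(W) / np.linalg.det(B + W)
    bartlett = -(n - 1 - (p + g) / 2) * np.log(wilks)   # approx chi2 with p(g-1) df
    crit = stats.chi2.ppf(1 - alpha, p * (g - 1))
    return wilks, bartlett, crit

# Illustration with simulated groups (placeholder data)
rng = np.random.default_rng(3)
sizes_and_means = [(15, 0.0), (12, 0.5), (18, 1.0)]
groups = [rng.normal(loc=mu, size=(n_l, 2)) for n_l, mu in sizes_and_means]
print(one_way_manova(groups))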


Simultaneous Confidence Intervals


If H0: τ1 = τ2 = · · · = τg = 0 is rejected, the next task is to identify which group(s) and
which variables (components) are responsible for the rejection. Bonferroni's approach can be
used to construct simultaneous intervals for the components of the differences τ` − τk
(equivalently µ` − µk), adjusting the significance level for the p(gC2) confidence intervals required.

Let τ`j be the j th component of τ` = µ` − µ = (τ`1 , τ`2 , · · · , τ`p )0 ; ` = 1, 2, · · · , g. Thus, for


component j; j = 1, 2, · · · , p, τ`j − τkj is estimated by x̄`j − x̄kj ; ` 6= k.
 
Var(x̄`j − x̄kj) = σjj/n` + σjj/nk = (1/n` + 1/nk) σjj

where σjj is the jth diagonal element of Σ, which is estimated by the corresponding element of
Spooled. Since Spooled = W/(n − g), σjj is estimated by wjj/(n − g), where wjj is the jth
diagonal element of W.
Therefore, a (1 − α)100% confidence interval for the difference τ`j − τkj is:

(x̄`j − x̄kj) ± t_{α/[pg(g−1)]}(n − g) √[(1/n` + 1/nk) wjj/(n − g)]

where wjj is the jth diagonal element of W.
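A sketch of these pairwise Bonferroni intervals as a reusable routine (Python with NumPy/SciPy; the function name and arguments are illustrative):

import numpy as np
from scipy import stats
from itertools import combinations

def pairwise_bonferroni(means, sizes, W, alpha=0.05):
    """Bonferroni intervals for tau_lj - tau_kj over all group pairs and components.

    means: (g x p) array of group sample means; sizes: group sample sizes;
    W: within-group SSP matrix from the MANOVA table."""
    g, p = means.shape
    n = sum(sizes)
    m = p * g * (g - 1)                                  # adjustment used in the formula above
    t_crit = stats.t.ppf(1 - alpha / m, df=n - g)
    intervals = {}
    for l, k in combinations(range(g), 2):
        for j in range(p):
            half = t_crit * np.sqrt((1 / sizes[l] + 1 / sizes[k]) * W[j, j] / (n - g))
            est = means[l, j] - means[k, j]
            intervals[(l + 1, k + 1, j + 1)] = (est - half, est + half)
    return intervals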


Example 5.4. Given the following observation vectors on two responses collected for three
treatments.
6 5 8 4 7
Treatment 1
7 9 6 9 9
3 1 2
Treatment 2
3 6 3
2 5 3 2
Treatment 3
3 1 1 3
Construct the one-way MANOVA table and test for treatment effects at the 5% significance level.
    

x̄1 = (6, 8)′,  x̄2 = (2, 4)′,  x̄3 = (3, 2)′

⇒ x̄ = (1/n) Σ_{`=1}^{g} n` x̄` = (1/12)[5(6, 8)′ + 3(2, 4)′ + 4(3, 2)′] = (4, 5)′

B = Σ_{`=1}^{g} n`(x̄` − x̄)(x̄` − x̄)′ = [ 36  48
                                          48  84 ]

W = Σ_{`=1}^{g} Σ_{i=1}^{n`} (x`i − x̄`)(x`i − x̄`)′ = [  18  −13
                                                       −13   38 ]

B + W = [ 54   35
          35  122 ]
The MANOVA table is:

Sources of variation          Matrix of SS and cross-products (SSP)    df
Between Group (Treatment)     B = [ 36  48; 48  84 ]                    3 − 1 = 2
Within Group (Residual)       W = [ 18  −13; −13  38 ]                  12 − 3 = 9
Total                         B + W = [ 54  35; 35  122 ]               12 − 1 = 11

Λ∗ = |W|/|B + W| = 515/5363 = 0.096
For p = 2 and g = 3, the exact distribution is:

[(n − g − 1)/(g − 1)] · [(1 − √Λ∗)/√Λ∗] ∼ F[2(g − 1), 2(n − g − 1)]

⇒ (8/2) · [(1 − √0.096)/√0.096] = 8.908 and F0.05[4, 16] = 3.01

Therefore, H0: τ1 = τ2 = τ3 = 0 should be rejected.

Next, for the simultaneous confidence intervals, α/[pg(g − 1)] = 0.004167 and t0.004167(n − g) = 3.808.
A (1 − α)100% confidence interval for the difference τ`j − τkj is:

(x̄`j − x̄kj) ± t_{α/[pg(g−1)]}(n − g) √[(1/n` + 1/nk) wjj/(n − g)]

where wjj is the jth diagonal element of W.

For component 1:

τ11 − τ21: (x̄11 − x̄21) ± 3.808 √[(1/n1 + 1/n2) w11/(12 − 3)] = 4 ± 3.808 √[(1/5 + 1/3)(18/9)] = (0.067, 7.933)

τ11 − τ31: (x̄11 − x̄31) ± 3.808 √[(1/n1 + 1/n3) w11/(12 − 3)] = 3 ± 3.808 √[(1/5 + 1/4)(18/9)] = (−0.613, 6.613)

τ21 − τ31: (x̄21 − x̄31) ± 3.808 √[(1/n2 + 1/n3) w11/(12 − 3)] = −1 ± 3.808 √[(1/3 + 1/4)(18/9)] = (−5.113, 3.113)
For component 2:

τ12 − τ22: (x̄12 − x̄22) ± 3.808 √[(1/n1 + 1/n2) w22/(12 − 3)] = 4 ± 3.808 √[(1/5 + 1/3)(38/9)] = (−1.714, 9.714)

τ12 − τ32: (x̄12 − x̄32) ± 3.808 √[(1/n1 + 1/n3) w22/(12 − 3)] = 6 ± 3.808 √[(1/5 + 1/4)(38/9)] = (0.751, 11.249)

τ22 − τ32: (x̄22 − x̄32) ± 3.808 √[(1/n2 + 1/n3) w22/(12 − 3)] = 2 ± 3.808 √[(1/3 + 1/4)(38/9)] = (−3.976, 7.976)
There is a significant difference between treatment 1 and treatment 2 in component 1.
Also, there is a significant difference between treatment 1 and treatment 3 in component
2.
