Section 1 - Multivariate Data and Matrix Algebra
By
Dr. Richard Tuyiragize
School of Statistics and Planning
Makerere University
MVA gathers and combines all available information on many variables in order to make predictions and answer questions. MVA techniques play an important role in data analysis in almost all branches of knowledge, including the social sciences, physical sciences, medical sciences, and engineering. With advances in computer technology, the applications of MVA are increasing, because hundreds of factors can be considered in solving a problem.
2. Reduction of the dimensionality of the data: This refers to methods that are
   primarily applied to reduce the data to a few manageable variables which can then
   be used for further analysis. The new variables are in general uncorrelated and are
   linear combinations of the original variables. The most widely used method for
   data reduction is Principal Component Analysis (PCA), which forms linear combinations
   of the original variables that are uncorrelated and have decreasing variance (see the
   sketch after this list).
3. Classification: These techniques are used to classify individuals into distinct sub-
   groups on the basis of a number of (possibly independent) variables. When the
   groups are known a priori, discriminant analysis is used to allocate members to the
   correct subgroups; a priori information on group membership is required.
   However, if it is only suspected that natural groupings may exist and the aim is to
   discover such groups, one uses a technique known as cluster analysis to identify the
   groups in question.
4. Dependency: The relationship between variables can be determined for the purpose
   of predicting the values of one or more variables on the basis of observations on the other
   variables. Examples: multivariate linear regression, multivariate analysis of variance.
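Item 2 above mentions Principal Component Analysis. The following is a minimal NumPy sketch of PCA via the eigendecomposition of the sample covariance matrix; the data and variable values are hypothetical illustrations, not taken from these notes.

```python
import numpy as np

# Hypothetical n x p data matrix: n = 5 items, p = 3 variables
X = np.array([[2.0,  4.0, 1.0],
              [3.0,  5.0, 0.5],
              [4.0,  7.0, 1.5],
              [5.0,  8.0, 2.0],
              [6.0, 10.0, 2.5]])

Xc = X - X.mean(axis=0)            # centre each variable
S = np.cov(Xc, rowvar=False)       # p x p sample covariance matrix

# Eigendecomposition of the symmetric matrix S: eigenvalues are the
# component variances, eigenvectors the component loadings
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]  # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs              # principal component scores
print(eigvals)                     # decreasing variances
print(np.cov(scores, rowvar=False).round(10))  # ~diagonal: PCs are uncorrelated
```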
2 Structure of multivariate data set
Most multivariate data sets can be represented in a rectangular format, in which the ele-
ments of each row correspond to the variable values of a particular unit in the data set and
the elements of the columns correspond to the values taken by a particular variable.
Suppose there are $p \geq 2$ variables (characteristics) measured on $n$ items. Let $x_{ij}$ denote the value of the $j$th variable on the $i$th item ($i = 1, 2, \cdots, n$ and $j = 1, 2, \cdots, p$, with $n \gg p$). The data set can then be arranged as the $n \times p$ matrix
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{pmatrix}
\]
If the index $i = 1, 2, \cdots, n$ represents time, we have a time series data set: data collected at different points in time, for instance the GDP of Uganda for the years 2000-2020.
If time series and cross-sectional data are combined, we have a panel data set, also called a longitudinal data set. (A data set is longitudinal if it tracks the same type of information on the same subjects at multiple points in time.)
If $p = 1$, the matrix reduces to an $n \times 1$ matrix (a univariate data set).
Note that multivariate analysis techniques are generalizations of univariate and bivariate techniques.
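As a concrete illustration of this rectangular layout, here is a minimal NumPy sketch of an $n \times p$ data matrix; the variables and numbers are hypothetical.

```python
import numpy as np

# Hypothetical data: n = 4 items (rows) measured on p = 3 variables (columns)
X = np.array([[170.0, 65.0, 21.0],   # item 1: height, weight, age
              [165.0, 58.0, 19.0],   # item 2
              [180.0, 72.0, 23.0],   # item 3
              [175.0, 70.0, 22.0]])  # item 4

n, p = X.shape
print(n, p)      # 4 3
print(X[1, 2])   # x_23 in 1-based notation: the 3rd variable on the 2nd item
```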
3 Types of variables
Variables are generally divided into two major categories: quantitative and categorical.
Quantitative variables: When you collect quantitative data, the numbers you record rep-
resent real amounts that can be added, subtracted, divided, etc. There are two types of
quantitative variables: discrete and continuous.
Discrete variables (aka integer variables) represent counts of individual items or values, for example the number of students in a class or the number of different tree species in a forest. Continuous variables (aka ratio variables) represent measurements of continuous or non-finite values, for example distance, volume, or age.
Categorical variables: These represent groupings of some kind. They are sometimes recorded
as numbers, but the numbers represent categories rather than actual amounts of things.
There are three types of categorical variables: binary, nominal, and ordinal variables.
Binary variables (aka dichotomous variables) represent yes/no outcomes, for example heads/tails in a coin flip or win/lose in a football game.
Nominal variables represent groups with no rank or order between them, for example species names, colours, or brands.
Ordinal variables represent groups that are ranked in a specific order, for example finishing place in a race, income level, or rating scale responses.
Most of the techniques we are going to deal with in this course assume variables of a numerical nature for which we can compute descriptive statistics, that is, means, variances, and standard deviations.
The study of multivariate methods is greatly facilitated by the use of matrix algebra. The
next section presents a review of basic concepts of matrix algebra which are essential for the
interpretation and explanation of subsequent multivariate statistical techniques.
4 Matrix algebra
A matrix is a rectangular array of numbers arranged in $n$ rows and $p$ columns. It is written as:
\[
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nj} & \cdots & x_{np}
\end{pmatrix}
\]
If
\[
A = \begin{pmatrix}
a_{11} & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{pmatrix}
\]
(all entries above the diagonal are zero), then A is a lower triangular matrix.
If
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
0 & a_{22} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & a_{pp}
\end{pmatrix}
\]
(all entries below the diagonal are zero), then A is an upper triangular matrix.
A vector is an $n \times 1$ matrix of real numbers:
\[
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\quad \text{or, transposed,} \quad
\mathbf{x}' = (x_1, x_2, \cdots, x_n)
\]
For a column vector x, if $n = 1$ it reduces to a $1 \times 1$ matrix, called a scalar and denoted $a_{11}$.
If all the components are zeros, then the vector x is called the zero (or null) vector, denoted $\mathbf{0}$.
A vector has both magnitude (length) and direction. The length of a vector $\mathbf{x}' = (x_1, x_2, \cdots, x_n)$ is defined by
\[
L_x = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2} = \sqrt{\mathbf{x}'\mathbf{x}}
\]
The length of a vector can be expanded or contracted by multiplying the vector by a constant $a$:
\[
a\mathbf{x} = \begin{pmatrix} ax_1 \\ ax_2 \\ \vdots \\ ax_n \end{pmatrix}
\]
Such multiplication of a vector x by a scalar a changes the length as
\[
L_{ax} = \sqrt{a^2x_1^2 + a^2x_2^2 + \cdots + a^2x_n^2} = |a|\sqrt{\mathbf{x}'\mathbf{x}} = |a| L_x
\]
When $|a| > 1$, the vector x is expanded; when $|a| < 1$, it is contracted; when $|a| = 1$, the length is unchanged. If $a < 0$, the direction of the vector is reversed.
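As a quick check of these formulas, here is a minimal NumPy sketch; the vector and scalar are hypothetical examples, not from the notes.

```python
import numpy as np

x = np.array([3.0, 4.0])           # hypothetical vector
a = -2.0                           # hypothetical scalar

L_x = np.sqrt(x @ x)               # length: sqrt(x'x)
L_ax = np.sqrt((a * x) @ (a * x))  # length of the scaled vector

print(L_x)                         # 5.0
print(L_ax)                        # 10.0
print(abs(a) * L_x)                # 10.0, confirming L_ax = |a| * L_x
```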
For two vectors x and y, we define the dot product $\mathbf{x} \cdot \mathbf{y}$ as the sum of the products of corresponding components:
\[
\mathbf{x}'\mathbf{y} = x_1y_1 + x_2y_2 + \cdots + x_py_p
= (x_1 \;\; x_2 \;\; \cdots \;\; x_p)\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{pmatrix}
\]
Example

If $\mathbf{x} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\mathbf{y} = \begin{pmatrix} 0 \\ 3 \end{pmatrix}$, then
\[
\mathbf{x} \cdot \mathbf{y} = \mathbf{x}'\mathbf{y} = (1 \;\; 2)\begin{pmatrix} 0 \\ 3 \end{pmatrix} = (1)(0) + (2)(3) = 6
\]
Note that the dot product is always a $1 \times 1$ matrix, i.e. a scalar.
In particular, the dot product of a vector with itself gives the sum of squares of its components:
\[
\sum_{j=1}^{p} x_j^2 = (x_1 \;\; x_2 \;\; \cdots \;\; x_p)\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix} = \mathbf{x}'\mathbf{x}
\]
In multivariate analysis, sums of squares and sums of products are dot products. Thus
\[
\sum_{j=1}^{p} (x_j - y_j)^2 = (x_1 - y_1 \;\; x_2 - y_2 \;\; \cdots \;\; x_p - y_p)\begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \\ \vdots \\ x_p - y_p \end{pmatrix} = (\mathbf{x} - \mathbf{y})'(\mathbf{x} - \mathbf{y})
\]
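The same dot products in NumPy, using the vectors from the example above:

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([0.0, 3.0])

print(x @ y)              # dot product x'y = 6.0
print(x @ x)              # sum of squares x'x = 5.0
print((x - y) @ (x - y))  # (x - y)'(x - y) = 1 + 1 = 2.0
```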
The rank of a matrix A is the maximum number of linearly independent rows (columns).
Linear independence means that no vector in the set can be written as a linear combination of the other vectors. Vectors of the same dimension that are not linearly independent are said to be linearly dependent, which means at least one vector can be written as a linear combination of the others.
Example

Let $A = \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix}$, with columns $\mathbf{x}_1 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$ and $\mathbf{x}_2 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$.

Setting $a_1\mathbf{x}_1 + a_2\mathbf{x}_2 = \mathbf{0}$ gives $3a_1 + 2a_2 = 0$ and $4a_1 + a_2 = 0$, which holds only if $a_1 = a_2 = 0$. This confirms that $\mathbf{x}_1$ and $\mathbf{x}_2$ are linearly independent; in other words, the columns of A are linearly independent, so rank$(A) = 2$.
Example

Let $A = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 5 & -1 \\ 0 & 1 & -1 \end{pmatrix}$, with columns $\mathbf{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$, $\mathbf{x}_2 = \begin{pmatrix} 1 \\ 5 \\ 1 \end{pmatrix}$, $\mathbf{x}_3 = \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}$.

Here $a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3 = \mathbf{0}$ has a non-trivial solution, e.g. $a_1 = 2$, $a_2 = -1$, $a_3 = -1$ (check: $2\mathbf{x}_1 - \mathbf{x}_2 - \mathbf{x}_3 = \mathbf{0}$), so the columns are linearly dependent and rank$(A) = 2$.
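Both rank computations can be confirmed with NumPy:

```python
import numpy as np

A1 = np.array([[3, 2],
               [4, 1]])
A2 = np.array([[1, 1,  1],
               [2, 5, -1],
               [0, 1, -1]])

print(np.linalg.matrix_rank(A1))  # 2: columns linearly independent
print(np.linalg.matrix_rank(A2))  # 2 < 3: columns linearly dependent
```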
Trace

The trace of a $k \times k$ square matrix A is the sum of its diagonal elements: $\mathrm{tr}(A) = \sum_{i=1}^{k} a_{ii}$.
Transpose of a matrix

When a new matrix is formed by interchanging the rows and columns of a matrix A, the newly formed matrix is the transpose of A, denoted $A^T$ or $A'$.

For matrices A and B, $(AB)' = B'A'$ and $(A \pm B)' = A' \pm B'$.
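A minimal NumPy illustration of the trace and the transpose identities, using hypothetical matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [2, 5]])

print(np.trace(A))                            # 1 + 4 = 5
print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)' = B'A'
print(np.array_equal((A + B).T, A.T + B.T))   # True: (A + B)' = A' + B'
```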
Square matrix

A matrix with the same number of rows as columns, $A_{p \times p}$, is called a square matrix. If all the off-diagonal entries of a square matrix are zeros, A becomes a diagonal matrix, denoted $\mathrm{Diag}_p(a_{ii})$.
Symmetric matrix
A square matrix $A_{p \times p}$ is said to be symmetric if $A_{p \times p} = A^T_{p \times p}$, so that $a_{ij} = a_{ji}$ for every $i$ and $j$:
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1p} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2p} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pj} & \cdots & a_{pp}
\end{pmatrix}, \quad a_{ij} = a_{ji}
\]
Since for a symmetric matrix A, $a_{ij} = a_{ji}$ for all $i, j$, we may decide to write only the upper or the lower triangle, i.e.
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
 & a_{22} & \cdots & a_{2p} \\
 & & \ddots & \vdots \\
 & & & a_{pp}
\end{pmatrix}
\quad \text{or} \quad
A = \begin{pmatrix}
a_{11} & & & \\
a_{21} & a_{22} & & \\
\vdots & \vdots & \ddots & \\
a_{p1} & a_{p2} & \cdots & a_{pp}
\end{pmatrix}
\]
Determinant of matrix

The determinant of a square matrix A of order $p \times p$, denoted $\det(A)$ or $|A|$, is the scalar
\[
|A| = \sum_{j=1}^{p} a_{ij} A_{ij} \quad \text{(expanding along any row } i\text{)},
\]
where $A_{ij} = (-1)^{i+j} \times$ the determinant of the submatrix obtained by deleting the $i$th row and $j$th column of A.
Example
Given a diagonal matrix $A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 1 \end{pmatrix}$, the determinant is the product of the diagonal entries:
\[
|A| = 2 \times 4 \times 1 = 8
\]
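A quick NumPy check of this example:

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 4.0, 0.0],
              [0.0, 0.0, 1.0]])

print(round(np.linalg.det(A), 6))  # 8.0 (up to floating-point error)
print(np.prod(np.diag(A)))         # 8.0: diagonal case = product of the diagonal
```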
Inverse of matrix

For a square matrix A of order p, i.e. $A_{p \times p}$, if there exists another square matrix B such that $AB = BA = I_p$, then B is termed the inverse of A, written $B = A^{-1}$. We then say that A is invertible or non-singular, i.e. $|A| \neq 0$.
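A minimal NumPy sketch of the defining property $AB = BA = I_p$, using a hypothetical invertible matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # hypothetical matrix with det(A) = 5 != 0

B = np.linalg.inv(A)                  # the inverse A^{-1}
print(np.allclose(A @ B, np.eye(2)))  # True: AB = I_2
print(np.allclose(B @ A, np.eye(2)))  # True: BA = I_2
```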
Eigen values and eigen vectors

Let A be a $k \times k$ square matrix and $I_k$ a $k \times k$ identity matrix. The scalars $\lambda_1, \lambda_2, \cdots, \lambda_k$ satisfying the polynomial equation $|A - \lambda I| = 0$ are called the eigen values or characteristic roots of matrix A.
Example
Given $A = \begin{pmatrix} 4 & 0 \\ 1 & 2 \end{pmatrix}$, obtain the eigen values of matrix A.

Solution

From $|A - \lambda I| = 0$:
\[
\left| \begin{pmatrix} 4 & 0 \\ 1 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| = (4 - \lambda)(2 - \lambda) = 0 \;\Rightarrow\; \lambda_1 = 4, \;\; \lambda_2 = 2
\]
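A quick NumPy check of this example:

```python
import numpy as np

A = np.array([[4.0, 0.0],
              [1.0, 2.0]])

eigvals = np.linalg.eigvals(A)
print(sorted(eigvals))  # [2.0, 4.0], matching lambda = 4 and lambda = 2
```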
If, for an eigen value $\lambda$ of A, there exists a non-zero vector x ($\mathbf{x} \neq \mathbf{0}$) such that $A\mathbf{x} = \lambda\mathbf{x}$, then x is said to be an eigen vector of matrix A associated with the eigen value $\lambda$.
Multiplying an eigen vector by the square matrix changes its length, or reverses its direction, but does not otherwise change its direction.
To each eigen value $\lambda_i$ there exists a corresponding eigen vector $\mathbf{x}_i$. Eigen vectors are not unique, as they contain an arbitrary scale factor, so they are usually normalized so that $\mathbf{x}'\mathbf{x} = 1$.
Example
Let $A = \begin{pmatrix} 6 & 16 \\ -1 & -4 \end{pmatrix}$ and $\mathbf{x} = \begin{pmatrix} -8 \\ 1 \end{pmatrix}$. Then
\[
A\mathbf{x} = \begin{pmatrix} 6 & 16 \\ -1 & -4 \end{pmatrix}\begin{pmatrix} -8 \\ 1 \end{pmatrix} = \begin{pmatrix} -32 \\ 4 \end{pmatrix} = 4\begin{pmatrix} -8 \\ 1 \end{pmatrix},
\]
so $\begin{pmatrix} -8 \\ 1 \end{pmatrix}$ is an eigen vector with an eigen value of 4.
Example
Let $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $\mathbf{x} = \begin{pmatrix} 3 \\ -3 \end{pmatrix}$. Then
\[
A\mathbf{x} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 3 \\ -3 \end{pmatrix} = \begin{pmatrix} 3 \\ -3 \end{pmatrix} = 1\begin{pmatrix} 3 \\ -3 \end{pmatrix},
\]
so $\begin{pmatrix} 3 \\ -3 \end{pmatrix}$ is an eigen vector with an eigen value of 1.
Example
Find the eigen values and eigen vectors of $A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}$.
From $|A - \lambda I| = 0$:
\[
\left| \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| = \begin{vmatrix} 1-\lambda & 2 \\ 3 & 2-\lambda \end{vmatrix} = 0
\]
$(1 - \lambda)(2 - \lambda) - 6 = 0 \;\Rightarrow\; \lambda^2 - 3\lambda - 4 = 0 \;\Rightarrow\; (\lambda - 4)(\lambda + 1) = 0 \;\Rightarrow\; \lambda_1 = 4, \;\; \lambda_2 = -1$.
For $\lambda_1 = 4$:
\[
A\mathbf{x}_1 = \lambda_1\mathbf{x}_1 \;\Rightarrow\; \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix} = 4\begin{pmatrix} x_{11} \\ x_{21} \end{pmatrix}
\]
$\Rightarrow x_{11} + 2x_{21} = 4x_{11} \Rightarrow x_{21} = \frac{3}{2}x_{11}$.

Let $x_{11} = 2 \Rightarrow x_{21} = 3$. Thus $\mathbf{x}_1 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}$, which is not unique.

The normalized eigen vector of $\mathbf{x}_1$ is
\[
\mathbf{e}_1 = \frac{1}{\sqrt{\mathbf{x}_1'\mathbf{x}_1}}\,\mathbf{x}_1 = \frac{1}{\sqrt{4+9}}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2/\sqrt{13} \\ 3/\sqrt{13} \end{pmatrix}
\]
Note that $\mathbf{e}_1'\mathbf{e}_1 = 1$.
For $\lambda_2 = -1$:
\[
A\mathbf{x}_2 = \lambda_2\mathbf{x}_2 \;\Rightarrow\; \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix}\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix} = -1\begin{pmatrix} x_{12} \\ x_{22} \end{pmatrix}
\]
$\Rightarrow x_{12} + 2x_{22} = -x_{12} \Rightarrow x_{22} = -x_{12}$.

Let $x_{12} = 1 \Rightarrow x_{22} = -1$. Thus $\mathbf{x}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, which is not unique.

The normalized eigen vector of $\mathbf{x}_2$ is
\[
\mathbf{e}_2 = \frac{1}{\sqrt{\mathbf{x}_2'\mathbf{x}_2}}\,\mathbf{x}_2 = \frac{1}{\sqrt{1+1}}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}
\]
Note that $\mathbf{e}_2'\mathbf{e}_2 = 1$. Here A is not symmetric, so $\mathbf{e}_1$ and $\mathbf{e}_2$ need not be orthogonal; indeed $\mathbf{e}_1'\mathbf{e}_2 = (2 - 3)/\sqrt{26} = -1/\sqrt{26} \neq 0$. For a symmetric matrix, eigen vectors corresponding to distinct eigen values are orthogonal (perpendicular), that is, $\mathbf{e}_1'\mathbf{e}_2 = 0$.
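A NumPy check of this worked example; note that NumPy returns unit-length eigen vectors, possibly with signs flipped relative to the hand computation.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)        # 4 and -1 (order may vary)
print(eigvecs)        # columns proportional to (2, 3)' and (1, -1)', unit length

e1, e2 = eigvecs[:, 0], eigvecs[:, 1]
print(abs(e1 @ e2))   # ~0.196 = 1/sqrt(26): not orthogonal, since A is not symmetric
```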
Matrix partitioning

A matrix can be partitioned into blocks of submatrices. Partitioning is also a useful way of determining the inverse or determinant of square matrices.

Given a matrix $A = \begin{pmatrix} 2 & 3 & 4 \\ 3 & 2 & 3 \\ 4 & 3 & 4 \end{pmatrix}$, A can be partitioned as $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$, where
\[
A_{11} = (2), \quad A_{12} = (3 \;\; 4), \quad A_{21} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad A_{22} = \begin{pmatrix} 2 & 3 \\ 3 & 4 \end{pmatrix}.
\]
Student to complete!!!
Question:

Find the determinant of matrix A using matrix partitioning, where
\[
A = \begin{pmatrix}
1 & x & x^2 & 0 \\
0 & 1 & x & x^2 \\
x^2 & 0 & 1 & x \\
x & x^2 & 0 & 1
\end{pmatrix}
\]
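To check a hand computation of this determinant, one might use SymPy; this is only a verification sketch, not the intended partitioning method.

```python
import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[1,    x,    x**2, 0   ],
               [0,    1,    x,    x**2],
               [x**2, 0,    1,    x   ],
               [x,    x**2, 0,    1   ]])

# Expand the 4x4 determinant symbolically, to compare
# against the answer obtained by partitioning.
print(sp.expand(A.det()))
```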
Quadratic forms

For a square symmetric matrix A of order p and a vector x, consider the scalar
\[
\mathbf{x}'A\mathbf{x} = \sum_{i=1}^{p}\sum_{j=1}^{p} a_{ij}x_i x_j
\]
All the terms in $\mathbf{x}'A\mathbf{x}$ are of 2nd order, i.e. quadratic, and the expression is called a quadratic form of the matrix A, denoted $Q(f)$.

A square symmetric matrix A and its $Q(f)$ are said to be positive semi-definite if the $Q(f)$ is always positive or equal to zero for all non-zero x: $\mathbf{x}'A\mathbf{x} \geq 0 \;\; \forall \;\; \mathbf{x} \neq \mathbf{0}$.

A square symmetric matrix A of order p (and its $Q(f)$) is strictly positive definite if the $Q(f)$ is strictly positive for all non-zero x: $\mathbf{x}'A\mathbf{x} > 0 \;\; \forall \;\; \mathbf{x} \neq \mathbf{0}$.
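A common numerical check, sketched here with a hypothetical symmetric matrix: A is positive definite exactly when all eigenvalues of A are positive (non-negative for positive semi-definite).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # hypothetical symmetric matrix

eigvals = np.linalg.eigvalsh(A)    # eigenvalues of a symmetric matrix
print(eigvals)                     # [1. 3.]: all > 0, so A is positive definite

x = np.array([1.0, -1.0])          # any non-zero x
print(x @ A @ x)                   # 2.0 > 0, consistent with positive definiteness
```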