
Basic Concepts in Matrix Algebra

• A column array of p elements is called a vector of dimension p and is written as

$$x_{p\times 1} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix}.$$

• The transpose of the column vector $x_{p\times 1}$ is the row vector

$$x' = [x_1 \; x_2 \; \cdots \; x_p]$$

• A vector can be represented in p-space as a directed line with components along the p axes.
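As an added sketch (not part of the original slides), the following NumPy snippet builds a column vector and its transpose; the numeric values are arbitrary.

```python
import numpy as np

# A column vector x of dimension p = 3 (values chosen arbitrarily)
x = np.array([[2.0], [1.0], [-4.0]])   # shape (3, 1)

# Its transpose x' is a 1 x 3 row vector
x_t = x.T                              # shape (1, 3)

print(x)
print(x_t)
```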

38
Basic Matrix Concepts (cont’d)

• Two vectors can be added if they have the same dimension. Addition
is carried out elementwise.
     
$$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_p \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_p + y_p \end{bmatrix}$$

• A vector can be contracted or expanded if multiplied by a constant c. Multiplication is also elementwise.

$$cx = c \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \begin{bmatrix} cx_1 \\ cx_2 \\ \vdots \\ cx_p \end{bmatrix}$$
39
Examples

 
$$x = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} \quad \text{and} \quad x' = \begin{bmatrix} 2 & 1 & -4 \end{bmatrix}$$

$$6 \times x = 6 \times \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} = \begin{bmatrix} 6 \times 2 \\ 6 \times 1 \\ 6 \times (-4) \end{bmatrix} = \begin{bmatrix} 12 \\ 6 \\ -24 \end{bmatrix}$$

$$x + y = \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix} + \begin{bmatrix} 5 \\ -2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 + 5 \\ 1 - 2 \\ -4 + 0 \end{bmatrix} = \begin{bmatrix} 7 \\ -1 \\ -4 \end{bmatrix}$$
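A quick NumPy check of these two computations (an added sketch; the numbers come from the example above):

```python
import numpy as np

x = np.array([2, 1, -4])
y = np.array([5, -2, 0])

print(6 * x)   # [ 12   6 -24]
print(x + y)   # [ 7 -1 -4]
```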

40
Basic Matrix Concepts (cont’d)

• Multiplication by c > 0 does not change the direction of x. Direction is reversed if c < 0.

41
Basic Matrix Concepts (cont’d)

• The length of a vector x is the Euclidean distance from the origin:

$$L_x = \sqrt{\sum_{j=1}^{p} x_j^2}$$

• Multiplication of a vector x by a constant c changes the length:

$$L_{cx} = \sqrt{\sum_{j=1}^{p} c^2 x_j^2} = |c| \sqrt{\sum_{j=1}^{p} x_j^2} = |c| L_x$$

• If $c = L_x^{-1}$, then cx is a vector of unit length.

42
Examples
 
The length of $x = \begin{bmatrix} 2 \\ 1 \\ -4 \\ -2 \end{bmatrix}$ is

$$L_x = \sqrt{(2)^2 + (1)^2 + (-4)^2 + (-2)^2} = \sqrt{25} = 5$$

Then

$$z = \frac{1}{5} \times \begin{bmatrix} 2 \\ 1 \\ -4 \\ -2 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.2 \\ -0.8 \\ -0.4 \end{bmatrix}$$
is a vector of unit length.
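The same computation in NumPy (an added sketch; `np.linalg.norm` gives the Euclidean length):

```python
import numpy as np

x = np.array([2, 1, -4, -2])
L_x = np.linalg.norm(x)   # Euclidean length: 5.0
z = x / L_x               # unit-length vector [ 0.4  0.2 -0.8 -0.4]

print(L_x, z, np.linalg.norm(z))   # the norm of z is 1.0
```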
43
Angle Between Vectors

• Consider two vectors x and y in two dimensions. If $\theta_1$ is the angle between x and the horizontal axis and $\theta_2 > \theta_1$ is the angle between y and the horizontal axis, then

$$\cos(\theta_1) = \frac{x_1}{L_x}, \quad \cos(\theta_2) = \frac{y_1}{L_y}, \quad \sin(\theta_1) = \frac{x_2}{L_x}, \quad \sin(\theta_2) = \frac{y_2}{L_y}.$$

If θ is the angle between x and y, then

$$\cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1).$$

Then

$$\cos(\theta) = \frac{x_1 y_1 + x_2 y_2}{L_x L_y}.$$

44
Angle Between Vectors (cont’d)

45
Inner Product

• The inner product between two vectors x and y is

$$x'y = \sum_{j=1}^{p} x_j y_j.$$

• Then $L_x = \sqrt{x'x}$, $L_y = \sqrt{y'y}$, and

$$\cos(\theta) = \frac{x'y}{\sqrt{x'x}\,\sqrt{y'y}}$$

• Since $\cos(\theta) = 0$ when $x'y = 0$, and $\cos(\theta) = 0$ only for $\theta = 90°$ or $\theta = 270°$, the vectors are perpendicular (orthogonal) when $x'y = 0$.
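A small NumPy illustration of the inner product, the angle formula, and an orthogonality check (added sketch; the vectors are arbitrary):

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0])
y = np.array([5.0, -2.0, 0.0])

inner = x @ y                                        # x'y
cos_theta = inner / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))

print(inner, theta)
# An orthogonal pair: the inner product is 0
print(np.array([1.0, 1.0]) @ np.array([1.0, -1.0]))  # 0.0
```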
46
Linear Dependence
• Two vectors, x and y, are linearly dependent if there exist two constants $c_1$ and $c_2$, not both zero, such that

$$c_1 x + c_2 y = 0$$

• If two vectors are linearly dependent, then one can be written as a linear combination of the other. From above (for $c_1 \neq 0$):

$$x = -(c_2/c_1)\, y$$

• k vectors $x_1, x_2, \ldots, x_k$ are linearly dependent if there exist constants $c_1, c_2, \ldots, c_k$, not all zero, such that

$$\sum_{j=1}^{k} c_j x_j = 0$$

47
• Vectors of the same dimension that are not linearly dependent are said to be linearly independent.

Linear Independence: Example

Let

$$x_1 = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \quad x_3 = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$

Then $c_1 x_1 + c_2 x_2 + c_3 x_3 = 0$ if

$$\begin{aligned}
c_1 + c_2 + c_3 &= 0 \\
2c_1 + 0 - 2c_3 &= 0 \\
c_1 - c_2 + c_3 &= 0
\end{aligned}$$

The unique solution is $c_1 = c_2 = c_3 = 0$, so the vectors are linearly independent.
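One way to check this numerically (an added sketch) is to stack the vectors as columns and look at the matrix rank:

```python
import numpy as np

x1 = np.array([1, 2, 1])
x2 = np.array([1, 0, -1])
x3 = np.array([1, -2, 1])

A = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(A))  # 3 => the vectors are linearly independent
```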
48
Projections

• The projection of x on y is defined by

$$\text{Projection of } x \text{ on } y = \frac{x'y}{y'y}\, y = \frac{x'y}{L_y} \frac{1}{L_y}\, y.$$

• The length of the projection is

$$\text{Length of projection} = \frac{|x'y|}{L_y} = L_x \frac{|x'y|}{L_x L_y} = L_x |\cos(\theta)|,$$
where θ is the angle between x and y.
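A NumPy sketch of the projection formula (added for illustration; the vectors are arbitrary):

```python
import numpy as np

x = np.array([2.0, 1.0, -4.0])
y = np.array([5.0, -2.0, 0.0])

proj = (x @ y) / (y @ y) * y               # projection of x on y
length = abs(x @ y) / np.linalg.norm(y)    # length of the projection

print(proj, length)
```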

49
Matrices

A matrix A is an array of elements $a_{ij}$ with n rows and p columns:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix}$$

The transpose $A'$ has p rows and n columns. The j-th row of $A'$ is the j-th column of A:

$$A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{np} \end{bmatrix}$$

50
Matrix Algebra

• Multiplication of A by a constant c is carried out element by element.

$$cA = \begin{bmatrix} ca_{11} & ca_{12} & \cdots & ca_{1p} \\ ca_{21} & ca_{22} & \cdots & ca_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ ca_{n1} & ca_{n2} & \cdots & ca_{np} \end{bmatrix}$$

51
Matrix Addition
Two matrices $A_{n\times p} = \{a_{ij}\}$ and $B_{n\times p} = \{b_{ij}\}$ of the same dimensions can be added element by element. The resulting matrix is $C_{n\times p} = \{c_{ij}\} = \{a_{ij} + b_{ij}\}$:

$$C = A + B = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{np} \end{bmatrix}$$

$$= \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & \cdots & a_{1p} + b_{1p} \\ a_{21} + b_{21} & a_{22} + b_{22} & \cdots & a_{2p} + b_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} + b_{n1} & a_{n2} + b_{n2} & \cdots & a_{np} + b_{np} \end{bmatrix}$$
52
Examples

 
" #0 2 5
2 1 −4
= 1 7 
 
5 7 0
−4 0

" # " #
2 1 −4 12 6 −24
6× =
5 7 0 30 42 0

" # " # " #


2 −1 2 1 4 0
+ =
0 3 5 7 5 10

53
Matrix Multiplication

• Multiplication of two matrices $A_{n\times p}$ and $B_{m\times q}$ can be carried out only if the matrices are compatible for multiplication:

  – $A_{n\times p} \times B_{m\times q}$: compatible if p = m.

  – $B_{m\times q} \times A_{n\times p}$: compatible if q = n.

  The element in the i-th row and the j-th column of A × B is the inner product of the i-th row of A with the j-th column of B.

54
Multiplication Examples

 
" # 1 4 " #
2 0 1 2 10
×  −1 3  =
 
5 1 3 4 29
0 2

" # " # " #


2 1 1 4 1 11
× =
5 3 −1 3 2 29

" # " # " #


1 4 2 1 22 13
× =
−1 3 5 3 13 8
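The last two products also show that matrix multiplication is not commutative. A NumPy check of the same numbers (added sketch):

```python
import numpy as np

A = np.array([[2, 1], [5, 3]])
B = np.array([[1, 4], [-1, 3]])

print(A @ B)  # [[ 1 11] [ 2 29]]
print(B @ A)  # [[22 13] [13  8]]  -- AB != BA in general
```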

55
Identity Matrix

• An identity matrix, denoted by I, is a square matrix with 1's along the main diagonal and 0's everywhere else. For example,

$$I_{2\times 2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad I_{3\times 3} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

• If A is a square matrix, then AI = IA = A.

• $I_{n\times n} A_{n\times p} = A_{n\times p}$, but $A_{n\times p} I_{n\times n}$ is not defined for $p \neq n$.

56
Symmetric Matrices

• A square matrix is symmetric if $A = A'$.

• If a square matrix A has elements $\{a_{ij}\}$, then A is symmetric if $a_{ij} = a_{ji}$.

• Examples:

$$\begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} \qquad \begin{bmatrix} 5 & 1 & -3 \\ 1 & 12 & -5 \\ -3 & -5 & 9 \end{bmatrix}$$

57
Inverse Matrix

• Consider two square matrices $A_{k\times k}$ and $B_{k\times k}$. If

$$AB = BA = I,$$

  then B is the inverse of A, denoted $A^{-1}$.

• The inverse of A exists only if the columns of A are linearly independent.

• If $A = \mathrm{diag}\{a_{ii}\}$ is diagonal, then $A^{-1} = \mathrm{diag}\{1/a_{ii}\}$.

58
Inverse Matrix

• For a 2 × 2 matrix A, the inverse is

$$A^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{-1} = \frac{1}{\det(A)} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix},$$

  where $\det(A) = (a_{11} \times a_{22}) - (a_{12} \times a_{21})$ denotes the determinant of A.
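A quick sketch (added) comparing the closed-form 2 × 2 inverse with `np.linalg.inv`; the matrix is arbitrary but nonsingular:

```python
import numpy as np

A = np.array([[4.0, 2.0], [1.0, 3.0]])
det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
A_inv = (1.0 / det) * np.array([[A[1, 1], -A[0, 1]],
                                [-A[1, 0], A[0, 0]]])

print(np.allclose(A_inv, np.linalg.inv(A)))  # True
print(np.allclose(A @ A_inv, np.eye(2)))     # A A^{-1} = I
```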

59
Orthogonal Matrices

• A square matrix Q is orthogonal if

$$QQ' = Q'Q = I, \quad \text{or equivalently} \quad Q' = Q^{-1}.$$

• If Q is orthogonal, its rows and columns have unit length ($q_j'q_j = 1$) and are mutually perpendicular ($q_j'q_k = 0$ for any $j \neq k$).

60
Eigenvalues and Eigenvectors

• A square matrix A has an eigenvalue λ with corresponding eigenvector $z \neq 0$ if

$$Az = \lambda z$$

• The eigenvalues of A are the solutions to $|A - \lambda I| = 0$.

• A normalized eigenvector (of unit length) is denoted by e.

• A k × k matrix A has k pairs of eigenvalues and eigenvectors

$$\lambda_1, e_1 \quad \lambda_2, e_2 \quad \ldots \quad \lambda_k, e_k$$

  where $e_i'e_i = 1$, $e_i'e_j = 0$, and the eigenvectors are unique up to a change in sign unless two or more eigenvalues are equal.
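An added NumPy sketch that computes eigenvalues and normalized eigenvectors of a symmetric matrix with `np.linalg.eigh`:

```python
import numpy as np

A = np.array([[5.0, 1.0, -3.0],
              [1.0, 12.0, -5.0],
              [-3.0, -5.0, 9.0]])

lam, E = np.linalg.eigh(A)   # eigenvalues (ascending) and unit eigenvectors (columns)
print(lam)
print(np.allclose(A @ E[:, 0], lam[0] * E[:, 0]))  # Az = lambda z holds
```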

61
Spectral Decomposition

• Eigenvalues and eigenvectors will play an important role in this course. For example, principal components are based on the eigenvalues and eigenvectors of sample covariance matrices.

• The spectral decomposition of a k × k symmetric matrix A is

$$A = \lambda_1 e_1 e_1' + \lambda_2 e_2 e_2' + \ldots + \lambda_k e_k e_k'$$

$$= [e_1\; e_2\; \cdots\; e_k] \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k \end{bmatrix} [e_1\; e_2\; \cdots\; e_k]'$$

$$= P \Lambda P'$$
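An added sketch verifying the spectral decomposition A = PΛP' numerically:

```python
import numpy as np

A = np.array([[4.0, 2.0], [2.0, 4.0]])   # symmetric
lam, P = np.linalg.eigh(A)               # columns of P are unit eigenvectors

A_rebuilt = P @ np.diag(lam) @ P.T       # P Lambda P'
print(np.allclose(A, A_rebuilt))         # True
```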

62
Determinant and Trace

• The trace of a k × k matrix A is the sum of the diagonal elements, i.e., $\mathrm{trace}(A) = \sum_{i=1}^{k} a_{ii}$

• The trace of a square, symmetric matrix A is the sum of the eigenvalues, i.e., $\mathrm{trace}(A) = \sum_{i=1}^{k} a_{ii} = \sum_{i=1}^{k} \lambda_i$

• The determinant of a square, symmetric matrix A is the product of the eigenvalues, i.e., $|A| = \prod_{i=1}^{k} \lambda_i$
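A quick numerical check of both identities (added sketch; the matrix is arbitrary but symmetric):

```python
import numpy as np

A = np.array([[5.0, 1.0], [1.0, 3.0]])   # symmetric
lam = np.linalg.eigvalsh(A)

print(np.isclose(np.trace(A), lam.sum()))        # trace = sum of eigenvalues
print(np.isclose(np.linalg.det(A), lam.prod()))  # determinant = product of eigenvalues
```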

63
Rank of a Matrix

• The rank of a square matrix A is

– The number of linearly independent rows

– The number of linearly independent columns

– The number of non-zero eigenvalues

• The inverse of a k × k matrix A exists if and only if

  rank(A) = k,

  i.e., there are no zero eigenvalues.

64
Positive Definite Matrix

• For a k × k symmetric matrix A and a vector $x = [x_1, x_2, \ldots, x_k]'$, the quantity $x'Ax$ is called a quadratic form.

• Note that $x'Ax = \sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij} x_i x_j$

• If $x'Ax \geq 0$ for every vector x, both A and the quadratic form are said to be non-negative definite.

• If $x'Ax > 0$ for every vector $x \neq 0$, both A and the quadratic form are said to be positive definite.
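An added sketch that checks definiteness through the eigenvalues, the criterion justified on the next two slides; the helper name `is_positive_definite` is ours, and the test matrix is the one from Example 2.11 below:

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Return True if the symmetric matrix A has all eigenvalues > 0."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[3.0, -np.sqrt(2.0)],
              [-np.sqrt(2.0), 2.0]])    # matrix from Example 2.11
print(is_positive_definite(A))           # True (eigenvalues 4 and 1)
```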

65
Example 2.11


• Show that the matrix of the quadratic form $3x_1^2 + 2x_2^2 - 2\sqrt{2}\,x_1 x_2$ is positive definite.

• For

$$A = \begin{bmatrix} 3 & -\sqrt{2} \\ -\sqrt{2} & 2 \end{bmatrix},$$

  the eigenvalues are $\lambda_1 = 4$, $\lambda_2 = 1$. Then $A = 4e_1e_1' + e_2e_2'$. Write

$$x'Ax = 4x'e_1e_1'x + x'e_2e_2'x = 4y_1^2 + y_2^2 \geq 0,$$

  and it is zero only for $y_1 = y_2 = 0$.

66
Example 2.11 (cont’d)

• $y_1, y_2$ cannot both be zero because

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} e_1' \\ e_2' \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = P'_{2\times 2}\, x_{2\times 1}$$

  with $P'$ orthonormal, so that $(P')^{-1} = P$. Then $x = Py$, and since $x \neq 0$ it follows that $y \neq 0$.

• Using the spectral decomposition, we can show that:

– A is positive definite if all of its eigenvalues are positive.

– A is non-negative definite if all of its eigenvalues are ≥ 0.

67
Distance and Quadratic Forms

• For $x = [x_1, x_2, \ldots, x_p]'$ and a p × p positive definite matrix A,

$$d^2 = x'Ax > 0$$

  when $x \neq 0$. Thus, a positive definite quadratic form can be interpreted as a squared distance of x from the origin and vice versa.

• The squared distance from x to a fixed point µ is given by the quadratic form

$$(x - \mu)'A(x - \mu).$$

68
Distance and Quadratic Forms (cont’d)
• We can interpret distance in terms of eigenvalues and eigenvectors of A as well. Any point x at constant distance c from the origin satisfies

$$x'Ax = x'\Big(\sum_{j=1}^{p} \lambda_j e_j e_j'\Big)x = \sum_{j=1}^{p} \lambda_j (x'e_j)^2 = c^2,$$

  the expression for an ellipsoid in p dimensions.

• Note that the point $x = c\lambda_1^{-1/2} e_1$ is at a distance c (in the direction of $e_1$) from the origin because it satisfies $x'Ax = c^2$. The same is true for points $x = c\lambda_j^{-1/2} e_j$, $j = 1, \ldots, p$. Thus, all points at distance c lie on an ellipsoid with axes in the directions of the eigenvectors and with lengths proportional to $\lambda_j^{-1/2}$.

69
Distance and Quadratic Forms (cont’d)

70
Square-Root Matrices

• Spectral decomposition of a positive definite matrix A yields

$$A = \sum_{j=1}^{p} \lambda_j e_j e_j' = P \Lambda P',$$

  with $\Lambda_{p\times p} = \mathrm{diag}\{\lambda_j\}$, all $\lambda_j > 0$, and $P_{p\times p} = [e_1\; e_2\; \ldots\; e_p]$ an orthonormal matrix of eigenvectors. Then

$$A^{-1} = P \Lambda^{-1} P' = \sum_{j=1}^{p} \frac{1}{\lambda_j} e_j e_j'$$

• With $\Lambda^{1/2} = \mathrm{diag}\{\lambda_j^{1/2}\}$, a square-root matrix is

$$A^{1/2} = P \Lambda^{1/2} P' = \sum_{j=1}^{p} \sqrt{\lambda_j}\, e_j e_j'$$
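An added sketch computing A^{1/2} and A^{-1/2} from the eigendecomposition and checking two of the properties listed on the next slide:

```python
import numpy as np

A = np.array([[3.0, -np.sqrt(2.0)],
              [-np.sqrt(2.0), 2.0]])     # positive definite

lam, P = np.linalg.eigh(A)
A_half = P @ np.diag(np.sqrt(lam)) @ P.T        # A^{1/2} = P Lambda^{1/2} P'
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T

print(np.allclose(A_half @ A_half, A))                         # A^{1/2} A^{1/2} = A
print(np.allclose(A_neg_half @ A_neg_half, np.linalg.inv(A)))  # A^{-1/2} A^{-1/2} = A^{-1}
```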
71
Square-Root Matrices
The square root of a positive definite matrix A has the
following properties:

1. Symmetry: $(A^{1/2})' = A^{1/2}$

2. $A^{1/2}A^{1/2} = A$

3. $A^{-1/2} = \sum_{j=1}^{p} \lambda_j^{-1/2} e_j e_j' = P \Lambda^{-1/2} P'$

4. $A^{1/2}A^{-1/2} = A^{-1/2}A^{1/2} = I$

5. $A^{-1/2}A^{-1/2} = A^{-1}$

Note that there are other ways of defining the square root of a positive definite matrix: in the Cholesky decomposition $A = LL'$, with L a matrix of lower triangular form, L is also called a square root of A.
72
Random Vectors and Matrices
• A random matrix (vector) is a matrix (vector) whose elements are random variables.

• If $X_{n\times p}$ is a random matrix, the expected value of X is the n × p matrix

$$E(X) = \begin{bmatrix} E(X_{11}) & E(X_{12}) & \cdots & E(X_{1p}) \\ E(X_{21}) & E(X_{22}) & \cdots & E(X_{2p}) \\ \vdots & \vdots & \ddots & \vdots \\ E(X_{n1}) & E(X_{n2}) & \cdots & E(X_{np}) \end{bmatrix},$$

  where

$$E(X_{ij}) = \int_{-\infty}^{\infty} x_{ij} f_{ij}(x_{ij})\, dx_{ij}$$

  with $f_{ij}(x_{ij})$ the density function of the continuous random variable $X_{ij}$. If $X_{ij}$ is a discrete random variable, we compute its expectation as a sum rather than an integral.

73
Linear Combinations

• The usual rules for expectations apply. If X and Y are two random
matrices and A and B are two constant matrices of the appropriate
dimensions, then

$$E(X + Y) = E(X) + E(Y)$$
$$E(AX) = AE(X)$$
$$E(AXB) = AE(X)B$$
$$E(AX + BY) = AE(X) + BE(Y)$$

• Further, if c is a scalar-valued constant then

E(cX) = cE(X).

74
Mean Vectors and Covariance Matrices

• Suppose that X is a p × 1 (continuous) random vector drawn from some p-dimensional distribution.

• Each element of X, say $X_j$, has its own marginal distribution with marginal mean $\mu_j$ and variance $\sigma_{jj}$ defined in the usual way:

$$\mu_j = \int_{-\infty}^{\infty} x_j f_j(x_j)\, dx_j$$

$$\sigma_{jj} = \int_{-\infty}^{\infty} (x_j - \mu_j)^2 f_j(x_j)\, dx_j$$

75
Mean Vectors and Covariance Matrices (cont’d)

• To examine association between a pair of random variables we need to consider their joint distribution.

• A measure of the linear association between pairs of variables is given by the covariance

$$\sigma_{jk} = E\big[(X_j - \mu_j)(X_k - \mu_k)\big] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x_j - \mu_j)(x_k - \mu_k) f_{jk}(x_j, x_k)\, dx_j\, dx_k.$$

76
Mean Vectors and Covariance Matrices (cont’d)

• If the joint density function $f_{jk}(x_j, x_k)$ can be written as the product of the two marginal densities, i.e.,

$$f_{jk}(x_j, x_k) = f_j(x_j) f_k(x_k),$$

  then $X_j$ and $X_k$ are independent.

• More generally, the p−dimensional random vector X has


mutually independent elements if the p−dimensional joint
density function can be written as the product of the p
univariate marginal densities.

• If two random variables $X_j$ and $X_k$ are independent, then their covariance is equal to 0. [The converse is not always true.]

77
Mean Vectors and Covariance Matrices (cont’d)

• We use µ to denote the p × 1 vector of marginal population means and use Σ to denote the p × p population variance-covariance matrix:

$$\Sigma = E\big[(X - \mu)(X - \mu)'\big].$$

• If we carry out the multiplication (outer product), then Σ is equal to:

$$E \begin{bmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \cdots & (X_1 - \mu_1)(X_p - \mu_p) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 & \cdots & (X_2 - \mu_2)(X_p - \mu_p) \\ \vdots & \vdots & \ddots & \vdots \\ (X_p - \mu_p)(X_1 - \mu_1) & (X_p - \mu_p)(X_2 - \mu_2) & \cdots & (X_p - \mu_p)^2 \end{bmatrix}.$$

78
Mean Vectors and Covariance Matrices (cont’d)

• By taking expectations element-wise we find that

$$\Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}.$$

• Since $\sigma_{jk} = \sigma_{kj}$ for all $j \neq k$, we note that Σ is symmetric.

• Σ is also non-negative definite.

79
Correlation Matrix

• The population correlation matrix is the p × p matrix with off-diagonal elements equal to $\rho_{jk}$ and diagonal elements equal to 1:

$$\begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}.$$

• Since $\rho_{ij} = \rho_{ji}$, the correlation matrix is symmetric.

• The correlation matrix is also non-negative definite.

80
Correlation Matrix (cont’d)

• The p × p population standard deviation matrix $V^{1/2}$ is a diagonal matrix with $\sqrt{\sigma_{jj}}$ along the diagonal and zeros in all off-diagonal positions. Then

$$\Sigma = V^{1/2} \rho V^{1/2},$$

  where ρ denotes the population correlation matrix, and the population correlation matrix is

$$\rho = (V^{1/2})^{-1} \Sigma (V^{1/2})^{-1}$$

• Given Σ, we can easily obtain the correlation matrix
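An added sketch of this conversion: build V^{1/2} from the diagonal of Σ and rescale (the covariance values here are made up):

```python
import numpy as np

Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])

V_half = np.diag(np.sqrt(np.diag(Sigma)))        # standard deviation matrix
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv            # correlation matrix

print(rho)
print(np.allclose(V_half @ rho @ V_half, Sigma)) # Sigma = V^{1/2} rho V^{1/2}
```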

81
Partitioning Random vectors

• If we partition the random p × 1 vector X into two components $X_1, X_2$ of dimensions q × 1 and (p − q) × 1 respectively, then the mean vector and the variance-covariance matrix need to be partitioned accordingly.

• Partitioned mean vector:

$$E(X) = E\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \begin{bmatrix} E(X_1) \\ E(X_2) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}$$

• Partitioned variance-covariance matrix:

$$\Sigma = \begin{bmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12}' & \Sigma_{22} \end{bmatrix},$$

  where $\Sigma_{11}$ is q × q, $\Sigma_{12}$ is q × (p − q), and $\Sigma_{22}$ is (p − q) × (p − q).

82
Partitioning Covariance Matrices (cont’d)

• $\Sigma_{11}$, $\Sigma_{22}$ are the variance-covariance matrices of the sub-vectors $X_1, X_2$, respectively. The off-diagonal elements in those two matrices reflect linear associations among elements within each sub-vector.

• There are no variances in $\Sigma_{12}$, only covariances. These covariances reflect linear associations between elements in the two different sub-vectors.

83
Linear Combinations of Random variables

• Let X be a p × 1 vector with mean µ and variance-covariance matrix Σ, and let c be a p × 1 vector of constants. Then the linear combination $c'X$ has mean and variance:

$$E(c'X) = c'\mu \quad \text{and} \quad \mathrm{Var}(c'X) = c'\Sigma c$$

• In general, the mean and variance of a q × 1 vector of linear combinations $Z = C_{q\times p} X_{p\times 1}$ are

$$\mu_Z = C\mu_X \quad \text{and} \quad \Sigma_Z = C\Sigma_X C'.$$
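An added sketch checking E(c'X) = c'µ and Var(c'X) = c'Σc by Monte Carlo simulation (the mean, covariance, and weights are chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 9.0, -1.0],
                  [0.5, -1.0, 2.0]])
c = np.array([2.0, -1.0, 0.5])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
y = X @ c                           # samples of c'X

print(c @ mu, y.mean())             # theoretical vs simulated mean
print(c @ Sigma @ c, y.var())       # theoretical vs simulated variance
```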

84
Cauchy-Schwarz Inequality

• We will need some of the results below to derive some maximization results later in the course.

Cauchy-Schwarz inequality: Let b and d be any two p × 1 vectors. Then

$$(b'd)^2 \leq (b'b)(d'd)$$

with equality only if $b = cd$ for some scalar constant c.

Proof: The equality is obvious for $b = 0$ or $d = 0$. For other cases, consider $b - cd$ for any constant $c \neq 0$. Then if $b - cd \neq 0$, we have

$$0 < (b - cd)'(b - cd) = b'b - 2c(b'd) + c^2 d'd,$$

since $b - cd$ must have positive length.
85
Cauchy-Schwarz Inequality

We can add and subtract $(b'd)^2/(d'd)$ to obtain

$$0 < b'b - 2c(b'd) + c^2 d'd - \frac{(b'd)^2}{d'd} + \frac{(b'd)^2}{d'd} = b'b - \frac{(b'd)^2}{d'd} + (d'd)\left(c - \frac{b'd}{d'd}\right)^2$$

Since c can be anything, we can choose $c = b'd/d'd$. Then

$$0 < b'b - \frac{(b'd)^2}{d'd} \quad \Rightarrow \quad (b'd)^2 < (b'b)(d'd)$$

for $b \neq cd$ (otherwise, we have equality).

86
Extended Cauchy-Schwarz Inequality
If b and d are any two p × 1 vectors and B is a p × p positive definite matrix, then

$$(b'd)^2 \leq (b'Bb)(d'B^{-1}d)$$

with equality if and only if $b = cB^{-1}d$ or $d = cBb$ for some constant c.

Proof: Consider $B^{1/2} = \sum_{i=1}^{p} \sqrt{\lambda_i}\, e_i e_i'$ and $B^{-1/2} = \sum_{i=1}^{p} \frac{1}{\sqrt{\lambda_i}}\, e_i e_i'$. Then we can write

$$b'd = b'Id = b'B^{1/2}B^{-1/2}d = (B^{1/2}b)'(B^{-1/2}d) = b^{*\prime} d^*.$$

To complete the proof, simply apply the Cauchy-Schwarz inequality to the vectors $b^*$ and $d^*$.

87
Optimization

Let B be positive definite and let d be any p × 1 vector. Then

$$\max_{x \neq 0} \frac{(x'd)^2}{x'Bx} = d'B^{-1}d$$

is attained when $x = cB^{-1}d$ for any constant $c \neq 0$.

Proof: By the extended Cauchy-Schwarz inequality, $(x'd)^2 \leq (x'Bx)(d'B^{-1}d)$. Since $x \neq 0$ and B is positive definite, $x'Bx > 0$ and we can divide both sides by $x'Bx$ to get an upper bound

$$\frac{(x'd)^2}{x'Bx} \leq d'B^{-1}d.$$

Differentiating the left side with respect to x shows that the maximum is attained at $x = cB^{-1}d$.
88
Maximization of a Quadratic Form
on a Unit Sphere
• B is positive definite with eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p \geq 0$ and associated (normalized) eigenvectors $e_1, e_2, \cdots, e_p$. Then

$$\max_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_1, \quad \text{attained when } x = e_1$$

$$\min_{x \neq 0} \frac{x'Bx}{x'x} = \lambda_p, \quad \text{attained when } x = e_p.$$

• Furthermore, for $k = 1, 2, \cdots, p - 1$,

$$\max_{x \perp e_1, e_2, \cdots, e_k} \frac{x'Bx}{x'x} = \lambda_{k+1} \quad \text{is attained when } x = e_{k+1}.$$
See proof at end of chapter 2 in the textbook (pages 80-81).
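An added sketch checking numerically that the Rayleigh quotient x'Bx / x'x over random vectors stays between λp and λ1, and equals λ1 at x = e1 (the matrix B is arbitrary but positive definite):

```python
import numpy as np

rng = np.random.default_rng(1)
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])       # symmetric positive definite

lam, E = np.linalg.eigh(B)            # eigenvalues in ascending order
X = rng.normal(size=(10_000, 3))
rayleigh = np.einsum('ij,jk,ik->i', X, B, X) / np.einsum('ij,ij->i', X, X)

print(lam[-1], rayleigh.max())        # lambda_1 bounds the quotient from above
print(lam[0], rayleigh.min())         # lambda_p bounds it from below
print(E[:, -1] @ B @ E[:, -1])        # quotient equals lambda_1 at the unit vector e_1
```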

89
