
L. Vandenberghe ECE133B (Spring 2020)

3. Symmetric eigendecomposition

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation

Symmetric eigendecomposition 3.1

Eigenvalues and eigenvectors

a nonzero vector x is an eigenvector of the n × n matrix A, with eigenvalue λ, if

Ax = λx

• the matrix λI − A is singular and x is a nonzero vector in the nullspace of λI − A


• the eigenvalues of A are the roots of the characteristic polynomial:

det(λI − A) = λ^n + c_{n−1} λ^{n−1} + · · · + c_1 λ + (−1)^n det(A) = 0

• by the fundamental theorem of algebra, this immediately shows that every square matrix has at least one (possibly complex) eigenvalue
• the roots of the polynomial (and corresponding eigenvectors) may be complex
• (algebraic) multiplicity of an eigenvalue is its multiplicity as a root of det(λI − A)
• there are exactly n eigenvalues, counted with their multiplicity
• set of eigenvalues of A is called the spectrum of A
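
a quick numerical illustration of these facts (a sketch using NumPy; not part of the original slides):

```python
import numpy as np

# a matrix with no real eigenvalues: a 90-degree rotation
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

# np.poly returns the characteristic polynomial coefficients of a matrix
print(np.poly(A))                    # [1. 0. 1.], i.e. det(λI − A) = λ^2 + 1

# the roots of that polynomial are the eigenvalues; here they are complex
evals, evecs = np.linalg.eig(A)
print(evals)                         # [0.+1.j 0.-1.j]

# each column of evecs is an eigenvector: A x = λ x
x, lam = evecs[:, 0], evals[0]
print(np.allclose(A @ x, lam * x))   # True
```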

Symmetric eigendecomposition 3.2


Diagonal matrix

A = \begin{bmatrix}
      A_{11} & 0      & \cdots & 0      \\
      0      & A_{22} & \cdots & 0      \\
      \vdots & \vdots & \ddots & \vdots \\
      0      & 0      & \cdots & A_{nn}
    \end{bmatrix}

• the eigenvalues of A are the diagonal entries A_{11}, . . . , A_{nn}

• the n unit vectors e_1 = (1, 0, . . . , 0), . . . , e_n = (0, . . . , 0, 1) are eigenvectors:

A e_i = A_{ii} e_i

• linear combinations of the e_i are eigenvectors if the corresponding A_{ii} are equal

Example: A = αI is a scalar multiple of the identity matrix

• one eigenvalue α with multiplicity n


• every nonzero vector is an eigenvector

Symmetric eigendecomposition 3.3


Similarity transformation

two matrices A and B are similar if

B = T^{−1} A T

for some nonsingular matrix T

• the mapping that maps A to T^{−1} A T is called a similarity transformation


• similarity transformations preserve eigenvalues:

det(λI − B) = det(λI − T^{−1} A T) = det(T^{−1} (λI − A) T) = det(λI − A)

• if x is an eigenvector of A (with Ax = λx), then y = T^{−1} x is an eigenvector of B:

By = (T^{−1} A T)(T^{−1} x) = T^{−1} A x = T^{−1} (λx) = λy

of special interest will be orthogonal similarity transformations (T is orthogonal)
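
a small numerical check of these two properties (a sketch, assuming NumPy; not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
T = rng.standard_normal((4, 4))      # a random T is nonsingular with probability 1

B = np.linalg.solve(T, A @ T)        # B = T^{-1} A T, without forming T^{-1}

# A and B have the same spectrum (up to ordering and roundoff)
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.linalg.eigvals(B)))

# if x is an eigenvector of A, then y = T^{-1} x is an eigenvector of B
lam, X = np.linalg.eig(A)
y = np.linalg.solve(T, X[:, 0])
print(np.allclose(B @ y, lam[0] * y))   # True
```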


Symmetric eigendecomposition 3.4
Diagonalizable matrices

a matrix is diagonalizable if it is similar to a diagonal matrix:

T^{−1} A T = Λ

for some nonsingular matrix T

• the diagonal elements of Λ are the eigenvalues of A


• the columns of T are eigenvectors of A:

A (T e_i) = T Λ e_i = Λ_{ii} (T e_i)

• the columns of T give a set of n linearly independent eigenvectors

not all square matrices are diagonalizable

Symmetric eigendecomposition 3.5


Spectral decomposition

suppose A is diagonalizable, with

A = T Λ T^{−1}
  = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}
    \begin{bmatrix}
      λ_1    & 0      & \cdots & 0      \\
      0      & λ_2    & \cdots & 0      \\
      \vdots & \vdots & \ddots & \vdots \\
      0      & 0      & \cdots & λ_n
    \end{bmatrix}
    \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix}
  = λ_1 v_1 w_1^T + λ_2 v_2 w_2^T + \cdots + λ_n v_n w_n^T

this is a spectral decomposition of the linear function f (x) = Ax

• the elements of T^{−1} x are the coefficients of x in the basis of eigenvectors {v_1, . . . , v_n}:

x = T T^{−1} x = α_1 v_1 + · · · + α_n v_n, where α_i = w_i^T x (w_i^T is row i of T^{−1})

• applied to an eigenvector, f(v_i) = A v_i = λ_i v_i is a simple scaling


• by superposition, we find Ax as

Ax = α_1 λ_1 v_1 + · · · + α_n λ_n v_n = T Λ T^{−1} x
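
the rank-one expansion is easy to verify numerically for a diagonalizable A; a sketch with NumPy (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
lam, T = np.linalg.eig(A)            # columns of T are the eigenvectors v_i
W = np.linalg.inv(T)                 # rows of W are the vectors w_i^T

# A as a sum of rank-one terms λ_i v_i w_i^T
B = sum(lam[i] * np.outer(T[:, i], W[i, :]) for i in range(3))
print(np.allclose(A, B))             # True

# Ax via the coefficients α_i = w_i^T x in the eigenvector basis
x = rng.standard_normal(3)
alpha = W @ x
print(np.allclose(A @ x, T @ (lam * alpha)))   # True
```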

Symmetric eigendecomposition 3.6


Exercise

recall from 133A the definition of a circulant matrix


A = \begin{bmatrix}
      a_1     & a_n     & a_{n−1} & \cdots & a_3    & a_2    \\
      a_2     & a_1     & a_n     & \cdots & a_4    & a_3    \\
      a_3     & a_2     & a_1     & \cdots & a_5    & a_4    \\
      \vdots  & \vdots  & \vdots  & \ddots & \vdots & \vdots \\
      a_{n−1} & a_{n−2} & a_{n−3} & \cdots & a_1    & a_n    \\
      a_n     & a_{n−1} & a_{n−2} & \cdots & a_2    & a_1
    \end{bmatrix}

and its factorization


A = (1/n) W diag(Wa) W^H

where W is the discrete Fourier transform matrix (Wa is the DFT of a) and

W^{−1} = (1/n) W^H

what is the spectrum of A?
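
a numerical way to check your answer (a sketch, assuming NumPy and SciPy; not part of the slides): the factorization diagonalizes A, so compare its eigenvalues with the DFT of a.

```python
import numpy as np
from scipy.linalg import circulant

a = np.array([1.0, 2.0, 3.0, 4.0])
A = circulant(a)                     # circulant matrix with first column a

# spectrum of A versus the DFT of a: the same multiset of numbers
print(np.sort_complex(np.linalg.eigvals(A)))
print(np.sort_complex(np.fft.fft(a)))
```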


Symmetric eigendecomposition 3.7
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Symmetric eigendecomposition

eigenvalues/vectors of a symmetric matrix have important special properties

• all the eigenvalues are real

• the eigenvectors corresponding to different eigenvalues are orthogonal

• a symmetric matrix is diagonalizable by an orthogonal similarity transformation:

Q^T A Q = Λ, Q^T Q = I

in the remainder of the lecture we assume that A is symmetric (and real)
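
these three properties are exactly what a symmetric eigensolver returns; a sketch with NumPy (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                    # symmetrize to obtain a symmetric matrix

# eigh is the dedicated symmetric solver (eigenvalues in increasing order)
lam, Q = np.linalg.eigh(A)

print(lam.dtype)                               # float64: all eigenvalues real
print(np.allclose(Q.T @ Q, np.eye(5)))         # True: eigenvectors orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(lam)))  # True: Q^T A Q = Λ
```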

Symmetric eigendecomposition 3.8


Eigenvalues of a symmetric matrix are real

consider an eigenvalue λ and eigenvector x (possibly complex):

Ax = λx, x ≠ 0

• taking the inner product with x shows that x^H A x = λ x^H x


• x^H x = ∑_{i=1}^n |x_i|^2 is real and positive, and x^H A x is real:

x^H A x = ∑_{i=1}^n ∑_{j=1}^n A_{ij} \bar{x}_i x_j = ∑_{i=1}^n A_{ii} |x_i|^2 + 2 ∑_{i=1}^n ∑_{j<i} A_{ij} Re(\bar{x}_i x_j)

• therefore λ = (x^H A x)/(x^H x) is real


• if x is complex, its real and imaginary parts are real eigenvectors (if nonzero):

A(x_{re} + j x_{im}) = λ(x_{re} + j x_{im})  ⟹  A x_{re} = λ x_{re},  A x_{im} = λ x_{im}

therefore, eigenvectors can be assumed to be real

Symmetric eigendecomposition 3.9


Orthogonality of eigenvectors

suppose x and y are eigenvectors for different eigenvalues λ, µ:

Ax = λx,   Ay = µy,   λ ≠ µ

• take inner products with y and x:

λ y^T x = y^T A x = x^T A y = µ x^T y

the second equality holds because A is symmetric

• if λ ≠ µ, this implies that

x^T y = 0

Symmetric eigendecomposition 3.10


Eigendecomposition

every real symmetric n × n matrix A can be factored as

A = Q Λ Q^T    (1)

• Q is orthogonal

• Λ = diag(λ1, . . . , λn) is diagonal, with real diagonal elements

• A is diagonalizable by an orthogonal similarity transformation: Q^T A Q = Λ

• the columns of Q are an orthonormal set of n eigenvectors: write AQ = QΛ as

A \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}
= \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}
  \begin{bmatrix}
    λ_1    & 0      & \cdots & 0      \\
    0      & λ_2    & \cdots & 0      \\
    \vdots & \vdots & \ddots & \vdots \\
    0      & 0      & \cdots & λ_n
  \end{bmatrix}
= \begin{bmatrix} λ_1 q_1 & λ_2 q_2 & \cdots & λ_n q_n \end{bmatrix}
 

Symmetric eigendecomposition 3.11


Proof by induction

• the decomposition (1) obviously exists if n = 1


• suppose it exists when n = m, and let A be an (m + 1) × (m + 1) symmetric matrix
• A has at least one eigenvalue (page 3.2)
• let λ_1 be any eigenvalue and q_1 a corresponding eigenvector, with ‖q_1‖ = 1
• let V be an (m + 1) × m matrix that makes the matrix [q_1 V] orthogonal; then

\begin{bmatrix} q_1^T \\ V^T \end{bmatrix} A \begin{bmatrix} q_1 & V \end{bmatrix}
= \begin{bmatrix} q_1^T A q_1 & q_1^T A V \\ V^T A q_1 & V^T A V \end{bmatrix}
= \begin{bmatrix} λ_1 q_1^T q_1 & λ_1 q_1^T V \\ λ_1 V^T q_1 & V^T A V \end{bmatrix}
= \begin{bmatrix} λ_1 & 0 \\ 0 & V^T A V \end{bmatrix}

• V^T A V is a symmetric m × m matrix, so by the induction hypothesis,

V^T A V = Q̃ Λ̃ Q̃^T for some orthogonal Q̃ and diagonal Λ̃

• the matrix Q = [q_1 V Q̃] is orthogonal and defines a similarity that diagonalizes A:

Q^T A Q = \begin{bmatrix} q_1^T \\ Q̃^T V^T \end{bmatrix} A \begin{bmatrix} q_1 & V Q̃ \end{bmatrix}
= \begin{bmatrix} λ_1 & 0 \\ 0 & Q̃^T V^T A V Q̃ \end{bmatrix}
= \begin{bmatrix} λ_1 & 0 \\ 0 & Λ̃ \end{bmatrix}
Symmetric eigendecomposition 3.12
Spectral decomposition

the decomposition (1) expresses A as a sum of rank-one matrices:

A = Q Λ Q^T
  = \begin{bmatrix} q_1 & q_2 & \cdots & q_n \end{bmatrix}
    \begin{bmatrix}
      λ_1    & 0      & \cdots & 0      \\
      0      & λ_2    & \cdots & 0      \\
      \vdots & \vdots & \ddots & \vdots \\
      0      & 0      & \cdots & λ_n
    \end{bmatrix}
    \begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_n^T \end{bmatrix}
  = ∑_{i=1}^n λ_i q_i q_i^T

• the matrix–vector product Ax is decomposed as

Ax = ∑_{i=1}^n λ_i q_i (q_i^T x)

• (q_1^T x, . . . , q_n^T x) are the coordinates of x in the orthonormal basis {q_1, . . . , q_n}

• (λ_1 q_1^T x, . . . , λ_n q_n^T x) are the coordinates of Ax in the orthonormal basis {q_1, . . . , q_n}

Symmetric eigendecomposition 3.13


Non-uniqueness

some freedom exists in the choice of Λ and Q in the eigendecomposition

A = Q Λ Q^T
  = \begin{bmatrix} q_1 & \cdots & q_n \end{bmatrix}
    \begin{bmatrix}
      λ_1    & \cdots & 0      \\
      \vdots & \ddots & \vdots \\
      0      & \cdots & λ_n
    \end{bmatrix}
    \begin{bmatrix} q_1^T \\ \vdots \\ q_n^T \end{bmatrix}

Ordering of eigenvalues
diagonal Λ and columns of Q can be permuted; we will assume that

λ1 ≥ λ2 ≥ · · · ≥ λn

Choice of eigenvectors
suppose λ_i is an eigenvalue with multiplicity k: λ_i = λ_{i+1} = · · · = λ_{i+k−1}

• nonzero vectors in span{q_i, . . . , q_{i+k−1}} are eigenvectors with eigenvalue λ_i

• q_i, . . . , q_{i+k−1} can be replaced with any orthonormal basis of this “eigenspace”

Symmetric eigendecomposition 3.14


Inverse

a symmetric matrix is invertible if and only if all its eigenvalues are nonzero:

• the inverse of A = Q Λ Q^T is

A^{−1} = (Q Λ Q^T)^{−1} = Q Λ^{−1} Q^T,   Λ^{−1} = \begin{bmatrix}
  1/λ_1  & 0      & \cdots & 0      \\
  0      & 1/λ_2  & \cdots & 0      \\
  \vdots & \vdots & \ddots & \vdots \\
  0      & 0      & \cdots & 1/λ_n
\end{bmatrix}

• the eigenvectors of A^{−1} are the eigenvectors of A

• the eigenvalues of A^{−1} are the reciprocals of the eigenvalues of A
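
a sketch of this formula in NumPy (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
A = B @ B.T + np.eye(4)              # symmetric, all eigenvalues positive (nonzero)

lam, Q = np.linalg.eigh(A)
A_inv = Q @ np.diag(1.0 / lam) @ Q.T # A^{-1} = Q Λ^{-1} Q^T

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```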

Symmetric eigendecomposition 3.15


Spectral matrix functions

Integer powers

A^k = (Q Λ Q^T)^k = Q Λ^k Q^T,   Λ^k = diag(λ_1^k, . . . , λ_n^k)

• negative powers are defined if A is invertible (all eigenvalues are nonzero)


• A^k has the same eigenvectors as A, with eigenvalues λ_i^k

Square root

A^{1/2} = Q Λ^{1/2} Q^T,   Λ^{1/2} = diag(√λ_1, . . . , √λ_n)

• defined if eigenvalues are nonnegative


• a symmetric matrix that satisfies A^{1/2} A^{1/2} = A

Other matrix functions: can be defined via power series, for example,

exp(A) = Q exp(Λ) Q^T,   exp(Λ) = diag(e^{λ_1}, . . . , e^{λ_n})
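
a sketch of these spectral functions with NumPy/SciPy (not part of the slides):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
A = B @ B.T                          # positive semidefinite: nonnegative eigenvalues

lam, Q = np.linalg.eigh(A)

# square root Q Λ^{1/2} Q^T; clip tiny negative roundoff before taking sqrt
A_half = Q @ np.diag(np.sqrt(np.maximum(lam, 0))) @ Q.T
print(np.allclose(A_half @ A_half, A))        # True

# matrix exponential via the eigendecomposition, checked against scipy's expm
expA = Q @ np.diag(np.exp(lam)) @ Q.T
print(np.allclose(expA, expm(A)))             # True
```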

Symmetric eigendecomposition 3.16


Range, nullspace, rank

eigendecomposition with the nonzero eigenvalues placed first in Λ:

A = Q Λ Q^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} Λ_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} = Q_1 Λ_1 Q_1^T

the diagonal entries of Λ_1 are the nonzero eigenvalues of A

• columns of Q1 are an orthonormal basis for range(A)


• columns of Q2 are an orthonormal basis for null(A)
• this is an example of a full-rank factorization (page 1.27): A = BC with

B = Q_1,   C = Λ_1 Q_1^T

• rank of A is the number of nonzero eigenvalues (with their multiplicities)

Symmetric eigendecomposition 3.17


Pseudo-inverse

we use the same notation as on the previous page:

A = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} Λ_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} = Q_1 Λ_1 Q_1^T

the diagonal entries of Λ_1 are the nonzero eigenvalues of A

• the pseudo-inverse follows from page 1.36 with B = Q_1 and C = Λ_1 Q_1^T

• the pseudo-inverse is A^† = C^† B^† = (Q_1 Λ_1^{−1}) Q_1^T:

A^† = Q_1 Λ_1^{−1} Q_1^T = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} Λ_1^{−1} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix}

• the eigenvectors of A^† are the eigenvectors of A

• the nonzero eigenvalues of A^† are the reciprocals of the nonzero eigenvalues of A
• the range, nullspace, and rank of A^† are the same as for A
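
a sketch of the pseudo-inverse formula in NumPy (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((5, 2))
A = X @ X.T                          # symmetric, rank 2

lam, Q = np.linalg.eigh(A)
nz = np.abs(lam) > 1e-10             # indices of the nonzero eigenvalues
Q1, lam1 = Q[:, nz], lam[nz]

A_pinv = Q1 @ np.diag(1.0 / lam1) @ Q1.T      # A† = Q_1 Λ_1^{-1} Q_1^T
print(np.allclose(A_pinv, np.linalg.pinv(A))) # True
```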

Symmetric eigendecomposition 3.18


Trace

the trace of an n × n matrix B is the sum of its diagonal elements:

trace(B) = ∑_{i=1}^n B_{ii}

• transpose: trace(B^T) = trace(B)


• product: if B is n × m and C is m × n, then

trace(BC) = trace(CB) = ∑_{i=1}^n ∑_{j=1}^m B_{ij} C_{ji}

• eigenvalues: the trace of a symmetric matrix is the sum of its eigenvalues:

trace(Q Λ Q^T) = trace(Q^T Q Λ) = trace(Λ) = ∑_{i=1}^n λ_i

Symmetric eigendecomposition 3.19


Frobenius norm

recall the definition of the Frobenius norm of an m × n matrix B:

‖B‖_F = ( ∑_{i=1}^m ∑_{j=1}^n B_{ij}^2 )^{1/2} = ( trace(B^T B) )^{1/2} = ( trace(B B^T) )^{1/2}

• this is an example of a unitarily invariant norm: if U, V are orthogonal, then

‖U B V‖_F = ‖B‖_F

Proof:

‖U B V‖_F^2 = trace(V^T B^T U^T U B V) = trace(V V^T B^T B) = trace(B^T B) = ‖B‖_F^2

• for a symmetric n × n matrix with eigenvalues λ_1, . . . , λ_n,

‖A‖_F = ‖Q Λ Q^T‖_F = ‖Λ‖_F = ( ∑_{i=1}^n λ_i^2 )^{1/2}

Symmetric eigendecomposition 3.20


2-Norm

recall the definition of the 2-norm or spectral norm of an m × n matrix B:

‖B‖_2 = max_{x ≠ 0} ‖Bx‖ / ‖x‖

• this norm is also unitarily invariant: if U, V are orthogonal, then

‖U B V‖_2 = ‖B‖_2

Proof:

‖U B V‖_2 = max_{x ≠ 0} ‖U B V x‖ / ‖x‖ = max_{y ≠ 0} ‖U B y‖ / ‖V^T y‖ = max_{y ≠ 0} ‖B y‖ / ‖y‖ = ‖B‖_2

• for a symmetric n × n matrix with eigenvalues λ_1, . . . , λ_n,

‖A‖_2 = ‖Q Λ Q^T‖_2 = ‖Λ‖_2 = max_{i=1,...,n} |λ_i| = max{λ_1, −λ_n}
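
both identities are easy to confirm numerically; a sketch (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2
lam = np.linalg.eigvalsh(A)          # ascending: lam[0] = λ_n, lam[-1] = λ_1

print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(lam**2))))  # True
print(np.isclose(np.linalg.norm(A, 2), max(lam[-1], -lam[0])))        # True
```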

Symmetric eigendecomposition 3.21


Exercises

Exercise 1

suppose A has eigendecomposition A = Q Λ Q^T; give an eigendecomposition of

A − αI

Exercise 2

what are the eigenvalues and eigenvectors of an orthogonal projector

A = U U^T (where U^T U = I)

Exercise 3

the condition number of a nonsingular matrix is defined as

κ(A) = ‖A‖_2 ‖A^{−1}‖_2

express the condition number of a symmetric matrix in terms of its eigenvalues


Symmetric eigendecomposition 3.22
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Quadratic forms

the eigendecomposition is a useful tool for problems that involve quadratic forms

f(x) = x^T A x

• substitute A = Q Λ Q^T and make the orthogonal change of variables y = Q^T x:

f(Qy) = y^T Λ y = λ_1 y_1^2 + · · · + λ_n y_n^2

• y_1, . . . , y_n are the coordinates of x in the orthonormal basis of eigenvectors

• the orthogonal change of variables preserves inner products and norms:

‖y‖_2 = ‖Q^T x‖_2 = ‖x‖_2

Symmetric eigendecomposition 3.23


Maximum and minimum value

consider the optimization problems with variable x

maximize   x^T A x        minimize   x^T A x
subject to x^T x = 1      subject to x^T x = 1

change coordinates to the spectral basis (y = Q^T x and x = Qy):

maximize   λ_1 y_1^2 + · · · + λ_n y_n^2      minimize   λ_1 y_1^2 + · · · + λ_n y_n^2
subject to y_1^2 + · · · + y_n^2 = 1          subject to y_1^2 + · · · + y_n^2 = 1

• maximization: y = (1, 0, . . . , 0) and x = q_1 are optimal; the maximal value is

max_{‖x‖=1} x^T A x = max_{‖y‖=1} (λ_1 y_1^2 + · · · + λ_n y_n^2) = λ_1 = max_{i=1,...,n} λ_i

• minimization: y = (0, 0, . . . , 1) and x = q_n are optimal; the minimal value is

min_{‖x‖=1} x^T A x = min_{‖y‖=1} (λ_1 y_1^2 + · · · + λ_n y_n^2) = λ_n = min_{i=1,...,n} λ_i
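
a numerical illustration (a sketch, not part of the slides): random unit vectors never beat the extreme eigenvalues, and the eigenvectors attain them.

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

lam, Q = np.linalg.eigh(A)           # ascending: lam[-1] = λ_1, lam[0] = λ_n

# x^T A x over many random unit vectors stays inside [λ_n, λ_1]
X = rng.standard_normal((4, 100000))
X /= np.linalg.norm(X, axis=0)
vals = np.einsum('ij,ij->j', X, A @ X)
print(lam[0] <= vals.min(), vals.max() <= lam[-1])    # True True

# the extreme values are attained at the eigenvectors
print(np.isclose(Q[:, -1] @ A @ Q[:, -1], lam[-1]))   # True (x = q_1)
print(np.isclose(Q[:, 0] @ A @ Q[:, 0], lam[0]))      # True (x = q_n)
```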

Symmetric eigendecomposition 3.24


Exercises

Exercise 1: find the extreme values of the Rayleigh quotient (x^T A x)/(x^T x), i.e.,

max_{x ≠ 0} (x^T A x)/(x^T x),   min_{x ≠ 0} (x^T A x)/(x^T x)

Exercise 2: solve the optimization problems

maximize   x^T A x        minimize   x^T A x
subject to x^T x ≤ 1      subject to x^T x ≤ 1

Exercise 3: show that (for symmetric A)

‖A‖_2 = max_{i=1,...,n} |λ_i| = max_{‖x‖=1} |x^T A x|

Symmetric eigendecomposition 3.25


Sign of eigenvalues

matrix property          condition on eigenvalues

positive definite        λ_n > 0
positive semidefinite    λ_n ≥ 0
indefinite               λ_n < 0 and λ_1 > 0
negative semidefinite    λ_1 ≤ 0
negative definite        λ_1 < 0

• λ_1 and λ_n denote the largest and smallest eigenvalues:

λ_1 = max_{i=1,...,n} λ_i,   λ_n = min_{i=1,...,n} λ_i

• the properties in the table follow from

λ_1 = max_{‖x‖=1} x^T A x = max_{x ≠ 0} (x^T A x)/(x^T x),   λ_n = min_{‖x‖=1} x^T A x = min_{x ≠ 0} (x^T A x)/(x^T x)

Symmetric eigendecomposition 3.26


Ellipsoids
if A is positive definite, the set

E = {x | x^T A x ≤ 1}

is an ellipsoid with center at the origin

[figure: ellipsoid E with principal semi-axes (1/√λ_1) q_1 and (1/√λ_n) q_n]

after the orthogonal change of coordinates y = Q^T x the set is described by

λ_1 y_1^2 + · · · + λ_n y_n^2 ≤ 1

this shows that:

• the eigenvectors of A give the principal axes

• the width along the principal axis determined by q_i is 2/√λ_i

Symmetric eigendecomposition 3.27


Exercise

give an interpretation of trace(A^{−1}) as a measure of the size of the ellipsoid

E = {x | x^T A x ≤ 1}

Symmetric eigendecomposition 3.28


Max–min characterization of eigenvalues

as an extension of the maximization problem on page 3.24, consider

maximize   λ_min(X^T A X)        (2)
subject to X^T X = I

the variable X is an n × k matrix, for some given value of k between 1 and n

• λ_min(X^T A X) denotes the smallest eigenvalue of the k × k matrix X^T A X

• for k = 1 this is the problem on page 3.24: λ_min(x^T A x) = x^T A x

Solution: from the eigendecomposition A = Q Λ Q^T = ∑_{i=1}^n λ_i q_i q_i^T:

• the optimal value of (2) is the kth eigenvalue λ_k of A
• an optimal choice for X is formed from the first k columns of Q:

X = \begin{bmatrix} q_1 & q_2 & \cdots & q_k \end{bmatrix}

this is known as the Courant–Fischer min–max theorem
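
a sketch of the theorem in NumPy (not part of the slides): the first k eigenvectors attain the optimum, and a random orthonormal X does no better.

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2

lam, Q = np.linalg.eigh(A)
lam, Q = lam[::-1], Q[:, ::-1]       # reorder so that λ_1 ≥ ... ≥ λ_n

k = 3
X = Q[:, :k]                         # first k eigenvectors
print(np.isclose(np.linalg.eigvalsh(X.T @ A @ X).min(), lam[k-1]))  # attains λ_k

# any other X with orthonormal columns gives λ_min(X^T A X) ≤ λ_k
Y, _ = np.linalg.qr(rng.standard_normal((6, k)))
print(np.linalg.eigvalsh(Y.T @ A @ Y).min() <= lam[k-1] + 1e-12)    # True
```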


Symmetric eigendecomposition 3.29
Proof of the max–min characterization

we make a change of variables Y = Q^T X:

maximize   λ_min(Y^T Λ Y)
subject to Y^T Y = I

we also partition Λ as

Λ = \begin{bmatrix} Λ_1 & 0 \\ 0 & Λ_2 \end{bmatrix},   Λ_1 = \begin{bmatrix} λ_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_k \end{bmatrix},   Λ_2 = \begin{bmatrix} λ_{k+1} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & λ_n \end{bmatrix}

we show that the matrix Ŷ = \begin{bmatrix} I \\ 0 \end{bmatrix} is optimal
• for this matrix,

Ŷ^T Λ Ŷ = \begin{bmatrix} I \\ 0 \end{bmatrix}^T \begin{bmatrix} Λ_1 & 0 \\ 0 & Λ_2 \end{bmatrix} \begin{bmatrix} I \\ 0 \end{bmatrix} = Λ_1,   λ_min(Ŷ^T Λ Ŷ) = λ_min(Λ_1) = λ_k

• on the next page we show that λ_min(Y^T Λ Y) ≤ λ_k if Y is n × k with Y^T Y = I

Symmetric eigendecomposition 3.30


Proof of the max–min characterization
• on page 3.24, we have seen that

λ_min(Y^T Λ Y) = min_{‖u‖=1} u^T (Y^T Λ Y) u

• if Y has k columns, there exists v ≠ 0 such that Y v has k − 1 leading zeros (the first k − 1 rows of Y give k − 1 homogeneous equations in the k unknowns v_1, . . . , v_k, so a nonzero solution exists):

Y v = \begin{bmatrix}
  Y_{11}    & \cdots & Y_{1k}    \\
  \vdots    &        & \vdots    \\
  Y_{k−1,1} & \cdots & Y_{k−1,k} \\
  Y_{k1}    & \cdots & Y_{kk}    \\
  \vdots    &        & \vdots    \\
  Y_{n1}    & \cdots & Y_{nk}
\end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_k \end{bmatrix}
= \begin{bmatrix} 0 \\ \vdots \\ 0 \\ y_k \\ \vdots \\ y_n \end{bmatrix}

• if Y^T Y = I and we normalize v, then ‖Y v‖ = ‖v‖ = 1 and

(Y v)^T Λ (Y v) = λ_k y_k^2 + · · · + λ_n y_n^2 ≤ λ_k (y_k^2 + · · · + y_n^2) = λ_k

• this shows that

λ_min(Y^T Λ Y) = min_{‖u‖=1} u^T (Y^T Λ Y) u ≤ v^T (Y^T Λ Y) v ≤ λ_k
Symmetric eigendecomposition 3.31
Min–max characterization of eigenvalues

the minimization problem on page 3.24 can be extended in a similar way:

minimize   λ_max(X^T A X)        (3)
subject to X^T X = I

the variable X is an n × k matrix

• λ_max(X^T A X) denotes the largest eigenvalue of the k × k matrix X^T A X

• for k = 1 this is the minimization problem on page 3.24: λ_max(x^T A x) = x^T A x

Solution: from the eigenvalue decomposition A = Q Λ Q^T = ∑_{i=1}^n λ_i q_i q_i^T:

• the optimal value of (3) is the eigenvalue λ_{n−k+1} of A
• an optimal choice of X is formed from the last k columns of Q:

X = \begin{bmatrix} q_{n−k+1} & \cdots & q_{n−1} & q_n \end{bmatrix}

this follows from the max–min characterization on page 3.29 applied to −A


Symmetric eigendecomposition 3.32
Exercises

Exercise 1: suppose B is an m × m principal submatrix of A, for example,

B = \begin{bmatrix}
      A_{11} & A_{12} & \cdots & A_{1m} \\
      A_{21} & A_{22} & \cdots & A_{2m} \\
      \vdots & \vdots &        & \vdots \\
      A_{m1} & A_{m2} & \cdots & A_{mm}
    \end{bmatrix},        (4)

and denote the m eigenvalues of B by µ_1 ≥ µ_2 ≥ · · · ≥ µ_m

show that

µ_1 ≤ λ_1, µ_2 ≤ λ_2, . . . , µ_m ≤ λ_m

(λ_1, . . . , λ_m are the first m eigenvalues of A)

Exercise 2: consider the matrix B in (4) with m = n − 1; show that

λ_1 ≥ µ_1 ≥ λ_2 ≥ µ_2 ≥ · · · ≥ λ_{n−1} ≥ µ_{n−1} ≥ λ_n

this is known as the eigenvalue interlacing theorem


Symmetric eigendecomposition 3.33
Eigendecomposition of covariance matrix

• suppose x is a random n-vector with mean µ, covariance matrix Σ


• Σ is positive semidefinite with eigendecomposition

Σ = E((x − µ)(x − µ)^T) = Q Λ Q^T

define the random n-vector y = Q^T (x − µ)

• y has zero mean and covariance matrix Λ:

E(y y^T) = Q^T E((x − µ)(x − µ)^T) Q = Q^T Σ Q = Λ

• the components of y are uncorrelated, with variances E(y_i^2) = λ_i

• x is decomposed into uncorrelated components with decreasing variance:

E(y_1^2) ≥ E(y_2^2) ≥ · · · ≥ E(y_n^2)

the transformation is known as the Karhunen–Loève or Hotelling transform
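
a sketch of the transform on sampled data (assuming NumPy; not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(9)
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
mu = np.array([5.0, -1.0])
X = rng.multivariate_normal(mu, Sigma, size=200000)   # one sample per row

lam, Q = np.linalg.eigh(np.cov(X, rowvar=False))      # estimated covariance
lam, Q = lam[::-1], Q[:, ::-1]                        # descending variances

Y = (X - X.mean(axis=0)) @ Q         # y = Q^T (x − µ) for every sample
print(np.round(np.cov(Y, rowvar=False), 2))           # ≈ diag(λ_1, λ_2)
```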


Symmetric eigendecomposition 3.34
Multivariate normal distribution

multivariate normal (Gaussian) probability density function

p(x) = (2π)^{−n/2} (det Σ)^{−1/2} exp(−(1/2)(x − µ)^T Σ^{−1} (x − µ))

contour lines of the density function for

Σ = (1/4) \begin{bmatrix} 7 & √3 \\ √3 & 5 \end{bmatrix},   µ = \begin{bmatrix} 5 \\ 4 \end{bmatrix}

the eigenvalues of Σ are λ_1 = 2, λ_2 = 1, with eigenvectors

q_1 = \begin{bmatrix} √3/2 \\ 1/2 \end{bmatrix},   q_2 = \begin{bmatrix} 1/2 \\ −√3/2 \end{bmatrix}

[figure: elliptical contour lines in the (x_1, x_2)-plane, centered at µ, with principal axes along q_1 and q_2]

Symmetric eigendecomposition 3.35


Multivariate normal distribution

the decorrelated and de-meaned variables y = Q^T (x − µ) have distribution

p̃(y) = ∏_{i=1}^n (2πλ_i)^{−1/2} exp(−y_i^2 / (2λ_i))

[figure: contour lines in the original coordinates (x_1, x_2) and in the decorrelated coordinates (y_1, y_2); the y_i-axes show ticks at ±λ_i^{1/2}]

Symmetric eigendecomposition 3.36


Joint diagonalization of two matrices

• a symmetric matrix A is diagonalized by an orthogonal similarity:

Q^T A Q = Λ

• as an extension, if A, B are symmetric and B is positive definite, then

S^T A S = D,   S^T B S = I

for some nonsingular S and diagonal D

Algorithm: S and D can be computed as follows

• Cholesky factorization B = R^T R, with R upper triangular and nonsingular

• eigendecomposition R^{−T} A R^{−1} = Q D Q^T, with D diagonal, Q orthogonal

• define S = R^{−1} Q; then

S^T A S = Q^T R^{−T} A R^{−1} Q = D,   S^T B S = Q^T R^{−T} B R^{−1} Q = Q^T Q = I
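
a sketch of this algorithm (assuming NumPy and SciPy; not part of the slides); SciPy's generalized symmetric eigensolver eigh(A, B) computes the same decomposition directly:

```python
import numpy as np
from scipy.linalg import cholesky, eigh, solve_triangular

rng = np.random.default_rng(10)
M0 = rng.standard_normal((4, 4)); A = (M0 + M0.T) / 2        # symmetric
C0 = rng.standard_normal((4, 4)); B = C0 @ C0.T + np.eye(4)  # positive definite

# step 1: Cholesky factorization B = R^T R (R upper triangular)
R = cholesky(B)

# step 2: eigendecomposition of R^{-T} A R^{-1}, built with triangular solves
M = solve_triangular(R, solve_triangular(R, A, trans='T').T, trans='T').T
d, Q = np.linalg.eigh(M)

# step 3: S = R^{-1} Q
S = solve_triangular(R, Q)

print(np.allclose(S.T @ A @ S, np.diag(d)))   # True: S^T A S = D
print(np.allclose(S.T @ B @ S, np.eye(4)))    # True: S^T B S = I

d2, S2 = eigh(A, B)                  # generalized eigenproblem, same eigenvalues
print(np.allclose(d, d2))            # True
```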

Symmetric eigendecomposition 3.37


Optimization problems with two quadratic forms

as an extension of the maximization problem on page 3.24, consider

maximize   x^T A x
subject to x^T B x = 1

where A, B are symmetric and B is positive definite

• compute a nonsingular S that jointly diagonalizes A and B:

S^T A S = D,   S^T B S = I

• make the change of variables x = Sy:

maximize   y^T D y
subject to y^T y = 1

• if the diagonal elements of D are sorted as D_{11} ≥ · · · ≥ D_{nn}, the solution is

y = e_1 = (1, 0, . . . , 0),   x = S e_1,   x^T A x = D_{11}


Symmetric eigendecomposition 3.38
Outline

• eigenvalues and eigenvectors

• symmetric eigendecomposition

• quadratic forms

• low rank matrix approximation


Low-rank matrix approximation

• low rank is a useful matrix property in many applications

• low rank is not a robust property (easily destroyed by noise or estimation error)

• most matrices in practice have full rank

• often the full-rank matrix is close to being low rank

• computing low-rank approximations is an important problem in linear algebra

on the next pages we discuss this for positive semidefinite matrices

Symmetric eigendecomposition 3.39


Rank-r approximation of positive semidefinite matrix

let A be a positive semidefinite matrix with rank(A) > r and eigendecomposition

A = Q Λ Q^T = ∑_{i=1}^n λ_i q_i q_i^T,   λ_1 ≥ · · · ≥ λ_n ≥ 0,   λ_{r+1} > 0

the best rank-r approximation is the sum of the first r terms in the decomposition:

B = ∑_{i=1}^r λ_i q_i q_i^T

• B is the best approximation in the Frobenius norm: for every C with rank r,

‖A − C‖_F ≥ ‖A − B‖_F = ( ∑_{i=r+1}^n λ_i^2 )^{1/2}

• B is also the best approximation in the 2-norm: for every C with rank r,

‖A − C‖_2 ≥ ‖A − B‖_2 = λ_{r+1}
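
a sketch of the truncated eigendecomposition in NumPy (not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(11)
X = rng.standard_normal((6, 6))
A = X @ X.T                          # positive semidefinite, full rank

lam, Q = np.linalg.eigh(A)
lam, Q = lam[::-1], Q[:, ::-1]       # descending eigenvalues

r = 2
B = Q[:, :r] @ np.diag(lam[:r]) @ Q[:, :r].T   # best rank-r approximation

print(np.isclose(np.linalg.norm(A - B, 'fro'),
                 np.sqrt(np.sum(lam[r:] ** 2))))     # True
print(np.isclose(np.linalg.norm(A - B, 2), lam[r]))  # True: equals λ_{r+1}
```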

Symmetric eigendecomposition 3.40


Rank-r approximation in Frobenius norm

the approximation problem in Frobenius norm is a nonlinear least squares problem


minimize ‖A − X X^T‖_F^2 = ∑_{i=1}^n ∑_{j=1}^n ( A_{ij} − ∑_{k=1}^r X_{ik} X_{jk} )^2

• we parametrize B as B = X X^T, with X of size n × r, and optimize over X


• this can be written in the standard nonlinear least squares form

minimize g(x) = ‖f(x)‖^2

with the vector x containing the elements of X, and f(x) the elements of A − X X^T


• the first order (necessary but not sufficient) optimality conditions are

∇g(x) = 2 Df(x)^T f(x) = 0

• in terms of X, the first order optimality conditions will be derived on page 3.44; they are

4(A − X X^T) X = 0
Symmetric eigendecomposition 3.41
Solution of first order optimality conditions

AX = X(X^T X)

• define the eigendecomposition X^T X = U D U^T (U orthogonal r × r, D diagonal)

• use Y = XU and D as variables:

AY = Y D,   Y^T Y = D

• the r diagonal elements of D must be eigenvalues of A

• the r columns of Y are corresponding orthogonal eigenvectors

• column i of Y is normalized to have norm D_{ii}^{1/2}

we conclude that the solutions of the first order optimality conditions satisfy

X X^T = Y Y^T = ∑_{i∈I} λ_i q_i q_i^T

where I is a subset of r elements of {1, 2, . . . , n}


Symmetric eigendecomposition 3.42
Optimal solution

among the solutions of the first order conditions we choose the one that minimizes

‖A − X X^T‖_F

• the squared error in the approximation is

‖A − X X^T‖_F^2 = ‖A − ∑_{i∈I} λ_i q_i q_i^T‖_F^2 = ‖∑_{i∉I} λ_i q_i q_i^T‖_F^2 = ∑_{i∉I} λ_i^2

• the optimal choice for I is I = {1, 2, . . . , r}:

X X^T = ∑_{i=1}^r λ_i q_i q_i^T,   ‖A − X X^T‖_F^2 = ∑_{i=r+1}^n λ_i^2

Symmetric eigendecomposition 3.43


First order optimality
to derive the first order optimality conditions for

minimize ‖A − X X^T‖_F^2

we substitute X + δX, with arbitrary small δX, and linearize:

‖A − (X + δX)(X + δX)^T‖_F^2
= ‖A − X X^T − δX X^T − X δX^T − δX δX^T‖_F^2
≈ ‖A − X X^T − δX X^T − X δX^T‖_F^2
= trace((A − X X^T − δX X^T − X δX^T)(A − X X^T − δX X^T − X δX^T))
≈ trace((A − X X^T)(A − X X^T)) − 2 trace((δX X^T + X δX^T)(A − X X^T))
= ‖A − X X^T‖_F^2 − 4 trace(δX^T (A − X X^T) X)

X is a stationary point if the second term is zero for all δX:

4(A − X X^T) X = 0

Symmetric eigendecomposition 3.44


Rank-r approximation in 2-norm

the same matrix B is also the best approximation in the 2-norm: if C has rank r, then

‖A − C‖_2 ≥ ‖A − B‖_2

the right-hand side is

‖A − B‖_2 = ‖∑_{i=1}^n λ_i q_i q_i^T − ∑_{i=1}^r λ_i q_i q_i^T‖_2 = ‖∑_{i=r+1}^n λ_i q_i q_i^T‖_2 = λ_{r+1}

on the next page we show that ‖A − C‖_2 ≥ λ_{r+1} if C has rank r

Symmetric eigendecomposition 3.45


Proof

• if rank(C) = r, the nullspace of C has dimension n − r

• define an n × (n − r) matrix V with orthonormal columns that span null(C)

• we use the min–max theorem on page 3.32 to bound ‖A − C‖_2:

‖A − C‖_2 = max_{‖x‖=1} |x^T (A − C) x|        (page 3.25)
          ≥ max_{‖x‖=1} x^T (A − C) x
          ≥ max_{‖y‖=1} y^T V^T (A − C) V y    (‖V y‖ = ‖y‖)
          = max_{‖y‖=1} y^T V^T A V y          (V^T C V = 0)
          = λ_max(V^T A V)
          ≥ λ_{r+1}                            (page 3.32 with k = n − r)

Symmetric eigendecomposition 3.46
