
Linear Algebra

Lecture Notes

Rostyslav Hryniv

Ukrainian Catholic University


Data Science Master Programme

1st term
Autumn 2019
Singular Value Decomposition · LU decomposition · Cholesky decomposition · QR and around

Heart and soul review:

go to socrative.com
press student login
enter the room LAUCU2019
answer 10 questions on eigenvalues and eigenvectors

Outline
1 Singular Value Decomposition
    Spectral Theorem
    Low rank approximations
    Definition
    Explanation and proof
    Applications of SVD
2 LU decomposition
    Linear systems
    Elementary matrices
    LU factorization
3 Cholesky decomposition
    Motivation
    Applications of Cholesky decomposition
    Algorithm
4 QR and around
    Applications of QR

The Spectral Theorem


Holds for symmetric/Hermitian, skew-Hermitian, and orthogonal/unitary matrices
claims existence of an orthonormal basis of eigenvectors u1, . . . , un with eigenvalues λ1, . . . , λn
spectral decomposition:

    A = λ1 u1 u1^T + · · · + λn un un^T

orthogonally diagonalizable: with

    Λ = diag(λ1, . . . , λn) and P the matrix with columns u1, . . . , un,

    P⁻¹ A P = P^T A P = Λ  ⇐⇒  A = P Λ P^T

the λj are
    real for symmetric/Hermitian matrices
    purely imaginary for anti-symmetric/skew-Hermitian ones
    unimodular (|λj| = 1) for orthogonal/unitary matrices
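A minimal NumPy illustration of the spectral decomposition (added here as a sketch; the 2 × 2 matrix is the one used in the low-rank example below):

```python
import numpy as np

# symmetric matrix from the low-rank example below
A = np.array([[15., 10.],
              [10.,  0.]])

# eigh returns eigenvalues (ascending) and orthonormal eigenvectors as columns of P
lam, P = np.linalg.eigh(A)

# orthogonal diagonalization A = P Lambda P^T
print(np.allclose(A, P @ np.diag(lam) @ P.T))        # True
# equivalently, the spectral decomposition A = sum_j lambda_j u_j u_j^T
A_sum = sum(lam[j] * np.outer(P[:, j], P[:, j]) for j in range(len(lam)))
print(np.allclose(A, A_sum))                         # True
```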

Applications of the Spectral Theorem

to construct a symmetric, anti-symmetric, or orthogonal (Hermitian, skew-Hermitian, or unitary) matrix with prescribed spectrum and eigenvectors
to construct low-rank approximations of A:
    if |λ1| ≥ |λ2| ≥ · · · ≥ |λn| and λ_{k+1}, . . . , λn are small compared to λ1, . . . , λk, then

        Ak = λ1 u1 u1^T + · · · + λk uk uk^T

    is a good rank-k approximation of A
    in what norm? use the so-called Frobenius norm

        ‖A‖_F^2 := Σ_{j,k=1}^n |ajk|^2 = trace(A* A)

    then ‖A‖_F^2 = Σ_{j=1}^n λj^2 trace(uj uj^T) = Σ_{j=1}^n λj^2
    and ‖A − Ak‖_F^2 = Σ_{j=k+1}^n λj^2
    in fact, the SVD says Ak is the best rank-k approximation!
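A short NumPy sketch (not in the original notes) of the rank-k truncation just described, checking that the Frobenius error equals the sum of the squared discarded eigenvalues:

```python
import numpy as np

def low_rank_sym(A, k):
    """Rank-k approximation A_k of a symmetric A, keeping the k largest |lambda_j|."""
    lam, P = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(lam))[:k]
    return (P[:, idx] * lam[idx]) @ P[:, idx].T       # sum_j lambda_j u_j u_j^T

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                                     # a random symmetric test matrix
A2 = low_rank_sym(A, 2)

lam = np.sort(np.abs(np.linalg.eigvalsh(A)))[::-1]
err = np.linalg.norm(A - A2, 'fro') ** 2
print(np.isclose(err, np.sum(lam[2:] ** 2)))          # ||A - A_k||_F^2 = sum_{j>k} lambda_j^2
```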

Example of low-rank approximation

Let

    A = [ 15  10
          10   0 ]

Eigenvalues: λ1 = 20, λ2 = −5   (check: λ1 + λ2 = 15, λ1 λ2 = −100)
Eigenvectors: u1 = (2/√5, 1/√5)^T, u2 = (−1/√5, 2/√5)^T
Rank-one approximation:

    A1 = λ1 u1 u1^T = [ 16  8          A − A1 = [ −1   2
                         8  4 ],                   2  −4 ]

Frobenius norms:
    ‖A‖_F^2 = 15^2 + 10^2 + 10^2 = 425,
    ‖A − A1‖_F^2 = 25 is just 1/17 of ‖A‖_F^2

Why low-rank?

Why are low-rank matrix approximations important?

In reality, we deal with huge matrices (sizes 10^3–10^6 or larger)
Sending and storing them efficiently becomes an issue!
Low-rank approximations are much easier to store and send!

Cost comparison:
    a full m × n matrix requires mn numbers to store;
    a rank-one matrix requires only m + n + 1
important e.g. for image compression

What if A is non-symmetric or non-square?

If A is non-symmetric, then
    its eigenvectors need not be orthogonal, or
    there may even be too few eigenvectors (one has to use generalized eigenvectors)
If A is non-square, there are no eigenvalues and eigenvectors at all!

However, low-rank approximations (in the Frobenius norm) still exist;
what is the best one?

Best rank-one approximation to a generic A

Problem
What is the best rank-one approximation uv^T of an m × n matrix A in the Frobenius norm? (WLOG assume ‖v‖ = 1)

The matrix uv^T has rows u1 v^T, . . . , um v^T; if the rows of A are a1^T, . . . , am^T, then

    ‖A − uv^T‖_F^2 = Σ_{j=1}^m ‖aj^T − uj v^T‖^2 = Σ_{j=1}^m ‖aj − uj v‖^2

This is minimal if uj v is the projection P_∥ aj of aj onto ls(v), the linear span of v:

    Σ_{j=1}^m ‖aj − P_∥ aj‖^2 = Σ_{j=1}^m ‖P_⊥ aj‖^2 = Σ_{j=1}^m ‖aj‖^2 − Σ_{j=1}^m ‖P_∥ aj‖^2

Thus we need to maximize

    Σ_{j=1}^m ‖P_∥ aj‖^2 = Σ_{j=1}^m |aj^T v|^2 = Σ_{j=1}^m |v^T aj|^2 = ‖Av‖^2

The trolley-line-location problem


We reduced the above problem to the following one:
    Maximize ‖Av‖ under the restriction ‖v‖ = 1
This is exactly what we get in the trolley-line-location problem:
    Choose a direction v to minimize the sum of squared distances from a1, . . . , am to the line

The trolley line problem


Problem:
For given vectors a1, a2, . . . , am in R^n, find their best line fit ℓ
The objective function to be minimized:

    f(ℓ) := Σ_{k=1}^m dist^2(ak, ℓ)

v is the unit vector along ℓ and Pv := vv^T the orthogonal projector onto ℓ;
then dist(ak, ℓ) = ‖ak − Pv ak‖, so that

    f(ℓ) = Σ_k ‖ak − Pv ak‖^2 = Σ_k ‖ak‖^2 − Σ_k ‖Pv ak‖^2

thus one needs to maximize the sum

    Σ_k ‖Pv ak‖^2 = Σ_k ‖vv^T ak‖^2 = Σ_k |ak^T v|^2 = ‖Av‖^2,

where A has rows a1^T, a2^T, . . . , am^T

Solution to the rank-one approximation problem


Consider the quadratic form

    Q(v) := ‖Av‖^2 = (Av)^T (Av) = v^T A^T A v

and denote
    the largest eigenvalue of A^T A by σ1^2
    a corresponding unit eigenvector (the first principal axis) by v1
then A^T A v1 = σ1^2 v1,

    max{Q(v) : ‖v‖ = 1} = Q(v1) = σ1^2,

and u1 := Av1/σ1 satisfies A^T u1 = σ1 v1

Solution to the rank-one approximation problem:
In the Frobenius norm, the best rank-one approximation of A is σ1 u1 v1^T

This leads to the notion of singular values of A

Motivation

The spectral decomposition A = P D P⁻¹ works perfectly for symmetric matrices:
    P is an orthogonal matrix
    columns of P are eigenvectors of A
    D is diagonal with the eigenvalues on the diagonal
Is there anything similar for nonsymmetric matrices?
What about rectangular matrices?
Use the Singular Value Decomposition (SVD):

    A = U Σ V^T

Singular values

Let A be any m × n matrix
B := A^T A is n × n and nonnegative:

    x^T B x = x^T (A^T A) x = (Ax)^T (Ax) = ‖Ax‖^2 ≥ 0

denote by λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0 the eigenvalues of B
σj := √λj are called the singular values of A
notice that there are r := rank B = rank A positive σj

Example

    A = [ 1 1
          0 1
          1 0 ];   B = A^T A = [ 2 1
                                 1 2 ]   has eigenvalues λ1 = 3 and λ2 = 1;

thus σ1 = √3, σ2 = 1
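A quick NumPy check of this example (added as a sketch): the singular values can be obtained either from the eigenvalues of AᵀA or directly from the SVD routine.

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.],
              [1., 0.]])

B = A.T @ A                                   # [[2, 1], [1, 2]]
lam = np.linalg.eigvalsh(B)[::-1]             # eigenvalues in descending order: [3, 1]
print(np.sqrt(lam))                           # singular values: [sqrt(3), 1]
print(np.linalg.svd(A, compute_uv=False))     # the same values from the SVD routine
```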

SVD theorem

Theorem (SVD)
Every m × n matrix A can be written as

    A = U Σ V^T,

where U and V are orthogonal and Σ is an m × n matrix with the singular values of A on its main diagonal and zeros otherwise

Remark
This is an analogue of the diagonalization A = U D U^T of a symmetric matrix A

SVD theorem
Theorem (SVD — expanded form)
Every m × n matrix A of rank r can be written as A = U Σ V^T, where
    U = (u1 . . . ur | u_{r+1} . . . um),
    V = (v1 . . . vr | v_{r+1} . . . vn),
    Σ has σj on its main diagonal and zeros otherwise
    vj are eigenvectors of A^T A with eigenvalues σj^2:  A^T A vj = σj^2 vj
    uj := Avj/‖Avj‖ = Avj/σj for j = 1, . . . , r form an ONB for the range of A
    u1, . . . , um is an ONB for R^m
    A = σ1 u1 v1^T + · · · + σr ur vr^T
The vectors u1, . . . , ur are the left singular vectors of A;
v1, . . . , vr are the right singular vectors of A
Remark: A vj = σj uj,  A^T uj = σj vj,  A A^T uj = σj^2 uj

Proof of the SVD decomposition

Start with orthonormal eigenvectors v1, . . . , vn and eigenvalues σ1^2, . . . , σn^2 of A^T A
Form uj := Avj/‖Avj‖ = Avj/σj, j = 1, . . . , r (r = rank A)

    ui^T uj = vi^T A^T A vj / (σi σj) = (σj/σi) vi^T vj = δij

complete with u_{r+1}, . . . , um to an ONB of R^m
Now

    U Σ = (σ1 u1 . . . σr ur | 0 . . . 0)        (n − r zero columns)
        = (A v1 . . . A vr | 0 . . . 0) = A (v1 . . . vn) = A V

since V is orthogonal, V V^T = I yields A = U Σ V^T

Example

For A = [ 1 1
          0 1
          1 0 ],  we find that

    σ1 = √3 and σ2 = 1
    v1 = (1/√2, 1/√2)^T and v2 = (1/√2, −1/√2)^T
    u1 = (1/√3) (√2, 1/√2, 1/√2)^T,
    u2 = (0, −1/√2, 1/√2)^T,
    u3 = (1/√3) (−1, 1, 1)^T

    σ1 u1 v1^T + σ2 u2 v2^T = [ 1    1         [ 0     0
                                1/2  1/2   +     −1/2  1/2    = A
                                1/2  1/2 ]        1/2 −1/2 ]

σ1 u1 v1^T is the best rank-one approximation of A in the Frobenius norm Σ_{i,j} (aij − bij)^2
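The same example computed with NumPy (a sketch added for illustration; the signs of the singular vectors may differ from the slide, but the rank-one term is the same):

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.],
              [1., 0.]])

U, s, Vt = np.linalg.svd(A)                   # full SVD: U is 3x3, s = [sqrt(3), 1], Vt is 2x2

Sigma = np.zeros_like(A)                      # 3x2 matrix with s on the main diagonal
Sigma[:2, :2] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))         # A = U Sigma V^T

A1 = s[0] * np.outer(U[:, 0], Vt[0])          # sigma_1 u_1 v_1^T
print(A1)                                     # [[1, 1], [0.5, 0.5], [0.5, 0.5]]
```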

Interpretation of SVD

A = U Σ V^T implies a decomposition of x ↦ Ax into

    x ↦ y := V^T x,   y ↦ z := Σ y,   z ↦ Ax = U z

x ↦ y finds the coordinates of the vector x in one orthonormal basis (v1, . . . , vn)
y ↦ z scales those coordinates
z ↦ Ax finds the vector with the scaled coordinates in another orthonormal basis (u1, . . . , um)

Reduced SVD
In the SVD representation, some part is uninformative:
    v_{r+1}, . . . , vn are chosen arbitrarily in the nullspace of A
    u_{r+1}, . . . , um are chosen arbitrarily in the nullspace of A^T
    Σ has zero rows or columns
The reduced SVD removes that uninformative part:

    A = (u1 · · · ur)  diag(σ1, . . . , σr)  (v1 · · · vr)^T
        \___ m × r __/ \______ r × r ______/ \___ r × n ___/

The reduced SVD of A^T:

    A^T = (v1 · · · vr)  diag(σ1, . . . , σr)  (u1 · · · ur)^T
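In NumPy the reduced form corresponds to full_matrices=False (a sketch; note that NumPy keeps min(m, n) columns rather than exactly r):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # U: 6x3, s: (3,), Vt: 3x3
print(U.shape, s.shape, Vt.shape)
print(np.allclose(A, (U * s) @ Vt))                # A = U diag(s) V^T
```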

To summarize:
The SVD for arbitrary rectangular matrices is an analogue of the spectral decomposition for square (especially symmetric) matrices
The factorization

    A = U Σ V^T

means:
    rotation V^T: change of basis to v1, . . . , vn
    stretch Σ: multiplication by the singular values along the vj
    rotation U: change of basis to u1, . . . , um
in particular, Ax = b is equivalent to Σ c = d, where
    c := V^T x is the coordinate vector of x in the ONB v1, . . . , vn
    d := U^T b is the coordinate vector of b in the ONB u1, . . . , um;
thus x = Σ_k ck vk and b = Σ_k dk uk with dk = σk ck for k = 1, . . . , r
Geometrically this means that A maps the unit ball B_n of R^n into the "degenerate" ellipsoid E_m of R^m:

    Σ_k (dk/σk)^2 ≤ 1

Geometrical meaning of SVD

v1 ↦ e1 ↦ σ1 e1 ↦ σ1 u1
v2 ↦ e2 ↦ σ2 e2 ↦ σ2 u2

Polar decomposition

Theorem (Polar decomposition)
Any square matrix A can be written as QS with an orthogonal Q and a symmetric positive semidefinite S

Why polar?

    z = r e^{iθ}  =⇒  z̄ z = |z|^2 = r^2
    A = QS  =⇒  A^T A = S (Q^T Q) S = S^2

Proof.
Write A = U Σ V^T = (U V^T)(V Σ V^T) =: QS
    Q := U V^T is orthogonal
    S := V Σ V^T is symmetric and positive semidefinite
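A small NumPy sketch of the construction used in the proof:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                        # orthogonal factor
S = Vt.T @ np.diag(s) @ Vt        # symmetric positive semidefinite factor

print(np.allclose(A, Q @ S))                        # A = QS
print(np.allclose(Q.T @ Q, np.eye(4)))              # Q is orthogonal
print(np.all(np.linalg.eigvalsh(S) >= -1e-12))      # S is positive semidefinite
```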

Image compression
Instead of storing the m × n numerical entries, one can store the best rank-r approximation of A; this needs only r(1 + m + n) numbers

Pseudo-inverse
A rectangular A cannot be inverted!
However, a pseudo-inverse A^+ can be defined s.t. A^+ A ≈ I_n and A A^+ ≈ I_m
for Σ, the pseudo-inverse Σ^+ should satisfy

    Σ^+ Σ = I_r ⊕ 0_{n−r},   Σ Σ^+ = I_r ⊕ 0_{m−r}

so Σ^+ is Σ transposed with each σj replaced by 1/σj
if A = U Σ V^T, then its pseudo-inverse is A^+ := V Σ^+ U^T: indeed,

    A^+ A = V Σ^+ (U^T U) Σ V^T = V (Σ^+ Σ) V^T = V (I_r ⊕ 0_{n−r}) V^T   (the orthogonal projector onto span{v1, . . . , vr})
    A A^+ = U Σ (V^T V) Σ^+ U^T = U (Σ Σ^+) U^T = U (I_r ⊕ 0_{m−r}) U^T   (the orthogonal projector onto span{u1, . . . , ur} = range A)
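A sketch of the pseudo-inverse built from the SVD, compared with NumPy's pinv (which uses the same construction):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))                 # full column rank almost surely

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.where(s > 1e-12, 1.0 / s, 0.0)      # invert only the nonzero singular values
A_plus = (Vt.T * s_plus) @ U.T                  # A^+ = V Sigma^+ U^T

print(np.allclose(A_plus, np.linalg.pinv(A)))   # matches NumPy's pseudo-inverse
print(np.allclose(A_plus @ A, np.eye(3)))       # here rank A = 3, so A^+ A = I_3
```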

Best solution to Ax = b:
Recall that Ax = b is solvable ⇐⇒ b belongs to the range (i.e., column space) of A
    if n > rank A, then there are many solutions (or none)
    (the homogeneous equation Ax = 0 has nontrivial solutions)
    any solution has the form x = x0 + x1, with x0 a particular solution and x1 any solution of Ax = 0
if b is not in the range, solve the normal equation A^T A x = A^T b to get a least-squares solution
    if rank A = n, then A^T A is invertible
    otherwise, the least-squares solution is not unique
    look for the shortest solution x̂
Claim: if A = U Σ V^T, then x̂ = V Σ^+ U^T b
    ‖Ax − b‖ = ‖U Σ V^T x − b‖ = ‖Σ V^T x − U^T b‖ = ‖Σ y − U^T b‖ with y := V^T x;
    the shortest solution: y = Σ^+ (U^T b) and x = V y = (V Σ^+ U^T) b
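A sketch checking the claim numerically on a rank-deficient system: the pseudo-inverse gives the minimum-norm least-squares solution, which is also what np.linalg.lstsq returns.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]            # make A rank-deficient (rank 3)
b = rng.standard_normal(6)

x_hat = np.linalg.pinv(A) @ b          # x_hat = V Sigma^+ U^T b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.allclose(x_hat, x_lstsq))     # both give the shortest least-squares solution
```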

SVD vs PCA
Observe that the largest value of ‖Ax‖ with ‖x‖ ≤ 1 is attained at x = v1 and equals σ1;
this is the first principal axis for A^T A:
    indeed, A^T A = V Σ^T U^T U Σ V^T = V Σ^T Σ V^T = V D V^T is the spectral decomposition of the symmetric matrix B := A^T A
    B has eigenvalues σk^2 with eigenvectors vk
    the quadratic form Q(x) := x^T B x is equal to ‖Ax‖^2
    by the minimax properties of the eigenvalues,

        σ1^2 = max_{‖x‖=1} x^T B x = max_{‖x‖=1} ‖Ax‖^2,
        σ2^2 = max_{‖x‖=1, x⊥v1} x^T B x = max_{‖x‖=1, x⊥v1} ‖Ax‖^2,
        σ3^2 = . . .

A^T A can be considered as a correlation matrix for the columns of A

The trolley line problem


Problem:
For given vectors a1, a2, . . . , am in R^n, find their best line fit ℓ
The target function to be minimized:

    f(ℓ) := Σ_{k=1}^m dist^2(ak, ℓ)

u is the unit vector along ℓ and Pu := uu^T the orthogonal projector onto ℓ;
then dist(ak, ℓ) = ‖ak − Pu ak‖, so that

    f(ℓ) = Σ_k ‖ak − Pu ak‖^2 = Σ_k ‖ak‖^2 − Σ_k ‖Pu ak‖^2

thus one needs to maximize the sum

    Σ_k ‖Pu ak‖^2 = Σ_k ‖uu^T ak‖^2 = Σ_k |ak^T u|^2 = ‖Au‖^2,

where A has rows a1^T, a2^T, . . . , am^T

Solution: The best direction is the first right singular vector v1 of A
The smallest value of the target function is Σ_k ‖ak‖^2 − σ1^2 = σ2^2 + · · · + σn^2

Best low-rank approximation of A


Frobenius norm of a matrix

    ‖A‖_F^2 = Σ_{i,j} |aij|^2 = Σ_i ‖ai‖^2 = Σ_i σi^2

Indeed,
    pre-/post-multiplying by an orthogonal matrix does not change ‖·‖_F
    thus A = U Σ V^T yields ‖A‖_F^2 = ‖U^T A V‖_F^2 = ‖Σ‖_F^2
    another reason: ‖A‖_F^2 = trace(A^T A); now
    trace(A^T A) = trace(V Σ^T Σ V^T) = trace(Σ^T Σ) = Σ_k σk^2

Best rank-one approximation of A in the Frobenius norm

For a rank-one matrix B = uv^T, ‖B‖_F^2 = ‖u‖^2 ‖v‖^2; thus (with ‖v‖ = 1)

    ‖A − uv^T‖_F^2 = trace((A − uv^T)^T (A − uv^T))
                   = . . .
                   = ‖A‖_F^2 − ‖Av‖^2 + ‖Av − u‖^2

Linear systems

Linear equation, or linear system:

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
    . . . . . . . . . . . . . . . . . . . . . . . .        ⇐⇒   Ax = b
    am1 x1 + am2 x2 + · · · + amn xn = bm

A an m × n matrix, x ∈ R^n, b ∈ R^m
Solution: x = A⁻¹ b for invertible A (m = n)
Computation of A⁻¹ is costly, ∼ O(n^3), and not always necessary!
Alternatively, use the Gaussian elimination method
In matrix form, it amounts to an LU representation of A
    L stands for "lower"- and U for "upper"-triangular

Elementary row operations that simplify linear systems


Definition
A system is called consistent if it possesses at least one solution
Two linear systems are called equivalent if they possess the same
set of solutions
Idea:
Transform a system to a simpler form without changing the set of
solutions using the following elementary row transformations:
1. Multiply an equation/a row through by a nonzero constant
2. Add a constant times one equation/row to another
3. Interchange two equations/rows
Properties
1 Elementary row operations are reversible
2 Lead to equivalent systems/augmented matrices

Example: elementary row operations


 
     x + 2y − z = 2                        [ 1  2 −1 |  2 ]
    2x +  y − z = 1    −2 × (r1)           [ 2  1 −1 |  1 ]
     x +  y − z = 0    −1 × (r1)           [ 1  1 −1 |  0 ]

     x + 2y − z = 2                        [ 1  2 −1 |  2 ]
        − 3y + z = −3  (r2) → (r3)         [ 0 −3  1 | −3 ]
        −  y     = −2  (r3) → (r2)         [ 0 −1  0 | −2 ]

     x + 2y − z = 2                        [ 1  2 −1 |  2 ]
        −  y     = −2  ×(−1)               [ 0 −1  0 | −2 ]
        − 3y + z = −3  +3 × (r2)           [ 0 −3  1 | −3 ]

     x + 2y − z = 2    −2 × (r2) + (r3)    [ 1  2 −1 |  2 ]
             y       = 2                   [ 0  1  0 |  2 ]
                   z = 3                   [ 0  0  1 |  3 ]

     x               = 1                   [ 1  0  0 |  1 ]
             y       = 2                   [ 0  1  0 |  2 ]
                   z = 3                   [ 0  0  1 |  3 ]

Elementary row transformations revisited


A is an m × n coefficient matrix of a system

1. Row multiplication: multiply the i-th row by t ≠ 0
   Amounts to matrix multiplication EA, with E = diag{1, . . . , 1, t, 1, . . . , 1} (the t in position i);
   its inverse is E⁻¹ = diag{1, . . . , 1, t⁻¹, 1, . . . , 1}

2. Row replacement: add α times the k-th row to the ℓ-th row
   Amounts to EA, with (E)ii = 1, (E)ℓk = α, and (E)ij = 0 otherwise;
   the inverse E⁻¹ is of the same form with −α in position (ℓ, k)

Elementary row transformations revisited

3. Row interchange: interchange the k-th and ℓ-th rows
   Amounts to matrix multiplication EA, where E = E⁻¹ is the identity matrix with its k-th and ℓ-th rows interchanged (a permutation matrix)

Definition
The above matrices performing the elementary row operations are called elementary matrices

Example: ERO via matrix multiplication


   
    [ 1  2 −1 ]                          [  1  0  0 ]
    [ 2  1 −1 ]   −2 × (r1)       E1 =   [ −2  1  0 ]
    [ 1  1 −1 ]   −1 × (r1)              [ −1  0  1 ]

    [ 1  2 −1 ]                          [ 1    0   0 ]
    [ 0 −3  1 ]                   E2 =   [ 0    1   0 ]
    [ 0 −1  0 ]   −1/3 × (r2)            [ 0  −1/3  1 ]

                 [ 1  2   −1  ]
    E2 E1 A  =   [ 0 −3    1  ]  =: U   =⇒
                 [ 0  0  −1/3 ]

                           [ 1   0   0 ]
    A = E1⁻¹ E2⁻¹ U   =    [ 2   1   0 ]  U  =  L U
                           [ 1  1/3  1 ]

    L: lower-triangular,  U: upper-triangular
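A NumPy check of this example (a sketch added for illustration):

```python
import numpy as np

A = np.array([[1., 2., -1.],
              [2., 1., -1.],
              [1., 1., -1.]])

E1 = np.array([[ 1., 0., 0.],
               [-2., 1., 0.],
               [-1., 0., 1.]])
E2 = np.array([[1.,    0., 0.],
               [0.,    1., 0.],
               [0., -1/3,  1.]])

U = E2 @ E1 @ A                              # upper-triangular, last pivot -1/3
L = np.linalg.inv(E1) @ np.linalg.inv(E2)    # unit lower-triangular

print(U)
print(L)                                     # [[1, 0, 0], [2, 1, 0], [1, 1/3, 1]]
print(np.allclose(A, L @ U))                 # A = LU
```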

Products of lower-triangular matrices

Definition (Lower- and upper-triangular matrices)


A square matrix is called lower-triangular (upper-triangular) if all its
entries above (below) the main diagonal are zero

Lemma
Product of two lower-triangular (upper-triangular) matrices is
lower-triangular (upper-triangular)

Proof.
Use the row or column form of matrix-matrix product

LU factorization
Theorem
Assume that an m × n matrix A can be reduced to a row echelon form U using only row replacement operations. Then A = LU with a lower-triangular m × m matrix L.

Proof.
Ek · · · E1 A = U  =⇒  L = (Ek · · · E1)⁻¹ = E1⁻¹ · · · Ek⁻¹

Definition
The above representation A = LU, with an m × m lower-triangular matrix L and an upper-triangular* m × n matrix U, is the LU-factorization of A.

Remark (*)
    L is unique if all its diagonal entries are 1
    If row interchanges are needed, use PA = LU, with a permutation matrix P encoding all the row interchanges

Why is LU-factorization important?

    Ax = b   ⇐⇒   { Ly = b
                   { Ux = y

Typically, O(n^3) flops are needed to solve Ax = b
To solve Ly = b and Ux = y, one needs only O(n^2) flops
    (e.g., the elementary row operations transforming L to I_n act only on b)
Can also be used to find A⁻¹ for nonsingular A
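A sketch of this splitting using SciPy (assuming SciPy is available); the system is the one from the row-operations example above, with solution (1, 2, 3):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1., 2., -1.],
              [2., 1., -1.],
              [1., 1., -1.]])
b = np.array([2., 1., 0.])

lu, piv = lu_factor(A)           # factor PA = LU once, O(n^3)
x = lu_solve((lu, piv), b)       # each new right-hand side then costs only O(n^2)
print(x)                         # [1. 2. 3.]
```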

What is a Cholesky decomposition?


Recall:
    A = LU (or PA = LU) exists for rectangular matrices
    an LU-representation is not unique!
    usually L is m × m with 1's on the main diagonal;
    then both L and U = L⁻¹A are uniquely determined
    reason: if A = L1 U1 = L2 U2 (for a nonsingular square A), then

        L2⁻¹ L1 = U2 U1⁻¹

        L2⁻¹ L1 is lower-triangular with 1's on the diagonal
        U2 U1⁻¹ is upper-triangular
        thus L2⁻¹ L1 = U2 U1⁻¹ = I

if A is nonsingular, U has a nonzero diagonal
    can "factor it out" as D to get A = LDU with U having 1's on the main diagonal
for symmetric matrices, U = L^T and A = L D L^T
    reason: A^T = U^T D L^T and A^T = A = L D U; by uniqueness, U = L^T

Standard Cholesky decomposition


The standard form of the Cholesky decomposition reads

    A = L L^T   with L lower-triangular

it requires A symmetric and positive semi-definite:

    x^T A x = x^T L L^T x = (L^T x)^T (L^T x) = ‖L^T x‖^2 ≥ 0

conversely, if A is positive semi-definite, the LL^T decomposition exists
for positive definite A, such a decomposition is (almost) unique!
if L has diagonal part S, then writing L = L1 S (with L1 having unit diagonal) and D := S^2, we get

    A = L L^T = (L1 S)(S L1^T) = L1 S^2 L1^T = L1 D L1^T

Thus the LDL^T decomposition is more general, as it does not require positive semi-definiteness

Applications

For numerical solution of Ax = b

As in the LU factorization, split solving Ax = b into
    Ly = b (forward substitution) and
    L^T x = y (backward substitution)
For symmetric matrices, this is about twice as efficient as LU

Monte Carlo simulation of correlated random variables

The covariance matrix Σ of a zero-mean Gaussian random vector X := (X1, X2, . . . , Xn)^T is positive (semi-)definite. If Σ = L L^T, then X can be simulated as LZ with a standard Gaussian vector Z; indeed, then

    E(X X^T) = E(L Z Z^T L^T) = L E(Z Z^T) L^T = L L^T = Σ
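A sketch of this simulation recipe in NumPy, using the 3 × 3 matrix from the Cholesky example below as the target covariance:

```python
import numpy as np

Sigma = np.array([[4.,  0.,  2.],
                  [0.,  1., -1.],
                  [2., -1.,  6.]])      # symmetric positive definite covariance

L = np.linalg.cholesky(Sigma)           # Sigma = L L^T, L lower-triangular

rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 100_000))   # independent standard Gaussian samples
X = L @ Z                               # correlated samples with covariance ~ Sigma

print(np.round(np.cov(X), 2))           # empirical covariance is close to Sigma
```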

The algorithm
Idea:
As in Gaussian elimination, make the entries below the diagonal zero

Recursive algorithm:
1  start with i := 1 and A^(1) := A
2  at step i, the matrix A^(i) has the following form:

       A^(i) = [ I_{i−1}   0      0
                 0         a_ii   b_i^T
                 0         b_i    B^(i) ],

   with the identity matrix I_{i−1} of size i − 1
3  set

       L_i := [ I_{i−1}   0              0
                0         √a_ii          0
                0         b_i / √a_ii    I_{n−i} ]

Recursive algorithm (continued):


4  write

       A^(i) = L_i A^(i+1) L_i^T;

   then

       A^(i+1) = [ I_{i−1}   0    0
                   0         1    0
                   0         0    B^(i) − (1/a_ii) b_i b_i^T ]

5  finally, A^(n+1) = I_n and so we get

       A = A^(1) = L1 A^(2) L1^T = L1 L2 A^(3) L2^T L1^T
         = · · · = L1 L2 . . . Ln Ln^T . . . L2^T L1^T
         = L1 L2 . . . Ln (L1 L2 . . . Ln)^T
         = L L^T

   with L := L1 L2 . . . Ln
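A direct implementation of this elimination scheme (a sketch; no pivoting or error checking, so it assumes A is symmetric positive definite):

```python
import numpy as np

def cholesky_factor(A):
    """Cholesky factor L (A = L L^T) via the elimination scheme above:
    at step i, divide column i by sqrt(a_ii) and subtract (1/a_ii) b b^T
    from the trailing block."""
    A = np.array(A, dtype=float)        # work on a copy
    n = A.shape[0]
    L = np.zeros_like(A)
    for i in range(n):
        piv = np.sqrt(A[i, i])
        L[i, i] = piv
        L[i+1:, i] = A[i+1:, i] / piv
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i+1:, i]) / A[i, i]
    return L

A = np.array([[4.,  0.,  2.],
              [0.,  1., -1.],
              [2., -1.,  6.]])
L = cholesky_factor(A)
print(L)                                 # [[2, 0, 0], [0, 1, 0], [1, -1, 2]]
print(np.allclose(L @ L.T, A))           # True; agrees with np.linalg.cholesky(A)
```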

Example

   
    A = A^(1) = [ 4  0  2              [ 2  0  0
                  0  1 −1    =⇒  L1 =    0  1  0
                  2 −1  6 ]              1  0  1 ]

    A^(1) = L1 A^(2) L1^T  =⇒  A^(2) = [ 1  0  0
                                         0  1 −1     =⇒  L̃2 = [  1  0
                                         0 −1  5 ]               −1  1 ]

    Ã^(2) = L̃2 Ã^(3) L̃2^T  =⇒  Ã^(3) = [ 1  0      =⇒  L̃3 = [ 1  0
                                           0  4 ]                0  2 ]

    L = L1 L2 L3 = [ 2  0  0   [ 1  0  0   [ 1  0  0       [ 2  0  0
                     0  1  0     0  1  0     0  1  0    =    0  1  0
                     1  0  1 ]   0 −1  1 ]   0  0  2 ]       1 −1  2 ]

    A = L L^T = [ 2  0  0   [ 2  0  1
                  0  1  0     0  1 −1
                  1 −1  2 ]   0  0  2 ]

QR factorization = matrix form of Gram–Schmidt

Assume A has linearly independent columns a1, . . . , an
    perform Gram–Schmidt orthogonalization to get q1, . . . , qn
    at each step, qk is a linear combination of ak, q1, . . . , q_{k−1}
    thus ak is in the span of q1, . . . , qk

        ak = P1 ak + · · · + Pk ak = q1 q1^T ak + · · · + qk qk^T ak

    in matrix form, this becomes a QR factorization:

        (a1 a2 . . . an)  =  (q1 q2 . . . qn)  [ q1^T a1   q1^T a2   . . .   q1^T an
        \______A_______/     \______Q_______/             q2^T a2   . . .   q2^T an
                                                                    . . .
                                                                            qn^T an ]
                                               \________________R________________/

R can be calculated as Q^T A
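A sketch of this procedure as code (classical Gram–Schmidt; assumes the columns are linearly independent):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Reduced QR factorization via classical Gram-Schmidt."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        v = A[:, k].astype(float).copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]        # q_j^T a_k
            v -= R[j, k] * Q[:, j]             # remove the component along q_j
        R[k, k] = np.linalg.norm(v)
        Q[:, k] = v / R[k, k]
    return Q, R

A = np.array([[1., 1.],
              [0., 1.],
              [1., 0.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(A, Q @ R))               # A = QR
print(np.allclose(Q.T @ Q, np.eye(2)))     # orthonormal columns
```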



Full QR factorization and applications


In the above form, Q is m × n and R is n × n
often called the reduced QR factorization
Q is not orthogonal (as it is not square)
add m − n columns to get an orthogonal Q̃ and add m − n zero
rows to R;
then A = Q̃ R̃ is the full QR factorization
Application of QR to least squares:
AT A = (QR)T (QR) = R T R;
therefore, R T R x̂ = R T Q T b =⇒ R x̂ = Q T b
as R is upper-triangular, this is very fast! Que: why is R invertible?

Advantages of QR-decomposition:
Orthogonal columns of Q make algorithm stable
(norms do not increase or decrease)
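A sketch of the QR route to least squares, compared with NumPy's lstsq:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))       # tall matrix with independent columns
b = rng.standard_normal(8)

Q, R = np.linalg.qr(A)                # reduced QR: Q is 8x3, R is 3x3
x_hat = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b (R upper-triangular, invertible)

print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # same least-squares solution
```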

Full QR factorization and applications


Another method to find Q and R involves Householder reflections

    Q = I − 2 v v^T,   ‖v‖ = 1

A Householder reflection Q can be chosen so that Qx = ‖x‖ e1: set u = x − ‖x‖ e1 and v = u/‖u‖
The impact on A: take x to be the first column of A; then

    QA = [ ‖x‖  *  . . .  *
            0   *  . . .  *
            ⋮    ⋮          ⋮
            0   *  . . .  * ]

Yet another method uses Givens rotations
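A sketch of one Householder step (without the usual sign trick for numerical stability; it assumes x is not already a multiple of e1):

```python
import numpy as np

def householder(x):
    """Reflection Q = I - 2 v v^T with Q x = ||x|| e_1."""
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    u = x - np.linalg.norm(x) * e1
    v = u / np.linalg.norm(u)
    return np.eye(len(x)) - 2.0 * np.outer(v, v)

A = np.array([[1., 1.],
              [0., 1.],
              [1., 0.]])
Q = householder(A[:, 0])
print(np.round(Q @ A, 12))     # first column becomes (||a_1||, 0, 0)^T = (sqrt(2), 0, 0)^T
```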

Applications of QR

QR eigenvalue algorithm
    On each step, factorize Ak = Qk Rk and set A_{k+1} := Rk Qk
    as Rk = Qk⁻¹ Ak, one gets A_{k+1} = Qk⁻¹ Ak Qk
    thus Ak and A_{k+1} have the same eigenvalues
    typically, Ak converges to an upper-triangular matrix R (the Schur form of A)
    eigenvalues of A = diagonal entries of R
    (a code sketch is given below)
Fourier transform and Fast Fourier transform
...
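A sketch of the unshifted QR eigenvalue iteration described above (production codes use shifts and a Hessenberg reduction, but the plain version already illustrates the idea):

```python
import numpy as np

def qr_eigenvalues(A, iters=200):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k has the same eigenvalues as A_k
    and typically converges to an upper-triangular (Schur) form."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return np.diag(Ak)

A = np.array([[15., 10.],
              [10.,  0.]])                # eigenvalues 20 and -5 (see the example above)
print(qr_eigenvalues(A))                  # approximately [20, -5]
print(np.linalg.eigvals(A))               # reference values
```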
