Linear Algebra: Lecture Notes
Rostyslav Hryniv
1st term
Autumn 2019
Singular Value Decomposition, LU decomposition, Cholesky decomposition, QR and around
go to socrative.com
press student login
enter the room LAUCU2019
answer 10 questions on eigenvalues and eigenvectors
Outline
1 Singular Value Decomposition
  Spectral Theorem
  Low rank approximations
  Definition
  Explanation and proof
  Applications of SVD
2 LU decomposition
  Linear systems
  Elementary matrices
  LU factorization
3 Cholesky decomposition
  Motivation
  Applications of Cholesky decomposition
  Algorithm
4 QR and around
  Applications of QR
Spectral Theorem: every symmetric n × n matrix A, with eigenvalues λ1, . . . , λn and an ONB of eigenvectors u1, . . . , un, can be written as
A = λ1 u1 u1^T + · · · + λn un un^T
Example: let
A = (15 10; 10 0)
Why low-rank?
Cost comparison:
a full m × n matrix requires mn numbers to store;
a rank-one matrix σuv^T requires only m + n + 1.
This is important, e.g., for image compression.
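A quick numerical check of the decomposition and its best rank-one truncation (a minimal numpy sketch; the 2 × 2 matrix is the example above):

```python
import numpy as np

A = np.array([[15.0, 10.0],
              [10.0, 0.0]])

# Spectral decomposition of the symmetric A: eigh returns an ONB of
# eigenvectors, so A = lambda_1 u1 u1^T + lambda_2 u2 u2^T.
lam, U = np.linalg.eigh(A)
A_rebuilt = sum(lam[k] * np.outer(U[:, k], U[:, k]) for k in range(len(lam)))
assert np.allclose(A, A_rebuilt)

# Keeping the eigenvalue of largest magnitude gives the best rank-one
# approximation; it costs m + n + 1 numbers instead of m * n.
k = np.argmax(np.abs(lam))
A1 = lam[k] * np.outer(U[:, k], U[:, k])
print(np.round(A1, 2))
```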
If A is non-symmetric, then
its eigenvectors need not be orthogonal, and
it may have too few eigenvectors (one then has to use generalized eigenvectors).
If A is non-square, there are no eigenvalues and eigenvectors at all!
The matrix uv^T has rows u1 v^T, . . . , um v^T; if the rows of A are a1^T, . . . , am^T, then

‖A − uv^T‖F² = ∑_{j=1}^m ‖aj^T − uj v^T‖² = ∑_{j=1}^m ‖aj − uj v‖²

This is minimal if uj v is the projection P∥ aj of aj onto ls(v):

∑_{j=1}^m ‖aj − P∥ aj‖² = ∑_{j=1}^m ‖P⊥ aj‖² = ∑_{j=1}^m ‖aj‖² − ∑_{j=1}^m ‖P∥ aj‖²

Thus one needs to maximize (over unit vectors v)

∑_{j=1}^m ‖P∥ aj‖² = ∑_{j=1}^m |aj^T v|² = ∑_{j=1}^m |v^T aj|² = ‖Av‖²
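A numerical illustration of the last step (a sketch with a random test matrix): the maximizer is the top right singular vector v1, and no unit vector does better than σ1:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# v1 = first right singular vector; its value of ||Av|| equals sigma_1.
_, s, Vt = np.linalg.svd(A)
v1 = Vt[0]
assert np.isclose(np.linalg.norm(A @ v1), s[0])

# ||Av|| over random unit vectors never exceeds sigma_1.
for _ in range(1000):
    v = rng.standard_normal(4)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= s[0] + 1e-12
```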
Motivation
A = UΣV T
Singular values
Example
For
A = (1 1; 0 1; 1 0),
B = A^T A = (2 1; 1 2) has eigenvalues λ1 = 3 and λ2 = 1;
thus σ1 = √3, σ2 = 1
SVD theorem
Theorem (SVD)
Every m × n matrix A can be written as
A = UΣV^T
with an orthogonal m × m matrix U, an orthogonal n × n matrix V, and an m × n matrix Σ whose only nonzero entries are σ1 ≥ · · · ≥ σr > 0 on the main diagonal.
Remark
This is an analogue of the diagonalization A = UDU^T of a symmetric matrix A.
SVD theorem
Theorem (SVD — expanded form)
Every m × n matrix A of rank r can be written as A = UΣV^T, where
U = (u1 . . . ur | ur+1 . . . um),
V = (v1 . . . vr | vr+1 . . . vn),
Σ has σj on its main diagonal and zeros otherwise;
vj are eigenvectors of A^T A with eigenvalues σj²: A^T A vj = σj² vj;
uj := Avj/‖Avj‖ = Avj/σj for j = 1, . . . , r form an ONB for the range of A;
u1, . . . , um is an ONB for R^m;
A = σ1 u1 v1^T + · · · + σr ur vr^T
The vectors u1, . . . , ur are the left singular vectors of A;
v1, . . . , vr are the right singular vectors of A.
Remark: Avj = σj uj, A^T uj = σj vj, A A^T uj = σj² uj
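These relations are easy to verify numerically (a minimal numpy sketch with a random 5 × 3 matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A)   # A = U Sigma V^T; U is 5x5, Vt is 3x3
V = Vt.T

for j in range(len(s)):
    # v_j is an eigenvector of A^T A with eigenvalue sigma_j^2
    assert np.allclose(A.T @ A @ V[:, j], s[j]**2 * V[:, j])
    # A v_j = sigma_j u_j and A^T u_j = sigma_j v_j
    assert np.allclose(A @ V[:, j], s[j] * U[:, j])
    assert np.allclose(A.T @ U[:, j], s[j] * V[:, j])
```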
UΣ = (σ1 u1 . . . σr ur | 0 . . . 0)   (n − r zero columns)
   = (Av1 . . . Avr | 0 . . . 0) = A(v1 . . . vn) = AV
Example
For A = (1 1; 0 1; 1 0), we find that
σ1 = √3 and σ2 = 1;
v1 = (1/√2, 1/√2)^T and v2 = (1/√2, −1/√2)^T;
u1 = (1/√3)(√2, 1/√2, 1/√2)^T,
u2 = (0, −1/√2, 1/√2)^T,
u3 = (1/√3)(−1, 1, 1)^T
Then
σ1 u1 v1^T + σ2 u2 v2^T = (1 1; 1/2 1/2; 1/2 1/2) + (0 0; −1/2 1/2; 1/2 −1/2) = A
σ1 u1 v1^T is the best rank-one approximation of A in the Frobenius norm ‖A − B‖F = (∑(aij − bij)²)^{1/2}
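The last claim can be checked numerically (a sketch of the same 3 × 2 example; by the Eckart–Young theorem the Frobenius error of the rank-one truncation equals σ2 = 1):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
U, s, Vt = np.linalg.svd(A)
A1 = s[0] * np.outer(U[:, 0], Vt[0])   # sigma_1 u1 v1^T

print(np.round(A1, 2))
print(np.linalg.norm(A - A1, 'fro'))   # 1.0 = sigma_2
```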
Interpretation of SVD
x ↦ y := V^T x,  y ↦ z := Σy,  z ↦ Ax = Uz
Reduced SVD
In the SVD representation, some part is uninformative:
vr+1, . . . , vn are chosen arbitrarily in the nullspace of A;
ur+1, . . . , um are chosen arbitrarily in the nullspace of A^T;
Σ has zero rows or columns.
The reduced SVD removes that uninformative part:

A = (u1 · · · ur) diag(σ1, . . . , σr) (v1 · · · vr)^T,

with the three factors of sizes m × r, r × r, and r × n.
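In numpy, the reduced ("thin") SVD is obtained with full_matrices=False (a sketch; for a rank-deficient matrix one would further truncate to the first r singular triples):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (100, 5) (5,) (5, 5)
assert np.allclose(A, U @ np.diag(s) @ Vt)
```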
To summarize:
The SVD for arbitrary rectangular matrices is an analogue of the
spectral decomposition for square (especially symmetric) matrices
the factorization A = UΣV^T means:
rotation V^T: change of basis to v1, . . . , vn
stretch Σ: multiplication by singular values along the vj
rotation U: change of basis to u1, . . . , um
in particular, Ax = b is equivalent to Σc = d, where
c := V^T x is the coordinate vector of x in the ONB v1, . . . , vn;
d := U^T b is the coordinate vector of b in the ONB u1, . . . , um;
thus x = ∑ ck vk and b = ∑ dk uk with dk = σk ck for k = 1, . . . , r
Geometrically this means that A maps the unit ball Bn of R^n onto the “degenerate” ellipsoid Em of R^m:

∑_{k=1}^r (dk/σk)² ≤ 1
v1 ↦ e1 ↦ σ1 e1 ↦ σ1 u1
v2 ↦ e2 ↦ σ2 e2 ↦ σ2 u2
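In code, solving Ax = b in SVD coordinates reads as follows (a sketch assuming a square nonsingular A, so that r = m = n):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)
d = U.T @ b        # coordinates of b in the ONB u_1, ..., u_m
c = d / s          # solve Sigma c = d, i.e. c_k = d_k / sigma_k
x = Vt.T @ c       # back from coordinates in v_1, ..., v_n to x
assert np.allclose(A @ x, b)
```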
Polar decomposition
Every square matrix A can be written as A = QS with an orthogonal Q and a symmetric positive semidefinite S.
Why polar?
z = re^{iθ} =⇒ z̄z = |z|² = r²
A = QS =⇒ A^T A = S(Q^T Q)S = S²
Proof.
Write A = UΣV^T = (UV^T)(V ΣV^T) =: QS
Q := UV^T is orthogonal
S := V ΣV^T is symmetric and positive semidefinite
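A numerical version of this proof (a sketch; scipy.linalg.polar computes the same factorization directly):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

U, s, Vt = np.linalg.svd(A)
Q = U @ Vt                    # orthogonal factor
S = Vt.T @ np.diag(s) @ Vt    # symmetric positive semidefinite factor

assert np.allclose(A, Q @ S)
assert np.allclose(Q.T @ Q, np.eye(4))          # Q^T Q = I
assert np.all(np.linalg.eigvalsh(S) >= -1e-12)  # S is PSD
```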
Image compression
Instead of storing all mn entries of A, one can store its best rank-r approximation; this needs only
r(1 + m + n)
numbers (r singular values plus r left and r right singular vectors).
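A sketch of the storage count (a random matrix stands in for a real grayscale image here):

```python
import numpy as np

rng = np.random.default_rng(5)
img = rng.random((512, 512))        # placeholder for an image

U, s, Vt = np.linalg.svd(img, full_matrices=False)
r = 20
img_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r]   # best rank-r approximation

print(img.size)                                # m * n = 262144 numbers
print(r * (1 + img.shape[0] + img.shape[1]))   # r(1 + m + n) = 20500
```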
Pseudo-inverse
A rectangular A cannot be inverted!
However, a pseudo-inverse A+ can be defined s.t.
A+ A ≈ In and AA+ ≈ Im
for Σ, the pseudo-inverse Σ+ should satisfy
Σ+ Σ = Ir ⊕ 0n−r,  Σ Σ+ = Ir ⊕ 0m−r
so Σ+ has the transposed shape of Σ, with each σj replaced by 1/σj;
if A = UΣV^T, then its pseudo-inverse is A+ := V Σ+ U^T: indeed,
A+ A = V Σ+ U^T U Σ V^T = V (Σ+ Σ) V^T = V (Ir ⊕ 0n−r) V^T and, similarly, A A+ = U (Ir ⊕ 0m−r) U^T
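A numerical check (a sketch for a full-column-rank A; then A+ A = In exactly, while A A+ is only the projector onto the range):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))     # rank 3 almost surely

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_plus = Vt.T @ np.diag(1 / s) @ U.T    # V Sigma^+ U^T

assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(3))   # A+ A = I_n here
```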
Best solution to Ax = b:
Recall that Ax = b is solvable ⇐⇒ b belongs to the range (i.e., the column space) of A;
if n > rank A, then there are many solutions (or none)
(the homogeneous equation Ax = 0 has nontrivial solutions);
any solution has the form x = x0 + x1 with x0 a particular solution and x1 any solution of Ax = 0;
if b is not in the range, solve the normal equation A^T Ax = A^T b to get the least-squares solution;
if rank A = n, then A^T A is invertible;
otherwise, the least-squares solution is not unique:
look for the shortest solution x̂.
Claim: if A = UΣV^T, then x̂ = V Σ+ U^T b
‖Ax − b‖ = ‖ΣV^T x − U^T b‖ = ‖Σy − U^T b‖ with y := V^T x (multiplication by the orthogonal U^T preserves norms);
the shortest solution is y = Σ+(U^T b), whence x = V y = (V Σ+ U^T) b
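Numerically, the claim agrees with numpy's SVD-based least-squares solver (a sketch with an artificially rank-deficient A):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]      # force rank A = 3 < n = 4
b = rng.standard_normal(6)

x_hat = np.linalg.pinv(A) @ b    # V Sigma^+ U^T b
x_lsq = np.linalg.lstsq(A, b, rcond=None)[0]   # minimum-norm LS solution
assert np.allclose(x_hat, x_lsq)

# any least-squares solution satisfies the normal equation A^T A x = A^T b
assert np.allclose(A.T @ A @ x_hat, A.T @ b)
```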
SVD vs PCA
Observe that the largest value of ‖Ax‖ with ‖x‖ ≤ 1 is attained at x = v1 and is equal to σ1;
v1 is the first principal axis for A^T A:
indeed, A^T A = V Σ^T U^T U Σ V^T = V Σ^T Σ V^T = V D V^T is the spectral decomposition of the symmetric matrix B := A^T A;
B has eigenvalues σk² with eigenvectors vk;
the quadratic form Q(x) := x^T Bx is equal to ‖Ax‖²;
by the minimax properties of the eigenvalues, σ2² = max{‖Ax‖² : ‖x‖ ≤ 1, x ⊥ v1}, and similarly for σ3², . . .
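A minimal PCA-via-SVD sketch (data must be centered; the scaling diag(3, 1, 0.1) just creates principal axes of different importance):

```python
import numpy as np

rng = np.random.default_rng(8)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])
X -= X.mean(axis=0)              # center the data

# right singular vectors of X = eigenvectors of X^T X = principal axes
_, s, Vt = np.linalg.svd(X, full_matrices=False)
assert np.allclose(s**2, np.linalg.eigvalsh(X.T @ X)[::-1])
print(Vt[0])                     # first principal axis
```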
Linear systems
Given an m × n matrix A and b ∈ R^m, solve Ax = b for x ∈ R^n.
Solution: x = A^{-1} b for invertible A (m = n).
Computation of A^{-1} is costly, ∼ O(n³), and not always necessary!
Alternatively, use the Gaussian elimination method;
in matrix form, it amounts to an LU representation of A,
where L stands for “lower”- and U for “upper”-triangular.
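For instance (a sketch): np.linalg.solve factorizes A once (LU with partial pivoting, via LAPACK) instead of forming A^{-1} explicitly:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((200, 200))
b = rng.standard_normal(200)

x1 = np.linalg.solve(A, b)    # LU-based; never forms A^{-1}
x2 = np.linalg.inv(A) @ b     # explicit inverse: costlier, less accurate
assert np.allclose(x1, x2)
```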
Definition
The matrices that perform the elementary row operations (obtained by applying the corresponding operation to the identity matrix) are called elementary matrices.
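For example (a small numpy sketch): applying a row operation to the identity matrix gives an elementary matrix, and left-multiplying by it performs that operation:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 5.0]])

# elementary matrix for the row operation R2 <- R2 - 2*R1
E = np.eye(2)
E[1, 0] = -2.0
print(E @ A)    # [[2. 1.] [0. 3.]] -- the entry below the pivot is zeroed
```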
Lemma
The product of two lower-triangular (upper-triangular) matrices is lower-triangular (upper-triangular).
Proof.
Use the row or column form of matrix-matrix product
LU factorization
Theorem
Assume that an m × n matrix A can be reduced to a row echelon form U using only row substitution operations. Then A = LU with a lower-triangular m × m matrix L.
Proof.
Ek · · · E1 A = U =⇒ L = (Ek · · · E1)^{-1} = E1^{-1} · · · Ek^{-1}
Definition
The above representation A = LU, with an m × m lower-triangular
matrix L and upper-triangular∗ m × n matrix U, is the LU-factorization
of A.
Remark (∗ )
L is unique if all its diagonal entries are 1
If row interchanges are needed, use PA = LU, with P encoding all
row interchanges
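scipy.linalg.lu computes this factorization with row interchanges (a sketch; it returns P, L, U with A = PLU):

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

P, L, U = lu(A)                  # A = P L U, i.e. P^T A = L U
assert np.allclose(A, P @ L @ U)
print(np.diag(L))                # unit diagonal: [1. 1. 1.]
```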
Ax = b ⇐⇒ { Ly = b, Ux = y }
(first solve Ly = b by forward substitution, then Ux = y by back substitution)
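In code, the two triangular solves look like this (a sketch using scipy's triangular solver):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(10)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

P, L, U = lu(A)                                # A = P L U
y = solve_triangular(L, P.T @ b, lower=True)   # forward substitution: Ly = P^T b
x = solve_triangular(U, y, lower=False)        # back substitution: Ux = y
assert np.allclose(A @ x, b)
```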
Uniqueness: if A = L1 U1 = L2 U2 with unit diagonals in L1 and L2, then
L2^{-1} L1 is lower-triangular with 1's on the diagonal;
U2 U1^{-1} is upper-triangular;
thus L2^{-1} L1 = U2 U1^{-1} = I.
If A is nonsingular, U has a nonzero diagonal;
one can “factor it out” as D to get A = LDU with U having 1's on the main diagonal;
for symmetric matrices, U = L^T and A = LDL^T;
reason: A^T = U^T D L^T = LDU = A, and use uniqueness.
If A = L^T L, then x^T A x = x^T L^T L x = (Lx)^T (Lx) = ‖Lx‖² ≥ 0, so A is positive semidefinite.
Applications
The algorithm
Idea:
As in Gaussian elimination, make entries below diagonal zero
Recursive algorithm:
1 start with i := 1 and A(1) := A
2 At step i, the matrix A^(i) has the following form:
A^(i) = (I_{i−1} 0 0; 0 a_ii b_i^T; 0 b_i B^(i)),
with the identity matrix I_{i−1} of size i − 1
3 Set
L_i = (I_{i−1} 0 0; 0 √a_ii 0; 0 b_i/√a_ii I_{n−i})
4 Then A^(i) = L_i A^(i+1) L_i^T, where
A^(i+1) = (I_{i−1} 0 0; 0 1 0; 0 0 B^(i) − (1/a_ii) b_i b_i^T)
5 Finally, A^(n+1) = I_n, and so we get A = L L^T with L := L1 L2 . . . Ln
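The algorithm translates into a few lines of numpy (a minimal sketch assuming A is symmetric positive definite; no pivoting or error handling):

```python
import numpy as np

def cholesky_outer(A):
    """Outer-product Cholesky: lower-triangular L with A = L @ L.T."""
    A = A.astype(float)           # work on a copy of A
    n = A.shape[0]
    L = np.zeros_like(A)
    for i in range(n):
        pivot = np.sqrt(A[i, i])              # sqrt(a_ii)
        L[i, i] = pivot
        L[i+1:, i] = A[i+1:, i] / pivot       # b_i / sqrt(a_ii)
        # Schur complement update: B - (1/a_ii) b_i b_i^T
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i+1:, i]) / A[i, i]
    return L

A = np.array([[4.0, 0.0, 2.0],
              [0.0, 1.0, -1.0],
              [2.0, -1.0, 6.0]])
L = cholesky_outer(A)
assert np.allclose(L @ L.T, A)
assert np.allclose(L, np.linalg.cholesky(A))
```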
Example
A = A^(1) = (4 0 2; 0 1 −1; 2 −1 6)  =⇒  L1 = (2 0 0; 0 1 0; 1 0 1)
A^(1) = L1 A^(2) L1^T  =⇒  A^(2) = (1 0 0; 0 1 −1; 0 −1 5)  =⇒  L̃2 = (1 0; −1 1)
Ã^(2) = L̃2 Ã^(3) L̃2^T  =⇒  Ã^(3) = (1 0; 0 4)  =⇒  L̃3 = (1 0; 0 2)
(here Ã^(i) is the trailing 2 × 2 block of A^(i), and L̃i the corresponding block of Li)
L = L1 L2 L3 = (2 0 0; 0 1 0; 1 0 1)(1 0 0; 0 1 0; 0 −1 1)(1 0 0; 0 1 0; 0 0 2) = (2 0 0; 0 1 0; 1 −1 2)
A = L L^T = (2 0 0; 0 1 0; 1 −1 2)(2 0 1; 0 1 −1; 0 0 2)
Advantages of the QR decomposition:
the orthogonal columns of Q make the algorithm stable
(norms do not increase or decrease)
Applications of QR
QR eigenvalue algorithm
On each step, factorize Ak = Qk Rk and set Ak+1 := Rk Qk;
as Rk = Qk^{-1} Ak, one gets Ak+1 = Qk^{-1} Ak Qk;
thus Ak and Ak+1 have the same eigenvalues;
typically, Ak converges to an upper-triangular matrix R
(the Schur form of A);
the eigenvalues of A are then the diagonal entries of R.
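A minimal sketch of the unshifted iteration (production implementations add shifts and first reduce A to Hessenberg form):

```python
import numpy as np

def qr_eigenvalues(A, iters=200):
    """Unshifted QR iteration: returns approximate eigenvalues of A."""
    Ak = A.astype(float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q               # A_{k+1} = Q_k^{-1} A_k Q_k
    return np.diag(Ak)           # diagonal of the (near) Schur form

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
print(sorted(qr_eigenvalues(A)))        # ~ [1.0, 3.0]
print(sorted(np.linalg.eigvals(A)))     # [1.0, 3.0]
```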
Fourier transform and Fast Fourier transform
...