
C6.1 Numerical Linear Algebra

Yuji Nakatsukasa

Last update: May 3, 2023 (minor typos fixed). Please report any corrections or comments on these
lecture notes to [email protected]

Welcome to numerical linear algebra (NLA)! NLA is a beautiful subject that combines
mathematical rigor, amazing algorithms, and an extremely rich variety of applications.
What is NLA? In a sentence, it is a subject that deals with the numerical solution (i.e.,
using a computer) of linear systems $Ax = b$ (given $A \in \mathbb{R}^{n\times n}$ (i.e., a real $n\times n$ matrix) and
$b \in \mathbb{R}^n$ (real $n$-vector), find $x \in \mathbb{R}^n$) and eigenvalue problems $Ax = \lambda x$ (given $A \in \mathbb{R}^{n\times n}$,
find $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$), for problems that are too large to solve by hand ($n \geq 4$ is already
large; we aim for $n$ in the thousands or even millions). This can rightfully sound dull, and
some mathematicians (those purely oriented?) tend to get turned off after hearing this —
how could such a course be interesting compared with other courses offered by the Oxford
Mathematical Institute? I hope and firmly believe that at the end of the course you will all
agree that there is more to the subject than you imagined. The rapid rise of data science and
machine learning has only meant that the importance of NLA is still growing, with a vast
number of problems in these fields requiring NLA techniques and algorithms. It is perhaps
worth noting also that these fields have had an enormous impact on the direction of NLA; in
particular, the recent and very active field of randomised algorithms was born in light of needs
arising from these extremely active fields.
In fact NLA is a truly exciting field that utilises a huge number of ideas from different
branches of mathematics (e.g. matrix analysis, approximation theory, and probability) to
solve problems that actually matter in real-world applications. Having said that, the number
of prerequisites for taking the course is the bare minimum; essentially a basic understanding
of the fundamentals of linear algebra would suffice (and the first lecture will briefly review
the basic facts). If you've taken the Part A Numerical Analysis course you will find it helpful,
but again, this is not necessary.
The field of NLA has been blessed with many excellent books on the subject. These notes
will try to be self-contained, but these references will definitely help. There is a lot to learn;
literally as much as you want to.

Trefethen-Bau (97) [33]: Numerical Linear Algebra

– covers essentials, beautiful exposition

Golub-Van Loan (12) [14]: Matrix Computations


– classic, encyclopedic

Horn and Johnson (12) [20]: Matrix Analysis (& Topics in Matrix Analysis (86) [19])

– excellent theoretical treatise, little numerical treatment

J. Demmel (97) [8]: Applied Numerical Linear Algebra

– impressive content

N. J. Higham (02) [17]: Accuracy and Stability of Numerical Algorithms

– bible for stability, conditioning

H. C. Elman, D. J. Silvester, A. J. Wathen (14) [11]: Finite Elements and Fast Iterative
Solvers

– PDE applications of linear systems, Krylov methods and preconditioning

This course covers the fundamentals of NLA. We first discuss the singular value decom-
position (SVD), which is a fundamental matrix decomposition whose importance is only
growing. We then turn to linear systems and eigenvalue problems. Broadly, we will cover

Direct methods ($n \lesssim 10{,}000$): Sections 5–10 (except 8)

Iterative methods ($n \lesssim 1{,}000{,}000$, sometimes larger): Sections 11–13

Randomised methods ($n \gtrsim 1{,}000{,}000$): Sections 14–16

in this order. Lectures 1–4 cover the fundamentals of matrix theory, in particular the SVD,
its properties and applications.
This document consists of 16 sections. Very roughly speaking, one section corresponds
to one lecture (though this will not be followed strictly at all).

Contents
0 Introduction, why Ax = b and $Ax = \lambda x$? 6

1 Basic LA review 7
1.1 Warmup exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2 Structured matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Matrix eigenvalues: basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Computational complexity (operation counts) of matrix algorithms . . . . . 11
1.5 Vector norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Matrix norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 Subspaces and orthonormal matrices . . . . . . . . . . . . . . . . . . . . . . 13

2 SVD: the most important matrix decomposition 14
2.1 (Some of the many) applications and consequences of the SVD: rank, col-
umn/row space, etc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 SVD and symmetric eigenvalue decomposition . . . . . . . . . . . . . . . . . 17
2.3 Uniqueness etc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Low-rank approximation via truncated SVD 18


3.1 Low-rank approximation: image compression . . . . . . . . . . . . . . . . . . 21

4 Courant-Fischer minmax theorem 22


4.1 Weyl’s inequality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.1.1 Eigenvalues of nonsymmetric matrices are sensitive to perturbation . 23
4.2 More applications of C-F . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.3 (Taking stock) Matrix decompositions you should know . . . . . . . . . . . . 24

5 Linear systems Ax = b 24
5.1 Solving Ax = b via LU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.2 Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.3 Cholesky factorisation for $A \succ 0$ . . . . . . . . . . . . . . . . . . . . . . . . . 27

6 QR factorisation and least-squares problems 28


6.1 QR via Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2 Towards a stable QR factorisation: Householder reflectors . . . . . . . . . . . 29
6.3 Householder QR factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.4 Givens rotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.5 Least-squares problems via QR . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.6 QR-based algorithm for linear systems . . . . . . . . . . . . . . . . . . . . . 34
6.7 Solution of least-squares via normal equation . . . . . . . . . . . . . . . . . . 34
6.8 Application of least-squares: regression/function approximation . . . . . . . 35

7 Numerical stability 36
7.1 Floating-point arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
7.2 Conditioning and stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7.3 Numerical stability; backward stability . . . . . . . . . . . . . . . . . . . . . 38
7.4 Matrix condition number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.4.1 Backward stable+well conditioned=accurate solution . . . . . . . . . 39
7.5 Stability of triangular systems . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.5.1 Backward stability of triangular systems . . . . . . . . . . . . . . . . 40
7.5.2 (In)stability of Ax = b via LU with pivots . . . . . . . . . . . . . . . 41
7.5.3 Backward stability of Cholesky for $A \succ 0$ . . . . . . . . . . . . . . . . 41
7.6 Matrix multiplication is not backward stable . . . . . . . . . . . . . . . . . . 42
7.7 Stability of Householder QR . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.7.1 (In)stability of Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . 44

8 Eigenvalue problems 44
8.1 Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8.2 The power method for finding the dominant eigenpair $Ax = \lambda x$ . . . . . . . 46
8.2.1 Digression (optional): Why compute eigenvalues? Google PageRank . 47
8.2.2 Shifted inverse power method . . . . . . . . . . . . . . . . . . . . . . 47

9 The QR algorithm 48
9.1 QR algorithm for $Ax = \lambda x$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
9.2 QR algorithm preprocessing: reduction to Hessenberg form . . . . . . . . . . 52
9.2.1 The (shifted) QR algorithm in action . . . . . . . . . . . . . . . . . . 53
9.2.2 (Optional) QR algorithm: other improvement techniques . . . . . . . 54

10 QR algorithm continued 55
10.1 QR algorithm for symmetric A . . . . . . . . . . . . . . . . . . . . . . . . . 55
10.2 Computing the SVD: Golub-Kahan’s bidiagonalisation algorithm . . . . . . . 56
10.3 (Optional but important) QZ algorithm for generalised eigenvalue problems . 56
10.4 (Optional) Tractable eigenvalue problems . . . . . . . . . . . . . . . . . . . . 57

11 Iterative methods: introduction 58


11.1 Polynomial approximation: basic idea of Krylov . . . . . . . . . . . . . . . . 59
11.2 Orthonormal basis for $K_k(A, b)$ . . . . . . . . . . . . . . . . . . . . . . . . . 60
11.3 Arnoldi iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

12 Arnoldi and GMRES for Ax = b 61


12.1 GMRES convergence: polynomial approximation . . . . . . . . . . . . . . . . 62
12.2 When does GMRES converge fast? . . . . . . . . . . . . . . . . . . . . . . . 64
12.3 Preconditioning for GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . 65
12.4 Restarted GMRES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

13 Symmetric case: Lanczos and Conjugate Gradient method for Ax = b,
   $A \succ 0$ 66
13.1 Lanczos iteration and Lanczos decomposition . . . . . . . . . . . . . . . . . 66
13.2 CG algorithm for Ax = b, $A \succ 0$ . . . . . . . . . . . . . . . . . . . . . . . . 67
13.3 CG convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
13.3.1 Chebyshev polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 69
13.3.2 Properties of Chebyshev polynomials . . . . . . . . . . . . . . . . . . 71
13.4 MINRES: symmetric (indefinite) version of GMRES (nonexaminable) . . . . 71
13.4.1 MINRES convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
13.5 Preconditioned CG/MINRES . . . . . . . . . . . . . . . . . . . . . . . . . . 73
13.6 The Lanczos algorithm for symmetric eigenproblem (nonexaminable) . . . . 73
13.7 Arnoldi for nonsymmetric eigenvalue problems (nonexaminable) . . . . . . . 75

14 Randomised algorithms in NLA 75
14.1 Gaussian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
14.1.1 Orthogonal invariance . . . . . . . . . . . . . . . . . . . . . . . . . . 76
14.1.2 Marchenko-Pastur: Rectangular random matrices are well conditioned 77
14.2 Randomised least-squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
14.3 “Fast” algorithm: row subset selection . . . . . . . . . . . . . . . . . . . . . 78
14.4 Sketch-and-solve for least-squares problems . . . . . . . . . . . . . . . . . . . 79
14.5 Sketch-to-precondition: Blendenpik . . . . . . . . . . . . . . . . . . . . . . . 81
14.5.1 Explaining $\kappa_2(A\hat{R}^{-1}) = O(1)$ via Marchenko-Pastur . . . . . . . . . . 81
14.5.2 Blendenpik: solving $\min_x \|Ax - b\|_2$ using $\hat{R}$ . . . . . . . . . . . . . . 82
14.5.3 Blendenpik experiments . . . . . . . . . . . . . . . . . . . . . . . . . 82

15 Randomised algorithms for low-rank approximation 83


15.1 Randomised SVD by Halko-Martinsson-Tropp . . . . . . . . . . . . . . . . . 83
15.2 HMT approximant: analysis (down from 70 pages!) . . . . . . . . . . . . . . 85
15.3 Precise analysis for HMT (nonexaminable) . . . . . . . . . . . . . . . . . . . 86
15.4 Generalised Nyström (nonexaminable) . . . . . . . . . . . . . . . . . . . . . 87
15.5 MATLAB code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
15.6 Randomised algorithm for Ax = b, $Ax = \lambda x$? . . . . . . . . . . . . . . . . . 89

16 Conclusion and discussion 89


16.1 Important (N)LA topics not treated . . . . . . . . . . . . . . . . . . . . . . . 89
16.2 Course summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
16.3 Related courses you can take . . . . . . . . . . . . . . . . . . . . . . . . . . . 90

Notation. For convenience below we list the notation that we use throughout the course.

$\lambda(A)$: the set of eigenvalues of A. If a natural ordering exists (e.g. A is symmetric so
$\lambda(A)$ is real), $\lambda_i(A)$ is the $i$th (largest) eigenvalue.

$\sigma(A)$: the set of singular values of A. $\sigma_i(A)$ always denotes the $i$th largest singular
value. We often just write $\sigma_i$.

diag(A): the vector of diagonal entries of A.

We use capital letters for matrices, lower-case for vectors and scalars. Unless otherwise
specified, A is a given matrix, b is a given vector, and x is an unknown vector.

$\|\cdot\|$ denotes a norm for a vector or matrix. $\|\cdot\|_2$ denotes the spectral (or 2-) norm,
$\|\cdot\|_F$ the Frobenius norm. For vectors, to simplify notation we sometimes use $\|\cdot\|$ for
the spectral norm (which for vectors is the familiar Euclidean norm).

Span(A) denotes the span, or range, of the columns of A (the column space). This is the
subspace consisting of vectors of the form Ax.

We reserve Q for an orthonormal (or orthogonal) matrix. L (U) is often lower (upper)
triangular.

I always denotes the identity matrix. $I_n$ is the $n\times n$ identity when the size needs to
be specified.

$A^T$ is the transpose of the matrix: $(A^T)_{ij} = A_{ji}$. $A^*$ is the (complex) conjugate
transpose: $(A^*)_{ij} = \bar{A}_{ji}$.

$\succ$, $\succeq$ denote the positive (semi)definite ordering. That is, $A \succ (\succeq)\, 0$ means A is
positive (semi)definite (abbreviated as PD, PSD), i.e., symmetric and with positive
(nonnegative) eigenvalues. $A \succ B$ means $A - B \succ 0$.

We sometimes use the following shorthand: alg for algorithm, eigval for eigenvalue, eigvec
for eigenvector, singval for singular value, singvec for singular vector, and iff for "if and only
if".

0 Introduction, why Ax = b and $Ax = \lambda x$?


As already stated, NLA is the study of numerical algorithms for problems involving matrices,
and there are only two main problems(!):

1. Linear system
$Ax = b$.
Given a (often square, m = n, but we will discuss m > n extensively, and m < n briefly
at the end) matrix $A \in \mathbb{R}^{m\times n}$ and vector $b \in \mathbb{R}^m$, find $x \in \mathbb{R}^n$ such that $Ax = b$.

2. Eigenvalue problem
$Ax = \lambda x$.
Given a (always!¹) square matrix $A \in \mathbb{R}^{n\times n}$, find $\lambda$: eigenvalues (eigval), and $x \in \mathbb{R}^n$:
eigenvectors (eigvec).

We’ll see many variants of these problems; one worthy of particular mention is the SVD,
which is related to eigenvalue problems but given its ubiquity has a life of its own. (So if
there’s a third problem we solve in NLA, it would definitely be the SVD.)
It is worth discussing why we care about linear systems and eigenvalue problems.
The primary reason is that many (in fact most) problems in scientific computing (and
even machine learning) boil down to linear problems:

Because that’s often the only way to deal with the scale of problems we face today!
(and in future)
¹ There are exciting recent developments involving eigenvalue problems for rectangular matrices, but these
are outside the scope of this course.

For linear problems, so much is understood and reliable algorithms are available².

A related important question is where and how these problems arise in real-world prob-
lems.
Let us mention a specific context that is relevant in data science: optimisation. Suppose
one is interested in minimising a high-dimensional real-valued function $f(x): \mathbb{R}^n \to \mathbb{R}$ where
$n \gg 1$.
A successful approach is to try and find critical points, that is, points $x_*$ where $\nabla f(x_*) = 0$.
Mathematically, this is a non-linear high-dimensional root-finding problem of finding
$x \in \mathbb{R}^n$ such that $\nabla f(x) =: F(x) = 0$ (the vector $0 \in \mathbb{R}^n$), where $F: \mathbb{R}^n \to \mathbb{R}^n$. One of
the most commonly employed methods for this task is Newton's method (which some of you
have seen in Prelims Constructive Mathematics). This boils down to the following.

Newton's method for $F(x) = 0$, $F: \mathbb{R}^n \to \mathbb{R}^n$ nonlinear:

1. Start with an initial guess $x^{(0)} \in \mathbb{R}^n$, and set $i = 0$.

2. Find the Jacobian matrix $J \in \mathbb{R}^{n\times n}$, $J_{ij} = \frac{\partial F_i(x)}{\partial x_j}\Big|_{x = x^{(i)}}$.

3. Update $x^{(i+1)} := x^{(i)} - J^{-1}F(x^{(i)})$, set $i \leftarrow i + 1$, go to step 2 and repeat.

Note that the main computational task is to find the vector $y = J^{-1}F(x^{(i)})$, which is
a linear system $Jy = F(x^{(i)})$ (which we solve for the vector y).
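As a concrete illustration, here is a minimal MATLAB sketch of this iteration for a small made-up nonlinear system (the function F, its Jacobian, and the starting point are invented for the example; note that the linear system is solved with backslash rather than by forming $J^{-1}$ explicitly):

% Newton's method sketch: each step solves the linear system J*y = F(x).
% Made-up example F(x) = 0 with a known solution x = [1; 1]:
F = @(x) [x(1)^2 + x(2)^2 - 2; x(1) - x(2)];
J = @(x) [2*x(1), 2*x(2); 1, -1];          % Jacobian of F

x = [2; 0];                                 % initial guess
for i = 1:20
    y = J(x) \ F(x);                        % linear solve, NOT inv(J)*F
    x = x - y;
    if norm(F(x)) < 1e-12, break; end       % stop when the residual is tiny
end
disp(x)                                     % converges to [1; 1]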

What about eigenvalue problems $Ax = \lambda x$? Google's PageRank is a famous application
(we will cover this if we have time). Another example is the Schrödinger equation of physics
and chemistry. Sometimes a nonconvex optimisation problem can be solved via an eigenvalue
problem.
Equally important is principal component analysis (PCA), which can be used for data
compression. This is more tightly connected to the SVD.
Other sources of linear algebra problems include differential equations, optimisation,
regression, data analysis, ...

1 Basic LA review
We start with a review of key LA facts that will be used in the course. Some will be trivial
to you while others may not be. You might also notice that some facts that you have learned in
a core LA course will not be used in this course. For example, we will never deal with finite
fields, and determinants only play a passing role.
² A pertinent quote is Richard Feynman's "Linear systems are important because we can solve them".
Because we can solve them, we do all sorts of tricks to reduce difficult problems to linear systems!

1.1 Warmup exercise
Let $A \in \mathbb{R}^{n\times n}$ (an $n\times n$ square matrix), or $\mathbb{C}^{n\times n}$; the difference hardly matters in most of
this course³. Try to think of statements that are equivalent to A being nonsingular. Try to
come up with as many conditions as possible, before turning the page.

³ While there are a small number of cases where the distinction between real and complex matrices matters,
in the majority of cases it does not, and the argument carries over to complex matrices by replacing $\cdot^T$ with
$\cdot^*$. Therefore for the most part, we lose no generality in assuming the matrix is real (which slightly simplifies
our mindset). Whenever necessary, we will highlight the subtleties that arise resulting from the difference
between real and complex. (For the curious, these are the Schur form/decomposition, $LDL^T$ factorisation
and eigenvalue decomposition for (real) matrices with complex eigenvalues.)

Here is a list: The following are equivalent.

1. A is nonsingular.

2. A is invertible: $A^{-1}$ exists.

3. The map $A: \mathbb{R}^n \to \mathbb{R}^n$ is a bijection.

4. All n eigenvalues of A are nonzero.

5. All n singular values of A are positive.

6. rank(A) = n.

7. The rows of A are linearly independent.

8. The columns of A are linearly independent.

9. Ax = b has a solution for every $b \in \mathbb{C}^n$.

10. A has no nonzero null vector.

11. AT has no nonzero null vector.

12. $A^*A$ is positive definite (not just semidefinite).

13. $\det(A) \neq 0$.
1
14. An n ⇥ n matrix A exists such that A 1 A = In . (this, btw, implies (i↵) AA 1
= In ,
a nontrivial fact)

15. . . . (what did I miss?)
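Several of these conditions are easy to check numerically; a minimal MATLAB sanity check on a random matrix (rank, eig, svd, and det are built-in functions, and a random Gaussian matrix is nonsingular with probability 1):

n = 5;
A = randn(n);                 % nonsingular with probability 1
fprintf('rank(A) = %d (expect %d)\n', rank(A), n);
fprintf('min |eigenvalue|   = %.2e (nonzero)\n', min(abs(eig(A))));
fprintf('min singular value = %.2e (positive)\n', min(svd(A)));
fprintf('det(A)             = %.2e (nonzero)\n', det(A));
x = A \ randn(n,1);           % Ax = b is solvable for any right-hand side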

1.2 Structured matrices


We will be discussing lots of structured matrices. For square matrices:

Symmetric: $A_{ij} = A_{ji}$ (Hermitian: $A_{ij} = \bar{A}_{ji}$)

– The most important property of symmetric matrices is the symmetric eigenvalue
decomposition $A = V\Lambda V^T$; V is orthogonal, $V^TV = VV^T = I_n$, and $\Lambda$ is a
diagonal matrix of eigenvalues, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$.
– symmetric positive (semi)definite $A \succ (\succeq)\, 0$: symmetric and all positive (nonnegative)
eigenvalues.

Orthogonal: $AA^T = A^TA = I$ (Unitary: $AA^* = A^*A = I$). Note that for square
matrices, $A^TA = I$ implies $AA^T = I$.

Skew-symmetric: $A_{ij} = -A_{ji}$ (skew-Hermitian: $A_{ij} = -\bar{A}_{ji}$).

Normal: $A^TA = AA^T$. (Here it's better to discuss the complex case $A^*A = AA^*$: this
is a necessary and sufficient condition for diagonalisability under a unitary transformation,
i.e., $A = U\Lambda U^*$ where $\Lambda$ is diagonal and U is unitary.)

Tridiagonal: $A_{ij} = 0$ if $|i - j| > 1$.

Upper triangular: $A_{ij} = 0$ if $i > j$.

Lower triangular: $A_{ij} = 0$ if $i < j$.

For (possibly nonsquare) matrices $A \in \mathbb{C}^{m\times n}$ (usually $m \geq n$):

(upper) Hessenberg: $A_{ij} = 0$ if $i > j + 1$. (We will see this structure often.)

"orthonormal": $A^TA = I_n$, and A is (tall) rectangular. (This isn't an established
name—we could call it "matrix with orthonormal columns" every time it appears—
but we use these matrices all the time in this course, so we need a consistent shorthand
name for it.)

sparse: most elements are zero. nnz(A) denotes the number of nonzero elements in A.
Matrices that are not sparse are called dense.
Other structures: Hankel, Toeplitz, circulant, symplectic,... (we won’t use these in this
course)
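A few of these structures can be constructed and checked directly in MATLAB; a small illustrative sketch (the matrices here are arbitrary examples):

n = 6;  B = randn(n);
S  = (B + B')/2;                  % symmetric: S equals S'
K  = (B - B')/2;                  % skew-symmetric: K equals -K'
[Q, R] = qr(B);                   % Q orthogonal, R upper triangular
T  = triu(tril(B, 1), -1);        % tridiagonal: zero outside |i - j| <= 1
H  = triu(B, -1);                 % upper Hessenberg: zero below the first subdiagonal
Sp = sprandn(n, n, 0.2);          % sparse, roughly 20% nonzeros
fprintf('symmetric: %d, skew: %d, Q''Q = I: %d, nnz(Sp) = %d\n', ...
    isequal(S, S'), isequal(K, -K'), norm(Q'*Q - eye(n)) < 1e-14, nnz(Sp));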

1.3 Matrix eigenvalues: basics


$Ax = \lambda x$, $A \in \mathbb{R}^{n\times n}$, $(0 \neq)\, x \in \mathbb{R}^n$

Example:
$\begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= 4 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$

This matrix has an eigenvalue $\lambda = 4$, with corresponding eigenvector $x = [1, 1, 1]^T$ (together
they are an eigenpair).

An $n\times n$ matrix always has $n$ eigenvalues (but not always $n$ linearly independent eigenvectors);
in the example above, $(\lambda, x) = (1, [1, -1, 0]^T)$ and $(1, [0, 1, -1]^T)$ are also eigenpairs.

The eigenvalues are the roots of the characteristic polynomial $\det(\lambda I - A) = 0$:
$\det(\lambda I - A) = \prod_{i=1}^{n} (\lambda - \lambda_i)$.

According to Galois theory, eigenvalues cannot be computed exactly (in finitely many
arithmetic operations and radicals) for matrices with $n \geq 5$. But we still want to compute
them! In this course we will (among other things) explain how this is done in practice by the
QR algorithm, one of the greatest hits of the field.
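In practice one simply calls a library routine; for example, a quick MATLAB check of the 3-by-3 example above using the built-in eig:

A = [2 1 1; 1 2 1; 1 1 2];
[V, D] = eig(A);          % columns of V are eigenvectors, D is diagonal
diag(D)                   % eigenvalues 1, 1, 4 (in ascending order)
norm(A*V - V*D)           % ~ 1e-15: A*V = V*D up to rounding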

1.4 Computational complexity (operation counts) of matrix algorithms
Since NLA is a field that aspires to develop practical algorithms for solving matrix problems,
it is important to be aware of the computational cost (often referred to as complexity) of
the algorithms. We will discuss these as the algorithms are developed, but for now let’s
examine the costs for basic matrix-matrix multiplication. The cost is measured in terms
of flops (floating-point operations), which counts the number of additions, subtractions,
multiplications, and divisions (all treated equally) performed.
In NLA the constant in front of the leading term in the cost is (clearly) important. It
is customary (for good reason) to only track the leading term of the cost. For example,
$n^3 + 10n^2$ is abbreviated to $n^3$.

Multiplying two $n\times n$ matrices, AB, costs $2n^3$ flops. More generally, if A is $m\times n$ and
B is $n\times k$, then computing AB costs $2mnk$ flops.

Multiplying an $m\times n$ matrix A by a vector costs $2mn$ flops.
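As a rough illustration of these counts, one can compare the nominal $2n^3$ flops of a matrix-matrix multiplication with measured time to estimate the flop rate achieved (a small MATLAB sketch; the timing will of course depend on the machine):

n = 2000;
A = randn(n);  B = randn(n);
t = timeit(@() A*B);                            % time one n-by-n matrix multiplication
fprintf('n = %d: %.2f s, about %.1f Gflop/s\n', n, t, 2*n^3 / t / 1e9);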

Norms
We will need a tool (or metric) to measure how big a vector or matrix is. Norms give us a
means to achieve this. Surely you have already seen some norms (e.g. the vector Euclidean
norm). We will discuss a number of norms for vectors and matrices that we will use in the
upcoming lectures.

1.5 Vector norms


For vectors $x = [x_1, \dots, x_n]^T \in \mathbb{C}^n$:

p-norm: $\|x\|_p = (|x_1|^p + |x_2|^p + \cdots + |x_n|^p)^{1/p}$ $(1 \leq p \leq \infty)$

– Euclidean norm = 2-norm: $\|x\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$
– 1-norm: $\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$
– $\infty$-norm: $\|x\|_\infty = \max_i |x_i|$

Of particular importance are the three cases $p = 1, 2, \infty$. In this course, we will see $p = 2$
the most often.
A norm needs to satisfy the following axioms:

$\|\alpha x\| = |\alpha|\,\|x\|$ for any $\alpha \in \mathbb{C}$ (homogeneity),

$\|x\| \geq 0$ and $\|x\| = 0 \Leftrightarrow x = 0$ (nonnegativity),

$\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality).

The vector p-norm satisfies all these axioms, for any p.
Here are some useful inequalities for vector norms. A proof is left as an exercise and
is highly recommended. (Try to think when each equality is satisfied.) For $x \in \mathbb{C}^n$,

$\frac{1}{\sqrt{n}}\|x\|_2 \leq \|x\|_\infty \leq \|x\|_2$

$\frac{1}{\sqrt{n}}\|x\|_1 \leq \|x\|_2 \leq \|x\|_1$

$\frac{1}{n}\|x\|_1 \leq \|x\|_\infty \leq \|x\|_1$
Note that with the 2-norm, $\|Ux\|_2 = \|x\|_2$ for any unitary U and any $x \in \mathbb{C}^n$. Norms
with this property are called unitarily invariant.
The 2-norm is also induced by the inner product: $\|x\|_2 = \sqrt{x^Tx}$. An important property
of inner products is the Cauchy-Schwarz inequality $|x^Ty| \leq \|x\|_2\|y\|_2$ (which can be directly
proved but is perhaps best to prove in a general setting)⁴. When we just say $\|x\|$ for a vector
we mean the 2-norm.
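These facts are easy to sanity-check numerically; a small MATLAB sketch on a random complex vector (norm(x,p) computes the vector p-norm, and the Q factor of a random complex matrix serves as a random unitary matrix):

n = 10;  x = randn(n,1) + 1i*randn(n,1);
n1 = norm(x,1);  n2 = norm(x,2);  ninf = norm(x,inf);
fprintf('%.3f <= %.3f <= %.3f\n', n2/sqrt(n), ninf, n2);   % (1/sqrt(n))||x||_2 <= ||x||_inf <= ||x||_2
fprintf('%.3f <= %.3f <= %.3f\n', n1/sqrt(n), n2, n1);     % (1/sqrt(n))||x||_1 <= ||x||_2 <= ||x||_1
[U, ~] = qr(randn(n) + 1i*randn(n));                       % a random unitary matrix
fprintf('%.1e\n', norm(U*x) - norm(x));                    % ~ 1e-16: unitary invariance of the 2-norm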

1.6 Matrix norms


We now turn to norms of matrices. As you will see, many (but not the Frobenius and trace
norms) are defined via the vector norms (these are called induced norms).

p-norm: $\|A\|_p = \max_{x \neq 0} \frac{\|Ax\|_p}{\|x\|_p}$

– 2-norm = spectral norm (= Euclidean norm): $\|A\|_2 = \sigma_{\max}(A)$ (largest singular value;
see Section 2)
– 1-norm: $\|A\|_1 = \max_i \sum_{j=1}^{m} |A_{ji}|$
– $\infty$-norm: $\|A\|_\infty = \max_i \sum_{j=1}^{n} |A_{ij}|$

Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j |A_{ij}|^2}$ (2-norm of the vectorisation)

trace norm = nuclear norm: $\|A\|_* = \sum_{i=1}^{\min(m,n)} \sigma_i(A)$ (this is the maximum of
$\mathrm{Trace}(Q^TA)$ over orthonormal Q, hence the name)

The 2-norm, Frobenius norm, and trace norm are unitarily invariant: $\|A\|_* = \|UAV\|_*$,
$\|A\|_F = \|UAV\|_F$, $\|A\|_2 = \|UAV\|_2$ for any unitary/orthogonal U, V.

Norm axioms hold for each of these. Useful inequalities include the following (exercise;
it is instructive to study the cases where each of these equalities holds). For $A \in \mathbb{C}^{m\times n}$,

$\frac{1}{\sqrt{n}}\|A\|_\infty \leq \|A\|_2 \leq \sqrt{m}\,\|A\|_\infty$

⁴ Just in case, here's a proof: for any scalar c, $\|x - cy\|^2 = \|x\|^2 - 2c\,x^Ty + c^2\|y\|^2$. This is minimised with
respect to c at $c = \frac{x^Ty}{\|y\|^2}$, with minimum value $\|x\|^2 - \frac{(x^Ty)^2}{\|y\|^2}$. Since this must be $\geq 0$, the CS inequality follows.

$\frac{1}{\sqrt{m}}\|A\|_1 \leq \|A\|_2 \leq \sqrt{n}\,\|A\|_1$

$\|A\|_2 \leq \|A\|_F \leq \sqrt{\min(m,n)}\,\|A\|_2$

A useful property of p-norms is that they are subordinate, i.e., $\|AB\|_p \leq \|A\|_p\|B\|_p$
(problem sheet). Note that not all norms satisfy this, e.g. with the max norm
$\|A\|_{\max} = \max_{i,j}|A_{ij}|$: with $A = [1, 1]$ and $B = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ one has
$\|AB\|_{\max} = 2$ but $\|A\|_{\max} = \|B\|_{\max} = 1$.
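The matrix norms and inequalities above can likewise be checked in MATLAB (norm(A,1), norm(A,2), norm(A,inf), norm(A,'fro') and svd are built in; the trace norm is the sum of the singular values):

m = 8;  n = 5;
A = randn(m, n);  s = svd(A);
fprintf('2-norm    %.4f = sigma_max %.4f\n', norm(A), s(1));
fprintf('1-norm    %.4f = max column sum %.4f\n', norm(A,1), max(sum(abs(A),1)));
fprintf('inf-norm  %.4f = max row sum %.4f\n', norm(A,inf), max(sum(abs(A),2)));
fprintf('Frobenius %.4f <= sqrt(min(m,n))*2-norm = %.4f\n', norm(A,'fro'), sqrt(min(m,n))*norm(A));
fprintf('trace nrm %.4f = sum of singular values\n', sum(s));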

1.7 Subspaces and orthonormal matrices


A key notion that we will keep using throughout is a subspace S. In this course we
will almost exclusively confine ourselves to subspaces of $\mathbb{R}^n$, even though they generalise
to more abstract vector spaces. A subspace is the set of vectors that can be written as a
linear combination of basis vectors $v_1, \dots, v_d$, which are assumed to be linearly independent
(otherwise there is a basis with fewer vectors). That is, $x \in S$ iff $x = \sum_{i=1}^d c_i v_i$ (where the $c_i$ are
scalars, in $\mathbb{R}$ or $\mathbb{C}$). The integer d is called the dimension of the subspace. We also say the
subspace is spanned by the vectors $v_1, \dots, v_d$, or that $v_1, \dots, v_d$ span the subspace.
How does one represent a subspace? An obvious answer is to use the basis vectors
$v_1, \dots, v_d$. This sometimes becomes cumbersome, and a common and convenient way to
represent the subspace is to use a (tall-skinny) rectangular matrix $V = [v_1, v_2, \dots, v_d] \in \mathbb{R}^{n\times d}$,
as $S = \mathrm{span}(V)$ (or sometimes just "subspace V"), which means the subspace of vectors that
can be written as Vc, where c is a 'coefficient' vector $c \in \mathbb{R}^d$.
It will be (not necessary but) convenient to represent subspaces using an orthonormal
matrix $Q \in \mathbb{R}^{n\times d}$. (Once we cover the QR factorisation, you'll see that there is no loss of
generality in doing so.)
An important fact about subspaces of Rn is the following:

Lemma 1.1 Let $V_1 \in \mathbb{R}^{n\times d_1}$ and $V_2 \in \mathbb{R}^{n\times d_2}$ each have linearly independent column vectors.
If $d_1 + d_2 > n$, then there is a nonzero intersection between the two subspaces $S_1 = \mathrm{span}(V_1)$ and
$S_2 = \mathrm{span}(V_2)$, that is, there is a nonzero vector $x \in \mathbb{R}^n$ such that $x = V_1c_1 = V_2c_2$ for some
vectors $c_1, c_2$.

This is straightforward but important enough to warrant a proof.

Proof: Consider the matrix $M := [V_1, V_2]$, which is of size $n \times (d_1 + d_2)$. Since $d_1 + d_2 > n$
by assumption, this matrix has a right null vector⁵ $c \neq 0$ such that $Mc = 0$. Splitting
$c = \begin{bmatrix} c_1 \\ -c_2 \end{bmatrix}$, we have $V_1c_1 - V_2c_2 = Mc = 0$, i.e., $V_1c_1 = V_2c_2$, the required result. □

⁵ If this argument isn't convincing to you now, probably the easiest way to see this is via the SVD; so stay
tuned and we'll resolve this in footnote 9, once you've seen the SVD!
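The proof translates directly into a numerical illustration in MATLAB (null returns an orthonormal basis for the null space; the dimensions below are arbitrary):

n = 5;  d1 = 3;  d2 = 4;                 % d1 + d2 > n, so the subspaces must intersect
V1 = randn(n, d1);  V2 = randn(n, d2);
c = null([V1, V2]);  c = c(:, 1);        % a right null vector of M = [V1, V2]
c1 = c(1:d1);  c2 = -c(d1+1:end);
x = V1*c1;                               % x = V1*c1 = V2*c2 lies in both subspaces
fprintf('%.1e\n', norm(V1*c1 - V2*c2))   % ~ 1e-15: the two representations agree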
Let us conclude this review with a list of useful results that will be helpful. Proofs (or
counterexamples) should be straightforward.

$(AB)^T = B^TA^T$.

If A, B are invertible, $(AB)^{-1} = B^{-1}A^{-1}$.

If A, B are square and AB = I, then BA = I.

$\begin{bmatrix} I_m & X \\ 0 & I_n \end{bmatrix}^{-1} = \begin{bmatrix} I_m & -X \\ 0 & I_n \end{bmatrix}$

Neumann series: if $\|X\| < 1$ in any norm,
$(I - X)^{-1} = I + X + X^2 + X^3 + \cdots$

For a square $n\times n$ matrix A, the trace is $\mathrm{Trace}(A) = \sum_{i=1}^{n} A_{ii}$ (the sum of the diagonal entries).
For any X, Y such that XY is square, $\mathrm{Trace}(XY) = \mathrm{Trace}(YX)$ (quite useful). For
$B \in \mathbb{R}^{m\times n}$, we have $\|B\|_F^2 = \sum_i \sum_j |B_{ij}|^2 = \mathrm{Trace}(B^TB)$.

Triangular structure (upper or lower) is invariant under addition, multiplication, and


inversion. That is, triangular matrices form a ring (in abstract algebra; don’t worry if
this is foreign to you).

Symmetry is invariant under addition and inversion, but not multiplication; AB is


usually not symmetric even if A, B are.
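Two of these facts, the Neumann series and the trace identity, checked numerically in a short MATLAB sketch (the matrices are arbitrary):

n = 4;
X = randn(n);  X = 0.5 * X / norm(X);            % rescale so that ||X||_2 = 0.5 < 1
S = eye(n);  P = eye(n);
for k = 1:100
    P = P * X;  S = S + P;                       % partial sum I + X + X^2 + ... + X^k
end
fprintf('%.1e\n', norm(S - inv(eye(n) - X)))     % ~ 1e-16: Neumann series sums to (I - X)^{-1}
A = randn(3, 5);  B = randn(5, 3);
fprintf('%.1e\n', abs(trace(A*B) - trace(B*A)))  % ~ 1e-15: Trace(XY) = Trace(YX)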

2 SVD: the most important matrix decomposition


We now start the discussion of the most important topic of the course: the singular value
decomposition (SVD). The SVD exists for any matrix, square or rectangular, real or complex.
We will prove its existence and discuss its properties and applications, in particular in
low-rank approximation, which can immediately be used for compressing the matrix, and
therefore data.
The SVD has many intimate connections to symmetric eigenvalue problems. Let’s start
with a review.
Symmetric eigenvalue decomposition: Any symmetric matrix $A \in \mathbb{R}^{n\times n}$ has the
decomposition
$A = V \Lambda V^T$     (1)
where V is orthogonal, $V^TV = I_n = VV^T$, and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$ is a diagonal
matrix of eigenvalues.

The $\lambda_i$ are the eigenvalues, and V is the matrix of eigenvectors (its columns are the
eigenvectors).
The decomposition (1) makes two remarkable claims: the eigenvectors can be taken to
be orthogonal (which is true more generally of normal matrices, s.t. $A^*A = AA^*$), and the
eigenvalues are real.
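A minimal MATLAB illustration of (1): for a symmetric matrix, eig returns real eigenvalues and, here, an orthogonal eigenvector matrix.

n = 6;
B = randn(n);  A = (B + B')/2;        % a random symmetric matrix
[V, Lambda] = eig(A);                 % A = V*Lambda*V'
norm(A - V*Lambda*V')                 % ~ 1e-15: the decomposition holds
norm(V'*V - eye(n))                   % ~ 1e-15: V is orthogonal
isreal(diag(Lambda))                  % true (1): the eigenvalues are real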

