LAA Notes 2024 Web
Lawrence Reeves
School of Mathematics and Statistics
University of Melbourne 2024 semester 1
Reading mathematics is not like reading novels or history. You need to think slowly about every
sentence. Usually, you will need to reread the same material later, often more than one rereading.
You have probably never had a laboratory course in mathematics. Mathematics is not considered
to be an experimental science, whereas physics, chemistry, and biology are. Research for a chemist
can consist of a laboratory experiment designed to validate a conjecture, to suggest a conjecture,
or simply to see what happens. There is little comparable activity in mathematics.
Linear algebra is a core component of undergraduate mathematics. The term ‘linear algebra’ cov-
ers many things from the basic idea of vectors in the plane through to infinite dimensional function
spaces and beyond. The fundamental objects of study are ‘vector spaces’ and ‘linear transforma-
tions’. Matrices are used as a tool. Appropriate mathematical notation and proof technique will be
introduced when investigating the fundamental properties of vector spaces.
In these lectures our goal will be to develop a solid theoretical understanding and to prove the results
(i.e., theorems, propositions, lemmas) considered. There will also be applications and calculation, but
these rest firmly on the theoretical foundations. Extra material and explanation will be presented in
the lectures.
You are expected to attend all lectures in person.
Topics
Topics that we will cover include:
Sources of information
The definitive source of information for this (and indeed any) subject is the University Handbook.
There are subject materials and a discussion board on the subject LMS site.
There is lots of general information for Maths and Stats students on the LMS site MSPrime
Books
The lecture notes are not a textbook. There are many excellent textbooks that cover the material of
this subject. It is not necessary to purchase a textbook. You should make use of the library! Here are
some suggestions:
Classes
There are three one-hour lectures and two one-hour tutorials per week. The tutorials involve working
in small groups at a whiteboard. MATLAB is used as a tool for some of the exercises.
Assessment
Four written assignments 12%
Equally weighted and due at 1pm on Friday of weeks 3, 5, 9, 11
Late assignments will not be accepted (without special consideration)
Special consideration
Here is some general information about special consideration.
To apply for special consideration on the exam please follow the above link to apply online.
To apply for special consideration on an assignment or the mid-semester test or the MATLAB test,
please contact the lecturer directly. Do not apply through the general special consideration page
linked above as that can result in a long delay.
Applications must be submitted no later than four business days after the due date. (This is a
university-wide rule.)
Academic integrity
You must complete all assessment pieces entirely on your own.
It is academic misconduct to upload assignment or exam questions to online ‘help’ sites.
It is academic misconduct to show your assignment answers to other students.
There is more information on the University’s academic integrity site.
Lecturer
Lawrence Reeves
[email protected]
room 173, Peter Hall building (Subject to change!)
When sending me an email, please use your university email account and include your student number.
1 Sets and functions
2 Mathematical induction and logic
3 Matrices
4 Matrix inverses and linear systems
5 Row operations and elementary matrices
6 Gaussian elimination and types of solution set
7 Reduced row echelon form and matrix inverses
8 Rank and determinant of a matrix
9 Determinants (continued)
10 Fields
11 Vector spaces
12 Subspaces
13 Linear combinations: span and linear independence
14 Bases and dimension
15 Coordinates relative to a basis
16 Row and column space of a matrix
17 Some techniques for finite-dimensional vector spaces
18 Linear error-correcting codes
19 Linear transformations
20 Matrix representations of linear transformations
21 Change of basis
22 Dual space
23 Eigenvalues and eigenvectors
24 Eigenspaces
25 Diagonalisation
26 Diagonalisation II
27 Powers of a matrix and the Cayley-Hamilton theorem
28 Geometry in Euclidean space
29 Inner products
30 Orthonormal bases
31 The Gram-Schmidt orthogonalisation procedure and orthogonal projection
32 Orthogonal diagonalisation
33 Proof of the spectral theorem
34 Unitary diagonalisation
35 Least squares approximation
A Cardinality
B Existence of bases
As motivation for the definition of vector space (Lecture 11), we list some examples of familiar things
that we might already think of as being ‘vectors’. We will see that they share some fundamental
algebraic properties: there is a version of ‘vector addition’ and of ‘scalar multiplication’.
Let u = (1, 1) and v = (1, −2). Then u, v ∈ R2 and if we add or multiply by a scalar we again get
elements of R2 .
[Figure: the vectors u, v, the sum u + v, and the scalar multiple −2u drawn in the plane]
Let S be the set consisting of all solutions of the following simultaneous equations:
x + 9y + 6z + 8w = 0
2y + 3w = 0
Then u = (−6, 0, 1, 0) and v = (11, −3, 0, 2) are solutions. Adding two solutions gives another solu-
tion as does multiplying by a scalar: u + v = (5, −3, 1, 2) is a solution and −2u = (12, 0, −2, 0) is a
solution.
Polynomials
P2(C) = {a0 + a1x + a2x^2 | a0, a1, a2 ∈ C}
If we add any two elements of P2(C), we get an element of P2(C). If we multiply an element of P2(C)
by a scalar, we get an element of P2(C).
(1 + x + x^2) + (1 − 2x − x^2) = 2 − x
Functions
Consider the set F consisting of all functions from R to R. We have a way of adding two functions
together and of multiplying a function by a scalar. For example, let u, v ∈ F be given by u(x) = sin(x)
and v(x) = ex . Adding them together gives another element w ∈ F which is given by defining
w(x) = sin(x) + ex .
Vector spaces
All of the structures we’ve listed above share some fundamental properties. This leads to the abstract
concept of a vector space. The definition of a vector space, which we will state precisely later, captures the
common algebraic properties shared by the above examples. The fundamental idea is that we have
a set (whose elements are called vectors) and a way of adding elements of the set together, as well
as a way of multiplying by a scalar. The elements of the set are called vectors, even if they are also
polynomials or functions or matrices. For a given vector space, the scalars are fixed and can be R or
C or some other field.
The pay-off for this abstraction is that any result or technique we establish for vector spaces applies
to all of the above examples – they don’t need to be treated separately.
Before beginning with the linear algebra content proper we revise some important general concepts
and notations. Sets and functions are fundamental to linear algebra and to modern mathematics in
general.
1.1 Sets
A set is a collection of objects called elements (or members) of that set.* The notation x ∈ A means
that x is an element of the set A. The notation x ∉ A is used to mean that x is not a member of A.
Let A and B be sets. We say that A is a subset of (or is contained in) B, written A ⊆ B, if every
element of A is also an element of B (i.e., if x ∈ A, then x ∈ B). Two sets are equal if they have the
same members. Thus A = B exactly when both A ⊆ B and B ⊆ A. If A ⊆ B and A ≠ B, then we say
that A is a proper subset of B and (sometimes) write A ⊊ B.
Sets are often defined either by listing their elements, as in A = {0, 2, 3}, or by giving a rule or
condition which determines membership in the set, as in A = {x ∈ R | x3 − 5x2 + 6x = 0}.
Here are some familiar (mostly mathematical) sets:
real numbers:† R
complex numbers: C = {x + iy | x, y ∈ R}
(1, 3] = {x ∈ R | 1 < x ≤ 3}
* In fact, more care is needed in the definition of a set. In general one must place some restriction on set formation. For
example, trying to form {x | x is a set} or {x | x ∉ x} can lead to logical paradoxes (Russell’s paradox). This can be dealt
with or excluded in a more formal or axiomatic treatment of set theory. We will be careful to avoid situations where this
difficulty arises.
†
It’s a bit more involved to define the real numbers precisely, but one can think of them either as the points on the real
line or as (infinite) decimal expansions. In this subject we will be using some standard properties of R, but we will not give
a construction.
Lemma 1.1
For any set A, we have ∅ ⊆ A. That is, the empty set is a subset of every set.
Proof. Let A be a set. We need to show that the following statement is true:
if a ∈ ∅, then a ∈ A (*)
Let’s suppose that (*) were not true. Then there would be an element a such that a ∈ ∅ is true and
a ∈ A is false. However, since a ∈ ∅ is never true, no such a exists and we conclude that (*) must in
fact be true.
Operations on sets
A ∩ B = {x | x ∈ A and x ∈ B}
A ∪ B = {x | x ∈ A or x ∈ B}
Example 1.2.
A \ B = {x | x ∈ A and x ∉ B}
If B ⊆ A, then A \ B is called the complement of B in A. If the larger set A is clear from the context,
we sometimes write B c for the complement of B in A.
Example 1.3.
1) Z \ N = {. . . , −2, −1, 0}
2) N \ Z = ∅
3) [0, 2] \ N = [0, 1) ∪ (1, 2)
4) N \ [0, 2] = {3, 4, . . . }
Given a set A, the power set of A is the set containing all subsets of A. It is denoted P(A).
Example 1.5. For A = {α, β, γ} we have P(A) = {∅, {α}, {β}, {γ}, {α, β}, {α, γ}, {β, γ}, {α, β, γ}}.
Let A and B be two sets. We define a set, called the Cartesian product of A and B, by
A × B = {(a, b) | a ∈ A and b ∈ B}
Each element (a, b) of the set A × B is called an ordered pair.
1.2 Functions
The concept of a function is fundamental in mathematics. Functions on the real numbers are often
described using some sort of a formula (e.g., f (x) = sin(x)), but we want to define the notion of
function in a way that makes sense more generally. The idea is to make a definition out of what is
sometimes called the graph of a function.
Let A and B be sets. A function from A to B is a subset f of A × B such that for each a ∈ A
there is exactly one element of f whose first entry is a. We write f (a) = b to mean (a, b) ∈ f . We
write f : A → B to mean that f is a function from A to B. The set A is called the domain of the
function and B is called the codomain of the function.
Remark. 1. Functions are often (but not always!) given by a formula such as f : R → R, f (x) = x2 .
When written in this way, the subset of A × B is understood to be {(a, f (a)) | a ∈ A}.
2. The domain and codomain are part of the defining data of a function. The following two func-
tions are not the same:
f : R → R, f (x) = x2
g : [0, ∞) → R, g(x) = x2
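To make the set-of-pairs definition concrete, here is a minimal Python sketch (the names A, B, f, and is_function are made up for illustration; Python is not part of the subject materials, which use MATLAB):

A = {1, 2, 3}
B = {'x', 'y'}
f = {(1, 'x'), (2, 'x'), (3, 'y')}   # a subset of A x B

def is_function(pairs, domain):
    # the defining property: each element of the domain occurs exactly once as a first entry
    firsts = sorted(a for (a, b) in pairs)
    return firsts == sorted(domain)

print(is_function(f, A))                     # True
print(is_function({(1, 'x'), (1, 'y')}, A))  # False: 1 occurs twice, 2 and 3 not at all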
Let f : A → B be a function.
Example 1.8.
1) The function f : R → R, f (x) = x2 is neither injective nor surjective.
2) The function g : [0, ∞) → R, g(x) = x2 is injective but not surjective.
3) The function h : R → [0, ∞), h(x) = x2 is surjective but not injective.
4) The function k : [0, ∞) → [0, ∞), k(x) = x2 is bijective.
Example 1.9.
1) f : N × N → N, f(m, n) = 2^m 3^n is injective (but not surjective).
2) g : N → Z, g(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even} \\ \frac{1-n}{2} & \text{if } n \text{ is odd} \end{cases} is bijective.
Example 1.10. Consider the function f : N → Z≥2, f(n) = n + 1. The corresponding subset of N × Z≥2
is
{(1, 2), (2, 3), (3, 4), . . . }
The function f is a bijection. Its inverse is f^{-1} : Z≥2 → N, f^{-1}(n) = n − 1 which as a subset of Z≥2 × N
is
{(2, 1), (3, 2), (4, 3), . . . }
In mathematics one often needs functions of several variables, for example the operation of addition
of real numbers is a function of two variables which assigns to each pair of real numbers (x, y) their
sum x + y. Thus addition is a function from R2 to R. More generally, a function of n variables from
A to B (or an n-ary function f from A to B) is just a function of the form f : An → B.
1.3 Exercises
A = {n ∈ N | n is odd}          C = {4n + 3 | n ∈ N}
B = {n ∈ N | n is a prime}      D = {x ∈ R | x^2 − 8x + 15 = 0}

A = {n ∈ N | n ≤ 11}            E = {n ∈ N | n is even}
B = {n ∈ N | n is even and n ≤ 20}
(a) A ∩ B = A ∩ C implies B = C
(b) A ∪ B = A ∪ C implies B = C
(c) (A ∩ B = A ∩ C and A ∪ B = A ∪ C) implies B = C
8. Let S = {1, 2, 3, 4, 5} and T = {a, b, c, d}. For each question below: if the answer is “yes” give
an example; if the answer is “no” explain briefly.
9. Let S = {1, 2, 3, 4, 5} and consider the following functions from S to S: 1S (n) = n, f (n) = 6 − n,
g(n) = max{3, n} and h(n) = max{1, n − 1}.
10. Consider the two functions from N^2 to N defined by f(m, n) = 2^m 3^n and g(m, n) = 2^m 4^n. Show
that f is injective but that g is not injective. Is f surjective? Explain. (You may use that every
n ∈ N with n ≥ 2 has a unique prime factorisation.)
R : N → N, R(n) = n + 1
L : N → N, L(n) = max{1, n − 1}
The extra material at the end of a lecture can include extra theory or references. It is NOT required!
It’s just for those who would like to know more.
The art of proof: basic training for deeper mathematics, by Beck and Geoghegan, chapter 5.
Naive set theory, by Halmos.
Russell’s paradox, on Wikipedia.
Zermelo-Fraenkel set theory, on Wikipedia.
The axiomatic definition of the real numbers R and their construction from Q will be covered
in the subject MAST20033 Real Analysis: Advanced (amongst others). An important difference
between Q and R is that every non-empty subset of R that is bounded above has a least upper bound. A standard
construction of R from Q is via “Dedekind cuts” which, roughly speaking, carefully adds least
upper bounds to Q.
Bijections are used in the definition of the cardinality (or size) of a set. Two sets are said to
have the same cardinality if there exists a bijection from one to the other. The two sets {1, 2, 3}
and {Julia, Ada, Xav} have the same cardinality. It starts to get interesting when we consider
infinite sets. Not all infinite sets have the same cardinality. The sets N, Z, and Q all have the
same cardinality, but R does not. In particular, there is a bijection from N to Q. That there is no
bijection from N to R can be shown with an elegant argument known as ‘Cantor diagonalisa-
tion’.
We continue with some background material on logic and proof by induction that we will need later
when constructing proofs.
2.1.1 Propositions
We will be concerned with statements that are either true or false. They are called propositions
(alternatively statements).
Example 2.1.
Propositions: 1 + 1 = 2;  1 + 1 = 3
Not propositions: 28;  z is even
Given two propositions p and q, we can combine them to form new propositions. The conjunction
(‘and’) of p and q is denoted p ∧ q and is defined to be true if both p and q are true, and false in all
other cases. The disjunction (‘or’) of p and q is denoted p ∨ q and is defined to be false if both p and
q are false, and true in all other cases. For each, the four possible cases can be listed in a truth table.
We say that two statements p and q are logically equivalent if p is true precisely when q is true.* This
is written p ≡ q.
Example 2.3. We observed in the previous example that (p =⇒ q) ≡ (¬p∨q). To show this explicitly,
we construct a truth table for (p =⇒ q) ⇐⇒ (¬p ∨ q) and observe that the equivalence has value T
for all possible values of p and q.
p q p =⇒ q ¬p ∨ q (p =⇒ q) ⇐⇒ (¬p ∨ q)
T T T T T
T F F F T
F T T T T
F F T T T
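The same check can be carried out mechanically. Here is a small Python sketch (illustrative only) that enumerates all four truth assignments and confirms the two columns agree:

from itertools import product

# p => q defined by cases: true unless p is true and q is false
for p, q in product([True, False], repeat=2):
    p_implies_q = (q if p else True)
    print(p, q, p_implies_q == ((not p) or q))   # True on every row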
Exercise 14. Use a truth table to show that (p =⇒ q) ≡ (¬q =⇒ ¬p). The second statement is called
the contrapositive of the first statement.
2.1.3 Quantifiers
The symbol ∀ means ‘for all’ (or ‘for each’ or ‘for every’). It is called the universal quantifier. The
general form of a proposition formed using the universal quantifier is
∀ x ∈ A, p(x)
Such a proposition is true if the statement p(x) is true for every element x of A. The symbol ∃ means
‘there exists’. It is called the existential quantifier. A proposition of the form
∃ x ∈ A, p(x)
is true if there is at least one element x in A such that the statement p(x) is true.
Example 2.5. Here are some statements constructed using these quantifiers.
4. ∀ x ∈ R ∃ y ∈ R, x + y = 0 (which is true)
* This is the same as saying that the statement p ⇐⇒ q is always true (i.e., is a tautology).
5. ∃ y ∈ R ∀ x ∈ R, x + y = 0 (which is false)
Notice the difference between the final two examples above. Example 4 says that every real number
has an additive inverse. For example, given x = π we can take y = (−1) × π. Example 5 says that
there exists one real number that is the additive inverse of every real number.
Example 2.7.
statement                                    negation
∀ x ∈ R, x^2 ≥ 0                             ∃ x ∈ R, x^2 < 0
∃ y ∈ R ∀ x ∈ R, x + y = 0                   ∀ y ∈ R ∃ x ∈ R, x + y ≠ 0
∀ x ∈ U, x ∈ ∅ =⇒ x ∈ A                      ∃ x ∈ U, (x ∈ ∅ ∧ x ∉ A)
All maths lecturers are named Lawrence       There exists a maths lecturer whose name is not Lawrence
All flying pigs are purple                   There exists a flying pig that is not purple
2.2 Induction
In this subject we will be assuming some standard properties of N such as the distributive law. An
important property that we will use in many proofs is given in the following.
Theorem 2.8 (Principle of mathematical induction)
Let P(n) be a (true or false) statement that depends on a natural number n ∈ N. In order to prove
that P(n) is true for all values of n ∈ N it is sufficient to prove the following:
1. (Base case) P(1) is true.
2. (Induction step) For all n ∈ N, if P(n) is true, then P(n + 1) is true.
To give a proof of this theorem we would need to consider the definition and construction of the
natural numbers. It is equivalent to the so-called ‘well ordering’ property of N. We will not give a
proof, but shall regard the above as a fundamental property of N (and Z).
Here are some examples of using mathematical induction as a method of proof.
Example 2.9. For every n ∈ N, the number n^4 − 6n^3 + 11n^2 − 6n is divisible by 4.
Proof. Let P(n) be the statement ‘n^4 − 6n^3 + 11n^2 − 6n is divisible by 4’.† We check that both
conditions of the above theorem are satisfied.
† Notice that P(n) is a statement that is either true or false. It is not a polynomial, nor is it an integer. It would be an
error to write something such as P(n) = n^4 − 6n^3 + 11n^2 − 6n.
Base case: P(1) is the statement that 1 − 6 + 11 − 6 = 0 is divisible by 4, which is true.‡
Induction step: Let n ∈ N and suppose that P(n) is true, that is, n^4 − 6n^3 + 11n^2 − 6n = 4k for some integer k. Then
\begin{aligned}
(n+1)^4 - 6(n+1)^3 + 11(n+1)^2 - 6(n+1) &= (n^4 + 4n^3 + 6n^2 + 4n + 1) - 6(n^3 + 3n^2 + 3n + 1) \\
&\qquad + 11(n^2 + 2n + 1) - 6(n+1) \\
&= n^4 - 2n^3 - n^2 + 2n \\
&= (n^4 - 6n^3 + 11n^2 - 6n) + 4n^3 - 12n^2 + 8n \\
&= 4k + 4(n^3 - 3n^2 + 2n) \\
&= 4(k + n^3 - 3n^2 + 2n)
\end{aligned}
Therefore P (n + 1) is true. It follows from the principle of mathematical induction (Theorem 2.8)
that P (n) is true for all n ∈ N.
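As a quick numerical sanity check (not a substitute for the proof), one can test the claim for small n in Python:

for n in range(1, 21):
    assert (n**4 - 6*n**3 + 11*n**2 - 6*n) % 4 == 0
print("n^4 - 6n^3 + 11n^2 - 6n is divisible by 4 for n = 1, ..., 20")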
Example. For all integers n ≥ 4, we have n! > 2^n.
Proof. Notice that the claim is that the inequality holds for all n ≥ 4. In order to apply Theorem
2.8 in exactly the form given, we define P(n) to be the statement that (n + 3)! > 2^{n+3}. If we show
that P(n) is true for all n ∈ N, we will have established the claim.
Base case: P(1) is the statement that 4! > 2^4. Since 4! = 24 > 16 = 2^4, the statement P(1) is true.
Induction step: Let n ∈ N and suppose that P(n) is true, that is that (n + 3)! > 2^{n+3}. We need to
show that P(n + 1) is true, that is that ((n + 1) + 3)! > 2^{(n+1)+3}. We have
((n+1)+3)! = (n+4)! = (n+4) × (n+3)! > (n+4) × 2^{n+3} > 2 × 2^{n+3} = 2^{(n+1)+3}
Therefore P (n + 1) is true. It follows from the principle of mathematical induction (Theorem 2.8)
that P (n) is true for all n ∈ N.
Here is another version of the induction statement in which the induction step is, in principle, easier
to prove.
Theorem 2.10 (Complete induction)
Let P(n) be a (true or false) statement that depends on a natural number n ∈ N. In order to prove
that P(n) is true for all values of n ∈ N it is sufficient to prove the following:
1. P(1) is true.
2. For all n ∈ N, if P(k) is true for all k ≤ n, then P(n + 1) is true.
Proof. We will show that this theorem follows from Theorem 2.8.§
‡ Every integer m ∈ Z divides 0 since 0 = m × 0.
§ The converse is also true: Theorem 2.10 implies Theorem 2.8.
Let P(n) be as in the statement of the current theorem and assume that both (1) and (2) hold. We
need to show that P(n) is true for all n ∈ N. Let Q(n) be the statement that ‘P(k) is true for all k ≤ n’.
We want to check that the statements Q(n) satisfy the conditions stated in Theorem 2.8.
Base case: Since Q(1) is simply the statement that P(1) is true, and we are assuming that (1) holds,
we have that Q(1) is true.
Induction step: We need to show that if Q(n) is true, then Q(n + 1) is true. Assume then that Q(n) is
true. That is, that P (1) is true, and P (2) is true, and P (3) is true,. . . , and P (n) is true. In particular, we
have that P (n) is true. Therefore, since we are assuming that (2) (from the current theorem) holds,
we have that P (n + 1) is true. Therefore, P (1) is true, and P (2) is true, and P (3) is true,. . . , and P (n)
is true, and P (n + 1) is true. That is, Q(n + 1) is true.
It follows from the principle of mathematical induction (Theorem 2.8) that Q(n) is true for all n ∈ N.
Therefore, P (n) is true for all n ∈ N.
Example 2.11. Complete induction can be used to prove that every natural number n ∈ N>2 can be
written as a product of primes.
2.3 Exercises
17. Use a truth table to show that (p ⇐⇒ q) is logically equivalent to (¬p ⇐⇒ ¬q).
(a) ∀x ∈ R, x^2 = 10
(b) ∃y ∈ N, y < 0
(c) ∃a ∈ N, ∀x ∈ R, ax = 4
(d) ∀y ∈ Q, ∃x ∈ R, xy = 30
20. Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. Use induction to prove that A^n = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} for all n ∈ N.
Introductory logic
Here is a useful minor reformulation of Theorem 2.8. The difference is that we start at any
integer as the base case (in place of 1).
Theorem. Let P(n) be a (true or false) statement that depends on an integer n ∈ Z and let n0 ∈ Z. In
order to prove that P(n) is true for all values of n ≥ n0 it is sufficient to prove the following:
1. P(n0) is true.
2. For all n ≥ n0, if P(n) is true, then P(n + 1) is true.
Matrices
Matrices are a fundamental tool in linear algebra. We recall some definitions including the usual
arithmetic binary operations and the unary operation of transposition.
We denote by Mm,n (R) the set of all matrices of size m × n having real entries.
The notations Mm,n (C), Mm,n (Q), and simply Mm,n are used similarly. For the moment, when
F is used it represents one of: Q, R or C. However, all results (and proofs) remain valid for any
field.^a
^a We will see the definition of a field in Lecture 10.
Example 3.2. A = \begin{pmatrix} \pi & 3.1 & 0 \\ -\frac{1}{2} & 4 & 2i \end{pmatrix} ∈ M2,3(C), A12 = 3.1, A21 = −1/2
Definition 3.3
A matrix with the same number of rows as columns is called a square matrix
Given two matrices of the same size A, B ∈ Mm,n (F) we define A+B ∈ Mm,n (F) by (A+B)ij =
Aij + Bij . That is,
\begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1n} \\ A_{21} & A_{22} & \ldots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \ldots & A_{mn} \end{pmatrix} + \begin{pmatrix} B_{11} & B_{12} & \ldots & B_{1n} \\ B_{21} & B_{22} & \ldots & B_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ B_{m1} & B_{m2} & \ldots & B_{mn} \end{pmatrix} = \begin{pmatrix} A_{11}+B_{11} & A_{12}+B_{12} & \ldots & A_{1n}+B_{1n} \\ A_{21}+B_{21} & A_{22}+B_{22} & \ldots & A_{2n}+B_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1}+B_{m1} & A_{m2}+B_{m2} & \ldots & A_{mn}+B_{mn} \end{pmatrix}
Given a matrix A ∈ Mm,n (F) and k ∈ F, define a matrix kA ∈ Mm,n (F) by (kA)ij = k × Aij .
That is,
k \begin{pmatrix} A_{11} & A_{12} & \ldots & A_{1n} \\ A_{21} & A_{22} & \ldots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m1} & A_{m2} & \ldots & A_{mn} \end{pmatrix} = \begin{pmatrix} kA_{11} & kA_{12} & \ldots & kA_{1n} \\ kA_{21} & kA_{22} & \ldots & kA_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ kA_{m1} & kA_{m2} & \ldots & kA_{mn} \end{pmatrix}
Example 3.7. 2 \begin{pmatrix} \pi & 3.1 & 0 \\ -\frac{1}{2} & 4 & 2i \end{pmatrix} = \begin{pmatrix} 2\pi & 6.2 & 0 \\ -1 & 8 & 4i \end{pmatrix}
Given matrices A ∈ Mm,n (F) and B ∈ Mn,p (F) we define their product, AB, to be the matrix in
Mm,p (F) given by
(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}
Note. Two matrices A and B can only be multiplied together (in that order) if the number of columns
of A is equal to the number of rows of B.
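Here is a minimal Python sketch of this definition (the function name mat_mul is made up for illustration; this is not the subject's MATLAB material):

def mat_mul(A, B):
    # (AB)_ij = sum over k of A_ik * B_kj, for A of size m x n and B of size n x p
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# example 3 below: a product of two non-zero matrices can be the zero matrix
print(mat_mul([[1, 1], [-1, -1]], [[1, -1], [-1, 1]]))   # [[0, 0], [0, 0]]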
Exercise 22. Using the definition of matrix multiplication, show that for any matrix A ∈ Mm,n (F),
B ∈ Mn,m (F), and k ∈ F we have:
2. \begin{pmatrix} 2 & 5 \\ 0 & 6 \\ 7 & 2 \end{pmatrix} \begin{pmatrix} \pi & 3.1 & 0 \\ -\frac{1}{2} & 4 & 2i \end{pmatrix} = \begin{pmatrix} 2\pi - \frac{5}{2} & 26.2 & 10i \\ -3 & 24 & 12i \\ 7\pi - 1 & 29.7 & 4i \end{pmatrix}
3. \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
Given a matrix A ∈ Mm,n we define its transpose to be the matrix A^T ∈ Mn,m given by (A^T)_{ij} =
A_{ji}. That is, A^T is obtained from A by interchanging the rows and columns of A. A matrix is
called symmetric if A^T = A.
Example 3.11. A = \begin{pmatrix} \pi & 3.1 & 0 \\ -\frac{1}{2} & 4 & 2 \end{pmatrix}, A^T = \begin{pmatrix} \pi & -\frac{1}{2} \\ 3.1 & 4 \\ 0 & 2 \end{pmatrix}
Exercise 23. Using the definitions of matrix addition and transpose prove that for all A, B ∈ Mm,n(F),
(A + B)^T = A^T + B^T.
Lemma 3.12
Let A ∈ Mm,n(F) and B ∈ Mn,p(F). Then (AB)^T = B^T A^T.
Proof. Note first that (AB)^T and B^T A^T are both of size p × m. To show that they are equal we need
to show that
∀i ∈ {1, . . . , p} ∀j ∈ {1, . . . , m}, ((AB)^T)_{ij} = (B^T A^T)_{ij}
Let i ∈ {1, . . . , p} and j ∈ {1, . . . , m}. Then
((AB)^T)_{ij} = (AB)_{ji} = \sum_{k=1}^{n} A_{jk} B_{ki} = \sum_{k=1}^{n} (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}
3.2 Exercises
24. Suppose that A, B, C, and D are matrices with sizes given by:
A ∈ M2,3 , B ∈ M1,3 , C ∈ M2,2 , D ∈ M2,1 .
Determine which of the following expressions are defined. For those which are defined, give
the size of the resulting matrix.
26. Let
A = \begin{pmatrix} -1 & 0 & 1 \\ 2 & -1 & 3 \\ 0 & 1 & -2 \end{pmatrix} \qquad B = \begin{pmatrix} 0 & 4 & -2 \\ 3 & 1 & 2 \\ -1 & 0 & 1 \end{pmatrix}
Find
Verify that
31. (a) Show that if the matrix products AB and BA are both defined, then AB and BA are square
matrices.
(b) Show that if A is an m × n matrix and A(BA) is defined, then B is an n × m matrix.
33. Suppose that a 2 by 2 matrix A ∈ M2,2 (C) satisfies AB = BA for every 2 by 2 matrix B. That is
A satisfies
∀B ∈ M2,2 (C), AB = BA
In this exercise we show that A must be equal to zI2 for some z ∈ C.
Let A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. By considering the two cases B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} and B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, show that a = d and
b = c = 0.
35. Let A, B ∈ Mn,n(C). Suppose that A^2 = A. Show that (AB − ABA)^2 = 0.
(Note we may not assume that n = 2 nor that AB = BA.)
36. Let A, B ∈ Mn,n be symmetric matrices. Show that AB is symmetric if and only if AB = BA.
37. (a) Give an example of three matrices A, B ∈ M2,2 and C ∈ M2,1 such that C ≠ 0 and
AC = BC but A ≠ B.
(b) Let A, B ∈ Mm,n (C). Suppose that AC = BC for all C ∈ Mn,1 (C). Show that A = B.
Matrices
Some history
[Figure: a graph on the vertices v1, . . . , v5 with edges v1v2, v1v4, v1v5, v2v3, v2v5, v3v4; the matrix A below is its adjacency matrix]
A = \begin{pmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix} \qquad A^3 = \begin{pmatrix} 2 & 6 & 1 & 5 & 4 \\ 6 & 2 & 5 & 1 & 4 \\ 1 & 5 & 0 & 4 & 2 \\ 5 & 1 & 4 & 0 & 2 \\ 4 & 4 & 2 & 2 & 2 \end{pmatrix}
Note. Given any k ∈ N, the i,j-th entry of Ak gives the number of edge paths of length k from
vi to vj . (If you’re feeling adventurous, you could try proving this using induction on k.) For
example, there are six edge paths of length three from v1 to v2 in the above graph.
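One can check this path count directly; a short Python sketch, reusing the hypothetical mat_mul function from the earlier sketch:

A = [[0, 1, 0, 1, 1],
     [1, 0, 1, 0, 1],
     [0, 1, 0, 1, 0],
     [1, 0, 1, 0, 0],
     [1, 1, 0, 0, 0]]

A3 = mat_mul(A, mat_mul(A, A))   # A^3
print(A3[0][1])   # 6: the number of edge paths of length three from v1 to v2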
Matrices follow many of the algebraic properties that we are familiar with from the real numbers
(such as the distributive law). It’s natural to think about "dividing by a matrix" in the sense of multi-
plying by the multiplicative inverse.
An important property of the real numbers is that every non-zero real number has a multiplicative
inverse, that is, ∀ a ∈ R \ {0} ∃ b ∈ R, ab = 1. Some, but not all, non-zero square matrices have a
multiplicative inverse in the same sense.
A square matrix A ∈ Mn,n (F) is called invertible if there exists a matrix B ∈ Mn,n (F) such that
AB = In and BA = In . The matrix B is called the inverse of A and is denoted A−1 . If A is not
invertible, we say that A is singular.
Remark. Calling B the inverse of A needs some justification. It’s possible to show that there can be at
most one matrix that satisfies the above property. That is, if B, C ∈ Mn,n (F) are such that AB = In
and BA = In and AC = In and CA = In , then B = C.
Example 4.2.
1. \begin{pmatrix} 2 & i \\ i & -1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & i \\ i & -2 \end{pmatrix}
2. \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} is singular
3. \begin{pmatrix} 0 & -3 & -2 \\ 1 & -4 & -2 \\ -3 & 4 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 4 & -5 & -2 \\ 5 & -6 & -2 \\ -8 & 9 & 3 \end{pmatrix}
It’s easy to check that the given inverses are correct by simply multiplying and verifying that the
result is the identity matrix. We will describe a method for calculating the inverse of a matrix in a
later section.
Remark. Notice that it follows immediately from the definition that:
4. I_n^{-1} = I_n
If A and C are invertible matrices of the same size, then AC is invertible and
(AC)^{-1} = C^{-1} A^{-1}
Proof. Let A, C ∈ Mn,n(F) be two invertible matrices. We need to verify that C^{-1}A^{-1} satisfies the
conditions given in the definition of the matrix inverse. We have
(AC)(C^{-1}A^{-1}) = A(CC^{-1})A^{-1} = A I_n A^{-1} = A A^{-1} = I_n
and, similarly,
(C^{-1}A^{-1})(AC) = C^{-1}(A^{-1}A)C = C^{-1} I_n C = C^{-1} C = I_n
Exercise 38. Let A ∈ Mn,n(F). For k ∈ N we define A^k to be the product of k copies of A, that is,
A^k = \underbrace{A \times A \times \cdots \times A}_{k \text{ copies}}
Lemma 4.4
Let A, B ∈ Mn,n be two square matrices and let i ∈ {1, 2, . . . , n}. If all entries in the i-th row of
A are equal to zero, then all entries in the i-th row of AB are zero.
In particular, if a square matrix has a row consisting entirely of zeros, then it is singular.
Proof. Let A ∈ Mn,n (F) be a square matrix with all entries in the i-th row of A equal to zero. That is,
∀j ∈ {1, . . . , n}, A_{ij} = 0. Let B ∈ Mn,n(F). For the same fixed i, we have (for all j)
(AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj} = \sum_{k=1}^{n} 0 \times B_{kj} = 0
Therefore, the i-th row of AB has all entries equal to zero. In particular, (AB)_{ii} ≠ 1 and hence
AB ≠ In. Therefore, no matrix B can satisfy the properties needed to be the inverse of A.
We would like to have an efficient method to solve simultaneous linear equations. Suppose that we
have n variables x1, . . . , xn and m linear equations that they should simultaneously satisfy.
Given A ∈ Mm,n (F) and B ∈ Mm,1 (F) we would like to find all X ∈ Mn,1 (F) such that the equation
AX = B is satisfied. Thinking about our experience with solving simple equations of the form
2x = 7, our first thought might be to multiply on both sides by A−1 . The problem is that A need
not be invertible (or even square). However, if A does happen to be invertible, then we have the
following.
Proposition 4.6
Let A ∈ Mm,m (F) and B ∈ Mm,1 (F). If A is invertible, then the equation AX = B has a unique
solution and it is given by X = A−1 B.
Proof. First note that A^{-1}B is a solution since A(A^{-1}B) = I_m B = B. Now suppose that X is any
solution. Then we have
X = I_m X = (A^{-1}A)X = A^{-1}(AX) = A^{-1}B
The general case of a linear system (in which A is not necessarily invertible) is discussed in the next
section. The technique will rest on the following observation. Suppose we have A ∈ Mm,n (F) and
B ∈ Mm,1(F) and an invertible matrix E ∈ Mm,m(F). Define A′ = EA and B′ = EB. Then for all
X ∈ Mn,1(F) we have
AX = B ⇐⇒ A′X = B′
The goal will be to arrange for the new linear system A′X = B′ to be as simple as possible.
4.3 Exercises
39. Let a, b, c, d ∈ C.
(a) Suppose that ad − bc ≠ 0. Show (using the definition of inverse) that the inverse of the
matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix} ∈ M2,2(C) is the matrix
\frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} ∈ M2,2(C)
40. Let A ∈ Mn,n (C). Suppose that there exists B ∈ Mn,k (C) such that AB = 0 and B 6= 0. Show
that A is not invertible. (Be careful, A need not be equal to 0.)
41. (a) Show that if a square matrix A satisfies A^2 − 4A + 3I = 0 then A^{-1} = (1/3)(4I − A).
(b) Verify these relations in the case that A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
42. Let
A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 3 & 4 \\ -1 & 0 & -2 \end{pmatrix}
Show that A^{-1} = −(1/3)(A^2 − 2A − 4I).
43. Write the following systems of linear equations in the form AX = B. Then, by using the inverse
of A, solve the system (i.e., find X).
(a)
2x − 3y = 3
3x − 5y = 1
(b)
x + z = −1
2x + 3y + 4z = 3
−x − 2z = 3
44. Suppose A, B, P ∈ Mn,n(F) are such that P is invertible and A = PBP^{-1}. Show that
A^k = P B^k P^{-1}
for all k ∈ N.
Matrix inverse
Linear systems
For a linear system AX = B in which the matrix A is not invertible Proposition 4.6 does not apply. To
handle the general case we introduce the notion of elementary row operations and the corresponding
elementary matrices. They will turn out to be useful in other contexts, including when calculating
the inverse of a matrix.
Given a linear system there are certain operations that can be carried out without changing the set
of solutions. For example, changing the order in which the equations are listed does not alter the set
of solutions. Similarly, multiplying one of the equations by a (non-zero) constant does not change the
set of solutions. This leads us to define the following operations on matrices.
Note that applying one of the above row operations does not change the size of the matrix. We
say that two matrices are row equivalent* if one can be obtained from the other by a sequence
of row operations. If A, B ∈ Mm,n (F) are row equivalent, we write A ∼ B.
Example 5.2. \begin{pmatrix} 0 & i & 2 \\ 1 & 3 & -2 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -2+6i \\ 0 & 1 & -2i \end{pmatrix} since one can be obtained from the other as follows
\begin{pmatrix} 0 & i & 2 \\ 1 & 3 & -2 \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix} \xrightarrow{(-i) \times R_2} \begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix} \xrightarrow{R_1 - 3R_2} \begin{pmatrix} 1 & 0 & -2+6i \\ 0 & 1 & -2i \end{pmatrix}
Definition 5.3
A matrix that can be obtained from an identity matrix by applying a single elementary row operation is called an elementary matrix.
It follows from the definition that elementary matrices are always square.
Example 5.4. The matrices \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}, and \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix} are elementary matrices. To justify this
note that
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{(-i) \times R_2} \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \xrightarrow{R_1 - 3R_2} \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}
* Row equivalence is an example of what is called an equivalence relation.
The connection between elementary matrices and elementary row operations is given by the follow-
ing result.
Lemma 5.5
Let A, B ∈ Mm,n (F). Suppose that B is obtained from A by applying a single row operation and
let E ∈ Mm,m (F) be the elementary matrix obtained by applying the same row operation to Im .
Then B = EA.
Proof. Suppose first that the row operation interchanges rows p and q. Then
B_{ij} = \begin{cases} A_{ij} & \text{if } i \notin \{p,q\} \\ A_{qj} & \text{if } i = p \\ A_{pj} & \text{if } i = q \end{cases} \qquad E_{ij} = \begin{cases} I_{ij} & \text{if } i \notin \{p,q\} \\ I_{qj} & \text{if } i = p \\ I_{pj} & \text{if } i = q \end{cases}
and therefore
(EA)_{ij} = \sum_{k=1}^{m} E_{ik} A_{kj} = \begin{cases} \sum_{k=1}^{m} I_{ik} A_{kj} & \text{if } i \notin \{p,q\} \\ \sum_{k=1}^{m} I_{qk} A_{kj} & \text{if } i = p \\ \sum_{k=1}^{m} I_{pk} A_{kj} & \text{if } i = q \end{cases} = \begin{cases} A_{ij} & \text{if } i \notin \{p,q\} \\ A_{qj} & \text{if } i = p \\ A_{pj} & \text{if } i = q \end{cases} = B_{ij}
Suppose now that the row operation is to multiply the p-th row by λ ∈ F \ {0}. Then we have
B_{ij} = \begin{cases} A_{ij} & \text{if } i \neq p \\ \lambda A_{ij} & \text{if } i = p \end{cases} \qquad E_{ij} = \begin{cases} I_{ij} & \text{if } i \neq p \\ \lambda I_{ij} & \text{if } i = p \end{cases}
(EA)_{ij} = \sum_{k=1}^{m} E_{ik} A_{kj} = \begin{cases} \sum_{k=1}^{m} I_{ik} A_{kj} & \text{if } i \neq p \\ \sum_{k=1}^{m} \lambda I_{ik} A_{kj} & \text{if } i = p \end{cases} = \begin{cases} A_{ij} & \text{if } i \neq p \\ \lambda A_{ij} & \text{if } i = p \end{cases} = B_{ij}
Suppose finally that the row operation adds λ times the q-th row to the p-th row. Then
B_{ij} = \begin{cases} A_{ij} & \text{if } i \neq p \\ A_{pj} + \lambda A_{qj} & \text{if } i = p \end{cases} \qquad E_{ij} = \begin{cases} I_{ij} & \text{if } i \neq p \\ I_{pj} + \lambda I_{qj} & \text{if } i = p \end{cases}
(EA)_{ij} = \sum_{k=1}^{m} E_{ik} A_{kj} = \begin{cases} \sum_{k=1}^{m} I_{ik} A_{kj} & \text{if } i \neq p \\ \sum_{k=1}^{m} (I_{pk} + \lambda I_{qk}) A_{kj} & \text{if } i = p \end{cases} = \begin{cases} A_{ij} & \text{if } i \neq p \\ A_{pj} + \lambda A_{qj} & \text{if } i = p \end{cases} = B_{ij}
Corollary 5.6
Every elementary matrix is invertible, and its inverse is also an elementary matrix.
Proof. Given any row operation ρ there is a row operation ρ′ which undoes the effect of ρ. Let the
corresponding elementary matrices be E and E′. Then applying the lemma with A = I we have
E′EI = I and EE′I = I. Hence EE′ = I and E′E = I. Therefore E is invertible and E^{-1} = E′.
Example 5.7. Considering the row operations and corresponding elementary matrices from Examples 5.2 and 5.4 we have:
\begin{pmatrix} 0 & i & 2 \\ 1 & 3 & -2 \end{pmatrix} \xrightarrow{R_1 \leftrightarrow R_2} \begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix}, \quad E_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad E_1 A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 0 & i & 2 \\ 1 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix}
\begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix} \xrightarrow{(-i) \times R_2} \begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix}, \quad E_2 = \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -i \end{pmatrix}\begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix} = \begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix}
\begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix} \xrightarrow{R_1 - 3R_2} \begin{pmatrix} 1 & 0 & -2+6i \\ 0 & 1 & -2i \end{pmatrix}, \quad E_3 = \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & -3 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix} = \begin{pmatrix} 1 & 0 & -2+6i \\ 0 & 1 & -2i \end{pmatrix}
Notice that
E_3 E_2 E_1 \begin{pmatrix} 0 & i & 2 \\ 1 & 3 & -2 \end{pmatrix} = E_3 E_2 \begin{pmatrix} 1 & 3 & -2 \\ 0 & i & 2 \end{pmatrix} = E_3 \begin{pmatrix} 1 & 3 & -2 \\ 0 & 1 & -2i \end{pmatrix} = \begin{pmatrix} 1 & 0 & -2+6i \\ 0 & 1 & -2i \end{pmatrix}
Exercise 46. For each of the row operations ρi in Example 5.7 write down a row operation ρ′i that
undoes the effect of ρi. Write down the elementary matrix E′i that corresponds to ρ′i. Verify that
EiE′i = I2.
Lemma 5.8
Let A, B ∈ Mm,n (F). If A and B are row equivalent, then there exists an invertible matrix
E ∈ Mm,m (F) such that B = EA.
Proof. Since A and B are row equivalent, there is a sequence of elementary row operations that trans-
forms A to B
A \xrightarrow{\rho_1} A_1 \xrightarrow{\rho_2} A_2 \xrightarrow{\rho_3} \cdots \xrightarrow{\rho_k} A_k = B
Let Ei be the elementary matrix corresponding to the row operation ρi and define E = Ek Ek−1 . . . E2 E1 .
Applying Lemma 5.5 we have
B = Ek Ek−1 . . . E2 E1 A = EA
We now define the first version of a "simplified" matrix that will be useful for solving linear systems
and for other applications (such as finding bases and calculating rank).
The leftmost non-zero entry in a row is called the leading entry of that row.
A matrix is in row echelon form if it satisfies the following conditions:
1. For any two non-zero rows, the leading entry of the lower row is further to the right than
the leading entry in the higher row.
2. Any row that consists entirely of zeros is lower than every non-zero row.
Note. Some authors add the condition that to be in row echelon form all leading entries should be
equal to 1. We are not including this requirement for what we call row echelon form. The extra
condition is not needed for any of our applications.
Examples 5.10.
\begin{pmatrix} 0 & 1 & -2 & 3 & 4 \\ 0 & 0 & 0 & 2 & 4 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 2 & 0 & 2 & 3 \\ 0 & 4 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{pmatrix} \quad\text{are in row echelon form}
\begin{pmatrix} 0 & 0 & 1 \\ 3 & 1 & 6 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 4 & 1 & 2 \\ 2 & -3 & 6 & -4 & 9 \end{pmatrix} \quad\text{are not in row echelon form}
5.3 Exercises
47. For each of the following row operations find the corresponding elementary matrix.
(a) \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \xrightarrow{R_2 - 3R_1} \begin{pmatrix} 1 & 2 \\ 0 & -2 \end{pmatrix}
(b) \begin{pmatrix} 1 & 2 & 5 \\ 3 & 4 & 6 \end{pmatrix} \xrightarrow{R_2 - 3R_1} \begin{pmatrix} 1 & 2 & 5 \\ 0 & -2 & -9 \end{pmatrix}
(c) \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \xrightarrow{R_2 \leftrightarrow R_3} \begin{pmatrix} 1 & 2 \\ 5 & 6 \\ 3 & 4 \end{pmatrix}
(d) \begin{pmatrix} 3 & 1 & 4 \\ 1 & 5 & 9 \\ 2 & 6 & 5 \end{pmatrix} \xrightarrow{R_2 \times (-2)} \begin{pmatrix} 3 & 1 & 4 \\ -2 & -10 & -18 \\ 2 & 6 & 5 \end{pmatrix}
48. Let A ∈ M3,5 (C). Suppose that B is obtained from A by the following sequence of row opera-
tions in the given order.
1) R1 ↔ R2 2) R3 − 2R1 3) R1 × 3
equivalence relations
column operations
Elementary column operations can be defined in way analogous to row operations. The ele-
mentary matrices corresponding to column operations are multiplied on the right rather than
on the left. Here’s an example to illustrate.
\begin{pmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \end{pmatrix} \xrightarrow{C_1 \leftrightarrow C_2} \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \xrightarrow[C_3 - 3C_1]{C_2 - 2C_1} \begin{pmatrix} 1 & 0 & 0 \\ 4 & -3 & -6 \end{pmatrix} \xrightarrow{C_3 - 2C_2} \begin{pmatrix} 1 & 0 & 0 \\ 4 & -3 & 0 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 3 \\ 5 & 4 & 6 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & -2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & -2 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & -3 & 0 \end{pmatrix}
We look at how to solve a linear system using ‘Gaussian elimination’ to put a corresponding matrix
into row echelon form. We also look at how to determine if a linear system has no solutions, a unique
solution, or more than one solution.
Any matrix can be put into row echelon form by performing a sequence of row operations as follows.
1. Consider the first column that is not all zeros. Interchange rows (if necessary) to bring a
non-zero entry to the top of that column. (The ‘leading entry’.)
2. Add suitable multiples of the top row to lower rows so that all entries below the leading
entry are zero.
3. Start again with Step 1 applied to the matrix without the first row.
(stop if there are no more rows)
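The following Python sketch implements these three steps (the name row_echelon is made up for illustration; exact arithmetic with Fraction avoids rounding issues):

from fractions import Fraction

def row_echelon(M):
    M = [row[:] for row in M]          # work on a copy
    rows, cols, top = len(M), len(M[0]), 0
    for j in range(cols):
        # step 1: find a non-zero entry in column j at or below row `top`
        pivot = next((i for i in range(top, rows) if M[i][j] != 0), None)
        if pivot is None:
            continue
        M[top], M[pivot] = M[pivot], M[top]            # interchange rows
        for i in range(top + 1, rows):                 # step 2: clear entries below
            factor = M[i][j] / M[top][j]
            M[i] = [a - factor * b for a, b in zip(M[i], M[top])]
        top += 1                                       # step 3: repeat on the rows below
        if top == rows:
            break
    return M

M = [[Fraction(x) for x in row]
     for row in [[3, 2, -1, -15], [1, 1, -4, -30], [3, 1, 3, 11], [3, 3, -5, -41]]]
for row in row_echelon(M):
    print(row)   # reproduces the row echelon form of the example below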
Example 6.2. Here’s an example of applying the above procedure. It’s a good idea to record the row
operations being used at each step.
\begin{pmatrix} 3 & 2 & -1 & -15 \\ 1 & 1 & -4 & -30 \\ 3 & 1 & 3 & 11 \\ 3 & 3 & -5 & -41 \end{pmatrix} \xrightarrow{R_2 - \frac{1}{3}R_1} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 3 & 1 & 3 & 11 \\ 3 & 3 & -5 & -41 \end{pmatrix} \xrightarrow{R_3 - R_1} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & -1 & 4 & 26 \\ 3 & 3 & -5 & -41 \end{pmatrix} \xrightarrow{R_4 - R_1} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & -1 & 4 & 26 \\ 0 & 1 & -4 & -26 \end{pmatrix}
\xrightarrow{R_3 + 3R_2} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & 0 & -7 & -49 \\ 0 & 1 & -4 & -26 \end{pmatrix} \xrightarrow{R_4 - 3R_2} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & 0 & -7 & -49 \\ 0 & 0 & 7 & 49 \end{pmatrix} \xrightarrow{R_4 + R_3} \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & 0 & -7 & -49 \\ 0 & 0 & 0 & 0 \end{pmatrix}
The final matrix is in row echelon form. All matrices above are row equivalent to one another.
Given A ∈ Mm,n (F) and B ∈ Mm,1 (F), to solve the linear system AX = B, we can proceed as
follows.
1. Form the augmented matrix [A|B].
2. Apply Gaussian elimination starting with [A|B] to obtain a row echelon matrix [A′|B′].
3. Solve the new, simplified set of equations A′X = B′ starting from the last equation and working
up. (This is sometimes called back substitution.)
Remark. Why does this work? We know that [A′|B′] = E[A|B] for some invertible matrix E by
Lemma 5.8. Therefore A′ = EA and B′ = EB. Therefore AX = B and A′X = B′ have the same set
of solutions (see the comment at the end of section 4.2).
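For the case of a unique solution, back substitution can also be sketched in Python (a hypothetical helper, building on the row_echelon sketch above):

def back_substitute(aug):
    # solve A'X = B' when [A'|B'] is in row echelon form and, after dropping zero
    # rows, is square with a non-zero entry in each diagonal position
    rows = [r for r in aug if any(r)]        # drop zero rows
    n = len(rows[0]) - 1
    x = [0] * n
    for i in range(n - 1, -1, -1):           # work from the last equation upward
        s = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][-1] - s) / rows[i][i]
    return x

print(back_substitute(row_echelon(M)))       # the Fractions -4, 2, 7, as in Example 6.3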
Example 6.3. Let’s use this technique to find all solutions to the following linear system.
3x + 2y − z = −15
x + y − 4z = −30
(∗)
3x + y + 3z = 11
3x + 3y − 5z = −41
Let
A = \begin{pmatrix} 3 & 2 & -1 \\ 1 & 1 & -4 \\ 3 & 1 & 3 \\ 3 & 3 & -5 \end{pmatrix}, \qquad B = \begin{pmatrix} -15 \\ -30 \\ 11 \\ -41 \end{pmatrix}
Then
[A|B] = \begin{pmatrix} 3 & 2 & -1 & -15 \\ 1 & 1 & -4 & -30 \\ 3 & 1 & 3 & 11 \\ 3 & 3 & -5 & -41 \end{pmatrix} \sim \begin{pmatrix} 3 & 2 & -1 & -15 \\ 0 & \frac{1}{3} & -\frac{11}{3} & -25 \\ 0 & 0 & -7 & -49 \\ 0 & 0 & 0 & 0 \end{pmatrix} \qquad \text{(as shown in Example 6.2)}
Back substitution: the third equation gives −7z = −49, so z = 7; the second gives (1/3)y − (11/3) × 7 = −25, so y = 2; the first gives 3x + 2 × 2 − 7 = −15, so x = −4.
So the original linear system (∗) has a unique solution and it is given by (x, y, z) = (−4, 2, 7).
Definition 6.4
A linear system that has no solutions is called inconsistent.
For example, the following linear system is inconsistent: row reducing its augmented matrix produces a row of the form [0 0 0 | k] with k ≠ 0.
x − y + z = 3
x − 7y + 3z = −11
2x + y + z = 16
Note. As we saw in the above example, a linear system is inconsistent if there is a row of the form
[0 · · · 0 | k] (with k ≠ 0) in the row echelon form. Given the way in which row echelon form is defined,
such a row occurs precisely when there is a leading entry in the final column of the row echelon form
matrix [A′|B′]. We shall see in the next section that this is the only situation in which a linear system
is inconsistent.
Definition 6.6
In Example 6.3 we saw a consistent linear system for which there was a unique solution. Here is an
example in which there turns out to be more than one solution.
x+ y+ z =4
2x + y + 2z = 9
3x + 2y + 3z = 13
[A|B] = \begin{pmatrix} 1 & 1 & 1 & 4 \\ 2 & 1 & 2 & 9 \\ 3 & 2 & 3 & 13 \end{pmatrix} \xrightarrow[R_3 - 3R_1]{R_2 - 2R_1} \begin{pmatrix} 1 & 1 & 1 & 4 \\ 0 & -1 & 0 & 1 \\ 0 & -1 & 0 & 1 \end{pmatrix} \xrightarrow{R_3 - R_2} \begin{pmatrix} 1 & 1 & 1 & 4 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}
For any value of z, we get a solution by taking y = −1 and x = 5 − z. That is, the set of all solutions is
S = {(5 − z, −1, z) | z ∈ F}
Lemma 6.8
Let A ∈ Mm,n(F) and B ∈ Mm,1(F). Suppose that [A|B] ∼ [A′|B′] and that [A′|B′] is in row
echelon form. Define r ∈ N ∪ {0} to be the number of non-zero rows in [A′|B′].
1. The system AX = B is consistent iff there is no leading entry in the final column of [A′|B′].
2. If the system is consistent, then the set of all solutions can be described using n − r parameters.
Sketch of proof. We have already noted above that if there is a leading entry in the final column, then
the system is inconsistent. If there is no leading entry in the final column, then a solution can always
be found using back substitution as above. The linear system A′X = B′ yields r (non-trivial) equations.
For each equation we can write the variable xi, that corresponds to the leading entry, in terms
of the xj having j > i. The n − r variables that do not correspond to leading entries can be chosen as
parameters.
Note. 1. If r = n (as in Example 6.3), the solution requires no parameters. That is, there is a unique
solution.
2. If r < n, then there will be more than one solution. If F is infinite (e.g., Q, R, or C), there will be
infinitely many solutions.
6.5 Exercises
50. Use Gaussian elimination to put the following matrices into row echelon form.
(a) \begin{pmatrix} 1 & 1 & -8 & -14 \\ 3 & -4 & -3 & 0 \\ 2 & -1 & -7 & -10 \end{pmatrix} \qquad (b) \begin{pmatrix} 0 & 1 & 2 & 3 & 4 \\ 0 & 0 & 5 & 7 & 6 \\ 0 & 0 & 5 & 2 & 4 \end{pmatrix}
51. Solve the linear system whose augmented matrix can be reduced to the following row echelon
form.
1 −3 4 7
0 1 2 2
0 0 1 5
0 0 0 0
52. Use Gaussian elimination to solve the following linear systems of equations:
53. Solve the systems of linear equations whose augmented matrices can be reduced to the following row echelon forms.
(a) \begin{pmatrix} 1 & -3 & 7 & 1 \\ 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \qquad (b) \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 0 \end{pmatrix}
54. Use Gaussian elimination to solve the following linear systems of equations:
(a) 4x − 2y = 5 (c) x1 + x2 + x3 + x4 =1
−6x + 3y = 1 2x1 + 3x2 + 3x3 =1
(b) x − 4y = 1 −x1 − 2x2 − 2x3 + x4 =0
−2x + 8y = −2 − x2 − x3 + 2x4 =1
(a) x + 2y − 3z = a (b) x + 2y + 4z = a
3x − y + 2z = b 2x + 3y − z = b
x − 5y + 8z = c 3x + y + 2z = c
56. Determine the values of the constant k ∈ C for which the system has
(i) no solutions,
(ii) a unique solution,
(iii) an infinite number of solutions.
(a) 2x + 3y + z = 11 (c) x+ y+ 2z = 9
x+ y+ z =6 x− y+ z=2
5x − y + 11z = k 4x + 2y + (k − 22)z = k
(b) x1 + x3 = 1
x2 + x3 = 2
2x2 + kx3 = k
57. Determine the values for a, b and c for which the parabola y = ax2 + bx + c passes through the
points:
(a) (0, −3), (1, 0) and (2, 5) (b) (−1, 1), (1, 9) and (2, 16)
Gaussian elimination
Elementary Linear Algebra by Anton and Rorres, §1.2
S_inhom = X_p + S_hom
Consider the inhomogeneous linear system
x + y + 3z = 2
x − y + 7z = 4
and the associated homogeneous system
x + y + 3z = 0
x − y + 7z = 0
The solutions of the homogeneous system are
(x, y, z) = t(−5, 2, 1), t ∈ R
That is,
S_hom = {t(−5, 2, 1) | t ∈ R}
A particular solution of the inhomogeneous system is X_p = (3, −1, 0), and every solution is obtained by adding an element of S_hom to it. That is,
S_inhom = {(3, −1, 0) + t(−5, 2, 1) | t ∈ R} = (3, −1, 0) + S_hom
We further refine the simplified form of a matrix. For linear systems this corresponds to performing
back substitution while still in matrix form. The new form has the advantage of being uniquely
determined by the original matrix.
A matrix is in reduced row echelon form if the following three conditions are satisfied.
1. It is in row echelon form.
2. Every leading entry is equal to 1.
3. Every column that contains a leading entry has zeros in all of its other entries.
Example 7.2. 1) The matrices
\begin{pmatrix} 1 & -2 & 3 & -4 & 5 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2+i & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2+i & 0 \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{are in RREF}
2) The matrices
\begin{pmatrix} 1 & 0 & 1-i & 3 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & i \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 & 2 & 4 \\ 0 & 1 & 0 & 1 & 6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & -4 & 9 \end{pmatrix} \qquad \text{are not in RREF}
Any matrix can be put into reduced row echelon form as follows.
1. First use Gaussian elimination (Algorithm 6.1) to put the matrix in row echelon form.
2. Multiply rows by appropriate numbers (type 2 row ops) to create the leading 1’s.
3. Working from the bottom row upward, use row operations of type 3 to create zeros above
the leading entries.
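A Python sketch of this procedure, building on the row_echelon function from the Lecture 6 sketch (the name rref is made up for illustration):

def rref(M):
    M = row_echelon(M)                           # step 1
    for i in range(len(M)):
        j = next((k for k, a in enumerate(M[i]) if a != 0), None)
        if j is None:
            continue
        M[i] = [a / M[i][j] for a in M[i]]       # step 2: create the leading 1
        for p in range(i):                       # step 3: zeros above the leading entry
            M[p] = [a - M[p][j] * b for a, b in zip(M[p], M[i])]
    return M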
Example 7.4.
\begin{pmatrix} -2 & 0 & 6 & 8 & 14 \\ 1 & 0 & -5 & -8 & -13 \\ 2 & 0 & -2 & 0 & -2 \\ 2 & 0 & -5 & -6 & -11 \end{pmatrix} \xrightarrow[\substack{R_3 + R_1 \\ R_4 + R_1}]{R_2 + \frac{1}{2}R_1} \begin{pmatrix} -2 & 0 & 6 & 8 & 14 \\ 0 & 0 & -2 & -4 & -6 \\ 0 & 0 & 4 & 8 & 12 \\ 0 & 0 & 1 & 2 & 3 \end{pmatrix} \xrightarrow[R_4 + \frac{1}{2}R_2]{R_3 + 2R_2} \begin{pmatrix} -2 & 0 & 6 & 8 & 14 \\ 0 & 0 & -2 & -4 & -6 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
\xrightarrow[-\frac{1}{2}R_2]{-\frac{1}{2}R_1} \begin{pmatrix} 1 & 0 & -3 & -4 & -7 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \xrightarrow{R_1 + 3R_2} \begin{pmatrix} 1 & 0 & 0 & 2 & 2 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}
The above row operations tell us that the set of solutions is the same as that for the system
x1 + 2x4 = 2
x3 + 2x4 = 3
This gives
x1 = −2x4 + 2
x3 = −2x4 + 3
We choose s = x2 and t = x4 as parameters (since the corresponding columns have no leading entry).
The set of solutions is given by
S = {(2 − 2t, s, 3 − 2t, t) | s, t ∈ F}
Our method for finding the inverse of a matrix is based on the following result.
Theorem 7.5
A matrix A ∈ Mn,n(F) is invertible if and only if A ∼ In.
Proof. Suppose first that A ∼ In. By Lemma 5.8 there is an invertible matrix E ∈ Mn,n(F) such that
EA = In. Since E is invertible we have
EA = In =⇒ E^{-1}EA = E^{-1}In =⇒ A = E^{-1}
so A, being the inverse of the invertible matrix E, is itself invertible.
Corollary 7.6
Every invertible matrix can be written as a product of elementary matrices.
Exercise 58. Suppose that A, B ∈ Mn,n (F) are such that AB = In . Show that A is invertible and that
A−1 = B. (Hint: as in the above proof, let E ∈ Mn,n (F) be invertible such that R = EA is in reduced
row echelon form. Show that RB = E and that therefore R does not have a row of zeros. Deduce
that R = In .)
Using the idea of the above theorem (and its proof) we have a way of finding the inverse of a square
matrix A ∈ Mn,n(F). First we use row operations to put A into reduced row echelon form R. If
R ≠ In, then A is not invertible. If R = In, then A is invertible and its inverse is the matrix E with
EA = In. A convenient way of doing this is the following.
1. Form the augmented matrix [A | In].
2. Use row operations to put the matrix into reduced row echelon form
[A | In] ∼ [R | B]
3. If R = In, then A is invertible and A^{-1} = B; if R ≠ In, then A is not invertible.
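A Python sketch of this method, using the rref function above and Fractions for exact arithmetic (the name inverse is made up; it returns None for a singular matrix):

def inverse(A):
    n = len(A)
    # step 1: form [A | I_n]
    aug = [list(row) + [1 if i == j else 0 for j in range(n)]
           for i, row in enumerate(A)]
    R = rref(aug)                                  # step 2
    if any(R[i][i] != 1 for i in range(n)):        # step 3: left block must be I_n
        return None
    return [row[n:] for row in R]

A = [[Fraction(x) for x in row] for row in [[2, 0], [-3, 1]]]
print(inverse(A))   # [[1/2, 0], [3/2, 1]] (as Fractions); compare Exercise 69(a)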
Remark. If we have row operations ρi (with corresponding elementary matrices Ei) such that
[A|I_n] \xrightarrow{\rho_1} [A_1 | B_1] \xrightarrow{\rho_2} [A_2 | B_2] \xrightarrow{\rho_3} \cdots \xrightarrow{\rho_k} [I_n | B]
then B = Ek · · · E2E1 and Ek · · · E2E1A = In, so that B = A^{-1}.
7.3 Exercises
59. For each of the following linear systems, use row reduction to decide whether the system has
(i) no solution, (ii) a unique solution, (iii) more than one solution.
(a) 3x − 2y + 4z = 3 (c) 3x − 4y + z = 2
x− y+ z = 7 −5x + 6y + 10z = 7
4x − 3y + 5z = 1 8x − 10y − 9z = −5
(b) x + 2y − z = −1 (d) 2x − 3y + 5z = 10
2x + 7y − z = 3 4x + 7y − 2z = −5
−3x − 12y + z = 0 2x − 4y + 25z = 31
60. Using row reduction, find the set of solutions to the following system of equations:
2x1 + x2 + 3x3 + x4 = 3
x1 + x2 + x3 − x4 = 6
x1 − x2 + 3x3 + 5x4 = −12
4x1 + x2 + 7x3 + 5x4 = −3
61. Determine the values of k ∈ C for which the system of equations has
(i) no solution,
(ii) a unique solution,
(iii) more than one solution.
(a) kx + y + z = 1 (c) x + 2y + kz = 1
x + ky + z = 1 2x + ky + 8z = 3
x + y + kz = 1
(b) 2x + (k − 4)y + (3 − k)z = 1 (d) x − 3z = −3
2y + (k − 3)z = 2 2x + ky − z = −2
x− 2y + z = 1 x + 2y + kz = 1
(a) x + 2y − 3z = a (b) x − 2y + 4z = a
3x − y + 2z = b 2x + 3y − z = b
x − 5y + 8z = c 3x + y + 2z = c
63. The equation of an arbitrary circle in the x-y plane can be written in the form
x^2 + y^2 + ax + by + c = 0
where a, b, c are real constants. Find the equation of the unique circle that passes through the
three points (−2, 7), (−4, 5), (4, −3).
64. A traveller who just returned from Europe spent the following amounts.
For housing: $30/day in England, $20/day in France, $20/day in Spain
For food: $20/day in England, $30/day in France, $20/day in Spain
For incidental expenses: $10/day in each country.
The traveller’s records of the trip indicate a total of $340 spent for housing, $320 for food, $140
for incidental expenses while travelling in these countries. Calculate the number of days spent
in each country or show that the records must be incorrect.
65. Frank’s, Dave’s and Phil’s ages are not known, but are related as follows. The sum of Dave’s
and Phil’s ages is 13 more than Frank’s. Frank’s age plus Phil’s age is 19 more than Dave’s age.
If the sum of their ages is 71, how old are Frank, Dave and Phil?
66. Consider a triangle with vertices x, y and z as shown. Show that there exist unique points cx, cy, cz (on the sides indicated) with the property that:
d(x, cy) = d(x, cz) and d(y, cx) = d(y, cz) and d(z, cx) = d(z, cy).
[Figure: triangle with vertices x, y, z; the point cz lies on side xy, cy on side xz, and cx on side yz]
68. Following the algorithm described in lectures, reduce the following matrices to reduced row
echelon form. Keep a record of the elementary row operations you use.
(a) \begin{pmatrix} 4 & -8 & 16 \\ 1 & -3 & 6 \\ 2 & 1 & 1 \end{pmatrix} \qquad (b) \begin{pmatrix} 2+i & 2+i & 5 & 6+i \\ 1-2i & 1-2i & -2+i & 2-i \end{pmatrix} \qquad (c) \begin{pmatrix} 1 & 2 & 0 \\ -1 & 1 & 1 \\ -1 & 2 & 2 \\ 0 & 2 & -2 \end{pmatrix}
(d) \begin{pmatrix} 0 & 2 & 1 & 4 \\ 0 & 0 & 2 & 6 \\ 1 & 0 & -3 & 2 \end{pmatrix} \qquad (e) \begin{pmatrix} 1 & 2 & 0 & 1 \\ 2 & 4 & 1 & 1 \\ 3 & 6 & 1 & 1 \end{pmatrix} \qquad (f) \begin{pmatrix} 0 & 2 & 7 \\ -1 & 1 & 1 \\ 1 & -4 & 5 \\ 2 & -5 & 4 \end{pmatrix}
69. For each of the following matrices, decide whether or not the matrix is invertible and, if it is,
find the inverse.
(a) \begin{pmatrix} 2 & 0 \\ -3 & 1 \end{pmatrix} \qquad (b) \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \qquad (c) \begin{pmatrix} 1 & -1 & 0 \\ -1 & 1 & 1 \\ 0 & -1 & 1 \end{pmatrix} \qquad (d) \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 0 & 0 \end{pmatrix}
70. Consider the matrix A = \begin{pmatrix} 1 & 0 \\ -5 & 2 \end{pmatrix}.
2x + y + z + w = 3
2x + 3y + 2z + 2w = 5
4x + 2y + 4z + 3w = 6
6x + 3y + 3z + 5w = 9
Proposition 8.1
Every matrix A ∈ Mm,n(F) is row equivalent to exactly one matrix in reduced row echelon form.
The proof is presented (for completeness) in an appendix at the end of this lecture.
Note that this is the same as saying that if R and S are two matrices each in RREF and if R ∼ S, then
we must have R = S.
Knowing that the reduced row echelon form of A is unique enables us to make the following defini-
tion.
Definition 8.2
The rank of a matrix is the number of non-zero rows in its reduced row echelon form. We denote
the rank of a matrix A by rank(A).
Remark. 1. Although the row echelon form is not unique, it has the same number of non-zero rows
as its reduced row echelon form.
2. If two matrices are row equivalent, then they have the same rank.
4. If a square matrix R ∈ Mn,n (F) has rank n and is in reduced row echelon form, then R = In .
Proposition 8.3
A square matrix A ∈ Mn,n(F) is invertible if and only if rank(A) = n.
Proof. Assume first that A is invertible. Then A ∼ In by Theorem 7.5. Therefore rank(A) = rank(In ) =
n.
For the converse, assume now that rank(A) = n. Let R be the reduced row echelon form of A. Then
rank(R) = rank(A) = n. Therefore R is an n × n matrix, it is in reduced row echelon form and has
no zero rows (since it has rank n). It must be the case that R = In because In is the only n × n matrix
that is in RREF and has rank n. Therefore A is invertible by Theorem 7.5.
8.3 Determinant
Recall that for a 2 × 2 matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix} the quantity ad − bc can be used to determine whether or not
the matrix is invertible. It is also equal to the signed area of the parallelogram defined by the vectors
(a, b), (c, d) ∈ R^2. The determinant is a generalisation of this quantity to n × n matrices.
The determinant is a number (i.e., element of F) associated to a square matrix (in Mn,n (F)). It gives
a lot of information about the matrix. For example, a square matrix is invertible precisely when its
determinant is non-zero.
Rather than beginning with an explicit formula for the determinant, we list its important properties.
The first three of which are, in fact, enough to determine the value of the determinant.
Properties of determinants
Given a matrix A ∈ Mn,n (F), the determinant of A is a number det(A) ∈ F that satisfies:
1. det(In ) = 1
Note. The above properties tell us exactly what the effect of a row operation is on the determinant.
Example 8.4.
\det\begin{pmatrix} 2 & 0 & 0 \\ 0 & 0 & 3 \\ 0 & 1 & 0 \end{pmatrix} = 2\det\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 3 \\ 0 & 1 & 0 \end{pmatrix} = -2\det\begin{pmatrix} 0 & 0 & 3 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = -6\det\begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
= 6\det\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = -6\det\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = -6
The following further properties can all be derived from the first three* .
Properties of determinants
5. If B is obtained from A by applying a row operation of the third kind (i.e., replacing a row
by itself plus a multiple of another row), then det(B) = det(A)
* In fact, for some fields (those of characteristic 2) property 4 does not follow from the first 3.
7. \det\begin{pmatrix} d_1 & * & \cdots & * \\ 0 & d_2 & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & d_n \end{pmatrix} = d_1 d_2 \ldots d_n
Note. Another common notation for the determinant of a matrix A which we will sometimes use is
to write |A| in place of det(A)
Example 8.5. We can use row operations to calculate the determinant of a matrix by putting it into
row echelon form.
\begin{vmatrix} 1 & 2 & -2 & 0 \\ 2 & 3 & -4 & 1 \\ -1 & -2 & 0 & 2 \\ 0 & 2 & 5 & 3 \end{vmatrix} = \begin{vmatrix} 1 & 2 & -2 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 2 \\ 0 & 2 & 5 & 3 \end{vmatrix} \qquad \text{(property 5: } R_2 - 2R_1,\ R_3 + R_1\text{)}
= \begin{vmatrix} 1 & 2 & -2 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 2 \\ 0 & 0 & 5 & 5 \end{vmatrix} \qquad \text{(property 5: } R_4 + 2R_2\text{)}
= \begin{vmatrix} 1 & 2 & -2 & 0 \\ 0 & -1 & 0 & 1 \\ 0 & 0 & -2 & 2 \\ 0 & 0 & 0 & 10 \end{vmatrix} \qquad \text{(property 5: } R_4 + \tfrac{5}{2}R_3\text{)}
= 1 \times (-1) \times (-2) \times 10 = 20 \qquad \text{(property 7)}
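The same idea gives a simple algorithm: reduce to row echelon form while recording the effect of each row operation on the determinant. A Python sketch (the name det_by_elimination is made up; exact arithmetic recommended):

from fractions import Fraction

def det_by_elimination(M):
    M = [row[:] for row in M]
    n, sign = len(M), 1
    for j in range(n):
        pivot = next((i for i in range(j, n) if M[i][j] != 0), None)
        if pivot is None:
            return 0                      # a zero column: the determinant is 0
        if pivot != j:
            M[j], M[pivot] = M[pivot], M[j]
            sign = -sign                  # a row swap changes the sign of det
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]    # property 5: R_i - factor*R_j keeps det
            M[i] = [a - factor * b for a, b in zip(M[i], M[j])]
    result = sign
    for i in range(n):                    # property 7: product of diagonal entries
        result *= M[i][i]
    return result

M = [[Fraction(x) for x in r]
     for r in [[1, 2, -2, 0], [2, 3, -4, 1], [-1, -2, 0, 2], [0, 2, 5, 3]]]
print(det_by_elimination(M))   # 20, as in Example 8.5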
8.4 Exercises
(a) \begin{pmatrix} a & b & c \\ g & h & i \\ d & e & f \end{pmatrix} \qquad (b) \begin{pmatrix} a & -b & c \\ d & -e & f \\ g & -h & i \end{pmatrix} \qquad (c) \begin{pmatrix} d & e & f \\ 3g & 3h & 3i \\ a & b & c \end{pmatrix}
(d) \begin{pmatrix} 2a & 2b & 2c \\ 2d & 2e & 2f \\ 2g & 2h & 2i \end{pmatrix} \qquad (e) \begin{pmatrix} a & b & c \\ d+a & e+b & f+c \\ g-2a & h-2b & i-2c \end{pmatrix}
Proof. We need to show that for any A ∈ Mm,n (F) if A ∼ R1 and A ∼ R2 , and both R1 and R2
are in reduced row echelon form, then R1 = R2 .
The idea is to show that if R1 ≠ R2 (and they are both in RREF) then there exists X ∈ Mn,1(F)
such that R1X ≠ 0 and R2X = 0. This would establish that R1 ≠ R2 =⇒ R1 ≁ R2 (as
required).
Suppose, for a contradiction, that R1 6= R2 . Let j ∈ {1, . . . , n} be minimal such that j-th column
of R1 is not equal to the j-th column of R2 . Form a new matrix S1 from R1 as follows. Drop all
columns to the right of the j-th column and drop all the columns on its left that do not contain
a leading entry. Define S2 as the matrix similarly obtained from R2 . For example
R_1 = \begin{pmatrix} 1 & 2 & 0 & 2 & 5 \\ 0 & 0 & 1 & 3 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad R_2 = \begin{pmatrix} 1 & 2 & 0 & 3 & 6 \\ 0 & 0 & 1 & 4 & 7 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix} \quad S_1 = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix} \quad S_2 = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 4 \\ 0 & 0 & 0 \end{pmatrix}
Note that S1 and S2 will be in reduced row echelon form and S1 ∼ S2 . Suppose first that the
j-th column of R1 (which is the last column of S1 ) contains a leading entry.
Then we have (in block form)
S_1 = \begin{pmatrix} I_k & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad S_2 = \begin{pmatrix} I_k & b \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \qquad \text{where } b = (b_1, \ldots, b_k)^T
But then S_1 (b_1 \cdots b_k \; {-1})^T ≠ 0 whereas S_2 (b_1 \cdots b_k \; {-1})^T = 0, which is not possible
since S1 ∼ S2. The same argument applies in the case in which the j-th column of R2 contains
a leading entry. If neither the j-th column of R1 nor the j-th column of R2 contains a leading
entry we have
S_1 = \begin{pmatrix} I_k & a \\ 0 & 0 \end{pmatrix} \qquad S_2 = \begin{pmatrix} I_k & b \\ 0 & 0 \end{pmatrix} \qquad \text{where } a = (a_1, \ldots, a_k)^T, \; b = (b_1, \ldots, b_k)^T
Considered as the augmented matrix of a linear system, each has a unique solution. The solutions must
be equal since S1 ∼ S2 . Therefore ai = bi for all i. But then S1 = S2 , which contradicts the assumption
that their final columns are different.
Determinants (continued)
We claim that each of the properties given for the determinant can be derived from the first 3.
Exercise 78. Show that properties 4, 5, and 6 follow from the first three properties. (For property 4
you will need to assume that the field is such that 1 + 1 ≠ 0.)
We will show that properties 7, 8, 9, and 10 follow from the first six.
Lemma 9.1
Let A, B ∈ Mn,n (F). Suppose B is obtained from A by a sequence of elementary row operations.
Then there is a k ∈ F \ {0} such that det(B) = k det(A). Moreover, if D is obtained from C using
the same sequence of row operations, then det(D) = k det(C).
Suppose first that all of the diagonal entries d_i are non-zero. Then
\det(A) = \det\begin{pmatrix} d_1 & * & * \\ 0 & \ddots & * \\ 0 & 0 & d_n \end{pmatrix} = d_1 d_2 \cdots d_n \det\begin{pmatrix} 1 & * & * \\ 0 & \ddots & * \\ 0 & 0 & 1 \end{pmatrix} \qquad \text{(by properties 2, 3a)}
= d_1 d_2 \cdots d_n \det\begin{pmatrix} 1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & 1 \end{pmatrix} = d_1 d_2 \cdots d_n \qquad \text{(by properties 5, 1)}
If one (or more) of the di is equal to 0, then the REF of A will have a row of zeros and therefore
det(A) = 0.
Let A ∈ Mn,n(F). Then A is singular if and only if det(A) = 0.
Proof. We first show that det(A) = 0 =⇒ A is singular by establishing the contrapositive: if A is invertible, then A ∼ In (Theorem 7.5), so by Lemma 9.1 we have det(In) = k det(A) for some k ≠ 0, and hence det(A) = 1/k ≠ 0.
For the converse, suppose that A is singular. Let R be the matrix in RREF such that R ∼ A. Because
A is singular, R 6= I (Theorem 7.5 ) and R therefore has a row of zeros. Since R has a row of zeros,
det(R) = 0 and therefore det(A) = 0 (Lemma 9.1).
det(AB) = det(A) det(B)
Proof. Let R be the RREF of A. Then det(R) = k det(A) for some k ≠ 0 (Lemma 9.1). We also have
det(RB) = k det(AB) with the same k.
Assume first that A is invertible. Then R = I, so 1 = det(I) = k det(A), that is, det(A) = 1/k, and we have
det(AB) = (1/k) det(RB) = (1/k) det(B) = det(A) det(B)
On the other hand, if A is singular, then R has a row of zeros. Therefore RB has a row of zeros, and
so det(RB) = 0. We have
det(AB) = (1/k) det(RB) = 0
and
det(A) det(B) = 0 × det(B) = 0
det(A^T) = det(A)
Proof. If A is singular, then AT is also singular (Exercise 38) and therefore det(AT ) = 0 = det(A).
We can assume therefore that A is invertible. It is enough to show that det(E^T) = det(E) for all
elementary matrices E, since any invertible matrix can be written as a product of elementary matrices
(Corollary 7.6), say A = E1E2 · · · Ek, and we then have
det(A^T) = det(Ek^T · · · E1^T) = det(Ek^T) · · · det(E1^T) = det(Ek) · · · det(E1)
and
det(A) = det(E1 · · · Ek) = det(E1) · · · det(Ek)
It remains to check that det(E T ) = det(E) for elementary matrices. Note that if E is an elementary
matrix corresponding to a row swap or to multiplying a row by a constant, then E = E T . If E
corresponds to the third kind of row operation, then both E and E T are triangular and have all
diagonal entries equal to one.
The following gives a version of a (recursive) formula for det(A) that can often be useful.
Lemma 9.6
Let A ∈ Mn,n(F) and let i ∈ {1, . . . , n}. Then
\det(A) = \sum_{j=1}^{n} (-1)^{i+j} A_{ij} \det(A(i,j))
where A(i, j) denotes the (n − 1) × (n − 1) matrix obtained from A by deleting the i-th row and
the j-th column.
Remark. This is often referred to as ‘cofactor expansion along the i-th row’.
Example 9.7. Expanding along the third row:
\begin{vmatrix} 1 & 2 & -2 \\ 2 & 3 & -4 \\ -1 & -2 & 0 \end{vmatrix} = (-1)^4 \times (-1) \times \begin{vmatrix} 2 & -2 \\ 3 & -4 \end{vmatrix} + (-1)^5 \times (-2) \times \begin{vmatrix} 1 & -2 \\ 2 & -4 \end{vmatrix} + (-1)^6 \times 0 \times \begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix}
= 1 \times (-1) \times (-2) + (-1) \times (-2) \times 0 + 1 \times 0 \times (-1) = 2
Remark. The value of (−1)i+j is +1 for i = j = 1 (i.e., top left of the matrix) and then alternates
between −1 and +1 as we move one entry vertically or horizontally.
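A direct Python implementation of cofactor expansion (recursive; the name det_cofactor is made up; fine for small matrices but far slower than row reduction):

def det_cofactor(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]    # delete row 1 and column j+1
        total += (-1) ** j * A[0][j] * det_cofactor(minor)  # expansion along the first row
    return total

print(det_cofactor([[1, 2, -2], [2, 3, -4], [-1, -2, 0]]))  # 2, as in Example 9.7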
9.3 Exercises
79. Use cofactor expansion to calculate the determinant of the matrix in Example 8.5.
80. Evaluate the determinant of the following matrices using cofactor expansion (Lemma 9.6):
" #
2 1 2 3 4 5
(a)
3 −1 0 3 4 5
(d)
2 1 1
0 0 4 5
3 0 −1
(b) 0 0 0 5
4 5 2
" #
a ab
2 4 2
(e) (where a, b ∈ C)
b a2 + b2
(c)
1 5 1
3 −7 3
81. Evaluate the determinants of the following matrices. For what values of the variables (x, λ, k ∈
C) are the matrices invertible?
(a) \begin{pmatrix} x & 2x & -3x \\ x & x-1 & -3 \\ 0 & 0 & 2x-1 \end{pmatrix} \qquad (b) \begin{pmatrix} \lambda-1 & 0 & 0 & 0 \\ 2 & 0 & \lambda+1 & 0 \\ 1 & \lambda-2 & 0 & 0 \\ 2 & 3 & 9 & \lambda+2 \end{pmatrix} \qquad (c) \begin{pmatrix} k & k+1 & k+2 \\ k+3 & k+4 & k+5 \\ k+6 & k+7 & k+8 \end{pmatrix}
82. (a) Determine the values of the parameter λ for which det(A − λI) = 0 when
i) A = \begin{pmatrix} 4 & 2 \\ -3 & -1 \end{pmatrix} \qquad ii) A = \begin{pmatrix} 2 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix}
(b) For each such value of λ, find a non-zero solution X of the linear system
(A − λI)X = 0
Think about why the properties listed uniquely determine the value of det(A).
An alternative way of defining (or calculating) the determinant of a matrix A ∈ Mn,n uses
permutations of the set {1, . . . , n}.
det(A) = Σ_{σ∈Sn} sign(σ) A_{1,σ(1)} A_{2,σ(2)} · · · A_{n,σ(n)}
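This permutation formula can also be checked computationally. A small Python sketch (an illustration only, not part of the notes), computing sign(σ) by counting inversions:

```python
from itertools import permutations
from math import prod

def sign(sigma):
    # sign(σ) = (−1)^(number of inversions of σ)
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                     if sigma[i] > sigma[j])
    return (-1) ** inversions

def det_leibniz(A):
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

print(det_leibniz([[1, 2, -2], [2, 3, -4], [-1, -2, 0]]))  # 2, as in Example 9.7
```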
Fields
Recall that elements of R3 can be added together and multiplied by scalars. For example
Thinking about the properties of these operations leads to the definition of a vector space, the central
topic of this subject. A vector space is a generalisation of the vector structure of R3 in which we have
a set of ‘vectors’ together with a version of vector addition and scalar multiplication.
The first thing we will look at is what we can use as ‘scalars’.
10.1 Fields
Examples of fields that we already know are Q, R, and C. They each have two operations (addition
and multiplication) and the two operations satisfy certain familiar properties. For example, every
non-zero element has a multiplicative inverse. Before listing the defining properties we recall the
meaning of ‘associative’ and ‘commutative’ for binary operations.
Definition 10.1
A binary operation on a set S is a function S × S → S. We often use ‘infix’ notation for binary
operations. For example, we write a + b rather than +(a, b) for the binary operation of addition.
A binary operation ∗ : S × S → S is called:
1. associative if ∀a, b, c ∈ S, (a ∗ b) ∗ c = a ∗ (b ∗ c)
2. commutative if ∀a, b ∈ S, a ∗ b = b ∗ a
Example 10.2. The binary operations of addition and multiplication on R are associative and commu-
tative. Subtraction gives a binary operation on R that is not associative since, for example, (1−2)−3 ≠
1 − (2 − 3). It is also not commutative since, for example, 1 − 2 ≠ 2 − 1. Matrix multiplication gives
a binary operation on M2,2 (R) that is associative but not commutative.
Definition 10.3
A field is a set F together with two binary operations, + and × on F satisfying the following
properties:
D) ∀ a, b, c ∈ F, a × (b + c) = (a × b) + (a × c) (distributivity)
Example 10.5. Here are two fields that each have only a finite number of elements.

The field F2 = {[0], [1]} has two elements, with operations given by the tables:

(F2 , +)             (F2 , ×)
 +  | [0] [1]         ×  | [0] [1]
[0] | [0] [1]        [0] | [0] [0]
[1] | [1] [0]        [1] | [0] [1]

The field F3 = {[0], [1], [2]} is defined by the operations:

(F3 , +)                 (F3 , ×)
 +  | [0] [1] [2]         ×  | [0] [1] [2]
[0] | [0] [1] [2]        [0] | [0] [0] [0]
[1] | [1] [2] [0]        [1] | [0] [1] [2]
[2] | [2] [0] [1]        [2] | [0] [2] [1]
Given any prime p ∈ N there is a field having p elements. It can be constructed as follows. We label
the elements as [0], [1], . . . , [p − 1], that is, Fp = {[0], [1], . . . , [p − 1]}. The operations are defined by
[a] + [b] = [a + b]
[a] × [b] = [a × b]
where [a+b] is defined to be the element [k] ∈ Fp given by the condition that p divides (a+b)−k. In
other words, we add as usual in Z, but then add or subtract a multiple of p until the result lies in {0, 1, . . . , p − 1}.
Similarly, the element [a×b] is defined to be the element [k] ∈ Fp given by the condition that p divides
(a × b) − k.
Remark. It’s common to write the elements of Fp simply as {0, 1, 2, . . . , p − 1} rather than
{[0], [1], [2], . . . , [p − 1]}. Another notation is {0̄, 1̄, 2̄, . . . , p − 1}. Be careful: Fp is not a ‘subfield’ of
R.
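As an illustration (Python, not part of the notes), the operations on Fp can be realised with the % operator; printing the tables reproduces Example 10.5 and gives a quick way to check Exercise 84 below:

```python
def add_p(a, b, p):
    # [a] + [b] = [a + b], reduced so that the result lies in {0, ..., p-1}
    return (a + b) % p

def mul_p(a, b, p):
    return (a * b) % p

for p in (2, 3):
    print([[add_p(a, b, p) for b in range(p)] for a in range(p)])  # addition table
    print([[mul_p(a, b, p) for b in range(p)] for a in range(p)])  # multiplication table
```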
10.2 Exercises
84. Write down the addition and multiplication tables for F5 and F7 .
89. Let F be a field and a ∈ F \ {0}. Show that the function L : F → F given by L(x) = ax is a
bijection.
(a) ab = 1
(b) a2 = b
(c) a3 = 1
(d) 1 + 1 = 0 (This is harder and just for fun. Feel free to skip it.)
Vector spaces
Let F be a field. A vector space over F (also called an F-vector space, or just a vector space)
consists of a non-empty set V together with two binary operations:
Remark. 1. It’s important to note that the scalars form part of the vector space structure.
2. The elements of V are called vectors.
3. The symbol 0⃗ is being used for the zero vector to distinguish it from the zero scalar 0 ∈ F. The
vector 0⃗ is uniquely determined by the property in axiom 3.
4. The additive inverse −v of a vector v is uniquely determined by the property in axiom 4.
5. A vector space over R is often called a real vector space.
A vector space over C is often called a complex vector space.
Example 11.2. Here are some important examples of vector spaces. Everything we will say about
vector spaces applies to each of these.
4. (F3 )2 is a vector space over F3 . The operations are the same as above for Rn . Here are some
examples of the operations.
(1, 2) + (2, 2) = (0, 1)
2(2, 1) = (1, 2)
7. Sequence spaces
Let F be any field and let V = {(x1 , x2 , . . . ) | xi ∈ F}.† The operations are given by
9. Polynomials
Let F be any field and let V = F[x] = {a0 + a1 x + a2 x2 + · · · + an xn | n ∈ N, ai ∈ F}. The operations
are given by the usual operations on polynomials.
10. Functions
Let V = F(R, R) be the set of all functions from R to R. We get a vector space over R if we
define operations by
More generally, given any non-empty set S and any field F we have a vector space V = F(S, F)
whose vectors are functions from S to F and with operations as given above.
†
An element of this set is the same as a function from N to F.
We describe a standard way of combining two vector spaces to produce a third. Fix a field F and let
V and W be vector spaces over F.
The direct sum of V and W , denoted V ⊕ W , has as its underlying set the Cartesian product of V
and W
V ⊕ W = {(v, w) | v ∈ V, w ∈ W }
with the operations defined by
Exercise 92. Show that with these operations V ⊕ W is a vector space over F.
11.3 Exercises
93. Determine whether or not the given set is a real vector space when equipped with the usual
operations. If it is not a vector space, list all properties that fail to hold.
(a) The set of all 2 × 3 matrices whose second column consists of 0’s.
That is, {A ∈ M2,3 (R) | Ai2 = 0 ∀ i}.
(b) The set of all (real) polynomials with positive coefficients.
That is, {a0 + a1 x + a2 x2 + · · · + an xn | ai ∈ R, ai > 0}.
(c) The set of all (real valued) continuous functions with the property that the function is 0 at
every integer. That is, {f ∈ C(R, R) | f (x) = 0 ∀ x ∈ Z}
94. Let V be the set of positive real numbers, that is, V = {x ∈ R | x > 0}. Define the operations of
vector addition ⊕ and scalar multiplication as follows:
Show that, equipped with these operations, V forms a real vector space. (What is the zero
vector? What is the (additive) inverse of a vector x ∈ V ?)
95. Let A ∈ M2,2 (R) and let V = {X ∈ M2,1 (R) | AX = (0, 0)^T }. Show that V is closed under the usual
operations of matrix addition and scalar multiplication. That is, show that ∀u, v ∈ V, u + v ∈ V
and that ∀u ∈ V ∀α ∈ R, αu ∈ V . Check that V together with these operations forms a vector
space. (We may use standard properties of R.)
Modules
What if we were to relax the requirement that the scalars must be a field? For example, could
we allow Z as the scalars, but keep the rest of the definition of a vector space unchanged? This
leads to what is called a module. Many of our considerations about vector spaces continue to
work for modules, but some do not. Modules are covered in the subject MAST30005 Algebra.
Subspaces
A subspace W of a vector space V is a subset W ⊆ V that is itself a vector space using the
addition and scalar multiplication operations defined on V .
2. That W is a subspace of V is denoted W ≤ V .
3. It’s important to realise that subspaces are vector spaces in their own right. Anything we can
prove about vector spaces applies to subspaces.
2. {(x, y, z) ∈ R3 | x + y + z = 0} ≤ R3
To show that a subset fails to be a subspace it can sometimes be useful to note the following.
Lemma 12.3
Let V be a vector space and W ⊆ V . If W is a subspace, then 0⃗V ∈ W .
(Where 0⃗V denotes the zero vector in the vector space V .)
The following theorem allows us to avoid checking all the vector space axioms when showing that a
subset of V is a subspace.
0. W is non-empty
Proof. Suppose that W is a subspace of V . Then it is a vector space and therefore satisfies the con-
ditions in Definition 11.1. Therefore it is non-empty (by axiom 3) and the operations of vector
addition and scalar multiplication give functions W × W → W and F × W → W and therefore (1)
and (2) are satisfied.
For the converse, suppose that W ⊆ V satisfies (0), (1), and (2). We need to check that W satisfies
Definition 11.1. By (1) and (2), we have that the binary operations of vector addition V × V → V
and scalar multiplication F × V → V restrict to give functions W × W → W and F × W → W . That
axioms 1,2,5,6,7,8 of Definition 11.1 are satisfied is immediate (since they hold in V and W ⊆ V ).
Since W is non-empty, there is an element w ∈ W . Then by condition (2) and Exercise 91 we have that
0⃗V = 0w ∈ W . Therefore axiom 3 is satisfied (and 0⃗W = 0⃗V ). To see that axiom 4 holds, let v ∈ W .
Appealing to Exercise 91 again we have that −v = (−1)v ∈ W .
Exercise 97. Use the Subspace Theorem to show that the first two examples above (in Example 12.2)
are indeed subspaces.
Exercise 98. Let H and K be subspaces of a vector space V . Prove that the intersection H ∩ K is a
subspace of V .
Examples 12.5. 1. {0⃗} is always a subspace of V
2. V is always a subspace of V
3. The set of diagonal matrices { [ a 0 0 ; 0 b 0 ; 0 0 c ] | a, b, c ∈ C } is a subspace of M3,3 (C).
{X ∈ Mn,1 (F) | AX = 0}
Proof. Let W = {X ∈ Mn,1 (F) | AX = 0}. By Theorem 12.4 it is enough to show that W is non-empty
and closed under vector addition and scalar multiplication.
Note first that 0 ∈ W since A0 = 0. Therefore W 6= ∅.
Let X, Y ∈ W . Then A(X + Y ) = AX + AY = 0 + 0 = 0. Therefore X + Y ∈ W .
Let X ∈ W and α ∈ F. Then A(αX) = αAX = α × 0 = 0. Therefore αX ∈ W .
Therefore, by the Subspace Theorem, W is a subspace of Mn,1 (F).
Example 12.7. Consider the matrix H = [ 0 0 0 1 1 ; 0 1 1 0 0 ; 1 0 1 0 1 ] ∈ M3,5 (F2 ).
The set W = {X ∈ M5,1 (F2 ) | HX = 0} is a subspace of M5,1 (F2 ). Solving the linear system in the
usual way, we get
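As a computational illustration (Python, not part of the notes), one can also list W by brute force, testing all 2⁵ = 32 candidate vectors:

```python
from itertools import product

H = [[0, 0, 0, 1, 1],
     [0, 1, 1, 0, 0],
     [1, 0, 1, 0, 1]]

# keep exactly those X in (F_2)^5 with HX = 0, working mod 2
W = [x for x in product((0, 1), repeat=5)
     if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]
print(len(W), W)  # 4 vectors, so W is a 2-dimensional subspace of M_{5,1}(F_2)
```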
12.1 Exercises
100. Decide which of the following are subspaces of C3 . Explain your answers.
101. Show that the following sets of vectors are not subspaces of Rn .
102. Use the subspace theorem to decide which of the following are real vector spaces with the usual
operations.
103. Determine whether or not the given set is a subspace of Mn,n (C).
(a) The set of all matrices, the sum of whose entries is zero.
(b) The set of all matrices whose determinant is zero.
(c) The diagonal matrices.
(d) The matrices with trace equal to 0.
104. Decide which of the following are complex vector spaces with the usual matrix operations.
z1 z2
(a) All complex 2 × 2 matrices with z1 and z2 real.
z3 z4
(b) All complex 2 × 2 matrices with z1 + z4 = 0.
105. Let A ∈ Mm,n (F) and B ∈ Mm,1 (F). Show that if B 6= 0, then the set of solutions {X ∈
Mn,1 (F) | AX = B} is not a subspace of Mn,1 (F).
106. Let V = (F2 )3 . Show that W = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)} ⊆ V is a subspace of V .
We define and investigate the important notions of linear dependence and span.
α1 u1 + · · · + αk uk
Notice that if W ≤ V and u1 , . . . , uk ∈ W , then the above linear combination gives a vector in W . We
will investigate how to describe subspaces using linear combinations.
13.2 Span
Let V be a vector space with scalars F, and let S ⊆ V be a non-empty subset of V . The span of S
is the set of all linear combinations of vectors from S
Remark. 1. The above definition of span(S) does not assume that S is finite.
Examples 13.3. 1. S = {(1, 1), (−3, −3)} ⊆ R2 , span(S) is the line {(x, y) ∈ R2 | y = x}
Example 13.4. Let A ∈ M4,4 (C) be given by

A = [ −2 0 6 8 ; 1 0 −5 −8 ; 2 0 −2 0 ; 2 0 −5 −6 ]

Calculation gives that the reduced row echelon form of A is

R = [ 1 0 0 2 ; 0 0 1 2 ; 0 0 0 0 ; 0 0 0 0 ]

The solution space of the linear system AX = 0 is given by

{ (x, y, z, w)^T | x, y, z, w ∈ C, x = −2w, z = −2w }
  = { y (0, 1, 0, 0)^T + w (−2, 0, −2, 1)^T | y, w ∈ C }
  = span{ (0, 1, 0, 0)^T , (−2, 0, −2, 1)^T }
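As a quick check of this example (assuming the sympy library; an illustration, not part of the notes):

```python
from sympy import Matrix

A = Matrix([[-2, 0, 6, 8], [1, 0, -5, -8], [2, 0, -2, 0], [2, 0, -5, -6]])
print(A.rref()[0])    # the reduced row echelon form R above
print(A.nullspace())  # the two spanning vectors of the solution space
```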
Proposition 13.5

Proof. Note first that if S = ∅, then both parts are trivially true since span(S) = {0⃗}. We will assume
then that S ≠ ∅.
To show that span(S) is a subspace we show that it satisfies the conditions of the Subspace Theorem.
Firstly, span(S) is non-empty since span(S) ⊇ S ≠ ∅. Now suppose that u, v ∈ span(S). Then, from
the definition of the span, we have that u = Σ_{i=1}^{k} αi ui for some αi ∈ F and ui ∈ S, and
v = Σ_{i=1}^{l} βi vi for some βi ∈ F and vi ∈ S. But then

u + v = α1 u1 + · · · + αk uk + β1 v1 + · · · + βl vl ∈ span(S)
Therefore span(S) is closed under vector addition. Similarly, for any a ∈ F we have that
and therefore span(S) is closed under scalar multiplication. It follows from the Subspace Theorem
that span(S) is a subspace of V .
For the second part, suppose that W 6 V and that S ⊆ W . We have
Remark. The above proposition tells us that span(S) is the ‘smallest’ subspace of V that contains S.
We sometimes say that span(S) is the subspace spanned by S.
Exercise 107. Let V be a vector space and let S, T ⊆ V be two subsets. Show that if S ⊆ T , then
span(S) ≤ span(T ).
Definition 13.6
Given a subspace W ≤ V of a vector space, we say that a subset S ⊆ V is a spanning set for W
if span(S) = W . We also say that S spans W .
Exercise 108. Show that S ⊆ V is a spanning set for a subspace W ≤ V if and only if
(a) S ⊆ W and
Example 13.7. We will show that S = {(1, 1), (2, 3), (4, 5)} ⊆ R2 is a spanning set for R2 . By Exercise
108 we need to show that given any (a, b) ∈ R2 , there exist x, y, z ∈ R such that (a, b) = x(1, 1) +
y(2, 3) + z(4, 5). That is, we need to show that the following linear system is consistent:
x + 2y + 4z = a
x + 3y + 5z = b
Forming the augmented matrix of the linear system and row reducing gives:
[ 1 2 4 | a ; 1 3 5 | b ]  —(R2 − R1)→  [ 1 2 4 | a ; 0 1 1 | b − a ]
Therefore the linear system is consistent (for all a, b ∈ R) and we conclude that S is a spanning set for
R2 .
Exercise 109. By setting up an appropriate linear system, show that the set
S = {(1, −1, 1), (3, −2, 3), (1, 0, 1), (1, 1, 1)}
is not a spanning set for R3 . Use your working to find a vector u ∈ R3 such that u ∈
/ span(S).
Definition 13.8

Let V be a vector space over a field F. A subset S ⊆ V is called linearly dependent if there are
distinct vectors u1 , . . . , uk ∈ S and scalars α1 , . . . , αk ∈ F, not all zero, such that

α1 u1 + · · · + αk uk = 0⃗

The set S is called linearly independent if it is not linearly dependent; equivalently, if for all
distinct u1 , . . . , uk ∈ S and α1 , . . . , αk ∈ F,

Σ_{i=1}^{k} αi ui = 0⃗ =⇒ ∀i, αi = 0
Example 13.9. We decide whether or not the subset S = {(1, 4, 1), (2, 5, 1), (3, 6, 1)} ⊆ C3 is linearly
dependent. From the definition, the set is linearly independent iff x(1, 4, 1) + y(2, 5, 1) + z(3, 6, 1) =
(0, 0, 0) =⇒ x = y = z = 0. That is, the set is linearly independent iff the following (homogeneous)
linear system has a unique solution
x + 2y + 3z = 0
4x + 5y + 6z = 0
x+y+z =0
We write down the corresponding matrix A = [ 1 2 3 ; 4 5 6 ; 1 1 1 ]. Reducing to row echelon form gives

[ 1 2 3 ; 4 5 6 ; 1 1 1 ]  —(R2 − 4R1, R3 − R1)→  [ 1 2 3 ; 0 −3 −6 ; 0 −1 −2 ]
                           —(R3 − (1/3)R2)→       [ 1 2 3 ; 0 −3 −6 ; 0 0 0 ]
Since the solution set will require one parameter (Lemma 6.8), we know that there are non-trivial
solutions. Therefore the set S is linearly dependent.
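The same conclusion can be checked computationally (sympy assumed; an illustration, not part of the notes):

```python
from sympy import Matrix

# columns are the vectors of S = {(1,4,1), (2,5,1), (3,6,1)}
A = Matrix([[1, 2, 3], [4, 5, 6], [1, 1, 1]])
print(A.rank())       # 2 < 3, so AX = 0 has non-trivial solutions
print(A.nullspace())  # e.g. (1, -2, 1): (1,4,1) - 2(2,5,1) + (3,6,1) = (0,0,0)
```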
Exercise 110. By solving an appropriate linear system, show that the following subset of P3 (C) is
linearly independent:
Lemma 13.10
1. If S = {u}, then S is linearly dependent iff u = 0⃗.
2. If S = {u, v} consists of exactly two vectors, then S is linearly dependent iff one of the two
vectors is a multiple of the other.
13.4 Exercises
112. Determine whether the given set spans the given vector space.
(a) {(2a, b, 0) | a, b ∈ R}
(b) {(a + c, c − b, 3c) | a, b, c ∈ R}
(c) {(4a + d, a + 2b, c − b) | a, b, c, d ∈ R}
114. Find (finite) spanning sets for the given vector spaces.
115. Determine whether or not the following sets of vectors are linearly independent:
116. By setting up and solving an appropriate linear system, decide whether the vector u is a linear
combination of the vectors in the set S. If so, express u as a linear combination of the vectors in
S.
118. Let a, b, c ∈ C be such that no two of them are equal. Show that the set {(1, a, a2 ), (1, b, b2 ), (1, c, c2 )} ⊆
C3 is linearly independent.
119. Determine whether or not the given set is linearly dependent. If the set is linearly dependent,
write one of its vectors as a linear combination of the others.
120. Show that a set S ⊆ V is linearly dependent iff ∃u ∈ S such that span(S \ {u}) = span(S).
121. Show that if a subset S ⊆ V contains the zero vector 0⃗ ∈ V , then S is linearly dependent.
122. Show that any subset of a linearly independent set is itself linearly independent.
123. Let u, v ∈ V and let α, β ∈ F with α ≠ 0. Show that {u, v} is linearly independent iff {αu+βv, v}
is linearly independent.
124. Let A ∈ Mm,n (F) be a matrix that is in row echelon form. Show that the non-zero rows of A
form a linearly independent subset of M1,n (F).
A fundamental concept when working with vector spaces is that of a basis. It is a spanning set
that is as ‘small’ as possible. A basis gives an efficient way of describing and working with vectors.
Choosing a basis allows us to use coordinates to represent vectors. The dimension of a vector space
is defined using bases.
Definition 14.1
Let V be a vector space over a field F. A basis for V is a subset B ⊆ V that satisfies:
6. Let V ≤ M2,2 (C) be the subspace of all matrices having trace equal to 0.
Then { [ 1 0 ; 0 −1 ] , [ 0 1 ; 0 0 ] , [ 0 0 ; 1 0 ] } is a basis for V .
A vector space has many subsets that are bases. The following are the standard bases for some
common vector spaces. If no other basis is specified, these are the assumed choices.
6. The standard basis for Fn is {(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, 0, . . . , 0, 1)}. These vectors
are sometimes denoted as {e1 , e2 , · · · , en }.
7. The standard basis for Mm,n (F) is {Ei,j | i ∈ {1, . . . m}, j ∈ {1, . . . , n}} where Ei,j is the matrix
with a 1 in the (i, j)-th entry and 0 in all other entries.
Let V be a vector space and B = {b1 , . . . , bn } ⊆ V a basis for V . Every vector u ∈ V can be
written uniquely as a linear combination of elements from B. That is, there exist unique αi ∈ F
such that
u = α1 b1 + · · · + αn bn
14.2 Dimension
Theorem 14.4
Let V be a vector space and let B = {b1 , . . . , bn } be a basis for V . Let S ⊆ V be a subset.
Proof. Note first that the third part is an immediate consequence of the first two parts, which we now
prove.
1) We prove the contrapositive: if S has more than n elements, then S is linearly dependent.
Suppose that |S| > n, and let {u1 , . . . , un+1 } ⊆ S be distinct elements of S. Let αij ∈ F be such that
uj = α1j b1 + · · · + αnj bn . Let A ∈ Mn,n+1 (F) be given by Aij = αij . To show that S is linearly
dependent we will show that there is a non-trivial linear combination of the ui that gives the zero
vector. For x1 , . . . , xn+1 ∈ F we have
Σ_{j=1}^{n+1} xj uj = 0⃗  ⇐⇒  Σ_{j=1}^{n+1} xj ( Σ_{i=1}^{n} αij bi ) = 0⃗
                      ⇐⇒  Σ_{i=1}^{n} ( Σ_{j=1}^{n+1} αij xj ) bi = 0⃗
                      ⇐⇒  ∀i, Σ_{j=1}^{n+1} αij xj = 0   (since B is linearly independent)
                      ⇐⇒  AX = 0, where X = (x1 , x2 , . . . , xn+1 )^T
The final expression is a homogeneous linear system. The solution space of this linear system will
need (n + 1) − rank(A) parameters (see Lemma 6.8). Since A has n rows, we have rank(A) ≤ n, and
therefore (n + 1) − rank(A) ≥ 1. Therefore the linear system has more than one solution. In particular,
there is a non-trivial solution.
2) We will show the contrapositive: if S has fewer than n elements, then it is not a spanning set.
Suppose that S = {u1 , . . . , uk } with k < n. Let αij ∈ F be such that uj = α1j b1 + · · · + αnj bn .
Let A ∈ Mn,k (F) be given by Aij = αij . We will show that there exist γ1 , . . . , γn ∈ F such that
v = γ1 b1 + · · · + γn bn is not in span(S).
v ∈ span(S)  ⇐⇒  ∃xj ∈ F, Σ_{i=1}^{n} γi bi = Σ_{j=1}^{k} xj uj
             ⇐⇒  ∃xj ∈ F, Σ_{i=1}^{n} γi bi = Σ_{j=1}^{k} xj ( Σ_{i=1}^{n} αij bi )
             ⇐⇒  ∃xj ∈ F, Σ_{i=1}^{n} γi bi = Σ_{i=1}^{n} ( Σ_{j=1}^{k} αij xj ) bi
             ⇐⇒  ∃xj ∈ F ∀i, γi = Σ_{j=1}^{k} αij xj   (Lemma 14.3)
             ⇐⇒  ∃xj ∈ F, A (x1 , . . . , xk )^T = (γ1 , . . . , γn )^T
We need to show that there is a choice for the γi such that the linear system AX = (γ1 , . . . , γn )^T is
inconsistent. Let E ∈ Mn,n (F) be an invertible matrix such that EA = R is in reduced row echelon
form. Since R ∈ Mn,k (F) has fewer columns than rows, the bottom row of R is all zeros. Now let
en = (0, . . . , 0, 1)^T ∈ Mn,1 (F) and choose the γi by C = (γ1 , . . . , γn )^T = E⁻¹ en . Then
E[A | C] = [R | en ]. Since the bottom row of R is all zeros, the linear system is inconsistent.
1. dim(R2 ) = 2
2. dim(P ) = 2, where P ≤ R3 is the plane P = {(x, y, z) ∈ R3 | x + y + z = 0}
3. dim(V ) = 3, where V ≤ M2,2 (C) is V = { [ a b ; c d ] | a + d = 0 }
4. dim(Fn ) = n
5. dim(Mm,n (F)) = mn
6. dim(Pn (F)) = n + 1
7. F[x] is infinite dimensional
8. F(R, R) is infinite dimensional
Theorem 14.7
The full proof is a little too technical to go through here, but the idea of the proof is the following.
Start with a linearly independent set S ⊆ V . If S is not a basis, then there exists u ∈ V such that
S ∪ {u} is linearly independent. Therefore a maximal linearly independent set must be a basis. That
there is a maximal linearly independent set requires the use of ‘Zorn’s Lemma’.
We isolate the following consequence of the above argument.
Lemma 14.8
Let V be a vector space and S ⊆ V a subset. Suppose that |S| = dim(V ). Then S is linearly
independent if and only if S is a spanning set (for V ).
14.3 Exercises
(a) {(i, 0, −1), (1, 1, 1), (0, −i, i)} (b) {(i, 1, 0), (0, 0, 1)}
126. Which of the following sets of vectors are bases for P2 (R)?
128. (a) Show that any set of four polynomials in P2 (C) is linearly dependent.
(b) Show that a set consisting of two polynomials cannot span P2 (C).
129. Let A = [ 0 1 4 ; 6 1 −8 ; −9 3 15 ] and W = { (x, y, z) ∈ R3 : A (x, y, z)^T = 3 (x, y, z)^T }.
15.1 Coordinates
From Lemma 14.3 we know that every vector in a vector space can be written in exactly one way
as a linear combination of a given basis. The scalars that appear as the coefficients in the linear
combination are called the coordinates. Once we fix a basis for a finite dimensional vector space V ,
there is a one-to-one correspondence* between vectors and their coordinate matrices.
Let V be a vector space over a field F. Suppose that B = {b1 , . . . , bn } is an ordered basis for V ,
that is, a basis arranged in order: b1 first, b2 second and so on. For v ∈ V we can write

v = α1 b1 + · · · + αn bn

The scalars α1 , . . . , αn are uniquely determined by v (see Lemma 14.3) and are called the coor-
dinates of v relative to B. The column matrix

[v]B = (α1 , . . . , αn )^T

is called the coordinate matrix† of v with respect to B.
"1#
0
Example 15.2. 1. V = P3 (C), B = {1, x, x2 , x3 , x4 }, v =1+ 2x3 + 2x4 , [v]B = 0
2
2
1
2. V = M2,2 (Q), B = {[ 10 00 ] , [ 00 10 ] , [ 01 00 ] , [ 00 01 ]}, v= [ 12 02 ], [v]B = 0
2
2
−1
3. V = R2 , B = {(1, 2), (3, 4)}, v = (2, 2), [v]B = 1
4. V = {(x, y, z) ∈ R3 | x+y +z = 0} 6 R3 , B = {(1, 2, −3), (2, −1, −1)}, v = (−3, 4, −1), [v]B = 1
−2
Note. The coordinates depend on the basis chosen. If we change the basis B, the coordinates of a
vector will change.
* i.e., a bijection
†
also called the coordinate vector of v
Lemma 15.4
Let V be a finite dimensional vector space over a field F and let B be an ordered basis for V .
Let u, v ∈ V and α ∈ F. Then

[u + v]B = [u]B + [v]B   and   [αv]B = α[v]B
Proof. Let the ordered basis be B = {b1 , . . . , bn } and let αi , βi ∈ F be such that u = Σ_{i=1}^{n} αi bi and
v = Σ_{i=1}^{n} βi bi . Then

u + v = Σ_{i=1}^{n} αi bi + Σ_{i=1}^{n} βi bi = Σ_{i=1}^{n} (αi + βi ) bi

and therefore

[u]B + [v]B = (α1 , . . . , αn )^T + (β1 , . . . , βn )^T = (α1 + β1 , . . . , αn + βn )^T = [u + v]B
Exercise 132. Let V be an n-dimensional vector space with scalars F and let B be a basis for V . Show
that the map ϕ : V → Mn,1 (F) given by ϕ(v) = [v]B is a bijection.
The following observation allows us to convert some questions about an n-dimensional vector space
to the corresponding questions about Fn .
Lemma 15.5
Let V be an n-dimensional vector space with scalars F and let B be a basis for V . Let S ⊆ V be a
subset of V and define T ⊆ Mn,1 (F) by T = {[v]B | v ∈ S}. Then
Proof.

S a spanning set for V  ⇐⇒  ∀v ∈ V ∃u1 , . . . , uk ∈ S ∃α1 , . . . , αk ∈ F, v = Σ_{i=1}^{k} αi ui
                        ⇐⇒  ∀v ∈ V ∃u1 , . . . , uk ∈ S ∃α1 , . . . , αk ∈ F, [v]B = Σ_{i=1}^{k} αi [ui ]B
                        ⇐⇒  ∀u ∈ Mn,1 (F) ∃u1 , . . . , uk ∈ S ∃α1 , . . . , αk ∈ F, u = Σ_{i=1}^{k} αi [ui ]B
                        ⇐⇒  T a spanning set for Mn,1 (F)
Examples 15.6.

2. Consider the vector space V ≤ M2,2 (C) of matrices having trace 0. In Example 14.2.6 we saw
that B = { [ 1 0 ; 0 −1 ] , [ 0 1 ; 0 0 ] , [ 0 0 ; 1 0 ] } is a basis for V . Let
S = { [ 1 0 ; 1 −1 ] , [ −1 1 ; 0 1 ] , [ 0 1 ; 1 0 ] } ⊆ V . Taking coordinates with respect to B
we get { (1, 0, 1)^T , (−1, 1, 0)^T , (0, 1, 1)^T }. Since this set is linearly dependent (see
Exercise 115(c)), we have that S is linearly dependent.
15.2 Exercises
133. (a) Show that the set B = {(−2, 2, 2), (3, −2, 3), (2, −1, 1)} is a basis for R3 .
(b) Find the vectors x, y ∈ R3 whose coordinates with respect to B are
[x]B = (2, 1, 1)^T   and   [y]B = (1, 0, −1)^T
(c) For each of the following vectors find its coordinates with respect to B:
134. Find the coordinate vector of v with respect to the given basis B for the vector space V .
We’ve already seen that, given a matrix A ∈ Mm,n (F), its solution space is a subspace of Mn,1 (F).
There are two other spaces we can associate to a matrix.
Definition 16.1
1. The solution space of A (also called the null space or kernel) is the subspace of Mn,1 (F)
given by {X ∈ Mn,1 (F) | AX = 0}.
It will often be identified with a subspace of Fn .
Explicitly, solspace(A) = {(x1 , . . . , xn ) ∈ Fn | A (x1 , . . . , xn )^T = 0}.
2. The row space, rowspace(A), is the subspace of M1,n (F) spanned by the rows of A.
It will often be identified with a subspace of Fn . Explicitly, we identify the i-th row with
the element ri = (ai1 , . . . , ain ) ∈ Fn and let rowspace(A) = span{r1 , . . . , rm } ≤ Fn
3. The column space, colspace(A), is the subspace of Mm,1 (F) spanned by the columns of A.
It will often be identified with a subspace of Fm . Explicitly, we identify the j-th column
with the element cj = (a1j , . . . , amj ) ∈ Fm and let colspace(A) = span{c1 , . . . , cn }
Remark. Writing c1 , . . . , cn for the columns of A and X = (x1 , . . . , xn )^T , we have

AX = x1 (a11 , . . . , am1 )^T + x2 (a12 , . . . , am2 )^T + · · · + xn (a1n , . . . , amn )^T = x1 c1 + x2 c2 + · · · + xn cn
Lemma 16.3
1. rowspace(A) = rowspace(R)
4. Every non-pivot column of A can be written as a linear combination of the columns to its
left
Proof. 1) We show that if two matrices are row equivalent, then they have the same row space. For
that it is enough to show that if R is obtained from A by a single row operation, then the row space
of R is a subset of the row space of A. But this is clear since each row of R is a linear combination of
the rows of A.
2) From the first part, we know that the non-zero rows of R form a spanning set for rowspace(A).
That the non-zero rows are linearly independent is exercise 124.
3) Since the position of the leading entries will be the same, we can assume that R is in reduced row
echelon form. The pivot columns of R are then linearly independent. Denoting the columns of A by
c1 , . . . , cn and those of R by d1 , . . . , dn we have that
α1 c1 + · · · + αn cn = 0  ⇐⇒  A (α1 , . . . , αn )^T = 0   (see the remark after Definition 16.1)
                        ⇐⇒  R (α1 , . . . , αn )^T = 0   (A and R are row equivalent)
                        ⇐⇒  α1 d1 + · · · + αn dn = 0
Therefore, the pivot columns of A form a linearly independent set. To see that the set of pivot columns
of A forms a spanning set for colspace(A) it is enough to show that each of the non-pivot columns
of A can be written as a linear combination of the pivot columns of A. But this is clearly true for the
columns of R, and therefore for the columns of A by the above calculation.
4) The statement holds for the columns of R since R is in row echelon form. That the same holds for
A then follows from the remark after Definition 16.1 (as in the previous part).
Definition 16.5
The dimension of the row space is called the row rank of a matrix. The dimension of the column
space is called the column rank of a matrix.
* That is, the columns of A such that the corresponding column in R has a leading entry.
Corollary 16.6
The rank, row rank and column rank of a matrix are all equal.
16.2 Exercises
136. In each part find a basis for, and the dimension of, the indicated subspace.
(a) The solution space of the homogeneous linear system (over R):
x1 − 2x2 + x3 = 0
x2 − x3 + x4 = 0
x1 − x2 + x4 = 0

(b) The solution space of the homogeneous linear system (over R):
x1 − 3x2 + x3 − x5 = 0
x1 − 2x2 + x3 − x4 = 0
x1 − x2 + x3 − 2x4 + x5 = 0
(c) The subspace of R4 of all vectors of the form (x, −y, x − 2y, 3y).
137. For each of the following real matrices find a basis for the
138. Find a basis for each of the column space, row space and solution space of the matrix
[ 0 2 1 1 ; 2 0 1 1 ; 1 1 1 1 ; 0 1 2 1 ] ∈ M4,4 (F3 )
139. Let w = [x1 · · · xn ]^T ∈ Mn,1 (R) be fixed, and let W = span{w}. Show that there exists a matrix
A ∈ Mn,n (R) whose solution space is W .
We collect here some techniques that result from the theory that we’ve seen so far.
Let V be an n-dimensional vector space over a field F and let B be a basis for V .
Let S = {u1 , . . . , uk } ⊆ V . To decide if S is linearly independent:
2. Calculate rank(A)
Example 17.2. Let S = {(2+2i)+2x+2x2 −2x3 , i+(1+i)x+x2 −x3 , −1−x2 +x3 } ⊆ P3 (C). Is S linearly
independent? To apply the above technique we first fix a basis for P3 (C). Let’s use the standard basis
B = {1, x, x2 , x3 }. Letting u1 = (2 + 2i) + 2x + 2x2 − 2x3 , u2 = i + (1 + i)x + x2 − x3 , u3 = −1 − x2 + x3
we get
A = [ [u1 ]B [u2 ]B [u3 ]B ] = [ 2+2i i −1 ; 2 1+i 0 ; 2 1 −1 ; −2 −1 1 ] ∼ [ −2 −1 0 ; 0 i 1 ; 0 0 0 ; 0 0 0 ]

Since the rank is 2, which is strictly less than the number of elements in S, we conclude that S is
linearly dependent.
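As a computational check of Examples 17.2 and 17.4 (sympy assumed; an illustration, not part of the notes):

```python
from sympy import Matrix, I

# columns are the coordinate vectors [u1]_B, [u2]_B, [u3]_B from Example 17.2
A = Matrix([[2 + 2*I, I, -1],
            [2, 1 + I, 0],
            [2, 1, -1],
            [-2, -1, 1]])
print(A.rank())  # 2: less than |S| = 3 (dependent), less than dim P3(C) = 4 (not spanning)
```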
Let V be an n-dimensional vector space over a field F and let B be a basis for V .
Let S = {u1 , . . . , uk } ⊆ V . To decide if S is a spanning set for V :
2. Calculate rank(A)
Example 17.4. With S ⊆ P3 (C) as in Example 17.2, the calculation done there shows that S is not a
spanning set for P3 (C) because rank(A) = 2 < 4 = dim(P3 (C)).
17-2 MAST10022 Linear Algebra: Advanced, 2024
Algorithm 17.5: To find a subset of a set S = {u1 , . . . , uk } that is a basis for span (S)
Let V be an n-dimensional vector space over a field F and let B be a basis for V .
Let S = {u1 , . . . , uk } ⊆ V . To find a subset of S that is a basis for span(S):
4. The corresponding elements of S form a basis for span(S). That is, the basis is
Example 17.6. With S ⊆ P3 (C) as in Example 17.2, the calculation done there shows that the set
C = {(2 + 2i) + 2x + 2x2 − 2x3 , i + (1 + i)x + x2 − x3 } is a basis for span(S).
Remark. Alternatively, to find a basis for span(S) we could consider the matrix
A^T = [ 2+2i 2 2 −2 ; i 1+i 1 −1 ; −1 0 −1 1 ] ∼ [ −1 0 −1 1 ; 0 1 −i i ; 0 0 0 0 ]

The non-zero rows in the row echelon form give a basis {(−1, 0, −1, 1), (0, 1, −i, i)} for rowspace(A^T )
and hence a basis {−1 − x2 + x3 , x − ix2 + ix3 } for span(S).
Let V be an n-dimensional vector space over a field F and let B be a basis for V .
Let W ≤ V be a subspace and fix a basis {w1 , . . . , wm } for W .
Given a linearly independent subset S = {u1 , . . . , uk } ⊆ W , to extend S to obtain a basis of W :
4. The vectors corresponding to the pivot columns form a basis for W . That is
is a basis for W .
a
The first k columns will all be pivot columns since S is linearly independent.
Example 17.8. We saw in the previous example that {−1 − x2 + x3 , x − ix2 + ix3 } ⊆ P3 (C) is a linearly
independent set. We can extend it to a basis for P3 (C) as follows (letting u1 = −1 − x2 + x3 , u2 =
x − ix2 + ix3 and using the standard basis B = {1, x, x2 , x3 } )
[ [u1 ]B [u2 ]B [1]B [x]B [x2 ]B [x3 ]B ] = [ −1 0 1 0 0 0 ; 0 1 0 1 0 0 ; −1 −i 0 0 1 0 ; 1 i 0 0 0 1 ]
                                        ∼ · · · ∼ [ −1 0 1 0 0 0 ; 0 1 0 1 0 0 ; 0 0 −1 i 1 0 ; 0 0 0 0 1 1 ]
Therefore the set {−1 − x2 + x3 , x − ix2 + ix3 , 1, x2 } is a basis for P3 (C).
Let V be an n-dimensional vector space over a field F and let B be a basis for V .
Let S = {u1 , . . . , uk } ⊆ V and let w ∈ V . To write w as a linear combination of the elements in S:
4. Otherwise, continue to reduced row echelon form R. The entries in the last column of R
give the coefficients in the linear combination. (See the example below.)
Example 17.10. Consider the set S = {−1 − x2 + x3 , x − ix2 + ix3 } ⊆ P3 (C). Let’s try to write the
element w = 1 + x + x2 + x3 as a linear combination of these two elements:
[ −1 0 | 1 ; 0 1 | 1 ; −1 −i | 1 ; 1 i | 1 ] ∼ · · · ∼ [ −1 0 | 1 ; 0 1 | 1 ; 0 0 | i ; 0 0 | 0 ]

Since the last column of the row echelon form contains a leading entry, the system is inconsistent:
w cannot be written as a linear combination of the elements of S.
17.1 Exercises
140. Let W = span(S) where S = { [ 1 −2 ; 4 1 ] , [ 2 −3 ; 9 −1 ] , [ 1 0 ; 6 −5 ] , [ 2 −5 ; 7 5 ] } ⊆ M2,2 (R).
141. (a) Show that B = {(1, 1, 1), (1, 1, 0), (1, 0, 0)} is a basis for R3 .
(b) Find the coordinates of (4, −3, 2) ∈ R3 relative to B.
142. (a) Show that the set {(1 − i, i), (2, −1 + i)} is linearly dependent in C2 .
(b) Now consider the vector space V with underlying set C2 , but with R as the field of scalars.
Show that the set {(1 − i, i), (2, −1 + i)} is linearly independent in V .
143. Let A = [ 1 3 −2 5 4 ; 1 4 1 3 5 ; 1 4 2 4 3 ; 2 7 −3 6 13 ] ∈ M4,5 (R).
144. Which of the following are linear combinations of (0, −2, 2) and (1, 3, −1)?
150. In each part explain why the given statement is true “by inspection.”
(a) The set {(1, 0, 3), (−1, 1, 0), (1, 2, 4), (0, −1, −2)} is linearly dependent.
(b) The set {(1, −1, 2), (0, 1, 1)} does not span R3 .
(c) If the set {v1 , v2 , v3 , v4 } of vectors in R4 is linearly independent, then it spans R4 .
(d) The set {(0, 1, −1, 0), (0, −1, 2, 0)} is linearly independent, and so it spans the subspace of
R4 of all vectors of the form (0, a, b, 0).
151. In each part determine whether or not the given set forms a basis for the indicated subspace.
(a) {(1, 2, 3), (−1, 0, 1), (0, 1, 2)} for R3
(b) {(−1, 1, 2), (3, 3, 1), (1, 2, 2)} for R3
(c) {(1, −1, 0), (0, 1, −1)} for the subspace of R3 consisting of all (x, y, z) such that x+y+z = 0.
(d) {(1, 1, 0), (1, 1, 1)} for the subspace of R3 consisting of all (x, y, z) such that y = x + z.
152. Which of the following sets of vectors are bases for R3 ?
153. Find a basis for and the dimension of the subspace of Rn spanned by the following sets.
(a) {(0, 1, −2), (3, 0, 1), (3, 2, −3)} (n = 3)
(b) {(1, 3), (−1, 2), (7, 6)} (n = 2)
(c) {(−1, 2, 0, 4), (3, 1, −1, 2), (−5, 3, 1, 6), (7, 0, −2, 0)} (n = 4)
154. For each of the following sets choose a subset that is a basis for the subspace spanned by the set.
Then express each vector that is not in the basis as a linear combination of the basis vectors.
(a) {(1, 2, 0, −1), (2, −1, 2, 3), (−1, −11, 6, 13), (4, 3, 2, 1)} ⊆ Q4
(b) {(0, −1, −3, 3), (−1, −1, −3, 2), (3, 1, 3, 0), (0, −1, −2, 1)} ⊆ Q4
(c) {(1, 2, −1), (0, 3, 4), (2, 1, −6), (0, 0, 2)} ⊆ Q3
When storing or transmitting data (on a disk, over the internet etc) errors are often introduced. Er-
rors might result, for example, from physical damage or radiation. In a memory chip, background
radiation can alter the memory contents. We would like to be able to protect against this, and reduce
the risk of using corrupted data.
How can we encode data in order to detect and perhaps correct errors in transmission or storage?
The study of such problems is known as Coding Theory.
18.1 Codes
A key idea is to build in redundancy before sending or storing the data. A simple way to do this is by
repetition. If we wanted to send the message ‘1011’, we could send each bit twice and send ‘11001111’.
If the message ‘11011111’ were received we would know that there had been some corruption of the
message. In this example the sent message is made up of combinations of 00 and 11.
By repeating each bit three times we would even be able to correct errors:
We would know that there had been some interference and that the original message was (most
probably) 101, since 010 is ‘closer’ to 000 than to 111.
Definition 18.1
Let A be a finite set. We will refer to A as the alphabet. A code over A is a non-empty subset of
An . The number n is called the length of the code. The elements of a code are called codewords.
Example 18.2. 1. The set {(c, a, t), (d, o, g), (p, i, g), (a, b, c), (l, d, r)} is a code of length 3 over the
alphabet A = {a, b, c, . . . , z}.
2. {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)} is a code of length 3 over F2 = {0, 1}
Remark. When considering codes it is convenient to drop the commas and parentheses. So the exam-
ples above would be written as {cat, dog, pig, abc, ldr} and {000, 011, 101, 110}
Definition 18.3
A linear code of length n and rank k is a k-dimensional subspace of (Fp )n (for some prime p ∈ N).
The code is called a binary linear code if p = 2, and it is a ternary linear code in the case p = 3.
Example 18.4. For the above repetition code, our codewords were 000 and 111. These two elements
together form a subspace of (F2 )3 , so it is a linear code.
Example 18.5. {(a, b, c) | a, b, c ∈ F2 and a + b + c = 0} = {000, 011, 101, 110} is a subspace of (F2 )3 , and
so is a binary linear code. This code could be used as follows.
original | codeword
   00    |   000
   01    |   101
   10    |   110
   11    |   011

If we receive a word abc we can check whether it is a codeword by calculating (remembering that
entries are in F2 ) whether [ 1 1 1 ] (a, b, c)^T = 0. If it is a codeword, we know that the intended
(original) message was bc.
Definition 18.6
A check matrix for a linear code C is a matrix H such that C is the solution space of H (that is, C = {X | HX = 0}).
(x1 , x2 , x3 , x4 , x5 , x6 , x7 )^T = a (1, 1, 1, 0, 0, 0, 0)^T + b (1, 0, 0, 1, 1, 0, 0)^T + c (0, 1, 0, 1, 0, 1, 0)^T + d (1, 1, 0, 1, 0, 0, 1)^T

  = [ 1 1 0 1 ; 1 0 1 1 ; 1 0 0 0 ; 0 1 1 1 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ] (a, b, c, d)^T

  = (a+b+d, a+c+d, a, b+c+d, b, c, d)^T
We have 4 parameters (called information bits in this setting) which can be chosen arbitrarily. (In
this case they are the 3rd, 5th, 6th and 7th bits.) The other 3 bits are called check bits. (The 1st, 2nd
and 4th bits.)
If we want to send the message abcd ∈ F42 , we calculate the codeword [ a+b+d a+c+d a b+c+d b c d ] and
send it. When a message is received, we check that it is a codeword by multiplication by H.
For example, suppose we wanted to send 1011 (i.e., a = 1, b = 0, c = 1, d = 1).
Given the specified encoding, the codeword for this is 0110011, so we send 0110011.
Suppose some interference occurs and the received word is v = 0100011
The receiver knows that an error has occurred because
Hv^T = [ 0 0 0 1 1 1 1 ; 0 1 1 0 0 1 1 ; 1 0 1 0 1 0 1 ] (0, 1, 0, 0, 0, 1, 1)^T = (0, 1, 1)^T ≠ 0⃗

In fact, since

Hv^T = (0, 1, 1)^T
the receiver knows that the error occurred in the third bit.
Assuming that a single error has occurred, the (column) matrix Hv^T is equal to a column of H. If it
is the nth column, the error occurred in the nth bit.
The receiver therefore knows that to get the intended codeword the third bit should be swapped from 0 to 1
— giving 0110011
The original message is then recovered by dropping the check bits (1st, 2nd and 4th) — giving 1011
This code gives a way of correcting a single error, which we explore below.
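As an illustration (Python, not part of the notes), syndrome decoding for this code takes only a few lines; the function below assumes at most one bit was flipped:

```python
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def correct(word):
    """Return the nearest codeword, assuming at most one bit was flipped."""
    v = [int(b) for b in word]
    syndrome = [sum(h * x for h, x in zip(row, v)) % 2 for row in H]
    if any(syndrome):
        # the syndrome equals the column of H at the error position
        pos = next(j for j in range(7) if [row[j] for row in H] == syndrome)
        v[pos] ^= 1  # flip the erroneous bit
    return ''.join(map(str, v))

print(correct('0100011'))  # '0110011', as in the worked example above
```

This can also be used to check the received words in Exercise 155 below.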
Exercise 155. Using the above encoding, recover the original messages from the received words:
0110001, 1110000, 0000011
When an error occurs, we would like to be able to correct the error, by deciding what the original
message was. In the above triple repetition code, when the non codeword 010 was received we
deduced that the (probable) intended codeword was 000, since this is ‘closer’ to 010 than is the other
codeword 111. We saw another example in the previous section.
In correcting an error, we are assuming that the original codeword (before interference) is the code-
word closest to the received word. This is called the nearest neighbour principle.
Definition 18.8
Definition 18.10
Let C be a code with minimum distance dmin between codewords. Then C can be used to
In general, to find the minimum distance between codewords one needs to calculate the distance be-
tween every pair of codewords (and then take the minimum). However, for linear codes calculating
Definition 18.12
The weight w(u) of a word u = a1 · · · an ∈ Fnp is the number of non-zero coordinates, that is,
w(u) = d(u, 0⃗).
Lemma 18.13
For a linear code, the minimum distance between codewords is equal to the smallest non-zero
weight.
Example 18.14. The linear code {0000, 1011, 0110, 1101} ⊂ (F2 )4 has minimum weight 2, and hence
minimum distance 2.
Suppose a linear code is defined by a check matrix H. How can we calculate the minimum weight
without having to list all the words in the code?
1. For a binary linear code defined by a check matrix H, the minimum weight is the smallest
number r > 0 such that r columns of H sum to zero.
2. More generally, let H ∈ Mm,n (Fp ) be the check matrix for a linear code C. Then the mini-
mum weight of C is equal to the size of the smallest linearly dependent set of columns of
H.
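As an illustration (Python, not part of the notes), the first criterion can be applied mechanically by searching over sets of columns; the search is exponential in the worst case, which is fine for small examples:

```python
from itertools import combinations

def min_weight(H):
    """Smallest r > 0 such that some r columns of H sum to zero over F_2."""
    n = len(H[0])
    cols = [tuple(row[j] for row in H) for j in range(n)]
    for r in range(1, n + 1):
        for combo in combinations(range(n), r):
            if all(sum(cols[j][i] for j in combo) % 2 == 0 for i in range(len(H))):
                return r
    return None  # only possible for the zero matrix with no columns

H = [[0, 0, 0, 1, 1, 1, 1],   # the check matrix from the worked example above
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]
print(min_weight(H))  # 3: the first three columns sum to zero
```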
18.4 Exercises
158. Determine the minimum distance dmin between codewords for each of the following binary
codes:
159. What is the smallest minimum distance that a code must have in order to correct two errors?
How many errors will it detect?
160. Determine whether each of the following sets of codewords forms a binary linear code.
161. Verify that each of the following sets gives a binary linear code, and find the minimum distance.
164. Let

H = [ 0 0 0 0 0 0 0 1 1 1 ; 0 0 0 1 1 1 1 0 0 0 ; 0 1 1 0 0 1 1 0 0 1 ; 1 0 1 0 1 0 1 0 1 0 ] ∈ M4,10 (F2 ).
This check matrix defines a linear code whose information bits (i.e., parameters) are the 3rd,
5th, 6th, 7th, 9th, and 10th.
165. Let

H = [ 0 0 0 0 1 1 1 1 1 1 1 1 1 ; 0 1 1 1 0 0 0 1 1 1 2 2 2 ; 1 0 1 2 0 1 2 0 1 2 0 1 2 ] ∈ M3,13 (F3 )
Each codeword has 10 information bits and 3 check bits (1st, 2nd and 5th)
Linear transformations
We next consider functions between vector spaces that preserve the vector space structure (addition
and scalar multiplication). Such linear transformations arise in many applications and are a central
tool in the study of vector spaces.
Let V and W be vector spaces over the same field of scalars F. A linear transformation from V
to W is a function T : V → W that satisfies the following properties:
Remark.
2. The condition T (αu) = αT (u) implies that ‘lines are mapped to lines (or points)’
Examples 19.2.
and
T (α(x, y, z)) = T (αx, αy, αz) = (0, 2αx + αz, −αy) = α(0, 2x + z, −y) = αT (x, y, z)
2. The map D : P3 (C) → P2 (C), D(p(x)) = dp/dx, is a linear transformation
3. Many familiar geometric transformations of the plane R2 are linear, such as: reflection across a
line through the origin, rotation about the origin, projection onto a line through the origin.
There is a natural way to use matrices to define linear transformations. In fact, as we will see later, all
linear transformations (between finite-dimensional vector spaces) can be represented by matrices.
Let A ∈ Mm,n (F). The function T : Mn,1 (F) → Mm,1 (F) given by
T ( (x1 , . . . , xn )^T ) = A (x1 , . . . , xn )^T
is a linear transformation.
Proof. We need to show that given any u, v ∈ Mn,1 (F) and α ∈ F we have T (u + v) = T (u) + T (v)
and T (αu) = αT (u).
Proposition 19.6
Proof. We need to show that ∀v ∈ V , T1 (v) = T2 (v). Let v ∈ V . Then, since S is a spanning set for V ,
there exist αi ∈ F and ui ∈ S such that v = Σ_{i=1}^{k} αi ui . Then we have
T1 (v) = T1 ( Σ_{i=1}^{k} αi ui )
       = Σ_{i=1}^{k} αi T1 (ui )    (T1 is a linear transformation)
       = Σ_{i=1}^{k} αi T2 (ui )    (T1 (ui ) = T2 (ui ))
       = T2 ( Σ_{i=1}^{k} αi ui )   (T2 is a linear transformation)
       = T2 (v)
The next result says that a linear transformation can be defined by choosing images for the elements
of a basis.
Theorem 19.7
Let V and W be vector spaces over the same field F. Let B ⊆ V be a basis for V .
Given any function f : B → W , there exists a unique linear transformation T : V → W having
the property that T (b) = f (b) for all b ∈ B.
Proof. We need to show that there exists a linear transformation with the property that T (bi ) = f (bi )
for all i, and that if two linear transformations each have this property, then they are equal.
To establish the existence part of the statement we define a function T : V → W as follows. Given
u ∈ V , we have u = α1 b1 + · · · + αn bn for uniquely determined αi ∈ F and bi ∈ B (Lemma 14.3).
We define T (u) = α1 f (b1 ) + · · · + αn f (bn ). To see that this gives a linear transformation, let u, v ∈ V
and α ∈ F . Then there are b1 , . . . , bn ∈ B and αi , βi ∈ F such that u = α1 b1 + · · · + αn bn and
v = β1 b1 + · · · + βn bn . Then
T (u + v) = T (α1 b1 + · · · + αn bn + β1 b1 + · · · + βn bn )
= T ((α1 + β1 )b1 + · · · + (αn + βn )bn )
= (α1 + β1 )f (b1 ) + · · · + (αn + βn )f (bn ) (definition of T )
= α1 f (b1 ) + · · · + αn f (bn ) + β1 f (b1 ) + · · · + βn f (bn )
= T (α1 b1 + · · · + αn bn ) + T (β1 b1 + · · · + βn bn ) (definition of T )
= T (u) + T (v)
and
T (αu) = T (α(α1 b1 + · · · + αn bn ))
= T ((αα1 )b1 + · · · + (ααn )bn )
= (αα1 )f (b1 ) + · · · + (ααn )f (bn ) (definition of T )
= α(α1 f (b1 ) + · · · + αn f (bn ))
= αT (α1 b1 + · · · + αn bn ) (definition of T )
= αT (u)
19.4 Exercises
169. Determine whether or not the given map is a linear transformation, and justify your answer.
170. Let v1 , v2 , and v3 be vectors in a vector space V and T : V → R3 a linear transformation for
which T (v1 ) = (1, −1, 2), T (v2 ) = (0, 3, 2), and T (v3 ) = (−3, 1, 2). Find T (2v1 − 3v2 + 4v3 ).
171. For the linear transformations of R2 into R2 given by the following matrices:
(i) Sketch the image of the rectangle with vertices (0, 0), (2, 0), (0, 1), (2, 1).
(ii) Describe the geometric effect of the linear transformation.
(a) [ 0 1 ; 1 0 ]    (c) [ 1 1 ; 0 0 ]    (e) [ b 0 ; 0 c ]
(b) [ 0 0 ; 1 0 ]    (d) [ 1 0 ; a 1 ]    (f) (1/5) [ 3 −4 ; 4 3 ]
172. Show that there is no line through the origin in R2 that is invariant under the transformation
determined by the matrix
A(θ) = [ cos θ  −sin θ ; sin θ  cos θ ]
when θ is not an integral multiple of π. Give a geometric interpretation of this observation
commenting on the case when θ = kπ for some k ∈ Z.
173. Let V and W be two vector spaces over a field F. Let S ⊆ V be non-empty and linearly in-
dependent. Use Theorem 19.7 to show that for all functions f : S → W , there exists a linear
transformation T : V → W with the property that T (v) = f (v) for all v ∈ S.
Hom(V, W )
Let V and W be vector spaces over the same field F. The set of all linear transformations from
V to W is itself a vector space over F when given the usual ‘pointwise’ operations.
We saw in Lemma 19.4 that matrices can be used to define linear transformations. In fact, any linear
transformation can be represented by a matrix. Just as the coordinate matrix of a vector depends on
a choice of basis, the matrix of a linear transformation depends on a choice of basis for each of the
domain and codomain.
Let V, W be finite dimensional vector spaces with the same scalars F and let T : V → W be a linear
transformation. Let B = {b1 , b2 , . . . , bn } be an ordered basis for V and C = {c1 , c2 , . . . , cm } be an
ordered basis for W . Then T (bi ) ∈ W for each i = 1, . . . , n and we can therefore write T (bi ) uniquely
as a linear combination of the basis vectors in C.
We form a matrix [T ]C,B ∈ Mm,n (F) by defining [T ]C,B = (αij ). This matrix is called the matrix
of T with respect to B and C.
Note. The i-th column of [T ]C,B is given by [T (bi )]C . That is,
The way in which this matrix represents the linear transformation is given by the following.
Lemma 20.2
Let V and W be finite dimensional vector spaces with bases B and C respectively. Let T : V → W
be a linear transformation. Then, for all u ∈ V ,

[T ]C,B [u]B = [T (u)]C

Proof. Write u = α1 b1 + · · · + αn bn . Then

T (u) = α1 T (b1 ) + · · · + αn T (bn )    (T is linear)

[T (u)]C = α1 [T (b1 )]C + · · · + αn [T (bn )]C    (Lemma 15.4)    (∗)
and

[T ]C,B [u]B = [ [T (b1 )]C · · · [T (bn )]C ] (α1 , . . . , αn )^T
             = α1 [T (b1 )]C + · · · + αn [T (bn )]C    (see the remark on page 16-1)
             = [T (u)]C    (from (∗) above)
In summary, we have

  u ∈ V  ---- apply T --->  T (u) ∈ W
    |                           |
(take coords)              (take coords)
    ↓                           ↓
  [u]B  -- multiply by [T ]C,B -->  [T (u)]C
Exercise 174. Suppose that A ∈ Mm,n (F) is such that A[u]B = [T (u)]C for all u ∈ V . Show that
A = [T ]C,B . (Hint: Show that the i-th column of A is equal to [T (bi )]C .)
Example 20.3. Consider the linear transformation T : M2,2 (R) → M2,2 (R) where T is defined by
T (A) = A^T

Exercise 175. With T as above and C = { [ 1 0 ; 0 0 ] , [ 0 1 ; 1 0 ] , [ 0 1 ; −1 0 ] , [ 0 0 ; 0 1 ] }, calculate [T ]C,C .
Example 20.4. Let T : R2 → R2 be the linear transformation given by T (x, y) = (x + 4y, x + y). Let
S = {(1, 0), (0, 1)} and B = {(2, −1), (2, 1)}. Then
[T ]S,S = [ 1 4 ; 1 1 ] ,   [T ]B,B = [ −1 0 ; 0 3 ]   and   [T ]S,B = [ −2 6 ; 1 3 ]
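As a computational sketch of this example (sympy assumed; an illustration, not part of the notes), the columns of [T ]S,B are the images of the basis vectors of B written in standard coordinates:

```python
from sympy import Matrix

T = Matrix([[1, 4], [1, 1]])           # [T]_{S,S}, with S the standard basis
B = [Matrix([2, -1]), Matrix([2, 1])]  # the basis B
print(Matrix.hstack(*(T * b for b in B)))  # [[-2, 6], [1, 3]] = [T]_{S,B}
```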
Lemma 20.5

Let S : U → V and T : V → W be linear transformations between finite dimensional vector
spaces, and let A, B, C be ordered bases for U , V , W respectively. Then [T ◦ S]C,A = [T ]C,B [S]B,A .

Proof. Let u ∈ U .

Since [T ◦ S]C,A [u]A = ([T ]C,B [S]B,A )[u]A for all u ∈ U , we must have that [T ◦ S]C,A = [T ]C,B [S]B,A .
Example 20.6. With the linear transformation T : R2 → R2 and bases S and B of Example 20.4, we
have
[T 2 ]S,S = [T ]S,S [T ]S,S = [ 1 4 ; 1 1 ] [ 1 4 ; 1 1 ] = [ 5 8 ; 2 5 ]

[T 2 ]B,B = [T ]B,B [T ]B,B = [ −1 0 ; 0 3 ] [ −1 0 ; 0 3 ] = [ 1 0 ; 0 9 ]

[T 2 ]S,B = [T ]S,S [T ]S,B = [ 1 4 ; 1 1 ] [ −2 6 ; 1 3 ] = [ 2 18 ; −1 9 ]
The dimension of ker(T ) is called the nullity of T and is denoted nullity(T ). The dimension of
im(T ) is called the rank of T and is denoted rank(T ).
Example 20.9. Consider the linear transformation T : P3 (R) → P2 (R) given by differentiation. Then
ker(T ) = span{1} and im(T ) = P2 (R). Hence nullity(T ) = 1 and rank(T ) = 3.
Lemma 20.10
Let T : V → W be a linear transformation. Then T is injective if and only if ker(T ) = {0⃗}.

Now, conversely, suppose that ker(T ) = {0⃗}. For u, v ∈ V we have

T (u) = T (v) =⇒ T (u) − T (v) = 0⃗ =⇒ T (u − v) = 0⃗ =⇒ u − v ∈ ker(T ) =⇒ u − v = 0⃗ =⇒ u = v
Therefore T is injective.
(a) Let X ⊆ V be a linearly independent subset of the domain. Show that if T is injective, then
T (X) is linearly independent.
(b) Let Y ⊆ V be a spanning set for the domain. Show that T (Y ) is a spanning set for the image
im(T ).
(c) Use parts (a) and (b) to show that if B is a basis for V and T is injective, then T (B) is a basis for
im(T ).
If both the domain and codomain are finite dimensional, the kernel and image of a linear transfor-
mation T can be calculated from a matrix representation of T :
Therefore,
The following is essentially the observation that each column of a matrix is either a pivot column or
a non-pivot column.

rank(T ) + nullity(T ) = dim(V )

Proof. We first prove the result under the assumption that W is also finite dimensional. Let n = dim(V )
and m = dim(W ) and let B be a basis for V and C a basis for W . Let A = [T ]C,B ∈ Mm,n (F).
Example 20.12. For the linear transformation T : P3 (R) → P2 (R) of Example 20.9 we have rank(T ) +
nullity(T ) = 3 + 1 = 4 = dim(P3 (R)).
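As an illustration (sympy assumed; the matrix is the one from question 187 below, not a prescribed method of the notes), the kernel and image can be read off from a matrix representation with standard library routines:

```python
from sympy import Matrix

A = Matrix([[1, 2, -1, 1], [1, 0, 1, 1], [2, -4, 6, 2]])
kernel = A.nullspace()    # basis for ker(T); nullity(T) = len(kernel)
image = A.columnspace()   # basis for im(T); rank(T) = len(image)
print(len(kernel), len(image))   # 2 2
print(len(kernel) + len(image))  # 4 = dim of the domain, as rank-nullity predicts
```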
20.4 Exercises
179. In each part, find a single matrix that performs the indicated succession of operations.
(a) Compresses by a factor of 1/2 in the x-direction, then expands by a factor of 5 in the y-
direction.
(b) Reflects about y = x, then rotates about the origin through an angle of π.
(c) Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects
about y = x.
180. Find the standard matrix (i.e., with respect to the standard basis of R2 ) of the rotation about the
origin through
(a) π/4 anticlockwise        (b) π
182. Find the indicated linear transformation if it is defined. If it is not defined, explain why not.
183. Find the matrix which represents (with respect to the standard bases) those linear transforma-
tions in question 182 which exist.
184. Let T : P2 (R) → P3 (R) be the function defined by multiplication by x. That is, T (a+bx+cx2 ) =
ax + bx2 + cx3 .
185. Let T : P2 (R) → P2 (R) be the linear transformation defined by T (p(x)) = p(2x + 1), that is,
186. Find the matrix that represents the linear transformation T with respect to the bases B and B′,
where B = B′ = {1, x, x2 , x3 }.
187. Consider the linear transformation T : R4 → R3 given by the matrix (wrt the standard bases):
1 2 −1 1
[T ] = 1 0 1 1
2 −4 6 2
(a) Determine whether or not v1 = (−2, 0, 0, 2) and v2 = (−2, 2, 2, 0) are in the kernel of T .
(b) Determine whether or not w1 = (1, 3, 1) and w2 = (−1, −1, −2) are in the image of T .
(c) Find the nullity of T and give a basis for the kernel of T . Is the transformation injective?
(d) Find the rank of T and give a basis for the image of T . Is the transformation surjective?
For each of the following linear transformations, find
(i) its standard matrix (i.e., with respect to the standard bases),
(ii) a basis for the kernel,
(iii) a basis for the image.

(a) T (x, y) = (x + y, 3y)
(b) T (x1 , x2 , x3 ) = (x1 + x2 − x3 , 2x1 + x2 )
(c) T (x, y) = (x + 2y, −y, x − y)
(d) T (x1 , x2 , x3 ) = (3x1 − x2 − 6x3 , −2x1 + x2 + 5x3 , 3x1 + 3x2 + 6x3 )
190. Let S : P2 (R) → P3 (R) be defined as follows. For each p(x) = a0 + a1 x + a2 x2 , define S(p) =
a0 x + (1/2) a1 x2 + (1/3) a2 x3 . The linear transformation S gives the antiderivative of p(x), with the
constant term equal to zero.
(a) Find the matrix A that represents S with respect to the bases B = {1, x, x2 } and B 0 =
{1, x, x2 , x3 }
(b) Use A to find the antiderivative of p(x) = 1 − x + 2x2 .
191. Let U, V, W be vector spaces over the same field F with V being finite dimensional. Consider
linear transformations S : U → V and T : V → W that satisfy T ◦ S = 0. Show that rank(S) +
rank(T ) ≤ dim(V ).
Let V and W be finite dimensional vector spaces over the same field F. Let n = dim(V ) and
m = dim(W ). Choose bases B for V and C for W and define f : Hom(V, W ) → Mm,n (F) by
f (T ) = [T ]C,B . Then f is a bijective linear transformation.
U --ϕ--> V --ψ--> W
Commutative diagrams
A diagram of four linear transformations of the form
       ϕ
  U -------> V
  |          |
  f          g
  ↓    ϕ′    ↓
  U′ ------> V′

is said to commute if g ◦ ϕ = ϕ′ ◦ f .
Suppose that in the following diagram of linear transformations the rows are exact and the left
square commutes.
          ϕ        ψ
  0 --> U --> V --> W --> 0
        |     |     |
        f     g     h
        ↓ ϕ′  ↓ ψ′  ↓
  0 --> U′ -> V′ -> W′ --> 0
Show that:
(d) There exists a unique linear transformation h : W → W′ that makes the right square
commute;
(e) If g is surjective, then so too is h;
(f) If f is surjective and g is injective, then h is injective.
Change of basis
The matrix of a linear transformation depends on the bases used for both domain and codomain. We
want to develop a convenient way of relating the different matrices that represent a given linear
transformation. In other words, if we change the bases used, how does the matrix representation
change?
Exercise 192. Show that a linear transformation T : V → W is invertible if and only if T is a bijection.
Note that in the above definition and exercise we are not assuming that either V or W is finite di-
mensional. In the case in which they are both finite dimensional we have the following.
Let V and W be finite dimensional vector spaces and let B and C be bases for V and W respec-
tively. Let T : V → W be a linear transformation.
Proof. Suppose first that T is invertible. Then T is a bijection and therefore ker(T ) = {0⃗}. Therefore,
by the rank-nullity theorem, dim(im(T )) = dim(V ). But since T is a bijection, im(T ) = W . Therefore
dim(W ) = dim(V ).
Similarly, [S ◦ T ]B,B = [S]B,C [T ]C,B = A−1 A = In , and therefore S ◦ T = IdV . Therefore T is invertible,
and T −1 = S.
Exercise 193. Let V be an n-dimensional F-vector space and B a basis for V . Show that the map
ϕ : V → Mn,1 (F), ϕ(u) = [u]B is an isomorphism. Conclude that V ≅ Fn .
How can we convert coordinates with respect to one basis to coordinates with respect to another?
Definition 21.3
Let V be a finite dimensional vector space and let B and C be two bases for V . The transition
matrix from B to C, denoted PC,B , is defined to be
Exercise 194. Let V be a finite dimensional vector space and let A, B and C be bases for V . Show that
Proposition 21.4
Let V be a finite dimensional vector space and let B and C be two bases for V . Then
Example 21.5. Consider the following bases for R2 : B = {(1, 1), (1, −1)} and S = {(1, 0), (0, 1)}.
Then we have PS,B = [ 1 1 ; 1 −1 ] and PB,S = (PS,B )−1 = (1/2) [ 1 1 ; 1 −1 ].

If u = (2, −4) ∈ R2 , then [u]B = PB,S [u]S = (1/2) [ 1 1 ; 1 −1 ] (2, −4)^T = (−1, 3)^T .
Example 21.6. Consider the following bases for C2 : B = {(1, 1), (1, −1)} and C = {(1+i, 1), (1, 1−i)}.
To find PC,B , we could calculate [(1, 1)]C and [(1, −1)]C . Alternatively, we can proceed as follows:
PC,B = PC,S PS,B = (PS,C )−1 PS,B = [ 1+i 1 ; 1 1−i ]−1 [ 1 1 ; 1 −1 ] = [ −i 2−i ; i −2−i ]
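As a computational check of this example (sympy assumed; an illustration, not part of the notes):

```python
from sympy import Matrix, I

P_SB = Matrix([[1, 1], [1, -1]])         # columns: the basis B in standard coordinates
P_SC = Matrix([[1 + I, 1], [1, 1 - I]])  # columns: the basis C in standard coordinates
print(P_SC.inv() * P_SB)                 # P_{C,B} = [[-I, 2 - I], [I, -2 - I]]
```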
We can also use transition matrices to relate two different matrix representations of the same linear
transformation.
Proposition 21.7
Let V and W be finite dimensional vector spaces. Let B1 and B2 be two bases for V and let C1
and C2 be two bases for W . Let T : V → W be a linear transformation. Then
(PC2 ,C1 [T ]C1 ,B1 PB1 ,B2 ) [u]B2 = PC2 ,C1 [T ]C1 ,B1 (PB1 ,B2 [u]B2 )
= PC2 ,C1 [T ]C1 ,B1 [u]B1 (Proposition 21.4)
= PC2 ,C1 [T (u)]C1 (Lemma 20.2)
= [T (u)]C2 (Proposition 21.4)
It follows that PC2 ,C1 [T ]C1 ,B1 PB1 ,B2 = [T ]C2 ,B2 (see Exercise 174).
Corollary 21.8
Let V be a finite dimensional vector space and let B and C be two bases for V . Denote P = PC,B .
Let T : V → V be a linear transformation. Then
[T ]C = PC,B [T ]B PB,C = P [T ]B P −1
Proof. Apply the Proposition with B2 = C2 = C and B1 = C1 = B. Note that PB,C = (PC,B )−1 .
Now let’s calculate the matrix with respect to the basis C = {(1, 1), (−1, 1)}.
[T ]C = PC,B [T ]B PB,C = [ 1 −1 ; 1 1 ]−1 [ 3 −1 ; −1 3 ] [ 1 −1 ; 1 1 ] = [ 2 0 ; 0 4 ]
Definition 21.10
Let A, B ∈ Mn,n (F). We say that A and B are similar if there exists an invertible matrix P ∈
Mn,n (F) such that A = P BP −1 . It is denoted A ∼ B. (Beware! This is not the same as saying
that A and B are row-equivalent.)
Exercise 195. Let A, B ∈ Mn,n (F) and suppose that A and B are similar. Show that there exists a
linear transformation T : Fn → Fn and a basis B of Fn such that A = [T ]S and B = [T ]B (where S is
the standard basis for Fn ).
21.4 Exercises
196. Determine whether or not the given linear transformation is invertible. If it is invertible, com-
pute its inverse.
198. (a) Find the transition matrix P from B to C, where B, C are the following bases of R3
B = {(1, −2, 1), (0, 3, 2), (1, 0, −1)} and C = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
199. Verify that the given set B is a basis for Rn . Compute the change of basis matrix for each of the
bases, and use it to find the coordinate matrix of v with respect to B.
201. Let T : R3 → R3 be given by T (x, y, z) = (4x + y − 4z, −3x − y + 5z, x). Find the matrix [T ]B
that represents T with respect to the basis B of question 199 (b).
202. Let T : R2 → R2 be defined by T (x1 , x2 ) = (x1 − 2x2 , −x2 ), and let B = {u1 , u2 } and B′ = {v1 , v2 },
where u1 = (1, 0), u2 = (0, 1), v1 = (2, 1) and v2 = (−3, 4).
with respect to the standard basis for R3 . Find the matrix [T ]B of T with respect to the basis
B = {(1, 2, 1), (0, 1, −1), (2, 3, 2)}.
204. Let B = {b1 , b2 , b3 } be a basis for C3 . Calculate the nullity and rank of the linear transformation
T : C3 → C3 determined by
T (b1 ) = b1 − b2
T (b2 ) = b2 − b3
T (b3 ) = b1 − b3
205. Calculate the nullity and rank of the linear transformation T : (F7 )3 → (F7 )3 determined by
T (1, 0, 0) = (1, 2, 3)
T (0, 1, 0) = (3, 4, 5)
T (0, 0, 1) = (5, 1, 4)
Show that the relation of similarity is an equivalence relation on Mn,n (F). That is, show that
the relation is reflexive, symmetric and transitive.
Let V be an n-dimensional F-vector space. Show that every invertible n × n matrix is a change
of basis matrix for V . That is, show that for all invertible P ∈ Mn,n (F) there exist bases B and
B′ for V such that P = [IdV ]B′ ,B .
Given a linear transformation T : V → V , show that there exist bases B and B′ for V such that
[T ]B′ ,B is diagonal and all entries are either 0 or 1. (Hint: start with a basis for the kernel of T .)
It is important here that B and B′ do not have to be the same. We will be investigating later the
special case in which there exists a basis B such that [T ]B,B is diagonal.
Dual space
Let F be a field and V a vector space over F. We consider linear transformations from V to F;
such a linear transformation ϕ : V → F is called a linear functional on V .
Let V be a vector space over F. The dual space, V ∗ , is the vector space made up of all linear
functionals on V , with operations given by, for ϕ, ψ ∈ V ∗ and k ∈ F,
(ϕ + ψ)(u) = ϕ(u) + ψ(u) and (kϕ)(u) = k ϕ(u) for all u ∈ V
Definition 22.4
Let V be a finite dimensional vector space and let B = {b1 , . . . , bn } be a basis for V . For each i,
let b∗i ∈ V ∗ be the linear functional determined by b∗i (bj ) = 1 if i = j and b∗i (bj ) = 0 if i ≠ j.
(That this uniquely determines the b∗i follows from Theorem 19.7.) The set
B ∗ = {b∗1 , . . . , b∗n } ⊂ V ∗ is called the dual basis (or the basis dual to B).
Theorem 22.5
Let {b1 , . . . , bn } be a basis for V . Then {b∗1 , . . . , b∗n } is a basis for V ∗ , and
∀u ∈ V, u = b∗1 (u)b1 + · · · + b∗n (u)bn    and    ∀ϕ ∈ V ∗ , ϕ = ϕ(b1 )b∗1 + · · · + ϕ(bn )b∗n
Proof. For all u ∈ V we have u = \sum_{j=1}^{n} βj bj for some βj ∈ F. Then
b∗i (u) = b∗i ( \sum_{j=1}^{n} βj bj ) = \sum_{j=1}^{n} βj b∗i (bj ) = βi
Having shown that βi = b∗i (u), we have that u = \sum_{i=1}^{n} b∗i (u) bi .
For the second statement, let ϕ ∈ V ∗ . For all u ∈ V we have u = \sum_{j=1}^{n} βj bj for some βj ∈ F. Then
( \sum_{i=1}^{n} ϕ(bi ) b∗i )(u) = \sum_{i=1}^{n} ϕ(bi ) b∗i ( \sum_{j=1}^{n} βj bj ) = \sum_{i=1}^{n} ϕ(bi ) \sum_{j=1}^{n} βj b∗i (bj ) = \sum_{i=1}^{n} ϕ(bi )βi = ϕ( \sum_{i=1}^{n} βi bi ) = ϕ(u)
Therefore ϕ = \sum_{i=1}^{n} ϕ(bi ) b∗i . To see that {b∗1 , . . . , b∗n } is linearly independent we have
\sum_{i=1}^{n} βi b∗i = 0 =⇒ ∀j, ( \sum_{i=1}^{n} βi b∗i )(bj ) = 0 =⇒ ∀j, βj = 0
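As a concrete illustration (a routine check, not taken from the printed notes): let V = R2 and B = {b1 = (1, 1), b2 = (1, −1)}. The dual basis is given by b∗1 (x, y) = \frac{1}{2}(x + y) and b∗2 (x, y) = \frac{1}{2}(x − y); indeed b∗1 (b1 ) = 1, b∗1 (b2 ) = 0, b∗2 (b1 ) = 0 and b∗2 (b2 ) = 1, and every ϕ ∈ (R2 )∗ satisfies ϕ = ϕ(b1 )b∗1 + ϕ(b2 )b∗2 , as Theorem 22.5 asserts.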
Corollary 22.6
If V is a finite dimensional vector space, then dim(V ∗ ) = dim(V ).
Given u ∈ V we define an element û ∈ V ∗∗ by
û(ϕ) = ϕ(u)
Exercise 208. Show that û is a linear transformation V ∗ → F.
Theorem 22.7
Let V be a finite dimensional vector space. The map V → V ∗∗ , u ↦ û, is an isomorphism.
Proof. To see that the map is a linear transformation, for u, v ∈ V and ϕ ∈ V ∗ we have:
\widehat{u + v}(ϕ) = ϕ(u + v) = ϕ(u) + ϕ(v) = û(ϕ) + v̂(ϕ)
\widehat{ku}(ϕ) = ϕ(ku) = kϕ(u) = k û(ϕ)
For injectivity, note that if u ≠ 0 there is a functional ϕ ∈ V ∗ with ϕ(u) ≠ 0, so û ≠ 0. Since
dim(V ∗∗ ) = dim(V ) by Corollary 22.6, the map is an isomorphism.
Remark. The above isomorphism does not require any choice of basis.
Definition 22.8
Let V and W be vector spaces and let T : V → W be a linear transformation. The transpose of
T is the linear transformation T ∗ : W ∗ → V ∗ defined by T ∗ (ϕ) = ϕ ◦ T .
The matrix of the transpose linear transformation T ∗ is the transpose of the matrix of T :
Theorem 22.9
Let V and W be finite dimensional vector spaces. Let B be a basis for V and B ∗ the dual basis
for V ∗ . Let C be a basis for W and C ∗ the dual basis for W ∗ . Let T : V → W be a linear
transformation. Then
[T ∗ ]B∗ ,C ∗ = ([T ]C,B )T
Proof. Let B = {b1 , . . . , bn }, B ∗ = {b∗1 , . . . , b∗n }, C = {c1 , . . . , cn }, and C ∗ = {c∗1 , . . . , c∗n }. Recall that
b∗i (bj ) = c∗i (cj ) = 1 if i = j, and 0 if i ≠ j
Let aij ∈ F be such that T (bj ) = \sum_{i=1}^{n} aij ci . That is, aij is the (i, j)-th entry of [T ]C,B . The i-th row of
[T ]C,B is [ai1 · · · ain ].
We will show that the i-th column of [T ∗ ]B∗ ,C ∗ is the transpose of the i-th row of [T ]C,B .
The i-th column of [T ∗ ]B∗ ,C ∗ is given by [T ∗ (c∗i )]B∗ . We have
T ∗ (c∗i ) = c∗i ◦ T (definition of transpose linear transformation)
= \sum_{j=1}^{n} (c∗i ◦ T (bj )) b∗j (Theorem 22.5)
The i-th column of [T ∗ ]B∗ ,C ∗ is therefore [c∗i ◦ T (b1 ) · · · c∗i ◦ T (bn )]T . We will be done if we can show
that c∗i ◦ T (bj ) = aij . We have
c∗i ◦ T (bj ) = c∗i ( \sum_{k=1}^{n} akj ck ) = \sum_{k=1}^{n} akj c∗i (ck ) = aij
22.5 Exercises
209. Let ϕ ∈ (R2 )∗ satisfy ϕ(1, −1) = 2 and ϕ(−4, 5) = −3. Find ϕ(x, y).
210. For each of the following bases of R3 , find the dual basis B ∗ = {u∗ , v ∗ , w∗ } of (R3 )∗ .
211. Let V be a finite dimensional vector space with basis B. Show that ∀u ∈ V and ∀ϕ ∈ V ∗ ,
212. For each of the following linear transformations T : R2 → R3 and linear functionals ϕ ∈ (R3 )∗ ,
calculate T ∗ (ϕ)(x, y).
Let V = P2 (R) and, for a ∈ R, let ϕa be given by
ϕa (a0 + a1 x + a2 x2 ) = a0 + a1 a + a2 a2
Show that
(a) ϕa ∈ V ∗
(b) If a ≠ b, then ϕa ≠ ϕb
The following connects transition matrices for V and transition matrices for V ∗ .
Theorem. Let V be a (finite dimensional) vector space. Let B and C be two bases for V and let B ∗ and
C ∗ be the (respective) dual bases for V ∗ . Then
PC ∗ ,B ∗ = (PB,C )T
Proof. We will show that (PC ∗ ,B ∗ )T PC,B = I. The i-th row of (PC ∗ ,B ∗ )T is [b∗i ]TC ∗ = [α1 · · · αn ]
where b∗i = α1 c∗1 + · · · + αn c∗n . The j-th column of PC,B is [bj ]C = [β1 · · · βn ]T where bj =
β1 c1 + · · · + βn cn . Therefore the (i, j)-th entry of (PC ∗ ,B ∗ )T PC,B is
α1 β1 + · · · + αn βn = b∗i (β1 c1 + · · · + βn cn ) = b∗i (bj )
which is 1 if i = j and 0 otherwise. Hence (PC ∗ ,B ∗ )T PC,B = I, so PC ∗ ,B ∗ = ((PC,B )−1 )T = (PB,C )T .
Definition. Let S ⊆ V . The annihilator of S is
S 0 = {ϕ ∈ V ∗ | ∀u ∈ S, ϕ(u) = 0}
Theorem. Let V be a finite dimensional vector space and let W ⩽ V be a subspace. Then
dim(W ) + dim(W 0 ) = dim(V ).
Proof. Exercise! Hint for the first part: Let {w1 , . . . , wk } be a basis for W . Extend to a basis for
V , {w1 , . . . , wk , v1 , . . . , vm }. Show that {v1∗ , . . . , vm∗ } is a basis for W 0 .
To help with understanding and analysing linear transformations, it is useful to identify subspaces
that are mapped to themselves in the following sense.
Definition 23.1
Let T : V → V be a linear transformation. A subspace W ⩽ V is called invariant (under T ) if
T (W ) ⊆ W .
Example 23.2. Let T : R4 → R4 be the linear transformation having standard matrix representation
[T ]S = \begin{pmatrix} 2 & 0 & 1 & 2 \\ 1 & 2 & 0 & -1 \\ -2 & 1 & -1 & -1 \\ 1 & -2 & 2 & 1 \end{pmatrix}
Let u1 = (−1, 1, 1, 0) and u2 = (0, 0, 1, −1). The subspace U = span{u1 , u2 } is invariant. To see this it
is enough to note that
[T (u1 )]S = [T ]S [u1 ]S = \begin{pmatrix} -1 \\ 1 \\ 2 \\ -1 \end{pmatrix} = [u1 ]S + [u2 ]S , therefore T (u1 ) = u1 + u2 ∈ U
[T (u2 )]S = [T ]S [u2 ]S = \begin{pmatrix} -1 \\ 1 \\ 0 \\ 1 \end{pmatrix} = [u1 ]S − [u2 ]S , therefore T (u2 ) = u1 − u2 ∈ U
Exercise 215. Let V be a finite-dimensional F-vector space and let T : V → V be a linear transforma-
tion. Let B = {b1 , . . . , bn } be a basis for V and k ∈ {1, 2, . . . , n − 1}. Define W = span{b1 , . . . , bk } and
suppose that W is invariant (i.e. T (W ) ⊆ W ). Show that
[T ]B = \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} for some A ∈ Mk,k (F), C ∈ Mk,(n−k) (F) and B ∈ M(n−k),(n−k) (F).
Consideration of 1-dimensional invariant subspaces leads to the idea of eigenvectors and eigenval-
ues.
Exercise 216. Let V be a vector space and let T : V → V be a linear transformation. Suppose u ∈ V
and λ ∈ F are such that T (u) = λu. Show that the subspace W = span{u} ⩽ V is invariant.
Let V be a vector space with field of scalars F, and let T : V → V be a linear transformation. A
scalar λ ∈ F is called an eigenvalue of T if there is a non-zero vector v ∈ V \ {0} such that
T (v) = λv
Such a vector v is called an eigenvector of T corresponding to the eigenvalue λ.
The set {v ∈ V | T (v) = λv} is a subspace of V , called the eigenspace of λ.
Exercise 217. Let T : V → V be a linear transformation and suppose that λ is an eigenvalue of T . Let
Wλ = {v ∈ V | T (v) = λv} (i.e., the corresponding eigenspace). Show that
(a) Wλ is a subspace of V . (b) Wλ ≠ {0} (c) T (Wλ ) ⊆ Wλ
So both (2, −1, 0) and (6, −5, 3) are eigenvectors with eigenvalue 2.
As we’ve seen above, the matrix of a linear transformation is, of course, useful for working with
eigenvectors. We adapt the definition of eigenvalue and eigenvector to matrices in a natural way.
Let F be a field, and let A ∈ Mn,n (F). A scalar λ ∈ F is an eigenvalue of A if there is a non-zero
column matrix v ∈ Mn,1 (F) such that
Av = λv
Then v is called an eigenvector of A corresponding to eigenvalue λ.
Lemma 23.6
Let V be a finite-dimensional vector space over a field F and let T : V → V be a linear transfor-
mation. Let B be a basis of V and let λ ∈ F and v ∈ V . Then
T (v) = λv ⇐⇒ [T ]B [v]B = λ[v]B
In particular, λ is an eigenvalue of T if and only if λ is an eigenvalue of the matrix [T ]B .
We can calculate the eigenvalues of a matrix (and hence a linear transformation) using the following.
Proposition 23.7
Let A ∈ Mn,n (F) and λ ∈ F. Then λ is an eigenvalue of A if and only if det(A − λIn ) = 0.
Proof. λ is an eigenvalue of A ⇐⇒ there is a non-zero v ∈ Mn,1 (F) with (A − λIn )v = 0 ⇐⇒
A − λIn is not invertible ⇐⇒ det(A − λIn ) = 0.
Example 23.8. Let's find the eigenvalues of A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix} ∈ M2,2 (R).
det(A − λI2 ) = det( \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix} − \begin{pmatrix} λ & 0 \\ 0 & λ \end{pmatrix} ) = det \begin{pmatrix} 1-λ & 4 \\ 1 & 1-λ \end{pmatrix} = λ2 − 2λ − 3 = (λ − 3)(λ + 1)
The eigenvalues of A are therefore 3 and −1.
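As a quick numerical cross-check of this example (an illustrative Python/NumPy sketch; not part of the printed notes):

import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
# The eigenvalues are the roots of the characteristic polynomial det(A - lambda*I).
print(sorted(np.linalg.eigvals(A)))   # [-1.0, 3.0] (up to floating point error)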
Exercise 218. Show that if A ∈ Mn,n (F) is in upper triangular form, then the eigenvalues of A are
exactly the entries on the diagonal of A.
Let A ∈ Mn,n (F). The determinant det(xIn − A) is a polynomial in x called the characteristic
polynomial of A. We will denote it by cA .
cA (x) = det(xIn − A)
Remark. 1. The characteristic polynomial of A ∈ Mn,n (F) is always monic and of degree exactly
n. That is,
cA (x) = α0 + α1 x + α2 x2 + · · · + αn−1 xn−1 + xn
for some α0 , . . . , αn−1 ∈ F.
2. In the case in which F = C, there is always at least one eigenvalue (by the fundamental theorem
of algebra). Further, the sum of the algebraic multiplicities equals n.
3. From Proposition 23.7 we know that the eigenvalues of A are exactly the roots of the char-
acteristic polynomial. If λ ∈ F is an eigenvalue of A, then (x − λ) divides the characteristic
polynomial of A.
The algebraic multiplicity of an eigenvalue λ is the largest k ∈ N such that (x − λ)k divides the
characteristic polynomial.
Example 23.11. Let A = \begin{pmatrix} 8 & -9 & -9 \\ 9 & -10 & -9 \\ -1 & 2 & 5 \end{pmatrix} ∈ M3,3 (R). The characteristic polynomial of A is given by
cA (x) = det(xI3 − A) = (x + 1)(x − 2)2
The eigenvalues of A are therefore −1 and 2. The eigenvalue −1 has algebraic multiplicity 1 and the
eigenvalue 2 has algebraic multiplicity 2.
Example 23.12. Let A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} ∈ M2,2 (R). The characteristic polynomial is:
cA (x) = det \begin{pmatrix} x & 1 \\ -1 & x \end{pmatrix} = x2 + 1
Since x2 + 1 has no real roots, A has no eigenvalues (over R).
23.4 Exercises
219. Find the characteristic polynomial and the eigenvalues of each of the following matrices.
(a) \begin{pmatrix} 2 & -3 & 6 \\ 0 & 5 & -6 \\ 0 & 1 & 0 \end{pmatrix}
(b) \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -3 \end{pmatrix}
(c) \begin{pmatrix} -5 & -8 & -12 \\ -6 & -10 & -12 \\ 6 & 10 & 13 \end{pmatrix}
(d) \begin{pmatrix} 2 & 2 & 2 \\ -1 & -1 & -2 \\ 1 & 2 & 3 \end{pmatrix}
(e) \begin{pmatrix} 3 & 1 & 1 \\ 2 & 4 & 2 \\ 1 & 1 & 3 \end{pmatrix}
(f) \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
224. Let A ∈ Mn,n (C). Show that det(A) is equal to the product of the eigenvalues (with multiplic-
ity) of A.
Let A ∈ Mn,n (C). The sum of the eigenvalues (with multiplicity) of A is equal to the trace of A.
The product of the eigenvalues (with multiplicity) is equal to the determinant. Neither of these
is obvious!
Eigenspaces
We saw last lecture the definitions of the eigenvalues and eigenspaces for a linear transformation and
for a matrix. If λ ∈ F is an eigenvalue of A ∈ Mn,n (F), the corresponding eigenspace is the
solution space of the matrix A − λIn .
Example 24.1. Let A = \begin{pmatrix} 2 & -3 & 6 \\ 0 & 5 & -6 \\ 0 & 1 & 0 \end{pmatrix} ∈ M3,3 (R).
We find the eigenvalues of A and a basis for each eigenspace.
The characteristic polynomial is given by
det(xI3 − A) = det \begin{pmatrix} x-2 & 3 & -6 \\ 0 & x-5 & 6 \\ 0 & -1 & x \end{pmatrix} = (x − 2) det \begin{pmatrix} x-5 & 6 \\ -1 & x \end{pmatrix} = (x − 2)(x(x − 5) + 6)
= (x − 2)(x2 − 5x + 6) = (x − 2)2 (x − 3)
Therefore the eigenvalues of A are 2 and 3. The eigenvalue 2 has algebraic multiplicity 2 and the
eigenvalue 3 has algebraic multiplicity 1.
To find the eigenspace for eigenvalue 2 we solve for the solution space of A − 2I3 .
A − 2I3 = \begin{pmatrix} 0 & -3 & 6 \\ 0 & 3 & -6 \\ 0 & 1 & -2 \end{pmatrix} \xrightarrow{R1 ↔ R3} \begin{pmatrix} 0 & 1 & -2 \\ 0 & 3 & -6 \\ 0 & -3 & 6 \end{pmatrix} \xrightarrow{R2 − 3R1 , R3 + 3R1} \begin{pmatrix} 0 & 1 & -2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
The set {(1, 0, 0), (0, 2, 1)} is a basis for the eigenspace.
For eigenvalue 3, the eigenspace is the solution space of A − 3I3 .
A − 3I3 = \begin{pmatrix} -1 & -3 & 6 \\ 0 & 2 & -6 \\ 0 & 1 & -3 \end{pmatrix} \xrightarrow{R2 ↔ R3} \begin{pmatrix} -1 & -3 & 6 \\ 0 & 1 & -3 \\ 0 & 2 & -6 \end{pmatrix} \xrightarrow{R3 − 2R2} \begin{pmatrix} -1 & -3 & 6 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{R1 + 3R2} \begin{pmatrix} -1 & 0 & -3 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{pmatrix} \xrightarrow{−1 × R1} \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & -3 \\ 0 & 0 & 0 \end{pmatrix}
The set {(−3, 3, 1)} is a basis for the eigenspace.
(Recall that the geometric multiplicity of an eigenvalue is the dimension of the corresponding
eigenspace.)
Example 24.3. We saw last lecture (Example 23.11) that the matrix A = \begin{pmatrix} 8 & -9 & -9 \\ 9 & -10 & -9 \\ -1 & 2 & 5 \end{pmatrix} ∈ M3,3 (R) has
eigenvalues −1 (with algebraic multiplicity 1) and 2 (with algebraic multiplicity 2). Let's find the
corresponding eigenspaces.
For eigenvalue −1 we have
A − (−1)I3 = \begin{pmatrix} 9 & -9 & -9 \\ 9 & -9 & -9 \\ -1 & 2 & 6 \end{pmatrix} ∼ · · · ∼ \begin{pmatrix} 1 & 0 & 4 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{pmatrix}
The eigenspace has basis {(−4, −5, 1)}, so the eigenvalue −1 has geometric multiplicity 1.
For eigenvalue 2 we have
A − 2I3 = \begin{pmatrix} 6 & -9 & -9 \\ 9 & -12 & -9 \\ -1 & 2 & 3 \end{pmatrix} ∼ · · · ∼ \begin{pmatrix} 1 & 0 & 3 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{pmatrix}
The eigenspace has basis {(−3, −3, 1)}, so the eigenvalue 2 has geometric multiplicity 1. Notice that,
for this matrix, the geometric multiplicity of the eigenvalue 2 is strictly less than its algebraic
multiplicity.
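The geometric multiplicities in this example can be checked numerically, using the fact that the eigenspace of λ is the kernel of A − λI3 (an illustrative Python/NumPy sketch; not part of the printed notes):

import numpy as np

A = np.array([[ 8.0,  -9.0, -9.0],
              [ 9.0, -10.0, -9.0],
              [-1.0,   2.0,  5.0]])
for lam in (-1.0, 2.0):
    # geometric multiplicity = dim ker(A - lam*I) = 3 - rank(A - lam*I)
    print(lam, 3 - np.linalg.matrix_rank(A - lam * np.eye(3)))
# prints -1.0 1 and 2.0 1, matching the computations above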
We saw in Lemma 23.6 that the eigenvalues of a linear transformation T are the same as the eigenval-
ues of any matrix representation [T ]B , and that there is a correspondence between the eigenvectors of
T and the eigenvectors of [T ]B . We now define the characteristic polynomial of T to be the character-
istic polynomial of [T ]B . To see that this is independent of the choice of basis B, we note the following
lemma. Recall that if B and C are two bases, then although [T ]B and [T ]C are not equal, they are similar
in the sense that [T ]B = P [T ]C P −1 for some invertible matrix P (see Definition 21.10).
Lemma 24.4
Let A, B ∈ Mn,n (F). If A and B are similar, then they have the same characteristic polynomial.
Proof. Suppose that A = P BP −1 . Then
cA (x) = det(xIn − P BP −1 ) = det(P (xIn − B)P −1 ) = det(P ) det(xIn − B) det(P −1 ) = cB (x)
Example 24.5. Let T : P2 (F3 ) → P2 (F3 ) be the linear transformation whose matrix with respect to
the basis B = {1, x, x2 } is
[T ]B = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & 0 \end{pmatrix} ∈ M3,3 (F3 )
The characteristic polynomial is
det \begin{pmatrix} x-2 & -1 & -1 \\ 0 & x-1 & -2 \\ 0 & -1 & x \end{pmatrix} = (x − 2) det \begin{pmatrix} x-1 & -2 \\ -1 & x \end{pmatrix} = (x − 2)(x2 − x − 2) = (x − 2)(x − 2)(x + 1)
= (x + 1)3
since, in F3 , we have x − 2 = x + 1. The only eigenvalue is therefore 2 = −1 ∈ F3 , with algebraic
multiplicity 3. To find the eigenspace we solve for the solution space of [T ]B − 2I3 :
[T ]B − 2I3 = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 1 & 1 \end{pmatrix} ∼ \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
We see that the solution space has dimension 2 and has a basis {(1, 0, 0), (0, 2, 1)} ⊆ F33 .
The eigenspace of T is therefore the subspace of P2 (F3 ) that has basis {1, 2x + x2 }.
The eigenvalue 2 ∈ F3 has geometric multiplicity 2 ∈ N.
In the above examples we saw that the geometric multiplicity was always less than or equal to the
algebraic multiplicity. We prove that this is always the case.
Proposition 24.7
Let V be a finite-dimensional vector space, let T : V → V be a linear transformation and let λ be
an eigenvalue of T . Then the geometric multiplicity of λ is less than or equal to the algebraic
multiplicity of λ.
Proof. Let W ⩽ V be the eigenspace corresponding to λ. Let k = dim(W ) be the geometric multiplic-
ity of λ. Let {w1 , . . . , wk } be a basis for W . Extend to a basis for V , B = {w1 , . . . , wk , bk+1 , . . . , bn }. We
have
[T ]B = \begin{pmatrix} λIk & M \\ 0 & N \end{pmatrix} for some M ∈ Mk,(n−k) (F) and N ∈ M(n−k),(n−k) (F)
The characteristic polynomial of T is therefore (x − λ)k cN (x) where cN (x) is the characteristic poly-
nomial of the matrix N . Therefore the algebraic multiplicity is greater than or equal to k.
24.4 Exercises
225. Find bases for the eigenspaces of matrices in Exercise 219. Give the algebraic and geometric
multiplicity of each eigenvalue.
226. Find bases for the eigenspaces of matrices in Exercise 221. Give the algebraic and geometric
multiplicity of each eigenvalue.
227. Find 1-dimensional subspaces of R2 that are invariant under the linear transformations given
by the following matrices:
(a) \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix} (b) \begin{pmatrix} 1 & 2 \\ -3 & -6 \end{pmatrix}
Let A ∈ Mn,n (F). Prove that the characteristic polynomial cA (x) = det(xIn − A) has degree n
and is monic.
Diagonalisation
We have seen in several examples that, given a linear transformation, there is sometimes a basis with
respect to which the matrix of the linear transformation is diagonal. Having such a basis is extremely
useful when working with linear transformations. We will give conditions for the existence of such
a basis and see that it is not always possible.
25.1 Diagonalisability
Definition 25.1
Let V be a finite dimensional vector space. A linear transformation T : V → V is called
diagonalisable if there is a basis B for V such that the matrix [T ]B is diagonal.
Example 25.2. Consider the linear transformation T : R3 → R3 with standard matrix given by
[T ]S = \begin{pmatrix} 2 & -3 & 6 \\ 0 & 5 & -6 \\ 0 & 1 & 0 \end{pmatrix}
In Example 24.1 we saw that this matrix has eigenvalues 2 and 3. Further we calculated a basis
{(1, 0, 0), (0, 2, 1)} for the λ = 2 eigenspace and a basis {(−3, 3, 1)} for the λ = 3 eigenspace. In
particular, letting b1 = (1, 0, 0), b2 = (0, 2, 1) and b3 = (−3, 3, 1), the set B = {b1 , b2 , b3 } is a basis for
R3 with T (b1 ) = 2b1 , T (b2 ) = 2b2 and T (b3 ) = 3b3 , so that
[T ]B = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}
Therefore T is diagonalisable.
Remark. Notice that, in the above example, if we let A = [T ]S , and D = [T ]B , then we have that
A = P DP −1 since
A = [T ]S = PS,B [T ]B PB,S = P DP −1
A matrix A ∈ Mn,n (F) is diagonalisable if there is an invertible matrix P ∈ Mn,n (F) and a
diagonal matrix D ∈ Mn,n (F) such that
A = P DP −1
Theorem 25.4
Let V be a finite dimensional vector space and let T : V → V be a linear transformation. Then
T is diagonalisable if and only if there is a basis for V all of whose elements are eigenvectors
of T .
Proof. Suppose first that B = {b1 , . . . , bn } is a basis for V and that T (bi ) = λi bi . Then
[T ]B = [ [T (b1 )]B · · · [T (bn )]B ] = [ [λ1 b1 ]B · · · [λn bn ]B ] = \begin{pmatrix} λ1 & & 0 \\ & \ddots & \\ 0 & & λn \end{pmatrix}
Conversely, suppose that C = {c1 , . . . , cn } is a basis of V such that [T ]C is diagonal and let the entries
on the diagonal be µ1 , . . . , µn . Then considering the j-th column of [T ]C we have that
[T (cj )]C = (0, . . . , 0, µj , 0, . . . , 0)T = [µj cj ]C
Therefore T (cj ) = µj cj , and so every element of C is an eigenvector of T .
Given a matrix A ∈ Mn,n (F), to find an invertible P ∈ Mn,n (F) and a diagonal D ∈ Mn,n (F)
such that A = P DP −1 (or conclude that no such exist):
1) Find the eigenvalues of A (the roots, in F, of the characteristic polynomial).
2) Find a basis for each eigenspace.
3) If the total number of basis vectors obtained is n, let P be the matrix whose columns are
these basis vectors and let D be the diagonal matrix whose diagonal entries are the corre-
sponding eigenvalues (in the same order). Then A = P DP −1 . If the total number of basis
vectors is less than n, then no such P and D exist.
Example 25.6. Let A = \begin{pmatrix} 1 & 4 \\ 1 & 1 \end{pmatrix} ∈ M2,2 (R). The characteristic polynomial of A is given by
cA (x) = det \begin{pmatrix} 1-x & 4 \\ 1 & 1-x \end{pmatrix} = x2 − 2x − 3 = (x − 3)(x + 1)
so the eigenvalues are 3 and −1. A basis for the λ = 3 eigenspace is {(2, 1)}, and a basis for the
λ = −1 eigenspace is {(−2, 1)}. Taking
P = \begin{pmatrix} 2 & -2 \\ 1 & 1 \end{pmatrix} and D = \begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}
we have A = P DP −1 .
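A numerical check of Example 25.6 (an illustrative Python/NumPy sketch; not part of the printed notes):

import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 1.0]])
P = np.array([[2.0, -2.0],          # columns: eigenvectors for 3 and -1
              [1.0,  1.0]])
D = np.diag([3.0, -1.0])
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True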
25.3 Exercises
229. Decide which of the matrices A in questions 219 above are diagonalisable and, if possible, find
an invertible matrix P and a diagonal matrix D such that P −1 AP = D.
230. Decide which of the matrices A in questions 221 above are diagonalisable and, if possible, find
an invertible matrix P and a diagonal matrix D such that P −1 AP = D.
Let V be a finite dimensional vector space over a field F. Although not all linear transformations
T : V → V are diagonalisable, if the field is C then there does always exist a basis such that
[T ]B is upper triangular. Here’s an outline of a proof.
Diagonalisation II
We look at sufficient conditions for diagonalisability and see why the diagonalisation method given
in the last lecture works.
Lemma 26.1
Let T : V → V be a linear transformation and let v1 , . . . , vk be eigenvectors of T whose corre-
sponding eigenvalues λ1 , . . . , λk are pairwise distinct. Then {v1 , . . . , vk } is linearly independent.
Proof. We argue by induction on k. For k = 1 the result holds, since an eigenvector is by definition
non-zero. Suppose the result holds for k = n, and let v1 , . . . , vn+1 be eigenvectors with pairwise
distinct eigenvalues λ1 , . . . , λn+1 . Suppose that
\sum_{i=1}^{n+1} αi vi = 0  (1)
Applying T gives
T ( \sum_{i=1}^{n+1} αi vi ) = T (0) =⇒ \sum_{i=1}^{n+1} αi T (vi ) = 0 =⇒ \sum_{i=1}^{n+1} αi λi vi = 0  (2)
Multiplying (1) by λn+1 and subtracting (2) gives
λn+1 \sum_{i=1}^{n+1} αi vi − \sum_{i=1}^{n+1} αi λi vi = 0
=⇒ \sum_{i=1}^{n+1} αi (λn+1 − λi )vi = 0 =⇒ \sum_{i=1}^{n} αi (λn+1 − λi )vi = 0
=⇒ ∀i ∈ {1, . . . , n}, αi (λn+1 − λi ) = 0 ({v1 , . . . , vn } is linearly independent)
=⇒ ∀i ∈ {1, . . . , n}, αi = 0 (since λn+1 ≠ λi )
Finally, note that we now have, from (1), that αn+1 vn+1 = 0. Since vn+1 ≠ 0 this implies that αn+1 = 0
also. Therefore the result holds for k = n + 1.
Therefore, by mathematical induction, the result holds for all k ∈ N.
Example 26.2. For the matrix of Example 24.1, (1, 2, 1) is an eigenvector with eigenvalue 2 and
(3, −3, −1) is an eigenvector with eigenvalue 3. The set {(1, 2, 1), (3, −3, −1)} is linearly indepen-
dent.
Exercise 231. Let λ1 and λ2 be distinct eigenvalues of a linear transformation T : V → V , with
eigenspaces W1 and W2 respectively.
(a) Show that W1 ∩ W2 = {0}.
(b) Let B1 be a basis for W1 and let B2 be a basis for W2 . Show that B1 ∪ B2 is linearly independent.
The following result justifies the technique given in the last lecture for diagonalising a matrix.
Theorem 26.3
Let V be an n-dimensional vector space and let T : V → V be a linear transformation. Then T
is diagonalisable if and only if the sum of the geometric multiplicities of the eigenvalues of T
is equal to n.
Proof. Let the eigenvalues of T be λ1 , . . . , λk . Denote by gi and ai the geometric and algebraic mul-
tiplicities of the eigenvalue λi . We know from Proposition 24.7 that gi ⩽ ai . We also know that
a1 + · · · + ak ⩽ n.
Suppose that T is diagonalisable. Then there exists a basis B for V with the property that each
element of B is an eigenvector of T . For i ∈ {1, . . . , k} let Bi = {b ∈ B | b has eigenvalue λi }. Note
that B = B1 ∪ B2 ∪ · · · ∪ Bk and that Bi ∩ Bj = ∅ if i ≠ j. Since each Bi is a linearly independent
subset of the λi eigenspace, |Bi | ⩽ gi . We have
n = |B| = |B1 | + · · · + |Bk | ⩽ g1 + · · · + gk ⩽ a1 + · · · + ak ⩽ n
Therefore g1 + · · · + gk = n.
For the converse, suppose now that g1 +· · ·+gk = n. Let Ci be a basis for the λi eigenspace. Then |Ci | =
gi . From Exercise 231 we know that Ci ∩ Cj = ∅ if i ≠ j. Therefore, if we define C = C1 ∪ C2 ∪ · · · ∪ Ck ,
we have
|C| = |C1 | + |C2 | + · · · + |Ck | = g1 + g2 + · · · + gk = n
All elements of C are eigenvectors. If we show that C is linearly independent we will be done since it
would follow that C is a basis.
Denote the elements of Ci by {ci,1 , . . . , ci,gi }. Then
\sum_{i=1}^{k} \sum_{j=1}^{gi} αi,j ci,j = 0 =⇒ \sum_{i=1}^{k} ui = 0 where we define ui = \sum_{j=1}^{gi} αi,j ci,j
=⇒ ∀i, ui = 0 (by Lemma 26.1, since each non-zero ui would be an eigenvector for λi )
=⇒ ∀i ∀j, αi,j = 0 (since Ci is linearly independent)
Corollary 26.4
Proposition 26.5
Let V be an n-dimensional vector space and let T : V → V be a linear transformation. If T has
n distinct eigenvalues, then T is diagonalisable.
Note. The converse is false! Linear transformations that have repeated eigenvalues can still be diag-
onalisable.
Proof. Suppose that λ1 , . . . , λn ∈ F are eigenvalues of T and that λi ≠ λj when i ≠ j. Let mi be the
geometric multiplicity of the eigenvalue λi . Since mi ⩾ 1 for each i, and m1 + · · · + mn ⩽ n, we must
have that m1 + · · · + mn = n. Diagonalisability then follows from Theorem 26.3.
Example 26.6. 1. \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 2 & 3 & 4 \\ 0 & 0 & 3 & 4 \\ 0 & 0 & 0 & 4 \end{pmatrix} ∈ M4,4 (F5 ) is diagonalisable (and we can see that without the need for
any calculations: it is upper triangular, so its eigenvalues are the four distinct diagonal entries).
2. \begin{pmatrix} 1-3i & 4i \\ -2i & 1+3i \end{pmatrix} ∈ M2,2 (C) has eigenvalues 1 + i, 1 − i and is therefore diagonalisable.
26.3 Exercises
232. Give an example of a linear transformation T : R2 → R2 that is diagonalisable and has only one
eigenvalue.
233. Let E = \begin{pmatrix} 4 & -2 & -1 \\ -2 & 4 & -1 \\ -1 & -1 & 1 \end{pmatrix} ∈ M3,3 (C)
234. (a) Let T : Q2 → Q2 be the linear transformation given by T (x, y) = (x + y, x − y). Find the
eigenvalues of T . Is T diagonalisable? If it is diagonalisable, give a diagonal matrix D
such that [T ]B = D with respect to some basis B. (You are not being asked to find B.)
(b) Let S : R2 → R2 be the linear transformation given by S(x, y) = (x + y, x − y). Find the
eigenvalues of S. Is S diagonalisable? If it is diagonalisable, give a diagonal matrix D such
that [S]B = D with respect to some basis B. (You are not being asked to find B.)
In applications one often comes across the need to apply a transformation many times. If the trans-
formation can be represented by a diagonalisable matrix A, it’s much easier to compute Ak and thus
the effect of the k-th application of the transformation. The first point to appreciate is that computing
powers of a diagonal matrix D is easy.
Example 27.1. Let D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 2 \end{pmatrix}. Then D2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 9 & 0 \\ 0 & 0 & 4 \end{pmatrix}, D3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -27 & 0 \\ 0 & 0 & 8 \end{pmatrix}, Dk = \begin{pmatrix} 1 & 0 & 0 \\ 0 & (-3)^k & 0 \\ 0 & 0 & 2^k \end{pmatrix}
Note that if A = P DP −1 , then Ak = (P DP −1 )k = P Dk P −1 , since all of the interior factors P −1 P
cancel.
Example 27.3. As an application we will investigate a simple model of population movement be-
tween Victoria and Queensland. Assume that 2% of Victorians move to Queensland each year, 1%
of Queenslanders move to Victoria each year and everybody else stays put. This is an example of a
(discrete-time) ’Markov process’. We investigate what happens in the long term under these assump-
tions.
Let vi be the Victorian population (in millions) after i years and qi be the Queensland population (in
millions) after i years.
Then vi+1 = 0.98vi + 0.01qi and qi+1 = 0.02vi + 0.99qi . That is,
\begin{pmatrix} v_{i+1} \\ q_{i+1} \end{pmatrix} = A \begin{pmatrix} v_i \\ q_i \end{pmatrix} where A = \begin{pmatrix} 0.98 & 0.01 \\ 0.02 & 0.99 \end{pmatrix}, and therefore \begin{pmatrix} v_k \\ q_k \end{pmatrix} = Ak \begin{pmatrix} v_0 \\ q_0 \end{pmatrix}
The eigenvalues of A are 1 and 0.97, with eigenvectors (1, 2) and (1, −1) respectively. Therefore
Ak = \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & (0.97)^k \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}^{-1} = \frac{1}{3} \begin{pmatrix} 1 + 2(0.97)^k & 1 - (0.97)^k \\ 2 - 2(0.97)^k & 2 + (0.97)^k \end{pmatrix}
and so
vk = \frac{1}{3} ( (1 + 2(0.97)k )v0 + (1 − (0.97)k )q0 )
qk = \frac{1}{3} ( (2 − 2(0.97)k )v0 + (2 + (0.97)k )q0 )
Since (0.97)k → 0 as k → ∞, in the long term vk → (v0 + q0 )/3 and qk → 2(v0 + q0 )/3: under these
assumptions, one third of the combined population ends up in Victoria.
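The long-term behaviour can also be seen by simply iterating the process. The sketch below (illustrative Python/NumPy; the initial populations are made-up values, not data from the notes) applies the transition matrix repeatedly:

import numpy as np

A = np.array([[0.98, 0.01],
              [0.02, 0.99]])
x = np.array([6.7, 5.5])          # hypothetical (v0, q0), in millions
for _ in range(200):              # 200 years of the process
    x = A @ x
print(x)                          # approaches ((v0+q0)/3, 2(v0+q0)/3) = (4.066..., 8.133...)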
Theorem (Cayley-Hamilton). Let A ∈ Mn,n (F) and let cA (x) = a0 + a1 x + · · · + an−1 xn−1 + xn be
its characteristic polynomial. Then
a0 In + a1 A + · · · + an−1 An−1 + An = 0
That is, every square matrix satisfies its own characteristic equation.
For example, if A = \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix}, then cA (x) = x2 − 7x + 10 and
A2 − 7A + 10I2 = \begin{pmatrix} 11 & 14 \\ 7 & 18 \end{pmatrix} − 7 \begin{pmatrix} 3 & 2 \\ 1 & 4 \end{pmatrix} + 10 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
Proof of Cayley-Hamilton for diagonalisable matrices. First suppose that D ∈ Mn,n (F) is diagonal and let
the entries on the diagonal be λ1 , . . . , λn ∈ F. Then cD (x) = (x − λ1 )(x − λ2 ) . . . (x − λn ). To see that
(D − λ1 I)(D − λ2 I) · · · (D − λn I) = 0 note that the entry in the i-th row and i-th column of D − λi I
is equal to zero.
Now suppose that A = P DP −1 for some invertible matrix P and diagonal matrix D. Then cA (x) =
cD (x) by Lemma 24.4. Writing cA (x) = xn + an−1 xn−1 + · · · + a1 x + a0 we have
cA (A) = An + an−1 An−1 + · · · + a0 In = P (Dn + an−1 Dn−1 + · · · + a0 In )P −1 = P cD (D)P −1 = 0
Remark. A slightly more involved argument can be used to show that upper triangular matrices
satisfy the statement of the theorem. The full proof (at least for F = C and F = R) is then completed
by showing that all matrices in Mn,n (C) are similar to a matrix in upper triangular form.
Corollary 27.6
Let A ∈ Mn,n (F). For all m ⩾ 0, Am can be expressed as a linear combination of In , A, . . . , An−1 .
If A is invertible, then, for all m > 0, A−m can also be expressed as a linear combination of
In , A, . . . , An−1 .
Example. The matrix A = \begin{pmatrix} 9 & 18 & -24 \\ 7 & 20 & -24 \\ 7 & 21 & -25 \end{pmatrix} has characteristic polynomial cA (x) = x3 − 4x2 + x + 6.
We know that
A3 = 4A2 − A − 6I3
A4 = 4A3 − A2 − 6A = 4(4A2 − A − 6I3 ) − A2 − 6A = 15A2 − 10A − 24I3
A5 = 4A4 − A3 − 6A2 = 4(15A2 − 10A − 24I3 ) − (4A2 − A − 6I3 ) − 6A2
= 50A2 − 39A − 90I3
Since A(A2 − 4A + I3 ) = −6I3 , we also have
A−1 = -\frac{1}{6} A2 + \frac{2}{3} A − \frac{1}{6} I3
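Both the Cayley-Hamilton identity and the resulting formula for A−1 can be verified numerically (an illustrative Python/NumPy sketch; not part of the printed notes):

import numpy as np

A = np.array([[9.0, 18.0, -24.0],
              [7.0, 20.0, -24.0],
              [7.0, 21.0, -25.0]])
I = np.eye(3)
A2, A3 = A @ A, A @ A @ A
print(np.allclose(A3 - 4*A2 + A + 6*I, 0))                  # True: c_A(A) = 0
print(np.allclose(np.linalg.inv(A), -A2/6 + 2*A/3 - I/6))   # True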
27.3 Exercises
236. Suppose the nth pass through a manufacturing process is modelled by the linear equations
xn = An x0 , where x0 is the initial state of the system and
A = \frac{1}{5} \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}
Show that
An = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} + \left( \frac{1}{5} \right)^n \begin{pmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{pmatrix}
Then, with the initial state x0 = \begin{pmatrix} p \\ 1-p \end{pmatrix}, calculate limn→∞ xn .
237. The Fibonacci sequence
0, 1, 1, 2, 3, 5, 8, 13, . . .
is given by the difference equation Fk+2 = Fk+1 + Fk and the initial conditions F0 = 0, F1 = 1.
(a) Letting uk = \begin{pmatrix} F_{k+1} \\ F_k \end{pmatrix}, show that uk = Ak u0 where A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}.
(b) Show that A is diagonalisable, and find P and D such that A = P DP −1 .
(c) Use your answer to (b) to calculate Ak .
(d) Use your answer to (c) to show that
Fk = \frac{1}{\sqrt{5}} \left( \left( \frac{1+\sqrt{5}}{2} \right)^k - \left( \frac{1-\sqrt{5}}{2} \right)^k \right)
(a) Write down the market share shift as a system of linear equations.
(b) Express the shift in matrix form.
(c) Find the market shares after 5 and 10 years.
(d) Show that the market eventually reaches a steady state, and give the limit market shares.
241. For each matrix, find a non-zero polynomial satisfied by the matrix.
(a) \begin{pmatrix} 2 & 5 \\ 1 & -3 \end{pmatrix} (b) \begin{pmatrix} 1 & 4 & -3 \\ 0 & 3 & 1 \\ 0 & 2 & -1 \end{pmatrix}
242. (a) Give an example of a matrix A ∈ M3,3 (R) and a quadratic polynomial p ∈ R[x] such that
p(A) = 0.
(b) Give an example of a matrix B ∈ M3,3 (R) and two different cubic polynomials q, r ∈ R[x]
such that q(B) = r(B) = 0.
Claim. If A ∈ Mn,n (F) is upper-triangular, then A satisfies its own characteristic polynomial.
Sketch of proof. Write
A = \begin{pmatrix} λ1 & * & * \\ 0 & \ddots & * \\ 0 & 0 & λn \end{pmatrix} so that cA (x) = (x − λ1 )(x − λ2 ) · · · (x − λn )
Let ei ∈ Mn,1 (F) be the column matrix with a 1 in the i-th row and zeros everywhere else. Define
subspaces V0 , V1 , . . . , Vn ⩽ Mn,1 (F) by V0 = {0} and Vk = span{e1 , . . . , ek } for k ∈ {1, 2, . . . , n}.
Note that
∀u ∈ Vk , (A − λk In )u ∈ Vk−1
It follows that
∀u ∈ Mn,1 (F), (A − λ1 In ) · · · (A − λn In )u ∈ V0
Since this holds for all u ∈ Mn,1 (F), we conclude that (A − λ1 In ) · · · (A − λn In ) = 0.
Combining this with the result that all elements of Mn,n (C) are similar to an upper triangular
matrix gives a proof of Cayley-Hamilton in the case F = C. In fact, the same argument works
for any ‘algebraically closed’ field, and it's then not much work to extend to an arbitrary F.
We want to be able to consider geometric notions such as length and angle in a vector space. Before
defining the general notion of an inner product space, we revise the familiar context of Rn .
To work with length, distance, and angle we need more than just the vector space properties of Rn .
The dot product of u = (u1 , . . . , un ), v = (v1 , . . . , vn ) ∈ Rn is defined by
u · v = u1 v1 + u2 v2 + · · · + un vn ∈ R
Exercise 243. Use the definition of the dot product to verify the following properties. For all u, v, w ∈ Rn
and α ∈ R,
(a) u · v = v · u
(b) (αu) · v = α(u · v)
(c) u · (v + w) = u · v + u · w
(d) u · u ⩾ 0
(e) u · u = 0 ⇐⇒ u = 0
The length (or norm) of u ∈ Rn is defined by kuk = \sqrt{u · u}, and the distance between u, v ∈ Rn
is d(u, v) = ku − vk.
Example 28.3. k(1, −2, 2)k = 3 and k \frac{1}{3} (1, −2, 2)k = 1, so \frac{1}{3} (1, −2, 2) is a unit vector.
Example 28.4. The distance between two points P (1, 3, −1) and Q(2, 1, −3) is the distance between
their position vectors: d(P, Q) = d((1, 3, −1), (2, 1, −3)) = k(1, 3, −1) − (2, 1, −3)k = k(−1, 2, 2)k = 3
Let u = (u1 , u2 , u3 ), v = (v1 , v2 , v3 ) ∈ R3 . The cross product (or vector product) of u and v is
the vector given by
u × v = det \begin{pmatrix} i & j & k \\ u1 & u2 & u3 \\ v1 & v2 & v3 \end{pmatrix} = det \begin{pmatrix} u2 & u3 \\ v2 & v3 \end{pmatrix} i − det \begin{pmatrix} u1 & u3 \\ v1 & v3 \end{pmatrix} j + det \begin{pmatrix} u1 & u2 \\ v1 & v2 \end{pmatrix} k
Exercise 244 (Properties of the cross product). Show (directly from the definitions) that for any vec-
tors u, v and w ∈ R3 , and scalar α ∈ R:
(a) u × v = −(v × u)
(b) u × (v + w) = (u × v) + (u × w)
(c) (αu) × v = α(u × v)
(d) u × u = 0
(e) u · (u × v) = 0
Example 28.7. We can use the cross product to find a vector perpendicular to both (2, 3, 1) and
(1, 1, 1):
(2, 3, 1) × (1, 1, 1) = (3 · 1 − 1 · 1, 1 · 1 − 2 · 1, 2 · 1 − 3 · 1) = (2, −1, −1)
Note that (2, −1, −1) · (2, 3, 1) = 0 and (2, −1, −1) · (1, 1, 1) = 0
Lemma 28.8
For all u, v ∈ R3 , ku × vk = kuk kvk sin θ, where θ is the angle between u and v.
28.3 Lines in R3
Lines through the origin are exactly the 1-dimensional subspaces of R3 . Every line in R3 is a translate
of a 1-dimensional subspace of R3 in the following way: the line through a point P0 (with position
vector r0 ) and with direction vector v is the set
r0 + span(v) = { r0 + tv | t ∈ R }
Example 28.9. Consider the line passing through the points P (−1, 2, 3) and Q(4, −2, 5). It has
parametric equations:
x = −1 + 5t, y = 2 − 4t, z = 3 + 2t   (t ∈ R)
Definition 28.10
The angle between two lines is the angle between their direction vectors
L1 : x = 1 + t, y = 2 − 4t, z = 3 + 2t (t ∈ R)
L2 : x = −4 + 3t, y = −6 + 2t, z = 3 + t (t ∈ R)
L3 : x = \frac{1}{2} t, y = 1 − 2t, z = 2 + t (t ∈ R)
Given a point with position vector p and a line with vector equation r = r0 + tu (t ∈ R), the distance
from the point to the line is given by
d = \frac{ ku × (p − r0 )k }{ kuk }
Exercise 245. Use Lemma 28.8 to derive the above expression for the distance between a point and a
line.
28.4 Planes in R3
Planes through the origin are exactly the 2-dimensional subspaces of R3 . All planes in R3 are of the
form r0 + W where W is a 2-dimensional subspace of R3 .
The vector equation of a plane through a point P0 and parallel to both u, v ∈ R3 is (where r0 = OP0 ):
r = r0 + su + tv   (s, t ∈ R)
The cartesian form of a plane is:
ax + by + cz = d
where a, b, c, d ∈ R and the vector (a, b, c) is perpendicular to both u and v. Such a vector is called a
normal to the plane.
Examples 28.12.
1. The plane perpendicular to the direction (1, 2, 3) and through the point (4, 5, 6) is given by
x + 2y + 3z = d where d = 1 × 4 + 2 × 5 + 3 × 6. That is, x + 2y + 3z = 32
2. Consider the plane perpendicular to (1, 0, −2) and containing the point (1, −1, −3).
The cartesian equation is: x − 2z = 7
A vector equation is: (x, y, z) = (1, −1, −3) + s(0, 1, 0) + t(2, 0, 1)   (s, t ∈ R)
3. Consider the plane with vector equation
(x, y, z) = (1, 2, 3) + s(2, 3, 1) + t(1, 1, 1),   s, t ∈ R
To find a cartesian equation for the plane, we need a normal to the plane. For this we can use
the fact that (2, 3, 1) × (1, 1, 1) = (2, −1, −1) is orthogonal to both (2, 3, 1) and (1, 1, 1). The
cartesian equation is of the form 2x − y − z = d. Since (1, 2, 3) lies in the plane, we have that
d = 2 − 2 − 3 = −3. The cartesian equation is
2x − y − z = −3
Given a point p and a plane that has normal vector n and contains a point r0 , the distance from the
point to the plane is given by
d = \frac{ |(p − r0 ) · n| }{ knk }
28.5 Exercises
(a) (1, 0, 0), (0, 0, 4) (b) (1, −1, 0), (0, 1, 1) (c) (2, −2, 2), (−1, 0, 2)
249. Find the values of x such that the following pairs of vectors are (i) orthogonal and (ii) parallel.
(a) (x, 1 − 2x, 3) and (1, −x, 3x) (b) (x, x, −1) and (1, x, 6)
250. Write down the equation for the following lines in both vector and cartesian form.
(a) the line passing through P (2, 1, −3) and parallel to v = (1, 2, 2)
(b) the line through P (2, −3, 1) and parallel to the x-axis
(c) the line passing through the points P (2, 0, −2) and Q(1, 4, 2)
(d) the line through P (2, 4, 5) and perpendicular to the plane 5x − 5y − 10z = 2
251. Determine whether the lines L1 and L2 are parallel, intersecting or skew (not parallel or inter-
secting). If they intersect, find the point of intersection. Let the parameters s, t ∈ R.
252. Find a cartesian equation for each of the following planes:
(a) the plane through the point (1, 4, 5) and perpendicular to the vector (7, 1, 4)
(b) the plane through the point (6, 5, −2) and parallel to the plane x + y − z + 1 = 0
(c) the plane through the origin and the points (1, 1, 1) and (1, 2, 3)
(d) the plane that passes through the point (1, 6, −4) and contains the line
253. Are the points A(1, 1, 1), B(2, 1, 3), C(3, 2, 1) and D(4, 2, 3) coplanar? If yes, find the equa-
tion of the plane containing these points.
254. (a) Find the point of intersection of the line r(t) = (2, 1, 1) + t(−1, 0, 4); t ∈ R with the plane
x − 3y − z = 1.
(b) Find the point of intersection of the line x = 1 + t, y = 2t, z = 3t; t ∈ R with the plane
x + y + z = 1.
256. Given the line ` determined by the equations 2x − y + z = 0, x + z − 1 = 0, and M the point
(1, 3, −2), find a cartesian equation of the plane:
Note that the cross product is not associative. For example,
((1, 0, 0) × (0, 1, 0)) × (1, 1, 0) = (−1, 1, 0) ≠ (0, 1, 0) = (1, 0, 0) × ((0, 1, 0) × (1, 1, 0))
Inner products
An inner product on a vector space is a generalisation of the dot product on Rn seen in the last lecture.
It will be used to define geometric notions such as length and angle.
When dealing with inner products, the field F will always be either R or C. Given an element α ∈ F,
we denote by ᾱ its complex conjugate and by |α| its absolute value.
Definition 29.1
Let V be a vector space over F (where F = R or C). An inner product on V is a function
h· , ·i : V × V → F such that for all u, v, w ∈ V and all α ∈ F:
1. hu, vi = \overline{hv, ui}
2. hαu, vi = αhu, vi
3. hu + v, wi = hu, wi + hv, wi
4. (a) hu, ui ⩾ 0 and (b) hu, ui = 0 ⇐⇒ u = 0
Note.
The first condition implies that ∀u ∈ V, hu, ui ∈ R and therefore the inequality in 4(a) makes
sense.
The first and second conditions together imply that ∀u, v ∈ V ∀α ∈ F, hu, αvi = ᾱhu, vi.
It's possible to have many different inner products on the same vector space.
Exercise 257. Show that ∀v ∈ V , h0, vi = 0.
Examples 29.2.
1. V = Rn with the dot product: h(u1 , . . . , un ), (v1 , . . . , vn )i = u1 v1 + · · · + un vn
2. V = Cn with the complex dot product: h(u1 , . . . , un ), (v1 , . . . , vn )i = u1 v̄1 + · · · + un v̄n
5. V = Pn (R), hp, qi = \int_0^1 p(x)q(x) dx
Examples 29.3.
h(u1 , u2 ), (v1 , v2 )i = u1 v1 + u2 v2
For a vector space with an inner product h· , ·i we define: the length (or norm) of a vector v ∈ V
by
kvk = \sqrt{hv, vi}
The distance between two vectors u, v ∈ V is defined to be
d(v, u) = kv − uk
Exercise 258. Show that ∀u ∈ V ∀α ∈ F, kαuk = |α|kuk.
We say that two vectors u, v in an inner product space are orthogonal if hu, vi = 0.
Exercise 259. Let u, v be orthogonal vectors in an inner product space V . Show that
ku + vk2 = kuk2 + kvk2
The vectors u and v are orthogonal (with respect to this inner product).
Example 29.6. Consider the real vector space V = C[0, 2π] of all continuous functions f : [0, 2π] → R.
We equip V with the following inner product
Z 2π
hf, gi = f (x)g(x)dx
0
The norms of the functions s, c ∈ V given by s(x) = sin(x) and c(x) = cos(x) are:
ksk2 = hs, si = \int_0^{2π} sin2 (x) dx = \int_0^{2π} \frac{1}{2}(1 − cos(2x)) dx = \left[ \frac{x}{2} − \frac{1}{4} sin(2x) \right]_0^{2π} = π
So ksk = \sqrt{π}. Similarly, kck = \sqrt{π}.
The vectors s and c are orthogonal since
hs, ci = \int_0^{2π} sin(x) cos(x) dx = \int_0^{2π} \frac{1}{2} sin(2x) dx = \left[ -\frac{1}{4} cos(2x) \right]_0^{2π} = 0
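These integrals can be approximated numerically, which is a useful sanity check when working with function-space inner products (an illustrative Python/NumPy sketch using a Riemann sum; not part of the printed notes):

import numpy as np

def ip(f, g, n=100000):
    # Approximate <f, g> = integral over [0, 2*pi] of f(x)g(x) dx by a Riemann sum.
    x = np.linspace(0.0, 2*np.pi, n, endpoint=False)
    return np.sum(f(x) * g(x)) * (2*np.pi / n)

print(ip(np.sin, np.sin))   # ~ 3.14159... = pi, so ||s|| = sqrt(pi)
print(ip(np.sin, np.cos))   # ~ 0, so s and c are orthogonal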
We would like to define what should be meant by the angle between two vectors in an inner product
space by using the same expression as in Definition 28.5. To be sure that it makes sense we need the
following result.
Theorem 29.7 (Cauchy-Schwarz inequality)
Let V be an inner product space. Then for all u, v ∈ V ,
|hu, vi| ⩽ kuk kvk
Further, equality holds if and only if one vector is a multiple of the other.
Proof. Let u, v ∈ V . If v = 0, the result holds because hu, 0i = 0 and k0k = 0 (see Exercise 257).
We now consider the case in which v is non-zero. Let p = \frac{1}{kvk2} hu, viv. Note that
kpk2 = h \frac{1}{kvk2} hu, viv, \frac{1}{kvk2} hu, vivi = \frac{1}{kvk4} |hu, vi|2 hv, vi = \frac{1}{kvk2} |hu, vi|2
and that w = u − p is orthogonal to p since
hp, wi = hp, u − pi = hp, ui − hp, pi = \frac{1}{kvk2} hu, vihv, ui − \frac{1}{kvk2} |hu, vi|2 = 0
Then we have
kuk2 = kw + pk2
= kwk2 + kpk2 (by Exercise 259)
⩾ kpk2
= \frac{1}{kvk2} |hu, vi|2
This inequality gives kuk2 kvk2 ⩾ |hu, vi|2 and hence kukkvk ⩾ |hu, vi|.
The above inequality is an equality iff kwk2 = 0, that is, w = 0. We have
w = 0 ⇐⇒ u = p ⇐⇒ u is a multiple of v
Example 29.8. Consider V = P2 (R) with inner product given by hp, qi = \int_0^1 p(x)q(x) dx. Let u = −x
and v = x2 . Then
hu, vi = \int_0^1 −x3 dx = -\frac{1}{4} ,  kuk2 = hu, ui = \int_0^1 x2 dx = \frac{1}{3} ,  kvk2 = hv, vi = \int_0^1 x4 dx = \frac{1}{5}
|hu, vi| = \frac{1}{4} ⩽ \frac{1}{\sqrt{15}} = kukkvk
Definition 29.9
Let V be a real inner product space. The angle between two vectors u, v ∈ V is defined to be
θ ∈ [0, π] given by
θ = arccos \left( \frac{hu, vi}{kuk kvk} \right)
Example 29.10. With u, v ∈ V as in Example 29.8, the angle between u and v is
θ = arccos \left( \frac{-1/4}{1/\sqrt{15}} \right) = arccos \left( -\frac{\sqrt{15}}{4} \right)
Theorem (Triangle inequality)
For all u, v in an inner product space V ,
ku + vk ⩽ kuk + kvk
Proof.
ku + vk2 = hu + v, u + vi = hu, ui + hu, vi + hv, ui + hv, vi
= kuk2 + 2ℜ(hu, vi) + kvk2 (ℜ(z) denotes the real part of z ∈ C)
⩽ kuk2 + 2|hu, vi| + kvk2
⩽ kuk2 + 2kukkvk + kvk2 (Cauchy-Schwarz)
= (kuk + kvk)2
29.4 Exercises
260. Given that U = \begin{pmatrix} u1 & u2 \\ u3 & u4 \end{pmatrix} and V = \begin{pmatrix} v1 & v2 \\ v3 & v4 \end{pmatrix} are two 2 × 2 matrices, then
hU, V i = u1 v1 + u2 v2 + u3 v3 + u4 v4
defines an inner product on M2,2 (R).
(a) Compute hU, V i when U = \begin{pmatrix} 3 & -2 \\ 4 & 8 \end{pmatrix} and V = \begin{pmatrix} -1 & 3 \\ 1 & 1 \end{pmatrix}.
(b) Given A = \begin{pmatrix} -2 & 5 \\ 3 & 6 \end{pmatrix}, find kAk.
(c) Given A = \begin{pmatrix} 2 & 6 \\ 9 & 4 \end{pmatrix} and B = \begin{pmatrix} -4 & 7 \\ 1 & 6 \end{pmatrix}, find the distance between them, d(A, B).
(d) Let A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}. Which of the following matrices are orthogonal to A?
i) \begin{pmatrix} 1 & 1 \\ 0 & -1 \end{pmatrix} ii) \begin{pmatrix} 2 & 1 \\ 5 & 2 \end{pmatrix}
262. For x = (x1 , x2 ), y = (y1 , y2 ) ∈ R2 , define hx, yi = x1 y1 + 3x2 y2 . Show that hx, yi is an inner
product on R2 .
263. In R2 , let h(x1 , x2 ), (y1 , y2 )i = x1 y1 − x2 y2 . Is this an inner product? If not, why not?
is an inner product in R2 .
265. Decide which of the suggested operations on x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 ) in R3 define
an inner product:
266. Decide which of the following functions hp, qi on real polynomials p(x) = a0 + a1 x + a2 x2 and
q(x) = b0 + b1 x + b2 x2 define inner products on P2 (R):
267. For the vectors x = (1, 1, 0), y = (0, 1, 0) in R3 compute the norms kxk and kyk using the
following inner products.
268. In each part determine whether the given vectors are orthogonal with respect to the Euclidean
inner product (i.e., the usual dot product).
(a) u = (−1, 3, 2), v = (4, 2, −1) (b) u = (0, 3, −2, 1), v = (5, 2, −1, 0)
269. Endow R4 with the Euclidean inner product (i.e. the dot product), and let u = (−1, 1, 0, 2).
Determine whether the vector u is orthogonal to the following vectors:
270. Let u = (1 + i, 3i) and v = (4, 2 − i). Use the complex dot product on C2 to compute:
271. Let C3 have the complex dot product. If u = (2i, i, 3i) and v = (i, 6i, k), for what values of k ∈ C
are u and v orthogonal?
272. Show that in every real inner product space: v+w is orthogonal to v−w if and only if kvk = kwk.
273. Prove that the following holds for all vectors x, y in a real inner product space:
defines an inner product in Rn , where [x] and [y] are coordinate matrices with respect to the
standard basis.
Show that it fails to be an inner product if A is not invertible.
(Hint: If A is not invertible, then its kernel is non-trivial.)
275. Verify that the Cauchy-Schwarz inequality holds for the given vectors using the Euclidean
inner product.
(a) u = (−3, 1, 0), v = (2, −1, 3) (b) u = (−4, 2, 1), v = (8, −4, −2)
276. Use the Cauchy-Schwarz inequality (applied to the Euclidean inner product on Rn ) to show
that given any a1 , a2 , . . . , an ∈ R we have that
\frac{a1 + · · · + an}{n} ⩽ \sqrt{ \frac{a1^2 + · · · + an^2}{n} }
277. Consider R2 and R3 , each with the Euclidean inner product. In each part find the cosine of the
angle between u and v.
278. For the vectors x = (1, 1, 0), y = (0, 1, 0) in R3 compute the angle between x and y using the
following inner products.
Orthonormal bases
Some bases for an inner product space fit nicely with the inner product. Before defining the notion
of an orthonormal basis we note that, after choosing a basis, inner products (on finite-dimensional
vector spaces) can be represented by matrices. If the basis is orthonormal, then the corresponding
matrix is particularly simple.
In this lecture, as in the previous, the field F is either R or C.
Let V be an n-dimensional vector space and let B be a basis for V . Fix a matrix M ∈ Mn,n (F). For
u, v ∈ V define
hu, vi = [u]T M [v] (†)
where, for legibility, we have written [u] in place of [u]B and [v] in place of [v]B . The right hand side
is a 1 × 1 matrix which we identify with an element of F.
What conditions on M ensure that this gives an inner product on V ?
Exercise 279. Show that the function V × V → F defined by (†) satisfies axioms 2 and 3 in the
definition of an inner product.
Definition 30.1
A matrix M ∈ Mn,n (F) is called positive definite if M ∗ = M and, for all v ∈ Mn,1 (F) \ {0},
v ∗ M v > 0.
Exercise 280. Show that if M is positive definite, then the function V × V → F defined by (†) is
an inner product on V . (Hint: The only thing left to show is that axiom 4 in the definition of inner
product is satisfied.)
Given an inner product on V , there always exists a matrix M ∈ Mn,n (F) such that the inner product
is given by the expression (†).
Let V be a finite-dimensional inner product space and B a basis for V . There exists a matrix
M ∈ Mn,n (F) such that
∀u, v ∈ V, hu, vi = [u]TB M [v]B
Proof. Suppose B = {b1 , . . . , bn } and that we have an inner product h., .i on V . Define M ∈ Mn,n (F)
to be the matrix whose (i, j)-th entry is given by Mij = hbi , bj i. That hu, vi = [u]T M [v] for all u, v ∈ V
can be readily verified.
Exercise 281. Show that, with M defined as above, we have hu, vi = [u]T M [v] for all u, v ∈ V .
Example. One can check that
h(u1 , u2 ), (v1 , v2 )i = u1 v1 − u1 v2 − u2 v1 + 5u2 v2
defines an inner product on R2 . What is the matrix representation of this inner product (with respect
to the standard basis)? Noting that h(1, 0), (1, 0)i = 1, h(1, 0), (0, 1)i = −1, h(0, 1), (1, 0)i = −1 and
h(0, 1), (0, 1)i = 5 we have that
h(u1 , u2 ), (v1 , v2 )i = [u1 u2 ] \begin{pmatrix} 1 & -1 \\ -1 & 5 \end{pmatrix} \begin{pmatrix} v1 \\ v2 \end{pmatrix}
Note that if M, N ∈ Mn,n (F) satisfy [u]TB M [v]B = [u]TB N [v]B for all u, v ∈ V , then M = N .
(It follows that the matrix representation (with respect to a fixed basis) of an inner product is unique.)
A subset S of an inner product space is called orthogonal if hu, vi = 0 for all u, v ∈ S with u ≠ v.
Examples 30.5. 1. {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is orthogonal in R3 with the dot product as inner
product.
product.
2. So is {(1, 1, 1), (1, −1, 0), (1, 1, −2)} ⊆ R3 using the dot product.
R 2π
3. Consider the inner product space of Example 29.6: V = C[0, 2π] and hf, gi = 0 f (x)g(x)dx.
The set {1, sin(x), cos(x)} ⊆ V is orthogonal.
Lemma 30.6
Let V be an inner product space and S ⊆ V \ {0}. If S is orthogonal, then S is linearly indepen-
dent.
Proof. Suppose that α1 , . . . , αk ∈ F and v1 , . . . , vk ∈ S are such that α1 v1 + · · · + αk vk = 0. Then for
i ∈ {1, . . . , k} we have
α1 v1 + · · · + αk vk = 0 =⇒ hα1 v1 + · · · + αk vk , vi i = 0
=⇒ α1 hv1 , vi i + · · · + αk hvk , vi i = 0
=⇒ αi hvi , vi i = 0 (since hvj , vi i = 0 if j ≠ i)
=⇒ αi = 0 (since vi ≠ 0)
Having shown that
*
α1 v1 + · · · + αk vk = 0 =⇒ ∀i, αi = 0
we conclude that the set S is linearly independent.
Example 30.7. 1. The set {(1, 1, 1), (1, −1, 0), (1, 1, −2)} ⊆ R3 is linearly independent.
2. The set {1, sin(x), cos(x)} ⊆ C[0, 2π] is linearly independent.
Definition 30.8
A set of vectors {v1 , . . . , vk } is called orthonormal if it is orthogonal and each vector has length
one. That is, {v1 , . . . , vk } is orthonormal if hvi , vj i = 0 when i ≠ j and hvi , vj i = 1 when i = j.
Remark. Any orthogonal set of non-zero vectors can be made orthonormal by multiplying each vector
v by \frac{1}{kvk}.
Examples 30.9.
(a) The set {1, sin(x), cos(x)} is orthogonal but not orthonormal.
(b) The set { \frac{1}{\sqrt{2π}}, \frac{1}{\sqrt{π}} sin(x), \frac{1}{\sqrt{π}} cos(x) } is orthonormal.
(c) The (infinite) set
{ \frac{1}{\sqrt{2π}}, \frac{1}{\sqrt{π}} sin(x), \frac{1}{\sqrt{π}} cos(x), \frac{1}{\sqrt{π}} sin(2x), \frac{1}{\sqrt{π}} cos(2x), . . . }
is orthonormal.
Definition 30.10
Let V be an inner product space. An orthonormal basis for V is a basis that is an orthonormal
set.
Examples 30.11.
Bases that are orthonormal are particularly convenient to work with. For example, we have the
following.
Lemma 30.12
Let V be an inner product space and let B = {b1 , . . . , bn } be an orthonormal basis for V . Then
for all v ∈ V we have that
v = hv, b1 ib1 + · · · + hv, bn ibn
Example 30.13. Let V = R3 equipped with the dot product and let
B = { b1 = \frac{1}{\sqrt{3}}(1, 1, 1), b2 = \frac{1}{\sqrt{2}}(1, -1, 0), b3 = \frac{1}{\sqrt{6}}(1, 1, -2) }
We saw above that this is an orthonormal basis. To find coordinates with respect to B, we can just
use the inner product. For example
(1, 2, 3) = ((1, 2, 3) · b1 )b1 + ((1, 2, 3) · b2 )b2 + ((1, 2, 3) · b3 )b3 (Lemma 30.12)
= \frac{1}{\sqrt{3}} ((1, 2, 3) · (1, 1, 1)) b1 + \frac{1}{\sqrt{2}} ((1, 2, 3) · (1, -1, 0)) b2 + \frac{1}{\sqrt{6}} ((1, 2, 3) · (1, 1, -2)) b3
= 2\sqrt{3} b1 − \frac{1}{\sqrt{2}} b2 − \frac{\sqrt{3}}{\sqrt{2}} b3
That is, [(1, 2, 3)]B = \frac{1}{\sqrt{2}} (2\sqrt{6}, -1, -\sqrt{3})^T
Example 30.14. Let W = {(x, y, z) ∈ R3 | x + y + z = 0} ⩽ R3 equipped with the dot product. The set
B = { b1 = \frac{1}{\sqrt{2}}(-1, 1, 0), b2 = \frac{1}{\sqrt{6}}(-1, -1, 2) }
is an orthonormal basis for W . For (−1, 0, 1) ∈ W we have
(−1, 0, 1) · b1 = \frac{1}{\sqrt{2}} ,  (−1, 0, 1) · b2 = \frac{\sqrt{3}}{\sqrt{2}}
(−1, 0, 1) = \frac{1}{\sqrt{2}} b1 + \frac{\sqrt{3}}{\sqrt{2}} b2 = \frac{1}{2}(-1, 1, 0) + \frac{1}{2}(-1, -1, 2)
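Computing coordinates this way is easy to reproduce numerically (an illustrative Python/NumPy sketch of Example 30.13; not part of the printed notes):

import numpy as np

b1 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
b2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
b3 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
v = np.array([1.0, 2.0, 3.0])

coords = np.array([v @ b1, v @ b2, v @ b3])        # Lemma 30.12
print(coords)                                      # [ 3.464..., -0.707..., -1.224...]
print(coords[0]*b1 + coords[1]*b2 + coords[2]*b3)  # recovers [1. 2. 3.]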
30.4 Exercises
283. Use Lemma 30.12 to express the given vector as a linear combination of the vectors in the
following orthonormal basis (with respect to the dot product) for R3 .
B = { (1/3, −2/3, 2/3), (−2/3, 1/3, 2/3), (2/3, 2/3, 1/3) }
284. Let V be the vector space C[0, 2π] of real-valued continuous functions on the interval
[0, 2π] equipped with the inner product hf, gi = \int_0^{2π} f (x)g(x) dx. Show that the set
{ \frac{1}{\sqrt{2π}}, \frac{1}{\sqrt{π}} sin(x), \frac{1}{\sqrt{π}} cos(x) } ⊆ V
is an orthonormal set.
285. Let hx, yi be an inner product on a vector space V , and let B = {e1 , e2 , . . . , en } be an orthonormal
basis for V . Prove that:
(a) hα1 e1 + α2 e2 + · · · + αn en , β1 e1 + β2 e2 + · · · + βn en i = α1 β̄1 + α2 β̄2 + · · · + αn β̄n
(b) hx, yi = hx, e1 i\overline{hy, e1 i} + · · · + hx, en i\overline{hy, en i}
(c) The matrix representation, with respect to B, of the inner product is In .
286. Let V be a finite dimensional inner product space and let B = {b1 , . . . , bn } be an orthonormal
basis for V . Let T : V → V be a linear transformation and let A = [T ]B be the matrix of T with
respect to B. Prove that Aij = hT (bj ), bi i
Let V be the inner product space of Exercise 284. Show that the infinite set
{ \frac{1}{\sqrt{2π}}, \frac{1}{\sqrt{π}} sin(x), \frac{1}{\sqrt{π}} cos(x), \frac{1}{\sqrt{π}} sin(2x), \frac{1}{\sqrt{π}} cos(2x), \frac{1}{\sqrt{π}} sin(3x), \frac{1}{\sqrt{π}} cos(3x), . . . } ⊆ V
is an orthonormal set.
A Hilbert space is an inner product space with the property that the associated metric is com-
plete. That the metric is complete is to say that all Cauchy sequences converge.
An example of an infinite-dimensional Hilbert space is the space of “square summable” se-
quences of complex numbers
ℓ2 = { (z1 , z2 , z3 , . . . ) | zi ∈ C, \sum_{i=1}^{∞} |zi |2 < ∞ }
We have seen how to find a basis for a vector space, but what about finding an orthonormal basis? In
this lecture we discuss a technique, based on the idea contained in Exercise 287 below, for converting
a basis of a finite-dimensional inner product space into an orthonormal basis.
Let V be an inner product space and let {u1 , . . . , uk } ⊆ V be an orthonormal set. Suppose that v ∈ V
is such that v ∉ W = span{u1 , . . . , uk }. Defining w = v − hv, u1 iu1 − hv, u2 iu2 − · · · − hv, uk iuk we
have that w ≠ 0 and w is orthogonal to W . Therefore, if we define uk+1 = w/kwk and add it to the
above orthonormal set, the now larger set {u1 , . . . , uk+1 } ⊆ V is still orthonormal. Applying this
observation repeatedly allows us to construct an orthonormal basis for V .
Let V be a finite-dimensional inner product space and let {b1 , . . . , bn } be a basis for V .
We define u1 , w2 , u2 , . . . , wn , un ∈ V as follows:
1) u1 = \frac{1}{kb1 k} b1
2) w2 = b2 − hb2 , u1 iu1 and u2 = \frac{1}{kw2 k} w2
3) w3 = b3 − hb3 , u1 iu1 − hb3 , u2 iu2 and u3 = \frac{1}{kw3 k} w3
⋮
n) wn = bn − hbn , u1 iu1 − · · · − hbn , un−1 iun−1 and un = \frac{1}{kwn k} wn
Theorem 31.2
With the notation above, {u1 , . . . , un } is an orthonormal basis for V .
Example 31.3. We find an orthonormal basis for the subspace W of R4 (with the dot product) spanned
by {(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, −1, 3)}.
Following the Gram-Schmidt procedure gives:
u1 = (1, 1, 1, 1)/k(1, 1, 1, 1)k = \frac{1}{2}(1, 1, 1, 1)
w2 = (2, 4, 2, 4) − h(2, 4, 2, 4), u1 iu1 = (2, 4, 2, 4) − \frac{1}{4} h(2, 4, 2, 4), (1, 1, 1, 1)i(1, 1, 1, 1)
= (2, 4, 2, 4) − 3(1, 1, 1, 1) = (−1, 1, −1, 1)
u2 = (−1, 1, −1, 1)/k(−1, 1, −1, 1)k = \frac{1}{2}(−1, 1, −1, 1)
w3 = (1, 5, −1, 3) − h(1, 5, −1, 3), u1 iu1 − h(1, 5, −1, 3), u2 iu2
= (1, 5, −1, 3) − 2(1, 1, 1, 1) − 2(−1, 1, −1, 1) = (1, 1, −1, −1)
u3 = (1, 1, −1, −1)/k(1, 1, −1, −1)k = \frac{1}{2}(1, 1, −1, −1)
So { \frac{1}{2}(1, 1, 1, 1), \frac{1}{2}(−1, 1, −1, 1), \frac{1}{2}(1, 1, −1, −1) } is an orthonormal basis for W .
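The procedure translates directly into code. The following is an illustrative Python/NumPy sketch (not part of the printed notes) that reproduces Example 31.3:

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalise linearly independent vectors with respect to the dot product.
    basis = []
    for b in vectors:
        w = b - sum((b @ u) * u for u in basis)   # subtract projections onto earlier u_i
        basis.append(w / np.linalg.norm(w))
    return basis

B = [np.array([1.0, 1.0, 1.0, 1.0]),
     np.array([2.0, 4.0, 2.0, 4.0]),
     np.array([1.0, 5.0, -1.0, 3.0])]
for u in gram_schmidt(B):
    print(u)    # 0.5*(1,1,1,1), 0.5*(-1,1,-1,1), 0.5*(1,1,-1,-1)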
Let V be an inner product space and let W ⩽ V be a subspace of V . The orthogonal comple-
ment of W is defined to be
W ⊥ = {v ∈ V | ∀w ∈ W, hv, wi = 0}
Exercise 288. Let V be an inner product space and let W 6 V be a subspace of V . Prove the following
properties of the orthogonal complement. Note that we are not assuming that V (or W ) is finite
dimensional.
a) W ⊥ is a subspace of V   b) W ∩ W ⊥ = {0}   c) W ⊆ (W ⊥ )⊥
Lemma 31.5
Let V be a finite-dimensional inner product space and let W ⩽ V be a subspace. Every vector
v ∈ V can be written in a unique way as v = w + w′ where w ∈ W and w′ ∈ W ⊥ .
Proof. Let v ∈ V . We first need to show that there exist w ∈ W and w′ ∈ W ⊥ such that v = w + w′ .
Let {w1 , . . . , wk } be an orthonormal basis for W . Define
w = hv, w1 iw1 + · · · + hv, wk iwk
Then clearly w ∈ W . Now define w′ = v − w. We have that v = w + w′ . We have to show that
w′ ∈ W ⊥ . For i ∈ {1, . . . , k} we have
hw, wi i = h \sum_{j=1}^{k} hv, wj iwj , wi i (definition of w) = \sum_{j=1}^{k} hv, wj ihwj , wi i (linearity) = hv, wi i
since hwj , wi i = 0 for j ≠ i and hwi , wi i = 1. Therefore hw′ , wi i = hv − w, wi i = hv, wi i − hw, wi i = 0
for each i, and hence w′ is orthogonal to every element of W = span{w1 , . . . , wk }, that is, w′ ∈ W ⊥ .
For uniqueness, suppose that w + w′ = u + u′ where w, u ∈ W and w′ , u′ ∈ W ⊥ . Then
w + w′ = u + u′
=⇒ w − u = u′ − w′ ∈ W ∩ W ⊥
=⇒ w − u = u′ − w′ = 0 (Exercise 288)
Definition 31.6
With notation as in Lemma 31.5, the orthogonal projection onto W is the function projW : V → V
defined by projW (v) = w, where v = w + w′ with w ∈ W and w′ ∈ W ⊥ .
Exercise 289. Show that projW is a linear transformation and that (projW )2 = projW .
Exercise 290. Let V be a finite-dimensional inner product space and let W ⩽ V be a subspace of V .
b) Show that (W ⊥ )⊥ = W .
Lemma 31.7
Let V be a finite-dimensional inner product space, let W ⩽ V be a subspace and let {w1 , . . . , wk }
be an orthonormal basis for W . Then for all v ∈ V ,
projW (v) = hv, w1 iw1 + · · · + hv, wk iwk
Example 31.8. Consider P2 (R) with the inner product hf, gi = \int_0^1 f (x)g(x) dx.
The orthogonal projection of v = 1 + 2x + 3x2 onto the unit vector u = \sqrt{3} x (i.e., the projection onto
the subspace W = span{u}) is
projW (v) = hv, uiu = \left( 3 \int_0^1 (x + 2x2 + 3x3 ) dx \right) x = \frac{23}{4} x
Lemma 31.10
Let V be an inner product space and W ⩽ V a subspace. Then, for all v ∈ V , projW (v) is the
vector in W that is closest to v. (That is, ∀v ∈ V ∀w ∈ W , kv − projW (v)k ⩽ kv − wk.)
Example 31.11. Let W ⩽ R4 be as in Example 31.3, that is, W = span{(1, 1, 1, 1), (2, 4, 2, 4), (1, 5, −1, 3)}.
We saw that { \frac{1}{2}(1, 1, 1, 1), \frac{1}{2}(−1, 1, −1, 1), \frac{1}{2}(1, 1, −1, −1) } is an orthonormal basis for W .
Using the orthonormal basis we find the point p ∈ W that is closest to v = (2, 2, 1, 3). From Lemma
31.10 we know that p = projW (v).
p = projW (v) = hv, u1 iu1 + hv, u2 iu2 + hv, u3 iu3 = 2(1, 1, 1, 1) + \frac{1}{2}(−1, 1, −1, 1) + 0(1, 1, −1, −1)
= \frac{1}{2}(3, 5, 3, 5)
31.3 Exercises
291. Use the Gram-Schmidt procedure to construct orthonormal bases for the subspaces of Rn spanned
by the following sets of vectors (using the dot product):
292. Let P2 (R) be the vector space of polynomials of degree at most two with the inner product
Z 1
hp, qi = p(x)q(x) dx
−1
293. Find the orthogonal projection of (x, y, z) onto the subspace of R3 spanned by the vectors
(a) {(1, 2, 2), (−2, 2, −1)} (b) {(1, 2, −1), (0, −1, 2)}
294. Consider R3 equipped with the dot product. Let w = (2, −1, −2) and v = (2, 1, 3). Find vectors
v1 and v2 such that v = v1 + v2 , v1 is parallel to w, and v2 is perpendicular to w.
295. Find the standard matrices of the transformations T : R3 → R3 which orthogonally project a
point (x, y, z) onto the following subspaces of R3 . Use the matrix to show the transformation is
idempotent (i.e., T ◦ T = T ).
296. Let Π be the plane in R3 given by x + y + z = 0. Use orthogonal projection to find the point on
Π that is as close as possible to (4, 5, 0).
297. Let V be a finite-dimensional inner product space and let W ⩽ V be a subspace. Show that
dim W + dim W ⊥ = dim V .
298. Let V be a finite-dimensional inner product space and let W ⩽ V be a subspace. Let P : V → V
be projection onto W . Show that
What happens if we apply Gram-Schmidt to a linearly dependent set? It can be used to produce
an orthonormal basis for the span of the original set of vectors.
R(w + w0 ) = w − w0
Orthogonal diagonalisation
In this lecture we look at matrix representations with respect to orthonormal bases. Suppose that
B = {u1 , . . . , un } is an orthonormal basis for Rn equipped with the standard inner product (dot
product). Then the transition matrix P = PS,B = [ [u1 ]S · · · [un ]S ] has the property that P T P = In .
Definition 32.1
A matrix P ∈ Mn,n (R) is called orthogonal if P T P = In (equivalently, P is invertible and
P −1 = P T ).
Examples 32.2.
Exercise 299. Suppose that Q, P ∈ Mn,n (R) are orthogonal matrices. Show that QP and Q−1 are also
orthogonal.
Theorem 32.3
Let Q ∈ Mn,n (R). The following are equivalent:
1. Q is orthogonal
2. the columns of Q form an orthonormal basis of Mn,1 (R) (with respect to the dot product)
3. the rows of Q form an orthonormal basis of M1,n (R) (with respect to the dot product)
4. ∀u ∈ Mn,1 (R), kQuk = kuk
5. ∀u, v ∈ Mn,1 (R), hQu, Qvi = hu, vi
Proof. That 1 ⇔ 2 follows from the way in which matrix multiplication is defined. That 1 ⇔ 3 then
follows from the fact that the transpose of an orthogonal matrix is orthogonal.
For 1 ⇒ 4 we have:
kQuk2 = hQu, Qui = (Qu)T (Qu) (coordinate matrices with respect to the standard basis)
= uT QT Qu
= uT In u (since QT Q = In )
= uT u = kuk2
For 4 ⇒ 5 we have:
hQu, Qvi = \frac{1}{4} ( hQu + Qv, Qu + Qvi − hQu − Qv, Qu − Qvi )
= \frac{1}{4} ( hQ(u + v), Q(u + v)i − hQ(u − v), Q(u − v)i )
= \frac{1}{4} ( hu + v, u + vi − hu − v, u − vi ) (since 4 holds)
= hu, vi
Definition 32.4
A matrix A ∈ Mn,n (R) is called orthogonally diagonalisable if there exist an orthogonal matrix
Q ∈ Mn,n (R) and a diagonal matrix D ∈ Mn,n (R) such that A = QDQT .
Example 32.5. The matrix A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix} is orthogonally diagonalisable since
A = \begin{pmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix} \begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix}^T = QDQT
Theorem 32.6
A matrix A ∈ Mn,n (R) is orthogonally diagonalisable if and only if there is an orthonormal
basis B for Rn with the property that all elements of B are eigenvectors of A.
Sketch of proof. The proof is the same as for Theorem 25.4, but with the extra observation that the
basis B is orthonormal if and only if the transition matrix is orthogonal.
Theorem 32.7
A matrix A ∈ Mn,n (R) is orthogonally diagonalisable if and only if it is symmetric (i.e. AT = A).
Proposition 32.8
Let A ∈ Mn,n (R) be symmetric. Then every eigenvalue of A is real.
Proof. (Recall that for a matrix M ∈ Mm,n (C), M ∗ denotes the conjugate transpose M ∗ = M̄ T .)
Suppose that λ ∈ C and v ∈ Mn,1 (C) \ {0} are such that Av = λv. Then v ∗ Av is a 1 × 1 matrix and
v ∗ Av = v ∗ (λv) = λkvk2 . On the other hand, since A is real and symmetric we have A∗ = A, and
therefore
(v ∗ Av)∗ = v ∗ A∗ (v ∗ )∗ = v ∗ Av
A 1 × 1 matrix that is equal to its own conjugate transpose is real, so λkvk2 ∈ R. Since kvk2 > 0, we
conclude that λ ∈ R.
To orthogonally diagonalise a symmetric matrix A ∈ Mn,n (R):
1. Find the eigenvalues of A.
2. Find an orthonormal basis for each eigenspace (applying the Gram-Schmidt procedure if
necessary).
3. The union of the eigenspace bases will be an orthonormal basis {u1 , . . . , un } for Rn .
Letting Q be the matrix whose columns are given by the ui and letting D be the diagonal
matrix whose diagonal entries are the corresponding eigenvalues† we then have
A = QDQT
† The order of the eigenvalues must correspond to the order of the eigenvectors ui
Example 32.10. We apply the above to the matrix A = \begin{pmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{pmatrix}.
To find the eigenvalues:
det(A − xI3 ) = det \begin{pmatrix} 4-x & 2 & 2 \\ 2 & 4-x & 2 \\ 2 & 2 & 4-x \end{pmatrix} = det \begin{pmatrix} 4-x & 2 & 2 \\ x-2 & 2-x & 0 \\ x-2 & 0 & 2-x \end{pmatrix} (R2 − R1 , R3 − R1 )
= (x − 2)2 det \begin{pmatrix} 4-x & 2 & 2 \\ 1 & -1 & 0 \\ 1 & 0 & -1 \end{pmatrix}
= (x − 2)2 \left( − det \begin{pmatrix} 2 & 2 \\ 0 & -1 \end{pmatrix} − det \begin{pmatrix} 4-x & 2 \\ 1 & -1 \end{pmatrix} \right) (expanding along the second row)
= (x − 2)2 (2 − (x − 6)) = (x − 2)2 (8 − x)
The eigenvalues are therefore 2 (with algebraic multiplicity 2) and 8. The λ = 2 eigenspace is the
plane x + y + z = 0, with orthonormal basis { \frac{1}{\sqrt{2}}(-1, 1, 0), \frac{1}{\sqrt{6}}(-1, -1, 2) }, and the λ = 8 eigenspace
is spanned by the unit vector \frac{1}{\sqrt{3}}(1, 1, 1).
Finally, if we take
Q = \begin{pmatrix} -1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 1/\sqrt{2} & -1/\sqrt{6} & 1/\sqrt{3} \\ 0 & 2/\sqrt{6} & 1/\sqrt{3} \end{pmatrix} and D = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 8 \end{pmatrix}
then A = QDQT .
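NumPy's eigh routine is designed for exactly this situation (an illustrative sketch; not part of the printed notes):

import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])
# For a symmetric matrix, eigh returns real eigenvalues and an orthogonal matrix Q.
eigenvalues, Q = np.linalg.eigh(A)
D = np.diag(eigenvalues)
print(eigenvalues)                       # [2. 2. 8.]
print(np.allclose(A, Q @ D @ Q.T))       # True
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: Q is orthogonal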
32.4 Exercises
301. Show that the rotation matrix A = \begin{pmatrix} cos θ & -sin θ \\ sin θ & cos θ \end{pmatrix} is orthogonal.
302. For each symmetric matrix A below find a decomposition A = QDQT , where Q is orthogonal
and D diagonal.
(a) \begin{pmatrix} 6 & -2 \\ -2 & 6 \end{pmatrix}
(b) \begin{pmatrix} 7 & 2 & 0 \\ 2 & 6 & 2 \\ 0 & 2 & 5 \end{pmatrix}
(c) \begin{pmatrix} -2 & 0 & -36 \\ 0 & -3 & 0 \\ -36 & 0 & -23 \end{pmatrix}
(d) \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
(e) \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
Our goal is to show that every real symmetric matrix is orthogonally diagonalisable (Theorem 32.7).
The proof is more easily understood when presented in terms of linear transformations rather than
matrices.
Definition 33.1
Let V be a finite dimensional real inner product space. We will call a linear transformation
T : V → V symmetric if it has the property that hT (u), vi = hu, T (v)i for all u, v ∈ V .
Exercise 303. Let V be a finite dimensional real inner product space. Let B be an orthonormal basis
for V and let T : V → V be a linear transformation. Show that T is symmetric if and only if the
matrix [T ]B is symmetric.
Theorem 33.2
Let V be a finite dimensional real inner product space and let T : V → V be a symmetric linear
transformation. Then there is an orthonormal basis for V consisting of eigenvectors of T .
Suppose that we would like to plot the set of points (x, y) ∈ R2 that satisfy the equation
6x2 − 4xy + 3y 2 = 1
We can use orthogonal diagonalisation to eliminate the cross terms in the above equation.
The equation can be written as X T AX = 1 where X = \begin{pmatrix} x \\ y \end{pmatrix} and A = \begin{pmatrix} 6 & -2 \\ -2 & 3 \end{pmatrix}.
Since A is real symmetric we know that it can be orthogonally diagonalised. Calculation gives A =
QDQT with
D = \begin{pmatrix} 2 & 0 \\ 0 & 7 \end{pmatrix} and Q = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}
Note that
X T AX = X T QDQT X = (QT X)T D(QT X) (∗)
Let B = { b1 = \frac{1}{\sqrt{5}}(1, 2), b2 = \frac{1}{\sqrt{5}}(-2, 1) } be the orthonormal basis of R2 corresponding to the columns
of Q. We rewrite the above equation using coordinates with respect to B.
Let X′ = \begin{pmatrix} x′ \\ y′ \end{pmatrix} be the coordinates of the point (x, y) with respect to B. Note that PS,B = Q and
therefore PB,S = Q−1 = QT .
We have
X′ = [(x, y)]B = PB,S [(x, y)]S = QT X
Then we have
6x2 − 4xy + 3y2 = 1 ⇐⇒ X T AX = 1
⇐⇒ (QT X)T D(QT X) = 1 (from ∗)
⇐⇒ (X′ )T DX′ = 1
⇐⇒ [x′ y′ ] \begin{pmatrix} 2 & 0 \\ 0 & 7 \end{pmatrix} \begin{pmatrix} x′ \\ y′ \end{pmatrix} = 1
⇐⇒ 2(x′ )2 + 7(y′ )2 = 1
The curve is now much easier to recognise as an ellipse.
[Figure: the ellipse 2(x′ )2 + 7(y′ )2 = 1, with the x′ -axis along b1 and the y′ -axis along b2 ; the semi-axes
have lengths 1/\sqrt{2} and 1/\sqrt{7}.]
33.2 Exercises
304. Use orthogonal diagonalisation to sketch the curve given by the following equation:
5x2 − 4xy + 8y 2 = 36
305. Prove the statements (a), (b), and (c) about W in the above proof of Theorem 33.2.
306. Let T : R2 → R2 be the linear transformation given by T (x, y) = (−6x, −5x+4y). The following
defines an inner product on R2
h(a, b), (x, y)i = ax − ay − bx + 2by
(a) Show that T is symmetric (with respect to the inner product above).
(b) Find a basis for R2 that is orthonormal (with respect to the inner product above) and com-
posed of eigenvectors of T .
Let V ⩽ F([0, 1], R) be the real inner product space of all smooth functions f : [0, 1] → R that
satisfy f (0) = f (1) = 0, that is,
V = { f : [0, 1] → R | f is smooth and f (0) = f (1) = 0 }
with inner product hf, gi = \int_0^1 f (x)g(x) dx, and let D denote differentiation, D(f ) = f ′ .
(a) Use integration by parts to show that ∀f, g ∈ V , hD(f ), gi = −hf, D(g)i
(b) Show that D2 is symmetric. (Explicitly, D2 : V → V , D2 (f ) = \frac{d^2 f}{dx^2}.)
Unitary diagonalisation
We note that the results on orthogonal diagonalisation carry over to complex matrices. The proofs
given in the real case apply with only minor changes, and will not be repeated.
Definition 34.1
A matrix U ∈ Mn,n (C) is called unitary if U ∗ U = In (equivalently, U is invertible and U −1 = U ∗ ).
Examples 34.2.
\frac{1}{\sqrt{2}} \begin{pmatrix} i & i \\ 1 & -1 \end{pmatrix} ,  \frac{1}{2} \begin{pmatrix} 1+i & 1-i \\ 1-i & 1+i \end{pmatrix} ∈ M2,2 (C)
Exercise 307. Suppose that U, P ∈ Mn,n (C) are unitary matrices. Show that
1. U P is unitary
Definition 34.4
A matrix A ∈ Mn,n (C) is called unitarily diagonalisable if there exist a unitary matrix U ∈
Mn,n (C) and a diagonal matrix D ∈ Mn,n (C) such that A = U DU ∗ .
Example 34.5. The matrix A = \begin{pmatrix} 1 & 2i \\ -2i & -2 \end{pmatrix} is unitarily diagonalisable since
A = \begin{pmatrix} -i/\sqrt{5} & 2i/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix} \begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} -i/\sqrt{5} & 2i/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix}^∗ = U DU ∗
Theorem 34.6
A matrix A ∈ Mn,n (C) is unitarily diagonalisable if and only if there is an orthonormal basis B
for Cn with the property that all elements of B are eigenvectors of A.
Theorem 34.7
Every Hermitian matrix A ∈ Mn,n(C) is unitarily diagonalisable.
Proposition 34.8
To unitarily diagonalise a Hermitian matrix A ∈ Mn,n(C):
1. Find the eigenvalues of A.
2. Find an orthonormal basis for each eigenspace (applying the Gram–Schmidt procedure if necessary).
3. The union of the eigenspace bases will be an orthonormal basis {u1, . . . , un} for Cⁿ.
Letting U be the matrix whose columns are given by the ui and letting D be the diagonal matrix whose diagonal entries are the corresponding eigenvalues, we then have
A = UDU∗
Example 34.10. Let
$$A = \begin{pmatrix} 1 & 1+i \\ 1-i & 2 \end{pmatrix}$$
To find the eigenvalues of A:
$$\det(xI_2 - A) = \det\begin{pmatrix} x-1 & -1-i \\ -1+i & x-2 \end{pmatrix} = (x-1)(x-2) - (1+i)(1-i) = x^2 - 3x = x(x-3)$$
Row reducing A − 0I₂ = A gives
$$\begin{pmatrix} 1 & 1+i \\ 1-i & 2 \end{pmatrix} \sim \begin{pmatrix} 1 & 1+i \\ 0 & 0 \end{pmatrix}$$
A basis for the λ = 0 eigenspace is {(1 + i, −1)}. An orthonormal basis is {(1/√3)(1 + i, −1)}.
$$A - 3I_2 = \begin{pmatrix} -2 & 1+i \\ 1-i & -1 \end{pmatrix} \sim \begin{pmatrix} -2 & 1+i \\ 0 & 0 \end{pmatrix}$$
A basis for the λ = 3 eigenspace is {(1 + i, 2)}. An orthonormal basis is {(1/√6)(1 + i, 2)}.
(Notice that (1 + i, −1) · (1 + i, 2) = (1 + i)(1 − i) − 2 = 0.)
" 1+i 1+i #
√ √
3 6 0 0
Letting U = −1 and D = we have that A = U DU ∗ .
√ √2 0 3
3 6
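A quick numerical check of this example (a minimal MATLAB sketch; again ' is the conjugate transpose):

A = [1 1+1i; 1-1i 2];
U = [(1+1i)/sqrt(3), (1+1i)/sqrt(6);
     -1/sqrt(3),      2/sqrt(6)];
D = diag([0 3]);
norm(U'*U - eye(2))   % (numerically) zero, so U is unitary
norm(U*D*U' - A)      % (numerically) zero, so A = UDU*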
34.4 Exercises
308. Unitarily diagonalise the Hermitian matrix A = $\begin{pmatrix} 2 & i \\ -i & 2 \end{pmatrix}$.
309. Let M ∈ Mn,n (C) be an Hermitian matrix. Show that M is positive definite (see definition 30.1)
if and only if all eigenvalues of M are strictly positive. (Hint: unitarily diagonalise M .)
311. Let A ∈ Mn,n (C) and suppose that A∗ = −A. Suppose that λ ∈ C is an eigenvalue of A. Show
that λ = iy for some y ∈ R.
A matrix A ∈ Mn,n(C) is called normal if A∗A = AA∗. Normal matrices are unitarily diagonalisable. (This will be covered in the subject MAST20022 Group Theory and Linear Algebra.) All Hermitian matrices are normal. The matrix $\begin{pmatrix} 2+i & 2-i \\ 2-i & 2+i \end{pmatrix}$ is normal but not Hermitian.
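That last claim is also easy to check in MATLAB:

M = [2+1i 2-1i; 2-1i 2+1i];
norm(M'*M - M*M')   % zero, so M is normal
isequal(M, M')      % false, so M is not Hermitian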
Least squares
Given data points (x1, y1), . . . , (xn, yn) we want to find a, b ∈ R that minimise the quantity
$$E = \sum_{i=1}^{n} \bigl(y_i - (a + bx_i)\bigr)^2$$
This can be written as E = ‖y − Au‖² where
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad u = \begin{pmatrix} a \\ b \end{pmatrix}$$
The length here is the one coming from the inner product on Mn,1(R) that corresponds to the dot product on Rⁿ, that is, ⟨v1, v2⟩ = v1ᵀv2.
To minimise ‖y − Au‖ we want u to be such that Au is as close as possible to y (which is fixed). That is, we want the vector in
W = {Av | v ∈ M2,1(R)}
that is closest to y.
The closest vector is precisely p = proj_W(y). To find u we could project y onto W to get p and then solve Au = p for u (by solving a linear system).
However, we can calculate u more directly (without finding an orthonormal basis for W) by using properties of the projection:
wᵀ(y − proj_W y) = 0 ∀ w ∈ W
⟹ (Av)ᵀ(y − Au) = 0 ∀ v ∈ M2,1(R) (since w = Av)
⟹ vᵀAᵀ(y − Au) = 0 ∀ v ∈ M2,1(R)
⟹ Aᵀ(y − Au) = 0 (taking v = Aᵀ(y − Au))
⟹ Aᵀy − AᵀAu = 0
⟹ (AᵀA)u = Aᵀy (∗)
From this we can calculate u, given that we know A and y. It’s just a matter of solving the linear
system.
Note that AᵀA ∈ M2,2(R). If AᵀA is invertible (and it usually is), the solution to (∗) is given by
u = (AᵀA)⁻¹Aᵀy
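In MATLAB, equation (∗) can be solved in one line with the backslash operator. A minimal sketch, using the data of Example 35.1 below:

xs = [-1; 1; 2; 4];
ys = [ 1; 1; 3; 5];
A  = [ones(size(xs)) xs];   % design matrix: a column of ones and the x-values
u  = (A'*A) \ (A'*ys)       % solves (A'A)u = A'y; gives u = [16/13; 11/13]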
Example 35.1. We find the straight line which best fits the data points (−1, 1), (1, 1), (2, 3), (4, 5).
$$A^{T}A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 2 & 4 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \\ 1 & 2 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 4 & 6 \\ 6 & 22 \end{pmatrix}$$
$$(A^{T}A)^{-1}A^{T}y = \frac{1}{52}\begin{pmatrix} 22 & -6 \\ -6 & 4 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 2 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \\ 3 \\ 5 \end{pmatrix} = \frac{1}{13}\begin{pmatrix} 16 \\ 11 \end{pmatrix}$$
The line of best fit is y = 16/13 + 11/13 x.
[Figure: the four data points and the line of best fit y = 16/13 + 11/13 x.]
The same method works for finding quadratic (or higher degree) fitting curves.
To find the quadratic y = a + bx + cx² which best fits data (x1, y1), (x2, y2), . . . , (xn, yn) we take
$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}$$
and solve
AᵀAu = Aᵀy
for
$$u = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$$
Example 35.2. We find the parabola which best fits the data points (−1, 1), (1, 1), (2, 3), (4, 5).
$$A^{T}A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -1 & 1 & 2 & 4 \\ 1 & 1 & 4 & 16 \end{pmatrix}\begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 4 & 16 \end{pmatrix} = \begin{pmatrix} 4 & 6 & 22 \\ 6 & 22 & 72 \\ 22 & 72 & 274 \end{pmatrix}$$
$$(A^{T}A)^{-1}A^{T}y = \begin{pmatrix} 83/78 \\ 9/26 \\ 1/6 \end{pmatrix}$$
The parabola of best fit is y = 83/78 + 9/26 x + 1/6 x².
[Figure: the four data points and the parabola of best fit.]
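The parabola can be checked the same way in MATLAB, adding a column of squares to the design matrix:

xs = [-1; 1; 2; 4];
ys = [ 1; 1; 3; 5];
A  = [ones(size(xs)) xs xs.^2];
u  = (A'*A) \ (A'*ys)   % gives u = [83/78; 9/26; 1/6]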
35.3 Exercises
312. Find the (least squares) line of best fit for the given data sets.
(a) {(0, 0), (1, 0), (2, 1), (3, 3), (4, 5)}
(b) {(−2, 2), (−1, 1), (0, −1), (1, 0), (2, 3)}
313. A maths lecturer was placed on a rack by his students and stretched to lengths L = 1.7, 2.0
and 2.3 metres when forces of F = 1, 2 and 4 tonnes were applied. Assuming Hooke’s law
L = a + bF , estimate the lecturer’s normal length a.
314. A firm that manufactures widgets finds the daily consumer demand d(x) for widgets as a func-
tion of their price x is as in the following table:
x 1 1.5 2 2.5 3
d(x) 200 180 150 100 25
Using least squares, approximate the daily consumer demand by a linear function.
315. Find the parabola of best fit for the data in Exercise 312(a).
(Use MATLAB for the matrix algebra in this question!)
Cardinality
What should it mean for two sets to have the same ‘size’? Does it make sense to say that there are
more rational numbers than there are natural numbers?* Are there more real numbers than there are
rationals?†
Rather than trying to define the size of a set directly, it is convenient to introduce the notion of two
sets ‘having the same number of elements’. The notion of a bijective function is clearly just what is
required.
Definition A.1
Two sets A and B are said to have the same cardinality if there exists a bijection A → B. A set is
finite if it is either empty or has the same cardinality as the set {1, 2, . . . , n} for some n. A set that
is not finite is called infinite. A set is called countably infinite if it has the same cardinality as
N. A set is called countable if it is either finite or countably infinite. A set that is not countable
is called uncountable.
Lemma A.2
Let A be a set.
1. If A ⊆ N, then A is countable.
2. If there exists a surjective function f : N → A, then A is countable.
Proof. For the first statement it suffices to show that if B is an infinite subset of N, then there is a bijection ϕ : N → B. We inductively define a sequence of subsets Bi ⊂ N together with the required function ϕ : N → B. Let B1 = B. Suppose that Bi has been defined and is infinite. Let bi = min(Bi) and define ϕ(i) = bi and Bi+1 = Bi \ {bi}. Since Bi is infinite, Bi+1 is infinite. The function ϕ : N → B defined inductively in this way is clearly a bijection.
For the second statement, suppose now that there exists a surjective function f : N → A. If A is finite,
then A is countable by definition and there is nothing to prove. We can assume, therefore, that A is
infinite. We define a map g : N → A as follows. Define M ⊆ N by
M = {m ∈ N | f(m) ∉ {f(1), f(2), . . . , f(m − 1)}}
Note that
1. M is infinite (because A is infinite)
2. f(M) = A
Denote by mi the i-th element of M (using the usual ordering on N) and define
g : N → A, g(i) = f(mi)
Then g is a bijection, and hence A is countable.
* no
† yes
Proposition A.3
1. Z is countable.
2. If A and B are countable sets, then A × B is countable.
3. Q is countable.
Proof. 1. The map f : N → Z given by
$$f(n) = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even} \\ \frac{1-n}{2} & \text{if } n \text{ is odd} \end{cases}$$
is a surjection, so Z is countable by Lemma A.2.
3. From the first two parts, we know that Z × (Z \ {0}) has the same cardinality as N. Then note that the map h : Z × (Z \ {0}) → Q given by h(m, n) = m/n is surjective, so Q is countable by Lemma A.2.
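The first few values of the surjection f in part 1 are easy to tabulate (a throwaway MATLAB illustration):

n = 1:9;
f = (n/2).*(mod(n,2)==0) + ((1-n)/2).*(mod(n,2)==1)
% f = [0 1 -1 2 -2 3 -3 4 -4]: the integers are enumerated 0, 1, -1, 2, -2, ...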
At this point we might start to think that all infinite sets are countably infinite. But we’d be wrong.
Theorem A.4
R is uncountable.
Proof. Suppose, for a contradiction, that the interval (0, 1) ⊂ R is countable and let f : N → (0, 1) be a bijection. We consider the decimal expansion of each element:
f(1) = 0.a11 a12 a13 . . .
f(2) = 0.a21 a22 a23 . . .
f(3) = 0.a31 a32 a33 . . .
⋮
Each aij ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and aij is the j-th digit in the decimal expansion of f(i). Define b ∈ (0, 1) as follows:
$$b = 0.b_1 b_2 b_3 \ldots \quad\text{where}\quad b_i = \begin{cases} 7 & \text{if } a_{ii} = 8 \\ 8 & \text{if } a_{ii} \neq 8 \end{cases}$$
Notice that for all i ∈ N, bi ≠ aii; moreover b has only one decimal expansion, since its digits avoid 0 and 9. It follows that for all i ∈ N, b ≠ f(i). This contradicts the assumption that f is surjective. Hence (0, 1) is uncountable, and since any subset of a countable set is countable, R is uncountable as well.
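The diagonal construction can be mimicked on a finite table of digits (a toy MATLAB illustration; the matrix a is hypothetical stand-in data, with row i playing the role of the expansion of f(i)):

a = randi([0 9], 6, 6);   % a(i,j) stands in for the digit a_ij
b = 8*ones(6,1);
b(diag(a) == 8) = 7;      % b_i = 7 if a_ii = 8, and b_i = 8 otherwise
any(b == diag(a))         % always false: b differs from every row on the diagonal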
Theorem A.5
Let A be a set. The power set of A, P(A), does not have the same cardinality as A.
Proof. Suppose, for a contradiction, that there exists a bijection f : A → P(A). Define B = {a ∈ A | a ∉ f(a)}. Since B ⊆ A and f is surjective, there exists b ∈ A such that f(b) = B. Then we have
b ∈ B ⟺ b ∉ f(b) = B
which is a contradiction. Hence no such bijection exists.
Existence of bases
Theorem B.1
Let V be a vector space over F. Then:
1. Every linearly independent subset of V can be extended to a basis of V.
2. Every spanning set for V contains a basis of V.
3. Any two bases of V have the same cardinality.
If we assume that V is finite dimensional, the theorem is much easier to prove (see Lecture 17). In the
general case, which we shall consider here, the proof uses some fundamental results from the theory
of infinite sets which are stated without proof in the second section.
Notice that, since V itself is a spanning set for V , we have the following consequence of the theorem.
Corollary B.2
Every vector space has a basis.
Lemma B.3
Let V be a vector space over F and let X, Y ⊆ V be two subsets. If X is linearly independent and Y spans V, then there is a subset Y′ ⊆ Y such that X ∪ Y′ is a basis of V.
Proof of Lemma B.3. Define S to be the collection of all subsets Z of Y such that X ∪ Z is linearly independent, that is:
S = {Z ⊆ Y | X ∪ Z is linearly independent}
Let Y′ be a maximal element of S, that is, an element Y′ ∈ S such that for all Z ∈ S,
Z ⊇ Y′ ⟹ Z = Y′
How do we know that such a maximal element exists? If Y is a finite set, then S is also finite and the existence is clear. If Y is infinite, the existence of a maximal element Y′ is less obvious, and is in fact a fundamental property of set theory. It is called “Zorn’s Lemma”* (see the next section).
* In fact, it is not really a lemma. It is equivalent to something called the “Axiom of Choice,” which is independent of the other axioms of set theory.
By construction X ∪ Y′ is linearly independent. We claim that it is also a spanning set for V. We know that for all y ∈ Y, y ∈ span(X ∪ Y′), since otherwise X ∪ Y′ ∪ {y} would be linearly independent, which contradicts the maximality of Y′. Thus Y ⊆ span(X ∪ Y′) and hence span(Y) ⊆ span(X ∪ Y′). Since span(Y) = V and span(X ∪ Y′) ⊆ V, it follows that span(X ∪ Y′) = V. Therefore X ∪ Y′ is a spanning set for V, and hence a basis.
To prove the third part of the theorem, we will use the following lemma.
Lemma B.4
Let V be a vector space over F, let Y ⊆ V be a spanning set for V, and let {x1, . . . , xm} ⊆ V be a linearly independent set of vectors. If |Y| ≥ m, then there are elements y1, . . . , ym ∈ Y such that {Y \ {y1, . . . , ym}} ∪ {x1, . . . , xm} is a spanning set for V. (That is, replacing the yi by the xi still gives a spanning set.)
Proof. We first note that since {x1 , . . . , xm } is linearly independent, all of the xi are non-zero. Since Y
is a spanning set, there exist a1 , . . . , ak ∈ Y and α1 , . . . , αk ∈ F such that x1 = α1 a1 + · · · + αk ak . As
x1 is non-zero, at least one of the αi is non-zero. By re-ordering the ai if necessary we may assume
that α1 ≠ 0. We then have that
$$a_1 = \frac{1}{\alpha_1}x_1 - \frac{\alpha_2}{\alpha_1}a_2 - \cdots - \frac{\alpha_k}{\alpha_1}a_k$$
Let y1 = a1. Using the above expression any linear combination of elements from Y can be rewritten as a linear combination of vectors from Y1 = {Y \ {y1}} ∪ {x1}. We simply replace any occurrence of y1 by the right hand side of the above expression. This then gives a linear combination which does not involve y1, but does involve x1. It follows that span(Y) ⊆ span(Y1), and therefore span(Y1) = V.
Suppose now that we have found y1, . . . , yl ∈ Y (where 1 ≤ l < m) such that Yl = {Y \ {y1, . . . , yl}} ∪ {x1, . . . , xl} is a spanning set for V. Since Yl is a spanning set, there exist b1, . . . , bk ∈ Y \ {y1, . . . , yl} and β1, . . . , βk, γ1, . . . , γl ∈ F such that xl+1 = β1 b1 + · · · + βk bk + γ1 x1 + · · · + γl xl. Since xl+1 is non-zero, at least one of the βi or γj is non-zero. Indeed, not all the βi can be zero, as that would contradict the linear independence of the set {x1, . . . , xl+1}. Re-ordering if necessary, we may assume that β1 ≠ 0. Then
$$b_1 = \frac{1}{\beta_1}x_{l+1} - \frac{\beta_2}{\beta_1}b_2 - \cdots - \frac{\beta_k}{\beta_1}b_k - \frac{\gamma_1}{\beta_1}x_1 - \cdots - \frac{\gamma_l}{\beta_1}x_l$$
Letting yl+1 = b1 and Yl+1 = {Y \ {y1, . . . , yl+1}} ∪ {x1, . . . , xl+1}, we have, as above, that span(Yl+1) = V. The lemma then follows by induction.
Lemma B.5
Let V be a vector space over F and let X, Y ⊆ V be two subsets. If X is linearly independent and Y spans V, then |X| ≤ |Y|.
Proof. We first prove the lemma under the assumption that Y is finite. Let Y = {y1, . . . , yk}. Suppose, in order to get a contradiction, that |X| > k. Choose distinct elements x1, . . . , xk ∈ X. Then {x1, . . . , xk} is linearly independent (since X is), and applying Lemma B.4 (with m = k) we know that
{Y \ {y1, . . . , yk}} ∪ {x1, . . . , xk} = {x1, . . . , xk}
is a spanning set for V. Since |X| > k, there is an element x ∈ X \ {x1, . . . , xk}. As {x1, . . . , xk} is a spanning set for V, x can be expressed as a linear combination of the xi. This contradicts the linear independence of X, so we must in fact have |X| ≤ k.
Consider now the case where Y is not finite. Denote by F(Y) the set of all finite subsets of Y. Then |F(Y)| = |Y| (see Lemma B.6). Define a map Φ : X → F(Y) as follows: for each x ∈ X we choose a finite subset Sx ⊆ Y such that x is a linear combination of Sx, and define Φ(x) = Sx. If |X| > |F(Y)| then there is some element S ∈ F(Y) with infinite preimage (see Lemma B.6). We would then have a finite set S ⊆ Y such that Φ⁻¹(S) ⊆ X is an infinite, linearly independent subset of span(S). This would contradict the first case of this proof. Hence |X| ≤ |F(Y)| = |Y|.
Proof of part 3 of the theorem. Let B1 and B2 be two bases for V. Since B1 is linearly independent and B2 is a spanning set, Lemma B.5 implies that |B1| ≤ |B2|. On the other hand, B2 is linearly independent and B1 is a spanning set, so we also have |B2| ≤ |B1|. It follows that |B1| = |B2|.
In the above proof we used some results about infinite sets. We state them here without proof. The
interested reader should consult an introductory textbook on set theory.
Lemma B.6
Let X and Y be sets with Y infinite, and let f : X → Y be a function.
1. The set F(Y) of all finite subsets of Y satisfies |F(Y)| = |Y|.
2. If |X| > |Y|, then there exists an element y ∈ Y whose preimage, {x ∈ X | f(x) = y}, is infinite.
Zorn’s Lemma is a fundamental statement in the theory of infinite sets, and is equivalent to the Axiom
of Choice. In order to state Zorn’s Lemma (in the form we use) we make the following definition. If
S is a collection of sets, a non-empty subset C ⊆ S is called a chain if
∀ A, B ∈ C, either A ⊆ B or B ⊆ A
Zorn’s Lemma
Let S be a non-empty collection of sets. Suppose that whenever C is a chain in S the union ⋃_{C ∈ C} C is also an element of S. Then S contains a maximal element.