
ACCELERATED ALGEBRA

IVAN CHELTSOV

Contents
Preface 1
1. Background 4
1.a. Handy notations 4
1.b. Multidimensional vectors 7
1.c. Distances, dot products and angles 9
1.d. Matrices and operations with them 14
1.e. Complex numbers and finite fields 20
2. Basic linear algebra 26
2.a. Vector subspaces 26
2.b. Linear span 28
2.c. Linear independence 31
2.d. Bases and dimension 35
2.e. Orthogonality 40
3. Matrices and linear equations 47
3.a. Rank–nullity theorem 47
3.b. Determinants 53
3.c. Systems of linear equations 59
3.d. Row echelon form 65
3.e. Computational examples 70
4. Linear transformations 75
4.a. What is a linear transformation? 75
4.b. Matrices versus linear transformations 78
4.c. Composing linear transformations 83
4.d. Rank–nullity theorem revisited 85
4.e. Linear operators 87
5. Eigenvalues and eigenvectors 93
5.a. Similar matrices 93
5.b. Diagonalizable matrices 97
5.c. Complex matrices 108
5.d. Orthogonal matrices 115
5.e. Symmetric matrices 120

These notes are adapted from Tom Leinster’s notes.



Preface
Cambridge gave me various kinds of teaching and supervision, but possibly the most important
influences were Jeffrey Goldstone and Peter Swinnerton-Dyer, who encouraged me to continue
to think for myself and not to take the technical details too seriously.
Miles Reid
To illustrate what this course is about, let us consider the following classical result.
Theorem (Chasles). Let C1 and C2 be cubic curves in the plane R2 such that their intersection
consists of exactly 9 points. Let C be any cubic curve in the plane R2 that contains 8 of these points.
Then the curve C contains the ninth intersection point as well.
Actually, we can prove this theorem using results that will be described and proved in this course.
Moreover, we can use these results to check this theorem in every given case. For instance, suppose
that the curve C1 is given by the equation
$$-5913252577x^3 + 30222000280x^2y - 21634931915xy^2 + 5556266591y^3 - 73906985473x^2 + 102209537669xy - 37300172365y^2 + 1389517162x - 88423819400y + 204616284808 = 0,$$
and suppose that the curve C2 is given by the equation
$$-4844332x^3 - 8147864x^2y - 4067744xy^2 - 1866029y^3 + 32668904x^2 - 28226008xy + 41719157y^2 + 252639484x + 126319742y - 960898976 = 0.$$
Then the intersection C1 ∩ C2 consists of the eight points
(2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8)
and the ninth intersection point
$$\left(\frac{1439767504290697562}{409942054104759719},\ \frac{4853460637572644276}{409942054104759719}\right).$$
Now let C be a cubic curve in the plane R2 that passes through the first eight intersection points.
Let us show that C also passes through the ninth (ugly looking) intersection point.
The curve C is given by a polynomial equation that looks like this:
$$a_1x^3 + a_2x^2y + a_3xy^2 + a_4y^3 + a_5x^2 + a_6xy + a_7y^2 + a_8x + a_9y + a_{10} = 0,$$
where $a_1, a_2, \ldots, a_{10}$ are some real numbers. Now we can substitute
the coordinates of the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8) into
this equation. This gives us the following eight linear equations:
$$\begin{cases}
8a_1 + 12a_2 + 18a_3 + 27a_4 + 4a_5 + 6a_6 + 9a_7 + 2a_8 + 3a_9 + a_{10} = 0,\\
-27a_1 + 36a_2 - 48a_3 + 64a_4 + 9a_5 - 12a_6 + 16a_7 - 3a_8 + 4a_9 + a_{10} = 0,\\
-64a_1 - 80a_2 - 100a_3 - 125a_4 + 16a_5 + 20a_6 + 25a_7 - 4a_8 - 5a_9 + a_{10} = 0,\\
-216a_1 + 72a_2 - 24a_3 + 8a_4 + 36a_5 - 12a_6 + 4a_7 - 6a_8 + 2a_9 + a_{10} = 0,\\
125a_1 + 75a_2 + 45a_3 + 27a_4 + 25a_5 + 15a_6 + 9a_7 + 5a_8 + 3a_9 + a_{10} = 0,\\
27a_1 + 18a_2 + 12a_3 + 8a_4 + 9a_5 + 6a_6 + 4a_7 + 3a_8 + 2a_9 + a_{10} = 0,\\
-8a_1 - 24a_2 - 72a_3 - 216a_4 + 4a_5 + 12a_6 + 36a_7 - 2a_8 - 6a_9 + a_{10} = 0,\\
64a_1 + 128a_2 + 256a_3 + 512a_4 + 16a_5 + 32a_6 + 64a_7 + 4a_8 + 8a_9 + a_{10} = 0.
\end{cases}$$

We have 10 unknowns and 8 equations. Our intuition says: there should be many solutions. This is true!
For instance, if we add two more constraints
• $a_8$ = 1389517162 and $a_9$ = −88423819400,
then we get a unique solution:
$$a_1 = -5913252577,\quad a_2 = 30222000280,\quad a_3 = -21634931915,\quad a_4 = 5556266591,\quad a_5 = -73906985473,$$
$$a_6 = 102209537669,\quad a_7 = -37300172365,\quad a_8 = 1389517162,\quad a_9 = -88423819400,\quad a_{10} = 204616284808.$$
This gives us the polynomial
$$f_1(x, y) = -5913252577x^3 + 30222000280x^2y - 21634931915xy^2 + 5556266591y^3 - 73906985473x^2 + 102209537669xy - 37300172365y^2 + 1389517162x - 88423819400y + 204616284808.$$
It looks familiar! Indeed, our first cubic curve C1 in the plane is defined by the equation $f_1(x, y) = 0$.
Similarly, if we instead add the two constraints
• $a_8$ = 252639484 and $a_9$ = 126319742,
then we also get a unique solution:
$$a_1 = -4844332,\quad a_2 = -8147864,\quad a_3 = -4067744,\quad a_4 = -1866029,\quad a_5 = 32668904,$$
$$a_6 = -28226008,\quad a_7 = 41719157,\quad a_8 = 252639484,\quad a_9 = 126319742,\quad a_{10} = -960898976.$$
This gives us the polynomial
$$f_2(x, y) = -4844332x^3 - 8147864x^2y - 4067744xy^2 - 1866029y^3 + 32668904x^2 - 28226008xy + 41719157y^2 + 252639484x + 126319742y - 960898976.$$
Recall that our second cubic curve C2 in R2 is defined by the equation $f_2(x, y) = 0$.

In fact, we can extract the following 8 × 10 matrix (see Section 1.d) from the system above:
$$\begin{pmatrix}
8 & 12 & 18 & 27 & 4 & 6 & 9 & 2 & 3 & 1\\
-27 & 36 & -48 & 64 & 9 & -12 & 16 & -3 & 4 & 1\\
-64 & -80 & -100 & -125 & 16 & 20 & 25 & -4 & -5 & 1\\
-216 & 72 & -24 & 8 & 36 & -12 & 4 & -6 & 2 & 1\\
125 & 75 & 45 & 27 & 25 & 15 & 9 & 5 & 3 & 1\\
27 & 18 & 12 & 8 & 9 & 6 & 4 & 3 & 2 & 1\\
-8 & -24 & -72 & -216 & 4 & 12 & 36 & -2 & -6 & 1\\
64 & 128 & 256 & 512 & 16 & 32 & 64 & 4 & 8 & 1
\end{pmatrix}.$$
Then we can compute the rank of this huge 8 × 10 matrix. This is doable: its rank is equal to 8.
After this, using the rank–nullity theorem, we conclude that every cubic curve in R2 that passes through
the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8) is given by
$$\lambda f_1(x, y) + \mu f_2(x, y) = 0$$
for some numbers λ and µ. Thus, it contains the ninth intersection point of the curves C1 and C2.
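As a quick sanity check, the rank computation can be reproduced numerically. The following sketch (not part of the original notes) uses Python with NumPy; it confirms that the matrix above has rank 8 and that the coefficient vector of f1 solves the system, i.e. lies in its kernel.

import numpy as np

# The 8 x 10 coefficient matrix of the linear system (one row per intersection point).
M = np.array([
    [8, 12, 18, 27, 4, 6, 9, 2, 3, 1],
    [-27, 36, -48, 64, 9, -12, 16, -3, 4, 1],
    [-64, -80, -100, -125, 16, 20, 25, -4, -5, 1],
    [-216, 72, -24, 8, 36, -12, 4, -6, 2, 1],
    [125, 75, 45, 27, 25, 15, 9, 5, 3, 1],
    [27, 18, 12, 8, 9, 6, 4, 3, 2, 1],
    [-8, -24, -72, -216, 4, 12, 36, -2, -6, 1],
    [64, 128, 256, 512, 16, 32, 64, 4, 8, 1],
], dtype=float)

print(np.linalg.matrix_rank(M))   # 8

# The coefficients a_1, ..., a_10 of f_1 satisfy all eight equations: M @ a = 0.
a = np.array([-5913252577, 30222000280, -21634931915, 5556266591, -73906985473,
              102209537669, -37300172365, 1389517162, -88423819400, 204616284808], dtype=float)
print(M @ a)                      # a vector of zeros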
The following picture illustrates Chasles’s Theorem:

Here the blue curve is C1 , the red curve is C2 , and the orange curve is another cubic curve that
passes through the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8).
This example shows that linear algebra allows us to verify Chasles’s Theorem in special cases.
In fact, we can use basic tools from linear algebra to prove Chasles’s Theorem in full generality.
However, to use these tools properly, we have to introduce new objects (like matrices), learn
their basic properties and features (like what is the rank of a matrix and how to compute it),
and prove some important theorems, including the rank–nullity theorem, which we already used above.
This is what we are going to do in these notes.

1. Background
1.a. Handy notations.
We could, of course, use any notation we want; do not laugh at notations;
invent them, they are powerful. In fact, mathematics is, to a large extent,
invention of better notations.
Richard Feynman
In these notes, we will often use Greek letters. The table below shows some of them together
with the name of each letter and how the name is pronounced.

Lower case Upper case Name Pronunciation


α alpha AL-fa
β beta BEE-ta
γ Γ gamma GAM-ma
δ ∆ delta DEL-ta
ε epsilon EP-si-lon
ζ zeta ZEE-ta
η eta EE-ta
θ Θ theta THEE-ta (soft th, as in think )
ι iota eye-OH-ta
κ kappa KAP-pa
λ Λ lambda LAM-da
µ mu myoo (rhymes with few )
ν nu nyoo (rhymes with few )
ξ Ξ xi ksy (rhymes with pie)
o omicron never used in mathematics
π Π pi pie
ρ rho roe (rhymes with go)
σ Σ sigma SIG-ma
τ tau rhymes with now
υ upsilon almost never used in mathematics
φ Φ phi fy (rhymes with pie)
χ chi ky (rhymes with pie)
ω Ω omega OH-me-ga
Mathematicians reserve special symbols for some commonly used sets:
• ∅ stands for the empty set;
• N is the set of natural numbers;
• Z is the set of integers;
• Q is the set of rational numbers;
• R is the set of real numbers;
• C is the set of complex numbers (see Section 1.e);
• Fq is a field consisting of q = pn elements, where p is prime and n ∈ N (see Section 1.e).

To specify a finite set, you can simply list its elements: {1, 3, 5}. This does not work well for
infinite sets. Instead, we can use notation such as
$$A = \{a \in \mathbb{Z} : a^2 - a > 6\}.$$
The statement x ∈ Y means that x is an element of the set Y . But X ⊆ Y means that every
element of the set X is an element of the set Y .
Given subsets A and B of a set X, we can form the following new subsets:
• the intersection A ∩ B = {x ∈ X : x ∈ A and x ∈ B};
• the union A ∪ B = {x ∈ X : x ∈ A or x ∈ B};
• the complement A \ B = {x ∈ X : x ∈ A but x ∉ B}.
A function (or mapping or map) consists of three things:
• a set A, called the domain of the function;
• another set B, called the codomain of the function; and
• a rule f that assigns to each element a ∈ A an element f (a) ∈ B.
We write f : A → B to mean that f is a function with domain A and codomain B.
Example 1.1. Let f : R → R be the function that is given by f (a) = a2 for every real number a.
Then its domain is R and its codomain is also R.
When we have some function f : A → B, we sometimes write a ↦ b to mean that f(a) = b.
For instance, let f : Z → Q be the function given by n ↦ n²/2. Then f(n) = n²/2 for every n ∈ Z.
Remark 1.2. Suppose we have functions f : A → B and g : B → C. Then we can feed the output
of f into the input of g to make a new function g ◦ f : A → C. This new function g ◦ f is given by

(g ◦ f )(a) = g f (a)
for every a ∈ A, and is called the composite of the functions g and f .
Let A and B be sets. A function f : A → B is said to be
• injective (or one-to-one) if for each b ∈ B, there is at most one a ∈ A such that f (a) = b;
• surjective (or onto) if for each b ∈ B, there is at least one a ∈ A such that f (a) = b;
• bijective if f is both injective and surjective.
For instance, the function f : {2, 3} → {4, 5, 6} defined by f(a) = a² is injective and not surjective.
Similarly, the function g : {−2, 2} → {4} defined by g(a) = a² is surjective and not injective.
Remark 1.3. For any set A, there is a function IdA : A → A, called the identity on A, which is
sometimes written as IdA . It is given by IdA (a) = a for every a ∈ A.
Let A and B be sets and let f : A → B be a function. If f is bijective, then there is a unique
function g : B → A such that
$$g \circ f = \mathrm{Id}_A \quad\text{and}\quad f \circ g = \mathrm{Id}_B.$$
This function g is called the inverse of f, and it is usually denoted by f⁻¹. Conversely, if f has
an inverse, then f is bijective.
Example 1.4. Let f : N → Z be the function given by
$$f(n) = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even},\\[4pt] \dfrac{1 - n}{2} & \text{if } n \text{ is odd}.\end{cases}$$

Then f is bijective, and its inverse is the function g : Z → N given by
$$g(n) = \begin{cases} 2n & \text{if } n > 0,\\ 1 - 2n & \text{if } n \le 0.\end{cases}$$
Suppose we have numbers a_1, a_2, . . . , a_n and we want to consider their sum. We could write it
as a_1 + a_2 + · · · + a_n. But it is often convenient to write it instead as
$$\sum_{i=1}^{n} a_i.$$
For example, if you are given numbers a_1, a_2, . . . , a_n and b_1, b_2, . . . , b_n, then
$$a_1 + a_2 + \cdots + a_n + b_1 + b_2 + \cdots + b_n = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i = \sum_{i=1}^{n}(a_i + b_i).$$

We can put sums inside other sums. For instance, if we have a grid of numbers
$$\begin{matrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{matrix}
\tag{1.5}$$
then we can write down the sum of all numbers in this grid as
$$\bigl(a_{11} + a_{12} + \cdots + a_{1n}\bigr) + \bigl(a_{21} + a_{22} + \cdots + a_{2n}\bigr) + \cdots + \bigl(a_{m1} + a_{m2} + \cdots + a_{mn}\bigr).$$
This can be shortened to
$$\sum_{i=1}^{m}\left(\sum_{j=1}^{n} a_{ij}\right) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}.$$
Of course, we get the same total sum if we add up the numbers column by column, so that
$$\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} = \sum_{j=1}^{n}\sum_{i=1}^{m} a_{ij}.$$
This means that we can change the order of summation.
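The following tiny Python snippet (an illustration added here, not part of the original notes) sums a small grid of numbers row by row and then column by column, and checks that the two totals agree.

# Summing a 2 x 3 grid of numbers in two different orders.
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

row_by_row = sum(sum(a for a in row) for row in grid)
col_by_col = sum(sum(row[j] for row in grid) for j in range(len(grid[0])))

print(row_by_row, col_by_col)   # 21.0 21.0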


Exercise 1.6. Exactly one of the following statements is false. Which? Why?
(1) For all real numbers c, x_1, . . . , x_n, we have
$$\sum_{i=1}^{n} cx_i = c\sum_{i=1}^{n} x_i.$$
(2) For all real numbers x_1, . . . , x_n, y_1, . . . , y_n, we have
$$\sum_{i=1}^{n}(x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{j=1}^{n} y_j.$$
(3) For all real numbers x_1, . . . , x_m, y_1, . . . , y_n, we have
$$\sum_{i=1}^{m}\sum_{j=1}^{n}(x_i + y_j) = \sum_{i=1}^{m} x_i + \sum_{j=1}^{n} y_j.$$

1.b. Multidimensional vectors.


In two dimensions, we the form may trace
Of him whose soul, too large for vulgar space,
In n dimensions flourished unrestricted.
James Clerk Maxwell, Poem for Arthur Cayley
Fix an integer n > 0. An n-dimensional vector is simply an ordered list of n real numbers
x_1, x_2, . . . , x_n, which we write in a column:
$$\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix}.$$
It saves space if we write x instead of this column. We will use this convention for the whole
course: whenever we have a vector x, we will write its ith entry as x_i. We will also say that x_i is
the ith coordinate of the vector x.
It is standard to use bold typeface for vectors. But in handwriting, it is hard to write bold
symbols like x. Thus, we indicate a vector by underlining instead: x. The point of this convention
is to make sure we do not get vectors confused with scalars (elements of R). So do not forget:
in your work, always underline vectors.
The set of all n-dimensional vectors is called Rn. Often elements of Rn are written in the
horizontal notation (x_1, x_2, . . . , x_n) instead of as a column. But we are going to use column notation.
Remark 1.7. The elements of Rn are often called vectors or points.
When are two vectors equal? The rule is that for two vectors x and y in Rn , we have
x = y ⇐⇒ xi = yi for all i ∈ {1, . . . , n}.
For any two vectors x and y in Rn, we can add them to get a third vector x + y defined as
follows:
$$x + y = \begin{pmatrix} x_1 + y_1\\ x_2 + y_2\\ \vdots\\ x_n + y_n \end{pmatrix}.$$
Similarly, any vector x in Rn can be multiplied by any scalar a ∈ R to get another vector ax ∈ Rn,
which is defined by
$$ax = \begin{pmatrix} ax_1\\ ax_2\\ \vdots\\ ax_n \end{pmatrix}.$$
An important vector in Rn is the zero vector, all of whose entries are zeroes. We will denote it by 0.
Lemma 1.8. Let x, y, z be vectors in Rn , and let a and b be numbers in R. Then
(1) (x + y) + z = x + (y + z);
(2) x + y = y + x;
(3) x + 0 = x;

(4) a(bx) = (ab)x;


(5) 1x = x;
(6) a(x + y) = ax + ay;
(7) (a + b)x = ax + bx;
(8) 0x = 0.

Proof. This is a series of routine checks using the definitions. Let us prove only (6). By the
definition of vector addition followed by the definition of scalar multiplication, we have
$$a(x + y) = a\begin{pmatrix} x_1 + y_1\\ x_2 + y_2\\ \vdots\\ x_n + y_n \end{pmatrix} = \begin{pmatrix} a(x_1 + y_1)\\ a(x_2 + y_2)\\ \vdots\\ a(x_n + y_n) \end{pmatrix}.$$
On the other hand, since a(p + q) = ap + aq for every p and q in R, and by the definitions of
vector addition and scalar multiplication, we have
$$\begin{pmatrix} a(x_1 + y_1)\\ a(x_2 + y_2)\\ \vdots\\ a(x_n + y_n) \end{pmatrix} = \begin{pmatrix} ax_1 + ay_1\\ ax_2 + ay_2\\ \vdots\\ ax_n + ay_n \end{pmatrix} = \begin{pmatrix} ax_1\\ ax_2\\ \vdots\\ ax_n \end{pmatrix} + \begin{pmatrix} ay_1\\ ay_2\\ \vdots\\ ay_n \end{pmatrix} = a\begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} + a\begin{pmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{pmatrix} = ax + ay.$$
This shows that a(x + y) = ax + ay, as required. □

We write the vector (−1)x as −x. It is the vector whose ith entry is −xi , and it satisfies

−x + x = 0.

Similarly, we write x + (−y) as x − y.

Exercise 1.9. Prove Lemma 1.8(2).



1.c. Distances, dot products and angles.


The shortest distance between two points is a straight line.
Archimedes
Fix an integer n > 0. Then the length of a vector x ∈ Rn is
$$\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}. \tag{1.10}$$
This is a definition! If n = 2 or n = 3, this formula follows from Pythagoras's theorem.
Note that (1.10) gives us the distance between the origin and any other point of Rn. More generally,
the distance between two points x and y in Rn is defined to be ‖y − x‖.
Lemma 1.11. Let x and y be points in Rn, and let a ∈ R. Then the following assertions hold:
(1) ‖x‖ ≥ 0, with equality if and only if x = 0;
(2) ‖x − y‖ ≥ 0, with equality if and only if x = y;
(3) ‖ax‖ = |a| ‖x‖;
(4) ‖y − x‖ = ‖x − y‖.
Proof. To prove the assertion (1), observe that the length ‖x‖ is the square root of a nonnegative real
number (by definition), so that ‖x‖ ≥ 0. Now
$$\|x\| = 0 \iff \sqrt{\sum_{i=1}^{n} x_i^2} = 0 \iff \sum_{i=1}^{n} x_i^2 = 0 \iff x = 0.$$
Thus, we see that ‖x‖ = 0 ⟺ x = 0, as claimed.
To prove (2), simply replace x by x − y in (1).
To prove (3), observe that
$$\|ax\| = \sqrt{\sum_{i=1}^{n} (ax_i)^2} = \sqrt{\sum_{i=1}^{n} a^2x_i^2} = |a|\sqrt{\sum_{i=1}^{n} x_i^2} = |a|\,\|x\|.$$
Finally, part (4) follows from (3) by taking a = −1 and replacing x by x − y. □
For two vectors x and y in Rn, their dot product x · y ∈ R is defined by
$$x \cdot y = \sum_{i=1}^{n} x_iy_i.$$
Note that x · y is a scalar (not a vector). Because of this, x · y is often called the scalar product.
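As a small numerical illustration (not from the notes), lengths, distances and dot products are easy to compute with NumPy; the hypothetical vectors below are chosen so that their dot product happens to be zero.

import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([4.0, 0.0, -2.0])

length_x = np.sqrt(np.sum(x**2))   # the definition (1.10); same as np.linalg.norm(x)
distance = np.linalg.norm(y - x)   # distance between the points x and y
dot      = np.dot(x, y)            # x . y = sum of x_i * y_i

print(length_x, distance, dot)     # 3.0  5.385...  0.0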
Lemma 1.12. Let x, y, z ∈ Rn and a ∈ R. Then
(1) x · y = y · x;
(2) x · (y + z) = x · y + x · z;
(3) x · 0 = 0;
(4) x · (ay) = a(x · y);
(5) ‖x‖ = √(x · x).

Proof. We will only prove (2). Write w = y + z. Then x · (y + z) = x · w. But
$$x \cdot w = \sum_{i=1}^{n} x_iw_i$$
by the definition of the dot product. Moreover, we have
$$\sum_{i=1}^{n} x_iw_i = \sum_{i=1}^{n} x_i(y_i + z_i)$$
by the definition of vector addition. Furthermore, we have
$$\sum_{i=1}^{n} x_i(y_i + z_i) = \sum_{i=1}^{n} (x_iy_i + x_iz_i)$$
since p(q + r) = pq + pr for all real numbers p, q and r. Thus, we have
$$x \cdot (y + z) = \sum_{i=1}^{n} x_iy_i + \sum_{i=1}^{n} x_iz_i = x \cdot y + x \cdot z,$$
as required. □
Exercise 1.13. Prove Lemma 1.12(4).
You may have learned in school that whenever x and y are vectors in R2 or R3, we have
$$x \cdot y = \|x\|\,\|y\|\cos\theta,$$
where θ is the angle between x and y. Since |cos θ| ≤ 1 for all θ, it follows that
$$|x \cdot y| \le \|x\|\,\|y\|$$
for all x and y in R2 or R3. When does equality hold? We have
$$|x \cdot y| = \|x\|\,\|y\| \iff \text{the points } 0, x, y \text{ are collinear},$$
where collinear means they lie on a straight line. This fact can be generalized to all dimensions.
However, first we need to define a line in Rn. In this course all lines are straight.
Definition 1.14. A line in Rn is a set of points in Rn that is given by
$$x + t(y - x)$$
when t runs through R, where x and y are fixed points in Rn.
The line in this definition contains both points x and y.
Lemma 1.15 (Cauchy–Schwarz inequality). For any two vectors x and y in Rn, one has
$$|x \cdot y| \le \|x\|\,\|y\|,$$
with equality if and only if the points 0, x and y are collinear.
Proof. If x = 0 or y = 0, then both sides of the inequality are 0 and the points 0, x and y are
collinear. Thus, we may assume that x ≠ 0 ≠ y.
Then the points 0, x and y are collinear if and only if x = ay for some a ∈ R. If x = ay, then
$$x \cdot y = ay \cdot y = a\|y\|^2,$$
which implies that
$$a = \frac{x \cdot y}{\|y\|^2}.$$
Hence 0, x and y are collinear if and only if
$$x = \frac{x \cdot y}{\|y\|^2}\,y. \tag{1.16}$$
We do not know whether 0, x and y are collinear. But in any case, we can consider the distance
between the left-hand side and the right-hand side of (1.16). We have
$$0 \le \left\|x - \frac{x \cdot y}{\|y\|^2}\,y\right\|^2 = \left(x - \frac{x \cdot y}{\|y\|^2}\,y\right)\cdot\left(x - \frac{x \cdot y}{\|y\|^2}\,y\right) = \|x\|^2 - 2\,\frac{(x \cdot y)^2}{\|y\|^2} + \frac{(x \cdot y)^2}{\|y\|^2} = \frac{1}{\|y\|^2}\Bigl(\|x\|^2\|y\|^2 - (x \cdot y)^2\Bigr)$$
by Lemmas 1.11 and 1.12. Thus, rearranging, we get
$$(x \cdot y)^2 \le \|x\|^2\|y\|^2.$$
Taking square roots on both sides gives |x · y| ≤ ‖x‖ ‖y‖, as required. Equality holds if and only if
$$x - \frac{x \cdot y}{\|y\|^2}\,y = 0.$$
But we have already shown that this is equivalent to the condition that 0, x and y are collinear. □
The triangle inequality says that for any triangle, the length of each side is less than or equal to
the sum of the lengths of the other two sides. Now let x and y be in R2, and think about the triangle
with vertices 0, x and x + y. Then, in this case, we have
• the distance from 0 to x is ‖x − 0‖ = ‖x‖;
• the distance from x to x + y is ‖(x + y) − x‖ = ‖y‖;
• the distance from 0 to x + y is ‖(x + y) − 0‖ = ‖x + y‖;
so that ‖x + y‖ ≤ ‖x‖ + ‖y‖. We have a geometrically plausible argument for why this is true.
The same is true in all dimensions by

Lemma 1.17 (Triangle inequality). For every x and y in Rn, one has ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. Using the Cauchy–Schwarz inequality, we see that
$$\|x + y\|^2 = (x + y)\cdot(x + y) = \|x\|^2 + 2\,x \cdot y + \|y\|^2 \le \|x\|^2 + 2|x \cdot y| + \|y\|^2 \le \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = \bigl(\|x\| + \|y\|\bigr)^2.$$
Taking square roots on both sides gives the result. □

The proof of this result demonstrates an important lesson:
it is often easier to work with squares of distances than with distances themselves,
because we can expand a squared distance ‖v‖² as a dot product v · v.
Exercise 1.18. Let x and y be two vectors in Rn. Prove that
$$\|x + y\|^2 + \|x - y\|^2 = 2\bigl(\|x\|^2 + \|y\|^2\bigr).$$

If x and y are nonzero vectors in R2 with angle θ between them, we have
$$x \cdot y = \|x\|\,\|y\|\cos\theta.$$
We can use this equality to define angles between any nonzero vectors in Rn, where n is arbitrary.

Definition 1.19. Let x and y be nonzero vectors in Rn. The angle between x and y is the unique
real number θ such that
$$\cos\theta = \frac{x \cdot y}{\|x\|\,\|y\|}$$
and 0 ≤ θ ≤ π.

Using Lemma 1.15 and basic properties of cosine, we see that this definition makes sense.
√   √ 
√ 2 −√2
Example 1.20. Let x = √ 2  and y = − √ 2 . Then x · y = 8 and kxk = kyk = 4, so that

12 12
8 1
cos θ = = ,
4·4 2
π
where θ is the angle θ between x and y. Then θ = 3 .
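Here is a quick numerical check of Example 1.20 (an illustrative sketch, not part of the notes).

import numpy as np

x = np.array([np.sqrt(2), np.sqrt(2), np.sqrt(12)])
y = np.array([-np.sqrt(2), -np.sqrt(2), np.sqrt(12)])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)                                     # 0.5 (up to rounding)
print(np.isclose(np.arccos(cos_theta), np.pi / 3))   # True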
Exercise 1.21. Without doing any calculation, prove that the angle between the vectors
$$\begin{pmatrix}4\\ 1\\ 120\\ 8\\ 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}6\\ 66\\ 2\\ 30\\ 5\end{pmatrix}$$
in R5 is less than π/2.
Two vectors x and y in Rn are said to be orthogonal if x · y = 0. This happens exactly when
• either x = 0;
• or y = 0;
• or the angle between x and y is π/2.
Note that orthogonal is just a sexy word for perpendicular.
Lemma 1.22 (Pythagoras). Let x and y be orthogonal vectors in Rn. Then
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2.$$
Proof. Using the hypothesis that x · y = 0, we have
$$\|x + y\|^2 = (x + y)\cdot(x + y) = x \cdot x + 2\,x \cdot y + y \cdot y = \|x\|^2 + \|y\|^2,$$
as required. □
Now, we will (very) temporarily restrict ourselves to three-dimensional space R3.

Definition 1.23. Let x and y be vectors in R3. Then their cross product x × y is a vector in R3
which is defined as follows:
$$x \times y = \begin{pmatrix} x_2y_3 - x_3y_2\\ x_3y_1 - x_1y_3\\ x_1y_2 - x_2y_1 \end{pmatrix}.$$
In this course, three-dimensional space plays no special role. Thus, we will hardly use the cross
product at all. But you may need it for other courses.
Lemma 1.24. Let x, y, z be vectors in R3, and let a ∈ R. Then
(1) x × y = −(y × x);
(2) x × x = 0;
(3) x × (y + z) = (x × y) + (x × z);
(4) x × (ay) = a(x × y);
(5) (x × y) · x = 0 = (x × y) · y;
(6) ‖x × y‖ = ‖x‖ ‖y‖ sin θ, where θ is the angle between x and y.

Proof. Let us just prove (5) and (6). To prove (5), observe that
$$(x \times y)\cdot x = \begin{pmatrix} x_2y_3 - x_3y_2\\ x_3y_1 - x_1y_3\\ x_1y_2 - x_2y_1 \end{pmatrix}\cdot\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix} = (x_2y_3 - x_3y_2)x_1 + (x_3y_1 - x_1y_3)x_2 + (x_1y_2 - x_2y_1)x_3 = x_1x_2y_3 - x_1x_3y_2 + x_2x_3y_1 - x_1x_2y_3 + x_1x_3y_2 - x_2x_3y_1 = 0,$$
as required. Similarly, we have
$$\|x \times y\|^2 + (x \cdot y)^2 = (x_2y_3 - x_3y_2)^2 + (x_3y_1 - x_1y_3)^2 + (x_1y_2 - x_2y_1)^2 + (x_1y_1 + x_2y_2 + x_3y_3)^2.$$
Now, using elementary algebraic manipulations, we get
$$\|x \times y\|^2 + (x \cdot y)^2 = \bigl(x_1^2 + x_2^2 + x_3^2\bigr)\bigl(y_1^2 + y_2^2 + y_3^2\bigr) = \|x\|^2\|y\|^2,$$
which implies that
$$\|x \times y\|^2 = \|x\|^2\|y\|^2 - (x \cdot y)^2 = \|x\|^2\|y\|^2 - \bigl(\|x\|\|y\|\cos\theta\bigr)^2 = \bigl(\|x\|\|y\|\sin\theta\bigr)^2.$$
Since 0 ≤ θ ≤ π, we have sin θ ≥ 0. Thus, taking square roots, we get (6). □

In particular, we see that x × y is orthogonal to x and y.

Example 1.25. Let x and y be the vectors from Example 1.20. Then
$$x \times y = \begin{pmatrix} \sqrt{2}\sqrt{12} + \sqrt{12}\sqrt{2}\\ -\sqrt{12}\sqrt{2} - \sqrt{2}\sqrt{12}\\ -\sqrt{2}\sqrt{2} + \sqrt{2}\sqrt{2} \end{pmatrix} = \begin{pmatrix} 2\sqrt{24}\\ -2\sqrt{24}\\ 0 \end{pmatrix}.$$
This vector is indeed orthogonal to both x and y, since its dot product with x and its dot product
with y are both zero. Moreover, we have
$$\|x \times y\| = \left\|\begin{pmatrix} 2\sqrt{24}\\ -2\sqrt{24}\\ 0 \end{pmatrix}\right\| = \sqrt{\bigl(2\sqrt{24}\bigr)^2 + \bigl(2\sqrt{24}\bigr)^2} = \sqrt{96 + 96} = \sqrt{192}.$$
On the other hand, using the values of ‖x‖, ‖y‖ and θ that we found in Example 1.20, we get
$$\|x\|\,\|y\|\sin\theta = 4\cdot 4\cdot\sin\frac{\pi}{3} = 16\cdot\frac{\sqrt{3}}{2} = 8\sqrt{3} = \sqrt{8^2\times 3} = \sqrt{192},$$
which also follows from Lemma 1.24(6).

Exercise 1.26. Find a nonzero vector in R3 orthogonal to both
$$\begin{pmatrix} -2\\ 1\\ 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 3\\ 4\\ -2 \end{pmatrix}.$$
Please do not forget that the cross product is defined only in the three-dimensional space R3.
On the other hand, the dot product is defined in Rn for any n ≥ 1.

1.d. Matrices and operations with them.


It is my experience that proofs involving matrices can be
shortened by 50% if one throws the matrices out.
Emil Artin
Fix positive integers m and n. An m × n real matrix consists of mn real numbers a_{ij},
where i ∈ {1, . . . , m} and j ∈ {1, . . . , n}. We usually visualize these mn numbers by arranging
them in a grid with m rows and n columns:
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}.$$
Let us call this matrix A. We refer to a_{ij} as the (i, j)-entry of A. Sometimes we write
$$A = (a_{ij})_{1\le i\le m,\ 1\le j\le n}$$
to mean that A is an m × n matrix with (i, j)-entry called a_{ij}. But usually we just write A = (a_{ij}).
An m-dimensional vector is just an m × 1 matrix. For us, vectors are by default column vectors,
but you can also consider row vectors. By definition, an n-dimensional row vector is a 1 × n matrix.
Remark 1.27. A 1 × 1 matrix is just a scalar (a real number).
There are several algebraic operations on matrices. To define them, we will use the convention
that the (i, j)-entry of a matrix M is written as M_{ij}. Then we define these operations as follows.
• Given m × n matrices A and B, we define an m × n matrix A + B by
$$(A + B)_{ij} = A_{ij} + B_{ij}$$
for every i and j such that 1 ≤ i ≤ m and 1 ≤ j ≤ n.
• Given an m × n matrix A and a scalar c ∈ R, we define an m × n matrix cA by
$$(cA)_{ij} = c \cdot A_{ij}$$
for every i and j such that 1 ≤ i ≤ m and 1 ≤ j ≤ n.
• Given an m × n matrix A and an n × p matrix B, we define an m × p matrix AB by
$$(AB)_{ik} = \sum_{j=1}^{n} A_{ij}B_{jk}$$
for every i and k such that 1 ≤ i ≤ m and 1 ≤ k ≤ p. So, the (i, k)-entry of AB is
$$A_{i1}B_{1k} + A_{i2}B_{2k} + \cdots + A_{in}B_{nk}.$$
It should be pointed out that two matrices can only be multiplied if the number of columns in the
first is equal to the number of rows in the second. Remember this! In particular, we can always
multiply any two n × n matrices to get another n × n matrix.
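The dimension bookkeeping is easy to experiment with. The following sketch (not from the notes) multiplies a 2 × 3 matrix by a 3 × 2 matrix with NumPy; the entries are made up purely for illustration.

import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])      # 3 x 2

print((A @ B).shape)        # (2, 2): the columns of A match the rows of B
print(A @ B)                # [[ 4  5]
                            #  [10 11]]
# A @ A would raise an error: a 2 x 3 matrix cannot be multiplied by a 2 x 3 matrix.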
One particularly important matrix is the m × n matrix all of whose entries are 0. We call this
matrix 0. Another important matrix is the n × n identity matrix
$$I_n = \begin{pmatrix}
1 & 0 & \cdots & 0\\
0 & 1 & \ddots & \vdots\\
\vdots & \ddots & \ddots & 0\\
0 & \cdots & 0 & 1
\end{pmatrix}.$$

When it is clear which n is meant, we write In as just I.


Matrix multiplication also gives us something new to do with vectors:
• if A is an m × n matrix and x is an n-dimensional vector (thought of as an n × 1 matrix), the
definition above gives us the matrix product Ax, which is an m × 1 matrix considered as an
m-dimensional vector, whose entries are
$$(Ax)_i = \sum_{j=1}^{n} A_{ij}x_j \tag{1.28}$$
for every i such that 1 ≤ i ≤ m.


Lemma 1.29. The following assertions hold:
(1) (A + B) + C = A + (B + C) for any m × n matrices A, B and C;
(2) A + B = B + A for any m × n matrices A and B;
(3) A + 0 = A for any matrix A;
(4) c(A + B) = cA + cB for any m × n matrices A and B and scalar c ∈ R;
(5) for any m × n matrix A, any n × p matrix B, and any p × q matrix C, one has (AB)C = A(BC);
(6) AIn = A = Im A for any m × n matrix A;
(7) A(B + C) = AB + AC for any m × n matrix A and n × p matrices B and C;
(8) (A + B)C = AC + BC for any m × n matrices A and B and n × p matrix C;
(9) c(AB) = (cA)B = A(cB) for any m × n matrix A, any n × p matrix B, and scalar c.
Proof. We will prove only (5), using the following convention: the (i, j)-entry of a matrix M is
written as M_{ij}. Observe that both (AB)C and A(BC) are m × q matrices. Let us show that they
also have the same entries. We must show that
$$\bigl((AB)C\bigr)_{i\ell} = \bigl(A(BC)\bigr)_{i\ell}$$
for every i ∈ {1, . . . , m} and ℓ ∈ {1, . . . , q}. Observe that
$$\bigl((AB)C\bigr)_{i\ell} = \sum_{k=1}^{p} (AB)_{ik}C_{k\ell} = \sum_{k=1}^{p}\left(\sum_{j=1}^{n} A_{ij}B_{jk}\right)C_{k\ell} = \sum_{k=1}^{p}\sum_{j=1}^{n} A_{ij}B_{jk}C_{k\ell}.$$
On the other hand, we have
$$\bigl(A(BC)\bigr)_{i\ell} = \sum_{j=1}^{n} A_{ij}(BC)_{j\ell} = \sum_{j=1}^{n} A_{ij}\left(\sum_{k=1}^{p} B_{jk}C_{k\ell}\right) = \sum_{j=1}^{n}\sum_{k=1}^{p} A_{ij}B_{jk}C_{k\ell}.$$
Since we can change the order of summation, it follows that
$$\bigl((AB)C\bigr)_{i\ell} = \bigl(A(BC)\bigr)_{i\ell},$$
as required. □
Exercise 1.30. Prove Lemma 1.29(7).
Lemma 1.29 is a result about matrices in general, but we can apply it in the special case where
some of the matrices are vectors. For instance, it tells us that
$$A(x + y) = Ax + Ay,\qquad A(cx) = cAx,\qquad (A + B)x = Ax + Bx \tag{1.31}$$
whenever A, B, x and y are matrices and vectors of the appropriate sizes, and c is a scalar.

Remark 1.32. Matrix multiplication is not commutative! It is not always true that
AB = BA
for matrices A and B. For example, you can check that
$$\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}\begin{pmatrix}0 & 0\\ 1 & 1\end{pmatrix} = \begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix} \ne \begin{pmatrix}0 & 0\\ 1 & 1\end{pmatrix} = \begin{pmatrix}0 & 0\\ 1 & 1\end{pmatrix}\begin{pmatrix}1 & 1\\ 0 & 0\end{pmatrix}.$$
Exercise 1.33. For every square matrix M , its trace, denoted by tr(M ), is the sum of its diagonal
elements. Let A and B be n × n matrices. Show that tr(AB) = tr(BA).
In this course, we will write e_1, e_2, . . . , e_n for the vectors in Rn defined by
$$e_1 = \begin{pmatrix}1\\ 0\\ 0\\ \vdots\\ 0\end{pmatrix},\quad e_2 = \begin{pmatrix}0\\ 1\\ 0\\ \vdots\\ 0\end{pmatrix},\quad \ldots,\quad e_n = \begin{pmatrix}0\\ 0\\ 0\\ \vdots\\ 1\end{pmatrix}. \tag{1.34}$$
We will also write
$$A = (x_1\,|\,x_2\,|\,\cdots\,|\,x_n)$$
to mean that the columns of the matrix A are x_1, x_2, . . . , x_n. For instance, if A is an m × n matrix
with (i, j)-entry written as A_{ij}, then x_1 ∈ Rm is given by
$$x_1 = \begin{pmatrix}A_{11}\\ A_{21}\\ \vdots\\ A_{m1}\end{pmatrix}.$$
Lemma 1.35. Let A = (x_1 | x_2 | · · · | x_n) be an m × n matrix. Then
(1) Ay = y_1x_1 + y_2x_2 + · · · + y_nx_n for any vector y ∈ Rn;
(2) Ae_j = x_j for any j ∈ {1, . . . , n}, so that Ae_j is the jth column of A;
(3) A(y_1 | y_2 | · · · | y_p) = (Ay_1 | Ay_2 | · · · | Ay_p) for any y_1, . . . , y_p ∈ Rn.

Proof. We will prove only (1) and (2). To prove (1), observe that Ay is an m × 1 matrix, which
we consider as an m-dimensional vector. We have to check that Ay and
$$y_1x_1 + y_2x_2 + \cdots + y_nx_n = \sum_{i=1}^{n} y_ix_i$$
have the same entries. Let i be an integer such that 1 ≤ i ≤ m. Then, using (1.28), we see that
$$(Ay)_i = \sum_{j=1}^{n} A_{ij}y_j.$$
On the other hand, the ith entry of the vector y_1x_1 is the number y_1A_{i1}. Similarly, we see that
the ith entry of the vector y_jx_j is y_jA_{ij} for every j such that 1 ≤ j ≤ n, so that
$$(y_1x_1 + y_2x_2 + \cdots + y_nx_n)_i = y_1A_{i1} + y_2A_{i2} + \cdots + y_nA_{in} = \sum_{j=1}^{n} A_{ij}y_j.$$
So the ith entries of the two sides are equal, as required. This proves (1).
To prove (2), it is enough to just put y = e_j in (1). □

The transpose of an m × n matrix A is the n × m matrix, denoted by A^T, whose (j, i)-entry is
the (i, j)-entry of the matrix A for every i and j such that 1 ≤ i ≤ m and 1 ≤ j ≤ n. Example:
$$\begin{pmatrix}1 & -2 & 4\\ -3 & 5 & 6\end{pmatrix}^{T} = \begin{pmatrix}1 & -3\\ -2 & 5\\ 4 & 6\end{pmatrix}.$$
A square matrix A = (a_{ij}) is said to be symmetric if A^T = A. Example:
$$\begin{pmatrix}4 & -7 & 2\\ -7 & 99 & 12\\ 2 & 12 & -10\end{pmatrix}.$$
A square matrix A = (a_{ij}) is said to be skew symmetric if A^T = −A. Example:
$$\begin{pmatrix}0 & -7 & 2\\ 7 & 0 & 12\\ -2 & -12 & 0\end{pmatrix}.$$
Note that the diagonal entries a_{ii} of a skew symmetric matrix A = (a_{ij}) must all be 0.
Exercise 1.36. Show that every square matrix can be written as the sum of a symmetric matrix
and a skew symmetric matrix.
Lemma 1.37. Let x and y be vectors in Rn. Then x · y = x^T y.

Proof. The right-hand side is the 1 × 1 matrix
$$\begin{pmatrix}x_1 & x_2 & \cdots & x_n\end{pmatrix}\begin{pmatrix}y_1\\ y_2\\ \vdots\\ y_n\end{pmatrix} = x_1y_1 + x_2y_2 + \cdots + x_ny_n.$$
But a 1 × 1 matrix is just a scalar, and the scalar here is exactly the dot product x · y. □
Here are some further useful properties of transposes.
Lemma 1.38. The following assertions hold:
(1) (A + B)T = AT + B T for all m × n matrices A and B;
(2) (cA)T = cAT for every matrix A and scalar c;
(3) (AB)T = B T AT for every m × n matrix A and n × p matrix B;
(4) (AT )T = A.
Proof. We will prove only (3), using the following convention: the (i, j)-entry of a matrix M is
written as M_{ij}. Observe that (AB)^T and B^T A^T are p × m matrices. We have to show that their
entries are the same. Let k ∈ {1, . . . , p} and i ∈ {1, . . . , m}. Then
$$\bigl((AB)^T\bigr)_{ki} = (AB)_{ik} = \sum_{j=1}^{n} A_{ij}B_{jk}.$$
On the right-hand side, we also have
$$\bigl(B^TA^T\bigr)_{ki} = \sum_{j=1}^{n} (B^T)_{kj}(A^T)_{ji} = \sum_{j=1}^{n} B_{jk}A_{ij}.$$
So the (k, i)-entries of the matrices (AB)^T and B^T A^T are equal, as required. □

Now, we temporarily consider only square n × n matrices with real entries. The set they
form is usually denoted by Mn(R). Then
• the sum of any two matrices in Mn (R) is a matrix in Mn (R);
• the product of any two matrices in Mn (R) is a matrix in Mn (R).
Thus, we see that n × n matrices behave like real numbers, where zero matrix plays the role of 0,
and the identity matrix In plays the role of 1. However, there are two big differences:
(a) the multiplication in Mn (R) is not commutative (see Remark 1.32);
(b) we cannot always divide one matrix in Mn (R) by another non-zero matrix in Mn (R).
For example, we cannot divide I2 by the matrices in Remark 1.32. Indeed, if we could divide I2 by the first of them, we would get
$$\begin{pmatrix}1&1\\-1&-1\end{pmatrix}
= \begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}1&1\\-1&-1\end{pmatrix}
= \left(\frac{\begin{pmatrix}1&0\\0&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&0\end{pmatrix}}\begin{pmatrix}1&1\\0&0\end{pmatrix}\right)\begin{pmatrix}1&1\\-1&-1\end{pmatrix}
= \frac{\begin{pmatrix}1&0\\0&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&0\end{pmatrix}}\left(\begin{pmatrix}1&1\\0&0\end{pmatrix}\begin{pmatrix}1&1\\-1&-1\end{pmatrix}\right)
= \frac{\begin{pmatrix}1&0\\0&1\end{pmatrix}}{\begin{pmatrix}1&1\\0&0\end{pmatrix}}\begin{pmatrix}0&0\\0&0\end{pmatrix}
= \begin{pmatrix}0&0\\0&0\end{pmatrix},$$
and dividing I2 by the second matrix of Remark 1.32 would give
$$\begin{pmatrix}1&1\\-1&-1\end{pmatrix} = \begin{pmatrix}0&0\\0&0\end{pmatrix}$$
by the same computation with $\left(\begin{smallmatrix}0&0\\1&1\end{smallmatrix}\right)$ in place of $\left(\begin{smallmatrix}1&1\\0&0\end{smallmatrix}\right)$,
because every matrix in Mn(R) multiplied by the zero matrix in Mn(R) is the zero matrix in Mn(R).
Both conclusions are absurd, since the matrix on the left-hand side is not the zero matrix.
Definition 1.39. An n × n matrix A is invertible if there exists an n × n matrix B such that
AB = In = BA.
It is called singular if it is not invertible.

If A is an invertible matrix in Mn(R), then there can be only one matrix B such that AB = In
and BA = In. Indeed, if C is a matrix in Mn(R) such that AC = In and CA = In, then
B = BIn = B(AC) = (BA)C = InC = C,
so that B = C. We call this matrix B the inverse of the matrix A, and write B as A⁻¹.
Example 1.40. The following matrix is invertible:
$$\begin{pmatrix}1 & -2\\ -3 & 5\end{pmatrix}.$$
Indeed, we have
$$\begin{pmatrix}1 & -2\\ -3 & 5\end{pmatrix}\begin{pmatrix}-5 & -2\\ -3 & -1\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \begin{pmatrix}-5 & -2\\ -3 & -1\end{pmatrix}\begin{pmatrix}1 & -2\\ -3 & 5\end{pmatrix}.$$
If a, b and c are non-zero real numbers, then
$$\begin{pmatrix}a & 0 & 0\\ 0 & b & 0\\ 0 & 0 & c\end{pmatrix}^{-1} = \begin{pmatrix}\tfrac{1}{a} & 0 & 0\\ 0 & \tfrac{1}{b} & 0\\ 0 & 0 & \tfrac{1}{c}\end{pmatrix}.$$
Example 1.41. Let A be the 2 × 2 matrix
$$\begin{pmatrix}a & b\\ c & d\end{pmatrix}.$$
If ad − bc ≠ 0, then A is invertible and
$$A^{-1} = \frac{1}{ad - bc}\begin{pmatrix}d & -b\\ -c & a\end{pmatrix}.$$
Check this! The number ad − bc is called the determinant of A.
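A quick way to convince yourself of this formula is to test it numerically; the sketch below (not part of the notes) uses the matrix from Example 1.40.

import numpy as np

a, b, c, d = 1.0, -2.0, -3.0, 5.0
A = np.array([[a, b], [c, d]])

det = a * d - b * c                                  # the determinant, here -1
A_inv = (1.0 / det) * np.array([[d, -b], [-c, a]])   # the formula from Example 1.41

print(A_inv)                                         # [[-5. -2.], [-3. -1.]]
print(np.allclose(A @ A_inv, np.eye(2)))             # True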
Exercise 1.42. Let A and B be 2 × 2 matrices. Show that
det(AB) = det(A)det(B).
Prove that det(A) ≠ 0 if and only if A is invertible.

If A and B are two n × n matrices such that AB = In, then BA = In, so that we only need to
check one equality in Definition 1.39. We will prove this later.

Exercise 1.43. Find an m × n matrix A and an n × m matrix B such that AB = Im but BA ≠ In.
Show that this cannot happen when m = n = 2.
Lemma 1.44. The following assertions hold:
(1) Let A and B be invertible n × n matrices. Then AB is also invertible, and
$$(AB)^{-1} = B^{-1}A^{-1}.$$
(2) The identity matrix In is invertible, with inverse In.

Proof. Observe that
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AI_nA^{-1} = AA^{-1} = I_n.$$
Similarly, we see that (B⁻¹A⁻¹)(AB) = In. This proves (1). Part (2) follows from InIn = In. □

Our final lemma in this section connects transpose and inverse.

Lemma 1.45. Let A be an invertible n × n matrix. Then A^T is also invertible. Moreover, we have
$$(A^T)^{-1} = (A^{-1})^T.$$

Proof. Using Lemma 1.38(3), we see that
$$(A^{-1})^T A^T = (AA^{-1})^T = I_n^T = I_n.$$
Similarly, we see that A^T(A⁻¹)^T = In. □



1.e. Complex numbers and finite fields.


The only reason that we like complex numbers is that we don’t like real numbers.
Bernd Sturmfels
So far we only used real numbers. Surprisingly, almost everything we described so far (except for
the distances and angles in Section 1.c) could be repeated almost verbatim replacing real numbers
by other numbers, which are important for various reasons. Among all of them, the most useful are
complex numbers C and finite fields Fq consisting of q = pn elements, where p is a prime number,
and n is a positive integer. Let us explain what these numbers are.
To obtain complex numbers from R, we begin by adjoining to R a new element i such that
$$i^2 = -1.$$
Because we want to be able to add and multiply complex numbers, we must consider elements
$$a + bi$$
for each a and b in R. Every complex number looks like this. In fact, we can define C to be the set
consisting of all such sums, so that for every z ∈ C, there are unique real numbers a and b such
that z = a + bi. Then we extend addition and multiplication from R to C using intuitive rules:
$$(a + bi)(c + di) = ac + bdi^2 + adi + bci = ac - bd + (ad + bc)i$$
and (a + bi) + (c + di) = (a + c) + (b + d)i, where a, b, c and d are any real numbers.
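Python's built-in complex numbers follow exactly this rule, which gives a quick way to check it on a couple of made-up values (an illustration, not from the notes).

a, b, c, d = 2.0, 3.0, -1.0, 4.0

z = complex(a, b)                        # a + bi
w = complex(c, d)                        # c + di

print(z * w)                             # (-14+5j)
print(complex(a*c - b*d, a*d + b*c))     # (-14+5j), i.e. (ac - bd) + (ad + bc)i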
Now we can work with complex numbers absolutely in the same way as with real numbers.
We can define vectors over C and matrices over C absolutely as we did in Sections 1.b and 1.d.
Do we need this? Are complex numbers better than real ones? Yes, they are. They are much better:

Theorem 1.46 (Fundamental theorem of algebra). Let
$$f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$
be a polynomial such that each a_i is a complex number, a_n ≠ 0 and n ≥ 1. Then there exists
a complex number λ, called a root of f(x), such that f(λ) = 0.

This fantastic result easily implies the following

Corollary 1.47. Let f(x) = a_nx^n + a_{n-1}x^{n-1} + · · · + a_1x + a_0 be a polynomial with complex
coefficients such that a_n ≠ 0 and n ≥ 1. Then
$$f(x) = a_n(x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_n)$$
for some complex numbers λ_1, λ_2, . . . , λ_n.
Exercise 1.48. Prove Theorem 1.46 in the case when n = 2.
Let z be a complex number. Then z = a + bi for some real numbers a and b. We can define its
modulus as $|z| = \sqrt{a^2 + b^2}$. Then |z| ≥ 0 for all z ∈ C. Moreover, we have |z| = 0 ⟺ z = 0.
The complex conjugate of the number z is defined to be $\bar z = a - bi$. Then $|z| = \sqrt{z\bar z}$ and
$$\overline{z + w} = \bar z + \bar w,\qquad \overline{z \cdot w} = \bar z\cdot\bar w$$
for every two complex numbers z and w. Check this!
Exercise 1.49. Let f(x) be a polynomial with real coefficients, and let z be a complex number.
Prove that $f(z) = 0 \iff f(\bar z) = 0$.

To describe finite fields, let us explain what the word field means. To do this, consider a set F.
We want to equip it with two operations + (called addition) and ∗ (called multiplication) such that
the set F behaves like the usual (real or complex) numbers. Let us explain what we mean by this.
First of all, the set F should have an element ★ in F that behaves like 0. This means that
$$a + ★ = ★ + a = a$$
for every element a in F. Since ★ is a bulky symbol, let us denote this element simply by 0.
Similarly, the set F should have another element, which we denote by 1, that behaves as the usual 1
in complex or real numbers. This means that
$$a * 1 = 1 * a = a$$
for every element a in F such that a ≠ 0, while 1 ∗ 0 should be 0 of course.
Clearly, we want both operations + and ∗ to be commutative. This means that
$$a + b = b + a \quad\text{and}\quad a * b = b * a$$
for every two elements a and b in F. Similarly, we want
$$(a + b) + c = a + (b + c) \quad\text{and}\quad (a * b) * c = a * (b * c)$$
for every three elements a, b and c of the set F. Moreover, we want
$$a * (b + c) = a * b + a * c$$
for every three elements a, b and c of the set F.
To make + look like the usual addition of real or complex numbers, we should be able to subtract
elements of the set F from each other. Namely, for every two elements a and b, the set F should contain
an element ♣ such that
$$a = ♣ + b,$$
so that we denote ♣ by a − b. In the very special case when a = 0, we denote 0 − b simply by −b.
The same should be true for multiplication: we must be able to divide elements in F, with a single
exception: we cannot divide by 0 (this is one of the mortal sins). Thus, for every two elements a
and b such that b ≠ 0, the set F should contain an element ♠ such that
$$a = ♠ * b,$$
so that we denote ♠ by a/b. If a = 1, then we say that 1/b is the inverse of the element b, and we can
also denote it by b⁻¹.
Definition 1.50. If the set F is equipped with two operations + and ∗ such that the properties
we described are satisfied, we say that F is a field. To be precise, the set F is a field if the following
conditions are satisfied:
(1) the set F must have an element, which is usually denoted by 0, such that
0+a=a+0=a
for any element a in the set F;
(2) (a + b) + c = a + (b + c) for every three elements a, b and c in F;
(3) for every two elements a and b in the set F, there is c ∈ F such that
a = c + b,
so that we denote c by a − b (if a = 0, then we write −b instead of 0 − b);
(4) a + b = b + a for every two elements a and b in F;

(5) the set F must have an element, which is usually denoted by 1, such that
1∗a=a∗1=a
for any element a in the set F such that a ≠ 0;
(6) (a ∗ b) ∗ c = a ∗ (b ∗ c) for every three elements a, b and c in F;
(7) for every two elements a and b in the set F such that b ≠ 0, there is c ∈ F such that
a = c ∗ b,
so that we denote c by a/b (if a = 1, then we often write b⁻¹ instead of 1/b);
(8) a ∗ b = b ∗ a for every two elements a and b in F;
(9) a ∗ (b + c) = a ∗ b + a ∗ c for every a, b and c in F.
Examples of fields are rational numbers, real numbers and complex numbers (equipped with
usual addition and multiplication). These are infinite fields. What about finite fields?
Example 1.51. Let F be the set consisting of the following three symbols: ■, ▲ and ★ (my favorite).
Let us equip F with two operations + and ∗ such that all properties described above are satisfied.
First we have to choose the special element that plays the role of zero. Of course, it should be ★.
Then, using this and Definition 1.50, we obtain the following addition table:

  +   ★   ■   ▲
  ★   ★   ■   ▲
  ■   ■   ▲   ★
  ▲   ▲   ★   ■

Now we should choose an element between ■ and ▲ that will play the role of 1. Let it be ■.
Then we obtain the following multiplication table:

  ∗   ★   ■   ▲
  ★   ★   ★   ★
  ■   ★   ■   ▲
  ▲   ★   ▲   ■

Now one can check that the set F equipped with + and ∗ is a field.
Can we equip every finite set with two operations + and ∗ such that it becomes a field? No.
Exercise 1.52. Prove that there exists no field consisting of 6 elements.
If the set F consists of 3 elements, then we can equip it with two operations + and ∗ such that
the set F becomes a field. This follows from Example 1.51. To construct more finite fields, fix
a prime number p. Let Fp be the set
$$\{0, 1, \ldots, p - 1\}.$$
Equip this set with operations + and ∗ as follows. For every two elements a and b in Fp, we let
a + b = the remainder of the integer a + b when divided by p.
Similarly, we let
a ∗ b = the remainder of the integer a ∗ b when divided by p.
Exercise 1.53. Prove that Fp equipped with + and ∗ is a field.
To divide elements in Fp, it is enough to understand how to find (multiplicative) inverses of the
elements in Fp. For instance, if p = 1973, then
$$\frac{1}{45} = 570,$$
since 45 ∗ 570 = 1 in Fp. Thus, in this case, we have a/45 = a ∗ 570 for every element a in Fp.
But finding inverses can be tricky. The best way to do this is to use the extended Euclidean algorithm.
On the other hand, using Fermat's Little Theorem, one can show that
$$\frac{1}{a} = \underbrace{a * a * \cdots * a * a}_{p - 2 \text{ times}}$$
provided that p ≥ 3 and a ≠ 0 in Fp.

Exercise 1.54. Suppose p = 2063. Find 45/1973 in Fp.
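Both recipes for inverting elements of Fp are one-liners in Python (an illustrative sketch, not part of the notes; pow(a, -1, p) needs Python 3.8 or newer). The values reproduce the example above with p = 1973.

p = 1973
a = 45

inv_fermat = pow(a, p - 2, p)   # Fermat's Little Theorem: 1/a = a^(p-2) in F_p
inv_euclid = pow(a, -1, p)      # the extended Euclidean algorithm, done by Python

print(inv_fermat, inv_euclid)   # 570 570
print((a * inv_fermat) % p)     # 1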
Do we have other finite fields? We already know from Exercise 1.52 that there exists no field consisting
of 6 elements. What about a field consisting of 4 elements? Does it exist? Yes, it does.
Let F be the set consisting of ♣, ♦, ♥, ♠. Let us equip F with + and ∗ such that it becomes
a field. Let ♥ be the zero element in F, so that we can write 0 instead of ♥ if we wish.
Similarly, let ♦ be the element which we usually denote by 1. Then (and this is a good exercise),
we must get the following addition table:

  +   ♥   ♦   ♣   ♠
  ♥   ♥   ♦   ♣   ♠
  ♦   ♦   ♥   ♠   ♣
  ♣   ♣   ♠   ♥   ♦
  ♠   ♠   ♣   ♦   ♥

Similarly, we must get the following multiplication table:

  ∗   ♥   ♦   ♣   ♠
  ♥   ♥   ♥   ♥   ♥
  ♦   ♥   ♦   ♣   ♠
  ♣   ♥   ♣   ♠   ♦
  ♠   ♥   ♠   ♦   ♣
One can check that F equipped with + and ∗ is a field, so that we can call it F4 for consistency.
But there is a better way to represent elements in F4 . Let us describe it.

Do you remember how we earlier constructed C? A similar approach can be used to construct F4 .
Namely, we start with the field F2, which consists of just two elements: 0 and 1. Then we introduce
a new symbol, let us call it t, such that t² = t + 1. Then we consider all sums that look like a + bt,
where a and b are both in F2. Finally, we denote the obtained set by F4, and extend addition and
multiplication from F2 to elements in F4 using simple intuitive rules:
$$(a + bt) * (c + dt) = ac + bdt^2 + adt + bct = (ac + bd) + (ad + bc + bd)t$$
and
$$(a + bt) + (c + dt) = (a + c) + (b + d)t,$$
where a, b, c and d are any elements in F2. This turns F4 into a field with the same addition and
multiplication tables as above, where ♣, ♦, ♥ and ♠ are renamed as follows:
$$♥ \mapsto 0,\qquad ♦ \mapsto 1,\qquad ♣ \mapsto t,\qquad ♠ \mapsto 1 + t.$$
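A minimal sketch of this arithmetic (not part of the notes): elements of F4 are stored as pairs (a, b) standing for a + bt with a, b in F2, and the rule t² = t + 1 is built into the multiplication.

def add(x, y):
    (a, b), (c, d) = x, y
    return ((a + c) % 2, (b + d) % 2)

def mul(x, y):
    # (a + bt)(c + dt) = (ac + bd) + (ad + bc + bd)t, using t^2 = t + 1
    (a, b), (c, d) = x, y
    return ((a*c + b*d) % 2, (a*d + b*c + b*d) % 2)

t = (0, 1)
one = (1, 0)
print(mul(t, t))             # (1, 1), i.e. t * t = 1 + t
print(mul(t, add(one, t)))   # (1, 0), i.e. t * (1 + t) = 1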

For every prime p and every n ∈ N, one can use a similar approach to construct the field Fq that
consists of q = pⁿ elements. The crucial point in constructing F4 was the fact that the polynomial
$$x^2 + x + 1$$
is irreducible over F2, i.e. it cannot be factorized as a product of two other polynomials with
coefficients in F2 that have smaller degrees. Likewise, if we find an irreducible polynomial
$$f(x) = x^n + a_{n-1}x^{n-1} + a_{n-2}x^{n-2} + \cdots + a_2x^2 + a_1x + a_0$$
with each a_i in Fp, we can construct a field Fq with q = pⁿ as all possible sums
$$\alpha_0 + \alpha_1 t + \alpha_2 t^2 + \cdots + \alpha_{n-2}t^{n-2} + \alpha_{n-1}t^{n-1},$$
where each α_i is an element of the field Fp, and t is a symbol that satisfies the following rule:
$$t^n = -\bigl(a_{n-1}t^{n-1} + a_{n-2}t^{n-2} + \cdots + a_2t^2 + a_1t + a_0\bigr).$$
Then we can extend the addition and multiplication operations from Fp to the set Fq in a similar
way as we did earlier in the construction of the field F4.
Exercise 1.55. Construct the field F8 and present its multiplication table.
Let us consider one explicit example. Let p = 2063 and q = 2063². Let us construct the field Fq.
First observe that the polynomial x² + 1 is irreducible over Fp. Indeed, if x² + 1 were reducible,
then we would have
$$x^2 + 1 = (x - a)(x + a)$$
for some a in Fp, so that a² = −1 in Fp. In this case, we say that −1 is a quadratic residue.
But −1 is a quadratic residue in Fp ⟺ the remainder of p when divided by 4 is 1. Note that
$$2063 = 515 \times 4 + 3,$$
which implies that x² + 1 is irreducible over Fp. Now we introduce a symbol t such that t² = −1.
Then we consider all sums that look like
$$a + bt,$$
where a and b are both elements in F2063. Finally, we denote the obtained set by Fq for q = p²,
and extend addition and multiplication from Fp to elements in Fq using the following rules:
$$(a + bt) * (c + dt) = ac + bdt^2 + adt + bct = (ac - bd) + (ad + bc)t$$

and
(a + bt) + (c + dt) = (a + c) + (b + d)t,
where a, b, c and d are any elements in Fp . This turns the set Fq into a field.
Remark 1.56. We will see later that every finite field consists of q = pⁿ elements for some prime
number p and some positive integer n. As we already mentioned, for every prime p and every
positive integer n there exists a field consisting of q = pⁿ elements, which is usually denoted by Fq.
In fact, such a field is unique (up to renaming the elements of the field).

Finite fields play a fundamental role in a number of areas of mathematics and computer
science, including number theory, algebraic geometry, cryptography and coding theory.
Exercise 1.57. Let p be a prime number, let n be a positive integer, and let q = pⁿ. Find how
many 2 × 2 matrices
$$\begin{pmatrix}a & b\\ c & d\end{pmatrix}$$
with ad − bc ≠ 0 there are, where a, b, c and d are in Fq.

2. Basic linear algebra


2.a. Vector subspaces.
Mister Data, there is a subspace communication for you.
Quote from Star Trek, The Next Generation
Let Π be a plane in R3 that passes through the origin. If we take any two points x and y in the
plane Π, then their sum
x+y
is also in Π. In this case, we say that Π is closed under addition. Also, the plane Π is closed under
scalar multiplication: if we take any point x on Π and any scalar λ, then λx is contained in Π too.
By the way, what is a plane in R3 ?
Definition 2.1. A hyperplane in Rn is a set of points x in Rn that is given by
$$\sum_{i=1}^{n} A_ix_i = B,$$
where A_1, . . . , A_n, B are real numbers such that at least one number among A_1, . . . , A_n is not zero.
Hyperplanes in R3 are called planes. Hyperplanes in R2 are lines (see Definition 1.14).
In Definition 2.1, the hyperplane contains the origin if and only if B = 0. If Π is a hyperplane
in Rn that passes through the origin, it has the following three properties:
(1) 0 ∈ Π (obviously),
(2) the hyperplane Π is closed under addition: for every x and y in Π, we have
x + y ∈ Π,
(3) the hyperplane Π is closed under scalar multiplication: for every x ∈ Π and λ ∈ R, we have
λx ∈ Π.
n
If L is a line in R that passes through the origin (see Definition 1.14), then L has the same
three properties: it contains 0 and is closed under addition and scalar multiplication.
Definition 2.2. A linear subspace of Rn is a subset V of Rn with the following properties:
(1) 0 ∈ V;
(2) for every x and y in V, we have x + y ∈ V;
(3) for every x ∈ V and every λ ∈ R, we have λx ∈ V.
A linear subspace of Rn , which is often called a vector subspace, is a subset containing 0 and
closed under addition and scalar multiplication. The following subsets in Rn are linear subspaces:
(1) {0}, which is called the trivial linear subspace;
(2) a line passing through the origin;
(3) a hyperplane that contains the origin;
(4) whole vector space Rn .
One can show that these are the only linear subspaces in Rn if n 6 3.
Exercise 2.3. Which of the following subsets of Rn are linear subspaces?
(1) The set of all scalar multiples of a given vector u in Rn.
(2) The subset
$$V = \left\{\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} \in R^3 : 2x_1 - x_2 + x_3 = 0,\ x_1 + 5x_2 = 0\right\}.$$
(3) The subset
$$W = \left\{\begin{pmatrix}x_1\\ x_2\end{pmatrix} \in R^2 : x_1 + 2x_2 = 3,\ 4x_1 + 5x_2 = 6\right\}.$$
(4) The subset
$$U = \left\{\begin{pmatrix}x_1\\ x_2\end{pmatrix} \in R^2 : x_1^2 + x_2^2 = 0\right\}.$$
Here is a rather general way of creating linear subspaces of Rn :
Definition 2.4. Let A be an m × n real matrix. The kernel of A is
$$\ker(A) = \bigl\{x \in R^n : Ax = 0\bigr\}.$$
Some people call the kernel of a matrix the null space.


Lemma 2.5. For any m × n matrix A, the kernel ker(A) is a linear subspace of Rn .
Proof. We must check that all conditions of Definition 2.2 are satisfied. Since A0 = 0, we have
0 ∈ ker(A),
so that Definition 2.2(1) is satisfied.
For Definition 2.2(2), if x and y are two points in ker(A), then
$$A(x + y) = Ax + Ay = 0 + 0 = 0,$$
so that x + y is contained in ker(A) as well.
For Definition 2.2(3), if x ∈ ker(A) and λ ∈ R, then
$$A(\lambda x) = \lambda(Ax) = \lambda 0 = 0,$$
so that λx is contained in ker(A). This shows that ker(A) is a linear subspace of Rn . 
Example 2.6. Let
$$A = \begin{pmatrix}1 & -2 & 3\\ -4 & 5 & -6\end{pmatrix}.$$
This is a 2 × 3 matrix, so its kernel is a subspace of R3. Then
$$\ker(A) = \left\{\begin{pmatrix}x_1\\ x_2\\ x_3\end{pmatrix} \in R^3 : x_1 - 2x_2 + 3x_3 = 0 \text{ and } -4x_1 + 5x_2 - 6x_3 = 0\right\}.$$
This is a line in R3 that passes through 0.
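For small examples like this one, the kernel can also be computed exactly with a computer algebra system; the sketch below (not part of the notes) uses SymPy, assuming it is installed.

from sympy import Matrix

A = Matrix([[1, -2, 3],
            [-4, 5, -6]])

basis = A.nullspace()    # a list of column vectors spanning ker(A)
print(basis[0].T)        # Matrix([[1, 2, 1]]): one basis vector, so ker(A) is a line
print((A * basis[0]).T)  # Matrix([[0, 0]])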


Exercise 2.7. Find two matrices whose kernels consist of the points x in R3 such that
$$-4x_1 + x_2 - 3x_3 = 0 \quad\text{and}\quad 2x_1 + 2x_2 + 5x_3 = 0.$$
Let v1 , v2 , . . . , vm and y be some vectors in the vector space Rn . If
y = λ1 v1 + λ2 v2 + · · · + λm vm
for some real numbers λ1 , λ2 , . . . , λm , we say that y is a linear combination of v1 , . . . , vm .
Example 2.8. The vector $\begin{pmatrix}5\\ 9\\ 2\end{pmatrix}$ is a linear combination of $\begin{pmatrix}1\\ 3\\ 1\end{pmatrix}$, $\begin{pmatrix}-2\\ 4\\ 3\end{pmatrix}$ and $\begin{pmatrix}1\\ -3\\ -2\end{pmatrix}$, since
$$\begin{pmatrix}5\\ 9\\ 2\end{pmatrix} = 4\begin{pmatrix}1\\ 3\\ 1\end{pmatrix} + 0\begin{pmatrix}-2\\ 4\\ 3\end{pmatrix} + 1\begin{pmatrix}1\\ -3\\ -2\end{pmatrix}.$$
For any vectors v_1, v_2, . . . , v_m in the n-dimensional vector space Rn, the zero vector 0 is a linear
combination of the vectors v_1, . . . , v_m, since
$$0 = 0v_1 + 0v_2 + \cdots + 0v_m.$$
Lemma 2.9. Let V be a linear subspace of the space Rn , and let v1 , v2 , . . . , vm be vectors in V.
Then every linear combination of the vectors v1 , v2 , . . . , vm also belongs to V.
Proof. Let us prove the assertion by induction on m. If m = 1, this follows from Definition 2.2.
Thus, we suppose that m ≥ 2, and the lemma holds for m − 1.
Let v_1, v_2, . . . , v_m be some vectors in V, and let λ_1, λ_2, . . . , λ_m be some real numbers. Put
$$w = \lambda_1v_1 + \cdots + \lambda_{m-1}v_{m-1}.$$
Then w ∈ V by the inductive hypothesis, and λ_mv_m ∈ V by the case m = 1. Since V is closed under addition, we have
$$w + \lambda_mv_m \in V.$$
Thus, we see that
$$\sum_{i=1}^{m} \lambda_iv_i = w + \lambda_mv_m \in V,$$
which completes the induction. □
Let us consider one example. For instance, the vector $x = \begin{pmatrix}5\\ 9\\ 3\end{pmatrix}$ is not a linear combination of
the vectors v_1, v_2, v_3 in Example 2.8. Indeed, if
$$A = \begin{pmatrix}1 & -1 & 2\end{pmatrix},$$
then these three vectors are contained in ker(A), but x ∉ ker(A). Therefore, it follows from
Lemma 2.9 that x cannot be expressed as a linear combination of v_1, v_2 and v_3.
Exercise 2.10. Show that the vector $\begin{pmatrix}1\\ 2\end{pmatrix}$ is a linear combination of $\begin{pmatrix}3\\ 4\end{pmatrix}$, $\begin{pmatrix}5\\ 6\end{pmatrix}$ and $\begin{pmatrix}7\\ 8\end{pmatrix}$.

2.b. Linear span.


The life of man is but a span.
English folk carol
One can show that every vector in the space R3 is a linear combination of vectors in Exercise 2.8.
However, if we take just two of them, their linear combinations form a plane in R3 that passes
through the origin. For instance, for any real numbers λ_1, λ_2, λ_3 and λ_4, the vectors
$$\lambda_1\begin{pmatrix}5\\ 9\\ 2\end{pmatrix} + \lambda_2\begin{pmatrix}1\\ 3\\ 1\end{pmatrix} + \lambda_3\begin{pmatrix}-2\\ 4\\ 3\end{pmatrix} + \lambda_4\begin{pmatrix}1\\ -3\\ -2\end{pmatrix}$$
are contained in the plane in R3 that is given by
$$x_1 - x_2 + 2x_3 = 0.$$
Moreover, every point in this plane is a linear combination of these (four) vectors.
Exercise 2.11. Find real numbers A_1, A_2, A_3, A_4 such that all linear combinations of the vectors
$$\begin{pmatrix}1\\ 1\\ 1\\ 0\end{pmatrix},\quad \begin{pmatrix}1\\ 1\\ 0\\ 1\end{pmatrix},\quad \begin{pmatrix}1\\ 0\\ 1\\ 1\end{pmatrix}$$
are contained in the hyperplane in R4 given by A_1x_1 + A_2x_2 + A_3x_3 + A_4x_4 = 0.
More generally, let v1 , v2 , . . . , vm be vectors in Rn .
Definition 2.12. The linear span of the vectors v_1, v_2, . . . , v_m is the set
$$\mathrm{span}\bigl(\{v_1, v_2, \ldots, v_m\}\bigr) = \bigl\{x \in R^n : x \text{ is a linear combination of the vectors } v_1, v_2, \ldots, v_m\bigr\}.$$
Let V = span({v_1, . . . , v_m}).


Lemma 2.13. The set V is a linear subspace of Rn .
Proof. We must verify the three conditions of Definition 2.2. Observe that
$$0 = \sum_{i=1}^{m} 0v_i,$$
so that 0 ∈ V. This verifies condition Definition 2.2(1).
To verify Definition 2.2(2), let x and y be some vectors in V. Then
$$x = \sum_{i=1}^{m} \lambda_iv_i \quad\text{and}\quad y = \sum_{i=1}^{m} \mu_iv_i$$
for some real numbers λ_1, λ_2, . . . , λ_m, µ_1, µ_2, . . . , µ_m. Hence, we have
$$x + y = \sum_{i=1}^{m} \lambda_iv_i + \sum_{i=1}^{m} \mu_iv_i = \sum_{i=1}^{m} (\lambda_i + \mu_i)v_i,$$
so that x + y ∈ V. This verifies Definition 2.2(2).
To verify Definition 2.2(3), let x ∈ V and γ ∈ R. Then
$$x = \sum_{i=1}^{m} \delta_iv_i$$
for some real numbers δ_1, δ_2, . . . , δ_m. Thus, we have
$$\gamma x = \gamma\sum_{i=1}^{m} \delta_iv_i = \sum_{i=1}^{m} \gamma\delta_iv_i,$$
so that γx ∈ V. This verifies Definition 2.2(3). □



We say that the vectors v_1, . . . , v_m span the vector subspace V, or that the set
$$\{v_1, \ldots, v_m\}$$
is a spanning set for the vector space V.
Example 2.14. Let $v_1 = \begin{pmatrix}1\\ -1\\ 0\end{pmatrix}$ and $v_2 = \begin{pmatrix}0\\ 1\\ -1\end{pmatrix}$ be vectors in R3, and let V = span({v_1, v_2}).
Then V is the plane in R3 that is given by
$$x_1 + x_2 + x_3 = 0.$$
Indeed, denote this plane by Π. Then Π is a vector subspace of the three-dimensional space R3,
since it is the kernel of the 1 × 3 matrix (1 1 1). On the other hand, one can check that
$$v_1 \in \Pi \ni v_2,$$
so that V ⊆ Π by Lemma 2.9. To show that V ⊇ Π, let x be a vector in Π. Then
$$x = x_1v_1 - x_3v_2,$$
so that x ∈ V. This shows that V ⊇ Π.
Let e_1, e_2, . . . , e_n be the vectors in Rn defined in (1.34). Then e_1, e_2, . . . , e_n span the whole of Rn, since
$$x = x_1e_1 + x_2e_2 + \cdots + x_ne_n \in \mathrm{span}\bigl(\{e_1, e_2, \ldots, e_n\}\bigr)$$
for every vector x ∈ Rn.
Lemma 2.15. Let v_1, . . . , v_m be vectors in Rn, and let W be a linear subspace of Rn. Then
$$\text{the vectors } v_1, \ldots, v_m \text{ are contained in } W \iff \mathrm{span}\bigl(\{v_1, \ldots, v_m\}\bigr) \subseteq W.$$

Proof. If all vectors v_1, . . . , v_m are contained in W, then span({v_1, . . . , v_m}) ⊆ W by Lemma 2.9.
Vice versa, if span({v_1, . . . , v_m}) ⊆ W, then the vectors v_1, . . . , v_m are contained in W, since these
vectors are contained in span({v_1, . . . , v_m}) by definition. □

Corollary 2.16. Let v_1, . . . , v_m be vectors in Rn that are contained in span({w_1, . . . , w_k}) for
some vectors w_1, . . . , w_k in Rn. Then
$$\mathrm{span}\bigl(\{v_1, \ldots, v_m\}\bigr) \subseteq \mathrm{span}\bigl(\{w_1, \ldots, w_k\}\bigr).$$

Corollary 2.17. Let v_1, . . . , v_m, v_{m+1}, . . . , v_{m+k} be vectors in Rn. Then
$$\mathrm{span}\bigl(\{v_1, \ldots, v_m\}\bigr) \subseteq \mathrm{span}\bigl(\{v_1, \ldots, v_m, v_{m+1}, \ldots, v_{m+k}\}\bigr).$$
Moreover, if v_{m+1}, . . . , v_{m+k} are contained in span({v_1, . . . , v_m}), then
$$\mathrm{span}\bigl(\{v_1, \ldots, v_m\}\bigr) = \mathrm{span}\bigl(\{v_1, \ldots, v_m, v_{m+1}, \ldots, v_{m+k}\}\bigr).$$

The linear span of a non-zero vector in Rn is a line containing the origin and this vector.
Exercise 2.18. Let u, v and w be vectors in Rn . Prove that
   
span u, v, w = span au + bv + cw, v, w
for any real numbers a, b and c such that a 6= 0.
Let A be an m × n matrix. Then each column of the matrix A is an element of Rm .
Definition 2.19. The column space of the matrix A, written as col(A), is the span of its n columns.

The transpose of each row of the matrix A is an n-dimensional column vector, i.e. an element of Rn.


Definition 2.20. The row space of A, written as row(A), is the span of the transposes of its rows.
Note that col(A) is a linear subspace of Rm, and row(A) is a linear subspace of Rn. For example, if
A = ( 1 −1  0
      0  1 −1 ),
then one can easily see that
col(A) = span({(1, 0)^T, (−1, 1)^T, (0, −1)^T}) = R2,
and it follows from Example 2.14 that
row(A) = span({(1, −1, 0)^T, (0, 1, −1)^T}) = { x ∈ R3 : x1 + x2 + x3 = 0 }.
Lemma 2.21. Let A be an m × n matrix. Then
col(A) = { y ∈ Rm : y = Ax for some x ∈ Rn }.
Proof. Let y ∈ Rm. Write the columns of A as vectors v1, . . . , vn in Rm. Then y ∈ col(A) if and
only if y = x1 v1 + · · · + xn vn for some real numbers x1, . . . , xn. But by Lemma 1.35, we have
x1 v1 + · · · + xn vn = A (x1, . . . , xn)^T.
This shows that y ∈ col(A) if and only if y = Ax for some x ∈ Rn. □

2.c. Linear independence.


It is the business of the very few to be independent.
Friedrich Nietzsche
n
Let v1 , v2 , . . . , vm be vectors in R . If m > n, then at least one of the vectors
v 1 , v 2 , . . . , vm
must be a linear combination of the remaining vectors. This follows from Corollary 2.32 below.
If vm is a linear combination of the vectors v1 , v2 , . . . , vm−1 , then
vm = λ1 v1 + λ2 v2 + · · · + λm−1 vm−1
for some real numbers λ1 , λ2 , . . . , λm−1 . In this case, the vector vm is redundant for all problems.
Similarly, if the redundant vector is v1 , then
v1 = λ2 v2 + · · · + λm−1 vm−1 + λm vm
for some real numbers λ2, . . . , λm−1, λm. How can we express this more neatly?
Definition 2.22. The vectors v1 , v2 , . . . , vm in Rn are said to be linearly dependent if
(∗) λ1 v1 + λ2 v2 + · · · + λm vm = 0
for some real numbers λ1 , λ2 , . . . , λm such that at least one of them is not zero.

For instance, if λm in this definition is not zero, then we can rewrite (∗) as
vm = −(λ1/λm) v1 − (λ2/λm) v2 − · · · − (λm−1/λm) vm−1,
so that vm is a linear combination of the vectors v1, v2, . . . , vm−1. However, a priori, we do not know
which number among λ1, λ2, . . . , λm is not zero. But we know that such a non-zero number exists.
Example 2.23. The vectors
(−1, −8, 8, −1)^T, (2, 4, 0, −2)^T, (−1, 1, −4, 2)^T
in R4 are linearly dependent, since
2(−1, −8, 8, −1)^T + 3(2, 4, 0, −2)^T + 4(−1, 1, −4, 2)^T = 0.
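This dependence relation is easy to confirm by computer; the following short optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the course) checks it and, equivalently, that the matrix with these three columns has rank less than 3.

    import numpy as np

    v1 = np.array([-1, -8, 8, -1])
    v2 = np.array([2, 4, 0, -2])
    v3 = np.array([-1, 1, -4, 2])
    print(2*v1 + 3*v2 + 4*v3)                                     # [0 0 0 0]
    print(np.linalg.matrix_rank(np.column_stack([v1, v2, v3])))   # 2 < 3, so linearly dependent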
Definition 2.24. Vectors are said to be linearly independent if they are not linearly dependent.
If the vectors v1, v2, . . . , vm are linearly independent, then
λ1 v1 + λ2 v2 + · · · + λm vm = 0 =⇒ λ1 = 0, λ2 = 0, . . . , λm = 0
for all real numbers λ1, λ2, . . . , λm.


Example 2.25. Let e1, e2, . . . , en be the vectors in Rn defined in (1.34). Then they are linearly
independent. Indeed, suppose that they are linearly dependent. Then
(λ1, λ2, . . . , λn)^T = λ1 e1 + λ2 e2 + · · · + λn en = 0
for some real numbers λ1, λ2, . . . , λn such that at least one of them is not zero. This is absurd.
Every single non-zero vector is linearly independent. On the other hand, any list of vectors containing 0 must
be linearly dependent. Indeed, given vectors v1, . . . , vm with v1 = 0, we have
1v1 + 0v2 + · · · + 0vm = 0,
and not all the coefficients on the left-hand side are zero.
Remark 2.26. Let v1 and v2 be two vectors in Rn. If they are linearly dependent, then there are
real numbers λ1 and λ2 such that λ1 v1 + λ2 v2 = 0 and λ1 ≠ 0 or λ2 ≠ 0. Thus, in this case, either
v1 = −(λ2/λ1) v2
provided that λ1 ≠ 0, or
v2 = −(λ1/λ2) v1
provided that λ2 ≠ 0. Thus, it follows from Lemma 1.15 that the vectors v1 and v2 are linearly
dependent if and only if |v1 · v2| = ‖v1‖ ‖v2‖.

Exercise 2.27. Decide whether the vectors


   
(3156, −2047)^T, (989, 10034)^T
in R2 are linearly independent or linearly dependent.
There are some useful ways of restating the definition of linear independence:
Lemma 2.28. Let v1 , v2 , . . . , vm be vectors in Rn . The following are equivalent:
(1) the vectors v1 , v2 , . . . , vm are linearly independent;
(2) none of the vectors among v1 , v2 , . . . , vm is a linear combination of the others;
(3) for every x ∈ span({v1, v2, . . . , vm}), there are unique real numbers λ1, λ2, . . . , λm such that
x = λ1 v1 + λ2 v2 + · · · + λm vm.

Proof. Suppose that (3) holds. If v1 is a linear combination of the vectors v2, . . . , vm, then
v1 = λ2 v2 + · · · + λm vm
for some scalars λ2, . . . , λm, so that
1v1 + 0v2 + · · · + 0vm = 0v1 + λ2 v2 + · · · + λm vm,
which contradicts the uniqueness in (3). Similarly, we obtain a contradiction in the case when any
other vector among v1, v2, . . . , vm is a linear combination of the others. Thus, we see that (3) implies (2).
Suppose that (2) holds. Let λ1, . . . , λm be real numbers such that
λ1 v1 + λ2 v2 + · · · + λm vm = 0.
Then λ1 = · · · = λm = 0. Indeed, if λ1 ≠ 0, then
v1 = −(1/λ1)(λ2 v2 + · · · + λm vm),
which contradicts (2), so that λ1 = 0. Similarly, we see that λ2 = · · · = λm = 0. This shows that
the vectors v1, v2, . . . , vm are linearly independent. Hence, we see that (2) implies (1).
Now we suppose that (1) holds. Let x be a vector in span({v1, . . . , vm}). Suppose that
x = λ1 v1 + · · · + λm vm = µ1 v1 + · · · + µm vm
for some real numbers λ1, . . . , λm, µ1, . . . , µm. Subtracting, we get
(λ1 − µ1)v1 + · · · + (λm − µm)vm = 0,
so that λi = µi for all i by linear independence. This shows that (1) implies (3). □
Corollary 2.29. Let v1 , v2 , . . . , vm be linearly independent vectors in Rn . Then any non-empty
list of vectors consisting of vectors in {v1 , v2 , . . . , vm } is also linearly independent.
Exercise 2.30. Let u, v and w be vectors in Rn . Suppose that u and v are linearly independent,
the vectors u and w are linearly independent, and the vectors v and w are linearly independent.
Does it follow that u, v, w are linearly independent? Give a proof or counterexample.

So far we know nothing about how the sizes of spanning sets of vectors and of linearly indepen-
dent sets of vectors are related to one another. The next result describes the relationship:
Proposition 2.31 (Steinitz lemma). Let V be a vector subspace of Rn . Let v1 , . . . , vm be linearly
independent vectors in V, and let w1 , . . . , wk be vectors spanning V. Then m 6 k.
Proof. We claim that m ≤ k and it is possible to choose m vectors among
w1, . . . , wk
in such a way that when these vectors are replaced by v1, . . . , vm, the resulting list still spans V.
Let us prove this by induction on m.
Suppose that m = 1. Then k ≥ 1, since otherwise we would have V = {0}, which would imply
that v1 = 0, which contradicts v1 being linearly independent. By assumption, we have
v1 = λ1 w1 + · · · + λk wk
for some real numbers λ1, . . . , λk such that at least one number among λ1, . . . , λk is not zero.
Without loss of generality, we may assume that λ1 ≠ 0. Then
w1 = (1/λ1) v1 − (λ2/λ1) w2 − · · · − (λk/λ1) wk,
so that v1, w2, . . . , wk span V by Corollary 2.16. This is the base of the induction.
Now we assume that m ≥ 2, and that the claim (we are proving) holds for the m − 1 linearly independent
vectors v1, . . . , vm−1. Then m − 1 ≤ k, and we can choose m − 1 vectors among w1, . . . , wk in
such a way that when these vectors are replaced by v1, . . . , vm−1, the resulting list still spans V.
Without loss of generality, we may assume that these vectors are w1, . . . , wm−1. Then
V = span({v1, . . . , vm−1, wm, . . . , wk}).
If k = m − 1, we have
vm ∈ span({v1, . . . , vm−1}),
so that there exist real numbers λ1, . . . , λm−1 such that vm = λ1 v1 + · · · + λm−1 vm−1, which is impossible,
since v1, . . . , vm are linearly independent by assumption (see Lemma 2.28).
We see that k ≥ m. Since vm ∈ V, there are real numbers λ1, . . . , λk such that
vm = λ1 v1 + · · · + λm−1 vm−1 + λm wm + · · · + λk wk.
Then at least one number among λm, . . . , λk is not zero, because otherwise this equality would imply
that v1, . . . , vm are linearly dependent, which would contradict the assumption that they are linearly
independent. Without loss of generality, we may assume that λm ≠ 0. Then
wm = (1/λm) vm − (λ1/λm) v1 − · · · − (λm−1/λm) vm−1 − (λm+1/λm) wm+1 − · · · − (λk/λm) wk,
so that v1, . . . , vm−1, vm, wm+1, . . . , wk span V. This completes the induction. □


Corollary 2.32. Let v1 , . . . , vk be linearly independent vectors in Rn . Then k 6 n.
For instance, any 4 vectors in R3 are linearly dependent: one must be in the span of the others.
Exercise 2.33. Let A be a 3×5 matrix. Explain why the columns of A must be linearly dependent.

2.d. Bases and dimension.


If you want to achieve something, you build the basis for it.
Noam Chomsky
We have seen that the vectors e1 , . . . , en span the whole space Rn and are linearly independent.
Another way of saying this is that every vector x in Rn can be written as a linear combination
x = λ1 e1 + · · · + λn en
for some real numbers λ1 , . . . , λn in exactly one way (see Lemma 2.28).
Definition 2.34. Let V be a linear subspace of Rn . A basis of V is a list of vectors
v1 , . . . , v m
in the linear space V that is linearly independent and spans V.
In other words, a basis of the vector subspace V ⊆ Rn is a linearly independent spanning set.
Example 2.35. Let V be the subset in R3 that is defined as follows:
V = { x ∈ R3 : x1 + x2 + x3 = 0 }.
As observed in Example 2.14, the set V is a linear subspace of R3 spanned by the vectors
(1, −1, 0)^T, (0, 1, −1)^T.
Actually, these vectors are linearly independent, so that they form a basis of the vector subspace V.
Of course, this basis is not unique. For instance, another basis of the linear subspace V is
(1, 0, −1)^T, (−3, −5, 8)^T.
The empty list of vectors is a basis of the trivial linear subspace {0} of the vector space Rn .
Similarly, if v is any nonzero vector in Rn and
 
V = span {v} ,
then v is a basis of V. In particular, any nonzero real number x is a basis of R.
Exercise 2.36. Let a, b, c and d be some real numbers. Let
v1 = (a, b)^T and v2 = (c, d)^T
be vectors in R2. Prove that v1 and v2 form a basis in R2 ⇐⇒ ad − bc ≠ 0.
Exercise 2.37. Let a and b be real numbers such that a ≠ 0 or b ≠ 0. Let
v1 = (a, b)^T and v2 = (−b, a)^T
be vectors in R2. Prove that v1 and v2 form a basis in R2.
Exercise 2.38. Let v1 and v2 be vectors in R2 that form a basis. Find all λ ∈ R such that
λv1 + v2, v1 + λv2
is also a basis of the vector space R2.

Observe that the list of vectors e1 , . . . , en introduced in (1.34) is a basis of the vector space Rn .
This basis is usually called the standard basis of Rn .
Exercise 2.39. Let
v1 = (1, 1, 1)^T, v2 = (1, 1, 2)^T, v3 = (1, 2, 3)^T, x = (6, 9, 14)^T.
Prove that v1, v2, v3 is a basis in R3. Find real numbers λ1, λ2 and λ3 such that
x = λ1 v1 + λ2 v2 + λ3 v3.
Let V be a linear subspace of Rn and let v1, . . . , vm be some vectors in V. Then
• v1, . . . , vm span V ⇐⇒ for every vector x ∈ V, we have x = λ1 v1 + · · · + λm vm for at least one list of real numbers λ1, . . . , λm;
• v1, . . . , vm are linearly independent ⇐⇒ for every vector x ∈ V, we have x = λ1 v1 + · · · + λm vm for at most one list of real numbers λ1, . . . , λm;
• v1, . . . , vm is a basis of V ⇐⇒ for every vector x ∈ V, we have x = λ1 v1 + · · · + λm vm for exactly one list of real numbers λ1, . . . , λm.
One vector subspace of the space Rn can have many different bases. But in all examples so far,
all the bases of a given linear subspace have the same number of elements. This is a general truth:
Proposition 2.40. Any two bases of one linear subspace of Rn have the same number of elements.
Proof. This follows from Proposition 2.31. 
In particular, every basis of Rn contains exactly n vectors.
Exercise 2.41. Let v1 , . . . , vn be a basis of the space Rn , and let A be an invertible n × n matrix.
Prove that Av1 , . . . , Avn is also a basis of the vector space Rn .
Now our goal is to show that every vector subspace of Rn has at least one basis. We start with
Lemma 2.42. Let V be a linear subspace of Rn , and let v1 , . . . , vm be vectors in V that span V.
Then some subset of {v1 , . . . , vm } is a basis of the vector subspace V.
Proof. Consider all subsets of the set {v1, . . . , vm} that span V. Choose one with the smallest
possible number of elements. Without loss of generality, we may assume that this subset is
{v1, . . . , vk}
for some k ≤ m. Then v1, . . . , vk is a basis of V. Indeed, these vectors span V by assumption, so
it only remains to show that they are linearly independent. If they are not, then, by Lemma 2.28,
one vector among v1, . . . , vk is a linear combination of the remaining vectors, so that, without loss
of generality, we may assume that
v1 = λ2 v2 + · · · + λk vk,
so that v2, . . . , vk span V by Corollary 2.17, which contradicts the minimality of k. □


Example 2.43. Let V be the vector subspace in R3 given by x1 + x2 + x3 = 0. The vectors
v1 = (1, 2, −3)^T, v2 = (1, 0, −1)^T, v3 = (1, 4, −5)^T
span V. Therefore, using Lemma 2.42, we see that some subset of this list must be a basis of V.
In fact, any two of them form a basis of this vector subspace. Note that the vectors
w1 = (1, 2, −3)^T, w2 = (1, 0, −1)^T, w3 = (2, 0, −2)^T
also span V. This time, the vectors w1, w2 form a basis of V, the vectors w1 and w3 also form a
basis, but the vectors w2 and w3 do not form a basis.
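One convenient way to check which pairs form a basis of the 2-dimensional plane V is to compute the rank of the matrix whose columns are the chosen pair. The following optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the notes) does this for w1, w2, w3.

    import numpy as np

    w1 = np.array([1, 2, -3])
    w2 = np.array([1, 0, -1])
    w3 = np.array([2, 0, -2])
    pairs = {"w1,w2": (w1, w2), "w1,w3": (w1, w3), "w2,w3": (w2, w3)}
    for name, (a, b) in pairs.items():
        # A pair of vectors in V is a basis of V exactly when it is linearly independent,
        # i.e. when the 3x2 matrix with these columns has rank 2.
        print(name, np.linalg.matrix_rank(np.column_stack([a, b])))
    # prints: w1,w2 2   w1,w3 2   w2,w3 1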
The mirror image of Lemma 2.42 is the next result, which states that any linearly independent
set of vectors in a linear subspace can be extended to make a basis of this subspace.
Lemma 2.44. Let V be a subspace of Rn and let v1 , . . . , vk be linearly independent vectors in V.
Then there is some basis of V containing all of the vectors v1 , . . . , vk .
Proof. Consider all lists of linearly independent vectors in V that contain the vectors
v1, . . . , vk.
By Corollary 2.32, each such list contains at most n elements. Thus, we can choose one with the
largest number of elements:
v1, . . . , vk, vk+1, . . . , vm,
where vk+1, . . . , vm are some vectors in V and m ≥ k. Then v1, . . . , vm is a basis of the subspace V.
Indeed, these m vectors are linearly independent, so it only remains to show that they span V.
Let v be any vector in V. The list of vectors v1, . . . , vm, v contains v1, . . . , vk and has more than m
elements, so by the maximality of m these vectors are linearly dependent. Hence there are real numbers
λ1, . . . , λm, λ, not all zero, such that
λ1 v1 + · · · + λm vm + λv = 0.
Since v1, . . . , vm are linearly independent, we must have λ ≠ 0. Then
v = −(λ1/λ) v1 − (λ2/λ) v2 − · · · − (λm/λ) vm,
so that v is a linear combination of v1, . . . , vm, as required. □
Corollary 2.45. Every linear subspace of Rn has at least one basis.
Example 2.46. Let
v1 = (1, 2, 3)^T, v2 = (4, 5, 6)^T.
Observe that the vectors v1 and v2 are linearly independent. To extend them to some basis of R3,
we can take any vector v3 such that
v3 ∉ span({v1, v2}).
For instance, we can take v3 = (1, 0, 0)^T. Then v1, v2, v3 is a basis.

Exercise 2.47. Consider the three vectors
(2, 2, 7, −1)^T, (3, −1, 2, 4)^T, (1, 1, 3, 1)^T
in R4. Show that they are linearly independent, and extend them to a basis of R4.
Now we are ready to give the following very important definition.
Definition 2.48. Let V be a linear subspace of Rn . The dimension of V, often written as dim(V),
is the number of vectors in any basis of the linear subspace V.
Remark 2.49. This definition makes sense by Corollary 2.45 and Proposition 2.40.
We have dim({0}) = 0, since the empty list of vectors is a basis of {0}. We also have
dim Rn = n,


since Rn has a basis e1 , . . . , en that consists of n vectors.


Lemma 2.50. Let V and W be linear subspaces of Rn such that V ⊆ W. Then
dim(V) ≤ dim(W).
Moreover, if dim(V) = dim(W), then V = W.
Proof. Let v1, . . . , vm be a basis of the linear subspace V. Then m = dim(V). By Lemma 2.44,
we can extend v1, . . . , vm to a basis of W. This gives us linearly independent vectors
v1, . . . , vm, vm+1, . . . , vk
in W that span W, where k ≥ m. Since k = dim(W), we must have m = dim(V) ≤ dim(W) = k.
Moreover, if k = m, then V = W, because v1, . . . , vk span both V and W in this case. □
Example 2.51. Let V be the vector subspace in R3 that is given by the equation x1 + x2 + x3 = 0,
and let W be the vector subspace in R3 that is given by
x1 + x2 + x3 = 0,  x1 = x2.
Then dim(V) = 2, since it has a basis with 2 vectors (see Example 2.35). Observe that
(1, 1, −2)^T ∈ W ⊊ V.
This shows that this vector is a basis of W and dim(W) = 1.
Exercise 2.52. Let V be the vector subspace in R4 that is given by
x1 + 2x2 + 3x3 + 4x4 = 0,


4x1 + 3x2 + 2x3 + x4 = 0,


and let W be the vector subspace in R4 that is given by

x1 + 2x2 + 3x3 + 4x4 = 0,

4x1 + 3x2 + 2x3 + x4 = 0,
x − x + x − x = 0.

1 2 3 4

Prove that dim(V) = 2 and dim(W) = 1.



Two-dimensional subspaces of Rn are called planes, and one-dimensional subspaces are lines.
Proposition 2.53. Let V be a linear subspace of Rn . Then dim(V) equals to
(1) the smallest number of vectors in any spanning set of V,
(2) the largest number of vectors in a linearly independent subset of V.
Proof. Choose a basis v1 , . . . , vm of the vector subspace V. Then

dim V = m.
by definition. Let k be the smallest number of vectors in any spanning set of V. Then

dim V > k,
because v1 , . . . , vm is a spanning set of V. Let w1 , . . . , wk be a spanning set of V with k vectors.
By Lemma 2.42, some subset of {w1 , . . . , wk } is a basis of V. But any basis of V has m elements,
so that k > m = dim(V). Thus, we proved that k = dim(V).
Let r be the largest number of vectors in a linearly independent subset of V. Then
dim(V) ≤ r,
because v1, . . . , vm are linearly independent. Let u1, . . . , ur be linearly independent vectors in V.
By Lemma 2.44, there is some basis of the vector subspace V that contains all vectors u1, . . . , ur.
But any basis of V has m elements, so that r ≤ m = dim(V). Hence, we proved that r = dim(V). □
Exercise 2.54. Let v1 , . . . , vm be vectors in Rn . Prove that
!
 
dim span v1 , . . . , vm 6 m,

with equality if and only if v1 , . . . , vm are linearly independent.


To solve this exercise, you can use the following handy result:
Lemma 2.55. Let V be a vector subspace of Rn , let v1 , . . . , vm be vectors in V with m = dim(V).
Then v1 , . . . , vm is a basis of V ⇐⇒ v1 , . . . , vm span V ⇐⇒ v1 , . . . , vm are linearly independent.
Proof. Any basis of V is linearly independent and spans V, so it remains to prove the converses:
if v1 , . . . , vm span V or are linearly independent then they are a basis of V.
Suppose that v1 , . . . , vm span V. By Lemma 2.42, some subset of {v1 , . . . , vm } is a basis of V.
But any basis of V has m elements, so this subset is the whole {v1 , . . . , vm }, which is a basis of V.
The proof of the remaining assertion is very similar, but now we should use Lemma 2.44.
Because of this, we leave it to the reader. □
Everything we defined, described and proved in this section can be done for Fn instead of Rn ,
where F is an arbitrary field (see Definition 1.50).
Exercise 2.56. Let p = 1973. Let V be the subset in F4p that is given by
x1 + 2x2 + 3x3 + 4x4 = 0,
4x1 + 3x2 + 2x3 + x4 = 0,
and let W be the subset in F4p that is given by
x1 + 2x2 + 3x3 + 4x4 = 0,
4x1 + 3x2 + 2x3 + x4 = 0,
x1 − x2 + x3 − x4 = 0.
Find how many elements V and W have.



2.e. Orthogonality.
The orthogonal features, when combined, can explode into complexity.
Yukihiro Matsumoto
Let v1 , . . . , vm be vectors in Rn . Recall the definition of dot-product from Section 1.c.
Definition 2.57. The vectors v1, . . . , vm are orthogonal if
vi · vj = 0
for every i and j in {1, . . . , m} with i ≠ j. We say that they are orthonormal if
vi · vj = 0 for i ≠ j, and vi · vj = 1 for i = j.

The standard basis vectors e1, . . . , en are orthonormal. The vectors
(1973, 2019)^T, (−2019, 1973)^T
in R2 are orthogonal, but they are not orthonormal. Similarly, the vectors
(1, 1)^T, (1, −1)^T
in R2 are orthogonal, but they are not orthonormal. The vectors
(√2/2, √2/2)^T, (√2/2, −√2/2)^T
in R2 are orthonormal. These two vectors are obtained from the previous two vectors by rescaling,
so that they both have length 1. Recall from Section 1.c that ‖v‖ = √(v · v) for all v ∈ Rn.
Example 2.58. Let v1, v2 and v3 be vectors in R3 that are defined as follows:
v1 = (1, 0, 0)^T, v2 = (0, 2, 3)^T, v3 = (0, 1, 4)^T.
Then they are not orthogonal: even though v1 · v2 = 0 and v1 · v3 = 0, we have v2 · v3 ≠ 0.
Lemma 2.59. If v1 , . . . , vm are orthogonal and all nonzero, then they are linearly independent.
Proof. Let λ1, . . . , λm be scalars such that
(2.60) λ1 v1 + λ2 v2 + · · · + λm vm = 0.
Taking the dot product of each side of equation (2.60) with v1 gives
(λ1 v1 + λ2 v2 + · · · + λm vm) · v1 = 0.
By Lemma 1.12 and orthogonality, we get
λ1 ‖v1‖² = λ1 (v1 · v1) = (λ1 v1 + λ2 v2 + · · · + λm vm) · v1 = 0,
which gives λ1 = 0, because v1 ≠ 0 by assumption. Similarly, we see that λ2 = · · · = λm = 0.
This shows that v1, . . . , vm are linearly independent. □

Corollary 2.61. Orthonormal vectors are linearly independent.


Let V be a linear subspace of Rn .
Definition 2.62. An orthonormal basis of V is its basis that consists of orthonormal vectors.
The standard basis e1 , . . . , en of Rn is an orthonormal basis.
Exercise 2.63. Find an orthonormal basis v1 , v2 , v3 of the three-dimensional vector space R3
such that the coordinates of the vectors v1 , v2 and v3 are all nonzero.
If dim(V) = m and V contains m orthonormal vectors v1 , . . . , vm , then v1 , . . . , vm is an or-
thonormal basis of V by Lemmas 2.55 and 2.59.
Lemma 2.64. Let v1, . . . , vm be an orthonormal basis of V, and let x be a vector in V. Then
x = (x · v1) v1 + (x · v2) v2 + · · · + (x · vm) vm.
Proof. By definition of basis, we have
x = λ1 v1 + λ2 v2 + · · · + λm vm
for some real numbers λ1, . . . , λm. Taking the dot product with v1 gives
x · v1 = (λ1 v1 + · · · + λm vm) · v1 = λ1 (v1 · v1) = λ1,
so that λ1 = x · v1. Similarly, we see that λi = x · vi for every i ∈ {1, . . . , m}. □


If V = Rn with its standard basis e1, . . . , en, then every vector x = (x1, x2, . . . , xn)^T ∈ Rn
satisfies x · ei = xi. So in this case, Lemma 2.64 simply states that x = x1 e1 + · · · + xn en.
Example 2.65. Suppose that n = 2 and V = R2. Let
v1 = (√2/2, √2/2)^T, v2 = (√2/2, −√2/2)^T.
As we already mentioned above, these vectors are orthonormal, so that they form a basis of R2.
To find real numbers λ1 and λ2 such that
(5, 3)^T = λ1 v1 + λ2 v2,
we can use Lemma 2.64. It gives
λ1 = (5, 3)^T · v1 = 4√2,
λ2 = (5, 3)^T · v2 = √2.
Exercise 2.66. Suppose that we erase the word orthonormal in the assertion of Lemma 2.64.
Is the lemma still true? Give a proof or give a counterexample.

If v1, . . . , vm is any basis of V, we can write any vector x in V as a linear combination
x = λ1 v1 + · · · + λm vm
for a unique list of real numbers λ1, . . . , λm. Then
x · vj = (λ1 v1 + · · · + λm vm) · vj = λ1 (v1 · vj) + · · · + λm (vm · vj)
for each j ∈ {1, . . . , m}. This gives us m equations for the numbers λ1, . . . , λm, which we can solve.
For instance, if m = n = 3 and
v1 = (−2, 1, 5)^T, v2 = (1, 4, 2)^T, v3 = (2, 7, −4)^T, x = (1, 0, 0)^T,
then the vectors v1, v2 and v3 form a basis of R3, so that
x = λ1 v1 + λ2 v2 + λ3 v3
for some real numbers λ1, λ2 and λ3, which can be found by solving
x · v1 = λ1 (v1 · v1) + λ2 (v2 · v1) + λ3 (v3 · v1),
x · v2 = λ1 (v1 · v2) + λ2 (v2 · v2) + λ3 (v3 · v2),
x · v3 = λ1 (v1 · v3) + λ2 (v2 · v3) + λ3 (v3 · v3),
where all dot products can be computed as follows:
v1 · v1 = 30,  v2 · v2 = 21,  v3 · v3 = 69,
v1 · v2 = 12,  v1 · v3 = −17,  v2 · v3 = 22,
v1 · x = −2,  v2 · x = 1,  v3 · x = 2.
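The resulting 3 × 3 system in λ1, λ2, λ3 can of course be solved by hand; here is an optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the notes) that builds the matrix of dot products and solves the same system numerically.

    import numpy as np

    v1 = np.array([-2, 1, 5]); v2 = np.array([1, 4, 2]); v3 = np.array([2, 7, -4])
    x = np.array([1, 0, 0])
    B = np.column_stack([v1, v2, v3])
    G = B.T @ B                 # the matrix of dot products vi . vj listed above
    rhs = B.T @ x               # the vector (x.v1, x.v2, x.v3) = (-2, 1, 2)
    lam = np.linalg.solve(G, rhs)
    print(lam)                                  # the coefficients lambda1, lambda2, lambda3
    print(np.allclose(B @ lam, x))              # True: lambda1*v1 + lambda2*v2 + lambda3*v3 = x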

Given a line through the origin in R3 , you can take the orthogonal plane through the origin.
Likewise, given a plane through the origin in R3 , you can take the orthogonal line through the origin.
Here is the general definition.
Definition 2.67. Let V be a linear subspace of Rn. The orthogonal complement of V is
V⊥ = { x ∈ Rn : x · v = 0 for all v ∈ V }.
In order for a vector x to belong to V⊥, it must be orthogonal to everything in V.


Exercise 2.68. Let V be a linear subspace in R3 defined as follows:
V = span({(1, 2, 3)^T, (4, 5, 6)^T}).
Find a basis for the orthogonal complement of V.



Let V be a linear subspace of Rn . Then V⊥ is pronounced “V-perp”.


Lemma 2.69. The subset V⊥ is also a linear subspace of Rn .
Proof. We verify the three conditions of Definition 2.2. For Definition 2.2(1), we have
0·v =0
for all v ∈ V. For Definition 2.2(2), let x and y be vectors in V⊥ . Then

x + y · v = x · v + y · v = 0 + 0 = 0,
for each v ∈ V, so that x + y ∈ V⊥ . The proof of Definition 2.2(3) is left to the reader. 

To check that a vector x in Rn is contained in V⊥ , we can use the following result:


Lemma 2.70. Let v1 , . . . , vm be vectors in V that span V. Then
n o
V⊥ = x ∈ Rn : x · v1 = · · · = x · vm = 0 .

Proof. Let x be a vector in Rn such that


x · vi = 0
for all i ∈ {1, . . . , m}. We have to show that x ∈ V⊥ . To do this, we must show that x · v = 0 for
every vector in v ∈ V. This is easy. Let v be a vector in V. Since v1 , . . . , vm span V, we have
v = λ1 v1 + · · · + λm v m
for some real numbers λ1 , . . . , λm . Then
  m
X 
x · v = x · λ1 v1 + · · · + λm v m = λi x · vi = 0,
i=1

as required. This shows that x ∈ V⊥ . 


Example 2.71. Let V = span({(1, 1, 1)^T}), which is a line in R3. Then by Lemma 2.70, we have
V⊥ = { x ∈ R3 : x · (1, 1, 1)^T = 0 } = { x ∈ R3 : x1 + x2 + x3 = 0 }.
Thus, the vector subspace V⊥ is a plane in R3.


Surprisingly, any vector x ∈ Rn can be uniquely decomposed as
x=v+w

for some v ∈ V and some w ∈ V⊥ . We spend most of the rest of this section proving this.
Lemma 2.72. Let v1, . . . , vm be an orthonormal basis of V. For every x ∈ Rn, write
P(x) = (x · v1) v1 + · · · + (x · vm) vm.
Then P(x) ∈ V and x − P(x) ∈ V⊥.
Proof. Clearly, we have P(x) ∈ V. To prove that x − P(x) ∈ V⊥, it suffices to show that
(x − P(x)) · vj = 0
for each j ∈ {1, . . . , m}; this suffices by Lemma 2.70. We have
P(x) · v1 = ((x · v1) v1 + · · · + (x · vm) vm) · v1 = (x · v1)(v1 · v1) = x · v1,
so that (x − P(x)) · v1 = 0. Similarly, we see that (x − P(x)) · vj = 0 for every j ∈ {1, . . . , m}. □
The following result is an orthonormal analogue of Lemma 2.44.
Lemma 2.73. Let V be a linear subspace of Rn and let v1 , . . . , vk be orthonormal vectors in V.
Then there is some orthonormal basis of V containing all of the vectors v1 , . . . , vk .
Proof. Let m = dim(V) and W = span({v1, . . . , vk}). If k = m, we are done. Therefore, we may
assume that k < m, so that W ≠ V by Lemma 2.50. Then there is y ∈ V such that y ∉ W. Let
w = (y · v1) v1 + · · · + (y · vk) vk.
Then w ∈ W, so that y ≠ w. Let
vk+1 = (y − w)/‖y − w‖.
Then vk+1 ∈ V, since y ∈ V and w ∈ V. Moreover, we have vk+1 ∈ W⊥ by Lemma 2.72, so that
vk+1 · vi = 0
for every i ≤ k. Furthermore, we also have ‖vk+1‖ = 1. Then v1, . . . , vk, vk+1 are orthonormal,
since v1, . . . , vk are orthonormal.
We extended our list of orthonormal vectors v1, . . . , vk to the list v1, . . . , vk, vk+1 of orthonormal
vectors in V. After doing this construction m − k times, we obtain a list
v1, . . . , vk, vk+1, . . . , vm
of orthonormal vectors in V. Then it is an orthonormal basis of V. □
Corollary 2.74. Every linear subspace of Rn has at least one orthonormal basis.
Let us illustrate the proof of Lemma 2.73 with an explicit example. Let
u1 = (1, 2, 2)^T, u2 = (−1, 0, 2)^T, u3 = (0, 0, 1)^T.
Let us show how to convert these vectors into an orthonormal basis of R3. We proceed as follows.
(1) First, we let w1 = u1.
(2) Then we let w2 = u2 + λw1 for some λ ∈ R. We want w1 and w2 to be orthogonal, so that
0 = w1 · w2 = w1 · (u2 + λw1) = w1 · u2 + λ(w1 · w1) = 3 + 9λ,
which implies that λ = −1/3. This gives
w2 = (−1, 0, 2)^T − (1/3)(1, 2, 2)^T = −(2/3)(2, 1, −2)^T.
(3) Then we let w3 = u3 + λ1 w1 + λ2 w2 for some real numbers λ1 and λ2. As above, we want
the vector w3 to be orthogonal to the vectors w1 and w2, so that
0 = w1 · w3 = w1 · (u3 + λ1 w1 + λ2 w2) = w1 · u3 + λ1 (w1 · w1) = 2 + 9λ1
and
0 = w2 · w3 = w2 · (u3 + λ1 w1 + λ2 w2) = w2 · u3 + λ2 (w2 · w2) = 4/3 + 4λ2,
which implies that λ1 = −2/9 and λ2 = −1/3. This gives
w3 = (0, 0, 1)^T − (2/9)(1, 2, 2)^T − (1/3)(−4/3, −2/3, 4/3)^T = (1/9)(2, −2, 1)^T.
(4) By our construction, the vectors w1, w2 and w3 are orthogonal. To get an orthonormal basis,
we have to normalize them as follows:
v1 = w1/‖w1‖ = (1/3)(1, 2, 2)^T,
v2 = w2/‖w2‖ = (1/3)(−2, −1, 2)^T,
v3 = w3/‖w3‖ = (1/3)(2, −2, 1)^T.
This gives us an orthonormal basis v1, v2 and v3.
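This orthogonalisation procedure is entirely mechanical, so it is easy to automate. The following optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the notes) implements the same construction and reproduces the basis found above.

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalise a list of linearly independent vectors, as in the proof of Lemma 2.73.
        basis = []
        for u in vectors:
            # subtract the projection of u onto the span of the vectors built so far
            w = u - sum((u @ v) * v for v in basis)
            basis.append(w / np.linalg.norm(w))
        return basis

    u1 = np.array([1.0, 2.0, 2.0])
    u2 = np.array([-1.0, 0.0, 2.0])
    u3 = np.array([0.0, 0.0, 1.0])
    for v in gram_schmidt([u1, u2, u3]):
        print(np.round(3 * v))   # prints (1, 2, 2), (-2, -1, 2), (2, -2, 1), i.e. 3*v1, 3*v2, 3*v3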
Proposition 2.75. Let V be a linear subspace of Rn . Then
(1) V ∩ V⊥ = {0};
(2) for each x ∈ Rn , there are unique v ∈ V and w ∈ V⊥ such that x = v + w;
(3) dim(V) + dim(V⊥ ) = n.
Proof. To prove (1), let x be any vector in V ∩ V⊥ . Then
x·v =0
for every v ∈ V. In particular, this holds when v = x. This gives
kxk2 = x · x = 0,
so that x = 0. Vice versa, we have 0 ∈ V ∩ V⊥ , since V ∩ V⊥ is a vector subspace.
Before proving (2) and (3), let us choose some orthonormal basis v1 , . . . , vm of our linear
subspace V. We can do this by Corollary 2.74. Then m = dim(V).
To prove (2), take any vector x ∈ Rn and let
P(x) = (x · v1) v1 + · · · + (x · vm) vm.
Then, evidently, we have
x = P(x) + (x − P(x)).
On the other hand, it follows from Lemma 2.72 that P(x) ∈ V and x − P(x) ∈ V⊥, so that we let
v = P(x),  w = x − P(x).
Then x = v + w, where v ∈ V and w ∈ V⊥ .
To prove the uniqueness of the decomposition x = v + w, let v0 ∈ V and w0 ∈ V⊥ such that
x = v0 + w0 .
We must show that v0 = v and w0 = w. Now,
v + w = x = v0 + w0 ,
so v − v0 = w0 − w. But v − v0 ∈ V and w0 − w ∈ V⊥ , so both belong to V ∩ V⊥ , which is {0}.
Hence, we see that v = v0 and w = w0 as required.
To prove (3), let us choose some orthonormal basis w1, . . . , wk of the linear subspace V⊥.
Then v1, . . . , vm, w1, . . . , wk is an orthonormal basis of Rn. Indeed, these vectors span Rn by (2).
Moreover, all of them are orthogonal to each other, and all of them have length 1 by construction.
Hence v1, . . . , vm, w1, . . . , wk is an orthonormal spanning set of the n-dimensional vector space Rn.
Since orthonormality implies linear independence by Lemma 2.59, it is a basis. Then
dim(V) + dim(V⊥) = m + k = n,
as required. □
Exercise 2.76. Let u be a non-zero vector in Rn . Let V be a subset in Rn given by
n o
n
V= x∈R : u·x=0 .
Prove that V is a linear subspace of Rn of dimension n − 1.
Exercise 2.77. Let V be a linear subspace of Rn . Prove that (V⊥ )⊥ = V.

3. Matrices and linear equations

3.a. Rank–nullity theorem.


My rank is the highest known in Switzerland: I am a free citizen.
Bernard Shaw
Let A be an m × n matrix with real entries. Then
A = ( a11 a12 · · · a1n
      a21 a22 · · · a2n
      . . .
      am1 am2 · · · amn ),
where each aij is a real number. Recall that
where each aij is a real number. Recall that
• col(A) denotes the column space of the matrix A, so that
col(A) ⊆ Rm ,
• row(A) denotes the row space of the matrix A, so that
row(A) ⊆ Rn ,
• ker(A) denotes the kernel of the matrix A, so that
ker(A) ⊆ Rn .
For small n and m, these vector subspaces are easy to describe explicitly.
Example 3.1. Consider the 4 × 3 matrix
A = ( 1   3   4
      0   0   0
      2   5   7
      12  35  47 ).
Now write the columns of the matrix A as y1, y2, y3. Then y3 = y1 + y2, so that dim(col(A)) = 2,
because y1 and y2 are linearly independent. Write the transposes of the rows of A as x1, x2, x3, x4.
Then x1 and x3 are linearly independent. Moreover, we have
x2 = 0x1 + 0x3,  x4 = 10x1 + x3,
so that x1 and x3 span row(A). Therefore, they form a basis of row(A). Then dim(row(A)) = 2.
To find the dimension of ker(A), observe that each vector x ∈ ker(A) is given by
x1 + 3x2 + 4x3 = 0,  2x1 + 5x2 + 7x3 = 0,  12x1 + 35x2 + 47x3 = 0.
Thus, x ∈ ker(A) ⇐⇒ x · x1 = x · x2 = x · x3 = x · x4 = 0. Thus, by Lemma 2.70, we see that
ker(A) = row(A)⊥.
Now using Proposition 2.75, we conclude that ker(A) is one-dimensional.
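These dimensions are easy to confirm numerically. The following optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the notes) computes the rank of A, which equals dim(col(A)) = dim(row(A)), and deduces dim(ker(A)) as the number of columns minus the rank.

    import numpy as np

    A = np.array([[1, 3, 4],
                  [0, 0, 0],
                  [2, 5, 7],
                  [12, 35, 47]])
    rank = np.linalg.matrix_rank(A)
    print(rank, A.shape[1] - rank)   # 2 1 : dim col(A) = dim row(A) = 2 and dim ker(A) = 3 - 2 = 1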



Exercise 3.2. Let A be the following 3 × 3 matrix:
( 5 4 3
  3 3 2
  8 1 3 ).
Find dimensions of the vector spaces col(A), row(A) and ker(A).
Lemma 3.3. Let A be an m × n matrix. Then
dim(col(A)) ≤ min{m, n} and dim(row(A)) ≤ min{m, n}.
Proof. Both inequalities follow from Definition 2.48 and Lemma 2.50. □
The following theorem is one of the most useful results in these notes.
Theorem 3.4 (Rank–nullity). Let A be an m × n matrix. Then
dim(col(A)) + dim(ker(A)) = n,
where n is the number of columns of the matrix A.


Proof. Let w1 , . . . , w` be some basis of col(A), and let v1 , . . . , vk be a basis of the kernel ker(A).
We have to prove that k + ` = n. By Lemma 2.21, we can choose vk+1 , . . . , vk+` ∈ Rn such that
Avk+1 = w1 , Avk+2 = w2 , . . . , Avk+` = w` .
Then k + ` = n ⇐⇒ v1 , . . . , vk+` is a basis of Rn , because each basis of Rn has n elements.
First, we prove that v1 , . . . , vk+` span Rn . Let x ∈ Rn . We have Ax ∈ col(A) by Lemma 2.21.
But the vectors w1 , . . . , w` span col(A), so that
Ax = δ1 w1 + · · · + δ` w`
for some real numbers δ1 , . . . , δ` . Then
 
Ax = δ1 Avk+1 + · · · + δ` Avk+` = A δ1 vk+1 + · · · + δ` vk+` .
Put x̂ = δ1 vk+1 + · · · + δ` vk+` . Then Ax = Ax̂, so that A(x − x̂) = 0, or equivalently
x − x̂ ∈ ker(A).
But v1 , . . . , vk span ker(A), so
x − x̂ = λ1 v1 + · · · + λk vk
for some real numbers λ1 , . . . , λk . Substituting in the definition of x̂ and rearranging gives
x = λ1 v1 + · · · + λk vk + δ1 vk+1 + · · · + δ` vk+` ,
so that x ∈ span{v1 , . . . , vk+` } as required.
Now let us prove that v1 , . . . , vk+` are linearly independent. Let λ1 , . . . , λk , δ1 , . . . , δ` be real
numbers such that
λ1 v1 + · · · + λk vk + δ1 vk+1 + · · · + δ` vk+` = 0.
Multiplying each side by A gives
λ1 Av1 + · · · + λk Avk + δ1 Avk+1 + · · · + δ` Avk+` = 0.
But the vectors v1 , . . . , vk are contained in ker(A) and Avk+i = wi . Then δ1 w1 + · · · + δ` w` = 0.
Since w1 , . . . , w` are linearly independent, we get δ1 = · · · = δ` = 0, which gives
λ1 v1 + · · · + λk vk = 0.
But v1 , . . . , vk are linearly independent, so that λ1 = · · · = λk = 0. We have now shown that all
numbers λj and δi are zero, so that v1 , . . . , vk+` are linearly independent, completing the proof. 

Exercise 3.5. Let A be an m × n matrix such that
AB = Im and BA = In
for some n × m matrix B. Show that m = n.
On the other hand, we have the following result:
Lemma 3.6. Let A be an m × n matrix. Then
ker(A) = row(A)⊥ and row(A) = ker(A)⊥.
Proof. Let x be any vector in Rn. To prove that ker(A) = row(A)⊥, we must show that
Ax = 0 ⇐⇒ x ∈ row(A)⊥.
To do this, write the transposes of the rows of the matrix A as r1, . . . , rm. Then
Ax = (r1 · x, r2 · x, . . . , rm · x)^T.
Hence, we have
Ax = 0 ⇐⇒ r1 · x = 0, r2 · x = 0, . . . , rm · x = 0.
Now, using Lemma 2.70, we see that
Ax = 0 ⇐⇒ x ∈ span({r1, . . . , rm})⊥ ⇐⇒ x ∈ row(A)⊥
by definition of the subspace row(A). This proves ker(A) = row(A)⊥. Applying ⊥ to each side
and using Exercise 2.77, we get ker(A)⊥ = row(A). □
Using this lemma and Proposition 2.75, we obtain
Corollary 3.7. Let A be an m × n matrix. Then
   
dim row A + dim ker A = n

where n is the number of columns of the matrix A.


Using this result and Theorem 3.4, we get
Corollary 3.8. Let A be an m × n matrix. Then
dim(col(A)) = dim(row(A)).

Exercise 3.9. Let A be a m × n matrix such that its every row is a scalar multiple of the first
row. Prove that there is 
i ∈ 1, . . . , n
such that every column of the matrix A is a scalar multiple of the ith column.
The following exercise gives an alternative proof of Corollary 3.8 that does not use Lemma 3.6,
so that it should be solved without using Lemma 3.6 and its corollaries.
Exercise 3.10. Let A be a m × n matrix. Prove Corollary 3.8 as follows.
(i) For every vector x ∈ Rn , prove that if xT AT Ax = 0, then Ax = 0.
(ii) Use (i) to deduce that ker(AT A) = ker(A).
(iii) Use (ii) and Theorem 3.4 to deduce that
   
dim col AT A = dim col A .
(iv) Prove that    
dim col AT A 6 dim col AT .
(v) Use (iii) and (iv) to deduce Corollary 3.8.
Now we are in position to give the following definition.
Definition 3.11. Let A be an m × n matrix. Then the rank of the matrix A is
   
rank(A) = dim col A = dim row A .

Using Definition 3.11 and Lemma 2.42, we see that


• the rank of the matrix is the maximal number of its linearly independent columns;
• or the rank of the matrix is the maximal number of its linearly independent rows.
Exercise 3.12. Let A be a m × n matrix, and let B be a n × k matrix. Prove that
 n  o
rank AB 6 min rank A , rank B .

Theorem 3.4 is known as the rank-nullity theorem, because of the following definition:
Definition 3.13. Let A be an m × n matrix. The nullity of the matrix A is dim(ker(A)).
Exercise 3.14. Let A be a 3 × 5 matrix. What are the possible values of the nullity of A?
Using Definitions 3.11 and 3.13, we can restate Theorem 3.4 as follows: for a matrix A, one has
 
rank A + nullity A = n
where n is the number of columns of the matrix A.
Exercise 3.15. Let A be the following 6 × 4 matrix:
( 1 1 1 1
  0 1 1 1
  1 0 1 1
  1 1 0 1
  1 1 1 0
  1 1 1 1 ).
Find its rank and nullity.

Exercise 3.16. Let A be the following 4 × 4 matrix:
( 1−λ  0    0    0
  0    1−λ  0    0
  0    0    2−λ  1
  0    0    0    2−λ ),
where λ is a real number. For every λ ∈ R, find the rank and nullity of the matrix A.
Let us apply Theorem 3.4 (the rank–nullity theorem) to square matrices.
Theorem 3.17. Let A be a n × n matrix. Then the following conditions are equivalent:
(i) the matrix A is invertible (see Definition 1.39);
(ii) there exists a n × n matrix B such that AB = In ;
(iii) ker(A) = {0}, i.e. the nullity of the matrix A is 0;
(iv) for every vector b ∈ Rn , there is exactly one vector x ∈ Rn such that
Ax = b;
(v) the columns of the matrix A are linearly independent;
(vi) the columns of the matrix A span Rn ;
(vii) the columns of the matrix A form a basis of Rn ;
(viii) rank(A) = n.
Proof. Using Lemma 2.55, we immediately see that
(v) ⇐⇒ (vi) ⇐⇒ (vii) ⇐⇒ (viii),
because the dimension of the vector space Rn is n. Moreover, we have
(iii) ⇐⇒ (viii)
by Theorem 3.4
Let us show that (iv) implies (iii). Let x be a vector in ker(A). Then
Ax = 0 = A0.
Thus, if (iv) holds, then its uniqueness part implies x = 0, so that ker(A) = {0}, which gives (iii).
Now let us show that (iii) implies (iv). Suppose that (iii) holds. Let us show that (iv) also holds.
Let b be any vector in Rn. Write the columns of A as v1, . . . , vn. Since (iii) holds, then (vi) also
holds (we already proved this), so that
b = x1 v1 + · · · + xn vn
for some real numbers x1, . . . , xn. This implies that Ax = b for
x = (x1, x2, . . . , xn)^T.
Moreover, such a vector x is unique. Indeed, if x̂ is another vector in Rn such that Ax̂ = b, then
A(x̂ − x) = Ax̂ − Ax = b − b = 0,
so that the vector x̂ − x is contained in ker(A). But ker(A) = {0}, since (iii) holds. Then
x̂ − x = 0,
which gives x̂ = x as required. This shows that (iii) ⇐⇒ (iv).

Now let us prove that (i) implies (iii). Let x be a vector in ker(A). If A is invertible, then
x = In x = (A−1 A) x = A−1 (Ax) = A−1 0 = 0,
so that x = 0. Thus, if A is invertible, then ker(A) = {0}, which gives (iii).


Now let us prove that (iv) implies (ii). Let e1, . . . , en be the standard basis of Rn. If (iv) holds,
then there are vectors b1, . . . , bn in Rn such that
Ab1 = e1, Ab2 = e2, . . . , Abn = en,
which can be rewritten as AB = In, where B is the n × n matrix whose columns are b1, . . . , bn.
This shows that (iv) implies (ii).
To complete the proof, it is enough to show that (ii) implies (i). Let us do this. Suppose that (ii)
holds. This means that
AB = In
for some n × n matrix B. Let C = BA. We have to show that C = In . Suppose that C 6= In .
Let us seek for a contradiction. Note that
 
AC = A BA = AB A = In A = A.
Then A(C −In ) = 0. Since C 6= In , at least one column of the matrix C −In is not zero. Denote this
non-zero column by u. Then Au = 0, so that u ∈ ker(A). In particular, we see that ker(A) 6= {0}.
Recall from Lemma 2.21 that
n o
m n
col(A) = y ∈ R : y = Ax for some x ∈ R .

Substituting all columns of the matrix B into this formula and using AB = In, we conclude that
all vectors of the standard basis e1, . . . , en are contained in col(A). This means that rank(A) = n.
Then, by Theorem 3.4, we have ker(A) = {0}, which is a contradiction, so that (ii) implies (i). 
Exercise 3.18. Write down the inverse of the matrix
( 7  0 0  0
  0 −2 0  0
  0  0 1  0
  0  0 0 10 ).
Corollary 3.19. Let A be a n × n matrix. Then the following conditions are equivalent:
(i) A is invertible;
(ii) the rows of A are linearly independent;
(iii) the rows of A span Rn ;
(iv) the rows of A are a basis of Rn .
Proof. By Lemma 1.45, the matrix A is invertible ⇐⇒ the transposed matrix AT is invertible.
Thus, applying Theorem 3.17 to the matrix AT , we obtain the required assertion. 
Exercise 3.20. Let A be a m × n matrix. Prove or disprove the following assertions.
(i) The rows of A are linearly independent ⇐⇒ the columns of A are linearly independent.
(ii) Suppose that m = n, i.e. A is a square matrix. Then the rows of the matrix A are linearly
independent ⇐⇒ the columns of the matrix A are linearly independent.

3.b. Determinants.
After hard work, the biggest determinant is being in the right place at the right time.
Michael Bloomberg
In Example 1.41, we met determinants of 2 × 2 matrices. Namely, let
A = ( a b
      c d ).
From Example 1.41, we know that the determinant of A is the number
ad − bc,
which is denoted by det(A) or sometimes by
| a b |
| c d |.
Then |det(A)| is the area of the parallelogram in R2 whose edge-vectors are the columns of A.
Now we consider a similar example in dimension three. Let A be a 3 × 3 matrix
( a11 a12 a13
  a21 a22 a23
  a31 a32 a33 ).
Then its determinant det(A) is the number given by the formula
det(A) = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31,
which can be rewritten as
det(A) = a11 det( a22 a23 ; a32 a33 ) − a12 det( a21 a23 ; a31 a33 ) + a13 det( a21 a22 ; a31 a32 ).
Alternatively, we can rewrite this formula as
det(A) = a11 det( a22 a23 ; a32 a33 ) − a21 det( a12 a13 ; a32 a33 ) + a31 det( a12 a13 ; a22 a23 ).
As in dimension two, one can show that |det(A)| is the volume of the parallelepiped in R3 whose
edges are the columns of the matrix A.
Exercise 3.21. Find the maximum value of the determinant
| a11 a12 a13 |
| a21 a22 a23 |
| a31 a32 a33 |
when aij ∈ {0, 1} for every i ∈ {1, 2, 3} and j ∈ {1, 2, 3}.


Similarly, if A is a 4 × 4 matrix
( a11 a12 a13 a14
  a21 a22 a23 a24
  a31 a32 a33 a34
  a41 a42 a43 a44 ),
then its determinant det(A) is given by the formula
a11 a22 a33 a44 − a11 a22 a34 a43 − a11 a23 a32 a44 + a11 a23 a34 a42 + a11 a24 a32 a43 − a11 a24 a33 a42 −
− a12 a21 a33 a44 + a12 a21 a34 a43 + a12 a23 a31 a44 − a12 a23 a34 a41 − a12 a24 a31 a43 + a12 a24 a33 a41 +
+ a13 a21 a32 a44 − a13 a21 a34 a42 − a13 a22 a31 a44 + a13 a22 a34 a41 + a13 a24 a31 a42 − a13 a24 a32 a41 −
− a14 a21 a32 a43 + a14 a21 a33 a42 + a14 a22 a31 a43 − a14 a22 a33 a41 − a14 a23 a31 a42 + a14 a23 a32 a41,
which is not as uninformative as it may look. As in the previous cases, we can rewrite it
using the very same pattern:
det(A) = a11 det( a22 a23 a24 ; a32 a33 a34 ; a42 a43 a44 ) − a12 det( a21 a23 a24 ; a31 a33 a34 ; a41 a43 a44 )
       + a13 det( a21 a22 a24 ; a31 a32 a34 ; a41 a42 a44 ) − a14 det( a21 a22 a23 ; a31 a32 a33 ; a41 a42 a43 ).

Exercise 3.22. Let a, b, c and d be real numbers, and let
A = (  a  b  c  d
      −b  a  d −c
      −c −d  a  b
      −d  c −b  a ).
Prove that det(A) = (a² + b² + c² + d²)².
Now we are in position to define the determinant of any n × n matrix using induction on n.
Namely, let A be an n × n matrix
( a11 a12 . . . a1n
  a21 a22 . . . a2n
  . . .
  an1 an2 . . . ann ).
If n = 1, we let det(A) = a11. If n ≥ 2, we write A[i, j] for the (n − 1) × (n − 1) matrix obtained
from A by deleting the ith row and jth column. Then the determinant of A is the number det(A)
that is defined as
(3.23) det(A) = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A[1, j]),
which is sometimes also denoted as
| a11 a12 . . . a1n |
| a21 a22 . . . a2n |
| . . .             |
| an1 an2 . . . ann |.
Example 3.24. Using our definition and induction on n, one can show that the determinant of a
diagonal matrix with diagonal entries λ1, λ2, . . . , λn is
λ1 λ2 · · · λn.
In particular, we have det(In) = 1.

Our inductive definition of the determinant of the matrix A gives a special status to its first
row. In fact, we can use any row or column to give a similar formula.
Proposition 3.25. Let A be an n × n matrix (aij). Then
det(A) = Σ_{j=1}^{n} (−1)^{i+j} aij det(A[i, j])
for each i ∈ {1, . . . , n}. Similarly, we have
det(A) = Σ_{i=1}^{n} (−1)^{i+j} aij det(A[i, j])
for each j ∈ {1, . . . , n}.
Proof. Omitted. □
We can often use this result to speed up calculations of determinants, as in the next example.
To handle the signs (−1)^{i+j}, it is useful to notice that they form a chessboard pattern:
( + − + − · · ·
  − + − + · · ·
  + − + − · · ·
  − + − + · · ·
  · · ·          ).
Example 3.26. Suppose we want to compute the determinant of
A = (  3  1 0  2
      10  4 3 −9
       4 −1 0  4
      −7  2 0  0 ).
Expanding along the third column using Proposition 3.25, we get
det(A) = −3 det( 3 1 2 ; 4 −1 4 ; −7 2 0 ) = −3 ( −7 det( 1 2 ; −1 4 ) − 2 det( 3 2 ; 4 4 ) ) = −3(−42 − 8) = 150.
Proposition 3.27. Let A be a n × n matrix with columns v1 , . . . , vn .


(i) Let B be the matrix obtained from A by swapping columns i and j for i 6= j. Then
det(B) = −det(A).
(ii) Let B be the matrix obtained from A by multiplying the ith column by a scalar λ. Then
det(B) = λdet(A).
(iii) If vi = 0 for some i ∈ {1, . . . , n}, then det(A) = 0.
(iv) Let B and C be n × n matrices with columns u1 , . . . , un , and w1 , . . . , wn , respectively.
Suppose that there is k ∈ {1, . . . , n} such that
wk = vk + uk
and wi = vi = ui for every i ∈ {1, . . . , n} such that i 6= k. Then
det(C) = det(A) + det(B).
(v) det(AB) = det(A)det(B) for any n × n matrix B.
(vi) det(AT ) = det(A).

Proof. Omitted. 
Corollary 3.28. Let A be a n × n matrix with columns v1 , . . . , vn . Suppose that
vi = vj
for some i and j in {1, . . . , n} such that i 6= j. Then det(A) = 0.
Proof. Let B be the matrix obtained from A by swapping columns i and j. Then
det(B) = −det(A),
by Proposition 3.27(i). But B = A, so that det(A) = −det(A), which gives det(A) = 0. 
Corollary 3.29. Let A be a n × n matrix with columns v1 , . . . , vn , let C be a n × n matrix with
columns w1 , . . . , wn . Suppose that there are k 6= r in {1, . . . , n} such that
wk = vk + λr vr
for some λr ∈ R, and wi = vi for every i ∈ {1, . . . , n} such that i 6= k. Then det(C) = det(A).
Proof. Let B be the n × n matrix with columns u1, . . . , un defined as follows:
ui = vi if i ≠ k, and uk = λr vr.
By Proposition 3.27(ii), det(B) = λr det(B′), where B′ is obtained from B by replacing its kth column
with vr; since B′ has two equal columns (the kth and the rth), Corollary 3.28 gives det(B) = 0. Then
det(C) = det(A) + det(B) = det(A)
by Proposition 3.27(iv). □
Corollary 3.30. Let A be an invertible n × n matrix. Then det(A) ≠ 0 and
det(A−1) = 1/det(A).
Proof. One has AA−1 = In, so that
det(A) det(A−1) = det(In) = 1
by Proposition 3.27(v) and Example 3.24. □
Exercise 3.31. Let A be a n × n matrix with integer entries. Suppose that A is invertible, and
the inverse matrix A−1 also has integer entries. What is det(A)?
Exercise 3.32. Doing as little work as possible, compute the determinant of the matrix
(  1 2  1  0  4
   7 0  1 −1 −2
  −1 3 −1  0  1
   0 2  0  0  0
   1 1  1  0  2 ).
Exercise 3.33. Observe that the integers 20604, 53227, 25755, 20927 and 289 are divisible by 17.
Moreover, it follows from (3.23) that the determinant of the matrix
 
2 0 6 0 4
5 3 2 2 7 
 
2 5 7 5 5  .
 
2 0 9 2 7 
0 0 2 8 9
is an integer. Prove that it is also divisible by 17.

Exercise 3.34. Let x1 , x2 , x3 , x4 , y1 , y2 , y3 and y4 be real numbers, let


   
x1 x2 x3 x4 y1 y2 y3 y4
−x2 x1 x 4 −x 3
  −y 2 y1 y4 −y3 
A= −x3 −x4 x1 and B =  .
x2   −y3 −y4 y1 y2 
−x4 x3 −x2 x1 −y4 y3 −y2 y1
Using Exercise 3.22 and computing AB^T, prove that
2
x21 + x22 + x23 + x24 y12 + y22 + y32 + y42 = x1 y1 + x2 y2 + x3 y3 + x4 y4 +
 
2 2 2
+ x1 y2 − x2 y1 − x3 y4 + x4 y3 + x1 y3 + x2 y4 − x3 y1 − x4 y2 + x1 y4 − x2 y3 + x3 y2 − x4 y1 .
By Proposition 3.27(vi), for every property of determinants involving columns, there is a similar
property involving rows. For instance, if some row of an n × n matrix A is zero, then det(A) = 0.
Lemma 3.35. Let A be an n × n matrix. If rank(A) < n, then det(A) = 0.
Proof. Suppose that rank(A) < n. Then the columns (or rows) of the matrix A are linearly dependent.
Let v1, . . . , vn be the columns of the matrix A. Then
λ1 v1 + λ2 v2 + · · · + λn vn = 0
for some real numbers λ1, λ2, . . . , λn such that at least one of these numbers is not zero. Without
loss of generality, we may assume that λ1 ≠ 0. Then
v1 = −(λ2/λ1) v2 − · · · − (λn/λ1) vn.
Now applying Corollary 3.29 n − 1 times, we obtain a matrix with the same determinant as A
whose first column is
v1 + (λ2/λ1) v2 + · · · + (λn/λ1) vn = 0.
Then applying Proposition 3.27(iii), we get det(A) = 0. □
In fact, if A is an n × n matrix, then rank(A) < n ⇐⇒ det(A) = 0. To show this, we have to
introduce some terminology: the (i, j)-cofactor of A is
Cij = (−1)^{i+j} det(A[i, j]).
Then it follows from Proposition 3.25 that
det(A) = Σ_{j=1}^{n} aij Cij
for any i ∈ {1, . . . , n}. Then the adjugate of A is the n × n matrix adj(A) whose (i, j)-entry is Cji.
Note the reversal of the indices! For instance, the adjugate of a 2 × 2 matrix is given by
adj( a b ; c d ) = ( d −b ; −c a ).
Notice that in this case, A adj(A) = det(A)I2. In fact, this is true for all square matrices:
Proposition 3.36. Let A be a n × n matrix. Then
A adj(A) = det(A)In .

Proof. Let A be an n × n matrix. We use the convention that the (i, j)-entry of a matrix M is
written as mij. Let i and k be any indices in {1, . . . , n}. We must show that
(A adj(A))ik = det(A) if i = k, and (A adj(A))ik = 0 if i ≠ k.
We have
(A adj(A))ik = Σ_{j=1}^{n} aij Ckj = Σ_{j=1}^{n} (−1)^{k+j} aij det(A[k, j]).
If i = k, then this sum is equal to det(A), by Proposition 3.25, as required.
Now we suppose that i ≠ k. Let B be the n × n matrix obtained from A by replacing the kth
row by the ith row (and leaving all the other rows alone). The rows of A and B are the same apart
from the kth row, so
B[k, j] = A[k, j]
for all j. Also, we have bkj = aij for all j. Hence
(A adj(A))ik = Σ_{j=1}^{n} (−1)^{k+j} bkj det(B[k, j]) = det(B) = 0
by Proposition 3.25 and Lemma 3.35, since two rows of B are equal. Hence (A adj(A))ik = 0. □
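The identity A adj(A) = det(A) In is easy to test on examples. The following optional Python/NumPy sketch (NumPy and the sample matrix are assumptions of the illustration, not of the notes) builds adj(A) directly from the cofactors and checks the identity numerically.

    import numpy as np

    def adjugate(A):
        # adj(A): the (i, j)-entry is the (j, i)-cofactor C_ji.
        n = A.shape[0]
        adj = np.zeros_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, j, axis=0), i, axis=1)   # the matrix A[j, i]
                adj[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return adj

    A = np.array([[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 2.0]])
    print(np.allclose(A @ adjugate(A), np.linalg.det(A) * np.eye(3)))   # True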
Corollary 3.37. Let A be a n × n matrix. Then A is invertible ⇐⇒ detA 6= 0.
Proof. By Corollary 3.30, it is enough to prove that A is invertible provided that det(A) 6= 0.
However, if det(A) 6= 0, then
 
1
A adj(A) = In
detA
by Proposition 3.36, so A is invertible by Theorem 3.17. 
Corollary 3.38. Let A be a n × n matrix. Then
rank(A) = n ⇐⇒ det(A) 6= 0 ⇐⇒ A is invertible.
Exercise 3.39. Find inverses of the following 3 × 3 matrices:
( 1 0 0 ; 0 1 0 ; 3 0 1 ),  ( 6 0 0 ; 0 1 2 ; 0 3 5 ),  ( 1 3 0 ; 2 7 0 ; 0 0 7 ),  ( 1 2 3 ; 0 1 4 ; 5 6 0 ).
Exercise 3.40. Let A be the following 3 × 3 matrix:
( 7−λ  −12     6
  10   −19−λ  10
  12   −24    13−λ ),
where λ is a real number. Find rank(A) for all possible λ ∈ R.
Exercise 3.41. Let A be a m × n matrix such that
rank(A) = m.
Prove that there exists a n × m matrix B such that AB = Im .

3.c. Systems of linear equations.


A great deal of my work is just playing with equations and seeing what they give.
Paul Dirac
Let us consider the following system of simultaneous linear equations:
(3.42)  a11 x1 + a12 x2 + · · · + a1n xn = b1,
        a21 x1 + a22 x2 + · · · + a2n xn = b2,
        . . .
        am1 x1 + am2 x2 + · · · + amn xn = bm.
Here, each aij and bi are some (fixed) real numbers, while x1, . . . , xn are unknowns (variables).
The fundamental questions about a system of linear equations are these:
• Are there any solutions?
• If so, how many?
• And how can we compute them?
To answer them, we can use results proved in Section 3.a, because (3.42) can be rewritten as
Ax = b, where A is the m × n matrix with entries aij, x = (x1, . . . , xn)^T and b = (b1, . . . , bm)^T.
Example 3.43. Suppose that m = n = 2. Then (3.42) simplifies to
a11 x1 + a12 x2 = b1,
a21 x1 + a22 x2 = b2.
Multiply the first equation by a21 and the second by a11, then subtract. This gives
(a11 a22 − a12 a21) x2 = a11 b2 − a21 b1.
Assuming that a11 a22 − a12 a21 ≠ 0, this gives
x2 = (a11 b2 − a21 b1)/(a11 a22 − a12 a21),
from which it follows that
x1 = (a22 b1 − a12 b2)/(a11 a22 − a12 a21).
So as long as a11 a22 − a12 a21 ≠ 0, there is a unique solution. Note that
a11 a22 − a12 a21 = det( a11 a12 ; a21 a22 ).
Thus, if this matrix is invertible, then the system has a unique solution (cf. Example 1.41).
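As an optional check of these closed-form expressions (Python with NumPy, which is an assumption of the illustration, not of the notes; the sample coefficients are arbitrary), one can compare them with a numerical solver:

    import numpy as np

    a11, a12, a21, a22, b1, b2 = 2.0, 3.0, 1.0, -4.0, 5.0, 6.0
    d = a11*a22 - a12*a21                  # the determinant; assumed non-zero here
    x1 = (a22*b1 - a12*b2) / d
    x2 = (a11*b2 - a21*b1) / d
    A = np.array([[a11, a12], [a21, a22]])
    b = np.array([b1, b2])
    print(np.allclose(np.linalg.solve(A, b), [x1, x2]))   # True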
Now let us consider two very explicit examples, Examples 3.44 and 3.45 below, that illustrate
the method for solving system of linear equations which is known as Gaussian elimination.
Example 3.44. Let us consider the following system of linear equations:

2x + 3y − z = 3,
x + y + z = 4,
3x − 4y + z = 1.

In matrix form, it looks like


    
2 3 −1 x 3
1 1 1  y  = 4 .
3 −4 1 z 1
Let us simplify our system of linear equations. We begin by swapping the first two equations:

x + y + z = 4,

2x + 3y − z = 3,
3x − 4y + z = 1.

Obviously, this system of linear equations has the same solutions as our original system of equations.
Now let us subtract 2 times the first equation from the second equation, and then subtract 3 times
the first equation from the third equation. This gives:

x + y + z = 4,

y − 3z = −5,
 − 7y − 2z = −11.

Doing this does not change the solutions either, because


numbers x, y and z satisfy the old equations ⇐⇒ they satisfy the new ones.
We have eliminated x from all but one equation. Let us try to do the same with the variable y.
Subtracting 1 times the second equation from the first equation, and adding 7 times the second
equation to the third one, we get

x + 4z = 9,

y − 3z = −5,
 − 23z = −46.

Again, this system of linear equations has the same solutions as the original system of equations.
Actually, we can simplify it a bit more:

x + 4z = 9,

y − 3z = −5,

z = 2.

In matrix form, this new system of equations looks like


    
1 0 4 x 9
0 1 −3 y  = −5 .
0 0 1 z 2
We see that the new 3 × 3 matrix is upper-triangular and all its diagonal elements are equal to 1.
Let us simplify this system of equations a bit more. Subtracting 4 times the third equation from
the first equation, and adding 3 times the third equation to the second equation, gives us

x = 1,

y = 1,

z = 2.

It has a unique solution: x = 1, y = 1 and z = 2.
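The elimination above can be checked with a numerical solver. The following optional Python/NumPy sketch (NumPy is an assumption of the illustration, not of the notes) solves the same system and recovers x = 1, y = 1, z = 2.

    import numpy as np

    A = np.array([[2.0, 3.0, -1.0],
                  [1.0, 1.0, 1.0],
                  [3.0, -4.0, 1.0]])
    b = np.array([3.0, 4.0, 1.0])
    print(np.linalg.solve(A, b))   # [1. 1. 2.], matching the elimination above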



Example 3.45. Let us consider the following system of linear equations:
$$\begin{cases} x + 2y + 3z = 1,\\ 3x + 5y - 2z = 2,\\ 4x + 7y + z = 3.\end{cases}$$
Subtracting 3 times the first equation from the second equation, and 4 times the first equation
from the third equation, we get
$$\begin{cases} x + 2y + 3z = 1,\\ -y - 11z = -1,\\ -y - 11z = -1.\end{cases}$$
Now we multiply the second equation by −1, and get:
$$\begin{cases} x + 2y + 3z = 1,\\ y + 11z = 1,\\ -y - 11z = -1.\end{cases}$$
Then subtract 2 times the second equation from the first one, and add 1 times the second equation
to the third equation. This gives
$$\begin{cases} x - 19z = -1,\\ y + 11z = 1,\\ 0 = 0.\end{cases}$$
Now the third equation can be ignored. Moreover, in the first two equations, we can choose z freely,
say by putting z = t for an arbitrary scalar t. Thus, the solutions of the system are:
$$\begin{cases} x = -1 + 19t,\\ y = 1 - 11t,\\ z = t,\end{cases}$$
where t is any real number. We call z a free variable, since we can choose it freely. Similarly, we
say that x and y are leading variables.
If all the numbers b1, b2, ..., bm in (3.42) are equal to zero, then (3.42) simplifies to
$$(3.46)\qquad \begin{cases} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0,\\ \qquad\qquad\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0.\end{cases}$$
Such systems of linear equations are called homogeneous. Otherwise, they are called inhomogeneous. Let us rewrite (3.46) as Ax = 0, where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\end{pmatrix}.$$
Then the solutions to (3.46) are exactly the vectors in the kernel ker(A), which is a linear subspace of Rn by Lemma 2.5.
Therefore, in particular, our system of linear equations (3.46) always has the trivial solution x = 0.
Moreover, applying Theorem 3.4 (the rank–nullity theorem), we get

Corollary 3.47. A homogeneous system of linear equations with more variables than equations
always has infinitely many solutions. In particular, it has at least one nontrivial solution.

Example 3.48. A conic in R2 is a curve that is given by the following equation
$$(3.49)\qquad ax^2 + bxy + cy^2 + dx + ey + f = 0,$$
where a, b, c, d, e, f are some real numbers such that (a, b, c) ≠ (0, 0, 0). For example, an ellipse,
a hyperbola and a parabola are all conics. Let P1, P2, P3, P4, P5 be distinct points in R2 such that at
most 3 of them are contained in one line. Then there exists a unique conic in R2 that contains all
of them. Indeed, write P1 = (x1, y1), P2 = (x2, y2), P3 = (x3, y3), P4 = (x4, y4) and P5 = (x5, y5),
where each xi and yi is a real number. If C is a conic that contains P1, P2, P3, P4 and P5, then
$$(3.50)\qquad \begin{cases} ax_1^2 + bx_1y_1 + cy_1^2 + dx_1 + ey_1 + f = 0,\\ ax_2^2 + bx_2y_2 + cy_2^2 + dx_2 + ey_2 + f = 0,\\ ax_3^2 + bx_3y_3 + cy_3^2 + dx_3 + ey_3 + f = 0,\\ ax_4^2 + bx_4y_4 + cy_4^2 + dx_4 + ey_4 + f = 0,\\ ax_5^2 + bx_5y_5 + cy_5^2 + dx_5 + ey_5 + f = 0.\end{cases}$$
Therefore, we have 6 unknowns a, b, c, d, e and f, and only 5 linear equations. Using Corollary 3.47,
we see that there exists a conic in R2 that contains P1, P2, P3, P4, P5. Now we let
$$A = \begin{pmatrix} x_1^2 & x_1y_1 & y_1^2 & x_1 & y_1 & 1\\ x_2^2 & x_2y_2 & y_2^2 & x_2 & y_2 & 1\\ x_3^2 & x_3y_3 & y_3^2 & x_3 & y_3 & 1\\ x_4^2 & x_4y_4 & y_4^2 & x_4 & y_4 & 1\\ x_5^2 & x_5y_5 & y_5^2 & x_5 & y_5 & 1\end{pmatrix},$$
the coefficient matrix of (3.50). To show that there exists a unique conic in R2 that contains P1, P2, P3, P4, P5, it is enough to show
that rank(A) = 5. Indeed, if rank(A) = 5, then it follows from Theorem 3.4 that the solutions of the
system of linear equations (3.50) form a one-dimensional vector subspace in R6, which simply means
that the solution is unique up to scaling, so that (3.49) is unique up to scaling, which means that
the required conic is unique. On the other hand, we know from Definition 3.11 that
$$\mathrm{rank}(A) = 5 \iff \dim\mathrm{col}(A) = 5 \iff \mathrm{col}(A) = \mathbb{R}^5.$$
Thus, we conclude that
$$\mathrm{rank}(A) = 5 \iff \mathrm{col}(A)\ \text{contains}\ e_1, e_2, e_3, e_4, e_5,$$
where e1, e2, e3, e4, e5 is the standard basis of R5. Therefore, it follows from Lemma 2.21 that to
prove the uniqueness of the conic that contains P1, P2, P3, P4, P5, it is enough to find five vectors
$$\begin{pmatrix}a_1\\b_1\\c_1\\d_1\\e_1\\f_1\end{pmatrix},\ \begin{pmatrix}a_2\\b_2\\c_2\\d_2\\e_2\\f_2\end{pmatrix},\ \begin{pmatrix}a_3\\b_3\\c_3\\d_3\\e_3\\f_3\end{pmatrix},\ \begin{pmatrix}a_4\\b_4\\c_4\\d_4\\e_4\\f_4\end{pmatrix},\ \begin{pmatrix}a_5\\b_5\\c_5\\d_5\\e_5\\f_5\end{pmatrix}$$
such that
$$A\begin{pmatrix}a_1\\b_1\\c_1\\d_1\\e_1\\f_1\end{pmatrix} = e_1,\quad A\begin{pmatrix}a_2\\b_2\\c_2\\d_2\\e_2\\f_2\end{pmatrix} = e_2,\quad A\begin{pmatrix}a_3\\b_3\\c_3\\d_3\\e_3\\f_3\end{pmatrix} = e_3,\quad A\begin{pmatrix}a_4\\b_4\\c_4\\d_4\\e_4\\f_4\end{pmatrix} = e_4,\quad A\begin{pmatrix}a_5\\b_5\\c_5\\d_5\\e_5\\f_5\end{pmatrix} = e_5.$$
Keeping in mind (3.49), we see that this is equivalent to the following geometric condition:
(F) for every Pi among P1, P2, P3, P4, P5, there exists a conic Ci in R2 such that
(1) the conic Ci does not contain the point Pi;
(2) the conic Ci contains all other points among P1, P2, P3, P4, P5.
The condition (F) is not very hard to check. Without loss of generality, it is enough to check (F) for P1.
Namely, let us show that there is a conic C1 such that P1 ∉ C1 while C1 contains P2, P3, P4, P5.
To do this, denote by Lij the line in R2 that contains Pi and Pj, where i and j are in {1, 2, 3, 4, 5}
such that i ≠ j. If P1 ∉ L23 ∪ L45, we can let
$$C_1 = L_{23}\cup L_{45},$$
so that P1 ∉ C1 and C1 contains P2, P3, P4, P5. Therefore, without loss of generality, we may
assume that P1 ∈ L23. Then P4 ∉ L23 and P5 ∉ L23, so that
$$P_1 \not\in L_{24}\cup L_{35},$$
since L24 ∩ L23 = {P2} and L23 ∩ L35 = {P3}. Thus, we can let C1 = L24 ∪ L35.
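For a concrete instance of Example 3.48, one can set up the 5 × 6 coefficient matrix of (3.50) and compute its null space by machine. Below is a minimal sketch (assuming SymPy is available); the five points are made up purely for illustration, and any basis vector of the one-dimensional null space gives the coefficients (a, b, c, d, e, f) of the conic through them.

```python
from sympy import Matrix

# Five (hypothetical) points in R^2, no three of them on a line.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 3)]

# Rows of the 5x6 matrix: (x^2, xy, y^2, x, y, 1) evaluated at each point.
A = Matrix([[x*x, x*y, y*y, x, y, 1] for (x, y) in points])

null = A.nullspace()        # basis of ker(A) as column vectors in R^6
print(len(null))            # 1 when rank(A) = 5, i.e. the conic is unique up to scaling
print(null[0].T)            # coefficients (a, b, c, d, e, f) of the conic
```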
Exercise 3.51. A cubic curve in R2 is a curve that is given by the following equation
$$a_1x^3 + a_2x^2y + a_3xy^2 + a_4y^3 + a_5x^2 + a_6xy + a_7y^2 + a_8x + a_9y + a_{10} = 0,$$
where a1, a2, a3, a4, a5, a6, a7, a8, a9 and a10 are fixed real numbers such that at least one number
among a1, a2, a3, a4 is not zero. Let C1 and C2 be two distinct cubic curves in R2 such that their
intersection C1 ∩ C2 contains 9 distinct points P1, P2, P3, P4, P5, P6, P7, P8, P9 such that
• at most 2 points among them are contained in one line,
• and at most 5 points among them are contained in one conic (see Example 3.48).
Prove that every cubic curve in R2 that contains 8 points among P1, P2, P3, P4, P5, P6, P7, P8, P9
also contains the remaining ninth point (see Chasles' Theorem in the Preface).
We saw that homogeneous systems of linear equations always have (possibly trivial) solutions.
What about inhomogeneous systems of linear equations? These need not have any solutions, even
if they have more variables than equations. For instance, the following system has no solutions:
$$\begin{cases} 2x_1 + 3x_2 + 4x_3 = 1,\\ 20x_1 + 30x_2 + 40x_3 = 1.\end{cases}$$
Lemma 3.52. Let A be an m × n matrix, let b be a vector in Rm, and let u be a vector in Rn.
Suppose that x = u is a solution to Ax = b. Then the set of all solutions to Ax = b is
$$\big\{\, x \in \mathbb{R}^n : x = u + w\ \text{for some}\ w \in \ker(A)\,\big\}.$$
Proof. We have to prove two things: that every element of this set is a solution, and that every
solution belongs to this set. First, we let w ∈ ker(A). Then
$$A(u + w) = Au + Aw = b + 0 = b,$$
as required. Second, let x be a vector in Rn such that Ax = b. Let w = x − u. Then
$$Aw = A(x - u) = Ax - Au = b - b = 0,$$
so that w ∈ ker(A). On the other hand, we have x = u + w. □
Example 3.53. Consider the system of linear equations consisting of one equation: 2x − 3y = 7.
One of its solutions is x = 5 and y = 1. The associated homogeneous system is
$$2x - 3y = 0.$$
Its general solution is
$$\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}3t\\2t\end{pmatrix},$$
where t is any real number. So by Lemma 3.52, the general solution of the original system is
$$\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}5\\1\end{pmatrix} + \begin{pmatrix}3t\\2t\end{pmatrix},$$
where t is any real number.
By Lemma 3.52, one of the following three possibilities holds:
• the system of linear equations (3.42) has no solutions,
• the system of linear equations (3.42) has exactly one solution,
• the system of linear equations (3.42) has infinitely many solutions.
Moreover, if (3.42) has a solution, then
(3.42) has exactly one solution ⇐⇒ rank(A) = n.
This fact immediately follows from Lemma 3.52 and Theorem 3.4 (the rank–nullity theorem).
To determine whether (3.42) has a solution or not, it is useful to write the m × n matrix A next
to the m-dimensional vector b, making a single matrix with a vertical bar separating A from b:
$$\left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & & \vdots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m\end{array}\right).$$
This matrix is called the augmented matrix of the system of linear equations (3.42).
Lemma 3.54. The system of linear equations (3.42) has a solution if and only if
$$\mathrm{rank}(A) = \mathrm{rank}\big(\widehat{A}\big),$$
where $\widehat{A}$ is the augmented matrix of the system of linear equations (3.42).
Proof. Observe that col(A) ⊂ col($\widehat{A}$), so that
$$\mathrm{rank}(A) \leqslant \mathrm{rank}\big(\widehat{A}\big)$$
by Lemma 2.50. Moreover, if (3.42) has a solution, then col(A) must contain b by Lemma 2.21,
so that col($\widehat{A}$) = col(A), which gives rank(A) = rank($\widehat{A}$).
Let v1, ..., vn be the columns of the matrix A. If rank(A) = rank($\widehat{A}$), then
$$\mathrm{col}(A) = \mathrm{col}\big(\widehat{A}\big)$$
by Lemma 2.50, which gives b ∈ col(A), so that there are real numbers x1, ..., xn such that
$$b = \sum_{i=1}^{n} x_i v_i.$$
This equality can be rewritten as Ax = b, so that (3.42) has a solution in this case. □
Corollary 3.55. The system of linear equations (3.42) has a unique solution if and only if
$$\mathrm{rank}(A) = \mathrm{rank}\big(\widehat{A}\big) = n,$$
where $\widehat{A}$ is the augmented matrix of (3.42), and n is the number of columns of A.
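Lemma 3.54 and Corollary 3.55 translate directly into a quick computational test. Here is a minimal sketch (assuming NumPy), applied to the inconsistent two-equation system displayed above:

```python
import numpy as np

# The system 2x1 + 3x2 + 4x3 = 1, 20x1 + 30x2 + 40x3 = 1 from above.
A = np.array([[2.0, 3.0, 4.0],
              [20.0, 30.0, 40.0]])
b = np.array([1.0, 1.0])

rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(np.column_stack([A, b]))

print(rank_A, rank_aug)      # 1 and 2
print(rank_A == rank_aug)    # False: by Lemma 3.54 the system has no solution
```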

Exercise 3.56. Let A be a 4 × 3 matrix such that the equation
$$Ax = \begin{pmatrix}8\\9\\7\\12\end{pmatrix}$$
has exactly one solution. Write down all solutions of the equation Ax = 0.
Exercise 3.57. Show that the following system of linear equations has a unique solution:
$$\begin{cases} x_1 + 2x_2 + 3x_3 = 1,\\ x_2 + 4x_3 = 0,\\ 5x_1 + 6x_2 = 1.\end{cases}$$
Then find this solution using the solution to Exercise 3.39.


Exercise 3.58. Observe that x1 = 3, x2 = 2 and x3 = 1 is a solution to
$$\begin{cases} 2x_1 + 5x_2 - 8x_3 = 8,\\ 4x_1 + 3x_2 - 9x_3 = 9,\\ 2x_1 + 3x_2 - 5x_3 = 7,\\ x_1 + 8x_2 - 7x_3 = 12.\end{cases}$$
Show that this system of linear equations does not have other solutions.

3.d. Row echelon form.


There is hardly any theory which is more elementary than linear algebra,
in spite of the fact that generations of professors and textbook writers
have obscured its simplicity by preposterous calculations with matrices.
Jean Dieudonné
In Examples 3.44 and 3.45, we solved two systems of linear equations in a clear methodical way.
We used three operations repeatedly:
• interchange two equations;
• multiply an equation by a nonzero scalar (on both sides);
• add a multiple of one equation to another equation.
Let us translate these operations into the matrix language.

Let A be an m × n matrix, and let b be a vector in Rm. We want to find all x ∈ Rn such that Ax = b.
Recall from Section 3.c that it is convenient to consider the following augmented matrix:
$$\widehat{A} = \left(\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1\\ a_{21} & a_{22} & \cdots & a_{2n} & b_2\\ \vdots & \vdots & & \vdots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m\end{array}\right).$$
The equations of our system of linear equations Ax = b correspond to the rows of the matrix $\widehat{A}$,
with the bar separating the left-hand side from the right-hand side. Moreover, the three operations
on equations correspond to the following three operations on the augmented matrix:
• interchange two rows;
• multiply a row by a nonzero scalar;
• add a scalar multiple of one row to another row.
These operations are called elementary row operations. We denote them as follows:
• interchanging rows i and j is written as Ri ↔ Rj ;
• multiplying row i by a nonzero scalar λ is written as Ri → λRi ;
• adding λ times row j to row i is written as Ri → Ri + λRj .
Example 3.59. The augmented matrix of the system of linear equations in Example 3.44 is
$$\left(\begin{array}{ccc|c} 2 & 3 & -1 & 3\\ 1 & 1 & 1 & 4\\ 3 & -4 & 1 & 1\end{array}\right).$$
When we solved this system of linear equations in Example 3.44, we did it in terms of equations.
We can repeat this solution in terms of elementary row operations as follows:
$$\left(\begin{array}{ccc|c} 2 & 3 & -1 & 3\\ 1 & 1 & 1 & 4\\ 3 & -4 & 1 & 1\end{array}\right) \xrightarrow{\ R_1\leftrightarrow R_2\ } \left(\begin{array}{ccc|c} 1 & 1 & 1 & 4\\ 2 & 3 & -1 & 3\\ 3 & -4 & 1 & 1\end{array}\right) \xrightarrow[R_3\to R_3-3R_1]{R_2\to R_2-2R_1} \left(\begin{array}{ccc|c} 1 & 1 & 1 & 4\\ 0 & 1 & -3 & -5\\ 0 & -7 & -2 & -11\end{array}\right)$$
$$\xrightarrow[R_3\to R_3+7R_2]{R_1\to R_1-R_2} \left(\begin{array}{ccc|c} 1 & 0 & 4 & 9\\ 0 & 1 & -3 & -5\\ 0 & 0 & -23 & -46\end{array}\right) \xrightarrow{\ R_3\to -\frac{1}{23}R_3\ } \left(\begin{array}{ccc|c} 1 & 0 & 4 & 9\\ 0 & 1 & -3 & -5\\ 0 & 0 & 1 & 2\end{array}\right) \xrightarrow[R_1\to R_1-4R_3]{R_2\to R_2+3R_3} \left(\begin{array}{ccc|c} 1 & 0 & 0 & 1\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 2\end{array}\right).$$
Now we can conclude that the unique solution of the system of linear equations is
$$\begin{pmatrix} x_1\\ x_2\\ x_3\end{pmatrix} = \begin{pmatrix} 1\\ 1\\ 2\end{pmatrix}.$$
Note that it is often OK to do multiple row operations at once, as shown in Example 3.59.
But in order to avoid mistakes, do not do more than one operation to any individual row at once.
Lemma 3.60. Let A′ be the m × n matrix obtained from A by performing a single row operation,
and let b′ be the vector obtained from b by performing the same row operation. Then
$$Ax = b \iff A'x = b'$$
for every x ∈ Rn.

Proof. There are three cases to prove, corresponding to the three types of elementary row operation.
We will do the third type only; the first two are similar, easier, and left as an exercise.
Without loss of generality, we may assume that A′ is obtained from A by the row operation
$$R_1 \to R_1 + \lambda R_2$$
for some λ ∈ R. Write the rows of A as r1, ..., rm. Let x ∈ Rn. Then
$$Ax = \begin{pmatrix} r_1x\\ r_2x\\ \vdots\\ r_mx\end{pmatrix}.$$
The rows of A′ are r1 + λr2, r2, ..., rm, so that
$$A'x = \begin{pmatrix} (r_1+\lambda r_2)x\\ r_2x\\ \vdots\\ r_mx\end{pmatrix} = \begin{pmatrix} r_1x + \lambda r_2x\\ r_2x\\ \vdots\\ r_mx\end{pmatrix}.$$
Similarly, we have
$$b' = \begin{pmatrix} b_1 + \lambda b_2\\ b_2\\ \vdots\\ b_m\end{pmatrix}.$$
Observe also that
$$\begin{cases} r_1x + \lambda r_2x = b_1 + \lambda b_2\\ r_2x = b_2\end{cases} \iff \begin{cases} r_1x = b_1\\ r_2x = b_2,\end{cases}$$
so that we have
$$A'x = b' \iff \begin{pmatrix} r_1x+\lambda r_2x\\ r_2x\\ \vdots\\ r_mx\end{pmatrix} = \begin{pmatrix} b_1+\lambda b_2\\ b_2\\ \vdots\\ b_m\end{pmatrix} \iff \begin{pmatrix} r_1x\\ r_2x\\ \vdots\\ r_mx\end{pmatrix} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m\end{pmatrix} \iff Ax = b.$$
This completes the proof. □
Corollary 3.61. Let A be an m × n matrix, and let A′ be another m × n matrix that is obtained
from A by a sequence of elementary row operations. Then
$$\ker(A) = \ker(A')$$
and row(A) = row(A′), so that rank(A) = rank(A′) as well.
Proof. By Lemma 3.60, we have ker(A) = ker(A′). Then row(A) = row(A′) by Lemma 3.6. □
Example 3.62. The augmented matrix of the system of linear equations in Example 3.45 is
$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1\\ 3 & 5 & -2 & 2\\ 4 & 7 & 1 & 3\end{array}\right).$$
The reductions performed in Example 3.45 can be translated into row operations. The end result is
$$\left(\begin{array}{ccc|c} 1 & 0 & -19 & -1\\ 0 & 1 & 11 & 1\\ 0 & 0 & 0 & 0\end{array}\right).$$

As in Example 3.45, we conclude that the set of solutions is
$$\left\{ \begin{pmatrix}-1\\1\\0\end{pmatrix} + t\begin{pmatrix}19\\-11\\1\end{pmatrix} : t \in \mathbb{R}\right\}.$$
The matrices A and A′ in Corollary 3.61 may have different column spaces. For example, we have
$$\begin{pmatrix} 1 & 0\\ 1 & 0\end{pmatrix} \xrightarrow{\ R_2\to R_2-R_1\ } \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix},$$
and the column spaces of these two matrices are different.
The matrices we ended up with in Examples 3.59 and 3.62 are of a special type:
Definition 3.63. A matrix is in row echelon form (REF) if the following holds:
(1) any rows consisting entirely of zeros are at the bottom;
(2) in each nonzero row, the first nonzero entry (called the leading entry) is to the left of all
the leading entries below it.
For instance, the matrix
$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ 0 & 6 & 7 & 8 & 9\\ 0 & 0 & 0 & 10 & 11\\ 0 & 0 & 0 & 0 & 0\end{pmatrix}$$
is in row echelon form, and its leading entries are 1, 6 and 10.
Definition 3.64. A matrix is in reduced row echelon form (RREF) if the following holds:
(1) it is in row echelon form;
(2) all leading entries are equal to 1;
(3) each column containing a leading 1 has zeros everywhere else.
For instance, the final matrices in Examples 3.59 and 3.62 are both in reduced row echelon form,
and so is the matrix  
1 0 3 0
0 1 6 0
0 0 0 1 .
 
0 0 0 0
Theorem 3.65. Using elementary row operations, we can put any given matrix into reduced row
echelon form. Moreover, the resulting matrix is unique.
Proof. Left to the reader. 
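Computer algebra systems implement reduction to RREF directly. Here is a minimal sketch (assuming SymPy is available): `Matrix.rref` returns the reduced row echelon form together with the indices of the pivot columns, and on the augmented matrix of Example 3.59 it reproduces the final matrix obtained there.

```python
from sympy import Matrix

# Augmented matrix of the system from Examples 3.44 and 3.59.
M = Matrix([[2, 3, -1, 3],
            [1, 1,  1, 4],
            [3, -4, 1, 1]])

R, pivots = M.rref()
print(R)        # Matrix([[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 2]])
print(pivots)   # (0, 1, 2)
```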
Exercise 3.66. Find the reduced row echelon form of the matrix
$$\begin{pmatrix} 1 & 4 & 7\\ 2 & 5 & 8\\ 3 & 6 & 9\end{pmatrix}.$$
We can find a basis of the row space of a matrix using elementary row operations and
Lemma 3.67. Let A be a matrix in REF. Then the transposes of its nonzero rows form a basis of row(A).
Proof. The transposes of all non-zero rows of A span row(A). Therefore, to complete the proof,
we must show that they are linearly independent. Let us do this.
Suppose that A is an m × n matrix with k nonzero rows. By definition, these are the first k rows.
Write their transposes as vectors r1, ..., rk in Rn with leading entries ℓ1, ..., ℓk, respectively. Then
$$r_i = \big(0,\ \ldots,\ 0,\ \ell_i,\ *,\ \ldots,\ *\big)^{T},$$
where ∗ denotes an arbitrary real number, so that ℓi is the first non-zero coordinate of the vector ri.
Denote the number of this coordinate by pi. Then, by the definition of REF, we have
$$1 \leqslant p_1 < p_2 < \cdots < p_k \leqslant n.$$
Now let λ1, ..., λk be scalars such that
$$\lambda_1 r_1 + \lambda_2 r_2 + \cdots + \lambda_k r_k = 0.$$
Comparing the p1th entries on each side gives λ1ℓ1 = 0, so that λ1 = 0. Thus, we have
$$\lambda_2 r_2 + \cdots + \lambda_k r_k = 0.$$
Comparing the p2th entries on each side gives λ2ℓ2 = 0, so that λ1 = λ2 = 0. Thus, we have
$$\lambda_3 r_3 + \cdots + \lambda_k r_k = 0.$$
Continuing in this way, we get λ1 = ··· = λk = 0, so that r1, ..., rk are linearly independent. □
Corollary 3.68. The rank of a matrix is the number of nonzero rows in its row echelon form.
Exercise 3.69. Use reduction to REF to determine whether the following vectors
$$\begin{pmatrix}1\\3\\0\end{pmatrix},\quad \begin{pmatrix}-4\\-6\\3\end{pmatrix},\quad \begin{pmatrix}6\\12\\-5\end{pmatrix}$$
are linearly independent or not.
Exercise 3.70. Write down the augmented matrix of the system of linear equations
$$\begin{cases} x + 2y + 3z = 1,\\ 3x + 5y - 2z = 2,\\ 4x + 7y + z = 3.\end{cases}$$
Find the reduced row echelon form of the augmented matrix. Then find the set of solutions.
For square matrices, using RREF gives us the following result.
Lemma 3.71. Let A be an n × n matrix. Then the following two conditions are equivalent:
• the matrix A is invertible;
• the reduced row echelon form of A is In.
Proof. Write R for the RREF of A. By Theorem 3.17, we have
• the matrix A is invertible ⟺ ker(A) = {0},
• the matrix R is invertible ⟺ ker(R) = {0}.
But ker(A) = ker(R) by Corollary 3.61, so that A is invertible ⟺ R is invertible.
To complete the proof, we may assume that the matrix R is invertible. Let us show that R = In.
By Theorem 3.17, the rows of R are linearly independent, so that R has no rows consisting of zeros.
Since R is in RREF and it has only n columns, the matrix R looks like this:
$$\begin{pmatrix} 1 & * & \cdots & * & *\\ 0 & 1 & \ddots & \vdots & \vdots\\ \vdots & \ddots & \ddots & * & *\\ \vdots & & \ddots & 1 & *\\ 0 & \cdots & \cdots & 0 & 1\end{pmatrix},$$
where ∗ denotes an arbitrary real number. Moreover, every column containing a leading entry has zeros
everywhere else, so that we must have R = In. □

3.e. Computational examples.


We share a philosophy about linear algebra: we think basis-free, we write basis-free,
but when the chips are down we close the office door and compute with matrices like fury.
Paul Halmos
How to find a basis of the kernel of a matrix?
Example 3.72. Let A be the matrix from Example 3.73. Then its RREF is
$$\begin{pmatrix} 1 & 0 & 1 & 0 & -1\\ 0 & 1 & 2 & 0 & 3\\ 0 & 0 & 0 & 1 & 4\\ 0 & 0 & 0 & 0 & 0\end{pmatrix}.$$
Therefore, the solutions of Ax = 0 are the vectors x satisfying
$$\begin{cases} x_1 + x_3 - x_5 = 0,\\ x_2 + 2x_3 + 3x_5 = 0,\\ x_4 + 4x_5 = 0.\end{cases}$$
We can rewrite this system of linear equations as
$$\begin{cases} x_1 = -x_3 + x_5,\\ x_2 = -2x_3 - 3x_5,\\ x_4 = -4x_5.\end{cases}$$
Then we let s = x3 and t = x5, so that
$$\ker(A) = \left\{ s\begin{pmatrix}-1\\-2\\1\\0\\0\end{pmatrix} + t\begin{pmatrix}1\\-3\\0\\-4\\1\end{pmatrix} : s \in \mathbb{R}\ \text{and}\ t \in \mathbb{R}\right\}.$$
In the terminology introduced in Example 3.45, x3 and x5 are "free" variables.
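The same computation can be delegated to a computer algebra system. Here is a minimal sketch (assuming SymPy): `nullspace` returns a basis of ker(A), which spans the same two-dimensional subspace as the vectors found above (possibly written with different scalings).

```python
from sympy import Matrix

# The matrix A from Example 3.73.
A = Matrix([[1, 1, 3, 1, 6],
            [2, -1, 0, 1, -1],
            [-3, 2, 1, -2, 1],
            [4, 1, 6, 1, 3]])

for v in A.nullspace():
    print(v.T)   # basis vectors of ker(A), e.g. (-1, -2, 1, 0, 0) and (1, -3, 0, -4, 1)
```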


How to find a basis of the row space of a matrix?

Example 3.73. Let A be the matrix
$$\begin{pmatrix} 1 & 1 & 3 & 1 & 6\\ 2 & -1 & 0 & 1 & -1\\ -3 & 2 & 1 & -2 & 1\\ 4 & 1 & 6 & 1 & 3\end{pmatrix}.$$
After a sequence of elementary row operations, we find that the RREF of A is
$$\begin{pmatrix} 1 & 0 & 1 & 0 & -1\\ 0 & 1 & 2 & 0 & 3\\ 0 & 0 & 0 & 1 & 4\\ 0 & 0 & 0 & 0 & 0\end{pmatrix}.$$
Now using Corollary 3.61 and Lemma 3.67, we see that the transposes of the nonzero rows of this
matrix form a basis of row(A). Thus, the vectors
$$\begin{pmatrix}1\\0\\1\\0\\-1\end{pmatrix},\quad \begin{pmatrix}0\\1\\2\\0\\3\end{pmatrix},\quad \begin{pmatrix}0\\0\\0\\1\\4\end{pmatrix}$$
form a basis of row(A). Here we calculated the reduced row echelon form of A, but any other row
echelon form would also have worked, for the same reasons.
Exercise 3.74. Let
$$A = \begin{pmatrix} 2 & 0 & 1 & 2\\ 1 & -3 & 0 & 1\\ 2 & 1 & 1 & 0\\ 3 & -4 & 1 & 5\end{pmatrix}.$$
Find a basis for row(A), and find a basis for ker(A).
How to find a basis of the column space of a matrix?
Example 3.75. Let A be the matrix
$$\begin{pmatrix} 1 & 2 & -3 & 4\\ 1 & -1 & 2 & 1\\ 3 & 0 & 1 & 6\\ 1 & 1 & -2 & 1\\ 6 & -1 & 1 & 3\end{pmatrix}.$$
To find a basis of col(A), we can find a basis of row(Aᵀ) = col(A), which is done in Example 3.73.
Given a list of vectors in Rn , how to find a basis of their span?
Example 3.76. Consider the following vectors in R5:
$$v_1 = \begin{pmatrix}1\\1\\3\\1\\6\end{pmatrix},\quad v_2 = \begin{pmatrix}2\\-1\\0\\1\\-1\end{pmatrix},\quad v_3 = \begin{pmatrix}-3\\2\\1\\-2\\1\end{pmatrix},\quad v_4 = \begin{pmatrix}4\\1\\6\\1\\3\end{pmatrix}.$$
Turn these into row vectors (i.e. take transposes) and put them together as the rows of a matrix.
In this example, this gives the matrix A from Example 3.73. Then
$$\mathrm{span}\big(\{v_1, v_2, v_3, v_4\}\big) = \mathrm{row}(A).$$
But we have already found a basis of row(A) in Example 3.73.


Exercise 3.77. Find a basis of the span in R5 of the following vectors:
$$\begin{pmatrix}0\\1\\4\\-2\\1\end{pmatrix},\quad \begin{pmatrix}1\\6\\0\\1\\-1\end{pmatrix},\quad \begin{pmatrix}-1\\-4\\8\\-5\\3\end{pmatrix}.$$
Then find a basis of the orthogonal complement to this span.
How to find the inverse of an invertible square matrix?
Example 3.78. Let A be the matrix
$$\begin{pmatrix} 1 & 0 & 2\\ 0 & 3 & 0\\ 4 & 0 & 5\end{pmatrix}.$$
To find its inverse, place the vectors e1, e2, e3 next to the matrix A:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0\\ 0 & 3 & 0 & 0 & 1 & 0\\ 4 & 0 & 5 & 0 & 0 & 1\end{array}\right).$$
Now we use elementary row operations on this 3 × 6 matrix to put A into RREF:
$$\left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0\\ 0 & 3 & 0 & 0 & 1 & 0\\ 4 & 0 & 5 & 0 & 0 & 1\end{array}\right) \xrightarrow{\ R_3\to R_3-4R_1\ } \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0\\ 0 & 3 & 0 & 0 & 1 & 0\\ 0 & 0 & -3 & -4 & 0 & 1\end{array}\right) \xrightarrow[R_3\to -\frac13 R_3]{R_2\to \frac13 R_2} \left(\begin{array}{ccc|ccc} 1 & 0 & 2 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & \frac13 & 0\\ 0 & 0 & 1 & \frac43 & 0 & -\frac13\end{array}\right) \xrightarrow{\ R_1\to R_1-2R_3\ } \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac53 & 0 & \frac23\\ 0 & 1 & 0 & 0 & \frac13 & 0\\ 0 & 0 & 1 & \frac43 & 0 & -\frac13\end{array}\right).$$
Observe that the RREF of the matrix A is I3, so that the matrix A is invertible by Lemma 3.71.
Moreover, the right-hand half of the 3 × 6 matrix is the inverse matrix A⁻¹, so that
$$A^{-1} = \begin{pmatrix} -5/3 & 0 & 2/3\\ 0 & 1/3 & 0\\ 4/3 & 0 & -1/3\end{pmatrix}.$$
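A quick numerical check of this computation, as a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])

A_inv = np.linalg.inv(A)
print(A_inv)                               # approximately [[-5/3, 0, 2/3], [0, 1/3, 0], [4/3, 0, -1/3]]
print(np.allclose(A @ A_inv, np.eye(3)))   # True
```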
In Example 3.78, we implicitly used one important observation. To describe it, fix an n × n matrix A.
Then each elementary row operation done to A can be obtained via a matrix multiplication
$$A \mapsto MA$$
for an appropriate n × n matrix M. So, performing k elementary row operations that convert
A into its reduced row echelon form R, we obtain n × n matrices M1, ..., Mk such that
$$\underbrace{M_k\cdots M_2M_1}_{\text{row operations}}A = R.$$
If A is invertible, then R = In by Lemma 3.71, so that Mk···M2M1 is the inverse of the matrix A.
In this case, if we apply the same row operations to the matrix In, we obtain the matrix
$$\underbrace{M_k\cdots M_2M_1}_{\text{row operations}}I_n = M_k\cdots M_2M_1 = A^{-1}.$$

Let us consider one more example that shows a potential problem for using this algorithm.
Example 3.79. Let A be the matrix
$$\begin{pmatrix} 9 & 8 & 7\\ 6 & 5 & 4\\ 3 & 2 & 1\end{pmatrix}.$$
Let us try to find its inverse. Applying the same method as in Example 3.78, we get
$$\left(\begin{array}{ccc|ccc} 9 & 8 & 7 & 1 & 0 & 0\\ 6 & 5 & 4 & 0 & 1 & 0\\ 3 & 2 & 1 & 0 & 0 & 1\end{array}\right) \xrightarrow{\ R_1\to \frac19 R_1\ } \left(\begin{array}{ccc|ccc} 1 & \frac89 & \frac79 & \frac19 & 0 & 0\\ 6 & 5 & 4 & 0 & 1 & 0\\ 3 & 2 & 1 & 0 & 0 & 1\end{array}\right) \xrightarrow[R_3\to R_3-3R_1]{R_2\to R_2-6R_1} \left(\begin{array}{ccc|ccc} 1 & \frac89 & \frac79 & \frac19 & 0 & 0\\ 0 & -\frac13 & -\frac23 & -\frac23 & 1 & 0\\ 0 & -\frac23 & -\frac43 & -\frac13 & 0 & 1\end{array}\right) \xrightarrow{\ R_3\to R_3-2R_2\ } \left(\begin{array}{ccc|ccc} 1 & \frac89 & \frac79 & \frac19 & 0 & 0\\ 0 & -\frac13 & -\frac23 & -\frac23 & 1 & 0\\ 0 & 0 & 0 & 1 & -2 & 1\end{array}\right).$$
We have not yet reduced A to its RREF, but the left-hand half of this intermediate matrix already has a zero row.
Therefore, the matrix A is not invertible by Lemma 3.71, so that its inverse does not exist.
Exercise 3.80. Using row operations (not determinants), determine whether the following matrices
are invertible. If they are, find their inverses.
$$A = \begin{pmatrix} 1 & 2 & -1\\ 1 & 1 & 1\\ 1 & -1 & 0\end{pmatrix},\qquad B = \begin{pmatrix} 1 & 1 & -1\\ 3 & 1 & 2\\ 5 & 3 & 0\end{pmatrix}.$$
Let V be a linear subspace of Rn . How to find an orthonormal basis of this vector subspace?
To start with, we should find some basis y1 , . . . , ym of this subspace, and then use an algorithm
described in the proof of Lemma 2.73. It produces an orthonormal basis v1 , . . . , vm as follows:
$$v_1 = \frac{y_1}{\|y_1\|},\qquad v_2 = \frac{y_2 - (y_2\cdot v_1)v_1}{\|y_2 - (y_2\cdot v_1)v_1\|},\qquad v_3 = \frac{y_3 - (y_3\cdot v_1)v_1 - (y_3\cdot v_2)v_2}{\|y_3 - (y_3\cdot v_1)v_1 - (y_3\cdot v_2)v_2\|},$$
$$\vdots$$
$$v_m = \frac{y_m - (y_m\cdot v_1)v_1 - (y_m\cdot v_2)v_2 - \cdots - (y_m\cdot v_{m-1})v_{m-1}}{\big\|y_m - (y_m\cdot v_1)v_1 - (y_m\cdot v_2)v_2 - \cdots - (y_m\cdot v_{m-1})v_{m-1}\big\|}.$$
This procedure is called the Gram–Schmidt process.
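A direct translation of these formulas into code looks as follows. This is a minimal sketch (assuming NumPy is available), and the input vectors are assumed to be linearly independent, exactly as in the description above.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal basis of span(vectors), assuming the inputs are linearly independent."""
    basis = []
    for y in vectors:
        w = y.astype(float)
        for v in basis:
            w = w - (w @ v) * v            # subtract the components along the vectors found so far
        basis.append(w / np.linalg.norm(w))
    return basis

# The vectors y1, y2, y3 from Example 3.81 below.
ys = [np.array([-5, 3, 1, 1]), np.array([1, 0, -4, 3]), np.array([1, -3, 2, 0])]
for v in gram_schmidt(ys):
    print(v)
```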
Example 3.81. Let V = span({y1, y2, y3}), where
$$y_1 = \begin{pmatrix}-5\\3\\1\\1\end{pmatrix},\quad y_2 = \begin{pmatrix}1\\0\\-4\\3\end{pmatrix},\quad y_3 = \begin{pmatrix}1\\-3\\2\\0\end{pmatrix}.$$
Let us find an orthonormal basis of V. First, we check that y1, y2, y3 are linearly independent.
This can be done in one of the following ways:
• directly, like in the solution to Exercise 2.39;
• by computing a REF of the matrix with rows y1ᵀ, y2ᵀ, y3ᵀ and using Lemma 3.67;
• by observing that
$$\begin{vmatrix} 3 & 0 & -3\\ 1 & -4 & 2\\ 1 & 3 & 0\end{vmatrix} = 3\begin{vmatrix} -4 & 2\\ 3 & 0\end{vmatrix} - 3\begin{vmatrix} 1 & -4\\ 1 & 3\end{vmatrix} = -39 \ne 0,$$
where the 3 × 3 matrix is formed by the last three coordinates of y1, y2, y3.
We see that y1, y2, y3 form a basis of V. Thus, we can now apply the Gram–Schmidt process to
obtain an orthonormal basis v1, v2, v3 of the linear subspace V. To calculate v1, we have
$$v_1 = \frac{y_1}{\|y_1\|} = \frac16\begin{pmatrix}-5\\3\\1\\1\end{pmatrix}.$$
To calculate v2, we have
$$y_2 - (y_2\cdot v_1)v_1 = \begin{pmatrix}1\\0\\-4\\3\end{pmatrix} + \frac16\begin{pmatrix}-5\\3\\1\\1\end{pmatrix} = \frac16\begin{pmatrix}1\\3\\-23\\19\end{pmatrix},$$
so that
$$v_2 = \frac{y_2 - (y_2\cdot v_1)v_1}{\|y_2 - (y_2\cdot v_1)v_1\|} = \frac{1}{30}\begin{pmatrix}1\\3\\-23\\19\end{pmatrix}.$$
To calculate v3, we have
$$y_3 - (y_3\cdot v_1)v_1 - (y_3\cdot v_2)v_2 = \begin{pmatrix}1\\-3\\2\\0\end{pmatrix} + \frac13\begin{pmatrix}-5\\3\\1\\1\end{pmatrix} + \frac{3}{50}\begin{pmatrix}1\\3\\-23\\19\end{pmatrix} = \frac{1}{150}\begin{pmatrix}-91\\-273\\143\\221\end{pmatrix},$$
so that
$$v_3 = \frac{y_3 - (y_3\cdot v_1)v_1 - (y_3\cdot v_2)v_2}{\|y_3 - (y_3\cdot v_1)v_1 - (y_3\cdot v_2)v_2\|} = \frac{1}{390}\begin{pmatrix}-91\\-273\\143\\221\end{pmatrix} = \frac{1}{30}\begin{pmatrix}-7\\-21\\11\\17\end{pmatrix}.$$
Exercise 3.82. Let
$$w_1 = \begin{pmatrix}1\\2\\2\\4\end{pmatrix} \quad\text{and}\quad w_2 = \begin{pmatrix}2\\6\\7\\8\end{pmatrix}.$$
Use the Gram–Schmidt process to find an orthonormal basis of span({w1, w2}).

4. Linear transformations
4.a. What is a linear transformation?
Non-linear means it’s hard to solve
Arthur Mattuck
Let T : Rn → Rm be a function.
Definition 4.1. The function T is said to be a linear transformation or linear map if
(1) T(0) = 0;
(2) T(x + y) = T(x) + T(y) for all x and y in Rn;
(3) T(λx) = λT(x) for all λ ∈ R and all x ∈ Rn.
Example 4.2. Let T : R2 → R2 be reflection in the x-axis. Then
$$T\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}x_1\\-x_2\end{pmatrix}$$
for all x1 and x2 in R. Then T is a linear transformation, because
• T(0) = 0;
• for any x and y in R2, we have
$$T(x+y) = T\begin{pmatrix}x_1+y_1\\x_2+y_2\end{pmatrix} = \begin{pmatrix}x_1+y_1\\-(x_2+y_2)\end{pmatrix} = \begin{pmatrix}x_1\\-x_2\end{pmatrix} + \begin{pmatrix}y_1\\-y_2\end{pmatrix} = T(x) + T(y);$$
• for any λ ∈ R and any vector x ∈ R2, we have
$$T(\lambda x) = T\begin{pmatrix}\lambda x_1\\\lambda x_2\end{pmatrix} = \begin{pmatrix}\lambda x_1\\-\lambda x_2\end{pmatrix} = \lambda\begin{pmatrix}x_1\\-x_2\end{pmatrix} = \lambda T(x).$$
Example 4.3. Let T : R2 → R3 be the function that is given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto \begin{pmatrix}3x_1+2x_2\\x_2-4x_1\\-x_2\end{pmatrix}$$
for every x1 and x2 in R. Then T is a linear transformation, because
• T(0) = 0;
• for any x and y in R2, we have
$$T(x+y) = T\begin{pmatrix}x_1+y_1\\x_2+y_2\end{pmatrix} = \begin{pmatrix}3(x_1+y_1)+2(x_2+y_2)\\(x_2+y_2)-4(x_1+y_1)\\-(x_2+y_2)\end{pmatrix} = \begin{pmatrix}3x_1+2x_2\\x_2-4x_1\\-x_2\end{pmatrix} + \begin{pmatrix}3y_1+2y_2\\y_2-4y_1\\-y_2\end{pmatrix} = T(x) + T(y);$$
• for any λ ∈ R and any x ∈ R2, we have
$$T(\lambda x) = T\begin{pmatrix}\lambda x_1\\\lambda x_2\end{pmatrix} = \begin{pmatrix}3\lambda x_1+2\lambda x_2\\\lambda x_2-4\lambda x_1\\-\lambda x_2\end{pmatrix} = \lambda\begin{pmatrix}3x_1+2x_2\\x_2-4x_1\\-x_2\end{pmatrix} = \lambda T(x).$$
Example 4.4. If T : R2 → R2 is a function given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto \begin{pmatrix}x_1^2\\x_2\end{pmatrix}$$
for every x1 and x2, then T is not a linear transformation, because
$$T\left(2\begin{pmatrix}1\\0\end{pmatrix}\right) = T\begin{pmatrix}2\\0\end{pmatrix} = \begin{pmatrix}4\\0\end{pmatrix} \ne \begin{pmatrix}2\\0\end{pmatrix} = 2\,T\begin{pmatrix}1\\0\end{pmatrix}.$$
Similarly, if T : R2 → R2 is a function given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto \begin{pmatrix}1\\x_1\end{pmatrix}$$
for every x1 and x2, then T is not a linear transformation, because T(0) ≠ 0.
In fact, the three conditions in Definition 4.1 can be reduced to one condition:
Lemma 4.5. The function T : Rn → Rm is a linear transformation if and only if
$$T(\lambda x + \mu y) = \lambda T(x) + \mu T(y)$$
for all real numbers λ and µ and all vectors x and y in Rn.
Proof. If T is a linear transformation, then
$$T(\lambda x + \mu y) = T(\lambda x) + T(\mu y) = \lambda T(x) + \mu T(y)$$
for any real numbers λ and µ, and any vectors x and y in Rn. Conversely, suppose that
$$T(\lambda x + \mu y) = \lambda T(x) + \mu T(y)$$
for all real numbers λ and µ and all vectors x and y in Rn. Then
$$T(0) = T(1\cdot 0 + 1\cdot 0) = 1\cdot T(0) + 1\cdot T(0) = T(0) + T(0).$$
Subtracting T(0) from each side, we get 0 = T(0). Similarly, if x and y are vectors in Rn, then
$$T(x + y) = T(1\cdot x + 1\cdot y) = 1\cdot T(x) + 1\cdot T(y) = T(x) + T(y).$$
Finally, if λ is a real number and x is a vector in Rn, then
$$T(\lambda x) = T(\lambda x + 0\cdot x) = \lambda T(x) + 0\cdot T(x) = \lambda T(x).$$
This shows that T is a linear transformation by Definition 4.1. □
Example 4.6. Let IdRn : Rn → Rn be the identity function. Then IdRn is a linear transformation.
Indeed, for any real numbers λ and µ, and any vectors x and y in Rn, we have
$$\mathrm{Id}_{\mathbb{R}^n}(\lambda x + \mu y) = \lambda x + \mu y = \lambda\,\mathrm{Id}_{\mathbb{R}^n}(x) + \mu\,\mathrm{Id}_{\mathbb{R}^n}(y),$$
so that IdRn is a linear transformation by Lemma 4.5.
Example 4.7. Let T : R2 → R be the function given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto 3x_1 - 2x_2$$
for any x1 and x2 in R. Let λ and µ be any real numbers, and let x and y be any vectors in R2. Then
$$T(\lambda x + \mu y) = T\begin{pmatrix}\lambda x_1 + \mu y_1\\\lambda x_2 + \mu y_2\end{pmatrix} = 3(\lambda x_1 + \mu y_1) - 2(\lambda x_2 + \mu y_2) = \lambda(3x_1 - 2x_2) + \mu(3y_1 - 2y_2) = \lambda T(x) + \mu T(y).$$
Hence, by Lemma 4.5, the function T is a linear transformation.

Exercise 4.8. Let T1 : R2 → R2 be the function that is given by
$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}x+y\\x-y\end{pmatrix},$$
let T2 : R2 → R be the function that is given by
$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto 2x+3y,$$
and let T3 : R2 → R3 be the function that is given by
$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}2x+y\\3x+1\\y-2\end{pmatrix}.$$
Which of them are linear transformations? Justify your answer.
Let V be a linear subspace of Rn. By Proposition 2.75, for every x ∈ Rn, there are unique
vectors v ∈ V and w ∈ V⊥ such that
$$x = v + w.$$
Thus, for every x ∈ Rn, there is a unique vector v ∈ V such that x − v ∈ V⊥.
Hence, we can define a function PV : Rn → Rn such that PV(x) ∈ V and
$$x - P_V(x) \in V^{\perp}$$
for every vector x ∈ Rn. Then PV is called the orthogonal projection onto V.


Lemma 4.9. The function PV is a linear transformation.
Proof. Let λ and µ be some real numbers, and let x and y be some vectors in Rn. Then PV(λx + µy)
is the unique vector in V such that
$$\lambda x + \mu y - P_V(\lambda x + \mu y) \in V^{\perp}.$$
Thus, by Lemma 4.5, the function PV is a linear transformation if
$$\lambda x + \mu y - \big(\lambda P_V(x) + \mu P_V(y)\big) \in V^{\perp}.$$
To see this, note that λPV(x) + µPV(y) ∈ V, because PV(x) and PV(y) are contained in V. Then
$$\lambda x + \mu y - \big(\lambda P_V(x) + \mu P_V(y)\big) = \lambda\big(x - P_V(x)\big) + \mu\big(y - P_V(y)\big) \in V^{\perp},$$
because x − PV(x) ∈ V⊥ and y − PV(y) ∈ V⊥ by the definition of PV. □

Let v1 , . . . , vm be an orthonormal basis of the linear subspace V.


Lemma 4.10. For every vector x ∈ Rn, we have
$$P_V(x) = \sum_{i=1}^{m}\big(x\cdot v_i\big)v_i.$$
Proof. By definition, the vector PV(x) is the unique vector in V such that
$$x - P_V(x) \in V^{\perp}.$$
But by Lemma 2.72, the vector
$$\sum_{i=1}^{m}\big(x\cdot v_i\big)v_i$$
has this property. This implies the required assertion. □
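In coordinates, Lemma 4.10 gives an immediate recipe for computing PV. Here is a minimal sketch (assuming NumPy), with an orthonormal basis supplied as input:

```python
import numpy as np

def project(x, orthonormal_basis):
    """Orthogonal projection of x onto span(orthonormal_basis), via the formula of Lemma 4.10."""
    return sum((x @ v) * v for v in orthonormal_basis)

# Projection onto the plane spanned by e1 and e2 in R^3 (already an orthonormal basis).
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
x = np.array([3.0, -2.0, 7.0])
print(project(x, [v1, v2]))    # [ 3. -2.  0.]
```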


Remark 4.11. Let V be a linear subspace in R3, and let PV : R3 → R3 be the orthogonal projection onto V.
Then dim(V) ∈ {0, 1, 2, 3}. Thus, we have the following possibilities:
• dim(V) = 0: V = {0} and PV(x) = 0 for all x ∈ R3;
• dim(V) = 1: V is a line through the origin, so that PV maps x ∈ R3 to the closest point to x in this line;
e.g. if V is spanned by the standard vector e1, then PV is given by
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \mapsto \begin{pmatrix}x_1\\0\\0\end{pmatrix};$$
• dim(V) = 2: V is a plane through the origin, and PV maps x ∈ R3 to the closest point to x in this plane;
e.g. if V is spanned by the standard vectors e1 and e2, then PV is given by
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \mapsto \begin{pmatrix}x_1\\x_2\\0\end{pmatrix};$$
• dim(V) = 3: V = R3 and PV(x) = x for every x ∈ R3.

Exercise 4.12. Let V be a linear subspace of Rn, let PV : Rn → Rn be the orthogonal projection
onto V, and let x be a point in Rn. Prove that
$$\|x - v\|^2 = \|x - P_V(x)\|^2 + \|P_V(x) - v\|^2$$
for all v ∈ V. Deduce that PV(x) is the point of V closest to x.

4.b. Matrices versus linear transformations.


Matrix theory may be called the arithmetic of higher mathematics.
Richard Bellman
Let T : Rn → Rm be a linear transformation. Recall that any vector x ∈ Rn can be expressed
as a linear combination of the standard basis vectors e1, ..., en in Rn. This simply means that
$$x = \begin{pmatrix}x_1\\0\\\vdots\\0\end{pmatrix} + \begin{pmatrix}0\\x_2\\\vdots\\0\end{pmatrix} + \cdots + \begin{pmatrix}0\\0\\\vdots\\x_n\end{pmatrix} = x_1e_1 + x_2e_2 + \cdots + x_ne_n.$$
Since T is a linear transformation, we have
$$T(x) = x_1T(e_1) + x_2T(e_2) + \cdots + x_nT(e_n)$$
for all x ∈ Rn, so that T is uniquely determined by the vectors T(e1), ..., T(en).

Proposition 4.13. Let v1, ..., vn be a basis of the space Rn, and let u1, ..., un be vectors in Rm.
Then there exists exactly one linear transformation T : Rn → Rm such that
$$T(v_1) = u_1,\quad T(v_2) = u_2,\quad \ldots,\quad T(v_n) = u_n.$$

Proof. First, let us prove that there is at most one such T. Suppose that there are two linear
transformations T : Rn → Rm and S : Rn → Rm such that
$$T(v_i) = u_i = S(v_i) \quad\text{for every}\ i \in \{1,\ldots,n\}.$$
Let x be any vector in Rn. Since v1, ..., vn span Rn, we can write
$$x = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$$
for some λ1, ..., λn ∈ R. Then, using Lemma 4.5, we get
$$T(x) = \lambda_1 T(v_1) + \cdots + \lambda_n T(v_n) = \lambda_1 u_1 + \cdots + \lambda_n u_n = \lambda_1 S(v_1) + \cdots + \lambda_n S(v_n) = S(x).$$
Thus, we have T(x) = S(x) for every x ∈ Rn, so that T = S.
Now let us prove that there exists a linear transformation T : Rn → Rm such that T(vi) = ui for
every i ∈ {1, ..., n}. Since v1, ..., vn is a basis of Rn, each vector x ∈ Rn can be written as
$$x = \lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n$$
for unique scalars λ1, ..., λn in R. Define T : Rn → Rm by
$$x \mapsto \lambda_1 u_1 + \lambda_2 u_2 + \cdots + \lambda_n u_n.$$
Then T(vi) = ui for each i ∈ {1, ..., n}. To complete the proof, we have to show that T is linear.
Let λ and µ be some real numbers, and let x and y be some vectors in Rn. By Lemma 4.5, to prove
that T is linear, it is enough to show that
$$T(\lambda x + \mu y) = \lambda T(x) + \mu T(y).$$
Let us do this. Since v1, ..., vn is a basis of Rn, we have
$$x = \sum_{i=1}^{n}\lambda_i v_i \quad\text{and}\quad y = \sum_{i=1}^{n}\mu_i v_i$$
for some real numbers λ1, ..., λn, µ1, ..., µn. Then
$$T(\lambda x + \mu y) = \sum_{i=1}^{n}\big(\lambda\lambda_i + \mu\mu_i\big)u_i = \lambda\sum_{i=1}^{n}\lambda_i u_i + \mu\sum_{i=1}^{n}\mu_i u_i = \lambda T(x) + \mu T(y),$$
as required. This shows that T is a linear transformation. □
Example 4.14. Let v1, ..., vn be some basis of Rn, and let λ1, ..., λn be some real numbers.
By Proposition 4.13, there is exactly one linear transformation T : Rn → Rn such that
$$T(v_1) = \lambda_1 v_1,\quad T(v_2) = \lambda_2 v_2,\quad \ldots,\quad T(v_n) = \lambda_n v_n.$$
This transformation scales Rn by a factor of λi in the direction of vi.


Applied to the standard basis and linear transformation T : Rn → Rm , Proposition 4.13 tells us
that T is completely determined by the n vectors T (e1 ), . . . , T (en ) in Rm . It is natural to compile
them into an m × n matrix and give it a name:
Definition 4.15. The standard matrix of T is the m × n matrix
$$[T] = \big(T(e_1)\,\big|\,T(e_2)\,\big|\,\cdots\,\big|\,T(e_n)\big),$$
i.e. [T] is the m × n matrix whose ith column is T(ei) for i ∈ {1, ..., n}.
Therefore, we see that every linear transformation Rn → Rm gives rise to some m × n matrix.
For example, if IdRn : Rn → Rn is the identity function (see Example 4.6 and Lemma 4.29), then
$$[\mathrm{Id}_{\mathbb{R}^n}] = I_n.$$
For instance, if T : R2 → R2 is the linear transformation from Example 4.2, then
$$T(e_1) = \begin{pmatrix}1\\0\end{pmatrix} \quad\text{and}\quad T(e_2) = \begin{pmatrix}0\\-1\end{pmatrix},$$
so that the standard matrix of T is the 2 × 2 matrix
$$[T] = \begin{pmatrix}1&0\\0&-1\end{pmatrix}.$$
Similarly, if T : R2 → R3 is the linear transformation from Example 4.3, then
$$T(e_1) = \begin{pmatrix}3\\-4\\0\end{pmatrix} \quad\text{and}\quad T(e_2) = \begin{pmatrix}2\\1\\-1\end{pmatrix},$$
so that the standard matrix of T is the 3 × 2 matrix
$$[T] = \begin{pmatrix}3&2\\-4&1\\0&-1\end{pmatrix}.$$
Likewise, if T : R3 → R3 is the orthogonal projection onto the plane span({e1, e2}), then
$$T(e_1) = e_1,\quad T(e_2) = e_2,\quad T(e_3) = 0,$$
so that the standard matrix of T is the 3 × 3 matrix
$$[T] = \begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}.$$
Vice versa, every m × n matrix gives rise to some linear transformation Rn → Rm as follows.
Let A be an m × n matrix. Define a function TA : Rn → Rm by
(4.16) x 7→ Ax ∈ Rm
for each x ∈ Rn . For instance, if m = n and A = In , then
TA = IdRn ,
where IdRn : Rn → Rn is the identity function (see Example 4.6 and Lemma 4.29 below).
Exercise 4.17. Let T : R2 → R2 be the linear transformation defined as follows:
• rotate by π/2 around the origin,
• then reflect in the x-axis,
• then multiply by −1,
• then reflect in the y-axis,
• then rotate by −π/2 around the origin.
Find the standard matrix of T.
Lemma 4.18. Let A be an m × n matrix. Then TA : Rn → Rm is a linear transformation.
Proof. By Lemma 4.5, it is enough to show that TA(λx + µy) = λTA(x) + µTA(y) for all real
numbers λ and µ and all vectors x and y in Rn. We have
$$T_A(\lambda x + \mu y) = A(\lambda x + \mu y) = \lambda Ax + \mu Ay = \lambda T_A(x) + \mu T_A(y),$$
as required. □
Example 4.19. Let A be the 3 × 2 matrix
$$\begin{pmatrix}3&2\\-4&1\\0&-1\end{pmatrix}.$$
Then TA is the linear transformation R2 → R3 such that
$$T_A(x) = Ax = \begin{pmatrix}3&2\\-4&1\\0&-1\end{pmatrix}\begin{pmatrix}x_1\\x_2\end{pmatrix} = \begin{pmatrix}3x_1+2x_2\\x_2-4x_1\\-x_2\end{pmatrix}$$
for every x ∈ R2. This is the linear transformation in Example 4.3.
Example 4.20. Let Rθ : R2 → R2 be the function that rotates R2 by an angle θ anticlockwise
about 0, where θ is some real number. Then
$$R_\theta(x) = \begin{pmatrix} x_1\cos\theta - x_2\sin\theta\\ x_1\sin\theta + x_2\cos\theta\end{pmatrix}$$
for every x ∈ R2. Then Rθ(x) = Ax for every x ∈ R2, where
$$A = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}.$$
This shows that Rθ = TA. Hence, the function Rθ is linear by Lemma 4.18.
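Definition 4.15 also gives a mechanical way to produce the standard matrix of any linear transformation: apply it to the standard basis vectors and use the results as columns. Here is a minimal sketch (assuming NumPy) for the rotation Rθ from Example 4.20:

```python
import numpy as np

theta = 0.7   # any fixed angle

def R(x):
    """Rotation of the plane by theta, as in Example 4.20."""
    return np.array([x[0]*np.cos(theta) - x[1]*np.sin(theta),
                     x[0]*np.sin(theta) + x[1]*np.cos(theta)])

# Standard matrix: its columns are R(e1) and R(e2).
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
M = np.column_stack([R(e1), R(e2)])
expected = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
print(np.allclose(M, expected))   # True
```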
Exercise 4.21. Let v be a vector in Rn , and let Dv : Rn → R be a function defined by

Dv x = v · x
for every x ∈ Rn . Prove that Dv is a linear transformation. Explain why every linear transforma-
tion T : Rn → R is equal to Dv for some vector v ∈ Rn .
We have T = T[T] in Example 4.3. We have A = [TA] in Example 4.19. This is not a coincidence.
Theorem 4.22. Let T : Rn → Rm be a linear transformation, and let A be an m × n matrix. Then
$$T_{[T]} = T \quad\text{and}\quad \big[T_A\big] = A.$$
Proof. For each i ∈ {1, ..., n}, we have
$$T_{[T]}(e_i) = [T]e_i.$$
But [T]ei is the ith column of the matrix [T], which is T(ei). Then
$$T_{[T]}(e_i) = T(e_i)$$
for each i ∈ {1, ..., n}. Then T[T] = T by Proposition 4.13.
Similarly, for every i ∈ {1, ..., n}, the ith column of [TA] is TA(ei). But
$$T_A(e_i) = Ae_i$$
for every i ∈ {1, ..., n}, where Aei is the ith column of the matrix A. Hence, both matrices [TA]
and A have the same columns, so that [TA] = A. □
Corollary 4.23. ∀ linear transformation T : Rn → Rm , ∃ unique m×n matrix A such that T = TA .
Vice versa, ∀ m × n matrix A, ∃ unique linear transformation T : Rn → Rm such that TA = T .
Theorem 4.22 gives a bijection between linear transformations Rn → Rm and m × n matrices.
This bijection sends a given linear transformation T : Rn → Rm to its standard m × n matrix [T ].
Vice versa, the inverse bijection sends a given m × n matrix A to the linear transformation TA .
For example, the linear transformation in Example 4.3 corresponds to the matrix in Example 4.19.
Example 4.24. Let V be a linear subspace of the space Rn , and let PV : Rn → Rn be the orthogonal
projection onto V. Then PV is a linear transformation by Lemma 2.72. Therefore, there exists
unique n × n matrix A such that 
PV x = Ax.
for all vectors x. The matrix A is the standard matrix of PV .
Exercise 4.25. Let v be a vector in Rn of length 1, let V = span({v}), and let PV : Rn → Rn be
the orthogonal projection onto V. Find the standard matrix of the linear transformation PV .
Exercise 4.26. Let V be the orthogonal complement in R2 to span({v}), where
$$v = \begin{pmatrix}a\\b\end{pmatrix}$$
for some real numbers a and b such that v ≠ 0. Let PV : R2 → R2 be the orthogonal projection
onto V. Show that PV is given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto \begin{pmatrix} \dfrac{b^2x_1 - abx_2}{a^2+b^2}\\[2mm] \dfrac{a^2x_2 - abx_1}{a^2+b^2}\end{pmatrix},$$
conclude that it is a linear transformation, find its standard matrix and compute its determinant.
Let RV : R2 → R2 be the function given by
$$x \mapsto x + 2\big(P_V(x) - x\big).$$
Then RV is the reflection in V. Show that RV is given by
$$\begin{pmatrix}x_1\\x_2\end{pmatrix} \mapsto \begin{pmatrix} \dfrac{(b^2-a^2)x_1 - 2abx_2}{a^2+b^2}\\[2mm] \dfrac{(a^2-b^2)x_2 - 2abx_1}{a^2+b^2}\end{pmatrix},$$
conclude that it is a linear transformation, find its standard matrix and compute its determinant.
Linear transformations and matrices are not the same! A linear transformation is a function
with certain nice properties. But a matrix is just a grid of numbers.

4.c. Composing linear transformations.


I am in the world only for the purpose of composing
Franz Schubert
Let T : Rn → Rm and U : Rp → Rn be linear transformations.
Lemma 4.27. The composite T ◦ U : Rp → Rm is also a linear transformation.
Proof. Let λ and µ be some real numbers, and let x and y be some vectors in Rp. Then
$$(T\circ U)(\lambda x + \mu y) = T\big(U(\lambda x + \mu y)\big) = T\big(\lambda U(x) + \mu U(y)\big) = \lambda T\big(U(x)\big) + \mu T\big(U(y)\big) = \lambda (T\circ U)(x) + \mu (T\circ U)(y).$$
This shows that T ◦ U is a linear transformation by Lemma 4.5. 
Example 4.28. Let Rθ : R2 → R2 be the linear transformation that rotates the plane about 0 by
an angle of θ ∈ R. Then it follows from Example 4.20 that Rθ = TA and [Rθ] = A for
$$A = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}.$$
Let PV : R2 → R2 be the orthogonal projection onto V = span({e1}). Then PV = TB and [PV] = B for
$$B = \begin{pmatrix} 1 & 0\\ 0 & 0\end{pmatrix}.$$
Then PV ◦ Rθ takes a point in the plane, rotates it by θ, then projects it onto the x-axis. We have
$$(P_V\circ R_\theta)(e_1) = \cos\theta\, e_1 \quad\text{and}\quad (P_V\circ R_\theta)(e_2) = -\sin\theta\, e_1.$$
Thus, we see that PV ◦ Rθ = TC for C = BA, so that [PV ◦ Rθ] = BA.
This example illustrates the following: for two linear transformations U : Rp → Rn and T : Rn → Rm, we have
$$[T\circ U] = [T][U],$$
so that multiplying matrices corresponds to composing linear transformations.
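This is easy to test numerically: composing the functions and multiplying their standard matrices give the same result. Here is a minimal sketch (assuming NumPy) with the rotation and projection from Example 4.28:

```python
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # [R_theta]
B = np.array([[1.0, 0.0],
              [0.0, 0.0]])                        # [P_V], projection onto the x-axis

# Standard matrix of the composite P_V o R_theta, built column by column.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
composite = lambda x: B @ (A @ x)
M = np.column_stack([composite(e1), composite(e2)])

print(np.allclose(M, B @ A))   # True: [P_V o R_theta] = [P_V][R_theta]
```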



Lemma 4.29. The following assertions hold:


(i) Let A be a m × n matrix, and let B be a n × p matrix. Then TAB = TA ◦ TB .
(ii) For a n × n identity matrix In , one has TIn = IdRn .
(iii) Let T : Rn → Rm and U : Rp → Rn be linear transformations. Then [T ◦ U ] = [T ][U ].
(iv) For an identity map IdRn : Rn → Rn , one has [IdRn ] = In .
Proof. For every x ∈ Rp , we have
    
(TA ◦ TB ) x = TA TB x = TA Bx = ABx = TAB x ,
so that TA ◦ TB = TAB . This proves (i).
For every x ∈ Rn , we have  
TIn x = In x = x = IdRn x ,
so that TIn = IdRn . This proves (ii).
Write A = [T ] and B = [U ]. Then T = TA and U = TB by Theorem 4.22, so that
     
T ◦ U = TA ◦ TB = TAB = AB = [T ][U ],
which proves (iii). Finally, observe that
   
IdRn = TIn = In
by Theorem 4.22. This proves (iv). 
Example 4.30. Is there a 2 × 2 matrix A such that A¹⁹⁷³ = I2 but A ≠ I2? Yes, there is.
Let θ = 2π/1973 and A = [Rθ], where Rθ is the linear transformation from Example 4.20. Then
$$A = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix} \ne I_2.$$
On the other hand, we have
$$\underbrace{R_\theta\circ R_\theta\circ\cdots\circ R_\theta\circ R_\theta}_{1973\ \text{times}} = R_{2\pi} = \mathrm{Id}_{\mathbb{R}^2}.$$
Thus, it follows from Lemma 4.29 that
$$A^{1973} = \underbrace{A\times A\times\cdots\times A\times A}_{1973\ \text{times}} = \big[\mathrm{Id}_{\mathbb{R}^2}\big] = I_2.$$
Do we have other matrices with this property? Yes, we do: we have
$$\big(MA^nM^{-1}\big)^{1973} = \underbrace{\big(MA^nM^{-1}\big)\times\cdots\times\big(MA^nM^{-1}\big)}_{1973\ \text{times}} = M\times\underbrace{A^n\times\cdots\times A^n}_{1973\ \text{times}}\times M^{-1} = M\times\big(A^{1973}\big)^{n}\times M^{-1} = M\times I_2\times M^{-1} = I_2,$$
where M is any invertible 2 × 2 matrix, and n ∈ {1, 2, ..., 1972}. Any other examples? No.
A linear transformation is said to be invertible if it has an inverse linear transformation.
Lemma 4.31. Let T : Rn → Rm be a linear transformation. Then
T is invertible ⇐⇒ T is bijective ⇐⇒ n = m and [T ] is invertible .
Moreover, if T is invertible, then [T −1 ] = [T ]−1 .

Proof. Let A = [T ]. Then A is a m × n matrix. Moreover, we have T = TA by Theorem 4.22.


If n = m and A is invertible with inverse B, then TB is a linear transformation by Lemma 4.18,
while Lemma 4.29 implies that
T ◦ TB = TA ◦ TB = TAB = TIn = IdRn
and
TB ◦ T = TB ◦ TA = TBA = TIn = IdRn
so that T is bijective, its inverse function is TB , and [TB ] = B = A−1 .
To complete the proof of the lemma, we may assume that the linear transformation T is bijective.
We have to prove that n = m and [T ] is invertible. If x ∈ ker(A), then

T x = Ax = 0,
so that x = 0, because T is injective. Thus, we see that ker(A) = {0}, so that

rank(A) = dim col(A) = n
by Theorem 3.4 (the rank–nullity theorem). On the other hand, the function T is surjective.
Therefore, for every y ∈ Rm , there exists x ∈ Rn such that
 
Ax = TA x = T x = y,
where Ax ∈ col(A) by Lemma 2.21. This shows that col(A) = Rm . So that

m = dim col(A) = n,
which gives m = n. Since ker(A) = {0}, the matrix A is invertible by Theorem 3.17. 
Exercise 4.32. Let T : Rn → Rn be a linear transformation. Suppose that the vectors
  
T e1 , T e2 , . . . , T en
are linearly independent. Prove that for each vector b ∈ Rn , there exists unique vector x ∈ Rn
such that T (x) = b.
Exercise 4.33. Let v1 , . . . , vn be some basis of Rn , and let u1 , . . . , un be another basis of Rn .
Prove that there exists an invertible linear transformation T : Rn → Rn such that

T vi = ui
for every i ∈ {1, . . . , n}.

4.d. Rank–nullity theorem revisited.


Rank does not confer privilege or give power. It imposes responsibility.
Peter Drucker
Let T : Rn → Rm be a linear transformation. Then its image is
$$\mathrm{im}(T) = \big\{\, y \in \mathbb{R}^m : y = T(x)\ \text{for some}\ x \in \mathbb{R}^n\,\big\},$$
and the kernel of the linear transformation T is
$$\ker(T) = \big\{\, x \in \mathbb{R}^n : T(x) = 0\,\big\}.$$

Example 4.34. Let V be a linear subspace of Rn , and let PV : Rn → Rn be the orthogonal


projection onto V. Then the image of PV is the space V, and the kernel of PV is the space V⊥ .
Lemma 4.35. One has im(T ) = col([T ]) and ker(T ) = ker([T ]) .

Proof. Both assertions follow from Theorem 4.22 and Lemma 2.21. □
Then im(T ) is a linear subspace of Rm , and ker(T ) is a linear subspace of Rn . Hence, it makes
sense to talk about their dimensions. Then we define the rank of the linear transformation T as

rank(T ) = dim im(T )

and we define the nullity of the linear transformation T as



null(T ) = dim ker(T )

so that rank(T ) = rank([T ]) and null(T ) = null([T ]) by Lemma 4.35.


Example 4.36. Let T : R3 → R4 be the linear transformation given by
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \mapsto \begin{pmatrix}x_2\\x_3\\0\\0\end{pmatrix}.$$
Then
$$\mathrm{im}(T) = \left\{ \begin{pmatrix}y_1\\y_2\\y_3\\y_4\end{pmatrix} \in \mathbb{R}^4 : y_3 = y_4 = 0\right\}
\quad\text{and}\quad
\ker(T) = \left\{ \begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} \in \mathbb{R}^3 : x_2 = x_3 = 0\right\},$$
so that im(T) is 2-dimensional, giving rank(T) = dim(im(T)) = 2, and ker(T) is 1-dimensional,
giving null(T) = dim(ker(T)) = 1. On the other hand, we have
$$[T] = \begin{pmatrix}0&1&0\\0&0&1\\0&0&0\\0&0&0\end{pmatrix}.$$
Then rank([T]) = 2 and the nullity of the matrix [T] is 1.
Using Theorem 3.4 and Lemma 4.35, we obtain
Theorem 4.37 (Rank–nullity). For any linear transformation T : Rn → Rm , one has
rank(T ) + null(T ) = n .

Exercise 4.38. Let T : R5 → R2 be a linear transformation. What are the possible values of its
rank and nullity? Give examples to show that the possibilities you say can occur really do occur.
When are linear transformations injective? When are they surjective?
Lemma 4.39. Let T : Rn → Rm be a linear transformation. Then
• T is injective ⇐⇒ ker(T ) = {0} ⇐⇒ null(T ) = 0;
• T is surjective ⇐⇒ im(T ) = Rm ⇐⇒ rank(T ) = m.

Proof. Actually, we already proved these assertions implicitly earlier in the proof of Lemma 4.31.
Now let us do this explicitly. If T is injective, then ker(T ) = {0}. Vice versa, if ker(T ) = {0} and
 
T x =T y
for some x and y in Rn , then
  
T x − y = T x − T y = 0,
which implies that x − y ∈ ker(T ), which immediately gives x − y = 0, so that we have x = y.
Thus, if ker(T ) = {0}, then T is injective. But ker(T ) = {0} ⇐⇒ null(T ) = 0 by Lemma 2.50.
Similarly, we see that T is surjective ⇐⇒ im(T ) = Rm ⇐⇒ rank(T ) = m. 
Corollary 4.40. Let T : Rn → Rn be a linear transformation. Then
T is injective ⇐⇒ T is bijective ⇐⇒ T is surjective .
Proof. By Theorem 4.37 and Lemma 4.39, we have
T is injective ⇐⇒ null(T ) = 0 ⇐⇒ rank(T ) = n ⇐⇒ T is surjective.
Thus, if T is either injective or surjective then it is both, that is, bijective. 
Exercise 4.41. Let n and m be natural numbers. Prove the following assertions:
(i) if there exists an injective linear transformation Rn → Rm , then n 6 m;
(ii) if there exists a surjective linear transformation Rn → Rm , then n > m.
Exercise 4.42. Let T be the unique linear transformation R3 → R4 such that
$$T\begin{pmatrix}1\\0\\0\end{pmatrix} = \begin{pmatrix}2\\3\\0\\1\end{pmatrix},\quad T\begin{pmatrix}0\\1\\0\end{pmatrix} = \begin{pmatrix}-2\\3\\8\\13\end{pmatrix},\quad T\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}-2\\0\\4\\6\end{pmatrix}.$$
Is T injective? Justify your answer.

4.e. Linear operators.


What is the difference between matrix theory and linear algebra?
The difference is that in matrix theory you have chosen a particular basis.
Steve Huntsman
We reserve a special name for linear transformations Rn → Rn: we call them linear operators.
Exercise 4.43. Let T : Rn → Rn and S : Rn → Rn be linear operators. If
S ◦ T = T ◦ S,
then T and S are said to commute. For every n > 2, find the following examples:
• an example of two linear operators Rn → Rn that do not commute;
• an example of two linear operators Rn → Rn that commute, whose standard matrices are
not scalar multiples of the matrix In .
Let T : Rn → Rn be some linear operator and let v1 , . . . , vn be an arbitrary basis of Rn .
By Proposition 4.13, the linear transformation T is uniquely determined by the vectors
T v1 , . . . , T vn ∈ Rn .
 

Since v1, ..., vn is a basis of Rn, for every j ∈ {1, ..., n} we can write
$$T(v_j) = \sum_{i=1}^{n} b_{ij}v_i$$
for some real numbers b1j, ..., bnj. This gives us the following n × n matrix:
$$B = \begin{pmatrix} b_{11} & b_{12} & \cdots & b_{1n}\\ b_{21} & b_{22} & \cdots & b_{2n}\\ \vdots & \vdots & & \vdots\\ b_{n1} & b_{n2} & \cdots & b_{nn}\end{pmatrix}.$$
If v1, ..., vn is the standard basis of Rn, then B = [T] by definition. In general, we say that
(F) B is the matrix of the linear operator T with respect to the basis v1, ..., vn.

Example 4.44 (cf. Exercise 3.40). Let T = TA, where TA : R3 → R3 is the linear operator with
$$A = \begin{pmatrix} 7 & -12 & 6\\ 10 & -19 & 10\\ 12 & -24 & 13\end{pmatrix}.$$
For every x ∈ R3, we have T(x) = Ax. Thus, we have
$$T(e_1) = 7e_1 + 10e_2 + 12e_3,\quad T(e_2) = -12e_1 - 19e_2 - 24e_3,\quad T(e_3) = 6e_1 + 10e_2 + 13e_3.$$
On the other hand, let
$$v_1 = \begin{pmatrix}3\\5\\6\end{pmatrix} = 3e_1 + 5e_2 + 6e_3,\quad v_2 = \begin{pmatrix}2\\1\\0\end{pmatrix} = 2e_1 + e_2,\quad v_3 = \begin{pmatrix}-1\\0\\1\end{pmatrix} = -e_1 + e_3.$$
Then v1, v2, v3 are linearly independent, so that they form a basis of R3. Observe that
$$T(v_1) = Av_1 = -v_1,\quad T(v_2) = Av_2 = v_2,\quad T(v_3) = Av_3 = v_3.$$
Thus, the matrix of the linear operator T with respect to the basis v1, v2, v3 is
$$B = \begin{pmatrix} -1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$
This matrix is much simpler than our original matrix A = [T]. Let us explain how they are related.
Let x be a vector in R3. Then
$$x = \lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3$$
for some real numbers λ1, λ2 and λ3, because v1, v2, v3 form a basis of R3. Then
$$T(x) = T(\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3) = \lambda_1 T(v_1) + \lambda_2 T(v_2) + \lambda_3 T(v_3) = -\lambda_1 v_1 + \lambda_2 v_2 + \lambda_3 v_3.$$
In this case, we have
$$x = \lambda_1\big(3e_1+5e_2+6e_3\big) + \lambda_2\big(2e_1+e_2\big) + \lambda_3\big({-e_1}+e_3\big) = \big(3\lambda_1+2\lambda_2-\lambda_3\big)e_1 + \big(5\lambda_1+\lambda_2\big)e_2 + \big(6\lambda_1+\lambda_3\big)e_3.$$
Thus, we have
$$\begin{cases} x_1 = 3\lambda_1 + 2\lambda_2 - \lambda_3,\\ x_2 = 5\lambda_1 + \lambda_2,\\ x_3 = 6\lambda_1 + \lambda_3.\end{cases}$$
In the matrix form, this can be expressed as
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix} 3 & 2 & -1\\ 5 & 1 & 0\\ 6 & 0 & 1\end{pmatrix}\begin{pmatrix}\lambda_1\\\lambda_2\\\lambda_3\end{pmatrix}.$$
Denote this 3 × 3 matrix by M. Its columns are just the coordinates of the vectors v1, v2 and v3. Then
$$\begin{pmatrix}\lambda_1\\\lambda_2\\\lambda_3\end{pmatrix} = M^{-1}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix} -1 & 2 & -1\\ 5 & -9 & 5\\ 6 & -12 & 7\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix}.$$
Note that A = MBM⁻¹ and B = M⁻¹AM. This is not a coincidence (see Theorem 4.45 below).
Using A = MBM⁻¹, we can easily compute Aⁿ for any n ∈ Z (cf. Example 4.30). Namely, we have
$$A^n = \big(MBM^{-1}\big)^n = M B^n M^{-1} = \begin{pmatrix} 3 & 2 & -1\\ 5 & 1 & 0\\ 6 & 0 & 1\end{pmatrix}\begin{pmatrix} (-1)^n & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}\begin{pmatrix} -1 & 2 & -1\\ 5 & -9 & 5\\ 6 & -12 & 7\end{pmatrix} = \begin{pmatrix} 4-3(-1)^n & 6(-1)^n-6 & 3-3(-1)^n\\ 5-5(-1)^n & 10(-1)^n-9 & 5-5(-1)^n\\ 6-6(-1)^n & 12(-1)^n-12 & 7-6(-1)^n\end{pmatrix}.$$
This example shows one useful feature: if there are real numbers λ1, ..., λn such that
$$T(v_1) = \lambda_1 v_1,\quad T(v_2) = \lambda_2 v_2,\quad \ldots,\quad T(v_n) = \lambda_n v_n,$$
then the matrix of T with respect to the basis v1, ..., vn is the diagonal matrix
$$\begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{pmatrix}.$$
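The relations in Example 4.44 (and in Theorem 4.45 below) can also be checked numerically. Here is a minimal sketch assuming NumPy, using the matrices A, B, M from Example 4.44:

```python
import numpy as np

A = np.array([[7.0, -12.0, 6.0],
              [10.0, -19.0, 10.0],
              [12.0, -24.0, 13.0]])
B = np.diag([-1.0, 1.0, 1.0])
M = np.array([[3.0, 2.0, -1.0],
              [5.0, 1.0, 0.0],
              [6.0, 0.0, 1.0]])
M_inv = np.linalg.inv(M)

print(np.allclose(A, M @ B @ M_inv))    # True: A = M B M^{-1}
A4 = np.linalg.matrix_power(A, 4)
print(np.allclose(A4, M @ np.linalg.matrix_power(B, 4) @ M_inv))   # True: A^n = M B^n M^{-1}
```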
Theorem 4.45. Let T : Rn → Rn be a linear operator, and let v1, ..., vn be a basis of Rn. Write:
• A for the standard matrix of the linear transformation T;
• B for the matrix of the linear transformation T with respect to the basis v1, ..., vn;
• M for the matrix (v1|v2| ··· |vn), i.e. the matrix with columns v1, ..., vn.
Then M is invertible, A = MBM⁻¹ and B = M⁻¹AM.
Proof. The matrix M is invertible by Theorem 3.17. To complete the proof, we have to prove that
AM = M B.
Let us do this by evaluating T(vk) for each k ∈ {1, ..., n} in two different ways. We have
$$T(v_k) = \sum_{i=1}^n b_{ik}v_i = \sum_{i=1}^n b_{ik}\sum_{j=1}^n m_{ji}e_j = \sum_{j=1}^n\Big(\sum_{i=1}^n m_{ji}b_{ik}\Big)e_j = \sum_{j=1}^n (MB)_{jk}\,e_j.$$
On the other hand, we have
$$T(v_k) = T\Big(\sum_{i=1}^n m_{ik}e_i\Big) = \sum_{i=1}^n m_{ik}T(e_i) = \sum_{i=1}^n m_{ik}\sum_{j=1}^n a_{ji}e_j = \sum_{j=1}^n\Big(\sum_{i=1}^n a_{ji}m_{ik}\Big)e_j = \sum_{j=1}^n (AM)_{jk}\,e_j.$$
Thus, for every k ∈ {1, ..., n}, we have
$$\sum_{j=1}^n (MB)_{jk}\,e_j = \sum_{j=1}^n (AM)_{jk}\,e_j,$$
so that (MB)jk = (AM)jk for all j and k in {1, ..., n}. This means that MB = AM. □
The matrix M in Theorem 4.45 is called the change of basis matrix.
Example 4.46. Let v1 and v2 be the vectors in R2 such that
$$v_1 = \begin{pmatrix}3\\1\end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix}7\\4\end{pmatrix}.$$
Note that v1 and v2 are linearly independent, so that they form a basis of the vector space R2.
By Proposition 4.13, there exists a unique linear transformation T : R2 → R2 such that
$$T(v_1) = 2v_1 \quad\text{and}\quad T(v_2) = -v_2.$$
Then the matrix of T with respect to the basis v1, v2 is
$$B = \begin{pmatrix}2&0\\0&-1\end{pmatrix}.$$
The change of basis matrix M is
$$M = \begin{pmatrix}3&7\\1&4\end{pmatrix}.$$
By Theorem 4.45, the standard matrix A = [T] is given by
$$A = MBM^{-1} = \begin{pmatrix}3&7\\1&4\end{pmatrix}\begin{pmatrix}2&0\\0&-1\end{pmatrix}\begin{pmatrix}3&7\\1&4\end{pmatrix}^{-1} = \frac{1}{5}\begin{pmatrix}31&-63\\12&-26\end{pmatrix}.$$
Exercise 4.47. By Proposition 4.13, there is a unique linear transformation T : R2 → R2 such that
$$T\begin{pmatrix}3\\2\end{pmatrix} = \begin{pmatrix}1\\5\end{pmatrix} \quad\text{and}\quad T\begin{pmatrix}1\\5\end{pmatrix} = \begin{pmatrix}301\\205\end{pmatrix}.$$
Find the standard matrix of T and compute its determinant.

Theorem 4.45 tells us that matrices of the same linear operator with respect to different bases
are related in a certain way. It is useful to have some terminology for this.
Definition 4.48. Let A and B be n × n matrices. We say that A is similar to B if
$$A = MBM^{-1}$$
for some invertible n × n matrix M. If A and B are similar, we write A ∼ B.
For instance, in Example 4.46, we proved that
$$\frac{1}{5}\begin{pmatrix}31&-63\\12&-26\end{pmatrix} \sim \begin{pmatrix}2&0\\0&-1\end{pmatrix}.$$
Similarly, in Example 4.44, we proved that
$$\begin{pmatrix}7&-12&6\\10&-19&10\\12&-24&13\end{pmatrix} \sim \begin{pmatrix}-1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$
Being similar is an equivalence relation on the set of all n × n matrices:
Lemma 4.49. Let A, B and C be n × n matrices. Then the following assertions hold:
(i) A ∼ A;
(ii) if A ∼ B, then B ∼ A;
(iii) if A ∼ B and B ∼ C, then A ∼ C.
Proof. To prove (i), observe that A = In AIn−1 , so that A ∼ A.
To prove (ii), suppose that A ∼ B. Then
A = M BM −1
for some invertible n × n matrix M . Put Q = M −1 . Then
B = M −1 AM = QAQ−1
and Q is invertible, so that B ∼ A.
To prove (iii), suppose that A ∼ B and B ∼ C. Then
$$\begin{cases} A = MBM^{-1},\\ B = QCQ^{-1},\end{cases}$$
for some invertible n × n matrices M and Q. Let R = MQ. Then
$$A = M\big(QCQ^{-1}\big)M^{-1} = \big(MQ\big)\,C\,\big(MQ\big)^{-1} = RCR^{-1}$$
and R is invertible by Lemma 1.44, so that A ∼ C. 


Corollary 4.50. Let T : Rn → Rn be linear transformation. Then its matrices with respect to all
possible bases of the vector space Rn are similar.
Proof. The assertion follows from Theorem 4.45 and Lemma 4.49. 
Here is an important observation: if A is an n × n matrix such that A ∼ λIn for some λ ∈ R,
then there exists an invertible n × n matrix M such that
$$A = M\big(\lambda I_n\big)M^{-1} = \lambda\, M I_n M^{-1} = \lambda\, M M^{-1} = \lambda I_n,$$
so that A = λIn. In particular, the only matrix that is similar to In is the matrix In itself.
Lemma 4.51. Similar matrices have the same determinant.
Proof. This follows from Proposition 3.27 and Corollary 3.30. □


Using this lemma, for every linear operator T : Rn → Rn , we can define its determinant det(T )
as the determinant of its matrix with respect to any basis of the vector space Rn .
Example 4.52. Let T be the linear transformation from Example 4.46. Its standard matrix is
$$\frac{1}{5}\begin{pmatrix}31&-63\\12&-26\end{pmatrix},$$
and its matrix with respect to a certain non-standard basis is
$$\begin{pmatrix}2&0\\0&-1\end{pmatrix}.$$
You can check that these two matrices have the same determinant det(T) = −2.
Exercise 4.53. Let T be the unique linear transformation R3 → R3 such that
$$T(e_1) = \begin{pmatrix}3\\2\\3\end{pmatrix},\quad T(e_2) = \begin{pmatrix}1\\5\\0\end{pmatrix},\quad T(e_3) = \begin{pmatrix}-2\\-1\\1\end{pmatrix},$$
and let U be the unique linear transformation R2 → R2 such that
$$U\begin{pmatrix}3\\2\end{pmatrix} = \begin{pmatrix}1\\5\end{pmatrix} \quad\text{and}\quad U\begin{pmatrix}1\\5\end{pmatrix} = \begin{pmatrix}301\\205\end{pmatrix}.$$
Compute their determinants.
Using Lemma 4.39 and Theorem 3.37, we see that a linear operator is invertible if and only if
its determinant is not zero.
Exercise 4.54. A linear operator T : Rn → Rn is idempotent if T ◦ T = T . Similarly, a square
matrix A is idempotent if A2 = A. Do the following:
(i) Let V be a linear subspace of Rn . Prove that PV is idempotent.
(ii) Write down three examples of 2 × 2 idempotent matrices and three examples of 5 × 5
idempotent matrices.
(iii) Find all invertible idempotent matrices.
(iv) Let T : Rn → Rn be an idempotent linear operator. Prove that
$$\mathrm{im}(T) = \ker\big(T - \mathrm{Id}_{\mathbb{R}^n}\big),$$
where T − IdRn is the linear operator defined by x ↦ ([T] − In)x for every x ∈ Rn. Interpret
this equality in the case where T is orthogonal projection onto a linear subspace.
Exercise 4.55. A linear operator T : Rn → Rn is nilpotent if
$$T^r = \underbrace{T\circ T\circ \cdots \circ T\circ T}_{r\ \text{times}} = 0$$
for some r ≥ 1, where 0 on the right-hand side means the zero operator defined by x ↦ 0 for every
vector x ∈ Rn. Do the following:
(i) Show that the linear operator T : R3 → R3 defined by
e3 7→ e2 7→ e1 7→ 0
is nilpotent. What is the smallest number r such that T r = 0?
(ii) Give an example of a nilpotent linear operator T : Rn → Rn such that T 10 = 0 but T 9 6= 0.
(iii) Let T : Rn → Rn be a nilpotent operator. Prove that T n = 0.
ACCELERATED ALGEBRA 93

5. Eigenvalues and eigenvectors


5.a. Similar matrices.
A city is composed of different kinds of men,
similar people cannot bring a city into existence.
Aristotle
Recall from Definition 4.48 that two n × n matrices A and B are said to be similar if
A = M BM −1
for some invertible n × n matrix M . In this case, we have Ar = M B r M −1 for every integer r.
This can be used to compute Ar provided that B r is easy to compute. For example, if
   
7 −12 6 −1 0 0
A = 10 −19 10 and B =  0 1 0 ,
12 −24 13 0 0 1
then A and B are similar, because A = M BM −1 , where
 
3 2 −1
M = 5 1 0 
6 0 1
Using this and Ar = M × B r × M −1 , we see that
     
3 2 −1 (−1)r 0 0 −1 2 −1 4 − 3(−1)r 6(−1)r − 6 3 − 3(−1)r
Ar = 5 1 0   0 1 0  5 −9 5  = 5 − 5(−1)r 10(−1)r − 9 5 − 5(−1)r 
6 0 1 0 0 1 6 −12 7 6 − 6(−1)r 12(−1)r − 12 7 − 6(−1)r
In this case, both 3 × 3 matrices A and B are from Example 4.44, we have B = M −1 AM , and
the matrix B is diagonal, so that we can compute B r easily.
Question 5.1. Given a n × n matrix A, can we find an invertible n × n matrix M such that
λ1 0 0 ··· 0
 
 0 λ2 0 ··· 0
−1
. . .. . .. 
M AM =   .
. .
. . .
. .
0 0 0 λ 0
n−1
0 0 0 · · · λn
for some real numbers λ1 , . . . , λn ? If yes, how to find the matrix M ?
If the answer to Question 5.1 is yes for a given square matrix A, we say that A is diagonalizable.
Translating Question 5.1 in the language of linear operators and using Theorem 4.45, we get
Question 5.2. Given a linear operator T : Rn → Rn , can we find a basis v1 , . . . , vn such that
 
 T v1 = λ1 v 1

 
T v = λ2 v 2

2




..
 .
 
T vn−1 = λn−1 vn−1





 
T vn = λn v n
for some real numbers λ1 , . . . , λn ? If yes, how to find the vectors v1 , . . . , vn ?
94 IVAN CHELTSOV

If the answer to Question 5.2 is yes for a given linear operator T : Rn → Rn , then we say that
this linear operator is diagonalizable. This brings us to the following (important) definition:
Definition 5.3. Let T : Rn → Rn be a linear operator.
(1) An eigenvalue of T is a real number λ such that

T x = λx

for some non-zero vector x ∈ Rn .


(2) An eigenvector of T with eigenvalue λ is non-zero vector x ∈ Rn such that T (x) = λx.
Translating this definition back into the language of matrices, we get
Definition 5.4. Let A be a n × n matrix.
(1) An eigenvalue of the matrix A is a real number λ such that

Ax = λx
for some non-zero vector x ∈ Rn .
(2) An eigenvector of A with eigenvalue λ is non-zero vector x ∈ Rn such that Ax = λx.
If T : Rn → Rn is a linear operator, then Theorem 4.22 gives
  
T x = T x
for every vector x ∈ Rn . Thus, the eigenvalues and eigenvectors of a linear operator T are exactly
the eigenvalues and eigenvectors of its standard matrix [T ].
Example 5.5. Let A be a n × n diagonal matrix
λ1 0 0 ··· 0
 
 0 λ2 0 ··· 0
. .. . . .. .. 
 .. . . . .
 
0 0 0 λ 0
n−1
0 0 0 · · · λn
where λ1 , . . . , λn are some real numbers. Then λ1 , . . . , λn are eigenvalues of the diagonal matrix A.
Let e1 , . . . , en be the standard basis of the vector space Rn . Then e1 , . . . , en are eigenvectors of
the matrix A with eigenvalues λ1 , . . . , λn , respectively.
Example 5.6. Let V be the linear subspace in R4 given by
x1 + x2 + 2x3 − x4 = 0,
let PV : R4 → R4 be the orthogonal projection onto V, let RV : R4 → R4 be the function such that
  
x 7→ x + 2 PV x − x

for every vector x ∈ R4 . Then RV is the reflection in V. Let


       
1 0 1 1
−1 2 0 1
v1 =  0  , v2 = −1 , v3 = 0 , v4 =  2  .
      
0 0 1 −1
ACCELERATED ALGEBRA 95

Then the vectors v1 , v2 , v3 and v4 are linearly independent, so that they form a basis of R4 .
Note also that v1 , v2 and v3 are contained in V, so that they form a basis of the vector subspace V.
Furthermore, the vector v4 forms a basis of the orthogonal complement to V. Then
 

 R V v 1 = v1 ,
 
RV v2 = v2 ,




 R V v 3 = v3 ,
 
RV v4 = −v4 .

Then 1 and −1 are eigenvalues of the linear operator RV , every non-zero vector in V is an eigenvec-
tor of the linear operator RV with eigenvalue 1, and every non-zero vector in V⊥ is an eigenvector
with eigenvalue −1. What are eigenvalues and eigenvectors of the orthogonal projection PV ?
Example 5.7. Let Rθ : R2 → R2 be the rotation by θ about 0, where θ ∈ R such that 0 6 θ < 2π.
If θ = 0, then Rθ = IdR2 and all vectors in R2 are its eigenvectors with eigenvalue 1. If θ = π, then

Rθ x = −x
for every vector x ∈ R2 , so that every vector in R2 is an eigenvector of Rθ with eigenvalue −1.
Finally, if θ 6= 0 and θ 6= π, then the operator Rθ does not have real eigenvalues.
Example 5.8. Let e1 and e2 be standard basis vectors in R2 , and let
 
0 1
A= .
0 0
Then Ae1 = 0 and Ae2 = e1 , so that e1 is an eigenvector of the matrix A with eigenvalue 0.
If x ∈ R2 is another eigenvector of this matrix, then
   
x1 x
A =λ 1
x2 x2
for some number λ. Then
       
x2 x1 x1 λx1
=A =λ =
0 x2 x2 λx2
which gives
λx1 = x2 ,


λx2 = 0.
If λ 6= 0, this gives x1 = 0 and x2 = 0, so that x = 0, which is not allowed. Then λ = x2 = 0 and
x = x1 e 1 .
Hence, the only eigenvalue of A is 0, and all eigenvectors of this matrix are scalar multiple of e1 .
This implies that A is not diagonalizable.
Exercise 5.9. Let A and B be n × n matrices. Prove or disprove the following assertions:
(i) If λ is an eigenvalue of A and B, then λ is an eigenvalue of AB.
(ii) If v is an eigenvector of A and B, then v is an eigenvector of AB.
Eigenvectors are not allowed to be zero, but eigenvalues can be. In fact, for a n × n matrix A,
the eigenvectors of A with eigenvalue 0 are exactly the nonzero elements of the kernel of A, because
Ax = 0x ⇐⇒ Ax = 0.
Thus, we see that 0 is an eigenvalue of A ⇐⇒ det(A) = 0 ⇐⇒ A is not invertible.
Proposition 5.10. Let T : Rn → Rn be a linear operator. The following conditions are equivalent:
96 IVAN CHELTSOV

• 0 is an eigenvalue of the linear operator T ;


• the operator T is not invertible.
Proof. Observe that 0 is an eigenvalue of the linear operator T if and only if ker(T ) 6= {0}.
By Lemma 4.39 and Corollary 4.40, this is equivalent to T not being invertible. 

It is useful to think about the set of all the eigenvectors of T that share a particular eigenvalue:
Definition 5.11. Let T : Rn → Rn be a linear operator, let A be a n × n matrix, and let λ ∈ R.
Then the λ-eigenspace of the linear operator T is
n o
Eλ (T ) = x ∈ Rn : T x = λx .


Similarly, the λ-eigenspace of the matrix A is


n o
Eλ (A) = x ∈ Rn : Ax = λx .

Example 5.12. Let RV : R4 → R4 be the linear operator (reflection) described in Example 5.6.
Then V is the 1-eigenspace of the operator RV , and V⊥ is its (−1)-eigenspace.
If A is a n × n matrix, then E0 (A) = ker(A). More generally, we have
Lemma 5.13. Let A be a n × n matrix, and let λ be a scalar. Then Eλ (A) = ker(A − λIn ).
Proof. For every vector x ∈ Rn , we have
 
x ∈ Eλ (A) ⇐⇒ Ax − λx = 0 ⇐⇒ A − λIn x = 0 ⇐⇒ x ∈ ker A − λI
as required. 
Corollary 5.14. Let T : Rn → Rn be a linear operator, let A be a n × n matrix, and let λ ∈ R.
Then Eλ (T ) and Eλ (A) are linear subspaces of Rn .
Proof. Using Lemmas 2.5 and 5.13, we see that Eλ (A) is a linear subspace. Since Eλ (T ) = Eλ ([T ]),
we see that Eλ (T ) is a linear subspace. 

We defined the λ-eigenspace for all scalars λ, even if λ is not an eigenvalue.


Proposition 5.15. Let A be a n × n matrix and λ ∈ R. The following are equivalent:
(1) λ is an eigenvalue of A;
(2) one has Eλ (A) 6= {0};
(3) the matrix A − λIn is not invertible;
(4) det(A − λI) = 0.
Proof. Using Definition 5.4, we see that (1) ⇐⇒ (2). By Theorem 3.37, we have (3) ⇐⇒ (4).
Finally, using Lemma 5.13 and Theorem 3.17, we deduce that (2) ⇐⇒ (3). 

Thus, if λ is an eigenvalue of A, then there exists at least one eigenvector with eigenvalue λ.
Exercise 5.16. Let A be a n × n matrix. Prove the following assertions:
(i) If λ is an eigenvalue of A, then λ2 is an eigenvalue of A2 .
(ii) If λ is an eigenvalue of A, then λ3 − 2λ + 5 is an eigenvalue of A3 − 2A + 5In .
(iii) If A invertible, then the eigenvalues of A−1 are the reciprocals of the eigenvalues of A
ACCELERATED ALGEBRA 97

5.b. Diagonalizable matrices.


In mathematics you don’t understand things. You just get used to them
John von Neumann
In this section, we answer Question 5.1. For every real numbers λ1 , . . . , λn , we let
λ1 0 0 ··· 0
 
 0 λ2 0 ··· 0
 . . . . .. 
diag λ1 , λ2 , . . . , λn =  ..
 .. .. .. ..
0 0 0 λ 0
n−1
0 0 0 · · · λn
Let A be a n × n matrix. Recall from Section 5.a that A is said to be diagonalizable if there exists
an invertible n × n matrix M such that
M −1 AM = diag λ1 , λ2 , . . . , λn


for some numbers λ1 , . . . , λn . Question 5.1 asks when such matrix M exists and how to find it.
To get a nice answer to Question 5.1, we let
  
χA λ = det A − λIn .

Then χA (λ) is a polynomial in λ of degree n with leading coefficient (−1)n , so that


χA λ = (−1)n λn + an−1 λn−1 + · · · + a1 λ + a0


for some real numbers a0 , a1 , . . . , an−1 . Observe that a0 = χA (0) = det(A). One can show that
an−1 = (−1)n tr(A),
where tr(A) is the sum of the diagonal elements of A (see Exercise 1.33). For instance, if
 
7 −12 6
A = 10 −19
 10 ,
12 −24 13
then, using the formula for the determinant (3.23), we see that
2
χA λ = −λ3 + λ2 + λ − 1 = − λ − 1 λ + 1 .
 

The polynomial χA (λ) is said to be the characteristic polynomial of the matrix A.


Proposition 5.17. The eigenvalues of A are exactly the real roots of the polynomial χA (λ).
Proof. Follows from Proposition 5.15. 
Corollary 5.18. An n × n matrix has at most n eigenvalues.
For instance, if A = diag(λ1 , λ2 , . . . , λn ) for some real numbers λ1 , . . . , λn , then
χA λ = det A − λIn = (−1)n λ − λ1 λ − λ2 · · · λ − λn ,
    

so that the eigenvalues of the matrix A are λ1 , λ2 , . . . , λn .


Example 5.19. Let T : R2 → R2 be the reflection in the x-axis, and let A = [T ]. Then

 1 − λ 0
χA λ = = (1 − λ)(−1 − λ) = (λ − 1)(λ + 1),
0 −1 − λ
98 IVAN CHELTSOV

so that the roots of χA (λ) are ±1. Moreover, we have


    
0 0 x
E1 (A) = ker(A − I2 ) = ker = : x∈R .
0 −2 0
Similarly, we see that E−1 (A) is exactly the y-axis.
Exercise 5.20. Find all (real) eigenvalues and the corresponding eigenspaces of the matrices
       
2 −1 2 0 1 0 4 −5 2 4 −5 7
5 −3 3  , −4 4 0 , 5 −7 3 ,  1 −4 9 .
−1 0 −2 −2 1 2 6 −9 4 −4 0 5
Lemma 5.21. Similar n × n matrices have the same characteristic polynomial.
Proof. Let A and B be similar n × n matrices. Then
A = M BM −1
for some invertible matrix M , so that
   
χA λ = det M BM −1 − λIn = det M B − λIn M −1 = det B − λIn = χB λ
   

as required. 
Corollary 5.22. Similar matrices have the same eigenvalues.
Thus, if the matrix A is diagonalizable and M is an invertible n × n such that
M −1 AM = diag λ1 , λ2 , . . . , λn


for some numbers λ1 , . . . , λn , then λ1 , . . . , λn are eigenvalues of the matrix A.


Example 5.23. Let  
1 2 1
A =  0 −1 0 ,
−2 −2 4
and let T = TA be the corresponding linear operator such that [T ] = A. Then

1 − λ 2 1

χA (λ) = 0
−1 − λ 0 = −(λ − 2)(λ − 3)(λ + 1),
−2 −2 4 − λ
so that the eigenvalues of A an T are 2, 3 and −1. To calculate E2 (T ) = E2 (A), observe that
 
−1 2 1
A − 2I2 =  0 −3 0 .
−2 −2 2
The reduced row echelon form of this matrix is
 
1 0 −1
0 1 0 
0 0 0
which implies that   
  t 
E2 (A) = ker A − 2I2 =  0 : t∈R .

t
 
ACCELERATED ALGEBRA 99

Therefore, we see that E2 (A) = span({v1 }), where


 
1
v1 = 0 .

1
Similarly, we see that E3 (T ) = E3 (A) = span({v2 }) and E−1 (T ) = E−1 (A) = span({v3 }), where
   
1 1
v2 = 0 and v3 = −1 .
2 0
Observe that v1 , v2 and v3 are linearly independent. Indeed, let
 
1 1 1
M = 0 0 −1 .
1 2 0
Then det(M ) 6= 0, which implies that v1 , v2 and v3 are linearly independent by Corollary 3.38.
Therefore, these vectors form a basis of R3 . Moreover, we see that
 
T v1  = 2v1 ,

T v2 = 3v2 ,
 
T v3 = −v3 ,

Thus, the matrix of the linear operator T with respect to the basis v1 , v2 , v3 is
 
2 0 0
B = 0 3 0  ,
0 0 −1
Hence, it follows from Theorem 4.45 that M −1 AM = B. One can explicitly check that this is true:
     
2 2 −1 1 2 1 1 1 1 2 0 0
M −1 AM = −1 −1 1   0 −1 0 0 0 −1 = 0 3 0  .
0 −1 0 −2 −2 4 1 2 0 0 0 −1
Thus, we see that A is diagonalizable.
Exercise 5.24. Let A be a diagonalizable n × n matrix such that all its eigenvalues are equal.
What can you say about A?
Now we are ready to answer Question 5.1.
Proposition 5.25. Let A, M and B be n × n matrices such that M is invertible, and

B = diag λ1 , λ2 , . . . , λn
for some real numbers λ1 , . . . , λn . Write M = (v1 |v2 | · · · |vn ). Then

 Av1 = λ1 v1 ,

Av2 = λ2 v2 ,


−1
M AM = B ⇐⇒ .

 ..


Avn = λn vn .

Proof. Note that


M −1 AM = B ⇐⇒ AM = M B.
Moreover, if i ∈ {1, . . . , n}, then
100 IVAN CHELTSOV

• the ith column of AM is Avi (by Lemma 1.35).


• the ith column of M B is λi vi .
Hence, we have AM = M D if and only if Avi = λi vi for every i ∈ {1, . . . , n}. 
Corollary 5.26. Let A be a n × n matrix. Then the following are equivalent:
(1) the matrix A is diagonalizable;
(2) there is a basis of Rn consisting of eigenvectors of the matrix A;
(3) the matrix A has n linearly independent eigenvectors.
Proof. The required assertion follows from Proposition 5.25 and Corollary 3.19. 
Let us show how to apply Proposition 5.25.
Example 5.27. Let  
−5 3 −3
A = −6 4 −6 .
0 0 −2
Then
χA λ = −(λ + 2)2 (λ − 1),


so that the eigenvalues of the matrix A are −2 and 1. Observe that


 
−3 3 −3
A − (−2)I3 = −6 6 −6 ,
0 0 0
which is a matrix of rank 1 and nullity 2, so that E2 (A) is two-dimensional. Let
   
1 −1
v1 = 1 and v2 =  0  .
0 1
Then v1 and v2 form a basis of E2 (A). Similarly, we see that E1 (A) is a one-dimensional vector
space that is spanned by the vector  
1
v3 = 2 .

0
Moreover, the vectors v1 , v2 and v3 form a basis of R3 . Let
 
1 −1 1
M = 1 0 2  .
0 1 0
Then it follows from Proposition 5.25 that
 
2 0 0
M −1 AM = 0 2 0 .
0 0 1
In particular, the matrix A is diagonalizable in this case.
Exercise 5.28. Let v be a non-zero vector in Rn considered as n × 1 matrix, and let
A = vvT .
Show that A is diagonalizable, one its eigenvalue is v · v, and other eigenvalues are all zeroes.
ACCELERATED ALGEBRA 101

However, Proposition 5.25 is not always applicable, because not all square matrices are diago-
nalizable. We saw this already in Example 5.8. Let us consider three additional examples.
Example 5.29. Let  
1 1 0
A = 0 1 1 ,
0 0 1
Then χA (λ) = −(λ − 1)3 , so that 1 is the only eigenvalue of the matrix A. Observe that
 
0 1 0
A − I2 = 0 0 1 ,
0 0 0
so that A has just one eigenvector up to scalling: the first standard basis vector e1 of the space R3 .
Because of this, the matrix A is not diagonalizable. Indeed, if it were diagonalizable, there would
exists an invertible 3 × 3 matrix M such that
 
λ1 0 0
M −1 AM =  0 λ2 0 
0 0 λ3
for some real numbers λ1 , λ2 and λ3 , so that λ1 = 1, λ2 = 1 and λ3 = 1 by Corollary 5.22, which
would imply that M −1 AM = I3 , so that A = M I3 M −1 = M M −1 = I3 , which is absurd.
Example 5.30. Let  
3 −2
A= ,
4 −1
Then χA (λ) = λ2 − 2λ + 5, so that A does not have real eigenvalues. Therefore, by Corollary 5.22,
the matrix A is not diagonalizable over the real numbers. However, the polynomial equation
λ2 − 2λ + 5 = 0
has two complex solutions: 1 + 2i and 1 − 2i. Therefore, if we are allowed to use complex numbers,
we can diagonalize A. Indeed, arguing exactly as in Example 5.23, we can obtain the matrix
 
2 + 2i 2 − 2i
M=
4 4
whose columns are eigenvectors of A with eigenvalues 1 + 2i and 1 − 2i, respectively. Then
 i 1+i      
−1 −4 8 3 −2 2 + 2i 2 − 2i 1 + 2i 0
M AM = i 1−i = .
4 8
4 −1 4 4 0 1 − 2i
Example 5.31. Let θ be a real number such that 0 6 θ < 2π, and let
 
cos θ − sin θ
A= .
sin θ cos θ
Then the characteristic polynomial of the matrix A is

cos θ − λ − sin θ
χA (λ) = det(A − λI2 ) = = λ2 − 2 cos θλ + 1.
sin θ cos θ − λ
If θ 6∈ {0, π}, then χA (λ) has no real roots, so that A is not digonalizable over R by Corollary 5.22.
Observe also that cos θ + sin θi and cos θ − sin θi are two complex eigenvalues of the matrix A.
Moreover, finding the corresponding complex eigenvectors and using Proposition 5.25, we get
 −1     
i 1 cos θ − sin θ i 1 cos θ + i sin θ 0
= .
1 i sin θ cos θ 1 i 0 cos θ − i sin θ
102 IVAN CHELTSOV

Therefore, the matrix A is diagonalizable over complex numbers.


Exercise 5.32. Let A be a 2 × 2 real matrix whose entries are all positive. Show that A has two
distinct (real) eigenvalues. Present an example of a real 2 × 2 matrix that does not have (real)
eigenvalues, which is different from matrices in Examples 5.30 and 5.31.
An answer to Question 5.1 is given by Proposition 5.25 and Corollary 5.26. In particular, it says
that a n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.
Can we find a simpler condition for diagonalization? Yes, we can. To prove it, we need
Lemma 5.33. Let A be a n × n matrix, let λ1 , . . . , λm be its eigenvalues, and let v1 , . . . , vm be
its eigenvectors with eigenvalues λ1 , . . . , λm , respectively. Suppose that λi 6= λj for every i 6= j.
Then v1 , . . . , vm are linearly independent.
Proof. We prove this by induction on m > 1. The result trivially holds when m = 1 by definition.
Suppose that m > 2, assume the result holds for m − 1, and let µ1 , . . . , µm be scalars such that
µ1 v1 + µ2 v2 + · · · + µm vm = 0.
We have to prove that µ1 = µ2 = · · · = µm = 0. Note that
0 = λm 0 = λm µ1 v1 + λm µ2 v2 + · · · + λm µm vm .
On the other hand, we have
 
0 = A µ1 v1 +µ2 v2 +· · ·+µm vm = µ1 Av1 +µ2 Av2 +· · ·+µm Avm = µ1 λ1 v1 +µ2 λ2 v2 +· · ·+µm λm vm .

Subtracting these equalities one from another, we get


  
λm − λ1 µ1 v1 + λm − λ2 µ2 v2 + · · · + λm − λm−1 µm−1 vm−1 = 0.
Thus, by induction, we have
 

 λ m − λ1 µ1 = 0,

 
 λm − λ2 µ2 = 0,

...




 λ −λ 
m m−1 µm−1 = 0.

But λ1 , . . . , λm are pairwise distinct, so µ1 = µ2 = · · · = λm−1 = 0, so that


µm vm = µ1 v1 + µ2 v2 + · · · + µm vm = 0,
which gives µm = 0, since vm 6= 0. This shows that v1 , . . . , vm are linearly independent. 
Corollary 5.34. Let A be a n × n matrix with n distinct eigenvalues. Then A is diagonalizable.
Proof. The required assertion follows from Lemma 5.33 and Corollary 5.26. 
Corollary 5.34 is very easy to apply.
Example 5.35. Let
 
1 0 0
A = 2 4 0 .
3 5 6
Then A is diagonalizable, since its eigenvalues are 1, 4 and 6 (all distinct).
ACCELERATED ALGEBRA 103

Example 5.36. Let  


1 2 3
A = 4 5 6 .
7 8 9
After some calculation, we find that
χA (λ) = −λ3 + 15λ2 + 18λ.
√ √
15+ 297 15− 297
Then 0, 2
and 2
eigenvalues of A, so that A is diagonalizable by Corollary 5.34.
Example 5.37. Let  
5 −1 −2
A = 0 −6 3  .
0 0 3
Then
5 − λ −1 −2

χA (λ) =
0 −6 − λ 3 = −(λ − 5)(λ + 6)(λ − 3),

0 0 3 − λ
so that the eigenvalues of the matrix A are 5, −6 and 3. Then A is diagonalizable by Corollary 5.34.
Of course, we can also show this explicitly using Proposition 5.25. Indeed, we have
     
1  1  7
0 ∈ ker A − 5I3 , 11 ∈ ker A + 6I3 , 2 ∈ ker(A − 3I3 ),
0 0 6
Now, using Proposition 5.25, we let
 
1 1 7
M = 0 11 2 .
0 0 6
Then, either using Proposition 5.25 or multiplying three 3 × 3 matrices, we get
 
5 0 0
−1
M AM = 0 −6  0 .
0 0 3
Exercise 5.38. Which of the following matrices are diagonalizable?
     
1 4 6 1 2 3 1 1 1
0 2 5 , 0 1 2 , 1 2 0 .
0 0 3 0 0 1 1 0 3
Note that Corollary 5.34 gives only sufficient condition for a square matrix to be diagonalizable.
In order to state a similar if and only if condition, we have to introduce two technical notations.
To do this, fix a n × n matrix A and a number δ. Then the number
  
γA (δ) = dim Eδ (A) = dim ker A − δIn

is called the geometric multiplicity of the number δ. By Proposition 5.17, we have


δ is an eigenvalue of the matrix A ⇐⇒ γA (δ) > 1 .
Similarly, the number
n o
r
µA (δ) = max r ∈ Z>0 : (λ − δ) divides χA λ
104 IVAN CHELTSOV

is called the algebraic multiplicity of the number δ. If k = µA (δ), then


 k
χA λ = λ − δ f (λ)
for some polynomial f (λ) of degree n − k such that f (δ) 6= 0. As with γA (δ), we have

δ is an eigenvalue of the matrix A ⇐⇒ µA (δ) > 1 .

Theorem 5.39. Let A be a n × n matrix, and let δ be a real number. Then


γA (δ) 6 µA (δ) .

Proof. Omitted. 
Both numbers γA (δ) and µA (δ) are not hard to compute using the tools we already have.
Example 5.40. Let
 
3 0 0
A = 0 3 0 .
0 0 5
Then χA (λ) = −(λ − 3)2 (λ − 5), so that the eigenvalues of A are 3 and 5. We have
  
   x   
E3 A = ker A − 3I3 = y  : x and y are real numbers = span e1 , e2 ,
0
 

so that γA (3) = 2 = µA (3). Similarly, we have


  
   0   
E5 A = ker A − 5I3 =  0 : z ∈ R = span e3 ,

 z 

so that γA (5) = 1 = µA (5). Observe that A is diagonalizable, because it is diagonal ,.


Example 5.41. Suppose that
 
1 1 0
A = 0 1 1 .
0 0 1
Then χA (λ) = −(λ − 1)3 , so that 1 is the only eigenvalue of A. Moreover, we have
µA (1) = 3 > 1 = γA (1).
In this case, the matrix A is not diagonalizable (see Example 5.29).
Example 5.42. Let
 
0 1
A= .
0 0
Then χA (λ) = λ2 , so that 0 is the only eigenvalue of the matrix A. We have
  
x
   
E0 A = ker A = : x ∈ R = span e1 ,
0
so that γA (0) = 1 < 2 = µA (0). In this case, the matrix A is not diagonalizable (see Example 5.8).
ACCELERATED ALGEBRA 105

Example 5.43. Let  


−3 0 0 0 0
 0 −3 0 0 0 
 
0
A= 0 8 0 0.
0 0 0 0 −1
0 0 0 1 0
Then the only (real) eigenvalues of A are −3 and 8, because
2
χA λ = − λ + 3 λ − 8 λ2 + 1 .
  

Then γA (−3) = µA (−3) = 2 and γA (8) = µA (8) = 1. But A is not diagonalizable over R. Why?
Note that i and −i are complex eigenvalues of the matrix A such that
γA (i) = µA (i) = γA (−i) = µA (−i) = 1.
One can show that A is diagonalizable over C.
Now the crucial fact is:
Lemma 5.44. Let A and B be similar n × n matrices, and let δ be a real number. Then
γA (δ) = γB (δ),


µA (δ) = µB (δ).
Proof. By Lemma 5.21, we have µA (δ) = µB (δ). To prove γA (δ) = γB (δ), we have to show that
   
dim ker A − δIn = dim ker B − δIn .
By Lemma 5.21, the number δ is an eigenvalue of A ⇐⇒ it is an eigenvalue of B. This gives
   
dim ker A − δIn 6= 0 ⇐⇒ dim ker B − δIn 6= 0.

Therefore, to complete the proof, we may assume that ker(B − δIn ) 6= {0}.
Let v1 , . . . , vr be a basis of the vector subspace ker(B − δIn ). Then
 
γB (δ) = dim ker B − δIn = r.

But Bvi = δvi for every i ∈ {1, . . . , r}. On the other hand, since A and B are similar, we have
A = M BM −1
for some invertible n × n matrix M , so that AM = M B. Then
AM vi = M Bvi = δM vi
for every i ∈ {1, . . . , r}. Thus, we see that M v1 , . . . , M vr are eigenvectors of the matrix A, so that
  
span M v1 , . . . , M vr ⊂ ker A − δIn .
Now, arguing as in the solution to Exercise 2.41, we see that the vectors M v1 , . . . , M vr are linearly
independent. This shows that
 
γA (δ) = dim ker A − δIn > r = γB (δ).

Similarly, we see that γB (δ) > γA (δ). 


Using this lemma, we obtain the following criterion for diagonalization:
106 IVAN CHELTSOV

Theorem 5.45. Let A be a n × n matrix, and let λ1 , . . . , λm be all its distinct (real) eigenvalues.
Then A is diagonalizable (over real numbers) if and only if
γA (λ1 ) = µA (λ1 )




γA (λ2 ) = µA (λ2 )

..


 .

γA (λm ) = µA (λm )

and γA (λ1 ) + γA (λ2 ) + · · · + γA (λm ) = n.


Proof. If A is diagonalizable, then, by Lemma 5.44, it is enough to check the required condition on
algebraic and geometric multiplicities for a diagonal matrix, which is easy and left to the reader.
Thus, to complete the proof, we may assume that
γA (λ1 ) = µA (λ1 )




γA (λ2 ) = µA (λ2 )

..


 .

γA (λm ) = µA (λm )

and also γA (λ1 ) + γA (λ2 ) + · · · + γA (λm ) = n. We have to show that the matrix A is diagonalizable.
The proof of this assertion is based on Corollary 5.33 and is very similar to the proof of Lemma 5.33.
Hence, for transparency, we will prove it only in for n = 2 and n = 3.
First we suppose that n = 2. If m = 2, then the matrix A is diagonalizable by Corollary 5.33.
Thus, we may assume that m = 1. Then µA (λ1 ) = 2, so that
χA (λ) = (λ − λ1 )2 .
Since γA (λ1 ) = µA (λ1 ) = 2, there are two linearly independent eigenvectors with eigenvalue λ1 .
Then A is diagonalizable by Corollary 5.26.
Now we suppose that n = 3. If m = 3, then A is diagonalizable by Corollary 5.33. If m = 1,
then
µA (λ1 ) = γA (λ1 ) = n = 3,
which implies that there are 3 linearly independent eigenvectors of the matrix A with eigenvalue λ1 ,
so that A is diagonalizable by Corollary 5.26. Thus, we may assume that m = 2. Then
γA (λ1 ) + γA (λ2 ) = 3.
Since γA (λ1 ) > 1 and γA (λ2 ) > 2, either γA (λ1 ) = 1 and γA (λ2 ) = 2, or γA (λ1 ) = 2 and γA (λ2 ) = 1.
Without loss of generality, we may assume that γA (λ1 ) = 1 and γA (λ2 ) = 2.
Let v1 be an eigenvector of A with eigenvalue λ1 , let v2 and v3 be linearly independent eigen-
vectors of A with eigenvalue λ2 . We claim that the vectors v1 , v2 and v3 are linearly independent.
Indeed, let µ1 , µ2 and µ3 be scalars such that
µ1 v1 + µ2 v2 + µ3 v3 = 0.
Let us show that µ1 = µ2 = µ3 = 0. Multiplying the previous vector equality by A, we get
µ1 λ1 v1 + µ2 λ2 v2 + µ3 λ2 v3 = 0.
One the other hand, multiplying µ1 v1 + µ2 v2 + µ3 v3 = 0 by λ1 , we get
λ1 µ1 v1 + λ1 µ2 v2 + λ1 µ3 v3 = 0.
Then, subtracting the last two equalities from each other, we get
(λ1 − λ2 )µ2 v2 + (λ1 − λ2 )µ3 v3 = 0.
ACCELERATED ALGEBRA 107

Since λ1 6= λ2 by assumption, we conclude that µ2 = 0 and µ3 = 0, because v2 and v3 are linearly


independent. Thus, we have

µ1 v1 = µ1 v1 + µ2 v2 + µ3 v3 = 0,

so that µ1 = 0, because v1 6= 0. Thus, we see that v1 , v2 and v3 are linearly independent.


Since v1 , v2 and v3 are linearly independent, A is diagonalizable by Corollary 5.26. 

Exercise 5.46. Let


 
3 −4 −4
A = −4 3 4 .
6 −6 −7
Do the following:
(i) Find the eigenvalues of A. For each eigenvalue, find a basis of the corresponding eigenspace.
What are the algebraic and geometric multiplicities of the eigenvalues?
(ii) Find an invertible matrix M and a diagonal matrix B such that M −1 AM = B.
(iii) Calculate A123456 .

Let us conclude this section by translating its main results into the language of linear operators.
Namely, let T : Rn → Rn be a linear operator. By Corollary 4.50 and Lemma 5.21, we may define
the characteristic polynomial χT (λ) of the operator T as

χT (λ) = det A − λIn

where A is the matrix of T with respect to any basis of Rn . Then the eigenvalues of T are real
roots of the polynomial χT (λ).

Example 5.47. Let


   
3 7
v1 = and v2 = ,
1 4
and let T : R2 → R2 be the unique linear operator such that
( 
T v1 = 2v1 ,

T v2 = −v2 .

Then the matrix of T with respect to the basis v1 , v2 is


 
2 0
B= ,
0 −1

so the characteristic polynomial χT (λ) of the linear operator T is given by

χT (λ) = χB (λ) = (λ − 2)(λ + 1).

We calculated in Example 4.46 that the standard matrix of T is


 
1 31 −63
A= .
5 12 −26
Since A and B are similar, we have χA (λ) = χB (λ).
108 IVAN CHELTSOV

Recall from Section 5.a that T is diagonalizable if there exists a basis v1 , . . . , vn of Rn such that
 
 T v1 = λ1 v1

 
T v = λ2 v2

2




..
 .
 
T v = λn−1 vn−1

n−1



 

T vn = λn vn
for some scalars λ1 , . . . , λn . By Theorem 4.45, Proposition 5.25 and Corollary 5.26, the following
conditions are equivalent:
(1) the linear operator T is diagonalizable;
(2) the standard matrix of T is diagonalizable;
(3) the matrix of T with respect to some basis of Rn is diagonal;
(4) there is a basis of Rn consisting of eigenvectors of the linear operator T ;
(5) there exist n linearly independent eigenvectors of the linear operator T .
Moreover, it follows from Corollary 5.34 that T is diagonalizable if T has n distinct eigenvalues.
For δ ∈ R, the number γT (δ) = dim(Eδ (T )) is called the geometric multiplicity of the number δ,
and the number n o
r
µT (δ) = max r ∈ Z>0 : (λ − δ) divides χT λ
is called the algebraic multiplicity of the number δ. By Lemma 5.44, we have
γT (δ) = γA (δ),


µT (δ) = µA (δ),
where A is the matrix of T with respect to any basis of Rn . Moreover, it follows from Theorem 5.45
that T is diagonalizable if and only if γT (λ1 ) + γT (λ2 ) + · · · + γT (λm ) = n and
γT (λ1 ) = µT (λ1 ),




γT (λ2 ) = µT (λ2 ),

..


 .

γT (λm ) = µT (λm ),

where λ1 , . . . , λm are all distinct eigenvalues of the linear operator T .


Exercise 5.48. Let Π be the plane in R3 that passes through the points
     
0 1 −2
0 , 0 ,  3  ,
0 4 2
and let RΠ : R3 → R3 be the reflection in Π. Find the characteristic polynomial of the standard
matrix of the linear operator RΠ , and find a basis of R3 such that the matrix of RΠ with respect
to this basis is diagonal.

5.c. Complex matrices.


The shortest path between two truths in the real domain passes through the complex domain.
Jacques Hadamard
Let A be a n × n matrix. Then A may be non-diagonalizable (see Examples 5.8 5.29 and 5.30).
Lemma 5.49. Let λ be an eigenvalue of A. If γA (λ) < µA (λ), then A is not diagonalizable.
ACCELERATED ALGEBRA 109

Proof. Follows from Theorem 5.45 or its proof. 


This result gives very clear obstruction for being diagonalizable that can be effectively used.
Example 5.50. Let  
1 1 0
A = 0 1 0 .
0 0 0
Then χA (λ) = −λ(λ − 1)2 , so that the eigenvalues of the matrix A are 0 and 1. Moreover, we have
µA (1) = 2 > 1 = γA (1),
so that A is not diagonalizable by Lemma 5.49. Of course, we can show this directly as follows.
Suppose that A is diagonalizable. Then there exists an invertible 3 × 3 matrix M such that
 
λ1 0 0
M −1 AM =  0 λ2 0 
0 0 λ3
for some scalars λ1 , λ2 and λ3 . Then λ1 , λ2 and λ3 must be eigenvalues of A by Corollary 5.22.
By Proposition 5.25, the columns of the matrix M are eigenvectors of A with eigenvalues λ1 , λ2 , λ3 ,
respectively. Thus, since M is invertible, the matrix A has two linearly independent eigenvectors
with eigenvalue 1, which is a contradiction.
Example 5.51. Let  
1 1 0
A = 0 1 0 .
0 0 2
2
Then χA (λ) = −(λ − 1) (λ − 2), so that the eigenvalues of A are 1 and 2. But
µA (1) = 2 > 1 = γA (1),
so that A is not diagonalizable by Lemma 5.49.
However, there is another obstruction for A being diagonalizable, which has very different nature.
To describe it, observe that it follows from Theorem 1.46 (Fundamental Theorem of Algebra) that
χA λ = (−1)n λ − λ1 λ − λ2 · · · λ − λn
   

for some (not necessarily distinct) complex numbers λ1 , λ2 , . . . , λn . These numbers are the complex
roots of the characteristic polynomial χA (λ). Note that all or some of them can be real.
Lemma 5.52. If χA (λ) has a non-real complex root, then A is not diagonalizable over R.
Proof. Follows from Theorem 5.45 or its proof. 
This obstruction disappears if we consider complex n × n matrices instead of real n × n matrices.
To do this properly, we have to repeat many our definitions and proofs over complex numbers,
which is easy to do: one just have to change R to C everywhere. We will not do this here in details.
Instead, let us just list the most important definitions and results without proofs.
From now on and until the end of this section, we assume that
A is a n × n complex matrix
so that A is a n × n matrix with entries in C. Then
• an eigenvalue of the matrix A is a complex number λ such that
Ax = λx
n
for some non-zero vector x ∈ C ;
110 IVAN CHELTSOV

• an eigenvector of A with eigenvalue λ is non-zero vector x ∈ Cn such that Ax = λx.


Similarly, for every complex number λ, we define the λ-eigenspace of the matrix A as
n o 
Eλ (A) = x ∈ Cn : Ax = λx = ker A − λIn ,


so that Eλ (A) is linear subspaces of Cn . Then the following conditions are equivalent:
(1) λ is an eigenvalue of A;
(2) one has Eλ (A) 6= {0};
(3) the matrix A − λIn is not invertible;
(4) det(A − λI) = 0.
Exercise 5.53. Let A be a n × n complex matrix with entries Aij , and let λ be its eigenvalue.
For each i ∈ {1, . . . , n}, let

ri = Ai1 + · · · + Ai,i−1 + Ai,i+1 + · · · + Ai,n ,
so that ri is the sum of the absolute values of all the entries in the ith row except |Aii |.
(i) Prove that there exists i ∈ {1, . . . , n} such that

λ − Aii 6 ri .

This result is the Gershgorin Circle Theorem. Hints: Choose an eigenvector x with eigen-
value λ, and choose i such that

xi = max |x1 |, . . . , |xn | .

Then consider the equation Ax = λx at its ith coordinate, and use the triangle inequality.
(ii) Use the Gershgorin Circle Theorem to prove that for every eigenvalue λ of A, one has
n
X

λ 6 max Aij .
16i6n
j=1

(iii) Apply (ii) to the matrix


 
3 1 −1 −i
 0 −5 i − 2 
3 3
i ,
 1

2
i 2i 2
− 32 12 0 6
Use the Gershgorin Circle Theorem to draw a region in C where the eigenvalues of this
matrix must lie. Use this to find an upper and an lower bounds for the real and imaginary
parts of the eigenvalues of this matrix.
The characteristic polynomial of the complex n × n matrix A is the polynomial
  
χA λ = det A − λIn .

Then χA (λ) is a (complex) polynomial in λ of degree n. Moreover, by Theorem 1.46, we have


χA λ = (−1)n λ − λ1 λ − λ2 · · · λ − λn ,
   

where λ1 , λ2 , . . . , λn are (not necessarily distinct) complex eigenvalues of the matrix A.


Corollary 5.54. Let A be a n × n complex matrix. Then A has at least one eigenvalue.
ACCELERATED ALGEBRA 111

Exercise 5.55. Let A be a 2 × 2 complex matrix, let λ1 and λ2 be its two eigenvalues. Show that
χA (λ) = λ2 − tr(A)λ + det(A),
where tr(A) is the sum of the diagonal entries of A. Deduce that
χA (0) = det(A) = λ1 λ2
and tr(A) = λ1 + λ2 . Conclude (without calculations) that the matrix
 
198 301
486 261
has two distinct real eigenvalues such that one of them is positive and one of them is negative.
The formula in this exercise holds in general. Indeed, let
χA λ = (−1)n λn + an−1 λn−1 + · · · + a1 λ + a0 ,


where a0 , a1 , . . . , an−1 are some complex numbers. Then


a0 = χA (0) = det(A) = λ1 λ2 · · · λn
and
an−1 = (−1)n tr(A) = (−1)n λ1 + λ2 + · · · + λn


where tr(A) is the sum of the diagonal elements of A, known as the trace of the matrix A.
Exercise 5.56. Let A and B be n × n complex matrices.
• Show that AB and BA have the same characteristic polynomials.
• What can we tell about the eigenvalues of AB if we know the eigenvalues of A and B?
Two n × n complex matrices A and B are said to be similar if A = M BM −1 for some complex
invertible n × n matrix M . If the matrices A and B are similar, then
• they have the same characteristic polynomial and the same eigenvalues;
• one has dim(Eδ (A)) = dim(Eδ (B)) for every complex number δ.
Moreover, there exists an invertible n × n complex matrix M such that
λ1 0 0 ··· 0
 
 0 λ2 0 ··· 0
−1
. . . . .. 
M AM =   .
. .
. . . .
. .
0 0 0 λ 0
n−1
0 0 0 ··· λn
for some complex numbers λ1 , . . . , λn if and only if the columns of M are linearly independent
eigenvectors v1 , . . . , vn of the matrix A such that

 Av1 = λ1 v1 ,

Av2 = λ2 v2 ,


..


 .

Avn = λn vn .

In this case, the matrix A is said to be diagonalizable. Then the following conditions are equivalent:
(1) the matrix A is diagonalizable;
(2) there is a basis of Cn consisting of (complex) eigenvectors of the matrix A;
(3) the matrix A has n linearly independent (complex) eigenvectors.
Moreover, if A has n distinct eigenvalues, then A is diagonalizable.
112 IVAN CHELTSOV

Exercise 5.57. Which of the following matrices are diagonalizable over complex numbers?
       
4 −5 7 −1 3 −1 4 7 −5 4 2 −5
 1 −4 9 , −3 5 −1 , −4 5 0  , 6 4 −9 .
−4 0 5 3 3 1 1 9 −4 5 3 −7
For a n × n complex matrix A and an arbitrary complex number δ, we let γA (δ) = dim(Eδ (A)).
We say that γA (δ) is the geometric multiplicity of the number δ. Similarly, we let
n o
r
µA (δ) = max r ∈ Z>0 : (λ − δ) divides χA λ ,
and we say that µA (δ) is the algebraic multiplicity of the number δ. Then
γA (δ) 6 µA (δ).
Moreover, we have the following diagonalization criterion:
Theorem 5.58. Let A be a n × n complex matrix, and let λ1 , . . . , λm be all its distinct (complex)
eigenvalues. Then A is diagonalizable (over complex numbers) if and only if
γA (λ1 ) = µA (λ1 )




γA (λ2 ) = µA (λ2 )

..


 .

γA (λm ) = µA (λm )

Fix a n×n matrix A, and let T : Cn → Cn be linear operator with [T ] = A. If A is diagonalizable,


then there exists an invertible n × n complex matrix M such that
λ1 0 0 ··· 0
 
 0 λ2 0 ··· 0
−1
. . .. . .. 
M AM =   .
. .
. . .
. .
0 0 0 λ 0
n−1
0 0 0 · · · λn
for some complex numbers λ1 , . . . , λn . In this case, if v1 , . . . , vn are columns of M , then
 

 T v 1 = λ1 v1 ,

 
T v2 = λ2 v2 ,

..


 .

T v  = λ v ,

n n n

so that the diagonal matrix M −1 AM is the matrix of the linear operator T in the basis v1 , . . . , vn .
This illustrates Theorem 4.45, which is also valid for complex linear operators.
If A is not diagonalizable, we can still find an invertible matrix M such that M −1 AM looks
rather simple. For example, let
 
5 4 2 1
0 1 −1 −1
A= −1 −1 3
.
0
1 1 −1 2
Then χA (λ) = (λ − 1)(λ − 2)(λ − 4)2 , so that the eigenvalues of A are 1, 2 and 4. One has
     
dim E1 A = dim E2 A = dim E4 A = 1.
ACCELERATED ALGEBRA 113

Thus, up to scaling, we have exactly one eigenvector with eigenvalue 1, 2 or 4. Namely, let
     
−1 1 1
1 −1 0
v1 =  0  , v2 =  0  , v3 = −1
    
0 2 1

Then v1 , v2 and v3 are eigenvectors of the matrix A with eigenvalues 1, 2 and 4, respectively.
Note that A is not diagonalizable by Theorem 5.58. The reason for this is that A does not have four
linearly independent eigenvectors. But the matrix A has three linearly independent eigenvectors!
These are our eigenvectors v1 , v2 and v3 . These three vectors are indeed linearly independent.
For instance, this follows from Lemma 5.33, which is valid for complex matrices. Let
 
a
b
v4 = 
c

d

for some complex numbers a, b, c and d. When v1 , v2 , v3 , v4 are linearly independent? Let
 
−1 1 1 a
1 −1 0 b 
M = 
0 0 −1 c 
0 1 1 d

Then det(M ) = −a−2b, which shows that v1 , v2 , v3 , v4 are linearly independent ⇐⇒ a+2b 6= 0.
Thus, we suppose that a + 2b 6= 0. Then M is invertible, and v1 , v2 , v3 , v4 form a basis of C4 .
One can check that
 
1 0 0 −3b − 3c − 3d
0 2 0 −2c − 2d 
M −1 AM = 0 0 4

a+b+c 
0 0 0 4
This is the matrix of the linear operator T in the basis v1 , v2 , v3 , v4 .

Question. What is the most simplest form of the matrix M −1 AM that we can get?

The answer to this question really depends on person’s taste. In my opinion, M −1 AM is the most
simplest when

 − 3b − 3c − 3d = 0,

− 2c − 2d = 0,

a + b + c = 1.

Solving this system, we see that a = t + 1, b = 0, c = −t, d = t, where t is any complex number.
For example, we can put a = 1 and b = c = d = 0, so that
 
1
0
v4 = 
0 .

0
114 IVAN CHELTSOV

Note that we could find v4 by solving the equation (M − 4I4 )v4 = v3 . In this case, we have
 

 T v1 = v1 ,

T v2  = 2v2 ,




 T v 3 = 4v3 ,
 
T v4 = 4v4 + v3 ,

so that the matrix of the linear operator T in the basis v1 , v2 , v3 , v4


 
1 0 0 0
0 2 0 0
M −1 AM =  0 0 4 1

0 0 0 4
Then we say that this matrix is Jordan normal form of our matrix A.
Exercise 5.59. Find an invertible 3 × 3 matrix M such that
   
1 −3 4 3 0 0
M −1 4 −7 8 M = 0 −1 1  .
6 −7 7 0 0 −1
Arguing as in our 4 × 4 example, we can prove that for every 2 × 2 complex matrix A, there
exists an invertible 2 × 2 matrix M such that either M −1 AM is diagonal, or
 
−1 λ 1
M AM =
0 λ
for some λ ∈ C. These are all Jordan normal forms of 2 × 2 matrices.
Similarly, for any 3 × 3 complex matrix A, there is an invertible 3 × 3 complex matrix M such
that the matrix M −1 AM is one of the following matrices:
     
λ1 0 0 λ1 0 0 λ1 1 0
 0 λ2 0  ,  0 λ2 1  ,  0 λ1 1  ,
0 0 λ3 0 0 λ2 0 0 λ1
where λ1 , λ2 and λ3 are complex numbers. These are all Jordan normal forms of 3 × 3 matrices.
Likewise, for any 4 × 4 complex matrix A, there is an invertible 4 × 4 complex matrix M such
that the matrix M −1 AM is one of the following matrices:
         
λ1 0 0 0 λ1 0 0 0 λ1 0 0 0 λ1 1 0 0 λ1 1 0 0
 0 λ2 0 0   0 λ2 0 0   0 λ2 1 0   0 λ1 0 0   0 λ1 1 0
 0 0 λ3 0  ,  0 0 λ3 1  ,  0 0 λ2 1  ,  0 0 λ2 1  ,  0 0 λ1
         ,
1
0 0 0 λ4 0 0 0 λ3 0 0 0 λ2 0 0 0 λ2 0 0 0 λ1
where λ1 , λ2 , λ3 and λ4 are complex numbers. These are Jordan normal forms of 4 × 4 matrices.
In general, we can find an invertible 4 × 4 complex matrix M such that the matrix M −1 AM is
composed of so-called Jordan blocks, which are square matrices that look like
λ 1 0 0 ··· 0
 
0 λ 1 0 · · · 0
0 0 λ 1 · · · 0
 
. 
 .. ,
 
0 · · · 0 λ 1 0
 
0 · · · 0 0 λ 1
0 ··· 0 0 0 λ
ACCELERATED ALGEBRA 115

where λ ∈ C. Here, we have λ on the diagonal and 1 in each entry above the diagonal. The proof
of this result is beyond the scope of these notes.
Let us conclude this section by one result about complex square matrices that will be used later.
To state it, observe that for a n × n complex matrix A, its complex conjugate A is a n × n matrix
whose (i, j)-entry is the complex conjugate of the (i, j)-entry of the matrix A. Similarly, we can
define complex conjugates of any matrices and vectors. Complex conjugation preserves addition
and multiplication of matrices. For instance we have

A + B = A + B and AB = A B
for every complex n × n matrices A and B. Now we are ready to state our result:
T
Lemma 5.60. Let A be a n × n complex matrix such that A = A . Then its eigenvalues are real.
Proof. Let λ ∈ C be an eigenvalue of A. Choose an eigenvector x ∈ Cn with eigenvalue λ. Then
x1
 
 x2 
x=  ... 

xn
for some complex numbers x1 , . . . , xn . Then xT x is real and positive, because
x1
 
n n
T
  x2  X X
x x = x1 x2 · · · xn  ..  =   xi xi = |xi |2 > 0,
. i=1 i=1
xn
since x 6= 0. The number xT x is the length of the complex vector x. On the other hand, we have
T T  T T
λ xT x = λx x = Ax x = Ax x = xT A x = xT Ax = xT λx = λxT x,

which gives λ = λ, so that λ is real. 


Corollary 5.61. Every real symmetric matrix has at least one real eigenvalue.
T
If A is a matrix such that A = A, it is called Hermitian. Hermitian matrices are important.

5.d. Orthogonal matrices.


It’s always good to take an orthogonal view of something. It develops ideas.
Ken Thompson
Let T : Rn → Rn be a linear operator. Usually, the operator T does not preserve lengths and
angles. For instance, if n = 2 and T is given by
    
x1 1 2 x1
7→ ,
x2 0 1 x2
then e2 has length 1, but  
 2
T e2 =
1

has length 5. So that T changes lengths. It also changes angles. For instance, the vectors e1 and
e2 are orthogonal, but T (e1 ) and T (e2 ) are not.
116 IVAN CHELTSOV

Definition 5.62. We say that T is an orthogonal transformation if


 
T x ·T y =x·y
for every vectors x and y in Rn .
If the linear operator T is an orthogonal transformation, then
p √
T (x) = T (x) · T (x) = x · x = x

for every vector x ∈ Rn , so that T preserves lengths, which also implies that it preserves angles
between vectors by Definition 1.19. In fact, the transformation T is orthogonal if and only if

T (x) = x

for every vector x ∈ Rn . This immediately follows from


Lemma 5.63. Let x and y be vectors in Rn . Then
1 
x·y = kx + yk2 − kx − yk2 .
4
Proof. We have
kx + yk2 − kx − yk2 = x + y · x + y − x − y · x − y =
   
 
= x · x + 2x · y + y · y − x · x − 2x · y + y · y = 4x · y
as required. 
To decide whether T is an orthogonal transformations or not, we can use the following criterion.
Theorem 5.64. Let A = [T ]. Then the following conditions are equivalent
(i) the linear operator T is orthogonal;
(ii) the columns of the matrix A are orthonormal;
(iii) the rows of the matrix A are orthonormal;
(iv) the equality AT A = In holds;
(v) the equality AAT = In holds.
Proof. If AT A = In , then
AAT = (AT A)T = InT = In ,
so that (iv) implies (v). Similarly, we see that (v) implies (iv).
Let us show that (iv) implies (i). To do this, we suppose that AT A = In . Let x and y be any
vectors in Rn . Then, using Lemma 1.37, we obtain
T
T x · T y = Ax · Ay = Ax Ay = xT AT Ay = xT y = x · y,
    

so that T is orthogonal. This shows that (iv) implies (i).


Let us show that (i) implies (ii). To do this, we suppose that T is orthogonal. Then
   
T x · T y = Ax · Ay = x · y
for every vectors x and y in Rn . In particular, we have
1 if i = j

 
Aei · Aej = ei · ej =
0 if i 6= j
for every i and every j in {1, . . . , n}. But Aei is the ith column of the matrix A by Lemma 1.35.
Thus, the columns of A are orthonormal. This shows that (i) implies (ii).
ACCELERATED ALGEBRA 117

Now let us show that (ii) ⇐⇒ (iv). To do this, let v1 , . . . , vn be the columns of the matrix A.
Let us use the following convention: the (p, q)-entry of a matrix M is written as Mpq . Then
n
X n
X
T T
 
A A ik = A ij Ajk = Aji Ajk = vi · vk
j=1 j=1

for every i and k in {1, . . . , n}. Thus, the equality AT A = In holds if and only if
1 if i = j

vi · v k =
0 if i 6= j
for every i and k in {1, . . . , n}. This show that (ii) ⇐⇒ (iv).
Similarly, we see that (iii) ⇐⇒ (v). This completes the proof of the theorem. 
Let A be a real n × n matrix. Inspired by Theorem 5.64, we give the following
Definition 5.65. We say that A is orthogonal if AT A = In or AAT = In .
By Theorems 4.22 and 5.64, we have
A is orthogonal ⇐⇒ TA is orthogonal .
Example 5.66. Let Rθ : R2 → R2 be the rotation by an angle θ anticlockwise about the origin,
where θ is some real number such that 0 6 θ < 2π. Then
 
  cos θ − sin θ
Rθ = .
sin θ cos θ
This matrix is orthogonal, since
    
T cos θ sin θ cos θ − sin θ 1 0
[Rθ ] [Rθ ] = = .
− sin θ cos θ sin θ cos θ 0 1
Thus, we see that Rθ is orthogonal.
Exercise 5.67. Which of the following matrices are orthogonal?
 

    1 1 1 1
2 2 −1 2 2 −1
!
3 1
   
2 0 1 −1 − 1 1 1 1 1 −1 −1
, , 12 √32 ,  2 −1 2  , −1 2 2 ,  .
0 2 −1 1 3 3 2 −1 1 −1 1 
2 2 −1 2 2 2 −1 2
−1 1 1 −1
It easily follows from Lemma 5.63 that a linear transformation T : Rn → R2 is orthogonal if and
only if kT (x)k = kxk for every vector x ∈ Rn . By Theorem 5.64, the same holds for matrices.
Corollary 5.68. Let A be a n × n matrix. Then A is orthogonal if and only if
kAxk = kxk
n
for every x ∈ R .
Proof. For consistency, let us prove this in details If A is orthogonal, then Theorem 5.64 gives
q   √
Ax = Ax · Ax = x · x = x
for every x ∈ Rn .
Conversely, suppose kAxk = kxk for every x ∈ Rn . Let us show that
 
Ax · Ay = x · y
for every x and y in Rn . Then Theorem 5.64 would imply that A is orthogonal.
118 IVAN CHELTSOV

Let x and y be some vectors in Rn . Then, using Lemma 5.63, we get


 1 
kAx + Ayk2 − kAx − Ayk2 =

Ax · Ay =
4
1  1 
= kA(x + y)k2 − kA(x − y)k2 = kx + yk2 − kx − yk2 = x · y
4 4
as required. 
If A is an orthogonal matrix, then AT is also orthogonal by Theorem 5.64.
Lemma 5.69. If A is an orthogonal matrix, then either det(A) = 1 or det(A) = −1.
Proof. If A is orthogonal, then
 
2 T T
   
det A = det A det A = det A A = det In = 1,

so that det(A) = ±1. 


Corollary 5.70. If A is an orthogonal matrix, then A is invertible.
If λ ∈ R, then it follows from Lemma 5.69 that λIn is orthogonal ⇐⇒ λ = ±1.
Lemma 5.71. Let A and B be orthogonal n × n matrices. Then AB is orthogonal.
Proof. We have
T
AB = B T AT AB = B T In B = B T B = In ,

AB
so that the matrix AB is orthogonal. 
Exercise 5.72. Find all 2 × 2 and 3 × 3 orthogonal matrices with integer entries.
The set consisting of all n × n real orthogonal matrices is usually denoted by On (R) or O(n, R).
The set of all n × n orthogonal matrices with determinant 1 is denoted by SOn (R) or SO(n, R).
Lemma 5.73. Let A be a matrix in SOn (R). Then
 
cos θ − sin θ
A=
sin θ cos θ
for some θ ∈ R such that 0 6 θ < 2π.
Proof. We have  
a b
A=
c d
for some real numbers a, b, c, d. Since AAT = I2 , we have
 2 2
a + c = 1,

b2 + d2 = 1,

ab + cd = 0.

Thus, there are some real numbers u and v such that


a = cos(u),



c = sin(u),


 b = cos(v),


d = sin(v).
ACCELERATED ALGEBRA 119

Moreover, we may assume that 0 6 u < 2π and 0 6 v < 2π. On the other hand, we have
ad − bc = 1,
because det(A) = 1. Now, using ad − bc = 1 and ab + cd = 0, we get
cos(u) sin(v) − sin(u) cos(v) = 1,


cos(u) cos(v) + sin(u) sin(v) = 0.


This gives sin(v − u) = 1 and cos(u − v) = 0, which implies that
π
v − u = + 2πk
2
for some k ∈ Z. This gives cos(v) = − sin(u) and sin(v) = cos(u). Thus, if we set θ = u, then
 
cos θ − sin θ
A=
sin θ cos θ
as required. 
Corollary 5.74. Let A be a matrix in On (R) such that det(A) = −1. Then
 2 
b − a2 −2ab
A=
−2ab a2 − b2
for some real numbers a and b such that a2 + b2 = 1.
Proof. This follows from the proof of Lemma 5.73. Alternatively, note that the matrix
 
1 0
0 −1
is orthogonal and its determinant is −1. Thus, it follows from Lemmas 5.71 and 5.73 that
   
1 0 cos θ − sin θ
A=
0 −1 sin θ cos θ
for some θ ∈ R such that 0 < θ 6 2π. Then
    
1 0 cos θ − sin θ cos θ − sin θ
A= =
0 −1 sin θ cos θ − sin θ − cos θ
Let γ = 2θ , a = sin γ and b = cos γ. Then
cos γ − sin2 γ −2 sin γ cos γ
 2   2 
b − a2 −2ab
A= =
−2 sin γ cos γ sin2 γ − cos2 γ −2ab a2 − b2
as required. 
Therefore, we see that orthogonal operators R2 → R2 are either rotations around the origin
or reflections in lines passing the origin (see Exercises 4.25 and 4.26). Similarly, the orthogonal
operators R3 → R3 can also be completely classified as follows:
• rotations by about lines through the origin;
• reflections in planes passing through the origin;
• the reflection in the origin given by x 7→ −x.
Exercise 5.75. Let V be a vector subspace in Rn , let PV : Rn → Rn be the orthogonal projection
onto V, and let RV : Rn → Rn be the linear operator given by
  
x 7→ x + 2 PV x − x .
Show that RV is an orthogonal transformation.
120 IVAN CHELTSOV

5.e. Symmetric matrices.


Symmetry is what we see at a glance; based on the fact that there is no reason for any difference
Blaise Pascal
Let T : Rn → Rn be a linear operator. Recall that T is said to be diagonalizable if and only if
its matrix with respect to some basis of Rn is diagonal. Now we consider
Definition 5.76. The linear operator T is said to be orthogonally diagonalizable if its matrix is
diagonal with respect to some orthonormal basis of Rn .
The main result of this section is
Theorem 5.77. Let T : Rn → Rn be a linear operator. The following conditions are equivalent:
• the operator T is orthogonally diagonalizable;
• the standard matrix [T ] is symmetric.
Exercise 5.78. Someone tells you they have got a linear operator T : R3 : R3 whose matrix with
respect to a certain orthonormal basis is diagonal, and whose standard matrix is
 
1 −3 7
2 1 −2 .
7 3 1
Why should you be suspicious?
Exercise 5.79. Let V be a linear subspace of Rn , and let PV : Rn → Rn be orthogonal projection
onto V . Explain why the standard matrix of PV is symmetric.
The proof of Theorem 5.77 will take almost half of the rest of this section (see Corollary 5.86).
To prove it, we will mostly work with matrices rather than operators. Fix a n × n matrix A.
Definition 5.80. The matrix A is said to be orthogonally diagonalizable if there exists an orthog-
onal n × n matrix P such that the matrix
P −1 AP = P T AP
is diagonal.
We have the following criterion for orthogonal diagonalizability:
Proposition 5.81. The following conditions are equivalent:
(1) the matrix A is orthogonally diagonalizable;
(2) there is an orthonormal basis of Rn consisting of eigenvectors of M ;
(3) the matrix A has n orthonormal eigenvectors.
Proof. The proof is almost identical to the proof of Corollary 5.26, inserting the word “orthogonal”
or “orthonormal” in appropriate places. 
For a linear operator T : Rn → Rn , it follows from Theorems 4.45 and 5.26 that
T is orthogonally diagonalizable ⇐⇒ [T ] is orthogonally diagonalizable .
Thus, translating Proposition 5.81 into the language of linear operators, we see that the following
conditions are equivalent:
(1) an operator T : Rn → Rn is orthogonally diagonalizable;
(2) there is an orthonormal basis of Rn consisting of eigenvectors of T ;
(3) there exist n orthonormal eigenvectors of T ;
ACCELERATED ALGEBRA 121

Therefore, it makes little difference whether we work with linear operators or square matrices.
From now on, we stick with matrices.
Example 5.82. Let  
5 −1
A= .
−1 5
Then χA (λ) = (λ − 4)(λ − 6). Moreover, the vectors
   
1 −1
and
1 1
are eigenvectors with eigenvalues 4 and 6, respectively. They are orthogonal but not orthonormal.
We scale them to make them orthonormal, and define P to be the matrix with these orthonormal
eigenvectors as its columns:
√ ! √
2
√2
−√ 22
P = 2 2
.
2 2
Then P is orthogonal, and
√ √ ! √ √ !
2 2 2
−√ 22
  
−1 T 2√ √2
5 −1 √2
4 0
P AP = P AP = =
− 2 2 −1 5 2 2 0 6
2 2 2 2

so that A is orthogonally diagonalizable.


Exercise 5.83. Is every orthogonal matrix orthogonally diagonalizable?
In Example 5.82, the orthogonally diagonalizable matrix A was symmetric. In fact:
Lemma 5.84. Every orthogonally diagonalizable matrix is symmetric.
Proof. Let A be an orthogonally diagonalizable matrix. Then
P −1 AP = P T AP = D
for some orthogonal matrix P and diagonal matrix D. Thus, we have
A = P DP −1 = P DP T .
Since D is symmetric, we get
 T T
AT = P DP T = P T DT P T = P DP T = A,

so that AT = A, which means that A is symmetric. 


Now we are ready to prove the converse, which is much harder.
Theorem 5.85. A real square matrix is orthogonally diagonalizable if and only if it is symmetric.
Proof. The “only if”-part follows from Lemma 5.84. For “if”-part, we use induction on the size of
the matrix. It is clear for 1 × 1 matrices. Now let n > 2, let A be a n × n symmetric matrix, and
suppose inductively that (n − 1) × (n − 1) symmetric matrices are orthogonally diagonalizable.
By Corollary 5.61, the matrix A has at least one real eigenvalue. Thus, we have
Av1 = λ1 v1
for some λ1 ∈ R and some v1 ∈ Rn such that v1 6= 0, Replacing v1 by
1
v1 ,
kv1 k
122 IVAN CHELTSOV

we may assume that kv1 k = 1. Now, using Lemma 2.73, we can find some orthonormal basis
v 1 , v 2 , . . . , vn
of the vector space Rn that contains v1 .
Put Q = (v1 |v2 | · · · |vn ). Then Q has orthonormal columns, so it is orthogonal by Corollary 5.64.
Now consider the matrix
QT AQ = Q−1 AQ.
By Lemma 1.35, we have Qe1 = v1 , so that
AQe1 = λ1 Qe1 ,
or equivalently we have
Q−1 AQe1 = λ1 e1 .
By Lemma 1.35, this means that the first column of QT AQ is λ1 e1 . Moreover, the matrix QT AQ
is symmetric, since
 T T
QT AQ = QT AT QT = QT AQ,

because A is symmetric. Hence, the first row of QT AQ is λ1 eT1 . This means that
λ1 0 ··· 0
 
0 ã11 · · · ã1,n−1 
QT AQ =  .
 .. .. .. 
. . 
0 ãn−1,1 · · · ãn−1,n−1
for some real numbers ãij .
e for the (n − 1) × (n − 1) matrix (ãij ). Then A
Write A e is symmetric, since QT AQ is symmetric.
Thus, by inductive hypothesis, the matrix A e is orthogonally diagonalizable, so than

λ2 0 0 ··· 0
 
 0 λ3 0 ··· 0
T
. . . . .. 
Q
e A e =  ..
eQ .
. . . .
. .
 
0 0 0 λ 0
n−1
0 0 0 · · · λn

for some (n − 1) × (n − 1) orthogonal matrix Qe and some real numbers λ2 , . . . , λn .


Define a n × n matrix R by
1 0 ··· 0
 
 0 
R=  ... .
Qe 
0
Then
λ1 0 0 ··· 0
 
λ1 0 ··· 0
 
 0 λ2 0 ··· 0
   0  . . .. . .. 
RT QT AQ R =  ..  =  .. .
. . .
. ..
 . eT A
Q eQe  
0 0 0 λ
n−1 0
0
0 0 0 · · · λn
Put P = QR. Then P T AP is diagonal.
ACCELERATED ALGEBRA 123

We claim that P is orthogonal. Indeed, the matrix Q is orthogonal, and R is orthogonal, since
1 0 ··· 0 1 0 ··· 0 1 0 ··· 0
    
 0  0   0 
RT R =  ...  .
  ..
= .
  ..
 = In .
QeT Q
e QeT Q
e 
0 0 0
Thus, the matrix P is orthogonal by Lemma 5.71. This completes the induction. 
Corollary 5.86. The assertion of Theorem 5.77 holds.
Exercise 5.87. Let A be a n × n symmetric matrix. Show that the following are equivalent:
• all eigenvalues of A are positive real numbers;
• there exists n × n symmetric matrix B such that A = B 2 .
Let us consider one 2 × 2 example.
Example 5.88. Let  
5 −3
A= .
−3 5
Let us find an orthogonal 2 × 2 matrix P such that P T AP is diagonal. We have
χA (λ) = λ2 − 10λ + 16 = (λ − 2)(λ − 8),
so that the eigenvalues of A are 2 and 8. To find an eigenvector of A with eigenvalue 2, we solve
3x1 − 3x2 = 0,


− 3x1 + 3x2 = 0.
One its solution is the vector    
x1 1
= .
x2 1
It is an eigenvector of A with eigenvalue 2. Observe that the vector
 
−1
1
is orthogonal to the vector we just found, so that it must be an eigenvector of A with eigenvalue 8.
Normalizing both these vectors, we get eigenvectors
√ ! √ !
2
−√ 22
v1 = √22 and v2 = 2
2 2

with eigenvalues 2 and 8, respectively, that both have length 1. These eigenvectors are orthonormal,
so that we let √ √ !
2 2
 −
P = v1 |v2 = √22 √22 .
2 2
This matrix is the required orthogonal matrix, so that
√ √ ! √ √ !
2 2 2
−√ 22
  
T 2√ √2
5 −3 √2
2 0
P AP = =
− 22 22 −3 5 2 2 0 8
2 2

Exercise 5.89. Let a be any real number, and let
\[
A = \begin{pmatrix} a & 1 \\ 1 & a \end{pmatrix}.
\]
Find an orthogonal 2 × 2 matrix P such that P^T A P is diagonal.
Let us consider a 3 × 3 example.
Example 5.90. Let n be an integer, and let
\[
A = \begin{pmatrix} 14 & -14 & -16 \\ -14 & 23 & -2 \\ -16 & -2 & 8 \end{pmatrix}.
\]
What is the formula for A^n? Observe that A is symmetric. Thus, it follows from Theorem 5.85 that there exists an orthogonal 3 × 3 matrix P such that
\[
A = P \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & \lambda_3 \end{pmatrix} P^T
\]
for some real numbers λ1, λ2, λ3. Hence, if we know P and λ1, λ2, λ3, we can easily compute
\[
A^n = P \begin{pmatrix} \lambda_1^n & 0 & 0 \\ 0 & \lambda_2^n & 0 \\ 0 & 0 & \lambda_3^n \end{pmatrix} P^T.
\]
To find P and λ1, λ2, λ3, we compute the characteristic polynomial
\[
\chi_A(\lambda) = \det\big(A - \lambda I_3\big) = -(\lambda + 9)(\lambda - 36)(\lambda - 18),
\]
so that the eigenvalues of A are −9, 36 and 18. These are our λ1, λ2, λ3. Hence, we let
\[
\lambda_1 = -9, \qquad \lambda_2 = 36, \qquad \lambda_3 = 18.
\]
Then we need to find some non-zero x ∈ R3 such that (A − λ1 I3)x = 0. This gives
\[
\begin{cases}
23x_1 - 14x_2 - 16x_3 = 0,\\
-14x_1 + 32x_2 - 2x_3 = 0,\\
-16x_1 - 2x_2 + 17x_3 = 0.
\end{cases}
\]
Solving this system, we find one eigenvector
\[
\begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}
\]
with eigenvalue −9. Normalizing so that it has length 1, we get the eigenvector
\[
v_1 = \begin{pmatrix} \frac{2}{3} \\ \frac{1}{3} \\ \frac{2}{3} \end{pmatrix}.
\]
Similarly, we find eigenvectors
\[
v_2 = \begin{pmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \end{pmatrix} \quad\text{and}\quad v_3 = \begin{pmatrix} -\frac{1}{3} \\ -\frac{2}{3} \\ \frac{2}{3} \end{pmatrix}
\]
with eigenvalues 36 and 18, respectively, that both have length 1. The three eigenvectors v1, v2 and v3 are orthonormal, so that we let
\[
P = \big(v_1\,|\,v_2\,|\,v_3\big) = \begin{pmatrix} \frac{2}{3} & -\frac{2}{3} & -\frac{1}{3} \\ \frac{1}{3} & \frac{2}{3} & -\frac{2}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \end{pmatrix}.
\]
Then P is the required orthogonal matrix, so that
\[
A^n = \frac{1}{9}\begin{pmatrix}
4\cdot(-9)^n + 4\cdot 36^n + 18^n & 2\cdot(-9)^n - 4\cdot 36^n + 2\cdot 18^n & 4\cdot(-9)^n - 2\cdot 36^n - 2\cdot 18^n \\
2\cdot(-9)^n - 4\cdot 36^n + 2\cdot 18^n & (-9)^n + 4\cdot 36^n + 4\cdot 18^n & 2\cdot(-9)^n + 2\cdot 36^n - 4\cdot 18^n \\
4\cdot(-9)^n - 2\cdot 36^n - 2\cdot 18^n & 2\cdot(-9)^n + 2\cdot 36^n - 4\cdot 18^n & 4\cdot(-9)^n + 36^n + 4\cdot 18^n
\end{pmatrix}.
\]
Observe that once we have found v1 and v2, we can immediately get v3 as the cross product v1 × v2.
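For readers who want to double-check the formula for A^n, here is a short sketch (assuming NumPy; the helper name A_power is ours, not from the notes) comparing the eigendecomposition formula with direct matrix powers.

```python
import numpy as np

# Check the formula A^n = P diag(lambda_1^n, lambda_2^n, lambda_3^n) P^T from Example 5.90.
A = np.array([[ 14.0, -14.0, -16.0],
              [-14.0,  23.0,  -2.0],
              [-16.0,  -2.0,   8.0]])
lam = np.array([-9.0, 36.0, 18.0])
P = np.array([[ 2.0, -2.0, -1.0],
              [ 1.0,  2.0, -2.0],
              [ 2.0,  1.0,  2.0]]) / 3.0        # columns are v1, v2, v3

def A_power(n):
    """Compute A^n through the orthogonal diagonalization."""
    return P @ np.diag(lam ** n) @ P.T

for n in range(5):
    assert np.allclose(A_power(n), np.linalg.matrix_power(A, n))
print("the eigendecomposition formula agrees with direct powers for n = 0, ..., 4")
```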
Exercise 5.91. Let A be one of the following 3 × 3 matrices:
     
5 −1 −1 11 2 −8 0 0 1
−1 5 −1 ,  2 2 10  , 0 1 0 .
−1 −1 5 −8 10 5 1 0 0
Find a formula for An , where n is any integer.
Let us show how to apply orthogonal diagonalization of 2 × 2 real symmetric matrices to classify
conic curves in R2 . Let us start with one example. Let C be a curve in R2 that is given by
5x^2 − 6xy + 5y^2 − 4 = 0.
What does it look like? If we plot it, we see that it looks like an ellipse. Let us show that it really is an ellipse. To start with, let us rewrite the defining equation of C as
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} - 4 = 0.
\]
Then we let ψ : R2 → R2 be the rotation by an angle θ anticlockwise about 0, where θ is some real
number. We know from Example 4.20 that ψ is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix},
\]
so that ψ(v) = M v for every v ∈ R2, where
\[
M = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
Let φ : R2 → R2 be the rotation by an angle θ clockwise about 0. Then φ is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x\cos\theta + y\sin\theta \\ -x\sin\theta + y\cos\theta \end{pmatrix} = M^T\begin{pmatrix} x \\ y \end{pmatrix}.
\]
This means that φ is the inverse of ψ, so that we have
\[
\psi\circ\varphi = \mathrm{Id}_{\mathbb{R}^2} \qquad\text{and}\qquad \varphi\circ\psi = \mathrm{Id}_{\mathbb{R}^2}.
\]
The rotation ψ maps the curve C to a curve ψ(C). To find its equation, observe that
\[
\begin{pmatrix} x \\ y \end{pmatrix} \in \psi(C) \iff \varphi\begin{pmatrix} x \\ y \end{pmatrix} \in C \iff \begin{pmatrix} x\cos\theta + y\sin\theta \\ -x\sin\theta + y\cos\theta \end{pmatrix} \in C.
\]
Thus, we see that ψ(C) is given by the equation
\[
5\big(x\cos\theta + y\sin\theta\big)^2 - 6\big(x\cos\theta + y\sin\theta\big)\big({-x}\sin\theta + y\cos\theta\big) + 5\big({-x}\sin\theta + y\cos\theta\big)^2 - 4 = 0.
\]
In the matrix form, we see that ψ(C) is given by the equation
\[
\begin{pmatrix} x & y \end{pmatrix} M \begin{pmatrix} 5 & -3 \\ -3 & 5 \end{pmatrix} M^T \begin{pmatrix} x \\ y \end{pmatrix} - 4 = 0.
\]
Now, if we let θ be −π/4, we get
\[
M = \begin{pmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \end{pmatrix} = P^T,
\]
where P is the matrix we found in Example 5.88. Then it follows from Example 5.88 that ψ(C) is given by
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 8 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} - 4 = 0,
\]
which can be simplified as
2x^2 + 8y^2 − 4 = 0.
This is indeed an ellipse. Thus, our original curve C can be obtained from this ellipse by rotating the plane R2 anticlockwise by the angle π/4 around the origin.
Remark 5.92. In this example, we used orthogonal transformations of R2 to simplify the defining equation of the curve C. While doing this, we never used any geometric properties of this curve. Indeed, we only performed algebraic manipulations with the polynomial 5x^2 − 6xy + 5y^2 − 4. Therefore, in some sense, we used orthogonal transformations to simplify this polynomial.
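The algebraic manipulation mentioned in Remark 5.92 can also be delegated to a computer algebra system. The sketch below is our own illustration (assuming SymPy is available): it substitutes the rotation used above into the polynomial 5x^2 − 6xy + 5y^2 − 4 and recovers 2x^2 + 8y^2 − 4.

```python
import sympy as sp

# Substitute phi(x, y) = (x cos(theta) + y sin(theta), -x sin(theta) + y cos(theta))
# with theta = -pi/4 into 5x^2 - 6xy + 5y^2 - 4 and expand.
x, y = sp.symbols('x y')
f = 5*x**2 - 6*x*y + 5*y**2 - 4

c = sp.sqrt(2) / 2
new_x = c*x - c*y        # x cos(-pi/4) + y sin(-pi/4)
new_y = c*x + c*y        # -x sin(-pi/4) + y cos(-pi/4)

g = sp.expand(f.subs({x: new_x, y: new_y}, simultaneous=True))
print(g)                 # prints 2*x**2 + 8*y**2 - 4
```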
Now, we can apply the same idea to any conic. Recall from Example 3.48 that a conic in R2 is
a plane curve that is given by the polynomial equation
(5.93)    ax^2 + bxy + cy^2 + dx + ey + f = 0,
where a, b, c, d, e and f are some real numbers such that (a, b, c) ≠ (0, 0, 0). However, sometimes,
the equation (5.93) does not define anything that looks like a nice curve. For instance, the equation
xy = 0
defines a union of two intersecting lines. Similarly, the equation
x^2 − 1 = 0
defines a union of two parallel lines. Likewise, the equation
x^2 = 0
defines a single line. Moreover, in some cases, the equation (5.93) defines something that does not look like a curve at all. For instance, the equation
x^2 + y^2 = 0
defines a single point, and x^2 + y^2 + 1 = 0 defines nothing (an empty set). These are degenerate cases.
Apart from them, the curve defined by (5.93) is of one of the following three types:
• an ellipse,
• a hyperbola,
• a parabola.
To define these types of conics, we can use Theorem 5.94 below.
Regardless of what (5.93) defines, we can simplify the equation (5.93) using rotations and
translations. A translation is a map R2 → R2 given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x+\alpha \\ y+\beta \end{pmatrix}
\]
for some fixed real numbers α and β. To be precise, we have the following result:
Theorem 5.94. Let Σ be a subset in R2 given by (5.93). Then there exists a rotation ψ : R2 → R2
and a translation ξ : R2 → R2 such that (ξ ◦ ψ)(Σ) is one of the following subsets:
• an ellipse given by
Ax^2 + By^2 − 1 = 0
for some positive real numbers A and B;
• a hyperbola given by
Ax^2 − By^2 − 1 = 0
for some positive real numbers A and B;
• a parabola given by
Ax^2 − y = 0
for some positive real number A;
• one of the following degenerate types:
– a union of two lines;
– a single line;
– a single point;
– an empty set.
Proof. To start with, we rewrite the equation (5.93) in the matrix form as
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d & e \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + f = 0.
\]
Then we let
\[
N = \begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix}.
\]
The characteristic polynomial of this 2 × 2 symmetric matrix is
\[
\chi_N(\lambda) = \lambda^2 - (a + c)\lambda + \frac{4ac - b^2}{4}.
\]
Let λ1 and λ2 be its roots. Since N is symmetric, both roots λ1 and λ2 are real by Theorem 5.85.
These are the eigenvalues of the symmetric matrix N .
Since N is diagonalizable by Theorem 5.85, at least one of the eigenvalues λ1 and λ2 is not zero,
because N is not a zero matrix. Thus, without loss of generality, we may assume that λ1 ≠ 0. Let
\[
v_1 = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
\]
be an eigenvector of the matrix N with eigenvalue λ1. Replacing v1 by the vector
\[
\frac{1}{\sqrt{\alpha^2 + \beta^2}}\begin{pmatrix} \alpha \\ \beta \end{pmatrix},
\]
we may assume that α^2 + β^2 = 1. Then α = cos(θ) and β = sin(θ) for some θ such that 0 ≤ θ < 2π.
Now we are going to use Theorem 5.85. Namely, we let
\[
v_2 = \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix}.
\]
Then v2 is an eigenvector of N with eigenvalue λ2 . By construction, the vectors v1 and v2 form
an orthonormal basis of the plane R2 .
Let φ : R2 → R2 be the linear operator that is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},
\]
and let ψ be the inverse linear operator. Then φ is the anticlockwise rotation by θ around the origin,
and ψ is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},
\]
which is the clockwise rotation by θ around the origin.
The maps φ and ψ are bijections. Therefore, we have
\[
\begin{pmatrix} x \\ y \end{pmatrix} \in \psi(\Sigma) \iff \varphi\begin{pmatrix} x \\ y \end{pmatrix} \in \Sigma.
\]
Thus, to obtain the equation of the set ψ(Σ) we should substitute the point
\[
\varphi\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \cos(\theta)x - \sin(\theta)y \\ \sin(\theta)x + \cos(\theta)y \end{pmatrix}
\]
into the equation (5.93). This is easier to do in matrix form. Namely, the set ψ(Σ) is given by
the equation
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}\begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix}\begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} d & e \end{pmatrix}\begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + f = 0,
\]
where α = cos(θ) and β = sin(θ). Now, using Theorem 5.45, we deduce that
\[
\begin{pmatrix} \alpha & \beta \\ -\beta & \alpha \end{pmatrix}\begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix}\begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix} = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix},
\]
because v1 and v2 are eigenvectors of the matrix N. Thus, we see that ψ(Σ) is given by the equation
\[
\lambda_1 x^2 + \lambda_2 y^2 + \begin{pmatrix} d & e \end{pmatrix}\begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + f = 0.
\]
Thus, we found the rotation ψ : R2 → R2 such that ψ(Σ) is given by the equation
\[
\lambda_1 x^2 + \lambda_2 y^2 + \bar{d}x + \bar{e}y + f = 0,
\]
where d̄ = αd + βe and ē = −βd + αe. This equation is simpler than the original equation (5.93).
Moreover, we have the following three possibilities:
• λ1 λ2 > 0;
• λ1 λ2 < 0;
• λ1 λ2 = 0.
Let us deal with each of them separately.
Suppose that λ1 λ2 > 0. Rewrite the equation of ψ(Σ) as
\[
\lambda_1\left(x + \frac{\bar{d}}{2\lambda_1}\right)^2 + \lambda_2\left(y + \frac{\bar{e}}{2\lambda_2}\right)^2 + \left(f - \frac{\bar{d}^2}{4\lambda_1} - \frac{\bar{e}^2}{4\lambda_2}\right) = 0.
\]
Let ξ : R2 → R2 be the translation that is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x + \frac{\bar{d}}{2\lambda_1} \\ y + \frac{\bar{e}}{2\lambda_2} \end{pmatrix},
\]
and let ζ be the inverse translation ξ^{-1}. Then ξ and ζ are bijections. Therefore, we have
\[
\begin{pmatrix} x \\ y \end{pmatrix} \in (\xi\circ\psi)(\Sigma) \iff \zeta\begin{pmatrix} x \\ y \end{pmatrix} \in \psi(\Sigma),
\]
which implies that (ξ ◦ ψ)(Σ) is given by the equation
\[
\lambda_1 x^2 + \lambda_2 y^2 + \left(f - \frac{\bar{d}^2}{4\lambda_1} - \frac{\bar{e}^2}{4\lambda_2}\right) = 0,
\]
where λ1 λ2 > 0. Thus, we have the following possibilities:
• λ1 > 0, λ2 > 0, 4f − d̄^2/λ1 − ē^2/λ2 < 0 ⇒ the subset (ξ ◦ ψ)(Σ) is an ellipse;
• λ1 < 0, λ2 < 0, 4f − d̄^2/λ1 − ē^2/λ2 > 0 ⇒ the subset (ξ ◦ ψ)(Σ) is an ellipse;
• λ1 > 0, λ2 > 0, 4f − d̄^2/λ1 − ē^2/λ2 > 0 ⇒ the subset (ξ ◦ ψ)(Σ) is an empty set;
• λ1 > 0, λ2 > 0, 4f − d̄^2/λ1 − ē^2/λ2 = 0 ⇒ the subset (ξ ◦ ψ)(Σ) is a point;
• λ1 < 0, λ2 < 0, 4f − d̄^2/λ1 − ē^2/λ2 = 0 ⇒ the subset (ξ ◦ ψ)(Σ) is a point.
Now we suppose that λ1 λ2 < 0. Rewrite the equation of the set ψ(Σ) as
\[
\lambda_1\left(x + \frac{\bar{d}}{2\lambda_1}\right)^2 + \lambda_2\left(y + \frac{\bar{e}}{2\lambda_2}\right)^2 + \left(f - \frac{\bar{d}^2}{4\lambda_1} - \frac{\bar{e}^2}{4\lambda_2}\right) = 0.
\]
As in the previous case, let ξ : R2 → R2 be the translation that is given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x + \frac{\bar{d}}{2\lambda_1} \\ y + \frac{\bar{e}}{2\lambda_2} \end{pmatrix}.
\]
Then (ξ ◦ ψ)(Σ) is given by the equation
\[
\lambda_1 x^2 + \lambda_2 y^2 + \left(f - \frac{\bar{d}^2}{4\lambda_1} - \frac{\bar{e}^2}{4\lambda_2}\right) = 0,
\]
where λ1 λ2 < 0. Then
\[
\text{the subset } (\xi\circ\psi)(\Sigma) \text{ is a hyperbola} \iff f - \frac{\bar{d}^2}{4\lambda_1} - \frac{\bar{e}^2}{4\lambda_2} \ne 0.
\]
Moreover, if 4f − d̄^2/λ1 − ē^2/λ2 = 0, then (ξ ◦ ψ)(Σ) is a union of two non-parallel lines.
Finally, we suppose that λ1 λ2 = 0. Then λ2 = 0, since we assumed that λ1 ≠ 0. Now we can rewrite the equation of ψ(Σ) as
\[
\lambda_1\left(x + \frac{\bar{d}}{2\lambda_1}\right)^2 + \bar{e}y + \left(f - \frac{\bar{d}^2}{4\lambda_1}\right) = 0.
\]
If ē = 0, let ξ : R2 → R2 be the translation (x, y) ↦ (x + d̄/(2λ1), y). Then we have the following possibilities:
• 4f − d̄^2/λ1 = 0 ⇒ (ξ ◦ ψ)(Σ) is a line;
• λ1 > 0, 4f − d̄^2/λ1 < 0 ⇒ (ξ ◦ ψ)(Σ) is a union of two parallel lines;
• λ1 < 0, 4f − d̄^2/λ1 > 0 ⇒ (ξ ◦ ψ)(Σ) is a union of two parallel lines;
• λ1 > 0, 4f − d̄^2/λ1 > 0 ⇒ (ξ ◦ ψ)(Σ) is an empty set;
• λ1 < 0, 4f − d̄^2/λ1 < 0 ⇒ (ξ ◦ ψ)(Σ) is an empty set.
Thus, we may assume that ē ≠ 0. Then we can rewrite the equation of ψ(Σ) as
\[
\lambda_1\left(x + \frac{\bar{d}}{2\lambda_1}\right)^2 + \bar{e}\left(y + \frac{f}{\bar{e}} - \frac{\bar{d}^2}{4\lambda_1\bar{e}}\right) = 0.
\]
Let ξ : R2 → R2 be the translation given by
\[
\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x + \frac{\bar{d}}{2\lambda_1} \\ y + \frac{f}{\bar{e}} - \frac{\bar{d}^2}{4\lambda_1\bar{e}} \end{pmatrix}.
\]
Then (ξ ◦ ψ)(Σ) is given by the equation λ1 x^2 + ē y = 0, which is a parabola. □
Actually, we have not yet defined ellipse, hyperbola and parabola. However, we can do this using Theorem 5.94. Namely, a subset in R2 is said to be an ellipse if it can be mapped by a composition of a rotation and a translation into the subset given by
Ax^2 + By^2 − 1 = 0
for some positive real numbers A and B. Similarly, a subset in R2 is said to be a hyperbola if it can be mapped by a composition of a rotation and a translation into the subset given by
Ax^2 − By^2 − 1 = 0
for some positive real numbers A and B. Finally, a subset in R2 is said to be a parabola if it can be mapped by a composition of a rotation and a translation into the subset given by
Ax^2 − y = 0
for some real number A ≠ 0. In these three cases, we say that the subset is a non-degenerate conic.
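When λ1 λ2 ≠ 0, the reduction carried out in the proof of Theorem 5.94 is easy to perform numerically. The sketch below is our own illustration (assuming NumPy; the function name reduce_conic is not from the notes). Note that numpy.linalg.eigh may return a reflection rather than a rotation, but this does not affect the reduced coefficients.

```python
import numpy as np

def reduce_conic(a, b, c, d, e, f):
    """Return (lambda1, lambda2, constant) so that the conic
    a x^2 + b xy + c y^2 + d x + e y + f = 0 becomes
    lambda1 x^2 + lambda2 y^2 + constant = 0 after a rotation and a translation."""
    N = np.array([[a, b / 2],
                  [b / 2, c]])
    lam, V = np.linalg.eigh(N)                 # columns of V: orthonormal eigenvectors
    if abs(lam[0] * lam[1]) < 1e-12:
        raise ValueError("lambda1 * lambda2 = 0: parabolic or degenerate direction")
    d_bar, e_bar = np.array([d, e]) @ V        # linear part in the rotated coordinates
    const = f - d_bar**2 / (4 * lam[0]) - e_bar**2 / (4 * lam[1])
    return lam[0], lam[1], const

# The curve 5x^2 - 6xy + 5y^2 - 4 = 0 from earlier reduces to 2x^2 + 8y^2 - 4 = 0:
print(reduce_conic(5, -6, 5, 0, 0, -4))        # approximately (2.0, 8.0, -4.0)
```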
Example 5.95. Let Σ be a subset of R2 that is given by the equation
5x^2 + 2xy + 10y^2 + 10x + 2y = 0.
Then we can rewrite this equation as
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 5 & 1 \\ 1 & 10 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 10 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = 0.
\]
But the eigenvalues of the matrix
\[
\begin{pmatrix} 5 & 1 \\ 1 & 10 \end{pmatrix}
\]
are (15 − √29)/2 > 0 and (15 + √29)/2 > 0. Thus, applying an appropriate composition of a rotation and
translation, we can map the subset Σ to the subset that is given by
\[
\frac{15 + \sqrt{29}}{2}\,x^2 + \frac{15 - \sqrt{29}}{2}\,y^2 + D = 0
\]
for some real number D. If D < 0, then Σ is an ellipse. Likewise, if D = 0, then Σ is a single point.
Finally, if D > 0, then Σ is an empty set. But Σ contains the point 0 and the point
\[
\begin{pmatrix} -2 \\ 0 \end{pmatrix},
\]
so that the subset Σ is an ellipse.
Example 5.96. Let Σ be a subset of R2 that is given by the equation
x^2 + 4xy − 2y^2 + x + y + 4 = 0.
Then we can rewrite this equation as
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + 4 = 0.
\]
The eigenvalues of the matrix
\[
\begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}
\]
are −3 and 2. Therefore, arguing as in the proof of Theorem 5.94, we can find an appropriate rotation ψ : R2 → R2 and a translation ξ : R2 → R2 such that (ξ ◦ ψ)(Σ) is given by
−3x^2 + 2y^2 + D = 0
for some D ∈ R. If D ≠ 0, then Σ is a hyperbola. If D = 0, then Σ is a union of two intersecting lines, which implies, in particular, that any five distinct points in Σ include three collinear points.
On the other hand, the set Σ contains the following six points:
\[
\begin{pmatrix} 0 \\ \frac{1\pm\sqrt{33}}{4} \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ \frac{5\pm\sqrt{73}}{4} \end{pmatrix}, \qquad \begin{pmatrix} -1 \\ \frac{-3\pm\sqrt{41}}{4} \end{pmatrix}.
\]
One can easily check that no three of them are collinear. This shows that Σ is a hyperbola.
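The collinearity check at the end of this example can be done mechanically. The following sketch (our own illustration, assuming SymPy is available so that the square roots are handled exactly) tests every triple of the six points.

```python
import sympy as sp
from itertools import combinations

# The six points of Example 5.96:
# (0, (1 +- sqrt(33))/4), (1, (5 +- sqrt(73))/4), (-1, (-3 +- sqrt(41))/4).
points = []
for x0, p, q in [(0, 1, 33), (1, 5, 73), (-1, -3, 41)]:
    for sign in (1, -1):
        points.append((sp.Integer(x0), sp.Rational(p, 4) + sign * sp.sqrt(q) / 4))

def collinear(P, Q, R):
    # Three points are collinear iff this 3x3 determinant vanishes.
    return sp.simplify(sp.Matrix([[P[0], P[1], 1],
                                  [Q[0], Q[1], 1],
                                  [R[0], R[1], 1]]).det()) == 0

print(any(collinear(P, Q, R) for P, Q, R in combinations(points, 3)))   # False
```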
Example 5.97. Let Σ be a subset of R2 that is given by
9x^2 + 6xy + y^2 + 4x + 2y + 5 = 0.
Then we can rewrite this equation as
\[
\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 9 & 3 \\ 3 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} 4 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + 5 = 0.
\]
Then 0 and 10 are the eigenvalues of the matrix
\[
\begin{pmatrix} 9 & 3 \\ 3 & 1 \end{pmatrix}.
\]
Arguing as in the proof of Theorem 5.94, we see that there exists a rotation ψ : R2 → R2 and
a translation ξ : R2 → R2 such that either (ξ ◦ ψ)(Σ) is given by
10x^2 + y = 0,
or (ξ ◦ ψ)(Σ) is given by 10x^2 + D = 0 for some D ∈ R. In the former case, the set Σ is a parabola. In the latter case, the set Σ is either a single line, a union of two parallel lines, or an empty set.
On the other hand, the subset Σ contains the following five points:
\[
\begin{pmatrix} 2 \\ -7 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ -15 \end{pmatrix}, \quad \begin{pmatrix} 4 \\ -11 \end{pmatrix}, \quad \begin{pmatrix} 10 \\ -35 \end{pmatrix}, \quad \begin{pmatrix} 10 \\ -27 \end{pmatrix}.
\]
In particular, it is not empty. Moreover, these five points are not collinear, so that Σ is not a line.
Furthermore, no three points among these five points are collinear, so that Σ is not a union of two
parallel lines. Therefore, the subset Σ is a parabola.
In fact, one can determine the type of the conic given by (5.93) without using any rotations or translations. To do this, rewrite the equation (5.93) in the matrix form
\[
\begin{pmatrix} x & y & 1 \end{pmatrix}\begin{pmatrix} a & \frac{b}{2} & \frac{d}{2} \\ \frac{b}{2} & c & \frac{e}{2} \\ \frac{d}{2} & \frac{e}{2} & f \end{pmatrix}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = 0.
\]
Denote the 3 × 3 matrix in this equation by M . As in the proof of Theorem 5.94, let
\[
N = \begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix}.
\]
Lemma 5.98. Let Σ be a subset in R2 given by (5.93). Then
(1) det(N) > 0 and tr(N) det(M) < 0 ⇐⇒ Σ is an ellipse;
(2) det(N) < 0 and det(M) ≠ 0 ⇐⇒ Σ is a hyperbola;
(3) det(N) = 0 and det(M) ≠ 0 ⇐⇒ Σ is a parabola.
Proof. The trace and determinant of the matrix N are invariant with respect to rotations and
translations of the plane. Similarly, the sign of the determinant of the matrix M is also invariant
with respect to rotations and translations. Thus, the result follows from Theorem 5.94. 
Corollary 5.99. If det(M ) = 0, then (5.93) does not define a non-degenerate conic.
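Lemma 5.98 translates directly into a small classifier. The sketch below is our own illustration (assuming NumPy; the function name conic_type is not from the notes), and it reproduces the two conics analysed next.

```python
import numpy as np

def conic_type(a, b, c, d, e, f):
    """Classify the conic a x^2 + b xy + c y^2 + d x + e y + f = 0 via Lemma 5.98."""
    N = np.array([[a,     b / 2],
                  [b / 2, c    ]])
    M = np.array([[a,     b / 2, d / 2],
                  [b / 2, c,     e / 2],
                  [d / 2, e / 2, f    ]])
    detN, detM, trN = np.linalg.det(N), np.linalg.det(M), np.trace(N)
    if detN > 0 and trN * detM < 0:
        return "ellipse"
    if detN < 0 and abs(detM) > 1e-9:
        return "hyperbola"
    if abs(detN) < 1e-9 and abs(detM) > 1e-9:
        return "parabola"
    return "not a non-degenerate conic"

print(conic_type(2, -3, 7, -5, 11, 8))    # not a non-degenerate conic (in fact, empty)
print(conic_type(2, -3, 7, -5, 11, -8))   # ellipse
```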
Let us consider an example. Let us determine which conic is given in R2 by
2x^2 − 3xy + 7y^2 − 5x + 11y + 8 = 0.
In this case, the matrices M and N are
\[
M = \begin{pmatrix} a & \frac{b}{2} & \frac{d}{2} \\ \frac{b}{2} & c & \frac{e}{2} \\ \frac{d}{2} & \frac{e}{2} & f \end{pmatrix} = \begin{pmatrix} 2 & -\frac{3}{2} & -\frac{5}{2} \\ -\frac{3}{2} & 7 & \frac{11}{2} \\ -\frac{5}{2} & \frac{11}{2} & 8 \end{pmatrix}
\]
and
\[
N = \begin{pmatrix} a & \frac{b}{2} \\ \frac{b}{2} & c \end{pmatrix} = \begin{pmatrix} 2 & -\frac{3}{2} \\ -\frac{3}{2} & 7 \end{pmatrix}.
\]
Then det(N) = 47/4, det(M) = 31 and tr(N) = 9, so that our equation does not define a non-degenerate conic. This is easy to see explicitly, since
\[
2x^2 - 3xy + 7y^2 - 5x + 11y + 8 = \frac{124}{47} + 2\left(x - \frac{37}{47} - \frac{3}{4}\left(y + \frac{29}{47}\right)\right)^2 + \frac{47}{8}\left(y + \frac{29}{47}\right)^2 \geqslant \frac{124}{47},
\]
so that our equation defines an empty set.
Let us determine the type of a conic in R2 that is given by
2x^2 − 3xy + 7y^2 − 5x + 11y − 8 = 0.
Now we have det(N) = 47/4, det(M) = −157 and tr(N) = 9, so that our equation defines an ellipse.
Of course, we can show this algebraically: we have
\[
2x^2 - 3xy + 7y^2 - 5x + 11y - 8 = -\frac{628}{47} + 2\left(x - \frac{37}{47} - \frac{3}{4}\left(y + \frac{29}{47}\right)\right)^2 + \frac{47}{8}\left(y + \frac{29}{47}\right)^2,
\]
so that our conic is given by
\[
2\bar{x}^2 + \frac{47}{8}\,\bar{y}^2 - \frac{628}{47} = 0
\]
in the coordinates
\[
\bar{x} = x - \frac{37}{47} - \frac{3}{4}\left(y + \frac{29}{47}\right), \qquad \bar{y} = y + \frac{29}{47}.
\]
This change of coordinates is a composition of an invertible linear transformation and a translation.
It follows from the proof of Lemma 5.98 that such transformations preserve the type of any non-
degenerate conic.
Exercise 5.100. Determine the type of the conic in R2 given by
8x^2 + xy − 5y^2 + 7x − 3y + 2 = 0.
In the book Linear Algebra: A Modern Introduction, David Poole posed the following exercise (reproduced as an image in the original notes, omitted here).
Poole assumes that a parabola is the curve in R2 that is given by the equation
y = ax^2 + bx + c
for some real numbers a, b and c. This definition of a parabola is not politically correct: it discriminates against the x-axis, since the equation
x = ay^2 + by + c
clearly defines a parabola in R2 . The goal of the next exercise is to solve Poole’s exercise using
our (politically correct) definition of parabola:
• a parabola is a subset in R2 such that there exists a composition of rotations and translations that maps this subset to the curve given by
y = ax^2,
where a is some non-zero real number.
Exercise 5.101. Let P1 = (0, 1), P2 = (−1, 4) and P3 = (2, 1). Do the following:
(a) Find all parabolas in R2 that pass through the points P1 , P2 , P3 and (19, 20).
(b) Find all parabolas in R2 that pass through the points P1 , P2 , P3 and (9, 10).
(c) Try to describe all parabolas in R2 that pass through P1, P2, P3.
(d) Try to describe the subset S ⊂ R2 such that P ∈ S if and only if there exists a parabola
that contains P1 , P2 , P3 and P .
Let us finish this section with one useful remark. In the proof of Theorem 5.94, we implicitly classified polynomials
ax^2 + bxy + cy^2 + dx + ey + f
with (a, b, c) ≠ (0, 0, 0) up to rotations and translations. Moreover, going through this proof, we see that each such polynomial can be simplified using rotations and translations to one of the following forms:
(1) āx^2 + c̄y^2 + f̄ for some real numbers ā, c̄ and f̄ such that āc̄ > 0 and (ā + c̄)f̄ < 0,
(2) āx^2 + c̄y^2 + f̄ for some real numbers ā, c̄ and f̄ such that āc̄ < 0 and f̄ ≠ 0,
(3) āx^2 + ēy for some real numbers ā and ē such that ā ≠ 0 and ē ≠ 0,
(4) āx^2 + c̄y^2 for some real numbers ā and c̄ such that āc̄ > 0,
(5) āx^2 + c̄y^2 for some real numbers ā and c̄ such that āc̄ < 0,
(6) āx^2 + f̄ for some real numbers ā and f̄ such that āf̄ < 0,
(7) āx^2 for some real number ā,
(8) āx^2 + c̄y^2 + f̄ for some real numbers ā, c̄ and f̄ such that āc̄ > 0 and (ā + c̄)f̄ > 0,
(9) āx^2 + f̄ for some real numbers ā and f̄ such that āf̄ > 0.
Geometrically, these cases correspond to the following subsets: an ellipse, a hyperbola, a parabola, a single point, two intersecting lines, two parallel lines, a single line (taken with multiplicity two), an empty set, and another empty set, respectively.