Lecture Note
IVAN CHELTSOV
Contents
Preface
1. Background
1.a. Handy notations
1.b. Multidimensional vectors
1.c. Distances, dot products and angles
1.d. Matrices and operations with them
1.e. Complex numbers and finite fields
2. Basic linear algebra
2.a. Vector subspaces
2.b. Linear span
2.c. Linear independence
2.d. Bases and dimension
2.e. Orthogonality
3. Matrices and linear equations
3.a. Rank–nullity theorem
3.b. Determinants
3.c. Systems of linear equations
3.d. Row echelon form
3.e. Computational examples
4. Linear transformations
4.a. What is a linear transformation?
4.b. Matrices versus linear transformations
4.c. Composing linear transformations
4.d. Rank–nullity theorem revisited
4.e. Linear operators
5. Eigenvalues and eigenvectors
5.a. Similar matrices
5.b. Diagonalizable matrices
5.c. Complex matrices
5.d. Orthogonal matrices
5.e. Symmetric matrices
Preface
Cambridge gave me various kinds of teaching and supervision, but possibly the most important
influences were Jeffrey Goldstone and Peter Swinnerton-Dyer, who encouraged me to continue
to think for myself and not to take the technical details too seriously.
Miles Reid
To illustrate what this course is about, let us consider the following classical result.
Theorem (Chasles). Let C1 and C2 be cubic curves in the plane R2 such that their intersection
consists of exactly 9 points. Let C be any cubic curve in the plane R2 that contains 8 of these points.
Then the curve C contains the ninth intersection point as well.
Actually, we can prove this theorem using results that will be described and proved in this course.
Moreover, we can use these results to check this theorem in every given case. For instance, suppose
that the curve C1 is given by the equation
−5913252577x^3 + 30222000280x^2y − 21634931915xy^2 + 5556266591y^3 − 73906985473x^2 +
+ 102209537669xy − 37300172365y^2 + 1389517162x − 88423819400y + 204616284808 = 0,
and suppose that the curve C2 is given by the equation
−4844332x^3 − 8147864x^2y − 4067744xy^2 − 1866029y^3 + 32668904x^2 −
− 28226008xy + 41719157y^2 + 252639484x + 126319742y − 960898976 = 0.
Then the intersection C1 ∩ C2 consists of the eight points
(2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8)
and the ninth intersection point
(1439767504290697562/409942054104759719, 4853460637572644276/409942054104759719).
Now let C be a cubic curve in the plane R2 that passes through the first eight intersection points.
Let us show that C also passes through the ninth (ugly looking) intersection point.
The curve C is given by a polynomial equation that looks like this:
a1 x^3 + a2 x^2 y + a3 xy^2 + a4 y^3 + a5 x^2 + a6 xy + a7 y^2 + a8 x + a9 y + a10 = 0,
where a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , a9 and a10 are some real numbers. Now we can substitute
the coordinates of the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8) into
this equation. This gives us the following eight linear equations:
8a1 + 12a2 + 18a3 + 27a4 + 4a5 + 6a6 + 9a7 + 2a8 + 3a9 + a10 = 0,
−27a1 + 36a2 − 48a3 + 64a4 + 9a5 − 12a6 + 16a7 − 3a8 + 4a9 + a10 = 0,
−64a1 − 80a2 − 100a3 − 125a4 + 16a5 + 20a6 + 25a7 − 4a8 − 5a9 + a10 = 0,
−216a1 + 72a2 − 24a3 + 8a4 + 36a5 − 12a6 + 4a7 − 6a8 + 2a9 + a10 = 0,
125a1 + 75a2 + 45a3 + 27a4 + 25a5 + 15a6 + 9a7 + 5a8 + 3a9 + a10 = 0,
27a1 + 18a2 + 12a3 + 8a4 + 9a5 + 6a6 + 4a7 + 3a8 + 2a9 + a10 = 0,
−8a1 − 24a2 − 72a3 − 216a4 + 4a5 + 12a6 + 36a7 − 2a8 − 6a9 + a10 = 0,
64a1 + 128a2 + 256a3 + 512a4 + 16a5 + 32a6 + 64a7 + 4a8 + 8a9 + a10 = 0.
We have 10 unknowns and 8 equations. Our intuition says: we have many solutions. This is true!
For instance, if we add two more constraints
• a8 = 1389517162 and a9 = −88423819400,
then we get a unique solution:
a1 = −5913252577,
a2 = 30222000280,
a3 = −21634931915,
a4 = 5556266591,
a5 = −73906985473,
a6 = 102209537669,
a7 = −37300172365,
a8 = 1389517162,
a9 = −88423819400,
a10 = 204616284808.
This gives us a polynomial f1 (x, y) whose coefficients are exactly the ten numbers listed above.
It looks familiar! Indeed, our first cubic curve C1 in the plane is defined by the equation f1 (x, y) = 0.
Similarly, if we instead add the two constraints
• a8 = 252639484 and a9 = 126319742,
then we again get a unique solution:
a1 = −4844332,
a2 = −8147864,
a3 = −4067744,
a4 = −1866029,
a5 = 32668904,
a6 = −28226008,
a7 = 41719157,
a8 = 252639484,
a9 = 126319742,
a10 = −960898976.
This gives us a polynomial f2 (x, y) whose coefficients are exactly the ten numbers listed above.
Recall that our second cubic curve C2 in R2 is defined by the equation f2 (x, y) = 0.
In fact, we can extract the following 8 × 10 matrix (see Section 1.d) from the system above:
8    12   18   27   4    6   9   2   3  1
−27   36  −48   64   9  −12  16  −3   4  1
−64  −80 −100 −125  16   20  25  −4  −5  1
−216  72  −24    8  36  −12   4  −6   2  1
125   75   45   27  25   15   9   5   3  1
27    18   12    8   9    6   4   3   2  1
−8   −24  −72 −216   4   12  36  −2  −6  1
64   128  256  512  16   32  64   4   8  1
Then we can compute the rank of this huge 8 × 10 matrix. This is doable: its rank is equal to 8.
After this, using rank-nullity theorem, we conclude that every cubic curve in R2 that passes through
the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8) is given by
λf1 (x, y) + µf2 (x, y) = 0
for some numbers λ and µ. Thus, it contains the ninth intersection point of the curves C1 and C2 .
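For readers who want to reproduce this computation, here is a small Python sketch (an illustration only, not part of the notes). It checks that the 8 × 10 matrix above has rank 8 and that, assuming the displayed coefficients and coordinates are transcribed correctly, both f1 and f2 vanish at the ninth point, so that every combination λf1 + µf2 vanishes there as well.
```python
import numpy as np
from fractions import Fraction

# The 8 x 10 coefficient matrix built from the eight common points of C1 and C2.
M = np.array([
    [   8,  12,   18,   27,  4,   6,  9,  2,  3, 1],
    [ -27,  36,  -48,   64,  9, -12, 16, -3,  4, 1],
    [ -64, -80, -100, -125, 16,  20, 25, -4, -5, 1],
    [-216,  72,  -24,    8, 36, -12,  4, -6,  2, 1],
    [ 125,  75,   45,   27, 25,  15,  9,  5,  3, 1],
    [  27,  18,   12,    8,  9,   6,  4,  3,  2, 1],
    [  -8, -24,  -72, -216,  4,  12, 36, -2, -6, 1],
    [  64, 128,  256,  512, 16,  32, 64,  4,  8, 1],
])
print(np.linalg.matrix_rank(M))        # prints 8

# Coefficients a1, ..., a10 of f1 and f2, read off from the two equations above.
f1 = [-5913252577, 30222000280, -21634931915, 5556266591, -73906985473,
      102209537669, -37300172365, 1389517162, -88423819400, 204616284808]
f2 = [-4844332, -8147864, -4067744, -1866029, 32668904,
      -28226008, 41719157, 252639484, 126319742, -960898976]

# The ninth intersection point, kept as exact rational numbers.
x = Fraction(1439767504290697562, 409942054104759719)
y = Fraction(4853460637572644276, 409942054104759719)
mon = [x**3, x**2*y, x*y**2, y**3, x**2, x*y, y**2, x, y, 1]

# Both sums should print 0 exactly, so any lambda*f1 + mu*f2 also vanishes there.
print(sum(a*m for a, m in zip(f1, mon)), sum(a*m for a, m in zip(f2, mon)))
```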
The following picture illustrates Chasles’s Theorem:
Here the blue curve is C1 , the red curve is C2 , and the orange curve is another cubic curve that
passes through the points (2, 3), (−3, 4), (−4, −5), (−6, 2), (5, 3), (3, 2), (−2, −6), (4, 8).
This example shows that linear algebra allows us to verify Chasles’s Theorem in special cases.
In fact, we can use basic tools from linear algebra to prove Chasles’s Theorem in full generality.
However, to use these tools properly, we have to introduce new objects (like matrices), learn
their basic properties and features (like what is the rank of a matrix and how to compute it),
and prove some important theorems, including rank-nullity theorem, which we already used above.
This is what we are going to do in these notes.
1. Background
1.a. Handy notations.
We could, of course, use any notation we want; do not laugh at notations;
invent them, they are powerful. In fact, mathematics is, to a large extent,
invention of better notations.
Richard Feynman
In these notes, we will often use Greek letters. The table below shows some of them together
with the name of each letter and how the name is pronounced.
To specify a finite set, you can simply list its elements: {1, 3, 5}. This does not work well for
infinite sets. But instead we can use notation such as
A = {a ∈ Z : a^2 − a > 6}.
The statement x ∈ Y means that x is an element of the set Y . But X ⊆ Y means that every
element of the set X is an element of the set Y .
Given subsets A and B of a set X, we can form the following new subsets:
• the intersection A ∩ B = {x ∈ X : x ∈ A and x ∈ B};
• the union A ∪ B = {x ∈ X : x ∈ A or x ∈ B};
• the complement A \ B = {x ∈ X : x ∈ A but x 6∈ B}.
A function (or mapping or map) consists of three things:
• a set A, called the domain of the function;
• another set B, called the codomain of the function; and
• a rule f that assigns to each element a ∈ A an element f (a) ∈ B.
We write f : A → B to mean that f is a function with domain A and codomain B.
Example 1.1. Let f : R → R be the function that is given by f (a) = a2 for every real number a.
Then its domain is R and its codomain is also R.
When we have some function f : A → B, we sometimes write a ↦ b to mean that f (a) = b.
For instance, let f : Z → Q be the function given by n ↦ n^2/2. Then f (n) = n^2/2 for every n ∈ Z.
Remark 1.2. Suppose we have functions f : A → B and g : B → C. Then we can feed the output
of f into the input of g to make a new function g ◦ f : A → C. This new function g ◦ f is given by
(g ◦ f )(a) = g(f (a))
for every a ∈ A, and is called the composite of the functions g and f .
Let A and B be sets. A function f : A → B is said to be
• injective (or one-to-one) if for each b ∈ B, there is at most one a ∈ A such that f (a) = b;
• surjective (or onto) if for each b ∈ B, there is at least one a ∈ A such that f (a) = b;
• bijective if f is both injective and surjective.
For instance, the function f : {2, 3} → {4, 5, 6} defined by f (a) = a2 is injective and not surjective.
Similarly, the function g : {−2, 2} → {4} defined by g(a) = a2 is surjective and not injective.
Remark 1.3. For any set A, there is a function IdA : A → A, called the identity on A, which is
sometimes written as IdA . It is given by IdA (a) = a for every a ∈ A.
Let A and B be sets and let f : A → B be a function. If f is bijective then there is a unique
function g : B → A such that
g ◦ f = IdA ,
f ◦ g = IdB .
This function g is called the inverse of f , and it is usually denoted by f −1 . Conversely, if f has
an inverse then f is bijective.
Example 1.4. Let f : N → Z be the function given by
f (n) = n/2 if n is even, and f (n) = (1 − n)/2 if n is odd.
We can put sums inside other sums. For instance, if we have a grid of numbers
(1.5)
a11 a12 · · · a1n
a21 a22 · · · a2n
· · ·
am1 am2 · · · amn
then we can write down the sum of all numbers in this grid as
a11 + a12 + · · · + a1n + a21 + a22 + · · · + a2n + · · · + am1 + am2 + · · · + amn .
Of course, we get the same total sum if we add up the numbers column by column, so that
∑_{i=1}^{m} ∑_{j=1}^{n} aij = ∑_{j=1}^{n} ∑_{i=1}^{m} aij .
The sum x + y of two vectors x and y in Rn is the vector whose ith entry is xi + yi .
Similarly, any vector x in Rn can be multiplied by any scalar a ∈ R to get another vector ax ∈ Rn ,
whose ith entry is axi , so that ax has entries ax1 , ax2 , . . . , axn .
An important vector in Rn is the zero vector, all of whose entries are zero. We will denote it by 0.
Lemma 1.8. Let x, y, z be vectors in Rn , and let a and b be numbers in R. Then
(1) (x + y) + z = x + (y + z);
(2) x + y = y + x;
(3) x + 0 = x;
Proof. This is a series of routine checks using the definitions. Let us prove only (6): writing the
column vectors as rows to save space, we have
a(x + y) = a (x1 + y1 , x2 + y2 , . . . , xn + yn ) = (a(x1 + y1 ), a(x2 + y2 ), . . . , a(xn + yn )) =
= (ax1 + ay1 , ax2 + ay2 , . . . , axn + ayn ) = (ax1 , . . . , axn ) + (ay1 , . . . , ayn ) = ax + ay.
We write the vector (−1)x as −x. It is the vector whose ith entry is −xi , and it satisfies
−x + x = 0.
Finally, the part (4) follows from (3) by taking a = −1 and replacing x by x − y.
For two vectors x and y in Rn , their dot product x · y ∈ R is defined by
x · y = ∑_{i=1}^{n} xi yi .
Note that x · y is a scalar (not a vector). Because of this, x · y is often called the scalar product.
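As a quick illustration (a minimal NumPy sketch, not part of the notes), dot products and the norm ‖x‖ = √(x · x) can be computed as follows:
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, -1.0, 2.0])

dot = x @ y                    # x . y = 1*4 + 2*(-1) + 3*2 = 8
norm_x = np.sqrt(x @ x)        # ||x|| = sqrt(x . x)
print(dot, norm_x, np.linalg.norm(x))   # the last two values agree
```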
Lemma 1.12. Let x, y, z ∈ Rn and a ∈ R. Then
(1) x · y = y · x;
(2) x · (y + z) = x · y + x · z;
(3) x · 0 = 0;
(4) x · (ay) = a(x · y);
(5) ‖x‖ = √(x · x).
Proof. We will only prove (2). Write w = y + z. Then x · (y + z) = x · w. But
x · w = ∑_{i=1}^{n} xi wi = ∑_{i=1}^{n} xi (yi + zi ) = ∑_{i=1}^{n} xi yi + ∑_{i=1}^{n} xi zi = x · y + x · z,
as required.
Exercise 1.13. Prove Lemma 1.12(4).
You may have learned in school that whenever x and y are vectors in R2 or R3 , we have
x · y = ‖x‖ ‖y‖ cos θ,
where θ is the angle between x and y. Since |cos θ| ≤ 1 for all θ, it follows that
|x · y| ≤ ‖x‖ ‖y‖
for all x and y in R2 or R3 . When does equality hold? We have
|x · y| = ‖x‖ ‖y‖ ⇐⇒ the points 0, x, y are collinear,
where collinear means that they lie on one straight line. This fact can be generalized to all dimensions.
However, first we need to define a line in Rn . In this course, all lines are straight.
Definition 1.14. A line in Rn is a set of points in Rn that is given by
x + t(y − x)
when t runs through R, where x and y are fixed points in Rn .
The line in this definition contains both points x and y.
Lemma 1.15 (Cauchy–Schwarz inequality). For any two vectors x and y in Rn , one has
|x · y| ≤ ‖x‖ ‖y‖
with equality if and only if the points 0, x and y are collinear.
Proof. If x = 0 or y = 0 then both sides of the inequality are 0 and the points 0, x and y are
collinear. Thus, we may assume that
x ≠ 0 ≠ y.
Then the points 0, x and y are collinear if and only if x = ay for some a ∈ R. If x = ay, then
x · y = (ay) · y = a‖y‖^2 ,
which implies that
a = (x · y)/‖y‖^2 .
Hence 0, x and y are collinear if and only if
(1.16) x = ((x · y)/‖y‖^2 ) y.
We do not know whether 0, x and y are collinear. But in any case, we can consider the distance
between the left-hand side and the right-hand side of (1.16). We have
0 ≤ ‖x − ((x · y)/‖y‖^2 ) y‖^2 = (x − ((x · y)/‖y‖^2 ) y) · (x − ((x · y)/‖y‖^2 ) y) =
= ‖x‖^2 − 2 (x · y)^2/‖y‖^2 + (x · y)^2/‖y‖^2 = ‖x‖^2 − (x · y)^2/‖y‖^2
by Lemmas 1.11 and 1.12. Thus, rearranging, we get
(x · y)^2 ≤ ‖x‖^2 ‖y‖^2 .
Taking square roots on both sides gives |x · y| ≤ ‖x‖ ‖y‖ as required. Equality holds if and only if
x − ((x · y)/‖y‖^2 ) y = 0.
But we have already shown that this is equivalent to the condition that 0, x and y are collinear.
The triangle inequality says that for any triangle, the length of each side is less than or equal to
the sum of the lengths of the other two sides. Now let x and y be vectors in R2 , and think about the triangle
with vertices 0, x and x + y. Then, in this case, we have
• the distance from 0 to x is ‖x − 0‖ = ‖x‖;
• the distance from x to x + y is ‖(x + y) − x‖ = ‖y‖;
• the distance from 0 to x + y is ‖(x + y) − 0‖ = ‖x + y‖;
so that ‖x + y‖ ≤ ‖x‖ + ‖y‖. This gives a geometrically plausible argument for why the inequality holds.
The same is true in all dimensions by the following lemma.
Lemma 1.17 (Triangle inequality). For every x and y in Rn , one has ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Proof. Using the Cauchy–Schwarz inequality, we see that
‖x + y‖^2 = (x + y) · (x + y) = ‖x‖^2 + 2 x · y + ‖y‖^2 ≤ ‖x‖^2 + 2|x · y| + ‖y‖^2 ≤
≤ ‖x‖^2 + 2‖x‖ ‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2 .
Taking square roots on both sides gives the result.
The proof of this result demonstrates an important lesson:
it is often easier to work with squares of distances than with distances themselves.
This is because we can expand a squared distance ‖v‖^2 as the dot product v · v.
Exercise 1.18. Let x and y be two vectors in Rn . Prove that
‖x + y‖^2 + ‖x − y‖^2 = 2(‖x‖^2 + ‖y‖^2).
The angle θ between two non-zero vectors x and y in Rn is defined by the condition cos θ = (x · y)/(‖x‖ ‖y‖)
with 0 ≤ θ ≤ π. Using Lemma 1.15 and basic properties of cosine, we see that this definition makes sense.
Example 1.20. Let x = (√2, √2, √12) and y = (−√2, −√2, √12) in R3 . Then x · y = 8 and ‖x‖ = ‖y‖ = 4, so that
cos θ = 8/(4 · 4) = 1/2,
where θ is the angle between x and y. Then θ = π/3.
Exercise 1.21. Without doing any calculation, prove that the angle between the vectors
(4, 1, 120, 8, 0) and (6, 66, 2, 30, 5)
in R5 is less than π/2.
Two vectors x and y in Rn are said to be orthogonal if x · y = 0. This happens exactly when
• either x = 0;
• or y = 0;
• or the angle between x and y is π/2.
Note that orthogonal is just a sexy word for perpendicular.
Lemma 1.22 (Pythagoras). Let x and y be orthogonal vectors in Rn . Then
‖x + y‖^2 = ‖x‖^2 + ‖y‖^2 .
Proof. Using the hypothesis that x · y = 0, we have
‖x + y‖^2 = (x + y) · (x + y) = x · x + 2 x · y + y · y = ‖x‖^2 + ‖y‖^2
as required.
Now we will (very) temporarily restrict ourselves to the three-dimensional space R3 .
Definition 1.23. Let x and y be vectors in R3 . Then their cross product x × y is a vector in R3
which is defined as follows:
x × y = (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ).
In this course, three-dimensional space plays no special role. Thus, we will hardly use the cross
product at all. But you may need it for other courses.
Lemma 1.24. Let x, y, z be vectors in R3 , and let a ∈ R. Then
(1) x × y = −(y × x);
(2) x × x = 0;
(3) x × (y + z) = (x × y) + (x × z);
(4) x × (ay) = a(x × y);
(5) (x × y) · x = 0 = (x × y) · y;
(6) ‖x × y‖ = ‖x‖ ‖y‖ sin θ, where θ is the angle between x and y.
Proof. Let us just prove (5) and (6). To prove (5), observe that
(x × y) · x = (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ) · (x1 , x2 , x3 ) =
= (x2 y3 − x3 y2 )x1 + (x3 y1 − x1 y3 )x2 + (x1 y2 − x2 y1 )x3 =
= x1 x2 y3 − x1 x3 y2 + x2 x3 y1 − x1 x2 y3 + x1 x3 y2 − x2 x3 y1 = 0,
and similarly (x × y) · y = 0. To prove (6), a direct (if tedious) expansion of both sides gives
‖x × y‖^2 + (x · y)^2 = (x1^2 + x2^2 + x3^2 )(y1^2 + y2^2 + y3^2 ) = ‖x‖^2 ‖y‖^2 ,
so that ‖x × y‖^2 = ‖x‖^2 ‖y‖^2 − (x · y)^2 = ‖x‖^2 ‖y‖^2 (1 − cos^2 θ) = ‖x‖^2 ‖y‖^2 sin^2 θ.
Since 0 ≤ θ ≤ π, we have sin θ ≥ 0. Thus, taking square roots, we get (6).
Let us compute the cross product of the vectors x and y from Example 1.20. By Definition 1.23,
x × y = (2√24, −2√24, 0).
This vector is indeed orthogonal to both x and y, since its dot product with x and its dot product
with y are both zero. Moreover, we have
‖x × y‖ = ‖(2√24, −2√24, 0)‖ = √((2√24)^2 + (2√24)^2) = √(2 × 2^2 × 24) = √192.
On the other hand, using the values of ‖x‖, ‖y‖ and θ that we found in Example 1.20, we get
‖x‖ ‖y‖ sin θ = 4 · 4 · sin(π/3) = 16 · √3/2 = 8√3 = √(8^2 × 3) = √192,
which also follows from Lemma 1.24(6).
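For concreteness, here is a small NumPy sketch (an illustration only, not part of the notes) repeating this check numerically on the vectors of Example 1.20:
```python
import numpy as np

x = np.array([np.sqrt(2), np.sqrt(2), np.sqrt(12)])
y = np.array([-np.sqrt(2), -np.sqrt(2), np.sqrt(12)])

cross = np.cross(x, y)          # (x2*y3 - x3*y2, x3*y1 - x1*y3, x1*y2 - x2*y1)
print(cross)                    # approximately (9.798, -9.798, 0) = (2*sqrt(24), -2*sqrt(24), 0)
print(cross @ x, cross @ y)     # both approximately 0: x x y is orthogonal to x and y
print(np.linalg.norm(cross))    # approximately 13.856 = sqrt(192) = ||x|| ||y|| sin(pi/3)
```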
Please do not forget that the cross product is defined only in the three-dimensional space R3 .
On the other hand, the dot product is defined in Rn for any n ≥ 1.
for every i and k such that 1 ≤ i ≤ m and 1 ≤ k ≤ p. So, the (i, k)-entry of AB is
Ai1 B1k + Ai2 B2k + · · · + Ain Bnk .
It should be pointed out that two matrices can only be multiplied if the number of columns in the
first is equal to the number of rows in the second. Remember this! In particular, we can always
multiply any two n × n matrices to get another n × n matrix.
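For those who like to experiment, here is a minimal NumPy sketch (an illustration only) of the shape rule and the entry formula above:
```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # a 2 x 3 matrix
B = np.array([[1, 0],
              [0, 1],
              [2, 2]])           # a 3 x 2 matrix

print(A @ B)                     # defined: (2 x 3) times (3 x 2) gives a 2 x 2 matrix
print(B @ A)                     # also defined: (3 x 2) times (2 x 3) gives a 3 x 3 matrix
# (A @ B)[i, k] equals the sum over j of A[i, j] * B[j, k], matching the formula above.
```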
One particularly important matrix is the m × n matrix all of whose entries are 0. We call this
matrix 0. Another important matrix is the n × n identity matrix In , which has 1 in every diagonal entry
and 0 in every other entry.
whenever A, B, x and y are matrices and vectors of the appropriate sizes, and c is a scalar.
Remark 1.32. Matrix multiplication is not commutative! It is not always true that
AB = BA
for matrices A and B. For example, let A = (1 1; 0 0) and B = (0 0; 1 1), where the semicolon separates
the rows of a matrix. Then you can check that
AB = (1 1; 0 0) ≠ (0 0; 1 1) = BA.
Exercise 1.33. For every square matrix M , its trace, denoted by tr(M ), is the sum of its diagonal
elements. Let A and B be n × n matrices. Show that tr(AB) = tr(BA).
In this course, we will write e1 , e2 , . . . , en for the vectors in Rn defined by
(1.34) e1 = (1, 0, 0, . . . , 0), e2 = (0, 1, 0, . . . , 0), . . . , en = (0, 0, . . . , 0, 1),
so that ej has 1 in the jth entry and 0 everywhere else.
We will also write
A = (x1 |x2 | · · · |xn )
to mean that the columns of the matrix A are x1 , x2 , . . . , xn . For instance, if A is an m × n matrix
with (i, j)-entry written as Aij , then x1 ∈ Rm is the vector with entries A11 , A21 , . . . , Am1 .
Lemma 1.35. Let A = (x1 |x2 | · · · |xn ) be an m × n matrix. Then
(1) Ay = y1 x1 + y2 x2 + · · · + yn xn for any vector y ∈ Rn ;
(2) Aej = xj for any j ∈ {1, . . . , n}, so that Aej is the jth column of A;
(3) A(y1 |y2 | · · · |yp ) = (Ay1 |Ay2 | · · · |Ayp ) for any y1 , . . . , yp ∈ Rn .
Proof. We will prove only (1) and (2). To prove (1), observe that Ay is an m × 1 matrix, which
we consider as an m-dimensional vector. We have to check that Ay and
y1 x1 + y2 x2 + · · · + yn xn = ∑_{i=1}^{n} yi xi
i=1
have the same entries. Let i be an integer such that 1 ≤ i ≤ m. Then, using (1.28), we see that
(Ay)i = ∑_{j=1}^{n} Aij yj .
On the other hand, the ith entry of the vector y1 x1 is the number y1 Ai1 . Similarly, we see that
the ith entry of the vector yj xj is yj Aij for every j such that 1 ≤ j ≤ n, so that
(y1 x1 + y2 x2 + · · · + yn xn )i = y1 Ai1 + y2 Ai2 + · · · + yn Ain = ∑_{j=1}^{n} Aij yj .
So the ith entries of the two sides are equal, as required. This proves (1).
To prove (2), it is enough just put y = ej in (1).
If x and y are vectors in Rn , then xT is a 1 × n matrix, so we can form the matrix product xT y,
which is a 1 × 1 matrix whose single entry is x1 y1 + x2 y2 + · · · + xn yn .
But a 1 × 1 matrix is just a scalar, and the scalar here is exactly the dot product x · y.
Here are some further useful properties of transposes.
Lemma 1.38. The following assertions hold:
(1) (A + B)T = AT + B T for all m × n matrices A and B;
(2) (cA)T = cAT for every matrix A and scalar c;
(3) (AB)T = B T AT for every m × n matrix A and n × p matrix B;
(4) (AT )T = A.
Proof. We will prove only (3) using the following convention: the (i, j)-entry of a matrix M is
written as Mij . Observe that (AB)T and B T AT are p × m matrices. We have to show that their
entries are the same. Let k ∈ {1, . . . , p} and i ∈ {1, . . . , m}. Then
((AB)T )ki = (AB)ik = ∑_{j=1}^{n} Aij Bjk .
On the other hand,
(B T AT )ki = ∑_{j=1}^{n} (B T )kj (AT )ji = ∑_{j=1}^{n} Bjk Aij .
So the (k, i)-entries of the matrices (AB)T and B T AT are equal as required.
Now, we temporarily consider only square n × n matrices with real entries. The set they
form is usually denoted by Mn (R). Then
• the sum of any two matrices in Mn (R) is a matrix in Mn (R);
• the product of any two matrices in Mn (R) is a matrix in Mn (R).
Thus, we see that n × n matrices behave like real numbers, where zero matrix plays the role of 0,
and the identity matrix In plays the role of 1. However, there are two big differences:
(a) the multiplication in Mn (R) is not commutative (see Remark 1.32);
(b) we cannot always divide one matrix in Mn (R) by another non-zero matrix in Mn (R).
For example, we cannot divide I2 by the matrices from Remark 1.32. Indeed, suppose that we could
divide I2 by the matrix A = (1 1; 0 0), that is, suppose that there were a matrix C with CA = I2 .
Writing D = (1 1; −1 −1) and using AD = (0 0; 0 0), we would get
D = I2 D = (CA)D = C(AD) = C (0 0; 0 0) = (0 0; 0 0),
which is absurd. The same argument with B = (0 0; 1 1) in place of A (note that BD = (0 0; 0 0) as well)
shows that we cannot divide I2 by B either,
because every matrix in Mn (R) multiplied by the zero matrix in Mn (R) is the zero matrix in Mn (R).
Definition 1.39. An n × n matrix A is invertible if there exists an n × n matrix B such that
AB = In = BA.
It is called singular if it is not invertible.
If A is an invertible matrix in Mn (R), then there can be only one matrix B such that AB = In
and BA = In . Indeed, if C is a matrix in Mn (R) such that AC = In and CA = In , then
B = BIn = B(AC) = (BA)C = In C = C,
so that B = C. We call this matrix B the inverse of the matrix A, and write B as A−1 .
Example 1.40. The matrix (1 −2; −3 5) is invertible. Indeed, we have
(1 −2; −3 5)(−5 −2; −3 −1) = (1 0; 0 1) = (−5 −2; −3 −1)(1 −2; −3 5).
Similarly, we see that (B −1 A−1 )(AB) = In . This proves (1). Part (2) follows from In In = In .
the conjugate of a product is the product of the conjugates: z̄ · w̄ equals the conjugate of z · w
for every two complex numbers z and w. Check this!
Exercise 1.49. Let f (x) be a polynomial with real coefficients, and let z be a complex number.
Prove that f (z) = 0 ⇐⇒ f (z̄) = 0, where z̄ is the complex conjugate of z.
To describe finite fields, let us explain what the word field means. To do this, consider a set F.
We want to equip it with two operations + (called addition) and ∗ (called multiplication) such that
the set F behaves like the usual (real or complex) numbers. Let us explain what we mean by this.
First of all, the set F should have an element ⋆ in F that behaves like 0. This means that
a + ⋆ = ⋆ + a = a
for every element a in F. Rather than dragging this extra symbol around, let us denote this element simply by 0.
Similarly, the set F should have another element, which we denote by 1, that behaves like the usual 1
in the complex or real numbers. This means that
a ∗ 1 = 1 ∗ a = a
for every element a in F such that a ≠ 0, while 1 ∗ 0 should of course be 0.
Clearly, we want both operations + and ∗ to be commutative. This means that
a+b=b+a
a∗b=b∗a
for every two elements a and b in F. Similarly, we want
(a + b) + c = a + (b + c)
(a ∗ b) ∗ c = a ∗ (b ∗ c)
for every three elements a, b and c of the set F. Moreover, we want
a ∗ (b + c) = a ∗ b + a ∗ c,
for every three elements a, b and c of the set F.
To make + look like the usual addition of real or complex numbers, we should be able to subtract
elements of the set F from each other. Namely, for every two elements a and b, the set F should contain
an element ♣ such that
a = ♣ + b,
so that we denote ♣ by a − b. In the very special case when a = 0, we denote 0 − b simply by −b.
The same should be true for multiplication: we must be able to divide elements in F, with a single
exception: we cannot divide by 0 (this is one of the mortal sins). Thus, for every two elements a
and b such that b ≠ 0, the set F should contain an element ♠ such that
a = ♠ ∗ b,
so that we denote ♠ by a/b. If a = 1, then we say that 1/b is the inverse of the element b, and we can
also denote it by b−1 .
Definition 1.50. If the set F is equipped with two operations + and ∗ such that the properties
we described are satisfied, we say that F is a field. To be precise, the set F is a field if the following
conditions are satisfied:
(1) the set F must have an element, which is usually denoted by 0, such that
0+a=a+0=a
for any element a in the set F;
(2) (a + b) + c = a + (b + c) for every three elements a, b and c in F;
(3) for every two elements a and b in the set F, there is c ∈ F such that
a = c + b,
so that we denote c by a − b (if a = 0, then we write −b instead of 0 − b);
(4) a + b = b + a for every two elements a and b in F;
(5) the set F must have an element, which is usually denoted by 1, such that
1∗a=a∗1=a
for any element a in the set F such that a 6= 0;
(6) (a ∗ b) ∗ c = a ∗ (b ∗ c) for every three elements a, b and c in F;
(7) for every two elements a and b in the set F such that b ≠ 0, there is c ∈ F such that
a = c ∗ b,
so that we denote c by a/b (if a = 1, then we often write b−1 instead of 1/b);
(8) a ∗ b = b ∗ a for every two elements a and b in F;
(9) a ∗ (b + c) = a ∗ b + a ∗ c for every a, b and c in F
Examples of fields are rational numbers, real numbers and complex numbers (equipped with
usual addition and multiplication). These are infinite fields. What about finite fields?
Example 1.51. Let F be the set consisting of the following three symbols: ○, △ and ⋆ (my favorite).
Let us equip F with two operations + and ∗ such that all properties described above are satisfied.
First we have to choose the special element that plays the role of zero. Of course, it should be ⋆.
Then, using this and Definition 1.50, we obtain the following addition table:
+  ⋆  ○  △
⋆  ⋆  ○  △
○  ○  △  ⋆
△  △  ⋆  ○
Now we should choose an element between ○ and △ that will play the role of 1. Let it be ○.
Then we obtain the following multiplication table:
∗  ⋆  ○  △
⋆  ⋆  ⋆  ⋆
○  ⋆  ○  △
△  ⋆  △  ○
Now one can check that the set F equipped with + and ∗ is a field.
Can we equip every finite set with two operations + and ∗ such that it becomes a field? No.
Exercise 1.52. Prove that there exists no field consisting of 6 elements.
If the set F consists of 3 elements, then we can equip it with two operations + and ∗ such that
the set F becomes a field. This follows from Example 1.51. To construct more finite fields, fix
a prime number p. Let Fp be the set
0, 1, . . . , p − 1 .
Equip this set with operations + and ∗ as follows. For every two elements a and b in Fp , we let
a + b = the remainder of the integer a + b when divided by p.
Similarly, we let
a ∗ b = the remainder of the integer a ∗ b when divided by p.
Exercise 1.53. Prove that Fp equipped with + and ∗ is a field.
To divide elements in Fp , it is enough to understand how to find (multiplicative) inverses of the
elements in Fp . For instance, if p = 1973, then
1/45 = 570,
since 45 ∗ 570 = 1 in Fp . Thus, in this case, we have a/45 = a ∗ 570 for every element a in Fp .
But finding inverses can be tricky. The best way to do this is to use the extended Euclidean algorithm.
On the other hand, using Fermat's Little Theorem, one can show that
1/a = a ∗ a ∗ · · · ∗ a ∗ a   (with a appearing p − 2 times)
for every non-zero element a in Fp .
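As a quick computational aside (a Python sketch, an illustration only), both recipes are easy to try for the example p = 1973 considered above; note that pow(a, -1, p) requires Python 3.8 or newer.
```python
p = 1973
a = 45

# Inverse via Fermat's Little Theorem: a^(p-2) mod p.
inv_fermat = pow(a, p - 2, p)

# Inverse via the extended Euclidean algorithm (exposed through pow in Python 3.8+).
inv_euclid = pow(a, -1, p)

print(inv_fermat, inv_euclid)        # both print 570
print((a * inv_fermat) % p)          # prints 1, i.e. 45 * 570 = 1 in F_1973
```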
Now suppose that we want to construct a field F consisting of four elements, which we denote by ♥, ♦, ♣ and ♠, where ♥ plays the role of 0 and ♦ plays the role of 1. Then one can check that we must get the following addition table:
+ ♥ ♦ ♣ ♠
♥ ♥ ♦ ♣ ♠
♦ ♦ ♥ ♠ ♣
♣ ♣ ♠ ♥ ♦
♠ ♠ ♣ ♦ ♥
Similarly, we must get the following multiplication table:
∗ ♥ ♦ ♣ ♠
♥ ♥ ♥ ♥ ♥
♦ ♥ ♦ ♣ ♠
♣ ♥ ♣ ♠ ♦
♠ ♥ ♠ ♦ ♣
One can check that F equipped with + and ∗ is a field, so that we can call it F4 for consistency.
But there is a better way to represent elements in F4 . Let us describe it.
Do you remember how we earlier constructed C? A similar approach can be used to construct F4 .
Namely, we start with the field F2 , which consists of just two elements: 0 and 1. Then we introduce
a new symbol, let us call it t, such that t2 = t + 1. Then we consider all sums that look like a + bt,
where a and b are both in F2 . Finally, we denote the obtained set by F4 , and extend addition and
multiplication from F2 to elements in F4 using simple intuitive rules:
(a + bt) ∗ (c + dt) = ac + bdt2 + adt + bct = (ac + bd) + (ad + bc + bd)t
and
(a + bt) + (c + dt) = (a + c) + (b + d)t,
where a, b, c and d are any elements in F2 . This turns F4 into a field with the same addition and
multiplication tables as above, where ♣, ♦, ♥ and ♠ are renamed as follows:
♥ ↦ 0,
♦ ↦ 1,
♣ ↦ t,
♠ ↦ 1 + t.
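To make the construction concrete, here is a small Python sketch (an illustration only; encoding a + bt as the pair (a, b) is my own choice) that implements the arithmetic of F4 via the rule t^2 = t + 1:
```python
# Elements of F4 are encoded as pairs (a, b) with a, b in {0, 1}, meaning a + b*t,
# where t is the symbol with t^2 = t + 1 and all arithmetic on a, b is modulo 2.

def f4_add(u, v):
    (a, b), (c, d) = u, v
    return ((a + c) % 2, (b + d) % 2)

def f4_mul(u, v):
    (a, b), (c, d) = u, v
    # (a + bt)(c + dt) = ac + bd*t^2 + (ad + bc)t = (ac + bd) + (ad + bc + bd)t
    return ((a*c + b*d) % 2, (a*d + b*c + b*d) % 2)

zero, one, t, one_plus_t = (0, 0), (1, 0), (0, 1), (1, 1)

print(f4_mul(t, t))            # (1, 1): t * t = 1 + t
print(f4_mul(t, one_plus_t))   # (1, 0): t * (1 + t) = 1, so 1 + t is the inverse of t
print(f4_add(t, t))            # (0, 0): every element added to itself gives 0
```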
For every prime p and every n ∈ N, one can use a similar approach to construct the field Fq that
consists of q = pn elements. The crucial point in constructing F4 was the fact that the polynomial
x2 + x + 1
is irreducible over F2 , i.e. it cannot be factorized as a product of two other polynomials with
coefficients in F2 that have smaller degrees. Likewise, if we find an irreducible polynomial
f (x) = xn + an−1 xn−1 + an−2 xn−2 + · · · + a2 x2 + a1 x + a0
with each ai in Fp , we can construct a field Fq with q = pn as all possible sums
α0 + α1 t + α2 t2 + · · · + αn−2 tn−2 + αn−1 tn−1 ,
where each αi is an element of the field Fp , and t is a symbol that satisfies the following rule:
t^n = −(an−1 t^(n−1) + an−2 t^(n−2) + · · · + a2 t^2 + a1 t + a0 ).
Then we can expand the addition and multiplication operations from Fp to the set Fq in a similar
way as we did this earlier in the construction of the field F4 .
Exercise 1.55. Construct the field F8 and present its multiplication table.
Let us consider one explicit example. Let p = 2063 and q = 20632 . Let us construct the field Fq .
First observe that the polynomial x^2 + 1 is irreducible over Fp . Indeed, if x^2 + 1 were reducible,
then we would have
x^2 + 1 = (x − a)(x + a)
for some a in Fp , so that a^2 = −1 in Fp . In this case, we say that −1 is a quadratic residue.
But −1 is a quadratic residue in Fp ⇐⇒ the remainder of p when divided by 4 is 1. Note that
2063 = 515 × 4 + 3,
which implies that x^2 + 1 is irreducible over Fp . Now we introduce a symbol t such that t^2 = −1.
Then Fq consists of all sums a + bt with a and b in Fp , where
(a + bt) ∗ (c + dt) = (ac − bd) + (ad + bc)t
and
(a + bt) + (c + dt) = (a + c) + (b + d)t,
where a, b, c and d are any elements in Fp . This turns the set Fq into a field.
Remark 1.56. We will see later that every finite field consists of q = p^n elements for some prime
number p and some positive integer n. As we already mentioned, for every prime p and every
positive integer n there exists a field consisting of q = p^n elements, which is usually denoted by Fq .
In fact, such a field is unique (up to renaming the elements of the field).
Finite fields play a fundamental role in a number of areas of mathematics and computer
science, including number theory, algebraic geometry, cryptography and coding theory.
Exercise 1.57. Let p be a prime number, let n be a positive integer, and let q = p^n . Find how
many 2 × 2 matrices
(a b; c d)
with ad − bc ≠ 0 there are, where a, b, c and d are in Fq .
and
y = ∑_{i=1}^{m} µi vi
for some real numbers λ1 , λ2 , . . . , λm , µ1 , µ2 , . . . , µm . Hence, we have
x + y = ∑_{i=1}^{m} λi vi + ∑_{i=1}^{m} µi vi = ∑_{i=1}^{m} (λi vi + µi vi ) = ∑_{i=1}^{m} (λi + µi )vi ,
We say that the vectors v1 , . . . , vm span the vector subspace V, or that the set
{v1 , . . . , vm }
is a spanning set for the vector space V.
Example 2.14. Let v1 = (1, −1, 0) and v2 = (0, 1, −1) be vectors in R3 , and let V = span({v1 , v2 }).
Then V is the plane in R3 that is given by
x1 + x2 + x3 = 0.
Indeed, denote this plane by Π. Then Π is a vector subspace of the three-dimensional space R3 ,
since it is the kernel of the 1 × 3 matrix (1 1 1). On the other hand, one can check that
v1 ∈ Π ∋ v2 ,
so that V ⊆ Π by Lemma 2.9. To show that V ⊇ Π, let x be a vector in Π. Then
x = x1 v 1 − x 3 v 2 ,
so that x ∈ V. This shows that V ⊇ Π.
Let e1 , e2 , . . . , en be vectors in Rn defined in (1.34). Then e1 , e2 , . . . , en span the whole Rn , since
x = x1 e1 + x2 e2 + · · · + xn en ∈ span {e1 , e2 , . . . , en }
for every vector x ∈ Rn .
Lemma 2.15. Let v1 , . . . , vm be vectors in Rn , and let W be a linear subspace of Rn . Then
the vectors v1 , . . . , vm are contained in W ⇐⇒ span {v1 , . . . , vm } ⊆ W.
The linear span of a non-zero vector in Rn is a line containing the origin and this vector.
Exercise 2.18. Let u, v and w be vectors in Rn . Prove that
span u, v, w = span au + bv + cw, v, w
for any real numbers a, b and c such that a 6= 0.
Let A be an m × n matrix. Then each column of the matrix A is an element of Rm .
Definition 2.19. The column space of the matrix A, written as col(A), is the span of its n columns.
x1 v1 + · · · + xn vn = Ax,
where x is the vector in Rn with entries x1 , . . . , xn .
This shows that y ∈ col(A) if and only if y = Ax for some x ∈ Rn .
Proof. Suppose that (3) holds. If v1 is a linear combination of the vectors v2 , . . . , vm , then
v1 = ∑_{i=2}^{m} λi vi
so that λi = µi for all i by linear independence. This shows that (1) implies (3).
Corollary 2.29. Let v1 , v2 , . . . , vm be linearly independent vectors in Rn . Then any non-empty
list of distinct vectors chosen from {v1 , v2 , . . . , vm } is also linearly independent.
Exercise 2.30. Let u, v and w be vectors in Rn . Suppose that u and v are linearly independent,
the vectors u and w are linearly independent, and the vectors v and w are linearly independent.
Does it follow that u, v, w are linearly independent? Give a proof or counterexample.
So far we know nothing about how the sizes of spanning sets of vectors and of linearly indepen-
dent sets of vectors are related to one another. The next result describes the relationship:
Proposition 2.31 (Steinitz lemma). Let V be a vector subspace of Rn . Let v1 , . . . , vm be linearly
independent vectors in V, and let w1 , . . . , wk be vectors spanning V. Then m 6 k.
Proof. We claim that m ≤ k and that it is possible to choose m vectors in
w1 , . . . , wk
in such a way that when these vectors are replaced by v1 , . . . , vm , the resulting list still spans V.
Let us prove this by induction on m.
Suppose that m = 1. Then k > 1, since otherwise we would have V = {0}, which would imply
that v1 = 0, which contradicts v1 being linearly independent. By assumption, we have
v1 = ∑_{i=1}^{k} λi wi
for some real numbers λ1 , . . . , λk such that at least one number among λ1 , . . . , λk is not zero.
Without loss of generality, we may assume that λ1 6= 0. Then
w1 = (1/λ1 ) v1 − ∑_{i=2}^{k} (λi /λ1 ) wi ,
Then at least one number among λm , . . . , λk is not zero, because otherwise this equality would imply
that v1 , . . . , vm are linearly dependent, which would contradict Corollary 2.29, since v1 , . . . , vm
are linearly independent. Without loss of generality, we may assume that λm ≠ 0. Then
wm = (1/λm ) vm − ∑_{i=1}^{m−1} (λi /λm ) vi − ∑_{i=m+1}^{k} (λi /λm ) wi ,
Observe that the list of vectors e1 , . . . , en introduced in (1.34) is a basis of the vector space Rn .
This basis is usually called the standard basis of Rn .
Exercise 2.39. Let
v1 = (1, 1, 1), v2 = (1, 1, 2), v3 = (1, 2, 3), x = (6, 9, 14).
Prove that v1 , v2 , v3 is a basis in R3 . Find real numbers λ1 , λ2 and λ3 such that
x = λ1 v1 + λ2 v2 + λ3 v3 .
Let V be a linear subspace of Rn and let v1 , . . . , vm be some vectors in V. Then
• v1 , . . . , vm span V ⇐⇒ for every vector x ∈ V, we have
x = ∑_{i=1}^{m} λi vi
for at least one list of real numbers λ1 , . . . , λm ;
• v1 , . . . , vm are linearly independent ⇐⇒ for every vector x ∈ V, we have
x = ∑_{i=1}^{m} λi vi
for at most one list of real numbers λ1 , . . . , λm ;
• v1 , . . . , vm is a basis of V ⇐⇒ for every vector x ∈ V, we have
x = ∑_{i=1}^{m} λi vi
for exactly one list of real numbers λ1 , . . . , λm .
One vector subspace of the space Rn can have many different bases. But in all examples so far,
all the bases of a given linear subspace have the same number of elements. This is a general truth:
Proposition 2.40. Any two bases of one linear subspace of Rn have the same number of elements.
Proof. This follows from Proposition 2.31.
In particular, every basis of Rn contains exactly n vectors.
Exercise 2.41. Let v1 , . . . , vn be a basis of the space Rn , and let A be an invertible n × n matrix.
Prove that Av1 , . . . , Avn is also a basis of the vector space Rn .
Now our goal is to show that every vector subspace of Rn has at least one basis. We start with
Lemma 2.42. Let V be a linear subspace of Rn , and let v1 , . . . , vm be vectors in V that span V.
Then some subset of {v1 , . . . , vm } is a basis of the vector subspace V.
Proof. Consider all subsets of the set {v1 , . . . , vm } that span V. Choose one with the smallest
possible number of elements. Without loss of generality, we may assume that this subset is
v 1 , . . . , vk
for some k 6 m. Then v1 , . . . , vk is a basis of V. Indeed, these vectors span V by assumption, so
it only remains to show that they are linearly independent. If they are not, then, by Lemma 2.28,
one vector among v1 , . . . , vk is a linear combination of the remaining vectors, so that, without loss
of generality, we may assume that
v1 = ∑_{i=2}^{k} λi vi ,
x1 = x2 .
Then dim(V) = 2, since it has a basis with 2 vectors (see Example 2.35). Observe that
(1, 1, −2) ∈ W ⊊ V.
This shows that this vector is a basis of W and dim(W) = 1.
Exercise 2.52. Let V be the vector subspace in R4 that is given by
x1 + 2x2 + 3x3 + 4x4 = 0,
Two-dimensional subspaces of Rn are called planes, and one-dimensional subspaces are lines.
Proposition 2.53. Let V be a linear subspace of Rn . Then dim(V) equals to
(1) the smallest number of vectors in any spanning set of V,
(2) the largest number of vectors in a linearly independent subset of V.
Proof. Choose a basis v1 , . . . , vm of the vector subspace V. Then
dim(V) = m
by definition. Let k be the smallest number of vectors in any spanning set of V. Then
dim(V) ≥ k,
because v1 , . . . , vm is a spanning set of V. Let w1 , . . . , wk be a spanning set of V with k vectors.
By Lemma 2.42, some subset of {w1 , . . . , wk } is a basis of V. But any basis of V has m elements,
so that k ≥ m = dim(V). Thus, we proved that k = dim(V).
Let r be the largest number of vectors in a linearly independent subset of V. Then
dim(V) ≤ r,
because v1 , . . . , vm are linearly independent. Let u1 , . . . , ur be linearly independent vectors in V.
By Lemma 2.44, there is some basis of the vector subspace V that contains all vectors u1 , . . . , ur .
But any basis of V has m elements, so that r ≤ m = dim(V). Hence, we proved that r = dim(V).
Exercise 2.54. Let v1 , . . . , vm be vectors in Rn . Prove that
dim(span({v1 , . . . , vm })) ≤ m,
2.e. Orthogonality.
The orthogonal features, when combined, can explode into complexity.
Yukihiro Matsumoto
Let v1 , . . . , vm be vectors in Rn . Recall the definition of dot-product from Section 1.c.
Definition 2.57. The vectors v1 , . . . , vm are orthogonal if
vi · vj = 0
for every i and j in {1, . . . , m} with i ≠ j. We say that they are orthonormal if
vi · vj = 0 for i ≠ j, and vi · vj = 1 for i = j.
in R2 are orthonormal. These two vectors are obtained from the previous two vectors by rescaling,
so that they both have length 1. Recall from Section 1.c that ‖v‖ = √(v · v) for all v ∈ Rn .
Example 2.58. Let v1 , v2 and v3 be vectors in R3 that are defined as follows:
v1 = (1, 0, 0), v2 = (0, 2, 3), v3 = (0, 1, 4).
Then they are not orthogonal: even though v1 · v2 = 0 and v1 · v3 = 0, we have v2 · v3 ≠ 0.
Lemma 2.59. If v1 , . . . , vm are orthogonal and all nonzero, then they are linearly independent.
Proof. Let λ1 , . . . , λm be scalars such that
(2.60) ∑_{i=1}^{m} λi vi = 0.
Taking the dot product of each side of equation (2.60) with v1 gives
( ∑_{i=1}^{m} λi vi ) · v1 = 0.
By Lemma 1.12 and orthogonality, we get
λ1 ‖v1 ‖^2 = ∑_{i=1}^{m} λi (vi · v1 ) = ( ∑_{i=1}^{m} λi vi ) · v1 = 0,
which gives λ1 = 0, because v1 ≠ 0 by assumption. Similarly, we see that λ2 = · · · = λm = 0.
This shows that v1 , . . . , vm are linearly independent.
For instance, take V = Rn with the standard basis e1 , . . . , en , and let x be the vector with entries x1 , . . . , xn ,
where x · ei = xi . So in this case, Lemma 2.64 simply states that x = x1 e1 + · · · + xn en .
Example 2.65. Suppose that n = 2 and V = R2 . Let
v1 = (√2/2, √2/2), v2 = (√2/2, −√2/2).
As we already mentioned above, these vectors are orthonormal, so that they form a basis of R2 .
To find real numbers λ1 and λ2 such that
(5, 3) = λ1 v1 + λ2 v2 ,
we can use Lemma 2.64. It gives
λ1 = (5, 3) · v1 = 4√2,
λ2 = (5, 3) · v2 = √2.
Exercise 2.66. Suppose that we erase the word orthonormal in the assertion of Lemma 2.64.
Is the lemma still true? Give a proof or give a counterexample.
for each j ∈ {1, . . . , m}. This gives us m equations for the numbers λ1 , . . . , λm , which we can solve.
For instance, if m = n = 3 and
v1 = (−2, 1, 5), v2 = (1, 4, 2), v3 = (2, 7, −4), x = (1, 0, 0),
then the vectors v1 , v2 and v3 form a basis of R3 , so that
x = λ1 v1 + λ2 v2 + λ3 v3
for some real numbers λ1 , λ2 and λ3 , which can be found by solving
x · v1 = λ1 (v1 · v1 ) + λ2 (v2 · v1 ) + λ3 (v3 · v1 ),
x · v2 = λ1 (v1 · v2 ) + λ2 (v2 · v2 ) + λ3 (v3 · v2 ),
x · v3 = λ1 (v1 · v3 ) + λ2 (v2 · v3 ) + λ3 (v3 · v3 ).
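As a sanity check, here is a short NumPy sketch (an illustration only) that sets up and solves exactly this system of dot products:
```python
import numpy as np

v1 = np.array([-2.0, 1.0, 5.0])
v2 = np.array([1.0, 4.0, 2.0])
v3 = np.array([2.0, 7.0, -4.0])
x = np.array([1.0, 0.0, 0.0])

V = np.column_stack([v1, v2, v3])   # columns are v1, v2, v3
G = V.T @ V                         # Gram matrix of dot products vi . vj
b = V.T @ x                         # right-hand sides x . vi

lam = np.linalg.solve(G, b)         # (lambda1, lambda2, lambda3)
print(lam)
print(np.allclose(V @ lam, x))      # True: lambda1*v1 + lambda2*v2 + lambda3*v3 = x
```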
Given a line through the origin in R3 , you can take the orthogonal plane through the origin.
Likewise, given a plane through the origin in R3 , you can take the orthogonal line through the origin.
Here is the general definition.
Definition 2.67. Let V be a linear subspace of Rn . The orthogonal complement of V is
V⊥ = {x ∈ Rn : x · v = 0 for all v ∈ V}.
for some v ∈ V and some w ∈ V⊥ . We spend most of the rest of this section proving this.
Lemma 2.72. Let v1 , . . . , vm be an orthonormal basis of V. For every x ∈ Rn , write
P(x) = ∑_{i=1}^{m} (x · vi )vi .
Then P(x) ∈ V and x − P(x) ∈ V⊥ .
Proof. Clearly, we have P(x) ∈ V. To prove that x − P(x) ∈ V⊥ , it suffices to show that
(x − P(x)) · vj = 0
for each j ∈ {1, . . . , m}. This follows from Lemma 2.70. We have
P(x) · v1 = ( ∑_{i=1}^{m} (x · vi )vi ) · v1 = ∑_{i=1}^{m} (x · vi )(vi · v1 ) = x · v1 ,
so that (x − P (x)) · v1 = 0. Similarly, we see that (x − P (x)) · vj = 0 for every j ∈ {1, . . . , m}.
The following result is an orthonormal analogue of Lemma 2.44.
Lemma 2.73. Let V be a linear subspace of Rn and let v1 , . . . , vk be orthonormal vectors in V.
Then there is some orthonormal basis of V containing all of the vectors v1 , . . . , vk .
Proof. Let m = dim(V) and W = span({v1 , . . . , vk }). If k = m, we are done. Therefore, we may
assume that k < m, so that W ≠ V by Lemma 2.50. Then there is y ∈ V such that y ∉ W. Let
w = ∑_{i=1}^{k} (y · vi )vi .
(3) Then we let w3 = u3 + λ1 w1 + λ2 w2 for some real numbers λ1 and λ2 . As above, we want
the vector w3 to be orthogonal to the vectors w1 and w2 , so that
0 = w1 · w3 = w1 · (u3 + λ1 w1 + λ2 w2 ) = w1 · u3 + λ1 (w1 · w1 ) = 2 + 9λ1
and
0 = w2 · w3 = w2 · (u3 + λ1 w1 + λ2 w2 ) = w2 · u3 + λ2 (w2 · w2 ) = 4/3 + 4λ2 ,
which implies that λ1 = −2/9 and λ2 = −1/3. This gives
w3 = (0, 0, 1) − (2/9)(1, 2, 2) − (1/3)(−4/3, −2/3, 4/3) = (2/9, −2/9, 1/9).
(4) By our construction, the vectors w1 , w2 and w3 are orthogonal. To get an orthonormal basis,
we have to normalize them as follows:
v1 = w1 /‖w1 ‖ = (1/3)(1, 2, 2),
v2 = w2 /‖w2 ‖ = (1/2)(−4/3, −2/3, 4/3) = (1/3)(−2, −1, 2),
v3 = w3 /‖w3 ‖ = 3 (2/9, −2/9, 1/9) = (1/3)(2, −2, 1).
This gives us an orthonormal basis v1 , v2 and v3 .
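The same computation can be automated. Below is a small NumPy sketch of the Gram–Schmidt process (an illustration only; the input vectors are chosen to reproduce the v1, v2, v3 of this example, and since u2 is not visible in the surviving text above, its value here is an assumption):
```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal list spanning the same subspace as `vectors`."""
    basis = []
    for u in vectors:
        w = u.astype(float)
        for v in basis:
            w = w - (w @ v) * v     # subtract the component along each earlier vector
        basis.append(w / np.linalg.norm(w))
    return basis

# Input vectors (u2 is an assumption, chosen to be consistent with the worked example).
u1 = np.array([1, 2, 2])
u2 = np.array([-1, 0, 2])
u3 = np.array([0, 0, 1])

v1, v2, v3 = gram_schmidt([u1, u2, u3])
print(v1, v2, v3)                                    # (1,2,2)/3, (-2,-1,2)/3, (2,-2,1)/3
print(np.round([v1 @ v2, v1 @ v3, v2 @ v3], 10))     # all 0: pairwise orthogonal
```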
Proposition 2.75. Let V be a linear subspace of Rn . Then
(1) V ∩ V⊥ = {0};
(2) for each x ∈ Rn , there are unique v ∈ V and w ∈ V⊥ such that x = v + w;
(3) dim(V) + dim(V⊥ ) = n.
Proof. To prove (1), let x be any vector in V ∩ V⊥ . Then
x·v =0
for every v ∈ V. In particular, this holds when v = x. This gives
kxk2 = x · x = 0,
so that x = 0. Vice versa, we have 0 ∈ V ∩ V⊥ , since V ∩ V⊥ is a vector subspace.
Before proving (2) and (3), let us choose some orthonormal basis v1 , . . . , vm of our linear
subspace V. We can do this by Corollary 2.74. Then m = dim(V).
To prove (2), take any vector x ∈ Rn and let
P(x) = ∑_{i=1}^{m} (x · vi )vi .
On the other hand, it follows from Lemma 2.72 that P (x) ∈ V and x − P (x) ∈ V⊥ , so that we let
v = P(x) and w = x − P(x).
Then x = v + w, where v ∈ V and w ∈ V⊥ .
To prove the uniqueness of the decomposition x = v + w, let v0 ∈ V and w0 ∈ V⊥ such that
x = v0 + w0 .
We must show that v0 = v and w0 = w. Now,
v + w = x = v0 + w0 ,
so v − v0 = w0 − w. But v − v0 ∈ V and w0 − w ∈ V⊥ , so both belong to V ∩ V⊥ , which is {0}.
Hence, we see that v = v0 and w = w0 as required.
To prove (3), let us choose some orthonormal basis w1 , . . . , wk of the linear subspace V⊥ .
Then v1 , . . . , vm , w1 , . . . , wk is an orthonormal basis of Rn . Indeed, these vectors span Rn by (2).
Moreover, all of them are orthogonal to each other, and all of them have length 1 by construction.
Hence v1 , . . . , vm , w1 , . . . , wk is an orthonormal spanning set of the n-dimensional vector space Rn .
Since orthonormality implies linear independence by Lemma 2.59, it is a basis. Then
dim V + dim V⊥ = k + m = n.
as required.
Exercise 2.76. Let u be a non-zero vector in Rn . Let V be a subset in Rn given by
V = {x ∈ Rn : u · x = 0}.
Prove that V is a linear subspace of Rn of dimension n − 1.
Exercise 2.77. Let V be a linear subspace of Rn . Prove that (V⊥ )⊥ = V.
x4 = 10x1 + x3 .
so that x1 and x3 span row(A). Therefore, they form a basis of row(A). Then dim(row(A)) = 2.
To find the dimension of ker(A), observe that each vector x ∈ ker(A) is given by
x1 + 3x2 + 4x3 = 0,
2x1 + 5x2 + 7x3 = 0,
12x1 + 35x2 + 47x3 = 0.
BA = In ,
for some n × m matrix B. Show that m = n.
On the other hand, we have the following result:
Lemma 3.6. Let A be an m × n matrix. Then
ker(A) = row(A)⊥
Proof. Let x be any vector in Rn . To prove that ker(A) = row(A)⊥ , we must show that
Ax = 0 ⇐⇒ x ∈ row(A)⊥ .
To do this, write the transposes of the rows of the matrix A as r1 , . . . , rm . Then
Ax = (r1 · x, r2 · x, . . . , rm · x).
Hence, we have
Ax = 0 ⇐⇒ r1 · x = 0, r2 · x = 0, . . . , rm · x = 0 ⇐⇒ x ∈ row(A)⊥
by definition of the subspace row(A). This proves ker(A) = row(A)⊥ . Applying ⊥ to each side
and using Exercise 2.77, we get ker(A)⊥ = row(A).
Using this lemma and Proposition 2.75, we obtain
Corollary 3.7. Let A be an m × n matrix. Then
dim row A + dim ker A = n
Exercise 3.9. Let A be an m × n matrix such that every row of A is a scalar multiple of the first
row. Prove that there is
i ∈ {1, . . . , n}
such that every column of the matrix A is a scalar multiple of the ith column.
The following exercise gives an alternative proof of Corollary 3.8 that does not use Lemma 3.6,
so that it should be solved without using Lemma 3.6 and its corollaries.
Exercise 3.10. Let A be a m × n matrix. Prove Corollary 3.8 as follows.
(i) For every vector x ∈ Rn , prove that if xT AT Ax = 0, then Ax = 0.
(ii) Use (i) to deduce that ker(AT A) = ker(A).
(iii) Use (ii) and Theorem 3.4 to deduce that
dim col AT A = dim col A .
(iv) Prove that
dim col AT A 6 dim col AT .
(v) Use (iii) and (iv) to deduce Corollary 3.8.
Now we are in position to give the following definition.
Definition 3.11. Let A be an m × n matrix. Then the rank of the matrix A is
rank(A) = dim col A = dim row A .
Theorem 3.4 is known as the rank-nullity theorem, because of the following definition:
Definition 3.13. Let A be an m × n matrix. The nullity of the matrix A is dim(ker(A)).
Exercise 3.14. Let A be a 3 × 5 matrix. What are the possible values of the nullity of A?
Using Definitions 3.11 and 3.13, we can restate Theorem 3.4 as follows: for a matrix A, one has
rank(A) + nullity(A) = n,
where n is the number of columns of the matrix A.
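For hands-on experimentation, here is a tiny NumPy sketch (an illustration only; the matrix is a made-up example, not one from the notes) that computes the rank and the nullity of a matrix:
```python
import numpy as np

A = np.array([[1, 3, 4],
              [2, 5, 7],
              [3, 8, 11]])               # third row = first row + second row

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank              # rank-nullity: nullity = n - rank
print(rank, nullity)                     # 2 1
```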
Exercise 3.15. Let A be the following 6 × 4 matrix:
1 1 1 1
0 1 1 1
1 0 1 1
1 1 0 1
1 1 1 0
1 1 1 1
Find its rank and nullity.
Moreover, such a vector x is unique. Indeed, if x̂ is another vector in Rn such that Ax̂ = b, then
A(x̂ − x) = Ax̂ − Ax = b − b = 0,
so that the vector x̂ − x is contained in ker(A). But ker(A) = {0}, since (iii) holds. Then
x̂ − x = 0,
which gives x̂ = x as required. This shows that (iii) ⇐⇒ (iv).
Now let us prove that (i) implies (iii). Let x be a vector in ker(A). If A is invertible, then
x = In x = (A−1 A)x = A−1 (Ax) = A−1 0 = 0,
so that ker(A) = {0}, and (iii) holds.
Substituting all columns of the matrix B into this formula and using AB = In , we conclude that
all vectors of the standard basis e1 , . . . , en are contained in col(A). This means that rank(A) = n.
Then, by Theorem 3.4, we have ker(A) = {0}, which is a contradiction, so that (ii) implies (i).
Exercise 3.18. Write down the inverse of the matrix
7 0 0 0
0 −2 0 0
0 0 1 0 .
0 0 0 10
Corollary 3.19. Let A be a n × n matrix. Then the following conditions are equivalent:
(i) A is invertible;
(ii) the rows of A are linearly independent;
(iii) the rows of A span Rn ;
(iv) the rows of A are a basis of Rn .
Proof. By Lemma 1.45, the matrix A is invertible ⇐⇒ the transposed matrix AT is invertible.
Thus, applying Theorem 3.17 to the matrix AT , we obtain the required assertion.
Exercise 3.20. Let A be a m × n matrix. Prove or disprove the following assertions.
(i) The rows of A are linearly independent ⇐⇒ the columns of A are linearly independent.
(ii) Suppose that m = n, i.e. A is a square matrix. Then the rows of the matrix A are linearly
independent ⇐⇒ the columns of the matrix A are linearly independent.
3.b. Determinants.
After hard work, the biggest determinant is being in the right place at the right time.
Michael Bloomberg
In Example 1.41, we met determinants of 2 × 2 matrices. Namely, let
A = (a b; c d).
From Example 1.41, we know that the determinant of A is the number
ad − bc,
which is denoted by det(A), or sometimes by writing the entries of A between two vertical bars.
Then |det(A)| is the area of the parallelogram in R2 whose edge-vectors are the columns of A.
Now we consider a similar example in dimension three. Let A be a 3 × 3 matrix with (i, j)-entry aij .
Then its determinant det(A) is the number given by the formula
det(A) = a11 a22 a33 − a11 a23 a32 − a12 a21 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 ,
which can be rewritten as
det(A) = a11 det(a22 a23 ; a32 a33 ) − a12 det(a21 a23 ; a31 a33 ) + a13 det(a21 a22 ; a31 a32 ).
As in dimension two, one can show that |det(A)| is the volume of the parallelepiped in R3 whose
edges are the columns of the matrix A.
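As a quick numerical check (a NumPy sketch, an illustration only; the matrix is a made-up example), the cofactor formula above agrees with a library determinant routine:
```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [0.0, 4.0, 1.0],
              [5.0, 2.0, 6.0]])

# 2 x 2 determinant of (a b; c d).
def det2(a, b, c, d):
    return a*d - b*c

# Cofactor expansion along the first row, exactly as in the formula above.
cofactor = (A[0, 0]*det2(A[1, 1], A[1, 2], A[2, 1], A[2, 2])
            - A[0, 1]*det2(A[1, 0], A[1, 2], A[2, 0], A[2, 2])
            + A[0, 2]*det2(A[1, 0], A[1, 1], A[2, 0], A[2, 1]))

print(cofactor, np.linalg.det(A))   # both are approximately -11.0
```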
Exercise 3.21. Find the maximum value of the determinant
a11 a12 a13
a21 a22 a23
a31 a32 a33
a11 a22 a33 a44 − a11 a22 a34 a43 − a11 a23 a32 a44 + a11 a23 a34 a42 + a11 a24 a32 a43 − a11 a24 a33 a42 −
− a12 a21 a33 a44 + a12 a21 a34 a43 + a12 a23 a31 a44 − a12 a23 a34 a41 − a12 a24 a31 a43 + a12 a24 a33 a41 +
+ a13 a21 a32 a44 − a13 a21 a34 a42 − a13 a22 a31 a44 + a13 a22 a34 a41 + a13 a24 a31 a42 − a13 a24 a32 a41 −
− a14 a21 a32 a43 + a14 a21 a33 a42 + a14 a22 a31 a43 − a14 a22 a33 a41 − a14 a23 a31 a42 + a14 a23 a32 a41 ,
which is not as uninformative as it may look. As in the previous cases, we can rewrite it
using the very same pattern:
det(A) = a11 det(a22 a23 a24 ; a32 a33 a34 ; a42 a43 a44 ) − a12 det(a21 a23 a24 ; a31 a33 a34 ; a41 a43 a44 ) +
+ a13 det(a21 a22 a24 ; a31 a32 a34 ; a41 a42 a44 ) − a14 det(a21 a22 a23 ; a31 a32 a33 ; a41 a42 a43 ).
Our inductive definition of the determinant of the matrix A gives a special status to its first
row. In fact, we can use any row or column to give a similar formula.
Proposition 3.25. Let A be an n × n matrix with (i, j)-entry aij . Then, for every i ∈ {1, . . . , n},
det(A) = ∑_{j=1}^{n} (−1)^(i+j) aij det(A[i, j]).
Proof. Omitted.
Corollary 3.28. Let A be a n × n matrix with columns v1 , . . . , vn . Suppose that
vi = vj
for some i and j in {1, . . . , n} such that i 6= j. Then det(A) = 0.
Proof. Let B be the matrix obtained from A by swapping columns i and j. Then
det(B) = −det(A),
by Proposition 3.27(i). But B = A, so that det(A) = −det(A), which gives det(A) = 0.
Corollary 3.29. Let A be an n × n matrix with columns v1 , . . . , vn , and let C be an n × n matrix with
columns w1 , . . . , wn . Suppose that there are k ≠ r in {1, . . . , n} such that
wk = vk + λr vr
for some λr ∈ R, and wi = vi for every i ∈ {1, . . . , n} such that i ≠ k. Then det(C) = det(A).
Proof. Let B be the n × n matrix with columns u1 , . . . , un defined as follows:
ui = vi if i ≠ k, and uk = λr vr .
Then det(B) = 0 by Proposition 3.27(ii), so that
det(C) = det(A) + det(B) = det(A)
by Proposition 3.27(iv).
Corollary 3.30. Let A be an invertible n × n matrix. Then det(A) ≠ 0 and
det(A−1 ) = 1/det(A).
Proof. One has AA−1 = In , so that
det(A)det(A−1 ) = det(In ) = 1
by Proposition 3.27(v) and Example 3.24.
Exercise 3.31. Let A be a n × n matrix with integer entries. Suppose that A is invertible, and
the inverse matrix A−1 also has integer entries. What is det(A)?
Exercise 3.32. Doing as little work as possible, compute the determinant of the matrix
1 2 1 0 4
7 0 1 −1 −2
−1 3 −1 0 1
.
0 2 0 0 0
1 1 1 0 2
Exercise 3.33. Observe that the integers 20604, 53227, 25755, 20927 and 289 are divisible by 17.
Moreover, it follows from (3.23) that the determinant of the matrix
2 0 6 0 4
5 3 2 2 7
2 5 7 5 5 .
2 0 9 2 7
0 0 2 8 9
is an integer. Prove that it is also divisible by 17.
for any i ∈ {1, . . . , n}. Then the adjugate of A is the n × n matrix adj(A) whose (i, j)-entry is Cji .
Note the reversal of the indices! For instance, the adjugate of a 2 × 2 matrix is given by
adj((a b; c d)) = (d −b; −c a).
Notice that in this case, A adj(A) = det(A)I2 . In fact, this is true for all square matrices:
Proposition 3.36. Let A be an n × n matrix. Then
A adj(A) = det(A)In .
Proof. Let A be an n × n matrix. We use the convention that the (i, j)-entry of a matrix M is
written as mij . Let i and k be any indices in {1, . . . , n}. We must show that
(A adj(A))ik = det(A) if i = k, and (A adj(A))ik = 0 if i ≠ k.
We have
(A adj(A))ik = ∑_{j=1}^{n} aij Ckj = ∑_{j=1}^{n} (−1)^(k+j) aij det(A[k, j]).
If i = k, this is the expansion of det(A) along the kth row, so that (A adj(A))kk = det(A) by
Proposition 3.25. If i ≠ k, this is the same expansion applied to the matrix B obtained from A
by replacing its kth row with its ith row, so that it equals det(B) = 0
by Proposition 3.25 and Lemma 3.35, since two rows of B are equal. Hence (A adj(A))ik = 0.
Corollary 3.37. Let A be an n × n matrix. Then A is invertible ⇐⇒ det(A) ≠ 0.
Proof. By Corollary 3.30, it is enough to prove that A is invertible provided that det(A) ≠ 0.
However, if det(A) ≠ 0, then
A ( (1/det(A)) adj(A) ) = In
by Proposition 3.36, so A is invertible by Theorem 3.17.
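Here is a short SymPy sketch (an illustration only; the matrix is a made-up example) checking Proposition 3.36 and the resulting inverse formula:
```python
import sympy as sp

A = sp.Matrix([[2, 1, 3],
               [0, 4, 1],
               [5, 2, 6]])

adjA = A.adjugate()
print(A * adjA == sp.det(A) * sp.eye(3))        # True: A adj(A) = det(A) I3
print(A.inv() == adjA / sp.det(A))              # True: A^(-1) = adj(A) / det(A)
```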
Corollary 3.38. Let A be an n × n matrix. Then
rank(A) = n ⇐⇒ det(A) ≠ 0 ⇐⇒ A is invertible.
Exercise 3.39. Find the inverses of the following 3 × 3 matrices:
(1 0 0; 0 1 0; 3 0 1), (6 0 0; 0 1 2; 0 3 5), (1 3 0; 2 7 0; 0 0 7), (1 2 3; 0 1 4; 5 6 0).
Exercise 3.40. Let A be the following 3 × 3 matrix:
7−λ −12 6
10 −19 − λ 10 ,
12 −24 13 − λ
where λ is a real number. Find rank(A) for all possible λ ∈ R.
Exercise 3.41. Let A be an m × n matrix such that
rank(A) = m.
Prove that there exists an n × m matrix B such that AB = Im .
Here, each aij and bi are some (fixed) real numbers, while x1 , . . . , xn are unknowns (variables).
The fundamental questions about a system of linear equations are these:
• Are there any solutions?
• If so, how many?
• And how can we compute them?
To answer them, we can use results proved in Section 3.a, because (3.42) can be rewritten as
Ax = b, where A is the m × n matrix with entries aij , x is the vector in Rn with entries x1 , . . . , xn ,
and b is the vector in Rm with entries b1 , . . . , bm .
Example 3.43. Suppose that n = 2. Then (3.42) simplifies as
a11 x1 + a12 x2 = b1 ,
a21 x1 + a22 x2 = b2 .
Multiply the first row by a21 and the second by a11 , then subtract. This gives
(a11 a22 − a12 a21 )x2 = a11 b2 − a21 b1 .
Assuming that a11 a22 − a12 a21 ≠ 0, this gives
x2 = (a11 b2 − a21 b1 )/(a11 a22 − a12 a21 ),
from which it follows that
x1 = (a22 b1 − a12 b2 )/(a11 a22 − a12 a21 ).
So as long as a11 a22 − a12 a21 ≠ 0, there is a unique solution. Note that
a11 a22 − a12 a21 = det((a11 a12 ; a21 a22 )).
Thus, if this matrix is invertible, then the system has a unique solution (cf. Example 1.41).
Now let us consider two very explicit examples, Examples 3.44 and 3.45 below, that illustrate
the method for solving systems of linear equations which is known as Gaussian elimination.
Example 3.44. Let us consider the following system of linear equations:
2x + 3y − z = 3,
x + y + z = 4,
3x − 4y + z = 1.
Obviously, this system of linear equations has the same solutions as our original system of equations.
Now let us subtract 2 times the first equation from the second equation, and then subtract 3 times
the first equation from the third equation. This gives:
x + y + z = 4,
y − 3z = −5,
− 7y − 2z = −11.
Again, this system of linear equations has the same solutions as the original system of equations.
Actually, we can simplify it a bit more:
x + 4z = 9,
y − 3z = −5,
z = 2.
Subtracting 3 times the first equation from the second equation, and 4 times the first equation
from the third equation, we get
x + 2y + 3z = 1,
− y − 11z = −1,
− y − 11z = −1.
Then multiply the second equation by −1, subtract 2 times the new second equation from the first one,
and add the new second equation to the third equation. This gives
x − 19z = −1,
y + 11z = 1,
0 = 0.
Now the third equation can be ignored. Moreover, in the first two equations, we can choose z freely,
say by putting z = t for an arbitrary scalar t. Thus, the solutions of the system are:
x = −1 + 19t,
y = 1 − 11t,
z = t.
where t is any real number. We call z a free variable, since we can choose it freely. Similarly, we
say that x and y are leading variables.
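Both computations can be checked mechanically. Here is a short NumPy sketch (an illustration only) that solves the first system and spot-checks the parametric solution of the second one:
```python
import numpy as np

# Example 3.44: a square system with a unique solution.
A = np.array([[2.0, 3.0, -1.0],
              [1.0, 1.0, 1.0],
              [3.0, -4.0, 1.0]])
b = np.array([3.0, 4.0, 1.0])
print(np.linalg.solve(A, b))          # [1. 1. 2.], i.e. x = 1, y = 1, z = 2

# The reduced second system: x - 19z = -1, y + 11z = 1, with solutions
# x = -1 + 19t, y = 1 - 11t, z = t; spot-check one value of t against the
# equations x + 2y + 3z = 1 and -y - 11z = -1 displayed above.
t = 5.0
x, y, z = -1 + 19*t, 1 - 11*t, t
print(x + 2*y + 3*z, -y - 11*z)       # 1.0 and -1.0, matching the right-hand sides
```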
If all the numbers b1 , b2 , . . . , bm in (3.42) are equal to zero, then (3.42) simplifies as
(3.46)
a11 x1 + a12 x2 + · · · + a1n xn = 0,
a21 x1 + a22 x2 + · · · + a2n xn = 0,
. . .
am1 x1 + am2 x2 + · · · + amn xn = 0.
Such systems of linear equations are called homogeneous. Otherwise, they are called inhomogeneous.
Let us rewrite (3.46) as Ax = 0, where A is the m × n matrix whose (i, j)-entry is aij .
Then the solutions to (3.46) form the kernel ker(A). This is a linear subspace of Rn by Lemma 2.5.
Therefore, in particular, our system of linear equations (3.46) always has the trivial solution x = 0.
Moreover, applying Theorem 3.4 (the rank–nullity theorem), we get
62 IVAN CHELTSOV
Corollary 3.47. A homogeneous system of linear equations with more variables than equations
always has infinitely many solutions. In particular, it has at least one nontrivial solution.
where a, b, c, d, e, f are some real numbers such that (a, b, c) 6= (0, 0, 0). For example, an ellipse,
hyperbola and parabola are all conics. Let P1 , P2 , P3 , P4 , P5 be distinct points in R2 such that at
most 3 of them are contained in one line. Then there exists a unique conic in R2 that contains all
of them. Indeed, write P1 = (x1 , y1 ), P2 = (x2 , y2 ), P3 = (x3 , y3 ), P4 = (x4 , y4 ) and P5 = (x5 , y5 ),
where each xi and yi is a real number. If C is a conic that contains P1 , P2 , P3 , P4 and P5 , then
2
ax1 + bx1 y1 + cy12 + dx1 + ey1 + f = 0,
2 2
ax2 + bx2 y2 + cy2 + dx2 + ey2 + f = 0,
(3.50) ax23 + bx3 y3 + cy32 + dx3 + ey3 + f = 0,
ax24 + bx4 y4 + cy42 + dx4 + ey4 + f = 0,
ax2 + bx y + cy 2 + dx + ey + f = 0.
5 5 5 5 5 5
Therefore, we have 6 unknowns a, b, c, d, e and f , and only 5 linear equations. Using Corollary 3.47,
we see that there exists a conic in R2 that contains P1 , P2 , P3 , P4 , P5 . Now we let
2
x1 x1 y1 y12 x1 y1
x2 x2 y2 y 2 x2 y2
22 2
2
A= x
32 x 3 y 3 y 3 x3 y3 .
2
x4 x4 y4 y4 x4 y4
x25 x5 y5 y52 x5 y5
To show that there exists unique conic in R2 that contains P1 , P2 , P3 , P4 , P5 , it is enough to show
that rank(A) = 5. Indeed, if rank(A) = 5 then it follows from Theorem 3.4 that solutions of the
system of linear equations (3.50) form one-dimensional vector subspace in R6 , which simply means
that such solution is unique up to scaling, so that (3.49) is unique up to scaling, which means that
the required conic is unique. One the other hand, we know from Definition 3.11 that
rank A = 5 ⇐⇒ dim col A = 5 ⇐⇒ col(A) = R5 .
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c1 c2 c3 c4 c5
, , , ,
d1 d2 d3 d4 d5
e e e e e
1 2 3 4 5
f1 f2 f3 f4 f5
ACCELERATED ALGEBRA 63
such that
a1 a2 a3 a4 a5
b1 b2 b3 b4 b5
c c c c c
A 1 = e1 , A 2 = e2 , A 3 = e3 , A 4 = e4 , A 5 = e5 .
d1 d2 d3 d4 d5
e e e e e
1 2 3 4 5
f1 f2 f3 f4 f5
Keeping in mind (3.49), we see that this is equivalent to the following geometric condition:
F for every Pi among P1 , P2 , P3 , P4 , P5 , there exists a conic Ci in R2 such that
(1) the conic Ci does not contain the point Pi ;
(2) the conic Ci contains all other points among P1 , P2 , P3 , P4 , P5 .
The condition F not very hard to check. Without loss of generality, it is enough to check F for P1 .
Namely, let us show that there is a conic C1 such that P1 ∈ C1 while C1 contains P2 , P3 , P4 , P5 .
To do this, denote by Lij the line in R2 that contains Pi and Pj , where i and j are in {1, 2, 3, 4, 5}
such that i 6= j. If P1 6∈ L23 ∪ L45 , we can let
C1 = L23 ∪ L45 ,
so that P1 6∈ C1 and C1 contains P2 , P3 , P4 , P5 . Therefore, without loss of generality, we may
assume that P1 ∈ L23 . Then P4 6∈ L23 and P5 6∈ L23 , so that
P1 6∈ L24 ∪ L35 ,
since L24 ∩ L23 = P2 and L23 ∩ L35 = P3 . Thus, we can let C1 = L24 ∪ L35 .
Exercise 3.51. A cubic curve in R2 is a curve that is given by the following equation
a1 x3 + a2 x2 y + a3 xy 2 + a4 y 3 + a5 x2 + a6 xy + a7 y 2 + a8 x + a9 y + a10 = 0,
where a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 , a9 and a10 are fixed real numbers such that at least one number
among a1 , a2 , a3 , a4 is not zero. Let C1 and C2 be two distinct cubic curve in R2 such that their
intersection C1 ∩ C2 contains 9 distinct points P1 , P2 , P3 , P4 , P5 , P6 , P7 , P8 , P9 such that
• at most 2 points among them are contained in one line,
• and at most 5 points among them are contained in one conic (see Example 3.48).
Prove that every cubic curve in R2 that contains 8 points among P1 , P2 , P3 , P4 , P5 , P6 , P7 , P8 , P9
also contains the remaining ninth point (see Chasles’s Theorem in Preface).
We saw that homogeneous systems of linear equations always have (possibly trivial) solutions.
What about inhomogeneous systems of linear equations? These need not have any solutions, even
if they have more variables than equations. For instance, the following system has no solutions:
2x1 + 3x2 + 4x3 = 1,
Proof. We have to prove two things: that every element of this set is a solution, and that every
solution belongs to this set. First, we let w ∈ ker(A). Then
A u + w = Au + Aw = b + 0 = b
64 IVAN CHELTSOV
where A
b is the augmented matrix of the system of linear equations (3.42).
by Lemma 2.50. Moreover, if (3.42) has a solution, then col(A) must contain b by Lemma 2.21,
so that col(A) b = col(A), which gives rank(A) = rank(A). b
Let v1 , . . . , vn be the columns of the matrix A. If rank(A) = rank(A),
b then
col A = col A b
ACCELERATED ALGEBRA 65
by Lemma 2.50, which gives b ∈ col(A), so that there are real numbers x1 , . . . , xn such that
n
X
b= xi v i .
i=1
This equality can be rewritten as Ax = b, so that (3.42) has a solution in this case.
Corollary 3.55. The system of linear equations (3.42) has a unique solution if and only if
rank A = rank A
b =n
where A
b is the augmented matrix of (3.42), and n is the number of columns of A.
Show that this system of linear equations does not have other solutions.
Proof. There are three cases to prove, corresponding to the three types of elementary row operation.
We will do the third type only; the first two are similar, easier, and left as an exercise.
Without loss of generality, we may assume that A0 is obtained from A by the row operation
R1 → R1 + λR2
for some λ ∈ R. Write the rows of A as r1 , . . . , rm . Let x ∈ Rn . Then
r1 x
r2 x
Ax = ... .
rm x
0
The rows of A are r1 + λr2 , r2 , . . . , rm , so that
r1 + λr2 x r1 x + λr2 x
r2 x r2 x
A0 x =
. = . .
.. ..
rm x rm x
Similarly, we have
b1 + λb2
b2
b0 = .. .
.
bm
Observe also that
r1 x + λr2 x = b1 + λb2 r1 x = b1
⇐⇒
r2 x = b2 r2 x = b2
so that we have
r1 x + λr2 x b1 + λb2 r1 x b1
r2 x b2 r x b
A0 x = b0 ⇐⇒ ⇐⇒ 2. = .2 ⇐⇒ Ax = b.
.. = ..
. . .. ..
rm x bm rm x bm
This completes the proof.
0
Corollary 3.61. Let A be a m × n matrix, and let A be another a m × n matrix that is obtained
from A by a sequence of elementary row operations. Then
ker A = ker A0
The matrices A and A0 in Corollary 3.61 may have different column space, e.g. we have
1 0 R2 →R2 −R1 1 0
−−−−−−−→ ,
1 0 0 0
but the column spaces of these two matrices are different.
The matrices we ended up with in Examples 3.59 and 3.62 are of special type:
Definition 3.63. A matrix is in row echelon form (REF) if the following holds:
(1) any rows consisting entirely of zeros are at the bottom;
(2) in each nonzero row, the first nonzero entry (called the leading entry) is to the left of all
the leading entries below it.
For instance, the matrix
1 2 3 4 5
0 6 7 8 9
0 0 0 10 11
0 0 0 0 0
is in row echelon form, and its leading entries are underlined.
Definition 3.64. A matrix is in reduced row echelon form (RREF) if the following holds:
(1) it is in row echelon form;
(2) all leading entries are equal to 1;
(3) each column containing a leading 1 has zeros everywhere else.
For instance, the final matrices in Examples 3.59 and 3.62 are both in reduced row echelon form,
and so is the matrix
1 0 3 0
0 1 6 0
0 0 0 1 .
0 0 0 0
Theorem 3.65. Using elementary row operations, we can put any given matrix into reduced row
echelon form. Moreover, the resulting matrix is unique.
Proof. Left to the reader.
Exercise 3.66. Find the reduced row echelon form of the matrix
1 4 7
2 5 8 .
3 6 9
We can find a basis of a row space of a matrix using elementary row operations and
Lemma 3.67. Let A be a matrix in REF. Then transposes of its nonzero rows is a basis of row(A).
Proof. The transposes of all non-zero rows of A span row(A). Therefore, to complete the proof,
we must show that they are linearly independent. Let us do this.
ACCELERATED ALGEBRA 69
Suppose that A is a m × n matrix with k nonzero rows. By definition, these are the first k rows.
Write their transposes as vectors r1 , . . . , rk in Rn with leading entries `1 , . . . , `k , respectively. Then
0
0
..
.
0
ri = `i
∗
∗
.
..
∗
where ∗ denotes for any real number. So that `i is the first non-zero coordinate of the vector ri .
Denote the number of this coordinate by pi . Then, by definition of REF, we have
1 6 p1 < p2 < · · · < pk 6 n.
Now let λ1 , . . . , λk be scalars such that
λ1 r1 + λ2 r2 + · · · + λk rk = 0.
Comparing the p1 th entries on each side gives λ1 `1 = 0, so that λ1 = 0. Thus, we have
λ2 r2 + · · · + λk rk = 0.
Comparing the p2 th entries on each side gives λ2 `2 = 0, so that λ1 = λ2 = 0. Thus, we have
λ3 r3 + · · · + λk rk = 0.
Continuing in this way, we get λ1 = · · · = λk = 0, so that r1 , . . . , rk are linearly independent.
Corollary 3.68. The rank of a matrix is the number of nonzero rows in its row echelon form.
Exercise 3.69. Use reduction to REF to determine whether the following vectors
1 −4 6
3 , −6 , 12
0 3 −5
are linearly independent or not.
Exercise 3.70. Write down the augmented matrix of the system of linear equations
x + 2y + 3z = 1,
3x + 5y − 2z = 2,
4x + 7y + z = 3.
Find the reduced row echelon form of the augmented matrix. Then find the set of solutions.
For square matrices, using RREF gives us the following result.
Lemma 3.71. Let A be a n × n matrix. Then the following two conditions are equivalent:
• the matrix A is invertible;
• the reduced row echelon form of A is In .
Proof. Write R for the RREF of A. By Theorem 3.17, we have
• the matrix A is invertible ⇐⇒ ker(A) = {0},
• the matrix R is invertible ⇐⇒ ker(R) = {0}.
70 IVAN CHELTSOV
0 ··· 0 0 1
where ∗ denotes any real number. Moreover, every column containing a leading entry has zeros
everywhere else, so that we must have R = In .
Turn these into row vectors (i.e. take transposes) and put them together as the rows of a matrix.
In this example, this gives the matrix A from Example 3.73. Then
span v1 , v2 , v3 , v4 = row A .
Observe that the RREF of the matrix A is I3 , so that the matrix A is invertible by Lemma 3.71.
Moreover, the right-hand half of the 3 × 6 matrix is the inverse matrix A−1 , so that
−5/3 0 2/3
A−1 = 0 1/3 0 .
4/3 0 −1/3
In Example 3.78, we implicitly use one important observation. To describe it, fix a n×n matrix A.
Then each elementary row operation done to A can be obtained via a matrix multiplication
A 7→ M A
for an appropriate n × n matrix M . So, performing k elementary row operations that convert
A into its reduced row echelon form R, we obtain n × n matrices M1 , . . . , Mk such that
Mk · · · M2 M1 A = R.
| {z }
row operations
ACCELERATED ALGEBRA 73
Let us consider one more example that shows a potential problem for using this algorithm.
Example 3.79. Let A be the matrix
9 8 7
6 5 4
3 2 1
Let us try to find its inverse. Applying the same method as in Example 3.79, we get
8 7 1
9 8 7 1 0 0 1
R1 → R1
1 9 9 9
0 0
R2 →R2 −6R1
6 5 4 0 1 0 −−−−9−→ 6 5 4 0 1 0 −− −−−−−→ R3 → R3 − 3R1
3 2 1 0 0 1 3 2 1 0 0 1
8 7 1
8 7 1
1 9 9 9
0 0 1 9 9 9
0 0
R3 →R3 −2R2
−→ 0 −1 3
−2
3
−2
3
1 0 −− −−−−−→ 0 −1 3
−2
3
−2
3
1 0 .
−2 −4 −1
0 3 3 3
0 1 0 0 0 1 −2 1
We have not yet reduced A into its RREF, but this intermediate matrix already has a zero row.
Therefore, the matrix A is not invertible by Lemma 3.71, so that its inverse does not exists.
Exercise 3.80. Using row operations (not determinants), determine whether the following matri-
ces are invertible. If they are, find their inverses.
1 2 −1 1 1 −1
A = 1 1 1 , B = 3 1 2 .
1 −1 0 5 3 0
Let V be a linear subspace of Rn . How to find an orthonormal basis of this vector subspace?
To start with, we should find some basis y1 , . . . , ym of this subspace, and then use an algorithm
described in the proof of Lemma 2.73. It produces an orthonormal basis v1 , . . . , vm as follows:
y1
v 1 =
,
ky1 k
y2 − (y2 · v1 )v1
v2 = ,
ky2 − (y2 · v1 )v1 k
y3 − (y3 · v1 )v1 − (y3 · v2 )v2
v3 =
,
y3 − (y3 · v1 )v1 − (y3 · v2 )v2
..
.
ym − (ym · v1 )v1 − (ym · v2 )v2 − · · · − (ym · vm−1 )vm−1
v m =
kym − (ym · v1 )v1 − (ym · v2 )v2 − · · · − (ym · vm−1 )vm−1 k
This procedure is called the Gram–Schmidt process.
Example 3.81. Let V = span({y1 , y2 , y3 }), where
−5 1 1
3 0 −3
y1 =
1 , y2 = −4 , y3 = 2 .
1 3 0
74 IVAN CHELTSOV
Let us find an orthonormal basis of V. First, we check that y1 , y2 , y3 are linearly independent.
This can be done in one of the following ways:
• directly like in the solution to Exercise 2.39;
• by computing REF of the matrix (y1T , y1T , y1T ) and using Lemma 3.67;
• by observing that
3 0 −3
1 −4 2
= 3
−4 2
− 3
1 −4
= −39 6= 0.
3 0
1 3
1 3 0
We see that y1 , y2 , y3 form a basis of V. Thus, we can now apply the Gram–Schmidt process to
obtain an orthonormal basis v1 , v2 , v3 of the linear subspace V. To calculate v1 , we have
−5
y1 1 3
v1 = = .
ky1 k 6 1
1
To calculate v2 , we have
1 −5 1
0 1 3 1 3
y2 − (y2 · v1 )v1 =
−4 + 6 1 = 6 −23 ,
3 1 19
so that
1
y2 − (y2 · v1 )v1 1
3 .
v2 = =
ky2 − (y2 · v1 )v1 k 30 −23
19
To calculate v3 : we have
1 −5 1 −91
−3 1 3
+ + 3 3 = 1 −273
y3 − (y3 · v1 )v1 − (y3 · v2 )v2 =
2 3 1 50 −23 150 143
0 1 19 221
and so that
−91 −7
y3 − (y3 · v1 )v1 − (y3 · v2 )v2 1 −273 = 1 −21 .
v3 =
=
y3 − (y3 · v1 )v1 − (y3 · v2 )v2
390 143 30 11
221 17
Exercise 3.82. Let
1 2
2 6
w1 = 2 and w2 = 7 .
4 8
Use the Gram–Schmidt process to find an orthonormal basis of span({w1 , w2 }).
ACCELERATED ALGEBRA 75
4. Linear transformations
4.a. What is a linear transformation?
Non-linear means it’s hard to solve
Arthur Mattuck
n m
Let T : R → R be a function.
Definition 4.1. The function T is said to me a linear transformation or linear map if
(1) T (0) = 0;
(2) T (x + y) = T (x) + T (y) for all x and y in Rn ;
(3) T (λx) = λT (x) for all λ ∈ R and all x ∈ Rn .
Example 4.2. Let T : R2 → R2 be reflection in the x-axis. Then
x1 x1
T =
x2 −x2
for all x1 and x2 in R. Then T is a linear transformation, because
• T (0) = 0;
• for any x and y in R2 , we have
x1 + y 1 x1 + y 1 x1 y1
T x+y =T = = + =T x +T y ;
x2 + y 2 −(x2 + y2 ) −x2 −y2
• for any λ ∈ R and any vector x ∈ R2 , we have
cx1 cx1 x1
T λx = T = =λ = λT x .
cx2 −λx2 −x2
Example 4.3. Let T : R2 → R3 be a function that is given by
3x1 + 2x2
x1
7→ x2 − 4x1
x2
−x2
for every x1 and x2 in R. Then T is a linear transformation, because
• T (0) = 0;
• for any x and y in R2 , we have
3(x 1 + y 1 ) + 2(x 2 + y 2 )
x1 + y 1
T x+y =T = (x2 + y2 ) − 4(x1 + y1 ) =
x2 + y 2
−(x2 + y2 )
3x1 + 2x2 3y1 + 2y2
= x2 − 4x1 + y2 − 4y1 = T x) + T (y ;
−x2 −y2
• for any λ ∈ R and any x ∈ R2 , we have
3λx 1 + 2λx2 3x 1 + 2x 2
λx1
T λx = T = λx2 − 4λx1 = λ x2 − 4x1 = λT x .
λx2
−λx2 −x2
Example 4.4. If T : R2 → R2 is a function given by
2
x1 x1
7→
x2 x2
76 IVAN CHELTSOV
To see this, note that λPV (x) + µPV (y) ∈ V, because PV (x) and PV (y) are contained in V. Then
λx + µy − λPV x + µPV y = λ x − PV x + µ y − PV y ∈ V⊥ ,
Proof. By definition, the vector PV (x) is the unique vector in V such that
x − PV (x) ∈ V⊥ .
But by Lemma 2.72, the vector
m
X
x · vi vi .
i=1
dim(V) = 2 V is a plane through the origin, and PV maps x ∈ R3 to the closest point to x in this plane,
e.g. if V is spanned by the standard vectors e1 and e2 , then PV is given by
x1 x1
x2 7→ x2 ;
x3 0
0 0 xn
Since T is a linear transformation, we have
T x = x1 T e 1 + x2 T e 2 + · · · + xn T e n
for all x ∈ Rn , so that T is uniquely determined by the vectors T (e1 ), . . . , T (en ).
ACCELERATED ALGEBRA 79
Proof. First, let us prove that there is at most one such T. Suppose that there are two linear
transformations T : Rn → Rm and S : Rn → Rm such that
T v1 = u1 = S v1 ,
T v = u = S v ,
2 2 2
..
.
T v = u = S v .
n n n
y = µi v i
i=1
80 IVAN CHELTSOV
for some real numbers a and b such that v 6= 0. Let PV : R2 → R2 be the orthogonal projection
onto V. Show that PV is given by
!
b2 x1 −abx2
x1 2 2
7→ a2 xa2 −abx
+b .
x2 a2 +b2
1
find its standard matrix and compute its determinant. Let RV : R2 → R2 be the function given by
x 7→ x + 2 PV x − x .
Then RV is the reflection in V. Show that RV is given by
!
b2 x1 −abx2
x1 2 2
7→ a2 xa2 −abx
+b ,
x2 a +b2
2
1
conclude that it is a linear transformation, find its standard matrix and compute its determinant.
Linear transformations and matrices are not the same! A linear transformation is a function
with certain nice properties. But a matrix is just a grid of numbers.
Do we have other matrices with this this property? Yes, we do: we have
1973
n −1 n −1 n −1 n −1 n −1
MA M = MA M × MA M × · · · × MA M × MA M =
| {z }
1973 times
−1 n 1973
n n n n n
× M −1 =
=M ×A
| × A × A ×
{z · · · × A × A } ×M = M × A
1973 times
n
= M × A1973 × M −1 = M × I21973 × M −1 = M × M −1 = I2 ,
where M is any invertible 2 × 2 matrix, and n ∈ {1, 2, . . . , 1972}. Any other examples? No.
A linear transformation is said to be invertible if it has an inverse linear transformation.
Lemma 4.31. Let T : Rn → Rm be a linear transformation. Then
T is invertible ⇐⇒ T is bijective ⇐⇒ n = m and [T ] is invertible .
Moreover, if T is invertible, then [T −1 ] = [T ]−1 .
ACCELERATED ALGEBRA 85
Proof. Both assertions follows from Theorem 4.22 and Lemma 2.21.
Then im(T ) is a linear subspace of Rm , and ker(T ) is a linear subspace of Rn . Hence, it makes
sense to talk about their dimensions. Then we define the rank of the linear transformation T as
rank(T ) = dim im(T )
and
x1
3
ker(T ) = x2 ∈ R : x2 = x3 = 0 ,
x3
Exercise 4.38. Let T : R5 → R2 be a linear transformation. What are the possible values of its
rank and nullity? Give examples to show that the possibilities you say can occur really do occur.
When linear transformations are injective? When they are surjective?
Lemma 4.39. Let T : Rn → Rm be a linear transformation. Then
• T is injective ⇐⇒ ker(T ) = {0} ⇐⇒ null(T ) = 0;
• T is surjective ⇐⇒ im(T ) = Rm ⇐⇒ rank(T ) = m.
ACCELERATED ALGEBRA 87
Proof. Actually, we already proved these assertions implicitly earlier in the proof of Lemma 4.31.
Now let us do this explicitly. If T is injective, then ker(T ) = {0}. Vice versa, if ker(T ) = {0} and
T x =T y
for some x and y in Rn , then
T x − y = T x − T y = 0,
which implies that x − y ∈ ker(T ), which immediately gives x − y = 0, so that we have x = y.
Thus, if ker(T ) = {0}, then T is injective. But ker(T ) = {0} ⇐⇒ null(T ) = 0 by Lemma 2.50.
Similarly, we see that T is surjective ⇐⇒ im(T ) = Rm ⇐⇒ rank(T ) = m.
Corollary 4.40. Let T : Rn → Rn be a linear transformation. Then
T is injective ⇐⇒ T is bijective ⇐⇒ T is surjective .
Proof. By Theorem 4.37 and Lemma 4.39, we have
T is injective ⇐⇒ null(T ) = 0 ⇐⇒ rank(T ) = n ⇐⇒ T is surjective.
Thus, if T is either injective or surjective then it is both, that is, bijective.
Exercise 4.41. Let n and m be natural numbers. Prove the following assertions:
(i) if there exists an injective linear transformation Rn → Rm , then n 6 m;
(ii) if there exists a surjective linear transformation Rn → Rm , then n > m.
Exercise 4.42. Let T be the unique linear transformation R3 → R4 such that
2 −2 −2
1 3 0 3 0 0
T 0 = 0 , T 1 = 8 , T 0 = 4 .
0 0 1
1 13 6
Is T injective? Justify your answer.
for some real numbers b1j , . . . , bnj . This gives us the following n × n matrix
b11 b12 . . . b1n
b21 b22 . . . b2n
B= ... .. . . . .
. . ..
bn1 bn2 . . . bnn
If v1 , . . . , vn is the standard basis of Rn , then B = [T ] by definition. In general, we say that
F B is the matrix of the linear operator T with respect to basis v1 , . . . , vn .
Example 4.44 (cf. Exercise 3.40). Let T = TA , where TA : R3 → R3 is the linear operator with
7 −12 6
A = 10 −19 10 .
12 −24 13
For every x ∈ Rn , we have T (x) = Ax. Thus, we have
T e 1 = 7e1 + 10e2 + 12e3 ,
T e2 = −12e1 − 19e2 − 24e3 ,
T e3 = 12e1 − 24e2 + 13e3 .
Note that A = M BM −1 and B = M −1 AM . This is not a coincidence (see Theorem 4.45 below).
Using A = M BM −1 , we can easily compute An for any n ∈ Z (cf. Example 4.30). Namely, we have
n
An = M BM −1 = M BM −1 × M BM −1 × · · · × M BM −1 × M BM −1 =
| {z }
n times
−1 n −1
=M ×B | ×B×B× {z· · · × B × B} ×M = M × B × M =
n times
n
3 2 −1 (−1) 0 0 −1 2 −1 4 − 3(−1)n 6(−1)n − 6 3 − 3(−1)n
= 5 1 0 0 1 0 5 −9 5 = 5 − 5(−1)n 10(−1)n − 9 5 − 5(−1)n
6 0 1 0 0 1 6 −12 7 6 − 6(−1)n 12(−1)n − 12 7 − 6(−1)n
This examples shows one useful feature: if there are real numbers λ1 , . . . , λn such that
T v 1 = λ1 v 1 ,
T v 2 = λ 2 v 2 ,
..
.
T v = λ v ,
n n n
• M for the matrix (v1 |v2 | · · · |vn ), i.e. the matrix with columns v1 , . . . , vn .
Then M is invertible, A = M BM −1 and B = M −1 AM .
Proof. The matrix M is invertible by Theorem 3.17. To complete the proof, we have to prove that
AM = M B.
Let us do thus by evaluating T (vk ) for each k ∈ {1, . . . n} in two different ways. We have
n n n n X n n
X X X X X
T vk = bik vi = bik mji ej = mji bik ej = (M B)jk ej .
i=1 i=1 j=1 j=1 i=1 j=1
so that (M B)jk = (AM )jk for every j and k in {1, . . . n}. This means that M B = AM .
The matrix M in Theorem 4.45 is called change of basis matrix.
Example 4.46. Let v1 and v2 be the vectors in R2 such that
3 7
v1 = and v2 = .
1 4
Note that v1 and v2 are linearly independent, so that they form a basis of the vector space R2 .
By Proposition 4.13, there exists unique linear transformation T : R2 → R2 such that
(
T v1 = 2v1 ,
T v2 = −v2 .
Then the matrix of T with respect to the basis v1 , v2 is
2 0
B= .
0 −1
The change of basis matrix M is
3 7
M= .
1 4
By Theorem 4.45, the standard matrix A = [T ] is given by
−1
−1 3 7 2 0 3 7 1 31 −63
A = M BM = = .
1 4 0 −1 1 4 5 12 −26
Exercise 4.47. By Proposition 4.13, there is unique linear transformation T : R2 → R2 such that
3 1 1 301
T = and T = .
2 5 5 205
Find the standard matrix of T and compute its determinant.
ACCELERATED ALGEBRA 91
Theorem 4.45 tells us that matrices of the same linear operator with respect to different bases
are related in a certain way. It is useful to have some terminology for this.
Definition 4.48. Let A and B be n × n matrices. We say that A and B are similar to B if
A = M BM −1
for some invertible n × n matrix M . If A and B are similar, we write A ∼ B.
For instance, in Example 4.46, we proved that
1 31 −63 2 0
∼ .
5 12 −26 0 −1
Similarly, in Example 4.44, we proved that
7 −12 6 −1 0 0
10 −19 10 ∼ 0 1 0 .
12 −24 13 0 0 1
Being similar is an equivalence relation on the set of all n × n matrices:
Lemma 4.49. Let A, B and C be n × n matrices. Then the following assertions hold:
(i) A ∼ A;
(ii) if A ∼ B, then B ∼ A;
(iii) if A ∼ B and B ∼ C, then A ∼ C.
Proof. To prove (i), observe that A = In AIn−1 , so that A ∼ A.
To prove (ii), suppose that A ∼ B. Then
A = M BM −1
for some invertible n × n matrix M . Put Q = M −1 . Then
B = M −1 AM = QAQ−1
and Q is invertible, so that B ∼ A.
To prove (iii), suppose that A ∼ B and B ∼ C. Then
(
A = M BM −1
B = QCQ−1
for some invertible n × n matrices M and Q. Let R = M Q. Then
−1
A = M QCQ−1 M −1 = M Q C M Q = RCR−1
If the answer to Question 5.2 is yes for a given linear operator T : Rn → Rn , then we say that
this linear operator is diagonalizable. This brings us to the following (important) definition:
Definition 5.3. Let T : Rn → Rn be a linear operator.
(1) An eigenvalue of T is a real number λ such that
T x = λx
Ax = λx
for some non-zero vector x ∈ Rn .
(2) An eigenvector of A with eigenvalue λ is non-zero vector x ∈ Rn such that Ax = λx.
If T : Rn → Rn is a linear operator, then Theorem 4.22 gives
T x = T x
for every vector x ∈ Rn . Thus, the eigenvalues and eigenvectors of a linear operator T are exactly
the eigenvalues and eigenvectors of its standard matrix [T ].
Example 5.5. Let A be a n × n diagonal matrix
λ1 0 0 ··· 0
0 λ2 0 ··· 0
. .. . . .. ..
.. . . . .
0 0 0 λ 0
n−1
0 0 0 · · · λn
where λ1 , . . . , λn are some real numbers. Then λ1 , . . . , λn are eigenvalues of the diagonal matrix A.
Let e1 , . . . , en be the standard basis of the vector space Rn . Then e1 , . . . , en are eigenvectors of
the matrix A with eigenvalues λ1 , . . . , λn , respectively.
Example 5.6. Let V be the linear subspace in R4 given by
x1 + x2 + 2x3 − x4 = 0,
let PV : R4 → R4 be the orthogonal projection onto V, let RV : R4 → R4 be the function such that
x 7→ x + 2 PV x − x
Then the vectors v1 , v2 , v3 and v4 are linearly independent, so that they form a basis of R4 .
Note also that v1 , v2 and v3 are contained in V, so that they form a basis of the vector subspace V.
Furthermore, the vector v4 forms a basis of the orthogonal complement to V. Then
R V v 1 = v1 ,
RV v2 = v2 ,
R V v 3 = v3 ,
RV v4 = −v4 .
Then 1 and −1 are eigenvalues of the linear operator RV , every non-zero vector in V is an eigenvec-
tor of the linear operator RV with eigenvalue 1, and every non-zero vector in V⊥ is an eigenvector
with eigenvalue −1. What are eigenvalues and eigenvectors of the orthogonal projection PV ?
Example 5.7. Let Rθ : R2 → R2 be the rotation by θ about 0, where θ ∈ R such that 0 6 θ < 2π.
If θ = 0, then Rθ = IdR2 and all vectors in R2 are its eigenvectors with eigenvalue 1. If θ = π, then
Rθ x = −x
for every vector x ∈ R2 , so that every vector in R2 is an eigenvector of Rθ with eigenvalue −1.
Finally, if θ 6= 0 and θ 6= π, then the operator Rθ does not have real eigenvalues.
Example 5.8. Let e1 and e2 be standard basis vectors in R2 , and let
0 1
A= .
0 0
Then Ae1 = 0 and Ae2 = e1 , so that e1 is an eigenvector of the matrix A with eigenvalue 0.
If x ∈ R2 is another eigenvector of this matrix, then
x1 x
A =λ 1
x2 x2
for some number λ. Then
x2 x1 x1 λx1
=A =λ =
0 x2 x2 λx2
which gives
λx1 = x2 ,
λx2 = 0.
If λ 6= 0, this gives x1 = 0 and x2 = 0, so that x = 0, which is not allowed. Then λ = x2 = 0 and
x = x1 e 1 .
Hence, the only eigenvalue of A is 0, and all eigenvectors of this matrix are scalar multiple of e1 .
This implies that A is not diagonalizable.
Exercise 5.9. Let A and B be n × n matrices. Prove or disprove the following assertions:
(i) If λ is an eigenvalue of A and B, then λ is an eigenvalue of AB.
(ii) If v is an eigenvector of A and B, then v is an eigenvector of AB.
Eigenvectors are not allowed to be zero, but eigenvalues can be. In fact, for a n × n matrix A,
the eigenvectors of A with eigenvalue 0 are exactly the nonzero elements of the kernel of A, because
Ax = 0x ⇐⇒ Ax = 0.
Thus, we see that 0 is an eigenvalue of A ⇐⇒ det(A) = 0 ⇐⇒ A is not invertible.
Proposition 5.10. Let T : Rn → Rn be a linear operator. The following conditions are equivalent:
96 IVAN CHELTSOV
It is useful to think about the set of all the eigenvectors of T that share a particular eigenvalue:
Definition 5.11. Let T : Rn → Rn be a linear operator, let A be a n × n matrix, and let λ ∈ R.
Then the λ-eigenspace of the linear operator T is
n o
Eλ (T ) = x ∈ Rn : T x = λx .
Example 5.12. Let RV : R4 → R4 be the linear operator (reflection) described in Example 5.6.
Then V is the 1-eigenspace of the operator RV , and V⊥ is its (−1)-eigenspace.
If A is a n × n matrix, then E0 (A) = ker(A). More generally, we have
Lemma 5.13. Let A be a n × n matrix, and let λ be a scalar. Then Eλ (A) = ker(A − λIn ).
Proof. For every vector x ∈ Rn , we have
x ∈ Eλ (A) ⇐⇒ Ax − λx = 0 ⇐⇒ A − λIn x = 0 ⇐⇒ x ∈ ker A − λI
as required.
Corollary 5.14. Let T : Rn → Rn be a linear operator, let A be a n × n matrix, and let λ ∈ R.
Then Eλ (T ) and Eλ (A) are linear subspaces of Rn .
Proof. Using Lemmas 2.5 and 5.13, we see that Eλ (A) is a linear subspace. Since Eλ (T ) = Eλ ([T ]),
we see that Eλ (T ) is a linear subspace.
Thus, if λ is an eigenvalue of A, then there exists at least one eigenvector with eigenvalue λ.
Exercise 5.16. Let A be a n × n matrix. Prove the following assertions:
(i) If λ is an eigenvalue of A, then λ2 is an eigenvalue of A2 .
(ii) If λ is an eigenvalue of A, then λ3 − 2λ + 5 is an eigenvalue of A3 − 2A + 5In .
(iii) If A invertible, then the eigenvalues of A−1 are the reciprocals of the eigenvalues of A
ACCELERATED ALGEBRA 97
for some numbers λ1 , . . . , λn . Question 5.1 asks when such matrix M exists and how to find it.
To get a nice answer to Question 5.1, we let
χA λ = det A − λIn .
for some real numbers a0 , a1 , . . . , an−1 . Observe that a0 = χA (0) = det(A). One can show that
an−1 = (−1)n tr(A),
where tr(A) is the sum of the diagonal elements of A (see Exercise 1.33). For instance, if
7 −12 6
A = 10 −19
10 ,
12 −24 13
then, using the formula for the determinant (3.23), we see that
2
χA λ = −λ3 + λ2 + λ − 1 = − λ − 1 λ + 1 .
as required.
Corollary 5.22. Similar matrices have the same eigenvalues.
Thus, if the matrix A is diagonalizable and M is an invertible n × n such that
M −1 AM = diag λ1 , λ2 , . . . , λn
Thus, the matrix of the linear operator T with respect to the basis v1 , v2 , v3 is
2 0 0
B = 0 3 0 ,
0 0 −1
Hence, it follows from Theorem 4.45 that M −1 AM = B. One can explicitly check that this is true:
2 2 −1 1 2 1 1 1 1 2 0 0
M −1 AM = −1 −1 1 0 −1 0 0 0 −1 = 0 3 0 .
0 −1 0 −2 −2 4 1 2 0 0 0 −1
Thus, we see that A is diagonalizable.
Exercise 5.24. Let A be a diagonalizable n × n matrix such that all its eigenvalues are equal.
What can you say about A?
Now we are ready to answer Question 5.1.
Proposition 5.25. Let A, M and B be n × n matrices such that M is invertible, and
B = diag λ1 , λ2 , . . . , λn
for some real numbers λ1 , . . . , λn . Write M = (v1 |v2 | · · · |vn ). Then
Av1 = λ1 v1 ,
Av2 = λ2 v2 ,
−1
M AM = B ⇐⇒ .
..
Avn = λn vn .
However, Proposition 5.25 is not always applicable, because not all square matrices are diago-
nalizable. We saw this already in Example 5.8. Let us consider three additional examples.
Example 5.29. Let
1 1 0
A = 0 1 1 ,
0 0 1
Then χA (λ) = −(λ − 1)3 , so that 1 is the only eigenvalue of the matrix A. Observe that
0 1 0
A − I2 = 0 0 1 ,
0 0 0
so that A has just one eigenvector up to scalling: the first standard basis vector e1 of the space R3 .
Because of this, the matrix A is not diagonalizable. Indeed, if it were diagonalizable, there would
exists an invertible 3 × 3 matrix M such that
λ1 0 0
M −1 AM = 0 λ2 0
0 0 λ3
for some real numbers λ1 , λ2 and λ3 , so that λ1 = 1, λ2 = 1 and λ3 = 1 by Corollary 5.22, which
would imply that M −1 AM = I3 , so that A = M I3 M −1 = M M −1 = I3 , which is absurd.
Example 5.30. Let
3 −2
A= ,
4 −1
Then χA (λ) = λ2 − 2λ + 5, so that A does not have real eigenvalues. Therefore, by Corollary 5.22,
the matrix A is not diagonalizable over the real numbers. However, the polynomial equation
λ2 − 2λ + 5 = 0
has two complex solutions: 1 + 2i and 1 − 2i. Therefore, if we are allowed to use complex numbers,
we can diagonalize A. Indeed, arguing exactly as in Example 5.23, we can obtain the matrix
2 + 2i 2 − 2i
M=
4 4
whose columns are eigenvectors of A with eigenvalues 1 + 2i and 1 − 2i, respectively. Then
i 1+i
−1 −4 8 3 −2 2 + 2i 2 − 2i 1 + 2i 0
M AM = i 1−i = .
4 8
4 −1 4 4 0 1 − 2i
Example 5.31. Let θ be a real number such that 0 6 θ < 2π, and let
cos θ − sin θ
A= .
sin θ cos θ
Then the characteristic polynomial of the matrix A is
cos θ − λ − sin θ
χA (λ) = det(A − λI2 ) =
= λ2 − 2 cos θλ + 1.
sin θ cos θ − λ
If θ 6∈ {0, π}, then χA (λ) has no real roots, so that A is not digonalizable over R by Corollary 5.22.
Observe also that cos θ + sin θi and cos θ − sin θi are two complex eigenvalues of the matrix A.
Moreover, finding the corresponding complex eigenvectors and using Proposition 5.25, we get
−1
i 1 cos θ − sin θ i 1 cos θ + i sin θ 0
= .
1 i sin θ cos θ 1 i 0 cos θ − i sin θ
102 IVAN CHELTSOV
...
λ −λ
m m−1 µm−1 = 0.
Proof. Omitted.
Both numbers γA (δ) and µA (δ) are not hard to compute using the tools we already have.
Example 5.40. Let
3 0 0
A = 0 3 0 .
0 0 5
Then χA (λ) = −(λ − 3)2 (λ − 5), so that the eigenvalues of A are 3 and 5. We have
x
E3 A = ker A − 3I3 = y : x and y are real numbers = span e1 , e2 ,
0
Then γA (−3) = µA (−3) = 2 and γA (8) = µA (8) = 1. But A is not diagonalizable over R. Why?
Note that i and −i are complex eigenvalues of the matrix A such that
γA (i) = µA (i) = γA (−i) = µA (−i) = 1.
One can show that A is diagonalizable over C.
Now the crucial fact is:
Lemma 5.44. Let A and B be similar n × n matrices, and let δ be a real number. Then
γA (δ) = γB (δ),
µA (δ) = µB (δ).
Proof. By Lemma 5.21, we have µA (δ) = µB (δ). To prove γA (δ) = γB (δ), we have to show that
dim ker A − δIn = dim ker B − δIn .
By Lemma 5.21, the number δ is an eigenvalue of A ⇐⇒ it is an eigenvalue of B. This gives
dim ker A − δIn 6= 0 ⇐⇒ dim ker B − δIn 6= 0.
Therefore, to complete the proof, we may assume that ker(B − δIn ) 6= {0}.
Let v1 , . . . , vr be a basis of the vector subspace ker(B − δIn ). Then
γB (δ) = dim ker B − δIn = r.
But Bvi = δvi for every i ∈ {1, . . . , r}. On the other hand, since A and B are similar, we have
A = M BM −1
for some invertible n × n matrix M , so that AM = M B. Then
AM vi = M Bvi = δM vi
for every i ∈ {1, . . . , r}. Thus, we see that M v1 , . . . , M vr are eigenvectors of the matrix A, so that
span M v1 , . . . , M vr ⊂ ker A − δIn .
Now, arguing as in the solution to Exercise 2.41, we see that the vectors M v1 , . . . , M vr are linearly
independent. This shows that
γA (δ) = dim ker A − δIn > r = γB (δ).
Theorem 5.45. Let A be a n × n matrix, and let λ1 , . . . , λm be all its distinct (real) eigenvalues.
Then A is diagonalizable (over real numbers) if and only if
γA (λ1 ) = µA (λ1 )
γA (λ2 ) = µA (λ2 )
..
.
γA (λm ) = µA (λm )
and also γA (λ1 ) + γA (λ2 ) + · · · + γA (λm ) = n. We have to show that the matrix A is diagonalizable.
The proof of this assertion is based on Corollary 5.33 and is very similar to the proof of Lemma 5.33.
Hence, for transparency, we will prove it only in for n = 2 and n = 3.
First we suppose that n = 2. If m = 2, then the matrix A is diagonalizable by Corollary 5.33.
Thus, we may assume that m = 1. Then µA (λ1 ) = 2, so that
χA (λ) = (λ − λ1 )2 .
Since γA (λ1 ) = µA (λ1 ) = 2, there are two linearly independent eigenvectors with eigenvalue λ1 .
Then A is diagonalizable by Corollary 5.26.
Now we suppose that n = 3. If m = 3, then A is diagonalizable by Corollary 5.33. If m = 1,
then
µA (λ1 ) = γA (λ1 ) = n = 3,
which implies that there are 3 linearly independent eigenvectors of the matrix A with eigenvalue λ1 ,
so that A is diagonalizable by Corollary 5.26. Thus, we may assume that m = 2. Then
γA (λ1 ) + γA (λ2 ) = 3.
Since γA (λ1 ) > 1 and γA (λ2 ) > 2, either γA (λ1 ) = 1 and γA (λ2 ) = 2, or γA (λ1 ) = 2 and γA (λ2 ) = 1.
Without loss of generality, we may assume that γA (λ1 ) = 1 and γA (λ2 ) = 2.
Let v1 be an eigenvector of A with eigenvalue λ1 , let v2 and v3 be linearly independent eigen-
vectors of A with eigenvalue λ2 . We claim that the vectors v1 , v2 and v3 are linearly independent.
Indeed, let µ1 , µ2 and µ3 be scalars such that
µ1 v1 + µ2 v2 + µ3 v3 = 0.
Let us show that µ1 = µ2 = µ3 = 0. Multiplying the previous vector equality by A, we get
µ1 λ1 v1 + µ2 λ2 v2 + µ3 λ2 v3 = 0.
One the other hand, multiplying µ1 v1 + µ2 v2 + µ3 v3 = 0 by λ1 , we get
λ1 µ1 v1 + λ1 µ2 v2 + λ1 µ3 v3 = 0.
Then, subtracting the last two equalities from each other, we get
(λ1 − λ2 )µ2 v2 + (λ1 − λ2 )µ3 v3 = 0.
ACCELERATED ALGEBRA 107
µ1 v1 = µ1 v1 + µ2 v2 + µ3 v3 = 0,
Let us conclude this section by translating its main results into the language of linear operators.
Namely, let T : Rn → Rn be a linear operator. By Corollary 4.50 and Lemma 5.21, we may define
the characteristic polynomial χT (λ) of the operator T as
χT (λ) = det A − λIn
where A is the matrix of T with respect to any basis of Rn . Then the eigenvalues of T are real
roots of the polynomial χT (λ).
Recall from Section 5.a that T is diagonalizable if there exists a basis v1 , . . . , vn of Rn such that
T v1 = λ1 v1
T v = λ2 v2
2
..
.
T v = λn−1 vn−1
n−1
T vn = λn vn
for some scalars λ1 , . . . , λn . By Theorem 4.45, Proposition 5.25 and Corollary 5.26, the following
conditions are equivalent:
(1) the linear operator T is diagonalizable;
(2) the standard matrix of T is diagonalizable;
(3) the matrix of T with respect to some basis of Rn is diagonal;
(4) there is a basis of Rn consisting of eigenvectors of the linear operator T ;
(5) there exist n linearly independent eigenvectors of the linear operator T .
Moreover, it follows from Corollary 5.34 that T is diagonalizable if T has n distinct eigenvalues.
For δ ∈ R, the number γT (δ) = dim(Eδ (T )) is called the geometric multiplicity of the number δ,
and the number n o
r
µT (δ) = max r ∈ Z>0 : (λ − δ) divides χT λ
is called the algebraic multiplicity of the number δ. By Lemma 5.44, we have
γT (δ) = γA (δ),
µT (δ) = µA (δ),
where A is the matrix of T with respect to any basis of Rn . Moreover, it follows from Theorem 5.45
that T is diagonalizable if and only if γT (λ1 ) + γT (λ2 ) + · · · + γT (λm ) = n and
γT (λ1 ) = µT (λ1 ),
γT (λ2 ) = µT (λ2 ),
..
.
γT (λm ) = µT (λm ),
for some (not necessarily distinct) complex numbers λ1 , λ2 , . . . , λn . These numbers are the complex
roots of the characteristic polynomial χA (λ). Note that all or some of them can be real.
Lemma 5.52. If χA (λ) has a non-real complex root, then A is not diagonalizable over R.
Proof. Follows from Theorem 5.45 or its proof.
This obstruction disappears if we consider complex n × n matrices instead of real n × n matrices.
To do this properly, we have to repeat many our definitions and proofs over complex numbers,
which is easy to do: one just have to change R to C everywhere. We will not do this here in details.
Instead, let us just list the most important definitions and results without proofs.
From now on and until the end of this section, we assume that
A is a n × n complex matrix
so that A is a n × n matrix with entries in C. Then
• an eigenvalue of the matrix A is a complex number λ such that
Ax = λx
n
for some non-zero vector x ∈ C ;
110 IVAN CHELTSOV
so that Eλ (A) is linear subspaces of Cn . Then the following conditions are equivalent:
(1) λ is an eigenvalue of A;
(2) one has Eλ (A) 6= {0};
(3) the matrix A − λIn is not invertible;
(4) det(A − λI) = 0.
Exercise 5.53. Let A be a n × n complex matrix with entries Aij , and let λ be its eigenvalue.
For each i ∈ {1, . . . , n}, let
ri = Ai1 + · · · + Ai,i−1 + Ai,i+1 + · · · + Ai,n ,
so that ri is the sum of the absolute values of all the entries in the ith row except |Aii |.
(i) Prove that there exists i ∈ {1, . . . , n} such that
λ − Aii 6 ri .
This result is the Gershgorin Circle Theorem. Hints: Choose an eigenvector x with eigen-
value λ, and choose i such that
xi = max |x1 |, . . . , |xn | .
Then consider the equation Ax = λx at its ith coordinate, and use the triangle inequality.
(ii) Use the Gershgorin Circle Theorem to prove that for every eigenvalue λ of A, one has
n
X
λ 6 max Aij .
16i6n
j=1
Exercise 5.55. Let A be a 2 × 2 complex matrix, let λ1 and λ2 be its two eigenvalues. Show that
χA (λ) = λ2 − tr(A)λ + det(A),
where tr(A) is the sum of the diagonal entries of A. Deduce that
χA (0) = det(A) = λ1 λ2
and tr(A) = λ1 + λ2 . Conclude (without calculations) that the matrix
198 301
486 261
has two distinct real eigenvalues such that one of them is positive and one of them is negative.
The formula in this exercise holds in general. Indeed, let
χA λ = (−1)n λn + an−1 λn−1 + · · · + a1 λ + a0 ,
where tr(A) is the sum of the diagonal elements of A, known as the trace of the matrix A.
Exercise 5.56. Let A and B be n × n complex matrices.
• Show that AB and BA have the same characteristic polynomials.
• What can we tell about the eigenvalues of AB if we know the eigenvalues of A and B?
Two n × n complex matrices A and B are said to be similar if A = M BM −1 for some complex
invertible n × n matrix M . If the matrices A and B are similar, then
• they have the same characteristic polynomial and the same eigenvalues;
• one has dim(Eδ (A)) = dim(Eδ (B)) for every complex number δ.
Moreover, there exists an invertible n × n complex matrix M such that
λ1 0 0 ··· 0
0 λ2 0 ··· 0
−1
. . . . ..
M AM = .
. .
. . . .
. .
0 0 0 λ 0
n−1
0 0 0 ··· λn
for some complex numbers λ1 , . . . , λn if and only if the columns of M are linearly independent
eigenvectors v1 , . . . , vn of the matrix A such that
Av1 = λ1 v1 ,
Av2 = λ2 v2 ,
..
.
Avn = λn vn .
In this case, the matrix A is said to be diagonalizable. Then the following conditions are equivalent:
(1) the matrix A is diagonalizable;
(2) there is a basis of Cn consisting of (complex) eigenvectors of the matrix A;
(3) the matrix A has n linearly independent (complex) eigenvectors.
Moreover, if A has n distinct eigenvalues, then A is diagonalizable.
112 IVAN CHELTSOV
Exercise 5.57. Which of the following matrices are diagonalizable over complex numbers?
4 −5 7 −1 3 −1 4 7 −5 4 2 −5
1 −4 9 , −3 5 −1 , −4 5 0 , 6 4 −9 .
−4 0 5 3 3 1 1 9 −4 5 3 −7
For a n × n complex matrix A and an arbitrary complex number δ, we let γA (δ) = dim(Eδ (A)).
We say that γA (δ) is the geometric multiplicity of the number δ. Similarly, we let
n o
r
µA (δ) = max r ∈ Z>0 : (λ − δ) divides χA λ ,
and we say that µA (δ) is the algebraic multiplicity of the number δ. Then
γA (δ) 6 µA (δ).
Moreover, we have the following diagonalization criterion:
Theorem 5.58. Let A be a n × n complex matrix, and let λ1 , . . . , λm be all its distinct (complex)
eigenvalues. Then A is diagonalizable (over complex numbers) if and only if
γA (λ1 ) = µA (λ1 )
γA (λ2 ) = µA (λ2 )
..
.
γA (λm ) = µA (λm )
so that the diagonal matrix M −1 AM is the matrix of the linear operator T in the basis v1 , . . . , vn .
This illustrates Theorem 4.45, which is also valid for complex linear operators.
If A is not diagonalizable, we can still find an invertible matrix M such that M −1 AM looks
rather simple. For example, let
5 4 2 1
0 1 −1 −1
A= −1 −1 3
.
0
1 1 −1 2
Then χA (λ) = (λ − 1)(λ − 2)(λ − 4)2 , so that the eigenvalues of A are 1, 2 and 4. One has
dim E1 A = dim E2 A = dim E4 A = 1.
ACCELERATED ALGEBRA 113
Thus, up to scaling, we have exactly one eigenvector with eigenvalue 1, 2 or 4. Namely, let
−1 1 1
1 −1 0
v1 = 0 , v2 = 0 , v3 = −1
0 2 1
Then v1 , v2 and v3 are eigenvectors of the matrix A with eigenvalues 1, 2 and 4, respectively.
Note that A is not diagonalizable by Theorem 5.58. The reason for this is that A does not have four
linearly independent eigenvectors. But the matrix A has three linearly independent eigenvectors!
These are our eigenvectors v1 , v2 and v3 . These three vectors are indeed linearly independent.
For instance, this follows from Lemma 5.33, which is valid for complex matrices. Let
a
b
v4 =
c
d
for some complex numbers a, b, c and d. When v1 , v2 , v3 , v4 are linearly independent? Let
−1 1 1 a
1 −1 0 b
M =
0 0 −1 c
0 1 1 d
Then det(M ) = −a−2b, which shows that v1 , v2 , v3 , v4 are linearly independent ⇐⇒ a+2b 6= 0.
Thus, we suppose that a + 2b 6= 0. Then M is invertible, and v1 , v2 , v3 , v4 form a basis of C4 .
One can check that
1 0 0 −3b − 3c − 3d
0 2 0 −2c − 2d
M −1 AM = 0 0 4
a+b+c
0 0 0 4
This is the matrix of the linear operator T in the basis v1 , v2 , v3 , v4 .
Question. What is the most simplest form of the matrix M −1 AM that we can get?
The answer to this question really depends on person’s taste. In my opinion, M −1 AM is the most
simplest when
− 3b − 3c − 3d = 0,
− 2c − 2d = 0,
a + b + c = 1.
Solving this system, we see that a = t + 1, b = 0, c = −t, d = t, where t is any complex number.
For example, we can put a = 1 and b = c = d = 0, so that
1
0
v4 =
0 .
0
114 IVAN CHELTSOV
Note that we could find v4 by solving the equation (M − 4I4 )v4 = v3 . In this case, we have
T v1 = v1 ,
T v2 = 2v2 ,
T v 3 = 4v3 ,
T v4 = 4v4 + v3 ,
where λ ∈ C. Here, we have λ on the diagonal and 1 in each entry above the diagonal. The proof
of this result is beyond the scope of these notes.
Let us conclude this section by one result about complex square matrices that will be used later.
To state it, observe that for a n × n complex matrix A, its complex conjugate A is a n × n matrix
whose (i, j)-entry is the complex conjugate of the (i, j)-entry of the matrix A. Similarly, we can
define complex conjugates of any matrices and vectors. Complex conjugation preserves addition
and multiplication of matrices. For instance we have
A + B = A + B and AB = A B
for every complex n × n matrices A and B. Now we are ready to state our result:
T
Lemma 5.60. Let A be a n × n complex matrix such that A = A . Then its eigenvalues are real.
Proof. Let λ ∈ C be an eigenvalue of A. Choose an eigenvector x ∈ Cn with eigenvalue λ. Then
x1
x2
x= ...
xn
for some complex numbers x1 , . . . , xn . Then xT x is real and positive, because
x1
n n
T
x2 X X
x x = x1 x2 · · · xn .. = xi xi = |xi |2 > 0,
. i=1 i=1
xn
since x 6= 0. The number xT x is the length of the complex vector x. On the other hand, we have
T T T T
λ xT x = λx x = Ax x = Ax x = xT A x = xT Ax = xT λx = λxT x,
for every vector x ∈ Rn , so that T preserves lengths, which also implies that it preserves angles
between vectors by Definition 1.19. In fact, the transformation T is orthogonal if and only if
T (x)
=
x
Now let us show that (ii) ⇐⇒ (iv). To do this, let v1 , . . . , vn be the columns of the matrix A.
Let us use the following convention: the (p, q)-entry of a matrix M is written as Mpq . Then
n
X n
X
T T
A A ik = A ij Ajk = Aji Ajk = vi · vk
j=1 j=1
for every i and k in {1, . . . , n}. Thus, the equality AT A = In holds if and only if
1 if i = j
vi · v k =
0 if i 6= j
for every i and k in {1, . . . , n}. This show that (ii) ⇐⇒ (iv).
Similarly, we see that (iii) ⇐⇒ (v). This completes the proof of the theorem.
Let A be a real n × n matrix. Inspired by Theorem 5.64, we give the following
Definition 5.65. We say that A is orthogonal if AT A = In or AAT = In .
By Theorems 4.22 and 5.64, we have
A is orthogonal ⇐⇒ TA is orthogonal .
Example 5.66. Let Rθ : R2 → R2 be the rotation by an angle θ anticlockwise about the origin,
where θ is some real number such that 0 6 θ < 2π. Then
cos θ − sin θ
Rθ = .
sin θ cos θ
This matrix is orthogonal, since
T cos θ sin θ cos θ − sin θ 1 0
[Rθ ] [Rθ ] = = .
− sin θ cos θ sin θ cos θ 0 1
Thus, we see that Rθ is orthogonal.
Exercise 5.67. Which of the following matrices are orthogonal?
√
1 1 1 1
2 2 −1 2 2 −1
!
3 1
2 0 1 −1 − 1 1 1 1 1 −1 −1
, , 12 √32 , 2 −1 2 , −1 2 2 , .
0 2 −1 1 3 3 2 −1 1 −1 1
2 2 −1 2 2 2 −1 2
−1 1 1 −1
It easily follows from Lemma 5.63 that a linear transformation T : Rn → R2 is orthogonal if and
only if kT (x)k = kxk for every vector x ∈ Rn . By Theorem 5.64, the same holds for matrices.
Corollary 5.68. Let A be a n × n matrix. Then A is orthogonal if and only if
kAxk = kxk
n
for every x ∈ R .
Proof. For consistency, let us prove this in details If A is orthogonal, then Theorem 5.64 gives
q √
Ax
= Ax · Ax = x · x =
x
for every x ∈ Rn .
Conversely, suppose kAxk = kxk for every x ∈ Rn . Let us show that
Ax · Ay = x · y
for every x and y in Rn . Then Theorem 5.64 would imply that A is orthogonal.
118 IVAN CHELTSOV
b = cos(v),
d = sin(v).
ACCELERATED ALGEBRA 119
Moreover, we may assume that 0 6 u < 2π and 0 6 v < 2π. On the other hand, we have
ad − bc = 1,
because det(A) = 1. Now, using ad − bc = 1 and ab + cd = 0, we get
cos(u) sin(v) − sin(u) cos(v) = 1,
Therefore, it makes little difference whether we work with linear operators or square matrices.
From now on, we stick with matrices.
Example 5.82. Let
5 −1
A= .
−1 5
Then χA (λ) = (λ − 4)(λ − 6). Moreover, the vectors
1 −1
and
1 1
are eigenvectors with eigenvalues 4 and 6, respectively. They are orthogonal but not orthonormal.
We scale them to make them orthonormal, and define P to be the matrix with these orthonormal
eigenvectors as its columns:
√ ! √
2
√2
−√ 22
P = 2 2
.
2 2
Then P is orthogonal, and
√ √ ! √ √ !
2 2 2
−√ 22
−1 T 2√ √2
5 −1 √2
4 0
P AP = P AP = =
− 2 2 −1 5 2 2 0 6
2 2 2 2
we may assume that kv1 k = 1. Now, using Lemma 2.73, we can find some orthonormal basis
v 1 , v 2 , . . . , vn
of the vector space Rn that contains v1 .
Put Q = (v1 |v2 | · · · |vn ). Then Q has orthonormal columns, so it is orthogonal by Corollary 5.64.
Now consider the matrix
QT AQ = Q−1 AQ.
By Lemma 1.35, we have Qe1 = v1 , so that
AQe1 = λ1 Qe1 ,
or equivalently we have
Q−1 AQe1 = λ1 e1 .
By Lemma 1.35, this means that the first column of QT AQ is λ1 e1 . Moreover, the matrix QT AQ
is symmetric, since
T T
QT AQ = QT AT QT = QT AQ,
because A is symmetric. Hence, the first row of QT AQ is λ1 eT1 . This means that
λ1 0 ··· 0
0 ã11 · · · ã1,n−1
QT AQ = .
.. .. ..
. .
0 ãn−1,1 · · · ãn−1,n−1
for some real numbers ãij .
e for the (n − 1) × (n − 1) matrix (ãij ). Then A
Write A e is symmetric, since QT AQ is symmetric.
Thus, by inductive hypothesis, the matrix A e is orthogonally diagonalizable, so than
λ2 0 0 ··· 0
0 λ3 0 ··· 0
T
. . . . ..
Q
e A e = ..
eQ .
. . . .
. .
0 0 0 λ 0
n−1
0 0 0 · · · λn
We claim that P is orthogonal. Indeed, the matrix Q is orthogonal, and R is orthogonal, since
1 0 ··· 0 1 0 ··· 0 1 0 ··· 0
0 0 0
RT R = ... .
..
= .
..
= In .
QeT Q
e QeT Q
e
0 0 0
Thus, the matrix P is orthogonal by Lemma 5.71. This completes the induction.
Corollary 5.86. The assertion of Theorem 5.77 holds.
Exercise 5.87. Let A be a n × n symmetric matrix. Show that the following are equivalent:
• all eigenvalues of A are positive real numbers;
• there exists n × n symmetric matrix B such that A = B 2 .
Let us consider one 2 × 2 example.
Example 5.88. Let
5 −3
A= .
−3 5
Let us find an orthogonal 2 × 2 matrix P such that P T AP is diagonal. We have
χA (λ) = λ2 − 10λ + 16 = (λ − 2)(λ − 8),
so that the eigenvalues of A are 2 and 8. To find an eigenvector of A with eigenvalue 2, we solve
3x1 − 3x2 = 0,
− 3x1 + 3x2 = 0.
One its solution is the vector
x1 1
= .
x2 1
It is an eigenvector of A with eigenvalue 2. Observe that the vector
−1
1
is orthogonal to the vector we just found, so that it must be an eigenvector of A with eigenvalue 8.
Normalizing both these vectors, we get eigenvectors
√ ! √ !
2
−√ 22
v1 = √22 and v2 = 2
2 2
with eigenvalues 2 and 8, respectively, that both have length 1. These eigenvectors are orthonormal,
so that we let √ √ !
2 2
−
P = v1 |v2 = √22 √22 .
2 2
This matrix is the required orthogonal matrix, so that
√ √ ! √ √ !
2 2 2
−√ 22
T 2√ √2
5 −3 √2
2 0
P AP = =
− 22 22 −3 5 2 2 0 8
2 2
with eigenvalues 36 and 18, respectively, that both have length 1. The three eigenvectors v1 , v2
and v3 are orthonormal, so that we let
2 2 1
3
− 3
− 3
P = v1 |v2 |v3 = 1 2 − 2 .
3 3 3
2 1 2
3 3 3
φ ◦ ψ = IdR2
The rotation ψ maps the curve C to a curve ψ(C). To find its equation, observe that
x x x cos θ + y sin θ
∈ ψ C ⇐⇒ φ ∈ C ⇐⇒ ∈ C.
y y −x sin θ + y cos θ
126 IVAN CHELTSOV
where P is the matrix we found in Example 5.88. Then it follows from Example 5.88 that ψ(C) is
given by
2 0 x
x y − 4 = 0,
0 8 y
which can be simplified as
2x2 + 8y 2 − 4.
This is indeed an ellipse. Thus, our original curve C can be obtained from this ellipse by rotating
the plane R2 anticlockwise by angle π2 around the origin.
Remark 5.92. In this example, we used orthogonal transformations of R2 to simplify the defining
equation of the curve C. While doing this, we never used any geometric properties of this curve.
Indeed, we only performes algebraic manipulations with the polynomial 5x2 − 6xy + 5y 2 − 4.
Therefore, in some sense, we used orthogonal transformations to simplify this polynomial.
Now, we can apply the same idea to any conic. Recall from Example 3.48 that a conic in R2 is
a plane curve that is given by the polynomial equation
(5.93) ax2 + bxy + cy 2 + dx + ey + f = 0,
where a, b, c, d, e and f are some real numbers such that (a, b, c) 6= (0, 0, 0). However, sometimes,
the equation (5.93) does not define anything that looks like a nice curve. For instance, the equation
xy = 0
defines a union of two intersecting lines. Similarly, the equation
x2 − 1 = 0
defines a union of two parallel lines. Likewise, the equation
x2 = 0
defines a single line. Moreover, in some cases, the equation (5.93) defines something that does not
look like a curve at all. For instance, the equation
x2 + y 2 = 0
defines a single point, and x2 +y 2 +1 = 0 defines nothing (an empty set). These are degenerate cases.
Apart from them, the curve defined by (5.93) is of one of the following three types:
• an ellipse,
• a hyperbola,
• a parabola.
ACCELERATED ALGEBRA 127
we may assume that α2 +β 2 = 1. Then α = cos(θ) and β = sin(θ) for some θ such that 0 6 θ < 2π.
128 IVAN CHELTSOV
and let ζ be the inverse translation ξ −1 . Then ξ and ζ are bijections. Therefore, we have
x x
∈ (ξ ◦ ψ) Σ ⇐⇒ ζ ∈ψ Σ ,
y y
which implies that (ξ ◦ ψ)(Σ) is given by the equation
d¯2 ē2
2 2
λ1 x + λ2 y + f − − = 0,
4λ1 4λ2
where λ1 λ2 > 0. Thus, we have the following possibilities:
• λ1 > 0, λ2 > 0, 4f − d¯2 /λ1 − ē2 /λ2 < 0 ⇒ the subset (ξ ◦ ψ)(Σ)) is an ellipse;
• λ1 < 0, λ2 < 0, 4f − d¯2 /λ1 − ē2 /λ2 > 0 ⇒ the subset (ξ ◦ ψ)(Σ)) is an ellipse;
• λ1 > 0, λ2 > 0, 4f − d¯2 /λ1 − ē2 /λ2 > 0 ⇒ the subset (ξ ◦ ψ)(Σ)) is an empty set;
• λ1 > 0, λ2 > 0, 4f − d¯2 /λ1 − ē2 /λ2 = 0 ⇒ the subset (ξ ◦ ψ)(Σ)) is a point;
• λ1 < 0, λ2 < 0, 4f − d¯2 /λ1 − ē2 /λ2 = 0 ⇒ the subset (ξ ◦ ψ)(Σ)) is a point.
Now we suppose that λ1 λ2 < 0. Rewrite the equation of the set ψ(Σ) as
2 2
d¯ d¯2 ē2
ē
λ1 x + + λ2 y + + f− − = 0.
2λ1 2λ2 4λ1 4λ2
As in the previous case, let ξ : R2 → R2 be the translation that is given by
¯
x + 2λd1
x
7→ .
y y + 2λē 2
Then (ξ ◦ ψ)(Σ) is given by the equation
d¯2 ē2
2 2
λ1 x + λ2 y + f − − = 0,
4λ1 4λ2
where λ1 λ2 < 0. Then
d¯2 ē2
the subset (ξ ◦ ψ)(Σ) is a hyperbola ⇐⇒ f − − 6= 0.
4λ1 4λ2
Moreover, if 4f − d¯2 /λ1 − ē2 λ2 = 0, then (ξ ◦ ψ)(Σ) is a union of two non-parallel lines.
Finally, we suppose that λ1 λ2 = 0. Then λ2 = 0, since we assumed that λ1 6= 0. Now we can
rewrite the equation of ψ(Σ) as
2
d¯ d¯2
λ1 x + + ēy + f − = 0.
2λ1 4λ1
If ē = 0, then we have the following possibilities:
• 4f − d¯2 /λ1 = 0 ⇒ (ξ ◦ ψ)(Σ) is a line;
• λ1 > 0, 4f − d¯2 /λ1 < 0 ⇒ (ξ ◦ ψ)(Σ) is a union of two parallel lines;
• λ1 < 0, 4f − d¯2 /λ1 > 0 ⇒ (ξ ◦ ψ)(Σ) is a union of two parallel lines;
• λ1 > 0, 4f − d¯2 /λ1 > 0 ⇒ (ξ ◦ ψ)(Σ) is an empty set;
• λ1 < 0, 4f − d¯2 /λ1 < 0 ⇒ (ξ ◦ ψ)(Σ) is an empty set.
130 IVAN CHELTSOV
Thus, we may assume that ē 6= 0. Then we can rewrite the equation ψ(Σ) as
2
d¯ d¯2
f
λ1 x + + ē y + − = 0.
2λ1 ē 4λ1 ē
Let ξ : R2 → R2 be the translation given by
!
d¯
x x+ 2λ1
7→ d¯2 .
y y + fē − 4λ1 ē
Ax2 + By 2 − 1 = 0
for some positive real numbers A and B. Similarly, a subset in R2 is said to be a hyperbola if it
can be mapped by a composition of rotation and translation into the subset given by
Ax2 − By 2 − 1 = 0
Ax2 − y = 0
for some real number A 6= 0. In these three cases, we say that the subset is a non-degenerate conic.
Example 5.95. Let Σ be a subset of R2 that is given by the equation
5x2 + 2xy + 10y 2 + 10x + 2y = 0.
Then we can rewrite this equation as
5 1 x x
x y + 10 2 = 0.
1 10 y y
But the eigenvalues of the matrix
5 1
1 10
√ √
are 15−2 29 > 0 and 15+2 29 > 0. Thus, applying an appropriate composition of a rotation and
translation, we can map the subset Σ to the subset that is given by
√ √
15 + 29 2 15 − 29 2
x + y +D =0
2 2
for some real number D. If D < 0, then Σ is an ellipse. Likewise, if D = 0, then Σ is a single point.
Finally, if D > 0, then Σ is an empty. But Σ contains the point 0 and the point
−2
,
0
so that the subset Σ is an ellipse.
ACCELERATED ALGEBRA 131
Denote the 3 × 3 matrix in this equation by M . As in the proof of Theorem 5.94, let
a 2b
N= b .
2
c
Lemma 5.98. Let Σ be a subset in R2 given by (5.93). Then
(1) det(N ) > 0 and tr(N )det(M ) < 0 ⇐⇒ Σ is an ellipse;
(2) det(N ) < 0 and det(M ) 6= 0 ⇐⇒ Σ is a hyperbola;
(3) det(N ) = 0 and det(M ) 6= 0 ⇐⇒ Σ is a parabola.
Proof. The trace and determinant of the matrix N are invariant with respect to rotations and
translations of the plane. Similarly, the sign of the determinant of the matrix M is also invariant
with respect to rotations and translations. Thus, the result follows from Theorem 5.94.
Corollary 5.99. If det(M ) = 0, then (5.93) does not define a non-degenerate conic.
Let us consider one example. Let us determine which conic is given in R2 by
2x2 − 3xy + 7y 2 − 5x + 11y + 8 = 0.
In this case, the matrices M and N are
a 2b d
2 2 − 32 − 25
M = 2b c e = −3 7 11
2 2 2
d e 5 11
2 2
f −2 2 8
and
b
2 − 32
a 2
N= = .
b
2
c − 23 7
Then det(N ) = 47 4
, det(M ) = 31 and tr(N ) = 9, so that our equation does not define a non-
degenerate conic. This is easy to see explicitly, since
2 47
!
124 37 3 29 29 2 124
2x2 − 3xy + 7y 2 − 5x + 11y + 8 = +2 x− − y+ + y+ > ,
47 47 4 47 8 47 47
so that our equation defines an empty set.
Let us determine the type of a conic in R2 that is given by
2x2 − 3xy + 7y 2 − 5x + 11y − 8 = 0.
Now we have det(N ) = 474
, det(M ) = −157 and tr(N ) = 9, so that our equation defines an ellipse.
Of course, we can show this algebraically: we have
2 47
!
628 37 3 29 29 2
2x2 − 3xy + 7y 2 − 5x + 11y − 8 = − +2 x− − y+ + y+ ,
47 47 4 47 8 47
so that our conic is given by
47 2 628
2x̄2 +ȳ − =0
8 47
in coordinates
37 3 29
x̄ = x −
− y+ ,
47 4 47
ȳ = y + 29 .
47
This change of coordinates is a composition of an invertible linear transformation and a translation.
It follows from the proof of Lemma 5.98 that such transformations preserve the type of any non-
degenerate conic.
ACCELERATED ALGEBRA 133
Poole assumes that parabola is the curve in R2 that is given by the equation
y = ax2 + bx + c
for some real numbers a, b and c. This definition of parabola is not politically correct: it discrim-
,
inates the x-axis , since the equation
x = ay 2 + by + c
clearly defines a parabola in R2 . The goal of the next exercise is to solve Poole’s exercise using
our (politically correct) definition of parabola:
• parabola is a subset in R2 such that there exists a composition of rotations and translations
that maps this subset to the curve given by
y = ax2 ,
where a is some real number.
Exercise 5.101. Let P1 = (0, 1), P2 = (−1, 4) and P3 = (2, 1). Do the following:
(a) Find all parabolas in R2 that pass through the points P1 , P2 , P3 and (19, 20).
(b) Find all parabolas in R2 that pass through the points P1 , P2 , P3 and (9, 10).
(c) Try to describe all parabolas in R2 pass through P1 , P2 , P3 .
(d) Try to describe the subset S ⊂ R2 such that P ∈ S if and only if there exists a parabola
that contains P1 , P2 , P3 and P .
Let us finish this section by one useful remark. In the proof of Theorem 5.94, we implicitly
classified polynomials
ax2 + bxy + cy 2 + dx + ey + f
with (a, b, c) 6= (0, 0, 0) up to rotations and translations. Moreover, going through this proof, we
see that each such polynomial can be simplified using rotations and translations as follows:
(1) āx2 + c̄y 2 + f¯ for some real numbers ā, c̄ and f¯ such that āc̄ > 0 and (ā + c̄)f¯ < 0,
(2) āx2 + c̄y 2 + f¯ for some real numbers ā, c̄ and f¯ such that āc̄ < 0 and f¯ 6= 0,
(3) āx2 + ēy for some real numbers ā and e such that ā 6= 0 and ē 6= 0,
(4) āx2 + c̄y 2 for some real numbers ā and c̄ such that āc̄ > 0,
(5) āx2 + c̄y 2 for some real numbers ā and c̄ such that āc̄ < 0,
134 IVAN CHELTSOV
(6) āx2 + f¯ for some real numbers ā and f¯ such that āf¯ < 0,
(7) āx2 for some real number ā,
(8) āx2 + c̄y 2 + f¯ for some real numbers ā, c̄ and f¯ such that āc̄ > 0 and (ā + c̄)f¯ > 0,
(9) āx2 + f¯ for some real numbers ā and f¯ such that āf¯ > 0.
Geometrically, these cases corresponds to the following subsets: an ellipse, a hyperbola, a parabola,
a single point, two intersecting lines, two parallel lines, a single line (taken with multiplicity two),
an empty set, another empty set, respectively.