
Linear Algebra Done Openly

Andrew Misseldine

June 30, 2023


Contents

Preface 5

1 Introduction to Linear Algebra 1


1.1 Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.1 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Modular Arithmetic and Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.4 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
1.5 Augmented Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6 Reduction of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2 Vectors 45
2.1 Vector Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.2 Matrix Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.3 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.4 Affine Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.5 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.6 Solution Sets of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.7 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.8 Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

3 The Algebra and Geometry of Matrices 95


3.1 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.2 Matrix Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
3.3 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.4 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
3.5 Matrix Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.5.1 Generalizations of Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 116
3.5.2 The LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.6 Linear Transformations on R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.7 Representations of Linear Transformations as Matrices . . . . . . . . . . . . . . . . . . . . . . 131

4 Orthogonality 135
4.1 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
4.3 Outer Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
4.4 Affine Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
4.5 Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.6 The Fundamental Theorem of Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
4.7 The Gram-Schmidt Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.8 The Least Squares Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177


5 Determinants 183
5.1 Introduction to Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
5.2 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
5.3 Cramer’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
5.4 Cross Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199

6 Eigenvalues 205
6.1 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.2 The Characteristic Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
6.3 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
6.4 Orthogonal Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.5 Similarity and Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Appendix A Notation Index 229

Appendix B The Substitution and Elimination Methods 231

Appendix C Proofs of Theorems 235

Appendix D Solutions to Exercises 239


Preface

List of Contributors
Kory Adams Charles Del Quandro Christopher Houston Brittany Palmer Sarah Walters
Abby Allen Cameron Dix Ethan Hoyer Seth Palmer Adym Warhurst
Will Allen Kaylee Dockter Jacob Jensen Darby Parise Leon Weingartner
Kaden Allred Riley Drishinski Sofia Jones Colin Reid Kordell Welch
Samuel Andersen Joshua Edgel Cyrus Kaveh Zach Rogers Sarah Wilcox
Caroline Ashton Kaden Empey Jacob Kuhn Rindy Roos Walt Williams
Alicyn Astle Candance Fehr Maria Langford Hamza Samha Chance Witt
Sherie Ayag Courtney Flanigan Louaivasa Laulu Lucas Shaner Kyle Wood
Shelby Bartlett Jaren Frandsen Laura Lee Hannah Simonson Kennedy Worthington
Tyler Bayn C.J. Giacoletto Yucheng Long Joseph Spendlove Jiazheng Yan
Landry Benimana Phillip Goins Jaxton Maez Noah Swenson Jianhe Yu
Andrew Biskey Jaimie Goldberg Adam Maxwell Ashley Taylor Mattie Zeigler
Carson Blickenstaff Caleb Goodrich Ellie McReaken Jaden Torgerson Yifan Zhu
Alexis Borell Jordan Griffith Kalvin Mudrow Daven Triplett Heming Zu
Braden Carlson Kaylee Hall Jacob Newey Runtian Tu Mitchell Zufelt
Hailey Checketts Malcolm Hanks Christopher Newton Gage VanDyke
Skyler Clark Thayne Hansen Anthony Nguyen Allyson Vest
Mariah Clayson Yingjie He Yinglong Niu Grayson Walker
David de Lera Santo Bridger Hildreth Gordon Ochsner Gregory Walsh

Chapter 1

Introduction to Linear Algebra

In this first chapter, we want to introduce all the major players in the linear arena. Linear Algebra is the
study of linear objects (a term we are not quite ready to define). As we have seen linear objects already
many times in algebra and calculus, Linear Algebra will take a detailed look at linear structures and this
first chapter will explore the important fundamentals of the linear universe.

Please be aware that this first chapter may feel awkward or difficult when you read it for the first time.
This is both expected and intentional. If you were familiar with all the ideas behind Linear Algebra then
you probably would not be reading this book. The struggle herein will produce questions, all of which we
will answer in later chapters. Consider this chapter your baptism by fire into all things linear.


“Education is that whole system of human training within and without the school house walls, which molds
and develops men.” – W. E. B. Du Bois

1.1 Linear Systems


In this section, we introduce the study of linear systems motivated by geometry. Thus, all numbers in this
section will be real numbers. We introduce the notion of a system of linear equations and the nature of
their solution sets. We discuss when a linear system is consistent/inconsistent, independent/dependent,
underdetermined/overdetermined, and homogeneous/nonhomogeneous. We discuss the difference between
free and dependent variables in a linear system.
Definition 1.1.1. A system of linear equations (or a linear system) is a set of equations with common
variables of the following form:
a1 x1 + a2 x2 + . . . + an xn = b,
where b and the coefficients a1 , . . . , an are fixed numbers.

A solution to a system is an assignment to each variable such that each equation is satisfied. The
solution set is the set of all solutions to a system of equations. Two systems of equations are equivalent
if they have the same solution set.
A linear equation of two real variables (typically x, y or x1 , x2 ) geometrically forms a line in the plane.
Likewise, a linear equation of three real variables (typically x, y, z or x1 , x2 , x3 ) geometrically forms a plane
in 3-space. Although much harder to visualize, a linear equation in four variables (typically x, y, z, w or
x, y, z, t or x1 , x2 , x3 , x4 ) geometrically forms a hyperplane in 4-space. Higher dimensional analogues likewise
exist. As such, emphasis in this section will be placed on linear systems with two or three real variables.

Example 1.1.2. Test that (3, 1) is a solution to the system

    2x − y = 5
    x + 5y = 8.

Notice that 3 + 5(1) = 3 + 5 = 8 and 2(3) − 1 = 6 − 1 = 5. Thus, (3, 1) is a solution to the above system. Geometrically, the solution to the previous system is an intersection between two lines, as depicted in the figure. 

[Figure: the lines 2x − y = 5 and x + 5y = 8 crossing at the point (3, 1).]

Example 1.1.3. Consider the linear system with three variables and three equations:

    x + 2y + 3z = 20
    −2x + y = −1
    −3x − 6y + 5z = −4.

We can check that (2, 3, 4) is a solution of this system. Note that (2) + 2(3) + 3(4) = 2 + 6 + 12 = 20, −2(2) + (3) = −4 + 3 = −1, and −3(2) − 6(3) + 5(4) = −6 − 18 + 20 = −4, which shows that (2, 3, 4) is in fact a solution of the system. Try as one might, another solution to this system CANNOT be found, that is, (2, 3, 4) is the unique solution to this system of equations. 
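Checks like this are easy to automate. The following is a small Python sketch (our own helper, not part of any library) that substitutes a point into every equation of the system from Example 1.1.3, where each equation is stored as its coefficient list together with its right-hand side.

    def is_solution(point, system):
        # An equation (coeffs, rhs) is satisfied when the sum of
        # coefficient * coordinate equals the right-hand side.
        return all(sum(c * x for c, x in zip(coeffs, point)) == rhs
                   for coeffs, rhs in system)

    # The system of Example 1.1.3.
    system = [((1, 2, 3), 20), ((-2, 1, 0), -1), ((-3, -6, 5), -4)]
    print(is_solution((2, 3, 4), system))   # True
    print(is_solution((1, 1, 1), system))   # False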

Example 1.1.4. Consider the linear system with three variables and three equations:

    2x − y + z = 11
    x + 3y − 10z = 2
    −3x + 2y − 3z = −17.

It is simple to check that (5, −1, 0) is a solution to this system. Likewise, we can also show that (6, 2, 1) is a solution to this same system. Thus, it is possible for a linear system to have multiple solutions. 

Because of Example 1.1.4, it is worth investigating how many solutions a linear system may have. The
beginning of our investigation is the following definitions.

Definition 1.1.5. A system of equations is consistent if it has a solution. Otherwise, we call the system
inconsistent.

If a consistent linear system has a unique solution, we call this the independent case. If a consistent
linear system has multiple solutions, we call this the dependent case.

Like we did in Example 1.1.2, we will investigate linear systems with two variables and their correspond-
ing geometry.

Example 1.1.6. Solve each system of equations.

(a) The system 5x + 2y = 32, 3x + 6y = 48.

After graphing the two lines, as depicted in the figure, we see that the two lines intersect at a unique point. This point of intersection, (4, 6), means the system has a unique solution. Therefore, the system is consistent and independent.

[Figure: the lines 5x + 2y = 32 and 3x + 6y = 48 crossing at the point (4, 6).]

(b) The system y = −x + 5, 2x + 2y = 3.

After graphing the two lines, as depicted in the figure, we see that the two lines are parallel. Thus, they do NOT ever intersect. Therefore, the system is inconsistent, that is, there is no solution to the linear system.

[Figure: the parallel lines y = −x + 5 and 2x + 2y = 3.]
(c) The system x = y + 3, 3x − 3y = 9.

After graphing the two lines, as depicted in the figure, we see that the two lines are the same line. Thus, every point on one line is also a point on the other line. Therefore, the system is dependent, that is, there are infinitely many solutions to the linear system. In particular, the system is consistent.

[Figure: the coinciding lines x = y + 3 and 3x − 3y = 9.]


In general, any two lines in the real plane intersect at 0, 1, or infinitely many points (this last case occurs
if the lines overlap). This holds also for higher dimensional systems of linear equations, that is, for any
system of linear real equations the number of solutions is 0, 1, or ∞ and the solution set of a linear system
will fall under one of these three types.

These three possibilities are also the case for intersections of planes in 3-dimensional space. The independent, consistent case occurs when all the planes in the system intersect at a unique point, as was the case for Example 1.1.3.

[Figure: the three planes of Example 1.1.3 meeting at a single point.]

The inconsistent case occurs when not all the planes simultaneously intersect at a common point, perhaps because some subset of planes are parallel with one another. Finally, the dependent, consistent case occurs when the collection of planes overlap at more than one point, e.g. three planes in 3-space intersect along a common line.

[Figure: three planes meeting along a common line.]

This was the case for Example 1.1.4. In fact, any point of the form
(z + 5, 3z − 1, z), (1.1.7)

where the variable z can be freely assigned to any real number, is a solution to this linear system. This is
called the general solution to the system. In this example, z is a free variable, that is, a variable which
can be assigned any value. The other two variables, x and y, are called dependent variables, because
their assignment is dependent upon the assignments of the free variables corresponding to some linear rela-
tionship. Thus, the linear system in Example 1.1.4 has one free variable and two dependent variables. The
two solutions listed in Example 1.1.4 arose exactly from setting z = 0 and z = 1, respectively. In general,
the dependent case occurs exactly when the linear system has at least one free variable.
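As a quick sanity check (a Python sketch along the same lines as before, not part of the formal development), we can substitute several values of the free variable z into (1.1.7) and confirm that every resulting point satisfies the system of Example 1.1.4.

    def is_solution(point, system):
        return all(sum(c * x for c, x in zip(coeffs, point)) == rhs
                   for coeffs, rhs in system)

    # The system of Example 1.1.4.
    system = [((2, -1, 1), 11), ((1, 3, -10), 2), ((-3, 2, -3), -17)]
    for z in range(-2, 3):
        point = (z + 5, 3 * z - 1, z)              # the general solution (1.1.7)
        print(point, is_solution(point, system))   # prints True for every z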

The following properties will be proven in the homework.

Proposition 1.1.8. Suppose that a consistent linear system has m equations and n variables.

(i) If n > m, then the system has at least n − m free variables, which implies that it has multiple solutions.

(ii) If m > n, then the system has at least m − n redundant equations which could be removed without modifying the solution set.

The situation where n > m is called an underdetermined system because there are not enough
equations to guarantee a unique solution. The situation where m > n is called an overdetermined system
because there might be too many equations to guarantee consistency.
Definition 1.1.9. A system of linear equations is called homogeneous if each equation is of the form:

a1 x1 + a2 x2 + . . . + an xn = 0.

Otherwise, the system is called nonhomogeneous.

Example 1.1.10. The below 3 × 3 linear system is homogeneous since the right-hand side of each equation is 0:

    x + 3y + 5z = 0
    2x + 4y + 6z = 0
    4x + 2y = 0

On the other hand, the below linear system appears to be homogeneous at first, but is really non-homogeneous:

    x + 3y − 2z + 4 = 0
    4x + 2y + 3z − 5 = 0
    −x + 5y − 3z − 2 = 0

The issue here is that the linear system is not in standard form, that is, the form in which all the variables are located on the left-hand sides of the equations, in descending order, and the constants are all on the right-hand sides of the equations. In standard form, the system would appear as:

    x + 3y − 2z = −4
    4x + 2y + 3z = 5
    −x + 5y − 3z = 2

In standard form, it is much more clear that the above linear system is non-homogeneous. 

Proposition 1.1.11. A homogeneous system of equations is always consistent.
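The idea behind Proposition 1.1.11 can be previewed computationally: when every right-hand side is 0, assigning 0 to every variable satisfies every equation. Here is a short sketch, reusing the checker from before, applied to the homogeneous system of Example 1.1.10.

    def is_solution(point, system):
        return all(sum(c * x for c, x in zip(coeffs, point)) == rhs
                   for coeffs, rhs in system)

    homogeneous = [((1, 3, 5), 0), ((2, 4, 6), 0), ((4, 2, 0), 0)]
    print(is_solution((0, 0, 0), homogeneous))   # True: the trivial solution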



Exercises (Go to Solutions)


For Exercises 1–4, determine if the point is a solution to the following linear system:

    2x − 3y + z = 0
    y + z = 0
    −x − 2z = 0.

1. (0, 0, 0)    2. (4, 2, −2)    3. (3, 2, 0)    4. (1, 1, 1)

For Exercises 5–8, determine if the point is a solution to the following linear system:

    x + y = 7
    3x + 3y − 10z = −79
    x + 5y + z = 33.

5. (4, 3, 10)    6. (0, 0, 0)    7. (3, 4, 10)    8. (−10, 3, −38)

For Exercises 9–12, determine if the point is a solution to the following linear system:

    x + 3y − 2z = 13
    2x − y = 2
    −x + 6y − 4z = 17.

♠ 9. (6, 10, 4)    ♠ 10. (5, 8, 8)    ♠ 11. (3, 4, 1)    ♠ 12. (9, 3, 2)

For Exercises 13–17, determine if the point is a solution to the following linear system:

    x − y + z = 5
    3x + 4y − 5z = −2
    −2x − 4y + 7z = 11.

13. (1, 2, 3)    14. (3, 2, 1)    15. (5, 0, 6)    16. (3, 1, 3)    17. (9, 3, 2)

For Exercises 18–27, graph the linear system and determine if it is consistent or inconsistent. If consistent, determine whether it has a unique solution or multiple solutions.

18. {14x + 3y = 4; 5x + 9y = 1}    19. {2x − 5y = 10; (4/5)x − 2y = 4}    20. {x = 5y − 4; (1/5)x = y + 13/5}

21. {2x + 3y = 4; x − y = 2}    22. {−4x + y = 7; −8x + 2y = 2}    23. {x − y = 1; 2y = 2x − 2}

24. {y = 2x + 7; y = −4x − 2}    25. {y = 5x + 8; −5x + y − 2 = 6}    26. {y = 3x − 7; y − 3x = 12}

27. {9x + 5 = 2y; x + 2y = 5; y = 2.5}

For Exercises 28–37, determine if the linear system is homogeneous. For those which are, find a solution.

28. {4x + 2y = 0; 5x + 3y = 0}    29. {6x − 7y = 3; 2x + 8y = 0}    ♠ 30. {3x − y = 0; 2x + 5y = 0}

♠ 31. {2x − 4y = 0; −6x + 8y = 3}    32. {−5 − y − 2 = 0; x + y + 5 = 0}    33. {2x + y = 10; −7x − 3y = 3}

34. {5x + 2y + 3z = 2; 3x − 4z = −1; 5y + 3z = 5}    35. {5x + 2y + 3z = 2; −3x − 2y + z = 5; 2x + 2y − 4z = 1}

♠ 36. {x − 2z = 0; 3x + 4y + 4z + w = 0; −2x + 6y + 6z + 2w = 0}    37. {2x − 2y − 2z = 6; −x − 3y + 9z = 1; 4x + 3y − 18z = 3}

38. Is the linear system {x + 2y − 4z = 0; −y + 3z = 0; 2x + 3y = 0} consistent? Why or why not? Draw a 3-dimensional sketch of these planes and their intersection.
♠ 39. Using the general form for Example 1.1.4 provided on page 4 (namely (1.1.7)), construct five other
solutions to the linear system from Example 1.1.4. Answers may vary.

♠ 40. Prove Proposition 1.1.11.



“Hold fast to dreams. . . For when dreams go


Life is a barren field. . . Frozen with snow.” – Langston Hughes

1.2 Fields
The topic of this section is the question of what a number even is. This discussion is often postponed to more advanced mathematics courses where abstraction is king, but for linear algebra students, this abstraction is just budding. This course will be primarily a computational introduction to linear algebra with select proofs and applications. In spite of this goal, the abstract notion of a field will not hinder us. In fact, the introduction of various fields will strengthen the student's understanding of the concepts and computations of linear algebra, as the student will see which parts are truly necessary and where variability is allowed. It will also strengthen the realization that complex vector spaces and real vector spaces are not two separate discussions but one and the same story.

What really is a number? Although numbers are useful for counting, they are much more useful than that. A simple answer would be to say that a number is a thing we do math with. Although a crude response, this will satisfy our inquiry, thus avoiding a long journey through deep logic and philosophy. Maybe another day.

A set to us will mean a collection of objects, typically called elements. Those elements could be numbers, colors, people, pokémon, or other sets! Typically, sets will be denoted by capital letters such as A, B, C, etc. and elements by lower case letters such as a, b, c, etc. Typically, {. . .} is used to denote the elements in a set, e.g. A = {1, 2, 3, 4}. Often a set is described by some rule, such as A = {x | x satisfies some rule}. If an element x is a member of the set A, then we denote this as x ∈ A. The notation x ∉ A means that x is not an element of A. There must be a clear rule to decide whether an element is a member of a set or not, otherwise it is not a set. The set which contains no elements, ∅ = {}, is called the empty set.

A function f is a relationship between sets, say A and B, such that each element of A is assigned to
exactly one element of B (although not all elements of B must be assigned to an element of A). We denote
this function relation as f : A → B. If A and B are two sets, we let A × B denote the set of ordered pairs of
elements from A and B. For example, if a ∈ A and b ∈ B, then (a, b) ∈ A × B. Finally, an operation is a
function of the form f : A × B → C. One should think of an operation as a process of bringing two objects
together and creating a third object.

1.2.1 Fields
We are now ready for the definition of a field of numbers. Do not panic. If this seems very fast, that is okay.
Our discussion will slow down quickly.

Definition 1.2.1. A field is a nonempty set F with elements called scalars on which are defined two operations, called addition + : F × F → F and multiplication · : F × F → F, such that for all scalars a, b, c ∈ F, the following ten axioms hold:

(i) a + b = b + a
(ii) (a + b) + c = a + (b + c)
(iii) There exists a scalar 0 ∈ F such that a + 0 = 0 + a = a
(iv) For each a, there exists a scalar −a ∈ F such that a + (−a) = (−a) + a = 0
(v) a · (b + c) = (ab) + (ac)
(vi) ab = ba
(vii) (ab)c = a(bc)
(viii) There exists a scalar 1 ∈ F such that a1 = 1a = a
(ix) For each a ≠ 0, there exists a scalar a⁻¹ ∈ F such that aa⁻¹ = a⁻¹a = 1
(x) (a + b) · c = (ac) + (bc)

It might seem like a daunting list, but remember the following: a field is just a number system in which we can add, subtract, multiply, and divide following the usual commutative, associative, and distributive laws. The set of rational numbers, denoted Q,† is a field. The set of real numbers, denoted R, and the set of complex numbers, denoted C, are also both fields. These are the fields that we are most familiar with. Note that the set of integers, denoted Z,†† is NOT a field, as the quotient of two integers is not always an integer, e.g. 1/2 ∉ Z. Likewise, the set of natural numbers, denoted N, is NOT a field for the same reasons with division but also because the difference of two natural numbers is not always a natural number, e.g. 3 − 4 = −1 ∉ N.

Fields are the number systems we are used to from previous algebra classes. For example, fields are exactly the environment where linear equations can be solved. Let F be a field and let ax + b = c be a linear equation with variable x and a, b, c ∈ F. Here the variable itself is a placeholder for some other scalar. A solution to this linear equation is an assignment to the variable x which makes the equation true. For linear equations, we see there is only one solution in any field:

    ax + b = c
    (ax + b) + (−b) = c + (−b)      Existence of Additive Inverses (iv)
    ax + (b − b) = c − b            Additive Associativity (ii)
    ax + 0 = c − b
    ax = c − b                      Existence of Additive Identity (iii)
    a⁻¹(ax) = a⁻¹(c − b)            Existence of Multiplicative Inverses (ix)
    (a⁻¹a)x = (c − b)/a             Multiplicative Associativity (vii)
    1x = (c − b)/a
    x = (c − b)/a                   Existence of Multiplicative Identity (viii)

Note that commutativity is used to guarantee that subtraction and division are well-defined. Therefore, a linear equation has a unique solution over a field.

Example 1.2.2. Solve the following equations:

(a) 2x + 1 = 6 over Q

    2x + 1 = 6  ⇒  2x = 6 − 1 = 5  ⇒  x = 5/2.

(b) (i + 1)x + (3 − i) = 4 + 5i over C

    (i + 1)x + (3 − i) = 4 + 5i  ⇒  (i + 1)x = (4 + 5i) − (3 − i) = 1 + 6i

    ⇒  x = (1 + 6i)/(1 + i) = ((1 + 6i)(1 − i))/((1 + i)(1 − i)) = (1 − i + 6i + 6)/(1 − i + i + 1) = 7/2 + (5/2)i. 
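Python's built-in complex numbers (written with j in place of i) give a quick way to double-check part (b); this is only a numerical confirmation of the algebra above.

    x = ((4 + 5j) - (3 - 1j)) / (1 + 1j)    # solve (i+1)x + (3-i) = 4+5i
    print(x)                                # (3.5+2.5j), i.e. 7/2 + (5/2)i
    print((1 + 1j) * x + (3 - 1j))          # (4+5j), so the solution checks out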

1.2.2 Modular Arithmetic and Finite Fields


We introduce now one last type of field: a finite field. Let n be a positive integer and let a be any inte-
ger. Consider the operation a ÷ n. The division algorithm known unto us since primary school guarantees
there are unique integers q, r, called the quotient and remainder, respectively, such that a = qn + r where
0 ≤ r < n. For example, for n = 5 and a = 13, we have that 13 = 2(5) + 3, that is, q = 2 and r = 3. We
generally would interpret this statement as 5 divides into 13 two times with remainder of three, that is, if
13 vectors are to be shared among 5 friends then each friend would get 2 vectors with 3 vectors left over.
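In Python, the quotient and remainder of the division algorithm are returned together by the built-in divmod (equivalently, the // and % operators), so the example above can be checked directly.

    q, r = divmod(13, 5)
    print(q, r)    # prints 2 3, matching 13 = 2(5) + 3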

Typically, with a division problem one is interested in finding the quotient, but in modular arithmetic, one is instead interested in finding the remainder, which has many powerful uses. Let Zn = {0, 1, 2, . . . , n − 1}. Then Zn should be viewed as the set of all possible remainders, or residues, when dividing by n. Define next a function (mod n) : Z → Zn by the rule that a (mod n) is the unique remainder of a when divided by n. For example, 8 (mod 5) = 3. Likewise, 14 (mod 5) = 4 since 14 = 2(5) + 4, that is, 5 divides into 14 twice with a remainder of 4. The residue a (mod n) is read “a modulo n” and n is called the modulus of the function.

Note that if 0 ≤ a < n, then a (mod n) = a, the original value; for example, 3 (mod 5) = 3. In this example, the associated quotient is 0. This is a departure from decimal division, where the fraction 3/5 may be expressed as the decimal 0.6. Special attention should also be given to the case when a is a negative integer. Note that −11 (mod 6) = 1 since −11 = (−2)(6) + 1, where the quotient is −2 and the residue is 1. This is in contrast to the possibility −11 = (−1)(6) + (−5). The difference here is that while quotients may be any integer: positive, zero, or negative, remainders must always lie in the range 0 ≤ r ≤ n − 1, which is necessarily nonnegative.

When a ≥ n, one way to determine a (mod n) is to subtract the modulus n from a until the difference lies in Zn. For example, since 14 > 5, we subtract: the first difference 14 − (1)5 = 14 − 5 = 9 is still too large, but the second difference 14 − 2(5) = 9 − 5 = 4 ∈ Z5. Thus, 14 (mod 5) = 4, as observed above. This notion of repeated subtraction is actually where the division algorithm gets its life, in much the same way that multiplication can be viewed as repeated addition. When a is negative, we can instead add the modulus n to a (or subtract a negative, if one prefers) until the sum lies inside of Zn. For example, −11 − (−1)6 = −11 + 6 = −5, which is still too small, but the next iteration gives −11 − (−2)6 = −5 + 6 = 1. Thus, −11 (mod 6) = 1, as observed above.

Note that 7 (mod 6) = 1 = −11 (mod 6), that is, it is possible for two different integers to have the
same remainder when divided by the modulus n. We say that two integers a and b are congruent modulo
n if a (mod n) = b (mod n) and denote this congruence as a ≡ b (mod n). For example, 7 ≡ 12 (mod 5)
since 7 and 12 both have remainder 2. If a ≡ b (mod n), then a = q1 n + r and b = q2 n + r for some
quotients q1 , q2 . Then their difference a − b = (q1 − q2 )n is a multiple of the modulus n. In other words,
a ≡ b (mod n) if and only if a−b is divisible by n. For example, 12−7 = 5 which implies that 12 ≡ 7 (mod 5).

We define two operations on Zn called modular addition and modular multiplication: modular
addition is defined by the rule that two integers are added together and the sum’s remainder modulo n is
reported, modular multiplication is defined similarly. The final result should be a number in Zn .

A visual interpretation of modular addition would be the following. Imagine n pearls on a string labeled from left to right with the numbers 0 through n − 1. Take the two ends of this string and connect them together to form a pearl necklace of integers. To compute a + b (mod n), start at the pearl labeled a and rotate the necklace by one pearl to the right (counter-clockwise) exactly b times. The pearl you land on is then the modular sum of a and b. For example, starting at 2 and counting forward 4 times on a necklace with 5 pearls gives the pearl 1, that is, 2 + 4 ≡ 1 (mod 5). Modular multiplication can similarly be visualized as iterated addition on the pearl necklace.

Example 1.2.3. Let us consider the following calculations:


(a) 6 + 11 (mod 5)

6 + 11 ≡ 17 ≡ 2 (mod 5)

(b) 7 + 13 (mod 5)

7 + 13 ≡ 20 ≡ 0 (mod 5)

(c) 2(4) (mod 7)

2(4) ≡ 8 ≡ 1 (mod 7).

(d) 2(5 + 6) (mod 13)

2(5 + 6) ≡ 2(11) ≡ 22 ≡ 9 (mod 13)

(e) 3(5) + 8 (mod 11)

3(5) + 8 ≡ 15 + 8 ≡ 4 + 8 ≡ 12 ≡ 1 (mod 11) 
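Each of these reductions can be reproduced with Python's % operator, which returns the residue in Zn and, conveniently, a nonnegative residue even for negative inputs.

    print((6 + 11) % 5)       # 2
    print((7 + 13) % 5)       # 0
    print((2 * 4) % 7)        # 1
    print(2 * (5 + 6) % 13)   # 9
    print((3 * 5 + 8) % 11)   # 1
    print(-11 % 6)            # 1, the residue worked out earlier in the text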

We can define modular subtraction a − b (mod n) by adding the additive inverse of b with respect to
modular addition, that is, a − b ≡ a + (−b) (mod n). Thus, modular subtraction is the inverse operation
of modular addition. Returning to the pearl necklace analogy above, if a + b (mod n) turns the necklace
to the right (counter-clockwise) b times from a, a − b (mod n) turns the necklace to the left (clockwise) b
times from a. For example, starting at 2 and counting backward on the necklace with 5 pearls 4 times gives
the pearl 3, that is, 2 − 4 ≡ 3 (mod 5). Likewise, 14 − 5 ≡ 9 ≡ 1 (mod 8). Note that −b ≡ n − b (mod n).
Thus, the final result when computing modular subtraction should always be nonnegative.

Similarly, we define modular division a/b (mod n) by multiplying by the multiplicative inverse of b with respect to modular multiplication, that is, a/b ≡ ab⁻¹ (mod n). This multiplicative inverse will be an element of Zn such that bb⁻¹ ≡ 1 (mod n), that is, their product will be one more than a multiple of n, as this implies the remainder is 1 when divided by n (see Example 1.2.3(c)). For example,

    4⁻¹ = 1/4 ≡ 2 (mod 7)

since 4(2) = 8 ≡ 1 (mod 7). Thus,

    5/4 ≡ 5(2) = 10 ≡ 3 (mod 7).

This multiplicative inverse does not always exist (see Exercise 1.2.2).

Alternatively, one can simplify a modular fraction without a multiplicative inverse by replacing the numerator with an integer congruent to the original but actually divisible by the denominator in the usual sense. For example,

    17/3 ≡ (17 + 10)/3 ≡ 27/3 ≡ 9(3)/3 ≡ 9 (mod 5).
The following theorem characterizes when this can be done.

Theorem 1.2.4. The set Zn is a field with respect to modular addition and modular multiplication if and
only if n is a prime number.

Example 1.2.5. Solve the linear equation 2x + 1 ≡ 6 (mod 13).

Since Z13 is a field, we can solve this linear equation in the same fashion as the previous ones. Note

    2x + 1 ≡ 6  ⇒  2x ≡ 6 − 1 ≡ 5 (mod 13)  ⇒  x ≡ 5/2 ≡ 5(2)⁻¹.

Next, we need to either identify the reciprocal of 2 (mod 13)‡ or replace 5 with a congruent integer which is even, that is, 5/2 ≡ (5 + 13)/2 ≡ 18/2 ≡ 9 (mod 13). Therefore, the solution is 9. We can check this solution:

    2(9) + 1 ≡ 18 + 1 ≡ 19 ≡ 6 (mod 13). 
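Since Python 3.8, pow(b, -1, n) returns the multiplicative inverse of b modulo n (and raises an error when no inverse exists), so the computation of Example 1.2.5 can be replayed in a few lines.

    inv2 = pow(2, -1, 13)      # the reciprocal of 2 modulo 13
    print(inv2)                # 7
    x = 5 * inv2 % 13
    print(x)                   # 9
    print((2 * x + 1) % 13)    # 6, so x = 9 solves 2x + 1 ≡ 6 (mod 13)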

Example 1.2.6. Solve the linear equation 3x + 5 ≡ 1 (mod 7).

We solve this equation similarly to the previous example over the field Z7:

    3x + 5 ≡ 1  ⇒  3x ≡ 1 − 5 ≡ −4 ≡ 3 (mod 7)  ⇒  x ≡ 3/3 ≡ 1.

We can check this solution:

    3(1) + 5 ≡ 3 + 5 ≡ 8 ≡ 1 (mod 7). 

Exercises (Go to Solutions)


1. Explain the difference between equality of integers and congruence modulo p.
For Exercises 2–6, determine whether the statement is true or false. If false, correct the statement so that it is true.
2. The remainder a (mod n) is read “a modulo n” and a is called the modulus of Zn .
3. It is possible for two different integers to have the same residue modulo n.
4. Modular division a/b (mod n) is defined by multiplying by the inverse of b with respect to modular
addition.
5. The set Zn is a field with respect to modular addition and multiplication if and only if n is a real
number.
6. If 0 ≤ a < n, then a (mod n) = n.

For Exercises 7–25, simplify the modular expression.

7. −4 (mod 7)    8. 3(−7) (mod 5)    9. −924 − 152 (mod 4)    10. 136 − 822 (mod 13)

11. 2237 (mod 11)    12. 1/2 (mod 3)    13. 1/3 (mod 7)    ♠ 14. 34 + 61 (mod 7)

♠ 15. 4(13) (mod 5)    16. 5(15) (mod 6)    17. 2(8 − 5) + 4 (mod 3)

♠ 18. 2(6 − 3) + 7 (mod 2)    ♠ 19. 3[2 + 1 − (6 + 3)] + 6(7) (mod 13)

♠ 20. (5(4) + 3)/(2(6)) (mod 7)    21. (3(5) + 4)/(2(7)) (mod 9)    22. (6(7) + 5)/(2(4)) (mod 13)

23. 2[(3 + 4 + 2 + 5) + 1 + 2(3) − 3 + 2(−2)] (mod 7)    24. (4(9) + 1 − (6 + 12))/(1 + 3(6)) (mod 13)

25. −2(35 − 74)/3 (mod 19)

For Exercises 26–43, solve the linear equation.

26. 7x + 10 = 3 ♠ 27. 6x + 7 = 2 ♠ 28. (2 + i)x + (5 − 7i) = 3 − 4i

29. 3x + 7 ≡ 5 (mod 5) ♠ 30. x + 1 ≡ 0 (mod 2) ♠ 31. 2x + 3 ≡ 4 (mod 5)

♠ 32. 3x + 2 ≡ 1 (mod 7) ♠ 33. 5x + 1 ≡ 4 (mod 11) 34. 420x − 7 ≡ 98 (mod 11)

35. 2x + 5 ≡ 0 (mod 7) 36. 14x + 12 ≡ 124 (mod 6) 37. 5x − 25 ≡ 71 (mod 14)

38. 2x − 4 ≡ 14 (mod 7) 39. 3(2−4(x+1)) ≡ 27 (mod 3) 40. −20 − 4x ≡ 40 (mod 4)


41. 5x + 22 ≡ 15 (mod 8)    42. 6x + 2 ≡ 0 (mod 7)    43. (3 + 10)x + 2(7) + 3 ≡ 5(14) (mod 7)

♠ 44. Show that the linear equation 2x + 3 ≡ 4 (mod 6) has no solution.

† The symbol for the rational field Q comes from the English word Quotient, given that every rational number can be

expressed as a quotient of integers.


†† The symbol for the ring of integers Z comes from the German word Zahlen, which translates as number.
‡ By the way, the reciprocal of 2 modulo 13 is 7 since 2(7) ≡ 14 ≡ 1 (mod 13). Note that x ≡ 5(2)−1 ≡ 5(7) ≡ 35 ≡ 9

(mod 13).

“Your sacred space is where you can find yourself again and again.” – Joseph Campbell

1.3 Vector Spaces


Now that we have learned about fields and linear equations, the next important linear structure we should discuss is that of a linear combination. In order to define this, we first need a notion of a vector. We say that a mathematical quantity is a vector if we can add two vectors together to create a third vector and we can scale a vector by a scalar (hence the name). Vectors are distinct from scalars. While a scalar is a number, a vector is more complicated than just a number. To distinguish the extra complexity of vectors, vector quantities are denoted either in boldface font, such as u, v, or with an arrowhead above the symbol, such as ~u, ~v. The following definition makes this notion precise.

Definition 1.3.1. A vector space over a field F is a nonempty set V of elements, called vectors, on which are defined two operations, called addition + : V × V → V and scalar multiplication · : F × V → V, such that for all u, v, w ∈ V and all scalars c, d ∈ F, the following eight axioms hold:

(i) Additive Commutativity: u + v = v + u
(ii) Additive Associativity: (u + v) + w = u + (v + w)
(iii) Additive Identity: There exists a vector 0 ∈ V, such that u + 0 = 0 + u = u
(iv) Additive Inverse: For each u, there exists a vector −u ∈ V, such that u + (−u) = (−u) + u = 0
(v) Left Distributivity: c(u + v) = cu + cv
(vi) Right Distributivity: (c + d)u = cu + du
(vii) Multiplicative Associativity: c(du) = (cd)u
(viii) Multiplicative Identity: 1u = u

Example 1.3.2. In physics, a vector represents a mathematical quantity with both magnitude and direction, such as a force applied to an object. Vectors are thus represented as arrows pointing in the given direction and whose length represents the magnitude of the vector. Physical vectors are also free to move about in space, that is, their exact location does not matter so long as neither their length nor their direction changes. Therefore, the three arrows illustrated in the accompanying figure all represent the same vector v. Hence, the arrow is determined by the relative displacement of the head of the arrow from its tail. These vectors can be added together by the so-called parallelogram rule: begin by placing the tails of the two vectors u and v together, form a parallelogram whose parallel sides correspond to u with a copy of u and v with a copy of v; the sum u + v is the unique vector which is the diagonal of the parallelogram whose tail agrees with the tail of u and v, as displayed:

[Figures: the parallelogram rule for forming u + v; a diagram illustrating u + v = v + u; and a diagram illustrating (u + v) + w = u + (v + w).]

The second diagram shows why arrow addition is commutative, as the two paths from the tail of u + v to its head produce the same vector. In particular, the black, solid path comprises v + u while the gray, dashed path comprises u + v, and both paths have the same starting and ending points. The third diagram likewise
demonstrates how three or more vectors are added together and why arrow addition is associative, as the
path following (u + v) then w produces the same vector as the path following u then (v + w).
The zero vector 0 would be simply a point in space, that is, the vector with no magnitude nor direction.
Notice that adjoining a point to the head or tail will neither lengthen the arrow nor turn it. Thus, v + 0 = v.
Additionally, given a vector v, −v is defined as the arrow with the same length but pointing in the opposite
direction. It holds that v + (−v) = (−v) + v = 0.

[Figure: an arrow v and the oppositely directed arrow −v.]

We define scalar multiplication of arrows by multiplying the length of the arrow by the scalar (and switching directions if the scalar is negative). Clearly, 1v = v, since the length of the arrow is unchanged, and c(du) = (cd)u, since stretching a vector by a factor of d and then stretching by a factor of c has the net effect of stretching the arrow by a factor of cd. It is left as an exercise for the reader to prove the two distributive laws. Therefore, the set of arrows in space forms a vector space under these operations. 

Example 1.3.3. Another type of vector is an array of numbers. Let F be a field. Then a column vector is an array of n scalars from F such that the array is oriented vertically, e.g. v with the entries 1, 2, 3 stacked in a column. A row vector likewise is an array of n scalars such that the array is oriented horizontally, e.g. v = (1, 2, 3). Our notation for a row vector agrees with the commonly used notation for coordinates of points in Cartesian geometry. As such, we naturally identify these arrays of scalars with points in space. The difference between row vectors and column vectors is purely notational, and we will use the two notations interchangeably.† The order of the scalars in the array does matter, e.g. (1, 2, 3) ≠ (2, 3, 1).

Let F^n denote the set of all column vectors with n entries coming from the field F, for example,

    (π, 3, 0, −√17) ∈ R^4,    (1 − 2i, 2, 2 − 2i) ∈ C^3,    or    (1, 2, 0, 2, 1) ∈ Z_3^5.††

Addition of column vectors is component-wise, that is, scalars in the corresponding positions are added together:

    u + v = (u1, u2, . . . , un) + (v1, v2, . . . , vn) = (u1 + v1, u2 + v2, . . . , un + vn).        (1.3.1)

For scalar multiplication, every component is multiplied by the scalar, that is:

    cx = c(x1, x2, . . . , xn) = (cx1, cx2, . . . , cxn).        (1.3.2)
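These component-wise rules translate directly into code. Here is a minimal Python sketch (plain lists, no particular linear algebra library) of addition and scalar multiplication in F^n; the sample values anticipate Example 1.3.4.

    def vec_add(u, v):
        # Component-wise sum, as in (1.3.1).
        return [ui + vi for ui, vi in zip(u, v)]

    def vec_scale(c, x):
        # Multiply every component by the scalar c, as in (1.3.2).
        return [c * xi for xi in x]

    print(vec_add([6, -2, 2], [-3, 0, 5]))   # [3, -2, 7]
    print(vec_scale(3, [-1, 3, 2]))          # [-3, 9, 6]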

Here the zero vector 0 is the array of all zeros and −v = (−1)v. We show that this definition of vector addition satisfies the additive associativity axiom:

    u + (v + w) = (u1, . . . , un) + (v1 + w1, . . . , vn + wn)
                = (u1 + (v1 + w1), . . . , un + (vn + wn))
            (∗) = ((u1 + v1) + w1, . . . , (un + vn) + wn)
                = (u1 + v1, . . . , un + vn) + (w1, . . . , wn)
                = (u + v) + w,

where the marked equality (∗) follows from the additive associativity of the field F. The remaining axioms of a vector space are left as an exercise to the reader. Therefore, F^n is a vector space, the one we will place the most focus on in this text. 

   
 6   −3 
    3
Example 1.3.4. Let u =  −2  , v =  0  ∈ R . Then
 

   
2 5
       
 6   −3   6 − 3   3 
       
u+v =
  +  0  =  −2 + 0
−2     =
  .
−2 
       
2 5 2+5 7
 
 −1 
  3
If v =  3  ∈ R , then

 
2
   
 −1   −3 
   
3v = 3 
 3  =  9 .
   
   
2 6

Definition 1.3.5. Given vectors v 1 , v 2 , . . . , v n ∈ F m and scalars c1 , c2 , . . . , cn ∈ F , the vector x given as

x = c1 v 1 + c2 v 2 + . . . + cn v n

is called a linear combination of v 1 , v 2 , . . . , v n with coefficients c1 , c2 , . . . , cn .

A linear combination is a way of combining vectors using addition and scalar multiplication. These vector operations, of course, depend entirely on the field of scalars.

Example 1.3.6. Let us simplify the following linear combinations over the vector spaces C^2 and Z_3^3, respectively.

(a) 2(3, i) + i(2 + i, 3 + 5i) = (6, 2i) + (−1 + 2i, −5 + 3i) = (6 + (−1 + 2i), 2i + (−5 + 3i)) = (5 + 2i, −5 + 5i).

(b) 4(0, 1, 2) + 2(1, 1, 1) + 2(1, 0, 2) ≡ (0, 1, 2) + (2, 2, 2) + (2, 0, 1) ≡ (0 + 2 + 2, 1 + 2 + 0, 2 + 2 + 1) ≡ (4, 3, 5) ≡ (1, 0, 2) (mod 3). 
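Over a finite field Zp the same component-wise rules apply, with every entry reduced modulo p. The following short sketch (our own helper) reproduces the kind of computation in part (b).

    def mod_combo(terms, p):
        # Sum of scalar multiples of vectors, with each entry reduced modulo p.
        # Each term is a pair (scalar, vector).
        n = len(terms[0][1])
        return [sum(c * v[i] for c, v in terms) % p for i in range(n)]

    print(mod_combo([(4, [0, 1, 2]), (2, [1, 1, 1]), (2, [1, 0, 2])], 3))   # [1, 0, 2]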

Vectors can also be many other things, like matrices, functions, sequences of numbers. Even linear equa-
tions can be vectors.

Example 1.3.7. Let V be the set of linear equations with n variables, say x1 , x2 , . . . , xn with coefficients
coming from the field F . For example,

c 1 x 1 + . . . + c n x n = b1

is an element of V for c1 , . . . , cn , b1 ∈ F . Likewise, if

d 1 x 1 + . . . + d n x n = b2

is another linear equation in V , then

(c1 x1 + . . . + cn xn = b1 )
+ (d1 x1 + . . . + dn xn = b2 )
(c1 + d1 )x1 + . . . + (cn + dn )xn = (b1 + b2 )

Note that ci + di , b1 + b2 ∈ F for all i. Thus, the sum of equations is a member of V , that is, it is a vector
too. Similarly, if a ∈ F , then c1 x1 + . . . + cn xn = b1 scaled by a is the linear equation

(ac1 )x1 + . . . + (acn )xn = (ab1 ).

As aci , ab1 ∈ F for all i, this equation is likewise a vector. Therefore, V is a vector space. Let E1 , E2 , . . . , Em ∈
V be a list of linear equations with common solution x ∈ F n . Then x is a solution to any linear combination
a1 E1 + a2 E2 + . . . + am Em . 

Theorem 1.3.8. The following properties hold for any F -vector space V . Let u ∈ V and c ∈ F .

(i) The zero vector 0 in V is unique. (ii) The additive inverse of u is unique.

(iii) 0u = 0. (iv) c0 = 0. (v) −u = (−1)u.

Proof. (i) Suppose there is a vector θ ∈ V such that θ + v = v + θ = v for all v ∈ V . Then

0 = 0 + θ = θ.

Therefore, 0 is the unique vector with this property.

(ii) Like the last property, suppose that u′ ∈ V has the property that u + u′ = u′ + u = 0. Then

u′ = u′ + 0 = u′ + (u + (−u)) = (u′ + u) + (−u) = 0 + (−u) = −u.

Therefore, −u is the unique vector with this property.

(iii) First,
0u = (0 + 0)u = 0u + 0u.
Adding −0u to both sides of the equation gives 0 = 0u.

The remaining parts are left as exercises to the reader. 



Exercises (Go to Solutions)


For Exercise 1, determine whether the statement is true or false. If false, correct the statement so that it is true.

1. For any vector space V , ∅ ∈ V .


For Exercises 2–23, simplify the linear combination.

2. 5(1, 2, 4) + 3(0, 2, 1) + 4(1, 1, 1)    ♠ 3. 3(−1, 0, 1) + 2(0, 0, 1) − 2(3, 4, 0)

4. (3, 2, 1) + (5, 8, 10) + (0, 0, 5)    5. −1(3, 0, 1) + (1, 2, 4) + 3(2, 4, 1)

6. 7(3, 0, 2) + 2(8, −3, 5) − 3(−2, −2, 0)    7. 8(1, 4, 3) + 2(5, 0, 4) − 4(1, 0, 1)

8. (2 + 5i)(3, 4) − (3 − 2i)(7 + i, 6 + 2i)    ♠ 9. (3 + 2i)(1, 2) − (1 − 2i)(3 + 4i, 1 + i)

10. (2 + i)(1 + i, i) + (2i, 1 + 2i)    11. (2 + 4i)(2, 3) − (3 − 2i)(1 + 2i, 2 − i)

12. (1 + i)(1 − i, 2 + i) − (1 − i)(3 + i, 4 − i)    13. (1 − i)(1, 2, −1) + (2, 3, 5) − (2 + 3i)(1, 4, 2)

14. (4 + 5i)(3, 5, 2) + (3 − 2i)(1 + 3i, 2 − 3i, 2 + 5i)    15. 2(2, 1) − (1, 0) + 2(1, 1) (mod 3)

16. −10(3, 2) + (5, 6) (mod 7)    ♠ 17. (1, 0, 1) + (1, 1, 0) + (1, 1, 1) (mod 2)

18. 6(1, 3, 4) + 2(1, 1, 5) + (1, 1, 1) (mod 3)    19. (3, 4, 3) + (4, 1, 1) + (0, 1, 1) (mod 5)

20. 2(1, 2, 3) + 3(3, 2, 1) + 4(0, 1, 0) (mod 5)    ♠ 21. 4(1, 2, 1, 0) + 2(3, 2, 0, 1) + (3, 2, 0, 4) (mod 5)

22. 3(−1, 2, −3, 0) − 2(1, 2, 2, 1) + (2, 3, 0, 1) (mod 5)    23. (4, 2, 3, 1) + 2(0, 1, 4, 3) + 3(1, 1, 2, 3) (mod 5)

♠ 24. Prove parts (iv) and (v) of Theorem 1.3.8.

♠ 25. Let Pn = {a0 + a1 x + a2 x2 + . . . + an xn | a0 , a1 , a2 , . . . , an ∈ F } be the set of polynomials of degree at


most n and coefficients from a field F . Show that Pn is a vector space, hence we may view polynomials
as vector quantities. (More specifically, show that the sum of two polynomials is again a polynomial
and the scalar multiple of a polynomial is likewise a polynomial, similar to the method introduced in
Example 1.3.7).

Note that the way we defined vector addition and scalar multiplication over F n is essentially the only way
to do it to guarantee the vector space axioms. For Exercises 26-29, we will redefine scalar multiplication
over F 3 (and keep vector addition the same). Find a counterexample for why this alternative definition of
scalar multiplication fails at least one vector space axiom or property and hence does not produce a genuine
vector space. Answers may vary.
      
♠ 26. Redefine scaling: c(x1, x2, x3) = (x1, x2, x3)    ♠ 27. Redefine scaling: c(x1, x2, x3) = (cx1, x2, x3)

♠ 28. Redefine scaling: c(x1, x2, x3) = (−cx1, −cx2, −cx3)    ♠ 29. Redefine scaling: c(x1, x2, x3) = (0, cx2, cx3)

 
 1 
† From a vector perspective, the notion of a column vector 
 
 2 , a row vector [1 2 3], and a geometric point (1, 2, 3) are all

 
3
the same. Why have different notations then? In other contexts, it is helpful to distinguish between them. In calculus or physics,
a distinction between points and vectors (aka arrows) is sometimes desired (although often unnecessary). Thus, points have
their usual notation and arrows are given a different notation, which may include the column or row vector notation we have
already introduced or some other notation that we will not use here, e.g. h1, 2, 3i. As arrows can be moved anywhere in space
without changing their value, the vectors are often placed in standard position, meaning its tail is on the origin. Then the head
of the arrow is a unique point in space which characterizes this arrow. This standard representation of the vector, sometimes
called its algebraic form, is very desirable as the algebraic computations are far simpler their their geometric counterparts. For
this reason, we see not strong reason to distinguish between arrows in space and the coordinates that they are pointing at. The
only perspective that we will distinguish between a column vector and a row vector herein is when we consider them matrices,
such as in Chapter (ii). This may be imperative for matrix operations, such as multiplication. Under the matrix perspective,
a column vector is just an n × 1 matrix, and a row vector is just a 1 × n matrix.
†† The combined notations of the set Z , which is a field whose subscript defines the modulus, and the vector space F n ,
p
whose superscripts defines the number of entries in each vector, conveys information about the vector space Zn p . For example,
Z53 denotes the vector space whose vectors comprise of five elements in the array that are integers congruent modulo 3, e.g,
 
 1 
 
 2 
 
 
 0 . Note that no integer in Zn 3 ever needs to be greater that 2 or smaller than 0, as all those integers are reducible modulo
 
 
 
 2 
 
 
1
3. Hence, Z53 contains exactly 35 = 243 distinct vectors. In general, the vector space Zn n
p will contain exactly p many vectors.

“Don’t spend time beating on a wall, hoping to transform it into a door.” – Coco Chanel

1.4 Linear Transformations


The next important linear structure, which we have seen before (primarily in calculus), is the notion of a linear operator or a linear transformation. This is a function on vectors which preserves linear combinations.

Definition 1.4.1. Let X and Y be vector space over a field F . A function T : X → Y is a linear
transformation if:
(i) T (u + v) = T (u) + T (v), for all vectors u, v ∈ X;
(ii) T(cu) = cT(u), for all vectors u ∈ X and scalars c ∈ F.

Recall that X and Y are called the domain and codomain of T, respectively. For any x ∈ X, we call T(x) the image of x under T, and we call im(T) = {T(x) | x ∈ X} (the set of images) the image (or range) of T. The kernel of a linear transformation T, denoted ker(T), is the set of vectors which map to the zero vector, that is, ker T = {x | T(x) = 0}.

Example 1.4.2. In calculus, there are many linear operators. For example, the derivative is a linear
transformation since
d d d d d
[f (x) + g(x)] = [f (x)] + [g(x)] and [cf (x)] = c [f (x)].
dx dx dx dx dx
 
d
Note that ker is the set of constant functions. Likewise, limits are linear operators:
dx
lim [f (x) + g(x)] = lim [f (x)] + lim [g(x)] and lim [cf (x)] = c lim [f (x)].
x→a x→a x→a x→a x→a

Likewise, antiderivatives, definite integrals, indefinite integrals, and series are all linear operators on func-
tions. Linear algebra was everywhere in calculus, we just didn’t know it! 

Proposition 1.4.3. Let T : X → Y be a linear transformation. Let x1 , . . . , xn ∈ X and c1 , . . . , cn ∈ F .


Then
T (c1 x1 + . . . + cn xn ) = c1 T (x1 ) + . . . + cn T (xn ).
The converse also is true, that is, T is a linear transformation if it preserves linear combinations. In
particular, T (0) = 0.

Example 1.4.4. Consider the function T : R^3 → R^2 given by the rule T(x1, x2, x3) = (x1 + 2x2, x3 − 3x2).

This is a function which takes a vector in 3-space, such as (1, 2, 3), and maps it onto a vector in 2-space, namely T(1, 2, 3) = (1 + 2(2), 3 − 3(2)) = (5, −3). But more than just a function that sends 3D vectors onto 2D vectors, this function preserves the structure of the vector space R^3 as it maps onto R^2. This function is a linear transformation since

    T(a(x1, x2, x3) + b(y1, y2, y3)) = T(ax1 + by1, ax2 + by2, ax3 + by3)
        = ((ax1 + by1) + 2(ax2 + by2), (ax3 + by3) − 3(ax2 + by2))
        = a(x1 + 2x2, x3 − 3x2) + b(y1 + 2y2, y3 − 3y2)
        = aT(x1, x2, x3) + bT(y1, y2, y3).

We next compute the kernel of T. Suppose T(x) = 0. This implies that (x1 + 2x2, x3 − 3x2) = (0, 0). Comparing components, we find two linear equations x1 + 2x2 = 0 and x3 − 3x2 = 0. Together, this forms a homogeneous system of linear equations:

    x1 + 2x2 = 0
    − 3x2 + x3 = 0.

If we attempt to solve this system, we could use the method of substitution. Note we can solve for x1 in the first equation, which gives x1 = −2x2. Likewise, we can solve for x3 in the second equation and find x3 = 3x2. What we see now is that x1 and x3 are determined by our choice of x2 and there appears to be NO restriction on our choice of x2, that is, we may choose it freely. For example, if x2 = 1, it would mean that x1 = −2 and x3 = 3, that is, x1 and x3 are dependent on x2. Note that

    T(−2, 1, 3) = ((−2) + 2(1), (3) − 3(1)) = (−2 + 2, 3 − 3) = (0, 0) = 0.

Thus, this vector is in the kernel. In fact, the solution set to this system is exactly the kernel of T. If we let x2 = t, some free parameter t, then we see that

    ker T = {(−2t, t, 3t) | t ∈ R} = {t(−2, 1, 3) | t ∈ R} = Span{(2, −1, −3)}.

Is the vector b = (1, 2) in the image of T? If so, there would be a vector x = (x1, x2, x3) such that T(x) = b. But this implies that (x1 + 2x2, x3 − 3x2) = (1, 2). Again, this vector equation implies two linear equations, namely, x1 + 2x2 = 1 and −3x2 + x3 = 2, which as a system of equations is

    x1 + 2x2 = 1
    − 3x2 + x3 = 2.

Again by substitution, we see x1 = 1 − 2x2 and x3 = 2 + 3x2. Thus, if x2 = 0, we get that

    T(1, 0, 2) = (1 + 2(0), 2 − 3(0)) = (1, 2).

Therefore, yes, b ∈ im T. Of course, many other choices of x were possible such that T(x) = b. 
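For a transformation given by explicit formulas like this one, the kernel and image claims are easy to spot-check numerically. The sketch below encodes T from Example 1.4.4 and verifies that every vector of the form t(−2, 1, 3) maps to the zero vector, and that (1, 2) is indeed an image.

    def T(x1, x2, x3):
        # The transformation of Example 1.4.4.
        return (x1 + 2 * x2, x3 - 3 * x2)

    print(T(1, 2, 3))                  # (5, -3), as computed in the text
    for t in (-1, 0, 2, 5):
        print(T(-2 * t, t, 3 * t))     # always (0, 0): these vectors lie in ker T
    print(T(1, 0, 2))                  # (1, 2), so b = (1, 2) is in the image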

Notice that the components of T (x) in the previous example are linear combinations of the components
of the input vectors, that is, each slot in T (x) is a linear combination of the variables x1 , x2 , . . . It can be
shown that a function T : F n → F m is a linear transformation if and only if the components of T (x) are
linear combinations of the components of x. Another takeaway from the previous example is that computing
the kernel or image of a linear transformation resulted in solving a system of linear equations. This is no
coincidence and we will discuss solving systems of linear equations more in the next sections.

 
 5a + 2c 
Example 1.4.5. Consider the function T : P2 → R2 given by the rule T (ax2 + bx + c) =  .
−a + c
Let f (x) = a1 x2 + b1 x + c1 , g(x) = a2 x2 + b2 x + c2 ∈ P2 and r ∈ R. Then

T (f (x) + g(x)) = T ((a1 x2 + b1 x + c1 ) + (a2 x2 + b2 x + c2 )) = T ((a1 + a2 )x2 + (b1 + b2 )x + (c1 + c2 ))


       
 5(a1 + a2 ) + 2(c1 + c2 )   (5a1 + 2c1 ) + (5a2 + 2c2 )   5a1 + 2c1   5a2 + 2c2 
= = = + 
−(a1 + a2 ) + (c1 + c2 ) (−a1 + c1 ) + (−a2 + c2 ) −a1 + c1 −a2 + c2
= T (a1 x2 + b1 x + c1 ) + T (a2 x2 + b2 x + c2 ) = T (f (x)) + T (g(x))

and

T (rf (x)) = T (r(a1 x2 + b1 x + c1 )) = T ((ra1 )x2 + (rb1 )x + (rc1 ))


     
 5(ra1 ) + 2(rc1 )   r(5a1 + 2c1 )   5a1 + 2c1 
= =  = r =
−(ra1 ) + (rc1 ) r(−a1 + c1 ) −a1 + c1
rT (a1 x2 + b1 x + c1 ) = rT (f (x)).

Therefore we see that T is a linear transformation between the vector spaces P2 and R2 .
   
 5a + 2c   0 
To find the kernel of T , we must solve the vector equation   =  , which corresponds to
−a + c 0
(
5a + 2c = 0
the linear system We solve this system by elimination. Scale the second equation by 5 and
−a + c = 0.
add together the resulting equation with the first equation. This gives 7c = 0, which implies that c = 0. The
original second equation −a + c = 0 implies that a = c. Hence, a = 0 likewise. What about the coefficient b?
The map T seems to “forget” about the coefficient b, that is, the calculation of the image T (ax2 + bx + c) is
independent of the coefficient b. The above linear system is actually a system with two equations and three
unknowns, although none of the equations place any restriction on the coefficient b. Hence, a = c = 0 are
dependent variables and b is a free variable. Therefore, ker T = {bx | b ∈ R} ⊆ P2 .
To find the image of T , we consider the general vector equation (5a + 2c, −a + c) = (x, y), which corresponds
to the linear system
5a + 2c = x
 −a +  c = y.
Reducing this linear system by the same elimination technique as we did with the kernel, we get 7c = x + 5y,
or c = (x + 5y)/7. Since a = c − y, we also get that a = (x + 5y)/7 − y = (x − 2y)/7. For example, if we
wanted a polynomial which mapped onto (3, 1) via T , these formulas give c = (3 + 5)/7 = 8/7 and
a = (3 − 2)/7 = 1/7, so we could select f (x) = (1/7)x2 + 8/7 (like we saw in the kernel calculation, the
coefficient b is meaningless via this transformation and may be chosen freely). In particular, im T = R2 . 

Definition 1.4.6. A mapping T : X → Y is said to be onto (or surjective) if for each b ∈ Y there is at
least one x ∈ X such that T (x) = b, that is, the codomain and range of T are the same.

A mapping T : X → Y is said to be one-to-one (or injective) if for each b ∈ Y there is at most one
x ∈ X such that T (x) = b, that is, T (u) = T (v) implies that u = v.

A mapping T : X → Y is said to be invertible (or bijective) if for each b ∈ Y there is exactly one
x ∈ X such that T (x) = b, that is, T is one-to-one and onto. In particular, there exists another mapping
S : Y → X such that S ◦ T = Id and T ◦ S = Id, where Id is the identity function x 7→ x.

Proposition 1.4.7. Let T : X → Y be a linear transformation. Then T is one-to-one if and only if


ker T = {0}. Additionally, T is onto if and only if im T = Y .

Example 1.4.8. In Example 1.4.4, we see that T is not one-to-one since the kernel is nontrivial, that is,
both 0 and (2, −1, −3) have the same image.
We can determine whether T is onto by determining if a generic vector in R2 , say b = (b1 , b2 ) is an image
of some vector x ∈ R3 . Now this becomes a vector equation:

T (x1 , x2 , x3 ) = (b1 , b2 ) ⇒ (x1 + 2x2 , x3 − 3x2 ) = (b1 , b2 ) ⇒ the linear system
x1 + 2x2 = b1
   −3x2 + x3 = b2 .

Treating x2 as a free variable and setting it equal to 0, we see that x1 = b1 and x3 = b2 . Therefore,
T (b1 , 0, b2 ) = (b1 , b2 ) = b. Since b was not a specific vector in R2 but, in fact, a generic one, this shows that
any vector in R2 can be the image of a vector from R3 via T , that is, T is onto. 

Example 1.4.9. Consider the linear transformation T : Z32 → Z42 given by the rule:

(x1 , x2 , x3 ) 7→ (x1 , x2 , x3 , x1 + x2 + x3 ).

This time, if T (x1 , x2 , x3 ) = (0, 0, 0, 0) then x1 = x2 = x3 = 0. Thus, ker T = {(0, 0, 0)}. Therefore, T
is one-to-one. On the other hand, consider b = (1, 1, 1, 0). If b ∈ im T , then x1 = x2 = x3 = 1. But
x1 + x2 + x3 ≡ 1 ≢ 0. Thus, b ≠ T (x) for any x ∈ Z32 . Therefore, T is not onto. Note this linear
transformation determines an error detecting code: if an error occurs in exactly one of the four transmitted
bits, then the last bit no longer equals the sum of the first three, so the corruption is detected. 
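To make the error-detection idea concrete, here is a small Python sketch (our own illustration, not part of the text's development) of the encoder T and the check a receiver could run; all arithmetic is taken modulo 2.

    def encode(bits):
        # T(x1, x2, x3) = (x1, x2, x3, x1 + x2 + x3) over Z_2
        x1, x2, x3 = bits
        return (x1, x2, x3, (x1 + x2 + x3) % 2)

    def passes_check(word):
        # a received word lies in im T exactly when its last bit
        # equals the mod-2 sum of its first three bits
        x1, x2, x3, p = word
        return (x1 + x2 + x3) % 2 == p

    sent = encode((1, 0, 1))        # (1, 0, 1, 0)
    corrupted = (1, 1, 1, 0)        # the second bit was flipped in transit
    print(passes_check(sent))       # True
    print(passes_check(corrupted))  # False: the single-bit error is detected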

Exercises (Go to Solutions)

For Exercises 1-5, use the transformation T : R2 → R3 given by the rule:
T (x, y) = (x + y, 0, 2x + 3y).

♠ 1. Show that T is a linear transformation.
♠ 2. Compute T (1, 2) and T (5, −2).
♠ 3. Compute ker T .
♠ 4. Is b = (1, 0, 1) in the image of T ? Explain.
♠ 5. Determine whether T is one-to-one, onto, or neither.

For Exercises 6-10, use the transformation T : R3 → R2 given by the rule:
T (x1 , x2 , x3 ) = (x1 + x3 , x1 + 3x2 ).

6. Show that T is a linear transformation.
7. Compute T (0, 0, 0) and T (1, 1, 2).
8. Compute ker T .
9. Is b = (1, 1) in the image of T ? Explain.
10. Determine whether T is one-to-one, onto, or neither.

For Exercises 11-15, use the transformation For Exercises 16-18, use the transformation
T : R3 → R2 given by the rule: T : R3 → R3 given by the rule:
T (x, y, z) = (x + y − 2z, −y + z).
T (x, y, z) = (2z + y, 2x + 4, −2y).
11. Show that T is a linear transformation.
♠ 12. Compute T (1, 2, 3) and T (1, 0, −2). 16. Compute T (1, 3, 2).
♠ 13. Compute ker T .
17. Is b = (1, 6, 2) in the image of T ? Explain.
♠ 14. Is b = (3, −1) in the image of T ? Explain.
♠ 15. Determine whether T is one-to-one, onto, or 18. Show that T is NOT a linear transforma-
neither. tion.

For Exercises 19-21, use the transformation For Exercises 22-26, use the transformation
T : R3 → R2 given by the rule: T : Z42 → Z2 given by the rule:

T (x, y, z) = (2, 5). T (x, y, z, w) ≡ x + y + z + w.

22. Show that T is a linear transformation.


19. Show that T is NOT a linear transforma-
tion. ♠ 23. Compute T (1, 0, 0, 1) and T (1, 0, 1, 1).

♠ 24. Compute ker T .


20. Is b = (3, 2) in the image of T ? Explain.
♠ 25. Is b = 1 in the image of T ? Explain.
21. Determine whether T is one-to-one, onto, or ♠ 26. Determine whether T is one-to-one, onto, or
neither. neither.

For Exercises 27-27, use the transformation For Exercises 28-29, use the transformation
T : Z23 → Z3 given by the rule: T : Z35 → Z35 given by the rule:
 

T (x, y) = 2x + y.  x + 2y + z 
 
T (x, y, z) = 
 2x + 2y + 2z
.

 
27. Compute T (1, 2) and T (5, 2). 3z + 4y + 3z

28. Compute T (1, 0, 1) and T (1, 2, 3).

29. Is b = (2, 4, 3) in the image of T ? Explain.

For Exercises 30-32, use the transformation


T : F n → F given by the rule:

T (x1 , x2 , . . . , xn ) = a1 x1 + a2 x2 + . . . + an xn ,

where F is a field and a1 , a2 , . . . , an ∈ F .


30. Show that T is a linear transformation.
31. Is it possible to determine if T is one-to-one with this information? Why or why not?
32. Is it possible to determine if T is onto with this information? Why or why not?

“When you put your hand to the plow, you can’t put it down until you get to the end of the row.”
– Alice Paul

1.5 Augmented Matrices


/Let us begin to find an effective algorithm for solving linear systems. The following three Elementary Row
Operations † will be the basic techniques used to solve linear systems. In the elementary row operations, we
refer to equations as “rows” for reasons that will be more clear by the end of this section.

Elementary Row Operations

1. (Replacement) Replace one row by the sum of itself and a multiple of another row.

2. (Interchange) Interchange the order of any two rows in the system.

3. (Scaling) Multiply all scalars in a row by a nonzero scalar.

We say that two systems of linear equations are row equivalent if there is a sequence of row operations
that transforms one system into the other.
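The three operations are simple enough to mechanize. The sketch below, a minimal Python illustration of our own (not a full solver), stores a system as a list of augmented rows and performs each operation in place; it is applied to the system solved in Example 1.5.2 below.

    def replacement(rows, i, j, c):
        # replace Row i by (Row i + c * Row j)
        rows[i] = [a + c * b for a, b in zip(rows[i], rows[j])]

    def interchange(rows, i, j):
        # interchange Rows i and j
        rows[i], rows[j] = rows[j], rows[i]

    def scaling(rows, i, c):
        # multiply all scalars in Row i by the nonzero scalar c
        assert c != 0
        rows[i] = [c * a for a in rows[i]]

    # augmented rows [coefficients | constant] of the system in Example 1.5.2
    rows = [[1, -2, 2, 0], [0, 2, -8, -8], [-4, 6, 2, 10]]
    replacement(rows, 2, 0, 4)   # Row 3 + 4 * Row 1
    scaling(rows, 1, 1 / 2)      # scale Row 2 by 1/2
    replacement(rows, 2, 1, 2)   # Row 3 + 2 * Row 2
    print(rows)                  # rows 2 and 3 become [0, 1, -4, -4] and [0, 0, 2, 2]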

Theorem 1.5.1. Two linear systems are equivalent if and only if they are row equivalent.

Example 1.5.2. Solve 



 x1 − 2x2 + 2x3 = 0
2x2 − 8x3 = −8

−4x1 + 6x2 + 2x3 = 10

using row operations.

We begin by replacing Row 3 with 


(Row 3 + 4 ∗ Row 1). This gives: x1 − 2x2 + 2x3 = 0

2x2 − 8x3 = −8

− 2x2 + 10x3 = 10

1
Next, we scale Row 2 by a factor of , which gives: 
2 x1 − 2x2 + 2x3 = 0

x2 − 4x3 = −4

− 2x2 + 10x3 = 10

Lastly, we replace Row 3 with 


(Row 3 + 2 ∗ Row 2), which gives: x1 − 2x2 + 2x3 = 0

x2 − 4x3 = −4

2x3 = 2

At this point, we recognize that we have solved for x3 , that is, x3 = 1. Substituting this into Row 2 gives

x2 − 4x3 = −4 ⇒ x2 = −4 + 4x3 = −4 + 4(1) = −4 + 4 = 0.

Now we have solved for x2 . Lastly, we use these values to solve for x1 :

x1 − 2x2 + 2x3 = 0 ⇒ x1 = 2x2 − 2x3 = 2(0) − 2(1) = 0 − 2 = −2.

Therefore, (x1 , x2 , x3 ) = (−2, 0, 1) is the unique solution to the above system. 



Definition 1.5.3. If m and n are positive integers, an m × n matrix is a rectangular array of scalars with
m rows and n columns. Note that the numbers of rows always comes first.
For example,  
 1 2 3 
 
5 0 −3

is a 2 × 3 matrix over R.

The essential information of a linear system 


can be recorded compactly in a matrix. Consider 
 x1 − 2x2 + 2x3 = 0
the system from the previous example 2x2 − 8x3 = −8

−4x1 + 6x2 + 2x3 = 10

Aligning similar variables into columns, we create the coefficient matrix
[  1  −2   2 ]
[  0   2  −8 ]
[ −4   6   2 ]
and the augmented matrix
[  1  −2   2 |  0 ]
[  0   2  −8 | −8 ]
[ −4   6   2 | 10 ]

Example 1.5.4. Solve the following system of equations:




 3x2 + 3x3 = 11
2x1 − 3x2 + 3x3 = −4

x1 + x2 + 4x3 = 3

using the augmented matrix.

The augmented matrix is:
[ 0  3  3 | 11 ]
[ 2 −3  3 | −4 ]
[ 1  1  4 |  3 ]

Now, the row operations for linear systems work on augmented matrices as well. Interchanging Rows 1 and 3 gives:
[ 1  1  4 |  3 ]
[ 2 −3  3 | −4 ]
[ 0  3  3 | 11 ]

Next, replace Row 2 with (Row 2 − 2 Row 1), which gives:
[ 1  1  4 |   3 ]
[ 0 −5 −5 | −10 ]
[ 0  3  3 |  11 ]

Next, scale Row 2 by −1/5 to get:
[ 1  1  4 |  3 ]
[ 0  1  1 |  2 ]
[ 0  3  3 | 11 ]

Finally, replace Row 3 with (Row 3 − 3 Row 2), which gives:
[ 1  1  4 |  3 ]
[ 0  1  1 |  2 ]
[ 0  0  0 |  5 ]

This implies that 


x1 + x2 + 4x3 = 3

x2 + x3 = 2

0 =5

which is impossible. Therefore, the system has no solution, that is, the system is inconsistent . 

Definition 1.5.5. In a matrix, we say a row is a zero row if all scalars in this row are zero. Otherwise,
we call it a nonzero row. In a nonzero row, we say the leading entry of the row is the leftmost, nonzero
entry in the row.

A matrix is in (row) echelon form if it has the following three properties:


1. There is no zero row above a nonzero row.

2. Each leading entry is in a column to the right of the leading entry of the row above it.
3. All entries in a column below a leading entry are zero.

Essentially, 1. requires all zero rows to be at the bottom of a matrix in echelon form. Additionally, 2.
and 3. require there exists a downward staircase of zeros in the lower left of the matrix, hence the name
echelon.

A matrix in echelon form is in (row) reduced echelon form (or RREF) if additionally:
4. The leading entry in each nonzero row is 1.

5. All entries in a column above a leading entry are zero.

When considering whether an augmented matrix is in (row reduced) echelon form, consider only those
columns to the left of the vertical line. That is, an augmented matrix is in (row reduced) echelon form if
and only if its coefficient matrix is.

A pivot position, or simply just a pivot, in a matrix is a location that corresponds to a leading entry
in one of its echelon forms. A pivot column (or row) is a column (or row) that contains a pivot position.
The number of pivots of a matrix A is called its rank, denoted rank(A).
A priori, we do not know the locations of the pivots in a matrix, but we can easily see them when
the matrix is in echelon form. Now if two matrices are row equivalent, then their pivot positions will be
the same. Thus, it will be highly useful to be able to compute echelon forms row equivalent to given matrices.

Example 1.5.6. The following two matrices are in echelon form:


   
 1 i 2 − 5i 3   1 0 0 −3 
   
 0 0
 1 2−i   and  0
 1 0 .
0 
   
0 0 0 5 0 0 1 4

The first matrix is NOT in row reduced echelon form, since there are nonzero entries above the pivots in the
third and fourth columns (and the leading entries are not all 1). The second one is in row reduced echelon
form, and its first three columns are pivot columns. Both matrices have rank 3. 

Solving systems of linear equations when in echelon form is very simple. Solve first for the equation
involving one variable. Substitute this assignment of the variable into the linear equation involving two vari-
ables. Solve for the remaining unknown. Plug these two assignments into the equation with three unknowns.
Repeat this process as necessary. This technique is often called back-substitution.
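Back-substitution is easy to automate once the system is in echelon form. The following Python sketch (our own, assuming a square upper triangular coefficient matrix with nonzero diagonal entries) recovers the unique solution; it is only an illustration of the technique, not a general solver. Applied to the echelon system of part (a) of Example 1.5.7 below, it returns (2, −6, 5).

    def back_substitute(U, b):
        # solve U x = b, where U is upper triangular with nonzero diagonal
        n = len(U)
        x = [0] * n
        for i in range(n - 1, -1, -1):   # work from the last equation upward
            s = sum(U[i][j] * x[j] for j in range(i + 1, n))
            x[i] = (b[i] - s) / U[i][i]
        return x

    # 2x - y + 3z = 25,  -y + 8z = 46,  15z = 75
    U = [[2, -1, 3], [0, -1, 8], [0, 0, 15]]
    b = [25, 46, 75]
    print(back_substitute(U, b))   # [2.0, -6.0, 5.0]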

Example 1.5.7. Solve the following linear systems from their augmented matrices which are in echelon
form.
(a) Examining below the echelon matrix and its corresponding system of linear equations, we see that we
can easily solve this system. Starting with the third equation, we get that z = 5. Substituting this into
the second equation, which depends only on y and z, we get y = −46+8z = −46+40 = −6. Finally, we
substitute these values into the first equation and get x = (1/2)(y − 3z + 25) = (1/2)(−6 − 15 + 25) = 4/2 = 2.
Therefore, the unique solution to the system is (2, −6, 5) .
 
 2 −1 3 25  
  2x −
 y + 3z = 25
 0
 −1 8 46 
 ∼ −y + 8z = 46

15z = 75
  
0 0 15 75

(b) This augmented matrix is in row reduced echelon form. In terms of the linear system, the system is
already solved, and the solution corresponds to the augmented column of the matrix, that is, (3, −5, 3)
is the unique solution to this linear system.
 
 1 0 0 3  
x
 = 3
 
 0 1 0 −5 
  ∼ y = −5

z= 3
  
0 0 1 3

(c) The final equation in this system is an identity which is true for all assignments of the variables. It
neither adds nor takes away any restrictions on the solution set. In fact, this equation could be removed
without changing the solution set. Because the system essentially has two equations but three variables,
it must be true that at least one of the variables is free.
 
 1 0 −3 2 
 
x1
 − 3x3 = 2 x1 = 2 + 3x3

 
 0 1
 3 7   ∼ x 2 + 3x 3 = 7 ∼ x2 = 7 − 3x3
 
0 =0 x3 = any number
   
0 0 0 0

Notice that the equation 0 = 0 places no restriction on the variables. It could be removed without
altering the solution set. Thus, there is no restriction placed on x3 , that is, x3 could be freely assigned

any real number t. This is what we mean by calling x3 a free variable. Upon choosing any such number,
x1 and x2 are determined by their relation with x3 , that is, their assignment is restricted by equations
1 and 2. Thus, this system has multiple solutions, whose general form is (2 + 3t, 7 − 3t, t) .

Whether there is a free variable or not in the a linear system is relatively easy to determine once
you know how to look for it. In particular, dependent variables correspond to pivot columns of the
augmented matrix, and free variables correspond to the remaining columns, the so-called non-pivot
columns.
Theorem 1.5.8. A linear system has multiple solutions if and only if it is consistent and its coefficient
matrix has at least one non-pivot column. These non-pivot columns correspond in a one-to-one manner with
the free variables of the linear system.

(d) Examples like the previous one sometimes give the false narrative that multiple solutions occur only
when the linear system contains the identity 0 = 0 in some echelon form. This is patently false.
Consider the following overdetermined system (more rows than columns) below. The linear system
contains two rows of zeros but has a unique solution, namely (5, −2).
 
 1 0 5 
 
 0 1 −2 
 
 .
 
 0 0 0 
 
 
0 0 0

On the other hand, an underdetermined system (more columns than rows) will necessarily have some
columns which cannot have any pivots, as the number of pivots is bounded above by the number of
rows and the number of columns. Thus, it MUST have at least one non-pivot column (see Proposition
1.1.8). This means that an underdetermined system MUST have free variables, which will
provide multiple solutions if the linear system is consistent. For example, the below linear system is
underdetermined but has no row of zeros. Nonetheless, the third column is non-pivot, meaning that
x3 is a free variable of the system. The general solution would then be (2 − t, 3 − t, t) for any scalar t.

[ 1 1 2 | 5 ]        x1 + x3 = 2        x1 = 2 − x3
[ 0 1 1 | 3 ]   ∼    x2 + x3 = 3   ∼    x2 = 3 − x3 .
The false narrative derives from the overexposure linear algebra students often get to n × n (square)
linear systems. In that case, if there is a row of zeros, then the square system devolves into an
underdetermined system, which has a free variable. The multiple solutions follow, not from the row of
zeros, but from the non-pivot column(s).

(e) The third equation gives a contradiction, since there is no choice of variables such that 0 = 8. This
tells us that the linear system is inconsistent .
 
[ 1 0 −1 | 1 ]        x1      −  x3 = 1
[ 0 2  4 | 4 ]   ∼        2x2 + 4x3 = 4
[ 0 0  0 | 8 ]                    0 = 8

Theorem 1.5.9. A linear system is inconsistent if and only if it contains a row of the form
 
0 0 ... 0 b (b 6= 0)

in some echelon form of the matrix. If the linear system is consistent, then the solution is unique if
and only if the system has no free variables. 

Exercises (Go to Solutions)


For Exercises 1-5, each of the augmented matrices are in echelon form. Identify the pivot positions and
which variables, e.g. x1 , x2 , x3 , . . ., are free variables in the associated linear system.
     
1 0 0 1 2 1 4 −1 0 2 1 3 0
1.  2. 
     
   
0 1 0 0 1 3 5 0 3.  0 0 1 2 0 


 
0 0 0 4 0
   
 1 4 2 6 1   1 3 5 0 1 1 
   
 0 2 2 1 2   0 3 4 1 2 2 
 
 
4. 


  
5.  0
 0 0 0 2  
 3 
  0 0 6 1 3 

   
0 0 0 0 0  0
 0 0 0 2 4 

 
0 0 0 0 0 5

For Exercises 6-30, identify if the matrix is in echelon form and if it is in row reduced echelon form. If in
echelon form, identify the pivot positions and the rank of the matrix.
     
 1 0 0   1 2 0 −3   0 0 0 
  7.    
6. 
 0 1 0 
 0 0 1 5 8. 
 0 0 0 

   
0 0 1 0 0 0
     
 0 0 0  10. 3 −2 4 5  1 1 0 
   
 0

0 0 
 12. 
 0 1 0 

9. 



   
 0
 0 0 
 11. 1 2 3 4 5 6 0 0 1
 
0 0 0
     
 0 1 0   1 2 4 −3   1 0 0 0 
  14.    
13. 
 0 0 0 
 0 0 1 5 15. 
 0 0 2 0 

   
0 0 0 0 0 0 1
     
 1 6 −5 0   5 6 3   2 4 3 6 
  17.    
16. 
 0 0 0 2 
 0 4 12 18. 
 0 2 6 2 

   
0 0 0 0 0 0 1 4
     
 1 0 6 2   1 2 3 4 5   1 2 3 4 5 
     
19. 
 0 0 3 4 
 ♠ 20. 
 0 6 7 8 9 
 ♠ 21. 
 0 1 7 8 9 

     
4 0 1 6 0 0 0 3 4 0 0 0 0 2
     
 1 2 3 4 5   1 0 3 0 5   1 0 3 4 5 
     
♠ 22. 
 0 0 0 0 2 
 ♠ 23. 
 0 1 7 0 9 
 ♠ 24. 
 0 1 7 8 9 

     
0 6 7 8 9 0 0 0 1 2 0 0 0 0 2
     
 1 0 0 4 5   1 1 1 1   1 − 51 2
5 − 25 
     
♠ 25. 
 0 0 1 0 2  26. 
 5 −1 2 −2  27. 
 0 2 1 7 
  3 
     
0 1 0 8 9 3 −1 1 3 0 0 0 1
     
1
 1 0 2 0   0 0 0

0 

 1 0 0 3

5 

 
28.  1
 0 1 0  0 9 1 3   0 7 0 0 4 
    
2 
  29. 


 30. 



0 0 0 1  0 0 6
 7 

 0 0 0 1
 6 

   
2 1 3 4 0 0 0 0 0

For Exercises 31-39, write the linear system as augmented matrix or vice versa.

  (
 4x2 + 2x3 = 6 3x − y
 = 0 x+ y+z = 1
♠ 33.

31. 3x1 + 5x2 + 2x3 = 7 ♠ 32. − 2y = 5 −2x − 6y + z = 12
 
3x1 + 17x2 + 8x3 = 24 x + 7y = 13
 

10x − 7y + 2z − 4w = 10
 3  
12x − 2y + 3z + w = 13 1 −2 −3
−4 

 

 3x + 4y − 3z + w

= 20 2 
34. 35. x + 3y − 2z

= 9 36. 

 x + 4z = 73 
  0 1 2 1 

+ 17z − 20w = 12
 
 y  
6y − 3z + 2w

= 10 −2 0 −2 8

     
 3 −2 −1 4   4 −1 −2 5 12   2 0 1 −9 3 
     
♠ 37. 
 1 0 3 −3 
 ♠ 38. 
 −3 0 0 1 5 

 0 9 21 25 −9 

     
39. 
 
0 0 0 2 1 0 0 0 0  14 −7 0 36 67 

 

 3 4 −7 19 2 

 
−18 0 2 8 1

For Exercises 40-52, perform the indicated elementary row operation(s) to the linear system or augmented
matrix.
  
2x + 2y
 = 5 
 2y + 5z = 10 3x − y = 0

40. x+ y− z = 1 ♠ 41. x − y + 3z = 6 ♠ 42. − 2y = 5
  
− 5y + 4z = −1 2x + 7y − z = 0 x + 7y = 13
  
scale Row 2 by 3 interchange Rows 1 and 2 scale Row 2 by −1/2
( 
 7x + y − 2z ≡ 10
 
x+ y+z = 1
 1 3 1 3 

♠ 43.
−2x − 6y + z = 12 44. −x − y + 7z ≡ 5 (mod 11)  
replace Row 2 with


6x − z ≡ 3 45.  0 1 7
 1 

Row 2 + 2Row 1 1
 
scale Row 2 by 2
0 0 1 0
replace Row 1 with
Row 1 + Row 3
     
 0 0 0 0 0   2 4 6 −2 4   2 3 4 6 
     
46.  1 2 3 4 5 

 47. 
 0 1 3 5 5  48. 
 1 2 2 3 

     
2 0 0 0 7 0 0 0 1 1 9 4 7 2
replace Row 3 with scale Row 1 by 1/2 replace Row 1 with
Row 3 + 2Row 2 Row 1 − Row 2
     
 1 2 3 6   0 0 0 2   4 4 3 2  0
     
49.  2 3 4 1 

 ♠ 50. 
 3 2 1  (mod 5)
4  ♠ 51. 
 2 0 0 1 0   (mod 5)
     
5 6 7 2 1 0 3 3 1 0 0 0 0
replace Row 2 with interchange Rows 1 and 3 scale Row 1 by 4
Row 2 − 2Row 1,
replace Row 3 with
Row 3 − 5Row 1,
replace Row 3 with
Row 3 − 4Row 2
 
 1 2 3 4 0 
 
♠ 52. 
 0 1 3 1 2   (mod 5)
 
0 1 2 3 4
replace Row 1 with Row 1 − 2Row 2

For Exercises 53-57, identify the elementary row operation(s) which transform the first matrix into the second
matrix. Answers may vary.
       
 0 2 6   7 1 2   3 2 0   1 −2 0 
       
53.  7 1 ∼ 0
2  2 6  54. 
 5 4 ∼ 5
1  4 1 
 
  
       
−3 5 0 −3 5 0 7 0 1 3 2 0
       
 2 0 3   1 4 −4   1 −3 −9  
0 1 −3 0 −9 
       
55. 
 0 2 ∼ 0
−2  1 −1  3 −8 2 −21   0 1 2 6 
    

  

 56.  ∼ 
   
3 4 −1 0 −8 11 
 2 −2 9 7  
  0 4 9 25 

   
−2 4 −1 10 0 −2 −1 −8
   
 1 0 6 9   1 0 0 3 
   
 0 1 2 6   0 1 0 4 
   
57.  ∼ 
   
 0 0 1 1   0 0 1 1 
   
   
0 0 3 4 0 0 0 1

† The elementary row operations come by many names, and many texts never give them names at all. The replacement
operation is sometimes called row addition or row combination because it consists of adding a multiple of a row to another
row. The interchange operation is sometimes called row swap or row switch for obvious reasons. Finally, the scaling operation
is often called row multiplication because we multiply both sides of an equation by a nonzero constant. The naming scheme
used here for the elementary row operations follows the scheme used by Lay [?]. We prefer this naming scheme for two reasons.
First, students will remember them better and be able to talk with their classmates about them more easily when they have
short, simple names. Second, we will eventually see that these three elementary operations correspond to many concepts in
linear algebra, including elementary column operations. For this reason, it is preferable to have names which do not use the
word “row.”

“A leader is someone who helps improve the lives of other people or improve the system they live under.”
– Sam Houston

1.6 Reduction of Linear Systems


The key to solving linear systems is to row reduce the augmented matrix to an echelon form. To accomplish
this, conduct the following recursive algorithm: identify the leftmost nonzero column. This becomes a pivot
column, with the pivot in the top position. If this pivot entry is zero, interchange the pivot row with any
row below it that has a nonzero entry in the pivot column. Next, zero out each entry below the pivot by
replacing its row with itself plus a suitable multiple of the pivot row. Consider next the minor matrix where
this pivot row and pivot column are removed. Repeat this process on the minor matrix until an echelon form
is obtained.

The above algorithm for row reduction to echelon form, coupled with the technique of back-substitution
to solve the reduced system, is called Gaussian Elimination. This provides an efficient procedure to solve
linear systems.
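For those who like to see the forward phase spelled out in code, the following bare-bones Python sketch (our own illustration, using exact fractions and ignoring refinements such as partial pivoting) row reduces a matrix to an echelon form; applied to the matrix of Example 1.6.1 below, it reproduces the echelon form found there.

    from fractions import Fraction

    def forward_eliminate(M):
        # row reduce M (a list of lists of numbers) to an echelon form
        A = [[Fraction(a) for a in row] for row in M]
        m, n = len(A), len(A[0])
        pivot_row = 0
        for col in range(n):
            # look for a row at or below pivot_row with a nonzero entry in this column
            candidates = [r for r in range(pivot_row, m) if A[r][col] != 0]
            if not candidates:
                continue                       # no pivot in this column
            A[pivot_row], A[candidates[0]] = A[candidates[0]], A[pivot_row]
            for r in range(pivot_row + 1, m):  # zero out the entries below the pivot
                c = A[r][col] / A[pivot_row][col]
                A[r] = [a - c * p for a, p in zip(A[r], A[pivot_row])]
            pivot_row += 1
            if pivot_row == m:
                break
        return A

    A = [[0, 0, 0, -2, 3, -7], [-2, -4, -6, 1, -1, -3], [4, 8, 12, 1, -2, 16]]
    for row in forward_eliminate(A):
        print(row)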

Example 1.6.1. Reduce the matrix A to echelon form and solve the associated linear system using Gaussian
Elimination, where
A = [  0   0   0  −2   3  −7 ]
    [ −2  −4  −6   1  −1  −3 ]
    [  4   8  12   1  −2  16 ].
We begin by identifying the leftmost nonzero column of the matrix, which is the first column of A. As
such, we place a pivot position in the first row of the first column. As this pivot position contains a zero value,
we will interchange Rows 1 and 2, since Row 2 contains a nonzero value. (Note that the red double-arrow
below will be shorthand we commonly use to denote the interchange of two rows in a matrix.)
   
 0 0 0 −2 3 −7   −2 −4 −6 1 −1 −3 
   
 −2
 −4 −6 1 −1 −3 
 ∼
 0 0 0 −2 3 −7 

   
4 8 12 1 −2 16 4 8 12 1 −2 16

Now we proceed to place zero values in the all the entries below the pivot position in Column 1 using Row
Replacement. As Row 2 already has a zero entry in Column 2 we may disregard it for now. To create a zero
entry in Row 3 we will replace Row 3 with Row3 + 2 · Row1, as shown. (Note that red statement “(Row
i + cRow j)” written adjacent to Row i will indicate we are replacing Row i with Row i + cRow j. For
convenience and to avoid arithmetic errors, we write cRow j write above Row i in red to indicate what values
we will add together to produce the next matrix in the row reduction.)
   
−2 −4 −6 −1 −3
 −2 −4 −6 1 −1 −3 
1
 
   

 0 0 0 −2 3 −7 
 ∼
 0 0 0 −2 3 −7 

−4 −8 −12 2 −2 −6
   
4 8 12 1 −2 16 (Row 3 + 2 Row 1) 0 0 0 3 −4 10

This completes the row reduction in Column 1. We now search for the next pivot position in the minor
matrix where Row 1 and Column 1 (the location of the first pivot) are ignored. We notice there are only
zero entries remaining in Column 2, so we skip this column. We do the same for Column 3. The leftmost
nonzero column is then Column 4. So we place the pivot in the (2, 4) position. As this entry is already

nonzero, no interchange is necessary here.


 
 −2 −4 −6 1 −1 −3 
 
 0
 0 0 −2 3 −7 

 
0 0 0 3 −4 10

Next, to remove the 3 below this second pivot we replace Row 3 with Row 3 + (3/2) Row 2. This is
demonstrated below.
   
 −2 −4 −6 1 −1 −3   −2 −4 −6 1 −1 −3 
   
 0
 0 0 −2  ∼
3 −7   0  0 0 −2 3 −7  
   
0 0 0 3 −4 10 0 0 0 0 1/2 −1/2

This completes the row reduction in the second pivot column. The third and final pivot would then be
placed in the (3, 5) position, as this is the leftmost nonzero column when Rows 1 and 2 and Columns 1-4 are
ignored. As there are no entries below Row 3, the row reduction is completed automatically.
 
 −2 −4 −6 1 −1 −3 
 
 0
 0 0 −2 3 −7 
 
0 0 0 0 1/2 −1/2

Finally, row reduction ends when we reach the end of the coefficient matrix. Thus, our matrix is in echelon
form. In order to solve the associated linear system, we will rewrite the augmented matrix as a system of
equations and solve by back-substitution.
−2x1 − 4x2 − 6x3 +  x4 −       x5 = −3
                  − 2x4 +     3x5 = −7
                          (1/2)x5 = −1/2
The third equation gives x5 = −1. Substituting this into the second equation gives −2x4 + 3(−1) = −7, so
x4 = 2. Finally, the first equation becomes −2x1 − 4x2 − 6x3 + 2 + 1 = −3, that is, x1 = 3 − 2x2 − 3x3 ,
where x2 and x3 are free. Therefore, the general solution is (3 − 2s − 3t, s, t, 2, −1) for any scalars s and t.


Remark 1.6.2. Many like to deviate from the Gaussian Elimination algorithm in order to avoid the somewhat
more cumbersome computations that follow from having entries which are fractions and having to do the
arithmetic by hand. One strategy to avoid sometimes unnecessary fractions is to always choose a row so that
the pivot position is a one. Perhaps there is already a row with a one in it. Perhaps a row could be rescaled
so that there is a one in the pivot position, although if chosen incorrectly we might introduce the very
fractions we are trying to avoid. Finally, a combination of row replacements can produce a pivot of one
whenever the greatest common divisor of two entries in the same column is itself one.

For example, in Example 1.6.1 when working on the second pivot, instead of replacing Row 3 with Row
3 + (3/2) Row 2, we could have replaced Row 2 with Row 2 + Row 3 and then replaced Row 3 with Row 3 −
3 Row 2, as illustrated below. It takes more row operations but avoids all fractions.
   
 −2 −4 −6 1 −1 −3   −2 −4 −6 1 −1 −3 
   
0 −2 3 −7  ∼ 0 0 1 −1
 0 0  0 3 
 
   
0 0 0 3 −4 10 0 0 0 3 −4 10
 
 −2 −4 −6 1 −1 −3 
 
∼
 0 0 0 1 −1 3 
 H
 
0 0 0 0 −1 1

Because the row reduced echelon form is more desirable, we can continue to row reduce a matrix in
echelon form that was produced by Gaussian elimination by transforming it into row reduced echelon form.
This method is known as Gauss-Jordan Elimination. To perform this algorithm, first perform Gaussian
elimination to produce an echelon form. This is known as the forward phase. For the backward phase, rescale
the rightmost pivot positions to one, if not already done, and create zeros above the pivot position. Repeat
this process for each pivot, moving right to left.
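In practice one rarely carries Gauss-Jordan elimination out by hand on large matrices; computer algebra systems provide it directly. As a quick sketch (assuming SymPy is installed), the rref method returns the row reduced echelon form together with the pivot columns, and can be used to check the computation of Example 1.6.3 below.

    from sympy import Matrix

    A = Matrix([[0,  1, -3, -1, -2],
                [1,  1,  2,  4,  3],
                [3,  7, -6,  8,  1],
                [0, -1,  3,  4, -4]])

    R, pivot_cols = A.rref()   # Gauss-Jordan elimination, done exactly
    print(R)                   # the unique RREF of A, as in Example 1.6.3
    print(pivot_cols)          # pivot column indices, counted from 0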

 
 0 1 −3 −1 −2 
 
 1 1 2 4 3 
 
Example 1.6.3. Row reduce the matrix A to reduced echelon form, where A = 

.

 3
 7 −6 8 1 

 
0 −1 3 4 −4

We begin with the forward phase of row reduc-    


tion. We select position (1, 1) for the first pivot. As  0 1 −3 −1 −2   1 1 23 4
there is a zero in this pivot, we interchange Row 1
   
 1 1 2 4 3   0 1 −3 −1 −2 
   
and Row 2:   ∼ 
   
 3
 7 −6 8 1 

 3
 7 −6 8 1 

   
0 −1 3 4 −4 0 −1 3 4 −4

Next, we replace Row 3 with Row 3 – 3 Row 1:


   
 1 1 2 4 3   1 1 2 3 4
   
 0 1 −3 −1 −2  0 1 −3 −1 −2 
   


 −3 −3 −6 −12 −9

 ∼



 3
 7 −6 8 1 (Row 3 – 3 Row 1)

 0
 4 −12 −4 −8 

   
0 −1 3 4 −4 0 −1 3 4 −4

Next, we move the pivot to position (2, 2). Using this pivot, we replace Row 3 with Row 3 − 4 Row 2 and
replace Row 4 with Row 4 + Row 2:
     
 1 1 2 4 3   1 1 2 4 3   1 1 2 4 3 
     
 0 1 −3 −1 −2   0 1 −3 −1 −2   0 1 −3 −1 −2 
     
 ∼  ∼ 
   −4 12 4 8  
 0 4 −12 −4 −8   0 4 −12 −4 −8 (Row 3 – 4 Row 1)  0 0 0 0 0 
 
   
   1 −3 −1 −2   
0 −1 3 4 −4 0 −1 3 4 −4 (Row 4 + Row 2) 0 0 0 3 −6

We next move the pivot to position (3, 4) as there are only zeros in (3, 3) and (4, 3). As the third row is a
zero row, we next interchange Rows 3 and 4. We note that this matrix is now in echelon form. To continue
to RREF, we scale Row 3 by 1/3.
     
 1 1 2 4 3   1 1 2 4 3   1 1 2 4 3 
     
 0 1 −3 −1 −2   0 1 −3 −1 −2   0 1 −3 −1 −2 
     



 ∼


1 ∼



 0 0
 0 0 0 

 0 0
 0 3 −6 
 3 (Row 3)
 0 0
 0 1 −2 

     
0 0 0 3 −6 0 0 0 0 0 0 0 0 0 0

We begin now the backward phase of row reduction. In order to zero-out the entries above the pivot, we
replace Row 1 with Row 1 – 4 Row 3 and Row 2 with Row 2 + Row 3:
 −4 8
  
 1 1 2 4 3  (Row 1 – 4 Row 3)  1 1 2 0 11 
 1 −2   
 0 1 −3 −1 −2  (Row 2 + Row 3)  0 1 −3 0 −4 
   



 ∼


 0 0
 0 1 −2 

 0 0
 0 1 −2 

   
0 0 0 0 0 0 0 0 0 0

We next move back to the pivot in position (2, 2). As this entry is already one, we proceed to eliminate the
one above it. This is accomplished by replacing Row 1 with Row 1 – Row 2. This final matrix is the row
reduced echelon form of the original matrix. All the pivot columns are indicated.
 −1 3 4
  
 1 1 2 0 11  ( Row 1 – Row 2 )  1 0 5 0 15 
   
 0 1 −3 0 −4   0 1 −3 0 −4 
   



 ∼ 
 
 0 0
 0 1 −2 


 0 0 0 1 −2 

   
0 0 0 0 0 0 0 0 0 0

Theorem 1.6.4. Each matrix is row equivalent to one and only one reduced echelon matrix.

The proof of this important theorem essentially follows from Gauss-Jordan elimination.

Example 1.6.5. Row-reducing a matrix over an


alternative field, such as C or Zp does not change  
the algebra whatsoever. It only changes the arith- 4 + 9i 6 + 3i 0
metic associated to this field. To illustrate this
 
 
fact, we demonstrate how to row reduce a com-

 1 + 3i 2+i 0 

plex matrix.
 
−13 + 11i −1 + 12i 2 + 5i

Because the first column is nonzero, the first pivot position is in the (1, 1)-position. If we follow the
Gaussian Elimination algorithm, we would replace rows below this pivot by subtracting some multiple of
Row 1. But to do this, it will essentially require us to divide Row 1 by 4 + 9i. While we certainly can do this,
this can be a very cumbersome calculation for humans (especially those who are still a little uncomfortable
with complex arithmetic). Because of this, we demonstrate a slight variation to Gaussian Elimination that,
while is technically less efficient from a computational point of view, places less cognitive baggage on the
1+3i
reader. Instead of replacing Row 2 with Row 2− 4+9i Row 1, we replace Row 1 with Row 1−3Row2. This will
be immediately followed by replacing Rows 2 and 3 with Row 2−(1+3i)Row 1 and Row 3−(−13+11i)Row 1,
respectively.
     
 4 + 9i 6 + 3i 0   1 0 0   1 0 0 
     
 ∼  1 + 3i ∼ 0
 1 + 3i 2+i 0   2+i 0   2+i 0 
 
     
−13 + 11i −1 + 12i 2 + 5i −13 + 11i −1 + 12i 2 + 5i 0 −1 + 12i 2 + 5i

Our pivot position then moves to the (2, 2)-position. We again will deviate from the standard algorithm
to scale Row 2 by 1/(2 + i). From here, replace Row 3 with Row 3 − (−1 + 12i) Row 2. Finally, with the
final pivot in the (3, 3)-position, we scale Row 3 by 1/(2 + 5i), which provides the RREF of the original
matrix.
       
 1 0 0   1 0 0   1 0 0   1 0 0 
       
 0 2+i 0 ∼ 0 1 0 ∼ 0 1 0 ∼ 0 1 0 
       
       
0 −1 + 12i 2 + 5i 0 −1 + 12i 2 + 5i 0 0 2 + 5i 0 0 1


Exercises (Go to Solutions)


For Exercises 1-13, compute the row reduced echelon form for the matrix. List the row operations used to
transform the matrix to this form. Indicate the first time the matrix is in echelon form. Answers may vary.
       
1 2 1 2 3 1 2 2 1  1 5 6 −5
1. 
      
      
−4 5 ♠ 2. 
 2 3 3 
 ♠ 3. 
 2 1 −2 −2   3 0 −2

1


 ♠ 4. 
 
    
2 4 5 1 −1 −4 −3  0 1
 3 3 

 
−3 5 0 1
       
 1 2 0 9   8 16 0 14   3 0 6 9 18   1 + i −3 + 7i 
      ♠ 8.  
5. 
 0 2 4 6 
 6. 
 0 0 4 7 
 7. 
 0 2 4 0 8 
 2−i 3 + 2i
     
0 0 5 10 0 9 3 9 0 0 0 3 6
     
 6 − 2i 4 + 12i   2 7   0 5 6 
9.   ♠ 10.   (mod 17)  
−5 + i 2−i 5 11 ♠ 11. 
 2 1  (mod 7)
2 
 
6 5 9
   
 2 5 7   1

0 1 

0
 
12. 
 1 3  (mod 7)
6   1

2 0 −1 

  13. 

 (mod 5)

0 5 9  3
 −1 0 4 

 
1 4 5 1

For Exercises 14-18, solve the linear system using Gaussian elimination, that is, convert the linear system
into an augmented matrix, reduce it to any echelon form, switch it back into a system of equations, and
solve the reduced system using back-substitution.
 
 x − y + 3z = 1
 
 2y + 6z = 4
14. 2x − 2y + 5z = 6 15. 3y + 2z + x = 5
 
2x + 3y − 4z = 2 3z + 3x + 3y = 3
 

6x + 3y + 7z ≡ 20

3x1 + 2x2 + x3 = 1
 

18y + 9z ≡ 5

16. 3x1 + 8x2 + 3x3 = 5 
 17. (mod 5)
3x2 + x3 = 6 3y + 4z ≡ 10
 


3z ≡ 3

≡0

 x1 + 3x2 + 5x3 + 4x4 + 6x5

≡5

2x + 4x + 4x + 3x + x
1 2 3 4 5
18. (mod 7)

 x4 ≡5

≡2

6x1 + 2x2 + 6x3 + 3x4
For Exercises 19-32, solve the linear system using Gauss-Jordan elimination, that is, convert the linear system
into an augmented matrix, reduce it to row reduced echelon form, and use this to solve the linear system.
(  (
2x + 4y ≡ 0  x + 2y = 2
 x+y =8
19. (mod 5) ♠ 21.
x + 3y ≡ 2 20. − 4y = 8 x−y =4

2x + 2y = 6

  
 x− y
 = 6 
 3y + 6z = 3 
 3x2 − 2x3 = 0
♠ 22. 2x − 3z = 16 23. x + 2y − 3z = 2 24. x1 + x2 + 3x3 = 12
  
2y + z = 4 4x − 3z = 0 6x1 + 2x2 + x3 = 13
  
  
 x1 + 4x2 + 7x3
 =3 2x − 2y − 2z = 2
 7x + 8y + 9z =
 10
25. 2x1 + 5x2 + 8x3 =3 ♠ 26. 2x + 3y + z = 2 . 27. 4x + 5y + 6z = 10
  
3x1 + 6x2 + 9x3 =3 3x + 2y =0 x + 2y + 3z = 10
  
 
2x1 + x2 − x3 = −10

  2x1 + x2 + 4x3 = 199
 

5x1 + 2x2 + 6x3 = 11
x1 − 3x2 − 2x3 = −4

28. 29. 2x1 + 10x2 + 15x3 = 984  x2 + 2x3 = 5
  30.
3x1 + x2 + x3 = 15 −x1 + 10x3 = 58 3x1 + 2x2 + x3 = 7
  



2x1 + x2 + 4x3 = 6

 2x = 10 − 4y + 2z

 x+ y + z+ w =1 

−10y = 20 − 16x − 3z − 3 + 13y + 16 − 2z

 3y + z =1 32.
31. 
x + y + 2z =0 −3z = −4x + 3z − 2y + z
 



2x + z + 4w = 0
Chapter 2

Vectors


“If the world’s a veil of tears, Smile till rainbows span it.” – Lucy Larcom

2.1 Vector Equations


Linear combinations are the bread and butter of vectors. Algebraically speaking, this is what we do with
vectors, that is, we combine some vectors to construct new vectors. While it is simple to take a list of vectors,
combine them, and see what pops out, the converse is not as simple. Given a list of vectors a1 , a2 , . . . , an ∈
F m , can we determine if b ∈ F m is a linear combination of these vectors? In other words, we seek to solve
the vector equation:
x1 a1 + x2 a2 + . . . + xn an = b. (2.1.1)
In Equation (2.1.1), note that the vectors a1 , . . . , an , b are fixed and the variables are the scalars x1 , x2 , . . . , xn ∈
F.

To solve the vector equation (2.1.1), suppose that the ith component of aj is denoted aij , that is,
 
 a1j 
 
 a2j 
 
 ..  .
aj =  
 . 
 
 
amj

Likewise, let the jth component of b be bj . Then Equation (2.1.1) becomes

x1 a1 + x2 a2 + . . . + xn an = b
       
a
 11  a
 12  a
 1n  b
 1 
       
 a21   a22   a2n   b2 
       
x1  .  + x2  .  + . . . + xn  . 
     =  . 

 ..   ..   ..   .. 
 
       
       
am1 am2 amn bm
       
a x
 11 1   12 2  a x a
 1n n  x b
 1 
       
 a21 x1   a22 x2   a2n xn   b2 
       
 +  + ... +   =  . 

.. .. ..
 .. 
      

 . 
 
 . 


 . 
  
       
am1 x1 am2 x2 amn xn bm
   
a
 11 1x + a x
12 2 + . . . + a x
1n n  b
 1 
   
 a21 x1 + a22 x2 + . . . + a2n xn   b2 
   

..
 =  . 

 .. 
  

 . 
  
   
am1 x1 + am2 x2 + . . . + amn xn bm

Examining this last equality of vectors reveals a system of linear equations, namely

a11 x1 + a12 x2 + . . . + a1n xn = b1





 a21 x1 + a22 x2 + . . . + a2n xn = b2


.. .. .. .. .. (2.1.2)


 . . . . .


am1 x1 + am2 x2 + . . . + amn xn = bm ,

whose augmented matrix is
[ a11  a12  . . .  a1n |  b1 ]
[ a21  a22  . . .  a2n |  b2 ]
[  .    .   . . .   .  |  .  ]
[ am1  am2  . . .  amn |  bm ].
These two problems have the same solution set.

Theorem 2.1.1. Let x = (x1 , x2 , . . . , xn ) ∈ F n be a vector over a field F . Then x is a solution to the
vector equation (2.1.1) if and only if x is a solution to the linear system (2.1.2).
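Theorem 2.1.1 also tells us how to settle such questions by machine: row reduce the augmented matrix [a1 . . . an | b]. The short SymPy sketch below (our own illustration) does exactly this for the vectors of Example 2.1.2, which follows.

    from sympy import Matrix, linsolve, symbols

    a1 = Matrix([1, 2, 3, 4])
    a2 = Matrix([1, 0, 0, 1])
    a3 = Matrix([3, 0, 5, 7])
    b  = Matrix([-7, 4, -4, -9])

    x1, x2, x3 = symbols("x1 x2 x3")
    # solve x1*a1 + x2*a2 + x3*a3 = b by row reducing the augmented matrix
    augmented = a1.row_join(a2).row_join(a3).row_join(b)
    print(linsolve(augmented, x1, x2, x3))   # {(2, -3, -2)}, so b = 2a1 - 3a2 - 2a3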
       
 1   1   3   −7 
       
 2   0   0   4 
       
Example 2.1.2. Let a1 =  
, a2 = 
 
, and a3 = 
 
. Let b = 
 
. Is b a linear combi-

 3   0   5   −4 
       
       
4 1 7 −9
nation of a1 , a2 , a3 ?

To solve the vector equation x1 a1 + x2 a2 + x3 a3 = b, that is,
x1 (1, 2, 3, 4) + x2 (1, 0, 0, 1) + x3 (3, 0, 5, 7) = (−7, 4, −4, −9),

we solve the linear system whose augmented matrix is


 
 1 1 3 −7 
 
 2 0 0 4 
 
 .
 
 3 0
 5 −4 

 
4 1 7 −9

We next row reduce the matrix via the elementary row operations:
   
 1 1 3 −7   1 1 3 −7 
   
 2 0 0 4  (Row 2 − 2Row 1)  0 −2 −6 18  (− 12 Row 2)
   



 ∼ 



 3 0 5 −4  (Row 3 − 3Row 1)  0 −3 −4 17 
   
   
4 1 7 −9 (Row 4 − 4Row 1) 0 −3 −5 19 (Row 4 − Row 3)
   
 1 1 3 −7   1 1 3 −7 
   
 0 1 3 −9   0 1 3 −9 
   
∼


 ∼ 



 0
 −3 −4 17  (Row 3 + 3Row 2)  0
 0 5 −10 
 (Interchange)
   
0 0 −1 2 (−Row 4) 0 0 1 −2 (Interchange)
   
 1 1 3 −7   1 1 3 −7 
   
 0 1 3 −9   0 1 3 −9 
   
∼


 ∼



 0
 0 1 −2 

 0
 0 1 −2 

   
0 0 5 −10 (Row 4 − 5Row 3) 0 0 0 0

This last matrix is in echelon form. We will continue to row reduced echelon form.
     
 1 1 3 −7  (Row 1 − 3Row 3)  1 1 0 −1  (Row 1 − Row 2)  1 0 0 2 
     
 0 1 3 −9  (Row 2 − 3Row 3)  0 1 0 −3   0 1 0 −3 
     
∼

 ∼ 


 ∼ 



 0
 0 1 −2 
 0
 0 1 −2 

 0
 0 1 −2 

     
0 0 0 0 0 0 0 0 0 0 0 0

Therefore, the solution is (2, −3, −2). In fact, we see that
2 a1 − 3 a2 − 2 a3 = 2 (1, 2, 3, 4) − 3 (1, 0, 0, 1) − 2 (3, 0, 5, 7) = (−7, 4, −4, −9) = b.

         
 1   1   2   1   4 
         
 2 , a2 =  0
Example 2.1.3. Let a1 =    , a3 =  2 , a4 =  1
   
, and b =  2  be vectors over
  
         
3 2 4 4 1
Z35 . Is b a linear combination of a1 , a2 , a3 , a4 ?

To solve the vector equation,


        
 1   1  2
   1   4 
         
x1 
 2
 + x2  0  + x3  2  + x4  1  =  2
       
,

         
3 2 4 4 1

we work with the augmented matrix


   
 1 1 2 1 4   1 1 2 1 4 
   
 2 0 2 1 2  (Row 2 − 2Row 1) ∼  0 3 3 4 4 
   
   
3 2 4 4 1 (Row 3 − 3Row 1) 0 4 3 1 4 (Row 3 − 3Row 2)
   
 1 1 2 1 4   1 1 2 1 4  (Row 1 − 2Row 3)
   
∼
 0 3 3 4 4 
 ∼
 0 3 3  (Row 2 − 3Row 3)
4 4 
   
0 0 4 4 2 (4Row 3) 0 0 1 1 3
   
 1 1 0 4 3   1 1 0 4 3  (Row 1 − Row 2)
   
∼
 0 3 0 1  (2Row 2) ∼  0
0   1 0 2 0 

   
0 0 1 1 3 0 0 1 1 3
 
 1 0 0 2 3 
 
∼ 0 1 0 2 0 

 
0 0 1 1 3
The resulting linear system is then
 
x1
 + 2x4 = 3 x1 = 3x4 + 3

x2 + 2x4 = 0 ⇒ x2 = 3x4
 
x3 + x4 = 3 x3 = 4x4 + 3
 

Therefore, there are five possibilities: (3, 0, 3, 0), (1, 3, 2, 1), (4, 1, 1, 2), (2, 4, 0, 3), and (0, 2, 4, 4). This gives
five possible linear combinations such as:
             
 1   2   1   1   2   1   4 
             
3
 2  + 3 2  =  2  + 3 0  + 2 2  +  1  =  2
            .
 
             
3 4 3 2 4 4 1

Definition 2.1.4. Given vectors v 1 , v 2 , . . . , v n ∈ F m , the (linear) span of v 1 , v 2 , . . . , v n , denoted
Span{v 1 , v 2 , . . . , v n }, is the set of all linear combinations of v 1 , v 2 , . . . , v n inside of F m , that is,
Span{v 1 , v 2 , . . . , v n } = {c1 v 1 + c2 v 2 + . . . + cn v n | c1 , c2 , . . . , cn ∈ F }.
We say that a subset S ⊆ F m is spanned by v 1 , v 2 , . . . , v n if S = Span{v 1 , v 2 , . . . , v n } and that
{v 1 , v 2 , . . . , v n } is a spanning set for S.

Asking whether a vector b ∈ Span{a1 , . . . , an } is equivalent to asking whether there exists a solution to
the vector equation
x1 a1 + . . . + xn an = b.
     
 1   2   7 
     
Example 2.1.5. Let a1 =  , a =
 −2  2  5  , and b =  4 . Is b ∈ Span{a1 , a2 }?
    
     
−5 6 −3
As explained above, answering the question is equivalent to solving the vector equation x1 a1 + x2 a2 = b.
Likewise, this vector equation can be solved by row reduction on the augmented matrix a1 a2 b :

       
 1 2 7   1 2 7   1 2 7   1 0 3 
       
∼ 0 ∼ 0 1 ∼ 0 1 .
 −2 5 4   9 18   2   2 

       
−5 6 −3 0 16 32 0 1 2 0 0 0

Therefore, the solution is x1 = 3 and x2 = 2, that is,


     
 1   2   7 
     
 −2  + 2  5  =  4
b = 3a1 + 2a2 = 3      .
 
     
−5 6 −3
     
 1   5   −3 
     
Example 2.1.6. Let a1 =  , and b =  8 . Is b ∈ Span{a1 , a2 }?
 −2 , a2 =  −13
 
  
     
3 −3 1
 
Like above, we row reduce the augmented matrix a1 a2 b :

     
 1 5 −3   1 5 −3   1 5 −3 
     
∼ 0 ∼ 0 ,
 −2 −13 8   −3 2   −3 2 

     
3 −3 1 0 −18 10 0 0 −2

which is now in echelon form. The third equation is 0 = −2, which implies that the system is inconsistent.
Therefore, b ∉ Span{a1 , a2 }. 

Exercises (Go to Solutions)


For Exercises 1-4, express the given system of equations as a vector equation.
 ( 
 x1 + 3x2
 =7 3x1 + 7x2 − 5x3 = −5 −2x1 + 6x2 − 11x3 =
 6
♠ 2.
1. 7x1 + 6x2 − x3 = 8 −x1 − 2x2 + 6x3 = 4 ♠ 3. x1 − 2x2 + 5x3 = −3
 
3x1 + x2 + 2x3 = 3 3x1 − 19x2 + 31x3 = −17
 

 x1 + 2x2 + 2x3 = 1

4. 3x1 + x2 =4

5x1 + 2x2 + x3 = 3

For Exercises 5-9, determine if b is a linear combination of the other vectors {a1 , a2 , . . .}. If so, write b as
a linear combination. The set of vectors {a1 , a2 , . . .} will be listed first, followed by the vector b. Answers
may vary.
           
 −1

2

 1 
 −3 2 
 5
♠ 5. 
 
,  ,
      
     
      

 3 −2 

5 ♠ 6.  −1  ,  2  ,  4 
    


    
  

 

 1 3  3

           



 3 −2   1





 1+i 1 + 2i 

 4 + 5i


 
   

 


 
 






♠ 7. 
 −2  ,  3 ,  6
    
 8. 
 1−i
, 1 + i  ,
  
 4 − 2i 
 

      
     

 
 
 

 1 −3  −9  3 − 2i −2 + i  7 − 5i
         
1 2 −1 1 6

 


       
  

       
  
 
 −1   1   2   −1   3 
        

9. 

,
 
,
 
,
 
 ,
 


2   −1   2   2  14 

       
 

   

       
  
 
1 1 −1 2  8

 

For Exercises 10-12, determine if b ∈ Span{a1 , a2 , . . .}. If so, write b as a linear combination of the ai . The
set of vectors {a1 , a2 , . . .} will be listed first, followed by the vector b. Answers may vary.
               



 −4 10 7 

 12 

 1 −2 −1   −4





 
 
 
 









 
 
 
   


10.  9  ,  6  ,  −12  ,
       4 
  ♠ 11. 
 1  ,  −1  ,  1 ,  −1
      


     
   
     
  

 
 
 
 6 4 −8  −6  2 −2 2  −2
       



 4   0   3    1



      


12. 
 4  ,  1  ,  2 ,  2
       (mod 5)


       

 

 2 2 1  1

♠ 13. For any list of vectors v 1 , . . . , v n in F m , show that Span{v 1 , . . . , v n } contains the zero vector.
14. Suppose that u1 , u2 ∈ Span{v 1 , v 2 , v 3 }. Prove that Span{u1 , u2 } ⊆ Span{v 1 , v 2 , v 3 }.
15. Suppose that v 3 ∈ Span{v 1 , v 2 }. Prove that Span{v 1 , v 2 } = Span{v 1 , v 2 , v 3 }.
16. Prove that Span{v 1 , v 2 } = Span{v 1 , v 1 + v 2 }.

17. Suppose that u1 , u2 , . . . , uk ∈ Span{v 1 , v 2 , . . . , v n }. Prove that Span{u1 , u2 , . . . , uk } ⊆ Span{v 1 , v 2 , . . . , v n }.


18. Suppose that v n+1 ∈ Span{v 1 , v 2 , . . . , v n }. Prove that Span{v 1 , v 2 , . . . , v n } = Span{v 1 , v 2 , . . . , v n , v n+1 }.
Pn
19. Let ci ∈ F be scalars such that cn 6= 0. Prove that Span{v 1 , v 2 , . . . , v n } = Span {v 1 , v 2 , . . . , v n−1 , i=1 ci v n } .

20. Suppose that u ∈ Span{v 1 , v 2 , . . . , v n } such that u 6= 0. Then there exists an i such that

Span{v 1 , v 2 , . . . , v i−1 , v i , v i+1 , . . . , v} = Span{v 1 , v 2 , . . . , v i−1 , u, v i+1 , . . . , v}.



“Some painters transform the sun into a yellow spot, others transform a yellow spot into the sun.”
– Pablo Picasso

2.2 Matrix Equations


Definition 2.2.1. Let A be an m × n matrix and let x ∈ F n . Let the column vectors of A be
a1 , a2 , . . . , an ∈ F m , that is, A = [ a1 a2 . . . an ]. Then the product Ax is the linear combina-
tion of the column vectors of A with coefficients corresponding to the entries of x, that is,
 
x
 1 
 
 x2 
  
Ax = a1 a2 . . . an  .  
 = x1 a1 + x2 a2 + . . . + xn an . (2.2.1)
 .. 
 
 
xn

Note that Ax is only defined if the number of columns of A is equal to the number of entries of x.

Example 2.2.2.

(a) With A = [ 1 2 −3 ; 0 −2 5 ] (rows separated by semicolons) and x = (3, 0, 2),
Ax = 3 (1, 0) + 0 (2, −2) + 2 (−3, 5) = (3, 0) + (0, 0) + (−6, 10) = (−3, 10).

(b) With A = [ 2 0 ; −6 1 ; 3 2 ] and x = (3, 5),
Ax = 3 (2, −6, 3) + 5 (0, 1, 2) = (6, −18, 9) + (0, 5, 10) = (6, −13, 19).


Theorem 2.2.3. If A is an m × n matrix, with column vectors a1 , . . . , an , and if b ∈ F m , the matrix


equation
Ax = b (2.2.2)
has the same solution set as the vector equation

x1 a1 + . . . + xn an = b

which, in turn, has the same solution set as the system of linear equations whose augmented matrix is
 
a1 . . . an b .

Example 2.2.4. The solution set to the system of equation


(
x1 + 3x2 − 7x3 = 5
− 2x2 + 11x3 = 3

is same as the solution set of the vector equation


       
1 3 −7 5
x1   + x2   + x3  = ,
       
0 −2 11 3

which in turn has the same solution set as the matrix equation
 
  x  
 1 
1 3 −7   5
= .
   
   x2  

0 −2 11   3
x3

Corollary 2.2.5. The equation Ax = b has a solution if and only if b is a linear combination of the column
vectors of A.

Definition 2.2.6. Let Col(A) denote the set of all linear combinations
 of column vectors of A, which is called
the column space of A. In particular, if A = a1 a2 ... an , then Col(A) = Span{a1 , a2 , . . . , an }.

According to Corollary 2.2.5, the equation Ax = b is consistent if and only if b ∈ Col(A).
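This membership test is easy to automate: b belongs to Col(A) exactly when appending b to A does not increase the rank. A small NumPy sketch of the check (our own, using floating-point rank and therefore only trustworthy for well-behaved examples such as the matrix of Example 2.2.7 below):

    import numpy as np

    def in_column_space(A, b, tol=1e-10):
        # Ax = b is consistent exactly when rank([A | b]) == rank(A)
        augmented = np.column_stack([A, b])
        return np.linalg.matrix_rank(augmented, tol) == np.linalg.matrix_rank(A, tol)

    A = np.array([[1.0, 3.0, 1.0],
                  [2.0, 4.0, 4.0],
                  [3.0, 5.0, 7.0]])
    print(in_column_space(A, np.array([1.0, 2.0, 3.0])))   # True:  1 - 2(2) + 3 = 0
    print(in_column_space(A, np.array([0.0, 0.0, 1.0])))   # False: 0 - 2(0) + 1 != 0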

 
 1 3 1 
 
Example 2.2.7. Let A = 
 2 4 . Compute the column space of A.
4 
 
3 5 7
The quick response to this example would be to find a spanning set for the column space of A, but, by
definition, a spanning set for a column space is never mysterious. Note that
     



 1 3 1 



   
    

Col(A) = Span  2  ,  4  ,  4  .
     

      

 

 3 5 7 

While we have technically computed the column space, that is, we have found a description of the set of
vectors, it does not leave the decision question of whether a generic vector b = (b1 , b2 , b3 ) belongs to Col(A)
well answered, which is really what we want to be able to do. To answer this question, we row reduce the
augmented matrix
       
 1 3 1 b1   1 3 1 b1   1 3 1 b1   1 3 1 b1 
       
∼ 0 ∼ 0 1 b1 − 12 b2 ∼ 0 1 1
1
 2 4 4
 b2   −2 2 b2 − 2b1   1  b1 − 2 b2


       
1
3 5 7 b3 0 −4 4 b3 − 3b1 0 1 −1 4 (3b1 − b3 ) 0 0 0 − 41 b1 + 12 b2 − 14 b3

Therefore, the above equation is consistent if and only if −(1/4)b1 + (1/2)b2 − (1/4)b3 = 0, that is, if and
only if b1 − 2b2 + b3 = 0. Therefore,
Col(A) = {b ∈ R3 | b1 − 2b2 + b3 = 0}.
Now, there exist plenty of choices of b for which this equation does not hold. Hence, Ax = b is not consistent
for every b. 



Generalizing the principles seen in the previous example, for any n × n matrix A, the equation Ax = b
has a solution for all b if and only if A has a pivot in each row, in which case the solution must be unique.

Let A be an (m × n) matrix and let x ∈ F n . Then the rule x 7→ Ax is a transformation called a matrix
transformation.

Theorem 2.2.8. Let A be an m × n matrix and let u, v ∈ F n . Let c ∈ F . Then A(u + v) = Au + Av and
A(cu) = c(Au). In particular, multiplication by a matrix is a linear transformation.
 
Let A be an m × n matrix with column vectors a1 , . . . , an . Let A = aij , that is, let aij denote the
n
entry of A in the (i, j) position. Let x ∈ F . Then
       
 x1   a11   a12   a1n 
       
 x2   a21   a22   a2n
        

Ax = a1 a2 . . . an  .    = x a
1 1 + . . . + x a
n n = x 1

.
 + x 2

.
 + . . . + x n

 ..

 ..   ..   .. 
   
 . 
       
       
xn am1 am2 amn
       
 x1 a11   x2 a12   xn a1n   x1 a11 + x2 a12 + . . . + xn a1n 
       
 x1 a21   x2 a22   xn a2n   x1 a21 + x2 a22 + . . . + xn a2n 
       
=   ..
+
  ..
 + ... + 
  ..
=
  ..
.


 . 
 
 . 


 . 
 
 . 

       
x1 am1 x2 am2 xn amn x1 am1 + x2 am2 + . . . + xn amn

That is  
 x1 a11 + x2 a12 + . . . + xn a1n 
 
 x1 a21 + x2 a22 + . . . + xn a2n
 

Ax = 
 ..
.
 (2.2.3)

 . 

 
x1 am1 + x2 am2 + . . . + xn amn

This formula is often easier to compute by hand than the original definition of Ax.
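The two descriptions of Ax, as a linear combination of the columns of A (Definition 2.2.1) and entry by entry via formula (2.2.3), are easy to compare in code. A small NumPy sketch (our own illustration, using the matrix of Example 2.2.9 below):

    import numpy as np

    A = np.array([[0, -2],
                  [1, -3],
                  [2, -3]])
    x = np.array([3, -1])

    # column description: Ax = x1*a1 + x2*a2
    by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

    # entrywise description: entry i of Ax is a_i1*x1 + ... + a_in*xn
    by_rows = np.array([sum(A[i, j] * x[j] for j in range(A.shape[1]))
                        for i in range(A.shape[0])])

    print(by_columns, by_rows, A @ x)   # all three give [2 6 9]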

 
 0 −2 
  2 3
Example 2.2.9. Let A = 
 1  and define a transformation T : R → R by T (x) = Ax. Then
−3 
 
2 −3
 
 0 −2   
   x1 
T (x) = Ax = 
 1 −3 
 .
  x2
2 −3

 
3 
(a) Let u =  . Compute T (u).

−1

   
 0(3) − 2(−1)   2 
   
 1(3) − 3(−1)   6  .
T (u) = Au =   =  
   
2(3) − 3(−1) 9
 
 −8 
2
 
(b) Find an x ∈ R whose image under T is b =  −7 

.
 
−2

We need an x1 and x2 such that


   
 0 −2   
 −8
  x1  

 
 1
 −3 
 =
 −7
.

  x2  
2 −3 −2

By row reduction, we solve the corresponding linear system:


           
 0 −2 −8   1 −3 −7   1 −3 −7   1 −3 −7   1 −3 −7   1 0 5 
           
 1 −3 −7  ∼  0 −2 −8  ∼  0 −2 −8  ∼  0 1 ∼ 0
4  1 ∼ 0 1
4  .
4 
 
      
           
2 −3 −2 2 −3 −2 0 3 12 0 1 4 0 0 0 0 0 0
 
 5 
Let x =  . Then, T (x) = b. In fact, x is the only vector whose image under T is b. 

4

Exercises (Go to Solutions)


For Exercises 1-1, determine with the statement is true or false. If false, correct the statement so that it is
true.

1. A homogeneous linear system can be written as Ax = 0, where A is the m × n coefficient matrix,


x ∈ F n , and 0 is a scalar.
For Exercises 2-5, compute the matrix-vector product.
     
 4 5 2  1   3 −1 2  2 
     
2. 
 0 −1 3 
 4 
  ♠ 3. 
 4   −3 
3 7   
     
2 1 1 3 −2 1 5 1
    
 1 + 2i −1 + 3i   7 + i  1
♠ 4.      

0 3 + 4i −2 − i  1 1 1 0 0  1 


  
♠ 5.  (mod 2)
 
 0 1 0 1 0  0 
 
   
1 0 0 0 1   0 

 
1

For Exercises 6-8, express the given matrix equation as a system of equations.
     
  x1   1 2 2
      
4 1 2 0   1    
  x2     3 4  x   0
        
 
♠ 6.  0 4 3 1  
  
 ≡  0  (mod 5)
   ♠ 7. 
 
 =



   x3     −1 −2  y  2 
     
2 2 4 0   2    
x4 4 5 −9
    
 21 17 −32  
27 x   12 
    
4 42 6 19   y   −6 
    

8. 



=
 


 103
 −72 17 −8 
 z   15 
 

    
2 11 −10 13 w 1

For Exercises 9-16, solve the matrix equation.


         
 1 2 3   x1   10   4 4 2   x1   12 
         
9. 
 −1 −1   x2  =  −8
−3     
 ♠ 10. 
 −4 −3   x2  =  −3 
−2     
         
−1 −2 −2 x3 −9 −4 −3 1 x3 3
        
 1 0   1  1−i 1 1 + i   x1   7

  x1  
 
      
♠ 11. 
 2 5 
 =
 3

 12. 
 1 i −i    x2  = 
   3i 

  x2       
3 −2 0 3+i 2 3i x3 11 + 7i
         
 1 4 3   x1   4   0 1 2   x1   1 
         
♠ 13. 
 4 3   x2  ≡  3
4      (mod 5)
 14. 
 2 1   x2  ≡  1
0      (mod 3)

         
3 4 3 x3 2 1 0 2 x3 0
      
 x1   1 1 1 1  x1   0 
   
 1 0 1 2    1
     
  x2    1 1 0 1  x2   1 
      
 
15.  ≡ 1  (mod 3) 16.  ≡  (mod 2)
 2 2 0 0   
        
   x3     1 1 1 0   x3   0 
      
0 0 1 2   1     
x4 0 1 0 1 x4 0

For Exercises 17-18, use the transformation T : R4 → R2 given by the rule:


 
x 1 
 
 
2 −1 0 3 x
 
 2 
T (x1 , x2 , x3 , x4 ) =  .
 
 
0 6 −2 1  x3 


 
x4

♠ 17. Compute T (1, 0, 0, 1) and T (1, 2, 3, 4). ♠ 18. Find x such that T (x) = (1, 2).

19. It can be shown that (4, 3, 2, 1) is the unique solution to the system of linear equations

 11x1 + 9x2 + 10x3 + 14x4 = 105


 9x + 4x2 + 5x3 + 9x4 = 67
1

 11x1 + 12x2 + 13x3 + 17x4 = 123


7x1 + 2x2 + 5x3 + 7x4 = 51.

With this knowledge, QUICKLY solve the matrix equation


    
 11 9 10 14   x1   105 
    
 9 4 5 9   x2   67 
    
  = .
    
 11 12 13 17   x3   123 
    
    
7 2 5 7 x4 51

Why were you able to solve it so quickly?



“All mankind... being all equal and independent, no one ought to harm another in his life, health, liberty or
possessions.” – John Locke

2.3 Linear Independence


Definition 2.3.1. A set of vectors {a1 , a2 , . . . , an } ⊆ F m is said to be linearly independent if the vector
equation
x1 a1 + x2 a2 + . . . + xn an = 0
has only the trivial solution. Otherwise, the set is said to be linearly dependent.
In other words, a set of vectors is linearly dependent if there exist some weights c1 , c2 , . . . , cn , with at
least one not zero, such that
c1 a1 + c2 a2 + . . . + cn an = 0.
The previous equation is then called a linear dependence relation.
 
Expressed another way, suppose A = a1 a2 . . . an , then {a1 , . . . , an } is linearly independent if
and only if Ax = 0 has no nontrivial solution, since
 
    Ax = [ a1 a2 . . . an ] (x1 , x2 , . . . , xn ) = x1 a1 + x2 a2 + . . . + xn an = 0.

The set {a1 , . . . , an } is linearly dependent if and only if Ax = 0 has a nontrivial solution.

     
 1   3   −3 
     
Example 2.3.2.  2 , v 2 =  1
Let v 1 =    , and v 3 =  4 .
  
     
3 −2 13

(a) Determine if the set {v 1 , v 2 , v 3 } is linearly independent.

To determine if the set is linearly independent, it suffices to compute the echelon form of the corre-
sponding homogeneous system. In fact, the final row of zeros is irrelevant.
       
 1 3 −3   1 3 −3   1 3 −3   1 3 −3 
       
 2
 1 4  ∼  0 −5 10  ∼  0 1 −2  ∼  0 1 −2 
     
.
       
3 −2 13 0 −11 22 0 1 −2 0 0 0

Thus, x3 is a free variable of the system, which implies that the homogeneous equation Ax = 0 does
have a nontrivial solution. Therefore, {v 1 , v 2 , v 3 } is linearly dependent.

(b) Find a linear dependence relation among v 1 , v 2 , and v 3 .


To determine a dependence relation, we finish the row reduction from above.
   
 1 3 −3   1 0 3 
   
 0 1 −2  ∼  0 1 −2 .
   
   
0 0 0 0 0 0

Therefore,

    x1       + 3x3 = 0
          x2 − 2x3 = 0
                 0 = 0.

Thus, x = (x1 , x2 , x3 ) = (−3x3 , 2x3 , x3 ) = x3 (−3, 2, 1). Thus, −3v1 + 2v2 + v3 = 0 (corresponding to x3 = 1).
Of course, there are infinitely many dependence relations. For example, −15v 1 + 10v 2 + 5v 3 = 0
(corresponding to x3 = 5).

We also note that we have shown that v 3 ∈ Span{v 1 , v 2 } since −3v 1 + 2v 2 + v 3 = 0 implies v 3 =
3v 1 − 2v 2 . This observation is always the case for linear dependence. 

 
 1 −1 −1 
 
Example 2.3.3. Determine if the columns of the matrix A = 
 −1 2  are linearly independent.
4 
 
2 −4 −7
It suffices to show that Ax = 0 has no nontrivial solutions, that is, that an echelon form of A has a pivot
in every column (for this square matrix, no zero rows).

    [  1 −1 −1 ]   [ 1 −1 −1 ]   [ 1 −1 −1 ]
    [ −1  2  4 ] ∼ [ 0  1  3 ] ∼ [ 0  1  3 ] .
    [  2 −4 −7 ]   [ 0 −2 −5 ]   [ 0  0  1 ]
Therefore, the columns of A are linearly independent. 

These examples illustrate an important point: a set of vectors is linearly dependent if and only if the
corresponding homogeneous linear system has a free variable. In other words, a set of vectors is linearly
independent if and only if the associated coefficient matrix has a pivot in each column.
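When the vectors are given numerically, this pivot test is easy to carry out by machine. The following Python sketch uses the SymPy library (SymPy is not otherwise assumed in this text; this is purely an illustration) to check the set from Example 2.3.2.

    from sympy import Matrix

    # Columns are v1, v2, v3 from Example 2.3.2.
    A = Matrix([[1,  3, -3],
                [2,  1,  4],
                [3, -2, 13]])

    R, pivots = A.rref()
    print(pivots)                  # (0, 1): no pivot in the third column, so the set is dependent
    print(len(pivots) == A.cols)   # False, i.e. the columns are NOT linearly independent

    # A nontrivial dependence relation can be read from the null space:
    print(A.nullspace())           # one vector, (-3, 2, 1), i.e. -3*v1 + 2*v2 + v3 = 0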

Theorem 2.3.4.
A set S = {v 1 , . . . , v p } ⊆ F n is linearly dependent if and only if at least one of the vectors of S is a linear
combination of the others.
Proof. Suppose first that S is linearly dependent and
c1 v 1 + . . . + cp v p = 0
is a dependence relation. Now there exists some j such that cj 6= 0. Without the loss of generality, we may
suppose that j = p. Then
    c1 v1 + . . . + cp−1 vp−1 = −cp vp
    (c1 /(−cp )) v1 + . . . + (cp−1 /(−cp )) vp−1 = vp .
So, v p ∈ Span{v 1 , . . . v p−1 }.

Conversely, if vp = c1 v 1 + . . . + cp−1 v p−1 , then


c1 v 1 + . . . + cp−1 v p−1 − vp = 0
is a dependence relation, that is, {v 1 , . . . , v p } is linearly dependent. 

The following properties provide, in some cases, a quick test to determine the linear independence or
dependence of a set of vectors.

Theorem 2.3.5.

(a) If a set contains a linearly dependent subset, then the set itself is linearly dependent.

(b) The set {v} is linearly independent if and only if v 6= 0.

(c) If a set S ⊆ F n contains the zero vector, then S is linearly dependent.

(d) The set {u, v} is linearly independent if and only if neither vector is a multiple of the other, that is,
v ≠ cu and u ≠ cv for any c ∈ F .

(e) Suppose S = {v 1 , . . . , v p } ⊆ F n . If p > n, then S is linearly dependent.

Proof.
(a) Suppose S is linearly dependent and S ⊆ T . Then set every vector in T which is not in S to have
weight zero. Then a dependence relation on S becomes a dependence relation on T .

(b) Note that cv = 0 if and only if c = 0 or v = 0.

(c) Set the weight of the zero vector to one and set the weight of any other vector in S to zero. This gives
a dependence relation.
(d) This follows immediately from Theorem 2.3.4 when p = 2.

 
(e) Let A = [ v1 . . . vp ]. Then A is an n × p matrix and the equation Ax = 0 corresponds to an
under-determined system of n equations in p unknowns. Thus, the system has a free variable by
Theorem 1.1.8, which implies the linear dependence of the column vectors. 

Example 2.3.6. Determine by inspection if the given sets are linearly independent.

(a) {(1, 2, 3)}. This set of vectors is actually just a single vector. As such, the set is linearly independent
    exactly when this vector is nonzero, which it is. Therefore, the set is linearly independent.

(b) {(3, 1, 5), (0, 0, 0), (2, 1, 7)}. This set of vectors is linearly dependent because it contains the zero
    vector. Note that

        0 (3, 1, 5) + 1 (0, 0, 0) + 0 (2, 1, 7) = 0.

(c) {(15, 0, 20, −5), (−12, 0, −16, 4)}. This set of vectors is linearly dependent because the second vector
    is just (−4/5) times the first vector.

(d) {(7, 1, 6), (9, 0, 2), (4, 2, 5), (3, 1, 9)}. This set of vectors is linearly dependent because any set of 4
    vectors in R^3 must be dependent.                                                                      □

Exercises (Go to Solutions)


For Exercises 1-6, determine whether the statement is true or false. If false, correct the statement so that it is
true.

1. If a set S ⊆ F n contains the zero vector, then S is linearly independent.


2. If a set contains a linearly dependent subset, then the set itself is linearly dependent.
3. The set {u, v} is linearly independent if and only if neither vector is a multiple of the other, that is,
v 6= cu and u 6= cv for any c ∈ F .

4. Suppose S = {v 1 , . . . , v p } ⊆ F n . If p > n, then S is linearly dependent.


5. The set {v} is linearly independent if and only if v 6= 0.
6. A set S of two or more vectors is linearly dependent if and only if at least one of the vectors in S is a
linear combination of the others.

QUICK! For Exercises 7-13, determine whether the set of vectors is linearly independent or dependent using
Theorem 2.3.5 in LESS THAN 10 SECONDS! Justify your quick response.
         
 1
  0   0   1   1 
 
 
 





     

 8.   ,  
♠ 7.  2  ,  1  ,  0 
      i

0 


     

 

 3 4 0 
             
 −1 + i
 1   2   3   7   −6 + 3i   i

 
  

 
10.  , ,

   

       
 

 1   2   0   1 2 3+i −4 + 2i 

         
 
9. 

,
 
,
 
,
 


0 0 6 0

       

       


       

 
 0

0 7 2


           
 1  1 0 1 0   1

 
 
 


  
        


 
 
          

   
 2 
   0   0   1 
      
  1  
 
0


♠ 11. 


 (mod 11) ♠ 12.   , 
  
,
 
,
  ,
   (mod 2)

 3 
 1   0   1   1  0

  
         

 
 
   


 
 
          

   
 4 
   1

1 1 0 1

     
 1   0   3 

 


 

     

 
 2   5   0 
     
13. 

,
 
,
 
 (mod 5)



 0   5   2 


     



     

 4

5 2 

For Exercises 14-20, determine if the set of vectors is linearly independent or dependent. If linearly dependent,
provide a nontrivial dependency relation.
           
1 2 3 1 2   −2

 
 
 


   


 
 
 
 





 
   


14.  0  ,  2  ,  6 
      ♠ 15. 
 −1  ,  −2  ,  3
    


     
 
     

 
 
 

 2 4 5   3 6 −2 
           
 1   −1   2  2   1   4 

 
 
 

  
 


     

 
     

 −2   2   −3 
      17.   ,   ,
 4   3   1 
 
♠ 16. 
  ,
 
  ,
 
 





      



 −1   2   −3    6 2 3 

     


     


 −1

4 2 

           



 i   2 0 




 1 1 1 



  
 
 





 
 
 
 


♠ 18. 
  ,  1 − 2i
1   , i 
   ♠ 19.  3  ,  0  ,  2  (mod 5)
     

      
     


 
 
 

 −i −1 + i i   1 0 2 
       
1 2 6 0 

 

       


       

 
 2   6   5 
     
  5 

♠ 20. 

,
 
,
 
,
 
 (mod 7)

 0   6   1   4 

       

 


       

 
 1

4 1 4 


“Space is an inspirational concept that allows you to dream big.” – Peter Diamandis

2.4 Affine Geometry


For any field, there is a limited amount of geometry, called affine geometry, that we can attach to the vector
space F n by mimicking geometric structures from Rn .

Definition 2.4.1. Let F n be a vector space. Then a flat (or affine set) is a subset of F n which is congruent
to F m for some 0 ≤ m ≤ n. Equivalently, flats are solution sets of linear systems (vector equations, matrix
equations, etc.).

To describe flats, e.g., points, lines, planes, we present two recursive constructions: Top-Down or Bottom-
Up. The Top-Down approach is to start with a single non-zero linear equation a1,1 x1 + . . . + a1,n xn = b1 .
The solution set of this linear equation forms a special kind of flat, called a hyperplane. For example,
ax + by + cz = d defines a plane in R3 . By plane, we mean something that “looks” like R2 inside of Rn . In
general, a plane over F n should be a subset that “looks” like F 2 . The solution set to this 1×n system should
have n − 1 free variables. Generally speaking, we view a hyperplane in F n as an affine set that “looks” like
F n−1 in F n .

Next, consider a linear system that contains the above linear equation and a second one a2,1 x1 + . . . +
a2,n xn = b2 . The solution set to this 2 × n linear system is geometrically the intersection of the two hyper-
planes, for example, two distinct planes intersecting in R3 form a line. By line, we mean something that
looks like R1 = R inside of Rn . In general, a line over F n should be a subset that “looks” like F 1 = F . If
the equations are linearly independent, then this 2 × n system will have n − 2 free variables. Hence, we view
an intersection of two hyperplanes in F n as an affine set that “looks” like F n−2 in F n .

Continuing in this fashion of expanding the linear system by adding new linear equations while main-
taining linear independence, the m × n linear system will have n − m free variables and the flat will resemble
F n−m . This continues until m = n and the intersection of hyperplanes is just a point, which resembles
F 0 = {0}. In this case, there are no free variables to the linear system. This is, of course, only true if the
set of linear equations is linearly independent; if linearly dependent, then at least one equation is a linear
combination of the rest and its removal does not change the flat one bit. That is, the inclusion of a linear
combination of equations already in the system offers no restriction on the solution set whatsoever, and this
equation is frankly redundant to the linear system. This summarizes the Top-Down approach to affine sets.

The “shape” of the flat, that is, the parameter p for which the affine set resembles F p , appears to be
determined by the number of free variables in the linear system. The Bottom-Up approach to flats is to
adjoin more and more free variables until the flat has the right “shape.”

A point in F n is the solution to the vector equation

x = x0 ,

where x is a variable vector and x0 is a fixed vector. So a point in F n is just a vector, that is, P = x0 . It is
also a translation of the empty span Span{} = {0} = F 0 .

A line in F n , call it `, is the translation of the span of a single vector v, which acts as the direction or
slope of the line. If x0 is a point on `, then ` is the solution set to the vector equation

x = x0 + tv.

Similarly, a plane in F n , call it P, is a translation of a span of two linearly independent vectors, that is,
a translation of Span{u, v}. Then P is the solution set to the vector equation

x = x0 + su + tv,

where vectors u, v ∈ F n , {u, v} is linearly independent, and x0 is on P.



By analogy, we can construct higher flats by increasing the number of linearly independent vectors in
the linear combination associated with the vector equation, that is, we increase the number of linearly
independent vectors in the spanning set, the so-called spanners of the flat. For example, a hyperplane in
F 4 is the set of solutions to the vector equation
    x = x0 + ru + sv + tw,

where {u, v, w} ⊆ F^4 is linearly independent. More generally, an m-flat is the solution set to the vector
equation

    x = x0 + t1 v1 + t2 v2 + . . . + tm vm ,                                                        (2.4.1)
where {v 1 , v 2 , . . . , v m } ⊆ F n is a linearly independent set of vectors. Equation (2.4.1) is called the vector
form of the flat. Each component in this vector equation is a linear equation in its own right. The system
of linear equations associated to this vector equation is called the parametric equations of the flat. These
parametric equations would be the general solution to the (n − m) × n Top-Down linear system described
above. This summarizes the Bottom-Up approach to affine sets.

Example 2.4.2. Find a vector equation and parametric equations of the line in R^3 that passes through
the point x0 = (1, 2, 3) and is parallel to the vector v = (5, −3, 1).

If the flat is said to be “parallel” to a vector, then that vector acts as a spanner for the flat. Hence, the
vector equation is simple enough: x = x0 + tv, that is,

    [ x1 ]   [ 1 ]     [  5 ]
    [ x2 ] = [ 2 ] + t [ −3 ] .
    [ x3 ]   [ 3 ]     [  1 ]

For the parametric equations, we look closer:

    x1 = 1 + 5t
    x2 = 2 − 3t
    x3 = 3 + t.                                                                                     □


Example 2.4.3. Find a vector equation and parametric equations of the line in R^4 that passes through
the origin and is parallel to the vector v = (4, −3, 2, −1).

As the line passes through the origin, we set x0 = 0. Hence, the vector equation for this line is x = tv,
that is,

    [ x1 ]     [  4 ]
    [ x2 ] = t [ −3 ] ,
    [ x3 ]     [  2 ]
    [ x4 ]     [ −1 ]

and the parametric equations are given as:

    x1 = 4t
    x2 = −3t
    x3 = 2t
    x4 = −t.                                                                                        □


Example 2.4.4. Find a vector equation and parametric equations of the plane in R4 that passes through
x0 = (26, 3, −13, −18) and is parallel to both the vectors u = (1, −3, −2, −1) and v = (0, 0, 1, 0).

The vector equation takes on the form x = x0 + su + tv, that is,

    [ x1 ]   [  26 ]     [  1 ]     [ 0 ]
    [ x2 ] = [   3 ] + s [ −3 ] + t [ 0 ] ,
    [ x3 ]   [ −13 ]     [ −2 ]     [ 1 ]
    [ x4 ]   [ −18 ]     [ −1 ]     [ 0 ]

and the parametric equations are given as:

    x1 = 26 + s
    x2 = 3 − 3s
    x3 = −13 − 2s + t
    x4 = −18 − s.                                                                                   □


Suppose we have a line which contains the vector x0 and has spanner v. Let x1 be another vector on
this line. Then there exists some s ∈ F such that x1 = x0 + sv. Note that s ≠ 0, since x1 ≠ x0 . Hence,
sv = x1 − x0 , which implies that the difference of any two vectors on the line is a scalar multiple of the
spanner. Also, v = (1/s)(x1 − x0 ). Thus, for any vector x on the line, we have

    x = x0 + tv = x0 + (t/s)(x1 − x0 ) = ((s − t)/s) x0 + (t/s) x1 .

Note that the coefficients satisfy the condition that (s − t)/s + t/s = 1. In general, the line containing x0 and x1
is the set of vectors of the form {a0 x0 + a1 x1 | a0 , a1 ∈ F, a0 + a1 = 1}.

These principles can be generalized to higher flats too.

Definition 2.4.5. Given vectors x0 , x1 , . . . , xm ∈ F n , the vector x given as


x = a0 x0 + a1 x1 + . . . + am xm
is called an affine combination of x0 , x1 , . . . , xm if the scalars ai satisfy the equality a0 + a1 + . . . + am = 1.
The affine span of {x0 , x1 , . . . , xm }, denoted Aff(x0 , x1 , . . . , xm ), is the set of all affine combinations of
x0 , x1 , . . . , xm , that is,

    Aff(x0 , x1 , . . . , xm ) = { a0 x0 + a1 x1 + . . . + am xm | a0 + a1 + . . . + am = 1 } .

Note that the affine span of the vectors x0 , . . . , xm is not the same object as the linear span of the vectors
x0 , . . . , xm (recall Definition 2.1.4), although in general
Aff{x0 , . . . , xm } ⊆ Span{x0 , . . . , xm }. (2.4.2)
Since affine combinations are linear combinations with the extra condition that scalars sum to 1, all affine
combinations are linear combinations, but the converse does not hold, hence the above inclusion (2.4.2). The affine
span of the vectors x0 , . . . , xm is the smallest affine set containing these specific vectors. To see this, note
that because of the assumption about the coefficients, any affine combination could be written as
x = (1 − a1 − a2 − . . . − am )x0 + a1 x1 + a2 x2 + . . . + am xm .
On the other hand, the affine combination x is on the flat determined by the equation
x = x0 + a1 (x1 − x0 ) + a2 (x2 − x0 ) + . . . + am (xm − x0 ). (2.4.3)
Notice this flat also contains each xj , which is obtained by setting ai = 1 when i = j and ai = 0 when i 6= j.

We see the very important observation from (2.4.3): the spanners for a flat can be found by taking
differences of specific vectors on the flat.

Example 2.4.6. Find a vector equation and parametric equations of the line in R3 through (1, 2, 3) and
(2, −2, 0).

We can take x0 to be either of the two points. We will take x0 = (1, 2, 3). To get the spanner v, we need
to take the difference of the two vectors provided, that is, v = (2, −2, 0) − (1, 2, 3) = (1, −4, −3).

Therefore, the vector equation is given as x = x0 + tv, that is,

    [ x1 ]   [ 1 ]     [  1 ]
    [ x2 ] = [ 2 ] + t [ −4 ] ,
    [ x3 ]   [ 3 ]     [ −3 ]

and the parametric equations as:

    x1 = 1 + t
    x2 = 2 − 4t
    x3 = 3 − 3t.                                                                                    □


Example 2.4.7. Find a vector equation and parametric equations of the plane in R4 that passes through
(−17, 6, 29, 0), (−13, 3, 25, −2), and (−15, 6, 25, −1).

We can take x0 to be any of the three points. We will take x0 = (−17, 6, 29, 0). To get the spanners u
and v, we need to take the differences of the three vectors provided, that is,

u = (−13, 3, 25, −2)−(−17, 6, 29, 0) = (4, −3, −4, −2), and v = (−15, 6, 25, −1)−(−17, 6, 29, 0) = (2, 0, −4, −1).

Therefore, the vector equation is given as x = x0 + au + bv, that is,

    [ x1 ]   [ −17 ]     [  4 ]     [  2 ]
    [ x2 ] = [   6 ] + a [ −3 ] + b [  0 ] ,
    [ x3 ]   [  29 ]     [ −4 ]     [ −4 ]
    [ x4 ]   [   0 ]     [ −2 ]     [ −1 ]

and the parametric equations as:

    x1 = −17 + 4a + 2b
    x2 = 6 − 3a
    x3 = 29 − 4a − 4b
    x4 = −2a − b.                                                                                   □
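The spanners of a flat through given points are just differences of those points, so a machine check of Example 2.4.7 is short. The sketch below uses Python with the SymPy library (not assumed anywhere in this text; an illustration only) to form u and v and confirm that they are linearly independent.

    from sympy import Matrix, symbols

    p0 = Matrix([-17, 6, 29, 0])
    p1 = Matrix([-13, 3, 25, -2])
    p2 = Matrix([-15, 6, 25, -1])

    u = p1 - p0                              # (4, -3, -4, -2)
    v = p2 - p0                              # (2, 0, -4, -1)
    print(Matrix.hstack(u, v).rank())        # 2, so {u, v} is linearly independent and we get a plane

    a, b = symbols('a b')
    print((p0 + a*u + b*v).T)                # the four parametric expressions for x1, x2, x3, x4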


Note that each flat corresponds to the solution set of a linear system: the parametric equations for the
Bottom-Up approach or the m × n linear system for the Top-Down approach. In the Top-Down approach, to
compute the intersection of two flats in F n , one need only take the union of all the linear equations from their
associated linear systems and solve the combined linear system. This really is the drive between Top-Down:
intersections! For computing the intersection with the Bottom-Up approach, it is important to remember
that we use different symbols for the free variables in the different parametric equations. The intersection
of flats is the set of all points that are in both subsets. The parameters that give these coincident points
do not have to be the same. If the same symbols for the parameters are used, it makes the false impression
that the parameters must be equal, which is not true in general.

Example 2.4.8. Find the intersection for the two planes in R4 from Examples 2.4.4 and 2.4.7.

To begin, we can equate the values for the dependent variables x1 , x2 , x3 , x4 , as follows:

     26 + s       = −17 + 4a + 2b        4a + 2b − s       = 43          a = 8
     3 − 3s       = 6 − 3a               −3a + 3s          = −3          b = 9
                                    ∼                               ∼
     −13 − 2s + t = 29 − 4a − 4b         −4a − 4b + 2s − t = −42         s = 7
     −18 − s      = −2a − b              −2a − b + s       = −18         t = −12

Solving this linear system gives the values a = 8 and b = 9 (s = 7 and t = −12), which implies that there
is a unique point of intersection between the two planes, (33, −18, −39, −25) . As bizarre as it may be to
visualize, although two distinct planes cannot intersect at a unique point in 3-space, they can in 4-space. 
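The system of equations in the parameters a, b, s, t can also be handed to a computer algebra system. Here is a brief Python/SymPy sketch (an illustration only; no software is assumed by the text) that reproduces the solution above.

    from sympy import symbols, Eq, solve

    a, b, s, t = symbols('a b s t')

    # Plane of Example 2.4.4 (parameters s, t) set equal to the plane of Example 2.4.7 (parameters a, b).
    eqs = [Eq(26 + s,          -17 + 4*a + 2*b),
           Eq(3 - 3*s,          6 - 3*a),
           Eq(-13 - 2*s + t,    29 - 4*a - 4*b),
           Eq(-18 - s,          -2*a - b)]

    sol = solve(eqs, [a, b, s, t])
    print(sol)                                                   # {a: 8, b: 9, s: 7, t: -12}
    print([26 + sol[s], 3 - 3*sol[s],
           -13 - 2*sol[s] + sol[t], -18 - sol[s]])               # [33, -18, -39, -25]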

Example 2.4.9. In the following example, we will compute the intersection between two affine sets, but
we will do it twice! The first time we will compute the intersection when the flats are represented Top-Down,
and the second attempt will be when the flats are represented Bottom-Up. We present the two computations
for the reader to compare the differences in the representation.

Find the intersection of the two affine sets in R^3 given as:

    Top-Down:
        x1 + 3x2 − x3 = 16                      3x1 + 7x2 − 6x3 = 34
                                    and
        x1 + 4x2 + x3 = 24                      −3x1 − 8x2 + 4x3 = −42.

    Bottom-Up:
        x1 = −1 + 7s                            x1 = −14 + 20t
        x2 = 6 − 2s                 and         x2 = 10 − 6t
        x3 = 1 + s                              x3 = −1 + 3t.

Top-Down. To begin, we combine the two linear systems into one. Note that any vector on the first flat is a
solution to the first two equations, and any vector on the second flat is a solution to the second two
equations. Hence, a vector on their intersection will be a solution to all four linear equations.

    x1 + 3x2 − x3 = 16
    x1 + 4x2 + x3 = 24
    3x1 + 7x2 − 6x3 = 34
    −3x1 − 8x2 + 4x3 = −42.

We proceed to solve the linear system:

    [  1  3 −1  16 ]   [ 1 0 0 6 ]
    [  1  4  1  24 ]   [ 0 1 0 4 ]
    [  3  7 −6  34 ] ∼ [ 0 0 1 2 ] .
    [ −3 −8  4 −42 ]   [ 0 0 0 0 ]

Therefore, the intersection of the two affine sets is the point x = (6, 4, 2).

Bottom-Up. To begin, we equate the variables for each of the parametric equations. Then remove the
coordinate vector and solve the parameters as a linear system.

    7s − 1  = x1 = 20t − 14            7s − 20t = −13
    −2s + 6 = x2 = −6t + 10       ∼    −2s + 6t = 4
    s + 1   = x3 = 3t − 1              s − 3t   = −2

We proceed to solve the linear system:

    [  7 −20 −13 ]   [ 1 0 1 ]
    [ −2   6   4 ] ∼ [ 0 1 1 ] .
    [  1  −3  −2 ]   [ 0 0 0 ]

Hence, the intersection occurs when the parameters are s = 1 and t = 1. Therefore, the intersection of the two
affine sets is the point x = (−1 + 7(1), 6 − 2(1), 1 + (1)) = (−14 + 20(1), 10 − 6(1), −1 + 3(1)) = (6, 4, 2).   □

Exercises (Go to Solutions)


For Exercises 1-6, describe what vector space the given affine set resembles. Assume all vectors and equations
are linearly independent.

♠ 1. A linear system with 4 equations and 6 vari- ♠ 2. A linear system with 7 equations and 8 vari-
ables. ables.

♠ 3. A linear system with 4 equations and 12 vari- ♠ 4. The span of three vectors.
ables.

♠ 5. The translations of the span of five vectors. ♠ 6. The span of no vectors, that is, Span{}.

For Exercises 7-14, find the vector form and parametric equations for the given affine set. Answers may vary.
7. The line passing through (5, 4, 6) and (1, −2, −3) in R3 .

8. The plane containing (−3, 6, 1) and parallel to (4, −2, 3) and (1, 6, 5) in R3 .
9. The plane containing (1, 2, 3), (−1, 4, 1), (4, −2, 8), (0, 1, 6) in R3 .
♠ 10. The plane containing (1, 2, 3, 4), (0, 1, 0, 1), and (−1, −2, 1, 4) in R4 .
♠ 11. The hyperplane containing (0, 1, 4, −1), (11, 0, 0, 2), (−2, −3, 0, 1), and (9, 9, 0, −1) in R4 .

♠ 12. The line containing (i, 1 + 2i) and parallel to the vector (2 − i, 5) in C2 .
♠ 13. The line passing through (1, 2, 3) and (0, 1, 3) in Z35 .

♠ 14. The plane containing (2, 3, 1) and parallel to (1, 2, 3) and (0, 0, 5) in Z37 .

For Exercises 15-19, find the intersection between the two flats provided. Answers may vary.
( (
x1 + x2 + x3 − 5x4 = 3 5x1 + 2x2 − x3 − x4 = −3
15. and
2x1 + 3x2 + 2x4 = −4 −3x1 + 2x3 − 4x4 = 5
( (
−x1 − 6x3 − x4 = −17 x1 − 2x2 − 2x3 + x4 = −1
♠ 16. and
−3x1 + 2x2 − 10x3 − 3x4 = −33 x2 + 3x3 − 3x4 = 5
         
 −2   1   3   4   −3 
         
17. x = 
 2  + a  −1  and x =  −1  + b  −2  + c  1 
        
         
−4 2 2 0 4
         
 4   −1   4   1   2 
         
2   0   −2   0  −2 
         
 
♠ 18. x = 

 + a
 
 and
 x=

 + b
   + c
 
.

6   −1 
     4   3   2 
      
         
−2 2 15 −5 3

19. The affine sets from Exercises 7 and 8.



“For the wise man looks into space and he knows there is no limited dimensions.” – Lao Tzu

2.5 Subspaces
Lines, planes, and hyperplanes through the origin are just special examples of subspaces of F n .

Definition 2.5.1. Let V be a vector space, such as F n . A subspace of V is any set W ⊆ V such that:
(i) 0 ∈ W ,
(ii) For each u, v ∈ W , the sum u + v ∈ W ,
(iii) For each v ∈ W and scalar c ∈ F , the vector cv ∈ W .

In words, a subspace is a nonempty subset of V closed under addition and scalar multiplication. It should
be mentioned that closure under addition and scalar multiplication is equivalent to closure under linear
combinations.

When considering vector spaces before, we saw that a set of things is called a vector space when we can
appropriately add and scale all the objects, which are then called vectors. Necessary to this definition is that
the sum of two vectors and a scaled vector still be vectors! A subspace is a vector space inside of a vector
space, that is, each subspace of V is also a vector space in its own right. The sum of two vectors or a scaled
vectors originating from W will remain in W still. In regard to the axioms of a vector space, the subspace
inherits the axioms from the ambient vector space. For example, since ALL vectors commute in the larger
vector space, all vectors will commute in the smaller subset.

The vector space V in consideration will nearly always be F n for some field F . It should be mentioned
that F n is a subspace of F n because it satisfies all three conditions. Also, the zero space F 0 = {0} is also
a subspace.

Example 2.5.2. If v 1 , v 2 ∈ F n and W = Span{v 1 , v 2 }, then we claim that W is a subspace of F n . To


prove this claim, we must show that W satisfies the three conditions in the definition. Let c, s1 , s2 , t1 , t2 ∈ F .
Then
(i) 0 = 0v 1 + 0v 2 ∈ W ,
(ii) (s1 v 1 + s2 v 2 ) + (t1 v 1 + t2 v 2 ) = (s1 + t1 )v 1 + (s2 + t2 )v 2 ∈ W ,
(iii) c(s1 v 1 + s2 v 2 ) = (cs1 )v 1 + (cs2 )v 2 ∈ W .
Therefore, W is a subspace of F n . By mathematical induction, this same argument shows that a span for
any number of vectors is also a subspace of F n . 

Because of this, we often say that a span of a set of vectors S is the subspace spanned by S. Likewise,
if W is a subspace of F n and W = Span{S} for some S ⊆ F n , then we say that S is a spanning set for W .

By our consideration of lines, planes, and hyperplanes, the affine sets (flats) which pass through the origin
are subspaces because each is just a span of vectors. In fact, every coset is simply the translation of a subspace.
On the other hand, a translation of a subspace will likely not be a subspace since it no longer contains the
zero vector.

Example 2.5.3. In R2 , all lines through the origin are 1-dimensional subspaces. On the other hand, other
lines are not subspaces since they do not contain the origin which is the zero vector of R2 .

Let W be the set of all points (x, y) ∈ R2 for which x ≥ 0 and y ≥ 0, that is, W is the first quadrant of
the plane. This is not a subspace. On the one hand, the set contains 0 and is closed under addition. For
example, if x1 , x2 , y1 , y2 ≥ 0, then x1 + x2 , y1 + y2 ≥ 0 and (x1 , y1 ) + (x2 , y2 ) ∈ W . On the other hand, if
x, y ≥ 0 and (x, y) ≠ 0, then −(x, y) = (−x, −y) ∉ W . Hence, W is not a subspace of R2 . □
 
Let A = a1 a2 ... an be an m × n matrix. Then recall that the column space of A, denoted
Col A, is Span{a1 , a2 , . . . , an }. The column space of A is the set of all b ∈ F m such that Ax = b is consis-
tent. By Example 2.5.2, Col A is a subspace of F m .

   
 1 8 7   3 
   
Example 2.5.4. Let A = 
 7 6  and b =  3  over Z11 . Determine whether b ∈ Col A.
9   
   
8 7 6 7
We need to determine whether b is a linear combination of the column vectors of A. This is the same as
solving the matrix equation Ax = b. To do this, we use row reduction:
     
 1 8 7 3   1 8 7 3   1 8 7 3 
     
 7 6 9 3  ∼  0 5 4 4  ∼  0 5 4 4  (mod 11).
     
     
8 7 6 7 0 9 5 5 0 0 0 0

Thus, the system is consistent, which implies that b is in Col A. 
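Row reduction over Z_p can be automated with ordinary integer arithmetic. The short Python sketch below is only an illustration (nothing in this text depends on it); it row reduces the augmented matrix [A | b] of Example 2.5.4 modulo 11.

    def rref_mod(M, p):
        """Row reduce a matrix with entries in Z_p (p prime) to reduced echelon form."""
        M = [[x % p for x in row] for row in M]
        rows, cols = len(M), len(M[0])
        r = 0
        for c in range(cols):
            pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
            if pivot is None:
                continue                      # no pivot in this column
            M[r], M[pivot] = M[pivot], M[r]
            inv = pow(M[r][c], -1, p)         # multiplicative inverse modulo p (Python 3.8+)
            M[r] = [(inv * x) % p for x in M[r]]
            for i in range(rows):
                if i != r and M[i][c] != 0:
                    f = M[i][c]
                    M[i] = [(x - f * y) % p for x, y in zip(M[i], M[r])]
            r += 1
            if r == rows:
                break
        return M

    aug = [[1, 8, 7, 3],                      # the augmented matrix [A | b] over Z_11
           [7, 6, 9, 3],
           [8, 7, 6, 7]]
    for row in rref_mod(aug, 11):
        print(row)                            # [1, 0, 5, 1], [0, 1, 3, 3], [0, 0, 0, 0]
    # No pivot lands in the augmented column, so Ax = b is consistent and b ∈ Col A.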

Example 2.5.5.
(a) Let V = R∞ be the set of all real-valued sequences. This is a vector space since we can add two
sequences:

{x1 , x2 , x3 , . . .} + {y1 , y2 , y3 , . . .} = {x1 + y1 , x2 + y2 , x3 + y3 , . . .} ∈ R∞ ,

and scale a sequence:


c{x1 , x2 , x3 , . . .} = {cx1 , cx2 , cx3 , . . .}.
Notice the zero vector of this vector space is the constant zero sequence:

0 = {0, 0, 0, . . .}.

Let W be the subset of R∞ of convergent sequences. Since the zero sequence converges (to zero), it is
contained in W . By limit properties, the sum of convergent sequences is convergent and converges to
the sum of limits. Likewise, a multiple of a convergent sequence converges to a multiple of the limit.
Therefore, W is a subspace of R∞ . By similar reasoning, the set of all sequences which converge to zero
is a subspace of R∞ .

(b) Let RX = {f : X → R}, that is, the set of all real-valued functions where X ⊆ R, e.g. RR is the
set of real-valued functions defined on the entire x-axis. Then RX is also a vector space. Note that if
f, g ∈ RX , then the sum of “vectors” is the function defined by the rule:

(f + g)(x) = f (x) + g(x),

and scalar multiplication is given by the rule:

(cf )(x) = c[f (x)].

For this vector space, the function f (x) = 0 for all x ∈ X is the zero vector, called the zero function.

Let P denote the set of all polynomials with real coefficients. Viewing P as a subset of RX , as the
domain of any polynomial can be restricted to X, then P is a subspace of RX . To see this, note that
the zero polynomial is the same as the zero function and is contained in P. Second, the sum of two

polynomials is again a polynomial. Third, a polynomial times by a real number is again a real-valued
polynomial. Therefore, P is a subspace of RR .

Let Pn be the set of polynomials with degree at most n. By the same reasoning as before, Pn is a
subspace of P and hence a subspace of RX . In fact, Pn = Span(1, x, x2 , . . . , xn ). Furthermore,

P0 ≤ P1 ≤ P2 ≤ P3 ≤ . . . ≤ P ≤ RX .

(c) Let C(X) be the set of all real-valued continuous functions on the domain X. This is a subset of RX .
By facts from Calculus, the zero function is continuous, the sum of continuous functions is continuous,
and the multiple of a continuous function is continuous. Thus, C(X) ≤ RX .

Likewise, we can define the set C 1 (X) to be the set of all real-valued continuously differentiable (f 0 is
continuous) functions on the domain X. Likewise, Calculus tells us that constant functions are differ-
entiable, sums of differentiable functions are differentiable, and multiples of differentiable functions are
differentiable. Thus, C 1 (X) ≤ RX . Of course, since differentiable functions are necessarily continuous,
we have C 1 (X) ≤ C(X) ≤ RX .

Likewise, we can define C n (X) to be the set of functions f for which f (n) is continuous on X. By
similar reasoning, each of these sets is a subspace of RX . Let C ∞ (X) be the set of all functions in
RX for which all higher derivatives exist (and necessarily are continuous). This is likewise a subspace
and called the space of smooth functions. In fact, we have the following descending sequences of
subspaces:

RX ≥ C(X) ≥ C 1 (X) ≥ C 2 (X) ≥ . . . ≥ C n (X) ≥ . . . ≥ C ∞ (X)


≥ P ≥ . . . ≥ Pn ≥ . . . ≥ P2 ≥ P1 ≥ P0 .

All these vector spaces allude to why Linear Algebra is extremely useful for Advanced Calculus (often
called Real Analysis). Linear algebra is everywhere in calculus!

With the exception of those concepts directly related to calculus, such as continuity and derivatives,
the above examples of function spaces can be adapted to any field F . 

Exercises (Go to Solutions)


For Exercises 1-6, explain why the subset of R2 does NOT form a subspace, that is, explain which of the
three axioms are satisfied and which ones fail. Provide a counterexample to justify when they fail. Answers
may vary.

♠ 1. W is the unit circle, that is, ♠ 2. W is the unit disc, that is,
W = {(x, y) | x2 + y 2 = 1}. W = {(x, y) | x2 + y 2 ≤ 1}.

♠ 3. W is the standard parabola, that is, ♠ 4. W is the union of the x- and y-axis, that is,
W = {(x, y) | y = x2 }. W = {(x, y) | x = 0 or y = 0}.

♠ 5. W is the upper half-plane, that is, 6. W is the punctured plane, that is,
W = {(x, y) | y ≥ 0}. W = {(x, y) | x2 + y 2 ≥ 1}.

For Exercises 7-10, decide if the subset of the function space RR forms a subspace. Justify your answer in a
similar style to Exercises 1-6. Answers may vary.

♠ 7. The set of functions whose y-intercept is 1, that ♠ 8. The set of functions whose y-intercept is 0, that
is, f (0) = 1. is, f (0) = 0.

♠ 9. The set of functions whose end behavior on the ♠ 10. The set of odd functions, that is, f (−x) =
right is ∞, that is, lim f (x) = ∞. −f (x).
x→∞

“After every storm the sun will smile; for every problem there is a solution, and the soul’s indefeasible duty
is to be of good cheer.” – William R. Alger

2.6 Solution Sets of Linear Systems


Recall that a linear system is said to be homogeneous if it can be written in the form Ax = 0, where A is an
m × n matrix, x ∈ F n , and 0 is the zero vector in F m . A homogeneous system Ax = 0 always has a solution,
namely x = 0. This can be thought of as the trivial solution. Any other solution to a homogeneous system
is a nontrivial solution.

Theorem 2.6.1. The homogeneous system Ax = 0 has a nontrivial solution if and only if the equation has
at least one free variable.

Example 2.6.2. Determine if the following homogeneous system has a nontrivial solution.

3x1 + 5x2 + 3x3 ≡ 0

4x1 + 5x2 + 4x3 ≡ 0 (mod 7).

6x1 + x2 + 6x3 ≡ 0

Let A be the coefficient matrix. Then we must row reduce the augmented matrix [A | 0], as seen below.
     
 3 5 3 0   3 5 3 0   3 5 3 0 
     
[A | 0] ≡ 
 4 5 4 ∼ 0
0   3 0 ∼ 0 3 0
0  
.
0 
     
6 1 6 0 0 5 0 0 0 0 0 0

Thus, x3 is a free variable. This implies that Ax = 0 has a nontrivial solution! To determine such a solution,
we continue to the RREF;
       
 3 5 3 0   3 5 3 0   3 0 3 0   1 0 1 0 
       
 0 3 0 0  ∼  0 1 0 0  ∼  0 1 0 0  ∼  0 1 0 0 .
       
       
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Thus, the system of equations becomes:



x1
 + x3 = 0
x2 =0 (mod 7)

0 =0

and the nontrivial solutions are of the form:


     
x
 1   6x 3   6 
     
x ≡  x2  ≡  0  ≡ x3  0 
    
 (mod 7).
     
x3 x3 1
 
 6 
 
Thus, if v ≡ 
 0 , then the solution set of Ax = 0 is

 
1
             



 0 6 5 4   3 2 1 



   
   
 
   
   
   


Span{v} = {tv | t ∈ Z7 } =  0 , 0 , 0 , 0 , 0
       
, 0 , 0  .
     

              

 

 0 1 2 3 4 5 6 

Definition 2.6.3. Let A be an m × n matrix. Then the null space (or kernel) of A, denoted Nul A, is the
set of all solutions of the homogeneous system Ax = 0. The nullity of A, denoted nullity(A), is the number
of non-pivot columns in A. This value counts the number of free variables in the homogeneous system and
essentially measures the “size” of null space.

The null space of A is a subset of F n . Furthermore, Nul A is a subspace. This agrees with the previous
example, where the solution set was a span of vectors. As such, we can geometrically visualize the solution
set to a homogeneous system as a flat through the origin.

Theorem 2.6.4. The null space of an m × n matrix A is a subspace of F n .


Proof. Since A0 = 0, we have that 0 ∈ Nul A. Next, suppose that x, y ∈ Nul A. This means that Ax = 0
and Ay = 0. Thus, A(x + y) = Ax + Ay = 0 + 0 = 0. Hence, x + y ∈ Nul A. Finally, if c ∈ F , then
A(cx) = c(Ax) = c0 = 0. Hence, cx ∈ Nul A. Therefore, Nul A is a subspace of F n . 

 
 1 3 −3 4 
 
Example 2.6.5. For the matrix A = 
 2 1 , find a spanning set for its null space.
4 3 
 
3 −2 13 1
Note that when solving a homogeneous linear system, the augmented column is just the zero vector and
no row operation will ever transform a column of zeros. As such, we often omit the zero column when
solving homogeneous systems. In fact, solving homogeneous systems uses essentially the same technique as
determining whether a set of vectors is linearly independent (compare Example 2.3.2).
         
 1 3 −3 4   1 3 −3 4   1 3 −3 4   1 3 −3 4   1 0 3 1 
         
∼ 0 −5 10 −5 ∼ 0 −2 ∼ 0 1 ∼ 0
−2 1  .
−2 1 
 2 1 4 3    1 1    1

         
3 −2 13 1 0 −11 22 −11 0 1 −2 1 0 0 0 0 0 0 0 0

The associated reduced linear system would then be



x 1 + 3x3 + x4 = 0 (
= −3x3 − x4

x1
x2 − 2x3 + x4 = 0 ∼
 x2 = 2x3 − x4
0 =0


Thus, the general solution to the homogeneous system Ax = 0 is

    x = (x1 , x2 , x3 , x4 ) = (−3x3 − x4 , 2x3 − x4 , x3 , x4 ) = x3 (−3, 2, 1, 0) + x4 (−1, −1, 0, 1).

Let u = (−3, 2, 1, 0) and v = (−1, −1, 0, 1). Hence, Nul(A) = Span{u, v}. 
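SymPy computes such a spanning set directly; its nullspace routine carries out exactly the row reduction above. A quick Python check of Example 2.6.5 (illustrative only; SymPy is not assumed in this text):

    from sympy import Matrix

    A = Matrix([[1,  3, -3, 4],
                [2,  1,  4, 3],
                [3, -2, 13, 1]])

    for v in A.nullspace():        # one vector per free variable
        print(v.T, (A * v).T)      # prints (-3, 2, 1, 0) and (-1, -1, 0, 1), each with A*v = 0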

Like a homogeneous system, a non-homogeneous system of linear equations with multiple solutions can
have its solutions expressed in parametric form.

Example 2.6.6. Describe all solutions of Ax = b over Z7 , where

        [ 3 5 3 ]             [ 0 ]
    A = [ 4 5 4 ]   and   b = [ 6 ] .
        [ 6 1 6 ]             [ 3 ]
Here, A is the same coefficient matrix as in the first example. Thus, using the same row operations, we
can see that    
 3 5 3 0   1 0 1 6 
   
 ∼  0 1 0 2 .
 4 5 4 6   

   
6 1 6 3 0 0 0 0

Thus, the related system is 


x 1
 + x3 = 6
x2 =2

0 = 0.

and the solutions are all of the form:


       
 x1   6 + 6x 3   6   6 
       
x=  x2
=
  2  =  2  + x3  0  .
    
       
x3 x3 0 1
   
 6   6 
   
Thus, if v =  0  and x0 =  2 , then all solutions of Ax = b are points on the line determined by
  
   
1 0
x = x0 + tv, t ∈ Z7 . In fact, tv is the general form of the solutions for Ax = 0. Thus, any solution to
Ax = b is of the form x0 plus a solution to Av = 0. Using the Span{v} from Example 2.6.2, we see the
seven solutions are
                 



 6 5 




 6 5 4 3 2 1 0 



 
 

  
 

 
 
 
 
   
   
 
 
 
 


Aff  2  ,  2  =  2  ,  2  ,  2  ,  2  ,  2  ,  2  ,  2  .
                  

     
              


 
   

 0 1   0 1 2 3 4 5 6 

Theorem 2.6.7. Suppose the equation Ax = b is consistent for some given b, and let x0 be a (particular)
solution. Then the solution set of Ax = b is the set of all vectors of the form x = x0 + xn , where xn is any
solution of the homogeneous equation Ax = 0 and can be expressed in parametric form.
From a geometric perspective, a solution set to a homogeneous system is the null space of its coefficient
matrix A. As such, it is a subspace spanned by n independent vectors, where n = nullity(A). Theorem
2.6.7 tells us that the solution set to a consistent non-homogeneous system is a translation of this null space,
in other words, it is an n-flat.
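This “particular plus homogeneous” structure is easy to observe by machine. In the Python/SymPy sketch below (illustrative only), the matrix is the one from Example 2.6.5, and the right-hand side b is chosen here — an assumption made just for this sketch — so that (1, 1, 1, 1) is a particular solution.

    from sympy import Matrix, linsolve, symbols

    x1, x2, x3, x4 = symbols('x1 x2 x3 x4')

    A = Matrix([[1,  3, -3, 4],
                [2,  1,  4, 3],
                [3, -2, 13, 1]])
    x0 = Matrix([1, 1, 1, 1])
    b = A * x0                                  # so Ax = b is consistent with particular solution x0

    print(linsolve((A, b), x1, x2, x3, x4))     # the general solution, expressed with free variables x3, x4

    for xn in A.nullspace():                    # adding any solution of Ax = 0 gives another solution of Ax = b
        print(A * (x0 + xn) == b)               # True, True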

Exercises (Go to Solutions)


For Exercises 1-1, determine whether the homogeneous system has a nontrivial solution. If so, list all the
free variables, e.g., x2 , x4 , x5 .

 2x1 + 3x2 + 2x3 = 0


 2x + 5x + 7x = 0
1 2 3
1.

 4x 1 + 6x 2 + 4x 3 = 0

−2x1 + x2 + 8x3 = 0

For Exercises 2-8, find a spanning set for the null space for each of the following matrices. Answers may
vary.
     
 9 12 −9 3   2 −4 0   1 2 2 1 
     
2.  10 11 −4 2 

 ♠ 3. 
 −2 4 1 
 4. 
 2 1 −2 −2 

     
4 7 2 0 1 −2 3 1 −1 −4 −3
     
 1 2   −5 10 7 −30  4  2 −1 −4 10 10 
    ♠ 7.  
 −2 −4   −1 2 1 0 −8  1 0 −1 3 5
 
♠ 5.    
 
♠ 6.  −1
   
 3 6  2 2 2 −3   
1 0 2 −1 −2
   
     
4 8  4 −8 −3 0 29   
8.  2 1 −1 −2
 
   1 

1 −2 2 −1 −5  
4 2 −2 −4 2

For Exercises 9-13, describe all solutions to the matrix equation Ax = b. The matrix A will be listed first,
followed by the vector b. Answers may vary.
       
 2 −6 8 0   10   9 −4 −6   1 
       
 1 −3

4 1   −3 
   ♠ 10.  −1
 6 −16 ,  11 
 

9. 

, 
 

    
 0 0 0 1   −8  −1 −2 8 −5
   
   
−1 3 −4 1 −13
       
2 −2 0 2  4   3 − 4i 3+i 6 + 2i   12 + 14i 
♠ 12.  , 

    
3 4 −5 0   −4  2 i 2i −2 + 5i
   

♠ 11. 

,






 0 0 6 0 

 18 
 
   
−2 −1 4 0 8
   
 1 0 0 1 1 1   0 
   
♠ 13. 
 1 1 0 0 1 ,
0   1  (mod 2)
 
   
0 1 1 0 0 1 0

We have seen that solution sets to homogeneous systems, aka null spaces, are subspaces, that is, they
always contain the zero vector, are always closed under addition, and are always closed under scalar multipli-
cation. In particular, if one has a list of solutions to the homogeneous system, then any linear combination
of these vectors is likewise a solution to the system. What about solution sets of non-homogeneous systems?

After all, the solution set to a linear system is a flat and subspaces are just flats through the origin. Sadly,
such closure principles do not hold for general flats. Firstly, if the flat does not pass through the origin, then
it cannot be a subspace since it does not contain the zero vector. Similar problems arise with closure under
addition and scalars.

♠ 14. Construct an example of a non-homogeneous system such that the sum of two solutions is not a solution.
♠ 15. Construct an example of a non-homogeneous system such that the scalar multiple of a solution is not
a solution.
♠ 16. Although solution sets of non-homogeneous systems are not closed under vector addition or scalar
multiples, they are closed under lines, that is, if x0 and x1 are two distinct solutions to the non-
homogeneous system then any vector on the line

x = (1 − t)x0 + tx1 , t∈F

is likewise a solution to the system. Prove that solution sets are closed under lines.

17. We have already alluded to the fact that flats and affine spans are actually the same thing, despite being
defined differently. In this direction, prove that a solution set is closed under all affine combinations.

“It is through gratitude for the present moment that the spiritual dimension of life opens up.”
– Eckhart Tolle

2.7 Bases
Definition 2.7.1. A basis B of a vector space V is a linearly independent, spanning set of V . The size of
the basis, |B|, is called the dimension of V , denoted dim V .

Every line through the origin is a one-dimensional subspace and every plane through the origin is two-
dimensional. In general, a flat through the origin spanned by p independent vectors (aka, a subspace) has
dimension p. Consider for a moment the zero space {0}. This subspace has exactly two subsets {0} and ∅.
Vacuously, ∅ is linearly independent† . The set {0} is linearly dependent because it contains 0. Thus, ∅ is a
basis for {0}. Therefore, dim{0} = 0.

Example 2.7.2. For F n , the set E = {e1 , e2 , . . . , en }, where ei is the vector with a 1 in the ith component
and 0’s everywhere else, is always a basis. This is known as the standard basis of F n .
     
 1   1   1 

 

 
       3
On the other hand, the set B =   0   1   1  is a non-standard basis for F .
 ,   ,   

      

 

 0 0 1 

We mention some important properties of bases and dimension.


Theorem 2.7.3. Let B be a subset of the vector space V . Then the following are equivalent:
(i) B is a basis;
(ii) B is a maximal linearly independent set, that is, B is linearly independent but B ∪ {u} is linearly
dependent for any u ∈ V ;
(iii) B is a minimal spanning set of V , that is, V = Span(B) but V 6= Span(B r {v}) for any v ∈ B.
Theorem 2.7.4 (The Expansion Theorem). Let V be a vector space and let S ⊆ V be a linearly independent
subset. Then there exists a basis B of V such that S ⊆ B.
In other words, every linearly independent set can be expanded into a basis by including potentially new
vectors from V not already contained in S.
Theorem 2.7.5 (The Pruning Theorem). Let V be a vector space and let S ⊆ V be a spanning set of V .
Then there exists a basis B of V such that B ⊆ S.
In other words, every spanning set can be pruned down into a basis by removing unneeded vectors from
S.
Theorem 2.7.6. The dimension of a vector space V is well-defined, that is, if B and C are two bases for
V , then |B| = |C|.
Theorem 2.7.7. Let V be an n-dimensional vector space and let S ⊆ V .
(i) If |S| > n, then S is linearly dependent. But if S spans V , then there is a proper subset of S which is
a basis.
(ii) If |S| < n, then S does not span V . But if S is linearly independent, then there is a basis of V that
contains S as a proper subset.
Given any m × n matrix A, there are two fundamental subspaces†† associated to A: the column space
Col(A) which is the span of the column vectors of A (hence, a subspace of F m ) and the null space Nul(A)
which is the solution set to the homogeneous system Ax = 0 (hence, a subspace of F n ). The dimension of
Col(A) is called the rank‡ of A, and the dimension of Nul(A) is called the nullity of A.

Theorem 2.7.8. The pivot columns of A form a basis for the column space of A.
The location of the pivots (or to say, the absence of pivots) in the echelon form of A tells us which column
vectors can be expressed as a linear combination of the previous column vectors.

Example 2.7.9. Let

        [ 1 3 3 2 4 ]
    A = [ 2 7 6 3 9 ] .
        [ 1 2 3 3 3 ]

Construct a basis for Col(A).
We begin by computing an echelon form of A:
     
 1 3 3 2 4   1 3 3 2 4   1 3 3 2 4 
     
∼ 0
∼ 0
1 0 −1 .
−1 1 
 2 7 6 3 9   1 1 0
  
     
1 2 3 3 3 0 −1 0 1 −1 0 0 0 0 0

We see now that this matrix is in echelon form, from which we can easily identify the pivot columns, namely
the first and second. This indicates that the third, fourth, and fifth column vectors are all linear combinations
of the first two column vectors. Therefore, a basis of Col A is {(1, 2, 1), (3, 7, 2)} and rank(A) = 2.          □
 1 2 

 
 1 3 3 2 −9 
 
 −2 −2 2 −8 2 
 
Example 2.7.10. Let A = 

. Find a basis for the column space of A.

 2 3 0 7 1 
 
 
3 4 −1 11 −8
 
 1 0 −3 5 0 
 
 0 1 2 −1 0 
 
Since A ∼  
 , we see that the third and fourth column vectors are linear

 0 0 0 0 1 
 
 
0 0 0 0 0
     
1 3 −9

 


      


     

 
 −2   −2   2 
      
combinations of the first and second column vectors. Thus, the set 
,
 
,
 
 is a basis

 2   3   1 

 


    


      

 
3 4 −8 

 

for Col A, and the rank(A) = dim(Col(A)) = 3. 

Theorem 2.7.11. If two matrices A and B are row equivalent, then Nul A = Nul B.

Because of the previous theorem, one can use the row-reduced echelon form of A to find a basis for
Nul(A). This will be the same technique we applied in Example 2.6.5 to find a spanning set for Nul(A),
which was already a basis. The basis vectors will be derived from the non-pivot columns of A’s row-reduced
echelon form, that is, those columns which correspond to free variables of the linear system.

 
 1 3 3 2 4 
 
Example 2.7.12. Let A = 
 2 7 6 3 . Construct a basis for Nul(A).
9 
 
1 2 3 3 3
Remember that Nul A is the solution set to Ax = 0, which we now solve by row reducing the matrix A.
We might recognize that this is the same matrix from Example 2.7.12. As such, we have already seen that
 
 1 3 3 2 4 0 
 
[A|0]∼ 0  1 0 −1 1 0  .
 
0 0 0 0 0 0

From the echelon form, we can identify the pivot and non-pivot columns. Thus, we see that nullity(A) = 3. To
find a specific basis, it is to our advantage to continue reducing the matrix to row-reduced echelon form, as
shown below:

            [ 1 0 3  5 1 0 ]
    [A|0] ∼ [ 0 1 0 −1 1 0 ] .
            [ 0 0 0  0 0 0 ]

As we have seen many times, this augmented matrix corresponds to a linear system, namely
(
x1 + 3x3 + 5x4 + x5 = 0
x2 − x4 + x5 = 0.

In usual tradition, we solve for the two dependent variables, x1 , x2 , in terms of the three free variables, x3 ,
x4 , x5 . From this we see (
x1 = −3x3 − 5x4 − x5
x2 = x4 − x5 ,
which provides the parametric equations to the flat which is the solution set to Ax = 0. Converting these
parametric equations into a single vector equation, we see
         
    x = (x1 , x2 , x3 , x4 , x5 ) = (−3x3 − 5x4 − x5 , x4 − x5 , x3 , x4 , x5 )
      = x3 (−3, 0, 1, 0, 0) + x4 (−5, 1, 0, 1, 0) + x5 (−1, −1, 0, 0, 1) = x3 u + x4 v + x5 w.       (2.7.1)

Then Nul(A) = Span{u, v, w}. Thus, {u, v, w} is a spanning set for Nul(A). On the other hand, the
construction of {u, v, w} guarantees that they are linearly independent since

x3 u + x4 v + x5 w = 0

implies that x3 , x4 , and x5 are all zero. To see this, consider the 3rd, 4th, and 5th components of x in
(2.7.1). As u contains a 1 in the 3rd component and v and w both contain 0, any combination of u, v, and
w to form x must have the coefficient of u be x3 . The same is also true that the coefficients of v and w
must be x4 and x5 , respectively. Hence, if x were 0, then x3 = x4 = x5 = 0, proving the claim. Therefore,
{u, v, w} is a basis for Nul(A). 

Let us attempt to abbreviate the process given in the previous example (also in Example 2.6.5). First,
compute the RREF of the matrix A. The basis of Nul(A) will have exactly nullity(A) many vectors, each vector
corresponding to a free variable/non-pivot column. In those entries indexed by a free variable in the vectors
we are building, place a 1 or 0 according to whether the vector corresponds to the free variable from that
index. For example, in the previous example, the free variables were x3 , x4 , and x5 . Thus, we start building
three vector templates, in this same order, of the form

    (∗, ∗, 1, 0, 0),   (∗, ∗, 0, 1, 0),   (∗, ∗, 0, 0, 1),

where ∗ designates a yet unspecified scalar. To fill in the ∗’s, notice that their indices correspond to the
dependent variables/pivot columns. These pivots also correspond to pivot rows in the RREF of A, which
each contain a scalar in the non-pivot columns corresponding to free variables. From the first pivot row, record
the negative of the scalar you see in each free-variable column into the corresponding vector we are building.
For example, the first pivot row tells us to fill in the ∗’s as

    (−3, ∗, 1, 0, 0),   (−5, ∗, 0, 1, 0),   (−1, ∗, 0, 0, 1).

We have to switch the signs of the scalars as this is a consequence of moving the variables from the left-hand
side to the right-hand side of the parametric equations. Applying this same principle to Row 2, we get the
same basis for Nul(A), namely,

    (−3, 0, 1, 0, 0),   (−5, 1, 0, 1, 0),   (−1, −1, 0, 0, 1).
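The read-off procedure above is mechanical enough to automate. Here is a short Python/SymPy sketch (illustrative only; no software is assumed by the text) that builds the basis of Nul(A) for the matrix of Example 2.7.12 exactly as described, and confirms each vector solves Ax = 0.

    from sympy import Matrix

    A = Matrix([[1, 3, 3, 2, 4],
                [2, 7, 6, 3, 9],
                [1, 2, 3, 3, 3]])

    R, pivots = A.rref()                         # RREF and the pivot column indices (0, 1)
    free = [j for j in range(A.cols) if j not in pivots]

    basis = []
    for f in free:                               # one basis vector per free variable
        v = [0] * A.cols
        v[f] = 1                                 # 1 in the slot of "its own" free variable
        for i, p in enumerate(pivots):           # pivot row i has its pivot in column p
            v[p] = -R[i, f]                      # negate the scalar found in the free column
        basis.append(Matrix(v))

    print([list(v) for v in basis])              # [-3,0,1,0,0], [-5,1,0,1,0], [-1,-1,0,0,1]
    print(all((A * v) == Matrix([0, 0, 0]) for v in basis))   # True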

Example 2.7.13. Let

    A = [  1 −3  4  2 −2 ] ∼ [ 1 −3 4 0 −16/9 ] .
        [ −2  6 −8  5  3 ]   [ 0  0 0 1  −1/9 ]

Find a basis for Col(A) and Nul(A).

Since we have the RREF of A already, we can very quickly determine that Col(A) = Span{(1, −2), (2, 5)},
as a basis is formed by the 1st and 4th columns of A. Likewise, since the 2nd, 3rd, and 5th columns of A
are non-pivot columns, we see that a template for a basis of Nul(A) can be begun as

    (∗, 1, 0, ∗, 0),   (∗, 0, 1, ∗, 0),   (∗, 0, 0, ∗, 1).

Reading off the scalars in the first row tells us

    (3, 1, 0, ∗, 0),   (−4, 0, 1, ∗, 0),   (16/9, 0, 0, ∗, 1).

Reading off the scalars in the second row gives us a basis for Nul(A), namely

    (3, 1, 0, 0, 0),   (−4, 0, 1, 0, 0),   (16/9, 0, 0, 1/9, 1).

Now, one can interchange a vector in a spanning set with any non-zero multiple of that same vector without
changing the span. Likewise, linear independence is not affected by interchanging a non-zero multiple of a
vector. Thus, if we do not want the fractions in the last spanning vector (16/9, 0, 0, 1/9, 1) = (1/9)(16, 0, 0, 1, 9),
then we may substitute it with simply (16, 0, 0, 1, 9). Therefore,

    Nul(A) = Span{ (3, 1, 0, 0, 0), (−4, 0, 1, 0, 0), (16, 0, 0, 1, 9) }.                                    □

Exercises (Go to Solutions)


For Exercises 1-1, determine whether the statement is true or false. If false, correct the statement so that it is
true.

1. If A is an m × n matrix, then rank(A) + nullity(A) = m.


For Exercises 2-3, let S be the provided set of vectors. Let W = Span(S). Prune the spanning set S down
to find a basis for W . Answers may vary.
           



 1 21 6 




 0 0   1 




 
 
 
 





 
    

2.  2  ,  16  ,  12 
      3.  0  ,  0  ,  0 
     

     
 
      

 
 
 

 7 43 42   1 0 1 

For Exercises 4-19, let A be the provided matrix. Find a basis for Col(A) consisting of column vectors of A
and a basis for Nul(A). Find the rank and nullity of A. Answers may vary.
       
 1 15 8   1 1 2   1 1 1 0   5 4 2 1 
       
4. 
 0 9 6 
 5. 
 2 3 5 
 6. 
 2 3 4 2 
 7. 
 4 4 2 8 

       
0 0 2 4 2 4 3 1 3 1 2 5 6 3
     
8 −3 −13 15 17   −1 2
0  1  1 2 5 9 
♠ 8.  10. 

   
−2 1 3 −3 −3 ♠ 9. 
 7 −14 −7 −8 
 3 4 7 1
 
−3 6 3 2
     
 6 5 4 3 2 1   1 2 3 4 5   1+i 4−i 
     
 5 4 3 2 1 6   2 4 6 8 10   2 + 4i 4 − 2i
     

11. 


 12. 


 13. 



 2 4 6 8 2 4   3 6 9 12 15   6 − 10i 3+i 
     
     
0 1 0 1 2 6 2 −2 4 4 8 −8 − 2i 12 + 10i
   
 1 + 2i 2−i −1 0 1+i −3 + 3i   1 0 0 1 1 1 
   
♠ 14. 
 2 + 4i 4 − 2i 1 + 4i −2 2 + 5i −2 + 10i 
 ♠ 15. 
 0 1 0 1 1 (mod 2)
0 
   
3−i −1 − 3i 2 i 3−i 6 + 3i 0 0 1 1 0 1
     
 1 0 0 1 1 0   2

1 1 1 
  6 2 1 0 
   
16. 
 0 1 1 1 1 1 (mod 2)  4

2 0 3 
 18. 
 1 5 6 (mod 7)
6 
♠ 17. 
 (mod 5)
     
1 0 1 0 1 0  2
 1 3 0 
 3 6 3 2
 
0 0 1 2

 
 1 2 6 8 9 3 
 
19. 
 1 3 0 0 1  (mod 11)
2 
 
1 4 2 8 7 9

20. We say that a vector v = (v1, v2, . . . , vn) ∈ R^n is strongly positive if vi > 0 for all i. Let W ≤ R^n
be a subspace that contains a strongly positive vector. Prove that W has a basis consisting of only
strongly positive vectors.

† There are no dependence relations on the vectors in ∅!


†† There are four fundamental subspaces associated to a matrix in total. Two other fundamental subspaces, namely the Row
Space and Left Null Space, will be defined later.
‡ These re-definitions of rank and nullity may at first appear to be in conflict with Definition 1.5.5 and Definition 2.6.3, which were defined by counting pivots. Theorem 2.7.8 and the technique developed after Theorem 2.7.11 show that these notions, in fact, coincide.

“If you want to get each individual’s honest opinion, you don’t want that opinion to be influenced by others
who are present, much less allow a group to coordinate what they are going to say.” – Thomas Sowell

2.8 Coordinates
Let V be a vector space with basis B = {v 1 , v 2 , . . . , v n }. Let x ∈ V . Since B is a spanning set for V , there
exists scalars c1 , c2 , . . . , cn ∈ F such that

x = c1 v 1 + c2 v 2 + . . . + cn v n .

Suppose that x can be expressed as a linear combination of B in another way, say

x = d1 v 1 + d2 v 2 + . . . + dn v n ,

for d1 , d2 , . . . , dn ∈ F . Then

0 = x−x = (c1 v 1 +c2 v 2 +. . .+cn v n )−(d1 v 1 +d2 v 2 +. . .+dn v n ) = (c1 −d1 )v 1 +(c2 −d2 )v 2 +. . .+(cn −dn )v n .

But B is linearly independent, which implies that each ci − di = 0, that is, ci = di . Therefore, each element
of V can be expressed uniquely as a linear combination of B.

Definition 2.8.1. Suppose the set B = {v1, v2, . . . , vn} is a basis for vector space V. For each x ∈ V, the coordinates of x relative to the basis B are the unique coefficients c1, . . . , cn ∈ F such that

x = c1 v1 + c2 v2 + . . . + cn vn.

The vector [x]_B = (c1, . . . , cn) ∈ F^n is called the coordinate vector of x relative to B or the B-coordinate vector of x.

Example 2.8.2. Let v1 = (1, 0, 1), v2 = (5, 2, 3), x = (3, 2, 1), and B = {v1, v2}. Then B is a basis for V = Span{v1, v2} because B is linearly independent. Determine if x ∈ V, and if it is, find the coordinate vector of x relative to B.

If x ∈ V, then there exist c1, c2 ∈ R such that c1 v1 + c2 v2 = x. We row reduce the corresponding augmented matrix:

[ 1 5 | 3 ; 0 2 | 2 ; 1 3 | 1 ] ~ [ 1 0 | -2 ; 0 1 | 1 ; 0 0 | 0 ].

From this we see that x ∈ V and

(3, 2, 1) = -2(1, 0, 1) + (5, 2, 3).

Therefore, [x]_B = (-2, 1). The basis B determines a "coordinate system" for the plane spanned by v1, v2. □
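As a computational aside, the coordinate vector of Example 2.8.2 can be recovered by row reducing the same augmented matrix with software. The sketch below assumes the SymPy library, which is not part of this text.

    import sympy as sp

    v1 = sp.Matrix([1, 0, 1])
    v2 = sp.Matrix([5, 2, 3])
    x  = sp.Matrix([3, 2, 1])
    aug = sp.Matrix.hstack(v1, v2, x)   # the augmented matrix [v1 v2 | x]
    print(aug.rref()[0])                # the last column of the RREF holds [x]_B = (-2, 1)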

In the previous example, even though the vectors in V are vectors in R^3, they are completely determined by their coordinate vectors, which belong to R^2. Thus, there is a natural identification between the vectors of V and the vectors in R^2, namely x 7→ [x]_B. This mapping is a one-to-one, onto linear transformation. Essentially, this means that the two spaces look the same. In this example, V and R^2 are geometrically the same as they are both planes. We even write that V ≅ R^2, and we say that V is congruent to (or isomorphic to) R^2.

Example 2.8.3. Consider two bases B = {b1 , b2 } and C = {c1 , c2 } for a vector space V , such that

c1 = 2b1 − 3b2 and c2 = −3b1 + 5b2 .

Suppose further that


x = c1 + 3c2 .
Compute [x]B .

We already know that

[c1]_B = (2, -3),  [c2]_B = (-3, 5),  and  [x]_C = (1, 3).

Thus,

[x]_B = [c1 + 3c2]_B = [c1]_B + 3[c2]_B = (2, -3) + 3(-3, 5) = (-7, 12).

On the other hand, the vector equation

[x]_B = [c1]_B + 3[c2]_B

can be rewritten as a matrix equation

[x]_B = [ [c1]_B  [c2]_B ] [x]_C  ⇒  (-7, 12) = [ 2 -3 ; -3 5 ] (1, 3).

Theorem 2.8.4. Let B = {b1, . . . , bn} and C = {c1, . . . , cn} be bases of a vector space V. Then there is a unique n × n matrix P_{B←C}, called the change-of-basis matrix to B from C, such that

[x]_B = P_{B←C} [x]_C.

The columns of P_{B←C} are the B-coordinate vectors of the elements of C, that is,

P_{B←C} = [ [c1]_B  [c2]_B  . . .  [cn]_B ].

Multiplication by P_{B←C} converts C-coordinate vectors to B-coordinate vectors. To change coordinates between two bases, we need the coordinates of the old basis in terms of the new basis.

Example 2.8.5. Let b1 = (3, 2), b2 = (4, 3), c1 = (1, 2), c2 = (5, 1), B = {b1, b2} and C = {c1, c2}. Then both B and C are bases of R^2. Find the change-of-basis matrix P_{B←C}.

We need to compute the coordinate vectors [c1]_B and [c2]_B. We compute the first coordinate vector by solving the linear system

[ b1 b2 | c1 ] = [ 3 4 | 1 ; 2 3 | 2 ] ~ [ 1 0 | -5 ; 0 1 | 4 ]  ⇒  [c1]_B = (-5, 4).

Using the same row operations, we see that

[ b1 b2 | c2 ] = [ 3 4 | 5 ; 2 3 | 1 ] ~ [ 1 0 | 11 ; 0 1 | -7 ]  ⇒  [c2]_B = (11, -7).

This gives

P_{B←C} = [ -5 11 ; 4 -7 ]. □
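When B is a basis of F^n, the two row reductions above amount to solving the matrix equation BP = C, where B and C also denote the matrices whose columns are the basis vectors. A quick numerical check of Example 2.8.5 (assuming NumPy, which is not part of this text) might look like the following.

    import numpy as np

    B = np.column_stack([[3, 2], [4, 3]])   # columns b1, b2
    C = np.column_stack([[1, 2], [5, 1]])   # columns c1, c2
    P = np.linalg.solve(B, C)               # solves B @ P = C, so P is the matrix P_{B<-C}
    print(P)                                # [[-5. 11.], [ 4. -7.]]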

Let B and C be two bases of a vector space V. Then, mimicking the previous example, we can see that

[ B C ] ~ [ E  P_{B←C} ],    (2.8.1)

where E denotes the standard basis for F^n. We should mention that the echelon form of B might contain rows of zeros. Thus, the left-hand side of this row-reduced echelon form really should be of the form [ E ; 0 ], where 0 denotes some matrix consisting of only zeros. If, in fact, B and C do not span the same vector space, then there would instead be a nonzero entry on the right-hand side of one of these rows of zeros, indicating an inconsistent linear system. When the linear system is consistent, though, B and C truly were bases for the same span, and each of these rows of zeros is accompanied by a row of zeros on the right-hand side. Thus, (2.8.1) more properly should be

[ B C ] ~ [ E  P_{B←C} ; 0  0 ].

Example 2.8.6. Let b1 = (1, 1, -3, 0), b2 = (1, 0, -2, 4), b3 = (3, 0, 0, -2), c1 = (6, 3, -21, 26), c2 = (15, 5, -23, 12), c3 = (3, 2, -8, 4), B = {b1, b2, b3} and C = {c1, c2, c3}. Then both B and C are bases for the same subspace of R^4.

(a) Find the change-of-basis matrix to B from C.

[ B C ] = [ 1 1 3 6 15 3 ; 1 0 0 3 5 2 ; -3 -2 0 -21 -23 -8 ; 0 4 -2 26 12 4 ] ~ [ 1 0 0 3 5 2 ; 0 1 0 6 4 1 ; 0 0 1 -1 2 0 ; 0 0 0 0 0 0 ].

Thus, P_{B←C} = [ 3 5 2 ; 6 4 1 ; -1 2 0 ].

(b) If [x]_C = (2, -3, 4), then compute [x]_B.

[x]_B = P_{B←C} [x]_C = [ 3 5 2 ; 6 4 1 ; -1 2 0 ](2, -3, 4) = (-1, 4, -8).

Note that

x = -b1 + 4b2 - 8b3 = 2c1 - 3c2 + 4c3 = (-21, -1, -5, 32).
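Since B and C here are not square matrices, the change-of-basis matrix must be read off from the row reduction (2.8.1) rather than from a matrix inverse. A SymPy sketch of Example 2.8.6 is below; SymPy is an assumption of this aside and not part of the text.

    import sympy as sp

    B = sp.Matrix([[1, 1, 3], [1, 0, 0], [-3, -2, 0], [0, 4, -2]])
    C = sp.Matrix([[6, 15, 3], [3, 5, 2], [-21, -23, -8], [26, 12, 4]])
    R, _ = sp.Matrix.hstack(B, C).rref()    # [B C] ~ [E P ; 0 0]
    P = R[:3, 3:]                           # the block to the right of the identity
    print(P)                                # Matrix([[3, 5, 2], [6, 4, 1], [-1, 2, 0]])
    print(P * sp.Matrix([2, -3, 4]))        # [x]_B = (-1, 4, -8)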

Exercises (Go to Solutions)


For Exercises 1-3, find the coordinate vector [x]B given the basis B, where x is the vector provided first and
B is the set of vectors provided second.
             
4 1 3 −1 2 1 4

 
 
 

         
       
         
       
♠ 1.  −25 ,  2  ,  −5 
 
22  5   4   0 
               
 
   


  

 ♠ 2. 

, 
 
,
 
,
 


6  3 7   3    −3   3   0  
  

      

  
       

2 2 0 1 

 

     
−2 − 3i  1 1 + 2i
 

♠ 3.  , ,
     
 
−5 + 6i  i 2 − 3i 
 

For Exercises 4-7, given the coordinate vector [x]C and the change-of-basis matrix P , compute the coor-
B←C
dinate vector [x]B , where [x]C is the vector provided first and P is the matrix provided second.
B←C
           
1
   2 −2 1   1   3 −2 −1 0   1   6 7 2 
           
♠ 4. 
 2 ,  3
  0 2   2   3
  
2 2 −1 ♠ 6. 

,  1 3 4
2    (mod 11)
♠ 5. 
 ,    
           
3 5 6 −2  3   0
   0 8 2 
 3 5 5 1
   
4 −1 −2 2 1

   
 1   5 − 6i −3 − i 8 + 7i 
   
 2 ,  3 + i
7.    2+i 1+i  
   
3 4 + 2i −9 − 6i 3 − i

For Exercises 8-13, compute the change-of-basis matrix P for the bases B and C.
B←C
           



 1 −1   3





 4 −5 −12 




 
   

 
   
  
 
 


♠ 8. B = 
 2  ,  −2  ,  7
     , C =  8  ,  −12  ,  −28 
      

      
      

 
 
 

 3 4 −5   5 20 27 
               
1 −5 1 −1 −52 −20 −6 3

 
 
 


       
 
       


       
 
       

   
 2   0   1   −1 
        
 1  
 
2   −3   5 
    
♠ 9. B = 

,
 
, ,
   
 , C = 
 
,
 
,
 
,
  



 3   0   2   −2 
 
 −1   2   −9   3 

       
 
       



       
 
       

 4

7 2 5

 
 72 88 29 52 

               
2 0 0   1 −15 −15 4 2

 
 
 


     
 
        


       
 
        

   

 0   0   1   2
       
  35   25   10   −8 
        
10. B = 

,
 
,
  ,
 
 ,C= 
 
,
 
,
 
,
 


 0   0   1   4 1    −6   4 
3 

        
      
 
 
    

       
 
        

   
 0 1 0 −1 0 4 0 4 
 
 
 
       
  1−i 
 1 + 2i
 4i   7 + 6i 
 
 
 

♠ 11. B =  ,  ,C=  ,


 2 − 3i

1 + 4i 
  1

i


           



 1 2 3 




 2 2   0 




 
 
 
  

 

 
    

 3  ,  4  ,  4 , C ≡  0
12. B ≡        , 4 , 2 
     (mod 5)

            

 
 
 

 3 4 0   2 3 4 
           



 0 2 1 




 0 2   1 




 
 
 
  

 

 
    

 1  ,  2  ,  0 , C ≡  3
13. B ≡        , 1 , 2 
     (mod 5)

     
       

  
 

 0 1 1   1 3 1 
Chapter 3

The Algebra and Geometry of Matrices

Up to this point, we have used matrices for one purpose: to encode information about linear systems. We first introduced the augmented matrix in Section 1.5 to do exactly that. In Chapter 2 we started using matrices to represent a set of vectors in a slightly more compact way. This happened in Section 2.2, where we introduced the matrix-vector product so that the matrix equation (2.2.1) could encode the vector equation (2.1.1). This allowed us to define linear transformations using matrices.

But we also introduced the column and null spaces of a matrix. These vector spaces came about essentially by solving problems about spanning and linear independence of vectors, but we attributed the spaces to the matrix, not a collection of vectors. This was our first inkling that matrices deserve to be studied in their own right, not just as tools to better understand vectors. In fact, we will turn this paradigm on its head. Viewing column vectors just as n × 1 matrices, all we have studied about vectors can actually be viewed as a subset of the theory of matrices. We can add and scale matrices just like column vectors. For this reason, we can actually view matrices as beefier versions of column vectors, which can lead to discussions about vector spaces of matrices. But matrices can transcend the vector operations, as we will introduce a general matrix multiplication, which will be the foundation for this chapter on matrices.


“The thing that scares us the most is when familiar things operate in unfamiliar ways.” – Noah Hawley

3.1 Matrix Operations

We say that two m × n matrices A = [a_ij] and B = [b_ij] are equal if a_ij = b_ij for all i and j (where a_ij, b_ij ∈ F). We can add matrices term-wise, that is,

A + B = [a_ij + b_ij].

Finally, we can multiply matrices by a scalar term-wise, that is,

cA = [c a_ij] for all c ∈ F.

Example 3.1.1. Let A = [ 3 9 1 ; -2 4 6 ], B = [ 0 5 6 ; 3 1 1 ], and C = [ 2 1 ; 0 4 ]. Then

A + B = [ 3+0 9+5 1+6 ; -2+3 4+1 6+1 ] = [ 3 14 7 ; 1 5 7 ].

Now, A + C is not possible since the matrices have different sizes.

Next,

2B = 2[ 0 5 6 ; 3 1 1 ] = [ 0 10 12 ; 6 2 2 ],

and

A - 2B = [ 3 9 1 ; -2 4 6 ] - [ 0 10 12 ; 6 2 2 ] = [ 3 -1 -11 ; -8 2 4 ].
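Matrix addition and scalar multiplication are implemented entrywise in most software. A brief illustration of Example 3.1.1 follows; NumPy is an assumption of this aside and not part of the text.

    import numpy as np

    A = np.array([[3, 9, 1], [-2, 4, 6]])
    B = np.array([[0, 5, 6], [3, 1, 1]])
    C = np.array([[2, 1], [0, 4]])
    print(A + B)        # entrywise sum
    print(A - 2 * B)    # a scalar multiple combined with a difference
    # A + C is rejected: the shapes (2, 3) and (2, 2) do not match, so NumPy raises an error.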

Because we can add and scale matrices, we can actually view matrices as vectors themselves. Let F be a
field. Then F m×n will denote the set of m × n matrices with entries from F , which is a vector space. This
means that addition of matrices and multiplication of scalars follow the eight algebraic properties listed in
Definition 1.3.1. When working with matrices, we will let 0 denote an m × n matrix with all entries equal
to zero. This is called the zero matrix.

Let Eij be the matrix whose entry in the (i, j)th entry is a one and all other entries are zero. Then E =
{E1,1 , E1,2 , . . . , E1,n , E2,1 , E2,2 , . . . , E2,n , . . . , Em,1 , Em,2 , . . . , Em,n } forms the standard basis of F m×n .
The matrices Ei,j are often called the unit matrices. Using coordinate vectors, we see that F m×n ∼ = F mn ,
since |E| = mn.

 
Definition 3.1.2. The diagonal entries in an m × n matrix A = aij are the entries a11 , a22 , a33 , . . .,
that is, the entries aii , and they form the main diagonal.

If A is an n × n matrix, then we say that A is a square matrix.

Let In denote the n × n matrix whose entries are 1’s across the diagonal and 0’s everywhere else. This
matrix is known as the identity matrix.

Definition 3.1.3. If A is an m × n matrix and B is an n × p matrix with column vectors


 
B = b1 b2 . . . bp ,

then the matrix product


   
AB = A b1 b2 ... bp = Ab1 Ab2 ... Abp .

The matrix AB is an m × p matrix.

If A is an n × n matrix, then A^k = A A · · · A (k factors). We let A^0 = In.

Matrix multiplication can also be defined with the seemingly complicated formula

AB = [ Σ_{k=1}^{n} a_ik b_kj ] = [ a_i1 b_1j + a_i2 b_2j + . . . + a_in b_nj ],  for A = [a_ik] and B = [b_kj],

which is none other than the "finger-multiplication" we learned with the matrix-vector product.

Example 3.1.4. Compute AB with the given matrices A = [ 2 1 -1 ; 0 4 -2 ] and B = [ 9 -5 -3 ; 3 9 1 ; -2 4 6 ].

AB = [ 2(9)+1(3)-1(-2)  2(-5)+1(9)-1(4)  2(-3)+1(1)-1(6) ; 0(9)+4(3)-2(-2)  0(-5)+4(9)-2(4)  0(-3)+4(1)-2(6) ]
   = [ 18+3+2  -10+9-4  -6+1-6 ; 0+12+4  0+36-8  0+4-12 ] = [ 23 -5 -11 ; 16 28 -8 ],

which is a 2 × 3 matrix. □
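The same product can be checked numerically; in Python the @ operator performs exactly the row-by-column multiplication described above. NumPy is assumed here and is not part of the text.

    import numpy as np

    A = np.array([[2, 1, -1], [0, 4, -2]])
    B = np.array([[9, -5, -3], [3, 9, 1], [-2, 4, 6]])
    print(A @ B)        # a 2 x 3 matrix: [[23, -5, -11], [16, 28, -8]]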

Definition 3.1.5. If A is a square matrix, say n × n, and if

p(x) = a0 + a1 x + a2 x2 + . . . + am xm

is a degree m polynomial, then we define the n × n matrix p(A) to be

p(A) = a0 In + a1 A + a2 A2 + . . . + am Am .

An expression of this form is called a matrix polynomial in A.

Example 3.1.6. Find p(A) for p(x) = x^2 + 3x + 2 and A = [ 4 2 ; 0 3 ] over Z5.

p(A) ≡ A^2 + 3A + 2I ≡ [ 4 2 ; 0 3 ]^2 + 3[ 4 2 ; 0 3 ] + 2[ 1 0 ; 0 1 ]
     ≡ [ 1 4 ; 0 4 ] + [ 2 1 ; 0 4 ] + [ 2 0 ; 0 2 ] ≡ [ 0 0 ; 0 0 ]  (mod 5). □
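Matrix polynomials over Z5 can be evaluated by performing the integer arithmetic first and reducing modulo 5 afterwards. A sketch of Example 3.1.6, assuming NumPy (not part of the text):

    import numpy as np

    A = np.array([[4, 2], [0, 3]])
    I = np.eye(2, dtype=int)
    pA = (A @ A + 3 * A + 2 * I) % 5    # evaluate p(A) = A^2 + 3A + 2I, then reduce mod 5
    print(pA)                           # [[0 0], [0 0]]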

 
Definition 3.1.7. Let A = aij be an m × n matrix. Then the transpose of A, denoted A> , is the
 
n × m matrix given by A> = aji , that is, the matrix whose columns are formed from the corresponding
rows of A.

Example 3.1.8. Let

A = [ 1 2 3 ; 4 5 6 ],  B = [ 1 1 1 ; 3 5 7 ],  and  C = [ 2 -3 ; 0 1 ].

Then

A^T = [ 1 4 ; 2 5 ; 3 6 ],  B^T = [ 1 3 ; 1 5 ; 1 7 ],  and  C^T = [ 2 0 ; -3 1 ].


In the case of C-matrices, an alternative to transposes is preferred for reasons that will be explained in
Chapter 4.

Definition 3.1.9. Let A be an m × n complex matrix. Then we define A∗ = (A)> , which is called the
conjugate transpose. This replaces the role of transposes in complex space.

When discussing transposes of matrices, whenever the matrices are complex, the conjugate transpose should ALWAYS be used instead of the standard transpose.

Example 3.1.10. Let A = [ 1-2i 3+5i 6 ; -2i 0 i ] and B = [ 2-3i 0 2i ; -i 4 1+2i ; 0 2-2i 6 ].
Note that

A* = [ 1+2i 2i ; 3-5i 0 ; 6 -i ],  B* = [ 2+3i i 0 ; 0 4 2+2i ; -2i 1-2i 6 ].
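In code, the transpose and the conjugate transpose are distinct operations, and confusing them is a common source of errors with complex matrices. A short illustration using the matrices of Examples 3.1.8 and 3.1.10; NumPy is assumed and not part of the text.

    import numpy as np

    A = np.array([[1, 2, 3], [4, 5, 6]])
    print(A.T)                      # transpose: columns come from the rows of A

    Z = np.array([[1 - 2j, 3 + 5j, 6], [-2j, 0, 1j]])
    print(Z.conj().T)               # conjugate transpose Z*, the correct notion for complex matrices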

Definition 3.1.11. If A is a square matrix, then the trace of A, denoted tr(A), is the sum of the diagonal entries of A.

Example 3.1.12. Let A = [ 2 0 ; 1 4 ] and B = [ 9 3 -2 3 ; 28 9 4 0 ; 16 1 6 3 ; 2 5 2 1 ]. Then

tr(A) = 2 + 4 = 6,  tr(B) = 9 + 9 + 6 + 1 = 25. □

Exercises (Go to Solutions)


For Exercises 1-6, determine whether the statement is true or false. If false, correct the statement so that it is
true.
1. We say that two m × n matrices A = [aij ] and B = [bij ] are equal if aij = bij for all i and j.
2. If A is a square matrix, then the trace of A is the sum of the column entries of A.
3. We can multiply matrices by a scalar term-wise.
4. The identity matrix is an n × n matrix whose entries are 1’s across the main diagonal and zeros
everywhere else.
5. If A is an n × n matrix, then we say that A is a square matrix.
6. The transpose of a matrix A is a matrix whose columns are formed from the corresponding rows of A.

7. Write the matrix E1,3 ∈ F 3×3 .


For Exercises 8-10, using the matrices listed below, explain why the operation is not possible.
 
      1 6 4
 2 3   1 1   2 7 5 
 
 
A= , B =  , C =  , D =  5 9 2 

.
1 4 3 6 1 5 4  
9 3 3

8. CA 9. ABD 10. tr(C)

For Exercises 11-20, using the matrices listed below, perform the matrix calculation:
 
     
 2 3 −2 
 1 2 3  0 5 6   3 0 −1 2   
 −5 0 7 
  
     
A=
 4 −3 ,
0  B=
 −5 −5 ,
2  C=
 1 7 ,
8 −3  D=

.

       0
 −2 4 

1 −2 −1 2 0 −3 3 3 2 −4  
1 2 3

♠ 11. 2A + B > ♠ 12. 4A − 3B ♠ 13. AC ♠ 14. A> ♠ 15. C > ♠ 16. D>

♠ 17. tr(A) ♠ 18. tr(B) ♠ 19. DB > ♠ 20. A2 + 2A − 5I3

For Exercises 21-24, using the matrices listed below, perform the matrix calculation over the field R:
 
  5 1  
 −1 0 2   −2 0 
 
 
A= , B =   2 2 , C = 
 .
1 3 5   3 −2
3 4

21. 2A> + B 22. AB 23. BC 24. ABC

For Exercises 25-33, using the matrices listed below, perform the matrix calculation over the field C:
 
    3 0  
 1 + 2i 1 − 3i   0 3 − 4i   1 0 i 1 − 2i 
 
 
A= , B =  , C =   i 2i  , D =  .
5 1−i 3i 1 + i   5 4+i 0 −4
3 + i 1 + 3i

♠ 25. (1 + i)A − 3B ♠ 26. CA ♠ 27. A∗ ♠ 28. C ∗ ♠ 29. D∗

♠ 30. tr(A) ♠ 31. tr(B) ♠ 32. (BD)∗ ♠ 33. A2 + I2

For Exercises 34-42, using the matrices listed below, perform the matrix calculation over the field Z5 :
     
 1 2 3 4   0 1 1 2   2 3 3
 

 3 0 4 2 
     
 0 2 2 3   2 2 3 0   0 0 2
     
  
A≡ 
, B ≡ 
 
, C ≡  1
  2 3 2 , D ≡ 



 (mod 5).
 1 2 2 1   1 2 3 2     0 3 4 
     
    3 3 2 1  
4 4 3 2 4 3 4 2 1 2 3

♠ 34. 2A − 3B ♠ 35. CA ♠ 36. A> ♠ 37. C > ♠ 38. D>

♠ 39. tr(A) ♠ 40. tr(B) ♠ 41. D> B ♠ 42. A2 + A − I4

For Exercises 43-45, for the matrix A provided, find tr(A), A> , and tr(A> ).
     
 8 −7 2   1/8

−7/8 2/5 −12/13 

 5

1 6 2 

 
43. 
 0 −2 4 
  9/8

−1/2 1/4 1/4 

 3

0 3 4 

  44. 


 45. 

 (mod 7)

3 2 1  1 −12/15 2 1/5   5 2 1 1 
   
   
1/4 8/15 −3/8 3/8 2 3 0 0

For Exercise 46, for the matrix A provided, find tr(A), A∗ , and tr(A∗ ).
 
 1 − 3i −5 − 5i 2 + 2i 
 
46. A =  2 − 4i 8i 6+i  
 
−1 + 2i 3 − 6i 2 + 4i

“When you put your hand to the plow, you can’t put it down until you get to the end of the row.”
– Alice Paul

3.2 Matrix Properties


We mentioned in the previous section that F^{m×n} is a vector space, meaning that matrix addition and scalar multiplication satisfy the eight properties listed in Definition 1.3.1. Transposes and traces also follow very nice algebraic properties.

Theorem 3.2.1. Let A and B be matrices with sizes such that the indicated sums and products are defined.
Let c ∈ F .

(i) (A + B)^T = A^T + B^T

(ii) (cA)^T = cA^T

(iii) (A^T)^T = A

(iv) (AB)^T = B^T A^T

We see by the first two properties that transposition is a linear transformation > : F m×n → F n×m . Note
the last property states that the transpose of a product of matrices equals the product of their transposes in
the reverse order! This is called the Shoe-Sock Principle † . As we will see momentarily (see Theorem 3.2.4),
this order matters a lot.

Theorem 3.2.2. Let A and B be n × n matrices. Let c ∈ F .

(i) tr(A + B) = tr(A) + tr(B)

(ii) tr(cA) = c tr(A)

(iii) tr(A^T) = tr(A)

(iv) tr(AB) = tr(BA)

We see by the first two properties that the trace is a linear transformation tr : F^{n×n} → F. It should be mentioned that the last property does NOT say that tr(AB) = tr(A) tr(B). This is, in fact, very false. Also, this property does NOT say that AB = BA, only that tr(AB) = tr(BA). As hinted already, we will soon see that AB ≠ BA in general.

Unfortunately, matrix multiplication is not as well behaved as the multiplication of real numbers we are used to. Still, many of the typical algebraic properties do hold.

Theorem 3.2.3. Let A, B, and C be matrices with sizes such that the indicated sums and products are
defined. Let c ∈ F .

(i) A(BC) = (AB)C

(ii) A(B + C) = AB + AC

(iii) (A + B)C = AC + BC

(iv) c(AB) = (cA)B = A(cB)

(v) If A is an m × n matrix, then Im A = A = A In.

Please be aware that certain multiplicative properties in the previous list are omitted. This is quite
intentional because many algebraic properties that we take for granted with real multiplication do NOT
hold for matrix multiplication.

Theorem 3.2.4. Let A, B, and C be matrices with sizes such that the indicated sums and products are
defined.

(i) You CANNOT assume that AB = BA.

(ii) You CANNOT assume that if AB = AC, then B = C.

(iii) You CANNOT assume that if AB = 0, then A = 0 or B = 0.

Example 3.2.5. Let A = [ 1 2 ; -1 3 ] and B = [ 0 -3 ; 1 1 ]. Then

AB = [ 1 2 ; -1 3 ][ 0 -3 ; 1 1 ] = [ 2 -1 ; 3 6 ]  and  BA = [ 0 -3 ; 1 1 ][ 1 2 ; -1 3 ] = [ 3 -9 ; 0 5 ].

Therefore, AB ≠ BA. □

Example 3.2.6. Let A = [ 0 2 ; 0 -1 ], B = [ 1 2 ; 3 4 ], and C = [ 5 6 ; 3 4 ]. Then

AB = [ 0(1)+2(3) 0(2)+2(4) ; 0(1)-(3) 0(2)-(4) ] = [ 6 8 ; -3 -4 ]  and  AC = [ 0(5)+2(3) 0(6)+2(4) ; 0(5)-(3) 0(6)-(4) ] = [ 6 8 ; -3 -4 ].

Therefore, AB = AC but B ≠ C. In other words, we cannot divide both sides by A. □

Example 3.2.7. Let A = [ 0 2 ; 0 -1 ] and B = [ 4 2 ; 0 0 ]. Then

AB = [ 0(4)+2(0) 0(2)+2(0) ; 0(4)-(0) 0(2)-(0) ] = [ 0 0 ; 0 0 ],

which is the zero matrix. Therefore, AB = 0 but A ≠ 0 and B ≠ 0. In other words, the zero product property does NOT hold for matrices. □
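These failures are easy to reproduce numerically, which can be a useful sanity check whenever one is tempted to "cancel" a matrix. A sketch using the matrices of Examples 3.2.5 and 3.2.7; NumPy is assumed and not part of the text.

    import numpy as np

    A = np.array([[1, 2], [-1, 3]])
    B = np.array([[0, -3], [1, 1]])
    print(A @ B)                    # [[2, -1], [3, 6]]
    print(B @ A)                    # [[3, -9], [0, 5]]  -- so AB != BA

    A2 = np.array([[0, 2], [0, -1]])
    B2 = np.array([[4, 2], [0, 0]])
    print(A2 @ B2)                  # the zero matrix, even though neither factor is zero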

Definition 3.2.8. Let A be an m × n matrix. Define the row space of A, denoted Row A, as the column
space of A> , that is, Row A = Col A> .

The dimension of Row A is called the corank, denoted corank(A) = p, where p is the number of pivots in A.

Note that the corank is not really a new quantity: corank(A) = dim(Row A) = the number of pivot positions = rank(A) = dim(Col A).

For another way to describe the row space of A, note that each row of A contains n entries and can be
identified with a vector in F n . Then Row A is the span of the rows of A under this identification.

Theorem 3.2.9. Two matrices A and B are row equivalent if and only if Row A = Row B.

Theorem 3.2.10. If U is an echelon form of the matrix A, then nonzero rows of U form a basis for the
row space of A.

Much like the column space of A, the pivot rows of A also form a basis of Row(A). But unlike the column space, a basis can also be read directly from an echelon form: the nonzero rows of U form a basis of Row(A). This is not true for the column space, that is, the pivot columns of U do NOT form a basis for Col(A); there we must return to the pivot columns of A itself. This is because row equivalent matrices need not have the same column space. Only the row space is guaranteed to be equal.

Example 3.2.11. Let A = [ 1 3 2 4 2 ; 2 6 4 8 4 ; 3 2 5 2 1 ; 4 2 5 1 0 ]. Compute a basis for Row A, Col A, and Nul A.

Since

A = [ 1 3 2 4 2 ; 2 6 4 8 4 ; 3 2 5 2 1 ; 4 2 5 1 0 ] ~ [ 1 0 0 -1 -1 ; 0 1 0 15/11 7/11 ; 0 0 1 5/11 6/11 ; 0 0 0 0 0 ].

Therefore,
{(1, 0, 0, −1, −1), (0, 1, 0, 15/11, 7/11), (0, 0, 1, 5/11, 6/11)}
is a basis for Row A. Hence, corank(A) = 3. If a basis without fractions is desired, those vectors can be
scaled by 11 without adjusting the span or the linear independence. Thus,

{(1, 0, 0, −1, −1), (0, 11, 0, 15, 7), (0, 0, 11, 5, 6)}

is another basis of Row A.

We also see that

{ (1, 2, 3, 4), (3, 6, 2, 2), (2, 4, 5, 5) }

is a basis for Col A. Hence, rank(A) = 3. Likewise,

{ (11, -15, -5, 11, 0), (11, -7, -6, 0, 11) }

is a basis for Nul A. Hence, nullity(A) = 2. □

Alternatively, one could find a basis for Row A by finding a basis for Col A^T, as we did before. This would, in fact, provide a basis consisting of actual rows of A. The method above has the advantage (other than providing simpler vectors) that we can find bases of all these fundamental spaces of A simultaneously.
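All of the computations in Example 3.2.11 can be reproduced with SymPy, which works with exact rational entries (SymPy is assumed here and is not part of the text). Note that columnspace() returns pivot columns of A itself, in line with the warning above.

    import sympy as sp

    A = sp.Matrix([[1, 3, 2, 4, 2],
                   [2, 6, 4, 8, 4],
                   [3, 2, 5, 2, 1],
                   [4, 2, 5, 1, 0]])
    print(A.rref()[0])        # nonzero rows give a basis of Row(A)
    print(A.columnspace())    # pivot columns of A: a basis of Col(A)
    print(A.nullspace())      # a basis of Nul(A)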

Exercises (Go to Solutions)


1. Rewrite the eight axioms of a vector space, listed in Definition 1.3.1, as properties of m × n matrices.

   
 3 2 1   1 3 2 
   
2. Verify Theorem 3.2.4 (i) using A = 
 1 5 , B =  0
0   1 .
4 
   
4 1 2 2 1 7

For Exercises 3-7, for the matrix A provided, find a basis for Row(A) and compute corank(A). Answers may
vary.
   
8 −3 −13 15 17 −1 2 1 0
♠ 3. 
   
  
−2 1 3 −3 −3 ♠ 4.  7 −14 −7 −8 


 
−3 6 3 2
   
 1 0 0 1 1 1   2

1 1 1 

 
♠ 5. 
 0 1 0 1 1  (mod 2)
0   4

2 0 3 

  ♠ 6. 

 (mod 5)

0 0 1 1 0 1  2
 1 3 0 

 
0 0 1 2

 
 1 + 2i 2−i −1 0 1+i −3 + 3i 
 
♠ 7. 
 2 + 4i 4 − 2i 1 + 4i −2 2 + 5i −2 + 10i 

 
3−i −1 − 3i 2 i 3−i 6 + 3i

For Exercises 8-8, let A be the matrix provided on the left. The second matrix is row equivalent to A. Find
a basis for Col(A) and Row(A) and compute rank(A) and corank(A).
   
 1 1 −3 7 9 −9   1 1 −3 7 9 −9 
   
 1
 2 −4 10 13 −12   0 1 −4 3
  4 −3  
   
8.  1 −1 −1 ∼ 0 0
   
 1 1 −3   0 1 −1 2 

   
 1 −3 1 5 7 3   0 0
  0 0 0 0 
 
   
1 −2 0 0 −5 4 0 0 0 0 0 0

9. Show that R2×2 with respect to matrix addition and matrix multiplication is NOT a field.
10. Show that if F is any field and n a positive integer, then F n×n is a field if and only if n = 1.

† The name for the Shoe-Sock Principle comes from the identical property of inverse matrices, that is, (AB)−1 = B −1 A−1 .

See Theorem 3.3.7 (ii) for further details.



“My happiness grows in direct proportion to my acceptance, and in inverse proportion to my expectations.”
– Michael J. Fox

3.3 Matrix Inverses


When can we divide by a matrix? Does it even make sense to talk about matrix division? It depends on the
matrix.

Definition 3.3.1. An n × n matrix A is nonsingular (or invertible) if there exists an n × n matrix B


such that
AB = BA = In .
In this case, B is called an inverse of A. If A is not nonsingular, then it is singular.

Example 3.3.2. Let A = [ 2 3 ; 3 5 ] and C = [ 5 -3 ; -3 2 ]. Then

AC = [ 2 3 ; 3 5 ][ 5 -3 ; -3 2 ] = [ 1 0 ; 0 1 ]  and  CA = [ 5 -3 ; -3 2 ][ 2 3 ; 3 5 ] = [ 1 0 ; 0 1 ].

Thus, A is invertible with inverse C. □

We should mention that inverses are unique. Suppose that A is invertible with inverses B and C. Thus,

B = B In = B(AC) = (BA)C = In C = C.

We will denote the inverse of A as A^{-1}. Therefore,

A A^{-1} = A^{-1} A = In.

Theorem 3.3.3. Let A = [ a b ; c d ]. If det(A) := ad - bc† ≠ 0, then A is nonsingular with inverse

A^{-1} = (1/(ad - bc)) [ d -b ; -c a ].

If ad - bc = 0, then A is singular.

Example 3.3.4. Find the inverse of A = [ 1 2 ; 3 4 ], if it exists.

Since det(A) = 1(4) - (2)3 = 4 - 6 = -2, A is nonsingular, that is, it has an inverse, which is

A^{-1} = -(1/2)[ 4 -2 ; -3 1 ] = [ -2 1 ; 3/2 -1/2 ]. □
Example 3.3.5. Find the inverse of A = [ 9 3 ; 6 2 ], if it exists.

Since det(A) = 9(2) - 3(6) = 18 - 18 = 0, A has no inverse, that is, A is a singular matrix. □

For a nonsingular matrix A, the equation Ax = b has a unique solution for all b ∈ F n of the form
x = A−1 b.

Example 3.3.6. Solve the system of equations

x1 + 2x2 = 5
3x1 + 4x2 = 6.

We seek to solve the matrix equation Ax = b, where b = (5, 6) and A is the coefficient matrix of this system, which is the matrix from Example 3.3.4. Thus,

A^{-1}(Ax) = A^{-1}b  ⇒  x = A^{-1}b = [ -2 1 ; 3/2 -1/2 ](5, 6) = (-4, 9/2). □

Theorem 3.3.7. Let A and B be n × n invertible matrices, let m be a positive integer, and let k be a nonzero scalar. Then A^{-1}, AB, A^T, A^m, and kA are also invertible with:

(i) (A^{-1})^{-1} = A

(ii) (AB)^{-1} = B^{-1} A^{-1}

(iii) (A^T)^{-1} = (A^{-1})^T

(iv) (A^m)^{-1} = (A^{-1})^m := A^{-m}

(v) (kA)^{-1} = (1/k) A^{-1}.

Note that (ii) is another example of the Shoe-Sock Principle.††

Corollary 3.3.8. If A is invertible, then AA> and A> A are likewise invertible.

Example 3.3.9. Inverse matrices allow us to solve matrix equations when otherwise we may have used
division. For example, solve the equation (AX −1 )−1 + B = C, assuming all matrices are n × n and nonsin-
gular (when necessary).

We begin by adding −B to both sides of the equation. This gives (AX −1 )−1 = C − B. Using the
Shoe-Sock Principle, we see that the left-hand side of the equation becomes (X −1 )−1 A−1 = XA−1 = C − B.
Finally, if we multiply the equation on both sides by A on the RIGHT side, we get

(XA^{-1})A = (C - B)A
X(A^{-1}A) = CA - BA
X In = CA - BA
X = CA - BA

The side on which we are multiplying matters. Had we multiplied by A on the left, we would have gotten AXA^{-1}, which is not necessarily X. Likewise, (XA^{-1})A ≠ A(C - B), since multiplication by A must be done either on the LEFT of both sides or on the RIGHT of both sides to guarantee equality (assuming all the products are even possible here). After all, multiplying by A on the LEFT is a different linear transformation than multiplying by A on the RIGHT. Do NOT assume matrices commute. Therefore, X = CA - BA.

Alternatively, we could solve for X in (AX^{-1})^{-1} = C - B instead by taking inverses of both sides, that is, [(AX^{-1})^{-1}]^{-1} = [C - B]^{-1}, or AX^{-1} = (C - B)^{-1}. On the right-hand side, do not be tempted to distribute the exponent. While multiplication (and division) distribute over addition (and subtraction) even in the matrix algebraic setting, exponents do not. By analogy, (5 - 3)^{-1} = 1/(5 - 3) = 1/2 ≠ 5^{-1} - 3^{-1} = 1/5 - 1/3 = -2/15. Continuing on, we will multiply both sides of the equation by A^{-1} on the LEFT. This gives A^{-1}(AX^{-1}) = X^{-1} = A^{-1}(C - B)^{-1}. Again taking the matrix inverse of both sides, we have [X^{-1}]^{-1} = [A^{-1}(C - B)^{-1}]^{-1}, or X = (C - B)[A^{-1}]^{-1} = (C - B)A = CA - BA, agreeing with our solution from before. Of course, we used the Shoe-Sock Principle here at the end. This demonstrates that there could be more than one correct path toward the correct solution. Just make sure to adhere to all algebraic properties and restrictions, such as matrix multiplication being noncommutative and the Shoe-Sock Principle. □

Example 3.3.10. Solve the equation B(XA)−1 = C, assuming all matrices are n × n and nonsingular
(when necessary).

We will demonstrate two paths to the solution of this matrix equation. Our first attempt begins with multiplying both sides of the equation by B^{-1} on the LEFT. This gives B^{-1}(B(XA)^{-1}) = B^{-1}C (which is not the same as CB^{-1}), or (XA)^{-1} = B^{-1}C. Inverting both sides then gives XA = (B^{-1}C)^{-1} = C^{-1}[B^{-1}]^{-1} = C^{-1}B. Lastly, we multiply both sides on the RIGHT by A^{-1}, giving X = C^{-1}BA^{-1}.

Alternatively, we could solve B(XA)^{-1} = C by first multiplying by XA on the RIGHT of both sides. This gives B(XA)^{-1}(XA) = C(XA), or B = CXA. Our goal is still to solve for X; however, notice that we cannot isolate X in the current equation, as it is sandwiched between two other matrices. In order to "unwrap" it, we must multiply by C^{-1} on the LEFT and then by A^{-1} on the RIGHT. This gives XA = C^{-1}B and X = C^{-1}BA^{-1}, respectively, agreeing with the solution of our first attempt. □
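When manipulating matrix equations by hand, a randomized numerical check is a cheap way to catch order-of-multiplication mistakes. The sketch below tests the solution X = CA - BA of Example 3.3.9 on random matrices; NumPy is assumed, and generic random matrices are invertible with probability 1, so no special handling is included.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 3
    A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
    X = C @ A - B @ A                             # the claimed solution of (A X^{-1})^{-1} + B = C
    lhs = np.linalg.inv(A @ np.linalg.inv(X)) + B
    print(np.allclose(lhs, C))                    # True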

Theorem 3.3.11 (The Nonsingular Matrix Theorem). Let A be a square n × n matrix. Then the following
statements are equivalent:

(i) A is a nonsingular matrix.

(ii) A is an invertible matrix, that is, A has a matrix inverse A^{-1}.

(iii) There is an n × n matrix C such that CA = In.

(iv) There is an n × n matrix C such that AC = In.

(v) The equation Ax = 0 has only the trivial solution.

(vi) The linear system Ax = b has no free variables for each b ∈ F^n.

(vii) The equation Ax = b is consistent for each b ∈ F^n.

(viii) A^T is an invertible matrix.

(ix) A is row equivalent to In.

(x) A has n pivot positions.

(xi) The rank of A is n.

(xii) The columns of A span F^n.

(xiii) The columns of A form a linearly independent set.

(xiv) The columns of A form a basis for F^n.

(xv) The rows of A span F^n.

(xvi) The rows of A form a linearly independent set.

(xvii) The rows of A form a basis for F^n.

(xviii) The linear transformation x 7→ Ax is injective (one-to-one).

(xix) The linear transformation x 7→ Ax is surjective (onto).

(xx) The linear transformation x 7→ Ax is bijective (one-to-one and onto).

Corollary 3.3.12. Let A and B be n × n matrices. If AB is invertible, then A and B are likewise invertible.

Example 3.3.13. Determine whether the following matrices are “nonsingular,” “singular,” or not enough
information.
(a) Suppose that A is a 3 × 3 matrix over C such that nullity(A) = 0.

By part (vi) of the Nonsingular Matrix Theorem, we see that A is nonsingular since the nullity is equal
to the number of free variables in the linear system Ax = b.

(b) Suppose A is a 5 × 5 matrix over Z2 . Furthermore, suppose that the equation Ax = 0 has 8 solutions.

By part (v) of the Nonsingular Matrix Theorem, we see that A is singular since the homogeneous
equation Ax = 0 has nontrivial solutions.

(c) Suppose A is a 2 × 2 matrix such that

A [ 1 ; 2 ] = [ 1 ; 2 ].

We do not have enough information to decide if A is nonsingular or not. On the one hand, the identity matrix [ 1 0 ; 0 1 ] satisfies this condition and is nonsingular. On the other hand, [ 1 0 ; 2 0 ] also satisfies this condition but is singular. □

Exercises (Go to Solutions)


For Exercises 1-9, compute the inverse matrix of the given matrix using Theorem 3.3.3. Verify your inverse.
       
 2 5 1 2 9 −4 2 5
1.  2.  3.  ♠ 4. 
      
   
1 3 3 4 −7 5 1 3
       
 2 5  4 3   3 2   10 2 
♠ 5.  ♠ 6.  ♠ 7.   (mod 5) ♠ 8.   (mod 11)

 
3 7 −2 1 4 2 5 3
 
 1 1+i 
♠ 9.  
−i 2

For Exercises 10-13, solve linear system Ax = b using the matrix inverse.
   
 4 8   2 
10. A =  , b =  
2 6 3
     
 1 2 3   0   4 13 5 
    −1  
♠ 11. A = 
 −1 −1 , b =  3
1   , A =  −3
  −11 −4 

     
2 1 −5 −2 1 3 1
     
 4 26 31 −15   1   3 4 1  1
     
 −3 −21 −25 12  2   −1 0 4 1 
     

♠ 12. A =  , b =  , A−1 =  
     
 1
 7 8 −4 

 3 

 0
 −1 −3 0 

     
0 −1 −1 1 4 −1 −1 1 2
     
 1 0 1   1   2 1 0 
    −1  
♠ 13. A ≡ 
 2 0 , b ≡  1 , A ≡  2
1     1  (mod 3)
2 
     
1 2 0 2 2 2 0

For Exercises 14-22, determine whether the following matrices are “nonsingular,” “singular,” or not enough
information. Explain your reasoning.

♠ 14. A is a 4 × 4 matrix with ♠ 15. A is a 5 × 4 matrix with ♠ 16. A is a 3×3 matrix such that
rank(A) = 4. rank(A) = 4. every vector in F 3 can be
expressed as Ax, for some
vector x ∈ F 3 .

♠ 17. A is a 5 × 5 matrix with a ♠ 18. A is a 3 × 3 matrix in row- ♠ 19. A is a 3 × 3 matrix in row-


two rows of zeros in it row- reduced echelon form. reduced echelon form and a
reduced echelon form. pivot in each column.

♠ 20. A is a 2×2 matrix such that ♠ 21. A is row equivalent to a sin- ♠ 22. A is a 3 × 3 matrix with
Ax = Ay for two distinct gular matrix. columns vectors a1 , a2 , a3
vectors x, y ∈ F 2 . such that a1 − 2a2 = a3 .

For Exercises 23-28, solve matrix equations for the matrix X using matrix properties. You may assume all
matrices are n × n and nonsingular. Be cautious! The order of multiplication matters!

23. (BAX)−1 = C ♠ 24. AX + B = 0 ♠ 25. A−1 (AX + C) = BA

♠ 26. (XA)−1 B = C ♠ 27. B(A + X)−1 = C 28. (DX − B)(CA)−1 = E

29. Prove Corollary 3.3.8.


30. Prove Theorem 3.3.12.

† The value det(A) = ad − bc is called the determinant of A. We will learn more about determinants in Chapter 5.
†† To remember this formula, think of the following proverb:

Put Your Socks On, Then Your Shoes


Take Your Shoes Off, Then Your Socks

“The opposite of love is not hate, it’s indifference.” – Elie Wiesel

3.4 Elementary Matrices


Definition 3.4.1. An elementary matrix is a matrix obtained by performing a single row operation on
the identity matrix.

Since there are three types of row operations, there are three types of elementary matrices. Each elemen-
tary matrix is nonsingular, whose inverse is the elementary matrix corresponding to the inverse row operation.

1. (Replacement) If you replace Row i with Row i + cRow j in the identity matrix, then the resulting
elementary matrix will have 1’s across the main diagonal and 0’s everywhere else except in the position
(i, j) which will be c.
For example, the matrix E1, depicted below, corresponds to the row operation "replace Row 3 with Row 3 - 2 Row 1." Likewise, the matrix E1^{-1} corresponds to the inverse row operation "replace Row 3 with Row 3 + 2 Row 1."

E1 = [ 1 0 0 ; 0 1 0 ; -2 0 1 ],   E1^{-1} = [ 1 0 0 ; 0 1 0 ; 2 0 1 ]

2. (Interchange) If you interchange Row i with Row j in the identity matrix, then the resulting elementary
matrix will be like the identity matrix except the ith and jth rows are interchanged.
For example, the matrix E2 corresponds to the row operation "interchange Rows 2 and 3." The inverse of E2 is itself! It is also true that E2 is equal to its own transpose: (E2)^{-1} = E2 = (E2)^T.

E2 = [ 1 0 0 ; 0 0 1 ; 0 1 0 ]

3. (Scaling) If you scale Row i by c in the identity matrix, then the resulting elementary matrix has 0's in all off-diagonal entries and 1's in the main diagonal entries except (i, i), which is c.
For example, the matrix E3 corresponds to the row operation "scale Row 2 by 7." The inverse of E3 corresponds to the inverse row operation "scale Row 2 by 1/7."

E3 = [ 1 0 0 ; 0 7 0 ; 0 0 1 ],   (E3)^{-1} = [ 1 0 0 ; 0 1/7 0 ; 0 0 1 ]
Example 3.4.2. Let A = [ a b c ; d e f ; g h i ]. Using the elementary matrices E1, E2, and E3 from above, compute E1A, E2A, and E3A.

E1A = [ a b c ; d e f ; g-2a  h-2b  i-2c ],
E2A = [ a b c ; g h i ; d e f ],
E3A = [ a b c ; 7d 7e 7f ; g h i ]. □
The previous example motivates the following proposition.

Proposition 3.4.3. If an elementary row operation† is performed on an m×n matrix A, the resulting matrix
can be written as EA, where the m × m matrix E is the elementary matrix associated to that elementary
row operation.
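Proposition 3.4.3 can be seen directly in code: building E by performing a row operation on the identity and then left-multiplying reproduces the same operation on A. A sketch follows; NumPy is assumed and not part of the text, and the numeric matrix simply stands in for the symbolic entries a, . . . , i.

    import numpy as np

    A = np.arange(1, 10).reshape(3, 3)     # [[1,2,3],[4,5,6],[7,8,9]], a stand-in for [a b c; d e f; g h i]

    E1 = np.eye(3); E1[2, 0] = -2          # replace Row 3 with Row 3 - 2 Row 1
    E2 = np.eye(3)[[0, 2, 1]]              # interchange Rows 2 and 3
    E3 = np.diag([1.0, 7.0, 1.0])          # scale Row 2 by 7

    print(E1 @ A)
    print(E2 @ A)
    print(E3 @ A)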

In the Nonsingular Matrix Theorem, we saw that an n × n matrix A is invertible if and only if A is row
equivalent to In . It turns out that the process of row-reducing A into In can produce the inverse matrix
A−1 too.

Theorem 3.4.4 (Inversion Algorithm). Let A be an invertible n × n matrix A. Then any sequence of
elementary row operations that reduces A to In also transforms In into A−1 .
Proof. Suppose that A is invertible. Hence, A ∼ In by the Nonsingular Matrix Theorem. Then there exists
some sequence of row operations transforming A into In . For each row operation in this sequence, there is
a corresponding elementary matrix Ei . Say it took p row operations. Thus,
A ∼ E1 A ∼ E2 (E1 A) ∼ . . . ∼ Ep (Ep−1 . . . E1 A) = In .
Thus, (Ep . . . E2 E1 )A = In . Therefore, Ep . . . E2 E1 = A−1 . Since (Ep . . . E2 E1 )In = A−1 , the sequence of
row operations transforming A into In transforms In into A−1 . 

The Inversion Algorithm is a process for computing matrix inverses. Consider the matrix [A | In]. By row reduction,

[A | In] ~ [In | A^{-1}].    (3.4.1)

Thus, row reduction saves the day again! Additionally, if A denotes the set of column vectors of A, which necessarily forms a basis for F^n, and E is the standard basis for F^n, then we also see that A^{-1} = P_{A←E}, the change-of-basis matrix from standard coordinates to A-coordinates. Hence, the Inversion Algorithm is a special case of the Change-of-Basis Algorithm we saw in (2.8.1).

Example 3.4.5. Find the inverse of A = [ 0 1 -3 ; 1 -2 5 ; -5 4 3 ].

Using the method suggested above, we will row reduce A and apply these same row operations to In.

[ 0 1 -3 | 1 0 0 ; 1 -2 5 | 0 1 0 ; -5 4 3 | 0 0 1 ] ~ [ 1 -2 5 | 0 1 0 ; 0 1 -3 | 1 0 0 ; -5 4 3 | 0 0 1 ]
~ [ 1 -2 5 | 0 1 0 ; 0 1 -3 | 1 0 0 ; 0 -6 28 | 0 5 1 ] ~ [ 1 -2 5 | 0 1 0 ; 0 1 -3 | 1 0 0 ; 0 0 10 | 6 5 1 ]
~ [ 1 -2 5 | 0 1 0 ; 0 1 -3 | 1 0 0 ; 0 0 1 | 3/5 1/2 1/10 ] ~ [ 1 -2 0 | -3 -3/2 -1/2 ; 0 1 -3 | 1 0 0 ; 0 0 1 | 3/5 1/2 1/10 ]
~ [ 1 -2 0 | -3 -3/2 -1/2 ; 0 1 0 | 14/5 3/2 3/10 ; 0 0 1 | 3/5 1/2 1/10 ] ~ [ 1 0 0 | 13/5 3/2 1/10 ; 0 1 0 | 14/5 3/2 3/10 ; 0 0 1 | 3/5 1/2 1/10 ].

Therefore, A^{-1} = [ 13/5 3/2 1/10 ; 14/5 3/2 3/10 ; 3/5 1/2 1/10 ] = (1/10)[ 26 15 1 ; 28 15 3 ; 6 5 1 ].
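The Inversion Algorithm is easy to automate: row reduce [A | I3] and read the inverse off the right-hand block. A SymPy sketch for the matrix of Example 3.4.5 (SymPy assumed, not part of the text):

    import sympy as sp

    A = sp.Matrix([[0, 1, -3], [1, -2, 5], [-5, 4, 3]])
    R, _ = sp.Matrix.hstack(A, sp.eye(3)).rref()   # [A | I] ~ [I | A^{-1}]
    A_inv = R[:, 3:]
    print(A_inv)            # equals (1/10) * Matrix([[26, 15, 1], [28, 15, 3], [6, 5, 1]])
    print(A * A_inv)        # the 3 x 3 identity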

A modification of the inversion algorithm can be used to factor a nonsingular matrix as a product of elementary matrices. Suppose that A reduces to In via a sequence of p-many elementary row operations. Let E1, E2, . . . , Ep be the corresponding elementary matrices. Thus,

(Ep · · · E2E1)A = In, and A = (Ep · · · E2E1)^{-1} In = E1^{-1} E2^{-1} · · · Ep^{-1}.

Example 3.4.6. In Example 3.4.5 we reduced A into I3 by the following sequence of row operations, given
as descriptions and elementary matrices:

interchange replace Row 3 with replace Row 3 with 1


scale Row 3 by ,
Rows
 1 and 
2, Row
 3 + 5 Row1, Row
 3 + 6 Row2,  10

 0 1 0   1 0 0   1 0 0   1 0 0 
 

 1
      0 1 0 
 0 0 

 0
 1 0 

 0
 1 0 
 



      1
0 1 0 5 0 1 0 6 1 0 0 10

replace Row 1 with replace Row 2 with replace Row 1 with


 1 − 5 Row3,
Row Row
 2 + 3 Row3, Row
 1 + 2 Row2.
 1 0 −5   1 0 0   1 2 0 
     
 0  0 1 0 
 1 0 

 0
 1 3 
  
     
0 0 1 0 0 1 0 0 1

Following the same order but taking inverses, we get a factorization of A, namely:
       
 0 1 0  1 0 0  1 0 0  1 0 0  1 0 5  1 0 0  1 −2 0 
       
A=
 1 0 0 
 0 1
 0 
 0
 1 0  0 1

  0 1 0   0 1 −3   0
0    
.
1 0 
       
0 1 0 −5 0 1 0 −6 1 0 0 10 0 0 1 0 0 1 0 0 1


Exercises (Go to Solutions)


For Exercises 1-6, for the row operation performed on an m × n matrix, write the corresponding elementary
m × m matrix.

♠ 1. interchange Rows 1 and 3 in a 3 × 3 matrix. ♠ 2. scale Row 3 by 2 in a 4 × 4 matrix.

♠ 3. scale Row 2 by -5 in a 3 × 2 matrix. ♠ 4. replace Row 3 with Row 3 + 2Row 1 in a 3 × 3


matrix.

♠ 5. replace Row 4 with Row 4 − 3Row 2 in a 5 × 3 ♠ 6. replace Row 1 with Row 1 − Row 3 in a 4 × 2
matrix. matrix.

For Exercises 7-11, find the inverse matrix of the provided matrix, if possible. Verify your answer.
     
6 2 4 −3 −1 10 −1 −6
7. 
     
    
12 4 8.  1 −1
 0 
 ♠ 9.  −11
 2 9 
   
−2 −1 2 −3 1 3
   
1
 2 4 3   2 −1 − 32 
   
♠ 10. 
 1 0  (mod 5)
3  11.  1
− 12 − 32 
 2 
   
2 1 0 − 23 3 − 72

For Exercises 12-15, factor the matrix as a product of elementary matrices. Answers may vary.
     
 2 1 −6 8 13  5 −66 −8
12. 
   
    
1 2 ♠ 13.  −3 4 6 
 ♠ 14. 
 −1 16 2 

   
1 −1 −2 2 −20 −2

 
 3 1 + 3i 7i 
 
15. 
 2 1 + 2i 4i 

 
−2i 2 7

16. If      
 5 3 1   5 4 1   1 3 4 
     
A≡
 0 2 , B ≡  6
5   3 , C ≡  2
2   3 6 
 (mod 7),
     
0 1 3 6 2 1 1 3 1

solve the matrix equation (AX)−1 ≡ B + C (mod 7).

† Elementary column operations can also be performed on A using the same three operations and the same three forms of

elementary matrices, but the elementary matrices when multiplied on the right instead of left perform column operations.

“Whenever you’re in conflict with someone, there is one factor that can make the difference between
damaging your relationship and deepening it. That factor is attitude.” – William James

3.5 Matrix Factorizations


In algebra, it is often important to be able to undo the process of multiplication, which is known as factor-
ization. We see it in arithmetic, such as 6 = 2 · 3, and we see it in polynomials, x2 − 1 = (x − 1)(x + 1).
Factorization can be extremely useful in solving problems where such objects arise, for example, when trying
to solve the polynomial equation
x2 − 1 = 0,
the subsequent factorization
(x − 1)(x + 1) = 0
illuminates the solution set {1, −1}. Such factorizations for matrices can be equally useful. Although no
equivalent of prime numbers or irreducible polynomials exist for matrices, there are many very useful factor-
izations for matrices, much like the elementary factorizations we saw in the previous section. In this section,
we discuss generalizations of the elementary matrices from the previous section and their presence
in matrix factorizations, in particular the LU factorization.

3.5.1 Generalizations of Elementary Matrices


Definition 3.5.1. A diagonal matrix is a square n × n matrix whose nondiagonal entries are all zero.

D = [ d1 0 · · · 0 ; 0 d2 · · · 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 · · · dn ],   D^{-1} = [ 1/d1 0 · · · 0 ; 0 1/d2 · · · 0 ; ⋮ ⋮ ⋱ ⋮ ; 0 0 · · · 1/dn ].

The identity matrix In is a diagonal matrix all whose diagonal entries are 1. The zero matrix is a diagonal
matrix all whose diagonal entries are 0. In a diagonal matrix, the diagonal entries need not be the same.
In general, an n × n diagonal matrix D is of the form displayed above, along with its inverse D^{-1} when every diagonal entry is nonzero. Every diagonal matrix can
be factored as a product of elementary matrices of scaling type (if we allow the possibility of zero scaling).
As scaling type elementary matrices can have at most one non-unital diagonal entry, we can view diagonal
matrices as their generalization. Similar to how scaling elementary matrices multiply, we see that if A is a
matrix, then DA is the matrix where the ith row of A is scaled by di . Likewise, AD is the matrix where
the jth column of A is scaled by dj . In particular, a product of two diagonal matrices is a diagonal matrix
whose diagonal entries are the products of the corresponding diagonal entries.

Furthermore, if no diagonal entry is zero, a diagonal matrix D is a product of elementary matrices and
hence is invertible by the Nonsingular Matrix Theorem. The sum and scalar multiples of diagonal matrices
are diagonal matrices. The set of diagonal matrices forms an n-dimensional subspace of F n×n . Additionally,
the product of two diagonal matrices is diagonal.
 
1
 2− 0 0 
 
Example 3.5.2. Let A =  0 1 0 

. Then A is a diagonal matrix and
 
0 0 3
     
1
 −2 0 0   16 0 0   16 0 0 
−1 4 −4
     
A =  0 1 0 , A =  0 1 0 , A =  0 1
    
.
0 
     
1 1
0 0 3 0 0 81 0 0 81

Furthermore, A can be factored into a product of scaling elementary matrices:


  
1

 2 0 0  1 0 0 
  
A=  0 1 0  0 1 0 .
   
  
0 0 1 0 0 3

Definition 3.5.3. A diagonal matrix C is called a scalar matrix if all the diagonal entries are equal.

Essentially, scalar matrices all have the form C = cIn for some scalar c, which is why they get their name.
Note that for any matrix A, we have that CA = (cIn )A = c(In A) = cA, that is, multiplication by a scalar
matrix is no different than scalar multiplication. Furthermore, scalar matrices are exactly those matrices
which commute with all other matrices with regard to multiplication.

Definition 3.5.4. A permutation matrix is a square matrix formed by some rearrangement of the rows
of In .

It can be shown that any permutation of objects can be accomplished by a sequence of transpositions,
that is, a rearrangement of only two things at a time. This implies that a permutation matrix is a product of
elementary matrices of interchange type. As such, a permutation matrix P is always non-singular. In fact,
P −1 = P > . As an interchange elementary matrix has at most two rows permuted, we can view permutation
matrices as their generalization. Similar to how interchange elementary matrices multiply, we see that if A
is a matrix, then P A is the matrix where the rows of A are rearranged in the same way that the rows in In
are rearranged for P . Likewise, AP is the matrix where the columns of A are rearranged in the same way
that the columns in In are rearranged for P . In particular, a product of two permutation matrices is a per-
mutation matrix, although the sum and scalar multiples of permutation matrices are no longer permutation
matrices.

 
 0 1 0 0 
 
 0 0 0 1 
 
Example 3.5.5. Let P =   is a permutation matrix. It can be easily verified that P > =
 
 1 0 0 0 
 
 
0 0 1 0
−1
P . We also see that P factors as a product of interchange elementary matrices:
   
 0 0 1 0  1 0 0 0  1 0 0 0 
   
 0 1 0 0  0 0 1 0  0 1 0 0 
   
P = 





 
 1 0 0 0   0 1 0 0  0 0 0 1 
   
   
0 0 0 1 0 0 0 1 0 0 1 0

Definition 3.5.6. An upper triangular matrix is a square n × n matrix whose entries below the main
diagonal are all zero. A lower triangular matrix is a square n × n matrix whose entries above the main
diagonal are all zero. A triangular that is either upper or lower triangular is called a triangular matrix.
A matrix is unit triangular if it is triangular and all the diagonal entries are 1. A matrix is strictly
triangular if it is triangular and all the diagonal entries are 0.

Upper Triangular Matrix Unit Upper Triangular Matrix Strictly Upper Triangular Matrix
     
 1 3 −1   1 5 3   0 7 −3 
     
 0 −3 2   0 1 2   0 0 −1 
     
     
0 0 2 0 0 1 0 0 0

Lower Triangular Matrix Unit Lower Triangular Matrix Strictly Lower Triangular Matrix
     
 1 0 0   1 00   0 0 0 
     

 3 −3 0 

 −3
 1 0 

 −1
 0 0 

     
−2 1 0 5 −7 1 1 2 0

In other words, an upper triangular matrix is a square matrix in echelon form, and a lower triangular
matrix is just a square matrix in upside-down echelon form† . Of course, if a matrix is upper AND lower
triangular, then it is actually a diagonal matrix.

An upper unit triangular matrix is a product of elementary matrices of replacement type, the kind used
during the back-phase of Gauss–Jordan elimination and hence have non-zero entries above the diagonal. In
general, an upper triangular matrix is a product of “backward” replacement elementary matrices and scaling
(including by zero) elementary matrices. In particular, unit upper triangular matrices are those which are
strictly a product of “backward” replacement elementary matrices, and hence can be viewed as their gen-
n(n − 1)
eralization. Upper triangular matrices are closed under sums and scalars, forming an -dimensional
2
n×n
subspace of F . Additionally, a product of upper triangular matrices is upper triangular, and an upper
triangular matrix is invertible if and only if none of the diagonal entries are zero, in which case the inverse
is likewise upper triangular. Furthermore, the transpose of an upper triangular matrix is lower triangular.
Similar statements can be made about lower triangular matrices.

   
 2 0 −5   3 −1 −2 
   
Example 3.5.7. Let A =   0 1 −3 
 and B =  0
 0 −5  . Then A is nonsingular, but B is not
   
0 0 4 0 0 3
by considering their diagonal entries. In particular,
   
1 5
 2 0 8   6 −2 −19 
−1
   
A = 3 , AB =
 0 1 4   0 0 −14  .

   
0 0 14 0 0 12

We also see that A factors as a product of replacement and scaling elementary matrices (in the case of B we
allow for a zero-scaling matrix, which technically is not an elementary matrix):
    
 1 0 0  1 0 −5   1 0 0  2 0 0 
    
A=
 0 1 0 
 0
 1 0 
 0
 1 −3 
 0
 1 0 
 and
    
0 0 4 0 0 1 0 0 1 0 0 1
      
 1 0 0  1 0 −2   1 0 0  1 0 0  1 −1 0  3 0 0 
      
B=
 0 1  0
0   1  0
0   1 −5 
 0
 0  0
0  
 0
1 0   1 .
0  
      
0 0 3 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1

3.5.2 The LU Factorization


Theorem 3.5.8. Let A be an m × n matrix which has an echelon form which can be obtained solely by
replacement row operations. Then there exists an m × n matrix U and an m × m matrix L such that
A = LU,
U is an echelon form of A and L is a lower unit triangular matrix. In particular, the (i, j)-position of L for
i > j is −c when the replacement Row i 7→ Row i + c Row j was used during the row reduction process.
Proof. When computing an echelon form of a matrix, the scaling row operation is never necessary. This one
is only necessary for row-reduced echelon form. Thus, we need only use replacement or interchange to row
reduce a matrix to echelon form. Suppose that A reduces to an echelon form U without interchange. Then
there is a sequence of p-many replacement row operations transforming A into U . Let E1 , E2 , . . . , Ep be the
corresponding elementary matrices. Thus,
(Ep . . . E2 E1 )A = U,
and
A = (Ep . . . E2 E1 )−1 U = (E1−1 E2−1 . . . Ep−1 )U.
Let L = E1−1 E2−1 . . . Ep−1 . Now, each elementary matrix Ei is unit lower triangular. Its inverse Ei−1 is also
unit lower triangular. The products of such matrices are also unit lower triangular, which gives the LU
factorization.


The proof of this theorem provides us with an algorithm for computing the LU factorization.

An Algorithm for Finding an LU Factorization
1. Reduce A to an echelon form U by a sequence of only forward-phase row replacement operations, if
possible.

2. Place entries in L such that the same sequence of row operations reduces L to Im , that is, place in the
(i, j)-position the scalar −c whenever Row i → Row i + c Row j was used.
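This bookkeeping is easy to automate. Below is a minimal Python sketch (the helper name lu_no_pivot is ours, not the text's) that records each replacement multiplier directly into L while reducing a copy of A to an echelon form U; it works over the reals and assumes no interchanges are needed. For Example 3.5.9, one would instead work modulo 11, replacing the division by multiplication with a modular inverse; the recorded multipliers are exactly the entries 9, 1, 8, 8, 4, 2 placed in L there.

def lu_no_pivot(A):
    """LU factorization using only forward replacement row operations.

    A is a list of lists (m x n). Returns (L, U) with L unit lower
    triangular (m x m) and U an echelon form of A, so that A = L U.
    Assumes no row interchanges are required.
    """
    m, n = len(A), len(A[0])
    U = [row[:] for row in A]                      # working copy of A
    L = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    row = 0
    for col in range(n):
        if row >= m:
            break
        pivot = U[row][col]
        if pivot == 0:
            continue                               # no pivot in this column
        for i in range(row + 1, m):
            c = U[i][col] / pivot                  # Row i -> Row i - c * Row row
            L[i][row] = c                          # record the multiplier in L
            U[i] = [U[i][j] - c * U[row][j] for j in range(n)]
        row += 1
    return L, U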

 
 2 4 10 5 9 
 
 7 6 3 3 1 
 
Example 3.5.9. Find an LU factorization of A = 

 (mod 11).

 2 6 7 1 8 
 
 
5 0 7 8 1
Since A has four rows, L will be 4 × 4. Below, you will see two columns of matrices. On the left, you will
see the row reduction of A to an echelon form U using only forward-phase replacements. On the right, you
will see the matrix L built brick-by-brick by placing the corresponding scalars below the diagonal.
   
 2 4 10 5 9   1 0 0 0 
   
 7 6 3 3 1   ∗ 1 0 0 
   
A= 

 L = 



 2 6 7 1 8   ∗ ∗ 1 0 
   
   
5 0 7 8 1 ∗ ∗ ∗ 1
120 CHAPTER 3. THE ALGEBRA AND GEOMETRY OF MATRICES
   
 2 4 10 5 9   1 0 0 0 
   
 7 6 3 3 1  (Row 2 − 9 Row 1)  9 1 0 0 
   
=

 L=



 2 6
 7 1 8 
 (Row 3 − Row 1)  1
 ∗ 1 0 

   
5 0 7 8 1 (Row 4 − 8 Row 1) 8 ∗ ∗ 1
   
 2 4 10 5 9   1 0 0 0 
   
 0 3 1 2 8   9 1 0 0 
   
∼

 L=



 0 2
 8 7 10  (Row 3 − 8 Row 2)  1
 8 1 0 

   
0 1 4 1 6 (Row 4 − 4 Row 2) 8 4 ∗ 1
   
 2 4 10 5 9   1 0 0 0 
   
 0 3 1 2 8   9 1 0 0 
   
∼

 L=



 0 0 0 2 1   1 8 1 0 
   
   
0 0 0 4 7 (Row 4 − 2 Row 3) 8 4 2 1
   
 2 4 10 5 9   1 0 0 0 
   
 0 3 1 2 8   9 1 0 0 
   
∼U = 

 L=



 0 0 0 2 1   1 8 1 0 
   
   
0 0 0 0 5 8 4 2 1

Thus,     
 2 4 10 5 9   1 0 0 0  2 4 10 5 9 
    
 7 6 3 3 1   9 1 0 0  0 3 1 2 8 
    
A= ≡   = LU (mod 11). 
    
 2 6 7 1 8   1 8 1 0   0 0 0 2 1 
    
    
5 0 7 8 1 8 4 2 1 0 0 0 0 5

Consider the system of matrix equations
$$A\mathbf{x} = \mathbf{b}_1, \quad A\mathbf{x} = \mathbf{b}_2, \quad \ldots, \quad A\mathbf{x} = \mathbf{b}_p.$$
Such a system might be considered, for example, if one needs to determine whether each vector in the list
{b1, b2, . . . , bp} is in the span of the column vectors of A and, if so, what coefficients for a linear combination
will satisfy the equations. Each equation Ax = bi is solved using the SAME row operations to compute an
echelon form of A. So computationally, it is often more efficient to record the necessary row operations once
and for all. This is the case with past row-reductions such as [B | C] or [A | In]. This leads to the LU factorization.

We have seen the convenience of solving an augmented matrix in echelon form (aka upper triangular
matrix). Back substitution saves the day! Likewise, if a matrix is in lower triangular form, then it is
essentially in “upside-down” echelon form and back substitution will lead to a quick solution of the augmented
matrix. Therefore, suppose that
A = LU
is an LU factorization and consider the matrix equation
Ax = b.

Let y = U x. Then
Ax = (LU )x = L(U x) = Ly = b.
Thus, solving the equation Ax = b boils down to solving the two dramatically simpler matrix equations

Ly = b and U x = y

Example 3.5.10. We saw in the previous example that


    
 2 4 10 5 9   1 0 0 0  2 4 10 5 9 
    
 7 6 3 3 1   9 1 0 0  0 3 1 2 8 
    
A= ≡   = LU (mod 11).
    
 2 6 7 1 8   1 8 1 0  0 0 0 2 1 
 
  
    
5 0 7 8 1 8 4 2 1 0 0 0 0 5

Using this LU factorization of A, solve Ax = b, where b = (1, 2, 3, 4).

We first solve the matrix equation Ly = b, which corresponds to the augmented matrix:
$$\left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ 9 & 1 & 0 & 0 & 2 \\ 1 & 8 & 1 & 0 & 3 \\ 8 & 4 & 2 & 1 & 4 \end{array}\right) \sim \left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 8 & 1 & 0 & 2 \\ 0 & 4 & 2 & 1 & 7 \end{array}\right) \sim \left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 2 & 1 & 2 \end{array}\right) \sim \left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & 7 \end{array}\right).$$

Thus, y = (1, 4, 3, 7).


Next, we need to solve the equation U x = y:

       
 2 4 10 5 9 1   2 4 10 5 9 1   2 4 10 5 0 6   2 4 10 5 0 6 
       
 0 3 1 2 8 4   0 3 1 2 8 4   0 3 1 2 0 6   0 3 1 2 0 6 
       
 ∼ ∼ ∼ 
       
 0 0 0 2 1 3  0 0 0 2 1 3  0 0 0 2 0 6   0 0 0 1 0 3 
  
     
       
0 0 0 0 5 7 0 0 0 0 1 8 0 0 0 0 1 8 0 0 0 0 1 8
       
 2 4 10 0 0 2   2 4 10 0 0 2   2 0 5 0 0 2   1 0 8 0 0 1 
       
 0 3 1 0 0 0   0 1 4 0 0 0   0 1 4 0 0 0   0 1 4 0 0 0 
       
∼

∼
 
∼
 
∼
 


 0 0 0 1 0 3  0 0 0 1 0 3  0 0 0 1 0 3 
  0 0 0 1 0 3 
  
     
       
0 0 0 0 1 8 0 0 0 0 1 8 0 0 0 0 1 8 0 0 0 0 1 8

Thus, x = (1, 0, 0, 3, 8) + t(3, 7, 1, 0, 0) is the general solution.

We should mention that the above example required 27 arithmetic operations while the usual row-
reduction method would have required 46 operations. 
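The two triangular solves can be coded directly. Here is a minimal Python sketch (helper name solve_lu is ours; it uses floats rather than arithmetic mod 11) of solving Ax = b from an LU factorization by forward substitution on L followed by back substitution on U; it assumes U is square with nonzero diagonal, so there are no free variables.

def solve_lu(L, U, b):
    """Solve A x = b given A = L U, by two triangular solves.

    Assumes L is unit lower triangular and U is upper triangular with
    nonzero diagonal entries (so the solution is unique).
    """
    n = len(b)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Back substitution: U x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / U[i][i]
    return x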

To handle row interchanges, the LU factorization above can be modified easily to produce a PLU
factorization, where P is a permutation matrix. Similarly, to handle row scaling, a diagonal matrix D can
be introduced to give instead the LDU factorization. Combining all three row operations together leads
to the PLDU factorization. The corresponding system of equations for each of these factorizations is still
relatively easy to solve.

Exercises (Go to Solutions)


For Exercises 1-4, determine if the matrix is a scalar matrix.

1. $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ \quad 2. $\begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}$ \quad 3. $\begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$ \quad 4. $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

QUICK! For Exercises 5-7, multiply by the diagonal matrices using the discussion following Definition 3.5.1
in LESS THAN 10 SECONDS!
      
 2 0  1
0 2   −3 0 0  1 2 3  3 0 0 
      
♠ 5. 
 0 −1   −3
0   2 
 ♠ 6. 
 0 5  0
0   −1  0
5   −1 0 

      
0 0 −3 1 5 0 0 2 2 −2 −3 0 0 −2

   
 1 0 0  1 2 0  0 0 0 
   
♠ 7. 
 0 0  3
0   −1 4 
 0
 3 0 

   
0 0 5 −2 1 −5 0 0 1

For Exercises 8-8, find A2 , A3 , A−1 , and A−3 .

♠ 8. $A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 5 \end{pmatrix}$.

For Exercises 9-9, the permutation matrix P will be listed first and another matrix A will be listed second.
Factor P as a product of interchange elementary matrices (answers may vary). Compute P A and explain
how the rows of A have been permuted. Compute AP and explain how the columns of A have been permuted.

9. $P = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1 & 5 & 8 & 10 \\ 5 & 2 & 6 & 9 \\ 8 & 6 & 3 & 7 \\ 10 & 9 & 7 & 4 \end{pmatrix}$

For Exercises 10-11, factor the following matrices as a product of elementary matrices.

♠ 10. $\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 4 & 1 \end{pmatrix}$ \qquad ♠ 11. $\begin{pmatrix} 3 & 12 & -6 \\ 0 & 5 & 45 \\ 0 & 0 & 2 \end{pmatrix}$

For Exercises 12-13, find the LU -factorization of the provided matrix.

12. $\begin{pmatrix} 2 & 4 & -1 \\ 8 & 6 & 4 \\ 6 & 8 & 10 \end{pmatrix}$ \qquad 13. $\begin{pmatrix} 2 & 4 & -1 & 5 & -2 \\ -4 & -5 & 3 & -8 & 1 \\ 2 & -5 & -4 & 1 & 8 \\ -6 & 0 & 7 & -3 & 1 \end{pmatrix}$

For Exercises 14-18, find the LU -factorization of the coefficient matrix A below and use this factorization to
solve the linear system as in Example 3.5.10. [Hint: Remember to NOT scale NOR interchange.]

♠ 14. $\begin{pmatrix} 2 & -5 \\ -4 & 13 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -2 \\ -2 \end{pmatrix}$ \qquad ♠ 15. $\begin{pmatrix} 2 & -2 \\ -3 & 6 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 4 \\ 9 \end{pmatrix}$

16. $\begin{pmatrix} 2 & 6 & 0 \\ 16 & 51 & 0 \\ 12 & 42 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 6 \\ -3 \\ 15 \end{pmatrix}$ \qquad ♠ 17. $\begin{pmatrix} 2 & 2 & -2 \\ -4 & -3 & 4 \\ -1 & -2 & 4 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} -4 \\ -2 \\ 6 \end{pmatrix}$

♠ 18. $\begin{pmatrix} 2 & 1 & 2 \\ 6 & 6 & 0 \\ 4 & 5 & 3 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \pmod 7$

19. Show that if A is a lower triangular matrix then $A^\top$ is upper triangular.

†I suppose it could be called a chandelier form.



“Life is a mirror and will reflect back to the thinker what he thinks into it.” – Ernest Holmes

3.6 Linear Transformations on R2


In this section, we will explore how multiplication by a nonsingular matrix transforms geometry in R2 .
While analogous statements can be extended to the vector space F n , based upon the field F itself, only some
geometric interpretations may apply.

Let a, b ∈ R such that a, b > 0.


We can stretch $\mathbb{R}^2$ horizontally by multiplying by the (scaling) matrix
$$\begin{pmatrix} a & 0 \\ 0 & 1 \end{pmatrix},$$
and we can stretch $\mathbb{R}^2$ vertically by multiplying by the matrix
$$\begin{pmatrix} 1 & 0 \\ 0 & b \end{pmatrix}.$$
Thus, multiplication by a diagonal matrix will rescale the x- and y-axes by the diagonal entries. When a < 1
or b < 1, we say the geometry is compressed since the scale becomes smaller than the original.
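As a quick numerical illustration (a sketch using NumPy; the variable names are ours, not the text's), one can apply these scaling matrices to the corners of the unit square J introduced in the next example and read off the stretched images.

import numpy as np

# Horizontal stretch by a = 2 and vertical stretch by b = 3
Sx = np.array([[2.0, 0.0],
               [0.0, 1.0]])
Sy = np.array([[1.0, 0.0],
               [0.0, 3.0]])

# Corners of the unit square J stored as the columns of a 2 x 4 matrix
J = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

print(Sx @ J)       # x-coordinates doubled, y-coordinates unchanged
print(Sy @ J)       # y-coordinates tripled, x-coordinates unchanged
print(Sy @ Sx @ J)  # composite: stretch in both directions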

Example 3.6.1. The unit square J is the rectangle with vertices at (0, 0), (1, 0), (0, 1), and (1, 1). The unit
square is displayed in cyan in the figures. It will be a useful tool to visualize how matrix multiplication
distorts the geometry by seeing how a matrix transforms the unit square J.

(a) Multiplication by the matrix $\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}$ horizontally stretches the plane by a factor of two.

(b) Multiplication by the matrix $\begin{pmatrix} 8 & 0 \\ 0 & 1 \end{pmatrix}$ horizontally stretches the plane by a factor of eight.

(c) Multiplication by the matrix $\begin{pmatrix} 1 & 0 \\ 0 & 4 \end{pmatrix}$ vertically stretches the plane by a factor of four.

(d) Multiplication by the matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix}$ vertically compresses the plane by a factor of two.
    
Example 3.6.2. The matrix $\begin{pmatrix} 1/2 & 0 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}$ will vertically stretch the four vertices
of J by a factor of 3 and will horizontally compress the vertices by a factor of 2 (or we can say that they are
horizontally stretched by a factor of 1/2). In particular,
$$\begin{pmatrix} 1/2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1/2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1/2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 1/2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ 3 \end{pmatrix}.$$
The image of J is displayed in magenta in the accompanying figure. $\square$

In plane geometry, a shear mapping (or transvection) is a linear map that displaces each point in a fixed
direction, by an amount proportional to its signed distance from a line that is parallel to that direction. Let
m ∈ ℝ. We can shear ℝ² horizontally by multiplying by the (replacement) matrix
$$\begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix},$$
and we can shear ℝ² vertically by multiplying by the matrix
$$\begin{pmatrix} 1 & 0 \\ m & 1 \end{pmatrix}.$$

Example 3.6.3. Consider the shearing of the unit square and the points u = (2, 1) and v = (1, 2):

(a) horizontally by a factor of 2. The images of these points under this shear are
$$\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}; \qquad \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}.$$

(b) vertically by a factor of −2. The images of u and v under this shear are
$$\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ -3 \end{pmatrix}; \qquad \begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}. \quad\square$$

Reflections across the x- and y-axis are accomplished by multiplying by the scaling (diagonal) matrices
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix},$$
respectively. The interchange (permutation) matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ produces a reflection across the line y = x. The composite
of reflections across the x- and y-axis gives a reflection through the origin. General reflections will be dis-
cussed in the homework.

Example 3.6.4. Consider the reflections of the point v = (2, 1) across:

(a) the x-axis. The image of v under this reflection is
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

(b) the y-axis. The image of v under this reflection is
$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$

(c) the line y = x. The image of v under this reflection is
$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$

(d) the origin. The image of v under this reflection is
$$\begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} -2 \\ -1 \end{pmatrix}. \quad\square$$

Theorem 3.6.5. Let A be a 2 × 2 nonsingular matrix. Then the matrix transformation associated to A
geometrically is a composition of shears, reflections, and stretches/compressions.

The following example illustrates Theorem 3.6.5.


 
Example 3.6.6. Let $A = \begin{pmatrix} -4 & -5 \\ 2 & 4 \end{pmatrix}$. We reduce A to its row reduced echelon form in the following
way:
$$\begin{pmatrix} -4 & -5 \\ 2 & 4 \end{pmatrix} \sim \begin{pmatrix} 2 & 4 \\ -4 & -5 \end{pmatrix} \overset{(\text{Row }2 + 2\,\text{Row }1)}{\sim} \begin{pmatrix} 2 & 4 \\ 0 & 3 \end{pmatrix} \overset{(\frac12\,\text{Row }1)}{\sim} \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} \overset{(\frac13\,\text{Row }2)}{\sim} \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix} \overset{(\text{Row }1 - 2\,\text{Row }2)}{\sim} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
Following this sequence of row operations gives the following elementary factorization of A:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}.$$
To translate this elementary factorization into a geometric interpretation, we read the factorization right-
to-left. Why? Because with the transformation x ↦ Ax = (E₁E₂E₃E₄E₅)x, assuming Eᵢ denotes the ith
elementary factor in the above factorization of A, the first matrix to transform x will be E₅. The next to
transform the resultant vector would be E₄, and onward toward the left. Hence, multiplication by A has the
effect of shearing horizontally by a factor of 2, stretching vertically by a factor of 3, stretching horizontally
by a factor of 2, shearing vertically by a factor of −2, and reflecting across the diagonal line y = x. $\square$

 
 0 1 
Example 3.6.7. Let A =  . We can row reduce A by interchanges Rows 1 and 2, scaling Row
2 1
1 by 21 , and replacing Row 1 with Row 1 − 12 Row 2. This gives the following factorization into elementary
matrices as    
 0 1   2 0   1 1/2 
A=   .
1 0 0 1 0 1

Therefore, A corresponds to the transformation shear horizontally by 12 , horizontally stretch by 2, and reflect
across the line y = x. 

A counterclockwise rotation in ℝ² by angle θ is captured by the matrix
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$
For a clockwise rotation by θ, take the previous matrix's inverse,
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^{-1} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$

Example 3.6.8. Let T : ℝ² → ℝ² be the linear transformation given as counterclockwise rotation by π/2
(or 90°). Find the images under T of u = (4, 1), v = (2, 3), and u + v = (6, 4).

Note that $[T] = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. Then
$$T(\mathbf{u}) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 4 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 4 \end{pmatrix}, \quad T(\mathbf{v}) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} -3 \\ 2 \end{pmatrix}, \quad T(\mathbf{u}+\mathbf{v}) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 6 \\ 4 \end{pmatrix} = \begin{pmatrix} -4 \\ 6 \end{pmatrix}. \quad\square$$
A similar analysis may be used to describe the geometry of matrix transformation in R3 (and higher).
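A small NumPy sketch of the rotation just used (the helper name rot is ours): it builds the rotation matrix for an angle θ and checks that the clockwise rotation is its transpose, i.e. its inverse.

import numpy as np

def rot(theta):
    """Counterclockwise rotation of the plane by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

u = np.array([4.0, 1.0])
R = rot(np.pi / 2)
print(R @ u)                              # approximately (-1, 4), as in Example 3.6.8
print(np.allclose(rot(-np.pi / 2), R.T))  # True: clockwise rotation is the transpose/inverse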

Exercises (Go to Solutions)


For Exercises 1-5, find the matrix for the operator that performs the stated succession of geometric trans-
formations.

♠ 1. Compress by a factor of 2 in the x-direction, then expands by a factor of 5 in the y-direction.


♠ 2. Expands by a factor of 5 in the y-direction, then shears by a factor of 2 in the y-direction.
♠ 3. Reflects about y = x, then rotates through an angle of π radians about the origin.
♠ 4. Reflects about the y-axis, then expands by a factor of 5 in the x-direction, and then reflects about
y = x.
♠ 5. Rotates through π/6 about the origin, then shears by a factor of −2 in the y-direction, and then expands
by a factor of 3 in the y-direction.
For Exercises 6-9, express the matrix as a product of elementary matrices, and then describe the effect of
multiplication by this matrix in terms of shears, compressions/stretches, and reflections. Draw the image of
the unit square under this matrix transformation. Answers may vary.
       
♠ 6. $\begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}$ \quad ♠ 7. $\begin{pmatrix} 1 & 0 \\ 0 & -8 \end{pmatrix}$ \quad ♠ 8. $\begin{pmatrix} 0 & -2 \\ 4 & 0 \end{pmatrix}$ \quad ♠ 9. $\begin{pmatrix} 3 & -6 \\ 6 & 1 \end{pmatrix}$

10. What matrix, if multiplied by, would have the effect on the plane ℝ² of reflection across the line y = −x?
How could you factor this matrix using the matrix transformations discussed in this section?
 
♠ 11. Let $A = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}$, where θ > 0.

(a) Show that $A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^{-1}$.

(b) For the factorization in part (a), describe the sequence of geometric transformations that correspond
to this matrix factorization. Conclude that the geometric transformation corresponding to A is reflection
across the line ℓ through the origin which forms an angle θ with the x-axis.

(c) For θ = 0, π/4, and π/2, show that this agrees with the three reflections discussed in this section.

(d) For θ = π/6, compute the reflection matrix and draw the image of the unit square under this matrix
transformation.

(e) Modifying part (b), find a 2 × 2 matrix B where multiplication by B performs the geometric
transformation of shearing by a factor of m in the direction of the line ℓ through the origin which
forms an angle θ with the x-axis.

12. Prove Theorem 3.6.5.



“The people who oppose your ideas are inevitably those who represent the established order that your ideas
will upset.” – Anthony J. D’Angelo

3.7 Representations of Linear Transformations as Matrices


We have seen before that linear transformations are those maps between vector spaces which preserve the
vector structure, that is, preserve vector addition and scalar multiplication. We have seen that matrix multi-
plication is a linear transformation, hence all the geometric transformations in the previous section are linear
transformations. We will now see that all linear transformations are essentially just matrix multiplication.

   
Example 3.7.1. Let $\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Suppose that T : ℝ² → ℝ³ is a linear transformation
such that
$$T(\mathbf{e}_1) = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \quad\text{and}\quad T(\mathbf{e}_2) = \begin{pmatrix} 3 \\ -5 \\ 0 \end{pmatrix}.$$
With no additional information, find a formula for the image of an arbitrary vector x ∈ ℝ².

Since $\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2$ and since T is a linear transformation, it must hold that
$$T(\mathbf{x}) = T(x_1\mathbf{e}_1 + x_2\mathbf{e}_2) = x_1 T(\mathbf{e}_1) + x_2 T(\mathbf{e}_2) = x_1\begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + x_2\begin{pmatrix} 3 \\ -5 \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 + 3x_2 \\ 2x_1 - 5x_2 \\ 3x_1 \end{pmatrix} = \begin{pmatrix} 1 & 3 \\ 2 & -5 \\ 3 & 0 \end{pmatrix}\mathbf{x}.$$
In particular, T can be represented as a matrix transformation. $\square$

The argument applied in the previous example works in general.

Theorem 3.7.2. Let $\mathbf{e}_i$ be the vector in $F^n$ whose ith entry is 1 and all other entries are 0. Let
T : F^n → F^m be a linear transformation and let $A = \begin{pmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) & \cdots & T(\mathbf{e}_n) \end{pmatrix}$. Let x ∈ F^n. Then
$$T(\mathbf{x}) = A\mathbf{x}.$$
This matrix A is called the standard matrix for the linear transformation T and is often denoted [T ].
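A short Python sketch of Theorem 3.7.2 (the helper name standard_matrix is ours, not the text's): it builds [T] column by column by applying T to the standard basis vectors, after which T(x) is just a matrix–vector product.

import numpy as np

def standard_matrix(T, n):
    """Return the standard matrix [T] of a linear map T: R^n -> R^m.

    T is any Python function taking a length-n NumPy vector and
    returning a vector; the columns of [T] are T(e_1), ..., T(e_n).
    """
    cols = [T(np.eye(n)[:, i]) for i in range(n)]
    return np.column_stack(cols)

# The map of Example 3.7.3: T(x1, x2) = (3x1 + x2, 2x1 - 4x2)
T = lambda x: np.array([3*x[0] + x[1], 2*x[0] - 4*x[1]])
A = standard_matrix(T, 2)
print(A)                                                     # [[ 3.  1.] [ 2. -4.]]
print(A @ np.array([1.0, 2.0]), T(np.array([1.0, 2.0])))     # both give (5, -6)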

Example 3.7.3. Find the standard matrix A for the linear transformation T (x₁, x₂) = (3x₁ + x₂, 2x₁ − 4x₂).

Since T (e₁) = (3, 2) and T (e₂) = (1, −4), we have that $[T] = A = \begin{pmatrix} 3 & 1 \\ 2 & -4 \end{pmatrix}$ and
$$T(\mathbf{x}) = A\mathbf{x} = \begin{pmatrix} 3 & 1 \\ 2 & -4 \end{pmatrix}\mathbf{x}. \quad\square$$

Recall that if S : F^m → F^p and T : F^n → F^m are functions, then their composite S ◦ T is the function
given by the rule (S ◦ T )(x) = S(T (x)) where x ∈ F^n. Functions are often viewed, by analogy, as machines
on a conveyor belt. As an element x in the domain of T rolls into the machine T, it is transformed into the
element T (x). The composite S ◦ T is then the machine formed by placing the machines S and T next to each
other on the conveyor belt. The following theorem states that matrix representations are compatible with
function composition.

Theorem 3.7.4. Let S : F m → F p and T : F n → F m be linear transformations with standard matrices A


and B, respectively. Let x ∈ F n . Then

S ◦ T (x) = S(T (x)) = ABx,

that is, the matrix which represents the composite of two linear transformations is the product of their matrix
representations.

Recall that a linear transformation T : F^n → F^m is one-to-one if and only if T (x) = T (y) implies that
x = y, if and only if ker T = {0}, that is, the only vector mapping to zero is zero itself. Given the matrix
representation [T ] = A, we see that ker T = Nul A, that is, the vectors mapped to 0 by T are exactly those
vectors whose product with A is 0. Hence, finding the null space of A is equivalent to finding the kernel of
T. In particular, a linear transformation is one-to-one if and only if the columns of its matrix representation
are linearly independent.

Example 3.7.5. Let T : ℝ² → ℝ³ be a transformation given by the rule
$$T(x, y) = (x + y,\ 0,\ 2x + 3y).$$
Then
$$[T] = \begin{pmatrix} T(\mathbf{e}_1) & T(\mathbf{e}_2) \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \\ 2 & 3 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 \\ 2 & 3 \\ 0 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}.$$
The row-reduced echelon form of [T ] shows that the columns of [T ] are linearly independent since there is a
pivot in each column. Therefore, T is one-to-one, that is, ker T = {0}. $\square$

Example 3.7.6. Let S : ℝ³ → ℝ² be a linear transformation given by the rule
$$S(x, y, z) = (x + y - 2z,\ -y + z).$$
Then
$$[S] = \begin{pmatrix} S(\mathbf{e}_1) & S(\mathbf{e}_2) & S(\mathbf{e}_3) \end{pmatrix} = \begin{pmatrix} 1 & 1 & -2 \\ 0 & -1 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & -2 \\ 0 & 1 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{pmatrix}.$$
The row-reduced echelon form of [S] shows that the columns of [S] are linearly dependent since there is no
pivot in the third column. Therefore, S is not one-to-one. In particular, ker S = Span{(1, 1, 1)ᵀ}. $\square$

The homogeneous system Ax = 0 has a nontrivial solution when A has a non-pivot column. The nullity
of A is the number of non-pivot columns in A. If A is an m × n matrix and m < n, that is, A has more
columns than rows, then A necessarily has non-pivot columns. Hence, the nullity(A) ≥ 1.

Recall that a linear transformation T : F n → F m is onto if and only if for all vectors b ∈ F m there exists
a vector x ∈ F n such that T (x) = b if and only if im(T ) = F m , that is, the set of images of T is the whole
target space F m . Given the matrix representation [T ] = A, we see that im T = Col A, that is, the vectors
which are linear combinations of the columns of A are exactly the vectors which can come out of T . Hence,
finding the column space of A is equivalent to finding the image of T . In particular, a linear transformation
is onto if and only if the columns of its matrix representation span F m .

Example 3.7.7. Let T : ℝ² → ℝ³ be the transformation given in Example 3.7.5. The row-reduced
echelon form of [T ] shows that [T ] has no pivot in the third row. Therefore, T is not onto. In particular,
(0, 1, 0)ᵀ ∉ im T. $\square$

Example 3.7.8. Let S : R3 → R2 be the transformation given in Example 3.7.6. The row-reduced echelon
form of [S] shows that [S] has a pivot in each row. Therefore, S is onto. In particular, im S = R2 . 
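These pivot-counting criteria are easy to automate. Below is a hedged SymPy sketch (the helper name one_to_one_and_onto is ours): a linear map with standard matrix A is one-to-one exactly when rank A equals the number of columns, and onto exactly when rank A equals the number of rows.

from sympy import Matrix

def one_to_one_and_onto(A):
    """Return (is_one_to_one, is_onto) for the matrix transformation x -> A x."""
    A = Matrix(A)
    r = A.rank()                  # number of pivot columns (= number of pivot rows)
    return r == A.cols, r == A.rows

# [T] from Example 3.7.5 and [S] from Example 3.7.6
print(one_to_one_and_onto([[1, 1], [0, 0], [2, 3]]))      # (True, False)
print(one_to_one_and_onto([[1, 1, -2], [0, -1, 1]]))      # (False, True)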

The inconsistency of the equation Ax = b may occur when A has a row of zeros in echelon form but the
corresponding position in the augmented column is nonzero. In this case, consistency is dependent on the
choice of b and at least one choice of b will make Ax = b inconsistent. These rows of zeros in the echelon
form correspond to non-pivot rows of A. Let the conullity of A denote the number of non-pivot rows of A.
If A is an m × n matrix and m > n, that is, A has more rows than columns, then A necessarily has non-pivot
rows. Hence, the conullity(A) ≥ 1.

Proposition 3.7.9. Let T : F^n → F^m be a linear transformation.

(i) If m > n, that is, [T ] has more rows than columns, then T cannot be onto.

(ii) If m < n, that is, [T ] has more columns than rows, then T cannot be one-to-one.

Exercises (Go to Solutions)


For Exercises 1-5, for the linear transformations T find its standard matrix [T ].
♠ 1. T : R2 → R2 : T (x1 , x2 ) = (2x1 − x2 , −4x2 ).
 
♠ 2. T : ℝ³ → ℝ⁴ : $T\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_1 + 2x_2 + x_3 \\ 3x_1 + 14x_3 - 5x_2 \\ 3x_2 - 5x_1 - 18x_3 \\ 19x_2 - 7x_1 - 40x_3 \end{pmatrix}$.

♠ 3. T : C → C4 : T (z) = (z, iz, −z, −iz).

♠ 4. T : Z42 → Z2 : T (x1 , x2 , x3 , x4 ) = x1 + x2 + x3 + x4 .
♠ 5. T : Z35 → Z35 : T (x1 , x2 , x3 ) = (x3 , x2 , x1 ).
For Exercises 6-6, find the standard matrix of the composite function S ◦ T .

♠ 6. T : R2 → R3 : T (x1 , x2 ) = (2x2 , 3x1 , 2x1 − 3x2 ) and S : R3 → R2 : S(y1 , y2 , y3 ) = (y3 − 2y2 , y1 − 2y3 )
For Exercises 7-11, determine if the given linear transformation from Exercises 1-5 is one-to-one or onto.
Find a basis for its kernel and for its image.

♠ 7. Exercise 1 ♠ 8. Exercise 2 ♠ 9. Exercise 3 ♠ 10. Exercise 4 ♠ 11. Exercise 5

QUICK! For Exercises 12-18, determine in less than 10 seconds if the linear transformation CANNOT be
one-to-one and if it CANNOT be onto.
12. T : R3 → R3 : T (x, y, z) = (x, y, z)

13. T : R3 → R2 : T (x1 , x2 , x3 ) = (x1 + 2x2 , x3 − 3x2 )


14. T : R3 → R4 : T (x1 , x2 , x3 ) = (2x1 + 3x3 , x2 + 4x1 + x3 , x3 + x1 + x2 , x1 + 3x2 + x3 )
15. T : C → C3 : T (z) = (z, z, z)
16. T : C → C3 : T (z) = ((3 − i)z, (−2 + 2i)z, z)

17. T : Z35 → Z5 : T (x, y, z) = x + y + z


18. T : Z27 → Z27 : T (x, y) = (4x + 2y, 3y)
Chapter 4

Orthogonality

In this chapter, we will begin to develop the geometric framework of lengths and angles through the lens
of linear algebra. While the notions of affine geometry were introduced earlier in Chapter 2, this chapter
will develop the notions of Euclidean geometry. As such, this geometric development will not be entirely
possible for the finite fields Zp we have often considered, although some notions of orthogonality are available
for general fields. In this chapter the only fields F we will consider for scalars will be ℝ and ℂ.


“Success isn’t measured by money or power or social rank. Success is measured by your discipline and inner
peace.” – Mike Ditka

4.1 Inner Products


Definition 4.1.1. We define the dot product · : ℝⁿ × ℝⁿ → ℝ of two vectors u, v ∈ ℝⁿ by the rule
$$\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^\top\mathbf{v} = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = u_1v_1 + u_2v_2 + \ldots + u_nv_n.$$

   
 1   −5 
   
Example 4.1.2. Given u =  2  and v =  0 
  
, their dot product is
   
3 3

u · v = 1(−5)) + 2(0) + 3(3) = −5 + 0 + 9 = 4 . 

Let A be anm × n matrix and let x ∈ Rn . Let r 1 , r 2 , . . . , r m ∈ Rn be the row vectors ofA. Thus,

 r1   r1 · x 
   
 r2   r2 · x 
   
A=  .. 
. Then we can define the matrix-vector product using dot products: Ax = 
 ..
.

 .   . 
   
   
rm rm · x

There are many algebraic reasons to extend the dot product over other fields, such as Zp and ℂ, like in the
above redefinition of matrix multiplication. For another example, the dot product is used in the creation of
error-correcting codes, which involve vectors in Z₂ⁿ, but the geometric benefits of the dot product are absent
over these fields, and those benefits are the very goal of this chapter. Furthermore, over complex vector spaces, we vowed
to never use the matrix transpose ⊤. Instead, we use the conjugate transpose ∗. The reason we made such
a vow will be presented below in this section. This changes how we compute dot products of complex vectors.

Definition 4.1.3. We define the Hermitian product† · : ℂⁿ × ℂⁿ → ℂ of two vectors u, v ∈ ℂⁿ by the
rule
$$\mathbf{u}\cdot\mathbf{v} = \mathbf{u}^*\mathbf{v} = \begin{pmatrix} \overline{u_1} & \overline{u_2} & \cdots & \overline{u_n} \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \overline{u_1}v_1 + \overline{u_2}v_2 + \ldots + \overline{u_n}v_n.$$

Example 4.1.4. Let u = (1 + i, i, 3 − i) and v = (1 + i, 2, 4i). Find u · v and v · u.
$$\mathbf{u}\cdot\mathbf{v} = \overline{(1+i)}(1+i) + \bar{i}(2) + \overline{(3-i)}(4i) = (1-i)(1+i) - 2i + (3+i)(4i) = 2 - 2i + 12i - 4 = -2 + 10i$$
$$\mathbf{v}\cdot\mathbf{u} = \overline{(1+i)}(1+i) + \bar{2}(i) + \overline{(4i)}(3-i) = (1-i)(1+i) + 2i - (4i)(3-i) = 2 + 2i - 12i - 4 = -2 - 10i \quad\square$$
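A quick NumPy check of these conventions (a sketch; np.vdot conjugates its first argument, matching Definition 4.1.3):

import numpy as np

u = np.array([1 + 1j, 1j, 3 - 1j])
v = np.array([1 + 1j, 2, 4j])

print(np.vdot(u, v))   # (-2+10j): conjugates u, matching the Hermitian product u . v
print(np.vdot(v, u))   # (-2-10j): the conjugate of the previous value

x = np.array([1.0, 2.0, 3.0])
y = np.array([-5.0, 0.0, 3.0])
print(np.dot(x, y))    # 4.0: the ordinary dot product on real vectors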




Note that since ℝⁿ ⊆ ℂⁿ and x̄ = x whenever x ∈ ℝ, the definition of the Hermitian product generalizes
the notion of the dot product. But as the dot product is a possible operation on ℂⁿ (and there are situations
where one might allow the dot product of complex vectors), we will not refer to this generalization as the dot
product, but instead as the Hermitian product. Both the dot product over ℝⁿ and the Hermitian product
over ℂⁿ will be denoted u · v. In an attempt to unify the vocabulary here, we will call the dot product on
ℝⁿ and the Hermitian product on ℂⁿ the (standard) inner product†† on Fⁿ, where F could be ℝ or ℂ.

Theorem 4.1.5. Let F be ℝ or ℂ. Let u, v, w ∈ Fⁿ and let c ∈ F. Then

(i) u · (v + w) = u · v + u · w; \quad (ii) u · (cv) = c(u · v);

(iii) $\mathbf{u}\cdot\mathbf{v} = \overline{\mathbf{v}\cdot\mathbf{u}}$; \quad (iv) u · u ≥ 0, and u · u = 0 if and only if u = 0.

Properties (i) and (ii) above show us that inner multiplication on the left is a linear transformation
Fⁿ → F for each fixed vector u ∈ Fⁿ. When F = ℝ, (iii) simplifies to just u · v = v · u, that is, the
dot product is symmetric. Combining these three properties shows that (u + v) · w = u · w + v · w and
(cu) · v = c(u · v) whenever u, v, w ∈ ℝⁿ and c ∈ ℝ. Thus, the dot product is linear in the first factor
and linear in the second factor, that is, we say the dot product is bilinear. These properties derive from the
properties of transposition.

Conversely, the Hermitian product is conjugate-symmetric, that is, the vectors commute at the price of
complex conjugation, as shown in (iii). As such, inner multiplication on the right is not exactly a linear
transformation. Like with real vector spaces, (u + v) · w = u · w + v · w still holds for all u, v, w ∈ ℂⁿ.
On the other hand, we get (cu) · v = c̄(u · v), that is, we can only factor scalars from the first factor if
we take their conjugate. Thus, the Hermitian product is linear in the second factor and conjugate-linear in
the first factor, that is, we say the Hermitian product is sesquilinear. These properties derive from the
properties of the conjugate transpose.

Lastly, (iv) is commonly referred to as the positive-definite property‡, positive because u · u ≥ 0 and definite
because u · u = 0 if and only if u = 0.
Definition 4.1.6. The length (or norm) of a vector v ∈ Fⁿ is the nonnegative scalar ‖v‖ defined by
$$\|\mathbf{v}\| = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}.$$
We say that v is a unit vector if ‖v‖ = 1.

It is useful to note that ‖v‖² = v · v.

Theorem 4.1.7. Let v ∈ Fⁿ and c ∈ F. Then
$$\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|.$$

 
 1 
 
0 
 

Example 4.1.8. Let v =   ∈ R4 . Then the length of v is
 

 2 

 
−2
√ p √ √
kvk = v·v = 1(1) + 0(0) + 2(2) − 2(−2) = 1 + 4 + 4 = 9 = 3 .
138 CHAPTER 4. ORTHOGONALITY
 
 1/3 
 
 0 
 
1
Since the length of v is 3, v is not a unit vector. On the other hand, let u = v = 
 . Then
3  2/3 

 
 
−2/3

1 1 1
kuk = v = kvk = (3) = 1 .
3 3 3

Therefore, u is a unit vector in the same direction as v. 

The process of constructing a unit vector in the same direction as a given vector is called normalization.
1
The normalization of any nonzero vector v is given by v. The zero vector cannot be normalized. In
kvk
particular, the zero vector does not point in any direction.
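A small NumPy sketch of normalization (the helper name normalize is ours): it rescales a nonzero vector to unit length and raises an error on the zero vector, which cannot be normalized.

import numpy as np

def normalize(v):
    """Return the unit vector in the direction of v; v must be nonzero."""
    length = np.linalg.norm(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return v / length

v = np.array([1.0, 0.0, 2.0, -2.0])
u = normalize(v)
print(np.linalg.norm(v), np.linalg.norm(u))   # 3.0 and 1.0, as in Example 4.1.8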

Example 4.1.9. Let u = (1 + i, i, 3 − i). Find ‖u‖.
$$\|\mathbf{u}\| = \sqrt{\overline{(1+i)}(1+i) + \bar{i}\,i + \overline{(3-i)}(3-i)} = \sqrt{(1-i)(1+i) + (-i)i + (3+i)(3-i)} = \sqrt{2 + 1 + 10} = \sqrt{13}. \quad\square$$
Definition 4.1.10. For u, v ∈ Fⁿ, the distance between u and v, denoted dist(u, v), is the length of
the vector u − v, that is,
$$\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\|.$$

   
Example 4.1.11. Let $\mathbf{u} = \begin{pmatrix} 7 \\ 1 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$. Then
$$\operatorname{dist}(\mathbf{u}, \mathbf{v}) = \|\mathbf{u} - \mathbf{v}\| = \left\|\begin{pmatrix} 4 \\ -1 \end{pmatrix}\right\| = \sqrt{4^2 + (-1)^2} = \sqrt{17}. \quad\square$$

Exercises (Go to Solutions)


For Exercises 1-29, compute the given inner products.
             
1 2 1 2 7 2
1.  · ♠ 2. 3   · 2 
             
     
−1 −3 −2 −3 3.  6  ·  −1 
  

   
2 5
               
 7   1  7    1   1   3   1 
           
♠ 4.   ·  −1   · 2  −1  ♠ 6.   −5  +  2  · 
     
1  5.  0 
5 
 

   
 1 
    

  
 
 
 


     
−5 2 0 −3 −1
−5 2
                
  −1   1   0   2   7    7    7 
                
♠ 7. 2  −1  ·  −2  +  −5 
        
 1   1 
    
  1    1 
   
♠ 8.  ·  · 3 

         

 2 
9.  
  


3 1 3 
 −3   −1 
 

 
  −3    −1 
   

         
2 2 2 2
            
 1 2 3   −2   −1   −i   i   i   2−i
     ♠ 11.  ·    


♠ 10. 
 0
 −1   1  ·  −1 
−1      1+i 1  2i  ·  i
♠ 12.    

        
1 0 1 −1 3 3 3i
                 
1 + i 1−i  1 + i 1 − i 
 i   −1   2−i   −1  

 
  
 
 
  
   
 2−i   2+i    2 − i   1  2 + i 
                 
 2i  ·  2 − i
♠ 13.     ♠ 14. 
 i
· 2 − i  15. 
 ·  16. i   ·   
           2  
 3+i   3−i    3 + i    3−i 
        
3 4i 3i 4i
         
         
4−i 4+i 4−i 4+i

For Exercises 17-26, find the norm of the vector, that is, kuk.
         
3  2  −2
  7  
   1   
1 
         
17. 
 −1 
 ♠ 18. u = 
 1 
 
♠ 19.   20. 2 
 0  − 3 1 
 

 
−3 
        
5 5 −3 3
 
 
2
        
 3   1   −2   1   i 
         
21. 
  + 3 0
−1   + 1 
   ♠ 22. 
 −i 
 23.  2i 


         
5 −3 3 2i 3
       
1 + i 1−i 
 2−i   −1  

 
  
   
♠ 25.   2−i   2+i 
   
24. 
 i

 2−i 



   26. 2 

−
 


 3+i   3−i 
3i 4i 

 
 


4−i 4+i

♠ 27. Compute dist(u, v), if


   
 1   5 
   
u= 1 , v =  2 
  
   
1 2

♠ 28. Compute dist(u − v, 2w + x), if


       
 1   0   3   0 
       
u= −2 , v =  −5 , w =  −1 , x =  1 
      
       
1 3 −2 0

29. Compute dist((u · v)w, 2x), if


       
 3   1   −2   0 
       
u= −1 , v =  −2 , w =  1
    ,
 x=
 1 

       
−2 3 −1 0

    
 1   x   1/2 
     
 1 · 2
30. Find x such that   = 1. 31. For which real numbers x make  1/3 
 a unit
  

     
x −3 x
vector?

† Many textbooks alternatively define the Hermitian product as u · v = u^⊤ v̄, conjugating the second factor instead of the
first. Although this does not at all change the theory and applications of the Hermitian product, it does change intermediate
calculations. Be cautious if comparing with other sources since there is no universal consensus.
†† More generally, an inner product on F n is any function F n × F n → F which satisfies the axioms of Theorem 4.1.5.
‡ This property is the reason we will not be considering finite fields in this chapter. The dot product over any field is always
symmetric and bilinear. The positive-definite condition fails for these fields and many others. Positive-definiteness is needed to
establish the geometric properties we seek.

“To be trusted is a greater compliment than being loved.” – George MacDonald

4.2 Orthogonality

Definition 4.2.1. Two nonzero† vectors u, v ∈ F n are orthogonal if u · v = 0.

    

 1   1  2
 
     
Example 4.2.2.  2 , v =  1
Given the vectors u =    , and w =  3 , determine if u is orthogonal
  
     
3 −1 5
to v or w.

u · v = 1(1) + 2(1) + 3(−1) = 3 − 3 = 0.


Thus, u is orthogonal to v.
u · w = 1(2) + 2(3) + 3(5) = 23 6= 0.
Thus, u is not orthogonal to w.

v · w = 1(2) + 1(3) − 1(5) = 5 − 5 = 0.

Thus, v is orthogonal to w. 

Theorem 4.2.3 (Pythagorean Theorem). If two vectors u, v ∈ Fⁿ are orthogonal††, then
$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.$$
Proof. By assumption, $\mathbf{u}\cdot\mathbf{v} = 0 = \overline{\mathbf{u}\cdot\mathbf{v}} = \mathbf{v}\cdot\mathbf{u}$. Note that ‖u + v‖² = (u + v)·(u + v) = u·u + u·v + v·u + v·v =
u·u + 0 + 0 + v·v = ‖u‖² + ‖v‖². $\square$

We call the above equation the Pythagorean Theorem because of how it resembles the trigonometric equation;
after all, it states that a sum of squares is equal to a square under certain conditions, but there is a
deeper geometric connection. First of all, if two vectors u and v are linearly independent, then they span
a plane in ℝⁿ, but they also form a triangle with sides u, v, and u + v, where the three points corre-
spond to the tail of u (which is also the tail of u + v), the head of u (which is also the tail of v), and
the head of v (which is also the head of u + v). The lengths of the three sides of this triangle are ‖u‖,
‖v‖, and ‖u + v‖. Therefore, the above theorem is saying that the sum of squares of two sides of the
triangle equals the square of the other side. This is exactly the statement of the Pythagorean Theorem
from Trigonometry, but furthermore the Trigonometric Pythagorean Theorem guarantees this can only hap-
pen if there is a right angle present, that is, the angle between the vectors u and v must be a right angle.
This justifies the notion that two vectors are orthogonal if and only if the angle between the two vectors is 90°.

An alternative way of expressing a plane in F 3 is by orthogonality. For example, we can construct a


plane in F 3 by taking all vectors in F 3 that are orthogonal to some fixed vector n = (a, b, c), called the
normal vector of the plane. If x = (x, y, z) is a vector on the plane, then

n · x = ax + by + cz = 0.

This produces the equation of a plane which passes through the origin (notice that 0 is a solution here). In
order to have the plane pass through the particular vector x0 we replace x with x − x0 , that is,

n · (x − x0 ) = 0
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0
ax + by + cz = d,

gives an equation of the plane in F 3 through the point (x0 , y0 , z0 ) and orthogonal to n.

Example 4.2.4. The solutions to the equation

3(x − 4) + 5(y + 5) − 2z = 0 ⇒ 3x + 5y − 2z = −13

is the plane containing the point (4, −5, 0) and is perpendicular to (3, 5, −2). 

Similarly, this strategy will create a hyperplane in F n . In particular, the hyperplane containing x0 and
is orthogonal to n will be the solutions x to the linear equation, given in point-normal form,

n · (x − x0 ) = 0. (4.2.1)

Definition 4.2.5. A set of nonzero vectors {v 1 , . . . , v r } ⊆ F n is called an orthogonal set if each pair of
distinct vectors is orthogonal, that is, v i · v j = 0 whenever i 6= j. An orthogonal set of vectors is called
orthonormal if each vector in the set is also a unit vector.

Of course, any orthogonal set can be transformed into an orthonormal set by normalizing each vector,
1
that is, replacing each vector v with v.
kvk
     
 1   1   −5 
     
Example 4.2.6. Let v 1 =  . Then the set {v 1 , v 2 , v 3 } is orthogonal
 2  , v2 =  1  , v3 =  4
   

     
3 −1 −1
with respect to the dot product. To see this, we check the three dot products:

v1 · v2 = 1+2−3=0
v1 · v3 = −5 + 8 − 3 = 0
v2 · v3 = −5 + 4 + 1 = 0.

Theorem 4.2.7. If S = {v 1 , . . . , v p } ⊆ F n is an orthogonal set of nonzero vectors, then S is linearly


independent.

The requirement that an orthogonal set contain only nonzero vectors is critical. Note that 0 · v = 0 for
any vector v. Thus, as we defined an orthogonal pair, the zero vector is orthogonal to every vector. This
is a truly exceptional quality shared by no other vector. For this reason, the zero vector must be treated
separately in the theory of orthogonality, and it is thus excluded from orthogonal sets. While this may seem
somewhat arbitrary, this is truly consequence driven. The main consequence is that Theorem 4.2.7 would
fail if 0 were allowed in. It should also be noted that for a singleton {v}, this set is orthogonal if and only if
v ≠ 0, the same condition that guarantees that {v} is linearly independent. Notice that the condition that
all pairs be orthogonal is vacuously true in this case as there are no pairs which are not orthogonal (since
there are no pairs at all). This also holds for the empty set ∅. It has no pairs of distinct vectors which are
not orthogonal; hence, the empty set is an orthogonal set.

An orthogonal (orthonormal) spanning set of F n is necessarily a basis, called an orthogonal (orthonor-


mal) basis. For example, the standard basis in F n is an orthonormal basis.

Theorem 4.2.8. Let W be a subspace of F n . Let W ⊥ = {x ∈ F n | w · x = 0, ∀w ∈ W }, called the


orthogonal complement of W . Then W ⊥ is also a subspace of F n .

Example 4.2.9. Let W be the z-axis in R3 . Then W ⊥ is the xy-plane. 

Theorem 4.2.10. Let A be an m × n matrix. Then

(Row A)⊥ = Nul A.‡

Example 4.2.11. Let W be the subspace of ℝ⁶ spanned by the vectors
$$\mathbf{w}_1 = (1, 3, -2, 0, 2, 0), \quad \mathbf{w}_2 = (2, 6, -5, -2, 4, -3), \quad \mathbf{w}_3 = (0, 0, 5, 10, 0, 15), \quad \mathbf{w}_4 = (2, 6, 0, 8, 4, 18).$$
Construct a basis for W⊥.

We can arrange these vectors into a matrix A:
$$A = \begin{pmatrix} 1 & 3 & -2 & 0 & 2 & 0 \\ 2 & 6 & -5 & -2 & 4 & -3 \\ 0 & 0 & 5 & 10 & 0 & 15 \\ 2 & 6 & 0 & 8 & 4 & 18 \end{pmatrix} \sim \begin{pmatrix} 1 & 3 & 0 & 4 & 2 & 0 \\ 0 & 0 & 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$
such that W = Row(A). By the previous result, W⊥ = Row(A)⊥ = Nul(A). As we know how to find a
basis for a null space given the echelon form of the matrix,
$$W^\perp = \operatorname{Span}\{(-3, 1, 0, 0, 0, 0),\ (-4, 0, -2, 1, 0, 0),\ (-2, 0, 0, 0, 1, 0)\}. \quad\square$$
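A hedged SymPy sketch of this computation (the setup is ours, not the text's): stacking the spanning vectors as rows and taking the null space gives a basis of the orthogonal complement.

from sympy import Matrix

# Rows span W; W-perp = Nul(A) by Theorem 4.2.10
A = Matrix([
    [1, 3, -2,  0, 2,  0],
    [2, 6, -5, -2, 4, -3],
    [0, 0,  5, 10, 0, 15],
    [2, 6,  0,  8, 4, 18],
])

for b in A.nullspace():       # basis vectors of W-perp
    print(b.T)
# Each printed vector is orthogonal to every row of A, i.e. A * b == 0.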

Despite our reluctance to use any fields other than R and C, the entirety of this section‡‡ is transferable
to any field using the dot product or Hermitian product.

Exercises (Go to Solutions)


   
 k   4 
♠ 1. Find k so that  ·  is orthogonal.
1 3

For Exercises 2-5, determine whether the two given vectors are orthogonal or not. If not, adjust a value in
one of the vectors to make them orthogonal. Answers may vary.

2. $\begin{pmatrix} 4 \\ 2 \end{pmatrix}, \begin{pmatrix} 4 \\ -8 \end{pmatrix}$ \quad 3. $\begin{pmatrix} 6 \\ 1 \end{pmatrix}, \begin{pmatrix} 8 \\ -24 \end{pmatrix}$ \quad 4. $\begin{pmatrix} -3 \\ -5 \end{pmatrix}, \begin{pmatrix} 5 \\ -3 \end{pmatrix}$ \quad 5. $\begin{pmatrix} 2 \\ 8 \end{pmatrix}, \begin{pmatrix} 4 \\ 1 \end{pmatrix}$

For Exercises 6-9, find a hyperplane in F n containing the vector x0 , listed first, and whose normal vector is
n, listed second.
           
 1   3   0   1   1   −3 
           
♠ 6.   2 ,  −2 
    0   −2 
     2   −6 
♠ 7.  ,      
       
♠ 8.  3
   
,
   
3 0  1   2 
     
 3 
 
       
2 −1  4



 2 
 
   
5 −5

   
 10   1 
   
 −8   3 
   
   
9.  6 ,
   
 5 
   
   
 −4   7 
   
   
2 9

For Exercises 10-17, find a basis for the orthogonal complement of the subspace W , given below. Answers
may vary.
       
 0


 1 




 3 −1 


  
  




 
 


10. Span  2  ,  −1 
    ♠ 11. Span  −1  ,  1 
   

    
 
    


 
 
 

 0 1   9 −3 
     
1  3   −3 

  
 

  
  



 
    

♠ 12. Span 
 
3  3   −2 
 
   

♠ 13. Span 
  , 

  
    
 
−1   −12   12 
 
 

    


    

 
6 −4 

 

     
 x1
 
  x1
 

♠ 14.  2x1 + 3x2 = 0 15.  x1 + x2 = 0
   
 
 x2
 
  x2
 

     



 x1  




 x1  



 

 

 


♠ 16. 
 x 2

 3x 1 − x2 + 2x 3 = 0 ♠ 17. 
 x 2

 x 1 + x 2 = 0, x1 − x 3 = 0

  
 
  


 
 
 

 x   x 
3 3

For Exercises 18-18, find a hyperplane in F n containing the given vectors x0 , x1 , x2 , . . .. [Hint: If W =
Span{x0 − x1 , x0 − x2 , . . .}, find a normal vector by computing a basis for W ⊥ .]
18. (1, 2, 3), (−1, 4, 1), (4, −2, 8), (0, 1, 6)

† When it comes to defining orthogonality, we have to exclude the zero vector. Note that 0 · v = 0 for every vector v. As

such, if we allowed the zero vector to be orthogonal then it would be orthogonal to every vector, including itself, which would
provide needless counterexamples to theorems and geometric interpretations that will follow.
†† For real vector spaces, the Pythagorean Theorem becomes an “if and only if” statement, that is, Two vectors u, v ∈ Rn

are orthogonal if and only if


ku + vk2 = kuk2 + kvk2 .
The proof of the other direction is based upon the observation that the equality of squared norms holds only if u · v + v · u = 0.
In the real case, of course, the left-hand side simplifies to 2(u · v), which then implies that u · v = 0. In the complex
case, counterexamples exist. For example, let u = (1, 1, 1, 1) and v = (0, 0, 0, i). Note that kuk = 2 and kvk = 1. Hence,
kuk2 + kvk2 = 5. On the other hand, ku + vk2 = 1 + 1 + 1 + 2 = 5, but u · v = i 6= 0. The defect in extending the proof to all
complex vectors here is that equality will hold whenever the Hermitian product between the two vectors is purely imaginary.
Of course, the only purely imaginary number which is also real is zero, hence why it holds in the real case.
‡ For real vector spaces, recall that Row(A) = Col(A^⊤). With this perspective, Theorem 4.2.10 tells us that Nul(A)⊥ =
Col(A^⊤). Hence, the orthogonal complement ⊥ and transpose ⊤ are, in some way, dual notions of each other.
‡‡ There is one important partial exception to this claim, namely the Pythagorean Theorem. We say this as a partial exception

because while the algebraic statement and proof still remain valid (note that kuk2 = u·u and is valid for any vector space which
we desire to introduce the dot product), the geometric interpretations about the norm (if it is defined via the dot product) do
not necessarily transfer to general vector spaces.

“We can never obtain peace in the outer world until we make peace with ourselves.” – Dalai Lama

4.3 Outer Products


We have already discussed some special families of matrices, e.g. nonsingular, elementary, diagonal, and
triangular matrices. In this section, we will explore some more special types of matrices and some of their
properties. These matrices are important in our study of orthogonality and later when we study eigenvalues.

Definition 4.3.1. A real matrix A is symmetric if A^⊤ = A. A complex matrix A is Hermitian if A∗ = A.

   
 1 2 3   1 i 1 + i 
    > ∗
Example 4.3.2. Let A =  2 4 5  and B =  −i
   −5 2 − i  . Since A = A and B = B
   
3 5 6 1−i 2+i 3
(convince yourself!), A is a symmetric matrix and B is an Hermitian matrix. 

The theory of symmetric and Hermitian matrices is almost identical. We will primarily focus on real
matrices below and will only specify Hermitian matrices when a critical difference arises.

Theorem 4.3.3. If A and B are n × n symmetric (Hermitian) matrices and r ∈ ℝ (r ∈ ℂ), then

(i) A + B is symmetric (Hermitian);

(ii) rA is symmetric (Hermitian);

(iii) AB is symmetric (Hermitian) if and only if AB = BA;

(iv) if A is invertible, then A⁻¹ is symmetric (Hermitian).
   
 1 2   5 3 
Example 4.3.4. Note that A =   and B =   are symmetric matrices. Likewise,
2 0 3 1
       
 1 2   5 3   1+5 2+3   6 5 
A+B = + = = 
2 0 3 1 2+3 0+1 5 1

is likewise symmetric, as well as    


 1 2   5 10 
5A = 5  = . 

2 0 10 0

Theorem 4.3.5. For any matrix A, the matrices A> A and AA> are symmetric (A∗ A and AA∗ are Her-
mitian).
 
 1 −2 4 
Example 4.3.6. Let A =  . Then
3 0 −5
   
1 3   10 −2 −11 
 1 −2 4  
 
>
 
A A=
 −2 0 
 =
 −2 4 −8 
  3 0 −5  
4 −5 −11 −8 41
4.3. OUTER PRODUCTS 147

and  
  1 3   
 1 −2 4    21 −17 

AA> =   −2
 =
0  . 

3 0 −5   −17 34
4 −5

Definition 4.3.7. Let P : Fⁿ → Fⁿ be a linear transformation. We say that P is a projection if P ◦ P = P.

Let A be the standard matrix of P. Then the property that P ◦ P = P translates to mean that A² = A. Any
matrix (necessarily square) which satisfies this identity is called an idempotent matrix and is necessarily
the standard matrix of a projection.

Let x ∈ Fⁿ. Then y = P (x) is an arbitrary element of the range of P. By definition, we have that
$$P(\mathbf{y}) = P(P(\mathbf{x})) = P\circ P(\mathbf{x}) = P(\mathbf{x}) = \mathbf{y},$$
that is, a projection is exactly a linear transformation that fixes its image. Essentially this means that while
some of the coordinates of x are unaltered, the other coordinates are forgotten.

Geometrically, a projection is a map from the ambient space which projects onto some subspace
(the range of the projection). In the process of projecting into the subspace, some information
about the vectors is removed so that they can fit inside the smaller space. The image of a vector v
is the shadow cast by v onto the subspace associated with the projection P.

Example 4.3.8. For example, the matrices
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
are easily seen to be idempotent matrices and hence correspond to projections. The first matrix is the
projection in ℝ² onto the x-axis where the y-coordinate is forgotten and replaced with 0. Any point already
on the x-axis has the form (x, 0) and is unaffected by the projection.

The second matrix is a projection in ℝ³ onto the y-axis, where the x- and z-coordinates are discarded.
Likewise, the third matrix is a projection in ℝ³ onto the xy-plane, where the z-coordinate is the only
information forgotten. In either case, elements already in the subspace are not altered by the projection. $\square$

Example 4.3.9. The matrices
$$\begin{pmatrix} 2 & 2 \\ -1 & -1 \end{pmatrix}, \qquad \begin{pmatrix} -8 & 4 & 1 \\ -18 & 9 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$
are also idempotent matrices. (Convince yourself of this.) The first matrix corresponds to projection of ℝ²
onto the line spanned by (2, −1), that is, the line y = −½x. The second matrix corresponds to projection of
ℝ³ onto the plane spanned by (4, 9, 0) and (1, 2, 1), that is, the plane 9x − 4y − z = 0. $\square$

If A is idempotent, then multiplication by A is projection onto Col(A).

   
 u1   v1 
 ..   .. 
 ∈ F n . Then the outer product (or matrix product)
Definition 4.3.10. Let u = 
 . ,v = 
  . 
   
un vn
 
of u and v, denoted u ⊗ v, is the n × n matrix of the form ui vj .

More compactly, we have that u ⊗ v = uv > (or u ⊗ v = uv ∗ for complex vectors). This highly resembles
the definition of the inner product u · v = u> v (or u∗ v for complex vectors). Despite this similarity, the
outer product is a matrix and the inner product is a scalar. Neither of them is a vector. Of course, these
two products are interconnected, justifying the complementary names, by the following formula:

(u ⊗ v)w = (v · w)u.

Proof.
(u ⊗ v)w = (uv > )w = u(v > w) = u(v · w) = (v · w)u. 

>
Essentially, the adjectives inner versus outer can be a mnemonic to describe the location of the (or ∗ ).
 
 2 
   
1   2 −2 3 −6
 −2 
     
   
Example 4.3.11. Let u =   −2 
 and v = 


 . Then u ⊗ v =  −4
 4 −6 12  . 
   3   
 
0   0 0 0 0
−6

Theorem 4.3.12. Let u be a unit vector. Then A = u ⊗ u is an idempotent matrix, and the matrix
transformation x 7→ Ax is a projection onto the subspace Span{u}.


     
Example 4.3.13. Consider the vector $\mathbf{v} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Its normalization is $\mathbf{u} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix}$.
Then the matrix
$$A = \mathbf{u}\otimes\mathbf{u} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$$
is an idempotent matrix since
$$A^2 = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}\begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix} = \begin{pmatrix} 1/4 + 1/4 & 1/4 + 1/4 \\ 1/4 + 1/4 & 1/4 + 1/4 \end{pmatrix} = A.$$
The mapping x ↦ Ax is the projection of a vector in ℝ² onto the line y = x. $\square$
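A brief NumPy sketch of Theorem 4.3.12 (the helper name project_onto_line is ours): the outer product of a unit vector with itself is idempotent and projects any vector onto the line it spans.

import numpy as np

def project_onto_line(v):
    """Projection matrix onto Span{v} for a nonzero vector v."""
    u = v / np.linalg.norm(v)          # normalize first
    return np.outer(u, u)              # u (x) u, an idempotent matrix

P = project_onto_line(np.array([1.0, 1.0]))
print(P)                               # [[0.5, 0.5], [0.5, 0.5]]
print(np.allclose(P @ P, P))           # True: P is idempotent
print(P @ np.array([2.0, 0.0]))        # (1.0, 1.0): the shadow of (2, 0) on the line y = x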

Definition 4.3.14. Let A be an n × n matrix. We say that A is nilpotent if An = 0, the zero matrix.

Of course, it is possible for a nilpotent matrix A that Am = 0 for some integer smaller than n. For
example, the outer product can also be used to create nilpotent matrices such that A2 = 0 for any n.

Theorem 4.3.15. Let u, v ∈ Fⁿ such that u and v are orthogonal. Then A = u ⊗ v is a nilpotent matrix.
In particular, A² = 0.

Example 4.3.16. Note that $\mathbf{u} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$ satisfy u · v = 0. Then $A = \mathbf{u}\otimes\mathbf{v} = \begin{pmatrix} 2 & -1 \\ 4 & -2 \end{pmatrix}$ is
nilpotent since
$$A^2 = \begin{pmatrix} 2 & -1 \\ 4 & -2 \end{pmatrix}\begin{pmatrix} 2 & -1 \\ 4 & -2 \end{pmatrix} = \begin{pmatrix} 4-4 & -2+2 \\ 8-8 & -4+4 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \quad\square$$

Example 4.3.17. We note that all strictly triangular matrices are nilpotent. For example, if $A = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$,
then A² = 0, and hence A is nilpotent.

Likewise, if $B = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 3 \\ 0 & 0 & 0 \end{pmatrix}$, then $B^2 = \begin{pmatrix} 0 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ and B³ = 0. Hence, we have constructed a
nilpotent matrix such that B² ≠ 0, so B is a nilpotent matrix that cannot be factored as an outer
product. You will notice that in the matrix B, while we had only zeros along the diagonal, we did have
non-zero entries in the slant right above the main diagonal and the slant right above that one, the so-called
second upper diagonal and third upper diagonal. When we squared B, we gained zeros along the second
diagonal but still had nonzero entries on the third diagonal. Only when we cubed the matrix did we get
zeros everywhere. This is essentially how multiplication of strictly triangular matrices works. Every time we
take another power, the matrix loses one more of its diagonals. Since this will eventually terminate, strictly
triangular matrices are nilpotent. $\square$
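For checking examples like these (and Exercises 23-25 below) by machine, here is a small NumPy sketch (function names ours): it tests whether a square matrix is idempotent (A² = A) or nilpotent (Aⁿ = 0 for an n × n matrix).

import numpy as np

def is_idempotent(A):
    A = np.asarray(A, dtype=float)
    return np.allclose(A @ A, A)

def is_nilpotent(A):
    A = np.asarray(A, dtype=float)
    return np.allclose(np.linalg.matrix_power(A, A.shape[0]), 0)

B = [[0, 1, 2],
     [0, 0, 3],
     [0, 0, 0]]
print(is_idempotent(B), is_nilpotent(B))   # False True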

Exercises (Go to Solutions)


For Exercises 1-6, finish the matrix so that it is symmetric or Hermitian.
     
 ∗ 7  1 2 ∗   7 8 9 
1. 

    
∗ ∗ ♠ 2.  ∗ 1 2 
 ♠ 3. 
 ∗ 5 6 

   
3 ∗ 1 ∗ ∗ 3
     
 56 ∗ 3 23   ∗ 6 ∗ ∗ ∗   1 ∗ 5 − 2i 
     
 21 91 85

∗ 
  ∗ 7 23 14 8  6. 
 −7i 3 ∗ 
♠ 4. 
    
    
5.  8
 ∗  
 ∗ 43 35 
  ∗ 11 ∗ ∗ 
 ∗ 9i 2
   
∗ 75 ∗ 62  9
 ∗ 16 54 22 

 
13 ∗ 19 ∗ 72

For Exercises 7-9, give an example of a non-symmetric real matrix for each listed size. (Answers may vary).

7. 2 × 2 8. 3 × 3 9. 4 × 4

For Exercises 10-13, determine whether the matrix is Hermitian or not.


     
2 1 − 3i 1 3 0 1 − i
♠ 10.  ♠ 11.  ♠ 12. 
     
  
1 + 3i −3 3 0 1+i i
 
3 −1 
♠ 13. 


−1 2

We say a matrix A is skew-symmetric (or alternating) if A^⊤ = −A. For Exercises 14-15, finish the
matrix so that it is skew-symmetric.
   
∗ −2  ∗ 3 9 ∗ 
14. 
 
  
∗ ∗  ∗ ∗ ∗ ∗ 
 
15. 



 ∗ −5 0 ∗ 
 
 
−3 2 9 0

For Exercises 16-18, compute the given quantity.

♠ 16. Compute
 u⊗ v, if   ♠ 17. Compute
 u⊗ v, if   ♠ 18. Compute
 u ⊗ v, if

 −3   4  2 4  1 
 
u= ,v =   






   1
u =  −2  , v =  −1  u= 1 ,v = 
 
2 0   

 
      −3
1 2 2

For Exercises 19-22, find vectors u and v such that their outer product is equal to the given matrix, that is,
find an outer factorization.
   
 −28 0 7   14 −4 
 
 
19. u ⊗ v = 
 −8 0 2 
  −7

2 

  20. u ⊗ v = 
 

−4 0 1  −28
 8 

 
7 −2
 
 6 9 6 12 
 
 
 6 9 6 12  18 0 0 45 
  
   
21. u ⊗ v =  12 18
 
12 24  0 0 0 0 
 




 22. u ⊗ v = 



 18 27 18 36   0 0 0 0 
   
   
8 12 8 16 −16 0 0 −40

For Exercises 23-25, determine whether the matrix is idempotent, nilpotent, or neither.
     
 2 −2 −4   1 1 2   0 4 −2 
     
♠ 23. 
 −1 3 4 
 ♠ 24. 
 1 0 −3 
 ♠ 25. 
 0 0 5 

     
1 −2 −3 2 2 0 0 0 0

♠ 26. Using the projection in Example 4.3.13, draw the image of the unit square under this projection and
describe its “area.”
27. Let A be a square matrix.

(a) Show that A + A^⊤ is a symmetric matrix.

(b) Show that A − A^⊤ is a skew-symmetric matrix (see Exercise 14).
(c) Show that there exists matrices S and T such that S is symmetric, T is skew-symmetric, and
A = S + T.

28. Let A be a square matrix. Show that if A is both idempotent and invertible then A is an identity
matrix. Can a nilpotent matrix be invertible? Explain why or why not.
29. In Exercise 14, the notion of a skew-symmetric matrix was introduced. What would be the analogous
definition of skew-Hermitian matrix? Provide an example of a non-real skew-Hermitian matrix.

“It’s always good to take an orthogonal view of something. It develops ideas.” – Ken Thompson

4.4 Affine Transformations


Theorem 4.4.1 (Cauchy-Schwarz Inequality). For all u, v ∈ Fⁿ,
$$|\mathbf{u}\cdot\mathbf{v}| \leq \|\mathbf{u}\|\,\|\mathbf{v}\|.$$

Theorem 4.4.2 (The Triangle Inequality). For all u, v ∈ Fⁿ,
$$\|\mathbf{u} + \mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|.$$

Theorem 4.4.3 (The Law of Cosines). Let u, y ∈ Fⁿ. Then
$$\mathbf{u}\cdot\mathbf{y} = \|\mathbf{u}\|\,\|\mathbf{y}\|\cos\theta,^{\dagger}$$
where θ is the angle between the two line segments from the origin to the points identified with u and y.

Example 4.4.4. Compute the angle θ between u = (6, −1) and v = (1, 4) in ℝ².
$$\theta = \cos^{-1}\!\left(\frac{6(1) - 1(4)}{\sqrt{6^2 + (-1)^2}\,\sqrt{1^2 + 4^2}}\right) = \cos^{-1}\!\left(\frac{2}{\sqrt{36+1}\,\sqrt{1+16}}\right) = \cos^{-1}\!\left(\frac{2}{\sqrt{37}\,\sqrt{17}}\right) \approx 85.43^\circ \quad\square$$

From the previous theorem, if the angle between two vectors u and v is 90◦ , then the inner product
u · v = 0.

Definition 4.4.5. A real square matrix U is called orthogonal if U >U = I, that is, U > = U −1 . A complex square matrix U is said to be unitary if U ∗ = U −1 .

Theorem 4.4.6. A square matrix U is orthogonal (unitary) if and only if its column vectors form an
orthonormal set.

Since U > = U −1 and U U > = U U −1 = I, it also follows that the row vectors of an orthogonal matrix U
must also form an orthonormal set.

Rotation and reflection matrices are examples of orthogonal matrices.

Example 4.4.7. The matrix

U = [ 3/√11   −1/√6   −1/√66 ]
    [ 1/√11    2/√6   −4/√66 ]
    [ 1/√11    1/√6    7/√66 ]

is an orthogonal matrix. Note that


 √ √ √  √ √ √ 
 3/ 11 1/ 11 1/ 11  3/ 11 −1/ 6 −1/ 66 
√ √ √ √ √ √
U >U = 
  
 −1/ 6 2/ 6 1/ 6 1/ 11 2/ 6 −4/ 66 

 
 √ √ √  √ √ √ 
−1/ 66 −4/ 66 7/ 66 1/ 11 1/ 6 7/ 66
 √ √ 
 (9 + 1 + 1)/11 (−3 + 2 + 1)/ 66 (−3 − 4 + 7)/ 726 
 √ √ 
=
 (−3 + 2 + 1)/ 66 (1 + 4 + 1)/6 (1 − 8 + 7)/ 396  = I3
 
 √ √ 
(−3 − 4 + 7)/ 726 (1 − 8 + 7)/ 396 (1 + 16 + 49)/66

 
Example 4.4.8. Let U = [ 1/2(1 + i)   1/2(1 + i)  ] . Note that
                       [ 1/2(1 − i)   1/2(−1 + i) ]

UU ∗ = [ 1/2(1 + i)   1/2(1 + i)  ] [ 1/2(1 − i)   1/2(1 + i)  ]  =  [ 1/2 + 1/2       i/2 − i/2 ]  =  I2
       [ 1/2(1 − i)   1/2(−1 + i) ] [ 1/2(1 − i)   1/2(−1 − i) ]     [ −i/2 + i/2     1/2 + 1/2  ]

Therefore, U is unitary. 

Theorem 4.4.9. Let U be an orthogonal (unitary) matrix, and let x, y ∈ F n . Then


(U x) · (U y) = x · y,
that is, the matrix transformation x 7→ U x preserves inner products.
Proof.
(U x) · (U y) = (U x)> (U y) = (x> U > )(U y) = x> (U > U )y = x> y = x · y. 

Since multiplication by an orthogonal (unitary) matrix U preserves inner products, as a consequence,


it also preserves lengths, distances, angles, and orthogonality of vectors. For example, kU xk = kxk and
dist(U x, U y) = dist(x, y) for any vectors x, y and any orthogonal matrix U .

Theorem 4.4.10. Let B and C be orthonormal bases of F n . Then the change-of-basis matrix P_{C←B} is orthogonal (unitary).

Definition 4.4.11. An isometry (or a rigid motion) is a function T : F n → F n such that dist(T (x), T (y)) = dist(x, y) for all x, y ∈ F n .

Isometry, which translates as “same-measure” from Greek, describes those maps that preserve distances and lengths. Using trigonometry, necessarily angles are preserved too. Therefore, an isometry is exactly a map which “moves” objects in the vector space without “distorting” them, that is, the image of a shape will be congruent to the original shape. Linear isometries are exactly multiplication by an orthogonal (unitary) matrix.

Orthogonal transformations are not the only way to make an isometry. Let b ∈ F n be any vector in
the vector space. We can define a transformation T : F n → F n by the rule T (x) = x + b, called a trans-
lation by b. If b 6= 0, then translation is not a linear map since T (0) = b 6= 0. On the other hand,
dist(T (x), T (y)) = k(x + b) − (y + b)k = kx + b − y − bk = kx − yk = dist(x, y). If we allow for translations
in a vector space, we must broaden our type of transformations to affine transformations.

Definition 4.4.12. Let T : F n → F m be a function. We say that T is an affine transformation if there


exists a matrix A ∈ F m×n and a vector b ∈ F m such that T (x) = Ax + b for all x ∈ F n .

Of course, a linear transformation is affine (when b = 0) and every translation is affine (when A = In ).
Affine transformations are very important in linear geometry. As translation maps are ALWAYS one-to-one
(onto), an affine transformation x 7→ Ax + b is one-to-one (onto) if and only if the matrix transformation
x 7→ Ax is one-to-one (onto).

 
 3 −1 1 
3 3
 
Example 4.4.13. Consider the affine transformation T : R → R associated to the matrix A = 
 −3 2 0 

 
6 −3 2
and the translation vector b = (1, 2, 3). Then
            
 2   3 −1 1  2   1   5   1   6 
            
T


 =  −3
0   2 0 
 0
  +  2  =  −6  +  2  =  −4
       
.

            
−1 6 −3 2 −1 3 10 3 13

Is y = (2, −2, 4) ∈ im(T )? To answer this question, we solve the matrix equation Ax + b = y ⇒ Ax =
y − b = (1, −4, 1):
   
 3 −1 1 1   1 0 0 2 
   
∼ .
 −3 2 0 −4   0 1 0 1 

   
6 −3 2 1 0 0 1 −4

Therefore, T (2, 1, −4) = (2, −2, 4). 

As mentioned above, an affine transformation T : F n → F n : T (x) = Ax + b cannot be linear if b ≠ 0 on F n . On the other hand, we can realize affine transformations as matrix multiplication in a higher dimension. After all, F n is naturally a subset of F n+1 , consisting of all those vectors in F n+1 whose last coordinate is zero. Instead, we may identify F n in F n+1 with the hyperplane associated to the linear equation xn+1 = 1. In other words, we will translate F n by the vector en+1 and consider this n-flat. It will consist of all vectors in F n+1 whose last coordinate is 1. Suppose y = T (x) = Ax + b, which is the image of x under the affine transformation. Then

[    A      b ] [ x ]   [ Ax + b ]   [ y ]
[ 0 ... 0   1 ] [ 1 ] = [    1   ] = [ 1 ] .

Hence, the affine transformation T : F n → F m can be extended to a linear transformation L : F n+1 → F m+1
in a higher dimension.

Example 4.4.14. Using the affine transformation from Example 4.4.13, the standard matrix of the affine transformation is:

[T ] = [  3  −1  1  1 ] .     Note that     T (2, 0, −1) :    [  3  −1  1  1 ] [  2 ]   [  6 ]
       [ −3   2  0  2 ]                                       [ −3   2  0  2 ] [  0 ] = [ −4 ] . 
       [  6  −3  2  3 ]                                       [  6  −3  2  3 ] [ −1 ]   [ 13 ]
       [  0   0  0  1 ]                                       [  0   0  0  1 ] [  1 ]   [  1 ]
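This lift from an affine map to a linear map in one higher dimension is easy to carry out numerically. The following is a minimal sketch (assuming NumPy is available; the helper name lift_affine is ours, and the data are from Example 4.4.13).

import numpy as np

def lift_affine(A, b):
    # Return the (n+1)x(n+1) matrix [[A, b], [0 ... 0, 1]] representing x |-> Ax + b.
    n = A.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[:n, :n] = A          # copy A into the upper-left block
    M[:n, n] = b           # translation vector in the last column
    M[n, n] = 1.0          # bottom row is (0, ..., 0, 1)
    return M

A = np.array([[3.0, -1, 1], [-3, 2, 0], [6, -3, 2]])
b = np.array([1.0, 2, 3])
M = lift_affine(A, b)

x = np.array([2.0, 0, -1])
y_affine = A @ x + b                      # direct affine evaluation
y_lifted = (M @ np.append(x, 1.0))[:3]    # evaluate through the lifted linear map
print(y_affine, y_lifted)                 # both give (6, -4, 13)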

Theorem 4.4.15 (Mazur-Ulam Theorem). If T : F n → F n is an isometry then, for all x ∈ F n , T (x) =


U x + b for some translation vector b ∈ F n and orthogonal (unitary) matrix U ∈ F n×n .

Over R2 there are four types of isometries: rotation around a point in the plane, reflection across a line in
the plane, translation, and glide reflection (the composition of a reflection and a translation, like footprints
in the sand).

Exercises (Go to Solutions)


For Exercises 1-3, find the angle between the vectors u and v in Rn .
           
♠ 1. u = (1, 1), v = (3, −1)          ♠ 2. u = (1, 0, −3), v = (1, 2, 3)          ♠ 3. u = (1, −1, 1, 1), v = (1, 2, 3, 4)

For Exercises 4-11, identify whether the transformation is an isometry. The graphic on the left will denote
the original shape and the graphic on the right is the transformed shape.
♠ 4. The shape was translated and rotated.

♠ 5. The shape was translated and rotated.††

♠ 6. The shape was translated and increased in size.‡

♠ 7. The object was translated and changed its shape from a rectangle to a circle, although the area stayed
the same.

♠ 8. The object was reflected.



♠ 9. In the following transformation, the shadow object denotes the original object and the opaque object
denotes the transformed object.‡‡

♠ 10. The shape was translated and has been cut.

♠ 11. The following transformation is a little more difficult to visualize. Imagine the shape is transformed
by placing it back exactly the way it was before, that is, the starting and stopping shape is identical.

 
For Exercises 12-15, let T : R2 → R2 : x 7→ x + b be translation by b, provided below. Let P = (2, 1). Draw P and T (P ) on the same gridlines.

♠ 12. b = (1, 0)          ♠ 13. b = (0, −2)          ♠ 14. b = (−1, −1)          ♠ 15. b = (−2, 1)

For Exercises 16-18, consider the affine transformation T : R3 → R3 : x 7→ Ax + b associated to the matrix

A = [  3  −1  1 ]                                   [ 1 ]
    [ −3   2  0 ]   and the translation vector b =  [ 2 ] .
    [  6  −3  2 ]                                   [ 3 ]

16. Compute T (3, 1, −5).

17. Find a vector x ∈ R3 such that T (x) = (6, 1, 3).


18. Is T one-to-one? Onto? Why or why not?

For Exercises 19-21, consider the affine transformation T : R3 → R3 : x 7→ Ax + b associated to the matrix

A = [ 1  0  −1 ]                                   [ −5 ]
    [ 2  2   2 ]   and the translation vector b =  [  6 ] .
    [ 4  1  −3 ]                                   [  8 ]

♠ 19. Compute T (5, −2, 4).

♠ 20. Find a vector x ∈ R3 such that T (x) = (−6, 8, 12).

♠ 21. Is T one-to-one? Onto? Why or why not?

For Exercises 22-24, consider the affine transformation T : R3 → R3 : x 7→ Ax + b associated to the matrix

A = [  6  4  6 ]                                   [ 3 ]
    [  2  2  2 ]   and the translation vector b =  [ 2 ] .
    [ 10  8  6 ]                                   [ 1 ]

22. Compute T (1, 2, 1).

23. Find a vector x ∈ R3 such that T (x) = (7, 6, 5).

24. Is T one-to-one? Onto? Why or why not?

† For complex vectors, the real valued function cos : R → R will need to be extended to the complex plane cos : C → C.
†† Although the precise path of rotation and translation is different from the previous exercise, the starting and stopping positions of the shape are exactly the same. In this case, we say these two transformations are equal. In particular, a transformation is determined by the image itself of the shape and not by the process which produces the image.
‡ This is an example of a dilation. To shrink in size is called a contraction.
‡‡ Notice that the center of the rectangle is not moved by the motion, that is, the transformation assigns this point to itself.

We say that a point P is a fixed point if P ′ = P . The identity motion is characterized as the transformation which assigns
every point to itself, that is, all points are fixed. Can you find the identity motion in this homework set?

“To raise new questions, new possibilities, to regard old problems from a new angle, requires creative
imagination and marks real advance in science.” – Albert Einstein

4.5 Orthogonal Projections


Theorem 4.5.1 (Parseval’s Identity). Let S = {v 1 , . . . , v p } ⊆ F n be an orthogonal subset. If y is a linear
combination of the vectors in S, that is,
y = c1 v 1 + c2 v 2 + . . . + cp v p ,
then
ci = (v i · y)/(v i · v i ),     called the Fourier coefficients.
Proof. Taking the dot product, we get
v i · y = v i · (c1 v 1 + c2 v 2 + . . . + cp v p ) = ci (v i · v i ).
Since v i · v i 6= 0, dividing both sides by v i · v i gives the formula. 

Example 4.5.2. Let S = {v 1 = (1, 2, 3), v 2 = (1, 1, −1), v 3 = (−5, 4, −1)}, which was shown in Example
4.2.6 to be an orthogonal subset of R3 . By Theorem 4.2.7, we see that S is, in fact, an orthogonal basis for
R3 . Let y = (−4, 8, 10). Since y ∈ R3 , y is a linear combination of the vectors of S. By Theorem 4.5.1, we
have
     
y = (v 1 · y)/(v 1 · v 1 ) v 1 + (v 2 · y)/(v 2 · v 2 ) v 2 + (v 3 · y)/(v 3 · v 3 ) v 3 = (42/14)v 1 + (−6/3)v 2 + (42/42)v 3 = 3v 1 − 2v 2 + v 3

  = 3(1, 2, 3) − 2(1, 1, −1) + (−5, 4, −1) = (−4, 8, 10),     that is,     [y]S = (3, −2, 1).


Definition 4.5.3. Let y ∈ F n and W ≤ F n . Let {u1 , u2 , . . . , ur } be an orthogonal basis for W . Then
the orthogonal projection of y onto W is
ŷ = projW y = Σ_{i=1}^{r} (ui · y)/(ui · ui ) ui = (u1 · y)/(u1 · u1 ) u1 + (u2 · y)/(u2 · u2 ) u2 + . . . + (ur · y)/(ur · ur ) ur .

When the subspace W is clear from context, projW (y) is often abbreviated as ŷ.

Let y, u ∈ F n and u 6= 0. Then the orthogonal projection of y onto u is proju y = projW y, where
W = Span(u).

Intuitively, the orthogonal projection of y onto u is the shadow the arrow y casts on the line spanned along u.

   
 1   2 
   
Example 4.5.4. Let y =  7  and u =  1 
  
. Then
   
3 1
160 CHAPTER 4. ORTHOGONALITY

  

   4 
2
u·y 12    
y
b = proju (y) = u=  1  =  2 .
u·u 6  
  
  
1 2

Note that

y − ŷ = (1, 7, 3) − (4, 2, 2) = (−3, 5, 1),     y = ŷ + (y − ŷ) = (4, 2, 2) + (−3, 5, 1),

and that

ŷ · (y − ŷ) = (4, 2, 2) · (−3, 5, 1) = −12 + 10 + 2 = 0.

Therefore, ŷ and y − ŷ are orthogonal, that is, y − ŷ ∈ W ⊥ . 
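The projection formula is simple to run numerically. Here is a minimal NumPy sketch (the helper name is ours; the data are the vectors of Example 4.5.4) that computes proju (y) and checks that the residual is orthogonal to u.

import numpy as np

def project_onto(u, y):
    # Orthogonal projection of y onto the line spanned by a nonzero vector u.
    return (np.dot(u, y) / np.dot(u, u)) * u

y = np.array([1.0, 7.0, 3.0])
u = np.array([2.0, 1.0, 1.0])

y_hat = project_onto(u, y)               # (4, 2, 2)
residual = y - y_hat                     # (-3, 5, 1)
print(y_hat, np.dot(y_hat, residual))    # the dot product is 0 (up to rounding)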

Theorem 4.5.5. Let y ∈ F n and let W ≤ F n . Then y − projW y ∈ W ⊥ . In particular,

projW y · (y − projW y) = 0.

Theorem 4.5.6 (The Orthogonal Decomposition Theorem). Let y ∈ F n and W ≤ F n . Then there exist unique vectors w1 ∈ W and w2 ∈ W ⊥ such that

y = w1 + w2 .

     
 1   −5   −9 
     
Example 4.5.7. Let u1 =  2 , u2 =  4 , and y =  20  . Let W = Span{u1 , u2 }. Since
    
     
3 −1 −1
u1 · u2 = 0, {u1 , u2 } is an orthogonal basis for W . Using the orthogonal decomposition of y onto W , we
can express y as a sum of a vector from W and a vector from W ⊥ .

Now,
    
1
   −5   −13 
u1 · y u2 · y 28   126    
w1 = y
b = projW y= u1 + u2 =  2 +  4  =  16  .
u1 · u1 u2 · u2 14 
 
 42 

 
 


3 −1 3

Thus, 
    
 −9   −13   4 
     
w2 = y − y  20  −  16  =  4
b=     .

     
−1 3 −4
4.5. ORTHOGONAL PROJECTIONS 161

Note that w2 · u1 = w2 · u2 = 0, that is, w2 ∈ W ⊥ . Therefore,


     
 −13   4   −9 
     
y = w1 + w2 =   16
+ 4
 
 =  20  .
   
     
3 −4 −1

Theorem 4.5.8 (The Best Approximation Theorem). Let W ≤ F n , let y ∈ F n , and let ŷ = projW y. Then ŷ is the closest point in W to y, that is,

ky − ŷk ≤ ky − wk

for all w ∈ W .

The vector ŷ is called the best approximation to y in W .

Proof. Let w ∈ W . Then

y − w = (y − ŷ) + (ŷ − w),

where y − ŷ ∈ W ⊥ and ŷ − w ∈ W , by the Orthogonal Decomposition Theorem. Then by the Pythagorean Theorem,

ky − wk2 = ky − ŷk2 + kŷ − wk2 .

Therefore, ky − ŷk2 ≤ ky − wk2 . 

Example 4.5.9. The distance from a point y ∈ Rn to a subspace W is defined as the distance from y to the nearest point in W , the best approximation. Continuing with Example 4.5.7, the vector ŷ = (−13, 16, 3) is the closest vector in W to the vector y. Hence, dist(W, y) = ky − ŷk = 4√(12 + 12 + (−1)2 ) = 4√3 . 

Let H be a flat in F n passing through the vector h. Can we compute the distance between H and some vector y? Modifying the Best Approximation Theorem, we can, because we can translate H by the vector −h so that it becomes a subspace, say W . If we also translate y by −h, then we get

dist(H, y) = dist(W, y − h) = k(y − h) − projW (y − h)k = k(y − ŷ) − (h − ĥ)k,

since vector translation is an isometry and orthogonal projections are linear transformations.

It can be shown that projW : F n → F n for some subspace W ≤ F n is a linear transformation. Therefore, there must be some matrix that represents projW . Let {u1 , . . . , ur } denote an orthonormal basis for W . Let Q = [ u1  u2  . . .  ur ] be the n × r matrix whose column vectors are this orthonormal basis. Then projW (y) = QQ> y, that is, QQ> is the standard matrix representation for projW . Note that QQ> can be thought of as a generalization of the outer product of vectors.
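As a quick numerical illustration (a minimal sketch, assuming NumPy; the orthonormal basis is just the normalized version of {u1 , u2 } from Example 4.5.7), applying QQ> to y reproduces the projection found there.

import numpy as np

u1 = np.array([1.0, 2.0, 3.0])
u2 = np.array([-5.0, 4.0, -1.0])
y  = np.array([-9.0, 20.0, -1.0])

# Columns of Q form an orthonormal basis of W = Span{u1, u2}.
Q = np.column_stack([u1 / np.linalg.norm(u1), u2 / np.linalg.norm(u2)])

P = Q @ Q.T              # standard matrix of proj_W
print(P @ y)             # (-13, 16, 3), matching Example 4.5.7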

Exercises (Go to Solutions)


For Exercises 1-5, use inner products to express the vector y, listed first, as a linear combination of the
orthogonal basis B, listed second. Find the coordinate vector [y]B .
               
−2  1  −1 1 1 17  5 2 
  
 
 
 
♠ 1. 
    
,
      
    
    
              
−6  3 
  ♠ 2.  10 ,  2  ,  0 
      ♠ 3.  −8 ,  4
  
,
  −2 

  
    

   
    

   
11  1 −1  4  −2 1 
             
 3   3   −1 
 −6   −19 1 7

 
 
 

  
      


   

   


        
 9   4   −1   1 
         18  
 6   −2 


♠ 4. 
     
,  , ,   
            
♠ 5.  −12 ,  −6  ,  0 
      
 8   1   −5   −1 

  
      
      
  

      

   
   

−5

 0 1 −1 
  −15  
 0   5 

  
   

  

   


 
−13  1 5 

For Exercises 6-10, for the vector y, listed first, and vector space W = Span{B}, where B is an orthogonal
basis, listed second, find projW (y). Find the orthogonal decomposition of y, that is, y = w1 + w2 where
w1 ∈ W and w2 ∈ W ⊥ .
               
1  1  1 1 1 3 5 2
  
 
 
 

♠ 6.  , 
      
    
   
    

              
1

 3

 ♠ 7.  ,   ,
 2   2   0 
  ♠ 8.   ,   ,
 1   4   −2 
 
      
       
   
3  1 −1  4  −2 1 
             
1 3   3   −1  1 1 7

  
 

    
   
   

        
   
   

     6   −2 
 2   4   −1   1   1 
      
 

♠ 9.  ,  , ,   
   
        

♠ 10.  2 ,
     
 −6  ,  0 
     
  −5   −1 
 3  
 1     
   
       
        

   

   

4

 0 1 −1 
  3  
 0   5 

  
   


  
   

 
5  1 5 

For Exercises 11-15, for the vector y, listed first, and vector space W = Span{B}, where B is an orthogonal
basis, listed second, find the distance between y and W .
               
1 1 1 1   1  3   5 2 

 
 
 
 
 
♠ 11. 
  
, 
          
            
1  3 
  ♠ 12.  2 ,  2  ,  0
    
 ♠ 13. 
 1 ,  4
  ,
  −2 

      
      

   
3  1 −1  4  −2 1 
             
 3   3   −1 
 1   1  1   7 

 
 
 

 
    
  
 
     
   
   

 2   4   −1   1 
 
       1  
 6   −2 

  
♠ 14.  ,  , ,  
   
      

♠ 15. 
       
,  −6  ,  0 
 1   −5   −1 
 3        
  
    


  2     
  

     

   

   

4  0

1 −1 
  3  
 0   5 


  
   

  
   

 
5  1 5 

16. Let V be an inner product space. Prove the Parallelogram Law :† For all u, v ∈ V , ku+vk2 +ku−vk2 =
2kuk2 + 2kvk2 .

† This law gets its name from Euclidean space because the equality relates the lengths of the four sides of a parallelogram

with its two diagonals.



“Most of the fundamental ideas of science are essentially simple, and may, as a rule, be expressed in a
language comprehensible to everyone.” – Albert Einstein

4.6 The Fundamental Theorem of Linear Algebra


Example 4.6.1. Let

A = [ −3   6  −1  1  −7 ]   [ 1  −2  0  −1   3 ]
    [  1  −2   2  3  −1 ] ∼ [ 0   0  1   2  −2 ] .
    [  2  −4   5  8  −4 ]   [ 0   0  0   0   0 ]

Thus, Nul A is a 3-dimensional subspace of R5 and nullity A = 3. Likewise, Col A is a 2-dimensional subspace of R3 and rank A = 2. 


Example 4.6.2. Let

A = [  1   3   3   2  −9 ]   [ 1  0  −3   5  0 ]
    [ −2  −2   2  −8   2 ] ∼ [ 0  1   2  −1  0 ] .
    [  2   3   0   7   1 ]   [ 0  0   0   0  1 ]
    [  3   4  −1  11  −8 ]   [ 0  0   0   0  0 ]

Thus, Nul A is a 2-dimensional subspace of R5 and nullity A = 2. Also, Col A is a 3-dimensional subspace of R4 and rank A = 3. 


The rank of any matrix is the number of pivot positions. Suppose that A is an m × n matrix with p
pivots. Then rank A = p. This also implies that the number of non-pivot columns in A is n − p. Consider the
homogeneous system Ax = 0. The number of non-pivot columns is the same as the number of free variables
in the homogeneous system. As we have also seen, each free variable corresponds to a basis element for
Nul A. Thus, nullity A = n − p. This proves the following important theorem.

Theorem 4.6.3 (The Rank-Nullity Theorem). If A is an m × n matrix, then

rank A + nullity A = n.

Example 4.6.4. Let A be a 7 × 9 matrix. Suppose that nullity A = 2. Then by the Rank-Nullity Theorem,
rank A = 7, that is, the mapping x 7→ Ax is surjective.

Is it possible for a 6 × 9 matrix A to have nullity A = 2? No, because this implies that rank A = 7.
This is a contradiction because rank A ≤ 6 = dim R6 . In fact, the Rank-Nullity Theorem tells us that
nullity A ≥ 3. 
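The Rank-Nullity Theorem is easy to check numerically for a concrete matrix. A minimal sketch (assuming NumPy is available; the matrix is the one from Example 4.6.1):

import numpy as np

A = np.array([[-3.0,  6, -1, 1, -7],
              [ 1.0, -2,  2, 3, -1],
              [ 2.0, -4,  5, 8, -4]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)
nullity = n - rank                     # Rank-Nullity: rank + nullity = n
print(rank, nullity, rank + nullity)   # 2, 3, 5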

By the Rank-Nullity Theorem, we know that

rank A + nullity A = n.

Taking transposes, we also have


dim(Row A) + dim(Nul(A> )) = m.

Remark 4.6.5. The space Nul(A> ) is often called the left null space of A because it consists of all 1 × m
row vectors x such that xA = 0. H

Theorem 4.6.6. For a matrix A, rank A = rank A> , that is, the dimension of the column and row space of
A are the same.
Proof. First, the rank A is equal to the number of pivot columns. Let B be the row reduced echelon form
of A. Then dim(Row A) = dim(Row B), where the second dimension is the same as the number of nonzero
rows. But this equals the number of pivots. 

Theorem 3.3.11 (The Nonsingular Matrix Theorem–Continued) Let A be a square n × n matrix. Then the
following statements are equivalent:

(i) A is a nonsingular matrix.
      ..
       .
(xxi) Col(A) = F n
(xxii) nullity(A) = 0
(xxiii) Nul(A) = 0
(xxiv) corank(A) = n
(xxv) Row(A) = F n
(xxvi) conullity(A) = 0
(xxvii) Lnl(A) = 0

We establish the four fundamental spaces of an m × n matrix A. Say A has p pivots. We have seen
already three of these four fundamental spaces many times already:

(i) Nul(A) = {x | Ax = 0} ≤ F n , the null space of A (the set of column vectors which produce zero
with A when multiplied on the right), nullity(T ) = nullity(A) = dim Nul(A) = n − p.

(ii) Col(A) = {Ax | x ∈ F n } ≤ F m , the column space of A (the span of the column vectors of A),
rank(T ) = rank(A) = dim Col(A) = p.

(iii) Row(A) = {y > A | y ∈ F m } = Col(A> ) ≤ F n , the row space of A (the span of the row vectors of
A), corank(T ) = corank(A) = dim Row(A) = p.

(iv) Lnl(A) = {y | y > A = 0> } = Nul(A> ) ≤ F m , the left null space of A (the set of row vectors which
produce zero with A when multiplied on the left), conullity(T ) = conullity(A) = dim Lnl(A) = m − p.

The left null space essentially measures how much the column vectors of A do not span F m , much in the
same way the null space measures how much the column vectors are not linearly independent. We have seen
that for a matrix A, the columns of A are linearly independent if and only if the null space of A is trivial.
More generally, the dimension of the null space of A, its nullity, measures the size of the null space but also
counts the number of free variables in the linear system Ax = b. This is, of course, the number of non-pivot
columns n − p. Similarly, the columns of A span F m if and only if the left null space of A is trivial. More
generally, the dimension of the left null space of A, its conullity, measures the size of the left null space but
also counts the number of rows of zeros in the row-reduced echelon form U of A. This is, of course, the
number of non-pivot rows m − p. The presence of a row of zeros in U allows the possibilities of inconsistent
linear systems Ax = b and is dependent on the choice of b. Essentially, the left null space is the subspace of
F m of those vectors b which “definitely” make Ax = b inconsistent.
Example 4.6.7. Let

A = [  3  −3  −2 ]
    [ −5   4   3 ] .
    [  1  −5  −2 ]

Note that

[ 7  4  −1 ] [  3  −3  −2 ]
             [ −5   4   3 ] = [ 21 − 20 − 1   −21 + 16 + 5   −14 + 12 + 2 ] = [ 0  0  0 ] = 0> .
             [  1  −5  −2 ]

Therefore, b = (7, 4, −1) ∈ Lnl(A). Next, if we try to solve the linear system Ax = b, we see that

[  3  −3  −2    7 ]   [  1  −5  −2   −1 ]   [ 1   −5  −2   −1 ]
[ −5   4   3    4 ] ∼ [  3  −3  −2    7 ] ∼ [ 0   12   4   10 ]
[  1  −5  −2   −1 ]   [ −5   4   3    4 ]   [ 0  −21  −7   −1 ]

                      [ 1  −5  −2    −1  ]   [ 1  −5  −2      −1   ]
                    ∼ [ 0   3   1   5/2  ] ∼ [ 0   3   1     5/2   ]
                      [ 0   3   1   1/7  ]   [ 0   0   0   −33/14  ]

Therefore, Ax = b is inconsistent, as was to be expected. 

For all the other fundamental spaces, we have an algorithm to find a basis for the subspace by row-
reducing A. We provide one here. To find a basis for Lnl(A), augment A with the identity matrix Im (of
the appropriate size) and row reduce the matrix [A | Im ]. Let U be the row-reduced echelon form of A and
let E ∈ F m×m be a product of elementary matrices which transform A into U . Then [A | Im ] ∼ [U | E].
Suppose that U has a row of zeros in its ith row. Let εi be the ith row of E. Since EA = U , the ith row of U is
factored as εi A. As this is a row of zeros, εi ∈ Lnl(A). As E is a nonsingular matrix, its set of row vectors is
linearly independent, as is any subset of row vectors. Considering that dim Lnl(A) = conullity(A) = m − p =
the number of zero rows, this independent set of row vectors must be a basis for Lnl(A).
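For hand-checking, a computer algebra system can produce the same subspace directly, since Lnl(A) = Nul(A> ). A minimal sketch (assuming SymPy is available; this is not the augmented-matrix algorithm above, just a way to confirm its output):

from sympy import Matrix

A = Matrix([[ 3, -3, -2],
            [-5,  4,  3],
            [ 1, -5, -2]])

# The left null space of A is the null space of its transpose.
left_null_basis = A.T.nullspace()
print(left_null_basis)   # one vector, a scalar multiple of (7, 4, -1)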

Example 4.6.8. Continuing with Example 4.6.7,

[A | I3 ] = [  3  −3  −2   1  0  0 ]   [ 1  −5  −2     0      0      1   ]
            [ −5   4   3   0  1  0 ] ∼ [ 0   3   1    1/4     0    −3/4  ]
            [  1  −5  −2   0  0  1 ]   [ 0   0   0   −1/4   −1/7   1/28  ]

Hence, the third row corresponds to a row of zeros. Thus, the row vector on the right side of the augmentation is a basis element of Lnl(A), namely, (−1/4, −1/7, 1/28). We could replace this vector in the spanning set with any nonzero scalar multiple, namely b = −28(−1/4, −1/7, 1/28) = (7, 4, −1). Thus, Lnl(A) = Span{(7, 4, −1)}. 

We present next the Fundamental Theorem of Linear Algebra, which is essentially a summary of many
important results.

Theorem 4.6.9 (The Fundamental Theorem of Linear Algebra). Let A be an m × n matrix in F m×n . Then
every vector v ∈ F n can be decomposed uniquely as a sum of vectors:

v = v Row(A) + v Nul(A)

where v Row(A) ∈ Row(A) and v Nul(A) ∈ Nul(A). In particular, Row(A)⊥ = Nul(A). Furthermore, if Av = b,
then v Row(A) is the particular solution to the linear system Ax = b of shortest length and v Nul(A) is a solution
to the associated homogeneous system Ax = 0.

Likewise, every vector w ∈ F m can be decomposed uniquely as a sum of vectors:

w = wCol(A) + wLnl(A)

where wCol(A) ∈ Col(A) and wLnl(A) ∈ Lnl(A). In particular, Col(A)⊥ = Lnl(A). Furthermore, if A> w = b,
then wCol(A) is the particular solution to the linear system A> y = b of shortest length and wLnl(A) is a so-
lution to the associated homogeneous system A> y = 0.

Finally,

rank(A) + nullity(A) = n, corank(A) + conullity(A) = m, rank(A) = corank(A).

This image needs to be replaced as it was ripped from another textbook. Also, the notation needs to be congruent with our presentation here.

[Figure: the four fundamental subspaces of an m × n matrix A — Row(A) and Nul(A) in F n (of dimensions r and n − r), Col(A) and Lnl(A) in F m , with A carrying Row(A) onto Col(A) and sending Nul(A) to 0.]


Example 4.6.10. Solve the linear system

   3x1 − 3x2 − 2x3 =  3
  −5x1 + 4x2 + 3x3 = −4
    x1 − 5x2 − 2x3 =  5

by finding the particular solution contained in the row space of the coefficient matrix.

[  3  −3  −2    3 ]   [ 1  0  −1/3    0 ]
[ −5   4   3   −4 ] ∼ [ 0  1   1/3   −1 ] .
[  1  −5  −2    5 ]   [ 0  0    0     0 ]

Thus, the linear system is consistent and we see that

x = (0, −1, 0) + t(1, −1, 3),

is the general solution to the linear system where (1, −1, 3) ∈ Nul(A). But (0, −1, 0) ∉ Row(A). We know this since (0, −1, 0) · (1, −1, 3) = 1 ≠ 0. On the other hand, we may solve the equation xRow(A) · (1, −1, 3) = 0.

xRow(A) · (1, −1, 3) = [(0, −1, 0) + t(1, −1, 3)] · (1, −1, 3) = (0, −1, 0) · (1, −1, 3) + t (1, −1, 3) · (1, −1, 3) = 0

(0 + 1 + 0) + t(1 + 1 + 9) = 0   ⇒   1 + 11t = 0   ⇒   t = −1/11.

Hence, xRow(A) = (0, −1, 0) − (1/11)(1, −1, 3) = (−1/11, −10/11, −3/11). It can be verified that xRow(A) is both a solution to the linear system and a member of Row(A) (since it is orthogonal to Nul(A)). 
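Numerically, the row-space solution can be obtained with a least-squares routine, since it is the particular solution of smallest norm. A minimal sketch (assuming NumPy; the data are from Example 4.6.10):

import numpy as np

A = np.array([[ 3.0, -3, -2],
              [-5.0,  4,  3],
              [ 1.0, -5, -2]])
b = np.array([3.0, -4, 5])

# lstsq returns the minimum-norm solution, which lies in Row(A).
x_row, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_row)        # approximately (-1/11, -10/11, -3/11)
print(A @ x_row)    # reproduces b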
Example 4.6.11. Solve the matrix equation

[  2   −6    0    0    0  ] [ x1 ]   [  −72 ]
[  3   −7    4   14    4  ] [ x2 ]   [  −78 ]
[  4  −19  −14  −49  −14  ] [ x3 ] = [ −249 ]
[  3   −6    6   21    6  ] [ x4 ]   [  −63 ]
[ −1    2   −2   −7   −2  ] [ x5 ]   [   21 ]

by finding the particular solution contained in the row space of the coefficient matrix.

We quickly solve the matrix equation Ax = b. We do this by row reducing the augmented matrix [A | b] to RREF. We then grab the particular solution associated to setting all free variables to zero and also grab the basis to the null space of the coefficient matrix from the RREF:

[  2   −6    0    0    0    −72 ]   [ 1  0  6  21  6    9 ]
[  3   −7    4   14    4    −78 ]   [ 0  1  2   7  2   15 ]
[  4  −19  −14  −49  −14   −249 ] ∼ [ 0  0  0   0  0    0 ] .
[  3   −6    6   21    6    −63 ]   [ 0  0  0   0  0    0 ]
[ −1    2   −2   −7   −2     21 ]   [ 0  0  0   0  0    0 ]

Hence,

Nul(A) = Span{ (−6, −2, 1, 0, 0), (−21, −7, 0, 1, 0), (−6, −2, 0, 0, 1) }

and

x = [  9 ]     [ −6 ]     [ −21 ]     [ −6 ]   [ 9 − 6r − 21s − 6t ]
    [ 15 ]     [ −2 ]     [  −7 ]     [ −2 ]   [ 15 − 2r − 7s − 2t ]
    [  0 ] + r [  1 ] + s [   0 ] + t [  0 ] = [         r         ] .
    [  0 ]     [  0 ]     [   1 ]     [  0 ]   [         s         ]
    [  0 ]     [  0 ]     [   0 ]     [  1 ]   [         t         ]

To find the unique solution xRow(A) ∈ Row(A), we utilize the fact that Row(A)⊥ = Nul(A). Hence, xRow(A) will be orthogonal to each vector in the basis of Nul(A):

xRow(A) · (−6, −2, 1, 0, 0) = 0          −6(9 − 6r − 21s − 6t) − 2(15 − 2r − 7s − 2t) + r = 0
xRow(A) · (−21, −7, 0, 1, 0) = 0   ∼    −21(9 − 6r − 21s − 6t) − 7(15 − 2r − 7s − 2t) + s = 0
xRow(A) · (−6, −2, 0, 0, 1) = 0          −6(9 − 6r − 21s − 6t) − 2(15 − 2r − 7s − 2t) + t = 0

                                          41r + 140s + 40t = 84
                                    ∼    140r + 491s + 140t = 294
                                          40r + 140s + 41t = 84

Solving this linear system, we get

[  41  140   40    84 ]   [ 1  0  0    84/571 ]
[ 140  491  140   294 ] ∼ [ 0  1  0   294/571 ] .
[  40  140   41    84 ]   [ 0  0  1    84/571 ]

Therefore,

xRow(A) = (9, 15, 0, 0, 0) + (84/571)(−6, −2, 1, 0, 0) + (294/571)(−21, −7, 0, 1, 0) + (84/571)(−6, −2, 0, 0, 1)
        = (1/571)(−2043, 6171, 84, 294, 84).

Exercises (Go to Solutions)


For Exercises 1-4, consider the matrix A and the bases for two of its fundamental subspaces, e.g., Col(A)
and Nul(A). How many rows and columns does A have, that is, what parameters m × n describe A?
     
 1   −3   −5 

 


 

      

 
    



 −2 


 4 


 0 




 2 −1
 
 
     

1. Col(A) = Span   ,   , Nul(A) = Span  1  ,  0  ,  0 
         
 1

0 
 


  
 
 
 



 




 0 
 
 1  
  0 




      

 
 0 0 1 
       
2   −2   2  0 

 
 
 


  
 
 

      
 
  

   
3   4   −5  0 

        
  
2. Col(A) = Span 
,
 
,
 
 ,
 Nul(A) = Span 


0   0   6  0 

   
 

      
 
  


      
 
  

   
−2 −1 4  1 

 
  
     



 1 0 




 7 


  
  

 




3. Row(A) = Span  0  ,  1  ,
    Lnl(A) = Span 
 1 


    
 
 

 
 
 

 1 2   −2 
     
1   0 0 

 

    

  
      

 
 1 
  

 0   1
   
  0 

4. Row(A) = Span   , Lnl(A) = Span  , ,
  
     
 2  0    0 1 
  
     

    


      

 
−1/4 1/2 −3/4 

 

For Exercises 5-7, compute a basis for the left null space Lnl(A) where A is the matrix provided. Answers
may vary.
     
 −3 2 1   1 2   2 −2 −24 −46 114



   
♠ 5. 
 7 −6 −5 
  3 4   8 −8 −26 −44 106
 

♠ 7. 
  
   
♠ 6.  5 6 
   
−7 4 1  
 −7
 7 5 3 −4 

   
 7 8 
  −2 2 10 18 −44
 
9 10

For Exercises 8-10, solve the linear system Ax = b by finding a particular solution in the row space of the
coefficient matrix, that is, find xRow(A) .
         
♠ 8. [  2   3 ] [ x1 ] = [  2 ]          ♠ 9. [  1   0  −3 ] [ x1 ]   [ −8 ]
     [ −4  −6 ] [ x2 ]   [ −4 ]               [ −2   3   0 ] [ x2 ] = [  4 ]
                                              [ −1  −2   7 ] [ x3 ]   [ 16 ]

♠ 10. [  6   6   1  5 ] [ x1 ]   [ 9 ]
      [  1   1   0  1 ] [ x2 ] = [ 2 ]
      [ −2  −2  −4  2 ] [ x3 ]   [ 8 ]
      [  0   0  −3  3 ] [ x4 ]   [ 9 ]

“Anyone who imagines that bliss is normal is going to waste a lot of time running around shouting that he
has been robbed.” – Jenkins Lloyd Jones

4.7 The Gram-Schmidt Algorithm


Using orthogonal projections, we can construct an orthogonal basis from a given basis. If desired, we can
also construct an orthonormal basis by normalizing.

Theorem 4.7.1 (The Gram-Schmidt Process). Let B = {x1 , . . . , xp } be a basis for a subspace W ≤ F n .
Define recursively the vectors

v 1 = x1
v 2 = x2 − (v 1 · x2 )/(v 1 · v 1 ) v 1
v 3 = x3 − (v 1 · x3 )/(v 1 · v 1 ) v 1 − (v 2 · x3 )/(v 2 · v 2 ) v 2
  ..
   .
v p = xp − (v 1 · xp )/(v 1 · v 1 ) v 1 − . . . − (v p−1 · xp )/(v p−1 · v p−1 ) v p−1

Then C = {v 1 , . . . , v p } is an orthogonal basis of W . In addition,

Span{v 1 , . . . , v k } = Span{x1 , . . . , xk } ∀k.
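The recursion translates directly into code. The following is a minimal sketch of the Gram-Schmidt process (assuming NumPy is available; the function name is ours), applied to the vectors of Example 4.7.2 below.

import numpy as np

def gram_schmidt(vectors):
    # Return an orthogonal basis built from a linearly independent list of vectors.
    basis = []
    for x in vectors:
        v = x.astype(float)
        for u in basis:
            v = v - (np.dot(u, x) / np.dot(u, u)) * u   # subtract the projection onto each earlier vector
        basis.append(v)
    return basis

x1 = np.array([1, 1, 1, 1])
x2 = np.array([0, 1, 1, 1])
x3 = np.array([0, 0, 1, 1])
for v in gram_schmidt([x1, x2, x3]):
    print(v)   # (1,1,1,1), (-3/4,1/4,1/4,1/4), (0,-2/3,1/3,1/3)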

     
 1  0   0 
     
 1   1   0 
     
Example 4.7.2. Let x1 =  , x2 = 
  
, and x3 = 
  , which as a set is linearly independent.

 1   1   1 
     
     
1 1 1
Let W = Span{x1 , x2 , x3 }. Then using the Gram-Schmidt procedure, we can construct an orthogonal basis
of W .
 
 1 
 
 1 
 
v 1 = x1 =   

 1 
 
 
1
     
 0   1   −3/4 
     
1 1 1/4
     
v 1 · x2 
− 3
    
v 2 = x2 − v1 =  = 
v1 · v1  1  4  1   1/4 
     
     
     
1 1 1/4
     
 0   1   −3/4 
     
0 1 1/4
     
v 1 · x3 v 2 · x3 
− 2
 
 − 1/2 
  
v 3 = x3 − v1 − v2 =  
v1 · v1 v2 · v2  1  4  1  12/16  1/4 
     
     
     
1 1 1/4
174 CHAPTER 4. ORTHOGONALITY
       
 0   1/2   −3/6   0 
       
 0   1/2   1/6   −2/3 
       
=  − − =
       

 1   1/2   1/6   1/3 
       
       
1 1/2 1/6 1/3

Note that

v 1 · v 2 = −3/4 + 1/4 + 1/4 + 1/4 = 0, v 1 · v 3 = 0 − 2/3 + 1/3 + 1/3 = 0, v 2 · v 3 = 0 − 2/3 + 1/3 + 1/3 = 0.

Therefore, {v 1 , v 2 , v 3 } is an orthogonal basis for W . If we normalize:

u1 = (1/√4)(1, 1, 1, 1) = (1/2, 1/2, 1/2, 1/2),
u2 = √(4/3) (−3/4, 1/4, 1/4, 1/4) = (−3/√12, 1/√12, 1/√12, 1/√12),
u3 = √(9/6) (0, −2/3, 1/3, 1/3) = (0, −2/√6, 1/√6, 1/√6),

then {u1 , u2 , u3 } is an orthonormal basis of W . 

   
 1   1 
   
Example 4.7.3. Let u =  . Let W = Span{u, v} ⊆ Cn . Compute an orthogonal
 i  and v =  0
 

   
0 −i
basis for W .

The set {u, v} is a basis for W since the set is linearly independent (v 6= zu for any scalar z ∈ C).
Applying the Gram-Schmidt process, we set x1 = u and
     
 1   1   1/2 
u·v   1   
x2 = v − u= 0 
 −  =
i   −i/2 
 
.
u·u 
  2

  
−i 0 −i

Note that    
1 i 1 1
x1 · x2 = 1 −i − + 0(−i) = − = 0.
2 2 2 2
Thus, {x1 , x2 } is an orthogonal basis for W . 

Theorem 4.7.4 (The QR Factorization). If A is an m × n matrix with linearly independent columns, then
A can be factored as A = QR, where Q is an m × n matrix with orthonormal columns and Col A = Col Q
and R is an n × n upper triangular matrix with positive diagonal entries. In particular, R is necessarily
invertible.
The matrix Q is constructed by using an orthonormal basis provided by the Gram-Schmidt process, although some columns might need to be scaled by −1 to guarantee that the diagonal entries of R are positive. If A = QR, then it can be shown that R = [ hq i , aj i ], where q i and aj denote the columns of Q and A, respectively. With respect to the dot product, this gives that Q> A = R.
Example 4.7.5. Let

A = [ 1  0  0 ]
    [ 1  1  0 ]
    [ 1  1  1 ] .
    [ 1  1  1 ]

By Example 4.7.2, {u1 , u2 , u3 } is an orthonormal basis for Col A where

Q = [ u1  u2  u3 ] = [ 1/2   −3/√12      0    ]
                     [ 1/2    1/√12   −2/√6   ]
                     [ 1/2    1/√12    1/√6   ] .
                     [ 1/2    1/√12    1/√6   ]

Let

R = Q> A = [  1/2       1/2      1/2      1/2   ] [ 1  0  0 ]   [ 2   3/2      1   ]
           [ −3/√12    1/√12    1/√12    1/√12  ] [ 1  1  0 ] = [ 0  3/√12   1/√3  ] .
           [   0      −2/√6     1/√6     1/√6   ] [ 1  1  1 ]   [ 0    0     2/√6  ]
                                                  [ 1  1  1 ]

Therefore, A = QR. 
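In practice one rarely runs Gram-Schmidt by hand; a QR routine returns Q and R directly. A minimal sketch (assuming NumPy; note that np.linalg.qr may return some columns of Q, and the corresponding rows of R, negated relative to the hand computation, since the signs are only pinned down by requiring positive diagonal entries of R):

import numpy as np

A = np.array([[1.0, 0, 0],
              [1.0, 1, 0],
              [1.0, 1, 1],
              [1.0, 1, 1]])

Q, R = np.linalg.qr(A)        # "reduced" QR: Q is 4x3, R is 3x3 upper triangular
print(Q)
print(R)
print(np.allclose(Q @ R, A))  # True: the factorization reproduces A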

Exercises (Go to Solutions)


For Exercises 1-7, apply the Gram-Schmidt algorithm to find an orthonormal basis for each subspace W
provided.
             
 1 −1 4 2 3 1   2 
 
 
 
 
 

♠ 1. Span   ,  ♠
 
3. Span , ,
    
     
   
          
 1

2 
 ♠ 2. Span   −4  ,  4 
    4

2 2 


     

 

 2 −5 
         
 1   2  1   2 3

 
 
 


  
   

    
      

♠ 5. Span 
 
 2 , 1
 0   1 
       , 3 
♠ 4. Span  ,     
    
       
 
 3   4   3 −3 3 

 

    


    

 
 1

2 

         
1   0 1 1   1−i

 
 
 


  
   

  

 
     

♠ 6. Span  , 1  
−i  i   2   0
 
     

♠ 7. Span 
   
,


    
    , 
    
 2i 1−i  −1   3   3 + 2i
 
     
 


     

 
 −i

4 i

For Exercises 8-12, compute the QR-factorizations of the matrix. Note that many of the column spaces of
these matrices coincide with subspaces considered in Exercises 1-7.
       
 1 −1   0 −1   4 2   1 2 
♠ 8.   ♠ 9.      
1 2 1 3 ♠ 10.  −4 4   0 1 
 
♠ 11. 
  
  
2 −5  3 4 
 
 
1 2

 
 1 2 3 
 
♠ 12. 
 2 1 3 

 
3 −3 3

“Hypotheses are what we lack the least.” – Henri Poincare

4.8 The Least Squares Problem


In the past, we have considered the matrix equation Ax = b. We have learned how to determine if the
equation has a solution and how to calculate the solution set when it is consistent. In fact, if x is a solution,
then we can decompose
x = xRow(A) + xNul(A)
orthogonally. But what about when Ax = b is inconsistent? Well, the solution set is empty then, but what
if we really needed it to have a solution? No manner of begging will change the fact that the matrix equation
does not have a solution. For example, what if we want to find a line that is incident to all of the below
points. As these points are not collinear, no such line exists. But is there a line that best fits the points, a
so-called regression line?
[Figure: data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) in the plane together with a regression line y = mx + b.] Such a line would have to satisfy the system

x1 m + b = y1
x2 m + b = y2
x3 m + b = y3
   ..
    .
xn m + b = yn
Instead of asking, “Is there a vector x such that Ax = b,” we ask, “Is there a vector x such that Ax ≈ b?”
By approximation here, we mean to find a vector x such that kAx − bk is sufficiently small. This leads to
the least-squares problem.

Definition 4.8.1. If A is an m × n matrix and b ∈ F m , a least-squares solution of Ax = b is a vector x̂ ∈ F n such that

kb − Ax̂k ≤ kb − Axk

for all x ∈ F n .

Note that if Ax = b is consistent, then there exists a vector x̂ such that Ax̂ = b and kb − Ax̂k = k0k = 0. Thus, any solution is a least-squares solution. On the other hand, if Ax = b is inconsistent, then the matrix equation has no solution but must certainly still have a least-squares solution. To find this least-squares solution, let us change our focus. Why is the linear system inconsistent and who should be blamed for it? Well, the vector b, sort of. We have also seen that b can be orthogonally decomposed as

b = bCol(A) + bLnl(A) .

If Ax = b is inconsistent, then it is because the left null space component of b, that is, bLnl(A) , is nontrivial. As we mentioned before, while the null space measures the ease of solving a consistent system (because it increases the dimension of the solution set), the left null space measures the difficulty of the system being consistent (because more rows of zeros in the echelon form of A place more limitation on the choice of consistent b).

Let A be an m × n matrix and b ∈ F m . Let x ∈ F n be a generic vector (treat x as a variable vector). Then Ax is a generic vector of the subspace Col A. Let b̂ = projCol A b = bCol(A) . Then

kb − b̂k ≤ kb − Axk

by the Best Approximation Theorem. Since b̂ ∈ Col A, there exists some vector x̂ ∈ F n such that Ax̂ = b̂. Therefore, every linear system has a least-squares solution x̂, which is a solution to the linear system Ax = b̂, since for all x we have

kb − Ax̂k ≤ kb − Axk.

Of course, we also have that the linear system Ax = b̂ is ALWAYS consistent.

If x̂ is a least-squares solution of Ax = b, then b − Ax̂ = b − b̂ ∈ (Col A)⊥ = Lnl(A). Hence, b − Ax̂ = bLnl(A) . Thus, A> (b − Ax̂) = 0. Therefore, A> Ax̂ = A> b. In particular, every least-squares solution to Ax = b is a solution to the system of normal equations

A> Ax̂ = A> b.                                                            (4.8.1)

Essentially, multiplying by A> kills off the left null space component of b, leaving the column space component behind. For complex linear systems, replace > with ∗ , per the usual.

Theorem 4.8.2. The set of least-squares solutions of Ax = b coincides with the solution set of the normal equations A> Ax̂ = A> b.

Example 4.8.3. Find a least-squares solution of the inconsistent system Ax = b with

A = [ 4  0 ]               [  2 ]
    [ 0  2 ]    and   b =  [  0 ] .
    [ 1  1 ]               [ 11 ]

We begin by constructing the normal equations:

A> A = [ 4  0  1 ] [ 4  0 ]   [ 17  1 ]          A> b = [ 4  0  1 ] [  2 ]   [ 19 ]
       [ 0  2  1 ] [ 0  2 ] = [  1  5 ] ;               [ 0  2  1 ] [  0 ] = [ 11 ] .
                   [ 1  1 ]                                         [ 11 ]

Thus, A> Ax̂ = A> b becomes

[ 17  1 ] x̂ = [ 19 ] .
[  1  5 ]     [ 11 ]

Since A> A is invertible, we have

x̂ = (A> A)−1 A> b = (1/84) [  5  −1 ] [ 19 ] = (1/84) [  84 ] = [ 1 ] . 
                           [ −1  17 ] [ 11 ]          [ 168 ]   [ 2 ]
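The same answer comes out of a library least-squares solver, which is the usual route in practice. A minimal sketch (assuming NumPy; the data are from Example 4.8.3, and solving the normal equations explicitly is shown only for comparison):

import numpy as np

A = np.array([[4.0, 0], [0.0, 2], [1.0, 1]])
b = np.array([2.0, 0, 11])

# Route 1: solve the normal equations A^T A x = A^T b.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Route 2: a dedicated least-squares routine (numerically preferable).
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_normal, x_lstsq)   # both are (1, 2)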
Example 4.8.4. Find a least-squares solution for

A = [ 1  1  0  0 ]               [ −3 ]
    [ 1  1  0  0 ]               [ −1 ]
    [ 1  0  1  0 ]               [  0 ]
    [ 1  0  1  0 ]    and   b =  [  2 ] .
    [ 1  0  0  1 ]               [  5 ]
    [ 1  0  0  1 ]               [  1 ]

A> A = [ 6  2  2  2 ]          A> b = [  4 ]          [ 6  2  2  2    4 ]   [ 1  0  0   1    3 ]
       [ 2  2  0  0 ] ;               [ −4 ] ;        [ 2  2  0  0   −4 ] ∼ [ 0  1  0  −1   −5 ] .
       [ 2  0  2  0 ]                 [  2 ]          [ 2  0  2  0    2 ]   [ 0  0  1  −1   −2 ]
       [ 2  0  0  2 ]                 [  6 ]          [ 2  0  0  2    6 ]   [ 0  0  0   0    0 ]

Therefore,

x̂ = (3, −5, −2, 0) + x4 (−1, 1, 1, 1). 

As illustrated in the last example, a least-squares solution need not be unique.

Theorem 4.8.5. Let A be an m × n matrix. Then the following are equivalent:


(i) The equation Ax = b has a unique least-squares solution for each b ∈ F m .
(ii) The columns of A are linearly independent.

(iii) The matrix A> A is invertible.


In this situation, the unique least-squares solution has the form

x̂ = (A> A)−1 A> b.

Example 4.8.6. The matrix A given in Example 4.8.3 has a unique least-squares solution for all b ∈ Rm . 

Least-squares solutions also allow us to compute orthogonal projections without an orthogonal basis. For example, if W = Span{v 1 , . . . , v r } ≤ F n , A = [ v 1  v 2  . . .  v r ], and y ∈ F n , then W = Col(A) and ŷ = projW (y) = projCol(A) (y) = y Col(A) = Ax̂ where x̂ is a least-squares solution to Ax = y.

Example 4.8.7. Let W = Col(A), where A is defined in Example 4.8.3. Then

b̂ = projW (b) = Ax̂ = [ 4  0 ] [ 1 ]   [ 4 ]
                      [ 0  2 ] [ 2 ] = [ 4 ] .
                      [ 1  1 ]         [ 3 ]

Thus, we may find orthogonal projections by solving least-squares problems. 

In the vein of the previous example, the standard matrix of the orthogonal projection from F n onto the subspace W is A(A> A)−1 A> , where the columns of A correspond to a basis for W , since b 7→ A(A> A)−1 A> b = Ax̂ = b̂ = projW (b). Of course, if A = QR is a QR factorization, then

A(A> A)−1 A> = QR((QR)> QR)−1 (QR)> = QR(R> Q> QR)−1 R> Q> = QR(R> R)−1 R> Q> = QR(R−1 (R> )−1 )R> Q> = QQ> .

This second form appears much simpler (and agrees with the formula we found in Section 4.5), but this is because knowing A = QR means we have an orthonormal basis for W , namely the columns of Q. The advantage of A(A> A)−1 A> is that no orthogonalization is necessary to determine projW .
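A quick numerical check that the two formulas agree (a minimal sketch, assuming NumPy; A is the matrix of Example 4.8.3):

import numpy as np

A = np.array([[4.0, 0], [0.0, 2], [1.0, 1]])

P1 = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix without orthogonalizing
Q, R = np.linalg.qr(A)
P2 = Q @ Q.T                            # same projection, built from an orthonormal basis

print(np.allclose(P1, P2))              # True
print(np.round(21 * P1))                # the integer entries shown in Example 4.8.8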
 
 4 0 
  3 3
Example 4.8.8. Let A =   0  and W = Col(A). Let T : R → R be the orthogonal projection onto
2 
 
1 1
W . Then
   
−1 
 4 0  4 0 
   
  17 1   4 0 1   5 −1   4 0 1 

1 
[T ] = A(A> A)−1 A> = 

 0 2
 =  0 2 
84 
     
  1 5 0 2 1   −1 17 0 2 1
1 1 1 1
     
 20 −4   80 −8 16   20 −2 4 
 
1   4 0 1  1   1  
=  −2 34   =  −8
 = 21  −2
68 32   17 8 
84 
 
 0 84  
2 1    
4 16 16 32 20 4 8 5

Note that
        
 20 −2 4  2   40 + 0 + 44   84   4 
1   
= 1  
= 1    
 21  84  =  4  .
 −2
 0
17 8   21  −4 + 0 + 88 
     
21 
        
4 8 5 11 8 + 0 + 55 63 3

Theorem 4.8.9. Let A be an m × n matrix with linearly independent columns. Let A = QR be a QR factorization. Let b ∈ F m . Then the unique least-squares solution x̂ ∈ F n has the form

x̂ = R−1 Q> b.

Proof. Let x̂ = R−1 Q> b. Then

Ax̂ = A(R−1 Q> b) = QR(R−1 Q> b) = QQ> b = projCol Q b,

since QQ> is the standard matrix representation for projCol Q . Since Col Q = Col A, x̂ is a least-squares solution of Ax = b. By Theorem 4.8.5, this is the unique least-squares solution. 

Exercises (Go to Solutions)


For Exercises 1-3, compute the least square solutions for linear system.
     
 1 −1 2   0 1 3   2 0 −10 
     
♠ 1. 
 2 3 −1   1

0 −2 

 1

−2 2 6 

♠ 2.  ♠ 3. 
  
     
4 5 5  2 −1 −7   2 −1 0 0 
   
   
1 0 −2 0 1 −1 6

For Exercises 4-5, compute the orthogonal projection projW (b) using the method of least squares as in Example 4.8.7 where W = Col(A), where A is the matrix provided first and b is the vector provided second.

♠ 4. [  1  −1 ]   [ 4 ]          ♠ 5. [ 5   1 ]   [ −4 ]
     [  3   2 ] , [ 1 ]               [ 1   3 ] , [  2 ]
     [ −2   4 ]   [ 3 ]               [ 4  −2 ]   [  3 ]

For Exercises 6-7, compute the orthogonal projection projW (b) using the method of least-squares, where a basis for W is provided first and the vector b is the vector provided second.

♠ 6. {(−1, 2, 1), (2, 2, 4)} , (1, −6, 1) ♠ 7. {(2, 1, 1, 1), (1, 0, 1, 1), (−2, −1, 0, −1)} , (6, 3, 9, 6)

For Exercises 8-9, find a basis for W and the standard matrix for the orthogonal projection onto W . Answers
may vary.

♠ 8. W is the plane given by the equation 5x − 3y + z = 0 in R3 .


♠ 9. W is the line with parametric equations x = 2t, y = −t, and z = 4t in R3 .

10. If A is a nonsingular matrix, show that the unique least squares solution x
b to the linear system Ax = b
is likewise the unique linear solution to Ax = b.
11. Suppose that A = QR is a QR factorization of A. If Ax = b has a unique least-squares solution, show
b = R−1 Q> b is this solution.
that x
Chapter 5

Determinants


“Our lives will depend upon the decisions which we make–for decisions determine destiny.”
– Thomas S. Monson

5.1 Introduction to Determinants


 
In a previous section, we discussed the inverse of a 2 × 2 matrix A = [ a  b ] :
                                                                      [ c  d ]

A−1 = 1/(ad − bc) [  d  −b ] .
                  [ −c   a ]

The quantity ad − bc was called the determinant of A, denoted det A or |A|. In this chapter we work to
generalize this idea for any square matrix. Our strategy will be to define determinants recursively. We will
remind the reader that, unlike the previous chapter, the theory of determinants is applicable to any field.
Hence, F will denote an arbitrary field.

Definition 5.1.1. Let A be an n × n matrix. Define the (i, j)-minor matrix Aij to be the (n − 1) × (n − 1)
matrix which results by removing the ith row and jth column from A.

 
 1 5 0 
 
Example 5.1.2. Let A = 
 2 . Compute the (1, 1)−, (1, 2)−, (2, 2)−, and (3, 3)-minor matrix.
4 −1 
 
0 −2 0

       
4 −1   2 −1   1 0   1 5 
A11 =  , A12 = , A22 = , A33 = .



−2 0 0 0 0 0 2 4

   
Definition 5.1.3. Let A = [ aij ] be an n × n matrix. For n = 1, A = [ r ] for some r ∈ F . In this case, we define det A = r. For n > 1, let

det A = a11 det A11 − a12 det A12 + . . . + (−1)^{1+n} a1n det A1n = Σ_{j=1}^{n} (−1)^{1+j} a1j det A1j .

The quantity det A is called the determinant of A.


 
 a b  a b
Let A =  . Then det A = = a det A11 − b det A12 = a d −b c = ad − bc, which
c d c d
agrees with our 2 × 2 determinant from before.

 
 1 5 0 
 
Example 5.1.4. Compute det A for A = 
 2 .
4 −1 
 
0 −2 0
5.1. INTRODUCTION TO DETERMINANTS 185

1 5 0
4 −1 2 −1 2 4
det A = 2 4 −1 =1 −5 +0
−2 0 0 0 0 −2
0 −2 0
= 1(4 · 0 − (−1) · (−2)) − 5(2 · 0 − (−1) · 0) + 0 = (0 − 2) − 5(0) = −2. 

Definition 5.1.5. Let A be an n × n matrix. Define the (i, j)-cofactor Cij as


Cij = (−1)i+j det Aij .

With this notation,


det A = a11 C11 + a12 C12 + . . . + a1n C1n .

The following diagram may help you remember the plus or minus sign in the cofactors:

[ +  −  +  · · · ]
[ −  +  −        ]
[ +  −  +        ]
[ ..          .. ]
[  .            . ]

 
Theorem 5.1.6 (Laplace Expansion). The determinant of an n × n matrix A = [ aij ] can be computed by a cofactor expansion across any row or down any column. The cofactor expansion across the ith row is

det A = ai1 Ci1 + ai2 Ci2 + . . . + ain Cin = Σ_{j=1}^{n} aij Cij .

The cofactor expansion down the jth column is

det A = a1j C1j + a2j C2j + . . . + anj Cnj = Σ_{i=1}^{n} aij Cij .
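Cofactor expansion is also the most direct way to program the determinant from the definition (it is exponential-time, so it is practical only for small matrices; the row-reduction method of Section 5.2 is what libraries actually use). A minimal recursive sketch in Python (the function names are ours):

def minor(A, i, j):
    # The (i, j)-minor matrix: remove row i and column j (0-indexed).
    return [row[:j] + row[j+1:] for k, row in enumerate(A) if k != i]

def det(A):
    # Determinant by cofactor expansion across the first row.
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[1, 5, 0],
     [2, 4, -1],
     [0, -2, 0]]
print(det(A))   # -2, as in Example 5.1.4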

 
 1 5 0 
 
Example 5.1.7. Use the cofactor expansion across the third row to compute det A, where A = 
 2 .
4 −1 
 
0 −2 0

1 5 0
5 0 1 0 1 5
det A = 2 4 −1 =0 − (−2) +0
4 −1 2 −1 2 4
0 −2 0

1 0
= 2 = 2(−1 − 0) = −2 .
2 −1

Example 5.1.8. Compute det A, where A = [ 3  −7   8   9  −6 ]
                                        [ 0   2  −5   7   3 ]
                                        [ 0   0   1   5   0 ] .
                                        [ 0   0   2   4  −1 ]
                                        [ 0   0   0  −2   0 ]

We will expand across the leftmost column to maximize the number of zero coefficients.

| 3  −7   8   9  −6 |
| 0   2  −5   7   3 |       | 2  −5   7   3 |
| 0   0   1   5   0 |  =  3 | 0   1   5   0 |  =  6 | 1   5   0 |
| 0   0   2   4  −1 |       | 0   2   4  −1 |       | 2   4  −1 |
| 0   0   0  −2   0 |       | 0   0  −2   0 |       | 0  −2   0 |

  =  6 ( 1 | 4  −1 |  −  2 | 5   0 | )  =  6(−2 + 0) = −12
           |−2   0 |       |−2   0 |

The matrix in the last example is nearly triangular. The method in that example is easily adapted to
prove the following.

Theorem 5.1.9. If A is a triangular n × n matrix, then det A is the product of the entries on the main
diagonal.

Theorem 5.1.10. If A is a 2 × 2 matrix over R or C, the area of the parallelogram determined by the columns of A is | det A|. If A is 3 × 3, the volume of the parallelepiped determined by the columns of A is | det A|. The higher dimensional analogues also hold.

Example 5.1.11. Calculate the area of the parallelogram determined by the points (−2, −2), (0, 3), (4, −1),
and (6, 4).

We begin by translating the parallelogram such that (−2, −2) is moved to the origin. This translation
does not affect the area. We now consider the parallelogram with vertices (0, 0), (2, 5), (6, 1), and (8, 6).
Note that this parallelogram is the one spanned by (2, 5) and (6, 1). Therefore,

Area = abs | 2  6 | = |2 − 30| = 28 . 
           | 5  1 |

Exercises (Go to Solutions)


 
 5 7
12 
 
For Exercises 1-5, for the matrix A = 
 0 3 , find the given minor.
4 
 
9 −1 −6

1. A22 2. A33 3. A11 4. A23 5. A12


 
 1 2
4  3
 
 0 −2 −1 3 
 
For Exercises 6-10, for the matrix B = 

, find the given minor.

 1 1 3 6 
 
 
7 −4 8 0

♠ 6. B11 ♠ 7. B13 ♠ 8. B31 ♠ 9. B33 ♠ 10. B34


 
 1 2 3 4 
 
 0 5 3 −1 
 
For Exercises 11-13, for the matrix C = 

, find the given minor.

 −2 4 0 −4 
 
 
−1 3 2 5

11. C22 12. C23 13. C42


 
For Exercises 14-16, for the matrix D = [   i       3i − 2   2 + 5i   7 − i  ] , find the given minor.
                                        [ 3 + i       4      7i − 5   9 − 3i ]
                                        [ 2 + 3i    5i − 1   1 + 9i   4 + 5i ]
                                        [ 7 + 2i    i − 3    2 + 4i   1 + 2i ]

14. D24 15. D33 16. D14

For Exercises 17-25, compute the determinant of the matrix.

♠ 17. [ 1  2 ]          ♠ 18. [   i     1 + i  ]          ♠ 19. [ 3  2 ]  (mod 7)
      [ 3  4 ]                [ 2 − i   3 − 4i ]                [ 1  5 ]

♠ 20. [  1  2  −1 ]     ♠ 21. [ 1   2  3 ]                ♠ 22. [ 2  0  0 ]  (mod 5)
      [ −1  0   2 ]           [ 3  −2  5 ]                      [ 3  1  0 ]
      [  3  5   1 ]           [ 0   0  2 ]                      [ 4  3  4 ]

♠ 23. [ 1  1  2  4 ]  (mod 5)     ♠ 24. [ 1  0  0  0  0 ]  (mod 2)     25. [ 5  4  4  2 ]
      [ 0  1  1  3 ]                    [ 0  0  0  1  0 ]                  [ 2  4  6  8 ]
      [ 0  0  2  4 ]                    [ 0  0  1  0  0 ]                  [ 0  1  2  3 ]
      [ 1  2  3  4 ]                    [ 0  0  1  0  0 ]                  [ 3  3  3  9 ]
                                        [ 0  0  0  0  1 ]

26. Compute det(A)(u · v) if:


  
3 −2
  5e e
2
 π e 
   
   
A= √ , u= 1 

, v= 0 

.
5 π    √ 
π π3

For Exercises 27-29, compute the area, volume, or hyper-volume of the parallelogram, parallelepiped, or
hyper-parallelepiped spanned by the given set of vectors.
                 
 1
  3  1   4   −2 1   1   0 2 
 
 
 
 
 
 
♠ 27. 
 
,
 
  
    
               

♠ 28.  , 8 , 3  
 2 2  2 2   0   0 −3 
   
       
  
♠ 29. 
       , , , 

      


 3         
2 7  


 3  
  1
0     4 


 
 
   
 
 



 
4 2 −2 5 

 

“In most encounters we can determine the kind of experience we are going to have by how we respond.”
– Wayne S. Peterson

5.2 Properties of Determinants


Let V and W be vector spaces over F . We have seen previously that linear maps T : V → W are those that
preserve the linear operations, that is, T (x + y) = T (x) + T (y) and T (cx) = cT (x) for all x, y ∈ V and
c ∈ F . Let V n = {(v 1 , v 2 , . . . , v n ) | v 1 , v 2 , . . . , v n ∈ V }, that is, V n is the set of all lists of n vectors from V . This is itself a vector space with dim V n = n dim V . When V = F m , then V n ≅ F m×n , the space of all m × n matrices, as each list in V n can be identified with the column vectors of an m × n matrix.

Definition 5.2.1. Let V and W be vector spaces over F . Let n ∈ N. Let B : V n → W be a function. We say that B is multilinear if for each i and choice of vectors v 1 , v 2 , . . . , v n ∈ V the function

x 7→ B(v 1 , v 2 , . . . , v i−1 , x, v i+1 , . . . , v n )

is a linear transformation. In other words, a multilinear map is one which is linear in each variable. When
n = 2, we say a multilinear map is bilinear. Of course, if n = 1 then a multilinear map is simply just linear.

Example 5.2.2. The dot product · : Rn × Rn → R is bilinear since for all u, v, w and c ∈ R it holds

(u + v) · w = u · w + v · w, (cu) · v = c(u · v), u · (v + w) = u · v + u · w, u · (cv) = c(u · v).

Thus, the dot product is linear in the first and the second factor.

Likewise, the tensor product ⊗ : Rn × Rn → Rn×n is also bilinear since for all u, v, w and c ∈ R it holds

(u + v) ⊗ w = u ⊗ w + v ⊗ w, (cu) ⊗ v = c(u ⊗ v), u ⊗ (v + w) = u ⊗ v + u ⊗ w, u ⊗ (cv) = c(u ⊗ v). 

Determinants are NOT linear transformations (det(A + B) 6= det(A) + det(B)). The determinant map
det : F n×n → F is multilinear with respect to both its rows and columns since det(A) = det(A> ). In
particular, if A, B, and C are n × n matrices that differ only in a single row, say the rth row, and assume
the rth row of C is obtained by adding corresponding entries in the rth rows of A and B. Then

det(C) = det(A) + det(B).

Likewise, if A and B are matrices that differ only in a single row, say the rth row, and assume the rth row
of B is obtained by scaling the corresponding entries in the rth row of A by c ∈ F . Then det(B) = c det(A).

Example 5.2.3. Note that

$\begin{vmatrix} 4&5&0\\3&-1&2\\1+0&2+4&3-2 \end{vmatrix} = \begin{vmatrix} 4&5&0\\3&-1&2\\1&2&3 \end{vmatrix} + \begin{vmatrix} 4&5&0\\3&-1&2\\0&4&-2 \end{vmatrix}$  and  $\begin{vmatrix} 1&2&3\\4&6&8\\0&1&3 \end{vmatrix} = 2\begin{vmatrix} 1&2&3\\2&3&4\\0&1&3 \end{vmatrix}.$
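This row-by-row behavior is easy to check numerically. The following is a minimal sketch (NumPy is an assumption of this sketch, not part of the text) verifying additivity and scaling in a single row for the matrices of Example 5.2.3:

    import numpy as np

    A = np.array([[4, 5, 0], [3, -1, 2], [1, 2, 3]], dtype=float)
    B = np.array([[4, 5, 0], [3, -1, 2], [0, 4, -2]], dtype=float)
    C = A.copy()
    C[2] = A[2] + B[2]        # C agrees with A and B except in row 3, which is the sum

    # additivity in a single row: det(C) = det(A) + det(B)
    print(np.isclose(np.linalg.det(C), np.linalg.det(A) + np.linalg.det(B)))

    # homogeneity in a single row: doubling row 2 doubles the determinant
    D = np.array([[1, 2, 3], [2, 3, 4], [0, 1, 3]], dtype=float)
    D2 = D.copy()
    D2[1] = 2 * D[1]
    print(np.isclose(np.linalg.det(D2), 2 * np.linalg.det(D)))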

The following is probably the most important property of determinants.

Theorem 5.2.4. If A and B are n × n matrices, then det(AB) = det(A) det(B).


Example 5.2.5. Let $A = \begin{bmatrix} 6&1\\3&2 \end{bmatrix}$ and $B = \begin{bmatrix} 4&3\\1&2 \end{bmatrix}$. Then det A = 6(2) − 1(3) = 9 and det B = 4(2) − 3(1) = 5. Thus, det(A) det(B) = 45. On the other hand,

$AB = \begin{bmatrix} 6&1\\3&2 \end{bmatrix}\begin{bmatrix} 4&3\\1&2 \end{bmatrix} = \begin{bmatrix} 25&20\\14&13 \end{bmatrix}.$

Thus, det(AB) = 25(13) − 20(14) = 325 − 280 = 45.
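A quick numerical check of Theorem 5.2.4 on the matrices of Example 5.2.5, again assuming NumPy:

    import numpy as np

    A = np.array([[6, 1], [3, 2]], dtype=float)
    B = np.array([[4, 3], [1, 2]], dtype=float)

    print(np.linalg.det(A), np.linalg.det(B))   # approximately 9.0 and 5.0
    print(np.linalg.det(A @ B))                 # approximately 45.0 = 9 * 5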

Corollary 5.2.6. A square matrix A is nonsingular if and only if det A ≠ 0. In this case, $\det(A^{-1}) = \dfrac{1}{\det(A)}$.

The previous corollary tells us that if the rows or columns of A are linearly dependent, then det(A) = 0.
In particular, if A has a repeated row (or column) or a row (or column) of zeros, then det(A) = 0, with no
further calculation necessary.

For a square matrix A, let E and B be square matrices such that A = EB and E is an elementary
matrix. Thus, det(A) = det(E) det(B). If E is a replacement elementary matrix, det(E) = 1 since it is unit
triangular. If E is a scaling elementary matrix by a factor of c, then det(E) = c since it is diagonal. If E is
an interchange elementary matrix, then det(E) = −1.

Theorem 5.2.7. Let A be a square matrix.

(i) (Replacement) If a multiple of one row of A is added to another row to produce a matrix B, then
det B = det A.

(ii) (Scaling) If one row of A is multiplied by c to produce B, then det B = c · det A.

(iii) (Interchange) If two rows of A are interchanged to produce B, then det B = − det A.

 
 1 −4 2 
 
Example 5.2.8. Compute det A, where A = 
 −2 .
8 −9 
 
−1 7 0
Using Theorem 3, we row reduce A to an echelon form in order to compute det A.

1 −4 2 1 −4 2 1 −4 2 1 −4 2

−2 8 −9 = 0 0 −5 = 0 0 −5 =− 0 3 2 = −1(3)(−5) = 15 . 

−1 7 0 −1 7 0 0 3 2 0 0 −5
Example 5.2.9. Compute det A, where $A = \begin{bmatrix} 2&-8&6&8\\3&-9&5&10\\-3&0&1&-2\\1&-4&0&6 \end{bmatrix}$.

Again, we row reduce to calculate the determinant.

$\begin{vmatrix} 2&-8&6&8\\3&-9&5&10\\-3&0&1&-2\\1&-4&0&6 \end{vmatrix} = 2\begin{vmatrix} 1&-4&3&4\\3&-9&5&10\\-3&0&1&-2\\1&-4&0&6 \end{vmatrix} = 2\begin{vmatrix} 1&-4&3&4\\0&3&-4&-2\\0&-12&10&10\\0&0&-3&2 \end{vmatrix} = 2\begin{vmatrix} 1&-4&3&4\\0&3&-4&-2\\0&0&-6&2\\0&0&-3&2 \end{vmatrix}$

$= 4\begin{vmatrix} 1&-4&3&4\\0&3&-4&-2\\0&0&-3&1\\0&0&-3&2 \end{vmatrix} = 4\begin{vmatrix} 1&-4&3&4\\0&3&-4&-2\\0&0&-3&1\\0&0&0&1 \end{vmatrix} = 4(1)(3)(-3)(1) = -36.$
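The row-reduction strategy of the last two examples can be sketched as a small routine. The function name det_by_row_reduction and the use of NumPy are illustrative assumptions, not part of the text:

    import numpy as np

    def det_by_row_reduction(A):
        """Reduce A to echelon form using only interchanges and replacements,
        then multiply the diagonal; each interchange flips the sign."""
        U = np.array(A, dtype=float)
        n = U.shape[0]
        sign = 1.0
        for j in range(n):
            p = np.argmax(np.abs(U[j:, j])) + j      # choose a pivot row
            if np.isclose(U[p, j], 0.0):
                return 0.0                           # no pivot: matrix is singular
            if p != j:
                U[[j, p]] = U[[p, j]]                # interchange: det changes sign
                sign = -sign
            for i in range(j + 1, n):
                U[i] -= (U[i, j] / U[j, j]) * U[j]   # replacement: det unchanged
        return sign * np.prod(np.diag(U))

    A = [[2, -8, 6, 8], [3, -9, 5, 10], [-3, 0, 1, -2], [1, -4, 0, 6]]
    print(det_by_row_reduction(A))   # approximately -36, as in Example 5.2.9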


Example 5.2.10. Compute $\begin{vmatrix} 0&1&2&-1\\2&5&-7&3\\0&3&6&2\\-2&-5&4&-2 \end{vmatrix}$.

This time we combine methods of cofactors and row reduction.

$\begin{vmatrix} 0&1&2&-1\\2&5&-7&3\\0&3&6&2\\-2&-5&4&-2 \end{vmatrix} = \begin{vmatrix} 0&1&2&-1\\2&5&-7&3\\0&3&6&2\\0&0&-3&1 \end{vmatrix}$  (add Row 2 to Row 4)

$= -2\begin{vmatrix} 1&2&-1\\3&6&2\\0&-3&1 \end{vmatrix}$  (cofactor expand across Column 1)

$= -2\begin{vmatrix} 1&2&-1\\0&0&5\\0&-3&1 \end{vmatrix}$  (add -3 Row 1 to Row 2)

$= 2\begin{vmatrix} 1&2&-1\\0&-3&1\\0&0&5 \end{vmatrix}$  (interchange Rows 2 and 3)

= 2(−3)(5) = −30.



Exercises (Go to Solutions)

For Exercises 1-8, suppose that $\begin{vmatrix} a&b&c\\e&f&g\\h&i&j \end{vmatrix} = 2$, $\begin{vmatrix} a&b&c\\e&f&g\\x&y&z \end{vmatrix} = -3$, and $\begin{vmatrix} a&b&u\\e&f&v\\h&i&w \end{vmatrix} = 5$. Compute the
given expression.

1. $\begin{vmatrix} a&u&b\\e&v&f\\h&w&i \end{vmatrix}$   ♠ 2. $\begin{vmatrix} a&b&c\\h&i&j\\e&f&g \end{vmatrix}$   ♠ 3. $\begin{vmatrix} 5a&5b&5c\\e&f&g\\h&i&j \end{vmatrix}$   4. $\begin{vmatrix} 2u&b&a\\2v&f&e\\2w&i&j \end{vmatrix}$

5. $\begin{vmatrix} a&b&c+u\\e&f&g+v\\h&i&j+w \end{vmatrix}$   ♠ 6. $\begin{vmatrix} 3a&3c&3b\\h&j&i\\e+5a&g+5c&f+5b \end{vmatrix}$   ♠ 7. $\begin{vmatrix} e&f&g\\4a&4b&4c\\h+5x&i+5y&j+5z \end{vmatrix}$

8. $\begin{vmatrix} 2b&a&c+5u\\2f&e&g+5v\\2i&h&j+5w \end{vmatrix}$

For Exercises 9-19, compute the determinant of the each of the following matrices using row reduction.

9. $\begin{bmatrix} 6&1\\2&9 \end{bmatrix}$   10. $\begin{bmatrix} 4+i&3+i\\-i&-2 \end{bmatrix}$   ♠ 11. $\begin{bmatrix} -3&6&-1\\-8&11&-3\\2&-3&1 \end{bmatrix}$   12. $\begin{bmatrix} 1&2&3\\-3&-2&-4\\5&10&21 \end{bmatrix}$

13. $\begin{bmatrix} 1&2&3\\4&5&6\\0&2&4 \end{bmatrix}$   14. $\begin{bmatrix} 2&26&36\\6&87&113\\5&65&97 \end{bmatrix}$   ♠ 15. $\begin{bmatrix} 2&11&4\\9&1&4\\6&6&5 \end{bmatrix}$ (mod 13)

16. $\begin{bmatrix} 2&4&7\\-2&-1&1\\3&6&15 \end{bmatrix}$   ♠ 17. $\begin{bmatrix} 1&2&3&4\\0&3&2&-1\\3&5&0&1\\4&-1&-1&0 \end{bmatrix}$ (mod 7)   ♠ 18. $\begin{bmatrix} 1&0&1&2&3\\-1&1&6&3&1\\0&2&6&3&4\\-2&4&3&5&0\\0&3&1&2&1 \end{bmatrix}$

19. $\begin{bmatrix} 2&3&9&1&4\\7&2&2&9&3\\0&0&3&3&0\\5&2&2&7&0\\6&2&6&5&4 \end{bmatrix}$

For Exercises 20-24, suppose A is a 5 × 5 real matrix such that det(A) = 3.

♠ 20. Compute rank(A). Compute nullity(A). ♠ 21. Compute corank(A). Compute conullity(A).

♠ 22. For any b ∈ R5 , is the linear system Ax = b consistent? How many free variables are in the linear
system Ax = b? How many solutions does Ax = b have?
♠ 23. Compute the row reduced echelon form of A.
♠ 24. Let T : R5 → R5 be the linear transformation with standard matrix A. Compute ker(T ). Compute
im(T ). Is T one-to-one? Is T onto?

25. Prove Theorem 5.2.4 in the special case that A and B are 2 × 2 matrices.

“Desires dictate our priorities, priorities shape our choices, and choices determine our actions. The desires
we act on determine our changing, our achieving, and our becoming.” – Dallin H. Oaks

5.3 Cramer’s Rule


 
Definition 5.3.1. Let $A = \begin{bmatrix} a_1 & \ldots & a_n \end{bmatrix}$ be an n × n matrix and let $b \in F^n$. Then

$A_i(b) = \begin{bmatrix} a_1 & \ldots & b & \ldots & a_n \end{bmatrix},$

where b replaces the ith column vector of A.

Theorem 5.3.2 (Cramer’s Rule). Let A be a n × n nonsingular matrix. For any b ∈ F n , the unique solution
x of Ax = b has entries given by
det Ai (b)
xi = .
det A
Proof. Let I denote the n × n identity matrix, with column vectors ei . If Ax = b, then
   
A · Ii (x) = A e1 . . . x . . . en = Ae1 . . . Ax . . . Aen
 
= a1 . . . b . . . an = Ai (b).

Therefore,
det(A) det(Ii (x)) = det(A · Ii (x)) = det Ai (b).
But det Iᵢ(x) = xᵢ, by cofactor expansion across the ith row of Iᵢ(x). Therefore,

det(A) · xi = det Ai (b),

which finishes the proof. 

Note that det(A)xi = det Ai (b) holds even if det A = 0.

Example 5.3.3. Use Cramer's rule to solve the system

$\begin{cases} 3x_1 - 2x_2 = 6 \\ -5x_1 + 4x_2 = 8. \end{cases}$

Let $A = \begin{bmatrix} 3&-2\\-5&4 \end{bmatrix}$ and $b = \begin{bmatrix} 6\\8 \end{bmatrix}$. Thus,

$\det A = \begin{vmatrix} 3&-2\\-5&4 \end{vmatrix} = 12 - 10 = 2, \quad \det A_1(b) = \begin{vmatrix} 6&-2\\8&4 \end{vmatrix} = 24 + 16 = 40, \quad \det A_2(b) = \begin{vmatrix} 3&6\\-5&8 \end{vmatrix} = 24 + 30 = 54.$

Then by Cramer's rule, the unique solution is given as

$x = \begin{bmatrix} x_1\\x_2 \end{bmatrix} = \begin{bmatrix} \det A_1(b)/\det A\\ \det A_2(b)/\det A \end{bmatrix} = \begin{bmatrix} 40/2\\54/2 \end{bmatrix} = \begin{bmatrix} 20\\27 \end{bmatrix}.$
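A sketch of Cramer's rule as a routine, assuming NumPy; the function name cramer is only illustrative:

    import numpy as np

    def cramer(A, b):
        """Solve Ax = b for nonsingular A by Cramer's rule (Theorem 5.3.2)."""
        A = np.asarray(A, dtype=float)
        b = np.asarray(b, dtype=float)
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b                  # replace the ith column by b
            x[i] = np.linalg.det(Ai) / d
        return x

    print(cramer([[3, -2], [-5, 4]], [6, 8]))   # approximately [20., 27.]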


Definition 5.3.4. Let A be an n × n matrix. Then the adjugate (or adjoint) of A, denoted adj A, is given
as

$\operatorname{adj} A = \left[ C_{ji} \right] = \begin{bmatrix} C_{11}&C_{21}&\cdots&C_{n1}\\ C_{12}&C_{22}&\cdots&C_{n2}\\ \vdots&\vdots&&\vdots\\ C_{1n}&C_{2n}&\cdots&C_{nn} \end{bmatrix}.$

Please note that the adjugate matrix is the transpose of the matrix of cofactors.

Theorem 5.3.5. Let A be an n × n matrix. Then $A \cdot \operatorname{adj}(A) = \operatorname{adj}(A) \cdot A = \det(A) I_n$. In particular, if A
is nonsingular then

$A^{-1} = \frac{1}{\det A} \operatorname{adj} A.$

Proof. The jth column of $A^{-1}$ is a vector x such that

$Ax = e_j.$

By Cramer's rule, the ith entry in x is given as

$x_i = \frac{\det A_i(e_j)}{\det A}.$

But by cofactor expansion down the ith column, we see that

$\det A_i(e_j) = (-1)^{i+j}\det A_{ji} = C_{ji}.$

Therefore, the (i, j) entry in $A^{-1}$ is $\dfrac{\det A_i(e_j)}{\det A} = \dfrac{C_{ji}}{\det A}$, which finishes the proof.

In particular,
$A \cdot \operatorname{adj} A = \operatorname{adj} A \cdot A = \det A \cdot I_n.$
Note that this formula holds even if det A = 0.

 
 2 1 3 
 
Example 5.3.6. Find the inverse of the matrix A = 
 1 −1 .
1 
 
1 4 −2
We begin by finding the nine cofactors:

−1 1 1 1 1 −1
C11 = + = − 2, C12 = − = 3, C13 = + =5
4 −2 1 −2 1 4

1 3 2 3 2 1
C21 = − = 14, C22 = + = −7, C23 = − = −7
4 −2 1 −2 1 4

1 3 2 3 2 1
C31 = + = 4, C32 = − = 1, C33 = + = −3
−1 1 1 1 1 −1
5.3. CRAMER’S RULE 197

Therefore,  
 −2 14 4 
 
adj A = 
 3 −7 .
1 
 
5 −7 −3

To finish, we need to compute det A. We could compute it directly like in the previous sections, but instead
we use the observation that adj A · A = det A · In .
    
 −2 14 4  2 1 3   14 0 0 
    
adj A · A =  3 −7
 1   1 −1
  1  =  0 14 0 
 
.
    
5 −7 −3 1 4 −2 0 0 14

Thus, det A = 14 and  


 −1/7 1 2/7 
−1
 
A =
 3/14 −1/2 1/14 .
 
 
5/14 −1/2 −3/14
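The cofactor construction of the adjugate translates directly into code. A hedged sketch assuming NumPy (the helper name adjugate is illustrative):

    import numpy as np

    def adjugate(A):
        """Transpose of the matrix of cofactors (Definition 5.3.4)."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T

    A = np.array([[2, 1, 3], [1, -1, 1], [1, 4, -2]], dtype=float)
    adjA = adjugate(A)
    print(np.round(adjA))                                        # [[-2 14 4], [3 -7 1], [5 -7 -3]]
    print(np.allclose(adjA @ A, np.linalg.det(A) * np.eye(3)))   # True, with det(A) = 14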

In practice, it is not very practical to solve linear systems or compute inverses via Cramer's rule. The
method of row-reduction is generally more efficient. On the other hand, the value of Cramer's rule and
adjugate matrices in the theory of linear algebra is immeasurable.

For example, if A is an integer matrix, that is, all of its entries are integers, then its determinant and
cofactors will all be integers too. After all, determinants are calculated using addition, subtraction, and
multiplication. No division required! Thus, adj A will be an integer matrix too. Thus, if det(A) = ±1, we
see that A⁻¹ will be an integer matrix as well. This fact is very useful for instructors who want to design
homework questions with "cute" answers so that students feel happy about their linear algebra homework.

Exercises (Go to Solutions)


For Exercises 1-5, solve the linear system using Cramer’s Rule.
           
 6 1 1 1 3 5 2 3  1 
1.   x ≡   (mod 7) ♠ 2.  x =  ♠ 3.  x ≡
        
  
2 4 4 2 −3 1 4 2 3
(mod 5)
       
 1 3
5   2   1

2 0 1 

 2 
 
   
4. 
 0 −2 3 
 x =  1 
   0

1 2 1 

 0 
 
    5.   x ≡   (mod 3)
   
3 2 0 2  2
 1 1 2 

 1 
 
   
0 2 2 1 1

For Exercises 6-9, compute the adjugate of the matrix A below. Verify that A(adj A) = det(A)In .
   
 1 3   2 3 
♠ 6.   ♠ 7.   (mod 5)
2 −3 4 2
   
 1 3
5   1

2 0 1 

 
♠ 8. 
 0 −2 3   0

1 2 1 

♠ 9. 
  (mod 3)
   
3 2 0  2
 1 1 2 

 
0 2 2 1

QUICK! For Exercises 10-12, using Theorem 5.3.5, the given matrix A and its adjugate matrix adj(A),
calculate the determinant and inverse matrix of A in LESS THAN 60 SECONDS!
   
 1 2 3   4 13 5 
   
10. A =  −1 −1 , adj(A) =  −3 −11 −4 
1   
   
2 1 −5 1 3 1
   
 −6 8 13   8 6 8 
   
11. A = 
 6 −8 ,
−12  adj(A) = 
 0 −2 6 

   
2 −2 −4 4 4 0
   
 6 −2 4   −10 30 2 
   
12. A = 
 −3 ,
−1 1  adj(A) = 
 11 46 −18 

   
−4 5 5 −19 −22 −12

“You can’t cross the sea merely by standing and staring at the water.” – Rabindranath Tagore

5.4 Cross Products


Definition 5.4.1. Let u, v ∈ F 3 . Then we define the cross product u × v to be the vector in F 3 of the
form
u × v = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 ).

In the language of determinants, we can define the cross product as

$u \times v = \left( \begin{vmatrix} u_2&v_2\\u_3&v_3 \end{vmatrix}, -\begin{vmatrix} u_1&v_1\\u_3&v_3 \end{vmatrix}, \begin{vmatrix} u_1&v_1\\u_2&v_2 \end{vmatrix} \right). \qquad (5.4.1)$

Continuing with this observation, the three coordinates in the cross product are the three cofactors of the
determinant $\begin{vmatrix} e_1&u_1&v_1\\e_2&u_2&v_2\\e_3&u_3&v_3 \end{vmatrix} = \begin{vmatrix} e_1&e_2&e_3\\u_1&u_2&u_3\\v_1&v_2&v_3 \end{vmatrix}$ when expanded across the first column. Thus,

$u \times v = \begin{vmatrix} e_1&u_1&v_1\\e_2&u_2&v_2\\e_3&u_3&v_3 \end{vmatrix} = \begin{vmatrix} e_1&e_2&e_3\\u_1&u_2&u_3\\v_1&v_2&v_3 \end{vmatrix} = \begin{vmatrix} u_2&v_2\\u_3&v_3 \end{vmatrix} e_1 - \begin{vmatrix} u_1&v_1\\u_3&v_3 \end{vmatrix} e_2 + \begin{vmatrix} u_1&v_1\\u_2&v_2 \end{vmatrix} e_3.$

There are important distinctions between the cross product and the dot product worth mentioning. First,
the cross product of two vectors is itself a vector in F 3 . For this reason, it is sometimes called the vector
product. Conversely, the dot product of two vectors is a scalar in F . For this reason, the dot product is
sometimes called the scalar product. Similarly, the outer product is sometimes called the matrix product
since u ⊗ v is a matrix. Second, while the dot product is defined for all vectors in F n , the cross product is
only defined here for vectors in F 3 .

Example 5.4.2. Let u = (1, 2, −2) and v = (3, 0, 1). Then their cross product is given as

u × v = (2(1) − (−2)(0), (−2)(3) − 1(1), 1(0) − 2(3)) = (2 − 0, −6 − 1, 0 − 6) = (2, −7, −6) . 
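The coordinate formula of Definition 5.4.1 is short enough to code directly. A minimal sketch assuming NumPy; np.cross is used only as an independent check:

    import numpy as np

    def cross(u, v):
        """Cross product from Definition 5.4.1."""
        u1, u2, u3 = u
        v1, v2, v3 = v
        return np.array([u2*v3 - u3*v2, u3*v1 - u1*v3, u1*v2 - u2*v1])

    u = np.array([1, 2, -2])
    v = np.array([3, 0, 1])
    print(cross(u, v))                                 # [ 2 -7 -6]
    print(np.allclose(cross(u, v), np.cross(u, v)))    # agrees with NumPy's built-in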

Definition 5.4.3. If u, v, w ∈ R3 , then u · (v × w) is called the scalar triple product of u, v, and w.

Theorem 5.4.4. If u, v, w ∈ R³, then

$u \cdot (v \times w) = \begin{vmatrix} u_1&v_1&w_1\\u_2&v_2&w_2\\u_3&v_3&w_3 \end{vmatrix}.$

Proof.

$u \cdot (v \times w) = u \cdot \left( \begin{vmatrix} v_2&w_2\\v_3&w_3 \end{vmatrix}, -\begin{vmatrix} v_1&w_1\\v_3&w_3 \end{vmatrix}, \begin{vmatrix} v_1&w_1\\v_2&w_2 \end{vmatrix} \right) = u_1\begin{vmatrix} v_2&w_2\\v_3&w_3 \end{vmatrix} - u_2\begin{vmatrix} v_1&w_1\\v_3&w_3 \end{vmatrix} + u_3\begin{vmatrix} v_1&w_1\\v_2&w_2 \end{vmatrix} = \begin{vmatrix} u_1&v_1&w_1\\u_2&v_2&w_2\\u_3&v_3&w_3 \end{vmatrix},$

where the last equality is seen by cofactor-expansion across the first column.

Corollary 5.4.5. If u, v, w ∈ R3 , then the area of the parallelogram spanned by u and v is ku × vk and
the volume of the parallelepiped spanned by u, v, and w is |u · (v × w)|.

Example 5.4.6. Compute the scalar triple product u · (v × w) for the vectors u = (3, −2, −5), v = (1, 4, −4), and w = (0, 3, 2).

$u \cdot (v \times w) = \begin{vmatrix} 3&1&0\\-2&4&3\\-5&-4&2 \end{vmatrix} = 3\begin{vmatrix} 4&3\\-4&2 \end{vmatrix} - \begin{vmatrix} -2&3\\-5&2 \end{vmatrix} = 3(8 + 12) - (-4 + 15) = 60 - 11 = 49.$
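Theorem 5.4.4 and Corollary 5.4.5 can be checked numerically for this example. A short sketch assuming NumPy:

    import numpy as np

    u = np.array([3, -2, -5], dtype=float)
    v = np.array([1, 4, -4], dtype=float)
    w = np.array([0, 3, 2], dtype=float)

    # scalar triple product as a determinant with u, v, w as columns (Theorem 5.4.4)
    print(np.linalg.det(np.column_stack([u, v, w])))   # approximately 49
    print(u @ np.cross(v, w))                          # same value
    # the volume of the parallelepiped is the absolute value |49| = 49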

Theorem 5.4.7. Let u, v, w ∈ R3 and let k ∈ R. Then


(i) u · (u × v) = 0; (vii) u × 0 = 0 × u = 0;
(ii) v · (u × v) = 0; (viii) u × v = −(v × u);
(iii) ku × vk2 = kuk2 kvk2 − (u · v)2 ; (ix) u × (v + w) = u × v + u × w;

(iv) ku × vk = kukkvk sin θ; (x) (u + v) × w = u × w + v × w;


(v) u × (v × w) = (u · w)v − (u · v)w; (xi) k(u × v) = (ku) × v = u × (kv);
(vi) (u × v) × w = (u · w)v − (v · w)u; (xii) u × u = 0;

There are a few important observations to make. First, the cross product is noncommutative, that is,
u × v 6= v × u. It is instead what we call anti-commutative. Second, the cross product is nonassociative,
that is, u × (v × w) 6= (u × v) × w. Instead, cross products satisfy what is known as the Jacobi identity:

u × (v × w) + v × (w × u) + w × (u × v) = 0.†

Finally, the cross product is necessarily orthogonal to the two factor vectors. This provides a useful tool for
creating normal vectors in F³.

We have seen before that an m-flat in F n can be constructed in two ways:


Bottom-Up—Using m linearly independent spanning vectors v 1 , . . . , v m , we construct a vector equation for
which the flat is the solution set to the vector equation

x = x0 + c1 v 1 + c2 v 2 + . . . + cm v m ,

where x0 is a vector on the flat, c1 , c2 , . . . , cm ∈ F are scalar parameters. Of course, a vector equation
corresponds to a linear system, which implies the flat is the solution set of this linear system.

Top-Down—Using n − m linearly independent normal vectors n1 , . . . , nn−m , we construct n − m linearly


independent scalar equations of the form ni · (x − x0 ) = 0 or ni · x = ni · x0 . If ni = (a1 , a2 , . . . , an ) ∈ F n ,
then this scalar equation has the form

a1 x1 + a2 x2 + . . . + an xn = d,
5.4. CROSS PRODUCTS 201

for some d ∈ F . Then the flat is the solution to the system of all these scalar equations. These normal
vectors, of course, belong to the orthogonal complement of the flat.

How does one transition between these two representations of the flats? If the Top-Down representation
is given, one could solve the linear system Ax = b to find x = x0 + c1 v 1 + c2 v 2 + . . . + cm v m , where x0 is a
particular solution and {v 1 , . . . , v m } is a basis for Nul(A). This gives the Bottom-Up representation of the
flat.

If we start with the Bottom-Up representation x = x₀ + c₁v₁ + c₂v₂ + … + c_m v_m, we need to find a matrix
A and vector b such that Nul(A) = Span{v₁, …, v_m}. As Nul(A)⊥ = Row(A), we need to find vectors
orthogonal to the spanning vectors. This can best be accomplished by creating a matrix B such that
Row(B) = Span{v₁, …, v_m} = Nul(A). Then Nul(B) = Row(B)⊥ = Nul(A)⊥ = Row(A). With these
normal vectors, we can construct the linear system for the flat.

Example 5.4.8. Consider the 2-flat in F⁴ given by the vector equation x = x₀ + su + tv where x₀ =
(1, 2, 3, 4), u = (1, 0, 0, −1), and v = (2, 1, −3, 0). Find a linear system Ax = b for which this flat is the
solution set.

Let $B = \begin{bmatrix} 1&0&0&-1\\2&1&-3&0 \end{bmatrix}$. Then

$B \sim \begin{bmatrix} 1&0&0&-1\\0&1&-3&2 \end{bmatrix}.$ Hence, $\operatorname{Row}(B)^{\perp} = \operatorname{Nul}(B) = \operatorname{Span}\left\{ \begin{bmatrix} 0\\3\\1\\0 \end{bmatrix}, \begin{bmatrix} 1\\-2\\0\\1 \end{bmatrix} \right\}.$

Therefore, the flat is the solution to the following linear system:

$\begin{cases} (0, 3, 1, 0) \cdot (x_1 - 1, x_2 - 2, x_3 - 3, x_4 - 4) = 0 \\ (1, -2, 0, 1) \cdot (x_1 - 1, x_2 - 2, x_3 - 3, x_4 - 4) = 0 \end{cases} \qquad \begin{cases} 3x_2 + x_3 = 3(2) + 1(3) = 9 \\ x_1 - 2x_2 + x_4 = 1(1) - 2(2) + 1(4) = 1 \end{cases}$

One can easily check that the vectors in the flat above are solutions to this linear system.
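The passage from the Bottom-Up to the Top-Down description amounts to a null-space computation. A sketch assuming SymPy (not part of the text); it reproduces the normal vectors found above:

    from sympy import Matrix

    # spanning vectors of the 2-flat, written as the rows of B (Example 5.4.8)
    B = Matrix([[1, 0, 0, -1],
                [2, 1, -3, 0]])
    x0 = Matrix([1, 2, 3, 4])

    # Nul(B) = Row(B)^perp, so each null-space vector n gives an equation n . x = n . x0
    for n in B.nullspace():
        print(list(n), "with right-hand side", n.dot(x0))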

Now, when one wants to find the equation of a hyperplane, then the single normal can be found using
determinants in a manner similar to (5.4.1). Of course, when n = 3, this is just the cross product.

Example 5.4.9. Find an equation for the plane in R3 which passes through (1, −2, 1), (−1, 0, 1), and
(3, 2, 0).

We can construct the equation using a normal vector. First, consider the plane as a vector equation. Let
u = (3, 2, 0) − (1, −2, 1) = (2, 4, −1) and v = (3, 2, 0) − (−1, 0, 1) = (4, 2, −1). If x0 = (−1, 0, 1), then the
plane is given as x = su + tv + x0 . As the vectors u and v give the “slope” of the plane, we want to find a
vector orthogonal to both u and v. Such a vector is u × v. Note
   
$u \times v = \begin{vmatrix} e_1&2&4\\e_2&4&2\\e_3&-1&-1 \end{vmatrix} = \begin{vmatrix} 4&2\\-1&-1 \end{vmatrix} e_1 - \begin{vmatrix} 2&4\\-1&-1 \end{vmatrix} e_2 + \begin{vmatrix} 2&4\\4&2 \end{vmatrix} e_3 = \begin{bmatrix} -4+2\\-(-2+4)\\4-16 \end{bmatrix} = \begin{bmatrix} -2\\-2\\-12 \end{bmatrix}.$

Thus, the equation for the plane can be given as −2(x − (−1)) − 2(y − 0) − 12(z − 1) = 0
⇒ −2x − 2 − 2y − 12z + 12 = 0 ⇒ x + y + 6z = 5.

Exercises (Go to Solutions)


For Exercises 1-3, compute u × v.

♠ 1. $u = \begin{bmatrix} 1\\-1\\2 \end{bmatrix}, v = \begin{bmatrix} 3\\2\\-1 \end{bmatrix}$   ♠ 2. $u = \begin{bmatrix} 1\\2\\4 \end{bmatrix}, v = \begin{bmatrix} 2\\6\\-1 \end{bmatrix}$   ♠ 3. $u = \begin{bmatrix} 1\\1\\1 \end{bmatrix}, v = \begin{bmatrix} -2\\0\\4 \end{bmatrix}$

For Exercises 4-5, simplify the expression.


              
 3   1   3   0   −1    4   2 
              
♠ 4.  4  +  −3  ×  1 
    
 ♠ 5. 
 3  − 2  2 ×2  1
      −
  −1 

              
0 1 2 4 5 1 3

For Exercises 6-8, use normal vectors to construct a system of linear equations so that the given affine set is
the solution set to your linear system.
♠ 6. The plane in R3 which passes through (1, −2, 1), (−1, 0, 1), and (3, 2, 0).
♠ 7. The hyperplane in R4 which passes through (2, −1, 3, −1), (3, 0, −2, 2), (1, −2, 0, 2), and (−1, −2, −2, 4).

♠ 8. The plane in R4 which passes through (1, 2, 3, 4), (2, 3, 0, 1), and (0, 1, 2, 4).

9. Prove Theorem 5.4.7 (ix).


10. Prove Theorem 5.4.7 (xii).

† The Jacobi Identity in addition to axioms (g)–(k) makes F 3 equipped with the cross product into a special type of vector

space known as a Lie algebra.


Chapter 6

Eigenvalues


“Try not to become a man of success, but rather try to become a man of value.” – Albert Einstein

6.1 Eigenvalues and Eigenvectors


     
 3 −2   −1   2 
Example 6.1.1. Let A =  , u =  , and v =  . Then
1 0 1 1
         
 3 −2   −1   −5   3 −2   2   4 
Au =   =  and Av =   =  = 2v.
1 0 1 −1 1 0 1 2

Notice that while multiplication by A sends u off to who-knows-where, multiplication by A only stretches
the vector v. This chapter will explore more deeply this type of phenomenon. 

Definition 6.1.2. Let A be an n × n matrix. Then a nonzero vector x is an eigenvector of A if there is a


scalar λ such that
Ax = λx.
In this case, λ is called an eigenvalue of A corresponding to x.

It is straightforward to check if a vector is an eigenvector.

     
 1 6  6 3
Example 6.1.3. Let A =  , u =  , and v =  . Then
   
5 2 −5 −2
      
 1 6   6   −24  6
Au =  =  = −4   = −4u.
 

5 2 −5 20 −5

On the other hand,       


 1 6   3   −9   3 
Av =   =  6= λ  ,
5 2 −2 11 −2
since (
− 9λ = 3
11λ = −2
has no solution. Therefore, u is an eigenvector of A with eigenvalue λ = −4. The vector v is not an
eigenvector of A. 

It can also be checked whether a scalar is an eigenvalue of a matrix, although this requires row reduction.

Example 6.1.4. Let A be the same matrix as the previous example. Is 7 an eigenvalue of A?

Now, suppose that λ is an eigenvalue of A with eigenvector x. Then

Ax = λx
Ax − λx = 0
Ax − λIx = 0
(A − λI)x = 0
Therefore, λ = 7 is an eigenvalue if and only if the homogeneous system (A − 7I)x = 0 has a nontrivial
solution. Now,
$A - 7I = \begin{bmatrix} 1&6\\5&2 \end{bmatrix} - \begin{bmatrix} 7&0\\0&7 \end{bmatrix} = \begin{bmatrix} -6&6\\5&-5 \end{bmatrix}.$
But,
$\begin{bmatrix} -6&6&0\\5&-5&0 \end{bmatrix} \sim \begin{bmatrix} 1&-1&0\\5&-5&0 \end{bmatrix} \sim \begin{bmatrix} 1&-1&0\\0&0&0 \end{bmatrix}.$
Therefore, $x = \begin{bmatrix} 1\\1 \end{bmatrix}$ is a nontrivial solution to this homogeneous system and hence an eigenvector of A.
Note that
$Ax = \begin{bmatrix} 1&6\\5&2 \end{bmatrix}\begin{bmatrix} 1\\1 \end{bmatrix} = \begin{bmatrix} 7\\7 \end{bmatrix} = 7\begin{bmatrix} 1\\1 \end{bmatrix} = 7x.$
Therefore, 7 is an eigenvalue of A with eigenvector (1, 1).

Notice in the last example that x = (1, 1) was an eigenvector for A with eigenvalue 7. Now, this is not
the only eigenvector for 7. In fact, y = (2, 2) is also an eigenvector. Note that Ay = (14, 14) = 7y. Now,
y = 2x. In fact, any nonzero multiple of x is an eigenvector of A since
A(cx) = c(Ax) = c(λx) = λ(cx).
Furthermore, any nontrivial solution to the homogeneous system (A − λI)x = 0 is an eigenvector. But this
is the null space of (A − λI), a subspace of Rⁿ. This leads to the next definition.

Definition 6.1.5. Let A be an n × n matrix with eigenvalue λ. Then the null space of (A − λI) is called the
eigenspace of A corresponding to λ. The dimension of the eigenspace is called the geometric multiplicity
of the eigenvalue λ and corresponds to the nullity of the matrix A − λI.

Since eigenspaces are null spaces of a matrix, they are always subspaces of F n .

 
 4 −1 6 
 
Example 6.1.6. Let A =  2 . An eigenvalue of A is λ = 2. Find a basis for the eigenspace
1 6 
 
2 −1 8
corresponding to λ = 2.

We form      
 4 −1 6   2 0 0   2 −1 6 
     
A − 2I = 
 2 − 0
1 6   2 = 2
0  
.
−1 6 
     
2 −1 8 0 0 2 2 −1 6
208 CHAPTER 6. EIGENVALUES

Thus,    
 2 −1 6   2 −1 6 
   
−1 ∼ 0 .
 2 6   0 0 

   
2 −1 6 0 0 0

This implies that the null space of A − 2I are the solutions to the equation

2x1 − x2 + 6x3 = 0.

In other words,
       



 1/2 −3 




 1 −3 



 
  

 
 

 
 


Nul(A − 2I) = Span  1  ,  0  = Span  2  ,  0  .
      

         

 
 
 

 0 1   0 1 
   



 1   −3 



  


 2   0 .
In fact, the eigenspace is 2-dimensional with basis   ,   

   

 

 0 1 

Theorem 6.1.7. The eigenvalues of a triangular matrix are the entries on its main diagonal.

Proof. Let A be a triangular matrix and let λ be the (i, i)-entry of A. Then A − λI is triangular with a
zero in its (i, i)-entry, so det(A − λI), being the product of the diagonal entries, equals 0 and A − λI is a
singular matrix. Therefore, (A − λI)x = 0 has a nontrivial solution, which implies that λ is an eigenvalue of A.

This idea of determinants will return later in this chapter.

   
 3 6 −8   4 0 0 
   
Example 6.1.8. Let A =  0 0  6  and B =  −2 1 0 
 
. Since both matrices are triangular,
   
0 0 2 5 3 4
the eigenvalues of A are λ = 3, 0, 2 and the eigenvalues of B are λ = 4, 1. 

Notice from the previous example that 4 appeared twice along the diagonal of B. This is a repeated
eigenvalue of multiplicity two. This idea of multiplicity will also return later in this chapter.

Exercises (Go to Solutions)


 
 2 6 
For Exercises 1-3, determine if the vector is an eigenvector for A =  . If so, what is the eigenvalue?
3 −1
     
 6   1   1 
1.   2.   3.  
3 2 1

 
 −6 −21 −16 
 
For Exercises 4-6, determine if the vector is an eigenvector for A = 
 6 17 . If so, what is the
12 
 
−5 −12 −8
eigenvalue?
     
 −2   2   1 
     
♠ 4. 
 0 
 ♠ 5. 
 −1 
 ♠ 6. 
 −1 

     
1 −1 1

 
 2 −5 −2 
 
For Exercises 7-9, determine if the vector is an eigenvector for A = 
 2 . If so, what is the
−7 −3 
 
−4 14 6
eigenvalue?
     
 −1   −1   0 
     
♠ 7. 
 −1 
 ♠ 8. 
 −2 
 ♠ 9. 
 0 

     
2 4 0

For Exercises 10-12, find an eigenvector for the matrix A and eigenvalue λ. Answers may vary.
     
3 1 1 3 −2 −1
♠ 10.  , λ = 3 ♠ 11.  , λ = 2 ♠ 12.  , λ = −3
     
0 −2 −1 5 1 −4
 
 −2 4 −21 
 
 −15 29 −13 23 
 
For Exercises 13-15, determine if the number λ is an eigenvalue of A = 

.

 −30 44 −19 34 
 
 
0 −4 2 −3

♠ 13. λ = 3 ♠ 14. λ = −2 ♠ 15. λ = 2

For Exercises 16-17, find a basis for the eigenspace for the matrix and each eigenvalue λ listed. Answers may
vary. Also, determine the geometric multiplicities of each listed eigenvalue.
   
 −7 2 32   −8 −5 5 
   
♠ 16. 
 −8 1  , λ = 5, −3
0  ♠ 17. 
 20  , λ = −3, 2
12 −10 
   
−2 1 13 10 5 −3

For Exercises 18-18, find a basis for the eigenspace for the matrix and each eigenvalue. Answers may vary.
Also, determine the geometric multiplicities of each listed eigenvalue.
 
 1 −1 3 
 
18. 
 0 1 2 

 
0 0 2

19. Let A be an n × n matrix. Let λ be an eigenvalue of A with associated eigenvector x. Show that if m
is a positive integer then λm is an eigenvalue of Am . What is the associated eigenvector?

20. Show that if A is idempotent then λ = 0, 1.

21. Show that if A is nilpotent then λ = 0.



“There seems to be some perverse human characteristic that likes to make easy things difficult.”
– Warren Buffett

6.2 The Characteristic Polynomial


As observed previously, for an n × n matrix A, a scalar λ is an eigenvalue if and only if (A − λI)x = 0
has a nontrivial solution if and only if A − λI is singular if and only if det(A − λI) = 0. In this context,
determinants will be a valuable tool to compute eigenvalues.

Definition 6.2.1. Let A be an n×n matrix. Treating λ as a variable, the value det(A−λI) is a polynomial of
degree n, called the characteristic polynomial of A. The roots of the characteristic polynomial are exactly
the eigenvalues of A since the roots are the solutions to the equation det(A − λI) = 0. The (algebraic)
multiplicity of each eigenvalue is its multiplicity in the characteristic polynomial.

   
 3 6 −8   4 0 0 
   
Example 6.2.2. Let A =   0 0 6  and B =  −2
 1 .
0 
   
0 0 2 5 3 4
Then the characteristic polynomial of A is

3−λ 6 −8
det(A − λI) = 0 −λ 6 = (3 − λ)(−λ)(2 − λ)

0 0 2−λ
= (6 − 5λ + λ2 )(−λ) = −6λ + 5λ2 − λ3 .

Note that the eigenvalues of A are 3, 0, and 2, where all multiplicities are one.

The characteristic polynomial of B is

4−λ 0 0
det(B − λI) = −2 1−λ 0 = (4 − λ)(1 − λ)(4 − λ)

5 3 4−λ
= (16 − 8λ + λ2 )(1 − λ) = 16 − 24λ + 9λ2 − λ3 .

Note that the eigenvalues of B are 4 and 1, where 4 has multiplicity two and 1 has multiplicity one. 

Let A be a matrix with eigenvalue λ. Suppose that m is the geometric multiplicity of λ (nullity(A − λI))
and n is the algebraic multiplicity of λ. Then 1 ≤ m ≤ n, that is, the algebraic multiplicity is an upper
bound for the geometric multiplicity.

 
 2 3 
Example 6.2.3. Find the eigenvalues of A =  .
3 −6
We can find the eigenvalues of A by computing the characteristic polynomial of A and factoring it.

2−λ 3
det(A − λI) = = (2 − λ)(−6 − λ) − 3(3)
3 −6 − λ
212 CHAPTER 6. EIGENVALUES

= (−12 + 4λ + λ2 ) − 9 = −21 + 4λ + λ2
= (λ + 7)(λ − 3)

Therefore, the eigenvalues of A are λ = −7, 3. 
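A numerical cross-check of this example, assuming NumPy (np.poly returns the coefficients of the monic polynomial whose roots are the eigenvalues of a square array):

    import numpy as np

    A = np.array([[2, 3], [3, -6]], dtype=float)

    print(np.poly(A))              # approximately [ 1.  4. -21.], i.e. lambda^2 + 4*lambda - 21
    print(np.linalg.eigvals(A))    # the eigenvalues 3 and -7, in some order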

 
 0 −1 
Example 6.2.4. If A =  , then the characteristic polynomial of A is
1 0

−λ −1
= λ2 + 1 = (λ − i)(λ + i)
1 −λ
and the complex eigenvalues of A are ±i. Since
     
−i −1   i 1   1 −i 
A − iI =  ∼ ∼ ,

1 −i 0 0 0 0
 
i
we have that   is an eigenvector. Note that
 
1
        
i   0 −1   i   −1  i
A = =  = i .
  

1 1 0 1 i 1

Likewise,      
 i −1   i −1   1 i 
A + iI =  ∼ ∼ .
1 i 0 0 0 0
 
 −i 
Thus,   is an eigenvector of A and
1
        
 −i   0 −1   −i   −1   −i 
A =  =  = −i  .
1 1 0 1 −i 1

In particular, A is diagonalizable with


   
 i −i   i 0   −i/2 1/2 
A=   .
1 1 0 −i i/2 1/2

Let A be an n × n matrix with real entries. Then $\overline{Ax} = \overline{A}\,\overline{x} = A\overline{x}$. If λ is an eigenvalue of A and x is a
corresponding eigenvector, then
$A\overline{x} = \overline{A}\,\overline{x} = \overline{Ax} = \overline{\lambda x} = \overline{\lambda}\,\overline{x}.$
Then the conjugate of x is also an eigenvector of A whose corresponding eigenvalue is the conjugate of λ.
We saw this in the last example. In greater generality, the eigenvalues of a matrix over a field F might lie in
some extension field E such that F ⊆ E.

Theorem 6.2.5. An n × n matrix A is nonsingular if and only if 0 is not an eigenvalue of A. In particular,
if Ax = λx, then $A^{-1}x = \dfrac{1}{\lambda}x$.
 
 3 6 −8 
 
Example 6.2.6. Let A =   0 0 . Then its eigenvalues are λ = 3, 0, 2. In particular, 0 is an
6 
 
0 0 2
eigenvalue of A. Even though, 0 cannot be an eigenvector, 0 can be an eigenvalue. Notice that the eigenvalues
of λ = 0 are simply the vectors in Nul(A), that is, the nullity of A is simply the geometric multiplicity of 0
as an eigenvalue of A. In fact,
        
 −2   3 6 −8  −2   0   −2 
        
A  =
 1   0 0
 6  1   0 
   =   = 0  1 .
  
        
0 0 0 2 0 0 0

Definition 6.2.7. Let A and B be n × n matrices. We say that two matrices are similar if there exists a
nonsingular n × n matrix P such that P AP −1 = B.
   
 1 0 −7   15 −18 −2 
   
Example 6.2.8. The matrices A =  5 1  2  and B =  17 −17 −4 
 
 are similar since P =
   
−4 2 0 7 −22 4
   
 1 −2 −1   2 0 −1 
  −1
 
 1 −1
 0  is nonsingular with P =  2 −1 −1 
 
 and
   
1 −4 −2 −3 2 1
      
 1 −2 −1   1 0 −7   2 0 −1   1 −2 −1   23 −14 −8 
P AP −1
      
=
 1 −1 0 
 5
 1 2 
 2

= 1
−1 −1   −1 0 
 6
 3 −4 

      
1 −4 −2 −4 2 0 −3 2 1 1 −4 −2 −4 −2 2
 
 15 −18 −2 
 
=
 17 −17 −4  = B.
 
 
7 −22 4
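Similarity claims like this one can be verified mechanically. A sketch assuming NumPy:

    import numpy as np

    A = np.array([[1, 0, -7], [5, 1, 2], [-4, 2, 0]], dtype=float)
    B = np.array([[15, -18, -2], [17, -17, -4], [7, -22, 4]], dtype=float)
    P = np.array([[1, -2, -1], [1, -1, 0], [1, -4, -2]], dtype=float)

    print(np.allclose(P @ A @ np.linalg.inv(P), B))   # True: A and B are similar
    # similar matrices share trace, determinant, and eigenvalues
    print(np.trace(A), np.trace(B))
    print(np.round(np.linalg.det(A), 6), np.round(np.linalg.det(B), 6))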

Theorem 6.2.9. If A and B are two similar matrices, then they have the same characteristic polynomial
and hence the same eigenvalues (with the same multiplicities).
Likewise, similar matrices have the same determinant, rank, nullity, and trace. Thus, if any two matrices
differ on one of these invariants then they cannot be similar.

   
 1 2   3 1 
Example 6.2.10. The matrices A =   and B =   are not similar since tr(A) =
0 −2 −1 2
1 − 2 = −1 6= 5 = 3 + 2 = tr(B). 

Exercises (Go to Solutions)


For Exercises 1-9, find the eigenvalues and bases for the eigenspaces of the matrix A given. Answers may
vary.
     
10 6 6 1  35 48
♠ 1.  ♠ 2.  ♠ 3. 
    
  
−12 −8 −1 4 −24 −33
     
 4 −5   −1 −5   5 −2 
♠ 4.   ♠ 5.   ♠ 6.  
1 0 4 7 1 3
     
 −6 −2 −2   17 5 5   3 1 1 
     
♠ 7. 
 17 6 5 
 ♠ 8. 
 −41 −12 −14 
 ♠ 9. 
 −2 −1 −1 

     
7 2 3 −19 −6 −4 −7 −2 −2

For Exercises 10-11, explain why the matrices A and B below are NOT similar. (Sure, do it QUICKLY if
you want a challenge, UNDER 60 SECONDS!)
       
 1 0   1 −3   1 0   2 −3 
♠ 10.  ,   ♠ 11.  ,  
0 4 5 2 0 4 0 3

“The line of life is a ragged diagonal between duty and desire.” – William R. Alger

6.3 Diagonalization
Theorem 6.3.1. If x1 , . . . , xr are eigenvectors of an n×n matrix A which correspond to distinct eigenvalues
λ1 , . . . , λr , respectively, then {x1 , . . . , xr } is linearly independent.

Definition 6.3.2. Let A be an n × n matrix. We say that A is diagonalizable if A is similar to a diagonal


matrix D, that is, A = P DP −1 for some invertible matrix P .

   
2 7  1 1 
Example 6.3.3. Let A =  . Then A is a diagonalizable matrix. Let P =   and

−4 1 −1 −2
   
 5 0   2 1 
D= . Then P −1 =  and
0 3 −1 −1
        
1 1  5 0  2 1   1 1   10 5   7 2 
A = P DP −1 =  = = .

  
−1 −2 0 3 −1 −1 −1 −2 −3 −3 −4 1

Next, observe that

A2 = (P DP −1 )(P DP −1 ) = P D(P −1 P )DP −1


   
2
 1 1  5 0  2 1 
= P D2 P −1 =    
−1 −2 0 32 −1 −1
    
2 2 2 2 2 2
1 1 2 · 5 5 2 · 5 − 3 5 − 3
=  = .
    

−1 −2 −32 −32 −2 · 52 + 2 · 32 ) −52 + 2 · 32

In fact, it follows by induction that


     
k
1 1  5 0  2 1   2 · 5k − 3k 5k − 3k 
Ak = P Dk P −1 =  = .

   
−1 −2 0 3k −1 −1 −2 · 5k + 2 · 3k −5k + 2 · 3k

Amongst other reasons, diagonalization provides an effective method to compute powers of matrices. 

From the previous example, note that

$A\begin{bmatrix} 1\\-1 \end{bmatrix} = \begin{bmatrix} 7&2\\-4&1 \end{bmatrix}\begin{bmatrix} 1\\-1 \end{bmatrix} = \begin{bmatrix} 5\\-5 \end{bmatrix} = 5\begin{bmatrix} 1\\-1 \end{bmatrix}$

and

$A\begin{bmatrix} 1\\-2 \end{bmatrix} = \begin{bmatrix} 7&2\\-4&1 \end{bmatrix}\begin{bmatrix} 1\\-2 \end{bmatrix} = \begin{bmatrix} 3\\-6 \end{bmatrix} = 3\begin{bmatrix} 1\\-2 \end{bmatrix}.$

Thus, the column vectors of P are eigenvectors of A! Furthermore, the diagonal entries of D are the eigenvalues of A!
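The power formula above is one of the main payoffs of diagonalization. A sketch assuming NumPy:

    import numpy as np

    P = np.array([[1, 1], [-1, -2]], dtype=float)
    D = np.diag([5.0, 3.0])
    A = P @ D @ np.linalg.inv(P)           # the matrix of Example 6.3.3

    k = 4
    Ak = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)
    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))   # True
    print(Ak)   # entries match 2*5^k - 3^k, 5^k - 3^k, and so on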

Theorem 6.3.4 (The Diagonalization Theorem). An n × n matrix A is diagonalizable if and only if A


has n linearly independent eigenvectors. In this case, there is a basis of F n consisting of eigenvectors of
A, called an eigenvector basis (or eigenbasis). If A = P DP −1 , then the diagonal entries of D are the
eigenvalues of A with multiplicity, the columns of P are eigenvectors, and the eigenvectors in P correspond
to the eigenvalues in the same column in D .
Let B be an eigenbasis of A and let E be the standard basis of $F^n$. Then $P = P_{E\leftarrow B}$ and $P^{-1} = P_{B\leftarrow E}$.

Theorem 6.3.5. An n × n matrix with n distinct eigenvalues is diagonalizable.

 
 1 3 3 
 
Example 6.3.6. If possible, diagonalize the matrix A = 
 −3 .
−5 −3 
 
3 3 1
We begin by computing the eigenvalues of A, via the characteristic polynomial of A:

1−λ 3 3
−5 − λ −3 −3 −3 −3 −5 − λ
det(A − λI) = −3 −5 − λ −3 = (1 − λ) −3 +3
3 1−λ 3 1−λ 3 3
3 3 1−λ
= (1 − λ)[(−5 − λ)(1 − λ) + 9] − 3[−3(1 − λ) + 9] + 3[−9 + 3(5 + λ)]
= −(5 + λ)(1 − λ)2 + 9(1 − λ) + 9(1 − λ) − 27 − 27 + 9(5 + λ)
= −(5 + λ)(1 − λ)2 + 18(1 − λ) − 54 + 9(5 + λ)
= −(5 + λ)(1 − λ)2 + 18(1 − λ) − 9(1 − λ) = −(5 + λ)(1 − λ)2 + 9(1 − λ)
= (1 − λ)[(5 + λ)(−1 + λ) + 9] = (1 − λ)[4 + 4λ + λ2 ]
= −(λ − 1)(λ + 2)2
 
 1 0 0 
 
Therefore, the eigenvalues of A are 1 and -2 (with multiplicity two). Let D = 
 0 −2 .
0 
 
0 0 −2
Next, we compute the eigenvectors of A. We begin with λ = 1.
         
 0 3 3   0 3 3   0 1 1   1 2 1   1 0 −1 
         
A−I =  −3 −6 −3  ∼  −3 −6 −3  ∼  1
    2 ∼ 0
1   1 ∼ 0
1   1 .
1 
         
3 3 0 0 0 0 0 0 0 0 0 0 0 0 0
 



 1 





Therefore, Nul(A − I) = Span  −1 . Next, we use λ = −2.


  

 

 1 
     
 3 3
3   3 3 3   1 1 1 
     
A + 2I = 
 −3 ∼ 0
−3 −3  0 ∼ 0
0  0 .
0 
 
     
3 3 3 0 0 0 0 0 0
6.3. DIAGONALIZATION 217
   



 −1   −1 


  


Therefore, Nul(A + 2I) = Span  1   0 . Since A has a basis of eigenvectors, A is diagonaliz-
 ,  

    

 

 0 1 
 
 1 −1 −1 
 
able. Let P = 
 −1 1 .
0 
 
1 0 1
To finish, we need to compute P −1 .
     
 1 −1 −1 1 0 0   1 −1 −1 1 0 0   1 −1 −1 1 0 0 
     
 ∼  0 ∼ 0
 −1 1 0 0 1 0   0 −1 1 1 0   1 2 −1 0 1 
 
     
1 0 1 0 0 1 0 1 2 −1 0 1 0 0 −1 1 1 0
   
 1 −1 −1 1 0 0   1 −1 0 0 −1 0 
   
∼   0 1 2 −1 ∼ 0
0 1   1 0 1 2 1 

   
0 0 1 −1 −1 0 0 0 1 −1 −1 0
 
 1 0 0 1 1 1 
 
∼   0 1 0 1 .
2 1 
 
0 0 1 −1 −1 0
 
 1 11 
−1
 
Therefore, P =
 1  and
2 1 
 
−1 −1 0
   
 1 −1 −1   1 0 0  1 1 1 
−1
   
A = P DP =
 −1 1 0 
 0
 −2 0 
 1

.
2 1  
   
1 0 1 0 0 −2 −1 −1 0

Note that a matrix is diagonalizable if and only if each geometric multiplicity is equal to its algebraic
multiplicity.
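In practice one rarely diagonalizes by hand. A numerical sketch for the matrix of Example 6.3.6, assuming NumPy:

    import numpy as np

    A = np.array([[1, 3, 3], [-3, -5, -3], [3, 3, 1]], dtype=float)

    evals, P = np.linalg.eig(A)            # columns of P are eigenvectors
    D = np.diag(evals)
    print(np.round(evals, 6))              # 1 and -2 (the latter twice)
    print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P^{-1}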

Exercises (Go to Solutions)


For Exercises 1-5, diagonalize the matrix A. Answers may vary.
     
3 −2 19 −8  8 −20 10 
♠ 1.  ♠ 2. 
   
   
1 0 40 −17 ♠ 3. 
 5 −12 5 
 
5 −10 3
   
 6 10 0 −4   8 2010 
   
 3

5 0 −2

 5. 
 5 −12 5 
♠ 4.   
   
 12 24 −1 −8  5 10 3
 
 
15 30 0 −11

“It’s always good to take an orthogonal view of something. It develops ideas.” – Ken Thompson

6.4 Orthogonal Diagonalization


Symmetric matrices interestingly bring together the theory of eigenvectors and inner products. Recall a real
square matrix U is called orthogonal if $U^{\top}U = I$, that is, $U^{\top} = U^{-1}$, and a complex matrix U is called unitary
if $U^*U = I$, that is, $U^* = U^{-1}$.

Theorem 6.4.1. If A is a symmetric matrix or a Hermitian matrix, then any two eigenvectors with distinct
eigenvalues are orthogonal.

 
 6 −2 −1 
 
Example 6.4.2. We note that A = 
 −2  is symmetric. It can also be checked that
6 −1 
 
−1 −1 5
     
 −1   −1  1
 
     
λ1 = 8 : v 1 =  1 

; λ2 = 6 : v 2 =  −1 

; λ3 = 3 : v 3 = 
 1 

     
0 2 1

are all eigenvectors of A. Furthermore, v 1 · v 2 = v 1 · v 3 = v 2 · v 3 = 0. Normalizing these vectors:


 √   √   √ 
 −1/ 2   −1/ 6   1/ 3 
 √   √   √ 
u1 =  1/ 2  ,
 u2 =  −1/ 6  ,
 u3 = 
 1/ 3
,

   √   √ 
0 2/ 6 1/ 3

we get a diagonalization of A:
 √ √ √   √ √ 
 −1/ 2 −1/ 6 1/ 3  8 0 0   −1/ 2 1/ 2 0 
√ √ √ √ √ √
A = P DP −1 = 
   
 1/ 2 −1/ 6 1/ 3  0
 6   −1/ 6
0   −1/ 6 2/ 6 

 √ √   √ √ √ 
0 2/ 6 1/ 3 0 0 3 1/ 3 1/ 3 1/ 3

using an orthogonal matrix P . 

Definition 6.4.3. We say a real matrix A is orthogonally diagonalizable if there exists an orthogonal
matrix P and diagonal matrix D such that $A = PDP^{\top} = PDP^{-1}$. We say a complex matrix A is unitarily
diagonalizable if there exists a unitary matrix P and diagonal matrix D such that $A = PDP^{*} = PDP^{-1}$.

The matrix in the previous example is orthogonally diagonalizable. In fact, if A is orthogonally diagonalizable,
then $A = PDP^{\top}$ and $A^{\top} = (PDP^{\top})^{\top} = (P^{\top})^{\top}D^{\top}P^{\top} = PD^{\top}P^{\top} = PDP^{\top}$, that is, $A^{\top} = A$.
Thus, A is symmetric. The converse is also true.

Theorem 6.4.4. A real matrix A is orthogonally diagonalizable if and only if A is symmetric. A complex
matrix A is unitarily diagonalizable if and only if A is Hermitian.

The method of computing an orthogonal diagonalization is the same as any other diagonalization except
we require that our basis of eigenvectors be orthonormal. Normalizing the eigenvectors is simple enough.
On the other hand, we cannot simply apply the Gram-Schmidt procedure to an eigenbasis, because the end
result may not be an eigenbasis. Instead, we must apply the Gram-Schmidt procedure to a basis for each
distinct eigenspace. Since different eigenspaces are mutually orthogonal, the union of these orthogonal bases
gives an orthogonal eigenbasis.

 
 3 −2 4 
 
Example 6.4.5. Let A = 
 −2 . It can be shown that λ = 7, −2 are the eigenvalues of A and
6 2 
 
4 2 3
     



 1 −1/2 




 −1 



 
 






Nul(A − 7I) = Span  0  , 
   1  ,
 Nul(A + 2I) = Span  −1/2 
 

     
  


 
 
 

 1 0   1 

are the eigenspaces. Applying Gram-Schmidt to the eigenspace of λ = 7, we get the orthogonal basis
   

 1
  
 −1/4 



   
Nul(A − 7I) = Span  0  , 
   1  .

  
 

 

 1 1/4 

Therefore,      
 1


 −1/4 −1 



 
 
 
 


 0 , 1  ,  −1/2 
  
   

      


 1 
1/4 1 
is an orthogonal eigenbasis. After normalizing, the set
 √   √   



  1/ 2   −1/ 18   −2/3 



 ,  4/√18
     
 0  ,  −1/3 
      




 1/ 2 √   √   


1/ 18 2/3 

is an orthonormal eigenbasis. Therefore,


 √ √   √ √ 
 1/ 2 −1/ 18 −2/3  7 0 0   1/ 2 0 1/ 2 
 √   √ √ √ 
A=  0 4/ 18 −1/3  0
 7   −1/ 18
0   4/ 18 1/ 18 

 √ √   
1/ 2 1/ 18 2/3 0 0 −2 −2/3 −1/3 2/3

is an orthogonal diagonalization of A. 
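For symmetric matrices, NumPy's eigh routine produces exactly this kind of orthogonal diagonalization. A hedged sketch for the matrix of Example 6.4.5 (NumPy is an assumption here):

    import numpy as np

    A = np.array([[3, -2, 4], [-2, 6, 2], [4, 2, 3]], dtype=float)

    evals, P = np.linalg.eigh(A)     # for symmetric A, eigh returns an orthogonal P
    print(np.round(evals, 6))        # -2, 7, 7
    print(np.allclose(P.T @ P, np.eye(3)))            # True: P is orthogonal
    print(np.allclose(A, P @ np.diag(evals) @ P.T))   # True: A = P D P^T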

In particular, unitary matrices are the complex analogue of orthogonal matrices. In fact, all of the pre-
vious theorems about orthogonal matrices remain true when considering unitary matrices. The same can be
said for symmetric and Hermitian matrices.

The set of eigenvalues of a matrix A is called the spectrum of A.



Theorem 6.4.6 (The Spectral Theorem for Symmetric (Hermitian) Matrices). An n × n symmetric (Her-
mitian) matrix A has the following properties:
(i) A has n real eigenvalues, counting multiplicities;

(ii) The geometric and algebraic multiplicities of each eigenvalue of A are equal;
(iii) The eigenspaces of A are mutually orthogonal;
(iv) A is orthogonally/unitarily diagonalizable.

 
Example 6.4.7. Diagonalize the Hermitian matrix $A = \begin{bmatrix} 2&1+i\\1-i&3 \end{bmatrix}$.

We begin with its characteristic polynomial:

$\det(\lambda I - A) = \begin{vmatrix} \lambda-2&-1-i\\-1+i&\lambda-3 \end{vmatrix} = (\lambda-2)(\lambda-3) - (-1-i)(-1+i) = \lambda^2 - 5\lambda + 6 - 1 - 1 = \lambda^2 - 5\lambda + 4 = (\lambda-4)(\lambda-1).$

Therefore, the eigenvalues of A are λ = 1, 4. We then investigate its eigenspaces:

$\lambda = 1:\ \begin{bmatrix} -1&-1-i\\-1+i&-2 \end{bmatrix} \sim \begin{bmatrix} 1&1+i\\0&0 \end{bmatrix}, \quad v_1 = \begin{bmatrix} -1-i\\1 \end{bmatrix}, \quad p_1 = \frac{v_1}{\|v_1\|} = \begin{bmatrix} \tfrac{-1-i}{\sqrt{3}}\\[2pt] \tfrac{1}{\sqrt{3}} \end{bmatrix};$

$\lambda = 4:\ \begin{bmatrix} 2&-1-i\\-1+i&1 \end{bmatrix} \sim \begin{bmatrix} 2&-1-i\\0&0 \end{bmatrix}, \quad v_2 = \begin{bmatrix} \tfrac{1}{2}(1+i)\\1 \end{bmatrix}, \quad p_2 = \frac{v_2}{\|v_2\|} = \begin{bmatrix} \tfrac{1+i}{\sqrt{6}}\\[2pt] \tfrac{2}{\sqrt{6}} \end{bmatrix}.$

Note that:

$\begin{bmatrix} 2&1+i\\1-i&3 \end{bmatrix}\begin{bmatrix} -1-i\\1 \end{bmatrix} = \begin{bmatrix} -2-2i+1+i\\-2+3 \end{bmatrix} = \begin{bmatrix} -1-i\\1 \end{bmatrix}, \qquad \begin{bmatrix} 2&1+i\\1-i&3 \end{bmatrix}\begin{bmatrix} \tfrac{1}{2}(1+i)\\1 \end{bmatrix} = \begin{bmatrix} 1+i+1+i\\1+3 \end{bmatrix} = 4\begin{bmatrix} \tfrac{1}{2}(1+i)\\1 \end{bmatrix}.$

Let $P = \begin{bmatrix} \tfrac{-1-i}{\sqrt{3}}&\tfrac{1+i}{\sqrt{6}}\\[2pt] \tfrac{1}{\sqrt{3}}&\tfrac{2}{\sqrt{6}} \end{bmatrix}$, which is a unitary matrix. Let $D = \begin{bmatrix} 1&0\\0&4 \end{bmatrix}$. Then

$A = PDP^* = \begin{bmatrix} \tfrac{-1-i}{\sqrt{3}}&\tfrac{1+i}{\sqrt{6}}\\[2pt] \tfrac{1}{\sqrt{3}}&\tfrac{2}{\sqrt{6}} \end{bmatrix}\begin{bmatrix} 1&0\\0&4 \end{bmatrix}\begin{bmatrix} \tfrac{-1+i}{\sqrt{3}}&\tfrac{1}{\sqrt{3}}\\[2pt] \tfrac{1-i}{\sqrt{6}}&\tfrac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} 2&1+i\\1-i&3 \end{bmatrix}.$

As can be expected, the eigenvalues of A are real despite A being a non-real matrix.

Let A be a symmetric matrix. Then it has an orthogonal diagonalization given as:

$A = PDP^{\top} = \begin{bmatrix} u_1&\cdots&u_n \end{bmatrix}\begin{bmatrix} \lambda_1&&0\\&\ddots&\\0&&\lambda_n \end{bmatrix}\begin{bmatrix} u_1^{\top}\\ \vdots\\ u_n^{\top} \end{bmatrix} = \begin{bmatrix} \lambda_1 u_1&\cdots&\lambda_n u_n \end{bmatrix}\begin{bmatrix} u_1^{\top}\\ \vdots\\ u_n^{\top} \end{bmatrix} = \lambda_1 u_1 u_1^{\top} + \ldots + \lambda_n u_n u_n^{\top} = \lambda_1 (u_1 \otimes u_1) + \ldots + \lambda_n (u_n \otimes u_n).$

This last line is called a spectral decomposition of A. Each of the matrices $B_i = u_i u_i^{\top} = u_i \otimes u_i$, the
outer product of $u_i$ with itself, is an n × n symmetric matrix with rank 1. The range of $B_i$ is Span{$u_i$}.
Furthermore, $B_i B_j = 0$ if i ≠ j and $B_i^2 = B_i$, since $\{u_1, \ldots, u_n\}$ is an orthonormal set. Thus, the $B_i$'s
are idempotent and pairwise "orthogonal." When considering complex vectors, the outer product becomes
$u \otimes v = uv^*$.

 
 7 2 
Example 6.4.8. The matrix A =   is symmetric and has an orthogonal diagonalization given by
2 4

√ √ √ √
   
 2/ 5 −1/ 5   8 0   2/ 5 1/ 5 
A= √ √   √ √ .
1/ 5 2/ 5 0 3 −1/ 5 2/ 5

√ √
 
 2/ 5 −1/ 5 
 
Let P = u1 u2 = √ √ . Then
1/ 5 2/ 5


   
2/ 5 4/5 2/5
 
√ √
u1 ⊗ u1 = u1 u> =  √  2/ 5 1/ 5 =  ,
   
1
1/ 5 2/5 1/5

   
 −1/ 5   1/5 −2/5 
 
√ √
u2 ⊗ u2 = u2 u>
2 =  √  −1/ 5 2/ 5 =  
2/ 5 −2/5 4/5
     
 4/5 2/5   1/5 −2/5   7 2 
8u1 u> >
1 + 3u2 u2 = 8  + 3 =  = A. 

2/5 1/5 −2/5 4/5 2 4
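The spectral decomposition can be rebuilt from any orthonormal eigenbasis. A sketch assuming NumPy:

    import numpy as np

    A = np.array([[7, 2], [2, 4]], dtype=float)
    evals, P = np.linalg.eigh(A)

    # rebuild A as a sum of rank-one projections  lambda_i * (u_i outer u_i)
    S = sum(lam * np.outer(P[:, i], P[:, i]) for i, lam in enumerate(evals))
    print(np.allclose(S, A))   # True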



Exercises (Go to Solutions)


For Exercises 1-5, compute an orthogonal (or unitary) diagonalization for the matrix A.
     
17 5 5 17 8 5 3 −3 3
♠ 1. 
     
    
5 −7 ♠ 2.  17 5 8 
   3

5 3 −3 

  ♠ 3. 
 

8 8 14  −3 3 5 3 


 
3 −3 3 5
   
4 1−i   5 0 0
♠ 4. 
 
  
1+i 5 ♠ 5. 
 0 −1 −1 + i 

 
0 −1 − i 0

For Exercises 6-8, compute a spectral decomposition for the matrix A.

♠ 6. A from Exercise 1 ♠ 7. A from Exercise 2 ♠ 8. A from Exercise 3

9. Show that the inverse of an orthogonal matrix is orthogonal.

10. Show that the product of two orthogonal matrices is orthogonal.

11. Show that the determinant of an orthogonal matrix is ±1.

12. Show that if U is an orthogonal matrix then kU xk = kxk.



“Transformation literally means going beyond your form.” – Wayne Dyer

6.5 Similarity and Linear Transformations


Let V and W be n- and m-dimensional vector spaces with bases B = {b1 , . . . , bn } and C = {c1 , . . . , cm },
respectively. Let T : V → W be a linear transformation. Likewise, the coordinate mappings [·]B : V → F n
and [·]C : W → F m are linear transformations (in fact, they are isomorphisms). Consider the composite
linear transformation $([\,\cdot\,]_C \circ T \circ [\,\cdot\,]_B^{-1}) : F^n \to F^m$. This is a linear transformation between $F^n$ and $F^m$. Let
A be the standard matrix of this composite transformation. Then
[T (x)]C = A[x]B
for all x ∈ V . This matrix A = C [T ]B is called the matrix representation of T relative to B and C. In
fact,  
A= [T (b1 )]C [T (b2 )]C ... [T (bn )]C .

The following diagram may be useful in remembering the relationship here:


T
x T (x)

A
[x]B [T (x)]C

Example 6.5.1. Consider the linear transformation $T : R^4 \to R^4$ given by the rule

$T(x, y, z, w) = (-2x + 11y + z - w,\ 2x - 9y + 2z + w,\ 2x - 4y + 3z,\ 3x + 1.5y + 2z - w).$

Note that the standard matrix $_E[T]_E$ for $T : R^4 \to R^4$ is

$_E[T]_E = \begin{bmatrix} -2&11&1&-1\\2&-9&2&1\\2&-4&3&0\\3&1.5&2&-1 \end{bmatrix}.$

Suppose $B = \left\{ \begin{bmatrix} 1\\2\\3\\4 \end{bmatrix}, \begin{bmatrix} 1\\0\\-1\\0 \end{bmatrix} \right\}$ is a basis for $V \subseteq R^4$ and $C = \left\{ \begin{bmatrix} 1\\-1\\1\\-1 \end{bmatrix}, \begin{bmatrix} 2\\0\\0\\1 \end{bmatrix}, \begin{bmatrix} 3\\-2\\1\\1 \end{bmatrix} \right\}$ is a basis for
$W \subseteq R^4$. Then T restricts to a linear transformation from V to W, that is, $T : V \to W$. Note that

$T(b_1) = \begin{bmatrix} 19\\-6\\3\\8 \end{bmatrix} \qquad T(b_2) = \begin{bmatrix} -3\\0\\-1\\1 \end{bmatrix}.$

In order to compute the matrix representation of $T : V \to W$ with respect to B and C coordinates, we need to
row reduce $\begin{bmatrix} C \mid T(B) \end{bmatrix}$:

$\begin{bmatrix} C \mid T(B) \end{bmatrix} = \begin{bmatrix} 1&2&3&19&-3\\-1&0&-2&-6&0\\1&0&1&3&-1\\-1&1&1&8&1 \end{bmatrix} \sim \begin{bmatrix} 1&0&0&0&-2\\0&1&0&5&-2\\0&0&1&3&1\\0&0&0&0&0 \end{bmatrix}.$

Thus,
$[T(b_1)]_C = \begin{bmatrix} 0\\5\\3 \end{bmatrix}$ and $[T(b_2)]_C = \begin{bmatrix} -2\\-2\\1 \end{bmatrix}.$
This gives
$M = \begin{bmatrix} [T(b_1)]_C & [T(b_2)]_C \end{bmatrix} = \begin{bmatrix} 0&-2\\5&-2\\3&1 \end{bmatrix}.$
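Each column of the matrix representation is found by solving a linear system in the C-coordinates. A sketch assuming NumPy (lstsq is used only because the systems here are consistent):

    import numpy as np

    A = np.array([[-2, 11, 1, -1], [2, -9, 2, 1], [2, -4, 3, 0], [3, 1.5, 2, -1]])
    B = np.column_stack([[1, 2, 3, 4], [1, 0, -1, 0]])                   # basis of V
    C = np.column_stack([[1, -1, 1, -1], [2, 0, 0, 1], [3, -2, 1, 1]])   # basis of W

    # each column of M solves C m = T(b_j); lstsq recovers the C-coordinates
    M, *_ = np.linalg.lstsq(C, A @ B, rcond=None)
    print(np.round(M, 6))    # [[0, -2], [5, -2], [3, 1]]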

Note that the matrix representation of T depends on the bases chosen for the domain V and codomain
W , up to a point.

Theorem 6.5.2. Let $T : V \to W$ be a linear transformation. Let A and A′ be two matrix representations
of T. If m = dim W and n = dim V, then there exist nonsingular matrices P (m × m) and Q (n × n) such that
$A = PA'Q^{-1}$.
Proof. Let B, B′ be bases for V and C, C′ be bases for W such that $A = {}_C[T]_B$ and $A' = {}_{C'}[T]_{B'}$. Then let
$Q = P_{B\leftarrow B'}$ and $P = P_{C\leftarrow C'}$. Let x ∈ V. Then

$PA'Q^{-1}[x]_B = PA'[x]_{B'} = P[T(x)]_{C'} = [T(x)]_C = A[x]_B.$

Thus, $PA'Q^{-1} = A$.

When T : V → V is a linear transformation with the same domain and codomain, we often use the same
basis for the domain and codomain. We then refer to the matrix representation A = [T ]B relative to B.
In this case,
[T (x)]B = A[x]B .

Corollary 6.5.3. Let T : V → V be a linear transformation. Let A and B be two matrix representations of
T . Then there exists a nonsingular matrix P such that A = P BP −1 . In particular, all matrix representations
of T are similar to each other.
Proof. The result follows immediately from Theorem 6.5.2 once it is discovered that Q = P . 

Since matrix representations of a linear transformation always remains in the same similarity class,
any property invariant on similar matrices can be attached to the linear transformation. For example, if
T : V → V is a linear transformation, then we can define det(T ) to be the determinant of any matrix
representation of T . Likewise, we can define tr(T ), nullity(T ), rank(T ), etc. to be the trace, nullity, rank,
etc. of any matrix representation of T . This includes the eigenvalues of a matrix.
Example 6.5.4. Let $T : R^3 \to R^3$ be a linear transformation such that

$T(x, y, z) = (5x + 2y,\ 2x + 3y - z,\ 26x + 16y - 2z).$

Then the standard matrix representation (using the standard basis on $R^3$) yields

$A = [T] = \begin{bmatrix} 5&2&0\\2&3&-1\\26&16&-2 \end{bmatrix}.$

On the other hand, one could check that $B = \left\{ b_1 = \begin{bmatrix} 2\\-3\\1 \end{bmatrix}, b_2 = \begin{bmatrix} -1\\2\\2 \end{bmatrix}, b_3 = \begin{bmatrix} -1\\1\\-2 \end{bmatrix} \right\}$ is a basis for
$R^3$. Then
$[T]_B = \begin{bmatrix} [Ab_1]_B & [Ab_2]_B & [Ab_3]_B \end{bmatrix}.$

To find these coordinate vectors, we solve the linear systems corresponding to the augmented matrix:

$[B \mid AB] = \begin{bmatrix} 2&-1&-1&4&-1&-3\\-3&2&1&-6&2&3\\1&2&-2&2&2&-6 \end{bmatrix} \sim \begin{bmatrix} 1&0&0&2&0&0\\0&1&0&0&1&0\\0&0&1&0&0&3 \end{bmatrix}$

Thus, $[T]_B = \begin{bmatrix} 2&0&0\\0&1&0\\0&0&3 \end{bmatrix}$. This is the result of B being an eigenbasis of A. Thus, the eigenvalues of T are
λ = 2, 1, 3. In B-coordinates, T is just a diagonal matrix, the diagonalization of A.

Exercises (Go to Solutions)


For Exercises 1-3, given the linear transformation T : F n → F m , a basis B of F n , and a basis C of F m , listed
in that order, find the matrix representation C [T ]B .

♠ 1. T : R2 → R3 ♠ 2. T : R3 → R2 ♠ 3. T : Z42 → Z2
T (x, y) = (x + y, 0, 2x + 3y), T (x, y, z) = (x + y − 2z, −y + z), T (x, y, z, w) = x + y + z + w (mod 2),
                  
 1
  1  1   1   1   1   0   1   0 
 
 
  
 

  
,  ,
 
 
 
 

       
        

 1

−1

  1
  
 ,  0  ,
 
 2 
 , 
 
 1



 1



 0



 0




     




     




 ,
 
  , 
   
 , 
,

 1 −2 3 0 1 1 1

2   −1   −1 
   
        

  
        


     

       
        


 −3  ,  2  ,  1  
 2 −1

  0

0 1 1


     
,

          
   
{1}

 

1 2 −2   −3 2 
  
Appendix A

Notation Index and Frequent


Terminology

iff – “if and only if”


∀ – “for all”
∃ – “there exists”
{x | x satisfies some property} - set builder notation
∈ – element of a set
⊆ – subset
≤ – subspace

Example A.1. X = {t(−2, 1, 3) | t ∈ R} is the set of all vectors which are real scalar multiples of (−2, 1, 3).
We have that (0, 0, 0), (−4, 2, 6) ∈ X, but (1, 0, 1) ∉ X. Hence, {(0, 0, 0), (−4, 2, 6)} ⊆ X. As X is a subspace,
X ≤ R³.

N – the set of natural numbers, that is, counting whole numbers, e.g. 0, 1, 2, 3, 4, . . .
Z – the set of integers, that is, whole numbers which are positive, zero, or negative, e.g., . . . , −3, −2, −1, 0, 1, 2, 3, . . .
Zn = {0, 1, 2, . . . , n − 1} – the set of integers modulo n
Q – the set of rational numbers, that is, those numbers which can be expressed as a ratio of integers a/b
(where b 6= 0), e.g. 1/2, 0, −3/5, 7/2, 7/3, 5, . . . Note that a rational number can be written in more than
one way. √
R – the set of real numbers, that is, those numbers which are rational or irrational, e.g. π, 2, 3.5, 4, −285,
37.4568, log6 (3), e7 , . . . √
C – the set of complex numbers, that is, those numbers of the form a + bi where i = −1 and a, b are real
numbers, e.g., 5, 2 + 4i, −i, 1/2 + 5i/7, . . .
$(a + bi)(c + di) = (ac - bd) + (ad + bc)i \qquad \dfrac{a + bi}{c + di} = \dfrac{(ac + bd) + (bc - ad)i}{c^2 + d^2}$

Example A.2. $(1 + 2i)(3 + 4i) = (3 - 8) + (4 + 6)i = -5 + 10i$,

$\dfrac{1 + 2i}{3 + 4i} = \dfrac{(3 + 8) + (6 - 4)i}{9 + 16} = \dfrac{11}{25} + \dfrac{2}{25}i$
F n - the vector space of all column vectors with n entries chosen from the field of scalars F
0 - zero vector

ker T = {x ∈ X | T (x) = 0} – kernel of transformation T : X → Y


im T = {T (x) ∈ Y | x ∈ X} – image (or range) of transformation T : X → Y
one-to-one – T (u) = T (v) implies u = v for the transformation T : X → Y . If T is a linear transformation,
it is one-to-one iff ker T = {0}.
onto – ∀b ∈ Y , ∃x ∈ X such that T (x) = b for the transformation T : X → Y , that is, im T = Y .

m × n matrix – a matrix with m rows (horizontal) by n columns (vertical)


overdetermined system – m > n, more rows than columns (tall matrix)


underdetermined system – m < n, more columns than rows (fat matrix)


echelon form – pivots form a downward staircase, with zeros below, rows of zeros at bottom
row reduced echelon form (RREF) – pivots are all 1’s and form a downward staircase, with zeros below and
above, rows of zeros at bottom
replacement – replace a row in a matrix with that row plus a multiple of another row, i.e. Ri −→ Ri + cRj
interchange – interchanging or swapping two rows in a matrix, i.e. Ri ←→ Rj
scaling – multiply a row of a matrix by a scalar, i.e. Ri −→ cRi
Appendix B

The Substitution and Elimination


Methods

Solving linear systems by graphing is very error-prone and problematic, even with the use of technology!
Although the geometric approach might be necessary to comprehend solutions to linear systems, algebraic
methods prove far superior for solving linear systems. In this appendix, we will discuss two such methods
typically presented in a pre-linear algebra setting, beginning with substitution.

The Substitution Method


1. Solve one equation for one of the variables.
2. Substitute the expression for this variable into the other equation.
3. Once you have determined one assignment, back substitute it into the equation from 1.

Example B.1. Solve the system by substitution.


(
2x + y = 1
3x + 4y = 14,

We will solve the first equation with respect to y. This gives us y = 1 − 2x. We next will substitute this
value of y into the second equation, giving:
3x + 4y = 14 ⇒ 3x + 4(1 − 2x) = 14 ⇒ 3x + 4 − 8x = 14 ⇒ −5x = 10 ⇒ x = −2.
We next back substitute x = −2 into the above equation. This gives y = 1 − 2(−2) = 1 + 4 = 5. Therefore,
the solution is (−2, 5) . 

Example B.2. Solve 


 x + y − z = −1

4x − 3y + 2z = 16

2x − 2y − 3z = 5.

We will solve the first equation with respect to z : z = x + y + 1. We now substitute this expression into
both of the remaining equations.
( ( (
4x − 3y + 2(x + y + 1) = 16 6x − y + 2 = 16 6x − y = 14
∼ ∼
2x − 2y − 3(x + y + 1) = 5 −x − 5y − 3 = 5 −x − 5y = 8.

We now have to solve a system of 2 equations with 2 unknowns. Again using substitution, notice that
y = 6x − 14. Plugging this into the last equation gives
−x − 5(6x − 14) = 8 ⇒ −x − 30x + 70 = 8 ⇒ −31x = −62 ⇒ x = 2.


We now substitute this value into the above equation for y and get y = 6(2) − 14 = 12 − 14 = −2. Lastly,
we substitute both of these values back into above equation for x and get z = (2) + (−2) + 1 = 1. Therefore,
the solution of the system is (2, −2, 1) . 
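Systems like the one in Example B.2 can also be checked with a numerical package. A minimal sketch assuming NumPy (not part of the text):

    import numpy as np

    # the coefficient matrix and right-hand side of Example B.2
    A = np.array([[1, 1, -1], [4, -3, 2], [2, -2, -3]], dtype=float)
    b = np.array([-1, 16, 5], dtype=float)
    print(np.linalg.solve(A, b))   # [ 2. -2.  1.]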

Algebraically, one discovers that a particular system is inconsistent if during the process of solving one
runs into a contradiction, that is, an equation which has no solution.

Example B.3. Find all solutions for (


8x − 2y = 5
−12x + 3y = 7.
We will solve for the variable x in the first equation, giving x = (1/4)y + 5/8. Substituting this into the second
equation gives

−12((1/4)y + 5/8) + 3y = 7 ⇒ −3y − 15/2 + 3y = 7 ⇒ −15/2 = 7,
which is a contradiction. Therefore, the system is inconsistent and has no solutions. 

The Elimination Method


1. Choose a variable to eliminate and adjust the coefficients of said variable so they are alternating.

2. Add the equations and solve for the remaining variable.


3. Back substitute the variable value into one of the original equations.

Now, we try the elimination method to solve a linear system.

Example B.4. Find all solutions by elimination for


(
2x − 3y = 4
4x + 5y = 3.

We will eliminate the variable y. Notice that the coefficients of y are already alternating but we need a
common multiple. Between 3 and 5, the least common multiple is 3(5) = 15. Thus, we will multiply the
first equation by 5 and the second by 3. This gives
(
10x − 15y = 20
12x + 15y = 9.

If we add the equations, we get 22x = 29 ⇒ x = 29/22. Plugging this x value into either equation will give the
y-value. Alternatively, we could start the elimination process over again by canceling out the x this time.
To do so, we will multiply the first equation by −2 and leave the second equation unchanged. This gives

−4x + 6y = −8
4x + 5y = 3.

If we add the equations, we get 11y = −5 ⇒ y = −5/11. Therefore, (29/22, −5/11) is the solution of the

system. 

Example B.5. Solve 


 x − 2y − z = 8

2x − 3y + z = 23

4x − 5y + 5z = 53.

First, we will eliminate x from the system, which means eliminating x from the first pair of equations and
from the first and third equations. With the first pair, multiply the first equation by −2 : −2x+4y+2z = −16.
When we add the first 2 equations together, we get y + 3z = 7. Next, we eliminate x from the first and third
equations. We will multiply the first equation by −4, which gives −4x + 8y + 4z = −32. After we add
this equation with the third, we get 3y + 9z = 21. We now have to solve the 2 × 2 system

y + 3z = 7
3y + 9z = 21.

But upon further inspection, we notice that the second equation is 3 times the first equation. Therefore, the
equations are dependent and the system has infinitely many solutions.

In order to get the general form of the solution, notice that we have y = 7 − 3z and x = 2y + z + 8 =
2(7 − 3z) + z + 8 = −5z + 22. Therefore, the general solution has the form (−5z + 22, 7 − 3z, z) . 
234 APPENDIX B. THE SUBSTITUTION AND ELIMINATION METHODS
Appendix C

Proofs of Theorems

Theorem 3.2.9 Two matrices A and B are row equivalent if and only if Row A = Row B.
Proof. We will show that if A and B are row equivalent, then Row A = Row B. Let {r 1 , . . . , r m } be the
rows of A. Certainly, interchange would not change the row space since

Span{r 1 , . . . , r i , . . . , r j , . . . , r m } = Span{r 1 , . . . , r j , . . . , r i , . . . , r m }.

Additionally, scaling by a nonzero value c does not affect the span of the rows since
1
cr i ∈ Span{r 1 , . . . , r i , . . . , r m } and ri = (cr i ) ∈ Span{r 1 , . . . , cr i , . . . , r m }.
c
Finally, replacement also does not change the row space because

r j +cr i ∈ Span{r 1 , . . . , r i , . . . , r j , . . . , r m } and r j = (r j +cr i )−cr i ∈ Span{r 1 , . . . , r i , . . . , r j +cr i , . . . , r m }.

Since no row operation changes the row space, Row A = Row B for all A ∼ B.
In the other direction, if the rows of A and B are {r 1 , . . . , r n } and {s1 , . . . , sn }, respectively, then
Span{r 1 , . . . , r n } = Span{s1 , . . . , sn }. Applying Section 2.1 Exercise 20 repeatedly allows us to transform
along the following lines (up to relabelling):

Span{r 1 , r 2 , . . . , r n } = Span{s1 , r 2 , . . . , r n } = Span{s1 , s2 , . . . , r n } = . . . = Span{s1 , . . . , sn }.

Each of these equalities is obtained by expressing si as a linear combination of the current spanning set,
which inductively can be decomposed as some sequence of row scalings and row replacements. The relabelling
process alluded to above can be accomplished by row interchanges. Therefore, A is row equivalent to B. 

Theorem 3.2.10 If U is an echelon form of the matrix A, then nonzero rows of U form a basis for the row
space of A.
Proof. By Theorem 3.2.9, Row(A) = Row(U ) since A and U are row equivalent. Thus, a basis for Row(U )
is a basis for Row(A). If U is in echelon form, then no nonzero row can be written as a linear combination
of the other rows (otherwise we could have zeroed out that row). Hence, we have a linearly independent
spanning set for Row(A), which is necessarily a basis. 

 
 a b 
Theorem 3.3.3 Let A =  . If ad − bc 6= 0, then A is nonsingular with inverse
c d
 
1 d −b 
A−1 = .

ad − bc

−c a

If ad − bc = 0, then A is singular.

235
236 APPENDIX C. PROOFS OF THEOREMS
 
1 d −b 
Proof. Suppose that ad − bc 6= 0. Let B = . Note that

ad − bc

−c a
 
1  ad − bc −ab + ab 
AB =  = I2 .
ad − bc

cd − cd −bc + ad

Similarly, BA = I2 . Thus, B = A−1 .


     
b a b b
Suppose that ad − bc = 0. If a 6= 0, then  = = . If a = 0, then suppose b 6= 0.
    
a
c bc/a d
         
a b   a   a   0   0 
Similarly,  = = . If a = b = 0, then either   or   is a multiple of the
b
d ad/b c c d
other. In all cases, one column vector is a multiple of the other. Thus, the columns are linearly dependent,
which is singular by the Nonsingular Matrix Theorem. 

Theorem 3.3.7 Let A and B be n × n invertible matrices, let m be a positive integer, and r is a nonzero
real number. Then A−1 , AB, A> , Am , and rA are also invertible with:
(i) (A−1 )−1 = A (ii) (AB)−1 = B −1 A−1 (iii) (A> )−1 = (A−1 )>

(iv) (Am )−1 = (A−1 )m := A−m 1 −1


(v) (kA)−1 = A .
k
Proof.
(i) Note that if AA−1 = A−1 A = In , then A−1 has an inverse, A. Since inverses are unique, (A−1 )−1 = A.

(ii) Note that


(AB)(B −1 A−1 ) = A(BB −1 )A−1 = A(In )A−1 = AA−1 = In .
Similarly, (B −1 A−1 )(AB) = In . Thus, AB has an inverse and (AB)−1 = B −1 A−1 .

(iii) Next, note that


(A−1 )> A> = (AA−1 )> = In> = In .
Similarly, A> (A−1 )> = In . Thus, A> has an inverse with (A> )−1 = (A−1 )> .

(iv) Next, note that


(A−1 )m Am = |A−1 ·{z
· · A−1} · A
| ·{z
· · A} = In .
m-times m-times

Similarly, Am (A−1 )m = In . Thus, (A−1 )m = (Am )−1 .

(v) Finally, note that    


1 −1 1
A (kA) = · k (A−1 A) = 1 · In = In .
r k
   
1 −1 1 −1
Similarly, (kA) A = In . Thus, A = (kA)−1 . 
k k

Theorem 4.2.7 If S = {v 1 , . . . , v p } ⊆ F n is an orthogonal set of nonzero vectors, then S is linearly


independent.
237

Proof. Suppose that


c1 v 1 + c2 v 2 + . . . + cp v p = 0.
Then for each i,

v i · (c1 v 1 + c2 v 2 + . . . + cp v p ) = vi · 0
(v i · ci v 1 ) + (v i · c2 v 2 ) + . . . + (v i · ci v i ) + . . . + (v i · cp v p ) = 0
c1 (v i · v 1 ) + c2 (v i · v 2 ) + . . . + ci (v i · v i ) + . . . + cp (v i · v p ) = 0
2
0 + 0 + . . . + ci kv i k + . . . + 0 = 0
ci kv i k2 = 0.

Therefore, ci = 0 or kv i k = 0. Since v i 6= 0, the latter is impossible. Therefore, ci = 0 for all i, which


implies that S is linearly independent. 

Theorem 4.2.8 Let W be a subspace of F n . Then W ⊥ is also a subspace of F n .

Proof. Since 0 is orthogonal to every vector, including those in W , 0 ∈ W ⊥ . Let x, y ∈ W ⊥ . Then


w · x = w · y = 0 for all w ∈ W . Then w · (x + y) = w · x + w · y = 0 + 0 = 0. Thus, x + y ∈ W ⊥ . Finally,
let c ∈ F . Then w · (cx) = c(w · x) = c(0) = 0 for all w ∈ W . Therefore, cx ∈ W ⊥ , which proves the W ⊥
is a subspace. 

Theorem 4.2.10 Let A be an m × n matrix. Then

(Row A)⊥ = Nul A.


  
a
 i1   x1 
   
 ai2   x2
     

Proof. Let A = aij . Then ai =  .  is a “row” vector of A. Let x = 
   ∈ Nul A. Then
 ..
 .. 

 . 
   
   
ain xm

ai · x = ai1 x1 + ai2 x2 + . . . + ain xn = 0.

So, x is orthogonal to each row vector. Since inner products are bilinear, this shows that x is orthogonal to
the entire row space. 

Theorem 4.4.1 For all u, v ∈ F n ,


|u · v| ≤ kukkvk.

Proof. If u = 0, then both sides of the inequality are 0 and the inequality holds. Suppose that u 6= 0. Then

u·v u·v |u · v| |u · v|
k proju vk = u = kuk = 2 kuk = kuk .
u·u u·u kuk

Also,
k proju vk2 ≤ k proju vk2 + kv − proju vk2 = kvk2 ,
by the Pythagorean Theorem. Therefore,
|u · v|
kvk ≥ ,
kuk
which proves the inequality. 
238 APPENDIX C. PROOFS OF THEOREMS

Theorem 4.4.2 For all u, v ∈ F n ,


ku + vk ≤ kuk + kvk.

Proof.

ku + vk2 = (u + v) · (u + v) = u · u + u · v + v · u + v · v = kuk2 + 2Re(u · v) + kvk2


≤ kuk2 + 2|u · v| + kvk2
≤ kuk2 + 2kukkvk + kvk2 , by Cauchy-Schwarz Inequality,
2
= (kuk + kvk) .

Taking square roots finishes the proof. 

Theorem 4.4.3 Let u, v ∈ F n . Then

u · v = kukkvk cos θ,

where θ is the angle between† the two line segments from the origin to the points identified with u and v.

Proof. Consider the triangle formed by the vectors u, v, and u − v. Let θ be the angle between u and v.
Then
ku − vk2 = kuk2 − 2(u · v) + kvk2 .
Considering the orthogonal projection vb = proju (v), we have that the triangle formed by v, v b, and v − vb is
adj kb
v k
right, with interior angle corresponding to the reference angle of θ, call it θ.
b Then cos θb = = , or
hyp kvk

u·v |u · v| |u · v|
kvk cos θb = kb
vk = kuk = kuk = .
u·u kuk2 kuk

We then conclude that kukkvk cos θb = |u · v|. Therefore, u · v = kukkvk cos θ, where the sign will depend if
θ is acute, right, or obtuse. 

† We should mention that the situation is dramatically different between real and complex vector spaces. In the real case,

we already have a well-defined notion of angles in the plane. Given that two vectors form a triangle which is always contained
in a plane, the notion of angle to agree with the established notion of angle from Trigonometry. We will prove this case
below. In the complex case, we do not have a notion of a complex angle lying around. Thus, there are two main schools of
thought with regard to complex angles. First, as the field C can be viewed as a 2-dimensional real vector space, every vector
space Cn can be viewed as a 2n-dimensional real vector space too. Hence, the angle between the n-complex vectors is simply
the real angle between the corresponding 2n-real vectors. The second idea is to use the Cauchy-Schwarz Inequality. Since
kukkvk kukkvk
0 ≤ |u · v| ≤ kukkvk, then 0 ≤ ≤ 1. Hence, is a real number inside the range of cosine. Thus, we define θ
 |u· v| |u · v|
kukkvk
by the relation θ = cos−1 . This is the meaning we take here of complex angles, and, thus, no proof of the Law of
|u · v|
Cosines is necessary here as it is immediate from the definition of “the angle between.” We could take this perspective for real
angles, which we essentially already have, we choose to prove the Law of Cosines as a theorem instead of using it as a definition
so that we do not have the burden of developing Trigonometry from our linear algebraic definition of θ.
Appendix D

Solutions to Exercises

Chapter 1 : Introduction to Linear Algebra


1.1 Linear Systems

1. yes 2. yes 3. no 4. no

5. no 6. no 7. yes 8. no

♠ 9. no ♠ 10. no ♠ 11. yes ♠ 12. no

13. no 14. no 15. no 16. yes 17. no

18. consistent, 19. consistent, 20. inconsistent 21. consistent,


unique solution multiple solutions unique solution

22. inconsistent 23. consistent, 24. consistent, 25. consistent,


multiple solutions unique solution multiple solutions

26. inconsistent 27. consistent, 28. homogeneous, (0, 0)


unique solution

29. nonhomogeneous ♠ 30. homogeneous, (0, 0) ♠ 31. nonhomogeneous

32. nonhomogeneous 33. nonhomogeneous 34. nonhomogeneous

35. nonhomogeneous ♠ 36. homogeneous, (0, 0, 0, 0) 37. nonhomogeneous

38. It is consistent since it is homogeneous. For example, (0, 0, 0) is a solution. This is, in fact, the only
solution. A sketch of the three planes shows they intersect at a single point.

♠ 39. ♠ 40. Hint: Have you done Exercises 28–38 yet?

z (x, y, z)

2 (7, 5, 2)

3 (8, 8, 3)

−1 (4, −4, −1)

−2 (3, −7, −2)

π (π + 5, 3π − 1, π)
239
240 APPENDIX D. SOLUTIONS TO EXERCISES

1.2 Fields
1. Equality of integers is when the numbers are exactly the same. Congruence, on the other hand, is
only when the remainders are equal, when divided by p. Additionally, congruence depends on the
modulus, that is, two integers can be conguence with respect to one modulus but not another, e.g.
3 ≡ 8 (mod 5) but 3 6≡ 8 (mod 7). In contrast, equality is not relative but absolute.
2. False. The remainder a (mod n) is read “a modulo n” and n is called the modulus of Zn .
3. True

4. False. Modular division a/b (mod n) is defined by multiplying by the inverse of b with respect to
modular multiplication.
5. False. The set Zn is a field with respect to modular addition and multiplication if and only if n is a
prime number.
6. False. If 0 ≤ a < n, then a (mod n) = a.

7. 3 8. 4 9. 0 10. 3 11. 4 12. 2 13. 5

♠ 14. 4 ♠ 15. 2 16. 3 17. 1 ♠ 18. 1 ♠ 19. 11 ♠ 20. 6

21. 2 22. 1 23. 0 24. 3 25. 7 26. x = −1

5 1 8 29. x = 1 ♠ 30. x = 1 ♠ 31. x = 3


♠ 27. x = − ♠ 28. x = − + i
6 5 5

♠ 32. x = 2 ♠ 33. x = 5 34. x = 3 35. x = 1 36. x = 2 37. x = 8 38. x = 2

39. x = 1 40. x = 1 41. x = 5 42. x = 6 43. x = 3

♠ 44. Hint: As Z6 only contains six elements, one could check that the equation has no solution by trial and
error. The lack of solution to this equation results from the fact that 2 has no multiplicative inverse
modulo 6. In fact, 2(3) ≡ 6 ≡ 0 (mod 6). Thus, if 2 did have a reciprocal, then 0 would necessarily
have one too. Now, we do not divide by zero. Not ever! The dinosaurs made that mistake and where
are they now?

1.3 Vector Spaces


1. False. For any vector space V , 0 ∈ V .
         
 9   −9   8   4   3 
         
2.  20 

 ♠ 3.  −8 

 4.  10 

 5. 
 14 
 6.  0 


         
27 5 16 6 1
       
14 −17 + 26i −8 + 4i 1 + 5i
8.  ♠ 9.  10. 
       
    
7.  32 

 −14 + 26i 3 + 5i 4i
 
28
       
−3 + 4i −2 + 2i 1 − 4i 21 + 22i
11.  12. 
       
     
2 + 19i −2 + 8i 13. 
 −3 − 14i

 14.  20 + 12i 


   
−5i 24 + 21i
241
           
 2   3   1   1   2   1 
15.   16.          
1 0 ♠ 17. 
 0 
 18. 
 1 
 19. 
 1 
 20. 
 4 

       
0 1 0 4
     
 4   2   2 
     
 3   0   2 
     
♠ 21. 
 
 22. 
 
 23. 
 

 0   2   2 
     
     
2 4 1

♠ 24. For (d), note 0 = 00 = (c0)0 = c(00) = c0.


For (e), u + (−1)u = 1u + (−1)u = (1 − 1)u = 0u = 0. Therefore, (−1)u = −u.
♠ 25. Let a0 + a1 x + a2 x2 + . . . + an xn , b0 + b1 x + b2 x2 + . . . + bn xn ∈ Pn . Then (a0 + a1 x + a2 x2 + . . . +
an xn ) + (b0 + b1 x + b2 x2 + . . . + bn xn ) = (a0 + b0 ) + (a1 + b1 )x + (a2 + b2 )x2 + . . . + (an + bn )xn .
Since F is a field, ai + bi ∈ F for all i. Thus, the sum is a vector in Pn . Similarly, if c ∈ F , then
c(a0 + a1 x + a2 x2 + . . . + an xn ) = (ca0 ) + (ca1 )x + (ca2 )x2 + . . . + (can )xn ∈ Pn since cai ∈ F for all
i. Therefore, Pn is a vector space.
     
 0   0   0 
      3
♠ 26. 0 
 0  =  0  6=  0  over R . This violates Theorem 1.3.8 (iii).
    
     
1 1 0
     
 0   0   0 
     
♠ 27. 0   6  0  over R3 . This violates Theorem 1.3.8 (iii).
 0 = 0 =
 
 
     
1 1 0
     
 1   −1   1 
     
♠ 28. 1   6  0  over R3 . This violates Axiom (viii) from Definition 1.3.1.
 0 = 0 =
 
 
     
0 0 0
     
 1   0   1 
      3
♠ 29. 1  0  =  0  =
    6  0 

 over R . This violates Axiom (viii) from Definition 1.3.1.
     
0 0 0

1.4 Linear Transformations


   
     (x 1 +
x 2 ) + (y 1 + y2 ) (x1 + y1 ) + (x2 + y2 )
x x x + x
   
1 2 1 2    
♠ 1. T  +  = T   =  =
      
0   0 

y1 y2 y1 + y2    
2(x1 + x2 ) + 3(y1 + y2 ) (2x1 + 3y1 ) + (2x2 + 3y2 )
       
 x1 + y1   x2 + y2   x1   x2 
= +  = T   + T  
2x1 + 3y1 2x2 + 3y2 y1 y2
and
242 APPENDIX D. SOLUTIONS TO EXERCISES
     
     cx + cy c(x + y)   x+y
 
  x   cx   x
   
    
T c  = T = =  = c  = cT  .
 
    0   c0   0  
y cy       y
2(cx) + 3(cy) c(2x + 3y) 2x + 3y
Therefore, T is linear.

♠ 2. T (1, 2) = (3, 0, 8), T (5, −2) = (3, 0, 4) ♠ 3. ker T = {0}

♠ 4. yes, T (2, −1) = (1, 0, 1)


♠ 5. T is one-to-one since kernel is trivial. T is not onto since (0, 1, 0) ∈
/ im T .
     
 x1   y1   x1 + y1 
   
       (x1 + y1 ) + (x3 + y3 )   x1 + y1 + x3 + y3 
 x2  +  y2  = T  x2 + y2  = 
6. T        = 
      (x1 + y1 ) + 3(x2 + y2 ) x1 + y1 + 3x2 + 3y2
x3 y3 x3 + y3
   
    x 1 y 1
 x1 + x3   y1 + y3 
   
   
= +  = T  x2  + T  y2 
    


x1 + 3x2 y1 + 3y2       
x3 y3
and
    
x 1 cx 1
     
 (cx1 ) + (cx3 )   c(x1 + x3 )   x1 + x3 
    
    
Tc  x2  = T  cx2  = 
     =   = c  =
     (cx1 ) + 3(cx2 ) c(x1 + 3x2 ) x1 + 3x2
x3 cx3
 
 x1 
 
cT 
 x2 . Therefore, T is linear.
 
 
x3
      
0   1   

 3 


0    3 
     
  

    
 0 =
7. T  ,T  1 =  8. ker(T ) = t  −1  t∈R
  
  
0 4
    
   


 

0 2  −3 

   
 1   1 + 3t
   
  1    1 

 
9. yes, for example T   =   for any t ∈ R
 0 = . In fact, T 
 −t


  1   1
0 −3t
 
  0 
 b1 

 
10. T is not one-to-one since ker(T ) 6= {0}. T is onto since for any b =   we have that T 
 =
b2 /3 
b2  
b1
  

0 + b1 b
  1 
=

 
0 + 3(b2 /3) b2
243
 
 (x1 + x2 ) + (y1 + y2 ) − 2(z1 + z2 ) 
11. T ((x1 , y1 , z1 ) + (x2 , y2 , z2 )) = T (x1 + x2 , y1 + y2 , z1 + z2 ) =  =
−(y1 + y2 ) + (z1 + z2 )
     
 1x + x 2 + y 1 + y2 − 2z 1 − 2z2   1x + y1 − 2z 1   x 2 + y 2 − 2z2 
 = +  = T (x1 , y1 , z1 )+T (x2 , y2 , z2 )
−y1 − y2 + z1 + z2 −y1 + z1 −y2 + z2
     
(cx) + (cy) − 2(cz) c(x + y − 2z) x + y − 2z
and T (c(x, y, z)) = T (cx, xy, cz) =  =  = c =
     
−(cx) + (cz) c(−y + z) −y + z
cT (x, y, z). Therefore, T is linear.

♠ 12. T (1, 2, 3) = (−3, 1), T (1, 0, −2) = (5, −2) ♠ 13. ker T = {t(1, 1, 1) | t ∈ R}

♠ 14. yes, T (2, 1, 0) = (3, −1)


♠ 15. T is not one-to-one since kernel is nontrivial. T is onto since im T = R2 . Note that T (b1 + b2 , −b2 , 0) =
(b1 , b2 ).

16. T (1, 3, 2) = (7, 6, −6) 17. yes, T (1, −1, 1) = (1, 6, 2)

18. (Answers may vary). Note that T (2, 0, 0) = (0, 8, 0) 6= (0, 12, 0) = 2(0, 6, 0) = 2T (1, 0, 0). Thus, scalar
multiplication is NOT preserved. Thus, T is not linear.
19. (Answers may vary). Note that T (1, 2, 3)+T (0, 1, −4) = (2, 5)+(2, 5) = (4, 10) 6= (2, 5) = T (1, 3, −1) =
T [(1, 2, 3) + (0, 1, −4)].
20. No, the only vector in the image of T is (2, 5).
21. T is not one-to-one since T (1, 2, 3) = (2, 5) = T (0, 1, −4) but (1, 2, 3) 6= (0, 1, −4). Likewise, (0, 0) ∈
/
im T = {(2, 5)}.
22. T ((x1 , y1 , z1 , w1 ) + (x2 , y2 , z2 , w2 )) = T (x1 + x2 , y1 + y2 , z1 + z2 , w1 + w2 ) = (x1 + x2 ) + (y1 + y2 ) +
(z1 + z2 ) + (w1 + w2 ) = (x1 + y1 + z1 + w1 ) + (x2 + y2 + z2 + w2 ) = T (x1 , y1 , z1 , w1 ) + T (x2 , y2 , z2 , w2 )
and T (c(x, y, z, w)) = T (cx, cy, cz, cw) = (cx) + (cy) + (cz) + (cw) = c(x + y + z + w) = cT (x, y, z, w).
Therefore, T is a linear transformation.
♠ 23. T (1, 0, 0, 1) ≡ 0, T (1, 0, 1, 1) ≡ 1
♠ 24. ker T = {(0, 0, 0, 0), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)}
♠ 25. yes, T (1, 0, 1, 1) ≡ 1
♠ 26. T is not one-to-one since kernel is nontrivial. T is onto since im T = Z2 = {0, 1}.

27. T (1, 2) ≡ 1, T (5, 2) = 0 28. T (1, 0, 1) ≡ (2, 4, 1), T (1, 2, 3) ≡ (3, 2, 0)



 x + 2y + z ≡ 2

29. no, the linear system 2x + 2y + z ≡ 2 (mod 5) is inconsistent.

3x + 4y + 3z ≡ 3

30. T (x + y) = T ((x1 , . . . , xn ) + (y1 , . . . , yn )) = T (x1 + y1 , . . . , xn + yn ) = a1 (x1 + y1 ) + . . . + an (xn + yn ) =


a1 x1 + a1 y1 + . . . + an xn + an yn = (a1 x1 + . . . + an xn ) + (a1 y1 + . . . + an + yn ) = T (x1 , . . . , xn ) +
T (y1 , . . . , yn ) = T (x) + T (y) and T (cx) = T (c(x1 , . . . , xn )) = T (cx1 , . . . , cxn ) = a1 (cx1 ) + . . . +
an (cxn ) = c(a1 x1 ) + . . . + c(an xn ) = c(a1 x1 + . . . + an xn ) = cT (x1 , . . . , xn ) = cT (x). Therefore, T is a
linear transformation.
31. The transformation T is NOT one-to-one if n > 1. To see this, note T (a2 , −a1 , 0, . . . , 0) = a1 a2 −
a2 a1 + 0 + . . . + 0 = 0 = T (0, 0, . . . , 0). If n = 1, then T is one-to-one if and only if a1 6= 0 since
T (x1 ) = T (y1 ) ⇒ a1 x1 = a1 y1 ⇒ x1 = y1 .
244 APPENDIX D. SOLUTIONS TO EXERCISES

32. The transformation T is onto if and only if ai 6= 0 for at least one i. To see this, let c ∈ F . Without
the loss of generality, suppose that a1 6= 0. Then (c/a1 , 0, . . . , 0) ∈ F n and T (c, 0, . . . , 0) = a1 (c/a1 ) +
a2 (0) + . . . + an (0) = c.

1.5 Augmented Matrices


     
 1 0 0   1 2 1 4 −1   0 2 1 3 0 
1.  2.   
3. 
 
0 1 0 0 1 3 5 0  0 0 1 2 0 

 
free variables: none free variables: x3 , x4 0 0 0 4 0
free variables: x1
   
 1 4 2 6 1   1 3 5 0 1 1 
   
 0 3 4 1 2 2 
 0 2 2 1 2 
   
 
4. 


 5. 
 0 0 0 6 1

3 
 0 0 0 2 3  



   0
   0 0 0 2 4 

0 0 0 0 0  
0 0 0 0 0 5
free variables: x3
free variables: x3
     
 1 0 0 1 2 0 −3 0 0 0
7. 
    
    
6. 
 0 1 0 
 0 0 1 5 8. 
 0 0 0 

RREF, rank 2
   
0 0 1 0 0 0
RREF, rank 3 RREF, rank 0
   
 0 0 0  10. 3 −2 4 5
   
 0 0 0  echelon form, rank 1
 1 10 
 
9. 
 
  
 0 0 0  12. 
 0 1 0 
     
 
0 0 0 11. 1 2 3 4 5 6 0 0 1
RREF, rank 0 RREF, rank 1 echelon form, rank 3
     
 0 1 0   1 2 4 −3   1 0 0 0 
  14.    
13. 
 0 0 0 
 0 0 1 5 15. 
 0 0 2 0 

   
0 0 0 echelon form, rank 2 0 0 0 1
RREF, rank 1 echelon form, rank 3
     
 1 6 −5 0  5 6 3  2 4 3 6 
17. 
 
    
16. 
 0 0 0 2 
 0 4 12 18. 
 0 2 6 2 

   
0 0 0 0 echelon form, rank 2 0 0 1 4
echelon form, rank 2 echelon form, rank 3
   
19. NOT echelon form
 1 2 3 4 5   1 2 3 4
5 
   
♠ 20. 
 0 6 7 8 9 
 ♠ 21. 
 0 1 7 8 9 

   
0 0 0 3 4 0 0 0 0 2
echelon form, rank 3 echelon form, rank 2
245
   
♠ 22. NOT echelon form
 1 0 3 0 5   1 0 5  3 4
   
♠ 23. 
 0 1 7 0 9 
 ♠ 24. 
 0 1 7 8 9 

   
0 0 0 1 2 0 0 0 0 2
RREF, rank 3 RREF, rank 2
   
♠ 25. NOT echelon form 1 − 1 2
− 2 1
 5 5 5   1 0 0  2
   
7 
27.  0
 2 1 3 
28. 
 0 1 12 0 
   
26. NOT echelon form 0 0 0 1 0 0 0 1
echelon form, rank 2 RREF, rank 2
   
29. NOT echelon form
 1 0 0 3 5   9 4 2 6 
   
 0 7 0

0 4 
 31. 
 3 5 2 7 

30. 


  
 0 0 0 1 6  3 17 8 24
 
 
0 0 0 0 0
echelon form, rank 3
     
 3 −1 0  1 1 1 1  10 −7 2 −4 10 
♠ 33. 
 
    
♠ 32. 
 0 −2 5 
 −2 −6 1 12  3

4 −3 1 20 

  34. 



1 7 13  1
 0 4 0 73 

 
0 6 −3 2 10
 
x1 − 2x2 − 3x3 = −4 3x − 2y − z = 4
 
3
 12 −2 3 13 

 
2
  36. x2 + 2x3 = 1 ♠ 37. x + 3z = −3
35. 
 1 3 −2 0 9 


−2x1 − 2x3 = 8


0 = 2
 
0 1 17 −20 12
 
 4x − y − 2z + 5w = 12
 


12x + z − 9w = 3
♠ 38. −3x + w = 5 9y + 21z + 25w = −9





x = 0 39. 14x − 7y + 36w = 67


3x + 4y− 7z + 19w = 2





−18x + 2z + 7w = 1

  
2x + 2y
 = 5  x−
 y + 3z = 6 3x −
 y = 0
40. 3x + 3y − 3z = 3 ♠ 41. 2y + 5z = 10 ♠ 42. y = −5/2
  
− 5y + 4z = −1 2x + 7y − z = 0 x + 7y = 13
  
scale Row 2 by 3
( 
7x + y − 2z ≡ 10
 
x+ y+ z = 1 
1 3 2 3
♠ 43.
− 4y + 3z = 14 44. 5x + 5y + 9z ≡ 8 (mod 11) 





6x − z ≡ 3 45.  0 1 7 1 


 
0 0 1 0
246 APPENDIX D. SOLUTIONS TO EXERCISES
     
 0 0 0 0 0   1 2 3 −1 2   1 1 2 3 
     
46. 
 1 2 3 4 5 
 47. 
 0 1 3 5 5 
 48. 
 1 2 2 3 

     
4 4 6 8 17 0 0 0 1 1 9 4 7 2
     
 1 2 36   1 0 3 3   1 1 2 0 3 
     
49. 
 0 −1 −2 −11 
 ♠ 50. 
 3 2 1 4 
 ♠ 51. 
 2 0 0 1 0 

     
0 0 0 16 0 0 0 2 1 0 0 0 0
 
53. Interchange Rows 1 and 2. 54. Interchange Rows 1 and 3,
 1 0 2 2 1 
replace Row 1 with Row1 − Row2,
 
♠ 52. 
 0 1 3 1 2 
 scale Row 1 by 21 .
 
0 1 2 3 4

55. replace Row 2 with 56. replace Row 2 with 57. replace Row 1 with Row1 − 6Row3,
Row2 − 3Row1, Row2 − 3Row1, replace Row 2 with Row2 − 2Row3,
scale Row 2 by 12 replace Row 3 with replace Row 4 with Row4 − 3Row3
Row3 − 2Row1,
replace Row 4 with Row4 + 2Row1

1.6 Reduction of Linear Systems


   
replace Row 2 with Row 2 + 4Row 1
 1 2   1 0  1
1.  ∼ , scale Row 2 by ,
0 13 0 1 13
replace Row 1 with Row 1 − 2Row 2
   
replace Row 2 with Row 2 − 2Row 1,
 1 2 3   1 0 0 
replace Row 3 with Row 3 − 2Row 1,
   
♠ 2. 
 0 1 ∼ 0
3   1 ,
0  scale Row 2 by −1, scale Row 3 by −1
    replace Row 2 with Row 2 − 3Row 3,
0 0 1 0 0 1 replace Row 1 with Row 1 − 3Row 3,
replace Row 1 with Row 1 − 2Row 2
   
−2 − 35  replace Row 2 with Row 2 − 2Row 1,
 1 2 2 1   1 0
replace Row 3 with Row 3 − Row 1,
   
♠ 3. 
 0 -3 −6 ∼ 0
−4   1 2 4 , replace Row 3 with Row 3 − Row 2,
3 
    scale Row 2 by − 31 ,
0 0 0 0 0 0 0 0 replace Row 1 with Row 1 − 2Row 2
   
−5   1 replace Row 4 with Row 4 − 20Row 2,
 1 5 6 0 0 0  42
    replace Row 4 with Row 4 + Row 3
 0

1 3 3   0
 
1 0 0 
 25
♠ 4.  ∼ , 25
scale Row 4 by ,
712
   
 0 0 25 61   0 0 1 0 


  

 replace Row 1 with Row 1 + 5Row 4,
0 0 0 712
0 0 0 1 replace Row 2 with Row 2 − 3Row 4,
25
replace Row 3 with Row 3 − 61Row 4,
replace Row 2 with Row 2 − 3Row 1, 1
scale Row 3 by ,
replace Row 4 with Row 4 + 3Row 1, 25
interchange Rows 2 and 3, replace Row 1 with Row 1 − 6Row 3,
replace Row 3 with Row 3 + 15Row 2, replace Row 2 with Row 2 − 3Row 3,
replace Row 1 with Row 1 − 5Row 2
247
    1
scale Row 2 by ,
 1 2 0 9   1 0 0 11 
2
    1
5. 
 0 2 4 ∼ 0
6  1 0 ,
−1  scale Row 3 by ,

5
replace Row 2 with Row 2 − 2Row 3
   
0 0 5 10 0 0 1 2 replace Row 1 with Row 1 − 2Row 2
   
interchange Rows 2 and 3,
 8 16 0 24   1 0 0 9  1
    scale Row 1 by ,
6. 
 0 9 18 ∼ 0
9   1 0 ,
−3  8
1
    scale Row 2 by ,
0 0 4 8 0 0 1 2 9
1
scale Row 3 by ,
4
replace Row 2 with Row 2 − 2Row 3
replace Row 1 with Row 1 − 2Row 2
   
1
 3 0 6 9 18   1 0 2 0 9  scale Row 1 by ,
3

7. 
  
1
 0 2 4 0 ∼ 0
8  1 2 ,
0 −3  scale Row 2 by ,

    2
0 0 0 3 6 0 0 0 1 2 1
scale Row 3 by ,
3
replace Row 1 with Row 1 − 3Row 3

1
   
 1 2 + 5i   1 0  scale Row 1 by ,
♠ 8.  ∼ , 1+i
replace Row 2 with Row 2 − (2 − i)Row 1
0 −6 − 6i 0 1 1
scale Row 2 by ,
−6 − 6i
replace Row 1 with Row 1 − (2 + 5i)Row 2

1
   
 1 2i   1 0  scale Row 1 by ,
9.  ∼ , 6 − 2i
replace Row 2 with Row 2 − (−5 + i)Row 1
0 4 + 9i 0 1 1
scale Row 2 by ,
4 + 9i
replace Row 1 with Row 1 − (2i)Row 2

1
   
 1 12   1 0  scale Row 1 by ≡ 9 (mod 17),
♠ 10.  ∼  (mod 17), 2
replace Row 2 with Row 2 − 5Row 1,
0 2 0 1 1
scale Row 2 by ≡ 9 (mod 17),
2
replace Row 1 with Row 1 − 12Row 2
   
interchange Rows 1 and 2,
 2 1 2   1 0 0 
replace Row 3 with Row 3 − 3Row 1,
2
   
♠ 11. 
 0 5 ∼ 0
6   1  (mod 7),
0  replace Row 3 with Row 3 − Row 2 ≡
    5
0 0 2 0 0 1 Row 3 − 6Row 2 ≡ Row 3 + Row 2
1
scale Row 3 by ≡ 4 (mod 7),
2
replace Row 2 with Row 2 − 6Row 3,
replace Row 1 with Row 1 − 2Row 3,
1
scale Row 2 by ≡ 3 (mod 7),
5
replace Row 1 with Row 1 − Row 2,
1
scale Row 1 by ≡ 4 (mod 7)
2
248 APPENDIX D. SOLUTIONS TO EXERCISES
   
replace Row 2 with Row 2 − 2Row 1,
 1 3 6   1 0 0 
scale Row 2 by −1 ≡ 6 (mod 7),
   
12. 
 0 1 ∼ 0
5  1  (mod 7),
0  replace Row 3 with Row 3 − 5Row 2,

    scale Row 3 by 51 ≡ 3 (mod 7),
0 0 5 0 0 1 replace Row 1 with Row 1 − 6Row 3,
replace Row 2 with Row 2 − 5Row 3,
interchange Rows 1 and 2, replace Row 1 with Row 1 − 3Row 2

   
replace Row 2 with Row 2 − 1Row 1,
 1 0 1   1
0 0 0 1 
    replace Row 3 with Row 3 − 3Row 1,
 0 1 0 6   0 1 0 6  replace Row 4 with Row 4 − 1Row 4,
   
13.  ∼  (mod 7),

 0
   scale Row 2 by 21 ≡ 4 (mod 7),
0 5 4   0
 0 1 5 

  

 replace Row 3 with Row 3 + Row 2,
0 0 0 0 0 0 0 0 replace Row 4 with Row 4 − 4Row 2,
interchange Rows 3 and 4,
scale Row 3 by 51 ≡ 3 (mod 7)
       
 1 −2 3 1   1 −1 3 1   0 2 6 4   1 3 2 5 
       
14. 
 2 −2 ∼ 0
5 6  ,
5 −10 0  15. 
 1 3 2 ∼ 0 2
5  6 ,
4 
 
       
2 3 −4 2 0 0 −1 4 3 3 3 3 0 0 15 0
x = (5, −8, −4) x = (−1, 2, 0)
       

 3 2 1 1   3 2 1 1   1 3 2 0   1 3 2 0 
       
 0 3 4 0   0 3 4 0 
 3 8 3 5 ∼ 0
   
16.    3 1 ,
2 
    17. 

∼
 
 (mod 5)

 0 3 4 0   0 0 3 3 
0 3 1 6 0 0 0 4 

 
 


inconsistent 0 0 3 3 0 0 0 0
x ≡ (2, 2, 1)
   
 1 3 5 4 6 0   1 3 5 4 6 0 
   
 2 4 4 3 1 5   0 5 1 2 3 5 
   
18. 

∼
 
 (mod 7)

 0 0 0 1 0 5   0 0 3 5 3 4 
   
   
6 2 6 3 0 2 0 0 0 1 0 5
x ≡ (4 + 3t, 6 + t, 6t, 5, t)
       
2 4 0 1 0 1  1 2 2   1 0 6 
19.  ∼  (mod 5),
   
   
1 3 2 0 1 2 20. 
 0 ∼ 0 1
−4 8  ,
−2 

   
x ≡ (1, 2) 2 2 6 0 0 −2
inconsistent
       
 1 1 8   1 0 6   1 −1 0
6   1 0 0 8 
♠ 21.  ∼ ,    
1 −1 4 0 1 2 ♠ 22. 
 2 ∼ 0
0 −3 16   1 0 ,
2 
   
x = (6, 2) 0 2 1 4 0 0 1 0
x = (8, 2, 0)
249
       
 0 3 6 3   1 0 0 0   0 3 −2 0   1 0 0 1 
       
23. 
 1 2 ∼ 0 1 0
−3 2  ,
1  24. 
 1 1 ∼ 0 1
3 12  0 ,
2 
 
       
4 0 −3 0 0 0 1 0 6 2 1 13 0 0 1 3
x = (0, 1, 0) x = (1, 2, 3)
       
 1 4 7 3   1 0 −1 −1   2 −2 −2 2   1 0 −2/5 1 
       
25. 
 2 5 8 ∼ 0
3   1 2 ,
1  ♠ 26. 
 2 3 ∼ 0 1
1 2   3/5 ,
0 
       
3 6 9 3 0 0 0 0 3 2 0 0 0 0 0 −15
x = (t − 1, −2t + 1, t) inconsistent
       
 7 8 9 10   1 0 −1 −10   2 1 −1
−10   1 0 0 3 
       
27. 
 4 5 6 ∼ 0
10  1 2 ,
10  28. 
 1 −3 −2 −4 ∼ 0 1 0 ,
−5 
 
       
1 2 3 10 0 0 0 0 3 1 1 15 0 0 1 11
x = (10 + t, 10 − 2t, t) x = (3, −5, 11)

       
 2 1 4 199   1 0 0 42   5 2 6 11   1 0 0 −1/8 
       
29.  2 10 15 ∼ 0 1 0
984  ,
75   0 1 2 5   0 1 0 13/4 
    

    30. 
 ∼ ,
  
−1 0 10 58 0 0 1 10  3 2 1
 7 
  0 0 1
 7/8 

x = (42, 75, 10)
   
2 1 4 6 0 0 0 −3/2
inconsistent
       
 2 4 −2 10   1 0 0 4   1 1 1 1 1   1 0 0 0 3 
       
31. 
 16 −23 ∼ 0 1 0
5 33  ,
2   0 3 1 0 1   0 1 0 0 1 
    
    32.  ∼ 
   
4 2 −6 2 0 0 1 3  1 1 2 0 0

  0
  0 1 0 −2 

x = (4, 2, 3)
   
2 0 1 4 0 0 0 0 1 −1
x = (3, 1, −2, −1)

Chapter 2 : The Algebra and Geometry of Vectors


2.1 Vector Equations
        " # " # " # " #
1 3 0 7 3 7 −5 −5
        ♠ 2. x1 + x2 + x3 =
1. x1 
 7  + x2  6  + x3  −1  =  8 
       −1 −2 6 4

3 1 2 3
               
−2 6 −11 6 1 2 2 1
♠ 3. x1   + x2   + x3  =
               
1 −2 5 −3  4. x1  3  + x2  1  + x3  0  =  4 
       
3 −19 31 −17
5 2 1 3
250 APPENDIX D. SOLUTIONS TO EXERCISES

♠ 5. yes, b = 3a1 + 2a2 ♠ 6. no, the corresponding linear ♠ 7. yes, b = 3a1 + 4a2
system is inconsistent

8. yes, b = 3a1 + a2 9. yes, a1 + 2a2 + 3a3 + 4a4 10. no

♠ 11. yes, b = −a1 + a2 + a3 , for example. There are, in fact, infinitely many possible linear combinations.
A dependency relation occurs whenever x1 = 2 − x3 and x2 = 3 − 2x3 .
12. yes, b = a1 + 4a3 , for example.
♠ 13. Hint: Although we know nothing about the vectors v 1 , . . . , v n , we can still control the scalars xi ∈ F
such that
x1 v 1 + x2 v 2 + . . . + xn v n = 0.
What should we set xi equal to?
14. Hint: To show that Span{u1 , u2 } ⊆ Span{v 1 , v 2 , v 3 } take an arbitrary vector in Span{u1 , u2 }, namely
a1 u1 + a2 u2 for scalars a1 , a2 ∈ F , and argue why it is in Span{v 1 , v 2 , v 3 }, that is, a1 u1 + a2 u2 =
b1 v 1 + b2 v 2 + b3 v 3 for some scalars b1 , b2 , b3 ∈ F . By assumption we know that u1 = c1 v 1 + c2 v 2 + c3 v 3
and u2 = d1 v 1 + d2 v 2 + d3 v 3 .

15. Hint: Show that Span{v 1 , v 2 } ⊆ Span{v 1 , v 2 , v 3 } and that Span{v 1 , v 2 } ⊇ Span{v 1 , v 2 , v 3 }. That
is, take an arbitrary vector in one set and argue why it is in the other set. To show Span{v 1 , v 2 } ⊆
Span{v 1 , v 2 , v 3 }, set one of the coefficients to zero. To show Span{v 1 , v 2 } ⊇ Span{v 1 , v 2 , v 3 } then
suppose v 3 = av 1 + bv 2 and combine like-terms. Or, if you already did Exercise 14, you could apply
that exercise twice for a very fast argument.

16. Hint: Use Exercise 15 to show that Span{v 1 , v 2 } = Span{v 1 , v 2 , v 1 + v 2 } and Span{v 1 , v 1 + v 2 } =
Span{v 1 , v 1 + v 2 , v 2 }.

17. Hint: Generalize Exercise 14. 18. Hint: Generalize Exercise 15. 19. Hint: Generalize Exercise 16.

20. Hint: Use Exercise 19.

2.2 Matrix Equations


1. False. A homogeneous linear system can be written as Ax = 0, where A is the m × n coefficient matrix,
x ∈ F n , and 0 ∈ F m is the zero vector.
       
30 11 10 + 10i 0
♠ 4. 
       
      
2.  5 
  ♠ 3.  6 
  −2 − 11i ♠ 5.  1 


     
9 −2 0

≡1

4x1 + x2 + 2x3
 

x + 2y = 2
♠ 6. 4x2 + 3x3 + x4 ≡ 0 (mod 5)

 3x + 4y = 0
 ♠ 7.
2x1 + 2x2 + 4x3 ≡2 −x − 2y = 2



= −9

4x + 5y

27y + 17z − 32w = 12



21x +
   

 3 −7
42y + 6z + 19w = −6

 4x +    
8.
   
 103x − 72y + 17z − 8w = 15 9. 
 2 
 ♠ 10. 
 9 


    
11y − 10z + 13w =

2x + 1 1 2
251
         
♠ 11. inconsistent
   4   0   1   0 
 
 0 
 
     
1 + i ♠ 13. 
 2 

 1  + t 1 
14.      2 
 
 0 
 



       15.   + t 
 
 

12.  3 

 4 0 1  1 
 
 1 
 
     
1−i 0 1

 
1 ♠ 17. T (1, 0, 0, 1) = (5, 1), ♠ 18. x = (2/3, 1/3, 0, 0)
 
  T (1, 2, 3, 4) = (12, 10) (Answers may vary)
 0 
 
16. 
 

 1 
 
 
0

19. The solution would also be (4, 3, 2, 1) because the system of linear equations and the matrix equation
represent the same linear system and hence have the same solution set.

2.3 Linear Independence


1. False. If a set S ⊆ F n contains the zero vector, then S is linearly dependent.

2. True. 3. True. 4. True. 5. True. 6. True.

♠ 7. linearly dependent; ♠ 8. linearly independent; 9. linearly dependent;


the set contains 0. the second vector is not a the second vector is a scalar
scalar multiple of the first. multiple of the first.

10. linearly dependent; ♠ 11. linearly independent; ♠ 12. linearly dependent;


there are more vectors than it is a single nonzero vector. there are more vectors than
components (underdeter- components.
mined system).

13. linearly dependent; 14. linearly independent


the second vector is the zero vector.

♠ 15. linearly dependent; (Answers may vary) ♠ 16. linearly independent


       
 1   2   −2   0 
       
2  −  −2  + 0  3  =  0
 −1

      
       
3 6 −2 0

17. linearly independent ♠ 18. linearly independent ♠ 19. linearly independent


         
 1   2   6   0   0 
         
 2   6  5   5   0 
         

♠ 20. linearly dependent; (Answers may vary) 2 

+
 
 + 4
  +
 
≡
 
 (mod 7)

 0   6   1   4   0 
         
         
1 4 1 4 0

2.4 Affine Geometry


252 APPENDIX D. SOLUTIONS TO EXERCISES

♠ 1. F 2 ♠ 2. F ♠ 3. F 8 ♠ 4. F 3 ♠ 5. F 5 ♠ 6. F 0
         
 5   4   −3   1   4 
         
 4  + t 6 ,
7. x =     8. x =  6 
 + a  6 
  + b 
 ,
−2 
         
6 9 1 5 3
 
x1 = 5 + 4t
 x1 = −3 + a + 4b

x2 = 4 + 6t x2 = 6 + 6a − 2b
 
x3 = 6 + 9t x3 = 1 + 5a + 3b
 
           
 1   −1   −4   1   1   2 
           
9. x =  2  + s  1  + t  3 
    
,  2 
 
 1 
 
 4
 

      ♠ 10. x = 
 + s
 
 + t
  ,

3 −1 −2  3 
 
 3 
 
 2




x1 = 1 − s − 4t
     
 4 3 0
x2 = 2 + s + 3t 
 x1 = 1 + s + 2t
x3 = 3 − s − 2t
 


x = 2 + s + 4t
2

 x3 = 3 + 3s + 2t


x4 = 4 + 3s
           
 0   11   −2   9  1 1
           
   
1 −1 −4 8 ♠ 13. x ≡  2  + t  1 
       
 (mod 5),
          
♠ 11. x = 

+r
 
 + s
 
 + t
 
,
    
 4   −4   −4   −4 















 3 0
−1

3 2 0 x ≡ 1 + t
 1

11r − 2s + 9t

x1 = x2 ≡ 2 + t (mod 5)

1− r − 4s + 8t

x =

x3 ≡ 3
2


x 3 = 4 − 4r − 4s − 4t     

= −1 + 3r + 2s

x4 2 1 0 
    
     
    ♠ 14. x ≡  3  + s  2  + t 
   
  (mod 7),
0 
i 2 − i      
♠ 12. x =   + t , 1 3 5
   
1 + 2i 5 
( x1 ≡ 2 + s

x1 = i + (2 − i)t x2 ≡ 3 + 2s (mod 7)

x2 = 1 + 2i+ 5t 
x3 ≡ 1 + 3s + 5t
       
−7 17
 1   2  







   
 −7   12 
   
 −2   −2 
   
15. x =  ♠ 16. x =   + t ,
 + t
   ,    
  4   −3 
 4   5     
       
    0 1
0 1
= −7 + 17t

 x1
x1 = 1 + 2t


= −7 + 12t
 
x

x = −2 − 2t
 2
2
x3 = 4 − 3t
x3 = 4 + 5t


 


 x4 = t
x4 = t
253
     
 −3   1 
   17 
   
17. x = 
 2 
  2 
  19. 
 22 

  ♠ 18. x = 
 
  
−6  3 
  33
 
4

2.5 Subspaces
♠ 1. It is not a subspace because (0, 0) ∈
/ W . Likewise, it is not closed under addition, e.g. (1, 0), (0, 1) ∈ W
but (1, 0) + (0, 1) = (1, 1) ∈
/ W . It is likewise not closed under scalar multiples, e.g. (1, 0) ∈ W but
2(1, 0) = (2, 0) ∈
/ W.
♠ 2. It is not a subspace because it is not closed under addition, e.g. (1, 0), (0, 1) ∈ W but (1, 0) + (0, 1) =
(1, 1) ∈
/ W . Of course, there are some instances where the sum is contained, e.g. (0.1, 0.2)+(−0.1, 0.3) =
(0, 0.5) ∈ W , but closure under addition means that every possible sum is contained not just some
sums. It is likewise not closed under scalar multiples, e.g. (1, 0) ∈ W but 2(1, 0) = (2, 0) ∈
/ W . It does
contain the zero vector.
♠ 3. It is not a subspace because it is not closed under addition. Note that (1, 1), (2, 4) ∈ W but (1, 1) +
(2, 4) = (3, 5) ∈
/ W . It likewise fails closure under scalar multiples since (1, 1) ∈ W but 2(1, 1) =
(2, 2) ∈
/ W . On the other hand, it does contain the zero vector.
♠ 4. It is not a subspace because the sum of two points in W need not belong to W , for example, (1, 0) +
(0, 1) = (1, 1) ∈
/ W . On the other hand, (0, 0) ∈ W and W is closed under scalar multiples.
♠ 5. It is not a subspace because it is not closed under scalar multiples, e.g. (1, 1) ∈ W but (−1)(1, 1) =
(−1, −1) ∈/ W . Is is closed under addition and contains the zero vector.
6. It is not a subspace because (0, 0) ∈
/ W.

♠ 7. It is not a subspace because the sum of two such functions will have a y-intercept of 2, not 1; so, the
set is not closed under addition. Also, it is not closed under scalar multiples nor contains the zero
function.
♠ 8. It is a subspace, if f (0) = 0 and g(0) = 0 then (f + g)(0) = f (0) + g(0) = 0 + 0 = 0 and (cf )(0) =
c[f (0)] = c0 = 0. Of course, the zero function is a member.

♠ 9. It is not a subspace, because if the end behavior of f is ∞ then the end behavior of −f will be ∞; so,
the set is not closed under scalar multiples. It likewise does not contain the zero function. It is closed
under addition though.
♠ 10. It is a subspace, because if f (−x) = −f (x) and g(−x) = −g(x) then (f + g)(−x) = f (−x) + g(−x) =
−f (x) − g(x) = −[f (x) + g(x)] = −(f + g)(x) and (cf )(−x) = c[f (−x)] = c[−f (x)] = −[cf (x)] =
−(cf )(x). Of course, the zero function is a member.

2.6 Solution Sets of Linear Systems


       
1. Yes; x3
 −1  2  2   5 

 
   
 

 
  

 
 
 
 
 


 
   
   

 0  ♠ 3. Span  1   −2   −4 
       
2. Span 
 



 

 4. Span 
  ,
 
 



 2  
 0   
 1   0  

 




   



 
 
   

0  0 3 

  
 

           
 −2  2   −5  1 −3 −5
  
  
 

   
♠ 5. Span 
  
         
 
    




      



 1 
 




 1   0 
  





 −2 


 4 


 0 




   
 
     

♠ 6. Span  , 3  ♠ 7. Span  1  ,  0  ,  0 
         
  0 
         

    
 
      

  
 0   1   0 



  0   −4 
  

 
      




   




     


254 APPENDIX D. SOLUTIONS TO EXERCISES
               
 −2 1 2 5  3  −4
 1  2 
 


      
    

      
       
  5   0   5  0  1   0 ♠ 10. x = 
           
 + t 3 
 2
   
9. x = 

        + s  + t
     
    
8. Span  1  ,  0  ,  0 
           
0  0   1 
      
0 1

           

      
      
 0   1   0 
  −8 0 0

      


      


 
 0 0 1 
       
 1   0   1 + i   0 
       
 2 
 
 0 
  ♠ 12. x = 
 3 + 4i 
 + t  −2 
 
♠ 11. x =   + t  
  
    
 3   0  0 1
   
   
0 1
       
 0   1   1   1 
       
1  1  0  1 
       
   
       
       
 1   1   0   0 
♠ 13. x ≡ 
  + a
 
 + b
 
 + c
  

0  1  0  0 
       
   
       
       

 0 


 0 


 1 


 0 

       
0 0 0 1
   
 6   5 
   
♠ 14. We saw in Example 2.6.6 that x0 = 
 2  and x1 =  2  are two solution to the linear system, but
  
   
0 1
 
 4 
 
x = x0 + x1 ≡ 
 4  (mod 7), which is not a solution x since

 
1
        
 3 5 3   4   3(4) + 5(4) + 3(1)   0   0 
        
Ax =   ≡  5  6≡  6 .
 4 5   4  =  4(4) + 5(4) + 4(1)
4    
    
        
6 1 6 1 6(4) + 1(4) + 6(1) 6 3

   
6
   5 
   
♠ 15. We saw in Example 2.6.6 that x0 = 
 2 . Then 2x0 ≡  4
   (mod 7). But then

   
0 0
        
 3 5 3   5   3(5) + 5(4) + 3(0)   0   0 
        
Ax =   ≡  5  6≡  6  .
 4 5   4  =  4(5) + 5(4) + 4(0)
4    
    
        
6 1 6 0 6(5) + 1(4) + 6(0) 6 3

♠ 16. Hint: Suppose x0 and x1 are solutions to the linear system Ax = b. Multiply the equation x =
(1 − t)x0 + tx1 by the matrix A, using also the fact that x0 and x1 are solutions to the linear system.
Recall that multiplication by a matrix distributes across vector addition and commutes with scalar
multiplication. It is a linear transformation, after all.
255

17. Hint: Like the hint to the previous exercise, suppose x = a0 x0 + a1 x1 + . . . + am xm such that
x0 , x1 , . . . , xm are solutions to Ax = b and a0 + a1 . . . + am = 1. Then simplify

A(a0 x0 + a1 x1 + . . . + am xm ).

2.7 Bases
1. False. If A is an m × n matrix, then rank(A) + nullity(A) = n.
       



 1 21 




 0   1 



 
 



   

2. W = Span  2  ,  16 
    3. W = Span  0  ,  0 
   

    
 
     

 
 
 

 7 43   1 1 

4. rank A = 3, nullity
 A  = 0,    5. rank A = 3, nullity
 A  = 0,   
 1
  15   8 
 1
  1   2 
 
  


 
      
 
     


Col(A) = Span   0   9   6 ,
, , Col(A) = Span   2   3   5 ,
, ,
         

       
      

 
 
 

 0 0 2   4 2 4 
Nul(A) = Span {} Nul(A) = Span {}

6. rank A = 3, nullity
 A  = 1,
    7. rank A = 3, nullity
 A  = 1,
   



 1   1 1 




 5   4 2 


   
 
 

 

  
 
 


Col(A) = Span  , 4  , , 2  ,
 2 , 3 Col(A) = Span   4 , 4
   
       

      
 
      


 3   
1 3   2 5 6 
   
 3  7 

  
 
 
 
 


   
  
   
 2 
     −13 
   
Nul(A) = Span  

 Nul(A) = Span  




  −5   
  8 



   
  

  



  

4  1 

  
 

♠ 8. rank A = 2, nullity
 A = 3,   ♠ 9. rank A = 2, nullity
 A = 2,  
 8   −3   −1 0

 
  

 
Col(A) = Span  ,  , 
 
 
 


 −2

1 
 Col(A) = Span  7  ,  −8  ,
   
      
    


 

−3 −3 2 
2   −4

 
 

    
    

      

  1   −3   −5 
 2   1 

  
 


      
 
 
     
 

    


Nul(A) = Span  1  ,  0  ,  0
       1   0 
    







 
  
 


 Nul(A) = Span  
, 
  
 
0 1   0  0   1 

      

      
 
     

      

 

    


 0 
0 1  0 0 
  

   
10. rank A = 2, nullity
 A =2,  
 3 17



    

 1  
   2 
 
    
 
Col(A) = Span   ,   ,  −4   −13 
    
 3

4 
 Nul(A) = Span 
,
 


 1   0 

    
 

    

 
0 1

 

256 APPENDIX D. SOLUTIONS TO EXERCISES
   
11. rank A = 4, nullity
 A =2,
 1   23 
     
 


 
 
 6   5   4   2 

      
   
−1 −50

  
    


        
 
    

     
 5   4   3   1 

         
    
Col(A) = Span   −1   18 
 ,
  , ,  ,  
Nul(A) = Span  ,
        
 2   4 
  6   2 

     

 1   0 

     
    

        

 

   



 0

1 0 2 
 
    

0   19 

    

  


    

 
0 2 

 

12. rank A = 2, nullity


 A =3,  13. rank A = 2, nullity
 = 0,   
 1   2   1+i 4−i

 
 
 


  
   


    
 
    

   
 2   4   2 + 4i 4 − 2i

    
   
  
Col(A) = Span  ,
  
 ,
 Col(A) = Span  
,
 
 ,



  3   6  
  6 − 10i   3+i 

    
 
    


    




    


 2 −2   −8 − 2i 12 + 10i
   

Nul(A) = Span{}
     



  7   −8   13 




      

 
 1   −2   1 

      
 
     

Nul(A) = Span  3  ,  0  ,  0 
     

     

      

 0   3   0 
 


      

     



 0 
0 3 

♠ 14. rank A = 3, nullity


 A = 3,     ♠15. rank A = 3, nullity
 A =3,   



 1 + 2i   −1   0  



 1   0   0 


      

 

    


Col(A) = Span   2 + 4i  ,  1 + 4i  ,  −2 , Col(A) = Span  0  , 
       
,
1   0  ,


            

 
 
 

 3−i 2 i   0 0 1 
           
i −1 −1 − i 1 1   1

 
 
 


      
    


   
     
 
      

1 0 0 1 1   0

       
 
      

       
 
    



      
 
      
              

 0   −i   −2  1   0  1

      
 
     

Nul(A) = Span  , , Nul(A) = Span  , ,
   
 
0 2 −1 − 2i 1 0   0

       
      

            


      



      

              



  0   1   0 



  0   1   0 


              


      



      


 0

0 1

  0

0 1

     
16. rank A = 3, nullity
 A =3,   

 1   1   0



  

 
 1
 0 0 
 
      

0   1   1

        
  
  
 
 




  


Col(A) = Span  0  ,  1  ,  1  ,
      




 
 
 
 



1  0  0
  
      
    
Nul(A) = Span  , ,

 
      
 1 0 1  
1   0   0

      


  

      


      
0  1  0

    


      


      

 
0 0 1

 

257
   
♠ 17. rank A = 2, nullity
 A =2,
 2   3 
 
 


 
 
 2   1 

      
   


   


  1   0 
    


  4  
  
0 
 Nul(A) = Span   , 
  

Col(A) = Span   0   3 
 ,
   , 

 
 
     


  2   3  
     


    
  0

1 


    


 0

1 

18. rank A = 3, nullity


 A =1,    19. rank A = 3, nullity
 A =3,   



 6   2 0 




 1   2 6 


   
 
 



   
 
  

Col(A) = Span   1 , 5
  , 6  ,
   Col(A) = Span   1 , 3
  , 0  ,
  

       
      

 
 
 

 3 6 2   1 4 2 
       
 0   12   250   13

 
 
 


  
 


 
 
      

   
 −1 
    −4  

   
−86   −5
  


Nul(A) = Span   
      

  
      

 −2   −25 
  −1


  2    


Nul(A) = Span  , ,

  
    

  
 
0  1 0   0

  
      

    


      

      

0  8   0

    


    


      

 
0 0 1

 

20. Hint: If v is a strongly positive vector but u is not, is there a positive scalar c ∈ R that guarantees
that u + cv is also strongly positive? How does the span of u and u + cv compare to the span of u
and v?

2.8 Coordinates
           
−5 2  i 1 −4 4
♠ 1.  ♠ 3. 
          
         
3 ♠ 2.  3  −2 ♠ 4. 
 9 
  9 
  ♠ 6. 
 8 

♠ 5. 
   
      
−2 11  32 
  7
 
5
       
 23 + 13i   3 2 1   3 2 3 7   1 −2 6 −2 
       
7. 
 10 + 6i
 ♠ 8. 
 −1 1 1   10 4

0 −1 

 −17 −7 −8

10 

♠ 9. 
   10.  
       
−5 − 13i 0 −2 −4  −5 6
 −4 −2 

 69
 47 26 −20 

   
0 8 5 7 −17 −11 −8 6
     
 1+i 3  0 3 1   0 3 2 
♠ 11. 

    
i 2 + 2i 12. 
 3 1 4 
 13. 
 4 4 0 

   
2 4 2 2 4 1
258 APPENDIX D. SOLUTIONS TO EXERCISES

Chapter 3 : The Algebra and Geometry of Matrices


3.1 Matrix Operations
1. True
2. False. If A is a square matrix, then the trace of A is the sum of the diagonal entries of A.

3. True 4. True 5. True 6. True 7.  


 0 0 1 
 
 0 0 0 
 
 
0 0 0

8. There are more columns in 9. There are fewer columns in 10. C is not square.
C then rows in A. AB then rows in D.
     
 2 −1 8   4 −6  −7  14 23 21 −16 
     
♠ 11. 
 13 −11 0 
 ♠ 12.  31
 3 −6 
 ♠ 13.  9 −21 −28
 17 

     
8 −2 −5 −2 −8 5 −2 −17 −19 12
     
 1 4 1   3 1 3   2 −5 1  0
     
♠ 14. 
 2 −3 −2   0

7 3 
 ♠ 16.  3 0 −2 2 
♠ 15. 
   
     
3 0 −1  −1
 8 2 
 −2 7 4 3
 
2 −3 −4
   
♠ 17. −3 3 −29 10  9 −6 6 
 
♠ 18. −8
   
 42

39 −31 
 ♠ 20.  0 6 12 
♠ 19.    
   
 14
 18 −12 
 −6 6 −3
 
28 −9 −7
       
 3 3   1
22. 
7 
  −7 −2   19 −14 
24.  
   
21. 
 2 8 
 26 27 23. 
 2 −4 
 29 −54
   
7 14 6 −8
     
 −1 + 3i −5 + 10i  3 + 6i 3 − 9i   1 − 2i 5
♠ 25.  ♠ 27. 
 
   
5 − 4i −1 − 3i ♠ 26. 
 −2 + 11i 5 + 3i 
 1 + 3i 1+i
 
6 + 22i 10 − 6i
   
−i 3−i  ♠ 30. 2 + i
 3 1 5
♠ 28. 
 
♠ 31. 1 + i
  
0 −2i 1 − 3i 0 4−i 
 

♠ 29. 




 −i 0 
 
1 + 2i −4
259
     
 15 + 20i 5 − 8i   3 − 11i 5 − 5i   2 1 3 2 
  ♠ 33.    
 16 + 13i 3 − 5i  10 + 5i 6 − 17i  4 3 0 1 
   
♠ 32. 


 ♠ 34. 




 0 −3  
 4
 3 0 1 

   
−12 − 16i 2+i 1 4 4 3
     
 0 2 3 0   1

0 1 4 

 3

1 3 

 
♠ 35. 
 2 0 4 2   2

2 2 4 

 0

2 3 

♠ 36.  ♠ 37. 
  
     
4 0 2 0  3
 2 2 3 

 4
 3 2 

   
4 3 1 2 2 2 1
   
♠ 39. 2
 2 0 0 1   4 0 1 1 

♠ 38. 
 ♠ 40. 2 
♠ 41. 

 3 0 3 2 
  1 0 0 1 

   
3 2 4 3 0 4 3 0
     
 0 0 3 0   8 0
3   1/8 9/8 1 1/4 
     
 4

1 4 2 
 43. 7, 
 −7 ,7
−2 2   −7/8

−1/2 −12/15 8/15 

♠ 42. 


   44. 2, 

,2

 3
 1 0 0 
 2 4 1  2/5
 1/4 2 −3/8 

   
4 4 0 1 −12/13 1/4 1/5 3/8
   
 5 3 5 2   1 + 3i 2 + 4i −1 − 2i 
   
 1 0 2 3  46. 3 + 9i, 
 −5 + 5i −8i 3 + 6i  , 3 − 9i
 
45. 6, 

 (mod 7), 6
  
 6
 3 1 0 
 2 − 2i 6−i 2 − 4i
 
2 4 1 0

3.2 Matrix Properties


1. For all A, B, C ∈ F m×n and all scalars c, d ∈ F , the following eight properties hold:

(i) A + B = B + A (v) c(A + B) = cA + cB


(ii) (A + B) + C = A + (B + C)
(iii) There exists a matrix 0 ∈ F m×n , such (vi) (c + d)A = cA + dA
that A + 0 = 0 + A = A
(iv) For each A, there exists a matrix −A ∈ (vii) c(dA) = (cd)A
F m×n , such that A + (−A) = (−A) + A =
0 (viii) 1A = A
   
 5 12 21   14 19 5 
   
2. AB = 
 1 8  6=  17
22  9  = BA
8 

   
8 15 26 35 16 16
260 APPENDIX D. SOLUTIONS TO EXERCISES

♠ 3. Row(A) = Span {[1 0 − 2 3 4], [0 1 − 1 3 5]}, ♠ 4. Row(A) = Span {[1 − 2 − 1 0], [0 0 0 1]},
corank(A) = 2 corank(A) = 2

♠ 5. Row(A) = Span {[1 0 0 1 1 1], [0 1 0 1 1 0], [0 0 1 1 0 1]}, corank(A) = 3

♠ 6. Row(A) = Span {[1 3 0 2], [0 0 1 2]}, corank(A) = 2


♠ 7. Row(A) = Span {[1 i 0 0 1 1 − i], [0 0 1 0 − i 2], [0 0 0 1 − 2 1 − 2i]}, corank(A) = 3
8. rank(A) = corank(A)
 = 3,
    
 1

 1 7 


      


     


 1   2   10 

      

 
     

Col(A) = Span  1  ,  −1  ,  1  ,
     

     

      

 1   −3   5 
 


      

     


 
 1 −2 0 
Row(A) = Span{(1, 1, −3, 7, 9, −9), (0, 1, −4, 3, 4, −3), (0, 0, 0, 1, −1, 2)}

9. Hint: Is matrix multiplication commutative? The multiplication of a field should be commutative.


Can you find an example of two 2 × 2 matrices that do not commute? I hope so, because the section
provides some.
10. Hint: If n = 1, then F 1×1 would be the set of 1 × 1 matrices, which are really just scalars. Hence,
F 1×1 is really just F itself, which is a field by assumption. If n > 1, then consider the following two
matrices:    
 1 0 0 0 ···   0 1 0 0 ··· 
   

 0 0 0 0 ···  
  0 0 0 0 ··· 
   
···  , ···  .
   

 0 0 0 0   0 0 0 0 
   

 0 0 0 0 ···  
  0 0 0 0 ··· 
 .. .. .. .. ..
  .. .. .. .. ..

. . . . . . . . . .

3.3 Matrix Inverses


       
3 −5  1  −2 1  1  5 4  3 −5 
1.  2.  3. ♠ 4. 
 
2 17
    
−1 2 3 −1 7 9 −1 2
       
 −7 5  1  1 −3   4 1   4 1 
♠ 5.  ♠ 6. ♠ 7.  ♠ 8. 
10
    
3 −2 2 4 2 1 8 6
       
1  2 −1 − i  1  −3   29   18 
♠ 9. 10.
1+i 2
       
i 1 2 ♠ 11. 
 −25 
  15 
 
  ♠ 12. 
 

7  −11 
 
 
8

♠ 13. (0, 1, 1)
261

♠ 14. nonsinuglar, the matrix has ♠ 15. singular, it is not a square ♠ 16. nonsingular, the mapping
full rank matrix x 7→ Ax is surjective (onto)

♠ 17. singular, A is not row equiv- ♠ 18. not enough information, if ♠ 19. nonsingular, it has 3 pivot
alent to I5 the RREF is I3 it is non- columns
singular, otherwise it is sin-
gular.

♠ 20. singular, the mapping x 7→ ♠ 21. singular, A is not row equiv- ♠ 22. singular, the columns of A
Ax is not injective (one-to- alent to In , otherwise this are linearly dependent
one) would suppose that In is
row equivalent to a singular
matrix (a contradiction)

23. X = (CBA)−1 ♠ 24. X = −A−1 B ♠ 25. X = BA − A−1 C

♠ 26. X = BC −1 A−1 ♠ 27. X = C −1 B − A 28. X = D−1 (ECA + B)

29. Proof of Corollary 3.3.8. By Theorem 3.3.7, A> is invertible and products of invertible matrices are
invertible. Therefore, AA> and A> A are invertible. 

30. Proof of Corollary 3.3.12. If AB is invertible, then there exists some matrix C such that (AB)C = In .
Thus, A(BC) = In , which shows that A is invertible by (iv) in the Nonsingular Matrix Theorem.
Similarly, C(AB) = In , which implies that (CA)B = In . Therefore, B is likewise invertible by (iii) in
the Nonsingular Matrix Theorem. 

3.4 Elementary Matrices


     
 1 0 0 0 
 0 0 1     1 0
0 
 0 1 0 0 
     
♠ 1. 
 0 1 0 
 ♠ 2. 



♠ 3. 
 0 −5 0 

   0 0 2 0   
 
1 0 0   0 0 1
0 0 0 1

     
 1 0 0   1 0 0
0  0  1 0 −1 0 
     
♠ 4. 
 0 1 0   0 1 0 0 0   0

1 0 0 

♠ 6. 
   
   
♠ 5. 
   
2 0 1  0 0 1 0 0 

 0
 0 1 0 

   

 0 −3 0 1 0 
 0 0 0 1
 
0 0 0 0 1
   
7. singular
 −2 7 −1   3 3 −3 
  1 
8. 
 −2 6 −1  ♠ 9.  −6 −12 24 


 6



−3 10 −1 5 7 −9
        
25
 2 3 2   −8 4 − 38   0 1  1 0  1 0  1 2 
    12.     
♠ 10. 
 1 4 2 
 11. 
 −2 2 0 
 1 0 2 1 0 −3 0 1
   
1 1 1 − 83 0 −8 1
262 APPENDIX D. SOLUTIONS TO EXERCISES
      
 0 0 1  1 0 0  1 0 0  1 0 0  1 0 −2   1 −1 0 
      
♠ 13. 
 0 1   −3
0   1  0
0   1  0
0   1  0
0   1  0
0   1 0 

      
1 0 0 0 0 1 −6 0 1 0 2 1 0 0 1 0 0 1
          
 0 1 0   −1 0 0  1 0 0  1 0 0  1 0 0  1 0 0  1 0 0  1 0 0  1 0 −2   1 −16 0 
♠ 14.
          
 1 0 0 
 0 1  5
0  1  0
0  1  0
0  1  0
1  2  0
0  1  0
0  1  0
0  1  0
0  1 0 
        
 
          
0 0 1 0 0 1 0 0 1 2 0 1 0 0 1 0 0 1 0 12 1 0 0 2 0 0 1 0 0 1

      
 1 1 0  1 0 0  1 0 0  1 0 3i   1 0 0  1 i 0 
      
15. 
 0 1  0
0  
 0 1
1 0  
 0
4i   1 0  2
 1  0
0   1 0 

      
0 0 1 −2i 0 1 0 0 1 0 0 1 0 0 1 0 0 1
 
 3 6 2 
 
16. X ≡ 
 1 1  (mod 7)
5 
 
3 3 6

3.5 Matrix Factorizations

1. scalar 2. not scalar 3. not scalar 4. scalar


     
 2 4   −9 6 18   0 6 0 
     
♠ 5. 
 3 −2 
 ♠ 6. 
 0 5 −50 
 ♠ 7. 
 0 0 0 

     
−3 −15 12 4 12 0 15 −25

       
 4 0 0   8 0 0   1/2 0 0   1/8 0 0 
 2
 3
  −1
  −3
 
♠ 8. A = 
 0 9 , A =  0
0   −27 0 , A =  0
 −1/3 0 , A =  0
 −1/27 0 

       
0 0 25 0 0 125 0 0 1/5 0 0 1/125
   
 0 1 0 0  1 1 0 0  1 1 0 0 
   
 1 0 0 0  0 0 0 1  0 1 0 0 
   
9. P = 





;

 0 0 1 0   0 0 1 0   0 0 0 1 
   
   
0 0 0 1 0 1 0 0 0 0 1 0

   
 8 6 3 7   5 10 1 8 
   
 1 5 8 10   2 9 5 6 
   
PA =  , AP =  ,
   
 10 9 7 4   6 7 8 3 
   
   
5 2 6 9 9 4 10 7
Row 1 goes to Row 2, Column 1 goes to Column 3,
Row 2 goes to Row 4, Column 2 goes to Column 1,
Row 3 goes to Row 1, Column 3 goes to Column 4,
Row 4 goes to Row 3; Column 4 goes to Column 2.
   
 1 0 0  1 0 0  1 0 0 
   
♠ 10. 
 2 1  0
0   1  0
0   1 0 

   
0 0 1 3 0 1 0 4 1
      
 3 0 0  1 0 0  1 0 0  1 0 −2   1 0 0  1 4 0 
      
♠ 11. 
 0 1  0
0   5  0
0   1  0
0   1  0
0   1  0
9   1 0 

      
0 0 1 0 0 1 0 0 2 0 0 1 0 0 1 0 0 1
     
4 −1  1 0 0 0 2 4 −1 5 −2 
 1 0 0  2 
  
 −2 1 0 0  0 3 1 2 −3 
     
12. 
 4   0 −10
1 0   8 
 13.   
  
 1 −3 1 0  0 0 0 2 1 
   
2 49
3 1 0 0
  
5 5
  
−3 4 2 1 0 0 0 0 5

      
01 2 −5  1 0 −2  1 0 −6 
   
♠ 14. LU =  , L b ∼ , ∼
  
 U y 
−2 1 0 3 0 1 −6 0 1 −2
      
 1 0  2 −2   1 0 4   1 0 7 
   
♠ 15. LU =   , L b ∼ , U y ∼ 
− 23 1 0 3 0 1 15 0 1 5
      
 1 0 0  2 6 0 
  
1 0 0 6 
  
1 0 0 54 
      
16. LU = 
 8 2  0
0   3 ,
0  L b ∼
 0 1 0 ,
−51  U y ∼
 0 1 0 −17 

      
6 2 1 0 0 3 0 0 1 81 0 0 1 27
      
 1 00  2 2 −2  1 0 0 −4  1 0 0 6 
     
      
♠ 17. LU = 
 −2  0
1 0   1 ,
0  L b ∼
 0 1 0 ,
−10  U y ∼
 0 1 0 −10 

      
−1/2 −1 1 0 0 3 0 0 1 −6 0 0 1 −2
      
 1 0 0  2 1 2 
  
1 0 0 1 
  
1 0 0 5 
      
♠ 18. LU = 
 3 1 0 
 0
 3 ,
1  L b ∼
 0 1 0 ,
6  U y ∼
 0 1 0 0 

      
2 1 1 0 0 5 0 0 1 2 0 0 1 6

19. Proof. Let A = [aij ]. Since A is lower triangular we have that aij = 0 for i < j. On the other hand, say
that A> = [bij ]. Now, A> is an upper triangular matrix if bij = 0 whenever i > j. By the transpose
operation, we have that bij = aji . Suppose that i > j (or j < i). Then we have aji = 0. Thus, bij = 0,
that is, A> is upper triangular. 
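
The LU answers in Exercises 14–18 above can be checked by the same two-step substitution they describe: solve Ly = b, then Ux = y. Below is a minimal numerical sketch (NumPy, with small sample factors chosen for illustration rather than taken from any particular exercise):

    import numpy as np

    # Sample data (not from the exercises): a 2x2 system Ax = b.
    L = np.array([[1.0, 0.0],
                  [-1.5, 1.0]])      # unit lower triangular factor
    U = np.array([[2.0, -2.0],
                  [0.0, 3.0]])       # upper triangular factor
    A = L @ U                        # A = LU
    b = np.array([4.0, 9.0])

    # Forward substitution: solve L y = b.
    y = np.zeros_like(b)
    for i in range(len(b)):
        y[i] = b[i] - L[i, :i] @ y[:i]          # L[i, i] == 1

    # Back substitution: solve U x = y.
    x = np.zeros_like(b)
    for i in reversed(range(len(b))):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

    assert np.allclose(A @ x, b)     # the LU solve reproduces b
    print(y, x)                      # y = [4, 15], x = [7, 5]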

3.6 Linear Transformations on R2


       
♠ 1. [1/2 0; 0 5]   ♠ 2. [1 0; 2 5]   ♠ 3. [0 −1; −1 0]   ♠ 4. [0 1; −5 0]


 
1 3 −1
♠ 5.

2
 √ √ 
3−6 3 6+3 3
 
y
 4 0 
♠ 6.  
0 1
1
Stretch horizontally by a factor of 4. J J

x
1 2 3 4

  
♠ 7. [1 0; 0 −1][1 0; 0 8]
Stretch vertically by a factor of 8 and reflect across the x-axis.
[figure: image of the unit square under the transformation]
    
♠ 8. [0 1; 1 0][4 0; 0 1][1 0; 0 2][1 0; 0 −1]
Reflect across the x-axis, stretch vertically by a factor of 2, stretch horizontally by a factor of 4, and reflect across the line y = x.
[figure: image of the unit square under the transformation]
    
 3 0  1 6  1 0   1 −2 
♠ 9.     
0 1 0 1 0 13 0 1
Shear horizontally by a factor of −2, stretch vertically by a factor of 13, shear vertically by a factor of
6, and stretch horizontally by a factor of 3.

J
3

1
J
x
−5 −4 −3 −2 −1 1 2
    
10. [0 −1; −1 0] = [−1 0; 0 −1][0 1; 1 0]

♠ 11. (a)
[cos θ  −sin θ; sin θ  cos θ][1 0; 0 −1][cos θ  −sin θ; sin θ  cos θ]⁻¹ = [cos θ  −sin θ; sin θ  cos θ][1 0; 0 −1][cos θ  sin θ; −sin θ  cos θ]
= [cos θ  −sin θ; sin θ  cos θ][cos θ  sin θ; sin θ  −cos θ] = [cos²θ − sin²θ  2 sin θ cos θ; 2 sin θ cos θ  sin²θ − cos²θ] = [cos 2θ  sin 2θ; sin 2θ  −cos 2θ] = A

(b) The sequence of geometric transformations in the factorization of A rotates the plane clockwise
about the origin by an angle of θ, then reflects the plane across the x-axis, then rotates the plane
counter-clockwise about the origin by an angle of θ.

Since the line ℓ is the unique line through the origin that forms an angle of θ with the x-axis, the
composite of these transformations moves ℓ onto the x-axis, reflects across the x-axis, and then
moves the x-axis back to ℓ. The net effect on the plane is to reflect across the line ℓ.
     
(c) [cos 2(0)  sin 2(0); sin 2(0)  −cos 2(0)] = [cos 0  sin 0; sin 0  −cos 0] = [1 0; 0 −1], which is reflection across the x-axis, the unique line which forms a 0 angle with the x-axis.

[cos 2(π/4)  sin 2(π/4); sin 2(π/4)  −cos 2(π/4)] = [cos(π/2)  sin(π/2); sin(π/2)  −cos(π/2)] = [0 1; 1 0], which is reflection across the line y = x, the unique line which forms a π/4 angle with the x-axis.

[cos 2(π/2)  sin 2(π/2); sin 2(π/2)  −cos 2(π/2)] = [cos π  sin π; sin π  −cos π] = [−1 0; 0 1], which is reflection across the y-axis, the unique line which forms a π/2 angle with the x-axis.
(d) [cos 2(π/6)  sin 2(π/6); sin 2(π/6)  −cos 2(π/6)] = [cos(π/3)  sin(π/3); sin(π/3)  −cos(π/3)] = [1/2  √3/2; √3/2  −1/2] = (1/2)[1  √3; √3  −1]
[figure: the unit square and its image under this reflection]

(e)
B = [cos θ  −sin θ; sin θ  cos θ][1 m; 0 1][cos θ  −sin θ; sin θ  cos θ]⁻¹ = [cos θ  −sin θ; sin θ  cos θ][1 m; 0 1][cos θ  sin θ; −sin θ  cos θ]
= [cos θ  −sin θ; sin θ  cos θ][cos θ − m sin θ  sin θ + m cos θ; −sin θ  cos θ]
= [cos²θ − m sin θ cos θ + sin²θ  sin θ cos θ + m cos²θ − sin θ cos θ; sin θ cos θ − m sin²θ − sin θ cos θ  sin²θ + m sin θ cos θ + cos²θ]
= [1 − (m/2) sin 2θ  m cos²θ; −m sin²θ  1 + (m/2) sin 2θ]
= (1/2)[2 − m sin 2θ  m + m cos 2θ; −m + m cos 2θ  2 + m sin 2θ]

12. Proof. Since A is nonsingular, it can be factored into a product of elementary matrices. Replace-
ment elementary matrices cause shears, scaling elementary matrices cause stretches, compressions, and
reflections. Interchange elementary matrices cause reflections. 
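
The reflection matrix A = [cos 2θ  sin 2θ; sin 2θ  −cos 2θ] from Exercise 11 can also be sanity-checked numerically: it should fix vectors along the line ℓ and negate vectors perpendicular to it. A minimal sketch follows (NumPy; the helper name reflection is ours, not the text's):

    import numpy as np

    def reflection(theta):
        """Reflection of R^2 across the line through the origin at angle theta."""
        c, s = np.cos(2 * theta), np.sin(2 * theta)
        return np.array([[c, s],
                         [s, -c]])

    theta = np.pi / 6                       # sample angle
    A = reflection(theta)
    along = np.array([np.cos(theta), np.sin(theta)])      # direction of the line
    perp = np.array([-np.sin(theta), np.cos(theta)])      # perpendicular direction

    assert np.allclose(A @ along, along)    # vectors on the line are fixed
    assert np.allclose(A @ perp, -perp)     # perpendicular vectors are negated
    assert np.allclose(A @ A, np.eye(2))    # reflecting twice is the identity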

3.7 Representations of Linear Transformations as Matrices


     
 2 −1   1 2 1   1 
♠ 1. [T ] =      
0 −4  3 −5 14   i 
   
♠ 2. [T ] = 


 ♠ 3. [T ] = 
 

 −5 3 −18   −1 
   
   
−7 19 −40 −i
   
♠ 4. [T ] = 1 1 1 1  0 0 1 
 
♠ 5. [T ] = 
 0 1 0 

 
1 0 0

 
 0 2  
  
 0 −2 1    −4 −3 
♠ 6. [S ◦ T ] = [S][T ] =   3
 =
0  
1 0 −2   −4 8
2 −3
 
 1 0 
♠ 7. [T ] ∼  . Hence, T is one-to-one and onto. ker T = Span{} = {0}, im T = Span{e1 , e2 } =
0 1
R2
 
 1 0 3 
 
 0 1 −1 
 
♠ 8. [T ] ∼   Hence, T is neither one-to-one nor onto. ker T = Span{[−3, 1, 1]> }, im T =
 
 0 0 0 
 
 
0 0 0
Span{[1, 3, −5, −7]> , [2, −5, 3, 19]> }
 
 1 
 
 0 
 
♠ 9. [T ] ∼  . Hence, T is one-to-one but not onto. ker T = Span{} = {0}, im T = Span{[1, i, −1, −i]> }
 
 0 
 
 
0
 
> > >
♠ 10. [T ] ∼ 1 1 1 1 . Hence, T is onto but not one-to-one. ker T = {[−1, 1, 0, 0] , [−1, 0, 1, 0] , [−1, 0, 0, 1] },
im T = Span{1} = Z2
 
 1 0 0 
 
♠ 11. [T ] ∼ 
 0 1 0 . Hence, T is one-to-one and onto. ker T = Span{} = {0}, im T = Span{e1 , e2 , e3 } =

 
0 0 1
Z35

12. As this is the identity transformation, it is clearly one-to-one and onto.


13. T cannot be one-to-one as the standard matrix [T] is 2 × 3 and has too many columns. (It is onto
since in the standard matrix, the two rows are not multiples of each other, that is, they are linearly
independent, and the matrix will have two pivots).

14. T cannot be onto as the standard matrix [T] is 4 × 3 and has too many rows. (It is one-to-one, something
we see from row reducing the standard matrix).
15. T cannot be onto as the standard matrix [T ] is 3 × 1 and has too many rows. (It is one-to-one since
only zero maps to the zero vector).

16. T cannot be onto as the standard matrix [T ] is 3 × 1 and has too many rows. (It is one-to-one since
only zero maps to the zero vector).

17. T cannot be one-to-one as the standard matrix [T ] is 1 × 3 and has too many columns. (It is onto since
for any scalar c we have that T (c, 0, 0) = c).
18. As the standard matrix [T ] is 2 × 2, T could potentially be one-to-one or potentially onto. (It is, in
fact, both).
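
For Exercises 12–18, one-to-one and onto can be read off from the rank of the standard matrix: T is one-to-one exactly when every column of [T] is a pivot column (rank equals the number of columns) and onto exactly when every row contains a pivot (rank equals the number of rows). A minimal numerical sketch of this check (NumPy, with a made-up 2 × 3 matrix rather than any matrix from the exercises):

    import numpy as np

    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 3.0]])          # sample 2x3 standard matrix [T]
    rank = np.linalg.matrix_rank(A)
    rows, cols = A.shape

    one_to_one = (rank == cols)   # every column is a pivot column
    onto = (rank == rows)         # every row contains a pivot

    print(one_to_one, onto)       # False True: too many columns to be one-to-one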

Chapter 4 : Orthogonality
4.1 Inner Products

1. 5 3. 18 5. −40 ♠ 7. 36
♠ 2. 48 ♠ 4. −4 ♠ 6. 7

♠ 8. 22 9. 342 ♠ 10. −6 ♠ 11. −i ♠ 12. 1 + 7i ♠ 13. −2 + 9i


♠ 14. 9 − 3i   15. 26 + 4i   16. 2 − 13i   17. √35   ♠ 18. 5√3   ♠ 19. 3√2
20. √298   21. √17   ♠ 22. √6   23. √14   24. √15   ♠ 25. √22
26. √66   ♠ 27. 3√2   ♠ 28. 3√5   29. √14   30. 1/2   31. ±√23/6

4.2 Orthogonality

♠ 1. −3/4   2. orthogonal   3. no, but (6, 1) ⊥ (8, −24)

4. orthogonal   5. no, but (2, 8) ⊥ (4, −1)   ♠ 6. 3x1 − 2x2 = −1

♠ 7. x1 − 2x2 + 2x3 − x4 = 0   ♠ 8. 3x1 + 6x2 − 3x3 − 2x4 + 5x5 = 23   9. x1 + 3x2 + 5x3 + 7x4 + 9x5 = 6
           
−1 −3 −3 1 4   0 

 
 
 
 
 
 
 

      
 













 
  

 
   

♠ 11.  ♠ 12. 
 
10. 
 0 

 0 
 1 , 0  

 0   −2 
  
♠ 13. 
      ,

 
 
 
 
   

   


     
1  1  0 1  1 
  0 
   
  

 

   

 
0 1 

 

         
 2   1  3 1   1
    
 
 
 

♠ 14. 
   
15. 
    
  
 
       
 3 
   1 
  ♠ 16.  −1 
  ♠ 17. 
 1 , 0
  


 
 
   

 
 
 

 2   0 −1 

18. x1 + 2x2 + x3 = 16

4.3 Outer Products


 
 1 7 
1.   (as long as the (2, 1) position is 7, the values on the diagonal entries could be any real
7 2
number).
     
 1 2 3   7 8 9   56 21 3 23 
     
♠ 2.  2 1 2  ♠ 3.  8 5 6   21 91 85

75 

♠ 4. 
    
     
3 2 1 9 6 3  3 85 43
 35 

 
23 75 35 62
     
5. [1 6 8 9 13; 6 7 23 14 8; 8 23 11 16 19; 9 14 16 54 22; 13 8 19 22 72] (note that the (1, 1) position could be any real number)

6. [1, 7i, 5 − 2i; −7i, 3, −9i; 5 + 2i, 9i, 2]   7. [1 2; 3 4]   8. [1 2 3; 2 1 4; 3 5 1]

9. [1 2 3 4; 5 6 7 8; 9 0 1 2; 3 4 5 6]

♠ 10. Hermitian ♠ 11. Hermitian ♠ 12. not Hermitian ♠ 13. Hermitian


     
14. [0 −2; 2 0]   15. [0 3 9 3; −3 0 5 −2; −9 −5 0 −9; −3 2 9 0]   ♠ 16. [−12 0; 8 0]
       
 8 −2 4   1 −3   7   −4 
       
♠ 17. 
 −8 2 −4 
 ♠ 18. 
 1 −3 
 19. u = 
 2 ,v =  0 
  
       
4 −1 2 2 −6 1 1
       
 −2  3  
 −9   −2 
2
         
   
 1  −7   3   0   0
       
  
20. u = 

,v = 
     
 3

 22. u = 

,v = 
  

21. u =  6  , v = 
 4   
  2  
  0 
 
 0



2 
        
   
−1  9 
    8 −5
  4
4

♠ 23. idempotent ♠ 24. neither ♠ 25. nilpotent

♠ 26. The projection of the unit square is the line segment between the points (0, 0) and (1, 1). As such, it has no area because it is only a line segment. The notion of area is lost by projections.
[figure: the unit square and its projected image]

27. (a) Proof. The result follows from the linearity of the transpose operator and (A> )> = A:

(A + A> )> = A> + (A> )> = A> + A = A + A> . 

(b) Proof. The result follows from the linearity of the transpose operator and (A> )> = A:

(A − A> )> = A> − (A> )> = A> − A = −(A − A> ). 

(c) Proof. Let S = (1/2)(A + A> ) and T = (1/2)(A − A> ). Then

S + T = (1/2)(A + A> ) + (1/2)(A − A> ) = (1/2)(A + A) + (1/2)(A> − A> ) = A + 0 = A.

By part (a), we know that A + A> is symmetric. Since the transpose operator is linear, any
scalar multiple of a symmetric matrix is symmetric. Thus, S is symmetric. A similar argument
using part (b) shows that T is skew-symmetric. Therefore, A can be decomposed into a sum of
symmetric and skew-symmetric matrices. 
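
The decomposition in part (c) is easy to illustrate numerically. A minimal sketch (NumPy, with an arbitrary square matrix chosen only for illustration):

    import numpy as np

    A = np.array([[1.0, 4.0, -2.0],
                  [0.0, 3.0, 5.0],
                  [7.0, -1.0, 2.0]])     # arbitrary square matrix

    S = (A + A.T) / 2                    # symmetric part
    T = (A - A.T) / 2                    # skew-symmetric part

    assert np.allclose(S, S.T)           # S is symmetric
    assert np.allclose(T, -T.T)          # T is skew-symmetric
    assert np.allclose(S + T, A)         # and they sum back to A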

28. Proof. Let A be an idempotent matrix, that is, A2 = A. Suppose that A is also nonsingular. Then

A2 = A ⇒ A−1 (A2 ) = A−1 (A) ⇒ In A = In ⇒ A = In .

Suppose next that A is nilpotent, that is, there exists some m such that Am = 0. Let m be the smallest
positive integer with this property. By way of contradiction, suppose that A is also nonsingular.
Then
Am = 0 ⇒ A−1 (Am ) = A−1 (0) ⇒ In Am−1 = 0 ⇒ Am−1 = 0.
This contradicts the minimality of m. Therefore, A cannot be nilpotent and nonsingular. 

 
29. We say that a square matrix A is skew-Hermitian if A∗ = −A. For example, the matrix
[i, 3, 4i; −3, 0, 1 − 2i; 4i, −1 − 2i, −7i] is skew-Hermitian.

4.4 Affine Transformations


     
♠ 1. cos⁻¹(2/(√2 √10)) ≈ 63.4°   ♠ 2. cos⁻¹(−8/(√10 √14)) ≈ 132.5°   ♠ 3. cos⁻¹(3/√30) ≈ 56.8°

♠ 4. rigid motion ♠ 5. rigid motion ♠ 6. not a rigid motion ♠ 7. not a rigid motion

♠ 8. rigid motion ♠ 9. rigid motion ♠ 10. not a rigid motion ♠ 11. rigid motion

♠ 12. through ♠ 15. [figures: each answer is a sketch showing the point P, the vector v, and the image point P′ under the given affine transformation]

   
16. [4; −5; 8]   17. [7; 10; −6]   18. Since A ∼ I3, the linear transformation associated to A is one-to-one and onto. Therefore, the affine transformation T is likewise one-to-one and onto.

♠ 19. [−4; 20; 14]   ♠ 20. [−7; 14; −6]   ♠ 21. Since A ∼ I3, the linear transformation associated to A is one-to-one and onto. Therefore, the affine transformation T is likewise one-to-one and onto.

22. [23; 10; 33]   23. [−4; 4; 2]   24. Since A ∼ I3, the linear transformation associated to A is one-to-one and onto. Therefore, the affine transformation T is likewise one-to-one and onto.

4.5 Orthogonal Projections


         
 1   1   1 5   2 
♠ 1. y = −2  ,   



   
3  2  − 6 0
♠ 2. y = 5    ,
 ♠ 3. y = 
 4  + 6  −2 ,
 
         
[y]B = −2 1 −1 −2 1
   
 5   1 
[y]B =   [y]B =  
−6 6
         
 3   3   −1   1   7 
         
 4   −1   1   6   −2 
     
♠ 4. y =   − 2
   
 + 3
     ,    
♠ 5. y = 2  −6  − 3  0 ,
    
 1   −5   −1 
         
         
0 1 −1  0 
 
 5 
 
     
1 1 5
   
 
[y]B = 
 −2 

 2 
  [y]B =  
3 −3
     
♠ 6. projW(y) = [2/5; 6/5],   y = [2/5; 6/5] + [3/5; −1/5]

♠ 7. projW(y) = [1/3; 8/3; 7/3],   y = [1/3; 8/3; 7/3] + [2/3; −2/3; 2/3]

♠ 8. projW(y) = [3; −4/5; 2/5],   y = [3; −4/5; 2/5] + [0; 9/5; 18/5]
   
 89/39   315/103 
   
 109/117 
   −90/103 
♠ 9. projW (y) = 
 ,  
 
♠ 10. projW (y) = 

,
 401/117   
   0 
   
11/9  225/103 
 
     
 89/39   −50/39  225/103
       
 109/117   125/117  315/103   −212/103
   

y=  +
 

   


 401/117   −50/117   −90/103   193/103
  


       
   
y= +
   
11/9 25/9  0   2 

   
 225/103   84/103 
   
   
225/103 290/103
♠ 11. √10/5 ≈ 0.632   ♠ 12. 2√3/3 ≈ 1.155   ♠ 13. 9√5/5 ≈ 4.025   ♠ 14. 25√26/39 ≈ 3.269

♠ 15. √215785/103 ≈ 4.510

16. Proof.

‖u + v‖² + ‖u − v‖² = (u + v)·(u + v) + (u − v)·(u − v) = (u·u + u·v + v·u + v·v) + (u·u − u·v − v·u + v·v)

= 2(u · u) + 2(v · v) = 2‖u‖² + 2‖v‖². 
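
The projections in Exercises 6–10 can be reproduced numerically: if the columns of A form a basis for W, then projW(y) = A(A>A)⁻¹A>y. A minimal sketch (NumPy, with made-up data rather than data from the exercises):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])             # columns form a basis for W
    y = np.array([3.0, 1.0, 4.0])

    P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix onto W = Col(A)
    y_hat = P @ y                          # orthogonal projection of y onto W
    z = y - y_hat                          # component orthogonal to W

    assert np.allclose(A.T @ z, 0)         # z is orthogonal to every column of A
    assert np.allclose(y_hat + z, y)       # y decomposes as projection + complement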

4.6 The Fundamental Theorem of Linear Algebra

1. 2 × 5 2. 4 × 4 3. 3 × 3 4. 4 × 2
            
7  1 2 3  −71   1 

  
  
 

    
  








  
 
 
 


 
     
♠ 5. 
   
1   −2   −3   −4  79 1
 
       
    

   
♠ 7. 
  
 , 

 
     
♠ 6.  1  ,  0  , 
           
−2  0   70   0 
 
 
      
     

      
 
    


 0   1     
0  0 5
 
 
 


      

     

 
 0 0 1 
     
4/13 −8/7, 1/5 
♠ 8. 
    
    
6/13 ♠ 9. 
 4/7 

 1/5 

♠ 10. 
 
   
16/7 
 −7/5 

 
8/5

4.7 The Gram-Schmidt Algorithm

√ √
           
 1/ 2 −1/ 2 2/3   2/3
 3/5 −4/5
 
 
 
  

♠ 1.  ♠ 3. 
 
, ,
    
     
√   √        
 1/ 2

1/ 2 
 ♠ 2.  −2/3  ,  1/3
 

  4/5

3/5 


    

 

 1/3 −2/3 

√ √ √ √ √
         
1/ 11 6/ 209 1/ 14 11/ 266 3/ 19

 
 
 


   
 
     

√ √  ,  8/√266  ,  −3/√19

   
      
♠ 5. 
 
0   11/ 209  2/ 14

    
♠ 4. 
     
,

√ √ √ √ √
    
     

  


 3/ 11   −4/ 209    3/ 14 −9/ 266 1/ 19 
   

√ √

   

 
 1/ 11

6/ 209 

           
1  3−i  31 − 11i 
 


 1 2 + i 





    


 1     
1  
      
   
♠ 6. √   , √  7 − 2i  i  1  5+i  −43 + i 

1      
−i  1 
6  78   ♠ 7.  , √  , √  
2

    
   4 7  2 1302  
−1  5+i  41 + 15i 

  
    
2i 4 − 2i 
 
      


      

 
−i 7−i −17 − 9i 

 

√ √ √ √
    
−1/ 2   2/ 2 2/3 2/3  
 1/ 2 1/ 2  
♠ 8.  √  6 −3 

√  √  
♠ 9. 
1/ 2 1/ 2 0 3/ 2  −2/3 1/3 
 
  0 6
1/3 −2/3

√ √ √ √ √ √
    
 1/ 2 1/ 2   2/ 2 3/ 2   1/ 11 6/ 209
♠ 10.  √  √
 
√  √   √ 16
1/ 2 −1/ 2 0 1/ 2

0 11/ 209   11
 √

♠ 11.  √ 11 

  
 19 
 3/ 11 −4/ 209  0 √

 √ √

 209
1/ 11 6/ 209

 √ √ √  √ √ √ 
 1/ 14 11/ 266 3/ 19   14/ 14 −5/ 14 18/ 14 
 √ √ √  √ √ 
♠ 12. 
 2/ 14 8/ 266 −3/ 19 
 0 57/ 266 30/ 266 

 √ √ √  √ 
3/ 14 −9/ 266 1/ 19 0 0 3/ 19

4.8 The Least-Squares Problem


     
1  20   −2  14
♠ 1. ♠ 2.   
11
    
−8 3 ♠ 3.  30 


 
26

 
 −92 
     
> 1  >  14 −3  1  51  1  
♠ 4. A b =  , A A= , x
b= , b=
b  439 
285 285 
  
10 −3 21 143 
470
 
 −1 
     
> −6  >  42 0  1  −1   
♠ 5. A b =  , A A= , x
b=  , b=
b
 −1 

7
−4 0 14 −2  
0
   
 −1 2  3 
     
 −12   6 6  1  −7 

   
♠ 6. A = 
 2 ,
2  A> b =  , A> A =  , x
b= , b=
b −4 
3
  
  −6 6 24 1  
1 4 −1
  

 2 1 −2        7
 
  30  7 4 −6   6 
 
 1 0 −1   2 
     
>
  >
   
♠ 7. A = 

,
 A b=
 ,
21  A A=
 4 ,x
3 −3  b=
 3 ,
 b=
b
 

 1 1 0         9 
   
  −21 −6 −3 6 4  
1 1 −1 5
     

 1   0  1 0   −1
 
26 −15 

  
     
♠ 8. W = Span  0   1  ,
, A =  , A(A> A)−1 A> = A   A> =


  
 0 1 
−15 10

      

 

 −5 3  −5 3
275
 
  
 10
 15 −5 
1  10 15  1  10 15 −5  1  
A  A> = A =  15 26 3 
35 35 35  
15 26 15 26 3  
−5 3 34
     


 2 

  2   4 −2 8 

   −1
    > −1 > > 1 > 1  
♠ 9. W = Span  −1  ,

 −1  A(A A) A = A 21
A=  A = AA =  −2 1 −4 

     21 21 




 

 4  4 8 −4 16

10. Hint: By Theorem 4.8.5, x b is the linear solution to the matrix equation A> Ax = A> b. Since A is
invertible, show that the solution set of Ax = b and A> Ax = A> b are the same.
11. Hint: You may use the fact Q> Q = In . Then substitute A = QR into the equation from Theorem
4.8.5. Also, although we know that R is nonsingular and we can use R−1 , we do not know that Q is
nonsingular. Thus, we cannot use Q−1 because it might not be defined.
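
The least-squares answers above all come from the normal equations A>Ax̂ = A>b of Theorem 4.8.5. A minimal numerical sketch (NumPy, with made-up data rather than data from any particular exercise):

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [1.0, 2.0]])             # made-up inconsistent system
    b = np.array([6.0, 0.0, 0.0])

    # Solve the normal equations  A^T A x = A^T b  for the least-squares solution.
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    b_hat = A @ x_hat                      # projection of b onto Col(A)
    residual = b - b_hat

    assert np.allclose(A.T @ residual, 0)  # residual is orthogonal to Col(A)
    print(x_hat)                           # [5., -3.]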

Chapter 5 : Determinants
5.1 Introduction to Determinants
       
3. 
 5 12   5 7   5 7   0 4 

1.   2.   3 4  4.   5.  
−6 −1 −6

9 0 3   9 9
−1 −6
       
 −2 −1 3   0 −2 3   2 3
4   1 2
4 
       
♠ 6. 
 1 3 6 
 ♠ 7. 
 1 1 6 
 ♠ 8. 
 −2 −1 3 
 ♠ 9. 
 0 −2 3 

       
−4 8 0 7 −4 0 −4 8 0 7 −4 0
     
 1 2 3   1 3 4   1 2 4 
     
♠ 10. 
 0 −2 −1 
 11. 
 −2 0 −4 
 12. 
 −2 4 −4 

     
7 −4 8 −1 2 5 −1 3 5
     
 1 3 4   i 3i − 2 2 + 5i   i 3i − 2 7−i 
     
13. 
 0 3 −1 
 14. 
 2 + 3i 5i − 1 1 + 9i 
 15. 
 3+i 4 9 − 3i 

     
−2 0 −4 7 + 2i i−3 2 + 4i 7 + 2i i−3 1 + 2i
 
7i − 5  ♠ 17. −2 ♠ 21. −16
 3+i 4
  ♠ 18. 1 + 2i
16. 
 2 + 3i 5i − 1 1 + 9i 
 ♠ 22. 3
  ♠ 19. 6
7 + 2i i − 3 2 + 4i
♠ 20. 9 ♠ 23. 4

♠ 24. 0 25. 12 26. π 5 − 25e2 ♠ 27. 4 ♠ 28. 70 ♠ 29. 42

5.2 Properties of Determinants

1. −5 ♠ 2. −2 ♠ 3. 10 4. −10 5. 7

♠ 6. 6 ♠ 7. 52 8. −54 9. 52 10. −9 + i

♠ 11. 4 12. 24 13. 0 14. 126 ♠ 15. 1 16. 27

♠ 17. 6   ♠ 18. 127   19. 729   ♠ 20. rank(A) = 5, nullity(A) = 0   ♠ 21. corank(A) = 5, conullity(A) = 0

♠ 22. Yes, it is consistent for any b. Ax = b has no free variables. Ax = b has a unique solution.
♠ 23. I5
♠ 24. ker(T ) = {0}, im(T ) = R5 . Yes, T is both one-to-one and onto.
   
25. Proof. Let A = [a b; c d] and B = [e f; g h]. Then det(A) = ad − bc and det(B) = eh − fg. Hence,

det(A) det(B) = (ad − bc)(eh − fg) = adeh − adfg − bceh + bcfg.


 
On the other hand, AB = [ae + bg  af + bh; ce + dg  cf + dh]. Hence,

det(AB) = (ae + bg)(cf + dh) − (af + bh)(ce + dg) = aecf + aedh + bcfg + bdgh − acef − adfg − bceh − bdgh

= adeh − adfg − bceh + bcfg = det(A) det(B). 
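
Although the proof above is only for 2 × 2 matrices, the identity det(AB) = det(A) det(B) holds for square matrices of any size and is easy to spot-check numerically. A minimal sketch (NumPy, with arbitrary random matrices chosen for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.integers(-5, 6, size=(3, 3)).astype(float)   # arbitrary 3x3 matrices
    B = rng.integers(-5, 6, size=(3, 3)).astype(float)

    lhs = np.linalg.det(A @ B)
    rhs = np.linalg.det(A) * np.linalg.det(B)
    assert np.isclose(lhs, rhs)          # det(AB) = det(A) det(B)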

5.3 Cramer’s Rule


         
1 2 4 12/17 1
1.  ♠ 2.  ♠ 3.  
         
     
4 1 1 ♠ 4. 
 −1/17 
  1 
 
  ♠ 5. 
 

5/17  0 
 
 
2
 
♠ 6. 
adj(A) =  ♠ 7. 
adj(A) = ♠ 8. adj(A) =
   51 0 0 
 −3 −3   2 2  −6 10 19  
 0 51 0 
 ,  , 


  
−2 1 1 2  9 −15 −3 ,  
0 0 51
 
 
6 7 −2
A(adj A) = 
 A(adj
 A)=

 −9 0   2 0  A(adj A) =
   
0 −9 0 2
   
♠ 9. adj(A) =
 0 0 1 1   2 0 0 0 
   
 0

1 0 2 
 A(adj A) =  0

2 0 0 

 ,  
   
 2 0 2 0   0 0 2 0 
   
   
2 1 2 1 0 0 0 2

10. det(A) = 1, 11. det(A) = 4, 12. det(A) = −158,


     
 4 13 5   2 3/2 2   10 −30 −2 
    1  
A−1 = 
 −3 −11 −4  A−1 =  0 −1/2 3/2 
 A−1 =  −11 −46 18 


   158 



1 3 1 1 1 0 19 22 12

5.4 Cross Products


         
 −3   −26   4   1   19 
         
♠ 1.  7 

 ♠ 2. 
 9 
 ♠ 3. 
 −6 
 ♠ 4. 
 −5 
 ♠ 5. 
 −34 

         
5 2 2 1 12
(
♠ 6. x + y + 6z = 5 ♠ 7. x + 2y + 3z + 4w = 5 −x + y =1
♠ 8.
3x − 3z + 4w = 10

9. Proof. To prove part (ix),


 
u × (v + w) = u × [v1 + w1; v2 + w2; v3 + w3] = det[e1 e2 e3; u1 u2 u3; v1 + w1, v2 + w2, v3 + w3]
= det[e1 e2 e3; u1 u2 u3; v1 v2 v3] + det[e1 e2 e3; u1 u2 u3; w1 w2 w3]
= (u × v) + (u × w),

where the second to last equality follows from multilinearity of the determinant. 
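
Cross-product identities like part (ix) are easy to spot-check numerically. A minimal sketch (NumPy, with arbitrary vectors chosen only for illustration):

    import numpy as np

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([-1.0, 0.0, 4.0])
    w = np.array([2.0, 5.0, -2.0])

    # Distributivity over addition, part (ix):
    assert np.allclose(np.cross(u, v + w), np.cross(u, v) + np.cross(u, w))

    # The cross product is orthogonal to both of its factors.
    assert np.isclose(np.dot(u, np.cross(u, v)), 0.0)
    assert np.isclose(np.dot(v, np.cross(u, v)), 0.0)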

10. Proof. To prove part (xii),


 
u × u = ( det[u2 u3; u2 u3], −det[u1 u3; u1 u3], det[u1 u2; u1 u2] ) = (0, −0, 0) = 0. 

Chapter 6 : Eigenvalues
6.1 Eigenvalues and Eigenvectors

1. yes, 2. no 3. no ♠ 4. yes, ♠ 5. no ♠ 6. yes,


λ=5 λ=2 λ = −1
     
♠ 7. yes, ♠ 8. yes, ♠ 9. no
λ=1 λ=0  1   3   1 
♠ 10. c   ♠ 11. c   ♠ 12. c  
0 1 1

♠ 13. yes ♠ 14. yes ♠ 15. no

♠ 16. λ
= 5, mult
= 1, ♠ 17. λ
= −3, mult
 = 1, 18. λ
= 1, mult
 = 1,



 2 




 −1 



 1 













 −4  ;  2  ;  0  ;
        

 
 
  
 
  

     
 1   1   0 
λ=
  −3,
mult = 1, λ
= 2, mult
  = 2,  λ
= 2, mult
 = 1,



 1 



 1   −1  



 1 




 

  

 




 2   0 , 2   2 

 

 
 
 
 

 






     
 0   
 2 0 
 
 1 

19. Hint: If Ax = λx, then what can we say about A2 x = A(Ax)? Can we generalize this observation?
20. Hint: If A is idempotent, then A2 = A. Use Exercise 19.
21. Hint: If A is nilpotent, then An = 0 for some n. Use Exercise 19.
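
The idea behind Exercises 19–21 is that if Ax = λx, then A²x = A(Ax) = λAx = λ²x, so the eigenvalues of A² are the squares of the eigenvalues of A. A minimal numerical sketch (NumPy, with an arbitrary matrix chosen only for illustration):

    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])             # arbitrary matrix

    lam, V = np.linalg.eig(A)              # eigenvalues and eigenvectors of A
    lam2, _ = np.linalg.eig(A @ A)         # eigenvalues of A^2

    # The eigenvalues of A^2 are the squares of the eigenvalues of A.
    assert np.allclose(np.sort(lam ** 2), np.sort(lam2))

    # Each eigenvector of A is also an eigenvector of A^2.
    for i in range(len(lam)):
        assert np.allclose(A @ A @ V[:, i], (lam[i] ** 2) * V[:, i])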

6.2 The Characteristic Polynomial


     
 −1 
   −1 
   −3 
 
♠ 1. λ = 4,   ; ♠ 2. λ = 5,  ♠ 3. λ = 3,   ;
     

1  1  2 

  
  
 
   
 −1 
   −4 
 
λ = −2,  λ = −1, 
   
 
2  3 

  
 

     
 2+i 
   −2 + i 
   1+i 
 
♠ 4. λ = 2 + i,   ; ♠ 5. λ = 3+2i,   ; ♠ 6. λ = 4 + i,   ;
     
1 2 1

 
 
 
 
 

     
 2−i 
   −2 − i 
   1−i 
 
λ = 2 − i,  λ = 3 − 2i,  λ = 4 − i, 
     
  
1 2 1

 
 
 
 
 

     



 0 




 −1 




 1 













♠ 7. λ = 1,  −1 ; λ = 0,
  2  ; λ = 2,
 
 −3 
 

   
   
  

 
 
 
 
 
 1   1   −1  

   



 0 




 1 










♠ 8. λ = 2,  −1 ; λ = −3,
  −3 
 

   
 

 
 
 
 1   −1  
 



 0 





♠ 9. λ = 0,  −1 


  

 

 1 

♠ 10. A and B are not similar since tr(A) = 5 ≠ 3 = tr(B)


♠ 11. A and B are not similar since det(A) = 4 ≠ 6 = det(B)

6.3 Diagonalization
       
 2 1  2 0  1 −1   1 2  3 0  5 −2 
♠ 1.     ♠ 2.    
1 1 0 1 −1 2 2 5 0 −1 −2 1
       
 1 2 2   −2 0 0  1 −2 0  2 0 2 5  1 0
0 0  30 0 −1 
       
♠ 3.    0 −2 0
0 1 1     −1 3 −1  1 0

1 3 

0 −1 0 0  

0 −24 1 4 

♠ 4. 
     
       
−1 0 1 0 0 3 1 −2 1 4 1
 0 12 

 0 0 −1 0 

 0 −5 0 1  
   
5 0 6 15 0 0 0 0 −1 2 0 0
    
 4 2 −1   18 0 0   3 33 
   1  
5. 
 1 −3 0   0 −17 0
  
  21  1 −6 1 

    
2 1 1 0 0 −2 −7 0 14

6.4 Orthogonal Diagonalization

√ √ √ √ √ √ √  √ √ √ 
     
5/ 26 −1/ 26 18 0 5/ 26 1/ 26  1/ 3 1/ 2 1/ 6   30 0 0   1/ 3 1/ 3 1/ 3 
 √ √ √  √ √
♠ 1.  √
   
♠ 2.
  
√   √ √   1/ 3
 −1/ 2 1/ 6   0 −12
  0 
  1/ 2
 −1/ 2 0 

1/ 26 5/ 26 0 −8 −1/ 26 5/ 26  √ √   √ √ √ 
1/ 3 0 −2/ 6 0 0 6 1/ 6 1/ 6 −2/ 6

√ √ √
   
 1/2 1/ 2 −1/ 6 1/ 12   −4 0 0 0   1/2 −1/2 1/2 −1/2 
 √ √ √   √ √ 
 −1/2 1/ 2 1/ 6 −1/ 12  0 8 0 0   1/ 2 1/ 2 0 0
   

♠ 3. 
 √ √



 √ √ √


  −1/ 6
 1/2 0 2/ 6 1/ 12  0 0 8 0   1/ 6 2/ 6 0 
  
 √   √ √ √ √ 
−1/2 0 0 3/ 12 0 0 0 8 1/ 12 −1/ 12 1/ 12 3/ 12

√ √ √ √
       

 (−1 + i)/ 3 (1 − i)/ 6   3 0   (−1 − i)/ 3 1/ 3   1 0 0  5 0 0  1 0 0 


♠ 4.  √ √  √ √
√  ♠ 5.
   
√ √   √  0
 (1 − i)/ 3 (−1 + i)/ 6 
 0
 −2 0   0 (1 + i)/ 3 1/ 3
  

1/ 3 2/ 6 0 6 (1 + i)/ 6 2/ 6  √ √    √ √ 
0 1/ 3 2/ 6 0 0 1 0 (−1 − i)/ 6 2/ 6
         
−5/26   1/3 1/3 1/3   1/2 −1/2 0   1/6 1/6 −1/3 
 25/26 5/26   1/26
♠ 6. 18   − 8  ♠ 7.

30 
 1/3 1/3



1/3  − 12 
 −1/2
 
1/2 0  + 6 

 1/6 1/6

−1/3 

5/26 1/26 −5/26 25/26      
1/3 1/3 1/3 0 0 0 −1/3 −1/3 2/3
       
 1/4 −1/4 −1/4
1/4   1/2 1/2 0 0   1/6 −1/6 −1/3 0   1/12 −1/12 1/12 1/4 
       
 −1/4 1/4 −1/4 1/4   1/2 1/2 0 0  −1/6 1/6 1/3 0   −1/12 1/12 −1/12 −1/4 
       

♠ 8. −4 

 + 8
 
 + 8
 
 + 8
 


 1/4 −1/4 1/4 −1/4   0 0 0 0   −1/3 1/3 2/3 0   1/12 −1/12 1/12 1/4 
      
       
−1/4 1/4 −1/4 1/4 0 0 0 0 0 0 0 0 1/4 −1/4 1/4 3/4
   
 1/4 −1/4 1/4 −1/4   3/4 1/4 −1/4 1/4 
   
 −1/4 1/4 −1/4 1/4   1/4 3/4 1/4 −1/4 
   
= −4 

 + 8
 


 1/4 −1/4 1/4 −1/4   −1/4 1/4 3/4 1/4 
 
 
   
−1/4 1/4 −1/4 1/4 1/4 −1/4 1/4 3/4

9. Hint: Recall that (A−1 )> = (A> )−1 .


10. Hint: Recall that (AB)> = B > A> . This is the shoes-socks property. Doesn’t another matrix operation
also have the shoes-socks property?

11. Hint: Recall that det(A> ) = det(A).


12. Hint: You may use the fact that (U x) · (U y) = x · y for all x, y ∈ Rn if and only if U is orthogonal.
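
The orthogonal diagonalizations in this section can also be checked numerically: for a symmetric matrix A, NumPy's eigh returns an orthogonal matrix whose columns are eigenvectors. A minimal sketch (with an arbitrary symmetric matrix, not one from the exercises):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])        # arbitrary symmetric matrix

    eigvals, Q = np.linalg.eigh(A)         # columns of Q are orthonormal eigenvectors
    D = np.diag(eigvals)

    assert np.allclose(Q.T @ Q, np.eye(3)) # Q is orthogonal
    assert np.allclose(Q @ D @ Q.T, A)     # A = Q D Q^T, an orthogonal diagonalization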

6.5 Similarity and Linear Transformations


     
−7 −1  0 8 −5  ♠ 3. 0 0 1 0



 ♠ 2.  
♠ 1. 
 −5 −1 
 0 11 −7
 
−11 −1
