
LINEAR ALGEBRA

THIRD EDITION

JOHN B. FRALEIGH
RAYMOND A. BEAUREGARD
University of Rhode Island

Historical Notes by Victor J. Katz

University of the District of Columbia

Addison Wesley Longman
ADDISON-WESLEY PUBLISHING COMPANY


Reading, Massachusetts • Menlo Park, California • New York
Don Mills, Ontario • Wokingham, England • Amsterdam • Bonn
Sydney • Singapore • Tokyo • Madrid • San Juan • Milan • Paris
Sponsoring Editor: Laurie Rosatone
Production Supervisor: Peggy McMahon
Marketing Manager: Andrew Fisher
Senior Manufacturing Manager: Roy E. Logan
Cover Designer: Leslie Haimes
Composition: Black Dot Graphics

Library of Congress Cataloging-in-Publication Data


Fraleigh, John B.
Linear algebra / John B. Fraleigh, Raymond A. Beauregard ;
historical notes by Victor Katz. -- 3rd ed.
p. cm.
Includes index.
ISBN 0-201-52675-1
1. Algebras, Linear. I. Beauregard, Raymond A. II. Katz, Victor J.
III. Title.
QA184.F73 1995
512'.5--dc20 93-49722
CIP

MATLAB is a registered trademark of The MathWorks, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are
claimed as trademarks. Where those designations appear in this book, and Addison-Wesley
was aware of a trademark claim, the designations have been printed in initial cap or all
caps.

Reprinted with corrections, November 1995.

Copyright © 1995 by Addison-Wesley Publishing Company, Inc. All rights reserved. No part
of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior written permission of the publisher. Printed in the United States of
America.

16-PHX-04
PREFACE

Our text is designed for use in a first undergraduate course in linear algebra.
Because linear algebra provides the tools for dealing with problems in fields
ranging from forestry to nuclear physics, it is desirable to make the subject
accessible to students from a variety of disciplines. For the mathematics
major, a course in linear algebra often serves as a bridge from the typical
intuitive treatment of calculus to more rigorous courses such as abstract
algebra and analysis. Recognizing this, we have attempted to achieve an
appropriate blend of intuition and rigor in our presentation.

NEW FEATURES IN THIS EDITION


• Evenly Paced Development: In our previous edition, as in many other texts, fundamental notions of linear algebra—including linear combinations, subspaces, independence, bases, rank, and dimension—were first introduced in the context of axiomatically defined vector spaces. Having easily mastered matrix algebra and techniques for solving linear systems, many students were unable to adjust to the discontinuity in difficulty. In an attempt to eradicate this abrupt jump that has plagued instructors for years, we have extensively revised the first portion of our text to introduce these fundamental ideas gradually, in the context of Rⁿ. Thus linear combinations and spans of vectors are discussed in Section 1.1, and subspaces and bases are introduced where they first naturally occur in the study of the solution space of a homogeneous linear system. Independence, rank, dimension, and linear transformations are all introduced before axiomatic vector spaces are defined. The chapter dealing with vector spaces (Chapter 3) is only half the length it was in the previous edition, because the definitions, theorems, and proofs already given can usually be extended to general vector spaces by replacing "vectors in Rⁿ" with "vectors in a vector space V." Most of the reorganization of our text was driven by our desire to tackle this problem.
• Early Geometry: Vector geometry (geometric addition, the dot product, length, the angle between vectors), which is familiar to many students from calculus, is now presented for vectors in Rⁿ in the first two sections of Chapter 1. This provides a geometric foundation for the notions of linear combinations and subspaces. Instructors may feel that students will be uncomfortable working immediately in Rⁿ. It is our experience that this causes no difficulty; students can compute a dot product of vectors with five components as easily as with two components.
• Application to Coding: An application of matrix algebra to binary linear
codes now appears at the end of Chapter 1.
• MATLAB: The professional PC software MATLAB is widely used for computations in linear algebra. Throughout the text, we have included optional exercises to be done using the Student Edition of MATLAB. Each exercise set includes an explanation of the procedures and commands in MATLAB needed for the exercises. When performed sequentially throughout the text, these exercises give an elementary tutorial on MATLAB. Appendix D summarizes for easy reference procedures and commands used in the text exercises, but it is not necessary to study Appendix D before plunging in with the MATLAB exercises in Section 1.1. We have not written MATLAB M-files combining MATLAB's commands for student use. Rather, we explain the MATLAB commands and ask students to type a single line, just as shown in the text, that combines the necessary commands, and then to edit and access that line as necessary for working a sequence of problems. We hope that students will grasp the commands, and proceed to write their own lines of commands to solve problems in this or other courses, referring if necessary to the MATLAB manual, which is the best reference. Once students have had some practice entering data, we do supply files containing matrix data for exercises to save time and avoid typos in data entry. For example, the data file for the MATLAB exercises for Section 1.4 is FBC1S4.M, standing for Fraleigh Beauregard Chapter 1 Section 4. The data files are on the disk containing LINTEK, as explained in the next item.
• LINTEK: The PC software LINTEK by Fraleigh, designed explicitly for this text, has been revised and upgraded and is free to students using our text. LINTEK is not designed for professional use. In particular, matrices can have no more than 10 rows or columns. LINTEK does provide some educational reinforcements, such as step-by-step execution and quizzes, that are not available in MATLAB. All information needed for LINTEK is supplied on screen. No manual is necessary. The matrix data files for MATLAB referred to in the preceding item also work with LINTEK, and are supplied on the LINTEK disk. Many optional exercise sets include problems to be done using LINTEK.

FEATURES RETAINED FROM THE PREVIOUS EDITION


• Linear Transformations: In the previous edition, we presented material on linear transformations throughout the text, rather than placing the topic toward the end of a one-semester course. This worked well. Our students gained much more understanding of linear transformations by working with them over a longer period of time. We have continued to do this in the present edition.

• Eigenvalues and Eigenvectors: This topic is introduced, with applications, in Chapter 5. Eigenvalues and eigenvectors recur in Chapters 7, 8, and 9, so students have the opportunity to continue to work with them.

• Applications: We believe that if applications are to be presented, it is best to give each one as soon as the requisite linear algebra has been developed, rather than deferring them all to the end of the text. Prompt work with an application reinforces the algebraic idea. Accordingly, we have placed applications at the ends of the chapters where the pertinent algebra is developed, unless the applications are so extensive that they merit a chapter by themselves. For example, Chapter 1 concludes with applications to population distribution (Markov chains) and to binary linear codes.

• Summaries: The summaries at the ends of sections have proved very convenient for both students and instructors, so we continue to provide them.
• Exercises: There are abundant pencil-and-paper exercises as well as computer exercises. Most exercise sets include a ten-part true-false problem. That exercise gives students valuable practice in deciding whether a mathematical statement is true, as opposed to asking for a proof of a given true statement. Answers to odd-numbered exercises having numerical answers are given at the back of the text. Usually, a requested proof or explanation is not given in the answers, because having it too readily available does not seem pedagogically sound. Computer-related exercises are included at the end of most exercise sets. Their appearance is signaled by a disk logo.

• Complex Numbers: We use complex numbers when discussing eigenvalues and diagonalization in Chapter 5. Chapter 9 is devoted to linear algebra in Cⁿ. Some instructors bemoan a restriction to real numbers throughout most of the course, and they certainly have a point. We experimented when teaching from the previous edition, making the first section of the chapter on complex numbers the first lesson of the course. Then, as we developed real linear algebra, we always assigned an appropriate problem or two from the parallel development in subsequent sections on Cⁿ. Except for discussing the complex inner product and conjugate transpose, very little extra time was necessary. This technique proved to be feasible, but our students were not enamored with pencil-and-paper computations involving complex numbers.
• Dependence Chart: A dependence chart immediately follows this preface, and is a valuable aid in constructing a syllabus for the course.

SUPPLEMENTS
• Instructor's Solutions Manual: This manual, prepared by the authors, is available to the instructor from the publisher. It contains complete solutions, including proofs, for all of the exercises.
• Student's Solutions Manual: Prepared by the authors, this manual contains the complete solutions, including proofs, from the Instructor's Solutions Manual for every third problem (1, 4, 7, etc.) in each exercise set.
• LINTEK: This PC software, discussed above, is included with each copy of the text.

• Testbank: The authors have created a substantial test bank. It is available to the instructor from the publisher at no cost. Note that each multiple-choice problem in the bank can also be requested as open-ended—that is, with the choices omitted—as long as the problem still makes sense. Problems are easily selected and printed using software by Fraleigh. We use this bank extensively, saving ourselves much time—but of course, we made up the problems!

ACKNOWLEDGMENTS
Reviewers of text manuscripts perform a vital function by keeping authors
in touch with reality. We wish to express our appreciation to all the re-
viewers of the manuscript for this edition, including: Michael Ecker, Penn
State University; Steve Pennell, University of Massachusetts, Lowell; Paul-
ine Chow, Harrisburg Area Community College; Murray Eisenberg, Univer-
sity of Massachusetts, Amherst; Richard Blecksmith, Northern Illinois Uni-
versity; Michael Tangredi, College of St. Benedict; Ronald D. Whittekin,
Metropolitan State College; Marvin Zeman, Southern Illinois University at
Carbondale; Ken Brown, Cornell University; and Michal Gass, College of St.
Benedict.
In addition, we wish to acknowledge the contributions of reviewers of the
previous editions: Ross A. Beaumont, University of Washington; Paul Blan-
chard, Boston University; Lawrence O. Cannon, Utah State University; Henry
Cohen, University of Pittsburgh; Sam Councilman, California State Univer-
sity, Long Beach; Daniel Drucker, Wayne State University; Bruce Edwards,
University of Florida; Murray Eisenberg, University of Massachusetts, Am-
herst; Christopher Ennis, University of Minnesota; Mohammed Kazemi,
University of North Carolina, Charlotte; Robert Maynard, Tidewater Com-
munity College; Robert McFadden, Northern Illinois University; W. A.
McWorter, Jr., Ohio State University; David Meredith, San Francisco State
University; John Morrill, DePauw University; Daniel Sweet, University of
Maryland; James M. Sobota, University of Wisconsin—LaCrosse; Marvin
Zeman, Southern Illinois University.
PREFACE Vil

We are very grateful to Victor Katz for providing the excellent historical
notes. His notes are not just biographical information about the contributors
to the subject; he actually offers insight into its development.
Finally, we wish to thank Laurie Rosatone, Cristina Malinn, and Peggy McMahon of Addison-Wesley, and copy editor Craig Kirkpatrick, for their help in the preparation of this edition.
DEPENDENCE CHART

[Dependence chart: a diagram showing how the chapters and sections depend on one another. The central core is Sections 1.1-1.6, 2.1-2.3, 3.1-3.4, 4.1-4.3, 5.1-5.2, 6.1-6.3, and 7.1-7.2; the optional branches shown are Sections 1.7, 1.8, 2.4, 2.5, 3.5, 4.4, 5.3, 6.4-6.5, 8.1-8.4, 9.1-9.4, and 10.1-10.3.]

CONTENTS

PREFACE iii

CHAPTER 1

VECTORS, MATRICES, AND LINEAR SYSTEMS 1


1.1 Vectors in Euclidean Spaces 2
1.2 The Norm and the Dot Product 20
1.3 Matrices and Their Algebra 35
1.4 Solving Systems of Linear Equations 51
1.5 Inverses of Square Matrices 73
1.6 Homogeneous Systems, Subspaces, and Bases 88
1.7 Application to Population Distribution (Optional) 102
1.8 Application to Binary Linear Codes (Optional) 115

CHAPTER 2

DIMENSION, RANK, AND LINEAR


TRANSFORMATIONS 125
2.1 Independence and Dimension 125
2.2 The Rank of a Matrix 136
2.3 Linear Transformations of Euclidean Spaces 142
2.4 Linear Transformations of the Plane (Optional) 154
2.5 Lines, Planes, and Other Flats (Optional) 157
CHAPTER 3
VECTOR SPACES 179
3.1 Vector Spaces 180
3.2 Basic Concepts of Vector Spaces 190
3.3 Coordinatization of Vectors 204
3.4 Linear Transformations 213
3.5 Inner-Product Spaces (Optional) 229

CHAPTER 4
DETERMINANTS 238
4.1 Areas, Volumes, and Cross Products 238
4.2 The Determinant of a Square Matrix 250
4.3 Computation of Determinants and Cramer’s Rule 263
4.4 Linear Transformations and Determinants (Optional) 273

CHAPTER 5

EIGENVALUES AND EIGENVECTORS 286


5.1 Eigenvalues and Eigenvectors 286
5.2 Diagonalization 305
5.3 Two Applications 317

CHAPTER 6

ORTHOGONALITY 326
6.1 Projections 326
6.2 The Gram-Schmidt Process 338
6.3 Orthogonal Matrices 349
6.4 The Projection Matrix 360
6.5 The Method of Least Squares 369

CHAPTER 7

CHANGE OF BASIS 388


7.1 Coordinatization and Change of Basis 389
7.2 Matrix Representations and Similarity 396

CHAPTER 8

EIGENVALUES: FURTHER APPLICATIONS AND


COMPUTATIONS 408
8.1 Diagonalization of Quadratic Forms 409
8.2 Applications to Geometry 418
8.3 Applications to Extrema 430
8.4 Computing Eigenvalues and Eigenvectors 438

CHAPTER 9

COMPLEX SCALARS 454


9.1 Algebra of Complex Numbers 454
9.2 Matrices and Vector Spaces with Complex Scalars 464
9.3 Eigenvalues and Diagonalization 475
9.4 Jordan Canonical Form 486

CHAPTER 10

SOLVING LARGE LINEAR SYSTEMS 502


10.1 Considerations of Time 503
10.2 The LU-Factorization 512
10.3 Pivoting, Scaling, and Ill-Conditioned Matrices 526

APPENDICES A-1
A. Mathematical Induction A-1
B. Two Deferred Proofs A-6

C. LINTEK Routines A-13


D. MATLAB Procedures and Commands Used in the Exercises A-15

ANSWERS TO MOST ODD-NUMBERED EXERCISES A-21

INDEX A-53
CHAPTER 1

VECTORS, MATRICES, AND LINEAR SYSTEMS

We have all solved simultaneous linear equations—for example,

2x + y = 4
x − y = −3.

We shall call any such collection of simultaneous linear equations a linear system. Finding all solutions of a linear system is fundamental to the study of linear algebra. Indeed, the great practical importance of linear algebra stems from the fact that linear systems can be solved by algebraic methods. For example, a linear equation in one unknown, such as 3x = 8, is easy to solve.
But the nonlinear equations x³ + 3x = 1, xˣ = 100, and x − sin x = 1 are all difficult to solve algebraically.
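If you already have access to MATLAB (introduced in the exercises at the end of Section 1.1), even a small system like the one above can be checked numerically. The lines below are only an illustrative sketch: they assume MATLAB's built-in backslash solver, and the matrix notation they use is not explained until Section 1.3.

A = [2 1; 1 -1]    % coefficients of the system 2x + y = 4, x - y = -3
b = [4; -3]        % right-hand sides entered as a column
x = A \ b          % solves the system; here x = 1/3 and y = 10/3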
One often-used technique for dealing with a nonlinear problem consists of
linearizing the problem—that is, approximating the problem with a linear one
that can be solved more easily. Linearization techniques often involve
calculus. If you have studied calculus, you may be familiar with Newton's method for approximating a solution to an equation of the form f(x) = 0; an example would be x − 1 − sin x = 0. An approximate solution is found by solving sequentially several linear equations of the form ax = b, which are obtained by approximating the graph of f with lines. Finding an approximate
numerical solution of a partial differential equation may involve solving a
linear system consisting of thousands of equations in thousands of unknowns.
With the advent of the computer, solving such systems is now possible. The
feasibility of solving huge linear problems makes linear algebra currently one
of the most useful mathematical tools in both the physical and the social sciences.
The study of linear systems and their solutions is phrased in terms of
vectors and matrices. Sections 1.1 and 1.2 introduce vectors in the Euclidean
spaces (the plane, 3-space, etc.) and provide a geometric foundation for our
work. Sections 1.3-1.6 introduce matrices and methods for solving linear systems and study solution sets of linear systems.

1.1 VECTORS IN EUCLIDEAN SPACES

We all know the practicality of two basic arithmetic operations—namely,


adding two numbers and multiplying one number by another. We can regard
the real numbers as forming a line which is a one-dimensional space. In this
section, we will describe a useful way of adding two points in a plane, which is
a two-dimensional space, or two points in three-dimensional space. We will
even describe what is meant by n-dimensional space and define addition of
two points there. We will also describe how to multiply a point in two-, three-, and n-dimensional space by a real number. These extended notions of addition and of multiplication by a real number are as useful in n-dimensional space for n > 1 as they are for the one-dimensional real number line. When these operations are performed in spaces of dimension greater than one, it is conventional to call the elements of the space vectors as well as points. In this section, we describe a physical model that suggests the term vector and that motivates addition of vectors and multiplication of a vector by a number. We then formally define these operations and list their properties.

Euclidean Spaces
Let R be the set of all real numbers. We can regard R geometrically as the Euclidean line—that is, as Euclidean 1-space. We are familiar with rectangular x,y-coordinates in the Euclidean plane. We consider each ordered pair (a, b) of real numbers to represent a point in the plane, as illustrated in Figure 1.1. The set of all such ordered pairs of real numbers is Euclidean 2-space, which we denote by R², and often call the plane.
To coordinatize space, we choose three mutually perpendicular lines as coordinate axes through a point that we call the origin and label 0, as shown in Figure 1.2. Note that we represent only half of each coordinate axis for clarity. The coordinate system in this figure is called a right-hand system because, when the fingers of the right hand are curved in the direction required to rotate the positive x-axis toward the positive y-axis, the right thumb points up the z-axis, as shown in Figure 1.2. The set of all ordered triples (a, b, c) of real numbers is Euclidean 3-space, denoted R³, and often simply referred to as space.
Although a Euclidean space of dimension four or more may be difficult for us to visualize geometrically, we have no trouble writing down an ordered quadruple of real numbers such as (2, −3, 7, 7) or an ordered quintuple such as (−0.3, 3, 2, −5, 21.3), etc. Indeed, it can be useful to do this. A household budget might contain nine categories, and the expenses allowed per week in each category could be represented by an ordered 9-tuple of real numbers. Generalizing, the set Rⁿ of all ordered n-tuples (x₁, x₂, ..., xₙ) of real numbers is Euclidean n-space. Note the use of just one letter with consecutive integer subscripts in this n-tuple, rather than different letters. We will often denote an element of R² by (x₁, x₂) and an element of R³ by (x₁, x₂, x₃).


1 (a, b, c)

(a, 5)
or
b

— Xx
0 a

FIGURE 1.1 FIGURE 1.2


Rectangular coordinates in the plane. Rectangular coordinates in space.

The Physical Notion of a Vector


We are accustomed to visualizing an ordered pair or triple as a point in the
plane or in space and denoting it geometrically by a dot, as shown in Figures
1.1 and 1.2. Physicists have found another very useful geometric interpreta-
tion of such pairs and triples in their consideration of forces acting on a body.
The motion in response to a force depends on the direction in which the force
is applied and on the magnitude of the force—that is, on how hard the force is
exerted. It is natural to represent a force Ly an arrow, pointing in the direction

HISTORICAL NOTE THE IDEA OF AN n-DIMENSIONAL SPACE FOR n > 3 reached acceptance gradually during the nineteenth century; it is thus difficult to pinpoint a first "invention" of this concept. Among the various early uses of this notion are its appearances in a work on the divergence theorem by the Russian mathematician Mikhail Ostrogradskii (1801-1862) in 1836, in the geometrical tracts of Hermann Grassmann (1809-1877) in the early 1840s, and in a brief paper of Arthur Cayley (1821-1895) in 1846. Unfortunately, the first two authors were virtually ignored in their lifetimes. In particular, the work of Grassmann was quite philosophical and extremely difficult to read. Cayley's note merely stated that one can generalize certain results to dimensions greater than three "without recourse to any metaphysical notion with regard to the possibility of a space of four dimensions." Sir William Rowan Hamilton (1805-1865), in an 1841 letter, also noted that "it must be possible, in some way or other, to introduce not only triplets but polyplets, so as in some sense to satisfy the symbolical equation
a = (a₁, a₂, ..., aₙ);
a being here one symbol, as indicative of one (complex) thought; and a₁, a₂, ..., aₙ denoting real numbers, positive or negative."
Hamilton, whose work on quaternions will be mentioned later, and who spent much of his professional life as the Royal Astronomer of Ireland, is most famous for his work in dynamics. As Erwin Schrödinger wrote, "the Hamiltonian principle has become the cornerstone of modern physics, the thing with which a physicist expects every physical phenomenon to be in conformity."

in which the force is acting, and with the length of the arrow representing the magnitude of the force. Such an arrow is a force vector.
Using a rectangular coordinate system in the plane, note that if we consider a force vector to start from the origin (0, 0), then the vector is completely determined by the coordinates of the point at the tip of the arrow. Thus we can consider each ordered pair in R² to represent a vector in the plane as well as a point in the plane. When we wish to regard an ordered pair as a vector, we will use square brackets, rather than parentheses, to indicate this. Also, we often will write vectors as columns of numbers rather than as rows, and bracket notation is traditional for columns. Thus we speak of the point (1, 2) in R² and of the vector [1, 2] in R². To represent the point (1, 2) in the plane, we make a dot at the appropriate place, whereas if we wish to represent the vector [1, 2], we draw an arrow emanating from the origin with its tip at the place where we would plot the point (1, 2). Mathematically, there is no distinction between (1, 2) and [1, 2]. The different notations merely indicate different views of the same member of R². This is illustrated in Figure 1.3. A similar observation holds for 3-space. Generalizing, each n-tuple of real numbers can be viewed both as a point (x₁, x₂, ..., xₙ) and as a vector [x₁, x₂, ..., xₙ] in Rⁿ. We use boldface letters such as a = [a₁, a₂], v = [v₁, v₂, v₃], and x = [x₁, x₂, ..., xₙ] to denote vectors. In written work, it is customary to place an arrow over a letter to denote a vector. The ith entry xᵢ in such a vector is the ith component of the vector. Even the real numbers in R can be regarded both as points and as vectors. When we are not regarding a real number as either a point or a vector, we refer to it as a scalar.
Two vectors v = [v₁, v₂, ..., vₙ] and w = [w₁, w₂, ..., wₘ] are equal if n = m and vᵢ = wᵢ for each i.
A vector containing only zeros as components is called a zero vector and is denoted by 0. Thus, in R² we have 0 = [0, 0], whereas in R⁴ we have 0 = [0, 0, 0, 0].
When denoting a vector v in Rⁿ geometrically by an arrow in a figure, we say that the vector is in standard position if it starts at the origin. If we draw an
FIGURE 1.3 Two views of the same member of R²: (a) the point (1, 2); (b) the vector v = [1, 2].

FIGURE 1.4 v translated to P.   FIGURE 1.5 The vector sum F₁ + F₂.

arrow having the same length and parallel to the arrow representing v but
starting at a point P other than the origin, we refer to the arrow as v translated
to P. This is illustrated in Figure 1.4. Note that we did not draw any coordinate
axes; we only marked the origin 0 and drew the two arrows. Thus we can consider Figure 1.4 to represent a vector v in R², R³, or indeed in Rⁿ for n ≥ 2. We will often leave out axes when they are not necessary for our understanding. This makes our figures both less cluttered and more general.

Vector Algebra
Physicists tell us that if two forces corresponding to force vectors F₁ and F₂ act on a body at the same time, then the two forces can be replaced by a single force, the resultant force, which has the same effect as the original two forces. The force vector for this resultant force is the diagonal of the parallelogram having the force vectors F₁ and F₂ as edges, as illustrated in Figure 1.5. It is natural to consider this resultant force vector to be the sum F₁ + F₂ of the two original force vectors, and it is so labeled in Figure 1.5.

HISTORICAL NOTE THE CONCEPT OF A VECTOR in its earliest manifestation comes from physical considerations. In particular, there is evidence of velocity being thought of as a vector—a quantity with magnitude and direction—in Greek times. For example, in the treatise Mechanica by an unknown author in the fourth century B.C. is written: "When a body is moved in a certain ratio (i.e., has two linear movements in a constant ratio to one another), the body must move in a straight line, and this straight line is the diagonal of the parallelogram formed from the straight lines which have the given ratio." Heron of Alexandria (first century A.D.) gave a proof of this result when the directions were perpendicular. He showed that if a point A moves with constant velocity over a line AB while at the same time the line AB moves with constant velocity along the parallel lines AC and BD so that it always remains parallel to its original position, and that if the time A takes to reach B is the same as the time AB takes to reach CD, then in fact the point A moves along the diagonal AD.
This basic idea of adding two motions vectorially was generalized from velocities to physical forces in the sixteenth and seventeenth centuries. One example of this practice is found as Corollary 1 to the Laws of Motion in Isaac Newton's Principia, where he shows that "a body acted on by two forces simultaneously will describe the diagonal of a parallelogram in the same time as it would describe the sides by those forces separately."

We can visualize two vectors with different directions and emanating from a point P in Euclidean 2-space or 3-space as determining a plane. It is pedagogically useful to do this for n-space for any n ≥ 2 and show helpful figures on our pages. Motivated by our discussion of force vectors above, we consider the sum of two vectors v and w starting at a point P to be the vector starting at P that forms the diagonal of the parallelogram with a vertex at P and having edges represented by v and w, as illustrated in Figure 1.6, where we take the vectors in Rⁿ in standard position starting at 0. Thus we have a geometric understanding of vector addition in Rⁿ. We have labeled as translated v and translated w the sides of the parallelogram opposite the vectors v and w.
Note that arrows along opposite sides of the parallelogram point in the same direction and have the same length. Thus, as a force vector, the translation of v is considered to be equivalent to the vector v, and the same is true for w and its translation. We can think of obtaining the vector v + w by drawing the arrow v from 0 and then drawing the arrow w translated to start from the tip of v as shown in Figure 1.6. The vector from 0 to the tip of the translated w is then v + w. This is often a useful way to regard v + w. To add three vectors u, v, and w geometrically, we translate v to start at the tip of u and then translate w to start at the tip of the translated v. The sum u + v + w then begins at the origin where u starts, and ends at the tip of the translated w, as indicated in Figure 1.7.
The difference v − w of two vectors in Rⁿ is represented geometrically by the arrow from the tip of w to the tip of v, as shown in Figure 1.8. Here v − w is the vector that, when added to w, yields v. The dashed arrow in Figure 1.8 shows v − w in standard position.
If we are pushing a body with a force vector F and we wish to "double the force"—that is, we want to push in the same direction but twice as hard—then it is natural to denote the doubled force vector by 2F. If instead we want to push the body in the opposite direction with one-third the force, we denote the new force vector by −(1/3)F. Generalizing, we consider the product rv of a scalar r times a vector v in Rⁿ to be represented by the arrow whose length is |r| times the length of v and which has the same direction as v if r > 0 but the opposite direction if r < 0. (See Figure 1.9 for an illustration.) Thus we have a geometric interpretation of scalar multiplication in Rⁿ—that is, of multiplication of a vector in Rⁿ by a scalar.

FIGURE 1.6 Representation of v + w in Rⁿ.   FIGURE 1.7 Representation of u + v + w in Rⁿ.

FIGURE 1.8 The vector v − w.   FIGURE 1.9 Computation of rv in R².

Taking a vector v = [v₁, v₂] in R² and any scalar r, we would like to be able to compute rv algebraically as an element (ordered pair) in R², and not just represent it geometrically by an arrow. Figure 1.9 shows the vector 2v, which points in the same direction as v but is twice as long, and shows that we have 2v = [2v₁, 2v₂]. It also indicates that if we multiply all components of v by −1/2, the resulting vector has direction opposite to the direction of v and length equal to 1/2 the length of v. Similarly, if we take two vectors v = [v₁, v₂] and w = [w₁, w₂] in R², we would like to be able to compute v + w algebraically as an element (ordered pair) in R². Figure 1.10 indicates that we have v + w = [v₁ + w₁, v₂ + w₂]—that is, we can simply add corresponding components. With these figures to guide us, we formally define some algebraic operations with vectors in Rⁿ.

DEFINITION 1.1 Vector Algebra in Rⁿ

Let v = [v₁, v₂, ..., vₙ] and w = [w₁, w₂, ..., wₙ] be vectors in Rⁿ. The vectors are added and subtracted as follows:
Vector addition: v + w = [v₁ + w₁, v₂ + w₂, ..., vₙ + wₙ]
Vector subtraction: v − w = [v₁ − w₁, v₂ − w₂, ..., vₙ − wₙ]
If r is any scalar, the vector v is multiplied by r as follows:
Scalar multiplication: rv = [rv₁, rv₂, ..., rvₙ]

As a natural extension of Definition 1.1, we can combine three or more


vectors in R" using addition or subtraction by simply adding or subtracting
their corresponding components. When a scalar in such a combination is
negative, as in 4u + (—7)v + 2w, we usually abbreviate by subtraction, writing
4u — 7v + 2w. We write —v for (—i)v.

FIGURE 1.10 Computation of v + w in R².

EXAMPLE 1 Let v = [−3, 5, −1] and w = [4, 10, −7] in R³. Compute 5v − 3w.
SOLUTION We compute
5v − 3w = 5[−3, 5, −1] − 3[4, 10, −7]
= [−15, 25, −5] − [12, 30, −21]
= [−27, −5, 16].
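As a quick numerical check, the computation of Example 1 can be reproduced with the MATLAB conventions used in the exercises at the end of this section. This is only an illustrative sketch; any software that supports componentwise vector arithmetic would do.

v = [-3 5 -1]    % the vector v of Example 1
w = [4 10 -7]    % the vector w of Example 1
5*v - 3*w        % displays  -27  -5  16, agreeing with the hand computation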

EXAMPLE 2 For vectors v and w in R² pointing in different directions from the origin, represent geometrically 5v − 3w.
SOLUTION This is done in Figure 1.11.

The analogues of many familiar algebraic laws for addition and multipli-
cation of scalars also hold for vector addition and scalar multiplication. For
convenience, we gather them in a theorem.

FIGURE 1.11 5v − 3w in R².

THEOREM 1.1 Properties of Vector Algebra in Rⁿ

Let u, v, and w be any vectors in Rⁿ, and let r and s be any scalars in R.
Properties of Vector Addition
A1  (u + v) + w = u + (v + w)    An associative law
A2  v + w = w + v                A commutative law
A3  0 + v = v                    0 as additive identity
A4  v + (−v) = 0                 −v as additive inverse of v
Properties Involving Scalar Multiplication
S1  r(v + w) = rv + rw           A distributive law
S2  (r + s)v = rv + sv           A distributive law
S3  r(sv) = (rs)v                An associative law
S4  1v = v                       Preservation of scale

The eight properties given in Theorem 1.1 are quite easy to prove, and we
leave most of them as exercises. The proofs in Examples 3 and 4 are typical.

EXAMPLE 3 Prove property A2 of Theorem 1.1.


SOLUTION Writing
v = [v₁, v₂, ..., vₙ] and w = [w₁, w₂, ..., wₙ],
we have
v + w = [v₁ + w₁, v₂ + w₂, ..., vₙ + wₙ]
and
w + v = [w₁ + v₁, w₂ + v₂, ..., wₙ + vₙ].
These two vectors are equal because vᵢ + wᵢ = wᵢ + vᵢ for each i. Thus, the commutative law of vector addition follows directly from the commutative law of addition of numbers.

EXAMPLE 4 Prove property S2 of Theorem 1.1.


SOLUTION Writing v = [v₁, v₂, ..., vₙ], we have
(r + s)v = (r + s)[v₁, v₂, ..., vₙ]
= [(r + s)v₁, (r + s)v₂, ..., (r + s)vₙ]
= [rv₁ + sv₁, rv₂ + sv₂, ..., rvₙ + svₙ]
= [rv₁, rv₂, ..., rvₙ] + [sv₁, sv₂, ..., svₙ]
= rv + sv.
Thus the property (r + s)v = rv + sv involving vectors follows from the analogous property (r + s)aᵢ = raᵢ + saᵢ for numbers.

Parallel Vectors

The geometric significance of multiplication of a vector by a scalar, as


illustrated in Figure 1.9, leads us to this characterization of parallel vectors.

DEFINITION 1.2 Parallel Vectors

Two nonzero vectors v and w in Rⁿ are parallel, and we write v ∥ w, if one is a scalar multiple of the other. If v = rw with r > 0, then v and w have the same direction; if r < 0, then v and w have opposite directions.

EXAMPLE 5 Determine whether the vectors v = [2, 1, 3, −4] and w = [6, 3, 9, −12] are parallel.
SOLUTION We put v = rw and try to solve for r. This gives rise to four component equations:
2 = 6r,  1 = 3r,  3 = 9r,  −4 = −12r.
Because r = 1/3 > 0 is a common solution to the four equations, we conclude that v and w are parallel and have the same direction.
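For vectors with no zero components, this test can also be carried out numerically by forming the componentwise ratios. The following MATLAB-style sketch (using the ./ componentwise division described in the MATLAB exercises) is one possible check, not the method prescribed in the example:

v = [2 1 3 -4];
w = [6 3 9 -12];
v ./ w    % componentwise ratios; every entry is 0.3333, that is, 1/3
% v and w are parallel exactly when all these ratios agree, and have the
% same direction when the common ratio is positive.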

Linear Combinations of Vectors

Definition 1.1 describes how to add or subtract two vectors, but as we remarked following the definition, we can use these operations to combine three or more vectors also. We give a formal extension of that definition.

DEFINITION 1.3 Linear Combination

Given vectors v₁, v₂, ..., vₖ in Rⁿ and scalars r₁, r₂, ..., rₖ in R, the vector
r₁v₁ + r₂v₂ + ··· + rₖvₖ
is a linear combination of the vectors v₁, v₂, ..., vₖ with scalar coefficients r₁, r₂, ..., rₖ.

The vectors [1, 0] and [0, 1] play a very important role in R². Every vector b in R² can be expressed as a linear combination of these two vectors in a unique way—namely, b = [b₁, b₂] = r₁[1, 0] + r₂[0, 1] if and only if r₁ = b₁ and r₂ = b₂. We call [1, 0] and [0, 1] the standard basis vectors in R². They are often denoted by i = [1, 0] and j = [0, 1], as shown in Figure 1.12(a). Thus in R², we may write the vector [b₁, b₂] as b₁i + b₂j. Similarly, we have three standard basis vectors in R³—namely,
i = [1, 0, 0],  j = [0, 1, 0],  and  k = [0, 0, 1],

FIGURE 1.12 (a) Standard basis vectors in R²; (b) standard basis vectors in R³.

as shown in Figure 1.12(b). Every vector in R³ can be expressed uniquely as a linear combination of i, j, and k. For example, we have [3, −2, 6] = 3i − 2j + 6k. For n > 3, we denote the ith standard basis vector, having 1 as the ith component and zeros elsewhere, by
eᵢ = [0, 0, ..., 0, 1, 0, ..., 0],
where the 1 appears in the ith component. We then have
b = [b₁, b₂, ..., bₙ] = b₁e₁ + b₂e₂ + ··· + bₙeₙ.
We see that every vector in Rⁿ appears as a unique linear combination of the standard basis vectors in Rⁿ.
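A short MATLAB-style illustration of this unique expression in R³ (a sketch, not part of the text's exercises): the rows of the identity matrix produced by MATLAB's eye function are the standard basis vectors i, j, and k, and weighting them by the components of b recovers b.

E = eye(3);                      % rows of eye(3) are the standard basis vectors of R^3
3*E(1,:) - 2*E(2,:) + 6*E(3,:)   % rebuilds 3i - 2j + 6k, displaying  3  -2  6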

The Span of Vectors


Let v be a vector in Rⁿ. All possible linear combinations of this single vector v are simply all possible scalar multiples rv for all scalars r. If v ≠ 0, all scalar multiples of v fill a line which we shall call the line along v. Figure 1.13(a) shows the line along the vector [−1, 2] in R², while Figure 1.13(b) indicates the line along a nonzero vector v in R³.
Note that the line along v always contains the origin (the zero vector) because one scalar multiple of v is 0v = 0.
Now let v and w be two nonzero and nonparallel vectors in Rⁿ. All possible linear combinations of v and w are all vectors of the form rv + sw for all scalars r and s. As indicated in Figure 1.14, all these linear combinations fill a plane which we call the plane spanned by v and w.

FIGURE 1.13 (a) The line along v in R²; (b) the line along v in R³.

EXAMPLE 6 Referring to Figure 1.15(a), estimate scalars r and s such that rv + sw = b for the vectors v, w, and b all lying in the plane of the paper.
SOLUTION We draw the line along v, the line along w, and parallels to these lines through the tip of the vector b, as shown in Figure 1.15(b). From Figure 1.15(b), we estimate that b = 1.5v − 2.5w.

FIGURE 1.14 The plane spanned by v and w.

FIGURE 1.15 (a) Vectors v, w, and b; (b) finding r and s so that b = rv + sw.

We now give an analytic analogue of Example 6 for two vectors in R².

EXAMPLE 7 Let v = [1, 3] and w = [−2, 5] in R². Find scalars r and s such that rv + sw = [−1, 19].
SOLUTION Because rv + sw = r[1, 3] + s[−2, 5] = [r − 2s, 3r + 5s], we see that rv + sw = [−1, 19] if and only if both equations
r − 2s = −1
3r + 5s = 19
are satisfied. Multiplying the first equation by −3 and adding the result to the second equation, we obtain
0 + 11s = 22,
so s = 2. Substituting into the equation r − 2s = −1, we find that r = 3.
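Example 7 is exactly the kind of computation MATLAB handles well. A minimal sketch, assuming MATLAB's backslash operator for solving square linear systems (which this text does not introduce at this point): write the coefficients of r and s as the columns of a matrix and solve.

A = [1 -2; 3 5];   % columns hold the coefficients of r and s
b = [-1; 19];
A \ b              % displays 3 and 2, so r = 3 and s = 2, as found above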

We note that the components −1 and 19 of the vector [−1, 19] appear on the right-hand side of the system of two linear equations in Example 7. If we replace −1 by b₁ and 19 by b₂, the same operations on the equations will enable us to solve for the scalars r and s in terms of b₁ and b₂ (see Exercise 42). This shows that all linear combinations of v and w do indeed fill the plane R².
Example 7 indicates that an attempt to express a vector b as a linear combination of given vectors corresponds to an attempt to find a solution of a system of linear equations. This parallel is even more striking if we write our vectors as columns of numbers rather than as ordered rows of numbers—that is, as column vectors rather than as row vectors. For example, if we write the vectors v and w in Example 7 as columns so that

v = [1]   and   w = [-2]
    [3]             [ 5]

and also rewrite [−1, 19] as a column vector, then the row-vector equation rv + sw = [−1, 19] in the statement of Example 7 becomes

r [1] + s [-2] = [-1]
  [3]     [ 5]   [19].

Notice that the numbers in this column-vector equation are in the same positions relative to each other as they are in the system of linear equations
r − 2s = −1
3r + 5s = 19
that we solved in Example 7. Every system of linear equations can be rewritten in this fashion as a single column-vector equation. Exercises 35-38 provide practice in this. Finding scalars r₁, r₂, ..., rₖ such that r₁v₁ + r₂v₂ + ··· + rₖvₖ = b for given vectors v₁, v₂, ..., vₖ and b in Rⁿ is a fundamental computation in linear algebra. Section 1.4 describes an algorithm for finding all possible such scalars r₁, r₂, ..., rₖ.
The preceding paragraph indicates that often it will be natural for us to think of vectors in Rⁿ as column vectors rather than as row vectors.
The transpose of a row vector v is defined to be the corresponding column vector, and is denoted by vᵀ. Similarly, the transpose of a column vector is the corresponding row vector. For example,

                   [-1]           [  2]ᵀ
[-1, 4, 15, -7]ᵀ = [ 4]    and    [-30]  = [2, -30, 45].
                   [15]           [ 45]
                   [-7]

Note that for all vectors v we have (vᵀ)ᵀ = v. As illustrated following Example 7, column vectors are often useful. In fact, some authors always regard every vector v in Rⁿ as a column vector. Because it takes so much page space to write column vectors, these authors may describe v by giving the row vector vᵀ. We do not follow this practice; we will write vectors in Rⁿ as either row or column vectors depending on the context.
Continuing our geometric discussion, we expect that if u, v, and w are three nonzero vectors in Rⁿ such that u and v are not parallel and also w is not a vector in the plane spanned by u and v, then the set of all linear combinations of u, v, and w will fill a three-dimensional portion of Rⁿ—that is, a portion of Rⁿ that looks just like R³. We consider the set of these linear combinations to be spanned by u, v, and w. We make the following definition.

DEFINITION 1.4 Span of v₁, v₂, ..., vₖ

Let v₁, v₂, ..., vₖ be vectors in Rⁿ. The span of these vectors is the set of all linear combinations of them and is denoted by sp(v₁, v₂, ..., vₖ). In set notation,
sp(v₁, v₂, ..., vₖ) = {r₁v₁ + r₂v₂ + ··· + rₖvₖ | r₁, r₂, ..., rₖ ∈ R}.


It is important to note that sp(v₁, v₂, ..., vₖ) in Rⁿ may not fill what we intuitively consider to be a k-dimensional portion of Rⁿ. For example, in R² we see that sp([1, −2], [−3, 6]) is just the one-dimensional line along [1, −2], because [−3, 6] = −3[1, −2] already lies in sp([1, −2]). Similarly, if v₃ is a vector in sp(v₁, v₂), then sp(v₁, v₂, v₃) = sp(v₁, v₂), and so sp(v₁, v₂, v₃) is not three-dimensional. Section 2.1 will deal with this kind of dependency among vectors. As a result of our work there, we will be able to define dimensionality.

SUMMARY
1. Euclidean n-space Rⁿ consists of all ordered n-tuples of real numbers. Each n-tuple x can be regarded as a point (x₁, x₂, ..., xₙ) and represented graphically as a dot, or regarded as a vector [x₁, x₂, ..., xₙ] and represented by an arrow. The n-tuple 0 = [0, 0, ..., 0] is the zero vector. A real number r ∈ R is called a scalar.
2. Vectors v and w in Rⁿ can be added and subtracted, and each can be multiplied by a scalar r ∈ R. In each case, the operation is performed on the components, and the resulting vector is again in Rⁿ. Properties of these operations are summarized in Theorem 1.1. Graphic interpretations are shown in Figures 1.6, 1.8, and 1.9.
3. Two nonzero vectors in Rⁿ are parallel if one is a scalar multiple of the other.
4. A linear combination of vectors v₁, v₂, ..., vₖ in Rⁿ is a vector of the form r₁v₁ + r₂v₂ + ··· + rₖvₖ, where each rᵢ is a scalar. The set of all such linear combinations is the span of the vectors v₁, v₂, ..., vₖ and is denoted by sp(v₁, v₂, ..., vₖ).
5. Every vector in Rⁿ can be expressed uniquely as a linear combination of the standard basis vectors e₁, e₂, ..., eₙ, where eᵢ has 1 as its ith component and zeros for all other components.

EXERCISES

In Exercises 1-4, compute v + w and v − w for the given vectors v and w. Then draw coordinate axes and sketch, using your answers, the vectors v, w, v + w, and v − w.

1. v = [2, −1], w = [−3, −2]
2. v = [1, 3], w = [−2, 5]
3. v = i + 3j + 2k, w = i + 2j + 4k
4. v = 2i − j + 3k, w = 3i + 5j + 4k

In Exercises 5-8, let u = [−1, 3, −2], v = [4, 0, −1], and w = [−3, −1, 2]. Compute the indicated vector.

5. 3u − 2v
6. u + 2(v − 4w)
7. u + v − w
8. 4(3u + 2v − 5w)

In Exercises 9-12, compute the given linear combination of u = [1, 2, 1, 0], v = [−2, 0, 1, 6], and w = [3, −5, 1, −2].

9. u − 2v + 4w
10. 3u + v − w
11. 4u − 2v + 4w
12. −u + 5v + 3w

In Exercises 13-16, reproduce on your paper those vectors in Figure 1.16 that appear in the exercise, and then draw an arrow representing each of the following linear combinations. All of the vectors are assumed to lie in the same plane. Use the technique illustrated in Figure 1.7 when all three vectors are involved.

13. 2u + 3v
14. −3u + 2w
15. u + v + w
16. 2u − v + ½w

In Exercises 17-20, reproduce on your paper those vectors in Figure 1.17 that appear in the exercise, and then use the technique illustrated in Example 6 to estimate scalars r and s such that the given equation is true. All of the vectors are assumed to lie in the same plane.

17. x = ru + sv
18. y = ru + sv
19. u = rx + sv
20. y = ru + sx

In Exercises 21-30, find all scalars c, if any exist, such that the given statement is true. Try to do some of these problems without using pencil and paper.

21. The vector [2, 6] is parallel to the vector [c, −3].
22. The vector [c², −4] is parallel to the vector [1, −2].
23. The vector [c, −c, 4] is parallel to the vector [−2, 2, 20].
24. The vector [c², c³, c⁴] is parallel to the vector [1, −2, 4] with the same direction.
25. The vector [13, −15] is a linear combination of the vectors [1, 5] and [3, c].
26. The vector [−1, c] is a linear combination of the vectors [−3, 5] and [6, −11].
27. i + cj − 3k is a linear combination of i + j and j + 3k.
28. i + cj + (c − 1)k is in the span of i + 2j + k and 3i + 6j + 3k.
29. The vector 3i − 2j + ck is in the span of i + 2j − k and j + 3k.
30. The vector [c, −2c, c] is in the span of [1, −1, 1], [0, 1, −3], and [0, 0, 1].

In Exercises 31-34, find the vector which, when translated, represents geometrically an arrow reaching from the first point to the second.

31. From (−1, 3) to (4, 2) in R²
32. From (−3, 2, 5) to (4, −2, −6) in R³
33. From (2, 1, 5, −6) to (3, −2, 1, 7) in R⁴

FIGURE 1.16 Vectors u, v, and w for Exercises 13-16.   FIGURE 1.17 Vectors u, v, x, and y for Exercises 17-20.



34. From (1, 2, 3, 4, 5) to (−5, −4, −3, −2, −1) in R⁵

35. Write the linear system
3x − 2y + 4z = 10
x − y − 3z = 0
2x + y − 3z = −3
as a column-vector equation.

36. Write the linear system
x₁ − 3x₂ + 2x₃ = −6
3x₁ − 4x₂ + 5x₃ = 12
as a column-vector equation.

37. Write the row-vector equation
p[−3, 4, 6] + q[0, −2, 5] − r[4, −3, 2] + s[6, 0, 7] = [8, −3, 1]
as
a. a linear system,
b. a column-vector equation.

38. Write the column-vector equation
3 1 9 | ;
r| 3)+7ry 13) +7) 0} =|-8
o| --4} Ad
|[-9} .
Lai d
as a linear system.

39. Mark each of the following True or False.
a. The notion of a vector in Rⁿ is useful only if n = 1, 2, or 3.
b. Every ordered n-tuple in Rⁿ can be viewed both as a point and as a vector.
c. It would be impossible to define addition of points in Rⁿ because we only add vectors.
d. If a and b are two vectors in standard position in Rⁿ, then the arrow from the tip of a to the tip of b is a translated representation of the vector a − b.
e. If a and b are two vectors in standard position in Rⁿ, then the arrow from the tip of a to the tip of b is a translated representation of the vector b − a.
f. The span of any two nonzero vectors in R² is all of R².
g. The span of any two nonzero, nonparallel vectors in R² is all of R².
h. The span of any three nonzero, nonparallel vectors in R³ is all of R³.
i. If v₁, v₂, ..., vₖ are vectors in R² such that sp(v₁, v₂, ..., vₖ) = R², then k = 2.
j. If v₁, v₂, ..., vₖ are vectors in R³ such that sp(v₁, v₂, ..., vₖ) = R³, then k = 3.

40. Prove the indicated property of vector addition in Rⁿ, stated in Theorem 1.1.
a. Property A1
b. Property A3
c. Property A4

41. Prove the indicated property of scalar multiplication in Rⁿ, stated in Theorem 1.1.
a. Property S1
b. Property S3
c. Property S4

42. Prove algebraically that the linear system
r − 2s = b₁
3r + 5s = b₂
has a solution for all numbers b₁, b₂ ∈ R, as asserted in the text.

43. Option 1 of the routine VECTGRPH in the software package LINTEK gives graphic quizzes on addition and subtraction of vectors in R². Work with this Option 1 until you consistently achieve a score of 80% or better on the quizzes.

44. Repeat Exercise 43 using Option 2 of the routine VECTGRPH. The quizzes this time are on linear combinations in R², and are quite similar to Exercises 17-20.

MATLAB
The MATLAB exercises are designed to build some familiarity with this widely used
software as you work your way through the text. Complete information can be
obtained from the manual that accompanies MATLAB. Some information is

available on the screen by typing help followed by a space and then the word or
symbol about which you desire information.
The software LINTEK designed explicitly for this text does not require a
manual, because all information is automatically given on the screen. However,
MATLAB is a professionally designed program that is much more powerful than
LINTEK. Although LINTEK adequately illustrates material in the text, the
prospective scientist would be well advised to invest the extra time necessary to
acquire some facility with MATLAB.

Access MATLAB according to the directions for your installation. The MATLAB prompt, which is its request for instructions, looks like >>. Vectors can be entered by entering the components in square brackets, separated by spaces; do not use commas. Enter vectors a, b, c, u, v, and w by typing the lines displayed below. After you type each line, press the Enter key. The vector components will be displayed, without brackets, using decimal notation. Proofread each vector after you enter it. If you have made an error, strike the up-arrow key and edit in the usual fashion to correct your error. (If you do not want a vector displayed for proofreading after entry, you can achieve this by typing ; after the closing square bracket.) If you ever need to continue on the next line to type data, enter at least two periods .. and immediately press the Enter key and continue the data.
a = [2 -4 5 7]
b = [-1 6 7 3]
c = [13 -21 5 39]
u = [2/3 3/5 1/7]
v = [3/2 -5/6 11/3]
w = [5/7 3/4 -2/3]
Now enter u (that is, type u and press the Enter key) to see u displayed again. Then enter format long and then u to see the components of u displayed with more decimal places. Enter format short and then u to return to the display with fewer decimal places. Enter rat(u, 's'), which displays rational approximations that are accurate when the numbers involved are fractions with sufficiently small denominators, to see u displayed again in fraction (rational) format. Addition and subtraction of vectors can be performed using + and −, and scalar multiplication using *. Enter a + b to see this sum displayed. Then enter -3*a to see this scalar product displayed. Using what you have discovered about MATLAB, work the following exercises. Entering who at any time displays a list of variables to which you have assigned numerical, vector, or matrix values. When you have finished, enter quit or exit to leave MATLAB.
M1. Compute 2a − 3b − 5c.
M2. Compute 3c − 4(2a − b).
M3. Attempt to compute a + u. What happens, and why?
M4. Compute u + v in
a. short format
b. long format
c. rational (fraction) format.
M5. Repeat Exercise M4 for 2u − 3v + w by first entering x = 2*u − 3*v + w and then looking at x in the different formats.
M6. Repeat Exercise M4 for (1/3)a − (3/7)b, entering the fractions as (1/3) and (3/7), and using the time-saving technique of Exercise M5.
M7. Repeat Exercise M4 for 0.3u − 0.23w, using the time-saving technique of Exercise M5.
M8. The transpose vᵀ of a vector v is denoted in MATLAB by v'. Compute aᵀ − 3cᵀ and (a − 3c)ᵀ. How do they compare?
M9. Try to compute u + uᵀ. What happens, and why?
M10. Enter help : to see some of the things that can be done using the colon. Use the colon in a statement starting v1 = to generate the vector v1 = [-3 -2 -1 0 1 2], and then generate v2 = [1 4 7 10 13 16] similarly. Compute v1 + 5*v2 in rational format. (We use v1 and v2 in MATLAB where we would use v₁ and v₂ in the text.)
M11. MATLAB can provide graphic illustrations of the component values of a vector. Enter plot(a) and note how the figure drawn reflects the components of the vector a = [2 -4 5 7]. Press any key to clear the screen, and repeat the experiment with bar(a) and finally with stairs(a) to see two other ways to illustrate the component values of the vector a.
M12. Using the plot command, we can plot graphs of functions in MATLAB. The command plot(x, y) will plot the x vector against the y vector. Enter x = -1 : .5 : 1. Note that, using the colons, we have generated a vector of x-coordinates starting at −1 and stopping at 1, with increments of 0.5. Now enter y = x .* x. (A period before an operator, such as .* or ./, will cause that operation to be performed on each component of a vector. Enter help . to see MATLAB explain this.) You will see that the y vector contains the squares of the x-coordinates in the x vector. Enter plot(x, y) to see a crude plot of the graph of y = x² for −1 ≤ x ≤ 1.
a. Use the up-arrow key to return to the colon statement and make the increment .2 rather than .5 to get a better graph. Use the up-arrow key to get to the y = x .* x command, and press the Enter key to generate the new y vector with more entries. Then get to the plot(x, y) command and press Enter to see the improved graph of y = x² for −1 ≤ x ≤ 1.
b. Proceed as in part (a) to graph y = x² for −3 ≤ x ≤ 3 with increments of 0.2. This time, put a semicolon after the command that defines the vector x before pressing the Enter key, so that you don't see the x-coordinates printed out. Similarly, put a semicolon after the command defining the vector y.
c. Plot the graph of y = sin(x) for −4π ≤ x ≤ 4π. The number π can be entered as pi in MATLAB. Remember to use * for multiplication.
d. Plot the graph of y = 3 cos(2x) − 2 sin(x) for −4π ≤ x ≤ 4π. Remember to use * for multiplication.

1.2 THE NORM AND THE DOT PRODUCT

The Magnitude of a Vector


The magnitude ||v|| of v = [v₁, v₂] is considered to be the length of the arrow in Figure 1.18. Using the Pythagorean theorem, we have
||v|| = √(v₁² + v₂²).   (1)

EXAMPLE 1 Represent the vector v = [3, −4] geometrically, and find its magnitude.
SOLUTION The vector [3, −4] has magnitude
||v|| = √(3² + (−4)²) = √25 = 5
and is shown in Figure 1.19.

In Figure 1.20, the magnitude ||v|| of a vector v = [v_1, v_2, v_3] in R^3 appears as
the length of the hypotenuse of a right triangle whose altitude is |v_3| and whose
base in the x_1,x_2-plane has length sqrt(v_1^2 + v_2^2). Using the Pythagorean theorem,
we obtain

    ||v|| = sqrt(v_1^2 + v_2^2 + v_3^2).    (2)

EXAMPLE 2 Represent the vector v = [2, 3, 4] geometrically, and find its magnitude.
SOLUTION The vector v = [2, 3, 4] has magnitude ||v|| = sqrt(2^2 + 3^2 + 4^2) = sqrt(29) and is
represented in Figure 1.21.

FIGURE 1.18  The magnitude of v in R^2.
FIGURE 1.19  The magnitude of [3, -4].
FIGURE 1.20  The magnitude of v in R^3.
FIGURE 1.21  The magnitude of [2, 3, 4].

The magnitude of a vector is also called the norm or the length of the
vector. As suggested by Eqs. (1) and (2), we define the norm ||v|| of a vector v in
R^n as follows.

DEFINITION 1.5 Norm or Magnitude of a Vector in R^n

Let v = [v_1, v_2, ..., v_n] be a vector in R^n. The norm or magnitude of v is

    ||v|| = sqrt(v_1^2 + v_2^2 + ... + v_n^2).    (3)

EXAMPLE 3 Find the magnitude of the vector v = [—2, 1, 3, —1, 4, 2, 1].


SOLUTION We have

    ||v|| = sqrt((-2)^2 + 1^2 + 3^2 + (-1)^2 + 4^2 + 2^2 + 1^2) = sqrt(36) = 6.
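For readers following along in MATLAB, Eq. (3) can be evaluated directly from the components; the following is only an illustrative sketch using the vector of Example 3 (the same formula reappears in the MATLAB exercises at the end of this section).

    v = [-2 1 3 -1 4 2 1];      % the vector of Example 3
    nrm = sqrt(sum(v .* v))     % Eq. (3): square root of the sum of the squared components
    norm(v)                     % MATLAB's built-in norm; both commands print 6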


Here are some properties of this norm operation.

THEOREM 1.2 Properties of the Norm in R^n

For all vectors v and w in R^n and for all scalars r, we have

1. ||v|| ≥ 0, and ||v|| = 0 if and only if v = 0    Positivity
2. ||rv|| = |r| ||v||    Homogeneity
3. ||v + w|| ≤ ||v|| + ||w||    Triangle inequality
FIGURE 1.22  The triangle inequality.

Proofs of Properties 1 and 2 follow immediately from Definition 1.5 and
appear as exercises at the end of this section. Figure 1.22 shows why Property 3
is called the triangle inequality; geometrically, it states that the length of a side
of a triangle is less than or equal to the sum of the lengths of the other two
sides. Although this seems obvious to us from Figure 1.22, we really should
prove it, at least for n > 3, where we simply extended our definition of ||v|| for
v in R^2 or R^3 without any further geometric justification. A proof of the triangle
inequality is given at the close of this section.

Unit Vectors

A vector in R^n is a unit vector if it has magnitude 1. Given any nonzero vector v
in R^n, a unit vector having the same direction as v is given by (1/||v||)v.

EXAMPLE 4 Find a unit vector having the same direction as v = [2, 1, —3], and find a vector
of magnitude 3 having direction opposite to v.
SOLUTION Because ||v|| = sqrt(2^2 + 1^2 + (-3)^2) = sqrt(14), we see that u = (1/sqrt(14))[2, 1, -3] is
the unit vector with the same direction as v, and -3u = (-3/sqrt(14))[2, 1, -3]
is the other required vector.
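A corresponding MATLAB sketch of the normalization (1/||v||)v, again purely as an illustration:

    v = [2 1 -3];
    u = v / norm(v)        % unit vector with the same direction as v
    opp = -3 * u           % vector of magnitude 3 with direction opposite to v
    norm(u), norm(opp)     % these print 1 and 3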

The two-component unit vectors are precisely the vectors that extend from
the origin to the unit circle x^2 + y^2 = 1 with center (0, 0) and radius 1 in R^2. (See
Figure 1.23(a).) The three-component unit vectors extend from (0, 0, 0) to the
unit sphere in R^3, as illustrated in Figure 1.23(b).
Note that the standard basis vectors i and j in R^2, as well as i, j, and k in R^3,
are unit vectors. In fact, the standard basis vectors e_1, e_2, ..., e_n for R^n are unit
vectors, because each has zeros in all components except for one component of
1. For this reason, these standard basis vectors are also called unit coordinate
vectors.
FIGURE 1.23  (a) Typical unit vector in R^2; (b) typical unit vector in R^3.

The Dot Product

The dot product of two vectors is a scalar that we will encounter as we now
try to define the angle θ between two vectors v = [v_1, v_2, ..., v_n] and w =
[w_1, w_2, ..., w_n] in R^n, shown symbolically in Figure 1.24. To motivate the
definition of θ, we will use the law of cosines for the triangle symbolized in
Figure 1.24. Using our definition of the norm of a vector in R^n to compute the
lengths of the sides of the triangle, the law of cosines yields

    ||v||^2 + ||w||^2 = ||v - w||^2 + 2||v|| ||w|| (cos θ)

or

    v_1^2 + ... + v_n^2 + w_1^2 + ... + w_n^2
        = (v_1 - w_1)^2 + ... + (v_n - w_n)^2 + 2||v|| ||w|| (cos θ).    (4)

After computing the squares on the right-hand side of Eq. (4) and simplifying,
we obtain

    ||v|| ||w|| (cos θ) = v_1 w_1 + ... + v_n w_n.    (5)

FIGURE 1.24
The angle between v and w.

The sum of products of corresponding components in the vectors v and w on


the right-hand side of Eq. (5) is frequently encountered, and is given a special
name and notation.

DEFINITION 1.6 The Dot Product

The dot product of vectors v = [v_1, v_2, ..., v_n] and w = [w_1, w_2, ..., w_n]
in R^n is the scalar given by

    v · w = v_1 w_1 + v_2 w_2 + ... + v_n w_n.    (6)

The dot product is sometimes called the inner product or the scalar
product. To avoid possible confusion with scalar multiplication, we shall never
use the latter term.
In view of Definition 1.6, we can write Eq. (5) as

    v · w = ||v|| ||w|| (cos θ).    (7)

Equation (7) suggests the following definition of the angle θ between two
vectors v and w in R^n.

The angle between nonzero vectors v and w is arccos((v · w)/(||v|| ||w||)).    (8)

Expression (8) makes sense, provided that

    -1 ≤ (v · w)/(||v|| ||w||) ≤ 1,    (9)

so that we can indeed compute the arccosine of (v · w)/(||v|| ||w||). This inequality
(9) is usually rewritten in the form

    |v · w| ≤ ||v|| ||w||.    Schwarz inequality    (10)

We obtained it by assuming that Figure 1.24 is an appropriate representation
for vectors v and w in R^n. We give a purely algebraic proof of it at the end of this
section to validate the definition in expression (8).

EXAMPLE 5 Find the angle θ between the vectors [1, 2, 0, 2] and [-3, 1, 1, 5] in R^4.
SOLUTION We have

    cos θ = ([1, 2, 0, 2] · [-3, 1, 1, 5]) / (sqrt(1^2 + 2^2 + 0^2 + 2^2) sqrt((-3)^2 + 1^2 + 1^2 + 5^2)) = 9/(3 · 6) = 1/2.

Thus, θ = 60°.
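In MATLAB, expression (8) can be evaluated with the elementwise commands introduced in the exercises of Section 1.1; this sketch simply repeats the computation of Example 5 and is not part of the text's exercise sets.

    v = [1 2 0 2];  w = [-3 1 1 5];
    costheta = sum(v .* w) / (norm(v)*norm(w));   % (v . w)/(||v|| ||w||)
    theta = acos(costheta) * 180/pi               % the angle in degrees; prints 60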

Equation 7 gives a geometric meaning for the dot product.

The dot product of two vectors is equal to the product of their


magnitudes with the cosine of the angle between them.

THEOREM 1.3 Properties of the Dot Product in R^n

Let u, v, and w be vectors in R^n and let r be any scalar in R. The
following properties hold:

D1. v · w = w · v    Commutative law
D2. u · (v + w) = u · v + u · w    Distributive law
D3. r(v · w) = (rv) · w = v · (rw)    Homogeneity
D4. v · v ≥ 0, and v · v = 0 if and only if v = 0.    Positivity

Verification of all of the properties in Theorem 1.3 is straightforward, as


illustrated in the following example.

HISTORICAL NOTE The Schwarz inequality is due independently to Augustin-Louis Cauchy
(1789-1857) (see note on page 3), Hermann Amandus Schwarz (1843-1921), and Viktor
Yakovlevich Bunyakovsky (1804-1889).
It was first stated as a theorem about coordinates in an appendix to Cauchy's 1821 text for his
course on analysis at the Ecole Polytechnique, as follows:

    |aα + a'α' + a''α'' + ...| ≤ sqrt(a^2 + a'^2 + a''^2 + ...) sqrt(α^2 + α'^2 + α''^2 + ...).

Cauchy's proof follows from the algebraic identity

    (aα + a'α' + a''α'' + ...)^2 + (aα' - a'α)^2 + (aα'' - a''α)^2 + ... + (a'α'' - a''α')^2 + ...
        = (a^2 + a'^2 + a''^2 + ...)(α^2 + α'^2 + α''^2 + ...).

Bunyakovsky proved the inequality for functions in 1859; that is, he stated the result

    (∫_a^b f(x)g(x) dx)^2 ≤ ∫_a^b f^2(x) dx · ∫_a^b g^2(x) dx,

where we can consider ∫_a^b f(x)g(x) dx to be the inner product of the functions f(x), g(x) in the
vector space of continuous functions on [a, b]. Bunyakovsky served as vice-president of the St.
Petersburg Academy of Sciences from 1864 until his death. In 1875, the Academy established a
mathematics prize in his name in recognition of his 50 years of teaching and research.
Schwarz stated the inequality in 1884. In his case, the vectors were functions φ, χ of two
variables in a region T in the plane, and the inner product of these functions was given by
∫∫_T φχ dx dy, where this integral is assumed to exist. The inequality then states that

    (∫∫_T φχ dx dy)^2 ≤ (∫∫_T φ^2 dx dy)(∫∫_T χ^2 dx dy).

Schwarz's proof is similar to the one given in the text (page 29). Schwarz was the leading
mathematician in Berlin around the turn of the century; the work in which the inequality appears
is devoted to a question about minimal surfaces.
EXAMPLE 6 Verify the positivity property D4 of Theorem 1.3.
SOLUTION We let v = [v_1, v_2, ..., v_n], and we find that

    v · v = v_1^2 + v_2^2 + ... + v_n^2.

A sum of squares is nonnegative and can be zero if and only if each summand
is zero. But a summand v_i^2 is itself a square, and will be zero if and only if v_i =
0. This completes the demonstration.

It is important to observe that the norm of a vector can be expressed in
terms of its dot product with itself. Namely, for a vector v in R^n we have

    ||v||^2 = v · v.    (11)

Letting v = [v_1, v_2, ..., v_n], we have

    v · v = v_1 v_1 + v_2 v_2 + ... + v_n v_n = ||v||^2.

Equation 11 enables us to use the algebraic properties of the dot product in


Theorem 1.3 to prove things about the norm. This technique is illustrated in
the proof of the Schwarz and triangle inequalities at the end of this section.
Here is another illustration.

EXAMPLE 7 Show that the sum of the squares of the lengths of the diagonals of a
parallelogram in R^n is equal to the sum of the squares of the lengths of the
sides. (This is the parallelogram relation.)
SOLUTION We take our parallelogram with vertex at the origin and with vectors v and w
emanating from the origin to form two sides, as shown in Figure 1.25. The
lengths of the diagonals are then ||v + w|| and ||v - w||. Using Eq. (11) and
properties of the dot product, we have

    ||v + w||^2 + ||v - w||^2
        = (v + w) · (v + w) + (v - w) · (v - w)
        = (v · v) + 2(v · w) + (w · w) + (v · v) - 2(v · w) + (w · w)
        = 2(v · v) + 2(w · w)
        = 2||v||^2 + 2||w||^2,

which is what we wished to prove.
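A quick numerical spot check of the parallelogram relation in MATLAB; the two vectors below are arbitrary and serve only as an illustration.

    v = [1 -2 4];  w = [3 0 5];
    lhs = norm(v + w)^2 + norm(v - w)^2    % sum of the squared diagonal lengths
    rhs = 2*norm(v)^2 + 2*norm(w)^2        % sum of the squared side lengths; equals lhs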

The definition of the angle θ between two vectors v and w in R^n leads naturally
to this definition of perpendicular vectors, or orthogonal vectors as they are
usually called in linear algebra.
FIGURE 1.25  The parallelogram has v + w and v - w as vector diagonals.

DEFINITION 1.7 Perpendicular or Orthogonal Vectors

Two vectors v and w in R^n are perpendicular or orthogonal, and we
write v ⊥ w, if v · w = 0.

EXAMPLE 8 Determine whether the vectors v = [4, 1, -2, 1] and w = [3, -4, 2, -4] are
perpendicular.
SOLUTION We have

    v · w = (4)(3) + (1)(-4) + (-2)(2) + (1)(-4) = 0.

Thus, v ⊥ w.

Application to Velocity Vectors and Navigation


The next two examples are concerned with another important physical vector
model. A vector is the velocity vector of a moving object at an instant if it
points in the direction of the motion and if its magnitude is the speed of the
object at that instant. Physicists tell us that if a boat cruising with a heading
and speed that would give it a still-water velocity vector s is also subject to a
current that has velocity vector c, then the actual velocity vector of the boat is
v = s + c.

EXAMPLE 9 Suppose that a ketch is sailing at 8 knots, following a course of 010° (that is, 10°
east of north), on a bay that has a 2-knot current setting in the direction 070°
(that is, 70° east of north). Find the course and speed made good. (The
expression made good is standard navigation terminology for the actual course
and speed of a vessel over the bottom.)

SOLUTION The velocity vectors s for the ketch and c for the current are shown in Figure
1.26, in which the vertical axis points due north. We find s and c by using a
calculator and computing
s = [8 cos 80°, 8 sin 80°] = [1.39, 7.88]

and

c = [2 cos 20°, 2 sin 20°] = [1.88, 0.684].


By adding s and c, we find the vector v representing the course and speed of the
ketch over the bottom—that is, the course and speed made good. Thus we
have v = s + c = [3.27, 8.56]. Therefore, the speed of the ketch is

    ||v|| = sqrt((3.27)^2 + (8.56)^2) ≈ 9.16 knots,

and the course made good is given by

    90° - arctan(8.56/3.27) ≈ 90° - 69° = 21°.

That is, the course is 021°.
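The arithmetic of Example 9 is easy to reproduce in MATLAB. The sketch below writes the velocity vectors in [east, north] components, as in the solution above; it is offered only as an illustration of adding velocity vectors.

    deg = pi/180;                        % factor converting degrees to radians
    s = 8*[cos(80*deg) sin(80*deg)];     % ketch: 8 knots, course 010 (80 degrees from east)
    c = 2*[cos(20*deg) sin(20*deg)];     % current: 2 knots, setting 070
    v = s + c;                           % velocity made good
    speed = norm(v)                      % about 9.16 knots
    course = 90 - atan(v(2)/v(1))/deg    % about 21, that is, course 021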

EXAMPLE 10 Suppose the captain of our ketch realizes the importance of keeping track of
the current. He wishes to sail in 5 hours to a harbor that bears 120° and is 35
nautical miles away. That is, he wishes to make good the course 120° and the
speed 7 knots. He knows from a tide and current table that the current is
setting due south at 3 knots. What should be his course and speed through the
water?

FIGURE 1.26  The vector v = s + c.
FIGURE 1.27  The vector s = v - c.

SOLUTION In a vector diagram (see Figure 1.27), we again represent the course and speed
to be made good by a vector v and the velocity of the current by c. The correct
course and speed to follow are represented by the vector s, which is obtained
by computing

    s = v - c
      = [7 cos 30°, -7 sin 30°] - [0, -3]
      = [6.06, -3.5] - [0, -3] = [6.06, -0.5].

Thus the captain should steer course 90° - arctan(-0.5/6.06) ≈ 90° + 4.7° = 94.7°
and should proceed at

    ||s|| = sqrt((6.06)^2 + (-0.5)^2) ≈ 6.08 knots.
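The corresponding sketch for Example 10, where the still-water velocity s is found by subtraction (again only an illustration, using the same [east, north] convention):

    deg = pi/180;
    v = 7*[cos(-30*deg) sin(-30*deg)];   % velocity to be made good: 7 knots on course 120
    c = [0 -3];                          % current: 3 knots setting due south
    s = v - c;                           % required still-water velocity
    speed = norm(s)                      % about 6.08 knots
    course = 90 - atan(s(2)/s(1))/deg    % about 94.7 degrees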

Proofs of the Schwarz and Triangle Inequalities

The proofs of the Schwarz and triangle inequalities illustrate the use of
algebraic properties of the dot product in proving properties of the norm.
Recall Eq. (11): for a vector v in R^n, we have

    ||v||^2 = v · v.

THEOREM 1.4 Schwarz Inequality

Let v and w be vectors in R^n. Then |v · w| ≤ ||v|| ||w||.

PROOF Because the norm of a vector is a real number and the square of a real
number is nonnegative, for any scalars r and s we have

    ||rv + sw||^2 ≥ 0.    (12)

Using relation (11), we find that

    ||rv + sw||^2 = (rv + sw) · (rv + sw)
        = r^2(v · v) + 2rs(v · w) + s^2(w · w) ≥ 0

for all choices of scalars r and s. Setting r = w · w and s = -(v · w), the
preceding inequality becomes

    (w · w)^2(v · v) - 2(w · w)(v · w)^2 + (v · w)^2(w · w)
        = (w · w)^2(v · v) - (w · w)(v · w)^2 ≥ 0.

Factoring out (w · w), we see that

    (w · w)[(w · w)(v · v) - (v · w)^2] ≥ 0.    (13)

If w · w = 0, then w = 0 by the positivity property in Theorem 1.3, and the
Schwarz inequality is then true because it reduces to 0 = 0. If ||w||^2 = w · w ≠ 0,
then the expression in square brackets in relation (13) must also be nonnega-
tive; that is,

    (w · w)(v · v) - (v · w)^2 ≥ 0,

and so

    (v · w)^2 ≤ (v · v)(w · w) = ||v||^2 ||w||^2.

Taking square roots, we obtain the Schwarz inequality.

The Schwarz inequality can be used to prove the triangle inequality that
was illustrated in Figure 1.22.

THEOREM 1.5 The Triangle Inequality

Let v and w be vectors in R^n. Then ||v + w|| ≤ ||v|| + ||w||.

PROOF Using properties of the dot product, as well as the Schwarz inequality,
we have

    ||v + w||^2 = (v + w) · (v + w)
        = (v · v) + 2(v · w) + (w · w)
        ≤ (v · v) + 2||v|| ||w|| + (w · w)
        = ||v||^2 + 2||v|| ||w|| + ||w||^2
        = (||v|| + ||w||)^2.

The desired relation follows at once, by taking square roots.
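Both inequalities are also easy to test numerically; this MATLAB sketch evaluates the two sides of each for one pair of sample vectors and is only an illustration.

    v = [1 -1 2 3];  w = [7 0 1 3];
    abs(sum(v .* w)) <= norm(v)*norm(w)    % Schwarz inequality; prints 1 (true)
    norm(v + w) <= norm(v) + norm(w)       % triangle inequality; prints 1 (true)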

SUMMARY

Let v = [v_1, v_2, ..., v_n] and w = [w_1, w_2, ..., w_n] be vectors in R^n.

1. The norm or magnitude of v is ||v|| = sqrt(v_1^2 + v_2^2 + ... + v_n^2).
2. The norm satisfies the properties given in Theorem 1.2.
3. A unit vector is a vector of magnitude 1.
4. The dot product of v and w is v · w = v_1 w_1 + v_2 w_2 + ... + v_n w_n.
5. The dot product satisfies the properties given in Theorem 1.3.
6. Moreover, we have v · v = ||v||^2 and |v · w| ≤ ||v|| ||w|| (Schwarz inequality),
   and also ||v + w|| ≤ ||v|| + ||w|| (triangle inequality).
7. The angle θ between the vectors v and w can be found by using the relation
   v · w = ||v|| ||w|| (cos θ).
8. The vectors v and w are orthogonal (perpendicular) if v · w = 0.

EXERCISES

In Exercises 1-17, letw = [(-1. 3,


4], v= 24. Prove that the angle between two unit
9.1, —1], aad w = {-2. -1, 3]. Find the vectors u, and u, in R" is arccos(u, - U,).
indicated quantity.

1. ||-ull In Exercises 25-30, classify the vectors as


parallel, perpendicular, or neither. If they are
2. [ll parallel, state whether they have the same
3. lu + vil direction or opposite directions.
4. |lv — 2ul|
5, |j3u-v + 2w|| 25. [-1,4] and (8, 2]
6. [kewl 26. [-2, 1} and [5, 2]
7, The unit vector parallel to u, having the 27. (3, 2, 1] and (-9, -6, 3]
same direction 28. [2, 1,4, -1] and [0, 1, 2, 4]
. The unit vector parallel to w, having thc 29. [40, 4, -1, 8] and [—5, -2, 3, —4]
a

opposite direction
30. [4, 1, 2, 1, 6] and [8, 2, 4, 2, 3]
u:v¥
31. The distance between points {¥,, v.,... , ¥,)
. u:(v+w) and (w,, Ww, ..., W,) in R’ is the norm
. (uty)-w |v — w||, where v = |v, v.,..., ¥,] and w =
. The angle between u and v [w,, Ww, ..., W,]. Why is this a reasonable
definition of distance?
. The angle between u and w
. The value ofx such that [x, —3. 5] is
perpendicular to u In Exercises 32-35, use the definition given in
15. The value of » such that ([— 3, 1, 10] is Exercise 3! to find the indicated distance.
perpendicular to u
16. A nonzero vector perpendicular to both u 32. The distance from (—1, 4, 2) to (0, 8, 1) in
and v R3

17. A nonzero vector perpendicular to both u 33. The distance from (2, —1, 3) to (4, 1, —2) in
and w R3

34. The distance from (3, 1, 2, 4) to (-1, 2, 1, 2)


in R?
In Exercises 18-21, use properties of the dot 35. The distance from (—1, 2, 1, 4, 7, —3) to
product and norm to compute the indicated (2, 1, -3, 5, 4, 5) in RS
quantities mentally, without pencil or paper (or 36. The captain of a barge wishes to get to a
calculator). point directly across a straight river that
runs from north to south. If the current
18. {|[42, 14) flows directly downstream at 5 knots and the
19. ||[10, 20, 25. -15]]| barge steams at 13 knots, in what direction
20. should the captain steer the barge?
[14, 21, 28] - [4, 8, 20]
21, [12, —36, 24] - [25, 30, 10] 37. A 100-Ib weight is suspended by a rope
passed through an eyelet on top of the
22. Find the angle between [1, -1, 2, 3, 0, 4]
and [7, 0, 1, 3, 2, 4] in R^6.
23. Prove that (2, 0, 4), (4, 1, -1), and (6, 7, 7)
are vertices of a right triangle in R^3.

FIGURE 1.28  Both halves of the rope make an angle of 30° with the vertical.
FIGURE 1.29  Two ropes tied at the eyelet and making angles θ_1 and θ_2 with the vertical.

along the two halves of the rope at the evelet 40. Mark each of the following True or False.
must be an upward vertical vector of ___ a. Every nonzero vector in R” has nonzero
magnitude 100.] magnitude.
—— b. Every vector of nonzero magnitude in R’
38. a. Answer Exercise 37 if each half of the 1S nonzero.
rope makes an angle of 6 with the vertical ___c. The magnitude ofv + w must be at least
at the eyelet. . ;
: oo , . as large as the magnitude of either v or
b. Find the tension in the rope if both in Ree ° .
halves are vertical (@ = 0). d . Every nonzero vector v in tk” has exactly
c. What happens if an attempt is made to —_ one unit vector parallel to it.
stretch the rope out straight (horizontal)
—___ e. There are exactly two unit vectors
while the 100-lb weight hangs on it?
parallel to any given nonzero vector in
39. Suppose that a weight of 100 Ib is suspended R’.
by two different ropes tied at an eyelet on ____ f. There are exactly two unit vectors
top of the weight, as shown in Figure 1.29. perpendicular to any given nonzero
Let the angles the ropes make with the vector in R’.
vertical be 6, and 6,, as shown in the figure. — 8. The angle between two nonzero vectors
Let the tensions in the ropes be 7, fer the in R” is less than 90° if and only if the
right-hand rope and 7, for the lefi-uand dot product of the vectors is positive.
rope. __h. The dot product . of a vector with itself
a. Show that the force vector F, shown in _ yields the magnitude of the vector.
Figure 1.29 is T(sin 6)i + T,(cos 8,)j. ___ i. Fora vector v in R", the magnitude of r
b. Find the corresponding expression for F, times v is r times the magnitude of v.
in terms of 7, and 6). __ j._Ifv and w are vectors in R” of the same
c. If the system is in equilibrium, F, + F, = magnitude, then the magnitude of v — w
100j, so F, + F, must have i-component is 0.
() and j-component 100. Write two 41. Prove the indicated property of the norm
equations reflecting this fact, using the stated in Theorem 1.2.
answers to parts (a) and (b). a. The positivity property
d. Find 7, and 7; if 6, = 45° and 6, = 30°. b. The homogeneity property

42. Prove the indicated property of the dot


product stated in Theorem 1.3.
a. The commutative law
pb. The distributive law
c. The homogeneity property
43. For vectors v and w in R^n, prove that v - w
and v + w are perpendicular if and only if
||v|| = ||w||.
44. For vectors u, v, and w in R^n and for scalars r
and s, prove that, if w is perpendicular to
both u and v, then w is perpendicular to
ru + sv.

45. Use vector methods to prove that the
diagonals of a rhombus (parallelogram with
equal sides) are perpendicular. [HINT: Use a
figure similar to Figure 1.25 and one of the
preceding exercises.]
46. Use vector methods to prove that the
midpoint of the hypotenuse of a right
triangle is equidistant from the three
vertices. [HINT: See Figure 1.30. Show that
||(1/2)(v + w)|| = ||(1/2)(v - w)||.]

FIGURE 1.30  The vector (1/2)(v + w) to the midpoint of the hypotenuse.

MATLAB
MATLAB has a built-in function norm(x) for computing the norm of a vector x. It
has no built-in command for finding a dot product or the angle between two vectors.
Because one purpose of these exercises is to give practice at working with MATLAB,
we will show how the norm of a vector can be computed without using the built-in
function, as well as how to compute dot products and angles between vectors.
It is important to know how to enter data into MATLAB. In Section 1.1, we
showed how to enter a vector. We have created M-files on the LINTEK disk that
can be used to enter data automatically for our exercises, once practice in manual
data entry has been provided. If these M-files have been copied into your MATLAB,
you can simply enter fbc1s2 to create the data vectors a, b, c, u, v, and w for the
exercises below. The name of the file containing them is FBC1S2.M, where the
FBC1S2 stands for "Fraleigh/Beauregard Chapter 1 Section 2." To view this data
file so that you can create data files of your own, if you wish, simply enter type
fbcls2 when in MATLAB. In addition to saving time, the data files help prevent
wrong answers resulting from typos in data entry.

Access MATLAB, and either enter fbc1s2 or manually enter the following vectors.
a=[-2135]] u = [2/3 -4/7 8/5]
b= [4-1235] vy = [-1/2 13/3 17/11]
c= [-103064] w = [22/7 15/2 —8/3]
34 CHAPTER 1 VECTORS, MATRICES, AND LINEAR SYSTEMS

If you enter these vectors manually, be sure to proofread your data entry for accuracy.
Enter help . to see what can be done with the period. Enter a .* b and compare the
resulting vector with the vectors a and b. (Be sure to get in the habit of putting a space
before the period so that MATLAB will never interpret it as a decimal point in this
context.) Now enter c .^ 3 and compare with the vector c. The symbol ^ is used to
denote exponentiation. Then enter sum(a) and note how this number was obtained
from the vector a. Using the . notation, the sum function sum(x), the square root
function sqrt(x), and the arccosine function acos(x), we can easily write formulas for
computing norms of vectors, dot products of vectors, and the angle between two
vectors.

M1. Enter x = a and then enter normx = sqrt(sum(x .* x)) to find ||a||. Compare
your answer with the result obtained by entering norm(a).
M2. Find ||b|| by entering x = b and then pressing the upward arrow until the
equation normx = sqrt(sum(x .* x)) is at the cursor, and then pressing the
Enter key.
M3. Using the technique outlined in Exercise M2, find ||u||.
M4. Using the appropriate MATLAB commands, compute the dot product v · w in
(a) short format and (b) rational format.
M5. Repeat Exercise M4 for (2u - 3v) · (4u - 7v).

NOTE. If you are working with your own personal MATLAB, you can add a function angl(x, y) for
finding the angle between vectors x and y having the same number of components to MATLAB’s
supply of available functions. First, enter help angl to be sure that MATLAB does not already have
a command with the name you will use; otherwise you might delete an existing MATLAB
function. Then, assuming that MATLAB has created a subdirectory C:\MATLAB\MATLAB on
your disk, get out of MATLAB and either use a word processor that will create ASCII files
or skip to the next paragraph. Using a word processor, create an ASCII file designated as
C:\MATLAB\MATLAB\ANGL.M by entering each of the following lines.
function z = angl(x, y)
% ANGL  ANGL(x, y) is the radian angle between vectors x and y.
z = acos(sum(x .* y)/(norm(x)*norm(y)))
Then save the file. This creates a function angl(x, y) of two variables x and y in place of the name
anglxy we use in M6. You will now be able to compute the angle between vectors x and y in
MATLAB simply by entering angl(x, y). Note that the file name ANGL.M concludes with the .M
suffix. MATLAB comes with a number of these M-files, which are probably in the subdirectory
MATLAB\MATLAB of your disk. Remember to enter help angl from MATLAB first, to be sure
there is not already a file with the name ANGL.M.
If you do not have a word processor that writes ASCII files, you can still create the file from
DOS if your hard disk is the default c-drive. First enter CD C:\MATLAB\MATLAB. Then enter
COPY CON ANGL.M and proceed to enter the three lines displayed above. When you have
pressed Enter after the final line, press the F6 key and then press Enter again.
After creating the file, access MATLAB and test your function angl(x, y) by finding the angle
between the vectors [1, 0] and [-1, 0]. The angle should be π ≈ 3.1416. Then enter help angl and
you should see the explanatory note on the line of the file that starts with % displayed on the
screen. Using these directions as a model, you can easily create functions of your own to add to
MATLAB.

M6. Enter x = a and enter y = b. Then enter

anglxy = acos(sum(x .* y)/(norm(x)*norm(y)))

to compute the angle (in radians) between a and b. You should study this
formula until you understand why it provides the angle between a and b.
M7. Compute the angle between b and c using the technique suggested in Exercise
M2. Namely, enter x = b, enter y = c, and then use the upward arrow until
the cursor is at the formula for anglxy and press Enter.
M8. Move the cursor to the formula for anglxy and edit the formula so that the
angle will be given in degrees rather than in radians. Recall that we multiply
by 180/π to convert radians to degrees. The number π is available as pi in
MATLAB. Check your editing by computing the angle between the vectors
[1, 0] and [0, 1]. Then find the angle between u and w in degrees.
M9. Find the angle between 3u - 2w and 4v + 2w in degrees.

1.3 MATRICES AND THEIR ALGEBRA

The Notation Ax = b

We saw in Section 1.1 that we can write a linear system such as

    x_1 - 2x_2 = -1
    3x_1 + 5x_2 = 19    (1)


in the unknowns x_1 and x_2 as a single column vector equation, namely,

    x_1 [1] + x_2 [-2]  =  [-1]    (2)
        [3]       [ 5]     [19]

Another useful way to abbreviate this linear system is

    [1  -2] [x_1]  =  [-1]    (3)
    [3   5] [x_2]     [19]
       A      x         b

Let us denote by A the bracketed array on the left containing the coefficients of
the linear system. This array A is followed by the column vector x of
unknowns, and let the column vector of constants after the equal sign be
denoted by b. We can then symbolize the linear system as

Ax = b. (4)
There are several reasons why notation (4) is a convenient way to write a linear
system. It is much easier to denote a general linear system by Ax = b than to
write out several linear equations with unknowns x_1, x_2, ..., x_n, subscripted
letters for the coefficients of the unknowns, and constants b_1, b_2, ..., b_m, to the

right of the equal signs. [Just look ahead at Eq. (1) on page 54.] Also, a single
linear equation in just one unknown can be written in the form ax = b(2x = 6,
for example), and the notation Ax = b is suggestively similar. Furthermore,
we will see in Section 2.3 that we can regard such an array A as defining a
function whose value at x we will write as Ax, much as we write sin x. Solv-
ing a linear system Ax = b can thus be regarded as finding the vector x
such that this function applied to x yields the vector b. For all of these rea-
sons, the notation Ax = b for a linear system is one of the most useful nota-
tions in mathematics.
It is very important to remember that

Ax is equal to a linear combination of the column vectors of A,

as illustrated by Eqs. (2) and (3), namely,

    [1  -2] [x_1]  =  x_1 [1] + x_2 [-2].    (5)
    [3   5] [x_2]         [3]        [ 5]
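This column point of view is easy to experiment with in MATLAB (matrix entry in MATLAB is reviewed at the end of this section); the sketch below checks Eq. (5) for one particular choice of x and is only an illustration.

    A = [1 -2; 3 5];  x = [4; 2];
    A*x                           % the product Ax
    x(1)*A(:,1) + x(2)*A(:,2)     % the same linear combination of the columns of A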

The Notion of a Matrix

We now introduce the usual terminology and notation for an array of numbers
such as the coefficient array A in Eq. (3).
A matrix is an ordered rectangular array of numbers, usually enclosed in
parentheses or square brackets. For example,

        [1  -2]              [-1   0   3]
    A = [3   5]    and   B = [ 2   1   4]
                             [ 4   5  -6]
                             [-3  -1  -1]

are matrices. We will generally use upper-case letters to denote matrices.


The size of a matrix is specified by the number of (horizontal) rows and the
number of (vertical) columns that it contains. The matrix A above contains
two rows and two columns and is called a 2 x 2 (read "2 by 2") matrix.
Similarly, B is a 4 x 3 matrix. In writing the notation m x n to describe the
shape of a matrix, we always write the number of rows first. An n x n matrix
has the same number of rows as columns and is said to be a square matrix. We
recognize that a 1 x n matrix is a row vector with n components, and an m x 1
matrix is a column vector with m components. The rows of a matrix are its row
vectors and the columns are its column vectors.
Double subscripts are commonly used to indicate the location of an entry
in a matrix that is not a row or column vector. The first subscript gives the
number of the row in which the entry appears (counting from the top), and the

second subscript gives the number of the column (counting from the left).
Thus an m x n matrix A may be written as

    A = [a_ij] = [a_11  a_12  a_13  ...  a_1n]
                 [a_21  a_22  a_23  ...  a_2n]
                 [ ...                       ]
                 [a_m1  a_m2  a_m3  ...  a_mn].

If we want to express the matrix B on page 36 as [b_ij], we would have b_11 = -1,
b_21 = 2, b_32 = 5, and so on.

Matrix Multiplication
We are going to consider the expression Ax shown in Eq. (3) to be the product
of the matrix. A and the column vector x. Looking back at Eq. (5), we see that
Such a product of a matrix A with a column vector x should be the linear
combination of the column vectors of A having as coefficients the components
in the vector x. Here is a nonsquare example in which we replace the vector x of
unknowns by a specific vector of numbers.

EXAMPLE 1 Write as a linear combination and then compute the product

    [ 2  -3   5] [-2]
    [-1   4  -7] [ 5]
                 [ 8]

HISTORICAL NOTE The term matrix is first mentioned in mathematical literature in an 1850
paper of James Joseph Sylvester (1814-1897). The standard nontechnical meaning of this term is
"a place in which something is bred, produced, or developed." For Sylvester, then, a matrix, which
was an "oblong arrangement of terms," was an entity out of which one could form various square
pieces to produce determinants. These latter quantities, formed from square matrices, were quite
well known by this time.
James Sylvester (his original name was James Joseph) was born into a Jewish family in
London. and was to become one of the supreme algebraists of the nineteenth century. Despite
having studied for several years at Cambridge University, he was not permitted to take his degree
there because he “professed the faith in which the founder of Christianity was educated.”
Therefore, he received his degrees from Trinity College, Dublin. In 1841 he accepted a
professorship at the University of Virginia; he remained there only a short time, however, his
horror of slavery preventing him from fitting into the academic community. In 1871 he returned
to the United States to accept the chair of mathematics at the newly opened Johns Hopkins
University. In between these sojourns, he spent about 10 years as an attorney, during which time
he met Arthur Cayley (see the note on p. 3), and 15 years as Professor of Mathematics at the Royal
Military Academy, Woolwich. Sylvester was an avid poet, prefacing many of his mathematical
papers with examples of his work. His most renowned example was the “Rosalind” poem, a
400-line epic. each line of which rhymed with “Rosalind.”

SOLUTION Using Eq. (5) as a guide, we find that

    [ 2  -3   5] [-2]        [ 2]     [-3]     [ 5]   [ 21]
    [-1   4  -7] [ 5]  = -2 [-1] + 5 [ 4] + 8 [-7] = [-34].
                 [ 8]


Note that in Example 1, the first entry 21 of the final column vector is
computed as (-2)(2) + (5)(-3) + (8)(5), which is precisely the dot product of
the first row vector [2 -3 5] of the matrix with the column vector [-2; 5; 8].
Similarly, the second component -34 of our answer is the dot product of the
second row vector [-1 4 -7] with this column vector.
In a similar fashion, we see that the ith component of a column vector Ab
will be equal to the dot product of the ith row of A with the column vector b.
We should also note from Example 1 that the number of components in a row
of A will have to be equal to the number of components in the column vector b
if we are to compute the product Ab.
We have illustrated how to compute a product Ab of an m x n matrix with
an n x 1 column vector. We can extend this notion to a product AB of an
m x n matrix A with an n x s matrix B.

The product AB is the matrix whose jth column is the product of A
with the jth column vector of B.

Letting b_j be the jth column vector of B, we write AB = C symbolically as

    A [b_1  b_2  ...  b_s] = [Ab_1  Ab_2  ...  Ab_s].
            B                          C

Because B has s columns, C has s columns. The comments after Example 1
indicate that the ith entry in the jth column of AB is the dot product of the ith
row of A with the jth column of B. We give a formal definition.

DEFINITION 1.8 Matrix Multiplication

Let A = [a_ij] be an m x n matrix, and let B = [b_ij] be an n x s matrix.
The matrix product AB is the m x s matrix C = [c_ij], where c_ij is the dot
product of the ith row vector of A and the jth column vector of B.

We illustrate the choice of row i from A and column j from B to find the
element c_ij in AB, according to Definition 1.8: in the product AB = [c_ij], the
ith row of A is matched against the jth column of B, and

    c_ij = (ith row vector of A) · (jth column vector of B).

In summation notation, we have

    c_ij = a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj = Σ_{k=1}^{n} a_ik b_kj.    (6)
Notice again that AB is defined only when the second size-number (the
number of columns) of A is the same as the first size-number (the number of
rows) of B. The product matrix has the shape
(First size-number of A) x (Second size-number of B).

EXAMPLE 2 Let A be a 2 x 3 matrix, and let B be a 3 x 5 matrix. Find the sizes of AB and
BA, if they are defined.
SOLUTION Because the second size-number, 3, of A equals the first size-number, 3, of B,
we see that AB is defined; it is a 2 x 5 matrix. However, BA is not defined,
because the second size-number, 5, of B is not the same as the first
size-number, 2, of A.

EXAMPLE 3 Compute the product

    [-2  3   2] [ 4  -7  2   5]
    [ 4  6  -2] [ 3   9  1   1]
                [-2   3  5  -3]

SOLUTION The product is defined, because the left-hand matrix is 2 x 3 and the
right-hand matrix is 3 x 4; the product will have size 2 x 4. The entry in the
first row and first column position of the product is obtained by taking the dot
product of the first row vector of the left-hand matrix and the first column
vector of the right-hand matrix, as follows:

    (-2)(4) + (3)(3) + (2)(-2) = -8 + 9 - 4 = -3.

The entry in the second row and third column of the product is the dot product
of the second row vector of the left-hand matrix and the third column vector of
the right-hand one:

    (4)(2) + (6)(1) + (-2)(5) = 8 + 6 - 10 = 4,

and so on, through the remaining row and column positions of the product.
Eight such computations show that

    [-2  3   2] [ 4  -7  2   5]   [-3  47  9  -13]
    [ 4  6  -2] [ 3   9  1   1] = [38  20  4   32].
                [-2   3  5  -3]
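The row-by-column rule of Definition 1.8 can be spelled out with explicit loops in MATLAB. In practice one simply writes A*B; the loop version below is only a sketch meant to make the definition concrete, using two small matrices chosen for illustration.

    A = [-2 3 2; 4 6 -2];
    B = [4 -7 2 5; 3 9 1 1; -2 3 5 -3];
    [m, n] = size(A);  [n2, s] = size(B);    % AB is defined only when n equals n2
    C = zeros(m, s);
    for i = 1:m
      for j = 1:s
        C(i, j) = A(i, :) * B(:, j);         % dot product of row i of A with column j of B
      end
    end
    C                                        % agrees with A*B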
Examples 2 and 3 show that sometimes AB is defined when BA is not. Even
if both AB and BA are defined, however, it need not be true that AB = BA:

Matrix multiplication is not commutative.

EXAMPLE 4 Let

Compute AB and BA.

HISTORICAL NOTE Matrix multiplication originated in the composition of linear substitu-
tions, fully explored by Carl Friedrich Gauss (1777-1855) in his Disquisitiones Arithmeticae of
1801 in connection with his study of quadratic forms. Namely, if F = Ax^2 + 2Bxy + Cy^2 is such a
form, then the linear substitution

    x = ax' + by'    y = cx' + dy'    (i)

transforms F into a new form F' in the variables x' and y'. If a second substitution

    x' = ex'' + fy''    y' = gx'' + hy''    (ii)

transforms F' into a form F'' in x'', y'', then the composition of the substitutions, found by
replacing x', y' in (i) by their values in (ii), gives a substitution transforming F into F'':

    x = (ae + bg)x'' + (af + bh)y''    y = (ce + dg)x'' + (cf + dh)y''.    (iii)

The coefficient matrix of substitution (iii) is the product of the coefficient matrices of substitutions
(i) and (ii). Gauss performed an analogous computation in his study of substitutions in forms in
three variables, which produced the rule for multiplication of 3 x 3 matrices.
Gauss, however, did not explicitly refer to this idea of composition as a "multiplication."
That was done by his student Ferdinand Gotthold Eisenstein (1823-1852), who introduced the
notation S x T to denote the substitution composed of S and T. About this notation Eisenstein
wrote, "An algorithm for calculation can be based on this; it consists of applying the usual rules for
the operations of multiplication, division, and exponentiation to symbolical equations between
linear systems; correct symbolical equations are always obtained, the sole consideration being that
the order of the factors may not be altered."

SOLUTION We compute that

_[ 410 13°55
AB =| 4 x8 and BA = ' 29}
Of course, for a square matrix A, we denote AA by A^2, AAA by A^3, and so
on. It can be shown that matrix multiplication is associative; that is,

    A(BC) = (AB)C

whenever the product is defined. This is not difficult to prove from the
definition, although keeping track of subscripts can be a bit challenging. We
leave the proof as Exercise 33, whose solution is given in the back of this text.

The n x n Identity Matrix

Let I be the n x n matrix [a_ij] such that a_ii = 1 for i = 1, ..., n and a_ij = 0 for
i ≠ j. That is,

    I = [1  0  0  ...  0]
        [0  1  0  ...  0]
        [0  0  1  ...  0]
        [.  .  .       .]
        [0  0  0  ...  1],

where, in such a display, large zeros written above and below the diagonal are
sometimes used to indicate that each entry of the matrix in those positions is 0.
If A is any m x n matrix and B is any n x s matrix, we can show that

    AI = A    and    IB = B.

We can understand why this is so if we think about why it is that

    [a  b] [1  0]   [1  0] [a  b]   [a  b]
    [c  d] [0  1] = [0  1] [c  d] = [c  d].


Because of the relations AI = A and IB = B, the matrix I is called the n x n
identity matrix. It behaves for multiplication of n x n matrices exactly as the
scalar 1 behaves for multiplication of scalars. We have one such square identity
matrix for each integer 1, 2, 3, .... To keep notation simple, we denote them
all by I, rather than by I_1, I_2, I_3, .... The size of I will be clear from the context.
The identity matrix is an example of a diagonal matrix, namely, a square
matrix with zero entries except possibly on the main diagonal, which extends
from the upper left corner to the lower right corner.
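In MATLAB (matrix entry is reviewed at the end of this section), the n x n identity matrix is produced by eye(n), and the relations AI = A and IB = B are easy to confirm; a small illustrative sketch:

    A = [1 -2; 3 5];
    I = eye(2);                          % the 2 x 2 identity matrix
    isequal(A*I, A), isequal(I*A, A)     % both print 1 (true)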

Other Matrix Operations


Although multiplication is a very important matrix operation for our work,
we will have occasion to add and subtract matrices, and to multiply a matrix

by a scalar, in later chapters. Matrix addition, subtraction, and scalar mul-


tiplication are natural extensions of these same operations for vectors
as defined in Section 1.1; they are again performed on entries in corresponding
positions.

DEFINITION 1.9 Matrix Addition

Let A = [a_ij] and B = [b_ij] be two matrices of the same size m x n. The
sum A + B of these two matrices is the m x n matrix C = [c_ij], where
c_ij = a_ij + b_ij.

That is, the sum of two matrices of the same size is the matrix of that
size obtained by adding corresponding entries.

EXAMPLE 5 Find

SOLUTION The sum is the matrix

EXAMPLE 6 Find

1-3 -5§ 4 6
; Aba 307 ‘|
SOLUTION The sum is undefined, because the matrices are not the same size. a

Let A be an m X n matrix, and let O be the m X n matrix all of whose


entries are zero. Then,

    A + O = O + A = A.

The matrix O is called the m X n zero matrix; the size of such a zero matrix is
made clear by the context.

DEFINITION 1.10 Scalar Multiplication

Let A = [a_ij], and let r be a scalar. The product rA of the scalar r and the
matrix A is the matrix B = [b_ij] having the same size as A, where

    b_ij = r a_ij.
13. MATRICES AND THEIR ALGEBRA 43

EXAMPLE 7 Find

    2 [-2   1]
      [ 3  -5].

SOLUTION Multiplying each entry of the matrix by 2, we obtain the matrix

    [-4    2]
    [ 6  -10].

For matrices A and B of the same size, we define the difference A - B to be

    A - B = A + (-1)B.

The entries in A - B are obtained by subtracting the entries of B from entries
in the corresponding positions in A.

EXAMPLE 8 If
_{3 -l 4
A=|9 _ and B=|_/-1 1 0 4]
1
find 2A - 3B.
SOLUTION We find that

_f 9-2 -7
24 3B =| 5 10 3h .
We introduced the transpose operation to change a row vector to a column
vector, or vice versa, in Section 1.1. We generalize this operation for
application to matrices, changing all the row vectors to column vectors, which
results in all the column vectors becoming row vectors.

DEFINITION 1.11 Transpose of a Matrix; Symmetric Matrix

The matrix B is the transpose of the matrix A, written B = A^T, if each
entry b_ij in B is the same as the entry a_ji in A, and conversely. If A is a
matrix and if A = A^T, then the matrix A is symmetric.

EXAMPLE 9 Find A^T if

    A = [ 1  4  5]
        [-3  2  7].

SOLUTION We have

    A^T = [1  -3]
          [4   2]
          [5   7].

Notice that the rows of A become the columns of A^T.


A symmetric matrix must be square. Symmetric matrices arise in some
applications, as we shall see in Chapter 8.

EXAMPLE 10 Fill in the missing entries in the 4 x 4 matrix

    [ 5  -6       8]
    [     3        ]
    [-2   1   0   4]
    [    11      -1]

to make it symmetric.

SOLUTION Because rows must match corresponding columns, we obtain

    [ 5  -6  -2   8]
    [-6   3   1  11]
    [-2   1   0   4]
    [ 8  11   4  -1].

In Example 10, note the symmetry in the main diagonal.


We have explained that we will often regard vectors in R^n as column
vectors. If a and b are two column vectors in R^n, the dot product a · b can be
written in terms of the transpose operation and matrix multiplication,
namely,

    a · b = a^T b = [a_1  a_2  ...  a_n] [b_1]
                                         [b_2]
                                         [...]
                                         [b_n].    (7)

Strictly speaking, a^T b is a 1 x 1 matrix, and its sole entry is a · b. Identifying a
1 x 1 matrix with its sole entry should cause no difficulty. The use of Eq. (7)
makes some formulas given later in the text much easier to handle.
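MATLAB mirrors Eq. (7) directly: for column vectors a and b, the expression a'*b forms a^T b. A brief illustrative sketch:

    a = [1; 2; 0; 2];  b = [-3; 1; 1; 5];
    a' * b             % the 1 x 1 matrix a^T b; its sole entry is the dot product, 9
    sum(a .* b)        % the same dot product computed from corresponding components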

Properties of Matrix Operations

For handy reference, we box the properties of matrix algebra and of the
transpose operation. These properties are valid for all vectors, scalars,
and matrices for which the indicated quantities are defined. The exercises
ask for proofs of most of them. The proofs of the properties of matrix
algebra not involving matrix multiplication are essentially the same as the
proofs of the same properties presented for vector algebra in Section 1.1.
We would expect this because those operations are performed just on cor-
responding entries, and every vector can be regarded as either a 1 x n or an
n x 1 matrix.

Properties of Matrix Algebra

A + B = B + A                    Commutative law of addition
(A + B) + C = A + (B + C)        Associative law of addition
A + O = O + A = A                Identity for addition
r(A + B) = rA + rB               A left distributive law
(r + s)A = rA + sA               A right distributive law
(rs)A = r(sA)                    Associative law of scalar multiplication
(rA)B = A(rB) = r(AB)            Scalars pull through
A(BC) = (AB)C                    Associative law of matrix multiplication
IA = A and BI = B                Identity for matrix multiplication
A(B + C) = AB + AC               A left distributive law
(A + B)C = AC + BC               A right distributive law

Properties of the Transpose Operation

(A^T)^T = A                      Transpose of the transpose
(A + B)^T = A^T + B^T            Transpose of a sum
(AB)^T = B^T A^T                 Transpose of a product

EXAMPLE 11 Prove that A(B + C) = AB + AC for any m x n matrix A and any n x s matrices
B and C.
SOLUTION Let A = [a_ij], B = [b_jk], and C = [c_jk]. Note the use of j, which runs from 1 to n, as
both the second index for entries in A and the first index for the entries in B
and C. The entry in the ith row and kth column of A(B + C) is

    Σ_{j=1}^{n} a_ij(b_jk + c_jk).

By familiar properties of real numbers, this sum is also equal to

    Σ_{j=1}^{n} (a_ij b_jk + a_ij c_jk) = Σ_{j=1}^{n} a_ij b_jk + Σ_{j=1}^{n} a_ij c_jk,

which we recognize as the sum of the entries in the ith row and kth column of
the matrices AB and AC. This completes the proof.
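A numerical spot check of this left distributive law in MATLAB, using random matrices of compatible sizes (an illustration only):

    A = rand(2, 3);  B = rand(3, 4);  C = rand(3, 4);
    max(max(abs(A*(B + C) - (A*B + A*C))))    % zero up to roundoff error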

SUMMARY

1. An m x n matrix is an ordered rectangular array of numbers containing m
   rows and n columns.
2. An m x 1 matrix is a column vector with m components, and a 1 x n
   matrix is a row vector with n components.
3. The product Ab of an m x n matrix A and a column vector b with
   components b_1, b_2, ..., b_n is the column vector equal to the linear
   combination of the column vectors of A where the scalar coefficient of the
   jth column vector of A is b_j.
4. The product AB of an m x n matrix A and an n x s matrix B is the m x s
   matrix C whose jth column is A times the jth column of B. The entry c_ij in
   the ith row and jth column of C is the dot product of the ith row vector of A
   and the jth column vector of B. In general, AB ≠ BA.
5. If A = [a_ij] and B = [b_ij] are matrices of the same size, then A + B is the
   matrix of that size with entry a_ij + b_ij in the ith row and jth column.
6. For any matrix A and scalar r, the matrix rA is found by multiplying each
   entry in A by r.
7. The transpose of an m x n matrix A is the n x m matrix A^T, which has as its
   kth row vector the kth column vector of A.
8. Properties of the matrix operations are given in boxed displays on page 45.

") EXERCISES
In Exercises 1-16. let

Compute the indicated quantity, if it is defined. 17. Let

3A 9. (24)(5C) ,
=

a. Find A’.
OB
A (ODYAB) b. Find A’.
RYN

A+B
i. 4 , 18. Let
B+C 2. (AC) 0 0 —|

C-D 13. (24 - B)D A=|0 2. Ol.


AA

4A —- 2B 14. ADB 2 0 O
AB 15. (A™)A a. Find A’,
SY

. (CD)T 16. BC and CB b. Find A’,


1.3 MATRICES AND THEIR ALGEBRA 4?

19, Consider the row and column vectors vector c and a matrix A as a linear
combination of vectors. {Hint: Consider
4
x = {-2, 3, -l] andy = |
((eA)")".]
3
Compute the matrix products xy and yx. In Exercises 25-34, prove that the given relation
40. Fill in the missing entries in the 4 x 4 holds for all vectors, matrices, and scalars for
matrix which the expressions are defined.
1 -1 5
25. A+B=Bt+A
4 8
2-7 -1 26. (A+ BY+ C=A+(Bt+C)
6 3 27. (r+ s\A=rA+ SA
so that the matrix is symmetric. 28. (rs)A = r(sA)
29. A(B + C) = AB + AC
21. Mark each of the following True or False.
The statements involve matrices A, B, and 30. (A?) =
C that are assumed to have appropriate 31. {4 + B)’ = A’ + B
$1ze. 32. (AB)? = BTAT
a. If A = B, then AC = BC.
b. If AC = BC, then A = B.
c. If AB = O, then A = O or B = O.
d. If A + C = B + C, then A = B.
e. If A^2 = I, then A = ±I.
f. If B = A^2 and if A is n x n and
symmetric, then b_ii ≥ 0 for
i = 1, 2, ..., n.
__.g. If AB = C and if two of the matrices are
36. Let v and w be column vectors in R". What
square, then so Is the third.
is the size of vw”? What relationships hold
__h. If AB = C and if C is a column vector,
between yw’ and wv’?
then so is B.
—— i. If A? = J, then A’ = / for all integers 37. The Hilbert matrix H, is the n X n matrix
n= 2. [hy], where h, = 1/(i + j — 1). Prove that the
____ j. If A? = J, then A” = / for all even integers matrix H, is symmetric.
n= 2. 38. Prove that, if A is a square matrix, then the
22. a. Prove that, if A is a matrix and x is a row matrix A + A’ is symmetric.
vector, then xA (if defined) is again a row 39. Prove that, if A is a matrix, then the matrix
vector. AA’ is symmetric.
b. Prove that, if A is a matrix and y isa 40. a. Prove that, if A is a square matrix, then
column vector, then Ay (if defined) is (A*\7 = (AT) and (42)" = (A7). [Hint:
again a column vector. Don’t try to show that the matrices have
23. Let A be an m X n matrix and let b and c be equal entries; instead use Exercise 32.]
column vectors with n components. Express b. State the generalization of part (a), and
the dot product (Ab) : (Ac) as a product of give a proof using mathematical
matrices. induction (see Appendix A).
24. The product Ab of a matrix and a column 41. a. Let A bean m X n matnx, and let e, be
vector is equal to a linear combination of the n x | column vectcr whose jth
columns of A where the scalar coefficient of component is 1 and whose other
the jth column of A is 8;. In a similar components are 0. Show that Ae, is the
fashion, describe the product c4 of a row jth column vector of A.

b. Let .A and B be matrices of the same size. 46. Find all values of r for which
i. Prove that, if Ax = 0 (the zero vector) 2001 101
for all x, then A = OQ, the zero maitiix. 070 commutes with 010
{Hint: Use part (a).] 002 101 ‘
ii. Prove that, if Ax = Bx for all x, then
A = B. {Hint: Consider A - B.] = The software LINTEK includes a routine,
42. Let A and B be square matrices. |s MATCOMP, that performs the matrix operations
(A + BY =A? + 24B + B? described in this section. Let
ar . 4 6 0 1-9
If so, prove it; if not, give a counterexample 211 $$ 2-5
and state under what conditions the =|-1 2-4 5 7
equation is true. 012-8 4 3
43. Let A and B be square matrices. Is 10 4 6 2-5
(A + B)(A — B) = A? — BY? and
If so, prove it; if not, give a counterexample -8 15 4-11
and state under what conditions the 3 5 6 -2
equation is true. B=| 0-1 12 5}.
44. Ann Xn matrix C is skew symmetric if 113-15 7
C7 = -C. Prove that every square matrix A L6-8 0 -5!
can be written uniquely as A = B + C where
B is symmetric and C is skew symmetric. Use MATCOMP in LINTEK to enter and store
these matrices, and then compute ihe matrices in
o, _ Exercises 47-54, if they are defined. Write down
Matrix A commutes with matrix B if AB = BA. to hand in, if requested, the entry in the 3rd row,
. 4th column of the matrix.
45. Find all values of r for which
2001 CooL 47, AA+A 50. BA? 53. (2A) — AS

F 1 J commutes with f 1 | 48. A’B 51. B7(2A) 54. (A’7)°


00,r 101 49. AXAT 52. ABLABYT

MATLAB
To enter a matrix in MATLAB, start with a left bracket [ and then type the entries
across the rows, separating the entries by spaces and separating the rows by
semicolons. Conclude with a right bracket ]. To illustrate, we would enter the matrix
    [-1   5]
    [13  -4]
    [ 7   0]

as A = [-1 5; 13 -4; 7 0],
and MATLAB would then print it for us to proofread. Recall that to avoid having
data printed again on the screen, we type a semicolon at the end of the data before
pressing the Enter key. Thus if we enter
A = [-1 5; 13 -4; 7 0];

the matrix A will not be printed for proofreading. In MATLAB, we can enter

A + B to find the sum
A - B to find the difference
A * B to find the product AB
A ^ n to find the power A^n
r*A to find the scalar multiple rA
A' to take the transpose of A
eye(n) for the n x n identity matrix
zeros(m,n) for an m x n matrix of 0's, or zeros(n) if square
ones(m,n) for an m x n matrix of 1's, or ones(n) if square
rand(m,n) for a matrix of random numbers from 0 to 1, or rand(n) if square.

In MATLAB, A(i, j) is the entry in the ith row and jth column of A, while A(k) is
the kth entry in A where entries are numbered consecutively starting at the upper
left and proceeding down the first column, then down the second column, etc.

Access your MATLAB, and enter the matrices A and B given before Exercises 47-54.
(We ask you to enter them manually this time to be sure you know how to enter
matrices.) Proofread the data for A and B. If you find, for example, that you entered 6
rather than 5 for the entry in the 2nd row, 3rd column of A, you can correct your
error by entering A(2,3) = 5.

M1. Exercises 47-54 are much easier to do with MATLAB than with LINTEK,
because operations in LINTEK must be specified one at a time. Find the
element in the 3rd row, 4th column of the given matrix.
a. B(2A)
b. AB(AB)"
c. (2A) — A
M2. Enter B(8). What answer did you get? Why did MATLAB give that answer?
M3. Enter help : to review the uses of the colon with matrices. Mastery of use of
the colon is a real timesaver in MATLAB. Use the colon to set C equal to the
5 x 3 matrix consisting of the 3rd through the 5th columns of A. Then
compute C^T C and write down your answer.
M4. Form a 5 x 9 matrix D whose first five columns are those of A and whose last
four columns are those of B by
a. entering D = [A B], which works when A and B have the same number of
rows,
b. first entering D = A and then using the colon to specify that columns 6
through 9 of D are equal to B. Use the fact that A(:, j) gives the jth column
of A.
Write down the entry in the 2nd row, 5th column of D^T D.
M5. Form a matrix E consisting of B with two rows of zeros put at the bottom
and write down the entry in the 2nd row, 3rd column of E^T E by
a. entering Z = zeros(2,4) and then E = [B; Z], which works when B and Z
have the same number of columns,
b. first entering E = B and then using the colon to specify that rows 6
through 7 of E are equal to Z. Use the fact that A(i, :) gives the ith row
of A.

M6. In mathematics, "mean" stands for "average," so the mean of the numbers 2,
4, and 9 is their average (2 + 4 + 9)/3 = 5. In MATLAB, enter help mean to
see what that function gives, and then enter mean(A). Figure out a way to
have MATLAB find the mean (average) of all 25 numbers in the matrix A.
Find that mean and write down your answer.
M7. If F is a 10 x 10 matrix with random entries from 0 to 1, approximately
what would you expect the mean value of those entries to be? Enter help rand,
read what it says, and then generate such a matrix F. Using the idea in
Exercise M6, compute the mean of the entries in F, and write down your
answer. Repeat this several times.
M8. Enter help ones and read what it says. Write down a statement that you could
enter in MATLAB to form from the matrix F of Exercise M7 a 10 x 10
matrix G that has random entries from —4 to 4. Using the ideas in Exercise
M6, find and write down the mean of the entries in the matrix G. Repeat this
several times.
M9. In MATLAB, entering mesh(X) will draw a three-dimensional picture
indicating the relative values of the entries in a matrix X, much as entering
plot(a) draws a two-dimensional picture for the entries in a vector a. Enter
I = eye(16); mesh(I) to see a graphic for the 16 x 16 identity matrix. Then
enter mesh(rot90(I)). Enter help rot90 and help triu. Enter X = triu(ones(14));
mesh(X). Then enter mesh(rot90(X)) and finally mesh(rot90(X,-1)).

MATLAB has the capability to draw surface graphs of a function z = f(x, y)


using the mesh function. This provides an excellent illustration of the use of a
matrix to store data. As you experiment on your own, you may run out of computer
memory, or try to form matrices larger than your MATLAB will accept. Entering
clear A will erase a matrix A from memory to free up some space, and entering clear
will erase all data previously entered. We suggest that you enter clear now before
proceeding.
MATLAB can draw a surface of z = f(x, y) over a rectangular region a ≤ x ≤ b,
c ≤ y ≤ d of the x,y-plane by computing values z of the function at points on a grid
in the region. We can describe the region and the grid of points where we want
values computed using the function meshdom. Review the use of the colon, using
help : if you need to, and notice that entering -3:1:3 will form a vector with first
entry -3 and successive entries incremented by 1 until 3 is reached. If we enter

    [X, Y] = meshdom(-3:1:3, -2:.5:2)

then MATLAB will create two matrices X and Y containing, respectively, the
x-coordinates and y-coordinates of a grid of points in the region -3 ≤ x ≤ 3 and
-2 ≤ y ≤ 2. Because the x-increment is 1 and the y-increment is 0.5, we see that
both X and Y will be 9 x 7 matrices.

Enter now

[X, Y] = meshdom(-3:1:3, -2:.5:2);    (8)

and then enter X to see that matrix X and similarly view the matrix Y. We have
specified where we want the function values computed.
Enter help . to recall that entering A .* A will produce the matrix whose entries
are the squares of the entries in A. Thus entering Z = X .* X + Y .* Y in MATLAB
will produce a matrix Z whose entry at a position corresponding to a grid point (x, y)

will be x^2 + y^2. Entering mesh(Z) will then create the mesh graph over the region
-3 ≤ x ≤ 3, -2 ≤ y ≤ 2.
Enter now

    Z = X .* X + Y .* Y;    (9)
    mesh(Z)    (10)

to see this graph.
M10. Using the up arrow, modify Eq. (8) to make both the x-increment and the
y-increment 0.2. After pressing the Enter key, use the up arrow to get Eq. (9)
and press the Enter key to form the larger matrix Z for these new grid
points. Then create the mesh graph using Eq. (10).
M11. Modify Eq. (9) and create the mesh graph for z = x² - y².
M12. Change Eq. (8) so that the region will be -3 ≤ x ≤ 3 and -3 ≤ y ≤ 3, still
with 0.2 for increments. Form the mesh graph for z = 9 - x² - y².
M13. Mesh graphs for cylinders are especially nice. Draw the mesh graphs for
a. z = x
b. z = y.
M14. Change the mesh domain to -4π ≤ x ≤ 4π, -3 ≤ y ≤ 3 with x-increment
0.2 and y-increment 6. Recall that π can be entered as pi. Draw the mesh
graphs for
a. the cylinder z = sin(x),
b. the cylinder z = x sin(x), remembering to use .*, and
c. the function z = y sin(x), which is not a cylinder but is pretty.

1.4 SOLVING SYSTEMS OF LINEAR EQUATIONS

As we have indicated, solving a system of linear equations is a fundamental
problem of linear algebra. Many of the computational exercises in this text
involve solving such linear systems. This section presents an algorithm for
finding all solutions of any linear system.
The solution set of any system of equations is the intersection of the
solution sets of the individual equations. That is, any solution of a system must
be a solution of each equation in the system; and conversely, any solution of
every equation in the system is considered to be a solution of the system.
Bearing this in mind, we start with an intuitive discussion of the geometry of
linear systems; a more detailed study of this geometry appears in Section 2.5.

The Geometry of Linear Systems


Frequently, students are under the impression that a linear system containing
the same number of equations as unknowns always has a unique solution,
whereas a system having more equations than unknowns never has a solution.
The geometric interpretation of the problem shows that these statements are
not true.

FIGURE 1.31 The plane x + y + z = 1.

We know that a single linear equation in two unknowns has a line in the
plane as its solution set. Similarly, a single linear equation in three unknowns
has a plane in space as its solution set. The solution set of x + y + z = 1 is the
plane sketched in Figure 1.31. This geometric analysis can be extended to an
equation that has more than three variables, but it is difficult for us to
represent the solution set of such an equation graphically.
Two lines in the plane usually intersect at a single point; here the word
usually means that, if the lines are selected in some random way, the chance

HISTORICAL NOTE SYSTEMS OF LINEAR EQUATIONS are found in ancient Babylonian and
Chinese texts dating back well over 2000 years. The problems are generally stated in real-life
terms, but it is clear that they are artificial and designed simply to train students in mathematical
procedures. As an example of a Babylonian problem, consider the following, which has been
slightly modified from the original found on a clay tablet from about 300 B.C.: There are two fields
whose total area is 1800 square yards. One produces grain at a rate of 2/3 bushel per square yard, the
other at a rate of 1/2 bushel per square yard. The total yield of the two fields is 1100 bushels. What is
the size of each field? This problem leads to the system

      x +      y = 1800
(2/3)x + (1/2)y = 1100.

A typical Chinese problem, taken from the Han dynasty text Nine Chapters of the
Mathematical Art (about 200 B.C.), reads as follows: There are three classes of corn, of which three
bundles of the first class, two of the second, and one of the third make 39 measures. Two of the
first, three of the second, and one of the third make 34 measures. And one of the first, two of the
second, and three of the third make 26 measures. How many measures of grain are contained in
one bundle of each class? The system of equations here is

3x + 2y +  z = 39
2x + 3y +  z = 34
 x + 2y + 3z = 26.

FIGURE 1.32 2x - 3y = 4 is parallel to 2x - 3y = 6.
FIGURE 1.33 2x - 3y = 4 and -4x + 6y = -8 are the same line.

that they either are parallel (have empty intersection) or coincide (have an
infinite number of points in their intersection) is very small. Thus we see that a
system of two randomly selected equations in two unknowns can be expected
to have a unique solution. However, it is possible for the system to have no
solutions or an infinite number of solutions. For example, the equations

2x - 3y = 4
2x - 3y = 6

correspond to distinct parallel lines, as shown in Figure 1.32, and the system
consisting of these equations has no solutions. Moreover, the equations

 2x - 3y =  4
-4x + 6y = -8

correspond to the same line, as shown in Figure 1.33. All points on this line are
solutions of this system of two equations. And because it is possible to have
any number of lines in the plane (say, fifty lines) pass through a single point,
it is possible for a system of fifty equations in only two unknowns to have a
unique solution.
Similar illustrations can be made in space, where a linear equation has as
its solution set a plane. Three randomly chosen planes can be expected to have
a unique point in common. Two of them can be expected to intersect in a line
(see Figure 1.34), which in turn can be expected to meet the third plane at a
single point. However, it is possible for three planes to have no point in
common, giving rise to a linear system with no solutions. It is also possible for
all three planes to contain a common line, in which case the corresponding
linear system will have an infinite number of solutions.

FIGURE 1.34
Two planes intersecting in a line.

Elementary Row Operations


We now describe operations that can be used to modify the equations of a
linear system to obtain a system having the same solutions, but whose
solutions are obvious. The most general type of linear system can have m
equations in n unknowns. Such a system can be written as

a₁₁x₁ + a₁₂x₂ + ··· + a₁ₙxₙ = b₁
a₂₁x₁ + a₂₂x₂ + ··· + a₂ₙxₙ = b₂
   ·                                                           (1)
   ·
aₘ₁x₁ + aₘ₂x₂ + ··· + aₘₙxₙ = bₘ.

System (1) is completely determined by its m x n coefficient matrix A = [aᵢⱼ]
and by the column vector b with ith component bᵢ. The system can be written
as the single matrix equation

Ax = b,                                                        (2)

where x is the column vector with ith component xᵢ. Any column vector s such
that As = b is a solution of system (1).
The augmented matrix or partitioned matrix

[ a₁₁  a₁₂  ···  a₁ₙ | b₁ ]
[ a₂₁  a₂₂  ···  a₂ₙ | b₂ ]
[  ·                 |  · ]                                    (3)
[ aₘ₁  aₘ₂  ···  aₘₙ | bₘ ]

is a shorthand summary of system (1). The coefficient matrix has been
augmented by the column vector of constants. We denote matrix (3) by [A | b].

We shall see how to determine all solutions of system (1) by manipulating
augmented matrix (3) using elementary row operations. The elementary row
operations correspond to the following familiar operations with equations of
system (1):

R1 Interchange two equations in system (1).
R2 Multiply an equation in system (1) by a nonzero constant.
R3 Replace an equation in system (1) with the sum of itself and a multiple
   of a different equation of the system.

It is clear that operations R1 and R2 do not change the solution sets of the
equations they affect. Therefore, they do not change the intersection of the
solution sets of the equations; that is, the solution set of the system is
unchanged. The fact that R2 does not change the solution set and the familiar
algebraic principle, "Equals added to equals yield equals," show that any
solution of both the ith and jth equations is also a solution of a new jth
equation obtained by adding s times the ith equation to the jth equation. Thus
operation R3 yields a system having all the solutions of the original one.
Because the original system can be recovered from the new one by multiplying
Because the original system can be recovered from the new one by multiplying

HISTORICAL NOTE A MATRIX-REDUCTION METHOD of solving a system of linear equations
occurs in the ancient Chinese work, Nine Chapters of the Mathematical Art. The author presents
the following solution to the system

3x + 2y +  z = 39
2x + 3y +  z = 34
 x + 2y + 3z = 26.

The diagram of the coefficients is to be set up on a "counting board":

 1  2  3
 2  3  2
 3  1  1
26 34 39

The author then instructs the reader to multiply the middle column by 3 and subsequently to
subtract the right column "as many times as possible"; the same is to be done to the left column.
The new diagrams are then

 1  0  3        0  0  3
 2  5  2        4  5  2
 3  1  1        8  1  1
26 24 39       39 24 39

The next instruction is to multiply the left column by 5 and then to subtract the middle column as
many times as possible. This gives

 0  0  3
 0  5  2
36  1  1
99 24 39

The system has thus been reduced to the system 3x + 2y + z = 39, 5y + z = 24, 36z = 99, from
which the complete solution is easily found.

the ith equation by —s and adding it to the new jth equation (an R3 operation),
we see that the original system has all the solutions of the new one. Hence R3,
too, does not alter the solution set of system (1).
These procedures applied to system (1) correspond to elementary row
operations applied to augmented matrix (3). We list these in a box together
with a suggestive notation for each.

Elementary Row Operations                                      Notations

(Row interchange) Interchange the ith and jth row              Rᵢ ↔ Rⱼ
vectors in a matrix.
(Row scaling) Multiply the ith row vector in a matrix          Rᵢ → sRᵢ
by a nonzero scalar s.
(Row addition) Add to the ith row vector of a matrix           Rᵢ → Rᵢ + sRⱼ
s times the jth row vector.

If a matrix B can be obtained from a matrix A by means of a sequence of
elementary row operations, then A is row equivalent to B. Each elementary row
operation can be undone by another of the same type. A row-addition
operation Rᵢ → Rᵢ + sRⱼ can be undone by Rᵢ → Rᵢ - sRⱼ. Row scaling, Rᵢ → sRᵢ
for s ≠ 0, can be undone using Rᵢ → (1/s)Rᵢ, while a row-interchange operation
undoes itself. Thus, if B is row equivalent to A, then A is row equivalent to B;
we can simply speak of row-equivalent matrices A and B, which we denote by
A ~ B. (See Exercise 55 in this regard.) We have just seen that the operations on
a linear system Ax = b corresponding to these elementary row operations on
the augmented matrix [A | b] do not change the solution set of the system. This
gives us at once the following theorem, which is the foundation for the
algorithm we will present for solving linear systems.

THEOREM 1.6 Invariance of Solution Sets Under Row Equivalence

If [A | b] and [H | c] are row-equivalent augmented matrices, then the


linear systems Ax = b and Hx = c have the same solution sets.

Row-Echelon Form

We will solve a linear system Ax = b by row-reducing the augmented matrix
[A | b] to an augmented matrix [H | c], where H is a matrix in row-echelon
form (which we now define).

DEFINITION 1.12 Row-Echelon Form, Pivot

A matrix is in row-echelon form if it satisfies two conditions:

1. All rows containing only zeros appear below rows with nonzero
entries.
2. The first nonzero entry in any row appears in a column to the right
of the first nonzero entry in any preceding row.
For such a matrix, the first nonzero entry in a row is the pivot for that
row.

EXAMPLE 1 Determine which of the matrices A, B, C, and D are in row-echelon form.

SOLUTION Matrix A is not in row-echelon form, because the second row (consisting of all
zero entries) is not below the third row (which has a nonzero entry).
Matrix B is not in row-echelon form, because the first nonzero entry in the
second row does not appear in a column to the right of the first nonzero entry
in the first row.
Matrix C is in row-echelon form, because both conditions of Definition
1.12 are satisfied. The pivots are -1 and 3.
Matrix D satisfies both conditions as well, and is in row-echelon form. The
pivots are the entries 1.

Solutions of Hx = c

We illustrate by examples that, if a linear system Hx = c has coefficient matrix
H in row-echelon form, it is easy to determine all solutions of the system. We
color the pivots in H in these examples.

EXAMPLE 2 Find all solutions of Hx = c, where

           [ -5  -1  3 |  3 ]
[H | c] =  [  0   3  5 |  8 ].
           [  0   0  2 | -4 ]
SOLUTION The equations corresponding to this augmented matrix are

-5x₁ -  x₂ + 3x₃ =  3
        3x₂ + 5x₃ =  8
               2x₃ = -4.

From the last equation, we obtain x₃ = -2. Substituting into the second
equation, we have

3x₂ + 5(-2) = 8,    3x₂ = 18,    x₂ = 6.

Finally, we substitute these values for x₂ and x₃ into the top equation,
obtaining

-5x₁ - 6 + 3(-2) = 3,    -5x₁ = 15,    x₁ = -3.

Thus the only solution is

     [ -3 ]
x =  [  6 ],
     [ -2 ]

or equivalently, x₁ = -3, x₂ = 6, x₃ = -2.

The procedure for finding the solution of Hx = c illustrated in Example 2
is called back substitution, because the values of the variables are found in
backward order, starting with the variable with the largest subscript.
By multiplying each nonzero row in [H | c] by the reciprocal of its pivot,
we can assume that each pivot in H is 1. We assume that this is the case in the
next two examples.
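The back-substitution arithmetic of Example 2 is easily expressed as a short program. The
following MATLAB lines are our own sketch, not part of the text's exercises; they assume
that H is square and upper triangular with nonzero diagonal entries (pivots).

    H = [-5 -1 3; 0 3 5; 0 0 2];    % coefficient matrix of Example 2
    c = [3; 8; -4];
    n = length(c);
    x = zeros(n, 1);
    for i = n:-1:1                  % work from the last equation upward
        x(i) = (c(i) - H(i, i+1:n)*x(i+1:n)) / H(i, i);
    end
    x                               % displays the solution [-3; 6; -2]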

EXAMPLE 3 Use back substitution to find all solutions of Hx = c, where

           [ 1  -3  5 |  3 ]
[H | c] =  [ 0   1  2 |  2 ].
           [ 0   0  0 | -1 ]

SOLUTION The equation corresponding to the last row of this augmented matrix is

0x₁ + 0x₂ + 0x₃ = -1.

This equation has no solutions, because the left side is 0 for any values of the
variables and the right side is -1.

DEFINITION 1.13 Consistent Linear System

A linear system having no solutions is inconsistent. If it has one or


more solutions, the linear system is said to be consistent.

Now we illustrate a many-solutions case.



EXAMPLE 4 Use back substitution to find all solutions of Hx = c, where

           [ 1  -3  0  5  0 |  4 ]
[H | c] =  [ 0   0  1  2  0 | -7 ].
           [ 0   0  0  0  1 |  1 ]
           [ 0   0  0  0  0 |  0 ]

SOLUTION The linear system corresponding to this augmented matrix is

x₁ - 3x₂      + 5x₄      =  4
            x₃ + 2x₄      = -7
                      x₅  =  1.

We solve each equation for the variable corresponding to the colored pivot in
the matrix. Thus we obtain

x₁ = 3x₂ - 5x₄ + 4
x₃ = -2x₄ - 7                                                  (4)
x₅ = 1.

Notice that x₂ and x₄ correspond to columns of H containing no pivot. We can
assign any value r we please to x₂ and any value s to x₄, and we can then use
system (4) to determine corresponding values for x₁, x₃, and x₅. Thus the
system has an infinite number of solutions. We describe all solutions by the
vector equation

     [ x₁ ]   [ 3r - 5s + 4 ]
     [ x₂ ]   [      r      ]
x =  [ x₃ ] = [   -2s - 7   ]    for any scalars r and s.      (5)
     [ x₄ ]   [      s      ]
     [ x₅ ]   [      1      ]

We call x₂ and x₄ free variables, and we refer to Eq. (5) as the general solution of
the system. We obtain particular solutions by setting r and s equal to specific
values. For example, we obtain

[  2 ]                          [ 25 ]
[  1 ]                          [  2 ]
[ -9 ]   for r = s = 1    and   [ -1 ]   for r = 2, s = -3.
[  1 ]                          [ -3 ]
[  1 ]                          [  1 ]
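A quick numerical spot check of Eq. (5) can be made in MATLAB. The lines below are ours,
not the text's; they simply verify that the two particular solutions displayed above satisfy
Hx = c.

    H = [1 -3 0 5 0; 0 0 1 2 0; 0 0 0 0 1; 0 0 0 0 0];
    c = [4; -7; 1; 0];
    r = 1; s = 1;
    x = [3*r - 5*s + 4; r; -2*s - 7; s; 1];    % general solution, Eq. (5)
    H*x - c                                    % the zero vector, so Hx = c
    r = 2; s = -3;
    x = [3*r - 5*s + 4; r; -2*s - 7; s; 1];
    H*x - c                                    % again the zero vector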

Gauss Reduction of Ax = b to Hx = c

We now show how to reduce an augmented matrix [A | b] to [H | c], where H is
in row-echelon form, using a sequence of elementary row operations. Examples
2 through 4 illustrated how to use back substitution afterward to find
solutions of the system Hx = c, which are the same as the solutions of Ax = b,
by Theorem 1.6. This procedure for solving Ax = b is known as Gauss

reduction with back substitution. In the box below, we give an outline for
reducing a matrix A to row-echelon form.

Reducing a Matrix A to Row-Echelon Form H

1. If the first column of A contains only zero entries, cross it off
   mentally. Continue in this fashion until the left column of the
   remaining matrix has a nonzero entry or until the columns are
   exhausted.
2. Use row interchange, if necessary, to obtain a nonzero entry (pivot)
   p in the top row of the first column of the remaining matrix. For
   each row below that has a nonzero entry r in the first column, add
   -r/p times the top row to that row to create a zero in the first
   column. In this fashion, create zeros below p in the entire first
   column of the remaining matrix.
3. Mentally cross off this first column and the first row of the matrix,
   to obtain a smaller matrix. (See the shaded portion of the third
   matrix in the solution of Example 5.) Go back to step 1, and repeat
   the process with this smaller matrix until either no rows or no
   columns remain.
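For readers who like to see an outline of this kind as a program, here is a minimal MATLAB
sketch of the boxed steps, applied to the matrix of Example 5 below. It is ours, not the text's;
it happens to pick the largest available entry as the pivot (the partial pivoting mentioned at
the end of the MATLAB notes in this section), so its answer may differ from the one in
Example 5, since a row-echelon form is not unique.

    A = [2 -4 2 -2; 2 -4 3 -4; 4 -8 3 -2; 0 0 -1 2];
    [m, n] = size(A);
    r = 1;                                 % row where the next pivot will go
    for j = 1:n                            % scan the columns left to right (step 1)
        [p, k] = max(abs(A(r:m, j)));      % largest entry available as a pivot
        if p == 0, continue, end           % nothing but zeros: go on to the next column
        k = k + r - 1;
        A([r k], :) = A([k r], :);         % row interchange (step 2)
        A(r, :) = A(r, :) / A(r, j);       % scale so that the pivot is 1
        for i = r+1:m
            A(i, :) = A(i, :) - A(i, j)*A(r, :);   % create zeros below the pivot
        end
        if r == m, break, end              % no rows remain (step 3)
        r = r + 1;
    end
    A                                      % a row-echelon form of the original matrix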

EXAMPLE 5 Reduce the matrix

[ 2  -4   2  -2 ]
[ 2  -4   3  -4 ]
[ 4  -8   3  -2 ]
[ 0   0  -1   2 ]

to row-echelon form, making all pivots 1.

SOLUTION We follow the boxed outline and color the pivots of 1. Remember that the
symbol ~ denotes row-equivalent matrices.

  [ 2  -4   2  -2 ]
  [ 2  -4   3  -4 ]      Multiply the first row by 1/2, to produce a pivot of 1 in
  [ 4  -8   3  -2 ]      the next matrix.
  [ 0   0  -1   2 ]
                         R₁ → (1/2)R₁

  [ 1  -2   1  -1 ]
~ [ 2  -4   3  -4 ]      Add -2 times row 1 to row 2, and then add -4
  [ 4  -8   3  -2 ]      times row 1 to row 3, to obtain the next matrix.
  [ 0   0  -1   2 ]
                         R₂ → R₂ - 2R₁,   R₃ → R₃ - 4R₁

  [ 1  -2   1  -1 ]
~ [ 0   0   1  -2 ]      Cross off the first shaded column of zeros (mentally),
  [ 0   0  -1   2 ]      to obtain the next shaded matrix.
  [ 0   0  -1   2 ]

  [ 1  -2   1  -1 ]
~ [ 0   0   1  -2 ]      Add row 2 to rows 3 and 4, to obtain the final
  [ 0   0  -1   2 ]      matrix.
  [ 0   0  -1   2 ]
                         R₃ → R₃ + 1R₂,   R₄ → R₄ + 1R₂

  [ 1  -2   1  -1 ]
~ [ 0   0   1  -2 ]
  [ 0   0   0   0 ]
  [ 0   0   0   0 ]

This last matrix is in row-echelon form, with both pivots equal to 1.

To solve a linear system Ax = b, we form the augmented matrix [A | b] and
row-reduce it to [H | c], where H is in row-echelon form. We can follow the
steps outlined in the box preceding Example 5 for row-reducing A to H. Of
course, we always perform the elementary row operations on the full augmented
matrix, including the entries in the column to the right of the partition.

EXAMPLE 6 Solve the linear system

        x₂ - 3x₃ = -5
2x₁ + 3x₂ -  x₃ =  7
4x₁ + 5x₂ - 2x₃ = 10.

SOLUTION We reduce the corresponding augmented matrix, using elementary row
operations. Pivots are colored.

[ 0  1  -3 | -5 ]   [ 2  1  -3 |  7 ]
[ 2  3  -1 |  7 ] ~ [ 0  1  -3 | -5 ]      R₁ ↔ R₂
[ 4  5  -2 | 10 ]   [ 4  5  -2 | 10 ]

[ 2  3  -1 |  7 ]   [ 2   3  -1 |  7 ]
[ 0  1  -3 | -5 ] ~ [ 0   1  -3 | -5 ]     R₃ → R₃ - 2R₁
[ 4  5  -2 | 10 ]   [ 0  -1   0 | -4 ]

[ 2   3  -1 |  7 ]   [ 2  3  -1 |  7 ]
[ 0   1  -3 | -5 ] ~ [ 0  1  -3 | -5 ].    R₃ → R₃ + 1R₂
[ 0  -1   0 | -4 ]   [ 0  0  -3 | -9 ]

HISTORICAL NOTE THE GAUSS SOLUTION METHOD is so named because Gauss described it in a
paper detailing the computations he made to determine the orbit of the asteroid Pallas. The
parameters of the orbit had to be determined by observations of the asteroid over a 6-year period
from 1803 to 1809. These led to six linear equations in six unknowns with quite complicated
coefficients. Gauss showed how to solve these equations by systematically replacing them with a
new system in which only the first equation had all six unknowns, the second equation included
five unknowns, the third equation only four, and so on, until the sixth equation had but one. This
last equation could, of course, be easily solved; the remaining unknowns were then found by back
substitution.

From the last augmented matrix, we could proceed to write the corresponding
equations (as in Example 2) and to solve in succession for x₃, x₂, and x₁ by back
substitution. However, it makes sense to keep using our shorthand, without
writing out variables, and to do our back substitution in terms of augmented
matrices. Starting with the final augmented matrix in the preceding set, we
obtain

[ 2  3  -1 |  7 ]   [ 2  3  -1 |  7 ]      R₃ → -(1/3)R₃
[ 0  1  -3 | -5 ] ~ [ 0  1  -3 | -5 ]
[ 0  0  -3 | -9 ]   [ 0  0   1 |  3 ]      (This shows that x₃ = 3.)

[ 2  3  -1 |  7 ]   [ 2  3  0 | 10 ]       R₁ → R₁ + 1R₃;  R₂ → R₂ + 3R₃
[ 0  1  -3 | -5 ] ~ [ 0  1  0 |  4 ]
[ 0  0   1 |  3 ]   [ 0  0  1 |  3 ]       (This shows that x₂ = 4.)

[ 2  3  0 | 10 ]   [ 2  0  0 | -2 ]        R₁ → R₁ - 3R₂
[ 0  1  0 |  4 ] ~ [ 0  1  0 |  4 ]
[ 0  0  1 |  3 ]   [ 0  0  1 |  3 ]        (This shows that x₁ = -1.)

We have found the solution: x = [x₁, x₂, x₃] = [-1, 4, 3].

In Example 6, we had to write down as many matrices to execute the back
substitution as we wrote to reduce the original augmented matrix to row-echelon
form. We can avoid this by creating zeros above as well as below each
pivot as we reduce the matrix to row-echelon form. This is known as the
Gauss-Jordan method. We show in Chapter 10 that, for a large system, it takes
about 50% more time for a computer to use the Gauss-Jordan method than to
use the Gauss method with back substitution illustrated in Example 6; there
are actually about 50% more arithmetic operations involved. Creating the
zeros above the pivots requires less computation if we do it after the matrix is
reduced to row-echelon form than if we do it as we go along. However, when
one is working with pencil and paper, fixing up a whole column in a single step
avoids writing so many matrices. Our next example illustrates the Gauss-Jordan
procedure.

EXAMPLE 7 Determine whether the vector b = [1, -7, -4] is in the span of the vectors v =
[2, 1, 1] and w = [1, 3, 2].

SOLUTION We know that b is in sp(v, w) if and only if b = x₁v + x₂w for some scalars x₁ and
x₂. This vector equation is equivalent to the linear system

[ 2  1 ]           [  1 ]
[ 1  3 ] [ x₁ ]  = [ -7 ].
[ 1  2 ] [ x₂ ]    [ -4 ]

Reducing the appropriate augmented matrix, we obtain

[ 2  1 |  1 ]   [ 1  3 | -7 ]   [ 1   3 | -7 ]   [ 1  0 |  2 ]
[ 1  3 | -7 ] ~ [ 2  1 |  1 ] ~ [ 0  -5 | 15 ] ~ [ 0  1 | -3 ].
[ 1  2 | -4 ]   [ 1  2 | -4 ]   [ 0  -1 |  3 ]   [ 0  0 |  0 ]

  R₁ ↔ R₂         R₂ → R₂ - 2R₁    R₂ → -(1/5)R₂
  (to avoid       R₃ → R₃ - 1R₁    R₁ → R₁ - 3R₂
  fractions)                       R₃ → R₃ + 1R₂

The left side of the final augmented matrix is in reduced row-echelon form.
From the solution x₁ = 2 and x₂ = -3, we see that b = 2v - 3w, which is indeed
in sp(v, w).
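MATLAB's built-in rref command, which appears in the exercises at the end of this section,
carries out exactly this Gauss-Jordan reduction to reduced row-echelon form. As a spot check
(ours, not the text's), applying it to the augmented matrix of Example 7 reproduces the final
matrix above.

    A = [2 1; 1 3; 1 2];            % columns are v and w
    b = [1; -7; -4];
    rref([A b])                     % gives [1 0 2; 0 1 -3; 0 0 0]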

The linear system Ax = b displayed in Eq. (1) can be written in the form

    [ a₁₁ ]       [ a₁₂ ]             [ a₁ₙ ]   [ b₁ ]
    [ a₂₁ ]       [ a₂₂ ]             [ a₂ₙ ]   [ b₂ ]
x₁  [  ·  ] + x₂  [  ·  ] + ··· + xₙ  [  ·  ] = [  ·  ].
    [ aₘ₁ ]       [ aₘ₂ ]             [ aₘₙ ]   [ bₘ ]

This equation expresses a typical column vector b in Rᵐ as a linear combination
of the column vectors of the matrix A if and only if scalars x₁, x₂, ..., xₙ
can be found to satisfy that equation. Example 7 illustrates this. We phrase this
result as follows:

Let A be an m x n matrix. The linear system Ax = b is consistent if
and only if the vector b in Rᵐ is in the span of the column vectors
of A.
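In MATLAB this criterion can be tested with the built-in rank function: b lies in the span of
the columns of A exactly when appending b to A does not increase the rank. The following
check (ours, not the text's) uses the data of Example 7.

    v = [2; 1; 1];  w = [1; 3; 2];  b = [1; -7; -4];
    A = [v w];                      % the columns span sp(v, w)
    rank([A b]) == rank(A)          % displays 1 (true): Ax = b is consistent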

A matrix in row-echelon form with all pivots equal to 1 and with zeros
above as well as below each pivot is said to be in reduced row-echelon form.
Thus the Gauss—Jordan method consists of using elementary row operations
on an augmented matrix [A | b] to bring the coefficient matrix A into reduced
HISTORICAL NOTE THE JORDAN HALF OF THE GAUSS-JORDAN METHOD is essentially a systematic
technique of back substitution. In this form, it was first described by Wilhelm Jordan
(1842-1899), a German professor of geodesy, in the third (1888) edition of his Handbook of
Geodesy. Although Jordan's arrangement of his calculations is different from the one presented
here, partly because he was always applying the method to the symmetric system of equations
arising out of a least-squares application in geodesy (see Section 6.5), Jordan's method uses the
same arithmetic and arrives at the same answers for the unknowns.
Wilhelm Jordan was prominent in his field in the late nineteenth century, being involved in
several geodetic surveys in Germany and in the first major survey of the Libyan desert. He was the
founding editor of the German geodesy journal and was widely praised as a teacher of his subject.
His interest in finding a systematic method of solving large systems of linear equations stems from
their frequent appearance in problems of triangulation.

row-echelon form. It can be shown that the reduced row-echelon form of a
matrix A is unique. (See Section 2.3, Exercise 33.)
The examples we have given illustrate the three possibilities for solutions
of a linear system: namely, no solutions (inconsistent system), a unique
solution, or an infinite number of solutions. We state this formally in a
theorem and prove it.

THEOREM 1.7 Solutions of Ax = b

Let Ax = b be a linear system, and let [A | b] ~ [H | c], where H is in
row-echelon form.

1. The system Ax = b is inconsistent if and only if the augmented
   matrix [H | c] has a row with all entries 0 to the left of the partition
   and a nonzero entry to the right of the partition.
2. If Ax = b is consistent and every column of H contains a pivot, the
   system has a unique solution.
3. If Ax = b is consistent and some column of H has no pivot, the
   system has infinitely many solutions, with as many free variables as
   there are pivot-free columns in H.

PROOF If [H | c] has an ith row with all entries 0 to the left of the partition and
a nonzero entry cᵢ to the right of the partition, the corresponding ith equation
in the system Hx = c is 0x₁ + 0x₂ + ··· + 0xₙ = cᵢ, which has no solutions;
therefore, the system Ax = b has no solutions, by Theorem 1.6. The next
paragraph shows that, if H contains no such row, we can find a solution to
the system. Thus the system is inconsistent if and only if H contains such
a row.
Assume now that [H | c] has no row with all entries 0 to the left of the
partition and a nonzero entry to the right. If the ith row of [H | c] is a zero row
vector, the corresponding equation 0x₁ + 0x₂ + ··· + 0xₙ = 0 is satisfied for
all values of the variables xⱼ, and thus it can be deleted from the system Hx = c.
Assume that this has been done wherever possible, so that [H | c] has no zero
row vectors. For each j such that the jth column has no pivot, we can set xⱼ
equal to any value we please (as in Example 4) and then, starting from the last
remaining equation of the system and working back to the first, solve in
succession for the variables corresponding to the columns containing the
pivots. If some column j has no pivot, there are an infinite number of solutions,
because xⱼ can be set equal to any value. On the other hand, if every column
has a pivot (as in Examples 2, 6, and 7), the value of each xⱼ is uniquely
determined.

With reference to item (3) of Theorem 1.7, the number of free variables in
the solution set of a system Ax = b depends only on the system, and not on the
way in which the matrix A is reduced to row-echelon form. This follows from
the uniqueness of the reduced row-echelon form. (See Exercise 33 in Section
2.3.)

Elementary Matrices

The elementary row operations we have performed can actually be carried out
by means of matrix multiplication. Although it is not efficient to row-reduce a
matrix by multiplying it by other matrices, representing row reduction as a
product of matrices is a useful theoretical tool. For example, we use elementary
matrices in Section 1.5 to show that, for square matrices A and C, if AC =
I, then CA = I. We use them again in Section 4.2 to demonstrate the
multiplicative property of determinants, and again in Section 10.2 to exhibit a
factorization of some square matrices A into a product LU of a lower-triangular
matrix L and an upper-triangular matrix U.
If we interchange its second and third rows, the 3 x 3 identity matrix

    [ 1  0  0 ]                 [ 1  0  0 ]
I = [ 0  1  0 ]   becomes   E = [ 0  0  1 ].
    [ 0  0  1 ]                 [ 0  1  0 ]

If A = [aᵢⱼ] is a 3 x 3 matrix, we can compute EA, and we find that

     [ 1  0  0 ][ a₁₁  a₁₂  a₁₃ ]   [ a₁₁  a₁₂  a₁₃ ]
EA = [ 0  0  1 ][ a₂₁  a₂₂  a₂₃ ] = [ a₃₁  a₃₂  a₃₃ ].
     [ 0  1  0 ][ a₃₁  a₃₂  a₃₃ ]   [ a₂₁  a₂₂  a₂₃ ]

We have interchanged the second and third rows of A by multiplying A on the
left by E.

DEFINITION 1.14 Elementary Matrix

Any matrix that can be obtained from an identity matrix by means of


one elementary row operation is an elementary matrix.

We leave the proof of the following theorem as Exercises 52 through 54.

THEOREM 1.8 Use of Elementary Matrices

Let A be an m x n matrix, and let E be an m x m elementary matrix.
Multiplication of A on the left by E effects the same elementary
row operation on A that was performed on the identity matrix to
obtain E.

Thus row reduction of a matrix to row-echelon form can be accomplished by
successive multiplication on the left by elementary matrices. In other words, if
A can be reduced to H through elementary row operations, there exist
elementary matrices E₁, E₂, ..., Eₜ such that

H = (Eₜ ··· E₂E₁)A.

Again, this is by no means an efficient way to execute row reduction, but such
an algebraic representation of H in terms of A is sometimes handy in proving
theorems.
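This relationship is easy to experiment with. The lines below are a small MATLAB sketch of
our own (not from the text); they build the three elementary matrices that will appear in
Example 8 by applying the row operations to eye(3), and check that their product carries A to
a row-echelon form.

    A = [0 1 -3; 2 3 -1; 4 5 -2];    % the matrix of Example 8 below
    I = eye(3);
    E1 = I([2 1 3], :);              % interchange rows 1 and 2 of I
    E2 = I;  E2(3, 1) = -2;          % add -2 times row 1 to row 3
    E3 = I;  E3(3, 2) = 1;           % add row 2 to row 3
    H = E3*E2*E1*A                   % gives [2 3 -1; 0 1 -3; 0 0 -3]
    C = E3*E2*E1                     % multiplying A on the left by C gives H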

EXAMPLE 8 Let

    [ 0  1  -3 ]
A = [ 2  3  -1 ].
    [ 4  5  -2 ]

Find a matrix C such that CA is a matrix in row-echelon form that is row
equivalent to A.

SOLUTION We row-reduce A to row-echelon form H and write down, for each row
operation, the elementary matrix obtained by performing the same operation
on the 3 x 3 identity matrix.

Reduction of A              Row Operation           Elementary Matrix

    [ 0  1  -3 ]                                          [ 0  1  0 ]
A = [ 2  3  -1 ]            R₁ ↔ R₂                 E₁ =  [ 1  0  0 ]
    [ 4  5  -2 ]                                          [ 0  0  1 ]

    [ 2  3  -1 ]                                          [  1  0  0 ]
  ~ [ 0  1  -3 ]            R₃ → R₃ - 2R₁           E₂ =  [  0  1  0 ]
    [ 4  5  -2 ]                                          [ -2  0  1 ]

    [ 2   3  -1 ]                                         [ 0  0  0 ]
  ~ [ 0   1  -3 ]           R₃ → R₃ + 1R₂           E₃ =  [ 0  1  0 ]
    [ 0  -1   0 ]                                         [ 0  1  1 ]

    [ 2  3  -1 ]
  ~ [ 0  1  -3 ] = H.
    [ 0  0  -3 ]

Thus, we must have E₃(E₂(E₁A)) = H; so the desired matrix C is

                [ 0   1  0 ]
C = E₃E₂E₁ =    [ 1   0  0 ].
                [ 1  -2  1 ]
To compute C, we do not actually have to multiply out E₃E₂E₁. We know that
multiplication of E₁ on the left by E₂ simply adds -2 times row 1 of E₁ to its
row 3, and subsequent multiplication on the left by E₃ adds row 2 to row 3 of
the matrix E₂E₁. Thus we can find C by executing the same row-reduction steps
on I that we executed to change A to H, namely,

[ 1  0  0 ]   [ 0  1  0 ]   [ 0   1  0 ]   [ 0   1  0 ]
[ 0  1  0 ] ~ [ 1  0  0 ] ~ [ 1   0  0 ] ~ [ 1   0  0 ].
[ 0  0  1 ]   [ 0  0  1 ]   [ 0  -2  1 ]   [ 1  -2  1 ]
     I            E₁            E₂E₁         C = E₃E₂E₁

We can execute analogous elementary column operations on a matrix by
multiplying the matrix on the right by an elementary matrix. Column
reduction of a matrix A is not important for us in this chapter, because it does
not preserve the solution set of Ax = b when applied to the augmented matrix
[A | b]. However, we will have occasion to refer to column reduction when
computing determinants. The effect of multiplication of a matrix A on the
right by elementary matrices is explored in Exercises 36-38 in Section 1.5.

SUMMARY

1. A linear system has an associated augmented (or partitioned) matrix,
   having the coefficient matrix of the system on the left of the partition and
   the column vector of constants on the right of the partition.
2. The elementary row operations on a matrix are as follows:
   (Row interchange) Interchange of two rows;
   (Row scaling) Multiplication of a row by a nonzero scalar;
   (Row addition) Addition of a multiple of a row to a different row.
3. Matrices A and B are row equivalent (written A ~ B) if A can be
   transformed into B by a sequence of elementary row operations.
4. If Ax = b and Hx = c are systems such that the augmented matrices [A | b]
   and [H | c] are row equivalent, the systems Ax = b and Hx = c have the
   same solution set.
5. A matrix is in row-echelon form if:
   a. All rows containing only zero entries are grouped together at the
      bottom of the matrix.
   b. The first nonzero element (the pivot) in any row appears in a column
      to the right of the first nonzero element in any preceding row.
6. A matrix is in reduced row-echelon form if it is in row-echelon form and,
   in addition, each pivot is 1 and is the only nonzero element in its column.
   Every matrix is row equivalent to a unique matrix in reduced row-echelon
   form.
7. In the Gauss method with back substitution, we solve a linear system by
   reducing the augmented matrix so that the portion to the left of the

   partition is in row-echelon form. The solution is then found by back
   substitution.
8. The Gauss-Jordan method is similar to the Gauss method, except that
   pivots are adjusted to be 1 and zeros are created above as well as below
   the pivots.
9. A linear system Ax = b has no solutions if and only if, after [A | b] is
   row-reduced so that A is transformed into row-echelon form, there exists
   a row with only zero entries to the left of the partition but with a nonzero
   entry to the right of the partition. The linear system is then inconsistent.
10. If Ax = b is a consistent linear system and if a row-echelon form H of A
    has at least one column containing no (nonzero) pivot, the system has an
    infinite number of solutions. The free variables corresponding to the
    columns containing no pivots can be assigned any values, and the
    reduced linear system can then be solved for the remaining variables.
11. An elementary matrix E is one obtained by applying a single elementary
    row operation to an identity matrix I. Multiplication of a matrix A on the
    left by E effects the same elementary row operation on A.

EXERCISES

In Exercises 1-6, reduce the matrix to (a) row-echelon form, and (b) reduced
row-echelon form. Answers to (a) are not unique, so your answer may differ from
the one at the back of the text.

In Exercises 7-12, describe all solutions of a linear system whose corresponding
augmented matrix can be row-reduced to the given matrix. If requested, also give
the indicated particular solution, if it exists.

> 1 4 2 4-2 7. 1-1 2 {3 , solut th x, = 2


1. ; 32 2/4 8 3 i 1 4 | | somone %
8.10 1 24] -!
3 -l
0 2-1
1 2 90
3 0 0 2| 4
° 1 1-3 3 1020 |
1 5 5 9| 9.10 1 1 3 | -2],
0000 0
0 0 3 -2| solution with x, = 3, x, = -2
4.)0 0 | i 1 1 0 3 0j -4
Po 3 2 4) 19/9 9 1-1 Of 0
"10 0 0 0 14] -2)/
~! ; Oo! 000 0 0] 0
5. ; 6 ) 4 7 solution with x, = 2, x, = 1
0 0 1 3-4

1-!} 2 0 3] 4 24. x, t2.-3y4+ W=?2


000 1 4] 21 3y, + 6x, — 8x, — 2x, =
lo 0 0 0 Of -1!
0 0 0 0 0] 0!
In Exercises 25-28, deterinine whether the vector
b is in the span of the vectors ¥,
in Exercises 13-20, find all solutions of the given
jinear system, using the Gauss method with back 3 0 ! -3]
substitution. b= 3},¥, = 2), = 4|..; = 4

Ww
wr
3 4 —2 5
13. 2x- y= 8 |.
6x — Sy = 32 26. b= 4}
[14 2 fz
14. 4x, - 3x, = 10
5 [1 2 [-3
moe) ie| Zhe] Sine |d
8x, — xX, = 10
15. y+ z= 6
3x -—yt z= -7 | a13, | 0 5 ~8
x+y-3z=
-13
16. 2x+ y-3z= 0 v,= 0
=|
6x+3y-8z= 9 —4|

2x- yt5z=-4 [ 2 1 -3 I
17, x,-— 2x,= 3 28. b= atta ™ he 7
3x, — xX, = 14
| 7 4 -9 4

(4
x, — 1x,
= -2
2
18 x,-— 3x, + x, =2

0
3x, — 8x, + 2x, = 5 * 10
19. x, +4x,-2x,= 4
29. Mark each of the following True or False.
2x,
+ 71x, - xX;
= -2
___ a. Every linear system with the same
2x, + 9X,- 7x; = 1
number of equations as unknowns has a
20. x, _ 3x; + 2x; —_ X4 = 8 unique solution.
—_._ b. Every linear system with the same
number of equations as unknowns has at
least one solution.
In Exercises 21-24, find ail solutions of the iinear
___ ¢. A linear system with more equations than
system, using the Gauss-Jordan method. unknowns may have an infinite number
of solutions.
21. 3x, — 2x, = -8 __. d. A linear system with fewer equations
than unknowns may have no solution.
4x, + 5x, = -3
___e. Every matrix is row equivalent to a
22. 2x, + 8x, = 16 unique matrix in row-echelon form.
5x, — 4x, = -8 __ f. Every matrix is row equivalent to a
23. Xx, —2x,+ x, =6 unique matrix in reduced row-echelon
form.
2X,- x,+ «4, -3x,=0 ___ g. If [A | b] and [B | c] are row-equivalent
9x, — 3x,- x, - 7x,= 4 partitioned matrices, the linear systems

Ax = b and Bx = c have the same 40. Determine all values of the 5, that make the
solution set. linear system
. A linear system with a square coefficient
matrix A has a unique solution if and xX, +X, —- x; = 5,
only if A is row equivalent to the identity 2x, + x3= b,

matrix. X, — X; = b;
i. A linear system with coefficient matrix A
has an infinite number of solutions if and consistent.
only if A can be row-reduced to an 41. Determine all values 5,, b,, and 5, such that
echelon matrix that includes some b = (b,, b,, 55] lies in the span of v, =
column containing no pivot. (1, l, 0], VY, = [3, -l, 4], and y=
. Acconsistent linear system with coefficient {-1, 2, —3].
matrix A has an infinite number of 42. Find an elementary matrix £ such that
solutions if and only if A can be
row-reduced io an echelon matnx that yaa 1 3 1 4
includes some column containing no E012 1);=)0 1 2 1].
pivot. 3451) {0 -5 2-1)
43. Find an elementary matrix £ such that

In Exercises 30-37, describe all possible values for [1 3.14 1314


the unknowns x; so that the matrix equation is Gast ia4sal
valid. 3451 3451
. Find 2 matrix C such that
30 2[x, %}- [4 7] = [-2 11]
12] f1 2
31. 4[x, x] + 2ix, 31={[-6 18]
q3 4)-|0 3)
32. Ix, x4] _3| = (2] 42| |0 -6
45. Find a matrix C such that
33.
1 2 3 4
a

Cl3 4)=]4 2).


34.
al [e}-['s] 42 1 2

35.
[ =I} In Exercises 46-51, let A bea 4 x 4 matrix. Find

36.
afd a matrix C such that the result of applying the
given sequence of elementary row operations to A
can also be found by computing the product CA.
37.
Ee} 3)=[o
38. Determine all values of the b, that make the
46. Interchange row | and row 2.
47, Interchange row ! and row 3; multiply row 3
linear system by 4.
x, + 2x, = b, 48. Multiply row | by 5; interchange rows 2 and
3; add 2 times row 3 to row 4.
3x, + 6x. = dB,
49. Add 4 times row 2 to row 4; multiply row 4
consistent. by —3; add 5 times row 4 to row |.
39. Determine all values 5, and 6, such that b = 50. Interchange rows | and 4; add 6 times row 2
[b,, 5,] is a linear combination of v, = [1, 3] to row 1; add —3 times row | to row 3; add
and V5 = (5, —1}. —2 times row 4 to row 2.

51. Add 3 times row 2 to row 4; add —2 times 0. where the meaning of ‘sufficiently small" must
row 4 to row 3; add 5 times row 3 to row |; be specified in terms of the size of the nonzero
add —4 times row | to row 2. entries in the original matrix. The routine
YUREDUCE in LINTEK provides drill on the
steps involved in reducing a matrix without
Exercise 24 in Section 1.3 is useful for the irext requiring burdensome computation. The program
three exercises. computes the smallest nonzero coefficient
magnitude m and asks the user to enter a number
52. Prove Theorem 1.8 for the row-interchange r (for ratio), all computed entries of magnitude
operation. less than rm produced during reduction of the
coefficient matrix will be set equal to zero. In
53. Prove Theorem 1.8 for the row-scaling
Exercises 59-64, use the routine YUREDUCE,
operation.
specifying r = 0.0001, to solve the linear system.
54. Prove Theorem 1.8 for the row-addition
operation.
59, 3x, - x, =-10
55. Prove that row equivalence ~ is an
equivalence relation by verifying the 1x, +2x%,= 7
following for m x n matrices A, B, and C. 2x, — 5x, = -37
a. A~ A. (Reflexive Property) 60. 5x, -— 2x, = 11
b. IfA ~ B, then
B ~ A.
8x, + x= 3
(Symmetric Property)
6x, — 5x, = —4
c. IfA ~ Band B~ C, thenA ~ C.
(Transitive Property) 61. 7x,-2x,+ x, =-14
56. Find a, b, and c such that the parabola y = —4x, + 5x, - 3x, = 17
ax’ + bx + c passes through the points 5x, - X,+2x,;= —-7
(1, —4), (-1, 0), and (2, 3). 62. —3x,+ 5x,+2x,= 12
57. Find a, b, c, and d such that the quartic 5x, — 7X, + 6x, = —16
curve y = ax’ + bx’ + cx’ + d passes
through (1, 2), (—1, 6), (-2, 38), and (2, 6). llx, — 17x, + 2x, = —40
58. Let A be an m X n matrix, and letc bea 63. x, -~24,+ X,- xX, +2x,= |
column vector such that Ax = c has a unique 2x,+ X,- 4x; - x, + 5x,= 16
solution. 8x,- x, +3x,- m4 - X= 1
a. Prove that m= n.
4x, — 2x, + 3x; — 8x, + 2x, = —5
b. If m = n, must the system Ax = b be
Sx, + 3x, — 4x, + 7x4, - 6x, = 7
consistent for every choice of b?
c. Answer part (b) for the case where 64. x,-2x,+ x,- x +2x,= |
m>n 2x,+ X,- 4x, - xX, + Sx,= 10
8x,- x, +3x,- XY- X= 5
h
4x, ~ 2x, + 3x; 7 8x, + 2X; = -3
& A problem we meet when reducing a matrix with
the aid of a computer involves determining when 5x, + 3x, — 4x, + 7x, -— 6x, = 1
a computed entry should be 0. The computer
might give an entry as 0.00000001, because of
roundoff error, when it should be 0. If the The routine MATCOMP in LINTEK can also be
computer uses this entry as a pivot in a future used to find the solutions of a linear system.
Step, the result is chaotic! For this reason, it is MATCOMP will bring the left portion of the
common practice to program the computer to augmented matrix to reduced row-echelon form
replace all sufficiently small computed entries with and display the result on the screen. The user can

then find the solutions. Use MATCOMP in the remaining exercises.

65. Find the reduced row-echelon form of the matrix in Exercise 6, by taking it
    as a coefficient matrix for zero systems.
66. Solve the linear system in Exercise 61.
67. Solve the linear system in Exercise 62.
68. Solve the linear system in Exercise 63.

MATLAB
When reducing a matrix X to reduced row-echelon form, we may need to swap row
i with row k. This can be done in MATLAB using the command

X([i k],:) = X([k i],:).

If we wish to multiply the ith row by the reciprocal of x_ij to create a pivot 1 in the
ith row and jth column, we can give the command

X(i,:) = X(i,:)/X(i,j).

When we have made pivots 1 and wish to make the entry in row k, column j equal
to zero using the pivot in row i, column j, we always multiply row i by the negative
of the entry that we wish to make zero, and add the result to row k. In MATLAB,
this has the form

X(k,:) = X(k,:) - X(k,j)*X(i,:).

Access MATLAB and enter the lines

X = ones(4);
i = 1; j = 2; k = 3;
X([i k],:) = X([k i],:)
X(i,:) = X(i,:)/X(i,j)
X(k,:) = X(k,:) - X(k,j)*X(i,:),

which you can then access using the up-arrow key and edit repeatedly to row-reduce a
matrix X. MATLAB will not show a partition in X; you have to supply the partition
mentally. If your installation contains the data files for our text, enter fcl1s4 now. We
will be asking you to work with some of the augmented matrices used in the exercises
for this section. In our data file, the augmented matrix for Exercise 63 is called E63,
etc. Solve the indicated system by setting X equal to the appropriate matrix and
reducing it using the up-arrow key and editing repeatedly the three basic commands
above. In MATLAB, only the commands executed most recently can be accessed by
using the up-arrow key. To avoid losing the command to interchange rows, which is
seldom necessary, execute it at least once in each exercise even if it is not needed.
(Interchanging the same rows twice leaves a matrix unchanged.) Solve the indicated
exercises listed below.

M1. Exercise 21        M4. Exercise 61
M2. Exercise 23        M5. Exercise 62
M3. Exercise 60

The command rref(A) in MATLAB will reduce the matrix A to reduced row-echelon
form. Use this command to solve the following exercises.

M6. Exercise 6         M8. Exercise 63
M7. Exercise 24        M9. Exercise 64

(MATLAB contains a demo command rrefmovie(A) designed to show the
step-by-step reduction of A, but with our copy and a moderately fast computer, the
demo goes so fast that it is hard to catch it with the Pause key in order to view it. If
you are handy with a word processor, you might copy the file rrefmovi.m as
rrefmovp.m, and then edit rrefmovp.m to supply ,pause at the end of each of the
four lines that start with A(. Entering rrefmovp(A) from MATLAB will then run
the altered demo, which will pause after each change of a matrix. Strike any key to
continue after a pause. You may notice that there seems to be unnecessary row
swapping to create pivots. Look at the paragraph on partial pivoting in Section 10.3
to understand the reason for this.)

1.5 INVERSES OF SQUARE MATRICES

Matrix Equations and Inverses


A system of n equations in n unknowns x₁, x₂, ..., xₙ can be expressed in
matrix form as

Ax = b,                                                        (1)

where A is the n x n coefficient matrix, x is the n x 1 column vector with ith
entry xᵢ, and b is an n x 1 column vector with constant entries. The analogous
equation using scalars is

ax = b                                                         (2)

for scalars a and b. If a ≠ 0, we usually think of solving Eq. (2) for x by dividing
by a, but we can just as well think of solving it by multiplying by 1/a. Breaking
the solution down into small steps, we have

(1/a)(ax) = (1/a)b        Multiplication by 1/a
((1/a)a)x = (1/a)b        Associativity of multiplication
       1x = (1/a)b        Property of 1/a
        x = (1/a)b.       Property of 1

Let us see whether we can solve Eq. (1) similarly if A is a nonzero matrix.
Matrix multiplication is associative, and the n x n identity matrix I plays the
same role for multiplication of n x 1 matrices that the number 1 plays for

multiplication of numbers. The crucial step is to find an n x n matrix C such
that CA = I, so that C plays for matrices the role that 1/a does for numbers. If
such a matrix C exists, we can obtain from Eq. (1)

C(Ax) = Cb        Multiplication by C
(CA)x = Cb        Associativity of multiplication
   Ix = Cb        Property of C
    x = Cb,       Property of I

which shows that our column vector x of unknowns must be the column vector
Cb. Now an interesting problem arises. When we substitute x = Cb back into
our equation Ax = b to verify that we do indeed have a solution, we obtain Ax
= A(Cb) = (AC)b. But how do we know that AC = I from our assumption that
CA = I? Matrix multiplication is not a commutative operation. This problem
does not arise with our scalar equation ax = b, because multiplication of real
numbers is commutative. It is indeed true that for square matrices, if CA = I
then AC = I, and we will work toward a proof of this. For example, the computation

[ -4   9 ][ 2  9 ]   [ 1  0 ]   [ 2  9 ][ -4   9 ]
[  1  -2 ][ 1  4 ] = [ 0  1 ] = [ 1  4 ][  1  -2 ]

illustrates that

     [ 2  9 ]              [ -4   9 ]
A =  [ 1  4 ]    and   C = [  1  -2 ]

satisfy CA = I = AC.
Unfortunately, it is not true that, for each nonzero n x n matrix A, we can
find an n x n matrix C such that CA = AC = I. For example, if the first column
of A has only zero entries, then the first column of CA also has only zero entries
for any matrix C, so CA ≠ I for any matrix C. However, for many important
n x n matrices A, there does exist an n x n matrix C such that CA = AC = I.
Let us show that when such a matrix exists, it is unique.

THEOREM 1.9 Uniqueness of an Inverse

Let A be an n x n matrix. If C and D are matrices such that AC = DA =
I, then C = D. In particular, if AC = CA = I, then C is the unique
matrix with this property.

PROOF Let C and D be matrices such that AC = DA = I. Because matrix
multiplication is associative, we have

D(AC) = (DA)C.

But, because AC = I and DA = I, we find that

D(AC) = DI = D    and    (DA)C = IC = C.

Therefore, C = D.
Now suppose that AC = CA = I, and let us show that C is the unique
matrix with this property. To this end, suppose also that AD = DA = I. Then
we have AC = I = DA, so D = C, as we just showed.

From the title of the preceding theorem, we anticipate the following


definition.

DEFINITION 1.15 Invertible Matrix

An n x n matrix A is invertible if there exists an n x n matrix C such
that CA = AC = I, the n x n identity matrix. The matrix C is the
inverse of A and is denoted by A⁻¹. If A is not invertible, it is singular.

Although A⁻¹ plays the same role arithmetically as a⁻¹ = 1/a (as we showed
at the start of this section), we will never write A⁻¹ as 1/A. The powers of an
invertible n x n matrix A are now defined for all integers. That is, for m > 0,
Aᵐ is the product of m factors A, and A⁻ᵐ is the product of m factors A⁻¹. We
consider A⁰ to be the n x n identity matrix I.

Inverses of Elementary Matrices


In Section 1.4, we saw that each elementary row operation can be undone by
another (possibly the same) elementary row operation. Let us see how this fact

HISTORICAL NOTE THE NOTION OF THE INVERSE OF A MATRIX first appears in an 1855 note of
Arthur Cayley (1821-1895) and is made more explicit in an 1858 paper entitled "A Memoir on
the Theory of Matrices." In that work, Cayley outlines the basic properties of matrices, noting that
most of these derive from work with sets of linear equations. In particular, the inverse comes from
the idea of solving a system

X = ax  + by  + cz
Y = a'x + b'y + c'z
Z = a''x + b''y + c''z

for x, y, z in terms of X, Y, Z. Cayley gives an explicit construction for the inverse in terms of the
determinants of the original matrix and of the minors.
In 1842, Arthur Cayley graduated from Trinity College, Cambridge, but could not find a
suitable teaching post. So, like Sylvester, he studied law and was called to the bar in 1849. During
his 14 years as a lawyer, he wrote about 300 mathematical papers; finally, in 1863 he became a
professor at Cambridge, where he remained until his death. It was during his stint as a lawyer that
he met Sylvester; their discussions over the next 40 years were extremely fruitful for the progress
of algebra. Over his lifetime, Cayley produced about 1000 papers in pure mathematics, theoretical
dynamics, and mathematical astronomy.

translates to the invertibility of the elementary matrices, and a description of
their inverses.
Let E₁ be an elementary row-interchange matrix, obtained from the
identity matrix I by the interchanging of rows i and k. Recall that E₁A effects
the interchange of rows i and k of A for any matrix A such that E₁A is defined.
In particular, taking A = E₁, we see that E₁E₁ interchanges rows i and k of E₁,
and hence changes E₁ back to I. Thus,

E₁E₁ = I.

Consequently, E₁ is an invertible matrix and is its own inverse.
Now let E₂ be an elementary row-scaling matrix, obtained from the
identity matrix by the multiplication of row i by a nonzero scalar r. Let E₂' be
the matrix obtained from the identity matrix by the multiplication of row i by
1/r. It is clear that

E₂'E₂ = E₂E₂' = I,

so E₂ is invertible, with inverse E₂'.
Finally, let E₃ be an elementary row-addition matrix, obtained from I by
the addition of r times row i to row k. If E₃' is obtained from I by the addition
of -r times row i to row k, then

E₃'E₃ = E₃E₃' = I.

We have established the following fact:

Every elementary matrix is invertible.


EXAMPLE 1 Find the inverses of the elementary matrices

      [ 0  1  0 ]        [ 3  0  0 ]             [ 1  0  4 ]
E₁ =  [ 1  0  0 ],  E₂ = [ 0  1  0 ],  and  E₃ = [ 0  1  0 ].
      [ 0  0  1 ]        [ 0  0  1 ]             [ 0  0  1 ]

SOLUTION Because E₁ is obtained from

    [ 1  0  0 ]
I = [ 0  1  0 ]
    [ 0  0  1 ]

by the interchanging of the first and second rows, we see that E₁⁻¹ = E₁.
The matrix E₂ is obtained from I by the multiplication of the first row by 3,
so we must multiply the first row of I by 1/3 to form

       [ 1/3  0  0 ]
E₂⁻¹ = [  0   1  0 ].
       [  0   0  1 ]

Finally, E₃ is obtained from I by the addition of 4 times row 3 to row 1. To
form E₃⁻¹ from I, we add -4 times row 3 to row 1, so

       [ 1  0  -4 ]
E₃⁻¹ = [ 0  1   0 ].
       [ 0  0   1 ]

Inverses of Products

The next theorem is fundamental in work with inverses.

THEOREM 1.10 Inverses of Products

Let A and B be invertible n x n matrices. Then AB is invertible, and

(AB)⁻¹ = B⁻¹A⁻¹.

PROOF By assumption, there exist matrices A⁻¹ and B⁻¹ such that AA⁻¹ = A⁻¹A
= I and BB⁻¹ = B⁻¹B = I. Making use of the associative law for matrix
multiplication, we find that

(AB)(B⁻¹A⁻¹) = [A(BB⁻¹)]A⁻¹ = (AI)A⁻¹ = AA⁻¹ = I.

A similar computation shows that (B⁻¹A⁻¹)(AB) = I. Therefore, the inverse of
AB is B⁻¹A⁻¹; that is, (AB)⁻¹ = B⁻¹A⁻¹.
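A quick numerical spot check of Theorem 1.10 can be made in MATLAB (this check is ours,
not the text's).

    A = [2 9; 1 4];  B = [1 3; 2 5];     % two invertible 2 x 2 matrices
    inv(A*B) - inv(B)*inv(A)             % the zero matrix, up to roundoff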

It is instructive to apply Theorem 1.10 to a product Eₜ ··· E₃E₂E₁ of
elementary matrices. In the expression

(Eₜ ··· E₃E₂E₁)A,

the product Eₜ ··· E₃E₂E₁ performs a sequence of elementary row operations
on A. First, E₁ acts on A; then, E₂ acts on E₁A; and so on. To undo this
sequence, we must first undo the last elementary row operation, performed by
Eₜ. This is accomplished by using Eₜ⁻¹. Continuing, we should perform the
sequence of operations given by

E₁⁻¹E₂⁻¹E₃⁻¹ ··· Eₜ⁻¹

in order to effect (Eₜ ··· E₃E₂E₁)⁻¹.

A Commutativity Property

We are now in position to show that if CA = I, then AC = I. First we prove a
lemma (a result preliminary to the main result).

LEMMA 1.1 Condition for Ax = b to Be Solvable for All b

Let A be an n x n matrix. The linear system Ax = b has a solution for
every choice of column vector b ∈ Rⁿ if and only if A is row equivalent
to the n x n identity matrix I.

PROOF Let b be any column vector in Rⁿ and let the augmented matrix
[A | b] be row-reduced to [H | c], where H is in reduced row-echelon form.
If H is the identity matrix I, then the linear system Ax = b has the solution
x = c.
For the converse, suppose that reduction of A to reduced row-echelon
form yields a matrix H that is not the identity matrix. Then the bottom
row of H must have every entry equal to 0. Now there exist elementary
matrices E₁, E₂, ..., Eₜ such that (Eₜ ··· E₂E₁)A = H. Recall that every
elementary matrix is invertible, and that a product of elementary matrices
is invertible. Let b = (Eₜ ··· E₂E₁)⁻¹eₙ, where eₙ is the column vector
with 1 in its nth component and zeros elsewhere. Reduction of the augmented
matrix [A | b] can be accomplished by multiplying both A and b on
the left by Eₜ ··· E₂E₁, so the reduction will yield [H | eₙ], which represents
a system of equations with no solution, because the bottom row has entries 0
to the left of the partition and 1 to the right of the partition. This shows that
if H is not the identity matrix, then Ax = b does not have a solution for
some b ∈ Rⁿ.

THEOREM 1.11 A Commutativity Property

Let A and C be n x n matrices. Then CA = I if and only if AC = I.

PROOF To prove that CA = I if and only if AC = I, it suffices to prove that if
AC = I, then CA = I, because the converse statement is obtained by reversing
the roles of A and C.
Suppose now that we do have AC = I. Then the equation Ax = b has a
solution for every column vector b in Rⁿ; we need only notice that x = Cb is a
solution because A(Cb) = (AC)b = Ib = b. By Lemma 1.1, we know that A is
row equivalent to the n x n identity matrix I, so there exists a sequence of
elementary matrices E₁, E₂, ..., Eₜ such that (Eₜ ··· E₂E₁)A = I. By
Theorem 1.9, the two equations

(Eₜ ··· E₂E₁)A = I    and    AC = I

imply that Eₜ ··· E₂E₁ = C, so we have CA = I also.

Computation of Inverses

Let A = [aᵢⱼ] be an n x n matrix. To find A⁻¹, if it exists, we must find an n x n
matrix X = [xᵢⱼ] such that AX = I; that is, such that

[ a₁₁  a₁₂  ···  a₁ₙ ][ x₁₁  x₁₂  ···  x₁ₙ ]   [ 1  0  ···  0 ]
[ a₂₁  a₂₂  ···  a₂ₙ ][ x₂₁  x₂₂  ···  x₂ₙ ]   [ 0  1  ···  0 ]
[  ·                 ][  ·                 ] = [       ·      ]        (3)
[ aₙ₁  aₙ₂  ···  aₙₙ ][ xₙ₁  xₙ₂  ···  xₙₙ ]   [ 0  0  ···  1 ]

Matrix equation (3) corresponds to n² linear equations in the n² unknowns xᵢⱼ;
there is one linear equation for each of the n² positions in an n x n matrix. For
example, equating the entries in the row 2, column 1 position on each side of
Eq. (3), we obtain the linear equation

a₂₁x₁₁ + a₂₂x₂₁ + ··· + a₂ₙxₙ₁ = 0.

Of these n² linear equations, n of them involve the n unknowns xᵢ₁ for i =
1, 2, ..., n; and these equations are given by the column-vector equation

  [ x₁₁ ]   [ 1 ]
  [ x₂₁ ]   [ 0 ]
A [  ·  ] = [ · ],                                             (4)
  [ xₙ₁ ]   [ 0 ]

which is a square system of equations. There are also n equations involving the
n unknowns xᵢ₂ for i = 1, 2, ..., n; and so on. In addition to solving system
(4), we must solve the systems

  [ x₁₂ ]   [ 0 ]           [ x₁ₙ ]   [ 0 ]
  [ x₂₂ ]   [ 1 ]           [ x₂ₙ ]   [ 0 ]
A [  ·  ] = [ · ], ..., A   [  ·  ] = [ · ],                   (5)
  [ xₙ₂ ]   [ 0 ]           [ xₙₙ ]   [ 1 ]

where each system has the same coefficient matrix A. Whenever we want to
solve several systems Ax = bᵢ with the same coefficient matrix but different
vectors bᵢ, we solve them all at once rather than one at a time. The main job in
solving a linear system is reducing the coefficient matrix to row-echelon or
reduced row-echelon form, and we don't want to repeat that work over and
over. We simply reduce one augmented matrix, where we line up all the vectors
bᵢ to the right of the partition. Thus, to solve all the linear systems in Eqs. (4)
and (5), we form the augmented matrix

[ a₁₁  a₁₂  ···  a₁ₙ | 1  0  ···  0 ]
[ a₂₁  a₂₂  ···  a₂ₙ | 0  1  ···  0 ]                          (6)
[  ·                 |       ·      ]
[ aₙ₁  aₙ₂  ···  aₙₙ | 0  0  ···  1 ]



which we abbreviate by [A | I]. The matrix A is to the left of the partition, and
the identity matrix I is to the right. We then perform a Gauss-Jordan
reduction on this augmented matrix. By Theorem 1.9, we know that if A⁻¹
exists, it is unique, so that every column in the reduced row-echelon form of A
has a pivot. Thus, A⁻¹ exists if and only if the augmented matrix (6) can be
reduced to

[ 1  0  ···  0 | c₁₁  c₁₂  ···  c₁ₙ ]
[ 0  1  ···  0 | c₂₁  c₂₂  ···  c₂ₙ ]
[       ·      |  ·                 ]
[ 0  0  ···  1 | cₙ₁  cₙ₂  ···  cₙₙ ]

where the n x n identity matrix I is to the left of the partition. The n x n
solution matrix C = [cᵢⱼ] to the right of the partition then satisfies AC = I, so
A⁻¹ = C. This is an efficient way to compute A⁻¹. We summarize the
computation in the following box, and we state the theory in Theorem 1.12.

Computation of A⁻¹

To find A⁻¹, if it exists, proceed as follows:
Step 1 Form the augmented matrix [A | I].
Step 2 Apply the Gauss-Jordan method to attempt to reduce [A | I]
to [I | C]. If the reduction can be carried out, then A⁻¹ = C.
Otherwise, A⁻¹ does not exist.
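In MATLAB, Steps 1 and 2 can be carried out with the rref command met earlier. This sketch
(ours, not the text's) applies it to the matrix of Example 2 below.

    A = [2 9; 1 4];                 % the matrix of Example 2 below
    R = rref([A eye(2)]);           % row-reduce [A | I]
    C = R(:, 3:4)                   % the right half; here C = [-4 9; 1 -2]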

EXAMPLE 2 For the matrix

A = [ 2  9 ],
    [ 1  4 ]

compute the inverse we exhibited at the start of this section, and use this
inverse to solve the linear system

2x + 9y = -5
 x + 4y =  7.

SOLUTION Reducing the augmented matrix, we have

[ 2  9 | 1  0 ]   [ 1  4 | 0  1 ]   [ 1  4 | 0   1 ]   [ 1  0 | -4   9 ]
[ 1  4 | 0  1 ] ~ [ 2  9 | 1  0 ] ~ [ 0  1 | 1  -2 ] ~ [ 0  1 |  1  -2 ].

Therefore,

       [ -4   9 ]
A⁻¹ =  [  1  -2 ].

If A⁻¹ exists, the solution of Ax = b is x = A⁻¹b. Consequently, the solution of
our system

[ 2  9 ][ x ]   [ -5 ]
[ 1  4 ][ y ] = [  7 ]

is

[ x ]   [ -4   9 ][ -5 ]   [  83 ]
[ y ] = [  1  -2 ][  7 ] = [ -19 ].
We emphasize that the computation of the solution of the linear system in
Example 2, using the inverse of the coefficient matrix, was for illustration only.
When faced with the problem of solving a square system Ax = b, one should
never start by finding the inverse of the coefficient matrix. To do so would
involve row reduction of [A | I] and subsequent computation of A^{-1}b, whereas
the shorter reduction of [A | b] provides the desired solution at once. The
inverse of a matrix is often useful in symbolic computations. For example, if
A is an invertible matrix and we know that AB = AC, then we can deduce that
B = C by multiplying both sides of AB = AC on the left by A^{-1}. If we have
r systems of equations

    Ax_1 = b_1,   Ax_2 = b_2,   . . . ,   Ax_r = b_r,

all with the same invertible n × n coefficient matrix A, it might seem to be
more efficient (for large r) to solve all the systems by finding A^{-1} and computing
the column vectors

    x_1 = A^{-1}b_1,   x_2 = A^{-1}b_2,   . . . ,   x_r = A^{-1}b_r.

Section 10.1 will show that using the Gauss method with back substitution on
the augmented matrix [A | b_1 b_2 · · · b_r] remains more efficient. Thus,
inversion of a coefficient matrix is not a good numerical way to solve a linear
system. However, we will find inverses very useful for solving other kinds of
problems.
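In MATLAB, which is used in the exercises at the end of this section, this advice corresponds to preferring the backslash operator to the function inv. A brief sketch, with an illustrative matrix and illustrative right-hand sides:

    A = [2 9; 1 4];  b = [-5; 7];
    x  = A \ b;          % solves Ax = b by elimination, without forming A^(-1)
    x2 = inv(A) * b;     % same answer here, but needlessly computes the inverse first
    % several right-hand sides at once, as in the reduction of [A | b1 b2 ... br]:
    B = [b [1; 0] [0; 1]];
    X = A \ B;           % column j of X solves Ax = B(:, j)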

THEOREM 1.12  Conditions for A^{-1} to Exist

The following conditions for an n × n matrix A are equivalent:

(i) A is invertible.
(ii) A is row equivalent to the identity matrix I.
(iii) The system Ax = b has a solution for each n-component column
vector b.
(iv) A can be expressed as a product of elementary matrices.
(v) The span of the column vectors of A is R^n.

PROOF  Step 2 in the box preceding Example 2 shows that parts (i) and (ii) of
Theorem 1.12 are equivalent. For the equivalence of (ii) with (iii), Lemma 1.1
shows that Ax = b has a solution for each b ∈ R^n if and only if (ii) is true. Thus,
(ii) and (iii) are equivalent. The equivalence of (iii) and (v) follows from the
box on page 63.

Turning to the equivalence of parts (ii) and (iv), we know that the matrix A
is row equivalent to I if and only if there is a sequence of elementary matrices
E_1, E_2, . . . , E_k such that E_k · · · E_2E_1A = I; and this is the case if and only if A is
expressible as a product A = E_1^{-1}E_2^{-1} · · · E_k^{-1} of elementary matrices.  ▲

EXAMPLE 3  Using Example 2, express A = \begin{bmatrix} 2 & 9 \\ 1 & 4 \end{bmatrix} as a product of elementary matrices.

SOLUTION  The steps we performed in Example 2 can be applied in sequence to 2 × 2
identity matrices to generate elementary matrices:

    E_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}     Interchange rows 1 and 2.

    E_2 = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}    Add -2 times row 1 to row 2.

    E_3 = \begin{bmatrix} 1 & -4 \\ 0 & 1 \end{bmatrix}    Add -4 times row 2 to row 1.

Thus we see that E_3E_2E_1A = I, so

\[
A = E_1^{-1}E_2^{-1}E_3^{-1}
= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}.
\]
Example 3 illustrates the following boxed rule for expressing an invertible
matrix A as a product of elementary matrices.

Expressing an Invertible Matrix A as a Product of Elementary Matrices

Write in left-to-right order the inverses of the elementary matrices
corresponding to successive row operations that reduce A to I.
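The boxed rule is easy to check numerically. The MATLAB sketch below uses the elementary matrices read off from the row operations listed in Example 3 and verifies that the product of their inverses, in left-to-right order, reproduces A.

    A  = [2 9; 1 4];
    E1 = [0 1; 1 0];          % interchange rows 1 and 2
    E2 = [1 0; -2 1];         % add -2 times row 1 to row 2
    E3 = [1 -4; 0 1];         % add -4 times row 2 to row 1
    E3*E2*E1*A                % gives the 2 x 2 identity matrix
    inv(E1)*inv(E2)*inv(E3)   % gives A back, as the boxed rule asserts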

EXAMPLE 4  Determine whether the matrix

\[
A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 5 & -3 \\ -3 & 2 & -4 \end{bmatrix}
\]

is invertible, and find its inverse if it is.


SOLUTION  We have

\[
\left[\begin{array}{ccc|ccc} 1 & 3 & -2 & 1 & 0 & 0 \\ 2 & 5 & -3 & 0 & 1 & 0 \\ -3 & 2 & -4 & 0 & 0 & 1 \end{array}\right] \sim
\left[\begin{array}{ccc|ccc} 1 & 3 & -2 & 1 & 0 & 0 \\ 0 & -1 & 1 & -2 & 1 & 0 \\ 0 & 11 & -10 & 3 & 0 & 1 \end{array}\right]
\]
\[
\sim
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & -5 & 3 & 0 \\ 0 & 1 & -1 & 2 & -1 & 0 \\ 0 & 0 & 1 & -19 & 11 & 1 \end{array}\right] \sim
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 14 & -8 & -1 \\ 0 & 1 & 0 & -17 & 10 & 1 \\ 0 & 0 & 1 & -19 & 11 & 1 \end{array}\right].
\]

Therefore, A is an invertible matrix, and

\[
A^{-1} = \begin{bmatrix} 14 & -8 & -1 \\ -17 & 10 & 1 \\ -19 & 11 & 1 \end{bmatrix}.
\]
EXAMPLE 5 Express the matrix A of Example 4 as a product of elementary matrices.


SOLUTION  In accordance with the box that follows Example 3, we write in left-to-right
order the successive inverses of the elementary matrices corresponding to the
row reduction of A in Example 4. We obtain

\[
A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 11 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.
\]

EXAMPLE 6  Determine whether the span of the vectors [1, -2, 1], [3, -5, 4], and [4, -3, 9]
is all of R^3.

SOLUTION  Let

\[
A = \begin{bmatrix} 1 & 3 & 4 \\ -2 & -5 & -3 \\ 1 & 4 & 9 \end{bmatrix}.
\]

We have

\[
\begin{bmatrix} 1 & 3 & 4 \\ -2 & -5 & -3 \\ 1 & 4 & 9 \end{bmatrix} \sim
\begin{bmatrix} 1 & 3 & 4 \\ 0 & 1 & 5 \\ 0 & 1 & 5 \end{bmatrix} \sim
\begin{bmatrix} 1 & 3 & 4 \\ 0 & 1 & 5 \\ 0 & 0 & 0 \end{bmatrix}.
\]

We do not have a pivot in the row 3, column 3 position, so we are not able
to reduce A to the identity matrix. By Theorem 1.12, the span of the given
vectors is not all of R^3.
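A software check of Example 6 goes the same way: by condition (v) of Theorem 1.12, the span is all of R^3 exactly when the matrix with the given vectors as columns row-reduces to the identity. A short MATLAB sketch:

    A = [1 3 4; -2 -5 -3; 1 4 9];   % the vectors of Example 6 as columns
    rref(A)          % the last row is all zeros, so A is not row equivalent to I
    rank(A)          % returns 2, which is less than 3, confirming the conclusion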

SUMMARY

1. Let A be a square matrix. A square matrix C such that CA = AC = I is the
inverse of A and is denoted by C = A^{-1}. If such an inverse of A exists, then
A is said to be invertible. The inverse of an invertible matrix A is unique. A
square matrix that has no inverse is called singular.
2. The inverse of a square matrix A exists if and only if A can be reduced to
the identity matrix I by means of elementary row operations or (equiva-
lently) if and only if A is a product of elementary matrices. In this case, A is
equal to the product, in left-to-right order, of the inverses of the successive
elementary matrices corresponding to the sequence of row operations used
to reduce A to I.
3. To find A^{-1}, if it exists, form the augmented matrix [A | I] and apply the
Gauss–Jordan method to reduce this matrix to [I | C]. If this can be done,
then A^{-1} = C. Otherwise, A is not invertible.
4. The inverse of a product of invertible matrices is the product of the
inverses in the reverse order.

EXERCISES
In Exercises 1-8, (a) find the inverse of the square In Exercises 11 and 12. determine whether the
matrix, if it exists, and (b} express each invertible span of the column vectors of the given matrix is
matrix as a product of elementary matrices. Re.

“By 3 6
BS
67
rT 1 (0 1 1]
0-1-3
1 O-!
4
2
3.
a
! 0 1
«To1] -3
rT 1-2
0 0-1]
| O
5.10 1 1 6. | 2 0 3 5 0 2
Ut

12.
0 1 2-4
w

lo oO -1 3 |
f-1 2 4 -2|
2 1 4 -1
2 13. a. Show that the matrix
7.1432 «5 8. | 2 -3
0-1 1 1 0 2 -3
[5 7
A=

is invertible, and find its inverse.


b. Use the result in (a) to find the solution
of the system of equations
In Exercises 9 and 10, find the inverse of the 2x, ~ 3X3 = 4, 5x, — 1X, = —3.
matrix, if it exists. 14, Using the inverse of the matrix in Exercise
7, find the solution of the system of
1 0 00 0 0 equations
0-1 00 0 0
9 0 0 2 0 0 0 2x, + No + 4x, = 5

‘lo 0 03 0 0 3x, + 2x, + 5x, = 3


i0 0 00 4 0 Ns + xy = 8.

lo 00005 15. Find three linear equations that express x, y,


z in terms of r,s. ¢, if
(0 0000 6
0 0 00 5 0 2vn+t v+d4rer
9.12 9 0 4 0 0 3x+ 21+ 52=5
"0 0 300 0 -re 251
lo 2 00 0 0
i} 0 0 00 0 [Hint: See Exercise 14.]

16. Let 22. Let -f and B be two m X n matrices. Show


that .4 and B are row equivaient if and only
fo? 4 if there exists an invertible m X m1 matrix C

Wiis
A “0 1]. such that C4 = B.
l4 1 2


23. Mark each of the following True or False.
If possible, find a matrix C such that The statements involve matrices 4, B, and

sc-[0 i) 1 2 C, which are assumed to be of appropriate


size.
4 | _— a. If 4C = BC and C is invertible, then
A=B.
. Let . If 4B = Oand B is invertible, then

a
A= 0.
NW

A'=|0 3 1 . . If 4B = C and two of the matrices are


mWwW

412 invertible, then so is the third.


. If 4B = C and two of the matrices are
If possible, find a matrix C such that singular, then so is the third.
—_— €. If A- is invertible, then 4? is invertible.
2 1. 31 —f If 4° is invertible, then A? is invertible.
ACA=|-1 2 2 _— g. Every elementary matrix is invertible.
2 1 ——h. Every invertible matrix is an elementary
18. Let matrix.
i. If A and B are invertible matrices, then
422 so isA + B, and (A + By! =A! + BB.
A='|0
3 I}. __ij. If A and B are invertible, then so is .4B,
20 1 and (AB)! = A'B"'.
If possible, find a matrix B such that 24. Show that, if A is an invertible n x n matrix,
AB = 2A. then A? is invertible. Describe (A7)"' in
terms of A7'.
19. Let
121 25. a. IfA is invertible, is A + A’ always
A=
|0 1 2}. invertible?
132 b. If A is invertible, is A + A always
invertible?
If possible, find a matrix B such that
AB = A? + 2A. 26. Let A be a matrix such that A? is invertible.
Prove that 4 is invertible.
20. Find all numbers r such that
27. Let A and B be n X n matrices with A
2 4 | invertible.
I r3
Show that 4X = B has the unique
li 21] solution X = 47'B.
1s invertible. b. Show that X = A~'B can be found by the
following row reduction:
21. Find all numbers r such that

242
[4| B]~ |X).
lL r 3 That is. if the matrix A is reduced to the
112 identity matrix J, then the matrix B will be
reduced to A7'B.
Is invertible.

28. Note that b. Show that A is invertible if and only if


h# 0.
1 +1 — (a+ 0)
a hb (ab)

for nonzero scalars a, b € R. Find an Exercises 36-38 develop elementary column


analogous equality for invertible 1 x n operations.
matrices A and B.
29. An n X n matrix A Is nilpotent if A’ = O 36. For each type of elementary mairix E,
(the n X n zero matrix) for some positive explain how E can be obtained from the
integer r. identity matrix by means of operations on
columns.
a. Give an example of a nonzero nilpotent
2 x 2 matrix. 37. Let A be a square matrix, and let E be an
b. Show that, if A is an invertible n x n elementary matrix of the same size. Find the
matrix, then .4 is not nilpotent. effect on A of multiplying A on the night by
E. [Hini: Use Exeicise 36.]
30. A square matrix 4 is said to be idempotent if
A= A. 38. Let A be an invertible square matrix. Recali
that (BA)"' = A~'B-'" and use Exercise 37 to
a. Give an example of an idempotent
answer the following questions:
matrix other than O ard /.
b. Show that, if a matrix A is both a. If two rows of A are interchanged, how
idempotent and invertible, then A = /. does the inverse of the resulting matnx
compare with A~'?
31. Show that
b. Answer the question in part (a) if,
instead, a row of A is multiplied by a
nonzero scalar r.
c. Answer the question in part (a) if,
instead, r times the ith row of A is added
to the jth row.
is nilpotent. (See Exercise 29.)
32. A square matrix is upper triangular if all a In Exercises 39-42, use the routine YUREDUCE
entries below the main diagonal are zero. in LINTEK to find the inverse of the matrix, if it
Lower triangular is defined symmetrically. exists. If a printer is available, make a copy of the
Give an example of a nilpotent 4 x 4 matrix results. Otherwise, copy down the answers to three
that is not upper or lower triangular. (See significant figures.
Exercises 29 and 31.)
. Give an example of two invertible 4 x 4 [3-1 2
matrices whose sum is singular. 39. /1 2 1
34. Give an example of two singular 3 x 3 10 3-4
matrices whose sum is !nvertible. [2 41 4
35. Consider the 2 x 2 matrix 40.; 3 6 7
113 15 -2
A=_ i ab) rT 2 -1 3 4
-§ 2 011
and let kh = ad — bc. 4.1 13 -6 8
a. Show that. ifh # 0. then 118 -10 3 0
d/h —b+h Tr
4-10 3 #17
—c/h_ = ath 2 0-3 U1
a2. 14. 2 12 -15
1s the inverse of .4. 10-10 9 -5

In Exercises 43-48, follow the instructions for ‘44 -3 2 6


Exercises 3 9-42, but use the routine MATCOMP 0 | 5 2 1
in LINTER. Check to ensure that AA~' = I for 47.13 8-11 4 6
each matrix A whose inverse is found. 2 | -~§ 7 2
| 3 ~j 4+ 8B)
43. The matrix in Exercise 9 f 2 14 0 6
44. The matrix in Exercise 10 3 -l 2 4 6
45. ‘The matrix in Exercise +1 48. , | 3 4 ;
. . - l l
46. The matrix in Exercise 40 3 | 4-11 10

MATLAB
Access MATLAB and, if the data files for our text are accessible, enter fbc1s5.
Otherwise, enter these four matrices by hand. [In MATLAB, ln(x) is denoted by
log(x).]

—2 3 2/7 3a cos 2 21/8


A=|a/2 1 3.2], B=|V7 in4_ 2/31,
5 -6 1.3 V2 sind 83
-3.2 1.4 5.3
C=] 1.7 -3.6 4.1
10.3 8.5 -7.6

As you work the problems, write down the entry in the 2nd row, 3rd column position
of the answer, with four-significant-figure accuracy, to hand in.
Enter help inv, read the information, and then use the function inv to work
problems M1 through M4.
M1. Compute C°.
M2. Compute A°B°?°C.
M3. Find the matrix X such that XB = C.
M4. Find the matrix X such that B’XC = A.

Enter help / and then help \, read the information, and then use / and \ rather than
the function inv to work problems M5 through M8.
M5. Compute A'B’C"'B.
M6. Compute B-?CA-3B>.
M7. Find the matrix X such that CX = B°?.
M8. Find the matrix X such that AXC’ = B.

1.6 HOMOGENEOUS SYSTEMS, SUBSPACES, AND BASES

The Solution Set of a Homogeneous System


A linear system Ax = b is homogeneous if b = 0. A homogeneous linear system
Ax = 0 is always consistent, because x = 0, the zero vector, is certainly a
solution. The zero vector is called the trivial solution. Other solutions are
nontrivial solutions. A homogeneous system is special in that its solution set
has a self-contained algebraic structure of its own, as we now show.

THEOREM 1.13  Structure of the Solution Set of Ax = 0

Let Ax = 0 be a homogeneous linear system. If h_1 and h_2 are solutions
of Ax = 0, then so is the linear combination rh_1 + sh_2 for any scalars r
and s.

PROOF  Let h_1 and h_2 be solutions of Ax = 0, so that Ah_1 = 0 and Ah_2 = 0. By
the distributive and scalars-pull-through properties of matrix algebra, we have

    A(rh_1 + sh_2) = A(rh_1) + A(sh_2)
                   = r(Ah_1) + s(Ah_2)
                   = r0 + s0 = 0

for all scalars r and s. Thus the vector rh_1 + sh_2 is a solution of the system
Ax = 0.  ▲

Notice how easy it was to write down the proof of Theorem 1.13 in matrix
notation. What a chore it would have been to write out the proof using
equations with their subscripted variables and coefficients to denote a general
m X n homogeneous system!
Although we stated and proved Theorem 1.13 for just two solutions of
Ax = 0, either induction or the same proof using k solutions shows that:

Every linear combination of solutions of a homogeneous system
Ax = 0 is again a solution of the system.

Subspaces
The solution set of a homogeneous system Ax = 0 in n unknowns is an
example of a subset W of R^n with the property that every linear combination of
vectors in W is again in W. Note that W contains all linear combinations of its
vectors if and only if it contains every sum of two of its vectors and
every scalar multiple of each of its vectors. We now give a formal definition
of a subset of R^n having such a self-contained algebraic structure. Rather
than phrase the definition in terms of linear combinations, we state it in
terms of the two basic vector operations, vector addition and scalar multi-
plication.

DEFINITION 1.16  Closure and Subspace

A subset W of R^n is closed under vector addition if for all u, v ∈ W the
sum u + v is in W. If rv ∈ W for all v ∈ W and all scalars r, then W is
closed under scalar multiplication. A nonempty subset W of R^n that is
closed under both vector addition and scalar multiplication is a
subspace of R^n.

Theorem 1.13 shows that the solution set of every homogeneous system
with n unknowns is a subspace of R^n. We give an example of a subset of R^2 that
is a subspace and an example of a subset that is not a subspace.

EXAMPLE 1  Show that W = {[x, 2x] | x ∈ R} is a subspace of R^2.

SOLUTION  Of course, W is a nonempty subset of R^2. Let u, v ∈ W so that u = [a, 2a] and
v = [b, 2b]. Then u + v = [a, 2a] + [b, 2b] = [a + b, 2(a + b)] is again of the
form [x, 2x], and consequently is in W. This shows that W is closed under
vector addition. Because cu = c[a, 2a] = [ca, 2(ca)] is in W, we see that W is
also closed under scalar multiplication, so W is a subspace of R^2.

You might recognize the subspace W of Example 1 as a line passing
through the origin. However, not all lines in R^2 are subspaces. (See Exercises 11
and 14.)

EXAMPLE 2  Determine whether W = {[x, y] ∈ R^2 | xy ≥ 0} is a subspace of R^2.

SOLUTION  Here W consists of the vectors in the first or third quadrants (including the
coordinate axes), as shown in Figure 1.35. As the figure illustrates, the sum of a
vector in the first quadrant and a vector in the third quadrant may be a vector
in the second quadrant, so W is not closed under vector addition, and is not a
subspace. For a numerical example, [1, 2] + [-2, -1] = [-1, 1], which is not
in W.

Note that the set {0} consisting of just the zero vector in R^n is a subspace of
R^n, because 0 + 0 = 0 and r0 = 0 for all scalars r. We refer to {0} as the zero
subspace. Of course, R^n itself is a subspace of R^n, because it is closed under
vector addition and scalar multiplication. The two subspaces {0} and R^n
represent extremes in size for subspaces of R^n. The next theorem shows one
way to form subspaces of various sizes.

FIGURE 1.35
The shaded subset is not closed under addition.

THEOREM 1.14  Subspace Property of a Span

Let W = sp(w_1, w_2, . . . , w_k) be the span of k > 0 vectors in R^n. Then W
is a subspace of R^n.

PROOF  Let

    u = r_1w_1 + r_2w_2 + · · · + r_kw_k   and   v = s_1w_1 + s_2w_2 + · · · + s_kw_k

be two elements of W. Their sum is

    u + v = (r_1 + s_1)w_1 + (r_2 + s_2)w_2 + · · · + (r_k + s_k)w_k,

which is again a linear combination of w_1, w_2, . . . , w_k, so u + v is in W. Thus W
is closed under vector addition. Similarly, for any scalar c,

    cu = (cr_1)w_1 + (cr_2)w_2 + · · · + (cr_k)w_k

is again in W—that is, W is closed under scalar multiplication. Because k > 0,
W is also nonempty, so W is a subspace of R^n.  ▲

We say that the vectors w_1, w_2, . . . , w_k span or generate the subspace
sp(w_1, w_2, . . . , w_k) of R^n.
We will see in Section 2.1 that every subspace in R^n can be described as the
span of at most n vectors in R^n. In particular, the solution set of a homoge-
neous system Ax = 0 can always be described as a span of some of the solution
vectors. We illustrate how to describe the solution set this way in an example.

EXAMPLE 3  Express the solution set of the homogeneous system

     x_1 - 2x_2 +  x_3 -  x_4 = 0
    2x_1 - 3x_2 + 4x_3 - 3x_4 = 0
    3x_1 - 5x_2 + 5x_3 - 4x_4 = 0
    -x_1 +  x_2 - 3x_3 + 2x_4 = 0

as a span of solution vectors.

SOLUTION  We reduce the augmented matrix [A | 0] to transform the coefficient matrix A
of the given system into reduced row-echelon form. We have

\[
\left[\begin{array}{cccc|c} 1 & -2 & 1 & -1 & 0 \\ 2 & -3 & 4 & -3 & 0 \\ 3 & -5 & 5 & -4 & 0 \\ -1 & 1 & -3 & 2 & 0 \end{array}\right] \sim
\left[\begin{array}{cccc|c} 1 & -2 & 1 & -1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & -1 & -2 & 1 & 0 \end{array}\right] \sim
\left[\begin{array}{cccc|c} 1 & 0 & 5 & -3 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]

The reduction is complete. Notice that we didn't really need to insert the
column vector 0 in the augmented matrix, because it never changes.
From the reduced matrix, we find that the homogeneous system has two
free variables and has a solution set described by the general solution vector

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -5r + 3s \\ -2r + s \\ r \\ s \end{bmatrix}
= r\begin{bmatrix} -5 \\ -2 \\ 1 \\ 0 \end{bmatrix} + s\begin{bmatrix} 3 \\ 1 \\ 0 \\ 1 \end{bmatrix}.   (1)
\]

Thus the solution set is

\[
\operatorname{sp}\!\left( \begin{bmatrix} -5 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 0 \\ 1 \end{bmatrix} \right).
\]

We chose these two generating vectors from Eq. (1) by taking r = 1, s = 0 for
the first and r = 0, s = 1 for the second.
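The same spanning vectors can be obtained by machine. The MATLAB sketch below uses the 'r' (rational) option of null, which builds a nullspace basis from the reduced row-echelon form exactly as in Example 3:

    A = [1 -2 1 -1; 2 -3 4 -3; 3 -5 5 -4; -1 1 -3 2];
    rref(A)              % two nonzero rows, so there are two free variables
    N = null(A, 'r')     % columns are the generating vectors [-5 -2 1 0]' and [3 1 0 1]'
    A*N                  % the zero matrix: each column of N solves Ax = 0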

The preceding example indicates how we can always express the entire
solution set of a homogeneous system with k free variables as the span of k
solution vectors.
Given an m × n matrix A, there are three natural subspaces of R^n or R^m
associated with it. Recall (Theorem 1.14) that a span of vectors in R^n is always
a subspace of R^n. The span of the row vectors of A is the row space of A, and is
of course a subspace of R^n. The span of the column vectors of A is the column
space of A and is a subspace of R^m. The solution set of Ax = 0, which we have
been discussing, is the nullspace of A and is a subspace of R^n. For example, if

\[
A = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & -1 \end{bmatrix},
\]

we see that

    the row space of A is sp([1, 0, 3], [0, 1, -1]) in R^3,
    the column space of A is sp\!\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \end{bmatrix} \right) in R^2, and
    the nullspace of A is sp\!\left( \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \right) in R^3.

The nullspace of A was readily found because A is in reduced row-echelon
form.
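For a matrix this small the three subspaces can be read off by hand, but the same information is available in MATLAB; a brief sketch:

    A = [1 0 3; 0 1 -1];
    null(A, 'r')     % nullspace basis: the single column [-3; 1; 1], as above
    rank(A)          % 2, so the two columns of rank already span all of R^2 (the column space)
    % the rows [1 0 3] and [0 1 -1] span the row space, a subspace of R^3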
In Section 1.3, we emphasized that for an m × n matrix A and x ∈ R^n, the
vector Ax is a linear combination of the column vectors of A. In Section 1.4 we
saw that the system Ax = b has a solution if and only if b is equal to some linear
combination of the column vectors of A. We rephrase this criterion for
existence of a solution of Ax = b in terms of the column space of A.

Column Space Criterion


A linear system Ax = b has a solution if and only if b is in the
column space of A.

We have discussed the significance of the nullspace and the column space
of a matrix. The row space of A is significant because the row vectors of A are
orthogonal to the vectors in the nullspace of A, as the ith equation in the
system Ax = 0 shows. This observation will be useful when we compute
projections in Section 6.1.

Bases

We have seen how the solution set of a homogeneous linear system can be
expressed as the span of certain selected solution vectors. Look again at
Eq. (1), which shows the solution set of the linear system in Example 3 to be
sp(w_1, w_2) for

\[
w_1 = \begin{bmatrix} -5 \\ -2 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad w_2 = \begin{bmatrix} 3 \\ 1 \\ 0 \\ 1 \end{bmatrix}.
\]

The last two components of these vectors are 1, 0 for w_1 and 0, 1 for w_2. These
components mirror the vectors i = [1, 0] and j = [0, 1] in the plane. Now the
vector [r, s] in the plane can be expressed uniquely as a linear combination of i
and j—namely, as ri + sj. Thus we see that every solution vector in Eq. (1) of
the linear system in Example 3 is a unique linear combination of w_1 and
w_2—namely, rw_1 + sw_2. We can think of (r, s) as being coordinates of the
solution relative to w_1 and w_2. Because we regard all ordered pairs of numbers
as filling a plane, this indicates how we might regard the solution set of this
system as a plane in R^4. We can think of {w_1, w_2} as a set of reference vectors for
the plane. We have depicted this plane in Figure 1.36.
More generally, if every vector w in the subspace W = sp(w_1, w_2, . . . , w_k)
of R^n can be expressed as a linear combination w = c_1w_1 + c_2w_2 + · · · + c_kw_k
in a unique way, then we can consider the ordered k-tuple (c_1, c_2, . . . , c_k) in R^k
to be the coordinates of w. The set {w_1, w_2, . . . , w_k} is considered to be a set of
reference vectors for the subspace W. Such a set is known as a basis for W, as
the next definition indicates.

DEFINITION 1.17  Basis for a Subspace

Let W be a subspace of R^n. A subset {w_1, w_2, . . . , w_k} of W is a basis for
W if every vector in W can be expressed uniquely as a linear
combination of w_1, w_2, . . . , w_k.

Our discussion following Example 3 shows that the two vectors

\[
w_1 = \begin{bmatrix} -5 \\ -2 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad w_2 = \begin{bmatrix} 3 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

form a basis for the solution space of the homogeneous system there.
If {w_1, w_2, . . . , w_k} is a basis for W, then we have W = sp(w_1, w_2, . . . , w_k) as
well as the uniqueness requirement. Remember that we called e_1, e_2, . . . , e_n
standard basis vectors for R^n. The reason for this is that every element of R^n can
be expressed uniquely as a linear combination of these vectors e_i. We call
{e_1, e_2, . . . , e_n} the standard basis for R^n.

FIGURE 1.36
The plane sp(w_1, w_2)

We would like to be able to determine whether {w_1, w_2, . . . , w_k} is a basis
for the subspace W = sp(w_1, w_2, . . . , w_k) of R^n—that is, whether the
uniqueness criterion holds. The next theorem will be helpful. It shows that we
need only examine uniqueness at the zero vector.

THEOREM 1.15  Unique Linear Combinations

The set {w_1, w_2, . . . , w_k} is a basis for W = sp(w_1, w_2, . . . , w_k) in R^n if
and only if the zero vector is a unique linear combination of the
w_i—that is, if and only if r_1w_1 + r_2w_2 + · · · + r_kw_k = 0 implies that
r_1 = r_2 = · · · = r_k = 0.

PROOF  If {w_1, w_2, . . . , w_k} is a basis for W, then the expression for every
vector in W as a linear combination of the w_i is unique, so, in particular, the
linear combination that gives the zero vector must be unique. Because
0w_1 + 0w_2 + · · · + 0w_k = 0, it follows that r_1w_1 + r_2w_2 + · · · + r_kw_k = 0
implies that each r_i must be 0.
Conversely, suppose that 0w_1 + 0w_2 + · · · + 0w_k is the only linear
combination giving the zero vector. If we have two linear combinations

    w = c_1w_1 + c_2w_2 + · · · + c_kw_k
    w = d_1w_1 + d_2w_2 + · · · + d_kw_k

for a vector w ∈ W, then, subtracting these two equations, we obtain

    0 = (c_1 - d_1)w_1 + (c_2 - d_2)w_2 + · · · + (c_k - d_k)w_k.

From the unique linear combination giving the zero vector, we see that

    c_1 - d_1 = c_2 - d_2 = · · · = c_k - d_k = 0,

and so c_i = d_i for i = 1, 2, . . . , k, showing that the linear combination giving w
is unique.  ▲

The Unique Solution Case for Ax = b


The preceding theorem immediately focuses our attention on determining
when a linear system Ax = b has a unique solution. Our boxed column space
criterion asserts that the system Ax = b has at least one solution precisely when
b is in the column space of A. By Definition 1.17, the system has exactly one
solution for each b in the column space of A if and only if the column vectors of
A form a basis for this column space.
Let A be a square n × n matrix. Then each column vector b in R^n is a
unique linear combination of the column vectors of A if and only if Ax = b has
a unique solution for each b ∈ R^n. By Theorem 1.12, this is equivalent to A
being row-reducible to the identity matrix, so that A is invertible. We have now
established another equivalent condition to add to those in Theorem 1.12. We
summarize the main ones in a new theorem for easy reference.

THEOREM 1.16  The Square Case, m = n

Let A be an n × n matrix. The following are equivalent.

1. The linear system Ax = b has a unique solution for each b ∈ R^n.
2. The matrix A is row equivalent to the identity matrix I.
3. The matrix A is invertible.
4. The column vectors of A form a basis for R^n.

In describing examples and stating exercises, we often write vectors in R^n
as row vectors to save space. However, the vector b = Ax in Theorem 1.16 is
necessarily a column vector. Thus, solutions to examples and exercises that use
results such as Theorem 1.16 are done using column-vector notation, as our
next example illustrates.

EXAMPLE 4  Determine whether the vectors v_1 = [1, 1, 3], v_2 = [3, 0, 4], and v_3 = [1, 4, -1]
form a basis for R^3.

SOLUTION  We must see whether the matrix A having v_1, v_2, and v_3 as column vectors is row
equivalent to the identity matrix. We need only create zeros below pivots to
determine if this is the case. We obtain

\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 0 & 4 \\ 3 & 4 & -1 \end{bmatrix} \sim
\begin{bmatrix} 1 & 3 & 1 \\ 0 & -3 & 3 \\ 0 & -5 & -4 \end{bmatrix} \sim
\begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & -9 \end{bmatrix}.
\]

There is no point in going further. We see that we will be able to get the identity
matrix, so {v_1, v_2, v_3} is a basis for R^3.
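The check in Example 4 can also be done by machine; a minimal MATLAB sketch follows (for a square matrix, rref or rank settles the question):

    A = [1 3 1; 1 0 4; 3 4 -1];   % v1, v2, v3 as columns
    rref(A)                        % the 3 x 3 identity matrix, so {v1, v2, v3} is a basis for R^3
    rank(A)                        % 3, equivalently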

A linear system having the same number n of equations as unknowns is
called a square system, because the coefficient matrix is a square n × n matrix.
When a square matrix is reduced to echelon form, the result is a square matrix
having only zero entries below the main diagonal, which runs from the upper
left-hand corner to the lower right-hand corner. This follows at once from the
fact that the pivot in a nonzero row—say, the ith row—is always in a column j,
where j ≥ i. Such a square matrix U with zero entries below the main diagonal
is called upper triangular. The final matrix displayed in Example 4 is upper
triangular.
For a general linear system Ax = b of m equations in n unknowns, we
consult Theorem 1.7. It tells us that a consistent system Ax = b has a unique
solution if and only if a row-echelon form H of A has a pivot in each of its n
columns. Because no two pivots appear in the same row of H, we see that H has
at least as many rows as columns; that is, m ≥ n. Consequently, the reduced
row-echelon form for A must consist of the n × n identity matrix, followed by m - n
zero rows. For example, if m = 5 and n = 3, the reduced row-echelon form for
A in this unique solution case must be

\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]

We summarize these observations as a theorem.

THEOREM 1.17 The General Unique Solution Case

Let A be an m × n matrix. The following are equivalent.


1. Each consistent system Ax = b has a unique solution.
2. The reduced row-echelon form of A consists of the n x n identity
matrix followed by m — n rows of zeros.
3. The column vectors of A form a basis for the column space of A.

EXAMPLE 5  Determine whether the vectors w_1 = [1, 2, 3, -1], w_2 = [-2, -3, -5, 1], and
w_3 = [-1, -3, -4, 2] form a basis for the subspace sp(w_1, w_2, w_3) in R^4.

SOLUTION  By Theorem 1.17, we need to determine whether the reduced row-echelon
form of the matrix A with w_1, w_2, and w_3 as column vectors consists of the 3 × 3
identity matrix followed by a row of zeros. Again, we can determine this using
just the row-echelon form, without creating zeros above the pivots. We obtain

\[
A = \begin{bmatrix} 1 & -2 & -1 \\ 2 & -3 & -3 \\ 3 & -5 & -4 \\ -1 & 1 & 2 \end{bmatrix} \sim
\begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & -1 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix} \sim
\begin{bmatrix} 1 & -2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]

We cannot obtain the 3 × 3 identity matrix. Thus the vectors do not form a
basis for the subspace, which is the column space of A.

A linear system having an infinite number of solutions is called under-


determined. We now prove a corollary of the preceding theorem: that
a consistent system is underdetermined if it has fewer equations than un-
knowns.

COROLLARY 1. Fewer Equations than Unknowns, m <n

If a linear system Ax = b is consistent and has fewer equations than


unknowns. then it has an infinite number of solutions.
PROOF  If m < n in Theorem 1.17, the reduced row-echelon form of A cannot
contain the n × n identity matrix, so we cannot be in the unique solution case.
Because we are assuming that the system is consistent, there are an infinite
number of solutions.  ▲

The next corollary follows at once from Corollary 1 and Theorem 1.17.

COROLLARY 2 The Homogeneous Case

1. A homogeneous linear system Ax = 0 having fewer equations than


unknowns has a nontrivial solution—that is, a solution other than
the zero vector.
2. A square homogeneous system Ax = 0 has a nontrivial solution if
and only if A is not row equivalent to the identity matrix of the
same Size.

EXAMPLE 6  Show that a basis for R^n cannot contain more than n vectors.

SOLUTION  If {v_1, v_2, . . . , v_k} is a basis for R^n, then, by Theorem 1.15, the only linear
combination of the v_j equal to the zero vector is the one for which the
coefficient of each v_j is the scalar 0. In terms of a matrix equation, the
homogeneous linear system Ax = 0, where v_j is the jth column vector of the
n × k matrix A, must have only the trivial solution. If k > n, then this linear
system has fewer equations than unknowns, and therefore a nontrivial
solution by Corollary 2. Consequently, we must have k ≤ n.

The Solution Set of Ax = b


Theorem 1.13 tells us the structure of the solution set of a homogeneous linear
system. To conclude the section, we now describe the solution set of Ax = b in
terms of the solution set of the corresponding homogeneous system Ax = 0. It is
customary to refer to an equation that describes the whole solution set as the
general solution, and to refer to each element of the solution set as a particular
solution.

THEOREM 1.18 Structure of the Solution Set of Ax = b

Let Ax = b be a linear system. If p is any particular solution of Ax = b


and h is a solution of the corresponding homogeneous system Ax = 0,
then p + his a solution of Ax = b. Moreover, every solution of Ax = b
has this form p + h, so that the general solution is x = p + h where
Ah = 0.

PROOF  Let p be a solution of Ax = b, so that Ap = b, and let h be a solution of
Ax = 0, so that Ah = 0. Then

    A(p + h) = Ap + Ah = b + 0 = b,

and so p + h is indeed a solution. Moreover, if q is any solution of Ax = b, then

    A(q - p) = Aq - Ap = b - b = 0,

and so q - p is a solution h of Ax = 0. From q - p = h, it follows that q = p + h.
This completes the proof.  ▲

Students who have studied differential equations may be familiar with a


similar theorem describing the general solution of a linear differential equa-
tion.

EXAMPLE 7  Illustrate Theorem 1.18 for the linear system Ax = b given by

     x_1 - 2x_2 +  x_3 -  x_4 =  4
    2x_1 - 3x_2 + 4x_3 - 3x_4 = -1
    3x_1 - 5x_2 + 5x_3 - 4x_4 =  3
    -x_1 +  x_2 - 3x_3 + 2x_4 =  5.

SOLUTION  We reduce the augmented matrix [A | b] to transform the coefficient matrix A
of the given system into reduced row-echelon form. (The coefficient matrix A is
the same as in Example 3.) We have

\[
\left[\begin{array}{cccc|c} 1 & -2 & 1 & -1 & 4 \\ 2 & -3 & 4 & -3 & -1 \\ 3 & -5 & 5 & -4 & 3 \\ -1 & 1 & -3 & 2 & 5 \end{array}\right] \sim
\left[\begin{array}{cccc|c} 1 & -2 & 1 & -1 & 4 \\ 0 & 1 & 2 & -1 & -9 \\ 0 & 1 & 2 & -1 & -9 \\ 0 & -1 & -2 & 1 & 9 \end{array}\right] \sim
\left[\begin{array}{cccc|c} 1 & 0 & 5 & -3 & -14 \\ 0 & 1 & 2 & -1 & -9 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right].
\]

Writing the general solution in the usual form and then as described in
Theorem 1.18, we have

\[
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} -14 - 5r + 3s \\ -9 - 2r + s \\ r \\ s \end{bmatrix}
= \underbrace{\begin{bmatrix} -14 \\ -9 \\ 0 \\ 0 \end{bmatrix}}_{p}
+ \underbrace{\begin{bmatrix} -5r + 3s \\ -2r + s \\ r \\ s \end{bmatrix}}_{h}.
\]
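Theorem 1.18 is easy to test numerically. In the MATLAB sketch below, p is the particular solution found above, h is one of infinitely many solutions of Ax = 0 (the choice r = 2, s = 7 is arbitrary), and A(p + h) returns b:

    A = [1 -2 1 -1; 2 -3 4 -3; 3 -5 5 -4; -1 1 -3 2];
    b = [4; -1; 3; 5];
    p = [-14; -9; 0; 0];                      % particular solution (free variables zero)
    h = 2*[-5; -2; 1; 0] + 7*[3; 1; 0; 1];    % a solution of Ax = 0, with r = 2, s = 7
    A*(p + h)                                 % equals b, as Theorem 1.18 asserts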

SUMMARY
1. A linear system Ax = b is homogeneous if b = 0.
2. Every linear combination of solutions of a homogeneous system Ax = 0 is
again a solution of the system.
3. A subset W of R^n is closed under vector addition if the sum of two vectors
in W is again in W. The subset W is closed under scalar multiplication if
every scalar multiple of every vector in W is in W. If W is nonempty and
closed under both operations, then W is a subspace of R^n.
4. The span of any k vectors in R^n is a subspace of R^n. If A is an m × n
matrix, the row space of A is the span in R^n of the row vectors of A, the
column space of A is the span in R^m of the column vectors, and the
nullspace of A is the solution set of Ax = 0 in R^n.
5. A subset {w_1, w_2, . . . , w_k} of a subspace W of R^n is a basis for W if every
vector in W can be expressed uniquely as a linear combination of
w_1, w_2, . . . , w_k.
6. The set {w_1, w_2, . . . , w_k} is a basis for sp(w_1, w_2, . . . , w_k) if and only if
0w_1 + 0w_2 + · · · + 0w_k is the unique linear combination of the w_i that is
equal to the zero vector.
7. A consistent linear system Ax = b of m equations in n unknowns has a
unique solution if and only if the reduced row-echelon form of A appears
as the n × n identity matrix followed by m - n rows of zeros.
8. A consistent linear system having fewer equations than unknowns is
underdetermined—that is, it has an infinite number of solutions.
9. A square linear system has a unique solution if and only if its coefficient
matrix is row equivalent to the identity matrix.
10. The solutions of any consistent linear system Ax = b are precisely the
vectors p + h, where p is any one particular solution of Ax = b and h
varies through the solution set of the homogeneous system Ax = 0.

EXERCISES

In Exercises 1-10, determine whether the 12. Let a, b, and c be scalars such that abc # 0.
indicated subset is a subspace of the given Prove that the plane ax + by + cz = Oisa
Euclidean space R’. subspace of R’.
13. a. Give a geometric description of all
1. {{(r, -r]) |r E Rin R? subspaces of R’. ;
2. {[x, x + 1] | x © R}in R? b. Repeat part (a) for R’. oR ons th
3. {{n, m] | n and m are integers} in R? 14. Prove that every subspace of R” contains the
4. {[x, y] | xy € R and x,y = 0} (the first Zero vector.
» WS UT MY > = ‘ 15. Is the zero vectora basis for the subspace {0}
quadrant of R’) .
of R"? Why or why noi?
5. {[x, y, z] | x,y,z € R and z = 3x + 2}in R?
6. {[x, y, z] | x,y,z © R and x = 2y + z}in R’ In Exercises 16-21, find a basis for the solution
7. {[x, y, z] | ~y,z © R and z= 1, y = 2x} in RB? set of the given homogeneous linear system.
8. {[2x,
{[2x, xx ++ y,ys y]VI} 1 xyxy ۩ RhR} in
in k R? 16. x- y=0
9. {[2x,, 3x,, 4x5, 5x.) | x; € R} in R' dx -2=0
10. {[x,, %, ..- Xa) | x, ER, x, = O} in R" 1. 4x aan n=0
1}. Prove that the line y = mx is a subspace of , >?
R?. [Hint: Write the line as 6x, + 2x, + 2x; = 0
W = {[x, mx] | x € R}.] —9x, “ 3x, 7 3x; = 0

18. x, - %+x- x, =0 32. Find a basis for the nullspace of the matrix
Xy + Xy =0 {357
x, + 2x, — x, + 3x, = 0 2042I'.
19. 2x,+ m+ x4+ 1, =0 3287
xX, — 6xX1+ XX; =0 33. Let v,, v.,....¥, and W,, Wo, .... Wy, be
3x, — 5X, + 2x; + x, =0 vectors in a vector space V. Give a necessary
and sufficient condition, involving linear
Sx, — 4x; + 3x; + 2x, = 0 combinations. for
20. 2x, +X. + xX, + x, =0
Sp(V,, Va... . > Vy) = SPp(W), Wa... W,,):
3x, +X ~ =X; + 2x, =0
X, +X, + 3x; =0
X, — %) — 7x; + 2x, = 0 In Excrcises 34-37, solve the given linear system
and express the solution set in a form that
2. x - Xt 6X, + Xy- x, =0 illustrates Theorem 1.18.
3x, + 2x, - 3x, + 2x, + Sx; = 0
4x, + 2x,.- 4x; + 3x,- x; =0 34. x, — 2x, + x, + 5x, =7
3x, — 2x, + 14x, + x, - 8x, = 35. 2x, - X2+ 3d = -3
2x, — X,+ 8x, + 2x, - 7x, =0 4x, + 2x, -x,= |
36. x,-—2x,+ x+ x,
= 4
In Exercises 22-30, determine whether the set of 2x,+ %-—3x,- m= 6
vectors is a basis for the subspace of R" that the
vectors span. x, 7 7X5 7 6x; + 2N, = 6

37. 2x, + XxX, + 3x; =

ws
22. {[-1, 1], [1. 2]} in ¥, a X T 2x; + Xy =

Oo
|
23. {{-1, 3, 1], (2, 1, 4]} in RF

Wm
4x, - XxX + 7X; + 2X, =

24. {{-1, 3, 4), [1, 5. 1}. [f. £3, 2)} in R? 7X, 7 2x2 - X3 + xX, = -5

25. {[2. 1, -3], [4, 0. 2], (2, -1, 3]} in R? 38. Mark each of the following True or False.
26. {{2, 1, 0, 2], [2. —3. 1. 0}. [3, 2, 0, O}} in R* . — a. A linear system with fewer equations
than unknowns has an infinite number of
27. The set of row vectors of the matrix
solutions.
2-6 | ___ b. A consistent linear system with fewer
1-3 4] equations than unknowns has an infinite
28. The set of column vectors of the matrix in number of solutions.
Exercise 27. c. Jf a square linear system Ax = b has a
solution for every choice of column
29. The set of row vectors of the matrix
vector b, then the solution is unique for
-| ol each b.
=

| 2 ___ d. Ifa square system Ax = 0 has only the


1-3 |
no

trivial solution, then Ax = b has a unique


1-3 4 solution for every column vector b with
30. The set of column vectors of the matrix in the appropriate number of components.
Exercise 29. —_— e. Ifa linear system Ax = 0 has only the
trivial solution, then Ax = b has a unique
31. Find a basis for the nullspace of the matrix
solution for every column vector b with
2 3 1 the appropriate number of components.
5 2 | . The sum of two solution vectors of anv
=

1 7 2 linear system is also a solution vector of


6-2 0 the system.

— 68 . The sum of two solution vectors of any 45. Let v, and v, be vectors in R*. Prove the
homogeneous linear system is also a following set equalities by showing that each
solution vector of the system. of the spans is contained in the other.
h. A scalar multiple of a solution vector of a. sp(v,, Vv.) = sp(v,, 2v, + Va)
any homogeneous linear system is also a b. sp(V,, ¥) = sPtV, + ¥,, Vy — ¥)
solution vector of the system. 46. Referring to Exercise 45, prove that if {¥,, ¥;}
i. Every line in R? is a subspace of R? is a basis for sp(v,, v,), then
generated by a single vector. a. {v,, 2v, + v,} is also a basis.
j. Every line through the origin in R? is a b. {v, + v,, v, — v,} is also a basis.
subspace of R’ generated by a single c. {¥, + V2, V, — V2, 2v, — 3y,} is not a basis.
vector.
47. Let W, and W, be two subspaces of R".
39. We have defined a linear system to be Prove that their intersection W, M W, is also
underdetermined if it has an infinite number a subspace.
of solutions. Explain why this is a reasonable
term to use for such a system. fai In Exercises 48-51, use LINTEK or MATLAB to
40. A linear system is overdetermined if it has determine whether the given vectors form a basis
more equations than unknowns. Explain why for the subspace of RX" that they span.
this is a reasonabie term to use for such a
system. 48. a, = [1, 1, -1, 0}
a, = (5, 1, 1, 2]
41. Referring to Exercises 39 and 40, give an
example of an overdetermined a, = [5, -3, 2, -1]
underdetermined linear system! a, = [9, 3, 0, 3]
42. Use Theorem !.13 to explain why a 49. b, = [3, —4, 0, 0, 1]
homogeneous system of linear equations has b, = [4, C, 2, —6, 2]
either a unique solution or an infinite
b, = (0, 1, 1, —3, 0]
number of solutions.
b, = [1, 4, —1, 3, 0]
. Use Theorem 1.18 ts explain why no system
of linear equations can have exactly two . Vv, = (4, -1,2, 1
solutions. v, = [10, —2, 5, 1]

. Let A be an m X n matrix such that the v; = [-9, 1, —6, —3]


homogeneous system Ax = 0 has only the v, = [1, -1, 0, 0}
trivial solution. 51. w, = [1, 4, —8, 16]
a. Does it follow that every system Ax = b w, = [1, 1, -1, 1]
is consistent?
w; =[I1, 4, 8, 16]
b. Does it follow that every consistent
system Ax = b has a unique solution? w, =[1, 1,1, 1]

MATLAB
Access MATLAB and enter fbc1s6 if our text data files are available; otherwise, enter
the vectors in Exercises 48-51 by hand. ‘Use MATLAB matrix commands to form the
necessary matrix and reduce it in problems M1-M4.
M1. Solve Exercise 48.            M3. Solve Exercise 50.
M2. Solve Exercise 49.            M4. Solve Exercise 51.

M5. What do you think the probability would be that if n vectors in R” were
selected at random, they would form a basis for R*. (The probability of an
event is a number from 0 to 1. An impossible event has probability 0, a
certain event has probability 1, an event as likely to occur as not to occur has
probability 0.5, etc.)
M6. As a way of testing your answer to the preceding exercise, you might
experiment by asking MATLAB to generate ‘‘random” n x n matrices for
some value of 1, and reducing them to sce if their column vectors form a
basis for R". Enter rand(8) to view an 8 xX 8 matrix with “random” entries
between 0 and 1. The column vectors cannot be considered to be random
vectors in R®, because all their couaponents lie between 0 and 1. Do you think
the probability that such column vectors form a basis for R* is the same as in
the preceding exercise? As an experimental check, execute the command
rref(rand(8)) ten times to row-reduce ten such 8 x 8 matrices, examining each
reduced matrix to see if the coiumn vectors of the matrix generated by
rand(8) did form a basis for R°.
M7. Note that 4srand(8)—2+ones(8) will produce an 8 x 8 matrix with “random”
entries between —2 and 2. Again, its column vectors cannot be regarded as
random vectors ia R®, but at least the components of the vectors need not all
be positive, as they were in the preceding exercise. Do you think the
probability that such column vectors form a basis for R® is the same as in
Exercise M5? As an experimental check, row-reduce ten such matrices.

1.7 APPLICATION TO POPULATION DISTRIBUTION (OPTIONAL)


Linear algebra has proved to be a valuable tool for many practical and
mathematical problems. In this section, we present an application to popula-
tion distribution (Markov chains).
Consider situations in which people are split into two or more categories.
For example, we might split the citizens of the United States according to
income into categories of

poor, middle income, rich.


We might split the inhabitants of North America into categories according to
the climate in which they live:
hot, temperate, cold.
In this book, we will speak of a population split into states. In the two
illustrations above, the populations and states are given by the following:

Population                          States

Citizens of the United States       poor, middle income, rich
People in North America             hot, temperate, cold

Our populations will often consist of people, but this is not essential. For
example, at any moment we can classify the population of cars as operational
or not operational.
We are interested in how the distribution of a population between (or
among) states may change over a period of time. Matrices and their multiplica-
tion can play an important role in such considerations.

Transition Matrices

The tendency of a population to move among n states can sometimes be
described using an n × n matrix. Consider a population distributed among n =
3 states, which we call state 1, state 2, and state 3. Suppose that we know the
proportion t_ij of the population of state j that moves to state i over a given fixed
time period. Notice that the direction of movement from state j to state i is
the right-to-left order of the subscripts in t_ij. The matrix T = [t_ij] is called
a transition matrix. (Do not confuse our use of T as a transition matrix in
this one section with our use of T as a linear transformation elsewhere in the
text.)

EXAMPLE 1 Let the population of a country be classified according to income as


State 1: poor,
State 2: middle income,
State 3: rich.
Suppose that, over each 20-year period (about one generation), we have the
following data for people and their offspring:
Of the poor people, 19% become middle income and 1% rich.
Of the middle income people, 15% become poor and 10% rich.
Of the rich people, 5% become poor and 30% middle income.
Give the transition matrix describing these data.
SOLUTION  The entry t_ij in the transition matrix T represents the proportion of the
population moving from state j to state i, not the percentage. Thus, because
19% of the poor (state 1) will become middle income (state 2), we should take
t_21 = .19. Similarly, because 1% of the people in state 1 move to state 3 (rich),
we should take t_31 = .01. Now t_11 represents the proportion of the poor people
who remain poor at the end of 20 years. Because this is 80%, we should take
t_11 = .80. Continuing in this fashion, starting in state 2 and then in state 3, we
obtain the matrix

              poor   mid   rich
    T  =  [   .80    .15   .05  ]   poor
          [   .19    .75   .30  ]   mid
          [   .01    .10   .65  ]   rich

We have labeled the columns and rows with the names of the states. Notice
that an entry of the matrix gives the proportion of the population in the state
above the entry that moves to the state at the right of the entry during one
20-year period. =

In Example 1, the sum of the entries in each column of T is 1, because the
sum reflects the movement of the entire population for the state listed at the
top of the column. Now suppose that the proportions of the entire population
in Example 1 that fall into the various states at the start of a time period are
given in the column vector

\[
p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}.
\]

For example, we would have

\[
p = \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix}
\]

if the whole population were initially equally divided among the states. The
entries in such a population distribution vector p must be nonnegative and must
have a sum equal to 1.
Let us find the proportion of the entire population that is in state 1 after
one time period of 20 years, knowing that initially the proportion in state 1 is
p_1. The proportion of the state-1 population that remains in state 1 is t_11. This
gives a contribution of t_11 p_1 to the proportion of the entire population that will
be found in state 1 at the end of 20 years. Of course, we also get contributions
to state 1 at the end of 20 years from states 2 and 3. These two states contribute
proportions t_12 p_2 and t_13 p_3 of the entire population to state 1. Thus, after 20
years, the proportion in state 1 is

    t_11 p_1 + t_12 p_2 + t_13 p_3.

This is precisely the first entry in the column vector given by the product

\[
Tp = \begin{bmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & t_{33} \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}.
\]

In a similar fashion, we find that the second and third components of Tp give
the proportions of population in state 2 and in state 3 after one time period.

For an initial population distribution vector p and transition matrix
T, the product vector Tp is the population distribution vector after
one time period.
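For the transition matrix of Example 1, this computation takes one line in MATLAB; the initial vector below is only an illustration:

    T = [.80 .15 .05; .19 .75 .30; .01 .10 .65];   % transition matrix of Example 1
    p = [1/3; 1/3; 1/3];                            % population equally divided at the start
    T*p              % distribution after one 20-year period; the entries still sum to 1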

Markov Chains

In Example 1, we found the transition matrix governing the flow of a


population among three states over a period of 20 years. Suppose that the same
transition matrix is valid over the next 20-year period, and for the next 20
years after that, and so on. That is, suppose that there is a sequence or chain of
20-year periods over which the transition matrix is valid. Such a situation is
called a Markov chain. Let us give a formal definition of a transition matrix for
a Markov chain.

DEFINITION 1.18 Transition Matrix

An n × n matrix T is the transition matrix for an n-state Markov chain
if all entries in T are nonnegative and the sum of the entries in each
column of T is 1.

Markov chains arise naturally in biology, psychology, economics, and


many other sciences. Thus they are an important application of linear algebra
and of probability. The entry t_ij in a transition matrix T is known as the
probability of moving from state j to state i over one time period.

EXAMPLE 2  Show that the matrix

\[
T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]

is a transition matrix for a three-state Markov chain, and explain the
significance of the zeros and the ones.
SOLUTION The entries are all nonnegative, and the sum of the entries in each column is 1.
Thus the matrix is a transition matrix for a Markov chain.
At least for finite populations, a transition probability t_ij = 0 means that
there is no movement from state j to state i over the time period. That is,

HISTORICAL NOTE  Markov chains are named for the Russian mathematician Andrei
Andreevich Markov (1856-1922), who first defined them in a paper of 1906 dealing with the Law
of Large Numbers and subsequently proved many of the standard results about them. His interest
in these sequences stemmed from the needs of probability theory; Markov never dealt with their
applications to the sciences. The only real examples he used were from literary texts, where the
two possible states were vowels and consonants. To illustrate his results, he did a statistical study
of the alternation of vowels and consonants in Pushkin’s Eugene Onegin.
Andrei Markov taught at St. Petersburg University from 1880 to 1905, when he retired to
make room for younger mathematicians. Besides his work in probability, he contributed to such
fields as number theory, continued fractions, and approximation theory. He was an active
participant in the liberal movement in the pre-World War I era in Russia; on many occasions he
made public criticisms of the actions of state authorities. In 1913, when as a member of the
Academy of Sciences he was asked to participate in the pompous ceremonies celebrating the 300th
anniversary of the Romanov dynasty, he instead organized a celebration of the 200th anniversary
of Jacob Bernoulli’s publication of the Law of Large Numbers.

transition from state j to state i over the time period is impossible. On the
other hand, if t_ij = 1, the entire population of state j moves to state i over the
time period. That is, transition from state j to state i in the time period is
certain.
For the given matrix, we see that, over one time period, the entire
population of state 1 moves to state 2, the entire population of state 2 moves to
state 3, and the entire population of state 3 moves to state 1.

If T is an n × n transition matrix and p is a population distribution column
vector with n components, then we can readily see that Tp is again a
population distribution vector. We illustrate the general argument with the
case n = 2, avoiding summation notation and saving space. We have

\[
Tp = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \end{bmatrix}
= \begin{bmatrix} t_{11}p_1 + t_{12}p_2 \\ t_{21}p_1 + t_{22}p_2 \end{bmatrix}.
\]

To show that the sum of the components of Tp is 1, we simply rearrange the
sum of the four products involved so that the terms involving p_1 appear first,
followed by the terms involving p_2. We obtain

    t_11p_1 + t_21p_1 + t_12p_2 + t_22p_2 = p_1(t_11 + t_21) + p_2(t_12 + t_22)
                                          = p_1(1) + p_2(1) = p_1 + p_2 = 1.

The proof for the n × n case is identical; we would have n^2 products rather than
four. Note that it follows that if T is a transition matrix, then so is T^2; we need
only observe that the jth column c of T is itself a population distribution
vector, so the jth column of T^2, which is Tc, has a component sum equal to 1.
Let T be the transition matrix over a time period—say, 20 years—in a
Markov chain. We can form a new Markov chain by looking at the flow of the
population over a time period twice as long—that is, over 40 years. Let us see
the relationship of the transition matrix for the 40-year time period to the one
for the 20-year time period. We might guess that the transition matrix for 40
years is T^2. This is indeed the case. First, note that the jth column vector of an
n × n matrix A is Ae_j, where e_j is the jth standard basis vector of R^n, regarded as
a column vector. Now e_j is a population distribution vector, so the jth column
of the two-period transition matrix is T(Te_j) = T^2e_j, showing that T^2 is indeed
the two-period matrix.
If we extend the argument above, we find that the three-period transition
matrix for the Markov chain is T^3, and so on. This exhibits another situation
in which matrix multiplication is useful. Although raising even a small matrix
to a power using pencil and paper is tedious, a computer can do it easily.
LINTEK and MATLAB can be used to compute a power of a matrix.

m-Period Transition Matrix

A Markov chain with transition matrix T has T^m as its m-period
transition matrix.
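As a sketch of this fact in MATLAB (using the matrix of Example 1 only for illustration), the m-period matrix is simply the matrix power T^m, and it is again a transition matrix:

    T = [.80 .15 .05; .19 .75 .30; .01 .10 .65];
    T2 = T^2;        % two-period (40-year) transition matrix
    sum(T2)          % a row of ones: each column of T^2 still sums to 1
    p = [1/3; 1/3; 1/3];
    T2*p             % the same vector as T*(T*p)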

EXAMPLE 3  For the transition matrix

\[
T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]

in Example 2, show that, after three time periods, the distribution of
population among the three states is the same as the initial population
distribution.

SOLUTION  After three time periods, the population distribution vector is T^3p, and we
easily compute that T^3 = I, the 3 × 3 identity matrix. Thus T^3p = p, as
asserted. Alternatively, we could note that the entire population of state 1
moves to state 2 in the first time period, then to state 3 in the next time period,
and finally back to state 1 in the third time period. Similarly, the populations
of the other two states move around and then back to the beginning state over
the three periods.

Regular Markov Chains


We now turn to Markov chains where there exists some fixed number m of
time periods in which it is possible to get from any state to any other state. This
means that the mth power T^m of the transition matrix has no zero entries.

DEFINITION 1.19 Regular Transition Matrix, Regular Chain

A transition matrix T is regular if T^m has no zero entries for some


integer m. A Markov chain having a regular transition matrix is called
a regular chain.

EXAMPLE 4  Show that the transition matrix

\[
T = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]

of Example 2 is not regular.

SOLUTION  A computation shows that T^2 still has zero entries. We saw in Example 3 that
T^3 = I, the 3 × 3 identity matrix, so we must have T^4 = T, T^5 = T^2, T^6 = T^3 =
I, and the powers of T repeat in this fashion. We never eliminate all the zeros.
Thus T is not a regular transition matrix.

If T^m has no zero entries, then T^{m+1} = (T^m)T has no zero entries, because
the entries in any column vector of T are nonnegative with at least one nonzero
entry. In determining whether a transition matrix is regular, it is not necessary
to compute the entries in powers of the matrix. We need only determine
whether or not they are zero.

EXAMPLE 5  If x denotes a nonzero entry, determine whether a transition matrix T with


the zero and nonzero configuration given by

is regular.
SOLUTION We compute configurations of high powers of T as rapidly as we can, because
once a power has no zero entries, all higher powers must have nonzero entries.
We find that
x 0x x xx x xX [x x x x
x 00 x x 0x xX x xxX xX
T? = x x x OP T‘ = x xX X XP T= x xX xX XP
00x QO x x x .0 x xX XX

so the matrix T is indeed regular.

It can be shown that, if a Markov chain is regular, the distribution of
population among the states over many time periods approaches a fixed
steady-state distribution vector s. That is, the distribution of population among
the states no longer changes significantly as time progresses. This is not to say
that there is no longer movement of population between states; the transition
matrix T continues to effect changes. But the movement of population out of
any state over one time period is balanced by the population moving into that
state, so the proportion of the total population in that state remains constant.
This is a consequence of the following theorem, whose proof is beyond the
scope of this book.

THEOREM 1.19 Achievement of Steady State

Let T be a regular transition matrix. There exists a unique column
vector s with strictly positive entries whose sum is 1 such that the
following hold:
1. As m becomes larger and larger, all columns of T^m approach the
column vector s.
2. Ts = s, and s is the unique column vector with this property and
whose components add up to 1.

From Theorem 1.19 we can show that, if p is any initial population
distribution vector for a regular Markov chain with transition matrix T, the
population distribution vector after many time periods approaches the vector
s described in the theorem. Such a vector is called a steady-state distribution
vector. We indicate the argument using a 3 × 3 matrix T. We know that the
population distribution vector after m time periods is T^m p. If we let

\[
p = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix} \quad\text{and}\quad s = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix},
\]

then Theorem 1.19 tells us that T^m p is approximately

\[
\begin{bmatrix} s_1 & s_1 & s_1 \\ s_2 & s_2 & s_2 \\ s_3 & s_3 & s_3 \end{bmatrix}
\begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}
= \begin{bmatrix} s_1p_1 + s_1p_2 + s_1p_3 \\ s_2p_1 + s_2p_2 + s_2p_3 \\ s_3p_1 + s_3p_2 + s_3p_3 \end{bmatrix}.
\]

Because p_1 + p_2 + p_3 = 1, this vector becomes

\[
\begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix} = s.
\]

Thus, after many time periods, the population distribution vector is approxi-
mately equal to the steady-state vector s for any choice of initial population
distribution vector p.
There are two ways we can attempt to compute the steady-state vector s of
a regular transition matrix T. If we have a computer handy, we can simply
raise T to a sufficiently high power so that all column vectors are the same, as
far as the computer can print them. The software LINTEK or MATLAB can be
used to do this. Alternatively, we can use part (2) of Theorem 1.19 and solve
for s in the equation
Ts =s. (1)
In solving Eq. (1), we will be finding our first eigenvector in this text. We will
have a lot more to say about eigenvectors in Chapter 5.
Using the identity matrix I, we can rewrite Eq. (1) as

    Ts = Is
    Ts - Is = 0
    (T - I)s = 0.

The last equation represents a homogeneous system of linear equations with
coefficient matrix (T - I) and column vector s of unknowns. From all the
solutions of this homogeneous system, choose the solution vector with positive
entries that add up to 1. Theorem 1.19 assures us that this solution exists and
is unique. Of course, the homogeneous system can be solved easily using a
computer. Either LINTEK or MATLAB will reduce the augmented matrix to a
form from which the solutions can be determined easily. We illustrate both
methods with examples.
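As a rough stand-in for the LINTEK and MATLAB computations just mentioned, the following Python sketch (ours, using NumPy) illustrates the first method on the transition matrix of Example 7 below; the power 50 is simply an arbitrary "sufficiently high" choice.

```python
# A minimal sketch of the "raise T to a high power" method, with NumPy in
# place of the LINTEK/MATLAB tools named in the text.
import numpy as np

T = np.array([[0.5, 1.0],      # transition matrix of Example 7 below
              [0.5, 0.0]])

high_power = np.linalg.matrix_power(T, 50)   # 50 is an arbitrary large power
print(high_power)     # every column is approximately the steady-state vector
print(high_power[:, 0])   # approximately [2/3, 1/3]
```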

EXAMPLE 6  Use the routine MATCOMP in LINTEK, and raise the transition matrix to
powers to find the steady-state distribution vector for the Markov chain in
Example 1, having states labeled

poor, middle income, rich.


SOLUTION  Using MATCOMP and experimenting a bit with powers of the matrix T in
Example 1, we find that, for sufficiently high powers of T,

                poor        mid        rich
    T^m  ~   [ .3872054   .3872054   .3872054 ]   poor
             [ .4680135   .4680135   .4680135 ]   mid
             [ .1447811   .1447811   .1447811 ]   rich

Thus eventually about 38.7% of the population is poor, about 46.8% is middle
income, and about 14.5% is rich, and these percentages no longer change as
time progresses further over 20-year periods.

EXAMPLE 7 The inhabitants of a vegetarian-prone community agree on the following rules:


1. No one will eat meat two days in a row.
2. A person who eats no meat one day will flip a fair coin and eat meat on
the next day if and only if a head appears.
Determine whether this Markov-chain situation is regular; and if so, find the
steady-state distribution vector for the proportions of the population eating no
meat and eating meat.

SOLUTION  The transition matrix T is

              no meat   meat
    T  =  [     1/2       1  ]   no meat
          [     1/2       0  ]   meat

Because T^2 has no zero entries, the Markov chain is regular. We solve

    (T - I)s = 0,   or   [-1/2    1] [s1]     [0]
                         [ 1/2   -1] [s2]  =  [0].

We reduce the augmented matrix:

    [-1/2    1 | 0]      [1  -2 | 0]
    [ 1/2   -1 | 0]  ~   [0   0 | 0].

Thus we have

    s = [2r]   for some scalar r.
        [ r]

But we must have s1 + s2 = 1, so 2r + r = 1 and r = 1/3. Consequently, the
steady-state population distribution is given by the vector

    [2/3]
    [1/3].

We see that, eventually, on each day about 2/3 of the people eat no meat and
the other 1/3 eat meat. This is independent of the initial distribution of
population between the states. All might eat meat the first day, or all might eat
no meat; the steady-state vector remains [2/3, 1/3] (written as a column) in
either case.
If we were to solve Example 7 by reducing an augmented matrix with a
computer, we should add a new row corresponding to the condition s1 + s2 = 1
for the desired steady-state vector. Then the unique solution could be seen at
once from the reduction of the augmented matrix. This can be done using
pencil-and-paper computations just as well. If we insert this as the first
condition on s and rework Example 7, our work appears as follows:

    [  1      1 | 1]      [1    1  |   1 ]      [1   0 | 2/3]
    [-1/2     1 | 0]  ~   [0   3/2 |  1/2]  ~   [0   1 | 1/3].
    [ 1/2    -1 | 0]      [0  -3/2 | -1/2]      [0   0 |  0 ]

Again, we obtain the steady-state vector [2/3, 1/3] (written as a column).
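The same normalization idea carries over directly to a numerical computation. The sketch below is ours, with NumPy standing in for the book's software: it stacks the condition that the components of s sum to 1 on top of the equations (T - I)s = 0 and solves the resulting redundant system by least squares. The function name steady_state is an illustrative choice.

```python
# A minimal sketch of the second method: solve (T - I)s = 0 together with the
# added condition that the components of s sum to 1.
import numpy as np

def steady_state(T):
    n = T.shape[0]
    # Stack the normalization row on top of (T - I), as in the reworked Example 7.
    A = np.vstack([np.ones((1, n)), T - np.eye(n)])
    b = np.zeros(n + 1)
    b[0] = 1.0
    # Least squares handles the redundant rows of this overdetermined system.
    s, *_ = np.linalg.lstsq(A, b, rcond=None)
    return s

T = np.array([[0.5, 1.0],
              [0.5, 0.0]])
print(steady_state(T))     # approximately [0.6667, 0.3333]
```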

SUMMARY

1. A transition matrix for a Markov chain is a square matrix with nonneg-
   ative entries such that the sum of the entries in each column is 1.
2. The entry in the ith row and jth column of a transition matrix is the
   proportion of the population in state j that moves to state i during one time
   period of the chain.
3. If the column vector p is the initial population distribution vector between
   states in a Markov chain with transition matrix T, the population
   distribution vector after one time period of the chain is Tp.
4. If T is the transition matrix for one time period of a Markov chain, then T^m
   is the transition matrix for m time periods.
5. A Markov chain and its associated transition matrix T are called regular if
   there exists an integer m such that T^m has no zero entries.
6. If T is a regular transition matrix for a Markov chain,
   a. the columns of T^m all approach the same probability distribution
      vector s as m becomes large;
   b. s is the unique probability distribution vector satisfying Ts = s; and
   c. as the number of time periods increases, the population distribution
      vectors approach s regardless of the initial population distribution
      vector p. Thus s is the steady-state population distribution vector.

EXERCISES

In Exercises 1-8, determine whether the given In Exercises 13-18, determine whether the given
matrix is a transition matrix. If it is, determine transition matrix with the indicated distribution

py
whether it is regular. of zero entries and nonzero X entries is regular.
ria 02
2 2 4 os Ix 0 x
13 3 4 13.|0 x x 14.|0
0 x
10x x ix x 0
.2 1 3 ri. 2
3./.4 .5 1 4.|.3 .4 ‘Ox 0 [x 0 x

_X,
14 4 8 16 4 15. |x
lo x
x x
0
16. |*
x
*
0 x
*

KKK
1.500 5 3.205
rox x x LX 0 x
5 [5.5 0 0 g |-4 20.1
"lo 5 50 "l1 212 7, |% 0% x TO 0 x

KK
“10 0x x 18, |% 9 0
10 0 5 55 2.402 10 0 x x 0x 0
fo .5 22.1 fo 100.9

sx
10 0 0
30.1 8 .5 0200.11
71004 0.1 810.301 0
7510.1 1.1000 In Exercises 19-24, find the steady-state
10 0 2 0 22] 10 3 100 distribution vector for the given transition matrix
of a Sfarkov chain.
| 3.7 4
19. |? 20.173
3 42
In Exercises 9-12, let
T =|.4 .2 .1| be the
3 1.5 RY3 0 i 42
transition matrix for a Markov chain, and let p = - "
3 033 133
2 be the initial population distribution vector. 2. |0 24 22. Jo L0
5
104 1 O1411

9. Find the proportion of the state 2


population that is in state 3 after two time oa
423
aa
§ 5 3
periods. 23, |10
i Ll4 24. [2211
1
10. Find the proportion of the state 3
population that is in state 2 after two time
bad
223
234
5 33
periods, 25. Mark each of the following True or False.
it. Find the proportion of the total population
—-. a. All entries tn a transition matrix arc
shat is in state 3 afier two time periods.
4

NOANESALVE,
. Find the population distribuuion vector after -—- b. Every matrix whose entries are all
tau time pericds. NuNNegalive IS a transition matrix.

c. The sum of all the entries in an n X n 31. If the initial population distribution vector
transition matrix is 1. 4
d. The sum of all the entries in ann x n for all the women is|.5|, find the popuiation
transition matrix is n. dl
e. If a transition matrix contains no zero distribution vector for the next generation.
entries, it is regular.
32. Repeat Exercise 31, but find the population
f. If a transition matrix is regular, it
distribution vector for the following (third)
contains no nonzero entries.
generation.
. Every power of a transition matrix is
again a transition matrix. 33. Show that this Markov chain is regular, and
h. If a transition matrix is regular, its square find the steady-state probability distribution
has equal column vectors. vector.
i. If a transition matrix T is regular, there
exists a unique vector s such that 7s = s.
j. If a transition matrix 7 is regular, there
Exercises 34-39 deal with a simple genetic model
exists a unique population distribution
involving just two types of genes, G and g.
vector s such that 7s = s.
Suppose that a physical trait, such as eye color, is
26. Estimate A', if A is the matrix in Exercise controlled by a pair of these genes, one inherited
20. from each parent. A person may be classified as
27. Estimate A'™, if A is the matrix in Exercise being in one of three states:
23.
Dominant (type GG), Hybrid (type Gg),
Recessive (type gg).

We assume that the gene inherited from a parent


Exercises 28-33 deal with the following Markov is a random choice from the parent's two
chain. We classify the women in a country genes—that is, the gene inherited is just as likely
according as to whether they live in an urban (U), to be one of the parent’s two genes as to be the
suburban (S), or rural (R) area. Suppose that each other. We form a Markov chain by starting with a
woman has just one daughter, who in turn has population and always crossing with hybrids to
just one daughter, and so on. Suppose further that produce offspring. We take the time required to
the following are true: produce a subsequent generation as the time
period for the chain.
For urban women, 10% of the daughters settle
in rural areas, and 50% in suburban areas.
34. Give an intuitive argument in support of the
For suburban women, 20% of the daughters idea that the transition matrix for this
Settle in rural areas, and 30% in urban areas. Markov chain is

For rural women, 20% of the daughters settle in DHR


the suburbs, and 70% in rural areas. 7 4 9)
Let this Markov chain have as its period the time
required to produce the next generation.
rol tty
04 2)
28. Give the transition matrix for this Markov 35. What proportion of the third-generation
chain, taking states in the order U, S, R. offspring (after two time periods) of the
29. Find the proportion of urban women whose recessive (gg) population is dominant (GG)?
granddaughters are suburban women. 36. What proportion of the third-generation
30. Find the proportion of rural women whose offspring (after two time periods) of the
granddaughters are also rural women. hybrid (Gg) population is not hybrid?

37. If initially the entire population is hybrid, A in that time period. Argue that, by starting
find the population distribution vector in the in any state in the original chain, you are
next generation. more likely to reach an absorbing state in m
38. If initially the population is evenly divided time periods than you are by starting from
Go

among the three states, find the population state F in the new chain and going for r time
distribution vector in the third generation periods. Using the fact that large powers of a
(after two time periods). positive number less than | are almost 0,
show that for the two-state chain, the
39. Show that this Markov chain is regular, and
population distribution vector approaches
find the steady-state population distribution
vector. } as the number of time periods increases,
. A state in a Markov chain is calied absorbing regardless of the initial populaticn
if it is impossible to leave that state over the distribution vector.]
next iime period. What characterizes the
44. Let Tbe an n X n transition matrix. Show
transition matrix of a Merkov chain with an
that, if every row and every column have
absorbing state? Can a Markov chain with
fewer than 7/2 zero entries, the matrix is
an absorbing state be regular?
regular.
41. Consider the genetic model for Exercises
34-39. Suppose that, instead of always
crossing with hybrids to produce offspring,
we always cross with recessives. Give the
transition matrix for this Markov chain, and In Exercises 45-49, find the steady-state
show that there is an absorbing state. (See population distribution vector for the given
Exercise 40.) transition matrix. See the comment following
Example 7.
42. A Markov chain is termed absorbing if it

° fe ;
contains at least one absorbing state (see
Exercise 40) and if it is possible to get from
V3 Hi j0
4
any state to an absorbiag state in some 45. 46. 41.
number of time periods.
a. Give an example of a transition matrix
for a three-state absorbing Markov chain.
ory
48.;503 49,
[gas ]
$03
b. Give an example of a transition matrix
for a three-state Markov chain that is not HE ea
absorbing but has an absorbing state.
43. With reference to Exercise 42, consider an
absorbing Markov chain with transition
matrix 7 and a single absorbing state. Argue = In Exercises 50-54, find the steady-state
that, for any initial distribution vector p, the population distribution vector by (a) raising the
vectors 7"p for large n approach the vector matrix to a power and (b) solving a linear system.
containing 1 in the component that Use LINTEK or MATLAB.
corresponds to the absorbing state and zeros
elsewhere. [Succestion: Let m be such that 1.3.4 3.3 1
it is possible to reach the absorbing state 50.|.2 0 2 §1.].1 .3 .5
from any state in m time periods, and let g 1774 6 .4 4
be the smallest entry in the row of 7” [50009
corresponding to the absorbing state. Form a eos 5 soeel
vw chain with just two states, Absorbing 32.4 2.5 40
53.]/0
6 .3 0 0
(A) and Free (F), which has as time period m 35 20 00.72 0
time periods of the original chain, and with ov 10 0 0 8 LL
probability g of moving from state F to state 54. The matrix in Exercise 8

1.8  APPLICATION TO BINARY LINEAR CODES (OPTIONAL)


We are not concerned here with secret codes. Rather, we discuss the problem
of encoding information for transmission so that errors occurring during
transmission or reception have a good chance of being detected, and perhaps
even being corrected, by an appropriate decoding procedure. The diagram

    message -> [encode] -> [transmit] -> [receive] -> [decode] -> message
shows the steps with which we are concerned. Errors could be caused at any
stage of the process by equipment malfunction, human error, lightning,
sunspots, cross talk interference, etc.

Numerical Representation of Information


All information can be reduced to sequences of numbers. For example, we
could number the letters of our alphabet and represent every word in our
language as a finite sequence of numbers. We concentrate on how to encode
numbers to detect and possibly correct errors.
We are accustomed to expressing numbers in decimal (base 10) notation,
using as alphabet the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. However, they also can be
expressed using any integer base greater than or equal to 2. A computer works
in binary (base 2) notation, which uses the smaller alphabet {0, 1}; the number
1 can be represented by the presence of an electric charge or by current
flowing, whereas the absence of a charge or current can represent 0. The
American National Standard Code for Information Interchange (ASCII) has
256 characters and is widely used in personal computers. It includes all the
characters that we customarily find on typewriter keyboards, such as

AaBbZz1234567890,;?*&#!}+-/'"

The 256 characters are assigned numbers from 0 to 255. For example, S is
assigned the number 83 (decimal), which is 1010011 (binary) because, reading
1010011 from left to right, we see that

    1(2^6) + 0(2^5) + 1(2^4) + 0(2^3) + 0(2^2) + 1(2^1) + 1(2^0) = 83.

The ASCII code number for the character 7 is 55 (decimal), which is 110111
(binary). Because 2^8 = 256, each character in the ASCII code can be
represented by a sequence of eight 0's or 1's; the S is represented by 01010011
and 7 by 00110111. This discussion makes it clear that all information can be
encoded using just the binary alphabet B = {0, 1}.
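The following short Python lines (ours, purely for illustration) confirm the decimal and binary representations quoted above.

```python
# A quick check of the decimal/binary conversions described in the text.
print(format(ord('S'), '08b'))   # '01010011' -- S has ASCII code 83
print(format(ord('7'), '08b'))   # '00110111' -- the character 7 has code 55
print(int('1010011', 2))         # 83
```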

Message Words and Code Words


An algebraist refers to a sequence of characters from some alphabet, such as
01010011 using the alphabet B or the sequence of letters glypt using our usual
letter alphabet, as a word; the computer scientist refers to this as a string. As we

discuss encoding words so that transmission errors can be detected, it is


convenient to use as small an alphabet as possible, so we restrict ourselves to
the binary words using the alphabet B = {0, 1}. Rather than give illustrations
using words of eight characters as in the ASCII code, let us use words of just
four characters; the 16 possible words are

    0000 0001 0010 0011 0100 0101 0110 0111
    1000 1001 1010 1011 1100 1101 1110 1111.
An error in transmitting a word occurs when a 1 is changed to a 0 or vice versa
during transmission. The first two illustrations exhibit an inefficient way and
an efficient way of detecting a transmission error in which only one erroneous
interchange of the characters 0 and 1 occurs. In each illustration, a binary
message word is encoded to form the code word to be transmitted.

ILLUSTRATION 1  Suppose we wish to send 1011 as the binary message word. To detect any
single-error transmission, we could send each character twice—that is, we
could encode 1011 as 11001111 when we send it. If a single error is made in
transmission of the code word 11001111 and the recipient knows the encoding
scheme, then the error will be detected. For illustration, if the received code
word is 11001011, the recipient knows there is an error because the fifth and
sixth characters are different. Of course, the recipient does not know whether 0
or 1 is the correct character. But note that not all double-error transmissions
can be detected. For example, if the received code word is 11000011, the
recipient perceives no error, and obtains 1001 upon decoding, which was not
the message sent.

One problem with encoding a word by repeating every character as in


Illustration 1 is that the code word is twice as long as the original message
word. There is a lot of redundancy. The next illustration shows how we can
more efficiently achieve the goal of warning the receiver whenever a single
error has been committed.

ILLUSTRATION 2  Suppose again that we wish to transmit a four-character word on the alphabet
B. Let us denote the word symbolically by x1x2x3x4, where each xi is either 0 or
1. We make use of modulo 2 arithmetic on B, where

    0 + 0 = 0,   1 + 0 = 0 + 1 = 1,   and   1 + 1 = 0   (modulo 2 sums)

and where subtraction is the same as addition, so that 0 - 1 = 0 + 1 = 1.
Multiplication is as usual: 1·0 = 0·1 = 0 and 1·1 = 1. We append to the
word x1x2x3x4 the modulo 2 sum

    x5 = x1 + x2 + x3 + x4.     (1)


This amounts to appending the character 0 if the message word contains an
even number of characters 1, and appending a 1 if the number of 1's in the

word is odd. Note that the result is a five-character code word x1x2x3x4x5
definitely containing an even number of 1's. Thus we have

    x1 + x2 + x3 + x4 + x5 = 0   (modulo 2).

If the message word is 1011, as in Illustration 1, then the encoded word is
10111. If a single error is made in transmitting this code word, changing a
single 0 to 1 or a single 1 to 0, then the modulo 2 sum of the five characters will
be 1 rather than 0, and the recipient will recognize that there has been an error
in transmitting the code word.
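The parity-check scheme of Illustration 2 is easy to express in a few lines of code. The following Python sketch (ours, not the text's) appends the modulo 2 sum of the characters and flags any received word with an odd number of 1's; the function names are illustrative.

```python
# A minimal sketch of the single parity-check scheme of Illustration 2.

def encode_parity(word):               # word is a string of 0's and 1's
    parity = sum(int(c) for c in word) % 2
    return word + str(parity)

def single_error_detected(received):
    # A valid code word has an even number of 1's; an odd count signals an error.
    return sum(int(c) for c in received) % 2 != 0

print(encode_parity('1011'))              # '10111', as in the text
print(single_error_detected('10101'))     # True: one character was flipped
print(single_error_detected('10111'))     # False: a valid code word
```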

Illustration 2 attained the goal of recognizing single-error transmissions


with less redundancy than in Illustration 1. In Illustration 2, we used just one
extra character, whereas in Illustration 1 we used four extra characters.
However, using the scheme in Illustration 1, we were able to identify which
character of the message was affected, whereas the technique in Illustration 2
showed only that at least one error had been made.
Computers use the scheme in Illustration 2 when storing the ASCII code
number of one of the 256 ASCII characters. An extra 0 or 1 is appended to the
binary form of the code number, so that the number of 1's in the augmented
word is even. When the encoded ASCII character is retrieved, the computer
checks that the number of 1’s is indeed even. If it is not, it can try to read that
binary word again. The user may be warned that there is a PARITY CHECK
problem if the computer is not successful.

Terminology
Equation (1) in Illustration 2 is known as a parity-check equation. In general,
starting with a message word x1x2···xk of k characters, we encode it as a code
word x1x2···xk···xn of n characters. The first k characters are the
information portion of the encoded word, and the final n - k characters are the
redundancy portion or parity-check portion.
We introduce more notation and terminology to make our discussion
easier. Let B^n be the set of all binary words of n consecutive 0's or 1's. A binary
code C is any subset of B^n. We can identify a vector of n components with each
word in B^n—namely, the vector whose ith component is the ith character in
the word. For example, we can identify the word 1101 with the row vector
[1, 1, 0, 1]. It is convenient to denote the set of all of these row vectors with n
components by B^n also. This notation is similar to the notation R^n for all
n-component vectors of real numbers. On occasion, we may find it convenient
to use column vectors rather than row vectors.
    The length of a word u in B^n is n, the number of its components. The
Hamming weight wt(u) of u is the number of components that are 1. Given two
binary words u and v in B^n, the distance between them, denoted by d(u, v), is the
number of components in which the entries in u and v are different, so that one
of the words has a 0 where the other has a 1.

ILLUSTRATION 3  Consider the binary words u = 11010011 and v = 01110111. Both words have
length 8. Also, wt(u) = 5, whereas wt(v) = 6. The associated vectors differ in
the first, third, and sixth components, so d(u, v) = 3.

We can define addition on the set B^n by adding modulo 2 the characters in
the corresponding positions. Remembering that 1 + 1 = 0, we add
0011101010 and 1010110001 as follows.

      0011101010
    + 1010110001
      1001011011

We refer to this operation as word addition. Word subtraction is the same as
word addition. Exercise 15 shows that B^n is closed under word addition. A
binary group code is any nonempty subset C of B^n that is closed under word
addition. It can be shown that C has precisely 2^k elements for some integer k,
where 0 <= k <= n. We refer to such a code as an (n, k) binary group code.
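For experimentation, the weight, distance, and word-addition operations just defined can be coded directly on words represented as strings of 0's and 1's. The following Python sketch is ours; the function names wt, dist, and word_add are illustrative.

```python
# A minimal sketch of Hamming weight, distance, and word addition on binary
# words represented as strings.

def wt(u):
    return u.count('1')

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

def word_add(u, v):                      # componentwise addition modulo 2
    return ''.join(str((int(a) + int(b)) % 2) for a, b in zip(u, v))

u, v = '11010011', '01110111'
print(wt(u), wt(v), dist(u, v))                # 5 6 3, as in Illustration 3
print(word_add('0011101010', '1010110001'))    # '1001011011', as above
```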

Encoding to Enable Correcting a Single-Error Transmission


We now show how, using more than one parity-check equation, we can not
only detect but actually correct a single-error transmission of a code word.
This method of encoding was developed by Richard Hamming in 1948.
Suppose we wish to encode the 16 message words

    0000 0001 0010 0011 0100 0101 0110 0111
    1000 1001 1010 1011 1100 1101 1110 1111

in B^4 so that any single-error transmission of a code word not only can be
detected but also corrected. The basic idea is simple. We append to the
message word x1x2x3x4 some additional binary characters given by parity-
check equations such as the equation x5 = x1 + x2 + x3 + x4 in Illustration 2,
and try to design the equations so that the minimum distance between the 16
code words created will be at least 3. Now, with a single-error transmission of a
code word, the distance from the received word to that code word will be 1. If
we can make our code words all at least three units apart, the pretransmission
code word will be the unique code word at distance 1 from the received word.
(If there were two such code words, the distance between them would be at
most 2.)
In order to detect the error in a single-error transmission of a code word
including not only message word characters but also the redundant parity-
check characters, we need to have each component xi in the code word appear
at least once in some parity-check equation. Note that in Illustration 2, each
component x1, x2, x3, x4, and x5 appears in the parity-check equation x5 =
x1 + x2 + x3 + x4. The parity-check equations

    x5 = x1 + x2 + x3,   x6 = x1 + x3 + x4,   and   x7 = x2 + x3 + x4,     (2)

which we will show accomplish our goal, also satisfy this condition.

Let us see how to get a distance of at least 3 between each pair of the 16
code words. Of course, the distance between any two of the original 16
four-character message words is at least 1 because they are all different.
Suppose now that two message words w and v differ in just one component, say
xi. A single parity-check equation containing xi then yields a different
character for w than for v. This shows that if each xi in our original message
word appears in at least two parity-check equations, then any message words
at a distance of 1 are encoded into code words of distance at least 3. Note that
the three parity-check equations [Eqs. (2)] satisfy this condition. It remains to
ensure that two message words at a distance of 2 are encoded to increase this
distance by at least 1. Suppose two message words u and v differ in only the ith
and jth components. Now a parity-check equation containing both xi and xj
will create the same parity-check character for u as for v. Thus, for each such
combination i, j of positions in our message word, we need some parity-check
equation to contain either xi but not xj or xj but not xi. We see that this
condition is satisfied by the three parity-check equations [Eqs. (2)] for all
possible combinations i, j—namely,

    1,2   1,3   1,4   2,3   2,4   and   3,4.


Thus, these equations accomplish our goal. The 16 binary words of length 7,
obtained by encoding the 16 binary words

    0000 0001 0010 0011 0100 0101 0110 0111
    1000 1001 1010 1011 1100 1101 1110 1111

of length 4 using Eqs. (2), form a subset of B^7 called the Hamming (7, 4) code.
In Exercise 16, we ask you to show that the Hamming (7, 4) code is a binary
group code.
    We can encode each of the 16 binary words of length 4 to form the
Hamming (7, 4) code by multiplying the vector form [x1, x2, x3, x4] of the word
on the right by a 4 x 7 matrix G, called the standard generator matrix—
namely, we compute

                                      [1 0 0 0 1 1 0]
    [x1, x2, x3, x4] G,   where   G = [0 1 0 0 1 0 1].
                                      [0 0 1 0 1 1 1]
                                      [0 0 0 1 0 1 1]

To see this, note that the first four columns of G give the 4 x 4 identity matrix
I, so the first four entries in the encoded word will yield precisely the message
word x1x2x3x4. In columns 5, 6, and 7, we put the coefficients of x1, x2, x3, and x4
as they appear in the parity-check equations defining x5, x6, and x7, respec-
tively. Table 1.1 shows the 16 message words and the code words obtained
using this generator matrix G. Note that the message words 0011 and 0111,
which are at distance 1, have been encoded as 0011100 and 0111001, which
are at distance 3. Also, the message words 0101 and 0011, which are at dis-
tance 2, have been encoded as 0101110 and 0011100, which are at distance 3.
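Encoding with the standard generator matrix is a single matrix multiplication carried out modulo 2. The following Python sketch (ours, using NumPy) reproduces two of the code words listed in Table 1.1; the function name encode is an illustrative choice.

```python
# A minimal sketch of encoding with the standard generator matrix G of the
# Hamming (7, 4) code, doing the multiplication modulo 2.
import numpy as np

G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 1, 1],
              [0, 0, 0, 1, 0, 1, 1]])

def encode(message):                     # message is a string such as '1011'
    x = np.array([int(c) for c in message])
    codeword = x @ G % 2                 # row vector times G, modulo 2
    return ''.join(str(b) for b in codeword)

print(encode('1011'))   # '1011010', matching Table 1.1
print(encode('0101'))   # '0101110'
```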

Decoding a received word w by selecting a code word at minimum
distance from w (in some codes, more than one code word might be at the
minimum distance) is known as nearest-neighbor decoding. If transmission
errors are independent of each other, it can be shown that this is equivalent
to what is known as maximum-likelihood decoding.

ILLUSTRATION 4  Suppose the Hamming (7, 4) code shown in Table 1.1 is used. If the received
word is 1011010, then the decoded message word consists of the first four
characters 1011, because 1011010 is a code word. However, suppose the word
0110101 is received. This is not a code word. The closest code word is
0100101, which is at distance 1 from 0110101. Thus we decode 0110101 as

TABLE 1.1
The Hamming (7, 4) code

    Message    Code Word

    0000       0000000
    0001       0001011
    0010       0010111
    0011       0011100
    0100       0100101
    0101       0101110
    0110       0110010
    0111       0111001
    1000       1000110
    1001       1001101
    1010       1010001
    1011       1011010
    1100       1100011
    1101       1101000
    1110       1110100
    1111       1111111
HISTORICAL NOTE  Richard Hamming (b. 1915) had his interest in the question of coding
stimulated in 1947 when he was using an early Bell System relay computer on weekends only
(because he did not have priority use of the machine). During the week, the machine sounded an
alarm when it discovered an error so that an operator could attempt to correct it. On weekends,
however, the machine was unattended and would dump any problem in which it discovered an
error and proceed to the next one. Hamming's frustration with this behavior of the machine grew
when errors cost him two consecutive weekends of work. He decided that if the machine could
discover errors—it used a fairly simple error-detecting code—there must be a way for it to correct
them and proceed with the solution. He therefore worked on this idea for the next year and
discovered several different methods of creating error-correcting codes. Because of patent
considerations, Hamming did not publish his solutions until 1950. A brief description of his (7, 4)
code, however, appeared in a paper of Claude Shannon (b. 1916) in 1948.
    Hamming, in fact, developed some of the parity-check ideas discussed in the text as well as
the geometric model in which the distance between code words is the number of coordinates in
which they differ. He also, in essence, realized that the set of actual code words embedded in B^7
was a four-dimensional subspace of that space.

0100, which differs from the first four characters of the received word. On the
other hand, if we receive the noncode word 1100111, we decode it as 1100,
because the closest code word to 1100111 is 1100011.

Note in Illustration 4 that if the code word 0001011 is transmitted and is


received as 0011001, with two errors, then the recipient knows that an error
has been made, because 0011001 is not a code word. However, nearest-
neighbor decoding yields the code word 0111001, which corresponds to a
message word 0111 rather than the intended 0001. When retransmission is
practical, it may be better to request it when an error is detected rather than
blindly using nearest-neighbor decoding.
Of course, if errors are generated independently and transmission is of
high quality, it should be much less likely for a word to be transmitted with
two errors than with one error. If the probability of an error in transmission of
a single character is p and errors are generated independently, then probability
theory shows that in transmitting a word of length n,

    the probability of no error is (1 - p)^n,
    the probability of exactly one error is np(1 - p)^(n-1), and
    the probability of exactly two errors is [n(n - 1)/2] p^2 (1 - p)^(n-2).

For example, if p = 0.0001, so that we can expect about one character out of
every 10,000 to be changed, and if the length of the word is n = 10, then the
probabilities of no error, one error, and two errors, respectively, are approxi-
mately 0.999, 0.000999, and 0.0000004.
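These probabilities are easy to recompute. The following few Python lines (ours, for illustration) evaluate the three formulas for p = 0.0001 and n = 10.

```python
# A quick check of the error-probability figures quoted above.
p, n = 0.0001, 10
print((1 - p) ** n)                                    # about 0.999
print(n * p * (1 - p) ** (n - 1))                      # about 0.000999
print(n * (n - 1) / 2 * p**2 * (1 - p) ** (n - 2))     # about 4.5e-07
```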

Parity-Check Matrix Decoding


You can imagine that if we encoded all of the 256 ASCII characters in an (n, 8)
linear code and used nearest-neighbor decoding, it would be a job to pore over
the list of 256 encoded characters to determine the nearest neighbor to a
received code word. There is an easier way, which we illustrate using the
Hamming (7, 4) code developed before Illustration 4. Recall that the parity-
check equations for that code are

    x5 = x1 + x2 + x3,   x6 = x1 + x3 + x4,   and   x7 = x2 + x3 + x4.

Let us again concern ourselves with detecting and correcting just single-error
transmissions. If these parity-check equations hold for the received word, then
no such single error has occurred. Suppose, on the other hand, that the first
two equations fail and the last one holds. The only character appearing in both
of the first two equations but not in the last is x1, so we could simply change the
character x1 from 0 to 1, or vice versa, to decode. Note that each of x1, x2, and
x4 is omitted just once but in different equations, x3 is the only character that
appears in all three equations, and each of x5, x6, and x7 appears just once but
in different equations. This allows us to identify the character in a single-error
transmission easily.
We can be even more systematic. Let us rewrite the equations as

    x1 + x2 + x3 + x5 = 0,
    x1 + x3 + x4 + x6 = 0,
    x2 + x3 + x4 + x7 = 0.

We form the parity-check matrix H whose ith row contains the seven
coefficients of x1, x2, x3, x4, x5, x6, x7 in the ith equation—namely,

        [1 1 1 0 1 0 0]
    H = [1 0 1 1 0 1 0].
        [0 1 1 1 0 0 1]
Let w be a received word, written as a column vector. Exercise 26 shows that w
is a code word if and only if Hw is the zero column vector, where we are always
using modulo 2 arithmetic. If w resulted from a single-error transmission in
which the character in the jth position was changed, then Hw would have 1 in
its ith component if and only if xj appeared in the ith parity-check equation, so
that the column vector Hw would be the jth column of H. Thus we can decode
a received word w in the Hamming (7, 4) code of Illustration 4, and be
confident of detecting and correcting any single-position error, as follows.

Parity-Check Matrix Decoding


1. Compute Hw.
2. If Hw is the zero vector, decode as the first four characters of w.
3. If Hw is the jth column of H, then:
a. if j > 4, decode as the first four characters of w;
b. otherwise, decode as the first four characters with the jth charac-
ter changed.
4. If Hw is neither the zero vector nor the jth column of H, then more
than one error has been made: ask for retransmission.
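The boxed procedure translates almost line for line into code. The following Python sketch (ours, using NumPy, with the illustrative function name decode and 0-based positions in place of the text's 1-based ones) computes Hw and applies steps 1-4.

```python
# A minimal sketch of parity-check matrix decoding for the Hamming (7, 4) code,
# following the four steps in the box above.
import numpy as np

H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def decode(received):                    # received is a 7-character string
    w = np.array([int(c) for c in received])
    syndrome = H @ w % 2
    if not syndrome.any():               # Hw = 0: w is a code word
        return received[:4]
    for j in range(7):                   # is Hw the jth column of H?
        if np.array_equal(syndrome, H[:, j]):
            if j > 3:                    # error in a parity-check position
                return received[:4]
            bits = list(received[:4])    # flip the jth message character
            bits[j] = '1' if bits[j] == '0' else '0'
            return ''.join(bits)
    return None                          # more than one error: ask to retransmit

print(decode('1011010'))   # '1011' -- a code word
print(decode('0110101'))   # '0100' -- single error corrected, as in Illustration 5
```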

ILLUSTRATION 5  Suppose the Hamming (7, 4) code shown in Table 1.1 is used and the word
w = 0110101 is received. We compute that

                           [0]
          [1 1 1 0 1 0 0]  [1]     [1]
    Hw =  [1 0 1 1 0 1 0]  [1]  =  [1].
          [0 1 1 1 0 0 1]  [0]     [1]
                           [1]
                           [0]
                           [1]

Because this is the third column of H, we change the third character in the
message portion 0110 and decode as 0100. Note that this is what we obtained
in Illustration 4 when we decoded this word using Table 1.1.

Just as we did after Illustration 4, we point out that if two errors are made
in transmission, the preceding outline may lead to incorrect decoding. If the
code word v = 0001011 is transmitted and received as w = 0011001 with two
errors, then

         [1]
    Hw = [0],
         [1]

which is column 2 of the matrix H. Thus, decoding by the steps above leads to
the incorrect message word 0111. Note that item 4 in the list above does not
say that if more than one error has been made, then Hw is neither the zero
vector nor the jth column of H.

EXERCISES

1. Let the binary numbers 1 through 15 stand Exercise 4. (Recall that in the case where
for the letters ABCDEFGHIJKLM more than one code word is at minimum
N O, in that order. Using Table 1.1 for the distance from the received word, a code
Hamming (7, 4) code and letting 0000 stand word of minimum distance is selected
for a space between words, encode the arbitrarily.)
message A GOOD DOG. a. 110111
. With the same understanding as in the b. 001011
preceding exercise, use nearest-neighbcr c. 111011
decoding to decode this received message. d. 101010
e. i00i0i
0111001 1104111 1010100 0101110
0000000 1100110 1i11111 1101000 . Give the parity-check matrix for this code.
1101110 . Use the parity-check matrix to decode the
received words in Exercise 7.
In Exercises 3 through 11, consider the (6, 3)
linear code C with standard generator matrix 10. cee = 1101010111 and v = 0111001110.
in
100110 a. wt()
G=|0 1010 1}. b. wt(v)
001011 couty
; d. d(u, v)
. one the pasity-check equations for this 11. Show that for word addition of binary words
°° °. u and v of the same length, we have u + v =
. List the code words in C. u-v.
- How many errors can always be detected 12. If a binary code word u is transmitted and
using this code? the received word is w, then the sum uv + w
. How many errors can always be corrected given by word addition modulo 2 is called
using this code? the error pattern. Explain why this is a
. Assuining that the given word has been descriptive name for this sum.
received, decode it using nearest-neighbor 13. Show that for two binary words of the same
decoding. using your list of code words in length, we have d(u, v) = wt(" ~ »).

14. Prove the following properties of the if i # j, Then use the fact that m must be
distance function for binary words u, v, and large enough so that B” contains C and all
w of the same length. words whose distance from some word in C
2. d(u. v) = Oif and only ifu = v is 1]
b. A(u, v) = d(v, u) (Symmetrv) 22. Show that if the minimum distance between
c. d(u, w) = d(u, v) + d(v, w) tne words in an (m, k) binary group code C is
(Triangle inequality) at least 5, then we must have
d. d(u, v) = d(u + w,v+ w)
(Invariance under translation) arte lint =),
15. Show that 8” is closed under word addition. [Hint: Proceed as suggested by the hint in
16. Recall that we call a nonempty subset C of Exercise 21 to count the words at distance |
B" a binary group code if C is closed under and at distance 2 from some word in C.]}
addition. Show that the Hamming (7, 4) —
23. Using the formulas in Exercises 21 and 22,
code is a group code. [Hint: To show closure find a lower bound for the number of
under word addition, use the fact that the pafity-check equations necessary to encode
words in the Hamming (7, 4) code can be the 2* words in B* so that the minimum
formed from those in B‘ by multiplying by a distance between different code words is at
generator matrix.] least m for the given values of m and k.
17. Show that in a binary group code C, the (Note that k = 8 would allow us to encode
minimum distance between code words is all the ASCII characters, and that m = 5
equal to the minimum weight of the nonzero would allow us to detect and correct all
code words. single-error and double-error transmissions
18. Suppose that you want to be able to using nearest-neighbor decoding.)
recognize that a received word is incorrect a. k=2,m=3 d. k=2,m=5
when m or fewer of its characters have been b. k= 4, m = 3 e k=4,m=5
changed during transmission. What must be c. k=8,m=3 f. k=8,m=5
the minimum distance between code words 24. Find parity-check equations for encoding the
to accomplish this? 32 words in B° into an (n, 5) linear code that
19. Suppose that you want to be able to find a can be used to detect and correct any
unique nearest neighbor for a received word single-error transmission of a code word.
that has been transmitted with m or fewer of (Recall that each character x, must appear in
its characters changed. What must be the two parity-check equations, and that for
minimum distance between code words to each pair x,, x, some equation must contain
accomplish this? one of them but not the other.) Try to make
20. Show that if the minimum nonzero weight of the number of parity-check equations as
code words in a group code C is at least small as possible; see Exercise 21. Give the
2t + :, then the code can detect any 2¢ standard generator matrix for your code.
errors and correct any ¢ errors. (Compare the . The 256 ASCII characters are numbered
result stated in this exercise with your from 0 to 255, and thus can be represented
answers to the two preceding ones.) by the 256 binary words in B®. Find n — 8
21. Show that if the minimum distance between parity-check equations that can be used to
the words in an (n, k) binary group code C is form an (n, 8) linear code that can be used
at least 3, we must have to detect and correct any single-error
transmission of a code word. Try to make 7
aka Ltn. the value found in part (c) of Exercise 23.
uni: Let e, be the word in B" with | in the 26. Let C be an (n, k) linear code with
ih pesition and 0’s elsewhere. Show that e, parity-check matrix H. We know that Hc = 9
is not in C and that, for any two distinct for all c € C. Show conversely thai if
words vand win C, wehavev + e, # wt 2, w € B" and Hw = 0, thenweE C.
CHAPTER 2

DIMENSION, RANK, AND LINEAR TRANSFORMATIONS

Given a finite set S of vectors that generate a subspace W of R^n, we would like
to delete from S any superfluous vectors, obtaining as small a subset B of S as
we can that still generates W. We tackle this problem in Section 2.1. In doing
so, we encounter the notion of an independent set of vectors. We discover that
such a minimal subset B of S that generates W is a basis for W, so that every
vector in W can be expressed uniquely as a linear combination of vectors in B.
We will see that any two bases for W contain the same number of vectors—the
dimension of W will be defined to be this number. Section 2.2 discusses the
relationships among the dimensions of the column space, the row space, and
the nullspace of a matrix.
    In Section 2.3, we discuss functions mapping R^n into R^m that preserve, in a
sense that we will describe, both vector addition and scalar multiplication.
Such functions are known as linear transformations. We will see that for a
linear transformation, the image of a vector x in R^n can be computed by
multiplying the column vector x on the left by a suitable m x n matrix.
Optional Section 2.4 then applies matrix techniques in describing geometri-
cally all linear transformations of the plane R^2 into itself.
    As another application to geometry, optional Section 2.5 uses vector
techniques to generalize the notions of line and plane to k-dimensional flats in
R^n.

2.1 INDEPENDENCE AND DIMENSION

Finding a Basis for a Span of Vectors


Let w1, w2, ..., wk be vectors in R^n and let W = sp(w1, w2, ..., wk). Now W can
be characterized as the smallest subspace of R^n containing all of the vectors w1,
w2, ..., wk, because every subspace containing these vectors must contain all

linear combinations of them, and consequently must include every vector in
W. We set ourselves the problem of finding a basis for W.
    Let us assume that {w1, w2, ..., wk} is not a basis for W. Theorem 1.15
then tells us that we can express the zero vector as a linear combination of the
wj in some nontrivial way. As an illustration, suppose that

    2w1 - 5w2 + (1/3)w3 = 0.     (1)

Using Eq. (1), we can express each of w1, w2, and w3 as a linear combination of
the other two. For example, we have

    (1/3)w3 = -2w1 + 5w2,   so   w3 = -6w1 + 15w2.

We claim that we can delete w3 from our list w1, w2, ..., wk and the remaining
wj will still span W. The space spanned by the remaining wj, which is contained
in W, will still contain w3 because w3 = -6w1 + 15w2, and we have seen that W
is the smallest space containing all the wj. Thus the vector w3 in the original list
is not needed to span W.
    The preceding illustration indicates that we can find a basis for W =
sp(w1, w2, ..., wk) by repeatedly deleting from the list w1, w2, ..., wk one
vector that appears with a nonzero coefficient in a linear combination giving
the zero vector, such as Eq. (1), until no such nontrivial linear combination for
0 exists. The final list of remaining wj will still span W and be a basis for W by
Theorem 1.15.

EXAMPLE 1  Find a basis for W = sp([2, 3], [0, 1], [4, -6]) in R^2.

SOLUTION  The presence of the vector [0, 1] allows us to spot that [4, -6] = 2[2, 3] -
12[0, 1], so we have a relation like Eq. (1)—namely,

    2[2, 3] - 12[0, 1] - [4, -6] = [0, 0].

Thus we can delete any one of the three vectors, and the remaining two will
still span W. For example, we can delete the vector [4, -6] and we will have
W = sp([2, 3], [0, 1]). Because neither of these two remaining vectors is a
multiple of the other, we see that the zero vector cannot be expressed as a
nontrivial linear combination of them. (See Exercise 29.) Thus {[2, 3], [0, 1]} is
a basis for W, which we realize is actually all of R^2, because any two nonzero
and nonparallel vectors span R^2 (Theorem 1.16 in Section 1.6).

Our attention is focused on the existence of a nontrivial linear combina-
tion yielding the zero vector, such as Eq. (1). Such an equation is known as a
dependence relation. We formally define this, and the notions of dependence
and independence for vectors. This is a very important definition in our study
of linear algebra.

DEFINITION 2.1 Linear Dependence and Independence

Let {w1, w2, ..., wk} be a set of vectors in R^n. A dependence relation in
this set is an equation of the form

    r1 w1 + r2 w2 + ··· + rk wk = 0,   with at least one rj ≠ 0.     (2)

If such a dependence relation exists, then {w1, w2, ..., wk} is a linearly
dependent set of vectors. Otherwise, the set of vectors is linearly
independent.

For convenience, we will often drop the word linearly from the terms
linearly dependent and linearly independent, and just speak of a dependent or
independent set of vectors. We will sometimes drop the words set of and refer to
dependent or independent vectors w1, w2, ..., wk.
    Two nonzero vectors in R^n are independent if and only if one is not a scalar
multiple of the other (see Exercise 29). Figure 2.1(a) shows two independent
vectors w1 and w2 in R^2. A little thought shows why r1 w1 + r2 w2 in this figure can
be the zero vector if and only if r1 = r2 = 0. Figure 2.1(b) shows three
independent vectors w1, w2, and w3 in R^3. Note how w3 ∉ sp(w1, w2). Similarly,
w1 ∉ sp(w2, w3) and w2 ∉ sp(w1, w3).
Using our new terminology, Theorem 1.15 shows that {w1, w2, ..., wk} is a
basis for a subspace W of R^n if and only if the vectors w1, w2, ..., wk span W
and are independent. This is taken as a definition of a basis in many texts. We
chose the "unique linear combination" characterization in Definition 1.17
because it is the most important property of bases and was the natural choice
arising from our discussion of the solution set of Ax = 0. We state this
alternative characterization as a theorem.

FIGURE 2.1
(a) Independent vectors w1 and w2; (b) independent vectors w1, w2, and w3.

THEOREM 2.1 Alternative Characterization of a Basis

Let W be a subspace of R^n. A subset {w1, w2, ..., wk} of W is a basis for
W if and only if the following two conditions are met:
1. The vectors w1, w2, ..., wk span W.
2. The vectors are linearly independent.

We turn to a technique for computing a basis for W = sp(w1, w2, ..., wk)
in R^n. Determining whether there is a nontrivial dependence relation

    x1 w1 + x2 w2 + ··· + xk wk = 0,   some xj ≠ 0,

amounts to determining whether the linear system Ax = 0 has a nontrivial
solution, where A is the matrix whose jth column vector is wj. The ubiquitous
row reduction appears again! This time, we will get much more information
than just the existence of a dependence relation. Recall that the solutions of
Ax = 0 are identical with those of the system Hx = 0, where [H | 0] is obtained
from [A | 0] by row reduction. Suppose, for illustration, that H is in reduced
row-echelon form, and that

              [1  0   2  0   5 | 0]
              [0  1  -3  0   9 | 0]
    [H | 0] = [0  0   0  1  -7 | 0]
              [0  0   0  0   0 | 0]
              [0  0   0  0   0 | 0]
              [0  0   0  0   0 | 0]

(Normally, we would not bother to write the column of zeros to the right of the
partition, but we want to be sure that you realize that, in this context, we
imagine the zero column to be there.) The zeros above as well as below the
pivots allow us to spot some dependence relations (arising from solutions of
Hx = 0) immediately, because the columns with pivots are in standard basis
vector form. Every nontrivial solution of Hx = 0 is a nontrivial solution of
Ax = 0 and so gives a dependence relation on the column vectors wj of A. In
particular, we see that

    w3 = 2w1 - 3w2   and   w5 = 5w1 + 9w2 - 7w4,

and so we have the dependence relations

    -2w1 + 3w2 + w3 = 0   and   -5w1 - 9w2 + 7w4 + w5 = 0.

Hence we can delete w3 and w5 and retain {w1, w2, w4} as a basis for W. In order
to be systematic, we have chosen to keep precisely the vectors wj such that the
jth column of H contains a pivot. Thus we don't really have to obtain reduced
row-echelon form with zeros above pivots to do this; row-echelon form is
enough. We have hit upon the following elegant technique.

Finding a Basis for W = sp(w1, w2, ..., wk)

1. Form the matrix A whose jth column vector is wj.
2. Row-reduce A to row-echelon form H.
3. The set of all wj such that the jth column of H contains a pivot is a
   basis for W.

EXAMPLE 2  Find a basis for the subspace W of R^5 spanned by

    w1 = [1, -1, 0, 2, 1],    w2 = [2, 1, -2, 0, 0],
    w3 = [0, -3, 2, 4, 2],    w4 = [3, 3, -4, -2, -1],
    w5 = [2, 4, 1, 0, 1],     w6 = [5, 7, -3, -2, 0].

SOLUTION  We reduce the matrix that has wj as jth column vector, obtaining

    [ 1  2  0  3  2  5]     [1  2  0  3  2   5]
    [-1  1 -3  3  4  7]     [0  3 -3  6  6  12]
    [ 0 -2  2 -4  1 -3]  ~  [0 -2  2 -4  1  -3]
    [ 2  0  4 -2  0 -2]     [0 -4  4 -8 -4 -12]
    [ 1  0  2 -1  1  0]     [0 -2  2 -4 -1  -5]

       [1  2  0  3  2  5]     [1  2  0  3  2  5]
       [0  1 -1  2  2  4]     [0  1 -1  2  2  4]
    ~  [0  0  0  0  5  5]  ~  [0  0  0  0  1  1].
       [0  0  0  0  4  4]     [0  0  0  0  0  0]
       [0  0  0  0  3  3]     [0  0  0  0  0  0]

Because there are pivots in columns 1, 2, and 5 of the row-echelon form, the
vectors w1, w2, and w5 are retained and are independent. We obtain {w1, w2, w5}
as a basis for W.
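The boxed technique can also be carried out by a computer algebra system that reports pivot columns. The following Python sketch (ours, using SymPy rather than the LINTEK and MATLAB software mentioned elsewhere in the text) recovers the basis found in Example 2.

```python
# A minimal sketch of the boxed technique: the pivot columns of the reduced
# matrix tell us which w_j to keep.
from sympy import Matrix

w = [[1, -1, 0, 2, 1], [2, 1, -2, 0, 0], [0, -3, 2, 4, 2],
     [3, 3, -4, -2, -1], [2, 4, 1, 0, 1], [5, 7, -3, -2, 0]]

A = Matrix(w).T                 # the w_j become the columns of A
_, pivot_cols = A.rref()        # rref() returns (reduced matrix, pivot columns)
basis = [w[j] for j in pivot_cols]
print(pivot_cols)               # (0, 1, 4): columns 1, 2, and 5 of the text
print(basis)                    # the basis {w1, w2, w5} found in Example 2
```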

We emphasize that the vectors retained are in columns of the matrix A
formed at the start of the reduction process. A common error is to take instead
the actual column vectors containing pivots in the row-echelon form H. There
is no reason even to expect that the column vectors of H lie in the column
space of A. For example,

    A = [1  1]  ~  [1  1] = H,
        [1  1]     [0  0]

and certainly the pivot column [1, 0] of H (written as a column) is not in the
column space of A.
Note that we can test whether vectors in R^n are independent by reducing a
matrix having them as column vectors. The vectors are independent if and
only if row reduction of the matrix yields a matrix with a pivot in every
column. In particular, n vectors in R^n are independent if and only if row
reduction of the matrix having them as column vectors yields the n x n
identity matrix I. On the other hand, more than n vectors in R^n must be
dependent, because an m x n matrix with m < n cannot have a pivot in every
column.

EXAMPLE 3  Determine whether the vectors v1 = [1, 2, 3, 1], v2 = [2, 2, 1, 3], and v3 =
[-1, 2, 7, -3] in R^4 are independent.

SOLUTION  Reducing the matrix with jth column vector vj, we obtain

    [1  2  -1]     [1   2  -1]     [1  0   3]
    [2  2   2]     [0  -2   4]     [0  1  -2]
    [3  1   7]  ~  [0  -5  10]  ~  [0  0   0].
    [1  3  -3]     [0   1  -2]     [0  0   0]

We see that the vectors are not independent. In fact, v3 = 3v1 - 2v2.

The Dimension of a Subspace

We realize that a basis for a subspace W of R^n is far from unique. In Example 1,
for W = sp([2, 3], [0, 1], [4, -6]) in R^2, we discovered that any two of the three
vectors can be used to form a basis for W. We also know that any two nonzero,
nonparallel vectors in R^2 form a basis for R^2. Likewise, in Example 2, the
vectors we found for a basis for sp(w1, w2, w3, w4, w5, w6) depended on the order
in which we put them as columns in the matrix A that we row-reduced. If we
reverse the order of the columns in the matrix A in Example 2, we will wind up
with a different basis. However, it is true that given a subspace W of R^n, all
bases for W contain the same number of vectors. This is an important result,
which will be a quick corollary of the following theorem.

THEOREM 2.2 Relative Sizes of Spanning and Independent Sets

Let W be a subspace of R^n. Let w1, w2, ..., wk be vectors in W that
span W, and let v1, v2, ..., vm be vectors in W that are independent.
Then k ≥ m.

PROOF  Let us suppose that k < m. We will show that the vectors v1, v2, ..., vm
are dependent, contrary to hypothesis. Because the vectors w1, w2, ..., wk
span W, there exist scalars a_ij such that

    v1 = a_11 w1 + a_12 w2 + ··· + a_1k wk,
    v2 = a_21 w1 + a_22 w2 + ··· + a_2k wk,
      .                                          (3)
      .
    vm = a_m1 w1 + a_m2 w2 + ··· + a_mk wk.

We compute x1 v1 + x2 v2 + ··· + xm vm in an attempt to find a dependence
relation, by multiplying the first equation in Eqs. (3) by x1, the second by x2,
etc., and then adding the equations. Now the resulting sum is sure to be the
zero vector if the total coefficient of each wj on the right-hand side in the sum
after adding is zero—that is, if we can make

    x1 a_11 + x2 a_21 + ··· + xm a_m1 = 0,
    x1 a_12 + x2 a_22 + ··· + xm a_m2 = 0,
      .
      .
    x1 a_1k + x2 a_2k + ··· + xm a_mk = 0.

This gives us a homogeneous linear system of k equations in m unknowns xi.
Corollary 2 of Theorem 1.17 tells us that such a homogeneous system has a
nontrivial solution if there are fewer equations than unknowns, and this is the
case because we are supposing that k < m. Thus we can find scalars x1, x2, ...,
xm, not all zero, such that x1 v1 + x2 v2 + ··· + xm vm = 0. That is, the vectors
v1, v2, ..., vm are dependent if k < m, as we wanted to show.

COROLLARY  Invariance of Dimension

Any two bases of a subspace W of R^n contain the same number of
vectors.

PROOF  Suppose that both a set B with k vectors and a set B' with m vectors
are bases for W. Then both B and B' are independent sets of vectors, and the
vectors in either set span W. Regarding B as a set of k vectors spanning W and
regarding B' as a set of m independent vectors in W, Theorem 2.2 tells us that
k ≥ m. Switching around and regarding B' as a set of m vectors spanning W
and regarding B as a set of k independent vectors in W, the theorem tells us
that m ≥ k. Therefore, k = m.

As the title of the corollary indicates, we will consider the number of


vectors in a basis for a subspace W of R^n to be the dimension of W. If different
bases for W were to have different numbers of vectors, then this notion of
dimension would not be well defined (that is, unambiguous). Because different
people may come up with different bases, the preceding corollary is necessary
in order to define dimension.

DEFINITION 2.2 Dimension of a Subspace

Let W be a subspace of R^n. The number of elements in a basis for W is
the dimension of W, and is denoted by dim(W).

Thus the dimension of R^n is n, because we have the standard basis
{e1, e2, ..., en}. Now R^n cannot be spanned by fewer than n vectors, because a
spanning set can always be cut down (if necessary) to form a basis using the
technique boxed before Example 2. Theorem 2.2 also tells us that we cannot
find a set containing more than n independent vectors in R^n. The same
observations hold for any subspace W of R^n using the same arguments. If
dim(W) = k, then W cannot be spanned by fewer than k vectors, and an
independent set of vectors in W can contain at most k elements. Perhaps you
just assumed that this would be the case; it is gratifying now to have
justification for it.

EXAMPLE 4  Find the dimension of the subspace W = sp(w1, w2, w3, w4) of R^3, where
    w1 = [1, -3, 1],   w2 = [-2, 6, -2],   w3 = [2, 1, -4],   and   w4 = [-1, 10, -7].

SOLUTION  Clearly, dim(W) is no larger than 3. To determine its value, we form the matrix

        [ 1  -2   2  -1]
    A = [-3   6   1  10].
        [ 1  -2  -4  -7]

We reduce the matrix A to row-echelon form, obtaining

    [ 1  -2   2  -1]     [1  -2   2  -1]     [1  -2  2  -1]
    [-3   6   1  10]  ~  [0   0   7   7]  ~  [0   0  1   1].
    [ 1  -2  -4  -7]     [0   0  -6  -6]     [0   0  0   0]

Thus the column vectors

    [ 1]         [ 2]
    [-3]   and   [ 1]
    [ 1]         [-4]

form a basis for W, the column space of A, and so dim(W) = 2.
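Since dim(W) is the number of pivot columns in a row-echelon form of A, it can also be read off as the rank of A. A quick numerical check of Example 4 (ours, using NumPy) is shown below.

```python
# A quick check of Example 4: dim(W) equals the number of pivots in a
# row-echelon form of A, which NumPy reports as the rank of the matrix.
import numpy as np

A = np.array([[ 1, -2,  2, -1],
              [-3,  6,  1, 10],
              [ 1, -2, -4, -7]])
print(np.linalg.matrix_rank(A))   # 2, so dim(W) = 2
```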

In Section 1.6, we stated that we would show that every subspace W of R^n
is of the form sp(w1, w2, ..., wk). We do this now by showing that every
subspace W ≠ {0} has a basis. Of course, {0} = sp(0). To construct a basis for W
≠ {0}, choose any nonzero vector w1 in W. If W = sp(w1), we are done. If not,
choose a vector w2 in W that is not in sp(w1). Now the vectors w1, w2 must be
independent, for a dependence relation would allow us to express w2 as a
multiple of w1, contrary to our choice of w2 not in sp(w1). If sp(w1, w2) = W, we
are done. If not, choose w3 in W that is not in sp(w1, w2). Again, no dependence
relation can exist for w1, w2, w3 because none exists for w1, w2 and because w3
cannot be a linear combination of w1 and w2. Continue in this fashion. Now
W cannot contain an independent set with more than n vectors because
no independent subset of R^n can have more than n vectors (Theorem 2.2).
The process must stop with W = sp(w1, w2, ..., wk) for some k ≤ n,
which demonstrates our goal. In order to be able to say that every subspace
of R^n has a basis, we define the basis of the zero subspace {0} to be the empty
set. Note that although sp(0) = {0}, the zero vector is not a unique linear com-

bination of itself, because r·0 = 0 for all scalars r. In view of our definition,
we have dim({0}) = 0.
    The construction technique in the preceding paragraph also shows that
every independent subset S of vectors in a subspace W of R^n can be enlarged, if
necessary, to become a basis for W. Namely, if S is not already a basis, we
choose a vector in W that is not in the span of the vectors in S, enlarge S by this
vector, and continue this process until S becomes a basis.
    If we know already that dim(W) = k and want to check that a subset S
containing k vectors of W is a basis, it is not necessary to check both (1) that
the vectors in S span W and (2) that they are independent. It suffices to check
just one of these conditions. Because if the vectors span W, we know that the set
S can be cut down—if necessary, by the technique of Example 2—to become a
basis. Because S already has the required number of vectors for a basis, no
such cutting down can occur. On the other hand, if we know that S is an
independent set, then the preceding paragraph shows that S can be enlarged, if
necessary, to become a basis. But because S has the right number of vectors for
a basis, no such enlargement is possible.
    We collect the observations in the preceding three paragraphs in a theorem
for easy reference.

THEOREM 2.3 Existence and Determination of Bases

1. Every subspace W of R’ has a basis and dim{W) = n.


2. Every independent set of vectors in R” can be enlarged, if neces-
sary, to become a basis for R’.
3. If Wis a subspace of R" and dim(W} = k, then
a. every independent sef of k vectors in W is a basis for W, and
b. every set of k vectors in W that spans Wis a basis for W.

The example that follows illustrates a technique for enlarging an indepen-


dent set of vectors in R" to a basis for R’.

EXAMPLE 5  Enlarge the independent set {[1, 1, -1], [1, 2, -2]} to a basis for R³.

SOLUTION  Let v₁ = [1, 1, -1] and v₂ = [1, 2, -2]. We know a spanning set for R³, namely
{e₁, e₂, e₃}. We write R³ = sp(v₁, v₂, e₁, e₂, e₃) and apply the technique of Example
2 to find a basis. As long as we put v₁ and v₂ first as columns of the matrix to be
reduced, pivots will occur in those columns, so v₁ and v₂ will be retained in the
basis. We obtain

    [ 1  1  1  0  0]   [1  1  1  0  0]   [1  1  1  0  0]
    [ 1  2  0  1  0] ~ [0  1 -1  1  0] ~ [0  1 -1  1  0].
    [-1 -2  0  0  1]   [0 -1  1  0  1]   [0  0  0  1  1]

We see that the pivots occur in columns 1, 2, and 4. Thus a basis containing v₁
and v₂ is {[1, 1, -1], [1, 2, -2], [0, 1, 0]}.  ▲
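The enlargement technique also translates directly into a short computation. A minimal MATLAB sketch, with our own variable names:

    % Enlarge {v1, v2} to a basis for R^3 by appending e1, e2, e3 and keeping pivot columns.
    v1 = [1; 1; -1];  v2 = [1; 2; -2];
    M = [v1 v2 eye(3)];        % columns: v1, v2, e1, e2, e3
    [R, pivots] = rref(M);     % pivots = [1 2 4]
    basis = M(:, pivots)       % the columns [1 1 -1]', [1 2 -2]', [0 1 0]'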

SUMMARY

1. A set of vectors {w₁, w₂, . . . , wₖ} in Rⁿ is linearly dependent if there exists a dependence relation

       r₁w₁ + r₂w₂ + · · · + rₖwₖ = 0,   with at least one rᵢ ≠ 0.

   The set is linearly independent if no such dependence relation exists, so that a linear combination of the wᵢ is the zero vector only if all of the scalar coefficients are zero.
2. A set B of vectors in a subspace W of Rⁿ is a basis for W if and only if the set is independent and the vectors span W. Equivalently, each vector in W can be written uniquely as a linear combination of the vectors in B.
3. If W = sp(w₁, w₂, . . . , wₖ), then the set {w₁, w₂, . . . , wₖ} can be cut down, if necessary, to a basis for W by reducing the matrix A having wⱼ as the jth column vector to row-echelon form H, and retaining wⱼ if and only if the jth column of H contains a pivot.
4. Every subspace W of Rⁿ has a basis, and every independent set of vectors in W can be enlarged (if necessary) to a basis for W.
5. Let W be a subspace of Rⁿ. All bases of W contain the same number of vectors. The dimension of W, denoted by dim(W), is the number of vectors in any basis for W.
6. Let W be a subspace of Rⁿ and let dim(W) = k. A subset S of W containing exactly k vectors is a basis for W if either
   a. S is an independent set, or
   b. S spans W.
   That is, it is not necessary to check both conditions in Theorem 2.1 for a basis if S has the right number of elements for a basis.

EXERCISES
1. Give a geometric criterion for a set of two distinct nonzero vectors in R² to be dependent.
2. Argue geometrically that any set of three distinct vectors in R² is dependent.
3. Give a geometric criterion for a set of two distinct nonzero vectors in R³ to be dependent.
4. Give a geometric description of the subspace of R³ generated by an independent set of two vectors.
5. Give a geometric criterion for a set of three distinct nonzero vectors in R³ to be dependent.
6. Argue geometrically that every set of four distinct vectors in R³ is dependent.

In Exercises 7-11, use the technique of Example 2, described in the box on page 129, to find a basis for the subspace spanned by the given vectors.

7. sp([-3, 1], [6, 4]) in R²
8. sp([-3, 1], [9, -3]) in R²
9. sp([2, 1], [-6, -3], [1, 4]) in R²
10. sp([-2, 3, 1], [3, -1, 2], [1, 2, 3], [-1, 5, 4]) in R³

11. sp([1, 2, 1, 2], [2, 1, 0, -1], [-1, 4, 3, 8], [0, 3, 2, 5]) in R⁴
12. Find a basis for the column space of the matrix A given in the text.
13. Find a basis for the row space of the matrix

        A = [1 3 5 7]
            [2 0 4 2].
            [3 2 8 7]

14. Find a basis for the column space of the matrix A in Exercise 13.
15. Find a basis for the row space of the matrix A in Exercise 12.

In Exercises 16-25, use the technique illustrated in Example 3 to determine whether the given set of vectors is dependent or independent.

16. {[1, 3], [-2, -6]} in R²
17. {[1, 3], [2, -4]} in R²
18. {[-3, 1], [6, 4]} in R²
19. {[-3, 1], [9, -3]} in R²
20. {[2, 1], [-6, -3], [1, 4]} in R²
21. {[-1, 2, 1], [2, -4, 3]} in R³
22. {[1, -3, 2], [2, -5, 3], [4, 0, 1]} in R³
23. {[1, -4, 3], [3, -11, 2], [1, -3, -4]} in R³
24. {[1, 4, -1, 3], [-1, 5, 6, 2], [1, 13, 4, 7]} in R⁴
25. {[-2, 3, 1], [3, -1, 2], [1, 2, 3], [-1, 5, 4]} in R³

In Exercises 26 and 27, enlarge the given independent set to a basis for the entire space Rⁿ.

26. {[1, 2, 1]} in R³
27. {[2, 1, 1, 1], [1, 0, 1, 1]} in R⁴

28. Let S = {v₁, v₂, . . . , vₖ} be a set of vectors in Rⁿ. Mark each of the following True or False.
   a. A subset of Rⁿ containing two nonzero distinct parallel vectors is dependent.
   b. If a set of nonzero vectors in Rⁿ is dependent, then any two vectors in the set are parallel.
   c. Every subset of three vectors in R² is dependent.
   d. Every subset of two vectors in R² is independent.
   e. If a subset of two vectors in R² spans R², then the subset is independent.
   f. Every subset of Rⁿ containing the zero vector is dependent.
   g. If S is independent, then each vector in Rⁿ can be expressed uniquely as a linear combination of vectors in S.
   h. If S is independent and spans Rⁿ, then each vector in Rⁿ can be expressed uniquely as a linear combination of vectors in S.
   i. If each vector in Rⁿ can be expressed uniquely as a linear combination of vectors in S, then S is an independent set.
   j. The subset S is independent if and only if each vector in sp(v₁, v₂, . . . , vₖ) has a unique expression as a linear combination of vectors in S.
   k. The zero subspace of Rⁿ has dimension 0.
   l. Any two bases of a subspace W of Rⁿ contain the same number of vectors.
   m. Every independent subset of Rⁿ is a subset of every basis for Rⁿ.
   n. Every independent subset of Rⁿ is a subset of some basis for Rⁿ.
29. Let u and v be two different vectors in Rⁿ. Prove that {u, v} is linearly dependent if and only if one of the vectors is a multiple of the other.
30. Let v₁, v₂, v₃ be independent vectors in Rⁿ. Prove that w₁ = 3v₁, w₂ = 2v₂ - v₃, and w₃ = v₂ + v₃ are also independent.
31. Let v₁, v₂, v₃ be any vectors in Rⁿ. Prove that w₁ = 2v₁ + 3v₂, w₂ = v₂ - 2v₃, and w₃ = -v₁ - 3v₃ are dependent.
32. Find all scalars s, if any exist, such that [1, 0, 1], [2, s, 3], [2, 3, 1] are independent.
33. Find all scalars s, if any exist, such that [1, 0, 1], [2, s, 3], [1, -s, 0] are independent.

34. Let v and w be independent column vectors in R³, and let A be an invertible 3 × 3 matrix. Prove that the vectors Av and Aw are independent.
35. Give an example showing that the conclusion of the preceding exercise need not hold if A is nonzero but singular. Can you also find specific independent vectors v and w and a singular matrix A such that Av and Aw are still independent?
36. Let v and w be column vectors in Rⁿ, and let A be an n × n matrix. Prove that, if Av and Aw are independent, v and w are independent.
37. Generalizing Exercise 34, let v₁, v₂, . . . , vₖ be independent column vectors in Rⁿ, and let C be an invertible n × n matrix. Prove that the vectors Cv₁, Cv₂, . . . , Cvₖ are independent.
38. Prove that if W is a subspace of Rⁿ and dim(W) = n, then W = Rⁿ.

In Exercises 39-42, use LINTEK to find a basis for the space spanned by the given vectors in Rⁿ.

39. v₁ = [5, 4, 3], v₂ = [6, 1, 4], v₃ = [2, 1, 6], v₄ = [1, 1, 1]
40.
41.
42.

MATLAB
Access MATLAB and work the indicated exercise. If the data files for the text are available, enter fbe2s1 for the vector data. Otherwise, enter the vector data by hand.

M1. Exercise 39
M2. Exercise 40
M3. Exercise 41
M4. Exercise 42

2.2. THE RANK OF A MATRIX


In Section 1.6 we discussed three subspaces associated with an m × n matrix
A: its column space in Rᵐ, its row space in Rⁿ, and its nullspace (solution space
of Ax = 0) in Rⁿ. In this section we consider how the dimensions of these
subspaces are related.

We can find the dimension of the column space of A by row-reducing A to
row-echelon form H. This dimension is the number of columns of H having
pivots.

Turning to the row space, note that interchange of rows does not change
the row space, and neither does multiplication of a row by a nonzero scalar. If
we multiply the ith row vector vᵢ by a scalar r and add it to the kth row vector
vₖ, then the new kth row vector is rvᵢ + vₖ, which is still in the row space of A
because it is a linear combination of rows of A. But the original row vector vₖ is
also in the row space of the new matrix, because it is equal to (rvᵢ + vₖ) + (-r)vᵢ.
Thus row addition also does not change the row space of a matrix.

Suppose that the reduced row-echelon form of a matrix A is

        [1  0  2  0  5]
        [0  1 -3  0  9]
    H = [0  0  0  1 -7].
        [0  0  0  0  0]
        [0  0  0  0  0]
        [0  0  0  0  0]

The configuration of the three nonzero row vectors in their 1st, 2nd, and 4th
components is the configuration of the row vectors e₁, e₂, e₃ in R³, and ensures
that the first three row vectors of H are independent. In this way we see that the
dimension of the row space of any matrix is the number of nonzero rows in its
reduced row-echelon form, or just in its row-echelon form. But this is also the
number of pivots in the matrix. Thus the dimension of the column space of A
must be equal to the dimension of its row space. This common dimension of
the row space and the column space is the rank of A, denoted by rank(A). These
arguments generalize to prove the following theorem.

THEOREM 2.4  Row Rank Equals Column Rank

Let A be an m × n matrix. The dimension of the row space of A is
equal to the dimension of the column space of A. The common
dimension, the rank of A, is the number of pivots in a row-echelon
form of A.

We know that a basis for the column space of A consists of the columns of
A giving rise to pivots in a row-echelon form of A. We saw how to find a basis
for the nullspace of A in Section 1.6. We would like to be able to find a basis for
the row space. We could work with the transpose of A, but this would require

HISTORICAL NOTE  THE RANK OF A MATRIX was defined in 1879 by Georg Frobenius
(1849-1917) as follows: If all determinants of the (r + 1)st degree vanish, but not all of the rth
degree, then r is the rank of the matrix. Frobenius used this concept to deal with the questions of
canonical forms for certain matrices of integers and with the solutions of certain systems of linear
congruences.

The nullity was defined by James Sylvester in 1884 for square matrices as follows: The nullity
of an n × n matrix is i if every minor (determinant) of order n - i + 1 (and therefore of every
higher order) equals 0 and i is the largest such number for which this is true. Sylvester was
interested here, as in much of his mathematical career, in discovering invariants, properties of
particular mathematical objects that do not change under specified types of transformations. He
proceeded to prove what he called one of the cardinal laws in the theory of matrices, that the
nullity of the product of two matrices is not less than the nullity of any factor or greater than the
sum of the nullities of the factors.

another reduction to row-echelon form. In fact, because the elementary row
operations do not change the row space of A, we simply can take as a basis for
the row space of A the nonzero rows in a row-echelon form. We summarize in a
box.

Finding Bases for Spaces Associated with a Matrix


Let A be an m X n matrix with row-echelon form H.
1. For a basis of the row space of A, use the nonzero rows of H.
2. For a basis of the column space of A, use the columns of A
corresponding to the columns of H containing pivots.
3. For a basis of the nullspace of A, use H and back substitution to
solve Hx = 0 in the usual way (see Example 3, Section 1.6).

EXAMPLE 1  Find the rank, a basis for the row space, a basis for the column space, and a
basis for the nullspace of the matrix

    A = [1  3  0 -1  2]
        [0 -2  4 -2  0]
        [3 11 -4 -1  6].
        [2  5  3 -4  0]

SOLUTION  We reduce A all the way to reduced row-echelon form, because we also want to
find a basis for the nullspace of A. We obtain

        [1  3  0 -1  2]   [1  3  0 -1  2]   [1  0  6 -4  2]   [1  0  0  2  26]
    A = [0 -2  4 -2  0] ~ [0 -2  4 -2  0] ~ [0  1 -2  1  0] ~ [0  1  0 -1  -8] = H.
        [3 11 -4 -1  6]   [0  2 -4  2  0]   [0  0  0  0  0]   [0  0  1 -1  -4]
        [2  5  3 -4  0]   [0 -1  3 -2 -4]   [0  0  1 -1 -4]   [0  0  0  0   0]

Because the reduced form H contains three pivots, we see that rank(A) = 3.
As a basis for the row space of A, we take the nonzero row vectors of H,
obtaining

    {[1, 0, 0, 2, 26], [0, 1, 0, -1, -8], [0, 0, 1, -1, -4]}.

Notice that the next to the last matrix in the reduction shows that the first
three row vectors of A are dependent, so we must not take them as a basis for
the row space.

Now the columns of A in which pivots appear in H form a basis for the
column space, and from H we see that the solution of Ax = 0 is

        [-2r - 26s]       [-2]     [-26]
        [ r +  8s ]       [ 1]     [  8]
    x = [ r +  4s ] = r   [ 1] + s [  4].
        [    r    ]       [ 1]     [  0]
        [    s    ]       [ 0]     [  1]

Thus we have the following bases:

    Column space: {[1, 0, 3, 2], [3, -2, 11, 5], [0, 4, -4, 3]};
    Nullspace:    {[-2, 1, 1, 1, 0], [-26, 8, 4, 0, 1]}.  ▲
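The three bases of Example 1 can be cross-checked in MATLAB; this is only a sketch, and null(A,'r') returns the "rational" nullspace basis obtained by back substitution:

    % Bases for the row space, column space, and nullspace of the matrix in Example 1.
    A = [1 3 0 -1 2; 0 -2 4 -2 0; 3 11 -4 -1 6; 2 5 3 -4 0];
    [H, pivots] = rref(A);
    rowBasis  = H(any(H, 2), :);   % nonzero rows of H
    colBasis  = A(:, pivots);      % columns of A that give rise to pivots
    nullBasis = null(A, 'r')       % columns [-2 1 1 1 0]' and [-26 8 4 0 1]'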

The Rank Equation

Let A be an m × n matrix. Recall that the nullspace of A, that is, the solution
set of Ax = 0, has a basis with as many vectors as the number of free
scalar variables, like r and s, appearing in the solution vector above. Because
we have one free scalar variable for each column without a pivot in a
row-echelon form of A, we see that the dimension of the nullspace of A is the
number of columns of A that do not contain a pivot. This dimension is called
the nullity of A, and is denoted by nullity(A). Because the number of columns
that do have a pivot is the dimension, rank(A), of the column space of A, we see
that

    rank(A) + nullity(A) = n,    Rank equation

where n is the number of columns of A. This equation turns out to be very
useful. We summarize this equation, and the method for computing the
numbers it involves, in a theorem.

THEOREM 2.5  Rank Equation

Let A be an m × n matrix with row-echelon form H. Then:

1. nullity(A) = (Number of free variables in the solution space of Ax = 0) = (Number of pivot-free columns in A);
2. rank(A) = (Number of pivots in H);
3. (Rank equation) rank(A) + nullity(A) = (Number of columns of A).

Because nullity(A) is defined as the number of vectors in a basis of the
nullspace of A, the invariance of dimension shows that the number of free
variables obtained in the solution of a linear system Ax = b is independent of
the steps in the row reduction to echelon form, as we asserted in Section 1.4.

EXAMPLE 2  Illustrate the rank equation for the matrix A in Example 1.

SOLUTION  The matrix A in Example 1 has n = 5 columns, and we saw that rank(A) = 3
and nullity(A) = 2. Thus the rank equation is 3 + 2 = 5.  ▲
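A quick numerical illustration of the rank equation for the same matrix (a sketch; size(null(A), 2) counts the vectors in a nullspace basis):

    % Verify rank(A) + nullity(A) = n for the matrix of Examples 1 and 2.
    A = [1 3 0 -1 2; 0 -2 4 -2 0; 3 11 -4 -1 6; 2 5 3 -4 0];
    r = rank(A);                  % 3
    nlty = size(null(A), 2);      % 2, the number of vectors in a nullspace basis
    n = size(A, 2);               % 5 columns
    r + nlty == n                 % returns logical 1 (true)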

Our work has given us still another criterion for the invertibility of a
square matrix.

THEOREM 2.6 An Invertibility Criterion

An n × n matrix A is invertible if and only if rank(A) = n.

SUMMARY

1. Let A be an m × n matrix. The dimension of the row space of A is equal to the dimension of the column space of A, and is called the rank of A, denoted by rank(A). The rank of A is equal to the number of pivots in a row-echelon form H of A. The nullity of A, denoted by nullity(A), is the dimension of the nullspace of A, that is, of the solution set of Ax = 0.
2. Bases for the row space, the column space, and the nullspace of a matrix A can be found as described in a box in the text.
3. (Rank Equation) For an m × n matrix A, we have

       rank(A) + nullity(A) = n.

EXERCISES

For the matrices in Exercises 1-6, find (a) the rank of the matrix, (b) a basis for the row space, (c) a basis for the column space, and (d) a basis for the nullspace.

In Exercises 7-10, determine whether the given matrix is invertible, by finding its rank.

11. Mark each of the following True or False.
   a. The number of independent row vectors in a matrix is the same as the number of independent column vectors.
   b. If H is a row-echelon form of a matrix A, then the nonzero column vectors in H form a basis for the column space of A.
   c. If H is a row-echelon form of a matrix A, then the nonzero row vectors in H are a basis for the row space of A.
   d. If an n × n matrix A is invertible, then rank(A) = n.
   e. For every matrix A, we have rank(A) > 0.
   f. For all positive integers m and n, the rank of an m × n matrix might be any number from 0 to the maximum of m and n.
   g. For all positive integers m and n, the rank of an m × n matrix might be any number from 0 to the minimum of m and n.
   h. For all positive integers m and n, the nullity of an m × n matrix might be any number from 0 to n.
   i. For all positive integers m and n, the nullity of an m × n matrix might be any number from 0 to m.
   j. For all positive integers m and n, with m = n, the nullity of an m × n matrix might be any number from 0 to n.
12. Prove that, if A is a square matrix, the nullity of A is the same as the nullity of Aᵀ.
13. Let A be an m × n matrix, and let b be an m × 1 vector. Prove that the system of equations Ax = b has a solution for x if and only if rank(A) = rank(A | b), where rank(A | b) represents the rank of the associated augmented matrix [A | b] of the system.

In Exercises 14-16, let A and C be matrices such that the product AC is defined.

14. Prove that the column space of AC is contained in the column space of A.
15. Is it true that the column space of AC is contained in the column space of C? Explain.
16. State the analogue of Exercise 14 concerning the row spaces of A and C.
17. Give an example of a 3 × 3 matrix A such that rank(A) = 2 and rank(A³) = 0.

In Exercises 18-20, let A and C be matrices such that the product AC is defined.

18. Prove that rank(AC) ≤ rank(A).
19. Give an example where rank(AC) < rank(A).
20. Is it true that rank(AC) = rank(C)? Explain.

It can be shown that rank(AᵀA) = rank(A) (see Theorem 6.10). Use this result in Exercises 21-23.

21. Let A be an m × n matrix. Prove that rank(AAᵀ) = rank(A).
22. If a is an n × 1 vector and b is a 1 × m vector, prove that ab is an n × m matrix of rank at most one.
23. Let A be an m × n matrix. Prove that the column space and row space of AᵀA are the same.
24. Suppose that you are using computer software, such as LINTEK or MATLAB, that will compute and print the reduced row-echelon form of a matrix but does not indicate any row interchanges it may have made. How can you determine what rows of the original matrix form a basis for the row space?

In Exercises 25 and 26, use LINTEK or MATLAB to request a row reduction of the matrix, without seeing intermediate steps. Load data files as usual if they are available. (a) Give the rank of the matrix, and (b) use the software as suggested in Exercise 24 to find the lowest numbered rows, in consecutive order, of the given matrix that form a basis for its row space.

25. A = [2 -3  0 1  4]
        [1  4 -6 3 -2]
        [0 11 -12 5 -8]
        [4 -1  5 3  7]

26. B = [-1  1   3  -6   8  -2]
        [-3  &   3   |   4   8]
        [ 1 -3   3 -13  12 -12]
        [ 0  2  -6  19 -20  14]
        [ 5 13 -21   3  11   6]

2.3 LINEAR TRANSFORMATIONS OF EUCLIDEAN SPACES


When we introduced the notation Ax = b and indicated why it is one of the
most useful notations in mathematics, we mentioned that we would see that
we could regard A as a function and view Ax = b as meaning that the function
maps the vector x to the vector b. If A is an m × n matrix and the product Ax is
defined, then x ∈ Rⁿ can be viewed as the input variable and b ∈ Rᵐ can be
viewed as the output variable.

Functions are used throughout mathematics to study the structures of sets
and relationships between sets. You are familiar with the notation y = f(x),
where f is a function that acts on numbers, signified by the input variable x,
and produces numbers signified by the output variable y. In linear algebra, we
are interested in functions y = f(x), where f acts on vectors, signified by the
input variable x, and produces vectors signified by the output variable y.

In general, a function f: X → Y is a rule that associates with each x in the
set X an element y = f(x) in Y. We say that f maps the set X into the set Y and
maps the element x to the element y. The set X is the domain of f and the set Y is
called the codomain. To describe a function, we must give its domain and
codomain, and then we must specify the action of the function on each
element of its domain. For any subset H of X, we let f[H] = {f(h) | h ∈ H}; the
set f[H] is called the image of H under f. The image of the domain of f is the
range of f. Likewise, for a subset K of Y, the set f⁻¹[K] = {x ∈ X | f(x) ∈ K} is
the inverse image of K under f. This is illustrated in Figure 2.2. For example,
if f: R → R is defined by f(x) = x², then f[{1, 2, 3}] = {1, 4, 9} and f⁻¹[{1, 4, 9}]
= {-1, 1, -2, 2, -3, 3}. In this section, we study functions known as linear
transformations T that have as domain Rⁿ and as codomain Rᵐ, as depicted in
Figure 2.3.

The Notion of a Linear Transformation


Notice that for an m × n matrix A, the function mapping x ∈ Rⁿ into Ax in Rᵐ
satisfies the two conditions in the following definition (see Example 3).

DEFINITION 2.3  Linear Transformation

A function T: Rⁿ → Rᵐ is a linear transformation if it satisfies two
conditions:
1. T(u + v) = T(u) + T(v)    Preservation of addition
2. T(ru) = rT(u)             Preservation of scalar multiplication
for all vectors u and v in Rⁿ and for all scalars r.

From properties 1 and 2, it follows that T(ru + sv) = rT(u) + sT(v) for all
u, v ∈ Rⁿ and all scalars r and s. (See Exercise 32.) In fact, this equation can be
extended to any number of summands by induction; that is, for v₁, v₂, . . . , vₖ
in Rⁿ and scalars r₁, r₂, . . . , rₖ we have

    T(r₁v₁ + r₂v₂ + · · · + rₖvₖ) = r₁T(v₁) + r₂T(v₂) + · · · + rₖT(vₖ).    (1)

Equation (1) is often expressed verbally as follows:

    Linear transformations preserve linear combinations.

FIGURE 2.2  (a) The image of H under f; (b) the inverse image of K under f.

FIGURE 2.3  The linear transformation T(x) = Ax.

We claim that if T: Rⁿ → Rᵐ is a linear transformation, then T maps the
zero vector of Rⁿ to the zero vector of Rᵐ. Just observe that scalar multiplication
of any vector by the zero scalar gives the zero vector, and property 2 of the
definition shows that

    T(0) = T(00) = 0T(0) = 0.

EXAMPLE 1  Show from Definition 2.3 that the function T: R → R defined by T(x) = sin x is
not a linear transformation.

SOLUTION  We know that

    sin(π/4 + π/4) ≠ sin(π/4) + sin(π/4)

because sin(π/4 + π/4) = sin(π/2) = 1, but we have sin(π/4) + sin(π/4) =
1/√2 + 1/√2 = √2. Thus, sin x is not a linear transformation, because it
does not preserve addition.  ▲

EXAMPLE 2  Determine whether T: R² → R³ defined by T([x₁, x₂]) = [x₂, x₁ - x₂, 2x₁ + x₂] is
a linear transformation.

SOLUTION  To test for preservation of addition, we let u = [u₁, u₂] and v = [v₁, v₂], and
compute

    T(u + v) = T([u₁ + v₁, u₂ + v₂])
             = [u₂ + v₂, u₁ + v₁ - u₂ - v₂, 2u₁ + 2v₁ + u₂ + v₂]
             = [u₂, u₁ - u₂, 2u₁ + u₂] + [v₂, v₁ - v₂, 2v₁ + v₂]
             = T(u) + T(v),

and so vector addition is preserved. To test for preservation of scalar
multiplication, we compute

    T(ru) = T([ru₁, ru₂]) = [ru₂, ru₁ - ru₂, 2ru₁ + ru₂]
          = r[u₂, u₁ - u₂, 2u₁ + u₂]
          = rT(u).

Thus, scalar multiplication is also preserved, and so T is a linear transformation.  ▲

EXAMPLE 3  Let A be an m × n matrix, and let T_A: Rⁿ → Rᵐ be defined by T_A(x) = Ax for
each column vector x ∈ Rⁿ. Show that T_A is a linear transformation.

SOLUTION  This follows from the distributive and scalars-pull-through properties of
matrix multiplication stated in Section 1.3, namely, for any vectors u and v
and for any scalar r, we have

    T_A(u + v) = A(u + v) = Au + Av = T_A(u) + T_A(v)

and

    T_A(ru) = A(ru) = r(Au) = rT_A(u).

These are precisely the conditions for a linear transformation given in
Definition 2.3.  ▲

Looking back at Example 2 we see that, in column-vector notation, the
transformation there appears as

        [x₁]    [   x₂   ]   [0  1] [x₁]
    T ( [x₂] ) = [x₁ - x₂ ] = [1 -1] [x₂].
                [2x₁ + x₂]   [2  1]

Based on Example 3, we can conclude that T is a linear transformation,
obviating the need to check linearity directly as we did in Example 2. In a
moment, we will show that every linear transformation of Rⁿ into Rᵐ has the
form T(x) = Ax for some m × n matrix A. This is especially easy to see for
linear transformations of R into R.

EXAMPLE 4  Determine all linear transformations of R into R.

SOLUTION  Let T: R → R be a linear transformation. Each element of R can be viewed
either as a vector or as a scalar. Let a = T(1). Applying Property 2 of
Definition 2.3 with u = 1 and r = x, we obtain

    T(x) = T(x(1)) = xT(1) = xa = ax.

Identifying a with the 1 × 1 matrix A having a as its sole entry, we see that we
have T(x) = Ax, and we know this transformation satisfies properties 1 and 2
in Definition 2.3. From a geometric viewpoint, we see that the linear
transformations of R into R can be described as precisely those functions
whose graphs are lines through the origin.  ▲

Example 4 shows that a linear transformation of R into R is completely
determined as soon as T(1) is known. More generally, a linear transformation
T: Rⁿ → Rᵐ is uniquely determined by its values on any basis for Rⁿ, as we now
show.

THEOREM 2.7  Bases and Linear Transformations

Let T: Rⁿ → Rᵐ be a linear transformation, and let B = {b₁, b₂, . . . , bₙ}
be a basis for Rⁿ. For any vector v in Rⁿ, the vector T(v) is uniquely
determined by the vectors T(b₁), T(b₂), . . . , T(bₙ).

PROOF  Let v be any vector in Rⁿ. We know that because B is a basis, there
exist unique scalars r₁, r₂, . . . , rₙ such that

    v = r₁b₁ + r₂b₂ + · · · + rₙbₙ.

Using Eq. (1), we see that

    T(v) = T(r₁b₁ + r₂b₂ + · · · + rₙbₙ) = r₁T(b₁) + r₂T(b₂) + · · · + rₙT(bₙ).

Because the coefficients rᵢ are uniquely determined by v, it follows that T(v) is
completely determined by the vectors T(bᵢ) for i = 1, 2, . . . , n.  ▲

Theorem 2.7 shows that if two linear transformations have the same value
at each basis vector bᵢ, then the two transformations have the same value at
each vector in Rⁿ, and thus they are the same transformation.

COROLLARY  Standard Matrix Representation of a Linear Transformation

Let T: Rⁿ → Rᵐ be a linear transformation, and let A be the m × n
matrix whose jth column vector is T(eⱼ), which we denote symbolically
as

    A = [T(e₁)  T(e₂)  · · ·  T(eₙ)].    (2)

Then T(x) = Ax for each column vector x ∈ Rⁿ.

PROOF  Recall that for any matrix A, Aeⱼ is the jth column of A. This shows at
once that if A is the matrix described in Eq. (2), then Aeⱼ = T(eⱼ), and so T and
the linear transformation T_A given by T_A(x) = Ax agree on the standard basis
{e₁, e₂, . . . , eₙ} of Rⁿ. By Theorem 2.7, and the comment following this
theorem, we know that then T(x) = T_A(x) for every x ∈ Rⁿ; that is, T(x) = Ax
for every x ∈ Rⁿ.  ▲

The matrix A in Eq. (2) is the standard matrix representation of the linear
transformation T.

EXAMPLE 5  Let T: R² → R³ be the linear transformation such that

    T(e₁) = [2, 1, 4]   and   T(e₂) = [3, 0, -2].

Find the standard matrix representation A of T and find a formula for
T([x₁, x₂]).

SOLUTION  Equation (2) for the standard matrix representation shows that

    A = [2  3]                        [2  3]        [2x₁ + 3x₂]
        [1  0]    and   T(x) = Ax  =  [1  0] [x₁] = [    x₁   ].
        [4 -2]                        [4 -2] [x₂]   [4x₁ - 2x₂]

In row-vector notation, we have the formula

    T([x₁, x₂]) = [2x₁ + 3x₂, x₁, 4x₁ - 2x₂].  ▲
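Equation (2) says the standard matrix representation is assembled column by column from T(e₁), T(e₂), and so on. A minimal MATLAB sketch for the transformation of Example 5; the function handle Tf is our own name for it:

    % Standard matrix representation built column-by-column, as in Eq. (2).
    Tf = @(x) [2*x(1) + 3*x(2); x(1); 4*x(1) - 2*x(2)];   % T([x1, x2]) from Example 5
    A  = [Tf([1; 0])  Tf([0; 1])]   % columns T(e1), T(e2); gives [2 3; 1 0; 4 -2]
    Tf([5; -1])                     % same result as A*[5; -1]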

EXAMPLE 6  Find the standard matrix representation of the linear transformation
T: R⁴ → R³ where

    T([x₁, x₂, x₃, x₄]) = [x₂ - 3x₃, 2x₁ - x₂ + 3x₄, 8x₁ - 4x₂ + 3x₃ - x₄].    (3)

SOLUTION  We compute

    T(e₁) = T([1, 0, 0, 0]) = [0, 2, 8],    T(e₂) = T([0, 1, 0, 0]) = [1, -1, -4],
    T(e₃) = T([0, 0, 1, 0]) = [-3, 0, 3],   T(e₄) = T([0, 0, 0, 1]) = [0, 3, -1].

Using Eq. (2), we find that

    A = [0  1 -3  0]
        [2 -1  0  3].
        [8 -4  3 -1]

Perhaps you noticed in Example 6 that the first row of the matrix A
consists of the coefficients of x₁, x₂, x₃, and x₄ in the first component x₂ - 3x₃ of
T([x₁, x₂, x₃, x₄]). The second and third rows can be found similarly. If Eq. (3)
is written in column-vector form, the matrix A jumps out at you immediately.
Try it! This is often a fast way to write down the standard matrix representation
when the transformation is described by a formula as in Example 6. Be
sure to remember, however, the Eq. (2) formulation for the standard matrix
representation.

We give another example indicating how a linear transformation is
determined, as in the proof of Theorem 2.7, if we know its values on a basis for
its domain. Note that the vectors u = [-1, 2] and v = [3, -5] are two
nonparallel vectors in the plane, and form a basis for R².

EXAMPLE 7  Let u = [-1, 2] and v = [3, -5] be in R², and let T: R² → R³ be a linear
transformation such that T(u) = [-2, 1, 0] and T(v) = [5, -7, 1]. Find the
standard matrix representation A of T and compute T([-4, 3]).

SOLUTION  To find the standard matrix representation of T, we need to find T(e₁) and
T(e₂) for e₁, e₂ ∈ R². Following the argument in the proof of Theorem 2.7, we
express e₁ and e₂ as linear combinations of the basis vectors u and v for R²,
where we know the action of T. To express e₁ and e₂ as linear combinations of u
and v, we solve the two linear systems Ax = e₁ and Ax = e₂, where the
coefficient matrix A has u and v as its column vectors. Because both systems
have the same coefficient matrix, we can solve them both at once as follows:

    [-1  3 | 1  0]   [1 -3 | -1  0]   [1  0 | 5  3]
    [ 2 -5 | 0  1] ~ [0  1 |  2  1] ~ [0  1 | 2  1].

We see that e₁ = 5u + 2v and e₂ = 3u + v. Using the linearity properties,

    T(e₁) = T(5u + 2v) = 5T(u) + 2T(v) = 5[-2, 1, 0] + 2[5, -7, 1]
          = [0, -9, 2]

and

    T(e₂) = T(3u + v) = 3T(u) + T(v) = 3[-2, 1, 0] + [5, -7, 1]
          = [-1, -4, 1].

The standard matrix representation A of T and T([-4, 3]) are

    A = [ 0 -1]                       [ 0 -1]          [-3]
        [-9 -4]    and   T([-4, 3]) = [-9 -4] [-4]  =  [24].  ▲
        [ 2  1]                       [ 2  1] [ 3]     [-5]
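The computation in Example 7 can be compressed into one matrix equation: since A[u v] = [T(u) T(v)], we have A = [T(u) T(v)][u v]⁻¹. A short MATLAB sketch of this restatement (our own, not the text's wording):

    % Example 7 revisited: A * [u v] = [T(u) T(v)], so A = [T(u) T(v)] / [u v].
    B = [-1  3;  2 -5];          % columns u and v
    C = [-2  5;  1 -7;  0  1];   % columns T(u) and T(v)
    A = C / B                    % gives [0 -1; -9 -4; 2 1]
    A * [-4; 3]                  % T([-4, 3]) = [-3; 24; -5]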

Some Terminology of Linear Transformations


The matrix representation A of a linear transformation T: Rⁿ → Rᵐ is a great
help in working with T. Let us use column-vector notation. Suppose, for
example, we want to find the range of T, that is, the set of all elements of Rᵐ
that are equal to T(x) for some x ∈ Rⁿ. Recall that we use the notation T[Rⁿ] to
denote the set of all these elements, so that T[Rⁿ] = {T(x) | x ∈ Rⁿ}. Because
T(x) = Ax, we have T[Rⁿ] = {Ax | x ∈ Rⁿ}. Now Ax is a linear combination of
the column vectors of A where the coefficient of the jth column of A in the
linear combination is xⱼ, the jth component of the vector x. Thus the range of T
is precisely the column space of A.

For another illustration, finding all x such that T(x) = 0 amounts to
solving the linear system Ax = 0. We know that the solution of this
homogeneous linear system is a subspace of Rⁿ, called the nullspace of the
matrix A. This nullspace is often called the kernel of T as well as the nullspace
of T, and is denoted ker(T).

Let W be a subspace of Rⁿ. Then W = sp(b₁, b₂, . . . , bₖ) where B =
{b₁, b₂, . . . , bₖ} is a basis for W. Because T preserves linear combinations, we
have

    T(r₁b₁ + r₂b₂ + · · · + rₖbₖ) = r₁T(b₁) + r₂T(b₂) + · · · + rₖT(bₖ).

This shows that T[W] = sp(T(b₁), T(b₂), . . . , T(bₖ)), which we know is a
subspace of Rᵐ.

We summarize the three preceding paragraphs in a box.

Let T: Rⁿ → Rᵐ be a linear transformation with standard matrix
representation A.
1. The range T[Rⁿ] of T is the column space of A in Rᵐ.
2. The kernel of T is the nullspace of A and is denoted ker(T).
3. If W is a subspace of Rⁿ, then T[W] is a subspace of Rᵐ; that is, T preserves subspaces.

EXAMPLE 8  Find the kernel of the linear transformation T: R³ → R² where T([x₁, x₂, x₃])
= [x₁ - 2x₂, 2x₂ + 4x₃].

SOLUTION  We simply find the nullspace of the standard matrix representation A of T.
Writing down and then reducing the matrix A, we have

    A = [1 -2 0] ~ [1  0  4].
        [0  2 4]   [0  1  2]

Thus we find that ker(T) = sp([-4, -2, 1]).  ▲
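A numerical cross-check of Example 8 (a sketch; null with the 'r' option returns the rational basis obtained by back substitution):

    % Kernel of T in Example 8 = nullspace of its standard matrix representation.
    A = [1 -2 0; 0 2 4];
    ker = null(A, 'r')    % returns the single column [-4; -2; 1]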

Matrix Operations and Linear Transformations


It is very fruitful to be able to hop back and forth at will between matrices and
their associated linear transformations. Every property of matrices has an
interpretation for linear transformations, and vice versa. For example, the
rank equation, rank(A) + nullity(A) = n, for an m × n matrix A becomes

    dim(range T) + dim(ker(T)) = dim(domain T).

The dimension of range T is called the rank of T, and the dimension of ker(T)
is called the nullity of T.

Also, matrix multiplication and matrix inversion have very significant
analogues in terms of transformations. Let T: Rⁿ → Rᵐ and T′: Rᵐ → Rᵏ
be two linear transformations. We can consider the composite function
(T′ ∘ T): Rⁿ → Rᵏ where (T′ ∘ T)(x) = T′(T(x)) for x ∈ Rⁿ. Figure 2.4 gives a
graphic illustration of this composite map.

Now suppose that A is the m × n matrix associated with T and that B is the
k × m matrix associated with T′. Then we can compute T′(T(x)) as

    T′(T(x)) = T′(Ax) = B(Ax).

But

    B(Ax) = (BA)x,    Associativity of matrix multiplication

so (T′ ∘ T)(x) = (BA)x. From Example 3, we see that T′ ∘ T is again a linear
transformation, and that the matrix associated with it is the product of the
matrix associated with T′ and the matrix associated with T, in that order.
Notice how easily this follows from the associativity of matrix multiplication.

FIGURE 2.4  The composite map T′ ∘ T.

It really makes us appreciate the power of associativity! We can also show
directly from Definition 2.3 that the composite of two linear transformations
is again a linear transformation. (See Exercise 31.)

Matrix Multiplication and Composite Transformations

A composition of two linear transformations T and T′ yields a
linear transformation T′ ∘ T having as its associated matrix the
product of the matrices associated with T′ and T, in that order.

This result has some surprising uses.

ILLUSTRATION 1  (The Double Angle Formulas) It is shown in the next section that rotation of the
plane R² counterclockwise about the origin through an angle θ is a linear
transformation T: R² → R² with standard matrix representation

    [cos θ  -sin θ]
    [sin θ   cos θ].    Counterclockwise rotation through θ    (3)

Thus, T applied twice, that is, T ∘ T, rotates the plane through 2θ. Replacing
θ by 2θ in matrix (3), we find that the standard matrix representation for T ∘ T
must be

    [cos 2θ  -sin 2θ]
    [sin 2θ   cos 2θ].    (4)

On the other hand, we know that the standard matrix representation for the
composition T ∘ T must be the square of the standard matrix representation
for T, and so matrix (4) must be equal to

    [cos θ  -sin θ] [cos θ  -sin θ]   [cos²θ - sin²θ    -2 sin θ cos θ ]
    [sin θ   cos θ] [sin θ   cos θ] = [2 sin θ cos θ   -sin²θ + cos²θ ].    (5)

Comparing the entries in matrix (4) with the final result in Eq. (5), we obtain
the double angle trigonometric identities

    sin 2θ = 2 sin θ cos θ    and    cos 2θ = cos²θ - sin²θ.  ▲
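A quick numerical sanity check of this identity (a sketch; the particular angle is arbitrary):

    % Squaring the rotation matrix for theta gives the rotation matrix for 2*theta.
    theta = 0.7;
    R  = [cos(theta) -sin(theta); sin(theta) cos(theta)];
    R2 = [cos(2*theta) -sin(2*theta); sin(2*theta) cos(2*theta)];
    norm(R*R - R2)     % essentially 0, up to rounding error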

Let us see how matrix invertibility reflects a corresponding property of the
associated linear transformation. Suppose that A is an invertible n × n matrix,
and let T: Rⁿ → Rⁿ be the associated linear transformation, so that y = T(x)
= Ax. There exists a linear transformation of Rⁿ into Rⁿ associated with A⁻¹; we
denote this by T⁻¹, so that T⁻¹(y) = A⁻¹y. The matrix of the composite
transformation T⁻¹ ∘ T is the product A⁻¹A, as indicated in the preceding box.
Because A⁻¹A = I and Ix = x, we see that (T⁻¹ ∘ T)(x) = x. That is, T⁻¹ ∘ T is the
identity transformation, leaving all vectors fixed. (See Fig. 2.5.) Because AA⁻¹
= I too, we see that T ∘ T⁻¹ is also the identity transformation on Rⁿ. If
y = T(x), then x = T⁻¹(y). This transformation T⁻¹ is the inverse transformation
of T, and T is an invertible linear transformation.

FIGURE 2.5  T⁻¹ ∘ T is the identity transformation.

Invertible Matrices and Inverse Transformations


Let A be an invertible n × n matrix with associated linear
transformation T. The transformation T⁻¹ associated with A⁻¹ is the
inverse transformation of T, and T ∘ T⁻¹ and T⁻¹ ∘ T are both the
identity transformation on Rⁿ. A linear transformation T: Rⁿ → Rⁿ
is invertible if and only if its associated matrix is invertible.

EXAMPLE 9  Show that the linear transformation T: R³ → R³ defined by T([x₁, x₂, x₃]) =
[x₁ - 2x₂ + x₃, x₂ - x₃, 2x₂ - 3x₃] is invertible, and find a formula for its
inverse.

SOLUTION  Using column-vector notation, we see that T(x) = Ax, where

    A = [1 -2  1]
        [0  1 -1].
        [0  2 -3]

Next, we find the inverse of A:

    [1 -2  1 | 1 0 0]   [1  0 -1 | 1  2 0]   [1 0 0 | 1 4 -1]
    [0  1 -1 | 0 1 0] ~ [0  1 -1 | 0  1 0] ~ [0 1 0 | 0 3 -1].
    [0  2 -3 | 0 0 1]   [0  0 -1 | 0 -2 1]   [0 0 1 | 0 2 -1]

Therefore,

    T⁻¹(x) = A⁻¹x = [1 4 -1] [x₁]   [x₁ + 4x₂ - x₃]
                    [0 3 -1] [x₂] = [   3x₂ - x₃  ],
                    [0 2 -1] [x₃]   [   2x₂ - x₃  ]

which we express in row notation as

    T⁻¹([x₁, x₂, x₃]) = [x₁ + 4x₂ - x₃, 3x₂ - x₃, 2x₂ - x₃].

In Exercise 30, we ask you to verify that T⁻¹(T(x)) = x, as in Figure 2.5.  ▲
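Numerically, the inverse transformation comes from inverting the standard matrix representation. A MATLAB sketch that also spot-checks T⁻¹(T(x)) = x, as Exercise 30 asks:

    % Inverse of the transformation in Example 9, with a spot check.
    A = [1 -2 1; 0 1 -1; 0 2 -3];
    Ainv = inv(A)            % gives [1 4 -1; 0 3 -1; 0 2 -1]
    x = [2; -1; 3];          % an arbitrary test vector
    Ainv * (A * x)           % returns x again, since T^(-1)(T(x)) = x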
SUMMARY

1. A function T: Rⁿ → Rᵐ is a linear transformation if T(u + v) = T(u) + T(v) and T(ru) = rT(u) for all vectors u, v ∈ Rⁿ and all scalars r.
2. If A is an m × n matrix, then the function T_A: Rⁿ → Rᵐ given by T_A(x) = Ax for all x ∈ Rⁿ is a linear transformation.
3. A linear transformation T: Rⁿ → Rᵐ is uniquely determined by T(b₁), T(b₂), . . . , T(bₙ) for any basis {b₁, b₂, . . . , bₙ} of Rⁿ.
4. Let T: Rⁿ → Rᵐ be a linear transformation and let A be the m × n matrix whose jth column vector is T(eⱼ). Then T(x) = Ax for all x ∈ Rⁿ; the matrix A is the standard matrix representation of T. The kernel of T is the nullspace of A, and the range of T is the column space of A.
5. Let T: Rⁿ → Rᵐ and T′: Rᵐ → Rᵏ be linear transformations with standard matrix representations A and B, respectively. The composition T′ ∘ T of the two transformations is a linear transformation, and its standard matrix representation is BA.
6. If y = T(x) = Ax where A is an invertible n × n matrix, then T is invertible and the transformation T⁻¹ defined by T⁻¹(y) = A⁻¹y is the inverse of T. Both T⁻¹ ∘ T and T ∘ T⁻¹ are the identity transformation of Rⁿ.

EXERCISES

1. Is T([x₁, x₂, x₃]) = [x₁ + 2x₂, x₁ - 3x₃] a linear transformation of R³ into R²? Why or why not?
2. Is T([x₁, x₂, x₃]) = [0, 0, 0, 0] a linear transformation of R³ into R⁴? Why or why not?
3. Is T([x₁, x₂, x₃]) = [1, 1, 1, 1] a linear transformation of R³ into R⁴? Why or why not?
4. Is T([x₁, x₂]) = [x₁ - 2x₂ + 1, 3x₁ - x₂] a linear transformation of R² into R²? Why or why not?

In Exercises 5-12, assume that T is a linear transformation. Refer to Example 7 for Exercises 9-12, if necessary.

5. If T([1, 0]) = [3, -1] and T([0, 1]) = [-2, 5], find T([4, -6]).
6. If T([-1, 0]) = [2, 3] and T([0, 1]) = [5, 4], find T([-3, -5]).
7. If T([1, 0, 0]) = [3, 1, 2], T([0, 1, 0]) = [2, -1, 4], and T([0, 0, 1]) = [6, 0, 1], find T([2, -5, 1]).
8. If T([1, 0, 0]) = [-3, 1], T([0, 1, 0]) = [4, -1], and T([0, -1, 1]) = [3, -5], find T([-1, 4, 2]).
9. If T([-1, 2]) = [1, 0, 0] and T([2, 1]) = [0, 1, 2], find T([0, 10]).
10. If T([-1, 1]) = [2, 1, 4] and T([1, 1]) = [-6, 3, 2], find T([x, y]).
11. If T([1, 2, -3]) = [1, 0, 4, 2], T([3, 5, 2]) = [-8, 3, 0, 1], and T([-2, -3, -4]) = [0, 2, -1, 0], find T([5, -1, 4]). [Computational aid: See Example 4 in Section 1.5.]
12. If T([2, 3, 0]) = 8, T([1, 2, -1]) = -5, and T([4, 5, 1]) = 17, find T([-3, 11, -4]). [Computational aid: See the answer to Exercise 7 in Section 1.5.]

In Exercises 13-18, the given formula defines a linear transformation. Give its standard matrix representation.

13. T([x₁, x₂]) = [x₁ + x₂, x₁ - 3x₂]
14. T([x₁, x₂]) = [2x₁ - x₂, x₁ + x₂, x₁ + 3x₂]
15. T([x₁, x₂, x₃]) = [2x₁ + x₂ - x₃, x₁ + x₂, x₁]
16. T([x₁, x₂, x₃]) = [2x₁ + x₂ + x₃, x₁ + x₂ + x₃]
17. T([x₁, x₂, x₃]) = [x₁ - x₂ + 3x₃, x₁ + x₂ + x₃, x₁]
18. T([x₁, x₂, x₃]) = x₁ + x₂ + x₃
19. If T: R² → R³ is defined by T([x₁, x₂]) = [2x₁ + x₂, x₁, x₁ - x₂] and T′: R³ → R² is defined by T′([x₁, x₂, x₃]) = [x₁ - x₂ + x₃, x₁ + x₂], find the standard matrix representation for the linear transformation T′ ∘ T that carries R² into R². Find a formula for (T′ ∘ T)([x₁, x₂]).
20. Referring to Exercise 19, find the standard matrix representation for the linear transformation T ∘ T′ that carries R³ into R³. Find a formula for (T ∘ T′)([x₁, x₂, x₃]).

In Exercises 21-28, determine whether the indicated linear transformation T is invertible. If it is, find a formula for T⁻¹(x) in row notation. If it is not, explain why it is not.

21. The transformation in Exercise 13.
22. The transformation in Exercise 14.
23. The transformation in Exercise 15.
24. The transformation in Exercise 16.
25. The transformation in Exercise 17.
26. The transformation in Exercise 18.
27. The transformation in Exercise 19.
28. The transformation in Exercise 20.
29. Mark each of the following True or False.
   a. Every linear transformation is a function.
   b. Every function mapping Rⁿ into Rᵐ is a linear transformation.
   c. Composition of linear transformations corresponds to multiplication of their standard matrix representations.
   d. Function composition is associative.
   e. An invertible linear transformation mapping Rⁿ into itself has a unique inverse.
   f. The same matrix may be the standard matrix representation for several different linear transformations.
   g. A linear transformation having an m × n matrix as standard matrix representation maps Rⁿ into Rᵐ.
   h. If T and T′ are different linear transformations mapping Rⁿ into Rᵐ, then we may have T(eᵢ) = T′(eᵢ) for some standard basis vector eᵢ of Rⁿ.
   i. If T and T′ are different linear transformations mapping Rⁿ into Rᵐ, then we may have T(eᵢ) = T′(eᵢ) for all standard basis vectors eᵢ of Rⁿ.
   j. If B = {b₁, b₂, . . . , bₙ} is a basis for Rⁿ and T and T′ are linear transformations mapping Rⁿ into Rᵐ, then T(x) = T′(x) for all x ∈ Rⁿ if and only if T(bᵢ) = T′(bᵢ) for i = 1, 2, . . . , n.
30. Verify that T⁻¹(T(x)) = x for the linear transformation T in Example 9 of the text.
31. Let T: Rⁿ → Rᵐ and T′: Rᵐ → Rᵏ be linear transformations. Prove directly from Definition 2.3 that (T′ ∘ T): Rⁿ → Rᵏ is also a linear transformation.
32. Let T: Rⁿ → Rᵐ be a linear transformation. Prove from Definition 2.3 that T(ru + sv) = rT(u) + sT(v) for all u, v ∈ Rⁿ and all scalars r and s.

Exercise 33 shows that the reduced row-echelon form of a matrix is unique.

33. Let A be an m × n matrix with row-echelon form H, and let V be the row space of A (and thus of H). Let Wₖ = sp(e₁, e₂, . . . , eₖ) be the subspace of Rⁿ generated by the first k rows of the n × n identity matrix. Consider Tₖ: V → Wₖ defined by

       Tₖ([x₁, x₂, . . . , xₙ]) = [x₁, x₂, . . . , xₖ, 0, . . . , 0].

   a. Show that Tₖ is a linear transformation of V into Wₖ and that Tₖ[V] = {Tₖ(v) | v in V} is a subspace of Wₖ.

   b. If Tⱼ[V] has dimension dⱼ, show that, for each j < n, we have either dⱼ₊₁ = dⱼ or dⱼ₊₁ = dⱼ + 1.
   c. Assume that A has four columns. Referring to part (b), suppose that d₁ = d₂ = 1 and d₃ = d₄ = 2. Find the number of pivots in H, and give the location of each.
   d. Repeat part (c) for the case where A has six columns and d₁ = 1, d₂ = d₃ = d₄ = 2, and d₅ = d₆ = 3.
   e. Argue that, for any matrix A, the number of pivots and the location of each pivot in any row-echelon form of A is always the same.
   f. Show that the reduced row-echelon form of a matrix A is unique. [Hint: Consider the nature of the basis for the row space of A given by the nonzero rows of H.]
34. Let T: Rⁿ → Rᵐ be a linear transformation and let U be a subspace of Rᵐ. Prove that the inverse image T⁻¹[U] is a subspace of Rⁿ.

In Exercises 35-38, let T₁, T₂, T₃, and T₄ be linear transformations whose standard matrix representations are A, B, C, and D, respectively. Use LINTEK or MATLAB to compute the indicated quantity, if it is defined. Load data files for the matrices if the data files are available.

35. (T₃ ∘ T₂ ∘ T₁)([1, 2, 1])
36. (T₄ ∘ T₂ ∘ T₁)([0, -1])
37. (T₃ ∘ (T₂ ∘ T₁)⁻¹ ∘ T₄)([-1, 0])
38. (T₁ ∘ T₃⁻¹ ∘ T₄ ∘ T₂)([1, 0])
39. Work with Topic 4 of the LINTEK routine VECTGRPH until you can consistently achieve a score of at least 80%.
40. Work with Topic 5 of the LINTEK routine VECTGRPH until you can regularly attain a score of at least 82%.

2.4 LINEAR TRANSFORMATIONS OF THE PLANE (OPTIONAL)

From the preceding section, we know that every linear transformation
T: R² → R² is given by T(x) = T_A(x) = Ax, where A is some 2 × 2 matrix.
Different 2 × 2 matrices A give different transformations because T(e₁) = Ae₁
is the first column vector of A and T(e₂) = Ae₂ is the second column vector. The
entire plane is mapped onto the column space of the matrix A. In this section
we discuss these linear transformations of the plane R² into itself, where we can
draw reasonable pictures. We will use the familiar x, y-notation for coordinates
in the plane.

The Collapsing (Noninvertible) Transformations

For a 2 × 2 matrix A to be noninvertible, it must have rank 0 or 1. If rank(A) =
0, then A is the zero matrix, and we have Av = 0 for all v ∈ R². Geometrically,
the entire plane is collapsed to a single point, the origin.

If rank(A) = 1, then the column space of A, which is the range of T_A, is a
one-dimensional subspace of R², which is a line through the origin. The matrix
A contains at least one nonzero column vector; if both column vectors are
nonzero, then the second one is a scalar multiple of the first one. Examples of
such matrices are

    [1  0]    [0  0]    [-1  0]    [1 -3]    (1)
    [0  0]    [0  1]    [ 1  0]    [2 -6].
    Projection  Projection  Collapse     Collapse
    on x-axis   on y-axis   onto y = -x  onto y = 2x

The first two of these matrices produce projections on the coordinate axes, as
labeled. Projection of the plane on a line L through the origin maps each vector
v onto a vector p represented geometrically by the arrow starting at the origin
and having its tip at the point on L that is closest to the tip of v. The line
through the tips of v and p must be perpendicular to the line L; phrased
entirely in terms of vectors, the vector v - p must be orthogonal to p. This is
illustrated in Figure 2.6. Projection on the x-axis is illustrated in Figure 2.7; we
see that we have

    T ( [x] ) = [x] = [1  0] [x]
        [y]     [0]   [0  0] [y]

in accord with our labeling of the first matrix in (1). Similarly, the second
matrix in (1) gives projection on the y-axis. We refer to such matrices as
projection matrices. The third and fourth matrices map the plane onto the
indicated lines, as we readily see by examination of their column vectors. The
transformations represented by these matrices are not projections onto those
lines, however. Note that when projecting onto a line, every vector along the
line is left fixed, that is, it is carried into itself. Now [3, -3] is a vector along
the line y = -x, but

    [-1  0] [ 3]   [-3]
    [ 1  0] [-3] = [ 3],

which shows that the third matrix in (1) is not a projection matrix. A similar
computation shows that the final matrix in (1) is not a projection matrix. (See
Exercise 1.) Chapter 6 discusses larger projection matrices.

FIGURE 2.6  Projection onto the line L.    FIGURE 2.7  Projection onto the x-axis.

Invertible Linear Transformations of the Plane

We know that if T_A: R² → R² given by T_A(x) = Ax is an invertible linear
transformation of the plane into itself, then A is an invertible 2 × 2 matrix so
that Ax = b has a solution for every b ∈ R². Thus the range of T_A is all of R².

Among the invertible linear transformations of the plane are the rigid
motions of the plane that carry the origin into itself. Rotation of the plane
about the origin counterclockwise through an angle θ, as illustrated in Figure
2.8, is an example of such a rigid motion.

EXAMPLE 1  Explain geometrically why T: R² → R², which rotates the plane counterclockwise
through an angle θ, is a linear transformation, and find its standard
matrix representation. An algebraic proof is outlined in Exercise 23.

SOLUTION  We must show that for all u, v, w ∈ R² and all scalars r, we have T(u + v) =
T(u) + T(v) and T(rw) = rT(w). Figure 2.8(a) indicates that the parallelogram
that defines u + v is carried into the parallelogram defining T(u) + T(v) by T,
and Figure 2.8(b) similarly shows the lines illustrating rw and T(rw). Thus T
preserves addition and scalar multiplication. Figure 2.9 indicates that

    T(e₁) = [cos θ, sin θ]   and   T(e₂) = [-sin θ, cos θ].

FIGURE 2.8  (a) T(u + v) = T(u) + T(v); (b) T(rw) = rT(w).

FIGURE 2.9  Counterclockwise rotation of e₁ and e₂ through the angle θ.

Thus

    [cos θ  -sin θ]
    [sin θ   cos θ]    Counterclockwise rotation through θ    (2)

is the standard matrix representation of this transformation.  ▲

Another type of rigid motion T of the plane consists of "turning the plane
over" around a line L through the origin. Turn the plane by holding the ends of
the line L and rotating 180°, as you might hold a pencil by the ends with the
"No. 2" designation on top and rotate it 180° so that the "No. 2" is on the
underside. In analogy with the rotation in Figure 2.8, the parallelogram
defining u + v is carried into one defining T(u) + T(v), and similarly for the
arrows defining the scalar product rw. This type of rigid motion of the plane is
called a reflection in the line L, because if we think of holding a mirror
perpendicular to the plane with its bottom edge falling on L, then T(v) is the
reflection of v in this mirror, as indicated in Figure 2.10. Every vector w along
L is carried into itself. As indicated in Figure 2.11, the reflection T of the plane
in the x-axis is defined by T([x, y]) = [x, -y]. Because e₁ is left fixed and e₂ is
carried into -e₂, we see that the standard matrix representation of this
reflection in the x-axis is

    [1  0]
    [0 -1].    Reflection in the x-axis    (3)

FIGURE 2.10  Reflection in the line L.    FIGURE 2.11  Reflection in the x-axis.

It can be shown that the rigid motions of the plane carrying the origin into
itself are precisely the linear transformations T: R² → R² that preserve lengths
of all vectors in R², that is, such that ||T(x)|| = ||x|| for all x ∈ R². We will
discuss such ideas further in Exercises 17-22.
Thinking for a moment, we can see that every rigid motion of the plane
leaving the origin fixed is either a rotation or a reflection followed by a
rotation. Namely, if the plane is not turned over, all we can do is rotate it about
the origin. If the plane has been turned over, we can achieve its final position
by reflection in the x-axis, turning it over horizontally, followed by a rotation
about the origin to obtain the desired position. We will use this last fact in the
second solution of the next example. (Actually, every rigid motion leaving the
origin fixed and turning the plane over is a reflection in some line through the
origin, although this is not quite as easy to see.) The first solution of the next
example illustrates that bases for R² other than the standard basis can be
useful.

EXAMPLE 2  Find the standard matrix representation A for the reflection of the plane in the
line y = 2x.

SOLUTION 1  Let b₁ = [1, 2], which lies along the line y = 2x, and let b₂ = [-2, 1], which is
orthogonal to b₁ because b₁ · b₂ = 0. These vectors are shown in Figure 2.12. If
T: R² → R² is reflection in the line y = 2x, then we have

    T(b₁) = b₁   and   T(b₂) = -b₂.

FIGURE 2.12  Reflection in the line y = 2x.

Now {b₁, b₂} is a basis for R², and Theorem 2.7 tells us that T is completely
determined by its action on this basis. To find T(e₁) and T(e₂) for the column
vectors in the standard matrix representation A of T, we first express e₁ and e₂
as linear combinations of b₁ and b₂. To do this, we solve the two linear systems
with e₁ and e₂ as column vectors of constants and b₁ and b₂ as columns of the
coefficient matrix, as follows:

    [1 -2 | 1  0]   [1 -2 |  1    0 ]   [1  0 |  1/5  2/5]
    [2  1 | 0  1] ~ [0  5 | -2    1 ] ~ [0  1 | -2/5  1/5].

Thus we have

    e₁ = (1/5)b₁ - (2/5)b₂   and   e₂ = (2/5)b₁ + (1/5)b₂.

Applying the transformation T to both sides of these equations, we obtain

    T(e₁) = (1/5)T(b₁) - (2/5)T(b₂) = (1/5)b₁ + (2/5)b₂ = (1/5)[1, 2] + (2/5)[-2, 1] = [-3/5, 4/5]

and

    T(e₂) = (2/5)T(b₁) + (1/5)T(b₂) = (2/5)b₁ - (1/5)b₂ = (2/5)[1, 2] - (1/5)[-2, 1] = [4/5, 3/5].

Thus the standard matrix representation is

    A = [-3/5  4/5]
        [ 4/5  3/5].

SOLUTION 2  The three parts of Figure 2.13 show that we can attain the reflection of the
plane in the line y = 2x as follows: First reflect in the x-axis, taking us from
part (a) to part (b) of the figure, and then rotate counterclockwise through the
angle 2θ, where θ is the angle from the x-axis to the line y = 2x, measured
counterclockwise. Using the double angle formulas derived in Section 2.3, we
see from the right triangle in Figure 2.13(a) that

    sin 2θ = 2 sin θ cos θ = 2 · (2/√5) · (1/√5) = 4/5

and

    cos 2θ = cos²θ - sin²θ = 1/5 - 4/5 = -3/5.

Replacing θ by 2θ in the matrix in Example 1, we see that the standard matrix
representation for rotation through the angle 2θ is

    [-3/5  -4/5]
    [ 4/5  -3/5].    (4)

FIGURE 2.13  (a) The vector v; (b) reflected; (c) rotated.

Multiplying matrices (4) and (3), we obtain

    A = [-3/5  -4/5] [1   0]   [-3/5  4/5]
        [ 4/5  -3/5] [0  -1] = [ 4/5  3/5].
           Rotate     Reflect

In Example 2, note that because we first reflect in the x-axis and then rotate
through 2θ, the matrix for the reflection is the one on the right, which acts on a
vector v ∈ R² first when computing Av.*
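Both solutions can be cross-checked numerically: a reflection matrix must fix vectors on the line, reverse vectors perpendicular to it, and square to the identity. A short MATLAB sketch:

    % Reflection in the line y = 2x: check the matrix found in Example 2.
    A  = [-3/5 4/5; 4/5 3/5];
    b1 = [1; 2];      % lies along y = 2x
    b2 = [-2; 1];     % orthogonal to the line
    A*b1              % returns b1
    A*b2              % returns -b2
    A*A               % the identity: reflecting twice restores the plane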

A Geometric Description of All Invertible Transformations


of the Plane
We exploit matrix techniques to describe geometrically all invertible linear
transformations of the plane into itself. Recall that every invertible matrix is a
product of elementary matrices. If we can interpret geometrically the effect on
the plane of the linear transformations having elementary 2 × 2 matrices as
their standard representations, we will gain insight into all invertible linear
transformations of R² into R².

*This right-to-left order for composite transformations occurs because we write functions on the
left of the elements of the domain on which they act, writing f(x) rather than (x)f. From a
pedagogical standpoint, writing functions on the left must be regarded as a peculiarity in the
development of mathematical notations in a society where text is read from left to right. If we
wrote functions on the right side, then we would take the transpose of Ax and write

    xᵀAᵀ = xᵀ[reflection matrix]ᵀ[rotation matrix]ᵀ,

except that we would of course have developed things in terms of row vectors so that the transpose
notation would not appear as it does here.

EXAMPLE 3  Describe geometrically the effect on the plane of the linear transformation T_E
where E is an elementary matrix obtained by multiplying a row of the 2 × 2
identity matrix I by -1.

SOLUTION  The matrix obtained by multiplying the second row of I by -1 is

    E = [1  0]
        [0 -1],

which is the matrix given as matrix (3). We saw there that T_E is the reflection in
the x-axis and T_E([x, y]) = [x, -y]. Similarly, we see that the elementary
matrix obtained by multiplying the first row of I by -1 represents the
transformation that changes the sign of the first component of a vector,
carrying [x, y] into [-x, y]. This is the reflection in the y-axis.  ▲

EXAMPLE 4  Describe geometrically the effect on the plane of the linear transformation T_E
where E is an elementary matrix obtained by interchanging the rows of the
2 × 2 identity matrix I.

SOLUTION  Here we have

    E = [0  1]    and    T_E ( [x] ) = [0  1] [x] = [y].
        [1  0]              ( [y] )   [1  0] [y]   [x]

In row-vector notation, we have T_E([x, y]) = [y, x]. Figure 2.14 indicates that
this transformation, which interchanges the components of a vector in the
plane, is the reflection in the line y = x.  ▲

FIGURE 2.14  Reflection in the line y = x.

EXAMPLE 5 Describe geometrically the linear transformation T_E(x) = Ex, where E is a
2 × 2 elementary matrix corresponding to row scaling.
SOLUTION The matrix E has the form

    [ r  0 ]         [ 1  0 ]
    [ 0  1 ]   or    [ 0  r ]

FIGURE 2.14
Reflection in the line y = x.

for some nonzero scalar r. We discuss the first case and leave the second as
Exercise 8. The transformation is given by

    [ r  0 ] [ x ]   [ rx ]
    [ 0  1 ] [ y ] = [  y ],

or, in row notation, T([x, y]) = [rx, y]. The second component of [x, y] is
unchanged. However, the first component is multiplied by the scalar r,
resulting in a horizontal expansion if r > 1 or in a horizontal contraction if
0 < r < 1. In Figure 2.15, we illustrate the effect of such a horizontal expansion
or contraction on the points of the unit circle. If r < 0, we have an expansion or
contraction followed by a reflection in the y-axis. For example,

    [ -3  0 ]   [ -1  0 ] [ 3  0 ]
    [  0  1 ] = [  0  1 ] [ 0  1 ],
                Reflection  Horizontal expansion

indicating a horizontal expansion by a factor of 3, followed by a reflection in
the y-axis.

EXAMPLE 6 Describe geometrically the linear transformation T_E(x) = Ex, where E is a
2 × 2 elementary matrix corresponding to row addition.
SOLUTION The matrix E has the form

    [ 1  0 ]         [ 1  r ]
    [ r  1 ]   or    [ 0  1 ]

for some nonzero scalar r. We discuss the first case, and leave the second as
Exercise 10. The transformation is given by

    [ 1  0 ] [ x ]   [   x    ]
    [ r  1 ] [ y ] = [ rx + y ],

or, in row-vector notation, T([x, y]) = [x, rx + y]. The first component of the
vector [x, y] is unchanged. However, the second component is changed by the


FIGURE 2.15
(a) T([x, y]) = [½x, y] contracts horizontally; (b) T([x, y]) = [3x, y] expands horizontally.

addition of rx. For example, [1, 0] is carried onto [1, r], and [1, 1] is carried
onto [1, 1 + r], while [0, 0] and [0, 1] are carried onto themselves. Notice that
every vector along the y-axis remains fixed. Figure 2.16 illustrates the effect of
this transformation. The squares shaded in black are carried onto the
parallelograms shaded in color. This transformation is called a vertical shear.
Exercise 10 deals with the case of a horizontal shear.
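A quick numerical illustration of the vertical shear, as a minimal Python/NumPy sketch of our own (here with r = 1): the corners of the unit square on the y-axis stay fixed, while the other corners are pushed upward.

    import numpy as np

    r = 1.0
    E = np.array([[1.0, 0.0],
                  [r,   1.0]])   # vertical shear: [x, y] -> [x, rx + y]

    # Corners of the unit square, written as the columns of a matrix.
    corners = np.array([[0.0, 1.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0, 1.0]])
    print(E @ corners)
    # [[0. 1. 1. 0.]
    #  [0. 1. 2. 1.]]   -- [0,0] and [0,1] are fixed; [1,0] -> [1,1], [1,1] -> [1,2]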

We have noted that a square matrix A is invertible if and only if it is a
product of elementary matrices. We also know that a product of matrices
corresponds to the composition of the associated linear transformations, and
we have seen the effect of transformations associated with elementary matrices
on the plane. Putting all of these ideas together, we obtain the following.

Geometric Description of Invertible Transformations of R²

A linear transformation T of the plane R² into itself is invertible if
and only if T consists of a finite sequence of:
    Reflections in the x-axis, the y-axis, or the line y = x;
    Vertical or horizontal expansions or contractions; and
    Vertical or horizontal shears.

FIGURE 2.16
(a) The vertical shear T([x, y]) = [x, rx + y], r > 0;
(b) the vertical shear T([x, y]) = [x, rx + y], r < 0.

EXAMPLE 7 Illustrate the result stated in the box above for the invertible
linear transformation T([x, y]) = [x + 2y, 3x + 4y].
SOLUTION We reduce the standard matrix representation A of T, obtaining

    A = [ 1  2 ]   ~   [ 1   2 ]   ~   [ 1  2 ]   ~   [ 1  0 ]
        [ 3  4 ]       [ 0  -2 ]       [ 0  1 ]       [ 0  1 ].
          E₁: R₂ → R₂ - 3R₁    E₂: R₂ → -(1/2)R₂    E₃: R₁ → R₁ - 2R₂

In terms of elementary matrices, this reduction becomes

    E₃ E₂ E₁ A = [ 1  -2 ] [ 1    0   ] [  1  0 ] [ 1  2 ]   [ 1  0 ]
                 [ 0   1 ] [ 0  -1/2  ] [ -3  1 ] [ 3  4 ] = [ 0  1 ],

and so

    A = E₁⁻¹ E₂⁻¹ E₃⁻¹ = [ 1  0 ] [ 1   0 ] [ 1  2 ]
                         [ 3  1 ] [ 0  -2 ] [ 0  1 ].

This shows that T consists of a horizontal shear (matrix E₃⁻¹) followed by an
expansion and reflection (matrix E₂⁻¹), followed in turn by a vertical shear
(matrix E₁⁻¹).
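The factorization can be verified by multiplying the three inverse elementary matrices back together. A short NumPy check of our own (not part of the text):

    import numpy as np

    E1_inv = np.array([[1.0, 0.0],
                       [3.0, 1.0]])    # vertical shear
    E2_inv = np.array([[1.0,  0.0],
                       [0.0, -2.0]])   # vertical expansion by 2 and reflection in the x-axis
    E3_inv = np.array([[1.0, 2.0],
                       [0.0, 1.0]])    # horizontal shear

    A = E1_inv @ E2_inv @ E3_inv
    print(A)   # [[1. 2.]
               #  [3. 4.]]  -- the standard matrix of T([x, y]) = [x + 2y, 3x + 4y]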

SUMMARY

1. Linear transformations of R² into R² whose standard matrix representations
   have rank less than 2 either collapse the entire plane to the origin (the
   rank 0 case) or collapse the plane to a line (the rank 1 case).
2. A rigid motion of the plane into itself that leaves the origin fixed gives a
   linear transformation T: R² → R². Every such rigid motion is either a
   rotation of the plane about the origin, or a reflection in the x-axis (to turn
   the plane over) followed by such a rotation.
3. The standard matrix representation for the rotation of the plane counter-
   clockwise about the origin through an angle θ is

       [ cos θ   -sin θ ]
       [ sin θ    cos θ ].

4. An invertible linear transformation of R² into itself can be described
   geometrically, using elementary matrices, as indicated in the box preceding
   Example 7.

EXERCISES

1. Explain why the linear transformation T_A: R² → R², where A = [2 × 2 matrix illegible in this copy], has the line y = 2x as range, but is not the projection of R² onto that line.
2. Give the standard matrix representation of the rotation of the plane counterclockwise about the origin through an angle of
   a. 30°,
   b. 90°,
   c. 135°.
3. Give the standard matrix representation of the rotation of the plane clockwise about the origin through an angle of
   a. 45°,
   b. 60°,
   c. 150°.
4. Use the rotation matrix in item 3 of the Summary to derive trigonometric identities for sin 3θ and cos 3θ in terms of sin θ and cos θ. (See Illustration 1, Section 2.3.)
5. Use the rotation matrix in item 3 of the Summary to derive trigonometric identities for sin(θ + φ) and cos(θ + φ) in terms of sin θ, sin φ, cos θ, and cos φ. (See Illustration 1, Section 2.3.)
6. Find the general matrix representation for the reflection of the plane in the line y = mx, using the method for the case m = 2 in Solution 1 of Example 2 in the text.
7. Repeat Exercise 6, but use the method for the case m = 2 in Solution 2 of Example 2 in the text.
8. Show that the linear transformation

       T( [x] ) = [ 1  0 ] [ x ]
        ( [y] )   [ 0  r ] [ y ]

   affects the plane R² as follows:
   (i) A vertical expansion, if r > 1;
   (ii) A vertical contraction, if 0 < r < 1;
   (iii) A vertical expansion followed by a reflection in the x-axis, if r < -1;
   (iv) A vertical contraction followed by a reflection in the x-axis, if -1 < r < 0.
9. Referring to Exercise 8, explain algebraically why cases (iii) and (iv) can be described by the reflection followed by the expansion or contraction, in that order.
10. Show that the linear transformation

        T( [x] ) = [ 1  r ] [ x ]
         ( [y] )   [ 0  1 ] [ y ]

    corresponds to a horizontal shear of the plane.

In Exercises 11-15, express the standard matrix representation of the given invertible transformation of R² into itself as a product of elementary matrices. Use this expression to describe the transformation as a product of one or more reflections, horizontal or vertical expansions or contractions, and shears.

11. T([x, y]) = [-y, x]  (Rotation counterclockwise through 90°)
12. T([x, y]) = [2x, 2y]  (Expansion away from the origin by a factor of 2)
13. T([x, y]) = [-x, -y]  (Rotation through 180°)
14. T([x, y]) = [x + y, 2x - y]
15. T([x, y]) = [x + y, 3x + 5y]
16. Mark each of the following True or False.
    ___ a. Every rotation of the plane is a linear transformation.
    ___ b. Every rotation of the plane about the origin is a linear transformation.
    ___ c. Every reflection of the plane in a line L is a rigid motion of the plane.
    ___ d. Every reflection of the plane in a line L is a linear transformation of the plane.
    ___ e. Every rigid motion of the plane that carries the origin into itself is a linear transformation.
    ___ f. Every invertible linear transformation of the plane is a rigid motion.
    ___ g. If a linear transformation T: R² → R² is a rigid motion of the plane, then ||T(x)|| = ||x|| for all x ∈ R².

    ___ h. The geometric effect of all invertible linear transformations of R² into itself can be described in terms of the geometric effect of the linear transformations of R² having elementary matrices as standard matrix representations.
    ___ i. Every linear transformation of the plane into itself can be achieved through a succession of reflections, expansions, contractions, and shears.
    ___ j. Every invertible linear transformation of the plane into itself can be achieved through a succession of reflections, expansions, contractions, and shears.

A linear transformation T: R² → R² preserves length if ||T(x)|| = ||x|| for all x ∈ R². It preserves angle if the angle between u and v is the same as the angle between T(u) and T(v) for all u, v ∈ R². It preserves the dot product if T(u) · T(v) = u · v for all u, v ∈ R².

We recommend that Exercises 17-22 be worked sequentially, or at least be read sequentially.

17. Use the familiar equation that describes the dot product u · v geometrically to prove that if a linear transformation T: R² → R² preserves both length and angle, then it also preserves the dot product.
18. Use algebraic properties of the dot product to compute ||u - v||² = (u - v) · (u - v), and prove from the resulting equation that a linear transformation T: R² → R² that preserves length also preserves the dot product.
19. Express both the length of a vector v ∈ R² and the angle between two nonzero vectors u, v ∈ R² in terms of the dot product only. (From this we may conclude that if a linear transformation T: R² → R² preserves the dot product, then it preserves length and angle.)
20. Suppose that T_A: R² → R² preserves both length and angle. Prove that the two column vectors of the matrix A are orthogonal unit vectors.
21. Prove that the two column vectors of a 2 × 2 matrix A are orthogonal unit vectors if and only if (Aᵀ)A = I. Demonstrate that the matrix representations for the rigid motions given in Examples 1 and 2 satisfy this condition.
22. Let A be a 2 × 2 matrix such that (Aᵀ)A = I. Prove that the linear transformation T_A preserves the dot product, and hence also preserves length and angle. [Hint: Note that the dot product of two column vectors u, v ∈ Rⁿ is the entry in the 1 × 1 matrix (uᵀ)v. Compute the dot product T_A(u) · T_A(v) by computing (Au)ᵀ(Av).]
23. This exercise outlines an algebraic proof that rotation of the plane about the origin is a linear transformation. Let T: R² → R² be the function that rotates the plane counterclockwise through an angle θ as in Example 1.
    a. Prove algebraically that each vector v ∈ R² can be written in the polar form v = r[cos α, sin α]. [Hint: Each unit vector has this form with r = 1.]
    b. For v = r[cos α, sin α], express T(v) in this polar form.
    c. Using column-vector notation and appropriate trigonometric identities, find a matrix A such that T(v) = Av. The existence of such a matrix A proves that T is a linear transformation.

2.5 LINES, PLANES, AND OTHER FLATS (Optional)

We turn to geometry in this section, and generalize the notions of a line in the
plane or in space and of a plane in space. Our work in the preceding sections
will enable us to describe geometrically the solution set of any consistent linear
system.

The Notion of a Line in Rⁿ

For a nonzero vector d in R², we visualize the one-dimensional subspace sp(d)
as a line through the origin, as shown in Figure 2.17. Similarly, Figure 2.18
indicates that for a nonzero vector d in R³, we can view sp(d) = {td | t ∈ R} as a
line through the origin. Every subspace of Rⁿ contains the origin (zero vector),
but we surely want to consider lines in Rⁿ that do not pass through the origin,
such as line L in Figure 2.19. As indicated in Figure 2.19, if a is a vector to a
point on the line L, then every point on L is at the tip of a vector x = td + a,
where t is a scalar and d is any fixed nonzero vector that we regard intuitively as
parallel to the line L. This line is thus obtained from the line sp(d) by
translation. Geometers consider a translation of a subset S of Rⁿ to be a sliding
of every point in S in the same direction and for the same distance. The
direction and distance for a translation can be specified by a vector (such as
vector a above) pointing in the direction of the translation and having
magnitude equal to the distance the points are moved. The image of S under
such a translation is a translate of S. We will give a formal definition of a
translate of a subset, present an example, and then define a line in Rⁿ.

DEFINITION 2.4 Translate of a Subset of Rⁿ

Let S be a subset of Rⁿ and let a be a vector in Rⁿ. The set {x + a | x ∈ S}
is the translate of S by a, and is denoted by S + a. The vector a is the
translation vector.

FIGURE 2.17 A line through the origin in R².
FIGURE 2.18 A line through the origin in R³.

FIGURE 2.19 A general line L in R².
FIGURE 2.20 Translate of {x ∈ R² | ||x|| ≤ 2} in R² by [3, -4].

EXAMPLE 1 Sketch the translate of the subset S = {x ∈ R² | ||x|| ≤ 2} of R² by the vector
[3, -4].

SOLUTION The subset S of R² is a disk with center at the origin and a radius of 2. As shown
in Figure 2.20, its translate by [3, -4] is the disk with center at the point
(3, -4) and a radius of 2.

DEFINITION 2.5 Line in Rⁿ

A line in Rⁿ is a translate of a one-dimensional subspace of Rⁿ.

Although our definition defines a line to be a set of vectors, it is customary
in geometry to consider a line as the set of points in Rⁿ whose coordinates
correspond to the components of the vectors.

We can specify a line L in Rⁿ by giving a point (a₁, a₂, ..., aₙ) on the line
and a vector d parallel to the line. We consider d to be a direction vector for the
line, whereas a is a translation vector. In the terminology of Definition 2.5, L is
the translate of the subspace sp(d) by the vector a = [a₁, a₂, ..., aₙ]—that is,
L = {td + a | t ∈ R}. We can describe the line by the single equation

    x = td + a        Vector equation of L

or by the equations

    x₁ = td₁ + a₁
    x₂ = td₂ + a₂
      ⋮                Component equations of L
    xₙ = tdₙ + aₙ.
In classical geometry, component equations for L are also called parametric
equations for L, and the variable t is a parameter. Of course, this same
parameter t appears in the vector equation also.

EXAMPLE 2 Find a vector equation and component equations for the line in R² through
(2, 1) having direction vector [3, 4]. Then find the point on the line having -4
as its x₁-coordinate.

SOLUTION The line can be characterized as the translate of sp([3, 4]) by the vector [2, 1],
and so a vector equation of the line is

    [x₁, x₂] = t[3, 4] + [2, 1].

The component equations are

    x₁ = 3t + 2,   x₂ = 4t + 1.

Because t runs through all real numbers, we obtain all points (x₁, x₂) on the line
from these component equations. In order to find the point on the line with -4
as x₁-coordinate, we set -4 = 3t + 2 and obtain t = -2. Thus the x₂-coordinate
is 4(-2) + 1 = -7, and so the desired point is (-4, -7).
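The computation is easy to reproduce in a few lines of Python (our own sketch, using the component equations above):

    # Parametric equations of the line through (2, 1) with direction vector [3, 4].
    def point_on_line(t):
        return (3 * t + 2, 4 * t + 1)

    print(point_on_line(0))      # (2, 1), the given point
    print(point_on_line(1))      # (5, 5)

    # Solve 3t + 2 = -4 for t to find the point with x1-coordinate -4.
    t = (-4 - 2) / 3
    print(t, point_on_line(t))   # -2.0 (-4.0, -7.0)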

EXAMPLE 3 Find parametric equations of the line in R³ that passes through the points
(2, -1, 3) and (1, 3, 5).

SOLUTION We arbitrarily choose a = [2, -1, 3] as the translation vector corresponding to
the point (2, -1, 3) on the line. A direction vector is given by

    d = [1, 3, 5] - [2, -1, 3] = [-1, 4, 2],

as indicated in Figure 2.21. We obtain

    [x₁, x₂, x₃] = t[-1, 4, 2] + [2, -1, 3]

as a vector equation for the line. The corresponding parametric (component)
equations are

    x₁ = -t + 2,   x₂ = 4t - 1,   x₃ = 2t + 3.

FIGURE 2.21
The line passing through (2, -1, 3) and (1, 3, 5).

Line Segments

Consider the line in Rⁿ that passes through the two points (a₁, a₂, ..., aₙ) and
(b₁, b₂, ..., bₙ). Letting a and b be the vectors corresponding to these points,
we see as in Example 3 that d = b - a is a direction vector for the line. The
vector equation

    x = td + a = (b - a)t + a    (1)

for the line in effect presents the line as a t-axis whose origin is at the point
(a₁, a₂, ..., aₙ) and on which a one-unit change in t corresponds to ||d|| units
distance in Rⁿ. This is illustrated in Figure 2.22.

As illustrated in Figure 2.23, each point in Rⁿ on the line segment that
joins the tip of a to the tip of b lies at the tip of a vector x obtained in Eq. (1) for
some value of t for which 0 ≤ t ≤ 1. Note that t = 0 yields the point at the tip of
a and t = 1 yields the point at the tip of b. By choosing t between 0 and 1
appropriately, we can find the coordinates of any point on this line segment. In
particular, the coordinates of the midpoint of the line segment are the
components of the vector

    a + ½(b - a) = ½(a + b).

EXAMPLE 4 Find the points that divide into five equal parts the line segment that joins
(1, 2, 1, 3) to (2, 1, 4, 2) in R⁴.

SOLUTION We obtain d = [2, 1, 4, 2] - [1, 2, 1, 3] = [1, -1, 3, -1] as a direction vector for
the line through the two given points. The corresponding vector equation of
the line is

    [x₁, x₂, x₃, x₄] = t[1, -1, 3, -1] + [1, 2, 1, 3].

By choosing t = 0, 1/5, 2/5, 3/5, 4/5, and 1, we obtain coordinates of the points that
divide the segment as required, as shown in Table 2.1.

FIGURE 2.22 Equation (1) sets up a t-axis.
FIGURE 2.23 Points on a line segment.

TABLE 2.1

      t      Equally Spaced Points
      0      (1, 2, 1, 3)
     1/5     (1.2, 1.8, 1.6, 2.8)
     2/5     (1.4, 1.6, 2.2, 2.6)
     3/5     (1.6, 1.4, 2.8, 2.4)
     4/5     (1.8, 1.2, 3.4, 2.2)
      1      (2, 1, 4, 2)
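Table 2.1 can be generated directly from the vector equation of the line. A small NumPy sketch of our own:

    import numpy as np

    a = np.array([1, 2, 1, 3], dtype=float)    # point corresponding to t = 0
    d = np.array([1, -1, 3, -1], dtype=float)  # direction vector b - a

    for k in range(6):
        t = k / 5
        print(t, t * d + a)
    # t = 0.2 gives [1.2 1.8 1.6 2.8], t = 0.4 gives [1.4 1.6 2.2 2.6], and so on.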

Flats in Rⁿ

Just as a line is a translate of a one-dimensional subspace in Rⁿ, a plane in Rⁿ is
a translate of a two-dimensional subspace sp(d₁, d₂), where d₁ and d₂ are
nonzero, nonparallel vectors in Rⁿ. A plane appears as a flat piece of Rⁿ, as
illustrated in Figure 2.24. We have no word analogous to "straight" or "flat" in
our language to denote that R³ is not "curved." We borrow the term "flat"
when generalizing to higher dimensions, and describe a translate of a
k-dimensional subspace of Rⁿ for k < n as being "flat." Let us give a formal
definition.

DEFINITION 2.6 A k-Flat in Rⁿ

A k-flat in Rⁿ is a translate of a k-dimensional subspace of Rⁿ. In
particular, a 1-flat is a line, a 2-flat is a plane, and an (n - 1)-flat is a
hyperplane. We consider each point of Rⁿ to be a zero-flat.

FIGURE 2.24
Planes or 2-flats in R³: a general 2-flat and a 2-flat through the origin.

Just as for a line, it is conventional in geometry to speak of the translate
W + a of a k-dimensional subspace W of Rⁿ as the k-flat through the point
(a₁, a₂, ..., aₙ) parallel to W. If {d₁, d₂, ..., dₖ} is a basis for W, then

    x = t₁d₁ + t₂d₂ + ⋯ + tₖdₖ + a

is the vector equation of the k-flat. (We use the letter d because W determines
the direction of the k-flat as being parallel to W.) The corresponding
component equations are again called parametric equations for the k-flat.

EXAMPLE 5 Find parametric equations of the plane in R⁴ passing through the points
(1, 1, 1, 1), (2, 1, 1, 0), and (3, 2, 1, 0).

SOLUTION We arbitrarily choose a = [1, 1, 1, 1] as the translation vector corresponding to
the point (1, 1, 1, 1) on the desired plane. Two vectors that (when translated)
start at this point and reach to the other two points are

    d₁ = [2, 1, 1, 0] - [1, 1, 1, 1] = [1, 0, 0, -1]

and

    d₂ = [3, 2, 1, 0] - [1, 1, 1, 1] = [2, 1, 0, -1].

Because these vectors are nonparallel, they form a basis for the 2-flat through
the origin and parallel to the desired plane. See Figure 2.25. The vector
equation of the plane is x = sd₁ + td₂ + a, or, written out,

    [x₁, x₂, x₃, x₄] = s[1, 0, 0, -1] + t[2, 1, 0, -1] + [1, 1, 1, 1].

HISTORICAL NOTE THE EQUATION OF A PLANE IN R³ appears as early as 1732 in a paper of Jacob
Hermann (1678-1733). He was able to determine the plane's position by using intercepts, and he
also noted that the sine of the angle between the plane and the one coordinate plane he dealt with
(what we call the x₁, x₂-plane) was

    √(d₁² + d₂²) / √(d₁² + d₂² + d₃²).

In his 1748 Introduction to Infinitesimal Analysis, Leonhard Euler (1707-1783) used, instead, the
cosine of this angle, d₃/√(d₁² + d₂² + d₃²).
At the end of the eighteenth century, Gaspard Monge (1746-1818), in his notes for a course
on solid analytic geometry at the Ecole Polytechnique, related the equation of a plane to all three
coordinate planes and gave the cosines of the angles the plane made with each of these (the
so-called direction cosines). He also presented many of the standard problems of solid analytic
geometry, examples of which appear in the exercises. For instance, he showed how to find the
plane passing through three given points, the line passing through a point perpendicular to a plane,
the distance between two parallel planes, and the angle between a line and a plane.
Known as "the greatest geometer of the eighteenth century," Monge developed new graphical
geometric techniques as a student and later as a professor at a military school. The first problem he
solved had to do with a procedure enabling soldiers to make quickly a fortification capable of
shielding a position from both the view and the firepower of the enemy. Monge served the French
revolutionary government as minister of the navy and later served Napoleon in various scientific
offices. Ultimately, he was appointed senator for life by the emperor.

FIGURE 2.25
A 2-flat in R⁴.

Parametric equations obtained by equating components are

    x₁ = s + 2t + 1
    x₂ =      t + 1
    x₃ =          1
    x₄ = -s - t + 1.
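As a quick check (our own NumPy sketch), substituting (s, t) = (0, 0), (1, 0), and (0, 1) into the vector equation recovers the three given points:

    import numpy as np

    a  = np.array([1, 1, 1, 1], dtype=float)
    d1 = np.array([1, 0, 0, -1], dtype=float)
    d2 = np.array([2, 1, 0, -1], dtype=float)

    def plane_point(s, t):
        return s * d1 + t * d2 + a

    print(plane_point(0, 0))   # [1. 1. 1. 1.]
    print(plane_point(1, 0))   # [2. 1. 1. 0.]
    print(plane_point(0, 1))   # [3. 2. 1. 0.]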

The Geometry of Linear Systems

Let Ax = b be any system of m equations in n unknowns that has at least one
solution x = p. Theorem 1.18 on p. 97 shows that the solution set of the system
consists of all vectors of the form x = p + h, where h is a solution of the
homogeneous system Ax = 0. Because the solution set of Ax = 0 is a subspace
of Rⁿ, we see that the solution set of Ax = b is the translate of this subspace by
the vector p. That is, the solution set of Ax = b is a k-flat, where k is the nullity
of A. If the system of equations has a unique solution, its solution set is a
zero-flat.

EXAMPLE 6 Show that the linear equation c₁x₁ + c₂x₂ + c₃x₃ = b, where at least one of
c₁, c₂, c₃ is nonzero, represents a plane in R³.

SOLUTION Let us assume that c₁ ≠ 0. A particular solution of the given equation is
a = [b/c₁, 0, 0]. The corresponding homogeneous equation has a solution space
generated by d₁ = [c₃, 0, -c₁] and d₂ = [c₂, -c₁, 0]. Thus the solution set of the
linear equation is a 2-flat in R³ with equation x = sd₁ + td₂ + a—that is, a
plane in R³.

Reasoning as in Example 6, we see that every linear equation

    c₁x₁ + c₂x₂ + ⋯ + cₙxₙ = b

represents a hyperplane—that is, an (n - 1)-flat in Rⁿ.


EXAMPLE 7 Solve the system of equations

     x₁ + 2x₂ - 2x₃ + x₄ + 3x₅ = 1
    2x₁ + 5x₂ - 3x₃ - x₄ + 2x₅ = 2
   -3x₁ - 8x₂ + 6x₃ - x₄ - 5x₅ = 1,

and write the solution set as a k-flat.

SOLUTION Reducing the corresponding partitioned matrix, we have

    [  1   2  -2   1   3 |  1 ]     [ 1   2  -2   1   3 | 1 ]
    [  2   5  -3  -1   2 |  2 ]  ~  [ 0   1   1  -3  -4 | 0 ]
    [ -3  -8   6  -1  -5 |  1 ]     [ 0  -2   0   2   4 | 4 ]

       [ 1   0  -4   7  11 | 1 ]     [ 1   0   0  -1   3 |  9 ]
    ~  [ 0   1   1  -3  -4 | 0 ]  ~  [ 0   1   0  -1  -2 | -2 ].
       [ 0   0   2  -4  -4 | 4 ]     [ 0   0   1  -2  -2 |  2 ]

Thus, a = [9, -2, 2, 0, 0] is a particular solution to the given system, and
d₁ = [1, 1, 2, 1, 0] and d₂ = [-3, 2, 2, 0, 1] form a basis for the solution space of
the corresponding homogeneous system. The solution set of the given system
is the 2-flat in R⁵ with vector equation x = a + t₁d₁ + t₂d₂, which can be written
in the form

    [ x₁ ]   [  9 ]      [ 1 ]      [ -3 ]
    [ x₂ ]   [ -2 ]      [ 1 ]      [  2 ]
    [ x₃ ] = [  2 ] + t₁ [ 2 ] + t₂ [  2 ].
    [ x₄ ]   [  0 ]      [ 1 ]      [  0 ]
    [ x₅ ]   [  0 ]      [ 0 ]      [  1 ]
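The particular solution and the basis vectors of the homogeneous solution space can be verified by substitution. A minimal NumPy check of our own:

    import numpy as np

    A = np.array([[ 1,  2, -2,  1,  3],
                  [ 2,  5, -3, -1,  2],
                  [-3, -8,  6, -1, -5]], dtype=float)

    a_part = np.array([9, -2, 2, 0, 0], dtype=float)
    d1 = np.array([1, 1, 2, 1, 0], dtype=float)
    d2 = np.array([-3, 2, 2, 0, 1], dtype=float)

    print(A @ a_part)       # [1. 2. 1.]  -- a particular solution of Ax = b
    print(A @ d1, A @ d2)   # [0. 0. 0.] [0. 0. 0.]  -- solutions of Ax = 0
    print(A @ (a_part + 2 * d1 - d2))   # [1. 2. 1.]  -- every point of the 2-flat works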

In the preceding example, we described the solution set of a system of
equations as a 2-flat in R⁵. Notice that the original form of the system
represents an intersection of three hyperplanes in R⁵—one for each equation.
We generalize this example to the solution set of any consistent linear system.
Consider a system of m equations in n unknowns. Let the rank of the
coefficient matrix be r so that, when the matrix is reduced to row-echelon
form, there are r nonzero rows. According to the rank equation, the number of
free variables is then n - r; the corresponding homogeneous system has as its
solution set an (n - r)-dimensional subspace—that is, an (n - r)-flat through
the origin. The solution set of the original nonhomogeneous system is a
translate of this subspace and is an (n - r)-flat in Rⁿ. In particular, a single
consistent linear equation has as its solution set an (n - 1)-flat in Rⁿ. In
general, if we adjoin an additional linear equation to a given linear system, we
expect the dimension of the solution flat to be reduced by 1. This is the case
precisely when the new system is still consistent and when the new equation is
independent of the others (in the sense that it yields a new nonzero row when
the augmented matrix is row-reduced to echelon form).

We have shown that a consistent system Ax = b of m equations in n unknowns has as
its solution set an (n - r)-flat, where r is the rank of A. Conversely, it can be

shown that a k-flat in Rⁿ is the solution set of some system of n - k linear
equations in n unknowns. That is, a k-flat in Rⁿ is the intersection of n - k
hyperplanes. Thus there are two ways to view a k-flat in Rⁿ:
1. As a translate of a k-dimensional subspace of Rⁿ, described using
   parametric equations
2. As an intersection of n - k hyperplanes, described with a system of
   linear equations.

EXAMPLE 8 Describe the line (1-flat) in R³ that passes through (2, -1, 3) and (1, 3, 5) in
terms of
(1) parametric equations, and
(2) a system of linear equations.

SOLUTION (1) In Example 3, we found the parametric equations for the line:

    x₁ = -t + 2,   x₂ = 4t - 1,   x₃ = 2t + 3.    (3)

(2) In order to describe the line with a system of linear equations, we eliminate
the parameter t from Eqs. (3):

    4x₁ + x₂      =  7     Add four times the first to the second.    (4)
         x₂ - 2x₃ = -7     Subtract twice the third from the second.

This system describes the line as an intersection of two planes. The line can be
represented as the intersection of any two distinct planes, each containing the
line. This is illustrated by the equivalent systems we have at the various stages
in the Gauss reduction of system (4) to obtain solution (3).
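As a check (our own Python sketch), every point produced by the parametric equations (3) satisfies both equations of system (4):

    def line_point(t):
        # Parametric equations (3) for the line through (2, -1, 3) and (1, 3, 5).
        return (-t + 2, 4 * t - 1, 2 * t + 3)

    for t in (-1, 0, 0.5, 2):
        x1, x2, x3 = line_point(t)
        print(4 * x1 + x2, x2 - 2 * x3)   # always 7 and -7, the two planes of system (4)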

SUMMARY
1. The translate of a subset S of Rⁿ by a vector a ∈ Rⁿ is the set of all vectors in
   Rⁿ of the form x + a for x ∈ S, and is denoted by S + a.
2. A k-flat in Rⁿ is a translate of a k-dimensional subspace and has the form
   a + sp(d₁, d₂, ..., dₖ), where a is a vector in Rⁿ and d₁, d₂, ..., dₖ are
   independent vectors in Rⁿ. The vector equation of the k-flat is x = a +
   t₁d₁ + t₂d₂ + ⋯ + tₖdₖ for scalars tᵢ in R.
3. A line in Rⁿ is a 1-flat. The line passing through the point a with parallel
   vector d is given by x = a + td, where t runs through all scalars. Parametric
   equations of the line are the component equations xᵢ = aᵢ + dᵢt for
   i = 1, 2, ..., n.
4. Let a and b be vectors in Rⁿ. Vectors to points on the line segment from the
   tip of a to the tip of b are vectors of the form x = (b - a)t + a for 0 ≤ t ≤ 1.
5. A plane in Rⁿ is a 2-flat; a hyperplane in Rⁿ is an (n - 1)-flat.

6. The solution set of a consistent linear system in n variables with coefficient
   matrix of rank r is an (n - r)-flat in Rⁿ.
7. Every k-flat in Rⁿ can be viewed both as a translate of a k-dimensional
   subspace and as the intersection of n - k hyperplanes.

EXERCISES
In, keeping with classical geometry, many of the 11. For each pair of points, find parametric
exercises that follow are phrased in terms of equations of the line containing them.
points rather than in terras of vectors. a. (-2, 4) and (3, —1) in R?
In Exercises 1-6, sketch the indicated b. (3, —1, 6) and (0, —3, —1) in R?
translate of the subset of R" in an appropriate c. (2, 0, 4) and (-—1, 5, —8) in R?
figure. . For each of the giveu pairs of lines in R’,
determine whether the lines intersect. If they
do intersect, find the point of intersection,
. The translate of the line x, = 2x, + 3 in R? and determine whether the lines are
by the vector [—3, 0] orthogonal.
. The translate of {(t, 2) | ¢ € R} in R? by the ax=H=4tt x%,=2- 31,
vector [-1, —2] x,=-3+5
. The translate of {x € R?| |x| < 1 for i= 1, 2} and
by the vector [1, 2} xX =114+3s, «x, =-9
- 4s,
. The translate of {x € R? | ||x|| = 3} by the x;= -4- 3s
vector {2, 3| bh x= 1!+34, 4 =-3-4,
. The translate of {x € R? | ||x|| <= 1} by the x,=4+ 3t
vector [2, 4, 3] and
. The translate of the plane x, + x, = 2 in R? xX=6-25, xm=-2+5,
by the vector [—1, 2, 3} x;= -15 + Ts
. Give parametric equations for the line in R? 13. Find all points in common to the lines in R?
through (3, —3) with direction vector given by x, = 5 ~ 3t,x, = -1 + tand
d = [-—8, 4]. Sketch the line in an x, = -7 + 65, x, = 3 — 2s.
appropriate figure. 14, Find parametric equations for the line in R°
. Give parametric equations for the line in R? through (—1, 2, 3) that is orthogonal to each
through (—1, 3, 0) with direction vector of the two lines having parametric equations
d = [-2, -1, 4]. Sketch the line in an xX, = -2 + 3t, x, = 4, x; = i — tand
appropriate figure. X=7-6%,=2+ 34%,=4 +1.
. Consider the line in R? that is given by the . Find the midpoint of the line segment
equation dx, + d,x, = c for numbers d,, d,, joining each pair of points.
and c in R, where d, and d, are not both a. (—2, 4) and (3, -1) in R?
zero. Find parametric equations of the line. b. (3, —1, 6) and (0, —3, —1) in R?
. Find parametric equations for the line in R? c. (0, 4, 8) and (-4, S, 9) in R?
through (5, —1) and orthogonal to the line 16. Find the point in R? on the line segment
“ith parametric equations x, = 4 — 21, joining (—t, 3) and (2, 5) tnaz is twice as
Wwe? te closc 1:0 (—!. 3) as to (2, 5).

17. Find the point in R? on the line segment 29. Find a linear system with two equations
joining (—2, 1, 3) and (0, —5, 6) that is in four variables whose solution set is the
one-fourth of the way from (—2, 1, 3) to plane in Exercise 28. [See the hint for
(0, —5, 6). Exercise 23.]
18. Find the points that divide the line segment 30. Find a vector equation of the hyperplane
between (2, 1, 3, 4) and (—1, 2, 1, 3) in R* that passes through the points (1, 2, 1, 2, 3),
into three équal parts. (0, 1, 2, 1, 3), (0, 0, 3, 1, 2), (0, 0, 0, 1, 4),
19. Find the midpoint of the line segment and (0, 0, 0, 0, 2) in R°.
between (2, !, 3, 4, 0) and (1, 2, —1, 3, -1)
31. Find a single linear equation in five variables
in R?.
whose solution set is the hyperplane in
20. Find the intersection in R? of the line given Exercise 30. [See the hint for Exercise 23.]
by
32. Find a vector equation of the hyperplane in
X= 5+8 x, = —3t, x,=-2+4 R° through the endpoints of e,, e,, . . . , &
and the piane with equation x, — 3x, + 2x, 33. Find a single linear equation in six variables
= —25, whose solution set is the hyperplane in
21. Find the intersection in R? of the line given Exercise 32. [See the hint for Exercise 23.]
by
X=2, »x=5-12, X;
= 20
In Exercises 34-42, solve the given systern of
and the plane with equation x, + 2x, = 10. linear equations and write the solution set asa
22. Find parametric equations of the plane that k-flat.
passes through the unit coordinate points
(1, 0, 0}, (0, 1, 0), and (0, 0, 1) in R?. 34. xX, ~ 2X, = 3

23. Find a single linear equation in three 3x,- x, = 14


variables wnose solution set is the plane in
Exercise 22. [Hint: We suggest two general 35. x, +2x,- x, =-3
methods of attack: (!) climinate the 3x, + 7x, + 2x,= 1
parameters from your ans-er to Exercise 22, 4x, -—2x,+ x,;= -2
or (2) solve an appropriate linear system.
Actually, this particular answer can be found 36. x, +4x,-2x,= 4
by inspection.]} 2x, + 7X, - x; = -2
24. Find parametric equations of the plane in R? x + 3x, + x; = —6
that passes through (1, 0, 0), (0, 1, —1), and
(1, 1, 1). 37. Xi ™~ 3x, + XxX; = 2

25. Find a single linear equation in three 3x, — 8x, + 2x, =


variables whose solution set is the plane in 3x, — 7X, + x, = 4
Exercise 24. [See the hint for Exercise 23.]
26. Find a vector equation of the plane that 38. x, — 3x, + 2x, -— x, = 8
passes through the points (1, 2, 1), (—1, 2, 3), 3x, -— 7X, +X, =
and (2, 1, 4) in R®.
. Find a single linear equation in three 39. x, —~2x,+ x,
= 6
variables whose solution set is the plane in 2xX,- %+ x;,- 3x,=0
Exercise 26. [See the hint for Exercise 23.] 9x,
— 3x, ~ x; - 7x,
= 4
28. Find a vector equation for the plane in R‘
that passes through the points (1, 2, 1, 3), 40. x, +2x,-3x,+ x, =2
(4, 1,2, 1), and (3, 1, 2, 0). 3x, + 6X, — 8x; — 2x, = 1

41. x,-3x,+ x,+2x,= 2 _— d. The Euclidean space R’ has no physical


X, — 2x, + 2x, + 4x,= -i existence, but exists only in our minds.
— & The Euclidean spaces R, R’, and R? have
2x, — 8x, - x; =
no physical existence, but exist only in
3x, ~ 92, + 4x, = 7 our minds.
. The mathematical existence of Euclidean
42. 2x, — 5x, + x; — 10x, + 15x, = 60
5-space is as substantial as the
43. Mark each of the following True or False.
—. a. The solution set of a linear equation in x, 3-space.
and x, can be regarded as a hyperplane in . Every plane in R’ is a two-dimensional
R?. subspace of R’.
—— b. Every line and hyperplane in R” intersect . Every plane through the origin in R’ is a
in a single point. two-dimensional subspace of R’.
—___ c. The intersection of two distinct i. Every k-flat in R” contains the ongin.
hyperplanes in R? is a tine, if the . Every k-flat in R" is a translate of a
intersection is nonempty. k-dimensional subspace.
CHAPTER 3

VECTOR SPACES

For the sake of efficiency, mathematicians often study objects just in terms of
their mathematical structure, deemphasizing such things as particular symbols
used, names of things, and applications. Any properties derived exclusively
from mathematical structure will hold for all objects having that
structure. Organizing mathematics in this way avoids repeating the same
arguments in different contexts. Viewed from this perspective, linear algebra is
the study of all objects that have a vector-space structure. The Euclidean spaces
Rⁿ that we treated in Chapters 1 and 2 serve as our guide.

Section 3.1 defines the general notion of a vector space, motivated by the
familiar algebraic structure of the spaces Rⁿ. Our examples focus mainly on
spaces other than Rⁿ, such as function spaces. Unlike the first two chapters in
our text, this chapter draws on calculus for many of its illustrations.

Section 3.2 explains how the linear-algebra terminology introduced in
Chapter 1 for Rⁿ carries over to a general vector space V. The definitions given
in Chapters 1 and 2 for linear combinations, spans, subspaces, bases,
dependent vectors, independent vectors, and dimension can be left mostly
unchanged, except for replacing "Rⁿ" by "a vector space V." Indeed, with this
replacement, many of the theorems and proofs in Chapter 1 have word-for-word
validity for general vector spaces.

Section 3.3 shows that every finite-dimensional (real) vector space can be
coordinatized to become algebraically indistinguishable from one of the spaces
Rⁿ. This coordinatization allows us to apply the matrix techniques developed
in Chapters 1 and 2 to any finite-dimensional vector space for such things as
determining whether vectors are independent or form a basis.

Linear transformations of one vector space into another are the topic of
Section 3.4. We will see that some of the basic operations of calculus, such as
differentiation, can be viewed as linear transformations.

To conclude the chapter, optional Section 3.5 describes how we try to
access such geometric notions as length and angle even in infinite-dimensional
vector spaces.


3.1 VECTOR SPACES

The Vector-Space Operations

In each Euclidean space Rⁿ, we know how to add two vectors and how to
perform scalar multiplication of a vector by a real number (scalar). These are
the two vector-space operations. The first requirement that a set V, whose
elements we will call "vectors," must satisfy in order to be a vector space is
that it have two well-defined algebraic operations, each of which yields an
element of V—namely:

    Addition of two elements of V                      Vector addition
    Multiplication of an element of V by a scalar.     Scalar multiplication

For example, we know how to add the two functions x² and sin x, and we know
how to multiply them by a real number. We require that, whenever addition or
scalar multiplication is performed with elements in V, the answers obtained lie
again in V. That is, we require that V be closed under vector addition and closed
under scalar multiplication. This notion of closure under an operation is
familiar to us from Chapter 1.

Definition of a Vector Space

The definition of a vector space which follows incorporates the ideas we have
just discussed. It also requires that the vector addition and scalar multiplication
satisfy the algebraic properties that hold in Rⁿ—namely, those listed in
Theorem 1.1.

DEFINITION 3.1 Vector Space

A (real) vector space is a set V of objects called vectors, together with a
rule for adding any two vectors v and w to produce a vector v + w in V
and a rule for multiplying any vector v in V by any scalar r in R to
produce a vector rv in V. Moreover, there must exist a vector 0 in V,
and for each v in V there must exist a vector -v in V, such that the
following properties hold for all choices of vectors u, v, w ∈ V and scalars r, s ∈ R:

Properties of Vector Addition

A1  (u + v) + w = u + (v + w)    An associative law
A2  v + w = w + v                A commutative law
A3  0 + v = v                    0 as additive identity
A4  v + (-v) = 0                 -v as additive inverse of v

Properties Involving Scalar Multiplication

S1  r(v + w) = rv + rw     A distributive law
S2  (r + s)v = rv + sv     A distributive law
S3  r(sv) = (rs)v          An associative law
S4  1v = v                 Preservation of scale

In a moment we will show that there is only one vector 0 in V satisfying
condition A3; this vector is called the zero vector. Similarly, we will see that the
vector -v in condition A4 is uniquely determined by v; it is called the additive
inverse of v, and is usually read "minus v." We write v - w for v + (-w).

The adjective "real" in parentheses in the first line of Definition 3.1
signifies that it is sometimes necessary to allow the scalars to be complex
numbers, a + bi where a, b ∈ R and i² = -1. Linear algebra using complex
scalars is the topic of Chapter 9. There are several branches of mathematics in
which one does not gain full insight without using complex numbers, and
linear algebra is one of them. Unfortunately, pencil and paper computation
with complex numbers is cumbersome, and so it is customary in a first course
to work mostly with real scalars. Fortunately, many of the concepts of linear
algebra can be adequately explained in terms of real vector spaces.

Because our definition of a vector space was modeled on the algebraic
structure of the Euclidean spaces Rⁿ discussed in Chapter 1, we see that Rⁿ is a
vector space for each positive integer n. We now proceed to illustrate this
concept with other examples.

HISTORICAL NOTE ALTHOUGH THE OBJECTS WE CALL VECTOR SPACES were well known in the late
nineteenth century, the first mathematician to give an abstract definition of a vector space was
Giuseppe Peano (1858-1932) in his Calcolo Geometrico of 1888. Peano's aim in the book, as the
title indicates, was to develop a geometric calculus. Such a calculus "consists of a system of
operations analogous to those of algebraic calculus but in which the objects with which the
calculations are performed are, instead of numbers, geometrical objects." Much of the book
consists of calculations dealing with points, lines, planes, and volumes. But in the ninth chapter,
Peano defines what he called a linear system. This was a set of objects that was provided with
operations of addition and scalar multiplication. These operations were to satisfy axioms A1-A4
and S1-S4 presented in this section. Peano also defined the dimension of a linear system to be the
maximum number of linearly independent objects in the system and noted that the set of
polynomial functions in one variable forms a linear system of infinite dimension.
Curiously, Peano's work had no immediate effect on the mathematical community. The
definition was even forgotten. It only entered the mathematical mainstream through the book
Space-Time-Matter (1918) by Hermann Weyl (1885-1955). Weyl wrote this book as an
introduction to Einstein's general theory of relativity. In Chapter 1 he discusses the nature of a
Euclidean space and, as part of that discussion, formulates the same standard axioms as Peano did
earlier. He also gives a philosophic reason for adopting such a definition:

    Not only in geometry, but to a still more astonishing degree in physics, has it become more and more
    evident that as soon as we have succeeded in unraveling fully the natural laws which govern reality, we find
    them to be expressible by mathematical relations of surpassing simplicity and architectonic perfection.
    ... Analytical geometry [the axiom system which he presented] ... conveys an idea, even if inadequate,
    of this perfection of form.

EXAMPLE 1 Show that the set Mₘₙ of all m × n matrices is a vector space, using as vector
addition and scalar multiplication the usual addition of matrices and multiplication
of a matrix by a scalar.

SOLUTION We have seen that addition of m × n matrices and multiplication of an m × n
matrix by a scalar again yield an m × n matrix. Thus, Mₘₙ is closed under
vector addition and scalar multiplication. We take as zero vector in Mₘₙ the
usual zero matrix, all of whose entries are zero. For any matrix A in Mₘₙ, we
consider -A to be the matrix (-1)A. The properties of matrix arithmetic on
page 45 show that all eight properties A1-A4 and S1-S4 required of a vector
space are satisfied.

The preceding example introduced the notation Mₘₙ for the vector space
of all m × n matrices. We use Mₙ for the vector space of all square n × n
matrices.

EXAMPLE 2 Show that the set P of all polynomials in the variable x with coefficients in
R is a vector space, using for vector addition and scalar multiplication
the usual addition of polynomials and multiplication of a polynomial by a
scalar.

SOLUTION Let p and q be polynomials

    p = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ

and

    q = b₀ + b₁x + b₂x² + ⋯ + bₘxᵐ.

If m ≥ n, the sum of p and q is given by

    p + q = (a₀ + b₀) + (a₁ + b₁)x + ⋯ + (aₙ + bₙ)xⁿ + bₙ₊₁xⁿ⁺¹ + ⋯ + bₘxᵐ.

For example, if p = 1 + 2x + 3x² and q = x + x³, then p + q = 1 + 3x + 3x² +
x³. A similar definition is made if m < n. The product of p and a scalar r is given
by

    rp = ra₀ + ra₁x + ra₂x² + ⋯ + raₙxⁿ.

Taking the usual notions of the zero polynomial and of -p, we recognize that
the eight properties A1-A4 and S1-S4 required of a vector space are familiar
properties for these polynomial operations. Thus, P is a vector space.
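Polynomial addition and scalar multiplication can be modeled concretely with coefficient lists. The following Python sketch is our own illustration, not the text's notation; a polynomial a₀ + a₁x + ⋯ + aₙxⁿ is stored as the list [a₀, a₁, ..., aₙ].

    def poly_add(p, q):
        """Add two polynomials given as coefficient lists, lowest degree first."""
        n = max(len(p), len(q))
        p = p + [0] * (n - len(p))
        q = q + [0] * (n - len(q))
        return [a + b for a, b in zip(p, q)]

    def poly_scale(r, p):
        """Multiply the polynomial p by the scalar r."""
        return [r * a for a in p]

    p = [1, 2, 3]       # 1 + 2x + 3x^2
    q = [0, 1, 0, 1]    # x + x^3
    print(poly_add(p, q))    # [1, 3, 3, 1], i.e., 1 + 3x + 3x^2 + x^3
    print(poly_scale(2, p))  # [2, 4, 6]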

EXAMPLE 3 Let F be the set of all real-valued functions with domain R; that is, let F be the
set of all functions mapping R into R. The vector sum f + g of two functions f
and g in F is defined in the usual way to be the function whose value at any x in
R is f(x) + g(x); that is,

    (f + g)(x) = f(x) + g(x).

For any scalar r in R and function f in F, the product rf is the function whose
value at x is rf(x), so that

    (rf)(x) = rf(x).

Show that F with these operations is a vector space.

SOLUTION We observe that, for f and g in F, both f + g and rf are functions mapping R into
R, so f + g and rf are in F. Thus, F is closed under vector addition and under
scalar multiplication. We take as zero vector in F the constant function whose
value at each x in R is 0. For each function f in F, we take as -f the function
(-1)f in F.

There are four vector-addition properties to verify, and they are all easy.
We illustrate by verifying condition A4. For f in F, the function f + (-f) =
f + (-1)f has as its value at x in R the number f(x) + (-1)f(x), which is 0.
Consequently, f + (-f) is the zero function, and A4 is verified.

The scalar multiplicative properties are just as easy to verify. For example,
to verify S4, we must compute 1f at any x in R and compare the result with
f(x). We obtain (1f)(x) = 1f(x) = f(x), and so 1f = f.
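The same pointwise definitions are easy to express in code. In this small Python sketch of our own, functions are added and scaled by operating on their values, exactly as in the definitions of f + g and rf above.

    import math

    def add(f, g):
        return lambda x: f(x) + g(x)      # (f + g)(x) = f(x) + g(x)

    def scale(r, f):
        return lambda x: r * f(x)         # (rf)(x) = r f(x)

    h = add(lambda x: x**2, math.sin)     # the function x^2 + sin x
    print(h(0.0))                         # 0.0
    print(scale(3, math.sin)(math.pi/2))  # 3.0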

EXAMPLE 4 Show that the set P∞ of formal power series in x of the form

    Σ (n = 0 to ∞) aₙxⁿ = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ + ⋯,

with addition and scalar multiplication defined by

    Σ aₙxⁿ + Σ bₙxⁿ = Σ (aₙ + bₙ)xⁿ    and    r(Σ aₙxⁿ) = Σ raₙxⁿ,

is a vector space.*

SOLUTION The reasoning here is precisely the same as for the space P of polynomials in
Example 2. The zero series is Σ 0xⁿ, and the additive inverse of Σ aₙxⁿ is
Σ (-aₙ)xⁿ. All the other axioms follow from the associative, commutative,
and distributive laws for the real-number coefficients aₙ in the series.

The examples of vector spaces that we have presented so far are all based
on algebraic structures familiar to us—namely, the algebra of matrices,
polynomials, and functions. You may be thinking, "How can anything with an

*We can add only a finite number of vectors in a vector space. Thus we do not regard these formal
power series as infinite sums in P∞ of monomials. Also, we are not concerned with questions of
convergence or divergence, as studied in calculus. This is the significance of the word formal.

addition and scalar multiplication defined on it fail to be a vector space?" To
answer this, we give two more-esoteric examples, where checking the axioms is
not as natural a process.

EXAMPLE 5 Let R² have the usual operation of addition, but define scalar multiplication
r[x, y] by r[x, y] = [0, 0]. Determine whether R² with these operations is a
vector space.

SOLUTION Because conditions A1-A4 of Definition 3.1 do not involve scalar multiplication,
and because addition is the usual operation, we need only check
conditions S1-S4. We see that all of these hold, except for condition S4:
because 1[x, y] = [0, 0], the scale is not preserved. Thus, R² is not a vector
space with these particular two operations.

EXAMPLE 6 Let R² have the usual scalar multiplication, but let addition ⊕ be defined on R²
by the formula

    [x, y] ⊕ [r, s] = [x + r, 2y + s].

Determine whether R² with these operations is a vector space. (We use the
symbol ⊕ for the warped addition of vectors in R², to distinguish it from the
usual addition.)

SOLUTION We check the associative law for ⊕:

    ([x, y] ⊕ [r, s]) ⊕ [a, b] = [x + r, 2y + s] ⊕ [a, b]
                               = [(x + r) + a, 2(2y + s) + b]
                               = [x + r + a, 4y + 2s + b],

whereas

    [x, y] ⊕ ([r, s] ⊕ [a, b]) = [x, y] ⊕ [r + a, 2s + b]
                               = [x + (r + a), 2y + (2s + b)]
                               = [x + r + a, 2y + 2s + b].

Because the coefficients of y in the two results are not equal (4 in the first and
2 in the second), we expect that ⊕ is not associative. We can find a specific
violation of the associative law by choosing y ≠ 0; for example,

    ([0, 1] ⊕ [0, 0]) ⊕ [0, 0] = [0, 4],

whereas

    [0, 1] ⊕ ([0, 0] ⊕ [0, 0]) = [0, 2].

Therefore, R² is not a vector space with these two operations.
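The failure of associativity is easy to confirm by direct computation. A short Python check of our own, using the warped addition of this example:

    def warped_add(u, v):
        # [x, y] (+) [r, s] = [x + r, 2y + s]
        return (u[0] + v[0], 2 * u[1] + v[1])

    u, v, w = (0, 1), (0, 0), (0, 0)
    print(warped_add(warped_add(u, v), w))   # (0, 4)
    print(warped_add(u, warped_add(v, w)))   # (0, 2)  -- not equal, so (+) is not associative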

We now indicate that vector addition and scalar multiplication possess
still more of the properties we are accustomed to expect. It is important to
realize that everything has to be proved using just the axioms A1 through A4
and S1 through S4. Of course, once we have proved something from the

axioms, we can then use it in the proofs of other things. The properties that
appear in Theorem 3.1 below are listed in a convenient order for proof; for
example, we will see that it is convenient to know property 3 in order to prove
property 4. Property 4 states that in a vector space V, we have 0v = 0 for all
v ∈ V. Students often attempt to prove this by saying,

    "Let v = (a₁, a₂, ..., aₙ). Then 0v = 0(a₁, a₂, ..., aₙ)
     = (0, 0, ..., 0) = 0."

This is a fine argument if V is Rⁿ, but we have now expanded our concept of
vector space, and we can no longer assume that v ∈ V is some n-tuple of real
numbers.

THEOREM 3.1 Elementary Properties of Vector Spaces

Every vector space V has the following properties:

1. The vector 0 is the unique vector x satisfying the equation x + v = v
   for all vectors v in V.
2. For each vector v in V, the vector -v is the unique vector y
   satisfying v + y = 0.
3. If u + v = u + w for vectors u, v, and w in V, then v = w.
4. 0v = 0 for all vectors v in V.
5. r0 = 0 for all scalars r in R.
6. (-r)v = r(-v) = -(rv) for all scalars r in R and vectors v in V.

PROOF We prove only properties 1 and 4, leaving proofs of the remaining
properties as Exercises 19 through 22. In proving property 4, we assume that
properties 2 and 3 have been proved.

Turning to property 1, the standard way to prove that something is unique
is to suppose that there are two of them, and then show that they must be
equal. Suppose, therefore, that there exist vectors 0 and 0′ satisfying

    0 + v = v   and   0′ + v = v   for all v ∈ V.

Taking v = 0′ in the first equation and v = 0 in the second equation, we obtain

    0 + 0′ = 0′   and   0′ + 0 = 0.

By the commutative law A2, we know that 0 + 0′ = 0′ + 0, and we conclude
that 0 = 0′.

Turning to property 4, notice that the equation 0v = 0 which we want to
prove involves both scalar multiplication (namely, 0v) and vector addition
(0 is an additive concept, given by axiom A3). To prove a relationship between
these two algebraic operations, we must use an axiom that involves both of

them—namely, one of the distributive laws S1 or S2. Using distributive law
S2, we have for v ∈ V,

    0v = (0 + 0)v = 0v + 0v.

By the additive identity axiom A3 and the commutative law A2, we know that

    0v = 0 + 0v = 0v + 0.

Therefore,

    0v + 0v = 0v + 0,

and, by property 3, we conclude that 0v = 0.

The Universality of Function Spaces (Optional)

In Example 3, we showed that the set F of all functions mapping R into R is a
vector space, where we define for f, g ∈ F and for r ∈ R

    (f + g)(x) = f(x) + g(x)   and   (rf)(x) = rf(x).    (1)

Note that these definitions of addition and scalar multiplication for functions
having R as both domain and codomain use only the algebraic structure of the
codomain R and not the algebraic structure of the domain R. That is, the
defining addition and scalar multiplication appearing on the right-hand sides
of Eqs. (1) take place in the codomain. We do not see anything like f(a + b),
which involves addition in the domain R, or like f(rx), which involves scalar
multiplication in the domain. This suggests that if S is any set and we let F be
the set of all functions mapping S into R, then Example 3 might still go
through, and show that F is a vector space. We show that this is the case in our
next example.

EXAMPLE 7 Let F be the set of all real-valued functions on a (nonempty) set S; that is, let F
be the set of all functions mapping S into R. For f, g ∈ F, let the sum f + g of
two functions f and g in F be defined by

    (f + g)(x) = f(x) + g(x)   for all x ∈ S,

and, for any scalar r, let scalar multiplication be defined by

    (rf)(x) = rf(x)   for all x ∈ S.

Show that F with these operations is a vector space.

SOLUTION The solution of Example 3 is valid word for word if we replace each reference
to the domain R by the new domain S. That is, the functions map S into R, and
we let x ∈ S rather than x ∈ R in this solution.

We now indicate why we headed this discussion "The Universality of
Function Spaces." Let S = {1, 2}, so that F is the set of all functions mapping
{1, 2} into R. Let us abbreviate the description of a function f as [f(1), f(2)]. For

example, we consider [-3, 8] to denote the function in F that maps 1 into -3
and maps 2 into 8. In this way, we identify each vector [a, b] in R² with a
function f mapping {1, 2} into R—namely, f(1) = a and f(2) = b. If we view
[a, b] as the function f and [c, d] as the function g, then

    (f + g)(1) = f(1) + g(1) = a + c   and   (f + g)(2) = f(2) + g(2) = b + d.

Thus, the vector [a + c, b + d] = [a, b] + [c, d] in R² is the function f + g.
Similarly, we see that the vector [ra, rb] = r[a, b] in R² is the function rf. In this
way, we can regard R² as the vector space of functions mapping {1, 2} into R.

We realize that we can equally well consider each vector of Rⁿ as a function
mapping {1, 2, 3, ..., n} into R. For example, the vector [-3, 5, 2, 7] can be
considered as the function

    f: {1, 2, 3, 4} → R, where f(1) = -3, f(2) = 5, f(3) = 2, and f(4) = 7.

In a similar fashion, we can view the vector space Mₘₙ of matrices in
Example 1 as the vector space of functions mapping the positive integers from
1 to mn into R. For example, taking m = 2 and n = 3, we can view the matrix

    [ a₁  a₂  a₃ ]
    [ a₄  a₅  a₆ ]

as the function f: {1, 2, 3, 4, 5, 6} → R, where f(i) = aᵢ. Addition of functions as
defined in Example 7 again corresponds to addition of matrices, and the same
is true for scalar multiplication.
The vector space P of all polynomials in Example 2 is not quite as easy to
present as a function space, because not all the polynomials have the same
number of terms. However, we can view the vector space P∞ of formal power
series in x as the space of all functions mapping {0, 1, 2, 3, ...} into R.
Namely, if f is such a function and if f(n) = aₙ for n ∈ {0, 1, 2, 3, ...}, then we
can denote this function symbolically by Σ (n = 0 to ∞) aₙxⁿ. We see that function
addition and multiplication by a scalar will produce precisely the addition and
multiplication of power series defined in Example 4. We will show in the next
section that we can view the vector space P of polynomials as a subspace of P∞.

We have now freed the domain of our function spaces from having to be
the set R. Let's see how much we can free the codomain. The definitions

    (f + g)(x) = f(x) + g(x)   and   (rf)(x) = rf(x)

for addition and scalar multiplication of functions show that we need a notion
of addition and scalar multiplication for the codomain. For function addition
to be associative and commutative, we need the addition in the codomain to be
associative and commutative. For there to be a zero vector, we need an
additive identity—let's call it 0—in the codomain, so that we will have a "zero
constant function" to serve as additive identity, etc. It looks as though what we
need is to have the codomain itself have a vector space structure! This is
indeed the case. If S is a set and V is a vector space, then the set F of all
functions f: S → V with this notion of function addition and multiplication by

a scalar is again a vector space. Note that R itself is a vector space, so the set of
functions f: S → R in Example 7 is a special case of this construction. For
another example, the set of all functions mapping R² into R³ has a vector-space
structure. In third-semester calculus, we sometimes speak of a "vector-valued
function," meaning a function whose codomain is a vector space, although we
usually don't talk about vector spaces there.

Thus, starting with a set S, we could consider the vector space V₁ of all
functions mapping S into R, and then the vector space V₂ of all functions
mapping S into V₁, and then the vector space V₃ of all functions mapping S
into V₂, and then—oops! We had better stop now. People who do too much of
this stuff are apt to start climbing the walls. (However, mathematicians do
sometimes build cumulative structures in this fashion.)
SUMMARY
1. A vector space is a nonempty set V of objects called vectors, together with
   rules for adding any two vectors v and w in V and for multiplying any
   vector v in V by any scalar r in R. Furthermore, V must be closed under this
   vector addition and scalar multiplication so that v + w and rv are both in
   V. Moreover, the following axioms must be satisfied for all vectors u, v, and
   w in V and all scalars r and s in R:

   A1  (u + v) + w = u + (v + w)
   A2  v + w = w + v
   A3  There exists a zero vector 0 in V such that 0 + v = v for all v ∈ V.
   A4  Each v ∈ V has an additive inverse -v in V such that v + (-v) = 0.
   S1  r(v + w) = rv + rw
   S2  (r + s)v = rv + sv
   S3  r(sv) = (rs)v
   S4  1v = v
2. Elementary properties of vector spaces are listed in Theorem 3.1.
3. Examples of vector spaces include Rⁿ, the space Mₘₙ of all m × n matrices,
   the space of all polynomials in the variable x, and the space of all functions
   f: R → R. For all of these examples, vector addition and scalar multiplication
   are the addition and multiplication by a real number with which we
   are already familiar.
4. (Optional) For any set S, the set of all functions mapping S into R forms a
   vector space with the usual definitions of addition and scalar multiplication
   of functions. The Euclidean space Rⁿ can be viewed as the vector
   space of functions mapping {1, 2, 3, ..., n} into R, where a vector a =
   [a₁, a₂, a₃, ..., aₙ] is viewed as the function f such that f(i) = aᵢ.
5. (Optional) Still more generally, for any set S and any vector space V, the
   set of all functions mapping S into V forms a vector space with the usual
   definitions of addition and scalar multiplication of functions.

EXERCISES

In Exercises 1-8, decide whether or not the given set, together with the indicated operations of addition and scalar multiplication, is a (real) vector space.

1. The set R², with the usual addition but with scalar multiplication defined by r[x, y] = [ry, rx].
2. The set R², with the usual scalar multiplication but with addition defined by [x, y] + [r, s] = [y + s, x + r].
3. The set R², with addition defined by [x, y] + [a, b] = [x + a + 1, y + b] and with scalar multiplication defined by r[x, y] = [rx + r - 1, ry].
4. The set of all 2 × 2 matrices, with the usual scalar multiplication but with addition defined by A + B = O, the 2 × 2 zero matrix.
5. The set of all 2 × 2 matrices, with the usual addition but with scalar multiplication defined by rA = O, the 2 × 2 zero matrix.
6. The set F of all functions mapping R into R, with scalar multiplication defined as in Example 7 but with addition defined by (f + g)(x) = max{f(x), g(x)}.
7. The set F of all functions mapping R into R, with scalar multiplication defined as in Example 7 but with addition defined by (f + g)(x) = f(x) + 2g(x).
8. The set F of all functions mapping R into R, with scalar multiplication defined as in Example 7 but with addition defined by (f + g)(x) = 2f(x) + 2g(x).

In Exercises 9-16, determine whether the given set is closed under the usual operations of addition and scalar multiplication, and is a (real) vector space.

9. The set of all upper-triangular n × n matrices.
10. The set of all 2 × 2 matrices of the form

        [x  1]
        [1  x],

    where each x may be any scalar.
11. The set of all diagonal n × n matrices.
12. The set of all 3 × 3 matrices of the form

        [x  0  x]
        [0  x  0]
        [x  0  x],

    where each x may be any scalar.
13. The set {0} consisting only of the number 0.
14. The set Q of all rational numbers.
15. The set C of complex numbers; that is,

        C = {a + b√-1 | a, b in R},

    with the usual addition of complex numbers and with scalar multiplication defined in the usual way by r(a + b√-1) = ra + rb√-1 for any numbers a, b, and r in R.
16. The set Pₙ of all polynomials in x, with real coefficients and of degree less than or equal to n, together with the zero polynomial.
17. Your answer to Exercise 3 should be that R² with the given operations is a vector space.
    a. Describe the "zero vector" in this vector space.
    b. Explain why the relations r[0, 0] = [r - 1, 0] ≠ [0, 0] do not violate property (5) of Theorem 3.1.
18. Mark each of the following True or False.
    ___ a. Matrix multiplication is a vector-space operation on the set Mₘₙ of all m × n matrices.
    ___ b. Matrix multiplication is a vector-space operation on the set Mₙ of all square n × n matrices.
    ___ c. Multiplication of any vector by the zero scalar always yields the zero vector.
    ___ d. Multiplication of a nonzero vector by a nonzero scalar never yields the zero vector.
    ___ e. No vector is its own additive inverse.

    ___ f. The zero vector is the only vector that is its own additive inverse.
    ___ g. Multiplication of two scalars is of no concern in the definition of a vector space.
    ___ h. One of the axioms for a vector space relates addition of scalars, multiplication of a vector by scalars, and addition of vectors.
    ___ i. Every vector space has at least two vectors.
    ___ j. Every vector space has at least one vector.
19. Prove property 2 of Theorem 3.1.
20. Prove property 3 of Theorem 3.1.
21. Prove property 5 of Theorem 3.1.
22. Prove property 6 of Theorem 3.1.
23. Let V be a vector space. Prove that, if v is in V and if r is a scalar and if rv = 0, then either r = 0 or v = 0.
24. Let V be a vector space and let v and w be nonzero vectors in V. Prove that if v is not a scalar multiple of w, then v is not a scalar multiple of v + w.
25. Let V be a vector space, and let v and w be vectors in V. Prove that there is a unique vector x in V such that x + v = w.

Exercises 26-29 are based on the optional subsection on the universality of function spaces.

26. Using the discussion of function spaces at the end of this section, explain how we can view the Euclidean vector space Rᵐⁿ and the vector space Mₘₙ of all m × n matrices as essentially the same vector space with just a different notation for the vectors.
27. Repeat Exercise 26 for the vector space M₂₆ of 2 × 6 matrices and the vector space M₃₄ of 3 × 4 matrices.
28. If you worked Exercise 16 correctly, you found that the polynomials in x of degree at most n do form a vector space Pₙ. Explain how Pₙ and Rⁿ⁺¹ can be viewed as essentially the same vector space, with just a different notation for the vectors.
29. Referring to the three preceding exercises, list the vector spaces R™, R*, R°, P,,, Pys, M,,, . . . in two or more columns in such a way that any two vector spaces listed in the same column can be viewed as the same vector space with just different notation for the vectors, but two vector spaces that appear in different columns cannot be so viewed.

3.2 BASIC CONCEPTS OF VECTOR SPACES

We now extend the terminology we developed in Chapters 1 and 2 for the


Euclidean spaces Rⁿ to general vector spaces V. That is, we discuss linear
combinations of vectors, the span of vectors, subspaces, dependent and
independent vectors, bases, and dimension. Definitions of most of these
concepts, and the theorems concerning them, can be lifted with minor changes
from Chapters 1 and 2, replacing "in Rⁿ" wherever it occurs by "in a vector
space V." Where we do make changes, the reasons for them are explained.

Linear Combinations, Spans, and Subspaces


The major change from Chapter 1 is that our vector spaces may now be so
large that they cannot be spanned by a finite number of vectors. Each
Euclidean space Rⁿ and each subspace of Rⁿ can be spanned by a finite set of
vectors, but this is not the case for a general vector space. For example, no

finite set of polynomials can span the space P of all polynomials in x, because a
finite set of polynomials cannot contain polynomials of arbitrarily high degree.
Because we can add only a finite number of vectors, our definition of a linear
combination in Chapter 1 will be unchanged. However, surely we want to
consider the space P of all polynomials to be spanned by the monomials in the
infinite set {1, x, x², x³, . . . }, because every polynomial is a linear combination
of these monomials. Thus we must modify the definition of the span of vectors
to include the case where the number of vectors may be infinite.

DEFINITION 3.2  Linear Combinations

Given vectors v₁, v₂, . . . , vₖ in a vector space V and scalars r₁, r₂, . . . ,
rₖ in R, the vector

    r₁v₁ + r₂v₂ + · · · + rₖvₖ

is a linear combination of the vectors v₁, v₂, . . . , vₖ with scalar
coefficients r₁, r₂, . . . , rₖ.

DEFINITION 3.3  Span of a Subset X of V

Let X be a subset of a vector space V. The span of X is the set of all
linear combinations of vectors in X, and is denoted by sp(X). If X is
a finite set, so that X = {v₁, v₂, . . . , vₖ}, then we also write sp(X) as
sp(v₁, v₂, . . . , vₖ). If W = sp(X), the vectors in X span or generate W.
If V = sp(X) for some finite subset X of V, then V is finitely generated.

HISTORICAL NOTE A COORDINATE-FREE TREATMENT of vector-space concepts appeared in


1862 in the second version of Ausdehnungslehre (The Calculus of Extension) by Hermann
Grassmann (1809-1877). In this version he was able to suppress somewhat the philosophical bias
that had made his earlier work so unreadable and to concentrate on his new mathematical ideas.
These included the basic ideas of the theory of n-dimensional vector spaces, including linear
combinations, linear independence, and the notions of a subspace and a basis. He developed the
idea of the dimension of a subspace as the maximal number of linearly independent vectors and
proved the fundamental relation for two subspaces V and W that dim(V + W) = dim(V) +
dim(W) - dim(V ∩ W).
Grassmann’s notions derived from the attempt to translate geometric ideas about n-
dimensional space into the language of algebra without dealing with coordinates, as is done in
ordinary analytic geometry. He was the first to produce a complete system in which such concepts
as points, line segments, planes, and their analogues in higher dimensions are represented as single
elements. Although his ideas were initially difficult to understand, ultimately they entered the
mathematical mainstream in such fields as vector analysis and the exterior algebra. Grassmann
himself, unfortunately, never attained his goal of becoming a German university professor,
spending most of his professional life as a mathematics teacher at a gymnasium (high school) in
Stettin. In the final decades of his life, he turned away from mathematics and established himself
as an expert in linguistics.

In general, we suppress Euclidean space illustrations here, because they


are already familiar from Chapter 1.

ILLUSTRATION 1  Let P be the vector space of all polynomials, and let M = {1, x, x², x³, . . . } be
this subset of monomials. Then P = sp(M). Our remarks above Definition 3.2
indicate that P is not finitely generated.

ILLUSTRATION 2  Let Mₘₙ be the vector space of all m × n matrices, and let E be the set
consisting of the matrices Eᵢⱼ, where Eᵢⱼ is the m × n matrix having entry 1 in
the ith row and jth column and entries 0 elsewhere. There are mn of these
matrices in the set E. Then Mₘₙ = sp(E) and is finitely generated.

The notion of closure of a subset of a vector space V under vector addition
or scalar multiplication is the same as for a subset of Rⁿ. Namely, a subset W of
a vector space V is closed under vector addition if for all u, v ∈ W the sum u + v
is in W. If for all v ∈ W and all scalars r we have rv ∈ W, then W is closed under
scalar multiplication.
    We will call a subset W of V a subspace precisely when it is nonempty and
closed under vector addition and scalar multiplication, just as in Definition
1.16. However, we really should define it here in a different fashion, reflecting
the fact that we have given an axiomatic definition of a vector space. A vector
space is just one of many mathematical structures that are defined axiomati-
cally. (Some other such axiomatic structures are groups, rings, fields, topologi-
cal spaces, fiber bundles, sheaves, and manifolds.) A substructure is always
understood to be a subset of the original structure set that satisfies, all by itself,
the axioms for that type of structure, using inherited features, such as
operations of addition and scalar multiplication, from the original structure.
We give a definition of a subspace of a vector space V that reflects this point of
view, and then prove as a theorem that a subset of V is a subspace if and only if
it is nonempty and closed under vector addition and scalar multiplication.

DEFINITION 3.4 Subspace

A subset W of a vector space V is a subspace of V if W itself fulfills the
requirements of a vector space, where addition and scalar multiplica-
tion of vectors in W produce the same vectors as these operations did
in V.

In order for a nonempty subset W of a vector space V to be a subspace, the
subset (together with the operations of vector addition and scalar multiplica-
tion) must form a self-contained system. That is, any addition or scalar
multiplication using vectors in the subset W must always yield a vector that
lies again in W. Then taking any v in W, we see that 0v = 0 and (-1)v = -v are
also in W. The eight properties A1-A4 and S1-S4 required of a vector space
in Definition 3.1 are sure to be true for the subset, because they hold in all of V.

That is, if W is nonempty and is closed under addition and scalar multiplica-
tion, it is sure to be a vector space in its own right. We have arrived at an
efficient test for determining whether a subset is a subspace of a vector space.

THEOREM 3.2  Test for a Subspace

A subset W of a vector space V is a subspace of V if and only if W is
nonempty and satisfies the following two conditions:
1. If v and w are in W, then v + w is in W.    Closure under vector addition
2. If r is any scalar in R and v is in W, then rv is in W.    Closure under scalar multiplication

Condition (2) of Theorem 3.2 with r = 0 shows that the zero vector lies in
every subspace. Recall that a subspace of Rⁿ always contains the origin.
The entire vector space V satisfies the conditions of Theorem 3.2. That is,
V is a subspace of itself. Other subspaces of V are called proper subspaces. One
such subspace is the subset {0}, consisting of only the zero vector. We call {0}
the zero subspace of V.
Note that if V is a vector space and X is any nonempty subset of V, then
sp(X) is a subspace of V, because the sum of two linear combinations of
vectors in X is again a linear combination of vectors in X, as is any scalar
multiple of such a linear combination. Thus the closure conditions of
Theorem 3.2 are satisfied. A moment of thought shows that sp(X) is the
smallest subspace of V containing all the vectors in X.

ILLUSTRATION 3  The space P of all polynomials in x is a subspace of the vector space P∞ of
power series in x, described in Example 4 of Section 3.1. Exercise 16 in Section
3.1 shows that the set consisting of all polynomials in x of degree at most n,
together with the zero polynomial, is a vector space Pₙ. This space Pₙ is a
subspace both of P and of P∞.

ILLUSTRATION 4  The set of invertible n × n matrices is not a subspace of the vector space Mₙ of
all n × n matrices, because the sum of two invertible matrices may not be
invertible; also, the zero matrix is not invertible.

ILLUSTRATION 5  The set of all upper-triangular n × n matrices is a subspace of the space Mₙ of
all n × n matrices, because sums and scalar multiples of upper-triangular
matrices are again upper triangular.

ILLUSTRATION 6  Let F be the vector space of all functions mapping R into R. Because sums and
scalar multiples of continuous functions are continuous, the subset C of F
consisting of all continuous functions mapping R into R is a subspace of F.
Because sums and scalar multiples of differentiable functions are differentia-
ble, the subset D of F consisting of all differentiable functions mapping R into

R is also a subspace of F. Because every differentiable function is continuous,
we see that D is also a subspace of C. Let D∞ be the set of all functions mapping
R into R that have derivatives of all orders. Note that D∞ is closed under
addition and scalar multiplication and is a subspace of D, C, and F.

EXAMPLE 1  Let F be the vector space of all functions mapping R into R. Show that the set S
of all solutions in F of the differential equation

    f″ + f = 0

is a subspace of F.
SOLUTION  We note that the zero function in F is a solution, and so the set S is nonempty.
If f and g are in S, then f″ + f = 0 and g″ + g = 0, and so (f + g)″ + (f + g) =
f″ + g″ + f + g = (f″ + f) + (g″ + g) = 0 + 0 = 0, which shows that S is closed
under addition. Similarly, (rf)″ + rf = rf″ + rf = r(f″ + f) = r0 = 0, so S is
closed under scalar multiplication. Thus S is a subspace of F.

The preceding example is a special case of a general theorem stating that


all solutions in F of a homogeneous linear differential equation form a
subspace of F. We ask you to write out the proof of the general theorem in
Exercise 40. Recall that all solutions of the homogeneous linear system Ax = 0,
where A is an m × n matrix, form a subspace of Rⁿ.
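For the system Ax = 0, this closure is easy to verify numerically. The sketch below is our own small check in Python with NumPy; the matrix A and the two solutions are made-up illustrations, not taken from the text.

    import numpy as np

    A = np.array([[1., 2., 0., 1.],
                  [0., 1., 1., 1.]])

    # Two solutions of Ax = 0, found by back substitution:
    # set x3 = 1, x4 = 0 to get the first, then x3 = 0, x4 = 1 for the second.
    x1 = np.array([ 2., -1., 1., 0.])
    x2 = np.array([ 1., -1., 0., 1.])

    print(A @ x1, A @ x2)     # both print the zero vector
    print(A @ (x1 + x2))      # the sum of solutions is again a solution
    print(A @ (5.0 * x1))     # a scalar multiple is again a solution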

Independence
We wish to extend the notions of dependence and independence that were
given in Chapter 2. We restricted our consideration to finite sets of vectors in
Chapter 2 because we can’t have more than n vectors in an independent subset
of R". In this chapter, we have to worry about larger sets, because it may take
an infinite set to span a vector space V. Recall that the vector space P of all
polynomials cannot be spanned by a finite set of vectors. We make the
following slight modification to Definition 2.1.

DEFINITION 3.5  Linear Dependence and Independence

Let X be a set of vectors in a vector space V. A dependence relation in
this set X is an equation of the form

    r₁v₁ + r₂v₂ + · · · + rₖvₖ = 0,    some rᵢ ≠ 0,

where vᵢ ∈ X for i = 1, 2, . . . , k. If such a dependence relation exists,
then X is a linearly dependent set of vectors. Otherwise, the set X of
vectors is linearly independent.

ILLUSTRATION 7  The subset {1, x, x², . . . , xⁿ, . . . } of monomials in the vector space P of all
polynomials is an independent set.

ILLUSTRATION 8  The subset {sin²x, cos²x, 1} of the vector space F of all functions mapping R
into R is dependent. A dependence relation is

    1(sin²x) + 1(cos²x) + (-1)1 = 0.

Similarly, the subset {sin²x, cos²x, cos 2x} is dependent, because we have the
trigonometric identity cos 2x = cos²x - sin²x. Thus 1(cos 2x) + (-1)(cos²x) +
1(sin²x) = 0 is a dependence relation.

We know a mechanical procedure for determining whether a finite set of


vectors in Rⁿ is independent. We simply put the vectors as column vectors in a
matrix and reduce the matrix to row-echelon form. The set is independent if
and only if every column in the matrix contains a pivot. There is no such
mechanical procedure for determining whether a finite set of vectors in a
general vector space V is independent. We illustrate two methods that are used
in function spaces in the next two examples.
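Before turning to those function-space methods, here is a sketch of the mechanical procedure just described for Rⁿ, written in Python with NumPy. The rank computation stands in for the row reduction, since the columns are independent exactly when every column contains a pivot, that is, when the rank equals the number of columns; the particular vectors are our own example.

    import numpy as np

    v1 = [1., 0., 2.]
    v2 = [0., 1., 1.]
    v3 = [1., 1., 3.]     # v3 = v1 + v2, so the set is dependent

    A = np.column_stack([v1, v2, v3])   # vectors as columns
    independent = np.linalg.matrix_rank(A) == A.shape[1]
    print(independent)                  # False: not every column has a pivot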

EXAMPLE 2  Show that {sin x, cos x} is an independent set of functions in the space F of all
functions mapping R into R.
SOLUTION We show that there is no dependence relation of the form
    r(sin x) + s(cos x) = 0,    (1)

where the 0 on the right of the equation is the function that has the value 0 for
all x. If Eq. (1) holds for all x, then setting x = 0 and x = π/2, we obtain the
linear system

    r(0) + s(1) = 0    Setting x = 0
    r(1) + s(0) = 0,    Setting x = π/2

whose only solution is r = s = 0. Thus Eq. (1) holds only if r = s = 0, and so the
functions sin x and cos x are independent.

From Example 2, we see that one way to try to show independence of k
functions f₁(x), f₂(x), . . . , fₖ(x) is to substitute k different values of x in a
dependence relation format

    r₁f₁(x) + r₂f₂(x) + · · · + rₖfₖ(x) = 0.

This will lead to a homogeneous system of k equations in the k unknowns
r₁, r₂, . . . , rₖ. If that system has only the zero solution, then the functions are
independent. If there is a nontrivial solution, we can't draw any conclusion.
For example, substituting x = 0 and x = π in Eq. (1) in Example 2 yields the
system

    r(0) + s = 0    Setting x = 0
    r(0) - s = 0,    Setting x = π

which has a nontrivial solution—for example, r = 10 and s = 0. This occurs
because we just chose the wrong values for x. The values 0 and π/2 for x do
demonstrate independence, as we saw in Example 2.

EXAMPLE 3  Show that the functions eˣ and e²ˣ are independent in the vector space F of all
functions mapping R into R.
SOLUTION  We set up the dependence relation format

    reˣ + se²ˣ = 0,

and try to determine if we must have r = s = 0. Illustrating a different
technique than in Example 2, we write this equation and its derivative:

    reˣ + se²ˣ = 0
    reˣ + 2se²ˣ = 0.    Differentiating

Setting x = 0 in both equations, we obtain the homogeneous linear system

    r + s = 0
    r + 2s = 0,

which has only the trivial solution r = s = 0. Thus the functions are
independent.

In summary, we can try to show independence of functions by starting


with an equation in dependence relation format, and then substituting
different values of the variable, or differentiating (possibly several times) and
substituting values, or a combination of both, to obtain a square homogeneous
system with the coefficients in the dependence relation format as unknowns. If
the system has only the zero solution, the functions are independent. If there
are nontrivial solutions, we can’t come to a conclusion without more work.
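This substitution procedure is easy to mechanize. The sketch below, a Python illustration of our own using the functions of Example 2, evaluates each function at k chosen values of x, forms the square coefficient matrix of the resulting homogeneous system, and tests whether it has full rank; full rank forces all coefficients to be zero, while anything less is inconclusive, as noted above.

    import numpy as np

    funcs = [np.sin, np.cos]        # the functions of Example 2
    xs = [0.0, np.pi / 2]           # k = 2 sample points (the "good" choice)

    # Row i of M is [f1(x_i), f2(x_i), ...]; the system is M r = 0.
    M = np.array([[f(x) for f in funcs] for x in xs])

    if np.linalg.matrix_rank(M) == len(funcs):
        print("only the zero solution: the functions are independent")
    else:
        print("nontrivial solutions exist: no conclusion can be drawn")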

Bases and Dimension

Recall that we defined a subset {w₁, w₂, . . . , wₖ} to be a basis for the subspace
W = sp(w₁, w₂, . . . , wₖ) of Rⁿ if every vector in W can be expressed uniquely as
a linear combination of w₁, w₂, . . . , wₖ. Theorem 1.15 shows that to demon-
strate this uniqueness, we need only show that the only linear combination
that yields the zero vector is the one with all coefficients 0—that is, the
uniqueness condition can be replaced by the condition that the set
{w₁, w₂, . . . , wₖ} be independent. This led us to an alternate characterization of
a basis for W (Theorem 2.1) as a subset of W that is independent and that spans
W. It is this alternate description that is traditional for general vector spaces.
The uniqueness condition then becomes a theorem; it remains the most
important aspect of a basis and forms the foundation for the next section of
our text. For a general vector space, we may need an infinite set of vectors to
form a basis: for example, a basis for the space P of all polynomials is the set

{1, x, x², . . . , xⁿ, . . . } of monomials. The following definition takes this


possibility into account. Also, because a subspace of a vector space is again a
vector space, it is unnecessary now to explicitly include the word “subspace”
in the definition.

DEFINITION 3.6 Basis fora Vector Space

Let V be a vector space. A set of vectors in V is a basis for V if the


following conditions are met:
1. The set of vectors spans V.
2. The set of vectors is linearly independent.

ILLUSTRATION 9  The set X = {1, x, x², . . . , xⁿ, . . . } of monomials is a basis for the vector space
P of all polynomials. It is not a basis for the vector space P∞ of formal power
series in x, discussed in Example 4 of Section 3.1, because a series
a₀ + a₁x + a₂x² + · · · cannot be expressed as a finite sum of scalar multiples
of the monomials unless all but a finite number of the coefficients aₙ are 0. For
example, 1 + x + x² + x³ + · · · is not a finite sum of monomials. Remember
that all linear combinations are finite sums.
    The vector space Pₙ of polynomials of degree at most n, together with the
zero polynomial, has as a basis {1, x, x², . . . , xⁿ}.

We now prove as a theorem the uniqueness which was the defining
criterion in Definition 1.17 in Section 1.6. Namely, we show that a subset B of
nonzero vectors in a vector space V is a basis for V if and only if each vector v
in V can be expressed uniquely in the form

    v = r₁b₁ + r₂b₂ + · · · + rₖbₖ    (2)

for scalars rᵢ and vectors bᵢ in B. Because B can be an infinite set, we need to
elaborate on the meaning of uniqueness. Suppose that there are two expres-
sions in the form of Eq. (2) for v. The two expressions might involve some of
the same vectors from B and might involve some different vectors from B. The
important thing is that each involves only a finite number of vectors. Thus if
we take all vectors in B appearing in one expression or the other, or in both, we
have just a finite list of vectors to be concerned with. We may assume that both
expressions contain each vector in this list by inserting any missing vector with
a zero coefficient. Assuming now that Eq. (2) is the result of this adjustment of
the first expression for v, the second expression for v may be written as

    v = s₁b₁ + s₂b₂ + · · · + sₖbₖ.    (3)

Uniqueness asserts that sᵢ = rᵢ for each i.


THEOREM 3.3  Unique Combination Criterion for a Basis

Let B be a set of nonzero vectors in a vector space V. Then B is a basis
for V if and only if each vector v in V can be uniquely expressed in the
form of Eq. (2) for scalars rᵢ and vectors bᵢ ∈ B.

PROOF  Assume that B is a basis for V. Condition 1 of Definition 3.6 tells us
that a vector v in V can be expressed in the form of Eq. (2). Suppose now that v
can also be written in the form of Eq. (3). Subtracting Eq. (3) from Eq. (2), we
obtain

    (r₁ - s₁)b₁ + (r₂ - s₂)b₂ + · · · + (rₖ - sₖ)bₖ = 0.

Because B is independent, we see that r₁ - s₁ = 0, r₂ - s₂ = 0, . . . , rₖ - sₖ = 0,
and so rᵢ = sᵢ for each i. Thus we have established uniqueness.
    Now assume that each vector in V can be expressed uniquely in the form of
Eq. (2). In particular, this is true of the zero vector. This means that no
dependence relation in B is possible. That is, if

    r₁b₁ + r₂b₂ + · · · + rₖbₖ = 0

for vectors bᵢ in B and scalars rᵢ, then, because we always have

    0b₁ + 0b₂ + · · · + 0bₖ = 0,

each rᵢ = 0 by uniqueness. This shows that B is independent. Because B also
generates V (by hypothesis), it is a basis.

Dimension

We will have to restrict our treatment here to finitely generated vector
spaces—that is, those that can be spanned by a finite number of vectors. We
can argue just as we did for Rⁿ that every finitely generated vector space V
contains a basis.* Namely, if V = sp(v₁, v₂, . . . , vₖ), then we can examine the
vectors vᵢ in turn, and delete at each step any that can be expressed as linear
combinations of those that remain. We would also like to know that any two
bases have the same number of elements, so that we can have a well-defined
concept of dimension for a finitely generated vector space. The main tool is
Theorem 2.2 on page 130, which we can restate for general vector spaces,
rather than for subspaces of Rⁿ.

*If we are willing to assume the Axiom of Choice:


Given a collection of nonempty sets, no two of which have an element in common, there exists a
“choice set” C that contains exactly one element from each set in the collection.
then we can prove that every vector space has a basis, and that given a vector space V, every basis
has the same number of elements, although we may be totally unable to actually specify a basis.
This kind of work is regarded as magnificent mathematics by some and as abstract nonsense by
others. For example, using the Axiom of Choice, we can prove that the space P∞ of all formal
power series in x has a basis. But we are still unable to exhibit one! It has been shown that the
Axiom of Choice is independent of the other axioms of set theory.

THEOREM 3.4 Relative Size of Spanning and Independent Sets

Let V be a vector space. Let w₁, w₂, . . . , wₖ be vectors in V that span V,
and let v₁, v₂, . . . , vₘ be vectors in V that are independent. Then
k ≥ m.

PROOF The proof is the same, word for word, as the proof of Theo-
rem 2.2.

It is not surprising that the proof of the preceding theorem is the same as
that of Theorem 2.2. The next section will show that we can expect Chapter 2
arguments to be valid whenever we deal just with finitely generated vector
spaces.
The same arguments as those in the corollary to Theorem 2.2 give us the
following corollary to Theorem 3.4.

COROLLARY Invariance of Dimension for Finitely Generated Spaces

Let V be a finitely generated vector space. Then any two bases of V


have the same number of elements.

We can now rewrite the definition of dimension for finitely generated


vector spaces.

DEFINITION 3.7 Dimension of a Finitely Generated Vector Space

Let V be a finitely generated vector space. The number of elements in a


basis for V is the dimension of V, and is denoted by dim(V).

ILLUSTRATION 10  Let Pₙ be the vector space of polynomials in x of degree at most n. Because
{1, x, x², . . . , xⁿ} is a basis for Pₙ, we see that dim(Pₙ) = n + 1.

ILLUSTRATION 11  The set E of matrices Eᵢⱼ in Illustration 2 is a basis for the vector space Mₘₙ
of all m × n matrices, so dim(Mₘₙ) = mn.

By the same arguments that we used for Rⁿ (page 132), we see that for a
finitely generated vector space V, every independent set of vectors in V can be
enlarged, if necessary, to a basis. Also, if dim(V) = k, then every independent
set of k vectors in V is a basis for V, and every set of k vectors that spans V is a
basis for V. (See Theorem 2.3 on page 133.)

EXAMPLE 4  Determine whether S = {1 - x, 2 - 3x², x + 2x²} is a basis for the vector space
P₂ of polynomials of degree at most 2, together with the zero polynomial.

SOLUTION  We know that dim(P₂) = 3 because {1, x, x²} is a basis for P₂. Thus S will be a
basis if and only if S is an independent set. We can rewrite the dependence
relation format

    r(1 - x) + s(2 - 3x²) + t(x + 2x²) = 0

as

    (r + 2s) + (-r + t)x + (-3s + 2t)x² = 0.

Because {1, x, x²} is independent, this relation can hold if and only if

     r + 2s      = 0
    -r      +  t = 0
        -3s + 2t = 0.

Reducing the coefficient matrix of this homogeneous system, we obtain

    [ 1  2  0]   [1  2  0]   [1  2    0]
    [-1  0  1] ~ [0  2  1] ~ [0  1  1/2].
    [ 0 -3  2]   [0 -3  2]   [0  0  7/2]

We see at once that the homogeneous system with this coefficient matrix
has only the trivial solution. Thus no dependence relation exists, and so
S is an independent set with the necessary number of vectors, and is thus
a basis.
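As a quick cross-check of this conclusion, note that a square homogeneous system has only the trivial solution exactly when its coefficient matrix is invertible. Here is a short Python verification of our own:

    import numpy as np

    A = np.array([[ 1.,  2., 0.],
                  [-1.,  0., 1.],
                  [ 0., -3., 2.]])
    print(np.linalg.det(A))   # 7.0, which is nonzero, so only the trivial solution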

EXAMPLE 5  Find a basis for the vector space P₃ (polynomials of degree at most 3, and 0)
containing the polynomials x² + 1 and x² - 1.
SOLUTION  First we observe that the two given polynomials are independent because
neither is a scalar multiple of the other. The vectors

    x² + 1,  x² - 1,  1,  x,  x²,  x³

generate P₃ because the last four of these form a basis for P₃. We can reduce
this list of vectors to a basis by deleting any vector that is a linear combina-
tion of others in the list, being sure to retain the first two. For this ex-
ample, it is actually easier to notice that surely x is not in sp(x² + 1, x² - 1)
and x³ is not in sp(x² - 1, x² + 1, x). Thus the set {x² + 1, x² - 1, x, x³} is inde-
pendent. Because dim(P₃) = 4 and this independent set contains four vectors,
it must be a basis for P₃. Alternatively, we could have deleted 1 and x² by
noticing that

    1 = (1/2)(x² + 1) - (1/2)(x² - 1)  and  x² = (1/2)(x² + 1) + (1/2)(x² - 1).

SUMMARY

1. A subset W of a vector space V is a subspace of V if and only if it is
   nonempty and satisfies the two closure properties:
   v + w is contained in W for all vectors v and w in W, and
   rv is contained in W for all vectors v in W and all scalars r.
2. Let X be a subset of a vector space V. The set sp(X) of all linear
combinations of vectors in X is a subspace of V called the span of X, or the
subspace of V generated by the vectors in X. It is the smallest subspace of V
containing all the vectors in X.
3. A vector space V is finitely generated if V = sp(X) for some set X =
   {v₁, v₂, . . . , vₖ} containing only a finite number of vectors in V.
4. A set X of vectors in a vector space V is linearly dependent if there exists a
   dependence relation

       r₁v₁ + r₂v₂ + · · · + rₖvₖ = 0,    at least one rᵢ ≠ 0,

   where each vᵢ ∈ X and each rᵢ ∈ R. The set X is linearly independent if no
   such dependence relation exists, and so a linear combination of vectors in
   X is the zero vector only if all scalar coefficients are zero.
5. A set B of vectors in a vector space V is a basis for V if B spans V and is
independent.
6. A subset B of nonzero vectors in a vector space V is a basis for V if and only
if every nonzero vector in V can be expressed as a linear combination of
vectors in B in a unique way.
7. If X is a finite set of vectors spanning a vector space V, then X can be
reduced, if necessary, to a basis for V by deleting in turn any vector that
can be expressed as a linear combination of those remaining.
8. If a vector space V has a finite basis, then all bases for V have the same
number of vectors. The number of vectors in a basis for Vis the dimension
of V, denoted by dim(V).
9. The following are equivalent for n vectors in a vector space V where
dim(V) = n.
a. The vectors are linearly independent.
b. The vectors generate V.

EXERCISES

In Exercises 1-6, determine whether the indicated subset is a subspace of the given vector space.

1. The set of all polynomials of degree greater than 3 together with the zero polynomial in the vector space P of all polynomials with coefficients in R
2. The set of all polynomials of degree 4 together with the zero polynomial in the vector space P of all polynomials in x

3. The set of all functions f such that f(0) = 1 in the vector space F of all functions mapping R into R
4. The set of all functions f such that f(1) = 0 in the vector space F of all functions mapping R into R
5. The set of all functions f in the vector space W of differentiable functions mapping R into R (see Illustration 6) such that f′(2) = 0
6. The set of all functions f in the vector space W of differentiable functions mapping R into R (see Illustration 6) such that f has derivatives of all orders
7. Let F be the vector space of functions mapping R into R. Show that
   a. sp(sin²x, cos²x) contains all constant functions,
   b. sp(sin²x, cos²x) contains the function cos 2x,
   c. sp(7, sin²2x) contains the function 8 cos 4x.
8. Let P be the vector space of polynomials. Prove that sp(1, x) = sp(1 + 2x, x). [Hint: Show that each of these subspaces is a subset of the other.]
9. Let V be a vector space, and let v₁ and v₂ be vectors in V. Follow the hint of Exercise 8 to prove that
   a. sp(v₁, v₂) = sp(v₁, 2v₁ + v₂),
   b. sp(v₁, v₂) = sp(v₁ + v₂, v₁ - v₂).
10. Let v₁, v₂, . . . , vₖ and w₁, w₂, . . . , wₘ be vectors in a vector space V. Give a necessary and sufficient condition, involving linear combinations, for

        sp(v₁, v₂, . . . , vₖ) = sp(w₁, w₂, . . . , wₘ).

In Exercises 11-13, determine whether the given set of vectors is dependent or independent.

11. {x² - 1, x² + 1, 4x, 2x - 3} in P
12. {1, 4x + 3, 3x - 4, x² + 2, x - x²} in P
13. {1, sin²x, cos 2x, cos²x} in F

In Exercises 14-19, use the technique discussed following Example 3 to determine whether the given set of functions in the vector space F is independent or dependent.

14. {sin x, cos x}
15. {1, x, x²}
16. {sin x, sin 2x, sin 3x}
17. {sin x, sin(-x)}
18. {eˣ, e²ˣ, e³ˣ}
19. {1, eˣ + e⁻ˣ, eˣ - e⁻ˣ}

In Exercises 20 and 21, determine whether or not the given set of vectors is a basis for the indicated vector space.

20. {x, x + 1, (x - 1)²} for P₂
21. {x, (x + 1)², (x - 1)²} for P₂

In Exercises 22-24, find a basis for the given subspace of the vector space.

22. sp(x² - 1, x² + 1, 4x, 2x - 3) in P
23. sp(1, 4x + 3, 3x - 4, x² + 2, x - x²) in P
24. sp(1, sin²x, cos 2x, cos²x) in F
25. Mark each of the following True or False.
    ___ a. The set consisting of the zero vector is a subspace for every vector space.
    ___ b. Every vector space has at least two distinct subspaces.
    ___ c. Every vector space with a nonzero vector has at least two distinct subspaces.
    ___ d. If {v₁, v₂, . . . , vₙ} is a subset of a vector space V, then vᵢ is in sp(v₁, v₂, . . . , vₙ) for i = 1, 2, . . . , n.
    ___ e. If {v₁, v₂, . . . , vₙ} is a subset of a vector space V, then the sum vᵢ + vⱼ is in sp(v₁, v₂, . . . , vₙ) for all choices of i and j from 1 to n.
    ___ f. If u + v lies in a subspace W of a vector space V, then both u and v lie in W.
    ___ g. Two subspaces of a vector space V may have empty intersection.
    ___ h. If S is independent, each vector in V can be expressed uniquely as a linear combination of vectors in S.
    ___ i. If S is independent and generates V, each vector in V can be expressed uniquely as a linear combination of vectors in S.
    ___ j. If each vector in V can be expressed uniquely as a linear combination of vectors in S, then S is an independent set.

26. Let V be a vector space. Mark each of the following True or False.
    ___ a. Every independent set of vectors in V is a basis for the subspace the vectors span.
    ___ b. If {v₁, v₂, . . . , vₙ} generates V, then each v ∈ V is a linear combination of the vectors in this set.
    ___ c. If {v₁, v₂, . . . , vₙ} generates V, then each v ∈ V is a unique linear combination of the vectors in this set.
    ___ d. If {v₁, v₂, . . . , vₙ} generates V and is independent, then each v ∈ V is a unique linear combination of the vectors in this set.
    ___ e. If {v₁, v₂, . . . , vₙ} generates V, then this set of vectors is independent.
    ___ f. If each vector in V is a unique linear combination of the vectors in the set {v₁, v₂, . . . , vₙ}, then this set is independent.
    ___ g. If each vector in V is a unique linear combination of the vectors in the set {v₁, v₂, . . . , vₙ}, then this set is a basis for V.
    ___ h. All vector spaces having a basis are finitely generated.
    ___ i. Every independent subset of a finitely generated vector space V is a part of some basis for V.
    ___ j. Any two bases in a finite-dimensional vector space V have the same number of elements.
27. Let W₁ and W₂ be subspaces of a vector space V. Prove that the intersection W₁ ∩ W₂ is also a subspace of V.
28. Let W₁ = sp([1, 2, 3], [2, 1, 1]) and W₂ = sp([1, 0, 1], [3, 0, -1]) in R³. Find a set of generating vectors for W₁ ∩ W₂.
29. Let V be a vector space with basis {v₁, v₂, v₃}. Prove that {v₁, v₁ + v₂, v₁ + v₂ + v₃} is also a basis for V.
30. Let V be a vector space with basis {v₁, v₂, . . . , vₙ}, and let W = sp(v₃, v₄, . . . , vₙ). If w = r₁v₁ + r₂v₂ is in W, show that w = 0.
31. Let {v₁, v₂, v₃} be a basis for a vector space V. Prove that the vectors w₁ = v₁ + v₂, w₂ = v₂ + v₃, w₃ = v₁ - v₃ do not generate V.
32. Let {v₁, v₂, v₃} be a basis for a vector space V. Prove that, if w is not in sp(v₁, v₂), then {v₁, v₂, w} is also a basis for V.
33. Let {v₁, v₂, . . . , vₙ} be a basis for a vector space V, and let w = r₁v₁ + r₂v₂ + · · · + rₙvₙ with rᵢ ≠ 0. Prove that

        {v₁, v₂, . . . , vᵢ₋₁, w, vᵢ₊₁, . . . , vₙ}

    is a basis for V.
34. Let W and U be subspaces of a vector space V, and let W ∩ U = {0}. Let {w₁, w₂, . . . , wₖ} be a basis for W, and let {u₁, u₂, . . . , uₘ} be a basis for U. Prove that, if each vector v in V is expressible in the form v = w + u for w ∈ W and u ∈ U, then

        {w₁, w₂, . . . , wₖ, u₁, u₂, . . . , uₘ}

    is a basis for V.
35. Illustrate Exercise 34 with nontrivial subspaces W and U of R³.
36. Prove that, if W is a subspace of an n-dimensional vector space V and dim(W) = n, then W = V.
37. Let v₁, v₂, . . . , vₖ be a list of nonzero vectors in a vector space V such that no vector in this list is a linear combination of its predecessors. Show that the vectors in the list form an independent set.
38. Exercise 37 indicates that a finite generating set for a vector space can be reduced to a basis by deleting, from left to right in a list of the vectors, each vector that is a linear combination of its predecessors. Use this technique to find a basis for the subspace

        sp(x² + 1, x² + x - 1, 3x - 6, x³ + x² + 1, x³)

    of the polynomial space P.
39. We once watched a speaker in a lecture derive the equation f(x) sin x + g(x) cos x = 0, and then say, "Now everyone knows that sin x and cos x are independent functions, so f(x) = 0 and g(x) = 0." Was the statement correct or incorrect? Give a proof or a counterexample.

40. A homogeneous linear nth-order differential equation has the form

        fₙ(x)y⁽ⁿ⁾ + fₙ₋₁(x)y⁽ⁿ⁻¹⁾ + · · · + f₁(x)y′ + f₀(x)y = 0.

    Show that the set of all solutions of this equation that lie in the space F of all functions mapping R into R is a subspace of F.
41. Referring to Exercise 40, suppose that the differential equation

        fₙ(x)y⁽ⁿ⁾ + fₙ₋₁(x)y⁽ⁿ⁻¹⁾ + · · · + f₁(x)y′ + f₀(x)y = g(x)

    does have a solution y = p(x) in the space F of all functions mapping R into R. By analogy with Theorem 1.18 on p. 97, describe the structure of the set of solutions of this equation that lie in F.
42. Solve the differential equation y′ = 2x and describe your solution in terms of your answer to Exercise 41, or in terms of the answer in the back of the text.

It is a theorem of differential equations that if the functions fᵢ(x) of the differential equation in Exercise 40 are all constant, then all the solutions of the equation lie in the vector space F of all functions mapping R into R and form a subspace of F of dimension n. Thus every solution can be written as a linear combination of n independent functions in F that form a basis for the solution space.

In Exercises 43-45, use your knowledge of calculus and the solution of Exercise 41 to describe the solution set of the given differential equation. You should be able to work these problems without having had a course in differential equations, using the hints.

43. a. y″ + y = 0 [Hint: You need to find two independent functions such that when you differentiate twice, you get the negative of the function you started with.]
    b. y″ + y = x [Hint: Find one solution by experimentation.]
44. a. y″ - 4y = 0 [Hint: What two independent functions, when differentiated twice, give 4 times the original function?]
    b. y″ - 4y = x [Hint: Find one solution by experimentation.]
45. a. y‴ - 9y′ = 0 [Hint: Try to find values of m such that y = eᵐˣ is a solution.]
    b. y‴ - 9y′ = x² + 2x [Hint: Find one solution by experimentation.]
46. Let S be any set and let F be the set of all functions mapping S into R. Let W be the subset of F consisting of all functions f ∈ F such that f(s) = 0 for all but a finite number of elements s in S.
    a. Show that W is a subspace of F.
    b. What condition must be satisfied to have W = F?
47. Referring to Exercise 46, describe a basis B for the subspace W of F. Explain why B is not a basis for F unless F = W.

3.3 COORDINATIZATION OF VECTORS


Much of the work in this text is phrased in terms of the Euclidean vector
spaces Rⁿ for n = 1, 2, 3, . . . . In this section we show that, for finite-
dimensional vector spaces, no loss of generality results from restricting
ourselves to the spaces Rⁿ. Specifically, we will see that if a vector space V has
dimension n, then V can be coordinatized so that it will look just like Rⁿ. We
can then work with these coordinates by utilizing the matrix techniques we
have developed for the space Rⁿ. Throughout this section, we consider V to be a
finite-dimensional vector space.

Ordered Bases

The vector [2, 5] in R² can be expressed in terms of the standard basis vectors
as 2e₁ + 5e₂. The components of [2, 5] are precisely the coefficients of these
basis vectors. The vector [2, 5] is different from the vector [5, 2], just as the
point (2, 5) is different from the point (5, 2). We regard the standard basis
vectors as having a natural order, e₁ = [1, 0] and e₂ = [0, 1]. In a nonzero vector
space V with a basis B = {b₁, b₂, . . . , bₙ}, there is usually no natural order for
the basis vectors. For example, the vectors b₁ = [-1, 5] and b₂ = [3, 2] form a
basis for R², but there is no natural order for these vectors. If we want the
vectors to have an order, we must specify their order. By convention, set
notation does not denote order; for example, {b₁, b₂} = {b₂, b₁}. To describe
order, we use parentheses, ( ), in place of set braces, { }; we are used to
paying attention to order in the notation (b₁, b₂). We denote an ordered basis of
n vectors in V by B = (b₁, b₂, . . . , bₙ). For example, the standard basis
{e₁, e₂, e₃} of R³ gives rise to six different ordered bases—namely,

    (e₁, e₂, e₃)  (e₂, e₁, e₃)  (e₃, e₁, e₂)  (e₁, e₃, e₂)  (e₂, e₃, e₁)  (e₃, e₂, e₁).

These correspond to the six possible orders for the unit coordinate vectors.
The ordered basis (e₁, e₂, e₃) is the standard ordered basis for R³, and in general,
the basis E = (e₁, e₂, . . . , eₙ) is the standard ordered basis for Rⁿ.

Coordinatization of Vectors

Let V be a finite-dimensional vector space, and let B = (b₁, b₂, . . . , bₙ) be a
basis for V. By Theorem 3.3, every vector v in V can be expressed in the form

    v = r₁b₁ + r₂b₂ + · · · + rₙbₙ

for unique scalars r₁, r₂, . . . , rₙ. We associate the vector [r₁, r₂, . . . , rₙ] in Rⁿ
with v. This gives us a way of coordinatizing V.

DEFINITION 3.8  Coordinate Vector Relative to an Ordered Basis

Let B = (b₁, b₂, . . . , bₙ) be an ordered basis for a finite-dimensional
vector space V, and let

    v = r₁b₁ + r₂b₂ + · · · + rₙbₙ.

The vector [r₁, r₂, . . . , rₙ] is the coordinate vector of v relative to the
ordered basis B, and is denoted by v_B.

ILLUSTRATION 1  Let Pₙ be the vector space of polynomials of degree at most n. There are two
natural choices for an ordered basis for Pₙ—namely,

    B = (1, x, x², . . . , xⁿ)  and  B′ = (xⁿ, xⁿ⁻¹, . . . , x², x, 1).

Taking n = 4, we see that for the polynomial p(x) = -x + x³ + 2x⁴ we have
p(x)_B = [0, -1, 0, 1, 2] and p(x)_B′ = [2, 1, 0, -1, 0].

EXAMPLE 1  Find the coordinate vectors of [1, -1] and of [-1, -8] relative to the ordered
basis B = ([1, -1], [1, 2]) of R².
SOLUTION  We see that [1, -1]_B = [1, 0], because

    [1, -1] = 1[1, -1] + 0[1, 2].

To find [-1, -8]_B, we must find r₁ and r₂ such that [-1, -8] = r₁[1, -1] +
r₂[1, 2]. Equating components of this vector equation, we obtain the linear
system

     r₁ +  r₂ = -1
    -r₁ + 2r₂ = -8.

The solution of this system is r₁ = 2, r₂ = -3, so we have [-1, -8]_B = [2, -3].
Figure 3.1 indicates the geometric meaning of these coordinates.

EXAMPLE 2  Find the coordinate vector of [1, 2, -2] relative to the ordered basis B =
([1, 1, 1], [1, 2, 0], [1, 0, 1]) in R³.
SOLUTION  We must express [1, 2, -2] as a linear combination of the basis vectors in B.
Working with column vectors, we must solve the equation

       [1]      [1]      [1]   [ 1]
    r₁ [1] + r₂ [2] + r₃ [0] = [ 2]
       [1]      [0]      [1]   [-2]

FIGURE 3.1  [-1, -8]_B = [2, -3].

for r₁, r₂, and r₃. We find the unique solution by a Gauss-Jordan reduction:

    [1  1  1 |  1]             [1  0  0 | -4]
    [1  2  0 |  2] ~  · · ·  ~ [0  1  0 |  3]
    [1  0  1 | -2]             [0  0  1 |  2]

Therefore, [1, 2, -2]_B = [-4, 3, 2].

We now box the procedure illustrated by Example 2.

Finding the Coordinate Vector of v in Rⁿ Relative to an Ordered Basis
B = (b₁, b₂, . . . , bₙ)

Step 1  Writing vectors as column vectors, form the augmented
        matrix [b₁ b₂ · · · bₙ | v].
Step 2  Use a Gauss-Jordan reduction to obtain the augmented
        matrix [I | v_B], where I is the n × n identity matrix and v_B is
        the desired coordinate vector.
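In matrix terms, Step 2 simply solves the linear system whose coefficient matrix has the basis vectors as columns. Here is a short Python sketch of the boxed procedure, our own computational restatement applied to the data of Example 2, in which np.linalg.solve plays the role of the Gauss-Jordan reduction.

    import numpy as np

    # Ordered basis B = (b1, b2, b3) of R^3, written as the columns of a matrix.
    B = np.column_stack([[1., 1., 1.],
                         [1., 2., 0.],
                         [1., 0., 1.]])
    v = np.array([1., 2., -2.])

    v_B = np.linalg.solve(B, v)   # coordinate vector of v relative to B
    print(v_B)                    # [-4.  3.  2.], as found in Example 2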

Coordinatization of a Finite-Dimensional Vector Space

We can coordinatize a finite-dimensional vector space V by selecting an
ordered basis B = (b₁, b₂, . . . , bₙ) and associating with each vector in V its
unique coordinate vector relative to B. This gives a one-to-one correspondence
between all the vectors in V and all the vectors in Rⁿ. To show that we may now
work in Rⁿ rather than in V, we have to show that the vector-space operations
of vector addition and scalar multiplication in V are mirrored by those
operations on coordinate vectors in Rⁿ. That is, we must show that

    (v + w)_B = v_B + w_B  and  (tv)_B = t v_B    (1)

for all vectors v and w in V and for all scalars t in R. To do this, suppose that

    v = r₁b₁ + r₂b₂ + · · · + rₙbₙ

and

    w = s₁b₁ + s₂b₂ + · · · + sₙbₙ.

Because

    v + w = (r₁ + s₁)b₁ + (r₂ + s₂)b₂ + · · · + (rₙ + sₙ)bₙ,



we see that the coordinate vector of v + w is

    (v + w)_B = [r₁ + s₁, r₂ + s₂, . . . , rₙ + sₙ]
              = [r₁, r₂, . . . , rₙ] + [s₁, s₂, . . . , sₙ]
              = v_B + w_B,

which is the sum of the coordinate vectors of v and of w. Similarly, for any
scalar t, we have

    tv = t(r₁b₁ + r₂b₂ + · · · + rₙbₙ)
       = (tr₁)b₁ + (tr₂)b₂ + · · · + (trₙ)bₙ,

so the coordinate vector of tv is

    (tv)_B = [tr₁, tr₂, . . . , trₙ] = t[r₁, r₂, . . . , rₙ] = t v_B.

This completes the demonstration of relations (1). These relations tell us that,
when we rename the vectors in V by coordinates relative to B, the resulting
vector space of coordinates—namely, Rⁿ—has the same vector-space struc-
ture as V. Whenever the vectors in a vector space V can be renamed to make V
appear structurally identical to a vector space W, we say that V and W are
isomorphic vector spaces. Our discussion shows that every real vector space
having a basis of n vectors is isomorphic to Rⁿ. For example, the space Pₙ of all
polynomials of degree at most n is isomorphic to Rⁿ⁺¹, because Pₙ has an
ordered basis B = (xⁿ, xⁿ⁻¹, . . . , x², x, 1) of n + 1 vectors. Each polynomial

    aₙxⁿ + aₙ₋₁xⁿ⁻¹ + · · · + a₁x + a₀

can be renamed by its coordinate vector

    [aₙ, aₙ₋₁, . . . , a₁, a₀]

relative to B. The adjective isomorphic is used throughout algebra to signify


that two algebraic structures are identical except in the names of their
elements.
For a vector space V isomorphic to a vector space W, all of the algebraic
properties of vectors in V that can be derived solely from the axioms of a
vector space correspond to identical properties in W. However, we cannot
expect other features—such as whether the vectors are functions, matrices, or
n-tuples—to be carried over from one space to the other. But a generating set
of vectors or an independent set of vectors in one space corresponds to a set
with the same property in the other space. Here is an example showing how we
can simplify computations in a finite-dimensional vector space V by working
instead in Rⁿ.

EXAMPLE 3  Determine whether x² - 3x + 2, 3x² + 5x - 4, and 7x² + 21x - 16 are
independent in the vector space P₂ of polynomials of degree at most 2.

SOLUTION  We take B = (x², x, 1) as an ordered basis for P₂. The coordinate vectors
relative to B of the given polynomials are

    (x² - 3x + 2)_B = [1, -3, 2],
    (3x² + 5x - 4)_B = [3, 5, -4],
    (7x² + 21x - 16)_B = [7, 21, -16].

We can determine whether the polynomials are independent by determining
whether the corresponding coordinate vectors in R³ are independent. We set
up the usual matrix, with these vectors as column vectors, and then we
row-reduce it, obtaining

    [ 1   3    7]   [1   3    7]   [1  3  7]
    [-3   5   21] ~ [0  14   42] ~ [0  1  3].
    [ 2  -4  -16]   [0 -10  -30]   [0  0  0]

Because the third column in the echelon form has no pivot, these three
coordinate vectors in R³ are dependent, and so the three polynomials are
dependent.

Continuing Example 3, to further illustrate working with coordinates in
Rⁿ, we can reduce the final matrix further to

    [1  0  -2]
    [0  1   3].
    [0  0   0]

If we imagine a partition between the second and third columns, we see that

    [7, 21, -16] = -2[1, -3, 2] + 3[3, 5, -4].

Thus

    7x² + 21x - 16 = -2(x² - 3x + 2) + 3(3x² + 5x - 4).
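The coefficients -2 and 3 can themselves be found by machine. The following Python sketch, our own check of the computation just made, solves for the combination of the first two coordinate vectors that produces the third.

    import numpy as np

    p1 = np.array([1., -3.,  2.])     # coordinates of x^2 - 3x + 2
    p2 = np.array([3.,  5., -4.])     # coordinates of 3x^2 + 5x - 4
    p3 = np.array([7., 21., -16.])    # coordinates of 7x^2 + 21x - 16

    # Solve c1*p1 + c2*p2 = p3; lstsq returns the exact solution here
    # because the three vectors are dependent.
    A = np.column_stack([p1, p2])
    coeffs, *_ = np.linalg.lstsq(A, p3, rcond=None)
    print(coeffs)                     # [-2.  3.]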

EXAMPLE 4  It can be shown that the set {1, sin x, sin 2x, . . . , sin nx} is an independent
subset of the vector space F of all functions mapping R into R. Find a basis for
the subspace of F spanned by the functions

    f₁(x) = 3 - sin x + 3 sin 2x - sin 3x + 5 sin 4x,
    f₂(x) = 1 + 2 sin x + 4 sin 2x - sin 4x,
    f₃(x) = -1 + 5 sin x + 5 sin 2x + sin 3x - 7 sin 4x,
    f₄(x) = 3 sin 2x - sin 4x.

SOLUTION  We see that all these functions lie in the subspace W of F given by W =
sp(1, sin x, sin 2x, sin 3x, sin 4x), and we take

    B = (1, sin x, sin 2x, sin 3x, sin 4x)



as an ordered basis for this subspace. Working with coordinates relative to B,
the problem reduces to finding a basis for the subspace of R⁵ spanned by
[3, -1, 3, -1, 5], [1, 2, 4, 0, -1], [-1, 5, 5, 1, -7], and [0, 0, 3, 0, -1]. We
reduce the matrix having these as column vectors, and begin this by switching
minus the fourth row with the first row to create the pivot 1 in the first column.
We obtain

    [ 3   1  -1   0]   [1   0  -1   0]   [1  0  -1   0]   [1  0  -1  0]
    [-1   2   5   0]   [0   2   4   0]   [0  1   2   0]   [0  1   2  0]
    [ 3   4   5   3] ~ [0   4   8   3] ~ [0  0   0   3] ~ [0  0   0  1].
    [-1   0   1   0]   [0   1   2   0]   [0  0   0   0]   [0  0   0  0]
    [ 5  -1  -7  -1]   [0  -1  -2  -1]   [0  0   0  -1]   [0  0   0  0]

Because the first, second, and fourth columns have pivots, we should keep the
first, second, and fourth of the original column vectors, so that the set
{f₁(x), f₂(x), f₄(x)} is a basis for the subspace W.
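The column-selection step of this example can be automated: keep a column exactly when it is not a linear combination of the columns already kept, which is the same as saying that it raises the rank. The sketch below is our own Python version of that left-to-right selection, applied to the coordinate vectors of Example 4.

    import numpy as np

    cols = [np.array([ 3., -1., 3., -1.,  5.]),   # f1
            np.array([ 1.,  2., 4.,  0., -1.]),   # f2
            np.array([-1.,  5., 5.,  1., -7.]),   # f3
            np.array([ 0.,  0., 3.,  0., -1.])]   # f4

    kept = []
    for j, c in enumerate(cols):
        trial = np.column_stack(kept + [c])
        # keep column j only if it raises the rank, i.e., carries a new pivot
        if np.linalg.matrix_rank(trial) == len(kept) + 1:
            kept.append(c)
            print("keep f%d" % (j + 1))   # prints: keep f1, keep f2, keep f4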

We give an example showing the utility of different bases in the study of
polynomial functions. We know that a polynomial function

    y = p(x) = aₙxⁿ + · · · + a₂x² + a₁x + a₀

has a graph that passes through the origin if a₀ = 0. If both a₀ and a₁ are zero,
then the graph not only goes through the origin, but is quite flat there; indeed,
if a₂ ≠ 0, then the function behaves approximately like a₂x² very near x = 0,
because for very small values of x, such as 0.00001, the value of x² is much
greater than the values of x³, x⁴, and other higher powers of x. If a₂ = 0 also, but
a₃ ≠ 0, then the function behaves very much like a₃x³ for such values of x very
close to 0, etc. If instead of studying a polynomial function near x = 0, we want
to study it near some other x-value, say near x = a, then we would like to
express the polynomial as a linear combination of powers (x - a)ⁱ—that is,

    p(x) = bₙ(x - a)ⁿ + · · · + b₂(x - a)² + b₁(x - a) + b₀.

Both B = (xⁿ, . . . , x², x, 1) and B′ = ((x - a)ⁿ, . . . , (x - a)², x - a, 1) are
ordered bases for the space Pₙ of polynomials of degree at most n. (We leave the
demonstration that B′ is a basis as Exercise 20.) We give an example
illustrating a method for expressing the polynomial x³ + x² - x - 1 as a linear
combination of (x + 1)³, (x + 1)², x + 1, and 1.

EXAMPLE 5  Find the coordinate vector of p(x) = x³ + x² - x - 1 relative to the ordered
basis B′ = ((x + 1)³, (x + 1)², x + 1, 1).
SOLUTION  Multiplying out the powers of x + 1, to express the vectors in B′ in terms of our
usual ordered basis B = (x³, x², x, 1) for P₃, we see that

    B′ = (x³ + 3x² + 3x + 1, x² + 2x + 1, x + 1, 1).


Using coordinates relative to the ordered basis B, our problem reduces to
expressing the vector [1, 1, —1, —1] as a linear combination of the vectors

[1, 3, 3, 1], [0, 1, 2, 1], [0, 0, 1, 1], and [0, 0, 0, 1]. Reducing the matrix
corresponding to the associated linear system, we obtain

    [1  0  0  0 |  1]   [1  0  0  0 |  1]   [1  0  0  0 |  1]
    [3  1  0  0 |  1] ~ [0  1  0  0 | -2] ~ [0  1  0  0 | -2]
    [3  2  1  0 | -1]   [0  2  1  0 | -4]   [0  0  1  0 |  0].
    [1  1  1  1 | -1]   [0  1  1  1 | -2]   [0  0  0  1 |  0]

Thus the required coordinate vector is p(x)_B′ = [1, -2, 0, 0], and so

    x³ + x² - x - 1 = (x + 1)³ - 2(x + 1)².

Linear algebra is not the only tool that can be used to solve the problem in
Example 5. Exercise 13 suggests a polynomial algebra solution, and Exercise
16 describes a calculus solution.
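The calculus solution mentioned for Exercise 16 also mechanizes nicely, because the coefficient of (x - a)ᵏ is p⁽ᵏ⁾(a)/k!. The sketch below is our own Python illustration of that approach, using NumPy's poly1d objects (coefficients listed from the highest power down) with a = -1; it reproduces the answer of Example 5.

    import numpy as np
    from math import factorial

    p = np.poly1d([1., 1., -1., -1.])   # x^3 + x^2 - x - 1
    a = -1.0

    # b_k = p^(k)(a) / k!  is the coefficient of (x - a)^k
    coeffs = [np.polyder(p, k)(a) / factorial(k) for k in range(4)]
    print(coeffs)    # [0.0, 0.0, -2.0, 1.0], i.e., (x + 1)^3 - 2(x + 1)^2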

SUMMARY

Let V be a vector space with basis {b₁, b₂, . . . , bₙ}.

1. B = (b₁, b₂, . . . , bₙ) is an ordered basis; the vectors are regarded as being in
   a specified order in this n-tuple notation.
2. Each vector v in V has a unique expression as a linear combination:
   v = r₁b₁ + r₂b₂ + · · · + rₙbₙ.
3. The vector v_B = [r₁, r₂, . . . , rₙ] for the uniquely determined scalars rᵢ in the
   preceding equation (summary item 2) is the coordinate vector of v relative
   to B.
4. The vector space V can be coordinatized, using summary item 3, so that V
   is isomorphic to Rⁿ.

EXERCISES
In Exercises 1-10, find the coordinate vector of the given vector relative to the indicated ordered basis.

1. [-1, 1] in R² relative to ([0, 1], [1, 0])
2. [-2, 4] in R² relative to ([0, -2], [-1, 0])
3. [4, 6, 2] in R³ relative to ([2, 0, 0], [0, 1, 1], [0, 0, 1])
4. [4, -2, 1] in R³ relative to ([0, 1, 1], [2, 0, 0], [0, 3, 0])
5. [3, 13, -1] in R³ relative to ([1, 3, -2], [4, 1, 3], [-1, 2, 0])
6. [9, 6, 11, 0] in R⁴ relative to ([1, 0, 1, 0], [2, 1, 1, 1], [0, 1, 1, -1], [2, 1, 3, 1])
7. x³ + x² - 2x + 4 in P₃ relative to (1, x², x, x³)
8. x³ + 3x² - 4x + 2 in P₃ relative to (x, x² - 1, x³, 2)
9. x⁴ + x in P₄ relative to (1, 2x - 1, x³ + x⁴, 2x³, x² + 2)
10. The 2 × 2 matrix

        [·  ·]
        [3  4]

    in M₂ relative to

        [0  1]  [0  -1]  [1  -1]  [0  1]
        [1  0], [0   0], [0   3], [0  1]

11. Find the coordinate vector of the polynomial x³ - 4x² + 3x + 7 relative to the ordered basis B′ = ((x - 2)³, (x - 2)², (x - 2), 1) of

    the vector space P₃ of polynomials of degree at most 3. Use the method illustrated in Example 5.
12. Find the coordinate vector of the polynomial 4x³ - 9x² + x relative to the ordered basis B′ = ((x - 1)³, (x - 1)², (x - 1), 1) of the vector space P₃ of polynomials of degree at most 3. Use the method illustrated in Example 5.
13. Example 5 showed how to use linear algebra to rewrite the polynomial p(x) = x³ + x² - x - 1 in powers of x + 1 rather than in powers of x. This exercise indicates a polynomial algebra solution to this problem. Replace x in p(x) by [(x + 1) - 1], and expand using the binomial theorem, keeping the (x + 1) intact. Check your answer with that in Example 5.
14. Repeat Exercise 11 using the polynomial algebra method indicated in Exercise 13.
15. Repeat Exercise 12 using the polynomial algebra method indicated in Exercise 13.
16. Example 5 showed how to use linear algebra to rewrite the polynomial p(x) = x³ + x² - x - 1 in powers of x + 1 rather than in powers of x. This exercise indicates a calculus solution to this problem. Form the equation

        x³ + x² - x - 1 = b₃(x + 1)³ + b₂(x + 1)² + b₁(x + 1) + b₀.

    Find b₀ by substituting x = -1 in this equation. Then equate the derivatives of both sides, and substitute x = -1 to find b₁. Continue differentiating both sides and substituting x = -1 to find b₂ and b₃. Check your answer with that in Example 5.
17. Repeat Exercise 11 using the calculus method indicated in Exercise 16.
18. Repeat Exercise 12 using the calculus method indicated in Exercise 16.
19. a. Prove that {1, sin x, cos x, sin 2x, cos 2x} is an independent set of functions in the vector space F of all functions mapping R into R.
    b. Find a basis for the subspace of F generated by the functions

        f₁(x) = 1 - 2 sin x + 4 cos x - sin 2x - 3 cos 2x,
        f₂(x) = 2 - 3 sin x - cos x + 4 sin 2x + 5 cos 2x,
        f₃(x) = 5 - 8 sin x + 2 cos x + 7 sin 2x + 7 cos 2x,
        f₄(x) = -1 + 14 cos x - 11 sin 2x - 19 cos 2x

20. Prove that for every positive integer n and every a ∈ R, the set

        {(x - a)ⁿ, (x - a)ⁿ⁻¹, . . . , (x - a)², x - a, 1}

    is a basis for the vector space Pₙ of polynomials of degree at most n.
21. Find the polynomial in P₂ whose coordinate vector relative to the ordered basis B = (x + x², x - x², 1 + x) is [3, 1, 2].
22. Let V be a nonzero finite-dimensional vector space. Mark each of the following True or False.
    ___ a. The vector space V is isomorphic to Rⁿ for some positive integer n.
    ___ b. There is a unique coordinate vector associated with each vector v ∈ V.
    ___ c. There is a unique coordinate vector associated with each vector v ∈ V relative to a basis for V.
    ___ d. There is a unique coordinate vector associated with each vector v ∈ V relative to an ordered basis for V.
    ___ e. Distinct vectors in V have distinct coordinate vectors relative to the same ordered basis B for V.
    ___ f. The same vector in V cannot have the same coordinate vector relative to different ordered bases for V.
    ___ g. There are six possible ordered bases for R³.
    ___ h. There are six possible ordered bases for R³ consisting of the standard unit coordinate vectors in R³.
    ___ i. A reordering of elements in an ordered basis for V corresponds to a similar reordering of components in coordinate vectors with respect to the basis.
    ___ j. Addition and multiplication by scalars in V can be computed in terms of coordinate vectors with respect to any fixed ordered basis for V.

3.4 LINEAR TRANSFORMATIONS

Linear transformations mapping Rⁿ into Rᵐ were defined in Section 2.3. Now that we have considered vectors in more general spaces than Rⁿ, it is natural to extend the notion to linear transformations of other vector spaces, not necessarily finite-dimensional. In this section, we introduce linear transformations in a general setting. Recall that a linear transformation T: Rⁿ → Rᵐ is a function that satisfies

T(u + v) = T(u) + T(v)     (1)

and

T(ru) = rT(u)     (2)

for all vectors u and v in Rⁿ and for all scalars r.

Linear Transformations T: V— V’

The definition of a linear transformation of a vector space V into a vector


space V’ is practically identical to the definition for the Euclidean vector
spaces in Section 2.3. We need only replace R’ by Vand R™ by V’.

DEFINITION 3.9 Linear Transformation

A function T that maps a vector space V into a vector space V' is a linear transformation if it satisfies two criteria:

1. T(u + v) = T(u) + T(v),     Preservation of addition
2. T(ru) = rT(u),              Preservation of scalar multiplication

for all vectors u and v in V and for all scalars r in R.

Exercise 35 shows that the two conditions of Definition 3.9 may be combined into the single condition

T(ru + sv) = rT(u) + sT(v)     (3)

for all vectors u and v in V and for all scalars r and s in R. Mathematical induction can be used to verify the analogous relation for n summands:

T(r₁v₁ + r₂v₂ + ··· + rₙvₙ) = r₁T(v₁) + r₂T(v₂) + ··· + rₙT(vₙ).     (4)


We remind you of some terminology and notation defined in Section 2.3 for functions in general and linear transformations in particular. We state things here in the language of linear transformations. For a linear transformation T: V → V', the set V is the domain of T and the set V' is the codomain of T. If W is a subset of V, then T[W] = {T(w) | w ∈ W} is the image of W under T. In particular, T[V] is the range of T. Similarly, if W' is a subset of V', then T⁻¹[W'] = {v ∈ V | T(v) ∈ W'} is the inverse image of W' under T. In particular, T⁻¹[{0'}] is the kernel of T, denoted by ker(T). It consists of all of the vectors in V that T maps into 0'.
Let V, V', and V'' be vector spaces, and let T: V → V' and T': V' → V'' be linear transformations. The composite transformation T' ∘ T: V → V'' is defined by (T' ∘ T)(v) = T'(T(v)) for v in V. Exercise 36 shows that T' ∘ T is again a linear transformation.

EXAMPLE 1  Let F be the vector space of all functions f: R → R, and let D be its subspace of all differentiable functions. Show that differentiation is a linear transformation of D into F.
SOLUTION  Let T: D → F be defined by T(f) = f', the derivative of f. Using the familiar rules

(f + g)' = f' + g'     and     (rf)' = r(f')

for differentiation from calculus, we see that

T(f + g) = (f + g)' = f' + g' = T(f) + T(g)

and

T(rf) = (rf)' = r(f') = rT(f)

for all functions f and g in D and scalars r. In other words, these two rules for differentiating a sum of functions and for differentiating a scalar times a function constitute precisely the assertion that differentiation is a linear transformation.

EXAMPLE 2  Let F be the vector space of all functions f: R → R, and let c be in R. Show that the evaluation function T: F → R defined by T(f) = f(c), which maps each function f in F into its value at c, is a linear transformation.

HISTORICAL NOTE  THE CONCEPT OF A LINEAR SUBSTITUTION dates back to the eighteenth century. But it was only after physicists became used to dealing with vectors that the idea of a function of vectors became explicit. One of the founders of vector analysis, Oliver Heaviside (1850-1925), introduced the idea of a linear vector operator in one of his works on electromagnetism in 1885. He defined it using coordinates: B comes from H by a linear vector operator if, when B has components B₁, B₂, B₃ and H has components H₁, H₂, H₃, there are numbers μᵢⱼ, for i, j = 1, 2, 3, where

B₁ = μ₁₁H₁ + μ₁₂H₂ + μ₁₃H₃
B₂ = μ₂₁H₁ + μ₂₂H₂ + μ₂₃H₃
B₃ = μ₃₁H₁ + μ₃₂H₂ + μ₃₃H₃.

In his lectures at Yale, which were published in 1901, J. Willard Gibbs called this same transformation a linear, vector function. But he also defined this more abstractly as a continuous function f such that f(v + w) = f(v) + f(w). A fully abstract definition, exactly like Definition 3.9, was given by Hermann Weyl in Space-Time-Matter (1918).
Oliver Heaviside was a self-taught expert on mathematical physics who played an important role in the development of electromagnetic theory and especially its practical applications. In 1901 he predicted the existence of a reflecting ionized region surrounding the earth; the existence of this layer, now called the ionosphere, was soon confirmed.

SOLUTION  We show that T preserves addition and scalar multiplication. If f and g are functions in the vector space F, then, evaluating f + g at c, we obtain

(f + g)(c) = f(c) + g(c).

Therefore,

T(f + g) = (f + g)(c)      Definition of T
         = f(c) + g(c)     Definition of f + g in F
         = T(f) + T(g).    Definition of T

This shows that T preserves addition. In a similar manner, the computation

T(rf) = (rf)(c)            Definition of T
      = r(f(c))            Definition of rf in F
      = r(T(f))            Definition of T

shows that T preserves scalar multiplication.

EXAMPLE 3  Let C_{a,b} be the vector space of all continuous functions mapping the closed interval a ≤ x ≤ b of R into R. Show that T: C_{a,b} → R defined by T(f) = ∫ₐᵇ f(x) dx is a linear transformation.
SOLUTION  From properties of the definite integral, we know that for f, g ∈ C_{a,b} and for any scalar r, we have

T(f + g) = ∫ₐᵇ (f(x) + g(x)) dx = ∫ₐᵇ f(x) dx + ∫ₐᵇ g(x) dx = T(f) + T(g)

and

T(rf) = ∫ₐᵇ rf(x) dx = r ∫ₐᵇ f(x) dx = rT(f).

This shows that T is indeed a linear transformation.
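The linearity of the definite integral can also be checked numerically. Below is a minimal sketch (Python with NumPy; this code is not part of the text, and the interval, the functions f and g, and the scalar r are chosen only for illustration) that approximates T(f) = ∫ₐᵇ f(x) dx by the trapezoid rule and verifies both conditions of Definition 3.9 up to roundoff.

    import numpy as np

    a, b = 0.0, 1.0                      # any interval [a, b] will do
    x = np.linspace(a, b, 10001)         # sample points for the quadrature

    def T(f):
        # approximate the definite integral of f over [a, b]
        return np.trapz(f(x), x)

    f, g, r = np.sin, np.exp, 3.0

    print(np.isclose(T(lambda t: f(t) + g(t)), T(f) + T(g)))   # True: T(f + g) = T(f) + T(g)
    print(np.isclose(T(lambda t: r * f(t)), r * T(f)))         # True: T(rf) = rT(f)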

EXAMPLE 4  Let C be the vector space of all continuous functions mapping R into R. Let a ∈ R and let Tₐ: C → C be defined by Tₐ(f) = ∫ₐˣ f(t) dt. Show that Tₐ is a linear transformation.
SOLUTION  This follows from the same properties of the integral that we used in the solution of the preceding example. From calculus, we know that the range of Tₐ is actually a subset of the vector space of differentiable functions mapping R into R. (Theorem 3.7, which follows shortly, will show that the range is actually a subspace.)

EXAMPLE 5  Let D_∞ be the space of functions mapping R into R that have derivatives of all orders, and let a₀, a₁, a₂, ..., aₙ ∈ R. Show that T: D_∞ → D_∞ defined by T(f) = aₙf⁽ⁿ⁾(x) + ··· + a₂f''(x) + a₁f'(x) + a₀f(x) is a linear transformation.
SOLUTION  This follows from the fact that the ith derivative of a sum of functions is the sum of their ith derivatives, that is, (f + g)⁽ⁱ⁾(x) = f⁽ⁱ⁾(x) + g⁽ⁱ⁾(x), and that the ith derivative of rf(x) is rf⁽ⁱ⁾(x), together with the fact that T(f) is defined to be a linear combination of these derivatives. (We consider f(x) to be the 0th derivative.)

Note that the computation of T(f) in Example 5 amounts to the computation of the left-hand side of the general linear differential equation with constant coefficients

aₙy⁽ⁿ⁾ + ··· + a₂y'' + a₁y' + a₀y = g(x)     (5)

for y = f(x). Thus, solving this differential equation amounts to finding all f ∈ D_∞ such that T(f) = g(x).

Properties of Linear Transformations


The two properties of linear transformations in our next theorem are useful
and easy to prove.

THEOREM 3.5  Preservation of Zero and Subtraction

Let V and V' be vector spaces, and let T: V → V' be a linear transformation. Then

1. T(0) = 0', and                       Preservation of zero
2. T(v₁ − v₂) = T(v₁) − T(v₂)           Preservation of subtraction

for any vectors v₁ and v₂ in V.

PROOF  We establish preservation of zero by taking r = 0 and u = 0 in condition 2 of Definition 3.9 for a linear transformation. Condition 2 and the property 0v = 0 (see Theorem 3.1) yield

T(0) = T(00) = 0T(0) = 0'.

Preservation of subtraction follows from Eq. (3), as follows:

T(v₁ − v₂) = T(v₁ + (−1)v₂)
           = T(v₁) + (−1)T(v₂)
           = T(v₁) − T(v₂).

EXAMPLE 6  Let D be the vector space of all differentiable functions and F the vector space of all functions mapping R into R. Determine whether T: D → F defined by T(f) = 2f''(x) + 3f'(x) + x² is a linear transformation.

SOLUTION  Because for the zero constant function we have T(0) = 2(0'') + 3(0') + x² = 2(0) + 3(0) + x² = x², and x² is not the zero constant function, we see that T does not preserve zero, and so it is not a linear transformation.

Just as for a linear transformation T: Rⁿ → Rᵐ discussed in Section 2.3, a linear transformation of a vector space is determined by its action on any basis for the domain. (See Exercises 40 and 41 as well.) Because our bases here may be infinite, we restate the result and give the short proof.

THEOREM 3.6  Bases and Linear Transformations

Let T: V → V' be a linear transformation, and let B be a basis for V. For any vector v in V, the vector T(v) is uniquely determined by the vectors T(b) for all b ∈ B. In other words, if two linear transformations have the same value at each basis vector b ∈ B, the two transformations have the same value at each vector in V; that is, they are the same transformation.

PROOF  Let T and T̄ be two linear transformations such that T(bᵢ) = T̄(bᵢ) for each vector bᵢ in B. Let v ∈ V. Then there exist vectors b₁, b₂, ..., bₖ in B and scalars r₁, r₂, ..., rₖ such that

v = r₁b₁ + r₂b₂ + ··· + rₖbₖ.

We then have

T(v) = T(r₁b₁ + r₂b₂ + ··· + rₖbₖ)
     = r₁T(b₁) + r₂T(b₂) + ··· + rₖT(bₖ)
     = r₁T̄(b₁) + r₂T̄(b₂) + ··· + rₖT̄(bₖ)
     = T̄(r₁b₁ + r₂b₂ + ··· + rₖbₖ)
     = T̄(v).

Thus T and T̄ are the same transformation.

The next theorem also generalizes results that appear in the text and exercises of Section 2.3 for linear transformations mapping Rⁿ into Rᵐ.

THEOREM 3.7  Preservation of Subspaces

Let V and V' be vector spaces, and let T: V → V' be a linear transformation.
1. If W is a subspace of V, then T[W] is a subspace of V'.
2. If W' is a subspace of V', then T⁻¹[W'] is a subspace of V.
PROOF
1. Because T(0) = 0', we need only show that T[W] is closed under vector addition and under scalar multiplication. Let T(w₁) and T(w₂) be any vectors in T[W], where w₁ and w₂ are vectors in W. Then

T(w₁) + T(w₂) = T(w₁ + w₂),

by preservation of addition. Now w₁ + w₂ is in W because W is itself closed under addition, and so T(w₁ + w₂) is in T[W]. This shows that T[W] is closed under vector addition. If r is any scalar, then rw₁ is in W, and

rT(w₁) = T(rw₁).

This shows that rT(w₁) is in T[W], and so T[W] is closed under scalar multiplication. Thus, T[W] is a subspace of V'.
2. Notice that 0 ∈ T⁻¹[W']. Let v₁ and v₂ be any vectors in T⁻¹[W'], so that T(v₁) and T(v₂) are in W'. Then

T(v₁ + v₂) = T(v₁) + T(v₂)

is also in the subspace W', and so v₁ + v₂ is in T⁻¹[W']. For any scalar r, we know that

rT(v₁) = T(rv₁),

and rT(v₁) is in W'. Thus, rv₁ is also in T⁻¹[W']. This shows that T⁻¹[W'] is closed under addition and under scalar multiplication, and so T⁻¹[W'] is a subspace of V.

The Equation T(x) = b

Let T: V → V' be a linear transformation. From Theorem 3.7, we know that ker(T) = T⁻¹[{0'}] is a subspace of V. This subspace is the solution set of the homogeneous transformation equation T(x) = 0'. The structure of the solution set of T(x) = b exactly parallels the structure of the solution set of Ax = b described on p. 97. Namely,

Solution Set of T(x) = b

Let T: V → V' be a linear transformation and let T(p) = b for a particular vector p in V. The solution set of T(x) = b is the set
{p + h | h ∈ ker(T)}.

The proof is essentially the same as that of Theorem 1.18; we ask you to write it out for this case in Exercise 46. This boxed result shows that if ker(T) = {0}, then T(x) = b has at most one solution, and so T is one-to-one, meaning that T(v₁) = T(v₂) implies that v₁ = v₂. Conversely, if T is one-to-one, then T(x) = 0' has only one solution, namely 0, so ker(T) = {0}. We box this fact.

Condition for T to Be One-to-One

A linear transformation T is one-to-one if and only if ker(T) = {0}.

ILLUSTRATION 1  The differential equation y'' − 4y = x² is linear with constant coefficients and is of the form of Eq. (5). In Example 5, we showed that the left-hand side of such an equation defines a linear transformation of D_∞ into itself. In differential equations, it is shown that the kernel of the transformation T(f) = f'' − 4f is two-dimensional. We can check that the independent functions h₁(x) = e²ˣ and h₂(x) = e⁻²ˣ both satisfy the differential equation y'' − 4y = 0. Thus {e²ˣ, e⁻²ˣ} is a basis for the kernel of T. All solutions of the homogeneous equation are of the form c₁e²ˣ + c₂e⁻²ˣ. We find a particular solution p(x) of y'' − 4y = x² by inspection: if y = −x²/4, then the term −4y yields x², but the second derivative of −x²/4 is −1/2; we can kill off this unwanted −1/2 by taking

p(x) = −(1/4)x² − 1/8

(note that the second derivative of −1/8 is 0). Thus the general solution of this differential equation is

y = c₁e²ˣ + c₂e⁻²ˣ − (1/4)x² − 1/8.
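The general solution just found can be verified symbolically. The following sketch (Python with SymPy; not part of the text) checks that y = c₁e²ˣ + c₂e⁻²ˣ − x²/4 − 1/8 satisfies y'' − 4y = x² for arbitrary constants c₁ and c₂.

    import sympy as sp

    x, c1, c2 = sp.symbols('x c1 c2')
    y = c1*sp.exp(2*x) + c2*sp.exp(-2*x) - x**2/4 - sp.Rational(1, 8)

    residual = sp.diff(y, x, 2) - 4*y - x**2   # y'' - 4y - x^2 should be identically zero
    print(sp.simplify(residual))               # prints 0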

Invertible Transformations

In Section 2.2, we saw that, if A is an invertible n × n matrix, the linear transformation T: Rⁿ → Rⁿ defined by T(x) = Ax has an inverse T⁻¹: Rⁿ → Rⁿ, where T⁻¹(y) = A⁻¹y, and that both composite transformations T⁻¹ ∘ T and T ∘ T⁻¹ are the identity transformation of Rⁿ. We now extend this idea to linear transformations of vector spaces in general.

DEFINITION 3.10  Invertible Transformation

Let V and V' be vector spaces. A linear transformation T: V → V' is invertible if there exists a linear transformation T⁻¹: V' → V such that T⁻¹ ∘ T is the identity transformation on V and T ∘ T⁻¹ is the identity transformation on V'.

EXAMPLE 7  Determine whether the evaluation transformation T: F → R of Example 2, where T(f) = f(c) for some fixed c in R, is invertible.
SOLUTION  Consider the polynomial functions f and g, where f(x) = 2x + c and g(x) = 4x − c. Then T(f) = T(g) = 3c. If T were invertible, there would have to be a linear transformation T⁻¹: R → F such that T⁻¹(3c) is both f and g. But this is impossible.

Example 7 illustrates that, if T: V → V' is an invertible linear transformation, T must satisfy the following property:

if v₁ ≠ v₂, then T(v₁) ≠ T(v₂).     One-to-one     (6)

As in the argument of Example 7, if T(v₁) = T(v₂) = v' and T is invertible, there would have to be a linear transformation T⁻¹: V' → V such that T⁻¹(v') is simultaneously v₁ and v₂, which is impossible when v₁ ≠ v₂.
An invertible linear transformation T: V → V' must also satisfy another property:

if v' is in V', then T(v) = v' for some v in V.     Onto     (7)

This follows at once from the fact that, for T⁻¹ with the properties in Definition 3.10 and for v' in V', we have T⁻¹(v') = v for some v in V. But then v' = (T ∘ T⁻¹)(v') = T(T⁻¹(v')) = T(v). A transformation T: V → V' satisfying property (7) is onto V'; in this case the range of T is all of V'. We have thus proved half of the following theorem.*

THEOREM 3.8  Invertibility of T

A linear transformation T: V— V’ is invertible if and only if it is


one-to-one and onto V’.

PROOF  We have just shown that, if T is invertible, it must be one-to-one and onto V'.
Suppose now that T is one-to-one and onto V'. Because T is onto V', for each v' in V' we can find v in V such that T(v) = v'. Because T is one-to-one, this vector v in V is unique. Let T⁻¹: V' → V be defined by T⁻¹(v') = v, where v is the unique vector in V such that T(v) = v'. Then

(T ∘ T⁻¹)(v') = T(T⁻¹(v')) = T(v) = v'

and

(T⁻¹ ∘ T)(v) = T⁻¹(T(v)) = T⁻¹(v') = v,

which shows that T ∘ T⁻¹ is the identity map of V' and that T⁻¹ ∘ T is the identity map of V. It only remains to show that T⁻¹ is indeed a linear transformation, that is, that

T⁻¹(v₁' + v₂') = T⁻¹(v₁') + T⁻¹(v₂')     and     T⁻¹(rv₁') = rT⁻¹(v₁')

for all v₁' and v₂' in V' and for all scalars r. Let v₁ and v₂ be the unique vectors in V such that T(v₁) = v₁' and T(v₂) = v₂'. Remembering that T⁻¹ ∘ T is the identity map, we have

T⁻¹(v₁' + v₂') = T⁻¹(T(v₁) + T(v₂)) = T⁻¹(T(v₁ + v₂))
              = (T⁻¹ ∘ T)(v₁ + v₂) = v₁ + v₂ = T⁻¹(v₁') + T⁻¹(v₂').

Similarly,

T⁻¹(rv₁') = T⁻¹(rT(v₁)) = T⁻¹(T(rv₁)) = rv₁ = rT⁻¹(v₁').

*This theorem is valid for functions in general.

The proof of Theorem 3.8 shows that, if T: V → V' is invertible, the linear transformation T⁻¹: V' → V described in Definition 3.10 is unique. This transformation T⁻¹ is the inverse transformation of T.

ILLUSTRATION 2  Let D be the vector space of differentiable functions and F the vector space of all functions mapping R into R. Then T: D → F, where T(f) = f', the derivative of f, is not an invertible transformation. Specifically, we have T(x) = T(x + 17) = 1, showing that T is not one-to-one. Notice also that the kernel of T is not {0}, but consists of all constant functions.

ILLUSTRATION 3  Let P be the vector space of all polynomials in x and let W be the subspace of all polynomials in x with constant term 0, so that q(0) = 0 for q(x) ∈ W. Let T: P → W be defined by T(p(x)) = xp(x). Then T is a linear transformation. Because xp₁(x) = xp₂(x) if and only if p₁(x) = p₂(x), we see that T is one-to-one. Every polynomial q(x) in W contains no constant term, and so it can be factored as q(x) = xp(x); because T(p(x)) = q(x), we see that T maps P onto W. Thus T is an invertible linear transformation.

Isomorphism

An isomorphism is a linear transformation T: V → V' that is one-to-one and onto V'. Theorem 3.8 shows that isomorphisms are precisely the invertible linear transformations T: V → V'. If an isomorphism T exists, then it is invertible and its inverse is also an isomorphism. The vector spaces V and V' are said to be isomorphic in this case. We view isomorphic vector spaces V and V' as being structurally identical in the following sense. Let T: V → V' be an isomorphism. Rename each v in V by the v' = T(v) in V'. Because T is one-to-one, no two different elements of V get the same name from V', and because T is onto V', all names in V' are used. The renamed V and V' then appear identical as sets. But they also have the same algebraic structure as vector spaces, as Figure 3.2 illustrates. We discussed a special case of this concept before Example 3 in Section 3.3, indicating informally that every finite-dimensional vector space is structurally the same as Rⁿ for some n. We are now in a position to state this formally.

THEOREM 3.9  Coordinatization of Finite-Dimensional Spaces

Let V be a finite-dimensional vector space with ordered basis B = (b₁, b₂, ..., bₙ). The map T: V → Rⁿ defined by T(v) = v_B, the coordinate vector of v relative to B, is an isomorphism.

PROOF  Equation 1 in Section 3.3 shows that T preserves addition and scalar multiplication. Moreover, T is one-to-one, because the coordinate vector v_B of v uniquely determines v, and the range of T is all of Rⁿ. Therefore, T is an isomorphism.

The isomorphism of V with Rⁿ, described in Theorem 3.9, is by no means unique. There is one such isomorphism for each choice of an ordered basis B of V.
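Computationally, for V = Rⁿ with an ordered basis B, the coordinate vector v_B is simply the solution of the linear system Mr = v, where the columns of M are the basis vectors. Below is a minimal sketch (Python with NumPy; not part of the text, with a basis of R³ chosen only for illustration).

    import numpy as np

    b1 = np.array([1.0, 3.0, -2.0])
    b2 = np.array([4.0, 1.0, 3.0])
    b3 = np.array([-1.0, 2.0, 0.0])
    M = np.column_stack([b1, b2, b3])    # columns are the ordered basis vectors

    v = np.array([3.0, 13.0, -1.0])
    v_B = np.linalg.solve(M, v)          # r1, r2, r3 with v = r1*b1 + r2*b2 + r3*b3
    print(v_B)
    print(np.allclose(M @ v_B, v))       # True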
Let V and V' be vector spaces of dimensions n and m, respectively. By Theorem 3.9, we can choose ordered bases B for V and B' for V', and essentially convert V into Rⁿ and V' into Rᵐ by renaming vectors by their coordinate vectors relative to these bases. Then each linear transformation T: V → V' corresponds to a linear transformation T̄: Rⁿ → Rᵐ in a natural way, and we can answer questions about T by studying T̄ instead. But we can study T̄ in turn by studying its standard matrix representation A, as we will illustrate shortly. This matrix changes if we change the ordered bases B or B'. A good deal of the remainder of our text is devoted to studying how to choose B and B' when m = n so that the square matrix A has a simple form that illuminates the structure of the transformation T. This is the thrust of Chapters 5 and 7.

FIGURE 3.2
(a) Adding v₁ and v₂ in V and then renaming gives v₁' + v₂' = (v₁ + v₂)' in V'; the equal sign shows that the renaming preserves vector addition.
(b) Multiplying v in V by a scalar r and then renaming gives rv' = (rv)' in V'; the equal sign shows that the renaming preserves scalar multiplication.

Matrix Representations of Transformations


In Section 2.3, we saw that every linear transformation T: Rⁿ → Rᵐ can be computed using its standard matrix representation A, where A is the m × n matrix having as jth column vector T(eⱼ). Namely, for this matrix A and any column vector x ∈ Rⁿ, we have T(x) = Ax.
We just showed that finite-dimensional vector spaces V and V', where dim(V) = n and dim(V') = m, are isomorphic to Rⁿ and Rᵐ, respectively. We can rename vectors in V and V' by their coordinate vectors relative to ordered bases B = (b₁, b₂, ..., bₙ) of V and B' = (b₁', b₂', ..., bₘ') of V', and work with the coordinates, doing computations in Rⁿ and Rᵐ. In particular, a linear transformation T: V → V' and a choice of ordered bases B and B' then gives a linear transformation T̄: Rⁿ → Rᵐ such that for v ∈ V we have T̄(v_B) = T(v)_{B'}. This transformation T̄ has an m × n standard matrix representation whose jth column vector is T̄(eⱼ). Now eⱼ is the coordinate vector, relative to B, of the jth vector bⱼ in the ordered basis B, so T̄(eⱼ) = T̄((bⱼ)_B) = T(bⱼ)_{B'}. Thus we see that the standard matrix representation of T̄ is

A = [T(b₁)_{B'}   T(b₂)_{B'}   ···   T(bₙ)_{B'}].     (8)

We summarize in a theorem.

THEOREM 3.10  Matrix Representations of Linear Transformations

Let V and V' be finite-dimensional vector spaces and let B = (b₁, b₂, ..., bₙ) and B' = (b₁', b₂', ..., bₘ') be ordered bases for V and V', respectively. Let T: V → V' be a linear transformation, and let T̄: Rⁿ → Rᵐ be the linear transformation such that for each v ∈ V, we have T̄(v_B) = T(v)_{B'}. Then the standard matrix representation of T̄ is the matrix A whose jth column vector is T(bⱼ)_{B'}, and T(v)_{B'} = Av_B for all vectors v ∈ V.

DEFINITION 3.11  Matrix Representation of T Relative to B,B'

The matrix A in Eq. (8) and described in Theorem 3.10 is the matrix representation of T relative to B,B'.

EXAMPLE 8  Let V be the subspace sp(sin x cos x, sin²x, cos²x) of the vector space D of all differentiable functions mapping R into R. Differentiation gives a linear transformation T of V into itself. Find the matrix representation A of T relative to B,B', where B = B' = (sin x cos x, sin²x, cos²x). Use A to compute the derivative of

f(x) = 3 sin x cos x − 5 sin²x + 7 cos²x.

SOLUTION  We find that

T(sin x cos x) = −sin²x + cos²x     so  T(sin x cos x)_B = [0, −1, 1],
T(sin²x) = 2 sin x cos x            so  T(sin²x)_B = [2, 0, 0],   and
T(cos²x) = −2 sin x cos x           so  T(cos²x)_B = [−2, 0, 0].

Using Eq. (8), we have

    |  0  2  −2 |                    |  0  2  −2 | |  3 |   | −24 |
A = | −1  0   0 |   so  A(f(x)_B) =  | −1  0   0 | | −5 | = |  −3 |.
    |  1  0   0 |                    |  1  0   0 | |  7 |   |   3 |

Thus f'(x) = −24 sin x cos x − 3 sin²x + 3 cos²x.
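A quick numerical check of Example 8 (Python with NumPy; not part of the text) multiplies the coordinate vector of f by A and spot-checks the result against a numerical derivative at one arbitrarily chosen point.

    import numpy as np

    A = np.array([[ 0, 2, -2],
                  [-1, 0,  0],
                  [ 1, 0,  0]])

    f_B = np.array([3, -5, 7])            # f(x) = 3 sin x cos x - 5 sin^2 x + 7 cos^2 x
    fprime_B = A @ f_B
    print(fprime_B)                       # [-24  -3   3]

    def f(t):
        return 3*np.sin(t)*np.cos(t) - 5*np.sin(t)**2 + 7*np.cos(t)**2

    def g(t):                             # the function whose coordinate vector is A f_B
        return fprime_B[0]*np.sin(t)*np.cos(t) + fprime_B[1]*np.sin(t)**2 + fprime_B[2]*np.cos(t)**2

    h, t0 = 1e-6, 0.7
    print(np.isclose((f(t0 + h) - f(t0 - h)) / (2*h), g(t0)))   # True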

EXAMPLE 9  Letting Pₙ be the vector space of polynomials of degree at most n, we note that T: P₂ → P₃ defined by T(p(x)) = (x + 1)p(x − 2) is a linear transformation. Find the matrix representation A of T relative to the ordered bases B = (x², x, 1) and B' = (x³, x², x, 1) for P₂ and P₃, respectively. Use A to compute T(p(x)) for p(x) = 5x² − 7x + 18.
SOLUTION  We compute

T(x²) = (x + 1)(x − 2)² = (x + 1)(x² − 4x + 4) = x³ − 3x² + 4,
T(x) = (x + 1)(x − 2) = x² − x − 2,   and   T(1) = (x + 1)1 = x + 1.

Thus T(x²)_{B'} = [1, −3, 0, 4], T(x)_{B'} = [0, 1, −1, −2], and T(1)_{B'} = [0, 0, 1, 1]. Consequently,

    |  1   0  0 |                    |  1   0  0 | |  5 |   |   5 |
A = | −3   1  0 |   so  A(p(x)_B) =  | −3   1  0 | | −7 | = | −22 |.
    |  0  −1  1 |                    |  0  −1  1 | | 18 |   |  25 |
    |  4  −2  1 |                    |  4  −2  1 |          |  52 |

Thus T(p(x)) = 5x³ − 22x² + 25x + 52.
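The matrix of Example 9 can also be built column by column, exactly as in Eq. (8), by expanding T(bⱼ) and reading off its coordinates in B'. A minimal symbolic sketch (Python with SymPy; not part of the text):

    import sympy as sp

    x = sp.symbols('x')
    B = [x**2, x, sp.Integer(1)]                        # ordered basis of P2

    def T(p):
        return sp.expand((x + 1) * p.subs(x, x - 2))    # T(p(x)) = (x + 1) p(x - 2)

    def coords_Bprime(q):
        # coordinates of q relative to B' = (x^3, x^2, x, 1)
        return [q.coeff(x, k) for k in (3, 2, 1, 0)]

    A = sp.Matrix([coords_Bprime(T(b)) for b in B]).T   # columns are T(b_j)_B'
    print(A)                                            # Matrix([[1, 0, 0], [-3, 1, 0], [0, -1, 1], [4, -2, 1]])

    p_B = sp.Matrix([5, -7, 18])                        # p(x) = 5x^2 - 7x + 18
    print((A * p_B).T)                                  # [[5, -22, 25, 52]], i.e. 5x^3 - 22x^2 + 25x + 52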

Let V and V' be n-dimensional vector spaces with ordered bases B and B', respectively. Theorem 3.8 tells us that a linear transformation T: V → V' is invertible if and only if it is one-to-one and onto V'. Under these circumstances, we might expect the following to be true:

Matrix Representation of T⁻¹

The matrix representation of T⁻¹ relative to B',B is the inverse of the matrix representation of T relative to B,B'.

This is indeed the case in view of Theorem 3.10 and the fact that linear transformations of Rⁿ have this property. (See the box on page 151.) Exercise 20 gives an illustration of this.
It is important to note that the matrix representation of the linear transformation T: V → V' depends on the particular bases B for V and B' for V'. We really should use some notation such as A_{B,B'} to denote this dependency. Such notations appear cumbersome and complicated, so we avoid them for now. We will use such a notation in Chapter 7, where we will consider this dependency in more detail.

SUMMARY

Let V and V' be vector spaces, and let T be a function mapping V into V'.

1. The function T is a linear transformation if it preserves addition and scalar multiplication, that is, if

   T(v₁ + v₂) = T(v₁) + T(v₂)

   and

   T(rv₁) = rT(v₁)

   for all vectors v₁ and v₂ in V and for all scalars r in R.
2. If T is a linear transformation, then T(0) = 0' and also T(v₁ − v₂) = T(v₁) − T(v₂).
3. A function T: Rⁿ → Rᵐ is a linear transformation if and only if T has the form T(x) = Ax for some m × n matrix A.
4. The matrix A in summary item 3 is the standard matrix representation of the transformation T.
5. A linear transformation T: V → V' is invertible if and only if it is one-to-one and onto V'. Such transformations are isomorphisms.
6. Every nonzero finite-dimensional real vector space V is isomorphic to Rⁿ, where n = dim(V).

Now let T: V → V' be a linear transformation.

7. If W is a subspace of V, then T[W] is a subspace of V'. If W' is a subspace of V', then T⁻¹[W'] is a subspace of V.
8. The range of T is the subspace {T(v) | v in V} of V', and the kernel ker(T) is the subspace {v ∈ V | T(v) = 0'} of V.
9. T is one-to-one if and only if ker(T) = {0}. If T is one-to-one and has as its range all of V', then T⁻¹: V' → V is well-defined and is a linear transformation. In this case, both T and T⁻¹ are isomorphisms.
10. If T: V → V' is a linear transformation and T(p) = b, then the solution set of T(x) = b is {p + h | h ∈ ker(T)}.
11. Let B = (b₁, b₂, ..., bₙ) and B' = (b₁', b₂', ..., bₘ') be ordered bases for V and V', respectively. The matrix representation of T relative to B,B' is the m × n matrix A having T(bⱼ)_{B'} as its jth column vector. We have T(v)_{B'} = A(v_B) for all v ∈ V.

"| EXERCISES
In Exercises 1-5, let F be the vector space of all 9. Note that one solution of the differential
functions mapping R into R. Determine whether equation y” — 4y = sin x is y = —1 sin x.
the given function T is a linear transformation. If
Use summary itera 10 and Illustration | to
it ts a linear transformation, describe the kernel of
describe all solutions of this equation.
T and determine whether the transformation is
invertible. Let D, be the vector space of functions mapping
R into R that have derivatives of all orders. It
1. T: F— R defined by T(f) = f(-4) can be shown that the kernel of a linear
2. T: F > R defined by 7(f) = f(5) transformation 7: D, > D, of the form 7(/) =
afM+--+++af' + afwherea, # 0 is an
3. T: F > F defined by 7(/) = f+f
n-dimensional subspace of D..
4. T: F > F defined by 7(f) = f+ 3, where3
is the constant function with value 3 for all
xER. In Exercises 10-15, use the preceding
5. T: F > F defined by T(f) = —f information, summary item 10, and your
knowledge of calculus to find all solutions in D, of
6. Let Cy, be the space of continuous functions the given differential equation. See Illustration I
mapping the interval 0 = x = 2 into R. Let in the text.
T: Cy. — R be defined by 7(f) = f? f(x) dx.
See Example 3. If possible, give three
different functions in ker(T). 10. y’ = sin 2x 13. yp" + 4y=x?
7. Let C be the space of all continuous 11. y” = -cos x 14. y’ + y' = 3e*
tunctions mapping R into R, and let 12. y -y=x 15. yO - 2y" =x
T: C—> Che defined by 71f) = Ji f(o) dt. 16. Let V and V’' be vector spaces having
See Example 4. If possible, give three
ordered bases B = (b,, b,, b,) and B’ =
different functions in ker(7).
(bj, b;, b3, bi), respectively. Let 7: VV’
8. Let F be the vector space of all functions be a linear transformation such that
mapping R into R, and let 7: F— Fbea
linear transformation such that T(e*) = x’, T(b,) = 3b; + bj} + 4b) — by
T(e**) = sin x, and 7(1) = cos 5x. Find the T(b,) b) + 2b; — bj + 2b;
following, fit is determined by this data. T(b,) = —2b; — bs + 2b!
a ey c. T(3e")
b. ( + Se* )
7(3 d. 7(e+2e")
et+ 2e% Find the matrix representation A of T relativ
to B,B’.

In Exercises 17-19, let Vand V be vector spaces c. Noting that T° T = D”, find the second
with ordered bases B = (b,, b,, b,) and B’ = derivative of —5x? + 8x7 -3x + 4 by
(b;, b;, b;, b,), respectively, and let T: V—> V’ be multiplying a column vector by an
the linear transformation having the given matrix appropriate matrix.
Aas matrix representation relutive to B,B' Find 22. Let T: P; > P; be defined by 7(p(x)) =
T(v) for the given vector v. x D(p(x)) and let the ordered bases B and
B' be as in Exercise Zi.
41 a. Find the matrix representation A relative
_|2 2
O—

to B,B’.
A=!) 6 ,v=b, +b, +b,
b. Working with the matrix A and
2 1
we

coordinate vectors, find all solutions p(x)


18. A as in Exercise 17, v = 3b, — b, of T(p(x)) = x — 3x7 + 4x.
c. The transformation T can be decomposed
0 4-1)
into T = T,° T,, where 7): P; > P, is
_{if 2 21.0.
19. A=]. 0 i [> ¥ = Ob, — 4b, + b, defined by 7,(p(x)} = D(p(x)) aad
T;: P, — P,is defined by T,(p(x)) = xp(x).
0 1 1
Find the matrix representations of 7,
and T; using the ordered bases B of P;
In Exercises 20-33, we consider A to be the and BY = (x’, x, 1) of P,. Now multiply
matrix representation of the indicated linear these matrix representations for 7, and
transformation T (you may assume it is linear) T, to obtain a 4 X 4 matrix, and
relative to the indicated ordered bases B and B’. compare with the matrix A. What do
Starting with Exercise 21, we let D denote you notice?
differentiating once, D* differentiating twice, and d. Multiply the two matrices found in part
50 on. (c) to obtain a 3 x 3 matrix. Let
T;: P, > P, be the linear transformation
20. Let V and V’ be vector spaces with ordered having this matrix as matrix
bases B = (b,, b,, b,) and B’ = (bi, bj, b5), representation relative to B”,B” for the
respectively. Let T: ’— V' be the linear ordered basis B” of part (c). Find
transformation such that T;(a,x° + a,x + a). How is T, related to
T(b,) = by) + 2b; — 3b; T, and T,?
T(b.) = 3b; + 5b; + 2b; 23. Let V be the subspace sp(x’e", xe*, e*) of the
vector space of all differentiable functions
T(b,) = —2b; — 3b; — 4b}. mapping R into R. Let 7: V— V be the
. Find the matrix A. linear transformation of V into itself given
~

. UseA to find 7(v),, if vz = (2, —5, 1}. by taking second derivatives, so T = D?, and
ao”

c. Show that 7 is invertible, and find the let B = B' = (xe", xe’, e*). Find the matrix
matrix representation of 7”! relative to A by .
B',B. a. following the procedure in summary item
d. Find T~'(v’), if v;. = ([—-1, 1, 3]. 11, and
e. Express 7~'(b}), 7~'(b;), and T~'(b}) b. finding and then squaring the matrix A,
as linear combinations of the vectors that represents the transformation D
in B. corresponding to taking first derivatives.
21. Let T: P; — P, be defined by 7(p(x)) = 24. Let T: P; —> P, be defined by 7(p(x)) =
D(p(x)), the derivative of p(x). Let the p'(2x + 1), where p'(x) = D(p(>)), and let B
ordered bases for P, be B = B’ = = (x, x, x, 1) and B’ = (x*, x, 1).
(x?, x’, x, 1). a. Find the matrix A.
a. Find the matrix A. b. Use A to compute 7(4x° — 5x’ + 4x — 7).
b. Use A to find the derivative of 25. Let V = sp(sin’x, cos’x) and let 7; V-> V be
4x? — 5x? + 10x- 13. defined by taking second derivatives.

Taking B = B’ = (sin’x, cos’x), find A in two 33. For W and B in Exercise 31, find
ways. 7(a sin 2x + bcos 2x) for T: W> W
a. Compute 4 as described in summary whose matrix representation 1s
item 11.
b. Find the space W spanned by the first A > -2of
_|[90
derivatives of the vectors in B, choose an
ordered basis fer W, and compute A as a 34. Let Vand V’ be vector spaces. Mark each of
product of the two matrices representing the following True or False.
the differentiation map from V into —_—.a. A linear transformation of vector spaces
followed by the differentiaticn map from preserves the vector-space operations.
W into V. . Every function mapping V into V’' relates
. Let T: P; — P, be the linear transformation the algebraic structure of V to that of V’.
. A linear transformation T: V— V’
defined by T(p(x)) = D'(p(x)} — 4D(p(x) + carries the zero vector of V into the zero
p(x). Find the matrix representation A of 7,
where B = (x, 1 + x, x + x’, x’). vector of V’.
. A linear transformation 7: V— V'
. Let W = sp(e*, e*, e*) be the subspace of carries a pair v,—v in V into a pair v’,—v’
the vector space of al! real-valued functions in V’.
with domain R, and let B = (e*, e*, e*). . For every vector b’ in V’, the function
Find the matrix representation 4 relative to Ty: V— V' defined by 7,.(v) = b’ for all
B, B of the linear transformation T: W-> W vin Visa linear transformation.
defined by 7.) = D(f) + 2D(N +f . The function Ty: V— V' defined by
28. For W and B in Exercise 27, find the matrix T,(v) = 0’, the zero vector of V’, for all v
representation A of the linear transformation in V is a linear transformation.
T: W— W defined by T() = J2, (0) dt. _— g. The vector space P,, of polynomials of
29. For W and B in Exercise 27, find degree = 10 is isomorphic to R"”.
T(ae* + be* + ce*) for the linear —_—h. There is exactly one isomorphism
transformation T whose matrix T: Pig > R"
representation relative to B,B is i. Let Vand V' be vector spaces of

dimensions 7 and m, respectively. A


101 linear transformation T: V— V' is
A=}0 10 invertible if and only if m = n.
101 — j. If T in part (i) is an invertible
30. Repeat Exercise 29, given that transformation, then m = n.
35. Prove that the two conditions in Definition
[6 0 0 3.9 for a linear transformation are equivalent
A=|0 5 Ol. to the single condition in Eq. (3).
lo 0 -3 36. Let V, V', and V" be vector spaces, and let ~
31. Let W be the subspace sp(sin 2x, cos 2x) of T: V— V' and T: V' > V" be linear
the vector space of all real-valued functions transformations. Prove that the composite
with domain R, and let B = (sin 2x, cos 2x). function (T° J): V— V" defined by
Find the matrix representation A relative to (T° T\(v) = T(T(v)) for each v in V is again
B, B for the linear transformation 7; W > a linear transformation.
W defined by 7(f) = D(f) + 2D(N +f 37. Prove that, if T and 7” are invertible linear
32. For W and B in Exercise 31, find transformations of vector spaces such that
T(a sin 2x + 6 cos 2x) for T: W > W To T is defined, 7 ° T is also invertible.
whosc matrix representation is 38. State conditions for an m x n mairix A that
are equivalent to the condition that the
A= fit] linear transformation 7(x) = ix for x in R”
i 1 is an isomorphism.

39. Let v and w be independent vectors in V, 45. If Vand V’ are the finite-dimensional spaces
and let 7: V > V’ be a one-to-one linear in Exercises 43 and 44 and have ordered
transformation of V into V'. Prove that T(v) bases B and B’, respectively, describe the
and 7(w) are independent vectors in V’, matrix representations of 7, + 7, in Exercise
43 and rT ip Exercise 44 in terms of the
40. Let V and V’ be vector spaces, let B =
{b,, b,,.. ., b,} be a basis for V, and let matrix representations of 7), 7), aud T
cl, ¢;,--+,¢, © V’. Prove that there exists a relative to B,B’.
linear transformation 7: Y¥ > V’ such that 46. Prove that if 7: V— V' is a linear
T(b) = ¢ fori=1,2,...,4, transformation and 7(p) = b. then the
solution set of T(x) = b is
41. State and prove a generalization of Exercise
40 for any vector spaces V and V’, where V {p+ h|h € ker(T)}.
has a basis B. 47. Prove that, for any five linear
transformations 7,, 7}, 73, T,, T; mapping
42. If the matrix representation of T: R’ > R"
relative to B,B is a diagonal matrix, describe R? into R?, there’ exist scalars ¢,, C), C3, C4, Cs
the effect of 7 on the basis vectors in B. (not all of which are zero) such that 7 =
CT, + Ty + 037; + C457, + €5T; has the
Exercises 43 and 44 show that the set L(V, V') of property that 7(x) = 0 for all x in R’.
all linear transformations mapping a vector space 48. Let T: R" > R’ be a linear transformation.
V into a vector space V' is a subspace of the Prove that, if 7(7(x)} = T(x) + T(x) + 3x
vector space of all functions mapping V into V'. for all x in R’, then 7 is a one-to-one
(See summary item 5 in Section 3.1.) mapping of R” into R’.
49. Let V and V’ be vector spaces having the
43. Let 7, and T, be in L(V, V’), and let same finite dimension, and let 7: V— V’' be
(T, + T,): V— V' be defined by a linear transformation. Prove that T is
one-to-one if and only if range(T) = V’.
(T, + T,)(v) = T,(v) + Tv) (Hint: Use Exercise 36 in Section 3.2.]
for each vector v in V. Prove that 7, + T; is 50. Give an example of a vector space V and a
again a linear transformation of V into V’. linear transformation 7: ¥ — ? such that 7
44, Let T be in L(V, V’'), let r be any scalar in R, is one-to-one but range(7) # V. [Hint: By
and let r7: V > V' be defined by Exercise 49, what must be true of the
dimension of V’?]
(rT )(v) = {T()) 51. Repeat Exercise 50, but this time make
for each vector vin V. Prove that rT is again range(7) = V for a transformation T that is
a linear transformation of V into V’ not one-to-one.

3.5 INNER-PRODUCT SPACES (Optional)

In Section 1.2, we introduced the concepts of the length of a vector and the angle between vectors in Rⁿ. Length and angle are defined and computed in Rⁿ using the dot product of vectors. In this section, we discuss these notions for more general vector spaces. We start by recalling the properties of the dot product in Rⁿ, listed in Theorem 1.3, Section 1.2.

Properties of the Dot Product in Rⁿ

Let u, v, and w be vectors in Rⁿ and let r be any scalar in R. The following properties hold:

D1  v · w = w · v,                                    Commutative law
D2  u · (v + w) = u · v + u · w,                      Distributive law
D3  r(v · w) = (rv) · w = v · (rw),                   Homogeneity
D4  v · v ≥ 0, and v · v = 0 if and only if v = 0.    Positivity

There are many vector spaces for which we can define useful dot products satisfying properties D1-D4. In phrasing a general definition for such vector spaces, it is customary to speak of an inner product rather than of a dot product, and to use the notation (v, w) in place of v · w. One such example would be ([2, 3], [4, −1]) = 2(4) + 3(−1) = 5.

DEFINITION 3.12  Inner-Product Space

An inner product on a vector space V is a function that associates with each ordered pair of vectors v, w in V a real number, written (v, w), satisfying the following properties for all u, v, and w in V and for all scalars r:

P1  (v, w) = (w, v),                                     Symmetry
P2  (u, v + w) = (u, v) + (u, w),                        Additivity
P3  r(v, w) = (rv, w) = (v, rw),                         Homogeneity
P4  (v, v) ≥ 0, and (v, v) = 0 if and only if v = 0.     Positivity

An inner-product space is a vector space V together with an inner product on V.

EXAMPLE 1  Determine whether R² is an inner-product space if, for v = [v₁, v₂] and w = [w₁, w₂], we define

(v, w) = 2v₁w₁ + 5v₂w₂.

SOLUTION  We check each of the four properties in Definition 3.12.
P1: Because (v, w) = 2v₁w₁ + 5v₂w₂ and because (w, v) = 2w₁v₁ + 5w₂v₂, the first property holds.
P2: We compute

(u, v + w) = 2u₁(v₁ + w₁) + 5u₂(v₂ + w₂),
(u, v) = 2u₁v₁ + 5u₂v₂,
(u, w) = 2u₁w₁ + 5u₂w₂.

The sum of the right-hand sides of the last two equations equals the right-hand side of the first equation. This establishes property P2.
P3: We compute

r(v, w) = r(2v₁w₁ + 5v₂w₂),
(rv, w) = 2(rv₁)w₁ + 5(rv₂)w₂,
(v, rw) = 2v₁(rw₁) + 5v₂(rw₂).

Because the three right-hand expressions are equal, property P3 holds.
P4: We see that (v, v) = 2v₁² + 5v₂² ≥ 0. Because both terms of the sum are nonnegative, the sum is zero if and only if v₁ = v₂ = 0, that is, if and only if v = 0. Therefore, we have an inner-product space.
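The four properties can also be spot-checked numerically. The sketch below (Python with NumPy; not part of the text, with sample vectors and a scalar chosen arbitrarily) writes the inner product of Example 1 as (v, w) = vᵀWw with W = diag(2, 5).

    import numpy as np

    W = np.diag([2.0, 5.0])                  # <v, w> = 2 v1 w1 + 5 v2 w2

    def ip(v, w):
        return v @ W @ w

    u = np.array([1.0, -2.0])
    v = np.array([3.0,  4.0])
    w = np.array([-1.0, 2.0])
    r = 2.5

    print(np.isclose(ip(v, w), ip(w, v)))                   # P1: symmetry
    print(np.isclose(ip(u, v + w), ip(u, v) + ip(u, w)))    # P2: additivity
    print(np.isclose(r * ip(v, w), ip(r * v, w)))           # P3: homogeneity
    print(ip(v, v) > 0)                                     # P4: positivity for v != 0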

EXAMPLE 2  Determine whether R² is an inner-product space when, for v = [v₁, v₂] and w = [w₁, w₂], we define

(v, w) = 2v₁w₁ − 5v₂w₂.

SOLUTION  A solution similar to the one for Example 1 goes along smoothly until we check property P4, which fails:

([1, 1], [1, 1]) = 2 − 5 = −3 < 0.

Therefore, R² with this definition of ( , ) is not an inner-product space.

EXAMPLE 3  Determine whether the space P₀,₁ of all polynomial functions with real coefficients and domain 0 ≤ x ≤ 1 is an inner-product space if for p and q in P₀,₁ we define

(p, q) = ∫₀¹ p(x)q(x) dx.

SOLUTION  We check the properties for an inner product.
P1: Clearly (p, q) = (q, p) because

∫₀¹ p(x)q(x) dx = ∫₀¹ q(x)p(x) dx.

P2: For polynomial functions p, q, and h, we have

(p, q + h) = ∫₀¹ p(x)(q(x) + h(x)) dx
           = ∫₀¹ p(x)q(x) dx + ∫₀¹ p(x)h(x) dx
           = (p, q) + (p, h).

P3: We have

r ∫₀¹ p(x)q(x) dx = ∫₀¹ (rp(x))q(x) dx = ∫₀¹ p(x)(rq(x)) dx,

so P3 holds.
P4: Because (p, p) = ∫₀¹ p(x)² dx and because p(x)² ≥ 0 for all x, we have (p, p) = ∫₀¹ p(x)² dx ≥ 0. Now p(x)² is a continuous nonnegative polynomial function and can be zero only at a finite number of points unless p(x) is the zero polynomial. It follows that

(p, p) = ∫₀¹ p(x)² dx > 0,

unless p(x) is the zero polynomial. This establishes P4.
Because all four properties of Definition 3.12 hold, the space P₀,₁ is an inner-product space with the given inner product.

We mention that Example 3 is a very famous inner product, and the same definition

(f, g) = ∫₀¹ f(x)g(x) dx

gives an inner product on the space C₀,₁ of all continuous real-valued functions with domain the interval 0 ≤ x ≤ 1. The hypothesis of continuity is essential for the demonstration of P4, as is shown in advanced calculus. Of course, there is nothing unique about the interval 0 ≤ x ≤ 1. Any interval a ≤ x ≤ b can be used in its place. The choice of interval depends on the application.

Magnitude

The condition (v, v) ≥ 0 in the definition of an inner-product space allows us to define the magnitude of a vector, just as we did in Section 1.2 using the dot product.

DEFINITION 3.13  Magnitude or Norm of a Vector

Let V be an inner-product space. The magnitude or norm of a vector v in V is ||v|| = √(v, v).

This definition of norm reduces to the usual definition of magnitude of vectors in Rⁿ when the dot product is used. That is, ||v|| = √(v · v).

Recall that in R² we can visualize the vector v − w geometrically as an arrow reaching from the tip of the arrow representing w to the tip of the arrow representing v, so that ||v − w|| is the distance between the tip of v and the tip of w. This leads us to define the distance between v and w in an inner-product space V to be d(v, w) = ||v − w||.

EXAMPLE 4  In the inner-product space P₀,₁ of all polynomial functions with real coefficients and domain 0 ≤ x ≤ 1, and with inner product defined by

(p, q) = ∫₀¹ p(x)q(x) dx,

(a) find the magnitude of the polynomial p(x) = x + 1, and (b) compute the distance d(x², x) from x² to x.
SOLUTION  For part (a), we have

||x + 1||² = (x + 1, x + 1) = ∫₀¹ (x + 1)² dx = ∫₀¹ (x² + 2x + 1) dx = (x³/3 + x² + x) |₀¹ = 7/3.

Therefore, ||x + 1|| = √(7/3).
For part (b), we have d(x², x) = ||x² − x||. We compute

||x² − x||² = (x² − x, x² − x) = ∫₀¹ (x² − x)² dx = ∫₀¹ (x⁴ − 2x³ + x²) dx = 1/5 − 1/2 + 1/3 = 1/30.

Therefore, d(x², x) = 1/√30.
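Both computations in Example 4 can be reproduced symbolically. A small sketch (Python with SymPy; not part of the text):

    import sympy as sp

    x = sp.symbols('x')

    def ip(p, q):
        # the inner product <p, q> = integral of p(x) q(x) over [0, 1]
        return sp.integrate(p * q, (x, 0, 1))

    print(sp.sqrt(ip(x + 1, x + 1)))          # sqrt(21)/3, the same as sqrt(7/3)
    print(sp.sqrt(ip(x**2 - x, x**2 - x)))    # sqrt(30)/30, the same as 1/sqrt(30)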

The inner product used in Example 4 was not contrived just for this illustration. Another measure of the distance between the functions x and x² over 0 ≤ x ≤ 1 is the maximum vertical distance between their graphs over the interval 0 ≤ x ≤ 1, which calculus easily shows to be 1/4, attained where x = 1/2. In contrast, the inner product in this example uses an integral to measure the distance between the functions, not only at one point of the interval, but over the interval as a whole. Notice that the distance 1/√30 we obtained is less than the maximum distance 1/4 between the graphs, reflecting the fact that the graphs are less than 1/4 unit apart over much of the interval. The functions are shown in Figure 3.3. The notion (used in Example 4) of distance between functions over an interval is very important in advanced mathematics, where it is used in approximating a complicated function over an interval as closely as possible by a function that is easier to handle.


,
IT (1, 1)
rel

!
~
~
——af-——_po~
ele

nee » a,
+

7
nN|—

FIGURE 3.3
Graphs of x and x? overO sx = 1.

It is often best, when working with the norm of a vector v, to work with ||v||² = (v, v) and to introduce the radical in the final stages of the computation.

EXAMPLE 5  Let V be an inner-product space. Verify that

||rv|| = |r| ||v||

for any vector v in V and for any scalar r.

SOLUTION  We have

||rv||² = (rv, rv)
        = r²(v, v)       Applying homogeneity (property P3) twice
        = r²||v||².

On taking square roots, we have ||rv|| = |r| ||v||.

The property of norms in the preceding example can be useful in


computing the magnitude of a vector and in establishing relations between
magnitudes, as illustrated in the following two examples.

EXAMPLE 6  Using the standard inner product in R⁵, find the magnitude of the vector

v = [−6, −12, 6, 18, −6].

SOLUTION  Because v = −6[1, 2, −1, −3, 1], the property of norms in Example 5 tells us that

||v|| = |−6| ||[1, 2, −1, −3, 1]|| = 6√16 = 24.



The Schwarz and Triangle Inequalities

In Section 1.2, we defined the angle θ between two nonzero vectors v and w in Rⁿ to be

θ = arccos( (v · w) / (||v|| ||w||) ).

The validity of this definition rested on the fact that for v, w ∈ Rⁿ, we have

|v · w| ≤ ||v|| ||w||     Schwarz inequality

so that

−1 ≤ (v · w) / (||v|| ||w||) ≤ 1.

The Schwarz inequality with v · w replaced by (v, w) holds in any inner-product space.
space.

THEOREM 3.11  Schwarz Inequality

Let V be an inner-product space, and let v and w be vectors in V. Then

|(v, w)| ≤ ||v|| ||w||.

PROOF  Because the properties required for an inner-product space are modeled on those for the dot product in Rⁿ, we expect the proof here of the Schwarz inequality to be essentially the same as in Theorem 1.4, Section 1.2 for Rⁿ. This is indeed the case. Just replace every occurrence, such as v · w, of a dot product in the proof of Theorem 1.4 by the corresponding inner product, such as (v, w).

We now define the angle between two vectors v and w in any inner-product space to be

θ = arccos( (v, w) / (||v|| ||w||) ).

In particular, we define v and w to be orthogonal (or perpendicular) if (v, w) = 0.
Recall that another important inequality in Rⁿ that follows readily from the Schwarz inequality is

||v + w|| ≤ ||v|| + ||w||.     Triangle inequality

See Theorem 1.5 in Section 1.2. The triangle inequality is also valid in any inner-product space; its proof can also be obtained from the proof of Theorem 1.5 by replacing a dot product such as v · w by the corresponding inner product, (v, w).
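Both inequalities, and the resulting angle, are easy to check numerically for any particular inner product. The sketch below (Python with NumPy; not part of the text, with vectors chosen arbitrarily) uses the inner product (v, w) = 2v₁w₁ + 5v₂w₂ of Example 1.

    import numpy as np

    W = np.diag([2.0, 5.0])

    def ip(v, w):
        return v @ W @ w

    def norm(v):
        return np.sqrt(ip(v, v))

    v = np.array([1.0, 2.0])
    w = np.array([3.0, -1.0])

    print(abs(ip(v, w)) <= norm(v) * norm(w))          # True: Schwarz inequality
    print(norm(v + w) <= norm(v) + norm(w))            # True: triangle inequality
    print(np.arccos(ip(v, w) / (norm(v) * norm(w))))   # the angle between v and w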

SUMMARY

An inner-product space is a vector space V with an inner product ( , ) that associates with each pair of vectors v, w in V a scalar (v, w) that satisfies the following conditions for all vectors u, v, and w in V and all scalars r:

P1  (v, w) = (w, v),
P2  (u, v + w) = (u, v) + (u, w),
P3  r(v, w) = (rv, w) = (v, rw),
P4  (v, v) ≥ 0, and (v, v) = 0 if and only if v = 0.

Rⁿ is an inner-product space using the usual dot product as inner product.
In an inner-product space V, the norm of a vector is ||v|| = √(v, v) and satisfies ||rv|| = |r| ||v||. The distance between vectors v and w is ||v − w||.
For all vectors v and w in an inner-product space, we have the following two inequalities:

Schwarz inequality:   |(v, w)| ≤ ||v|| ||w||,
Triangle inequality:  ||v + w|| ≤ ||v|| + ||w||.

Vectors v and w in an inner-product space are orthogonal (perpendicular) if and only if (v, w) = 0.

"| EXERCISES
In Exercises 1-9, determine whether or not the 10. Let C,, be the vector space of all continuous
indicated product satisfies the conditions for an real-valued functions with domain a < x = 0.
inner proditct in the given vector space. Prove that ( , ), defined in C,, by (f, g) =
JS? fdge(x) dx, is an inner product in C,,.
1. In R’, let ([x,, 2], Ds Ye) = xu — ae 11. Let C,, be the vector space of all continuous
2. In R?, let ([x,, x4), (1, Val) = Xy%q + Vo real-valued functions with domain 0 = x = 1.
3. In RY, let (x, %2, Dr, Jal) = 21’, + ay. rie 408) deined in Coy by (h 8) =
4. In R’, bet (Lx, 2), (i, Ya)) = xn. a. Find ((x + 1), x).
5. In R?, let (x, Xa; Xs], (vi, Yr ys) = Xj. b. Find |||.

6. In R?, let ([x,, X2, X3), [Yi Yo» al) = 1 + Y. c- Find I< ~ >
7. In the vector space M, of all 2 x 2 matrices, 4. Find |[sin 7x]
let 12. Prove that sin x and cos x are orthogonal |
functions in the vector space C), of Exercise
( é “i 5 a, 10, with the inner product defined there.
43 J LPs 4 13. Let (, ) be defined in C,, as in Exercise 11.
= a,b, + a,b, + ayby + ayby. Find a set of two independent functions in
8. Let C_,, be the vector space of all Cy, each of which is orthogonal to the
continuous functions mapping the interval constant function 1.
~l sx = L into R, and let (fg) = 14. Let u and v be vectors in an inner-product
$2, fodgtx) dx. space, and suppose that ||u|| = 3 and
9. Let C be as in Exercise §, and Ict (f, g} = ||v|| = 5. Find (u + 2v,u — 2v).
/(0)g(0).

15. Suppose that the vectors u and v in Exercise 20. Let V be an inner-product space, and !et S
14 are perpendicular. Find (u + 2v, 3u + vy), be a subset of V. Prove that
16. Let V be an inner-product space. Mark each St =
of the following True or False. {v € V| v is orthogona! to each vector in S}

a. The norm of every vector in Vis a is a subspace of V.


positive real number. 21. Referring tu Exercise 20, prove that
b. The norm of every nonzero vector in V is SE(Sy.
a positive real number. 22, Give an example of an inner-product space
c. We have ||rv|| = rilv|| for every scalar r V for which there exists a subspace ¥” such
and vector v in V. that (W*)* # W.
d. We have [lu + y||? = |[uj? + ||v||? for all
23. (Pythagorean theorem) Let u and v be
vectors u and v in V.
orthogonal vectors in an inner-product space
e. Two nonzero orthogonai vectors in V are
V. Prove that ||u + vl? = |lull? + ||v||2.
independent.
_— i. If |lu + v{! = |lull? + |lyj/? for two nonzero 24, Use the triangle inequality to prove that
vectors u and vin V, then u and ¥ are liv — wll = [lull + [Iwi
orthogonal.
__ g.- An inner product can be defined on every for any vectors v and w in an inner-product
finite-dimensional real vector space. space V.
___h. Let r be any real scalar. Then (, )’, 25. Prove that, for any vectors v and w in an
defined by (u, v)’ = xu, v) for vectors u inner-product space V, we have
and v in V, is also an inner product
on V.
Iv — || = {Ill — [II
26. Prove that the vectors ||v|w + |lw/|v and
_— i. (,)’, defined in part (h), is an inner
\|v||w — ||w||v in an inner-product space V are
product on V if r is nonzero.
perpendicular.
_— j. The distance between two vectors u and v
in V is given by |(u — v, u — v)j. 27. Consider the space C,, of continuous
functions with domain the closed interval
17. For vectors v and w in an inner-product asx < 5, aad let w(x) be a positive
space, prove that y — wand vy + ware continuous weight function, so that w(x) > 0
perpendicular if and only if |[v|| = |{w|. for a = x = b. Prove that for fand g in C,,,
18. For vectors u, v, and w in an inner-product the weighted integral
space and for scalars r and s, prove that, if w Cf, 8) = Sa w(x) flxg(x) dx
is perpendicular to both u and yv, then w is
perpendicular to ru + sv. defines an inner product on C,,. (Such a
weight function has the effect of making the
19. Let 5 be a subset of nonzero vectors in an portions of the interval a = x = 5 where
inner-product space V, and suppose that any w(x) is large more significant than portions
two different vectors in S are orthogonal. where w(x) is smaller in computing inner
Prove that S is an independent set. products and norms.)
CHAPTER 4

DETERMINANTS

Each square matrix has associated with it a number called the determinant of the matrix. In Section 4.1 we introduce determinants of 2 × 2 and 3 × 3 matrices, motivated by computations of area and volume. Section 4.2 discusses determinants of n × n matrices and their properties.
Section 4.3 opens with an efficient way to compute determinants and then presents Cramer's rule as well as a formula for the inverse of an invertible square matrix in terms of determinants. Cramer's rule expresses, in terms of determinants, the solution of a square linear system having a unique solution. This method is primarily of theoretical interest because the methods presented in Chapter 1 are much more efficient for solving a square system with more than two or three equations. Because references and formulas involving Cramer's rule appear in advanced calculus and other fields, we believe that students should at least read the statement of Cramer's rule and look at an illustration.
The chapter concludes with optional Section 4.4, which discusses the significance of the determinant of the standard matrix representation of a linear transformation mapping Rⁿ into Rⁿ. The ideas in that section form the foundation for the change-of-variable formulas for definite integrals of functions of one or more variables.

4.1 AREAS, VOLUMES, AND CROSS PRODUCTS

We introduce determinants by discussing one of their most important applications: finding areas and volumes. We will find areas and volumes of very simple boxlike regions. In calculus, one finds areas and volumes of regions having more general shapes, using formulas that involve determinants.

The Area of a Parallelogram


The parallelogram determined by two nonzero and nonparallel vectors a = [a₁, a₂] and b = [b₁, b₂] in R² is shown in Figure 4.1. This parallelogram has a vertex at the origin, and we regard the arrows representing a and b as forming the two sides of the parallelogram having the origin as a common vertex.
We can find the area of this parallelogram by multiplying the length ||a|| of its base by the altitude h, obtaining

Area = ||a|| h = ||a|| ||b||(sin θ) = ||a|| ||b|| √(1 − cos²θ).

Recall from page 24 of Section 1.2 that a · b = ||a|| ||b||(cos θ). Squaring our area equation, we have

(Area)² = ||a||²||b||² − ||a||²||b||² cos²θ
        = ||a||²||b||² − (a · b)²
        = (a₁² + a₂²)(b₁² + b₂²) − (a₁b₁ + a₂b₂)²
        = (a₁b₂ − a₂b₁)².     (1)

The last equality should be checked using pencil and paper. On taking square roots, we obtain

Area = |a₁b₂ − a₂b₁|.

The number within the absolute value bars is known as the determinant of the matrix

A = | a₁  a₂ |
    | b₁  b₂ |

and is denoted by |A| or det(A).

FIGURE 4.1
The parallelogram determined by a and b.

That is, if

A = | a₁  a₂ |
    | b₁  b₂ |,

then

det(A) = | a₁  a₂ | = a₁b₂ − a₂b₁.     (2)
         | b₁  b₂ |

We can remember this formula for the determinant by taking the product of the entries on the main diagonal of the matrix, minus the product of the entries on the other diagonal.

EXAMPLE 1  Find the determinant of the matrix

| 2  3 |
| 1  4 |.

SOLUTION  We have

| 2  3 |
| 1  4 | = (2)(4) − (3)(1) = 5.
EXAMPLE 2  Find the area of the parallelogram in R² with vertices (1, 1), (2, 3), (2, 1), (3, 3).
SOLUTION  The parallelogram is sketched in Figure 4.2. The sides having (1, 1) as common vertex can be regarded as the vectors

a = [2, 1] − [1, 1] = [1, 0]

and

b = [2, 3] − [1, 1] = [1, 2],

FIGURE 4.2
The parallelogram determined by a = [1, 0] and b = [1, 2].

as shown in the figure. Therefore, the area of the parallelogram is given by the determinant

| 1  0 |
| 1  2 | = (1)(2) − (0)(1) = 2.
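The same area can be computed directly from the determinant. A minimal sketch (Python with NumPy; not part of the text):

    import numpy as np

    a = np.array([1.0, 0.0])          # [2, 1] - [1, 1]
    b = np.array([1.0, 2.0])          # [2, 3] - [1, 1]

    A = np.array([a, b])              # rows are a and b
    print(abs(np.linalg.det(A)))      # 2.0, the area of the parallelogram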

The Cross Product

Equation (2) defines a second-order determinant, associated with a 2 × 2 matrix. Another application of these second-order determinants appears when we find a vector in R³ that is perpendicular to each of two given independent vectors b = [b₁, b₂, b₃] and c = [c₁, c₂, c₃]. Recall that the unit coordinate vectors in R³ are i = [1, 0, 0], j = [0, 1, 0], and k = [0, 0, 1]. We leave as an exercise the verification that

p = | b₂  b₃ | i  −  | b₁  b₃ | j  +  | b₁  b₂ | k     (3)
    | c₂  c₃ |       | c₁  c₃ |       | c₁  c₂ |

is a vector perpendicular to both b and c. (See Exercise 5.) This can be seen by computing p · b = p · c = 0. The vector p in formula (3) is known as the cross product of b and c, and is denoted p = b × c.
There is a very easy way to remember formula (3) for the cross product b × c. Form the 3 × 3 symbolic matrix

| i   j   k  |
| b₁  b₂  b₃ |.
| c₁  c₂  c₃ |

HISTORICAL NOTE  THE NOTION OF A CROSS PRODUCT grew out of Sir William Rowan
Hamilton's attempt to develop a multiplication for "triplets"—that is, vectors in R³. He wanted
this multiplication to satisfy the associative and commutative properties as well as the distributive
law. He wanted division to be always possible, except by 0. And he wanted the lengths to
multiply—that is, if (a₁, a₂, a₃)(b₁, b₂, b₃) = (c₁, c₂, c₃), then ||(a₁, a₂, a₃)|| ||(b₁, b₂, b₃)|| = ||(c₁, c₂, c₃)||.
After struggling with this problem for 13 years, Hamilton finally succeeded in solving it on
October 16, 1843, although not in the way he had hoped. Namely, he discovered an analogous
result—not for triples, but for quadruples. As he walked that day in Dublin, he wrote, he could not
"resist the impulse . . . to cut with a knife on a stone of Brougham Bridge . . . the fundamental
formula with the symbols i, j, k; namely, i² = j² = k² = ijk = -1." This formula symbolized his
discovery of quaternions, elements of the form Q = w + xi + yj + zk with w, x, y, z real numbers,
whose multiplication obeys the laws just given, as well as the other laws Hamilton desired, except
for the commutative law.
Hamilton noted the convenience of writing a quaternion Q in two parts: the scalar part w and
the vector part xi + yj + zk. Then the product of two quaternions α = xi + yj + zk and β = x′i +
y′j + z′k with scalar parts 0 is given as

αβ = (-xx′ - yy′ - zz′) + (yz′ - zy′)i + (zx′ - xz′)j + (xy′ - yx′)k.

The vector part of this product is our modern cross product of the "vectors" α and β, while the
scalar part is the negative of the modern dot product (Section 1.2).
Although Hamilton and others pushed for the use of quaternions in physics, physicists
realized by the end of the nineteenth century that the only parts of the subject necessary for their
work were the two types of products of vectors. It was Josiah Willard Gibbs (1839-1903),
professor of mathematical physics at Yale, who introduced our modern notation for both the dot
product and the cross product and developed their properties in detail in his classes at Yale and
finally in his Vector Analysis of 1901.

Formula (3) can be obtained from this matrix in a simple way. Multiply the
vector i by the determinant of the 2 × 2 matrix obtained by crossing out the
row and column containing i, as in

| b₂  b₃ |
| c₂  c₃ |.

Similarly, multiply (-j) by the determinant of the matrix obtained by crossing
out the row and column in which j appears. Finally, multiply k by the
determinant of the matrix obtained by crossing out the row and column
containing k, and add these multiples of i, j, and k to obtain formula (3).

EXAMPLE 3  Find a vector perpendicular to both [2, 1, 1] and [1, 2, 3] in R³.

SOLUTION  We form the symbolic matrix

[ i  j  k ]
[ 2  1  1 ]
[ 1  2  3 ]

and find that

                        | 1  1 |       | 2  1 |       | 2  1 |
[2, 1, 1] × [1, 2, 3] = |      | i  -  |      | j  +  |      | k
                        | 2  3 |       | 1  3 |       | 1  2 |

                      = i - 5j + 3k = [1, -5, 3].
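A quick numerical check of Example 3 is possible in MATLAB; the sketch below (ours, using only arithmetic on the components, exactly as in formula (3), plus the dot function) avoids assuming any cross-product routine is installed.

    b = [2 1 1];
    c = [1 2 3];
    % components of b x c, the three 2 x 2 determinants of formula (3)
    p = [b(2)*c(3) - b(3)*c(2), -(b(1)*c(3) - b(3)*c(1)), b(1)*c(2) - b(2)*c(1)]
    dot(p, b), dot(p, c)         % both 0, confirming p is perpendicular to b and c

The first line of output is 1 -5 3, agreeing with the example.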
The cross product p = b x c as defined in Eq. (3) not only is perpendicular
to both b and c but points in the direction determined by the familiar
right-hand rule: when the fingers of the right hand curve in the direction fromb
to c, the thumb points in the direction of b x c. (See Figure 4.3.) We do not
attempt to prove this.

FIGURE 4.3
The area of the parallelogram determined by b and c is ||b × c||.

The magnitude of the vector p = b × c in formula (3) is of interest as well:
it is the area of the parallelogram with a vertex at the origin in R³ and edges at
that vertex given by the vectors b and c. To see this, we refer to a diagram such
as the one in Figure 4.1, but with a replaced by c, and we repeat the
computation for area. This time, Eq. (1) takes the form

(Area)² = ||c||²||b||² - (c · b)²
        = (c₁² + c₂² + c₃²)(b₁² + b₂² + b₃²) - (c₁b₁ + c₂b₂ + c₃b₃)²

          | b₂  b₃ |²    | b₁  b₃ |²    | b₁  b₂ |²
        = |        |  +  |        |  +  |        | .
          | c₂  c₃ |     | c₁  c₃ |     | c₁  c₂ |

Again, pencil and paper are needed to check this last equality. Taking square
roots, we obtain

||b × c|| = Area of the parallelogram in R³ determined by b and c.

EXAMPLE 4  Find the area of the parallelogram in R³ determined by the vectors b = [3, 1, 0]
and c = [1, 3, 2].
SOLUTION  From the symbolic matrix

[ i  j  k ]
[ 3  1  0 ]
[ 1  3  2 ]

we find that

          | 1  0 |       | 3  0 |       | 3  1 |
b × c  =  |      | i  -  |      | j  +  |      | k
          | 3  2 |       | 1  2 |       | 1  3 |

       = [2, -6, 8].

Therefore, the area of the parallelogram shown in Figure 4.3 is

||b × c|| = 2√(1 + 9 + 16) = 2√26.
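As a numerical check of Example 4 (our sketch, assuming a cross(x, y) function is available; newer versions of MATLAB supply one, and the note after this section's exercises shows how to add your own), the area is the length of the cross product, computed with the norm function that the MATLAB exercises below also use.

    b = [3 1 0];
    c = [1 3 2];
    p = cross(b, c);             % [2 -6 8]
    area = norm(p)               % displays 10.1980..., which is 2*sqrt(26)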

EXAMPLE 5  Find the area of the triangle in R³ with vertices (-1, 2, 0), (2, 1, 3), and
(1, 1, -1).
SOLUTION  We think of (-1, 2, 0) as a local origin, and we take translated vectors starting
there and reaching to (2, 1, 3) and to (1, 1, -1)—namely,

a = [2, 1, 3] - [-1, 2, 0] = [3, -1, 3]

and

b = [1, 1, -1] - [-1, 2, 0] = [2, -1, -1].

Now ||a × b|| is the area of the parallelogram determined by these two vectors,
and the area of the triangle is half the area of the parallelogram, as shown in
Figure 4.4. We form the symbolic matrix

[ i   j   k ]
[ 3  -1   3 ],
[ 2  -1  -1 ]
FIGURE 4.4
The triangle constitutes half the parallelogram.

FIGURE 4.5
The box in R³ determined by a, b, and c.

and we find that a × b = 4i + 9j - k. Thus,

||a × b|| = √(16 + 81 + 1) = √98 = 7√2,

so the area of the triangle is 7√2/2.

The Volume of a Box

The cross product is useful in finding the volume of the box, or parallelepiped,
determined by the three nonzero vectors a = [a₁, a₂, a₃], b = [b₁, b₂, b₃], and c =
[c₁, c₂, c₃] in R³, as shown in Figure 4.5. The volume of the box can be
computed by multiplying the area of the base by the altitude h.


HISTORICAL NOTE  THE VOLUME INTERPRETATION of a determinant first appeared in a 1773
paper on mechanics by Joseph Louis Lagrange (1736-1813). He noted that if the points M, M′,
M″ have coordinates (x, y, z), (x′, y′, z′), (x″, y″, z″), respectively, then the tetrahedron with
vertices at the origin and at those three points will have volume

(1/6)[z(x′y″ - y′x″) + z′(yx″ - xy″) + z″(xy′ - yx′)];

that is,

      | x   y   z  |
(1/6) | x′  y′  z′ |.
      | x″  y″  z″ |

Lagrange was born in Turin, but spent most of his mathematical career in Berlin and in Paris.
He contributed important results to such varied fields as the calculus of variations, celestial
mechanics, number theory, and the theory of equations. Among his most famous works are the
Treatise on Analytical Mechanics (1788), in which he presented the various principles of
mechanics from a single point of view, and the Theory of Analytic Functions (1797), in which he
attempted to base the differential calculus on the theory of power series.

We have just seen that the area of the base of the box is equal to ||b × c||,
and the altitude can be found by computing

h = ||a|| |cos θ| = ||a|| · |(b × c) · a| / (||b × c|| ||a||) = |(b × c) · a| / ||b × c||.

The absolute value is used in case cos θ is negative. This would be the case if
the direction of b × c were opposite to that shown in Figure 4.5. Thus,

Volume = (Area of base)h = ||b × c|| · |(b × c) · a| / ||b × c|| = |(b × c) · a|.

That is, referring to formula (3), which defines b × c, we see that

Volume = |a₁(b₂c₃ - b₃c₂) - a₂(b₁c₃ - b₃c₁) + a₃(b₁c₂ - b₂c₁)|.     (4)

The number within the absolute value bars is known as a third-order
determinant. It is the determinant of the matrix

    [ a₁  a₂  a₃ ]
A = [ b₁  b₂  b₃ ]
    [ c₁  c₂  c₃ ]

and is denoted by

         | a₁  a₂  a₃ |
det(A) = | b₁  b₂  b₃ |.
         | c₁  c₂  c₃ |

It can be computed as

              | b₂  b₃ |        | b₁  b₃ |        | b₁  b₂ |
det(A) = a₁   |        |  - a₂  |        |  + a₃  |        | .     (5)
              | c₂  c₃ |        | c₁  c₃ |        | c₁  c₂ |

Notice the similarity of formula (5) to our computation of the cross product
b × c in formula (3). We simply replace i, j, and k by a₁, a₂, and a₃, respectively.

EXAMPLE 6  Find the determinant of the matrix

    [ 2  1   3 ]
A = [ 4  1   2 ].
    [ 1  2  -3 ]

SOLUTION  Using formula (5), we have

| 2  1   3 |       | 1   2 |       | 4   2 |       | 4  1 |
| 4  1   2 | = 2   |       |  - 1  |       |  + 3  |      |
| 1  2  -3 |       | 2  -3 |       | 1  -3 |       | 1  2 |

            = 2(-7) - (-14) + 3(7) = 21.
EXAMPLE 7  Find the volume of the box with vertex at the origin determined by the vectors
a = [4, 1, 1], b = [2, 1, 0], and c = [0, 2, 3], and sketch the box in a figure.
FIGURE 4.6
The box determined by a, b, and c.

SOLUTION  The box is shown in Figure 4.6. Its volume is given by the absolute value of the
determinant

| 4  1  1 |       | 1  0 |       | 2  0 |       | 2  1 |
| 2  1  0 | = 4   |      |  - 1  |      |  + 1  |      | = 10.
| 0  2  3 |       | 2  3 |       | 0  3 |       | 0  2 |
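The volume of Example 7 can be checked with MATLAB's det function (a sketch of ours); the determinant of the matrix whose rows are a, b, and c is exactly the triple product a · (b × c) described above.

    a = [4 1 1];  b = [2 1 0];  c = [0 2 3];
    vol = abs(det([a; b; c]))    % displays 10, the absolute value of a . (b x c)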

The computation of det(A) in Eq. (5) is referred to as an expansion of the


determinant on the first row. It is a particular case of a more general procedure
for computing det(A), which is described in the next section.
We list the results of our work with the cross product and some of its
algebraic properties in one place as a theorem. The algebraic properties can be
checked through computation with the components of the vectors. Example 8
gives an illustration.

THEOREM 4.1  Properties of the Cross Product

Let a, b, and c be vectors in R³.

1. b × c = -(c × b).                                Anticommutativity
2. a × (b × c) is generally different               Nonassociativity of ×
   from (a × b) × c.
3. a × (b + c) = (a × b) + (a × c)                  Distributive properties
   (a + b) × c = (a × c) + (b × c).
4. b · (b × c) = (b × c) · c = 0.                   Perpendicularity of b × c to
                                                    both b and c
5. ||b × c|| = Area of the parallelo-               Area property
   gram determined by b and c.
6. a · (b × c) = (a × b) · c =                      Volume property
   ± Volume of the box determined
   by a, b, and c.
7. a × (b × c) = (a · c)b - (a · b)c.               Formula for computation of
                                                    a × (b × c)

EXAMPLE 8  Show that b × c = -(c × b) for any vectors b and c in R³.

SOLUTION  We compute

        [ i   j   k  ]
c × b = [ c₁  c₂  c₃ ]
        [ b₁  b₂  b₃ ]

        | c₂  c₃ |       | c₁  c₃ |       | c₁  c₂ |
      = |        | i  -  |        | j  +  |        | k.
        | b₂  b₃ |       | b₁  b₃ |       | b₁  b₂ |

A simple computation shows that interchanging the rows of a 2 × 2 matrix
having a determinant d gives a matrix with determinant -d (see the exercises).
Comparison of the preceding formula for c × b with the formula for b × c in
Eq. (3) then shows that b × c = -(c × b).
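Several properties in Theorem 4.1 can also be spot-checked numerically. The MATLAB lines below are our sketch (the vectors are arbitrary choices, not from the text, and a cross(x, y) function is assumed, as in the note following the exercises); they test anticommutativity and property 7.

    a = [1 2 -3];  b = [4 -1 2];  c = [3 0 1];
    cross(b, c) + cross(c, b)                             % zero vector: b x c = -(c x b)
    cross(a, cross(b, c)) - (dot(a, c)*b - dot(a, b)*c)   % zero vector: property 7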

SUMMARY

1. A second-order determinant is defined by

   | a  b |
   | c  d | = ad - bc.

   A third-order determinant is defined by Eq. (5).
2. The area of the parallelogram with vertex at the origin determined by
   nonzero vectors a and b in R² is the absolute value of the determinant of
   the matrix having row vectors a and b.
3. The cross product of vectors b and c in R³ can be computed by using the
   symbolic determinant

   | i   j   k  |
   | b₁  b₂  b₃ |.
   | c₁  c₂  c₃ |

   This vector b × c is perpendicular to both b and c.
4. The area of the parallelogram determined by nonzero vectors b and c in R³
   is ||b × c||.
5. The volume of the box determined by nonzero vectors a, b, and c in R³ is
   the absolute value of the determinant of the matrix having row vectors a, b,
   and c. This determinant is also equal to a · (b × c).

EXERCISES

in Exercises 1-4, find the indicated determinaitt. 15. a= —i+ 2j + 4k, b = 2i — 4j — 8k


16. a=i-j+k,b=3i-
2j + 7k
1. r! 3 , |-! 0 17. a= 2i ~— 3j + 5k, b= 4i- Sj +k
5 0 7
18. a= -2i+ 3j-k,b=4i—- 6)7 k
_ 3!
3. ° | 4. 21 —-4 19. Mark each of the following True or False.
5 0 10 7 ___ a. The determinant of a 2 x 2 matrix isa
5. Show that the vector p = b X ¢ given in vector.

Eq. (3) 1s perpendicular to both b and c. ___b. If two rows of a3 x 3 matrix are
interchanged, the sign of the determinant
is changed.
In Exercises 5-9, find the indicated determinant.
___ ce. The determinant of a 3 < 3 matrix is
zero it two rows of the matrix are parallel
! 4 4 2-5 3 vectors in R’.
6. 5 13 O 7} 1 3 4 ___ d. In order for the determinant of a 3 x 3
p -1 3 —2 3 ~=7 matrix to be zero, two rows of the matrix
must be parallel vectors in R?.
1-2 7 z~-l J __. e. The determinant of a 3 x 35 matrix 15
8. |0 1 4 9 i-l 0 3 zero if the pcints in R? given by the rows
1 0 3 2 | -4 of the matnx lie in a plane.
16. Show by direct computation that: f. The determinant of a 3 x 3 matrix is
Qa, a a zero if the points in R? given by the rows
a. |@, Q@, a;| = 0; of the matrix lie in a plane through the
origin.
Cr Cf, C;
__ g. The parallelogram in R? determined by
nonzero vectors a and b is a square if and
b. b, b, b, = 0, only ifa- b= 0.
a, @, a, ___h. The box in R} determined by vectors a, ),
11. Show by direct computation that and cis a cube if and only ifa- b=
a‘c=b-c=Qanda-a=b-b=c°:e.
Qa, a& b, by j. If the angle between vectors a and b in R?
Q, a, is 7/4, then |la x bl| = |a - bj.
12. Show by direct computation that ____ j. For any vector a in R®, we have jja x aij =
[all
a a, a; a, a, a,
b, b, b, =~ Cc Cy Cy. In Exercises 20-24, find the area of the
C) Cr Cy b, 6, by parallelogram with vertex at the origin and with
the given vectors as edges
in Exercises 13-18, finda x b.
20. -i+ 4jand 2i + 3j
Ia - ti jo Shob si + Qj 21. -5i + 3jandi+ 7j
id. a= —S5i+jt
4k, b=2i+j - 3k 22. i + 3j — 5k and 21 + 4j — k

93. 2-j+ kandi +3j-k volume of the box having the same three vectors
94. i- 4j + kand 2i + 3j - 2k as adjacent edges.)

jn Exercises 25-32, find the arca of the given Al. (-3, 0, 1), (4, 2, 1), (0, 1, 7), (1 1, I)
geometric covfiguration. 42. (0, 1, 1), (8, 2, -7), (3, 1, 6), (—4, -2, 0)
43. (-1, 1, 2), (3, 1, 4% (-1, 6, 0), (2, -1, 5)
25. The triangle with vertices (—1, 2), (3, -1), 44. (-1, 2, 4), (2, -3; 0), (-4, 2, -1), (0, 3, —2)
and (4, 3)
26. The triangle with vertices (3, —4), (1, 1) and In Exercises 45-48, use a determinant to
(5, 7) ascertain whether the given points lie or. a line in
27. The triangle with vertices (2, 1, —3), R?, [Hur: What is the area of a “parallelogram”
(3, 0, 4), and (1, 0, 5) with collinear vertices?|
28. The triangle with vertices (3, 1, —2),
- (1, 4, 5), and (2, 1, -4) 45. (0, 0), (3, 5), (6, 9
29. The triangle in the piane R? bounded by the 46. (0, 0), (4, 2), (6, -3)
lines y = x, y = —3x+ 8, and 3y + 5x =0 47. (1, 5), (3, 7), (-3, 1)
30. The parallelogram with vertices (1. 3), 48. (2, 3), (1, —4), (6, 2)
(—2, 6), (1, 11), and (4, 8)
31. The parallelogram with vertices (1, 0, 1), In Exercises 49-52, use a determinant to
(3, 1, 4), (0, 2, 9), and (-2, 1, 6) ascertain whether the given points lie in a plane
32. The parallelogram in the plane R? bounded in R?, [Hinr: What is the “volume” of a box with
by the linesx — 2y = 3,x — 2y = 10, coplanar vertices?|
2x+ 3y = -1, and 2x+ 3y = —-8
49. (0, 0, 0), (1, 4, 3), (2, 5, 8), (1, 2, -5)
In Exercises 33-36, find a + (b x c) and 50. (0, 0, 0), (2, 1, 1), (3, 2, 1), (-1, 2, 3)
a X (b X c).
51. (i, -1, 3), (4, 2, 3), (3, 1, -2), (5, 5, -5)
33, a=i+ 2j — 3k,b = 4i-jt+ 2k,c = 3i+k §2. (1, 2, 1), (3, 3, 4), (2, 2, 2), (4, 3, 5)
34. a=~-i+j+ 2k, b=i+k,
c = 3i — 2j + 5k Let a, b, and c be any vectors in R>. In Exercises
53-56, simplify the given expression.
35. a=i— 3k,b=-i+ 4j,c=i+jt+k
36. a = 4i — j + 2k, b = 31 + 5j — 2k,
53. a- (a x b)
c=i-3jt+k
54. (b x c) — (c X b)
In Exercises 37-40, find the volume of the box 55. {la x bil? + (a- bP
having the given vectors as adjacent edges. 56. aX (bX c) + bx (¢c xX a) +c xX (aX b)
57. Prove property (2) of Theorem 4.1.
37. -i + 4j + 7k, 3i— 2j—k, 4i + 2k 58. Prove property (3) of Theorem 4.1.
38. 2i + j — 4k, 3i-j + 2k, i+ 3j — 8k 5 9. Prove property (6) of Theorem 4.1.
39. -21 + j, 3i- 4j+ ki — 2k a 60. Option 7 of the routine VECTGRPH in
40. 31 — 5+ 4k, i — 2j + 7k, Si — 3j + 10k LINTEK provides drill on the determinant
of a2 X 2 matrix as the area of the
In Exercises 41-44, find the volume of the parallelogram determined by its row veciors,
‘etrahedron having the given vertices. (Consider with an associated plus or minus sign. Run
how the volume of a tetrahedron having three this option until you can regularly achieve a
vectors from one point as edges is related to the score of 80% or better.

MATLAB has a function det(A) which gives the determinant of a matrix A. In Exercises 61-63,
use the routine MATCOMP in LINTEK or MATLAB to find the volume of the box having the
given vectors in R³ as adjacent edges. (We have not supplied matrix files for these problems.)

61. -i + 7j + 3k, 4i + 23j - 13k, 12i - 17j - 31k
62. 4.1i - 2.3k, 5.3j - 2.1k, 6.1i + 5.7j
63. 2.13i + 4.71j - 3.62k, 5i - 3.2j + 6.32k, 8.3i - 0.45j + 1.13k

MATLAB

M1. Enter the data vectors x = [1 5 7] and y = [-3 2 4] into MATLAB.
Then enter a line crossxy = [ ], which will compute the cross product
x × y of vectors [x(1) x(2) x(3)] and [y(1) y(2) y(3)]. [Hint: The first
component in [ ] will be x(2)*y(3) - y(2)*x(3).] Be sure you use no spaces
except one between the vector components. Check that the value given
for crossxy is the correct vector 6i - 25j + 17k for the data vectors
entered.
M2. Usz the norm function in MATLAB to find the area of the parallelogram in R?
having the vectors x and y in the preceding exercise as adjacent edges.
M3. Enter the vectors x = 4.2i — 3.7j + 5.6k and y = —7.3i + 4.5j + 11.4k.
a. Using the up-arrow key to access your line defining crossxy, find x x y.
b. Find the area of the parallelogram in R? having x and y as adjacent
edges.
M4. Find the area of the triangle in R? having vertices (—1.2, 3.4, —6.7),
(2.3, —5.2, 9.4), and (3.1, 8.3, —3.6). [Hint: Enter vectors a, b, and c from the.
origin to these points and set x and y equal to appropriate differences of
them.]

NOTE: If you want to add a function cross(x, y) to your own personal


MATLAB, do so following a procedure analogous to that described at the very end
of Section !.2 for adding the function angl(x, y).

4.2 THE DETERMINANT OF A SQUARE MATRIX

The Definition
We defined a third-order determinant in terms of second-order determinants
in Eq. (5) on page 245. A second-order determinant can be defined in terms of
first-order determinants if we interpret the determinant of a 1 × 1 matrix to be
its sole entry. We define an nth-order determinant in terms of determinants of
order n - 1. In order to facilitate this, we introduce the minor matrix Aᵢⱼ of an
n × n matrix A = [aᵢⱼ]: it is the (n - 1) × (n - 1) matrix obtained by crossing
out the ith row and jth column of A. The minor matrix is the portion that
remains when we cross out the ith row and jth column in the matrix

    [ a₁₁  ···  a₁ⱼ  ···  a₁ₙ ]
    [  ⋮         ⋮         ⋮  ]
A = [ aᵢ₁  ···  aᵢⱼ  ···  aᵢₙ ]   ← ith row                  (1)
    [  ⋮         ⋮         ⋮  ]
    [ aₙ₁  ···  aₙⱼ  ···  aₙₙ ]
               ↑
           jth column

Using |Aᵢⱼ| as notation for the determinant of the minor matrix Aᵢⱼ, we can
express the determinant of a 3 × 3 matrix A as

| a₁₁  a₁₂  a₁₃ |
| a₂₁  a₂₂  a₂₃ | = a₁₁|A₁₁| - a₁₂|A₁₂| + a₁₃|A₁₃|.
| a₃₁  a₃₂  a₃₃ |

The numbers a′₁₁ = |A₁₁|, a′₁₂ = -|A₁₂|, and a′₁₃ = |A₁₃| are appropriately called
the cofactors of a₁₁, a₁₂, and a₁₃. We now proceed to define the determinant of
any square matrix, using mathematical induction. (See Appendix A for a
discussion of mathematical induction.)

HISTORICAL NOTE  THE FIRST APPEARANCE OF THE DETERMINANT OF A SQUARE MATRIX in Western
Europe occurred in a 1683 letter from Gottfried von Leibniz (1646-1716) to the Marquis de
L'Hôpital (1661-1704). Leibniz wrote a system of three equations in two unknowns with abstract
"numerical" coefficients,

10 + 11x + 12y = 0
20 + 21x + 22y = 0
30 + 31x + 32y = 0,

in which he noted that each coefficient number has "two characters, the first marking in which
equation it occurs, the second marking which letter it belongs to." He then proceeded to eliminate
first y and then x to show that the criterion for the system of equations to have a solution is that

10·21·32 + 11·22·30 + 12·20·31 = 10·22·31 + 11·20·32 + 12·21·30.

This is equivalent to the modern condition that the determinant of the matrix of coefficients must
be zero.
Determinants also appeared in the contemporaneous work of the Japanese mathematician
Seki Takakazu (1642-1708). Seki's manuscript of 1683 includes his detailed calculations of
determinants of 2 × 2, 3 × 3, and 4 × 4 matrices—although his version was the negative of the
version used today. Seki applied the determinant to the solving of certain types of equations, but
evidently not to the solving of systems of linear equations. Seki spent most of his life as an
accountant working for two feudal lords, Tokugawa Tsunashige and Tokugawa Tsunatoyo, in
Kofu, a city in the prefecture of Yamanashi, west of Tokyo.

DEFINITION 4.1  Cofactors and Determinants

The determinant of a 1 × 1 matrix is its sole entry; it is a first-order
determinant. Let n > 1, and assume that determinants of order less
than n have been defined. Let A = [aᵢⱼ] be an n × n matrix. The cofactor
of aᵢⱼ in A is

a′ᵢⱼ = (-1)ⁱ⁺ʲ det(Aᵢⱼ),                                     (2)

where Aᵢⱼ given in Eq. (1) is the minor matrix of A. The determinant of
A is

         | a₁₁  a₁₂  ···  a₁ₙ |
         | a₂₁  a₂₂  ···  a₂ₙ |
det(A) = |  ⋮    ⋮          ⋮ |
         | aₙ₁  aₙ₂  ···  aₙₙ |

       = a₁₁a′₁₁ + a₁₂a′₁₂ + ··· + a₁ₙa′₁ₙ,                  (3)

and is an nth-order determinant.

In addition to the notation det(A) for the determinant of A, we will


sometimes use |A| when determinants appear in equations, to make the
equations easier to read.

EXAMPLE 1  Find the cofactor of the entry 3 in the matrix

    [ 2  1  0  1 ]
A = [ 3  2  1  2 ].
    [ 1  0  1  4 ]
    [ 2  0  2  1 ]

SOLUTION  Because 3 is in the row 2, column 1 position of A, we cross out the second row
and first column of A and find the cofactor of 3 to be

                  | 1  0  1 |
a′₂₁ = (-1)²⁺¹    | 0  1  4 | = -( 1 | 1  4 |  +  1 | 0  1 | )
                  | 0  2  1 |        | 2  1 |       | 0  2 |

     = -(-7 - 0) = 7.
EXAMPLE 2  Use Eq. (3) in Definition 4.1 to find the determinant of the matrix

    [  5  -2   4  -1 ]
A = [  0   1   5   2 ].
    [  1   2   0   1 ]
    [ -3   1  -1   1 ]

SOLUTION  We have

         |  5  -2   4  -1 |
         |  0   1   5   2 |
det(A) = |  1   2   0   1 |
         | -3   1  -1   1 |

                     | 1   5  2 |                   |  0   5  2 |
       = 5(-1)¹⁺¹    | 2   0  1 | + (-2)(-1)¹⁺²     |  1   0  1 |
                     | 1  -1  1 |                   | -3  -1  1 |

                     |  0  1  2 |                   |  0  1   5 |
       + 4(-1)¹⁺³    |  1  2  1 | + (-1)(-1)¹⁺⁴     |  1  2   0 |.
                     | -3  1  1 |                   | -3  1  -1 |

Computing the third-order determinants, we have

| 1   5  2 |       |  0  1 |       | 2  1 |       | 2   0 |
| 2   0  1 | = 1   |       |  - 5  |      |  + 2  |       |
| 1  -1  1 |       | -1  1 |       | 1  1 |       | 1  -1 |

             = 1(1) - 5(1) + 2(-2) = -8;

|  0   5  2 |       |  0  1 |       |  1  1 |       |  1   0 |
|  1   0  1 | = 0   |       |  - 5  |       |  + 2  |        |
| -3  -1  1 |       | -1  1 |       | -3  1 |       | -3  -1 |

             = 0(1) - 5(4) + 2(-1) = -22;

|  0  1  2 |       | 2  1 |       |  1  1 |       |  1  2 |
|  1  2  1 | = 0   |      |  - 1  |       |  + 2  |       |
| -3  1  1 |       | 1  1 |       | -3  1 |       | -3  1 |

             = 0(1) - 1(4) + 2(7) = 10;

|  0  1   5 |       | 2   0 |       |  1   0 |       |  1  2 |
|  1  2   0 | = 0   |       |  - 1  |        |  + 5  |       |
| -3  1  -1 |       | 1  -1 |       | -3  -1 |       | -3  1 |

             = 0(-2) - 1(-1) + 5(7) = 36.

Therefore, det(A) = 5(-8) + 2(-22) + 4(10) + 1(36) = -8.
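Definition 4.1 translates directly into a short recursive program. The MATLAB function below is our own sketch (the file name minordet.m is hypothetical, not a LINTEK or MATLAB routine); it simply mirrors expansion by minors on the first row and, as the next paragraph explains, is far too slow to use on large matrices.

    function d = minordet(A)
    % MINORDET  Determinant by expansion by minors on the first row (Definition 4.1).
    n = size(A, 1);
    if n == 1
        d = A(1,1);                         % a first-order determinant is the sole entry
        return
    end
    d = 0;
    for j = 1:n
        M = A(2:n, [1:j-1, j+1:n]);         % minor matrix A_1j: delete row 1 and column j
        d = d + (-1)^(1+j) * A(1,j) * minordet(M);   % entry times its cofactor
    end
    end

For the matrix of Example 2, minordet([5 -2 4 -1; 0 1 5 2; 1 2 0 1; -3 1 -1 1]) returns -8, agreeing with the expansion above.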
The preceding example makes one thing plain:

Computation of determinants of matrices of even moderate size


using only Definition 4.1 is a tremendous chore.

According to modern astronomical theory, our solar system would be dead


long before a present-day computer could find the determinant of a 50 x 50
matrix using just the inductive Definition 4.1. (See Exercise 37.) Section 4.3
gives an alternative, efficient method for computing determinants.

Determinants play an important role in calculus for functions of several


variables. In some cases where primary values depend on some secondary
values, the single number that best measures the rate at which the primary
values change as the secondary values change is given by a determinant. This is
closely connected with the geometric interpretation of a determinant as a
volume. In Section 4.1, we motivated the determinant of a 2 x 2 matrix by
using area, and we motivated the determinant of a 3 xX 3 matrix by using
volume. In Section 4.4, we show how an nth-order determinant can be
interpreted as a “volume.”
It is desirable to have an efficient way to compute a determinant. We will
spend the remainder of this section developing properties of determinants that
will enable us to find a good method for their computation. The computation
of det(A) using Eq. (3) is called expansion by minors on the first row. Appendix
B gives a proof by mathematical induction that det(A) can be obtained by
using an expansion by minors on any row or on any column. We state this more
precisely in a theorem.

THEOREM 4.2  General Expansion by Minors

Let A be an n × n matrix, and let r and s be any selections from the list
of numbers 1, 2, . . . , n. Then

det(A) = aᵣ₁a′ᵣ₁ + aᵣ₂a′ᵣ₂ + ··· + aᵣₙa′ᵣₙ,                  (4)

and also

det(A) = a₁ₛa′₁ₛ + a₂ₛa′₂ₛ + ··· + aₙₛa′ₙₛ,                  (5)

where a′ᵢⱼ is the cofactor of aᵢⱼ given in Definition 4.1.

Equation (4) is the expansion of det(A) by minors on the rth row of A, and
Eq. (5) is the expansion of det(A) by minors on the sth column of A. Theorem 4.2
thus says that det(A) can be found by expanding by minors on any row or on
any column of A.

EXAMPLE 3  Find the determinant of the matrix

    [  3   2  0   1   3 ]
    [ -2   4  1   2   1 ]
A = [  0  -1  0   1  -5 ].
    [ -1   2  0  -1   2 ]
    [  0   0  0   0   2 ]

SOLUTION  A recursive computation such as the one in Definition 4.1 is still the only way
we have of computing det(A) at the moment, but we can expedite the
computation if we expand by minors at each step on the row or column
containing the most zeros. We have

                     |  3   2  0   1 |
det(A) = 2(-1)⁵⁺⁵    | -2   4  1   2 |          Expanding on row 5
                     |  0  -1  0   1 |
                     | -1   2  0  -1 |

                     |  3   2   1 |
       = 2(1)(-1)²⁺³ |  0  -1   1 |             Expanding on column 3
                     | -1   2  -1 |

              | -1   1 |       |  2  1 |
       = -2(3 |        |  - 1  |       | )      Expanding on column 1
              |  2  -1 |       | -1  1 |

       = -2(3(1 - 2) - 1(2 + 1)) = -2(-3 - 3) = 12.
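Expanding on rows or columns rich in zeros is a hand technique; a one-line MATLAB check of Example 3 (ours, using only the built-in det function) simply confirms the value.

    A = [3 2 0 1 3; -2 4 1 2 1; 0 -1 0 1 -5; -1 2 0 -1 2; 0 0 0 0 2];
    det(A)       % displays 12, as found by the repeated expansions above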

EXAMPLE 4  Show that the determinant of an upper- or lower-triangular square matrix is
the product of its diagonal elements.
SOLUTION  We work with an upper-triangular matrix, the other case being analogous. If

    [ u₁₁  u₁₂  ···  u₁ₙ ]
    [  0   u₂₂  ···  u₂ₙ ]
U = [  ⋮    ⋮          ⋮ ]
    [  0    0   ···  uₙₙ ]

is an upper-triangular matrix, then by expanding on first columns each time,
we have

             | u₂₂  u₂₃  ···  u₂ₙ |
             |  0   u₃₃  ···  u₃ₙ |
det(U) = u₁₁ |  ⋮    ⋮          ⋮ |
             |  0    0   ···  uₙₙ |

                | u₃₃  ···  u₃ₙ |
       = u₁₁u₂₂ |  ⋮         ⋮  |
                |  0   ···  uₙₙ |

       = ··· = u₁₁u₂₂ ··· uₙₙ.
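Example 4 is easy to illustrate numerically; in the following MATLAB lines (our sketch, with an arbitrary upper-triangular matrix that is not taken from the text) the determinant equals the product of the diagonal entries.

    U = [2 7 -1; 0 3 5; 0 0 4];      % an arbitrary upper-triangular example
    det(U), prod(diag(U))            % both display 24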

Properties of the Determinant


Using Theorem 4.2, we can establish several properties of the determinant that
will be of tremendous help in its computation. Because Definition 4.1 was an
inductive one, we use mathematical induction as we start to prove properties
of the determinant. (Again, mathematical induction is reviewed in Appendix
A.) We will always consider A to be an n X n matrix.

PROPERTY 1 The Transpose Property

For any square matrix A, we have det(A) = det(A”).

PROOF  Verification of this property is trivial for determinants of orders 1 or
2. Let n > 2, and assume that the property holds for square matrices of size
smaller than n × n. We proceed to prove Property 1 for an n × n matrix A. We
have

det(A) = a₁₁|A₁₁| - a₁₂|A₁₂| + ··· + (-1)ⁿ⁺¹a₁ₙ|A₁ₙ|.     Expanding on row 1 of A

Writing B = Aᵀ, we have

det(B) = b₁₁|B₁₁| - b₂₁|B₂₁| + ··· + (-1)ⁿ⁺¹bₙ₁|Bₙ₁|.     Expanding on column 1 of B

However, a₁ⱼ = bⱼ₁ and Bⱼ₁ = (A₁ⱼ)ᵀ, because B = Aᵀ. Applying our induction
hypothesis to the (n - 1)st-order determinant |A₁ⱼ|, we have |A₁ⱼ| = |Bⱼ₁|. We
conclude that det(A) = det(B) = det(Aᵀ).

This transpose property has a very useful consequence. It guarantees that


any property of the determinant involving rows of a matrix is equally valid if
we replace rows by columns in the statement of that property. For example, the
next property has an analogue for columns.

PROPERTY 2 The Row-Interchange Property

If two different rows of a square matrix A are interchanged, the


determinant of the resulting matrix is —det(A).

PROOF  Again we find that the proof is trivial for the case n = 2. Assume that
n > 2, and that this row-interchange property holds for matrices of size smaller
than n × n. Let A be an n × n matrix, and let B be the matrix obtained from A
by interchanging the ith and rth rows, leaving the other rows unchanged.
Because n > 2, we can choose a kth row for expansion by minors, where k is
different from both r and i. Consider the cofactors

(-1)ᵏ⁺ʲ|Aₖⱼ|  and  (-1)ᵏ⁺ʲ|Bₖⱼ|.

These numbers must have opposite signs, by our induction hypothesis,
because the minor matrices Aₖⱼ and Bₖⱼ have size (n - 1) × (n - 1), and Bₖⱼ can
be obtained from Aₖⱼ by interchanging two rows. That is, |Bₖⱼ| = -|Aₖⱼ|.
Expanding by minors on the kth row to find det(A) and det(B), we see that
det(A) = -det(B).

PROPERTY 3. The Equal-Rows Property

If two rows of a square matrix A are equal, then det(A) = 0.

PROOF Let B be the matrix obtained from A by interchanging the two equal
rows of A. By the row-interchange property, we have det(B) = —det(A). On the
other hand, B = A, so det(A) = —det(A). Therefore, det(A) = 0. a

PROPERTY 4 The Scalar-Multiplication Property

If a single row of a square matrix A is multiplied by a scalar r, the
determinant of the resulting matrix is r · det(A).

PROOF  Let r be any scalar, and let B be the matrix obtained from A by
replacing the kth row [aₖ₁, aₖ₂, . . . , aₖₙ] of A by [raₖ₁, raₖ₂, . . . , raₖₙ]. Since the
rows of B are equal to those of A except possibly for the kth row, it follows that
the minor matrices Aₖⱼ and Bₖⱼ are equal for each j. Therefore, a′ₖⱼ = b′ₖⱼ, and
computing det(B) by expanding by minors on the kth row, we have

det(B) = bₖ₁b′ₖ₁ + bₖ₂b′ₖ₂ + ··· + bₖₙb′ₖₙ
       = raₖ₁a′ₖ₁ + raₖ₂a′ₖ₂ + ··· + raₖₙa′ₖₙ
       = r · det(A).

HISTORICAL NOTE  THE THEORY OF DETERMINANTS grew from the efforts of many mathematicians
of the late eighteenth and early nineteenth centuries. Besides Gabriel Cramer, whose work
we will discuss in the note on page 267, Etienne Bezout (1739-1783) in 1764 and Alexandre-
Theophile Vandermonde (1735-1796) in 1771 gave various methods for computing determi-
nants. In a work on integral calculus, Pierre Simon Laplace (1749-1827) had to deal with systems
of linear equations. He repeated the work of Cramer, but he also stated and proved the rule that
interchanging two adjacent columns of the determinant changes the sign and showed that a
determinant with two equal columns will be 0.
The most complete of the early works on determinants is that of Augustin-Louis Cauchy
(1789-1857) in 1812. In this work, Cauchy introduced the name determinant to replace several
older terms, used our current double-subscript notation for a square array of numbers, defined the
array of adjoints (or minors) to a given array, and showed that one can calculate the determinant
by expanding on any row or column. In addition, Cauchy re-proved many of the standard
theorems on determinants that had been more or less known for the past 50 years.
Cauchy was the most prolific mathematician of the nineteenth century, contributing to such
areas as complex analysis, calculus, differential equations, and mechanics. In particular, he wrote
the first calculus text using our modern ε,δ-approach to continuity. Politically he was a
conservative; when the July Revolution of 1830 replaced the Bourbon king Charles X with the
Orleans king Louis-Philippe, Cauchy refused to take the oath of allegiance, thereby forfeiting his
chairs at the École Polytechnique and the Collège de France and going into exile in Turin and
Prague.

EXAMPLE 5  Find the determinant of the matrix

    [ 2  1  3   4  2 ]
    [ 6  2  1   4  1 ]
A = [ 6  3  9  12  6 ].
    [ 2  1  3   4  1 ]
    [ 1  4  2   1  1 ]

SOLUTION  We note that the third row of A is three times the first row. Therefore, we have

           | 2  1  3  4  2 |
           | 6  2  1  4  1 |
det(A) = 3 | 2  1  3  4  2 |     Property 4
           | 2  1  3  4  1 |
           | 1  4  2  1  1 |

       = 3(0) = 0.               Property 3

The row-interchange property and the scalar-multiplication property
indicate how the determinant of a matrix changes when two of the three
elementary row operations are used. The next property deals with the most
complicated of the elementary row operations, and lies at the heart of the
efficient computation of determinants given in the next section.

PROPERTY 5S. The Row-Addition Property

If the product of one row of a square matrix A by a scalar is added to a
different row of A, the determinant of the resulting matrix is the same
as det(A).

PROOF  Let aᵢ = [aᵢ₁, aᵢ₂, . . . , aᵢₙ] be the ith row of A. Suppose that raᵢ is added
to the kth row aₖ of A, where r is any scalar and k ≠ i. We obtain a matrix B
whose rows are the same as the rows of A except possibly for the kth row,
which is

bₖ = [raᵢ₁ + aₖ₁, raᵢ₂ + aₖ₂, . . . , raᵢₙ + aₖₙ].

Clearly the minor matrices Aₖⱼ and Bₖⱼ are equal for each j. Therefore, a′ₖⱼ = b′ₖⱼ,
and computing det(B) by expanding by minors on the kth row, we have

det(B) = bₖ₁b′ₖ₁ + bₖ₂b′ₖ₂ + ··· + bₖₙb′ₖₙ
       = (raᵢ₁ + aₖ₁)a′ₖ₁ + (raᵢ₂ + aₖ₂)a′ₖ₂ + ··· + (raᵢₙ + aₖₙ)a′ₖₙ
       = r(aᵢ₁a′ₖ₁ + aᵢ₂a′ₖ₂ + ··· + aᵢₙa′ₖₙ)
         + (aₖ₁a′ₖ₁ + aₖ₂a′ₖ₂ + ··· + aₖₙa′ₖₙ)
       = r · det(C) + det(A),

where C is the matrix obtained from A by replacing the kth row of A with the
ith row of A. Because C has two equal rows, its determinant is zero, so det(B) =
det(A).

We now know how the three types of elementary row operations affect the
determinant of a matrix A. In particular, if we reduce A to an echelon form H
and avoid the use of row scaling, then det(A) = ±det(H), and det(H) is the
product of its diagonal entries. (See Example 4.) We know that an echelon
form of A has only nonzero entries on its main diagonal if and only if A is
invertible. Thus, det(A) ≠ 0 if and only if A is invertible. We state this new
condition for invertibility as a theorem.

THEOREM 4.3 Determinant Criterion for invertibility

A square matrix A is invertible if and only if det(A) ≠ 0. Equivalently,
A is singular if and only if det(A) = 0.

We conclude with a multiplicative property of determinants. Section 4.4


indicates that this property has important geometric significance. Rather than
labeling it ‘“Property 6,” we emphasize its increased level of importance over
Properties | through 5 by stating it as a theorem.

THEOREM 4.4 The Multiplicative Property

If A and B aren X n matrices, then det(AB) = det(A) - det(B).

PROOF  First we note that, if A is a diagonal matrix, the result follows easily,
because the product

[ a₁₁   0   ···   0  ] [ b₁₁  b₁₂  ···  b₁ₙ ]
[  0   a₂₂  ···   0  ] [ b₂₁  b₂₂  ···  b₂ₙ ]
[  ⋮    ⋮          ⋮ ] [  ⋮    ⋮          ⋮ ]
[  0    0   ···  aₙₙ ] [ bₙ₁  bₙ₂  ···  bₙₙ ]

has its ith row equal to aᵢᵢ times the ith row of B. Using the scalar-
multiplication property in each of these rows, we obtain

det(AB) = (a₁₁a₂₂ ··· aₙₙ) · det(B) = det(A) · det(B).

To deal with the nondiagonal case, we begin by reducing the problem to


the case in which the matrix A is invertible. For if A is singular, then so is AB

(see Exercise 30); so both A and AB have a zero determinant, by Theorem 4.3,
and det(A) · det(B) = 0, too.
If we assume that A is invertible, it can be row-reduced through row-
interchange and row-addition operations to an upper-triangular matrix with
nonzero entries on the diagonal. We continue such row reduction analogous to
the Gauss-Jordan method but without making pivots 1, and finally we reduce
A to a diagonal matrix D with nonzero diagonal entries. We can write D = EA,
where E is the product of elementary matrices corresponding to the row
interchanges and row additions used to reduce A to D. By the properties
of determinants, we have det(A) = (-1)ʳ · det(D), where r is the number
of row interchanges. The same sequence of steps will reduce the matrix
AB to the matrix E(AB) = (EA)B = DB, so det(AB) = (-1)ʳ · det(DB).
Therefore,

det(AB) = (-1)ʳ · det(DB) = (-1)ʳ · det(D) · det(B) = det(A) · det(B),


and the proof is complete. a

EXAMPLE 6  Find det(A) if

    [ 2  0  0 ] [ 1  2  3 ]
A = [ 1  3  0 ] [ 0  1  2 ].
    [ 4  2  1 ] [ 0  0  2 ]

SOLUTION  Because the determinant of an upper- or lower-triangular matrix is the product
of the diagonal elements (see Example 4), Theorem 4.4 shows that

         | 2  0  0 | | 1  2  3 |
det(A) = | 1  3  0 | | 0  1  2 | = (6)(2) = 12.
         | 4  2  1 | | 0  0  2 |

EXAMPLE 7  If det(A) = 3, find det(A⁵) and det(A⁻¹).

SOLUTION  Applying Theorem 4.4 several times, we have

det(A⁵) = [det(A)]⁵ = 3⁵ = 243.

From AA⁻¹ = I, we obtain

1 = det(I) = det(AA⁻¹) = det(A)det(A⁻¹) = 3[det(A⁻¹)],

so det(A⁻¹) = 1/3.
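Theorem 4.4 and the relations in Example 7 are easy to test numerically. The lines below are our MATLAB sketch with an arbitrary invertible 2 × 2 matrix (not from the text); small roundoff differences are to be expected.

    A = [2 1; 1 3];                  % det(A) = 5
    det(A^5),  det(A)^5              % both display 3125 (up to roundoff)
    det(inv(A)),  1/det(A)           % both display 0.2000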

Exercise 31 asks you to prove that if A is invertible, then

det(A⁻¹) = 1/det(A).

SUMMARY

1. The cofactor of an element aᵢⱼ in a square matrix A is (-1)ⁱ⁺ʲ|Aᵢⱼ|, where Aᵢⱼ is
   the matrix obtained from A by deleting the ith row and the jth column.
2. The determinant of an n × n matrix may be defined inductively by
   expansion by minors on the first row. The determinant can be computed
   by expansion by minors on any row or on any column; it is the sum of the
   products of the entries in that row or column by the cofactors of the
   entries. For large matrices, such a computation is hopelessly long.
3. The elementary row operations have the following effect on the determi-
   nant of a square matrix A.
   a. If two different rows of A are interchanged, the sign of the determinant
      is changed.
   b. If a single row of A is multiplied by a scalar, the determinant is
      multiplied by the scalar.
   c. If a multiple of one row is added to a different row, the determinant is
      not changed.
4. We have det(A) = det(Aᵀ). As a consequence, the properties just listed for
   elementary row operations are also true for elementary column opera-
   tions.
5. If two rows or two columns of a matrix are the same, the determinant of
   the matrix is zero.
6. The determinant of an upper-triangular matrix or of a lower-triangular
   matrix is the product of the diagonal entries.
7. An n × n matrix A is invertible if and only if det(A) ≠ 0.
8. If A and B are n × n matrices, then det(AB) = det(A) · det(B).

EXERCISES

In Exercises 1-10, find the determinant of the [2 3 4 6 2 0-1 7


given matrix. 7 2 0-9 6 8 F 1 0 4
4 1 0 2 8 -2 1 QO
{5 2 1 1 0 6 (GO i -t 0 4 1 6 2
I. jt -1 1 2 : l “
3 0 2 5 0 1 1 2 0-1! 2 4

for 324
141
a -1
4-1
2 #1
2
ere oe BOT
1 2
1 2 0-1 2 4
: 14 621 1 O 1 8 1 5
3./2 3 1 6./0 4 1
14) 00 i

1012 22. ataatbate


3412 23. a, b, 2a + 3b
0-16 100 24. a, b, 2a + 3b + 2c
0121
25. a+b bt+tcct+a
11. Find the cofactor of 5 for the matrix in
Exercise 2.
In Exercises 26-29, find the values of » for which
12. Find the cofactor of 3 for the matrix in
the given matrix is singular.
Exercise 4.
13. Find the cofactor of 7 for the matrix in
Exercise 8. 26.
3 2-A | 23-A
14. Find the cofactor of —5 for the matrix in
Exercise 9.
2-r 0 0 |
28. 0 I-A ‘|
0 1 1l-A
In Exercises 15-20, let A be a 3 X 3 matrix with
det(A) = 2. 1-a C 2
-| 0 4-A. 3
0 4 -)
15. Find det(A’). 16. Find det(4’).
. If A and B are n X¥ n matrices and if A is
17. Find det(34). 18. Find det(A + A). singular, prove (without using Theorem 4.4)
19. Find det(A-'). 20. Find det(A’). that AB is also singular. (Hint: Assume that
21. Mark each of the following Trve or False. AB is invertible, and derive a contradiction.]
___ a. The determinant det(A) is defined for any 31. Prove that if A is invertible, then det(A~') =
matrix A. 1 /det(A).
. The determinant det(4) is defined for 32. If A and C are n x n matrices, with C
each square matria A. invertible, prove that det(A) = det(C~'AC).
. The determinant of a square matrix is a
scalar.
. Without using the multiplicative property of
determinants (Theorem 4.4), prove that
. If a matrix A is-multiplied by a scalar c,
the determinant of the resulting matrix is det(AB) = det(A) - det(S) for the case where
c+ det(A). B is a diagona! matrix.
. If an X n matrix A is multiplied by a . Continuing Exercise 33, find two other types
scalar c, the determinant of the resulting of matrices B for which it is easy to show
matrix is c” + det(A). that det(AB) = det(A) - det(B).
. For every square matrix A, we have 35. Prove that, if three n x n matrices A, B, and
det(AA7) = det(A7A) = [det(A))’. C are identical except for the kth rows a, dy:
—— g. If two rows and also two columns of a and c,, respectively, which are related by
square matrix A are interchanged, the a, = b, + ¢, then
determinant changes sign.
—_—h. The determinant of an elementary matrix det(A) = det(B) + det(C).
is nonzero. - 36. Notice that
—— |. If det(A) = 2 and det(B) = 3, then
det(A + B) = 5S. Qi, 42 = (4,@y)) = (—@,24)
—— j. If det(A) = 2 and det(B) = 3, then 2, 2,
det(AB) = 6. is a sum of signed products, where each
product contains precisely one factor from
In Exercises 22-25, let A be a 3 X 3 matrix with each row and one factor from each columa
row vectors a, b,c and with determinant equal to of the corresponding matrix. Prove by
3. Find the determinant of the matrix having the induction that this is true for ann X ”
indicated row vectors. matrix A = [a;].

31. (Application to permutation theory) Consider skeptical of our assertion that the solar
an arrangement of n objects, lined up in a system would be dead long before a
column. A rearrangement of the order of the present-day computer could find the
objects is called a permutation of the objects. determinant of a 50 x 50 matrix using just
Every such permutation can be achieved bv Definition 4.1 with expansion by minors.
successively swapping the positions of pairs a. Recall that n! = n(n — 1) - + - (3)(2X(1).
of the objects. For example, the first swap Show by induction that expansion of an
might be to interchange the first object with n X n matrix by minors requires at least
whatever one you want to be first in the new n! multiplications for n > 1.
arrangement, and then continuing this
procedure with the second, the third, etc.
WH b. Run the routine: EBYMTIME
; in LINTEK'
However, there are many possible sequences and find the time required to perform n
multiplications for n = 8, 12, 16, 20, 25,
of swaps that will achieve a given permutation.
Use the theory of determinants to prove that 30, 40, 50, 70, and 100.
it is impossible to achieve the same 39. Use MATLAB or the routine MATCOMP in
permutation using both an even number and LINTEK to check Example 2 and Exercises
an odd number of swaps. [Hint: It doesn’t 5-10. Load the appropriate file of matrices
matter what the objects actually are—think of if it is accessible. The determinant of a
them as being the rows of an n x n matrix.] matrix A is found in MATLAB using the
38. This exercise is for the reader who is command det(A).

4.3 COMPUTATION OF DETERMINANTS AND CRAMER'S RULE

We have seen that computation of determinants of high order is an unreasonable
task if it is done directly from Definition 4.1, relying entirely on repeated
expansion by minors. In the special case where a square matrix is triangular,
Example 4 in Section 4.2 shows that the determinant is simply the product of
the diagonal entries. We know that a matrix can be reduced to row-echelon
form by means of elementary row operations, and row-echelon form for a
square matrix is always triangular. The discussion leading to Theorem 4.3 in
the previous section actually shows how the determinant of a matrix can be
computed by a row reduction to echelon form. We rephrase part of this
discussion in a box as an algorithm that a computer might follow to find a
determinant.

Computation of a Determinant
The determinant of an n × n matrix A can be computed as follows:
1. Reduce A to an echelon form, using only row addition and row
   interchanges.
2. If any of the matrices appearing in the reduction contains a row of
   zeros, then det(A) = 0.
3. Otherwise,

   det(A) = (-1)ʳ · (Product of pivots),

   where r is the number of row interchanges performed.
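The boxed algorithm can be programmed in a few lines. The following MATLAB function is our own sketch (the name rowreddet.m is hypothetical); it uses only row addition and row interchanges, as in steps 1-3, taking the first nonzero entry in each column as the pivot.

    function d = rowreddet(A)
    % ROWREDDET  Determinant by reduction to echelon form (boxed algorithm above).
    n = size(A, 1);
    r = 0;                                    % number of row interchanges performed
    for k = 1:n
        p = find(A(k:n, k) ~= 0, 1) + k - 1;  % first row at or below row k with a nonzero entry in column k
        if isempty(p)
            d = 0;                            % no pivot in this column: determinant is 0
            return
        end
        if p ~= k
            A([k p], :) = A([p k], :);        % row interchange
            r = r + 1;
        end
        for i = k+1:n
            A(i,:) = A(i,:) - (A(i,k)/A(k,k)) * A(k,:);   % row addition
        end
    end
    d = (-1)^r * prod(diag(A));               % (-1)^r times the product of the pivots
    end

For the matrix of Example 1 below, rowreddet([2 2 0 4; 3 3 2 2; 0 1 3 2; 2 0 2 1]) returns -68.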

When doing a computation with pencil and paper rather than with a
computer, we often use row scaling to make pivots 1, in order to ease
calculations. As you study the following example, notice how the pivots
accumulate as factors when the scalar-multiplication property of determinants
is repeatedly used.

EXAMPLE 1  Find the determinant of the following matrix by reducing it to row-echelon
form.

    [ 2  2  0  4 ]
A = [ 3  3  2  2 ]
    [ 0  1  3  2 ]
    [ 2  0  2  1 ]

SOLUTION  We find that

| 2  2  0  4 |     | 1  1  0  2 |
| 3  3  2  2 |     | 3  3  2  2 |
| 0  1  3  2 | = 2 | 0  1  3  2 |           Scalar-multiplication property
| 2  0  2  1 |     | 2  0  2  1 |

                   | 1   1  0   2 |
                   | 0   0  2  -4 |
               = 2 | 0   1  3   2 |         Row-addition property twice
                   | 0  -2  2  -3 |

                    | 1   1  0   2 |
                    | 0   1  3   2 |
               = -2 | 0   0  2  -4 |        Row-interchange property
                    | 0  -2  2  -3 |

                    | 1  1  0   2 |
                    | 0  1  3   2 |
               = -2 | 0  0  2  -4 |         Row-addition property
                    | 0  0  8   1 |

                         | 1  1  0   2 |
                         | 0  1  3   2 |
               = (-2)(2) | 0  0  1  -2 |    Scalar-multiplication property
                         | 0  0  8   1 |

                         | 1  1  0   2 |
                         | 0  1  3   2 |
               = (-2)(2) | 0  0  1  -2 |    Row-addition property
                         | 0  0  0  17 |

Therefore, det(A) = (-2)(2)(17) = -68.

In our written work, we usually don’t write out the shaded portion of the
computation in the preceding example.
Row reduction offers an efficient way to program a computer to compute a
determinant. If we are using pencil and paper, a further modification is more
practical. We can use elementary row or column operations and the properties
of determinants to reduce the computation to the determinant of a matrix
having some row or column with a sole nonzero entry. A computer program
generally modifies the matrix so that the first column has a single nonzero
entry, but we can look at the matrix and choose the row or column where this
can be achieved most easily. Expanding by minors on that row or column
reduces the computation to a determinant of order one less, and we can
continue the process until we are left with the computation of a determinant of
a 2 × 2 matrix. Here is an illustration.

EXAMPLE 2  Find the determinant of the matrix

    [  2  -1   3  5 ]
A = [  2   0   1  0 ].
    [  6   1   3  4 ]
    [ -7   3  -2  8 ]

SOLUTION  It is easiest to create zeros in the second row and then expand by minors on
that row. We start by adding -2 times the third column to the first column,
and we continue in this fashion:

|  2  -1   3  5 |   | -4  -1   3  5 |
|  2   0   1  0 |   |  0   0   1  0 |          | -4  -1  5 |
|  6   1   3  4 | = |  0   1   3  4 | = (-1)   |  0   1  4 |
| -7   3  -2  8 |   | -3   3  -2  8 |          | -3   3  8 |

                      | -4  -1   9 |           | -4   9 |
                 = -  |  0   1   0 | = -(1)    | -3  -4 |
                      | -3   3  -4 |

                 = -(16 + 27) = -43.

Cramer’s Rule

We now exhibit formulas in terms of determinants for the components in the


solution vector of a square linear system Ax = b, where A is an invertible
matrix. The formulas are contained in the following theorem.

THEOREM 4.5  Cramer's Rule

Consider the linear system Ax = b, where A = [aᵢⱼ] is an n × n
invertible matrix,

    [ x₁ ]              [ b₁ ]
x = [  ⋮ ] ,   and  b = [  ⋮ ] .
    [ xₙ ]              [ bₙ ]

The system has a unique solution given by

xₖ = det(Bₖ)/det(A)    for k = 1, . . . , n,                 (1)

where Bₖ is the matrix obtained from A by replacing the kth-column
vector of A by the column vector b.

PROOF  Because A is invertible, we know that the linear system Ax = b has a
unique solution, and we let x be this solution. Let Xₖ be the matrix obtained
from the n × n identity matrix by replacing its kth-column vector by the
column vector x, so that

     [ 1  0  0  ···  x₁  0  ···  0 ]
     [ 0  1  0  ···  x₂  0  ···  0 ]
Xₖ = [ ⋮            ⋮              ⋮ ] .
     [ 0  0  0  ···  xₖ  0  ···  0 ]
     [ ⋮            ⋮              ⋮ ]
     [ 0  0  0  ···  xₙ  0  ···  1 ]

Let us compute the product AXₖ. If j ≠ k, then the jth column of AXₖ is the
product of A and the jth column of the identity matrix, which also yields the
jth column of A. If j = k, then the jth column of AXₖ is Ax = b. Thus AXₖ is the
matrix obtained from A by replacing the kth column of A by the column vector
b. That is, AXₖ is the matrix Bₖ described in the statement of the theorem.
From the equation AXₖ = Bₖ and the multiplicative property of determinants,
we obtain

det(A) · det(Xₖ) = det(Bₖ).

Computing det(Xₖ) by expanding by minors across the kth row, we see
that det(Xₖ) = xₖ, and thus det(A) · xₖ = det(Bₖ). Because A is invertible,
we know that det(A) ≠ 0, and so xₖ = det(Bₖ)/det(A), as asserted in Equation
(1).
EXAMPLE 3  Solve the linear system

5x₁ - 2x₂ + x₃ = 1
3x₁ + 2x₂      = 3
 x₁ +  x₂ - x₃ = 0,

using Cramer's rule.
SOLUTION  Using the notation in Theorem 4.5, we find that

         | 5  -2   1 |                     | 1  -2   1 |
det(A) = | 3   2   0 | = -15,   det(B₁) =  | 3   2   0 | = -5,
         | 1   1  -1 |                     | 0   1  -1 |

          | 5   1   1 |                     | 5  -2   1 |
det(B₂) = | 3   3   0 | = -15,   det(B₃) =  | 3   2   3 | = -20.
          | 1   0  -1 |                     | 1   1   0 |

Hence,

x₁ = -5/-15 = 1/3,   x₂ = -15/-15 = 1,   x₃ = -20/-15 = 4/3.
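Cramer's rule is easy to express in MATLAB; the following sketch (ours, using the built-in det function and the backslash operator) solves the system of Example 3 and compares the result with the solution obtained by row reduction.

    A = [5 -2 1; 3 2 0; 1 1 -1];
    b = [1; 3; 0];
    x = zeros(3, 1);
    for k = 1:3
        Bk = A;  Bk(:, k) = b;         % replace the kth column of A by b
        x(k) = det(Bk) / det(A);       % Eq. (1) of Theorem 4.5
    end
    x                                  % displays 0.3333, 1.0000, 1.3333
    A \ b                              % the same solution, found by row reduction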

HISTORICAL NOTE  CRAMER'S RULE appeared for the first time in full generality in the
Introduction to the Analysis of Algebraic Curves (1750) by Gabriel Cramer (1704-1752). Cramer
was interested in the problem of determining the equation of a plane curve of given degree passing
through a certain number of given points. For example. the general second-degree curve, whose
equation is
A+ By + Cx + Dy? + Exy + x? =0,
is determined by five points. To determine A, B, C, D, and E, given the five points, Cramer
substituted the coordinates of each of the points into the equation for the second-degree curve and
found five linear equations for the five unknown coefficients. Cramer then referred to the appendix
of the work, in which he gave his general rule: “One finds the value of each unknown by forming n
fractions of which the common denominator has as many terms as there are permutations of n
things.” He went on to explain exactly how one calculates these terms as products of certain
coefficients of the n equations, how one determines the appropriate sign for each term, and how
one determines the n numerators of the fractions by replacing certain coefficients in this
calculation by the constant terms of the system.
Cramer did not, however, explain why his calculations work. An explanation of the rule for
the cases n = 2 and n = 3 did appear, however, in A Treatise of Algebra by Colin Maclaurin
(1698-1746). This work was probably written in the 1730s, but was not published until 1748, after
his death. In it, Maclaurin derived Cramer's rule for the two-variable case by going through the
standard elimination procedure. He then derived the three-variable version by solving two pairs of
equations for one unknown and equating the results, thus reducing the problem to the
two-variable case. Maclaurin then described the result for the four-variable case, but said nothing
about any further generalization. Interestingly, Leonhard Euler, in his Introduction of Algebra of
1767, does not mention Cramer’s rule at all in his section on solving systems of linear equations.

The most efficient way we have presented for computing a determinant is


to row-reduce a matrix to triangular form. This is also the way we solve a
square linear system. If A is a 10 × 10 invertible matrix, solving Ax = b using
Cramer's rule involves row-reducing eleven 10 × 10 matrices A, B₁, B₂, . . . ,
B₁₀ to triangular form. Solving the linear system by the method of Section 1.4
requires row-reducing just one 10 × 11 matrix so that the first ten columns are
in upper-triangular form. This illustrates the folly of using Cramer's rule to
solve linear systems. However, the structure of the components of the solution
vector, as given by the Cramer’s rule formula x, = det(B,)/det(A), is of interest
in the study of advanced calculus, for example.

The Adjoint Matrix

We conclude this section by finding a formula in terms of determinants for the
inverse of an invertible n × n matrix A = [aᵢⱼ]. Recall the definition of the
cofactor a′ᵢⱼ from Eq. (2) of Section 4.2. Let Aᵢ→ⱼ be the matrix obtained from A
by replacing the jth row of A by the ith row. That is,

        [ a₁₁  a₁₂  ···  a₁ₙ ]
        [  ⋮    ⋮          ⋮ ]
        [ aᵢ₁  aᵢ₂  ···  aᵢₙ ]   ith row
Aᵢ→ⱼ =  [  ⋮    ⋮          ⋮ ]
        [ aᵢ₁  aᵢ₂  ···  aᵢₙ ]   jth row
        [  ⋮    ⋮          ⋮ ]
        [ aₙ₁  aₙ₂  ···  aₙₙ ]

Then

det(Aᵢ→ⱼ) = det(A) if i = j,   and   det(Aᵢ→ⱼ) = 0 if i ≠ j.

If we expand det(Aᵢ→ⱼ) by minors on the jth row, we have

det(Aᵢ→ⱼ) = Σ (s = 1 to n) aᵢₛa′ⱼₛ,

and we obtain the important relation

Σ (s = 1 to n) aᵢₛa′ⱼₛ = det(A) if i = j,   and   = 0 if i ≠ j.     (2)

The term on the left-hand side in Eq. (2) is the entry in the ith row and jth
column in the product A(A′)ᵀ, where A′ = [a′ᵢⱼ] is the matrix whose entries are
the cofactors of the entries of A. Thus Eq. (2) can be written in matrix form as

A(A′)ᵀ = (det(A))I,

where I is the n × n identity matrix. Similarly, replacing the ith column of A by
the jth column and by expanding on the ith column, we have

Σ (r = 1 to n) aᵣⱼa′ᵣᵢ = det(A) if i = j,   and   = 0 if i ≠ j.     (3)

Relation (3) yields (A′)ᵀA = (det(A))I.
The matrix (A′)ᵀ is called the adjoint of A and is denoted by adj(A). We
have established an important relationship between a matrix and its adjoint.

THEOREM 4.6  Property of the Adjoint

Let A be an n × n matrix. The adjoint adj(A) = (A′)ᵀ of A satisfies

(adj(A))A = A(adj(A)) = (det(A))I,

where I is the n × n identity matrix.

Theorem 4.6 provides a formula for the inverse of an invertible matrix,


which we present as a corollary.

COROLLARY  A Formula for the Inverse of an Invertible Matrix

Let A = [aᵢⱼ] be an n × n matrix with det(A) ≠ 0. Then A is invertible,
and

A⁻¹ = (1/det(A)) adj(A),

where adj(A) = [a′ᵢⱼ]ᵀ is the transposed matrix of cofactors.

EXAMPLE 4  Find the inverse of

    [ 4  0  1 ]
A = [ 2  2  0 ],
    [ 3  1  1 ]
if the matrix is invertible, using the corollary of Theorem 4.6.


SOLUTION  We find that det(A) = 4, so A is invertible. The cofactors a′ᵢⱼ are

a′₁₁ = (-1)² | 2  0 | = 2,      a′₁₂ = (-1)³ | 2  0 | = -2,
             | 1  1 |                        | 3  1 |

a′₁₃ = (-1)⁴ | 2  2 | = -4,     a′₂₁ = (-1)³ | 0  1 | = 1,
             | 3  1 |                        | 1  1 |

a′₂₂ = (-1)⁴ | 4  1 | = 1,      a′₂₃ = (-1)⁵ | 4  0 | = -4,
             | 3  1 |                        | 3  1 |

a′₃₁ = (-1)⁴ | 0  1 | = -2,     a′₃₂ = (-1)⁵ | 4  1 | = 2,
             | 2  0 |                        | 2  0 |

a′₃₃ = (-1)⁶ | 4  0 | = 8.
             | 2  2 |

Hence,

     [  2  -2  -4 ]                   [  2   1  -2 ]
A′ = [  1   1  -4 ] ,  so  adj(A)  =  [ -2   1   2 ] ,
     [ -2   2   8 ]                   [ -4  -4   8 ]

and

                                [  2   1  -2 ]   [  1/2   1/4  -1/2 ]
A⁻¹ = (1/det(A)) adj(A) = (1/4) [ -2   1   2 ] = [ -1/2   1/4   1/2 ] .
                                [ -4  -4   8 ]   [  -1    -1     2  ]
The method described in Section 1.5 for finding the inverse of an
invertible matrix is more efficient than the method illustrated in the preceding
example, especially if the matrix is large. The corollary is often used to find the
inverse of a 2 × 2 matrix. We see that if ad - bc ≠ 0, then

[ a  b ]⁻¹        1      [  d  -b ]
[ c  d ]    =  -------   [ -c   a ] .
               ad - bc
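The corollary can be programmed directly from the definition of the cofactors. The MATLAB lines below are our own sketch; they rebuild the adjoint of the matrix of Example 4 and recover A⁻¹. (Computing each cofactor with det is convenient here, but, as noted above, this is not an efficient way to invert a large matrix.)

    A = [4 0 1; 2 2 0; 3 1 1];                       % the matrix of Example 4
    n = size(A, 1);
    C = zeros(n);                                    % will hold the matrix of cofactors A'
    for i = 1:n
        for j = 1:n
            M = A([1:i-1, i+1:n], [1:j-1, j+1:n]);   % minor matrix A_ij
            C(i, j) = (-1)^(i+j) * det(M);           % cofactor a'_ij
        end
    end
    adjA = C.'                                       % adjoint: transpose of the cofactor matrix
    Ainv = adjA / det(A)                             % agrees with inv(A)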

SUMMARY

1. A computationally feasible algorithm for finding the determinant of a
   matrix is to reduce the matrix to echelon form, using just row-addition and
   row-interchange operations. If a row of zeros is formed during the process,
   the determinant is zero. Otherwise, the determinant of the original matrix
   is found by computing (-1)ʳ · (Product of pivots) in the echelon form,
   where r is the number of row interchanges performed. This is one way to
   program a computer to find a determinant.
2. The determinant of a matrix can be found by row or column reduction of
   the matrix to a matrix having a sole nonzero entry in some column or row.
   One then expands by minors on that column or row, and continues this
   process. If a matrix having a zero row or column is encountered, the
   determinant is zero. Otherwise, one continues until the computation is
   reduced to the determinant of a 2 × 2 matrix. This is a good way to find a
   determinant when working with pencil and paper.
3. If A is invertible, the linear system Ax = b has the unique solution x whose
   kth component is given explicitly by the formula

   xₖ = det(Bₖ)/det(A),

   where the matrix Bₖ is obtained from matrix A by replacing the kth column
   of A by b.
4. The methods of Chapter 1 are far more efficient than those described in
   this section for actual computation of both the inverse of A and the
   solution of the system Ax = b.
5. Let A be an n × n matrix, and let A′ be its matrix of cofactors. The adjoint
   adj(A) is the matrix (A′)ᵀ and satisfies (adj(A))A = A(adj(A)) = (det(A))I,
   where I is the n × n identity matrix.
6. The inverse of an invertible matrix A is given by the explicit formula

   A⁻¹ = (1/det(A)) adj(A).
EXERCISES

In Exercises 1-10, find the determinant of the 7-1 3 0 0


given matrix. 01 4 0 0
9%)-5 2 6 0 O
[2 3-1 4-3 2 000 1 4
1 | 5 -7 1 2,{-1 -1 1 1-0 0 0-2 8
I-32 -1 -§ 5 7 Tr Qo 00 3-4

3, |2 3-1 2 10./-1 2 4 0 0O
° 3 —4 3 7 3 1 -2 0 0

1-10 4 . 5 1 5 0 0
- 11. The matrices in Exercises 8 and 9 have zero
| 3-5-1 7 entries except for entries in anr X r
4.| 9 3 1-6 submatrix R and a separate ¢ x 5 submatrix
2-5-1 8 S whose main diagonals lie on the main
(8 8 2-9 diagonal of the whole n x n matrix, and
2 1 0 0 0 where r + 5 = n. Prove that, 1f A is such a
3-! 2 0 0 matrix with submatrices & and S, then
510 4 1-1 2 det(A) = det(R) - det(S).
0 0-3 2 4 12. The matrix A in Exercise 10 has a structure
0 O O-1 3 similar to that discussed in Exercise 1 1,
r3 2 00 90 except that the square submatrices R and S
-1 4 1 0 0 lie along the other diagonal. State and prove
6 | 0-3 5 2 =O a result similar to that in Exercise 11 for
000 i: 4 such a matrix.
| 0 0 O-! 2| 13. State and prove a generalization of the result
fo 0 03 4 in Exercise 11, when the matrix A has zero
002 0-3 entries except for entries in k submatrices
72/02 1 0 0 positioned along the diagonal.
5-3 2 0 0
3 4 0 0 0| In Exercises 14-19, use the corollary to Theorem
72-1 0 0 4.6 to find A” if A is invertible.
gs ‘lo|4 5
03 0 0
6 Maat | Is. A=[) 1
10 0-4 2 1 -1 2 |

21 1 3 0 4 32. x} + XxX, ~ 3x; + X4 = |


16. A=! 0 ft I] 1% A4=/-2 1 4 2x, + X; + 2x,=0
—2 1 1 3 1 2
X,- 6x,- xX,
=5
3 0 3 |? 1 3 3x, + x; + x,=)
18. 4=| 4 | -2/ 19 4=10 1 4
-5 1 4 lh 24] 33. 6x,+ XxX,
— x; = 4
X,- x; + 5x,
= -2
20. Find the adjoint of the matcix| ; sf —x, + 3x, + x; = 2
2 X,+ %,-X,+2x%,= 0
21. Find the adjoint of the matrix | 3
0 34. Find the unique solution (assuming that it
exists) of the system of equations
22. Given that A! = |: | and det(A~') = 3, represented by the partitioned matrix
C
fina the matrix A. a, b, c, a, | 30,
. If Ais a matnx with integer entries and if a, b, c, d, | 3b,
lod
nw

det(A) = +1, prove that A~! also has the a, b, c, da, | 3b,|
same properties. a, dy cy a, | 3d,

35. Let A ve a square matrix. Mark each of the


In Exercises 24-31, solve the given system of following True or False.
linear equations by Cramer's rule wherever it is __ a. The determinant of a square matrix is th
possible. product of the entries on its main
diagonal.
24. x, — 2x, = 1 25. 2x,- 3x,= 1 ___ b. The determinant of an upper-tnianguiar
3x, + 4x, = 3 —4x, + 6x, = -2 square matrix is the product of the
entries on its main diagonal.
26. 3x, + x, = 5 27. xX,+ xX,
= 1
___c. The determinant of a lower-triangular
2x, + xX, = 0 x, + 2x, = 2 square matrix is the product of the
entries on its main diagonal.
28. 5x, - 2x, + x; = 1
__.d. A square matrix is nonsingular if and
xX, +x,=0 only if its determinant is positive.
xX, + 6x, — x, =4 ___ e. The column vectors of an n X n matrix
are independent if and only if the
29. x + 2X, ~ xy = —2
determinant of the matrix is
2x, + x,+ x,= 0 nonzero.
___ f. A homogeneous square linear system has
a nontrivial solution if and only if the
30. x, - x,+ x,=0 determinant of its coefficient matrix is
X,
+ 2x,- x,
= 1 zero.
X,- X,+ 2x, =0
___g. The product of a square matrix and its
adjoint is the identity matrix.
WA. 3x, + 2x,-x,= 1 ___h. The product of a square matrix and its
X, — 4x, + x; = -2 adjoint is equal to some scalar times the
5x, + 2x, = | identity matrix.
___ i. The transpose of the adjoint of A is the
matrix of cofactors of A.
in Exercises 32 and 33, find the component ___ j. The formula A~! = (1/det(A))adj(A) is of
x, of the solution vector for the given linear practical use in computing the inverse ©
system. a large nonsingular matrix.

46. Prove that the inverse of a nonsingular aefault roundoff control ratio r. Start
upper-triangular matrix is upper triangular. computing determinants of powers of A.
37. Prove that a square matrix is invertible if
Find the smallest positive integer m such
and only if its adjoint is an invertible that det(A”) # 1, according 9 MATCOMP.
matrix. How bad is the error? What does
MATCOMP give for det(A™)? At what
4g. Let A be an n X n matrix. Prove that integer exponent does the break occur
det(adj(A)) = det(A)""'. between the incorrect value 0 and incorrect
49. Let A be an invertible 1 x n matrix with values of large magnitude?
n> 1. Using Exercises 37 and 38, prove that Repeat the above, taking zero for
adj(adj(A)) = (det(A))""7A. roundoff control ratio r. Try to explain why
the results are different and what is
Bl he routine YUREDUCE in LINTEK has a happening in each case.
menu option D that will compute and display . Using MATLAB, find the smallest positive
the product of the diagonal elements of a integer m such that det(A”) # 1, according
square matrix. The routine MATCOMF has a
to MATLAB, for the matrix A in Exercise
menu cption D to compute a determinant.
43.
Use YUREDUCE or MATLAB to compute
the determinant of the matrices in Exercises
40-42. Write down your results. If you used In Exercises 45-47, use MATCOMP in LINTEK
YUREDUCE, use MATCOMP to compute or MATLAB and the corollary of Theorem 4.6 to
the determinants of the same matrices again find the matrix of cofactors of the given matrix.
and compare the answers.
li -9 28 13 -15 33 f} 2 -3
40. }32 -24 21 41.;-15 25 40 45. 2 3 0
10 13 -19 12 -33 27 3 1 «4

7.6 2.8 -3.9 19.3 25.0 [52 3] 47]


—33.2 11.4 13.2 22.4 18.3 46. 21-11 28
42. | 21.4 —32.1 45.7 -8.9 12.5 | 43 -71 37|
17.4 11.0 -6.8 20.3 -35.1 rT 6 -3 2 «14
22.7 11.9 33.2 2.5 7.8 3 7 8 1
47.
43. MATCOMP computes determinants in 4 9-5 3
essentially the way described in this section. I-§ -40 47 29
The matrix (Hint: Entries in the matrix of cofactors are
integers. The cofactors of a matrix are

2 3]
34
A =
continuous functions of its entries; that is,
changing an entry by a very slight amount
has determinant |, so every power of it will change a cofactor only slightly. Change
should have determinant i. Use MATCOMPF some entry just a bit to make the
with single-precision printing and with the determinant nonzero.]

4.4 LINEAR TRANSFORMATIONS AND DETERMINANTS (OPTIONAL)

We continue our program of exhibiting the relationship between matrices and linear transformations. Associated with an m × n matrix A is the linear transformation T mapping Rⁿ into Rᵐ, where T(x) = Ax for x in Rⁿ. If m = n, so that A is a square matrix, then det(A) is defined. But what is the meaning of the number det(A) for the transformation T? We now tackle this question, and the answer is so important that it merits a section all to itself. The notion of the determinant associated with a linear transformation T mapping Rⁿ into Rⁿ lies at the heart of variable substitution in integral calculus. This section presents an informal and intuitive explanation of this notion.

The Volume of an n-Box in Rᵐ

In Section 4.1, we saw that the area of the parallelogram (or 2-box) in R² determined by vectors a₁ and a₂ is the absolute value |det(A)| of the determinant of the 2 × 2 matrix A having a₁ and a₂ as column vectors.* We also saw that the volume of the parallelepiped (or 3-box) in R³ determined by vectors a₁, a₂, and a₃ is |det(A)| for the 3 × 3 matrix A whose column vectors are a₁, a₂, a₃. We wish to extend these notions by defining an n-box in Rᵐ for m ≥ n and finding its "volume."

DEFINITION 4.2  An n-Box in Rᵐ

Let a₁, a₂, ..., aₙ be n independent vectors in Rᵐ for n ≤ m. The n-box in Rᵐ determined by these vectors is the set of all vectors x satisfying

x = t₁a₁ + t₂a₂ + ··· + tₙaₙ

for 0 ≤ tᵢ ≤ 1 and i = 1, 2, ..., n.

If the vectors a₁, a₂, ..., aₙ in Definition 4.2 are dependent, the set described is a degenerate n-box.

EXAMPLE 1  Describe geometrically the 1-box determined by the "vector" 2 in R and the 1-box determined by a nonzero vector a in Rᵐ.

SOLUTION  The 1-box determined by the "vector" 2 in R consists of all numbers t(2) for 0 ≤ t ≤ 1, which is simply the closed interval 0 ≤ x ≤ 2. Similarly, the 1-box in Rᵐ determined by a nonzero vector a is the line segment joining the origin to the tip of a.

EXAMPLE 2  Draw symbolic sketches of a 2-box in Rᵐ and a 3-box in Rᵐ.

SOLUTION  A 2-box in Rᵐ is a parallelogram with a vertex at the origin, as shown in Figure 4.7. Similarly, a 3-box in Rᵐ is a parallelepiped with a vertex at the origin, as illustrated in Figure 4.8.

Notice that our boxes need not be rectangular boxes; perhaps we should
have used the term skew box to make this clear.

*Anticipating our work in this section, we arrange the vectors aᵢ as columns rather than as rows of A. Recall that det(A) = det(Aᵀ).
FIGURE 4.7  A 2-box in Rᵐ.     FIGURE 4.8  A 3-box in Rᵐ.

We are accustomed to speaking of the length of a line segment, of the area of a piece of the plane, and of the volume of a piece of space. To avoid introducing new terms when discussing a general n-box, we will use the three-dimensional term volume when speaking of its size. Notice that we already used the three-dimensional term box as the name for the object! Thus, by the volume of a 1-box, we simply mean its length; the volume of a 2-box is its area, and so on.

The volume of the 1-box determined by a₁ in Rᵐ is its length ‖a₁‖. Because the determinant of a 1 × 1 matrix is its sole entry, Exercise 36 shows that this length can be written as

Length = ‖a₁‖ = √det([a₁ · a₁]).   (1)

Let us turn to a 2-box in Rᵐ determined by nonzero and nonparallel vectors a₁ and a₂. We repeat an argument that we made in Section 4.1 for a 2-box in R², using a slightly different notation for this general m. From Figure 4.9, we see that the area of this parallelogram is given by

Area = ‖h‖ ‖a₂‖,

where, for the angle θ between a₁ and a₂, we have ‖h‖ = ‖a₁‖ sin θ. We then have

(Area)² = ‖a₁‖²‖a₂‖² sin²θ
        = ‖a₁‖²‖a₂‖²(1 − cos²θ)
        = ‖a₁‖²‖a₂‖² − (‖a₁‖ ‖a₂‖ cos θ)²
        = (a₁ · a₁)(a₂ · a₂) − (a₁ · a₂)(a₂ · a₁)

        = det [ a₁·a₁   a₁·a₂ ]
              [ a₂·a₁   a₂·a₂ ] = det([aᵢ · aⱼ]).   (2)

FIGURE 4.9  The volume of a 2-box in Rᵐ is (Length of the base) × (Altitude); here ‖h‖ = ‖a₁‖ sin θ.

FIGURE 4.10  The volume of a 3-box in Rᵐ is (Area of the base) × (Altitude).

From Eqs. (1) and (2), we might guess that the square of the volume of an n-box in Rᵐ is det([aᵢ · aⱼ]). Of course, we must define what we mean by the volume of such a box, but with the natural definition, this conjecture is true. If A is the matrix with jth column vector aⱼ, then Aᵀ is the matrix with ith row vector aᵢ, and the n × n matrix [aᵢ · aⱼ] is AᵀA, so Eq. (2) can be written as (Area)² = det(AᵀA).

We have an intuitive idea of the volume of an n-box in Rᵐ for n ≤ m. For example, the 3-box in Rᵐ determined by independent vectors a₁, a₂, a₃ has a volume equal to the altitude of the box times the volume (that is, area) of the base, as shown in Figure 4.10. Roughly speaking, the volume of an n-box is equal to the altitude of the box times the volume of the base, which is an (n − 1)-box. This notion of the altitude of a box can be made precise after we develop projections in Chapter 6. The formal definition of the volume of an n-box appears in Appendix B, as does the proof of our main result on volumes (Theorem 4.7). For the remainder of this section, we are content to proceed with our intuitive notion of volume.

THEOREM 4.7  Volume of a Box

The volume of the n-box in Rᵐ determined by independent vectors a₁, a₂, ..., aₙ is given by

Volume = √det(AᵀA),

where A is the m × n matrix with aⱼ as jth column vector.

The volume of an n-box in Rⁿ is of such importance that we restate this special case as a corollary.

COROLLARY  Volume of an n-Box in Rⁿ

If A is an n × n matrix with independent column vectors a₁, a₂, ..., aₙ, then |det(A)| is the volume of the n-box in Rⁿ determined by these n vectors.

PROOF  By Theorem 4.7, the square of the volume of the n-box is det(AᵀA). But because A is an n × n matrix, we have

det(AᵀA) = det(Aᵀ) · det(A) = (det(A))².

The conclusion of the corollary then follows at once.

EXAMPLE 3  Find the area of the parallelogram in R⁴ determined by the vectors [2, 1, −1, 3] and [0, 2, 4, −1].

SOLUTION  If

A = [  2   0 ]
    [  1   2 ]
    [ −1   4 ]
    [  3  −1 ],

then

AᵀA = [ 2  1  −1   3 ] [  2   0 ]   [ 15  −5 ]
      [ 0  2   4  −1 ] [  1   2 ] = [ −5  21 ].
                       [ −1   4 ]
                       [  3  −1 ]

By Theorem 4.7, we have

(Area)² = det [ 15  −5 ]
              [ −5  21 ] = 290.

Thus the area of the parallelogram is √290.

EXAMPLE 4  Find the volume of the parallelepiped in R³ determined by the vectors [1, 0, −1], [−1, 1, 3], and [2, 4, 1].

SOLUTION  We compute the determinant

|  1  −1   2 |   |  1  −1   2 |
|  0   1   4 | = |  0   1   4 | = 1(3 − 8) = −5.
| −1   3   1 |   |  0   2   3 |

Applying the corollary of Theorem 4.7, the volume of the parallelepiped is therefore 5.
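Examples 3 and 4 are easy to check numerically. The MATLAB sketch below (ours, not part of the text) writes the vectors of each example as the columns of a matrix and applies Theorem 4.7 and its corollary.

    A = [2 0; 1 2; -1 4; 3 -1];          % columns are the vectors of Example 3
    area = sqrt(det(A'*A))               % Theorem 4.7: sqrt(290)

    B = [1 -1 2; 0 1 4; -1 3 1];         % columns are the vectors of Example 4
    volume = abs(det(B))                 % corollary: |-5| = 5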

Comparing Theorem 4.7 and its corollary, we see that the formula for the volume of an n-box in a space of larger dimension m involves a square root, whereas the formula for the volume of a box in a space of its own dimension does not involve a square root. The student of calculus discovers that the calculus formulas used to find the length of a curve (which is one-dimensional) in the plane or in space involve a square root. The same is true of the formulas used to find the area of a surface (two-dimensional) in space. However, the calculus formulas for finding the area of part of the plane or the volume of some part of space do not involve square roots. Theorem 4.7 and its corollary lie at the heart of this difference in the calculus formulas.

Volume-Change Factor of a Linear Transformation

Recall the multiplicative property of determinants: det(AB) = det(A) · det(B), where A and B are n × n matrices. Suppose that both A and B have rank n. Then the column vectors b₁, b₂, ..., bₙ of B determine an n-box having volume |det(B)| ≠ 0. From the definition of matrix multiplication, the jth column vector of AB is Abⱼ, and we write

AB = [Ab₁  Ab₂  ···  Abₙ].

The linear transformation T: Rⁿ → Rⁿ defined by T(x) = Ax thus carries the original n-box determined by the column vectors of B into a new n-box determined by the column vectors of AB. The new n-box has volume |det(AB)| = |det(A)| · |det(B)|. That is, the volume of the new n-box, or image box, is equal to |det(A)| times the volume of the original box. Thus |det(A)| is referred to as the volume-change factor for the linear transformation T. We are interested in this concept only when det(A) ≠ 0, a requirement that ensures that A is invertible, that the n vectors Abⱼ are independent, and that T is an invertible transformation.

To illustrate this idea of the volume-change factor, consider the n-cube in Rⁿ determined by the vectors ce₁, ce₂, ..., ceₙ for c > 0. This n-cube with edges of length c has volume cⁿ. It is carried by T into an n-box having volume |det(A)| · cⁿ; the image n-box need not be a cube, nor even rectangular. We illustrate the image n-box in Figure 4.11 for the case n = 1 and in Figure 4.12 for the case n = 3.

EXAMPLE 5  Consider the linear transformation T: R³ → R³ defined by

T([x₁, x₂, x₃]) = [x₁ + x₃, 2x₂, 2x₁ + 5x₃].

Find the volume of the image box when T acts on the cube determined by the vectors ce₁, ce₂, and ce₃ for c > 0.
FIGURE 4.11  The volume-change factor of T(x) = 3x is |3|.

SOLUTION  The image box is determined by the vectors

T(ce₁) = cT(e₁) = c[1, 0, 2]
T(ce₂) = cT(e₂) = c[0, 2, 0]
T(ce₃) = cT(e₃) = c[1, 0, 5].

The standard matrix representation of T is

A = [ 1  0  1 ]
    [ 0  2  0 ]
    [ 2  0  5 ],

and the volume-change factor of T is |det(A)| = 6. Therefore, the image box has volume 6c³. This volume can also be computed by taking the determinant of the matrix having as column vectors T(ce₁), T(ce₂), and T(ce₃). This matrix is cA, and has determinant c³ · det(A) = 6c³.

FIGURE 4.12  The volume-change factor of T(x) = Ax: a cube of volume c³ is carried to an image box of volume c³|det(A)|.
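A MATLAB version of the computation in Example 5 (a sketch; the value of c is an arbitrary choice of ours):

    A = [1 0 1; 0 2 0; 2 0 5];           % columns are T(e1), T(e2), T(e3)
    c = 1.5;                             % edge length of the cube
    imageVolume = abs(det(A)) * c^3      % 6*c^3
    check = abs(det(c*A))                % same value, since det(cA) = c^3*det(A)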

The sign of the determinant associated with an invertible linear transformation T: Rⁿ → Rⁿ depends on whether T preserves orientation. In the plane, where n = 2, orientation is preserved by T if T(e₁) can be rotated counterclockwise through an angle of less than 180° to lie along T(e₂). It can be shown that this is the case if and only if det(A) > 0, where A is the standard matrix representation of T. In general, a linear transformation T: Rⁿ → Rⁿ is said to preserve orientation if the determinant of its standard matrix representation A is positive. Thus the linear transformation in Example 5 preserves orientation because det(A) = 6. Because this topic is more suitable for a course in analysis than for one in linear algebra, we do not pursue it further.

Application to Calculus

We can get an intuitive feel for the connection between the volume-change factor of T and integral calculus. The definition of an integral involves summing products of the form

(Function value at some point of a box)(Volume of the box).   (3)

Under a change of variables—say, from x-variables to t-variables—the boxes in the dx-space are replaced by boxes in dt-space via an invertible linear transformation—namely, the differential of the variable substitution function. Thus the second factor in product (3) must be expressed in terms of volumes of boxes in the dt-space. The determinant of the differential transformation must play a role, because the volume of the box in dx-space is the volume of the corresponding box in dt-space multiplied by the absolute value of the determinant.

Let us look at a one-dimensional example. In making the substitution x = sin t in an integral in single-variable calculus, we associate with each t-value t₀ the linear transformation of dt-space into dx-space given by the equation dx = (cos t₀)dt. The determinant of this linear transformation is cos t₀. A little 1-box of volume (length) dt and containing the point t₀ is carried by this linear transformation into a little 1-box in the dx-space of volume (length) |cos t₀| dt. Having conveyed a rough idea of this topic and its importance, we leave its further development to an upper-level course in analysis.

Regions More General Than Boxes

In the remainder of this section, we will be working with the volume of sufficiently nice regions in Rⁿ. To define such a notion carefully would require an excursion into calculus of several variables. We will simply assume that we all have an intuitive notion of such regions having a well-defined volume.
Let T: Rⁿ → Rⁿ be a linear transformation of rank n, and let A be its standard matrix representation. Recall that the jth column vector of A is T(eⱼ) (see p. 146). We will show that, if a region G in Rⁿ has volume V, the image of G under T has volume

|det(A)| · V.

That is, the volume of a region is multiplied by |det(A)| when the region is transformed by T. This result has important applications to integral calculus.

The volume of a sufficiently nice region G in Rⁿ may be approximated by adding the volumes of small n-cubes contained in G and having edges of length c parallel to the coordinate axes. Figure 4.13(a) illustrates this situation for a plane region in R², where a grid of small squares (2-cubes) is placed on the region. As the length c of the edges of the squares approaches zero, the sum of the areas of the colored squares inside the region approaches the area of the region. These squares inside G are mapped by T into parallelograms of area c²|det(A)| inside the image of G under T. (See the colored parallelograms in Figure 4.13(b).) As c approaches zero, the sum of the areas of these parallelograms approaches the area of the image of G under T, which thus must be the area of G multiplied by |det(A)|. A similar construction can be made with a grid of n-cubes for a region G in Rⁿ. Each such cube is mapped by T into an n-box of volume cⁿ|det(A)|. Adding the volumes of these n-boxes and taking the limiting value of the sum as c approaches zero, we see that the volume of the image under T of the region G is given by

Volume of image of G = |det(A)| · (Volume of G).   (4)

We summarize this work in the following theorem.

FIGURE 4.13  (a) Region G in the domain of T; (b) the image of G under T.

THEOREM 4.8  Volume-Change Factor for T: Rⁿ → Rⁿ

Let G be a region in Rⁿ of volume V, and let T: Rⁿ → Rⁿ be a linear transformation of rank n with standard matrix representation A. Then the volume of the image of G under T is |det(A)| · V.

EXAMPLE 6  Let T: R² → R² be the linear transformation of the plane given by T([x, y]) = [2x − y, x + 3y]. Find the area of the image under T of the disk x² + y² ≤ 4 in the domain of T.

SOLUTION  The disk x² + y² ≤ 4 has radius 2 and area 4π. The standard matrix representation of T is

A = [ 2  −1 ]
    [ 1   3 ],

so the disk is mapped by T into a region (actually bounded by an ellipse) of area

|det(A)| · (4π) = (6 + 1)(4π) = 28π.
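In MATLAB, the area computation of Example 6 reads as follows (a sketch):

    A = [2 -1; 1 3];                     % standard matrix of T([x, y]) = [2x - y, x + 3y]
    areaOfImage = abs(det(A)) * 4*pi     % 7 * 4*pi = 28*pi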

We can generalize Theorem 4.8 to a linear transformation T: Rⁿ → Rᵐ, where m ≥ n and T has rank n. This time, the standard matrix representation A is an m × n matrix. The image under T of the unit n-cube in Rⁿ outlined by e₁, e₂, ..., eₙ is the n-box in Rᵐ outlined by T(e₁), T(e₂), ..., T(eₙ). According to Theorem 4.7, the volume of this box in Rᵐ is

√det(AᵀA).

The same grid argument used earlier and illustrated in Figure 4.13 shows that a region G in Rⁿ of volume V is mapped by T into a region of Rᵐ of volume

√det(AᵀA) · V.

We summarize this generalization in a theorem.

THEOREM 4.9  Volume-Change Factor for T: Rⁿ → Rᵐ

Let G be a region in Rⁿ of volume V. Let m ≥ n and let T: Rⁿ → Rᵐ be a linear transformation of rank n. Let A be the standard matrix representation of T. Then the volume of the image of G in Rᵐ under the transformation T is

√det(AᵀA) · V.

EXAMPLE 7  Let T: R² → R³ be given by T([x, y]) = [2x + 3y, x − y, 2y]. Find the area of the image in R³ under T of the disk x² + y² ≤ 4 in R².

SOLUTION  The standard matrix representation A of T is

A = [ 2   3 ]
    [ 1  −1 ]
    [ 0   2 ]

and

AᵀA = [ 2   1   0 ] [ 2   3 ]   [ 5   5 ]
      [ 3  −1   2 ] [ 1  −1 ] = [ 5  14 ].
                    [ 0   2 ]

Thus,

√det(AᵀA) = √(70 − 25) = √45 = 3√5.

A region G of R² having area V is mapped by T into a plane region of area 3√5 · V in R³. Thus the disk x² + y² ≤ 4 of area 4π is mapped into a plane region in R³ of area

(3√5)(4π) = 12√5 π.
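The corresponding check for Example 7, using Theorem 4.9 (a MATLAB sketch):

    A = [2 3; 1 -1; 0 2];                % standard matrix of T([x, y]) = [2x + 3y, x - y, 2y]
    factor = sqrt(det(A'*A))             % sqrt(45) = 3*sqrt(5)
    areaOfImage = factor * 4*pi          % area of the image of the disk x^2 + y^2 <= 4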

SUMMARY

1. An n-box in Rᵐ, where m ≥ n, is determined by n independent vectors a₁, a₂, ..., aₙ and consists of all vectors x in Rᵐ such that x = t₁a₁ + t₂a₂ + ··· + tₙaₙ, where 0 ≤ tᵢ ≤ 1 for i = 1, 2, ..., n.
2. A 1-box in Rᵐ is a line segment, and its "volume" is its length.
3. A 2-box in Rᵐ is a parallelogram determined by two independent vectors, and the "volume" of the 2-box is the area of the parallelogram.
4. A 3-box in Rᵐ is a skewed box (parallelepiped) in the usual sense, and its volume is the usual volume.
5. Let a₁, a₂, ..., aₙ be independent vectors in Rᵐ for m ≥ n, and let A be the m × n matrix with jth column vector aⱼ. The volume of the n-box in Rᵐ determined by the n vectors is √det(AᵀA).
6. For the case of an n-box in the space Rⁿ of the same dimension, the formula for its volume given in summary item 5 reduces to |det(A)|.
7. If T: Rⁿ → Rⁿ is a linear transformation of rank n with standard matrix representation A, then T maps a region in its domain of volume V into a region of volume |det(A)| · V.
8. If T: Rⁿ → Rᵐ is a linear transformation of rank n with standard matrix representation A, then T maps a region in its domain of volume V into a region of Rᵐ of volume √det(AᵀA) · V.

| exercises

1. Find the area of the parallelogram in R? 2. Find the area of the parallelogram in R°
determined by the vectors [0, !, 4] and determined by the vectors [1, 0, 1, 2, —1] and
[—1,3, -2]. (0, 1, -1, 1, 3}.

3. Find the volume of the 3-box in R‘ [4x — 2y, 2x + 3y]. Find the area of the image
determined by the vectors [—1, 2, 0, 1], under T of each of the given regions in R?.
[0, 1, 3, 0], and [0, 0, 2, —1}.
4. Find the volume of the 4-box in R° 18. The squaareOSx51,0s ys]
determined by the vectors [1, 1, 1, 0, 1], 19. The rectangle -l=x<t,l=sy=2
(0, 1, 1. 0, 0], [3, 0, 1, 0, 0], and 20. The parallelogram determined by 2e, + 3e,
[1, —1, 0, 0, 1). and 4e, — e,

In Exercises 5-10, find the volume of the n-box


2. The disk (x — 1) + (y + 2)? <9
determined by the given vectors in R’.
In Exercises 22-25, let T: R? > R? be defined by
T([x, y, Z]) = [x - 2y, 3x + z, 4x + 3y). Find the
. (-1, 4], [2, 3] in R? volume of the image under T of each of the given
HN

. [-S, 3], (1, 7] in R? regions in R?.


. (i, 3, -5], [2, 4, —1), [3, 1, 2] in R?
On

. {-1, 4, 7], [2, -2, - 1), [4, 0, 2] in R? 22. Thecube


Os x<=1,9Sysi,0sz<1

. £1, 0,0, 1), (2, -1, 3, O}, (0, !, 3, 4], 23. TheboxO =<x<2,-ls ys3,252<5
Oo

{-1, 1, —2, 1] in R* 24. The box determined by 2e, + 3e, — ¢e,,


10. [1, —1, 0, 1, [2, -1, 3, 1], [-1.4, 2, -1], 4e, — 2e,, and 2, — e, + 2e,
{0, 1,0, 2] in R* 25. The ball x + (y -— 3 + (z + 2/ = 16
11. Find the area of the triangle in R? with
vertices (—1, 2, 3), (0, 1, 4), and (2, 1, 5). In Exercises 26-29, let T: R? — R? be the linear
(Hint: Think of vectors emanating from transformation defined by Ti[x. y]) =
(—1, 2, 3). The triangle may be viewed as Ly, x, x + y]. Find the area of the imuge under T
half a para!le!ogram.] ef each of the given regions in R?.
12. Find the volume of the tetrahedron in R?
26. The squaeO = x=1,0s ys 1
with vertices (1, 0, 3), (-1, 2, 4),
(3, —1, 2), and (2, 0, —1). [Hint: Think of 27. The rectangle 2<x=<3,-lsy<4
vectors emanating from (1, 0, 3).j 28. The triangie with vertices (0, 0), (6, 0), (0, 3)
13. Find the volume of the tetrahedron in R‘ 29. The disk x + y? < 25
with vertices (1, 0, 0, 1), (-1, 2, 0, 1),
(3, 0, 1, 1), and (-—1, 4, 0, 1). [Hint: See the In Exercises 30-32, let T: R? + R‘ be defined by
hint for Exercise 12.} T(x, y]) = [x — y, x, —y, 2x + y). Find the area
14. Give a geometric interpretation of the fact of the image under T of each of the given regions
that an 7 X n matrix with two equal rows in R?,
has determinant zero.
15. Using the results of this section, give a 30. The squaeO =< x=1,0s ysl
criterion that four points in R’ lie in a plane. 31. The square -1 Ss x= 3,-Ilsys3
16. Determine whether the noiats (1, 0, 1, 0), 32. The disk x + y? <9
(~1, 1,0, 1), (0, 1, -1, 1), and (1, -1, 4, -1) 33. a. If one attempts to define an n-box in R”
lie in a plane in R*. (See Exercise 15.) for n > m, what will its volume as an
17. Determine whether the points (2, 0, 1, 3), n-box be?
(3, 1,0, 1), (-1, 2, 0, 4), and (3, 1, 2, 4) lie b. LetA be an m X n matrix with n > m.
in a plane in R*. (See Exercise 15.) Find det(A™A).
34, We have seen that, for n x n matrices A and
In Exercises 18-21, let T: R? - R? be the linear B, we have det(AB) = det(A) - det(B), but the
transformation defined by T([x, y]) = proof was not intuitive. Give an intuitive

eoetric argument showing that at least —_. @ If the image under 7 of an n-box B in R*
|det(AB)| = |det(A)] - |det(B)|. [Hint: Use the has volume 12, the box B has volume
fact that, if A is the standard matrix 12/|det(A)].
representation of 7: R” > R" and B is the . fn = 2, the image under 7 of the unit
standard matrix representation of disk x + y’ = | has area |det(A)|.
T’: R’ — R’, then AB is the standard matrix
. The linear transformation 7 is an
representation T° T’.]
isomorphism.
35. Let T: R? — R" be a linear transformation of
. The image under 7° T of an n-box in R’
rank n with standard matnx representation
of volume V is a box in R’ of volume
A. Mark each of the following True or False.
det(A’) - V.
_ &. The image under T of a box in R’ is
again a box in R’. i. The image under 7° Te T of an n-box in
__ b. The image under 7 of an n-box in R" of R’ of volume V is a box in R’ of volume
volume V is a box in R’ of volume det(A’) - V.
det(A)- V. —_—j. The image under 7 of a nondegenerate
__ c. The image under T of an n-box in R’ of 1-box is again nondegenerate.
volume > 9 is a box in R’ of volume > 0.
__ 4. If the image under 7 of an n-box B in R" 36. Prove Eq. (1); that is, prove that the square
has volume 12, the box B has volume of the length of the line segment determined
|det(A)| - 12. by a, in R" is |la,||? = det([a, - a,]).
CHAPTER 5

EIGENVALUES AND EIGENVECTORS

This chapter introduces the important topic of eigenvalues and eigenvectors of a square matrix and the associated linear transformation. Determinants
enable us to give illustrative computations and applications involving matrices
of very small size. Eigenvectors and eigenvalues continue to appear at
intervals throughout much of the rest of the text. In Section 8.4 we discuss
some other methods for computing them.

5.1 EIGENVALUES AND EIGENVECTORS

Encounters with Aᵏx

In Section 1.7 we studied Markov chains dealing with the distribution of population among states, measured over evenly spaced time intervals. An n × n transition matrix T describes the movement of the population among the states during one time interval. The matrix T has the property that its entries are nonnegative and the sum of the entries in any column is 1. Suppose that p is the initial population distribution vector—that is, the column vector whose ith component is the proportion of the population in the ith state at the start of the process. Then Tp is the corresponding population distribution vector after one time interval. Similarly, T²p is the population distribution vector after two time intervals, and in general, Tᵏp gives the distribution of population among the states after k time intervals.

Markov chains provide one example in which we are interested in computing Aᵏx for an n × n matrix A and a column vector x of n components. We give a famous classical problem that provides another illustration.

EXAMPLE 1  (Fibonacci's rabbits) Suppose that newly born pairs of rabbits produce no offspring during the first month of their lives, but each pair produces one pair each subsequent month. Starting with F₁ = 1 newly born pair in the first month, find the number Fₖ of pairs in the kth month, assuming that no rabbit dies.
SOLUTION  In the kth month, the number of pairs of rabbits is

Fₖ = (Number of pairs alive the preceding month)
     + (Number of newly born pairs for the kth month).

Because our rabbits do not produce offspring during the first month of their lives, we see that the number of newly born pairs for the kth month is the number Fₖ₋₂ of pairs alive two months before. Thus we can write the equation above as

Fₖ = Fₖ₋₁ + Fₖ₋₂.    Fibonacci's relation   (1)

It is convenient to set F₀ = 0, denoting 0 pairs for month 0 before the arrival of the first newly born pair, which is presumably a gift. Thus the sequence

F₀, F₁, F₂, ..., Fₖ, ...

for the number of pairs of rabbits becomes the Fibonacci sequence

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...,   (2)

where each term starting with F₂ = 0 + 1 = 1 is the sum of the two preceding terms. For any particular k, we could compute Fₖ by writing out the sequence far enough.

Fibonacci published this problem early in the thirteenth century. The


Fibonacci sequence (2) occurs naturally in a surprising number of places. For
example, leaves appear in a spiral pattern along a branch. Some trees have
five growths of leaves for every two turns, others have eight growths for every
three turns, and still others have 13 growths for every five turns; note the ap-
pearance of these numbers in the sequence (2). A mathematical journal, the
Fibonacci Quarterly, has published many papers relating to the Fibonacci
sequence.
We said in Example 1 that Fₖ can be found by simply writing out enough terms of the sequence (2). That can be a tedious task, even if we want to compute only one particular Fₖ. Linear algebra gives us another approach to this problem. The Fibonacci relation (1) can be expressed in matrix form. We see that

[ Fₖ   ]   [ 1  1 ] [ Fₖ₋₁ ]
[ Fₖ₋₁ ] = [ 1  0 ] [ Fₖ₋₂ ].

Thus, if we set

xₖ = [ Fₖ   ]            A = [ 1  1 ]
     [ Fₖ₋₁ ]    and         [ 1  0 ],

we find that

xₖ = Axₖ₋₁.   (3)
Applying Eq. (3) repeatedly, we see that

x₂ = Ax₁,    x₃ = Ax₂ = A²x₁,    x₄ = Ax₃ = A³x₁,

and in general

xₖ = Aᵏ⁻¹x₁,   where x₁ = [ 1 ]
                          [ 0 ].   (4)

Thus we can compute the kth Fibonacci number Fₖ by finding Aᵏ⁻¹ and multiplying it on the right by the column vector x₁. Raising a matrix to a power is also a bit of a job, but the routine MATCOMP (in LINTEK) or MATLAB can easily find Fₖ for us. (See Exercise 45.)
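For readers using MATLAB, relation (4) can be typed in directly; the fragment below (ours, with an arbitrary choice of k) computes a Fibonacci number this way.

    k = 30;                              % which Fibonacci number to compute
    A = [1 1; 1 0];
    x1 = [1; 0];                         % [F1; F0]
    xk = A^(k-1) * x1;                   % [Fk; F(k-1)] by relation (4)
    Fk = xk(1)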
Both Markov chains and the Fibonacci sequence lead us to computations of the form Aᵏx. Other examples leading to Aᵏx abound in the physical and social sciences.

Computations of Aᵏx arise in any process in which information given by a column vector gives rise to analogous information at a later time by multiplying the vector by a matrix A.

Eigenvalues and Eigenvectors


Suppose that A is an n × n matrix and v is a column vector with n components such that

Av = λv   (5)

for some scalar λ. Geometrically, Eq. (5) asserts that Av is a vector parallel to v. From Av = λv, we obtain

A²v = A(Av) = A(λv) = λ(Av) = λ(λv) = λ²v.

It is easy to show, in general, that Aᵏv = λᵏv. (See Exercise 27.) Thus, Aᵏx is easily computed if x is equal to this vector v.

DEFINITION 5.1  Eigenvalues and Eigenvectors

Let A be an n × n matrix. A scalar λ is an eigenvalue of A if there is a nonzero column vector v in Rⁿ such that Av = λv. The vector v is then an eigenvector of A corresponding to λ. (The terms characteristic vector and characteristic value or proper vector and proper value are also used in place of eigenvector and eigenvalue, respectively.)
For example, the computations

[ 2  2 ] [ 1 ]   [ 4 ]     [ 1 ]
[ 3  1 ] [ 1 ] = [ 4 ] = 4 [ 1 ]

show that the vector [1, 1] is an eigenvector of the matrix

[ 2  2 ]
[ 3  1 ]

corresponding to the eigenvalue 4.
For many matrices A, the computation of Aᵏx for a general vector x is greatly simplified by first finding all eigenvalues and eigenvectors of A. Suppose that we can find a basis {b₁, b₂, ..., bₙ} for Rⁿ consisting of eigenvectors of A, so that for eigenvalues λ₁, λ₂, ..., λₙ we have

Abᵢ = λᵢbᵢ   for i = 1, 2, ..., n.

To compute Aᵏx, we first express x as a linear combination of these basis eigenvectors:

x = d₁b₁ + d₂b₂ + ··· + dₙbₙ.

Because Aᵏbᵢ = λᵢᵏbᵢ, we then obtain

Aᵏx = λ₁ᵏd₁b₁ + λ₂ᵏd₂b₂ + ··· + λₙᵏdₙbₙ.   (6)

It can be shown that, if complex numbers are allowed as vector components and scalars, then such a basis consisting of eigenvectors usually exists, in the sense that if A is chosen at random, it exists with probability 1. When we limit ourselves to real-number components and scalars, such a basis often does not exist. However, it can be shown that if A is a real symmetric matrix, which is the case for some applications, then we can find a basis of eigenvectors. This idea is explored further in Section 5.2.
Equation (6) can give important information on the effect of repeatedly transforming Rⁿ using multiplication by A. Suppose that at least one of the eigenvalues λᵢ—we may as well assume that it is λ₁—has magnitude greater than 1. Then vectors of arbitrarily large magnitude can be obtained under repeated multiplication of b₁ by A—that is, limₖ→∞ |Aᵏb₁| = ∞. Multiplication by A is then called an unstable transformation of Rⁿ. On the other hand, if |λᵢ| < 1 for i = 1, 2, ..., n, then limₖ→∞ |Aᵏx| = 0 for all x in Rⁿ and the transformation is stable. If the maximum of the |λᵢ| is 1, the transformation is neutrally stable.

Suppose now that in Eq. (6), one of the λᵢ—again, we may as well assume that it is λ₁—is of magnitude greater than all of the other λᵢ. Then λ₁ᵏ dominates the other λᵢᵏ for large values of k. If d₁ ≠ 0, this means that the term λ₁ᵏd₁b₁ dominates the other terms in Eq. (6), so that for large values of k, the vector Aᵏx is nearly parallel to b₁, and one more multiplication of Aᵏx by A amounts approximately to multiplying each component of Aᵏx by λ₁. Indeed, one way to compute such a dominant eigenvalue λ₁ is to use a computer and find the ratio of the components of Aᵏ⁺¹x to those of Aᵏx for a suitable value of k. Quite small values of k will serve if λ₁ dominates strongly—for instance, if its magnitude is several times the next largest magnitude of an eigenvalue. If the ratio of those magnitudes is closer to 1—say, 1.2—then we have to use a larger value for k. MATLAB can be used to illustrate this. (See the MATLAB exercises.) This method of computing the eigenvalue of maximum magnitude is called the power method, and is refined and treated in more detail in Section 8.4, where some additional methods for computing eigenvalues are described.
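The ratio idea described above is easy to try in MATLAB. The sketch below (ours) applies it to the matrix [2 2; 3 1] used earlier in this section, whose dominant eigenvalue is 4; the starting vector and the number of iterations are arbitrary choices.

    A = [2 2; 3 1];                      % eigenvalues 4 and -1
    x = [1; 0];                          % arbitrary starting vector
    for k = 1:12
        x = A*x;                         % form A^k * x
    end
    ratios = (A*x) ./ x                  % each component is close to the dominant eigenvalue 4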

Computing Eigenvalues

In this section we show how a determinant can be used to find eigenvalues of an n × n matrix A; the computational technique is practical only for relatively small matrices.

We write the equation Av = λv as Av − λv = 0, or as Av − λIv = 0, where I is the n × n identity matrix. This last equation can be written as (A − λI)v = 0, so v must be a solution of the homogeneous linear system

(A − λI)x = 0.   (7)

An eigenvalue of A is thus a scalar λ for which system (7) has a nontrivial solution v. (Recall that an eigenvector is nonzero by definition.) We know that system (7) has a nontrivial solution precisely when the determinant of the coefficient matrix is zero—that is, if and only if

det(A − λI) = 0.   (8)

HISTORICAL NOTE  THE CONCEPT OF AN EIGENVALUE, in its origins and its later development, was independent of matrix theory. In fact, its original context was that of the solution of systems of linear differential equations with constant coefficients (see Section 5.3). Jean Le Rond D'Alembert (1717–1783), in his work in the 1740s and 1750s on the motion of a string loaded with a finite number of masses, considered the system

d²yᵢ/dt² + Σₖ₌₁³ aᵢₖyₖ = 0,   i = 1, 2, 3.

(Here the number of masses is restricted to 3 for simplicity.) To solve this system, D'Alembert multiplied the ith equation by a constant vᵢ (to be determined) for each i and added the equations together to obtain

Σᵢ₌₁³ vᵢ d²yᵢ/dt² + Σᵢ,ₖ vᵢaᵢₖyₖ = 0.

If the vᵢ are chosen so that Σᵢ vᵢaᵢₖ + λvₖ = 0 for k = 1, 2, 3—that is, if the vector [v₁, v₂, v₃] is an eigenvector corresponding to the eigenvalue −λ for the matrix A = [aᵢₖ]—then the substitution u = v₁y₁ + v₂y₂ + v₃y₃ converts the original system to the single differential equation

d²u/dt² − λu = 0.

This equation can now easily be solved and leads to solutions for the yᵢ. It is not difficult to show that λ is determined by a cubic equation that has three roots.

Eigenvalues also appeared in other situations involving systems of differential equations, including physical situations studied by Euler and Lagrange.
If A = [aᵢⱼ], then the previous equation can be written

| a₁₁−λ   a₁₂     ···   a₁ₙ   |
| a₂₁     a₂₂−λ   ···   a₂ₙ   |
|   ⋮       ⋮             ⋮    | = 0.   (9)
| aₙ₁     aₙ₂     ···   aₙₙ−λ |

If we expand the determinant in Eq. (9), we obtain a polynomial expression p(λ) of degree n with coefficients involving the aᵢⱼ. That is,

det(A − λI) = p(λ).

The polynomial p(λ) is the characteristic polynomial* of the matrix A. The eigenvalues of A are precisely the solutions of the characteristic equation p(λ) = 0.
EXAMPLE 2  Find the eigenvalues of the matrix

A = [ 3  2 ]
    [ 2  0 ].

SOLUTION  The characteristic polynomial of A is

det(A − λI) = | 3−λ   2  |
              |  2   −λ  | = λ² − 3λ − 4.

The characteristic equation is

λ² − 3λ − 4 = 0,

and we obtain (λ − 4)(λ + 1) = 0; therefore, λ₁ = −1 and λ₂ = 4 are eigenvalues of A.

EXAMPLE 3  Show that λ₁ = 1 is an eigenvalue of the transition matrix for any Markov chain.

SOLUTION  Let T be an n × n transition matrix for a Markov chain; that is, all entries in T are nonnegative and the sum of the entries in each column of T is 1. We easily see that the sum of the entries in any column of T − I must be zero. Thus the sum of the row vectors of T − I is the zero vector, so the rows of T − I are linearly dependent. Consequently, rank(T − I) < n, and (T − I)x = 0 has a nontrivial solution, so λ₁ = 1 is an eigenvalue of T.

*Many authors define the characteristic polynomial of A to be p(λ) = det(λI − A) rather than p(λ) = det(A − λI). Because λI − A = (−1)(A − λI), we see that for an n × n matrix A, we have det(λI − A) = (−1)ⁿ det(A − λI). Thus the two definitions differ only when n is an odd integer, in which case the two polynomials differ only in sign. The definition det(λI − A) has the advantage that the term of highest degree is always λⁿ, rather than being −λⁿ when n is odd. We use the definition p(λ) = det(A − λI) in this first course because, for our pencil-and-paper computations, it is easier to write "− λ" after each diagonal entry than to change the sign of every entry and then write "λ +" before the diagonal entries. However, the command poly(A) in MATLAB will produce the coefficients of the polynomial det(λI − A).
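Example 3 is easy to illustrate numerically. In the MATLAB sketch below, the particular transition matrix is our own arbitrary choice; any matrix with nonnegative entries whose columns sum to 1 behaves the same way.

    T = [0.7 0.2 0.1;
         0.2 0.5 0.3;
         0.1 0.3 0.6];                   % each column sums to 1
    eig(T)                               % one eigenvalue equals 1 (up to roundoff)
    rank(T - eye(3))                     % rank 2 < 3, so (T - I)x = 0 has nontrivial solutions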

The characteristic equation of an n × n matrix is a polynomial equation of degree n. This equation has n solutions if we allow both real and complex numbers and if we count the possible multiplicities greater than 1 of some solutions. Linear algebra can be done using complex scalars as well as real scalars. Introducing complex scalars makes the theory simpler and illuminates behavior in the real-scalar case. However, pencil-and-paper computations involving complex numbers can be very laborious. We can expect an eigenvector corresponding to an eigenvalue λ = a + bi with b ≠ 0 to have some components that are not real numbers. Exercise 44 and some of the computer exercises require computations using complex numbers.

We have now run into complex numbers while attempting to do linear algebra within Rⁿ. With the exception of the exercises just mentioned, we leave the computation of complex eigenvalues and of eigenvectors with complex components to Chapter 9. It is only in the context of complex n-space Cⁿ, which consists of all n-tuples of complex numbers, that the complete eigenstory can be explained. For example, diagonalization of a square matrix with randomly chosen real number entries, which we will discuss in Section 5.2, often cannot be achieved if we restrict ourselves only to real numbers. Using complex numbers, such a matrix can be diagonalized with probability 1. We believe it is important that you be aware of the complete eigenstory, even if you do not study Chapter 9. For this reason, theorems and definitions in this chapter are often phrased using a parenthetical possibly complex or stated in terms of n-space, meaning either Rⁿ or Cⁿ. Looking back at Definition 5.1, the same definitions of eigenvalues and eigenvectors apply if A has complex entries and we allow vectors v in Cⁿ and complex scalars λ.

The characteristic polynomial of a matrix A may have multiple roots; perhaps it has −2 as a root of multiplicity 1, and 5 as a root of multiplicity 2, corresponding to factors (λ + 2)(λ − 5)² of the characteristic polynomial. Suppose that these are the only roots. We will say that the eigenvalues are λ₁ = −2 and λ₂ = λ₃ = 5. That is, if there are a total of k distinct roots of the characteristic polynomial of degree n, it is convenient for us to denote the roots by λ₁, λ₂, ..., λₙ. Our next example illustrates this.

EXAMPLE 4  Find the eigenvalues of the matrix

A = [  2  1  0 ]
    [ −1  0  1 ]
    [  1  3  1 ].

SOLUTION  The characteristic polynomial is

p(λ) = | 2−λ   1    0  |
       | −1   −λ    1  | = (2 − λ) | −λ   1  | − 1 · | −1   1  |
       |  1    3   1−λ |           |  3  1−λ |       |  1  1−λ |

     = (2 − λ)(λ² − λ − 3) − (λ − 2) = −(λ − 2)(λ² − λ − 2)
     = −(λ − 2)(λ − 2)(λ + 1).

Hence, the eigenvalues of A are λ₁ = −1 and λ₂ = λ₃ = 2.

Computation of Eigenvectors
We turn to the computation of the eigenvectors corresponding to an eigen-
value A of a matrix A. Having found the eigenvalue, we substitute it in
homogeneous system (7) and solve to find the nontrivial solutions of the
system. We will obtain an infinite number of nontrivial solutions, each of
which is an eigenvector corresponding to the eigenvalue A.

EXAMPLE 5  Find the eigenvectors corresponding to each eigenvalue found in Example 4 for the matrix

A = [  2  1  0 ]
    [ −1  0  1 ]
    [  1  3  1 ].
Ww
SOLUTION  The eigenvalues of A were found to be λ₁ = −1 and λ₂ = λ₃ = 2. We substitute each of these values in the homogeneous system (7). The eigenvectors are obtained by reducing the coefficient matrix A − λI in the augmented matrix for the system. For λ₁ = −1, we obtain

[A − λ₁I | 0] = [A + I | 0] = [  3  1  0 | 0 ]     [ 1  0  −1/4 | 0 ]
                              [ −1  1  1 | 0 ]  ~  [ 0  1   3/4 | 0 ]
                              [  1  3  2 | 0 ]     [ 0  0    0  | 0 ].

(In the future, we will drop the column of zeros to the right of the partition when solving for eigenvectors.) The solution of the homogeneous system is given by

[   r/4 ]
[ −3r/4 ]    for any scalar r.
[    r  ]

Therefore,

v₁ = [   r/4 ]     [  1/4 ]
     [ −3r/4 ] = r [ −3/4 ]    for any nonzero scalar r
     [    r  ]     [   1  ]

is an eigenvector corresponding to the eigenvalue λ₁ = −1. Replacing r by 4r, we can express this result without fractions as

v₁ = [   r ]     [  1 ]
     [ −3r ] = r [ −3 ]    for any nonzero scalar r.
     [  4r ]     [  4 ]

For λ₂ = 2, we obtain

[A − λ₂I] = [A − 2I] = [  0  1  0 ]     [ 1  3  −1 ]
                       [ −1 −2  1 ]  ~  [ 0  1   0 ]
                       [  1  3 −1 ]     [ 0  0   0 ].

This time we find that

v₂ = [ s ]     [ 1 ]
     [ 0 ] = s [ 0 ]    for any nonzero scalar s
     [ s ]     [ 1 ]

is an eigenvector. As a check, we could compute Av₁ and Av₂. For example, we have

Av₂ = [  2  1  0 ] [ s ]   [ 2s ]     [ s ]
      [ −1  0  1 ] [ 0 ] = [  0 ] = 2 [ 0 ] = 2v₂ = λ₂v₂.
      [  1  3  1 ] [ s ]   [ 2s ]     [ s ]
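MATLAB's eig command carries out the computations of Examples 4 and 5 at once; a sketch:

    A = [2 1 0; -1 0 1; 1 3 1];
    [V, D] = eig(A)                      % diag(D) holds the eigenvalues -1, 2, 2 (up to roundoff);
                                         % each column of V is an eigenvector for the eigenvalue
                                         % in the corresponding diagonal position of D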

EXAMPLE 6  Find the eigenvalues and eigenvectors of the matrix

A = [  1  0   0 ]
    [ −8  4  −6 ]
    [  8  1   9 ].

SOLUTION  The characteristic polynomial of A is

p(λ) = det(A − λI) = | 1−λ    0     0  |
                     | −8    4−λ   −6  | = (1 − λ) | 4−λ  −6  |
                     |  8     1    9−λ |           |  1   9−λ |

     = (1 − λ)(λ² − 13λ + 42) = (1 − λ)(λ − 6)(λ − 7).

The eigenvalues of A are λ₁ = 1, λ₂ = 6, and λ₃ = 7. For λ₁ = 1, we have

A − λ₁I = A − I = [  0  0   0 ]     [ 0  0  0 ]
                  [ −8  3  −6 ]  ~  [ 0  4  2 ]
                  [  8  1   8 ]     [ 8  1  8 ]
~ [ 8  1  8 ]     [ 1  1/8   1  ]     [ 1  0  15/16 ]
  [ 0  2  1 ]  ~  [ 0   1   1/2 ]  ~  [ 0  1   1/2  ],
  [ 0  0  0 ]     [ 0   0    0  ]     [ 0  0    0   ]

so

v₁ = [ −15r/16 ]     [ −15/16 ]
     [   −r/2  ] = r [  −1/2  ]    for any nonzero scalar r
     [     r   ]     [    1   ]

is an eigenvector. Replacing r by −16r, we can express this result without the fractions as

[  15r ]     [  15 ]
[   8r ] = r [   8 ]    for any nonzero scalar r.
[ −16r ]     [ −16 ]

For λ₂ = 6, we have

A − λ₂I = A − 6I = [ −5   0   0 ]     [ 1   0   0 ]     [ 1  0  0 ]
                   [ −8  −2  −6 ]  ~  [ 0  −2  −6 ]  ~  [ 0  1  3 ],
                   [  8   1   3 ]     [ 0   1   3 ]     [ 0  0  0 ]

so

v₂ = [   0 ]     [  0 ]
     [ −3s ] = s [ −3 ]    for any nonzero scalar s
     [   s ]     [  1 ]

is an eigenvector. Finally, for λ₃ = 7, we have

A − λ₃I = A − 7I = [ −6   0   0 ]     [ 1   0   0 ]     [ 1  0  0 ]
                   [ −8  −3  −6 ]  ~  [ 0  −3  −6 ]  ~  [ 0  1  2 ],
                   [  8   1   2 ]     [ 0   1   2 ]     [ 0  0  0 ]

so

v₃ = [   0 ]     [  0 ]
     [ −2t ] = t [ −2 ]    for any nonzero scalar t
     [   t ]     [  1 ]

is an eigenvector.
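The check suggested at the end of Example 5 can be automated; here it is in MATLAB for the three eigenvectors found in Example 6 (a sketch).

    A  = [1 0 0; -8 4 -6; 8 1 9];
    v1 = [15; 8; -16];   v2 = [0; -3; 1];   v3 = [0; -2; 1];
    disp([A*v1 - 1*v1,  A*v2 - 6*v2,  A*v3 - 7*v3])   % each column should be the zero vector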

Properties of Eigenvalues and Eigenvectors


We turn now to algebraic properties of eigenvalues and eigenvectors. The
properties given in Theorem 5.1 are so straightforward and instructive to
prove that we leave the proofs as Exercises 27-29.
THEOREM 5.1  Properties of Eigenvalues and Eigenvectors

Let A be an n × n matrix.
1. If λ is an eigenvalue of A with v as a corresponding eigenvector, then λᵏ is an eigenvalue of Aᵏ, again with v as a corresponding eigenvector, for any positive integer k.
2. If λ is an eigenvalue of an invertible matrix A with v as a corresponding eigenvector, then λ ≠ 0 and 1/λ is an eigenvalue of A⁻¹, again with v as a corresponding eigenvector.
3. If λ is an eigenvalue of A, then the set E_λ consisting of the zero vector together with all eigenvectors of A for this eigenvalue λ is a subspace of n-space, the eigenspace of λ.

Eigenvalues and Transformations


Let us consider the significance of an eigenvalue λ and corresponding eigenvector v of an n × n matrix A for the associated linear transformation T(x) = Ax. The equation

Av = λv

takes the form

T(v) = λv.

Thus, the linear transformation T maps the vector v onto a vector that is parallel to v. (See Fig. 5.1.) We present a definition of eigenvalues and eigenvectors for linear transformations that is more general than for matrices, in that we need not restrict ourselves to a finite-dimensional vector space.

FIGURE 5.1  (a) T has eigenvalue λ = −3; (b) T has eigenvalue λ = 2.
DEFINITION 5.2  Eigenvalues and Eigenvectors

Let T be a linear transformation of a vector space V into itself. A scalar λ is an eigenvalue of T if there is a nonzero vector v in V such that T(v) = λv. The vector v is then an eigenvector of T corresponding to λ.

It is significant that we can define eigenvalues and eigenvectors for a linear transformation T: V → V without any reference to a matrix representation and without even assuming that V is finite-dimensional. Example 8 will discuss the eigenvalues and eigenvectors of a linear transformation that lies at the heart of calculus and deals with an infinite-dimensional vector space. Students who have studied exponential growth problems in calculus will recognize the importance of this example.

ILLUSTRATION 1  Not every linear transformation has eigenvectors. Rotation of the plane counterclockwise through a positive angle θ is a linear transformation (see page 156). If 0 < θ < 180°, then no vector is mapped onto one parallel to it—that is, no vector is an eigenvector. If θ = 180°, then every nonzero vector is an eigenvector and they all have the same associated eigenvalue λ = −1.

ILLUSTRATION 2  The linear transformation T: R² → R² that reflects vectors in the line x + 2y = 0 maps the vector [2, −1] onto itself and maps the vector [1, 2] onto [−1, −2], as indicated in Figure 5.2. The equations T([2, −1]) = [2, −1] and T([1, 2]) = [−1, −2] show that [2, −1] and [1, 2] are eigenvectors of T with corresponding eigenvalues 1 and −1, respectively.
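Illustration 2 can also be verified numerically. The text does not write out the standard matrix of this reflection; the matrix R in the MATLAB sketch below comes from the formula R = 2uuᵀ − I with u a unit vector along the line x + 2y = 0, so treat it as our own reconstruction.

    u = [2; -1] / sqrt(5);               % unit vector along the line x + 2y = 0
    R = 2*(u*u') - eye(2)                % reflection matrix [0.6 -0.8; -0.8 -0.6]
    R*[2; -1]                            % returns [2; -1]   (eigenvalue  1)
    R*[1; 2]                             % returns [-1; -2]  (eigenvalue -1)
    eig(R)                               % returns -1 and 1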

For a linear transformation T: Rⁿ → Rⁿ, we can find the transformation's eigenvalues and eigenvectors by finding those of its standard matrix representation. The following example illustrates how this works.

FIGURE 5.2  Reflection in the line x + 2y = 0.
EXAMPLE 7  Find the eigenvalues λ and eigenvectors v of the linear transformation T: R³ → R³ defined by T([x₁, x₂, x₃]) = [x₁, −8x₁ + 4x₂ − 6x₃, 8x₁ + x₂ + 9x₃]. Illustrate the equation T(v) = λv for each eigenvalue.

SOLUTION  Writing vectors as column vectors, we may express the linear transformation T as T(x) = Ax, where A is the matrix given in Example 6. The rest of the solution proceeds precisely as in that example. Let us return to row notation and illustrate the action of T on the basic eigenvectors obtained by taking r = s = t = 1 in the expressions for v₁, v₂, and v₃ in Example 6. We have

T([15, 8, −16]) = [15, −8(15) + 4(8) − 6(−16), 8(15) + 8 + 9(−16)]
                = [15, 8, −16],
T([0, −3, 1]) = [0, −8(0) + 4(−3) − 6(1), 8(0) − 3 + 9(1)]
              = [0, −18, 6] = 6[0, −3, 1],
T([0, −2, 1]) = [0, −8(0) + 4(−2) − 6(1), 8(0) − 2 + 9(1)]
              = [0, −14, 7] = 7[0, −2, 1].
We now give an example from calculus, involving a vector space that is not
finite-dimensional.

EXAMPLE 8  Let D be the vector space of all functions mapping R into R and having derivatives of all orders. Let T: D → D be the differentiation map, so that T(f) = f′, the derivative of f. Describe all eigenvalues and eigenvectors of T. (We have seen in Section 3.4 that differentiation does give a linear transformation.)

SOLUTION  We must find scalars λ and nonzero functions f such that T(f) = λf—that is, such that f′ = λf. We consider two cases: λ = 0 and λ ≠ 0.

If λ = 0, we are trying to solve the differential equation f′ = 0 or, to use Leibniz notation, dy/dx = 0. We know from calculus that the only solutions of this equation are the constant functions. Thus the nonzero constant functions are eigenvectors corresponding to the eigenvalue 0.

If λ ≠ 0, the differential equation becomes f′ = λf or dy/dx = λy. It is readily checked that y = e^(λx) is a solution of this equation, so f(x) = ke^(λx) is an eigenvector for every nonzero scalar k. To see that these are the only solutions, we can solve the differential equation by separating variables, which yields the equation

dy/y = λ dx.

Integrating both sides of the equation yields

ln|y| = λx + c.

Solving for y, we obtain y = ±e^c e^(λx) = ke^(λx), so the only solutions of the differential equation are indeed of the form y = ke^(λx).
ILLUSTRATION 3  There are several physical situations in which a point of a vibrating body is subject to a restoring force proportional to its displacement from its position at rest. One such situation is the vibration of a body suspended by a spring, as illustrated in Figure 5.3.

FIGURE 5.3  A vibrating body suspended by a spring: (a) f(t) = 0; (b) f(t) ≠ 0.

We let f(t) be the displacement of the body at time t. The acceleration of the body at time t is then given by f″(t). Using Newton's law F = ma, we see that we can express the relation between the restoring force and the displacement as mf″(t) = −cf(t), where c is a positive constant. Dividing by m, we can rewrite this differential equation as

f″(t) = −k²f(t)   (10)

for some constant k. Now differentiating twice is a linear transformation of the vector space of all infinitely differentiable functions into itself, and we see that the equation f″(t) = −k²f(t) asserts that the function f is an eigenvector of this transformation with eigenvalue −k². We can easily check that the functions sin kt and cos kt are solutions of Eq. (10). By property 3 of Theorem 5.1, every linear combination

f(t) = a sin kt + b cos kt

is also a solution. Note that for this vibration situation, the eigenvalue −k² determines the frequency of the vibration. To illustrate, suppose in the case of the spring in Figure 5.3 we have f(t) = 2 and f′(t) = 0 when t = 0. It is readily determined that we then must have b = 2 and a = 0, so f(t) = 2 cos kt. Thus the frequency of vibration, which is the reciprocal of the period, is k/(2π).

SUMMARY

Let A be an n × n matrix.

1. If Av = λv, where v is a nonzero column vector and λ is a scalar, then λ is an eigenvalue of A and v is an eigenvector of A corresponding to λ.
2. The characteristic polynomial p(λ) of A is obtained by expanding the determinant |A − λI|, where I is the n × n identity matrix.
3. The eigenvalues λ of A can be found by solving the characteristic equation p(λ) = |A − λI| = 0. There are at most n real solutions λ of this equation.
4. The eigenvectors of A corresponding to λ are the nontrivial solutions of the homogeneous system (A − λI)x = 0, as illustrated in Examples 5 and 6.
5. Let k be a positive integer. If λ is an eigenvalue of A having v as eigenvector, then λᵏ is an eigenvalue of Aᵏ with v as eigenvector. If A is invertible, this statement is also true for k = −1.
6. Let λ be an eigenvalue of A. The set E_λ in n-space consisting of the zero vector and all eigenvectors corresponding to λ is a subspace of n-space.
7. Let T be a linear transformation of a vector space V into itself. A nonzero v in V is an eigenvector of T if T(v) = λv for some scalar λ, which is called the eigenvalue of T corresponding to v.
8. If T is a linear transformation of the vector space Rⁿ into itself, the eigenvalues of T are the eigenvalues of the standard matrix representation of T.

EXERCISES 7

1. Consider the matrices T-yoQ0 0 8 0 0

i -1 -i 3-2 0 8. 4 , “ ’ 3 O |
A,=|-1 1 -1|,4,=|-2 3 0 L
-1-1 1 0 0 5 rT 1 0 0 -2 0 0
10.|-8 4 -5 Wl. |-5S -2 -5
and the vectors 8 0 9 5 0 3

0 -! [4 0 0 [1 O 1
V=l1],%=/0),%=] 1, W2.|-7 2 -1 13./-7 2 5
I I 0 7 0 3 . 3 0 TD
! -| [460 0 ro 0 |
v¥,=]1], v5 =| OF. 14.;8 4 8 1§.|-2 -2 1
0 I 10 0 4 | 2 0-1!

List the vectors that are eigenvectors of A, [2 0 1


and the ones that are eigenvectors of A,. Give 16. |6 4 -3
the eigenvalue in each case. 2 0 3

In Exercises 2-16, find the characteristic In Exercises 17-22, find the eigenvalues A, 27d
polynomial, athe real eigenvalues,a and the . the corresponding eigenvectors v; of the linear
corresponding eigenvectors of the given matrix. transformation T.

2. [10 3| 3, | , A 17. T defined on R? by T({x, y]) =


4.
[2x — 3y, —3x + 2y]
-7 -5 5 0 —!
2
16 17 “11 20 18. T defined on R’ by 7([x, y]) =
| > 0 0 [x-y, —x+ y]
6. 1 7 7.1 1-1 -2 19. T defined on R? by 7([x,, x, x;]) =
l 2 |-1 0 1 [x + X3, Xz, X, + x3]

X), x;)) = 30. Prove that a square matrix is invertible if


20. T defined on R? by T(x,
[%1» 4x, + 7X3, 2%, — X3] and only if no eigenvalue is zero.
41. T defined on R? by 7([X,, %2, X3]) = 31. Find the eigenvalues and eigenvectors of the
{x,, ~5% + 3%) — 5x3, —3x, — 2x] matrix
93, T defined on R? by 7([x,, %, Xs]) = (3x, —
fo
A= 11
Xy + Xs —2x, + 2x, — Xj, 2x, + X, + 4x]
93, Mark each of the following True or False.
a. Every square matrix has real eigenvalues. used to generate the Fibonacci sequence (2).
__ b. Every n X n matrix has n distinct 32. Let A be an n X n matrix and let J be the
(possibly complex) eigenvalues. n X nidentity matnx. Compare the
__¢. Every n X n matrix has n not necessarily eigenvectors and eigenvalues of A with those
distinct ard possibly complex of A + rl for a scalar r.
eigenvalues.
33. Let a square matrix A with real eigenvalues
d. There can be only one eigervalue
have a unique eigenvalue of greatest
associated with an eigenvector of 4 linear
magnitude. Numerical computation of this
transformation.
eigenvalue by the power method (discussed
e. There can be only one eigenvector
in Section 8.4) can be difficult if there is
associated with an eigenvalue of a linear
another eigenvalue of almost equal
transformation.
magnitude, so the ratio of these magnitudes
f. If v is an eigenvector of a matrix A, then
vy is an eigenvector of A + cl for all is close to 1.
a. Suppose we know that a 4 x 4 matrix 4
scalars c.
has eigenvalues of approximately 20, 2,
__ g. If A is an eigenvalue of a matrix A, then
—3, and —19.5. Using Exercise 32, how
A is an eigenvalue of A + cl for all
might we modify A so that the
scalars c.
aforementioned ratio is not so close to 1?
_—h. If vis an eigenvector of an invertible
b. Repeat part (a), given that the
matrix A, then cv is an eigenvector of A™!
eigenvalues are known to be
for all nonzero scalars c.
approximately 19.5, 2, —3, and —20.
__. i. Every vector in a vector space V is an
eigenvector of the identity transformation . Let A be ann X n real matrix. An
of V into V. eigenvector w in R’ and a corresponding
— j. Every nonzero vector in a vector space V eigenvalue a of A’ are also called a left
is an eigenvector of the identity eigenvector and eigenvalue of A. Explain the
transformation of V into V. reason for this name.
24. Let T: V— V be a linear transformation of a 35. (Principle of biorthogonality) Let A be an
vector space V into itself, and let A be a n X neal matrix. Let v in R" be an
scalar. Prove that {v € V| 7(v) = Av} isa eigenvector of A with corresponding
subspace of V. eigenvalue A, and let w in R” be an
eigenvector of A’ with corresponding
25. Prove that if A is a square matrix, then 447
eigenvalue a. Prove that if A # a, then
and ATA have the same eigenvalues.
v and w are perpendicular vectors. [HinT:
26. Following the idea in Example 8, show that Refer to Exercise 34, and compute w7Av in
the functions e”, e~™, sin ax, and cos ax are two ways, using associativity of matrix
eigenvectors for the linear transformation multiplication.]
T: D, — D, defined by T(f) = f™, the
36. a. Prove that the eigenvalues of an n x n
fourth derivative of £ Indicate the eigenvalue
real matrix A are the same as the
in each case.
eigenvalues of A’.
27. Prove property | of Theorem 5.1. b. With reference to part (a), show by a
28. Prove property 2 of Theorem 5.1. counterexample that an eigenvector of A
29. Prove property 3 of Theorem 5.1. need not be an eigenvector of A’.

37. The trace of ann Xx n matrix A 1s defined by 44. Let

tA) =a, tant-+-+


ta ane A= i - |
1 0
Let the characteristic polynomial p(A) factor
It can be shown that, if x is any column
into linear factors, so that A has n (not
vector in R?, then Ax can be obtained
necessarily distinct) eigenvalues A,, A,, . .
geometrically from x by rotating x
., Prove that
counterclockwise through an angle of 90°.
tr(A) = (-1)” (Coefficient of A""' in p{A)) For example, we find that Ae, = e, and
=A, +A,+ + A, Ae, = —@,.
a. Argue geometrically that 4 has no real
38. Let A be ann X n matrix, and let C be an eigenvalues.
invertible 7 xX n matnx. Show that the b. Find the complex eigenvalues and
eigenvalues of A and of C-'AC are the same. eigenvectors of A.
{Hint: Show that the characteristic c. Argue geometrically that A? should have
polynomials of the two matrices are the real eigenvalues _____. (Fill in the blank.)
same.]} d. Use part (b) and Theorem 5.1 to find the
eigenvalues of A’, and compare with the
39. Cayley—Hamilton theorem: Every square
matrix A satisfies its characteristic equation. answer obtained for part (c). What are
the real eigenvectors of A”? the complex
That is, if the characteristic equation is
eigenvectors of A’?
PA" + py-A™' + +++ + pA + py = O, then e. Find eigenvalues and eigenvectors for A’.
pA" + p,-Av! + +++ + pA + pol = O, the f. Find eigenvalues and eigenvectors for A’.
zero matrix. Illustrate the Cayley- Hamilton
a 45. Use MATCOMP in LINTEK, or MATLA3,
theorem for the matnx i “3 and relation (4) to find the following terms
of the Fibonacci sequence (2) as accurately
40. Let A be an invertible n xX n matrix. Using as possible. (Use double-precision printing if
the Cayley-Hamilton theorem stated in the possible.)
preceding exercise, show that A7' can be a. F, (Note that F, = 21; this part is to
computed as a linear combination of the check procedure.)
powers A, A’,..., A* of A. Compute the
inverse of the matrix 4 in the preceding
pao

exercise in this fashion.


41. Let v, and v, be eigenvectors of a linear e Fiso

transformation 7: V ~ V with corresponding 46. The first two terms of a sequence are a, = 0
eigenvalues A, and A,, respectively. Prove and a, = 1. Subsequent terms are generated
that, if A, * A), then v, and v, are using the relation
independent vectors.
a, = 2a,.,+ a. for k=2.
42. The analogue of Exercise 41 for a list of r
eigenvectors in V having distinct eigenvalues a. Write the terms of the sequence through
is also true; that is, the vectors are . dy.
independent. See if you can prove it. [HiNT: b. Find a matnx that can be used to
Suppose that the vectors are dependent; generate the sequence, as the matrix A in
consider the first vector in the list that is a Exercise 31 can be used to generate the
linear combination of its predecessors, and Fibonacci sequence.
apply T.] c. Use MATCOMP in LINTEK, or
43. State the result for matrices corresponding MATLAB, to find ap.
to Exercise 42. Explain why successful 47, Repeat Exercise 46 for a sequence where
completion of Exercise 42 gives a proof of a, = 0, a, = 1, a, = 2, and a, =
this statement for matrices. 2a,-, — 3ay-2 + ay-3 for k = 3.
5.1 EIGENVALUES AND EIGENVECTORS 303

a square matrix A, .£he MATLAB command Copy down the equation, and then use
gig(A) produces the eigenvalues (both real and ALLROOTS to find all eigenvaiues of the
plex) of A. The command [V, D] = eig(A) matrix.
ces a matrix V whose column vectors are
haps complex) eigenvectors of A and a -l 4 6 10 -13 8
‘agonal matrix D whose entry d,, is the .| 2 7 9 53. 3 -20 5
eigenvalue for the eigenvector in the ith column of -3 Il 13 -11 7 -6
y, In Exercises 48-51, use either MATLAB or the
poutine MATCOMP in LINTEK to find the real [ -7 t1 -7 10
eigenvalues and corresponding eigenvectors of the 5 8 -13 3
piven matrix. TS 8 -9 2
3-4 20 -6
710 6 “1000 21-8 O 32

-14 17 -6 9
#)2-15 -60 49,| -33290
020 55.
15 11-13 16
I-31 0 3! I-18 30 4331
00-2 2 . Use the routine MATCOMP in LINTEK, or
-34-3 3 MATLAB, to illustrate the Cayley-Hamilton
01409 2 4 theorem for the matna
20 2 4
-2 4 6 -1
4 00 O 5-8 3 2
1-6 16-6 6 11 -3 7 1}
-)_16 0 20 16 0-5 9 10
-16 0 16 20
(See Exercise 39.)
The routine ALLROOTS in LINTEK can be used 57. The eigenvalue option of the routine
to find both real and complex roots of a VECTGRPH in LINTEK is designed to
polynomial. Ihe program uses Newton’s method, illustrate graphically, for a linear
which finds a solution by successive transformation of R? into itself having real
approximations of the polynomial function by a eigenvalues, how repeatedly transforming a
linear one. ALLROOTS is designed so that the vector makes it swing in the direction of an
user can watch the approximations approach a eigenvector having eigenvalue of maximum
solution. Of course, a program such as MATLAB, magnitude. By finding graphically a vector
which is designed for research, simply spits out the whose transform is parallel to it, one can
answers. In Exercises 52-55, either estimate from the graph an eigenvector and
&. use the command eig(A) in MATLAB to find the corresponding eigenvalue. Read the
all eigenvalues of the matrix or directions for this option, and then work
b. first use MATCOMP in LINTEK to find the with it until you can reliably achieve a score
characteristic equation of the given matrix. of 85% or better. -

MATLAB
The command v = poly(A) in MATLAB produces a vector v whose components are
the coefficients of the characteristic polynomial p(A) of A, appearing in order of
decreasing powers of A. The command roots(v) then produces the solutions of
Pia) = 0.
304 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

Mi. €nter the command format long and then use thc commands jusi eaplained to
find the characteristic polynomial and both the real and the complex
eigenvalues of
a. the matrix in Exercise 53
b. the matrix in Exercise 54.
{Recall that in MATLAB, the characteristic polynomial of an n x n matrix A is
det(A/ — A), rather than det(A — AJ). Thus the coefficient of A” should be | rather
than (—1)". We enter the long format because otherwise, with the displays in
scientific notation, it might seem from MATLAB that the coefficient of A” is 0.]
Let A be ann X rn matrix. In the introduction to this section, we considered the
case where there is a basis {b,, b,, . . . , b,} for R" consisting of eigenvectors of A, and
where, for one of the eigenvectors, say b,, the corresponding eigenvalue A, is of
greater magnitude than each of A,,..., A,. Also ifx = db, + d,b, +--+ + d,b,
with d, # 9, then for sufficiently !arge values of k, we saw that

Ablx = A,(440,
so that this eigenvalue A, of maximum magnitude might be estimated by taking the
ratio of components of A‘*'x to components of A*x. Recall that a period before an
arithmetic operation symbol, such as .« or ./, will cause that operation to be
performed component by component for vectors or matrices. Thus the MATLAB
command
r = (A*(k+1)«x) ./ (A*k«x)

will cause the desired ratios of components to be printed. If these ratios agree for all
components, we expect we have computed the eigenvalue of maximum magnitude
accurate to all figures shown.
M2. Enterk = 1;x = [1 23)’; A = [3 5 —11; 5 -8 —3; -11 —3 14]; in MATLAB.
(Recall that the final semicolon keeps the data from being displayed.) Then
enter r = (A‘(k+1)«x) ./ (A*k«x) to see the ratios displayed. Now enter k =
5; and use the up-arrow key to have the ratios computed for that &. Continue
using the up-arrow key, setting k successively equal to 10, 15, 20, 25, . . . until
the ratios agree to all places shown. Then change to format long, and continue
until you have found the eigenvalue of maximum magnitude accurate to all
figures shown. Copy it down as your answer to this problem. Check it using
the command eig(A).
M3. Property 2 of Theorem 5.1 indicates that the eigenvalue of minimum
magnitude of A is the reciprocal of the eigenvalue of maximum magnitude of
A~', Continuing the preceding exercise, enter B = A; to save A, and then enter
A = inv(A);. Now enter s = ones(3,1) / r. (Enter heip ones to learn the effect
of the ones(3,1) statement.) Using the up-arrow key to access statements
defining k and computing r and s, compute the eigenvalue of A of minimum
magnitude accurate to all figures shown in the long format. Copy down your
answer. and check it using the eig(B) command.
M4. Raising a matrix A to a high power can generate error, and can cause overflow
(numbers too large to handle) in a computer. Continuing the preceding two
exercises. let us avoid this by only raising A to a low power, say the Sth
power, and replace x by A°x after each iteration. Repeating this m times
should have the same effect as raising A to the power 5m. Of course, now the
entries in the vector x may get large. We compensate for this by norming x
5.2 DIAGONALIZATION 305

be of magnitude | before the next iteration. This does not change the
direction of x, which should swing to parallel the eigenvector b, having
eigenvalue A, of maximum magnitude as the iterations are performed. To
execute this procedure, enter A = B; to recover A as in Exercise M2, and then
enter
x = A%Sex; x = (1/norm(x))}«x; r = (Aex) / x
to counpute 4°x, normalize x, and compute the ratios of components of Ax to
those of x. One repetition of the up-arrow key followed by the Enter key
executes these iterations rapidly. Establish your result in Exercise M2 again,
and then, replacing A by 4~', establish your result in Exercise M3 again.
MS. Explain why the final vector x obtained when finding the eigenvalue of either
maximum or of minimum magnitude in Exercise M4 should be an
eigenvector corresponding to that eigenvalue. Check that this is so using the
[V, D] = eig(A) command explained before Exercise 48.

In Exercises M6-M8, use the displayed command in Exercise M3 preceded by any


necessary modifications in the initial vector x to find the eigenvalues of maximum
and minimum magnitude. Always reinitialize x to something like x = [1 2 3] before
finding the eigenvalue of minimum magniiude. Give answers in short format. The
matrices are in the file cf matrices for this section, if it is available at your
installation.

| 41 $ -3 -4 7 ®-ll 5
-2 6 8 1-2-2 6 7-3 0 2 10
M6. | 6-3 21 M7] 5 4 _3 (M8 | 8 0 2 1 4
8 2-4 5 30g oo -1 2 10S =9
5 10 4 -9 3

DIAGONALIZATION

Recall that a square matrix is called diagonal if all entnes not on the main
diagonal are zero. In the preceding section we indicated the importance of
being able to compute A*x for an m X m matrix A and a column vector x in R’.
In this section, we show that, if A has distinct eigenvalues, then computation
of A‘ can be essentially replaced by computation of D*, where D is a diagonal
matrix with the eigenvalues of A as diagonal entries. Notice that D* is the
diagonal matrix obtained from D by raising each diagonal entry to the power
k, For example,
2 0 of |8 0 O
0-1 oO} =|o-1 Ol.
0 0-2 0 0 -8|
306 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

The theorem that follows shows how to summarize the action of a square
matrix on its eigenvectors in a single matrix equation. This theorem is the first
step toward our goal of diagonalizing a matrix. Theorems stated are valid for
matrices and vectors with complex entries and complex scalars unless we use
the adjective real. We sometimes refer to n-space in this section, meaning
either R" or C’.

THEOREM 5.2 Matrix Summary of Eigenvalues of A

Let A be an n X mn matrix and let A,, A,, .. . , A, be (possibly complex)


scalars and v,, v,,.. . , ¥, be nonzero vectors in n-space. Let C be the
n X n matrix having vy, as jth column vector, and let

|” 0
0 an
Then AC = CD if and only if A,, A, . . . , A, are eigenvalues of A and vy,
is an eigenvector of A corresponding to A, for j = 1, 2,...,n.

PROOF We have

CD=lv,
|
| v, °*° ¥,
Ih. Q
r

On the other hand,

AC
= Alv, Vv, -*:+ V,|= Av, Av, *°- Av,}.

Thus, AC = CD if and only if Av, = Ay; &


5.2 DIAGONALIZATION 307

The n X n matrix C is invertible if and only if rank(C) = n—that ts, if and


only if the column vectors of C form a basis for n-space. In this case, the
criterion AC = CD in Theorem 5.2 can be written as D = C~'AC. The equation
D = C'AC transforms a matrix A into a diagonal matrix D that is much easier
to work with, as we will see ina moment. We give a formal definition of such a
transformation of A into D, followed by a corollary summarizing the results of
this paragraph.

DEFINITION 5.3 Diagonalizable Matrix

Ann X n matrix A is diagonalizable if there exists an invertible matrix


C such that C"'AC = D, a diagonal matrix. The matrix C is said to
diagonalize the matrix A.

COROLLARY 1 A Criterion for Diagonaiization

Ann X nmatnx A is diagonalizable if and only if n-space has a basis


consisting of eigenvectors of A.

If an n X n matrix A is diagonalizable as described in Definition 5.3, then


we have A = CDC' as well. From this we obtain

Ak = (CDC"(CDC(CDC) . .. (CDC) (1)


k factors
The adjacent terms C~'C cancel in Eq. (1) to give A‘ = CD*C™'. Thus, the
computation of A‘ is essentially reduced to the computation of D*. We
summarize these observations as a corollary to Theorem $.2.

COROLLARY 2 Computation of A‘

Let ann X n matrix A have n eigenvectors and eigenvalues, giving rise


to the matrices C and D so that AC = CD, as described in Theorem
5.2. If the n eigenvectors are independent, then C is an invertible
matrix and C-'AC = D. Under these conditions, we have
At = CDIC"!

Diagonalization of square matrices plays a very important role in linear


algebra. As we show in the next theorem, diagonalization of an n X n matrix
A can always be achieved if the characteristic polynomial has n distinct
roots.
308 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

THEOREM 5.3 Independence of Eigenvectors

Let A be an n X n matrix. If v,, v,,..., v, are eigenvectors of A


corresponding to distinct eigenvalues A,, A, . . . , An, Fespectively, the
set {v,, V.,..., V,}1S linearly independent and A is diagonalizable.

PROOF Suppose that the conclusion is false, so the eigenvectors V,, V,,..., V,
are linearly dependent. Then one of them is a linear combination of its
predecessors. (See Exercise 37, page 203.) Let v, be the first such vector, so that

vy, = dy, + dy, t+ + AywYyey (2)


and {v,, V,,..-, “,-} iS independent. Multiplying Eq. (2) by A,, we obtain
ALY, = di>,¥, + dA,V> + s+ di, AyVh-1- (3)
On the other hand, muitiplying both sides of Eq. {2) on the left by the matrix4
yields

AV, = GA, + AAW, tot + Ag Me (4)


because Av, = A,yv,. Subtracting Eq. (4) from Eq. (3), we see that

0= dA, - A,)v, + a,(A, — A,)v, + + de (Ag — Age


This !ast equation is a dependence relation because not all the coefficients are
zero. (Not all «, are zero because of Eq. (2) and because the A; are distinct.) But
this contradicts the linear independence of the set {v,, v,,..., Vs} We
conclude that {v,, v,, . . . , V,} 1s independent. That A is diagonalizable follows
at once from Corollary 1 of Theorem5.2. 4

EXAMPLE 1 Diagonalize the matrix A = => il and compute A* in terms of k.

SOLUTION We compute

ded ~ a= |~> 5? > fae -a-2=@- 2A,

The eigenvalues of A are A, = 2 and A, = —|. For A, = 2, we have

A-- a=" 2
5S 5} jl -l
2/ |0 oP
which yields an eigenvector

For A, = —1, we have


5.2 DIAGONALIZATION 309

-f
which yields an eigenvector

A diagonalization of A is given by

s-coer-[! 8 of

ee |
Wiha

ofan
Wie

Lvl
|
(We omit the computation of C~'.) Thus,

f | 2k 0 5 | _ {noe + § 5(2*) z |
Ak=
LL 2} iy + 2)’
373] 3[-2 £2 5024
the colored sign being used only when k is odd. «

Diagonalization has uses other than computing A‘. The next section
presents an application to differential equations. Section 8.1 gives another
application. Here is an application to geometry in R’.

EXAMPLE 2 Find a formula for the linear transformation T: R’ — R’ that reflects vectors in
the line x + 2y = 0.
SOLUTION We saw in Illustration 2 of Section 5.1 that T has eigenvectors [1, 2]
and [2, —1] with corresponding eigenvalues —1 and 1, respectively. Let
A be the standard matrix representation of 7. Using column-vector nota-
tion, we have

r((]) =a] =[22) 7a) -4a}-L-}


Letting C = 3 | dD= ro ihe see that AC = CD asin Theorem 5.2.
Because C is invertible,
l 2

a-coc=[ a) | i 1-4-3]
1 2)f-t O75 5) _ af 3 -4

Thus

r= Ab) = s-a— 3
x}\_ ,f[x|)_ 1] 3x - 4y

or, in row notation, 7([x, y]) = 43x — 4y, ~4x - By]. x

The preceding matrices A and D provide examples of similar matrices, a


term we now define.
310 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

DEFINITION 5.4 Simiar Matrices

Ann X n matrix P is similar to an n X n matrix Q if there exists an


invertible n x n matrix C such that C-'PC = Q.

The relationship “P is similar to Q” satishes the properties required for an


equivalence relation (see Exercise 16). In particular, if P is similar to Q, then Q
is also similar to P. This means that similarity need not be stated in a
directional way; we can simply say that P and Q are similar matrices.

EXAMPLE 3 Find a diagonal matrix similar to the matrix

1 0 0
A=|-8 4 -6
81 9
of Example 6 in Section 5.1.
SOLUTION Taking r = 5 = t = | in Example 6 of Section 5.1, we see that eigenvalues and
corresponding eigenvectors of A are given by
A=1 4 =6, A =7,
15 0 ay
v= 8], v,=/-3], v,=|]-2).
—16 1 l
If we let
f 15 0 0
C= 8 -3 -2|,
-16 1 #1
then Theorem 5.3 tells us that C is invertible. Theorem 5.2 then shows that

100
C'AC= D=|0 6 O}.
007
—_——_—_—_—”

HISTORICAL NOTE Tue 1DEA OF SIMILARITY, like many matrix notions, appears without 4
definition in works from as early as the 1820s. In fact, in his 1826 work on quadratric forms (
note on page 409), Cauchy showed that if two quadratic forms (polynomials) are related by §
change of variables—that is, if their matrices are similar—then their characteristic equations4
the same. But like the concept of orthogonality, that of similarity was first formally defined af
discussed by Georg Frobenius in 1878. Frobenius began by discussing the general case: he called
two matrices A, D equivalent if there existed invertible matrices P, Q such that D = PAQ. The faster
matrices were called the substitutions through which A was transformed into D.
Frobenius then dealt with the special cases where P = Q7 (the two matrices were then calted
congruent) and where P = Q™' (the similarity case of this section). Frobenius went on to
many results on similarity, including the useful theorem that, if 4 is similar to D, then fA
similar to f(D), where fis any polynomial matrix function.
5.2 DIAGONALIZATION 311

Thus Dis similar to A. We are not eager to check that C"'AC = D; however, it is
easy to check the equivalent statement:
5 0 0
AC = CD= 8 —i8 —14).
-16 6 7 .

It is not always essential that a matrix have distinct eigenvalues in order to


be diagonalizable. As long as the n X n matrix A has n independent
eigenvectors to form the column vectors of an invertible C, we have C~'AC =
D, the diagonal matrix of the eigenvalues corresponding to C.

EXAMPLE 4 Diagonalize the matrix

1-3 3
A=|0 -5 6}.
0-3 4
SOLUTION We find that the characteristic equation of A is
(1 — A)(-5 — A\(4 — A) + 18) = (1 — AYA? + A — 2)
= (1 — AXA + 2) — 1) = 0.
Thus, the eigenvalues of A are A, = 1, A, = 1, and A, = —2. Notice that 1 isa
root of multiplicity 2 of the characteristic equation; we say that the eigenvalue
1 has algebraic multiplicity 2.
Reducing A — J, we obtain

fo -3 31 fo 1-11
A-I=}]0-6 61~|0 O Oj}.
0-3 3} {@ 0 0
We see that the eigenspace E£, (that 1s, the nullspace of A — J) has dimension 2
and consists of vectors of the form
5
r for any scalars r and s.
r

Taking s = 1 and r = 0, and then taking s = 0 and r = 1, we obtain the


independent eigenvectors

l 0
v, =|0| and vy, =]1
0 |
corresponding to the eigenvalues A, = A, = |.
Reducing A + 2/, we find that

3 -3 3; |3 0-3
A+27={0-3 6|~/0 1 -2).
0-3 6] {0 0 0
312 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

Thus an eigenvector corresponding to A, = —2 1s

Therefore, if we take

we should have

A check shows that indeed

1 Q -2
AC
= CD= 0 1 —-4|.
0 1 -2
As we indicated in Example 4, the algebraic multiplicity of an eigenvalue A;
of A is its multiplicity as a root of the characteristic equation of A. Its geometric
multiplicity is the dimension of the eigenspace E,. Of course, the geometric
multiplicity of each eigenvalue must be at least 1, because there always exists a
nonzero eigenvector in the eigenspace. However, it is possible for the algebraic
multiplicity to be greater than the geometric multiplicity.

EXAMPLE 5 Referring back to Examples 4 and 5 in Section 5.1, find the algebraic and
geometric multiplicities of the eigenvalue 2 of the matrix
2 1 0
A=|-1 0 1).
1 3 1
SOLUTION Example 4 on page 292 shows that the characteristic equation of A is
—(A — 2)(A + 1) = 0, so 2 isan eigenvalue of algebraic multiplicity 2. Example
5 on page 293 shows that the reduced form of A — 21 is

1 3-1
0 1 QO}.
00 0
Thus, the eigenspace E,, which is the set of vectors of the form
r
O|forr ER,
r

has dimension |, so the eigenvalue 2 has geometric multiplicity 1.


5.2 DIAGONALIZATION 313

We state a relationship between the algebra‘= multiplicity and the geomet-


ric multiplicity of a (possibly complex) eigenvalue. (See Exercise 33 in Section
9.4.)

The geometric multiplicity of an eigenvalue of a matrix A is less


io than or equal to its algebraic multiplicity.

Let A,, A,,..., A,, be the distinct (possibly complex) eigenvalues of an


n X nmatrix A. Let B,be a basis for the eigenspace of A; fori=1,2,..., mm. It
can be shown by an argument similar to the proof of Theorem 5.3 that the
union of these bases B; is an independent set of vectors in n-space (see Exercise
24). Corollary | on page 307 shows that the matrix A is diagonalizable if this
union of the B; is a basis for n-space. This will occur precisely when the
geometric multiplicity of each eigenvalue is equal to its algebraic multiplicity.
(See the boxed statement.) Conversely, it can be shown that, if A is diagonali-
zable, the algebraic multiplicity of each eigenvalue is the same as its geometric
multiplicity. We summarize this in a theorem.
THEOREM 5.4 A Criterion for Diagonalization

An n X n matrix A is diagonalizable if and only if the algebraic


multiplicity of each (possibly complex) eigenvalue is equal to its
geometric multiplicity.

Thus the 3 X 3 matrix in Example 4 is diagonalizable, because its


eigenvalue | has algebraic and geometric multiplicity 2, and its eigenvalue —2
has algebraic and geometric multiplicity |. However, the matrix in Example 5
is not diagonalizable, because the eigenvalue 2 has algebraic multiplicity 2 but
geometric multiplicity |.
In Section 9.4, we show that every square matnix A is similar to a matrix J,
its Jordan canonical form. Uf A is diagonalizable, then J is a diagonal matnx,
found precisely as in the preceding examples. If A is not diagonalizable, then J
again has the eigenvalues of A on its main diagonal, but it also has entnes |
immediately above some of the diagonal entries. The remaining entries are al!
zero. For example, the matrix
2 1 0
A={-1 0 1
1 3 |
of Example 5 has Jordan canonical form
314 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

Section 9.4 describes a technique for finding J. This Jordan canonical form is
as close to a diagonalization of A as we can come. The Jordan canonical form
has applications to the solution of systems of differential equations.
To conclude this discussion, we state a result whose proof requires an
excursion into complex numbers. The proof is given in Chapter 9. Thus far, we
have seen nothing to indicate that symmetric matrices play any significant role
in linear algebra. The following theorem immediately elevates them into a
position of prominence.

THEOREM 5.5 _ Diagonalization of Real Symmetric Matrices

Every real symmetric matrix is real diagonalizable. That is, if A is an


n X n symmetric matrix with real-number entries, then each eigenval-
ue of A is a real number, and its algebraic multiplicity equals its
geometric multinlicity.

A diagonalizing matrix C for a symmetric matrix A can be chosen to have


some very nice properties, as we will show for real symmetric matrices in
Chapter 6.

“| SUMMARY
a |
Let A be ann Xx n matrix.

1. If A has n distinct eigenvalues A,, A,,..., A,, and C is an n X n matrix


having as jth column vector an eigenvector corresponding to A,, then
C'AC is the diagonal matrix having A, on the main diagonal in the jth
column.
2. If C'AC = D, then AF = CD'C"'.
A matrix P is simiiar to a mairix Q if there exists an invertible matnx C
such that C-'PC = Q.
4. The algebraic multiplicity of an eigenvalue A of A is its multiplicity as a
root of the characteristic equation; its geometric multiplicity 1S the
dimension of the corresponding eigenspace E£,.
5. Any eigenvalue’s geometric multiplicity is less than or equal to its algebral¢
multiplicity.
6. The matrix A is diagonalizable if and only if the geometric multiplicit y of

each of its eigenvalues is the same as the algebraic multiplicity.


7. Every symmetric matrix is diagonalizable. All eigenvalues of 4 rea
symmetric matnx are real numbers.
5.2 DIAGONALIZATION 315

EXERCISES
jn Exe srcises 1-8, find the eigenvalues A, and the i. If an n X n matrix A is diagonalizable,
gorresponding eigenvectors v, cf the given matrix there is a unique diagonal matrix D that
A and also find an invertible matrix Cand a is similar to 4.
CAC.
diagonal matrix D such that D = — j. If A and B are similar square matrices.
then det(A) = det(B).
-3 4 32 14. Give two different diagonal matrices that are
2. A=
LA
=
| 4 i ; ‘ i
similar n E1
to the matnx 34 .
6 3-3
4.Az=|-2 -1 2 . Prove that, if a matrix is diagonalizable, so
3, A= 7 |
van 16 8 -7 is its transpose.
16. Let P, Q, and R be n X n matrices.
(-3 10 -6| -3 5 -208 Recall that P is similar to Q if there
-6 6. A= 2 0
5. A -| 0 7
exists an invertible n x n matrix C such that
C"'PC = Q. This exercise shows that

“20 0-1 -43-1] 6 -126 similarity is an equivalence relation.


a. (Reflexive.) Show that P is similar to
1.A= 2 0 8. A=
3 0 2 3-3 8 itself.
b. (Symmetric.) Show that, if P is similar to
Q, then QO is similar to P.
In Exercises 9-12, determine whether the given
c. (Zransitive.) Show that, if P is similar to
matrix is diagonalizable.
Q and Q is similar to R, then P is similar
to R.
1 2 6 3 10
9/2 O- 70.}0 3 1 17. Prove that, for every square matnix A all of
16-4 3 10 0 3 whose eigenvalues are real, the product of its
eigenvalues is det(A).
f-} 4 2-7 ; 25 1 18. Prove that similar square matrices have the
0 5-3 6 20 2 6 same eigenvalues with the same algebraic
Ml) 9 go -5 1] Is 2 7-1 multiplicities.
00 011 ll 6-1 3
19. Let A be an nm X nm matrix.
13. Mark each of the following True or False. a. Prove that if A is similar to rA where r is
a. Every n X n matnix is diagonalizabie. a real scalar other than | or —1, then all
—b. If ann xX n matnx has n distinct real eigenvalues of A are zero. [HINT: See the
eigenvalues, it is diagonalizable. preceding exercise.]
—c. Every n X nm real symmetric matrix ‘s tcal b. What can you say about A if t is
diagonalizable. diagonalizable and similar to rA for some
- Ann X n matrix is diagonalizable if and r where |r| 4 1?
Only if it has distinct eigenvalues.
c. Find a nonzero 2 x 2 matrix A which is
. Ann ™ n matnx is diagonalizable if and
similar to rA for every r 4 0. (See part a.)
only if the algebraic multiplicity of each
of its eigenvalues equals the geometric d. Show that A = 4 is similar to —A.
multiplicity.
— f. Every invertible matrix is diagonalizabie. (Observe that the eigenvalues of A are not
—— 2. Every triangular matnx is diagonalizable. all zero.)
——h. lf A and B are similar square matrices 20. Let A be a real tridiagonal matrix—that is,
and A is diagonalizable, then B is also ann X n matrix for n > 2 all of whose
diagonalizable. entries are zero except possibly those of the
316 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

form 4@,,.,, @,, OF a,_,,. Show that if a,,_, and this section, give the best justification you
a,_,; are both positive, both negative, or both can for this statement.
zero for i = 2, 3,..., ”, then A has real
eigenvalues. [HINT: Shew that a diagonal
[Bl 29. The MATLAB command

matrix D can be found such that DAD™' is A = (2er)srand(n) — reones(n); eig(A)


real symmetric.]
21. Find a formula for the iinear transformation should exhibit the eigenvalues of an n x n
matnx A with random entries between —r.
i

T: R? — R? that reflects vectors in the line


and r. Illustrate your argument in the
y = mx. [HINT: Proceed as in Example 2.]
preceding exercise for seven matrices with
22. Let A and C be n X n matrices, and let C be values of r from | to 50 and values of n
invertidle. Prove that, if v is an eigenvector from 2 to 8. (Recall that we used rand(n) in
of A with corresponding eigenvalue A, then the MATLAB exercises of Section 1.6.)
C~'y is an eigenvector of C-'AC with
corresponding eigenvalue A. Then prove that In Exercises 30-41, use the routines MATCOMP
ail eigenvectors of C~'AC are of the form and ALLROOTS in LINTEK or use MATLAB to
C-'v, where v is an eigenvector of a. classify the given matrix as real diagonalizable,
. Explain how we can deduce from Exercise complex (but not real) diagonalizable, or not
Na
Go

22 that, if A and B are similar square diagonalizable. (Load the matrices from a matrix
matrices, each eigenvalue of A has the same file if it is accessible.)
geometric multiplicity for A that it has for B.
(See Exercise 18 for the corresponding 18 25 -25 8.3 8.0 —6.0
statement on algebraic multiplicities.) 30.); 1 6 -1 31. |-2.0 0.3 3.0
24, Prove that, if A,, A,,... ,A,, are distinct real 18 34 -25 0.0 00 43
eigenvalues of an n X n real matrix A and if
B; is a basis for the eigenspace E,, then the 24.55 46.60 46.60
union of the bases B; is an independent set 32. |-4.66 -8.07 -9.32
of vectors in R*. [HinT: Make use of Theorem -~9.32 -18.64 -17.39
5.3.]
0.8 -1.6 1.8]
25. Let 7: V > V be a linear transformation of a 33. | -0.6 -3.8 3.4
vector space V into itself. Prove that, if -~20.6 -1.2 9.6
V1, V2... , ¥, are eigenvectors of T
corresponding to distinct nonzero [ 7-20 -5 5
eigenvalues A,, A,,.. . ,A;,, then the set 5 -13 -5 0
{7(v,), T{v,), . . . ,7(v,)} is independent. 4/5 10 75
| § -10 -5 -3
26. Show that the set {e"", 2", ... , “}, where
the A, are distinct, is independent in the 2 § -9 10
vector space W of all functions mapping R 4 9 8 -3
into R and having derivatives of all orders. 3-13 2 012
27. Using Exercise 26, show that the infinite set 7 -6 3 2
{e* | k © R} is an independent set in the -22.7 -26.9 -6.3 —46.5
vector space W described in Exercise 26.
[HinT: How many vectors are involved in any
36, |759-7 —40.9 20.9 -99.5
dependence relation?]
"| 15.9 9.6 -8.4 26.5
43.8 36.5 -7.3 78.2
28. In Section 5.1, we stated that if we allow
complex numbers, then an n x n matrix 66.2 58.0 -11.6 116.0
with entries chosen at random has 37, | 120.6 89.6 -42.6 201.0
eigenvectors that form a basis for n-space "1-210 -150 7.6 35.0
with probability 1. In light of our work in -99.6 -79.0 28.6 -169.4
5.3 TWO APPLICATIONS 317

-253 -232 -96 1088 280] T2513 596 -414 ~2583 1937
213. 204 +93 -879 -225 127-32. 33 132 8!
3. | 90 -90 -47 360 90 40.| -421 94 -83 -434 306
-38 -36 -18 162 40 2610 -615 443 2684 -1994
62 64 42 -251 —57| | 90 -19 29 494 -50
154 -24 -36 —1608 —336 2-4 6 21
-126 16 18 1314 270 6 3-8 12
99.| 54 0 4 -540 -108 4./-4 0 5 12
2440 O -236 —48 4 3 S-IL 3
-42 -12 -18 366 70 16679 #2 3

oe
5.3 | TWO APPLICATIONS
| |
In this section, A will always denote a matrix with real-number entries.
In Section 5.1, we were motivated to introduce eigenvalues by our desire
to compute A*x. Recall that we regard x as an initial information vector of
some process, and A as a matrix that transforms, by left multiplication, an
information vector at.any stage of the process into the information vector at
the next stage. In our first application, we examine this computation of A*x and
the significance of the eigenvalues of A. As illustration, we determine the
behavior of the terms F, of the Fibonacci sequence for large values of n. Our
second appiication shows how diagonalization of matrices can be used to solve
some linear systems of differential equations.

Application: Computing A*x


Let A bea real-diagonalizable n x n matrix, and let A,, A,,...,A, be the 7 not
necessarily distinct eigenvalues of A. That is, each eigenvalue of A is repeated
in this list in accord with its algebraic multiplicity. Let B = (v,, ¥,,...,¥,) be
an ordered basis for R’, where v; is an eigenvector for A,. We have seen that, ifC
is the matrix having v, as jth column vector, then

» 0
A,
CAC = D= () .

" a

For any vector x in R’, let d = [d,, d,, . . . , d,] be its coordinate vector relative
to the basis B. Thus,

x=dy,+dy,+-::+dyv n
318 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

Then

Ax = dAjv, + dA)V, + +++ + dA, Hy,. (1)


Equation (1) expresses A*x as a linear combination of the eigenvectors v,.
Let us regard x as an initial information vector in a process in which the
information vector at the next stage of the process is found by multiplying the
present information vector on the left by a matnx A. Illustrations of this
situation are provided by Markov chains in Section |.7 and the generation of
the Fibonacci sequence in Section 5.1. We are interested here in the long-term
outcome of the process. That is, we wish to study A*x for large vaiues of k.
Let us number our eigenvalues and eigenvectors in Eq. (1) so that |A,| = |A|
if i<j; that is, the eigenvalues are arranged in order of decreasing magnitude.
Suppose that |A,| > |A,| so that A, is the unique eigenvalue of maximum
magnitude. Equation (1) may be written
A'x = Aj(d,v, + d,(A,/A,)v, + + °° + d(A,/A,)*V,).
Thus, if ks large and d, # 0, the vector A‘x is approximately equal to d,A,*v, in
the sense that ||4*x — d,A,*v,|l is small compared with {I4*x||.

EXAMPLE 1 Show that a diagonalizable transition matmx 7 for a Markov chain has no
eigenvalues of magnitude > 1.
SOLUTION Example 3 in Section 5.1 siiows that 1 is an eigenvalue for every transition
matrix of a Markov chain. For every choice of population distribution vector
p, the vector T*p is again a vector with nonnegative entries having sum 1. The
preceding discussion shows that all eigenvalues of 7 must have magnitude = 1;
otherwise, entries in some 7*p would have very large magnitude as k
increases. @

EXAMPLE 2 Find the order of magnitude of the term F, of the Fibonacci sequence Fo, F;,
F,, F;,... —that is, the sequence
0, 1, 1, 2,3, 5, 8, 13,...
for large values of k.
SOLUTION We saw in Section 5.i that, if we let

then

xX, = Ve
1 0
We compute relation (1) for
11 l
A=|i A and x= x= (9)
5.3 TWO APPLICATIONS =—-3319
The characteristic equation of A is
(1 -—AY-A)-
1 =A-A-1=0.
Using the quadratic formula, we find the eigenvalues
1+-V5 1-V5
A, = 2 and A=):

Reducing A ~ A,J, we obtain

1-V5
A-Al~ 2 I) SO wale |
0 0

is an eigenvector for A,. In an analogous fashion, we find that

n=|Via |
is an eigenvector corresponding to A,. Thus we take

cele 1 vee
To find the coordinate vector d of x, relative to the basis (v,, v,). we observe that
= Cd. We find that

cia, [Vot} A
4V5l1-V5 2)
thus,

aj- om= fo] avait V5}


on [CRESS [a
Equation | takes the form

Cavs 2)[vai a} @
For large k, the kth power of the eigenvalue A, = (1 + VV5)/2 dominates, so A‘x,
is approximately equal to the shaded portion of Eq. (2). Computing the second
component of A*x,, we find that

n=val(-z) -(a)}
Lyla Vsye (L- V5
(3)
320 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

Thus,

1 lt V5\e
F, = val a} for large k. (4)

Indeed, because jA,| = |(1 — /5)/2| < 1, we see that the contribution from this
second eigenvalue io the right-hand side of Eq. (3) approaches zero as k
increases. Because ]A,‘/5| < 5 for k = 1 and hence for all F, we see that F, can
be characterized as the closest integer to (1/\/5)((1 + 5)/2)* for all k,
Approximation (4) verifies that F, increases exponentially with k, as expected
for a population of rabbits. #

Example 2 is a typical analysis of a process in which an information vector


after the kth stage is equal to A*x, for an initial information vector x, and a
diagonalizable matrix A. An important consideration is whether any of the
eigenvalues of A have magnitude greater than 1. When this is the case, the
components of the information vector may grow exponentially in magnitude,
as illustrated by the Fibonacci sequence in Example 2, where |A,| > 1. On the
other hand, if all the eigenvalues have magnitude less than 1, the components
of the information vector must approach zero as k increases.
The process just described is called unstable if A has an eigenvalue of
magnitude greater than 1, stable if all eigenvalues have magnitude less than 1,
and neutrally stable if the maximum magnitude of the eigenvalues is 1. Thusa
Markov chain is a neutrally stable process, whereas generation of the Fibonac-
ci sequence is unstable. The eigenvectors are called the normal modes of the
process.
In the type of process just described, we study information at evenly
spaced time intervals. If we study such a process as ihe number of time
intervals increases and their duration approaches zero, we find ourselves in
calculus. Eigenvalues and eigenvectors play an important role in applications
of calculus, especially in studying any sort of vibration. In these anplications
of calculus, components of vectors are functions of time. Our second applica-
tion illustrates this.

Application: Systems of Linear Differentia! Equations


In calculus, we see the importance of the differential equation
dx rie kx

in simple rate of growth problems involving the time derivative of the amount
X present of a single quantity. We may also write this equation as x’ =
where we understand that x is a function of the time variable ¢, In mor
complex growth situations, m quantities may be present in amounts X,, x, --*’
x,. The rate of change of x, may depend not only on the amount of x; present,
but also on the amounts of the other n — 1 quantities present at time /. we
5.3 TWO APPLICATIONS 321

consider a situatiun in which the rate of growth of each x, depends linearly on


the amounts present of the n quantitics. This leads to a system of linear
differential equations

X) = AyX, + Ax, + +++ + aK,


X= AyX, + AyX, + +++ + a,x,

(5)

xX, = 4X, + Qn2X7 teee t+ Qn Xns

where each x, is a differentiable function of the real variable t and each a, is a


scalar. The simplest such system is the single differential equation
x' = ax, (6)
which has the general solution
x(t) = ke", (7)
where k is a scalar. (See Example 8 on page 298.) Direct computation verifies
that function (7) is the general solution of Eq. (6).
Turning to the solution of system (5), we write it in matrix form as
x' = AX, (8)

wilere

x,(t) oy
X(t) x,(d)
x=| .- |, x’=] . |, and A = [a,).

x,(0) x(t)
If the matrix A is a diagonal matrix so that a, = 0 for i # j, then system (8)
reduces to a system of nm equations, each like Eq. (6), namely:

X, = aX
X_ = AyX;

(9)

Xq = AngXy-
The general solution is given by
x,] [ke
X,| | k,e%22!
x=|-]=

Xn k eon!
322 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

In the general case, we try to diagonalize A and reduce system (8) to a


system like (9). If A is diagonalizable, we have

A,
D = CAC= ()
") n

for some invertible n X n matrix C. If we substitute A = CDC~', Eq. (8) takes


the form C!x’ = D(C~'x), or

y' = Dy, (10)


where

x = Cy. (11)
(If we let x = Cy, we can confirm that x’ = Cy’.) The genera! solution of system
(10) is

where the A, are eigenvalues of the diagonalizable matrix A. The general


solution x of system (8) is then obtained from Eq. (11), using the correspond
ing eigenvectors v, of A as columns of the matrix C.
We emphasize that we have described the general solution of system (8)
only in the case where the matrix A is diagonalizable. The algebraic multipli-
city of each eigenvalue of A must equal its geometric multiplicity. The
following example illustrates that the eigenvalues need not be distinct, as long
as their algebraic and geometric multiplicities are the same.

EXAMPLE 3 Selve the linear differential system

X= X- XX;
X, = -X, +X- X;
x; = —X, — X, + X;.

SOLUTION The first step is to diagonalize the matrix


1-1 -1
A={-1l 1 -l}.
-l -1 1
9.3 TWO APPLICATIONS 323

Expanding the determinant |A — A/] across the first row, we obtain


l1-A -l —|
|A-AN=| -1 1-aA -l
—-1] -1 I-A
-_ 7, fl-a -1 | |-l -l | f-bi-a
“0 a -1 pt a - i"
= (1 — AA? — 2A) + 2(A - 2)
= (A — 2X(1 — AA + 2)
= -(A+ IYa
- 2/7.
This yields eigenvalues A, = —1 and A, = A, = 2.
Next we compute eigenvectors. For A, = —1, we have
2-1-1! {1 1-2
A+J=|-1 2 -1l]/~/0 2 -3},
-l1 -1 2 00 0

whicn gives the eigenvector

which gives the independent eigenvectors

-] -]
v,=| I] and v,=| OJ.
0 l
A diagonalizing matnx is then
1-1-1
C=]1 1 O|,
1 0 1

so that
-1 0 0
D=C'AC=| 0 2 O|.
0 0 2

Setting x = Cy, we see that the system becomes


yan
y, = 2y,
V3 = 23,
324 CHAPTER 5 EIGENVALUES AND EIGENVECTORS

whose solution is given by

Ni ke"!
Yy| = | ke
V3 ke "|

Tuerefore, the solution tu the original system 1s


x, 1 -1 -1][y,| [Aye’ — ke* — ke”
%J=}1 1 Of ly} =lket + ket ,
x} |1 0 I} | ke"! + k,e* a

In Section 9.4, we show that every square matrix A is similar to a mairix


J in Jordan canonical form, so that J = C~'AC. The Jordan canonical form
J and the invertible matrix C may have complex entries. In the Jordan
canonical form J, ail nondiagonal entries are zero except for some entries |
immediately above the diagonal entries. If the Jordan form J = CAC
is known, the substitution x = Cy again reduces a system x’ = Ax of
linear differential equations to one that can be solved easily for y. The
solution of the original system is computed as x = Cy, just as was the solu-
tion in Example 3. This is an extremely important technique in the study of
differential equations.

~~ |

"| SUMMARY
1. Let A be diagonalizable by a matrix C, let x be any column vector, and let
d = C''x. Then
A‘x = dA,*y, + d,A,*v, treet dA. V 3

where ¥, is the jth column vector of C, as described on page 318.


2. Let multiplication of a column information vector by A give the informa-
tion vector for the next stage of a process, as described on page 320. The
process is stable, eutrally stable, or unstable, according as the maximum
magnitude of the eigenvalues of A is less than 1, equal to |, or greater that
1, respectively.
3. The system x’ = Ax of linear differential equations can be solved, if Ais
diagonalizable, using the following three steps.
Step I. Find a matrix C so that D = C~'AC is a diagonal matrix.
Step 2. Solve the simpler diagonal system y’ = Dy.
Step 3. The solution of the original system is x = Cy.
5.3 TWO APPLICATIONS 325

EXERCISES
y. Let the sequence a, a, a),... be given by In Exercises 6-13, solve the given system of linear
a = 9, a, = I, and a, = (a,_, + aj-,)/2 for differential equations as outlined in the summary.
k2 2.
6. x = 3x, _~ 5X;
a. Find the matnx A that can be used to x, = 2X,
generate this sequence as we used a
matrix to generate the Fibonacci xi= xX, + 4,
sequence in Section 5.1. x = 3x,

b. Classify this generation process as stable,


neutrally stable, or unstable. Xp=2x,+
1
c Compute expression (1) fozx = | a, | for xt = 2x, + 2x,
Ay xX} = Xx, + 3x,
this process. Check computations with the
first few terms of the sequence. 10. x} = 6x, + 3x; 7” 3x;

d. Use the answer to part (c) to estimate a, x} = —2x;, _ Xy + 2x;

x; = 16x, + 8x, - 7x,


for large k.
. Repeat Exercise | if a, = a,_, — (is a,-2 for 11. x, = —3x, + 10x, — 6x;
x3 = 1X, - 6x;
k> 2.
x= xy
. Repeat Exercise 1, but change the initial
aod

data to a) = 1, a,=0 12. xX = — 3x, + 3X; - 20x,


X,= 2x, + 8x,
. Repeat Exercise | if a, = (;}a- + (rs}av-s Xy= 2x, + x + Tx;
fork = 2.
13. xX; = —2x, - Xx;
. Repeat Exercise | if a, = a,_, + (z}a%-2 for xX} = 2X3
k= 2. x3= 3x, + 2x,
—————————.
=—
CHAPTER

6 ORTHOGONALITY

We are accustomed to working in R? with coordinates of vectors relative to the


standard ordered basis (e,, e,) = (i, j). These basis vectors are orthogonal
(perpendicular) ana have lengtn 1. The vectors in the standard ordered basis
(e,, €,,... , e,) of R” have these same two properties. It is precisely these two
properties of the standard bases that make computations using coordinates
relative to them quite easy. For example, computations of angles and of length
are easy when we use coordinates relative to standard bases. Throughout linear
algebra, computations using coordinates relative to bases consisting of orthog-
onal unit vectors are generally the easiest to perform, and they generate less
error in work with a computer.
This chapter is devoted to orthcgonality. Orthogonal projection, which is
the main tool for finding a basis of orthogonal vectors, is developed in Section
6.1. Section 6.2 shows that every finite-dimensional inner-product space has a
basis of orthogonal unit vectors, and shows how to construct such a basis from
any given basis. Section 6.3 deals with orthogonal matrices and orthogonal
linear transformations. In Section 6.4 we show that orthogonal projection can
be achieved by matrix multiplication. To conclude the chapter, we give
applications to the method of least squares and overdetermined linear systems
in Section 6.5.

|e: PROJECTIONS

The Projection of b on sp(a)


For convenience, we develop the ideas in this section for the vector space R’;
using the dot product. The work is valid in general finite-dimensional
inner-product spaces, using the more cumbersome notation (u, v) for the innef
product of u and v. An illustration in function spaces, using an inner product
defined by an integral, appears at the end of this section.

326
6.1 PROJECTIONS 327

LUSTRATION 1 A practical concern in vector applications involves determining what portion


of a vector b can be considered to act in the direction given by another vector a.
A force vector acting in a certain direction may be moving a body along a line
having a different direction. For example, suppose that you are trying to roll
your stalled car off the road by pushing on the door jamb at the side, so you can
reach in and control the steering wheel when necessary. You are not applying
the force in precisely the direction in which the car moves, as you would be if
you could push from directly behind the car. Such considerations lead the
physicist to consider the projection p of the force vector F ona direction vector
a, as shown in Figure 6.1. In Figure 6.1, the vector p is found by dropping a
perpendicular from the tip of F to the vector a. =

Figure 6.2 shows the situation where the projection of F on a has a


direction opposite to the direction of a; in terms of our car in {Illustration 1,
you would be applying the force to move the car backward rather than
forward. Figure 6.2 suggests that it is preferable to speak of projection on the
subspace sp(a) (which in this example is a line) than to speak of projection on a.
We derive geometrically a formula for the projection p of the force vector F
on sp(a), based on Figures 6.1 and 6.2. We see that p is a multiple of a. Now
(1/jal|)a is a unit vector having the same direction as a, so p is a scalar multiple
of this unit vector. We need only find the appropriate scalar. Referring to
Figure 6.1, we see that the appropriate scalar is ||F|| cos 0, because it is the
length of the leg labeled p of the right triangle. This same formula also gives the
appropriate negative scalar for the case shown in Figure 6.2, because cos @ 1s
negative for this angle @ lying between 90° and 180°. Thus we obtain

p= [IF cos @ | _ |IFilllall cos @) = F-a,


lal {all tall aca
Of course, we assume that a # 0. We replace our force vector F by a general
vector b and box this formula.

Projection p of b on sp(a) in R’
=”

re (1)
<)

FIGURE 6.1 FIGURE 6.2


Projection p of F ona. Projection p of F on sp(a).
328 CHAPTER 6 ORTHOGONALITY

EXAMPLE 1 Find the projection p of the vector [1, 2, 3] on sp({2, 4, 3]) in R®.
SOLUTION We let a = [2, 4, 3] and b = [1, 2, 3] in formula (1), obtaining

_b-a,_ 2+8+9 19
Pr aa? 44 16t9%
2944 3h
The Concept of Projection

We now explain what is meant by the projection of a vector b in R’ on a general


subspace W of R’. We will show that there are unique vectors by, and by,» such
that
I. b, is in the subspace W;
2. by» is orthogonal to every vector in W; and
3. b = by + by.

Let W* be the set of all vectors in R’ that are perpendicular to every vector in
W. Properties of W* will appear shortly in Theorem 6.1. Figure 6.3 gives a
symbolic illustration of this decomposition of b into a sum of a vector in W
and a vector orthogonal to W. Once we have demonstrated the existence of
this decomposition, we will define the projection ofb on W to be the vector by.
The projection b,, of b on W is the vector w in W that is closest to b. That
is, W = by minimizes the distance ||b — w|| from b to W for all w in W. This
seems reasonable, because we have b = by + by» and because b,, is
orthogonal to every vector in W. We can demonstrate algebraically that for any
w € W, we have ||b — wil = ||b — by/|. We work with ||b — wil? so we can use the
dot product. Because the dot product of any vector in Wand any vector in W*
is 0, we obtain, for all w € W,

[|b — wi’ = (b — w) + (b — w)
= ((b — by) + (by— w)) - ((b — by) + (by~ w))
= (b — by) + (b — by) + 2(b — by) + (by — w) + (by — W) + (by — ¥)
in W* in W
= |[b — by? + [lbw — wil = |[b — byl’.

FIGURE 6.3
The decomposition b = b, + by:
6.1 PROJECTIONS 329

Of course, this mi imum distance from b to W is just ||b,,||, as indicated by


Figure 6.3.

ILLUSTRATION 2 ‘Suppose that you are pushing a box across a floor by pushing forward and
downward on the top edge. (See Figure 6.4.) We take as origin the point where
the force is applied to the box and as W the plane through that origin paralic!
to the floor. If F is the force vector applied at the origin, then F,,is the portion
of the force vector that actually moves the body along the floor, and Fy:
(which is directed straight down) is the portion of the force vector that
attempts to push the box into the floor and thereby increases the friction
between the box and the floor. =

Orthegonal Complement of a Subspace


As a preliminary to proving the existence and uniqueness of the vectors b,,and
by: just described, we consider all vectors in R” that are orthogonal to every
vector in a Subspace W.

DEFINITION 6.1 Orthogonal Complement

Let W be a subspace of R’. The set of all vectors in R’ that are


orthogonal to every vector in W is the orthogonal complement of VW,
and 1s denoted by +.

FIGURE 6.4
The decomposition of a force vector.
330 CHAPTER 6 ORTHOGONALITY

It is not difficult to find the orthogonal complement of W if we know a


generating set for W. Let {v,, v,, . . . , v,} be a generating set for W. Let A be the
k X n matrix having vy, as its ith row vector. That is,

A =
vy

.
| j

Vy |
Thus, W is the row space of A. Now the nullspace of A consists of all vectors x
in R* that are solutions of the homogeneous system Ax = 0. But Ax = 0 if and
only ifv,-x =Ofori=1,2,...,. Therefore, the nullspace of A is the set of
all vectors x in R" that are orthogonal to each of the rows of A, and hence to the
row space of A. In other words, the orthogonal complement of the row space of
A is the nullspace of A. Thus we have found W+. We summarize this procedure
in a box. Notice that this procedure marks one of the rare occasions in this text
when vectors must be placed as rows of a matrix.

Finding the Orthogonal Complement of a Subspace W of IR’


1. Find a matrix A having as row vectors a generating set for W.
2. Find the nullspace of A—that is, the solution space of Ax = 0. This
nullspace is H+.

EXAMPLE 2 Find a basis for the orthogonal complement in R* of the subspace

W = sp({I, 2, 2, 1), (3, 4, 2, 3)).


SOLUTION We find the nullspace of the matrix

a= [}22 4
Reducing A, we have

|; 22 I [ 2 2 \~(¢ 0 -2 ,
342 3 0-2 -4 0 0 1 2 OF
Therefore, the nullspace of A, which is the orthogonal complement of VW, is the
set of vectors of the form
[2r — s, —2r, r,s] for any scalars r and s.

Thus {[2, —2, 1, 0], [-1, 0, 0, 1]} is a basis for W+.


6.1 PROJECTIONS 331

ILLUSTRATION 3 Note that if Wis the (m — 1}-dimensional solution space in R” of = single linear
equation a,x, + a,x, + +--+ + a,x, = 0, then the vector [a,, a),..-., 4,] of
coefficients is orthogonal to W, because the equation can be written as

[2,, @,..., @,] > (%1,% --- > %) = 0.

Thus W* = sp((@,, a), . . . , a,]). The subspace Wis a line if m = 2, a planeifn =


3, and is cailed a Ayperplane for other values of 7.) =

We now show that the orthogonal complement of a subspace W has some


very nice properties. In particular, we exhibit the decomposition b = by + by
described earlier and show that it is unique.

THEOREM 6.1 Propertiesof W2

The orthogonal complement W+ of a subspace W of R” has the


following properties:
1. W* is a subspace of R*.
2. dim(W*) = n — dim(W).
3. (W’)}' = W, that is, the orthogonal complement of W is W.
4. Each vector b in R* can be expressed uniquely in the form
b = by + by,» for by in W and by» in W*.

PROOF We may assume W # {0}. Let dim(W) = k, and let {v,, v,,..., v,} bea
basis for W. Let A be the & X m matrix having v, as its ith row vector for
i=1,...,k4
For property 1, we have seen that W" is the nullspace of the matrix A, so it
is a subspace of R’.
For property 2, consider the rank equation of A:

rank(A) + nullity(A) = 2.

Because dim(W) = rank(A) and because W” is the nullspace of A, we see that


dim(W*) = n — dim(W).
For property 3, applying property 2 with the subspace W+, we find that

dim(W2): = n — dim(W*) =n - (n-K=k.

However, every vector in W is orthogonal to every vector in W+, so W


is a subspace of (W+)*. Because both W and (W*)+ have the same dimension
k, we conclude that W must be equal to (W“)+. (See Exercise 38 in Sec-
tion 2.1.)
332 CHAPTER 6 ORTHOGONALITY

For property 4, let {v..;, Via2, . - - > ¥,¢ be a basis for W+. We claim that the
set

{V1 Vp, -- +) Val (2)

is a basis for R*. Consider a relation

TV, TV, tov FY H SeeiVeer + SeeMeer to + SV, = 0.


Rewrite this relation as

TV, HOV, Fo FM = SpaVeat 7 See2Ven2 7 OT SAV (3)

The sum on the left-hand side is in W, ana the sum on the right-hand side is in
W". Because these sums are equal, they represent a vector that is 11 both W
and W+ and must be orthogonal to itself. The only such vector is the zero
vector, so both sides of Eq. (3) musi equal 0. Because the v, are independent for
| =iskand the v,are independent for x + | =7 =n, we see that ail the scalars
r, and 5, are zero. This shows that set (2) is independent; because it contains n
vectors, it must be a basis for R*. Therefore, for every vector b in R" we can
express bh in the form

b= nv, try, tse tay, + SeeiVers + StaoYean toc + SaVne


wT

by bya

This shows that b can indeed be expressed as a sum of a vector in W anda


vector in W*, and this expression is unique because each vector in R’ is a
unique combination of the vectors in the basis {v,, v,,...,V,}. @

Projection of a Vector on a Subspace


Now that the groundwork is firmly established, we can define the projection of
a vector in R" on a subspace and then illustrate with some examples.

DEFINITION 6.2 Projection


of b onW

Let b be a vector in R’, and let W be a subspace of R”. Let


b = by + bys,
as described in Theorem 6.1. Then b,, is the projection of b
on W.

Theorem 6.1 shows that the projection of b on W is unique. It also shows


one way in which it can be computed.
6.1 PROJECTIONS 333

Steps to-Find the Projecticn of b on W


1. Select a basis {v,, v,,..., v,} for the subspace W. (Often this is
given.)
2. Find a basis {¥,,,, Vin2, .. . » Vat for W*, as in Example. 2.
3. Find the coordinate vector r = [r,, fx - - - » %) Of b relative to the
basis (¥,,.¥,- ..., V,).so that b = 7,V, + mv, + +++ + 7,v,. (See the
box on page 207.)
4. Then
by = "7,yj + ry, + °° ° + 7,¥,

EXAMPLE 3 Fird the projection of b = [2, 1, 5}on the subspace 'V = sp((I, 2, 1], (2, 1, - 1)).
SOLUTION We follow the boxed procedure.
Step 1: Because v, = [1, 2, \] and v, = (2, 1, —1] are independent, they form
a basis for W.
Step 2: A basis for W* can be found by obtaining the nullspace of the
matrix
_{l 2 1
a=() ; 1}
An echelon form of A is
1 2 1
0 -3 -3P
and the nuilspace of A is the set of vectors [7, -7,r], where ris any scalar. I et us
take v, = [1, —1, 1] to form the basis {v,} of W*. (Alternatively, we could havc
computed v, X v, to find a suitable v;.)
Step 3: To find the coordinate vector r of b relative to the ordered
basis (v,, ¥,, ¥;), we proceed as described in Section 3.3, and perform the
reduction
1 2 1/2) f1 2 1] 2] ft 0 1] 4]
2 i -l ~|0 -3 -3 | -3}~]O 1 Oj -1
(1-1 1|5{ [0-3 O| 3} [0 0 -3 | -6
vy. Vv ¥; +b
1 0 0 2
~10 1 0 —1f.
0 0 1 2
Thus, r = (2, -1, 2].
Step 4: The projection of b on Wis
by = 2v, — v, = 2[1, 2, 1] — [2, 1, -1] = [0, 3, 3}.
As acheck, notice that 2v,; = [2, —2, 2] 1s the projection of b on W4, and b =
by + by = [0, 3, 3] + [2, —2, 2] = (2, 1, 5]. 7
334 CHAPTER 6 ORTHOGONALITY

Our next example shows that the procedure described in the box preceding
Example 3 yields formula (1) for the projection of a vector b on sp(a).

EXAMPLE 4  Let a ≠ 0 and b be vectors in R^n. Find the projection of b on sp(a), using the
same boxed procedure we have been applying.
SOLUTION  We project b on the subspace W = sp(a) of R^n. Let {v_2, v_3, ..., v_n} be a basis for
W⊥, and let r = [r_1, r_2, ..., r_n] be the coordinate vector of b relative to the
ordered basis (a, v_2, ..., v_n). Then

    b = r_1 a + r_2 v_2 + ··· + r_n v_n.

Because v_i · a = 0 for i = 2, ..., n, we see that b · a = r_1 a · a = r_1(a · a). Because
a ≠ 0, we can write

    r_1 = (b · a)/(a · a).

The projection of b on sp(a) is then r_1 a = ((b · a)/(a · a)) a.

We have seen that if W is a one-dimensional subspace of R^n, then we can
readily find the projection b_W of b ∈ R^n on W. We simply use formula (1). On
the other hand, if W⊥ is one-dimensional, we can use formula (1) to find the
projection of b on W⊥, and then find b_W using the relation b_W = b − b_{W⊥}. Our
next example illustrates this.

EXAMPLE 5  Find the projection of the vector [3, −1, 2] on the plane x + y + z = 0 through
the origin in R^3.
SOLUTION  Let W be the subspace of R^3 given by the plane x + y + z = 0. Then W⊥ is
one-dimensional, and a generating vector for W⊥ is a = [1, 1, 1], obtained by
taking the coefficients of x, y, and z in this equation. (See Illustration 3.) Let
b = [3, −1, 2]. By formula (1), we have

    b_{W⊥} = ((b · a)/(a · a)) a = ((3 − 1 + 2)/3)[1, 1, 1] = (4/3)[1, 1, 1].

Thus b_W = b − b_{W⊥} = [3, −1, 2] − (4/3)[1, 1, 1] = [5/3, −7/3, 2/3].
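A quick machine check of Example 5 along these lines is shown below; it is a sketch using ordinary MATLAB arithmetic, and the variable names are our own.

    % Project b = [3, -1, 2] on the plane x + y + z = 0 by first projecting on W-perp = sp(a).
    a   = [1; 1; 1];                      % normal vector to the plane, spans W-perp
    b   = [3; -1; 2];
    bWp = (dot(b, a) / dot(a, a)) * a;    % projection of b on W-perp, formula (1)
    bW  = b - bWp                         % projection on the plane: [5/3; -7/3; 2/3]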

Projections in Inner-Product Spaces (Optional)

Everything we have done in this section is equally valid for any finite-
dimensional inner-product space. For example, formula (1) for projecting one
vector on a one-dimensional subspace takes the form

    b_{sp(a)} = (⟨b, a⟩/⟨a, a⟩) a.                                            (4)

The fact that the projection b_W of a vector b on a subspace W is the vector w in
W that minimizes ||b − w|| is useful in many contexts.
    For an example involving function spaces, suppose that f is a complicated
function and that W consists of functions that are easily handled, such as
polynomials or trigonometric functions. Using a suitable inner product and
projection, we find that the function f_W in W becomes a best approximation to
the function f by functions in W. Example 6 illustrates this by approximating
the function f(x) = x over the interval 0 ≤ x ≤ 1 by a function in the space W
of all constant functions on this interval. Visualizing their graphs, we are not
surprised that the constant function p(x) = 1/2 turns out to be the best
approximation to x in W, using the inner product we defined on function
spaces in Section 3.5. In Section 6.5, we will use this minimization feature
again to find a best approximate solution of an inconsistent overdetermined
linear system.

EXAMPLE 6  Let the inner product of two polynomials p(x) and q(x) in the space P of
polynomial functions with domain 0 ≤ x ≤ 1 be defined by

    ⟨p(x), q(x)⟩ = ∫_0^1 p(x)q(x) dx.

(See Example 3 in Section 3.5.) Find the projection of f(x) = x on sp(1), using
formula (4). Then find the projection of x on sp(1)⊥.
SOLUTION  By formula (4), the projection of x on sp(1) is

    (⟨x, 1⟩/⟨1, 1⟩)·1 = (∫_0^1 x dx / ∫_0^1 1 dx)·1 = (1/2)/1 = 1/2.

The projection of x on sp(1)⊥ is obtained by subtracting the projection on sp(1)
from x. We obtain x − 1/2. As a check, we should then have ⟨1, x − 1/2⟩ = 0, and a
computation of ∫_0^1 (x − 1/2) dx shows that this is indeed so.
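Example 6 can also be checked numerically. The sketch below uses MATLAB's numerical integrator to form the inner products; the function handles f and g are names of our own choosing, not anything built into the software.

    % Numerical check of Example 6: project f(x) = x on sp(1) in the inner product
    % <p, q> = integral from 0 to 1 of p(x) q(x) dx.
    f = @(x) x;  g = @(x) ones(size(x));
    c = integral(@(x) f(x).*g(x), 0, 1) / integral(@(x) g(x).*g(x), 0, 1);   % c = 1/2
    check = integral(@(x) (f(x) - c).*g(x), 0, 1)                            % essentially 0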

"| SUMMARY
1. The projection of b in R’ on sp(a) for a nonzero vector a in R’ is given by
((b - a)/(a - a))a.
2. The orthogonal complement W* of a subspace W of R’ is the set cf all
vectors in R” that are orthogonal to every vector in W. Further, W is a
subspace of R" of dimension n — dim(W), and (W*)' = W
3. The row space and the nullspace of an m X n matrix A are orthogonal
complements of each other. In particular, W* can be computed as the
nullspace of a matrix A having as its row vectors the vectors in a generating
set for W.
4. Let Wbea subspace of R’. Each vector b in R’ can be expressed uniquely in
the form b = b, + by for by in W and by» in M4,
S. The vectors by and by: are the projections of b on W and on H%,
respectively. They can be computed by means of the boxed procedure on
page 333.

EXERCISES

In Exercises 1-6, find the indicated projection. 18. The projection of [-1, 0, 1] on the plane
xt+y=0inR
1. The projection of (2, 1] on sp([3, 4]) in i 19. The projection of [0, 0, 1] on the plane
2. The projection of [3, 4] on sp({2, 1]) in R’ 2x-y-z=0inR
3. The projection of [1, 2, 1] on each of the 20. The projection in R* of [—2, 1, 3, —5] on
unit coordinate vectors in R? a. the subspace sp(e,)
4. The projection of [1, 2, 1] on the line with b. the subspace sp(e,, 4)
c. the subsnace sp(e,, €3, €4)
parametric equations x = 3, y= ,z = 21
in R? d. RY
21. The projection of [1, 0, —1, i] on the
5. The projection of [—1, 2, 0, 1] on
subspace sp([!, 0, 0, 0], [0, 1, 1, OJ,
sp((2, —3, 1, 2]) in R*
(0, 0, 1, 1]) in R*
o. The projection of [2, —1, 3, —5] on che line
22. The projection of [0, 1, —1, 0} on the
sp({1, 6, —1, 2)} in R* subspace (hyperplane) x, — x, + x; + x, =0
in & (Hint: See Example 5.]
In Exercises 7~12, find the orthogunal
23. Assume that a, b, and c are vectors in R” and
complement of the given subspace. that W is a subspace of R*. Mark each of the
following True or False.
7. The subspace sp([!, 2, —1]) in R? ——.a. The projection of b on sp(a) is a scalar
8. The line sp((2, —1, 0, —3]) in R* multiple of b.
9. The subspace sp({1, 3, 0], (2, 1, 4]) in R? ___ b. The projection of b on sp(a) is a scalar
multiple of a.
10. The plane 2x + y + 3z = 0in R?
—c. The set of all vectors in R’ orthogonal to
11. The subspace sp{[2, 1, 3, 4], [1, 0, —2, 1]) in every vector in W is a subspace of R’.
R4
___ d. The vector w G W that minimizes
12, The subspace (hyperplane) ax, + bx, + cx, llc — w|| is cy.
+ dx, = 0 in R* (Hint: See Illustration 3.] ___ e. If the projection of b on W is b itself,
13. Find a nonzero vector in R? perpendicular to then b is orthogonal to every vector in W.
[1, 1, 2] and (2, 3, 1] by _— f. Ifthe projection of b on W is b itself,
a. the methods of tiis section, then b is in W.
b. computing a determinant. ___ g. The vector b is orthogonal to every
vector in W if and only if b, = 0.
14. Find a nonzero vector in R‘ perpendicular to _—h. The intersection of W and W* is empty.
{1, 0, ~1, 1), (, 0, —1, 2], and (2, -1, 2, 0] __ i. If b and c have the same projection on
by W, then b = c.
a. the methods of this section,
—— j. Ifb and c have the same projection on
b. computing a determinant. every subspace of R”, then b = c.
24. Let a and b be nonzero vectors in R’, and let
In Exercises 15-22, find the indicated projection. 6 be the angle between a and b. The scalar
||b|] cos @ is called the scalar component of b
15. The projection of [1, 2, 1] on the subspace along a. Interpret this scalar graphically (se¢
sp((3, 1, 2], [1, 0, 1]) in R? Figures 6.1 and 6.2), and give a formula for
16. The projection of [1, 2, |] on the plane it in terms of the dot product.
x+y+z=0i0® 25. Let W be a subspace of R" and let b be a
17. The projection of [1, 0, 0] on the subspace vector in R". Prove that there is one and
sp((2, 1, 1], (1, 0, 2]) in R? only one vector p in W such that b — pis

perpendicular to every vector in W. (Hint: 30. Find the distance from the point (2, 1, 3, 1)
Suppose that p, and p, are two such vectors, in R‘ to the plane sp({1, 0, 1, 0],
aod show that p, — p, is in W*+.] fl, —1, 1, 1]). (Hint: See Exercise 29.]
26. Let A be an m × n matrix.
a. Prove that the set W of row vectors x in
RR” such that xA = 0 is a subspace of R”.
p. Prove that the subspace W in part (a) In Exercises 31-36, use the idea in Exercise 29 to
and the column space of A are orthogonal find the distance from the tip of a to the giver:
complements. one-dimensional subspace (line). [NoTE: To
calculate |lay-||, first calculate ||ay\| and then use
27. Subspaces U and W of R^n are orthogonal if
Exercise 28.}
u-w = 0 for all uin U and all win W. Let
U and W be orthogonal subspaces of R", and
let dim(U) = n — dim(W). Prove that each 31. a=[l, 2, 1],
subspace is the orthogonal complement of W = sp({2, 1, 0]) in &
the other.
3z. a={2,—-1, 3],
98. Let W be a subspace of R’ with orthogonal
W = sp({I,2, 4]) in R?
complement W". Wniting a = a, + a,/1, as
in Theorem 6.1, prove that 33. a = [I, 2, -1, 0),

W = sp((3, 1, 4, —1)) in R*
||a|| = √(||a_W||² + ||a_{W⊥}||²).
(Hint: Use the formula |lal\? = a - a.] 34. a= (2, 1, 1, 2],
29. (Distance from a point to a subspace) Let W W = sp((I,2, 1, 3]) in R*
be a subspace of R". Figure 6.5 suggcsts that 35. a=([l, 2, 3, 4, 5],
the distance from the tip of a in R" to the
W = sp(fl, 1, 1, 1, 1)) in R?
subspace W is equal to the magnitude of the
projection of the vector a on the orthogonal 36. a=[I,0,1,0, 1,0, 1),
complement of W. Find the distance from W = sp({i, 2, 3, 4, 3, 2, 1) in R?
the point (1, 2, 3) in R? to the subspace
(plane) sp([2, 2, 1], [1, 2, 1)).

Exercises 37-39 involve inner-product spaces


discussed in optional Section 3.5.

37. Referring to Example 6, find the projection
    of f(x) = 1 on sp(x) in P.
38. Referring to Example 6, find the projection
    of f(x) = x on sp(1 + x).
39. Let S and T be nonempty subsets of an
    inner-product space V with the property that
    every vector in S is orthogonal to every
    vector in T. Prove that the span of S and the
    span of T are orthogonal subspaces of V.
40. Work with Topic 3 of the routine
    VECTGRPH in LINTEK until you are able
    to get a score of at least 80% most of the
    time.

[FIGURE 6.5  The distance from a to W is ||a_{W⊥}||.]

6.2 THE GRAM-SCHMIDT PROCESS

In the preceding section, we saw how to project a vector b on a subspace W
of R^n. The calculations can be somewhat tedious. We open this section
by observing that, if we know a basis for W consisting of mutually perpen-
dicular vectors, the computational burden can be eased. We then present
the Gram-Schmidt algorithm, showing how such a nice basis for W can be
found.

Orthogonal and Orthonormal Bases

A set {v_1, v_2, ..., v_k} of nonzero vectors in R^n is orthogonal if the vectors v_i
are mutually perpendicular—that is, if v_i · v_j = 0 for i ≠ j. Our next theorem
shows that an orthogonal generating set for a subspace of R^n is sure to be
a basis.

THEOREM 6.2  Orthogonal Bases

Let {v_1, v_2, ..., v_k} be an orthogonal set of nonzero vectors in R^n. Then
this set is independent and consequently is a basis for the subspace
sp(v_1, v_2, ..., v_k).

PROOF  To show that the orthogonal set {v_1, v_2, ..., v_k} is independent, let us
suppose that

    v_j = s_1 v_1 + s_2 v_2 + ··· + s_{j−1} v_{j−1}.

Taking the dot product of both sides of this equation with v_j yields v_j · v_j = 0,
which contradicts the hypothesis that v_j ≠ 0. Thus, no v_j is a linear combination
of its predecessors, so {v_1, v_2, ..., v_k} is independent and thus is a basis for
sp(v_1, v_2, ..., v_k). (See Exercise 37 on page 203.)

EXAMPLE 1  Find an orthogonal basis for the plane 2x − y + z = 0 in R^3.

SOLUTION  The given plane contains the origin and hence is a subspace of R^3. We need
only find two perpendicular vectors v_1 and v_2 in this plane. Letting y = 0 and
z = 2, we find that x = −1 in the given equation, so v_1 = [−1, 0, 2] lies in the
plane. Because the vector [2, −1, 1] of coefficients is perpendicular to the
plane, we compute a cross product, and let

    v_2 = [−1, 0, 2] × [2, −1, 1] = [2, 5, 1].

This vector is perpendicular to the coefficient vector [2, −1, 1], so it lies in the
plane; and of course, it is also perpendicular to the vector [−1, 0, 2]. Thus
{v_1, v_2} is an orthogonal basis for the plane.

Now we show how easy it is to project a vector b on a subspace W of R^n if
we know an orthogonal basis {v_1, v_2, ..., v_k} for W. Recall from Section 6.1
that

    b = b_W + b_{W⊥},                                                        (1)

where b_W is the projection of b on W, and b_{W⊥} is the projection of b on W⊥.
Because b_W lies in W, we have

    b_W = r_1 v_1 + r_2 v_2 + ··· + r_k v_k                                  (2)

for some choice of scalars r_i. Computing the dot product of b with v_i and using
Eqs. (1) and (2), we have

    b · v_i = (b_W · v_i) + (b_{W⊥} · v_i)
            = (r_1 v_1 · v_i + r_2 v_2 · v_i + ··· + r_k v_k · v_i) + 0      [b_{W⊥} · v_i = 0 because v_i is in W]
            = r_i (v_i · v_i).                                               [v_j · v_i = 0 for j ≠ i]

Therefore, r_i = (b · v_i)/(v_i · v_i), so

    r_i v_i = ((b · v_i)/(v_i · v_i)) v_i,

which is just the projection of b on v_i. In other words, to project b on W, we
need only project b on each of the orthogonal basis vectors, and then add! We
summarize this in a theorem.

THEOREM 6.3  Projection Using an Orthogonal Basis

Let {v_1, v_2, ..., v_k} be an orthogonal basis for a subspace W of R^n, and
let b be any vector in R^n. The projection of b on W is

    b_W = ((b · v_1)/(v_1 · v_1)) v_1 + ((b · v_2)/(v_2 · v_2)) v_2 + ··· + ((b · v_k)/(v_k · v_k)) v_k.    (3)

EXAMPLE 2  Find the projection of b = [3, −2, 2] on the plane 2x − y + z = 0 in R^3.
SOLUTION  In Example 1, we found an orthogonal basis for the given plane, consisting of
the vectors v_1 = [−1, 0, 2] and v_2 = [2, 5, 1]. Thus, the plane may be expressed
as W = sp(v_1, v_2). Using Eq. (3), we have

    b_W = ((b · v_1)/(v_1 · v_1)) v_1 + ((b · v_2)/(v_2 · v_2)) v_2
        = (1/5)[−1, 0, 2] + (−2/30)[2, 5, 1] = (1/5)[−1, 0, 2] − (1/15)[2, 5, 1]
        = [−1/3, −1/3, 1/3].
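Formula (3) is especially convenient on a machine once an orthogonal basis is in hand. A minimal MATLAB sketch (with variable names of our own choosing) that reproduces Example 2:

    % Projection of b on W using the orthogonal basis {v1, v2} of Example 1, by formula (3).
    v1 = [-1; 0; 2];  v2 = [2; 5; 1];
    b  = [3; -2; 2];
    bW = (dot(b, v1)/dot(v1, v1))*v1 + (dot(b, v2)/dot(v2, v2))*v2   % [-1/3; -1/3; 1/3]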

It is sometimes desirable to normalize the vectors in an orthogonal basis,
converting each basis vector to one parallel to it but of unit length. The result
remains a basis for the same subspace. Notice that the standard basis in R^n
consists of such perpendicular unit vectors. Such bases are extremely useful and
merit a formal definition.

DEFINITION 6.3  Orthonormal Basis

Let W be a subspace of R^n. A basis {q_1, q_2, ..., q_k} for W is orthonormal
if
1. q_i · q_j = 0 for i ≠ j,        Mutually perpendicular
2. q_i · q_i = 1.                  Length 1

The standard basis for R^n is just one of many orthonormal bases for R^n if
n > 1. For example, any two perpendicular vectors v_1 and v_2 on the unit circle
(illustrated in Figure 6.6) form an orthonormal basis for R^2.
    For the projection of a vector on a subspace that has a known orthonormal
basis, Eq. (3) in Theorem 6.3 assumes a simpler form:

Projection of b on W with Orthonormal Basis {q_1, q_2, ..., q_k}

    b_W = (b · q_1)q_1 + (b · q_2)q_2 + ··· + (b · q_k)q_k.                  (4)

EXAMPLE 3  Find an orthonormal basis for W = sp(v_1, v_2, v_3) in R^4 if v_1 = [1, 1, 1, 1],
v_2 = [−1, 1, −1, 1], and v_3 = [1, −1, −1, 1]. Then find the projection of b =
[1, 2, 3, 4] on W.

[FIGURE 6.6  One of many orthonormal bases for R^2.]

SOLUTION  We see that v_1 · v_2 = v_1 · v_3 = v_2 · v_3 = 0. Because ||v_1|| = ||v_2|| = ||v_3|| = 2, let q_i = (1/2)v_i
for i = 1, 2, 3, to obtain an orthonormal basis {q_1, q_2, q_3} for W, so that

    q_1 = [1/2, 1/2, 1/2, 1/2],  q_2 = [−1/2, 1/2, −1/2, 1/2],  q_3 = [1/2, −1/2, −1/2, 1/2].

To find the projection of b = [1, 2, 3, 4] on W, we use Eq. (4) and obtain

    b_W = (b · q_1)q_1 + (b · q_2)q_2 + (b · q_3)q_3
        = 5q_1 + q_2 + 0q_3 = [2, 3, 2, 3].

The Gram-Schmidt Process

We now describe a computational technique for creating an orthonormal basis
from a given basis of a subspace W of R^n. The theorem that follows asserts the
existence of such a basis; its proof is constructive. That is, the proof shows how
an orthonormal basis can be constructed.

THEOREM 6.4  Orthonormal Basis (Gram-Schmidt) Theorem

Let W be a subspace of R^n, let {a_1, a_2, ..., a_k} be any basis for W, and
let

    W_j = sp(a_1, a_2, ..., a_j)   for j = 1, 2, ..., k.

Then there is an orthonormal basis {q_1, q_2, ..., q_k} for W such that

    W_j = sp(q_1, q_2, ..., q_j).

PROOF  Let v_1 = a_1. For j = 2, ..., k, let p_j be the projection of a_j on W_{j−1}, and
let v_j = a_j − p_j. That is, v_j is obtained by subtracting from a_j its projection on the
subspace generated by its predecessors. Figure 6.7 gives a symbolic illustra-
tion. The decomposition

    a_j = p_j + (a_j − p_j)

is the unique expression for a_j as the sum of the vector p_j in W_{j−1} and the vector
a_j − p_j in (W_{j−1})⊥, described in Theorem 6.1. Because a_j is in W_j and because p_j
is in W_{j−1}, which is itself contained in W_j, we see that v_j = a_j − p_j lies in the
subspace W_j. Now v_j is perpendicular to every vector in W_{j−1}. Consequently, v_j
is perpendicular to v_1, v_2, ..., v_{j−1}. We conclude that each vector in the set

    {v_1, v_2, ..., v_j}                                                     (5)

is perpendicular to each of its predecessors. Thus the set (5) of vectors consists
of j mutually perpendicular nonzero vectors in the j-dimensional subspace W_j,
and so the set constitutes an orthogonal basis for W_j. It follows that, if we set
q_i = (1/||v_i||)v_i for i = 1, 2, ..., j, then W_j = sp(q_1, q_2, ..., q_j). Taking j = k, we see
that

    {q_1, q_2, ..., q_k}

is an orthonormal basis for W.

[FIGURE 6.7  The vector v_j in the Gram-Schmidt construction.]

The proof of Theorem 6.4 was computational, providing us with a
technique for constructing an orthonormal basis for a subspace W of R^n. The
technique is known as the Gram-Schmidt process, and we have boxed it for
easy reference.

Gram-Schmidt Process

To find an orthonormal basis for a subspace W of R^n:
1. Find a basis {a_1, a_2, ..., a_k} for W.
2. Let v_1 = a_1. For j = 2, ..., k, compute in succession the vector v_j
   given by subtracting from a_j its projection on the subspace generat-
   ed by its predecessors.
3. The v_j so obtained form an orthogonal basis for W, and they may be
   normalized to yield an orthonormal basis.

When actually executing the Gram-Schmidt process, we project vectors
on subspaces, as described in step 2 in the box. We know that it is best to work
with an orthogonal or orthonormal basis for a subspace when projecting on it;
and because the subspace W_{j−1} = sp(a_1, a_2, ..., a_{j−1}) is also the subspace
generated by the orthogonal set {v_1, v_2, ..., v_{j−1}}, it is surely best to work with
the latter basis for W_{j−1} when computing the desired projection of a_j on W_{j−1}.
Step 2 in the box and Eq. (3) show that the specific formula for v_j is as follows:

General Gram-Schmidt Formula

    v_j = a_j − ( ((a_j · v_1)/(v_1 · v_1)) v_1 + ((a_j · v_2)/(v_2 · v_2)) v_2 + ··· + ((a_j · v_{j−1})/(v_{j−1} · v_{j−1})) v_{j−1} )    (6)

One may normalize the v_j, forming the vector q_j = (1/||v_j||)v_j, to obtain a vector
of length 1 at each step of the construction. In that case, formula (6) can be
replaced by the following simple form:

Normalized Gram-Schmidt Formula

    v_j = a_j − ((a_j · q_1)q_1 + (a_j · q_2)q_2 + ··· + (a_j · q_{j−1})q_{j−1})    (7)

The arithmetic using formula (6) and that using formula (7) are similar, but
formula (6) postpones the introduction of the radicals from normalizing until
the entire orthogonal basis is obtained. We shall use formula (6) in our work.
However, a computer will generate less error if it normalizes as it goes along.
This is indicated in the next section.

EXAMPLE 4  Find an orthonormal basis for the subspace W = sp([1, 0, 1], [1, 1, 1]) of R^3.
SOLUTION  We use the Gram-Schmidt process with formula (6), finding first an orthogo-
nal basis for W. We take v_1 = [1, 0, 1]. From formula (6) with v_1 = [1, 0, 1] and
a_2 = [1, 1, 1], we have

    v_2 = a_2 − ((a_2 · v_1)/(v_1 · v_1)) v_1 = [1, 1, 1] − (2/2)[1, 0, 1] = [0, 1, 0].

An orthogonal basis for W is {[1, 0, 1], [0, 1, 0]}, and an orthonormal basis is
{[1/√2, 0, 1/√2], [0, 1, 0]}.
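Formula (6) is also easy to program. The following MATLAB function is a sketch of one way to do it (the name gram_schmidt is ours, not a built-in routine); saved as gram_schmidt.m, it returns an orthonormal basis for the column space of a matrix with independent columns.

    function Q = gram_schmidt(A)
    % Columns of A are assumed independent; the columns of Q form an orthonormal
    % basis for the column space of A, computed by formula (6) and then normalized.
    [n, k] = size(A);
    V = zeros(n, k);                   % orthogonal (not yet normalized) vectors
    for j = 1:k
        v = A(:, j);
        for i = 1:j-1                  % subtract projections on the predecessors
            v = v - (dot(A(:, j), V(:, i)) / dot(V(:, i), V(:, i))) * V(:, i);
        end
        V(:, j) = v;
    end
    Q = V ./ vecnorm(V);               % normalize each column to length 1 (recent MATLAB releases)
    end

For instance, gram_schmidt([1 1; 0 1; 1 1]) reproduces the orthonormal basis of Example 4, with the vectors appearing as columns.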

Referring to the proof of Theorem 6.4, we see that the sets {v_1, v_2, ..., v_j}
and {q_1, q_2, ..., q_j} are both bases for the subspace W_j = sp(a_1, a_2, ..., a_j).
Consequently, the vector a_j can be expressed as a linear combination

    a_j = r_{1j} q_1 + r_{2j} q_2 + ··· + r_{jj} q_j.                        (8)

HISTORICAL NOTE  The Gram-Schmidt process is named for the Danish mathematician
Jorgen P. Gram (1850-1916) and the German Erhard Schmidt (1876-1959). It was first published
by Gram in 1883 in a paper entitled "Series Development Using the Method of Least Squares." It
was published again with a careful proof by Schmidt in 1907 in a work on integral equations. In
fact, Schmidt even referred to Gram's result. For Schmidt, as for Gram, the vectors were
continuous functions defined on an interval [a, b] with the inner product of two such functions
φ, ψ being given as ∫ φ(x)ψ(x) dx. Schmidt was more explicit than Gram, however, writing out the
process in great detail and proving that the set of functions derived from his original set was in
fact an orthonormal set.
    Schmidt, who was at the University of Berlin from 1917 until his death, is best known for his
definitive work on Hilbert spaces—spaces of square summable sequences of complex numbers. In
fact, he applied the Gram-Schmidt process to sets of vectors in these spaces to help develop
necessary and sufficient conditions for such sets to be linearly independent.

In particular, we see that a_1 = r_{11} q_1, a_2 = r_{12} q_1 + r_{22} q_2, and so on. These
equations arising from Eq. (8) for j = 1, 2, ..., k can be written in matrix
form as A = QR, where A is the n × k matrix with column vectors a_1, a_2, ..., a_k,
Q is the n × k matrix with column vectors q_1, q_2, ..., q_k, and R is the k × k
upper-triangular matrix whose entry in row i and column j is r_{ij} (with r_{ij} = 0 for
i > j). Because each a_j is in W_j but not in W_{j−1}, we see that no r_{jj} is zero, so R is
an invertible k × k matrix. This factorization A = QR is important in numerical
linear algebra; we state it as a corollary. We will find use for it in Sections 6.5
and 8.4.

COROLLARY 1  QR-Factorization

Let A be an n × k matrix with independent column vectors in R^n.
There exists an n × k matrix Q with orthonormal column vectors and
an upper-triangular invertible k × k matrix R such that A = QR.

EXAMPLE 5  Let

    A = [1  1]
        [0  1]
        [1  1].

Factor A in the form A = QR described in Corollary 1 of Theorem 6.4, using
the computations in Example 4.
SOLUTION  From Example 4, we see that we can take

    Q = [1/√2  0]
        [ 0    1]
        [1/√2  0]

and solve QR = A for the matrix R. That is, we solve the matrix equation

    [1/√2  0] [r_11  r_12]   [1  1]
    [ 0    1] [  0   r_22] = [0  1]
    [1/√2  0]                [1  1]

for the entries r_11, r_12, and r_22. This corresponds to two linear systems of three
equations each, but by inspection we see that r_11 = √2, r_12 = √2, and r_22 = 1.
Thus,

    A = [1  1]   [1/√2  0]
        [0  1] = [ 0    1] [√2  √2] = QR.
        [1  1]   [1/√2  0] [ 0   1]
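MATLAB computes QR-factorizations directly; the "economy size" call below returns the n × k matrix Q and the k × k matrix R of Corollary 1. A small check of Example 5 (a sketch; MATLAB may negate some columns of Q and the corresponding rows of R, which leaves the product QR unchanged):

    % QR-factorization of the matrix A of Example 5.
    A = [1 1; 0 1; 1 1];
    [Q, R] = qr(A, 0);        % economy-size factorization: Q is 3 x 2, R is 2 x 2 upper triangular
    norm(A - Q*R)             % essentially 0, confirming that A = QR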

We give another illustration of the Gram-Schmidt process, this time
requiring two applications of formula (6).

EXAMPLE 6  Find an orthonormal basis for the subspace

    W = sp([1, 2, 0, 2], [2, 1, 1, 1], [1, 0, 1, 1])

of R^4.
SOLUTION  First we find an orthogonal basis, using formula (6). We take v_1 = [1, 2, 0, 2]
and compute v_2 by subtracting from a_2 = [2, 1, 1, 1] its projection on v_1:

    v_2 = a_2 − ((a_2 · v_1)/(v_1 · v_1)) v_1 = [2, 1, 1, 1] − (6/9)[1, 2, 0, 2] = [4/3, −1/3, 1, −1/3].

To ease computations, we replace v_2 by the parallel vector 3v_2, which serves just
as well, obtaining v_2 = [4, −1, 3, −1]. Finally, we subtract from a_3 = [1, 0, 1, 1]
its projection on the subspace sp(v_1, v_2), obtaining

    v_3 = a_3 − ((a_3 · v_1)/(v_1 · v_1)) v_1 − ((a_3 · v_2)/(v_2 · v_2)) v_2
        = [1, 0, 1, 1] − (1/3)[1, 2, 0, 2] − (2/9)[4, −1, 3, −1]
        = [−2/9, −4/9, 1/3, 5/9].

Replacing v_3 by 9v_3, we see that

    {[1, 2, 0, 2], [4, −1, 3, −1], [−2, −4, 3, 5]}

is an orthogonal basis for W. Normalizing each vector to length 1, we
obtain

    {(1/3)[1, 2, 0, 2], (1/(3√3))[4, −1, 3, −1], (1/(3√6))[−2, −4, 3, 5]}

as an orthonormal basis for W.

As you can see, the arithmetic involved in the Gram~Schmidt process can
be a bit tedious with pencil and paper, but it is very easy to implement the
process on a computer.
We know that any independent set of vectors in R” can be extended to a
basis for R”. Using Theorem 6.4, we can prove a similar result for orthogonal
sets.

COROLLARY 2. Expansion of an Orthogonal Set to an Orthogonal Basis

Every orthogonal set of vectors in a subspace W of R" can be expanded


if necessary to an orthogonal basis for W.

PROOF  An orthogonal set {v_1, v_2, ..., v_m} of vectors in W is an independent set
by Theorem 6.2, and can be expanded to a basis {v_1, ..., v_m, a_1, ..., a_s} of W
by Theorem 2.3. We apply the Gram-Schmidt process to this basis for W.
Because the v_i are already mutually perpendicular, none of them will be
changed by the Gram-Schmidt process, which thus yields an orthogonal basis
containing the given vectors v_i for i = 1, ..., m.

EXAMPLE 7  Expand {[1, 1, 0], [1, −1, 1]} to an orthogonal basis for R^3, and then transform
this to an orthonormal basis for R^3.
SOLUTION  First we expand the given set to a basis {a_1, a_2, a_3} for R^3. We take a_1 = [1, 1, 0],
a_2 = [1, −1, 1], and a_3 = [1, 0, 0], which we can see form a basis for R^3. (See
Theorem 2.3.)
    Now we use the Gram-Schmidt process with formula (6). Because a_1 and
a_2 are perpendicular, we let v_1 = a_1 = [1, 1, 0] and v_2 = a_2 = [1, −1, 1]. From
formula (6), we have

    v_3 = a_3 − ( ((a_3 · v_1)/(v_1 · v_1)) v_1 + ((a_3 · v_2)/(v_2 · v_2)) v_2 )
        = [1, 0, 0] − ( (1/2)[1, 1, 0] + (1/3)[1, −1, 1] )
        = [1/6, −1/6, −1/3].

Multiplying this vector by −6, we replace v_3 by [−1, 1, 2]. Thus we have
expanded the given set to an orthogonal basis

    {[1, 1, 0], [1, −1, 1], [−1, 1, 2]}

of R^3. Normalizing these vectors to unit length, we obtain

    {[1/√2, 1/√2, 0], [1/√3, −1/√3, 1/√3], [−1/√6, 1/√6, 2/√6]}

as an orthonormal basis.

The Gram-Schmidt Process in Inner-Product Spaces (Optional)

The results in this section easily extend to any inner-product space. We have
the notions of an orthogonal set, an orthogonal basis, and an orthonormal
basis, with essentially the same definitions given earlier. The Gram-Schmidt
process is still valid. We conclude with an example.

EXAMPLE 8  Find an orthogonal basis for the subspace sp(1, √x, x) of the vector space
of continuous functions with domain 0 ≤ x ≤ 1, where ⟨f, g⟩ = ∫_0^1 f(x)g(x) dx.
SOLUTION  We let v_1 = 1 and compute

    v_2 = √x − (⟨√x, 1⟩/⟨1, 1⟩)·1 = √x − (∫_0^1 √x dx)/(∫_0^1 1 dx) = √x − 2/3.

We replace v_2 by 3v_2, obtaining v_2 = 3√x − 2, and compute v_3 as

    v_3 = x − ((∫_0^1 x dx)/(∫_0^1 1 dx))·1 − ((∫_0^1 x(3√x − 2) dx)/(∫_0^1 (3√x − 2)^2 dx))(3√x − 2)
        = x − 1/2 − (2/5)(3√x − 2)
        = x − (6/5)√x + 3/10.

Replacing v_3 by 10v_3, we obtain the orthogonal basis

    {1, 3√x − 2, 10x − 12√x + 3}.

SUMMARY

1. A basis for a subspace W is orthogonal if the basis vectors are mutually
   perpendicular, and it is orthonormal if the vectors also have length 1.
2. Any orthogonal set of vectors in R^n is a basis for the subspace it generates.
3. Let W be a subspace of R^n with an orthogonal basis. The projection of a
   vector b in R^n on W is equal to the sum of the projections of b on each basis
   vector.
4. Every nonzero subspace W of R^n has an orthonormal basis. Any basis can
   be transformed into an orthogonal basis by means of the Gram-Schmidt
   process, in which each vector a_j of the given basis is replaced by the vector
   v_j obtained by subtracting from a_j its projection on the subspace generated
   by its predecessors.
5. Any orthogonal set of vectors in a subspace W of R^n can be expanded, if
   necessary, to an orthogonal basis for W.
6. Let A be an n × k matrix of rank k. Then A can be factored as QR, where Q
   is an n × k matrix with orthonormal column vectors and R is a k × k
   upper-triangular invertible matrix.

| EXERCISES

In Exercises 1-4, verify that the generating set of 3. W=sp((!, -1, -1, 1), U1, 1, 1, 1),
the given subspace W is orthogonal, and find the [-1, 0, 0, 1); b = [2, 1, 3, 1]
Projection of the given vector b on W. 4. W=sp([l, -1, 1, 1, [-1, 1,1, 1),
(1,1, —1, 1);b = [1, 4, 1, 2]
1. W = sp([2, 3, 1], [-1, 1, -1);> = (2, 1,4] 5. Find an orthonormal basis for the plane
2. W = sp((-1, 0, 1], [1 1, sb = 0, 2, 3] 2x t+ iy+z=0.

6. Find an orthonormal basis for the subspace 22. Find an orthogonal basis for sp({1, 2, 1, 2],
(2, 1, 2, 0]) that contains j1, 2, 1, 2].
W = {[x,, X), X5, Xa] | x, = xX) + 2;,
23. Find an orthogonal basis for sp([2, 1, -1, 1] ’
Xy = —X, + x5}
[1, 1, 3, 0}, [1, 1, 1, 1) that contains
of R‘.
(2, 1, -1, 1] and [I, 1, 3, 0).

(spb
. Find an orthonormal basis for ihe subspace
24. Let B be the ordered orthonormal basis

sp((0, 1, O}, [1, 1, 1)) of R’.


. Find an orthonormal basis for the subspace 733
sp([1, 1, 0], [-1, 2, 1) of R?.
. Transform the basis {[1, 0, 1], (0, |, 2], a. Find the coordinate vectors [c,, c,, c;] for
(2, 1, O]} for R? into an orthonormal basis, (1, 2, —4] and [d,, d,, a3] for [5, —3, 2],
using the Gram—Schmidt process. relative to the ordered basis B.
b. Compute {1, 2, —4] + [5 ~ 3, 2], and then
10. Repeat Exercise 9, using the basis {[1, 1, 1], compute [c,, c,, C;] ° [d,, d,, d,]. What do
[1, 0, 1), (0, 1, 1)} for R’.
you notice? _~
il. Find an orthonormal basis for the subspace
. Mark each of the following True or False.
of R‘ spanned by [1, 0, 1, 0], [1, 1, 1, 0], and . All vectors in an orthogonal! basis have
[1, -1, 0, 1}. length 1.
12. Find an orthonormal basis for the subspace . All vectors in an orthonormal basis have
sp({l, —1, 1, 0, Q], [-1, 0, ©, 0, 1), length 1.
{0, 0, 1, 0, 1], (1, 0, 0, 1, 1]) of RY. . Every nontrivial subspace of R" has an
. Find the projection of [5, —3, 4] on the orthonormal! basis.
subspace in Exercise 7, using the . Every vector in R’ is in some
orthonormal basis found there. orthonormal basis for R’.
14. Repeat Exercise }3, but use the subspace in . Every nonzero vector in R” is in some
Exercise 8.
orthonormal! basis for R’.
. Every unit vector in R” is in some
15. Find the projection of [2, 0, —1, 1] on the orthonormal! basis for R’.
subspace in Exercise | 1, using the . Every n X k matrix A has a factorization
orthonormal basis found there. A = QR, where the column vectors of Q
16. Find the projection of [—1, 0, 0, 1, —1] on form an orthonormal set and R is an
the subspace in Exercise 12, using the invertible kK x k matrix.
orthonormal! basis found there. . Everyn X k matrix A of rank k has a
17. Find an orthonormal basis for R‘ that factorization A = QR, where the column
contains an orthonormal basis for the vectors of Q form an orthonormal set and
subspace sp([1, 0, 1, 0], (0, 1, 1, OJ). R is an invertible k x k matrix.
i. It is advantageous to work with an
18. Find an orthogonal basis for the orthogonal
orthogonal basis for W when projecting 4
complement of sp{[l1, —1, 3]) in R?.
vector b in R” on a subspace W of R’.
19. Find an orthogonal basis for the nullspace of . It is even moze advantageous to work
the matrix with an orthonormal basis for W when
performing the projection in part (i).
2 1

1 -]
O

ft

5 ] In Exercises 26-28, use the text answers for thé


tw)

1 2- indicated earlier exercise to find a


QR-factorization of the matrix having as column-


20. Find an orthonormal basis for R3 that vectors the transposes of the row vectors given IM
contains the vector (1/\/3)[1, 1, 1]. that exercise.
21. Find an orthonormal basis for sp({[2, 1, 1],
{i, —1, 2]) that contains (1/6)[2, 1, 1]. 26. Exercise 7 27. Exercise 9 28. Exercise il

29. Let A be an 7 X k matrix. Prove that the 35. Find an orthonormal basis for sp(1, e*) for
column vectors of A are orthonormal if and O0=x=1 if the inner product is defined by
only if A7A = 1. SL 8) = Sof(xdg(x) dx.
. Let A be ann X nm matnix. Prove that A has
orthonormal column vectors if and only if A a The routine QRFACTOR in LINTEK allows the
is invertiole with inverse A7'! = A’.
user to enter k independent row vectors in R" for n
and k at most 10. The program can then be used
31. Let A be an n X n matrix. Prove that the to find an orthonormal set of vectors spanning the
column vectors of A are orthonormal if and same subspace. It will also exhibit a
only if the row vectors of A are orthonormal. QR-factorization of the n x k matrix A having
[Hint: Use Exercise 30 and the fact that 4 the entered vectors as column vectors.
commutes with its inverse.] For ann X k matrixA of rank k, the
command {Q,R] = qr(A) in MATLAB produces
ann Xn matrix Q whose columns form an
orthonormal basis for R" andann x k
Exercises 32-35 invoive inner-product spaces. upper-triangular matrix R (that is, with 1, = 0 for
i > j) such that A = QR. The first k columns of Q
comprise the n X k mairix Q described in
32. Let V be an inner-product space of
Corollary I of Theorem 6.4, and R is thek x k
dimension 7, and let B be an ordered
matrix R described in Corollary 1 with n — k
orthonormal basis for V. Prove that, for any
rows of zeros supplied at the bottom to make it
vectors a and b in V, the inner product (a, b)
the same size as A.
is equal to the dot product of the coordinate
vectors of a and b relative to B. (See Exercise In Exercises 36-38, use MATLAB or
24 for an illustration.) LINTEK as just described to check the answers
you gave for the indicated preceding exercise.
33. Find an orthonomnnal basis for sp(sin x, (Note that the order you took for the vectors in the
cos x) for 0 = x = wif the inner product is Gram-Schmidt process in those exercises must be
defined by (f, g) = JUs()g() ax. the same as the order in which you supply them
34. Find an orthonormai basis for sp(1, x, x2) in the Software to be able to check your answers.)
for —1 = x < 1 if the inner product is 36. Exercises 7-12 37. Exercises 17, 20~23
defined by (f, g) = J' fg) dx. 38. Exercises 26-28

6.3 ORTHOGONAL MATRICES

Let A be the n × n matrix with column vectors a_1, a_2, ..., a_n. Recall that these
vectors form an orthonormal basis for R^n if and only if

    a_i · a_j = 0  if i ≠ j,        a_i ⊥ a_j
    a_i · a_i = 1,                  ||a_i|| = 1.

Because the matrix product A^T A has a_i · a_j in the ith row and jth column, we see
that the columns of A form an orthonormal basis of R^n if and only if

    A^T A = I.                                                               (1)

In computations with matrices using a computer, it is desirable to use matrices
satisfying Eq. (1) as much as possible, as we discuss later in this section.

DEFINITION 6.4  Orthogonal Matrix

An n × n matrix A is orthogonal if A^T A = I.

The term orthogonal applied to a matrix is just a bit misleading. For an
orthogonal matrix, not only must the columns be mutually orthogonal, they
must also be unit vectors; that is, they must have length 1. (A matrix whose
columns are mutually orthogonal but not all of unit length is not an orthogonal
matrix.) This is not indicated by the name. It is unfortunate that the
conventional name is orthogonal matrix rather than orthonormal matrix,
but it is very difficult to change established terminology.
    From Definition 6.4 we see that an n × n matrix A is orthogonal if and only
if A is invertible and A^{−1} = A^T. Because every invertible matrix commutes with
its inverse, it follows that AA^T = I, too; that is, (A^T)^T A^T = I. This means that the
column vectors of A^T, which are the row vectors of A, also form an
orthonormal basis for R^n. Conversely, if the row vectors of an n × n matrix A
form an orthonormal basis for R^n, then so do the column vectors. We
summarize these remarks in a theorem.

THEOREM 6.5  Characterizing Properties of an Orthogonal Matrix

Let A be an n × n matrix. The following conditions are equivalent:

1. The rows of A form an orthonormal basis for R^n.
2. The columns of A form an orthonormal basis for R^n.
3. The matrix A is orthogonal—that is, invertible with A^{−1} = A^T.

EXAMPLE 1  Verify that the matrix

    A = (1/7)[2   3   6]
             [3  −6   2]
             [6   2  −3]

is an orthogonal matrix, and find A^{−1}.
SOLUTION  We have

    A^T A = (1/49)[2   3   6][2   3   6]   [1  0  0]
                  [3  −6   2][3  −6   2] = [0  1  0].
                  [6   2  −3][6   2  −3]   [0  0  1]

In this example, A is symmetric, so

    A^{−1} = A^T = A = (1/7)[2   3   6]
                            [3  −6   2]
                            [6   2  −3].
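A one-line machine check of orthogonality is often convenient. A MATLAB sketch for the matrix of Example 1 (variable names are ours):

    % Verify that A is orthogonal: A'*A should be the identity, and inv(A) should equal A'.
    A = (1/7) * [2 3 6; 3 -6 2; 6 2 -3];
    norm(A'*A - eye(3))       % essentially 0, so A is orthogonal
    norm(inv(A) - A')         % essentially 0, so the inverse of A equals its transpose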

We now present the properties of orthogonal matrices that make them


especially desirable for use in matrix computations.

THEOREM 6.6  Properties of Ax for an Orthogonal Matrix A

Let A be an orthogonal n × n matrix and let x and y be any column
vectors in R^n.

1. (Ax) · (Ay) = x · y.                       Preservation of dot product
2. ||Ax|| = ||x||.                            Preservation of length
3. The angle between nonzero vectors          Preservation of angle
   x and y equals the angle between
   Ax and Ay.

PROOF  For property 1, we need only recall that the dot product x · y of two
column vectors can be found by using the matrix multiplication (x^T)y. Because
A is orthogonal, we know that A^T A = I, so

    (Ax) · (Ay) = (Ax)^T(Ay) = x^T A^T A y = x^T I y = x^T y = x · y.

HISTORICAL NOTE  Properties of orthogonal matrices for square systems of coefficients
appear in various works of the early nineteenth century. For example, in 1833 Carl Gustav Jacob
Jacobi (1804-1851) sought to find a linear substitution

    y_1 = Σ_i a_{1i} x_i,  y_2 = Σ_i a_{2i} x_i,  ...,  y_n = Σ_i a_{ni} x_i

such that Σ y_j^2 = Σ x_i^2. He found that the coefficients of the substitution must satisfy the
orthogonality property

    Σ_i a_{ji} a_{ki} = 0 for j ≠ k,  and  Σ_i a_{ji} a_{ji} = 1 for j = k.

One can even trace orthogonal systems of coefficients back to seventeenth- and eighteenth-
century works in analytic geometry, when rotations of the plane or of 3-space are given in order to
transform the equations of curves or surfaces. Expressed as matrices, these rotations would give
orthogonal ones.
    The formal definition of an orthogonal matrix, however, and a comprehensive discussion
appeared in an 1878 paper of Georg Ferdinand Frobenius (1849-1917) entitled "On Linear
Substitutions and Bilinear Forms." In particular, Frobenius dealt with the eigenvalues of such a
matrix.
    Frobenius, who was a full professor in Zurich and later in Berlin, made his major
mathematical contribution in the area of group theory. He was instrumental in developing the
concept of an abstract group, as well as in investigating the theory of finite matrix groups and
group characters.

For property 2, the length of a vector can be defined in terms of the dot
product—namely, ||x|| = √(x · x). Because multiplication by A preserves the
dot product, it must preserve the length.
    For property 3, the angle between nonzero vectors x and y can be defined
in terms of the dot product—namely, as

    cos^{−1}( (x · y)/(||x|| ||y||) ).

Because multiplication by A preserves the dot product, it must preserve
angles.

EXAMPLE 2  Let v be a vector in R^3 with coordinate vector [2, 3, 5] relative to some ordered
orthonormal basis (a_1, a_2, a_3) of R^3. Find ||v||.
SOLUTION  We have v = 2a_1 + 3a_2 + 5a_3, which can be expressed as

    v = Ax,  where A is the matrix with column vectors a_1, a_2, a_3 and x = [2, 3, 5]^T.

Using property 2 of Theorem 6.6, we obtain

    ||v|| = ||Ax|| = ||x|| = √(4 + 9 + 25) = √38.

Property 2 of Theorem 6.6 is the reason that it is desirable to use
orthogonal matrices in matrix computations on a computer. Suppose, for
example, that we have occasion to perform a multiplication Ax for a square
matrix A and a column vector x whose components are quantities we have to
measure. Our measurements are apt to have some error, so rather than using
the true vector x for these measured quantities, we probably work with x + e,
where e is a nonzero error vector. Upon multiplication by A, we then obtain

    A(x + e) = Ax + Ae.

The new error vector is Ae. If the matrix A is orthogonal, we know that
||Ae|| = ||e||, so the magnitude of the error vector remains the same under
multiplication by A. We express this important fact as follows:

    Multiplication by orthogonal matrices is a stable operation.

If A is not orthogonal, ||Ae|| can be a great deal larger than ||e||. Repeated
multiplication by nonorthogonal matrices can cause the error vector to blow
up to such an extent that the final answer is meaningless. To take advantage
of this stability, scientists try to orthogonalize computational algorithms with
matrices, in order to produce reliable results.

Orthogonal Diagonalization of Real Symmetric Matrices

Recall that a symmetric matrix is an n × n square matrix A in which the kth
row vector is equal to the kth column vector for each k = 1, 2, ..., n.
Equivalently, A is symmetric if and only if it is equal to its transpose A^T. The
problem of diagonalizing a real symmetric matrix arises in many applications.
(See Section 8.1, for example.) As we stated in Theorem 5.5, this diagonaliza-
tion can always be achieved. That is, there is an invertible matrix C such that
C^{−1}AC = D is a diagonal matrix. In fact, C can be chosen to be an orthogonal
matrix, as we will show. This means that diagonalization of real symmetric
matrices is a computationally stable process. We begin by proving the
perpendicularity of eigenvectors of a real symmetric matrix that correspond to
distinct eigenvalues.

THEOREM 6.7  Orthogonality of Eigenspaces of a Real Symmetric Matrix

Eigenvectors of a real symmetric matrix that correspond to different
eigenvalues are orthogonal. That is, the eigenspaces of a real symmet-
ric matrix are orthogonal.

PROOF  Let A be an n × n symmetric matrix, and let v_1 and v_2 be eigenvectors
corresponding to distinct eigenvalues λ_1 and λ_2, respectively. Writing vectors
as column vectors, we have

    Av_1 = λ_1 v_1  and  Av_2 = λ_2 v_2.

We want to show that v_1 and v_2 are orthogonal. We begin by showing that
λ_1(v_1 · v_2) = λ_2(v_1 · v_2). We compute

    λ_1(v_1 · v_2) = (λ_1 v_1) · v_2 = (Av_1) · v_2.

The final dot product can be written in matrix form as

    (Av_1)^T v_2 = (v_1^T A^T) v_2.

Therefore,

    λ_1(v_1 · v_2) = v_1^T A^T v_2.                                          (2)

On the other hand,

    λ_2(v_1 · v_2) = v_1 · (λ_2 v_2) = v_1^T A v_2.                          (3)

Because A = A^T, Eqs. (2) and (3) show that

    λ_1(v_1 · v_2) = λ_2(v_1 · v_2),  or  (λ_1 − λ_2)(v_1 · v_2) = 0.

Because λ_1 − λ_2 ≠ 0, we conclude that v_1 · v_2 = 0, which shows that v_1 is
orthogonal to v_2.

The results stated for real symmetric matrices in Section 5.2 tell us that an
n × n real symmetric matrix A has only real numbers as the roots of its
characteristic polynomial, and that the algebraic multiplicity of each eigen-
value is equal to its geometric multiplicity; therefore, we can find a basis for R^n
consisting of eigenvectors of A. Using the Gram-Schmidt process, we can
modify the vectors of the basis in each eigenspace to be an orthonormal set.
Theorem 6.7 then tells us that the basis vectors from different eigenspaces are
also perpendicular, so we obtain a basis of mutually perpendicular real
eigenvectors of unit length. We can take as the diagonalizing matrix C, such
that C^{−1}AC = D, an orthogonal matrix whose column vectors consist of the
vectors in this orthonormal basis for R^n. We summarize our discussion in a
theorem.

THEOREM 6.8  Fundamental Theorem of Real Symmetric Matrices

Every real symmetric matrix A is diagonalizable. The diagonalization
C^{−1}AC = D can be achieved by using a real orthogonal matrix C.

The converse of Theorem 6.8 is also true. If D = C^{−1}AC is a diagonal
matrix and C is an orthogonal matrix, then A is symmetric. (See Exercise 24.)
The equation D = C^{−1}AC is said to be an orthogonal diagonalization of A.

EXAMPLE 3  Find an orthogonal diagonalization of the matrix

    A = [1  2]
        [2  4].

SOLUTION  The eigenvalues of A are the roots of the characteristic equation

    det(A − λI) = (1 − λ)(4 − λ) − 4 = λ^2 − 5λ = 0.

They are λ_1 = 0 and λ_2 = 5. We proceed to find the corresponding eigenvectors.
For λ_1 = 0, we have

    A − λ_1 I = A = [1  2] ~ [1  2]
                    [2  4]   [0  0],

which yields the eigenspace

    E_0 = sp([−2, 1]).

For λ_2 = 5, we have

    A − λ_2 I = A − 5I = [−4   2] ~ [2  −1]
                         [ 2  −1]   [0   0],

which yields the eigenspace

    E_5 = sp([1, 2]).

Thus, with

    C = [−2  1]   and   D = [0  0]
        [ 1  2]             [0  5],

we have C^{−1}AC = D, a diagonalization of A, and replacing C by the matrix with
normalized columns,

    C = (1/√5)[−2  1]
              [ 1  2],

gives an orthogonal diagonalization of A.

EXAMPLE 4  Find an orthogonal diagonalization of the matrix

    A = [ 1  −1  −1]
        [−1   1  −1]
        [−1  −1   1];

that is, find an orthogonal matrix C such that C^{−1}AC is a diagonal matrix D.
SOLUTION  Example 2 of Section 5.3 shows that the eigenvalues and associated eigen-
spaces of A are

    λ_1 = −1,        E_{−1} = sp(v_1)   with  v_1 = [1, 1, 1];
    λ_2 = λ_3 = 2,   E_2 = sp(v_2, v_3)  with  v_2 = [−1, 1, 0],  v_3 = [−1, 0, 1].

Notice that vectors v_2 and v_3 in E_2 are orthogonal to the vector v_1 in E_{−1}, as must
be the case for this symmetric matrix A. The vectors v_2 and v_3 in E_2 are not
orthogonal, but we can use the Gram-Schmidt process to find an orthogonal
basis for E_2. We replace v_3 by

    v_3 − ((v_3 · v_2)/(v_2 · v_2)) v_2 = [−1, 0, 1] − (1/2)[−1, 1, 0] = [−1/2, −1/2, 1],  or by  [−1, −1, 2].

Thus {[1, 1, 1], [−1, 1, 0], [−1, −1, 2]} is an orthogonal basis for R^3 of
eigenvectors of A. An orthogonal diagonalizing matrix C is obtained by
normalizing these vectors and taking the vectors in the resulting orthonormal
basis as column vectors in C. We obtain

    C = [1/√3  −1/√2  −1/√6]
        [1/√3   1/√2  −1/√6]
        [1/√3    0     2/√6].
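On a computer, an orthogonal diagonalization of a real symmetric matrix can be obtained from the eigenvector routine. The sketch below uses MATLAB's eig, which for a symmetric argument returns orthonormal eigenvectors; it reproduces Example 4 up to the order and signs of the columns of C.

    % Orthogonal diagonalization of the symmetric matrix of Example 4.
    A = [1 -1 -1; -1 1 -1; -1 -1 1];
    [C, D] = eig(A);          % columns of C are orthonormal eigenvectors; D is diagonal
    norm(C'*C - eye(3))       % essentially 0: C is an orthogonal matrix
    norm(C\A*C - D)           % essentially 0: C^(-1) A C = D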
Orthogonal Linear Transformations

Every theorem about matrices has an interpretation for linear transforma-
tions. Let A be an orthogonal n × n matrix, and let T: R^n → R^n be defined by
T(x) = Ax. Theorem 6.6 immediately establishes the following properties of T:

1. T(x) · T(y) = x · y;                       Preservation of dot product
2. ||T(x)|| = ||x||;                          Preservation of length
3. The angle between T(x) and T(y) equals     Preservation of angle
   the angle between x and y.

The first property is commonly used to define an orthogonal linear transfor-
mation T: R^n → R^n.

DEFINITION 6.5  Orthogonal Linear Transformation

A linear transformation T: R^n → R^n is orthogonal if it satisfies
T(v) · T(w) = v · w for all vectors v and w in R^n.

For example, the linear transformation that reflects the plane R^2 in a line
containing the origin clearly preserves both the angle θ between vectors u and v
and the magnitudes of the vectors. Because the dot product in R^2 satisfies

    u · v = ||u|| ||v|| cos θ,

it follows that dot products are also preserved. Therefore, this reflection of the
plane is an orthogonal linear transformation.
    We just showed that every orthogonal matrix gives rise to an orthogonal
linear transformation of R^n into itself. The converse is also true.

THEOREM 6.9  Orthogonal Transformations vis-à-vis Matrices

A linear transformation T of R^n into itself is orthogonal if and only if
its standard matrix representation A is an orthogonal matrix.

PROOF  It remains for us to show that A is an orthogonal matrix if T preserves
the dot product. The columns of A are the vectors T(e_1), T(e_2), ..., T(e_n),
where e_j is the jth unit coordinate vector of R^n. We have

    T(e_i) · T(e_j) = e_i · e_j = 1 if i = j,  and  0 if i ≠ j,

showing that the columns of A form an orthonormal basis of R^n. Thus, A is
an orthogonal matrix.

EXAMPLE 5  Show that the linear transformation T: R^3 → R^3 defined by T([x_1, x_2, x_3]) =
[x_1/√2 + x_3/√2, x_2, −x_1/√2 + x_3/√2] is orthogonal.
SOLUTION  The orthogonality of the transformation follows from the fact that the
standard matrix representation

    A = [ 1/√2   0   1/√2]
        [  0     1    0  ]
        [−1/√2   0   1/√2]

is an orthogonal matrix.

In Exercise 40, we ask you to show that a linear transformation T: R^n → R^n
is orthogonal if and only if T maps unit vectors into unit vectors. Sometimes,
this is an easy condition to verify, as our next example illustrates.

EXAMPLE 6  Show that the linear transformation that rotates the plane counterclockwise
through any angle is an orthogonal linear transformation.
SOLUTION  A rotation of the plane preserves the lengths of all vectors—in particular, unit
vectors. Example 1 in Section 2.4 shows that a rotation is a linear transforma-
tion, and Exercise 40 then shows that the transformation is orthogonal.

You might conjecture that a linear transformation T: R^n → R^n is orthogonal
if and only if it preserves the angle between vectors. Exercise 38 asks you to
give a counterexample, showing that this is not the case.
    For students who studied inner-product spaces in Section 3.5, we mention
that a linear transformation T: V → V of an inner-product space V is defined to
be orthogonal if ⟨T(v), T(w)⟩ = ⟨v, w⟩ for all vectors v, w ∈ V. This is the natural
extension of Definition 6.5.

| SUMMARY
1. A square n X n matrix A is orthogonal if it satishes any one (and hence all)
of these three equivalent conditions:
a. The rows of A form an orthonormal basis for R’.
b. The columns of A form an orthonomnal basis for R’.
c. The matrix A is invertible, and A! = A’.
2. Multiplication of column vectors in R" on the left by an n x 1% orthogonal
matrix preserves length, dot product, and the angle between veccors. Such
multiplication is computationally stable.
3. A linear transformation of R" into itself is orthogonal if and only if it
preserves the dot product, or (equivalently) if and only if its standard
matrix representation is orthogonal, or (equivalently) if and only if it maps
unit vectors into unit vectors.
4. The eigenspaces of a symmetric matrix A are mutually orthogonal, and A
has n mutually perpendicular eigenvectors.
5. A symmetric matrix A is diagonalizable by an orthogonal matrix C. That
is, there exists an orthogonal matrix C such that D = C”'AC is a diagonal
matrix.

"| EXERCISES
In Exercises 1-4, verify that the given matrix is 12. Let (a,, a,, a,, a,) be an ordered orthonormal
orthogonal, and find its inverse. basis for R‘, and let (2, 1, 4, —3] be the
coordinate vector of a vector b in 1X‘ relative
to this basis. Find |{bjl.

u
ulw

via
2&2
1. wv 1 2.|_

oOo
ule
ote In Exercises 13-18, find a matrix C such that

SC
SO

D = C'AC is an orthogonal diagonalization of
I-11 1 the given syinmetric matrix A.
2-3 6 gil boat
|=

3. 3 ¢2 23 “ar a -tot
6 12. 12 32
- Lod ot-t
~~

A 4. | Ol
If A and D are square :natrices, D is diagonal, 2 10 joi
and AD is orthogonal, then (AD)! = {AD)' and 15.|1 2! 16.1321
D“'A-' = DTA’ so that A“' = DD™A? = D?A™. In 10 1 2 l1 10
Exercises 5~8, find the inverse of each matrix A
by first finding a diagonal matrix D so that AD Oo 1 Lt O 1-1 0 0
has column vectors of length 1, and then applying 1-2 2 1 -!1 1 0 0
the formula A“! = D’A’. oS | 18. 00 1 3
oO ft tt O 00 3 #41
3 0 8
s.[ -1
| 3]3 6. |-4 0 6 19. Mark each of the following True or False.
4 010 ___ a. A square matrix is orthogonal if its
column vectors are orthogonal.

7.
-12
4-3
6 6
2
623 8
212 1-3 #1
_— b. Every orthogonal matrix has nullispace
{0}.
_—.c. If A? is orthogonal, then A is orthogonal.
4 2 1 3-1 _—d. If A is an n X n symmetric orthogonal
matrix, then A? = J.
9. Supply a third column vector so that the
___e. IfA is ann X mn symmetric matrix such
matrix
that A? = J, then A is orthogonal.
V3 V2 ___ f. If A and B are orthogonal n x n matrices,
V3 0 is orthogonal. then AB is orthogonal.
UWV3 -1nW2 _—. g. Every orthogonal linear transformation
catries every unit vector into a unit
10. Repeat Exercise 9 for the matnx vector.
2 V5 ___h. Every linear transformation that carries
each unit vector into a unit vector is
3 -2V13 orthogonal.
___ i. Every map of the plane into itself that is
é7 0 an isometry (that is, preserves distance
between points) is given by an orthogonal
11. Let (a,, a), a,) be an ordered orthonormal linear transformation.
basis for R’, and let b be a unit vector with _— j. Every map of the plane into itself that
coordinate vector ; i q relative to this an isometry and that leaves the origin
fixed is given by an orthogonal linear
basis. Find all possible values for c. transformation.

Let A be an orthogonal n X n matrix. Show 31. Let A and C be orthogonal n x n matrices.


that ||4x|| = {[4-'x|| for any vector x in R’. Show that C~'AC is orthogonal.
21. Let A be an orthogonal matrix. Show that A?
is an orthogonal matrix, too. In Exercises 32-37, determine whether the given
linear transformation is orthogonal.
22. Show that, if A 1s an orthogonal matrix, then
det(A) = +1.
32. T: R? > R? defined by Ti[x, y]) = Ly, x]
23 Find a 2 X 2 matrix with determinant | that
is not an orthogonal matrix. 33. T: R? > R defined by 7{([x, y, Z]) = [x, y, 0]
24. Let D = C"'AC be a diagonal matrix, where 34. T: R? > R’ defined by 7([x, y]) = [2x, ¥]
C is an orthogonal matrix. Show that A is
symmetric. 35. T: R? > R’ defined by 7([x, y]) = x. —y]
25. Let A be an n X nm matrix such that Ax - Ay 36. T: R? > R’ defined by 7([x, y]) =
= x-y for all vectors x and y in R’*. Show (x/2 — yBy/2, — ¥3x/2 + y/21
that A is au orthogonal matrix.
37. T: R? > R defined by 7{{x, y, z}) =
26. Let A be an n X n matrix such that ||Ax|| = [x/3 + 2y/3 + 22/3, -2x/3 - y/3 + 22/3.
||x|| for all vectors x in R*. Show that A is an —2x/3 + 2v/3 — 2/3)
orthogonal matrix. [Hint: Show that x - y =
x(x + yP — [x ~ y[P), and then use 38. Find a linear transformation 7: R’ > R’ that
Exercise 25.} preserves the angle between vectors but is
not an orthogonal transformation.
27. Show that the real eigenvalues of an
orthogonal matrix must be equal to | or —1. 39. Show that every 2 x 2 orthogonal matnx is
(Hint: Think in terms of linear of one of two forms: either
transformations.]}
cos @ sin é ov {cos @ —sin 0
. Describe all real diagonal orthogonal sin@ ~cos 0 ‘sin@ cos6 °
matrices.
29. a. Show that a row-interchange elementary lor some angle @.
matrix is orthogonal. — 40. Let T: R? > R’ be a linear transformation.
b. Let A be a matrix obtained by permuting Show that T is orthogonal if and only if 7
(that is, changing the order of) the rows maps unit vectors to unit vectors. [HinT: Use
of the n x n identity matrix. Show that A Exercise 26.]
is an orthogonal matrix.
. Let {a,, a,,..., a,} be an orthonormal basis 41. (Real Househdider matrix) Let v be a
of column vectors for R’, and let C be an nonzéro column vector in R’. Show that
orthogonal 7 X # matrix. Show ihai C=I- —_(w") is an orthogonal matrix.
{Ca,, Ca,,..., Ca,}
(These Househdlder matrices can be used to
perform certain important stable reductions
is also an orthonormal basis for R’. of matnices.)

6.4  THE PROJECTION MATRIX

Let W be a subspace of R^n. Projection of vectors in R^n on W gives a mapping T
of R^n into itself. Figure 6.8 illustrates the projection T(a + b) of a + b on W,
and Figure 6.9 illustrates the projection T(ra) of ra on W. These figures suggest
that

    T(a + b) = T(a) + T(b)

and

    T(ra) = rT(a).

Thus we expect that T is a linear transformation. If this is the case, there must
be a matrix P such that T(x) = Px—namely, the standard matrix representa-
tion of T. Recall that in Section 6.1 we showed how to find the projection of a
vector b on W by finding a basis for W, then finding a basis for its orthogonal
complement W⊥, then finding coordinates of b with respect to the resulting
basis of R^n, and so on. It is a somewhat involved process. It would be nice to
have a matrix P such that the projection of b on W could be found by computing
Pb.
    In this section, we start by showing that there is indeed a matrix P such
that the projection of b on W is equal to Pb. Of course, this then shows that
projection is a linear transformation, because we know that multiplication of
column vectors in R^n on the left by any n × n matrix A gives a linear
transformation of R^n into itself. We derive a formula for this projection matrix
P corresponding to a subspace W. The formula involves choosing a basis for
W, but the final matrix P obtained is independent of the basis chosen.
    We have seen that projection on a subspace W is less difficult to compute
when an orthonormal basis for W is used. The formula for the matrix P also
becomes quite simple when an orthonormal basis is used, and consequently
projection becomes an easy computation. Section 6.5 gives an application of
these ideas to data analysis.
    We will need the following result concerning the rank of A^T A for our work
in this section.

[FIGURE 6.8  Projection of a + b on W.    FIGURE 6.9  Projection of ra on W.]

THEOREM 6.10  The Rank of A^T A

Let A be an m × n matrix of rank r. Then the n × n symmetric matrix
A^T A also has rank r.

PROOF  We will work with nullspaces. If v is any solution vector of the
system Ax = 0, so that Av = 0, then upon multiplying both sides of this last
equation on the left by A^T, we see that v is also a solution of the system
A^T Ax = 0. Conversely, assuming that A^T Aw = 0 for an n × 1 vector w,
then

    [0] = w^T(A^T Aw) = (Aw)^T(Aw),

which may be written ||Aw||^2 = 0. That is, Aw is a vector of magnitude 0, and
consequently Aw = 0. It follows that A and A^T A have the same nullspace.
Because they have the same number of columns, we apply the rank equation
(Theorem 2.5) and conclude that rank(A) = rank(A^T A).

The Formula for the Projection Matrix

Let W = sp(a_1, a_2, ..., a_k) be a subspace of R^n, where the vectors a_1, a_2, ..., a_k
are independent. Let b be a vector in R^n, and let p = b_W be its projection on W.
From the unique decomposition

    b = p + (b − p),   where p = b_W and b − p = b_{W⊥},

explained in Section 6.1, we see that the projection vector p is the unique
vector satisfying the following two properties. (See Figure 6.10 for an
illustration when dim(W) = 2.)

Properties of the Projection p of Vector b on the Subspace W

1. The vector p must lie in the subspace W.
2. The vector b − p must be perpendicular to every vector in W.

[FIGURE 6.10  The projection of b on sp(a_1, a_2).]

If we write our vectors in R^n as column vectors, the subspace W is the
column space of the n × k matrix A whose columns are a_1, a_2, ..., a_k. All
vectors in W = sp(a_1, a_2, ..., a_k) have the form Ax, where

    x = [x_1, x_2, ..., x_k]^T   for any scalars x_1, x_2, ..., x_k.

Because p lies in the space W, we see that p = Ar, where

    r = [r_1, r_2, ..., r_k]^T   for some scalars r_1, r_2, ..., r_k.

Because b − Ar must be perpendicular to each vector in W, the dot product of
b − Ar and Ax must be zero for all vectors x. This dot-product condition can be
written as the matrix equation

    (Ax)^T(b − Ar) = x^T(A^T b − A^T Ar) = [0].

In other words, the dot product of the vectors x and A^T b − A^T Ar must be zero
for all vectors x. This can happen only if the vector A^T b − A^T Ar is itself the zero
vector. (See Exercise 41 in Section 1.3.) Therefore, we have

    A^T b − A^T Ar = 0.                                                      (1)

Now the k × k matrix A^T A appearing in Eq. (1) is invertible, because it has the
same rank as A (see Theorem 6.10); and A has rank k, because its columns are
independent. Solving Eq. (1) for r, we obtain

    r = (A^T A)^{−1} A^T b.                                                  (2)

Denoting the projection p = Ar of b on W by p = b_W, and writing b as a column
vector, we obtain the formula in the following box.

Projection b_W of b on the Subspace W

Let W = sp(a_1, a_2, ..., a_k) be a k-dimensional subspace of R^n, and
let A have as columns the vectors a_1, a_2, ..., a_k. The projection of b
in R^n on W is given by

    b_W = A(A^T A)^{−1} A^T b.                                               (3)

We leave as Exercise 12 the demonstration that the formula in Section 6.1 for projecting b on sp(a) can be written in the form of formula (3), using matrices of appropriate size.
Our first example reworks Example 1 in Section 6.1, using formula (3).

EXAMPLE 1  Using formula (3), find the projection of the vector b on the subspace W = sp(a), where

b = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}  and  a = \begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix}.

SOLUTION  Let A be the matrix whose single column is a. Then

A^T A = \begin{bmatrix} 2 & 4 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix} = [29].

Putting this into formula (3), we have

b_W = A(A^T A)^{-1} A^T b = \frac{1}{29}\begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix}\begin{bmatrix} 2 & 4 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \frac{19}{29}\begin{bmatrix} 2 \\ 4 \\ 3 \end{bmatrix}.
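Formula (3) is also easy to evaluate by machine. The following MATLAB sketch (the variable names are ours, not part of the text) recomputes the projection of Example 1; the backslash operator solves (A^T A)r = A^T b rather than forming the inverse explicitly.

    % Projection of b on W = sp(a) by formula (3), using the data of Example 1
    A = [2; 4; 3];            % the single column a
    b = [1; 2; 3];
    r  = (A'*A) \ (A'*b);     % solve (A^T A) r = A^T b
    bW = A * r                % equals (19/29)*[2; 4; 3]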

We refer to the matrix A(A^T A)^{-1}A^T in formula (3) as the projection matrix for the subspace W. It takes any vector b in R^n and, by left multiplication, projects it onto the vector b_W, which lies in W. We will show shortly that this matrix is uniquely determined by W, which allows us to use the definite article the when talking about it. Before demonstrating this uniqueness, we box the formula for this matrix and present two more examples.

The Projection Matrix P for the Subspace W

Let W = sp(a_1, a_2, ..., a_k) be a k-dimensional subspace of R^n, and let A have as columns the vectors a_1, a_2, ..., a_k. The projection matrix for the subspace W is given by

P = A(A^T A)^{-1} A^T.

EXAMPLE 2  Find the projection matrix for the x_2,x_3-plane in R^3.

SOLUTION  The x_2,x_3-plane is the subspace W = sp(e_2, e_3), where e_2 and e_3 are the column vectors of the matrix

A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}.

We find that A^T A = I, the 2 × 2 identity matrix, and that the projection matrix is

P = A(A^T A)^{-1}A^T = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

Notice that P projects each vector [b_1, b_2, b_3]^T in R^3 onto [0, b_2, b_3]^T in the x_2,x_3-plane, as expected.

EXAMPLE 3  Find the matrix that projects vectors in R^3 on the plane 2x − y − 3z = 0. Also, find the projection of a general vector b in R^3 on this plane.

SOLUTION  We observe that the given plane contains the zero vector and can therefore be written as the subspace W = sp(a_1, a_2), where a_1 and a_2 are any two nonzero and nonparallel vectors in the plane. We choose

a_1 = \begin{bmatrix} 0 \\ 3 \\ -1 \end{bmatrix}  and  a_2 = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix},

so that

A = \begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}.

Then

(A^T A)^{-1} = \begin{bmatrix} 10 & 6 \\ 6 & 5 \end{bmatrix}^{-1} = \frac{1}{14}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix},

and the desired matrix is

P = A(A^T A)^{-1}A^T = \begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}\cdot\frac{1}{14}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix}\begin{bmatrix} 0 & 3 & -1 \\ 1 & 2 & 0 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix}.

Each vector b = [b_1, b_2, b_3]^T in R^3 projects onto the vector

b_W = Pb = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 10b_1 + 2b_2 + 6b_3 \\ 2b_1 + 13b_2 - 3b_3 \\ 6b_1 - 3b_2 + 5b_3 \end{bmatrix}.
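A computation such as the one in Example 3 is easy to check by machine. The MATLAB sketch below (variable names are ours) forms P from the basis chosen in Example 3 and compares it with I − nn^T/(n·n), a standard alternative expression built from the plane's normal vector n that is not used in the text.

    % Projection matrix for the plane 2x - y - 3z = 0 (data of Example 3)
    A = [0 1; 3 2; -1 0];            % columns a1, a2 spanning the plane
    P = A * inv(A'*A) * A'           % equals (1/14)*[10 2 6; 2 13 -3; 6 -3 5]
    n = [2; -1; -3];                 % normal vector to the plane
    P2 = eye(3) - n*n'/(n'*n);       % projection expressed through the normal
    max(max(abs(P - P2)))            % zero up to roundoff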
Uniqueness and Properties of the Projection Matrix

It might appear from the formula P = A(A^T A)^{-1}A^T that a projection matrix P for the subspace W of R^n depends on the particular choice of basis for W that is used for the column vectors of A. However, this is not so. Because the projection map of R^n onto W can be computed for x ∈ R^n by the multiplication Px of the vector x on the left by the matrix P, we see that this projection map is a linear transformation, and P must be its unique standard matrix representation. Thus we may refer to the matrix P = A(A^T A)^{-1}A^T as the projection matrix for the subspace W. We summarize our work in a theorem.

THEOREM 6.11  Projection Matrix

Let W be a subspace of R^n. There is a unique n × n matrix P such that, for each column vector b in R^n, the vector Pb is the projection of b on W. This projection matrix P can be found by selecting any basis {a_1, a_2, ..., a_k} for W and computing P = A(A^T A)^{-1}A^T, where A is the n × k matrix having column vectors a_1, a_2, ..., a_k.

Exercise 16 indicates that the projection matrix P given in Theorem 6.11 satisfies two properties:

Properties of a Projection Matrix P

1. P^2 = P.  (P is idempotent.)
2. P^T = P.  (P is symmetric.)

We can use property 2 as a partial check for errors in long computations that
lead to P, as in Example 3. These two properties completely characterize the
projection matrices, as we now show.

THEOREM 6.12 Characterization of Projection Matrices

The projection matrix P for a subspace W of R^n is both idempotent and symmetric. Conversely, every n × n matrix that is both idempotent and symmetric is a projection matrix: specifically, it is the projection matrix for its column space.

PROOF  Exercise 16 indicates that a projection matrix is both idempotent and symmetric.
To establish the converse, let P be an n × n matrix that is both symmetric and idempotent. We show that P is the projection matrix for its own column space W. Let b be any vector in R^n. By Theorem 6.11, we need show only that Pb satisfies the characterizing properties of the projection of b on W given in the box on page 361. Now Pb surely lies in the column space W of P, because W consists of all vectors Px for any vector x in R^n. The second requirement is that b − Pb must be perpendicular to each vector Px in the column space of P. Writing the dot product of b − Pb and Px in matrix form, and using the hypotheses P^2 = P = P^T, we have

(b − Pb)^T Px = ((I − P)b)^T Px = b^T(I − P)^T Px
             = b^T(I − P)Px = b^T(P − P^2)x
             = b^T(P − P)x = b^T O x = [0].

Because their dot product is zero, we see that b − Pb and Px are indeed perpendicular, and our proof is complete.

The Orthonormal Case

In Example 2, we saw that the projection matrix for the x_2,x_3-plane in R^3 has the very simple description

P = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.

The usually complicated formula P = A(A^T A)^{-1}A^T was simplified when we used the standard unit coordinate vectors in our basis for W. Namely, we had

A = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix},  so  A^T A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I.

Thus, (A^T A)^{-1} = I, too, which simplifies what is normally the worst part of the computation in our formula for P. This simplification can be made in computing P for any subspace W of R^n, provided that we know an orthonormal basis {a_1, a_2, ..., a_k} for W. If A is the n × k matrix whose columns are a_1, a_2, ..., a_k, then we know that A^T A = I. The formula for the projection matrix becomes

P = A(A^T A)^{-1}A^T = A I A^T = AA^T.

We box this result for easy reference.

Projection Matrix: Orthonormal Case

Let {a_1, a_2, ..., a_k} be an orthonormal basis for a subspace W of R^n. The projection matrix for W is

P = AA^T,    (4)

where A is the n × k matrix having column vectors a_1, a_2, ..., a_k.

EXAMPLE 4  Find the projection matrix for the subspace W = sp(a_1, a_2) of R^3 if

a_1 = \begin{bmatrix} 1/\sqrt{3} \\ -1/\sqrt{3} \\ 1/\sqrt{3} \end{bmatrix}  and  a_2 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}.

Find the projection of each vector b in R^3 on W.

SOLUTION  Let

A = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} \\ -1/\sqrt{3} & 1/\sqrt{2} \\ 1/\sqrt{3} & 0 \end{bmatrix}.

Then

A^T A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},

so

P = AA^T = \begin{bmatrix} 5/6 & 1/6 & 1/3 \\ 1/6 & 5/6 & -1/3 \\ 1/3 & -1/3 & 1/3 \end{bmatrix}.

Each column vector b in R^3 projects onto

b_W = Pb = \frac{1}{6}\begin{bmatrix} 5b_1 + b_2 + 2b_3 \\ b_1 + 5b_2 - 2b_3 \\ 2b_1 - 2b_2 + 2b_3 \end{bmatrix}.

Alternatively, we can compute b_W directly, using boxed Eq. (4) in Section 6.2. We obtain

b_W = (b · a_1)a_1 + (b · a_2)a_2 = \frac{1}{3}(b_1 - b_2 + b_3)\begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} + \frac{1}{2}(b_1 + b_2)\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 5b_1 + b_2 + 2b_3 \\ b_1 + 5b_2 - 2b_3 \\ 2b_1 - 2b_2 + 2b_3 \end{bmatrix}.
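In the orthonormal case the machine computation collapses to a single matrix product. A minimal MATLAB sketch using the basis of Example 4 (variable names ours):

    % Orthonormal case: P = A*A' (data of Example 4)
    A = [ 1/sqrt(3) 1/sqrt(2);
         -1/sqrt(3) 1/sqrt(2);
          1/sqrt(3) 0        ];
    P = A * A'                 % equals [5 1 2; 1 5 -2; 2 -2 2]/6
    b = [1; 2; 3];             % any vector in R^3
    bW = P * b                 % its projection on W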

SUMMARY

Let {a_1, a_2, ..., a_k} be a basis for a subspace W of R^n, and let A be the n × k matrix having a_j as jth column vector, so that W is the column space of A.

1. The projection of a column vector b in R^n on W is b_W = A(A^T A)^{-1}A^T b.
2. The matrix P = A(A^T A)^{-1}A^T is the projection matrix for the subspace W. It is the unique matrix P such that, for every vector b in R^n, the vector Pb lies in W and the vector b − Pb is perpendicular to every vector in W.
3. If the basis {a_1, a_2, ..., a_k} of the subspace W is orthonormal, the projection matrix for W is P = AA^T.
4. The projection matrix P of a subspace W is idempotent and symmetric. Every symmetric idempotent matrix is the projection matrix for its column space.

EXERCISES

In Exercises 1-8, find the projection matrix for the given subspace, and find the projection of the indicated vector on the subspace.

1. [1, 2, 1] on sp([2, 1, -1]) in R^3
2. [1, 3, 4] on sp([1, -1, 2]) in R^3
3. [2, -1, 3] on sp([2, 1, 1], [-1, 2, 1]) in R^3
4. [1, 2, 1] on sp([3, 0, 1], [1, 1, 1]) in R^3
5. [1, 3, 1] on the plane x + y - 2z = 0 in R^3
6. [4, 2, -1] on the plane 3x + 2y + z = 0 in R^3
7. [1, 2, 1, 3] on sp([1, 2, 1, 1], [-1, 1, 0, -1]) in R^4
8. [1, 1, 2, 1] on sp([1, 1, 1, 1], [1, -1, 1, -1], [-1, 1, 1, -1]) in R^4
9. Find the projection matrix for the x_1,x_2-plane in R^3.
10. Find the projection matrix for the x_1,x_3-coordinate subspace of R^4.
11. Find the projection matrix for the x_1,x_2,x_4-coordinate subspace of R^4.
12. Show that boxed Eq. (3) of this section reduces to Eq. (1) of Section 6.1 for projecting b on sp(a).
13. Give a geometric argument indicating that every projection matrix is idempotent.
14. Let a be a unit column vector in R^n. Show that aa^T is the projection matrix for the subspace sp(a).
15. Mark each of the following True or False.
   a. A subspace W of dimension k in R^n has associated with it a k × k projection matrix.
   b. Every subspace W of R^n has associated with it an n × n projection matrix.
   c. Projection of R^n on a subspace W is a linear transformation of R^n into itself.
   d. Two different subspaces of R^n may have the same projection matrix.
   e. Two different matrices may be projection matrices for the same subspace of R^n.
   f. Every projection matrix is symmetric.
   g. Every symmetric matrix is a projection matrix.
   h. An n × n symmetric matrix A is a projection matrix if and only if A^2 = I.
   i. Every symmetric idempotent matrix is the projection matrix for its column space.
   j. Every symmetric idempotent matrix is the projection matrix for its row space.
16. Show that the projection matrix P = A(A^T A)^{-1}A^T given in Theorem 6.11 satisfies the following two conditions:
   a. P^2 = P,
   b. P^T = P.
17. What is the projection matrix for the subspace R^n of R^n?
18. Let U be a subspace of W, which is a subspace of R^n. Let P be the projection matrix for W, and let R be the projection matrix for U. Find PR and RP. [Hint: Argue geometrically.]
19. Let P be the projection matrix for a k-dimensional subspace of R^n.
   a. Find all eigenvalues of P.
   b. Find the algebraic multiplicity and the geometric multiplicity of each eigenvalue found in part (a).
   c. Explain how we can deduce that P is diagonalizable, without using the fact that P is a symmetric matrix.
20. Show that every symmetric matrix whose only eigenvalues are 0 and 1 is a projection matrix.
21. Find all invertible projection matrices.

In Exercises 22-28, find the projection matrix for the subspace W having the given orthonormal basis. The vectors are given in row notation to save space in printing.

22. W = sp(a_1, a_2) in R^3, where a_1 = [1/√2, 0, -1/√2] and a_2 = [1/√3, -1/√3, 1/√3]
23. W = sp(a_1, a_2) in R^3, where a_1 = [3/5, 4/5, 0] and a_2 = [0, 0, 1]

24. W = sp(a., a,) in R‘, where


a, = (352), 4/(5V2), 4, “4 and
a, = (4(5V2), -3/(5V2), 4,
25. W = sp(a,, a.) in R’, where a, = B a, 3, -{]
and a. =j-3 a 62 0|
26. W = sp(a,, a.) in R*, where
a,
={9.-2.2!
~ 0, y y j and a,
=|2 90, -!BY 2 4 FIGURE 6.11
27. W = sp(a,, a, a;) in R*, where Reflection of R’ through W.
a, = [1/V3, 0, 1/13, 1/3],
direction an equal distance to arrive at b,.
a, = [1/V3, 1/V3, -1/V3, 0], and (See Figure 6.11.)
a, = [1/V3, -1/V33, 0, -1/V3]
Show that b, = (2P — Jb. (Notice that,
28. W = sp(a,, a;, a;) in R*, where
because reflection can be accomplished bv
a =jli lid), } a, =/-11
ry 11
7 i} and matrix multiplication, this reflection must be
aft t _i 1 a linear transformation of R’ into itself.)
a= lp cS The formula A(A*A)"'A’ for a projection matrix
In Exercises 29-32, find the projection of b on W. can be tedious to compute using pencil and paper,
but the routine MATCOMP in LINTEK, or
29. The subspace W in Exercise 22; MATLAB, can do it easily. In Exercises 34~38.
b = [6. —12, -6] use MATCOMP or MATLAB to find the indicated
vector projections.
30. The subspace W in Exercise 23;
b = (20, -15, 5] 34. The projections in R® of J—1, 2, 3, 1, 6, 2]
31. The subspace W in Exercise 26; and [2, 0, 3, —1, 4, 5] on
b = (9. 0, —9, 18] sp({!, -2, 3, 1, 4, 0))
32. The subspace W in Exercise 28; 35. The projections in R? of [1, —1, 4],
b = [4, -12, —4, 0] [3, 3. -—1], and [-2, 4, 7] on
33. Let W be a subspace of R’, and let P be the sp([l, 3, — 4], (2, 0, 3))
projection matrix for W. Reflection of R’ in 36. The projections in R‘ of [—1, 3, 2, 0] and
W is the mapping of R’ into itself that (4, -—I, 1, 5] on sp({0, 1, 2, 1), [-1, 2, 1, 4])
Carries each vector b in R’ into its reflection 37. The projections in R* of [2, 1, 0, 3),
b,, according to the following geometric {1, 1. -1, 2], and [4, 3, 1, 3} on
description: sp({!, 0, -—1, OJ, [1, 2, -1, 4], [2, 1, 3, -1]))
Let p be the projection of b on W. Starting 38. The projections in R? of [2, 1, —3, 2, 4] and
at the tip of b, travel in a straight line to the (1, —4, 0, 1, 5] on
tip of p. and then continue in the same sp({3, !, 4, 0, 1], [2, 1, 3, -2 1)

6.5 THE METHOD OF LEAST SQUARES

The Nature of the Problem

In this section we apply our work on projections to problems of data analysis.


Suppose that data measurements of the form (a_i, b_i) are obtained from observation or experimentation and are plotted as data points in the x,y-plane.


It is desirable to find a mathematical relationship y = f(x) that represents the
data reasonably well, so that we can make predictions of data values that were
not measured. Geometrically, this means that we would like the graph of y =
f(x) in the plane to pass very close to our data points. Depending on the nature
of the experiment and the configuration of the plotted data points, we might
decide on an appropriate type of function y = f(x) such as a linear function, a
quadratic function, or an exponential function. We illustrate with three types
of problems.

PROBLEM 1  According to Hooke's law, the distance that a spring stretches is proportional to the force applied. Suppose that we attach four different weights a_1, a_2, a_3, and a_4 in turn to the bottom of a spring suspended vertically. We measure the four lengths b_1, b_2, b_3, and b_4 of the stretched spring, and the data in Table 6.1 are obtained. Because of Hooke's law, we expect the data points (a_i, b_i) to be close to some line with equation

y = f(x) = r_0 + r_1 x,

where r_0 is the unstretched length of the spring and r_1 is the spring constant. That is, if our measurements were exact and the spring ideal, we would have b_i = r_0 + r_1 a_i for specific values r_0 and r_1.

TABLE 6.1

a_i = Weight in ounces     2.0   4.0   5.0   6.0
b_i = Length in inches     6.5   8.5   11.0  12.5

In Problem 1, we have only the two unknowns r_0 and r_1; and in theory, just two measurements should suffice to find them. In practice, however, we expect to have some error in physical measurements. It is standard procedure to make more measurements than are theoretically necessary in the hope that the errors will roughly cancel each other out, in accordance with the laws of probability. Substitution of each data point (a_i, b_i) from Problem 1 into the equation y = r_0 + r_1 x gives a single linear equation in the two unknowns r_0 and r_1. The four data points of Problem 1 thus give rise to a linear system of four equations in only two unknowns. Such a linear system with more equations than unknowns is called overdetermined, and one expects to find that the system is inconsistent, having no actual solution. It will be our task to find values for the unknowns r_0 and r_1 that will come as close as possible, in some sense, to satisfying all four of the equations.
We have used the illustration presented in Problem 1 to introduce our goal in this section, and we will solve the problem in a moment. We first present two more hypothetical problems, which we will also solve later in the section.

PROBLEM 2  At a recent boat show, the observations listed in Table 6.2 were made relating the prices b_i of sailboats and their weights a_i. Plotting the data points (a_i, b_i), as shown in Figure 6.12, we might expect a quadratic function of the form

y = f(x) = r_0 + r_1 x + r_2 x^2

to fit the data fairly well.

TABLE 6.2

a_i = Weight in tons               2   4   5   8
b_i = Price in units of $10,000    1   3   5   12

PROBLEM 3  A population of rabbits on a large island was estimated each year from 1991 to 1994, giving the data in Table 6.3. Knowing that population growth is exponential in the absence of disease, predators, famine, and so on, we expect an exponential function

y = f(x) = re^{sx}

to provide the best representation of these data. Notice that, by using logarithms, we can convert this exponential function into a form linear in x:

ln y = ln r + sx.

TABLE 6.3

a_i = (Year observed) − 1990                1   2    3   4
b_i = Number of rabbits in units of 1000    3   4.5  8   17

FIGURE 6.12  Problem 2 data (price in units of $10,000 plotted against weight in tons).

The Method of Least Squares


Consider now the problem of finding a linear function f(x) = r_0 + r_1 x that best fits data points (a_i, b_i) for i = 1, 2, ..., m, where m > 2. Geometrically, this amounts to finding the line in the plane that comes closest, in some sense, to passing through the m data points. If there were no error in our measurements and our data were truly linear, then for some r_0 and r_1 we would have

b_i = r_0 + r_1 a_i  for  i = 1, 2, ..., m.

These m linear equations in the two unknowns r_0 and r_1 form an overdetermined system of equations that probably has no solution. Our data points actually satisfy a system of linear approximations, which can be expressed in matrix form as

\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \approx \begin{bmatrix} 1 & a_1 \\ 1 & a_2 \\ \vdots & \vdots \\ 1 & a_m \end{bmatrix}\begin{bmatrix} r_0 \\ r_1 \end{bmatrix}    (1)
      b                  A                   r

or simply as b ≈ Ar. We try to find an optimal solution vector r̄ for the system (1) of approximations. For each vector r, the error vector Ar − b measures how far our system (1) is from being a system of equations with solution vector r. The absolute values of the components of the vector Ar − b represent the vertical distances d_i = |r_0 + r_1 a_i − b_i|, shown in Figure 6.13.
We want to minimize, in some sense, our error vector Ar − b. A number of different methods for minimization are very useful. For example, one might want to minimize the maximum of the distances d_i. We study just one sense of

FIGURE 6.13  The distances d_i.

minimization, the one that probably seems most natural at this point. We will find r = r̄ to minimize the length ‖Ar − b‖ of our error vector. Minimizing ‖Ar − b‖ can be achieved by minimizing ‖Ar − b‖^2, which means minimizing the sum

d_1^2 + d_2^2 + ··· + d_m^2

of the squares of the distances in Figure 6.13. Hence the name method of least squares given to this procedure.
If a_1 and a_2 denote the columns of A in system (1), the vector Ar = r_0 a_1 + r_1 a_2 lies in the column space W = sp(a_1, a_2) of A. From Figure 6.14, we see geometrically that, of all the vectors Ar in W, the one that minimizes ‖Ar − b‖ is the projection b_W = Ar̄ of b on W. Recall that we proved this minimization property of b_W algebraically on page 328. Formula (3) in Section 6.4 shows that then Ar̄ = A(A^T A)^{-1}A^T b. Thus our optimal solution vector r̄ is

r̄ = (A^T A)^{-1}A^T b.    (2)
The 2 × 2 matrix A^T A is invertible as long as the columns of A are independent, as shown by Theorem 6.10. For our matrix A shown in system (1), this just means that not all the values a_i are the same. Geometrically, this corresponds to saying that our data points in the plane do not all lie on a vertical line.
Note that the equation r̄ = (A^T A)^{-1}A^T b can be rewritten as (A^T A)r̄ = A^T b. Thus r̄ appears as the unique solution of the consistent linear system

(A^T A)r = A^T b.    (3)

Unless the coefficient matrix A^T A is of very small size, it is more efficient when using a computer to solve the linear system (3), reducing an augmented matrix as usual, rather than to invert A^T A and then multiply it by A^T b as in Eq. (2). In our examples and exercises, A^T A will often be a 2 × 2 matrix whose inverse can be written down at once by taking 1/det(A^T A) times the adjoint of A^T A (see page 269), making the use of Eq. (2) practical. We summarize in a box, and proceed with examples.

FIGURE 6.14  The length ‖Ar − b‖.

Least-Squares Solution of Ar ≈ b

Let A be a matrix with independent column vectors. The least-squares solution r̄ of Ar ≈ b can be computed in either of the following ways:
1. Compute r̄ = (A^T A)^{-1}A^T b.
2. Solve (A^T A)r = A^T b.
When a computer is being used, the second method is more efficient.

EXAMPLE 1  Find the least-squares fit to the data points in Problem 1 by a straight line, that is, by a linear function y = r_0 + r_1 x.

SOLUTION  We form the system b ≈ Ar in system (1) for the data in Table 6.1:

\begin{bmatrix} 6.5 \\ 8.5 \\ 11.0 \\ 12.5 \end{bmatrix} \approx \begin{bmatrix} 1 & 2 \\ 1 & 4 \\ 1 & 5 \\ 1 & 6 \end{bmatrix}\begin{bmatrix} r_0 \\ r_1 \end{bmatrix}.

We have

A^T A = \begin{bmatrix} 4 & 17 \\ 17 & 81 \end{bmatrix}  and  (A^T A)^{-1} = \frac{1}{35}\begin{bmatrix} 81 & -17 \\ -17 & 4 \end{bmatrix},

so

r̄ = (A^T A)^{-1}A^T b = \frac{1}{35}\begin{bmatrix} 47 & 13 & -4 & -21 \\ -9 & -1 & 3 & 7 \end{bmatrix}\begin{bmatrix} 6.5 \\ 8.5 \\ 11.0 \\ 12.5 \end{bmatrix} = \frac{1}{35}\begin{bmatrix} 109.5 \\ 53.5 \end{bmatrix} \approx \begin{bmatrix} 3.1 \\ 1.5 \end{bmatrix}.

Therefore, the linear function that best fits the data in the least-squares sense is y ≈ f(x) = 1.5x + 3.1. This line and the data points are shown in Figure 6.15.
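The same computation takes only a few lines in MATLAB; the sketch below (our variable names) follows the boxed recommendation of solving the normal equations rather than inverting A^T A.

    % Least-squares line y = r0 + r1*x for the spring data of Table 6.1
    a = [2.0; 4.0; 5.0; 6.0];        % weights in ounces
    b = [6.5; 8.5; 11.0; 12.5];      % measured lengths in inches
    A = [ones(4,1) a];               % matrix of system (1)
    r = (A'*A) \ (A'*b)              % about [3.13; 1.53], rounded above to 3.1 and 1.5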
FIGURE 6.15  The least-squares fit of data points.

EXAMPLE 2  Use the method of least squares to fit the data in Problem 3 by an exponential function y = f(x) = re^{sx}.

SOLUTION  We use logarithms and convert the exponential equation to an equation that is linear in x:

ln y = ln r + sx.

TABLE 6.4

x = a_i          1     2     3     4
y = b_i          3     4.5   8     17
z_i = ln(b_i)    1.10  1.50  2.08  2.83

HISTORICAL NOTE A TECHNIQUE VERY CLOSE TO THAT OF LEAST SQUARES was developed by
Roger Cotes (1682-1716), the gifted mathematician who edited the second edition of Isaac
Newton's Principia, in a work dealing with errors in astronomical observations, written around 1715.
The complete principle, however, was first formulated by Carl Gauss at around the age of 16
while he was adjusting approximations in dealing with the distribution of prime numbers. Gauss
later stated that he used the method frequently over the years—for example, when he did
calculations concerning the orbits of asteroids. Gauss published the method in 1809 and gave a
definitive exposition 14 years later.
On the other hand, it was Adrien-Marie Legendre (1752-1833), founder of the theory of
elliptic functions, who first published the method of least squares, in an 1806 work on determining
the orbits of comets. After Gauss’s 1809 publication, Legendre wrote to him, censuring him for
claiming the method as his own. Even as late as 1827, Legendre was still berating Gauss for
“appropriating the discoveries of others.” In fact, the problem lay in Gauss's failure to publish
many of his discoveries promptly; he mentioned them only after they were published by others.

From Table 6.3, we obtain the data in Table 6.4 to use for our logarithmic equation. We obtain

A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix},  A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix} = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix},

(A^T A)^{-1} = \frac{1}{20}\begin{bmatrix} 30 & -10 \\ -10 & 4 \end{bmatrix} = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 1/5 \end{bmatrix},

and

(A^T A)^{-1}A^T = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 1/5 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 1/2 & 0 & -1/2 \\ -3/10 & -1/10 & 1/10 & 3/10 \end{bmatrix}.

Multiplying this last matrix on the right by

\begin{bmatrix} 1.10 \\ 1.50 \\ 2.08 \\ 2.83 \end{bmatrix},

we obtain, from Eq. (2),

\begin{bmatrix} \ln r \\ s \end{bmatrix} \approx \begin{bmatrix} .435 \\ .577 \end{bmatrix}.

Thus r = e^{.435} ≈ 1.54, and we obtain y = f(x) = 1.54e^{.577x} as a fitting exponential function.
The graph of the function and of the data points in Table 6.5 is shown in Figure 6.16. On the basis of the function f(x) obtained, we can project the population of rabbits on the island in the year 2000 to be about

f(10) · 1000 ≈ 494,000 rabbits,

unless predators, disease, or lack of food interferes.

TABLE 6.5

a_i   b_i    f(a_i)
1     3      2.7
2     4.5    4.9
3     8      8.7
4     17     15.5
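A short MATLAB sketch of the log-transformed fit in Example 2 (variable names ours); the line is fitted to ln y and then converted back to exponential form.

    % Exponential fit y = r*exp(s*x) for the rabbit data, via logarithms
    a = [1; 2; 3; 4];
    b = [3; 4.5; 8; 17];
    A = [ones(4,1) a];
    c = (A'*A) \ (A'*log(b));   % least-squares fit of ln y = ln r + s*x
    r = exp(c(1))               % about 1.54
    s = c(2)                    % about 0.58
    yr2000 = r*exp(s*10)*1000   % projected rabbit population for x = 10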

In Example 2, we used the least-squares method with the logarithm of our original y-coordinate data. Thus, it is the equation ln y = ln r + sx that is the least-squares fit of the logarithmic data. Using the least-squares method to fit the logarithm of the y-coordinate data produces an exponential function that approximates the smaller y-values in the data better than it does the larger y-values, as illustrated by Table 6.5. It can be shown that the exponential

FIGURE 6.16  Data points and the exponential fit y = 1.54e^{.577x}.

fit y = f(x) = re^{sx} obtained from the least-squares fit of ln y = ln r + sx amounts roughly to minimizing the percentage error in taking f(a_i) for b_i. The least-squares fit of logarithmic data does not yield the least-squares fit of the original data. The routine YOUFIT in LINTEK can be used to illustrate this. (See Exercises 29 and 30.)

Least-Squares Solutions for Larger Systems


Data points of more than two components may lead to overdetermined linear systems in more than two unknowns. Suppose that an experiment is repeated m times, and data values

b_i, a_{i1}, ..., a_{in}

are obtained from measurements on the ith experiment. For example, the data values a_{ij} might be ones that can be controlled and the value b_i measured. Of course, there may be errors in the controlled as well as in the measured values. Suppose that we have reason to believe that the data obtained for each experiment should theoretically satisfy the same linear relation

y = r_0 + r_1 x_1 + r_2 x_2 + ··· + r_n x_n,    (4)

with y = b_i and x_j = a_{ij}. We then obtain a system of m linear approximations in n + 1 unknowns r_0, r_1, r_2, ..., r_n:

\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \approx \begin{bmatrix} 1 & a_{11} & a_{12} & \cdots & a_{1n} \\ 1 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & & \vdots \\ 1 & a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} r_0 \\ r_1 \\ \vdots \\ r_n \end{bmatrix}.    (5)

If m > n + 1, then system (5) corresponds to an overdetermined linear system b = Ar, which probably has no exact solution. If the rank of A is n + 1, then a repetition of our geometric argument above indicates that the least-squares solution r̄ for the system b ≈ Ar is given by

r̄ = (A^T A)^{-1}A^T b.    (6)

Again, it is more efficient computationally to solve (A^T A)r = A^T b.
A linear system of the form shown in system (5) arises if m data points (a_i, b_i) are found and a least-squares fit is sought for a polynomial function

y = r_0 + r_1 x + ··· + r_{n-1}x^{n-1} + r_n x^n.

The data point (a_i, b_i) leads to the approximation

b_i ≈ r_0 + r_1 a_i + ··· + r_{n-1}a_i^{n-1} + r_n a_i^n.

The m data points thus give rise to a linear system of the form of system (5), where the matrix A is given by

A = \begin{bmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^n \\ 1 & a_2 & a_2^2 & \cdots & a_2^n \\ \vdots & & & & \vdots \\ 1 & a_m & a_m^2 & \cdots & a_m^n \end{bmatrix}.    (7)

EXAMPLE 3  Use a computer to find the least-squares fit to the data in Problem 2 by a parabola, that is, by a quadratic function

y = r_0 + r_1 x + r_2 x^2.

SOLUTION  We write the data in the form b ≈ Ar, where A has the form of matrix (7):

\begin{bmatrix} 1 \\ 3 \\ 5 \\ 12 \end{bmatrix} \approx \begin{bmatrix} 1 & 2 & 4 \\ 1 & 4 & 16 \\ 1 & 5 & 25 \\ 1 & 8 & 64 \end{bmatrix}\begin{bmatrix} r_0 \\ r_1 \\ r_2 \end{bmatrix}.

Entering A and b in either LINTEK or MATLAB, we find that

A^T A = \begin{bmatrix} 4 & 19 & 109 \\ 19 & 109 & 709 \\ 109 & 709 & 4993 \end{bmatrix}  and  A^T b = \begin{bmatrix} 21 \\ 135 \\ 945 \end{bmatrix}.

Solving the linear system (A^T A)r = A^T b using either package then yields

r̄ ≈ \begin{bmatrix} .207 \\ .010 \\ .183 \end{bmatrix}.

Thus, the quadratic function that best approximates the data in the least-squares sense is

y = f(x) = .207 + .010x + .183x^2.

In Figure 6.17, we show the graph of this quadratic function and plot the data points. Data are given in Table 6.6. On the basis of the least-squares fit, we might project that the price of a 10-ton sailing yacht would be about 18.6 times $10,000, or $186,000, and that the price of a 20-ton yacht would be about f(20) times $10,000, or about $736,000. However, one should be wary of using a function f(x) that seems to fit data well for measured values of x to project data for values of x far from those measured values. Our quadratic function seems to fit our data quite well, but we should not expect the cost we might project for a 100-ton ship to be at all accurate.

TABLE 6.6

a_i   b_i   f(a_i)
2     1     .959
4     3     3.17
5     5     4.83
8     12    12.0

FIGURE 6.17  The graph and data points for Example 3 (y = .207 + .010x + .183x^2).
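The normal-equations computation reported in Example 3 can be reproduced with the following MATLAB sketch (variable names ours):

    % Least-squares parabola y = r0 + r1*x + r2*x^2 for the sailboat data
    a = [2; 4; 5; 8];
    b = [1; 3; 5; 12];
    A = [ones(4,1) a a.^2];      % matrix of the form (7) with n = 2
    r = (A'*A) \ (A'*b)          % about [0.207; 0.010; 0.183]
    f10 = [1 10 100] * r         % projected price of a 10-ton yacht, about 18.6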

EXAMPLE 4  Show that the least-squares fit to data points (a_i, b_i) for i = 1, 2, ..., m by a constant function f(x) = r_0 is achieved when r_0 is the average y-value

(b_1 + b_2 + ··· + b_m)/m

of the data values b_i.

SOLUTION  Such a constant function y = f(x) = r_0 has a horizontal line as its graph. In this situation, we have just the one unknown r_0, and system (5) becomes

\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix} \approx \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}[r_0].

Thus A is the m × 1 matrix with all entries 1. We easily find that A^T A = [m], so (A^T A)^{-1} = [1/m]. We also find that A^T b = [b_1 + b_2 + ··· + b_m]. Thus, the least-squares solution [Eq. (6)] is

r_0 = (b_1 + b_2 + ··· + b_m)/m,

as asserted.

Overdetermined Systems of Linear Equations


We have presented examples in which least-squares solutions were found to systems of linear approximations arising from applications. We used r as the column vector of unknowns in those examples, because we were using x as the independent variable in the formula for a fitting function f(x). We now discuss the mathematics of overdetermined systems, and we use x as the column vector of unknowns again.
Consider a general overdetermined system of linear equations

Ax = b,

where A is an m × n matrix of rank n and m > n. We expect such a system to be inconsistent, that is, to have no solution. As we have seen, the system of linear approximations Ax ≈ b has least-squares solution

x̄ = (A^T A)^{-1}A^T b.    (8)

The vector x̄ is thus the least-squares solution of the overdetermined system Ax = b. Equation (8) can be written in the equivalent form

A^T A x̄ = A^T b,    (9)

which exhibits x̄ as the solution of the consistent linear system A^T Ax = A^T b.

EXAMPLE 5  Find the least-squares approximate solution of the overdetermined linear system

\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 1 & 3 & 0 \\ 0 & 2 & 1 \\ -1 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \\ 0 \end{bmatrix}

by transforming it into the consistent form [system (9)].

SOLUTION  The consistent system [system (9)] is

\begin{bmatrix} 1 & 2 & 1 & 0 & -1 \\ 0 & 1 & 3 & 2 & 2 \\ 1 & 1 & 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 1 \\ 1 & 3 & 0 \\ 0 & 2 & 1 \\ -1 & 2 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 1 & 0 & -1 \\ 0 & 1 & 3 & 2 & 2 \\ 1 & 1 & 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \\ 0 \end{bmatrix}

or

\begin{bmatrix} 7 & 3 & 4 \\ 3 & 18 & 1 \\ 4 & 1 & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 7 \\ 3 \end{bmatrix},

whose solution is found, using MATLAB or LINTEK, to be

\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} ≈ \begin{bmatrix} -.614 \\ .421 \\ 1.259 \end{bmatrix},

accurate to three decimal places. This is the least-squares approximate solution x̄ to the given overdetermined system.
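The transformation to system (9) can also be carried out directly in MATLAB; a minimal sketch for the data of Example 5 (variable names ours) follows. The built-in backslash operator applied to the rectangular system returns the same least-squares solution.

    % Least-squares solution of the overdetermined system in Example 5
    A = [1 0 1; 2 1 1; 1 3 0; 0 2 1; -1 2 -1];
    b = [1; 0; 1; 2; 0];
    x = (A'*A) \ (A'*b)      % about [-0.614; 0.421; 1.259]
    x2 = A \ b;              % backslash on the rectangular system gives the same result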

The Orthogonal Case


To obtain the least-squares linear fit y = r_0 + r_1 x of k data points (a_i, b_i), we form the matrix

A = \begin{bmatrix} 1 & a_1 \\ 1 & a_2 \\ \vdots & \vdots \\ 1 & a_k \end{bmatrix}    (10)

and compute the least-squares solution vector r̄ = (A^T A)^{-1}A^T b. If the set of column vectors in A were orthonormal, then A^T A would be the 2 × 2 identity matrix I. Of course, this is never the case, because the first column vector of A has length √k. However, it may happen that the column vectors in A are mutually perpendicular; we can see then that

A^T A = \begin{bmatrix} k & 0 \\ 0 & a \cdot a \end{bmatrix},  so  (A^T A)^{-1} = \begin{bmatrix} 1/k & 0 \\ 0 & 1/(a \cdot a) \end{bmatrix}.

The matrix A has this property if the x-values a_1, a_2, ..., a_k for the data are symmetrically positioned about zero. We illustrate with an example.

EXAMPLE 6  Find the least-squares linear fit of the data points (−3, 8), (−1, 5), (1, 3), and (3, 0).

SOLUTION  The matrix A is given by

A = \begin{bmatrix} 1 & -3 \\ 1 & -1 \\ 1 & 1 \\ 1 & 3 \end{bmatrix}.

We can see why the symmetry of the x-values about zero causes the column vectors of this matrix to be orthogonal. We find that

A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -3 & -1 & 1 & 3 \end{bmatrix}\begin{bmatrix} 1 & -3 \\ 1 & -1 \\ 1 & 1 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 20 \end{bmatrix}.

Then

r̄ = (A^T A)^{-1}A^T b = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/20 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ -3 & -1 & 1 & 3 \end{bmatrix}\begin{bmatrix} 8 \\ 5 \\ 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ -1.3 \end{bmatrix}.

Thus, the least-squares linear fit is given by y = 4 − 1.3x.

Suppose now that the x-values in the data are not symmetrically positioned about zero, but that there is a number x = c about which they are symmetrically located. If we make the variable transformation t = x − c, then the t-values for the data are symmetrically positioned about t = 0. We can find the t,y-equation for the least-squares linear fit, and then replace t by x − c to obtain the x,y-equation. (Exercises 14 and 15 refine this idea still further.)

EXAMPLE 7  The number of sales of a particular item by a manufacturer in each of the first four months of 1995 is given in Table 6.7. Find the least-squares linear fit of these data, and use it to project sales in the fifth month.

TABLE 6.7

a_i = Month             1    2    3    4
b_i = Thousands sold    2.5  3    3.8  4.5

SOLUTION  Our x-values a_i are symmetrically located about c = 2.5. If we take t = x − 2.5, the data points (t, y) become

(−1.5, 2.5), (−.5, 3), (.5, 3.8), (1.5, 4.5).

We find that

A^T A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ -1.5 & -.5 & .5 & 1.5 \end{bmatrix}\begin{bmatrix} 1 & -1.5 \\ 1 & -.5 \\ 1 & .5 \\ 1 & 1.5 \end{bmatrix} = \begin{bmatrix} 4 & 0 \\ 0 & 5 \end{bmatrix}.

Thus,

r̄ = (A^T A)^{-1}A^T b = \begin{bmatrix} 1/4 & 0 \\ 0 & 1/5 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ -1.5 & -.5 & .5 & 1.5 \end{bmatrix}\begin{bmatrix} 2.5 \\ 3 \\ 3.8 \\ 4.5 \end{bmatrix} = \begin{bmatrix} 3.45 \\ .68 \end{bmatrix}.

Thus, the t,y-equation is y = 3.45 + .68t. Replacing t by x − 2.5, we obtain

y = 3.45 + .68(x − 2.5) = 1.75 + .68x.

Setting x = 5, we estimate sales in the fifth month to be 5.15 units, or 5150 items.
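The shift to symmetric t-values keeps A^T A diagonal, so the two unknowns separate. A brief MATLAB sketch of Example 7 (variable names ours):

    % Least-squares line for the sales data of Example 7, using t = x - 2.5
    x = [1; 2; 3; 4];
    y = [2.5; 3; 3.8; 4.5];
    t = x - mean(x);                     % t-values symmetric about zero
    A = [ones(4,1) t];                   % A'*A is diagonal: [4 0; 0 5]
    r = (A'*A) \ (A'*y)                  % [3.45; 0.68]
    sales5 = r(1) + r(2)*(5 - mean(x))   % month-5 estimate: 5.15 (thousand items)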

Using the QR-Factorization


The QR-factorization discussed at the end of Section 6.2 has application to the method of least squares. Consider an overdetermined system Ax = b, where the matrix A has independent column vectors. We factor A as QR and remember that Q^T Q = I, because Q has orthonormal column vectors. Then we obtain

(A^T A)^{-1}A^T b = ((QR)^T QR)^{-1}(QR)^T b = (R^T Q^T QR)^{-1}R^T Q^T b
                = (R^T R)^{-1}R^T Q^T b = R^{-1}(R^T)^{-1}R^T Q^T b = R^{-1}Q^T b.    (11)

To find the least-squares solution vector R^{-1}Q^T b once Q and R have been found, we first multiply b by Q^T, which is a stable computation because Q is orthogonal. We then solve the upper-triangular system

Rx = Q^T b    (12)

by back substitution. Because R is already upper triangular, no matrix reduction is needed! The routine QRFACTOR in LINTEK has an option for this least-squares computation. The program demonstrates how quickly the least-squares solution vector can be found once a QR-factorization of A is known.
Suppose we believe that the output y of a process should be a linear function of time t. We measure the y-values at specified times during the process. We can then find the least-squares linear fit y = r_1 + s_1 t from the measured data by finding the QR-factorization of an appropriate matrix A and subsequently finding the solution vector [Eq. (11)] in the way we just described. If we repeat the process later and make measurements again at the same time intervals, the matrix A will not change, so we can find another least-squares linear fit using the solution vector with the same matrices Q and R. We can repeat this often, obtaining linear fits r_1 + s_1 t, r_2 + s_2 t, ..., r_m + s_m t. For our final estimate of the proper linear fit for this process, we might take r + st, where r is the average of the m values r_i and s is the average of the m values s_i. We expect that errors made in measurements will roughly cancel each other out over numerous repetitions of the process, so that our estimate of linear fit r + st will be fairly accurate.
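In MATLAB the full factorization returned by qr can be trimmed to the sizes needed in Eq. (12), as the exercises below also suggest. The following sketch reuses the data of Example 7 purely for illustration (variable names ours).

    % Least-squares solution via a QR-factorization, solving Rx = Q'b
    A = [ones(4,1) (1:4)'];          % times t = 1, 2, 3, 4
    b = [2.5; 3; 3.8; 4.5];          % measured outputs
    [Q, R] = qr(A);                  % Q is 4 x 4, R is 4 x 2
    k = size(A, 2);
    x = R(1:k,1:k) \ (Q(:,1:k)'*b)   % back substitution on the k x k triangular block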

SUMMARY

1. A set of k data points (a_i, b_i) whose first coordinates are not all equal can be fitted with a polynomial function or with an exponential function, using the method of least squares illustrated in Examples 1 through 3.
2. Suppose that the x-values a_i in a set of k data points (a_i, b_i) are symmetrically positioned about zero. Let A be the k × 2 matrix with first column vector having all entries 1 and second column vector a. The columns of A are orthogonal, and

A^T A = \begin{bmatrix} k & 0 \\ 0 & a \cdot a \end{bmatrix}.

Computation of the least-squares linear fit for the data points is then simplified. If the x-values of the data points are symmetrically positioned about x = c, then the substitution t = x − c gives data points with t-values symmetrically positioned about zero, and the above simplification applies. See Example 7.

Let Ax = b be a linear system of m equations in n unknowns, where m > n (an overdetermined system) and where the rank of A is n.

3. The least-squares solution of the corresponding system Ax ≈ b of linear approximations is the vector x = x̄ which minimizes the magnitude of the error vector Ax − b for x ∈ R^n.
4. The least-squares solution x̄ of Ax ≈ b and of Ax = b is given by the formula

x̄ = (A^T A)^{-1}A^T b.

Geometrically, Ax̄ is the projection of b on the column space of A.
5. An alternative to using the formula for x̄ in summary item 4 is to convert the overdetermined system Ax = b to the consistent system (A^T A)x = A^T b by multiplying both sides of Ax = b by A^T, and then to find its unique solution, which is the least-squares solution x̄ of Ax = b.

EXERCISES

1. Let the length b of a spring with an attached b. Use the answer to part (a) to estimate the
weight a, be determined by measurements, as profit if the sales force is reduced to 25.
shown in Table 6.8. c. Does the profit obtained using the answer
a. Find the least-squares linear fit, in to part (a) for a sales force of 0 people
accordance with Hooke’s law. seem in any way plausible?
b. Use the answer to part (a) to estimate the
length of the spring if a weight of 5 In Exercises 5-7. find the least-squares fit to the
ounces is attached. given data by a linear function f(x) = ro + rx.
TABLE 6.8
Graph the linear function and the data points.
5. (0, 1), (2, 6), (3, 11), (4, 12)
a,= Weight in ounces 1 2 4 6 6. (1, 1), (2, 4), (3, 6), (4, 9)
b; = Length in inches 3. 4.1 5.9 = 8.2
7. (0, 0), (1, 1), (2, 3), (3, 8)
8. Find the least-squares fit to the data in
. A company had profits (in units of $19,900) Exercise 6 by a parabola (a quadratic
ta

of 0.5 in 1989, | in 1991, and 2 in 1994. Let polynomial function).


time ¢ be measured in years, with ¢ = 0 in 9. Repeat Exercise 8, but use the data in
1989. Exercise 7 instead.
a. Find the least-squares linear fit of the
data. - In Exercises 10-13, use the technique illustrated
b. Using the answer to part (a), estimate the in Examples 6 and 7 to solve the least-squares
profit in 1995. problem.
3. Repeat Exercise 2, but find an exponential fit
10. Find the least-squares linear fit to the data
of the data, working with logarithms as
explained in Example 2. points (—4, —2), (—2, 9), (0, 1). (2, 4), (4, 5).
11. Find the least-squares linear fit to the data
4. A publishing company specializing in college
points (0, 1), (1, 4), (2, 6), (3, 8), (4, 9).
texts starts with a field sales force of ten
people, and it has profits of $100,000. On 12. The gallons of maple syrup made from the
increasing this sales force to 20, it has profits sugar bush of a Vermont farmer over the
of $300,000; and increasing its sales force to past five years were:
30 produces profits of $400,000. 80 gallons five years ago,
a. Find the least-squares linear fit for these 70 gallons four years ago,
data. {Hint: Express the numbers of 75 gallons three vears ago,
salespeople in multiples of 10 and the 65 gallons two years ago,
profit in multiples of $100,000} 60 gallons last year.
r
Use a least-squares linear fit of these data to 2 -! 1 2
project the number of gallons that will be l 0 If, l
produced this year. (Does this problem make 20. | -1 0 I}{[x,)=]2
practical sense? Why?) 0 —3 -II)x, 5
. The minutes required for a rat to find its | 2 -2 0
way out of a maze on each repeated attempt 21. Mark each of the following True or False.
were 8, 8, 6, 5, and 6 on its first five tries. —_— 2a. There is a unique polynomial function of
Use a least-squres linear fit of these data to degree k with graph passing through any
project the time the rat will take on its sixth given k points in R’.
try. — b. There is a unique polynomial function of
14. Let (a,, 5,). (@, By), - - 5 (Amr On) be data degree k with graph passing through any
points. If 2”, a, = 0, show that the line that k points in R? having distinct first
best fits the data in the least-squares sense is coordinates.
given oy r, + rx, where . There is a unique polynomial function of

(ap
degree at most / — | with graph passing
through any k points in R? having distinct
first coordinates.
— d. The least-squares solution of Ax = b is

= Goalie)
and unique.
— ¢. [he least-squares solution of Ax = b can
ft!
be an actual solution only if A is a square
matnx.
15. Repeat Exercise !4, but do not assume that . The least-squares solution vector of
x”, a, = 0. Show that the least-squares linear Ax = bis the projection of b on the
fit of the data is given by y = r, + r(x — 0), column space of A.
where c = (2%, a,)/m and r, and r, have ___ g. The least-squares solution vector of
the values given in Exercise 14. Ax = b is the vector x such that Ax is the
projection of b on the column space of A.
In Exercises 16-20, find the least-squares solution —_— h. The least-squares so!ution vector of
of the given overdetermined system Ax = b by Ax = b is the vector x = x that
converting it to a consistent system and then minimizes the magnitude of Ax — b.
solving, as illustrated in Example 5. i. A linear system has a least-squares
solution only if the number of equations

Teak
is greater than the number of unknowns.
. Every linear system has a least-squares
16.
solution.

Ale
—_S
=

a In Exercises 22-26, use LINTEK or MATLAB to


sl

17.
wt x

find the least-squares solution of the linear system.


NO
a

If matrix files are available, the matrix A and the


1 log x 0] vector b for Exercise 22 are denoted in the file
18.
-! oO 4q{JC 1 by A22 and b22, and similarly for Exercise
1-1 olf*%2] |~1 23-26. Recall that in MATLAB, the command
f O 1-1] | -24 rref([A’sA A’sb]) will row-reduce the augmented
1 1 3] ry matrix [ATA | A’b].
-1 O 5S\fPxq J-t
22. Exercise 16 25. Exercise 19
23. Exercise 17 26. Exercise 20
rt oO FF 0 24. Exercise 18

The routine YOUFIT in LINTEK can be used to executing back substitution on Rx = Q™b as in
illustrate graphically the fitting of data points by Eq. (12).
linear, quadratic, or exponential functions. In Recall that in MATLAB, if A is n x k, then
Exercises 27-31, use YOUFIT to try visually to fit Q isn X nand Risnx k. Cutting Qand R
the given Gata with the indicated type of graph. down to the text sizesn x kandk x k,
When inis is done, enter the zera data suggested respectively, we can use the command lines
to see the computer's fit. Run twice more with the
same data points but without trying to fit the data [Q R] = qr(A); fu, k] = size(A);
visually, and determine whether the data are best rref([R(1:k,1:k) Q(:,1:k)‘sb])
futed by a linear, quadratic, or {logarithmically
fated) exponential function, by comparing the fo compute the solution of Rx = Q"b.
least-squares sums for the three cases. Use LINTEK or MATLAB in this fashion for
Exercises 32-37. You must supply the matrix A
27. Fit (1, 2), (4, 6), (7, 10), (10, 14), (14, 19) by and the vector b.
a linear function.
32. Find the least-squares linear fit for the data
28. Fit (2, 2), (6, 10), (10, 12), (16, 2) by a
points (—3, 10), (-2, 8), (—1, 7), (0, 6),
quadratic function.
(1, 4), (2, 5), (G, 6).
29. Fit (1, 1), (10, 8), (14, 12), (16, 20) by an
33. Find the ieast-squares quadratic at for the
exponential function. Try to achieve a lower
data points in Exercise 32.
squares sum than the computer obtains with
its least-squares fit tbat uses logarithms of 34. Find the least-squares cubic fit for the data
y-values. points in Exercise 32.
30. Repeat Exercise 29 with data (1, 9), (5, 1), 35. Find the least-squares quartic fit for the data
(6, .5), (9, .01). points in Exercise 32.
31. Fit (2, 9), (4, 6), (7, 1), (8, .1) by a linear 36. Find the quadratic polynomial function
function. whose graph passes through the points (1, 4),
(2, 15), (3, 32).
The routine QRFACTOR in LINTEK has an 37. Find the cubic polynomial function whose
option to use a QR-factorization of A to find the graph passes through the points (— 1, 13),
least-squares solution of a linear system Ax = b, (0, —5), (2, 7), (3, 25).
CHAPTER 7

CHANGE OF BASIS

Recall from Section 3.4 that a linear transformation is a mapping of one vector space into another that preserves addition and scalar multiplication. Two vector spaces V and V′ are isomorphic if there exists a linear transformation of V onto V′ that is one-to-one. Of particular importance are the coordinatization isomorphisms of an n-dimensional vector space V with R^n. One chooses an ordered basis B for V and defines T: V → R^n by taking T(v) = v_B, the coordinate vector of v relative to B, as described in Section 3.3. Such an isomorphism describes V and R^n as being virtually the same as vector spaces. Thus much of the work in finite-dimensional vector spaces can be reduced to computations in R^n. We take full advantage of this feature as we continue our study of linear transformations in this chapter.
We have seen that bases other than the standard basis E of R^n can be useful. For example, suppose A is an n × n matrix. Changing from the standard basis of R^n to a basis of eigenvectors (when this is possible) facilitates computation of the powers A^k of A using the more easily computed powers of a diagonal matrix, as described in Section 5.2. Our work in the preceding chapter gives another illustration of a desirable change of basis: recall the convenience that an orthonormal basis can provide. We expect that changing to a new basis will change coordinate vectors and matrix representations of linear transformations.
In Section 7.1 we review the notion of the coordinate vector v_B relative to an ordered basis B, and consider the effect of changing the ordered basis from B to B′. With this backdrop we turn to matrix representations of a linear transformation in Section 7.2, examining the relationship between matrix representations of a linear transformation T: V → V′ relative to different ordered bases of V and of V′.

7.1 COORDINATIZATION AND CHANGE OF BASIS

Let B = (b_1, b_2, ..., b_n) be an ordered basis for a vector space V. Recall that if v is a vector in V and v = r_1 b_1 + r_2 b_2 + ··· + r_n b_n, then the coordinate vector of v relative to B is

v_B = [r_1, r_2, ..., r_n].

For example, the coordinate vector of cos²x relative to the ordered basis (1, sin²x) in the vector space sp(1, sin²x) is [1, −1]. If this ordered basis is changed to some other ordered basis, the coordinate vector may change. In this section we consider the relationship between two coordinate vectors v_B and v_{B′} of the same vector v.
We begin our discussion with R^n, which we will view as a space of column vectors. Let v be a vector in R^n with coordinate vector

v_B = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}

relative to an ordered basis B = (b_1, b_2, ..., b_n), so that

r_1 b_1 + r_2 b_2 + ··· + r_n b_n = v.    (1)

Let M_B be the matrix having the vectors in the ordered basis B as column vectors; this is the basis matrix for B, which we display as

M_B = \begin{bmatrix} | & | & & | \\ b_1 & b_2 & \cdots & b_n \\ | & | & & | \end{bmatrix}.    (2)

Equation (1), which expresses v as a vector in the column space of the matrix M_B, can be written in the form

M_B v_B = v.    (3)

If B′ is another ordered basis for R^n, we can similarly obtain

M_{B′} v_{B′} = v.    (4)

Equations (3) and (4) together yield

M_{B′} v_{B′} = M_B v_B.    (5)

Equation (5) is easy to remember, and it turns out to be very useful. Both M_B and M_{B′} are invertible, because they are square matrices whose column vectors are independent. Thus, Eq. (5) yields

v_{B′} = M_{B′}^{-1} M_B v_B.    (6)

Equation (6) shows that, given any two ordered bases B and B′ of R^n, there exists an invertible matrix C, namely,

C = M_{B′}^{-1} M_B,    (7)

such that, for all v in R^n,

v_{B′} = C v_B.    (8)

If we know this matrix C, we can conveniently convert coordinates relative to B into coordinates relative to B′. The matrix C in Eq. (7) is the unique matrix satisfying Eq. (8). This can be seen by assuming that D is also a matrix satisfying Eq. (8), so that v_{B′} = D v_B; then C v_B = D v_B for all vectors v_B in R^n. Exercise 41 in Section 1.3 shows that we must have C = D.
The matrix C in Eq. (7) and Eq. (8) is computed in terms of the ordered bases B and B′, and we will now change to the subscripted notation C_{B,B′} to suggest this dependency. Thus Eq. (8) becomes

v_{B′} = C_{B,B′} v_B,

and the subscripts on C, read from left to right, indicate that we are changing from coordinates relative to B to coordinates relative to B′.
Because every n-dimensional vector space is isomorphic to R^n, all our results here are valid for coordinates with respect to ordered bases B and B′ in any finite-dimensional vector space. We phrase the definition that follows in these terms.

DEFINITION 7.1  Change-of-Coordinates Matrix

Let B and B′ be ordered bases for a finite-dimensional vector space V. The change-of-coordinates matrix from B to B′ is the unique matrix C_{B,B′} such that C_{B,B′} v_B = v_{B′} for all vectors v in V.

The term transition matrix is used in some texts in place of change-of-coordinates matrix. But we used transition matrix with reference to Markov chains in Section 1.7, so we avoid duplicate terminology here by using the more descriptive term change-of-coordinates matrix.
Equation (8), written in the form C^{-1} v_{B′} = v_B, shows that the inverse of the change-of-coordinates matrix from B to B′ is the change-of-coordinates matrix from B′ to B; that is, C_{B′,B} = C_{B,B′}^{-1}.

Equation (7) shows us exactly what the change-of-coordinates matrix must be for any two ordered bases B and B′ in R^n. A direct way to compute this product M_{B′}^{-1}M_B, if M_{B′}^{-1} is not already available, is to form the partitioned matrix

[M_{B′} | M_B]

and reduce it (by using elementary row operations) to the form [I | C]. We can regard this reduction as solving n linear systems, one for each column vector of M_B and all having the same coefficient matrix M_{B′}. From this perspective, the matrix M_{B′} times the jth column vector of C (that is, the jth "solution vector") must yield the jth column vector of M_B. This shows that M_{B′}C = M_B. Consequently, we must have C = M_{B′}^{-1}M_B = C_{B,B′}. We now have a convenient procedure for finding a change-of-coordinates matrix.
Finding the Change-of-Coordinates Matrix C_{B,B′} from B to B′ in R^n

Let B = (b_1, b_2, ..., b_n) and B′ = (b_1′, b_2′, ..., b_n′) be ordered bases of R^n. The change-of-coordinates matrix from B to B′ is the matrix C_{B,B′} obtained by reducing the partitioned matrix [M_{B′} | M_B] to [I | C_{B,B′}].

EXAMPLE 1  Let B = ([1, 1, 0], [2, 0, 1], [1, −1, 0]) and let E = (e_1, e_2, e_3) be the standard ordered basis of R^3. Find the change-of-coordinates matrix C_{E,B} from E to B, and use it to find the coordinate vector v_B of

v = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}

relative to B.

SOLUTION  Following the boxed procedure, we place the "new" basis matrix to the left and the "old" basis matrix to the right in an augmented matrix, and then we proceed with the row reduction:

\left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -2 & -2 & -1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right]
\sim \left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 1/2 & -1/2 & 0 \\ 0 & 0 & -1 & -1/2 & 1/2 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1/2 & 1/2 & -1 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1/2 & -1/2 & -1 \end{array}\right].

(The left block is M_B and the right block is M_E.) Thus the change-of-coordinates matrix from E to B is

C_{E,B} = \begin{bmatrix} 1/2 & 1/2 & -1 \\ 0 & 0 & 1 \\ 1/2 & -1/2 & -1 \end{bmatrix}.

The coordinate vector of v = [2, 3, 4]^T relative to B is

v_B = C_{E,B} v = \begin{bmatrix} 1/2 & 1/2 & -1 \\ 0 & 0 & 1 \\ 1/2 & -1/2 & -1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} -3/2 \\ 4 \\ -9/2 \end{bmatrix}.
In the solution to Example 1, notice that C_{E,B} is the inverse of the matrix appearing to the left of the partition in the original augmented matrix; that is, C_{E,B} = M_B^{-1}. Taking the inverse of both matrices, we see that C_{B,E} = M_B. Of course, this is true in general, because by our boxed procedure, we find C_{B,E} by reducing the matrix [M_E | M_B] = [I | M_B], and this augmented matrix is already in reduced row-echelon form.
In order to find a change-of-coordinates matrix C_{B,B′} for bases B and B′ in an n-dimensional vector space V other than R^n, we choose a convenient ordered basis for V and coordinatize the vectors in V relative to that basis, making V look just like R^n. We illustrate this procedure with the vector space P_2 of polynomials of degree at most 2, showing how the work can be carried out with coordinate vectors in R^3.
EXAMPLE 2  Let B = (x², x, 1) and B′ = (x² − x, 2x² − 2x + 1, x² − 2x) be ordered bases of the vector space P_2. Find the change-of-coordinates matrix from B to B′, and use it to find the coordinate vector of 2x² + 3x − 1 relative to B′.

SOLUTION  Let us use the ordered basis B = (x², x, 1) to coordinatize polynomials in P_2. Identifying each polynomial a_2 x² + a_1 x + a_0 with its coordinate vector [a_2, a_1, a_0] relative to the basis B, we obtain the following correspondence from P_2 to R^3:

        Polynomial Basis in P_2                         Coordinate Basis in R^3
Old     B = (x², x, 1)                                  ([1, 0, 0], [0, 1, 0], [0, 0, 1])
New     B′ = (x² − x, 2x² − 2x + 1, x² − 2x)            ([1, −1, 0], [2, −2, 1], [1, −2, 0])

Working with coordinate vectors in R^3, we compute the desired change-of-coordinates matrix, as described in the box preceding Example 1:

\left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ -1 & -2 & -2 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 1 & 1 & 0 \\ 0 & 1 & 0 & 0 & 0 & 1 \end{array}\right]
\sim \left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & -2 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 1 & 1 & 0 \end{array}\right] \sim \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & 1 & -2 \\ 0 & 1 & 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -1 & -1 & 0 \end{array}\right].

(New basis on the left, old basis on the right.) The change-of-coordinates matrix is

C_{B,B′} = \begin{bmatrix} 2 & 1 & -2 \\ 0 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}.

We compute the coordinate vector of v = 2x² + 3x − 1 relative to B′, using C_{B,B′} and the coordinate vector of v relative to B, as follows:

v_{B′} = C_{B,B′} v_B = \begin{bmatrix} 2 & 1 & -2 \\ 0 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 9 \\ -1 \\ -5 \end{bmatrix}.

As a check, we have

2x² + 3x − 1 = 9(x² − x) − 1(2x² − 2x + 1) − 5(x² − 2x).
There is another way besides M_{B′}^{-1}M_B to express the change-of-coordinates matrix C_{B,B′}. Recall that the coordinate vector (b_j)_{B′} of b_j relative to B′ is found by reducing the augmented matrix [M_{B′} | b_j]. Thus all n coordinate vectors (b_j)_{B′} can be found at once by reducing the augmented matrix [M_{B′} | M_B]. But this is precisely what our boxed procedure for finding C_{B,B′} calls for. We conclude that

C_{B,B′} = \begin{bmatrix} | & | & & | \\ (b_1)_{B′} & (b_2)_{B′} & \cdots & (b_n)_{B′} \\ | & | & & | \end{bmatrix}.    (9)

Because every n-dimensional vector space V is isomorphic to R^n, Eq. (9) continues to hold for ordered bases B and B′ in V. To see explicitly why this is true, we use B′ to coordinatize V. Relative to B′ the coordinate vectors of vectors in B′ are the vectors in the standard ordered basis E of R^n, whereas the coordinate vectors of basis vectors in B are the vectors in the basis B* = ((b_1)_{B′}, (b_2)_{B′}, ..., (b_n)_{B′}) of R^n. Now the change-of-coordinates matrix from B* to E is precisely the basis matrix M_{B*} on the right-hand side of Eq. (9). Because V and R^n are isomorphic, we see (as in Example 2) that this matrix is also the change-of-coordinates matrix from B to B′.
There are times when it is feasible to use Eq. (9) to find the change-of-coordinates matrix: we just notice how the vector b_j in B can be expressed as a linear combination of vectors in B′ to determine the jth column vector of C_{B,B′}, rather than actually coordinatizing V and working within R^n. The next example illustrates this.

EXAMPLE 3  Use Eq. (9) to find the change-of-coordinates matrix C_{B,B′} from the basis B = (x² − 1, x² + 1, x² + 2x + 1) to the basis B′ = (x², x, 1) in the vector space P_2 of polynomials of degree at most 2.

SOLUTION  Relative to B′ = (x², x, 1), we see at once that the coordinate vectors of x² − 1, x² + 1, and x² + 2x + 1 are [1, 0, −1], [1, 0, 1], and [1, 2, 1], respectively. Changing these to column vectors, we obtain

C_{B,B′} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 2 \\ -1 & 1 & 1 \end{bmatrix}.

SUMMARY

Let V be a vector space with ordered basis B = (b_1, b_2, ..., b_n).

1. Each vector v in V has a unique expression as a linear combination

v = r_1 b_1 + r_2 b_2 + ··· + r_n b_n.

The vector v_B = [r_1, r_2, ..., r_n] is the coordinate vector of v relative to B. Associating with each vector v its coordinate vector v_B coordinatizes V, so that V is isomorphic to R^n.
2. Let B and B′ be ordered bases of V. There exists a unique n × n matrix C_{B,B′} such that C_{B,B′} v_B = v_{B′} for all vectors v in V. This is the change-of-coordinates matrix from B to B′. It can be computed by coordinatizing V and then applying the boxed procedure on page 391. Alternatively, it can be computed using Eq. (9).

EXERCISES

Exercises 1-7 are a review of Section 3.3. In 3. x3 + 3x? — 4x + 2 in P, relative to


Exercises 1-6, find the coordinate vector of the (x, x? — 1, x3, 2x?)
given vector relative to the given ordered basis. 4. x + x‘ in P, relative to
(1, 2x- 1x8 + x4, 2x3, x?
+ 2)
1. cos 2x in sp(sin’x, cos?x) relative to
(sin?x, cos?x)
2. x3 + x7 -— 2x + 4 in P, relative to
(1, x, x, x’).

5 5 4 in M, relative to 16. Proceeding as in Example 2, find the


°
change-of-coordinates matrix from B =
3 4

(x7, x7, x, 1) to B= 08 — x47 — x,


01] /0 -I) /1 -1) [01 x — 1, x* + 1) in the vector space P; of
1 orjo oPjo 37jol polynomials of degree at most 3.
6. sinh x in sp(e', e-*) relative to (e*, e-*) 17. Find the change-of-coordinates matrix from
4, Find the polynomial in P, whose coordinate B’ to B for the vector space and ordered
vector relative to the ordered basis B = bases given in Exercise 16. Show that this
(x + x7,x — x4, 1 + x) is [3, 1, 2]. matrix is the inverse of the matrix found in
g. Let B be an ordered basis for R?. If Exercise 16.

3 1 2] 2
Cra=| 4 1 | and v=| 51, In Exercises 18-21, use Eq. (9) in tne text to find
-1 2 1 =| the change-of-coordinates matrix from B to B' for
the given vector space V with ordered bases B and
find the coordinate vector ¥,. B’
9, Let V be a vector space with ordered bases B
and B’. If 18. V = P,, B is the basis in Exercise 3, B’ =
i 2 0 (x5, Xy, x, 1)

Cag=| 0 | ~2] and 19. V=P,B=(C+ rH 120 -x + x4 1,


-! 0 1 wW—-xtlx+xt ly, B =, x, x, 1)
20. V is the space M, of all 2 < 2 matrices, B is
v = 3b, — 2b, + b,,

Aba EH
the basis in Exercise 5,
find the coordinate vector vg.

In Exercises 10-14, find the change-of-coordinates 21. Vis the subspace sp(sin?x, cos?x) of the
vector space F of all real-valued functions
matrix (a) from B to B’, and (b) from B’ to B.
with domain R, B = (cos 2x, 1), B’ =
Verify that these matrices are inverses of each
other. (sin2x, cos?x)
22. Find the change-of-coordinates matrix from
10. B= ([1, 1], [t, O]) and B’ = ((0, 1), (1, 1) B’ to B for the vector space V and ordered
in R? bases B and B’ of Exercise 21.
11. B= ({2. 3, 1), (1, 2, 0}, (2, 0, 3]) and B’ = 23. Let B and B’ be ordered bases for R’. Mark
({1, 0, 0}, [0, 1, 0], [0, 0, 1]) in R? each of the following True or False.
12. B = ((I, 0, 1), [J, 1, 0], (0, 1, 1]) and B’ = ___ a. Every change-of-coordinates matrix is
([0, 1, 1], (1, 1, 0}, (1, 0, tJ) in R? square.
—— b. Every n X nm matrix is a
13. B=({i, 1, l, 1, (1, 1, 1, 0), (1, 1, 0, 0},
change-of-coordinates matrix relative to
[1, 0, 0, OJ) and the standard ordered basis
some bases of R’.
B' = E for R‘
___c. If B and B’ are orthonormal bases, then
14. B = (sinh x, cosh x) and B’ = (e, e*) in Cs. is an orthogonal matrix.
sp(sinh x, cosh x) __d. If Cg, is an orthogonal matrix, then 8
15. Find the change-of-coordinates matrix from and B’ are orthonormal bases.
B’ to B for the bases B = (x?, x, 1) and B’ = —_—e. If Cg¢ is an orthogonal matrix and 8 1s
(x? ~ x, 2x? — 2x + 1, x? — 2x) of P, in an orthonormal basis, then B’ is an
Example 2. Verify that this matrix is the orthonormal basis.
inverse of the change-of-coordinates matrix _— f. For all choices ofB and B’, we have
from B to B’ found in that example. det(C, ,-) = 1.
CHAPTER 7 CHANGE OF BASIS

. For all choices of B and B’, we have 24. For ordered bases B and B’ in R’, explain
det(Cy¢) # 0. how the change-of-coordinates matrix from
B to B’ is related to the change-of-
. det(Cys,) = 1 ifand only if B= B’. coordinates matrices from B to E and from
E to B’.
i. Cys = / if and only :fB = B’.
25. Let B, B’, and B’ be ordered bases for R’.
. Every invertible 1 x n matrix is Find the change-of-coordinates matrix from
the change-of-coordinates matrix B to B’ in terms of Cg¢ and Cy... [HINT:
C,” for some ordered bases B For a vector v in R", what matrix times ¥,
and B’ of R". gives v7]

7.2 MATRIX REPRESENTATIONS AND SIMILARITY

Let V and V' be vector spaces with ordered bases B = (b₁, b₂, ..., b_n) and B' =
(b₁', b₂', ..., b_m'), respectively. If T: V → V' is a linear transformation, then
Theorem 3.10 shows that the matrix representation of T relative to B,B',
which we now denote as R_{B,B'}, is given by

    R_{B,B'} = [ T(b₁)_{B'}  T(b₂)_{B'}  ···  T(b_n)_{B'} ],        (1)

where T(b_j)_{B'} is the coordinate vector of T(b_j) relative to B'. Furthermore, R_{B,B'}
is the unique matrix satisfying

    T(v)_{B'} = R_{B,B'} v_B    for all v in V.        (2)

Let us denote by M_{T,B} the m × n matrix whose jth column vector is T(b_j). From
Eq. (1), we see that we can compute R_{B,B'} by row-reducing the augmented
matrix [M_{B'} | M_{T,B}] to obtain [I | R_{B,B'}]. The discussion surrounding Theorem
3.10 shows how computations involving T: V → V' can be carried out in R^n
and R^m using R_{B,B'} and coordinates of vectors relative to the basis B of V and B'
of V', in view of the isomorphisms of V with R^n and of V' with R^m that these
coordinatizations provide.
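In machine computation, the row reduction of [M_{B'} | M_{T,B}] amounts to solving the matrix equation M_{B'} R = M_{T,B}. The sketch below is an illustration only (Python with NumPy assumed); the transformation and bases are made up for the illustration and are not taken from the text.

    import numpy as np

    def matrix_rep(M_TB, M_Bprime):
        """Solve M_{B'} R = M_{T,B} for R = R_{B,B'}."""
        return np.linalg.solve(M_Bprime, M_TB)

    # Hypothetical example: T(x) = A x with A = [[1, 2], [0, 1]],
    # B = ([1, 1], [1, -1]) and B' = ([2, 1], [1, 1]).
    A = np.array([[1.0, 2.0], [0.0, 1.0]])
    M_B  = np.column_stack([[1, 1], [1, -1]])    # columns are the vectors of B
    M_Bp = np.column_stack([[2, 1], [1, 1]])     # columns are the vectors of B'
    M_TB = A @ M_B                               # columns are T(b_j)

    R = matrix_rep(M_TB, M_Bp)
    # Check Eq. (2): T(v)_{B'} = R v_B for v = 3 b_1 - b_2.
    v_B = np.array([3.0, -1.0])
    v = M_B @ v_B
    assert np.allclose(M_Bp @ (R @ v_B), A @ v)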
It is the purpose of this section to study the effect that choosing different
bases for coordinatization has on the matrix representations of a linear
transformation. For simplicity, we shall derive our results in terms of the
vector spaces R^n. They can then be carried over to other finite-dimensional
vector spaces using coordinatization isomorphisms.

The Multiplicative Property of Matrix Representations

Let T: R^n → R^m and T': R^m → R^k be linear transformations. Section 2.3 showed
that composition of linear transformations corresponds to multiplication
of their standard matrix representations; that is, for standard bases, we
have

    Matrix for (T' ∘ T) = (Matrix for T')(Matrix for T).

The analogous property holds for representations relative to any bases.
Consider the linear transformations and ordered bases shown by the diagram

                 T' ∘ T
    R^n ---T---> R^m ---T'---> R^k
     B            B'            B''

The following vector and matrix counterpart diagram shows the action of the
transformations on vectors:

    v_B ----------> T(v)_{B'} ----------> T'(T(v))_{B''}.        (3)
         R_{B,B'}              R_{B',B''}

The matrix R_{B,B'} under the first arrow transforms the vector on the left of the
arrow into the vector on the right by left multiplication, as indicated by Eq.
(2). The matrix under the second arrow acts in a similar way. Thus the matrix
product R_{B',B''} R_{B,B'} transforms v_B into T'(T(v))_{B''}. However, the matrix represen-
tation R_{B,B''} of T' ∘ T relative to B,B'' is the unique matrix that carries v_B into
T'(T(v))_{B''} for all v_B in R^n. Thus we must have

    R_{B,B''} = R_{B',B''} R_{B,B'}.        (4)

Notice that the order of the matrices in this product is opposite to the order in
which they appear in diagram (3). (See the footnote on page 160 to see why this
is so.)
Equation (4) holds equally well for matrix representations of linear
transformations of general finite-dimensional vector spaces, as shown in the
diagram

               T' ∘ T
    V ---T---> V' ---T'---> V''
    B           B'           B''

EXAMPLE 1  Let B = (x⁴, x³, x², x, 1), which is an ordered basis for P₄, and let T: P₄ → P₄ be
the differentiation transformation. Find the matrix representation R_B of T
relative to B,B, and illustrate how it can be used to differentiate the polynomial
function

    3x⁴ − 5x³ + 7x² − 8x + 2.

Then use calculus to show that (R_B)⁵ = O.
SOLUTION  Because T(x^k) = kx^(k−1), we see that the matrix representation R_B in Eq. (1) is

          [ 0 0 0 0 0 ]
          [ 4 0 0 0 0 ]
    R_B = [ 0 3 0 0 0 ].
          [ 0 0 2 0 0 ]
          [ 0 0 0 1 0 ]

Now the coordinate vector of p(x) = 3x⁴ − 5x³ + 7x² − 8x + 2 relative to B is
p(x)_B = [3, −5, 7, −8, 2]. Applying Eq. (2) with B' = B, we find that

                [ 0 0 0 0 0 ] [  3 ]   [   0 ]
                [ 4 0 0 0 0 ] [ −5 ]   [  12 ]
    T(p(x))_B = [ 0 3 0 0 0 ] [  7 ] = [ −15 ] = R_B p(x)_B.
                [ 0 0 2 0 0 ] [ −8 ]   [  14 ]
                [ 0 0 0 1 0 ] [  2 ]   [  −8 ]

Thus, p'(x) = T(p(x)) = 12x³ − 15x² + 14x − 8.
The discussion surrounding Eq. (4) shows that the linear transforma-
tion of P₄ that computes the fifth derivative has matrix representation
(R_B)⁵. Because the fifth derivative of any polynomial in P₄ is zero, we see that
(R_B)⁵ = O.
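A short computation in code confirms both claims of Example 1. The sketch below is illustrative only (Python with NumPy assumed); it builds R_B for differentiation on P₄ relative to B = (x⁴, x³, x², x, 1), differentiates p(x), and checks that (R_B)⁵ = O.

    import numpy as np

    # Column j holds the B-coordinates of the derivative of the jth basis vector.
    R_B = np.array([[0, 0, 0, 0, 0],
                    [4, 0, 0, 0, 0],
                    [0, 3, 0, 0, 0],
                    [0, 0, 2, 0, 0],
                    [0, 0, 0, 1, 0]])

    p_B = np.array([3, -5, 7, -8, 2])        # 3x^4 - 5x^3 + 7x^2 - 8x + 2
    print(R_B @ p_B)                         # [  0  12 -15  14  -8]  ->  12x^3 - 15x^2 + 14x - 8
    print(np.linalg.matrix_power(R_B, 5))    # the 5 x 5 zero matrix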

Similarity of Representations Relative to Different Bases

Let B and B' be ordered bases for R^n. In the preceding section we saw that
the change-of-coordinates matrix C_{B,B'} from B to B' is the unique matrix
satisfying

    v_{B'} = C_{B,B'} v_B    for all v in R^n.

Comparison of this equation with Eq. (2) shows that C_{B,B'} is the matrix
representation, relative to B,B', of the identity transformation ι: R^n → R^n,
where ι(v) = v for all v ∈ R^n.
The relationship between the matrix representations R_B and R_{B'} for
the linear transformation T: R^n → R^n can be derived from the following
diagram:

    R^n ---ι---> R^n ---T---> R^n ---ι---> R^n.
     B'           B            B            B'

The vector and matrix counterpart diagram similar to diagram (3) is

    v_{B'} --------> v_B --------> T(v)_B --------> T(v)_{B'}.
           C_{B',B}        R_B             C_{B,B'}

Remembering to reverse the order, we find that

    R_{B'} = C_{B,B'} R_B C_{B',B}.        (5)

Reading the matrix product in Eq. (5) from right to left, we see that in order to
compute T(v)_{B'} from v_{B'}, if we know R_B, we:
1. change from B' to B coordinates,
2. compute the transformation relative to B coordinates,
3. change back to B' coordinates.
Equation (5) makes this procedure easy to remember.
The matrices C_{B,B'} and C_{B',B} are inverses of each other. If we drop
subscripts for a moment and let C = C_{B',B}, then Eq. (5) becomes

    R_{B'} = C⁻¹R_B C.        (6)

Recall that two n × n matrices A and R are similar if there exists an invertible
n × n matrix C such that R = C⁻¹AC. (See Definition 5.4.) We have shown that
matrix representations of the same transformation relative to different bases
are similar. We state this as a theorem in the context of a general finite-
dimensional vector space.

THEOREM 7.1  Similarity of Matrix Representations of T

Let T be a linear transformation of a finite-dimensional vector space V
into itself, and let B and B' be ordered bases of V. Let R_B and R_{B'} be the
matrix representations of T relative to B and B', respectively. Then

    R_{B'} = C⁻¹R_B C,

where C = C_{B',B} is the change-of-coordinates matrix from B' to B.
Consequently, R_{B'} and R_B are similar matrices.

EXAMPLE 2  Consider the linear transformation T: R³ → R³ defined by

    T([x₁, x₂, x₃]) = [x₁ + x₂ + x₃, x₁ + x₂, x₃].

Find the standard matrix representation A of T, and find the matrix represen-
tation R_B of T relative to B, where

    B = ([1, 1, 0], [1, 0, 1], [0, 1, 1]).

In addition, find an invertible matrix C such that R_B = C⁻¹AC.
SOLUTION  Here the standard ordered basis E plays the role that the basis B played in
Theorem 7.1, and the basis B here plays the role played by B' in that theorem.
Of course, the standard matrix representation of T is

        [ 1 1 1 ]
    A = [ 1 1 0 ].
        [ 0 0 1 ]

We compute the matrix representation R_B by reducing [M_B | M_{T,B}] to [I | R_B] as
follows:

    [ 1  1  0 |  2  2  2 ]     [ 1  1  0 |  2  2  2 ]
    [ 1  0  1 |  2  1  1 ] ~   [ 0 −1  1 |  0 −1 −1 ]
    [ 0  1  1 |  0  1  1 ]     [ 0  1  1 |  0  1  1 ]
     b₁ b₂ b₃  T(b₁) T(b₂) T(b₃)

        [ 1  0  1 |  2  1  1 ]     [ 1  0  0 |  2  1  1 ]
    ~   [ 0  1 −1 |  0  1  1 ] ~   [ 0  1  0 |  0  1  1 ].
        [ 0  0  2 |  0  0  0 ]     [ 0  0  1 |  0  0  0 ]
                                         I        R_B

Thus,

          [ 2 1 1 ]
    R_B = [ 0 1 1 ].
          [ 0 0 0 ]

An invertible matrix C such that R_B = C⁻¹AC is C = C_{B,E}. We find C = C_{B,E} by
reducing the augmented matrix [M_E | M_B] = [I | M_B]. Because this matrix is
already reduced, we see that

              [ 1 1 0 ]
    C = M_B = [ 1 0 1 ]
              [ 0 1 1 ]

in this case. The matrices A and R_B are similar matrices, both representing the
given linear transformation. As a check, we could compute that AC = CR_B.
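The check suggested at the end of Example 2 takes only a few lines. The sketch below is illustrative only (Python with NumPy assumed); it verifies both AC = CR_B and R_B = C⁻¹AC.

    import numpy as np

    A   = np.array([[1, 1, 1], [1, 1, 0], [0, 0, 1]])
    C   = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])   # C = C_{B,E} = M_B
    R_B = np.array([[2, 1, 1], [0, 1, 1], [0, 0, 0]])

    assert np.allclose(A @ C, C @ R_B)
    assert np.allclose(np.linalg.solve(C, A @ C), R_B)  # C^{-1} A C = R_B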

We give an example illustrating Theorem 7.1 in the case of a finite-
dimensional vector space other than R^n.

EXAMPLE 3  For the space P₂ of polynomials of degree at most 2, let T: P₂ → P₂ be defined by
T(p(x)) = p(x − 1). Consider the two ordered bases B = (x², x, 1) and B' =
(x, x + 1, x² − 1). Find the matrix representations R_B and R_{B'} of T and a
matrix C such that R_{B'} = C⁻¹R_B C.
SOLUTION  Because

    T(x²) = (x − 1)² = x² − 2x + 1,
    T(x) = x − 1,
    T(1) = 1,

the matrix representation of T relative to B = (x², x, 1) is

          [  1  0  0 ]
    R_B = [ −2  1  0 ].
          [  1 −1  1 ]

Next we compute the change-of-coordinates matrices C_{B',B} and C_{B,B'}. We see
that

                   [ 0  0  1 ]
    C = C_{B',B} = [ 1  1  0 ].
                   [ 0  1 −1 ]

Moreover,

    x² = −1(x) + 1(x + 1) + 1(x² − 1),
    x  =  1(x) + 0(x + 1) + 0(x² − 1),
    1  = −1(x) + 1(x + 1) + 0(x² − 1),

so

               [ −1  1 −1 ]
    C_{B,B'} = [  1  0  1 ].
               [  1  0  0 ]

Notice that C_{B,B'} can be computed as (C_{B',B})⁻¹. We now have

             [ −1  1 −1 ] [  1  0  0 ] [ 0  0  1 ]
    R_{B'} = [  1  0  1 ] [ −2  1  0 ] [ 1  1  0 ]
             [  1  0  0 ] [  1 −1  1 ] [ 0  1 −1 ]
              C_{B,B'}       R_B         C_{B',B}

             [ −1  1 −1 ] [  0  0  1 ]   [  2  1 −3 ]
           = [  1  0  1 ] [  1  1 −2 ] = [ −1  0  1 ].
             [  1  0  0 ] [ −1  0  0 ]   [  0  0  1 ]

Alternatively, R_{B'} can be computed directly as

    R_{B'} = [ T(b₁')_{B'}  T(b₂')_{B'}  T(b₃')_{B'} ].
We have seen that matrix representations of the same transformation


relative to different bases are similar. Conversely, any two similar matrices can
be viewed as representations of the same transformation relative to different
bases. To see this, let A be an n × n matrix, and let C be any invertible n × n
matrix. Let T: R^n → R^n be defined by T(x) = Ax, so that A is the standard matrix
representation of T. Because C is invertible, its column vectors are indepen-
dent and form a basis for R^n. Let B be the ordered basis having as jth vector the
jth column vector of C. Then C is precisely the change-of-coordinates matrix
from B to the standard ordered basis E. That is, C = C_{B,E}. Consequently,
C⁻¹AC = C_{E,B} A C_{B,E} is the matrix representation of T relative to B.

Significance of the Similarity Relationship for Matrices


Two n × n matrices are similar if and only if they are matrix
representations of the same linear transformation T relative to
suitable ordered bases.

The Interplay of Matrices and Linear Transformations

Let V be an n-dimensional vector space, and let T: V → V be a linear


transformation. Suppose that T has a property that can be characterized in
terms of the mapping, without reference to coordinates relative to any basis.
Assertions about T of the form

There is a basis of eigenvectors of T,


The nullspace of T has dimension 4,
The eigenspace E_λ of T has dimension 2,

are coordinate-independent assertions. For another example, T has λ as an
eigenvalue if and only if

    T(v) = λv    for some nonzero vector v in V.        (7)

This statement makes no reference to coordinates relative to a basis. The
existence and value of an eigenvalue λ is coordinate independent. Of course,
the existence and the characterization of eigenvectors v corresponding to λ as
given in Eq. (7) are also coordinate independent. However, an eigenvector v
can have different coordinates relative to different bases B and B' of V, so the
matrix representations R_B and R_{B'} of T may have different eigenvectors.
Equation (7) expressed in terms of the coordinatizations of V by B and B'
becomes

    R_B(v_B) = λv_B    and    R_{B'}(v_{B'}) = λv_{B'},

respectively. While the coordinates of the eigenvector v change, the value λ
doesn't change. That is, if λ is an eigenvalue of T, then λ is an eigenvalue of
every matrix representation of T.
Now any two similar matrices can be viewed as matrix representations of
the same linear transformation T, relative to suitable bases. Consequently,
similar matrices must share any properties that can be described for the
transformation in a coordinate-free fashion. In particular we obtain, with no
additional work, this nice result:

    Similar matrices have the same eigenvalues.

We take a moment to expand on the ideas we have just introduced. Let V
be a vector space of finite dimension n. The study of linear transformations
T: V → V is essentially the same as the study of products Ax of vectors x in R^n by
n × n matrices A. We should understand this relationship thoroughly and be
able to bounce back and forth at will from T(v) for v in V to Ax for x in R^n.
Sometimes a theorem that is not immediately obvious from one point of view
is quite easy from another. For example, from the matrix point of view, it is
not immediately apparent that similar matrices have the same eigenvalues; we
ask for a matrix proof in Exercise 27. However, the truth of this statement
becomes obvious from the linear transformation point of view. On the other
hand, it is not obvious from Eq. (7) concerning an eigenvalue of a linear
transformation T: V → V that a linear transformation of V can have at most n
eigenvalues. However, this is easy to establish from the matrix point of view:
Ax = λx has a nontrivial solution if and only if the coefficient matrix A − λI of
the system (A − λI)x = 0 has determinant zero, and det(A − λI) = 0 is a
polynomial equation of degree n.
We now state a theorem relating eigenvalues and eigenvectors of similar
matrices. Exercises 24 and 25 ask for proofs of the second and third statements
in the theorem. We have already proved the first statement.

THEOREM 7.2  Eigenvalues and Eigenvectors of Similar Matrices

Let A and R be similar n × n matrices, so that R = C⁻¹AC for some
invertible n × n matrix C. Let the eigenvalues of A be the (not
necessarily distinct) numbers λ₁, λ₂, ..., λ_n.
1. The eigenvalues of R are also λ₁, λ₂, ..., λ_n.
2. The algebraic and geometric multiplicity of each λ_i as an eigen-
   value of A remains the same as when it is viewed as an eigenvalue
   of R.
3. If v_i in R^n is an eigenvector of the matrix A corresponding to λ_i, then
   C⁻¹v_i is an eigenvector of the matrix R corresponding to λ_i.
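The content of Theorem 7.2 is easy to see experimentally. The sketch below is illustrative only (Python with NumPy assumed, and the matrices A and C are made up for the illustration, not taken from the text); it compares the eigenvalues of A and of R = C⁻¹AC and checks statement 3 for one eigenvector.

    import numpy as np

    A = np.array([[2.0, 1.0], [1.0, 2.0]])        # eigenvalues 1 and 3
    C = np.array([[1.0, 1.0], [0.0, 1.0]])        # any invertible matrix
    R = np.linalg.solve(C, A @ C)                 # R = C^{-1} A C

    print(np.sort(np.linalg.eigvals(A)))          # [1. 3.]
    print(np.sort(np.linalg.eigvals(R)))          # [1. 3.]  (same eigenvalues)

    v = np.array([1.0, 1.0])                      # eigenvector of A for lambda = 3
    w = np.linalg.solve(C, v)                     # C^{-1} v
    assert np.allclose(R @ w, 3 * w)              # eigenvector of R for lambda = 3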

We give one more illustration of the usefulness of this interplay between
linear transformations and matrices, and of the significance of similarity. Let
V be a finite-dimensional vector space, and let T: V → V be a linear
transformation. It was easy to define the notions of eigenvalue and eigenvector
in terms of T. We can now define the geometric multiplicity of an eigenvalue λ
to be the dimension of the eigenspace E_λ = {v ∈ V | T(v) = λv}. However, at this
time there is no obvious way to characterize the algebraic multiplicity of
λ exclusively in terms of the mapping T, without coordinatization. Conse-
quently, we define the algebraic multiplicity of λ to be its algebraic multiplicity
as an eigenvalue of a matrix representation of T. This makes sense because this
algebraic multiplicity of λ is the same for all matrix representations of T.
Statement 2 of Theorem 7.2 assures us that this is the case.

Diagonalization
Let T: V → V be a linear transformation of an n-dimensional vector space into
itself. Suppose that there exists an ordered basis B = (b₁, b₂, ..., b_n) of V
composed of eigenvectors of T. Let the eigenvalue corresponding to b_j be λ_j.
Then the matrix representation of T relative to B has the simple diagonal form

          [ λ₁  0  ···  0  ]
    R_B = [  0  λ₂ ···  0  ].
          [  ⋮          ⋮  ]
          [  0   0  ··· λ_n ]

We give a definition of diagonalization for linear transformations that clearly
parallels one for matrices.

DEFINITION 7.2 Diagonalizable Transformation

A linear transformation T of a finite-dimensional vector space V into


itself is diagonalizable if V has an ordered basis consisting of eigen-
vectors of T.

EXAMPLE 4  Show that reflection of the plane R² in the line y = mx is a diagonalizable
transformation, and find a diagonal matrix representation for it.
SOLUTION  Reflection of the plane R² in a line through the origin is a linear transforma-
tion. (See Illustration 2 on page 297.) Figure 7.1 shows that b₁ = [1, m] is
carried into itself and b₂ = [−m, 1] is carried into −b₂. Thus b₁ is an
eigenvector of this transformation with corresponding eigenvalue 1, whereas b₂

FIGURE 7.1  Reflection in the line y = mx.

is an eigenvector with eigenvalue −1. Because {b₁, b₂} is a basis of eigenvectors
for R², the reflection transformation is diagonalizable. Relative to the ordered
basis B = (b₁, b₂),

    R_B = [ 1  0 ]
          [ 0 −1 ].
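For a concrete slope, the diagonal representation in Example 4 also produces the standard matrix of the reflection. The sketch below is illustrative only (Python with NumPy assumed, with the specific value m = 2 chosen for the illustration); it forms A = C R_B C⁻¹, where the columns of C are the eigenvectors b₁ = [1, m] and b₂ = [−m, 1].

    import numpy as np

    m = 2.0                                        # reflect in the line y = 2x
    C = np.column_stack([[1.0, m], [-m, 1.0]])     # eigenvector basis B = (b1, b2)
    R_B = np.diag([1.0, -1.0])                     # representation relative to B

    A = C @ R_B @ np.linalg.inv(C)                 # standard matrix of the reflection
    print(A)        # [[-0.6  0.8]
                    #  [ 0.8  0.6]]  =  (1/(1+m^2)) [[1-m^2, 2m], [2m, m^2-1]]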

If V has a known ordered basis B of eigenvectors, it becomes easy to
compute the k-fold composition T^k(v) for any positive integer k and for any
vector v in V. We need only find the coordinate vector d in R^n of v relative to B,
so that

    v = d₁b₁ + d₂b₂ + ··· + d_nb_n.

Then

    T^k(v) = d₁λ₁^k b₁ + d₂λ₂^k b₂ + ··· + d_nλ_n^k b_n.        (8)

Of course, this is the transformation analogue of the computation of A^k x
discussed in Section 5.1. We illustrate with an example.

EXAMPLE 5  Consider the vector space P₂ of polynomials of degree at most 2, and let B' be
the ordered basis (1, x, x²) for P₂. Let T: P₂ → P₂ be the linear transformation
such that

    T(1) = 3 + 2x + x²,    T(x) = 2,    T(x²) = 2x².

Find T⁴(x + 2).
SOLUTION  The matrix representation of T relative to B' is

             [ 3 2 0 ]
    R_{B'} = [ 2 0 0 ].
             [ 1 0 2 ]

Using the methods of Chapter 5, we easily find the eigenvalues and eigen-
vectors of R_{B'} and of T given in Table 7.1.

TABLE 7.1

    Eigenvalues    Eigenvectors of R_{B'}    Eigenvectors of T
    λ₁ = −1        w₁ = [−3, 6, 1]           p₁(x) = −3 + 6x + x²
    λ₂ =  2        w₂ = [0, 0, 1]            p₂(x) = x²
    λ₃ =  4        w₃ = [2, 1, 1]            p₃(x) = 2 + x + x²

Let B be the ordered basis (−3 + 6x + x², x², 2 + x + x²) consisting of these
eigenvectors. We can find the coordinate vector d of x + 2 relative to the basis
B by inspection. Because

    x + 2 = 0(x² + 6x − 3) + (−1)x² + 1(x² + x + 2),

we see that

    d₁ = 0,    d₂ = −1,    and    d₃ = 1.

Thus, Eq. (8) has the form

    T^k(x + 2) = 2^k(−1)x² + 4^k(1)(x² + x + 2).

In particular,

    T⁴(x + 2) = −16x² + 256(x² + x + 2) = 240x² + 256x + 512.
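Because the matrix R_{B'} is small, the answer in Example 5 can also be checked by direct matrix multiplication. The sketch below is illustrative only (Python with NumPy assumed); it applies R_{B'} four times to the B'-coordinate vector of x + 2.

    import numpy as np

    R_Bp = np.array([[3, 2, 0],
                     [2, 0, 0],
                     [1, 0, 2]])                # representation of T relative to B' = (1, x, x^2)

    v_Bp = np.array([2, 1, 0])                  # coordinates of x + 2 relative to (1, x, x^2)
    print(np.linalg.matrix_power(R_Bp, 4) @ v_Bp)   # [512 256 240]  ->  512 + 256x + 240x^2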

SUMMARY

Let T: V → V be a linear transformation of a finite-dimensional vector space
into itself.

1. If B and B' are ordered bases of V, then the matrix representations R_B and
   R_{B'} of T relative to B and to B' are similar. That is, there is an invertible
   matrix C, namely C = C_{B',B}, such that

       R_{B'} = C⁻¹R_B C.

2. Conversely, two similar n × n matrices represent the same linear transfor-
   mation of R^n into R^n relative to two suitably chosen ordered bases.
3. Similar matrices have the same eigenvalues with the same algebraic and
   geometric multiplicities.
4. If A and R are similar matrices with R = C⁻¹AC, and if v is an eigenvector
   of A, then C⁻¹v is an eigenvector of R corresponding to the same
   eigenvalue.
5. The transformation T is diagonalizable if V has a basis B consisting of
   eigenvectors of T. In this case, the matrix representation R_B is a diagonal
   matrix and computation of T^k(v) by R_B^k(v_B) becomes relatively easy.

EXERCISES

In Exercises 1-14, find the matrix representations R_B and R_{B'} and an invertible matrix C such
that R_{B'} = C⁻¹R_B C for the linear transformation T of the given vector space with the indicated
ordered bases B and B'.

1. T: R² → R² defined by T([x, y]) = [x − y, x + 4y]; B = ([1, 1], [2, 1]), B' = E
2. T: R² → R² defined by T([x, y]) = [2x + 3y, x + 2y]; B = ([1, −1], [1, 1]), B' = ([2, 3], [1, 2])
3. T: R³ → R³ defined by T([x, y, z]) = [x + y, x + z, y − z]; B = ([1, 1, 1], [1, 1, 0], [1, 0, 0]), B' = E
4. T: R³ → R³ defined by T([x, y, z]) = [5x, 2y, 3z]; B and B' as in Exercise 3
5. T: R³ → R³ defined by T([x, y, z]) = [z, 0, x]; B = ([3, 1, 2], [1, 2, 1], [2, −1, 0]), B' =
   ([1, 2, 1], [2, 1, −1], [5, 4, 1])
6. T: R² → R² defined as reflection of the plane through the line 5x = 3y; B = ([3, 5], [5, −3]), B' = E
7. T: R³ → R³ defined as reflection of R³ through the plane x + y + z = 0; B =
   ([1, 0, −1], [1, −1, 0], [1, 1, 1]), B' = E
8. T: R³ → R³ defined as reflection of R³ through the plane 2x + 3y + z = 0; B =
   ([2, 3, 1], [0, 1, −3], [1, 0, −2]), B' = E
9. T: R³ → R³ defined as projection on the plane x₂ = 0; B = E, B' = ([1, 0, 1], [1, 0, −1], [0, 1, 0])
10. T: R³ → R³ defined as projection on the plane x + y + z = 0; B and B' as in Exercise 7
11. T: P₂ → P₂ defined by T(p(x)) = p(x + 1) + p(x); B = (x², x, 1), B' = (1, x, x²)
12. T: P₂ → P₂ as in Exercise 11, but using B = (x², x, 1) and B' = (x² + 1, x + 1, 2)
13. T: P₃ → P₃ defined by T(p(x)) = p'(x), the derivative of p(x); B = (x³, x², x, 1), B' =
    (1, x + 1, x² + 1, x³ + 1)
14. T: W → W, where W = sp(eˣ, xeˣ) and T is the derivative transformation; B = (eˣ, xeˣ),
    B' = (2xeˣ, 3eˣ)
15. Let T: P₂ → P₂ be the linear transformation and B' the ordered basis of P₂ given in
    Example 3. Find the matrix representation R_{B'} of T by computing the matrix with
    column vectors T(b₁')_{B'}, T(b₂')_{B'}, T(b₃')_{B'}.
16. Repeat Exercise 15 for the transformation in Example 5 and the basis B' =
    (x + 1, x − 1, 2 + x²) of P₂.

In Exercises 17-22, find the eigenvalues λᵢ and the corresponding eigenspaces of the linear
transformation T. Determine whether the linear transformation is diagonalizable.

17. T defined on R² by T([x, y]) = [2x − 3y, −3x + 2y]
18. T defined on R² by T([x, y]) = [x − y, −x + y]
19. T defined on R³ by T([x₁, x₂, x₃]) = [x₁ + 2x₃, x₂, x₁ + x₃]
20. T defined on R³ by T([x₁, x₂, x₃]) = [x₂, 4x₁ + 7x₃, 2x₂ − x₃]
21. T defined on R³ by T([x₁, x₂, x₃]) = [5x₁, −5x₁ + 3x₂ − 5x₃, −3x₂ − 2x₃]
22. T defined on R³ by T([x₁, x₂, x₃]) = [3x₁ − x₂ + x₃, −2x₁ + 2x₂ − x₃, 2x₁ + x₂ + 4x₃]
23. Mark each of the following True or False.
    a. Two similar n × n matrices represent the same linear transformation of R^n into
       itself relative to the standard basis.
    b. Two different n × n matrices represent different linear transformations of R^n into
       itself relative to the standard basis.
    c. Two similar n × n matrices represent the same linear transformation of R^n into
       itself relative to two suitably chosen bases for R^n.
    d. Similar matrices have the same eigenvalues and eigenvectors.
    e. Similar matrices have the same eigenvalues with the same algebraic and
       geometric multiplicities.
    f. If A and C are n × n matrices and C is invertible and v is an eigenvector of A,
       then C⁻¹v is an eigenvector of C⁻¹AC.
    g. If A and C are n × n matrices and C is invertible and v is an eigenvector of A,
       then Cv is an eigenvector of CAC⁻¹.
    h. Any two n × n diagonal matrices are similar.
    i. Any two n × n diagonalizable matrices having the same eigenvectors are similar.
    j. Any two n × n diagonalizable matrices having the same eigenvalues of the same
       algebraic multiplicities are similar.
24. Prove statement 2 of Theorem 7.2.
25. Prove statement 3 of Theorem 7.2.
26. Let A and R be similar matrices. Prove in two ways that A² and R² are similar matrices:
    using a matrix argument, and using a linear transformation argument.
27. Give a determinant proof that similar matrices have the same eigenvalues.
CHAPTER 8

EIGENVALUES: FURTHER APPLICATIONS AND COMPUTATIONS

This chapter deals with further applications of eigenvalues and with the
computation of eigenvalues. In Section 8.1, we discuss quadratic forms and
their diagonalization. The principal axis theorem (Theorem 8.1) asserts that
every quadratic form can be diagonalized. This is probably the most impor-
tant result in the chapter, having applications to the vibration of elastic bodies,
to quantum mechanics, and to electric circuits. Presentations of such applica-
tions are beyond the scope of this text, and we have chosen to present a more
accessible application in Section 8.2: classification of conic sections and
quadric surfaces. Although the facts about conic sections may be familiar to
you, their easy derivation from the principal axis theorem should be seen and
enjoyed.
Section 8.3 applies the principal axis theorem to local extrema of
functions and discusses maximization and minimization of quadratic forms
on unit spheres. The latter topic is again important in vibration problems; it
indicates that eigenvalues of maximum and of minimum magnitudes can be
found for symmetric matrices by using techniques from advanced calculus for
finding extrema of quadratic forms on unit spheres.
In Section 8.4, we sketch three methods for computing eigenvalues: the
power method, Jacobi’s method, and the QR method. We attempt to make as
intuitively clear as we can how and why each method works, but proofs and
discussions of efficiency are omitted.
This chapter contains some applications to geometry and analysis that are
usually phrased in terms of points in R^n rather than in terms of vectors. We will
be studying these applications using our work with vectors. To permit a natural
and convenient use of the classical terminology of points while working with
vectors, we relax for this chapter our convention that the boldface x always be
a vector [x₁, x₂, ..., x_n], and allow the same notation to represent the point
(x₁, x₂, ..., x_n) as well.


8.1 DIAGONALIZATION OF QUADRATIC FORMS

Quadratic Forms

A quadratic form in one variable x is a polynomial f(x) = ax², where a ≠ 0. A
quadratic form in two variables x and y is a polynomial f(x, y) = ax² + bxy +
cy², where at least one of a, b, or c is nonzero. The term quadratic means degree
2. The term form means homogeneous; that is, each summand contains a
product of the same number of variables, namely, 2 for a quadratic form.
Thus, 3x² − 4xy is a quadratic form in x and y, but x² + y − 4 and x − 3y² are
not quadratic forms.
Turning to the general case, a quadratic form f(x) = f(x₁, x₂, ..., x_n)
in n variables is a polynomial that can be written, using summation nota-
tion, as

    f(x) =  Σ  u_{ij} x_i x_j,    the sum taken over 1 ≤ i ≤ j ≤ n,        (1)

where not all u_{ij} are zero. To illustrate, the general quadratic form in x₁, x₂, and
x₃ is

    u₁₁x₁² + u₁₂x₁x₂ + u₁₃x₁x₃ + u₂₂x₂² + u₂₃x₂x₃ + u₃₃x₃².        (2)

HISTORICAL NOTE  In 1826, Cauchy discussed quadratic forms in three variables, forms
of the type Ax² + By² + Cz² + 2Dxy + 2Exz + 2Fyz. He showed that the characteristic equation
formed from the determinant

    | A D E |
    | D B F |
    | E F C |

remains the same under any change of rectangular axes, what we would call an orthogonal
coordinate change. Furthermore, he demonstrated that one could always find axes such that the
new form has only the square terms. Three years later, Cauchy generalized the result to quadratic
forms in n variables. (The matrices of such forms are n × n symmetric matrices.) He showed that
the roots λ₁, λ₂, ..., λ_n of the characteristic equation are all real, and he showed how to find the
linear substitution that converts the original form to the form λ₁x₁² + λ₂x₂² + ··· + λ_nx_n². In
modern terminology, Cauchy had proved Theorem 5.5, that every real symmetric matrix is
diagonalizable. In the two-variable case, Cauchy's proof amounts to finding the maximum and
minimum of the quadratic form f(x, y) = Ax² + 2Bxy + Cy² subject to the condition that x² + y² =
1. In geometric terms, the point at which the extreme value occurs is that point on the unit circle
which also lies on the end of one axis of one of the family of ellipses (or hyperbolas) described by
the quadratic form. If one then takes the line from the origin to that point as one of the axes and
the perpendicular to that line as the other, the equation in relation to those axes will have only the
squares of the variables, as desired. To determine an extreme point subject to a condition, Cauchy
uses what is today called the principle of Lagrange multipliers.

A computation shows that

                 [ u₁₁ u₁₂ u₁₃ ] [ x₁ ]
    [x₁, x₂, x₃] [  0  u₂₂ u₂₃ ] [ x₂ ]
                 [  0   0  u₃₃ ] [ x₃ ]

                   [ u₁₁x₁ + u₁₂x₂ + u₁₃x₃ ]
    = [x₁, x₂, x₃] [        u₂₂x₂ + u₂₃x₃  ]
                   [               u₃₃x₃   ]

    = x₁(u₁₁x₁ + u₁₂x₂ + u₁₃x₃) + x₂(u₂₂x₂ + u₂₃x₃) + x₃(u₃₃x₃)
    = u₁₁x₁² + u₁₂x₁x₂ + u₁₃x₁x₃ + u₂₂x₂² + u₂₃x₂x₃ + u₃₃x₃²,

which is again form (2). We can verify that the term involving u_{ij} for i ≤ j in the
expansion of

                        [ u₁₁ u₁₂ ··· u₁ₙ ] [ x₁ ]
    [x₁, x₂, ..., x_n]  [  0  u₂₂ ··· u₂ₙ ] [ x₂ ]        (3)
                        [  ⋮          ⋮  ] [  ⋮ ]
                        [  0   0  ··· uₙₙ ] [ x_n ]

is precisely u_{ij} x_i x_j. Thus, matrix product (3) is a 1 × 1 matrix whose sole entry
is equal to sum (1).

Every quadratic form in n variables x_i can be written as xᵀUx,
where x is the column vector of variables and U is a nonzero
upper-triangular matrix.

We will call the matrix U the upper-triangular coefficient matrix of the
quadratic form.

EXAMPLE 1  Write x² − 2xy + 6xz + z² in the form of matrix product (3).
SOLUTION  We obtain the matrix expression

              [ 1 −2 6 ] [ x ]
    [x, y, z] [ 0  0 0 ] [ y ].
              [ 0  0 1 ] [ z ]

Just think of x as the first variable, y as the second, and z as the third. The
summand −2xy, for example, gives the coefficient −2 in the row 1, column 2
position.

A matrix expression for a given quadratic form is by no means unique:
xᵀAx gives a quadratic form for any nonzero n × n matrix A, and the form can
be rewritten as xᵀUx, where U is upper triangular.
EXAMPLE 2  Expand

              [ −1 3 1 ] [ x ]
    [x, y, z] [  2 1 0 ] [ y ]
              [ −2 2 4 ] [ z ]

and find the upper-triangular coefficient matrix for the quadratic form.
SOLUTION  We find that

              [ −1 3 1 ] [ x ]             [ −x + 3y +  z ]
    [x, y, z] [  2 1 0 ] [ y ] = [x, y, z] [  2x +  y     ]
              [ −2 2 4 ] [ z ]             [ −2x + 2y + 4z ]

    = x(−x + 3y + z) + y(2x + y) + z(−2x + 2y + 4z)
    = −x² + 3xy + xz + 2xy + y² − 2xz + 2yz + 4z²
    = −x² + 5xy + y² − xz + 2yz + 4z².

The upper-triangular coefficient matrix is

        [ −1 5 −1 ]
    U = [  0 1  2 ].
        [  0 0  4 ]

All the nice things that we will prove about quadratic forms come from the
fact that any quadratic form f(x) can be expressed as f(x) = xᵀAx, where A is a
symmetric matrix.

EXAMPLE 3  Find the symmetric coefficient matrix of the form x² − 2xy + 6xz + z²
discussed in Example 1.
SOLUTION  Rewriting the form as

    x² − xy − yx + 3xz + 3zx + z²,

we obtain the symmetric coefficient matrix

        [  1 −1 3 ]
    A = [ −1  0 0 ].
        [  3  0 1 ]

As illustrated in Example 3, we obtain the symmetric matrix A of the form

    f(x) =  Σ  u_{ij} x_i x_j,    the sum taken over 1 ≤ i ≤ j ≤ n,        (4)

by writing each cross term u_{ij}x_ix_j as (u_{ij}/2)x_ix_j + (u_{ij}/2)x_jx_i and by taking
a_{ij} = a_{ji} = u_{ij}/2. The resulting matrix A is symmetric. We call it the symmetric
coefficient matrix of the form.

Every quadratic form in n variables x_i can be written as xᵀAx,
where x is the column vector of variables and A is a symmetric
matrix.
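The passage from the upper-triangular coefficient matrix U to the symmetric coefficient matrix A amounts to averaging U with its transpose. The sketch below is illustrative only (Python with NumPy assumed); it carries this out for the form x² − 2xy + 6xz + z² of Examples 1 and 3.

    import numpy as np

    U = np.array([[1, -2, 6],
                  [0,  0, 0],
                  [0,  0, 1]])        # upper-triangular coefficient matrix (Example 1)

    A = (U + U.T) / 2                 # symmetric coefficient matrix (Example 3)
    print(A)                          # [[ 1. -1.  3.]
                                      #  [-1.  0.  0.]
                                      #  [ 3.  0.  1.]]

    # Both matrices give the same quadratic form: x^T U x = x^T A x for every x.
    x = np.array([1.0, 2.0, -1.0])
    assert np.isclose(x @ U @ x, x @ A @ x)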

Diagonalization of Quadratic Forms

We saw in Section 6.3 that a symmetric matrix can be diagonalized by an
orthogonal matrix. That is, if A is an n × n symmetric matrix, there exists an
n × n orthogonal change-of-coordinates matrix C such that C⁻¹AC = D, where
D is diagonal. Recall that the diagonal entries of D are λ₁, λ₂, ..., λ_n, where the
λ_i are the (not necessarily distinct) eigenvalues of the matrix A, the jth column
of C is a unit eigenvector corresponding to λ_j, and the column vectors of C
form an orthonormal basis for R^n.
Consider now a quadratic form xᵀAx, where A is symmetric, and let C be
an orthogonal diagonalizing matrix for A. Because C⁻¹ = Cᵀ, the substitution

    x = Ct        Diagonalizing substitution

changes our quadratic form as follows:

    xᵀAx = (Ct)ᵀA(Ct) = tᵀCᵀACt = tᵀC⁻¹ACt
         = tᵀDt = λ₁t₁² + λ₂t₂² + ··· + λ_nt_n²,

where the λ_i are the eigenvalues of A. We have thus diagonalized the quadratic
form. The value of xᵀAx for any x in R^n is the same as the value of tᵀDt for
t = C⁻¹x.

EXAMPLE 4  Find a substitution x = Ct that diagonalizes the form 3x₁² + 10x₁x₂ + 3x₂², and
find the corresponding diagonalized form.
SOLUTION  The symmetric coefficient matrix for the quadratic form is

    A = [ 3 5 ]
        [ 5 3 ].

We need to find the eigenvalues and eigenvectors for A. We have

    |A − λI| = | 3−λ   5  |
               |  5   3−λ | = λ² − 6λ − 16 = (λ + 2)(λ − 8).

Thus, λ₁ = −2 and λ₂ = 8 are eigenvalues of A. Finding the eigenvectors that
are to become the columns of the substitution matrix C, we compute

    A − λ₁I = A + 2I = [ 5 5 ]  ~  [ 1 1 ]
                       [ 5 5 ]     [ 0 0 ]

and

    A − λ₂I = A − 8I = [ −5  5 ]  ~  [ 1 −1 ]
                       [  5 −5 ]     [ 0  0 ].

Thus eigenvectors are v₁ = [−1, 1] and v₂ = [1, 1]. Normalizing them to length
1 and placing them in the columns of the substitution matrix C, we obtain the
diagonalizing substitution

    [ x₁ ] = C [ t₁ ] = [ −1/√2  1/√2 ] [ t₁ ]
    [ x₂ ]     [ t₂ ]   [  1/√2  1/√2 ] [ t₂ ].

Our theory then tells us that making the variable substitution

    x₁ = (1/√2)(−t₁ + t₂),
    x₂ = (1/√2)(t₁ + t₂),        (5)

in the form 3x₁² + 10x₁x₂ + 3x₂² will give the diagonal form

    λ₁t₁² + λ₂t₂² = −2t₁² + 8t₂².

This can, of course, be checked by actually substituting the expressions for x₁
and x₂ in Eqs. (5) into the form 3x₁² + 10x₁x₂ + 3x₂².

In the preceding example, we might like to clear denominators in our
substitution given in Eqs. (5) by using x = (√2 C)t, so that x₁ = −t₁ + t₂ and
x₂ = t₁ + t₂. Using the substitution x = kCt with a quadratic form xᵀAx, we
obtain

    xᵀAx = (kCt)ᵀA(kCt) = k²(tᵀCᵀACt) = k²(tᵀDt).

Thus, with k = √2, the substitution x₁ = −t₁ + t₂, x₂ = t₁ + t₂ in Example 4
results in the diagonal form

    2(−2t₁² + 8t₂²) = −4t₁² + 16t₂².

In any particular situation, we would have to balance the desire for arithmetic
simplicity against the desire to have the new orthogonal basis actually be
orthonormal.
As an orthogonal matrix, C has determinant ±1. (See Exercise 22 in
Section 6.3.) We state without proof the significance of the sign of det(C). If
det(C) = 1, then the new ordered orthonormal basis B given by the column
vectors of C has the same orientation in R^n as the standard ordered basis E;
while if det(C) = −1, then B has the opposite orientation to E. In order for B =
(b₁, b₂) in R² to have the same orientation as (e₁, e₂), there must be a rotation of
the plane, given by a matrix transformation x = Ct, which carries both e₁ to b₁
and e₂ to b₂. The same interpretation in terms of rotation is true in R³, with the
additional condition that e₃ be carried to b₃.

FIGURE 8.1  (a) Rotation of axes, hence (b₁, b₂) and (e₁, e₂) have the same orientation;
(b) not a rotation of axes, hence (b₁, b₂) and (e₁, e₂) have opposite orientations.

To illustrate for the plane R², Figure 8.1(a) shows an ordered orthonormal basis B = (b₁, b₂), where b₁ =
[−1/√2, 1/√2] and b₂ = [−1/√2, −1/√2], having the same orientation as E.
Notice that

    det(C) = | −1/√2  −1/√2 |
             |  1/√2  −1/√2 |  =  1/2 + 1/2  =  1.

Counterclockwise rotation of the plane through an angle of 135° carries E into
B, preserving the order of the vectors. However, we see that the basis (b₁, b₂) =
([0, −1], [−1, 0]) shown in Figure 8.1(b) does not have the same orientation
as E, and this time

    det(C) = |  0 −1 |
             | −1  0 |  =  −1.

For any orthogonal matrix C having determinant −1, multiplication of
any single column of C by −1 gives an orthogonal matrix with determinant 1.
For the diagonalization we are considering, where the columns are normalized
eigenvectors, multiplication by −1 still gives a normalized eigenvector.
Although there will be some sign changes in the diagonalizing substitution, the
final diagonal form remains the same, because the eigenvalues are the same
and their order has not been changed.
We summarize all our work in one main theorem, and then conclude with
a final example. Section 8.2 will give an application of the theorem to
geometry, and Section 8.3 will give an application to optimization.
THEOREM 8.1  Principal Axis Theorem

Every quadratic form f(x) in n variables x₁, x₂, ..., x_n can be
diagonalized by a substitution x = Ct, where C is an n × n orthogonal
matrix. The diagonalized form appears as

    λ₁t₁² + λ₂t₂² + ··· + λ_nt_n²,

where the λ_i are the eigenvalues of the symmetric coefficient matrix
A of f(x). The jth column vector of C is a normalized eigenvector
v_j of A corresponding to λ_j. Moreover, C can be chosen so that
det(C) = 1.

We box a step-by-step outline for diagonalizing a quadratic form.

Diagonalizing a Quadratic Form f(x)

Step 1  Find the symmetric coefficient matrix A of the form f(x).
Step 2  Find the (not necessarily distinct) eigenvalues λ₁, λ₂, ..., λ_n
        of A.
Step 3  Find an orthonormal basis for R^n consisting of normalized
        eigenvectors of A.
Step 4  Form the matrix C, whose columns are the basis vectors
        found in step 3, in the order corresponding to the listing of
        eigenvalues in step 2. The transformation x = Ct is a
        rotation if det(C) = 1. If a rotation is desired and det(C) =
        −1, change the sign of all components of just one column
        vector in C.
Step 5  The substitution x = Ct transforms f(x) to the diagonal
        form λ₁t₁² + λ₂t₂² + ··· + λ_nt_n².
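The boxed steps translate directly into a few lines of code. The sketch below is an illustration only (Python with NumPy assumed; the text itself uses LINTEK or MATLAB for such computations). It diagonalizes the form 3x₁² + 10x₁x₂ + 3x₂² of Example 4; numpy.linalg.eigh returns the eigenvalues of the symmetric matrix A together with orthonormal eigenvectors.

    import numpy as np

    A = np.array([[3.0, 5.0],
                  [5.0, 3.0]])              # symmetric coefficient matrix (steps 1-2)

    lams, C = np.linalg.eigh(A)             # eigenvalues and orthonormal eigenvectors (step 3)
    if np.linalg.det(C) < 0:                # step 4: make the substitution a rotation
        C[:, 0] = -C[:, 0]

    # Step 5: x = C t transforms the form into lam_1 t_1^2 + lam_2 t_2^2.
    print(lams)                             # [-2.  8.]
    D = C.T @ A @ C
    print(np.round(D, 10))                  # diag(-2, 8)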

EXAMPLE 5  Find a variable substitution that diagonalizes the form 2xy + 2xz, and give the
resulting diagonal form.
SOLUTION  The symmetric coefficient matrix for the given quadratic form is

        [ 0 1 1 ]
    A = [ 1 0 0 ].
        [ 1 0 0 ]

Expanding the determinant on the last column, we find that

               | −λ   1   1 |
    |A − λI| = |  1  −λ   0 |  =  1(λ) − λ(λ² − 1)  =  −λ(λ² − 2).
               |  1   0  −λ |

The eigenvalues of A are then λ₁ = 0, λ₂ = √2, and λ₃ = −√2.
Computing eigenvectors, we have

        [ 0 1 1 ]     [ 1 0 0 ]
    A = [ 1 0 0 ]  ~  [ 0 1 1 ],
        [ 1 0 0 ]     [ 0 0 0 ]

so v₁ = [0, −1, 1] is an eigenvector for λ₁ = 0. Moreover,

                         [ −√2    1    1 ]     [ 1 −√2  0 ]
    A − λ₂I = A − √2 I = [   1  −√2    0 ]  ~  [ 0   1 −1 ],
                         [   1    0  −√2 ]     [ 0   0  0 ]

so v₂ = [√2, 1, 1] is an eigenvector for λ₂ = √2. In a similar fashion, we find
that v₃ = [−√2, 1, 1] is an eigenvector for λ₃ = −√2. Thus,

        [   0     √2/2   −√2/2 ]
    C = [ −1/√2    1/2     1/2 ]
        [  1/√2    1/2     1/2 ]

is an orthogonal diagonalizing matrix for A. The substitution

    x = (1/√2)(t₂ − t₃),
    y = (1/2)(−√2 t₁ + t₂ + t₃),
    z = (1/2)(√2 t₁ + t₂ + t₃)

will diagonalize 2xy + 2xz to become √2 t₂² − √2 t₃².


SUMMARY

Let x and t be n × 1 column vectors with entries x_i and t_i, respectively.

1. For every nonzero n × n matrix A, the product f(x) = xᵀAx gives a
   quadratic form in the variables x_i.
2. Given any quadratic form f(x) in the variables x_i, there exist an upper-
   triangular coefficient matrix U such that f(x) = xᵀUx and a symmetric
   coefficient matrix A such that f(x) = xᵀAx.
3. Any quadratic form f(x) can be orthogonally diagonalized by using a
   variable substitution x = Ct as described in the boxed steps preceding
   Example 5.

EXERCISES

In Exercises 1-8, find the upper-triangular coefficient matrix U and the symmetric coefficient
matrix A of the given quadratic form.

1. 3x² − 6xy + y²
2. 8x² + 9xy − 3y²
3. x² − y² − 4xy + 3xz − 8yz
4. x₁² − 2x₁x₂ + x₂² + 6x₂x₃ − 2x₃x₄ + 6x₂x₄ − 8x₄²
5. [x, y] [ −2 1; 1 3 ] [x; y]
6. [x, y] [ 7 −10; 5 2 ] [x; y]
7. [x, y, z] [ 8 3 1; 2 1 −4; −5 2 10 ] [x; y; z]
8. [x₁, x₂, x₃, x₄] [ 2 −1 3 0; 4 2 −1 3; −5 3 8 7; 10 2 1 5 ] [x₁; x₂; x₃; x₄]

In Exercises 9-16, find an orthogonal substitution that diagonalizes the given quadratic form, and
find the diagonalized form.

9. 2xy
10. 3x² + 4xy
11. −6xy + 8y²
12. 2x² − 2xy + y²
13. 3x² − 4xy + 3y²
14. x₁² + 2x₁x₂
15. x₁² + x₂² + x₃² − 2x₁x₂ − 2x₁x₃ − 2x₂x₃ [SUGGESTION: Use Example 4 in Section 6.3.]
16. x₁² + 2x₁x₂ − 6x₁x₃ + x₂² − 4x₃²
17. Find a necessary and sufficient condition on a, b, and c such that the quadratic form
    ax² + 2bxy + cy² can be orthogonally diagonalized to kt₁².
18. Repeat Exercise 17, but require also that k = 1.

In Exercises 19-24, use the routine MATCOMP in LINTEK, or MATLAB, to find a diagonal form
into which the given form can be transformed by an orthogonal substitution. Do not give the
substitution.

19. 3x² + 4xy − 5y²
20. x² − 8xy + y²
21. 3x² + y² − 2z² − 4xy + 6yz
22. y² − 8z² + 3xy − 4xz + 7yz
23. x₁² − 3x₁x₄ + 5x₂² − 8x₂x₃
24. x₁² − 8x₁x₃ + 6x₂² − 4x₄²

8.2 APPLICATIONS TO GEOMETRY

Conic Sections in R²

Figure 8.2 shows three different types of plane curves obtained when a double
right-circular cone is cut by a plane. These conic sections are the ellipse, the
hyperbola, and the parabola. Figure 8.3(a) shows an ellipse in standard
position, with center at the origin. The equation of this ellipse is

    x²/a² + y²/b² = 1.        (1)

The ellipse in Figure 8.3(b) with center (h, k) has the equation

    (x − h)²/a² + (y − k)²/b² = 1.        (2)

If we make the translation substitution x̄ = x − h, ȳ = y − k, then Eq. (2)
becomes x̄²/a² + ȳ²/b² = 1, which resembles Eq. (1).
By completing the square, we can put any quadratic polynomial equation
in x and y with no xy-term but with x² and y² having coefficients of the same
sign into the form of Eq. (2), possibly with 0 or −1 on the right-hand side. The
procedure should be familiar to you from work with circles. We give one
illustration for an ellipse.

FIGURE 8.2  Sections of a cone: (a) an elliptic section; (b) a hyperbolic section; (c) a parabolic
section.
FIGURE 8.3  (a) The ellipse x²/a² + y²/b² = 1 in standard position; (b) the ellipse
(x − h)²/a² + (y − k)²/b² = 1 centered at (h, k).

EXAMPLE 1  Complete the square in the equation

    x² + 3y² − 4x + 6y = −1.

SOLUTION  Completing the square, we obtain

    (x² − 4x) + 3(y² + 2y) = −1
    (x − 2)² + 3(y + 1)² = 4 + 3 − 1 = 6
    (x − 2)²/6 + (y + 1)²/2 = 1,

HISTORICAL NOTE  Analytic geometry is generally considered to have been founded by
René Descartes (1596-1650) and Pierre Fermat (1601-1665) in the first half of the seventeenth
century. But it was not until the appearance of a Latin version of Descartes' Geometry in 1661 by
Frans van Schooten (1615-1660) that its influence began to be felt. This Latin version was
published along with many commentaries; in particular, the Elements of Curves by Jan de Witt
(1625-1672) gave a systematic treatment of conic sections. De Witt gave canonical forms of these
equations similar to those in use today; for example, y² = ax, by² + x² = f², and x² − by² = f²
represented the parabola, ellipse, and hyperbola, respectively. He then showed how, given an
arbitrary second-degree equation in x and y, to find a transformation of axes that reduces the given
equation to one of the canonical forms. This is, of course, equivalent to diagonalizing a particular
symmetric matrix.
De Witt was a talented mathematician who, because of his family background, could devote
but little time to mathematics. In 1653 he became in effect the Prime Minister of the Netherlands.
Over the next decades, he guided the fortunes of the country through a most difficult period,
including three wars with England. In 1672 the hostility of one of the Dutch factions culminated in
his murder.
which is of the form of Eq. (2). This is the equation of an ellipse with center
(h, k) = (2, −1). Setting x̄ = x − 2 and ȳ = y + 1, we obtain the equation

    x̄²/6 + ȳ²/2 = 1.

If the constant on the right-hand side of the initial equation in Example 1
had been −7, we would have obtained x̄² + 3ȳ² = 0, which describes the single
point (x̄, ȳ) = (0, 0). This single point is regarded as a degenerate ellipse. If the
constant had been −8, we would have obtained x̄² + 3ȳ² = −1, which has no
real solution; the ellipse is then called empty.
Thus, every polynomial equation

    c₁x² + c₂y² + c₃x + c₄y = d,    c₁c₂ > 0,        (3)

describes an ellipse, possibly degenerate or empty.
In a similar fashion, an equation

    c₁x² + c₂y² + c₃x + c₄y = d,    c₁c₂ < 0,        (4)

describes a hyperbola. Notice that the coefficients of x² and y² have opposite
signs. The standard form of the equation for a hyperbola centered at (0, 0) is

    x²/a² − y²/b² = 1    or    −x²/a² + y²/b² = 1,

as shown in Figure 8.4. The dashed diagonal lines y = ±(b/a)x shown in the
figure are the asymptotes of the hyperbola.

FIGURE 8.4  (a) The hyperbola x²/a² − y²/b² = 1; (b) the hyperbola −x²/a² + y²/b² = 1.

By completing squares and translating axes, we can reduce Eq. (4) to one of the standard forms shown in
Figure 8.4, in variables x̄ and ȳ, unless the constant in the final equation
reduces to zero. In that case we obtain

    x̄²/a² − ȳ²/b² = 0    or    ȳ = ±(b/a)x̄.

These equations represent two lines that can be considered a degenerate
hyperbola. Thus any equation of the form of Eq. (4) describes a (possibly
degenerate) hyperbola.
Finally, the equations

    c₁x² + c₃x + c₄y = d    and    c₂y² + c₃x + c₄y = d,    c₁, c₂ ≠ 0,        (5)

describe parabolas. If c₄ ≠ 0 in the first equation in Eqs. (5) and c₃ ≠ 0 in the
second, these equations can be reduced to the form

    x̄² = aȳ    and    ȳ² = ax̄        (6)

by completing the square and translating axes. Figure 8.5 shows two parabolas
in standard position. If Eqs. (5) reduce to c₁x² + c₃x = d and c₂y² + c₄y = d,
each describes two parallel lines that can be considered degenerate parabolas.

In summary, every equation of the form

    c₁x² + c₂y² + c₃x + c₄y = d        (7)

with at least one of c₁ or c₂ nonzero describes a (possibly degenerate
or empty) ellipse, hyperbola, or parabola.
FIGURE 8.5  (a) The parabola x² = ay, a > 0; (b) the parabola y² = ax, a < 0.
Classification of Second-Degree Curves

We can now apply our work in Section 8.1 on diagonalizing quadratic forms to
classification of the plane curves described by an equation of the type

    ax² + bxy + cy² + dx + ey + f = 0    for a, b, c not all zero.        (8)

We make a substitution

    [ x ]     [ t₁ ]
    [ y ] = C [ t₂ ],    where det(C) = 1,        (9)

which orthogonally diagonalizes the quadratic-form part of Eq. (8), which is in
color, and we obtain an equation of the form

    λ₁t₁² + λ₂t₂² + gt₁ + ht₂ + k = 0.        (10)

This equation has the form of Eq. (7) and describes an ellipse, hyperbola, or
parabola. Remember that Eq. (9) corresponds to a rotation that carries the
vector e₁ to the first column vector b₁ of C and carries e₂ to the second column
vector b₂. We think of b₁ as pointing out the t₁-axis and b₂ as pointing out the
t₂-axis. We summarize our work in a theorem and give just one illustration,
leaving others to the exercises.

THEOREM 8.2  Classification of Second-Degree Plane Curves

Every equation of the form of Eq. (8) can be reduced to an equation of
the form of Eq. (10) by means of an orthogonal substitution corre-
sponding to a rotation of the plane. The coefficients λ₁ and λ₂ in Eq.
(10) are the eigenvalues of the symmetric coefficient matrix of the
quadratic-form portion of Eq. (8). The curve describes a (possibly
degenerate or empty)

    ellipse      if λ₁λ₂ > 0,
    hyperbola    if λ₁λ₂ < 0,
    parabola     if λ₁λ₂ = 0.
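Theorem 8.2 lends itself to a tiny classification routine. The sketch below is illustrative only (Python with NumPy assumed); it forms the symmetric coefficient matrix of ax² + bxy + cy² and reads off the type of curve from the sign of λ₁λ₂ = det(A).

    import numpy as np

    def classify_conic(a, b, c):
        """Classify ax^2 + bxy + cy^2 + dx + ey + f = 0 (possibly degenerate or empty)."""
        A = np.array([[a, b / 2],
                      [b / 2, c]])
        l1, l2 = np.linalg.eigvalsh(A)
        if l1 * l2 > 0:
            return "ellipse"
        if l1 * l2 < 0:
            return "hyperbola"
        return "parabola"

    print(classify_conic(0, 2, 0))      # the form 2xy of Example 2: hyperbola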

EXAMPLE 2  Use rotation and translation of axes to sketch the curve 2xy + 2√2 x = 1.
SOLUTION  The symmetric coefficient matrix of the quadratic form 2xy is

    A = [ 0 1 ]
        [ 1 0 ].

We easily find that the eigenvalues are λ₁ = 1 and λ₂ = −1, and that

    C = [ 1/√2  −1/√2 ]
        [ 1/√2   1/√2 ]

is an orthogonal diagonalizing matrix with determinant 1. The substitution

    x = (1/√2)(t₁ − t₂),
    y = (1/√2)(t₁ + t₂)

then yields

    t₁² − t₂² + 2t₁ − 2t₂ = 1.

Completing the square, we obtain

    (t₁ + 1)² − (t₂ + 1)² = 1,

which describes the hyperbola shown in Figure 8.6.

FIGURE 8.6  The hyperbola 2xy + 2√2 x = 1.
FIGURE 8.7  The elliptic cylinder x²/a² + y²/b² = 1.

Quadric Surfaces
An equation in three variables of the form

    c₁x² + c₂y² + c₃z² + c₄x + c₅y + c₆z = d,        (11)

where at least one of c₁, c₂, or c₃ is nonzero, describes a quadric surface in space,
which again might be degenerate or empty. Figures 8.7 through 8.15 show
some of the quadric surfaces in standard position.
FIGURE 8.8  The hyperbolic cylinder x²/a² − y²/b² = 1.
FIGURE 8.9  The parabolic cylinder ay = x².
FIGURE 8.10  The ellipsoid x²/a² + y²/b² + z²/c² = 1.

By completing the square in Eq. (11), which corresponds to translating
axes, we see that Eq. (11) can be reduced to an equation involving x̄, ȳ, and
z̄ in which a variable appearing to the second power does not appear to the
first power. Notice that this is true for all the equations in Figures 8.7 through
8.15.

HISTORICAL NOTE  The earliest classification of quadric surfaces was given by Leonhard
Euler, in his precalculus text Introduction to Infinitesimal Analysis (1748). Euler's classification
was similar to the conic-section classification of De Witt. Euler considered the second-degree
equation in three variables Ap² + Bq² + Cr² + Dpq + Epr + Fqr + Gp + Hq + Ir + K = 0 as
representing a surface in 3-space. As did De Witt, he gave canonical forms for these surfaces and
showed how to rotate and translate the axes to reduce any given equation to a standard form such
as Ap² + Bq² + Cr² + K = 0. An analysis of the signs of the new coefficients determined whether
the given equation represented an ellipsoid, a hyperboloid of one or two sheets, an elliptic or
hyperbolic paraboloid, or one of the degenerate cases. Euler did not, however, make explicit use of
eigenvalues. He gave a general formula for rotation of axes in 3-space as functions of certain angles
and then showed how to choose the angles to make the coefficients D, E, and F all zero.
Euler was the most prolific mathematician of all time. His collected works fill over 70
large volumes. Euler was born in Switzerland, but spent his professional life in St. Peters-
burg and Berlin. His texts on precalculus, differential calculus, and integral calculus (1748,
1755, 1768) had immense influence and became the bases for such texts up to the present. He stan-
dardized much of our current notation, introducing the numbers e, π, and i, as well as
giving our current definitions for the trigonometric functions. Even though he was blind for
the last 17 years of his life, he continued to produce mathematical papers almost up to the day
he died.
FIGURE 8.11  The elliptic paraboloid z = x²/a² + y²/b².
FIGURE 8.12  The hyperbolic paraboloid z = y²/b² − x²/a².
FIGURE 8.13  The elliptic cone z² = x²/a² + y²/b².
FIGURE 8.14  The hyperboloid of two sheets z² − 1 = x²/a² + y²/b².
FIGURE 8.15  The hyperboloid of one sheet z² + 1 = x²/a² + y²/b².

Again, degenerate and empty cases are possible. For example, the equation
x² + 2y² + z² = −4 gives an empty ellipsoid. The elliptic cone, hyperboloid of
two sheets, and hyperboloid of one sheet in Figures 8.13 through 8.15 differ
only in whether a constant in their equations is zero, negative, or positive.
Now consider a general second-degree polynomial equation

    ax² + by² + cz² + dxy + exz + fyz + px + qy + rz + s = 0,        (12)

where the coefficient of at least one term of degree 2 is nonzero. Making a
substitution

    [ x ]     [ t₁ ]
    [ y ] = C [ t₂ ],    where det(C) = 1,        (13)
    [ z ]     [ t₃ ]

that orthogonally diagonalizes the quadratic-form portion of Eq. (12) (the
portion in color), we obtain

    λ₁t₁² + λ₂t₂² + λ₃t₃² + p't₁ + q't₂ + r't₃ + s' = 0,        (14)

which is of the form of Eq. (11). We state this as a theorem.

THEOREM 8.3  Principal Axis Theorem for R³

Every equation of the form of Eq. (12) can be reduced to an equation
of the form of Eq. (14) by an orthogonal substitution (13) that
corresponds to a rotation of axes.
Again, computation of the eigenvalues λ₁, λ₂, and λ₃ of the symmetric
coefficient matrix of the quadratic-form portion of Eq. (12) may give us quite a
bit of information as to the type of quadric surface. However, actually
executing the substitution and completing squares, which can be tedious, may
be necessary to distinguish among certain surfaces. We give a rough classifica-
tion scheme in Table 8.1. Remember that empty or degenerate cases are
possible, even where we do not explicitly give them.

TABLE 8.1

    Eigenvalues λ₁, λ₂, λ₃                      Quadric Surface
    All of the same sign                        Ellipsoid
    Two of one sign and one of the other sign   Elliptic cone, hyperboloid of two sheets,
                                                or hyperboloid of one sheet
    One zero, two of the same sign              Elliptic paraboloid or elliptic cylinder
                                                (degenerate case)
    One zero, two of opposite signs             Hyperbolic paraboloid or hyperbolic
                                                cylinder (degenerate case)
    Two zero, one nonzero                       Parabolic cylinder or two parallel planes
                                                (degenerate case)
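Table 8.1 can likewise be read off from eigenvalue signs computed by machine. The sketch below is illustrative only (Python with NumPy assumed); it counts zero, positive, and negative eigenvalues of the symmetric coefficient matrix of the degree-2 portion and reports the corresponding row of the table.

    import numpy as np

    def classify_quadric(A, tol=1e-10):
        """A is the 3 x 3 symmetric coefficient matrix of the degree-2 portion."""
        lams = np.linalg.eigvalsh(A)
        zero = np.sum(np.abs(lams) < tol)
        pos  = np.sum(lams > tol)
        neg  = np.sum(lams < -tol)
        if zero == 0:
            if pos == 3 or neg == 3:
                return "ellipsoid"
            return "elliptic cone, hyperboloid of two sheets, or hyperboloid of one sheet"
        if zero == 1:
            if pos == 2 or neg == 2:
                return "elliptic paraboloid or elliptic cylinder"
            return "hyperbolic paraboloid or hyperbolic cylinder"
        return "parabolic cylinder or two parallel planes"

    # The form 2xy + 2xz of Example 3: eigenvalues 0, sqrt(2), -sqrt(2).
    A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    print(classify_quadric(A))    # hyperbolic paraboloid or hyperbolic cylinder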

If you try to verify Table 8.1, you will wonder about an equation of the
form ax² + by + cz = d in the last case given. Exercise 11 indicates that, by
means of a rotation of axes, this equation can be reduced to at₁² + rt₂ = d,
which can be written as at₁² + r(t₂ − d/r) = 0, and consequently describes a
parabolic cylinder. We conclude with four examples.

EXAMPLE 3  Classify the quadric surface 2xy + 2xz = 1.
SOLUTION  Example 5 in Section 8.1 shows that the orthogonal substitution

    x = (1/√2)(t₂ − t₃),
    y = (1/2)(−√2 t₁ + t₂ + t₃),
    z = (1/2)(√2 t₁ + t₂ + t₃)

transforms 2xy + 2xz into √2 t₂² − √2 t₃². It can be checked that the matrix C
corresponding to this substitution has determinant 1. Thus, the substitution
corresponds to a rotation of axes and transforms the equation 2xy + 2xz = 1
into √2 t₂² − √2 t₃² = 1, which we recognize as a hyperbolic cylinder.
EXAMPLE 4  Classify the quadric surface 2xy + 2xz = y + 1.
SOLUTION  Using the same substitution as we did in Example 3, we obtain the equation

    √2 t₂² − √2 t₃² = (1/2)(−√2 t₁ + t₂ + t₃) + 1.

Translation of axes by completing squares yields an equation of the form

    √2 (t₂ − h)² − √2 (t₃ − k)² = −(√2/2)(t₁ − l),

which we recognize as a hyperbolic paraboloid.

EXAMPLE 5  Classify the quadric surface 2xy + 2xz = x + 1.
SOLUTION  Using again the substitution in Example 3, we obtain

    √2 t₂² − √2 t₃² = (1/√2)(t₂ − t₃) + 1

or

    2t₂² − t₂ − 2t₃² + t₃ = √2.

Completing squares yields

    2(t₂ − 1/4)² − 2(t₃ − 1/4)² = √2,

which represents a hyperbolic cylinder.

EXAMPLE 6  Classify the quadric surface

    2x² − 3y² + z² − 2xy + 4yz − 6x + 8y − 8z = 17

as far as possible, by finding just the eigenvalues of the symmetric coefficient
matrix of the quadratic-form portion of the equation.
SOLUTION  The symmetric matrix of the quadratic-form portion is

        [  2 −1 0 ]
    A = [ −1 −3 2 ].
        [  0  2 1 ]

We find that

                  | 2−λ  −1    0  |
    det(A − λI) = | −1  −3−λ   2  |
                  |  0    2   1−λ |

                = (2 − λ) | −3−λ   2  |  +  | −1   2  |
                          |   2   1−λ |     |  0  1−λ |

                = (2 − λ)(λ² + 2λ − 7) + λ − 1
                = −λ³ + 12λ − 15.

We can see that λ = 0 is not a solution of the characteristic equation
−λ³ + 12λ − 15 = 0. We could plot a rough sketch of the graph of y =
−λ³ + 12λ − 15, just to determine the signs of the eigenvalues. However, we
prefer to use the routine MATCOMP in LINTEK, or MATLAB, with the
matrix A. We quickly find that the eigenvalues are approximately

    λ₁ = −3.9720,    λ₂ = 1.5765,    λ₃ = 2.3954.

According to Table 8.1, we have an elliptic cone, a hyperboloid of two sheets,
or a hyperboloid of one sheet.
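For readers without LINTEK, the eigenvalues in Example 6 are just as easy to obtain with other software. The sketch below is illustrative only (Python with NumPy assumed); it reproduces the approximate values quoted in the solution.

    import numpy as np

    A = np.array([[ 2.0, -1.0, 0.0],
                  [-1.0, -3.0, 2.0],
                  [ 0.0,  2.0, 1.0]])

    print(np.sort(np.linalg.eigvalsh(A)))
    # approximately [-3.9720  1.5765  2.3954]: two of one sign and one of the other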

SUMMARY

1. Given an equation ax² + bxy + cy² + dx + ey + f = 0, let A be the
   symmetric coefficient matrix of the quadratic-form portion of the equation
   (the portion in color), let λ₁ and λ₂ be the eigenvalues of A, and let C be an
   orthogonal matrix of determinant 1 with eigenvectors of A for columns.
   The substitution corresponding to matrix C followed by translation of axes
   reduces the given equation to a standard form for the equation of a conic
   section. In particular, the equation describes a (possibly degenerate or
   empty)
       ellipse      if λ₁λ₂ > 0,
       hyperbola    if λ₁λ₂ < 0,
       parabola     if λ₁λ₂ = 0.
2. Proceeding in a way analogous to that described in summary item (1) but
   for the equation
       ax² + by² + cz² + dxy + exz + fyz + px + qy + rz + s = 0,
   one obtains a standard form for the equation of a (possibly degenerate or
   empty) quadric surface in space. Table 8.1 lists the information that can be
   obtained from the three eigenvalues λ₁, λ₂, and λ₃ alone.

EXERCISES

In Exercises 1-8, rotate axes, using a substitution

    [ x ]     [ t₁ ]
    [ y ] = C [ t₂ ],

complete squares if necessary, and sketch the graph (if it is not empty) of the given conic section.

1. 2xy = 1
2. 2xy − 2√2 y = 1
3. x² + 2xy + y² = 4
4. x² − 2xy + y² + 4√2 x = 4
5. 10x² + 6xy + 2y² = 4
6. 5x² + 4xy + 2y² = −1
7. 3x² + 4xy + 6y² = 8
8. x² + 8xy + 7y² + 18√5 x = −9
9. Show that the plane curve ax² + bxy + cy² + dx + ey + f = 0 is a (possibly degenerate or
   empty)
       ellipse      if b² − 4ac < 0,
       hyperbola    if b² − 4ac > 0,
       parabola     if b² − 4ac = 0.
   [HINT: Diagonalize ax² + bxy + cy², using the quadratic formula, and check the signs
   of the eigenvalues.]
10. Use Exercise 9 to classify the conic section with the given equation.
    a. 2x² + 8xy + 8y² − 3x + 2y = 13
    b. y² + 4xy − 5x² − 8x = 12
    c. −x² + 5xy − 7y² − 4y + 11 = 0
    d. xy + 4x − 3y = 8
    e. 2x² − 3xy + y² − 8x + 5y = 30
    f. x² + 6xy + 9y² − 2x + 14y = 10
    g. 4x² − 2xy − 3y² + 8x − 5y = 17
    h. 8x² + 6xy + y² − 5x = 3
    i. x² − 2xy + 4x − 5y = 6
    j. 2x² − 3xy + 2y² − 8y = 15
11. Show that the equation ax² + by + cz = d can be transformed into an equation of the
    form at₁² + rt₂ = d by a rotation of axes in space. [HINT: If

        [ x ]     [ t₁ ]             [ t₁ ]       [ x ]        [ x ]
        [ y ] = C [ t₂ ],    then    [ t₂ ] = C⁻¹ [ y ]  =  Cᵀ [ y ].
        [ z ]     [ t₃ ]             [ t₃ ]       [ z ]        [ z ]

    Find an orthogonal matrix Cᵀ such that det(Cᵀ) = 1, t₁ = x, and t₂ = (by + cz)/r for
    some r.]

In Exercises 12-20, classify the quadric surface with the given equation as one of the (possibly
empty or degenerate) types illustrated in Figures 8.7-8.15.

12. 2x² + 2y² + 6yz + 10z² = 9
13. 2xy + 2z² + 1 = 0
14. 2xz + y² + 4y + 1 = 0
15. x² + 2yz + 4x + 1 = 0
16. 3x² + 2y² + 6xz + 3z²
17. x² − 8xy + 16y² − 3z² = 5
18. x² + 4y² − 4xz + 4z² = 8
19. −3x² + 2y² + 8yz + 16z² = 10
20. x² + y² + 2z² − 2xy + 2xz − 2yz = 9

In Exercises 21-27, use the routine MATCOMP in LINTEK, or MATLAB, to classify the quadric
surface according to Table 8.1.

21. x² + y² + z² − 2xy − 2xz − 2yz + 3x − 3z = 8
22. 3x² + 2y² + 5z² + 4xy + 2yz − 3x + 10y = 4
23. 2x² − 8y² + 3z² − 4xy + yz + 6xz − 3x − 8z = 3
24. 3x² + 7y² + 4z² + 6xy − 8yz + 16x = 20
25. x² + 4y² + 16z² + 4xy + 8xz + 16yz − 8x + 8y = 8
26. x² + 6y² + 4z² − xy − 2xz − 3yz − 9x = 20
27. 4x² − 3y² + z² + 8xz + 6yz + 2x − 3y = 8

oon —

8.3 APPLICATIONS TO EXTREMA

Finding Extrema of Functions


We turn now to one of the applications of linear algebra to calculus. We simply
state the facts from calculus that we need, trying to make them seem
reasonable where possible.

There are many situations in which one desires to maximize or minimize a


function of one or more variables. Applications involve maximizing profit,
minimizing costs, maximizing speed, minimizing time, and so on. Such
problems are of great practical importance.
Polynomial functions are especially easy to work with. Many important
functions, such as trigonometric, exponential, and logarithmic functions, can’t
be expressed by polynomial formulas. However, each of these functions, and
many others, can be expressed near a point in the domain of the function as a
“polynomial” of infinite degree—an infinite series, as it is called. For example,
it is shown in calculus that

   cos x = 1 − (1/2!)x² + (1/4!)x⁴ − (1/6!)x⁶ + ···                      (1)

for any number x. [Recall that 2! = 2·1, 3! = 3·2·1, and in general, n! =
n(n − 1)(n − 2) ··· 3·2·1.] We leave to calculus the discussion of such things
as the interpretation of the infinite sum

   1 − 1/2! + 1/4! − 1/6! + ···

as cos 1. From Eq. (1), it would seem that we should have

   cos(2x − y) = 1 − (1/2!)(2x − y)² + (1/4!)(2x − y)⁴ − (1/6!)(2x − y)⁶ + ···.   (2)

Notice that the term

   (1/2!)(2x − y)² = (1/2)(4x² − 4xy + y²)

is a quadratic form—that is, a form (homogeneous polynomial!) of degree 2.


Similarly,

   (1/4!)(2x − y)⁴ = (1/24)(16x⁴ − 32x³y + 24x²y² − 8xy³ + y⁴)

is a form of degree 4.
Consider a function g(x₁, x₂, . . . , xₙ) of n variables, which we denote as
usual by g(x). For many of the most common functions g(x), it can be shown
that, if x is near 0, so that all xᵢ are small in magnitude, then

   g(x) = c + f₁(x) + f₂(x) + f₃(x) + ··· + fᵢ(x) + ···,                  (3)

where each fᵢ(x) is a form of degree i or is zero. Equation (2) illustrates this.
   We wish to determine whether g(x) in Eq. (3) has a local extremum—that
is, a local maximum or local minimum at the origin x = 0. A function g(x) has
a local maximum at 0 if g(x) ≤ g(0) for all x where ||x|| is sufficiently small. The
notion of a local minimum at 0 for g(x) is analogously defined. For example,
the function of one variable g(x) = x² + 1, whose graph is shown in Figure
8.16, has a local minimum of 1 at x = 0.

FIGURE 8.16  The graph of y = g(x) = x² + 1.

   Now if x is really close to 0, all the forms f₁(x), f₂(x), f₃(x), . . . in Eq. (3)
have very small values, so the constant c, if c ≠ 0, is the dominant term of
Eq. (3) near zero. Notice that, for g(x) = 1 + x² in Figure 8.16, the function has
values close to the constant 1 for x close to zero.
   After a nonzero constant, the form in Eq. (3) that contributes the most to
g(x) for ||x|| close to 0 is the nonzero form fᵢ(x) of lowest degree, because the
lower the degree of a term of a polynomial, the more it contributes when the
variables are all near zero. For example, x is greater than x² near zero; if x =
1/100, then x² is only 1/10,000. If f₁(x) = 0 and f₂(x) ≠ 0, then f₂(x) is the dominant
form of Eq. (3) after a nonzero constant c, and so on.
   We claim that, if f₁(x) ≠ 0 in Eq. (3), then g(x) does not have a local
maximum or minimum at x = 0. Suppose that

   f₁(x) = d₁x₁ + d₂x₂ + ··· + dₙxₙ

with some coefficient—say, d₁—nonzero. If k is small but of the same sign as
d₁, then near the point (k, 0, 0, . . . , 0) the function g(x) ≈ c + d₁k > c. On the
other hand, if k is small but of opposite sign to d₁, then near this point g(x) < c.
Thus, g(x) can have a local maximum or minimum of c at 0 only if f₁(x) = 0 for
all x.
   If f₁(x) = 0 and f₂(x) ≠ 0 in Eq. (3), then, for x near 0, f₂(x) is the dominant
form in the equation; it can be expected to dominate the terms of higher degree
for x close to 0. It seems reasonable that, if f₂(x) > 0 for all x ≠ 0 but near 0,
then for such x

   g(x) = c + (little bit),

and g(x) has a local minimum of c at 0. On the other hand, if f₂(x) < 0 for all
such x, then we expect that

   g(x) = c − (little bit)

for these values x, and g(x) has a local maximum of c at 0. This is proved in an
advanced calculus course.
   We know that we can orthogonally diagonalize the form f₂(x) with a
substitution x = Ct to become

   λ₁t₁² + λ₂t₂² + ··· + λₙtₙ²,                                           (4)

where the λᵢ are the eigenvalues of the symmetric coefficient matrix of f₂(x).
Form (4) is > 0 for all nonzero t, and hence f₂(x) > 0 for all nonzero x, if and
only if we have all λᵢ > 0. Similarly, f₂(x) < 0 for all nonzero x if and only if all
λᵢ < 0. It is also clear that, if some λᵢ are positive and some are negative, then
form (4) and hence f₂(x) assume both positive and negative values arbitrarily
close to zero.

DEFINITION 8.1 Definite Quadratic Forms

A quadratic form f(x) is positive definite if f(x) > 0 for all nonzero x in
Rⁿ, and it is negative definite if f(x) < 0 for all such nonzero x.

Our work in Section 8.1 and the preceding statement give us the following
theorem and corollary.

THEOREM 8.4 Criteria for Definite Quadratic Forms

A quadratic form is positive definite if and only if all the eigenvalues


of its symmetric coefficient matrix are positive, and it is negative
definite if and only if all those eigenvalues are negative.

COROLLARY A Test for Local Extrema

Let g(x) be a function of n variables given by Eq. (3). Suppose that the
function f₁(x) in Eq. (3) is zero. If f₂(x) is positive definite, then g(x) has
a local minimum of c at x = 0, whereas if f₂(x) is negative definite, then
g(x) has a local maximum of c at x = 0. If f₂(x) assumes both positive
and negative values, then g(x) has no local extremum at x = 0.

   We have discussed local extrema only at the origin. To do similar work at
another point h = (h₁, h₂, . . . , hₙ), we need only translate axes to this point,
letting x̄ᵢ = xᵢ − hᵢ, so that Eq. (3) becomes

   g(x̄) = c + f₁(x̄) + f₂(x̄) + ··· + fᵢ(x̄) + ···.                          (5)

   Our discussion indicates a method for attempting to find local extrema of
a function, which we box.

Finding an Extremum of g(x)

Step 1  Find a point h where the function f₁(x̄) = f₁(x − h) in Eq. (5)
        becomes the zero function. (This is the province of calculus.)
Step 2  Find the quadratic form f₂(x̄) at the point h. (This also
        requires calculus.)
Step 3  Find the eigenvalues of the symmetric coefficient matrix of
        the quadratic form.
Step 4  If all eigenvalues are positive, then g(x) has a local
        minimum of c at h. If all are negative, then g(x) has a local
        maximum of c at h. If eigenvalues of both signs occur, then
        g(x) has no local extremum at h.
Step 5  If any of the eigenvalues is zero, further study is necessary.

   Exercises 17 through 22 illustrate some of the things that can occur if step
5 is the case. We shall not tackle steps 1 or 2, because they require calculus, but
will simply start with equations of the form of Eqs. (3) or (5) that meet the
requirements in steps 1 and 2.

EXAMPLE 1  Let

   g(x, y) = 3 + (2x² − 4xy + 4y²) + (x³ + 4x²y + y³).

Determine whether g(x, y) has a local extremum at the origin.

SOLUTION The symmetric coefficient matrix for the quadratic-form portion 2x² − 4xy +
4y² of g(x, y) is

   A = [  2  −2 ]
       [ −2   4 ].

We find that

   det(A − λI) = | 2−λ   −2  | = λ² − 6λ + 4.
                 |  −2   4−λ |

The solutions of the characteristic equation λ² − 6λ + 4 = 0 are found by the
quadratic formula to be

   λ = (6 ± √(36 − 16))/2 = 3 ± √5.

We see that λ₁ = 3 + √5 and λ₂ = 3 − √5 are both positive, so our form is
positive definite. Thus, g(x, y) has a local minimum of 3 at (0, 0).  ■
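Step 3 of the boxed procedure is easy to automate. The sketch below (Python with
NumPy, assumed available; it illustrates the eigenvalue test only and is not a
LINTEK routine) applies it to the quadratic-form matrix of Example 1.

  import numpy as np

  def classify_critical_point(A, tol=1e-12):
      """Classify a critical point from the symmetric matrix A of the
      quadratic form f2, following steps 3-5 of the boxed procedure."""
      eig = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
      if np.any(np.abs(eig) < tol):
          return eig, "further study is necessary (a zero eigenvalue)"
      if np.all(eig > 0):
          return eig, "local minimum"
      if np.all(eig < 0):
          return eig, "local maximum"
      return eig, "no local extremum (saddle)"

  # Quadratic-form portion 2x^2 - 4xy + 4y^2 of Example 1
  A = np.array([[ 2.0, -2.0],
                [-2.0,  4.0]])
  print(classify_critical_point(A))   # eigenvalues approx 0.764 and 5.236, "local minimum"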
EXAMPLE 2  Suppose that

   g(x, y, z) = 7 + (2x² − 8y² + 3z² − 4xy + 2yz + 6xz) + (xz² − 5y³)
                + (higher-degree terms).

Determine whether g(x, y, z) has a local extremum at the origin.

SOLUTION The symmetric coefficient matrix of the quadratic-form portion is

   A = [  2  −2  3 ]
       [ −2  −8  1 ]
       [  3   1  3 ].

We could find p(λ) = det(A − λI) and attempt to solve the characteristic equation,
or we could try to sketch the graph of p(λ) well enough to determine the signs of
the eigenvalues. However, we prefer to use the routine MATCOMP in
LINTEK, or MATLAB, and we find that the eigenvalues are approximately

   λ₁ = −8.605,   λ₂ = 0.042,   λ₃ = 5.563.

Because both positive and negative eigenvalues appear, there is no local ex-
tremum at the origin.  ■

Maximizing or Minimizing a Quadratic Form on the Unit Sphere

Let f(x) be a quadratic form in the variables x₁, x₂, . . . , xₙ. We consider the
problem of finding the maximum and minimum values of f(x) for x on the unit
sphere, where ||x|| = 1—that is, where x₁² + x₂² + ··· + xₙ² = 1. It is shown in
advanced calculus that such extrema of f(x) on the unit sphere always exist.
We need only orthogonally diagonalize the form f(x), using an orthogonal
transformation x = Ct, obtaining as usual

   λ₁t₁² + λ₂t₂² + ··· + λₙtₙ².                                            (6)

Because our new basis for Rⁿ is again orthonormal, the unit sphere has
t-equation

   t₁² + t₂² + ··· + tₙ² = 1;                                              (7)

this is a very important point. Suppose that the λᵢ are arranged so that

   λ₁ ≥ λ₂ ≥ ··· ≥ λₙ.

On the unit sphere with Eq. (7), we see that formula (6) can be written as

   λ₁(1 − t₂² − ··· − tₙ²) + λ₂t₂² + ··· + λₙtₙ²
      = λ₁ − (λ₁ − λ₂)t₂² − ··· − (λ₁ − λₙ)tₙ².                             (8)

Because λ₁ − λᵢ ≥ 0 for i > 1, we see that the maximum value assumed by
formula (8) is λ₁ when t₂ = t₃ = ··· = tₙ = 0 and t₁ = ±1. Exercise 32
indicates similarly that the minimum value assumed by form (6) is λₙ, when
tₙ = ±1 and all other tᵢ = 0. We state this as a theorem.

THEOREM 8.5 Extrema of a Quadratic Form on the Unit Sphere

Let f(x) be a quadratic form, and let λ₁, λ₂, . . . , λₙ be the eigenvalues
of the symmetric coefficient matrix of f(x). The maximum value
assumed by f(x) on the unit sphere ||x|| = 1 is the maximum of the λᵢ,
and the minimum value assumed is the minimum of the λᵢ. Each
extremum is assumed at any eigenvector of length 1 corresponding to
the eigenvalue that gives the extremum.

   The preceding theorem is very important in vibration applications
ranging from aerodynamics to particle physics. In such applications, one often
needs to know the eigenvalue of maximum or of minimum magnitude for a
symmetric matrix. These eigenvalues are frequently found by using advanced
calculus techniques to maximize or minimize the value of a quadratic form on
a unit sphere, rather than using algebraic techniques such as those presented in
this text for finding eigenvalues. The principal axis theorem (Theorem 8.1)
and the preceding theorem are the algebraic foundation for such an analytic
approach. We illustrate Theorem 8.5 with an example that maximizes a
quadratic form on a unit sphere by finding the eigenvalues, rather than
illustrating the more important reverse procedure.

EXAMPLE 3  Find the maximum and minimum values assumed by 2xy + 2xz on the unit
sphere x² + y² + z² = 1, and find all points where these extrema are assumed.

SOLUTION From Example 5 in Section 8.1, we see that the eigenvalues of the symmetric
coefficient matrix A and associated eigenvectors are given by

   λ₁ = 0,      v₁ = [0, −1, 1],
   λ₂ = √2,     v₂ = [√2, 1, 1],
   λ₃ = −√2,    v₃ = [−√2, 1, 1].

We see that the maximum value assumed by 2xy + 2xz on the unit sphere is
√2, and it is assumed at the points ±(√2/2, 1/2, 1/2). The minimum value
assumed is −√2, and it is assumed at the points ±(−√2/2, 1/2, 1/2). Notice
that we normalized our eigenvectors to length 1 so that they extend to the unit
sphere.  ■
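Theorem 8.5 is easy to check numerically. The sketch below (Python with NumPy;
purely illustrative) forms the symmetric matrix of 2xy + 2xz from Example 3,
compares its extreme eigenvalues with the form evaluated at many random unit
vectors, and prints a maximizing eigenvector of length 1.

  import numpy as np

  # Symmetric coefficient matrix of the form 2xy + 2xz
  A = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])

  eigvals, eigvecs = np.linalg.eigh(A)      # ascending eigenvalues, orthonormal columns
  print(eigvals)                            # approximately [-1.4142, 0.0, 1.4142]
  print(eigvecs[:, -1])                     # unit eigenvector for the maximum, +-(sqrt2/2, 1/2, 1/2)

  # The form never exceeds the largest eigenvalue on the unit sphere:
  rng = np.random.default_rng(0)
  x = rng.normal(size=(3, 10000))
  x /= np.linalg.norm(x, axis=0)            # columns are random unit vectors
  values = np.einsum('ij,ij->j', x, A @ x)  # x^T A x for each column
  print(values.max() <= eigvals[-1] + 1e-12)   # True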

The extension of Theorem 8.5 to a sphere centered at the origin, with


radius other than 1, is left as Exercise 33.

SUMMARY
1. Let g(x) = c + f₁(x) + f₂(x) + ··· + fᵢ(x) + ··· near x = 0, where fᵢ(x) is
   a form of degree i or is zero.
   a. If f₁(x) = 0 and f₂(x) is positive definite, then g(x) has a local minimum
      of c at x = 0.
   b. If f₁(x) = 0 and f₂(x) is negative definite, then g(x) has a local maximum
      of c at x = 0.
   c. If f₁(x) ≠ 0 or if the symmetric coefficient matrix of f₂(x) has both
      positive and negative eigenvalues, then g(x) has no local extremum at
      x = 0.
2. The natural analogue of summary item 1 holds at x = h; just translate the
   axes to the point h, and replace xᵢ by x̄ᵢ = xᵢ − hᵢ.
3. A quadratic form in n variables has as maximum (minimum) value on the
   unit sphere ||x|| = 1 in Rⁿ the maximum (minimum) of the eigenvalues of
   the symmetric coefficient matrix of the form. The maximum (minimum) is
   assumed at each corresponding eigenvector of length 1.

EXERCISES

In Exercises 1-15, assume that g(x), or g(X), ts 13. e(x yz) = 54+ (42+ Axyt yY—-— z+
described by the given formula for values of x, or y + -::
— 4xyz)
(x
x, near zero. Draw whatever conclusions are 14. e(x, y,
= 44+ (+ yt Z)
2? - 2xz -
possible concerning local extrema of the Ixy — 2yz) + GXI-D)+---,
function g. x=x+lyp=y-2,z=z+5
15. g(x, y,z)=4- 37+
1. B(x, y) = —7 + (3x° — bxy + 4y’) +

- 4y) (2x° — 2xy + 3xz + Sp? +z) +
(xyz -ZD)t--+,X=xXx-TyV=yt G,
. B(x, y) = 8 — (2 — Bxyp + By) + Z=2
(2x7y — y) 16. Define the notion ofa local minimum ¢ of a
~ B(%, y) = 4 -— 3x + (2 - Qxy t+ y) + function f(x) of n variables at a point h.
Qxy+ yt
» B(x, y) = 5 — (8 — b6xy + 2y) + In Exercises 17-22, let
(40 = xy)
++ -- ax y=act fx yt haytflayt---
» B(x, y) = 3 — (4x? — Bxy + Sp’) +
QQx*y-—p)yters x=x+5, yay in accordance with the notation we have been
» 8% y) = 24+ (8 + day t+ p+ using. Let d, and A, be the eigenvalues of the
Co + 5xy) symmetric coefficient matrix of f,(x, y).
» &(x, y) = 5+ 3x + 10xy + TP) +
(Ixy
- y) i7. Give an example ofa polynomial function
g(x, y) for which f(x, y) = 0, A, = 0,
» &(x, y) = 4 - Oe - bxy + Dy?) + |A.| = 1, and g(x, y) has a local minimum of
(Oy) too 10 at the origin.
- B(x, y) = 3 + (2x? + Bxy + By?) + 18. Repeat Exercise 17, but make g(x, y) have a
(4x? -xy*)y)+-->,x=x-3,p=y-l local maximum of —5 at the origin.
10. &(xX, y) = 4 - (+ 3xy - y+ . Repeat Exercise 17, but make g(x, y) have
(4— x’ x=xt+1yp=y-7
Sxy),p no local extremum at the origin.
Il. &(x, y, Z) = 4-0 + 4xy + Sy + 327) +
20. Give an example of a polynomial function
Gd
— xyz) g(x, y) such that f(r, ») = Ale y) = 9.
12. &(x, y, Z) = 3 + (2 + 6xz — p + 52%) + having a local maximum of | at the
(x°z — y?z) + ces origin.

21. Repeat Exercise 20, but make g(x, y) have a and prove that this minimum value is
local minimum of 40 at the origin. assumed at any corresponding eigenvector of
22. Repeat Exercise 20, but make g(x, y) have length 1.
no local extremum at the origin. 33. Let f(x) be a quadratic form in n vanables.
Descnbe the maximum and minimum
In Exercises 23-31, find the maximum and values assumed by f(x) on the sphere
minimum values of the quadratic form on the unit \|x|| = a? for any a > 0. Describe the points
circle in R?, or unit sphere in R’, and find all of the sphere at which these extrema are .
points or the unit circle or sphere where the assumed. [Hint: For any & in R, how does
extrema are assumed. f(kx) compare with f(x)?]

23. xy,n=2 a In Exercises 34-38, use the routine MATCOMP


24. 3x2 + 4xy,n = 2 in LINTEK, or MATLAB, and follow the
25. —6xy + 8p, n=2 directions for Exercises 1-15.
26. x + 2xy t+ yn= 3
34. e(x. y, Z)= 7+ (Bx + 2y + 527 + Axp +
27. 3x?— 6xy + 3y*4,n
= 2
2yz) + (xyz — xy)
28. y? + 2xz,n = 3
35. &(x, y, 2) = 5 —(e+ 6y + 42? — xy -
29. ety tz — 2xyp + 2xz -— 2yz,n = 3 2xz — 3yz) + (xz?- 52) + -+*
30. x+y t+ 2? — 2xp -— 2xz - 2ypz,n = 3
36. g(x,y, Z) = -1 — (2h — BY + 32? - 4xy +
(SuGGeEstion: Use Example 4 in Section 6.3.]
yz + 6xz) + (8x? — 4y?z + 32°)
31. x + w? + 4yz — 2xn, n = 4
37. g(x y 2274+ (8 + 2yrtz — 3xy-
32. Prove that the minimum value assumed by a
4xz — 3yz) + (ey - 4yz’)
quadratic form in n variables on the unit
sphere in R" is the minimum eigenvalue of 38. &(x, v, 2) = 5S + (x + 43? + 162? + 4xy +
the symmetric coefficient matnx of the form, 8xz+ 16yz) + G8 + yp? + Txyz) ++ ->

8.4 COMPUTING EIGENVALUES AND EIGENVECTORS

Computing the eigenvalues of a matrix is one of the toughest jobs in linear


algebra. Many algorithms have been developed, but no one method can be
considered the best for all cases. We have used the characteristic equation for
computation of eigenvalues in most of our examples and exercises. The
routine MATCOMP in LINTEK also uses it. Professionals frown on the use of
this method in practical applications because small errors made in computing
coefficients of the characteristic polynomial can lead to significant errors in
eigenvalues. We describe three other methods in this section.
1. The power method is especially useful if one wants only the eigenvalue
of largest (or of smallest) magnitude, as in many vibration problems.
2. Jacobi’s method for symmetric matrices is presented without proof.
This method is chiefly of historical interest, because the third and most
recent of the methods we describe is more general and usually more
efficient.

3. The QR method was developed by H. Rutishauser in 1958 and J. G. F.
   Francis in 1961 and is probably the most widely used method today. It
   finds all eigenvalues, both real and complex, of a real matrix. Details
   are beyond the scope of this text. We give only a rough idea of the
   method, with no proofs.

The routines POWER, JACOBI, and QRFACTOR in LINTEK can be used to
illustrate the three methods.

The Power Method

Let A be an n × n diagonalizable matrix with real eigenvalues. Suppose that
one eigenvalue—say, λ₁—has greater magnitude than all the others. That is,
|λ₁| > |λᵢ| for i > 1. We call λ₁ the dominant eigenvalue of A. In many vibration
problems, we are interested only in computing this dominant eigenvalue and
the eigenvalue of minimum absolute value.
   Because A is diagonalizable, there exists a basis

   {b₁, b₂, . . . , bₙ}

for Rⁿ composed of eigenvectors of A. We assume that bᵢ is the eigenvector
corresponding to λᵢ and that the numbering is such that |λ₁| > |λ₂| ≥ |λ₃| ≥
··· ≥ |λₙ|. Let w₁ be any nonzero vector in Rⁿ. Then

   w₁ = c₁b₁ + c₂b₂ + ··· + cₙbₙ                                          (1)

for some constants cᵢ in R. Applying Aˢ to both sides of Eq. (1) and
remembering that Aˢbᵢ = λᵢˢbᵢ, we see that

   Aˢw₁ = λ₁ˢc₁b₁ + λ₂ˢc₂b₂ + ··· + λₙˢcₙbₙ.                               (2)

Because λ₁ is dominant, we see that, for large s, the summand λ₁ˢc₁b₁ dominates
the right-hand side of Eq. (2), as long as c₁ ≠ 0. This is even more evident if we
rewrite Eq. (2) in the form

   Aˢw₁ = λ₁ˢ(c₁b₁ + (λ₂/λ₁)ˢc₂b₂ + ··· + (λₙ/λ₁)ˢcₙbₙ).                    (3)

If s is large, the quotients (λᵢ/λ₁)ˢ for i > 1 are close to zero, because |λᵢ/λ₁| < 1.
Thus, if c₁ ≠ 0 and s is large enough, Aˢw₁ is very nearly parallel to λ₁ˢc₁b₁, which
is an eigenvector of A corresponding to the eigenvalue λ₁. This suggests that we
can approximate an eigenvector of A corresponding to the dominant eigen-
value λ₁ by multiplying an appropriate initial approximation vector w₁
repeatedly by A.
   A few comments are in order. In a practical application, we may have a
rough idea of an eigenvector for λ₁ and be able to choose a reasonably good
first approximation w₁. In any case, w₁ should not be in the subspace of Rⁿ

generated by the eigenvectors corresponding to the λⱼ for j > 1. This
corresponds to the requirement that c₁ ≠ 0.
   Repeated multiplication of w₁ by A may produce very large (or very
small) numbers. It is customary to scale after each multiplication, to keep
the components of the vectors at a reasonable size. After the first multiplica-
tion, we find the maximum d₁ of the magnitudes of all the components of Aw₁
and apply A the next time to the vector w₂ = (1/d₁)Aw₁. Similarly, we let w₃ =
(1/d₂)Aw₂, where d₂ is the maximum of the magnitudes of components of Aw₂,
and so on. Thus we are always multiplying A times a vector wᵢ with
components of maximum magnitude 1. This scaling also aids us in estimating
the number-of-significant-figures accuracy we have attained in the components
of our approximations to an eigenvector.
   If x is an eigenvector corresponding to λ₁, then

   (Ax · x)/(x · x) = (λ₁x · x)/(x · x) = λ₁.                              (4)

The quotient (Ax · x)/(x · x) is called a Rayleigh quotient. As we compute the wᵢ,
the Rayleigh quotients (Awᵢ · wᵢ)/(wᵢ · wᵢ) should approach λ₁.
   This power method for finding the dominant eigenvector should, mathe-
matically, break down if we choose the initial approximation w₁ in Eq. (1) in
such a way that the coefficient c₁ of b₁ is zero. However, due to roundoff error,
it often happens that a nonzero component of b₁ creeps into the wᵢ as they are
computed, and the wᵢ then start swinging toward an eigenvector for λ₁ as
desired. This is one case where roundoff error is helpful!
   Equation (3) indicates that the ratio |λ₂/λ₁|, which is the maximum of
the magnitudes |λᵢ/λ₁| for i > 1, should control the speed of convergence
of the wᵢ to an eigenvector. If |λ₂/λ₁| is close to 1, convergence may be quite
slow.
We summarize the steps of the power method in the following box.

HISTORICAL NOTE  THE RAYLEIGH QUOTIENT is named for John William Strutt, the third
Baron Rayleigh (1842-1919). Rayleigh was a hereditary peer who surprised his family by pursuing
a scientific career instead of contenting himself with the life of a country gentleman. He set up a
laboratory at the family seat in Terling Place, Essex, and spent most of his life there pursuing his
research into many aspects of physics—in particular, sound and optics. He is especially famous
for his resolution of the long-standing question in optics as to why the sky is blue, as well as for his
codiscovery of the element argon, for which he won the Nobel prize in 1904. When he received
the British Order of Merit in 1902 he said that "the only merit of which he personally was con-
scious was that of having pleased himself by his studies, and any results that may have been
due to his researches were owing to the fact that it had been a pleasure to him to become a
physicist."
   Rayleigh used the Rayleigh quotient early in his career in an 1873 work in which he needed to
evaluate approximately the normal modes of a complex vibrating system. He subsequently used it
and related methods in his classic text The Theory of Sound (1877).

The Power Method for Finding the Dominant Eigenvalue λ₁ of A

Step 1  Choose an appropriate vector w₁ in Rⁿ as first
        approximation to an eigenvector corresponding to λ₁.
Step 2  Compute Aw₁ and the Rayleigh quotient (Aw₁ · w₁)/(w₁ · w₁).
Step 3  Let w₂ = (1/d₁)Aw₁, where d₁ is the maximum of the
        magnitudes of components of Aw₁.
Step 4  Repeat step 2, with all subscripts increased by 1. The
        Rayleigh quotients should approach λ₁, and the wᵢ should
        approach an eigenvector of A corresponding to λ₁.

EXAMPLE 1  Illustrate the power method for the matrix

   A = [  3  −2 ]
       [ −2   0 ],

starting with

   w₁ = [  1 ]
        [ −1 ],

by finding w₂, w₃, and the first two Rayleigh quotients.

SOLUTION We have

   Aw₁ = [  3  −2 ] [  1 ] = [  5 ]
         [ −2   0 ] [ −1 ]   [ −2 ].

We find as first Rayleigh quotient

   (Aw₁ · w₁)/(w₁ · w₁) = 7/2 = 3.5.

Because 5 is the maximum magnitude of a component of Aw₁, we have

   w₂ = (1/5)Aw₁ = [  1   ]
                   [ −0.4 ].

Then

   Aw₂ = [  3  −2 ] [  1   ] = [ 3.8 ]
         [ −2   0 ] [ −0.4 ]   [ −2  ].

The next Rayleigh quotient is

   (Aw₂ · w₂)/(w₂ · w₂) = 4.6/1.16 = 115/29 ≈ 3.9655.

Finally,

   w₃ = (1/3.8)Aw₂ ≈ [  1      ]
                     [ −0.5263 ].

EXAMPLE 2  Use the routine POWER in LINTEK with the data in Example 1, and give the
vectors wᵢ and the Rayleigh quotients until stabilization to all decimal places
printed occurs.

SOLUTION POWER prints fewer significant figures for the vectors to save space. The data
obtained using POWER are shown in Table 8.2. It is easy to solve the
characteristic equation λ² − 3λ − 4 = 0 of A, and to see that the eigenvalues are
really λ₁ = 4 and λ₂ = −1. POWER found the dominant eigenvalue 4, and it
shows that [1, −0.5] is an eigenvector for this eigenvalue.  ■

   If A has an eigenvalue of nonzero magnitude smaller than that of any other
eigenvalue, the power method can be used with A⁻¹ to find this smallest
eigenvalue. The eigenvalues of A⁻¹ are the reciprocals of the eigenvalues of A,
and the eigenvectors are the same. To illustrate, if 5 is the dominant eigenvalue
of A⁻¹ and v is an associated eigenvector, then 1/5 is the eigenvalue of A of
smallest magnitude, and v is still an associated eigenvector.

TABLE 8.2
Power Method for A = [  3  −2 ]
                     [ −2   0 ]

   Vector Approximations     Rayleigh Quotients

   [1, −1]
   [1, −.4]                  3.5
   [1, −.5263158]            3.96551724137931
   [1, −.4935065]            3.997830802603037
   [1, −.5016286]            3.999864369998644
   [1, −.4995932]            3.999991522909338
   [1, −.5001018]            3.999999470180991
   [1, −.4999746]            3.999999966886309
   [1, −.5000064]            3.999999997930394
   [1, −.4999984]            3.99999999987065
   [1, −.5000004]            3.999999999991916
   [1, −.4999999]            3.999999999999495
   [1, −.5]                  3.999999999999968
   [1, −.5]                  3.999999999999998
   [1, −.5]                  4
   [1, −.5]                  4
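The boxed algorithm is only a few lines in any numerical language. The sketch
below (Python with NumPy; an illustrative stand-in for the POWER routine, not
its actual code) reproduces the iterates and Rayleigh quotients of Table 8.2.

  import numpy as np

  def power_method(A, w, steps=15):
      """Power method with max-magnitude scaling and Rayleigh quotients."""
      w = np.asarray(w, dtype=float)
      for _ in range(steps):
          Aw = A @ w
          rayleigh = (Aw @ w) / (w @ w)       # approaches the dominant eigenvalue
          w = Aw / np.max(np.abs(Aw))         # scale so the largest component is +-1
          print(np.round(w, 7), rayleigh)
      return rayleigh, w

  A = np.array([[ 3.0, -2.0],
                [-2.0,  0.0]])
  power_method(A, [1.0, -1.0])    # converges to eigenvalue 4 with eigenvector [1, -0.5]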

Deflation for Symmetric Matrices


The method of deflation gives a way to compute eigenvalues of intermediate
magnitude of a symmetric matrix by the power method. It is based on an
interesting decomposition of a matrix product AB. Let A be an m × n matrix,
and let B be an n × s matrix. We write AB symbolically as

   AB = [ c₁  c₂  ···  cₙ ] [ r₁ ]
                            [ r₂ ]
                            [ ⋮  ]
                            [ rₙ ],

where cⱼ is the jth column vector of A and rᵢ is the ith row vector of B. We claim
that

   AB = c₁r₁ + c₂r₂ + ··· + cₙrₙ,                                          (5)

where each cₖrₖ is the product of an m × 1 matrix with a 1 × s matrix. To see
this, remember that the entry in the ith row and jth column of AB is

   aᵢ₁b₁ⱼ + aᵢ₂b₂ⱼ + ··· + aᵢₙbₙⱼ.

But c₁r₁ contributes precisely aᵢ₁b₁ⱼ to the ith row and jth column of the sum in
Eq. (5), while c₂r₂ contributes aᵢ₂b₂ⱼ, and so on. This establishes Eq. (5).
   Let A be an n × n symmetric matrix with eigenvalues λ₁, λ₂, . . . , λₙ, where

   |λ₁| ≥ |λ₂| ≥ ··· ≥ |λₙ|.

We know from Section 6.3 that there exists an orthogonal matrix C such that
C⁻¹AC = D, where D is a diagonal matrix with dᵢᵢ = λᵢ. Then

   A = CDC⁻¹ = CDCᵀ

     = [ b₁  b₂  ···  bₙ ] [ λ₁          ] [ b₁ᵀ ]
                           [    λ₂       ] [ b₂ᵀ ]
                           [       ⋱     ] [ ⋮   ]
                           [          λₙ ] [ bₙᵀ ],                        (6)

where {b₁, b₂, . . . , bₙ} is an orthonormal basis for Rⁿ consisting of eigenvectors
of A and forming the column vectors of C. From Eq. (6), we see that

   A = [ b₁  b₂  ···  bₙ ] [ λ₁b₁ᵀ ]
                           [ λ₂b₂ᵀ ]
                           [  ⋮    ]
                           [ λₙbₙᵀ ].                                       (7)

Using Eq. (5) and writing the bᵢ as column vectors, we see that A can be written
in the following form:

Spectral Decomposition of A

   A = λ₁b₁b₁ᵀ + λ₂b₂b₂ᵀ + ··· + λₙbₙbₙᵀ.                                   (8)

Equation (8) is known as the spectral theorem for symmetric matrices.

EXAMPLE 3  Illustrate the spectral theorem for the matrix

   A = [  3  −2 ]
       [ −2   0 ]

of Example 1.

SOLUTION Computation shows that the matrix A has eigenvalues λ₁ = 4 and λ₂ = −1, with
corresponding eigenvectors

   v₁ = [  2 ]   and   v₂ = [ 1 ]
        [ −1 ]              [ 2 ].

Eigenvectors b₁ and b₂ forming an orthonormal basis are

   b₁ = [  2/√5 ]   and   b₂ = [ 1/√5 ]
        [ −1/√5 ]              [ 2/√5 ].

We have

   λ₁b₁b₁ᵀ + λ₂b₂b₂ᵀ = 4 [  4/5  −2/5 ] − [ 1/5  2/5 ]
                        [ −2/5   1/5 ]   [ 2/5  4/5 ]

                     = [ 16/5  −8/5 ] − [ 1/5  2/5 ]
                       [ −8/5   4/5 ]   [ 2/5  4/5 ]

                     = [  3  −2 ] = A.  ■
                       [ −2   0 ]

HISTORICAL NOTE  THE TERM SPECTRUM was coined around 1905 by David Hilbert (1862-
1943) for use in dealing with the eigenvalues of quadratic forms in infinitely many variables. The
notion of such forms came out of his study of certain linear operators in spaces of functions;
Hilbert was struck by the analogy one could make between these operators and quadratic forms in
finitely many variables. Hilbert's approach was greatly expanded and generalized during the next
decade by Erhard Schmidt, Frigyes Riesz (1880-1956), and Hermann Weyl. Interestingly enough,
in the 1920s physicists called on spectra of certain linear operators to explain optical spectra.
   David Hilbert was the most influential mathematician of the early twentieth century. He
made major contributions in many fields, including algebraic forms, algebraic number theory,
integral equations, the foundations of geometry, theoretical physics, and the foundations of
mathematics. His speech at the 2nd International Congress of Mathematicians in Paris in 1900,
outlining the important mathematical problems of the day, proved extremely significant in
providing the direction for twentieth-century mathematics.

   Suppose now that we have found the eigenvalue λ₁ of maximum magni-
tude of A, and a corresponding eigenvector v₁, by the power method. We
compute the unit vector b₁ = v₁/||v₁||. From Eq. (8), we see that

   A − λ₁b₁b₁ᵀ = 0b₁b₁ᵀ + λ₂b₂b₂ᵀ + ··· + λₙbₙbₙᵀ                            (9)

is a matrix with eigenvalues λ₂, λ₃, . . . , λₙ, 0 in order of descending
magnitude, and with corresponding eigenvectors b₂, b₃, . . . , bₙ, b₁. We can
now use the power method on this matrix to compute λ₂ and b₂. We then
execute this deflation again, forming A − λ₁b₁b₁ᵀ − λ₂b₂b₂ᵀ to find λ₃ and b₃, and
so on.
   The routine POWER in LINTEK has an option to use this method of
deflation. For the symmetric matrices of the small size that we use in our
examples and exercises, POWER handles deflation well, provided that we
compute each eigenvalue until stabilization to all places shown on the screen is
achieved. In practice, scientists are wary of using deflation to find more than
one or two further eigenvalues, because any error made in computation of an
eigenvalue or eigenvector will propagate errors in the computation of subse-
quent ones.

EXAMPLE 4  Illustrate Eq. (9) for deflation with the matrix

   A = [  3  −2 ]
       [ −2   0 ]

of Example 3.

SOLUTION From Example 3, we have

   A = λ₁b₁b₁ᵀ + λ₂b₂b₂ᵀ = 4 [  4/5  −2/5 ] − [ 1/5  2/5 ]
                            [ −2/5   1/5 ]   [ 2/5  4/5 ].

Thus,

   A − λ₁b₁b₁ᵀ = − [ 1/5  2/5 ] = [ −1/5  −2/5 ]
                  [ 2/5  4/5 ]   [ −2/5  −4/5 ].

The characteristic polynomial for this matrix is

   | −1/5−λ   −2/5   | = (1/5 + λ)(4/5 + λ) − 4/25 = λ² + λ = λ(λ + 1),
   |  −2/5   −4/5−λ  |

and the eigenvalues are indeed λ₂ = −1 together with 0, as indicated following
Eq. (9).  ■

EXAMPLE 5  Use the routine POWER in LINTEK to find the eigenvalues and eigenvectors,
using deflation, for the symmetric matrix

   [  2  −8   5 ]
   [ −8   0  10 ]
   [  5  10  −6 ].

SOLUTION Using POWER with deflation and finding eigenvalues as accurately as the
printing on the screen permits, we obtain the eigenvectors and eigenvalues

   v₁ = [−.6042427, −.8477272, 1],    λ₁ = −17.49848531152027,
   v₂ = [−.760632, 1, .3881208],      λ₂ = 9.96626448890372,
   v₃ = [1, .3958651, .9398283],      λ₃ = 3.532220822616553.

Using MATCOMP as a check, we find the same eigenvalues and eigen-
vectors.  ■
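Deflation is equally short in code. The sketch below (Python with NumPy; an
illustration of Eq. (9), not the POWER routine) repeatedly finds a dominant
eigenpair by the power method and then subtracts λbbᵀ, reproducing the
eigenvalues of Example 5.

  import numpy as np

  def dominant_eigenpair(A, iters=500):
      """Power method: return the dominant eigenvalue (Rayleigh quotient) and a unit eigenvector."""
      w = np.ones(A.shape[0])
      for _ in range(iters):
          w = A @ w
          w /= np.max(np.abs(w))
      lam = (A @ w) @ w / (w @ w)
      return lam, w / np.linalg.norm(w)

  A = np.array([[ 2.0, -8.0,  5.0],
                [-8.0,  0.0, 10.0],
                [ 5.0, 10.0, -6.0]])

  B = A.copy()
  for _ in range(3):
      lam, b = dominant_eigenpair(B)
      print(lam)                    # approximately -17.4985, 9.9663, 3.5322
      B = B - lam * np.outer(b, b)  # deflate as in Eq. (9)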

Jacobi’s Method for Symmetric Matrices

We present Jacobi's method for diagonalizing a symmetric matrix, omitting
proofs. Let A = [aᵢⱼ] be an n × n symmetric matrix, and suppose that a_pq is an
entry of maximum magnitude among the entries of A that lie above the main
diagonal. For example, in the matrix

   A = [  2  −8   5 ]
       [ −8   0  10 ],                                                      (10)
       [  5  10  −6 ]

the entry above the diagonal having maximum magnitude is 10. Then form the
2 × 2 matrix

   [ a_pp  a_pq ]
   [ a_qp  a_qq ].                                                          (11)

From matrix (10), we would form the matrix consisting of the portion shown
in color—namely,

   [  0  10 ]
   [ 10  −6 ].

Let C = [cᵢⱼ] be a 2 × 2 orthogonal matrix that diagonalizes matrix (11).
(Recall that C can always be chosen to correspond to a rotation of the plane,
although this is not essential in order for Jacobi's method to work.) Now form
an n × n matrix R, the same size as the matrix A, which looks like the identity
matrix except that r_pp = c₁₁, r_pq = c₁₂, r_qp = c₂₁, and r_qq = c₂₂. For matrix (10),
where 10 has maximum magnitude above the diagonal, we would have

   R = [ 1   0    0  ]
       [ 0  c₁₁  c₁₂ ].
       [ 0  c₂₁  c₂₂ ]

This matrix R will be an orthogonal matrix, with det(R) = det(C). Now form
the new symmetric matrix B₁ = RᵀAR, which has zero entries in the row p,
column q position and in the row q, column p position. Other entries in B₁ can
also be changed from those in A, but it can be shown that the maximum
magnitude of off-diagonal entries has been reduced, assuming that no other
above-diagonal entry in A had the magnitude of a_pq. Then repeat this process,
starting with B₁ instead of A, to obtain another symmetric matrix B₂, and
so on. It can be shown that the maximum magnitude of off-diagonal entries
in the matrices Bᵢ approaches zero as i increases. Thus the sequence of
matrices

   B₁, B₂, B₃, . . .

will approach a diagonal matrix D whose eigenvalues d₁₁, d₂₂, . . . , dₙₙ are the
same as those of A.
   If one is going to use Jacobi's method much, one should find the 2 × 2
matrix C that diagonalizes a general 2 × 2 symmetric matrix

   [ a  b ]
   [ b  c ];

that is, one should find formulas for computing C in terms of the entries a, b,
and c. Exercise 23 develops such formulas.
   Rather than give a tedious pencil-and-paper example of Jacobi's method,
we choose to present data generated by the routine JACOBI in LINTEK
for matrix (10)—which is the same matrix as that in Example 5, where we
used the power method. Observe how, in each step, the colored entries of
maximum magnitude off the diagonal are reduced to "zero." Although
they may not remain zero in the next step, they never return to their orig-
inal size.

EXAMPLE 6  Use the routine JACOBI in LINTEK to diagonalize the matrix

   [  2  −8   5 ]
   [ −8   0  10 ].
   [  5  10  −6 ]

SOLUTION The routine JACOBI gives the following matrices:

   [  2  −8   5 ]
   [ −8   0  10 ]
   [  5  10  −6 ]

   [ 2          8.786909   3.433691 ]
   [ 8.786909  −13.44031   0        ]
   [ 3.433691   0          7.440307 ]

   [ −17.41676   0          −1.415677 ]
   [  0          5.97645    −3.128273 ]
   [ −1.415677  −3.128273    7.440307 ]

   [ −17.41676  −.8796473  −1.109216 ]
   [ −.8796473   3.495621   0        ]
   [ −1.109216   0          9.921136 ]

   [ −17.46169  −.8789265     0            ]
   [ −.8789265   3.495621     3.560333E-02 ]
   [  0          3.560333E-02 9.966067     ]

   [ −17.49849     0             1.489243E-03 ]
   [  0            3.532418      3.557217E-02 ]
   [  1.489243E-03 3.557217E-02  9.966067     ]

   [ −17.49849      8.233765E-06  −1.48922E-03 ]
   [  8.233765E-06  3.532221       0           ]
   [ −1.48922E-03   0              9.966265    ]

   [ −17.49849      8.233765E-06    0            ]
   [  8.233765E-06  3.532221       −4.464509E-10 ]
   [  0            −4.464509E-10    9.966265     ]

The off-diagonal entries are now quite small, and we obtain the same
eigenvalues from the diagonal that we did in Example 5 using the power
method.  ■
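A bare-bones version of the Jacobi step is sketched below (Python with NumPy;
only an illustration of the idea, not the JACOBI routine). It repeatedly annihilates
the off-diagonal pair of largest magnitude with a plane rotation and reproduces
the eigenvalues found above.

  import numpy as np

  def jacobi_eigenvalues(A, tol=1e-10, max_rotations=100):
      """Classical Jacobi method for a symmetric matrix A."""
      B = np.array(A, dtype=float)
      n = B.shape[0]
      for _ in range(max_rotations):
          # locate the above-diagonal entry of maximum magnitude
          off = np.abs(np.triu(B, k=1))
          p, q = np.unravel_index(np.argmax(off), off.shape)
          if off[p, q] < tol:
              break
          # rotation angle that zeros B[p, q] (standard Jacobi formula)
          theta = 0.5 * np.arctan2(2 * B[p, q], B[p, p] - B[q, q])
          c, s = np.cos(theta), np.sin(theta)
          R = np.eye(n)
          R[p, p] = c; R[q, q] = c
          R[p, q] = -s; R[q, p] = s
          B = R.T @ B @ R
      return np.diag(B)

  A = [[ 2, -8,  5],
       [-8,  0, 10],
       [ 5, 10, -6]]
  print(jacobi_eigenvalues(A))   # approximately -17.4985, 3.5322, 9.9663 in some order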

QR Algorithm
At the present time, an algorithm based on the QR factorization of an
invertible matrix, discussed in Section 6.2, is often used by professionals to
find eigenvalues of a matrix. A full treatment of the QR algorithm is beyond
the scope of this text, but we give a brief description of the method. As with the
Jacobi method, pencil-and-paper computations of eigenvalues using the
QR algorithm are too cumbersome to include in this text. The routine
QRFACTOR in LINTEK can be used to illustrate the features of the
QR algorithm that we now describe.
Let A be a nonsingular matrix. The QR algorithm generates a sequence of
matrices A₁, A₂, A₃, A₄, . . . , all having the same eigenvalues as A. To generate
this sequence, let A₁ = A, and factor A₁ = Q₁R₁, where Q₁ is the orthogonal
matrix and R₁ is the upper-triangular matrix described in Section 6.2. Then let
A₂ = R₁Q₁, factor A₂ into Q₂R₂, and set A₃ = R₂Q₂. Continue in this fashion,
factoring Aₖ into QₖRₖ and setting Aₖ₊₁ = RₖQₖ. Under fairly general

conditions, the matrices Aᵢ will approach an almost upper-triangular matrix of
the form

   [ ×  ×  ×  ×  ×  ···  × ]
   [ ×  ×  ×  ×  ×  ···  × ]
   [ 0  ×  ×  ×  ×  ···  × ]
   [ 0  0  ×  ×  ×  ···  × ]
   [ 0  0  0  ×  ×  ···  × ]
   [ ⋮              ⋱    ⋮ ]
   [ 0  0  0  0  0  ···  × ]

The colored entries just below the main diagonal may or may not be zero. If
one of these entries is nonzero, the 2 × 2 submatrix having the entry in its
lower left-hand corner, like the matrix shaded above, has a pair of complex
conjugate numbers a ± bi as eigenvalues that are also eigenvalues of the large
matrix, and of A. Entries on the diagonal that do not lie in such a 2 × 2 block
are real eigenvalues of the matrix and of A.
   The routine QRFACTOR in LINTEK can be used to illustrate this
procedure. A few comments about the procedure and the program are in order.
From A₁ = Q₁R₁, we have R₁ = Q₁⁻¹A₁. Then A₂ = R₁Q₁ = Q₁⁻¹A₁Q₁, so we
see that A₂ is similar to A₁ = A and therefore has the same eigenvalues as A.
Continuing in this fashion, we see that each matrix Aᵢ is similar to A. This
explains why the eigenvalues don't change as the matrices of the sequence are
generated.
   Notice, too, that Q₁ and Q₁⁻¹ are orthogonal matrices, so Q₁⁻¹x and yQ₁
have the same magnitudes as the vectors x and y, respectively. It follows that, if
E is the matrix of errors in the entries of A, then the error matrix Q₁⁻¹EQ₁
arising in the computation of Q₁⁻¹AQ₁ is of magnitude comparable to that of E.
That is, the generation of the sequence of matrices Aᵢ is stable. This is highly
desirable in numerical computations.
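The basic iteration is easy to experiment with in software. The sketch below
(Python with NumPy; a toy illustration, not QRFACTOR, and without the shifts
or refinements a production code would use) runs the unshifted QR iteration and
reads the eigenvalues off the diagonal for a matrix with real eigenvalues.

  import numpy as np

  def qr_algorithm(A, iterations=200):
      """Unshifted QR iteration: A_{k+1} = R_k Q_k, where A_k = Q_k R_k."""
      Ak = np.array(A, dtype=float)
      for _ in range(iterations):
          Q, R = np.linalg.qr(Ak)   # QR factorization of the current iterate
          Ak = R @ Q                # similar to A, so the eigenvalues are unchanged
      return Ak

  A = [[ 2, -8,  5],
       [-8,  0, 10],
       [ 5, 10, -6]]
  Ak = qr_algorithm(A)
  print(np.diag(Ak))   # approximately [-17.4985, 9.9663, 3.5322]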
   Finally, it is often useful to perform a shift, adding a scalar multiple rI of
the identity matrix to Aᵢ before generating the next matrix Aᵢ₊₁. Such a shift
increases all eigenvalues by r (see Exercise 32 in Section 5.1), but we can keep
track of the total change due to such shifts and adjust the eigenvalues of the
final matrix found to obtain those of A. To illustrate one use of shifts, suppose
that we wish to find eigenvalues of a singular matrix B. We can form an initial
shift, perhaps taking A = B + (.001)I, to obtain an invertible matrix A to start
the algorithm. For another illustration, we can find by using the routine
QRFACTOR that the QR algorithm applied to the matrix

   [ 0  1 ]
   [ 1  0 ]

generates this same matrix repeatedly, even though the eigenvalues are 1 and
−1 rather than complex numbers. This is an example of a matrix A for which

the sequence of matrices Aᵢ does not approach a form described earlier.
However, a shift that adds (.9)I produces the matrix

   [ .9  1  ]
   [ 1   .9 ],

which generates a sequence that quickly converges to

   [ 1.9    0 ]
   [ 0   −.1 ].

Subtracting the scalar .9 from the eigenvalues 1.9 and −.1 of this last matrix,
we obtain the eigenvalues 1 and −1 of the original matrix.
   Shifts can also be used to speed convergence, which is quite fast when the
ratios of magnitudes of eigenvalues are large. The routine QRFACTOR
displays the matrices Aᵢ as they are generated, and it allows shifts. If we notice
that we are going to obtain an eigenvalue whose decimal expansion starts with
17.52, then we can speed convergence greatly by adding the shift (−17.52)I.
The resulting eigenvalue will be near zero, and the ratios of the magnitudes of
other eigenvalues to its magnitude will probably be large. Using this technique
with QRFACTOR, it is quite easy to find all eigenvalues, both real and
complex, of most matrices of reasonable size that can be displayed conve-
niently.
   Professional programs make many further improvements in the algorithm
we have presented, in order to speed the creation of zeros in the lower part of
the matrix.

SUMMARY

1. Let A be an n × n diagonalizable matrix with real eigenvalues λᵢ and with a
   dominant eigenvalue λ₁ of algebraic multiplicity 1, so that |λ₁| > |λⱼ| for j =
   2, 3, . . . , n. Start with any vector w₁ in Rⁿ that is not in the subspace
   generated by eigenvectors corresponding to the eigenvalues λⱼ for j > 1.
   Form the vectors w₂ = (Aw₁)/d₁, w₃ = (Aw₂)/d₂, . . . , where dᵢ is the
   maximum of the magnitudes of the components of Awᵢ. The sequence of
   vectors

      w₁, w₂, w₃, . . .

   approaches an eigenvector of A corresponding to λ₁, and the associated
   Rayleigh quotients (Awᵢ · wᵢ)/(wᵢ · wᵢ) approach λ₁. This is the foundation
   for the power method, which is summarized in the box before Example 1.
2. If A is diagonalizable and invertible, and if |λₙ| < |λᵢ| for i < n with λₙ of
   algebraic multiplicity 1, then the power method may be used with A⁻¹ to
   find λₙ.

3. Let A be an n × n symmetric matrix with eigenvalues λᵢ such that |λ₁| ≥
   |λ₂| ≥ ··· ≥ |λₙ|. If b₁ is a unit eigenvector corresponding to λ₁, then
   A − λ₁b₁b₁ᵀ has eigenvalues λ₂, λ₃, . . . , λₙ, 0, and if |λ₂| > |λ₃|, then λ₂ can
   be found by applying the power method to A − λ₁b₁b₁ᵀ. This deflation can
   be continued with A − λ₁b₁b₁ᵀ − λ₂b₂b₂ᵀ if |λ₃| > |λ₄|, and so on, to find
   more eigenvalues.
4. In the Jacobi method for diagonalizing a symmetric matrix A, one
   generates a sequence of symmetric matrices, starting with A. Each matrix
   of the sequence is obtained from the preceding one by multiplying it on the
   left by Rᵀ and on the right by R, where R is an orthogonal "rotation"
   matrix designed to annihilate the two (symmetrically located) entries of
   maximum magnitude off the diagonal. The matrices in the sequence
   approach a diagonal matrix D having the same eigenvalues as A.
5. In the QR method, one begins with an invertible matrix A and generates a
   sequence of matrices Aᵢ by setting A₁ = A and Aᵢ₊₁ = RᵢQᵢ, where the QR
   factorization of Aᵢ is QᵢRᵢ. The matrices Aᵢ approach almost upper-
   triangular matrices having the real eigenvalues of A on the diagonal and
   pairs of complex conjugate eigenvalues of A as eigenvalues of 2 × 2 blocks
   appearing along the diagonal. Shifts may be used to speed convergence or
   to find eigenvalues of a singular matrix.

EXERCISES

In Exercises 1-4, use the power method to 111


estimate the eigenvalue of maximum magiiitude 7./1 00
and a corresponding eigenvector for the given 100
matrix. Start with first estimate ° i -1 -1

-['}
wal! 8.|-1
an 1-1

and compute w,, W,, and w,. Also find the three [Hinr: Use Example 4 in Section 6.3.]
Rayleigh quotients. Then find the exact
eigenvalues, for comparison, using the In Exercises 9-12, find the matrix obtained by
characteristic equation. deflation after the (exact) eigenvalue of maximum
magnitude and a corresponding eigenvector are
if 3-3 2, [3 -3 found. -
-5 { 4 -5
3, [-3 10 4 [-49 9. The matrix in Exercise 5
3 8 2 5 10. The matrix in Exercise 6
11. The matrix in Exercise 7
In Exercises 5-8, find the spectral decomposition
: 12. The matnx in Exercise 8
(8) of the given symmetric matrix.
= In Exercises 13-15, use the routine POWER in
5, 3 | 6. 3 4 LINTEK, or review Exercises M2-M5 in Section
3 2 5 3 5.1 and use MATLAB as in M4 and M5, to find

the eigenvalue of maximum magnitude and a strengthen your geometric understanding of


corresponding eigenvector of the given matrix.
the power method. There is no scaling, but
[ 1 -44 -88 you have the option of starting again if the
3.4-5 55 143 numbers get so large that the vectors are off
| 1 -—24 —48| the screen. Read the directions in the
program, and then run this eigenvalue
57-231 option until you can reliably achieve a score
14. )/-205 8 -113 of 85% or better.
-130 4 -70
23. Consider the matrix
3 -22 -—46
-3 23 47 A= a 5}
1 -10 -20 bec
16. Use the software described for Exercises The following steps will form an orthogonal
13-15 with matrix inversion to find the diagonalizing matrix C for A.
eigenvalue of minimum magnitude and the
corresponding eigenvector for the matrix in Step 1 Le. g = (a — cV/Z.
Exercise 13. Step 2 Leth = Vg’? + B.
17. Repeat Exercise 16 for the matrix Step 3 Letr= VP + (¢ + Ay.
Step4 Lets = VP + (g — AY.
[-l 3 |
3 2-14), Step 5 Let C = —b/r —b/s |
1 -ll #7 (g + Ayr (g — A)is

In Exercises 18-21, use the routine POWER (If b < 0 and a rotation matrix is desired,
in change the sign of a column vector in C.)
LINTEK and deflation to find all eigenvalues and
corresponding eigenvectors for the given Prove this algorithm by proving the
symmetric matrix Always continue the method following.
before deflating until as much stabilization as a. The eigenvalues of A are
possible is achieved. Note the relationship between - a+c+V(a-
cP + 4b?
ratios of magnitudes of eigenvalues and the speed A= 5 ,
of convergence.
[Hint: Use the quadratic formula.}
3 5 -7 b. The first row vector of A — A/ is
18. 5 10 11 (g + Vg? + b’, bd).
-7 Il Oo c. The columns of the orthogonal matrix C
0-1 4 in step 5 can be found using part b.
19. |-] 2-1 d. The parenthetical statement following
4-1 Q step 5 is valid.
24. Use the algorithm in Exercise 23 to find an
}3 1 -2 i
>a

orthogonal diagonalizing matrix with


4 1 4 0 -3
wf? 00 5 determinant | for each of the following.

3
1-3
7-2
5
6
2
“ld
ae
7 4411 3
te 5-8 0
|6 3 0 6
22. The eigenvalue option of the routine In Exercises 25-28, use the routine JACOBI in
VECTGRPH in LINTEK is designed to LINTEK to find the eigenvalues of the matrix by
Jacobi's method.

25. The matrix in Exercise 18 3 6 —-7 2


26. The matrix in Exercise 19 39. {13 —!8 4 6
“1321 32 -16 9
27. The matrix in Exercise 20 _4 g 7 ,
|
28. The matrix in Exercise ?1
[-12 3 15 2-2 j
|
In Exercises 29-32, use the routine QRFACTOR 47 -34 87 24
in LINTEK to find all eigenvalues, both real and 31. 35 72 33 -57 82
complex, of the given matrix. Be sure to make use “145° 67 32 10 46
of shifts to speed convergence whenever you can L ~9 22 21 -45 8

[
see approximately what an eigenvalue will be. 3 6 4-2 {6
21 -33 5S 8 -12

2 -l1
3
9
§
-6
0 -Ii
7
8 | 32. 15
—18
[-22
-21
12
31
13
4
14
4 = 20
8
9 10
3
CHAPTER 9

COMPLEX SCALARS

Much of the linear algebra we have discussed is equally valid in applications
using complex numbers as scalars. In fact, the use of complex scalars actually
extends and clarifies many of the results we have developed. For example,
when we are dealing with complex scalars, every n × n matrix has n
eigenvalues, counted with their algebraic multiplicities.
   In Section 9.1, we review the algebra of complex numbers. Section 9.2
summarizes properties of the complex vector space Cⁿ and of matrices with
complex entries. Diagonalization of these matrices and Schur's unitary
triangularization theorem are discussed in Section 9.3. Section 9.4 is devoted
to Jordan canonical forms.

9.1 ALGEBRA OF COMPLEX NUMBERS

The Number i

Numbers exist only in our minds. There is no physical entity that is the
number 1. If there were, 1 would be in a place of high honor in some great
museum of science, and past it would file a steady stream of mathematicians,
gazing at | in wonder and awe. As mathematics developed, new numbers were
invented (some prefer to say discovered) to fulfill algebraic needs that arose. If
we start just with the positive integers, which are the most natural numbers,
inventing zero and negative integers enables us not merely to add any two
integers, but also to subtract any two integers. Inventing rational numbers
(fractions) enables us to divide an integer by any nonzero integer. It can be
shown that no rational number squared is equal to 2, so irrational numbers
have to be invented to solve the equation x² = 2, and our familiar decimal
notation comes into use. This decimal notation provides us with other num-
bers, such as π and e, that are of practical use. The numbers (positive, nega-
tive, and zero) that we customarily write using decimal notation have unfor-
tunately become known as the real numbers; they are no more real than any
other numbers we may invent, because all numbers exist only in our minds.


   The real numbers are still inadequate to provide solutions of even certain
polynomial equations. The simple equation x² + 1 = 0 has no real number as a
solution. In terms of our needs in this text, the matrix

   [ 0  −1 ]
   [ 1   0 ]

has no eigenvalues. Mathematicians have had to invent a solution i of the
equation x² + 1 = 0 to fill this need. Of course, once a new number is invented,
it must take its place in the algebra of numbers; that is, we would like to
multiply it and add it to all the numbers we already have. Thus, we also allow
numbers bi and finally a + bi for all of our old real numbers a and b. As we will
subsequently show, it then becomes possible to add, subtract, multiply, and
divide any of these new complex numbers, except that division by zero is still
not possible. It is unfortunate that i has become known as an imaginary
number. The purpose of this introduction is to point out that i is no less real
and no more imaginary than any other number.
   With the invention of i and the consequent enlargement of our number
system to the set C of all the complex numbers a + bi, a marvelous thing
happens. Not only does the equation x² + a = 0 have a solution for all real
numbers a, but every polynomial equation has a solution in the complex
numbers. We state without proof the famous Fundamental Theorem of
Algebra, which shows this.

Fundamental Theorem of Algebra


Every polynomial equation with coefficients in C has n solutions in
C, where n is the degree of the polynomial and the solutions are
counted with their algebraic multiplicity.

Review of Complex Arithmetic


A complex number z is any expression z = a + bi, where a and b are real
numbers and i = √−1. The scalar a is the real part of z, and b is the imaginary
part of z. It is useful to represent the complex number z = a + bi as the vector
[a, b] in the x,y-plane R², as shown in Figure 9.1. The x-axis is the real axis, and
the y-axis is the imaginary axis. We let C = {a + bi | a, b ∈ R} be the set of all
complex numbers.

FIGURE 9.1
The complex number z.

FIGURE 9.2  Representation of z + w.        FIGURE 9.3  Representation of z − w.

Complex numbers are added, subtracted, and multiplied by real scalars in
the natural way:

   (a + bi) + (c + di) = (a + c) + (b + d)i,
   (a + bi) − (c + di) = (a − c) + (b − d)i,
   r(a + bi) = ra + (rb)i.

These operations have the same geometric representations as those for vectors
in R² and are illustrated in Figures 9.2, 9.3, and 9.4.
   It is clear that the set C of complex numbers is a real vector space of
dimension 2, isomorphic to R².
   The modulus (or magnitude) of the complex number z = a + bi is |z| =
√(a² + b²), which is the length of the vector in Figure 9.1. Notice that |z| = 0 if
and only if z = 0—that is, if and only if a = b = 0.
   Multiplication of two complex numbers is defined in the way it has to be to
make the distributive laws z₁(z₂ + z₃) = z₁z₂ + z₁z₃ and (z₁ + z₂)z₃ = z₁z₃ + z₂z₃
valid. Namely, remembering that i² = −1, we have

   (a + bi)(c + di) = (ac − bd) + (ad + bc)i.

FIGURE 9.4  Representation of rz.        FIGURE 9.5  The conjugate of z.

Multiplication of complex numbers is commutative and associative. To divide
one complex number by a nonzero complex number, we make use of the
notion of a complex conjugate. The conjugate of z = a + bi is z̄ = a − bi.
Figure 9.5 illustrates the geometric relationship between z̄ and z. Computing
zz̄ = (a + bi)(a − bi), we find that

   zz̄ = a² + b² = |z|².                                                    (1)

If z ≠ 0, then Eq. (1) can be written as z(z̄/|z|²) = 1, so 1/z = z̄/|z|². More
generally, if z ≠ 0, then w/z is computed as

   w/z = wz̄/(zz̄) = wz̄/|z|² = (1/|z|²)(wz̄).                                 (2)

We will see geometric representations of multiplication and division in a
moment.
EXAMPLE 1  Find (3 + 4i)⁻¹ = 1/(3 + 4i).

SOLUTION Using Eq. (2), we have

   1/(3 + 4i) = (1/(3 + 4i))·((3 − 4i)/(3 − 4i)) = (3 − 4i)/25 = 3/25 − (4/25)i.  ■

EXAMPLE 2  Compute (2 + 3i)/(1 + 2i).

SOLUTION Using the technique of Eq. (2), we obtain

   (2 + 3i)/(1 + 2i) = ((2 + 3i)/(1 + 2i))·((1 − 2i)/(1 − 2i))
                     = (1/5)(2 + 3i)(1 − 2i) = 8/5 − (1/5)i.  ■
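Python's built-in complex type makes such computations easy to check; the
fragment below (illustrative only) verifies Examples 1 and 2.

  # Python has complex literals: 4j denotes 4i
  z1 = 1 / (3 + 4j)
  print(z1)                      # (0.12-0.16j), i.e., 3/25 - (4/25)i

  z2 = (2 + 3j) / (1 + 2j)
  print(z2)                      # (1.6-0.2j), i.e., 8/5 - (1/5)i

  # Division agrees with Eq. (2): w/z = w * conjugate(z) / |z|^2
  w, z = 2 + 3j, 1 + 2j
  print(w * z.conjugate() / abs(z) ** 2)   # same as z2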
All of the familiar rules of arithmetic apply to the algebra of complex
numbers. However, there is no notion of order for complex numbers; the
notion z₁ < z₂ is defined only if z₁ and z₂ are real. The conjugation operation
turns out to be very important; we summarize its properties in a theorem.

THEOREM 9.1 Properties of Conjugation in C


sewn —
nA

FIGURE 9.6
Polar form of z.

The proofs of the properties in Theorem 9.1 are very easy. We prove
property 3 as an example and leave the rest as exercises. (See Exercises 12 and
13.)

EXAMPLE 3 Prove property 3 of Theorem 9.1.


SOLUTION Let z = a + bi and w = c + di. Then

   z̄ w̄ = (a − bi)(c − di)
        = (ac − bd) − (ad + bc)i,

which is exactly the conjugate of zw = (ac − bd) + (ad + bc)i.  ■

Polar Form cf Complex Numbers


Let us return to the vector representation of z = a + bi in R². Figure 9.6
indicates that, for z ≠ 0, if θ is an angle from the positive x-axis to the vector
representation of z, and if r = |z|, then a = r cos θ and b = r sin θ. Thus,

   z = r(cos θ + i sin θ).    A polar form of z    (3)

HISTORICAL NOTE  COMPLEX NUMBERS MADE THEIR INITIAL APPEARANCE on the mathematical
scene in The Great Art, or On the Rules of Algebra (1545) by the 16th-century Italian
mathematician and physician, Gerolamo Cardano (1501-1576). It was in this work that Cardano
presented an algebraic method of solution of cubic and quartic equations. But it was the quadratic
problem of dividing 10 into two parts such that their product is 40 to which Cardano found the
solution 5 + √−15 and 5 − √−15 by standard techniques. Cardano was not entirely happy with
this answer, as he wrote, "So progresses arithmetic subtlety the end of which, as is said, is as refined
as it is useless."
   Twenty-seven years later the engineer Raphael Bombelli (1526-1572) published an algebra
text in which he dealt systematically with complex numbers. Bombelli wanted to clarify Cardano's
cubic formula, which under certain circumstances could express a correct real solution of a cubic
equation as the sum of two expressions each involving the square root of a negative number. Thus
he developed our modern rules for operating with expressions of the form a + b√−1, including
methods for determining cube roots of such numbers. Thus, for example, he showed that
∛(2 + √−121) = 2 + √−1 and therefore that the solution to the cubic equation x³ = 15x + 4,
which Cardano's formula gave as x = ∛(2 + √−121) + ∛(2 − √−121), could be written
simply as x = (2 + √−1) + (2 − √−1), or as x = 4, the obvious answer.

The angle θ is an argument of z. Of particular importance is the restricted
value

   −π < θ ≤ π.    The principal argument of z

denoted by Arg(z). We usually use this principal argument of z and refer to the
corresponding form (3) as the polar form of z.

EXAMPLE 4  Find the principal argument and the polar form of the complex number z =
−√3 − i.

SOLUTION Because |z| = √((−√3)² + (−1)²) = √4 = 2, we have cos θ = −√3/2 and
sin θ = −1/2, as indicated in Figure 9.7. The principal argument of z is the
angle θ between −π and π satisfying these two conditions—that is, θ =
−5π/6. The required polar form is z = 2(cos(−5π/6) + i sin(−5π/6)).  ■
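The modulus and principal argument are available directly in Python's cmath
module; the fragment below (illustrative only) reproduces Example 4.

  import cmath, math

  z = -math.sqrt(3) - 1j
  r, theta = cmath.polar(z)        # modulus and principal argument (in (-pi, pi])
  print(r)                         # 2.0
  print(theta, -5 * math.pi / 6)   # both approximately -2.61799

  # Rebuild z from its polar form r(cos theta + i sin theta)
  print(cmath.rect(r, theta))      # approximately (-1.7320508-1j)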

   If we multiply two complex numbers in their polar form, we quickly
discover the geometric representation of multiplication. Suppose that we let
z₁ = r₁(cos θ₁ + i sin θ₁) and z₂ = r₂(cos θ₂ + i sin θ₂). Then

   z₁z₂ = r₁r₂((cos θ₁ cos θ₂ − sin θ₁ sin θ₂) + i(sin θ₁ cos θ₂ + cos θ₁ sin θ₂))
        = r₁r₂(cos(θ₁ + θ₂) + i sin(θ₁ + θ₂)).

The last equation arises from the familiar trigonometric identities for
the cosine and sine of a sum. This computation shows that, when two
complex numbers are multiplied, the moduli are multiplied and the argu-
ments are added. Notice, however, that the principal argument of a product
need not be the sum of the principal arguments; the sum of
the principal arguments may have to be adjusted to lie between −π and π.
(See Figure 9.8.)

FIGURE 9.7  The polar form of z = −√3 − i.        FIGURE 9.8  Representation of z₁z₂.

Geometric Representation of z₁z₂

   1. |z₁z₂| = |z₁||z₂|.                                                    (4)
   2. Arg(z₁) + Arg(z₂) is an argument of z₁z₂.

Because z₁/z₂ is the complex number w such that z₂w = z₁, we see that, in
division, one divides the moduli and subtracts the arguments. (See Figure 9.9.)

Geometric Representation of z₁/z₂

   1. |z₁/z₂| = |z₁|/|z₂|.                                                  (5)
   2. Arg(z₁) − Arg(z₂) is an argument of z₁/z₂.

EXAMPLE 5  Illustrate relations (4) and (5) for the complex numbers z₁ = √3 + i and z₂ =
1 + i.

SOLUTION For z₁, we have r₁ = |z₁| = √4 = 2, cos θ₁ = √3/2, and sin θ₁ = 1/2, so

   z₁ = 2(cos π/6 + i sin π/6).

For z₂, we have r₂ = |z₂| = √2 and cos θ₂ = sin θ₂ = 1/√2, so

   z₂ = √2(cos π/4 + i sin π/4).

Thus,

   z₁z₂ = 2√2(cos(π/6 + π/4) + i sin(π/6 + π/4))
        = 2√2(cos 5π/12 + i sin 5π/12)

and

   z₁/z₂ = (2/√2)(cos(π/6 − π/4) + i sin(π/6 − π/4))
         = √2(cos(−π/12) + i sin(−π/12)).  ■

FIGURE 9.9  Representation of z₁/z₂.
Relations (4) can be used repeatedly with z, = z, = z= r{(cos 6 + isin 6) to


compute z’ for a positive integer n. We obtain the formula

z= r'(cos n9 + i sin 76). (6)

We can find nth roots of z by solving the equation wⁿ = z. Writing z =
r(cos θ + i sin θ) and w = s(cos φ + i sin φ) and using Eq. (6), the equation
z = wⁿ becomes

    r(cos θ + i sin θ) = sⁿ(cos nφ + i sin nφ).

Therefore, r = sⁿ and nφ = θ + 2kπ for k = 0, 1, 2, . . . , so

    w = s(cos φ + i sin φ),

where

    s = r^(1/n)  and  φ = θ/n + 2kπ/n.

Exercise 27 indicates that these values for φ represent precisely n distinct
complex numbers, as indicated in the following box.

nth Roots of z = r(cos θ + i sin θ)

The nth roots of z are

    r^(1/n)(cos(θ/n + 2kπ/n) + i sin(θ/n + 2kπ/n))           (7)

for k = 0, 1, 2, . . . , n − 1.

This illustrates the Fundamental Theorem of Algebra for the equation wⁿ = z
of degree n, which has n solutions in C.

EXAMPLE 6  Find the fourth roots of 16, and draw their vector representations in R².

[FIGURE 9.10  The fourth roots of 16.]

SOLUTION  The polar form of z = 16 is z = 16(cos 0 + i sin 0), where Arg(z) = 0. Apply-
ing formula (7), with n = 4, we find the following fourth roots of 16 (see
Figure 9.10):

    k    16^(1/4)(cos(2kπ/4) + i sin(2kπ/4))
    0    2(cos 0 + i sin 0) = 2
    1    2(cos(π/2) + i sin(π/2)) = 2i
    2    2(cos π + i sin π) = −2
    3    2(cos(3π/2) + i sin(3π/2)) = −2i
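Formula (7) translates directly into MATLAB; the following sketch (ours) regenerates the table above.

    r = 16;  theta = 0;  n = 4;          % z = 16, with Arg(z) = 0
    k = 0:n-1;
    w = r^(1/n) * (cos((theta + 2*pi*k)/n) + 1i*sin((theta + 2*pi*k)/n))
    % returns 2, 2i, -2, -2i (up to roundoff)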

SUMMARY

1. A complex number is a number of the form z = a + bi, where a and b are
   real numbers and i = √−1.
2. The modulus of the complex number z = a + bi is |z| = √(a² + b²).
3. The complex number z = a + bi has the polar form z = |z|(cos θ + i sin θ),
   and θ = Arg(z) is the principal argument of z if −π < θ ≤ π.

4. Arithmetic computations in the set C of complex numbers, including


division and extraction of roots, can be accomplished as illustrated in this
section and can be represented geometrically as shown in the figures of this
section.

| EXERCISES
|

I. Find the sum z + wand the product zw if 12. Prove properties 1, 2, and 5 of Theorem 9.1
a Z=1+2iw=3 -i, 13. Prove property 4 of Theorem 9.1.
b z=3+i,w=i. 14. Illustrate Eqs. (5) in the text for z, =
. Find the sum z + wand the product zw if 3 + iand z, = -1 + V3i.
a z=2+3iw=5-4, . Illustrate Eqs. (5) in the text for z, = 2 + 27
b z=1+2iw=2-i. andz,=1+ MV3i.
. Find |z| and z, and verify that zz = |z|’, if . If2 = 16, find |z!.
a. Zz=3+ 2i, . Mark each of the following True or False.
b z=4-i.
a. The existence of complex numbers is
. Find |z| and z, and verify that zz = |z|’, if more doubtful than the oxistence of real
a z=2 +1, numbers.
b. z= 3 - 41. b. Pencil-and-paper computations with
Show that z is a real number if and only if complex numbers are more cumbersome
z=Z, than with real numbers.
c. The square of every complex number is a
. Express z”' in the form a + bi, for a and b
positive real number.
real numbers, if
d. Every complex number has two distinct
a. z= -—I +18, square roots in C.
b z= 3+ 41. e. Every nonzero complex number has two
Express z/w in the form a + bi, for a and 6 distinct square roots in C.
real numbers, if f. The Fundamental Theorem of Algebra
a z=1+2iw=1+i,
asserts that the algebraic operations of
addition, subtraction, multiplication, and
b z=3+i,w=3
+ 41.
division are possible with any two
Find the modulus and principal argument complex numbers, as long as we do not
for divide by zero.
a. 3-4, g. The product of two complex numbers
b. -V3 -i. cannot be a real number unless both
Find the moduius and principal argument numbers are themselves real or unless
for both are of the form bi, where b is a real
number.
a. —2 + 2i,
h. If(a+ bi)= 8, then a’ + b= 4.
b. -2 - 2i.
i. If Arg(z) = 32/4 and Arg(w) = —7/2,
10 Express (3 + i)° in the form a + bi for a then Arg(z/w) = 52/4.
and 6 real numbers. (HINT: Write the given j. If z +z = 2z, then z is a real number.
number in polar form.]
18. Find the three cube roots of 8.
11. Express ‘1 + i)* in the form a + bi for a and
b real numbers. (Hint: Write the given 19. Find the four fourth roots of — 16.
number in polar form.] 29. Find the three cube roots of —27.

21. Find the four fourth roots of 1. 27. Show that the infinite list of values ¢ =

22. Find the six sixth roots of 1.


8 + tke
if
for k = 0,1, 2,.. . yields just
n distinct complex numbers w =
23. Find the eight eighth roots of 256. s(cos + I sin ) of modulus s.
24. A primitive ath root of unity is a complex 28. In calculus it is shown that
number z such that z" = | but z” ¥ | for Saltxteeegea.
m<n. v3 4
. ae ; zy Keg X_ vy XL
a. Give a formula for one primitive nth root Smox=xX- ata ato
of unity. cosx=zl—-L+X%—-¥4H-..
b. Find the primitive fourth roots of unity. a4 6 8
c. How many primitive eighth roots of unity a. Proceeding formally, show that e” = cos
are there? [HtnT: Argue geometrically in 6 + isin 6. (This is Euler’s formula.)
terms of polar forms.] b. Show that every complex number z may
be written in the form z = re*, where
25. Let z, w & C. Show that |z + w| < |z| + |w}.
r = |z; and @ = Arg(z).
[i4iNT: Remember that C is a real vector
c. Assuming that the usual laws of
ee re dimension 2, naturally isomorphic exponents hold for exponents that are
" complex numbers, use part b to derive
26. Show that the nth roots of z € C can be again Egs. (4) and (5) describing
represented geometrically as n equally multiplication and division of complex
spaced points on the circle x? + y* = |z/*. numbers in polar form.

9.2 MATRICES AND VECTOR SPACES WITH COMPLEX SCALARS

Complex Matrices and Linear Systems


Both the real number system R and the complex number system C are
algebraic structures known as fields. In a field, we can add any two elements
and multiply any two elements to produce an element of the field. Addition
and multiplication are commutative and associative operations, and multipli-
cation is distributive over addition. The field contains an additive identity 0
and a multiplicative identity 1. Every element c in the field has an additive
inverse −c in the field—that is, an element that, when added to c, produces the
additive identity. Similarly, every nonzero element d in the field has a
multiplicative inverse 1/d in the field.
The part of our work in Chapters 1, 2, and 3 that rests only on the field
axioms of R is equally valid if we allow complex scalars. In particular, we can
work with complex matrices—that is, with matrices having complex entries:
adding matrices of the same size, multiplying matrices of appropriate sizes,
and multiplying a matrix by a complex scalar. We can solve linear systems by
using the same Gauss or Gauss—Jordan methods that we used in Chapter 1. All
of our work in Chapter 1, Sections 3 through 6, makes perfectly good sense
when applied to complex scalars. Pencil-and-paper computations are more
tedious, however.
EXAMPLE 1 Solve the linear system

    z₁ −  z₂ + (1 + i)z₃ = i
    iz₁ − 2iz₂ +      iz₃ = 2 − i
          iz₂ − (1 + i)z₃ = 1 + 2i.

SOLUTION  We use the Gauss–Jordan method as follows:

    [1  −1    1+i  |   i  ]     [1  −1    1+i  |   i  ]
    [i  −2i    i   |  2−i ]  ~  [0  −i     1   |  3−i ]  ~  · · ·  ~
    [0   i   −1−i  | 1+2i ]     [0   i   −1−i  | 1+2i ]

    [1  0  0  | 10+2i ]
    [0  1  0  |  5+4i ]
    [0  0  1  | −1+4i ].

Thus we obtain the solution z₁ = 10 + 2i, z₂ = 5 + 4i, z₃ = −1 + 4i.  ▪
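As described in the MATLAB notes at the end of this section, MATLAB accepts complex entries directly, so the reduction above can be checked with rref; this short session is ours.

    A = [1 -1 1+1i; 1i -2i 1i; 0 1i -1-1i];   % coefficient matrix
    b = [1i; 2-1i; 1+2i];                      % right-hand side
    rref([A b])        % last column is 10+2i, 5+4i, -1+4i
    A\b                % solves the system directly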

EXAMPLE 2  Find the inverse of the matrix

    A = [1   2i   1+i]
        [1   3i    i ]
        [0  1+i   −1 ].

SOLUTION  We proceed precisely as in Chapter 1, reducing [A | I]:

    [1   2i   1+i  |  1 0 0]     [1   2i   1+i  |  1  0  0]
    [1   3i    i   |  0 1 0]  ~  [0    i   −1   | −1  1  0]  ~  · · ·  ~
    [0  1+i   −1   |  0 0 1]     [0  1+i   −1   |  0  0  1]

    [1 0 0 |  1−4i    4i   1−3i]
    [0 1 0 |    1     −1     1 ]
    [0 0 1 |   1+i   −1−i    i ].

Thus,

    A⁻¹ = [1−4i    4i   1−3i]
          [  1     −1     1 ]
          [ 1+i   −1−i    i ].
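A corresponding MATLAB check of this inverse (ours):

    A = [1 2i 1+1i; 1 3i 1i; 0 1+1i -1];
    inv(A)          % matches the matrix found above
    A*inv(A)        % the 3-by-3 identity, up to roundoff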

Complex Vector Spaces


The definition of a complex vector space is identical with that of a real vector
space, except that the field of scalars used is C rather than R. The set Cⁿ of all
n-tuples having entries in C is an example of a complex vector space. Another
example is the set of all m × n matrices with complex entries. The vector space
Cⁿ has the same standard basis {e₁, e₂, . . . , eₙ} as Rⁿ, but of course now the field
of scalars is C. Thus, Cⁿ is an n-dimensional complex vector space—that is, if
we use complex scalars. In particular, C = C¹ is a one-dimensional complex
vector space, even though we can regard it geometrically as a plane. That plane
is a two-dimensional real vector space, but is a one-dimensional complex
vector space. In general, Cⁿ can be regarded geometrically in a natural way as a
2n-dimensional real vector space, as we ask you to explain in Exercise 1.
All of our work in Chapters 1, 2, and 3 regarding subspaces, generating
sets, independence, and bases carries over to complex vector spaces, and the
proofs are identical.

EXAMPLE 3  Determine whether the set

    S = {[1, 2i, 1 + i], [1, 3i, i], [0, 1 + i, −1]}

is independent and is a basis for C³.

SOLUTION  The vectors given in S are the row vectors in matrix A of Example 2. Row
reduction of the matrix in that example shows that the matrix has rank 3, so
the vectors in S are independent and hence form a basis for the three-
dimensional space C³.  ▪

EXAMPLE 4  Find the coordinate vector v_B in C³ of the vector v = [i, 2 − i, 1 + 2i] relative to
the ordered basis

    B = ([1, i, 0], [−1, −2i, i], [1 + i, i, −1 − i]).

SOLUTION  To find the coordinate vector of v relative to B, we reduce the augmented
matrix having the vectors in B as column vectors and having the vector v as the
column to the right of the partition. This is precisely the augmented matrix
that we reduced in Example 1, so we see that the coordinate vector v_B, written
as usual as a column vector, is

    v_B = [10 + 2i]
          [ 5 + 4i]
          [−1 + 4i].  ▪

Euclidean Inner Product in Cⁿ

We now come to an essential difference in the structures of C and of R. We
have a natural idea of order for the elements of R. We know what it means to
say x₁ < x₂, and we have often used the fact that x² ≥ 0 for all x ∈ R. There is no
idea of order in C, extending the ordering of R, for i² = −1 in C. The nonzero
numbers in C cannot be classified as either positive or negative on the basis of
whether or not they are squares, because all numbers in C are squares. This is a
very important difference between R and C.
Let us see what problems this causes as we try to extend some more of the
ideas in Chapter 3. We multiply matrices with complex entries by taking dot
products of row vectors of the first with column vectors of the second, just as
we do for matrices with real entries. Recall that, in Rⁿ, the length of a vector v is
‖v‖ = √(v · v). However, this dot product cannot be used as an inner product to
define length of vectors in Cⁿ, because, if v = [1, i], we would have v · v = 0. The
fix for this problem is simple. Recalling that |a + bi|² = (a − bi)(a + bi), we
make an adjustment and define the Euclidean inner product in Cⁿ.

DEFINITION 9.1  Euclidean Inner Product

Let u = [u₁, u₂, . . . , uₙ] and v = [v₁, v₂, . . . , vₙ] be vectors in Cⁿ. The
Euclidean inner product of u and v is

    ⟨u, v⟩ = ū₁v₁ + ū₂v₂ + · · · + ūₙvₙ.

Notice that Definition 9.1 gives ⟨[1, i], [1, i]⟩ = (1)(1) + (−i)(i) = 1 + 1 = 2.
Because |a + bi|² = (a − bi)(a + bi), we see at once that, for v ∈ Cⁿ, we have

    ⟨v, v⟩ = |v₁|² + |v₂|² + · · · + |vₙ|²                    (1)

just as in the Rⁿ case.
We list properties of the Euclidean inner product in Cⁿ as a theorem,
leaving the proofs as exercises. (See Exercises 16 through 19.)

THEOREM 9.2  Properties of the Euclidean Inner Product

Let u, v, and w be vectors in Cⁿ, and let z be a complex scalar. Then

1. ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0,
2. ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩,
3. ⟨(u + v), w⟩ = ⟨u, w⟩ + ⟨v, w⟩,
4. ⟨w, (u + v)⟩ = ⟨w, u⟩ + ⟨w, v⟩,
5. ⟨zu, v⟩ = z̄⟨u, v⟩, and ⟨u, zv⟩ = z⟨u, v⟩.

Property 1 of Theorem 9.2 and Eq. (1) preceding the theorem suggest that
we define the magnitude or norm of a vector v in Cⁿ as

    ‖v‖ = √⟨v, v⟩ = √(v̄₁v₁ + v̄₂v₂ + · · · + v̄ₙvₙ).        Magnitude of v

Vectors u and v in Cⁿ are perpendicular (or orthogonal) if ⟨u, v⟩ = 0; property 2
of Theorem 9.2 tells us that ⟨u, v⟩ = 0 if and only if ⟨v, u⟩ = 0. The vectors are
parallel if u = zv for some scalar z in C. Notice that ‖zv‖ = |z| ‖v‖, just as in the
real case. Vectors of magnitude 1 are unit vectors. Having made these
definitions, we will feel free to consider without further definition such things
as orthogonal subspaces and orthonormal bases.

EXAMPLE 5  Find a unit vector in C³ parallel to v = [1, i, 1 + i].

SOLUTION  Because ‖v‖ = √⟨v, v⟩, we have

    ‖v‖ = √((1)(1) + (−i)(i) + (1 − i)(1 + i)) = √(1 + 1 + 2) = 2.

Either of the vectors ±(1/2)[1, i, 1 + i] satisfies the requirement.  ▪

The Euclidean inner product given in Definition 9.1 reduces to the usual
dot product of vectors if the vectors are in Rⁿ. However, we have to watch one
feature very carefully:

    The Euclidean inner product in Cⁿ is not commutative.

(See Property 2 in Theorem 9.2.) For example,

    ⟨[1, i], [0, 1]⟩ = −i   but   ⟨[0, 1], [1, i]⟩ = i.
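With column vectors in MATLAB, the inner product ⟨u, v⟩ of Definition 9.1 is exactly u'*v, since the apostrophe is the conjugate transpose. A two-line check of the example above (our own):

    u = [1; 1i];  v = [0; 1];
    u'*v        % -1i, the inner product <u, v>
    v'*u        %  1i, the inner product <v, u>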


To illustrate the care that must be taken as a result of the noncom-
mutativity of the inner product, we consider the Gram–Schmidt process
applied in the vector space Cⁿ. Let u and v be vectors in Cⁿ. Then the vector

    w = u − (⟨v, u⟩/⟨v, v⟩)v

is orthogonal to v: because ⟨v, v⟩ is real and ⟨v, u⟩ is the conjugate of ⟨u, v⟩
(property 2), the conjugate of the coefficient ⟨v, u⟩/⟨v, v⟩ is ⟨u, v⟩/⟨v, v⟩, so

    ⟨w, v⟩ = ⟨u, v⟩ − (⟨u, v⟩/⟨v, v⟩)⟨v, v⟩ = ⟨u, v⟩ − ⟨u, v⟩ = 0.

We must not use u − (⟨u, v⟩/⟨v, v⟩)v, whose inner product with v is generally not zero.

EXAMPLE 6  Transform the basis

    {[1, i, i], [1, 0, −i], [1, 0, 1]}

of C³ to an orthonormal one, using the Gram–Schmidt process.

SOLUTION  First we transform the given basis to an orthogonal one. Because v₁ = [1, i, i]
and v₂ = [1, 0, −i] are orthogonal, we work with v₃ = [1, 0, 1] and replace it by

    v₃ − (⟨v₁, v₃⟩/⟨v₁, v₁⟩)v₁ − (⟨v₂, v₃⟩/⟨v₂, v₂⟩)v₂
        = [1, 0, 1] − ((1 − i)/3)[1, i, i] − ((1 + i)/2)[1, 0, −i]
        = (1/6)[1 − i, −2 − 2i, 1 + i].

In fact, we prefer to replace v₃ by [1 − i, −2 − 2i, 1 + i], which is just as good.
An orthogonal basis is

    {[1, i, i], [1, 0, −i], [1 − i, −2 − 2i, 1 + i]},

and an orthonormal basis is

    {(1/√3)[1, i, i], (1/√2)[1, 0, −i], (1/(2√3))[1 − i, −2 − 2i, 1 + i]}.  ▪
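A minimal MATLAB sketch of this Gram–Schmidt computation, using column vectors and u'*v for ⟨u, v⟩ (the code and variable names are ours):

    v1 = [1; 1i; 1i];  v2 = [1; 0; -1i];  v3 = [1; 0; 1];
    w3 = v3 - (v1'*v3)/(v1'*v1)*v1 - (v2'*v3)/(v2'*v2)*v2;   % orthogonal to v1 and v2
    6*w3                                   % [1-i; -2-2i; 1+i], as above
    Q = [v1/norm(v1), v2/norm(v2), w3/norm(w3)];
    Q'*Q                                   % the 3-by-3 identity, up to roundoff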

The Conjugate Transpose

In fixing up our old inner product to serve in Cⁿ in Definition 9.1, we had to
decide whether to take conjugates of the components uᵢ of the first vector in
⟨u, v⟩ or to take conjugates of the components vᵢ of the second vector. It is more
convenient to take conjugates of components of the first vector for the
following reason. The bulk of our work in Rⁿ has been formulated in terms of
column vectors, although we often use row notation to save space. For
example, when working with matrix representations of linear transformations,
we always write Ax, where x is a column vector. You may recall several
instances where it has been convenient for us to use the fact that the inner
product (dot product) of two vectors in Rⁿ appears as the sole entry in the matrix
product of the first vector as a row vector and of the second vector as a column
vector. That is, for column vectors x, y ∈ Rⁿ, we can write x · y = xᵀy, where xᵀ
is a row vector and where no distinction is made between a 1 × 1 matrix and a
scalar. Thinking in these terms, we can express the condition that the column
vectors of a real square matrix A form an orthonormal basis as AᵀA = I. In
order to preserve these convenient algebraic formulas with as little change as
possible, we choose to take conjugates of the components of the first vector in
Definition 9.1, and we continue with a definition that allows us to recover
these formulas.

DEFINITION 9.2  Conjugate Transpose, or Hermitian Adjoint

Let A = [aᵢⱼ] be an m × n matrix with complex entries.

1. The conjugate of A is the m × n matrix Ā = [āᵢⱼ].
2. The conjugate transpose (or Hermitian adjoint) of A is the matrix
   A* = (Ā)ᵀ.

For column vectors v, w ∈ Cⁿ, we have

    ⟨v, w⟩ = v*w.                                            (2)

Moreover, the condition for an n × n complex matrix A to have orthogonal
unit column vectors can be written as

    A*A = I.
EXAMPLE 7  Find the conjugate transpose A* of the matrix

    A = [1    i    1+i]
        [2    0     i ]
        [2i   1    1−i].

SOLUTION  We form the transpose while taking the conjugate of each element, obtaining

    A* = [ 1    2   −2i]
         [−i    0    1 ]
         [1−i  −i   1+i].  ▪
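In MATLAB the apostrophe operator produces exactly this conjugate transpose, while the dot-apostrophe operator transposes without conjugating; the check below is ours.

    A = [1 1i 1+1i; 2 0 1i; 2i 1 1-1i];
    A'       % the conjugate transpose A*, as computed above
    A.'      % for comparison: the transpose without conjugation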

Following are some properties of the conjugate transpose that can easily be
verified. (See Exercise 32.)

THEOREM 9.3  Properties of the Conjugate Transpose

Let A and B be m × n matrices. Then

1. (A*)* = A,
2. (A + B)* = A* + B*,
3. (zA)* = z̄A* for any scalar z ∈ C,
4. If A and B are square matrices, (AB)* = B*A*.

EXAMPLE 8  Using the properties in Theorem 9.3, show that, for any n × n matrix A, we
have (A + A*)* = A + A*.

SOLUTION  Using properties 1 and 2 of Theorem 9.3, we have

    (A + A*)* = A* + (A*)* = A* + A = A + A*,

which is what we wished to show.  ▪

Unitary and Hermitian Matrices


Recall that a real orthogonal matrix is a square matrix having orthogonal unit
column vectors. The complex analogue of such a matrix is known by another
name.

DEFINITION 9.3 Unitary Matrix

A square matrix U with complex entries is unitary if its column vectors


are orthogonal unit vectors—that is, if U*U = I.

Our next example gives an important property of unitary matrices that is
familiar to us from the real orthogonal case. Notice in this example how handy
the notation v*v is for ⟨v, v⟩.
EXAMPLE 9  Let T: Cⁿ → Cⁿ be a linear transformation having a unitary matrix U as matrix
representation with respect to the standard basis. Show that ‖T(z)‖ = ‖z‖ for all
z ∈ Cⁿ.

SOLUTION  We know that T(z) = Uz, because U is the standard matrix representation of T.
Because ‖v‖² = ⟨v, v⟩ = v*v, we find by using property 4 of Theorem 9.3 that

    ‖Uz‖² = (Uz)*(Uz) = z*(U*U)z = z*Iz = z*z = ‖z‖².

Taking square roots, we find that ‖Uz‖ = ‖z‖, so ‖T(z)‖ = ‖z‖.  ▪
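A quick numerical illustration (ours, not the text's): the unitary factor Q of a QR factorization of a random complex matrix has orthonormal columns, and multiplication by it preserves norms.

    [Q, R] = qr(randn(3) + 1i*randn(3));   % Q is unitary
    norm(Q'*Q - eye(3))                     % essentially zero
    z = randn(3,1) + 1i*randn(3,1);
    norm(Q*z) - norm(z)                     % essentially zero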

Real symmetric matrices play an important role in linear algebra. We saw
in Chapter 8 that they are very handy in work with quadratic forms. We also
stated without proof in Theorem 5.5 that every real symmetric matrix is
diagonalizable. Symmetric matrices are defined in terms of the transpose
operation. The useful analogue of symmetry for matrices with complex scalars
involves the conjugate transpose operation.

DEFINITION 9.4  Hermitian Matrix

A square matrix H is Hermitian if H* = H—that is, if H is equal to its
conjugate transpose.

If a square matrix H actually has real entries, it is Hermitian if and only if
it is symmetric. Notice that the condition H = H* = (H̄)ᵀ implies that the
entries on the diagonal of any Hermitian matrix are real numbers.

EXAMPLE 10  Example 8 shows that, for any square matrix A, the matrix A + A* is
Hermitian. Illustrate that the entries on the diagonal of A + A* are real, using
the matrix A in Example 7.

SOLUTION  Example 7 shows that, for the matrix

    A = [1    i    1+i]
        [2    0     i ]
        [2i   1    1−i],

we have

    A* = [ 1    2   −2i]
         [−i    0    1 ]
         [1−i  −i   1+i].

Thus,

    A + A* = [ 2    2+i   1−i]
             [2−i    0    1+i]
             [1+i   1−i    2 ],

which has real diagonal entries.  ▪



Hermitian matrices provide the proper generalization of real symmetric
matrices to enable us to prove in the next section that every Hermitian matrix
is diagonalizable. The special case of this theorem for real matrices thus finally
provides us with a proof of the fundamental theorem that every real
symmetric matrix is diagonalizable. The fundamental theorem for real
symmetric matrices is tough to prove if we stay within the real number system,
but it is a corollary of a fairly easy theorem for complex matrices. This
illustrates how we can obtain true insight into theorems in real analysis and
linear algebra by studying analogous concepts that use complex numbers.

SUMMARY

1. Cⁿ is an n-dimensional complex vector space.
2. If u = [u₁, u₂, . . . , uₙ] and v = [v₁, v₂, . . . , vₙ] are vectors in Cⁿ, then ⟨u, v⟩ =
   ū₁v₁ + ū₂v₂ + · · · + ūₙvₙ is the Euclidean inner product of u and v and
   satisfies the properties in Theorem 9.2. In general, ⟨u, v⟩ ≠ ⟨v, u⟩.
3. For u, v ∈ Cⁿ, the vector u − (⟨v, u⟩/⟨v, v⟩)v is perpendicular to v.
4. The conjugate transpose of an m × n matrix A is the n × m matrix A* =
   (Ā)ᵀ. The conjugate transpose operation satisfies the properties in Theo-
   rem 9.3.
5. A square matrix U is unitary if U*U = I. A real unitary matrix is an
   orthogonal matrix.
6. A square matrix H is Hermitian if H = H*. A real Hermitian matrix is a
   symmetric matrix.

EXERCISES

1. Explain how C’ can be viewed as a 5. Find A“! ifA = | ! |


2n-dimensional real vector space and as an P+i 241
n-dimensionai complex vector space. Give a i lt
basis in each case. 6. Find A"'ifA = I; |:
I
2. Is it appropriate to view R" as a subspace of
C”? Explain. l i 1-1
3. Find AB and BA if 7. Find A“ if A -| : r+
O -lt+i i
| i i
A=i1 +i 1 —j | and l l-i ltt
ri I+? l-i 8. rit ita =| 0 1 i
| L+i i 1-ir -' IL-l

B=i2+i 1 1-i i
i | i 9. Solve the linear system Az =] 1 + i Als
4. Find A2and A‘ if A -| 1 o1t |: i
“tsa ! the matrix in Exercise 7.

-Il+i 22. Find a unit vector parallel to


10. Solve the linear system Az = | 2+1 jira is fltil- 4, i.
1 J 23. Find a vector of length 2 parallel to
the matrix in Exercise 8. fil -i 1 +i, - i)
ll. Find the solution space cf tiie homogeneous 24. Find a unit vector perpendicular to
system (2 -—i,1 + 2.
z,+ iz,+ (1 -—d)z,=0 25. Find a vecter perpendicular to both
(1, é@, 1 -— d)and [I +7, 1-4, 1).
(I + 2z,- z,+ z,;=0
(1 + dz, + 2iz, + (3 - 2dz, = 0.
In Exercises 26-29, transform the given basis of
12. Find the rank of the matrix C" into an orthogonal basis, using the
1 1+i i Ii Gram- Schmidt process.
l+i 2+i7 1-i 1 |.
il+f 1+47 14+3% 2-i 26. {[l +i,1- i), (1, 1} in@
13. Find the rank cf the matrix 27. (2+i1+9, [1 +i gin?
(2-G i 1-i 143i 28. {l-i1 +6149, 4,1, -1 - §,
I 1+i -1+21 I | {1,/, -J}in
Ll-i 1+2) L-3i 2433 29. (16,0 +619, 05, 1+ 4 in
. Find each of the indicated inner products. 30. Find the conjugate transpose of each of the
a. (2+ 1,2, 4, [1,1 +42-pin@ following matrices.
bn (i-,l+g(l+,1-pine «fi, lt+i 4
e (i t+ill-fij1,il+ii,1-p 1-i l+i 2
in C’ | iy
d((2-,3+,1+ 0,1 +42-1,1+ 2+7 I|I-i
in C3
po 1+i]
15. Compute (u, v) and (v, u) for each of the ce /2-i 1l-i
following. | 1-21
ao u=([l,j,vea[Ilt+tini1-J9
1+ 2i i l1-i
beu=(l+i2-,i4,v=[l,l-f,1+9
d. i 1-i l+t+i
16. Verify property 1 of Theorem 9.2. 1+? 2-¢ 1+3i
17. Verify property 2 of Theorem 9.2. 31. Label each of the following matrices as
18. Verify properties 3 and 4 of Theorem 9.2. Hermitian, unitary, both, or neither.
19. Verify property 5 of Theorem 9.2. Iyoioiti
20. Find the magnitude of each of the following a i|
vectors. | i l-d
a. [1,77] b. —-1 72 ]
Nl

Li
b [il +ail-Z14+g I+7 | l
e {l+742+i7,34+]
|
V2 0
da f,jlt+il-igig V3i 0 V3
ef{ltil-ii1-Q -i = -2 1
21. Determine whether the given pairs of vectors
are parallel, perpendicular, or neither. i! i l-i
a. [1,4], [i 1 d. 4 i 3
b. [1 +ié41-i,(f-31,-1- 9 il-i 1 2
e. (1 +72 - 9, (33,3 + 9 32. Prove the properties of the conjugate
d. {l,4,1- a, {1-41 +4, 2) transpose operation given in Theorem 9.3.
e.(l+i,1-,1,{,1-%4-3-] (Hint: From Section 1.3, we know that

analogous properties of the transpose 34. Prove that, for vectors v,, v,,...,¥, in C’,
operation hold for real matrices and real {V,, Vv), .... Va} iS a basis for C* if and only if
scalars and can be derived using just field {v,,¥>,-.., V,} iS a basis for C’.
properties of R, so they are also tiue for . Prove that an n x n matrix U is unitary if
matrices with complex entries. Thus we can and only if the rows ot U form an
focus on the effect of the conjugation. From orthoncrmal basis for C’.
Theorem 9.1, we know that z + w =z + w,
36. Prove that, ifA is a square matrix, then AA*
zw =2zw, and z =z, forz, wE C. Use
is a Hermitian matrix.
these properties of conjugation to complete
the proof of Theorem 9.3.] 37. Prove that the product of two commuting
n X n Hermitian matrices is also a
33. Mark each of the following True or False.
Hermitian matrix. What can you say about
Assume that all matrices and scalars are
the sum of two Hermitian matrices?
complex.
38. Prove that the product of two m X n umitary
a. The definition ofa determinant, matrices is aiso a unitary matrix. What
properties of determinants (the transpose about the sum?
property, the row-interchange property,
39. Let 7: C? > C’ be a linear transformation
and so on), and techniques for computing whose standard matrix representation is a
them are developed using only field unitary matrix U. Show that (7(2), T(v)) =
properties of R in Chapter 4, and thus {u, v) for all u, v € C”. [Hinr: Remember that
they remain equally valid for square (u, v) = u*y.]
complex matrices.
40. Prove that for u, v € C", we have (u*v)* =
. Cramer’s rule is valid for square linear
systems with complex coefficients.
u*v = vu = u’V.
. IfA is any square matrix and det(A) # 0, 41. Describe the unitary diagonal matrices.
then det(iA) # det(A). . Prove that, if U is unitary, then U, U’, and
. If U is unitary, then U~! = UT. U* are unitary matrices also.
. If U is unitary, then (U)"' = U’. 43. A square matrix A is normal if A*A = AA’.
. The Euclidean inner product in C" is not a. Show that every Hermitian matnx is
commutative. normal.
. For u,v € C’, we have (u, v) = (v, u) if b. Show that every unitary matrix is
and only if (u, v) is a real number. normal.
c. Show that, if A* = —A, then A is normal.
. For a square matrix A, we have det(A) =
det(A). 44, Let A be an n X n matrix. Referring to
i. For a square matnx A, we have det(A*) = Exercise 43, prove that, if A is normal, then
dei{A). ||Az|| = ||A*2|| for all z € C’.
- Hf ("1s a unitary matrix, then 45. Prove the converse of the statement in
dof *) = +1. Exercise 44.

MATLAB
MATLAB can work with complex numbers. When typing a complex number a + bi
as a component of a vector or as an entry of a matrix, be sure to type a+b*i with
the * denoting multiplication and with no spaces before or after the + or the *. A
space before the + would cause MATLAB to interpret a +b*i as two numbers, the
real number a followed by another entry containing the complex number bi.
For a matrix A, MATLAB interprets A' as the conjugate transpose of A—that is, as
the transpose of A with every entry replaced by its conjugate. Here are three more
MATLAB functions:
    real(A) is the matrix of real parts of the entries in A,
    imag(A) is the matrix of imaginary parts of the entries in A,
    conj(A) is the matrix of conjugates of the entries in A.
Matrices for the exercises that follow are in a file. Some of the exercises ask you
to check your answers to some of the more tedious pencil-and-paper computations in
the exercises for this section.
Mi. Check Exercise 3. (The file matrices are E3A and E3B.)
M2. Check Exercise 7.
M3. Check Exercise 8.
M4. Check Exercise 9.
MS. Check Exercise 10.
M6. Check Exercise 11.
M7. Check Exercise 12.
M8. Check Exercise 13.
M9. Check Exercise 28.
M10. Check Exercise 29.
Mil. Consider the matrix
2-3) 3+ 71 -S5+2i 7-31 -10+ 4i
8- § 245) 11-3) 6+2i 14-4
13+2i 3-4) 94+9' 3-21 7+ 6)
S+ ¢ 8-4 i2+8i -3+2i 1-51
a. Find the norms of the four row vectors by multiplying appropriate
matrices.
b. Find the norms of the five column vectors by multiplying appropriate
matrices.
c. Find the inner product (r,, r,) where r, is the first row vector and r, is the
third row vector.
d. Find the inner product (c,, c,) where c, is the second column vector and
c, is the fifth column vector.

9.3 EIGENVALUES AND DIAGONALIZATION

Recall the fundamental theorem of real symmetric matrices that we stated


without proof as Theorem 6.8:

Every real symmetric matrix is diagonalizable by a real orthogonal


matrix.

Our main goal in this section is to extend this result to complex matrices, as
follows:

Every Hermitian matrix is diagonalizable by a unitary matrix.

We will prove this theorem, which has the theorem for real symmetric matrices
as an easy corollary.

Eigenvalues for Complex Matrices


We begin by extending the notions of eigenvalues and eigenvectors to com-
plex matrices. The definitions are identical to those for real matrices. If A is
an n × n complex matrix and if Av = λv, where λ ∈ C and v ∈ Cⁿ, v ≠ 0,
then λ is an eigenvalue of A and v is a corresponding eigenvector. The zero
vector and the set of all eigenvectors of A corresponding to λ constitute the
eigenspace E_λ. Computation of eigenvalues and eigenspaces of a complex
matrix is the same as for real matrices, except that the arithmetic involves
complex numbers and consequently is more laborious to do with pencil and
paper. Every n × n complex matrix has n not necessarily distinct eigen-
values; this is a consequence of the Fundamental Theorem of Algebra,
which we stated in Section 9.1. Recall that, for real matrices, there may exist
no real eigenvalues.

EXAMPLE 1  Find the eigenvalues and eigenspaces of the matrix

    A = [ 1  0  i]
        [ 0  2  0]
        [−i  0  1].

SOLUTION  The characteristic polynomial of A is

    det(A − λI) = | 1−λ   0     i  |
                  |  0   2−λ    0  | = (2 − λ)((1 − λ)² − 1)
                  | −i    0    1−λ |

                = (2 − λ)(1 − 2λ + λ² − 1) = −λ(2 − λ)².

The three roots of −λ(2 − λ)² = 0 are λ₁ = 0, λ₂ = λ₃ = 2.
For the eigenvalue λ₁ = 0, we have

    A − λ₁I = [ 1  0  i]     [1  0  i]
              [ 0  2  0]  ~  [0  1  0],
              [−i  0  1]     [0  0  0]

which gives the eigenspace E₀ = sp([−i, 0, 1]ᵀ). For the double root λ₂ = λ₃ = 2, we
have

    A − 2I = [−1  0   i]     [1  0  −i]
             [ 0  0   0]  ~  [0  0   0],
             [−i  0  −1]     [0  0   0]

which gives the two-dimensional eigenspace E₂ = sp([i, 0, 1]ᵀ, [0, 1, 0]ᵀ).  ▪
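The eig command, which is used in the MATLAB exercises at the end of this section, handles complex matrices; this brief check of Example 1 is ours.

    A = [1 0 1i; 0 2 0; -1i 0 1];
    [V, D] = eig(A)          % D holds the eigenvalues 0, 2, 2
    rank(A - 2*eye(3))       % 1, so E_2 has dimension 3 - 1 = 2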

Definitions and theorems concerning eigenvalues and eigenvectors that
depend only on the field axioms discussed at the beginning of Section 9.2
continue to make sense and to hold for complex matrices. In particular, an
n × n complex matrix A is diagonalizable if and only if there exist an invertible
matrix C and a diagonal matrix D such that D = C⁻¹AC. Just as for real
matrices, two complex n × n matrices A and B are similar if there exists an
invertible n × n matrix C such that B = C⁻¹AC. Similarity is an equivalence
relation. Thus, A is similar to A; if A is similar to B, then B is similar to A; and
if furthermore B is similar to D, then A is similar to D. All of these things are
defined and proved using just field properties.
Consider again the equation D = C⁻¹AC, where D is a diagonal matrix.
The equivalent equation, CD = AC, for an invertible matrix C shows that
A is diagonalizable if and only if Cⁿ has a basis of eigenvectors of A, and
it shows that the matrix C must have such a basis of eigenvectors as its
column vectors, whereas D has the corresponding eigenvalues on its diago-
nal. We obtain all of this from CD = AC by considering the jth column vector
of CD and comparing it with the jth column vector of AC, just as we did
for the real case in Section 5.2. Such a basis for Cⁿ of eigenvectors of A
exists if and only if the algebraic multiplicity of each eigenvalue is equal
to its geometric multiplicity (the dimension of the corresponding eigen-
space).

EXAMPLE 2  Let

    A = [ 1  0  i]
        [ 0  2  0]
        [−i  0  1].

Find a matrix C such that C⁻¹AC is a diagonal matrix.

SOLUTION  From the preceding example, we see that A has an eigenvalue λ₁ = 0 of
algebraic multiplicity 1 with eigenspace

    E₀ = sp([−i, 0, 1]ᵀ)

and that it has the double eigenvalue λ₂ = λ₃ = 2 with eigenspace

    E₂ = sp([i, 0, 1]ᵀ, [0, 1, 0]ᵀ).

Thus, the algebraic multiplicity of each eigenvalue is equal to its geometric
multiplicity, and the vectors shown form a basis for C³. Therefore, the matrix

    C = [−i  i  0]
        [ 0  0  1]
        [ 1  1  0]

is invertible and diagonalizes A; and we must also have C⁻¹AC = D, where

    D = [0  0  0]
        [0  2  0]
        [0  0  2].  ▪

The proof of Theorem 5.3, which asserts that eigenvectors corresponding
to distinct eigenvalues are independent, depends only on field properties and
thus is valid in the complex case. Consequently, every matrix having only
eigenvalues of algebraic multiplicity 1 is diagonalizable. We focus our atten-
tion on the geometric multiplicity of any eigenvalues of algebraic multiplicity
greater than 1, when determining whether a matrix is diagonalizable.

EXAMPLE 3  Find all values of c for which the matrix

    A = [i  c   1]
        [0  i  2i]
        [0  0   1]

is diagonalizable.

SOLUTION  The eigenvalues of the upper-triangular matrix A are λ₁ = λ₂ = i and λ₃ = 1. We
focus our attention on the eigenvalue i of algebraic multiplicity 2. For A to be
diagonalizable, its eigenspace E_i must have geometric multiplicity 2. The
eigenspace is the nullspace of the matrix

    A − iI = [0  c    1 ]
             [0  0   2i ]
             [0  0  1−i ],

and the nullspace has dimension 2 if and only if the matrix has rank 1, which is
the case if and only if c = 0.  ▪

Diagonalization of Hermitian Matrices


Diagonalization via a unitary matrix is of special importance, as we saw in the
real case, where it becomes diagonalization by an orthogonal matrix. We call
n × n matrices A and B unitarily equivalent if there is a unitary matrix U such
that B = U⁻¹AU. Because the inverse of a unitary matrix is unitary and be-
cause a product of unitary matrices is unitary, we can show that unitary
equivalence is an equivalence relation. Thus, A is unitarily equivalent
to itself; if A is unitarily equivalent to B, then B is unitarily equivalent
to A; and if furthermore B is unitarily equivalent to C, then A is unitarily equiv-
alent to C.
Now we achieve the main goal of this section: to prove that Hermitian
matrices are unitarily equivalent to a diagonal matrix. That is, a Hermitian
matrix can be diagonalized using a unitary matrix. This follows from a very
important result known as Schur's lemma (or Schur's unitary triangularization
theorem), which we state, deferring the proof until the end of this section.

THEOREM 9.4 Schur’s Lemma

Let A be an n × n (complex) matrix. There is a unitary matrix U such


that U-'AU is upper triangular.

Using Schur's lemma, we can prove that every Hermitian matrix is
diagonalizable, and that the diagonalizing matrix can be chosen to be unitary.
We express this by saying that every Hermitian matrix is unitarily diagonaliz-
able.

THEOREM 9.5 Spectral Theorem for Hermitian Matrices

If A is a Hermitian matrix, there exists a unitary matrix U such that
U⁻¹AU is a diagonal matrix. Furthermore, all eigenvalues of A are real.

PROOF  By Schur's lemma, there exists a unitary matrix U such that U⁻¹AU is
an upper-triangular matrix. Because U is unitary, we have U*U = I, so U⁻¹ =
U*; and because A is Hermitian, we also know that A* = A. Thus, we have

    (U⁻¹AU)* = (U*AU)* = U*A*(U*)* = U*AU = U⁻¹AU,

which shows that the upper-triangular matrix U⁻¹AU is also Hermitian.
Because the conjugate transpose of an upper-triangular matrix is a lower-
triangular matrix, we see that the entries above the diagonal in U⁻¹AU must all
be zero; therefore, U⁻¹AU = D, where D is a diagonal matrix. Thus, A is
unitarily diagonalizable.
It remains to be shown that each eigenvalue of A is a real number. From
the theory of diagonalization, we know that the entries on the diagonal of D
are the eigenvalues of A. Now we showed in the preceding paragraph that the
matrix D = U⁻¹AU is Hermitian, so D* = D. Forming the conjugate transpose
of a diagonal matrix amounts simply to taking the conjugates of the entries on
the diagonal. Because D* = D, the entries on the diagonal of D remain
unchanged under conjugation, so they must be real numbers.  ▲

COROLLARY  Fundamental Theorem of Real Symmetric Matrices

Every n × n real symmetric matrix has n real eigenvalues, counted
with their algebraic multiplicity, and is diagonalizable by a real
orthogonal matrix.

PROOF  Because every real n × n symmetric matrix A is also Hermitian,
Theorem 9.5 establishes that all of its eigenvalues in C actually lie in R;
therefore, the matrix has n real eigenvalues, counting them with their algebraic
multiplicity. Furthermore, Theorem 9.5 asserts that A can be diagonalized by
a unitary matrix U. We know that the column vectors of U are eigenvectors of
A. Now the eigenvectors of A can be computed by row reductions of A − λᵢI,
where the λᵢ are eigenvalues of A. Because all the λᵢ are real, the row reductions
all take place in the field R of real numbers. The reduced echelon form of
A − λᵢI is thus a real matrix; it must have a nullspace of dimension (geometric
multiplicity) equal to the algebraic multiplicity of λᵢ, because A is diagonaliz-
able. Thus, we can find bases for the eigenspaces E_λᵢ consisting of vectors in Rⁿ.
Using the Gram–Schmidt process, we can assume that the basis of each
eigenspace is orthonormal. The matrix C having as column vectors the vectors
in these orthonormal bases of eigenspaces E_λᵢ is thus a real orthogonal matrix
that diagonalizes A.  ▲

EXAMPLE 4  Find a unitary matrix that diagonalizes the matrix A in Example 2.

SOLUTION  We found in Example 2 that the matrix

    C = [−i  i  0]
        [ 0  0  1]
        [ 1  1  0]

diagonalizes A. Notice that the inner product of any two distinct column
vectors of this matrix is zero, so the column vectors are orthogonal. We need
only normalize them to length 1 in order to obtain a unitary matrix that
diagonalizes A. Thus, such a matrix is

    U = (1/√2) [−i  i   0 ]
               [ 0  0  √2 ]
               [ 1  1   0 ].  ▪

Recall that in Theorem 6.7 we showed that real eigenvectors of a
symmetric matrix corresponding to distinct eigenvalues are orthogonal.
Generalizing this, we can show that the eigenvectors of a Hermitian matrix
corresponding to distinct eigenvalues are orthogonal. We ask you to show this
in Exercise 21, using the fact that a Hermitian matrix can be unitarily
diagonalized; but it is easy to demonstrate this orthogonality by using
properties of matrices and the fact that the eigenvalues must be real.

THEOREM 9.6  Orthogonality of Eigenspaces of a Hermitian Matrix

The eigenvectors of a Hermitian matrix corresponding to distinct
eigenvalues are orthogonal.

PROOF  Let v and w be eigenvectors of a Hermitian matrix A corresponding to
distinct eigenvalues λ₁ and λ₂, respectively. Using the facts that A = A* and
that the eigenvalues are real, so that λ̄₂ = λ₂, we have

    λ₁(w*v) = w*(λ₁v) = w*(Av) = w*(A*v) = (w*A*)v
            = (Aw)*v = (λ₂w)*v = λ₂(w*v).

Therefore, (λ₁ − λ₂)(w*v) = 0. Because λ₁ ≠ λ₂, we must have w*v = 0, so w and
v are orthogonal.  ▲

EXAMPLE 5  Find a unitary matrix U that diagonalizes the Hermitian matrix

    A = [−1    i   1+i]
        [−i    1    0 ]
        [1−i   0    1 ].

SOLUTION  We find that

    |A − λI| = |−1−λ    i    1+i |
               | −i    1−λ    0  |
               | 1−i    0    1−λ |

             = (−1 − λ)(1 − λ)² − (1 − λ) − 2(1 − λ)
             = (1 − λ)((−1 − λ)(1 − λ) − 3) = (1 − λ)(λ² − 4).

Thus, the eigenvalues are λ₁ = 1, λ₂ = 2, and λ₃ = −2. To find U, we need only
compute one eigenvector of length 1 for each of these three distinct
eigenvalues. The three eigenvectors we obtain must form an orthonormal set,
according to Theorem 9.6.
For λ₁ = 1, we find that

    A − I = [−2    i   1+i]
            [−i    0    0 ]
            [1−i   0    0 ],

so an eigenvector is

    v₁ = [  0 ]
         [1−i ]
         [ −1 ].

For λ₂ = 2, we find that

    A − 2I = [−3    i   1+i]
             [−i   −1    0 ]
             [1−i   0   −1 ],

and an eigenvector is

    v₂ = [1+i]
         [1−i]
         [ 2 ].

Finally, for λ₃ = −2, we have

    A + 2I = [ 1    i   1+i]
             [−i    3    0 ]
             [1−i   0    3 ],

and a corresponding eigenvector is

    v₃ = [−3−3i]
         [ 1−i ]
         [  2  ].

We normalize the vectors v₁, v₂, and v₃ and form the column vectors in U from
the resulting vectors of magnitude 1, obtaining

    U = [    0        (1+i)/(2√2)   (−3−3i)/(2√6)]
        [(1−i)/√3     (1−i)/(2√2)    (1−i)/(2√6) ]
        [  −1/√3          1/√2           1/√6    ].  ▪
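Exercises 27 and 28 ask you to run [U, D] = eig(A) in MATLAB; applied to this matrix (our check), it returns real eigenvalues and a unitary U, although the columns may appear in a different order or be multiplied by unit scalars.

    A = [-1 1i 1+1i; -1i 1 0; 1-1i 0 1];
    [U, D] = eig(A)          % D is real: the eigenvalues 1, 2, -2 in some order
    norm(U'*U - eye(3))      % essentially zero, so U is unitary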

A Criterion for Unitary Diagonalization


We have seen that every Hermitian matrix is unitarily diagonalizable, but of
course a unitarily diagonalizable matrix need not be Hermitian. For example,
the 1 × 1 matrix [i] is already diagonal, so it is diagonalizable by the identity
matrix. However, it is not Hermitian because [i]* = [−i]. There is actually a
way to determine whether a square matrix is unitarily diagonalizable, without
having to find its eigenvalues and eigenvectors.

DEFINITION 9.5 Normal Matrix

A square matrix A is normal if it commutes with its conjugate


transpose, so that AA* = A*A.

Exercises 25 and 26 ask you to prove the following theorem, which gives a
criterion for A to be unitarily diagonalizable in terms of matrix multiplication.

THEOREM 9.7 Spectral Theorem for Normal Matrices

A square matrix A is unitarily diagonalizable if and only if it is a
normal matrix.

EXAMPLE 6  Determine all values of a such that the matrix

    A = [i  a]
        [2  i]

is unitarily diagonalizable.

SOLUTION  In order for the matrix to be unitarily diagonalizable, we must have AA* = A*A,
so that

    [i  a][−i   2 ]   [−i   2 ][i  a]
    [2  i][ ā  −i ] = [ ā  −i ][2  i].

Equating entries, we obtain

    row 1, column 1:  1 + |a|² = 1 + 4,    so |a| = 2,
    row 1, column 2:  2i − ai = −ai + 2i,
    row 2, column 1:  −2i + āi = āi − 2i,
    row 2, column 2:  4 + 1 = |a|² + 1,    so |a| = 2.

Clearly these conditions are satisfied as long as |a| = 2, so a can be any number
of the form x + yi, where x² + y² = 4.  ▪
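For a particular choice of a, normality is easy to confirm or refute numerically in MATLAB (our sketch):

    a = 2;                        % abs(a) = 2
    A = [1i a; 2 1i];
    norm(A*A' - A'*A)             % essentially zero: A is normal
    a = 1 + 1i;                   % abs(a) = sqrt(2), not 2
    A = [1i a; 2 1i];
    norm(A*A' - A'*A)             % nonzero: this A is not normal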

Proof of Schur's Lemma

We now prove by induction that, if A is an n × n matrix, there exists a unitary
matrix U such that U⁻¹AU = U*AU is upper triangular. If n = 1, the lemma is
trivial. We assume as induction hypothesis that the lemma is true for all
matrices of size at most (n − 1) × (n − 1), and we proceed to show that it must
hold for an n × n matrix A.
Let λ₁ be an eigenvalue of A, and let v₁ be a corresponding eigenvector of
norm 1. We can expand {v₁} to a basis for Cⁿ, and by the Gram–Schmidt
process we can transform it into an orthonormal basis {v₁, v₂, . . . , vₙ}. Let U₁
be the unitary matrix whose jth column vector is vⱼ. Now the first column
vector of AU₁ is Av₁ = λ₁v₁. Because the ith row vector of U₁* is vᵢ*, and because
the vectors vᵢ are mutually orthogonal, we see that the first column vector of
the matrix U₁*AU₁ is

    U₁*(λ₁v₁) = [λ₁]
                [ 0]
                [ ⋮]
                [ 0].

This shows that we can write U₁*AU₁ symbolically as

    U₁*AU₁ = [λ₁  ×  · · ·  ×]
             [ 0             ]
             [ ⋮      A₁     ]                              (1)
             [ 0             ]

where we have denoted the (n − 1) × (n − 1) submatrix in the lower right-hand
corner of U₁*AU₁ by A₁. By our induction hypothesis, there exists an (n − 1) ×
(n − 1) unitary matrix C such that C*A₁C = B, where B is upper triangular. Let

    U₂ = [1  0  · · ·  0]
         [0             ]
         [⋮       C     ]                                   (2)
         [0             ]

where we have used a symbolic notation similar to that in Eq. (1). Because C is
unitary, it is clear that U₂ is a unitary matrix. Now let U = U₁U₂. Because
U*U = U₂*(U₁*U₁)U₂ = U₂*IU₂ = U₂*U₂ = I, we see that U is a unitary matrix.
Now

    U*AU = U₂*(U₁*AU₁)U₂.                                   (3)

The matrix in parentheses in Eq. (3) is the matrix displayed in Eq. (1). From
our definition of U₂ in Eq. (2), we see that the (n − 1) × (n − 1) block in the
lower right-hand corner of U*AU is C*A₁C = B. Thus, we have

    U*AU = [λ₁  ×  · · ·  ×]
           [ 0             ]
           [ ⋮       B     ]
           [ 0             ],

which is upper triangular because B is an upper-triangular matrix. This
completes our induction argument.  ▲

SUMMARY

1. An n × n matrix A is diagonalizable if and only if Cⁿ has a basis consisting
   of eigenvectors of A. Equivalently, each eigenvalue has algebraic multi-
   plicity equal to its geometric multiplicity.
2. Every Hermitian matrix is diagonalizable by a unitary matrix.
3. Every Hermitian matrix has real eigenvalues.
4. A square matrix A is unitarily diagonalizable if and only if it is normal, so
   that AA* = A*A.
5. Schur's lemma states that every square matrix is unitarily equivalent to an
   upper-triangular matrix.

EXERCISES

5. Find all a € U such that the matrix F

_—
1 37
In Exercises 1-12, find a unitary matrix U and a is unitarily diagonalizable.
diagonal matrix D such that D = U~AU for the . Find all a, 6 € C such that the matrix

aN

given matrix A.
E
4 is unitarily diagenalizable.
Laat ] 2 A=| |-2i 1| i
—-i i 7. Prove that every 2 x 2 real matrix that is

_—
unitarily diagonalizable las one of the
3. A=_
[¢ H
l l+i 4.A4=_{ 9 3-1
Li | A ls +i 0 || following forms: [@ 4 ab f
|b d]’
OQ i 0 1 -i 0
a,b,dER.
5 A=|-i 0 0 6 A=l|i 1 0
10 0 1 00 2 i-!l l
8. Determine whether the inatrix} | -/ -!
_—

l 2-21 0 -| } I
7.A=|2 + 2i -!1 0
is unitarily diagonalizable.
0 0 3
9. Mark each of the following True or False.
_—

0 [ 0 1+2i ___ a. Every square matrix is unitarily


8. A=| 0 5 0 equivalent to a diagonal matrix.
fi
- 2: 0 4 | —__b. Every square matrix is unitarily
fT | 0 2+2i equivalent to an upper-triangular matnx.
9 A=| 0 3 0 ___ c. Every Hermitian matmix is unitanly
12-2: 0 ~| equivalent to a diagonal matrix.
___ d. Every unitarily diagonalizable matrix is
2 [ 0 l-i Hermitian.
10. A=| 0 3 0 ___ e. Every real symmetric matrix 1s
l‘l+i 0 Hermitian.
[ 3 i iti —_— f. Every diagonalizable matrix is normal.
W. A=l| -i 0 ___ g. Every unitarily diagonalizable matrix is

HL — i normal.
Oo

—_h. Every real symmetric matrix 1s normal.


[-3 St olti
i. Every square matrix is diagonalizable,
12. A=| -Si 3 0
although perhaps not by a unitary matrix.
l‘t-i 0 3 ___ j. Every square matrix with eigenvalues of
13. Find all @ € C such that the matrix i i 1S
algebraic multiplicity | is diagonalizable
ai by a unitary matrix.
unitarily diagonalizable. 20. Prove that the eigenvalues of a Hermitian
matrix are real, without using Theorem 9.5
14. Find all a, b € C such that the matrix E ‘|
i or Schur’s lemma. {Hint: Let Av = Av, and
is unitarily diagonalizable. use the fact that v*Av = v*A*v.]

21. Argue directly from Theorem 9.5 that b. Prove that ann X n normal
eigenvectors from different eigenspaces of a upper-tniangular matnx B must be
Hermitian matnx are orthogonal. diagonal. [Hint: Let C = B*B = BB*.
22. Suppose that A is ann X n matrix such that Equating the computations of c,, from
A* = —A. Show that B*B and from BB*, show that b,, = 0 for
| <j <n. Then equate the computations
a. A has eigenvalues of the form ri, where
reR,
of c,, from B*B and from BB* to show
that 5, = 0 for 2 <j = n. Continue this
b. A is diagonalizable by a unitary matrix.
process to show that B is lower
{Hitt FOR BOTH PaRTs: Work with iA.]
triangular.]}
23. Prove that an n X m matrix A is unitarily c. Deduce from parts a and b that a normal
diagonalizable if and only if |[Av|| = ||A*v|| matnix is unitarily diagonalizabie.
for allv € C’,
24. Prove that a nomnal matnix is Hermitian if
and only if all its eigenvalues are in R.
25. s. Prove that a diagonai matrix is normal. a In Exercises 27 and 28, use the command
b. Prove that, if A is normal and B is [U, D] = eig(A) in MATLAB to work the
unitarily equivalent to A, then B is indicated exercise. If your MATLAB answer U for
normal. the unitary matrix differs from the U that we
c. Deduce from parts a and b that a found using pencil and paper and put in the
unitarily diagonalizable matrix is normal. answers at the end of our text, explain how you
26. 2. Prove that every normal matrix A is can get from one answer to tie other.
unitanily equivalent to a normal
upper-triangular matnx B. (Use Schur’s 27. Exercise 9
lemma and part b of Exercise 25.} 28. Exercise 1}

9.4 JORDAN CANONICAL FORM

Jordan Blocks

We have spent considerable time on diagonalization of matrices. The preced-
ing section was concerned primarily with unitary diagonalization. As we have
seen, diagonal matrices are easily handled. Unfortunately, not every n × n
matrix A can be diagonalized, because we cannot always find a basis for Cⁿ
consisting of eigenvectors of A. We remind you of this with an example that is
well worth studying.

EXAMPLE 1  Show that the matrix

    J = [5  1  0]
        [0  5  0]
        [0  0  5]

is not diagonalizable.
is not diagonalizable.
SOLUTION  We see that 5 is the only eigenvalue of J and that it has algebraic multiplicity
3. However,

    J − 5I = [0  1  0]
             [0  0  0]
             [0  0  0],

which shows that the eigenspace E₅ has dimension 2 and has basis {e₁, e₃}.
Thus, the geometric multiplicity of the eigenvalue 5 is only 2, and J is not
diagonalizable. We cannot find a basis for R³ (or even C³) consisting entirely of
eigenvectors of J.  ▪

Let us examine the matrix J in Example 1 a bit more. Notice that, although
J − 5I has a nullspace of dimension 2, the matrix (J − 5I)² is the zero matrix
and has all of C³ as nullspace. Moreover, multiplication on the left by J − 5I
carries e₂ into e₁ and carries both e₁ and e₃ into 0. We say that J − 5I annihilates
e₁ and e₃. The action of J − 5I on these standard basis vectors is denoted
schematically by the two strings

    J − 5I:  e₂ → e₁ → 0,                                   (1)
             e₃ → 0.

Diagram (1) also shows that (J − 5I)² maps each of these basis vectors into 0.
Because (J − 5I)e₂ = e₁, we have Je₂ = 5e₂ + e₁, whereas Je₁ = 5e₁ and Je₃ = 5e₃.
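These observations are easy to confirm numerically in MATLAB (our own check):

    J = [5 1 0; 0 5 0; 0 0 5];
    rank(J - 5*eye(3))       % 1, so E_5 has dimension 3 - 1 = 2
    (J - 5*eye(3))^2         % the zero matrix: (J - 5I)^2 annihilates all of C^3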

EXAMPLE 2  Let

    J = [λ  1  0  0  0]
        [0  λ  1  0  0]
        [0  0  λ  1  0]                                     (2)
        [0  0  0  λ  1]
        [0  0  0  0  λ].

Discuss the action of J − λI on the standard basis vectors, drawing a schematic
diagram similar to diagram (1). Describe also the action of J on the vectors in
the standard basis.

SOLUTION  We find that

    J − λI = [0  1  0  0  0]
             [0  0  1  0  0]
             [0  0  0  1  0]
             [0  0  0  0  1]
             [0  0  0  0  0].

We see that multiplication of eᵢ on the left by J − λI yields eᵢ₋₁ for 2 ≤ i ≤ 5,
whereas (J − λI)e₁ = 0. Schematically, we have just one string,

    J − λI:  e₅ → e₄ → e₃ → e₂ → e₁ → 0.                    (3)

Left-multiplication by J yields

    Je₅ = λe₅ + e₄,   Je₄ = λe₄ + e₃,   Je₃ = λe₃ + e₂,
    Je₂ = λe₂ + e₁,   Je₁ = λe₁.  ▪

The matrix J in Example 2 is an example of a Jordan block matrix.

DEFINITION 9.6  Jordan Block

An m × m matrix is a Jordan block if it is structured as follows:

1. All diagonal entries are equal.
2. Each entry immediately above a diagonal entry is 1.
3. All other entries are zero.

Thus, the matrix J in Example 2 is a Jordan block. However, the matrix in
Example 1 is not a Jordan block, since the entry 5 at the bottom of the diagonal
does not have a 1 just above it. A Jordan block has the properties described in
the next theorem. These properties were illustrated in Example 2, and we leave
a formal proof to you if you desire one. Notice that, for an m × m Jordan block

    J = [λ  1  0  · · ·  0  0]
        [0  λ  1  · · ·  0  0]
        [⋮                ⋮  ⋮]
        [0  0  0  · · ·  λ  1]
        [0  0  0  · · ·  0  λ],

we have just one string:

    J − λI:  eₘ → eₘ₋₁ → · · · → e₂ → e₁ → 0.

THEOREM 9.8  Properties of a Jordan Block

Let J be an m × m Jordan block with diagonal entries all equal to λ.
Then the following properties hold:

1. (J − λI)eᵢ = eᵢ₋₁ for 1 < i ≤ m, and (J − λI)e₁ = 0.
2. (J − λI)ᵐ = O, but (J − λI)ⁱ ≠ O for i < m.
3. Jeᵢ = λeᵢ + eᵢ₋₁ for 1 < i ≤ m, whereas Je₁ = λe₁.

Jordan Canonical Forms

We have seen that not every n × n matrix is diagonalizable. It is our purpose in
this section to show that every n × n matrix is similar to a matrix having all
entries 0 except for those on the diagonal and entries 1 immediately
above some diagonal entries; each 1 above a diagonal entry must have
the same number on its left as below it on the diagonal. An example of such
a matrix is

    J = [−i   1   0   0   0   0   0   0]
        [ 0  −i   1   0   0   0   0   0]
        [ 0   0  −i   0   0   0   0   0]
        [ 0   0   0  −i   1   0   0   0]
        [ 0   0   0   0  −i   0   0   0]                    (4)
        [ 0   0   0   0   0   2   0   0]
        [ 0   0   0   0   0   0   5   1]
        [ 0   0   0   0   0   0   0   5].

This matrix J is comprised of four Jordan blocks, placed corner-to-corner
along the diagonal.

DEFINITION 9.7  Jordan Canonical Form

An n × n matrix J is a Jordan canonical form if it consists of Jordan
blocks, placed corner-to-corner along the main diagonal, as in matrix
(4), with only zero entries outside these Jordan blocks.

Every diagonal matrix is a Jordan canonical form, because each diagonal
entry can be viewed as being the sole entry in a 1 × 1 Jordan block. Notice that
matrix (4) contains the 1 × 1 Jordan block [2]. Notice, too, that the breaks
between the Jordan blocks in matrix (4) occur where some diagonal entry has a
0 rather than 1 immediately above it.

EXAMPLE 3 Is the matnx


©
owt
ON =
a=
oO

a Jordan canonical form? Why?


SOLUTION  This matrix is not a Jordan canonical form. Because not all diagonal entries
are equal, there should be at least two Jordan blocks present in order for the
matrix to be a Jordan canonical form, and [7] should be a 1 × 1 Jordan block.
However, the entry immediately above the 7 is not 0. Consequently, this matrix is
not a Jordan canonical form.  ▪

EXAMPLE 4  Describe the effect of matrix J in Eq. (4) on each of the standard basis vectors
in C⁸. Then give the eigenvalues and eigenspaces of J. Finally, find the
dimension of the nullspace of (J − λI)ᵏ for each eigenvalue λ of J and for each
positive integer k.

SOLUTION  We find that

    Je₁ = −ie₁,    Je₂ = −ie₂ + e₁,    Je₃ = −ie₃ + e₂,
    Je₄ = −ie₄,    Je₅ = −ie₅ + e₄,
    Je₆ = 2e₆,
    Je₇ = 5e₇,     Je₈ = 5e₈ + e₇.

The eigenvalues of J are −i, 2, and 5, which have algebraic multiplicities of 5,
1, and 2, respectively. The eigenspaces of J are E₋ᵢ = sp(e₁, e₄), E₂ = sp(e₆), and
E₅ = sp(e₇), as you can easily check.
The effect of J − (−i)I on the first five standard basis vectors is given by the
two strings

    J + iI:  e₃ → e₂ → e₁ → 0,                              (5)
             e₅ → e₄ → 0.

The 3 × 3 lower right-hand corner of J + iI describes the action of J + iI on e₆,
e₇, and e₈. Because this 3 × 3 matrix has a nonzero determinant, it causes J + iI
to carry these three vectors into three independent vectors, and the same is
true of all powers of J + iI. Thus we can determine the dimension of the
nullspace of (J + iI)ᵏ by diagram (5), and we find that

    J + iI has nullspace sp(e₁, e₄) of dimension 2,
    (J + iI)² has nullspace sp(e₁, e₂, e₄, e₅) of dimension 4,
    (J + iI)³ has nullspace sp(e₁, e₂, e₃, e₄, e₅) of dimension 5,
    (J + iI)ᵏ has the same nullspace as that of (J + iI)³ for k > 3.

By a similar argument, we find that

    (J − 2I)ᵏ has nullspace sp(e₆) of dimension 1 for k ≥ 1,
    J − 5I has nullspace sp(e₇) of dimension 1,
    (J − 5I)ᵏ has nullspace sp(e₇, e₈) of dimension 2 for k > 1.  ▪

HISTORICAL NOTE  THE JORDAN CANONICAL FORM appears in the Treatise on Substitutions and
Algebraic Equations, the chief work of the French algebraist Camille Jordan (1838-1921). This
text, which appeared in 1870, incorporated the author’s group-theory work over the preceding
decade and became the bible of the field for the remainder of the nineteenth century. The theorem
containing the canonical form actually deals not with matrices over the real numbers, but with
matrices with entries from the finite field of order p. And as the title of the book indicates, Jordan
was not considering matrices as such, but the linear substitutions that they represented.
Camille Jordan, a brilliant student, entered the Ecole Polytechnique in Paris at the age of 17
and practiced engineering from the time of his graduation until 1885. He thus had ample time for
mathematical research. From 1873 until 1912, he taught at both the Ecole Polytechnique and the
College de France. Besides doing seminal work on group theory, he is known for important
discoveries in modern analysis and topology.
EXAMPLE 5  Suppose a 9 × 9 Jordan canonical form J has the following properties:

1. (J − 3iI)ᵏ has rank 7 for k = 1, rank 5 for k = 2, and rank 4 for k ≥ 3.
2. (J + I)ʲ has rank 6 for j = 1 and rank 5 for j ≥ 2.

Find the Jordan blocks that appear in J.

SOLUTION  Because the rank of J − 3iI is 7, the dimension of its nullspace is 9 − 7 = 2, so
3i is an eigenvalue of geometric multiplicity 2. It must give rise to two Jordan
blocks. In addition, J − 3iI must annihilate two eigenvectors eᵣ and eₛ in the
standard basis. Because the rank of (J − 3iI)² is 5, its nullspace must have
dimension 4, so in a diagram of the effect of J − 3iI on the standard basis, we
must have (J − 3iI)eᵣ₊₁ = eᵣ and (J − 3iI)eₛ₊₁ = eₛ. Because (J − 3iI)ᵏ has rank 4
for k ≥ 3, its nullity is 5, and we have just one more standard basis
vector—either eᵣ₊₂ or eₛ₊₂—that is annihilated by (J − 3iI)³. Thus, the two
Jordan blocks in J that have 3i on the diagonal are

    J₁ = [3i   1   0]          J₂ = [3i   1 ]
         [ 0  3i   1]    and        [ 0  3i ].
         [ 0   0  3i]

Because J + I has rank 6, its nullspace has dimension 9 − 6 = 3, so −1 is an
eigenvalue of geometric multiplicity 3 and gives rise to three Jordan blocks.
Because (J + I)ʲ has rank 5 for j ≥ 2, its nullspace has dimension 4, so (J + I)²
annihilates a total of four standard basis vectors. Thus, just one of these
Jordan blocks is 2 × 2, and the other two are 1 × 1. The Jordan blocks arising
from the eigenvalue −1 are then

    J₃ = [−1   1]    and    J₄ = J₅ = [−1].
         [ 0  −1]

The matrix J might have these blocks in any order down its diagonal.
Symbolically, we might have the blocks J₁, J₂, J₃, J₄, J₅ placed corner-to-corner
down the diagonal in that order, or any other order.  ▪

Jordan Bases

If an n × n matrix A is similar to a Jordan canonical form J, we call J a Jordan
canonical form of A. When this is the case, there exists an invertible matrix C
such that C⁻¹AC = J. We know that similar matrices represent the same linear
transformation, but with respect to different bases. Thus, if A is similar to J,
there must exist a basis {b₁, b₂, . . . , bₙ} of Cⁿ with the same schematic string
properties relative to A that the standard ordered basis has relative to the
matrix J. We proceed to define such a Jordan basis.

DEFINITION 9.8  Jordan Basis

Let A be an n × n matrix. An ordered basis B = (b₁, b₂, . . . , bₙ) of Cⁿ is
a Jordan basis for A if, for 1 ≤ j ≤ n, we have either Abⱼ = λbⱼ or Abⱼ =
λbⱼ + bⱼ₋₁, where λ is an eigenvalue of A that we say is associated with
bⱼ. If Abⱼ = λbⱼ + bⱼ₋₁, we require that the eigenvalue associated with bⱼ₋₁
also be λ.

If an n × n matrix A has a Jordan basis B, then the matrix representation of
the linear transformation T(z) = Az relative to B must be a Jordan canonical
form. We know then that J = C⁻¹AC, where C is the n × n matrix whose jth
column vector is the jth vector bⱼ in B. In a moment we will prove that, for
every square matrix, there is an associated Jordan basis, and consequently that
every square matrix is similar to a Jordan canonical form. First, though,
we outline a method for the computation of a Jordan canonical form of A:
for each eigenvalue λ of A, compute the ranks (and hence the nullities) of the
powers (A − λI)ᵏ; as in Example 5, these numbers determine the number and
sizes of the Jordan blocks associated with λ.

We now illustrate this technique.

EXAMPLE 6  Find a Jordan canonical form of the matrix

    A = [2  5   0   0   1]
        [0  2   0   0   0]
        [0  0  -1   0   0]
        [0  0   0  -1   0]
        [0  0   0   0  -1].

SOLUTION  Because A is an upper-triangular matrix, we see that the eigenvalues of A are λ1 = λ2 = 2 and λ3 = λ4 = λ5 = -1. Now

    A - λ1·I = A - 2I = [0  5   0   0   1]
                        [0  0   0   0   0]
                        [0  0  -3   0   0]
                        [0  0   0  -3   0]
                        [0  0   0   0  -3]

has rank 4 and consequently has a nullspace of dimension 1. We find that

    (A - 2I)^2 = [0  0  0  0  -3]
                 [0  0  0  0   0]
                 [0  0  9  0   0]
                 [0  0  0  9   0]
                 [0  0  0  0   9],

which has rank 3 and therefore has a nullspace of dimension 2. Furthermore,

    (A - 2I)^3 = [0  0    0    0    9]
                 [0  0    0    0    0]
                 [0  0  -27    0    0]
                 [0  0    0  -27    0]
                 [0  0    0    0  -27]

has the same rank and nullity as (A - 2I)^2. Thus we have Ab1 = 2b1 and Ab2 = 2b2 + b1 for some Jordan basis B = (b1, b2, b3, b4, b5) for A. There is just one Jordan block associated with λ1 = 2—namely,

    J1 = [2  1]
         [0  2].

For the eigenvalue λ3 = -1, we find that

    A - λ3·I = A + I = [3  5  0  0  1]
                       [0  3  0  0  0]
                       [0  0  0  0  0]
                       [0  0  0  0  0]
                       [0  0  0  0  0],

which has rank 2 and therefore has a nullspace of dimension 3. Because -1 is an eigenvalue of both algebraic multiplicity and geometric multiplicity 3, we realize that J2 = J3 = J4 = [-1] are the remaining Jordan blocks. This is confirmed by the fact that

    (A + I)^2 = [9  30  0  0  3]
                [0   9  0  0  0]
                [0   0  0  0  0]
                [0   0  0  0  0]
                [0   0  0  0  0]

again has rank 2 and nullity 3. Thus, a Jordan canonical form for A is

    J = [2  1   0   0   0]
        [0  2   0   0   0]
        [0  0  -1   0   0]
        [0  0   0  -1   0]
        [0  0   0   0  -1].
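The rank computations of Example 6 are easy to reproduce. The MATLAB sketch below (ours) uses the 5 x 5 matrix A displayed at the start of the example.

    A  = [2 5 0 0 1; 0 2 0 0 0; 0 0 -1 0 0; 0 0 0 -1 0; 0 0 0 0 -1];
    I5 = eye(5);
    rank(A - 2*I5)        % 4, so nullity 1: one Jordan block for the eigenvalue 2
    rank((A - 2*I5)^2)    % 3, so nullity 2: that block is at least 2 x 2
    rank((A - 2*I5)^3)    % still 3, so the block for 2 is exactly 2 x 2
    rank(A + I5)          % 2, so nullity 3: three 1 x 1 blocks for the eigenvalue -1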

EXAMPLE 7  Find a Jordan basis for the matrix A in Example 6.

SOLUTION  For the part of a Jordan basis associated with the eigenvalue 2, we need to find a vector b2 in the nullspace of (A - 2I)^2 that is not in the nullspace of A - 2I; then we may take b1 = (A - 2I)b2. From the computation of A - 2I and (A - 2I)^2 in Example 6, we see that we can take

    b2 = (0, 1, 0, 0, 0),    and then    b1 = (A - 2I)b2 = (5, 0, 0, 0, 0).

For b3, b4, and b5, we need only take a basis for the nullspace of A + I. We see that we can take

    b3 = (0, 0, 1, 0, 0),    b4 = (0, 0, 0, 1, 0),    and    b5 = (-1, 0, 0, 0, 3).

In Example 7, it was easy to find vectors in a Jordan basis corresponding to the eigenvalue 2, whose geometric multiplicity is less than its algebraic multiplicity, because only one Jordan block corresponds to the eigenvalue 2. We now indicate how a Jordan basis can be constructed when more than one such block corresponds to a single eigenvalue λ. Let N_r be the nullspace of (A - λI)^r for r >= 1, and suppose (for example) that dim(N1) = 4, dim(N2) = 7, and dim(N_r) = 8 for r >= 3. Then a Jordan basis for A contains four strings corresponding to λ, which we may represent as

    b3 → b2 → b1 → 0,
    b5 → b4 → 0,
    b7 → b6 → 0,
    b8 → 0.

To find the first and longest of these strings, we compute a basis {v1, v2, . . . , v8} for the nullspace N3 of (A - λI)^3. The preceding strings show that multiplication of all of the vectors in N3 on the left by (A - λI)^2 yields a space of dimension 1, so at least one of the vectors v_i has the property that (A - λI)^2 v_i ≠ 0.
Let b3 be such a vector, and set b2 = (A - λI)b3 and b1 = (A - λI)b2. It is not difficult to show that b1, b2, and b3 must be independent. Thus we have found the first string.
Now b1 and b2 lie in N2, and we can expand the independent set {b1, b2} to a basis {b1, b2, w1, . . . , w5} of N2. Again, the strings displayed earlier show that multiplication of the vectors in N2 on the left by A - λI must yield a space of dimension 3, so there exist two vectors w_i and w_j such that the vectors b1, (A - λI)w_i, and (A - λI)w_j are independent. Let b5 = w_i and b4 = (A - λI)b5, while b7 = w_j and b6 = (A - λI)b7. It can be shown that the vectors b1, b2, . . . , b7 are independent. Finally, we expand the set {b1, b4, b6} to a basis {b1, b4, b6, b8} for N1 to complete the portion of the Jordan basis corresponding to λ.
Although we know the techniques for finding bases for the nullspaces N_r and for expanding a given set of independent vectors to a basis, significant pencil-and-paper illustrations of this construction would be cumbersome, so we do not include them here. Any Jordan bases requested in the exercises can be found as in Example 7.
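For a matrix as small as the one in Example 7, the search for a suitable vector b2 can also be carried out numerically. The MATLAB sketch below (ours, not the text's method verbatim) picks a vector in the nullspace of (A - 2I)^2 that is not annihilated by A - 2I, assembles a Jordan basis, and verifies that C^(-1)AC is the Jordan canonical form found in Example 6.

    A  = [2 5 0 0 1; 0 2 0 0 0; 0 0 -1 0 0; 0 0 0 -1 0; 0 0 0 0 -1];
    M  = A - 2*eye(5);
    N2 = null(M^2);                   % columns span the nullspace of (A - 2I)^2
    for j = 1:size(N2,2)
        if norm(M*N2(:,j)) > 1e-8     % not annihilated by A - 2I
            b2 = N2(:,j); break
        end
    end
    b1 = M*b2;                        % so Ab1 = 2b1 and Ab2 = 2b2 + b1
    C  = [b1 b2 null(A + eye(5))];    % Jordan basis as columns
    J  = round(C\(A*C))               % the Jordan canonical form of Example 6

Any nonzero scalar multiples of these columns serve equally well, which is why the basis produced here need not agree entry for entry with the one written down in Example 7.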
An application of the Jordan canonical form to differential equations is indicated in Exercise 32. We mention that computer-aided computation of a Jordan canonical form for a square matrix is not a stable process. Consider, for example, the matrix

    A = [2  c]
        [0  2].

If c = 10^(-10), then the Jordan canonical form of A has 1 as its entry in the upper right-hand corner; but if c = 0, that entry is 0.
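A short experiment makes the difficulty visible. The sketch below (ours) uses the 2 x 2 matrix just discussed, as reconstructed above: changing c from 0 to a number as small as 10^(-10) changes the rank of A - 2I, and with it the entire block structure.

    c = 1e-10;  A = [2 c; 0 2];
    rank(A - 2*eye(2))     % 1 when c is nonzero: one block, Jordan form [2 1; 0 2]
    c = 0;      A = [2 c; 0 2];
    rank(A - 2*eye(2))     % 0 when c = 0: two blocks, Jordan form [2 0; 0 2]

Because roundoff error can easily play the role of such a c, rank decisions of this kind cannot be made reliably in floating-point arithmetic.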

Existence of a Jordan Form for a Square Matrix

To demonstrate the existence of a Jordan canonical form similar to an n x n matrix A, we need only show that we have a Jordan basis B for A. Let us formalize the concept of a string in a Jordan basis B = (b1, b2, . . . , bn). Let λ be an eigenvalue of A. If Ab_i = λb_i and Ab_k = λb_k + b_{k-1} for i < k <= j, while Ab_{j+1} ≠ λb_{j+1} + b_j, we refer to the sequence b_i, b_{i+1}, . . . , b_j as a string of basis vectors starting at b_j, ending at b_i, and associated with λ. This string is represented by the diagram

    b_j → b_{j-1} → · · · → b_{i+1} → b_i → 0.

THEOREM 9.9  Jordan Canonical Form of a Square Matrix

Let A be a square matrix. There exists an invertible matrix C such that the matrix J = C^(-1)AC is a Jordan canonical form. This Jordan canonical form is unique, except for the order of the Jordan blocks of which it is composed.

PROOF  We use a proof due to Filippov. First we note that it suffices to prove the theorem for matrices A having 0 as an eigenvalue. Observe that, if λ is an eigenvalue of A, then 0 is an eigenvalue of A - λI. Now if we can find C such that C^(-1)(A - λI)C = J is a Jordan canonical form, then C^(-1)AC = J + λI is also a Jordan canonical form. Thus, we restrict ourselves to the case where A has an eigenvalue of 0.
In order to find a Jordan canonical form for A, it is useful to consider also the linear transformation T: C^n → C^n, where T(z) = Az; a Jordan basis for A is considered to be a Jordan basis for T. We will prove the existence of a Jordan basis for any such linear transformation by induction on the dimension of the domain of the transformation.
If T is a linear transformation of a one-dimensional vector space sp(z), then T(z) = λz for some λ ∈ C, and {z} is the required Jordan basis. (The matrix of T with respect to this ordered basis is the 1 x 1 matrix [λ], which is already a Jordan canonical form.)
Now suppose that there exist Jordan bases for linear transformations on subspaces of C^n of dimension less than n, and let T(z) = Az for z ∈ C^n and an n x n matrix A. As noted, we can assume that zero is an eigenvalue of A. Then rank(A) < n; let r = rank(A). Now T maps C^n onto the column space of A, which is of dimension r < n. Let T′ be the induced linear transformation of the column space of A into itself, defined by T′(v) = T(v) for v in the column space of A. By our induction hypothesis, there is a Jordan basis

    B′ = (u1, u2, . . . , u_r)

for this column space of A.
Let S be the intersection of the column space and the nullspace of A. We wish to separate the vectors in B′ that are in S from those that are not. The nonzero vectors in S are precisely the eigenvectors in the column space of A with corresponding eigenvalue 0; that is, they are the eigenvectors of T′ with eigenvalue 0. In other words, S is the nullspace of T′. Let J′ be the matrix representation of T′ relative to B′. Because J′ is a Jordan canonical form, we see that the nullity of T′ (and of J′) is precisely the number of zero rows in J′. This is true because J′ is an upper-triangular square matrix; it can be brought to echelon form by means of row exchanges that place the zero rows at the bottom while sliding the nonzero rows up. Thus, if dim(S) = s, there are s zero rows in J′. Now in J′ we have exactly one zero row for each Jordan block corresponding to the eigenvalue 0—namely, the bottom row of the block. Because the number of such blocks is equal to the number of strings in B′ ending in S, we conclude that there are s such strings. Some of these strings may be of length 1 whereas others may be longer.
Figure 9.11 shows one possible situation when s = 2, where two vectors in S—namely, u1 and u4—are ending points of strings

    u3 → u2 → u1 → 0    and    u5 → u4 → 0

lying in the column space of A. These s strings of B′ that end in S start at s vectors in the column space of A; these are the vectors u3 and u5 in Figure 9.11.

Because the vector at the beginning of the jth string is in the column space of A, it must have the form Aw_j for some vector w_j in C^n. Thus we obtain the vectors w1, w2, . . . , w_s illustrated in Figure 9.11 for s = 2.

[Figure 9.11  Construction of a Jordan basis for A (s = 2): the column space of A and the nullspace of A inside C^n, overlapping in the subspace S.]

Finally, the nullspace of A has dimension n - r, and we can expand the set of s independent vectors in S to a basis for this nullspace. This gives rise to n - r - s more vectors v1, v2, . . . , v_{n-r-s}. Of course, each v_k is an eigenvector with corresponding eigenvalue 0.
We claim that

    (u1, . . . , u_r, w1, . . . , w_s, v1, . . . , v_{n-r-s})

can be reordered to become a Jordan basis B for A (and of course for T). We reorder it by moving the vectors w_j, tucking each one in so that it starts the appropriate string in B′ that was used to define it. For the situation in Figure 9.11, we obtain

    (u1, u2, u3, w1, u4, u5, w2, u6, . . . , u_r, v1, . . . , v_{n-r-s})

as Jordan basis. From our construction, we see that B is a Jordan basis for A if it is a basis for C^n. Because there are r + s + (n - r - s) = n vectors in all, we need only show that they are independent.
Suppose that

    Σ_i a_i u_i + Σ_j c_j w_j + Σ_k d_k v_k = 0,    (6)

the sums being taken over i = 1, . . . , r; j = 1, . . . , s; and k = 1, . . . , n - r - s. Because the vectors v_k lie in the nullspace of A, if we apply A to both sides of this equation, we obtain

    Σ_i a_i Au_i + Σ_j c_j Aw_j = 0.    (7)

Because each Au_i is either of the form λu_i or of the form λu_i + u_{i-1}, we see that the first sum is a linear combination of vectors u_i. Moreover, these vectors

Au_i do not begin any string in B′. Now the vectors Aw_j in the second sum are the vectors u_i that appear at the start of the s strings in B′ that end in S. Thus they do not appear in the first sum. Because B′ is an independent set, all the coefficients c_j in Eq. (7) must be zero. Equation (6) can then be written as

    Σ_i a_i u_i = Σ_k (-d_k) v_k.    (8)

Now the vector on the left-hand side of this equation lies in the column space of A, whereas the vector on the right-hand side is in the nullspace of A. Consequently, this vector lies in S and is a linear combination of the s basis vectors u_i in S. Because the v_k were obtained by extending these s vectors to a basis for the nullspace of A, the vector 0 is the only linear combination of the v_k that lies in S. Thus, the vector on both sides of Eq. (8) is 0. Because the v_k are independent, we see that all d_k are zero. Because the u_i are independent, it follows that the a_i are all zero. Therefore, B is an independent set of n vectors and is thus a basis for C^n. We have seen that, by our construction, it must be a Jordan basis. This completes the induction part of our proof, demonstrating the existence of a Jordan canonical form for every square matrix A.
Our work prior to this theorem makes clear that the Jordan blocks constituting a Jordan canonical form for A are completely determined by the ranks of the matrices (A - λI)^k for all eigenvalues λ of A and for all positive integers k. Thus, a Jordan canonical form J for A is unique except as to the order in which these blocks appear along the diagonal of J.

SUMMARY

1. A Jordan block is a square matrix with all diagonal entries equal, all entries
immediately above diagonal entries equal to 1, and all other entries equal
to 0.
2. Properties of a Jordan block are given in Theorem 9.8.
3. A square matrix is a Jordan canonical form if it consists of Jordan blocks
placed corner to corner along its main diagonal, with entries elsewhere equal
to 0.
4. A Jordan basis (see Definition 9.8) for an n X n matrix A gives rise to a
Jordan canonical form J that is similar to A.
5. A Jordan canonical form similar to an n X n matrix A can be computed if
we know the eigenvalues λ_i of A and if we know the rank of (A - λ_i I)^k for
each λ_i and for all positive integers k.
6. Every square matrix has a Jordan canonical form; that is, it is similar to a
Jordan canonical form.

EXERCISES

in Exercises 1-6, determine whether the given

~.

oooo~.O0O0©
6 «0 ©
Oo «~ =

Qonooceoecoe

OOOO CSO
O00
matrix 1S @ Jordan canonical form.

—————
oo oo
oo oo oceo
‘0 00 3 00 10.

oo
14.|9 00 2.}0
3 1

ooo.
10 0 0 6 03

eoo
[3100 1000

N=
el
0310 4 [0200

=
3- Jo 021 ‘10030
10 002 10004 In Exercises 11-14, find a Jordan canonical form
i100 for A from the given datu.
0-i 0 0
5-J00 31 11. Ais 5 X 5,A — 32 has nullity 2, (A — 3/)°
100 0 3 has nullity 3, (A — 37) has nullity 4,
(A — 31} has nullity 5 for k = 4.
‘2 1 0 O
5 |9 2 9 0 12. Ais 7 x 7,A + [has nullity 3, (A + J)‘ has
“10 0 ¢ O nullity 5 for k = 2; A + has nullity 1,
10 0 0-1 (A + if} has nullity 2 for j = 2.
13. Ais 8 x 8,A — [has nullity 2, (A — /)* has
nullity 4, (A — 1)‘ has nullity 5 for k = 3;
(A + 2I) has oullity 3 for j = 1.
In Exercises 7-10:
a) Find the eigenvalues of the given matrix J. 14. Ais 8 x 8;A + if has rank 4, (A + il) has
rank 2, (A + iJ) has rank 1, (A + i)‘ = O
b) Give the rank and nullity of (J — A} for each
for k = 4.
eigenvalue d of J and for every positive integer k.
¢) Draw schemata of the strings of vectors in the
standard basis arising from the Jordan blocks in J. In Exercises 15-22, find a Jordan canonical form
d) For each standard basis vector e,, express Je, as and a Jordan basis for the given matrix.
a linear combination of vectors in the standard
basis. -10 1a4
15. EE 16. 35 -41
-2 1 0 0 bid 3 0 |
| 9-2 1 6 17. |2 1 3 18.{ 2-2 1
0 0-2 1 Iso 4 -1 0 -1|
| 0 0 0-2
2 5 0 0 0
ri 0 0 0 0 0200 0
0 i 10 0 19./0 0-1 0-1
810 0 i 0 0 0 0 0-1 0
0 0 0-2 0 0 0 0 O-t.
0 0 0 0-2 fi 0 0 0 0
1 0 0 0 1 0 i 0 0 0
02100 2.10 0 2 0 0
%9!/ 002 0 0 000 2 0
0002 1 > 0-1 0 2!
0 0002

2 00 0 1 20000
0 2 0 0 0 02100
21./0 0 2 0 1 27. Let
A =|0 0 2 0 O|. Compute
00 0 2 9 0003 i
oO 8 0 0 2? 100090 3]
rro2 06) (A - 2194 - 3/7). Compare with Exercise 26.
Oo | 0 0 9 ig000
22,.;/0 0 1 0 | 071100
0 0 0 t 2 28. LetA=|0 0 ¢ 0 O|. Find
a polynomial in
0 0 0 0 1 00020
23. Mark each of the fellowing True or False. 10 000 2]
A (that is, a sum of terms @,4’ with a term
___. a. Every Jordan block matrix has just one aol) that gives the zero matrix. (See Exercises
eigenvalue. 24-27.) r
___ b. Every matnx having a unique eigenvalue
is a Jordan block. 29. Repeat Exercise 28 for the matrix A =
___¢. Every diagonal matrix is a Jordan -! 1 0 0 =O
canonical form. 6-1 1 0 0
___d. Every square matrix is similar to a 0 0-1 O O}.
Jordan canonical form. 0 ok Od
___ e. Every square matrix is similar to a 000 0 it
unique Jordan canonical form. . The Cayley—-Hamilton theorem states that, if
__. f. Every 1 x | matrix is similar to a unique P(A) = a,A" + +++ + aA + ay is the
Jordan canonical form. characteristic polynomial of a mainx A, then
___ g. There is a Jordan basis for every square p(A) = a,A"+ +++ +a,A + af = O, the
matrix A. zero matrix. Prove it. [Hin1: Consider
__.h. There is a unique Jordan basis for every (A — AJ)"'b, where b is a vector in a Jordan
square matrix A. basis corresponding to A,.] [n view of
—_— i. Every 3 X 3 diagonalizable matrix 1s Exercises 24-29, explain why you expect
similar to exactly six Jordan cancnical PJ) to be O, where J is a Jordan canonical
forms. form for A. Deduce that p{A) = O.
_— j. Every 3 X 3 matrix is similar to exactly 31. Let T: C" > C’ be a linear transformaiion. A
six Jordan canonical forms. subspace W of C" is invariant under T if
T(w) © W for all w € W. Let A be the
0100 standard matrix representation of T.
24. LetA
_|90001
010 . Compute a. Describe the one-dimensional invariant
subspaces of T.
0000 b. Show that every eigenspace E, of T is
A’, A}, and A‘.
invariant under T-
25. Let A be an n X n upper-triangular -natrix c. Show that the vectors in any string in a
with all diagonal entries 0. Compute A” for Jordan basis for A generate an invariant
all positive integers m = n. (See Exercise subspace of T.
24.) Prove that your answer is correct. d. Is it true that, if S is a subspace of a
subspace W that is invariant under 7,
then S is also invariant under 7? Jf not,
give a counterexample.
e. Is it true that every subspace of R’
invariant under 7 is contained in the
nullspace of (A — AJ)’, where A is some
eigenvalue of T? If not, give a
counterexample.

2, In Section 5.3, we considered systems c. Gt-en that, for


x’ = Ax of differential equations, and we saw
that, if A = CJC"', then the system takes the 2 3
_ form y’ = Jy, where x = Cy. (We used D in A='|0 -1 OQ).
place of J in Section 5.3, because we were 2 2 1
concerned only with diagonalization.) Let
Ay An -- ., A, be the (not necessarily Ty 3
distinct) eigenvalues of an n X n matnix A, 3 . 3
and let J be a Jordan canonical form for A. C= oa 0},
a. Show that the system y’ = Jy is of the
form 2 0 l

VN=AN
+ Oe
Vo = Ayy. + Crys, T-1 | QO
J=| 0-1 9,
10 0 4

Vn-1 = An-Wn-1 + Ch-iYm we have C™!AC = J, find the solution of


the differential system x’ = Ax.
Ya
= Asn»
33. Let A be an vn X n matrix with eigenvalue A.
where each ¢; is either 0 or 1. Prove that the algebraic multiplicity of
b. Fiow can the system in part a be solved? d is at least as large as its geometric multi-
[Hint: Start with the last equation.] plicity.

CHAPTER 10

SOLVING LARGE LINEAR SYSTEMS

The Gauss and Gauss–Jordan methods presented in Chapter 1 are fine for
solving very small linear systems with pencil and paper. Some applied
problems—in particular, those requiring numerical solutions of differential
equations—can lead to very large linear systems, involving thousands of
equations in thousands of unknowns. Of course, such large linear systems
must be solved by the use of computers. That is the primary concern of this
chapter. Although a computer can work tremendously faster than we can with
pencil and paper, each individual arithmetic operation does take some time,
and additional time is used whenever the value of a variable is stored or
retrieved. In addition, indexing in subscripted arrays requires time. When
solving a large linear system with a computer, we must use as efficient a
computational algorithm as we can, so that the number of operations required
is as small as practically possible.
We begin this chapter with a discussion of the time required for a
computer to execute operations and a comparison of the efficiency of the
Gauss method including back substitution with that of the Gauss—Jordan
method.
Section 10.2 presents the LU (lower- and upper-triangular) factorization of
the coefficient matrix of a square linear system. This factorization will appear
as we develop an efficient algorithm for solving, by repeated computer runs,
many systems all having the same coefficient matrix.
Section 10.3 deals with problems of roundoff and discusses ill-conditioned
matrices. We will see that there are actually very small linear systems,
consisting of only two equations in two unknowns, for which good computer
programs may give incorrect solutions.


10.1 CONSIDERATIONS OF TIME

Timing Data for One PC

The computation involved in solving a linear system is just a lot of arithmetic. Arithmetic takes time to execute even with a computer, although the computer is obviously much faster than an individual working with pencil and paper. In reducing a matrix by using elementary row operations, we spend most of our time adding a multiple of a row vector to another row vector. A typical step in multiplying row k of a matrix A by r and adding it to row i consists of

    replacing a_ij by a_ij + r·a_kj.    (1)

In a computer language such as BASIC, operation (1) might become

    A(I,J) = A(I,J) + R*A(K,J).    (2)

Computer instruction (2) involves operations of addition and multiplication, as well as indexing, retrieving values, and storing values. We use terminology of C. B. Moler and call such an operation a flop. These flops require time to execute on any computer, although the times vary widely depending on the computer hardware and the language of the computer program. When we wrote the first edition of this text in 1986, we experimented with our personal computer, which was very slow by today's standards, and created the data in Table 10.1. Although computers are much faster today, the indication in the table that computation does require time remains as valid now as it was then. We left the original 1986 data in the table so that you can see, if you work Exercise 22, how much faster computers are today. To form the table, we generated two random numbers, R and S, and then found the time required to add them 10,000 times, using the BASIC loop

    FOR I = 1 TO N : C = R + S : NEXT    (3)

where we had earlier set N = 10,000. We obtained the data shown in the top row of Table 10.1. We also had the computer execute loop (3) with + replaced by - (subtraction), then by * (multiplication), and finally by / (division). We similarly timed the execution of 10,000 flops, using

    FOR I = 1 TO N : A(K,J) = A(K,J) + R * A(M,J) : NEXT,    (4)

where we had set N = 10,000 and had also assigned values to all other variables except I. All our data are shown in Table 10.1 on the next page.
Table 10.1 gives us quite a bit of insight into the PC that generated the data. Here are some things we noticed immediately.

Point 1  Multiplication took a bit, but surprisingly little, more time than addition.
504 CHAPTER 10 SOLVING LARGE LINEAR SYSTEMS

TABLE 10.1
Time (in Seconds) for Executing 10,000 Operations

                                    Interpretive BASIC        Compiled BASIC
                                    Single-     Double-       Single-    Double-
Routine                             Precision   Precision     Precision  Precision

Addition        [using (3)]            37          44             8          9
Subtraction     [using (3) with -]     37          48             8          9
Multiplication  [using (3) with *]     39          53             9         11
Division        [using (3) with /]     47         223             9         15
Flops           [using (4)]           143         162            15         18

Point 2  Division took the most time of the four arithmetic operations. Indeed, our PC found double-precision division in interpretive BASIC very time-consuming. We should try to minimize divisions as much as possible. For example, when computing

    (4/3)(3, 2, 5, 7, 8) + (-4, 11, 2, 1, 5)

to obtain a row vector with first entry zero, we should not compute the remaining entries as

    (4/3)2 + 11,   (4/3)5 + 2,   (4/3)7 + 1,   (4/3)8 + 5,

which requires four divisions. Rather, we should do a single division, finding r = 4/3, and then computing

    2r + 11,   5r + 2,   7r + 1,   8r + 5.

(See the short sketch following Point 4.)

Point 3  Our program ran much faster in compiled BASIC than in interpretive BASIC. In the compiled version on our PC, the time for double-precision division was not so far out of line with the time for other operations.

Point 4  The indexing in the flops required significant time in interpretive BASIC on our PC.
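Here is a short MATLAB version (ours, rather than the text's BASIC) of the single-division strategy of Point 2, using the same two rows as in the example above.

    A = [ 3  2 5 7 8;
         -4 11 2 1 5];
    r = -A(2,1)/A(1,1);                    % the only division: r = 4/3
    A(2,2:5) = A(2,2:5) + r*A(1,2:5);      % four flops, no further divisions
    A(2,1) = 0;                            % this entry is known to be zero
    A                                      % second row is now [0, 2r+11, 5r+2, 7r+1, 8r+5]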
The routine TIMING in LINTEK was used to generate the data in Table
10.1 on our PC. Exercise 22 asks students to obtain analogous data with their
PCs, using this program.

Counting Operations

We turn to counting the flops required to solve a square linear system Ax = b that has a unique solution. We assume that no row interchanges are necessary, which is frequently the case.
Suppose that we solve the system Ax = b, using the Gauss method with back substitution. Form the augmented matrix

    [A | b] = [a11  a12  ···  a1n | b1]
              [a21  a22  ···  a2n | b2]
              [ .    .         .  |  .]
              [an1  an2  ···  ann | bn].

For the moment let us neglect the flops performed on b and count just the flops performed on A. In reducing the n x n matrix A to upper-triangular form U, we execute n - 1 flops in adding a multiple of the first row of [A | b] to the second row. (We do not need to compute the zero entry at the beginning of our new second row; we know it will be zero.) We similarly use n - 1 flops to obtain the new row 3, and so on. This gives a total of (n - 1)^2 flops executed using the pivot in row 1. The pivot in row 2 is then used for the (n - 1) x (n - 1) matrix obtained by crossing off the first row and column of the modified coefficient matrix. By the count just made for the n x n matrix, we realize that using this pivot in the second row will result in execution of (n - 2)^2 flops. Continuing in this fashion, we see that reducing A to upper-triangular form U will require

    (n - 1)^2 + (n - 2)^2 + (n - 3)^2 + ··· + 1    (5)

flops, together with some divisions. Let's count the divisions. We expect to use just one division each time a row is multiplied by a constant and added to another row (see Point 2). There will be n - 1 divisions involving the pivot in row 1, n - 2 involving the pivot in row 2, and so on, for a total of

    (n - 1) + (n - 2) + (n - 3) + ··· + 1    (6)

divisions.
There are some handy formulas for finding a sum of consecutive integers and a sum of squares of consecutive integers. It can be shown by induction (see Appendix A) that

    1 + 2 + 3 + ··· + n = n(n + 1)/2    (7)

and

    1^2 + 2^2 + 3^2 + ··· + n^2 = n(n + 1)(2n + 1)/6.    (8)

Replacing n by n - 1 in formula (8), we see that the number of flops given by sum (5) is equal to

    (n - 1)n(2n - 1)/6 = (2n^3 - 3n^2 + n)/6 = n^3/3 - n^2/2 + n/6.    (9)

We assume that we are using a computer to solve a large linear system involving hundreds or thousands of equations. With n = 1000, the value of Eq. (9) becomes

    1,000,000,000/3 - 1,000,000/2 + 1000/6.    (10)

The largest term in expression (10) is the first one, 1,000,000,000/3, corresponding to the n^3/3 term in Eq. (9). The lower powers of n in the n^2/2 and n/6 terms contribute little in comparison with the n^3/3 term for large n. It is customary to keep just the n^3/3 term as a measure of the order of magnitude of the expression in Eq. (9) for large values of n.
Turning to the count of the divisions in sum (6), we see from Eq. (7) with n replaced by n - 1 that we are using

    n^2/2 - n/2    (11)

divisions. For large n, this is of order of magnitude n^2/2, which is inconsequential in comparison with the order of magnitude n^3/3 for the flops. Exercise 1 shows that the number of flops performed on the column vector b in reducing [A | b] is again given by Eq. (11), so it can be neglected for large n in view of the order of magnitude n^3/3. The result is shown in the following box.

    Flop Count for Reducing [A | b] to [U | c]
    If Ax = b is a square linear system in which A is an n x n matrix,
    the number of flops executed in reducing [A | b] to the form [U | c]
    is of order of magnitude n^3/3 for large n.

We turn now to finding the number of flops used in back substitution to solve the upper-triangular linear system Ux = c. This system can be written out as

    u11·x1 + ··· + u1,n-1·x_{n-1} + u1n·xn = c1
                   ···
              u_{n-1,n-1}·x_{n-1} + u_{n-1,n}·xn = c_{n-1}
                                        unn·xn = cn.

Solving for xn requires one indexed division, which we consider to be a flop. Solving then for x_{n-1} requires an indexed multiplication and subtraction, followed by an indexed division, which we consider to be two flops. Solving for x_{n-2} requires two flops, each consisting of a multiplication combined with a subtraction, followed by an indexed division, and we consider this to contribute three more flops, and so on. We obtain the count

    1 + 2 + 3 + ··· + n = n(n + 1)/2 = n^2/2 + n/2    (12)

for the flops in this back substitution. Again, this is of lower order of magnitude than the order of magnitude n^3/3 for the flops required to reduce [A | b] to [U | c].

    Flop Count for Back Substitution
    If U is an n x n upper-triangular matrix, then back substitution to
    solve Ux = c requires on the order of n^2/2 flops for large n.

Combining the results shown in the last two boxes, we arrive at the following flop count.

    Flop Count for Solving Ax = b, Using the Gauss Method with Back Substitution
    If A is an n x n matrix, the number of flops required to solve the
    system Ax = b using the Gauss method with back substitution is of
    order of magnitude n^3/3 for large n.

EXAMPLE 1  For the computer that produced the execution times shown in Table 10.1, find the approximate time required to solve a system of 100 equations in 100 unknowns, using single-precision arithmetic and (a) interpretive BASIC, (b) compiled BASIC.

SOLUTION  From the flop count for the Gauss method, we see that solving such a system with n = 100 requires about 100^3/3 = 1,000,000/3 flops. In interpretive BASIC, the time required for 10,000 flops in single precision was about 143 seconds. Thus the 1,000,000/3 flops require about (1,000,000/30,000)·143 = 14,300/3 seconds, or about 1 hour and 20 minutes. In compiled BASIC, where 10,000 flops took about 15 seconds, we find that the time is approximately (1,000,000/30,000)·15 = 1500/3 = 500 seconds, or about 8 minutes. The PC used for Table 10.1 is regarded today as terribly slow.
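The n^3/3 growth is easy to observe on any modern system. The MATLAB sketch below (ours) times the built-in solver, which performs Gauss elimination with partial pivoting; doubling n should multiply the elapsed time by roughly 2^3 = 8 once n is reasonably large.

    for n = [250 500 1000]
        A = rand(n); b = rand(n,1);
        tic; x = A\b; t = toc;
        fprintf('n = %4d   elapsed time = %8.4f seconds\n', n, t)
    end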

It is interesting to compare the efficiency of the Gauss method with back substitution to that of the Gauss–Jordan method. Recall that, in the Gauss–Jordan method, [A | b] is reduced to a form [I | c], where I is the identity matrix. Exercise 2 shows that the Gauss–Jordan flop count is of order of magnitude n^3/2 for large n. Thus the Gauss–Jordan method is not as efficient as the Gauss method with back substitution if n is large. One expects a Gauss–Jordan program to take about one and a half times as long to execute. The routine TIMING in LINTEK can be used to illustrate this. (See Exercises 23 and 24.)

Counting Flops for Matrix Operations

The exercises ask us to count the flops involved in matrix addition, multiplication, and exponentiation. Recall that a matrix product C = AB is obtained by taking dot products of row vectors of an m x n matrix A with column vectors of an n x s matrix B. We indicate the usual way that a computer finds the dot product c_ij in C. First the computer sets c_ij = 0. Then it replaces c_ij by c_ij + a_i1·b_1j, which gives c_ij the value a_i1·b_1j. Then the computer replaces c_ij by c_ij + a_i2·b_2j, giving c_ij the value a_i1·b_1j + a_i2·b_2j. This process continues in the obvious way until finally we have accumulated the desired value,

    c_ij = a_i1·b_1j + a_i2·b_2j + ··· + a_in·b_nj.

Each of these replacements of c_ij is accomplished by means of a flop, typically expressed by

    C(I,J) = C(I,J) + A(I,K) * B(K,J)    (13)

in BASIC.

EXAMPLE 2  Find the number of flops required to compute the dot product of two vectors, each with n components.

SOLUTION  The preceding discussion shows that the dot product uses n flops of form (13), corresponding to the values 1, 2, 3, . . . , n for K.
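The accumulation described above is easy to write out explicitly. The MATLAB sketch below (ours) performs the n flops of form (13) for one entry c_ij and checks the result against MATLAB's own product.

    n = 4;  A = rand(3,n);  B = rand(n,5);  i = 2;  j = 3;
    c = 0;
    for k = 1:n
        c = c + A(i,k)*B(k,j);    % one flop for each of the n values of k
    end
    c - A(i,:)*B(:,j)             % difference is essentially zero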

SUMMARY

1. A flop is a rather vaguely defined execution by a computer, consisting typically of a bit of indexing, the retrieval and storage of a couple of values, and one or two arithmetic operations. Typical flops might appear in a computer program in instruction lines such as

       C(I,J) = A(I,J) + B(I,J)

   or

       A(I,J) = A(I,J) + R*A(K,J).

2. A computer takes time to execute a flop, and it is desirable to use as few flops as possible when performing an extensive computation.
3. If the number of flops used to solve a problem is given by a polynomial expression in n, it is customary to keep only the term of highest degree in the polynomial as a measure of the order of magnitude of the number of flops when n is large.
4. The order of magnitude of the number of flops used in solving a system Ax = b for an n x n matrix A is

       n^3/3 for the Gauss method with back substitution

   and

       n^3/2 for the Gauss–Jordan method.

5. The formulas

       1 + 2 + 3 + ··· + n = n(n + 1)/2

   and

       1^2 + 2^2 + 3^2 + ··· + n^2 = n(n + 1)(2n + 1)/6

   are handy for determining the number of flops performed by a computer in matrix computations.

EXERCISES
In all of these exercises, assume that no row vectors are interchanged in a matrix.

1. Let A be an n x n matrix. Show that, in reducing [A | b] to [U | c] using the Gauss method, the number of flops performed on b is of order of magnitude n^2/2 for large n.
2. Let A be an n x n matrix. Show that, in solving Ax = b using the Gauss–Jordan method, the number of flops has order of magnitude n^3/2 for large n.

In Exercises 3–11, let A be an n x n matrix, and let B and C be m x n matrices. For each matrix, find the number of flops required for efficient computation of the matrix.

3. B + C     4. A^2     5. BA
6. A^3       7. A^4     8. A^5
9. A^6      10. A^7    11. A^8

12. Which of the following is more efficient with a computer?
    a. Solving a square system Ax = b by the Gauss–Jordan method, which makes each pivot 1 before creating the zeros in the column containing it.
    b. Using a similar technique to reduce the system to a diagonal system Dx = c, where the entries in D are not necessarily all 1, and then dividing by these entries to obtain the solution.

13. Let A be an n x n matrix, where n is large. Find the order of magnitude for the number of flops if A^(-1) is computed using the Gauss–Jordan method on the augmented matrix [A | I] without trying to reduce the number of flops used on I in response to the zeros that appear in it.
14. Repeat Exercise 13, using the Gauss method with back substitution rather than the Gauss–Jordan method.
15. Repeat Exercise 14, but this time cut down the number of flops performed on I during the Gauss reduction by taking into account the zeros in I.

Exercises 16–20 concern band matrices. In a number of situations, square linear systems Ax = b occur in which the nonzero entries in the n x n matrix A are all concentrated near the main diagonal, running from the upper left-hand corner to the lower right-hand corner of A. Such a matrix is called a band matrix. For example, the matrix

    [2  1  0  0  0  0]
    [1  3  4  0  0  0]
    [0  4  1  2  0  0]    (14)
    [0  0  2  2  7  0]
    [0  0  0  7  1  3]
    [0  0  0  0  3  5]

is a symmetric 6 x 6 band matrix. We say that the band width of a band matrix [a_ij] is w if w is the smallest integer such that a_ij = 0 for |i - j| >= w. Thus matrix (14) has band width w = 2, as indicated. Such a matrix of band width 2 is also called tridiagonal. We usually assume that the band width is small compared with the size of n. As the band width approaches n, the matrix becomes full.
In Exercises 16–20, assume that A is an n x n band matrix with band width w that is small in comparison with n.

16. What can be said concerning the band width of A^2? of A^3? of A^m?
17. Let A be tridiagonal, so w = 2. Find the order of magnitude of the number of flops required to reduce the partitioned matrix [A | b] to a form [U | c], where U is an upper-triangular matrix, taking into account the banded character of A.
18. Repeat Exercise 17 for band width w, expressing the result in terms of w.
19. Repeat Exercise 18, but include the flops used in back substitution to solve Ax = b.
20. Explain why the Gauss method with back substitution is much more efficient than the Gauss–Jordan method for a banded matrix where w is very small compared with n.
21. Mark each of the following True or False.
    ___ a. A flop is a very precisely defined entity.
    ___ b. Computers can work so fast that it is not worthwhile to try to minimize the number of computations a computer makes to solve a problem, provided that the number of computations is only a few hundred.
    ___ c. Computers can work so fast that it is not worthwhile to try to minimize the number of computations required to solve a problem.
    ___ d. The Gauss method with back substitution and the Gauss–Jordan method for solving a large linear system both take about the same amount of computer time.
    ___ e. The Gauss–Jordan method for solving a large linear system takes about half again as long to execute as does the Gauss method on a computer.
    ___ f. Multiplying two n x n matrices requires more flops than does solving a linear system with an n x n coefficient matrix.
    ___ g. About n^2/2 flops are required to execute the back substitution in solving a linear system with an n x n coefficient matrix by using the Gauss method.
    ___ h. Executing the Gauss method with back substitution for a large linear system with an n x n coefficient matrix requires about n^2 flops.
    ___ i. Executing the Gauss method with back substitution for a large linear system with an n x n coefficient matrix requires about n^3/3 flops.
    ___ j. Executing the Gauss–Jordan method for a large linear system with an n x n coefficient matrix requires about n^3/2 flops.

LINTEK contains a routine TIMING that can be used to time algebraic operations and flops. The program can also be used to time the solution of square systems Ax = b by the Gauss method with back substitution and by the Gauss–Jordan method. For a user-specified integer n <= 80, the program generates an n x n matrix A and column vector b, where all entries are in the interval [-20, 20]. Use TIMING for Exercises 22–24.

22. Experiment with TIMING to see roughly how many of the indicated operations your computer can do in 5 seconds.
    a. Additions
    b. Subtractions
    c. Multiplications
    d. Divisions
    e. Flops
23. Run the routine TIMING in LINTEK to time the Gauss method and the Gauss–Jordan method, starting with small values of n and increasing them until a few seconds' difference in times for the two methods is obtained. Does the time for the Gauss–Jordan method seem to be about 3/2 the time for the Gauss method with back substitution? If not, why not?
24. Continuing Exercise 23, increase the size of n until the solutions take 2 or 3 minutes. Now does the Gauss–Jordan method seem to take about 3/2 times as long? If this ratio is significantly different from that obtained in Exercise 23, explain why. (The two ratios may or may not appear to be approximately the same, depending on the speed of the computer used and on whether time in fractions of seconds is displayed.)

MATLAB

The command clock in MATLAB returns a row vector

    [year, month, day, hour, minute, second]

that gives the date and time in decimal form. To calculate the elapsed time for a computation, we can

    set t0 = clock,
    execute the computation,
    give the command etime(clock,t0), which returns elapsed time since t0.

The student edition of MATLAB will not accept a value of i greater than 1024 in a "FOR i = 1 to n" loop, so to calculate the time for more than 1024 operations in a loop, we use a double

    FOR g = 1 to h, FOR i = 1 to n

loop with which we can find the time required for up to 1024^2 operations. Note below that the syntax of the MATLAB loops is a bit different from that just displayed. Recall that the colon can be used with the meaning "through." The first two lines below define data to be used in the last line. The last line is jammed together so that it will all fit on one screen line when you modify the c=a+b portion in Exercise M5 to time flops. As shown below, the last line returns the elapsed time for h · n repeated additions of two numbers a and b between 0 and 1:

    A = rand(6,6); a = rand; b = rand; r = rand; j = 2; k = 4; m = 3;
    h = 100; n = 50;
    t0=clock;for g=1:h,for i=1:n,c=a+b;end;end;etime(clock,t0)

M1. Enter the three lines shown above. Put spaces in the last line only after "for". Using the up-arrow key and changing the value of n (and h if necessary), find roughly how many additions MATLAB can perform in 5 seconds on your computer. (Remember, the time given is for h · n additions.)
M2. Modify the c=a+b portion and repeat M1 for subtractions.
M3. Modify the c=a+b portion and repeat M1 for multiplications.
M4. Modify the c=a+b portion and repeat M1 for divisions.
M5. Modify the c=a+b portion and repeat M1 for flops A(k,j)=A(k,j)+r*A(m,j). Delete c=a+b first, and don't insert any spaces!

10.2 THE LU-FACTORIZATION

Keeping a Record of Row Operations

We continue to work with a square linear system Ax = b having a unique solution that can be found by using Gauss elimination with back substitution, without having to interchange any rows. That is, the matrix A can be row-reduced to an upper-triangular matrix U without making any row interchanges.
Situations occur in which it is necessary to solve many such systems, all having the same coefficient matrix A but different column vectors b. We solve such multiple systems by row-reducing a single augmented matrix

    [A | b1, b2, . . . , bs]

in which we line up all the different column vectors b1, b2, . . . , bs on the right of the partition. In practice, it may be impossible to solve all of these systems by using this single augmentation in a single computer run. Here are some possibilities in which a sequence of computer runs may be needed to solve all the systems:

1. Remember that we are concerned with large systems. If the number s of vectors b_i is large, there may not be room in the computer memory to accommodate all of these data at one time. Indeed, it might even be necessary to reduce the n x n matrix A in segments, if n is very large. If we can handle the s vectors b_i only in groups of m at a time, we must use at least s/m computer runs to solve all the systems.
2. Perhaps the vectors b_i are generated over a period of time, and we need to solve systems involving groups of the vectors b_i as they are generated. For example, we may want to solve Ax = b_i with r different vectors b_i each day.
3. Perhaps the vector b_{i+1} depends on the solution of Ax = b_i. We would then have to solve Ax = b1, determine b2, solve Ax = b2, determine b3, and so on, until we finally solved Ax = bs.

From Section 10.1, we know that the magnitude of the number of flops required to reduce A to an upper-triangular matrix U is n^3/3 for large n. We want to avoid having to repeat all this work done in reducing A after the first computer run.
We assume that A can be reduced to U without interchanging rows—that is, that (nonzero) pivots always appear where we want them as we reduce A to U. This means that only elementary row-addition operations (those that add a multiple of a row vector to another row vector) are used. Recall that a row-addition operation can be accomplished by multiplying on the left by an n x n elementary matrix E, where E is obtained by applying the same row-addition operation to the identity matrix. There is a sequence E1, E2, . . . , Et of such elementary matrices such that

    Et E_{t-1} ··· E2 E1 A = U.    (1)

Once the matrix U has been found by the computer, the data in it can be stored on a disk or on tape, and simply read in for future computer runs. However, when we are solving Ax = b by reducing the augmented matrix [A | b], the sequence of elementary row operations described by Et E_{t-1} ··· E2 E1 in Eq. (1) must be applied to the entire augmented matrix, not merely to A. Thus we need to keep a record of this sequence of row operations to perform on column vectors b when they are used in subsequent computer runs.
Here is a way of recording the row-addition operations that is both efficient and algebraically interesting. As we make the reduction of A to U, we create a lower-triangular matrix L, which is a record of the row-addition operations performed. We start with the n x n identity matrix I, and as each row-addition operation on A is performed, we change one of the zero entries below the diagonal in I to produce a record of that operation. For example, if during the reduction we add 4 times row 2 to row 6, we place -4 in the second column position in the sixth row of the matrix L that we are creating as a record. The general formulation is shown in the following box.

    Creation of the Matrix L
    Start with the n x n identity matrix I. If during the reduction of A to U,
    r times row i is added to row k, replace the zero in row k and column i
    of the identity matrix by -r. The final result obtained from the
    identity matrix is L.

EXAMPLE 1  Reduce the matrix

    A = [ 1  3  -1]
        [ 2  8   4]
        [-1  3   4]

to upper-triangular form U, and create the matrix L described in the preceding box.

SOLUTION  We proceed in two columns, as follows:

    Reduction of A to U                     Creation of L from I

    A = [ 1  3  -1]                         I = [1  0  0]
        [ 2  8   4]                             [0  1  0]
        [-1  3   4]                             [0  0  1]

    Add -2 times row 1 to row 2.

      ~ [ 1  3  -1]                             [ 1  0  0]
        [ 0  2   6]                             [ 2  1  0]
        [-1  3   4]                             [ 0  0  1]

    Add 1 times row 1 to row 3.

      ~ [ 1  3  -1]                             [ 1  0  0]
        [ 0  2   6]                             [ 2  1  0]
        [ 0  6   3]                             [-1  0  1]

    Add -3 times row 2 to row 3.

      ~ [ 1  3   -1]                        L = [ 1  0  0]
        [ 0  2    6]  =  U                      [ 2  1  0]
        [ 0  0  -15]                            [-1  3  1]
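The record-keeping of Example 1 translates directly into a short program. The MATLAB sketch below (ours, not the text's LINTEK routine LUFACTOR) reduces A to U and stores the negative of each multiplier in L, then checks that LU reproduces A.

    A = [1 3 -1; 2 8 4; -1 3 4];
    n = size(A,1);  U = A;  L = eye(n);
    for i = 1:n-1                      % pivot row i
        for k = i+1:n                  % rows below the pivot
            r = -U(k,i)/U(i,i);        % add r times row i to row k
            U(k,:) = U(k,:) + r*U(i,:);
            L(k,i) = -r;               % the record entry
        end
    end
    L, U, L*U                          % L*U reproduces A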

We now illustrate how the record kept in L in Example 1 can be used to solve a linear system Ax = b having the matrix A in Example 1 as coefficient matrix.

EXAMPLE 2  Use the record in L in Example 1 to solve the linear system Ax = b given by

     x1 + 3x2 -  x3 = -4
    2x1 + 8x2 + 4x3 =  2
    -x1 + 3x2 + 4x3 =  4.

SOLUTION  We use the record in L to find the column vector c that would occur if we were to reduce [A | b] to [U | c], using these same row operations:

    Entry in L     Meaning of the Entry                        Reduction of b to c

                                                               b = (-4, 2, 4)

    ℓ21 = 2        Add -2 times row 1 to row 2.                  ~ (-4, 10, 4)

    ℓ31 = -1       Add -(-1) = 1 times row 1 to row 3.           ~ (-4, 10, 0)

    ℓ32 = 3        Add -3 times row 2 to row 3.                  ~ (-4, 10, -30) = c

If we put this result together with the matrix U obtained in Example 1, the reduced augmented matrix for the linear system becomes

    [U | c] = [1  3   -1 |  -4]
              [0  2    6 |  10]
              [0  0  -15 | -30].

Back substitution then yields

    x3 = -30/(-15) = 2,
    x2 = (10 - 6(2))/2 = -1,
    x1 = -4 + 3 + 2 = 1.

We give another example.

EXAMPLE 3  Let

    A = [ 1  -2   0   3]                 [ 11]
        [-2   3   1  -6]    and    b =   [-21]
        [-1   4  -4   3]                 [ -1]
        [ 5  -8   4   0]                 [ 23].

Generate the matrix L while reducing the matrix A to U. Then use U and the record in L to solve Ax = b.

SOLUTION  We work in two columns again. This time we fix up a whole column of A in each step:

    Reduction of A                          Generation of L

    A = [ 1  -2   0    3]                   I = [1  0  0  0]
        [-2   3   1   -6]                       [0  1  0  0]
        [-1   4  -4    3]                       [0  0  1  0]
        [ 5  -8   4    0]                       [0  0  0  1]

      ~ [ 1  -2   0    3]                       [ 1  0  0  0]
        [ 0  -1   1    0]                       [-2  1  0  0]
        [ 0   2  -4    6]                       [-1  0  1  0]
        [ 0   2   4  -15]                       [ 5  0  0  1]

      ~ [ 1  -2   0    3]                       [ 1   0  0  0]
        [ 0  -1   1    0]                       [-2   1  0  0]
        [ 0   0  -2    6]                       [-1  -2  1  0]
        [ 0   0   6  -15]                       [ 5  -2  0  1]

      ~ [ 1  -2   0    3]                   L = [ 1   0   0  0]
        [ 0  -1   1    0]  =  U                 [-2   1   0  0]
        [ 0   0  -2    6]                       [-1  -2   1  0]
        [ 0   0   0    3]                       [ 5  -2  -3  1].

We now apply the record below the diagonal in L to the vector b, working down each column of the record in L in turn.

    First column of L:    b = ( 11, -21, -1, 23) ~ (11, 1, -1, 23) ~ (11, 1, 10, 23) ~ (11, 1, 10, -32)

    Second column of L:       (11, 1, 10, -32) ~ (11, 1, 12, -32) ~ (11, 1, 12, -30)

    Third column of L:        (11, 1, 12, -30) ~ (11, 1, 12, 6)

The augmented matrix

    [U | c] = [1  -2   0  3 | 11]
              [0  -1   1  0 |  1]
              [0   0  -2  6 | 12]
              [0   0   0  3 |  6]

yields, upon back substitution,

    x4 = 6/3 = 2,
    x3 = (12 - 6(2))/(-2) = 0,
    x2 = (1 - 0)/(-1) = -1,
    x1 = 11 - 2 - 6 = 3.

Two questions come to mind:

1. Why bother to put the entries 1 down the main diagonal in L when they are not used?
2. If we add r times a row to another row while reducing A, why do we put -r rather than r in the record in L, and then change back to r again when performing the operations on the column vector b?

We do these two things only because the matrix L formed in this way has an interesting algebraic property, which we will discuss in a moment. In fact, when solving a large system using a computer, we certainly would not fuss with the entries 1 down the diagonal. Indeed, we can save memory space by not even generating a matrix L separate from the one being reduced. When creating a zero below the main diagonal, place the record -r or r desired, as described in question 2 above, directly in the matrix being reduced at the

position where a zero is being created! The computer already has space reserved for an entry there. Just remember that the final matrix contains the desired entries of U on and above the diagonal and the record for L or -L below the diagonal. We will always use -r rather than r as record entry in this text. Thus, for the 4 x 4 matrix in Example 3, we obtain

    [ 1  -2   0  3]
    [-2  -1   1  0]
    [-1  -2  -2  6]    Combined L\U display    (2)
    [ 5  -2  -3  3]

where the black entries on or above the main diagonal give the essential data for U, and the color entries are the essential data for L.
Let us examine the efficiency of solving a system Ax = b if U and L are already known. Each entry in the record in L requires one flop to execute when applying this record to reduce a column vector b. The number of entries is

    1 + 2 + ··· + (n - 1) = (n - 1)n/2 = n^2/2 - n/2,

which is of order of magnitude n^2/2 for large n. We saw in Section 10.1 that back substitution requires about n^2/2 flops, too, giving a total of n^2 flops for large n to solve Ax = b, once U and L are known. If instead we computed A^(-1) and found x = A^(-1)b, the product A^(-1)b would also require n^2 flops. But there are at least two advantages in using the LU-technique. First, finding U requires about n^3/3 flops for large n, whereas finding A^(-1) requires n^3 flops. (See Exercise 15 in Section 10.1.) If n = 1000, the difference in computer time is considerable. Second, more computer memory is used in reducing [A | I] to [I | A^(-1)] than is used in the efficient way we record L as we find U, illustrated in the combined L\U display (2).
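The combined storage just described takes only a few extra lines of code. In the MATLAB sketch below (ours), the record entry is written into the very position where the zero is being created, so a single array ends up holding display (2).

    A = [1 -2 0 3; -2 3 1 -6; -1 4 -4 3; 5 -8 4 0];
    n = size(A,1);
    for i = 1:n-1
        for k = i+1:n
            r = -A(k,i)/A(i,i);                       % add r times row i to row k
            A(k,i+1:n) = A(k,i+1:n) + r*A(i,i+1:n);
            A(k,i) = -r;                              % record stored where the zero belongs
        end
    end
    A                                                 % matches the combined L\U display (2)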
We give a specific illustration in which keeping the record L is useful.

EXAMPLE 4  Let

    A = [ 1   2  -1]                 [  9]
        [-2  -5   3]    and    b =   [-17].
        [-1  -3   0]                 [-44]

Solve the linear system A^3 x = b.

SOLUTION  We view A^3 x = b as A(A^2 x) = b, and substitute y = A^2 x to obtain Ay = b. We can solve this equation for y. Then we write A^2 x = y as A(Ax) = y, or Az = y, where z = Ax. We then solve Az = y for z. Finally, we solve Ax = z for the desired x. Because we are using the same coefficient matrix A each time, it is efficient to find the matrices L and U and then to proceed as in Example 3 to find y, z, and x in turn.
We find that the matrices U and L are given by

    U = [1   2  -1]                 L = [ 1  0  0]
        [0  -1   1]     and             [-2  1  0].
        [0   0  -2]                     [-1  1  1]

Applying the record in L to b, we obtain

    b = (9, -17, -44) ~ (9, 1, -35) ~ (9, 1, -36).

From the matrix U, we find that

    y = (-7, 17, 18).

To solve Az = y, we apply the record in L to y:

    y = (-7, 17, 18) ~ (-7, 3, 11) ~ (-7, 3, 8).

Using U, we obtain

    z = (3, -7, -4).

Finally, to solve Ax = z, we apply the record in L to z:

    z = (3, -7, -4) ~ (3, -1, -1) ~ (3, -1, 0).

Using U, we find that

    x = (1, 1, 0).

The routine LUFACTOR in LINTEK can be used to find the matrices L and U; it has an option for iteration to solve a system A^m x = b, as we did in Example 4.
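In MATLAB the three solves of Example 4 can be chained together, reusing the one factorization each time. The sketch below (ours, not the LUFACTOR routine itself) uses the L and U found in the example; the backslash operator recognizes triangular systems and performs forward or back substitution.

    b = [9; -17; -44];
    L = [1 0 0; -2 1 0; -1 1 1];   U = [1 2 -1; 0 -1 1; 0 0 -2];
    y = U \ (L \ b);    % y solves Ay = b   (y = A^2 x)
    z = U \ (L \ y);    % z solves Az = y   (z = Ax)
    x = U \ (L \ z)     % x solves Ax = z, so x solves A^3 x = b; here x = (1, 1, 0)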

The Factorization A = LU

This heading shows why the matrix L we described is algebraically interesting: we have A = LU.

EXAMPLE 5  Illustrate A = LU for the matrices obtained in Example 3.

SOLUTION  From Example 3, we have

    LU = [ 1   0   0  0][1  -2   0  3]   [ 1  -2   0   3]
         [-2   1   0  0][0  -1   1  0] = [-2   3   1  -6] = A.
         [-1  -2   1  0][0   0  -2  6]   [-1   4  -4   3]
         [ 5  -2  -3  1][0   0   0  3]   [ 5  -8   4   0]
It is not difficult to establish that we always have A = LU. Recall from Eq. (1) that

    Et E_{t-1} ··· E2 E1 A = U,    (3)

for elementary matrices E_i corresponding to elementary row operations consisting of adding a scalar multiple of a row to a lower row. From Eq. (3), we obtain

    A = E1^(-1) E2^(-1) ··· E_{t-1}^(-1) Et^(-1) U.    (4)

We proceed to show that the matrix L is equal to the product E1^(-1) E2^(-1) ··· E_{t-1}^(-1) Et^(-1) appearing in Eq. (4). Now if E_i is obtained from the identity matrix by adding r (possibly r = 0) times a row to another row, then E_i^(-1) is obtained from the identity matrix by adding -r times the same row to the other row. Because Et corresponds to the last row operation performed, Et^(-1) I places the negative of the last multiplier in the nth row and (n - 1)st column of this product, precisely where it should appear in L. Similarly,

    E_{t-1}^(-1)(Et^(-1) I)

has the appropriate entry desired in L in the nth row and (n - 2)nd column;

    E_{t-2}^(-1)(E_{t-1}^(-1) Et^(-1) I)

has the appropriate entry desired in L in the (n - 1)st row and (n - 2)nd column; and so on. That is, the record entries of L may be created from the expression

    E1^(-1) E2^(-1) ··· E_{t-1}^(-1) Et^(-1) I,

starting at the right with I and working to the left. This is the order shown below the main diagonal in the matrix.

[Display: the identity matrix I, with the positions below its main diagonal numbered to show the order in which the record entries are created—beginning with position (n, n - 1) and working, column by column, toward the left.]

As we perform elementary row operations that correspond to adding scalar multiples of the shaded row to lower rows, none of the entries already created below the main diagonal will be changed, because the zeros to the right of the main diagonal lie over those entries. Thus,

    E1^(-1) E2^(-1) ··· E_{t-1}^(-1) Et^(-1) I = L,    (5)

and A = LU follows at once from Eq. (4).

We have shown how to find a solution of Ax = b from a factorization A = LU by using the record in L to modify b to a vector c and then solving Ux = c by back substitution. Approximately n^2 flops are required. An alternative method of determining x from LUx = b is to view the equation as L(Ux) = b, letting c = Ux. We first solve Lc = b for c by forward substitution, and then we solve Ux = c by back substitution. Because each of the forward and back substitutions takes approximately n^2/2 flops, the total number required is again approximately n^2 flops. We illustrate this alternative method.

EXAMPLE 6  In Example 1, we found the factorization

    [ 1  3  -1]   [ 1  0  0][1  3   -1]
    [ 2  8   4] = [ 2  1  0][0  2    6].
    [-1  3   4]   [-1  3  1][0  0  -15]
         A             L         U

Use the method of forward substitution and back substitution to solve the linear system Ax = (-4, 2, 4), which we solved in Example 2.

SOLUTION  First, we solve Lc = b by forward substitution:

    [ 1  0  0 | -4]
    [ 2  1  0 |  2]
    [-1  3  1 |  4]

    c1 = -4,
    2c1 + c2 = 2,           c2 = 2 + 8 = 10,
    -c1 + 3c2 + c3 = 4,     c3 = c1 - 3c2 + 4 = -4 - 30 + 4 = -30.

Notice that this is the same c as was obtained in Example 2. The back substitution with Ux = c of Example 2 then yields the same solution x.
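Written as loops, the two substitutions of Example 6 look as follows. The MATLAB sketch below (ours) solves Lc = b by forward substitution and Ux = c by back substitution; the forward pass needs no divisions because the diagonal of L consists of 1s.

    L = [1 0 0; 2 1 0; -1 3 1];  U = [1 3 -1; 0 2 6; 0 0 -15];  b = [-4; 2; 4];
    n = length(b);
    c = zeros(n,1);
    for i = 1:n                               % forward substitution: Lc = b
        c(i) = b(i) - L(i,1:i-1)*c(1:i-1);
    end
    x = zeros(n,1);
    for i = n:-1:1                            % back substitution: Ux = c
        x(i) = (c(i) - U(i,i+1:n)*x(i+1:n)) / U(i,i);
    end
    c, x                                      % c = (-4, 10, -30) and x = (1, -1, 2)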

Factorization of an invertible square matrix A into LU, with L lower triangular and U upper triangular, is not unique. For example, if r is a nonzero scalar, then rL is lower triangular and (1/r)U is upper triangular, and if A = LU, then we also have A = (rL)((1/r)U). But let D be the n x n diagonal matrix having the same main diagonal as U; that is,

    D = [u11               ]
        [     u22          ]
        [          ...     ]
        [               unn].

Let U* be the upper-triangular matrix obtained from U by multiplying the ith row by 1/u_ii for i = 1, 2, . . . , n. Then U = DU*, and we have

    A = LDU*.

Now both L and U* have all entries 1 on their main diagonals. This type of factorization is unique, as we now show.
THEOREM 10.1  Unique Factorization

Let A be an invertible n x n matrix. A factorization A = LDU, where

    L is lower triangular with all main diagonal entries 1,
    U is upper triangular with all main diagonal entries 1,
    D is diagonal with all main diagonal entries nonzero,

is unique.

PROOF  Suppose that A = L1 D1 U1 and A = L2 D2 U2 are two such factorizations. Observe that both L1^(-1) and L2^(-1) are also lower triangular, D1^(-1) and D2^(-1) are both diagonal, and U1^(-1) and U2^(-1) are both still upper triangular. Just think how the matrix reductions of [L1 | I] or [D1 | I] or [U1 | I] to find the inverses look. Furthermore, L1^(-1), L2^(-1), U1^(-1), and U2^(-1) have all their main diagonal entries equal to 1.
Now from L1 D1 U1 = L2 D2 U2, we obtain

    L2^(-1) L1 = D2 U2 U1^(-1) D1^(-1).    (6)

We see that L2^(-1) L1 is again lower triangular with entries 1 on its main diagonal, whereas D2 U2 U1^(-1) D1^(-1) is upper triangular. Equation (6) then shows that both matrices must be I, so L2^(-1) L1 = I and L1 = L2. A similar argument starting over with L1 D1 U1 = L2 D2 U2 rewritten as

    U1 U2^(-1) = D1^(-1) L1^(-1) L2 D2    (7)

shows that U1 = U2. We then have L1 D1 U1 = L1 D2 U1, and multiplication on the left by L1^(-1) and on the right by U1^(-1) yields D1 = D2.
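Given any L and U from the reduction, the factors of Theorem 10.1 are obtained by pulling the diagonal out of U. The MATLAB sketch below (ours) does this for the L and U of Example 3.

    L = [1 0 0 0; -2 1 0 0; -1 -2 1 0; 5 -2 -3 1];
    U = [1 -2 0 3; 0 -1 1 0; 0 0 -2 6; 0 0 0 3];
    D     = diag(diag(U));       % diagonal matrix with the same diagonal as U
    Ustar = D \ U;               % divide row i of U by u_ii, so that U = D*Ustar
    A     = L*D*Ustar            % recovers the matrix A of Example 3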

Systems Requiring Row Interchanges

Let A be an invertible square matrix whose row reduction to an upper-triangular matrix U requires at least one row interchange. Then not all elementary matrices corresponding to the necessary row operations add multiples of row vectors to other row vectors. We can still write

    Et E_{t-1} ··· E2 E1 A = U,

so

    A = E1^(-1) E2^(-1) ··· E_{t-1}^(-1) Et^(-1) U,

but now that some of these E_i^(-1) interchange rows, their product may not be lower triangular. However, after we discover which row interchanges are necessary, we could start over, and make these necessary row interchanges in the matrix A before we start creating zeros below the diagonal. We can see that the upper-triangular matrix U obtained would still be the same. Suppose, for example, that to obtain a nonzero element in pivot position in the ith row we interchange this ith row with a kth row farther down in the matrix. The new ith row will be the same as though it had been put in the ith row position before the start of the reduction; in either case, it has been modified during the reduction only by the addition of multiples of rows above the ith row position. As multiples of it are now added to rows below the ith row position, the same rows (except possibly for order) below the ith row position are created whether row interchange is performed during reduction or is completed before reduction starts.
Interchanging some rows before the start of the reduction amounts to multiplying A on the left by a sequence of elementary row-interchange matrices. Any product of elementary row-interchange matrices is called a permutation matrix. Thus we can form PA for a permutation matrix P, and PA will then admit a factorization PA = LU. We state this as a theorem.

THEOREM 10.2 LU-Factorization

Let A be an invertible square matrix. Then there exists a permutation


matrix P, a lower-triangular matrix L, and an upper-triangular matrix
U such that PA = LU.

EXAMPLE 7 Illustrate Theorem 10.2 for the matrix

A = [  1   3   2
      -2  -6   1
       2   5   7 ].

SOLUTION Starting to reduce A to upper-triangular form, we have

[  1   3   2        [  1   3   2
  -2  -6   1    ~      0   0   5  ,
   2   5   7 ]         0  -1   3 ]

and we now find it necessary to interchange rows 2 and 3, which will then
produce the desired U. Thus we take

P = [ 1  0  0
      0  0  1
      0  1  0 ],
and we have

PA = [  1   3   2        [  1   3   2
        2   5   7    ~      0  -1   3   = U.
       -2  -6   1 ]         0   0   5 ]

The record matrix L for the reduction of PA becomes

L = [  1   0   0
       2   1   0  ,
      -2   0   1 ]

and we confirm that

PA = [  1   3   2      [  1   0   0 ] [  1   3   2
        2   5   7   =     2   1   0      0  -1   3   = LU.
       -2  -6   1 ]      -2   0   1 ]    0   0   5 ]
Suppose that, when solving a large square linear system Ax = b with a
computer, keeping the record L, we find that row interchanges are advisable.
(See the discussion of partial pivoting in Section 10.3.) We could keep going
and make the row interchanges to find an upper-triangular matrix; we could
then start over, make those row interchanges first, just as in Example 7, to
obtain a matrix PA, and then obtain U and L for the matrix PA instead. Of
course, we would first have to compute Pb when solving Ax = b, because the
system would then become PAx = Pb, before we could proceed with the record
L and back substitution. This is an undesirable procedure, because it requires
reduction of both A and PA, taking a total of about 2n^3/3 flops, assuming that A
is n × n and n is large. Surely it would be better to devise a method of
record-keeping that would also keep track of any row interchanges as they
occurred, and would then apply this improved record to b and use back
substitution to find the solution. We toss this suggestion out for enthusiastic
programmers to consider on their own.
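
For enthusiastic programmers who do take up the suggestion, here is one
minimal sketch in MATLAB of how such record-keeping might look. The
function name lupp and its details are our own (it is neither a LINTEK routine
nor MATLAB's built-in lu); it interchanges rows only when a zero pivot is met,
records the multipliers in L and the interchanges in P, and returns matrices
with PA = LU. Save it in a file named lupp.m to run it.

    function [P, L, U] = lupp(A)
    % Reduce A to upper-triangular U, interchanging rows only when a zero
    % pivot is met.  The multipliers are recorded in L and the interchanges
    % in P, so that P*A = L*U.  (Sketch only; assumes A is invertible.)
    n = size(A, 1);
    U = A;  L = eye(n);  P = eye(n);
    for j = 1:n-1
        if U(j,j) == 0
            % swap in a lower row having a nonzero entry in column j
            k = j + find(U(j+1:n, j) ~= 0, 1);
            U([j k], :) = U([k j], :);
            P([j k], :) = P([k j], :);
            L([j k], 1:j-1) = L([k j], 1:j-1);   % carry the recorded multipliers along
        end
        for i = j+1:n
            r = U(i,j) / U(j,j);            % multiplier
            U(i,:) = U(i,:) - r * U(j,:);   % (-r) times row j added to row i
            L(i,j) = r;                     % record of that operation
        end
    end
    end

Applied to the matrix A of Example 7, this sketch reproduces the P, L, and U
found above.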

SUMMARY
1. If A is an n × n invertible matrix that can be row-reduced to an
   upper-triangular matrix U without row interchanges, there exists a lower-
   triangular n × n matrix L such that A = LU.
2. The matrix L in summary item 1 can be found as follows. Start with the
   n × n identity matrix I. If, during the reduction of A to U, r times row i is
   added to row k, replace the zero in row k and column i of the identity
   matrix by -r. The final result obtained from the identity matrix is the
   matrix L.
3. Once A has been reduced to U and L has been found, a computer can find
   the solution of Ax = b for a new column vector b, using about n^2 flops for
   large n (see the sketch following this summary).
4. If A is as described in summary item 1, then A has a unique factorization of
   the form A = LDU, where
   L is lower triangular with all diagonal entries 1,
   U is upper triangular with all diagonal entries 1, and
   D is a diagonal matrix with all diagonal entries nonzero.
5. For any invertible matrix A, there exists a permutation matrix P such that
   PA can be row-reduced to an upper-triangular matrix U and has the
   properties described for A in summary items 1, 2, 3, and 4.
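
To illustrate summary item 3: once L (with 1s on its main diagonal) and U are
available, a new system Ax = b is solved by a forward substitution Ly = b
followed by a back substitution Ux = y, each taking roughly n^2/2 flops. The
sketch below is our own illustration in MATLAB (the name solve_lu is not a
LINTEK routine); b is assumed to be a column vector.

    function x = solve_lu(L, U, b)
    % Solve LUx = b: forward substitution for Ly = b, then back
    % substitution for Ux = y.  L is assumed to have 1s on its diagonal.
    n = length(b);
    y = zeros(n, 1);
    for i = 1:n
        y(i) = b(i) - L(i, 1:i-1) * y(1:i-1);                % forward substitution
    end
    x = zeros(n, 1);
    for i = n:-1:1
        x(i) = (y(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);    % back substitution
    end
    end

Because only about n^2 arithmetic operations are involved, each new
right-hand side b is handled very cheaply once the factorization is known.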

EXERCISES

1. Discuss briefly ihe need to worry about tue answer, using :natrix multiplication. Then solve
time required to create the record matnx L the system Ax = b, using P, L, and U.
when solving
Ax = by, D,...,d,
6 -5 [32
for a large n X n matrix A. [2 41 -3 6|
2. Is there any practical value in creating the 9. 4=|16 3 -8|,b=|17
record matnx L when one needs to solve 2-15 0
only a single square linear system Ax = b, as
rt | 3 -13
we did in Example 3?
10.4=|0 1 I/,b=|] 6
3-1 1 -7
In Exercises 3-7, find the solution oj Ax = b from
the given combined L\U display of the matrix A | -3
and the given vector b W.A=/3 7 2/,b=] 1
14-2 1 -2
3. L\U = 5 s}
-2
= 2]4 ‘1 -4 | -2 -9
10 2-1 I}, _] 6
12 A=\, 7 -2 1] = 10
4. L\U= | 72 5. =| 1 0 3 0-4 —16
~21 -7
fT 1 2-3 0 8
1-3 4 , 2|
5. 1\U=|-2 -1 9|,b= 3 _| 2 5-6 Of] ,_]17
0 1 -6 8 3. A=) yy yf =]W8
410-9 1 33
1-4 2 3]
6. L\U=|0 2 -1|,b=] 2 In Exercises 14-17, proceed as in Example 4 to
3 2-2 | 3{ solve the given system.
1 0 0 1 4
_}oO-14
7. INU=]_) 2J tt.>=|_g
3) } 7 14. Solve 4’x = bifA = 3 of b= i.
2 1-1 3 14
15. Solve A*x = bifA =|7! 3}
In Exercises 8-13, let A be the given matrix. Find 2 -3
a permutation matrix P, if necessary, and bal 144]
matrices L and U such that PA = LU. Check the -233

2 - °| a In Exercises 22-24, use the routine LUFACTOR


Solve
Ax = bifA=| 4 -1 2l, in LINTEK to find the combined L\U display (2)
-6 2 O| for the matrix A. Then solve Ax = b, for each of
-2} the column vectors b,.
b= | ~

12 -| 2 1-3 4 =O
1 4 6-1 §
-! 0O 1
22. A=!|-2 0 3 321 = 4);
Solve
x =bifA=| 3 1 Ol,
6 1 2 8 -3
-2 0 4 4 1 3 2 1
27
=) 29). —20
122 96| fori= 1,
=), , 9} and] 53| 2, and 3,
5| respectively.
Exercises 18-20, find the unique Jactorization M1 L 26
1U for the given matrix A.
1 3-5 2 {ft
The matnx in Exercise 8 4-6 10 8 3
23. A=| 3 6-1 4 =7);
The matrix in Exercise 10
2 1 11-3 =#13
The matnx in Exercise | 1 -§ 3 1 4 2)3
Mark each of the following True or False.
0} |-48 87
_a. Every matrix 4 has an LU-factorization.
20| | 218 -151
_b. Every square matnx A has an
b, =| —9], Q|, and} 102
LU-factonzation.
—31 100 — 46
. ¢. Every square matrix A has a factonzation
P~'LU for some permutation matrix P. 43] | 143 223]
_d. If an LU-factonization of ann x « aatnx for i = i, 2, and 3, respectively.
A 1s known, using this factorization to -_

solve a large linear system with coefficient -1 323 4 6


matrix A requires about n’/2 flops. 3 1 0-2 1 °7
_e. If an LU-factorization of an n < n matrix 16 6-2 1 4 6
A is known, using this factorization to A=) id 3501-7 «6
solve a large linear system with coefficient 3-2 110 8 0
matrix A requires about n’ flops. | 4-5 8-3 2 10
_ f. IfA = LU, then this is the only
factorization of this type for A. 19] | 82 25
_g. IfA = LU, then the matrix L can be 2 3
regarded as a record of the steps in the _| -8} | 82 24
row reduction of A to U. b=] gol} 31/494] 73
-h. All three types of elementary row 76| | 92 ~115
Operations used to reduce A to U can be 103} |-61 20
recorded in the matrix L in an
LU-factorization of A. for i = 1, 2, and 3, respectively.
. i. Two of the three types of elementary row In Exercises 25-29, use the routine LUFACTOR
Operations used to reduce A to U can be in LINTEK to solve the indicated system.
recorded in the matrix L in an
£LU-factorization of A. 25. A°x = b forA and b in Exercise 14
— j. One can solve a linear system LUx = b
by means of a forward substitution 26. A’x = b forA and b in Exercise 15
followed by a back substitution. 27. A’x = b for A and b in Exercise 16

28. A*x = b for A and b in Exercise 17
29. A*x = b_i for A and b_i in Exercise 24

The routine TIMING in LINTEK has an option to time the formation of the
combined L\U display from a matrix A. The user specifies the size n for an
n × n linear system Ax = b. The program then generates an n × n matrix A with
entries in the interval [-20, 20] and creates the L\U display shown in matrix (2)
in the text. The time required for the reduction is indicated. The user may then
specify a number s for solution of Ax = b_1, b_2, . . . , b_s. The computer
generates a column vector and solves s systems, using the record in L and back
substitution; the time to find these s solutions is also indicated. Recall that the
reduction of A should require about n^3/3 flops for large n, and the solution for
each column vector should require about n^2 flops. In Exercises 30-34, use the
routine TIMING and see if the ratios of the times obtained seem to conform to
the ratios of the numbers of flops required, at least as n increases. Compare the
time required to solve one system, including the computation of L\U, with the
time required to solve a system of the same size using the Gauss method with
back substitution. The times should be about the same.

30. n = 4; s = 1, 3, 12
31. n = 10; s = 1, 5, 10
32. n = 16; s = 1, 8, 16
33. n = 24; s = 1, 12, 24
34. n = 30; s = 1, 15, 30

MATLAB
MATLAB uses LU-factorization, for example, in its computation of A\B. (Recall
that A\B = A^{-1}B if A is a square matrix.) In finding an LU-factorization of A,
MATLAB uses partial pivoting (row swapping to make pivots as large as possible)
for increased accuracy (see Section 10.3). It obtains a lower-triangular matrix L and
an upper-triangular matrix U such that LU = PA, where P is a permutation matrix,
as illustrated in Example 7. The MATLAB command

    [L,U,P] = lu(A)

produces these matrices L, U, and P. In pencil-and-paper work we like small pivots
rather than large ones, and we swapped rows in Exercises 8-13 only if we
encountered a zero pivot. Thus this MATLAB command cannot be used to check
our answers to Exercises 8-13.
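
As a quick check of the relation LU = PA, one can compare P*A with L*U
directly. Here is a minimal illustration of our own using the matrix of Example 7:

    A = [1 3 2; -2 -6 1; 2 5 7];   % the matrix of Example 7
    [L, U, P] = lu(A);
    norm(P*A - L*U)                % essentially zero, confirming LU = PA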

10.3 PIVOTING, SCALING, AND ILL-CONDITIONED MATRICES

Some Problems Encountered with Computers


A computer can't do absolutely precise arithmetic with real numbers. For
example, any computer can only compute using an approximation of the
number √2, never with the number √2 itself. Computers work using base-2
arithmetic, representing numbers as strings of zeros and ones. But they
encounter the same problems that we encounter if we pretend that we are a
computer using base-10 arithmetic and are capable of handling only some fixed
finite number of significant digits. We will illustrate computer problems in
base-10 notation, because it is more familiar to all of us.
Suppose that our base-10 computer is asked to compute the quotient 2/3,
and suppose that it can represent a number in floating-point arithmetic using
eight significant figures. It represents 1/3 in truncated form as 0.33333333. It may
represent 2/3 as 0.66666667, rounding off to eight significant figures. For
convenience, we will refer to all errors generated by either truncation or
roundoff as roundoff errors.
Most people realize that in an extended arithmetic computation by a
computer, the roundoff error can accumulate to such an extent that the final
result of the computation becomes meaningless. We will see that there are at
least two situations in which a computer cannot meaningfully execute even a
single arithmetic operation. In order to avoid working with long strings of
digits, we assume that our computer can compute only to three significant
figures in its work. Thus, our three-figure, base-10 computer computes the
quotient 2/3 as 0.666. We box the first computer problem that concerns us, and
give an example.

Addition of two numbers of very different magnitude may result in


the loss of some or even all of the significant figures of the smaller
number.

EXAMPLE 1 Evaluate our three-figure computer’s computation of

45.1 + .0725.
SOLUTION Because it can handle only three significant figures, our computer represents
the actual sum 45.1725 as 45.1. In other words, the second summand .0725
might as well be zero, so far as our computer is concerned. The datum .0725 is
completely lost.

The difficulty illustrated in Example 1 can cause serious problems in
attempting to solve a linear system Ax = b with a computer, as we will
illustrate in a moment. First we box another problem a computer may have.

Subtraction of nearly equal numbers can result in a loss of


significant figures.

EXAMPLE 2 Evaluate our three-figure computer's computation of

2/3 - 665/1000.

SOLUTION The actual difference is

.666666 · · · - .665 = .00166666 · · · .

However, our three-figure computer obtains

.666 - .665 = .001.

Two of the three significant figures with which the computer can work have
been lost.

The difficulty illustrated in Example 2 is encountered if one tries to use a
computer to do differential calculus.
The cure for both difficulties is to have the computer work with more
significant figures. But using more significant figures requires the computer to
take more time to execute a program, and of course the same errors can occur
"farther out." For example, the typical microcomputer of this decade, with
software designed for routine computations, will compute 10^10 + 10^{-10} as 10^10,
losing the second summand 10^{-10} entirely.
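
Both boxed difficulties are easy to reproduce on an actual machine working in
double precision, just "farther out." The lines below are an illustrative
MATLAB sketch of our own:

    % Loss of a small summand: 1e-10 lies far below the precision available
    % at the size of 1e10, so it disappears entirely from the sum.
    (1e10 + 1e-10) - 1e10          % prints 0, not 1e-10

    % Loss of significant figures by subtracting nearly equal numbers:
    % 1 - cos(x) is about x^2/2, but for tiny x the subtraction cancels
    % every significant figure that the machine carries.
    x = 1e-8;
    (1 - cos(x)) / x^2             % prints 0 instead of roughly 0.5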

Partial Pivoting
In row reduction of a matrix to echelon form, a technique called partial
pivoting is often used. In partial pivoting, one interchanges the row in which
the pivot is to occur with a row farther down, if necessary, so that the pivot
becomes as large in absolute value as possible. To illustrate, suppose that row
reduction of a matrix to echelon form leads to an intermediate matrix

[ 2   8  -1   3   4
  0  -2   3  -5   6
  0   4   1   2   0
  0  -7   3   1   4 ].

Using partial pivoting, we would then interchange rows 2 and 4 to use the
entry -7 of maximum magnitude among the possibilities -2, 4, and -7 as
pivot in the second column. That is, we would form the matrix

[ 2   8  -1   3   4
  0  -7   3   1   4
  0   4   1   2   0
  0  -2   3  -5   6 ]

and continue with the reduction of this matrix.
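
In a program, this choice of pivot row is simply a search for the entry of
maximum magnitude at or below the pivot position in that column. A minimal
MATLAB sketch of our own, using the intermediate matrix just displayed:

    M = [ 2  8 -1  3  4
          0 -2  3 -5  6
          0  4  1  2  0
          0 -7  3  1  4 ];
    j = 2;  n = size(M, 1);
    [~, idx] = max(abs(M(j:n, j)));   % largest magnitude in column j, rows j through n
    k = idx + j - 1;                  % its row number in M (here k = 4)
    M([j k], :) = M([k j], :)         % interchange rows j and k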


We show by example the advantage of partial pivoting. Let us consider the
linear system

  .01x_1 + 100x_2 = 100
 -100x_1 + 200x_2 = 100.        (1)

EXAMPLE 3 Find the actual solution of linear system (1). Then compare the result with that
obtained by a three-figure computer using the Gauss method with back
substitution, but without partial pivoting.
SOLUTION First we find the actual solution:

[  .01   100 | 100 ]      [ .01     100      |    100     ]
[ -100   200 | 100 ]  ~   [  0   1,000,200   | 1,000,100  ],

and back substitution yields

x_2 = 1,000,100 / 1,000,200,
x_1 = (100 - 100x_2)(100) = (100 - 100,010,000/1,000,200)(100) = 1,000,000/1,000,200.

Thus, x_1 ≈ .9998 and x_2 ≈ .9999. On the other hand, our three-figure computer
obtains

[  .01   100 | 100 ]      [ .01     100      |    100     ]
[ -100   200 | 100 ]  ~   [  0   1,000,000   | 1,000,000  ],

which leads to

x_2 = 1,
x_1 = (100 - 100)100 = 0.

The x_1-parts of the solution are very different. Our three-figure computer
completely lost the second-row data entries 200 and 100 in the matrix when it
added 10,000 times the first row to the second row.

EXAMPLE 4 Find the solution to system (1) that our three-figure computer would obtain
using partial pivoting.
SOLUTION Using partial pivoting, the three-figure computer obtains

[  .01   100 | 100 ]      [ -100   200 | 100 ]      [ -100   200 | 100 ]
[ -100   200 | 100 ]  ~   [  .01   100 | 100 ]  ~   [   0    100 | 100 ],

which yields

x_2 = 1,
x_1 = (100 - 200)/(-100) = 1.

This is close to the solution x_1 ≈ .9998 and x_2 ≈ .9999 obtained in Example 3,
and it is much better than the erroneous x_1 = 0, x_2 = 1 that the three-figure
computer obtained in Example 3 without pivoting. The data entries 200 and
100 in the second row of the initial matrix were not lost in this computa-
tion.
To understand completely the reason for the difference between the
solutions in Example 3 and in Example 4, consider again the linear system

  .01x_1 + 100x_2 = 100
 -100x_1 + 200x_2 = 100,

which was solved in Example 3. Multiplication of the first row by 10,000
produced a coefficient of x_2 of such magnitude compared with the second-
equation coefficient 200 that the coefficient 200 was totally destroyed in the
ensuing addition. (Remember that we are using our three-figure computer. In a
less dramatic example, some significant figures of the smaller coefficient might
still contribute to the sum.) If we use partial pivoting, the linear system
becomes

 -100x_1 + 200x_2 = 100
  .01x_1 + 100x_2 = 100,

and multiplication of the first equation by a number less than 1 does not
threaten the significant digits of the numbers in the second equation. This
explains the success of partial pivoting in this situation.
Suppose now that we multiply the first equation of system (1) by 10,000,
which of course does not alter the solution of the system. We then obtain the
linear system

  100x_1 + 1,000,000x_2 = 1,000,000
 -100x_1 +       200x_2 = 100.        (2)

If we solved system (2) using partial pivoting, we would not interchange the
rows, because -100 is of no greater magnitude than 100. Addition of the first
row to the second by our three-figure computer again totally destroys the
coefficient 200 in the second row. Exercise 1 shows that the erroneous solution
x_1 = 0, x_2 = 1 is again obtained. We could avoid this problem by full pivoting.
In full pivoting, columns are also interchanged, if necessary, to make pivots as
large as possible. That is, if a pivot is to be found in the row i and column j
position of an intermediate matrix G, then not only rows but also columns are
interchanged as needed to put the entry g_{rs} of greatest magnitude, where r ≥ i
and s ≥ j, in the pivot position. Thus, full pivoting for system (2) will lead to
the matrix

       x_2        x_1
[ 1,000,000     100  | 1,000,000 ]
[      200     -100  |       100 ].        (3)

Exercise 2 illustrates that row reduction of matrix (3) by our three-figure
computer gives a reasonable solution of system (2). Notice, however, that we
now have to do some bookkeeping and must remember that the entries in
column 1 of matrix (3) are really the coefficients of x_2, not of x_1. For a matrix of
any size, the search for elements of maximum magnitude and the bookkeeping
required in full pivoting take a lot of computer time. Partial pivoting is
frequently used, representing a compromise between time and accuracy. The
routine MATCOMP in LINTEK uses partial pivoting in its Gauss-Jordan
reduction to reduced echelon form. Thus, if system (1) is modified so that the
Gauss—Jordan method without pivoting fails to give a reasonable solution for
a 20-figure computer, MATCOMP could probably handle it. However, one can
no doubt create a similar modification of the 2 x 2 system (2) for which
MATCOMP would give an erroneous solution.

Scaling
We display again system (2):

  100x_1 + 1,000,000x_2 = 1,000,000
 -100x_1 +       200x_2 = 100.

We might recognize that the number 1,000,000 dangerously dominates the
data entries in the second row, at least as far as our three-figure computer is
concerned. We might multiply the first equation in system (2) by .0001 to cut
those two numbers down to size, essentially coming back to system (1). Partial
pivoting handles system (1) well. Multiplication of an equation by a nonzero
constant for such a purpose is known as scaling. Of course, one could
equivalently scale by multiplying the second equation in system (2) by 10,000,
to bring its coefficients into line with the large numbers in the first equation.
We box one other computer problem that can sometimes be addressed by
scaling. In reducing a matrix to echelon form, we need to know whether the
entry that appears in a pivot position as we start work on a column is truly
nonzero, and indeed, whether there is any truly nonzero entry from that
column to serve as pivot.

Due to roundoff error, a computed number that should be zero is


quite likely to be of small, nonzero magnitude.
Taking this into account, one usually programs row-echelon reduction so that
entries of unexpectedly small size are changed to zero. MATCOMP finds the
smallest nonzero magnitude m among 1 and all the coefficient data supplied
for the linear system, and sets E = rm, where r is specified by the user. (Default
r is .0001.) In reduction of the coefficient matrix, a computed entry of
magnitude less than E is replaced by zero. The same procedure is followed in
YUREDUCE. Whatever computed number we program a computer to choose
for E in a program such as MATCOMP, we will be able to devise some linear
system for which E is either too large or too small to give the correct result.
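
In MATLAB-style notation the procedure just described takes only a few lines.
The following is a sketch of the idea only, not the actual MATCOMP code; the
data are made up, and the tiny "roundoff residue" is injected by hand for the
demonstration:

    A = [2 8 -1; 1 4 3; 0 5 -2];         % coefficient data supplied for a system (made up)
    r = 1e-4;                            % the default value of r mentioned above
    m = min([1; abs(A(A ~= 0))]);        % smallest nonzero magnitude among 1 and the data
    E = r * m;                           % computed entries below E in magnitude are zeroed
    U = A;  U(2,:) = U(2,:) - 0.5*U(1,:);    % one elimination step
    U(2,2) = 2e-16;                      % pretend roundoff left this tiny residue instead of 0
    U(abs(U) < E) = 0                    % the residue is replaced by an exact zero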
A procedure equivalent to the one in MATCOMP that we just outlined is
to scale the original data for the linear system in such a way that the smallest
nonzero entry is of magnitude roughly 1, and then always use the same value,
perhaps 10^{-4}, for E.
When one of the authors first started experimenting with a computer, he
was horrified to discover that a matrix inversion routine built into the
mainframe BASIC program of a major computer company would refuse to
invert a matrix if all the entries were small enough. An error message such as

"Nearly singular matrix. Inversion impossible."

would appear. He considers a 2 × 2 matrix

[ a  b
  c  d ]

to be nearly singular if and only if lines in the plane having equations of the
form

ax + by = r
cx + dy = s

are nearly parallel. Now the lines

10^{-8} x + 0y = 1
0x + 10^{-8} y = 1

are actually perpendicular, the first one being vertical and the second
horizontal. This is as far away from parallel as one can get! It annoyed this
author greatly to have the matrix

[ 10^{-8}      0
      0    10^{-8} ]

called "nearly singular." The inverse is obviously

[ 10^8     0
    0   10^8 ].

A scaling routine was promptly written to be executed before calling the
inversion routine. The matrix was multiplied by a constant that would bring
the smallest nonzero magnitude to at least 1, and then the inversion subrou-
tine was used, and the result rescaled to provide the inverse of the original
matrix. For example, applied to the matrix just discussed, this procedure
multiplies the matrix by 10^8 to obtain the identity matrix, inverts that
(obtaining the identity matrix again), and finally multiplies by 10^8 to produce
the desired inverse above.


Having more programming experience now, this author is much more


charitable and understanding. The user may also find unsatisfactory things
about MATCOMP. We hope that this little anecdote has helped explain the
notion of scaling.

Ill-Conditioned Matrices

The line x + y = 100 in the plane has x-intercept 100 and y-intercept 100, as
shown in Figure 10.1. The line x + .9y = 100 also has x-intercept 100, but it
has y-intercept larger than 100. The two lines are almost parallel. The common
x-intercept shows that the solution of the linear system

 x +   y = 100
 x + .9y = 100        (4)

is x = 100, y = 0, as illustrated in Figure 10.2. Now the line .9x + y = 100 has
y-intercept 100 but x-intercept larger than 100, so the linear system

  x + y = 100
.9x + y = 100        (5)

FIGURE 10.1  The line x + y = 100.
FIGURE 10.2  The lines are almost parallel.
has the very different solution x = 0, y = 100, as shown in Figure 10.3. Systems
(4) and (5) are examples of ill-conditioned or unstable systems: small changes
in the coefficients or in the constants on the right-hand sides can produce very
great changes in the solutions. We say that a matrix A is ill-conditioned if a
linear system Ax = b having A as coefficient matrix is ill-conditioned. For two
equations in two unknowns, solving an ill-conditioned system corresponds to
finding the intersection of two nearly parallel lines, as shown in Figure 10.4.
Changing a coefficient of x or y slightly in one equation changes the slope of
that line only slightly, but it may generate a big change in the location of the
point of intersection of the two lines.
Computers have a lot of trouble finding accurate solutions of ill-
conditioned systems such as systems (4) and (5), because the small roundoff
errors created by the computer can produce large changes in the solutions.
Pivoting and scaling usually don't help the situation; the systems are basically
unstable. Notice that the coefficients of x and y in systems (4) and (5) are of
comparable magnitude.
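
Readers with MATLAB at hand can confirm how far apart the solutions of
systems (4) and (5) are; this quick check is our own and is not part of LINTEK:

    A4 = [1 1; 1 0.9];    % coefficients of system (4)
    A5 = [1 1; 0.9 1];    % coefficients of system (5)
    b  = [100; 100];
    A4 \ b                % essentially [100; 0]
    A5 \ b                % essentially [0; 100]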
Among the most famous ill-conditioned matrices are the Hilbert matrices.
These are very bad matrices named after a very good mathematician, David
Hilbert! (See the historical note on page 444.) The entry in the ith row and jth
column of a Hilbert matrix is 1/(i + j - 1). Thus, if we let H_n be the n × n
Hilbert matrix, we have

H_2 = [  1   1/2          H_3 = [  1   1/2  1/3
        1/2  1/3 ],               1/2  1/3  1/4
                                  1/3  1/4  1/5 ],     and so on.

FIGURE 10.3  A very different solution from the one in Figure 10.2.
FIGURE 10.4  The intersection of two nearly parallel lines.
It can be shown that H_n is invertible for all n, so a square linear system H_n x = b
has a unique solution, but the solution may be very hard to find. When the
matrix is reduced to echelon form, entries of surprisingly small magnitude
appear. Scaling of a row in which all entries are close to zero may help a bit.
Bad as the Hilbert matrices are, powers of them are even worse. The
software we make available includes a routine called HILBERT, which is
modeled on YUREDUCE. The computer generates a Hilbert matrix of the
size we specify, up to 10 × 10. It will then raise the matrix to the power 2, 4, 8,
or 16 if we so request. We may then proceed roughly as in YUREDUCE.
Routines such as YUREDUCE and HILBERT should help us understand this
section, because we can watch and see just what is happening as we reduce a
matrix. MATCOMP, which simply spits out answers, may produce an absurd
result, but we have no way of knowing exactly where things went wrong.
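
Readers who want a quick feel for just how badly behaved the Hilbert matrices
are can run a few MATLAB lines such as the sketch below (hilb and cond are
built-in commands; the size n = 8 and the perturbation are our own choices):

    n = 8;
    H = hilb(n);
    cond(H)                            % condition number, on the order of 10^10
    b = H * ones(n, 1);                % right-hand side whose exact solution is all 1s
    x  = H \ b;
    xp = H \ (b + 1e-10*randn(n, 1));  % perturb b in roughly the tenth decimal place
    norm(xp - x) / norm(x)             % relative change in x: many orders of magnitude
                                       % larger than the relative change in b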

SUMMARY
1. Addition of numbers of very different magnitudes by a computer can cause
loss of some or all of the significant figures in the number of smaller
magnitude.
2. Subtraction of nearly equal numbers by a computer can cause loss of
significant figures.
3. Due to roundoff error, a computer may obtain a nonzero value for a
number that should be zero. To attempt to handle this problem, a
computer program might assign the value zero to certain computed
numbers whenever the numbers have a magnitude less than some predeter-
mined small positive number.
4. In partial pivoting, the pivot in each column is created by swapping rows,
if necessary, so that the pivot has at least the maximum magnitude of any
entry below it in that column. Partial pivoting may be helpful in avoiding
the problem stated in summary item 1.
5. In full pivoting, columns are interchanged as well as rows, if necessary, to
   create pivots of maximum possible magnitude. Full pivoting requires
   much more computer time than does partial pivoting, and bookkeeping is
   necessary to keep track of the relationship between the columns and the
   variables. Partial pivoting is more commonly used.
6. Scaling, which is multiplication of a row by a nonzero constant, can be
   used to reduce the sizes of entries that threaten to dominate entries in
   lower rows, or to increase the sizes of entries in a row where some entries
   are very small.
7. A linear system Ax = b is ill-conditioned or unstable if small changes in the
   numbers can produce large changes in the solution. The matrix A is then
   also called ill-conditioned.
8. Hilbert matrices, which are square matrices with entry 1/(i + j - 1) in the
   ith row and jth column, are examples of ill-conditioned matrices.
9. With present technology, it appears hopeless to write a computer program
   that will successfully handle every linear system involving even very small
   coefficient matrices, say, of size at most 10 × 10.

EXERCISES

. Find the solution by a three-figure computer d. Multiply the matnx obtained in part c b
ant,

of the system n to get Av!.


= Use the routines MATCOMP, YUREDUCE, and
100x, + 1,000,000x, = 1,000,000 HILBERT in LINTEK for Exercises 8-13.
—100x, + 200x, = 100
using just partial pivoting. 8. Use YUREDUCE to solve the system
. Repeat Exercise |, but use full pivoting.
IE-9x, + 1E9 x, = 1
we

. Find the solution, without pivoting, by a


five-figure computer of the linear system in
1E9 x, + 2E9 x, = 1,
Exercise |. Is the solution reasonably without using any pivoting. Check the answe
accurate? If so, modify the system a bit to mentally to see !f it is approximately correc!
obtain one for whicn a five-figure computer If it is, increase the exponent from 9 until a
does not find a reasonable solution without system is obtained in which the solution
pivoting. without pivoting is erroneous. [Note: |E—9 :
. Modify the linear system in Exercise | so 10-9, 2E9 = 2(10°)}
that an eight-figure computer (roughly the . Use YUREDUCE to solve the system in
usual single-precision computer) will not Exercise 8, which did not give a reasonable
obta'n a reasonable solution without partial solution without pivoting, but this time use
pivoting. partial pivoting.
. Repeat Exercise 4 for an 18-figure computer 10. Repeat Exercise 9, but this time use full
(roughly the usual double-precision pivoting. Compare the answer with that in
computer). Exercise 9.
. Find a linear system with two equations in x 11. See how MATCOMP handles the linear
and y such that a change of .001 in the system formed tn Exercise 8.
coefficient of x in the second equation
produces a change of at least | ,000,000 in 12. Experiment with MATCOMP, and see if it
both of the values x and y in a solution. gives a “nearly singular matrix”’ message
when finding the inverse of a 2 x 2 diagona
. Let A be an invertible square matrix. Show matrix
that the following scaling routine for finding
A™' using a computer is mathematically
r[= Ht 0 ,
correct. 0 |
a. Find the minimum absolute value m of
all nonzero entries in A. where r is a sufficiently small nonzero
b. Multiply A by an integer n ~ I/m. number. Use the default for roundoff contro
c. Find the inverse of the resulting matrix. in MATCOMP.
3. Using MATCOMP, use tne scaling routine —_— j. The entry in the ith row and jth column
suggested in Exercise 7 to find the inverse of of a Hilbert matrix is 1/{i + J).
the matrix
.00000: -.000003 .000011 .000006
—.000002 .000013 .000007 .000010 Use the routine HILBERT in LINTEK in
000009 -—.000011 0 —.000005 |" Exercises 15-22. Because the Hilbert matrices are
000014 -—.000008 -—.000002 .000003 nonsingular, diagonal entries computed during the
elimination should never be zero. Except in
Then see if the same answer is obtained Exercise 22, enter 0 for r when r is requested
without using the scaling routine. Check the during a run of HILBERT.
inverse in each case by multiplying by the
original matrix.

Bl Bl
15. Solve H,x = b, c, where
4. Mark each of the following True or False.
— a. Addition and subtraction never cause any
problem when executed by a computer.
—b. Given any present-day computer, cne can
find two positive numbers whose sum the Use just one run of HILBERT; that is, solve
computer will represent as the larger of both systems at once. Notice that the
the two numbers. components of b and c differ by just 1. Find
- A computer may have trouble the difference in the components of the two
representing as accurately as desired the
solution vectors.
sum of two numbers of extremely 16. Repeat Exercise 15, changing the coefficient
different magnitudes. matrix to H,'.
- Acomputer may have trouble 17. Repeat Exercise 15, using as coefficient
representing as accurately as desired the matnx H, with
sum of two numbers of essentially the
saine magnitude.
- A computer may have trouble
representing as accurately as desired the
sum of a and 8, ‘vhere a is approximately
2b or —20. 18. Repeat Exercise 15, using as coefficient
Partial pivoting handles all problems matrix H,!.
resulting from roundoff error when a
linear system is being solved by the 19. Find the inverse of H,, and use the / menu
Gauss method. option to test whether the computed inverse
Full pivoting handles roundoff error is correct.
problems better than partial pivoting, but . Continue Exercise 19 by trying to find the
it is generally not used because of the inverses of H, raised to the powers 2, 4, 8,
extra computer time required to and 16. Was it always possible to reduce the
implement it. power of the Hilbert matrix to diagonal
. Given any present-day computer, one can form? If not, what happened? Why did it
find a system of two equations in two happen? Was scaling (1 an entry) of any
unknowns that the computer cannot solve help? For those cases in which reduction to
accurately using the Gauss method with diagonal form was possible, did the
back substitution. computed inverse seem to be reasonably
i. A linear system of two equations in two accurate when tested?
unknowns is unstable if the lines 21. Repeat Exercises 19 and 20, using H, and
represented by the two equations are the various powers of it. Are problems
extremely close to being parallel. encountered for lower powers this
time?
22. We have seen that, in reducing a matrix, we a)


may wish to instruct the computer to assign Hx =b, where b=| 1],
the value zero to computed entries of 4
sufficiently small magnitude, because entries entering as r the number .0001 when
that should be zero might otherwise be left requested. Will the usual Gauss—Jordan
nonzero because of roundoff error. Use
elimination using HILBERT solve the
HILBERT to try to solve the linear system? Will HILBERT solve it using scal
system (1 an entry)?

MATLAB
The command hilb(n) in MATLAB creates the n × n Hilbert matrix, and the
command invhilb(n) attempts to create its inverse. For a matrix X in MATLAB, the
command max(X) produces a row vector whose jth entry is the maximum of the
entries in the jth column of X, whereas for a row vector v, the command max(v)
returns the maximum of the components of v. Thus the command max(max(X))
returns the maximum of the entries of X. The functions min(X) and min(v) return
analogous minimum values. Thus the command line

    n = 5; A = hilb(n)*invhilb(n); b = [max(max(A)) min(min(A))]

will return the two-component row vector b whose first component is the maximum
of the entries of A and whose second component is the minimum of those entries.
Of course, if invhilb(n) is computed accurately, this vector should be [1, 0].

M1. Enter format long and then enter the line

        n = 2; A = hilb(n)*invhilb(n); b = [max(max(A)) min(min(A))]

    in MATLAB. Using the up-arrow key to change the value of n, copy down the
    vector b with entries to three significant figures for the values 5, 10, 15, 20,
    25, and 30 of n.
M2. Recall that the command X = rand(n,n) generates a random n × n matrix
    with entries between 0 and 1; the Hilbert matrices also have entries between 0
    and 1. Enter the command

        X = rand(30,30); A = X*inv(X); b = [max(max(A)) min(min(A))]

    to see if MATLAB has difficulty inverting such a random 30 × 30 matrix.
    Using the up-arrow key, execute this command a total of seven times, and see
    if the results seem to be consistent.
APPENDIX A  MATHEMATICAL INDUCTION

Sometimes we want to prove that a statement about positive integers is true for
all positive integers or perhaps for some finite or infinite sequence of
consecutive integers. Such proofs are accomplished using mathematical
induction. The validity of the method rests on the following axiom of the
positive integers. The set of all positive integers is denoted by Z^+.

Induction Axiom
Let S be a subset of Z^+ satisfying
1. 1 ∈ S,
2. If k ∈ S, then (k + 1) ∈ S.
Then S = Z^+.

This axiom leads immediately to the method of mathematical induction.

Mathematical Induction
Let P(n) be a statement concerning the positive integer n. Suppose
that
1. P(1) is true,
2. If P(k) is true, then P(k + 1) is true.
Then P(n) is true for all n ∈ Z^+.


Most of the time, we want to show that P(n) holds for all n ∈ Z^+. If we wish
only to show that it holds for r, r + 1, r + 2, . . . , s - 1, s, then we show that
P(r) is true and that P(k) implies P(k + 1) for r ≤ k ≤ s - 1. Notice that r may
be any integer: positive, negative, or zero.

EXAMPLE A.1 Prove the formula

1 + 2 + · · · + n = n(n + 1)/2        (A.1)

for the sum of the arithmetic progression, using mathematical induction.

SOLUTION We let P(n) be the statement that formula (A.1) is true. For n = 1, we obtain

n(n + 1)/2 = 1(2)/2 = 1,

so P(1) is true.
Suppose that k ≥ 1 and P(k) is true (our induction hypothesis), so

1 + 2 + · · · + k = k(k + 1)/2.

To show that P(k + 1) is true, we compute

1 + 2 + · · · + (k + 1) = (1 + 2 + · · · + k) + (k + 1)
                        = k(k + 1)/2 + (k + 1)
                        = (k^2 + k + 2k + 2)/2
                        = (k + 1)(k + 2)/2.

Thus, P(k + 1) holds, and formula (A.1) is true for all n ∈ Z^+.

EXAMPLE A.2 Show that a set of n elements has exactly 2^n subsets for any nonnegative inte-
ger n.
SOLUTION This time we start the induction with n = 0. Let S be a finite set having n
elements. We wish to show

P(n): S has 2^n subsets.        (A.2)

If n = 0, then S is the empty set and has only one subset, namely, the empty
set itself. Because 2^0 = 1, we see that P(0) is true.
Suppose that P(k) is true. Let S have k + 1 elements, and let one element of
S be c. Then S - {c} has k elements, and hence 2^k subsets. Now every subset of
S either contains c or does not contain c. Those not containing c are subsets of
S - {c}, so there are 2^k of them by the induction hypothesis. Each subset
containing c consists of one of the 2^k subsets not containing c, with c adjoined.
There are 2^k such subsets also. The total number of subsets of S is then

2^k + 2^k = 2^k(2) = 2^{k+1},

so P(k + 1) is true. Thus, P(n) is true for all nonnegative integers n.

EXAMPLE A.3 Let x ∈ R with x > -1 and x ≠ 0. Show that (1 + x)^n > 1 + nx for every
positive integer n ≥ 2.
SOLUTION We let P(n) be the statement

(1 + x)^n > 1 + nx.        (A.3)

(Notice that P(1) is false.) Then P(2) is the statement (1 + x)^2 > 1 + 2x. Now
(1 + x)^2 = 1 + 2x + x^2 and x^2 > 0, because x ≠ 0. Thus, (1 + x)^2 > 1 + 2x, so
P(2) is true.
Suppose that P(k) is true, so

(1 + x)^k > 1 + kx.        (A.4)

Now 1 + x > 0, because x > -1. Multiplying both sides of inequality (A.4) by
1 + x, we obtain

(1 + x)^{k+1} > (1 + kx)(1 + x) = 1 + (k + 1)x + kx^2.

Because kx^2 > 0, we see that P(k + 1) is true. Thus P(n) is true for every
positive integer n ≥ 2.

In a frequently used form of induction known as complete induction, the
statement

If P(k) is true, then P(k + 1) is true

in the second box on page A-1 is replaced by the statement

If P(m) is true for 1 ≤ m ≤ k, then P(k + 1) is true.

Again, we are trying to show that P(k + 1) is true, knowing that P(k) is true.
But if we have reached the stage of induction where P(k) has been proved, we
know that P(m) is true for 1 ≤ m ≤ k, so the strengthened hypothesis in the
second statement is permissible.

EXAMPLE A.4 Recall that the set of all polynomials with real coefficients is denoted by P.
Show that every polynomial in P of degree n ∈ Z^+ either is irreducible itself or
is a product of irreducible polynomials in P. (An irreducible polynomial is one
that cannot be factored into polynomials in P all of lower degree.)
SOLUTION We will use complete induction. Let P(n) be the statement that is to be proved.
Clearly P(1) is true, because a polynomial of degree 1 is already irreducible.
Let k be a positive integer. Our induction hypothesis is then: every
polynomial in P of degree less than k + 1 either is irreducible or can be
factored into irreducible polynomials. Let f(x) be a polynomial of degree
k + 1. If f(x) is irreducible, we have nothing more to do. Otherwise, we may
factor f(x) into polynomials g(x) and h(x) of lower degree than k + 1, obtain-
ing f(x) = g(x)h(x). The induction hypothesis indicates that each of g(x) and
h(x) can be factored into irreducible polynomials, thus providing such a
factorization of f(x). This proves P(k + 1). It follows that P(n) is true for
all n ∈ Z^+.
EXERCISES

. Prove that max(i-1,j- l)=k,soi-1=j-—1 by th


induction hypothesis. Therefore, 1 = j and
_ n(n + 1)(2n + 1) P(k + 1) is true; so, P(7) is true for all n.
P+ 2+ 2+ --- +H? 6
. Criticize the following argument.
forn€ Z*. Let us show that every positive integer
. Prove that has some interesting property. Let P(m) be the
nw

statement that n has an interesting property.


_m(n t+ ly We use complete induction.
B+23+34+--- +73 4
Of course P(1) is true, because | is the
only positive integer that equals its own
forn€ Z*.
square, which is surely an interesting property
. Prove that of 1.
2

Suppose that P(77) is true for 1 <n Sk.


1+34+5¢---4+(Qn-hh=r
If P(k + 1) were not true, then k + | would
forneé Z?*. be the smallest integer without an interesting
. Prove that property, which would, in itseif, be an
interesting property of k + 1.So P(k + 1)
——
l
+
l
+
l
a
l must be true. Thus P({7) is true for all n € Z*.
1-2 2-3 2:4 n(n
+ |) . We have never been able to see any flaw in
_ oon part a. Try your luck with it, and then answer
n+l part b.
forn€ Z*.
a. A serial killer is sentenced to be executed.
He asks the judge not to let him know the
. Prove by induction that if a, rE Rand
r # |, day of the execution. The judge says, “I
then sentence you to be executed at 10 a.m.
atart+art:++
tur’ some day of this coming January, but |
_or promise that you will not be aware that
"der you are being cxecuied that day until they
forn€ Z*.
come to get you at 8 a.m.” The killer goes
to his cell and proceeds to prove, as
. Find the flaw in the following argument. follows, that he can’t be executed in
We prove that any two integers i and j in January.
Z* are equal. Let
_,_fi if i=} Let P(n) be the statement that J can’t be
math) =} ig poi. executed on January (31 — 7). I want to
prove P(n) for 0 = n = 30. Now I can’t be
Let P(n) be the statement executed on January 31, for since that is
P(n): Whenever max(i,/) = 7, then / = j.
the last day of the month and I am to be
executed that month, I would know that
Notice that, if P(m) is true for all positive was the day before 8 a.m., contrary to the
integers nm, then any two positive integers | judge’s sentence. Thus P(0) is true.
and j are equal. We proceed to prove P(n) for Suppose that P(m) is true for0 = m= k,
positive integers n by induction. where k = 29. That is, suppose 1 can’t be
Clearly P(1) is true, because, if i, j & Z* executed on January (31 — k) through
and max (i,j) = 1, then i = j = 1. January 31. Then January (31 — k — i)
Assume that P(k) is true. Let i and / be musi be the last possible day for execution,
such that max (i,j) = k + |. Then and I would be aware that was the day

before 8 a.m., contrary to the judge’s her class that she will give one more quiz
sentence. Thus I can’t be executed on on one day during the final full week of
January (31 — (& + 1)), so P(k + 1) is classes, but that the students will not know
true. Therefore, I can’t be executed in for sure that the quiz will be that day until
January. (Of course, the serial killer was they come to the classroom. What is the
executed on January 17.) last day of the week on which she can give
the quiz in order to satisfy these
. An instructor teaches a class five days a conditions?
week, Monday through Friday. She tells
APPENDIX B  TWO DEFERRED PROOFS

PROOF OF THEOREM 4.2 ON EXPANSION BY MINORS


Our demonstration of the various properties of determinants in Section 4.2
depended on our ability to compute a determinant by expanding it by minors
on any row or column, as stated in Theorem 4.2. In order to prove Theorem
4.2, we will need to look more closely at the form of the terms that appear in an
expanded determinant.
Determinants of orders 2 and 3 can be written as
| a_{11}  a_{12} |
| a_{21}  a_{22} | = (1)(a_{11} a_{22}) + (-1)(a_{12} a_{21})

and

| a_{11}  a_{12}  a_{13} |
| a_{21}  a_{22}  a_{23} | = (1)(a_{11} a_{22} a_{33}) + (-1)(a_{11} a_{23} a_{32}) + (1)(a_{12} a_{23} a_{31}) + (-1)(a_{12} a_{21} a_{33})
| a_{31}  a_{32}  a_{33} |
                           + (1)(a_{13} a_{21} a_{32}) + (-1)(a_{13} a_{22} a_{31}).
Notice that each determinant appears as a sum of products, each with an
associated sign given by (1) or (-1), which is determined by the formula
(-1)^{i+j} as we expand the determinant across the first row. Furthermore, each
product contains exactly one factor from each row and exactly one factor from
each column of the matrix. That is, the row indices in each product run
through all row numbers, and the column indices run through all column
numbers. This is an illustration of a general theorem, which we now prove by
induction.


THEOREM B.1. Structure of an Expanded Determinant

The determinant of an n × n matrix A = [a_{ij}] can be expressed as a sum


of signed products, where each product contains exactly one factor
from each row and exactly one factor from each column. The
expansion of det(A) on any row or column also has this form.

PROOF We consider the expansion of det(A) on the first row and give a proof
by induction. We have just shown that our result is true for determinants of
orders 2 and 3. Let n > 3, and assume that our result holds for all square
matrices of size smaller than n × n. Let A be an n × n matrix. When we expand
det(A) by minors across the first row, the only expression involving a_{1j} is
(-1)^{1+j} a_{1j} |A_{1j}|. We apply our induction hypothesis to the determinant |A_{1j}| of
order n - 1: it is a sum of signed products, each of which has one factor from
each row and column of A except for row 1 and column j. As we multiply this
sum term by term by a_{1j}, we obtain a sum of products having a_{1j} as the factor
from row 1 and column j, and one factor from each other row and from each
other column. Thus an expression of the stated form is indeed obtained as we
expand det(A) by minors across the first row.
It is clear that essentially the same argument shows that expansion across
any row or down any column yields the same type of sum of signed
products.

Our illustration for 2 x 2 and 3 X 3 matrices indicates that we might


always have the same number of products appearing in det(A) with a sign given
by 1 as with a sign given by —1. This is indeed the case for determinants of
order greater than 1, and the induction proof is left to the reader.
We now restate and prove Theorem 4.2.

Let A = [a_{ij}] be an n × n matrix. Then

|A| = (-1)^{r+1} a_{r1} |A_{r1}| + (-1)^{r+2} a_{r2} |A_{r2}| + · · · + (-1)^{r+n} a_{rn} |A_{rn}|        (B.1)

for any r from 1 to n, and

|A| = (-1)^{1+s} a_{1s} |A_{1s}| + (-1)^{2+s} a_{2s} |A_{2s}| + · · · + (-1)^{n+s} a_{ns} |A_{ns}|        (B.2)

for any s from 1 to n.

PROOF OF THEOREM 4.2 We first prove Eq. (B.1) for any choice of r from 1 to
n. Clearly, Eq. (B.1) holds for n = 1 and n = 2. Proceeding by induction, let
n > 2 and assume that determinants of order less than n can be computed by
using an expansion on any row. Let A be an n × n matrix. We show that
expansion of det(A) by minors on row r is the same as expansion on row i for
i < r. From Theorem B.1, we know that each of the expansions gives a sum of

signed products, where each product contains a single factor from each ro:
and from each column of A. We will compare the products containing a
factors both a, and a,, in each of the expansions. We consider two cases, a
illustrated in Figures B.1 and B.2.
If det(A) is expanded on the ith row, the sum of signed products containin
a,a,, is-part of (—1)a,{A,|. In computing |A;| we may, by our inductio
assumption, expand on the rth row. For j < s, terms of |A;| involving a, ar
then (—1)"-)*6-d, where d is the determinant of the matrix obtained from ,
by crossing out rows i and rand columns j and s, as shown in Figure B.i. Th
exponent (r — 1) + (s — 1) occurs because a,, is in rowr — 1 andcolumns —
of A;. Thus, the part of our expansion of det(A) across the ith row that contain
a, ,, is equal to
(-1)*(-1) Oa, ad for J < §. (B.3

For j > s, we consult Figure B.2 and use similar reasoning to see that th
part of our expansion of det(A) across the ith row, which contains a,a,,, 1
equal to

(-1)(-1O"a,ad for j>s. (B.4


We now expand det(A) by minors on the rth row, obtaining (—1)’*'a,|A,,| a
the portion involving a,,. Expanding |A,| on the ith row, using our induction
assumption, we obtain (—1)'a,d if j < s and (—1)*""a,d if j > s. Thus th:
part of the expansion of det(A) on the rth row, which contains a,,a,,, is equal t
(--1)*{-i)"a,a,d for j<s (B.5
or
(-1)*(-1)*0a,a,d for j>s. (B.6
Expressions (B.3) and (B.5) are equal, because (—1)*#*") = (—1)*st##-?
and expressions (B.4) and (B.6) are equal, because (~ 1)’***"*-! is the algebrai
sign of each. This concludes the proof that the expansions of det(A) by minor.
across rows i and r are equal.

FIGURE B.1  The case j < s.
FIGURE B.2  The case j > s.
A similar argument shows that expansions of det(A) down columns j and s
are the same.
Finally, we must show that an expansion of det(A) on a row is equal to
expansion on a column. It is sufficient for us to prove that the expansion of
det(A) on the first row is the same as the expansion on the first column, in view
of what we have proved above. Again, we use induction and dispose of the
cases n = 1 and n = 2 as trivial to check. Let n > 2, and assume that our result
holds for matrices of size smaller than n × n. Let A be an n × n matrix.
Expanding det(A) on the first row yields

a_{11} |A_{11}| + Σ_{j=2}^{n} (-1)^{1+j} a_{1j} |A_{1j}|.

For j > 1, we expand |A_{1j}| on the first column, using our induction assumption,
and obtain |A_{1j}| = Σ_{i=2}^{n} (-1)^{(i-1)+1} a_{i1} d, where d is the determinant of the matrix
obtained from A by crossing out rows 1 and i and columns 1 and j. Thus the
terms in the expansion of det(A) containing a_{1j} a_{i1} are

(-1)^{1+j} (-1)^{(i-1)+1} a_{1j} a_{i1} d.        (B.7)

On the other hand, if we expand det(A) on the first column, we obtain

a_{11} |A_{11}| + Σ_{i=2}^{n} (-1)^{i+1} a_{i1} |A_{i1}|.

For i > 1, expanding on the first row, using our induction assumption, shows
that |A_{i1}| = Σ_{j=2}^{n} (-1)^{1+(j-1)} a_{1j} d. This results in

(-1)^{i+1} (-1)^{1+(j-1)} a_{i1} a_{1j} d

as the part of the expansion of det(A) containing the product a_{1j} a_{i1}, and this agrees
with the expression in formula (B.7). This concludes our proof.

PROOF OF THEOREM 4.7 ON THE VOLUME OF AN n-BOX IN R^m
Theorem 4.7 in Section 4.4 asserts that the volume of an n-box in R^m
determined by independent vectors a_1, a_2, . . . , a_n can be computed as

Volume = √det(A^T A),        (B.8)

where A is the m × n matrix having a_1, a_2, . . . , a_n as column vectors. Before
proving this result, we must first give a proper definition of the volume of such
a box. The definition proceeds inductively. That is, we define the volume of a
1-box directly, and then we define the volume of an n-box in terms of the
volume of an (n - 1)-box. Our definition of volume is a natural one, essentially
taking the product of the measure of the base of the box and the altitude of the
box. We will think of the base of the n-box determined by a_1, a_2, . . . , a_n as an
(n - 1)-box determined by some n - 1 of the vectors a_j. In general, such a base
can be selected in several different ways. We find it convenient to work with
boxes determined by ordered sequences of vectors. We will choose one specific
base for the box and give a definition of its volume in terms of this order of the
vectors. Once we obtain the expression det(A^T A) in Eq. (B.8) for the square of
the volume, we can show that the volume does not change if the order of the
vectors in the sequence is changed. We will show that det(A^T A) remains
unchanged if the order of the columns of A is changed.
Observe that, if an n-box is determined by a_1, a_2, . . . , a_n, then a_1 can be
uniquely expressed in the form

a_1 = b + p,        (B.9)

where p is the projection of a_1 on W = sp(a_2, . . . , a_n) and b = a_1 - p
is orthogonal to W. This follows from Theorem 6.1 and is illustrated in Fig-
ure B.3.

DEFINITION B.1  Volume of an n-Box

The volume of the 1-box determined by a nonzero vector a_1 in R^m is
||a_1||. Let a_1, a_2, . . . , a_n be an ordered sequence of n independent
vectors, and suppose that the volume of an r-box determined by an
ordered sequence of r independent vectors has been defined for r < n.
The volume of the n-box in R^m determined by the ordered sequence of
a_j is the product of the volume of the "base" determined by the
ordered sequence a_2, . . . , a_n and the length of the vector b given in Eq.
(B.9). That is,

Volume = (Altitude ||b||)(Volume of the base).

As a first step in finding a formula for the volume of an n-box, we establish
a preliminary result on determinants.

FIGURE B.3  The altitude vector b perpendicular to the base box determined by
a_2, a_3, . . . , a_n.
THEOREM B.2  Property of det(A^T A)

Let a_1, a_2, . . . , a_n be vectors in R^m, and let A be the m × n matrix with
jth column vector a_j. Let B be the m × n matrix obtained from A by
replacing the first column of A by the vector

b = a_1 - r_2 a_2 - · · · - r_n a_n

for scalars r_2, . . . , r_n. Then

det(A^T A) = det(B^T B).        (B.10)

PROOF The matrix B can be obtained from the matrix A by a sequence
of n - 1 elementary column-addition operations. Each of the elementary
column operations can be performed on A by multiplying A on the
right by an elementary matrix formed by executing the same elementary
column-addition operation on the n × n identity matrix I. Each elemen-
tary column-addition matrix therefore has the same determinant 1 as the
identity matrix I. Their product is an n × n matrix E such that B = AE,
and det(E) = 1. Using properties of determinants and the transpose opera-
tion, we have

det(B^T B) = det((AE)^T (AE)) = det(E^T (A^T A) E) = 1 · det(A^T A) · 1 = det(A^T A).
We can now prove our volume formula in Theorem 4.7.

The volume of the n-box in R^m determined by the ordered sequence
a_1, a_2, . . . , a_n of n independent vectors is given by

Volume = √det(A^T A),

where A is the m × n matrix with a_j as jth column vector.

PROOF OF THEOREM 4.7 Because our volume was defined inductively, we give
an inductive proof. The theorem is valid if n = 1 or 2, by Eqs. (1) and (2),
respectively, in Section 4.4. Let n > 2, and suppose that the theorem is proved
for all k-boxes for k ≤ n - 1. If we write a_1 = b + p, as in Eq. (B.9), then,
because p lies in sp(a_2, . . . , a_n), we have

p = r_2 a_2 + · · · + r_n a_n

for some scalars r_2, . . . , r_n, so

b = a_1 - p = a_1 - r_2 a_2 - · · · - r_n a_n.


Let B be the matrix obtained from A by replacing the first column vector a_1 of
A by the vector b, as in Theorem B.2. Because b is orthogonal to each of the
vectors a_2, . . . , a_n, which determine the base of our box, we obtain

          [ b·b      0        · · ·      0      ]
          [  0    a_2·a_2     · · ·   a_2·a_n   ]
B^T B =   [  .       .                    .     ]
          [  0    a_n·a_2     · · ·   a_n·a_n   ].        (B.11)

From Eq. (B.11), we see that

det(B^T B) = ||b||^2  | a_2·a_2   · · ·   a_2·a_n |
                      |    .                 .    |
                      | a_n·a_2   · · ·   a_n·a_n |.

By our induction assumption, the square of the volume of the (n - 1)-box in
R^m determined by the ordered sequence a_2, . . . , a_n is

| a_2·a_2   · · ·   a_2·a_n |
|    .                 .    |
| a_n·a_2   · · ·   a_n·a_n |.

Applying Eq. (B.10) in Theorem B.2, we obtain

det(A^T A) = det(B^T B) = ||b||^2 (Volume of the base)^2 = (Volume)^2.

This proves our theorem.
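
For readers who like a numerical check, the formula Volume = √det(A^T A) is
easy to test in MATLAB. For two independent vectors in R^3 the 2-box is a
parallelogram whose area can also be computed from a cross product, so the
two quantities below should agree (this small example is ours, not part of the
text):

    a1 = [1; 2; 2];  a2 = [3; 0; 4];
    A = [a1 a2];
    sqrt(det(A' * A))        % area of the parallelogram determined by a1 and a2
    norm(cross(a1, a2))      % the same area, computed from the cross product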

COROLLARY  Independence of Order

The volume of a box determined by the independent vectors a_1,
a_2, . . . , a_n and defined in Definition B.1 is independent of the order of
the vectors; in particular, the volume is independent of a choice of
base for the box.

PROOF A rearrangement of the sequence a_1, a_2, . . . , a_n of vectors corresponds
to the same rearrangement of the columns of matrix A. Such a rearrangement
of the columns of A can be accomplished by multiplying A on the right by a
product of n × n elementary column-interchange matrices, having determi-
nant ±1. As in the proof of Theorem B.2, we see that, for the resulting matrix B
= AE, we have det(B^T B) = det(A^T A) because det(E^T)det(E) = 1.
APPENDIX C  LINTEK ROUTINES

Below are listed the names and brief descriptions of the routines that make up
the computer software LINTEK designed for this text.

VECTGRPH Gives graded quizzes on vector geometry based on displayed


graphics. Useful throughout the course.
MATCOMP Performs matrix computations, solves linear systems, and
finds real eigenvalues and eigenvectors. For use throughout
the course.
YUREDUCE Enables the user to select items from a menu for step-by-step
row reduction of a matrix. For Sections 1.4-1.6, 9.2, and 10.3.
EBYMTIME Gives a lower bound for the time required to find the
determinant of a matrix using only repeated expansion by
minors. For Section 4.2.
ALLROOTS Provides step-by-step execution of Newton’s method to find
both real and complex roots of a polynomial with real or
complex coefficients; may be used in conjunction with MAT-
COMP to find complex as well as real eigenvalues of a small
matrix. For Section 5.1.
QRFACTOR Executes the Gram-Schmidt process, gives the QR-factoriza-
tion of a suitable matrix A, and can be used to find least-
squares solutions. For Section 6.2. Also, for Section 8.4,
step-by-step computation by the QR-algorithm of both real
and complex eigenvalues of a matrix. The user specifies shifts
and when to decouple.
YOUFIT The user can experiment with a graphic to try to find the
least-squares linear, quadratic, or exponential fit of two-
dimensional data. The computer can then be asked to find the
best fit. For Section 6.5.
POWER Executes, one step at a time, the power method (with defla-
tion) for finding eigenvalues and eigenvectors. For Section 8.4.


JACOBI Executes, one “‘rotation” at a time, the method of Jacobi for


diagonalizing a symmetric matrix. For Section 8.4.
TIMING Times algebraic operations and flops; also times various
methods for solving linear systems. For Sections 10.1 and
10.2.
LUFACTOR Gives the factorizations A = LU or PA = LU, which can then
be used to solve Ax = b. For Section 10.2.
HILBERT Enables the user to experiment with ill-conditioned Hilbert
matrices. For Section 10.3.
LINPROG A graphics program allowing the user to estimate solutions of
two-variable linear programming problems. For use with the
previous edition of our text.
SIMPLEX Executes the simplex method with optional display of tableau
to solve linear programming problems. For use with Chapter
10 of the previous edition of our text.
APPENDIX D  MATLAB PROCEDURES AND COMMANDS USED IN THE EXERCISES

The MATLAB exercises in our text illustrate and facilitate computations in


linear algebra. Also, they have been carefully designed, starting with those in
Section 1.1, to introduce students gradually to MATLAB, explaining proce-
dures and commands as the occasion for them arises. It is not necessary to
study this appendix before starting right in with the MATLAB exercises in
Section 1.1. In case a student has forgotten a procedure or command, we
summarize here for easy reference the ones needed for the exercises. The
information given here is only a small fraction of that available from the
MATLAB manual, which is of course the best reference.

GENERAL INFORMATION

MATLAB prints >> on the screen as its prompt when it is ready for your next
command.
MATLAB is case sensitive—that is, if you have set n = 3, then N is
undefined. Thus you can set X equal to a 3 × 2 matrix and x equal to a row
vector.
To view on the screen the value, vector, or matrix currently assigned to a
variable such as A or x, type the variable and press the Enter key.
For information on a command, enter help followed by a space and the
name of the command or function. For example, entering help * will bring the
response that X * Y is the matrix product of X and Y, and entering help eye will
inform us that eye(n) is the n × n identity matrix.
The most recent commands you have given MATLAB are kept in a stack,
and you can move back and forth through them using the up-arrow and
down-arrow keys. Thus if you make a mistake in typing a command and get an
error message, you can correct the error by striking the up-arrow key to put the
command by the cursor and then edit it to correct the error, avoiding retyping
the entire command. The exercises frequently ask you to execute a command
you recently gave, and you can simply use the up-arrow key until the command
is at the cursor, and then press the Enter key to execute it again. This saves a lot
of typing. If you know you will be using a command again, don’t let it get too


far back before asking to execute it again, or it may have been removed from
the command stack. Executing it puts it at the top of the stack.

DATA ENTRY
Numerical, vector, and matrix data can be entered by the variable name for
the data followed by an equal sign and then the data. For example, entering
x = 3 assigns the value 3 to the letter x, and entering y = sin(x) will then
assign to y the value sin(3). Entering y = sin(x) before a value has been
assigned to x will produce an error message. Values must have been assigned to
a variable before the variable can be used in a function or computation. If you
do not assign a name to the result of a computation, then it is assigned the
temporary name “ans.” Thus this reserved name ans should not be used in
your work because its value will be changed at the next computation having no
assigned name.
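For instance, here is a small illustration of these conventions (the particular numbers are our own, not taken from an exercise):

    x = 3;       % assign 3 to x; the semicolon suppresses echoing
    x + 4        % no name is given, so MATLAB replies  ans = 7
    ans * 2      % uses the previous unnamed result, replying  ans = 14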
The constant π = 4 tan⁻¹(1) can be entered as pi.
Data for a vector or a matrix should be entered between square brackets
with a space between numbers and with rows separated by a semicolon. For
example, the matrix

        2  -1   3   6
    A = 0   5  12  -7
        4  -2   9  11

can be entered as

    A = [2 -1 3 6; 0 5 12 -7; 4 -2 9 11],

which will produce the response

    A =
         2  -1   3   6
         0   5  12  -7
         4  -2   9  11
from MATLAB. If you do not wish your data entry to be echoed in this fashion
for proofreading, put a semicolon after your data entry line. The semicolon
can also be used to separate several data items or commands all entered on one
line. For example, entering

    x = 3; sin(x); v = [1 -3 4]; C = [2 -1; 3 5];

will result in the variable assignments

    x = 3   and   ans = sin(3)   and   v = [1, -3, 4]   and   C = [2, -1; 3, 5]

and no data will be echoed. If the final semicolon were omitted, the data for
the matrix C would be echoed. If you run out of space on a line and need to
continue data on the next line, type at least two periods .. and then
immediately press Enter and continue entry on the next line.

Matrices can be glued together to form larger matrices using the same form
of matrix entry, provided the gluing will still produce a rectangular array. For
example, if A is a 3 × 4 matrix, B is 3 × 5, and C is 2 × 4, then entering
D = [A B] will produce a 3 × 9 matrix D consisting of the four columns of A
followed at the right by the five columns of B, whereas entering E = [A; C] will
produce a 5 × 4 matrix consisting of the first three rows of A followed below by
the two rows of C.
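For example, with small matrices of our own choosing rather than the sizes above:

    A = [1 2; 3 4];  B = [5; 6];   % A is 2 × 2 and B is 2 × 1
    D = [A B]                      % the 2 × 3 matrix [1 2 5; 3 4 6]
    E = [A; 7 8]                   % the 3 × 2 matrix with the row  7 8  appended below A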

NUMERIC OPERATIONS

Use + for addition, - for subtraction, * for multiplication, / for division, and ^
for exponentiation. Thus entering 6^3 produces the result ans = 216.

MATRIX OPERATIONS AND NOTATIONS

Addition, subtraction, multiplication, division, and exponentiation are denot-
ed just as for numeric data, always assuming that they are defined for the
matrices specified. For example, if A and B are square matrices of the same
size, then we can form A + B, A - B, A * B, 2 * A, and A^4. If A and B are
invertible, then MATLAB interprets A/B as AB⁻¹ and interprets A\B as A⁻¹B.
A period before an operation symbol, such as .* or .^, indicates that the
operation is to be performed at each corresponding position i,j in a matrix
rather than on the matrix as a whole. Thus entering C = A .* B produces the
matrix C such that cᵢⱼ = aᵢⱼbᵢⱼ, and entering D = A.^2 produces the matrix D
where dᵢⱼ = aᵢⱼ².
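A small illustration, using matrices of our own choosing:

    A = [1 2; 3 4];  B = [10 20; 30 40];
    C = A .* B      % entrywise product [10 40; 90 160]
    D = A .^ 2      % entrywise squares [1 4; 9 16]
    A * B           % the ordinary matrix product [70 100; 150 220], for comparison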
The transpose of A is denoted by A', so that entering A = A' will replace A
by its transpose. Entering b = [1; 2; 3] or entering b = [1 2 3]' creates the same
column vector b.
Let A be a 3 × 5 matrix in MATLAB. Entries are numbered, starting from
the upper left corner and proceeding down each column in turn. Thus A(6) is
the last entry in the second column of A. This entry is also identified as A(3,2);
we think of the 3 and 2 as subscripts as in our text notation a₃₂. Use of a colon
between two integers has the meaning of "through." Thus, entering
v = A(2,1:4) will produce a row vector v consisting of entries 1 through 4 (the
first four components) of the second row vector of A, whereas entering
B = A(1:2,2:4) will produce the 2 × 3 submatrix of A consisting of columns 2,
3, and 4 of rows 1 and 2. The colon by itself in place of a subscript has the
meaning of "all," so that entering w = A(:,3) sets w equal to the third column
vector of A. For another example, if x is a row vector with five components,
then entering A(2,:) = x will replace the second row vector of A by x.
The colon can be used in the "through" sense to enter vectors with
components incremented by a specified amount. Entering y = 4 : -.5 : 2 sets y
equal to the vector [4, 3.5, 3, 2.5, 2] with increment -0.5, whereas entering
x = 1 : 6 will set x equal to the vector [1, 2, 3, 4, 5, 6] with default increment 1
between components.
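To illustrate these indexing conventions with a small matrix of our own:

    A = [1 2 3 4 5; 6 7 8 9 10; 11 12 13 14 15];   % a 3 × 5 matrix
    A(6)             % ans = 12, the last entry of column 2, also written A(3,2)
    v = A(2,1:4)     % v = [6 7 8 9], the first four entries of row 2
    w = A(:,3)       % w = [3; 8; 13], the third column of A
    y = 4 : -.5 : 2  % y = [4 3.5 3 2.5 2]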
MATLAB COMMANDS REFERRED TO IN THIS TEXT

Remember that all variables must have already been assigned numeric, vector,
or matrix values. Following is a summary of the commands used in the
MATLAB exercises of our text, together with a brief description of what each
one does.

acos(x) returns the arccosine of every entry in (the number,


vector, or matrix) x.
bar(v) draws a bar graph wherein the height of the ith bar is
the ith component of the vector v.
clear erases all variables; all data are lost.
clear A w x erases the variables A, w, and x only.
clock returns the vector [year month day hour minute sec-
ond].
det(A) returns the determinant of the square matrix A.
eig(A) returns a column vector whose components are the
eigenvalues of A. Entering [V, D] = eig(A) creates a
matrix V whose column vectors are eigenvectors of A
and a diagonal matrix D whose jth entry on the
diagonal is the eigenvalue associated with the jth
column of V, so that AV = VD.
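For example, with a small matrix of our own:

    A = [2 0; 0 3];
    eig(A)              % column vector of eigenvalues, here 2 and 3
    [V, D] = eig(A);    % V holds eigenvectors as columns, D is diagonal
    A*V - V*D           % numerically the zero matrix, since AV = VD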
etime(t1,t0) returns the elapsed time between two clock output
vectors t0 and t1. If we set t0 = clock before executing a
computation, then the command etime(clock,t0) im-
mediately after the computation gives the time re-
quired for the computation.
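A typical timing pattern looks like the following; the computation being timed is just a sample of our own.

    t0 = clock;                  % record the starting time
    B = rand(100) * rand(100);   % some computation to be timed
    etime(clock, t0)             % elapsed seconds for the computation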
exit leaves MATLAB and returns us to DOS.
eye(n) returns the n × n identity matrix.
for i = 1:n, . . .
; end causes the routine placed in the . . . portion to be
executed n times with the value of i incremented by 1
each time. The value of n must not exceed 1024 in the
student version of MATLAB.
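For example, the following loop of our own sums the first ten squares:

    s = 0;
    for i = 1:10, s = s + i^2; end   % the body executes 10 times
    s                                % s = 385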
format long causes data to be printed in scientific format with
about 16 significant digits until we specify otherwise.
(See format short.)
format short causes data to be printed with about five significant
digits until we specify otherwise. (See format long.) The
default format when MATLAB is accessed is the short
one.
help produces a list of topics for which we can obtain an
on-screen explanation. To obtain help on a specific
command or function—say, the det function—enter
help det.

hilb(n) returns the n × n Hilbert matrix.


invhilb(n) returns the result of an attempted computation of the
inverse of the n × n Hilbert matrix.
inv(A) returns the inverse of the matrix A.
log(x) returns the natural logarithm of every entry in (the
number, vector, or matrix) x.
lu The command [L, U, P] = lu(A) returns a lower-
triangular matrix L, an upper-triangular matrix U, and
a permutation matrix P such that PA = LU.
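For example, with a small matrix of our own:

    A = [0 1; 2 3];
    [L, U, P] = lu(A);   % L is lower triangular, U is upper triangular
    P*A - L*U            % numerically the zero matrix, since PA = LU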
max(x) returns the maximum of the components of the vector
x.

max(X) returns a row vector whose jth component is the


maximum of the components of the jth column vector
of X.
mean(x) returns the arithmetic mean of the components of the
vector x. That is, if x has n components, then mean(x)
= (x₁ + x₂ + · · · + xₙ)/n.
mean(A) returns the row vector whose jth component is the
arithmetic mean of the jth column of the matrix A. (See
mean(x).) If A is an m × n matrix, then mean(mean(A))
= (the sum of all mn entries)/(mn).
mesh(A) draws a three-dimensional perspective mesh surface
whose height above the point (i,j) in a horizontal
reference plane is aᵢⱼ. Let x be a vector of m compo-
nents and y a vector of n components. The command
[X,Y] = meshdom(x, y) creates matrices X and Y where
matrix X is an array of x-coordinates of all the mn
points (xᵢ, yⱼ) over which we want to plot heights for our
mesh surface, and the matrix Y is a similar array of
y-coordinates. See the explanation preceding Exercise
M10 in Section 1.3 for a description of the use of mesh
and meshdom to draw a surface graph of a function z =
f(x, y).
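A minimal sketch of this usage, with grids of our own choosing (later versions of MATLAB call the meshdom command meshgrid):

    x = 0 : .5 : 2;  y = 0 : .5 : 3;
    [X, Y] = meshdom(x, y);   % arrays of x- and y-coordinates of the grid points
    Z = X.^2 + Y.^2;          % heights z = f(x, y) over each grid point
    mesh(Z)                   % perspective mesh surface of those heights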
min(x) returns the minimum of the components of the vector
x.

min(X) returns a row vector whose jth component is the


minimum of the components of the jth column vector
of X.
norm(x) returns the usual norm ||x|| of the vector x.
ones(m,n) returns the m × n matrix with all entries equal to 1.
ones(n) returns the n × n matrix with all entries equal to 1.
plot(x) draws the polygonal approximation graph of a function
f such that f(i) = xᵢ for a vector x.
plot(x, y) draws the polygonal approximation graph of a function
f such that f(xᵢ) = yᵢ for vectors x and y.
poly(A) returns a row vector whose components are the coeffi-
cients of the characteristic polynomial det(λI - A) of
the square matrix A. Entering p = poly(A) assigns this
vector to the variable p. (See roots(p).)
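For example, with a matrix of our own:

    A = [2 1; 0 3];
    p = poly(A)      % p = [1 -5 6], the coefficients of the characteristic polynomial
    roots(p)         % the eigenvalues of A, namely 3 and 2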
qr The command [Q, R] = qr(A) returns an orthogonal
matrix Q and an upper-triangular matrix R such that
A = QR. If matrix A is an m X n matrix, then R is also
m × n whereas Q is m × m.
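For example, with a small matrix of our own:

    A = [1 1; 1 0; 0 1];
    [Q, R] = qr(A);   % Q is 3 × 3 orthogonal, R is 3 × 2 upper triangular
    Q*R - A           % numerically the zero matrix, since A = QR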
quit leaves MATLAB and returns us to DOS.
rand(m,n) returns an m × n matrix with random number entries
between 0 and 1.
rand(n) returns an n × n matrix with random number entries
between 0 and 1.
rat(x, ‘s’) causes the vector (or matrix) x to be printed on the screen
with entries replaced by rational (fractional) approximations.
The entries of x are not changed. Hopefully, entries that
should be equal to fractions with small denominators will be
replaced by those fractions.
roots(p) returns a column vector of roots of the polynomial whose
coefficients are in the row vector p. (See poly(A).)
rot90(A) returns the matrix obtained by "rotating the matrix A"
counterclockwise 90°. Entering B = rot90(A) produces this
matrix B.
rref(A) returns the reduced row-echelon form of the matrix A.
rrefmovie(A) produces a “movie” of reduction of A to reduced row-
echelon form. It may go by so fast that it is hard to study, even
using the Pause key. See the comment following Exercise M9
in Section 1.4 about fixing this problem.
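For example, rref applied to a small matrix of our own:

    A = [1 2 3; 2 4 7];
    rref(A)           % ans = [1 2 0; 0 0 1], the reduced row-echelon form of A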
size(A) returns the two-component row vector [m, n], where A is an
m X n matrix.
stairs(x) draws a stairstep graph of the vector x, where the height of the
ith step is xᵢ.
sum(x) returns the sum of all the entries in the vector x.
triu(A) returns the upper-triangular matrix obtained by replacing all
entries aᵢⱼ for i > j in A by zero.
who creates a list on the screen of all variables that are currently
assigned numeric, vector, or matrix values.
zeros(m,n) returns the m x n matrix with all entries equal to 0.
zeros(n) returns the n × n matrix with all entries equal to 0.
ANSWERS TO MOST ODD-NUMBERED EXERCISES

CHAPTER 1

Section 1.1

1. v + w = [-1, -3]        3. v + w = 2i + 5j + 6k
   v - w = [5, 1]             v - w = j - 2k

5. [-11, 9, -4]        7. [6, 4, -5]

9. [17, -18, 3, -20]   11. [20, -12, 6, -20]


15. l
7 e V7 1, hy 3, Ys 4 4] 9. ._ 7 -3 11. ° 3

_,__1l 43
13. cos t_____
364 = 54.8
§4 8° 15. —233

17. {13, —5, 7] or any nonzero multiple of it


19. 15\% 21. —540
25. Perpendicular
27. Parallel with opposite direction
29. Neith
17. r= 1.5.5~1.8 19. r= .5,5~ -1.2 x rh. “ft , red
2 . Lhe vector v — w, when translated to
21. —1 23. -§ 25. Allc # 15 start at (W,, W,, ..., W,), reaches to the
27. 0 29. -27 31. Si - j tip of the vector v, so its length is the
33. {1, -3, —4 13] distance from (W,, Wa... w,) to
Vip Voy ow ey Vale
3] f-2] Ff 4) fio] Ov" 100
35. x j ty + ‘73 = | 33. V33 35. 10 37. lb
2 l a) —3
39. b. F, = —T,(sin 6,)i + T,(cos 6,)j
37. a. —3p —-4r+6s= 8 T(sin 6) - T{sin 6,) = 0, T,(cos 6,) +
_ Cc. (Sin vo” 42 sin y = Y, (COS l

4p — 2g + 3r =—3 T,(cos. 6,) = 100


6p+5q-2r+7s= 1 1002 500

3 0} [-4] fe 8 n= Ay b=Ay
b. | +q 3 +r 3 +s q = 3 M1. |lal] ~ 6.3246; the two results agree.
mt

. M3. |Jul| ~ 1.8251


>

39. FT FFTFTFFT
MS. a. 485.1449
M1. [—58, 79, -36, — 190]
b. Not found by MATLAB
M3. Error using + because a and u have
different numbers of components. M7. Angle ~ 9.9499 radians
MS. a. [—2.4524, 4.4500, —11.3810] M9. Angle = 147.4283°

b. [—2.45238095238095,
4.45000000000000,
— 11.38095238095238}
c, |-103 89 -7 Section 1.3
42°20? 21
M7. a. [0.0357, 0.0075, 0.1962]
{76 3 9 3/2 2 1
b. [0.03571428571429, 12 0 -3 9-1 2
0.00750000000000, 0.196190476 19048) _
c, [LE & 183 6 -3
"28? 400° 525 5.{|-3 1 7. Impossible
M9. Error using + because wu is a row vector 2 §
and u? is a column vector.
9. | —130
110
140
“0
;
11. Impossible

Section 1.2 20 —2 -10


13, | 27 6 | 2 1 |
1. V26 3. 26 5. VaR [-4 26 -10 3 10

400 128 0 O 41. a. The jth entry in column vector Ae, is


17. a. /0 1 0 b. 0-1 0O [41,4 ** +, @q)°e; = a, Therefore,
001 0 0 1 Ae; is the jth column vector of A.
b. (i) We have Ae, = 0 for eachj = 1, 2,
-: 12 -4 ..., 4; 80, by part a, the jth column of
19. xy =[-14], yx =) 2 -3 1
A is the zero vector for exch j. That is,
|-6 9 -3 A = O. (ii) Ax = Bx foreach x
21. TFFTFTTTFT if and only if (A — B)x = 0 for each x,
23. (Ab)"(Ac) = bTATAC if and only ifA — B = O by part (i),
if and only ifA = B.
27. The (i, j)th entry in (r + s)A is (r + s)aᵢⱼ =
raᵢⱼ + saᵢⱼ, which is the (i, j)th entry in
rA + sA. Because (r + s)A and rA + sA
have the same size, they are equal.
33. Let A = [a,] be an m X n matrix, B = [04] 51. 32 53. —41558
be an m X r matrix, and C = [c,,} be an MI. a. 32 b. 14679 Cc. —41558
r X $ matrix. Then the ith row of AB is
141 -30 -107
M3. -30 50 18 MS. -117
(2 an, z Aiba, ... 2 ay -107 18 189
j=l
so the (i, g)th entry in (AB)C is M7. The expected mean is approximately 0.5.
Your experimental values will differ from
fuer 4g ours, and so we don’t give ours.
+ ( > arb
(2 apy j
n f a

( J=1
x arb = k=1\j=1
2 ( > (ape)
Further, the gth column of BC has Section 1.4
components
ot

&

r
OuUw
oOo

OON
oot
oon

Bi Lis Dulin °°
Ne
Md

=
&

’ > blige
O-

x Hg Py 2g k=1
|

so the (1, g)th entry in A(BC) is


re |

oor Oo

l

OO
OW ©
S2Oorn

oor.
SGOn-
ooo

r r
=
eS

(SoS
a,( 2 but + an x
or
So
i
1
4

ain > Dulig) = y a >


OS

WO
Ow

Awth-
ooo

OO

3,(2, eed) = 2, (2, esa)


O20

OK


q

Because (AB)C and A(BC) are both m x s


o@DmoerOo
|

-O°C°.
ow

OG

matrices with the same (i, g)th entry, they


rm———
J coo

must be equal.
oo

or

ee

35. anxm bonxn c mxm


Nl
A
|

4
Nw

a5

NA

39. Because (AA’)? = (A7)"A™ = AAT, we see


lI

>

_
1
*

that AA’ is symmetric.


~

1 - 2r a) 0
MI. x = 3) M3. Inconsistent
9x= -2-r-—3s 1 2
x r | 3 Il. x _ 5

Ss —2 2 1-— Ils
13. x = 2, y= —4 M5. x ~=|3 - 7s

nfl
So)
15. x= -3,y=2,z=4 .
[13 — 2p + 145
19. Inconsistent _ r
M7. x = 5455
L ¢
—8
21.x =
—2 —23 -— 5s . 0.0345
* | 1 XT.x= 4s -0.5370
2s M9. x ~ | —1.6240
25. Yes 27. No 0.1222
| 0.8188
29. FFT TFTTTFT
31. x, = -l,x, =3
33. x, = 2, x), = —3
35. x, = 1,x,= -1,x%,=1, x,= 2 Section 1.5

37. x, = —3, x, = 5, x, = 2, x, = -3
la At=|! y
39. All 6, and 8, 0 1
41. All },, b,, 6, such that 5, + b, — 6, =0
b. The matrix A itself is an elementary
‘100 010 matrix.
43.)2 10 45.10
01 3. Not invertible
10 0 1 100
1 Oo 1
pose 1 -60 0 -15] §.a. A'=)0 1 1
0100 0 10 0 0 0-1!
714000 Plo oO | ;
10001 0-12 0 -3 1 0 O11 0 O;}1 0 1
-A=|0 1 OF,0 1 1/0 1 0
o

[ 2-30 5 -10 0 0 -1j10 0 1310 01


-4 121 -20 40 (Other answers are possible.)
31. 0 -6 1 -2
| oO 3 OO 1 [-7 5 3]
J.a. A'=| 3 -2 3
$7.a=1,b=-2,c=1,d=2 L321
1.2857
59. x = 4 61. | 3.1425 1

q 1 2859| 20 olf1 o ov! 9 Alf! 5°


b. A=|0 1 O113 1 O}/0 3 Ol]0 1 0
00 1Jl00 Ilo oO ylloo 4

63.x =|-

100000 0.291 0.199 0.0419 -0.00828 —0.272


~-0.0564 0.159 0.148 -0.0737 —0.0696
0-1 000 0 47. 0.0276 0.145 -0.00841 -0.0302 -—0.025(
9/2 9 41 0 0 0 -0.0467 0.122 -0.029 0.133 ~-0.008¢
0.0116 -0.128 -0.0470 -0.0417 0.178
0 0 0 i 0 0
M1. 0.001783 M3. 0.4397
0000 ! 0
MS. — 418.07 M7. —0.001071
fo 0 0 0 0 1I
Section 1.6
11. The span of the column vectors is R’. . A subspace 3. Nota

—_,
-7 / b. * = subspace
13. a. A= | [37]
-5§ 2 X5 ~26| . Not a subspace . Nota

~]
15.
x = —7r + 5s+ 31, y = 3r — 25 — 21, subspace
z=3r-—2s-1 . A subspace

\o
46 33 30 321 . a. Every subspace of R? is either the
17.|39 29 26 19.|0 3 2 ufigin, a line through the origin, or all
99 68 63 134 of R’.

21. The matrix is invertible for any value of r b. Every subspace of R? is either the
except r = 0. origin, a line through the origin, a plane
through the origin, or all of R?.
23. TTTFTTTFFF
. No, because 0 = r0 for all rE R, so 0 is
25. a. No; b. Yes not a unique linear combination of 0.
27. a. Notice that A(A7'B) = (A4"')B = IB = . {[-1, 3, 0], [-1, 0, 3]}
B, so X = A~'B 1s a solution. To show . {{(-7, 1, 13, 0], [-6, —1, 0, 13]}
uniqueness, suppose that AY = B. Then
. {{(- 60, 137, 33, 0, 1]} 23. Not a basis
A-AX) = A'B, (A'A)X = A'B, IX =
A7'B, and X = A7'B; therefore, this is . A basis 27. A basis
the only solution. . Not a basis 31. {(-1, -3, 11}}
b. Let E,, E,, ..., E, be elementary . Sp(V,, V2,.-- ȴ,) = sp(W,, W2, --- > Mm) if
matrices that reduce [A | B] to (7 | X], and only if each y, is a linear combination
and let C = E,F,., -++ £,E,. Then of the w, and each w, is a linear
CA = I and CB = X. Thus, C = A"! combination of the v,.

[pa
and so X = A™'B.
37/4 + s/8

“trol
29. a, [9 9
35.
3r/2 + s/4

an
4

0
T 0.355 -0.0645 0.161 0
39.|-0.129 0.387 0.0323
1

|-0.0968 0.290 -0.226


wlrwiw

[*:
—(5r + s)/3
. 0.0275 —0.0296 -0.0269 0.0263 37. (r+ 29/3
I
~~

4i.| 0-168 0.0947 0.0462 -0,0757


S

~! 0.395 -0.138 -0.00769 -—0.0771


ied

|—0.0180 0.0947 0.00385 0.0257 39. In this case, a solution is not uniquely
43. See answer to Exercise 9.
determined. The system is viewed as
underdetermined because it is insufficient
45. See answer to Exercise 41. to determine a unique solution.

41. x- yl DHER
2x - 2y=2 0 00;D

3x - 3y = 3 41.T=|1 5 0) H.

49. A basis 51. Not a basis 05 1 1)R

M1. Not a basis -M3. A basis The recessive state 1s absorbing, because
MS. The probability is actually 1. there is an entry | in the row 3, column 3
position.
M7. Yes, we obtained the expected result.
: 2 4037
45. |4 47. 4 49. | .3975
Section 1.7 5 L9 1988

1. Not a transition matrix


ausy
51. | .3462 53.
[288
1916
3. Not a transition matrix .4423 1676
5. A regular transition matrix |.1490
7. A transition matrix, but not regular
Section 1.8
9. 0.28 11. 0.330 13. Not regular
15. Regular 17. Not regular 1. 0001011 0000000 0111001 111111!
1111111 0100101 0000000 0100101

: yO
l 12 TL!1191 0111001
4
19. |) 21. |3 23. | 35 Xq = My +X, Xs = Xp + Ay, Xe =X + X;
ww

4 3 b Note that each x; appears in at least two


8 35
parity-check equations and that, for each
25. TFFITFTFFT combination i,j of positions in a message
word, some parity-check equation contains
[2 2 2] one of x, x, but not the other. As
3s 3s 38 [ 32
explained following Eq. (2) in the text, this
27. A'® = 135 35 35 29 0.47 31.|.47
shows that the distance between code
15 15 15 21
35 35 35 words is at least 3. Consequently, any two
errors can be detected.
33. The Markov chain is regular because 7 has
7. a. 110 b. 001 c. 110
d. 001 or 100 or II! e. 101
no zero entry: s = | 43]: 9. We compute, using binary addition,
lo1rid
i1o1rd0 ot! 0 0)_
ble

0111
101010 =
35. i 37. lorr1001fi9°°!
trsfm

8 11110
de tm

11101

39. The Markov chain is regular because T° 10010


0011 1],
00111
—J
be]

where the first matrix is the parity-check


tite

has no zero entries: 5 =


matrix and the second matrix has as
fa fm

columns the received words.



a. Because the first column of the product On the other hand, if w is a nonzero code
is the fourth column of the parity-check word of minimum weight, then wt(w) is
matrix, we change the fourth position the distance from w to the zero code word.
of the received word !10111 to get the showing the opposite inequality, so
code word 110011 and decode it as we have the equality stated in the
110. exercise.
b. Because the second column of the . The triangle inequality in Exercise 14
product is the zero vector, the received shows that if the distance from received
word 001011 is a code word and we word v to code word u is at most m and
decode it as 001. the distance from v to code word wis at
c. Because the third column of the most m, then d(v, v) = 2m. Thus, if the
product is the third column of the distance between code words 1s at least
parity-check matrix, we change the 2m + 1, a received word v with at most 777
third position of the received word incorrect components has a unique
111011 to get the code word 110011 nearest-neighbor code word. This number
and decode it as 110. 2m + 1 can’t be improved, because if
d(u, v) = 2m, then a possible reccived
d. Because the fourth column of the word w at distance m from both u and v
product is not the zero vector and not 4 can be constructed by having its
column of the parity-check matrix, components agree with those of u and v
there are at least two errors, and we ask where the components of u and v agree,
for retransmission. and by making m of the 2m components
e. Because the fifth column of the produci of w in which x and y differ opposite from
is the third coijumn of the parity-check the components of u and the other
matrix, we change the third position of components of w opposite from those
the received word 100101 to get the of v.
code word 101101 and decode it as . Let e, be the word in B’ with | in the ith
bd
ok

101. position and 0’s elsewhere. New ¢; is not


. Because we add by components, this in C, because the distance from e; to
follows from the fact that, using binary 000...Q0is 1, and 0C0...0€C. Also,
addition,
i + 1=1-1=0,1+0= v +e; Aw + e for any two distinct words
1-0=1,0+1=0-1=1,and v and w in C, because otherwise v — w =
0+0=0-0=0. e, — e, would be in C with wt(e; — ¢) = 2.
Let e, + C= {e,+ v| v © Ch. Note that C
. From the solutions of Exercises 11 and 12,
oo

and e, + C have the same number of


we see that u — v = u + vy contains 1’s in
words. The disjoint sets C and e, + C for
precisely the positions where the two
i= 1,2,..., & contain all words whose
words differ. The number of places where
distances from some word in C are at most
u and v differ is equa! to the distance
1. Thus B” must be large enough to
between them, and the number of 1’s in
contain all of these (n + 1)2* elements.
u — vis wt(u — v), so these numbers are
That is, 2" = (mn + 1)2*. Dividing by 24
equal.
gives the desired result.
. This follows immediately from the fact
23. a3 b.3 ¢c.4sd.5 26 £7
that B" itself is closed under addition
modulo 2. 25. Xg =X, + Xy + Xz + Xe + Xp, XyqQ = Xs + Xe
. Suppose that d(u, y) is minimum in C. By + Xy + Xp, Xyy = Xp + Xz + Ay + Xe + Xp,
X\) = xX, + X; + xX, + xX, (Other answers are
Exercise 13, d(u, v) = wt(u — v), showing
possible.)
that the minimum weight of nonzero code
words is less than or equal to the
minimum distance between two of them.
A-28 ANSWERS TO MOST ODD-NUMBERED EXERCISES
CHAPTER 2 | 1 0; ; fol
35. Lei A =/0 0 if y -|o) w= il. Then
Section 2.1
001 6!
1

. Two nenzero vectors in R? are dependent if


= on} For the second part of the

and oniv if they are parallel.


0|
. Two nonzero vectors in R? are dependent if Loo}. fi]
We

and oniy if thev are parallel. problem,


let A =/0 ! 0}, ¥ =|0}.
. Three vectors in R? are depenaent if and 00 0; 0
Vr

only if they all lie in one plane through the 0


OgiN. wo! | Then -lv = vy and Aw = w, so 4yv
9.20). (1, 4 Lo
and Aw are sull independent.
Li. {{1, 2, £. 2], (2, 1.0. -ij}
39. {¥,, Vo, Va} 41. {u,, u), u,. ug}
13. {{1. 3, 5. 7], (2, 0, 4, 27}
M1. {¥,, V2, Va}
15. {{2. 3. 1], (5, 2, 1]} 17. Independent
M3. {u,, U,, Uy, u,}

19. Dependent 21. Independent


23. Dependent 25. Dependent
Section 2.2
27. {{2, 1, 1, 1], (1, 0, 1, 1), [1, 0, 0, 0),
(9, 0, 1, Qi} la. 2
31. Suppose that r,w, + r,w, + 7,w; = Q, so b. {{2, 0, -3, 1], (3, 4, 2, 2]} by inspection
that 7,(2v, + 3v,) + r(¥, — 2¥5) +

wafil
r;(—v, — 3v,) = 0. Then (27, — r,)v, + c. The set consisting of 3h a
(3r, + 1,)v, + (-2r, — 37,)v,; = 0. We try
setting these scalar coefficients to zero and
solve the linear system

2r,

3r, ry
- rn=0,

= d. The set consisting of Ai 31-1 ,


—2r,
— 3r, = 0 |
a | 3. a. 3
2 o-1 fo) jt & 3]? b. {{1, 0, —1, 0], [0, 1, 1, 0], [0, 0, 0, 1]}
3 1 Of} o|~jo ft 2] O~
iO -2 -3 0 0 1-3 0
0; {6 3
c. The set consisting of yyy!
1 Oo _1 0 477 j17 14
a 5 O}. 1] L3} {0

0 0 0/0
This system has the nontrivial solution r; = d. The set consisting of the vector 7
2,7, = —3,and 7, = 1. Thus, (2v, + 3v,) — 0
3(v, — 2v,) + 2(—v, — 3v,) = 0 for all choices
of vectors v,, V2, ¥; & V, so W,, W2, W; are 5. a. 3
dependent. (Notice that, if the system had b. {[6, 0, 0, 5}, [0, 3, 0, 1], [0, 0, 3, 1]}
only the trivial solution, then any choice of
O} | 1] 42
independent vectors v,, v,, v, would show . The set consisting of 2} | ql or the
this exercise to be false.)
a

Oj 12] Ll
33. No such scalars s exist. set of column vectors {e,, e,, e;}

—5] 23. Yes; 7-'([N). Ny. X5]) = [XX — Xp Xp — Ny).

. The set consisting of 3 5 . Yes: T'([v,. Xs. X4]) =

Ww
6 . —X, + 3, — 2x) x, + x, - 2x,
:. 4 . 4 .
The rank ts 3; therefore, the matrix is not
~~
s

invertible.
- Yes. (7' ° T)"'((x,, 2) = Lz =3% +> 203
23]
The rank is 3; therefore, the matrix is
w
.

invertible.
~TFTTTFTTFT
~TFTTFFTFFT
. a. Because
. No. If A is m x nand Cisn X 5s, then AC
we

is m X s. The column space of AC lies in Tu + vy) = [+ 4,1) + ¥,...,


RR” whereas the column space of C lies in u, + y%,0,...,
0)
= [uj,W%,...,%,0,...,0] =
[V,, ¥2,---,%,0,...
0]
= T,{u) + T,(¥)
~

and
. LetA = and C= |! | Then T,(ru) = [74, me, ..., my, 0,..., 0)
\o

rank (AC) = 1 < 2 = rank (A). =r[u,, W,...,%,0,...,


0)

5. a. 3 b. Rows 1, 2, and 4 = rT(u),


we know that 7; is a linear
transformation. We see that 7,[V] C W,.
tion 2.3 which means that it is a subspace of W’,.
x
by the box on page 143.
. Yes. We have 7([X,, X;, X3]) = ! | : 2| b. Because 7,[V] is a subspace of
eed

tL -3 0
Xx; T,,. [VY], we have
Example 3 then shows that 7 is a linear d,.xd,=d,s rar <d,=n.
transformation.
If the jth column of H has a pivot but
. No. T(O[1, 2, 3]) = 70, 0, 0}) = [1, 1, 1, 1), the (j + 1)st column has no pivot, then .
whereas 07((1, 2, 3)) = O[1, 1, 1, 1] = d;,, = d;. However, if the jth and
[0, 0, 0, 0}. (j + 1)st columns both have pivots,
5. (24, —34] 7. (2, 7, -15] then d,,, = d; + 1. See parts c and d for
illustrations.
9. [4, 2, 4]
| c. If d, = d,= | and d, = d, = 2, then H
[802, —477, 398, 57] 13 | -3| must havc the form
34 = 1]
eed

e _ 5 e

Lilt fi -1 3
pXxxxl
00 pxX
-{1 1.0 7W}t oft
0000
100 1 0 0
0000
. The matrix associated with 7" T is

5 i} (T* © T(x, x) = (2x, 3x, + 4) 10 0 0 0

‘Yes; T"\([x,, X)) = [2214 “1 “1 where a p denotes a pivot and an X


=

° 4 denotes a possibly nonzero entry.


°

d. Me d, “< l, dy = dy = u, = 2. and u: =< i. = as Well as below pivots in ihe reduced


3. then A must have the form row-echelon lorm H shows that the only
f possible such linear comb‘nation of
nonzero rows of Hf is | times the Ath row
|
}
plus 0 times the other rows. This gives a
characterization of the Ath nonzero row of
H=| Hf in terms of the row space of -4 and
completes the demonstration that the
|| reduced echelon torm of 4 is unique.
35. [588, 160, 8]
| Q 9 09 a
37. Undefined because T, > T; is not invertible.
where a p denotes a pivot and an %
Genotes a possibly aonzerc entry.
Section 2.4
. The number of pivots in H is ihe number
of distinct dimension numbers in the list
1. B2cause [i 3 I; ‘= [x _ el whicn
d,, a;, d;,..., d,. Moreover, a pivot
occurs in the (j + 1)st coiumn if and only
ifd,., = d + 1; the row and column has the form “ we see that the range of
~~ J
positions of pivots are completely T, Consists of veciors along the line y =
determined by the list d,, d,, d,,... . d,. 2x. Now 71, 2]) = [-5, -10] # [1, 2],
The pivot in the Ath row occurs in the jth and projection onto a line must leave fixed
column if and only if dis the kth distinct the points that lie on the line, so T, is not
positive integer in the list d,s, d;,.... a projection on the line.
d,. Because the numbers d, depend only
I
on A (they are defined just in terms of the 3 a. UV2 1n4 2 302
row space of A), so do the number and -UV2 2 -V32 it
locations of ihe pivots, the number and
locations are the same for all echelon -V372, 4
forms of A.
“| -L -3/2
. Let ¥ be a reduced echelon form of A.
fl — mm? 2m
Part e shows that the number of pivots
and their locations depend only on A, and
l+nel +m
consequently the number of zero rows 1. 2m m1
depends only on A and is the same for all l+m? i +m

CCF
choices of H. Suppose that the pivot in 11. In column-vector notation, we have
the kth row of H is in column j. Consider
now a nonzero row vector in R’ that has
y x 1 O}LY
entries zero in all components
corresponding to columns of H containing / 1 4 |? 1 Sh which represents a
pivots except for the jth component, LO 1jtl OF LY
where the entry is 1; entries in reflection in the line y = x followed by a
reflection in the y-axis.

CCE
components not corresponding to pivot
column locations in H may be arbitrary. 13. In column-vector notation, we have
We claim that there is a unigue such

218 phe
vector in the row space of A—namely, the
Ath row vector of H. Such a vector must
be a linear combination of the nonzero 01
rows of H, which span the row space of A. reflection in the x-axis followed by a
The fact that there are zeros above reflection in the y-axis.

15. In column-vector notation, we have 7. x,=3-22

B})--B)- 6 105 Ib IB)


whicn represents a horizontal shear followed
xX = -3 + t

x)

by a vertical] expansion followed by a 4+


vertical shear.
3+ x,=3- 2
9. ||v| = Vv- vand @= 270 xp = 3 +6
uv
cos"! Vieurvv-v
I+
u-u v-v +—_+—_+—_ ++ x
-—4-3 i123 4

Section 2.5


—3-
1. “2 - 4+

dic d,c
a+
=~.
d? dt, x;
d?+dp “
> a t

11. a xX, =-2+4.%=4-6


b. x) =3- 31, xX, = —] — 21,x,=6- TC;

Cc. x, =?2 — 34, X) = Stm=4- 12¢

13. The lines contain exactly the same set


{(5 — 34, -1 + 0) | t € R} of points.

15.

{4 1 4s\
17. —=, — —| 19. 3 3 l, 7 l
2 2 4, 22 2 2

21. (2, 3, 4) 23. x, + X, + xX) = 1


25. 2x, + %) — xX; = 2

27. x, + 4x, + x, = 10
29. xX, + x; = 3, ~— 3x, — 7x, + 8x, + 3x, = 0

(There are infinitely many other correct


linear systems.)
31. 9x, — x, + 2x, — 6x, + 3x, = 6
33. X, tx + x,t x t+ x, +x, = !

1
35. The 0-flat x = i
| 2

| 2
37. The !-flatx ={-1l}/+¢/1
| 0 l

| -8 0
39. The I-flat x = ao +1 7
0 2

43] b. Now cos 2x = cos'x — sun*x =


a“ pb] (—1) sin*x + (1) cos*x, which shows
41. The O-flat x = 1| that cos 2.x € sp(sin*x, cos*x)
l s 9
Li c. Now cas 4x = cos-2x - sin-2x =
aT EE TETETEI (1 - sin?2x) ~ sin?2x = 317) +
(—2)sin?2x, which shows that cos 4x,
CHAPTER 3 and thus 8 cos 4x, is in sp(7, sin?2x).
Section 3.1 9. a. We see that v,, 2v, + ¥, € sp(v,. v.): and
therefore,
J. Not a vector space 3. A vector space sp(v,, 2¥, + ¥3) C sp(v,, ¥,).
. Not a vector space 7. Not a vector space Furthermore, v, = lv, + O(2v, + v,) and
Ui

. © vector space 1}. A vector space ¥, = (—2)y, + I(2¥, + v2), showing that
oOo

Vv). ¥, © sp(v,, 2¥, + ¥,): and therefore.


43. A vector space 15. A vector space
sp(v,, ¥.) C sp(v,, 2v, + Y,).
17. a. [—1, OJ] 1s the “zero vector°
b. Part 5 of Theorem 3.1 in this vector Thus, sp(¥,, ¥2) = sp(¥,, 24, + v2).
space becomes 7[—1, 0] = [—1, 0], for 11. Dependent 13. Dependent
all r € R. That is, (0, 0] is aot the zero
15. Independent 17. Dependent
vector 0 in this vectoi space.
19. Independent 21. Not a basis
27. Both 2 x 6 matnces and 3 x 4 matnces
contain 12 entries. If we number entnes in 23. {1, 4x + 3, 2x? + 2}
some fashion from | to 12, say starting at 25. TFTTTFFFITT
the upper left-hand corner and proceeding
35. Let W = sp(e,, e,) and U = sp(ey, e,, es) in
down each column in tum, then each
R°. Then WO U = {0} and each x € R°
matrix can be viewed as defining a
has the form x = w + u, where w =
function from the set S ={1, 2, ..., 12}
x)e, + x,e, and u = x3e; + xe, + X€s.
into the set R. The rules for adding
matrices and multiplying by a scalar 39. In deciding whether sin x and cos x are
correspond in each case to our defin:tioas independent, we consider linear
of addition and scalar multiplication in the combinations of them with scalar
space of functions mapping S into R. Thus coefficients. The given coefficients f(x) and
we can identify both M,, and M,, with &(x) are not scalars. For a counterexample,
this function space, and hence with each consider f(x) = —cos x and g(x) = sin x.
other. We have (—cos x) (sin x) + (sin x) (cos x)

29. R24 R> RS Py My,


Mn Px Pys My, 41. The set of solutions consists of all
M5 Ms. Ms;
functions of the form A(x) + p(x), where
30
h(x) is the general solution of the
corresponding homogeneous equation
Section 3.2 SAX) + fralgye) + + fi(x)y" +
Sim)y' + foldy = 0.
Not a subspace 3. Nota 43. a. {asin x + bcos x|a,b€
R}
subspace b. {asin x + bcosx +x] a, bE R}
A subspace 45. a. {ae + be* + c|a,b,cER}
3
. a. Because | = sin*x + cos*x, we have b. fae + be +¢-2 1p Aya,
c = c(sin2x) + ¢c(cos?x), which shows 27 9 81
that c € sp(sin’x, cos?x). b,c
€ R}

7. One basis B for ti’ consists of those /, for 3. A linear transformation. ker(7) is the zero
ain S defined by {(a) = | and f(s) = 0 for function, invertible
s#ainS.1FfE Wand f(s) # 0 only for . A linear transformation, ker(7) is the zero

Wa
s€ {aa . ..a,}. then f= f(a,) at function, invertible
L(4No, + +> * + f(a,)f,. Now the linear
combination g = Cs, tof, torn + . The zero function is the only functioi: in

~
Cm, 18 a function satisfying g(b) = c, for ker(7).
1.
J=1,2, ..., mrand g(s) = 0 for all other .{ce" + oe — zsin x | c¢,, c, € R}
5
s € S. Thus g € W, so B spans only W.
11. {cx + Cc, + cos x |G, € R}
(The crucial thing is that all linear , l
eombinations are sums of onlv finite 13. {c, sin 2x + ¢, cos 2x + Gx — lc, ¢. € R}
numbers of vectors.) 2 ly
. {ce + x +c, - 5x — 3 | c,, G, ¢; E R}
17. T(v) = 4b) + 4b, + 7b; + 6b,
‘tion 3.3 19. T(v) = —17b, + 4b, + 13b; — 3b,

0000
1. (1. -1] 3.(2,6,-4] 5. (2, 1, 3] _13 000
Zl. a. A= 0200
7.14, i, -2, 1] 9. [Lt 1-0
001 0
1. (1,2, -1, 5]
4 0
3. p(x) = [e+ 1) - IP +(e 4 1) - IF —5 = 12 ,
—[(x+1)-1)-1 b. A 10 19) - 12% 12x? — 10x ++ 10
=(x+ 1p
- 304+ IP+ 3v4+ 1) -1 —13} , 10
+ (xt lp - 2+ 1)+1 c. The second derivative is given by
— (xt+i)t+i -5 0
— | A 2
38 = 0
39° oo
30x - + 16
=(v + 1)'- 2x +1) + Ov + 1) +0
| 4) [ 16
so the coordinate vector is [1, —2. 0, 0}. 23. a. Die’) = et + 2xes Die) = rer +
1S. [4, 3, -5, -4] 4xe" + 2es D(xe") = xe’ + & D(xe’) =
17. Let x? — 402 + 3x + 7 = B(x — 2) + xe + 2e; Die’) = es De) = e
b,(x — 2)? + bx — 2) + by. Setting x = 2. 100
we find that 8 - 16+6+7=h,s0 b= A=|4 1 0
5. Differentiating and setiing x = 2, we 221,
obtain 3x? — 8x + 3 = 3b,x — 2) + 2b(x
b. From the computations in part a, we
~ 2)+5;i2-16+3=5,5,=-1.
Differentiating again and setting x = 2, we 100]
have 6x — 8 = 65,(x — 2) + 26,,12 - 8 = have A, =|2 | 0}, and computation
2b,; b, = 2. Differentiating again, we (0 i iy
obtain 6 = 65,, so b, = 1. The coordinate shows that A,? = A.
vector is {1, 2, —1, 5].
25. We obtain A = Fe | in both part a
19. b. {f(x). (9) 21. 2x7 + 6x + 2
and part b

9 0 0
ction 3.4 27.}0 25 0
0 0 81
1. A linear transformation, ker(7) =
{f€ F | f{-4) = 0}, not invertible
29. (a + c)e* + be + (a + ce 3 [-34-4
3

33. -2b sin 2x + 2a cos 2x 7, 120 9. -9


. lf 4, 1s the representation of 7, and A, is 1a, Qs) — G,Dy — Gb, = -(b,a, — b,a,)
ar
ou

the representation of T, relate to BB’, |b. b,


then A, + A. 1s the representation of se Wh, yy!
ZT, + 7, relative to BB’. Wf as the |G a.
representation of 7 relative to B.B". then
rA is the representation of r7 iclative to . -6i + 3) + Sk 15. Oi — uj + Ok
BB’
221 + 18j + 2k
31 . Let V = Dy, the space of infinitely
-FTTFFTFITTF
differentiable functions mapping i into R.
Let 7(f) =f’ for fE D,. Then range(7) = . 38 23. \/62 25. 2
D, because every infinitely differentiable
function is continuous and thus has an ve 29. 16 31. V 390
antiderivative, but 7(x) = T(x + 1) shows
that T 1s not one-to-one .a'(bxc)
= -6,
a X (bX c) = 121 + 4k
Sect ion 3.5 .a-(b xc) = 19,

1 . Not an inner product


ax(bxc)=3i-7jt+k
37. 2U 39. S 41. | 43. :
3 . Not an inner product
45. Not collinear 47. Collinear
5 . Not an inner product
49. Not coplanar 51. Not coplanar
7 . An inner product
53. 0 55. |lal[?libI/
9 . Not an inner product
57. ixax j)=ixk
= -j, but(@i xi) xj=
V1. a2 b —& d
“Oxj=0.
"6 "V3 "V9
i3. f(x) = -x + + and g(x) = cos wx. Other i j k
answers are possible. 59. b Xc= b, b, b, = (b,c, _ bsc,)i _-

C, Cy C;
15. 77
(b,c; _~ b,c, + (b,c, ~ b,c,)k. Thus.

CHAPTER 4 a:(b xc) = a,(b,c, — byc,) —


a,b,c, — b,c) +
Section 4.1 a,(b,c, — b,c)
Q, a, a,
1. -15 3. 15
= |b, b, bj.
b, b, 6, C, Cy Cy
5. b-(b xc)= 5b, Db, by
Equation (4) in the text shows that this
IC, Cr Cy determinant is +(Volume of the box
= b,(b,c; — b,c) ~ b{5,c ~ b;c,) + determined by a, b, and c). Similarly,
5 b,c, — 5,¢,)
= 0, C, &, Cy
(ax b)-c =c-+ (aX b) = Ja, a a,
C, C, Cy b, b, b,
c-(bxc)= |b, , b,
CG 6; = ¢,(a,b, — a;b,) —
|
€,(a,b, — ab,) +
c,(b,c, — bycy) — ¢(dc — b,c,) +
ca,b, — a,b,),
€,(0,C, — b,c,)
0 which is the same number.

61. 322 63. 130.8097 Section 4.3


M3. a. 67.38i — 88.76) — 8.11k
b. 111.7326 1. 27 3. -8 5. —195
7. —207 9. 496
13. Let R,, R,...., R, be square submatrices
positioned corner-to-corner along the
Section 4.2 diagonal of a matrix A, where A has zero
entrics except possibly for those in the
1.13 3.-21 519 7. 320 submatrices R;. Then det(A) =
det(R,) det(R,) - - - det(R,).
9.0 11.-6 8§=©613.2

oly
15. 4
17. 54 19. 5
15.
21.FTTFTTFTFT
23. 0 25. 6 27. 5,- 2 29. 1,6, -2 if! 4 Hf
33. Let A have column vectors a,, a),..., a, 17. A'=-—=| 7 -6 -1l1
Let 'T)§ -3 3
ar 5
19. A'=-—| 4-1 -8
I3}_) -3 2
[-7 -1 4
2 - adj(4) = |-3 2 -8
i 6 -4 -1

25. The coefficient matrix has determinant 0

i
b,a,|.
| so the system cannot be solved using
Cramer’s rule.
Then AB = b,a, b,a, oe 27. x, =0,x,=1
29. x= 3.x, = -txy=

31. x, = 0,%,=3,% = 0
Applying the column scaling property n = 4
33.
times, we have * * 59
35. FTTFTTFTTF
41. 19095. The answers are the same.
43. With default roundoff, the smallest such
det(4B)
= b,|a, ba,
- °° ba, value of m is 6, which gave 0 rather than |
as the value of the determinant. Although
this wrong result would lead us to believe
that A® is singular, the error is actually
quite small compared with the size of the
entries in A®, We obtained 0 as det(A”) for
6 = m = 17, and we obtained
= 5,b,/a, a, bya, os ba, — 1.470838 x 10'° as det(A'*). MATCOMP
gave us — 1.598896 x 10" as det(A”).
With roundoff control 0, we obtained
det(A‘) = 1 and det(A’) = 0.999998, which
= -++ =(bb, -+ + 6.) det(A)
has an error of only 0.000002. The error
= det(B) det(A}. became significantly Jarger with m = 12,

where we obtained 11.45003 Tur the CHAPTER 5


determinant. We obtained the same result
with m = 20 as we did with the default Section 5.1
roundolf.
With the default roundoff. computed ¥,, ¥;, and vy, are envenvecrars off with
entries of magnitude less than corresponding cigenvalucs - 1.2. and 2.
(0.0001 )(Sinailest nonzero magnitude in respectively. v,, ¥;. and ¥, are eigenvectors
A”) are set equal to zero when A” is of A,. with corresponding eigenvalues 5, 5,
reduced to echelon form. As soon as 7 is and 1, respectively.
large enough (so that the entries of 1” are
large enough). this creates a false zero . Characteristic puivnomul 4° — 44 + 3

no
entry on the diagonal, which produces a Eigenvalues: A, = 1, A, = 3
—r
ralse calculation of 0 for the determinant. Eigenvectors: for A, = 1: ¥, = | "} r#0,
With roundoff zero, this does not happen.
But when #7 get sufficiently large. roundoff
error in the computation of 4” and in its for A, = — Lash 5 #0

Loo
- 2§

reduction ‘o echelon form creates even


greater error in the calculation of the Characteristic polynomial: ? + |

wn
determinant, no matter what roundoff Eigenvalues: There are no real eigenvalues.
control nuinber is taken. Notice, however, ~l. Characteristic polvnomial. —A? + 247 +
that the value given for the determinant 1s A-2
always small compared with the size of the Eigenvalues: 4, = —1, A. = 1, A, = 2
entries 1n.4”.
Eigenvectors: for 4. = —\:v, =] °|rj.r 4 0,
45.
I2
-I!
-8
43
-7
5 0
9 -6 -l | 0
for A, = | a |s40
(/-4827 114 -2211 2409 5
47.| 73218 76 -1474 1606 =
“| 8045 -190 3685 -4015 for A, = 2: v, =|-2|,¢ 40
| 1609 -38 737 -803 LJ
Section 4.4
9. Characteristic polynomial: —\* + 87 +
A-8
Eigenvalues: 4, = —1, A. = 1, A; = 8
1.V213 3.V300- 5. 11 7. 38 0
V6 5 Eigenvectors: for 4. = —-\:v, =| r|,r 490.
9.2 MN. 13. = 0
. Let a, b, c. and d be the vectors from the 0
origin to the four points. Let A be the for A, = 1: v, =|-s], 5 40
n X 3 matrix with column vectors b — a, 5
c — a, and d — a. The points are coplanar —t|
if and only if det(A7A) = 0. " for A; = 8: ¥, = ipte
17. The points are not coplanar. 19. 32 t

11. Characteristic polynomial: —\? — \? +


21. 14427 23. 264 25. er 8A + 12
Eigenvalues: A, = A, —2, A; = 3
27. 5V3 29. 254V3 31. 16/17
—r
33. a. O b. 0 Eigenvectors: for Ay, A. = —2:¥, =| SI,
35. TFTFETFETTFT r

rand s not both 0, . +V5 -


cr 31. Figenvaluey A, = aS. A. = 1=™S
, 0
for A, = 3:y, = —fl. t x 0
1+ VS,
| Ligenvectors: for A, = ——

13. Characteristic polvnumal —d* + 22 + = ‘{¢ + YP) 0.


4. - 8
Eigenvalues: A, = —2,4. = A, = 2
1- V5
—r for A, = >TO
Eigenvectors for A, = —2: ¥, ={—3r\,r #4 0,
= | ~ |. 4 0

0
for Az, Ay = 2:¥,=| 5|,5 #0 . a. Work with the matrix A + 10/, whose
%
eigenvalues are approximately 30, 12,
15. Characteristic polynomial. —a* - 37 + 4 7, and —9.5. (Othe: answers are
Eigenvalues: A, = A, = —2,A,= 1 possible.).
0 b. Work with the matrix A — 10/, whose
Eigenvectors: for A,. Ay = —2:¥, =| rj, r #0, eigenvalues are approximately —9.5,
0 —8, —13, and —30. (Other answers are
35 possible.)
for A; = I: v, =|-s|,5 #0
35 37. When the determinant of an n X n matrix
A 1s expanded according to the definition
17. Eigenvalues: A, = —1, A, = 5
. r
using expansion by minors, a sum of n!
ELigenvectors. for A, = —1l:¥, = | |: r#0, (signed) terms is obtained, each containing
a product of one element from each row
forA, = 5: ¥, = ss #0 and from each column. (This is readily
proved by induction on 7.) One of the ji!
19. Eigenvalues: 4, = 0, A, = 1, A; = 2 terms obtained by expanding |A — Adj to
obtain p(A) is
on

Eigenvectors: for A, =
RK
Oo

(ay, — A)(@_2 — A)(Gy3 — A) + + + (Gan — A).


ll

Oo
-“

~
I

We claim that this is the only one of the n!


terms that can contribute to the coefficient
for A, = I: ¥, = 5 of A"~' in p(A). Any term contributing to
0

:
the coeffictent of A”~' must contain at least
n — 1 of the factors in the preceding
forA, = 2:¥, =|0|,140 product; the other factor must be from the
l remaining row and column, and hence it
21. Eigenvalues: A, = —2,4,; = 1,4, = 5 must be the remaining factor from the
diagonal of 4 — Al. Computing the
Eigenvectors: for A, = —2: ¥, =| |, coefficient of A’~' in the preceding product,
we find that, when —A is chosen from all
but the factor a, — Ain expanding the
product, the resulting contribution to the
for A, = |: ¥) =
coefficient of A”' im p(A) is (—1)""!a,. Thus,
the coefficient of A”~' in p(A) is
for A, = S:v,=|—-Se], 14 0 (-1)"""(a,, + ay + Qj); + °° ° + ,,)-
Now p(A) = (=1)"(A ~ A)(A = A:)(A ~ da)
23. FFTTFTFTFT > ++ (A — A,). and computation of this

product shows 1n the same way that the 0


coefficient of A"! is also

a)
A, = 16, ¥, = oft #05 a = 36, v=

Ise

(—1yrt(y + Ay + As ts + AY).
Thus, tr(A) = Ay +A, + AV tees +A,
0|
oe
39. Because {2 ~ > vi fex- 5A
+ 7, we SAO AR AH Ay = ,fand u
compute | wu
not both zero
A - a+ m=|? 7) “yt
1 3 1 3 53. Characteristic polynomial. —\? — 16d? +
48\ — 261
7 (1 0J_/3
|= . 5],f-10
+ 5],[70
+|
io!; [5 8 -5 -15| |07 Eigenvalues. d, = 1.6033 + 3.3194i,
_19 4 A, = 1.6033 — 3.31947, A, = —19.2067
lo 0}’ 55. Characteristic polynomial: s* — 56.3 +
illustrating the Cayley-Hamulton ineorem. 210A? + 22879 + 678658
4 . The desired result is stated as Theorem 5.3
Eigenvalues: A, = — 12.8959 + 13.3087i,

u
Lad

in Section 5.2. Because ann X nm matrix is A, = —12.8959 — 13.30871,

N
the standard matrix representation of the Ay = 40.8959 + 17.4259i,
linear transformation T: R" — R" and the A, = 40.8959 — 17.4259:
eigenvalues and eigenvectors of the
transformation are the same as those of Ml. a. A? + 16A? — 48A + 261;
the matrix, proof of Exercise 42 for A, = —19.2067,
transformations implies the result for A, = 1.6033 + 3.3194i,
matrices, and vice versa. A, = 1.6033 — 3.3194;
45. a. F, = 21; b. A‘ + 1443 -- 131A? + 739A — 21533;
b. Fy) =832040; A, = ~22.9142,
c. Fy = 12586269025; A, = 10.4799,
A, = --0.7828 + 9.43701,
d. Fy, = 5527939700884757;
II

A, = —0.7828 — 9.43707
e. Fisy ~ 9.969216677189304 x 10”
M3. A of minimum magnitude =
47. a. O, 1, 2, 1, -3, -7, —4, 10, 25; —2.68877026277667
2-3 1
vi7. A of maximum magnitude ~ 19.2077
b|1 O 0};
dA of minimum magnitude ~ 8.0) 47
0 1 0
e. ax = — {91694
Section 5.2
Foo fg!
f

Tl £0;4,=3,4 0 1. Eigenvalues: 4, = —5, A, = 5


49. A,=-Il,¥,=
=
oO,
—-2
5
Eigenvectors: for A, = —5: v, = | "|, r

0 s
for A, = sm |ail.s¥ 6
S#0;A, =A, =2,¥,= "| ¢and u not

t
both zero

3. Eigenvalues: A, = -1, A, = 3 9.b.4=0 “ 0


Eigenvectors: for A, = —1:¥, = } r#0,
21. T([x, y])= *\x + 2my,
for A. = 3: ¥, = [74 540

fife
2mx + on - OM
23. If A and B are similar square matrices,
-l -z -1 0
then B = C"'AC for some invertible matrix
C. Let A be an eigenvalue of A, and let
5. Eigenvalues: 4, = —3, A, = 1,4, =7 {¥,, ¥2,.-., Vy be a basis for E,. Then
Exercise 22 shows that C"'v,, C''v,,.. .,
r
C-'v, are eigenvectors of B corresponding
Eigenvectors: for A, = —3:¥, = | r#0,
to A, and Exercise 37 in Section 2.1 shows
0
that these vectors are independent. Thus
5 the elgenspace of A relative to B has
for A, = rns lfor 0, dimension at least as great as the
s eigenspace of A relative to A. By the
symmetiy of the similarity relation (see
Exercise 16), the eigenspace of A relative te
A has dimension at least as great as the
eigenspace of A relative io B. Thus the
dimensions of these eigenspaces are
equal—that is, the geometric multiplicity of
A 1s the same relative to A as relative to B.
7. Eigenvalues: A, = —1,A, = 1,A,;=2 31. Diagonalizable
33. Complex (but not real) diagonalizable
—-r
35. Complex (but not real) diagonalizable
Eigenvectors. for 4, = —1: y, -| 0|,7 #40,
37. Not diagonalizable
tL "J
39. Real diagonalizable
—S§
41. Complex (but not real) diagonalizable
for A, = cn] a
Section 5.3
0
Cerne
pote

Noi

for A, = 2:¥, = exe l.a. A=


a

[ee -aofl se)


b. Neutrally stable;
-l1 -1 0 -100
C= 0 0 1;,,D=| 010
oO

1 3 0 002 er

; a} The sequence starts 0,| oe 2


mie

9. Yes, the matnx is symmetric.

ws (2) ACL
11. Yes, the eigenvalues are distinct.
IW COlA

1 2} 1 1 }-1
eive™

3 —_
13.FTTFTFFTFT

3
15. Assume that A is a square matrix such that
which checks..
D = C"'AC is a diagonal matrix for some
invertible matrix C. Because CC7' = J, we d. For large k, we have |! a = 2 =
have (CC)? = (C"')'C’ = fF =I, so(C’)'
= (C')". Then D? = (CAC) =
Led

2
CTA™C")"', so A’ is similar to the diagonal
wi No WI

» S04 ~ 3:
matrix D’.

CHAPTER 6
Section 6.1

© [=a lil ser [+> ) 2


. = [3,4

[- i} The sequence starts 1, 0, ! p, = [1, 0, 0}, p, = (0, 2, 0], p, = (0,0. 1

Lj
? >

Ww
- I
3 5. p= 42, —-3, 1, 2]

se |
8
and 4'{tl=
sli) aa al
Ld
i 9

4 7. sp({1, 0, 1], [—2, 1, C))


which checks. 9. sp({—12, 4, 5})
_ {i i1. sp({2, —7, 1, 0], [-1, —2, 0, 1))
d. For large k, we have |“ | <li = | 13. a. -—Si+
3j +k b. -Sit+
3j +k
on 3
7 I 1.
SG a, =~
= 15. 5 [5, 4, 1] 7 5 (5, 3, 1]

1 3 I
19. = (2, -1, 5 21. = (3, -2, -1, 1]
4-|
1
i
0
b. Unstable; 3
L
23. FTTTFTTEFT 29. =
* ‘ 3 _l _1\f
c. 2 4 4
31. is 33. vil6l
— 35. V10 37.
3
. The sequence starts 0, 1, 12 >
5h © seq 22
Section 6.2
27 [3 ae1 l x32
and 4 0 32 2 ° = 1
3 l == =

H 1. (2,3, 1)-[-1,1, -l] = -2+3-1=(

|
so the generating set is orthogonal.
[Ue

| which checks. = pl! 36, 29, 103].


“IDO

3. [1, -1, -1, 1)- (I, 1, I, 1]


pS

=
1-1-1+1=0,
Ags — l 3 ‘ 3
d. For large k, we have a, | = (3) I fl, -1, -1, 1)-[-1,
0, 0, 1]=
-1+0+0+1=0, and
sO a, * kel 1?
and a, approaches « as k
[1, 1, 1, 1] [-1,
0, 0, 1) =
approaches «. -1+0+0+1=0,
7 Xx; _ —k,e*! + 4k,e" so the generating set is orthogonal; b,, =
" |X ~ k, e! + 3k, e {2, 2, 2, 1].
xX —2k, e' + ke 1 ! - art
5. {rll 0, —2], 79 | 7), 3]
k, e! + ke"

x ke" + ke! + ke" 7. to 1, 0}, will 0, u|


Ih. 2] = ke! + ke"
ke!
9. | alt, 0, Wal I, 1, ells 2, -
a S3

XY —k,e"' ~ ke!

all, 0, 1, 0}, [0, 1, 0, 01. ell. 0, -|


Ny = ke

Xy ke! + 3ke' 11.



9 9 4 1 5 tot 1 32-3
13 |=, -3,= 5. |-,0, --,= 2 2 2
5 | I EF 0 3 | ; 1 49|-3 6 2
|
17. | atl
|
0. 1, 0), waar 2, 1, 0),
6 6 6 2 3
=|
1 “<4 ; 11. 23
Vl 1, —1, 0), (0, 0. 0, u| 6| | 6
19. {(3, -—2, 0, 1), [-9, —8, 14, 11)}
13. l -1 1 | V5 2 A
1 1 Al 1 1 IS. 5 |7 ‘ ) ‘
2). | el 1, ty], V0 —1, u|

0 1 1v2 t1
23. {{2, 1, —1, 1], (1, 1, 3, 0], (-24, 9, 5, 44}}
1
25. FTTFFTFITTT ~ V2 ~3 0 2

17. i 1
WV -11V3 16
27.Q0=| 0 V3 aw, 0 1 ow. 5
WW. V3 “Ue
W2V2 V2
R=(|0 V3 -1N3 19. FITITTTTET 23. i i
0 0 46,
27. An orthogonal matnx A gives rise to an
Ts

33.| 2 sin x, cos x orthogonal linear transformation 7(x) =


Ax that preserves the magnitude of vectors.
2 Thus, if Av = Av, so that 7{v) = Av, we
35.
{1 lesa e- e+ | must have |{v|] = ||Avi] = |Al||vj]. If v is an
eigenvector, so that v # 0, it follows that
|A| = 1; soa = +1.
action 6.3
33. No 35. Yes 37. Yes

1. Let A be the given matrix. Then


Section 6.4
TA = 1 ft -1])_1—_ f 1 —— 1)_1f20
AvA “Alt ale i slo >
4 2-2 1 2
_|1 0
2 1! —1|, projection =5 l
0 1]
u
-2 -1 #1 —i
so A is orthogonal and A“! = A? =
; | 34-35 1 86
fi -1 P==|-3 26 15 , projection = 37 13
Viti iy 5 15 10 25
. Let A be the given matrix. Then
boo

i} 371 2 1|2
if 2 3 -6],[ 2-3 6 5. Pre -!l 5 2 » projection = 5 8
ATA==|-3
T4 = —|—- 6 27)~ 3 6 2|= 2 2 2 5
6 2 3) [-6 2 3
[10 -1 3 10
1/49 9 90 100
_tli-1 19 6 -1 ontinn cx
49 049 O;=/0
1 0}, PRT 3 6. 3 3 |» Projection
0 049) (001
10 -1 3 10
so A is orthogonal and d A“! = A? = 41
2 3 -6 1 | 40
7|73 6 2). 21|27
6 2 3 4

100 1000 37. The projections are approximately


9. 010
00 0
i1,|90000
100 1.864516] | 1.058064
1.496774 187097
4.116125
2.574194
and
0001 —.135484]’ | -.941936]° 1.11612
13. If P is the projection matrix for a subspace 2.819355] | 2.077419 3.154835
W of R" and if b € R*.is a colunin vector, respectively.
then the projection of b on W is Ph.
Because Pb is in W, geometry indicates Section 6.5
that the projection of Pb on W is again Pb.
Thus, P(Pb) = Pb, so P’b = Pb and _ 1164 , 60.4
(P> - P)b = 0 for all b € R”. It follows 59 59
from Exercise 41 in Section |.3 that b. = 7.092 inches
P?~P=0,so P= P. 3. a. y= .528e?"** b. = $27,
-FTTFFTFFTT 17.1 102
bad

sn

Lad
WwW
5. +

oi |
I
‘=<
a. QO, | 35°

Wi
b. 0 has geometric and algebraic
multiplicity n — k,

<
[
1 has geometric and algebraic


multiplicity k.

T
q
c. Because the algebraic and geometric
multiplicities of each eigenvalue are Pe dk T
T

equal, P is a diagonalizable matrix.


oO
, OS

2 . Then X nidentity mairix J for each


o—_.

positive integer n.
T T

[2 129
25 25
in

_ 33,35° 102357
T

23. 1225 1625


= T

E 0 1
13 -18 0 -12 -++++-++++—
> x

2 4
25.
1/-18 36 12 0
49 0 12 %1I3 -18
7. y= —-0.9 + 2.6x
-12 0-18 36
(3 0 0 i 10 14 y

27.
To 2-11 _ 0

0 1 12 14
33. Referring to Figure 6.11, we see that, for
+++
oo

p = Pb, the vector from the tip of b to the


tip of pis p — b, which is also the vector
from the tip ofp to the tip of b,. Thus, the
vector
b, = b + 2(p — b) = 2p — b=
y = —0.9 + 2.6x
2(Pb) — b = (2P — Db.
. The projections are approximately
1.151261 3 1.932773
— 1,184874),; 3], and] —.806723],
3.89916} [-! 4.378151
respectively.
9. y=0.1 - 04x 4+ —1
11. y= 5. (3, 5.1. 1) 7.2 +Ox4+2 9. BI
16 + 2x 13. 4.5 min
--2
15. Let ¢ = x — c, where c = (27,a,)/m. The
data points (a, — c, b,), (a, — ¢, 6,),....
(a,, — Cc, 6,,) have the property that 11. a Cye =
27, (a; — c) = 0. Exercise 14 then shows
that these data points have least-squares
linear fit given by y = ry + r,t, where ry
and r, have the values given in Exercise 14.
Making the substitution ¢ = x — c, we see
that the data points (a,, 5,), (a, 6,),...,
(a,,, 0,,) have the least-squares linear fit
given by y = ry + 7,(x — ©).
_ 0
UA J
Ar In

17. X= 19.x=| 2

Ot

I
eee
a

H
&

J
Oe
21. FFTTFFTTFF

\
23. See answer to Exercise 17.


25. See answer to Exercise 19.
o-=—=
27. The computer gave the fit
vr

y = 0.7587548 + 1.311284x with a

—--o0
leasi-squares sum of 0.0389105i.
17.
29. We achieved a least-squares sum of
5.838961 with the expoential fit
y = 0.8e°*, The computer achieved a
ms oeCS

least-squares sum of 6.34004 with the


vt ts

&
|
-_—

exponential fit y = 0.88748362'9", The . Cos =


Qo


I

fit using logarithms tries to fit the smaller


——

y-value data accurately at the expense of -1 1


the larger y-value data, so that the percent 21. Con =| 1
accuracy of fit to the y-coordinates is as
good as possible. 23. TFITFTFIFTT
31. The computer gave the fit 25. Cas = Coe-Cae
y = 12.03846 — 1.526374x witha
least-squares sum of 0.204176. Section 7.2
33. y = 5.476 — 0.75x + 0.2738x
35. y = 5.632 — 1.139x + 0.1288x? +
0.05556x3 + 0.01512x!
37. y=-5-8x+9x7-x

CHAPTER 7
Section 7.1

1. (-1, 1] 3. [-4,
-2, 1, 5]

4° (2 —4
4

NI
$365 3 0
5. Rp =|—> —3 ~2], Rs =|—4 —16 21. A, = —2, A, = Ay = 5; 2_, = spl] |

ILA
3)
~9
3 73 72 12
! 3 }
oF

Ih
0
caf BB
_ 3 2
E, = sp -3 ; hot diagonalizable
2
024 2

1 0 0 |! —2 72] 23. FTTFITTTFFT


7, R, =|0 1 0), Ry ==|-2 i —2!,
0 0-1 3} _> —2 1
if! 1 ~21 CHAPTER8
C=5]1 -2 j
il ot 4 Section 8.1
100 100
9.R,=|0
0 0|,R, =/0 1 Ol,
001 000 1.u=(3
_ {3 iz . [
-6) E 3 -3 1
101
C=5I1 ol 1-4 3 1-2 3
0 2 0|
3. U=l0-1 -8|,4=|-2 -1 -4
200 211 0 0 0 1-4
11. R,=|2 2 O},R» =|0 2 21,
112 002 -|3 4 _[-2 4
[foo1 s.u=|-} §.a=| a4
C=|010
100 8 i 8 372
(000 0) jo 7.U=(|0 12 a+ S$ 1-1
3.R,-|2 99% p 18 6 00 2 -1 10
cue 020 0) f 0
0010 0 O
9. [= [UNS UMA) ae
x

0001
10010
C=l0100 1 = Pye “VION TE nee
Pid “Ly UV10 3/10
2 1-3 x) finn val
i5./-1 0 1 S PI iW irV3) Lop Wt 5
00 1
AVA, = -1,A4,=5, FL, = o({1}}
x] [v3 -1rV2. 11% |
15. z v3 InV2 -1rVot|!
WV3 0 26 a
E,= so({-1]} diagonalizable
—t? + 2t? + 21,
1
19. A, = 0,A,= == Oj): Watc=kac=b
I 19. —5.4721361,2 + 3.4721361,
0 ] 21. —4.021597¢,? + 1.3230571,? + 4.69854
E,= sp|| 1 , £, = sp f ; diagonalizable
0 l 23. —4t? + tt? + 40? + Lg

Section 8.2

1. C = (1/√2)[ 1  −1;  1  1 ];  t₁² − t₂² = 1   [graph of 2xy = 1]

3. C = (1/√2)[ 1  −1;  1  1 ];  2t₁² + 0t₂² = 4   [graph of x² + 2xy + y² = 4]

5. C = (1/√10)[ 3  −1;  1  3 ];  11t₁² + t₂² = 4   [graph of 10x² + 6xy + 2y² = 4]

7. C = (1/√5)[ 1  −2;  2  1 ];  7t₁² + 2t₂² = 8   [graph of 3x² + 4xy + 6y² = 8]

9. The symmetric matrix of the quadratic-form portion is A = [ a  b/2;  b/2  c ]. Thus,
   det(A − λI) = λ² − (a + c)λ + ac − b²/4.
   The eigenvalues are given by λ = (1/2)(a + c ± √((a + c)² + b² − 4ac)); they are real numbers because A is symmetric; they have the same algebraic sign if b² − 4ac < 0, and they have the opposite algebraic sign if b² − 4ac > 0. One of them is zero if b² − 4ac = 0. We obtain a (possibly degenerate) ellipse, hyperbola, or parabola accordingly.
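A quick numerical check of this sign test can be sketched in MATLAB (the particular coefficients are just an example):

    a = 10;  b = 6;  c = 2;        % the form 10x^2 + 6xy + 2y^2
    A = [a  b/2;  b/2  c];
    lam = eig(A);                  % real, because A is symmetric
    disp(lam')                     % both positive here: like signs, an ellipse
    disp(b^2 - 4*a*c)              % negative, as expected (b^2 - 4ac = -4 det A)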
11. Let Cᵀ = [ 1  0  0;  0  b/r  c/r;  0  −c/r  b/r ], where r = √(b² + c²). Then C is an orthogonal matrix such that det(C) = 1 and
    [t₁; t₂; t₃] = Cᵀ[x; y; z] = [x; (by + cz)/r; (−cy + bz)/r].
    Thus,
    [x; y; z] = C[t₁; t₂; t₃] = [t₁; (bt₂ − ct₃)/r; (ct₂ + bt₃)/r]
    represents a rotation of axes that transforms the equation ax² + by + cz = d into the form at₁² + rt₂ = d.
13. Hyperboloid of two sheets
15. Hyperboloid of one sheet
17. Hyperbolic cylinder
19. Hyperboloid of one sheet
21. Circular cone or hyperboloid of one or two sheets
23. Elliptic cone or hyperboloid of one or two sheets
25. Parabolic cylinder or two parallel planes or one plane or empty
27. Hyperbolic paraboloid or hyperbolic cylinder or two intersecting planes

Section 8.3
1. g has a local minimum of −7 at (0, 0).
3. g has no local extremum at (0, 0).
5. g has a local maximum of 3 at (−5, 0).
7. g has no local extremum at (0, 0).
9. The behavior of g at (3, 1) is not determined.
11. g has a local maximum of 4 at (0, 0, 0).
13. g has no local extremum at (0, 0, 0).
15. g has no local extremum at (7, −6, 0).
17. g(x, y) = y… + 10   19. g(x, y) = …
21. g(x, y) = x… + y… + 40
23. The maximum is .5 at ±(1/√2)(1, 1). The minimum is −.5 at ±(1/√2)(1, −1).
25. The maximum is 9 at ±(1/√10)(1, 3). The minimum is −1 at ±(1/√10)(3, −1).
27. The maximum is 6 at ±(1/√2)(1, −1). The minimum is 0 at ±(1/√2)(1, 1).
29. The maximum is 3 at ±(1/√3)(1, −1, 1). The minimum is 0 at ±(1/√(2a² + 2b² − 2ab))(a − b, a, b).
31. The maximum is 2 at ±(1/√(2a² + 2b²))(a, b, b, −a). The minimum is −2 at ±(1/√2)(0, −1, 1, 0).
33. The local maximum of f is λ₁a², where λ₁ is the maximum eigenvalue of the symmetric coefficient matrix of the form f; it is assumed at any eigenvector corresponding to λ₁ and of length a. An analogous statement holds for the local minimum of f.
35. g has a local maximum of 5 at (0, 0, 0).
37. g has no local extremum at (0, 0, 0).

Section 8.4
1. … Rayleigh quotients: −2, 1, 5.2. Maximum eigenvalue 6, eigenvector …

l l l
a=Z (ates Var oF= Mac - 2)
/Wr=15/,
Wy =l19], Wa =| 65
tod

7 29 103
= 5 (ates Va~ ot aB).
Rayleigh quotients. 6,6, 28
yleigh quotients: “4 = 4, 4222
0D = 3.5
b. If we use part a, the first row vector of
A - Alis

=—
—__—
Maximum eigenvalue 3, eigenvector

Ue
[a — A, } =

Wi
ie-es (a —c) + 48°), 7
1

bo l=
Ni—N]—

al
Nl

= (p= VETS, bh
to T= bof
re
aA

|
Nie

Ni
|

Wu

wu

c. From part a, eigenvectors for the


tT

matrix are [-b, g+ Ve? + BY] =


Ww l—
Al— Gof—

Go fm
IN

wl
Dl Al—

[-5, g = h]. Normalizing, we obtain


Ws foe OS
Wo fm

+ Ob,b,"
Go fm

(-b,g = h)
Wl
NO

. Using the upper choice


“|,

bt + (g + hy’
a bm

Al—

haa fm
eo fm
Wl
cc

of sign and setting r= V& + (g + A)’,


we obtain [—D/r, (g + h)/r] as the first
Go fe Le Je
ro Je

|e

column of C. Using the lower choice of


Rad Pomme ts fmm
po)
Nl

sign and setting s = VP + (g — A)’, we


—_,
—_

wo f— Geo
e
Ni=

obtain [— b/s, (g — A)/s] as the second


Nol
uo

wd |

column of C.
A = 12, v = [-.7059, 1, -.4118]
A = 6, ¥ = [-.9032, 1, -.4194] d. det(C) = He + ae) = 26h,
eo Ww

rs
A = .1883, v = [1, 2893, 3204] because A, 7, s = 0, we see that the
. A, = 4.732050807568877, algebraic sign of det(C) is the same as
v, = r[l, —.7320508, 1} that of 0.

v, = 5[.3660254, mo1, 3660254] A,A; == 7.906602974286551,


17.09857610264269
ae awe Th 0, 1), 27. A, = —5.210618568922174,
nonzero 1, , A, = 2.856693936892428,
1. A, =
16.87586339619508, A; = 3.528363748899602,
v, =
7[.9245289, 1, .3678643, .7858846] Ay = 7.825560883130143
A, =
—15.93189429348535, 29. 5.823349919059785,
v, =
$[—.3162426, .6635827, —1, —11.91167495952989 +
— 004253739] 1.3578 30063519836
Ay = 6.347821447472841, 31. 57.22941613544168,
¥; = t[-.5527083, .9894762, 8356429, —92.88108454947197,
1] —54.25594801085533,
Ay = -.291790550182573, FT BTN FAAIB 9074955974531
v, = u[—1, 06924058, .3582734, ,
9206089]
for nonzero r, s, t, and u
3. a. The characteristic polynomial |A − λI| = | a − λ  b;  b  c − λ | = λ² − (a + c)λ + (ac − b²) has roots λ = (1/2)(a + c ± √((a − c)² + 4b²)).

CHAPTER 9
Section 9.1
1. a. z + w = 4 + i, zw = 5 + 5i   b. z + w = 3 + 2i, zw = −1 + 3i

3.a . [2] = = (3 — 25), zz = 31. a. Both


Mane ot b. Hermitian but not unitary
b. [z)= V/17,z7
= 4+ i, zz = c. Not Hermitian but uniiary
(4 - (449
= 17 = |Z?
d. Neither
3
la. +r 1. —13 9
33. TY FFTTTTFF
a2! b. 3 +(- =}
41. Diagonal matrices with entries of modulus 1 on the diagonal.
9. a. Modulus 2√2, principal argument 3π/4
11. 16 17. FTFFTFFTFT
M1. See answer to Exercise 3.
19.2 4+ Vi, -12 + Vi, -V2 - Vi,
V2 - Vi a 1+i 0 2+¢
NV | Lb+ti -lt+i ol MS. |-4 +

wd
MN. !,i,-1, -i


-I +i -1-2i i -6i
23. 2,2 + Vi, 21, -V2 +V/2i, -2.
M7. 2
—-V2 -V2i, -21, V2 - Vi
M9. Entering [Q, R] = qr(A), where A is the matrix having the given vectors a₁, a₂, a₃ as column vectors, returns a matrix Q having
as column vectors an orthonormal basis
—3+2i 2i 2i {q,, dh, 4}; where
3. AB = 2
2+3i -lt+i
2i 1
2+i
|,
q, = [-0.5774, -0.5774i, -0.5774i],
—2+2i i 2-1 q, = [-0.4695 — 0.17193,
BA=|2+3i I+3i 0 ~0.4695 — 0.17193, 0.2977 + 0.6414i],
21 -Il+i 0 = [0.4695 + 0.4291, 0.0726— 0.6414:
0.3703 + 0.1719
- If 2+: -i
~3/-1-i 1 To check, using MATLAB, the Student’s
Solutions Manual’s answers
| 9= 3 1 + 3i -4 + 8i
7. -3+i 3-f -2-61 Vv, = a, V) = a, V; = [1 — 31, -3 + i, -2
10} 944i 2-41 2-4i for an orthogonal basis, enter
3a +i ((1-1)/Q(1,1))*Q¢,1) to check v,, enter
-Z=— 9-31 11. sp||1 + 37 (1/Q(1,2))#Q(:,2) to check v,, and enter
10| 6 -2i 2 ((1—3+i)/Q(1,3))*Q(:,3) to check v,.
13. 3 18. a. (u, v) = 0, wv, u) = 0 MII. a. 0/274, 4476, 458, and V/353 for
b. uv) = 5 - 34,0, =5 + 3 rows |, 2, 3, and 4, respectively.

21. a. Perpendicular d. Parallel b. V277, V192, V529, V124, and


439 for columns |, 2, 3, 4, and 5,
b. Parallel e. Perpendicular
respectively.
c. Neither
c. —45 — 146i

23. i d. 31 + 147

25. [-3i, 1,2 + 2i]


Section 9.3
27. {2 +i, 1 + i, [1 - i, -2 + i}
29. {{(l, i, i], [1 + 3i, 3 — 24, a, A [-i _[0 0]
[1 + i, i, 1 - 2i)} I | 1 | D= k 2]

3. U -|¢ aoe (1 M8 |,jo -(0 9. @. A, = -1.A, =A; = A= AS = 2.


b. (J + 1) has rank 4 and nullity | for
k2 1.
i 0 E 0 O
5. U i 0 l,,D=|0 1 0 (J — 24) has rank 3 and nullity 2,
v2 0 V20 001 (J — 2/) has rank | and nullity 4
‘for k = 2.
pave 0 a aI)
7.U=| -2V% 0 I ec J+he0),
0 I 0 | J —- 2h e,>e,-0,e, > e, > 0.

[3 0 0 d. Je,= —e,, Je, = 2e,, Je, = e, + 2e;.


D=| 0 3 O Je,
= 2e,, Je,
= e, + 2es.
|0 0 3 30000
03100
a + aI ° (+ avs] 11. 00310
9.U= 00008
| 0 V3 00003
_ .0 ‘1 1 0000 0 0
D=| 0-3 0 0 100 0 0 0 4
| O 0 | 0 0 1 10 0 0 0
000 1 1 0 0 0
13.
(-1 -d)/V8B 0 (3 + 3i/ 00001 0 0 0
WU =| (1 - V8 + DV rll 0 0 0 0 0-2 0 0
| OV -ifV3 2/24 00 0 0 0 0-2 0
0 0 0 0 0 0 O -2]
10 0 0
D=!|01 0 15. | 001 ol (51s| ![I (Other bases are possible.)
10 0 4
13. {a EC | Jal = 4} 15.a=-1 100 ° (Ol j1
17. 04 1], 3} 0}? (Other answers are
19. FTTFTFTTFF 00 4 | l
possible.)

tion 9.4 1 0 0 of} {fol fs | 0]


0 2 0 0 OF JI} JO 0; 0
19. 0 O-!1 1. Of, §)Of,/0},)- ia
1. Yes 3. No 5. No 0 0 0-1 OF }O} /0 K l
Toa. Ay =A, =A, = A= 2. 109 O O O -1) {L105 LO li 04)
(Other answers are possible.)
b. J + 27 has rank 3 and nullity 1,
Doon.

aQNnoo

(J + 2/) has rank 2 and nullity 2,


ooo°o
NODO
Gone

(J + 2/) has rank | and nullity 3, 21. , fe, + €3, ©c, Cr, Cg, Cy — e;}

(J + 21* has rank 0 and nullity 4


00002
fork = 4,
(Other answers are possible.)
c. J+ 2fe,7e,>e,>e
> 0. 23. TFTTFTTFFF 25. O 27. O
d. Je, —2e,, Je, = e, — 2e,, 29. At + (3 — DAP + (3 - 3DA? +
Je, ey —_ 2e;, Je, = ey — 2e,. (1 — 3)A -

CHAPTER 10 | 0 0 1 2 -]
W.L=|3 1 OU=|0 1 5
Section 10.1 4-10 1 0 0 55|
1. There are n − 1 flops performed on b while the first column of A is being fixed up, n − 2 flops while the second column of A is being fixed up, and so on. The total
number is(n — |) +(n-2}+-+-+ 42+ 1 00 0 | .2 -3
1 = n(n — 1)/2, which has order of 13.2 _| 2
or 1 0 00 ,U _|0-
0 1 9Q
0-9
magnitude n?/2 for large n.
3. mn, if we call each indexed addition a flop 4 27 | 0 0 0
5. mn? 7. 2n} 9. 3n3 11. 6n3 3
13. 30/2 15. 2 17. 2n x =
~|
19. wn, counting each final division as a flop L 2
21. FTFFTTFRTF
15. | 4 a
23. (No text answer data are possible for this -| 17.) 2
problem, since different computers run at
different speeds. However, for n large
enough to require 6 seconds or more to 1 0 O1 O Olf: 4 -
solve an n X mn system, our computer did 19.£DU=|0 1 O}}0 | Oo 4
require roughly 50% more time when using 3-4 1410 0 14]l0 9g
the Gauss—Jordan method than when
using the Gauss method with back 21.FFTFTFTFFT
substitution.)
25. See answer to Exercise 23. 23. Display:
1 3 ~-5 2
4-18 39 0 -}
Section 10.2 3 1666667 9 -2 4.16€
2. 2777778 1.407407 - 4.185185 5.42
1. It is not significant; no arithmetic ~6 1.166667 6666667 —4.141593 45.47¢
operations are involved, just storing of
l 5

ei
indexed values.
-| -8
For b,x =|} 0}; forb,x=|] 9]; for b,x =
2 i
~2 4
8125
25. |_| 1593 | nf ia
7 —.3125

~ 09131459
— 08786787
—.4790014
29. | "01573263
08439226
| 2712737

31. The ratios roughly conform on our 268


computer. One solution that requires 100
have x =
~2100 . Corresponding
flops took about | of the time to form the 3720
L/U display, which requires about 102 ~1820
flops, etc. ; components of the solutions of H,x = b
33. The ratios roughly conform on our and of H,x = ¢ differ by as much as 5220.
computer, in the sense explained in the
answer to Exercise 31. 4 -6
19. We find that H,-' -t = |_é ‘].

Section 10.3
21. We were able to complete a reduction for
1. xX, = 0,x,= l inverses of the Ist, 2nd, 4th, 8th, and 16th
powcrs of H,. The inverse checks for H,
3. x₁ = 1, x₂ = .9999. Yes, it is reasonably
and H,? were pretty good, and we expect
accurate. 10x, + 1000000x, = 1000000,
that
—10x, + 20x, = 10 is a system that can’t
be solved without pivoting by a five-fizure 16 -120 240 -140
computer. -| = —120 1200 -2700 1680
5. a, + 10'%x, = 10", —x, + 2x, = | Ay 240 -2700 6480 —4200
-140 1680 —4200 2800
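The kind of failure described in Exercise 3 can be reproduced with the standard small-pivot example below (a sketch in MATLAB; this is not the system from the exercise, and double-precision roundoff plays the role of the five-figure machine):

    e = 1e-20;
    A = [e 1; 1 1];  b = [1; 2];
    m = A(2,1)/A(1,1);                    % huge multiplier, no row interchange
    U = [A(1,:); A(2,:) - m*A(1,:)];
    c = [b(1); b(2) - m*b(1)];
    x2 = c(2)/U(2,2);  x1 = (c(1) - U(1,2)*x2)/U(1,1);
    disp([x1 x2])                         % roughly [0 1], far from the true solution
    disp((A\b)')                          % MATLAB pivots and returns about [1 1]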
. We need to show that (m(nA)"')A = I. It is
~~!

easy to see that (rA)B = A(rB) = r{AB) for and


any scalar r and matrices A and B such
that AB is defined. Thus we have 91856 -1029120 2471040 -— 1603840
(HR)! = —1029120 11566800 —27820800 18076800
(n(nA)"')A = (nA)'(nA) = I.
2471040 —27820800 66978000 —43545600|
. On our computer, [x,, x] = [-107°, 107%], — 1603840 18076800 —43545600 28322000|
\o

which is approximately correct.


11. MATCOMP yielded the solution x, = The entries in our “inverse’’ of H,‘ were of
—10°°, x, = 10°°, which is approximately the right order of magnitude; if the entries
correct for the system of H,” are of order of magnitude roughly
10°, then we expect the entries of H,?" to
10x, + 10°x, l -be of order roughly 10%. Also, our inverse
10°x, + 2(10°)x, 1. matrix for H,‘ was symmetric, but the
inverse check was not very good. For H,*
13. Both with and without the scaling routine, and H,'°, our “inverses” contained entnes
MATCOMP successfully inverted the given of completely wrong orders of magnitude,
matrix on our computer. and the inverse matrices were not
ae i _ [-46l | symmetric.
15. For b, we have x = | 96) and for c, we

_ | -36
have x = | 361
M1. For n = 5, b = [I, 0);
The two components of b differ from the for n = 10, b = [I, 0};
corresponding ones of c by 10 and by 18. for n = 15, b = 10°[1.68, —2.90];
32 for n = 20, b = 10'[6.265, — 6.688];
17. For b, we have x =
240 for n = 25, b = 10'92.09, — 3.09};
and for c, we
— 1500)
1400 for n = 30, b = 10*[-6.56, 6.97].
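The behavior of the Hilbert matrices discussed above is easy to reproduce in MATLAB, which supplies hilb; a brief sketch:

    H = hilb(4);
    disp(inv(H))                          % essentially the integer matrix shown above
    disp(cond(H))                         % about 1.6e4, so H is already ill conditioned
    disp(cond(H^2))                       % roughly the square of that
    disp(norm(inv(H^4)*(H^4) - eye(4)))   % the inverse check degrades for higher powers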

APPENDIX A

1. Let P(n) be the equation to be proved. Clearly P(1) is true, because 1 = 1(1 + 1)(2 + 1)/6. Assume that P(k) is true. Then
   1² + 2² + 3² + ⋯ + k² + (k + 1)²
      = k(k + 1)(2k + 1)/6 + (k + 1)²
      = (k + 1)(2k² + k + 6k + 6)/6
      = (k + 1)(k + 2)(2k + 3)/6.
   Thus, P(k + 1) is true, and P(n) holds for all n ∈ Z⁺.

Let P(n) be the equation to be proved. We see that P(1) is true. Assume that P(k) is true. Then
   1 + 3 + 5 + ⋯ + (2k − 1) + (2k + 1) = k² + 2k + 1 = (k + 1)²,
   as required. Therefore, P(n) holds for all n ∈ Z⁺.

Let P(n) be the equation to be proved. We see that P(1) is true, because a(1 − r²)/(1 − r) = a(1 + r) = a + ar. Assume that P(k) is true. Then
   a + ar + ar² + ⋯ + ar^k + ar^(k+1)
      = a(1 − r^(k+1))/(1 − r) + ar^(k+1)
      = a(1 − r^(k+1) + r^(k+1)(1 − r))/(1 − r)
      = a(1 − r^(k+2))/(1 − r),
   which establishes P(k + 1). Therefore, P(n) is true for all n ∈ Z⁺.

The notion of an "interesting property" has not been made precise; it is not well defined. Moreover, we work in mathematics with two-valued logic: a statement is either true or false, but not both. The assertion that not having an interesting property would be an interesting property seems to contradict this two-valued logic. We would be saying that the integer both has and does not have an interesting property.
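A quick numerical spot check of the formula in the first answer (illustration only):

    for n = [1 5 10 50]
        lhs = sum((1:n).^2);               % 1^2 + 2^2 + ... + n^2
        rhs = n*(n + 1)*(2*n + 1)/6;
        fprintf('n = %2d:  %d  %d\n', n, lhs, rhs)   % the two columns agree
    end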
INDEX
absorbing Markov chain, 114 closure, 89, 192 dependent set, 127, 194
absorbing state, 114 code Descartes, René, 419
addition binary group, 118 determinant, 239, 241, 245. 252
of matrices, 42 Hamming (7, 4), 119 coiputation of, 263
preservation of, 142,213 code word, 117 expansion of, 254
of vectors, 6, 7, 180 codomain, 142, 213 properties of, 256, 257
of words, 118 coefficient matnx, 54 De Witt, Jan, 419, 424
adjoint, Hermitian, 469 coefficient, scalar, 10, 191 diagonal, main, 41
adjoint matnx, 269 cofactor, 252 diagonal matrix, 41, 305
algebraic multiplicity, 312, 403 column space, 91, 136 diagonalizable matnx, 307, 477
angle column vector, 13 diagonalizaLle transformation, 404
between vectors, 24, 235 commuting matrices, 48 diagonalization, 307
preservation of, 166 complete induction, A-3 orthogonal, 354
argument complex conjugate, 457 diagonalizing a quadratic form,
of a complex number, 459 complex number, 455 415
principal, 459 argument of, 459 diagonalizing substitution, 410
asymptotes, 420 conjugate of, 457 difference of matrices, 43
augmented matrix, 54 imaginary part of, 455 differential, 280
axiom of choice, 198 magnitude of, 456 differential equation, linear, 216,
back substitution, 58 modulus of, 456 320
band matrix, 510 polar form of, 459 dimension
band width, 510 real part of, 455 of a subspace, 131
basis complex vector space, 465 of a vector space, 199
Jordan, 492 component of a vector, 4 direction vector, 168
ordered, 205 composite function, 149, 213 distance, 31
orthonormal, 340 composite transformation, 149, 214 between vectors, 233
standard, 10 conic section, 418 betwcen words, 117
standard ordered, 205 conjugate domain, 142
of a subspace, 93 of a complex number, 457 dominant eigenvalue, 289
of a vector space, 197 of a matrix, 469 dot product, 24
basis matrix, 389 conjugate transpose, 469 preservation of, 166
Bezout, Etienne, 257 consistent linear system, 58 properties of, 25
binary alphabet, 115 contraction, 162 echelon form, 57
binary group code, 118 coordinate vector, 205, 389 reduced, 63
binary word, 116 unit, 22 eigenspace, 296, 476
biorthogonality, 301! coordinatization, 180 eigenvalue(s), 288, 476
Bombelli, Raphael, 458 Cotes, Roger, 375 algebraic multiplicity of, 312,
box, 274 Cramer, Gabriel. 257, 267 403
volume of, 276, A-9— Cramer’s rule, 266 eigenspace ot, 296
Bunyakovsky, Viktor Yakovlevich, cross product, 241 geometric multiplicity of, 312,
25 properties of, 246 403
Cardano, Gerolamo, 458 D’Alembert, Jean Le Rond, 290 left, 301
Cauchy, Augustin-Louis, 25, 257, decoding of a linear transformation, 297
310, 409 maximum-likelihood, !20 properties of, 296
Cayley, Arthur, 3, 37, 75 nearest-neighbor, |20 eigenvector(s), 109, 288, 476
Cayley-Hamilton theorem, 302 parity-check matrix, 122 left, 301
chain, Markov, 114 deflation, 442 of a linear transformation, 297
change-of-coordinates matrix, 390 degenerate ellipse, 420 properties of, 296
characteristic equation, 291 degenerate hyperbola, 420 Eisenstein, Ferdinand Gotthold, 40
characteristic polynomial, 291 degenerate n-box, 274 elementary column operations, 67
characteristic value, 288 degenerate parabola, 421 elementary matrix, 65
characteristic vector, 288 dependence relation, 127. 194 elementary row operations, 56


ellipse. 418 Gauss reduction with back information vector, 288, 318
degenerate, 420 substitution, 60 inner product, 24, 230
ellipsoid, 424, 427 Gauss-Jordan method. 62 product, Euclidean, 467
elliptic cone, 425, 427 general solution. 59, 97 inner-product space, 230
elliptic paraboloid, 425, 427 general solutiun vector, 91! invariant subspace, 500
equal vectors, 4 generating veciors, 90, 191 inverse
equation geometric multiplicity, 312, 403 additive, 180
characteristic, 79| Gibbs, J. Willard, 214, 241 of a matnx, 75
panty-check, 117 Gram, Jorgen P., 343 of a product, 77
zank, 139 Gram—Schmidt process, 342 ofa transformation, 151 221
equivalence relation, 315 Grassman, Hermann, 3, 191 inverse image, 142, 213
Euclidean inner product, 467 group code, binary, 118 invertible matrix, 75
Euclidean space, 2 Hamilton, William Rowan, 3, 241 invertible transformation, 151, 21
subspace of, 89 Hamming, Richard, 120 irrational number, 454
Euler, Leonhard, 172, 267, 290, 424 Hamming (7. 4) code, 119 irreducible polynomial, A-3
expansion, 162 Hamming weight, 117 isomorphic vector spaces, 208, 22
by minors, 254 Heaviside, Oliver, 214 isomorphism, 221
Fermat, Pierre, 419 Hermann, Jacpb, 172 Jacobi, Carl Gustav, 351
Fibonacci sequence, 287 Hermitian adjoini, 469 Jacobi’s method, 436
field, 464 Hermitian matnx, 471 Jordan, Camille, 490
finite-dimensional vector space, 199 spectral theorem for, 479 Jordan basis, 492
finitely generated vector space, i91, Heron, of Alexandna, 5 Jordan block, 488
198 Hilbert, David, 444, 534 Jordan canonical form, 313, 324,
flat, 171 dilbert matnx, 47, 534 489, 491
parametnic equations of, 172 homogeneous linear system, 88 sordan, Wilhelm, 63
vector equation of, 172 nontrivial solution of, 88 kernel, 148, 218
flop, 503 trivial solution of, 88 Lagrange, Joseph Louis, 244, 290
force, resultant, 5 Hooke’s law, 370 Laplace, Pierre Simon, 257
force vector, 3 Householder matnx, 359 LDU-factonzation, 521
torm hyperbola, 420 least squares, 373
Jordan canonical, 313, 324, 489, asymptotes of, 420 Legendre, Adnen-Mane, 375
491 denegerate, 421 Leibniz, Gottfned von, 251
negative definite, 433 hyperbolic cylinder, 424, 427 length
polar, 459 hyperbolic paraboloid 475, 427 preservation of, 166
positive definite, 433 hyperboloid, 425, 426, 427 ofa vector, 21
quadratic, 409 hyperplane, 171, 331 of a word, 117
free vanables, 59 idempotent matrix, 86, 365 l'H6pital, Marquis de, 251
Frobenius, Georg Ferdinand, 137. identity matrix, 41 line, 168, 171
310, 35! identity transformation, 150 along a vector, 11
full matrix, 510 ill-conditioned matrix, 534 parametric equations of, 169
full pivong, 530 image, 142, 213 vector equation of, 168
function, | 42 inverse, 142, 213 linear combination, 10, 191
codomain of, 142, 213 imaginary axis. 455 linearly dependent, 127, 194
composite, 149, 215 imaginary number, 455 linearly independent, 127, 194
demain of, 142 inconsistent linear system, 58 linear system(s), 14, 51
local maximum of, 431 independent set, 127. 194 consistent, 58
locai minimum of, 433 independent vectors, 127, 194 general solution of, 59
one-to-one, 219 induction, A-1 homogeneous, 88
onto, 219 complete, A-3 history of, 52
scalar multiplication of, 183 induction axiom, A-1 inconsistent, 58
range of, 142, 213 induction hypothesis, A-2 overdetermined, 101, 370
sum of, 182 inequality particular solution of, 59
weight, 237 Schwartz, 24, 29 solution of, 51
fundamental theorem of algebra, triangle, 21, 30 square, 95
455 infinite series, 43] underdetermined, 96
Gauss, Carl Friedrich, 40, 66, 375 information portion, 117 unique soluiion case, 96

unstable, 534 conjugate transpose of, 469 skew symmetric, 48


ear transformation, 142, 213 determinant of, 239, 241, 245, symmetric, 43
codomain of, 213 252 trace of. 302
composite, 149, 214 diagonal, 41, 305 transition, 102 105
diagonalization of, 404 diagonalizable, 307, 477 transpose of, 43
domain of, 213 difference of, 43 tridiagonal, 315
eigenvalue of, 297 eigenspace of, 296 unitarily diagonalizable, 479
eigenvector of, 297 eigenvalue of, 288, 476 unitarily equivalent, 478
identity, 150 eigenvector of, 288, 476 unitary, 470
inverse of, 151, 219 elementary, 65 unstable, 534
invertible, 151, 219 full, 510 upper-triangular, 86
kemel of. 148 Hermitian, 471 zero, 42
matrix representation of, 223, Hermitian adjoint of, 469 matnx addition, 42
396 Hilbert, 47, 534 matnx multiplication, 38
neutrally stable, 289, 320 Householder, 359 matmx operations, properties of, 45
nullspace of, 148 idempoteut, 86, 365 matnx representation(s)
one-to-one, 218, 219 identity, 41 similarity of, 399
onto, 219 ill-conditioned, 534 standard, 146
orthogonal, 356 inverse of, 75 relative to ordered bases, 223.
range of, 231 invertible, 75 396
stable, 289, 320 Jordan block, 488 maximum-likelihood dccoding, 120
standard matnx representation Jordan canonical form of, 313, message word, !17
of, 146 489 method of least squares, 373
unstable, 289, 320 left eigenvalue, 301 midpoint of a line segment, 170
volume-change factor of, 278 left eigenvector, 301 minor matrix, 250
ic segment, 170 lower-trniangular, 86 minors, expansion by, 254
midpoint of, 170 LU-factorization ef, 518, 522 modulo 2 arithmetic, | 16
cal maximum, 431 main diagonal of, 41 modulus of a complex number, 456
cal minimum, 431 minor, 250 Monge, Gaspard, 172
wer-triangular matrix, 86 nilpotent, 86 multiplication,
J display, 517 normal, 482 of matrices, 38
J-factorization, 518, 522 nullity of, 139 scalar, 6, 7, 42, 180
2claumn, Colin, 267 nullspace of, 91, 136 n-box, 274
agnitude orthogonal, 350 degenerate, 274
of a complex number, 456 panity-check, 122 volume of, 276, A-9
order of, 506 partitioned, 54 natural number, 454
ofa vector, 21, 232, 467 product of, 38 nearest-neighbor decoding, 120
ain diagonal, 41 product by a scalar, 42 negative definite quadratic form,
ap (see function) projection, 155, 360, 363 433
arkov, Andrei Andreevich, 105 OR-factorization of, 344 neutrally stable transformation,
arkov chain, 105 rank of, 137 289, 320
absorbing, 114 rank equation of, 139 Newton, Issac, 5
regular 107 reduced row-echelon form of, 63 nilpotent matnx, 86
athematicai induction, A-i regular transition, 107 nontrivial solution, 88
atnx (matrices), 36 row equivalent, 56 norm
addition of, 42 row space of, 91, 136 properties of, 21
adjoint, 269 row-echelon form of, 57 ofa vector, 21, 232, 467
augmented, 54 scalar multiplication of, 42 normal matrix, 482
band, 510 similar, 310, 399, 477 spectral theorem for, 483
basis, 389 singular, 75 normal mode, 320
change-of-coordinates, 390 skew symmetnic, 48 nullify of a matrix, 139
characteristic polynomial of, 291 spectral decomposition of, 444 nullspace
coefficient, 54 square, 36 of a linear transformation, 148
column space of, 91, 136 standard generator, 119 ofa matrix, 91, 146
com muting, 48 subtraction of, 43 one-to-one, 218, 219
conjugate of, 469 sum of, 42 onto, 219

operation population distribution vector, 104 equivalence, 315


closure under an, 89 steady state, 108 representations, similarity of, 399
elementary column, 67 positive definite form, 433 resultant force, 5
elementary row, 56 power method, 441 Riesz, Frigyes, 444
vector space, 180 preservation, 142 right-hand rule, 242
opposite orientation, 413 of anglc, 166 rigid motion of the plane, 156
order of magnitude, 506 of length, 166 root of unity, primitive, 464
ordered basis, 205 of dot product, 166 rotation of the plane, 156
onentation, 280 of subspaces, 217 roundoff error, 527
opposite, 413 under a function, 142 row addition, 56
same, 413 primitive nth root of unity, 464 row-echelon form, 57
orthogonal basis, 338 principal argument, 459 reduced, 63
orthogonal complement, 329 principal axis theorem, 415, 426 10w-equivalent matrices, 55
orthogonal diagonalization, 354 principle of biorthogonality, 30! row interchange, 56
orthogonal linear transformation, probability, 105 row operations, 56
356 product row scaling, 56
orthogonal matrix, 350 cross, 241 row space ofa mairix, 91, 136
orthogonal set of vectors, 338 dot, 24 row vector, 13
orthogonal subspaces, 337 inner, 24 same orientation, 413
orthogonal vectors, 27, 235, 467 of matrices, 38 scalar, 4
orthonormal basis, 340 scalar, 42 scalar coeficient, 10, 191
Ostrogradskii, Mikhail, 3 projection scalar multiplication, 6, 7, 180, 1&
overdetermired system, 101, 370 on a subspace, 327, 332 preservation of, 142, 213
least squares solution of, 380 of the plane, 155 properties of, 9
parabola, 421 projection matrix, 155, 360, 363 scaling, 531
degenerate, 421 proper subspace, 193 Schmidt, Erhard, 343, 444
parabolic cylinder, 424, 427 proper value, 288 Schooten, Frans van, 419
parallel vectors, 10, 467 proper vector, 288 Schur’s lemma. 479, 483
parallelogram relation, 26 QR algorithm, 448 Schwarz, Hermann Amandus, 25
parameter, 169 QOR-factorization, 344, 383 Schwarz inequality, 24, 29, 235
parametric equations quadratic form, 409 Seki, Takakazu, 251
of a flat, 172 diagonalization of, 415 Shannon, Claude, | 20
of a line, 169 negative definite, 433 shear, 163
parity-check equation, 117 positive definite, 433 shift, 449
panty-check matnix, 122 symmetnc coefficient matnx of, similar matrices, 310, 399, 477
parity-check matnx decoding, 122 411 similanty of matrix representation
parity-check portion, 117 upper-triangular coefficient 399
partial pivoting, 528 matrix of, 410 singular matrix, 75
particular solution, 59, 97 quadric surface, 423 skew symmetric matnx, 48
partitioned matnx, 54 quaternions, 241 solution
Peano, Giuseppe, 181 range of a function, 142, 213 general, 59, 97
perpendicular vectors, 27, 235, 467 rank ofa matnx, 137 nontrivial, 88
pivot, 57 rank equation, 139 least squares, 373, 380
pivoting rational number, 454 particular, 59, 97
full, 530 Rayleigh, Baron, 440 trivial, 88
partial, 528 Rayleigh quotient, 440 space, Euclidean, 2
plane. 171 real axis, 455 (see also vector spacé)
spanned by vectors, |! real number, 454 span, 14, 90, 191
points, distance between, 31 reduced row-echelon form, 63 spanning vectors, 90, 191
polar form of a complex number, redundancy portion, 117 spectral decomposition, 444
459 reflection spectral theorem,
poiynomiai(s) in a line, 157 for Hermitian matrices, 479
characteristic, 291 in a subspace, 369 for normal matrices, 483
irreducible, A-3 regular Markov chain, 107 for symmetric matrices, 444
product by a scalar, 182 relation, spring constant, 370
sum of, 182 dependence, 127, 194 square matrix, 36, 95

stable transformation, 289, 320 regular, 107 population distribution, 104


standard basis vectors, 10 translation, 167 projection of, 327, 332
standard generator matrix, 119 translated vector, 5, 167 proper, 288
standard matrix representation, 146 transpose row, 13
standard ordered basis, 205 of a matrix, 43 same direction, 10
standard position, 4 of a vector, 14 span of, 14
string, 115 properties of, 45 standard basis, 10
of basis vectors, 495 triangle inequality, 21, 30, 235 steady-state distribution, 108
Strutt, John Wiiiiam, 440 tridiagonal matrix, 315, 510 in standard position, 4
subset trivial solution, 88 sum of, 5
closed under addition, 192 truncation error, 527 translated, 5
closed under an operation, 89 underdetermined linear system, 96 translation, 167
closed under scalar unitarily diagonalizable matrices, transpose of, 14
multiplication, 192 479 unit, 22, 467
image of, 142, 213 unitarliy equivaient matrices, 478 velocity, 27
inverse image of, 142, 214 unitary matrix, 470 zero, 4, 183
iranslation of, 167 unit coordinate vector, 22 vector addition, 6, 7, 180
subspace(s), 192 unit vector, 22, 470 properties of, 9
basis for a, 93 unstable linear system, 534 vector equation of a flat, 172
dimension o:, 131 unstable transformation, 229, 320 vector equation of a line, 168
generated by vectors, 90 upper-triangular coefficient matrix, vector space{s), 180, 465
invariant, 500 basis for, 197
of R*, 89 upper triangular matrix, 86 complex, 465
orthogonal, 337 Vandermonde, Alexandre- coordinatization of, 180
orthogonal complement of, 329 Theophile, 257 dimension of, 199
preservation of, 148, 217 variable(s), free, 59 finitely generated, 191
projectior matnx fer, 360, 363 vector(s), 4 inner product on a, 230
projection on, 327, 332 addition of, !80 inner-product, 230
proper, 193 angle between, 24, 235 isomorphic, 22!
reflection in, 369 characteristic, 288 linear transformation of, 213
spanned by vectors, 90 column, 13 subspace of, 192
of a vector space, 192 component of, 4 vector space axioms, 180, 181
zero, 89, 193 coordinate, 205, 389 vector space operations, 180
substitution cross product of, 241 vector subtraction, 6, 7
back, 58 dependent set of, 127, 194 velocity vector, 27
diagonalizing, 410 direction, 168 volume of a box, 276. A-9
subtraction distance between, 233 volume-change factor, 278
of matnices, 43 dot product of, 24 weight function, 237
of vectors, 6, 7 error, 352 Weyl, Hermann, 181, 214, 444
sum equal, 4 word, 115
of functions, 182 force, 3 binary, 116
of matrices, 42 information, 288, 318 code, !17
of polynomials, 182 inner product of, 24 distance between, 117
-of vectors, 5 line along, 11 Hamming weight of, 117
Sylvester, James Joseph, 37, 75, linear combination of, 10, 291 information portion of, | 17
137 linearly dependent, 127, 194 length of, 117
symmetric coefficient matrix, 411 linearly independent, 127, 194 message, 117
symmetric matrix, 43 magnitude of, 21, 232, 467 parity-check portion of, 117
fundamental theorem of, 354, norm of, 21, 232, 467 redundancy portion of, 117
480 opposite direction, 10 word addition, !18
trace of a matrix, 302 orthogonal, 27, 235, 467 zero matnix, 42
transformation (see linear orthogonal set of, 338 zero subspace, 89, 193
transformation) parallel, 10, 467 zero vector, 4, 18!
transition matrix, 102, 105 perpendicular, 27, 235, 467
