
Linear Algebra with Applications

Lectures on YouTube:
https://www.youtube.com/@mathtalent

Seongjai Kim

Department of Mathematics and Statistics

Mississippi State University

Mississippi State, MS 39762 USA

Email: [email protected]

Updated: December 2, 2023


Seongjai Kim, Professor of Mathematics, Department of Mathematics and Statistics, Mississippi
State University, Mississippi State, MS 39762 USA. Email: [email protected].
Prologue
This lecture note is organized, following contents in Linear Algebra and Its Applications, 6th Ed.,
by D. Lay, S. Lay, and J. McDonald [1].
Seongjai Kim
December 2, 2023

Learning Objectives
Real-world problems can be approximated by, and resolved through, systems of linear equations

    Ax = b,    A = [ a11 a12 · · · a1n ; a21 a22 · · · a2n ; · · · ; am1 am2 · · · amn ] ∈ Rm×n,

where one of {x, b} is the input and the other is the output.

What you would learn, from Linear Algebra:


1. How to Solve Systems of Linear Equations
• Programming with Matlab/Octave
2. Matrix Algebra (Matrix Inverse & Factorizations)
3. Determinants
4. Vector Spaces
5. Eigenvalues and Eigenvectors
• Differential Equations (§5.7)
• Markov Chains (§5.9)
6. Orthogonality and Least-Squares
• Least-Squares Problems
• Machine Learning: Regression Analysis

Contents

Title ii

Prologue iii

Table of Contents vii

1 Linear Equations 1
1.1. Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Row Reduction and Echelon Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.1. Echelon Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2.2. The General Solution of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . 14
1.3. Vector Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.1. Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.3.2. Linear Combinations and Span . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Programming with Matlab/Octave . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4. Matrix Equation Ax = b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.5. Solution Sets of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.5.1. Solutions of Homogeneous Linear Systems . . . . . . . . . . . . . . . . . . . . . 38
1.5.2. Solutions of Nonhomogeneous Linear Systems . . . . . . . . . . . . . . . . . . . 41
1.7. Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.8. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.9. The Matrix of A Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.9.1. The Standard Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
1.9.2. Existence and Uniqueness Questions . . . . . . . . . . . . . . . . . . . . . . . . 61

2 Matrix Algebra 67
2.1. Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.1.1. Sum, Scalar Multiple, and Matrix Multiplication . . . . . . . . . . . . . . . . . . 68
2.1.2. Properties of Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2.2. The Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.3. Characterizations of Invertible Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 82


2.5. Solving Linear Systems by Matrix Factorizations . . . . . . . . . . . . . . . . . . . . . . 87


2.5.1. The LU Factorization/Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.5.2. Solutions of Triangular Algebraic Systems . . . . . . . . . . . . . . . . . . . . . 92
2.8. Subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.9. Dimension and Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.9.1. Coordinate Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.9.2. Dimension of a Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

3 Determinants 109
3.1. Introduction to Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.2. Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

4 Vector Spaces 119


4.1. Vector Spaces and Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2. Null Spaces, Column Spaces, and Linear Transformations . . . . . . . . . . . . . . . . 125

5 Eigenvalues and Eigenvectors 131


5.1. Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2. The Characteristic Equation and Similarity Transformation . . . . . . . . . . . . . . . 138
5.3. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3.1. The Diagonalization Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.3.2. Diagonalizable Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.5. Complex Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.7. Applications to Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.7.1. Dynamical System: The System of First-Order Differential Equations . . . . . 155
5.7.2. Trajectories for the Dynamical Systems: Attractors, Repellers, and Saddle Points . . . 157
5.8. Iterative Estimates for Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.8.1. The Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.8.2. Inverse Power Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
5.9. Applications to Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.9.1. Probability Vector and Stochastic Matrix . . . . . . . . . . . . . . . . . . . . . . 170
5.9.2. Predicting the Distant Future: Steady-State Vectors . . . . . . . . . . . . . . . 173

6 Orthogonality and Least-Squares 179


6.1. Inner Product, Length, and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.1.1. Inner Product and Length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.1.2. Orthogonal Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
6.2. Orthogonal Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

6.2.1. An Orthogonal Projection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189


6.2.2. Orthonormal Basis and Orthogonal Matrix . . . . . . . . . . . . . . . . . . . . . 192
6.3. Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.4. The Gram-Schmidt Process and QR Factorization . . . . . . . . . . . . . . . . . . . . . 203
6.4.1. The Gram-Schmidt Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
6.4.2. QR Factorization of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
6.5. Least-Squares Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
6.6. Machine Learning: Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.6.1. Regression Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
6.6.2. Least-Squares Fitting of Other Curves . . . . . . . . . . . . . . . . . . . . . . . . 221

A Appendix 227
A.1. Understanding / Interpretation of Eigenvalues and Eigenvectors . . . . . . . . . . . . . 228
A.2. Eigenvalues and Eigenvectors of Stochastic Matrices . . . . . . . . . . . . . . . . . . . 231

C Chapter Review 237


C.1. Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
C.2. Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
C.3. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
C.4. Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
C.5. Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
C.6. Orthogonality and Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

P Projects 265
P.1. Project Regression Analysis: Linear, Piecewise Linear, and Nonlinear Models . . . . . 266

Bibliography 275

Index 277
Chapter 1
Linear Equations

In this first chapter, we study the basics of linear equations, including


• Systems of linear equations
• Three elementary row operations
• Linear transformations

Contents of Chapter 1
1.1. Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2. Row Reduction and Echelon Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.3. Vector Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Programming with Matlab/Octave . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.4. Matrix Equation Ax = b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
1.5. Solution Sets of Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.7. Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.8. Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
1.9. The Matrix of A Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . 57


1.1. Systems of Linear Equations

Definition 1.1. A linear equation in the variables x1 , x2 , · · · , xn is an


equation that can be written in the form

a1 x1 + a2 x2 + · · · + an xn = b, (1.1)

where b and the coefficients a1 , a2 , · · · , an are real or complex numbers.

A system of linear equations (or a linear system) is a collection of one


or more linear equations involving the same variables – say, x1 , x2 , · · · , xn .
Example 1.2.

   (a)  4x1 − x2 = 3
        2x1 + 3x2 = 5

   (b)  2x + 3y − 4z = 2
         x − 2y +  z = 1
        3x +  y − 2z = −1

• Solution: A solution of the system is a list (s1 , s2 , · · · , sn ) of num-


bers that makes each equation a true statement, when the values
s1 , s2 , · · · , sn are substituted for x1 , x2 , · · · , xn , respectively.
• Solution Set: The set of all possible solutions is called the solution
set of the linear system.
• Equivalent System: Two linear systems are called equivalent if
they have the same solution set.
• For example, system (a) above is equivalent to

     2x1 − 4x2 = −2        (R1 ← R1 − R2)
     2x1 + 3x2 = 5

Remark 1.3. Linear systems may have

 • no solution                                      : inconsistent system
 • exactly one (unique) solution, or
   infinitely many solutions                        : consistent system

Example 1.4. Consider the case of two equations in two unknowns.

   (a)  −x + y = 1        (b)  x + y = 1        (c)  −2x +  y = 2
        −x + y = 3             x − y = 2             −4x + 2y = 4

Existence and Uniqueness Questions

Two Fundamental Questions about a Linear System:


1. (Existence): Is the system consistent; that is, does at least one
solution exist?
2. (Uniqueness): If a solution exists, is it the only one; that is, is the
solution unique?

Most systems arising in the real world are consistent (existence), and they produce the same output for the same input (uniqueness).

Solving Linear Systems


Matrix Form
Consider a simple system of 2 linear equations:

     −2x1 + 3x2 = −1
       x1 + 2x2 = 4                    (1.2)

Such a system of linear equations can be treated much more conveniently and efficiently in matrix form. In matrix form, (1.2) reads

     [ −2  3 ] [ x1 ]   [ −1 ]
     [  1  2 ] [ x2 ] = [  4 ],        (1.3)

where the 2 × 2 matrix on the left is the coefficient matrix.

The essential information of the system can be recorded compactly in a rectangular array called an augmented matrix:

     [ −2  3  −1 ]         [ −2  3 : −1 ]
     [  1  2   4 ]   or    [  1  2 :  4 ]        (1.4)

Elementary Row Operations

Tools 1.5. Three Elementary Row Operations (ERO):

 • Replacement: Replace one row by the sum of itself and a multiple of another row
       Ri ← Ri + k · Rj ,   j ≠ i
 • Interchange: Interchange two rows
       Ri ↔ Rj ,   j ≠ i
 • Scaling: Multiply all entries in a row by a nonzero constant
       Ri ← k · Ri ,   k ≠ 0

Solving (1.2)

System of linear equations                    Matrix form

   −2x1 + 3x2 = −1                            R1 [ −2  3  −1 ]
     x1 + 2x2 = 4                             R2 [  1  2   4 ]

R1 ↔ R2: (interchange)
     x1 + 2x2 = 4                             R1 [  1  2   4 ]
   −2x1 + 3x2 = −1                            R2 [ −2  3  −1 ]

R2 ← R2 + 2·R1: (replacement)
     x1 + 2x2 = 4                             R1 [  1  2   4 ]
          7x2 = 7                             R2 [  0  7   7 ]

R2 ← R2/7: (scaling)
     x1 + 2x2 = 4                             R1 [  1  2   4 ]
           x2 = 1                             R2 [  0  1   1 ]

R1 ← R1 − 2·R2: (replacement)
     x1      = 2                              R1 [  1  0   2 ]
          x2 = 1                              R2 [  0  1   1 ]

At the last step:

   LHS: the solution, x1 = 2, x2 = 1, i.e., [2; 1];      RHS: the coefficient part has become the identity I, augmented with the solution.
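Note (Matlab/Octave check): the same elementary row operations can be carried out numerically. The following is a minimal sketch, written for this note only (the variable name Ab is arbitrary), that reproduces the four steps above on the augmented matrix of (1.2).

Ab = [-2 3 -1;
       1 2  4];                    % augmented matrix [A : b]
Ab([1 2],:) = Ab([2 1],:);         % interchange:  R1 <-> R2
Ab(2,:) = Ab(2,:) + 2*Ab(1,:);     % replacement:  R2 <- R2 + 2*R1
Ab(2,:) = Ab(2,:)/7;               % scaling:      R2 <- R2/7
Ab(1,:) = Ab(1,:) - 2*Ab(2,:);     % replacement:  R1 <- R1 - 2*R2
Ab                                 % displays [1 0 2; 0 1 1]: x1 = 2, x2 = 1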

Definition 1.6. Two matrices are row equivalent if there is a se-


quence of EROs that transforms one matrix to the other.

Example 1.7. Solve the following system of linear equations, using the 3
EROs. Then, determine if the system is consistent.

x2 − 4x3 = 8
2x1 − 3x2 + 2x3 = 1
4x1 − 8x2 + 12x3 = 1

Solution.

Ans: The system is inconsistent; inconsistency means that there is no point at which the three planes all meet.
Example 1.8. Determine the values of h such that the given system is a
consistent linear system
x + h y = −5
2x − 8y = 6

Solution.

Ans: h ≠ −4

True-or-False 1.9.
a. Every elementary row operation is reversible.
b. Elementary row operations on an augmented matrix never change the
solution of the associated linear system.
c. Two linear systems are equivalent if they have the same solution set.
d. Two matrices are row equivalent if they have the same number of rows.
Solution.

Ans: T,T,T,F

You should report your homework with your work shown for each problem. You can scan your solutions and answers, using a scanner or your phone, and then put them into a single file, either doc/docx or pdf.

Exercises 1.1
1. Consider the augmented matrix of a linear system. State in words the next two elemen-
tary row operations that should be performed in the process of solving the system.
 
   [ 1  6  −4  0   1 ]
   [ 0  1   7  0  −4 ]
   [ 0  0  −1  2   3 ]
   [ 0  0   2  1  −6 ]
2. The augmented matrix of a linear system has been reduced by row operations to the
form shown. Continue the appropriate row operations and describe the solution set.
 
   [ 1   1  0   0   4 ]
   [ 0  −1  3   0  −7 ]
   [ 0   0  1  −3   1 ]
   [ 0   0  0   2   4 ]
3. Solve the systems or determine if the systems are inconsistent.

   (a)       −x2 − 4x3 = 5
        x1 + 3x2 + 5x3 = −2
       3x1 + 7x2 + 7x3 = 6

   (b)  x1       + 3x3        = 2
              x2       − 3x4  = 3
            −2x2 + 3x3 + 2x4  = 1
        3x1             + 7x4 = −5

4. Determine the value of h such that the matrix  [ 2  −3  h ; −4  6  −5 ]  is the augmented matrix of a consistent linear system.
   Ans: h = 5/2

5. An important concern in the study of heat transfer is to determine the steady-state temperature distribution of a thin plate when the temperature around the boundary is known. Assume the plate shown in the figure represents a cross section of a metal beam, with negligible heat flow in the direction perpendicular to the plate. Let T1 , T2 , · · · , T4 denote the temperatures at the four interior nodes of the mesh in the figure. The temperature at a node is approximately equal to the average of the four nearest nodes. For example, T1 = (10 + 20 + T2 + T4 )/4, or 4T1 = 10 + 20 + T2 + T4 . (Figure 1.1) Write a system of four equations whose solution gives estimates for the temperatures T1 , T2 , · · · , T4 , and solve it.

1.2. Row Reduction and Echelon Forms


1.2.1. Echelon Forms
Terminologies
• A nonzero row in a matrix is a row with at least one nonzero entry.
• A leading entry of a row is the left most nonzero entry in a nonzero
row.
• A leading 1 is a leading entry whose value is 1.

Definition 1.10. Echelon form: A rectangular matrix is in an echelon form if it has the following properties.

1. All nonzero rows are above any rows of all zeros.
2. Each leading entry in a row is in a column to the right of the leading entry of the row above it.
3. All entries below a leading entry in a column are zeros.
Row reduced echelon form: If a matrix in an echelon form sat-
isfies 4 and 5 below, then it is in the row reduced echelon form
(RREF), or the reduced echelon form (REF).
4. The leading entry in each nonzero row is 1.
5. Each leading 1 is the only nonzero entry in its column.

Example 1.11. Check if the following matrix is in echelon form. If not,


put it in echelon form.
 
   [ 0  0  0  0  0 ]
   [ 1  2  0  0  1 ]
   [ 0  0  3  0  4 ]

Example 1.12. Verify whether the following matrices are in echelon form,
row reduced echelon form.
   
   (a)  [ 1  0  2  0  1 ]          (b)  [ 2  0  0  5 ]
        [ 0  1  3  0  4 ]               [ 0  0  0  9 ]
        [ 0  0  0  0  0 ]               [ 0  1  0  6 ]

   (c)  [ 1  1  0 ]                (d)  [ 1  1  2  2  3 ]
        [ 0  0  1 ]                     [ 0  0  1  1  1 ]
        [ 0  0  0 ]                     [ 0  0  0  0  4 ]

   (e)  [ 1  0  0  5 ]             (f)  [ 0  1  0  5 ]
        [ 0  1  0  6 ]                  [ 0  0  0  6 ]
        [ 0  0  0  1 ]                  [ 0  0  1  2 ]

Uniqueness of the Reduced Echelon Form


Theorem 1.13. Each matrix is row equivalent to one and only one
reduced echelon form.

Pivot Positions
Terminologies
1) A pivot position is a location in A that corresponds to a leading 1
in the reduced echelon form of A.
2) A pivot column is a column of A that contains a pivot position.

Example 1.14. The matrix A is given with its reduced echelon form. Find
the pivot positions and pivot columns of A.
   
        [ 1  1  0  2  0 ]   R.E.F.   [ 1  1  0  2  0 ]
   A =  [ 1  1  1  3  0 ]  −−−−−→    [ 0  0  1  1  0 ]
        [ 1  1  0  2  4 ]            [ 0  0  0  0  1 ]
Solution.

Remark 1.15. Pivot Positions. Once a matrix is in an echelon form,


further row operations do not change the positions of leading entries.
Thus, the leading entries become the leading 1’s in the reduced eche-
lon form.

Terminologies
3) Basic variables: In the system Ax = b, the variables that corre-
spond to pivot columns (in [A : b]) are basic variables.
4) Free variables: In the system Ax = b, the variables that correspond
to non-pivotal columns are free variables.

Example 1.16. For the system of linear equations, identify its basic vari-
ables and free variables.

   −x1 − 2x2        = −3
               2x3  =  4
               3x3  =  6

Solution. Hint : You may start with its augmented matrix, and apply row operations.

Ans: Basic variables: {x1 , x3 }. Free variable: {x2 }.


Why “free”?

The Row Reduction Algorithm


Steps to reduce to reduced echelon form
1. Start with the leftmost non-zero column. This is a pivot column.
The pivot is at the top.
2. Choose a nonzero entry in the pivot column as a pivot. If necessary,
interchange rows to move a nonzero entry into the pivot position.
3. Use row replacement operations to make zeros in all positions be-
low the pivot.
4. Ignore the row and column containing the pivot, as well as all rows above it. Apply Steps 1–3 to the remaining submatrix. Repeat until there are no more rows to modify.

5. Start with right most pivot and work upward and left to make zeros
above each pivot. If pivot is not 1, make it 1 by a scaling operation.

Example 1.17. Row reduce the matrix into reduced echelon form.

   A = [ 0  −3  −6  4  9 ; −2  −3  0  3  −1 ; 1  4  5  −9  −7 ]

Solution.

   R1 ↔ R3:                [ 1  4  5  −9  −7 ; −2  −3  0   3   −1 ; 0  −3  −6  4  9 ]
   R2 ← R2 + 2R1:          [ 1  4  5  −9  −7 ;  0   5  10  −15 −15 ; 0  −3  −6  4  9 ]
   R2 ← R2/5:              [ 1  4  5  −9  −7 ;  0   1  2   −3  −3 ; 0  −3  −6  4  9 ]
   R3 ← R3 + 3R2:          [ 1  4  5  −9  −7 ;  0   1  2   −3  −3 ; 0   0   0  −5  0 ]
   R3 ← R3/(−5), then zero out the entries above the pivot:
                           [ 1  4  5  0  −7 ;  0  1  2  0  −3 ; 0  0  0  1  0 ]
   R1 ← R1 − 4R2:          [ 1  0  −3  0  5 ;  0  1  2  0  −3 ; 0  0  0  1  0 ]

The combination of Steps 1–4 is called the forward phase of the row reduction, while Step 5 is called the backward phase.
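In Matlab/Octave, the built-in command rref carries out both phases at once. A quick check of Example 1.17 (a computational sketch added for verification only):

A = [ 0 -3 -6  4  9;
     -2 -3  0  3 -1;
      1  4  5 -9 -7];
R = rref(A)          % reduced echelon form; it agrees with the result above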

1.2.2. The General Solution of Linear Systems

1) For example, for an augmented matrix, its R.E.F. is given as

      [ 1  0  −5  1 ]
      [ 0  1   1  4 ]                  (1.5)
      [ 0  0   0  0 ]

2) Then, the associated system of equations reads

      x1       − 5x3 = 1
            x2 +  x3 = 4               (1.6)
                   0 = 0

   where {x1 , x2 } are basic variables (∵ pivots).

3) Rewrite (1.6) as

      x1 = 1 + 5x3
      x2 = 4 − x3                      (1.7)
      x3 is free

4) The system (1.7) can be expressed as

      x1 = 1 + 5x3
      x2 = 4 − x3                      (1.8)
      x3 =      x3

5) Thus, the solution of (1.6) can be written as

      [ x1 ]   [ 1 ]       [  5 ]
      [ x2 ] = [ 4 ] + x3  [ −1 ] ,    (1.9)
      [ x3 ]   [ 0 ]       [  1 ]

   in which you are free to choose any value for x3 . (That is why it is called a "free variable".)

• The description in (1.9) is called a parametric description of solu-


tion set; the free variable x3 acts as a parameter.
• The solution in (1.9) represents all the solutions of the system (1.5),
which is called the general solution of the system.
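Note (Matlab/Octave check): the parametric description (1.9) can be verified numerically. A minimal sketch, using the reduced matrix (1.5):

R  = [1 0 -5 1;
      0 1  1 4;
      0 0  0 0];       % reduced echelon form (1.5); last column is b
p  = [1; 4; 0];        % particular solution, obtained with x3 = 0
v  = [5; -1; 1];       % vector attached to the free variable x3 in (1.9)
x3 = 7;                % any value of the free variable
x  = p + x3*v;         % the corresponding solution
R(:,1:3)*x - R(:,4)    % residual: the zero vector, for every choice of x3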

Example 1.18. Find the general solution of the system whose augmented matrix is

            [ 1  0  −5   0  −8  3 ]
   [A|b] =  [ 0  1   4  −1   0  6 ]
            [ 0  0   0   0   1  0 ]
            [ 0  0   0   0   0  0 ]
Solution. Hint : You should first row reduce it for the reduced echelon form.

Example 1.19. Find the general solution of the system whose augmented matrix is

            [ 0  0   0  1   2 ]
   [A|b] =  [ 0  1   3  0   2 ]
            [ 0  1   3  2   6 ]
            [ 1  0  −9  0  −8 ]
Solution.

Properties
1) Any nonzero matrix may be row reduced (i.e., transformed by el-
ementary row operations) into more than one matrix in echelon
form, using different sequences of row operations.
2) Once a matrix is in an echelon form, further row operations do not
change the pivot positions (Remark 1.15).
3) Each matrix is row equivalent to one and only one reduced eche-
lon matrix (Theorem 1.13, p. 10).

4) A linear system is consistent if and only if the rightmost column of


the augmented matrix is not a pivot column
–i.e., if and only if an echelon form of the augmented matrix has
no row of the form [0 · · · 0 b] with b nonzero.
5) If a linear system is consistent, then the solution set contains either
(a) a unique solution, when there are no free variables, or
(b) infinitely many solutions, when there is at least one free vari-
able.

Example 1.20. Choose h and k such that the system has

   a) No solution      b) Unique solution      c) Many solutions

       x1 − 3x2 = 1
      2x1 + hx2 = k

Solution.

Ans: (a) h = −6, k ≠ 2



True-or-False 1.21.
a. The row reduction algorithm applies only to augmented matrices for a linear system.
b. If one row in an echelon form of an augmented matrix is [0 0 0 0 2 0],
then the associated linear system is inconsistent.
c. The pivot positions in a matrix depend on whether or not row inter-
changes are used in the row reduction process.
d. Reducing a matrix to an echelon form is called the forward phase of
the row reduction process.
Solution.

Ans: F,F,F,T

Exercises 1.2
1. Row reduce the matrices to reduced echelon form. Circle the pivot positions in the final
matrix and in the original matrix, and list the pivot columns.
   
   (a)  [ 1  2  3  4 ]         (b)  [ 1  3  5  7 ]
        [ 4  5  6  7 ]              [ 3  5  7  9 ]
        [ 6  7  8  9 ]              [ 5  7  9  1 ]

2. Find the general solutions of the systems (in parametric vector form) whose aug-
mented matrices are given as
   
   (a)  [  1  −7   0   6   5 ]        (b)  [ 1  2  −5  −6  0  −5 ]
        [  0   0   1  −2  −3 ]             [ 0  1  −6  −3  0   2 ]
        [ −1   7  −4   2   7 ]             [ 0  0   0   0  1   0 ]
                                           [ 0  0   0   0  0   0 ]

   Ans: (a) x = [5, 0, −3, 0]T + x2 [7, 1, 0, 0]T + x4 [−6, 0, 2, 1]T ;
   Ans: (b) x = [−9, 2, 0, 0, 0]T + x3 [−7, 6, 1, 0, 0]T + x4 [0, 3, 0, 1, 0]T  (see footnote 1)
3. In the following, we use the notation for matrices in echelon form: the leading entries are marked with ■, and any values (including zero) with ∗. Suppose each matrix represents the augmented matrix for a system of linear equations. In each case, determine if the system is consistent. If the system is consistent, determine if the solution is unique.

   (a)  [ ■  ∗  ∗  ∗ ]      (b)  [ 0  ■  ∗  ∗  ∗ ]      (c)  [ ■  ∗  ∗  ∗  ∗ ]
        [ 0  ■  ∗  ∗ ]           [ 0  0  ■  ∗  ∗ ]           [ 0  0  ■  ∗  ∗ ]
        [ 0  0  ■  ∗ ]           [ 0  0  0  0  ■ ]           [ 0  0  0  ■  ∗ ]

4. Choose h and k such that the system has (a) no solution, (b) a unique solution, and (c)
many solutions.
x1 + hx2 = 2
4x1 + 8x2 = k
5. Suppose the coefficient matrix of a system of linear equations has a pivot position in
every row. Explain why the system is consistent.

 
1  The superscript T denotes the transpose; for example, [a, b, c]T is the column vector [a; b; c].

1.3. Vector Equations

Definition 1.22. A matrix with only one column is called a column


vector, or simply a vector.

For example,

   [ 3; −2 ] ∈ R2 ,        [ 1; −5; 4 ] ∈ R3 .

1.3.1. Vectors in Rn
Vectors in R2

We can identify a point (a, b) with the column vector [a; b], called a position vector.

Figure 1.2: Vectors in R2 as points. Figure 1.3: Vectors in R2 with arrows.

Note: Vectors are mathematical objects having direction and length.


We may try to (1) compare them, (2) add or subtract them, (3) multiply
them by a scalar, (4) measure their length, and (5) apply other operations
to get information related to angles.
1) Equality of vectors: Two vectors u = [u1; u2] and v = [v1; v2] are equal if and only if corresponding entries are equal, i.e., ui = vi , i = 1, 2.

2) Addition: Let u = [u1; u2] and v = [v1; v2]. Then

      u + v = [u1; u2] + [v1; v2] = [u1 + v1; u2 + v2].

3) Scalar multiple: Let c ∈ R, a scalar. Then

      cv = c [v1; v2] = [cv1; cv2].

Theorem 1.23. (Parallelogram Rule for Addition) If u and v are


represented as points in the plane, then u + v corresponds to the fourth
vertex of the parallelogram whose other vertices are 0, u, and v.

Figure 1.4: The parallelogram rule.


Example 1.24. Let u = [2; 1] and v = [1; −3].

(a) Find u + 2v and 3u − 2v.
(b) Display them on a graph.

Solution.

Remark 1.25. Let a1 = [2; 1] and a2 = [1; −3]. Then

   x1 a1 + x2 a2 = x1 [2; 1] + x2 [1; −3] = [2x1 + x2; x1 − 3x2]
                 = [ 2  1 ; 1  −3 ] [x1; x2] = [a1 a2] [x1; x2].        (1.10)

Vectors in Rn
Note: The above vector operations, including the parallelogram rule, are
also applicable for vectors in R3 and Rn , in general.

Algebraic Properties of Rn
For u, v, w ∈ Rn and scalars c and d,

1) u + v = v + u 5) c(u + v) = cu + cv
2) (u + v) + w = u + (v + w) 6) (c + d)u = cu + du
3) u + 0 = 0 + u = u 7) c(du) = (cd)u
4) u + (−u) = (−u) + u = 0 8) 1u = u
where −u = (−1)u

1.3.2. Linear Combinations and Span

Definition 1.26. Given vectors v1 , v2 , · · · , vp in Rn and scalars


c1 , c2 , · · · , cp , the vector y ∈ Rn defined by

y = c1 v1 + c2 v2 + · · · + cp vp (1.11)

is called the linear combination of v1 , v2 , · · · , vp with weights


c1 , c2 , · · · , cp .

Example 1.27. Given v1 and v2 in R2 , as in the figure below, the collection


of all linear combinations of v1 and v2 must be the same as R2 .

Definition 1.28. A vector equation is of the form


x1 a1 + x2 a2 + · · · + xp ap = b, (1.12)

where a1 , a2 , · · · , ap , and b are vectors and x1 , x2 , · · · , xp are weights.


      
0 1 5 0
Example 1.29. Let a1 =  4, a2 = 6, a3 = −1, and b = 2.
       

−1 3 8 3
Determine whether or not b can be generated as a linear combination of a1 ,
a2 , and a3 .
Solution. Hint : We should determine
 whether weights x1 , x2 , x3 exist such that x1 a1 +
x1
x2 a2 + x3 a3 = b, which reads [a1 a2 a3 ] x2  = b. (See Remark 1.25 on p.22.)
 

x3
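A quick Matlab/Octave check for Example 1.29 (a computational sketch; the conclusion is read off from the reduced augmented matrix):

a1 = [0; 4; -1];  a2 = [1; 6; 3];  a3 = [5; -1; 8];  b = [0; 2; 3];
rref([a1 a2 a3 b])   % if no row has the form [0 0 0 | nonzero], the system
                     % is consistent and b is a linear combination of a1, a2, a3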

Note:
1) The vector equation x1 a1 + x2 a2 + · · · + xp ap = b has the same solu-
tion set as a linear system whose augmented matrix is [a1 a2 · · · ap : b].
2) b can be generated as a linear combination of a1 , a2 , · · · , ap if and only
if the linear system Ax = b whose augmented matrix [a1 a2 · · · ap : b]
is consistent.

Definition 1.30. Let v1 , v2 , · · · , vp be p vectors in Rn . Then


Span{v1 , v2 , · · · , vp } is the collection of all linear combinations of
v1 , v2 , · · · , vp , that can be written in the form c1 v1 + c2 v2 + · · · + cp vp ,
where c1 , c2 , · · · , cp are weights. That is,

Span{v1 , v2 , · · · , vp } = {y | y = c1 v1 + c2 v2 + · · · + cp vp } (1.13)

Figure 1.5: A line: Span{v} in R3 .

Figure 1.6: A plane: Span{u, v} in R3 .


Example 1.31. Determine if b = [2; −1; 6] is a linear combination of the columns of the matrix [ 1  0  5 ; −2  1  −6 ; 0  2  8 ]. (That is, determine if b is in the span of the columns of the matrix.)

Solution.
Example 1.32. Find h so that [2; 1; h] lies in the plane spanned by a1 = [1; 2; 3] and a2 = [2; −1; 1].

Solution.

b ∈ Span{v1 , v2 , · · · , vp }
⇔ x1 v1 + x2 v2 + · · · + xp vp = b has a solution (1.14)
⇔ [v1 v2 · · · vp : b] has a solution

True-or-False 1.33.
a. Another notation for the vector [1; −2] is [1 −2].
b. The set Span{u, v} is always visualized as a plane through the origin.
c. When u and v are nonzero vectors, Span{u, v} contains the line through
u and the origin.
Solution.

Ans: F,F,T

Exercises 1.3
1. Write a system of equations that is equivalent to the given vector equation; write a
vector equation that is equivalent to the given system of equations.
     
   (a)  x1 [6; −1; 5] + x2 [−3; 4; 0] = [1; −7; −5]

   (b)        x2 + 5x3 = 0
        4x1 + 6x2 − x3 = 0
       −x1 + 3x2 − 8x3 = 0

2. Determine if b is a linear combination of a1 , a2 , and a3 .
      a1 = [1; −2; 0],  a2 = [0; 1; 2],  a3 = [5; −6; 8],  b = [2; −1; 6].
   Ans: Yes
3. Determine if b is a linear combination of the vectors formed from the columns of the matrix A.

   (a)  A = [ 1  −4  2 ; 0  3  5 ; −2  8  −4 ],   b = [3; −7; −3]
   (b)  A = [ 1  −2  −6 ; 0  3  7 ; 1  −2  5 ],   b = [11; −5; 9]
     
1 −2 4
4. Let a1 =  4 , a2 = −3, and b =  1. For what value(s) of h is b in the plane
     

−2 7 h
spanned by a1 and a2 ? Ans: h = −17
5. Construct a 3 × 3 matrix A, with nonzero entries, and a vector b in R3 such that b is
not in the set spanned by the columns of A. Hint : Construct a 3 × 4 augmented matrix in
echelon form that corresponds to an inconsistent system.
6. A mining company has two mines. One day's operation at mine #1 produces ore that contains 20 metric tons of copper and 550 kilograms of silver, while one day's operation at mine #2 produces ore that contains 30 metric tons of copper and 500 kilograms of silver. Let v1 = [20; 550] and v2 = [30; 500]. Then v1 and v2 represent the "output per day" of mine #1 and mine #2, respectively.
   (a) What physical interpretation can be given to the vector 5v1 ?
   (b) Suppose the company operates mine #1 for x1 days and mine #2 for x2 days. Write a vector equation whose solution gives the number of days each mine should operate in order to produce 150 tons of copper and 2825 kilograms of silver.
(c) M 2 Solve the equation in (b).

2  The mark M indicates that you have to solve the problem using one of Matlab, Maple, and Mathematica. You may also try "Octave" as a free alternative to Matlab. Attach a copy of your code.

Programming with Matlab/Octave

Note: In computer programming, important things are


• How to deal with objects (variables, arrays, functions)
• How to deal with repetition effectively
• How to make the program reusable

Vectors and matrices

The most basic thing you will need to do is to enter vectors and matrices. You would enter commands to Matlab or Octave at a prompt that looks like >>.

 • Rows are separated by semicolons (;) or Enter.
 • Entries in a row are separated by commas (,) or a space.

For example,

Vectors and Matrices
>> u = [1; 2; 3]     % column vector
u =
   1
   2
   3
>> v = [4; 5; 6];
>> u + 2*v
ans =
    9
   12
   15
>> w = [5, 6, 7, 8]  % row vector
w =
   5   6   7   8
>> A = [2 1; 1 2];   % matrix
>> B = [-2, 5
        1, 2]
B =
  -2   5
   1   2
>> C = A*B           % matrix multiplication
C =
  -3  12
   0   9

You can save the commands in a file to run and get the same results.

tutorial1_vectors.m
u = [1; 2; 3]
v = [4; 5; 6];
u + 2*v
w = [5, 6, 7, 8]
A = [2 1; 1 2];
B = [-2, 5
      1, 2]
C = A*B

Solving equations

Let A = [ 1  −4  2 ; 0  3  5 ; 2  8  −4 ] and b = [3; −7; −3]. Then Ax = b can be numerically solved by implementing a code as follows.

tutorial2_solve.m
A = [1 -4 2; 0 3 5; 2 8 -4];
b = [3; -7; -3];
x = A\b

Result
x =
   0.75000
  -0.97115
  -0.81731

Graphics with Matlab


In Matlab, the most popular graphic command is plot, which creates a 2D
line plot of the data in Y versus the corresponding values in X. A general
syntax for the command is
plot(X1,Y1,LineSpec1,...,Xn,Yn,LineSpecn)

tutorial3_plot.m
 1  close all
 2
 3  %% a curve
 4  X1 = linspace(0,2*pi,10);   % n=10
 5  Y1 = cos(X1);
 6
 7  %% another curve
 8  X2=linspace(0,2*pi,20); Y2=sin(X2);
 9
10  %% plot together
11  plot(X1,Y1,'-or',X2,Y2,'--b','linewidth',3);
12  legend({'y=cos(x)','y=sin(x)'},'location','best',...
13         'FontSize',16,'textcolor','blue')
14  print -dpng 'fig_cos_sin.png'

Figure 1.7: fig_cos_sin.png: plot of y = cos x and y = sin x.

Above tutorial3_plot.m is a typical M-file for figuring with plot.

• Line 1: It closes all figures currently open.


• Lines 3, 4, 7, and 10 (comments): When the percent sign (%) appears,
the rest of the line will be ignored by Matlab.
• Lines 4 and 8: The command linspace(x1,x2,n) returns a row vector
of n evenly spaced points between x1 and x2.
• Line 11: Its result is a figure shown in Figure 1.7.
• Line 14: it saves the figure into a png format, named fig_cos_sin.png.

Repetition: iteration loops

Note: In scientific computation, one of the most frequently occurring events is repetition. Each repetition of the process is also called an iteration.
It is the act of repeating a process, to generate a (possibly unbounded)
sequence of outcomes, with the aim of approaching a desired goal, target
or result. Thus,
• iteration must start with an initialization (starting point) and
• perform a step-by-step marching in which the results of one iteration
are used as the starting point for the next iteration.

In the context of mathematics or computer science, iteration (along with


the related technique of recursion) is a very basic building block in pro-
gramming. Matlab provides various types of loops: while loops, for loops,
and nested loops.

while loop
The syntax of a while loop in Matlab is as follows.
while <expression>
<statements>
end
An expression is true when the result is nonempty and contains all nonzero
elements, logical or real numeric; otherwise the expression is false. Here is
an example for the while loop.
n1=11; n2=20;
sum=n1;
while n1<n2
n1 = n1+1; sum = sum+n1;
end
fprintf('while loop: sum=%d\n',sum);
When the code above is executed, the result will be:
while loop: sum=155

for loop
A for loop is a repetition control structure that allows you to efficiently write a loop that needs to execute a specific number of times. The syntax of a for loop in Matlab is as follows:
for index = values
<program statements>
end
Here is an example for the for loop.
n1=11; n2=20;
sum=0;
for i=n1:n2
sum = sum+i;
end
fprintf('for loop: sum=%d\n',sum);
When the code above is executed, the result will be:
for loop: sum=155

Functions: Enhancing reusability


Program scripts can be saved to reuse later conveniently. For example,
the script for the summation of integers from n1 to n2 can be saved as a form
of function.
mysum.m
1 function s = mysum(n1,n2)
2 % sum of integers from n1 to n2
3

4 s=0;
5 for i=n1:n2
6 s = s+i;
7 end

Now, you can call it with e.g. mysum(11,20).


Then the result reads ans = 155.

1.4. Matrix Equation Ax = b

A fundamental idea in linear algebra is to view a linear combina-


tion of vectors as a product of a matrix and a vector.

Definition 1.34. Let A = [a1 a2 · · · an ] be an m × n matrix and x ∈ Rn ,


then the product of A and x denoted by A x is the linear combination
of columns of A using the corresponding entries of x as weights, i.e.,
 
      A x = [a1 a2 · · · an ] [x1; x2; · · · ; xn] = x1 a1 + x2 a2 + · · · + xn an .        (1.15)

A matrix equation is of the form A x = b, where b is a column vector of


size m × 1.

Example 1.35.

   Matrix equation:    [ 1  2 ; 3  4 ] [x1; x2] = [1; −1]
   Vector equation:    x1 [1; 3] + x2 [2; 4] = [1; −1]
   Linear system:       x1 + 2x2 = 1
                       3x1 + 4x2 = −1

Theorem 1.36. Let A = [a1 a2 · · · an ] be an m × n matrix, x ∈ Rn , and


b ∈ Rm . Then the matrix equation
Ax = b (1.16)

has the same solution set as the vector equation


x1 a1 + x2 a2 + · · · + xn an = b, (1.17)

which, in turn, has the same solution set as the system with augmented
matrix
[a1 a2 · · · an : b]. (1.18)
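A quick Matlab/Octave illustration of Theorem 1.36 for the small system of Example 1.35 (a sketch; x is computed with the backslash solver):

A = [1 2; 3 4];   b = [1; -1];
x = A\b                        % solve the matrix equation A*x = b
x(1)*A(:,1) + x(2)*A(:,2)      % the same weights, used in the vector equation,
                               % reproduce b, i.e., x1*a1 + x2*a2 = b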

Theorem 1.37. (Existence of solutions): Let A be an m × n matrix.


Then the following statements are logically equivalent (all true, or all
false).
a. For each b in Rm , the equation Ax = b has a solution.
b. Each b in Rm is a linear combination of columns of A.
c. The columns of A span Rm .
d. A has a pivot position in every row.
(Note that A is the coefficient matrix.)
     
Example 1.38. Let v1 = [0; 0; 3], v2 = [0; −3; 9], and v3 = [4; −2; −6]. Does {v1 , v2 , v3 } span R3 ? Why or why not?
Example 1.39. Do the vectors [1; 0; 1; −2], [3; 1; 2; −8], and [−2; 1; −3; 2] span R4 ?

Solution.
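The pivot criterion of Theorem 1.37(d) can be tested numerically. A sketch for Example 1.39 (rank counts the pivot positions):

v1 = [1; 0; 1; -2];  v2 = [3; 1; 2; -8];  v3 = [-2; 1; -3; 2];
rank([v1 v2 v3])    % spanning R^4 would require a pivot in every one of the
                    % 4 rows, i.e., rank 4 -- impossible with only 3 columns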

True-or-False 1.40.
a. The equation Ax = b is referred to as a vector equation.
b. Each entry in Ax is the result of a dot product.
c. If A ∈ Rm×n and if Ax = b is inconsistent for some b ∈ Rm , then A can-
not have a pivot position in every row.
d. If the augmented matrix [A b] has a pivot position in every row, then
the equation Ax = b is inconsistent.
Solution.

Ans: F,T,T,F

Exercises 1.4
1. Write the system first as a vector equation and then as a matrix equation.
3x1 + x2 − 5x3 = 9
x2 + 4x3 = 0
   
2. Let u = [0; 0; 4] and A = [ 3  −5 ; −2  6 ; 1  1 ]. Is u in the plane in R3 spanned by the columns of A? (See the figure, Figure 1.8.) Why or why not?

3. The problems refer to the matrices A and B below. Make appropriate calculations that
justify your answers and mention an appropriate theorem.
   
        [  1   3   0   3 ]            [  1   3  −2   2 ]
   A =  [ −1  −1  −1   1 ]       B =  [  0   1   1  −5 ]
        [  0  −4   2  −8 ]            [  1   2  −3   7 ]
        [  2   0   3  −1 ]            [ −2  −8   2  −1 ]

(a) How many rows of A contain a pivot position? Does the equation Ax = b have a
solution for each b in R4 ?
(b) Can each vector in R4 be written as a linear combination of the columns of the
matrix A above? Do the columns of A span R4 ?
(c) Can each vector in R4 be written as a linear combination of the columns of the
matrix B above? Do the columns of B span R4 ?
Ans: (a) 3; (b) Theorem 1.37 (d) is not true
     
4. Let v1 = [0; 0; −2], v2 = [0; −3; 8], v3 = [4; −1; −5]. Does {v1 , v2 , v3 } span R3 ? Why or why not?
   Ans: The matrix [v1 v2 v3] has a pivot position in each row.
5. Could a set of three vectors in R4 span all of R4 ? Explain. What about n vectors in Rm
when n < m?
6. Suppose A is a 4 × 3 matrix and b ∈ R4 with the property that Ax = b has a unique
solution. What can you say about the reduced echelon form of A? Justify your answer.
Hint : How many pivot columns does A have?

1.5. Solution Sets of Linear Systems

Linear Systems Ax = b:
1. Homogeneous linear systems:
Ax = 0; A ∈ Rm×n , x ∈ Rn , 0 ∈ Rm . (1.19)

(a) It has always at least one solution: x = 0 (the trivial solution)


(b) Any nonzero solution is called a nontrivial solution.
2. Nonhomogeneous linear systems:
Ax = b;    A ∈ Rm×n , x ∈ Rn , b ∈ Rm , b ≠ 0.        (1.20)

Note: Ax = 0 has a nontrivial solution if and only if the system has at least
one free variable.

1.5.1. Solutions of Homogeneous Linear Systems


Example 1.41. Determine if the following homogeneous system has a
nontrivial solution. Then describe the solution set.
x1 − 2x2 + 3x3 = 0
−2x1 − 3x2 − 4x3 = 0
2x1 − 4x2 + 9x3 = 0

Definition 1.42. If the solutions of Ax = 0 can be written in the form

x = c1 u1 + c2 u2 + · · · + cr ur , (1.21)

where c1 , c2 , · · · , cr are scalars and u1 , u2 , · · · , ur are vectors with size


same as x, then they are said to be in parametric vector form.

Note: When the solutions of Ax = 0 are in the form of (1.21), we may say

      {The solution set of Ax = 0} = Span{u1 , u2 , · · · , ur }.        (1.22)

Example 1.43. Solve the system and write the solution in parametric
vector form.
x1 + 2x2 − 3x3 = 0
2x1 + x2 − 3x3 = 0
−x1 + x2 = 0

Example 1.44. Describe all solutions of Ax = 0 in parametric vector form


where A is row equivalent to the matrix.
 
   [ 1  −2  3  −6  5   0 ]
   [ 0   0  0   1  4  −6 ]
   [ 0   0  0   0  0   1 ]
   [ 0   0  0   0  0   0 ]
Hint : You should first row reduce it for the reduced echelon form.
Solution.

A single equation can be treated as a simple linear system.

Example 1.45. Solve the equation of 3 variables and write the solution in
parametric vector form.
x1 − 2x2 + 3x3 = 0

Solution. Hint: x1 is the only basic variable. Thus your solution would be of the form x = x2 v1 + x3 v2 , which is a parametric vector equation of the plane.

1.5.2. Solutions of Nonhomogeneous Linear Systems


Example 1.46. Describe all solutions of Ax = b, where

   A = [ 3  5  −4 ; −3  −2  4 ; 6  1  −8 ],    b = [7; −1; −4].

Solution.

   Ans: x = [−1; 2; 0] + x3 [4/3; 0; 1]

The solution of Example 1.46 is of the form


x = p + x3 v (= p + t v), (1.23)

where t is a general parameter. Note that (1.23) is an equation of line


through p parallel to v.
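Note (Matlab/Octave check): the structure x = p + t v can be verified directly for Example 1.46; a minimal sketch:

A = [ 3  5 -4;
     -3 -2  4;
      6  1 -8];
b = [7; -1; -4];
p = [-1; 2; 0];        % a particular solution (t = 0)
v = [4/3; 0; 1];       % a solution of the homogeneous equation A*x = 0
t = -2.5;              % any value of the parameter
A*(p + t*v) - b        % residual: (numerically) zero for every t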

In the previous example, the solution of Ax = b is x = p + t v.


Question: What is “t v"?
Solution. First of all,

Ax = A(p + t v) = Ap + A(tv) = b. (1.24)

Note that x = p + t v is a solution of Ax = b, even when t = 0. Thus,

A(p + t v)t=0 = Ap = b. (1.25)

It follows from (1.24) and (1.25) that

A(t v) = 0, (1.26)

which implies that “t v" is a solution of the homogeneous equation


Ax = 0.

Theorem 1.47. Suppose the equation Ax = b is consistent for some


given b, and let p be a solution. Then the solution set of Ax = b is the
set of all vectors of the form {w = p + uh }, where uh is the solution of
the homogeneous equation Ax = 0.

Corollary 1.48. Let Ax = b have a solution. The solution is unique if


and only if Ax = 0 has only the trivial solution.

True-or-False 1.49.
a. The solution set of Ax = b is the set of all vectors of the form {w =
p + uh }, where uh is the solution of the homogeneous equation Ax = 0.
(Compare with Theorem 1.47, p.42.)
b. The equation Ax = b is homogeneous if the zero vector is a solution.
c. The solution set of Ax = b is obtained by translating the solution of
Ax = 0.
Solution.

Ans: F,T,F

Exercises 1.5
1. Determine if the system has a nontrivial solution. Try to use as few row operations as
possible.

   (a)  2x1 −  5x2 + 8x3 = 0         (b)  3x1 + 5x2 − 7x3 = 0
        2x1 −  7x2 +  x3 = 0              6x1 + 7x2 +  x3 = 0
        4x1 − 12x2 + 9x3 = 0

Hint : x3 is a free variable for both (a) and (b).


2. Describe all solutions of Ax = 0 in parametric vector form, where A is row equivalent
to the given matrix.

   (a)  [ 1  3  −3  7 ]        (b)  [ 1  −4  −2  0  3  −5 ]
        [ 0  1  −4  5 ]             [ 0   0   1  0  0  −1 ]
                                    [ 0   0   0  0  1  −4 ]
                                    [ 0   0   0  0  0   0 ]

Hint : (b) x2 , x4 , and x6 are free variables.


3. Describe and compare the solution sets of x1 − 3x2 + 5x3 = 0 and x1 − 3x2 + 5x3 = 4. Hint :
You must solve two problems each of which has a single equation, which in turn represents a
plane. For both, only x1 is the basic variable.
4. Suppose Ax = b has a solution. Explain why the solution is unique precisely when
Ax = 0 has only the trivial solution.
5. (1) Does the equation Ax = 0 have a nontrivial solution and (2) does the equation
Ax = b have at least one solution for every possible b?

(a) A is a 3 × 3 matrix with three pivot positions.


(b) A is a 3 × 3 matrix with two pivot positions.

1.7. Linear Independence

Definition 1.50. A set of vectors {v1 , v2 , · · · , vp } in Rn is said to be


linearly independent, if the vector equation

x1 v1 + x2 v2 + · · · + xp vp = 0 (1.27)

has only the trivial solution (i.e., x1 = x2 = · · · = xp = 0). The set of


vectors {v1 , v2 , · · · , vp } is said to be linearly dependent, if there exist
weights c1 , c2 , · · · , cp , not all zero, such that

c1 v1 + c2 v2 + · · · + cp vp = 0. (1.28)

Example 1.51. Determine if the set {v1 , v2 } is linearly independent.

   1) v1 = [3; 0],  v2 = [0; 5]        2) v1 = [3; 0],  v2 = [1; 0]

Remark 1.52. Let A = [v1 , v2 , · · · , vp ]. The matrix equation Ax = 0 is


equivalent to x1 v1 + x2 v2 + · · · + xp vp = 0.
1. Columns of A are linearly independent if and only if Ax = 0 has
only the trivial solution. ( ⇔ Ax = 0 has no free variable ⇔ Every
column in A is a pivot column.)
2. Columns of A are linearly dependent if and only if Ax = 0 has a
nontrivial solution. ( ⇔ Ax = 0 has at least one free variable ⇔ A
has at least one non-pivot column.)

Example 1.53. Determine if the vectors are linearly independent.

   1) [1; 0; 0], [0; 1; 0]        2) [1; 0; 0], [0; 1; 0], [0; 0; 1], [2; 1; 2]

Solution.

Example 1.54. Determine if the vectors are linearly independent.

   [0; 2; 3],  [0; 0; −8],  [−1; 3; 1]

Solution.

Example 1.55. Determine if the vectors are linearly independent.

   [1; −2; 0],  [−2; 4; 1],  [3; −6; −1],  [2; 2; 3]

Solution.

Note: In the above example, vectors are in Rn , n = 3; the number of vectors


p = 4. Like this, if p > n then the vectors must be linearly dependent.

Theorem 1.56. The set of vectors {v1 , v2 , · · · , vp } ⊂ Rn is linearly


dependent, if p > n.

Proof. Let A = [v1 v2 · · · vp ] ∈ Rn×p . Then Ax = 0 has n equations with


p unknowns. When p > n, there are more variables than equations; this
implies there is at least one free variable, which in turn means that there
is a nontrivial solution.
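A quick Matlab/Octave illustration, using the four vectors of Example 1.55 (a sketch; rank counts the pivot columns):

A = [ 1 -2  3  2;
     -2  4 -6  2;
      0  1 -1  3];     % columns are the four given vectors in R^3
rank(A)                % at most 3 pivot columns are possible here, so with
                       % 4 columns at least one is non-pivotal and the set
                       % is linearly dependent, as Theorem 1.56 predicts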

Example 1.57. Find the value of h so that the vectors are linearly independent.

   [3; −6; 1],  [−6; 4; −3],  [9; h; 3]

Solution.

Example 1.58. (Revision of Example 1.57): Find the value of h so that c is in Span{a, b}.

   a = [3; −6; 1],  b = [−6; 4; −3],  c = [9; h; 3]

Solution.

Example 1.59. Determine by inspection if the vectors are linearly dependent.

   1) [1; 0], [0; 1], [1; 2]        2) [1; 0; −1], [0; 2; 3], [0; 0; 0]        3) [1; 2; 3], [3; 6; 9]

Solution.

Note: Let S = {v1 , v2 , · · · , vp }. If S contains the zero vector, then it is


always linearly dependent. A vector in S is a linear combination of other
vectors in S if and only if S is linearly dependent.

True-or-False 1.60.
a. The columns of any 3 × 4 matrix are linearly dependent.
b. If u and v are linearly independent, and if {u, v, w} is linearly depen-
dent, then w ∈ Span{u, v}.
c. Two vectors are linearly dependent if and only if they lie on a line
through the origin.
d. The columns of a matrix A are linearly independent, if the equation
Ax = 0 has the trivial solution.
Solution.

Ans: T,T,T,F

Exercises 1.7
1. Determine if the columns of the matrix form a linearly independent set. Justify each
answer.
   
        [ −4  −3  0 ]             [  1  −3   3  −2 ]
   (a)  [  0  −1  4 ]        (b)  [ −3   7  −1   2 ]
        [  1   0  3 ]             [  0   1  −4   3 ]
        [  5   4  6 ]

2. Find the value(s) of h for which the vectors are linearly dependent. Justify each answer.
           
   (a)  [1; −1; 4],  [3; −5; 7],  [−1; 5; h]        (b)  [1; 5; −3],  [−2; −9; 6],  [3; h; −9]

3. (a) For what values of h is v3 in Span{v1 , v2 }, and (b) for what values of h is {v1 , v2 , v3 } linearly dependent? Justify each answer.
      v1 = [1; −3; 2],  v2 = [−3; 9; −6],  v3 = [5; −7; h].
   Ans: (a) No h; (b) All h
4. Describe the possible echelon forms of the matrix. Use the notation of Exercise 3 in
Section 1.2, p. 19.

(a) A is a 3 × 3 matrix with linearly independent columns.


(b) A is a 2 × 2 matrix with linearly dependent columns.
(c) A is a 4 × 2 matrix, A = [a1 , a2 ] and a2 is not a multiple of a1 .

1.8. Linear Transformations

Example 1.61. Let A ∈ Rm×n and x ∈ Rn . Then Ax is a new vector in Rm . For example,

   A = [ 4  −3  1 ; 2  0  5 ] ∈ R2×3 ,    x = [−1; 1; 3] ∈ R3 .

Then

   Ax = [ 4  −3  1 ; 2  0  5 ] [−1; 1; 3] = [ 4·(−1) − 3·(1) + 1·(3) ; 2·(−1) + 0·(1) + 5·(3) ] = [−4; 13],

a new vector in R2 . That is, A transforms vectors to another space.

Definition 1.62. Transformation (function or mapping)


A transformation T from Rn to Rm is a rule that assigns to each vec-
tor x ∈ Rn a vector T (x) ∈ Rm . In this case, we write
T : Rn → R m
(1.29)
x 7→ T (x)

where Rn is the domain of T , Rm is the codomain of T , and T (x) denotes


the image of x under T . The set of all images is called the range of T .
Range(T ) = {T (x) | x ∈ Rn }

Figure 1.9: Transformation T : Rn → Rm .



Definition 1.63. A transformation associated with matrix multiplication is called a matrix transformation. That is, for each x ∈ Rn , T (x) = Ax, where A is an m × n matrix. We may denote the matrix transformation as

      T : Rn → Rm
      x ↦ Ax                    (1.30)

Here the range of T is the set of all linear combinations of the columns of A:

      Range(T ) = Span{columns of A}.

Example 1.64. Let A = [ 1  3 ; 0  1 ]. The transformation T : R2 → R2 defined
by T (x) = Ax is called a shear transformation. Determine the image of a
square [0, 2] × [0, 2] under T .
Solution. Hint: A matrix transformation is an affine mapping, which means that it maps line segments into line segments (and corners to corners).
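A small Matlab/Octave sketch of Example 1.64 (the corner list S below is chosen here only for illustration):

A  = [1 3; 0 1];                 % shear matrix of Example 1.64
S  = [0 2 2 0 0;                 % corners of the square [0,2] x [0,2],
      0 0 2 2 0];                % listed counterclockwise and closed
TS = A*S;                        % images of the corners under T(x) = Ax
plot(S(1,:),S(2,:),'-o', TS(1,:),TS(2,:),'-s'); axis equal
legend('square','sheared image')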
Example 1.65. Let A = [ 1  −3 ; 3  5 ; −1  7 ], u = [2; −1], b = [3; 2; −5], c = [3; 2; 5], and define a transformation T : R2 → R3 by T (x) = Ax.

a. Find T (u), the image of u under the transformation T .
b. Find an x ∈ R2 whose image under T is b.
c. Is there more than one x whose image under T is b?
d. Determine if c is in the range of the transformation T .

Solution.

   Ans: b. x = [1.5; −0.5];  c. no;  d. no

Linear Transformations
Definition 1.66. A transformation T is linear if:
(i) T (u + v) = T (u) + T (v), for all u, v in the domain of T
(ii) T (cu) = cT (u), for all scalars c and all u in the domain of T

Claim 1.67. If T is a linear transformation, then

T (0) = 0 (1.31)

and
T(c u + d v) = c T(u) + d T(v) , (1.32)
for all vectors u, v in the domain of T and all scalars c, d.
We can easily prove that if T satisfies (1.32), then T is linear.

Remark 1.68. The function f (x) = ax is a linear transformation:

f (c x1 + d x2 ) = a(cx1 + dx2 ) = c(ax1 ) + d(ax2 ) = c f (x1 ) + d f (x2 ). (1.33)

Example 1.69. Prove that a matrix transformation T (x) = Ax is linear.


Proof. It is easy to see that

T (c u + d v) = A(c u + d v) = c Au + d Av = c T (u) + d T (v),

which completes the proof, satisfying (1.32).

Remark 1.70. Repeated application of (1.32) produces a useful gener-


alization:
T (c1 v1 + c2 v2 + · · · + cp vp ) = c1 T (v1 ) + c2 T (v2 ) + · · · + cp T (vp ). (1.34)

In engineering physics, (1.34) is referred to as a superposition princi-


ple.

Example 1.71. Let θ be the angle measured from the positive x-axis counterclockwise. Then, the rotation can be defined as

   R[θ] = [ cos θ   −sin θ ; sin θ   cos θ ]        (1.35)

1) Describe R[π/2] explicitly.
2) What are the images of [1; 0] and [0; 1] under R[π/2]?
3) Is R[θ] a linear transformation?

Solution.

For example, a yaw is a counterclockwise rotation of ψ about the z-axis. The rotation matrix reads

   Rz [ψ] = [ cos ψ   −sin ψ   0 ; sin ψ   cos ψ   0 ; 0   0   1 ]

(Figure 1.10: Euler angles (roll, pitch, yaw) in aerodynamics.)
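A one-line Matlab/Octave version of (1.35), checking part 2) of Example 1.71 (a sketch; small round-off is expected from cos(pi/2)):

R = @(theta) [cos(theta) -sin(theta);
              sin(theta)  cos(theta)];    % rotation matrix R[theta]
R(pi/2)*[1; 0]     % image of e1: approximately [0; 1]
R(pi/2)*[0; 1]     % image of e2: approximately [-1; 0]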

True-or-False 1.72.
a. If A ∈ R3×5 and T is a transformation defined by T (x) = Ax, then the
domain of T is R3 .
b. A linear transformation is a special type of function.
c. The superposition principle is a physical description of a linear trans-
formation.
d. Every matrix transformation is a linear transformation.
e. Every linear transformation is a matrix transformation. (If it is false,
can you find an example that is linear but of no matrix description?)
Solution.

Ans: F,T,T,T,F

Exercises 1.8
1. With T defined by T x = Ax, find a vector x whose image under T is b, and determine whether x is unique.
      A = [ 1  −3  2 ; 0  1  −4 ; 3  −5  −9 ],    b = [6; −7; −9]
   Ans: x = [−5; −3; 1], unique
2. Answer the following

   (a) Let A be a 6 × 5 matrix. What must a and b be in order to define T : Ra → Rb by T x = Ax?
   (b) How many rows and columns must a matrix A have in order to define a mapping from R4 into R5 by the rule T x = Ax?

   Ans: (a) a = 5; b = 6
   
3. Let b = [−1; 1; 0] and A = [ 1  −4  7  −5 ; 0  1  −4  3 ; 2  −6  6  −4 ]. Is b in the range of the linear transformation x ↦ Ax? Why or why not?
   Ans: yes

4. Use a rectangular coordinate system to plot u = [5; 2], v = [−2; 4], and their images under the given transformation T . (Make a separate and reasonably large sketch.) Describe geometrically what T does to each vector x in R2 .
      T (x) = [ −1  0 ; 0  −1 ] [x1; x2]
5. Show that the transformation T defined by T (x1 , x2 ) = (2x1 − 3x2 , x1 + 4, 5x2 ) is not linear.
Hint : T (0, 0) = 0?
6. Let T : R3 → R3 be the transformation that projects each vector x = (x1 , x2 , x3 ) onto the
plane x2 = 0, so T (x) = (x1 , 0, x3 ). Show that T is a linear transformation.
Hint : Try to verify (1.32): T (cx + dy) = T (cx1 + dy1 , cx2 + dy2 , cx3 + dy3 ) = · · · = cT (x) + dT (y).

1.9. The Matrix of A Linear Transformation


Note: In Example 1.69 (p. 53), we proved that every matrix transfor-
mation is linear. The reverse is not always true. However, a linear
transformation defined in Rn is a matrix transformation.

Here in this section, we will try to find matrices for linear transformations
defined in Rn . Let’s begin with an example.
Example 1.73. Suppose T : R2 → R3 is a linear transformation such that

   T (e1 ) = [5; −7; 2],    T (e2 ) = [−3; 8; 0],    where e1 = [1; 0], e2 = [0; 1].

Find a matrix A such that T (x) = Ax for all x ∈ R2 .

Solution. What we should do is to find a matrix A ∈ R3×2 such that

   T (e1 ) = Ae1 = [5; −7; 2],    T (e2 ) = Ae2 = [−3; 8; 0].        (1.36)

Let x = [x1; x2] ∈ R2 . Then

   x = x1 [1; 0] + x2 [0; 1] = x1 e1 + x2 e2 .        (1.37)

It follows from linearity of T that

   T (x) = T (x1 e1 + x2 e2 ) = x1 T (e1 ) + x2 T (e2 )
         = [T (e1 ) T (e2 )] [x1; x2] = [ 5  −3 ; −7  8 ; 2  0 ] x,        (1.38)

where the last matrix is A. Now, you can easily check that A satisfies (1.36).

Observation 1.74. The matrix of a linear transformation is decided by


its action on the standard basis.

1.9.1. The Standard Matrix

Theorem 1.75. Let T : Rn → Rm be a linear transformation. Then


there exists a unique matrix A ∈ Rm×n such that

T (x) = Ax, for all x ∈ Rn .

In fact, with ej denoting the j-th standard unit vector in Rn ,

A = [T (e1 ) T (e2 ) · · · T (en )] . (1.39)

The matrix A is called the standard matrix of the transformation.

Note: Standard unit vectors in Rn & the standard matrix:
$$e_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \cdots, \quad e_n = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}. \qquad (1.40)$$

Any x ∈ Rn can be written as
$$x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n.$$

Thus
T (x) = T (x1 e1 + x2 e2 + · · · + xn en )
= x1 T (e1 ) + x2 T (e2 ) + · · · + xn T (en ) (1.41)
= [T (e1 ) T (e2 ) · · · T (en )] x,
and therefore the standard matrix reads
A = [T (e1 ) T (e2 ) · · · T (en )] . (1.42)
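A quick numerical illustration of (1.42) in Octave/Matlab: given T as a function handle (below, the map of Example 1.73, written out from its values on e1 and e2), the standard matrix is assembled column by column from T(e1), ..., T(en). This is only a sketch; the variable names are ours.

    % Assemble the standard matrix of a linear map from its action on e_1, ..., e_n.
    T = @(x) [5*x(1) - 3*x(2); -7*x(1) + 8*x(2); 2*x(1)];   % the map of Example 1.73
    n = 2;  m = 3;
    A = zeros(m, n);
    I = eye(n);
    for j = 1:n
        A(:, j) = T(I(:, j));        % the j-th column of A is T(e_j)
    end
    disp(A)                          % expected: [5 -3; -7 8; 2 0]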

Example 1.76. Let T : R2 → R2 be a horizontal shear transformation


that leaves e1 unchanged and maps e2 into e2 + 2e1 . Write the standard
matrix of T .
Solution.

" #
1 2
Ans: A = .
0 1

Example 1.77. Write the standard matrix for the linear transformation
T : R2 → R4 given by
T (x1 , x2 ) = (x1 + 4x2 , 0, x1 − 3x2 , x1 ).

Solution.

Geometric Linear Transformations of R2


Example 1.78. Find the standard matrices for the reflections in R2.
1) The reflection through the x1-axis, defined as $R_1\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ -x_2 \end{bmatrix}$.
2) The reflection through the line x1 = x2, defined as $R_2\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}$.
3) The reflection through the line x1 = −x2 (Define R3 first.)
Solution.

Ans: 3) $A_3 = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$

1.9.2. Existence and Uniqueness Questions

Definition 1.79.

1) A mapping T : Rn → Rm is said to be surjective (onto Rm ) if each


b in Rm is the image of at least one x in Rn .
2) A mapping T : Rn → Rm is said to be injective (one-to-one) if each
b in Rm is the image of at most one x in Rn .

Figure 1.11: Surjective?: Is the range of T all of Rm ?

Figure 1.12: Injective?: Is each b ∈ Rm the image of one and only one x in Rn ?

Note: For solutions of Ax = b, A ∈ Rm×n ; existence is related to


“surjective"-ness, while uniqueness is granted for “injective" mappings.

Example 1.80. Let T : R4 → R3 be the linear transformation whose standard matrix is
$$A = \begin{bmatrix} 1 & -4 & 0 & 1 \\ 0 & 2 & -1 & 3 \\ 0 & 0 & 0 & -1 \end{bmatrix}.$$
Is T onto? Is T one-to-one?
Solution.

Ans: onto, but not one-to-one

Theorem 1.81. Let T : Rn → Rm be a linear transformation with the


standard matrix A. Then,
(a) T maps Rn onto Rm if and only if the columns of A span Rm .
( ⇔ every row of A has a pivot position
⇔ Ax = b has a solution for all b ∈ Rm )
(b) T is one-to-one if and only if the columns of A are linearly indepen-
dent.
( ⇔ every column of A is a pivot column
⇔ Ax = 0 has “only" the trivial solution)
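Theorem 1.81 can be checked numerically by counting pivot positions in an echelon form. A small Octave/Matlab sketch for the matrix of Example 1.80 (the second output of rref lists the pivot columns):

    A = [1 -4 0 1; 0 2 -1 3; 0 0 0 -1];      % standard matrix of Example 1.80
    [R, piv] = rref(A);                       % piv = indices of the pivot columns
    onto       = (numel(piv) == size(A,1));   % a pivot in every row?
    one_to_one = (numel(piv) == size(A,2));   % a pivot in every column?
    fprintf('onto: %d, one-to-one: %d\n', onto, one_to_one)   % expect 1 and 0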
Example 1.82. Let $T(x) = \begin{bmatrix} 1 & 3 & 0 \\ 0 & 3 & 4 \\ 0 & 0 & 4 \end{bmatrix} x$. Is T one-to-one? Does T map R3 onto R3?
Solution.
Example 1.83. Let $T(x) = \begin{bmatrix} 1 & 4 \\ 0 & 0 \\ 1 & -3 \\ 1 & 0 \end{bmatrix} x$. Is T one-to-one (1–1)? Is T onto?
Solution.

Example 1.84. Let $T(x) = \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 2 & 4 & 6 \end{bmatrix} x$. Is T 1–1? Is T onto?
Solution.

Example 1.85. Let T : R4 → R3 be the linear transformation given by

T (x1 , x2 , x3 , x4 ) = (x1 − 4x2 + 8x3 + x4 , 2x2 − 8x3 + 3x4 , 5x4 ).

Is T 1–1? Is T onto?
Solution.

True-or-False 1.86.
a. A mapping T : Rn → Rm is one-to-one if each vector in Rn maps onto a
unique vector in Rm .
b. If A is a 3 × 2 matrix, then the transformation x 7→ Ax cannot map R2
onto R3 .
c. If A is a 3 × 2 matrix, then the transformation x 7→ Ax cannot be one-
to-one. (See Theorem 1.81, p.62.)
d. A linear transformation T : Rn → Rm is completely determined by its
action on the columns of the n × n identity matrix.
Solution.

Ans: F,T,F,T

Exercises 1.9
1. Assume that T is a linear transformation. Find the standard matrix of T .

(a) T : R2 → R4 , T (e1 ) = (3, 1, 3, 1) and T (e2 ) = (5, 2, 0, 0), where e1 = (1, 0) and e2 =
(0, 1).
(b) T : R2 → R2 first performs a horizontal shear that transforms e2 into e2 − 2e1 (leav-
ing e1 unchanged) and then reflects points through the line x2 = −x1.
Ans: (b) shear: $\begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$ and reflection: $\begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$; it becomes $\begin{bmatrix} 0 & -1 \\ -1 & 2 \end{bmatrix}$.
2. Show that T is a linear transformation by finding a matrix that implements the map-
ping. Note that x1 , x2 , · · · are not vectors but are entries in vectors.

(a) T (x1 , x2 , x3 , x4 ) = (0, x1 + x2 , x2 + x3 , x2 + x4 )


(b) T (x1 , x2 , x3 ) = (x1 − 5x2 + 4x3 , x2 − 6x3 )
(c) T (x1 , x2 , x3 , x4 ) = 2x1 + 3x3 − 4x4

3. Let T : R2 → R2 be a linear transformation such that T (x1 , x2 ) = (x1 +x2 , 4x1 +5x2 ). Find
x such that T (x) = (3, 8).
Ans: $x = \begin{bmatrix} 7 \\ -4 \end{bmatrix}$
4. Determine if the specified linear transformation is (1) one-to-one and (2) onto. Justify
each answer.

(a) The transformation in Exercise 2(a).


(b) The transformation in Exercise 2(b).
Ans: (a) Not 1-1, not onto; (b) Not 1-1, but onto
5. Describe the possible echelon forms of the standard matrix for a linear transformation
T , where T : R4 → R3 is onto. Use the notation of Exercise 3 in Section 1.2.
Hint : The matrix should have a pivot position in each row. Thus there are 4 different possible echelon forms.
CHAPTER 2
Matrix Algebra

Since elementary school, you have learned about numbers and operations such as addition, subtraction, multiplication, division, and factorization. Matrices are also mathematical objects, so matrix operations can be defined in a way similar to operations on numbers. Matrix algebra is the study of such matrix operations and related applications. The algorithms and techniques you will learn in this chapter are fundamental and important for developing further applications.

Contents of Chapter 2
2.1. Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.2. The Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.3. Characterizations of Invertible Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.5. Solving Linear Systems by Matrix Factorizations . . . . . . . . . . . . . . . . . . . . . . 87
2.8. Subspaces of Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
2.9. Dimension and Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103


2.1. Matrix Operations

Let A be an m × n matrix.
Let aij denote the entry in row i and
column j. Then, we write A = [aij ].

Figure 2.1: Matrix A ∈ Rm×n .

Terminologies
• If m = n, A is called a square matrix.
• If A is an n × n matrix, then the entries a11 , a22 , · · · , ann are called di-
agonal entries.
• A diagonal matrix is a square matrix (say n × n) whose non-diagonal
entries are zero.
Ex: Identity matrix In .

2.1.1. Sum, Scalar Multiple, and Matrix Multiplication

1) Equality: Two matrices A and B of the same size (say m × n) are


equal if and only if the corresponding entries in A and B are equal.
2) Sum: The sum of two matrices A = [aij ] and B = [bij ] of the same
size is the matrix A + B = [aij + bij ].
3) Scalar multiplication: Let r be any scalar. Then, rA = r[aij ] = [raij ].

Example 2.1. Let
$$A = \begin{bmatrix} 4 & 0 & 5 \\ -1 & 3 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 5 & 7 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & -3 \\ 0 & 1 \end{bmatrix}$$
a) A + B, A + C?
b) A − 2B
Solution.

Matrix Multiplication

Definition 2.2. Matrix Multiplication


If A is an m × n matrix and B is an n × p matrix with columns
b1 , b2 , · · · , bp , then the matrix product AB is a matrix with columns
Ab1 , Ab2 , · · · , Abp . That is,

AB = A[b1 b2 · · · bp ] = [Ab1 Ab2 · · · Abp ] ∈ Rm×p , (2.1)

which is a collection of matrix-vector multiplications.

Matrix multiplication, as a Composition of Two Linear Transfor-


mations. AB’s action on x:

A B : x ∈ Rp 7→ Bx ∈ Rn 7→ A(Bx) ∈ Rm (2.2)
B A

Figure 2.2

Let A ∈ Rm×n , B ∈ Rn×p , and x = [x1 , x2 , · · · , xp ]T ∈ Rp . Then,

Bx = [b1 b2 · · · bp ] x = x1 b1 + x2 b2 + · · · + xp bp
⇒ A(Bx) = A(x1 b1 + x2 b2 + · · · + xp bp )
= x1 Ab1 + x2 Ab2 + · · · + xp Abp (2.3)
= [Ab1 Ab2 · · · Abp ] x
⇒ AB = [Ab1 Ab2 · · · Abp ] ∈ Rm×p

where bi ∈ Rn .

Example 2.3. Let T : R2 → R3 and S : R2 → R2 be such that T (x) = Ax where $A = \begin{bmatrix} 4 & -3 \\ -3 & 5 \\ 0 & 1 \end{bmatrix}$ and S(x) = Bx where $B = \begin{bmatrix} 1 & 4 \\ 3 & -2 \end{bmatrix}$. Compute the
standard matrix of T ◦ S.
Solution. Hint : (T ◦ S) (x) = T (S(x)) = T (Bx) = A(Bx) = (AB)x.

Row-Column rule for Matrix Multiplication


If the product AB is defined (i.e. number of columns in A = number of
rows in B) and A ∈ Rm×n , then the entry in row i and column j of AB
is the sum of products of corresponding entries from row i of A and
column j of B. That is,

(AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj . (2.4)

The sum of products is also called the dot product.


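Both descriptions of AB are easy to check in Octave/Matlab. A minimal sketch, using the matrices of Example 2.4, that forms AB column by column (Definition 2.2) and entry by entry (the row-column rule) and compares with the built-in product:

    A = [2 3; 1 -5];   B = [4 3 6; 1 -2 3];   % matrices of Example 2.4
    [m, n] = size(A);  p = size(B, 2);

    AB_cols = zeros(m, p);
    for j = 1:p
        AB_cols(:, j) = A * B(:, j);          % column j of AB is A*b_j
    end

    AB_entries = zeros(m, p);
    for i = 1:m
        for j = 1:p
            AB_entries(i, j) = A(i, :) * B(:, j);   % row i of A dotted with column j of B
        end
    end

    disp(norm(AB_cols - A*B))                 % 0
    disp(norm(AB_entries - A*B))              % 0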

2.1.2. Properties of Matrix Multiplication


Example 2.4. Compute AB if
$$A = \begin{bmatrix} 2 & 3 \\ 1 & -5 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 3 & 6 \\ 1 & -2 & 3 \end{bmatrix}.$$
Solution.

Example 2.5. Find all columns of matrix B if


" # " #
1 2 8 7
A= , AB = .
−1 3 7 −2
Solution.

" #
2 5
Ans: B =
3 1

Example 2.6. Find the first column of B if


   
−2 3 −11 10
A =  3 3 and AB =  9 0.
   

5 −3 23 −16
Solution.

" #
4
Ans: b1 =
−1

Remark 2.7.
1. (Commutativity) Suppose both AB and BA are defined. Then, in
" AB 6=
general, # BA. " # " # " #
3 −6 −1 1 −21 −21 −4 8
A= , B= . Then AB = , BA = .
−1 2 3 4 7 7 5 −10

2. (Cancellation law) If AB = AC, then B = C needs not be true


always. (e.g., A = 0) ⇒ determinant and invertibility
3. (Powers of a matrix) If A is an n × n matrix and if k is a positive
integer, then Ak denotes the product of k copies of A:

Ak = A
| ·{z
· · A}
k times

If k = 0, then Ak is identified with the identity matrix, In .



Transpose of a Matrix

Definition 2.8. Given an m × n matrix A, the transpose of A is the


matrix, denoted by AT , whose columns are formed from the correspond-
ing rows of A.
 
1 −4 8 1
Example 2.9. If A = 0 2 −1 3, then AT =
 

0 0 0 5

Theorem 2.10. Let A = [aij ].


a. AT = [aji ]
b. (AT )T = A
c. (A + B)T = AT + B T
d. (rA)T = r AT , for any scalar r
e. (AB)T = B T AT

Note: The transpose of a product of matrices equals the product of their


transposes in the reverse order: (ABC)T = C T B T AT
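A quick numerical spot-check of rules (d) and (e) with randomly chosen matrices (the sizes are arbitrary; only compatibility matters):

    A = randn(3, 4);  B = randn(4, 2);  C = randn(2, 5);  r = 2.5;
    disp(norm((r*A)' - r*A'))             % ~0, rule (d)
    disp(norm((A*B)' - B'*A'))            % ~0, rule (e): the order reverses
    disp(norm((A*B*C)' - C'*B'*A'))       % ~0, three factors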

True-or-False 2.11.
a. Each column of AB is a linear combination of the columns of B using
weights from the corresponding column of A.
b. The second row of AB is the second row of A multiplied on the right by
B.
c. The transpose of a sum of matrices equals the sum of their transposes.
Solution.

Ans: F(T if A ↔ B),T,T

Challenge 2.12.

a. Show that if the columns of B are linearly dependent, then so are the
columns of AB.
b. Suppose CA = In (the n × n identity matrix). Show that the equation
Ax = 0 has only the trivial solution.

Hint : a. The condition means that Bx = 0 has a nontrivial solution.



Exercises 2.1
1. Compute the product AB in two ways: (a) by the definition, where Ab1 and Ab2 are
computed
 separately,
 and (b) by the row-column rule for computing AB.
−1 2 " #
3 2
A= 5 4 and B = .
 
2 1
2 −3
2. If a matrix A is 5 × 3 and the product AB is 5 × 7, what is the size of B?
3. Let $A = \begin{bmatrix} 2 & -3 \\ -4 & 6 \end{bmatrix}$, $B = \begin{bmatrix} 8 & 4 \\ 5 & 5 \end{bmatrix}$, and $C = \begin{bmatrix} 5 & -2 \\ 3 & 1 \end{bmatrix}$. Verify that AB = AC and yet B ≠ C.
" # " #
1 −2 −1 2 −1
4. If A = and AB = , determine the first and second columns of
−2 5 6 −9 3
B. " # " #
7 −8
Ans: b1 = , b2 =
4 −5

5. Give a formula for (ABx)T , where x is a vector and A and B are matrices of appropriate
sizes.
   
−2 a
6. Let u =  3 and v =  b. Compute uT v, vT u, u vT and v uT .
   

−4 c  
−2a −2b −2c
Ans: uT v = −2a + 3b − 4c and u vT =  3a 3b 3c.
 

−4a −4b −4c



2.2. The Inverse of a Matrix


Definition 2.13. An n × n matrix A is said to be invertible (nonsin-
gular) if there is an n × n matrix B such that AB = In = BA, where In
is the identity matrix.

Note: In this case, B is the unique inverse of A denoted by A−1 .


(Thus AA−1 = In = A−1 A.)
" # " #
2 5 −7 −5
Example 2.14. If A = and B = . Find AB and BA.
−3 −7 3 2
Solution.

Theorem 2.15. (Inverse of an n × n matrix, n ≥ 2) An n × n matrix


A is invertible if and only if A is row equivalent to In and in this case any
sequence of elementary row operations that reduces A into In will also
reduce In to A−1 .

Algorithm 2.16. Algorithm to find A−1 :


1) Row reduce the augmented matrix [A : In ]
2) If A is row equivalent to In , then [A : In ] is row equivalent to [In :
A−1 ]. Otherwise A does not have any inverse.
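Algorithm 2.16 can be carried out with rref. A minimal Octave/Matlab sketch, applied to the matrix of Example 2.17 below:

    A = [3 2; 8 5];
    n = size(A, 1);
    R = rref([A eye(n)]);                    % row reduce the augmented matrix [A : I]
    if norm(R(:, 1:n) - eye(n)) < 1e-12      % A row equivalent to I  =>  invertible
        Ainv = R(:, n+1:end);                % the right half is A^{-1}
        disp(Ainv)                           % expected: [-5 2; 8 -3]
    else
        disp('A is not invertible')
    end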
Example 2.17. Find the inverse of $A = \begin{bmatrix} 3 & 2 \\ 8 & 5 \end{bmatrix}$.
Solution. You may begin with
$$[A : I_2] = \begin{bmatrix} 3 & 2 & 1 & 0 \\ 8 & 5 & 0 & 1 \end{bmatrix}$$

Ans: $A^{-1} = \begin{bmatrix} -5 & 2 \\ 8 & -3 \end{bmatrix}$

Example 2.18. Find the inverse of $A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 3 \\ 4 & -3 & 8 \end{bmatrix}$, if it exists.
Solution.
Example 2.19. Find the inverse of $A = \begin{bmatrix} 1 & 2 & -1 \\ -4 & -7 & 3 \\ -2 & -6 & 4 \end{bmatrix}$, if it exists.
Solution.

Theorem 2.20.
a. (Inverse of a 2 × 2 matrix) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If ad − bc ≠ 0, then A is invertible and
$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \qquad (2.5)$$

b. If A is an invertible matrix, then A−1 is also invertible and


(A−1 )−1 = A.
c. If A and B are n × n invertible matrices then AB is also invertible
and (AB)−1 = B −1 A−1 .
d. If A is invertible, then AT is also invertible and (AT )−1 = (A−1 )T .
e. If A is an n × n invertible matrix, then for each b ∈ Rn , the equation
Ax = b has a unique solution x = A−1 b.
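Formula (2.5) translates directly into code. A small sketch (the helper name inv2x2 is made up), which also rejects the case ad − bc = 0:

    function Ainv = inv2x2(A)
    % INV2X2  Inverse of a 2-by-2 matrix via formula (2.5).
        d = A(1,1)*A(2,2) - A(1,2)*A(2,1);          % ad - bc
        if d == 0
            error('ad - bc = 0: A is not invertible');
        end
        Ainv = (1/d) * [A(2,2) -A(1,2); -A(2,1) A(1,1)];
    end

For A = [3 2; 8 5] this returns [-5 2; 8 -3], in agreement with Example 2.17.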

Example 2.21. When A, B, C, and D are n × n invertible matrices, solve


for X if C −1 (A + X)B −1 = D.
Solution.

Example 2.22. Explain why the columns of an n × n matrix A are linearly


independent when A is invertible.
Solution. Hint : Let x1 a1 + x2 a2 + · · · + xn an = 0. Then, show that x = [x1 , x2 , · · · , xn ]T = 0.

Remark 2.23. (Another view of matrix inversion) For an invertible


matrix A, we have AA−1 = In . Let A−1 = [x1 x2 · · · xn ]. Then

AA−1 = A[x1 x2 · · · xn ] = [e1 e2 · · · en ]. (2.6)

Thus the j-th column of A−1 is the solution of

Axj = ej , j = 1, 2, · · · , n.
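Remark 2.23 in code: each column of A⁻¹ is obtained by solving Ax_j = e_j. A minimal Octave/Matlab sketch with the matrix of Example 2.18:

    A = [0 1 0; 1 0 3; 4 -3 8];      % matrix of Example 2.18
    n = size(A, 1);
    I = eye(n);
    Ainv = zeros(n);
    for j = 1:n
        Ainv(:, j) = A \ I(:, j);    % solve A*x_j = e_j; x_j is column j of A^{-1}
    end
    disp(norm(A*Ainv - eye(n)))      % ~0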

True-or-False 2.24.
a. In order for a matrix B to be the inverse of A, both equations AB = In
and BA = In must be true.
" #
a b
b. If A = and ad = bc, then A is not invertible.
c d
c. If A is invertible, then elementary row operations that reduce A to the
identity In also reduce A−1 to In .
Solution.
Ans: T,T,F

Exercises 2.2
 
"
# 1 −2 1
3 −4
1. Find the inverses of the matrices, if exist: A = and B =  4 −7 3
 
7 −8
−2 6 −4
Ans: B is not invertible.
2. Use matrix algebra to show that if A is invertible and D satisfies AD = I, then D = A−1 .
Hint : You may start with AD = I and then multiply A−1 .
3. Solve the equation AB + C = BC for A, assuming that A, B, and C are square and B is
invertible.
4. Explain why the columns of an n × n matrix A span Rn when A is invertible. Hint : If A
is invertible, then Ax = b has a solution for all b in Rn .
5. Suppose A is n × n and the equation Ax = 0 has only the trivial solution. Explain why
A is row equivalent to In . Hint : A has n pivot columns.
6. Suppose A is n × n and the equation Ax = b has a solution for each b in Rn . Explain
why A must be invertible. Hint : A has n pivot columns.

2.3. Characterizations of Invertible Matrices

Theorem 2.25. (Invertible Matrix Theorem) Let A be an n × n ma-


trix. Then the following are equivalent.
a. A is an invertible matrix. (Def: There is B s.t. AB = BA = I)
b. A is row equivalent to the n × n identity matrix.
c. A has n pivot positions.
d. The equation Ax = 0 has only the trivial solution x = 0.
e. The columns of A are linearly independent.
f. The linear transformation x 7→ Ax is one-to-one.
g. The equation Ax = b has unique solution for each b ∈ Rn .
h. The columns of A span Rn .
i. The linear transformation x 7→ Ax maps Rn onto Rn .
j. There is a matrix C ∈ Rn×n such that CA = I
k. There is a matrix D ∈ Rn×n such that AD = I
l. AT is invertible and (AT )−1 = (A−1 )T .
More statements will be added in the coming sections.

Note: Let A and B be square matrices. If AB = I, then A and B are both


invertible, with B = A−1 and A = B −1 .

Example 2.26. Use the Invertible Matrix Theorem to decide if A is invert-


ible:  
1 0 −2
A =  3 1 −2
 

−5 −1 9
Solution.

Example 2.27. Can a square matrix with two identical columns be invert-
ible?

Example 2.28. An n × n upper triangular matrix is one whose entries


below the main diagonal are zeros. When is a square upper triangular ma-
trix invertible?

Theorem 2.29. (Invertible linear transformations)


1. A linear transformation T : Rn → Rn is said to be invertible if
there exists S : Rn → Rn such that S ◦ T (x) = T ◦ S(x) = x for all
x ∈ Rn . In this case, S = T −1 .
2. Also, if A is the standard matrix for T , then A−1 is the standard
matrix for T −1 .

Example 2.30. Let T : R2 → R2 be a linear transformation such that


" # " #
x1 −5x1 + 9x2
T = . Find a formula for T −1 .
x2 4x1 − 7x2
Solution.

Example 2.31. Let T : Rn → Rn be one-to-one. What can you say about


T?
Solution. Hint : T : 1-1 ⇔ Columns of A is linearly independent

True-or-False 2.32. Let A be an n × n matrix.


a. If the equation Ax = 0 has only the trivial solution, then A is row
equivalent to the n × n identity matrix.
b. If the columns of A span Rn , then the columns are linearly independent.
c. If A is an n × n matrix, then the equation Ax = b has at least one
solution for each b ∈ Rn .
d. If the equation Ax = 0 has a nontrivial solution, then A has fewer than
n pivot positions.
e. If AT is not invertible, then A is not invertible.
f. If the equation Ax = b has at least one solution for each b ∈ Rn , then
the solution is unique for each b.
Solution.
Ans: T,T,F,T,T,T

Exercises 2.3
1. An m × n lower triangular matrix is one whose entries above the main diagonal are
0’s. When is a square upper triangular matrix invertible? Justify your answer. Hint :
See Example 2.28.
2. Is it possible for a 5 × 5 matrix to be invertible when its columns do not span R5 ? Why
or why not?
Ans: No
3. If A is invertible, then the columns of A−1 are linearly independent. Explain why.
4. If C is 6 × 6 and the equation Cx = v is consistent for every v ∈ R6 , is it possible that
for some v, the equation Cx = v has more than one solution? Why or why not?
Ans: No
5. If the equation Gx = y has more than one solution for some y ∈ Rn , can the columns of
G span Rn ? Why or why not?
Ans: No
6. Let T : R2 → R2 be a linear transformation such that T (x1 , x2 ) = (6x1 − 8x2 , −5x1 + 7x2 ).
Show that T is invertible and find a formula for T −1 . Hint : See Example 2.30.

2.5. Solving Linear Systems by Matrix Factor-


izations
2.5.1. The LU Factorization/Decomposition
In industrial and business applications, the linear system is often sparse
and to be solved for multiple right-sides:
Ax = b1 , Ax = b2 , Ax = b3 , · · · . (2.7)
The LU factorization is very useful for these common problems. (The in-
verse of a matrix usually becomes a full matrix.)
Definition 2.33. Let A ∈ Rm×n . The LU factorization of A is

A = LU, (2.8)

where L ∈ Rm×m is a unit lower triangular matrix and


U ∈ Rm×n is an echelon form of A (upper triangular matrix):

Remark 2.34. Let Ax = b be to be solved. Then Ax = LU x = b,


which reads (
Ly = b,
(2.9)
U x = y.
Each algebraic equation can be solved efficiently, via substitutions.

Definition 2.35. Every elementary row operation can be expressed


as a matrix to be left-multiplied.
• Such a matrix is called an elementary matrix.
• Every elementary matrix is invertible.
 
1 −2 1
Example 2.36. Let A =  2 −2 3 .
 

−3 2 0
a) Reduce A to an echelon matrix, using replacement operations.
b) Express the replacement operations as elementary matrices.
c) Find their inverse.
Solution. a) b) & c)
 
1 −2 1
A =  2 −2 3
 

−3 2 0

Algorithm 2.37. (LU Factorization Algorithm) The following


derivation introduces an LU factorization. Let A ∈ Rm×n . Then
A = Im A
= Im E1−1 E1 A
= Im E1−1 E2−1 E2 E1 A = (E2 E1 )−1 E2 E1 A
..
= .
= Im E1−1 E2−1 · · · Ep−1 Ep · · · E2 E1 A = (Ep · · · E2 E1 )−1 Ep · · · E2 E1 A
| {z } | {z }| {z }
an echelon form L U
(2.10)
where each Ei is the elementary matrix for a replacement operation.

Remark 2.38. The LU factorization algorithm (without pivoting) uses


a sequence of “replacement” row operations to get
Ep ···E2 E1
A −−−−−−−−→ U = Ep · · · E2 E1 A
(2.11)
(Ep ···E2 E1 )−1
I −−−−−−−→ L = I E1−1 E2−1 · · · Ep−1
Example 2.39. Find the LU factorization of $A = \begin{bmatrix} 3 & -1 & 1 \\ 9 & 1 & 2 \\ -6 & 5 & -5 \end{bmatrix}$.
−6 5 −5
Solution. (Forward Phase: Gauss Elimination)
   
3 −1 1 3 −1 1
 E1 : R2 ←R2 −3R1 
A =  9 1 2  −− −−−−−−−→  0 4 −1 
 
E2 : R3 ←R3 +2R1
−6 5 −5 0 3 −3
  (2.12)
3 −1 1
E3 : R3 ←R3 − 34 R2 
−−−−−−−−−→  0 4 −1  = U

0 0 − 49

Collect the “replacement” row operations and their inverse:

A → U : R2 ← R2 −3R1 =⇒ R3 ← R3 +2R1 =⇒ R3 ← R3 − 43 R2
E3 E2 E1 A = U =⇒ A = (E3 E2 E1 )−1 U
(2.13)
L = I(E3 E2 E1 )−1 = I E1−1 E2−1 E3−1
I → L : R2 ← R2 +3R1 ⇐= R3 ← R3 −2R1 ⇐= R3 ← R3 + 43 R2

Now we construct L = I E1−1 E2−1 E3−1 = E1−1 E2−1 E3−1 I:


   
1 0 0 1 0 0
 E3−1 : R3 ←R3 + 3 R2 
I =  0 1 0  −−−−−−−−−4−→  0 1 0 
 

0 0 1 0 43 1
  (2.14)
1 0 0
E −1 : R3 ←R3 −2R1 
−−2−1−−−−−−−−→  3 1 0  = L.

E1 : R2 ←R2 +3R1
3
−2 4 1

Note: The matrix L simply collects


• all the constants of “inverse replacements”,
• which are the same as ratios to pivot values.


3 −1 1
Example 2.40. Find the LU factorization of A =  9 1 2 .
 

−6 5 −5
Solution. (Practical Implementation):
   
3 −1 1 3 −1 1
 E : R ←R2 −3R1 
A =  9 1 2  −−1−−2−−− −−→  3 4 −1 
 
E2 : R3 ←R3 +2R1
−6 5 −5 -2 3 −3
  (2.15)
3 −1 1
E3 : R3 ←R3 − 34 R2 
−−−−−−−−−→  3 4 −1 

3
-2 4 − 94

from which we can get


   
1 0 0 3 −1 1
L= 3 1 0 , U =  0 4 −1  . (2.16)
   
3
−2 4 1 0 0 − 49

Matlab-code 2.41. The LU factorization (overwritten; without pivoting)


can be implemented as
lu_nopivot_overwrite.m
1 function A = lu_nopivot_overwrite(A)
2

3 [m,n] = size(A);
4 for k = 1:m-1
5 A(k+1:m, k) = A(k+1:m,k)/A(k,k); %ratios to pivot (multipliers)
6 for i = k+1:m
7 A(i,k+1:n) = A(i,k+1:n) - A(i,k)*A(k,k+1:n);
8 end
9 end

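The factorization above is stored in place: the multipliers (ratios to the pivots) sit strictly below the diagonal and U sits on and above it. A short usage sketch recovering L and U and checking A = LU:

    A0 = [3 -1 1; 9 1 2; -6 5 -5];        % matrix of Example 2.39
    A  = lu_nopivot_overwrite(A0);        % factorization, overwritten in A
    L  = tril(A, -1) + eye(size(A,1));    % unit lower triangular factor
    U  = triu(A);                         % upper triangular factor
    disp(norm(L*U - A0))                  % ~0, so A0 = LU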
Self-study 2.42. Find the LU factorization of


 
4 3 −5
A = −4 −5 7
 

8 8 −8

2.5.2. Solutions of Triangular Algebraic Systems


Lower-Triangular Systems

• Consider the n × n system

L y = b, (2.17)

where L = [`ij ] is a nonsingular, lower-triangular matrix (`ii 6= 0).


• It is easy to see how to solve this system if we write it in detail:

`11 y1 = b1
`21 y1 + `22 y2 = b2
`31 y1 + `32 y2 + `33 y3 = b3 (2.18)
.. ..
. .
`n1 y1 + `n2 y2 + `n3 y3 + · · · + `nn yn = bn

• The first equation involves only the unknown y1 and therefore


y1 = b1 /`11 . (2.19)

• With y1 just obtained, we can determine y2 from the second equation:


y2 = (b2 − `21 y1 )/`22 . (2.20)

• With y1 , y2 known, we can solve the third equation for y3 , and so on.

Algorithm 2.43. Forward Substitution/Elimination


In general, once we have y1 , y2 , · · · , yi−1 , we can solve for yi using the ith
equation of (2.18):

$$y_i = (b_i - \ell_{i1}y_1 - \ell_{i2}y_2 - \cdots - \ell_{i,i-1}y_{i-1})/\ell_{ii} = \frac{1}{\ell_{ii}}\Big(b_i - \sum_{j=1}^{i-1} \ell_{ij}\, y_j\Big) \qquad (2.21)$$

Upper-Triangular Systems

• Consider the system


U x = y, (2.22)
where U = [uij ] ∈ Rn × n is nonsingular, upper-triangular.
• Writing it out in detail, we get

u11 x1 + u12 x2 + · · · + u1,n−1 xn−1 + u1,n xn = y1


u22 x2 + · · · + u2,n−1 xn−1 + u2,n xn = y2
.. .
. = .. (2.23)
un−1,n−1 xn−1 + un−1,n xn = yn−1
un,n xn = yn

• It is clear that we should solve the system from bottom to top.

Matlab-code 2.44. (Back Substitution):

for i=n:-1:1
if(U(i,i)==0), error(’U: singular!’); end
x(i)=y(i)/U(i,i); (2.24)
y(1:i-1)=y(1:i-1)-U(1:i-1,i)*x(i);
end

In practice, the LU factorization incorporates partial pivoting for an en-


hanced stability.

Algorithm 2.45. Gauss Elimination with Partial Pivoting


To get the solution of Ax = b:
1. Factorize A into A = P T LU (⇔ P A = LU ), where
P = permutation matrix
L = unit lower triangular matrix (i.e., with 1’s on the diagonal)
U = upper-triangular matrix
Matlab: [L,U,P] = lu(A)
2. Solve Ax = P T LU x = b
(a) r = P b (permuting b)
(b) Ly = r (forward substitution)
(c) U x = y (back substitution)

Example 2.46. (Revisit of Example 2.36) Use the LU factorization to


solve the linear system Ax = b, where
   
1 −2 1 −2
A =  2 −2 3, b =  1.
   

−3 2 0 1

Solution.
forward_sub.m
1 function y = forward_sub(L,b)
2 % function y = forward_sub(L,b)
3

4 [m,n] = size(L);
5 y = zeros(m,1); y(1)=b(1)/L(1,1);
6

7 for i=2:m
8 y(i) = ( b(i) - L(i,1:i-1)*y(1:i-1) ) / L(i,i);
9 end

back_sub.m
1 function x = back_sub(U,y)
2 %function x = back_sub(U,y)
3

4 [m,n] = size(U);
5 x = zeros(m,1); x(m)=y(m)/U(m,m);
6

7 for i=m-1:-1:1
8 x(i) = (y(i)-(U(i,i+1:end)*x(i+1:end))) / U(i,i);
9 end

lu_solve.m Output
1 A = [ 1 -2 1 1 x =
2 2 -2 3 2 1
3 -3 2 0]; 3 2
4 b = [-2 1 1]'; 4 1
5 5

6 x = A\b 6 L =
7 7 1.00000 0.00000 0.00000
8 [L,U,P] = lu(A) 8 -0.33333 1.00000 0.00000
9 r = P*b; 9 -0.66667 0.50000 1.00000
10 y = forward_sub(L,r); 10

11 x = back_sub(U,y) 11 U =
12 -3.00000 2.00000 0.00000
13 0.00000 -1.33333 1.00000
14 0.00000 0.00000 2.50000
15

16 P =
17 0 0 1
18 1 0 0
19 0 1 0
20

21 x =
22 1
23 2
24 1

Exercises 2.5
1. (Hand calculation) Solve the equation Ax = b by using the LU factorization given for
A.       
4 3 −5 1 0 0 4 3 −5 2
A = −4 −5 7 = −1 1 00 −2 2 and b = −4.
      

8 6 −8 2 0 1 0 0 2 6
Here Ly = b requires replacement operations forward, while U x = y requires replace-
ment and scaling operations backward.  
1/4
Ans: x =  2.
 

1
2. (Hand calculation) When A is invertible, Matlab finds A−1 by factoring A = LU , in-
verting L and U , and then computing U −1 L−1 . You will use this method to compute the
inverse of A given in Exercise 1.
(a) Find U −1 , starting from [U I], reduce it to [I U −1 ].
(b) Find L−1 , starting from [L I], reduce it to [I L−1 ].
(c) Compute U −1 L−1 .
 
1/8 3/8 1/4
Ans: A−1 = −3/2 −1/2 1/2.
 

−1 0 1/2

3. M 1 Use Matlab/Octave to solve the problem in Exercise 1, beginning with [L,U,P]=lu(A)


and following steps in Algorithm 2.45.
 
−1 −5 8 4
4. Let A =  4 2 −5 −7.
 

−2 −4 7 5
(a) Try to see if you can find LU factorization without pivoting.
(b) M Use Matlab/Octave to find the LU factorization of A. Then recover A from
[L,U,P].
 
2 5 4 3
−4 −9 −6 −2
5. Find LU factorization (without pivoting) of B =  .
 
 2 7 7 14
−6 −14 −10 −2  
2 5 4 3
 
0 1 2 4
Ans: U =  .
0
 0 −1 3

0 0 0 3

1
All problems marked by M will have a higher credit.

2.8. Subspaces of Rn

Definition 2.47. A subspace of Rn is any set H in Rn that has three


properties:
a) The zero vector is in H.
b) For each u and v in H, the sum u + v is in H.
c) For each u in H and each scalar c, the vector cu is in H.
That is, H is closed under linear combinations.

Example 2.48.

1. A line through the origin in R2 is a subspace of R2 .


2. Any plane through the origin in R3 .

Figure 2.3: Span{v1 , v2 } as a plane through the origin.

3. Let v1 , v2 , · · · , vp ∈ Rn . Then Span{v1 , v2 , · · · , vp } is a subspace of Rn .

Definition 2.49. Let A be an m × n matrix. The column space of A


is the set Col A of all linear combinations of columns of A. That is, if
A = [a1 a2 · · · an ], then

Col A = {u | u = c1 a1 + c2 a2 + · · · + cn an }, (2.25)

where c1 , c2 , · · · , cn are scalars. Col A is a subspace of Rm .


Example 2.50. Let $A = \begin{bmatrix} 1 & -3 & -4 \\ -4 & 6 & -2 \\ -3 & 7 & 6 \end{bmatrix}$ and $b = \begin{bmatrix} 3 \\ 3 \\ -4 \end{bmatrix}$. Determine whether b is in the column space of A, Col A.
Solution. Clue: (1) b ∈ Col A
⇔ (2) b is a linear combination of the columns of A
⇔ (3) Ax = b is consistent
⇔ (4) the system with augmented matrix [A b] has a solution

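Numerically, the check amounts to row reducing the augmented matrix [A b] and looking for an inconsistent row [0 ⋯ 0 | nonzero]. A minimal Octave/Matlab sketch (for this particular data the system turns out to be consistent):

    A = [1 -3 -4; -4 6 -2; -3 7 6];   b = [3; 3; -4];    % data of Example 2.50
    R = rref([A b]);
    bad = any( all(abs(R(:,1:end-1)) < 1e-12, 2) & abs(R(:,end)) > 1e-12 );
    if bad
        disp('b is NOT in Col A')
    else
        disp('b is in Col A')                            % expected here
    end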
Definition 2.51. Let A be an m × n matrix. The null space of A, Nul A,


is the set of all solutions of the homogeneous system Ax = 0.

Theorem 2.52. Nul A is a subspace of Rn .

Proof.

Basis for a Subspace


Definition 2.53. A basis for a subspace H in Rn is a set of vectors that
1. is linearly independent, and
2. spans H.

Remark 2.54.
(" # " #)
1 0
1. , is a basis for R2 .
0 2
     
1 0 0
0 1 0
     
, e2 = 0, · · · , en =  .... Then {e1 , e2 , · · · , en } is called
     
2. Let e1 =  0
 ..  ..
     
 .  . 0
 

0 0 1
n
the standard basis for R .

Basis for Nul A


Example 2.55. Find a basis for the null space of the matrix
$$A = \begin{bmatrix} -3 & 6 & -1 & 1 \\ 1 & -2 & 2 & 3 \\ 2 & -4 & 5 & 8 \end{bmatrix}.$$
Solution. $[A\; 0] \sim \begin{bmatrix} 1 & -2 & 0 & -1 & 0 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

Theorem 2.56. Basis for Nul A can be obtained from the parametric
vector form of solutions of Ax = 0. That is, suppose that the solutions of
Ax = 0 reads
x = x1 u1 + x2 u2 + · · · + xk uk ,
where x1 , x2 , · · · , xk correspond to free variables. Then, a basis for Nul A
is {u1 , u2 , · · · , uk }.

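The same recipe in Octave/Matlab: read off the parametric vector form from rref(A), one basis vector per free variable. A minimal sketch, applied to the matrix of Example 2.55:

    A = [-3 6 -1 1; 1 -2 2 3; 2 -4 5 8];     % matrix of Example 2.55
    [R, piv] = rref(A);                      % reduced echelon form, pivot columns
    n    = size(A, 2);
    free = setdiff(1:n, piv);                % free variables
    N = zeros(n, numel(free));
    for k = 1:numel(free)
        x = zeros(n, 1);
        x(free(k)) = 1;                      % set one free variable to 1, the rest to 0
        x(piv) = -R(1:numel(piv), free(k));  % basic variables read off the rref rows
        N(:, k) = x;
    end
    disp(N)                                  % columns: a basis for Nul A
    disp(norm(A*N))                          % ~0, each column solves Ax = 0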
Basis for Col A


Example 2.57. Find a basis for the column space of the matrix
 
1 0 −3 5 0
 
0 1 2 −1 0
B=0 0 0 0 1.

 
0 0 0 0 0
Solution. Observation: b3 = −3b1 + 2b2 and b4 = 5b1 − b2 .

Theorem 2.58. In general, non-pivot columns are linear combinations


of pivot columns. Thus the pivot columns of a matrix A form a basis for
Col A.

Example 2.59. A matrix A and an echelon form of A are given. Find a basis for Col A and a basis for Nul A.
$$A = \begin{bmatrix} 3 & -6 & 9 & 0 \\ 2 & -4 & 7 & 2 \\ 3 & -6 & 6 & -6 \end{bmatrix} \sim \begin{bmatrix} 1 & -2 & 3 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Solution.

Ans: BCol A = {a1 , a3 }, BNul A = {[2, 1, 0, 0]T , [6, 0, −2, 1]T }.


True-or-False 2.60.
a. If v1 , v2 , · · · , vp are in Rn , then Span{v1 , v2 , · · · , vp } = Col [v1 v2 · · · vp ].
b. The columns of an invertible n × n matrix form a basis for Rn .
c. Row operations do not affect linear dependence relations among the
columns of a matrix.
d. The column space of a matrix A is the set of solutions of Ax = b.
e. If B is an echelon form of a matrix A, then the pivot columns of B form
a basis for Col A.
Solution.
Ans: T,T,T,F,F

Exercises 2.8
       
1 4 5 −4
−2 −7 −8  10
1. Let v1 =  , v2 =  , v3 =  , and u =  . Determine if u is in the subspace
       
 4  9  6 −7
3 7 5 −5
of R4 generated by {v1 , v2 , v3 }.
Ans: No
       
−3 −2 0 1
2. Let v1 =  0, v2 =  2, v3 = −6, and p =  14. Determine if p is in Col A, where
       

6 3 3 −9
A = [v1 v2 v3 ].
Ans: Yes
3. Give integers p and q such that Nul A is a subspace of Rp and Col A is a subspace of Rq .
 
  1 2 3
3 2 1 −5  4 5 7
A = −9 −4 1 7 B = 
   

−5 −1 0
9 2 −5 1
2 7 11
4. Determine which sets are bases for R3 . Justify each answer.
             
0 5 6 1 3 −2 0
a)  1, −7, 3 b) −6, −4,  7,  1
             

−2 4 5 −7 7 5 −2
Ans: a) Yes
5. Matrix A and its echelon form is given. Find a basis for Col A and a basis for Nul A.
   
1 4 8 −3 −7 1 4 8 0 5
−1 2 7 3 4 0 −1
0 2 5
 
A =   ∼   Hint : For a basis for Col A, you can just
 
−2 2 9 5 5 0 0 0 1 4
3 6 9 −5 −2 0 0 0 0 0
recognize pivot columns, while you should find the solutions of Ax = 0 for Nul A.
6. a) Suppose F is a 5 × 5 matrix whose column space is not equal to R5 . What can you
say about Nul F ?
b) If R is a 6 × 6 matrix and Nul R is not the zero subspace, what can you say about
Col R?
c) If Q is a 4 × 4 matrix and Col Q = R4 , what can you say about solutions of equations
of the form Qx = b for b in R4 ?
Ans: b) Col R 6= R6 . Why? c) It has always a unique solution

2.9. Dimension and Rank


2.9.1. Coordinate Systems

Remark 2.61. The main reason for selecting a basis for a subspace
H (instead of merely a spanning set) is that each vector in H can be
written in only one way as a linear combination of the basis vectors.

Example 2.62. Let B = {b1 , b2 , · · · , bp } be a basis of H and

x = c1 b1 + c2 b2 + · · · + cp bp ; x = d1 b1 + d2 b2 + · · · + dp bp , x ∈ H.

Show that c1 = d1 , c2 = d2 , · · · , cp = dp .
Solution. Hint : A property of a basis is that basis vectors are linearly independent

Remark 2.63. For example, if a vector x ∈ R3 is expressed as


     
1 0 0
x = x1 0 + x2 1 + x3 0 = x1 e1 + x2 e2 + x3 e3 , (2.26)
     

0 0 1

then [x1 , x2 , x3 ]T is called the coordinate vector of x.

Definition 2.64. Suppose the set B = {b1 , b2 , · · · , bp } is a basis for a


subspace H. For each x ∈ H, the coordinates of x relative to the ba-
sis B are the weights c1 , c2 , · · · , cp such that x = c1 b1 + c2 b2 + · · · + cp bp ,
and the vector in Rp  
c1
 ..
[x]B =  .
cp
is called the coordinate vector of x (relative to B) or the B-
coordinate vector of x.
Example 2.65. Let $v_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}$, $v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$, $x = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix}$, and B = {v1 , v2 }.
Then B is a basis for H = Span{v1 , v2 }, because v1 and v2 are linearly
independent. Determine if x is in H, and if it is, find the coordinate vector
of x relative to B.
Solution.

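Finding [x]_B amounts to solving [v1 v2] c = x. A minimal Octave/Matlab sketch for the data of Example 2.65:

    v1 = [3; 6; 2];  v2 = [-1; 0; 1];  x = [3; 12; 7];
    P = [v1 v2];
    c = P \ x;                  % coordinates of x relative to B = {v1, v2}
    disp(c)
    disp(norm(P*c - x))         % ~0 confirms that x lies in H = Span{v1, v2}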
Figure 2.4: A coordinate system on a plane H ⊂ R3 .

Remark 2.66. The grid on the plane in Figure 2.4 makes H “look"
like R2 . The correspondence x 7→ [x]B is a one-to-one correspondence
between H and R2 that preserves linear combinations. We call such a
correspondence an isomorphism, and we say that H is isomorphic to
R2 .

2.9.2. Dimension of a Subspace

Definition 2.67. The dimension of a nonzero subspace H (dim H) is


the number of vectors in any basis for H. The dimension of zero subspace
{0} is defined to be zero.

Example 2.68.

a) dim Rn = n.
b) Let H be as in Example 2.65. What is dim H?
c) G = Span{u}. What is dim G?
Solution.

Remark 2.69.
1) Dimension of Col A:

dim Col A = The number of pivot columns in A

which is called the rank of A, rank A.


2) Dimension of Nul A:

dim Nul A = The number of free variables in A


= The number of non-pivot columns in A

Theorem 2.70. (Rank Theorem) Let A ∈ Rm×n . Then

dim Col A + dim Nul A = rank A + nullity A = n


= (the number of columns in A)

Here, “dim Nul A” is called the nullity of A: nullity A


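The Rank Theorem is easy to confirm numerically: rank(A) counts the pivot columns, and the number of columns returned by null(A) is the nullity. A small Octave/Matlab sketch with the matrix of Example 2.59:

    A = [3 -6 9 0; 2 -4 7 2; 3 -6 6 -6];     % matrix of Example 2.59
    r       = rank(A);                       % dim Col A
    nullity = size(null(A), 2);              % dim Nul A
    fprintf('rank %d + nullity %d = %d columns\n', r, nullity, size(A,2))
    % expected: 2 + 2 = 4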

Example 2.71. A matrix and its echelon form are given. Find the bases
for Col
 A and Nul A and also
 state
 the dimensions
 of these subspaces.
1 −2 −1 5 4 1 0 1 0 0
   
 2 −1 1 5 6 0 1 1 0 0
A= −2 0 −2 1 −6 ∼ 0 0 0 1 0
  
   
3 1 4 1 5 0 0 0 0 1
Solution.

Example 2.72. Find a basis for the subspace spanned by the given vectors.
What
 isthe 
dimension
   of the subspace?

1 2 0 −1 3
         
−1 −3 −1  4 −7
 ,  ,  ,  ,  
−2 −1  3 −7  6
         
3 4 2 7 −9
Solution.

Theorem 2.73. (The Basis Theorem) Let H be a p-dimensional sub-


space of Rn . Then
a) Any linearly independent set of exactly p elements in H is automati-
cally a basis for H
b) Any set of p elements of H that spans H is automatically a basis for
H.

Theorem 2.74. (Invertible Matrix Theorem; continued from The-


orem 2.25, p.82)
Let A be an n × n square matrix. Then the following are equivalent.
m. The columns of A form a basis of Rn
n. Col A = Rn
o. dim Col A = n
p. rank A = n
q. Nul A = {0}
r. dim Nul A = 0

Example 2.75.

a) If the rank of a 9 × 8 matrix A is 7, what is the dimension of solution


space of Ax = 0?
b) If A is a 4 × 3 matrix and the mapping x 7→ Ax is one-to-one, then what
is dim Nul A?
Solution.

True-or-False 2.76.
a. Each line in Rn is a one-dimensional subspace of Rn .
b. The dimension of Col A is the number of pivot columns of A.
c. The dimensions of Col A and Nul A add up to the number of columns of
A.
d. If B = {v1 , v2 , · · · , vp } is a basis for a subspace H of Rn , then the corre-
spondence x 7→ [x]B makes H look and act the same as Rp .
Hint : See Remark 2.66.

e. The dimension of Nul A is the number of variables in the equation


Ax = 0.
Solution.

Ans: F,T,T(Rank Theorem),T,F

Exercises 2.9
1. A matrix and its echelon form is given. Find the bases for Col A and Nul A, and then
state the dimensions of these subspaces.
   
1 −3 2 −4 1 −3 2 −4
3 9 −1 5
 0
 0 5 −7
A= ∼
 

2 −6 4 −3 0 0 0 5
4 12 2 7 0 0 0 0
2. Use the Rank Theorem to justify each answer, or perform construction.

(a) If the subspace of all solutions of Ax = 0 has a basis consisting of three vectors and
if A is a 5 × 7 matrix, what is the rank of A?
Ans: 4
(b) What is the rank of a 4 × 5 matrix whose null space is three-dimensional?
(c) If the rank of a 7 × 6 matrix A is 4, what is the dimension of the solution space of
Ax = 0?
(d) Construct a 4 × 3 matrix with rank 1.
CHAPTER 3
Determinants

In linear algebra, the determinant is a scalar value that can be computed


for a square matrix. Geometrically, it can be viewed as a volume scaling
factor of the linear transformation described by the matrix, x 7→ Ax.
For example, let A ∈ R2×2 and
" # " #
1 0
u1 = A , u2 = A
0 1

Then the determinant of A (in modulus) is the same as the area of the par-
allelogram generated by u1 and u2 .
In this chapter, you will study the determinant and its properties.

Contents of Chapter 3
3.1. Introduction to Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.2. Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115


3.1. Introduction to Determinants


Definition 3.1. Let A be an n × n square matrix. Then the determinant of A is a scalar value, denoted by det A or |A|.
1) Let A = [a] ∈ R1×1. Then det A = a.
2) Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in \mathbb{R}^{2\times 2}$. Then det A = ad − bc.

Example 3.2. Let $A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}$. Consider a linear transformation T : R2 → R2
defined by T (x) = Ax.

1) Find the determinant of A.


2) Determine the image of a rectangle R = [0, 2] × [0, 1] under T .
3) Find the area of the image.
4) Figure out how det A, the area of the rectangle (= 2), and the area of the
image are related.
Solution.

Ans: 3) 12

Note: The determinant can be viewed as a volume scaling factor.



Definition 3.3. Let Aij be the submatrix of A obtained by deleting row


i and column j of A. Then the (i, j)-cofactor of A = [aij ] is the scalar Cij ,
given by
Cij = (−1)i+j det Aij . (3.1)

Definition 3.4. For n ≥ 2, the determinant of an n × n matrix A = [aij ]


is given by the following formulas:
1. The cofactor expansion across the first row:

det A = a11 C11 + a12 C12 + · · · + a1n C1n (3.2)

2. The cofactor expansion across the row i:

det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin (3.3)

3. The cofactor expansion down the column j:

det A = a1j C1j + a2j C2j + · · · + anj Cnj (3.4)

Example 3.5. Find the determinant of $A = \begin{bmatrix} 1 & 5 & 0 \\ 2 & 4 & -1 \\ 0 & -2 & 0 \end{bmatrix}$, by expanding across the first row and down column 3.
Solution.

Ans: −2
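Formula (3.2) translates directly into a recursive function (exponentially expensive, so only for small matrices; the file name det_cofactor.m is made up). A minimal Octave/Matlab sketch:

    function d = det_cofactor(A)
    % DET_COFACTOR  Determinant via cofactor expansion across the first row, (3.2).
        n = size(A, 1);
        if n == 1
            d = A(1,1);
            return
        end
        d = 0;
        for j = 1:n
            A1j = A(2:end, [1:j-1, j+1:n]);              % delete row 1 and column j
            d = d + (-1)^(1+j) * A(1,j) * det_cofactor(A1j);
        end
    end

For the matrix of Example 3.5 it returns −2, matching the hand computation.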
 
1 −2 5 2
 
2 −6 −7 5
Example 3.6. Compute the determinant of A = 
0 0 3
 by a
 0

5 0 4 4
cofactor expansion.
Solution.

Note: If A is a triangular (upper or lower) matrix, then det A is the


product of entries on the main diagonal of A.
 
1 −2 5 2
 
0 −6 −7 5
Example 3.7. Compute the determinant of A =  0 0 3 0.

 
0 0 0 4
Solution.

True-or-False 3.8.
a. An n × n determinant is defined by determinants of (n − 1) × (n − 1)
submatrices.
b. The (i, j)-cofactor of a matrix A is the matrix Aij obtained by deleting
from A its i-th row and j-th column.
c. The cofactor expansion of det A down a column is equal to the cofactor
expansion along a row.
d. The determinant of a triangular matrix is the sum of the entries on the
main diagonal.
Solution.
Ans: T,F,T,F

Exercises 3.1
1. Compute the determinants in using a cofactor expansion across the first row and down
the second column.
   
3 0 4 2 3 −3
a) 2 3 2 b) 4 0 3
   

0 5 −1 6 1 5
Ans: a) 1, b) −24
2. Compute the determinants by cofactor expansions. At each step, choose a row or column
that involves the least amount of computation.
   
3 5 −6 4 4 0 −7 3 −5
0 −2 3 −3 0 0

2 0 0

a) 
   
b) 

0 0 1 5 7 3 −6 4 −8 

0 0 0 3 5 0 5 2 −3
 
0 0 9 −1 2

Ans: a) −18, b) 6
3. The expansion of a 3 × 3 determinant can be remembered by the following device. Write
a second copy of the first two columns to the right of the matrix, and compute the deter-
minant by multiplying entries on six diagonals:

Figure 3.1

Then, add the downward diagonal products and subtract the upward products. Use this
method to compute the determinants for the matrices in Exercise 1. Warning: This
trick does not generalize in any reasonable way to 4 × 4 or larger matrices.
4. Explore the effect of an elementary row operation on the determinant of a matrix. In
each case, state the row operation and describe how it affects the determinant.
" # " # " # " #
a b c d a b a + kc b + kd
a) , b) ,
c d a b c d c d
Ans: b) Replacement does not change the determinant
" #
3 1
5. Let A = . Write 5A. Is det (5A) = 5det A?
4 2

3.2. Properties of Determinants


Determinants under Elementary Row Operations

Theorem 3.9. Let A be an n × n square matrix.


a) (Replacement): If B is obtained from A by a row replacement, then
det B = det A.
" # " #
1 3 1 3
A= , B=
2 1 0 −5

b) (Interchange): If two rows of A are interchanged to form B, then


det B = −det A.
" # " #
1 3 2 1
A= , B=
2 1 1 3

c) (Scaling): If one row of A is multiplied by k (6= 0), then


det B = k · det A.
" # " #
1 3 1 3
A= , B=
2 1 −4 −2

 
1 −4 2
Example 3.10. Compute det A, where A = −1 7 0, after applying
 

−2 8 −9
a couple of steps of replacement operations.
Solution.

Ans: 15

Theorem 3.11. Invertible Matrix Theorem (p.82)


A square matrix A is invertible if and only if det A ≠ 0.

Claim 3.12. Let A and B be n × n matrices.
a) det $A^T$ = det A.
   e.g., $A = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$, $A^T = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}$.
b) det (AB) = det A · det B.
   e.g., $A = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 1 \\ 4 & 2 \end{bmatrix}$; then $AB = \begin{bmatrix} 13 & 7 \\ 6 & 4 \end{bmatrix}$.
c) If A is invertible, then $\det A^{-1} = \dfrac{1}{\det A}$. (∵ det In = 1.)

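A quick numerical spot-check of a)–c) with the 2 × 2 matrices used above:

    A = [1 3; 2 1];   B = [1 1; 4 2];
    disp(det(A') - det(A))              % 0,   property a)
    disp(det(A*B) - det(A)*det(B))      % 0,   property b)
    disp(det(inv(A)) - 1/det(A))        % ~0,  property c)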
Example 3.13. Suppose the 5 × 5 matrices A, A1, A2, and A3 are related by the following elementary row operations:
$$A \xrightarrow{R_2 \leftarrow R_2 - 3R_1} A_1 \xrightarrow{R_3 \leftarrow (1/5)R_3} A_2 \xrightarrow{R_4 \leftrightarrow R_5} A_3$$
Find det A, if
$$A_3 = \begin{bmatrix} 1 & 2 & 3 & 4 & 1 \\ 0 & -2 & 1 & -1 & 1 \\ 0 & 0 & 3 & 0 & 1 \\ 0 & 0 & 0 & -1 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
Solution.

Ans: −30

A Linearity Property of the Determinant Function

Note: Let A ∈ Rn×n , A = [a1 · · · aj · · · an ]. Suppose that the j-th col-


umn of A is allowed to vary, and write

A = [a1 · · · aj−1 x aj+1 · · · an ].

Define a transformation T from Rn to R by

T (x) = det [a1 · · · aj−1 x aj+1 · · · an ]. (3.5)

Then,
T (cx) = cT (x)
(3.6)
T (u + v) = T (u) + T (v)
This (multi-) linearity property of the determinant turns out to have
many useful consequences that are studied in more advanced courses.

True-or-False 3.14.
a. If the columns of A are linearly dependent, then det A = 0.
b. det (A + B) = det A + det B.
c. If three row interchanges are made in succession, then the new deter-
minant equals the old determinant.
d. The determinant of A is the product of the diagonal entries in A.
e. If det A is zero, then two rows or two columns are the same, or a row or
a column is zero.
Solution.

Ans: T,F,F,F,F

Exercises 3.2
1. Find the determinant by row reduction to echelon form.
 
1 3 0 2
−2 −5 7 4
 
 
 3 5 2 1
1 −1 2 −3
Ans: 0
2. Use
 determinants
 to find out if the matrix is invertible.
2 0 0 6
1 −7 −5 0
 
 
3 8 6 0
0 7 5 4
Ans: Invertible
3. Use
 determinants
    to  decide if the set of vectors is linearly independent.
7 −8 7
−4,  5,  0
     

−6 7 −5
Ans: linearly independent
 
1 0 1
4. Compute det B 6 , where B = 1 1 2.
 

1 2 1
Ans: 64
5. Show or answer with justification.

a) Let A and P be square matrices, with P invertible. Show that det (P AP −1 ) = det A.
b) Suppose that A is a square matrix such that det A3 = 0. Can A can be invertible?
Ans: No
c) Let U be a square matrix such that U T U = I. Show that det U = ±1.

6. Compute
" AB # and "verify# that det AB = det A · det B.
3 0 2 0
A= ,B=
6 1 5 4
CHAPTER 4
Vector Spaces

A vector space (also called a linear space) is a nonempty set of objects,


called vectors, which is closed under two operations:

• addition and
• scalar multiplication.

In this chapter, we will study basic concepts of such general vector spaces
and their subspaces.

Contents of Chapter 4
4.1. Vector Spaces and Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.2. Null Spaces, Column Spaces, and Linear Transformations . . . . . . . . . . . . . . . . 125


4.1. Vector Spaces and Subspaces

Definition 4.1. A vector space is a nonempty set V of objects, called


vectors, on which are defined two operations, called addition and
multiplication by scalars (real numbers), subject to the ten axioms
(or rules) listed below. The axioms must hold for all vectors u, v, w ∈ V
and for all scalars c and d.
1. u + v ∈ V
2. u + v = v + u
3. (u + v) + w = u + (v + w)
4. There is a zero vector 0 ∈ V such that u + 0 = u
5. For each u ∈ V , there is a vector −u ∈ V such that u + (−u) = 0
6. cu ∈ V
7. c(u + v) = cu + cv
8. (c + d)u = cu + du
9. c(du) = (cd)u
10. 1u = u

Example 4.2. Examples of Vector Spaces:

a) Rn , n ≥ 1, are the premier examples of vector spaces.


b) Let Pn = {p(t) = a0 +a1 t1 +· · ·+an tn }, n ≥ 1. For p(t) = a0 +a1 t1 +· · ·+an tn
and q(t) = b0 + b1 t1 + · · · + bn tn , define

(p + q)(t) = p(t) + q(t) = (a0 + b0 ) + (a1 + b1 )t1 + · · · + (an + bn )tn


(cp)(t) = cp(t) = ca0 + ca1 t1 + · · · + can tn

Then Pn is a vector space, with the usual polynomial addition and scalar
multiplication.
c) Let V = {all real-valued functions defined on a set D}. Then, V is a vec-
tor space, with the usual function addition and scalar multiplication.

Definition 4.3. A subspace of a vector space V is a subset H of V that


has three properties:
a) 0 ∈ H, where 0 is the zero vector of V
b) H is closed under vector addition: for each u, v ∈ H, u + v ∈ H
c) H is closed under scalar multiplication: for each u ∈ H and each
scalar c, cu ∈ H

Example 4.4. Examples of Subspaces:

a) H = {0}: the zero subspace


b) Let P = {all polynomials with real coefficients defined on R}. Then, P
is a subspace of the space {all real-valued functions defined on R}.
c) The vector space R2 is not a subspace of R3 , because R2 6⊂ R3 .
  

 s 

d) Let H =  t : s, t ∈ R . Then H is a subspace of R3 .
 
 
0
 

Example 4.5. Determine if the given set is a subspace of Pn for an appro-


priate value of n.
 2
a) at | a ∈ R b) {p ∈ P3 with integer coefficients}

c) a + t2 | a ∈ R d) {p ∈ Pn | p(0) = 0}

Solution.

Ans: a) Yes, b) No, c) No, d) Yes



A Subspace Spanned by a Set


Example 4.6. Let v1 , v2 ∈ V , a vector space. Prove that H = Span{v1 , v2 }
is a subspace of V .
Solution.

a) 0 ∈ H, where 0 is the zero vector of V

b) For each u, w ∈ H, u + w ∈ H

c) For each u ∈ H and each scalar c, cu ∈ H

Theorem 4.7. If v1 , v2 , · · · , vp are in a vector space V , then


Span{v1 , v2 , · · · , vp } is a subspace of V .

Example 4.8. Let H = {(a − 3b, b − a, a, b) | a, b ∈ R}. Show that H is a


subspace of R4 .
Solution.

Self-study 4.9. Let H and K be subspaces of V . Define the sum of H and


K as
H + K = {u + v | u ∈ H, v ∈ K}.
Prove that H + K is a subspace of V .
Solution.

True-or-False 4.10.
a. A vector is an arrow in three-dimensional space.
b. A subspace is also a vector space.
c. R2 is a subspace of R3 .
d. A subset H of a vector space V is a subspace of V if the following con-
ditions are satisfied: (i) the zero vector of V is in H, (ii) u, v, and u + v
are in H, and (iii) c is a scalar and cu is in H.
Solution.

Ans: F,T,F,F (In (ii), there is no statement that u and v represent all possible elements of H)

Exercises 4.1
You may use Definition 4.3, p. 121, or Theorem 4.7, p. 122.
(" # )
x
1. Let V be the first quadrant in the xy-plane; that is, let V = : x ≥ 0, y ≥ 0 .
y

a) If u and v are in V , is u + v in V ? Why?


b) Find a specific vector u in V and a specific scalar c such that cu is not in V . (This is
enough to show that V is not a vector space.)
 
s
2. Let H be the set of all vectors of the form 3s.
 

2s
 
a) Find a vector v in R3 such that H = Span{v}. 1
Ans: a) v = 3
 
b) Why does this show that H is a subspace of R3 ?
2
 
5b + 2c
3. Let W be the set of all vectors of the form  b.
 

a) Find vectors u and v in R3 such that W = Span{u, v}.


b) Why does this show that W is a subspace of R3 ?
 
s + 3t
 s − t
4. Let W be the set of all vectors of the form  . Show that W is a subspace of R4 .
 
2s − t
4t
For fixed positive integers m and n, the set Mm × n of all m × n matrices is a vector space,
under the usual operations of addition of matrices and multiplication by real scalars.
" #
a b
5. Determine if the set H of all matrices of the form is a subspace of M2 × 2 .
0 d

4.2. Null Spaces, Column Spaces, and Linear


Transformations
Note: In applications of linear algebra, subspaces of Rn usually arise
in one of two ways:
a) as the set of all solutions to a homogeneous linear system, or
b) as the set of all linear combinations of certain vectors.
In this section, we study these two descriptions of subspaces.
• The section looks like a duplication of Section 2.8. Subspaces of Rn .
• It aims to practice to use the concept of a subspace.

The Null Space of a Matrix

Definition 4.11. The null space of an m × n matrix A, written as


Nul A, is the set of all solutions of the homogeneous equation Ax = 0. In
set notation,
Nul A = {x | x ∈ Rn and Ax = 0}
 
" # 5
1 −3 −2
Example 4.12. Let A = and v =  3. Determine if v
 
−5 9 1
−2
belongs to the null space of A.
Solution.

Theorem 4.13. The null space of an m × n matrix A is a subspace


of Rn . Equivalently, the set of all solutions to a system Ax = 0 of m
homogeneous linear equations in n unknowns is a subspace of Rn .

Proof.

Example 4.14. Let H be the set of all vectors in R4 , whose coordinates


(
a − 2b + 5c = d
a, b, c, d satisfy the equations . Show that H is a subspace
c−a = b
of R4 .
Solution. Rewrite the above equations as
(
a − 2b + 5c − d = 0
−a − b + c = 0
 
a " #
 
 b 1 −2 5 −1
 c is the solution of −1 −1 1 0 x = 0. Thus the collection of
Then  
 
d
these solutions is a subspace.

An Explicit Description of Nul A


Example 4.15. Find a spanning set for the null space of the matrix
 
−3 6 −1 1 −7
A =  1 −2 2 3 −1
 

2 −4 5 8 −4
 
1 −2 0 −1 3 0
Solution. [A 0] ∼ 0 0 1 2 −2 0 (R.E.F)
 

0 0 0 0 0 0

The Column Space of a Matrix

Definition 4.16. The column space of an m × n matrix A, written


as Col A, is the set of all linear combinations of the columns of A. If
A = [a1 a2 · · · an ], then

Col A = Span{a1 , a2 , · · · , an } (4.1)

Theorem 4.17. The column space of an m × n matrix A is a subspace


of Rm .

Example 4.18. Find a matrix A such that W = Col A.


  
 6a − b
 

W =  a + b : a, b ∈ R
 
 
−7a
 
Solution.

Remark 4.19. The column space of an m × n matrix A is all of Rm if


and only if the equation Ax = b has a solution for each b ∈ Rm .

The Contrast Between Nul A and Col A

Remark 4.20. Let A ∈ Rm×n .


1. Nul A is a subspace of Rn . 1. Col A is a subspace of Rm .
2. Nul A is implicitly defined; that is, you 2. Col A is explicitly defined; that is, you
are given only a condition (Ax = 0). are told how to build vectors in Col A.
3. It takes time to find vectors in Nul A. 3. It is easy to find vectors in Col A. The
Row operations on [A 0] are required. columns of A are displayed; others are
formed from them.
4. There is no obvious relation between 4. There is an obvious relation between
Nul A and the entries in A. Col A and the entries in A, since each
column of A is in Col A.
5. A typical vector v in Nul A has the prop- 5. A typical vector v in Col A has the prop-
erty that Av = 0. erty that the equation Ax = v is consis-
tent.
6. Given a specific vector v, it is easy to 6. Given a specific vector v, it may take
tell if v is in Nul A. Just compute Av. time to tell if v is in Col A. Row opera-
tions on [A v] are required.
7. Nul A = {0} ⇔ the equation Ax = 0 7. Col A = Rm ⇔ the equation Ax = b
has only the trivial solution. has a solution for every b ∈ Rm .
8. Nul A = {0} ⇔ the linear transforma- 8. Col A = Rm ⇔ the linear transforma-
tion x 7→ Ax is one-to-one. tion x 7→ Ax maps Rn onto Rm .

Kernel (Null Space) and Range of a Linear Transformation

Definition 4.21. A linear transformation T from a vector space V


into a vector space W is a rule that assigns to each vector v ∈ V a unique
vector T (x) ∈ W , such that

(i) T (u + v) = T (u) + T (v) for all u, v ∈ V , and


(4.2)
(ii) T (cu) = cT (u) for all u ∈ V and scalar c

Example 4.22. Let T : V → W be a linear transformation from a vector


space V into a vector space W . Prove that the range of T is a subspace of
W.
Hint : Typical elements of the range have the form T (u) and T (v) for u, v ∈ V . See Defini-
tion 4.3, p. 121; you should check if the three conditions are satisfied.
Solution.

True-or-False 4.23.
a. The column space of A is the range of the mapping x 7→ Ax.
b. The kernel of a linear transformation is a vector space.
c. Col A is the set of all vectors that can be written as Ax for some x.
That is, Col A = {b | b = Ax, for x ∈ Rn }.
d. Nul A is the kernel of the mapping x 7→ Ax.
e. Col A is the set of all solutions of Ax = b.
Solution.

Ans: T,T,T,T,F

Exercises 4.2
1. Either show that the given set is a vector space, or find a specific example to the contrary.
     
 a
 
  b − 2d
 

(a)  b : a + b + c = 2 (b) b + 3d : b, d ∈ R
   
   
c d
   

Hint : See Definition 4.3, p. 121, and Example 4.18.


   
−8 −2 −9 2
2. Let A =  6 4 8 and w =  1.
   

4 0 4 −2

(a) Is w in Col A? If yes, express w as a linear combination of columns of A.


(b) Is w in Nul A? Why?
Ans: yes, yes
3. Let V and W be vector spaces, and let T : V → W be a linear transformation. Given a
subspace U of V , let T (U ) denote the set of all images of the form T (x), where x ∈ U .
Show that T (U ) is a subspace of W .
Hint : You should check if the three conditions in Definition 4.3 are satisfied for all
elements in T (U ). For example, for the second condition, let’s first select two arbitrary
elements in T (U ): T (u1 ) and T (u2 ), where u1 , u2 ∈ U . Then what you have to do is to
show T (u1 ) + T (u2 ) ∈ T (U ). To show the underlined, you may use the assumption that
T is linear. That is, T (u1 ) + T (u2 ) = T (u1 + u2 ). Is the term in blue in T (U )? Why?
Advice from an old man: I know some of you may feel that the last problem is crazy. It is related
to mathematical logic and understandability. Just try to beat your brain out for it.
CHAPTER 5
Eigenvalues and Eigenvectors

In this chapter, for square matrices, you will study

• How to find eigenvalues and eigenvectors


• Similarity transformation & diagonalization
• How to estimate eigenvalues by computation
• Applications to differential equations & Markov Chains

Contents of Chapter 5
5.1. Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2. The Characteristic Equation and Similarity Transformation . . . . . . . . . . . . . . . 138
5.3. Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
5.5. Complex Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
5.7. Applications to Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
5.8. Iterative Estimates for Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.9. Applications to Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168


5.1. Eigenvectors and Eigenvalues

Definition 5.1. Let A be an n × n matrix. An eigenvector of A is a


nonzero vector x such that Ax = λx for some scalar λ. In this case, the
scalar λ is an eigenvalue of A and x is the corresponding eigenvector.

Example 5.2. Is [−1, 1]^T an eigenvector of [5 2; 3 6]? What is the eigenvalue?
Solution.

Example 5.3. Let A = [1 6; 5 2], u = [6, −5]^T, and v = [3, −2]^T.

a) Are u and v eigenvectors of A?

b) Show that 7 is an eigenvalue of matrix A, and find the corresponding


eigenvectors.
Solution. Hint : b) Start with Ax = 7x.

Definition 5.4. The set of all solutions of (A − λI) x = 0 is called the


eigenspace of A corresponding to eigenvalue λ.

Remark 5.5. Let λ be an eigenvalue of A. Then


a) Eigenspace is a subspace of Rn and the eigenspace of A corresponding
to λ is Nul (A − λI).
b) The homogeneous equation (A − λI) x = 0 has at least one free vari-
able.

Example 5.6. Find a basis for the eigenspace, and hence the dimension of
the eigenspace, of A = [4 0 −1; 3 0 3; 2 −2 5], corresponding to the eigenvalue λ = 3.
Solution.

Example 5.7. Find a basis for the eigenspace, and hence the dimension of
the eigenspace, of A = [4 −1 6; 2 1 6; 2 −1 8], corresponding to the eigenvalue λ = 2.
Solution.

Theorem 5.8. The eigenvalues of a triangular matrix are the entries


on its main diagonal.

Example 5.9. Let A = [3 6 −8; 0 0 6; 0 0 2] and B = [4 0 0; −2 1 0; 5 3 4]. What are
the eigenvalues of A and B?
Solution.

Remark 5.10. Zero (0) is an eigenvalue of A ⇔ A is not invertible.

Ax = 0x = 0. (5.1)

Thus the eigenvector x 6= 0 is a nontrivial solution of Ax = 0.



Theorem 5.11. If v1 , v2 , · · · , vr are eigenvectors that correspond


to distinct eigenvalues λ1 , λ2 , · · · , λr of n × n matrix A, then the set
{v1 , v2 , · · · , vr } is linearly independent.

Proof.

• Suppose, for contradiction, that {v1 , v2 , · · · , vr } is linearly dependent.
• Then one of the vectors in the set is a linear combination of the preceding
  vectors; let p be the least index for which this happens.
• Thus {v1 , v2 , · · · , vp } is linearly independent, and vp+1 is a linear combination
  of the preceding vectors.
• Then, there exist scalars c1 , c2 , · · · , cp such that

c1 v1 + c2 v2 + · · · + cp vp = vp+1 (5.2)

• Multiplying both sides of (5.2) by A, we obtain

c1 Av1 + c2 Av2 + · · · + cp Avp = Avp+1

and therefore, using the fact Avk = λk vk :

c1 λ1 v1 + c2 λ2 v2 + · · · + cp λp vp = λp+1 vp+1 (5.3)

• Multiplying both sides of (5.2) by λp+1 and subtracting the result from
(5.3), we have

c1 (λ1 − λp+1 )v1 + c2 (λ2 − λp+1 )v2 + · · · + cp (λp − λp+1 )vp = 0. (5.4)

• Since {v1 , v2 , · · · , vp } is linearly independent,

c1 (λ1 − λp+1 ) = 0, c2 (λ2 − λp+1 ) = 0, · · · , cp (λp − λp+1 ) = 0.

• Since λ1 , λ2 , · · · , λr are distinct,

c1 = c2 = · · · = cp = 0 ⇒ vp+1 = 0,

which is a contradiction.

Example 5.12. Show that if A2 is the zero matrix, then the only eigenvalue
of A is 0.
Solution. Hint : You may start with Ax = λx, x 6= 0.

True-or-False 5.13.
a. If Ax = λx for some vector x, then λ is an eigenvalue of A.
b. A matrix A is not invertible if and only if 0 is an eigenvalue of A.
c. A number c is an eigenvalue of A if and only if the equation (A−cI)x = 0
has a nontrivial solution.
d. If v1 and v2 are linearly independent eigenvectors, then they corre-
spond to distinct eigenvalues.
e. An eigenspace of A is a null space of a certain matrix.
Solution.

Ans: F,T,T,F,T

Exercises 5.1
1. Is λ = −2 an eigenvalue of [7 3; 3 −1]? Why or why not?
                                                              Ans: Yes
2. Is [1, 4]^T an eigenvector of [−3 1; −3 8]? If so, find the eigenvalue.
3. Find a basis for the eigenspace corresponding to each listed eigenvalue.

   (a) A = [1 0 −1; 1 −3 0; 4 −13 1], λ = −2
   (b) B = [3 0 2 0; 1 3 1 0; 0 1 1 0; 0 0 0 4], λ = 4

                          Ans: (a) [1, 1, 3]^T   (b) [2, 3, 1, 0]^T and another vector
 
4. Find the eigenvalues of the matrix [0 0 0; 0 2 5; 0 0 −1].
 
5. For the matrix [1 2 3; 1 2 3; 1 2 3], find one eigenvalue, with no calculation. Justify your answer.
Ans: 0. Why?
6. Prove that λ is an eigenvalue of A if and only if λ is an eigenvalue of AT . (A and AT have
exactly the same eigenvalues, which is frequently used in engineering applications of
linear algebra.)
Hint : (1) λ is an eigenvalue of A
⇔ (2) (A − λI)x = 0, for some x ≠ 0
⇔ (3) (A − λI) is not invertible.
Now, try to use the Invertible Matrix Theorem (Theorem 2.25) to finish your proof. Note
that (A − λI)^T = (A^T − λI).

5.2. The Characteristic Equation and Similarity Transformation

Recall: Let A be an n × n matrix. An eigenvalue λ of A and its corre-


sponding eigenvector x are defined to satisfy

Ax = λx, x 6= 0.

Thus (A − λI) is not invertible and therefore det (A − λI) = 0.

Definition 5.14. The scalar equation det (A − λI) = 0 is called the


characteristic equation of A; the polynomial p(λ) = det (A − λI) is
called the characteristic polynomial of A.

The solutions of det (A − λI) = 0 are the eigenvalues of A.
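A quick numerical cross-check (a Matlab/Octave sketch, not part of the original text; the matrix A below is just an illustration — here the one from Example 5.3). The coefficients of the characteristic polynomial can be obtained with poly, and its roots agree with eig(A).

check_charpoly.m  (illustrative)
    A = [1 6; 5 2];            % e.g., the matrix from Example 5.3
    p = poly(A);               % coefficients of det(lambda*I - A), highest degree first
    disp(roots(p)')            % eigenvalues as roots of the characteristic polynomial
    disp(eig(A)')              % eigenvalues computed directly; the two lists agree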


Example 5.15. Find the characteristic polynomial and all eigenvalues of A = [8 2; 3 3].
Solution.

Example 5.16. Find the characteristic polynomial and all eigenvalues of A = [1 1 0; 6 0 5; 0 0 2].
Solution.

Theorem 5.17. (Invertible Matrix Theorem; continued from The-


orem 2.25, p.82, and Theorem 2.74, p.107)
Let A be n × n square matrix. Then the following are equivalent.
s. The number 0 is not an eigenvalue of A.
t. det A 6= 0

Example 5.18. Find the characteristic equation and all eigenvalues of
A = [5 −2 6 −1; 0 3 −8 0; 0 0 5 4; 0 0 0 1].
Solution.

Theorem 5.19. The eigenvalues of a triangular matrix are the entries


on its main diagonal.

Definition 5.20. The algebraic multiplicity (or, multiplicity) of an


eigenvalue is its multiplicity as a root of the characteristic equation.

Remark 5.21. Let A be an n × n matrix. Then the characteristic equa-


tion of A is of the form
     p(λ) = det (A − λI) = (−1)^n (λ^n + c_{n−1} λ^{n−1} + · · · + c_1 λ + c_0)
          = (−1)^n Π_{i=1}^{n} (λ − λ_i),                               (5.5)

where some of the eigenvalues λ_i can be complex-valued numbers. Thus

     det A = p(0) = (−1)^n Π_{i=1}^{n} (0 − λ_i) = Π_{i=1}^{n} λ_i .     (5.6)

That is, det A is the product of all eigenvalues of A.
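Relation (5.6) is easy to check numerically. The following Matlab/Octave sketch (an illustration with a random integer matrix, not from the original text) compares the product of the eigenvalues with det A; for a real matrix, complex eigenvalues occur in conjugate pairs, so the product is real up to round-off.

    A = randi([-5 5], 4, 4);   % a random integer-valued 4x4 matrix
    lam = eig(A);
    disp(prod(lam))            % product of all eigenvalues (a tiny imaginary residue may remain)
    disp(det(A))               % determinant of A; agrees up to round-off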

Similarity

Definition 5.22. Let A and B be n × n matrices. Then, A is similar to


B, if there is an invertible matrix P such that
A = P BP −1 , or equivalently, P −1 AP = B.

Writing Q = P −1 , we have B = QAQ−1 . So B is also similar to A, and we


say simply that A and B are similar. The map A 7→ P −1 AP is called a
similarity transformation.

The next theorem illustrates one use of the characteristic polynomial, and
it provides the foundation for the computation of eigenvalues.

Theorem 5.23. If n × n matrices A and B are similar, then they


have the same characteristic polynomial and hence the same eigenvalues
(with the same multiplicities).

Proof. B = P −1 AP . Then,
B − λI = P −1 AP − λI
= P −1 AP − λP −1 P
= P −1 (A − λI)P,
from which we conclude
det (B − λI) = det (P −1 ) det (A − λI) det (P ) = det (A − λI).

True-or-False 5.24.
a. The determinant of A is the product of the diagonal entries in A.
b. An elementary row operation on A does not change the determinant.
c. (det A)(det B) = det AB
d. If λ + 5 is a factor of the characteristic polynomial of A, then 5 is an
eigenvalue of A.
e. The multiplicity of a root r of the characteristic equation of A is called
the algebraic multiplicity of r as an eigenvalue of A.
Solution.

Ans: F,F,T,F,T

Exercises 5.2
1. Find the characteristic polynomial and the eigenvalues of [5 −3; −4 3].
                                                              Ans: λ = 4 ± √13
2. Find the characteristic polynomial of matrices. [Note. Finding the characteristic poly-
nomial of a 3 × 3 matrix is not easy to do with just row operations, because the variable
is involved.]
   
   (a) [0 3 1; 3 0 2; 1 2 0]        (b) [−1 0 1; −3 4 1; 0 0 2]

                      Ans: (b) −λ^3 + 5λ^2 − 2λ − 8 = −(λ + 1)(λ − 2)(λ − 4)

3. M Report the matrices and your conclusions.

(a) Construct a random integer-valued 4 × 4 matrix A, and verify that A and AT have
the same characteristic polynomial (the same eigenvalues with the same multiplic-
ities). Do A and AT have the same eigenvectors?
(b) Make the same analysis of a 5 × 5 matrix.

Note. Figure out by yourself how to generate random integer-valued matrices, how to
make its transpose, and how to get eigenvalues and eigenvectors.

5.3. Diagonalization
5.3.1. The Diagonalization Theorem

Definition 5.25. An n × n matrix A is said to be diagonalizable if


there exists an invertible matrix P and a diagonal matrix D such that

A = P DP −1 (or P −1 AP = D) (5.7)

Remark 5.26. Let A be diagonalizable, i.e., A = P DP −1 . Then

A2 = (P DP −1 )(P DP −1 ) = P D2 P −1
Ak = P Dk P −1
(5.8)
A−1 = P D−1 P −1 (when A is invertible)
det A = det D

Diagonalization enables us to compute A^k and det A quickly.

Example 5.27. Let A = [7 2; −4 1]. Find a formula for A^k, given that
A = P DP^{−1}, where P = [1 1; −1 −2] and D = [5 0; 0 3].
Solution.

Ans: A^k = [2·5^k − 3^k,  5^k − 3^k;  2·3^k − 2·5^k,  2·3^k − 5^k]
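A quick sanity check of Remark 5.26 and of the answer above (a Matlab/Octave sketch; it simply compares A^k computed directly with P D^k P^{-1} and with the closed form):

    A = [7 2; -4 1];  P = [1 1; -1 -2];  D = [5 0; 0 3];
    k = 6;
    Ak1 = A^k;                                          % direct computation
    Ak2 = P*D^k/P;                                      % P * D^k * inv(P)
    Ak3 = [2*5^k-3^k, 5^k-3^k; 2*3^k-2*5^k, 2*3^k-5^k]; % the closed form above
    disp([norm(Ak1-Ak2), norm(Ak1-Ak3)])                % both differences are ~0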

Theorem 5.28. (The Diagonalization Theorem)


1. An n × n matrix A is diagonalizable if and only if A has n linearly
independent eigenvectors v1 , v2 , · · · , vn .
2. In fact, A = P DP −1 if and only if columns of P are n linearly inde-
pendent eigenvectors of A. In this case, the diagonal entries of D are
the corresponding eigenvalues of A. That is,
     P = [v1 v2 · · · vn ],
     D = diag(λ1 , λ2 , · · · , λn )   (the diagonal matrix with λ1 , λ2 , · · · , λn on its main diagonal),   (5.9)

where Avk = λk vk , k = 1, 2, · · · , n.

The Diagonalization Theorem can be proved using the following remark.

Remark 5.29. AP = P D with D Diagonal


Let P = [v1 v2 · · · vn ] and D = diag(λ1 , λ2 , · · · , λn ) be arbitrary n × n
matrices. Then,

AP = A[v1 v2 · · · vn ] = [Av1 Av2 · · · Avn ], (5.10)

while

     P D = [v1 v2 · · · vn ] diag(λ1 , λ2 , · · · , λn ) = [λ1 v1 λ2 v2 · · · λn vn ].   (5.11)

If AP = P D with D diagonal, then the nonzero columns of P are


eigenvectors of A.
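In Matlab/Octave, eig returns exactly such a pair (P, D), so Remark 5.29 and the Diagonalization Theorem can be verified numerically. A sketch (the matrix is the one of Example 5.30 below, which turns out to be diagonalizable):

    A = [1 3 3; -3 -5 -3; 3 3 1];
    [P,D] = eig(A);            % columns of P: eigenvectors; D: diagonal matrix of eigenvalues
    disp(norm(A*P - P*D))      % AP = PD, up to round-off
    disp(norm(A - P*D/P))      % A = P*D*inv(P), since P is invertible here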

Example 5.30. Diagonalize the following matrix, if possible:
A = [1 3 3; −3 −5 −3; 3 3 1].

Solution. For the computation of det (A − λI), apply R3 ← R3 + R2 to A − λI.


1. Find the eigenvalues of A.
2. Find three linearly independent eigenvectors of A.
3. Construct P from the vectors in step 2.
4. Construct D from the corresponding eigenvalues.
Check: AP = P D?

Ans: P = [1 −1 −1; −1 1 0; 1 0 1] and D = diag(1, −2, −2).

Note: A matrix is not always diagonalizable.

Example 5.31. Diagonalize the following matrix, if possible:
B = [2 4 3; −4 −6 −3; 3 3 1],
for which det (B − λI) = −(λ − 1)(λ + 2)^2 .


Solution.

5.3.2. Diagonalizable Matrices


Example 5.32. Diagonalize the following matrix, if possible:
A = [3 0 0 1; 0 2 0 0; 0 0 2 0; 1 0 0 3].
Solution.

Ans: p(λ) = (λ − 2)^3 (λ − 4). P = [0 0 −1 1; 1 0 0 0; 0 1 0 0; 0 0 1 1] and D = diag(2, 2, 2, 4).

Theorem 5.33. An n × n matrix with n distinct eigenvalues is diago-


nalizable.

Example 5.34. Determine if the following matrix is diagonalizable:
A = [5 −8 1; 0 0 7; 0 0 −2].
Solution.

Matrices Whose Eigenvalues Are Not Distinct

Theorem 5.35. Let A be an n × n matrix whose distinct eigenvalues


are λ1 , λ2 , · · · , λp . Let EA (λk ) be the eigenspace for λk .
1. dim E A (λk ) ≤ (the multiplicity of the eigenvalue λk ), for 1 ≤ k ≤ p
2. The matrix A is diagonalizable
⇔ the sum of the dimensions of the eigenspaces equals n
⇔ dim E A (λk ) = the multiplicity of λk , for each 1 ≤ k ≤ p
(and the characteristic polynomial factors completely into linear factors)

3. If A is diagonalizable and Bk is a basis for EA (λk ), then the total


collection of vectors in the sets {B1 , B2 , · · · , Bp } forms an eigenvector
basis for Rn .

Example 5.36. Diagonalize the following matrix, if possible:
A = [5 0 0 0; 0 5 0 0; 1 4 −3 0; −1 −2 0 −3].
Solution.

True-or-False 5.37.
a. A is diagonalizable if A = P DP −1 , for some matrix D and some invert-
ible matrix P .
b. If Rn has a basis of eigenvectors of A, then A is diagonalizable.
c. If A is diagonalizable, then A is invertible.
d. If A is invertible, then A is diagonalizable.
e. If AP = P D, with D diagonal, then the nonzero columns of P must be
eigenvectors of A.
Solution.

Ans: F,T,F,F,T

Exercises 5.3
1. The matrix A is factored in the form P DP −1 . Find the eigenvalues of A and a basis for
each eigenspace.

   A = [2 2 1; 1 3 1; 1 2 2] = [1 1 2; 1 0 −1; 1 −1 0] [5 0 0; 0 1 0; 0 0 1] [1/4 1/2 1/4; 1/4 1/2 −3/4; 1/4 −1/2 1/4]

2. Diagonalize the matrices, if possible.


     
   (a) [2 2 −1; 1 3 −1; −1 −2 2]    (b) [7 4 16; 2 5 8; −2 −2 −5]    (c) [4 0 0; 1 4 0; 0 0 5]

   Hint : Use (a) λ = 5, 1. (b) λ = 3, 1. (c) Not diagonalizable. Why?


3. A is a 3 × 3 matrix with two eigenvalues. Each eigenspace is one-dimensional. Is A
diagonalizable? Why?
4. Construct and verify.
(a) A nonzero 2 × 2 matrix that is invertible but not diagonalizable.
(b) A nondiagonal 2 × 2 matrix that is diagonalizable but not invertible.

5.5. Complex Eigenvalues

Definition 5.38. Let A be an n × n matrix with real entries. A complex


eigenvalue λ is a complex-valued scalar such that

Ax = λx, where x ≠ 0.

As usual, we determine λ by solving the characteristic equation

det (A − λI) = 0.

Example 5.39. Let A = [0 −1; 1 0] and consider the linear transformation
x ↦ Ax, x ∈ R^2 .
Then
a) It rotates the plane counterclockwise through a quarter-turn.
b) The action of A is periodic, since after four quarter-turns, a vector is
back where it started.
c) Obviously, no nonzero vector is mapped into a multiple of itself, so
A has no eigenvectors in R2 and hence no real eigenvalues.

Find the eigenvalues of A, and find a basis for each eigenspace.


Solution.

Eigenvalues and Eigenvectors of a Real Matrix That Acts on Cn

Remark 5.40. Let A be an n × n matrix whose entries are real. Then, for x ∈ C^n,

     conj(Ax) = Ā x̄ = A x̄.

Thus, if λ is an eigenvalue of A, i.e., Ax = λx, then

     A x̄ = conj(Ax) = conj(λx) = λ̄ x̄.

That is,
     Ax = λx  ⇔  A x̄ = λ̄ x̄                                            (5.12)

If λ is an eigenvalue of A with corresponding eigenvector x, then the
complex conjugate of λ, λ̄, is also an eigenvalue, with eigenvector x̄.

Example 5.41. Let A = [1 5; −2 3].

(a) Find all eigenvalues and the corresponding eigenvectors for A.


(b) Let (λ, v), with λ = a − bi, be an eigen-pair. Let P = [Re v  Im v] and
    C = [a −b; b a]. Show that AP = P C. (This implies that A = P CP^{−1}.)

Solution.

Ans: (a) λ = 2 ± 3i.
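Part (b) is also easy to verify numerically. A Matlab/Octave sketch for the matrix of this example (eig is used only to produce one eigen-pair; the eigenvalue with negative imaginary part is selected so that λ = a − bi with b > 0):

    A = [1 5; -2 3];
    [V,D] = eig(A);  lam = diag(D);
    j = find(imag(lam) < 0, 1);          % pick lambda = a - b*i, b > 0
    v = V(:,j);  a = real(lam(j));  b = -imag(lam(j));
    P = [real(v) imag(v)];  C = [a -b; b a];
    disp(norm(A*P - P*C))                % AP = PC, hence A = P*C*inv(P)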



Theorem 5.42. (Factorization): Let A be a 2 × 2 real matrix with complex
eigenvalue λ = a − bi (b ≠ 0) and an associated eigenvector v. If
P = [Re v  Im v] and C = [a −b; b a], then

     A = P CP^{−1} .                                                    (5.13)

Example 5.43. Find the eigenvalues of C = [a −b; b a].
Solution.

Example 5.44. Find all eigenvalues and the corresponding eigenvectors for
A = [3 −3; 1 1].
Also find an invertible matrix P and a matrix C such that A = P CP −1 .
Solution.

Exercises 5.5
1. Let each matrix act on C^2 . Find the eigenvalues and a basis for each eigenspace in C^2 .

   (a) [1 −2; 1 3]        (b) [1 5; −2 3]

                                  Ans: (a) An eigen-pair: λ = 2 + i, [−1 + i, 1]^T
2. Let A = [5 −2; 1 3]. Find an invertible matrix P and a matrix C of the form [a −b; b a] such
   that A = P CP^{−1} .
                                  Ans: P = [1 −1; 1 0], C = [4 −1; 1 4].
3. Let A be an n × n real matrix with the property that A^T = A. Let x be any vector in C^n ,
   and let q = x̄^T Ax. The equalities below show that q is a real number by verifying that
   q̄ = q. Give a reason for each step.

     q̄ = conj(x̄^T Ax) = x^T Ā x̄ = x^T A x̄ = (x^T A x̄)^T = x̄^T A^T x = q
        (a)             (b)       (c)       (d)           (e)

5.7. Applications to Differential Equations

Recall: For solving the first-order differential equation:


     dx/dt = a x,                                                       (5.14)
we rewrite it as
     dx/x = a dt.                                                       (5.15)
By integrating both sides, we have

     ln |x| = a t + K.

Thus, for x = x(t),

     x = ±e^{a t+K} = ±e^K e^{at} = C · e^{at} .                         (5.16)

Example 5.45. Consider the first-order initial-value problem

     dx/dt = 5x,   x(0) = 2.                                            (5.17)
(a) Find the solution x(t).
(b) Check if our solution satisfies both the differential equation and the
initial condition.
Solution.

5.7.1. Dynamical System: The System of First-Order Differential Equations

Consider a system of first-order differential equations:

     x1' = a11 x1 + a12 x2 + · · · + a1n xn
     x2' = a21 x1 + a22 x2 + · · · + a2n xn
      ...                                                               (5.18)
     xn' = an1 x1 + an2 x2 + · · · + ann xn

where x1 , x2 , · · · , xn are functions of t and the aij 's are constants.


Then the system can be written as

     x'(t) = Ax(t),                                                     (5.19)

where
     x(t) = [x1 (t), · · · , xn (t)]^T ,  x'(t) = [x1'(t), · · · , xn'(t)]^T ,  and  A = [aij ] ∈ R^{n×n} .

How to solve (5.19): x'(t) = Ax(t)?

Observation 5.46. Let (λ, v) be an eigen-pair of A, i.e., Av = λv. Then,


for an arbitrary constant c,

x(t) = cveλt (5.20)

is a solution of (5.19).

Proof. Let’s check it:


     x'(t) = cv(eλt)' = cλveλt
                                                                        (5.21)
     Ax(t) = Acveλt = cλveλt ,

which completes the proof.



Example 5.47. Solve the initial-value problem:

     x'(t) = [−4 −2; 3 1] x(t),   x(0) = [−2, 4]^T .                    (5.22)

Solution. The two eigen-pairs are (λi , vi ), i = 1, 2. Then the general solu-
tion is x = c1 x1 + c2 x2 = c1 v1 eλ1 t + c2 v2 eλ2 t .

Ans: x(t) = −2 [2, −3]^T e^{−t} + 2 [1, −1]^T e^{−2t}

Figure 5.1: Trajectories for the dynamical system (5.22).
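The same computation can be scripted. A Matlab/Octave sketch for (5.22) (the coefficients come from solving V c = x(0); the eigenvectors returned by eig are normalized, so the constants differ from c1, c2 above, but the solution x(t) is the same):

    A = [-4 -2; 3 1];  x0 = [-2; 4];
    [V,D] = eig(A);                      % eigen-pairs of A
    c = V\x0;                            % x(0) = c(1)*v1 + c(2)*v2
    t = 1.5;                             % any time t
    xt = c(1)*V(:,1)*exp(D(1,1)*t) + c(2)*V(:,2)*exp(D(2,2)*t);
    disp([xt, expm(A*t)*x0])             % the two columns agree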

5.7.2. Trajectories for the Dynamical Systems: Attractors, Repellers, and Saddle Points

Summary 5.48. For a dynamical system

     x'(t) = A x(t),   A ∈ R^{2×2} ,                                    (5.23)

let (λi , vi ), i = 1, 2, be eigen-pairs of A. Then the general solution of (5.23)


reads
x(t) = c1 v1 eλ1 t + c2 v2 eλ2 t , (5.24)
for arbitrary scalars c1 , c2 ∈ R.

Definition 5.49. Two Distinct Real Eigenvalues

1. If the eigenvalues of A are both negative, the origin is called an


attractor or sink, since all trajectories (solutions x(t)) are drawn to
the origin.
• For the solution in the last example, the direction of greatest
attraction is along the trajectory of the eigenfunction x2 (along
the line through 0 and v2 ), corresponding to the more negative
eigenvalue λ = −2.
2. If the eigenvalues of A are both positive, the origin is called a re-
peller or source, since all trajectories (solutions x(t)) are traversed
away from the origin.
3. If A has both positive and negative eigenvalues, the origin is called
saddle point of the dynamical system.

The larger the eigenvalue is in modulus, the greater attraction/repulsion.



Example 5.50. Solve the initial-value problem:

     x'(t) = [2 3; −1 −2] x(t),   x(0) = [3, 2]^T .                     (5.25)

What are the direction of greatest attraction and the direction of greatest
repulsion?
Solution.

Figure 5.2: Trajectories for the dynamical system (5.25).

Ans: x(t) = −(5/2) [−3, 1]^T e^t + (9/2) [−1, 1]^T e^{−t} . The direction of greatest attraction = [−1, 1]^T .

Summary 5.51. For the dynamical system x'(t) = Ax(t), A ∈ R^{2×2} , let


(λ1 , v1 ) and (λ2 , v2 ) be eigen-pairs of A.
• Case (i): If λ1 and λ2 are real and distinct, then
x = c1 v1 eλ1 t + c2 v2 eλ2 t . (5.26)

• Case (ii): If A has a double eigenvalue λ (with v), then you should
find a second generalized eigenvector by solving, e.g.,
(A − λI) w = v. (5.27)

(The above can be derived from a guess: x = tveλt + weλt .) Then the
general solution becomes [2]
x(t) = c1 veλt + c2 (tv + w)eλt (5.28)

(w is a simple shift vector, which may not be unique.)


• Case (iii): If λ = a + bi (b ≠ 0) and λ̄ are complex eigenvalues with
  eigenvectors v and v̄, then
       x = c1 v e^{λt} + c2 v̄ e^{λ̄t} ,                                  (5.29)

from which two linearly independent real-valued solutions must be


extracted.

Note: Let λ = a + bi and v = Rev + i Imv. Then

veλt = (Rev + i Imv) · eat (cos bt + i sin bt)


= [(Rev) cos bt − (Imv) sin bt]eat
+i [(Rev) sin bt + (Imv) cos bt]eat .

Let
y1 (t) = [(Rev) cos bt − (Imv) sin bt]eat
y2 (t) = [(Rev) sin bt + (Imv) cos bt]eat
Then they are linearly independent and satisfy the dynamical system.
Thus, the real-valued general solution of the dynamical system reads

x(t) = C1 y1 (t) + C2 y2 (t). (5.30)



Example 5.52. Construct the general solution of x'(t) = Ax(t) when
A = [−3 2; −1 −1].
Solution.

Figure 5.3: Trajectories for the dynamical system.

Ans: (complex solution) c1 [1 − i, 1]^T e^{(−2+i)t} + c2 [1 + i, 1]^T e^{(−2−i)t}
Ans: (real solution)    c1 [cos t + sin t, cos t]^T e^{−2t} + c2 [sin t − cos t, sin t]^T e^{−2t}

Exercises 5.7
1. (i) Solve the initial-value problem x' = Ax with x(0) = (3, 2). (ii) Classify the nature of
   the origin as an attractor, repeller, or saddle point of the dynamical system. (iii) Find
   the directions of greatest attraction and/or repulsion. (iv) When the origin is a saddle
   point, sketch typical trajectories.

   (a) A = [−2 −5; 1 4]        (b) A = [7 −1; 3 3]

   Ans: (a) The origin is a saddle point.
        The direction of G.A. = [−5, 1]^T . The direction of G.R. = [−1, 1]^T .
   Ans: (b) The origin is a repeller. The direction of G.R. = [1, 1]^T .

2. Use the strategies in (5.27) and (5.28) to solve

        x' = [7 1; −4 3] x,   x(0) = [2, −5]^T .

   Ans: x(t) = 2 [1, −2]^T e^{5t} − ( t [1, −2]^T + [0, 1]^T ) e^{5t}

5.8. Iterative Estimates for Eigenvalues

5.8.1. The Power Method

The power method is an iterative algorithm:


Given a square matrix A ∈ Rn×n , the algorithm finds a number λ, which
is the largest eigenvalue of A (in modulus), and its corresponding
eigenvector v.

Assumption. To apply the power method, we assume that A ∈ Rn×n has


• n linearly independent eigenvectors {v1 , v2 , · · · , vn }, and
• exactly one eigenvalue that is largest in magnitude, λ1 :

|λ1 | > |λ2 | ≥ |λ3 | ≥ · · · ≥ |λn |. (5.31)

The power method approximates the largest eigenvalue λ1 and its asso-
ciated eigenvector v1 .

Derivation of Power Iteration

• Since eigenvectors {v1 , v2 , · · · , vn } are linearly independent, any vector


x ∈ R^n can be expressed as

     x = Σ_{j=1}^{n} βj vj ,                                            (5.32)

for some constants {β1 , β2 , · · · , βn }.

• Multiplying both sides of (5.32) by A and A^2 gives

     Ax = A ( Σ_{j=1}^{n} βj vj ) = Σ_{j=1}^{n} βj Avj = Σ_{j=1}^{n} βj λj vj ,
                                                                        (5.33)
     A^2 x = A ( Σ_{j=1}^{n} βj λj vj ) = Σ_{j=1}^{n} βj λj^2 vj .

• In general,

     A^k x = Σ_{j=1}^{n} βj λj^k vj ,   k = 1, 2, · · · ,                (5.34)

which gives

     A^k x = λ1^k · Σ_{j=1}^{n} βj (λj/λ1)^k vj
           = λ1^k · [ β1 (λ1/λ1)^k v1 + β2 (λ2/λ1)^k v2 + · · · + βn (λn/λ1)^k vn ].   (5.35)

• For j = 2, 3, · · · , n, since |λj/λ1| < 1, we have lim_{k→∞} |λj/λ1|^k = 0, and

     lim_{k→∞} A^k x = lim_{k→∞} λ1^k β1 v1 .                            (5.36)

Remark 5.53. The sequence in (5.36) converges to 0 if |λ1 | < 1 and


diverges if |λ1 | > 1, provided that β1 6= 0.
• The entries of Ak x will grow with k if |λ1 | > 1 and will go to 0 if |λ1 | < 1.
• In either case, it is hard to decide the largest eigenvalue λ1 and its
associated eigenvector v1 .
• To take care of that possibility, we scale Ak x in an appropriate
manner to ensure that the limit in (5.36) is finite and nonzero.

Algorithm 5.54. (The Power Iteration) Given x 6= 0:

initialization : x0 = x/||x||∞
for k = 1, 2, · · ·
yk = Axk−1 ; µk = ||yk ||∞ (5.37)
xk = yk /µk
end for

Claim 5.55. Let {xk , µk } be a sequence produced by the power method.


Then,
xk → v1 , µk → |λ1 |, as k → ∞. (5.38)
More precisely, the power method converges as

µk = |λ1 | + O(|λ2 /λ1 |k ). (5.39)


 
Example 5.56. The matrix A = [−4 1 −1; 1 −3 2; −1 2 −3] has eigenvalues and eigenvectors as follows:

     eigenvalues: −6, −3, −1;   eigenvectors: [1, −1, 1]^T , [−2, −1, 1]^T , [0, 1, 1]^T .
Verify that the sequence produced by the power method converges to the
largest eigenvalue and its associated eigenvector.
Solution.
power_iteration.m
A = [-4 1 -1; 1 -3 2; -1 2 -3];
%[V,D] = eig(A)

x = [1 0 0]';        % initial vector (define x before building the format string)
fmt = ['k=%2d: x = [',repmat('%.5f, ',1,numel(x)-1),'%.5f], ',...
       'mu=%.5f (error = %.5f)\n'];
for k=1:10
    y = A*x;
    [~,ind] = max(abs(y)); mu = y(ind);
    x = y/mu;
    fprintf(fmt,k,x,mu,abs(-6-mu))
end

Output
1 k= 1: x = [1.00000, -0.25000, 0.25000], mu=-4.00000 (error = 2.00000)
2 k= 2: x = [1.00000, -0.50000, 0.50000], mu=-4.50000 (error = 1.50000)
3 k= 3: x = [1.00000, -0.70000, 0.70000], mu=-5.00000 (error = 1.00000)
4 k= 4: x = [1.00000, -0.83333, 0.83333], mu=-5.40000 (error = 0.60000)
5 k= 5: x = [1.00000, -0.91176, 0.91176], mu=-5.66667 (error = 0.33333)
6 k= 6: x = [1.00000, -0.95455, 0.95455], mu=-5.82353 (error = 0.17647)
7 k= 7: x = [1.00000, -0.97692, 0.97692], mu=-5.90909 (error = 0.09091)
8 k= 8: x = [1.00000, -0.98837, 0.98837], mu=-5.95385 (error = 0.04615)
9 k= 9: x = [1.00000, -0.99416, 0.99416], mu=-5.97674 (error = 0.02326)
10 k=10: x = [1.00000, -0.99708, 0.99708], mu=-5.98833 (error = 0.01167)

Notice that |−6 − µk | ≈ (1/2) |−6 − µk−1 |, for which |λ2 /λ1 | = 1/2.

5.8.2. Inverse Power Method


Some applications require to find an eigenvalue of the matrix A, near a
prescribed value q. The inverse power method is a variant of the Power
method to solve such a problem.
• We begin with the eigenvalues and eigenvectors of (A − qI)−1 . Let

Avi = λi vi , i = 1, 2, · · · , n. (5.40)

• Then it is easy to see that

(A − qI)vi = (λi − q)vi . (5.41)

Thus, we obtain
     (A − qI)^{−1} vi = (1/(λi − q)) vi .                               (5.42)

• That is, when q ∉ {λ1 , λ2 , · · · , λn }, the eigenvalues of (A − qI)^{−1} are

     1/(λ1 − q), 1/(λ2 − q), · · · , 1/(λn − q),                        (5.43)
with the same eigenvectors {v1 , v2 , · · · , vn } of A.

Algorithm 5.57. (Inverse Power Method) Applying the power


method to (A − qI)−1 gives the inverse power method. Given x 6= 0:
set : x0 = x/||x||∞
for k = 1, 2, · · ·
yk = (A − qI)−1 xk−1 ; µk = ||yk ||∞
(5.44)
xk = yk /µk
λk = 1/µk + q
end for
 
Example 5.58. The matrix A is as in Example 5.56: A = [−4 1 −1; 1 −3 2; −1 2 −3].
Find the eigenvalue of A nearest to q = −5/2, using the inverse power
method.
Solution.
inverse_power.m
A = [-4 1 -1; 1 -3 2; -1 2 -3];
%[V,D] = eig(A)

q = -5/2; x = [1 0 0]';   % shift q and initial vector (define x before building fmt)
fmt = ['k=%2d: x = [',repmat('%.5f, ',1,numel(x)-1),'%.5f], ',...
       'lambda=%.7f (error = %.7f)\n'];
B = inv(A-q*eye(3));
for k=1:10
    y = B*x;
    [~,ind] = max(abs(y)); mu = y(ind);
    x = y/mu;
    lambda = 1/mu + q;
    fprintf(fmt,k,x,lambda,abs(-3-lambda))
end

Output
1 k= 1: x = [1.00000, 0.40000, -0.40000], lambda=-3.2000000 (error = 0.2000000)
2 k= 2: x = [1.00000, 0.48485, -0.48485], lambda=-3.0303030 (error = 0.0303030)
3 k= 3: x = [1.00000, 0.49782, -0.49782], lambda=-3.0043668 (error = 0.0043668)
4 k= 4: x = [1.00000, 0.49969, -0.49969], lambda=-3.0006246 (error = 0.0006246)
5 k= 5: x = [1.00000, 0.49996, -0.49996], lambda=-3.0000892 (error = 0.0000892)
6 k= 6: x = [1.00000, 0.49999, -0.49999], lambda=-3.0000127 (error = 0.0000127)
7 k= 7: x = [1.00000, 0.50000, -0.50000], lambda=-3.0000018 (error = 0.0000018)
8 k= 8: x = [1.00000, 0.50000, -0.50000], lambda=-3.0000003 (error = 0.0000003)
9 k= 9: x = [1.00000, 0.50000, -0.50000], lambda=-3.0000000 (error = 0.0000000)
10 k=10: x = [1.00000, 0.50000, -0.50000], lambda=-3.0000000 (error = 0.0000000)

Note: Eigenvalues of (A − qI)^{−1} are {−2/7, −2, 2/3}.

• The initial vector: x0 = [1, 0, 0]^T = (1/3)(v1 − v2 ); see Example 5.56.
• Thus, each iteration must reduce the error by a factor of 7.

Exercises 5.8
1. M The matrix in Example 5.58 has eigenvalues {−6, −3, −1}. We may try to find the
eigenvalue of A nearest to q = −3.1.

(a) Estimate (mathematically) the convergence speed of the inverse power method.
(b) Verify it by implementing the inverse power method, with x0 = [0, 1, 0]T .
 
2. M  Let A = [2 −1 0 0; −1 2 0 −1; 0 0 4 −2; 0 −1 −2 4]. Use the indicated methods to approximate
   eigenvalues and their associated eigenvectors of A to within 10^{−12} accuracy.

(a) The power method, the largest eigenvalue.


(b) The inverse power method, an eigenvalue near q = 3.
(c) The inverse power method, the smallest eigenvalue.

5.9. Applications to Markov Chains


Theory
• Markov chains are useful tools in some probabilistic models.
• The basic idea is the following:
Suppose that you are watching some collection of objects that
are changing through time.
• Assumptions (on states & changes):
– The total number of objects is not changing, but their states
(position, colour, disposition, etc) are changing.
– The proportion of changing states is constant and these
changes occur at discrete times, one after the next.
• Then we are in a good position to model changes by a Markov chain.

Example 5.59. Consider a three storey aviary at a local zoo which


houses 300 small birds.
• The aviary has three levels.
• The birds spend their day, flying from one level to another.

Our problem is to determine what the probability is of a given bird being at


a given level of the aviary at a given time.
Continued on the next page ⇒

Data
• Observe a vector
       p = [p1 , p2 , p3 ]^T ,                                          (5.45)
  where pi is the proportion (probability) of birds on the i-th level.
Note: p1 + p2 + p3 = 1.
• After 10 minutes, we have a new distribution of the birds
       p' = [p1', p2', p3']^T .                                          (5.46)

Model
• We assume that the change from p to p' is given by a linear operator
  on R^3 . In other words, there is a matrix T ∈ R^{3×3} such that

       p' = T p.                                                        (5.47)

  The matrix T is called the transition matrix for the Markov chain.
• Another 10 minutes later, we observe another distribution

       p'' = T p' .                                                     (5.48)

Note: The same matrix T is used in (5.47) and (5.48), because we assume
that the probability of a bird moving to another level is indepen-
dent of time.
• In other words, the probability of a bird moving to a particular level
depends only on the present state of the bird, and not on any past
states.
• This type of model is known as a finite Markov chain.

5.9.1. Probability Vector and Stochastic Matrix

Definition 5.60. Probability Vector and Stochastic Matrix


 
• A vector p = [p1 , · · · , pn ]^T with nonnegative entries that add up to 1 is called
  a probability vector.
• A (left) stochastic matrix is a square matrix whose columns are
probability vectors.

A stochastic matrix is also called a probability matrix, transition ma-


trix, substitution matrix, or Markov matrix.

Lemma 5.61. Let T be a stochastic matrix. If p is a probability vector,


then so is q = T p.

Proof. Let v1 , v2 , · · · , vn be the columns of T . Then

q = T p = p 1 v 1 + p2 v 2 + · · · pn v n .

Clearly q has nonnegative entries; their sum reads

sum(q) = sum(p1 v1 + p2 v2 + · · · pn vn ) = p1 + p2 + · · · + pn = 1.

Definition 5.62. Markov Chain


In general, a finite Markov chain is a sequence of probability vectors
x0 , x1 , x2 , · · · , together with a stochastic matrix T , such that

x1 = T x0 , x2 = T x1 , x3 = T x2 , · · · (5.49)

We can rewrite the above conditions as a recurrence relation

xk+1 = T xk , k = 0, 1, 2, · · · (5.50)

The vector xk is often called a state vector.



Figure 5.4: Annual percentage migration between a city and its suburbs.

Example 5.63. Figure 5.4 shows population movement between a city and
its suburbs. Then, the annual migration between these two parts of the
metropolitan region can be expressed by the migration matrix M:

       M = [0.95 0.03; 0.05 0.97].                                      (5.51)

Suppose the 2023 population of the region is 60,000 in the city and 40,000 in
the suburbs. What is the distribution of the population in 2024? In 2025?
Solution.

annual_migration.m Output
1 M = [0.95 0.03 1 x1 =
2 0.05 0.97]; 2 58200
3 3 41800
4 x0 = [60000 4

5 40000]; 5 x2 =
6 6 56544
7 x1 = M*x0 7 43456
8 x2 = M*x1

Example 5.64. (Revisit to the aviary example: Example 5.59)


Assume that
• Whenever a bird is on any level of the aviary, the probability of that
bird staying on the same level 10 min later is 1/2.
• If the bird is on the first level, the probability of moving to the second
level in 10 min is 1/3 and of moving to the third level in 10 min is 1/6.
• For a bird on the second level, the probability of moving to either the
first or third level is 1/4.
• For a bird on the third level, the probability of moving to the second
level is 1/3 and of moving to the first is 1/6.

(a) Find the transition matrix for this example.


(b) Suppose that after breakfast, all the birds are in the dining area on the
first level. Where are they in 10 min? In 20 min? In 30 min?

Solution. (a) From the information given, we derive the transition ma-
trix:
       T = [1/2 1/4 1/6; 1/3 1/2 1/3; 1/6 1/4 1/2]                      (5.52)


(b) The probability matrix at time 0 is p = [1, 0, 0]T .
birds_on_aviary.m Output
1 T = [1/2 1/4 1/6 1 p1 =
2 1/3 1/2 1/3 2 0.5000
3 1/6 1/4 1/2]; 3 0.3333
4 4 0.1667
5 p0 = [1 0 0]'; 5 p2 =
6 6 0.3611
7 p1 = T*p0 7 0.3889
8 p2 = T*p1 8 0.2500
9 p3 = T*p2 9 p3 =
10 0.3194
11 0.3981
12 0.2824

5.9.2. Predicting the Distant Future: Steady-State Vectors
The most interesting aspect of Markov chains is the study of the chain’s
long term behavior.
Example 5.65. (Revisit to Example 5.64)
What can be said in Example 5.64 about the bird population “in the long
run”? What happens if the chain starts with other initial vectors?
Solution.
birds_on_aviary2.m
1 T = [1/2 1/4 1/6
2 1/3 1/2 1/3
3 1/6 1/4 1/2];
4

5 p = [1 0 0]'; q = [0 0 1]';
6 fprintf('p_%-2d = [%.5f %.5f %.5f]; q_%-2d = [%.5f %.5f %.5f]\n',0,p,0,q)
7 fprintf('%s\n',repelem('-',1,68))
8

9 n=12;
10 for k=1:n
11 p = T*p; q = T*q;
12 fprintf('p_%-2d = [%.5f %.5f %.5f]; q_%-2d = [%.5f %.5f %.5f]\n',k,p,k,q)
13 end

Output
1 p_0 = [1.00000 0.00000 0.00000]; q_0 = [0.00000 0.00000 1.00000]
2 --------------------------------------------------------------------
3 p_1 = [0.50000 0.33333 0.16667]; q_1 = [0.16667 0.33333 0.50000]
4 p_2 = [0.36111 0.38889 0.25000]; q_2 = [0.25000 0.38889 0.36111]
5 p_3 = [0.31944 0.39815 0.28241]; q_3 = [0.28241 0.39815 0.31944]
6 p_4 = [0.30633 0.39969 0.29398]; q_4 = [0.29398 0.39969 0.30633]
7 p_5 = [0.30208 0.39995 0.29797]; q_5 = [0.29797 0.39995 0.30208]
8 p_6 = [0.30069 0.39999 0.29932]; q_6 = [0.29932 0.39999 0.30069]
9 p_7 = [0.30023 0.40000 0.29977]; q_7 = [0.29977 0.40000 0.30023]
10 p_8 = [0.30008 0.40000 0.29992]; q_8 = [0.29992 0.40000 0.30008]
11 p_9 = [0.30003 0.40000 0.29997]; q_9 = [0.29997 0.40000 0.30003]
12 p_10 = [0.30001 0.40000 0.29999]; q_10 = [0.29999 0.40000 0.30001]
13 p_11 = [0.30000 0.40000 0.30000]; q_11 = [0.30000 0.40000 0.30000]
14 p_12 = [0.30000 0.40000 0.30000]; q_12 = [0.30000 0.40000 0.30000]

Steady-State Vectors

Definition 5.66. If T is a stochastic matrix, then a steady-state vec-


tor for T is a probability vector q such that

T q = q. (5.53)

Note: The steady-state vector q can be seen as an eigenvector of T , of


which the corresponding eigenvalue λ = 1.

Strategy 5.67. How to Find a Steady-State Vector


(a) First, solve for x = [x1 , x2 , · · · , xn ]T :

T x = x ⇔ T x − x = 0 ⇔ (T − I)x = 0. (5.54)

(b) Then, set
       q = (1/(x1 + x2 + · · · + xn)) x.                                (5.55)

Example 5.68. Let T = [0.6 0.3; 0.4 0.7]. Find a steady-state vector for T .
Solution.

                                          Ans: q = [3/7, 4/7]^T
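Strategy 5.67 in a few lines of Matlab/Octave (a sketch for the matrix T of this example; null returns a basis vector of Nul (T − I), which is then rescaled into a probability vector):

    T = [0.6 0.3; 0.4 0.7];
    x = null(T - eye(2));                % a basis for Nul(T - I)
    q = x/sum(x);                        % rescale so that the entries add up to 1
    disp(q')                             % [3/7, 4/7]
    disp(norm(T*q - q))                  % check: T*q = q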

Definition 5.69. A stochastic matrix T is regular if some matrix power


T k contains only strictly positive entries.

Interpretation 5.70. If the transition matrix of a Markov chain is


regular, then for some k it is possible to go from any state to any other
states (including remaining in the current state) in exactly k steps.
 
Example 5.71. Let T = [0.5 0.2 0.3; 0.3 0.8 0.3; 0.2 0 0.4].

(a) Is T regular?
(b) Find a steady-state vector for T , using the power method.

Solution.

regular_stochastic.m
T = [0.5 0.2 0.3
     0.3 0.8 0.3
     0.2 0 0.4];

T2 = T*T;
disp('T^2 ='); disp(T2)

% The Power Method
x = [1 0 0]';
for k = 1:20
    x = T*x;
    fprintf('x_%-2d=[%.5f %.5f %.5f]\n',k,x)
end

Output
T^2 =
    0.3700 0.2600 0.3300
    0.4500 0.7000 0.4500
    0.1800 0.0400 0.2200

x_1 =[0.50000 0.30000 0.20000]
x_2 =[0.37000 0.45000 0.18000]
x_3 =[0.32900 0.52500 0.14600]
x_4 =[0.31330 0.56250 0.12420]
x_5 =[0.30641 0.58125 0.11234]
x_6 =[0.30316 0.59062 0.10622]
x_7 =[0.30157 0.59531 0.10312]
x_8 =[0.30078 0.59766 0.10156]
x_9 =[0.30039 0.59883 0.10078]
x_10=[0.30020 0.59941 0.10039]
x_11=[0.30010 0.59971 0.10020]
x_12=[0.30005 0.59985 0.10010]
x_13=[0.30002 0.59993 0.10005]
x_14=[0.30001 0.59996 0.10002]
x_15=[0.30001 0.59998 0.10001]
x_16=[0.30000 0.59999 0.10001]
x_17=[0.30000 0.60000 0.10000]
x_18=[0.30000 0.60000 0.10000]
x_19=[0.30000 0.60000 0.10000]
x_20=[0.30000 0.60000 0.10000]



Theorem 5.72. If T is an n × n regular stochastic matrix, then T has a


unique steady-state vector q.
(a) The entries of q are strictly positive.
(b) The steady-state vector

q = lim T k x0 , (5.56)
k→∞

for any initial probability vector x0 .

Remark 5.73. Let T ∈ Rn×n be a regular stochastic matrix. Then


• If T v = λv, then |λ| ≤ 1.
(The above is true for every stochastic matrix; see § A.2.)
• Every column of T k converges to q as k → ∞, i.e.,

T k → [q q · · · q] ∈ Rn×n , as k → ∞. (5.57)

See Exercise 3 on p. 178.

Example 5.74. Let a regular stochastic matrix be given as in Exam-
ple 5.71: T = [0.5 0.2 0.3; 0.3 0.8 0.3; 0.2 0 0.4].

(a) Find the steady-state vector q, by deriving the RREF.


(b) Find T 10 and T 20 .

Solution. (a) Use Strategy 5.67.



regular_stochastic_Tk.m
1 T = [0.5 0.2 0.3
2 0.3 0.8 0.3
3 0.2 0 0.4];
4 Tk = eye(3);
5 rref(T-Tk)
6

7 for k = 1:20
8 Tk = Tk*T;
9 if(mod(k,10)==0), fprintf('T^%d =\n',k); disp(Tk) end
10 end

Output
1 ans =
2 1.0000 0 -3.0000
3 0 1.0000 -6.0000
4 0 0 0
5

6 T^10 =
7 0.3002 0.2999 0.3002
8 0.5994 0.6004 0.5994
9 0.1004 0.0997 0.1004
10

11 T^20 =
12 0.3000 0.3000 0.3000
13 0.6000 0.6000 0.6000
14 0.1000 0.1000 0.1000

True-or-False 5.75. Let T be a stochastic matrix.


a. The steady-state vector is an eigenvector of T .
b. Every eigenvector of T is a steady-state vector.
c. The all-ones vector is an eigenvector of T T .
d. The number 2 can be an eigenvalue of T or T T .
e. All stochastic matrices are regular.

Ans: T, F, T, F, F

Exercises 5.9
1. Find the steady-state vector, deriving the RREF.

   (a) [.1 .6; .9 .4]        (b) [.7 .1 .1; .2 .8 .2; .1 .1 .7]

                                          Ans: (b) [1/4, 1/2, 1/4]^T
2. The weather in Starkville, MS, is either good, indifferent, or bad on any given day.

• If the weather is good today, there is a 60% chance the weather will be good tomor-
row, a 30% chance the weather will be indifferent, and a 10% chance the weather
will be bad.
• If the weather is indifferent today, it will be good tomorrow with probability .40 and
indifferent with probability .30.
• Finally, if the weather is bad today, it will be good tomorrow with probability .40
and indifferent with probability .50.

(a) What is the stochastic matrix for this situation?


(b) Suppose there is a 50% chance of good weather today and a 50% chance of indiffer-
ent weather. What are the chances of bad weather tomorrow?
(c) Suppose the predicted weather for Monday is 40% indifferent weather and 60% bad
weather. What are the chances for good weather on Wednesday?
Ans: (b) 20%
3. Let T ∈ Rn×n be a regular stochastic matrix. Prove (5.57).
Hint : Let T = [t1 , t2 , · · · , tn ], where tj is the j-th column of T . Take x0 = ei , for some i.
Then
     x1 = T x0 = T ei = ti ,
which implies that x1 is the i-th column of T . From the above we have

     xk = T^k x0 = T^k ei .                                             (5.58)

Thus xk is the i-th column of T k . Now, use Theorem 5.72.

4. M Generate a regular stochastic matrix of dimension 5.

(a) Find all eigenvalues and corresponding eigenvectors, using e.g. eig in Matlab.
(b) Express the eigenvector corresponding to λ = 1 as a probability vector p.
(c) Use the power method to find a steady-state vector q, beginning with x0 = e1 .
(d) Compare p with q.
Chapter 6
Orthogonality and Least-Squares

In this chapter, we will learn

• Inner product, length, and orthogonality,


• Orthogonal projections,
• The Gram-Schmidt process, which is an algorithm to produce an or-
thogonal basis for any nonzero subspace of Rn , and
• Least-Squares problems, with applications to linear models.

Contents of Chapter 6
6.1. Inner Product, Length, and Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 180
6.2. Orthogonal Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.3. Orthogonal Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.4. The Gram-Schmidt Process and QR Factorization . . . . . . . . . . . . . . . . . . . . . 203
6.5. Least-Squares Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
6.6. Machine Learning: Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 217


6.1. Inner Product, Length, and Orthogonality


6.1.1. Inner Product and Length

Definition 6.1. Let u = [u1 , u2 , · · · , un ]^T and v = [v1 , v2 , · · · , vn ]^T be
vectors in R^n . Then, the inner product (or dot product) of u and v is
given by

     u•v = u^T v = [u1 u2 · · · un ] [v1 , v2 , · · · , vn ]^T
                                                                        (6.1)
         = u1 v1 + u2 v2 + · · · + un vn = Σ_{k=1}^{n} uk vk .
   
Example 6.2. Let u = [1, −2, 2]^T and v = [3, 2, −4]^T . Find u•v.
Solution.

Theorem 6.3. Let u, v, and w be vectors in Rn , and c be a scalar. Then


a. u•v = v•u
b. (u + v)•w = u•w + v•w
c. (cu)•v = c(u•v) = u•(cv)
d. u•u ≥ 0, and u•u = 0 ⇔ u = 0
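These properties can be checked numerically; a Matlab/Octave sketch with random vectors (an illustration, not from the original text; the inner product u•v is written as u'*v):

    n = 5;  u = rand(n,1);  v = rand(n,1);  w = rand(n,1);  c = 2.5;
    disp(abs(u'*v - v'*u))                  % (a) u.v = v.u
    disp(abs((u+v)'*w - (u'*w + v'*w)))     % (b) (u+v).w = u.w + v.w
    disp(abs((c*u)'*v - c*(u'*v)))          % (c) (cu).v = c(u.v)
    disp(u'*u >= 0)                         % (d) u.u >= 0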

Definition 6.4. The length (norm) of v is the nonnegative scalar ‖v‖ defined by

     ‖v‖ = √(v•v) = √(v1^2 + v2^2 + · · · + vn^2)   and   ‖v‖^2 = v•v.   (6.2)

Note: For any scalar c, ‖cv‖ = |c| ‖v‖.

Example 6.5. Let W be a subspace of R^2 spanned by v = [3, 4]^T . Find a unit
vector u that is a basis for W .
Solution.

Distance in Rn
Definition 6.6. For u, v ∈ Rn , the distance between u and v is

     dist(u, v) = ‖u − v‖,                                              (6.3)

the length of the vector u − v.

Example 6.7. Compute the distance between the vectors u = (7, 1) and
v = (3, 2).
Solution.

Figure 6.1: The distance between u and v is the length of u − v.
  
Example 6.8. Let u = [1, −2, 2]^T and v = [3, 2, −4]^T . Find the distance between u
and v.
Solution.

Note: The inner product can be defined as

     u•v = ‖u‖ ‖v‖ cos θ,                                               (6.4)

where θ is the angle between u and v.


" # " #
1 −1/2
Example 6.9. Let u = √ and v = √ . Use (6.4) to find the angle
3 3/2
between u and v.
Solution.

6.1.2. Orthogonal Vectors

Definition 6.10. Two vectors u and v in Rn are orthogonal if u•v = 0.

Theorem 6.11. The Pythagorean Theorem: Two vectors u and v


are orthogonal if and only if

     ‖u + v‖^2 = ‖u‖^2 + ‖v‖^2 .                                        (6.5)

Proof. For all u and v in R^n ,

     ‖u + v‖^2 = (u + v)•(u + v) = ‖u‖^2 + ‖v‖^2 + 2u•v.                (6.6)

Thus, u and v are orthogonal ⇔ (6.5) holds.

Orthogonal Complements

Definition 6.12. Let W ⊂ Rn be a subspace. A vector z ∈ Rn is said


to be orthogonal to W if z•w = 0 for all w ∈ W . The set of all vectors
z that are orthogonal to W is called the orthogonal complement of
W and is denoted by W ⊥ (and read as “W perpendicular" or simply “W
perp"). That is,
W ⊥ = {z | z•w = 0, ∀ w ∈ W }. (6.7)

Example 6.13. Let W be a plane


through the origin in R3 , and let L be
the line through the origin and per-
pendicular to W . If z ∈ L and
w ∈ W , then
     z•w = 0.

In fact, L consists of all vectors that
are orthogonal to the w's in W , and
W consists of all vectors orthogonal
to the z's in L. That is,

     L = W^⊥   and   W = L^⊥ .                                          (6.8)

Figure 6.2: A plane and line through the origin as orthogonal complements.

Example 6.14. Let W is a subspace of Rn . Prove that if x ∈ W and x ∈ W ⊥ ,


then x = 0.
Solution. Hint : Let x ∈ W . The condition x ∈ W ⊥ implies that x is perpendicular to
every element in W , particularly to itself.

Remark 6.15. Let W be a subspace of Rn , with dim W = m ≤ n.


(a) Consider a basis for W . Let A be the collection of the basis vectors.
Then A ∈ Rn×m , W = Col A, and
W ⊥ = {x ∈ Rn | AT x = 0} = Nul AT . (6.9)

Note that AT ∈ Rm×n , m ≤ n, and


m = dim W = dim Col A = dim Col AT . (6.10)

(b) A vector x is in W ⊥ ⇔ x is orthogonal to every vector in a span-


ning set of W .
(c) W ⊥ is a subspace of Rn .

Example 6.16. Let W be a subspace of Rn . Prove that

dim W + dim W ⊥ = n. (6.11)

Solution. Hint : Use Remark 6.15 (a).



Definition 6.17. Let A be an m × n matrix. Then the row space is the


set of all linear combinations of of the rows of A, denoted by Row A. That
is, Row A = Col AT .

Theorem 6.18. Let A be an m × n matrix. The orthogonal complement


of the row space of A is the null space of A, and the orthogonal comple-
ment of the column space of A is the null space of AT :

(Row A)⊥ = Nul A and (Col A)⊥ = Nul AT . (6.12)
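A numerical illustration of (6.12) (a Matlab/Octave sketch with an arbitrary matrix, not from the original text; null(A) returns an orthonormal basis for Nul A, so A*N is the zero matrix, i.e., every row of A is orthogonal to every vector of Nul A; the rank-nullity count is also displayed):

    A = [1 2 3; 2 4 6; 1 0 1];           % just an illustrative matrix
    N = null(A);                          % orthonormal basis for Nul A
    disp(norm(A*N))                       % rows of A are orthogonal to Nul A (~0)
    disp(rank(A) + size(N,2))             % dim Row A + dim Nul A = n (= 3 here)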

True-or-False 6.19.
a. For any scalar c, u•(cv) = c(u•v).
b. If the distance from u to v equals the distance from u to −v, then u and
v are orthogonal.
c. For a square matrix A, vectors in Col A are orthogonal to vectors in
Nul A.
d. For an m × n matrix A, vectors in the null space of A are orthogonal to
vectors in the row space of A.
Solution.

Ans: T,T,F,T

Exercises 6.1
   
1. Let x = [3, −1, −5]^T and w = [6, −2, 3]^T . Find x•x, x•w, and (x•w)/(x•x).
                                                              Ans: 35, 5, 1/7
  
2. Find the distance between u = [0, −5, 2]^T and z = [−4, −1, 8]^T .
                                                              Ans: 2√17
3. Verify the parallelogram law for vectors u and v in Rn :

     ‖u + v‖^2 + ‖u − v‖^2 = 2‖u‖^2 + 2‖v‖^2 .                          (6.13)

Hint : Use (6.6).


   
4. Let u = [2, −5, −1]^T and v = [−7, −4, 6]^T .

   (a) Compute u•v, ‖u‖, ‖v‖, ‖u + v‖, and ‖u − v‖.


(b) Verify (6.6) and (6.13).

5. Suppose y is orthogonal to u and v. Show that y is orthogonal to every w in Span{u, v}.



6.2. Orthogonal Sets

Definition 6.20. A set of vectors {u1 , u2 , · · · , up } in Rn is said to be an


orthogonal set if each pair of distinct vectors from the set is orthogonal.
That is ui •uj = 0, for i 6= j.
     
Example 6.21. Let u1 = [1, −2, 1]^T , u2 = [0, 1, 2]^T , and u3 = [−5, −2, 1]^T . Is the set
{u1 , u2 , u3 } orthogonal?
Solution.

Theorem 6.22. If S = {u1 , u2 , · · · , up } is an orthogonal set of nonzero


vectors in Rn , then S is linearly independent and therefore forms a basis
for the subspace spanned by S.

Proof. It suffices to prove that S is linearly independent. Suppose

c1 u1 + c2 u2 + · · · + cp up = 0.

Take the dot product with u1 . Then the above equation becomes

c1 u1 •u1 + c2 u1 •u2 + · · · + cp u1 •up = 0,

from which we conclude c1 = 0. Similarly, by taking the dot product with ui ,


we can get ci = 0. That is,

c1 = c2 = · · · = cp = 0,

which completes the proof.



Definition 6.23. An orthogonal basis for a subspace W of Rn is a


basis for W that is also an orthogonal set.

The following theorem shows one of reasons why orthogonality is a useful


property in vector spaces and matrix algebra.

Theorem 6.24. Let {u1 , u2 , · · · , up } be an orthogonal basis for a sub-


space W of Rn . For each y in W , the weights in the linear combination

y = c1 u1 + c2 u2 + · · · + cp up (6.14)
are given by
y•uj
cj = (j = 1, 2, · · · , p). (6.15)
uj •uj

Proof. y•uj = (c1 u1 + c2 u2 + · · · + cp up )•uj = cj uj •uj = cj kuj k2 .


     
Example 6.25. Let u1 = [1, −2, 1]^T , u2 = [0, 1, 2]^T , and u3 = [−5, −2, 1]^T . In Exam-
ple 6.21, we have seen that S = {u1 , u2 , u3 } is orthogonal. Express the
vector y = [11, 0, −5]^T as a linear combination of the vectors in S.
Solution.

Ans: y = u1 − 2u2 − 2u3 .
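The weights in (6.15) require only dot products; a Matlab/Octave sketch for this example (no row operations are needed):

    u1 = [1; -2; 1];  u2 = [0; 1; 2];  u3 = [-5; -2; 1];  y = [11; 0; -5];
    c1 = (y'*u1)/(u1'*u1);  c2 = (y'*u2)/(u2'*u2);  c3 = (y'*u3)/(u3'*u3);
    disp([c1 c2 c3])                              % 1, -2, -2
    disp(norm(y - (c1*u1 + c2*u2 + c3*u3)))       % 0: y is recovered exactly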



6.2.1. An Orthogonal Projection


Given a nonzero vector u in R^n , con-
sider the problem of decomposing
a vector y ∈ R^n into a sum of two
vectors, one a multiple of u and
the other orthogonal to u:

     y = ŷ + z,   ŷ // u and z ⊥ u.                                     (6.16)

Let ŷ = αu. Then z = y − αu and

     0 = z•u = (y − αu)•u = y•u − α u•u.

Thus α = y•u/u•u.

Figure 6.3: Orthogonal projection: y = ŷ + z.

Definition 6.26. Given a nonzero vector u in R^n , for y ∈ R^n , let

     y = ŷ + z,   ŷ // u and z ⊥ u.                                     (6.17)

Then
     ŷ = αu = (y•u)/(u•u) u,   z = y − ŷ.                               (6.18)

The vector ŷ is called the orthogonal projection of y onto u, and z is
called the component of y orthogonal to u.
• Let L = Span{u}. Then we denote
     ŷ = (y•u)/(u•u) u = projL y,                                       (6.19)
  which is called the orthogonal projection of y onto L.
• The projection is meaningful whether the angle between y and u is acute or
  obtuse.

Example 6.27. Let y = [7, 6]^T and u = [4, 2]^T .

(a) Find the orthogonal projection of y onto u.


(b) Write y as the sum of two orthogonal vectors, one in L = Span{u} and
one orthogonal to u.
(c) Find the distance from y to L.
Solution.

Figure 6.4: The orthogonal projection of y onto L = Span{u}.
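The computation in (6.18)–(6.19) takes only a few lines of Matlab/Octave; a sketch for Example 6.27:

    y = [7; 6];  u = [4; 2];
    yhat = (y'*u)/(u'*u)*u;       % orthogonal projection of y onto L = Span{u}
    z = y - yhat;                 % component of y orthogonal to u
    disp([yhat, z])               % y = yhat + z
    disp(norm(z))                 % distance from y to L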
Example 6.28. Let y = [1, 3]^T and u = [2, −4]^T .
(a) Find the orthogonal projection of y onto u.
(b) Write y as the sum of a vector in Span{u} and one orthogonal to u.
Solution.

  
Example 6.29. Let v = [4, −12, 8]^T and w = [2, 1, −3]^T . Find the distance from v
to Span{w}.
Solution.

6.2.2. Orthonormal Basis and Orthogonal Matrix

Definition 6.30. A set {u1 , u2 , · · · , up } is an orthonormal set, if it is


an orthogonal set of unit vectors. If W is the subspace spanned by such a
set, then {u1 , u2 , · · · , up } is an orthonormal basis for W , since the set
is automatically linearly independent.
   
Example 6.31. In Example 6.21, p. 187, we know v1 = [1, −2, 1]^T , v2 = [0, 1, 2]^T ,
and v3 = [−5, −2, 1]^T form an orthogonal basis for R^3 . Find the corresponding
orthonormal basis.
Solution.

Theorem 6.32. An m × n matrix U has orthonormal columns if and


only if U T U = I.

Proof. To simplify notation, we suppose that U has only three columns:


U = [u1 u2 u3 ], ui ∈ R^m . Then

     U^T U = [u1^T; u2^T; u3^T] [u1 u2 u3 ]
           = [u1^T u1  u1^T u2  u1^T u3 ;  u2^T u1  u2^T u2  u2^T u3 ;  u3^T u1  u3^T u2  u3^T u3 ].

Thus, U has orthonormal columns ⇔ U^T U = [1 0 0; 0 1 0; 0 0 1]. The proof of the general
case is essentially the same.

Theorem 6.33. Let U be an m × n matrix with orthonormal columns,


and let x, y ∈ Rn . Then
(a) kU xk = kxk (length preservation)
(b) (U x)•(U y) = x•y (dot product preservation)
(c) (U x)•(U y) = 0 ⇔ x•y = 0 (orthogonality preservation)

Proof.

Theorems 6.32 & 6.33 are particularly useful when applied to square ma-
trices.
Definition 6.34. An orthogonal matrix is a square matrix U such
that U T = U −1 , i.e.,

U ∈ Rn×n and U T U = I. (6.20)

Let’s generate a random orthogonal matrix and test it.

orthogonal_matrix.m Output
1 n = 4; 1 U =
2 2 -0.3770 0.6893 0.2283 -0.5750
3 [Q,~] = qr(rand(n)); 3 -0.3786 -0.2573 -0.8040 -0.3795
4 U = Q; 4 -0.6061 0.3149 -0.1524 0.7143
5 5 -0.5892 -0.5996 0.5274 -0.1231
6 disp("U ="); disp(U) 6 U'*U =
7 disp("U'*U ="); disp(U'*U) 7 1.0000 0.0000 0.0000 -0.0000
8 8 0.0000 1.0000 -0.0000 0.0000
9 x = rand([n,1]); 9 0.0000 -0.0000 1.0000 -0.0000
10 fprintf("\nx' ="); disp(x') 10 -0.0000 0.0000 -0.0000 1.0000
11 fprintf("||x||_2 =");disp(norm(x,2)) 11

12 fprintf("||U*x||_2=");disp(norm(U*x,2)) 12 x' = 0.4709 0.2305 0.8443 0.1948


13 ||x||_2 = 1.0128
14 ||U*x||_2= 1.0128

True-or-False 6.35.
a. If y is a linear combination of nonzero vectors from an orthogonal set,
then the weights in the linear combination can be computed without
row operations on a matrix.
b. If the vectors in an orthogonal set of nonzero vectors are normalized,
then some of the new vectors may not be orthogonal.
c. A matrix with orthonormal columns is an orthogonal matrix.
d. If L is a line through 0 and if ŷ is the orthogonal projection of y onto L,
   then ‖ŷ‖ gives the distance from y to L.
e. Every orthogonal set in Rn is linearly independent.
f. If the columns of an m × n matrix A are orthonormal, then the linear
mapping x 7→ Ax preserves lengths.
Solution.

Ans: T,F,F,F,F,T

Exercises 6.2
1. Determine which sets of vectors are orthogonal.

   (a) [2, −7, 1]^T , [−6, −3, 9]^T , [3, 1, −1]^T        (b) [2, −5, −3]^T , [0, 0, 0]^T , [4, −2, 6]^T
       
2. Let u1 = [3, −3, 0]^T , u2 = [2, 2, −1]^T , u3 = [1, 1, 4]^T , and x = [5, −3, 1]^T .

   (a) Check if {u1 , u2 , u3 } is an orthogonal basis for R^3 .
   (b) Express x as a linear combination of {u1 , u2 , u3 }.
                                          Ans: x = (4/3) u1 + (1/3) u2 + (1/3) u3
3. Compute the orthogonal projection of [1, −1]^T onto the line through [−1, 3]^T and the origin.
4. Determine which sets of vectors are orthonormal. If a set is only orthogonal, normalize
the vectors to produce an orthonormal set.
             
   (a) [0, 1, 0]^T , [0, −1, 0]^T      (b) [1/3, 1/3, 1/3]^T , [−1/2, 0, 1/2]^T      (c) [1, 4, 1]^T , [1, 0, −1]^T , [−2/3, 1/3, −2/3]^T

5. Let U and V be n × n orthogonal matrices. Prove that U V is an orthogonal matrix.


Hint : See Definition 6.34, where U −1 = U T ⇔ U T U = I.

6.3. Orthogonal Projections

Recall: (Definition 6.26) Given a nonzero vector u in Rn , for y ∈ Rn , let


     y = ŷ + z,   ŷ // u and z ⊥ u.                                     (6.21)

Then
     ŷ = αu = (y•u)/(u•u) u,   z = y − ŷ.                               (6.22)

The vector ŷ is called the orthogonal projection of y onto u, and z is
called the component of y orthogonal to u. Let L = Span{u}. Then we
denote
     ŷ = (y•u)/(u•u) u = projL y,                                       (6.23)

which is called the orthogonal projection of y onto L.

We generalize this orthogonal projection to subspaces.

Theorem 6.36. (The Orthogonal Decomposition Theorem)


Let W be a subspace of Rn . Then each y ∈ Rn can be written uniquely in
the form
     y = ŷ + z,                                                         (6.24)

where ŷ ∈ W and z ∈ W^⊥ . In fact, if {u1 , u2 , · · · , up } is an orthogonal
basis for W , then

     ŷ = projW y = (y•u1)/(u1•u1) u1 + (y•u2)/(u2•u2) u2 + · · · + (y•up)/(up•up) up ,
                                                                        (6.25)
     z = y − ŷ.

Figure 6.5: Orthogonal projection of y onto W .


     
Example 6.37. Let u1 = [2, 5, −1]^T , u2 = [−2, 1, 1]^T , and y = [1, 2, 3]^T . Observe that
{u1 , u2 } is an orthogonal basis for W = Span{u1 , u2 }.

(a) Write y as the sum of a vector in W and a vector orthogonal to W .


(b) Find the distance from y to W .
Solution. y = ŷ + z  ⇒  ŷ = (y•u1)/(u1•u1) u1 + (y•u2)/(u2•u2) u2 and z = y − ŷ.

Figure 6.6: A geometric interpretation of the orthogonal projection.

Remark 6.38. (Properties of Orthogonal Decomposition)


Let y = ŷ + z, where ŷ ∈ W and z ∈ W^⊥ . Then
1. ŷ is called the orthogonal projection of y onto W (= projW y)
2. ŷ is the closest point to y in W .
   (in the sense ‖y − ŷ‖ ≤ ‖y − v‖, for all v ∈ W )
3. ŷ is called the best approximation to y by elements of W .
4. If y ∈ W , then projW y = y.

Proof. 2. For an arbitrary v ∈ W , y − v = (y − ŷ) + (ŷ − v), where (ŷ − v) ∈ W .
Thus, by the Pythagorean theorem,

     ‖y − v‖^2 = ‖y − ŷ‖^2 + ‖ŷ − v‖^2 ,

which implies that ‖y − v‖ ≥ ‖y − ŷ‖.
which implies that ky − vk ≥ ky − y


bk.

Example 6.39. Find the closest point to y in the subspace Span{u1 , u2 }


and hence find the distance from y to W . (Notice that u1 and u2 are orthogonal.)

    y = (3, −1, 1, 13),    u1 = (1, −2, −1, 2),    u2 = (−4, 1, 0, 3)
Solution.
6.3. Orthogonal Projections 199

Example 6.40. Find the distance from y to the plane in R3 spanned by u1


and u2.     
5 −3 −3
y = −9, u1 = −5, u2 =  2
     

5 1 1
Solution.

Example 6.41. Let {u1 , u2 , u3 } be an orthogonal basis for a subspace W of


R4 and v ∈ W :
    u1 = (1, 2, 1, 1),   u2 = (−2, 1, −1, 1),   u3 = (1, 1, −2, −1),   and   v = (−3, 7, −6, 2).

Write v as the sum of two vectors: one in Span{u1 , u2 } and the other in
Span{u3 }.
Solution.

Ans: u1 + 3u2 and 2u3 .


200 Chapter 6. Orthogonality and Least-Squares

Theorem 6.42. If {u1 , u2 , · · · , up } is an orthonormal basis for a sub-


space W of Rn , then

projW y = (y•u1 ) u1 + (y•u2 ) u2 + · · · + (y•up ) up . (6.26)

If U = [u1 u2 · · · up ], then

projW y = U U T y, for all y ∈ Rn . (6.27)

The orthogonal projection can be viewed as a matrix transformation.

Proof. Notice that


(y•u1 ) u1 + (y•u2 ) u2 + · · · + (y•up ) up
= (uT1 y) u1 + (uT2 y) u2 + · · · + (uTp y) up
= U (U T y).
" # " √ #
7 1/ 10
Example 6.43. Let y = , u1 = √ , and W = Span{u1 }.
9 −3/ 10

(a) Let U be the 2 × 1 matrix whose only column is u1 . Compute U T U and


UUT.
(b) Compute projW y = (y•u1 ) u1 and U U T y.
Solution.

" # " #
1 1 −3 −2
Ans: (a) U U T = (b)
10 −3 9 6
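As an added illustration (not from the original notes), the same computation can be checked in NumPy:

import numpy as np

# Example 6.43: W = Span{u1}, where u1 is a unit vector in R^2
u1 = np.array([1., -3.]) / np.sqrt(10)
y  = np.array([7., 9.])

U = u1.reshape(2, 1)                   # the 2x1 matrix whose only column is u1
print('U^T U =\n', U.T @ U)            # the 1x1 identity
print('U U^T =\n', U @ U.T)            # the projection matrix onto W

print('proj_W y =', (y @ u1) * u1)     # (y . u1) u1
print('U U^T y  =', U @ (U.T @ y))     # the same vector, as a matrix transformation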
6.3. Orthogonal Projections 201

True-or-False 6.44.
a. If z is orthogonal to u1 and u2 and if W = Span{u1 , u2 }, then z must be
in W ⊥ .
b. The orthogonal projection ŷ of y onto a subspace W can sometimes depend
   on the orthogonal basis for W used to compute ŷ.
c. If the columns of an n × p matrix U are orthonormal, then U U T y is the
orthogonal projection of y onto the column space of U .
d. If an n × p matrix U has orthonormal columns, then U U T x = x for all x
in Rn .
Solution.

Ans: T,F,T,F
202 Chapter 6. Orthogonality and Least-Squares

Exercises 6.3
1. (i) Verify that {u1 , u2 } is an orthogonal set, (ii) find the orthogonal projection of y onto
W = Span{u1 , u2 }, and (iii) write y as a sum of a vector in W and a vector orthogonal to
W.
           
(a) y = (−1, 2, 6),  u1 = (3, −1, 2),  u2 = (1, −1, −2)      (b) y = (−1, 4, 3),  u1 = (1, 1, 1),  u2 = (−1, 3, −2)

                                          Ans: (b) y = (1/2)(3, 7, 2) + (1/2)(−5, 1, 4)
2. Find the best approximation to z by vectors of the form c1 v1 + c2 v2 .
           
(a) z = (3, −1, 1, 13),  v1 = (1, −2, −1, 2),  v2 = (−4, 1, 0, 3)      (b) z = (3, −7, 2, 3),  v1 = (2, −1, −3, 1),  v2 = (1, 1, 0, −1)

                                          Ans: (a) ẑ = 3v1 + v2
3. Let z, v1 , and v2 be given as in Exercise 2. Find the distance from z to the subspace of
R4 spanned by v1 and v2 .
Ans: (a) 8
4. Let W be a subspace of Rn . A transformation T : Rn → Rn is defined as
x 7→ T (x) = projW x.

(a) Prove that T is a linear transformation.


(b) Prove that T (T (x)) = T (x).

Hint : Use Theorem 6.42.


6.4. The Gram-Schmidt Process and QR Factorization 203

6.4. The Gram-Schmidt Process and QR Fac-


torization
6.4.1. The Gram-Schmidt Process
The Gram-Schmidt process is an algorithm to produce an orthogonal
or orthonormal basis for any nonzero subspace of Rn .
   
Example 6.45. Let W = Span{x1 , x2 }, where x1 = (3, 6, 0) and x2 = (1, 2, 2).
Find an orthogonal basis for W .

Main idea: Orthogonal projection

    {x1 , x2 }   ⇒   {x1 , x2 = αx1 + v2 }   ⇒   {v1 = x1 , v2 = x2 − αx1 },

where x1 •v2 = 0. Then W = Span{x1 , x2 } = Span{v1 , v2 }.

Solution.

Figure 6.7: Construction of an orthogonal basis {v1 , v2 }.
204 Chapter 6. Orthogonality and Least-Squares

Example 6.46. Find an orthonormal basis for a subspace whose basis is
    { (3, 0, −1),  (8, 5, −6) }.
Solution.

Theorem 6.47. (The Gram-Schmidt Process) Given a basis


{x1 , x2 , · · · , xp } for a nonzero subspace W of Rn , define

v1 = x1
v2 = x2 − ((x2 •v1)/(v1 •v1)) v1
v3 = x3 − ((x3 •v1)/(v1 •v1)) v1 − ((x3 •v2)/(v2 •v2)) v2                         (6.28)
 ..
  .
vp = xp − ((xp •v1)/(v1 •v1)) v1 − ((xp •v2)/(v2 •v2)) v2 − · · · − ((xp •vp−1)/(vp−1 •vp−1)) vp−1
Then {v1 , v2 , · · · , vp } is an orthogonal basis for W . In addition,

Span{x1 , x2 , · · · , xk } = Span{v1 , v2 , · · · , vk }, for 1 ≤ k ≤ p. (6.29)

Remark 6.48. For the result of the Gram-Schmidt process, define


    uk = vk / ‖vk ‖,    for 1 ≤ k ≤ p.                      (6.30)
Then {u1 , u2 , · · · , up } is an orthonormal basis for W . In practice, it is
often implemented with the normalized Gram-Schmidt process.
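Below is an added NumPy sketch of the normalized Gram-Schmidt process described above; the function name gram_schmidt is chosen for this illustration only, and it assumes linearly independent columns. It is tested on the data of Example 6.45.

import numpy as np

def gram_schmidt(X):
    """Orthonormal basis for Col X via the (normalized) Gram-Schmidt process.
       The columns of X are the basis vectors x1, ..., xp."""
    m, p = X.shape
    U = np.zeros((m, p))
    for k in range(p):
        v = X[:, k].copy()
        for j in range(k):                  # subtract projections onto u1, ..., u_{k-1}
            v -= (U[:, j] @ X[:, k]) * U[:, j]
        U[:, k] = v / np.linalg.norm(v)     # normalize
    return U

# Example 6.45: W = Span{x1, x2}
X = np.array([[3., 1.],
              [6., 2.],
              [0., 2.]])
U = gram_schmidt(X)
print(U)
print('U^T U =\n', np.round(U.T @ U, 12))   # identity => orthonormal columns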
6.4. The Gram-Schmidt Process and QR Factorization 205

Example 6.49. Find an orthonormal basis for W = Span{x1 , x2 , x3 }, where


     
    x1 = (1, 0, −1, 1),    x2 = (−2, 2, 1, 0),    and    x3 = (0, 1, −1, 1).
Solution.
206 Chapter 6. Orthogonality and Least-Squares

6.4.2. QR Factorization of Matrices

Theorem 6.50. (The QR Factorization) If A is an m × n matrix with


linearly independent columns, then A can be factored as

A = QR, (6.31)

where
• Q is an m × n matrix whose columns are orthonormal and
• R is an n × n upper triangular invertible matrix with positive en-
tries on its diagonal.

Proof. The columns of A form a basis {x1 , x2 , · · · , xn } for W = Col A.

1. Construct an orthonormal basis {u1 , u2 , · · · , un } for W (the Gram-


Schmidt process). Set
    Q := [u1 u2 · · · un ].                                 (6.32)

2. (Expression) Since Span{x1 , x2 , · · · , xk } = Span{u1 , u2 , · · · , uk }, 1 ≤


k ≤ n, there are constants r1k , r2k , · · · , rkk such that

xk = r1k u1 + r2k u2 + · · · + rkk uk + 0 · uk+1 + · · · + 0 · un . (6.33)

We may assume that rkk > 0. (If rkk < 0, multiply both rkk and uk by −1.)
3. Let rk = [r1k , r2k , · · · , rkk , 0, · · · , 0]T . Then

xk = Qrk (6.34)

4. Define
    R := [r1 r2 · · · rn ].                                 (6.35)

Then we see A = [x1 x2 · · · xn ] = [Qr1 Qr2 · · · Qrn ] = QR.


6.4. The Gram-Schmidt Process and QR Factorization 207

We can summarize the QR Factorization as follows.


Algorithm 6.51. (QR Factorization) Let A = [x1 x2 · · · xn ].
• Apply the Gram-Schmidt process to obtain an orthonormal basis
{u1 , u2 , · · · , un }.
• Then
x1 = (u1 •x1 )u1
x2 = (u1 •x2 )u1 + (u2 •x2 )u2
x3 = (u1 •x3 )u1 + (u2 •x3 )u2 + (u3 •x3 )u3 (6.36)
..
.
xn = Σ_{j=1}^{n} (uj •xn ) uj .

• Thus
A = [x1 x2 · · · xn ] = QR (6.37)
implies that

Q = [u1 u2 · · · un ],

    [ u1 •x1   u1 •x2   u1 •x3   · · ·   u1 •xn ]
    [   0      u2 •x2   u2 •x3   · · ·   u2 •xn ]
R = [   0        0      u3 •x3   · · ·   u3 •xn ]  =  Q T A.               (6.38)
    [   :        :        :       . .      :    ]
    [   0        0        0      · · ·   un •xn ]

• In practice, the coefficients rij = ui •xj , i < j, can be saved during


the (normalized) Gram-Schmidt process.
" #
4 −1
Example 6.52. Find the QR factorization for A = .
3 2
Solution.

" # " #
0.8 −0.6 5 0.4
Ans: Q = R=
0.6 0.8 0 2.2
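An added NumPy sketch of Algorithm 6.51 for this example (illustrative only); note that R is recovered as Q T A:

import numpy as np

# QR factorization of Example 6.52 via Algorithm 6.51 (R = Q^T A)
A = np.array([[4., -1.],
              [3.,  2.]])

# Gram-Schmidt on the columns of A
u1 = A[:, 0] / np.linalg.norm(A[:, 0])
v2 = A[:, 1] - (u1 @ A[:, 1]) * u1
u2 = v2 / np.linalg.norm(v2)

Q = np.column_stack([u1, u2])
R = Q.T @ A                       # upper triangular with positive diagonal
print('Q =\n', Q)
print('R =\n', np.round(R, 12))

# numpy's built-in QR (np.linalg.qr) may differ by column signs.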
208 Chapter 6. Orthogonality and Least-Squares

True-or-False 6.53.
a. If {v1 , v2 , v3 } is an orthogonal basis for W , then multiplying v3 by a
scalar c gives a new orthogonal basis {v1 , v2 , cv3 }. Clue: c =?
b. The Gram-Schmidt process produces from a linearly independent set
{x1 , x2 , · · · , xp } an orthogonal set {v1 , v2 , · · · , vp } with the property
that for each k, the vectors v1 , v2 , · · · , vk span the same subspace as
that spanned by x1 , x2 , · · · , xk .
c. If A = QR, where Q has orthonormal columns, then R = QT A.
d. If x is not in a subspace W , then x̂ = projW x is not zero.
e. In a QR factorization, say A = QR (when A has linearly independent
columns), the columns of Q form an orthonormal basis for the column
space of A.
Solution.

Ans: F,T,T,F,T
6.4. The Gram-Schmidt Process and QR Factorization 209

Exercises 6.4
1. The given set is a basis for a subspace W . Use the Gram-Schmidt process to produce an
orthogonal basis for W .
       
(a) (3, 0, −1),  (8, 5, −6)        (b) (1, −4, 0, 1),  (7, −7, −4, 1)

                                          Ans: (b) v2 = (5, 1, −4, −1)
2. Find an orthogonal basis for the column space of the matrix
 
    [ −1   6   6 ]
    [  3  −8   3 ]
    [  1  −2   6 ]
    [  1  −4  −3 ]
                                          Ans: v3 = (1, 1, −3, 1)
 
3. M  Let A = [ −10   13    7  −11 ]
              [   2    1   −5    3 ]
              [  −6    3   13   −3 ]
              [  16  −16   −2    5 ]
              [   2    1   −5   −7 ]

(a) Use the Gram-Schmidt process to produce an orthogonal basis for the column space
of A.
(b) Use the method in this section to produce a QR factorization of A.
Ans: (a) v4 = (0, 5, 0, 0, −5)
210 Chapter 6. Orthogonality and Least-Squares

6.5. Least-Squares Problems

Note: Let A be an m × n matrix. Then Ax = b may have no solution,
particularly when m > n. In real-world problems,
• m ≫ n, where m represents the number of data points and n denotes
  the dimension of the points;
• we need to find a best solution for Ax ≈ b.

Definition 6.54. Let A be an m × n matrix and b ∈ Rm .
A least-squares (LS) solution of Ax = b is an x̂ ∈ Rn such that

    ‖b − Ax̂‖ ≤ ‖b − Ax‖,    for all x ∈ Rn .               (6.39)

Note: The information matrix A and the observation vector b are


often formulated from a certain dataset.
• Finding a best approximation/representation is a major subject
  at the research level.
• Here we assume that the dataset is acquired appropriately.

Figure 6.8: Least-Squares approximation for noisy data. The dashed line in cyan is the
linear model from random sample consensus (RANSAC). The data has 1,200 and 300
points respectively for inliers and outliers.
6.5. Least-Squares Problems 211

Solution of the General Least-Squares Problem

Recall: (Definition 6.54) Let A be an m × n matrix and b ∈ Rm .
A least-squares (LS) solution of Ax = b is an x̂ ∈ Rn such that

    ‖b − Ax̂‖ ≤ ‖b − Ax‖,    for all x ∈ Rn .               (6.40)

Figure 6.9: The LS solution x̂ is in Rn .

Remark 6.55. Geometric Interpretation of the LS Problem


• For all x ∈ Rn , Ax will necessarily be in Col A, a subspace of Rm .
– So we seek an x that makes Ax the closest point in Col A to b.

• Let b̂ = projCol A b. Then Ax = b̂ has a solution and there is an
  x̂ ∈ Rn such that
      Ax̂ = b̂.                                              (6.41)

• x̂ is an LS solution of Ax = b.
• The quantity ‖b − b̂‖² = ‖b − Ax̂‖² is called the least-squares error.

Note: If A ∈ Rn×n is invertible, then Ax = b has a unique solution x̂
and therefore
    ‖b − Ax̂‖ = 0.                                           (6.42)
212 Chapter 6. Orthogonality and Least-Squares

The Method of Normal Equations

Theorem 6.56. The set of LS solutions of Ax = b coincides with the


nonempty set of solutions of the normal equations

AT Ax = AT b. (6.43)

Proof. Suppose x̂ satisfies Ax̂ = b̂.
   ⇔ b − b̂ = b − Ax̂ ⊥ Col A
   ⇔ aj •(b − Ax̂) = 0 for all columns aj
   ⇔ aj T (b − Ax̂) = 0 for all columns aj   (Note that aj T is a row of AT )
   ⇔ AT (b − Ax̂) = 0
   ⇔ AT Ax̂ = AT b
   
Example 6.57. Let A = [1 1; 2 0; −2 1] and b = (−4, 8, 1).

(a) Find an LS solution of Ax = b.
(b) Find the least-squares error, ‖b − Ax̂‖².
Solution.

" #
1
Ans: (a) x
b= .
−1
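An added NumPy sketch that checks this example via the normal equations (6.43):

import numpy as np

# Example 6.57: LS solution of Ax = b via the normal equations
A = np.array([[ 1., 1.],
              [ 2., 0.],
              [-2., 1.]])
b = np.array([-4., 8., 1.])

xhat = np.linalg.solve(A.T @ A, A.T @ b)    # solve A^T A x = A^T b
print('xhat     =', xhat)                   # the LS solution
print('LS error =', np.linalg.norm(b - A @ xhat)**2)   # squared residual norm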
6.5. Least-Squares Problems 213

Remark 6.58. Theorem 6.56 implies that LS solutions of Ax = b are
solutions of the normal equations AT Ax̂ = AT b.
• When AT A is not invertible, the normal equations have either no
solution or infinitely many solutions.
• So, data acquisition is important, to make it invertible.

Theorem 6.59. Let A be an m × n matrix. The following statements


are logically equivalent:
a. The equation Ax = b has a unique LS solution for each b ∈ Rm .
b. The columns of A are linearly independent.
c. The matrix AT A is invertible.
When these statements are true, the unique LS solution x̂ is given by

    x̂ = (AT A)−1 AT b.                                      (6.44)

Definition 6.60. The matrix

A+ := (AT A)−1 AT (6.45)

is called the pseudoinverse of A.

Example 6.61. Describe all least squares solutions of the equation Ax = b,


given
    A = [1 1 0; 1 1 0; 1 0 1; 1 0 1]    and    b = (1, 3, 8, 2).
Solution.
214 Chapter 6. Orthogonality and Least-Squares

Alternative Calculations of Least-Squares Solutions

Theorem 6.62. Given an m × n matrix A with linearly independent


columns, let A = QR be a QR factorization of A as in Algorithm 6.51.
Then, for each b ∈ Rm , the equation Ax = b has a unique LS solution,
given by
    x̂ = R−1 QT b.                                           (6.46)

Proof. Let A = QR. Then the pseudoinverse of A reads

(AT A)−1 AT = ((QR)T QR)−1 (QR)T = (RT QT QR)−1 RT QT


(6.47)
= R−1 (RT )−1 RT QT = R−1 QT ,

which completes the proof.

Self-study 6.63. Find the LS solution of Ax = b for


   
    A = [1 3 5; 1 1 0; 1 1 2; 1 3 3]    and    b = (3, 5, 7, −3),
where
    A = QR = [1/2  1/2  1/2;  1/2 −1/2 −1/2;  1/2 −1/2  1/2;  1/2  1/2 −1/2] · [2 4 5; 0 2 3; 0 0 2].
Solution.

Ans: QT b = (6, −6, 4) and x̂ = (10, −6, 2)
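An added NumPy sketch of Theorem 6.62 for this self-study problem; numpy.linalg.qr may return Q and R with different column signs than the factorization above, but the LS solution is unchanged:

import numpy as np

# Self-study 6.63: LS solution via a QR factorization, x = R^{-1} Q^T b
A = np.array([[1., 3., 5.],
              [1., 1., 0.],
              [1., 1., 2.],
              [1., 3., 3.]])
b = np.array([3., 5., 7., -3.])

Q, R = np.linalg.qr(A)               # "reduced" QR factorization
xhat = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b
print('xhat =', xhat)                # expected (10, -6, 2)

# sanity check against the normal equations
print(np.linalg.solve(A.T @ A, A.T @ b))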
6.5. Least-Squares Problems 215

True-or-False 6.64.
a. The general least-squares problem is to find an x that makes Ax as
close as possible to b.
b. Any solution of AT Ax = AT b is a least-squares solution of Ax = b.
c. If x̂ is a least-squares solution of Ax = b, then x̂ = (AT A)−1 AT b.
d. The normal equations always provide a reliable method for computing
least-squares solutions.
Solution.

Ans: T,T,F,F
216 Chapter 6. Orthogonality and Least-Squares

Exercises 6.5
1. Find a least-squares solution of Ax = b by (i) constructing the normal equations and
   (ii) solving for x̂. Also (iii) compute the least-squares error (‖b − Ax̂‖) associated with
   the least-squares solution.

   (a) A = [−1 2; 2 −3; −1 3],  b = (4, 1, 2)      (b) A = [1 −2; −1 2; 0 3; 2 5],  b = (3, 1, −4, 2)

                                          Ans: (b) x̂ = (4/3, −1/3)
2. Find (i) the orthogonal projection of b onto Col A and (ii) a least-squares solution of
Ax = b. Also (iii) compute the least-squares error associated with the least-squares
solution.
       
   (a) A = [4 0 1; 1 −5 1; 6 1 0; 1 −1 −5],  b = (9, 0, 0, 0)      (b) A = [1 1 0; 1 0 −1; 0 1 1; −1 1 −1],  b = (2, 5, 6, 6)

                                          Ans: (b) b̂ = (5, 2, 3, 6) and x̂ = (1/3, 14/3, −5/3)

3. Describe all least-squares solutions of the system and the associated least-squares error.

       x +  y = 1
       x + 2y = 3
       x + 3y = 3

                                          Ans: x̂ = (1/3, 1)

For the above problems, you may use either pencil-and-paper or computer programs. For
example, for the last problem, a code can be written as
exercise-6.5.3.m
1 A = [1 1; 1 2; 1 3];
2 b = [1;3;3];
3

4 ATA = A'*A; ATb = A'*b;


5

6 xhat = ATA\ATb
7 error = norm(b-A*xhat)^2
6.6. Machine Learning: Regression Analysis 217

6.6. Machine Learning: Regression Analysis

Recall: (Section 6.5)


• (Definition 6.54) Let A be an m × n matrix and b ∈ Rm .
  A least-squares (LS) solution of Ax = b is an x̂ ∈ Rn such that

      ‖b − Ax̂‖ ≤ ‖b − Ax‖,    for all x ∈ Rn .              (6.48)

• (Theorem 6.56) The set of LS solutions of Ax = b coincides with the


nonempty set of solutions of the normal equations

AT Ax = AT b. (6.49)

• (Theorem 6.59) The normal equations have a unique solution, if and


only if the columns of A are linearly independent.
• (Definition 6.60) The matrix

A+ := (AT A)−1 AT

is called the pseudoinverse of A.


• (Theorem 6.62) Given an m × n matrix A with linearly indepen-
dent columns, let A = QR be a QR factorization of A as in Algo-
rithm 6.51. Then, for each b ∈ Rm , the equation Ax = b has a unique
LS solution, given by
      x̂ = R−1 QT b.                                         (6.50)
218 Chapter 6. Orthogonality and Least-Squares

6.6.1. Regression Line

Figure 6.10: A regression line.

Definition 6.65. Suppose a set of experimental data points are given


as
(x1 , y1 ), (x2 , y2 ), · · · , (xm , ym )
such that the graph is close to a line. We determine a line

y = β0 + β1 x (6.51)

that is as close as possible to the given points. This line is called the
least-squares line; it is also called regression line of y on x and β0 , β1
are called regression coefficients.
6.6. Machine Learning: Regression Analysis 219

Calculation of Least-Squares Lines


Remark 6.66. Consider a least-squares (LS) model of the form
y = β0 + β1 x, for a given data set {(xi , yi ) | i = 1, 2, · · · , m}.
• Then
Predicted y-value Observed y-value

β0 + β1 x1 = y1
β0 + β1 x2 = y2 (6.52)
.. ..
. .
β0 + β1 xm = ym

• It can be equivalently written as


Xβ = y, (6.53)

where
    X = [1 x1 ; 1 x2 ; · · · ; 1 xm ],    β = (β0 , β1 ),    y = (y1 , y2 , · · · , ym ).
Here we call X the design matrix, β the parameter vector, and y
the observation vector.
• (Method of Normal Equations) Thus the LS solution can be deter-
mined as
X T Xβ = X T y ⇒ β = (X T X)−1 X T y, (6.54)
provided that X T X is invertible.

Example 6.67. Find the equation y = β0 + β1 x of least-squares line that


best fits the given points:
(−1, 0), (0, 1), (1, 2), (2, 4)
Solution.
220 Chapter 6. Orthogonality and Least-Squares

Remark 6.68. It follows from (6.53) that


 
    X T X = [1 1 · · · 1; x1 x2 · · · xm ] [1 x1 ; 1 x2 ; · · · ; 1 xm ] = [ m  Σxi ; Σxi  Σxi² ],
                                                                                             (6.55)
    X T y = [1 1 · · · 1; x1 x2 · · · xm ] (y1 , y2 , · · · , ym ) = ( Σyi ,  Σxi yi ).

Thus the normal equations for the regression line read


" # " #
Σ1 Σxi Σyi
β= . (6.56)
Σxi Σx2i Σxi yi
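As an added sketch (not part of the original notes), the normal equations (6.56) can be assembled and solved directly with NumPy; here they are applied to the data of Exercise 6.6, Problem 1(a), whose answer is given later as y = −0.6 + 0.7x:

import numpy as np

# Regression line via the normal equations, data of Exercise 6.6, 1(a)
x = np.array([1., 2., 4., 5.])
y = np.array([0., 1., 2., 3.])

X = np.column_stack([np.ones_like(x), x])     # design matrix
beta = np.linalg.solve(X.T @ X, X.T @ y)      # (X^T X) beta = X^T y
print('beta =', beta)                         # expected (-0.6, 0.7)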

Example 6.69. Find the equation y = β0 + β1 x of least-squares line that


best fits the given points:
(0, 1), (1, 1), (2, 2), (3, 2)
Solution.
6.6. Machine Learning: Regression Analysis 221

6.6.2. Least-Squares Fitting of Other Curves

Remark 6.70. Consider a regression model of the form

y = β0 + β1 x + β2 x2 ,

for a given data set {(xi , yi ) | i = 1, 2, · · · , m}.


• As for the regression line, we will get a linear system and try to find
LS solutions of the system.
• Linear System:

Predicted y-value Observed y-value

β0 + β1 x1 + β2 x21 = y1
β0 + β1 x2 + β2 x22 = y2 (6.57)
.. ..
. .
β0 + β1 xm + β2 x2m = ym

• It is equivalently written as

Xβ = y, (6.58)

where
    X = [1 x1 x1²; 1 x2 x2²; · · · ; 1 xm xm²],    β = (β0 , β1 , β2 ),    y = (y1 , y2 , · · · , ym ).

• The system can be solved by the method of normal equations:


   
             [ Σ1     Σxi    Σxi²  ]        [ Σyi     ]
    X T Xβ = [ Σxi    Σxi²   Σxi³  ]  β  =  [ Σxi yi  ]  =  X T y            (6.59)
             [ Σxi²   Σxi³   Σxi⁴  ]        [ Σxi² yi ]


222 Chapter 6. Orthogonality and Least-Squares

Example 6.71. Find an LS curve of the form y = β0 + β1 x + β2 x2 that best


fits the given points:
(0, 1), (1, 1), (1, 2), (2, 3).
   
Solution. The normal equations are
    [ Σ1     Σxi    Σxi²  ]        [ Σyi     ]
    [ Σxi    Σxi²   Σxi³  ]  β  =  [ Σxi yi  ].
    [ Σxi²   Σxi³   Σxi⁴  ]        [ Σxi² yi ]

                                          Ans: y = 1 + 0.5x²
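An added NumPy check of this example (illustrative sketch only):

import numpy as np

# Example 6.71: LS fit of y = b0 + b1*x + b2*x^2 via the normal equations
x = np.array([0., 1., 1., 2.])
y = np.array([1., 1., 2., 3.])

X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix
beta = np.linalg.solve(X.T @ X, X.T @ y)
print('beta =', beta)                             # expected (1, 0, 0.5)

# equivalently: np.polyfit(x, y, 2) returns the same coefficients, highest degree first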
6.6. Machine Learning: Regression Analysis 223

Self-study 6.72. Find an LS curve of the form y = β0 + β1 x + β2 x2 that best


fits the given points:
(−2, 1), (−1, 0), (0, 1), (1, 4), (2, 9)
Solution.

Ans: y = 1 + 2x + x²

Further Applications
Example 6.73. Find an LS curve of the form y = a cos x + b sin x that best
fits the given points:
(0, 1), (π/4, 2), (π, 0).
Solution.


Ans: (a, b) = (1/2, −1/2 + 2√2) = (0.5, 2.32843)
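An added NumPy sketch for this fit; the design matrix simply has the columns cos(xi) and sin(xi):

import numpy as np

# Example 6.73: LS fit of y = a*cos(x) + b*sin(x)
x = np.array([0., np.pi/4, np.pi])
y = np.array([1., 2., 0.])

X = np.column_stack([np.cos(x), np.sin(x)])   # design matrix
ab = np.linalg.solve(X.T @ X, X.T @ y)
print('(a, b) =', ab)                         # expected (0.5, 2*sqrt(2) - 0.5)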
224 Chapter 6. Orthogonality and Least-Squares

Nonlinear Models: Linearization

Strategy 6.74. For nonlinear models, change of variables can be


applied for a linear model.

Model                    Change of Variables           Linearization

y = A + B/x              x̃ = 1/x,   ỹ = y          ⇒   ỹ = A + B x̃
y = 1/(A + Bx)           x̃ = x,     ỹ = 1/y        ⇒   ỹ = A + B x̃              (6.60)
y = C e^{Dx}             x̃ = x,     ỹ = ln y       ⇒   ỹ = ln C + D x̃

The Idea: Transform the nonlinear model to produce a linear system.

Example 6.75. Find an LS curve of the form y = CeDx that best fits the
given points:
(0, e), (1, e3 ), (2, e5 ).
Solution.
     x    y      x̃    ỹ = ln y
     0    e      0    1
     1    e³     1    3         ⇒    X = [1 0; 1 1; 1 2],    y = (1, 3, 5)
     2    e⁵     2    5

                                          Ans: y = e · e^{2x} = e^{2x+1}
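An added NumPy sketch of the linearization (illustrative only); np.polyfit returns the slope first, so the returned pair is (D, ln C):

import numpy as np

# Example 6.75: fit y = C e^{Dx} by linearization, ln y = D x + ln C
x = np.array([0., 1., 2.])
y = np.exp([1., 3., 5.])              # the data (0, e), (1, e^3), (2, e^5)

D, lnC = np.polyfit(x, np.log(y), 1)  # LS line for (x, ln y)
C = np.exp(lnC)
print('C =', C, ' D =', D)            # expected C = e, D = 2, i.e., y = e^{2x+1}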


6.6. Machine Learning: Regression Analysis 225

Exercises 6.6
1. Find an LS curve of the form y = β0 + β1 x that best fits the given points.

(a) (1, 0), (2, 1), (4, 2), (5, 3) (b) (2, 3), (3, 2), (5, 1), (6, 0)
Ans: (a) y = −0.6 + 0.7x

2. M A certain experiment produces the data

(1, 1.8), (2, 2.7), (3, 3.4), (4, 3.8), (5, 3.9).

For these points, we will try to find the best-fitting model of the form y = β1 x + β2 x2 .

(a) Find and display the design matrix and the observation vector.
(b) Find the unknown parameter vector.
(c) Find the LS error.
(d) Plot the associated LS curve along with the data.
226 Chapter 6. Orthogonality and Least-Squares
A PPENDIX A
Appendix

Contents of Chapter A
A.1. Understanding / Interpretation of Eigenvalues and Eigenvectors . . . . . . . . . . . . . 228
A.2. Eigenvalues and Eigenvectors of Stochastic Matrices . . . . . . . . . . . . . . . . . . . 231

227
228 Appendix A. Appendix

A.1. Understanding / Interpretation of Eigenval-


ues and Eigenvectors

Recall: Let A be an n × n matrix. An eigenvalue λ of A and its corre-


sponding eigenvector v are defined as
    Av = λv,    v ≠ 0.                                      (A.1.1)

Observation A.1. (Matrix Transformation)


Let A be an n × n matrix. Consider the matrix multiplication
Ax = y. (A.1.2)

• It scales the vector.


• It rotates the vector.

Remark A.2. Historically, eigenvalues and eigenvectors appeared in


the study of quadratic forms and differential equations:
In the 18th century, Leonhard Euler studied the rotational mo-
tion of a rigid body, and discovered the importance of the princi-
pal axes. Joseph-Louis Lagrange realized that the principal
axes are the eigenvectors of the inertia matrix [3].

• (Having Principal Axes) There are favored directions/vectors,


for square matrices.
When the matrix acts on these favored (principal) vectors, the action
results in scaling the vectors, without rotation.
– These favored vectors are the eigenvectors;
– the scaling factor is the eigenvalue.
• (Forming a Basis) In various real-world interesting applications, the
eigenvectors form a basis, which makes matrix methods and al-
gorithms much more effective and useful.
A.1. Understanding / Interpretation of Eigenvalues and Eigenvectors 229

Example A.3. Let A ∈ Rn×n and suppose its eigenvectors {v1 , v2 , · · · , vn } form a
basis for Rn , Avi = λi vi , i = 1, 2, · · · , n. Then for an arbitrary x ∈ Rn ,

    x = ξ1 v1 + ξ2 v2 + · · · + ξn vn = Σ_{i=1}^{n} ξi vi ,                   (A.1.3)

and therefore

    Ax = Σ_{i=1}^{n} ξi Avi = Σ_{i=1}^{n} ξi λi vi   ⇒   A^k x = Σ_{i=1}^{n} ξi λi^k vi .   (A.1.4)

The formulation is applicable for many tasks in scientific computing,


e.g. convergence analysis for various iterative procedures.
Example A.4. (Geometric Interpretation) Consider a 2 × 2 matrix
" #
2 −1
A= . (A.1.5)
−1 2

Then its eigenvalues and eigenvectors are


" # " #
−1 1
λ1 , λ2 = 3, 1 v1 , v2 = , . (A.1.6)
1 1

The action of A on the unit circle (S 1 ) results in the following figure.

Figure A.1: Action of A at x ∈ S 1 .

Find the area of the solid ellipse, the image of the unit disk by A.
Ans: π · |λ1 · λ2 | = π · 3 · 1 = 3π
230 Appendix A. Appendix

Singular Value Decomposition

Theorem A.5. (SVD Theorem). Let A ∈ Rm×n with m ≥ n. Then


A = U ΣV T, (A.1.7)

where U ∈ Rm×n and satisfies U T U = I, V ∈ Rn×n and satisfies V T V = I,


and Σ = diag(σ1 , σ2 , · · · , σn ), where
σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0. (A.1.8)

Remark A.6. The SVD


• The singular values are the square roots of the eigenvalues of AT A:
      σi = √λi ,    AT A vi = λi vi ,                        (A.1.9)

the right singular vectors V is the collection of eigenvectors of


AT A:
V = [v1 v2 · · · vn ], (A.1.10)
the left singular vectors U are the collection of uj ’s:
    uj = Avj /σj ,    σj ≠ 0,                                (A.1.11)

and the principal components are


AV = U Σ. (A.1.12)

• (Dyadic Decomposition) Given A = U ΣV T , the matrix A ∈ Rm×n
  can be expressed as
      A = Σ_{j=1}^{n} σj uj vj T .                           (A.1.13)

• (Data Compression) The matrix A can be approximated by Ak :

      A ≈ Ak := Σ_{j=1}^{k} σj uj vj T ,    k < n,           (A.1.14)

with error ||A − Ak ||2 = σk+1 .


• The SVD plays crucial roles in various applications, including re-
gression analysis and principal component analysis.
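An added NumPy illustration (not part of the original notes) of the dyadic decomposition and the error bound above, using the matrix of Example A.4:

import numpy as np

# SVD and rank-k approximation, see (A.1.13)-(A.1.14)
A = np.array([[ 2., -1.],
              [-1.,  2.]])
U, s, Vt = np.linalg.svd(A)           # A = U diag(s) V^T

print('singular values:', s)          # for this symmetric A: (3, 1)
A1 = s[0] * np.outer(U[:, 0], Vt[0])  # rank-1 approximation A_1
print('||A - A_1||_2 =', np.linalg.norm(A - A1, 2))   # equals sigma_2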
A.2. Eigenvalues and Eigenvectors of Stochastic Matrices 231

A.2. Eigenvalues and Eigenvectors of Stochas-


tic Matrices
Definition A.7. Probability Vector and Stochastic Matrix
 
• A vector p = (p1 , · · · , pn ) with nonnegative entries that add up to 1 is called
  a probability vector.
• A (left) stochastic matrix is a square matrix whose columns are
probability vectors.

A stochastic matrix is also called a probability matrix, transition ma-


trix, substitution matrix, or Markov matrix.

Lemma A.8. If p is a probability vector and T is a stochastic matrix,


then T p is a probability vector.

Proof. Let v1 , v2 , · · · , vn be the columns of T . Then


q := T p = p1 v1 + p2 v2 + · · · + pn vn ∈ Rn .
Clearly q has nonnegative entries; their sum reads
sum(q) = sum(p1 v1 + p2 v2 + · · · + pn vn ) = p1 + p2 + · · · + pn = 1.

Definition A.9. Markov Chain


In general, a finite Markov chain is a sequence of probability vectors
x0 , x1 , x2 , · · · , together with a stochastic matrix T , such that

x1 = T x0 , x2 = T x1 , x3 = T x2 , · · · (A.2.1)

We can rewrite the above conditions as a recurrence relation

xk+1 = T xk , k = 0, 1, 2, · · · (A.2.2)

The vector xk is often called a state vector.


232 Appendix A. Appendix

The Maximum of Eigenvalues

Definition A.10. For a vector x = [x1 , x2 , · · · , xn ]T ∈ Rn , the p-norm is


defined as
    ||x||p = ( |x1 |^p + |x2 |^p + · · · + |xn |^p )^{1/p} ,    p > 0.        (A.2.3)
For example,
    ||x||1 = |x1 | + |x2 | + · · · + |xn |                        (1-norm)
    ||x||2 = ( |x1 |² + |x2 |² + · · · + |xn |² )^{1/2}           (2-norm)    (A.2.4)

Theorem A.11. Let T ∈ Rn×n be a stochastic matrix. Then


||T x||1 ≤ ||x||1 , ∀ x ∈ Rn . (A.2.5)
" # " #
t11 t12 x1
Proof. Consider the case, n = 2: T = . Then, for x = ∈ R2 ,
t21 t22 x2

||T x||1 = |t11 x1 + t12 x2 | + |t21 x1 + t22 x2 |


≤ t11 |x1 | + t12 |x2 | + t21 |x1 | + t22 |x2 |
(A.2.6)
= (t11 + t21 )|x1 | + (t12 + t22 )|x2 |
= |x1 | + |x2 | = ||x||1 .

For general n ≥ 2, use the same argument to complete the proof.

Corollary A.12. Let T ∈ Rn×n be a stochastic matrix. Then every


eigenvalue of T is bounded by 1 in modulus. That is,

T v = λv ⇒ |λ| ≤ 1. (A.2.7)

Proof. Let T v = λv. Then it follows from Theorem A.11 that

||T v||1 = ||λv||1 = |λ| ||v||1 ≤ ||v||1 , (A.2.8)

which completes the proof.


A.2. Eigenvalues and Eigenvectors of Stochastic Matrices 233

The Eigenvalue 1 and Its Corresponding Eigenvector

Theorem A.13. Let T ∈ Rn×n be a stochastic matrix. Then the number


1 is an eigenvalue of T .

Proof. We prove the theorem in two different ways.

(a) Note that det (T T − λI) = det (T − λI), which implies that T and T T
have exactly the same eigenvalues. Consider the all-ones vector
1 = [1, 1, · · · , 1]T ∈ Rn . Then

T T 1 = 1,

which implies that the number 1 is an eigenvalue of T T and therefore


it is an eigenvalue of T .
(b) Construct T − λI for λ = 1:
 
t11 − 1 t12 ··· T1n
 
 t21 t22 − 1 · · · T2n 
T −I =  .. . ... ..
. (A.2.9)
 . 

tn1 t12 · · · T1n − 1

We apply replacement operations so that all rows are added to the


bottom row. Then the resulting bottom row must become a zero
row, which implies
det (T − I) = 0, (A.2.10)
and therefore the number 1 is an eigenvalue of T .

Definition A.14. The eigenvector v corresponding to the eigenvalue


1 is called a steady-state vector of T . It is also called a Perron-
Frobenius eigenvector or a stable equilibrium distribution.

The steady-state vector represents a long term behavior of a Markov


chain.
234 Appendix A. Appendix

The Steady-State Vector

Theorem A.15. If T is an n × n regular stochastic matrix, then T


has a unique steady-state vector q.
(a) The entries of q are strictly positive.
(b) The steady-state vector can be computed by the power method

q = lim T k x0 , (A.2.11)
k→∞

where x0 is a probability vector.

Example A.16. Find eigenvalues and corresponding eigenvectors of the


transition matrix
    T = [1/2 1/4 1/6; 1/3 1/2 1/3; 1/6 1/4 1/2].             (A.2.12)


Solution. stochastic_eigen.py
1 import numpy as np
2 np.set_printoptions(precision=4,suppress=True)
3

4 T = [[1/2,1/4,1/6],
5 [1/3,1/2,1/3],
6 [1/6,1/4,1/2]]
7 T = np.array(T)
8

9 D,V = np.linalg.eig(T)
10

11 print('Eigenvalues:'); print(D)
12 print('Eigenvectors:'); print(V)
13

14 print('----- steady-state vector ----')


15 v1 = V[:,0]; v1 /= sum(v1)
16 print('v1 = ',v1)
17

18 print('----- power method -----------')


19 x = np.array([1,0,0]);
20 print('k = %2d; '%(0),x)
21 for k in range(10):
22 x = T.dot(x);
23 print('k = %2d; '%(k+1),x)
A.2. Eigenvalues and Eigenvectors of Stochastic Matrices 235

Output
1 Eigenvalues:
2 [1. 0.3333 0.1667]
3 Eigenvectors:
4 [[-0.5145 -0.7071 0.4082]
5 [-0.686 -0. -0.8165]
6 [-0.5145 0.7071 0.4082]]
7 ----- steady-state vector ----
8 v1 = [0.3 0.4 0.3]
9 ----- power method -----------
10 k = 0; [1 0 0]
11 k = 1; [0.5 0.3333 0.1667]
12 k = 2; [0.3611 0.3889 0.25 ]
13 k = 3; [0.3194 0.3981 0.2824]
14 k = 4; [0.3063 0.3997 0.294 ]
15 k = 5; [0.3021 0.3999 0.298 ]
16 k = 6; [0.3007 0.4 0.2993]
17 k = 7; [0.3002 0.4 0.2998]
18 k = 8; [0.3001 0.4 0.2999]
19 k = 9; [0.3 0.4 0.3]
20 k = 10; [0.3 0.4 0.3]

Remark A.17. Some eigenvalues of a stochastic matrix can be negative.


For example, apply a row interchange operation to T in (A.2.12):
T[[1,2]] = T[[2,1]]

Then
• The resulting matrix is still a stochastic matrix.
• Its eigenvalues become
[ 1. 0.281 -0.1977].
236 Appendix A. Appendix
A PPENDIX C
Chapter Review

Sections selected for the review:


§1.7. Linear Independence, p.44
§1.9. The Matrix of A Linear Transformation, p.57
§2.3. Characterizations of Invertible Matrices, p.82
§2.8. Subspaces of Rn , p.97
§2.9. Dimension and Rank, p.103
§3.2. Properties of Determinants, p.115
§4.1. Vector Spaces and Subspaces, p.120
§5.3. Diagonalization, p.142
§5.9. Applications to Markov Chains, p.168
§6.3. Orthogonal Projections, p.196
§6.4. The Gram-Schmidt Process and QR Factorization, p.203
§6.6. Machine Learning: Regression Analysis, p.217

Contents of Chapter Reviews


C.1. Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
C.2. Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
C.3. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
C.4. Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
C.5. Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
C.6. Orthogonality and Least-Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257

237
238 Appendix C. Chapter Review

C.1. Linear Equations

§1.7. Linear Independence


Definition 1.50. A set of vectors {v1 , v2 , · · · , vp } in Rn is said to be
linearly independent, if the vector equation

x1 v1 + x2 v2 + · · · + xp vp = 0 (C.1.1)

has only the trivial solution (i.e., x1 = x2 = · · · = xp = 0). The set of


vectors {v1 , v2 , · · · , vp } is said to be linearly dependent, if there exist
weights c1 , c2 , · · · , cp , not all zero, such that

c1 v1 + c2 v2 + · · · + cp vp = 0. (C.1.2)

Remark 1.52. Let A = [v1 , v2 , · · · , vp ]. The matrix equation Ax = 0 is


equivalent to x1 v1 + x2 v2 + · · · + xp vp = 0.
1. Columns of A are linearly independent if and only if Ax = 0 has
only the trivial solution. ( ⇔ Ax = 0 has no free variable ⇔ Every
column in A is a pivot column.)
2. Columns of A are linearly dependent if and only if Ax = 0 has a
nontrivial solution. ( ⇔ Ax = 0 has at least one free variable ⇔ A
has at least one non-pivot column.)

Example C.1. Determine if the vectors are linearly independent.


     
    (−1, 1, 3),    (2, 0, −8),    (−1, 3, 1)
Solution.
C.1. Linear Equations 239

§1.9. The Matrix of A Linear Transformation


Theorem 1.75. Let T : Rn → Rm be a linear transformation. Then there
exists a unique matrix A ∈ Rm×n such that

T (x) = Ax, for all x ∈ Rn .

In fact, with ej denoting the j-th standard unit vector in Rn ,

A = [T (e1 ) T (e2 ) · · · T (en )] . (C.1.3)

The matrix A is called the standard matrix of the transformation.

Note: Standard unit vectors in Rn & the standard matrix:


     
    e1 = (1, 0, · · · , 0),    e2 = (0, 1, 0, · · · , 0),    · · · ,    en = (0, · · · , 0, 1).    (C.1.4)

Any x ∈ Rn can be written as

    x = (x1 , x2 , · · · , xn ) = x1 e1 + x2 e2 + · · · + xn en .

Thus
T (x) = T (x1 e1 + x2 e2 + · · · + xn en )
= x1 T (e1 ) + x2 T (e2 ) + · · · + xn T (en ) (C.1.5)
= [T (e1 ) T (e2 ) · · · T (en )] x,
and therefore the standard matrix reads
A = [T (e1 ) T (e2 ) · · · T (en )] . (C.1.6)
240 Appendix C. Chapter Review

Example C.2. Write the standard matrix for the linear transformation
T : R2 → R4 given by
T (x1 , x2 ) = (x1 + 4x2 , 5x1 , −3x2 , x1 − x2 ).

Solution.

Theorem 1.81. Let T : Rn → Rm be a linear transformation with the


standard matrix A. Then,
(a) T maps Rn onto Rm if and only if the columns of A span Rm .
( ⇔ every row of A has a pivot position
⇔ Ax = b has a solution for all b ∈ Rm )
(b) T is one-to-one if and only if the columns of A are linearly indepen-
dent.
( ⇔ every column of A is a pivot column
⇔ Ax = 0 has “only" the trivial solution)

Example C.3. Let T : R4 → R3 be the linear transformation whose stan-


dard matrix is
    A = [1 −4 0 1; 0 2 −1 3; 0 0 0 −1].
Is T onto? Is T one-to-one?
Solution.

Ans: onto, but not one-to-one


C.2. Matrix Algebra 241

C.2. Matrix Algebra


§2.3. Characterizations of Invertible Matrices
Theorem 2.25. (Invertible Matrix Theorem)
Let A be an n × n matrix. Then the following are equivalent.
a. A is an invertible matrix. (Def: There is B s.t. AB = BA = I)
b. A is row equivalent to the n × n identity matrix.
c. A has n pivot positions.
d. The equation Ax = 0 has only the trivial solution x = 0.
e. The columns of A are linearly independent.
f. The linear transformation x 7→ Ax is one-to-one.
g. The equation Ax = b has unique solution for each b ∈ Rn .
h. The columns of A span Rn .
i. The linear transformation x 7→ Ax maps Rn onto Rn .
j. There is a matrix C ∈ Rn×n such that CA = I
k. There is a matrix D ∈ Rn×n such that AD = I
l. AT is invertible and (AT )−1 = (A−1 )T .

Theorem 2.74 (Invertible Matrix Theorem); §2.9


m. The columns of A form a basis of Rn
n. Col A = Rn
o. dim Col A = n
p. rank A = n
q. Nul A = {0}
r. dim Nul A = 0

Theorem 5.17 (Invertible Matrix Theorem); §5.2


s. The number 0 is not an eigenvalue of A.
t. det A ≠ 0
242 Appendix C. Chapter Review

Example C.4. An n × n upper triangular matrix is one whose entries


below the main diagonal are zeros. When is a square upper triangular ma-
trix invertible?

Theorem 2.29 (Invertible linear transformations)


1. A linear transformation T : Rn → Rn is said to be invertible if
there exists S : Rn → Rn such that S ◦ T (x) = T ◦ S(x) = x for all
x ∈ Rn . In this case, S = T −1 .
2. Also, if A is the standard matrix for T , then A−1 is the standard
matrix for T −1 .

Example C.5. Let T : R2 → R2 be a linear transformation such that


" # " #
x1 −5x1 + 9x2
T = . Find a formula for T −1 .
x2 4x1 − 7x2
Solution.
C.2. Matrix Algebra 243

§2.8. Subspaces of Rn
Definition 2.47. A subspace of Rn is any set H in Rn that has three
properties:
a) The zero vector is in H.
b) For each u and v in H, the sum u + v is in H.
c) For each u in H and each scalar c, the vector cu is in H.
That is, H is closed under linear combinations.

Definition 2.49. Let A be an m × n matrix. The column space of A


is the set Col A of all linear combinations of columns of A. That is, if
A = [a1 a2 · · · an ], then

Col A = {u | u = c1 a1 + c2 a2 + · · · + cn an }, (C.2.1)

where c1 , c2 , · · · , cn are scalars. Col A is a subspace of Rm .

Definition 2.51. Let A be an m × n matrix. The null space of A, Nul A,


is the set of all solutions of the homogeneous system Ax = 0.

Theorem 2.52. Nul A is a subspace of Rn .


   
Example C.6. Let A = [1 −2 −3; 2 4 2; −3 5 6] and b = (2, −4, −7). Determine whether
b is in the column space of A, Col A.
Solution. Clue: 1 b ∈ Col A
⇔ 2 b is a linear combination of columns of A
⇔ 3 Ax = b is consistent
⇔ 4 [A b] has a solution
244 Appendix C. Chapter Review

Definition 2.53. A basis for a subspace H in Rn is a set of vectors that


1. is linearly independent, and
2. spans H.

Theorem 2.56. Basis for Nul A can be obtained from the parametric
vector form of solutions of Ax = 0. That is, suppose that the solutions of
Ax = 0 reads
x = x1 u1 + x2 u2 + · · · + xk uk ,
where x1 , x2 , · · · , xk correspond to free variables. Then, a basis for Nul A
is {u1 , u2 , · · · , uk }.

Theorem 2.58. In general, non-pivot columns are linear combinations


of pivot columns. Thus the pivot columns of a matrix A form a basis
for Col A.

Example C.7. Matrix A and its echelon form is given. Find a basis for
Col A and a basis forNul 
A. 
3 −6 9 0 1 −2 3 0
A = 2 −4 7 2 ∼ 0 0 1 2
   

3 −6 6 −6 0 0 0 0
Solution.

Ans: BCol A = {a1 , a3 }, BNul A = {[2, 1, 0, 0]T , [6, 0, −2, 1]T }.


C.2. Matrix Algebra 245

§2.9. Dimension and Rank


Definition 2.64. Suppose the set B = {b1 , b2 , · · · , bp } is a basis for a
subspace H. For each x ∈ H, the coordinates of x relative to the ba-
sis B are the weights c1 , c2 , · · · , cp such that x = c1 b1 + c2 b2 + · · · + cp bp ,
and the vector
    [x]B = (c1 , · · · , cp ) ∈ Rp
is called the coordinate vector of x (relative to B) or the B-coordinate vector of x.

Self-study C.8. Let v1 = (3, 1, −2), v2 = (−2, 2, 1), x = (2, 6, −2), and B = {v1 , v2 }.
Then B is a basis for H = Span{v1 , v2 }, because v1 and v2 are linearly
independent. Determine if x is in H, and if it is, find the coordinate vector
of x relative to B.
Solution.
246 Appendix C. Chapter Review

Theorem 2.70. (Rank Theorem) Let A ∈ Rm×n . Then

dim Col A + dim Nul A = rank A + nullity A = n


= (the number of columns in A)

Here, “dim Nul A” is called the nullity of A: nullity A

Theorem 2.73. (The Basis Theorem)


Let H be a p-dimensional subspace of Rn . Then
a) Any linearly independent set of exactly p elements in H is automat-
ically a basis for H
b) Any set of p elements of H that spans H is automatically a basis for
H.

Example C.9. Find a basis for the subspace spanned by the given vectors.
What is the dimension of the subspace?
    (1, −1, −2, 3),    (2, −3, −1, 4),    (−3, 5, 0, −5),    (−4, 6, 2, −8)
Solution.
C.3. Determinants 247

C.3. Determinants
§3.2. Properties of Determinants
Definition 3.1. Let A be an n × n square matrix. Then determinant is
a scalar value denoted by det A or |A|.
1) Let A = [a] ∈ R1 × 1 . Then det A = a.
" #
a b
2) Let A = ∈ R2 × 2 . Then det A = ad − bc.
c d

Definition 3.3. Let Aij be the submatrix of A obtained by deleting row i


and column j of A. Then the (i, j)-cofactor of A = [aij ] is the scalar Cij ,
given by
Cij = (−1)i+j det Aij . (C.3.1)

Definition 3.4. For n ≥ 2, the determinant of an n × n matrix A = [aij ]


is given by the following formulas:
1. The cofactor expansion across the first row:

det A = a11 C11 + a12 C12 + · · · + a1n C1n (C.3.2)

2. The cofactor expansion across the row i:

det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin (C.3.3)

3. The cofactor expansion down the column j:

det A = a1j C1j + a2j C2j + · · · + anj Cnj (C.3.4)

Note: The determinant can be viewed as a volume scaling factor.


248 Appendix C. Chapter Review

Theorem 3.9. Let A be an n × n square matrix.


a) (Replacement): If B is obtained from A by a row replacement, then
det B = det A.
" # " #
1 3 1 3
A= , B=
2 1 0 −5

b) (Interchange): If two rows of A are interchanged to form B, then


det B = −det A.
" # " #
1 3 2 1
A= , B=
2 1 1 3

c) (Scaling): If one row of A is multiplied by k (6= 0), then


det B = k · det A.
" # " #
1 3 1 3
A= , B=
2 1 −4 −2

 
Example C.10. Compute det A, where A = [1 −4 2; −1 7 0; −2 8 −9], after applying
a couple of steps of replacement operations.
C.3. Determinants 249

Claim 3.12. Let A and B be n × n matrices.


a) det AT = det A.
b) det (AB) = det A · det B.
c) If A is invertible, then det A−1 = 1/det A.    (∵ det In = 1.)

Example C.11. Find the determinant of A2 , when


 
    A = [1 0 0 3; 0 3 2 0; −1 0 4 −2; 1 0 0 4]
Solution.

Ans: det A = 12 ⇒ det (A2 ) = (det A)2 = (12)2 = 144.


250 Appendix C. Chapter Review

C.4. Vector Spaces

§4.1. Vector Spaces and Subspaces


Definition 4.1. A vector space is a nonempty set V of objects, called
vectors, on which are defined two operations, called addition and
multiplication by scalars (real numbers), subject to the ten axioms
(or rules) listed below. The axioms must hold for all vectors u, v, w ∈ V
and for all scalars c and d.
1. u + v ∈ V
2. u + v = v + u
3. (u + v) + w = u + (v + w)
4. There is a zero vector 0 ∈ V such that u + 0 = u
5. For each u ∈ V , there is a vector −u ∈ V such that u + (−u) = 0
6. cu ∈ V
7. c(u + v) = cu + cv
8. (c + d)u = cu + du
9. c(du) = (cd)u
10. 1u = u

Definition 4.3. A subspace of a vector space V is a subset H of V that


has three properties:
a) 0 ∈ H, where 0 is the zero vector of V
b) H is closed under vector addition: for each u, v ∈ H, u + v ∈ H
c) H is closed under scalar multiplication: for each u ∈ H and each
scalar c, cu ∈ H

Theorem 4.7. If v1 , v2 , · · · , vp are in a vector space V , then


Span{v1 , v2 , · · · , vp } is a subspace of V .
C.4. Vector Spaces 251

Example C.12. Let H = {(a − b, 3b − a, a + b, b) | a, b ∈ R}. Show that H is


a subspace of R4 .
Solution.

Example C.13. Determine if the given set is a subspace of Pn for an ap-


propriate value of n.
    a) {at² | a ∈ R}                  b) {p ∈ P3 with integer coefficients}
    c) {a + t² | a ∈ R}               d) {p ∈ Pn | p(0) = 0}

Solution.

Ans: a) Yes, b) No, c) No, d) Yes


Self-study C.14. Let H and K be subspaces of V . Define the sum of H
and K as
H + K = {u + v | u ∈ H, v ∈ K}.
Prove that H + K is a subspace of V .
Solution.
252 Appendix C. Chapter Review

C.5. Eigenvalues and Eigenvectors

§5.3. Diagonalization
Definition 5.25. An n × n matrix A is said to be diagonalizable if there
exists an invertible matrix P and a diagonal matrix D such that

A = P DP −1 (or P −1 AP = D) (C.5.1)

Theorem 5.28. (The Diagonalization Theorem)


1. An n × n matrix A is diagonalizable if and only if A has n linearly
independent eigenvectors v1 , v2 , · · · , vn .
2. In fact, A = P DP −1 if and only if columns of P are n linearly inde-
pendent eigenvectors of A. In this case, the diagonal entries of D are
the corresponding eigenvalues of A. That is,
P = [v1 v2 · · · vn ],
 
D = diag(λ1 , λ2 , · · · , λn ) = [λ1 0 · · · 0; 0 λ2 · · · 0; · · · ; 0 0 · · · λn ],    (C.5.2)

where Avk = λk vk , k = 1, 2, · · · , n.
C.5. Eigenvalues and Eigenvectors 253

The Diagonalization Theorem can be proved using the following remark.

Remark 5.29. AP = P D with D Diagonal


Let P = [v1 v2 · · · vn ] and D = diag(λ1 , λ2 , · · · , λn ) be arbitrary n × n
matrices. Then,

AP = A[v1 v2 · · · vn ] = [Av1 Av2 · · · Avn ], (C.5.3)

while
 
    P D = [v1 v2 · · · vn ] diag(λ1 , λ2 , · · · , λn ) = [λ1 v1 λ2 v2 · · · λn vn ].    (C.5.4)

If AP = P D with D diagonal, then the nonzero columns of P are


eigenvectors of A.

Self-study C.15. Diagonalize the following matrix, if possible.


 
    B = [2 4 3; −4 −6 −3; 3 3 1],

for which det (B − λI) = −(λ − 1)(λ + 2)² .


Solution.
254 Appendix C. Chapter Review

§5.9. Applications to Markov Chains


Definition 5.60. Probability Vector and Stochastic Matrix
 
• A vector p = (p1 , · · · , pn ) with nonnegative entries that add up to 1 is called
  a probability vector.
• A (left) stochastic matrix is a square matrix whose columns are
probability vectors.

A stochastic matrix is also called a probability matrix, transition ma-


trix, substitution matrix, or Markov matrix.

Lemma 5.61. Let T be a stochastic matrix. If p is a probability vector,


then so is q = T p.

Proof. Let v1 , v2 , · · · , vn be the columns of T . Then

q = T p = p 1 v 1 + p2 v 2 + · · · pn v n .

Clearly q has nonnegative entries; their sum reads

sum(q) = sum(p1 v1 + p2 v2 + · · · pn vn ) = p1 + p2 + · · · + pn = 1.

Definition 5.62. Markov Chain


In general, a finite Markov chain is a sequence of probability vectors
x0 , x1 , x2 , · · · , together with a stochastic matrix T , such that

x1 = T x0 , x2 = T x1 , x3 = T x2 , · · · (C.5.5)

We can rewrite the above conditions as a recurrence relation

xk+1 = T xk , k = 0, 1, 2, · · · (C.5.6)

The vector xk is often called a state vector.


C.5. Eigenvalues and Eigenvectors 255

Steady-State Vectors

Definition 5.66. If T is a stochastic matrix, then a steady-state vector


for T is a probability vector q such that

T q = q. (C.5.7)

Note: The steady-state vector q can be seen as an eigenvector of T , of


which the corresponding eigenvalue λ = 1.

Strategy 5.67. How to Find a Steady-State Vector


(a) First, solve for x = [x1 , x2 , · · · , xn ]T :

T x = x ⇔ T x − x = 0 ⇔ (T − I)x = 0. (C.5.8)

(b) Then, set
        q = (1/(x1 + x2 + · · · + xn )) x.                   (C.5.9)
" #
0.4 0.3
Example C.16. Let T = . Find a steady-state vector for T .
0.6 0.7
256 Appendix C. Chapter Review

Definition 5.69. A stochastic matrix T is regular if some matrix power


T k contains only strictly positive entries.

Theorem 5.72. If T is an n × n regular stochastic matrix, then T has a


unique steady-state vector q.
(a) The entries of q are strictly positive.
(b) The steady-state vector

q = lim T k x0 , (C.5.10)
k→∞

for any initial probability vector x0 .

Remark 5.73. Let T ∈ Rn×n be a regular stochastic matrix. Then


• If T v = λv, then |λ| ≤ 1.
(The above is true for every stochastic matrix; see § A.2.)
• Every column of T k converges to q as k → ∞, i.e.,

T k → [q q · · · q] ∈ Rn×n , as k → ∞. (C.5.11)
" #
0 0.5
Example C.17. Let T = .
1 0.5

(a) Is T regular?
(b) What is the first column of lim T k ?
k→∞
C.6. Orthogonality and Least-Squares 257

C.6. Orthogonality and Least-Squares

§6.3. Orthogonal Projections


Theorem 6.36. (The Orthogonal Decomposition Theorem)
Let W be a subspace of Rn . Then each y ∈ Rn can be written uniquely in
the form
    y = ŷ + z,                                               (C.6.1)
where ŷ ∈ W and z ∈ W ⊥ . In fact, if {u1 , u2 , · · · , up } is an orthogonal
basis for W , then
    ŷ = projW y = ((y•u1)/(u1 •u1)) u1 + ((y•u2)/(u2 •u2)) u2 + · · · + ((y•up)/(up •up)) up ,
    z = y − ŷ.                                               (C.6.2)

Remark 6.38. (Properties of Orthogonal Decomposition)


Let y = y b ∈ W and z ∈ W ⊥ . Then
b + z, where y
1. y
b is called the orthogonal projection of y onto W (= projW y)
2. y
b is the closest point to y in W .
(in the sense ky − y
bk ≤ ky − vk, for all v ∈ W )
3. y
b is called the best approximation to y by elements of W .
4. If y ∈ W , then projW y = y.

Example C.18. Find the distance from y to the plane in R3 spanned by u1


and u2.     
5 −3 −3
y = −9, u1 = −5, u2 =  2
     

5 1 1
258 Appendix C. Chapter Review

Theorem 6.42. If {u1 , u2 , · · · , up } is an orthonormal basis for a sub-


space W of Rn , then

projW y = (y•u1 ) u1 + (y•u2 ) u2 + · · · + (y•up ) up . (C.6.3)

If U = [u1 u2 · · · up ], then

projW y = U U T y, for all y ∈ Rn . (C.6.4)

The orthogonal projection can be viewed as a matrix transformation.


" # " #
7 3
Example C.19. Let y = ,v= , and W = Span{v}.
9 4

(a) Find the projection matrix U U T .


y•v
(b) Compute projW y = v and U U T y.
v·v
(c) Find the distance from y to the subspace W .
Solution.
C.6. Orthogonality and Least-Squares 259

§6.4. The Gram-Schmidt Process and QR Factorization


The Gram-Schmidt process is an algorithm to produce an orthogonal
or orthonormal basis for any nonzero subspace of Rn .
Theorem 6.47. (The Gram-Schmidt Process) Given a basis
{x1 , x2 , · · · , xp } for a nonzero subspace W of Rn , define

v1 = x1
v2 = x2 − ((x2 •v1)/(v1 •v1)) v1
v3 = x3 − ((x3 •v1)/(v1 •v1)) v1 − ((x3 •v2)/(v2 •v2)) v2                         (C.6.5)
 ..
  .
vp = xp − ((xp •v1)/(v1 •v1)) v1 − ((xp •v2)/(v2 •v2)) v2 − · · · − ((xp •vp−1)/(vp−1 •vp−1)) vp−1
Then {v1 , v2 , · · · , vp } is an orthogonal basis for W . In addition,

Span{x1 , x2 , · · · , xk } = Span{v1 , v2 , · · · , vk }, for 1 ≤ k ≤ p. (C.6.6)

Remark 6.48. For the result of the Gram-Schmidt process, define


    uk = vk / ‖vk ‖,    for 1 ≤ k ≤ p.                       (C.6.7)
Then {u1 , u2 , · · · , up } is an orthonormal basis for W . In practice, it is
often implemented with the normalized Gram-Schmidt process.

Example C.20. Find an orthogonal basis for W = Span{x1 , x2 } and projW y,


when
    x1 = (1, 0, −1),    x2 = (−2, 2, 1),    and    y = (0, 1, −1).
260 Appendix C. Chapter Review

Algorithm 6.51. (QR Factorization) Let A = [x1 x2 · · · xn ].


• Apply the Gram-Schmidt process to obtain an orthonormal basis
{u1 , u2 , · · · , un }.
• Then
x1 = (u1 •x1 )u1
x2 = (u1 •x2 )u1 + (u2 •x2 )u2
x3 = (u1 •x3 )u1 + (u2 •x3 )u2 + (u3 •x3 )u3 (C.6.8)
..
.
xn = Σ_{j=1}^{n} (uj •xn ) uj .

• Thus
A = [x1 x2 · · · xn ] = QR (C.6.9)
implies that

Q = [u1 u2 · · · un ],

    [ u1 •x1   u1 •x2   u1 •x3   · · ·   u1 •xn ]
    [   0      u2 •x2   u2 •x3   · · ·   u2 •xn ]
R = [   0        0      u3 •x3   · · ·   u3 •xn ]  =  Q T A.               (C.6.10)
    [   :        :        :       . .      :    ]
    [   0        0        0      · · ·   un •xn ]

• In practice, the coefficients rij = ui •xj , i < j, can be saved during


the (normalized) Gram-Schmidt process.
" #
4 −1
Self-study C.21. Find the QR factorization for A = .
3 2
Solution.

" # " #
0.8 −0.6 5 0.4
Ans: Q = R=
0.6 0.8 0 2.2
C.6. Orthogonality and Least-Squares 261

§6.6. Machine Learning: Regression Analysis


Definition 6.54. Let A be an m × n matrix and b ∈ Rm .
A least-squares (LS) solution of Ax = b is an x̂ ∈ Rn such that

    ‖b − Ax̂‖ ≤ ‖b − Ax‖,    for all x ∈ Rn .                (C.6.11)

Remark 6.55. Geometric Interpretation of the LS Problem


• For all x ∈ Rn , Ax will necessarily be in Col A, a subspace of Rm .
– So we seek an x that makes Ax the closest point in Col A to b.

• Let b̂ = projCol A b. Then Ax = b̂ has a solution and there is an
  x̂ ∈ Rn such that
      Ax̂ = b̂.                                               (C.6.12)

• x̂ is an LS solution of Ax = b.
• The quantity ‖b − b̂‖² = ‖b − Ax̂‖² is called the least-squares error.
262 Appendix C. Chapter Review

The Method of Normal Equations

Theorem 6.56. The set of LS solutions of Ax = b coincides with the


nonempty set of solutions of the normal equations

AT Ax = AT b. (C.6.13)

Theorem 6.59. Let A be an m × n matrix. The following statements are


logically equivalent:
a. The equation Ax = b has a unique LS solution for each b ∈ Rm .
b. The columns of A are linearly independent.
c. The matrix AT A is invertible.
When these statements are true, the unique LS solution x̂ is given by

    x̂ = (AT A)−1 AT b.                                       (C.6.14)

Regression Line

Definition 6.65. Suppose a set of experimental data points are given as

(x1 , y1 ), (x2 , y2 ), · · · , (xm , ym )

such that the graph is close to a line. We determine a line

y = β0 + β1 x (C.6.15)

that is as close as possible to the given points. This line is called the
least-squares line; it is also called regression line of y on x and β0 , β1
are called regression coefficients.
C.6. Orthogonality and Least-Squares 263

Calculation of Least-Squares Lines


Remark 6.66. Consider a least-squares (LS) model of the form
y = β0 + β1 x, for a given data set {(xi , yi ) | i = 1, 2, · · · , m}.
• Then
Predicted y-value Observed y-value

β0 + β1 x1 = y1
β0 + β1 x2 = y2 (C.6.16)
.. ..
. .
β0 + β1 xm = ym

• It can be equivalently written as


Xβ = y, (C.6.17)

where
    X = [1 x1 ; 1 x2 ; · · · ; 1 xm ],    β = (β0 , β1 ),    y = (y1 , y2 , · · · , ym ).
Here we call X the design matrix, β the parameter vector, and y
the observation vector.
• (Method of Normal Equations) Thus the LS solution can be deter-
mined as
X T Xβ = X T y ⇒ β = (X T X)−1 X T y, (C.6.18)
provided that X T X is invertible.

Self-study C.22. Find the equation y = β0 + β1 x of least-squares line that


best fits the given points:
(−1, 1), (0, 1), (1, 2), (2, 3)
Solution.
264 Appendix C. Chapter Review

Further Applications
Example C.23. Find an LS curve of the form y = a cos x + b sin x that best
fits the given points:
(0, 1), (π/2, 1), (π, −1).
Solution.

Nonlinear Models: Linearization


Strategy 6.74. For nonlinear models, change of variables can be ap-
plied for a linear model.

Model                    Change of Variables           Linearization

y = A + B/x              x̃ = 1/x,   ỹ = y          ⇒   ỹ = A + B x̃
y = 1/(A + Bx)           x̃ = x,     ỹ = 1/y        ⇒   ỹ = A + B x̃              (C.6.19)
y = C e^{Dx}             x̃ = x,     ỹ = ln y       ⇒   ỹ = ln C + D x̃

The Idea: Transform the nonlinear model to produce a linear system.

Self-study C.24. Find an LS curve of the form y = CeDx that best fits the
given points:
(0, e), (1, e3 ), (2, e5 ).
Solution.
     x    y      x̃    ỹ = ln y
     0    e      0    1
     1    e³     1    3         ⇒    X = [1 0; 1 1; 1 2],    y = (1, 3, 5)
     2    e⁵     2    5

                                          Ans: y = e · e^{2x} = e^{2x+1}


A PPENDIX P
Projects

Finally we add projects.

Contents of Projects
P.1. Project Regression Analysis: Linear, Piecewise Linear, and Nonlinear Models . . . . . 266

265
266 Appendix P. Projects

P.1. Project Regression Analysis: Linear, Piece-


wise Linear, and Nonlinear Models

Regression analysis is a set of statistical processes for estimating the


relationships between independent variables and dependent variables.
• Regression analysis is a way to find trends in data.
• There are variations: linear, multiple linear, and nonlinear.
– The most common models are simple linear and multiple linear.
– Nonlinear regression analysis is commonly used when the
dataset shows a nonlinear relationship.
• Choosing an appropriate regression model is often a difficult task.
In this project: we’ll try to find best regression models, for
given datasets.

Strategy P.1. Determination of the Best Model


Suppose we are given a dataset as in the figure.
1. We may plot the dataset for a visual inspection.
2. One or more good models can be selected.
3. Then, the best model is determined through analysis.

Figure P.1: A test dataset of 100 points {(xi , yi ) | i = 1, 2, · · · , 100}.

It looks like a quadratic polynomial!


P.1. Project Regression Analysis: Linear, Piecewise Linear, and Nonlinear Models 267

Polynomial Fitting: A Review

For the dataset {(xi , yi ) | i = 1, 2, · · · , m} in Figure P.1, consider a regres-


sion model of the form
y = a0 + a1 x + a2 x 2 .

• Then
Predicted y-value Observed y-value

a0 + a1 x1 + a2 x21 = y1
a0 + a1 x2 + a2 x22 = y2 (P.1.1)
.. ..
. .
a0 + a1 xm + a2 x2m = ym

• It is equivalently written as
Xp = y, (P.1.2)

where
    X = [1 x1 x1²; 1 x2 x2²; · · · ; 1 xm xm²],    p = (a0 , a1 , a2 ),    y = (y1 , y2 , · · · , ym ).

• The system can be solved using the method of normal equations:


   
             [ Σ1     Σxi    Σxi²  ]        [ Σyi     ]
    X T Xp = [ Σxi    Σxi²   Σxi³  ]  p  =  [ Σxi yi  ]  =  X T y            (P.1.3)
             [ Σxi²   Σxi³   Σxi⁴  ]        [ Σxi² yi ]

Note: The above polynomial fitting is well implemented in most pro-


gramming languages.
Matlab: p = polyfit(x,y,deg);
Python: p = np.polyfit(x,y,deg)
where x and y are arrays of x- and y-coordinates, respectively, and deg
denotes the degree of the regression polynomial. We will use it!
268 Appendix P. Projects

test_data_100.m
1 close all; clear all
2

3 DATA = readmatrix('test-data-100.txt');
4 x = DATA(:,1); y = DATA(:,2);
5 p = polyfit(x,y,2);
6 yhat = polyval(p,x); % predicted y-values
7 LS_error = norm(y-yhat)^2/length(y); % variance
8

9 %---------------------------------------------------
10 fprintf('LS_error= %.3f; p=',LS_error); disp(p)
11 % Output: LS_error= 0.130; p= 0.3944 -0.6824 0.3577
12

13 %---------------------------------------------------
14 x1 = linspace(min(x),max(x),100);
15 y1 = polyval(p,x1); % regression curve
16

17 figure, plot(x,y,'k.','MarkerSize',8); hold on


18 xlim([-1,5]); ylim([-1,5]);
19 xlabel('x','fontsize',15); ylabel('y','fontsize',14);
20 title('test-data-100: Regression','fontsize',14);
21 plot(x1,y1,'r-','linewidth',2)
22 exportgraphics(gcf,'test-data-100-regression.png','Resolution',100);

Figure P.2: test-data-100-regression.png: y = 0.3944x² − 0.6824x + 0.3577.


P.1. Project Regression Analysis: Linear, Piecewise Linear, and Nonlinear Models 269

Nonlinear Regression
Example P.2. See the dataset {(xi , yi )} shown
in the figure. When we try to find the best fit-
ting model of the form
y = c edx , (P.1.4)

the corresponding nonlinear least-squares


problem reads
    min_{c,d} Σ_{i=1}^{m} ( yi − c e^{d xi} )² .             (P.1.5)

The problem can be solved by applying a nonlinear iterative solver such as


the Newton’s method with a good initialization (c0 , d0 ).

We can solve it much more easily through linearization.


Linearization by Change of Variables
y = c edx . (P.1.6)

The Goal: find the best-fitting (c, d) for the dataset.


• (Transform) Apply the logarithmic function to have
ln y = ln(c edx ) = ln c + ln edx = dx + ln c. (P.1.7)

• (Change of Variables) Define


X = x; Y = ln y; a = ln c. (P.1.8)

• (Linear Model) Then the model in (P.1.7) reads


Y = d X + a, (P.1.9)

which is a linear model; we can get best (d, a), by polyfit(X,Y,1).


• Finally, we recover $c = e^{a}$.

nonlinear_regression.m
1 close all; clear all
2

3 FILE = 'seemingly-exp-data.txt';
4 DATA = readmatrix(FILE);
5 % fitting: y =c*exp(d*x) ---> ln(y) = d*x + ln(c)
6 %---------------------------------------------------
7

8 x = DATA(:,1); y = DATA(:,2);
9 lny = log(y); % data transform
10 p = polyfit(x,lny,1); % [p1,p2] = [d, ln(c)]
11 d = p(1); c = exp(p(2));
12

13 LS_error = norm(y-c*exp(d*x))^2/length(y); % variance


14

15 %---------------------------------------------------
16 fprintf('c=%.3f; d=%.3f; LS_error=%.3f\n',c,d,LS_error)
17 % Output: c=1.346; d=0.669; LS_error=4.212
18

19 % figure

Figure P.3: The data and a nonlinear regression via linearization: y=1.346*exp(0.669*x).

Note: If you want to test nonlinear_regression.m, you may download


https://skim.math.msstate.edu/LectureNotes/data/nonlinear_regression.m
https://skim.math.msstate.edu/LectureNotes/data/seemingly-exp-data.txt
The dataset consists of 200 points, generated by y=1.3*exp(0.7*x) with
random positioning and random noise.
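As a sanity check, the nonlinear problem (P.1.5) can also be attacked directly with a generic minimizer and compared against the linearized fit. The following is only a sketch: it uses Matlab's fminsearch (instead of the Newton iteration mentioned in Example P.2) and assumes x, y, c, d are still in the workspace from nonlinear_regression.m, with the linearized solution serving as the starting guess.

    % a minimal sketch (not one of the project codes)
    F = @(q) norm(y - q(1)*exp(q(2)*x))^2;   % nonlinear LS objective, q = [c; d]
    q = fminsearch(F, [c; d]);               % start from the linearized (c, d)
    fprintf('direct fit: c=%.3f; d=%.3f; LS_error=%.3f\n', q(1), q(2), F(q)/length(y))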

Finding the Best Regression Model

Example P.3. Consider a simple


dataset: 10 points generated from a
sine function, with noise.
Wanted: Find the best regression
model for the dataset:
• Let’s select a model from Pn ,
polynomials of degree ≤ n

Sine_Noisy_Data_Regression.m
1 close all; clear all
2

3 a=0; b=1; m=10;


4 f = @(t) sin(2*pi*t);
5 DATAFILE = 'sine-noisy-data.txt';
6 renew_data = 0;
7

8 %%-----------------------------------------------
9 if isfile(DATAFILE) && renew_data == 0
10 DATA = readmatrix(DATAFILE); % np.loadtxt()
11 else
12 X = linspace(a,b,m); Y0 = f(X);
13 noise = rand([1,m]); noise = noise-mean(noise(:));
14 Y = Y0 + noise; DATA = [X',Y'];
15 writematrix(DATA,DATAFILE); % np.savetxt()
16 end
17

18 %%-----------------------------------------------
19 x = linspace(a,b,101); y = f(x);
20 x1 = DATA(:,1); y1 = DATA(:,2);
21 E = zeros(1,m);
22 for n = 0:m-1
23 p = polyfit(x1,y1,n); % np.polyfit()
24 yhat = polyval(p,x1); % np.polyval()
25 E(n+1) = norm(y1-yhat,2)^2;
26 %savefigure(x,y,x1,y1,polyval(p,x),n)
27 end
28

29 % figure

Which One is the Best?

Figure P.4: Regression models Pn , n = 0, 1, · · · , 9.

Strategy P.4. Given several models with similar explanatory ability,


the simplest is most likely to be the best choice.
• Start simple, and only make the model more complex as needed.

The LS Error
Given the dataset $\{(x_i, y_i) \mid i = 1, 2, \cdots, m\}$ and the model $P_n$, define the LS-error
$$ E_n = \sum_{i=1}^{m} \big( y_i - P_n(x_i) \big)^2, \quad (m = 10), \qquad (P.1.10) $$
which, divided by m, gives the mean square error.

Figure P.5: The best choice is P3 , the third-order polynomial.
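For the project's verification step, it helps to see these errors as numbers as well as a plot. A minimal sketch, assuming the array E and the value m from Sine_Noisy_Data_Regression.m are still in the workspace:

    % a minimal sketch (not one of the project codes)
    fprintf('  n        E_n\n');
    for n = 0:m-1
        fprintf('%3d  %10.4f\n', n, E(n+1));   % LS-error of the degree-n fit
    end

Note that E_n is non-increasing in n (the polynomial spaces are nested), so the smallest raw error alone does not identify the best model; that is the point of Strategy P.4.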

Summary P.5. Let’s summarize what we have done.


• Review: method of normal equations
• Example: polynomial fitting
• Example: nonlinear regression & its linearization
• Strategy: determination of the best model

Project Objective: To find the best model for each of the datasets:

What to Do
First download two datasets:
https://skim.math.msstate.edu/LectureNotes/data/regression-test-data-01.txt
https://skim.math.msstate.edu/LectureNotes/data/regression-test-data-02.txt

1. Finding Best Models


(a) For regression-test-data-01.txt, which model is better,
y = a0 + a1*x + a2*x.^2 or y = c*exp(d*x) ?
(b) For regression-test-data-02.txt (Data-02), which order of poly-
nomial fits best? Your claim must be supported pictorially
as in Figure P.4.
2. Verification: For all models, measure the LS-errors. Show them
in tabular form.
3. Figuring: For Data-02, display the LS-errors as in Figure P.5.
4. Extra Credit: Find a piecewise regression model for Data-02.
Is it better than polynomial models? (You must verify your answer.)

Note: You may use parts of the codes shown in this project. Report your
code, numerical outputs, and figures, with a summary.
• Code by itself will not give you any credit; include outputs or figures.
• The summary will be worth 20% of the full credit.
• Include everything in a single file, in pdf or doc/docx format.
Bibliography
[1] D. LAY, S. LAY, AND J. MCDONALD, Linear Algebra and Its Applications, 6th Ed.,
Pearson, 2021.

[2] PAUL, Paul’s Online Notes.


https://tutorial.math.lamar.edu/Classes/DE/RepeatedEigenvalues.aspx.

[3] W IKIPEDIA, Eigenvalues and eigenvectors.


https://en.wikipedia.org/wiki/Eigenvalues_and_eigenvectors.

Index
M , 28, 96, 141, 167, 178, 209, 225 coordinates, 103, 245
1-norm, 232
2-norm, 232 data compression, 230
design matrix, 219, 263
acute angle, 189 determinant, 73, 109–111, 247
addition, 120, 250 diagonal entries, 68
algebraic multiplicity, 139 diagonal matrix, 68
all-ones vector, 233 diagonalizable, 142, 252
annual_migration.m, 171 diagonalization theorem, 143, 252
attractor, 157 differential equation, 154
augmented matrix, 4 dimension, 105
direction of greatest attraction, 157
back substitution, 93 distance, 181
back_sub.m, 95 domain, 50
backward phase, 13 dot product, 36, 71, 180
basic variables, 12 dot product preservation, 193
basis, 99, 244 double eigenvalue, 159
basis theorem, 107, 246 dyadic decomposition, 230
best approximation, 198, 257
birds_on_aviary.m, 172 echelon form, 9
birds_on_aviary2.m, 173 eigenspace, 133, 147
eigenvalue, 132, 138, 228
change of variables, 224, 264, 269 eigenvalues of stochastic matrices, 231
Chapter Review, 237 eigenvector, 132, 138, 228
characteristic equation, 138, 150 elementary matrix, 87
characteristic polynomial, 138 elementary row operation, 87
closed, 97, 243 elementary row operations, 4, 115
closest point, 198, 257 ellipse, 229
codomain, 50 equal, 68
coefficient matrix, 4 equality of vectors, 21
cofactor, 111, 247 equivalent system, 2
cofactor expansion, 111, 247 Euler angles, 54
column space, 97, 127, 243 Euler, Leonhard, 228
column vector, 20 exercise-6.5.3.m, 216
complex conjugate, 151 existence and uniqueness, 3
complex eigenvalue, 150
consistent system, 3 finite Markov chain, 169
coordinate vector, 103, 245 for loop, 33


forward elimination, 92 least-squares (LS) problem, 210


forward phase, 13, 18, 90 least-squares (LS) solution, 210, 211, 217,
forward substitution, 92 261
forward_sub.m, 94 least-squares error, 211, 212, 261
free variable, 14 least-squares line, 218, 262
free variables, 12 left singular vectors, 230
function, 33 length, 181
fundamental questions, two, 3 length preservation, 193
linear combination, 23
Gauss Elimination, 90 linear equation, 2
Gauss elimination with partial pivoting, linear space, 119
94 linear system, 2
general solution, 14 linear transformation, 53, 129
generalized eigenvector, 159 linearization, 269
geometric linear transformations of R2 , 60 linearly dependent, 44, 238
Gram-Schmidt process, 203, 206, 259 linearly independent, 44, 80, 238
Gram-Schmidt process, normalized, 204, linearly independent eigenvectors, 162
259 linspace, in Matlab, 31
lower triangular matrix, 86
homogeneous linear system, 38
lower-triangular system, 92
inconsistent system, 3 LU decomposition, 87
information matrix, 210 LU factorization, 87
initial condition, 154 LU factorization algorithm, 89
initial-value problem, 154 lu_nopivot_overwrite.m, 91
injective, 61 lu_solve.m, 95
inner product, 180, 182
machine learning, 217
interchange operation, 235
Markov chain, 168, 170, 231, 254
inverse power method, 165
Markov matrix, 170, 231, 254
inverse_power.m, 166
Matlab, 29
invertibility, 73
matlab: fileparts, 270
invertible, 84, 242
matlab: polyfit, 270
invertible linear transformation, 84, 242
matlab: readmatrix, 271
invertible matrix, 77
matlab: writematrix, 271
invertible matrix theorem, 82, 107, 116,
matrix algebra, 67
139, 241
matrix equation, 34
isomorphic, 104
matrix form, 4
isomorphism, 104
matrix multiplication, 70, 228
iteration, 32
matrix product, 70
iterative algorithm, 162
matrix transformation, 51, 200, 228, 258
kernel, 129 mean square error, 273
method of normal equations, 212, 221,
Lagrange, Joseph-Louis, 228 262, 267
leading 1, 9 multiplication by scalars, 120, 250
leading entry, 9 multiplicity, 139

mysum.m, 33 Perron-Frobenius eigenvector, 233


piecewise regression, 274
Newton’s method, 269 pivot column, 11
nonhomogeneous linear system, 38 pivot position, 11
nonlinear least-squares problem, 269 plot, in Matlab, 30
nonlinear_regression.m, 270 polynomial fitting, 267
nonsingular matrix, 77 position vector, 20
nontrivial solution, 38 power iteration, 162
nonzero row, 9 power method, 162, 175, 234
norm, 181 power_iteration.m, 164
normal equations, 212, 217, 262 principal component analysis, 230
np.linalg.eig, 234
principal components, 230
np.loadtxt, 271
probability matrix, 170, 231, 254
np.polyfit, 271
probability vector, 170, 231, 254
np.polyval, 271
programming, 29
np.savetxt, 271
pseudoinverse, 213, 214, 217
np.set_printoptions, 234
Pythagorean theorem, 183, 198
null space, 98, 125, 129, 243
nullity, 105, 246 QR factorization, 206
objects, 29 QR factorization algorithm, 207, 260
observation vector, 210, 219, 263
random orthogonal matrix, 193
obtuse angle, 189
random sample consensus, 210
Octave, 29
range, 50, 129
one-to-one, 61, 64
rank, 105
onto, 61
rank theorem, 105, 246
orthogonal, 183
RANSAC, 210
orthogonal basis, 188, 203, 204, 259
reduced echelon form, 9
orthogonal complement, 183
REF, 9
orthogonal decomposition theorem, 196,
reflections in R2 , 60
257
regression analysis, 217, 230, 266
orthogonal matrix, 193
regression coefficients, 218, 262
orthogonal projection, 189, 196, 198, 257
regression line, 218, 262
orthogonal set, 187
regular, 175, 256
orthogonal_matrix.m, 193
orthogonality preservation, 193 regular stochastic matrix, 234
orthonormal basis, 192, 203, 204, 259 regular_stochastic.m, 175
orthonormal set, 192 regular_stochastic_Tk.m, 176
repeller, 157
p-norm, 232 repetition, 29, 32
parallelogram law, 186 replacement, 233
parallelogram rule for addition, 21 reusability, 33
parameter vector, 219, 263 reusable, 29
parametric description, 14 right singular vectors, 230
parametric vector form, 19, 39, 43 roll-pitch-yaw, 54
partial pivoting, 94 rotation, 54

row equivalent, 5 subspace, 97, 121, 243, 250


row reduced echelon form, 9 substitution matrix, 170, 231, 254
row space, 185 sum, 123, 251
row-column rule, 71 sum of products, 71
RREF, 9 sum of two matrices, 68
superposition principle, 53, 55
saddle point, 157
surjective, 61
scalar multiple, 21
SVD theorem, 230
scalar multiplication, 68, 119
system of linear equations, iii, 2
shear transformation, 51, 59
similar, 140 test_data_100.m, 268
similarity, 140 transformation, 50
similarity transformation, 140 transition matrix, 169, 170, 172, 231, 254
Sine_Noisy_Data_Regression.m, 271 transpose, 19, 74
singular value decomposition, 230 trivial solution, 38
singular values, 230
sink, 157 unique inverse, 77
solution, 2 unit circle, 229
solution set, 2 unit lower triangular matrix, 87
source, 157 unit vector, 181, 192
span, 25 upper triangular matrix, 84, 87, 242
sparse system, 87 upper-triangular system, 93
square matrix, 68
vector, 20
stable equilibrium distribution, 233
vector addition, 21
standard basis, 99
vector equation, 24, 36
standard matrix, 58, 84, 239, 242
vector space, 119, 120, 250
standard unit vectors in Rn , 58, 239
vectors, 119, 120, 250
state vector, 170, 231, 254
visual inspection, 266
states, 168
volume scaling factor, 109, 110, 247
steady-state vector, 174, 233, 255
stochastic matrix, 170, 231, 254 yaw, 54
stochastic_eigen.py, 234
submatrix, 111, 247 zero subspace, 121
