
Math 347

Linear Algebra for Applications


Comprehensive Notes for the Final Exam

Ajay Gandecha

December 1, 2022
Contents

0 Introduction to Linear Algebra 7

0.1 Introduction to Scalars and Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.1 Definition of a Scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.2 Definition of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.3 Denoting a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.4 Components of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.5 Writing a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

0.1.6 Calculating the Magnitude of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

0.2 Operations with Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

0.2.1 Adding Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

0.2.2 Multiply a Vector by a Scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1 Introduction to Vectors 10

1.1 Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.1.1 Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.1.2 Spans of Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.2 Lengths and Dot Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.1 Calculating the Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.2 Length of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.3 Working with Unit Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.2.4 Angle Between Two Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.3 Introduction to Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.1 Introduction to Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.2 Multiplying a Matrix by a Scalar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.3 Multiply a Matrix by a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3.4 Representing Linear Combinations as a Matrix . . . . . . . . . . . . . . . . . . . . . . 15

1.3.5 Identity Matrices and Introduction to Inverse Matrices . . . . . . . . . . . . . . . . . . 15

1.3.6 Matrix Equation Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

1.3.7 Dependent and Independent Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.3.8 Representing Systems of Linear Equations as a Matrix . . . . . . . . . . . . . . . . . . 17

2 Solving Linear Equations 18

2.1 Vectors and Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.1.1 Row and Column Pictures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.2 The Idea of Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.1 Why Eliminate? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.2 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.2.3 Normalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.2.4 Gauss-Jordan Elimination - Backwards elimination . . . . . . . . . . . . . . . . . . . . 21

2.2.5 Solve the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.3 Elimination Using Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.3.1 Elimination Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.3.2 Permutation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.3.3 Divide By Pivots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.4 Rules for Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.4.1 Matrix Operation Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

2.5 Inverse Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.2 Singular Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.3 Properties of Inverse Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.4 Inverse of Elimination Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.5 Calculating A−1 with Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.5.6 Diagonally Dominant Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.6 Factorization of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.6.1 A = LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.6.2 A = LDU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.7 Transposes and Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7.1 Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7.2 Rules of Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7.3 Outer Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7.4 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.7.5 Permutation Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.8 Proofs for Midterm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.8.1 Uniqueness of Inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.8.2 Unique Solution of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.8.3 Singular Matrix Proof if the Product of Nonzero vector and matrix is 0 . . . . . . . . 29

2.8.4 Inverse Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

2.8.5 Symmetry and Transpose Multiplication Rule . . . . . . . . . . . . . . . . . . . . . . . 30

3 Vector Spaces and Subspaces 31

3.1 Spaces of Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.1 Vector Space Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.2 Other Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.3 Determining a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.4 Dimensions of Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.1.5 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.1.6 Span of a Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.1.7 Using Span to Describe the Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.1.8 Column Space of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.1.9 Linear System and the Column Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.1.10 Conclusions about the Column Space . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2 Null Space of A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2.1 Definition of the Null Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2.2 Null Space is a Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.2.3 Calculating the Null Space of an Invertible Matrix . . . . . . . . . . . . . . . . . . . . . 35

3.2.4 Calculating Null Space of a Singular Matrix (Not Invertible) . . . . . . . . . . . . . . 36

3.2.5 Rank of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.2.6 Rank One Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

3.3 Complete Solution to A⃗x = ⃗b . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3.1 Homogeneous System of Linear Equations . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.3.2 Non-Homogeneous Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . 39

3.3.3 Full Column Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.3.4 Full Row Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.3.5 Relationships Between r, n, and m. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

3.4 Independence, Basis, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.1 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.2 Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

3.4.3 Dimension of a Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

3.4.4 Applying these Concepts to Other Spaces . . . . . . . . . . . . . . . . . . . . . . . . . 44

3.5 Dimensions of the Four Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.1 Four Spaces of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

3.5.2 Fundamental Theorem of Linear Algebra, Part 1 . . . . . . . . . . . . . . . . . . . . . 45

4 Orthogonality 46

4.1 Orthogonality of the Four Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.1 Definition of Orthogonal Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.2 Pairwise Orthogonal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.3 Orthogonal Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

4.1.4 Orthogonal Matrix Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.5 Orthogonal Complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.1.6 Fundamental Theorem of Linear Algebra, Part 2 . . . . . . . . . . . . . . . . . . . . . 47

4.1.7 Action of A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.1.8 Example Solving for the Orthogonal Complement of S . . . . . . . . . . . . . . . . . . 48

4.1.9 Extending Part 2 of the Fundamental Theorem of Linear Algebra . . . . . . . . . . . . 49

4.2 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.1 Definition of Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.2 Projection onto an Axis or Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.3 Projection Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

4.2.4 Projection onto a Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

4.2.5 Projection onto a Subspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

4.3 Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.3.1 Finding the Line of Best Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

4.3.2 Why A⃗x = ⃗b cannot be Solved (but AT A⃗x̂ = AT ⃗b can) . . . . . . . . . . . . . . . . . . 56

4.4 Orthonormal Bases and Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4.1 Definition of Orthonormal Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4.2 Matrices with Orthonormal Columns . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

4.4.3 Invariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

4.4.4 Finding Least Square (Line of Best Fit) with Orthogonal Matrices . . . . . . . . . . . 58

4.4.5 Constructing Orthonormal Vectors using Gram-Schmidt . . . . . . . . . . . . . . . . . 58

5 Determinants 60

5.1 Definition of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5.2 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.2.1 Determinant of the Identity Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.2.2 Sign Reversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.2.3 Linearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.2.4 Determinant of Matrix with Duplicate Rows . . . . . . . . . . . . . . . . . . . . . . . 62

5.2.5 Determinant After Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.2.6 Matrix with a Row of 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.2.7 Determinant of Triangular Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

5.2.8 Determinants of Singular and Invertible Matrices . . . . . . . . . . . . . . . . . . . . . 62

5.2.9 Determinant of Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.2.10 Determinant of Transpose Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.2.11 Determinant of an Orthogonal Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.2.12 Properties Regarding Rows and Columns . . . . . . . . . . . . . . . . . . . . . . . . . 63

5.3 Permutations and Cofactors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.3.1 The Big Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.3.2 Cofactors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

6 Eigenvectors and Eigenvalues 65

6.1 Introduction to Eigenvectors and Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.1.1 Definition of Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.1.2 Finding Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

6.1.3 Properties of Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . 67

6.1.4 Trace of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.1.5 Complex Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

6.2 Diagonalizing a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.2.1 Diagonalization and Eigendecomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.2.2 Eigenvector Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

6.2.3 Similar Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6.3 Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.3.1 Working with Symmetric Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.3.2 Spectral Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.3.3 Rank One Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

6.4 Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.4.1 Definition of Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.4.2 Properties of Positive Definite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6.4.3 Positive Semidefinite Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

7 Singular Value Decomposition 72

7.1 Introduction to Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.1 Why Singular Value Decomposition? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.2 Singular Value Decomposition Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

7.1.3 Constructing Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 73

7.1.4 Rank One Decomposition via SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.1.5 Best Rank k Approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

0 Introduction to Linear Algebra

0.1 Introduction to Scalars and Vectors

0.1.1 Definition of a Scalar

A scalar is a term used alongside vectors to refer to an arbitrary constant number. A scalar is simply a single
numerical value, used on its own just like an ordinary number.

0.1.2 Definition of a Vector

A vector is defined as an object with a direction and a magnitude, but not a position in space. The term
also denotes the mathematical or geometrical representation of such a quantity.

Vectors are often drawn like arrows in space that originate at some origin and point off in some direction.
The length of that arrow is known as the magnitude.

Vectors can also simply be represented as a point in space. This point is the tip of the arrow if the vector
were to have been drawn in arrow form.

0.1.3 Denoting a Vector

To denote a vector, we use the notation v, with the letter representing the vector being bold.

We can also denote a vector like this: ⃗v , with an arrow over the letter rather than a bold face letter.

0.1.4 Components of a Vector

Traditionally speaking, in a 2-D space, vectors have two components: one for the x-direction, and one for
the y-direction. The value of each component represents the distance travelled along that axis to get to the
"tip of the arrow."

In a 3-D space, there would be an additional z-component, and in higher-dimensional spaces, more components
would be added.

0.1.5 Writing a Vector

With these components, vectors can be written a few ways.

Coordinate (or “row” notation)

First, one can define a vector based on the point at the tip of the arrow, in a traditional coordinate
system. For example:

⃗v = (1, 2)

In this case, the vector ⃗v has an x-component of 1, and a y-component of 2.

Matrix (or "column" notation)

Vectors are also commonly written in matrix form, usually with one column and n rows, where n is the
number of dimensions of the vector. For example:

⃗v = [1]
     [2]

Each row in the "matrix" above represents movement along a specific axis. For example, a 3-dimensional
vector is defined below:

⃗v = [1]
     [2]
     [6]

In this case, the components are represented with subscripts. So, ⃗v1 = 1, ⃗v2 = 2, and ⃗v3 = 6.

Linear Combination of Unit Vectors

Lastly, any vector can be written as a linear combination. This is a concept covered in Chapter 1; however,
a vector can always be represented as the sum of products between scalars and unit vectors.

A unit vector is a vector that travels strictly along only 1 axis with a magnitude of 1. For example, in
physics, the vector î is the unit vector in the x-direction, and ĵ is the unit vector in the y-direction. This
means that, using the column format shown directly above,

î = [1]    and    ĵ = [0]
    [0]               [1]

We know the vectors point in a certain direction, and that the different components of a vector represent
the distance in each direction. Given that î and ĵ are vectors with a length of 1 in each direction, we can
multiply î and ĵ by the appropriate magnitudes in each direction and add them together. So, the third way
the vector ⃗v can be represented is the following:

⃗v = 1î + 2ĵ

So, in conclusion, based on the three different methods of writing vectors discussed above,

⃗v = (1, 2) = [1] = 1î + 2ĵ
              [2]

0.1.6 Calculating the Magnitude of a Vector

We can calculate the magnitude of a vector using the following formula:

∥⃗v∥ = √(vx² + vy²)

This basic formula essentially follows the Pythagorean theorem for finding the hypotenuse of a right triangle.
The same concept applies here.
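As a quick numeric check of this formula, here is a short NumPy sketch (using NumPy here is my own choice for illustration, not something the notes prescribe):

    import numpy as np

    v = np.array([1.0, 2.0])
    # Magnitude via the Pythagorean-style formula above
    manual = np.sqrt(v[0]**2 + v[1]**2)
    # Magnitude via NumPy's built-in Euclidean norm
    built_in = np.linalg.norm(v)
    print(manual, built_in)   # both print 2.23606797...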

0.2 Operations with Vectors

0.2.1 Adding Vectors

We can add two vectors simply by summing the values in each row.

[a] + [c] = [a + c]
[b]   [d]   [b + d]

Remember: you can only add vectors of the same shape! (same dimensions)

0.2.2 Multiply a Vector by a Scalar

We can also scale a vector by multiplying it by a scalar. The formula works as follows:

c1 [a] = [c1 a]
   [b]   [c1 b]

The result is a vector with the same shape as the original.
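Both operations map directly onto NumPy arrays; here is a minimal sketch with my own example values:

    import numpy as np

    a = np.array([1, 2])
    b = np.array([3, -1])
    print(a + b)     # elementwise sum        -> [4 1]
    print(3 * a)     # scalar multiplication  -> [3 6]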

1 Introduction to Vectors

1.1 Linear Combinations

1.1.1 Linear Combinations

A linear combination is the sum of two scaled vectors, scaled by arbitrary scalars. For example, if ⃗v and
w⃗ are vectors, and c and d are scalars, then the following is a valid linear combination of ⃗v and w⃗:

c⃗v + dw⃗

Linear combinations combine the two fundamental vector operations of addition and scalar multiplication.
Now, the actual result of these operations is another vector. Assuming that the vectors ⃗v and w⃗ are
two-dimensional in the problem above:

c⃗v + dw⃗ = c[v1] + d[w1] = [cv1] + [dw1] = [cv1 + dw1]
            [v2]    [w2]   [cv2]   [dw2]   [cv2 + dw2]

You can see that the result after both operations is still a single vector. Therefore, a linear combination of
vectors always results in a vector. In addition to this, the reverse is true: any vector can be
represented as a linear combination of unit vectors and some arbitrary scalars.

This is why it was possible to show in Chapter 0 that vectors can be represented as linear combinations,
with each vector being a unit vector and each scalar being the value of each component.

Just like you cannot add two vectors with different dimensions, you cannot compute the linear
combination of two vectors with different dimensions. Remember, this is because a linear combination is
just adding two (scaled) vectors!

Vector and Linear Combination Identities:

- Addition of vectors is commutative:
  ⃗v + w⃗ = w⃗ + ⃗v
- Consequently, the order of the terms in a linear combination does not matter:
  c⃗v + dw⃗ = dw⃗ + c⃗v
- There is also a special vector called the zero vector (or null vector), denoted by ⃗0. The null vector can
  also be represented as a bold 0:
  ⃗0 = [0]
       [0]
  If all scalars in a linear combination are equal to 0, then the combination is the zero vector:
  0⃗v + 0w⃗ = ⃗0
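A linear combination is just these two operations chained together. A tiny NumPy illustration (the values of c and d are my own):

    import numpy as np

    v = np.array([1.0, 0.0])
    w = np.array([0.0, 1.0])
    c, d = 3.0, -2.0
    print(c * v + d * w)   # the linear combination c*v + d*w -> [ 3. -2.]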

1.1.2 Spans of Linear Combinations

The span of a linear combination is simply the set of all possible linear combinations of a set of vectors.
What does this mean?

Well, remember that a linear combination of two vectors looks like:

c⃗v + dw⃗

Here, the two vectors being combined are ⃗v and w⃗. However, these vectors are being scaled by the arbitrary
scalars c and d. Now, these scalars are actually arbitrary, meaning we can choose any number n ∈ R to be
these values. When we do this, the original ⃗v and w⃗ vectors can be scaled – shrunk, stretched, or even
reflected onto themselves (with a negative scalar).

Consider just c⃗v, which is the vector ⃗v scaled by the scalar c. ⃗v looks like an arrow in space, but we can also
consider it just as the point at the end of that arrow. As we change the value of c, the magnitude of that
vector shrinks and stretches; however, the vector always remains on the "line" it originally sat on, if drawn
to infinity. There are an infinite number of vectors that c⃗v can equal depending on the value of c, yet it
never strays off of that original line through space.

Because of that, we can say that the span of c⃗v – the set of all its scalar multiples – fills a line in
space. It is impossible to stray from that line; however, it is possible to make up any vector on that line from
c⃗v as long as the right c is chosen.

Now, let’s take a look at that original linear combination again:

c⃗v + dw⃗

Remember, when you add vectors, the tail of the second vector sits at the tip of the first vector. So, we
established that the span of c⃗v is a line, since the magnitude of a vector pointing in a singular direction is
being scaled. However, now we introduce a second vector at the end of the first one. That one also scales,
and the span of that scalar and vector is also a line. However, assuming that these two vectors ⃗v and w⃗ are
pointing in different directions, you will find that the second vector can scale along any point on the line of
the first vector. This essentially means that any point on the 2-dimensional plane can be reached with the
correct c and d in the linear combination c⃗v + dw⃗.

So, this means that the span of the linear combination c⃗v + dw⃗ is a plane (rather than a line).

Now, this is of course true only if the two vectors ⃗v and w⃗ are pointing in different directions. If two vectors
are pointing in the same direction, then the effect is nullified and the span is still a line. We call two vectors
that point along the same line linearly dependent. Vectors that do not are linearly independent.

Now, we can continue this line of thinking forward with the following linear combination:

c⃗u + d⃗v + ew⃗

Now, we are combining three vectors here. As you can probably imagine, as long as all three of these vectors
are linearly independent, and all three vectors are three-dimensional, then the span of the linear combination
above would be all of 3-dimensional space.
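A quick way to check dependence numerically is to stack the vectors as columns and look at how many independent directions they span; NumPy's matrix_rank does this (rank itself is introduced formally in Chapter 3, and the example vectors are my own):

    import numpy as np

    v = np.array([1, 2])
    w = np.array([-3, -6])   # w = -3 * v, so v and w are linearly dependent
    u = np.array([0, 1])     # u does not lie on the line through v
    print(np.linalg.matrix_rank(np.column_stack([v, w])))   # 1 -> span is a line
    print(np.linalg.matrix_rank(np.column_stack([v, u])))   # 2 -> span is the plane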

1.2 Lengths and Dot Products

1.2.1 Calculating the Dot Product

If you noticed from Chapter 0, we did not go into the multiplication of two vectors.

The "multiplication" of two vectors is defined as the dot product, also known as the inner product, of
two vectors. Let's say we have two vectors, ⃗v = (v1, v2) and w⃗ = (w1, w2). Then, the dot product is defined
as the following:

⃗v · w⃗ = v1w1 + v2w2

You can see that the dot product is the sum of the products of the corresponding components of the vectors.
Because of this, unlike with addition, scalar multiplication, and linear combinations where the final result
was a vector, the final result of a dot product is a number.

In general form, with multiple dimensions where ⃗v = (v1, v2, ..., vn) and w⃗ = (w1, w2, ..., wn), the dot product
of ⃗v and w⃗ is:

⃗v · w⃗ = v1w1 + v2w2 + ... + vnwn

1.2.2 Length of a Vector

Remember from Chapter 0, we denoted the length of the two-dimensional vector ⃗v as ∥⃗v∥, using the following
formula:

∥⃗v∥ = √(v1² + v2²)

In general form, with multiple dimensions where ⃗v = (v1, v2, ..., vn), the length of vector ⃗v is:

∥⃗v∥ = √(v1² + v2² + ... + vn²)

Again, this formula follows the basic principles of the Pythagorean theorem. However, you might notice
something pretty interesting:

∥⃗v∥ = √(v1² + v2² + ... + vn²) = √(v1v1 + v2v2 + ... + vnvn)

You will find that the expression inside the square root on the right-hand side is equivalent to the dot
product of ⃗v with itself! So, we can also say that:

∥⃗v∥ = √(⃗v · ⃗v)

Alternatively, with some algebra, it is also reasonable to say that:

⃗v · ⃗v = ∥⃗v∥² = (Length of ⃗v)²
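A small numeric check of the identity ⃗v · ⃗v = ∥⃗v∥² (the example vectors are my own):

    import numpy as np

    v = np.array([1.0, 2.0, 2.0])
    w = np.array([2.0, 0.0, 1.0])
    print(np.dot(v, w))                        # v·w = 1*2 + 2*0 + 2*1 = 4
    print(np.dot(v, v), np.linalg.norm(v)**2)  # both equal the squared length 9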

1.2.3 Working with Unit Vectors

Remember from above, we defined a unit vector as a vector with a magnitude of 1. So, we can now prove
a vector ⃗v is a unit vector by showing that the dot product ⃗v · ⃗v = ∥⃗v∥² = 1.

Now, we can also find the unit vector ⃗u in the direction of any vector ⃗v. In order to do this, we need to
divide the vector by its length:

⃗u = ⃗v / ∥⃗v∥

This will ensure that the length is 1, but the direction is preserved. The "division" here is a valid vector
operation because it is really scalar multiplication. The scalar is c = 1/∥⃗v∥, as shown here:

⃗u = ⃗v / ∥⃗v∥ = (1/∥⃗v∥) ⃗v
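A minimal sketch of normalizing a vector to unit length (example values are my own):

    import numpy as np

    v = np.array([3.0, 4.0])
    u = v / np.linalg.norm(v)    # scalar multiplication by 1/||v||
    print(u)                     # -> [0.6 0.8]
    print(np.linalg.norm(u))     # -> 1.0, so u is a unit vector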

1.2.4 Angle Between Two Vectors

Based on the formula for the Law of Cosines, we can derive a formula to calculate the angle between two
vectors.

If ⃗v and w⃗ are unit vectors (∥⃗v∥ = 1 and ∥w⃗∥ = 1), then:

⃗v · w⃗ = cos θ

If ⃗v and w⃗ are not unit vectors, then we can adjust the formula. Remember that we defined the unit vector
⃗u of a vector ⃗v to be ⃗u = ⃗v/∥⃗v∥. So, we can apply this to the formula:

(⃗v/∥⃗v∥) · (w⃗/∥w⃗∥) = cos θ

You can see that nothing actually changes. Both vectors here are unit vectors, and the angle between them
does not change depending on the length of the vectors, so it all works out. This equation can also be
manipulated to be:

⃗v · w⃗ = ∥⃗v∥ ∥w⃗∥ cos θ

Two vectors are said to be perpendicular if ⃗v · w⃗ = 0.

In addition, given a vector ⃗v = (x, y), it is perpendicular to any vector of the form c(y, −x). This can
be proven by using the dot product and the rule above.

Whatever the angle, the dot product (⃗v/∥⃗v∥) · (w⃗/∥w⃗∥) will never exceed 1. Since |cos θ| never exceeds 1,
two fundamental inequalities are produced.

Schwarz Inequality:
|⃗v · w⃗| ≤ ∥⃗v∥ ∥w⃗∥

Triangle Inequality:
∥⃗v + w⃗∥ ≤ ∥⃗v∥ + ∥w⃗∥
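The angle formula translates directly into code; the vectors below are my own example (a 45-degree pair):

    import numpy as np

    v = np.array([1.0, 0.0])
    w = np.array([1.0, 1.0])
    cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
    print(np.degrees(np.arccos(cos_theta)))   # -> 45.0 degrees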

1.3 Introduction to Matrices

1.3.1 Introduction to Matrices

A matrix is an organized structure of data with rows and columns. Matrices are denoted with a capital
letter. Below is an example of a matrix:
 
A = [1  2  3]
    [4  5  6]

In this case, A is a matrix with 2 rows and 3 columns. Hence, it is a 2×3 matrix.

Below is another example of a matrix:

B = [ 1]
    [−1]
    [ 0]

In this case, since B has 3 rows and 1 column, it is a 3×1 matrix.

You can also say that B is a vector, since it only has one column.

1.3.2 Multiplying a Matrix by a Scalar

Multiplying a scalar s by a matrix A is quite easy. You take all of the elements of the matrix and multiply
each one by the scalar. Suppose that

A = [a  b  c]
    [d  e  f]

and suppose s is an arbitrary scalar. Then, the result of multiplying s and A would be:

sA = [sa  sb  sc]
     [sd  se  sf]

The result of this operation is a matrix with the same original dimensions as A.

1.3.3 Multiply a Matrix by a Matrix

Multiplying matrices is pretty interesting and not as easy as simply multiplying all of the values together.
Two matrices of dimensions m × n and n × l can be multiplied (the inner dimensions must match), and the
result is an m × l matrix where each element is the dot product of a row of the first matrix with a column
of the second matrix.

This is a pretty interesting concept, so let me write it out here in more detail. Suppose we want to find the
product of the two matrices

[1  2  3]        [ 1]
[4  6  5]  and   [−1]
                 [ 0]

The solution would be found below:

[1  2  3] [ 1]   [(1, 2, 3) · (1, −1, 0)]   [1(1) + 2(−1) + 3(0)]   [−1]
[4  6  5] [−1] = [(4, 6, 5) · (1, −1, 0)] = [4(1) + 6(−1) + 5(0)] = [−2]
          [ 0]

Every row of the resulting matrix comes from the dot products of the corresponding row of the first matrix
with each column of the second matrix, with each dot product going in the correct slot for the column that
was used to compute it.

Every column of the resulting matrix then represents all of the dot products calculated with the matching
column in the second matrix, with each dot product going in the correct slot for each row that was used to
compute it.

So, for example, if

A = [a11  a12  a13]
    [a21  a22  a23]

where each element aij represents an entry at row i and column j in the matrix, and

B = [b11  b12]
    [b21  b22]
    [b31  b32]

then when multiplied:

AB = [(a11, a12, a13) · (b11, b21, b31)   (a11, a12, a13) · (b12, b22, b32)]
     [(a21, a22, a23) · (b11, b21, b31)   (a21, a22, a23) · (b12, b22, b32)]

This does look extremely complicated, but in practice it is not too difficult.

VERY IMPORTANT: The multiplication of matrices is NOT commutative! AB and BA can lead
to different results, and order matters when multiplying matrices.
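A short NumPy check of the worked example above, plus a demonstration that order matters (the 2 × 2 matrices are my own examples):

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 6, 5]])
    x = np.array([[1], [-1], [0]])
    print(A @ x)       # -> [[-1] [-2]], matching the worked example

    B = np.array([[1, 0], [2, 1]])
    C = np.array([[0, 1], [1, 0]])
    print(B @ C)       # [[0 1] [1 2]]
    print(C @ B)       # [[2 1] [1 0]]  -- BC != CB in general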

1.3.4 Representing Linear Combinations as a Matrix


     
Say that we have three vectors,

⃗u = [ 1]    ⃗v = [ 0]    w⃗ = [0]
     [−1]        [ 1]        [0]
     [ 0]        [−1]        [1]

We can choose three variables, say x1, x2, and x3, to represent our scalars. Then, the linear combination
would look like the following:

x1 [ 1] + x2 [ 0] + x3 [0]
   [−1]      [ 1]      [0]
   [ 0]      [−1]      [1]

We can work out this particular combination to get the following result:

x1 [ 1] + x2 [ 0] + x3 [0]   [x1     ]
   [−1]      [ 1]      [0] = [x2 − x1]
   [ 0]      [−1]      [1]   [x3 − x2]

We can actually represent this as an equation with matrices like so:

[ 1  0  0] [x1]   [x1     ]
[−1  1  0] [x2] = [x2 − x1]
[ 0 −1  1] [x3]   [x3 − x2]

At first, these might not look related. But remember, when the matrix multiplication is computed, each row
of the left matrix's dot product with the column vector on the right gives exactly the same entries as the
linear combination above.

1.3.5 Identity Matrices and Introduction to Inverse Matrices

An identity matrix is a square matrix with 1's on the diagonal and 0 in all other slots. For example,

I = [1  0  0]        I = [1  0]
    [0  1  0]  and       [0  1]
    [0  0  1]

are both identity matrices.

The inverse of a matrix A is denoted by A−1 . It is not the reciprocal of A, it is just notation. The
following rule applies for an inverse matrix:

Inverse Matrix Identification:


If A and B are square matrices, and AB = I where I is an identity matrix, then it is proven that
B = A−1 (is the inverse matrix of A).

VERY IMPORTANT: Not every matrix has an inverse.

1.3.6 Matrix Equation Form

Consider the following equation:

A⃗x = ⃗b

This is the fundamental matrix equation form that we will use a lot throughout linear algebra. However,
when solving these equations, a change of perspective is important.

In the past, we have solved equations where both A and ⃗x are known, and the result we solve for is ⃗b. This
is the result of A acting on the vector ⃗x.
     
For example, if

A = [ 1  0  0]    ⃗x = [x1]    ⃗b = [x1     ]
    [−1  1  0]         [x2]        [x2 − x1]
    [ 0 −1  1]         [x3]        [x3 − x2]

then A⃗x = ⃗b:

[ 1  0  0] [x1]   [x1     ]
[−1  1  0] [x2] = [x2 − x1]
[ 0 −1  1] [x3]   [x3 − x2]

However now, rather than trying to solve ⃗b, we are given A and ⃗b and we try to solve for ⃗x.

This is the inverse problem to what we have attempted to solve before.

There are a few options for the result of trying to search for ⃗x.

No Solution:

In the equation A⃗x = ⃗b, no solution occurs if ⃗b is NOT a linear combination of the columns of A.
   
For example, in the equation

[2  4] ⃗x = [1]
[1  2]      [1]

there is no solution, because no linear combination of the columns gives ⃗b. The above equation is the same
as the linear combination

x1 [2] + x2 [4] = [1]
   [1]      [2]   [1]

which has no solution, because (1, 1) is not a multiple of (2, 1).

Unique Solution: In the equation A⃗x = ⃗b, a unique solution occurs if ⃗b is a linear combination of the
columns of A and we CANNOT write any column of A as a linear combination of the other columns.

TLDR; If A is invertible and A−1 exists, then the equation A⃗x = ⃗b has one unique solution.

Infinitely Many Solutions: It is also possible for this equation to have infinitely many solutions, for
example when the matrix is a cyclic difference matrix like the one here:

C⃗x = [ 1  0 −1] [x1]   [x1 − x3]
      [−1  1  0] [x2] = [x2 − x1]
      [ 0 −1  1] [x3]   [x3 − x2]

Given the equation C⃗x = ⃗0, for any c, x1 = x2 = x3 = c is a solution. Therefore, there are infinitely many
solutions.

A cyclic matrix C has no inverse. All three columns lie in the same plane. Also, those dependent
columns add to the zero vector. Any equation C⃗x = ⃗0 has many solutions.

1.3.7 Dependent and Independent Vectors

Vectors ⃗u, ⃗v, and w⃗ are dependent if there exists a non-trivial (non-zero) combination of scalars c, d, and
e where c⃗u + d⃗v + ew⃗ = ⃗0.

Vectors ⃗u, ⃗v, and w⃗ are independent if c⃗u + d⃗v + ew⃗ = ⃗0 IF AND ONLY IF c = d = e = 0.

1.3.8 Representing Systems of Linear Equations as a Matrix

One of the most powerful uses of matrices, and one fundamental to linear algebra, is using them to represent
linear systems of equations. Take a look at the system below:

x + 2y + 3z = 6
2x + 5y + 2z = 4
6x − 3y + z = 2

This system of equations can first be represented by a linear combination of the coefficient columns, with
the scalar on each column being the associated variable. This looks like:

x [1] + y [ 2] + z [3]   [6]
  [2]     [ 5]     [2] = [4]
  [6]     [−3]     [1]   [2]

Lastly, we can convert this fully into a matrix form equation, where A⃗x = ⃗b:

[1  2  3] [x]   [6]
[2  5  2] [y] = [4]
[6 −3  1] [z]   [2]

Remember, this form is perfectly valid to represent the linear combination above because the matrix
multiplication represents that step.

In fact, we can create an augmented matrix that represents the equation above in only one single matrix:

[1  2  3 | 6]
[2  5  2 | 4]
[6 −3  1 | 2]

In this case, the individual variables are abstracted out; they are implied by the first columns of the
augmented matrix, and the final column is the vector ⃗b. We will use this form a lot in Chapter 2 when
working with elimination.
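For reference, a numerical solver finds the solution of this system directly (Chapter 2 builds the same answer by hand with elimination); this NumPy check is my own addition:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [2, 5, 2],
                  [6, -3, 1]], dtype=float)
    b = np.array([6, 4, 2], dtype=float)
    print(np.linalg.solve(A, b))   # solves A x = b -> [0. 0. 2.]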

2 Solving Linear Equations

2.1 Vectors and Linear Equations

2.1.1 Row and Column Pictures

We can show the solution to linear equations graphically using row and column pictures.

Row Picture

A row picture shows the shapes generated by each row (each equation), where the intersection of those
shapes is the solution of the linear system. For example, take the following system of equations:

[1  2  3] [x]   [6]
[2  5  2] [y] = [4]
[6 −3  1] [z]   [2]

First, you can approach this problem by finding the intercepts of each equation. For the first equation in
the system, the x-, y-, and z-intercepts are x = 6, y = 3, and z = 2 respectively. Using these points, we can
generate a shape that passes through all of them. Because there are three points here, that shape is a 2D
plane.

Repeating this process for the second equation, you find that the intercepts are x = 2, y = 4/5, and z = 2,
which gives a second plane.

You will see that the two planes intersect at a line L, which passes through the common point z = 2.

Adding the third equation in the system, with intercepts x = 1/3, y = −2/3, and z = 2, you can see that the
third plane also intersects the others at the line L, where the line L passes through the point z = 2.

In conclusion, the row picture points to a solution of (0, 0, 2).

Column Picture

Unlike the row picture, which treats each row in the matrix as its own unit, the column picture does the
same for the columns instead. Because the columns of our coefficient matrix are vectors, we can draw these
vectors in space. First, we can reorient our matrix to be in linear combination form:

x [1] + y [ 2] + z [3]   [6]
  [2]     [ 5]     [2] = [4]
  [6]     [−3]     [1]   [2]

Now, we can draw out each column vector in 3D space.

Finally, the solution here can be found by drawing the correct combination of vectors with the right scaling.
In this case, all it took was to multiply column three by a factor of 2, and the other vectors were not used
at all. So:

0 [1] + 0 [ 2] + 2 [3]   [6]
  [2]     [ 5]     [2] = [4]
  [6]     [−3]     [1]   [2]

This combination of scalars satisfies the equation, and in conclusion, the column picture points to a solution
of (0, 0, 2).

2.2 The Idea of Elimination

2.2.1 Why Eliminate?

The goal of elimination is to manipulate a matrix (one that is representing a system of equations) so that
we can easily solve the system mathematically. There are three main steps of achieving this:

- Gauss-Jordan Elimination pt. 1: Turn the matrix into upper-triangular form.
- Normalization: Divide each equation so that its pivot is 1.
- Gauss-Jordan Elimination pt. 2: Eliminate backwards to get final values for the variables in ⃗x.

2.2.2 Gaussian Elimination

Gaussian Elimination is the first step in the wider Gauss-Jordan Elimination. The goal of Gaussian
Elimination is to turn a matrix into upper triangular form. This makes solving systems of equations easier
because solving the last row is usually trivial (requires one division step), and with the result of that final
variable, you are able to use backwards substitution to reach a final answer.

Take the following system as an example:

[1 −2] [x]   [ 1]
[3  2] [y] = [11]

First, the pivots are defined as the diagonal entries of the matrix.

We can eliminate the matrix by removing all of the numbers below every pivot. Then, we reach
upper-triangular form, since all values underneath the pivots are 0.

We can follow this formula:

Subtract l times row 1 from row 2, and save the new row in row 2.

In this case, lij is the multiplier chosen so that, when the rows are subtracted, the entry below the pivot
becomes 0.

In the example above, the multiplier to use is l21 = 3 because when 3 times row 1 is subtracted from
row 2, the value under the first pivot 1 becomes 0. This is the result of that operation:

[1 −2] [x]   [1]
[0  8] [y] = [8]

You can see in the example above that we are already done. However, on larger matrices you must keep
repeating this per row for every pivot, so that all values underneath the pivots are 0.

The values of the pivots themselves are not known until elimination is carried out.

Note: Elimination does not change the solution. It might change the way the row or column pictures
look, but the solution will still be the same.

We can also find no solution if a pivot ever becomes 0.

For example:
         
1 −2 x 1 1 −2 x 1
= ⇒ =
3 −6 y 11 0 0 y 8

You can see here that after eliminating, the pivot in row 2 became 0. Therefore, there is no solution because
there is no possible way that 0 = 8.

There is one exception. If the element for that row in the ⃗b vector is also 0, then there are many solutions
to the equation.

For example:
         
1 −2 x 1 1 −2 x 1
= ⇒ =
3 −6 y 3 0 0 y 0

This is because the entire bottom row is eliminated from the matrix, and the only equation left is:

x − 2y = 1

This is a line, which has infinitely many solutions.

There is one issue though. If there is a 0 pivot in the first row, then you can exchange the rows
and continue eliminating.

For example:
         
0 2 x 4 3 −2 x 5
= ⇒ =
3 −2 y 5 0 2 y 4

This is completely valid to do, since the ordering of equations in a system of equations is arbitrary.

2.2.3 Normalization

We can normalize a matrix by dividing each row by its respective pivot. This is to get all of the pivots to
be 1. This step is ideal to complete before working on backwards elimination.

The resulting matrix is in REF (Row Echelon Form).

2.2.4 Gauss-Jordan Elimination - Backwards elimination

Now, we can complete the Gauss elimination step, but backwards. Meaning, start at the last row, and
subtract upwards to ensure that only the pivots have values.

2.2.5 Solve the System

Finally, you can solve the system! You now have a single-term equation for each variable. Normalization
after the Gauss-Jordan step gives the answer, and the result is the RREF (Reduced Row Echelon Form).
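The whole pipeline above — forward elimination (with a row exchange when a pivot is 0), then back substitution — can be sketched in a few lines of NumPy. The function name gaussian_solve is my own, and the sketch assumes a square, invertible matrix:

    import numpy as np

    def gaussian_solve(A, b):
        # A rough sketch of Gaussian elimination for square, invertible A.
        A = A.astype(float).copy()
        b = b.astype(float).copy()
        n = len(b)
        # Forward elimination: zero out every entry below each pivot
        for j in range(n):
            if A[j, j] == 0:                      # zero pivot -> exchange rows
                k = j + np.argmax(np.abs(A[j:, j]))
                A[[j, k]], b[[j, k]] = A[[k, j]], b[[k, j]]
            for i in range(j + 1, n):
                l = A[i, j] / A[j, j]             # the multiplier l_ij
                A[i, j:] -= l * A[j, j:]
                b[i] -= l * b[j]
        # Back substitution: solve from the last row upward
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
        return x

    A = np.array([[1.0, -2.0], [3.0, 2.0]])
    b = np.array([1.0, 11.0])
    print(gaussian_solve(A, b))   # -> [3. 1.]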

2.3 Elimination Using Matrices

There are many different terms that are important to understand when using matrices to solve linear equa-
tions.

2.3.1 Elimination Matrices

Elimination matrices, represented by E, are matrices that actually perform the elimination steps described
above when multiplied with the matrix A. For example, say that while we were eliminating, we subtracted 2
times row 1 from row 2. This operation can be represented by the following matrix:

E = [ 1  0  0]
    [−2  1  0]
    [ 0  0  1]

This is because EA would equal a matrix with these operations performed. How did we get here?

Well, first the E matrix starts out as the identity matrix I. That is because IA = A; i.e., I is the base
matrix with no operations performed. In this case, the identity matrix (and the start of our E matrix) looks
like:

I = [1  0  0]
    [0  1  0]
    [0  0  1]

Then, we subtract 2 times row 1 from row 2. So, to do this, we put a 2 (the multiplier l21) in the first column
(representing row 1) and the second row (representing the row to subtract from). Then, we subtract this
matrix from I to represent the subtraction step. So:

E = [1  0  0]   [0  0  0]   [ 1  0  0]
    [0  1  0] − [2  0  0] = [−2  1  0]
    [0  0  1]   [0  0  0]   [ 0  0  1]

You can see that, in the end, we successfully created the elimination matrix. Now, when E is multiplied to
A, the steps will be performed perfectly.

We can show and even break up our Gaussian Elimination into operations of elimination matrices multiplied
to A.

We also can apply subscripts i and j to the notation to create Eij , where i represents the row to be subtracted
from, and j represents the row to subtract by.

So, for proper notation, we can say that the elimination matrix we created above is:
 
E21 = [ 1  0  0]
      [−2  1  0]
      [ 0  0  1]
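A quick NumPy check that multiplying by E21 really performs "subtract 2 times row 1 from row 2" (the matrix A below is my own example):

    import numpy as np

    A = np.array([[1.0, -2.0, 1.0],
                  [2.0,  3.0, 4.0],
                  [0.0,  1.0, 5.0]])
    E21 = np.array([[ 1.0, 0.0, 0.0],
                    [-2.0, 1.0, 0.0],
                    [ 0.0, 0.0, 1.0]])
    print(E21 @ A)   # row 2 becomes [0. 7. 2.]; rows 1 and 3 are unchanged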

2.3.2 Permutation Matrices

Permutation matrices, represented by P, are matrices that perform row switches (when the first pivot of a
row is 0, for example).

Similarly to above, we can apply subscripts i and j to the notation to create Pij, where i and j are the
rows to be switched.

For example, the permutation matrix to switch rows 2 and 3 would be:

P23 = [1  0  0]
      [0  0  1]
      [0  1  0]

When P23 A is computed, the rows of A will be switched as directed. In each row of P, the 1 sits in the
column of the row of A that should end up in that position.
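And similarly for a permutation matrix (again, the matrix A is my own example):

    import numpy as np

    P23 = np.array([[1, 0, 0],
                    [0, 0, 1],
                    [0, 1, 0]])
    A = np.array([[1, 2],
                  [3, 4],
                  [5, 6]])
    print(P23 @ A)   # rows 2 and 3 are exchanged -> [[1 2] [5 6] [3 4]]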

2.3.3 Divide By Pivots

There is also notation for dividing by the pivots. This is simply D−1, the inverse of the diagonal matrix D
of pivots.

So in conclusion, combinations of Eij, Pij, and D−1, when multiplied in order with the augmented matrix
[A ⃗b], successfully eliminate the matrix.

2.4 Rules for Matrix Operations

2.4.1 Matrix Operation Rules

There are a few basic matrix operation rules.

Matrix Addition

1. Matrix addition is commutative. So:
   A + B = B + A
2. The distributive property applies to multiplying the sum of matrices by a scalar. So:
   c(A + B) = cA + cB
3. Matrix addition is also associative. So:
   (A + B) + C = A + (B + C)

Matrix Multiplication

1. The distributive property applies to multiplying a matrix by a sum of matrices. So:
   A(B + C) = AB + AC
2. Matrix multiplication is associative. So:
   A(BC) = (AB)C
3. The associative property also applies for vectors, since vectors can be matrices. So:
   AB⃗x = (AB)⃗x = A(B⃗x)
4. Matrix multiplication is NOT COMMUTATIVE! Usually, AB ≠ BA.
5. Matrices and exponents work the same way as traditional numbers for the most part:
   A^p = AAA...A (p times)
   A^p A^q = A^(p+q)
   (A^p)^q = A^(pq)
6. Lastly, for technicalities, if B has columns b1, b2, b3, b4, then B = [b1 b2 b3 b4], and:
   AB = A [b1 b2 b3 b4] = [Ab1 Ab2 Ab3 Ab4]

2.5 Inverse Matrices

2.5.1 Definition

This was touched upon in the previous chapter, however, for a square matrix A, an inverse matrix A−1
exists if A−1 A = AA−1 = I, where I is the identity matrix.

VERY IMPORTANT: Not every matrix has an inverse.

Given this definition, consider a linear system:

A⃗x = ⃗b

If A has an inverse, then:

A−1(A⃗x) = A−1⃗b
(A−1A)⃗x = A−1⃗b
I⃗x = A−1⃗b
⃗x = A−1⃗b

So — we can say that, if A has an inverse, then ⃗x = A−1⃗b.

In practice, we usually don't calculate A−1. However, we can.

Inverse Matrix Uniqueness: There is only one inverse matrix for a matrix A. So:
If BA = I and also AC = I, then B = C.

IMPORTANT NOTE: An inverse of a square matrix A exists IF AND ONLY IF elimination
produces n pivots, where n is the dimension of the square matrix.

2.5.2 Singular Matrices

Singular matrices are matrices that do not have an inverse. A matrix A is singular if there exists
a non-zero vector ⃗x such that A⃗x = ⃗0.

2.5.3 Properties of Inverse Matrices

Inverse of the Product

If A and B are both invertible, then so is AB, and:

(AB)−1 = B−1 A−1

Note: The order is reversed here!

2.5.4 Inverse of Elimination Matrices

Since the elimination matrix E subtracts, its inverse E −1 adds.

2.5.5 Calculating A−1 with Elimination

To find the inverse of A:

1. Write the augmented matrix [A I].
2. Apply Gauss-Jordan elimination.
3. If A is invertible, then there are n pivots and you get [I A−1].

This is a super cool trick and the fastest way to solve for the inverse of a matrix.

Note: After Gaussian elimination but before normalizing the pivots to 1, the product of the pivots is
the determinant of the matrix.
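In practice a numerical library computes the inverse for us; here is a quick check (my own 2 × 2 example) that A times A−1 gives back the identity:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [5.0, 3.0]])
    A_inv = np.linalg.inv(A)   # computed numerically, not by hand
    print(A_inv)               # -> [[ 3. -1.] [-5.  2.]]
    print(A @ A_inv)           # -> the 2x2 identity (up to rounding)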

2.5.6 Diagonally Dominant Matrices

If the absolute value of each diagonal entry of a matrix is greater than the sum of the absolute values of the
other entries in its row, the matrix is diagonally dominant.

Diagonally dominant matrices are always invertible. However, a matrix that is not diagonally
dominant could still be invertible.

2.6 Factorization of Matrices

2.6.1 A = LU Factorization

The factorization of a matrix is represented by the equation A = LU , where U is the upper triangular
matrix that is created after Gauss elimination. L is the lower triangular matrix that is created from the
product of all of the elimination matrices.

Note: This only works when there are no row exchanges.

2.6.2 A = LDU Factorization

You can also have an A = LDU factorization, where D is the diagonal matrix of pivots taken from U in the
A = LU factorization, and the new U is the same as the previous U but with each row divided by its pivot.
This balances the factorization.

For example:
      
A = [ 1    0   0] [2   1    0 ]     [ 1    0   0] [2   0    0 ] [1  1/2   0 ]
    [1/2   1   0] [0  3/2   1 ]  ⇒  [1/2   1   0] [0  3/2   0 ] [0   1   2/3]
    [ 0   2/3  1] [0   0   4/3]     [ 0   2/3  1] [0   0   4/3] [0   0    1 ]
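A short NumPy check, using the matrices from the example above, that both factorizations multiply back to the same A:

    import numpy as np

    L = np.array([[1, 0, 0], [1/2, 1, 0], [0, 2/3, 1]])
    U = np.array([[2, 1, 0], [0, 3/2, 1], [0, 0, 4/3]])
    D = np.diag([2, 3/2, 4/3])                               # the pivots of U
    U_unit = np.array([[1, 1/2, 0], [0, 1, 2/3], [0, 0, 1]])
    print(L @ U)              # A = LU
    print(L @ D @ U_unit)     # the same matrix, A = LDU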

2.7 Transposes and Permutations

2.7.1 Transpose

If A is a matrix, then AT is the transpose matrix of A.

The columns of AT are the rows of A. For example:

If A = [1  2  3]    then AT = [1  0]
       [0  0  4]              [2  0]
                              [3  4]

We can use the index notation as well, where i is the row number and j is the column number, to say that
(AT )ij = Aji .

2.7.2 Rules of Transpose

Transpose Rules:

1. (AT)T = A
2. (A + B)T = AT + BT
3. (AB)T = BT AT
4. (A−1)T = (AT)−1 = A−T
5. A is invertible IF AND ONLY IF AT is invertible.
6. ⃗x · ⃗y = ⃗xT ⃗y
7. ⃗y · (A⃗x) = (AT ⃗y) · ⃗x

2.7.3 Outer Product

The outer product of vectors ⃗x and ⃗y is the same as finding ⃗x⃗y T .

Think of the outer product like creating a multiplication table.

2.7.4 Symmetric Matrices

A matrix A is symmetric if A = AT .

Because of this definition, all symmetric matrices are square.

With elimination, the form A = LU does not capture symmetry. However, the form A = LDU does
capture symmetry!

If matrix A is symmetric (i.e., A = AT ), then, with NO ROW EXCHANGES, A = LDU is the same as
writing A = LDLT .

2.7.5 Permutation Matrices

Permutation matrices are used to exchange rows within a matrix.

NOTE: For a permutation P , we can say that P −1 = P T .
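A quick numeric check of P−1 = PT for a small permutation matrix (my own example):

    import numpy as np

    P = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])   # a permutation of three rows
    print(P.T @ P)              # -> the identity, so P^T is the inverse of P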

2.8 Proofs for Midterm 1

2.8.1 Uniqueness of Inverse

If BA = I and also AC = I, then B = C.

Proof:

Since BA = I and AC = I, then B(AC) = B and (BA)C = C, since any matrix times the identity
matrix is that matrix. By the associative property, B(AC) = (BA)C. Thus, B = C.

2.8.2 Unique Solution of Linear Equations

If A is invertible, then the unique solution to A⃗x = ⃗b is ⃗x = A−1⃗b.

Proof:

Since A⃗x = ⃗b, then we can multiply both sides by A−1 . Then, A−1 A⃗x = A−1⃗b. Since A−1 A = I by
the definition of an inverse matrix, A−1 A⃗x = A−1⃗b can be rewritten as I⃗x = A−1⃗b. Thus, ⃗x = A−1⃗b.

2.8.3 Singular Matrix Proof if the Product of Nonzero vector and matrix is 0

If there is a non-zero vector ⃗x ̸= ⃗0 such that A⃗x = ⃗0, then A is not invertible.

Proof:

Proof by contradiction. Suppose ⃗x ≠ ⃗0 and A is invertible. Given A⃗x = ⃗0, we can multiply both
sides by A−1. Then, A−1A⃗x = A−1⃗0. Since A−1A = I, the left side simplifies to I⃗x = ⃗x, and the right
side simplifies to ⃗0, since any matrix times ⃗0 is just ⃗0. Thus ⃗x = ⃗0, which is a contradiction.

2.8.4 Inverse Product

−1
If A and B are both invertible, then so is AB and (AB) = B −1 A−1 .

Proof

We want to show that B−1A−1 is the inverse of AB; that is, that (AB)(B−1A−1) = I. By the
associative property, (AB)(B−1A−1) = A(BB−1)A−1. We know that if M is an invertible matrix,
then MM−1 = I, so BB−1 = I and the product becomes AIA−1. Since the identity matrix times a
matrix is the original matrix, this simplifies to AA−1, which equals I by the same rule. The same
reasoning shows that (B−1A−1)(AB) = I. Thus, AB is invertible and (AB)−1 = B−1A−1.

2.8.5 Symmetry and Transpose Multiplication Rule

If A is symmetric, then ⃗xT A⃗y = ⃗y T A⃗x.

Proof:

We know that ⃗xT A⃗y is a scalar. Therefore, ⃗xT A⃗y = (⃗xT A⃗y)T, because the transpose of a scalar is
just that same scalar. Expanding the right-hand side with the transpose product rule, (⃗xT A⃗y)T =
⃗y T AT (⃗xT)T = ⃗y T AT ⃗x. Given that A is symmetric, we know that A = AT, so ⃗y T AT ⃗x = ⃗y T A⃗x.
Thus, ⃗xT A⃗y = ⃗y T A⃗x.

3 Vector Spaces and Subspaces

3.1 Spaces of Vectors

3.1.1 Vector Space Definition

A vector space is essentially a set V of vectors along with operations of addition and multiplication, where
if vectors ⃗v and ⃗u are in the vector space, then the linear combination of these two vectors is also in the
space.

- R is the set of all real numbers. Therefore:
- Rn = all column vectors with n real components.
- For example: (1, 4) ∈ R2

We can then more formally say that: if vectors ⃗v and ⃗u ∈ Rn and c and d are any two scalars, then
c⃗v + d⃗u ∈ Rn also. Example:

If (8, 3, 5), (π, 1/3, 5), and (0, 0, 1) ∈ R3, then 8(8, 3, 5) − 5(π, 1/3, 5) + 0(0, 0, 1) ∈ R3 also.

3.1.2 Other Vector Spaces

There are also other examples of vector spaces besides Rn, including:

- Rm×n = Space of m × n real matrices.
- F = Space of real functions f(x).
- Pk = Space of polynomials of degree at most k.
- Z = {0} = Vector space containing only the 0 vector.
- Cn = Space of vectors with n complex components.

3.1.3 Determining a Vector Space

In order for a vector space to be valid, it must be closed under vector addition and scalar multiplication.
More specifically, it should follow these eight rules. Note: assume ⃗x, ⃗y ∈ V and that c1, c2 are any scalars.

Addition Rules

1. ⃗x + ⃗y = ⃗y + ⃗x
2. ⃗x + (⃗y + ⃗z) = (⃗x + ⃗y) + ⃗z
3. ∃!⃗0 such that ⃗x + ⃗0 = ⃗x
4. ∀⃗x ∃ −⃗x such that ⃗x + (−⃗x) = ⃗0

Multiplication Rules

1. 1⃗x = ⃗x
2. (c1c2)⃗x = c1(c2⃗x)
3. c(⃗x + ⃗y) = c⃗x + c⃗y
4. (c1 + c2)⃗x = c1⃗x + c2⃗x

3.1.4 Dimensions of Spaces

Depending on the vector space, we can determine the dimension of the space. Some examples are shown
below:

Key: Space ⇒ Dimension

Rn ⇒ n

Rm×n ⇒ mn

F ⇒ ∞

Pk ⇒ k + 1   (a general element of Pk is a0 + a1 x + ... + ak x^k)

Z = {0} ⇒ 0

3.1.5 Subspaces

A subspace is a subset of a vector space that is itself a vector space.

Examples of Subspaces of Space V

- {0}
- The whole space V
- A line through the origin
- A plane through the origin

Subspaces Inherit the Properties of the Hosting Space:

Subspace = subset U ⊆ V of a vector space (including ⃗0) that satisfies:
1. ⃗v + w⃗ ∈ U for any ⃗v, w⃗ ∈ U
2. c⃗v ∈ U for any ⃗v ∈ U, c ∈ R

i.e., the subspace is also closed under linear combinations (addition and scalar multiplication).
The subspace U inherits the rules of the "hosting" space V.
⇒ Closedness under linear combinations is enough.

All Subspaces of R3

- The zero subspace {0} = {(0, 0, 0)}
- Any line through (0, 0, 0)
- Any plane through (0, 0, 0)
- The whole space R3

Note: A subspace containing ⃗v and w⃗ must contain all linear combinations c⃗v + dw⃗.

3.1.6 Span of a Space

Given that V is a space, we can imagine that S = {⃗v1, ..., ⃗vN} ⊂ V is a set of vectors.

From S, we can form SS = all linear combinations of vectors in S:

SS = {a1⃗v1 + a2⃗v2 + ... + aN⃗vN}

We know that SS is a subspace of V, and that:

SS is the smallest subspace of V that contains S.

We call SS the subspace spanned by the vectors ⃗v1, ..., ⃗vN. So:

SS = span{⃗v1, ..., ⃗vN}

   = {a1⃗v1 + a2⃗v2 + ... + aN⃗vN | a1, ..., aN ∈ R}

3.1.7 Using Span to Describe the Subspace

We can use the span to describe a subspace. We simply look at the vectors in the spanning set; each
independent vector contributes one dimension to the subspace. For example:

span{(1, 2)} = a line in R2

span{(1, 2), (−3, −6)} = a line in R2, because the vectors (1, 2) and (−3, −6) are dependent!

span{(1, 0), (0, 1)} = R2

span{(1, 0), (0, 1), (1, 1)} = R2

span{(1, 0, 0), (0, 1, 0), (0, 0, 1)} = R3

span{(1, 0, 0), (0, 1, 0), (1, 1, 0)} = the xy plane in R3: the vectors only contribute 2 dimensions!
 

3.1.8 Column Space of a Matrix

Previously, we have been talking about vector spaces.

Consider an m × n matrix A. The matrix A has columns ⃗a1, ⃗a2, ..., ⃗an, as shown below:

$A = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \end{bmatrix}$

Now, consider all of the linear combinations of the columns of A. This would be shown as:

x1 a⃗1 + x2 a⃗2 + ... + xn a⃗n

Remember, we defined the set of all linear combinations of a collection of vectors as a span. The linear combinations above also span a space! These two statements are equivalent:

{x1⃗a1 + x2⃗a2 + ... + xn⃗an | x1, ..., xn ∈ R} = span {⃗a1, ⃗a2, ..., ⃗an}

The span described here is known as the column space of the matrix A:

C(A) = span {a1 , a2 , ..., an }

We use the notation C(A) to define the column space of a matrix A.


 
You can also see that A = [⃗a1 ⃗a2 ... ⃗an] ∈ Rm×n. Given the linear combinations x1⃗a1 + x2⃗a2 + ... + xn⃗an from above, we can collect the coefficients x1, x2, ..., xn into a single vector ⃗x. Then, we can say that:

C(A) = span {a⃗1 , ..., a⃗n } = {A⃗x : ⃗x ∈ Rn }. This leads to the following note:

Note: C(A) is a subspace of Rm , not Rn !

Column Space Example

Find the column space C(A) of the matrix A shown below:

$A = \begin{bmatrix} 1 & 0 & 0 & -2 \\ 2 & 0 & 0 & -4 \\ -1 & 1 & 0 & 2 \end{bmatrix}$

The column space can be written as:

$C(A) = \left\{ x_1 \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} + x_3 \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ -4 \\ 2 \end{bmatrix} \right\}$

$= \left\{ t_1 \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + t_2 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$

You can see that the simplification above can occur because the last two columns do not contribute: the third column is the zero vector and the fourth column is −2 times the first.

This is also equivalent to:

$= \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$

3.1.9 Linear System and the Column Space

Reminder: the equation A⃗x = ⃗b is solvable if and only if ⃗b is a linear combination of the columns of the
matrix A. That is,
⃗b = x1 a⃗1 + x2 a⃗2 + ... + xn a⃗n

Remember, a⃗1 ...a⃗n are the columns of A!

Therefore, we can say that:


⃗b ∈ C(A)

Altogether, this means that A⃗x = ⃗b is solvable IF AND ONLY IF ⃗b belongs to the column space of A.

3.1.10 Conclusions about the Column Space

- C(A) = the space spanned by the columns of A
- C(A) = all linear combinations of columns of A
- C(A) = all vectors A⃗x
- A⃗x = ⃗b has a solution IFF ⃗b ∈ C(A)
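These conclusions can be checked numerically. The following is a minimal NumPy sketch (not part of the original notes) that uses the example matrix from Section 3.1.8; it relies on the fact that ⃗b ∈ C(A) exactly when appending ⃗b to A does not increase the rank.

```python
import numpy as np

# Matrix from the column space example above
A = np.array([[1, 0, 0, -2],
              [2, 0, 0, -4],
              [-1, 1, 0, 2]])

# dim C(A) = rank of A (number of independent columns)
print(np.linalg.matrix_rank(A))     # 2

# b is in C(A) exactly when appending b does not increase the rank
b_in = np.array([3, 6, -2])         # 3*(column 1) + 1*(column 2), so solvable
b_out = np.array([1, 0, 0])         # not a combination of the columns
for b in (b_in, b_out):
    solvable = (np.linalg.matrix_rank(np.column_stack([A, b]))
                == np.linalg.matrix_rank(A))
    print(solvable)                 # True, then False
```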

3.2 Null Space of A

3.2.1 Definition of the Null Space

The null space of a matrix A is the space of all solutions ⃗x where A⃗x = ⃗0.
We can then more formally say that: N(A) = {⃗x | A⃗x = ⃗0}

We use the notation N (A) to denote the null space of the matrix A.

We also call the null space N (A) the kernel of matrix A.

Note: If the matrix A is m × n, then:
- ⃗x ∈ Rn
- N(A) is a subspace of Rn
- C(A) is a subspace of Rm

3.2.2 Null Space is a Subspace

We can prove that the null space of A is also a subspace.

1. Closed under addition:

⃗x, ⃗y ∈ N (A)

⇒ A⃗x = A⃗y = ⃗0

⇒ A (⃗x + ⃗y ) = ⃗0

⇒ ⃗x + ⃗y ∈ N (A)

2. Closed under multiplication:

⃗x ∈ N (A), c ∈ R

⇒ A⃗x = ⃗0

⇒ A (c⃗x) = c (A⃗x) = c⃗0 = ⃗0

⇒ c⃗x ∈ N (A)

∴ N (A) is a subspace.

There are two main scenarios for calculating the null space of a matrix A, depending on whether or not A is invertible.

3.2.3 Calculating the Null Space of an Invertible Matrix

If a matrix A is invertible, then it is really easy to find the null space using the following rule:

If A is invertible, then N(A) = {⃗0}.

That’s it!

Example

$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ Easy!

N(A) = {⃗0}

Not much to remember here.

3.2.4 Calculating Null Space of a Singular Matrix (Not Invertible)

If a matrix A is NOT invertible (i.e., singular ), the process is less trivial, but the following steps can be
followed.

To show the steps to calculate this, we will calculate the null space using the following example. Suppose that:

$A = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix}$

1. First, find the reduced row echelon form (RREF) of the matrix A.

$\operatorname{rref}\left(\begin{bmatrix} 1 & 2 & 2 & 4 \\ 3 & 8 & 6 & 16 \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{bmatrix} = R$

2. Now, select the pivot columns. Pivot columns are columns containing a single 1 and zeros elsewhere, exactly as they would appear in the I matrix. For example, columns 1 and 2 in R are pivot columns.

3. All columns that are not pivot columns are free columns. In this case, columns 3 and 4 in R are free columns.

Note: If A is wide (i.e., has more columns than rows, n > m), then A will have free columns.

4. Now, create one special-solution vector ⃗s for each free column. Start by placing a 1 in the position of that free column and a 0 in the positions of the other free columns. In this case:

$\vec{s}_1 = \begin{bmatrix} \ast \\ \ast \\ 1 \\ 0 \end{bmatrix}, \quad \vec{s}_2 = \begin{bmatrix} \ast \\ \ast \\ 0 \\ 1 \end{bmatrix}$

5. Now, look at the R matrix. For each special solution, fill the pivot-variable positions (marked ∗) with the entries of R's free columns, taken from the corresponding pivot rows, with the sign changed. So:

$\vec{s}_1 = \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \vec{s}_2 = \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix}$

6. These special solutions ⃗s1 and ⃗s2 span the null space N(A). So, we can say that:

$N(A) = \operatorname{span}\{\vec{s}_1, \vec{s}_2\} = \operatorname{span}\left\{ \begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -2 \\ 0 \\ 1 \end{bmatrix} \right\}$

Note: The dimension of N (A) = Number of free columns in A.
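Assuming SymPy is available, the special solutions found above can be reproduced directly; this is a small illustrative sketch, not part of the original notes:

```python
import sympy as sp

# Matrix from the worked example above
A = sp.Matrix([[1, 2, 2, 4],
               [3, 8, 6, 16]])

# rref() returns (R, pivot_column_indices)
R, pivots = A.rref()
print(R)        # Matrix([[1, 0, 2, 0], [0, 1, 0, 2]])
print(pivots)   # (0, 1)  -> columns 3 and 4 are free

# nullspace() returns a basis of N(A): the special solutions
for s in A.nullspace():
    print(s.T)              # [-2, 0, 1, 0] and [0, -2, 0, 1]
    print((A * s).T)        # [0, 0] in both cases
```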

3.2.5 Rank of a Matrix

The rank of a matrix A is the number of pivots that A has.

The number of pivots in a matrix are revealed through Gauss-Jordan Elimination, or finding the RREF of
A.

The rank of A is denoted r(A).

3.2.6 Rank One Matrices

Some special matrices exist where the matrix’s rank r(A) = 1.

These matrices have only one pivot.

For example:

$A = \begin{bmatrix} 2 & 4 & -2 \\ 1 & 2 & -1 \\ 3 & 6 & -3 \end{bmatrix}$

In this case, r(A) = 1!

- Every row in the matrix is a multiple of the first row, and
- Every column is a multiple of the first column!

We can actually represent rank one matrices as the outer product of two vectors: the first column of A, and the first row of R, where R = rref(A). For example:

$A = \begin{bmatrix} 2 & 4 & -2 \\ 1 & 2 & -1 \\ 3 & 6 & -3 \end{bmatrix} \rightarrow R = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$

So then, we can say that:

$A = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \begin{bmatrix} 1 & 2 & -1 \end{bmatrix}$

You can confirm this by computing the outer product.
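Following that suggestion, here is a small NumPy check (an illustration, not part of the original notes):

```python
import numpy as np

u = np.array([[2], [1], [3]])       # first column of A
vT = np.array([[1, 2, -1]])         # first row of R = rref(A)

A = u @ vT                          # outer product u v^T
print(A)                            # [[ 2  4 -2], [ 1  2 -1], [ 3  6 -3]]
print(np.linalg.matrix_rank(A))     # 1

# Any vector orthogonal to v lies in N(A)
x = np.array([2, -1, 0])            # v . x = 1*2 + 2*(-1) + (-1)*0 = 0
print(A @ x)                        # [0 0 0]
```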

So therefore, we can conclude that every rank one matrix is of the form A = ⃗u⃗v T .

In addition, the null space of a rank one matrix includes all of the vectors orthogonal to ⃗v .

Proof :

A⃗x = ⃗0. Since A = ⃗u⃗vT, then ⃗u⃗vT⃗x = ⃗0.

This implies that ⃗u(⃗vT⃗x) = ⃗0, where ⃗vT⃗x is a scalar! So either ⃗u = ⃗0 or ⃗vT⃗x = 0. Only the latter case is interesting (otherwise A = ⃗u⃗vT would be the zero matrix). So:

This implies that ⃗vT⃗x = 0.

∴ N (A) = the vectors orthogonal to ⃗v .

3.3 Complete Solution to A⃗x = ⃗b

3.3.1 Homogeneous System of Linear Equations

A homogeneous system of linear equations is one in which all of the constant terms are zero. A
homogeneous system always has at least one solution, namely the zero vector.

Note: When a row operation is applied to a homogeneous system, the new system is still homoge-
neous!

Remember, the equation A⃗x = ⃗b represents a system of linear equations. Since a homogeneous system of linear equations always has at least one solution, namely the zero vector, we can reword this in math notation as the following:

Math notation:
For the homogeneous system A⃗x = ⃗0, the vector ⃗x = ⃗0 is always a solution; the interesting question is whether nonzero solutions ⃗x exist.

How do we find that ⃗x?

Well, recall that the null space of a matrix A is the space of all solutions ⃗x where A⃗x = ⃗0!

Therefore, to find that ⃗x, we just have to find a vector ⃗x in the null space (find ⃗x ∈ N (A)).

We can do this by simply finding a basis of the null space N(A). When we solved for the null space of a singular matrix in Section 3.2, we found special solutions ⃗s1, ..., ⃗sk that span the null space.

These special solutions form a basis for the null space. We then attach the coefficients of the free variables to the corresponding special solutions and form a linear combination. Any such vector lies in N(A).

Example

Given the following matrix A:

$A = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 1 & 3 & 1 & 6 \end{bmatrix}$

First, we must find R, which is the reduced row echelon form of matrix A.

$A = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 1 & 3 & 1 & 6 \end{bmatrix} \rightarrow R = \begin{bmatrix} 1 & 3 & 0 & 2 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \end{bmatrix}$

Now, we know that columns 1 and 3 are pivot columns, and columns 2 and 4 are free columns.

So, we can begin to build one special vector per free column. First, we fill in the values 1 and 0 in the free-variable positions (columns 2 and 4):

$\vec{s}_1 = \begin{bmatrix} \ast \\ 1 \\ \ast \\ 0 \end{bmatrix}, \quad \vec{s}_2 = \begin{bmatrix} \ast \\ 0 \\ \ast \\ 1 \end{bmatrix}$

Then, we fill in the rest of the values as described in Section 3.2. To do this, we:

- Set one free-variable slot to 1 and the other free-variable slot to 0.
- Fill the pivot-variable slots with the entries of R's free columns, with the sign changed.

$\vec{s}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{s}_2 = \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

These vectors are special solutions! Now, look at the entry where the value 1 is located and match it to the corresponding free column, and hence to the corresponding entry of the vector ⃗x (in the original equation A⃗x = ⃗b).

In this case, we attach the coefficient x2 to the vector ⃗s1, and x4 to ⃗s2.

Now, create a linear combination:

$x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

We can now say that the linear combination above lies in the null space N(A):

$x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix} \in N(A)$

No matter the value of the coefficients x2 and x4 , the linear combination will always exist in
the null space N (A)!

We can also say that ⃗xn is the vector in the null space given by the linear combination shown above:

$\vec{x}_n = x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

Finally, we reach the solution! That is,

$A\vec{x}_n = \vec{0}$ for any choice of $x_2, x_4 \in \mathbb{R}$, where $\vec{x}_n = x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$.

3.3.2 Non-Homogeneous Systems of Linear Equations

Above, we talked about solving homogeneous systems of linear equations, where the solution is the zero
vector.

What if we are trying to find the complete solution for a system where ⃗b ̸= ⃗0? That is a non-homogeneous
system!

We can follow a similar format to above, however the process is a bit more involved. It is easiest if this is
done through an example:

Example

Given the following augmented matrix [A ⃗b]:

$\begin{bmatrix} A \mid \vec{b} \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 1 & 3 & 1 & 6 & 7 \end{array}\right]$

First, we must find [R d⃗], which is the reduced row echelon form of the augmented matrix [A ⃗b]:

$\begin{bmatrix} A \mid \vec{b} \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 1 & 3 & 1 & 6 & 7 \end{array}\right] \rightarrow \begin{bmatrix} R \mid \vec{d} \end{bmatrix} = \left[\begin{array}{cccc|c} 1 & 3 & 0 & 2 & 1 \\ 0 & 0 & 1 & 4 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$

Now, we know that columns 1 and 3 are pivot columns, and columns 2 and 4 are free columns.

The next step now is a bit different than before. Before trying to find the special solutions, we first want to

find a particular solution x⃗p , where Rx⃗p = d.

To do this, we:

- Set the free-variable slots to 0.
- Read the pivot-variable values from d⃗.

These steps are shown below:

$\vec{x}_p = \begin{bmatrix} \ast \\ 0 \\ \ast \\ 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix}$

This is our particular solution! It also should be verifiable that Rx⃗p = d⃗ and that Ax⃗p = ⃗b.


Note: A solution only exists if every zero row in R also has a 0 in d⃗!

Now that we know the particular solution, we can solve for our special solutions.

To do this, we follow the same steps above, which are:



- Set one free-variable slot to 1 (one at a time) and the other free-variable slots to 0.
- Fill the pivot-variable slots with the entries of R's free columns, with the sign changed.

$\vec{s}_1 = \begin{bmatrix} \ast \\ 1 \\ \ast \\ 0 \end{bmatrix}, \ \vec{s}_2 = \begin{bmatrix} \ast \\ 0 \\ \ast \\ 1 \end{bmatrix} \;\rightarrow\; \vec{s}_1 = \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \ \vec{s}_2 = \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

Just like before, we look at the entry where the value 1 is located and match it to the corresponding free column, and hence to the corresponding entry of the vector ⃗x (in the original equation A⃗x = ⃗b).

In this case, we attach the coefficient x2 to the vector ⃗s1, and x4 to ⃗s2.

Now, create a linear combination:

$\vec{x}_n = x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

This is the special solution!

Finally, we can create the final, complete solution!

The complete solution ⃗x = x⃗p + x⃗n .

Finally, we reach the solution! That is,

$A(\vec{x}_p + \vec{x}_n) = \vec{b}$ for any choice of $x_2, x_4 \in \mathbb{R}$, where:

$\vec{x}_p = \begin{bmatrix} 1 \\ 0 \\ 6 \\ 0 \end{bmatrix} \quad \text{and} \quad \vec{x}_n = x_2 \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -2 \\ 0 \\ -4 \\ 1 \end{bmatrix}$

- Why is this the solution?

Well, take a look at the equation A (x⃗p + x⃗n ) = ⃗b.

We know that Ax⃗p = ⃗b and that Ax⃗n = ⃗0.

Therefore, if A⃗x = ⃗b:

A (x⃗p + x⃗n ) = Ax⃗p + Ax⃗n = ⃗b + ⃗0 = ⃗b

∴ (x⃗p + x⃗n ) = ⃗x.
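The complete solution for this example can be verified numerically. This is a minimal NumPy sketch (not part of the original notes), reusing ⃗xp and the special solutions from above:

```python
import numpy as np

A = np.array([[1, 3, 0, 2],
              [0, 0, 1, 4],
              [1, 3, 1, 6]])
b = np.array([1, 6, 7])

x_p = np.array([1, 0, 6, 0])                 # particular solution from above
s1 = np.array([-3, 1, 0, 0])                 # special solutions spanning N(A)
s2 = np.array([-2, 0, -4, 1])

print(A @ x_p)                               # [1 6 7] = b
for x2, x4 in [(0, 0), (2, -1), (10, 3)]:    # any choice of free variables
    x = x_p + x2 * s1 + x4 * s2
    print(np.allclose(A @ x, b))             # True every time
```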

3.3.3 Full Column Rank

Full column rank occurs when r(A) = n.

This only happens if A is square or tall. i.e., every column has a pivot.

In this case, there are no free columns / variables and N (A) = {0}.

If A⃗x = ⃗b has a solution, there is only one solution.

3.3.4 Full Row Rank

Full row rank occurs when r(A) = m.

This only happens if A is square or wide. i.e., every row has a pivot.

In this case, there are no zero rows.

Also, C(A) = Rm

A⃗x = ⃗b has a solution for every ⃗b.

There are n − r = n − m special solutions in N (A).

If m < n, we say that A⃗x = ⃗b is underdetermined.

3.3.5 Relationships Between r, n, and m.

3.4 Independence, Basis, and Dimension

3.4.1 Linear Independence

We can review the meaning of linear independence. Recall that:

Vectors v⃗1 , ..., v⃗n are linearly independent if the only combination of vectors that gives ⃗0 is:

0v⃗1 + ... + 0v⃗n = 0.

This definition can be extended. If x1 v⃗1 + x2 v⃗2 + ... + xn v⃗n = ⃗0, this implies that x1 = x2 = ... = xn = 0. If
this is the case, then v⃗1 , ..., v⃗n are linearly independent.

If there exists a non-zero combination where x1 v⃗1 + x2 v⃗2 + ... + xn v⃗n = ⃗0, then v⃗1 , ..., v⃗n are linearly
dependent.

We can further extend this. All of these points actually mean the same thing, but they are also written in
different ways. So:

The columns of A, where A = [⃗a1 ⃗a2 ... ⃗an], are linearly independent if the only solution to A⃗x = ⃗0 is ⃗x = ⃗0.

So therefore:

- The columns of A are independent IFF N(A) = {0}.
- The columns of A are independent IFF r = n.
- The columns of A are dependent IFF A⃗x = ⃗0 has a nonzero solution.
- Any n vectors in Rm must be dependent if n > m.

3.4.2 Basis

The basis of any vector space V is a set of linearly independent vectors that span the space V .

We can more formally define a basis:


If v⃗1 , ..., v⃗n is a basis for the vector space V , then for any ⃗v ∈ V there is a unique combination of
v⃗1 , ..., v⃗n that gives ⃗v .

- Proof of Uniqueness

Suppose that v⃗1, ..., v⃗n is a basis for the vector space V and that some ⃗v ∈ V can be written as two combinations:

a1 v⃗1 + a2 v⃗2 + ... + an v⃗n = ⃗v

b1 v⃗1 + b2 v⃗2 + ... + bn v⃗n = ⃗v

Therefore, if we take the difference, we get that:

(a1 − b1 ) v⃗1 + (a2 − b2 ) v⃗2 + ... + (an − bn ) v⃗n = ⃗v − ⃗v

and that creates the following linear combination:

(a1 − b1 ) v⃗1 + (a2 − b2 ) v⃗2 + ... + (an − bn ) v⃗n = ⃗0

Remember, since v⃗1 , ..., v⃗n is a basis for the vector space, this implies that v⃗1 , ..., v⃗n are independent. The
linear combination of independent vectors can only equal ⃗0 if and only if each coefficient is equal to
0. Therefore,

(a1 − b1 ) = 0, (a2 − b2 ) = 0 , ... , (an − bn ) = 0

This implies that:

a1 = b1 , a2 = b2 , ... , an = bn

Therefore, we have proven uniqueness. That is,

If v⃗1 , ..., v⃗n is a basis for the vector space V , then for any ⃗v ∈ V there is a unique combination of v⃗1 , ..., v⃗n
that gives ⃗v .

The standard basis vectors are simply the columns of the identity matrix I. For example:

- In R2, the columns of $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ are the standard basis.
- In R3, the columns of $I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ are the standard basis.
- and so on. . .

It is also very important to note that the basis of a space is NOT unique. This means that there can be more than one basis of a space. The uniqueness proof above shows that a single vector in the vector space is a unique linear combination of a given set of basis vectors, not that there is only one basis for the space!

Also, the columns of every invertible matrix A of dimension n × n are a basis for Rn.

Why?

- Independence: The only solution to A⃗x = ⃗0 is ⃗x = A−1⃗0 = ⃗0.
- Span Rn: A is invertible → C(A) = Rn.

This also implies that the vectors v⃗1 , ..., v⃗n are a basis for Rn IFF they are the columns of an n × n dimension
invertible matrix.

Also, if A is singular, then the pivot columns of A are a basis for C(A).

3.4.3 Dimension of a Vector Space

We can define that the dimension of the vector space V is the number of vectors in the space’s basis.

All of the possible bases for a vector space V have the same number of vectors! Therefore, the dimension of the space is the same no matter which basis is chosen.

- Any n independent vectors in Rn must span Rn. So, they are a basis.
- Any n vectors that span Rn must be independent. So, they are a basis.
- If the n columns of A are independent, they must span Rn. So, A⃗x = ⃗b is solvable for any ⃗b.
- If the n columns of A span Rn, they are independent. So, A⃗x = ⃗b has only one solution.

3.4.4 Applying these Concepts to Other Spaces

The concepts of span, independence, basis and dimension apply to spaces other than column vectors as well.

For example, the space R2×2 represents the space of all 2 × 2 matrices.

We can find the standard basis:

$A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \ A_2 = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \ A_3 = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \ A_4 = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$

So then, we can say that the dimension of the space R2×2 is 4.

3.5 Dimensions of the Four Subspaces

3.5.1 Four Spaces of a Matrix

Notice the difference between m and n in all of the following examples:

The column space C(A) is a subspace of Rm.

The row space C(AT ) = the space spanned by the rows of A and is a subspace of Rn .

The null space N (A) is a subspace of Rn .

The left nullspace N (AT ) is a subspace of Rm .

- ⃗y ∈ N(AT) ←→ AT⃗y = ⃗0 ←→ ⃗yTA = ⃗0T

3.5.2 Fundamental Theorem of Linear Algebra, Part 1

We now can understand Part 1 of the Fundamental Theorem of Linear Algebra, which restates but
also adds on to the notes above.

Fundamental Theorem of Linear Algebra - Part 1:


The column space C(A) is a subspace of Rm.
- dim C(A) = r
The row space C(AT) is a subspace of Rn.
- dim C(AT) = r
The null space N(A) is a subspace of Rn.
- dim N(A) = n − r
The left nullspace N(AT) is a subspace of Rm.
- dim N(AT) = m − r
This all leads to two conclusions:
dim N(A) + dim C(AT) = n
dim C(A) + dim N(AT) = m
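These dimension counts can be checked numerically. Below is a small sketch (assuming NumPy and SciPy are available; it is an illustration, not part of the original notes) using the matrix from the Section 3.3 example:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 3, 0, 2],
              [0, 0, 1, 4],
              [1, 3, 1, 6]])        # m = 3, n = 4, rank r = 2

m, n = A.shape
r = np.linalg.matrix_rank(A)

print(r, n - r, m - r)                      # dim C(A), dim N(A), dim N(A^T)
print(null_space(A).shape[1] == n - r)      # True
print(null_space(A.T).shape[1] == m - r)    # True
```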

4 Orthogonality

4.1 Orthogonality of the Four Spaces

4.1.1 Definition of Orthogonal Vectors

Two vectors ⃗v and ⃗u are orthogonal if they are perpendicular.

Mathematically, we can show that the vectors are orthogonal if ⃗v · ⃗u = 0.

Equivalently, if ⃗v T ⃗u = 0.
It is also good to note that if ⃗v and ⃗u are orthogonal, then || ⃗v + ⃗u ||2 = || ⃗v ||2 + || ⃗u ||2. This is derived from the Pythagorean theorem!

4.1.2 Pairwise Orthogonal

A set of vectors ⃗v1, ..., ⃗vk is pairwise orthogonal if the dot product of every pair of distinct vectors is 0.

We can then more formally say that: A set of vectors ⃗v1 , ..., ⃗vk are pairwise orthogonal if
⃗vi · ⃗vj = 0 for all i ̸= j and i, j = 1, 2, .., k.

- Proof that nonzero pairwise orthogonal vectors are linearly independent

Suppose a1⃗v1 + ... + ak⃗vk = ⃗0.

Take the inner product with ⃗vj:

⃗vjT(a1⃗v1 + ... + ak⃗vk) = a1⃗vjT⃗v1 + ... + ak⃗vjT⃗vk = aj⃗vjT⃗vj

since every other term vanishes by orthogonality. Recall the fact that ⃗vjT⃗vj = || ⃗vj ||2.

This implies that:

aj || ⃗vj ||2 = 0

Since the vectors are nonzero, || ⃗vj ||2 ≠ 0.

Therefore, it has to be the case that aj = 0, for every j.

4.1.3 Orthogonal Subspaces

Vector subspaces V and U are orthogonal subspaces if every vector in V is orthogonal to every vector in U.

We can then more formally say that: Vector subspaces V and U are orthogonal subspaces if
⃗v T ⃗u = 0 for all ⃗v ∈ V and ⃗u ∈ U .

4.1.4 Orthogonal Matrix Spaces

One fact in linear algebra is that the null space N (A) and the row space C(AT ) of a matrix A are orthogonal.

- Why?

We can derive this from the base equation A⃗x = ⃗0, written row by row:

$A\vec{x} = \begin{bmatrix} (\text{row } 1) \\ \vdots \\ (\text{row } m) \end{bmatrix} \vec{x} = \vec{0}$

So, every ⃗x ∈ N (A) is perpendicular to every row of A.

∴ N (A) is orthogonal to the row space of A.

We can also argue this algebraically:

If ⃗x ∈ N (A) then A⃗x = ⃗0.

If ⃗y ∈ C(AT ), then ⃗y = AT ⃗z for some ⃗z!

Therefore:

⃗y T ⃗x = ⃗zT A⃗x = ⃗zT ⃗0 = 0.

4.1.5 Orthogonal Complements

The orthogonal complement of a subspace V contains every vector that is perpendicular (orthogonal) to V!

We use the notation V⊥ to denote the orthogonal complement of V.

We can extend this as well.

The orthogonal complement of a subspace V ⊂ U contains every vector in the hosting space U that is
perpendicular (orthogonal) to V .

We can then more formally say that: V⊥ = {⃗u ∈ U : ⃗uT⃗v = 0 for all ⃗v ∈ V}
We can also say that:

- The sum of the dimensions of V and V⊥ is the dimension of U: dim V + dim V⊥ = dim U
- The direct sum of V and V⊥ is the space U: V ⊕ V⊥ = U

4.1.6 Fundamental Theorem of Linear Algebra, Part 2

With the knowledge of orthogonality, we now know what we need to learn the second part of the Fundamental
Theorem of Linear Algebra.

Fundamental Theorem of Linear Algebra - Part 2:
The null space N(A) is the orthogonal complement of the row space C(AT) in Rn.
- N(A) = C(AT)⊥
The left null space N(AT) is the orthogonal complement of the column space C(A) in Rm.
- N(AT) = C(A)⊥

4.1.7 Action of A

Given this, every vector ⃗x in the space Rn can be decomposed as:

⃗x = ⃗xr + ⃗xn

x⃗r = Row space component of ⃗x

⃗xn = Null space component of ⃗x

This expands the equation A⃗x = ⃗b:

A⃗x = A (⃗xr + ⃗xn ) = A⃗xr + A⃗xn = A⃗xr + ⃗0 = A⃗xr , so:

A⃗xr = ⃗b.

4.1.8 Example Solving for the Orthogonal Complement of S

Say that S is a space that is spanned by a single vector as shown below:

$S = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}$
What is S ⊥ , the orthogonal complement of S?

First, we need to know what S⊥ means: it is the space of all vectors ⃗x ∈ R3 that are perpendicular to every vector ⃗s ∈ S. So, we can write the following:

$S^{\perp} = \left\{ \vec{x} \in \mathbb{R}^3 : \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}^{T} \vec{x} = 0 \right\}$
 

We can then find the reduced row echelon form:

$\operatorname{rref}\left(\begin{bmatrix} 2 & 1 & 3 \end{bmatrix}\right) = \begin{bmatrix} 1 & \tfrac{1}{2} & \tfrac{3}{2} \end{bmatrix}$

We can now follow the process to find the special vectors that span the null space:

$S^{\perp} = N\left(\begin{bmatrix} 2 & 1 & 3 \end{bmatrix}\right) = \operatorname{span}\left\{ \begin{bmatrix} -\tfrac{1}{2} \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -\tfrac{3}{2} \\ 0 \\ 1 \end{bmatrix} \right\}$

Therefore, we can conclude that:

$S = \operatorname{span}\left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right\}, \quad S^{\perp} = \operatorname{span}\left\{ \begin{bmatrix} -\tfrac{1}{2} \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -\tfrac{3}{2} \\ 0 \\ 1 \end{bmatrix} \right\}$

We can also say that, using the Fundamental Theorem of Linear Algebra:

3 = dim R3 = dim S + dim S ⊥ = 1 + 2

Note: We can use the formula dim Rn = dim S + dim S⊥ to relate the dimension of the hosting space, the subspace S, and the subspace S⊥ of all vectors orthogonal to S.

Lastly, we can also say that any vector ⃗x ∈ R3 can be decomposed into:

⃗x = ⃗xs + ⃗xs⊥

meaning that any vector is the sum of a vector from S and a vector from S⊥.
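The example above can be reproduced with SciPy; this is an illustrative sketch and not part of the original notes. Note that null_space returns an orthonormal basis rather than the special solutions shown above, but it spans the same plane S⊥.

```python
import numpy as np
from scipy.linalg import null_space

s = np.array([[2, 1, 3]])            # S = span{(2, 1, 3)}

# S-perp is the null space of the 1x3 matrix [2 1 3]
S_perp = null_space(s)               # 3x2 matrix; columns form an orthonormal basis of S-perp
print(S_perp.shape[1])               # 2  -> dim S + dim S_perp = 1 + 2 = 3
print(np.allclose(s @ S_perp, 0))    # True: every basis vector is orthogonal to (2, 1, 3)
```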

4.1.9 Extending Part 2 of the Fundamental Theorem of Linear Algebra

We can also extend what we mentioned above.

Fundamental Theorem of Linear Algebra - Part 2:

The null space N(A) is the orthogonal complement of the row space C(AT) in Rn.
- N(A) ⊕ C(AT) = Rn
The left null space N(AT) is the orthogonal complement of the column space C(A) in Rm.
- N(AT) ⊕ C(A) = Rm

4.2 Projections

4.2.1 Definition of Projections

The projection of a vector onto a line, axis, or subspace is the closest point on that line, axis, or subspace to the vector.

4.2.2 Projection onto an Axis or Plane


 
For example, take the vector $\vec{b} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$.

1. Project the vector onto the z axis.

Answer: $\vec{p}_1 = \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix}$

2. Project the vector onto the xy plane.

Answer: $\vec{p}_2 = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$

4.2.3 Projection Matrices

We can use projection matrices that, when acted on a vector, can find the projection of that vector.

We can say that, for any vector ⃗b and its projection p⃗, there exists a projection matrix P where P⃗b = p⃗.

For example, using the problem from above where $\vec{b} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix}$:

1. Project the vector onto the z axis.

Answer: $\vec{p}_1 = \begin{bmatrix} 0 \\ 0 \\ 4 \end{bmatrix}$, where $P_1 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. This is true because $P_1\vec{b} = \vec{p}_1$.

2. Project the vector onto the xy plane.

Answer: $\vec{p}_2 = \begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix}$, where $P_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. This is true because $P_2\vec{b} = \vec{p}_2$.

Note: It is also pretty cool to note that if P1 + P2 = I, then C(P1) and C(P2) are orthogonal complements! This makes sense. Think about the examples above.
- The column space C(P1) is simply the span of the last column, which is the z axis.
- The column space C(P2) is simply the span of the first two columns, which is the xy plane.
If you think about it, the z axis is perpendicular to the xy plane! Therefore, the column spaces C(P1) and C(P2) are orthogonal complements.

The note above extends into the following rules:

First, the vector ⃗b can be decomposed into the sum of the two projections p⃗1 and p⃗2.

So, if ⃗b = p⃗1 + p⃗2:

- P1 + P2 = I

and therefore:

- C(P1) = C(P2)⊥
- R3 = C(P1) ⊕ C(P2)

4.2.4 Projection onto a Line

The projection of a vector ⃗b onto a line passing through the origin and the direction of vector ⃗a is simply
the shadow.

The shadow , or the projection p⃗ of vector ⃗b, is essentially a scaled-down version of the vector ⃗a.

Because of this, we can say that there is some scalar, denoted as x̂, that scales ⃗a to get p⃗.

Therefore: p⃗ = x̂⃗a.

We can also define an error vector, which is the difference between the original vector ⃗b and the projection vector p⃗. Geometrically, the error is the gap between ⃗b and its shadow p⃗. We denote the error vector as ⃗e, and we know that:

⃗e = ⃗b − p⃗ = ⃗b − x̂⃗a

Geometrically, ⃗e is perpendicular to the vector ⃗a that we project onto.

Therefore, ⃗e and ⃗a are orthogonal — meaning that ⃗aT ⃗e = ⃗0.

We can use this formula to calculate the value of x̂:

$\vec{a}^T\vec{e} = 0$

Since $\vec{e} = \vec{b} - \vec{p} = \vec{b} - \hat{x}\vec{a}$,

$\vec{a}^T(\vec{b} - \hat{x}\vec{a}) = 0$

$\vec{a}^T\vec{b} - \hat{x}\,\vec{a}^T\vec{a} = 0$

$\hat{x}\,\vec{a}^T\vec{a} = \vec{a}^T\vec{b}$

$\hat{x} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}, \quad \text{where } \vec{a}^T\vec{a} \neq 0$

Therefore:

$\vec{p} = \hat{x}\vec{a} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a}$

Calculating the Projection Matrix

Given the equation $\vec{p} = \hat{x}\vec{a} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a}$, we can actually calculate the projection matrix P that turns the vector ⃗b into the projected vector p⃗!

$\vec{p} = \hat{x}\vec{a} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a}$

Rearrange.

$\vec{p} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a} \;\rightarrow\; \vec{p} = \vec{a}\,\dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}$

Since this is all multiplication, rearrange again.

$\vec{p} = \vec{a}\,\dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}} \;\rightarrow\; \vec{p} = \dfrac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}\,\vec{b}$

In this case, this has been rearranged to reveal a projection matrix. This is because $\vec{a}\vec{a}^T$ is a matrix in $\mathbb{R}^{n \times n}$, which is scaled by the scalar $\frac{1}{\vec{a}^T\vec{a}} \in \mathbb{R}$. Then, ⃗b is the original vector.

Thus, $\vec{p} = \left(\dfrac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}\right)\vec{b}$, where $\dfrac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}$ is a matrix. Remember the equation above that $\vec{p} = P\vec{b}$. We can finally conclude that:

$P = \dfrac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}$

In Conclusion:

- $\vec{p} = P\vec{b}$
- $\vec{p} = \hat{x}\vec{a} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a}$ (which implies $\hat{x} = \dfrac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}$)
- $P = \dfrac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}$
- $\vec{e} = \vec{b} - \vec{p} = \vec{b} - \hat{x}\vec{a}$
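These formulas translate directly into code. The following NumPy sketch (an illustration with made-up vectors, not part of the original notes) projects a vector onto a line:

```python
import numpy as np

a = np.array([1.0, 1.0, 1.0])        # direction of the line
b = np.array([2.0, 3.0, 4.0])        # vector to project

x_hat = (a @ b) / (a @ a)            # x_hat = a^T b / a^T a
p = x_hat * a                        # projection of b onto the line through a
e = b - p                            # error vector

P = np.outer(a, a) / (a @ a)         # projection matrix P = a a^T / a^T a
print(p)                             # [3. 3. 3.]
print(np.allclose(P @ b, p))         # True
print(np.isclose(a @ e, 0))          # True: error is perpendicular to a
```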

4.2.5 Projection onto a Subspace

Just like we can project a vector ⃗b onto a line, we can also project ⃗b onto the subspace spanned by given
vectors ⃗a1 , ..., ⃗an .

Alternatively, this is equivalent to finding the combination x̂1⃗a1 + ... + x̂n⃗an that is the closest to the vector ⃗b.

We can say that this is equivalent to finding x̂1⃗a1 + ... + x̂n⃗an = Ax̂ that is closest to ⃗b.

In terms of the error, we want the error ⃗e = ⃗b − p⃗ = ⃗b − Ax̂ to be perpendicular to the subspace spanned by ⃗a1, ..., ⃗an — and also perpendicular to each vector ⃗a1, ..., ⃗an.

From this, we can get the equation:

$\begin{bmatrix} \vec{a}_1^T \\ \vdots \\ \vec{a}_n^T \end{bmatrix} \vec{e} = \vec{0} \;\rightarrow\; \begin{bmatrix} \vec{a}_1^T \\ \vdots \\ \vec{a}_n^T \end{bmatrix} (\vec{b} - A\hat{x}) = \vec{0} \;\rightarrow\; A^T(\vec{b} - A\hat{x}) = \vec{0}$

Then using distributivity,

$A^T(\vec{b} - A\hat{x}) = \vec{0} \;\rightarrow\; A^T\vec{b} - A^TA\hat{x} = \vec{0}$

Finally, we can say that:

AT Ax̂ = AT ⃗b

This is known as the Normal Equation!

Calculating the Projection Matrix

First, we must check the following condition.

If $A^TA$ is invertible, then $(A^TA)^{-1}$ exists and we can use it below.

Therefore, given the Normal Equation $A^TA\hat{x} = A^T\vec{b}$, we can say that:

$\hat{x} = (A^TA)^{-1}A^T\vec{b}$ exists $\in \mathbb{R}^n$.

Then, using $\vec{p} = A\hat{x}$, we can say that:

$\vec{p} = A(A^TA)^{-1}A^T\vec{b}$ exists $\in \mathbb{R}^m$.

Using the equation $\vec{p} = P\vec{b}$, we then can conclude that the projection matrix P is:

$P = A(A^TA)^{-1}A^T$ exists $\in \mathbb{R}^{m \times m}$.

- Example

Project $\vec{b} = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$ onto the subspace spanned by $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}$.

We can say that:

$A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$

$A^TA = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}$

$A^T\vec{b} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}\begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}$

$\hat{x} = (A^TA)^{-1}A^T\vec{b} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}$

Therefore,

$\vec{p} = A\hat{x} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 5 \\ -3 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \\ -1 \end{bmatrix}$

and the projection matrix P is:

$P = A(A^TA)^{-1}A^T = \frac{1}{6}\begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}$
Remember, all of this is based on the assumption that AT A is invertible.

And remember, AT A is invertible IFF A has linearly independent columns.

- Proof

In Conclusion:

- $\vec{p} = P\vec{b}$
- $\vec{e} = \vec{b} - \vec{p} = \vec{b} - A\hat{x}$
- $A^TA\hat{x} = A^T\vec{b}$
- $\hat{x} = (A^TA)^{-1}A^T\vec{b}$
- $\vec{p} = A\hat{x} = A(A^TA)^{-1}A^T\vec{b}$
- $P = A(A^TA)^{-1}A^T$
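The example above can be checked numerically. This is a minimal NumPy sketch (not part of the original notes) that solves the normal equation directly:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # solve the normal equation A^T A x = A^T b
p = A @ x_hat                                # projection of b onto C(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T         # projection matrix

print(x_hat)                                 # [ 5. -3.]
print(p)                                     # [ 5.  2. -1.]
print(np.allclose(A.T @ (b - p), 0))         # True: error is orthogonal to the columns of A
```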

4.3 Least Squares

4.3.1 Finding the Line of Best Fit

Oftentimes, we will want to solve the system of equations A⃗x = ⃗b, but there is no solution!

Imagine that ⃗x holds some model parameters and ⃗b is a set of noisy, scattered output points.

Given this scenario, we might want to instead fit a line to the points rather than get an exact solution.
Generally, the equation for a linear best fit line would be:

b = C + Dt, where b is synonymous to y and t is synonymous to x, and where C is the y-intercept (the
constant) and D is the coefficient for the t term.

Instead, using this basic model, we can begin to turn this into the matrix form A⃗x = ⃗b. Looking at the best-fit line equation, you can see that b = C + Dt can be written as ⃗1C + ⃗tD = ⃗b. The C term is multiplied by the all-ones vector ⃗1 because the constant C enters every equation with coefficient 1.

Now from this, we know the actual variables, what would go in ⃗x, are C and D. So, we can now build our linear system of equations in matrix form!

$\begin{bmatrix} \vec{1} & \vec{t} \end{bmatrix} \begin{bmatrix} C \\ D \end{bmatrix} = \vec{b}$

Remember, ⃗1 ⃗t is a matrix and not a vector because its entries ⃗1 and ⃗t are columns!
 

This form will not necessarily have a solution.

So, you can solve this system (and create a best-fit line) by solving the Normal Equation!
Therefore, instead of solving A⃗x = ⃗b, you would solve AT A⃗x̂ = AT ⃗b.

Note: Remember for this approach, you must either assume or confirm that the columns of A are
independent — otherwise this does not work!

Example

Fit a line to the points (0, 6), (1, 0), and (2, 0).

First, recall the line equation b = C + Dt

Now, also remember that ⃗1C + ⃗tD = ⃗b.

Convert to matrix form:

$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$

This clearly has no solution. But instead, let’s use the Normal Equation AT A⃗x̂ = AT ⃗b!

So, we attempt to instead solve:

$\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}^T \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}^T \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}$

$\begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}\begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}$

Now we solve for x̂.

$\begin{bmatrix} C \\ D \end{bmatrix} = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}^{-1}\begin{bmatrix} 6 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -3 \end{bmatrix}$

Thus, C = 5 and D = −3.

Using the equation b = C + Dt, we can reach our final equation:

b = 5 − 3t
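The same fit can be computed with NumPy; this is a small sketch (not part of the original notes) for the three points used above:

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])   # columns are the 1-vector and the t-vector

# Solve the normal equation A^T A x = A^T b
C, D = np.linalg.solve(A.T @ A, A.T @ b)
print(C, D)                                 # 5.0 -3.0

# np.linalg.lstsq solves the same least-squares problem directly
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                                # [ 5. -3.]
```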

4.3.2 Why A⃗x = ⃗b cannot be Solved (but AT A⃗x̂ = AT⃗b can)

Geometric Justification

We cannot solve A⃗x = ⃗b because ⃗b ∉ C(A), i.e., ⃗b is NOT in the column space of A.

Instead, we find a projection p⃗ ∈ C(A) that is the closest to the vector ⃗b.

p⃗ = projection of ⃗b onto C(A).

That happens when ⃗e = ⃗b − p⃗ = ⃗b − A⃗x is perpendicular to C(A) — or, orthogonal!

That happens when || ⃗e || = || A⃗x̂ − b || is minimized!

4.4 Orthonormal Bases and Gram-Schmidt

4.4.1 Definition of Orthonormal Vectors

Recall that ⃗v1 , ..., ⃗vn are pairwise orthogonal if ⃗vi · ⃗vj = 0 for all i ̸= j and i, j = 1, 2, .., n.

We can extend this to provide another definition.

We can say vectors ⃗q1, ..., ⃗qn are orthonormal if they are unit-length orthogonal vectors.

Vectors ⃗q1, ..., ⃗qn are orthonormal if:

$\vec{q}_i^T\vec{q}_j = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases}$

where the value is 0 when they are orthogonal and 1 because each is a unit vector.

We can further extend this. If ⃗v1, ..., ⃗vn are nonzero pairwise orthogonal vectors, then the vectors $\vec{q}_1 = \frac{\vec{v}_1}{\|\vec{v}_1\|}, ..., \vec{q}_n = \frac{\vec{v}_n}{\|\vec{v}_n\|}$ are orthonormal.

4.4.2 Matrices with Orthonormal Columns

If a matrix Q has orthonormal columns, then we can say that QT Q = I.

Note: In these cases, Q does not have to be square!

Now, if Q is square, we can say that:

QQT = QT Q = I, and that QT = Q−1 .

We call Q an orthogonal matrix.

The columns of an orthogonal matrix Q are an orthonormal basis.

Examples

- Permutation matrices P! This is easy to confirm because each column is a unit vector and all columns are orthogonal.
- Permutation matrices are orthogonal and PT = P−1.
- Rotation matrices R. A proof is shown below.

 
Show that $R = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$ is orthogonal.

Well, we can say that:

$R^TR = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & 0 \\ 0 & \cos^2\theta + \sin^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I.$

Since $R^TR = I$, this implies orthogonality.

Since RT R = I, this implies orthogonality.

4.4.3 Invariance

If Q has orthonormal columns, then multiplying a vector ⃗x by Q:

1. Leaves the length unchanged: || Q⃗x || = || ⃗x ||
2. Leaves angles (inner products) unchanged: (Q⃗x)T(Q⃗y) = ⃗xTQTQ⃗y = ⃗xT⃗y

We can also prove that if Q has orthonormal columns, then || Q⃗x || = || ⃗x || for any vector ⃗x.

- Proof

|| Q⃗x ||2 = (Q⃗x)T(Q⃗x) = ⃗xTQTQ⃗x. Since QTQ = I, then:

⃗xTQTQ⃗x = ⃗xT⃗x = || ⃗x ||2

4.4.4 Finding Least Square (Line of Best Fit) with Orthogonal Matrices

Recall the Normal Equation: that is, the least square solution to A⃗x = ⃗b is:

AT Ax̂ = AT ⃗b

If A has orthonormal columns, then we can say that A → Q. Then, the least squares solution to Q⃗x = ⃗b is
instead QT Qx̂ = QT ⃗b. However, remember that QT Q = I, so this simplifies to:

x̂ = QT ⃗b

So, if Q is square, then x̂ is an EXACT SOLUTION, NOT JUST THE CLOSEST SOLUTION!

That is very cool, and it is also very important to note the distinction.

This is because if Q is square and we set ⃗x = x̂ = QT⃗b, then Qx̂ = QQT⃗b = ⃗b exactly.

4.4.5 Constructing Orthonormal Vectors using Gram-Schmidt

So, how can we construct orthonormal vectors?

We can use the Gram-Schmidt Algorithm.

The Gram-Schmidt Algorithm starts with independent vectors ⃗a, ⃗b, ⃗c. . .

Now, we want to construct orthogonal vectors A⃗, B⃗, and C⃗ that span the same space.

Then, we normalize A⃗, B⃗, and C⃗ to get orthonormal vectors ⃗q1, ..., ⃗qn:

$\vec{q}_1 = \frac{A}{\|A\|}, \quad \vec{q}_2 = \frac{B}{\|B\|}, \quad \vec{q}_3 = \frac{C}{\|C\|}, \ ...$

Constructing A, B, C

First, we say that A = ⃗a.

Then, we want B to be orthogonal to A. Therefore, we start with ⃗b and subtract its projection along A:

$B = \vec{b} - \frac{A^T\vec{b}}{A^TA}A$

We can check that B is orthogonal to A by verifying that ATB = 0; by construction, A and B span the same space as ⃗a and ⃗b.

Then, we want C to be orthogonal to A and B. Therefore, we start with ⃗c and subtract its projections on A and B:

$C = \vec{c} - \frac{A^T\vec{c}}{A^TA}A - \frac{B^T\vec{c}}{B^TB}B$

By construction, span{A, B, C} = span{⃗a, ⃗b, ⃗c}.

From there, finding ⃗q1 , ..., ⃗qn is easy.

In Conclusion

$A = \vec{a}$

$B = \vec{b} - \frac{A^T\vec{b}}{A^TA}A$

$C = \vec{c} - \frac{A^T\vec{c}}{A^TA}A - \frac{B^T\vec{c}}{B^TB}B$

$D = \vec{d} - \frac{A^T\vec{d}}{A^TA}A - \frac{B^T\vec{d}}{B^TB}B - \frac{C^T\vec{d}}{C^TC}C$

. . . and so on.
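A small implementation of this procedure is sketched below (NumPy, with made-up input vectors; not part of the original notes). It orthonormalizes the vectors one at a time and compares the result to np.linalg.qr, whose columns agree with Gram-Schmidt up to sign.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return a matrix whose columns are orthonormal and span the same space."""
    qs = []
    for v in vectors:
        w = v.astype(float).copy()
        for q in qs:
            w -= (q @ v) * q          # subtract the projection onto each earlier q
        qs.append(w / np.linalg.norm(w))
    return np.column_stack(qs)

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])

Q = gram_schmidt([a, b, c])
print(np.allclose(Q.T @ Q, np.eye(3)))         # True: columns are orthonormal

# np.linalg.qr performs the same factorization A = QR (columns match up to sign)
Q_ref, R_ref = np.linalg.qr(np.column_stack([a, b, c]))
print(np.allclose(np.abs(Q), np.abs(Q_ref)))   # True
```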

Linear Combinations Proof

Lastly, we can prove that ⃗a, ⃗b, ⃗c are linear combinations of ⃗q1 , ⃗q2 , ⃗q3 (and vice versa).

That is because there is a matrix connecting A and Q.

Using the equations above, we can say that:

$\begin{bmatrix} \vec{a} & \vec{b} & \vec{c} \end{bmatrix} = \begin{bmatrix} \vec{q}_1 & \vec{q}_2 & \vec{q}_3 \end{bmatrix}\begin{bmatrix} \vec{q}_1^T\vec{a} & \vec{q}_1^T\vec{b} & \vec{q}_1^T\vec{c} \\ 0 & \vec{q}_2^T\vec{b} & \vec{q}_2^T\vec{c} \\ 0 & 0 & \vec{q}_3^T\vec{c} \end{bmatrix}.$ Therefore,

A = QR, where Q is orthogonal and R is an upper triangular matrix.

5 Determinants

5.1 Definition of Determinants


   
Recall that for a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, its inverse can be found with the following: $A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.

In fact, the determinant of matrix A is the value ad − bc, which appears as the denominator of the fraction used to find the inverse.

We can say that the determinant of matrix A can be written as det A, which is also written with the notation
|A| .

Therefore, $\det A = |A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$.

Note: Remember that a matrix A is invertible IF AND ONLY IF |A| ≠ 0.

Also, recall that the determinant of any matrix is the product of its pivots.

Remember that pivots are revealed through elimination. So, given a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, after elimination we get $U = \begin{bmatrix} a & b \\ 0 & d - \frac{c}{a}b \end{bmatrix}$. This process reveals that $a$ and $d - \frac{c}{a}b$ are the pivots of the matrix. Their product $a\left(d - \frac{c}{a}b\right) = ad - bc = |A|$.

5.2 Properties of Determinants

There are a few properties that let us easily find the determinant of a matrix.

5.2.1 Determinant of the Identity Matrix

The first property is easy.

The determinant of the n × n identity matrix I is 1.

$|I| = \begin{vmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{vmatrix} = 1$

5.2.2 Sign Reversal

The determinant changes sign when two rows are exchanged.

Example:

$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = -\begin{vmatrix} c & d \\ a & b \end{vmatrix}$

Because of this property, finding |P | is easy:

- P is a permutation (row exchange) of I.

- |P | = 1 for even number of row exchanges.

- |P | = −1 for odd number of row exchanges.

5.2.3 Linearity

The determinant of a matrix A is a linear function of each row separately.

$\begin{vmatrix} ta & tb \\ c & d \end{vmatrix} = t\begin{vmatrix} a & b \\ c & d \end{vmatrix}$

$\begin{vmatrix} a + a' & b + b' \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a' & b' \\ c & d \end{vmatrix}$

$\begin{vmatrix} a & b \\ tc & td \end{vmatrix} = t\begin{vmatrix} a & b \\ c & d \end{vmatrix}$

$\begin{vmatrix} sa & sb \\ tc & td \end{vmatrix} = st\begin{vmatrix} a & b \\ c & d \end{vmatrix}$

5.2.4 Determinant of Matrix with Duplicate Rows

If two rows of a matrix A are equal, then |A| = 0.

This rule is also pretty simple, and follows up from the rule defined in property 2.

5.2.5 Determinant After Elimination

Elimination, which involves subtracting a multiple of one row from another, leaves the determinant |A|
unchanged.

This follows from rules 3 and 4. Proof:

$\begin{vmatrix} a & b \\ c - la & d - lb \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} + \begin{vmatrix} a & b \\ -la & -lb \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} - l\begin{vmatrix} a & b \\ a & b \end{vmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}$

∴ the determinant of |A| remains unchanged despite elimination.

In conclusion, we can say that:

- Elimination without exchange A → U results in |A| = |U |

- Elimination with exchange P A → U results in |A| = ± |U |

5.2.6 Matrix with a Row of 0

Matrix with a row of zeroes has |A| = 0.

This follows from rules 4 and 5.

5.2.7 Determinant of Triangular Matrices

If a matrix A is triangular, either upper or lower, then the determinant |A| = a11 a22 . . . ann is the product
of diagonal entries.

This can be proved by linearity.

5.2.8 Determinants of Singular and Invertible Matrices

If a matrix A is singular, then |A| = 0.

If a matrix A is invertible, then |A| =
̸ 0.

Why?

- Eliminate A → U

- If A is singular, U will have a zero row and so |A| = |U | = 0.

- If A is invertible, the diagonal of U is nonzero (pivots) and so |A| = |U | = d1 d2 . . . dn ̸= 0.

5.2.9 Determinant of Products

The determinant of the product of two matrices is the product of the determinants of the two matrices. So, |AB| = |A| |B|.

To prove:

- Define a function $D(A) = \frac{|AB|}{|B|}$, where $D : \mathbb{R}^{n \times n} \to \mathbb{R}$.

- Check that D(A) satisfies properties 1-3.

5.2.10 Determinant of Transpose Matrix

Given a matrix A, we can say that |A| = |AT|.

Why?

Elimination of A yields PA = LU, so A = PTLU and AT = UTLTP. Then |A| = |PT| |L| |U| and |AT| = |UT| |LT| |P|. Well, P and PT are permutations and are orthogonal, so |P| = |PT|. L is triangular with 1's on the diagonal, so |L| = |LT| = 1. Lastly, U is triangular, so |U| = |UT|.

So in conclusion, we know that |A| = |PT| |L| |U| and also |AT| = |UT| |LT| |P|. But |P| = |PT|, |L| = |LT|, and |U| = |UT|.

∴ we can say that |A| = |AT|.

5.2.11 Determinant of an Orthogonal Matrix

If Q is an orthogonal matrix, then we can say that |Q| = ±1.

Why? Well, since QTQ = I, then 1 = |I| = |QTQ| = |QT| |Q| = |Q|2, so therefore |Q| = ±1.

5.2.12 Properties Regarding Rows and Columns

It is important to note that all properties that apply to ROWS also apply analogously to COLUMNS.

This makes sense, and is true because |A| = |AT|.

5.3 Permutations and Cofactors

There are two other ways of finding the determinant of a matrix.

5.3.1 The Big Formula

One way of finding the determinant of matrix A is to use the big formula, which sums over all n! permutations of the columns: each term is a product of n entries, one from each row and each column, with a + or − sign determined by the permutation. For a 2 × 2 matrix this reduces to ad − bc.
5.3.2 Cofactors

Another way to find the determinant of a matrix is to use cofactors: expand along a row, multiplying each entry aij by its cofactor Cij (the determinant of the submatrix obtained by deleting row i and column j, with sign (−1)i+j), and add the results.

6 Eigenvectors and Eigenvalues

6.1 Introduction to Eigenvectors and Eigenvalues

6.1.1 Definition of Eigenvalues and Eigenvectors

Recall the expression A⃗x. Usually, multiplication of a vector ⃗x by a matrix A changes the direction of a
vector.

So usually, if ⃗x ≠ ⃗0 and ⃗y = A⃗x ≠ ⃗0, then it is usually the case that $\frac{\vec{x}}{\|\vec{x}\|} \neq \frac{\vec{y}}{\|\vec{y}\|}$.

However, there are certain special vectors that satisfy the equation A⃗x = λ⃗x, where λ⃗x is in the same
direction of ⃗x, however perhaps has a different magnitude.

Given this equation A⃗x = λ⃗x, we then can say that:

- ⃗x is the eigenvector of A, and that

- λ is the eigenvalue of A.

Note: If λ = 0, then ⃗x is an eigenvector and ⃗x ∈ N (A).

6.1.2 Finding Eigenvalues and Eigenvectors

Remember that eigenvectors and eigenvalues satisfy the equation A⃗x = λ⃗x. So, we can also write A⃗x − λ⃗x = ⃗0, or in other words:

(A − λI)⃗x = ⃗0

Note: If (A − λI)⃗x = ⃗0 for some ⃗x ≠ ⃗0, then A − λI must be singular. This implies that |A − λI| = 0, and that ⃗x ∈ N(A − λI).

Example:

Find the eigenvalues of the matrix $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$.

First, we can find A − λI:

$A - \lambda I = \begin{bmatrix} 0.8 - \lambda & 0.3 \\ 0.2 & 0.7 - \lambda \end{bmatrix}$

Now, we find the determinant of A − λI. Note that the 0.06 below comes from the product of the off-diagonal entries, (0.3)(0.2), which is subtracted in the determinant.

$|A - \lambda I| = (0.8 - \lambda)(0.7 - \lambda) - 0.06 = \lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} = (\lambda - 1)\left(\lambda - \frac{1}{2}\right)$

Therefore, we can say that λ = 1 or λ = 1/2.

Now that we have found the eigenvalues, we can now begin to look for the eigenvectors.

Given the eigenvalues above, we can say that the eigenvectors are in the null spaces of A − 1I = A − I and A − (1/2)I, for each eigenvalue respectively. This means that, for the two eigenvectors ⃗x1 and ⃗x2, A⃗x1 = ⃗x1 and A⃗x2 = (1/2)⃗x2.

We can solve for the eigenvectors: $\vec{x}_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ and $\vec{x}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

We can also verify that $\vec{x}_1 \in N(A - I)$ and that $\vec{x}_2 \in N(A - \tfrac{1}{2}I)$.

To summarize here:

- Eigenvalues are the roots of the Characteristic Polynomial |A − λI| = 0.

- Eigenvectors are the basis of N (A − λI) for each eigenvalue λ.
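The eigenvalues and eigenvectors of the example matrix can be checked with NumPy; this is an illustrative sketch, not part of the original notes:

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)                              # e.g. [1.  0.5] (order may vary)

for lam, x in zip(eigvals, eigvecs.T):      # eigenvectors are the columns
    print(np.allclose(A @ x, lam * x))      # True for each eigenvalue/eigenvector pair

# np.linalg.eig returns unit-length eigenvectors, so they are scalar
# multiples of the (0.6, 0.4) and (1, -1) vectors found above.
```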

6.1.3 Properties of Eigenvalues and Eigenvectors

First, suppose that ⃗x is an eigenvector of a matrix A with eigenvalue λ. Then:


- c⃗x is an eigenvector of A with eigenvalue λ.

A (c⃗x) = cA⃗x = cλ⃗x = λ (c⃗x)

- ⃗x is an eigenvector of sA with eigenvalue sλ.

(sA) ⃗x = sA⃗x = sλ⃗x.

- ⃗x is an eigenvector of An with eigenvalue λn .

An ⃗x = λn ⃗x
- ⃗x is an eigenvector of A + I with eigenvalue λ + 1.

(A + I) ⃗x = λ⃗x + ⃗x = (λ + 1) ⃗x.

- If every column of a matrix sums to 1 (for example, a Markov matrix P), then λ = 1 is an eigenvalue.


- If P is singular, then λ = 0 is an eigenvalue.
- If P = PT (P is symmetric), then the eigenvectors are orthogonal.
- Elimination does NOT preserve eigenvalues λ.
- The eigenvalues of a triangular matrix are its diagonal entries, and the product of the eigenvalues is the determinant.
- If ⃗x is an eigenvector of both A and B, that is, A⃗x = λ⃗x and B⃗x = β⃗x, we can then say that AB⃗x = BA⃗x = λβ⃗x.

6.1.4 Trace of a Matrix

We define the trace of a matrix A as the sum of its diagonal entries.

We use the notation trA = a11 + a22 + · · · + ann to define the trace of a matrix.

Using a property from above, we can also notice that the sum of the eigenvalues is also the trace.

6.1.5 Complex Eigenvalues


 
Suppose that we have a matrix Q, which is a rotation matrix. We can define Q as $Q = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$.

Here, the matrix Q rotates a vector by 90◦ .

Now, notice that after a rotation, there is no real vector Q⃗x which is in the same direction as ⃗x.

This leads us to the uncomfortable fact that eigenvalues are not always real.


$|Q - \lambda I| = \lambda^2 + 1 = 0 \Rightarrow \lambda = \pm\sqrt{-1} = \pm i$

Thus, the eigenvalues are not real.

6.2 Diagonalizing a Matrix

6.2.1 Diagonalization and Eigendecomposition

Using what we know about eigenvectors and eigenvalues, we can now diagonalize a matrix.

Say we have a matrix A, which is composed of n independent eigenvectors


  ⃗x1 , . . . , ⃗xn with corresponding
eigenvalues λ1 , . . . , λn , we can create a matrix X such that X = ⃗x1 . . . ⃗xn .
   
If we take AX we can say that AX = A⃗x1 . . . A⃗xn = λ1 ⃗x1 . . . λn ⃗xn

If we rearrange this, we can create an expression that looks like this:

$\begin{bmatrix} \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix}\begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} = X\Lambda$

In this case, X stays the same, however a new diagonal matrix with the eigenvalues as its entries, known as Λ, is created.

This creates the relationship:

AX = XΛ

This equation results in two other equations. Assuming X is invertible, so X −1 X = I, then:

- The diagonalization of the matrix A is Λ, and X−1AX = Λ.

- The eigendecomposition of the matrix A is A = XΛX−1.

As you can see, both of these equations are derived from the equation AX = XΛ above.

So, the matrix A is diagonalizable if we can write A as A = XΛX−1.

Note: Remember, not every matrix is diagonalizable!

It is also important to note that powers of A are Ak = XΛk X −1 .
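Both facts can be verified numerically. This is a minimal NumPy sketch (not part of the original notes), reusing the 2 × 2 matrix from Section 6.1:

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

lam, X = np.linalg.eig(A)
Lam = np.diag(lam)

# Eigendecomposition: A = X Lam X^{-1}
print(np.allclose(X @ Lam @ np.linalg.inv(X), A))              # True

# Powers: A^k = X Lam^k X^{-1}
k = 10
print(np.allclose(X @ np.diag(lam**k) @ np.linalg.inv(X),
                  np.linalg.matrix_power(A, k)))               # True
```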

6.2.2 Eigenvector Independence

Eigenvectors that correspond to distinct eigenvalues are independent.

More formally, if A⃗xj = λj⃗xj with λi ≠ λj for i ≠ j, then ⃗x1, . . . , ⃗xk are linearly independent.

Proof not provided, but is located in the slides.

Again, if A has distinct eigenvalues (no repeats), it has n independent eigenvectors.

So, if A has no repeated eigenvalues, A is diagonalizable.

However, if A has repeated eigenvalues, it can either be diagonalizable or not...

6.2.3 Similar Matrices

Two matrices A and C are similar if A = BCB−1 for some invertible matrix B.

Similar matrices have the same eigenvalues.

Proof not provided, but is located in the slides.

6.3 Symmetric Matrices

6.3.1 Working with Symmetric Matrices

Symmetric matrices are incredibly nice.

If S = ST is symmetric, then we can always choose orthonormal eigenvectors ⃗q1, . . . , ⃗qn such that:

S = QΛQT

since Q−1 = QT, where:

- Q = [⃗q1 . . . ⃗qn] is an orthogonal matrix
- Λ = diag {λ1, . . . , λn}

Note: Symmetric matrices are ALWAYS diagonalizable, even if they have repeated eigenvalues!

It is also important to note that eigenvectors of S = S T from distinct eigenvalues are orthogonal.

Proof not provided, but is located in the slides.

In addition, if S = ST is symmetric, then we can always choose an orthogonal matrix Q such that S = QΛQT.

Note: If S = S T has real entries, then it also has real eigenvalues. In addition, the signs of the
eigenvalues must match the signs of the pivots.

6.3.2 Spectral Theorem

The Spectral Theorem states that every real symmetric matrix S = S T admits a factorization of S =
QΛQT , with real eigenvalues in Λ and orthonormal eigenvectors in the columns of Q.

6.3.3 Rank One Decomposition

Once again, recall that S = QΛQT .

The decomposition is that S = QΛQT = λ1 ⃗q1 ⃗q1T + λ2 ⃗q2 ⃗q2T + · · · + λn ⃗qn ⃗qnT

Proof not provided, but is located in the slides.
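The factorization and the rank-one sum can be checked numerically. This is a small NumPy sketch with a made-up symmetric matrix (an illustration, not part of the original notes):

```python
import numpy as np

S = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])          # a real symmetric matrix

lam, Q = np.linalg.eigh(S)               # eigh: real eigenvalues, orthonormal eigenvectors
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))   # True: S = Q Lam Q^T

# Rank one decomposition: S = sum_i lam_i q_i q_i^T
S_sum = sum(l * np.outer(q, q) for l, q in zip(lam, Q.T))
print(np.allclose(S_sum, S))                    # True
```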

6.4 Positive Definite Matrices

6.4.1 Definition of Positive Definite Matrices

A symmetric matrix S = S T is positive definite (PD) if all its eigenvalues λi > 0.


 
For example, the matrix $S = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$ is positive definite.

We can also say that if S⃗x = λ⃗x, then ⃗xTS⃗x = λ⃗xT⃗x = λ || ⃗x ||2, which is > 0 when λ > 0 and ⃗x ≠ ⃗0.

So, if all λi > 0, then ⃗xT S⃗x > 0 for any eigenvector ⃗x.

If S is positive definite, then ⃗xT S⃗x > 0 for all ⃗x ̸= ⃗0. The converse is also true.

Note: All of the n pivots of S are positive. In addition, all n upper left determinants of the matrix
are also positive.

6.4.2 Properties of Positive Definite Matrices

We can say that if S and T are positive definite, then so is S + T .

Proof:

⃗xT (S + T ) ⃗x = ⃗xT S⃗x + ⃗xT T ⃗x > 0 for all ⃗x ̸= ⃗0.

We can also say that if A has independent columns, then S = AT A is positive definite.

Proof:

Reminder: if A has independent columns, then A⃗x ̸= 0 for ⃗x ̸= ⃗0.


Then, for ⃗x ≠ ⃗0, ⃗xTS⃗x = ⃗xTATA⃗x = (A⃗x)T(A⃗x) = || A⃗x ||2 > 0.
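Positive definiteness is easy to test numerically. The sketch below (not part of the original notes) reuses the matrix A from the projection example, so S = ATA should be positive definite; it checks the eigenvalues and also attempts a Cholesky factorization, which exists only for positive definite matrices.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # independent columns

S = A.T @ A                      # S = A^T A should be positive definite

print(np.all(np.linalg.eigvalsh(S) > 0))   # True: all eigenvalues are positive

# A Cholesky factorization exists only for positive definite matrices,
# so it can serve as a practical test.
np.linalg.cholesky(S)            # succeeds; would raise LinAlgError otherwise
```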

6.4.3 Positive Semidefinite Matrices

For a symmetric matrix S = S T , the following are equivalent:

- S is positive semidefinite (PSD).

- S has all λi ≥ 0.

- ⃗xT S⃗x ≥ 0 for all ⃗x ∈ Rn .

Note: For a positive semidefinite matrix, all eigenvalues of S are nonnegative. All of the n pivots of
S are also nonnegative.

7 Singular Value Decomposition

7.1 Introduction to Singular Value Decomposition

7.1.1 Why Singular Value Decomposition?

Recall the eigendecomposition of a square matrix A, which is A = XΛX −1 . There are three potential issues
with this:

- Only for square matrices.

- Doesn’t always exist.

- Could be complex.

Now, recall the symmetric eigendecomposition for symmetric matrices, which is S = QΛQT. The main issue with this is that:

- Only for symmetric, square matrices.

Well, what if we want to calculate a decomposition for any matrix of any shape (m × n)?

For this, we must find the singular value decomposition (SVD) of the matrix.

7.1.2 Singular Value Decomposition Process

First, recall that for a matrix A ∈ Rm×n, the Fundamental Theorem of Linear Algebra states that:

- C(A) ⊕ N(AT) = Rm, where dim C(A) = r and dim N(AT) = m − r.

- C(AT) ⊕ N(A) = Rn, where dim C(AT) = r and dim N(A) = n − r.

For the notes above, assume r = rank A.

So now, we need to take vectors for the orthonormal bases for Rm and Rn , using the components.

For Rm :

- Orthonormal basis for C(A) : ⃗u1 , . . . , ⃗ur .

- Orthonormal basis for N (AT ) : ⃗ur+1 , . . . , ⃗um .

For Rn :

- Orthonormal basis for C(AT ) : ⃗v1 , . . . , ⃗vr .

- Orthonormal basis for N (A) : ⃗vr+1 , . . . , ⃗vn .

such that:

A⃗vi = σi⃗ui for i = 1, . . . , r

In this case, σ1, . . . , σr are the singular values of A, where $\sigma_i = \frac{\|A\vec{v}_i\|}{\|\vec{u}_i\|} = \|A\vec{v}_i\|$, since || ⃗ui || = 1.

So now with this, we can begin to craft the singular value decomposition.

First, put all the ⃗u's and ⃗v's in matrices. Remember here, we only used the first r ⃗u's and ⃗v's. We would still have m − r more ⃗u's and n − r more ⃗v's.

$U_r = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_r \end{bmatrix} \in \mathbb{R}^{m \times r}$

$V_r = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_r \end{bmatrix} \in \mathbb{R}^{n \times r}$

Here, notice that Ur is m × r and that UrTUr = I.

Also notice that Vr is n × r and that VrTVr = I.

Now, create a diagonal matrix Σr containing all the σi singular values:

$\Sigma_r = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_r \end{bmatrix}$

Now, we can write the reduced singular value decomposition as the following:

AVr = UrΣr

Finally, we can obtain the full singular value decomposition by extending Ur, Σr, and Vr with the null space basis vectors (giving square orthogonal U and V) and multiplying both sides by VT. Since VVT = I, we find that:

A = UΣVT

This equation describes the singular value decomposition of A.

7.1.3 Constructing the Singular Value Decomposition

In order to create a Single Value Decomposition of A, first realize that AT A is positive semidefinite. It also
has orthonormal eigenvectors with nonnegative eigenvalues.

We can construct the matrix V by taking the orthonormal eigenvectors of AT A + the orthonormal basis for
N (A).

We then can derive U and Σ from V .


Define ⃗u′i = A⃗vi, where ⃗vi is in V. Then, normalize to get $\vec{u}_i = \frac{\vec{u}'_i}{\|\vec{u}'_i\|}$.

From here, σi = || ⃗u′i || = || A⃗vi ||.

This gives all of the ⃗v ’s and ⃗u’s in C(A) and C(AT ).

Now, to get the remaining ⃗v ’s and ⃗u’s, use any orthonormal bases for N (A) and N (AT ).

This gives the final equation A = U ΣV T .

Note: Σ may be a non-square matrix!

7.1.4 Rank One Decomposition via SVD

It is also important to note that $A = \sum_i \sigma_i \vec{u}_i\vec{v}_i^T = \sigma_1\vec{u}_1\vec{v}_1^T + \cdots + \sigma_r\vec{u}_r\vec{v}_r^T$, and A is the sum of r rank one matrices.

7.1.5 Best Rank k Approximation

What if we want to find the closest rank k matrix B to a given matrix A?

Well, if:

$A = \sum_{i=1}^{r} \sigma_i \vec{u}_i\vec{v}_i^T = \sigma_1\vec{u}_1\vec{v}_1^T + \cdots + \sigma_r\vec{u}_r\vec{v}_r^T$

then we want B to keep only the k largest terms:

$B = \sum_{i=1}^{k} \sigma_i \vec{u}_i\vec{v}_i^T = \sigma_1\vec{u}_1\vec{v}_1^T + \cdots + \sigma_k\vec{u}_k\vec{v}_k^T$
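The SVD and a truncated rank-k approximation are straightforward to compute with NumPy. This is a minimal sketch with a random test matrix (an illustration, not part of the original notes):

```python
import numpy as np

A = np.random.default_rng(0).normal(size=(5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD: A = U diag(s) V^T
print(np.allclose(U @ np.diag(s) @ Vt, A))          # True

# Best rank-k approximation: keep the k largest singular values
k = 1
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.linalg.matrix_rank(B))                     # 1
```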
