Linear Algebra Notes
Ajay Gandecha
December 1, 2022
Contents
0 Introduction to Linear Algebra
1 Introduction to Vectors
2 Solving Linear Equations
3 Vector Spaces and Subspaces
4 Orthogonality
5 Determinants
6 Eigenvectors and Eigenvalues
7 Singular Value Decomposition
0 Introduction to Linear Algebra
A scalar is an arbitrary constant number used alongside vectors. Scalars are simply single values, just like ordinary numbers.
A vector is defined as an object with a direction and a magnitude, but not a position in space. The term
also denotes the mathematical or geometrical representation of such a quantity.
Vectors are often drawn like arrows in space that originate at some origin and point off in some direction.
The length of that arrow is known as the magnitude.
Vectors can also simply be represented as a point in space. This point is the tip of the arrow if the vector
were to have been drawn in arrow form.
To denote a vector, we use the notation v, with the letter representing the vector being bold.
We can also denote a vector like this: ⃗v , with an arrow over the letter rather than a bold face letter.
Traditionally, in 2-D space a vector has two components: one for the x-direction and one for the y-direction. The value of each component represents the distance travelled along that axis to reach the "tip of the arrow."
In 3-D space there is an additional z-component, and in higher-dimensional spaces more components are added.
First, one can define a vector based on the point at the tip of the arrow, in a traditional coordinate system. For example:
\[ \vec{v} = (1, 2) \]
Matrix (or "column") notation: Vectors are also commonly written in matrix form, usually with one column and n rows, where n is the number of dimensions of the vector. For example:
\[ \vec{v} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \]
Each row in the "matrix" above represents movement along a specific axis. For example, a 3-dimensional vector is defined below:
\[ \vec{v} = \begin{bmatrix} 1 \\ 2 \\ 6 \end{bmatrix} \]
In this case, the components are referenced with subscripts: v_1 = 1, v_2 = 2, and v_3 = 6.
Linear Combination of Unit Vectors: Lastly, any vector can be written as a linear combination. This concept is covered in Chapter 1, but in short, a vector can always be represented as a sum of unit vectors multiplied by scalars.
A unit vector is a vector that travels strictly along one axis with a magnitude of 1. For example, in physics, the vector î is the unit vector in the x-direction and ĵ is the unit vector in the y-direction. Using the column format shown directly above,
\[ \hat{i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad \hat{j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \]
We know that vectors point in a certain direction, and that each component of a vector represents the distance travelled in that direction. Since î and ĵ are vectors of length 1 along each axis, we can multiply î and ĵ by the appropriate magnitude in each direction and add the results together. So, the third way the vector ⃗v can be represented is the following:
\[ \vec{v} = 1\hat{i} + 2\hat{j} \]
So, in conclusion, based on the three different methods of writing vectors discussed above,
\[ \vec{v} = (1, 2) = \begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1\hat{i} + 2\hat{j}. \]
The length (magnitude) of a vector follows the Pythagorean theorem for finding the hypotenuse of a triangle: for this example, ∥⃗v∥ = √(1² + 2²). The same concept applies in any dimension.
0.2 Operations with Vectors
We can add two vectors simply by summing the values in each row.
\[ \begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a+c \\ b+d \end{bmatrix} \]
Remember: you can only add vectors of the same shape (same dimensions)!
We can also scale a vector by multiplying it by a scalar. The formula works as follows:
\[ c_1 \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} c_1 a \\ c_1 b \end{bmatrix} \]
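As a quick illustration (a minimal NumPy sketch with made-up values, not part of the original notes), both operations work entry by entry:
\begin{verbatim}
import numpy as np

a = np.array([1, 2])      # a 2-D vector
b = np.array([3, -1])     # another 2-D vector

print(a + b)              # entry-wise addition -> [4  1]
print(3 * a)              # scalar multiplication -> [3  6]
\end{verbatim}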
1 Introduction to Vectors
A linear combination is the sum of two scaled vectors, scaled by arbitrary scalars. For example, if ⃗v and ⃗w are vectors, and c and d are scalars, then the following is a valid linear combination of ⃗v and ⃗w:
\[ c\vec{v} + d\vec{w} \]
Linear combinations combine the two fundamental vector operations of addition and scalar multiplication.
Now, the actual result of these operations is another vector. Assuming that the vectors ⃗v and ⃗w in the problem above are two-dimensional:
\[ c\vec{v} + d\vec{w} = c\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + d\begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} cv_1 \\ cv_2 \end{bmatrix} + \begin{bmatrix} dw_1 \\ dw_2 \end{bmatrix} = \begin{bmatrix} cv_1 + dw_1 \\ cv_2 + dw_2 \end{bmatrix} \]
You can see that the result after both operations is still a single vector. Therefore, a linear combination of vectors always results in a vector. In addition, the reverse is true: any vector can be represented as a linear combination of unit vectors and some arbitrary scalars.
This is why it was possible to show in Chapter 0 that vectors can be represented as linear combinations,
with each vector being a unit vector and each scalar being the value of each component.
Just like you cannot add two vectors with different dimensions, you cannot compute the linear com-
bination of two vectors with different dimensions. Remember, this is because a linear combination is
just adding two (scaled) vectors!
The span of a set of vectors is simply the set of all possible linear combinations of those vectors.
What does this mean?
\[ c\vec{v} + d\vec{w} \]
Consider just c⃗v, which is the vector ⃗v scaled by the scalar c. ⃗v looks like an arrow in space, but we can also consider it just as the point at the end of that arrow. As we change the value of c, the magnitude of the vector shrinks and stretches, but the vector always remains on the "line" it originally sat on, if drawn out to infinity. There are an infinite number of vectors that c⃗v can equal depending on the value of c, yet it never strays off of that original line through space.
Because of that, we can say that the span of c⃗v – the set of all its scalar multiples – fills a line in space. It is impossible to stray from that line, but it is possible to make any vector on that line from c⃗v as long as the right c is chosen.
\[ c\vec{v} + d\vec{w} \]
Remember, when you add vectors, the tail of the second vector sits at the tip of the first vector. We established that the span of c⃗v is a line, since the magnitude of a vector pointing in a single direction is being scaled. However, now we introduce a second vector at the end of the first one. That one also scales, and the span of that scalar and vector is also a line. However, assuming that the two vectors ⃗v and ⃗w point in different directions, you will find that the second vector can scale along any point on the line of the first vector. This essentially means that any point on the 2-dimensional plane can be reached with the correct c and d in the linear combination c⃗v + d⃗w.
So, this means that the span of the linear combination c⃗v + d⃗w is a plane (rather than a line).
Now, we can continue this line of thinking forward with the following linear combination:
\[ c\vec{u} + d\vec{v} + e\vec{w} \]
Now, we are combining three vectors. As you can probably imagine, as long as all three of these vectors are linearly independent, and all three vectors are three-dimensional, then the span of the linear combination above is all of 3-dimensional space.
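To make this concrete, here is a small NumPy check (an illustrative sketch with made-up vectors, not from the original notes): three vectors span R³ exactly when the matrix with those vectors as columns has rank 3.
\begin{verbatim}
import numpy as np

u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
w = np.array([1, 1, 1])

A = np.column_stack([u, v, w])          # vectors as columns
print(np.linalg.matrix_rank(A))         # 3 -> independent, so they span R^3
\end{verbatim}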
1.2 Lengths and Dot Products
As you may have noticed, Chapter 0 did not cover the multiplication of two vectors.
The "multiplication" of two vectors is defined as the dot product, also known as the inner product, of two vectors. Let's say we have two vectors, ⃗v = (v_1, v_2) and ⃗w = (w_1, w_2). Then, the dot product is defined as the following:
\[ \vec{v} \cdot \vec{w} = v_1 w_1 + v_2 w_2 \]
You can see that the dot product is the sum of the products of each component of the vector. Because of
this, unlike with addition, scalar multiplication, and linear combinations where the final result was a vector,
the final result of a dot product is a number.
In general, for n-dimensional vectors:
\[ \vec{v} \cdot \vec{w} = v_1 w_1 + v_2 w_2 + \cdots + v_{n-1} w_{n-1} + v_n w_n \]
Remember from Chapter 0, we denoted the length of the two-dimensional vector ⃗v as ∥⃗v ∥, using the following
formula:
\[ \|\vec{v}\| = \sqrt{v_1^2 + v_2^2} \]
In general form, with multiple dimensions where ⃗v = (v_1, v_2, ..., v_n), the length of vector ⃗v is:
\[ \|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_{n-1}^2 + v_n^2} \]
Again, this formula follows the basic principles of the Pythagorean theorem. However, you might notice something pretty interesting:
\[ \|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_{n-1}^2 + v_n^2} = \sqrt{v_1 v_1 + v_2 v_2 + \cdots + v_{n-1} v_{n-1} + v_n v_n} \]
You will find that the expression inside the square root on the right-hand side is exactly the dot product of ⃗v with itself! So, we can also say that:
\[ \|\vec{v}\| = \sqrt{\vec{v} \cdot \vec{v}} \]
Remember from above, we defined a unit vector as a vector with a magnitude of 1. So, we can now prove that a vector ⃗v is a unit vector by showing that the dot product ⃗v · ⃗v = ∥⃗v∥² = 1.
Now, we can also find the unit vector ⃗u for any vector ⃗v . In order to do this, we need to divide the vector
by its length:
\[ \vec{u} = \frac{\vec{v}}{\|\vec{v}\|} \]
This ensures that the length is 1, but the direction is preserved. The "division" here is really just scalar multiplication: the scalar is c = 1/∥⃗v∥, as shown here:
\[ \vec{u} = \frac{\vec{v}}{\|\vec{v}\|} = \frac{1}{\|\vec{v}\|}\vec{v} \]
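A short NumPy sketch (illustrative, with made-up values, not from the original notes) tying these ideas together: the dot product, the length as √(⃗v·⃗v), and normalization to a unit vector.
\begin{verbatim}
import numpy as np

v = np.array([3.0, 4.0])
w = np.array([1.0, 2.0])

print(np.dot(v, w))                  # dot product -> 11.0
print(np.sqrt(np.dot(v, v)))         # length via sqrt(v . v) -> 5.0
print(np.linalg.norm(v))             # same length using the norm -> 5.0

u = v / np.linalg.norm(v)            # unit vector in the direction of v
print(u, np.linalg.norm(u))          # [0.6 0.8], length 1.0
\end{verbatim}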
Based on the Law of Cosines, we can derive a formula to calculate the angle between two vectors. If ⃗v and ⃗w are unit vectors, then:
\[ \vec{v} \cdot \vec{w} = \cos\theta \]
If ⃗v and ⃗w are not unit vectors, then we can adjust this formula. Remember that we defined the unit vector ⃗u of a vector ⃗v to be ⃗u = ⃗v/∥⃗v∥. So, we can apply this to the formula:
\[ \frac{\vec{v}}{\|\vec{v}\|} \cdot \frac{\vec{w}}{\|\vec{w}\|} = \cos\theta \]
You can see that nothing actually changes. Both vectors here are unit vectors, and the angle between them does not depend on the lengths of the vectors, so it all works out. This equation can also be rearranged to:
\[ \vec{v} \cdot \vec{w} = \|\vec{v}\|\,\|\vec{w}\| \cos\theta \]
In addition, given a vector \( \vec{v} = \begin{bmatrix} x \\ y \end{bmatrix} \), it is perpendicular to any vector of the form \( \vec{u} = c\begin{bmatrix} y \\ -x \end{bmatrix} \). This can be proven using the dot product and the rule above, since ⃗v · ⃗u = c(xy − yx) = 0.
Whatever the angle, the magnitude of the dot product \( \frac{\vec{v}}{\|\vec{v}\|} \cdot \frac{\vec{w}}{\|\vec{w}\|} \) will never exceed 1. Since |cos θ| never exceeds 1, two fundamental inequalities are produced.
Schwarz Inequality:
\[ |\vec{v} \cdot \vec{w}| \le \|\vec{v}\|\,\|\vec{w}\| \]
Triangle Inequality:
\[ \|\vec{v} + \vec{w}\| \le \|\vec{v}\| + \|\vec{w}\| \]
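As a quick numerical check (an illustrative sketch with made-up vectors, not part of the original notes), the angle formula and both inequalities can be verified for a concrete pair:
\begin{verbatim}
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([3.0, 1.0])

cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.degrees(np.arccos(cos_theta)))    # 45.0 -> angle between v and w

# Schwarz inequality: |v . w| <= ||v|| ||w||
print(abs(np.dot(v, w)) <= np.linalg.norm(v) * np.linalg.norm(w))       # True

# Triangle inequality: ||v + w|| <= ||v|| + ||w||
print(np.linalg.norm(v + w) <= np.linalg.norm(v) + np.linalg.norm(w))   # True
\end{verbatim}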
1.3 Introduction to Matrices
A matrix is an organized structure of data with rows and columns. Matrices are denoted with a capital
letter. Below is an example of a matrix:
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \]
In this case, A is a matrix with 2 rows and 3 columns. Hence, it is a 2×3 matrix.
A matrix with only one column can also be regarded as a vector.
Matrices of the same shape can be added entry by entry; the result of this operation is a matrix with the same dimensions as A.
Multiplying matrices is pretty interesting and not as easy as simply multiplying all of the values together. Two matrices can only be multiplied if their inner dimensions match: multiplying matrices of dimensions m × n and n × p results in an m × p matrix, where each entry is the dot product of a row of the first matrix with a column of the second matrix.
Every row of the resulting matrix holds the dot products of the corresponding row of the first matrix with each column of the second matrix, with each dot product going in the slot for the column that was used to compute it.
Every column of the resulting matrix likewise holds all of the dot products computed with the matching column of the second matrix, with each dot product going in the slot for the row that was used to compute it.
So, for example, suppose \( A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \), where each element a_{ij} represents the entry at row i and column j, and \( B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} \). Then, when multiplied:
\[ AB = \begin{bmatrix} (a_{11}, a_{12}, a_{13}) \cdot (b_{11}, b_{21}, b_{31}) & (a_{11}, a_{12}, a_{13}) \cdot (b_{12}, b_{22}, b_{32}) \\ (a_{21}, a_{22}, a_{23}) \cdot (b_{11}, b_{21}, b_{31}) & (a_{21}, a_{22}, a_{23}) \cdot (b_{12}, b_{22}, b_{32}) \end{bmatrix} \]
This does look extremely complicated, but in practice it is not too difficult.
VERY IMPORTANT: The multiplication of matrices is NOT commutative! AB and BA can lead
to different results, and order matters when multiplying matrices.
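A brief NumPy sketch (illustrative, with made-up matrices, not from the original notes) of matrix multiplication, including a check that AB and BA differ:
\begin{verbatim}
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])             # 3 x 2

print(A @ B)     # 2 x 2 result: rows of A dotted with columns of B
print(B @ A)     # 3 x 3 result: a completely different shape and values
\end{verbatim}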
Matrix multiplication also connects back to linear combinations. Consider the linear combination
\[ x_1\begin{bmatrix}1\\-1\\0\end{bmatrix} + x_2\begin{bmatrix}0\\1\\-1\end{bmatrix} + x_3\begin{bmatrix}0\\0\\1\end{bmatrix} = \begin{bmatrix}x_1\\x_2-x_1\\x_3-x_2\end{bmatrix} \]
and the matrix-vector product
\[ \begin{bmatrix}1&0&0\\-1&1&0\\0&-1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}x_1\\x_2-x_1\\x_3-x_2\end{bmatrix} \]
At first, these might not look related. But remember, when the matrix multiplication is computed, the dot product of each row of the left matrix with the vector produces exactly the same entries as the linear combination above.
An identity matrix is a square matrix with 1's on the diagonal and 0's in all other slots. For example,
\[ I = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} \quad \text{and} \quad I = \begin{bmatrix}1&0\\0&1\end{bmatrix} \]
are both identity matrices.
The inverse of a matrix A is denoted by A⁻¹. It is not the reciprocal of A; it is just notation. The following rule applies for an inverse matrix: A⁻¹A = AA⁻¹ = I, where I is the identity matrix.
Inverses are most useful when solving the matrix equation
\[ A\vec{x} = \vec{b} \]
This is the fundamental matrix equation form that we will use a lot throughout linear algebra. However, when solving these equations, a change of perspective is important.
In the past, we have solved equations where both A and ⃗x are known, and the result we solve for is ⃗b. This is the result of A acting on the vector ⃗x.
For example, if \( A = \begin{bmatrix}1&0&0\\-1&1&0\\0&-1&1\end{bmatrix} \), \( \vec{x} = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} \), and \( \vec{b} = \begin{bmatrix}x_1\\x_2-x_1\\x_3-x_2\end{bmatrix} \), then A⃗x = ⃗b:
\[ \begin{bmatrix}1&0&0\\-1&1&0\\0&-1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}x_1\\x_2-x_1\\x_3-x_2\end{bmatrix} \]
However now, rather than trying to solve ⃗b, we are given A and ⃗b and we try to solve for ⃗x.
There are a few options for the result of trying to search for ⃗x.
No Solution:
In the equation A⃗x = ⃗b, no solution occurs if ⃗b is NOT a linear combination of the columns of A.
For example, in the equation \( \begin{bmatrix}2&4\\1&2\end{bmatrix}\vec{x} = \begin{bmatrix}1\\1\end{bmatrix} \), there is no solution because no linear combination of the columns produces ⃗b.
The above equation is the same as the linear combination \( x_1\begin{bmatrix}2\\1\end{bmatrix} + x_2\begin{bmatrix}4\\2\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix} \), which has no solution.
Unique Solution: In the equation A⃗x = ⃗b, a unique solution occurs if ⃗b is a linear combination of the columns of A, and we CANNOT write any column of A as a linear combination of the other columns.
TLDR; If A is invertible and A−1 exists, then the equation A⃗x = ⃗b has one unique solution.
Infinitely Many Solutions It is also possible for this equation to have infinitely many solutions, if
there is a cyclic difference in the matrix. For example here:
\[ C\vec{x} = \begin{bmatrix}1&0&-1\\-1&1&0\\0&-1&1\end{bmatrix}\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix} = \begin{bmatrix}x_1-x_3\\x_2-x_1\\x_3-x_2\end{bmatrix} \]
Given the equation C⃗x = ⃗0, for any c, x1 = x2 = x3 = c is a solution. Therefore, there are infinitely many
solutions.
A cyclic matrix C has no inverse. All three columns lie in the same plane. Also, those dependent
columns add to the zero vector. Any equation C⃗x = ⃗0 has many solutions.
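To see these cases numerically, here is a small NumPy sketch (illustrative, with an assumed right-hand side, not from the original notes): an invertible matrix gives one solution, while the singular cyclic matrix C cannot be solved uniquely.
\begin{verbatim}
import numpy as np

A = np.array([[1, 0, 0],
              [-1, 1, 0],
              [0, -1, 1]])
b = np.array([1.0, 2.0, 3.0])
print(np.linalg.solve(A, b))        # [1. 3. 6.] -> unique solution, A is invertible

C = np.array([[1, 0, -1],
              [-1, 1, 0],
              [0, -1, 1]])
print(np.linalg.det(C))             # 0.0 -> C is singular
try:
    np.linalg.solve(C, b)
except np.linalg.LinAlgError as e:
    print("no unique solution:", e)
\end{verbatim}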
One of the most powerful uses of matrices, and one fundamental to linear algebra, is using them to represent
linear systems of equations. Take a look at the system below:
\[ \begin{aligned} x + 2y + 3z &= 6 \\ 2x + 5y + 2z &= 4 \\ 6x - 3y + z &= 2 \end{aligned} \]
This system of equations can first be represented by a linear combination of vectors, one vector for each variable's coefficients, with the scalar being the associated variable. This looks like:
\[ x\begin{bmatrix}1\\2\\6\end{bmatrix} + y\begin{bmatrix}2\\5\\-3\end{bmatrix} + z\begin{bmatrix}3\\2\\1\end{bmatrix} = \begin{bmatrix}6\\4\\2\end{bmatrix} \]
Lastly, we can convert this fully into a matrix form equation, where A⃗x = ⃗b:
\[ \begin{bmatrix}1&2&3\\2&5&2\\6&-3&1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}6\\4\\2\end{bmatrix} \]
Remember, this form is perfectly valid to represent the linear combination above because the matrix multi-
plication represents that step.
In fact, we can create an augmented matrix that represents the equation above in one single matrix:
\[ \left[\begin{array}{ccc|c}1&2&3&6\\2&5&2&4\\6&-3&1&2\end{array}\right] \]
In this case, the individual variables are abstracted away and assumed implicitly, one per column of the coefficient part of the matrix, while the final column is the vector ⃗b. We will use this form a lot in Chapter 2 when eliminating matrices.
2 Solving Linear Equations
We can show the solution to linear equations graphically using row and column pictures.
Row Picture
A row picture shows shapes generated based off of the shapes created by each row, where the intersection is
going to be the solution of the linear equations. For example, with the following system of equations:
\[ \begin{bmatrix}1&2&3\\2&5&2\\6&-3&1\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}6\\4\\2\end{bmatrix} \]
First, you can approach this problem by finding the axis intercepts of each equation. For the first equation in the system, the intercepts are x = 6, y = 3, and z = 2. Using these points, we can generate a shape that passes through all of them. Because there are three points here, we can create a 2D plane.
Repeating this process for the second equation, you find that x = 2, y = 4/5, and z = 2, which gives a second plane.
You will see that the two planes intersect at a line L, which passes through the common point z = 2.
Adding the third equation in the system, with intercepts x = 1/3, y = −2/3, and z = 2, you can see that the third plane also intersects the others at the line L, where L passes through the point z = 2.
Column Picture
Unlike the row picture, which treats each row of the matrix as its own unit, the column picture does the same for the columns instead. Because the columns of our coefficient matrix are vectors, we can draw vectors in space. First, we can reorient our matrix into linear combination form:
\[ x\begin{bmatrix}1\\2\\6\end{bmatrix} + y\begin{bmatrix}2\\5\\-3\end{bmatrix} + z\begin{bmatrix}3\\2\\1\end{bmatrix} = \begin{bmatrix}6\\4\\2\end{bmatrix} \]
Now, we can draw out each vector in 3-D space.
Finally, the solution here can be found by drawing the correct combination of vectors with the right scaling.
In this case, all it took was to multiply column three by a factor of 2, and all other vectors were not used.
So:
\[ 0\begin{bmatrix}1\\2\\6\end{bmatrix} + 0\begin{bmatrix}2\\5\\-3\end{bmatrix} + 2\begin{bmatrix}3\\2\\1\end{bmatrix} = \begin{bmatrix}6\\4\\2\end{bmatrix} \]
The combination of scalars satisfies the equation, and in conclusion, the column picture points to a solution
of (0, 0, 2).
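As a quick check (an illustrative NumPy sketch, not from the original notes), the column-picture solution (0, 0, 2) can be verified and also recovered directly:
\begin{verbatim}
import numpy as np

A = np.array([[1, 2, 3],
              [2, 5, 2],
              [6, -3, 1]])
b = np.array([6, 4, 2])

x = np.array([0, 0, 2])
print(A @ x)                 # [6 4 2] -> matches b
print(np.linalg.solve(A, b)) # [0. 0. 2.]
\end{verbatim}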
2.2 The Idea of Elimination
The goal of elimination is to manipulate a matrix (one that is representing a system of equations) so that
we can easily solve the system mathematically. There are three main steps of achieving this:
- Gauss-Jordan Elimination pt. 1: Turn the matrix into upper-triangular form.
- Normalization: Divide each equation so that its pivot is 1.
- Gauss-Jordan Elimination pt. 2: Eliminate backwards to get final values for the variables in ⃗x.
Gaussian Elimination is the first step in the wider Gauss-Jordan Elimination. The goal of Gaussian
Elimination is to turn a matrix into upper triangular form. This makes solving systems of equations easier
because solving the last row is usually trivial (requires one division step), and with the result of that final
variable, you are able to use backwards substitution to reach a final answer.
We can eliminate by removing all of the numbers below every pivot. Then, we reach upper-triangular form, since all values underneath the pivots are 0.
For example, consider the system
\[ \begin{bmatrix}1&-2\\3&2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\11\end{bmatrix} \]
The basic step is: subtract l times row 1 from row 2, and save the new row in row 2.
Here, l_{ij} is the multiplier chosen so that, when the rows are subtracted, the entry below the pivot becomes 0. In the example above, the right multiplier is l_{21} = 3, because when 3 times row 1 is subtracted from row 2, the value under the first pivot becomes 0. This is the result of that operation:
\[ \begin{bmatrix}1&-2\\0&8\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\8\end{bmatrix} \]
You can see in the example above that we are already done. However, on larger matrices you must keep repeating this per row for every pivot, so that all values underneath the pivots are 0.
The values of the pivots themselves are hidden until elimination occurs.
Note: Elimination does not change the solution. It might change the way the row or column pictures look, but the solution will still be the same.
For example:
\[ \begin{bmatrix}1&-2\\3&-6\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\11\end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix}1&-2\\0&0\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\8\end{bmatrix} \]
You can see here that after eliminating, the pivot in row 2 became 0. Therefore, there is no solution because
there is no possible way that 0 = 8.
There is one exception. If the element for that row in the ⃗b vector is also 0, then there are many solutions
to the equation.
For example:
\[ \begin{bmatrix}1&-2\\3&-6\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\3\end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix}1&-2\\0&0\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix} \]
This is because the entire bottom row is eliminated from the matrix, and the only equation left is:
x − 2y = 1
There is one issue though. If there is a 0 pivot in the first row, then you can exchange the rows
and continue eliminating.
For example:
\[ \begin{bmatrix}0&2\\3&-2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}4\\5\end{bmatrix} \quad\Rightarrow\quad \begin{bmatrix}3&-2\\0&2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}5\\4\end{bmatrix} \]
This is completely valid to do, since the ordering of equations in a system of equations is arbitrary.
2.2.3 Normalization
We can normalize a matrix by dividing each row by its respective pivot. This is to get all of the pivots to
be 1. This step is ideal to complete before working on backwards elimination.
Now, we can complete the Gauss elimination step, but backwards. Meaning, start at the last row, and
subtract upwards to ensure that only the pivots have values.
Finally, you can solve the system! You now have a single-term equation for each variable. Normalization after the Gauss-Jordan step gives the answer, and this is the RREF (Reduced Row Echelon Form).
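As an illustration (a sketch using SymPy, not part of the original notes), the RREF of an augmented matrix can be computed directly, which carries out all three steps at once:
\begin{verbatim}
from sympy import Matrix

# Augmented matrix [A | b] for: x - 2y = 1, 3x + 2y = 11  (an assumed example system)
aug = Matrix([[1, -2, 1],
              [3,  2, 11]])

rref_matrix, pivot_cols = aug.rref()
print(rref_matrix)    # Matrix([[1, 0, 3], [0, 1, 1]]) -> x = 3, y = 1
print(pivot_cols)     # (0, 1)
\end{verbatim}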
2.3 Elimination Using Matrices
There are many different terms that are important to understand when using matrices to solve linear equa-
tions.
Elimination matrices, represented by E, are matrices that perform the elimination steps described above when multiplied with the matrix A. For example, say that while eliminating, we subtract 2 times row 1 from row 2. This operation can be represented by the following matrix:
\[ E = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix} \]
This is because EA would equal a matrix with these operations performed. How did we get here?
Well, first the E matrix starts out as the identity matrix I. That is because, IA = A. I.e., I is the base
matrix with no operations performed. In this case, the identity matrix (and the start of our E matrix) looks
like:
\[ I = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} \]
Then, we subtract 2 times row 1 from row 2. To represent this, we put the multiplier 2 (that is, l_{21}) in the first column (representing row 1) and the second row (representing the row being subtracted from). Then, we subtract this matrix from I to get E. So:
\[ E = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} - \begin{bmatrix}0&0&0\\2&0&0\\0&0&0\end{bmatrix} = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix} \]
You can see that, in the end, we successfully created the elimination matrix. Now, when E is multiplied to
A, the steps will be performed perfectly.
We can show and even break up our Gaussian Elimination into operations of elimination matrices multiplied
to A.
We also can apply subscripts i and j to the notation to create Eij , where i represents the row to be subtracted
from, and j represents the row to subtract by.
So, for proper notation, we can say that the elimination matrix we created above is:
\[ E_{21} = \begin{bmatrix}1&0&0\\-2&1&0\\0&0&1\end{bmatrix} \]
Permutation matrices, represented by P, are matrices that perform row switches (when the first-row pivot is 0, for example).
Similarly to above, we can apply subscripts i and j to the notation to create P_{ij}, where i and j are the rows to be switched.
For example, the permutation matrix to switch rows 2 and 3 would be:
\[ P_{23} = \begin{bmatrix}1&0&0\\0&0&1\\0&1&0\end{bmatrix} \]
When P_{23}A is computed, the rows of A will be switched as directed. In each row of the permutation matrix, the 1 goes in the column corresponding to the row that should end up there.
There is also notation for dividing by the pivots: this is simply D⁻¹.
So in conclusion, combinations of E_{ij}, P_{ij}, and D⁻¹, when multiplied in order with the augmented matrix [A | ⃗b], successfully eliminate the matrix.
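A small NumPy sketch (illustrative, with an assumed matrix, not from the original notes) showing that multiplying by an elimination matrix really does perform the row operation:
\begin{verbatim}
import numpy as np

A = np.array([[1, 2, 1],
              [2, 5, 0],
              [0, 3, 4]])

E21 = np.eye(3)
E21[1, 0] = -2          # subtract 2 * row 1 from row 2

print(E21 @ A)          # row 2 becomes [0, 1, -2]; rows 1 and 3 are unchanged
\end{verbatim}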
2.4 Rules for Matrix Operations
Matrix Addition
1. Matrix addition is commutative. So:
\[ A + B = B + A \]
2. The distributive property applies to multiplying the sum of matrices by a scalar. So:
\[ c(A + B) = cA + cB \]
3. Matrix addition is also associative. So:
\[ (A + B) + C = A + (B + C) \]
Matrix Multiplication
1. The distributive property applies to multiplying a matrix by a sum of matrices. So:
\[ A(B + C) = AB + AC \]
2. Matrix multiplication is associative. So:
\[ A(BC) = (AB)C \]
3. The associative property also applies for vectors, since vectors can be treated as matrices. So:
\[ AB\vec{x} = (AB)\vec{x} = A(B\vec{x}) \]
4. Matrix multiplication is NOT COMMUTATIVE! Usually, AB ≠ BA.
5. Matrices and exponents work the same way as traditional numbers for the most part:
\[ A^p = AAA\cdots A \ (p \text{ times}), \qquad A^p A^q = A^{p+q}, \qquad (A^p)^q = A^{pq} \]
6. Lastly, for technicalities, if B has columns \( \vec{b}_1, \vec{b}_2, \vec{b}_3, \vec{b}_4 \), then \( B = \begin{bmatrix} \vec{b}_1 & \vec{b}_2 & \vec{b}_3 & \vec{b}_4 \end{bmatrix} \). Then:
\[ AB = A\begin{bmatrix} \vec{b}_1 & \vec{b}_2 & \vec{b}_3 & \vec{b}_4 \end{bmatrix} = \begin{bmatrix} A\vec{b}_1 & A\vec{b}_2 & A\vec{b}_3 & A\vec{b}_4 \end{bmatrix} \]
2.5 Inverse Matrices
2.5.1 Definition
This was touched upon in the previous chapter, however, for a square matrix A, an inverse matrix A−1
exists if A−1 A = AA−1 = I, where I is the identity matrix.
\[ A\vec{x} = \vec{b} \;\Rightarrow\; A^{-1}A\vec{x} = A^{-1}\vec{b} \;\Rightarrow\; I\vec{x} = A^{-1}\vec{b} \;\Rightarrow\; \vec{x} = A^{-1}\vec{b} \]
Inverse Matrix Uniqueness: There is only one inverse matrix for a matrix A. So:
If BA = I and also AC = I, then B = C.
Singular matrices are matrices that do not have an inverse. A matrix is singular if there exists a non-zero vector ⃗x such that A⃗x = ⃗0.
2.5.5 Calculating A−1 with Elimination
This is a super cool trick and the fastest way to solve for the inverse of a matrix: form the augmented matrix [A | I] and run Gauss-Jordan elimination. When the left half becomes I, the right half is A⁻¹.
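A short sketch of this trick (illustrative, with an assumed 2×2 matrix, not from the original notes), using SymPy's row reduction on the augmented matrix [A | I]:
\begin{verbatim}
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [5, 3]])

aug = A.row_join(eye(2))        # build [A | I]
reduced, _ = aug.rref()         # Gauss-Jordan elimination
A_inv = reduced[:, 2:]          # right half is A^{-1}

print(A_inv)                    # Matrix([[3, -1], [-5, 2]])
print(A * A_inv)                # identity matrix
\end{verbatim}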
Note: After Gauss elimination but before normalizing, the product of the pivots is the determinant of the matrix.
If each diagonal entry of a matrix is larger in absolute value than the sum of the absolute values of the other entries in its row, the matrix is diagonally dominant.
Diagonally dominant matrices are always invertible. However, a matrix that is not diagonally
dominant could still be invertible.
2.6 Factorization of Matrices
2.6.1 A = LU Factorization
The factorization of a matrix is represented by the equation A = LU, where U is the upper triangular matrix created by Gauss elimination, and L is the lower triangular matrix formed from the product of the inverses of the elimination matrices.
You can also have an A = LDU factorization, where D is a diagonal matrix holding the pivots from U in the A = LU factorization, and U is the same U as in the previous factorization but with each row divided by its pivot. This balances the factorization.
For example:
\[ A = \begin{bmatrix}1&0&0\\1/2&1&0\\0&2/3&1\end{bmatrix}\begin{bmatrix}2&1&0\\0&3/2&1\\0&0&4/3\end{bmatrix} \;\Rightarrow\; \begin{bmatrix}1&0&0\\1/2&1&0\\0&2/3&1\end{bmatrix}\begin{bmatrix}2&0&0\\0&3/2&0\\0&0&4/3\end{bmatrix}\begin{bmatrix}1&1/2&0\\0&1&2/3\\0&0&1\end{bmatrix} \]
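A brief sketch (illustrative, not from the original notes) recovering this factorization numerically with SciPy; here A is the product of the L and U shown above, and no row exchanges are needed, so P is the identity:
\begin{verbatim}
import numpy as np
from scipy.linalg import lu

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

P, L, U = lu(A)      # A = P L U with L lower triangular, U upper triangular
print(L)             # [[1, 0, 0], [0.5, 1, 0], [0, 2/3, 1]]
print(U)             # [[2, 1, 0], [0, 1.5, 1], [0, 0, 4/3]]
print(np.allclose(P @ L @ U, A))   # True
\end{verbatim}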
2.7 Transposes and Permutations
2.7.1 Transpose
The transpose of a matrix A, written Aᵀ, flips the matrix over its diagonal: the rows of A become the columns of Aᵀ. We can use index notation as well, where i is the row number and j is the column number, to say that (Aᵀ)_{ij} = A_{ji}.
Transpose Rules:
1. (Aᵀ)ᵀ = A
2. (A + B)ᵀ = Aᵀ + Bᵀ
3. (AB)ᵀ = BᵀAᵀ
4. (A⁻¹)ᵀ = (Aᵀ)⁻¹ = A⁻ᵀ
5. A is invertible IF AND ONLY IF Aᵀ is invertible.
6. ⃗x · ⃗y = ⃗xᵀ⃗y
7. ⃗y · (A⃗x) = (Aᵀ⃗y) · ⃗x
A matrix A is symmetric if A = AT .
With elimination, the form A = LU does not capture symmetry. However, the form A = LDU does
capture symmetry!
If matrix A is symmetric (i.e., A = AT ), then, with NO ROW EXCHANGES, A = LDU is the same as
writing A = LDLT .
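A quick numerical check (illustrative, with made-up matrices, not from the original notes) of two of these rules:
\begin{verbatim}
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [5, 2]])

print(np.allclose((A @ B).T, B.T @ A.T))                     # rule 3: (AB)^T = B^T A^T  -> True
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))   # rule 4: (A^-1)^T = (A^T)^-1 -> True
\end{verbatim}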
2.8 Proofs for Midterm 1
If BA = I and AC = I, then B = C.
Proof:
Since BA = I and AC = I, then B(AC) = B and (BA)C = C, since any matrix times the identity matrix is that matrix. By the associative property, B(AC) = (BA)C. Thus, B = C.
If A is invertible, then the solution to A⃗x = ⃗b is ⃗x = A⁻¹⃗b.
Proof:
Since A⃗x = ⃗b, we can multiply both sides by A⁻¹. Then, A⁻¹A⃗x = A⁻¹⃗b. Since A⁻¹A = I by the definition of an inverse matrix, A⁻¹A⃗x = A⁻¹⃗b can be rewritten as I⃗x = A⁻¹⃗b. Thus, ⃗x = A⁻¹⃗b.
2.8.3 Singular Matrix Proof if the Product of Nonzero vector and matrix is 0
If there is a non-zero vector ⃗x ̸= ⃗0 such that A⃗x = ⃗0, then A is not invertible.
Proof:
Proof by contradiction. Suppose ⃗x ≠ ⃗0 and A is invertible. Given A⃗x = ⃗0, we can multiply both sides by A⁻¹. Then, A⁻¹A⃗x = A⁻¹⃗0. Since A⁻¹A = I, the equation simplifies to I⃗x = A⁻¹⃗0, which further simplifies to ⃗x = ⃗0, since any matrix times ⃗0 is just ⃗0. Thus, ⃗x = ⃗0, which is a contradiction.
If A and B are both invertible, then so is AB, and (AB)⁻¹ = B⁻¹A⁻¹.
Proof:
Consider the product (AB)(B⁻¹A⁻¹). By the associative property, (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹. We know that if M is an invertible matrix, MM⁻¹ = I, so BB⁻¹ = I and the product simplifies to AIA⁻¹. Since the identity matrix times a matrix is that matrix, this equals AA⁻¹ = I. Therefore (AB)(B⁻¹A⁻¹) = I, and similarly (B⁻¹A⁻¹)(AB) = I, so AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.
If A is symmetric, then ⃗xᵀA⃗y = ⃗yᵀA⃗x.
Proof:
We know that ⃗xᵀA⃗y is a scalar. Therefore, (⃗xᵀA⃗y)ᵀ = ⃗xᵀA⃗y, because the transpose of a scalar is just that scalar. Expanding the transpose, (⃗xᵀA⃗y)ᵀ = ⃗yᵀAᵀ⃗x. Given that A is symmetric, we know that A = Aᵀ, so ⃗yᵀAᵀ⃗x = ⃗yᵀA⃗x. Thus, ⃗xᵀA⃗y = ⃗yᵀA⃗x.
3 Vector Spaces and Subspaces
A vector space is essentially a set V of vectors along with operations of addition and multiplication, where
if vectors ⃗v and ⃗u are in the vector space, then the linear combination of these two vectors is also in the
space.
- R is the set of all real numbers. Therefore:
- Rⁿ = all column vectors with n real components.
- For example: (1, 4) ∈ R²
We can then more formally say that: if vectors ⃗v and ⃗u ∈ Rⁿ and c and d are scalars, then c⃗v + d⃗u ∈ Rⁿ also. Example:
\[ \text{If } \begin{bmatrix}8\\3\\5\end{bmatrix}, \begin{bmatrix}\pi\\1/3\\5\end{bmatrix}, \text{ and } \begin{bmatrix}0\\0\\1\end{bmatrix} \in \mathbb{R}^3, \text{ then } 8\begin{bmatrix}8\\3\\5\end{bmatrix} - 5\begin{bmatrix}\pi\\1/3\\5\end{bmatrix} + \begin{bmatrix}0\\0\\1\end{bmatrix} \in \mathbb{R}^3 \text{ also.} \]
There are also other examples of vector spaces besides Rⁿ, including:
- R^{m×n} = Space of all m × n real matrices.
- F = Space of real functions f(x).
- P_k = Space of polynomials of degree at most k.
- Z = {0} = Vector space containing only the zero vector.
- Cⁿ = Space of vectors with n complex components.
In order for a vector space to be valid, it must be closed for vector addition and multiplication. More
specifically, it should follow these eight rules: Note: Assume ⃗x, ⃗y ∈ V and ∃c1 , c2 ∈ F .
Addition Rules
1. ⃗x + ⃗y = ⃗y + ⃗x
2. ⃗x + (⃗y + ⃗z) = (⃗x + ⃗y) + ⃗z
3. ∃! ⃗0 such that ⃗x + ⃗0 = ⃗x
4. ∀⃗x ∃ −⃗x such that ⃗x + (−⃗x) = ⃗0
Multiplication Rules
5. 1⃗x = ⃗x
6. (c₁c₂)⃗x = c₁(c₂⃗x)
7. c₁(⃗x + ⃗y) = c₁⃗x + c₁⃗y
8. (c₁ + c₂)⃗x = c₁⃗x + c₂⃗x
Depending on the vector space, we can determine the dimension of the space. Some examples are shown below (Space ⇒ Dimension):
- Rⁿ ⇒ n
- R^{m×n} ⇒ mn
- F ⇒ ∞
- P_k ⇒ k + 1 (a general element is a₀ + a₁x + ... + a_k x^k)
- Z = {0} ⇒ 0
3.1.5 Subspaces
A subspace of a vector space V is a subset of V that is itself a vector space (closed under addition and scalar multiplication). Examples of subspaces include:
- {0}
- The whole space V
- A line through the origin
- A plane through the origin
All Subspaces of R³
- The zero subspace {0} = {(0, 0, 0)}
- Any line through (0, 0, 0)
- Any plane through (0, 0, 0)
- The whole space R³
Given that V is a space, suppose S = {⃗v₁, ..., ⃗v_N} ⊂ V is a set of vectors.
We define S_S = all linear combinations of vectors in S.
We call S_S the subspace spanned by the vectors ⃗v₁, ..., ⃗v_N.
3.1.7 Using Span to Describe the Subspace
We can use the span to describe a subspace. We simply look at the vectors being spanned; only the independent vectors among them contribute to the dimension of the subspace.
For example:
\[ \text{span}\left\{\begin{bmatrix}1\\2\end{bmatrix}\right\} = \text{line in } \mathbb{R}^2 \]
\[ \text{span}\left\{\begin{bmatrix}1\\2\end{bmatrix}, \begin{bmatrix}-3\\-6\end{bmatrix}\right\} = \text{line in } \mathbb{R}^2 \quad \text{(because the vectors } \begin{bmatrix}1\\2\end{bmatrix} \text{ and } \begin{bmatrix}-3\\-6\end{bmatrix} \text{ are dependent!)} \]
\[ \text{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\} = \mathbb{R}^2 \]
\[ \text{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}, \begin{bmatrix}1\\1\end{bmatrix}\right\} = \mathbb{R}^2 \]
\[ \text{span}\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\} = \mathbb{R}^3 \]
\[ \text{span}\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}1\\1\\0\end{bmatrix}\right\} = xy\text{ plane in } \mathbb{R}^3 \quad \text{(the vectors only contribute 2 dimensions!)} \]
Consider an m × n matrix A. The matrix A has columns ⃗a₁, ⃗a₂, ..., ⃗a_n, as shown below:
\[ A = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \cdots & \vec{a}_n \end{bmatrix} \]
Now, consider all of the linear combinations of the columns of A. This would be shown as:
\[ x_1\vec{a}_1 + x_2\vec{a}_2 + \cdots + x_n\vec{a}_n = A\vec{x} \]
Remember, we defined the set of all linear combinations of a set of vectors as their span. The linear combinations above also span a space!
The span described here is known as the column space of the matrix A:
C(A) = span{⃗a₁, ..., ⃗a_n} = {A⃗x : ⃗x ∈ Rⁿ}.
Column Space Example
\[ C(A) = \left\{ x_1\begin{bmatrix}1\\2\\-1\end{bmatrix} + x_2\begin{bmatrix}0\\0\\1\end{bmatrix} + x_3\begin{bmatrix}0\\0\\0\end{bmatrix} + x_4\begin{bmatrix}-2\\-4\\2\end{bmatrix} \right\} = \left\{ t_1\begin{bmatrix}1\\2\\-1\end{bmatrix} + t_2\begin{bmatrix}0\\0\\1\end{bmatrix} \right\} \]
You can see that the simplification above can occur because the last two columns do not contribute: the third column is the zero vector and the fourth column is −2 times the first.
\[ C(A) = \text{span}\left\{ \begin{bmatrix}1\\2\\-1\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix} \right\} \]
Reminder: the equation A⃗x = ⃗b is solvable if and only if ⃗b is a linear combination of the columns of the
matrix A. That is,
⃗b = x1 a⃗1 + x2 a⃗2 + ... + xn a⃗n
Altogether, this means that A⃗x = ⃗b is solvable IF AND ONLY IF ⃗b belongs to the column space of A.
- C(A) = the space spanned by the columns of A
- C(A) = all linear combinations of the columns of A
- C(A) = all vectors A⃗x
- A⃗x = ⃗b has a solution IFF ⃗b ∈ C(A)
3.2 Null Space of A
The null space of a matrix A is the space of all solutions ⃗x where A⃗x = ⃗0.
We can then more formally say that: N(A) = {⃗x | A⃗x = ⃗0}.
We use the notation N (A) to denote the null space of the matrix A.
⃗x, ⃗y ∈ N(A)
⇒ A⃗x = A⃗y = ⃗0
⇒ A(⃗x + ⃗y) = ⃗0
⇒ ⃗x + ⃗y ∈ N(A)
⃗x ∈ N(A), c ∈ R
⇒ A⃗x = ⃗0
⇒ A(c⃗x) = cA⃗x = ⃗0
⇒ c⃗x ∈ N(A)
∴ N(A) is a subspace.
There are two main scenarios in which we use to calculate the null space of a matrix A, and this depends
on whether or not A is invertible.
If a matrix A is invertible, then it is really easy to find the null space using the following rule:
If A is invertible, then N(A) = {⃗0}.
That’s it!
Example
\[ A = \begin{bmatrix}1&0\\0&1\end{bmatrix} \quad \text{Easy!} \quad N(A) = \{\vec{0}\} \]
If a matrix A is NOT invertible (i.e., singular ), the process is less trivial, but the following steps can be
followed.
To show the steps to calculate this, we will calculate the null space using the following example. Suppose
that:
\[ A = \begin{bmatrix}1&2&2&4\\3&8&6&16\end{bmatrix} \]
1. First, find the reduced row echelon form (RREF) of the matrix A.
\[ \text{rref}\left(\begin{bmatrix}1&2&2&4\\3&8&6&16\end{bmatrix}\right) = \begin{bmatrix}1&0&2&0\\0&1&0&2\end{bmatrix} = R \]
2. Now, select the pivot columns. Pivot columns are columns that contain only a single 1, almost as if they came from the identity matrix I. For example, columns 1 and 2 in R are pivot columns.
3. All columns that are not pivot columns are free columns. In this case, columns 3 and 4 in R are free columns.
Note: If A is wide (i.e., has more columns than rows, n > m), then A will have free columns.
4. Now, create one special vector ⃗s for each free column. In each special vector, put a 1 in the entry corresponding to its free column and a 0 in the entries of the other free columns. In this case:
\[ \vec{s}_1 = \begin{bmatrix} \cdot \\ \cdot \\ 1 \\ 0 \end{bmatrix}, \qquad \vec{s}_2 = \begin{bmatrix} \cdot \\ \cdot \\ 0 \\ 1 \end{bmatrix} \quad \text{(pivot-variable entries not yet filled)} \]
5. Now, look at the R matrix. For each special vector, fill the pivot-variable entries with the entries of the corresponding free column of R, with the sign changed. So:
\[ \vec{s}_1 = \begin{bmatrix}-2\\0\\1\\0\end{bmatrix}, \qquad \vec{s}_2 = \begin{bmatrix}0\\-2\\0\\1\end{bmatrix} \]
6. These special solutions ⃗s₁ and ⃗s₂ span the null space N(A). So, we can say that:
\[ N(A) = \text{span}\{\vec{s}_1, \vec{s}_2\} = \text{span}\left\{ \begin{bmatrix}-2\\0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\-2\\0\\1\end{bmatrix} \right\} \]
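A short SymPy sketch (illustrative, not from the original notes) that reproduces the special solutions for this example:
\begin{verbatim}
from sympy import Matrix

A = Matrix([[1, 2, 2, 4],
            [3, 8, 6, 16]])

R, pivots = A.rref()
print(R)              # Matrix([[1, 0, 2, 0], [0, 1, 0, 2]])
print(pivots)         # (0, 1) -> columns 1 and 2 are pivot columns

for s in A.nullspace():
    print(s.T)        # special solutions: [-2, 0, 1, 0] and [0, -2, 0, 1]
    print(A * s)      # always the zero vector
\end{verbatim}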
Note: The dimension of N(A) = the number of free columns in A.
The number of pivots in a matrix is revealed through Gauss-Jordan Elimination, i.e., by finding the RREF of A.
A rank one matrix is a matrix with only one pivot. For example:
\[ A = \begin{bmatrix}2&4&-2\\1&2&-1\\3&6&-3\end{bmatrix} \]
- Every row in the matrix is a multiple of the first row, and
- Every column is a multiple of the first column!
We can actually represent rank one matrices as the outer product of two vectors: The first column of A,
and the first row of R, where R = rref (A). For example:
\[ A = \begin{bmatrix}2&4&-2\\1&2&-1\\3&6&-3\end{bmatrix} \rightarrow R = \begin{bmatrix}1&2&-1\\0&0&0\\0&0&0\end{bmatrix} \]
So therefore, we can conclude that every rank one matrix is of the form A = ⃗u⃗v T .
In addition, the null space of a rank one matrix consists of all of the vectors orthogonal to ⃗v.
Proof:
Suppose A⃗x = ⃗0 with A = ⃗u⃗vᵀ and ⃗u ≠ ⃗0. This implies that ⃗u(⃗vᵀ⃗x) = ⃗0, where ⃗vᵀ⃗x is a scalar. Since ⃗u ≠ ⃗0, it must be that ⃗vᵀ⃗x = 0, so ⃗x is orthogonal to ⃗v.
3.3 Complete Solution to A⃗x = ⃗b
A homogeneous system of linear equations is one in which all of the constant terms are zero. A
homogeneous system always has at least one solution, namely the zero vector.
Note: When a row operation is applied to a homogeneous system, the new system is still homoge-
neous!
Remember, the equation A⃗x = ⃗b represents a system of linear equations. Since a homogeneous system of linear equations always has at least one solution, namely the zero vector, we can reword this in math notation as the following:
Math notation:
If A⃗x = ⃗0 is homogeneous, then ∃⃗x such that A⃗x = ⃗0 (in particular, ⃗x = ⃗0).
Well, recall that the null space of a matrix A is the space of all solutions ⃗x where A⃗x = ⃗0!
Therefore, to find that ⃗x, we just have to find a vector ⃗x in the null space (find ⃗x ∈ N (A)).
We can do this by simply finding a basis of the null space N(A). When we solved for the null space of a singular matrix in Section 3.2, we found special solutions ⃗s₁, ..., ⃗s_n that span the null space.
We can treat these special solutions as a basis for the null space. Then, we attach the coefficients corresponding to the free columns and create a linear combination of these special solutions. This vector exists in N(A).
Example
First, we must find R, which is the reduced row echelon form of matrix A.
\[ A = \begin{bmatrix}1&3&0&2\\0&0&1&4\\1&3&1&6\end{bmatrix} \rightarrow R = \begin{bmatrix}1&3&0&2\\0&0&1&4\\0&0&0&0\end{bmatrix} \]
Now, we know that columns 1 and 3 are pivot columns, and columns 2 and 4 are free columns.
So, we can begin to build n special vectors, where n = the number of free columns. First, we fill in the values of 1 and 0 in the free-variable slots (slots 2 and 4):
\[ \vec{s}_1 = \begin{bmatrix} \cdot \\ 1 \\ \cdot \\ 0 \end{bmatrix}, \qquad \vec{s}_2 = \begin{bmatrix} \cdot \\ 0 \\ \cdot \\ 1 \end{bmatrix} \]
Then, we fill in the rest of the values as described in Section 3.2. To do this, we:
- Set one free-variable slot to 1 and the other to 0 (already done above).
- Fill the pivot-variable slots from the free columns of R, with the signs changed.
\[ \vec{s}_1 = \begin{bmatrix}-3\\1\\0\\0\end{bmatrix}, \qquad \vec{s}_2 = \begin{bmatrix}-2\\0\\-4\\1\end{bmatrix} \]
These vectors are special solutions! Now, look at the entry where the value 1 is located, and match it to the corresponding free column and to the entry of the vector ⃗x (in the original equation A⃗x = ⃗0).
In this case, we will have the coefficient x₂ on the vector ⃗s₁, and x₄ on ⃗s₂.
We can now say that the linear combination below exists in the null space N(A).
\[ x_2\begin{bmatrix}-3\\1\\0\\0\end{bmatrix} + x_4\begin{bmatrix}-2\\0\\-4\\1\end{bmatrix} \in N(A) \]
No matter the values of the coefficients x₂ and x₄, the linear combination will always exist in the null space N(A)!
We can also say that ⃗x_n is the vector in the null space given by the linear combination shown above:
\[ \vec{x}_n = x_2\begin{bmatrix}-3\\1\\0\\0\end{bmatrix} + x_4\begin{bmatrix}-2\\0\\-4\\1\end{bmatrix} \]
Above, we talked about solving homogeneous systems of linear equations, where the right-hand side is the zero vector.
What if we are trying to find the complete solution for a system where ⃗b ̸= ⃗0? That is a non-homogeneous
system!
We can follow a similar format to above, however the process is a bit more involved. It is easiest if this is
done through an example:
Example
Given the following augmented matrix [A | ⃗b]:
\[ [A \mid \vec{b}] = \left[\begin{array}{cccc|c}1&3&0&2&1\\0&0&1&4&6\\1&3&1&6&7\end{array}\right] \]
First, we must find [R | d⃗], which is the reduced row echelon form of the augmented matrix [A | ⃗b]:
\[ [A \mid \vec{b}] = \left[\begin{array}{cccc|c}1&3&0&2&1\\0&0&1&4&6\\1&3&1&6&7\end{array}\right] \rightarrow [R \mid \vec{d}] = \left[\begin{array}{cccc|c}1&3&0&2&1\\0&0&1&4&6\\0&0&0&0&0\end{array}\right] \]
Now, we know that columns 1 and 3 are pivot columns, and columns 2 and 4 are free columns.
The next step is a bit different than before. Before trying to find the special solutions, we first want to find a particular solution ⃗x_p, where R⃗x_p = d⃗.
To do this, we:
- Set the free variable slots to 0.
- Get the pivot-variable values from d⃗.
\[ \vec{x}_p = \begin{bmatrix}1\\0\\6\\0\end{bmatrix} \]
This is our particular solution! It should be verifiable that R⃗x_p = d⃗ and that A⃗x_p = ⃗b.
Note: A solution only exists if any zero rows in R are also 0 in d⃗!
Now that we know the particular solution, we can solve for our special solutions.
The special solutions are found exactly as before (the coefficient matrix A is the same):
\[ \vec{s}_1 = \begin{bmatrix}-3\\1\\0\\0\end{bmatrix}, \qquad \vec{s}_2 = \begin{bmatrix}-2\\0\\-4\\1\end{bmatrix} \]
Just like before, we look at the entry where the value 1 is located and correspond it to the free column and to the entry of the vector ⃗x. In this case, we will have the coefficient x₂ on the vector ⃗s₁, and x₄ on ⃗s₂.
Finally, we can create the final, complete solution, which is the particular solution plus anything in the null space:
\[ \vec{x} = \vec{x}_p + \vec{x}_n = \begin{bmatrix}1\\0\\6\\0\end{bmatrix} + x_2\begin{bmatrix}-3\\1\\0\\0\end{bmatrix} + x_4\begin{bmatrix}-2\\0\\-4\\1\end{bmatrix} \]
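A quick numerical check (illustrative, not from the original notes) that the particular solution works and that adding any null-space vector keeps A⃗x = ⃗b:
\begin{verbatim}
import numpy as np

A = np.array([[1, 3, 0, 2],
              [0, 0, 1, 4],
              [1, 3, 1, 6]])
b = np.array([1, 6, 7])

xp = np.array([1, 0, 6, 0])                   # particular solution
s1 = np.array([-3, 1, 0, 0])                  # special solutions spanning N(A)
s2 = np.array([-2, 0, -4, 1])

print(A @ xp)                                 # [1 6 7] -> equals b
x = xp + 2 * s1 - 5 * s2                      # any combination of s1, s2 may be added
print(A @ x)                                  # still [1 6 7]
\end{verbatim}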
Full column rank (r = n): This only happens if A is square or tall, i.e., every column has a pivot. In this case, there are no free columns/variables and N(A) = {⃗0}.
Full row rank (r = m): This only happens if A is square or wide, i.e., every row has a pivot. In this case, C(A) = Rᵐ, so A⃗x = ⃗b is solvable for every ⃗b.
3.4 Independence, Basis, and Dimension
Vectors ⃗v₁, ..., ⃗v_n are linearly independent if the only combination of the vectors that gives ⃗0 is the zero combination: 0⃗v₁ + 0⃗v₂ + ... + 0⃗v_n.
This definition can be extended. If x1 v⃗1 + x2 v⃗2 + ... + xn v⃗n = ⃗0, this implies that x1 = x2 = ... = xn = 0. If
this is the case, then v⃗1 , ..., v⃗n are linearly independent.
If there exists a non-zero combination where x1 v⃗1 + x2 v⃗2 + ... + xn v⃗n = ⃗0, then v⃗1 , ..., v⃗n are linearly
dependent.
We can further extend this. All of these points actually mean the same thing, but they are also written in
different ways. So:
The columns of A, where A = [⃗a₁ ⃗a₂ ... ⃗a_n], are linearly independent if the only solution to A⃗x = ⃗0 is ⃗x = ⃗0.
So therefore:
- The columns of A are independent IFF N(A) = {⃗0}.
- The columns of A are independent IFF r = n.
- The columns of A are dependent IFF A⃗x = ⃗0 has a nonzero solution.
- Any n vectors in Rᵐ must be dependent if n > m.
3.4.2 Basis
The basis of any vector space V is a set of linearly independent vectors that span the space V .
- Proof of Uniqueness
Suppose that ⃗v₁, ..., ⃗v_n is a basis for the vector space V. Then for any ⃗v ∈ V, say there are two combinations that give ⃗v:
\[ a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_n\vec{v}_n = \vec{v} \qquad \text{and} \qquad b_1\vec{v}_1 + b_2\vec{v}_2 + \cdots + b_n\vec{v}_n = \vec{v} \]
Subtracting the two combinations gives (a₁ − b₁)⃗v₁ + ... + (a_n − b_n)⃗v_n = ⃗0. Remember, since ⃗v₁, ..., ⃗v_n is a basis for the vector space, the vectors ⃗v₁, ..., ⃗v_n are independent. A linear combination of independent vectors can only equal ⃗0 if each coefficient is equal to 0. Therefore,
\[ a_1 = b_1, \; a_2 = b_2, \; \ldots, \; a_n = b_n \]
If v⃗1 , ..., v⃗n is a basis for the vector space V , then for any ⃗v ∈ V there is a unique combination of v⃗1 , ..., v⃗n
that gives ⃗v .
The standard basis vectors are simply the columns of the identity matrix I. For example:
- In R², the columns of \( I = \begin{bmatrix}1&0\\0&1\end{bmatrix} \) are the standard basis.
- In R³, the columns of \( I = \begin{bmatrix}1&0&0\\0&1&0\\0&0&1\end{bmatrix} \) are the standard basis.
- and so on...
It is also very important to note that the basis of a space is NOT unique. This means that there can be more than one basis for a space. The uniqueness proof above shows that each vector in the space is a unique linear combination of a given set of basis vectors, not that there is only one basis for the space!
Also, the columns of every invertible n × n matrix A are a basis for Rⁿ.
Why?
- Independence: The only solution to A⃗x = ⃗0 is ⃗x = A⁻¹⃗0 = ⃗0.
- Span Rⁿ: A is invertible → C(A) = Rⁿ.
This also implies that the vectors v⃗1 , ..., v⃗n are a basis for Rn IFF they are the columns of an n × n dimension
invertible matrix.
Also, if A is singular, then the pivot columns of A are a basis for C(A).
We can define that the dimension of the vector space V is the number of vectors in the space’s basis.
All of the possible bases for a vector space V all have the same number of vectors! Therefore, the dimension
of the space should be the same no matter which basis is chosen.
- Any n independent vectors in Rⁿ must span Rⁿ. So, they are a basis.
- Any n vectors that span Rⁿ must be independent. So, they are a basis.
- If the n columns of A are independent, they must span Rⁿ. So, A⃗x = ⃗b is solvable for any ⃗b.
- If the n columns of A span Rⁿ, they are independent. So, A⃗x = ⃗b has only one solution.
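A small NumPy sketch (illustrative, with made-up vectors, not from the original notes): the rank tells us whether a set of column vectors is independent and whether it forms a basis for Rⁿ.
\begin{verbatim}
import numpy as np

v1 = np.array([1, 2, 0])
v2 = np.array([0, 1, 1])
v3 = np.array([1, 3, 1])          # v3 = v1 + v2, so the set is dependent

A = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(A))   # 2 -> only 2 independent columns, not a basis of R^3

B = np.column_stack([v1, v2, np.array([0, 0, 1])])
print(np.linalg.matrix_rank(B))   # 3 -> independent, so the columns form a basis of R^3
\end{verbatim}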
3.4.4 Applying these Concepts to Other Spaces
The concepts of span, independence, basis and dimension apply to spaces other than column vectors as well.
For example, the space R2×2 represents the space of all 2 × 2 matrices.
A standard basis for R^{2×2} is:
\[ A_1 = \begin{bmatrix}1&0\\0&0\end{bmatrix}, \quad A_2 = \begin{bmatrix}0&1\\0&0\end{bmatrix}, \quad A_3 = \begin{bmatrix}0&0\\1&0\end{bmatrix}, \quad A_4 = \begin{bmatrix}0&0\\0&1\end{bmatrix} \]
3.5 Dimensions of the Four Subspaces
The row space C(Aᵀ) is the space spanned by the rows of A and is a subspace of Rⁿ.
The left null space N(Aᵀ) is the null space of Aᵀ and is a subspace of Rᵐ:
- ⃗y ∈ N(Aᵀ) ⟷ Aᵀ⃗y = ⃗0 ⟷ ⃗yᵀA = ⃗0ᵀ
We can now understand Part 1 of the Fundamental Theorem of Linear Algebra, which restates but also adds to the notes above:
- dim C(A) = dim C(Aᵀ) = r (the rank)
- dim N(A) = n − r
- dim N(Aᵀ) = m − r
4 Orthogonality
Two vectors ⃗v and ⃗u are orthogonal (perpendicular) if their dot product is zero. Equivalently, if ⃗vᵀ⃗u = 0.
It is also good to note that for orthogonal vectors, ∥⃗v + ⃗u∥² = ∥⃗v∥² + ∥⃗u∥². This is derived from the Pythagorean theorem!
A set of vectors ⃗v₁, ..., ⃗v_k is pairwise orthogonal if the dot product of every pair of distinct vectors is 0.
We can then more formally say that: A set of vectors ⃗v1 , ..., ⃗vk are pairwise orthogonal if
⃗vi · ⃗vj = 0 for all i ̸= j and i, j = 1, 2, .., k.
Fact: Nonzero pairwise orthogonal vectors ⃗v₁, ..., ⃗v_k are linearly independent.
- Proof
Suppose a₁⃗v₁ + ... + a_k⃗v_k = ⃗0. Multiplying by ⃗v_jᵀ:
\[ \vec{v}_j^T(a_1\vec{v}_1 + \cdots + a_k\vec{v}_k) = a_1\vec{v}_j^T\vec{v}_1 + \cdots + a_k\vec{v}_j^T\vec{v}_k = a_j\vec{v}_j^T\vec{v}_j \]
since all the other dot products are 0. Recall that ⃗v_jᵀ⃗v_j = ∥⃗v_j∥² ≠ 0, so a_j = 0 for every j.
Vector subspaces V and U are orthogonal subspaces if every vector in V is orthogonal to every vector in U.
We can then more formally say that: Vector subspaces V and U are orthogonal subspaces if
⃗v T ⃗u = 0 for all ⃗v ∈ V and ⃗u ∈ U .
4.1.4 Orthogonal Matrix Spaces
One fact in linear algebra is that the null space N(A) and the row space C(Aᵀ) of a matrix A are orthogonal.
- Why? If ⃗x ∈ N(A), then A⃗x = ⃗0, which says that every row of A has dot product 0 with ⃗x. Since the rows span C(Aᵀ), ⃗x is orthogonal to the entire row space.
The orthogonal complement of a subspace V ⊂ U contains every vector in the hosting space U that is
perpendicular (orthogonal) to V .
- The sum of the dimensions of V and V⊥ is the dimension of U: dim V + dim V⊥ = dim U.
- The direct sum of V and V⊥ is the space U: V ⊕ V⊥ = U.
With the knowledge of orthogonality, we now know what we need to learn the second part of the Fundamental
Theorem of Linear Algebra.
Fundamental Theorem of Linear Algebra - Part 2:
The null space N (A) is the orthogonal complement of the row space C(AT ) in Rn .
- N (A) = C(AT )⊥
The left null space N(Aᵀ) is the orthogonal complement of the column space C(A) in Rᵐ.
- N (AT ) = C(A)⊥
4.1.7 Action of A
Any vector ⃗x ∈ Rⁿ can be split into a row-space component and a null-space component:
\[ \vec{x} = \vec{x}_r + \vec{x}_n \]
Since A⃗x_n = ⃗0, the action of A comes entirely from the row-space component: A⃗x = A⃗x_r = ⃗b.
Example: Let S = span{(2, 1, 3)} ⊂ R³. Find S⊥.
First, we need to know what S⊥ means: it is the space of all vectors ⃗x ∈ R³ that are perpendicular to every vector ⃗s ∈ S. So, we can write the following:
\[ S^{\perp} = \left\{ \vec{x} \in \mathbb{R}^3 : \begin{bmatrix}2\\1\\3\end{bmatrix}^T \vec{x} = 0 \right\} \]
We can then find the reduced row echelon form.
\[ \text{rref}\left(\begin{bmatrix}2\\1\\3\end{bmatrix}^T\right) = \text{rref}\left(\begin{bmatrix}2&1&3\end{bmatrix}\right) \rightarrow \begin{bmatrix}1&\tfrac{1}{2}&\tfrac{3}{2}\end{bmatrix} \]
We can now follow the process to find the special vectors that span the null space.
\[ S^{\perp} = N\left(\begin{bmatrix}2&1&3\end{bmatrix}\right) = \text{span}\left\{ \begin{bmatrix}-\tfrac{1}{2}\\1\\0\end{bmatrix}, \begin{bmatrix}-\tfrac{3}{2}\\0\\1\end{bmatrix} \right\} \]
We can also say, using the Fundamental Theorem of Linear Algebra, that dim R³ = dim S + dim S⊥ = 1 + 2 = 3.
Note: We can use the formula dim Rⁿ = dim S + dim S⊥ to find the dimension of the hosting space, of S, or of the subspace S⊥ of all vectors orthogonal to S.
Lastly, we can also say that any vector ⃗x ∈ R³ can be decomposed into:
\[ \vec{x} = \vec{x}_s + \vec{x}_{s^{\perp}} \]
Meaning that any vector can be decomposed into the sum of a vector from S and a vector from S⊥.
4.2 Projections
The projection of a vector onto another object (a line, a plane, or another vector) is the closest point on that object to the vector.
We can use projection matrices that, when applied to a vector, find the projection of that vector.
We can say that, for any vector ⃗b and its projection ⃗p, there exists a projection matrix P where P⃗b = ⃗p.
For example, consider the vector \( \vec{b} = \begin{bmatrix}2\\3\\4\end{bmatrix} \):
1. Project the vector onto the z axis.
Answer: \( \vec{p}_1 = \begin{bmatrix}0\\0\\4\end{bmatrix} \), where \( P_1 = \begin{bmatrix}0&0&0\\0&0&0\\0&0&1\end{bmatrix} \). This is true because P₁⃗b = ⃗p₁.
2. Project the vector onto the xy plane.
Answer: \( \vec{p}_2 = \begin{bmatrix}2\\3\\0\end{bmatrix} \), where \( P_2 = \begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix} \). This is true because P₂⃗b = ⃗p₂.
Note: It is also pretty cool to note that if P₁ + P₂ = I, then C(P₁) and C(P₂) are orthogonal complements! This makes sense. Think about the examples above.
- The column space C(P₁) is simply the span of the last column, which is the z axis.
- The column space C(P₂) is simply the span of the first two columns, which is the xy plane.
If you think about it, the z axis is perpendicular to the xy plane! Therefore, the column spaces C(P₁) and C(P₂) are orthogonal complements.
The note above extends into the following rules:
First, the vector ⃗b can be decomposed into the sum of its two projections ⃗p₁ and ⃗p₂:
- P₁ + P₂ = I
- and therefore ⃗p₁ + ⃗p₂ = P₁⃗b + P₂⃗b = ⃗b.
The projection of a vector ⃗b onto a line passing through the origin in the direction of a vector ⃗a is simply the "shadow" of ⃗b on that line.
The shadow, or the projection ⃗p of the vector ⃗b, is essentially a scaled version of the vector ⃗a. Because of this, we can say that there is some scalar, denoted x̂, that scales ⃗a to get ⃗p.
Therefore: ⃗p = x̂⃗a.
We can also define an error vector, which is the difference between the original vector ⃗b and the projection vector ⃗p. We denote the error vector as ⃗e, and we know that:
\[ \vec{e} = \vec{b} - \vec{p} = \vec{b} - \hat{x}\vec{a} \]
Geometrically, ⃗e should be perpendicular to the original vector ⃗a.
\[ \vec{a}^T\vec{e} = 0 \]
Since ⃗e = ⃗b − ⃗p = ⃗b − x̂⃗a,
\[ \vec{a}^T(\vec{b} - \hat{x}\vec{a}) = 0 \]
\[ \vec{a}^T\vec{b} - \hat{x}\,\vec{a}^T\vec{a} = 0 \]
\[ \hat{x}\,\vec{a}^T\vec{a} = \vec{a}^T\vec{b} \]
\[ \hat{x} = \frac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}, \quad \text{where } \vec{a}^T\vec{a} \neq 0 \]
Therefore:
\[ \vec{p} = \hat{x}\vec{a} = \frac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a} \]
Rearranging,
\[ \vec{p} = \frac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}}\,\vec{a} = \vec{a}\,\frac{\vec{a}^T\vec{b}}{\vec{a}^T\vec{a}} = \frac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}\,\vec{b} \]
In this case, the expression has been rearranged to reveal a projection matrix. This is because ⃗a⃗aᵀ is an n × n matrix, which is being scaled by the scalar 1/(⃗aᵀ⃗a) ∈ R. Then ⃗b is the original vector.
Thus, \( \vec{p} = \frac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}}\vec{b} \), where \( \frac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}} \) is a matrix. Remember the equation above, ⃗p = P⃗b. We can finally conclude that:
\[ P = \frac{\vec{a}\vec{a}^T}{\vec{a}^T\vec{a}} \]
In Conclusion:
- ⃗p = P⃗b
- ⃗p = x̂⃗a = (⃗aᵀ⃗b / ⃗aᵀ⃗a) ⃗a, which implies x̂ = ⃗aᵀ⃗b / ⃗aᵀ⃗a
- P = ⃗a⃗aᵀ / ⃗aᵀ⃗a
- ⃗e = ⃗b − ⃗p = ⃗b − x̂⃗a
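A small NumPy sketch (illustrative, with an assumed direction vector ⃗a, not from the original notes) of projecting a vector onto a line through the origin:
\begin{verbatim}
import numpy as np

a = np.array([1.0, 1.0, 2.0])     # direction of the line
b = np.array([2.0, 3.0, 4.0])     # vector to project

x_hat = (a @ b) / (a @ a)         # scalar x-hat = (a.b) / (a.a)
p = x_hat * a                     # the projection
e = b - p                         # the error vector

print(p)                          # [13/6, 13/6, 13/3]
print(a @ e)                      # ~0 -> error is perpendicular to a

P = np.outer(a, a) / (a @ a)      # projection matrix a a^T / (a^T a)
print(np.allclose(P @ b, p))      # True
\end{verbatim}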
Just like we can project a vector ⃗b onto a line, we can also project ⃗b onto the subspace spanned by given vectors ⃗a₁, ..., ⃗a_n.
This is equivalent to finding the combination x̂₁⃗a₁ + ... + x̂_n⃗a_n that is closest to the vector ⃗b, which in matrix form is the vector Ax̂ closest to ⃗b, where A has columns ⃗a₁, ..., ⃗a_n.
In terms of the error, we want the error ⃗e = ⃗b − ⃗p = ⃗b − Ax̂ to be perpendicular to the subspace spanned by ⃗a₁, ..., ⃗a_n, and hence perpendicular to each vector ⃗a₁, ..., ⃗a_n.
\[ \begin{bmatrix}\vec{a}_1^T \\ \vdots \\ \vec{a}_n^T\end{bmatrix}\vec{e} = \vec{0} \;\rightarrow\; A^T\vec{e} = \vec{0} \;\rightarrow\; A^T(\vec{b} - A\hat{x}) = \vec{0} \;\rightarrow\; A^T\vec{b} - A^TA\hat{x} = \vec{0} \]
\[ A^TA\hat{x} = A^T\vec{b} \]
Therefore, given the Normal Equation AᵀAx̂ = Aᵀ⃗b, we can say that (when AᵀA is invertible):
\[ \hat{x} = (A^TA)^{-1}A^T\vec{b} \in \mathbb{R}^n \]
Using the equation ⃗p = P⃗b, we can then conclude that the projection matrix P is:
\[ P = A(A^TA)^{-1}A^T \in \mathbb{R}^{m \times m} \]
- Example
Project \( \vec{b} = \begin{bmatrix}6\\0\\0\end{bmatrix} \) onto the subspace spanned by \( \begin{bmatrix}1\\1\\1\end{bmatrix} \) and \( \begin{bmatrix}0\\1\\2\end{bmatrix} \).
Solving the Normal Equation AᵀAx̂ = Aᵀ⃗b gives x̂ = (5, −3). Therefore,
\[ \vec{p} = A\hat{x} = \begin{bmatrix}1&0\\1&1\\1&2\end{bmatrix}\begin{bmatrix}5\\-3\end{bmatrix} = \begin{bmatrix}5\\2\\-1\end{bmatrix} \]
In Conclusion:
- ⃗p = P⃗b
- ⃗e = ⃗b − ⃗p = ⃗b − Ax̂
- AᵀAx̂ = Aᵀ⃗b
- x̂ = (AᵀA)⁻¹Aᵀ⃗b
- ⃗p = Ax̂ = A(AᵀA)⁻¹Aᵀ⃗b
- P = A(AᵀA)⁻¹Aᵀ
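A short NumPy sketch (illustrative, not from the original notes) reproducing the projection example above via the Normal Equation:
\begin{verbatim}
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equation A^T A x = A^T b
p = A @ x_hat                               # projection of b onto C(A)

print(x_hat)                                # [ 5. -3.]
print(p)                                    # [ 5.  2. -1.]
print(A.T @ (b - p))                        # ~[0, 0] -> error is perpendicular to C(A)
\end{verbatim}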
4.3 Least Squares
Oftentimes, we will want to solve the system of equations A⃗x = ⃗b, but there is no solution!
Imagine that ⃗x holds some model parameters and ⃗b is a set of noisy, scattered output points.
Given this scenario, we might want to fit a line to the points rather than get an exact solution.
Generally, the equation for a linear best-fit line is b = C + Dt, where b is synonymous with y and t is synonymous with x, C is the y-intercept (the constant term), and D is the coefficient of the t term.
Using this basic model, we can turn this into matrix form A⃗x = ⃗b. Looking at the best-fit line equation, b = C + Dt can be written as ⃗1C + ⃗tD = ⃗b. The C term is ⃗1C because all of its coefficients are the constant 1.
The actual unknowns, which go in ⃗x, are C and D. So, we can now build our linear system of equations in matrix form:
\[ \begin{bmatrix}\vec{1} & \vec{t}\end{bmatrix}\begin{bmatrix}C\\D\end{bmatrix} = \vec{b} \]
Remember, [⃗1 ⃗t] is a matrix and not a vector, because its entries ⃗1 and ⃗t are columns!
So, you can solve this system (and create a best-fit line) by solving the Normal Equation!
Therefore, instead of solving A⃗x = ⃗b, you solve AᵀAx̂ = Aᵀ⃗b.
Note: Remember for this approach, you must either assume or confirm that the columns of A are
independent — otherwise this does not work!
Example
Fit a line to the points (0, 6), (1, 0), and (2, 0).
Convert to matrix form:
\[ \begin{bmatrix}1&0\\1&1\\1&2\end{bmatrix}\begin{bmatrix}C\\D\end{bmatrix} = \begin{bmatrix}6\\0\\0\end{bmatrix} \]
This clearly has no solution. But instead, let's use the Normal Equation AᵀAx̂ = Aᵀ⃗b!
\[ A^TA = \begin{bmatrix}3&3\\3&5\end{bmatrix}, \qquad A^T\vec{b} = \begin{bmatrix}6\\0\end{bmatrix} \;\Rightarrow\; \hat{C} = 5, \; \hat{D} = -3 \]
So the best-fit line is b = 5 − 3t.
Geometric Justification
Instead, we find a projection p⃗ ∈ C(A) that is the closest to the vector ⃗b.
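A small NumPy sketch (illustrative, not from the original notes) that recovers the same best-fit line with a least-squares solver:
\begin{verbatim}
import numpy as np

t = np.array([0.0, 1.0, 2.0])
b = np.array([6.0, 0.0, 0.0])

A = np.column_stack([np.ones_like(t), t])        # columns: the all-ones vector and t
coeffs, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)

C, D = coeffs
print(C, D)                                      # 5.0 -3.0  -> line b = 5 - 3t
\end{verbatim}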
4.4 Orthonormal Bases and Gram-Schmidt
Recall that ⃗v1 , ..., ⃗vn are pairwise orthogonal if ⃗vi · ⃗vj = 0 for all i ̸= j and i, j = 1, 2, .., n.
We can say vectors ⃗q1 , ..., ⃗qn are orthonormal if they are unit length orthogonal vectors.
Vectors ⃗q₁, ..., ⃗q_n are orthonormal if:
\[ \vec{q}_i^T\vec{q}_j = \begin{cases} 0 & i \neq j \\ 1 & i = j \end{cases} \]
where the value is 0 when they are orthogonal and 1 because each is a unit vector.
We can further extend this. If ⃗v₁, ..., ⃗v_n are nonzero and pairwise orthogonal, then the vectors \( \vec{q}_1 = \frac{\vec{v}_1}{\|\vec{v}_1\|}, \ldots, \vec{q}_n = \frac{\vec{v}_n}{\|\vec{v}_n\|} \) are orthonormal.
A matrix Q whose columns are orthonormal satisfies QᵀQ = I; if Q is square, it is called an orthogonal matrix and Qᵀ = Q⁻¹.
Examples
- Permutation matrices P! This is easy to confirm because each column is a unit vector and all columns are orthogonal. Permutation matrices are orthogonal and Pᵀ = P⁻¹.
- Rotation matrices R.
- Proof
Show that \( R = \begin{bmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{bmatrix} \) is orthogonal.
\[ R^TR = \begin{bmatrix}\cos^2\theta + \sin^2\theta & \cos\theta\sin\theta - \cos\theta\sin\theta \\ \cos\theta\sin\theta - \cos\theta\sin\theta & \cos^2\theta + \sin^2\theta\end{bmatrix} = \begin{bmatrix}1&0\\0&1\end{bmatrix} = I. \]
4.4.3 Invariance
Multiplication by a matrix Q with orthonormal columns:
1. Leaves lengths unchanged: ∥Q⃗x∥ = ∥⃗x∥
2. Leaves angles (inner products) unchanged: (Q⃗x)ᵀ(Q⃗y) = ⃗xᵀQᵀQ⃗y = ⃗xᵀ⃗y
We can also prove that if Q has orthonormal columns, then ∥Q⃗x∥ = ∥⃗x∥ for any vector ⃗x.
- Proof
\[ \|Q\vec{x}\|^2 = (Q\vec{x})^T(Q\vec{x}) = \vec{x}^TQ^TQ\vec{x} \]
Since QᵀQ = I, then:
\[ \vec{x}^TQ^TQ\vec{x} = \vec{x}^T\vec{x} = \|\vec{x}\|^2 \]
4.4.4 Finding Least Square (Line of Best Fit) with Orthogonal Matrices
Recall the Normal Equation: that is, the least square solution to A⃗x = ⃗b is:
AT Ax̂ = AT ⃗b
If A has orthonormal columns, then we can say that A → Q. Then, the least squares solution to Q⃗x = ⃗b is
instead QT Qx̂ = QT ⃗b. However, remember that QT Q = I, so this simplifies to:
x̂ = QT ⃗b
So, if Q is square, then x̂ is an EXACT solution, not just the closest solution! That is very cool, and the distinction is important to note.
This is because, for square Q, setting ⃗x = x̂ = Qᵀ⃗b gives Qx̂ = QQᵀ⃗b = ⃗b, since QQᵀ = I for a square orthogonal matrix.
The Gram-Schmidt Algorithm starts with independent vectors ⃗a, ⃗b, ⃗c. . .
Now, we want to construct orthogonal vectors ⃗A, ⃗B, and ⃗C that span the same space.
Constructing A, B, C
First, A = ⃗a.
Then, we want B to be orthogonal to A. Therefore, we start with ⃗b and subtract its projection along A:
\[ B = \vec{b} - \frac{A^T\vec{b}}{A^TA}A \]
Then, we want C to be orthogonal to both A and B. Therefore, we start with ⃗c and subtract its projections on A and B:
\[ C = \vec{c} - \frac{A^T\vec{c}}{A^TA}A - \frac{B^T\vec{c}}{B^TB}B \]
By construction, span{A, B, C} = span{⃗a, ⃗b, ⃗c}.
In Conclusion
\[ A = \vec{a} \]
\[ B = \vec{b} - \frac{A^T\vec{b}}{A^TA}A \]
\[ C = \vec{c} - \frac{A^T\vec{c}}{A^TA}A - \frac{B^T\vec{c}}{B^TB}B \]
\[ D = \vec{d} - \frac{A^T\vec{d}}{A^TA}A - \frac{B^T\vec{d}}{B^TB}B - \frac{C^T\vec{d}}{C^TC}C \]
... and so on.
Lastly, we can prove that ⃗a, ⃗b, ⃗c are linear combinations of ⃗q1 , ⃗q2 , ⃗q3 (and vice versa).
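A compact NumPy sketch of the Gram-Schmidt construction described above (illustrative, with made-up starting vectors, not from the original notes):
\begin{verbatim}
import numpy as np

def gram_schmidt(vectors):
    """Return orthogonal vectors spanning the same space as the inputs."""
    orthogonal = []
    for v in vectors:
        w = v.astype(float)
        for q in orthogonal:
            w -= (q @ v) / (q @ q) * q     # subtract the projection onto each earlier vector
        orthogonal.append(w)
    return orthogonal

a = np.array([1.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0])
c = np.array([0.0, 1.0, 1.0])

A, B, C = gram_schmidt([a, b, c])
print(A @ B, A @ C, B @ C)     # all ~0 -> pairwise orthogonal
\end{verbatim}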
5 Determinants
For a 2×2 matrix \( A = \begin{bmatrix}a&b\\c&d\end{bmatrix} \), the inverse is \( A^{-1} = \frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix} \). In fact, the determinant of the matrix A is the value ad − bc, which is the denominator of the ratio used to find the inverse.
We say that the determinant of matrix A can be written as det A, which is also written with the notation |A|.
Therefore, \( \det A = |A| = \begin{vmatrix}a&b\\c&d\end{vmatrix} = ad - bc \).
Also, recall that the determinant of any matrix is the product of its pivots.
Remember that pivots are revealed through elimination. Given a matrix \( A = \begin{bmatrix}a&b\\c&d\end{bmatrix} \), after elimination we get \( U = \begin{bmatrix}a&b\\0&d - \frac{c}{a}b\end{bmatrix} \). This process reveals that a and \( d - \frac{c}{a}b \) are the pivots of the matrix. Their product is \( a\left(d - \frac{c}{a}b\right) = ad - bc = |A| \).
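A tiny NumPy check (illustrative, with made-up entries, not from the original notes) that the 2×2 formula and the product-of-pivots view agree:
\begin{verbatim}
import numpy as np

a, b, c, d = 2.0, 7.0, 3.0, 5.0
A = np.array([[a, b],
              [c, d]])

print(a * d - b * c)                      # -11.0
print(np.linalg.det(A))                   # -11.0 (up to floating point)
print(a * (d - (c / a) * b))              # product of the pivots -> -11.0
\end{verbatim}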
5.2 Properties of Determinants
There are a few properties that let us easily find the determinant of a matrix.
1. The determinant of the identity matrix is 1:
\[ |I| = \begin{vmatrix}1 & \cdots & 0\\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1\end{vmatrix} = 1 \]
2. The determinant changes sign when two rows are exchanged. Example:
\[ \begin{vmatrix}a&b\\c&d\end{vmatrix} = -\begin{vmatrix}c&d\\a&b\end{vmatrix} \]
5.2.3 Linearity
The determinant is linear in each row separately:
\[ \begin{vmatrix}ta&tb\\c&d\end{vmatrix} = t\begin{vmatrix}a&b\\c&d\end{vmatrix} \]
\[ \begin{vmatrix}a+a'&b+b'\\c&d\end{vmatrix} = \begin{vmatrix}a&b\\c&d\end{vmatrix} + \begin{vmatrix}a'&b'\\c&d\end{vmatrix} \]
\[ \begin{vmatrix}a&b\\tc&td\end{vmatrix} = t\begin{vmatrix}a&b\\c&d\end{vmatrix} \]
\[ \begin{vmatrix}sa&sb\\tc&td\end{vmatrix} = st\begin{vmatrix}a&b\\c&d\end{vmatrix} \]
5.2.4 Determinant of Matrix with Duplicate Rows
This rule is also pretty simple, and follows from the rule defined in property 2: if a matrix has two duplicate rows, then exchanging those rows flips the sign of the determinant but leaves the matrix unchanged, so |A| = −|A| and therefore |A| = 0.
Elimination, which involves subtracting a multiple of one row from another, leaves the determinant |A| unchanged:
\[ \begin{vmatrix}a&b\\c-la&d-lb\end{vmatrix} = \begin{vmatrix}a&b\\c&d\end{vmatrix} + \begin{vmatrix}a&b\\-la&-lb\end{vmatrix} = \begin{vmatrix}a&b\\c&d\end{vmatrix} - l\begin{vmatrix}a&b\\a&b\end{vmatrix} = \begin{vmatrix}a&b\\c&d\end{vmatrix} \]
If a matrix A is triangular, either upper or lower, then the determinant |A| = a₁₁a₂₂···a_nn is the product of its diagonal entries.
If a matrix A is invertible, then |A| ≠ 0.
Why?
- Eliminate A → U. If A is invertible, there are n nonzero pivots, and |A| = ± (product of the pivots) ≠ 0.
The determinant of the product of two matrices is the product of the determinants of the two matrices. So, |AB| = |A||B|.
To prove this:
- Define the function D(A) = |AB| / |B|, where D : R^{n×n} → R.
- One can check that D satisfies the defining properties of the determinant (it equals 1 at I, changes sign under row exchanges, and is linear in each row), so D(A) = |A|, which gives |AB| = |A||B|.
It is important to note that all properties that apply to ROWS also apply analogously to COLUMNS.
5.3 Permutations and Cofactors
One way of finding the determinant of matrix A is to use the big formula, which sums over all n! permutations of the columns:
\[ |A| = \sum_{\text{permutations } \sigma} \operatorname{sign}(\sigma)\, a_{1\sigma(1)} a_{2\sigma(2)} \cdots a_{n\sigma(n)} \]
5.3.2 Cofactors
The cofactor of entry a_{ij} is C_{ij} = (−1)^{i+j} M_{ij}, where M_{ij} is the determinant of the submatrix with row i and column j removed. The determinant can then be expanded along any row: |A| = a_{i1}C_{i1} + a_{i2}C_{i2} + ··· + a_{in}C_{in}.
6 Eigenvectors and Eigenvalues
Recall the expression A⃗x. Usually, multiplication of a vector ⃗x by a matrix A changes the direction of a
vector.
So, if ⃗x ≠ ⃗0 and ⃗y = A⃗x ≠ ⃗0, then usually it is the case that
\[ \frac{\vec{x}}{\|\vec{x}\|} \neq \frac{\vec{y}}{\|\vec{y}\|}. \]
However, there are certain special vectors that satisfy the equation A⃗x = λ⃗x, where λ⃗x is in the same direction as ⃗x but perhaps has a different magnitude.
- ⃗x is an eigenvector of A.
- λ is the eigenvalue of A.
Remember that eigenvectors and eigenvalues satisfy the equation A⃗x = λ⃗x. So, we can also deduce that A⃗x − λ⃗x = ⃗0, or in other words:
\[ (A - \lambda I)\vec{x} = \vec{0} \]
Note: If (A − λI)⃗x = ⃗0 for some ⃗x ≠ ⃗0, then A − λI must be singular. This implies that |A − λI| = 0, and that ⃗x ∈ N(A − λI).
Example:
Find the eigenvalues of the matrix \( A = \begin{bmatrix}0.8&0.3\\0.2&0.7\end{bmatrix} \).
\[ A - \lambda I = \begin{bmatrix}0.8-\lambda & 0.3\\0.2 & 0.7-\lambda\end{bmatrix} \]
Now, we find the determinant of A − λI. Note that the 0.06 below is the product of the off-diagonal entries (0.3 × 0.2), subtracted when computing the determinant.
\[ |A - \lambda I| = (0.8-\lambda)(0.7-\lambda) - 0.06 = \lambda^2 - \frac{3}{2}\lambda + \frac{1}{2} = (\lambda - 1)\left(\lambda - \frac{1}{2}\right) \]
Setting this to 0 gives the eigenvalues λ₁ = 1 and λ₂ = 1/2.
Now that we have found the eigenvalues, we can begin to look for the eigenvectors.
Given the eigenvalues above, we can say that the eigenvectors are in the null spaces of A − 1I = A − I and A − ½I, for each eigenvalue respectively. This basically means that, given the two eigenvectors ⃗x₁ and ⃗x₂, A⃗x₁ = ⃗x₁ and A⃗x₂ = ½⃗x₂.
We can solve to find that the eigenvectors are \( \vec{x}_1 = \begin{bmatrix}0.6\\0.4\end{bmatrix} \) and \( \vec{x}_2 = \begin{bmatrix}1\\-1\end{bmatrix} \).
We can also verify that \( \vec{x}_1 \in N(A - I) \) and that \( \vec{x}_2 \in N(A - \tfrac{1}{2}I) \).
To summarize: A⃗x₁ = 1·⃗x₁ and A⃗x₂ = ½·⃗x₂.
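A quick NumPy check of this example (illustrative, not from the original notes):
\begin{verbatim}
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)            # [1.  0.5]

for lam, x in zip(eigenvalues, eigenvectors.T):   # eigenvectors are the columns
    print(np.allclose(A @ x, lam * x))            # True, True
\end{verbatim}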
6.1.3 Properties of Eigenvalues and Eigenvectors
- If A⃗x = λ⃗x, then ⃗x is an eigenvector of Aⁿ with eigenvalue λⁿ: Aⁿ⃗x = λⁿ⃗x.
- ⃗x is an eigenvector of A + I with eigenvalue λ + 1: (A + I)⃗x = λ⃗x + ⃗x = (λ + 1)⃗x.
We use the notation trA = a11 + a22 + · · · + ann to define the trace of a matrix.
Using a property from above, we can also notice that the sum of the eigenvalues is also the trace.
Now, consider a 90° rotation matrix \( Q = \begin{bmatrix}0&-1\\1&0\end{bmatrix} \). Notice that after a rotation, there is no real vector Q⃗x which is in the same direction as ⃗x.
This leads us to the uncomfortable fact that eigenvalues are not always real.
\[ |Q - \lambda I| = \lambda^2 + 1 = 0 \;\Rightarrow\; \lambda = \pm\sqrt{-1} = \pm i \]
6.2 Diagonalizing a Matrix
Using what we know about eigenvectors and eigenvalues, we can now diagonalize a matrix. Put the eigenvectors ⃗x₁, ..., ⃗x_n into the columns of a matrix X. Then
\[ AX = A\begin{bmatrix}\vec{x}_1 & \cdots & \vec{x}_n\end{bmatrix} = \begin{bmatrix}\lambda_1\vec{x}_1 & \cdots & \lambda_n\vec{x}_n\end{bmatrix} = \begin{bmatrix}\vec{x}_1 & \cdots & \vec{x}_n\end{bmatrix}\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} = X\Lambda \]
In this case, X stays the same, but a new diagonal matrix with the eigenvalues as its entries, known as Λ, is created:
\[ AX = X\Lambda \]
If the eigenvectors are independent, X is invertible, and we get
\[ A = X\Lambda X^{-1} \qquad \text{and} \qquad \Lambda = X^{-1}AX. \]
As you can see, both of these equations are derived from the equation AX = XΛ above.
Eigenvectors that correspond to distinct eigenvalues are linearly independent. More formally, if A⃗x_j = λ_j⃗x_j with λ_i ≠ λ_j for i ≠ j, then ⃗x₁, ..., ⃗x_k are linearly independent.
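A short NumPy sketch (illustrative, not from the original notes) of the diagonalization A = XΛX⁻¹ for the matrix from the previous example:
\begin{verbatim}
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigenvalues, X = np.linalg.eig(A)        # columns of X are eigenvectors
Lam = np.diag(eigenvalues)               # diagonal matrix of eigenvalues

print(np.allclose(A @ X, X @ Lam))                       # AX = X Lambda -> True
print(np.allclose(X @ Lam @ np.linalg.inv(X), A))        # A = X Lambda X^{-1} -> True
\end{verbatim}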
6.2.3 Similar Matrices
Two matrices A and B are similar if B = M⁻¹AM for some invertible matrix M. Similar matrices share the same eigenvalues.
6.3 Symmetric Matrices
If S = S T is symmetric, then we can always choose orthonormal eigenvectors ⃗q1 , . . . , ⃗qn such that:
S = QΛQT
- Λ = diag {λ1 , . . . , λn }
Note: Symmetric matrices are ALWAYS diagonalizable, even if they have repeated eigenvalues!
It is also important to note that eigenvectors of S = S T from distinct eigenvalues are orthogonal.
In addition, if S = Sᵀ is symmetric, then we can always choose an orthogonal matrix Q such that S = QΛQᵀ.
Note: If S = S T has real entries, then it also has real eigenvalues. In addition, the signs of the
eigenvalues must match the signs of the pivots.
The Spectral Theorem states that every real symmetric matrix S = S T admits a factorization of S =
QΛQT , with real eigenvalues in Λ and orthonormal eigenvectors in the columns of Q.
The decomposition is that S = QΛQT = λ1 ⃗q1 ⃗q1T + λ2 ⃗q2 ⃗q2T + · · · + λn ⃗qn ⃗qnT
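A brief NumPy sketch of the spectral theorem (illustrative, with an assumed symmetric matrix, not from the original notes):
\begin{verbatim}
import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # symmetric matrix

eigenvalues, Q = np.linalg.eigh(S)         # eigh is for symmetric/Hermitian matrices
Lam = np.diag(eigenvalues)

print(eigenvalues)                          # [1. 3.] -> real eigenvalues
print(np.allclose(Q @ Q.T, np.eye(2)))      # True -> orthonormal eigenvectors
print(np.allclose(Q @ Lam @ Q.T, S))        # True -> S = Q Lambda Q^T
\end{verbatim}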
6.4 Positive Definite Matrices
A symmetric matrix S is positive definite if all of its eigenvalues are positive (all λᵢ > 0).
We can also say that if S⃗x = λ⃗x, then ⃗xᵀS⃗x = λ⃗xᵀ⃗x, which, by the definition of the norm, is λ∥⃗x∥² > 0 when λ > 0.
So, if all λᵢ > 0, then ⃗xᵀS⃗x > 0 for any eigenvector ⃗x.
If S is positive definite, then ⃗xᵀS⃗x > 0 for all ⃗x ≠ ⃗0. The converse is also true.
Note: All of the n pivots of S are positive. In addition, all n upper-left determinants of the matrix are also positive.
We can also say that if A has independent columns, then S = AᵀA is positive definite.
Proof:
For any ⃗x ≠ ⃗0, ⃗xᵀS⃗x = ⃗xᵀAᵀA⃗x = (A⃗x)ᵀ(A⃗x) = ∥A⃗x∥² > 0, since A⃗x ≠ ⃗0 when the columns of A are independent.
A matrix S is positive semidefinite if S has all λᵢ ≥ 0.
Note: For a positive semidefinite matrix, all eigenvalues of S are nonnegative, and all of the n pivots of S are also nonnegative.
7 Singular Value Decomposition
Recall the eigendecomposition of a square matrix A, which is A = XΛX⁻¹. There are three potential issues with this:
- The eigenvalues and eigenvectors could be complex.
- The eigenvector matrix X may not be invertible (A may not have enough independent eigenvectors).
- It only applies to square matrices.
Now, recall the symmetric eigendecomposition for symmetric matrices, which is S = QΛQᵀ. The main issue with this is that it only applies to symmetric matrices.
Well, what if we want to calculate a decomposition for any matrix of any shape (m × n)?
For this, we must find the singular value decomposition (SVD) of the matrix.
First, recall that for a matrix A ∈ R^{m×n}, the Fundamental Theorem of Linear Algebra states that Rⁿ splits into the row space C(Aᵀ) and the null space N(A), while Rᵐ splits into the column space C(A) and the left null space N(Aᵀ).
So now, we need orthonormal bases for Rᵐ and Rⁿ built from these components.
For Rᵐ: orthonormal vectors ⃗u₁, ..., ⃗u_r in C(A), completed by ⃗u_{r+1}, ..., ⃗u_m in N(Aᵀ).
For Rⁿ: orthonormal vectors ⃗v₁, ..., ⃗v_r in C(Aᵀ), completed by ⃗v_{r+1}, ..., ⃗v_n in N(A).
such that:
\[ A\vec{v}_i = \sigma_i\vec{u}_i \quad \text{for } i = 1, \ldots, r \]
In this case, σ₁, ..., σ_r are the singular values of A, where σᵢ = ∥A⃗vᵢ∥ / ∥⃗uᵢ∥ = ∥A⃗vᵢ∥, since ∥⃗uᵢ∥ = 1.
So now, with this, we can begin to craft the singular value decomposition.
First, put all of the ⃗u's and ⃗v's into matrices. Remember, here we only used the first r ⃗u's and ⃗v's; we would still have m − r more ⃗u's and n − r more ⃗v's.
\[ U_r = \begin{bmatrix}\vec{u}_1 & \cdots & \vec{u}_r\end{bmatrix} \in \mathbb{R}^{m \times r}, \qquad V_r = \begin{bmatrix}\vec{v}_1 & \cdots & \vec{v}_r\end{bmatrix} \in \mathbb{R}^{n \times r}, \qquad \Sigma_r = \begin{bmatrix}\sigma_1 & & \\ & \ddots & \\ & & \sigma_r\end{bmatrix} \]
Now, we can write our reduced singular value decomposition as the following:
\[ AV_r = U_r\Sigma_r \]
Finally, completing to the full orthogonal matrices U and V, we can solve for the singular value decomposition by multiplying both sides by Vᵀ. Since VVᵀ = I, we find that:
\[ A = U\Sigma V^T \]
In order to create a singular value decomposition of A, first realize that AᵀA is positive semidefinite. It also has orthonormal eigenvectors with nonnegative eigenvalues.
We can construct the matrix V by taking the orthonormal eigenvectors of AᵀA together with an orthonormal basis for N(A).
This gives all of the ⃗v ’s and ⃗u’s in C(A) and C(AT ).
Now, to get the remaining ⃗v ’s and ⃗u’s, use any orthonormal bases for N (A) and N (AT ).
The SVD also gives the best low-rank approximations of A. If
\[ A = \sum_{i=1}^{r} \sigma_i\vec{u}_i\vec{v}_i^T = \sigma_1\vec{u}_1\vec{v}_1^T + \cdots + \sigma_r\vec{u}_r\vec{v}_r^T \]
then keeping only the k largest singular values gives
\[ B = \sum_{i=1}^{k} \sigma_i\vec{u}_i\vec{v}_i^T = \sigma_1\vec{u}_1\vec{v}_1^T + \cdots + \sigma_k\vec{u}_k\vec{v}_k^T \]
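A final NumPy sketch (illustrative, with a made-up matrix, not from the original notes) computing an SVD and a rank-1 approximation:
\begin{verbatim}
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD: A = U diag(s) V^T
print(s)                                            # singular values, largest first
print(np.allclose(U @ np.diag(s) @ Vt, A))          # True

B = s[0] * np.outer(U[:, 0], Vt[0, :])              # keep only the largest singular value
print(B)                                            # best rank-1 approximation of A
\end{verbatim}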