
NMCI102 Engineering Mathematics II Department of Mathematics & Computing

Winter 2024-2025 Indian Institute of Technology (ISM), Dhanbad

Unit 1: Matrices, vector spaces, linear transformations

Contents

1 Matrices and system of linear equations
  1.1 Matrix operations
    1.1.1 Addition of matrices and scalar multiplication
    1.1.2 Matrix multiplication
    1.1.3 Matrix transpose
    1.1.4 Determinant of a matrix
    1.1.5 Inverse of a matrix
    1.1.6 Elementary row operations for matrices
  1.2 Row-echelon form and reduced row-echelon form
    1.2.1 Row-echelon form
    1.2.2 Reduced row-echelon form
  1.3 System of linear equations
  1.4 Gaussian elimination method for solving system of linear equations
    1.4.1 The augmented matrix in a row-echelon form
    1.4.2 The Gaussian elimination process
  1.5 Gauss–Jordan method for computing the inverse of a matrix

2 Vector Spaces
  2.1 Subspaces of vector spaces
  2.2 Linear independence
  2.3 Basis of a vector space
  2.4 Dimension of a vector space

3 Introduction to linear transformations
  3.1 Images of functions
  3.2 Linear transformations
  3.3 Kernel and range of a linear transformation
  3.4 The rank-nullity theorem
  3.5 Subspaces associated with a matrix
    3.5.1 Row space, column space and rank of a matrix
    3.5.2 Null space of a matrix
  3.6 The matrix representation of a linear transformation
    3.6.1 Coordinates
    3.6.2 Coordinate transformations
    3.6.3 The matrix of a linear transformation

Linear algebra plays a central role in the modern development of science and technology. It finds extensive applications in engineering, physics, computer science, economics, finance, cryptography, etc. It is arguably one of the most widely applied fields of mathematics. Due to the importance of linear algebra in various fields, in particular engineering, its basic theory has become indispensable.

This document presents lecture notes for Unit 1 of the course NMCI102 covering the following topics: matrices, vector spaces, linear transformations, eigenvalues and eigenvectors, diagonalization, and quadratic forms.

1 Matrices and system of linear equations

Matrices originated as a mathematical tool to represent and solve systems of linear equations effi-
ciently, with their roots tracing back to ancient methods of organizing coefficients.

Definition 1. A matrix is a rectangular array of scalars arranged in rows and columns. A general matrix of size (or order) m × n (where m is the number of rows and n is the number of columns) over a scalar field F is written as:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix},$$

where aij ∈ F represents the element of A in the ith row and jth column.

Depending on the structure of a matrix, some of the types of matrices are as follows:

• Square matrix: A matrix with the same number of rows and columns.

• Diagonal matrix: A square matrix where non-diagonal elements are zero.

• Identity matrix: A diagonal matrix where all diagonal elements are 1.

• Zero matrix: A matrix with all elements equal to 0.

• Symmetric matrix: A square matrix A such that aij = aji for all i, j.

• Skew-symmetric matrix: A square matrix A such that aij = −aji for all i, j.

Equality of matrices: Two matrices A = [aij ] and B = [bij ] are said to be equal if and only if they
have the same size and the corresponding elements are equal, i.e., aij = bij for all i, j.

1.1 Matrix operations

1.1.1 Addition of matrices and scalar multiplication

Matrix addition: Two matrices can be added if they have the same dimensions. The sum of two
matrices A = [aij ] and B = [bij ] of the same order m × n is another matrix C = [cij ] whose
elements are given by cij = aij + bij . We write C as A + B.

Scalar multiplication: Given a matrix A = [aij ] and a scalar α, the scalar multiplication of A with
the scalar α is defined to be the matrix αA = [αaij ]. In particular, we observe that A + ((−1)A) is
the zero matrix. So, we write −A = (−1)A.

1.1.2 Matrix multiplication

The multiplication (or product) of two matrices A and B is defined when their sizes are compat-
ible; i.e., the number of columns in A must be equal to the number of rows in B. Under this
compatibility condition, the multiplication of two matrices is defined as follows.

Definition 2. Let A be a matrix of order m × n and B be a matrix of order n × p. The multiplication of A and B is a matrix C = AB of order m × p, where the elements cij of C are given by

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \quad \text{for all } 1 \le i \le m,\ 1 \le j \le p.$$

The following observations are useful:

• This product is sometimes called the row-column product to emphasize the fact that it is a
product involving the rows of A with the columns of B.

• The product of two matrices is not defined if their sizes are not compatible. For example, if
A has size 2 × 5 and B has size 3 × 2 then their product AB is not defined. However, the
product BA is defined. Why? What is the size of BA?

• Matrix multiplication distributes over addition, on both sides (provided, of course, that the
sizes of the matrices involved are compatible for the given operations):

(A + B)C = AC + BC and A(B + C) = AB + AC.

• Matrix multiplication is associative (again, under the compatibility condition):

(AB)C = A(BC).

• In particular, taking the nth power of a square matrix is well-defined for every positive integer
n.

• The transpose of the product of two matrices is the product of their transposes in reverse order: (AB)^T = B^T A^T.

The matrix multiplication is illustrated in the following examples.

Example 3. Let A and B be the following matrices:

$$A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}, \qquad B = \begin{bmatrix} 5 & 6\\ 7 & 8 \end{bmatrix}.$$

To compute C = AB, we find each element of C as:

$$C = \begin{bmatrix} c_{11} & c_{12}\\ c_{21} & c_{22} \end{bmatrix},$$

where

c11 = (1)(5) + (2)(7) = 5 + 14 = 19,
c12 = (1)(6) + (2)(8) = 6 + 16 = 22,
c21 = (3)(5) + (4)(7) = 15 + 28 = 43,
c22 = (3)(6) + (4)(8) = 18 + 32 = 50.

Thus, the product is:

$$C = \begin{bmatrix} 19 & 22\\ 43 & 50 \end{bmatrix}.$$
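The arithmetic above is easy to check numerically. Below is a minimal sketch, assuming the NumPy library is available; the `@` operator implements exactly the row-column product of Definition 2.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

# C[i, j] = sum over k of A[i, k] * B[k, j], as in Definition 2
C = A @ B
print(C)  # [[19 22]
          #  [43 50]]
```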

Example 4. Let A and B be two 3 × 3 matrices given by

$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{bmatrix}, \qquad B = \begin{bmatrix} 9 & 8 & 7\\ 6 & 5 & 4\\ 3 & 2 & 1 \end{bmatrix}.$$

The product C = AB is computed as follows:

$$C = \begin{bmatrix} c_{11} & c_{12} & c_{13}\\ c_{21} & c_{22} & c_{23}\\ c_{31} & c_{32} & c_{33} \end{bmatrix},$$

where the elements of C are given by

$$c_{ij} = \sum_{k=1}^{3} a_{ik} b_{kj}, \quad 1 \le i, j \le 3.$$

We thus have

c11 = (1)(9) + (2)(6) + (3)(3) = 9 + 12 + 9 = 30,


c12 = (1)(8) + (2)(5) + (3)(2) = 8 + 10 + 6 = 24,
c13 = (1)(7) + (2)(4) + (3)(1) = 7 + 8 + 3 = 18,
c21 = (4)(9) + (5)(6) + (6)(3) = 36 + 30 + 18 = 84,
c22 = (4)(8) + (5)(5) + (6)(2) = 32 + 25 + 12 = 69,
c23 = (4)(7) + (5)(4) + (6)(1) = 28 + 20 + 6 = 54,
c31 = (7)(9) + (8)(6) + (9)(3) = 63 + 48 + 27 = 138,
c32 = (7)(8) + (8)(5) + (9)(2) = 56 + 40 + 18 = 114,
c33 = (7)(7) + (8)(4) + (9)(1) = 49 + 32 + 9 = 90.

This gives

$$C = \begin{bmatrix} 30 & 24 & 18\\ 84 & 69 & 54\\ 138 & 114 & 90 \end{bmatrix}.$$

Example 5. Let matrices A and B be given by

$$A = \begin{bmatrix} 1 & 6 & -2\\ 3 & 4 & 5\\ 7 & 0 & 8 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -9\\ 6 & 1\\ 1 & -3 \end{bmatrix}.$$

Consider

$$C = AB = \begin{bmatrix} 1 & 6 & -2\\ 3 & 4 & 5\\ 7 & 0 & 8 \end{bmatrix} \begin{bmatrix} 2 & -9\\ 6 & 1\\ 1 & -3 \end{bmatrix}.$$

The first row of C is given by

c11 = (1 · 2) + (6 · 6) + (−2 · 1) = 2 + 36 − 2 = 36,


c12 = (1 · −9) + (6 · 1) + (−2 · −3) = −9 + 6 + 6 = 3.

The second row of C is computed as

c21 = (3 · 2) + (4 · 6) + (5 · 1) = 6 + 24 + 5 = 35,
c22 = (3 · −9) + (4 · 1) + (5 · −3) = −27 + 4 − 15 = −38.

Lastly, the third row of C is

c31 = (7 · 2) + (0 · 6) + (8 · 1) = 14 + 0 + 8 = 22,
c32 = (7 · −9) + (0 · 1) + (8 · −3) = −63 + 0 − 24 = −87.

Therefore, the product AB is given by

$$C = \begin{bmatrix} 36 & 3\\ 35 & -38\\ 22 & -87 \end{bmatrix}.$$

1.1.3 Matrix transpose

If A is an m × n matrix then its transpose is the n × m matrix whose (i, j)-entry is equal to the (j, i)-entry of A. We denote the transpose of A by A^T. For example, the transpose of

$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix} \quad \text{is} \quad A^T = \begin{bmatrix} 1 & 4\\ 2 & 5\\ 3 & 6 \end{bmatrix}.$$

Recall that a square matrix A is said to be symmetric if it satisfies A^T = A, and it is called skew-symmetric if it satisfies A^T = −A.

1.1.4 Determinant of a matrix

Definition 6. The determinant of an n × n matrix A, denoted by det(A) or |A|, is defined


inductively as follows.

• For a 1 × 1 matrix [a], it is defined as det[a] = a.

• For n ≥ 2, we compute the determinant via cofactor expansion: define A(1,k) to be


the matrix obtained from A by deleting the 1st row and kth column. Then
$$\det(A) = \sum_{k=1}^{n} (-1)^{k+1} a_{1,k}\, \det(A_{(1,k)}).$$

Below are explicit formulae for the determinants of 2 × 2 and 3 × 3 matrices.

• The determinant of a 2 × 2 matrix $\begin{bmatrix} a & b\\ c & d \end{bmatrix}$ is given by $\det\begin{bmatrix} a & b\\ c & d \end{bmatrix} = ad - bc$.

• The determinant of a 3 × 3 matrix $\begin{bmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{bmatrix}$ is given by

$$\det\begin{bmatrix} a_1 & a_2 & a_3\\ b_1 & b_2 & b_3\\ c_1 & c_2 & c_3 \end{bmatrix} = a_1\det\begin{bmatrix} b_2 & b_3\\ c_2 & c_3 \end{bmatrix} - a_2\det\begin{bmatrix} b_1 & b_3\\ c_1 & c_3 \end{bmatrix} + a_3\det\begin{bmatrix} b_1 & b_2\\ c_1 & c_2 \end{bmatrix}.$$

Example 7. The determinant of the matrix $\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}$ is given by:

$$\det\begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix} = (1)(4) - (2)(3) = -2.$$

Example 8. The determinant of $\begin{bmatrix} 1 & 2 & 4\\ -1 & 1 & 0\\ -2 & 1 & 3 \end{bmatrix}$ is given by

$$\det\begin{bmatrix} 1 & 2 & 4\\ -1 & 1 & 0\\ -2 & 1 & 3 \end{bmatrix} = 1\det\begin{bmatrix} 1 & 0\\ 1 & 3 \end{bmatrix} - 2\det\begin{bmatrix} -1 & 0\\ -2 & 3 \end{bmatrix} + 4\det\begin{bmatrix} -1 & 1\\ -2 & 1 \end{bmatrix} = 1(3) - 2(-3) + 4(1) = 13.$$
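Definition 6 translates directly into a short recursive routine. Below is a minimal sketch in Python; the function name `cofactor_det` is our own choice, not from the notes. It expands along the first row exactly as in the definition.

```python
def cofactor_det(A):
    """Determinant via cofactor expansion along the first row (Definition 6)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for k in range(n):
        # A_(1,k): delete the 1st row and the (k+1)-th column
        minor = [row[:k] + row[k+1:] for row in A[1:]]
        # (-1)**k matches (-1)^(k+1) of the definition since k here is 0-indexed
        total += (-1) ** k * A[0][k] * cofactor_det(minor)
    return total

print(cofactor_det([[1, 2], [3, 4]]))                     # -2 (Example 7)
print(cofactor_det([[1, 2, 4], [-1, 1, 0], [-2, 1, 3]]))  # 13 (Example 8)
```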



1.1.5 Inverse of a matrix

Definition 9. If A is an n × n matrix, then we say A is invertible (or nonsingular) if there


exists another n × n matrix B such that

AB = BA = In ,

where In is the n × n identity matrix. If such a B exists, it is unique. It is generally denoted by A^{-1} and is called the inverse of A. If A is not invertible, we call it non-invertible (or singular).

For example, the matrix $A = \begin{bmatrix} 2 & 5\\ 1 & 3 \end{bmatrix}$ is invertible and its inverse is given by $A^{-1} = \begin{bmatrix} 3 & -5\\ -1 & 2 \end{bmatrix}$. One can easily check that

$$\begin{bmatrix} 2 & 5\\ 1 & 3 \end{bmatrix}\begin{bmatrix} 3 & -5\\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & -5\\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 5\\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}.$$

Not every matrix is invertible. An obvious example is the n × n zero matrix. Another example of
non-invertible matrix is the n × n matrix with each element equal to 1 for n ≥ 2. Indeed, we can
see that

$$\begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix} \cdot \begin{bmatrix} a & b\\ c & d \end{bmatrix} = \begin{bmatrix} a+c & b+d\\ a+c & b+d \end{bmatrix},$$
which is never equal to the identity matrix for any choice of a, b, c, d.

We list several properties of invertible matrices along with their proofs.

Proposition 10. If a matrix is invertible, then its inverse is unique.

Proof. Let A be an n × n invertible matrix. Suppose B1 and B2 both satisfy AB1 = In = B1 A


and AB2 = In = B2 A. Then B1 = B1 In = B1 (AB2 ) = (B1 A)B2 = In B2 = B2 implying
B1 = B2 .

Proposition 11. If A and B are invertible n × n matrices with inverses A^{-1} and B^{-1}, then AB is also an invertible matrix with inverse B^{-1}A^{-1}.

Proof. We simply compute

$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A(I_n)A^{-1} = AA^{-1} = I_n.$$

Similarly, we also get (B^{-1}A^{-1})(AB) = I_n.

Proposition 12. Any square matrix with a row or column of all zeroes cannot be invertible.

Proof. Suppose the n × n matrix A has all entries in its i-th row equal to zero. Then for any n × n
matrix B, the product AB will have all entries in its i-th row equal to zero, so it cannot be the
identity matrix.

Similarly, if the n × n matrix A has all entries in its i-th column equal to zero, then for any n × n matrix B, the product BA will have all entries in its i-th column equal to zero, so it cannot be the identity matrix either.

Proposition 13. A 2 × 2 matrix $\begin{bmatrix} a & b\\ c & d \end{bmatrix}$ is invertible if and only if ad − bc ≠ 0. In this case, the inverse is given by

$$\frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.$$

Proof. This follows from solving the system of equations for e, f, g, h in terms of a, b, c, d that arises from comparing entries in the product

$$\begin{bmatrix} a & b\\ c & d \end{bmatrix}\begin{bmatrix} e & f\\ g & h \end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix}.$$

One obtains precisely the solution given above. If ad = bc, then the system is inconsistent and
there is no solution; otherwise, there is exactly one solution as given.
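The formula of Proposition 13 packages neatly into a small function; a minimal sketch (the helper name `inverse_2x2` is our own):

```python
def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via the formula in Proposition 13."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: ad - bc = 0")
    return [[d / det, -b / det],
            [-c / det, a / det]]

print(inverse_2x2(2, 5, 1, 3))  # [[3.0, -5.0], [-1.0, 2.0]], matching the example above
```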

1.1.6 Elementary row operations for matrices

The following row operations on a matrix will be used frequently later.

• Interchanging two rows.



• Adding a scalar multiple of one row to another row.

• Multiplying a row by a nonzero constant.

The above row operations can be applied to any rectangular matrix to obtain a so-called row-
echelon form of the matrix that we will see in the next section.

1.2 Row-echelon form and reduced row-echelon form

Here we shall briefly discuss the concepts of row-echelon and reduced row-echelon forms of rect-
angular matrices. These concepts will play important roles in solving systems of linear equations
discussed in the next section.

1.2.1 Row-echelon form

Definition 14. A rectangular matrix is said to be in row-echelon form if it satisfies the


following two conditions:

(i) every row with a nonzero element is always above all the zero rows,

(ii) the first nonzero element in every row is always to the right of the first nonzero term
in the row above it.

The first nonzero element of a row is called the pivot element of that row.

The two conditions of Definition 14 can be restated in other words as follows:

(i′ ) all rows without a pivot (the zero rows) are at the bottom,

(ii′ ) any row’s pivot, if it has one, lies to the right of the pivot of the row directly above it.

We will see in Section 1.4 that any rectangular matrix can be brought to a row-echelon form by performing elementary row operations on it. Below are some examples of matrices in row-echelon form (the pivot of each row is its first nonzero entry):

$$\begin{bmatrix} 1 & 2 & 3 & 4 & 5\\ 0 & -1 & 2 & 3 & 4\\ 0 & 0 & 5 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} -7 & 2 & 3 & 4 & 5\\ 0 & 0 & 0 & 8 & 0 \end{bmatrix}, \quad \begin{bmatrix} 10 & 2 & 3 & 4 & 5\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 3 & 4 & 5\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Also, here are examples of matrices not in the row-echelon form. The matrix
 
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 5\\ 1 & 1 & 2 & 3 & 4\\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}$$

is not in the row-echelon form because the pivot in the second row is not strictly to the right of the
pivot element in the row above it.

Another example of a matrix not in the row-echelon form is


 
$$\begin{bmatrix} 0 & 0 & 3 & 4 & 5\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}.$$

It is because the pivot in the third row is not strictly to the right of the pivot in the second row.

Can you determine if the matrix

$$\begin{bmatrix} 0 & 0 & 3 & 4 & 5\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

is in the row-echelon form? Justify.

Remark: We will see in the next section that if the coefficient matrix of a system of linear equations
is in a row-echelon form then it is easy to solve the system. It just requires backward substitution
to obtain the solution.

1.2.2 Reduced row-echelon form

A matrix in row-echelon form can be further simplified into a form known as reduced row-echelon
form through elementary row operations.

Definition 15. A rectangular matrix is said to be in reduced row-echelon form if it satisfies the following three conditions:

(i) it is in row-echelon form,

(ii) every pivot element is equal to 1,

(iii) the entries above every pivot element are all zero.

Below are some examples of matrices in reduced row-echelon form:

$$\begin{bmatrix} 1 & 0 & 0 & 4 & 5\\ 0 & 1 & 0 & 3 & 4\\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 0 & 5\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 0 & 4 & 0\\ 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

Also, here are some examples of matrices not in reduced row-echelon form. The matrix
 
$$\begin{bmatrix} 1 & 2 & 0 & 4 & 5\\ 0 & 1 & 0 & 3 & 4\\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}$$

is not in reduced row-echelon form because the entry above the second pivot element is not zero.
The following matrix is also not in reduced row-echelon form.
 
$$\begin{bmatrix} 0 & 0 & 3 & 4 & 5\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

Why?

As stated earlier, through elementary row-operations, every matrix can be brought to a reduced
row-echelon form. It turns out that the reduced row-echelon form of a matrix is unique, as stated
in the following theorem. We omit the proof.

Theorem 16. Every matrix admits a unique reduced row-echelon form.

Remark: We will later define the concept of rank of a matrix. For a matrix in row-echelon form, it is equivalently defined as the number of pivots in the matrix. For example, the rank of

$$\begin{bmatrix} 1 & 2 & 3 & 4 & 5\\ 0 & 1 & 2 & 3 & 4\\ 0 & 0 & 1 & 0 & 1 \end{bmatrix}$$

is 3, while the rank of

$$\begin{bmatrix} 1 & 2 & 3 & 0 & 5\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

is 2. More generally, the rank of any matrix A is equal to the rank of a matrix in row-echelon form obtained from A by applying elementary row operations.

1.3 System of linear equations

A system of linear equations consists of two or more linear equations involving the same set of
variables. Such systems arise frequently in engineering, physics, computer science, and many
other fields.

Definition 17. A system of m equations in n variables is of the form:

$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1,\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2,\\ &\;\;\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned}$$

where x1 , x2 , . . . , xn are the variables, aij are the coefficients, and b1 , b2 , . . . , bm are the
constants.

The above system of linear equations can be represented in the matrix form as follows:

Ax = B,

where A is an m × n matrix called the coefficient matrix, x is the column vector of variables, and
B is the column vector of the constants given by:
     
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \qquad x = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}, \qquad B = \begin{bmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{bmatrix}.$$

The matrix [A|B] obtained by appending the constant vector B to the right of the coefficient matrix
A is called the augmented matrix of the system. The vertical line in the augmented matrix is meant
to separate the coefficients from the constants in the system. We will see later that working with
the augmented matrix is a lot more convenient while solving a system of linear equations.

Some examples of systems of linear equations are:

(1)  2x + 3y = 5,
     4x − y = 11.

(2)  x + 2y + z = 6,
     3x − y + 4z = 8,
     5x + y − 2z = 4.

The traditional method for solving a system of linear equations (probably familiar from basic
algebra) is by elimination: we solve the first equation for one variable, say x1 , in terms of the
others. We then substitute the result into all other equations to obtain a reduced system involving
one fewer variable. Eventually, the system is simplified to a scenario that implies a contradiction
(e.g., 1 = 0), a unique solution, or an infinite family of solutions.

Example 18. Let us solve the following system of equations by the elimination method described
above.

x + y = 7,
2x − 2y = −2.

Step 1: Solve the first equation for x: x = 7 − y.

Step 2: Substitute this into the second equation to get:

2(7 − y) − 2y = −2 → 14 − 4y = −2 → y = 4.

Step 3: Substituting y = 4 into x = 7 − y gives x = 3.

Thus, we get a unique solution to the system given by (x, y) = (3, 4).

Another way to perform elimination is by adding or subtracting multiples of equations to eliminate


variables, avoiding solving for individual variables explicitly.

Example 19. Let us solve the following system of linear equations:

x + y + 3z = 4, (i)
2x + 3y − z = 1, (ii)
−x + 2y + 2z = 1. (iii)

We first eliminate x by performing the following operations on the equations (ii) → (ii) − 2(i)
and (iii) → (iii) + (i):

x + y + 3z = 4,
y − 7z = −7,
3y + 5z = 5.

Now, eliminate y by performing (iii) → (iii) − 3(ii):

x + y + 3z = 4, (i′ )
y − 7z = −7, (ii′ )
26z = 26. (iii′ )

Solve equation (iii′ ) to get z = 1. Substituting z = 1 into (ii′ ) gives y = 0. Lastly, substitute
y = 0, z = 1 into (i′ ) to get x = 1. So, we get a unique solution (x, y, z) = (1, 0, 1).

Observe that the above procedure of solving systems of linear equations requires dealing with
only the coefficients of the variables and the constants. So, we only need to keep track of the
coefficients, which we can do by putting them into an array. For example, the above system of
linear equations given by (i), (ii), (iii) can be written in simplified form using the array as
 
$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 2 & 3 & -1 & 1\\ -1 & 2 & 2 & 1 \end{array}\right].$$

Recall that the above matrix is called the augmented matrix for the given system of linear equations.
We can then do operations on the entries in the array that correspond to manipulations of the
associated system of equations. The elimination process consists of a combination of the three
elementary row operations, which we recall below:

• Interchanging two rows.

• Adding a scalar multiple of one row to another row.

• Multiplying a row by a nonzero constant.

Each of these elementary row operations leaves unchanged the solutions to the associated system
of linear equations. The idea of elimination is to apply these elementary row operations to the
coefficient matrix until it is in a simple enough form that we can simply read off the solutions
to the original system of equations. The above ideas lead to the Gaussian elimination process of
solving systems of linear equations.

1.4 Gaussian elimination method for solving system of linear equations

1.4.1 The augmented matrix in a row-echelon form

If the augmented matrix of a system is in a row-echelon form, it is easy to read off the solutions to
the corresponding system of linear equations by working from the bottom up. This is illustrated in
the following example.

Example 20. Consider the augmented matrix:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -1 & 1\\ 0 & 0 & 2 & 4 \end{array}\right],$$

corresponding to the system:


x + y + 3z = 4,
y − z = 1,
2z = 4.
The bottom equation immediately gives z = 2. Then the middle equation gives y = 1 + z = 3, and
the top equation gives x = 4 − y − 3z = −5.
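Reading off the solution from the bottom up in this way is called back substitution, and it is mechanical enough to code directly. A minimal sketch (the helper `back_substitute` is our own, assuming a square upper-triangular coefficient matrix with nonzero diagonal):

```python
def back_substitute(U, b):
    """Solve Ux = b from the bottom up, for upper-triangular U with nonzero diagonal."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x

U = [[1, 1, 3],
     [0, 1, -1],
     [0, 0, 2]]
b = [4, 1, 4]
print(back_substitute(U, b))  # [-5.0, 3.0, 2.0], i.e. x = -5, y = 3, z = 2
```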

By using row operations on the augmented matrix, we can solve the associated system of equations,
since, as we noted before, each of the elementary row operations does not change the solutions of
the system.

By reducing the augmented matrix of a system to a row-echelon form, it becomes straightfor-


ward to conclude the nature of solutions. Given the augmented matrix in a row-echelon form, the
following are the three possible scenarios.

Case 1: No solution. This happens precisely when two conditions are satisfied:

(i) each row contains a pivot element,

(ii) the pivot element in the last row appears in the last column.

Here is an example illustrating this case. Consider the following augmented matrix in row-echelon form:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -1 & 1\\ 0 & 0 & 0 & 4 \end{array}\right].$$

Note that the pivot element in the last row appears in the last column. Its corresponding system is

x + y + 3z = 4,
y − z = 1,
0 = 4.
The above system does not have any solution because the last equation is impossible to satisfy.

Case 2: Unique solution. This happens precisely when two conditions are satisfied:

(i) each row contains a pivot element,

(ii) the pivot element in the last row appears in the second-to-last column.

For example, consider the following augmented matrix in row-echelon form:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -1 & 1\\ 0 & 0 & 2 & 4 \end{array}\right].$$

Observe that it satisfies the two conditions of the present case; i.e., each row has a pivot element and the pivot element of the last row appears in the second-to-last column. Its corresponding system is

x + y + 3z = 4,
y − z = 1,
2z = 4.

The unique solution is z = 2, y = 3, and x = −5, which we obtain by backward substitution.

Case 3: Infinitely many solutions. This happens precisely when no pivot appears in the last column and not every variable column contains a pivot. For example, consider the following augmented matrix in row-echelon form whose last row is zero:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -1 & 1\\ 0 & 0 & 0 & 0 \end{array}\right].$$

Its corresponding system is


x + y + 3z = 4,
y − z = 1,
0 = 0.
The last equation is always true, so there are really only two relations among the three variables x, y, and z. By choosing an arbitrary value of z, we can find the values of the remaining two variables x and y in terms of z, allowing for infinitely many solutions to the system.

1.4.2 The Gaussian elimination process

The main idea behind the Gaussian elimination process to solve a system of linear equations is to bring the augmented matrix into a row-echelon form. Then, depending on the nature of solutions, as discussed above, we either conclude there is no solution or solve the system through backward substitution.

The following is a step-by-step description of the Gaussian elimination process:

Step 1. Convert the augmented matrix into a row-echelon form. The augmented matrix can be reduced to a row-echelon form by performing elementary row operations on it. This is called forward elimination. If you go further with the process and convert the augmented matrix into its reduced row-echelon form, then the following steps become significantly easier to execute; this is called the Gauss–Jordan elimination process for solving systems of linear equations.

Step 2. Interpreting the nature of solutions: If each row contains a pivot element and the pivot in
the last row appears in the last column, then conclude that no solution exists. Else, proceed
to the next step.

Step 3. Calculating solutions explicitly: If you are at this step, it means a solution exists. In this
case, the variables corresponding to the pivot elements are given in terms of the remaining
variables (also called free variables) through backward substitution. In particular, if there are
n pivot elements for the system in n variables then the system has a unique solution. Else,
the pivot variables are determined in terms of free variables, which implies infinitely many
solutions.

We demonstrate the Gaussian elimination process through an example as follows.

Example 21. Let us solve the system

x + y + 3z = 4,
2x + 3y − z = 1,
−x + 2y + 2z = 1,

using Gaussian elimination. The augmented matrix of the system is given by

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 2 & 3 & -1 & 1\\ -1 & 2 & 2 & 1 \end{array}\right].$$

We now apply a series of elementary row operations to bring the augmented matrix into a row-echelon form. Apply R2 → R2 − 2R1 and R3 → R3 + R1:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -7 & -7\\ 0 & 3 & 5 & 5 \end{array}\right].$$

Apply R3 → R3 − 3R2:

$$\left[\begin{array}{ccc|c} 1 & 1 & 3 & 4\\ 0 & 1 & -7 & -7\\ 0 & 0 & 26 & 26 \end{array}\right].$$
We have now obtained a row-echelon form of the original augmented matrix. We can also see
that the system has a unique solution. Why? The solution can be easily obtained through back
substitution, which is given by x = 1, y = 0, z = 1.
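As a sanity check, the same system can be handed to a library solver; a sketch assuming NumPy, whose `np.linalg.solve` performs Gaussian elimination (via an LU factorization with pivoting) internally:

```python
import numpy as np

A = np.array([[1.0, 1.0, 3.0],
              [2.0, 3.0, -1.0],
              [-1.0, 2.0, 2.0]])
b = np.array([4.0, 1.0, 1.0])

print(np.linalg.solve(A, b))  # [1. 0. 1.], agreeing with Example 21
```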

1.5 Gauss–Jordan method for computing the inverse of a matrix

This section concerns the computation of the inverse of an invertible matrix. The method described in this section is the Gauss–Jordan method for computing the inverse of a matrix. The main idea of this method is to convert the problem of finding the inverse of a matrix into solving several systems of linear equations simultaneously using the Gaussian elimination method.

Let A be an n × n invertible matrix. We know by definition that

$$AA^{-1} = I_n, \qquad (1)$$

where recall that In is the identity matrix of size n × n. Let us denote by x1, . . . , xn the (unknown) columns of A^{-1} and denote by e1, . . . , en the columns of In. We can thus write (1) as

$$[Ax_1 \cdots Ax_n] = [e_1 \cdots e_n].$$

The above matrix equation is equivalent to the following systems of linear equations:

$$Ax_i = e_i, \quad 1 \le i \le n. \qquad (2)$$

Therefore, finding the inverse of A boils down to solving the systems of linear equations (2) simultaneously. We apply the Gauss–Jordan elimination method (the Gaussian elimination method carried through to the reduced row-echelon form of the augmented matrix) to solve the above systems of linear equations. This is known as the Gauss–Jordan method for computing the inverse of a matrix.

Gauss–Jordan method for finding inverse ≡ solving several systems of linear equations si-
multaneously

The following are the steps to perform Gauss–Jordan method to compute the inverse of a matrix:

Step 1. Consider the augmented matrix [A|In ].

Step 2. Reduce [A|In ] to its reduced row-echelon form:

(i) apply elementary row operations on [A|In ] to reduce it to a row-echelon form, say
[U |⋆],
(ii) apply another set of elementary row operations on [U |⋆] to get reduced row-echelon
form, which is given by [In |A−1 ].

Step 3. Read off A−1 from the reduced row-echelon form [In |A−1 ].

Gauss–Jordan method in a nutshell: [A|In ] −→ [U |⋆] −→ [In |A−1 ]

We illustrate the Gauss–Jordan method in the following examples.

Example 22. Let us find the inverse of $\begin{bmatrix} 2 & 1\\ 1 & 1 \end{bmatrix}$ using the Gauss–Jordan method. Consider the augmented matrix [A|I2]:

$$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0\\ 1 & 1 & 0 & 1 \end{array}\right]$$

Apply R2 → R2 − (1/2)R1:

$$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0\\ 0 & 1/2 & -1/2 & 1 \end{array}\right]$$

Apply R1 → (1/2)R1 and R2 → 2R2:

$$\left[\begin{array}{cc|cc} 1 & 1/2 & 1/2 & 0\\ 0 & 1 & -1 & 2 \end{array}\right]$$

Apply R1 → R1 − (1/2)R2:

$$\left[\begin{array}{cc|cc} 1 & 0 & 1 & -1\\ 0 & 1 & -1 & 2 \end{array}\right]$$

The inverse of $\begin{bmatrix} 2 & 1\\ 1 & 1 \end{bmatrix}$ is thus given by $\begin{bmatrix} 1 & -1\\ -1 & 2 \end{bmatrix}$.

Example 23. Consider the matrix

$$A = \begin{bmatrix} 2 & 1 & 0\\ 1 & 2 & 1\\ 0 & 1 & 2 \end{bmatrix}.$$

Let us find its inverse using the Gauss–Jordan method. Consider the augmented matrix [A|I3]:

$$\left[\begin{array}{ccc|ccc} 2 & 1 & 0 & 1 & 0 & 0\\ 1 & 2 & 1 & 0 & 1 & 0\\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right]$$

Apply on the above matrix R2 → R2 − (1/2)R1:

$$\left[\begin{array}{ccc|ccc} 2 & 1 & 0 & 1 & 0 & 0\\ 0 & 3/2 & 1 & -1/2 & 1 & 0\\ 0 & 1 & 2 & 0 & 0 & 1 \end{array}\right]$$

Apply R3 → R3 − (2/3)R2:

$$\left[\begin{array}{ccc|ccc} 2 & 1 & 0 & 1 & 0 & 0\\ 0 & 3/2 & 1 & -1/2 & 1 & 0\\ 0 & 0 & 4/3 & 1/3 & -2/3 & 1 \end{array}\right]$$

Apply R1 → (1/2)R1, R2 → (2/3)R2, R3 → (3/4)R3:

$$\left[\begin{array}{ccc|ccc} 1 & 1/2 & 0 & 1/2 & 0 & 0\\ 0 & 1 & 2/3 & -1/3 & 2/3 & 0\\ 0 & 0 & 1 & 1/4 & -1/2 & 3/4 \end{array}\right]$$

Apply R1 → R1 − (1/2)R2:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & -1/3 & 2/3 & -1/3 & 0\\ 0 & 1 & 2/3 & -1/3 & 2/3 & 0\\ 0 & 0 & 1 & 1/4 & -1/2 & 3/4 \end{array}\right]$$

Apply R1 → R1 + (1/3)R3 and R2 → R2 − (2/3)R3:

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/4 & -1/2 & 1/4\\ 0 & 1 & 0 & -1/2 & 1 & -1/2\\ 0 & 0 & 1 & 1/4 & -1/2 & 3/4 \end{array}\right]$$

The inverse of A is thus given by

$$A^{-1} = \begin{bmatrix} 3/4 & -1/2 & 1/4\\ -1/2 & 1 & -1/2\\ 1/4 & -1/2 & 3/4 \end{bmatrix}.$$
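The result is easy to verify numerically; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])

A_inv = np.linalg.inv(A)
print(A_inv)                              # [[ 0.75 -0.5   0.25] ...], as computed above
print(np.allclose(A @ A_inv, np.eye(3)))  # True: A A^(-1) = I_3
```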

2 Vector Spaces

When we read or hear the word vector, we may immediately think of two points in R2 (or R3 )
connected by an arrow. Mathematically speaking, a vector is just an element of a vector space.
This then begs the question: What is a vector space?

Roughly speaking, a vector space is a set of objects that can be added and multiplied by scalars.
We may have already worked with several types of vector spaces. Examples of vector spaces that
we might have already encountered are:

1. the set Rn ,

2. the set of all n × n matrices,



3. the set of all functions from [a, b] to R, and

4. the set of all sequences.

In all of these sets, there is an operation of addition and an operation of multiplication by scalars. Let us formalize exactly what we mean by a vector space.

Definition 24. A nonempty set V (called a set of vectors) with operations "vector addition", denoted by "+", and "scalar multiplication", denoted by "·", is called a vector space over a set of scalars F if the following properties are satisfied:

1. u + v ∈ V for all u, v ∈ V. (closure under addition)

2. (u + v) + w = u + (v + w) for all u, v, w ∈ V. (vector addition is associative)

3. There is a vector in V called the zero vector, denoted by 0, satisfying v + 0 = v = 0 + v.

4. For each v ∈ V, there is a vector in V, denoted by −v, such that v + (−v) = 0 = (−v) + v.

5. u + v = v + u for all u, v ∈ V. (addition is commutative)

6. α · v ∈ V for every α ∈ F and v ∈ V. (closure under scalar multiplication)

7. There is an element 1 ∈ F such that 1 · v = v for every v ∈ V.

8. α · (u + v) = α · u + α · v for every α ∈ F and u, v ∈ V.

9. (α + β) · v = α · v + β · v for every α, β ∈ F and v ∈ V.

10. α · (β · v) = (αβ) · v for every α, β ∈ F and v ∈ V.

We usually take F = R or F = C. It can be shown that 0 · v = 0 for any vector v in V , where


0 ∈ F. To better understand the definition of a vector space, we first consider a few elementary
examples.
Example 25. Let V be the unit disc in R2:

$$V = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 1\}.$$

Is V a vector space over R?



[Figure: the unit disc in R2, with the boundary points (1, 0) and (0, 1) marked.]

Solution. The disc is not closed under scalar multiplication. For example, take u = (1, 0) ∈ V and multiply by, say, α = 2. Then αu = (2, 0) is not in V. Therefore, property (6) of the definition of a vector space fails, and consequently the unit disc is not a vector space.

Example 26. Let V be the graph of the quadratic function f(x) = x²:

$$V = \{(x, y) \in \mathbb{R}^2 \mid y = x^2\}.$$

Is V a vector space over R?

[Figure: the graph of f(x) = x², a parabola opening upward.]

Solution. The set V is not closed under scalar multiplication. For example, u = (1, 1) is a point in V but 2u = (2, 2) is not. You may also notice that V is not closed under addition either. For example, both u = (1, 1) and v = (2, 4) are in V, but u + v = (3, 5) is not a point on the parabola V. Therefore, the graph of f(x) = x² is not a vector space.

Example 27. Let V be the graph of the function f (x) = 2x:

V = {(x, y) ∈ R2 | y = 2x}.

Is V a vector space over R?



Solution. We will show that V is a vector space. First, we verify that V is closed under addition.
We first note that an arbitrary point in V can be written as u = (x, 2x). Let then u = (a, 2a) and
v = (b, 2b) be points in V . Then

u + v = (a + b, 2a + 2b) = (a + b, 2(a + b)).

Therefore, V is closed under addition. Verify that V is closed under scalar multiplication:

αu = α(a, 2a) = (αa, α2a) = (αa, 2(αa)).

Therefore, V is closed under scalar multiplication. There is a zero vector 0 = (0, 0) in V :

u + 0 = (a, 2a) + (0, 0) = (a, 2a).

All the other properties of a vector space can be verified to hold; for example, addition is commu-
tative and associative in V because addition in R2 is commutative/associative, etc. Therefore, the
graph of the function f (x) = 2x is a vector space.

Example 28. Let V = Rn[t] be the set of all polynomials in the variable t of degree at most n, given by

$$\mathbb{R}_n[t] = \{a_0 + a_1t + a_2t^2 + \cdots + a_nt^n \mid a_0, a_1, \ldots, a_n \in \mathbb{R}\}.$$

Is V a vector space over R?

Solution. Let u(t) = u0 + u1 t + · · · + un tn and v(t) = v0 + v1 t + · · · + vn tn be polynomials in V .


We define the addition of u and v as the new polynomial (u + v) as follows:

(u + v)(t) = u(t) + v(t) = (u0 + v0 ) + (u1 + v1 )t + · · · + (un + vn )tn .

Then u + v is a polynomial of degree at most n, and thus (u + v) ∈ Rn [t]. Therefore, Rn [t] is


closed under addition.

Now let α be a scalar. Define a new polynomial (αu) as follows:

(αu)(t) = (αu0 ) + (αu1 )t + · · · + (αun )tn .

Then αu is a polynomial of degree at most n, and thus αu ∈ Rn [t]. Hence, Rn [t] is closed under
scalar multiplication. The zero vector in Rn [t] is the zero polynomial 0(t) = 0.

One can verify that all other properties of the definition of a vector space also hold; for example,
addition is commutative and associative, etc. Thus, Rn [t] is a vector space over R.

Example 29. Let V = Mm×n (R) be the set of all m × n matrices whose entries are from R. Under
the usual operations of addition of matrices and scalar multiplication, is Mm×n a vector space
over R?

Solution. Given matrices A, B ∈ Mm×n (R) and a scalar α, we define the sum A + B by adding
entry-by-entry, and αA by multiplying each entry of A by α. It is clear that the space Mm×n (R) is
closed under these two operations. The zero vector in Mm×n (R) is the matrix of size m × n having
all entries equal to zero. It can be verified that all other properties of the definition of a vector
space also hold. Thus, the set Mm×n (R) is a vector space over R.

Example 30. The n-dimensional Euclidean space V = Rn under the usual operations of addition
and scalar multiplication is a vector space over R.

Example 31. Let V = C[a, b] denote the set of functions with domain [a, b] and codomain R that
are continuous. Is V a vector space over R?

2.1 Subspaces of vector spaces

Frequently, one encounters a vector space W that is a subset of a larger vector space V . In this
case, we would say that W is a subspace of V . Below is the formal definition.

Definition 32. Let V be a vector space over F with operations "+" and "·" and let W ⊆ V. Then W is called a subspace of V if W is itself a vector space over F under the same operations + and ·.

Since W is also a vector space, all 10 properties of a vector space hold for it. In particular, W is closed under vector addition and scalar multiplication. The following theorem says that, instead of checking all 10 properties to show that a subset W of V is a subspace of V, it suffices to check only 2 properties.

Theorem 33. Let V be a vector space over F under operations "+" and "·". A non-empty subset W of V is a subspace of V if and only if it satisfies the following properties:

1. u + v ∈ W for all u, v ∈ W,

2. α · u ∈ W for all α ∈ F and u ∈ W.

In practice, one first checks that 0 ∈ W (which in particular shows that W is non-empty) and then verifies the two closure properties; these are the three conditions checked in the examples below.

Example 34. Let W be the graph of the function f (x) = 2x:

W = {(x, y) ∈ R2 | y = 2x}.

Is W a subspace of V = R2 ?

Solution. If x = 0, then y = 2 · 0 = 0 and therefore (0, 0) is in W . Let u = (a, 2a) and v = (b, 2b)
be elements of W . Then

u + v = (a, 2a) + (b, 2b) = (a + b, 2a + 2b) = (a + b, 2(a + b)).

Because the x- and y-components of u + v satisfy y = 2x, then u + v is inside W . Thus, W is


closed under addition.

Let α be any scalar and let u = (a, 2a) be an element of W . Then

αu = (αa, α2a) = (αa, 2(αa)).

Because the x- and y-components of αu satisfy y = 2x, then αu is an element of W , and thus W
is closed under scalar multiplication.

All three conditions of a subspace are satisfied for W , and therefore W is a subspace of V .

Example 35. Let W be the first quadrant in R2 :

W = {(x, y) ∈ R2 | x ≥ 0, y ≥ 0}.

Is W a subspace of R2 ?

Solution. The set W contains the zero vector, and the sum of two vectors in W is again in W ; you
may want to verify this explicitly as follows: If u1 = (x1 , y1 ) is in W , then x1 ≥ 0 and y1 ≥ 0, and

similarly if u2 = (x2 , y2 ) is in W , then x2 ≥ 0 and y2 ≥ 0. Then the sum

u1 + u2 = (x1 + x2 , y1 + y2 )

has components x1 + x2 ≥ 0 and y1 + y2 ≥ 0, and therefore u1 + u2 is in W . However, W is not


closed under scalar multiplication. For example, if u = (1, 1) and α = −1, then

αu = (−1, −1)

is not in W because the components of αu are clearly not non-negative.


Example 36. Let V = Mn×n be the vector space of all n × n matrices. We define the trace of a
matrix A ∈ Mn×n as the sum of its diagonal entries:

tr(A) = a11 + a22 + · · · + ann .

Let W be the set of all n × n matrices whose trace is zero:

W = {A ∈ Mn×n | tr(A) = 0}.

Is W a subspace of V ?

Solution. If 0 is the n × n zero matrix, then clearly tr(0) = 0, and thus 0 ∈ W. Suppose that A and B are in W. Then necessarily tr(A) = 0 and tr(B) = 0. Consider the matrix

C = A + B.

Then
tr(C) = tr(A + B) = (a11 + b11 ) + (a22 + b22 ) + · · · + (ann + bnn )

= (a11 + · · · + ann ) + (b11 + · · · + bnn ) = tr(A) + tr(B) = 0

Therefore, tr(C) = 0 and consequently C = A + B ∈ W , in other words, W is closed under


addition. Now let α be a scalar and let C = αA. Then

tr(C) = tr(αA) = (αa11 ) + (αa22 ) + · · · + (αann ) = α tr(A) = 0.

Thus, tr(C) = 0, that is, C = αA ∈ W , and consequently W is closed under scalar multiplication.
Therefore, the set W is a subspace of V .

Example 37. Let V = Rn[t] and consider the subset W of V:

W = {u ∈ Rn[t] | u(2) = −1}.

In other words, W consists of polynomials of degree at most n in the variable t whose value at t = 2 is −1. Is W a subspace of V?

Solution. The zero polynomial 0(t) = 0 clearly does not equal −1 at t = 2. Therefore, W does not contain the zero polynomial and, because all three conditions of a subspace must be satisfied for W to be a subspace, we conclude that W is not a subspace of Rn[t]. As an exercise, you may want to investigate whether or not W is closed under addition and scalar multiplication.

Example 38. A square matrix A is said to be symmetric if A^T = A. For example, the following is a 3 × 3 symmetric matrix:

$$A = \begin{bmatrix} 2 & 4 & 5\\ 4 & 2 & -3\\ 5 & -3 & 7 \end{bmatrix}.$$

Verify for yourself that we do indeed have A^T = A. Let W be the set of all symmetric n × n matrices. Is W a subspace of V = Mn×n?

Example 39. For any vector space V , there are two trivial subspaces in V , namely, V itself is a
subspace of V , and the set consisting of the zero vector, W = {0}, is a subspace of V .

There is a particular way to generate a subspace of any given vector space V using the span of a
set of vectors.

Definition 40. Let V be a vector space over a field F and let S be any non-empty subset of V. The span of S is defined to be the set of all finite linear combinations of vectors in S, i.e.,

$$\operatorname{span} S := \{t_1v_1 + t_2v_2 + \cdots + t_pv_p : p \in \mathbb{N},\ v_i \in S,\ t_i \in \mathbb{F},\ 1 \le i \le p\}.$$

In particular, if S = {v1, . . . , vk} is a finite subset of V then

$$\operatorname{span} S = \{t_1v_1 + \cdots + t_kv_k : t_1, \ldots, t_k \in \mathbb{F}\}. \qquad (3)$$

We define the span of the empty set to be the zero space {0}.

We now show that the span of a set of vectors in V is a subspace of V .

Theorem 41. Given a vector space V and S ⊆ V , span S is a subspace of V .

Proof. We present a proof for the case of a finite set S = {v1, . . . , vp}. The arguments in the general case are similar.

Let u = t1 v1 + · · · + tp vp and w = s1 v1 + · · · + sp vp be two vectors in span S. Then

u + w = (t1 v1 + · · · + tp vp ) + (s1 v1 + · · · + sp vp ) = (t1 + s1 )v1 + · · · + (tp + sp )vp .

Therefore, u + w is also in span S. Now consider αu:

αu = α(t1 v1 + · · · + tp vp ) = (αt1 )v1 + · · · + (αtp )vp .

Therefore, αu is in span S. Lastly, since 0v1 + 0v2 + · · · + 0vp = 0, the zero vector 0 is in the span
of v1 , v2 , . . . , vp . Therefore, span S is a subspace of V .

Given a general subspace W of V , if w1 , w2 , . . . , wp are vectors in W such that

span{w1 , w2 , . . . , wp } = W,

then we say that {w1 , w2 , . . . , wp } is a spanning set of W . Hence, every vector in W can be written
as a linear combination of the vectors w1 , w2 , . . . , wp .

2.2 Linear independence

Roughly speaking, the concept of linear independence evolves around the idea of working with “ef-
ficient” spanning sets for a subspace. For instance, the set of directions {EAST, N ORT H, N ORT H−
EAST } are redundant since a total displacement in the NORTH-EAST direction can be obtained
by combining individual NORTH and EAST displacements. With these vague statements out of
the way, we introduce the formal definition of what it means for a set of vectors to be “efficient”.

Definition 42. Let V be a vector space over F and let {v1 , v2 , . . . , vp } be a non-empty set of
vectors in V . Then {v1 , v2 , . . . , vp } is linearly independent if the only scalars c1 , c2 , . . . , cp ∈
F that satisfy the equation

c1 v1 + c2 v2 + · · · + cp vp = 0

are the trivial scalars c1 = c2 = · · · = cp = 0. If the set {v1 , . . . , vp } is not linearly


independent, then we say that it is linearly dependent.
More generally, we say that any non-empty subset S ⊆ V is linearly independent if every
non-empty finite subset of S is linearly independent.

We now describe the redundancy in a set of linearly dependent vectors. If {v1 , . . . , vp } are linearly
dependent, it follows that there are scalars c1 , c2 , . . . , cp , at least one of which is nonzero, such that

c1 v1 + c2 v2 + · · · + cp vp = 0. (⋆)

For example, suppose that {v1 , v2 , v3 , v4 } are linearly dependent. Then there are scalars c1 , c2 , c3 , c4 ,
not all of them zero, such that equation (⋆) holds. Suppose, for the sake of argument, that c3 ̸= 0.
Then,

$$v_3 = -\frac{c_1}{c_3}v_1 - \frac{c_2}{c_3}v_2 - \frac{c_4}{c_3}v_4.$$

Therefore, when a set of vectors is linearly dependent, it is possible to write one of the vectors
as a linear combination of the others. It is in this sense that a set of linearly dependent vectors is
redundant. In fact, if a set of vectors is linearly dependent, we can say even more, as the following
theorem states.

Theorem 43. A set of vectors {v1 , v2 , . . . , vp }, with v1 ̸= 0, is linearly dependent if and only
if some vj is a linear combination of the preceding vectors v1 , . . . , vj−1 .

Example 44. Show that the following set of 2 × 2 matrices is linearly dependent:

$$A_1 = \begin{bmatrix} 1 & 2\\ 0 & -1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -1 & 3\\ 1 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 5 & 0\\ -2 & -3 \end{bmatrix}.$$

Solution: It is clear that A1 and A2 are linearly independent, i.e., A1 cannot be written as a scalar multiple of A2, and vice versa. Since the (2, 1) entry of A1 is zero, the only way to get the −2 in the (2, 1) entry of A3 is to multiply A2 by −2. Similarly, since the (2, 2) entry of A2 is zero, the only way to get the −3 in the (2, 2) entry of A3 is to multiply A1 by 3. Hence, we suspect that 3A1 − 2A2 = A3. Verify:

$$3A_1 - 2A_2 = \begin{bmatrix} 3 & 6\\ 0 & -3 \end{bmatrix} - \begin{bmatrix} -2 & 6\\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 5 & 0\\ -2 & -3 \end{bmatrix} = A_3.$$

Therefore, 3A1 − 2A2 − A3 = 0, and thus we have found scalars c1, c2, c3, not all zero, such that c1A1 + c2A2 + c3A3 = 0.
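The dependence relation found above can be confirmed numerically; a sketch assuming NumPy:

```python
import numpy as np

A1 = np.array([[1, 2], [0, -1]])
A2 = np.array([[-1, 3], [1, 0]])
A3 = np.array([[5, 0], [-2, -3]])

# the nontrivial relation 3*A1 - 2*A2 - A3 = 0 from Example 44
print(np.allclose(3 * A1 - 2 * A2 - A3, 0))  # True
```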

2.3 Basis of a vector space

We now introduce the important concept of a basis. Given a set of vectors {v1 , . . . , vp−1 , vp } in V ,
we showed that W = span{v1 , v2 , . . . , vp } is a subspace of V . If, say, vp is linearly dependent on
v1 , v2 , . . . , vp−1 , then we can remove vp and the smaller set {v1 , . . . , vp−1 } still spans all of W :

W = span{v1 , v2 , . . . , vp } = span{v1 , . . . , vp−1 }.

Intuitively, vp does not provide an independent “direction” in generating W . If some other vector
vj is linearly dependent on v1 , . . . , vp−1 , then we can remove vj and the resulting smaller set of
vectors still spans W . We can continue removing vectors until we obtain a minimal set of vectors
that are linearly independent and still span W. These remarks motivate the following important definition.

Definition 45. Let W be a subspace of a vector space V . A set of vectors B = {v1 , . . . , vk }


in W is said to be a basis for W if:

1. The set B spans all of W , that is, W = span{v1 , . . . , vk }, and

2. The set B is linearly independent.

A basis is therefore a minimal spanning set for a subspace. Indeed, if B = {v1, . . . , vp} is a basis for W and we remove, say, vp, then

B̃ = {v1, . . . , vp−1}

cannot be a basis for W (Why?). If B = {v1, . . . , vp} is a basis, then it is linearly independent, and therefore vp cannot be written as a linear combination of the others. In other words, vp ∈ W is not in the span of B̃ = {v1, . . . , vp−1}, and therefore B̃ is not a basis for W, because a basis must be a spanning set.

If, on the other hand, we start with a basis B = {v1, . . . , vp} for W and we add a new vector u from W, then

B̃ = {v1, . . . , vp, u}

is not a basis for W (Why?). We still have that span(B̃) = W, but now B̃ is not linearly independent. Indeed, because B = {v1, . . . , vp} is a basis for W, the vector u can be written as a linear combination of {v1, . . . , vp}, and thus B̃ is not linearly independent.

Example 46. Show that the standard unit vectors form a basis for V = R3:

$$e_1 = \begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}, \quad e_3 = \begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix}.$$

Solution: Any vector x ∈ R3 can be written as a linear combination of e1, e2, e3:

$$x = \begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = x_1\begin{bmatrix} 1\\ 0\\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0\\ 0\\ 1 \end{bmatrix} = x_1e_1 + x_2e_2 + x_3e_3.$$

Therefore, span{e1 , e2 , e3 } = R3 . The set B = {e1 , e2 , e3 } is linearly independent. Indeed, if there


are scalars c1 , c2 , c3 such that
c1 e1 + c2 e2 + c3 e3 = 0,

then clearly they must all be zero: c1 = c2 = c3 = 0. Therefore, by definition, B = {e1 , e2 , e3 }


is a basis for R3 . This basis is called the standard basis for R3 . Analogous arguments hold for
{e1 , e2 , . . . , en } in Rn .

Example 47. Is B = {v1, v2, v3} a basis for R3?

$$v_1 = \begin{bmatrix} 2\\ -2\\ -4 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -4\\ 8\\ 0 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 4\\ -6\\ -6 \end{bmatrix}.$$

Solution: Form the matrix A = [v1 v2 v3] and row reduce:

$$A \sim \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.$$

Therefore, the only solution to Ax = 0 is the trivial solution. Thus, B is linearly independent.
Moreover, for any b ∈ R3 , the augmented matrix [A b] is consistent. Therefore, the columns of A
span all of R3 :
Col(A) = span{v1 , v2 , v3 } = R3 .

Hence, B is a basis for R3 .
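Such checks are quick to automate; a sketch assuming NumPy. Three vectors in R^3 form a basis exactly when the matrix having them as columns has rank 3, equivalently a nonzero determinant:

```python
import numpy as np

v1, v2, v3 = [2, -2, -4], [-4, 8, 0], [4, -6, -6]
A = np.column_stack([v1, v2, v3])

print(np.linalg.matrix_rank(A))  # 3, so the columns are linearly independent
print(np.linalg.det(A))          # -16.0 (nonzero), confirming B is a basis of R^3
```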

Example 48. In V = R4, consider the vectors

$$v_1 = \begin{bmatrix} 1\\ 3\\ 2\\ -1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0\\ -2\\ -2\\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1\\ 5\\ 4\\ -2 \end{bmatrix}.$$

Let W = span{v1, v2, v3}. Is B = {v1, v2, v3} a basis for W?

Solution: By definition, B is a spanning set for W, so we need only determine if B is linearly independent. Form the matrix A = [v1 v2 v3] and row reduce to obtain:

$$A \sim \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & -1\\ 0 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$

Hence, rank(A) = 2 and thus B is linearly dependent. Notice v1 − v2 = v3. Therefore, B is not a basis of W.

Example 49. Find a basis for the vector space of 2 × 2 matrices.

Example 50. Recall that an n × n matrix A is skew-symmetric if A⊤ = −A. We proved that the set
of n × n skew-symmetric matrices is a subspace. Find a basis for the set of 3 × 3 skew-symmetric
matrices.

2.4 Dimension of a vector space

The following theorem will lead to the definition of the dimension of a vector space.

Theorem 51. Any two bases of a vector space have equal cardinality.

Proof. We will prove the theorem for the case V = Rn. We already know that the standard unit vectors {e1, e2, . . . , en} form a basis of Rn. Let {u1, u2, . . . , up} be nonzero vectors in Rn, and suppose first that p > n. In a previous theorem, we proved that any set of vectors in Rn containing more than n vectors is automatically linearly dependent. The reason is that the RREF of A = [u1 u2 · · · up] will contain at most r = n leading ones, and therefore d = p − n > 0. Thus, the solution set of Ax = 0 contains non-trivial solutions. On the other hand, suppose instead that p < n. A set of vectors {u1, . . . , up} in Rn spans Rn if and only if the RREF of A has exactly r = n leading ones. The largest possible value of r is r = p < n. Therefore, if p < n, then {u1, u2, . . . , up} cannot span Rn. Thus, in either case (p > n or p < n), the set {u1, u2, . . . , up} cannot be a basis for Rn. Hence, any basis of Rn must contain n vectors.

The previous theorem does not say that every set {v1, v2, . . . , vn} of nonzero vectors in Rn containing n vectors is automatically a basis for Rn. For example, the vectors

$$v_1 = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0\\ 0\\ 3 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 2\\ 0\\ 0 \end{bmatrix}$$

do not form a basis for R3 because

$$x = \begin{bmatrix} 0\\ 1\\ 0 \end{bmatrix}$$

is not in the span of {v1, v2, v3}. All that we can say is that a set of vectors in Rn containing fewer or more than n vectors is automatically not a basis for Rn. From Theorem 51 above, any basis in Rn must have exactly n vectors. In fact, on a general abstract vector space V, if {v1, v2, . . . , vn} is a basis for V, then any other basis for V must have exactly n vectors also. Because of this result, we can make the following definition.

Definition 52. Let V be a vector space. The dimension of V , denoted as dim V , is the
number of vectors in any basis of V . The dimension of the trivial vector space V = {0} is
defined to be zero.

Moving on, suppose that we have a set B = {v1 , v2 , . . . , vn } in Rn containing exactly n vec-
tors. For B = {v1 , v2 , . . . , vn } to be a basis of Rn , the set B must be linearly independent and
span(B) = Rn . In fact, it can be shown that if B is linearly independent, then the spanning
condition span(B) = Rn is automatically satisfied, and vice-versa.

For example, say the vectors {v1 , v2 , . . . , vn } in Rn are linearly independent, and put A = [v1 v2 · · · vn ].
Then A−1 exists, and therefore Ax = b is always solvable. Hence,

Col(A) = span{v1 , v2 , . . . , vn } = Rn .

In summary, we have the following theorem.

Theorem 53. Let B = {v1 , . . . , vn } be vectors in Rn . If B is linearly independent, then B


is a basis for Rn . Alternatively, if span{v1 , v2 , . . . , vn } = Rn , then B is a basis for Rn .

Example 54. Do the columns of the matrix

$$A = \begin{bmatrix} 2 & 3 & 3 & -2\\ 4 & 7 & 8 & -6\\ 0 & 0 & 1 & 0\\ -4 & -6 & -6 & 3 \end{bmatrix}$$

form a basis for R4?

Solution: Let v1 , v2 , v3 , v4 denote the columns of A. Since we have n = 4 vectors in R4 , we need


only check that they are linearly independent. Compute

det(A) = −2 ≠ 0.

Hence, rank(A) = 4, and thus the columns of A are linearly independent. Therefore, the vectors
v1 , v2 , v3 , v4 form a basis for R4 .

A subspace W of a vector space V is a vector space in its own right, and therefore also has a dimen-
sion. By definition, if B = {v1 , . . . , vk } is a linearly independent set in W and span{v1 , . . . , vk } =
W , then B is a basis for W , and in this case, the dimension of W is k. Since an n-dimensional
vector space V requires exactly n vectors in any basis, if W is a strict subspace of V, then

dim(W ) < dim(V ).

As an example, in V = R3 , subspaces can be classified by dimension:

1. The zero-dimensional subspace in R3 is W = {0}.

2. The one-dimensional subspaces in R3 are lines through the origin. These are spanned by a
single nonzero vector.

3. The two-dimensional subspaces in R3 are planes through the origin. These are spanned by
two linearly independent vectors.

4. The only three-dimensional subspace in R3 is R3 itself. Any set {v1 , v2 , v3 } in R3 that is


linearly independent is a basis for R3 .

Example 55. Find a basis for Null(A) and dim(Null(A)) if

$$A = \begin{bmatrix} -2 & 4 & -2 & -4\\ 2 & -6 & -3 & 1\\ -3 & 8 & 2 & -3 \end{bmatrix}.$$

Solution: By definition, Null(A) is the solution set of the homogeneous system Ax = 0. Row reducing, we obtain

$$A \sim \begin{bmatrix} 1 & 0 & 6 & 5\\ 0 & 1 & 5/2 & 3/2\\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

The general solution to Ax = 0 in parametric form is

$$x = t\begin{bmatrix} -5\\ -3/2\\ 0\\ 1 \end{bmatrix} + s\begin{bmatrix} -6\\ -5/2\\ 1\\ 0 \end{bmatrix} = tv_1 + sv_2.$$

By construction, the vectors

$$v_1 = \begin{bmatrix} -5\\ -3/2\\ 0\\ 1 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -6\\ -5/2\\ 1\\ 0 \end{bmatrix}$$

span Null(A) and are linearly independent. Therefore, B = {v1 , v2 } is a basis for Null(A), and
thus

dim(Null(A)) = 2.

In general, the dimension of Null(A) is the number of free parameters in the solution set of the
system Ax = 0, that is,

dim(Null(A)) = d = n − rank(A).
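The whole computation can be reproduced symbolically; a sketch assuming the SymPy library, which works with exact fractions:

```python
from sympy import Matrix

A = Matrix([[-2, 4, -2, -4],
            [2, -6, -3, 1],
            [-3, 8, 2, -3]])

print(A.rref()[0])     # the reduced row-echelon form shown above
basis = A.nullspace()  # a list of basis vectors for Null(A)
for v in basis:
    print(v.T)
print(len(basis))      # 2 = dim(Null(A)) = n - rank(A)
```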

3 Introduction to linear transformations

3.1 Images of functions

Consider a function
T : Rn → Rm .

The domain of T is Rn and the co-domain of T is Rm . The case n = m is allowed, of course. In


engineering or physics, the domain is sometimes called the input space and the co-domain is called

the output space. Using this terminology, the points x in the domain are called the inputs, and the
points T (x) produced by the mapping are called the outputs.

Definition 56. The vector b ∈ Rm is in the range of T , or in the image of T , if there exists
some x ∈ Rn such that T (x) = b.

In other words, b is in the range of T if there is an input x in the domain of T that outputs
b = T (x). In general, not every point in the co-domain of T is in the range of T . For example,
consider the vector mapping T : R2 → R2 defined as

$$T(x) = \begin{bmatrix} x_1^2\sin(x_2) - \cos(x_1^2 - 1)\\ x_1^2 + x_2^2 + 1 \end{bmatrix}.$$

The vector b = (3, −1) is not in the range of T because the second component of T (x) is positive.
On the other hand, b = (−1, 2) is in the range of T because

$$T\left(\begin{bmatrix} 1\\ 0 \end{bmatrix}\right) = \begin{bmatrix} 1^2\sin(0) - \cos(1^2 - 1)\\ 1^2 + 0^2 + 1 \end{bmatrix} = \begin{bmatrix} -1\\ 2 \end{bmatrix} = b.$$

Hence, a corresponding input for this particular b is x = (1, 0).

3.2 Linear transformations

Definition 57. Let V and U be vector spaces over the scalar field F. Then T : V → U is called a linear transformation of V into U if the following hold for any u, v ∈ V and any scalar α ∈ F:

(i) T(u + v) = T(u) + T(v),

(ii) T(αv) = αT(v).

Example 58. Consider the function T : V → U defined by T(v) = 0 for all v ∈ V. Verify that this is a linear transformation. This is called the zero transformation.

Example 59. Consider the function T : V → V defined by T(v) = v for all v ∈ V. Verify that this is a linear transformation. This is called the identity transformation.

Example 60. Let A ∈ Mm×n(F) be given. Consider the function LA : Fn → Fm defined by LA(v) = Av for each column vector v ∈ Fn. Verify that this is a linear transformation. This is called a left multiplication transformation.

Example 61. Is the function T : R2 → R3 a linear transformation of R2 into R3?

$$T\left(\begin{bmatrix} x_1\\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2\\ x_1 + x_2\\ -x_1 - 3x_2 \end{bmatrix}.$$

Yes! We must verify that the two conditions in Definition 57 hold. For the first condition, take arbitrary vectors u = (u1, u2) and v = (v1, v2). We compute:

$$T(u + v) = T\left(\begin{bmatrix} u_1 + v_1\\ u_2 + v_2 \end{bmatrix}\right) = \begin{bmatrix} 2(u_1 + v_1) - (u_2 + v_2)\\ (u_1 + v_1) + (u_2 + v_2)\\ -(u_1 + v_1) - 3(u_2 + v_2) \end{bmatrix}.$$

Expanding and simplifying, we find

T (u + v) = T (u) + T (v).

For the second condition, let c ∈ R be an arbitrary scalar. Then:

T(cu) = T\left(\begin{bmatrix} cu_1 \\ cu_2 \end{bmatrix}\right) = \begin{bmatrix} c(2u_1 - u_2) \\ c(u_1 + u_2) \\ c(-u_1 - 3u_2) \end{bmatrix} = c\,T(u).

Therefore, both conditions of Definition 57 hold, and thus T is a linear transformation.
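Observe that T here is precisely left multiplication by the 3 × 2 coefficient matrix, as in Example 60. A small numerical sketch of the two conditions, assuming numpy:

```python
import numpy as np

# T from Example 61 is left multiplication by this 3 x 2 matrix.
A = np.array([[2, -1],
              [1, 1],
              [-1, -3]])

rng = np.random.default_rng(0)
u, v = rng.standard_normal(2), rng.standard_normal(2)
c = 3.5

# Both defining conditions hold up to floating-point roundoff.
assert np.allclose(A @ (u + v), A @ u + A @ v)
assert np.allclose(A @ (c * u), c * (A @ u))
```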

Example 62. Let V = Mn×n (R) be the vector space of n × n matrices with entries from R and let
T : V → V be the function
T (A) = A + AT .

Is T a linear transformation of V into V ?

Yes! Let A and B be matrices in V . Then using the properties of the transpose and regrouping, we
obtain:

T (A + B) = (A + B) + (A + B)T = A + B + AT + B T = (A + AT ) + (B + B T ) = T (A) + T (B).

Similarly, if α is any scalar, then

T (αA) = (αA) + (αA)T = αA + αAT = α(A + AT ) = αT (A).

This proves that T satisfies both conditions and thus T is a linear transformation of V into V .
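A numerical spot-check of both identities, sketched with numpy (the helper T is ours):

```python
import numpy as np

def T(A):
    """T(A) = A + A^T from Example 62."""
    return A + A.T

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
alpha = -2.0

assert np.allclose(T(A + B), T(A) + T(B))       # additivity
assert np.allclose(T(alpha * A), alpha * T(A))  # homogeneity
assert np.allclose(T(A), T(A).T)                # outputs are always symmetric
```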

Example 63. Is the function T : R2 → R2 a linear transformation of R2 into R2 ?

" #
x21 sin(x2 ) − cos(x21 − 1)
T (x) =
x21 + x22 + 1
No! Recall that

T\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.
If T were linear, then by Definition 57, the following must hold:
" #! " #! " #! " # " #
3 1 1 −1 −3
T =T 3 = 3T =3 = .
0 0 0 2 6

However,

T\left(\begin{bmatrix} 3 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 3^2 \sin(0) - \cos(3^2 - 1) \\ 3^2 + 0^2 + 1 \end{bmatrix} = \begin{bmatrix} -\cos(8) \\ 10 \end{bmatrix} \neq \begin{bmatrix} -3 \\ 6 \end{bmatrix}.
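The same counterexample can be checked numerically, continuing the earlier sketch (the helper T is ours, assuming numpy):

```python
import numpy as np

def T(x):
    x1, x2 = x
    return np.array([x1**2 * np.sin(x2) - np.cos(x1**2 - 1),
                     x1**2 + x2**2 + 1])

e1 = np.array([1.0, 0.0])
print(T(3 * e1))   # approximately [ 0.1455, 10. ], that is, [-cos(8), 10]
print(3 * T(e1))   # [-3.  6.]
# The outputs differ, so T(3x) != 3 T(x) and T is not linear.
```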

Example 64. Let V = Mn×n (R) be the vector space of n × n matrices with entries from R, where
n ≥ 2, and let T : V → R be the function

T (A) = det(A).

Is T a linear transformation of V into R?

No! If T is a linear transformation, then according to Definition 57, we must have T (A + B) =
det(A + B) = det(A) + det(B) and also T (αA) = αT (A) for any scalar α. Do these properties
actually hold though?

For example, we know from the properties of the determinant that det(αA) = α^n det(A), and
therefore T (αA) = αT (A) fails in general (for instance, for any α with α^n ̸= α and any A with
det(A) ̸= 0). Therefore, T is not a linear transformation.

Also, it does not hold in general that det(A + B) = det(A) + det(B); in fact, it rarely holds. For
example, if

A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} -1 & 1 \\ 0 & 3 \end{bmatrix},
then det(A) = 2, det(B) = −3, and therefore det(A) + det(B) = −1. On the other hand,

A + B = \begin{bmatrix} 1 & 1 \\ 0 & 4 \end{bmatrix}

and thus det(A + B) = 4. Thus, det(A + B) ̸= det(A) + det(B).
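Both failures are straightforward to confirm numerically; a sketch assuming numpy:

```python
import numpy as np

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[-1.0, 1.0], [0.0, 3.0]])

print(np.linalg.det(A) + np.linalg.det(B))  # -1.0
print(np.linalg.det(A + B))                 # 4.0: det is not additive
print(np.linalg.det(5 * A))                 # 50.0 = 5**2 * det(A), not 5 * det(A)
```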

Example 65. Let V = Rn [t] be the vector space of polynomials in the variable t of degree no more
than n ≥ 1. Consider the function T : V → V defined as

T (f (t)) = 2f (t) + f ′ (t).

For example, if f (t) = 3t6 − t2 + 5 then

T (f (t)) = 2f (t) + f ′ (t) = 2(3t6 − t2 + 5) + (18t5 − 2t) = 6t6 + 18t5 − 2t2 − 2t + 10.

Is T a linear transformation of V into V ?

Yes! Let f (t) and g(t) be polynomials of degree no more than n ≥ 1. Then

T (f (t) + g(t)) = 2(f (t) + g(t)) + (f (t) + g(t))′ = 2f (t) + 2g(t) + f ′ (t) + g ′ (t)
= (2f (t) + f ′ (t)) + (2g(t) + g ′ (t))
= T (f (t)) + T (g(t)).

Therefore, T (f (t) + g(t)) = T (f (t)) + T (g(t)). Now let α be any scalar. Then

T (αf (t)) = 2(αf (t)) + (αf (t))′ = 2αf (t) + αf ′ (t) = α(2f (t) + f ′ (t)) = αT (f (t)).

Therefore, T (αf (t)) = αT (f (t)). Thus, T is a linear transformation of V into V .
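A symbolic check of this example, sketched with sympy (the helper T is ours):

```python
import sympy as sp

t = sp.symbols('t')

def T(f):
    """T(f) = 2 f + f' from Example 65."""
    return sp.expand(2 * f + sp.diff(f, t))

f = 3*t**6 - t**2 + 5
print(T(f))  # 6*t**6 + 18*t**5 - 2*t**2 - 2*t + 10

g = t**3 + 1
alpha = sp.Rational(7, 2)
assert sp.expand(T(f + g) - (T(f) + T(g))) == 0     # additivity
assert sp.expand(T(alpha * f) - alpha * T(f)) == 0  # homogeneity
```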

3.3 Kernel and range of a linear transformation

We now introduce two important subsets associated with a linear transformation.

Definition 66. Let V and U be vector spaces over the scalar field F. Let T : V → U be a
linear transformation of V into U .

(i) The kernel of T is the set of vectors v in the domain V that get mapped to the zero
vector, that is, T (v) = 0. We denote the kernel of T by ker(T ):

ker(T ) = {v ∈ V | T (v) = 0}.

It is also known as the null space of T .

(ii) The range of T is the set of vectors b in the codomain U for which there exists at least
one v in V such that T (v) = b. We denote the range of T by Range(T ):

Range(T ) = {b ∈ U | there exists some v ∈ V such that T (v) = b}.

It is also known as the image of T .

You may have noticed that the definition of the range of a linear transformation is the usual defini-
tion of the range of a function. Not surprisingly, the kernel and range are subspaces of the domain
and codomain, respectively.

Theorem 67. Let V and U be vector spaces over the scalar field F. Let T : V → U be
a linear transformation of V into U . Then ker(T ) is a subspace of V and Range(T ) is a
subspace of U .

Proof. Suppose that v and u are in ker(T ). Then T (v) = 0 and T (u) = 0. By the linearity of T ,
it holds that
T (v + u) = T (v) + T (u) = 0 + 0 = 0.

Therefore, since T (u+v) = 0, u+v is in ker(T ). This shows that ker(T ) is closed under addition.

Now suppose that α is any scalar and v is in ker(T ). Then T (v) = 0, and thus by linearity of T ,
it holds that
T (αv) = αT (v) = α · 0 = 0.

Therefore, since T (αv) = 0, αv is in ker(T ), which proves that ker(T ) is closed under scalar
multiplication.

Lastly, by linearity of T , it holds that

T (0) = T (v − v) = T (v) − T (v) = 0,

that is, T (0) = 0. Therefore, the zero vector 0 is in ker(T ). This proves that ker(T ) is a subspace
of V . The proof that Range(T ) is a subspace of U is left as an exercise.

Example 68. Let V = Mn×n (R) be the vector space of n × n matrices with entries in R, and let
T : V → V be the mapping
T (A) = A + AT .

Describe the kernel of T .

A matrix A is in the kernel of T if T (A) = A + AT = 0, that is, if AT = −A. Hence,

ker(T ) = {A ∈ Mn×n | AT = −A}.

What type of matrix A satisfies AT = −A? For example, consider the case where A is the 2 × 2
matrix " #
a11 a12
A= ,
a21 a22

and AT = −A. Then

\begin{bmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \end{bmatrix} = \begin{bmatrix} -a_{11} & -a_{12} \\ -a_{21} & -a_{22} \end{bmatrix}.
Therefore, it must hold that a11 = −a11 , a21 = −a12 , and a22 = −a22 . Thus, necessarily a11 = 0,
a22 = 0, and a12 can be arbitrary. For example, the matrix
" #
0 7
A=
−7 0
47

satisfies AT = −A.

Using a similar computation as above, a 3 × 3 matrix satisfies AT = −A if A is of the form

A = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix},

where a, b, c are arbitrary constants. In general, a matrix A that satisfies AT = −A is called
skew-symmetric.
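A quick numerical illustration that skew-symmetric matrices lie in ker(T ), sketched with numpy:

```python
import numpy as np

# A skew-symmetric matrix: A^T = -A.
A = np.array([[0, 2, -5],
              [-2, 0, 3],
              [5, -3, 0]])

assert np.array_equal(A.T, -A)  # A is skew-symmetric
print(A + A.T)                  # the zero matrix, so T(A) = 0
```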

Example 69. Let V be the vector space of differentiable functions on the interval [a, b]. That is,
f is an element of V if f : [a, b] → R is differentiable. Describe the kernel of the linear mapping
T : V → V defined as
T (f (x)) = f (x) + f ′ (x).

A function f is in the kernel of T if T (f (x)) = 0, that is, if

f (x) + f ′ (x) = 0.

Equivalently, if f ′ (x) = −f (x). What functions f do you know that satisfy f ′ (x) = −f (x)? How
about f (x) = e−x ? It is clear that f ′ (x) = −e−x = −f (x) and thus f (x) = e−x is in ker(T ). How
about g(x) = 2e−x ? We compute that g ′ (x) = −2e−x = −g(x) and thus g is also in ker(T ). It
turns out that the elements of ker(T ) are of the form f (x) = Ce−x for a constant C.
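This claim can be confirmed symbolically; sympy's dsolve solves f + f ′ = 0 directly (a minimal sketch, assuming sympy):

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')

# Solve f(x) + f'(x) = 0; the general solution is C1 * exp(-x).
sol = sp.dsolve(sp.Eq(f(x) + f(x).diff(x), 0), f(x))
print(sol)  # Eq(f(x), C1*exp(-x))
```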

Example 70. Consider the linear transformation T : R3 [t] → M2×2 (R) defined by:

T (p(t)) = p(A),
" #
1 1
where A = . For a polynomial p(t) = at3 + bt2 + ct + d, the transformation is computed
−1 1
as:
T (p(t)) = aA3 + bA2 + cA + dI,

where I is the identity matrix. The powers of A are:

A^2 = \begin{bmatrix} 0 & 2 \\ -2 & 0 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} -2 & 2 \\ -2 & -2 \end{bmatrix}.

Thus, "# " # " # " #


−2 2 0 2 1 1 1 0
T (p(t)) = a +b +c +d .
−2 −2 −2 0 −1 1 0 1

Kernel of T

To find ker(T ), solve T (p(t)) = 0:

T(at^3 + bt^2 + ct + d) = \begin{bmatrix} -2a + c + d & 2a + 2b + c \\ -2a - 2b - c & -2a + c + d \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.

This gives the equations:


−2a + c + d = 0, 2a + 2b + c = 0.

Solving these, let a and b be free variables. Then:

c = −2a − 2b, d = 4a + 2b.

Thus, the kernel is the set of all polynomials of the form:

p(t) = a(t3 − 2t + 4) + b(t2 − 2t + 2),

and a basis for ker(T ) is:


{t3 − 2t + 4, t2 − 2t + 2}.

Range of T

To find Range(T ), determine the set of all matrices of the form:

\begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} -2a + c + d & 2a + 2b + c \\ -2a - 2b - c & -2a + c + d \end{bmatrix}.

The consistency conditions for w, x, y, z are:

x + y = 0, w = z.

This implies that Range(T ) consists of matrices of the form:

\begin{bmatrix} z & -y \\ y & z \end{bmatrix}.

A basis for Range(T ) is:

\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \right\}.
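The kernel computation above can be verified mechanically; a sympy sketch (the helper T is ours):

```python
import sympy as sp

A = sp.Matrix([[1, 1], [-1, 1]])
I = sp.eye(2)

def T(a, b, c, d):
    """T(p) = p(A) for p(t) = a t^3 + b t^2 + c t + d, as in Example 70."""
    return a * A**3 + b * A**2 + c * A + d * I

# The claimed kernel basis polynomials t^3 - 2t + 4 and t^2 - 2t + 2:
print(T(1, 0, -2, 4))  # Matrix([[0, 0], [0, 0]])
print(T(0, 1, -2, 2))  # Matrix([[0, 0], [0, 0]])
```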

3.4 The rank-nullity theorem

Definition 71. Let V and U be vector spaces over the scalar field F. Let T : V → U
be a linear transformation. The dimension of the kernel of T is called the nullity of T and
is denoted by nullity(T ). The dimension of the range of T is called the rank of T and is
denoted by rank(T ).

The following theorem is a fundamental result about linear transformations.

Theorem 72. Let V be a finite-dimensional vector space and U be a vector space over the
scalar field F. Let T : V → U be a linear transformation. Then

nullity(T ) + rank(T ) = dim(V ).

Proof. As with almost every proof that involves dimensions, we make use of bases for various
vector spaces involved. What we must do is relate the sizes of bases for ker(T ), Range(T ), and V .
Since ker(T ) is a subspace of V , it makes sense to start with these two vector spaces (ker(T ) and
V ). Let {u1 , u2 , . . . , uk } be a basis for ker(T ) (the smaller of the two spaces). This is a linearly
independent set of vectors in V , so we can extend it to a basis for all of V . Let this extended basis
be {u1 , u2 , . . . , uk , v1 , v2 , . . . , vm }. In forming these two bases, we have labeled the dimensions
of ker(T ) and V . That is, we have said that dim(ker(T )) = k, and dim(V ) = k + m. This means
that we must show that the dimension of the range of T is m.

Claim: A basis for the range of T is {T (v1 ), T (v2 ), . . . , T (vm )}. If we can verify this claim,
we will have finished the proof. To show this set is a basis, we must establish that this is a span-
ning set for the range of T and that this set is linearly independent. We tackle these properties in
the order listed.

Spanning set of Range(T ): Let w be in the range of T . This means that w = T (v) for some v in
V . This v can be written as a combination of basis vectors so

v = c1 u1 + · · · + ck uk + d1 v1 + · · · + dm vm .

Applying the transformation and making use of linearity,

w = T (v) = T (c1 u1 +· · ·+ck uk +d1 v1 +· · ·+dm vm ) = c1 T (u1 )+· · ·+ck T (uk )+d1 T (v1 )+· · ·+dm T (vm ).

Since the u’s are all in the kernel of T , we have T (uj ) = 0 for each j. Consequently,

w = d1 T (v1 ) + · · · + dm T (vm ),

which shows that w is in the span of {T (v1 ), T (v2 ), . . . , T (vm )}.

Linear Independence: Suppose that d1 T (v1 ) + · · · + dm T (vm ) = 0 for some scalars d1 , . . . , dm .
We must show that all the d’s are forced to be 0. By linearity,

d1 T (v1 ) + · · · + dm T (vm ) = 0 =⇒ T (d1 v1 + · · · + dm vm ) = 0,

so d1 v1 + · · · + dm vm is in the kernel of T . Since we have a basis for ker(T ), we have

d1 v1 + · · · + dm vm = c1 u1 + · · · + ck uk .

This looks like a linear dependence relation among the u’s and v’s, but the u’s and v’s are linearly
independent. The only possibility, then, is that all the coefficients, all the c’s and all the d’s, are
zero. In particular, all the d’s must be zero. This shows that {T (v1 ), T (v2 ), . . . , T (vm )} is a
linearly independent set, completing the proof that it is a basis.

3.5 Subspaces associated with a matrix

Given a matrix of size m×n, we have some naturally associated vector spaces, which are discussed
below.

3.5.1 Row space, column space and rank of a matrix

If A is an m × n matrix with entries in F, each row of A has n entries and thus can be identified
with a vector in Fn . Similarly, each column of A has m entries, and can be identified with a vector
in Fm .

Definition 73. • The set of all linear combinations of the row vectors of A is a subspace
of Fn , called the row space of A. The dimension of the row space is said to be the row
rank of A.

• The set of all linear combinations of the column vectors of A is a subspace of Fm ,
called the column space of A. The dimension of the column space is said to be the
column rank of A.

Example 74. Let

A = \begin{bmatrix} -2 & -5 & 7 & 3 \\ 2 & 4 & -2 & 1 \\ 3 & 7 & -8 & 6 \end{bmatrix} ∈ M3×4 (R), \qquad b = \begin{bmatrix} 3 \\ -1 \\ 3 \end{bmatrix}.

Is b in the column space of A?

The vector b is in the column space of A if there exists x ∈ R4 such that Ax = b. Hence, we must
determine if Ax = b has a solution. Performing elementary row operations on the augmented
matrix [A | b], we obtain:
 
0 1 −5 −4 −2
[A | b] ∼ 2 4 −2 1 3 .
 

0 0 0 17 1

The system is consistent, and therefore Ax = b will have a solution. Thus, b is in the column
space of A.
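The same conclusion follows from a rank test, sketched here with sympy:

```python
import sympy as sp

A = sp.Matrix([[-2, -5, 7, 3],
               [2, 4, -2, 1],
               [3, 7, -8, 6]])
b = sp.Matrix([3, -1, 3])

# b lies in the column space of A iff appending b does not raise the rank.
print(A.rank())              # 3
print(A.row_join(b).rank())  # 3, so Ax = b is consistent and b is in Col(A)
```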

Recall that two matrices are row-equivalent when one can be obtained from the other by performing
finitely many elementary row operations. One can see that row-equivalent matrices have the same
row space.

We want to find a basis for the row space of A. If A is in row echelon form, then its non-zero row
vectors are linearly independent, and hence form a basis for the row space of A. In general, we
have the following useful result.

Theorem 75. If a matrix A is row-equivalent to a matrix B in row echelon form, then the
non-zero row vectors of B form a basis for the row space of A.

Observe that the column vectors of A are the row vectors of AT . Hence, to find a basis for the
column space of A, we find a basis for the row space of AT as above.

In fact, we have the following:

Theorem 76. The row rank and column rank of any matrix are equal.

Thus we may define the notion of rank of a matrix as follows:

Definition 77. The dimension of the row (or column) space of a matrix A is called the rank
of A and is denoted by rank(A).

3.5.2 Null space of a matrix

Let A be a matrix in Mm×n (F). Consider the homogeneous system of equations Ax = 0, where
x = [x1 , x2 , . . . , xn ]T is the column vector of unknowns. The set of all solutions of this
homogeneous system forms a subspace of Fn , called the null space of A, which we define below.

Definition 78. The null space of a matrix A ∈ Mm×n (F), denoted by Null(A), is the subset
of Fn consisting of vectors v ∈ Fn such that Av = 0. Using set notation:

Null(A) = {v ∈ Fn | Av = 0}.

This is a subspace of Fn . The dimension of the null space of A is called the nullity of A.

Example 79. Consider the following subset of R4 :

W = {(x1 , x2 , x3 , x4 ) ∈ R4 | 2x1 − 3x2 + x3 − 7x4 = 0}.

Is W a subspace of R4 ?

Yes! The set W is the null space of the 1 × 4 matrix A given by

A = \begin{bmatrix} 2 & -3 & 1 & -7 \end{bmatrix}.

Hence, W = Null(A) and consequently W is a subspace.

Recall the linear transformation LA : Fn → Fm of Example 60. We see that v is in the kernel
of LA if and only if LA (v) = Av = 0. In other words, ker(LA ) = Null(A). Also, note that the
range of LA is the column space of A. Hence, by the rank-nullity theorem applied to the linear
transformation LA , we have the following:

Theorem 80. For a matrix A ∈ Mm×n (F), rank(A) + nullity(A) = n.

By the above result, if the row reduced echelon form of A has r leading 1’s, that is, the rank of A
is r, then the dimension of the null space is d = n − r. Therefore, we will obtain vectors v1 , . . . , vd
such that
Null(A) = span{v1 , v2 , . . . , vd }.

Example 81. Find a spanning set for the null space of the matrix

A = \begin{bmatrix} -3 & 6 & -1 & 1 & -7 \\ 1 & -2 & 2 & 3 & -1 \\ 2 & -4 & 5 & 8 & -4 \end{bmatrix} ∈ M3×5 (R).

The null space of A is the solution set of the homogeneous system Ax = 0. Performing elementary
row operations one obtains

A \sim \begin{bmatrix} 1 & -2 & 0 & -1 & 3 \\ 0 & 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

Clearly r = 2 and since n = 5, we will have d = 3 vectors in a basis for Null(A). Letting x5 = t1 ,
and x4 = t2 , then from the 2nd row we obtain

x3 = −2t2 + 2t1 .

Letting x2 = t3 , then from the 1st row we obtain

x1 = 2t3 + t2 − 3t1 .

Writing the general solution in parametric vector form, we obtain

x = t_1 \begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix} + t_2 \begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix} + t_3 \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.

Therefore,

Null(A) = span\left\{ v_1 = \begin{bmatrix} -3 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix},\; v_2 = \begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \\ 0 \end{bmatrix},\; v_3 = \begin{bmatrix} 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \right\}.

You can verify that Av1 = Av2 = Av3 = 0.
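This verification, together with Theorem 80, can be scripted; a sympy sketch:

```python
import sympy as sp

A = sp.Matrix([[-3, 6, -1, 1, -7],
               [1, -2, 2, 3, -1],
               [2, -4, 5, 8, -4]])

v1 = sp.Matrix([-3, 0, 2, 0, 1])
v2 = sp.Matrix([1, 0, -2, 1, 0])
v3 = sp.Matrix([2, 1, 0, 0, 0])

# Each spanning vector is mapped to the zero vector.
print(A * v1, A * v2, A * v3)

# Rank-nullity (Theorem 80): rank(A) + nullity(A) = n = 5.
print(A.rank(), len(A.nullspace()))  # 2 and 3
```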

3.6 The matrix representation of a linear transformation

Until now, we have studied linear transformations by examining their ranges and kernels. Now,
we embark on one of the most useful approaches to the analysis of a linear transformation on a
finite-dimensional vector space: the representation of a linear transformation by a matrix.

3.6.1 Coordinates

Let V be a finite-dimensional vector space over F. Recall that a basis of V is a set of vectors
B = {v1 , v2 , . . . , vn } in V such that

(i) the set B spans all of V , that is, V = span(B), and

(ii) the set B is linearly independent.

Hence, if B is a basis for V , each vector x ∈ V can be written as a linear combination of vectors
in B:
x = c1 v 1 + c2 v 2 + · · · + cn v n .

Moreover, from the definition of linear independence, any vector x ∈ span(B) can be written in
only one way as a linear combination of v1 , . . . , vn . In other words, for the x above, there do
not exist other scalars t1 , . . . , tn such that also

x = t1 v1 + t2 v2 + · · · + tn vn .

To see this, suppose that we can write x in two different ways using B:

x = c1 v 1 + c2 v 2 + · · · + cn v n ,

x = t1 v1 + t2 v2 + · · · + tn vn .

Then
0 = x − x = (c1 − t1 )v1 + (c2 − t2 )v2 + · · · + (cn − tn )vn .

Since B = {v1 , . . . , vn } is linearly independent, the only linear combination of v1 , . . . , vn that


gives the zero vector 0 is the trivial linear combination. Therefore, it must be the case that ci − ti =
0, or equivalently that ci = ti for all i = 1, 2, . . . , n. Thus, there is only one way to write x in
terms of the vectors in B = {v1 , . . . , vn }. Hence, relative to the basis B = {v1 , v2 , . . . , vn }, the
scalars c1 , c2 , . . . , cn uniquely determine the vector x, and vice versa.

We also need the concept of an ordered basis for a vector space.

Definition 82. Let V be a finite-dimensional vector space over F. An ordered basis for V
is a basis for V endowed with a specific order; that is, an ordered basis for V is a finite
sequence of linearly independent vectors in V that generates V .

Example 83. In R2 , we have the ordered basis

B = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}.

A different ordered basis is

B' = \left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}.

Our preceding discussion on the unique representation property of vectors in a given basis leads to
the following definition.

Definition 84. Let V be a finite-dimensional vector space over F. Let B = {v1 , . . . , vn } be
an ordered basis for V . For x ∈ V , let c1 , c2 , . . . , cn be the unique scalars such that

x = c1 v 1 + c2 v 2 + · · · + cn v n .

We define the coordinates of x relative to the ordered basis B to be the unique scalars
c1 , c2 , . . . , cn . The coordinate vector of x relative to the ordered basis B, denoted by [x]B ,
is defined to be

[x]_B = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \in F^n .
If it is clear what ordered basis we are working with, we will omit the subscript B and simply
write [x] for the coordinates of x relative to B.

Example 85. One can verify that

B = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}

is an ordered basis for R2 . Find the coordinate vector of v = \begin{bmatrix} 3 \\ 1 \end{bmatrix} relative to the ordered basis B.

Let v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} and v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. By definition, the coordinates of v with respect to B are the scalars c1 , c2 such that

v = c_1 v_1 + c_2 v_2 = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.
If we put P = [v1 v2 ] and let [v]_B = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, then we need to solve the linear system

v = P [v]_B .

Solving the linear system, one finds that the solution is [v]_B = \begin{bmatrix} 2 \\ -1 \end{bmatrix}, and therefore this is the
coordinate vector of v relative to the ordered basis B.
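Numerically, finding [v]B is one call to a linear solver; a numpy sketch:

```python
import numpy as np

P = np.array([[1.0, -1.0],
              [1.0, 1.0]])    # columns are v1 and v2
v = np.array([3.0, 1.0])

print(np.linalg.solve(P, v))  # [ 2. -1.] = [v]_B
```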

It is clear how the procedure of the previous example can be generalized. Let B = {v1 , v2 , . . . , vn }
be an ordered basis for Rn and let v be any vector in Rn . Put P = [v1 v2 · · · vn ]. Then the
B-coordinate vector of v is the unique column vector [v]B solving the linear system

Px = v

that is, x = [v]B is the unique solution to P x = v. Because v1 , v2 , . . . , vn are linearly indepen-
dent, the solution to P x = v is
[v]B = P −1 v.
We remark that if an inconsistent row arises when you row reduce [P | v], then you have made an
error in your row reduction algorithm. In summary, to find coordinates with respect to an ordered
basis B in Rn , we need to solve a square linear system.

Example 86. Let

v_1 = \begin{bmatrix} 3 \\ 6 \\ 2 \end{bmatrix}, \qquad v_2 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad x = \begin{bmatrix} 3 \\ 12 \\ 7 \end{bmatrix},
and let B = {v1 , v2 }. One can show that B is linearly independent and therefore an ordered basis
for W = span{v1 , v2 }. Determine if x is in W , and if so, find the coordinate vector of x relative
to B.

By definition, x is in W = span{v1 , v2 } if we can write x as a linear combination of v1 , v2 :

x = c1 v 1 + c2 v 2 .

Form the associated augmented matrix and row reduce:

\begin{bmatrix} 3 & -1 & 3 \\ 6 & 0 & 12 \\ 2 & 1 & 7 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}.

The system is consistent with solution c1 = 2 and c2 = 3. Therefore, x is in W , and the B-
coordinates of x are

[x]_B = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.

Example 87. What are the coordinates of

v = \begin{bmatrix} 3 \\ 11 \\ -7 \end{bmatrix}

in the standard ordered basis E = {e1 , e2 , e3 }?

Clearly,

v = \begin{bmatrix} 3 \\ 11 \\ -7 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 11 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - 7 \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
Therefore, the coordinate vector of v relative to {e1 , e2 , e3 } is

[v]_E = \begin{bmatrix} 3 \\ 11 \\ -7 \end{bmatrix}.

Example 88. Let R3 [t] be the vector space of polynomials of degree at most 3.

(i) Show that B = {1, t, t2 , t3 } is an ordered basis for R3 [t].

(ii) Find the coordinates of v(t) = 3 − t2 − 7t3 relative to B.

The set B = {1, t, t2 , t3 } is a spanning set for R3 [t]. In fact, any polynomial u(t) = c0 + c1 t +
c2 t2 + c3 t3 is clearly a linear combination of 1, t, t2 , t3 . Is B linearly independent? Suppose that

there exist scalars c0 , c1 , c2 , c3 such that

c0 + c1 t + c2 t2 + c3 t3 = 0.

Since the above equality must hold for all values of t, we conclude that c0 = c1 = c2 = c3 = 0.
Therefore, B is linearly independent, and consequently an ordered basis for R3 [t]. In the ordered
basis B, the coordinates of v(t) = 3 − t2 − 7t3 are

[v(t)]_B = \begin{bmatrix} 3 \\ 0 \\ -1 \\ -7 \end{bmatrix}.

The ordered basis B = {1, t, t2 , t3 } is called the standard ordered basis of R3 [t].
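In the standard ordered basis, finding coordinates amounts to reading off coefficients; a sympy sketch:

```python
import sympy as sp

t = sp.symbols('t')
v = 3 - t**2 - 7*t**3

# Coefficients in the standard ordered basis {1, t, t^2, t^3}.
print([v.coeff(t, k) for k in range(4)])  # [3, 0, -1, -7]
```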

Example 89. Show that

B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}

is an ordered basis for M2×2 (R). Find the coordinate vector of A = \begin{bmatrix} 3 & 0 \\ -4 & -1 \end{bmatrix} relative to B.

Any matrix M = \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} can be written as a linear combination of the matrices in B:

\begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} = m_{11}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + m_{12}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + m_{21}\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + m_{22}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.

If

c_1\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + c_2\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c_3\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} + c_4\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} c_1 & c_2 \\ c_3 & c_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
then clearly c1 = c2 = c3 = c4 = 0. Therefore, B is linearly independent, and consequently an

ordered basis for M2×2 (R). The coordinate vector of A relative to the ordered basis B is given by

[A]_B = \begin{bmatrix} 3 \\ 0 \\ -4 \\ -1 \end{bmatrix}.

The basis B above is the standard ordered basis of M2×2 .

3.6.2 Coordinate transformations

Let B = {v1 , v2 , . . . , vn } be an ordered basis of Rn and let P = [v1 v2 · · · vn ] ∈ Mn×n (R). If
x ∈ Rn and [x]B is the coordinate vector of x relative to B, then

x = P [x]B . (4)

Hence, thinking of P : Rn → Rn as a linear transformation, P maps B-coordinate vectors to
coordinate vectors relative to the standard ordered basis of Rn . For this reason, we call P the
change-of-coordinates matrix from the ordered basis B to the standard ordered basis in Rn . If we
need to emphasise that P is constructed from the ordered basis B we will write PB instead of just
P . Multiplying equation (4) by P −1 we obtain

P −1 x = [x]B .

Therefore, P −1 maps coordinate vectors in the standard ordered basis to coordinate vectors relative
to B.

Example 90. The columns of the matrix P form a basis B for R3 :

P = \begin{bmatrix} 1 & 3 & 3 \\ -1 & -4 & -2 \\ 0 & 0 & -1 \end{bmatrix}.

(a) What vector x ∈ R3 has B-coordinate vector [x]B = (1, 0, −1)?

(b) Find the B-coordinate vector of v = (2, −1, 0).



The matrix P maps B-coordinate vectors to standard coordinate vectors in R3 . Therefore,

x = P [x]_B = \begin{bmatrix} -2 \\ 1 \\ 1 \end{bmatrix}.
On the other hand, the inverse matrix P −1 maps standard coordinates in R3 to B-coordinates. One
can verify that

P^{-1} = \begin{bmatrix} 4 & 3 & 6 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix}.
Therefore, the B-coordinate vector of v is

[v]_B = P^{-1} v = \begin{bmatrix} 4 & 3 & 6 \\ -1 & -1 & -1 \\ 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \\ 0 \end{bmatrix}.
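Both directions of the coordinate change can be checked numerically; a numpy sketch (np.linalg.solve is preferred over forming the inverse explicitly):

```python
import numpy as np

P = np.array([[1.0, 3.0, 3.0],
              [-1.0, -4.0, -2.0],
              [0.0, 0.0, -1.0]])

# (a) From B-coordinates to standard coordinates: x = P [x]_B.
x_B = np.array([1.0, 0.0, -1.0])
print(P @ x_B)                 # [-2.  1.  1.]

# (b) From standard coordinates to B-coordinates: solve P [v]_B = v.
v = np.array([2.0, -1.0, 0.0])
print(np.linalg.solve(P, v))   # [ 5. -1.  0.]
```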

In general, if V is an n-dimensional vector space over F and B = {v1 , v2 , . . . , vn } is an ordered
basis for V , we define the coordinate mapping P : V → Fn relative to B as the mapping

P (v) = [v]B .

Example 91. Let V = M2×2 (R) and let B = {A1 , A2 , A3 , A4 } be the standard ordered basis for
M2×2 (R). What is P : V → R4 ?

Recall, (" # " # " # " #)


1 0 0 1 0 0 0 0
B = {A1 , A2 , A3 , A4 } = , , ,
0 0 0 0 1 0 0 1
" #
a11 a12
Then for any A = , we have
a21 a22
 
" #! a11 
a11 a12 a12 
P =
a  .

a21 a22  21 
a22

3.6.3 The matrix of a linear transformation

Let V and W be finite-dimensional vector spaces over F and let T : V → W be a linear transfor-
mation. Then by definition, T (v + u) = T (v) + T (u) and T (αv) = αT (v) for every v, u ∈ V
and α ∈ F. Let β = {v1 , v2 , . . . , vn } be an ordered basis of V and let γ = {w1 , w2 , . . . , wm } be
an ordered basis of W . Then for any v ∈ V , there exist scalars c1 , c2 , . . . , cn such that

v = c1 v 1 + c2 v 2 + · · · + cn v n
 
and thus [v]_β = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} is the coordinate vector of v relative to the ordered basis β. By linearity of
the mapping T , we have

T (v) = T (c1 v1 + c2 v2 + · · · + cn vn ) = c1 T (v1 ) + c2 T (v2 ) + · · · + cn T (vn ).

Now, each vector T (vj ) is in W , and therefore, because γ is a basis of W , there are scalars
a1,j , a2,j , . . . , am,j such that

T (vj ) = a1,j w1 + a2,j w2 + · · · + am,j wm .

In other words,

[T(v_j)]_γ = \begin{bmatrix} a_{1,j} \\ a_{2,j} \\ \vdots \\ a_{m,j} \end{bmatrix}.
Substituting T (vj ) = a1,j w1 + a2,j w2 + · · · + am,j wm for each j = 1, 2, . . . , n into

T (v) = c1 T (v1 ) + c2 T (v2 ) + · · · + cn T (vn )

and then simplifying, we get

T(v) = \sum_{i=1}^{m} \sum_{j=1}^{n} c_j a_{i,j} w_i .

Therefore,

[T (v)]_γ = A [v]_β ,

where A is the m × n matrix with entries in F given by

A = \begin{bmatrix} [T(v_1)]_γ & [T(v_2)]_γ & \cdots & [T(v_n)]_γ \end{bmatrix}.

The matrix A is the matrix representation of the linear transformation T in the ordered bases β and
γ.

Example 92. Consider the vector space V = R2 [t] of polynomials of degree no more than two and
let T : V → V be defined by
T (v(t)) = 4v′ (t) − 2v(t).

It is straightforward to verify that T is a linear mapping. Let

B = {v1 , v2 , v3 } = {t − 1, 3 + 2t, t2 + 1}.

(i) Verify that B is a basis of V .

(ii) Find the coordinates of v(t) = −t2 + 3t + 1 in the basis B.

(iii) Find the matrix representation of T in the basis B.

(i) Suppose that there are scalars c1 , c2 , c3 such that

c1 v 1 + c2 v 2 + c3 v 3 = 0

Then expanding and collecting like terms, we obtain

c3 t2 + (c1 + 2c2 )t + (−c1 + 3c2 + c3 ) = 0

Since the above holds for all t ∈ R, we must have

c3 = 0, c1 + 2c2 = 0, −c1 + 3c2 + c3 = 0

Solving for c1 , c2 , c3 , we obtain c1 = 0, c2 = 0, c3 = 0. Hence, the only linear combination of
the vectors in B that produces the zero vector is the trivial linear combination. This proves by

definition that B is linearly independent. Since we already know that dim(V ) = 3 and B contains
3 vectors, then B is a basis for V .

(ii) The coordinates of v(t) = −t2 + 3t + 1 are the unique scalars (c1 , c2 , c3 ) such that

c1 v 1 + c2 v 2 + c3 v 3 = v

In this case, the linear system is

c3 = −1, c1 + 2c2 = 3, −c1 + 3c2 + c3 = 1

and solving yields c1 = 1, c2 = 1, and c3 = −1. Hence, the coordinate vector of v relative to the
ordered basis B is given by

[v]_B = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.

(iii) The matrix representation A of T in the ordered basis B is

A = \begin{bmatrix} [T(v_1)]_B & [T(v_2)]_B & [T(v_3)]_B \end{bmatrix}.

Now we compute directly that

T (v1 ) = −2t + 6, T (v2 ) = −4t + 2, T (v3 ) = −2t2 + 8t − 2

and then one computes that

[T(v_1)]_B = \begin{bmatrix} -18/5 \\ 4/5 \\ 0 \end{bmatrix}, \qquad [T(v_2)]_B = \begin{bmatrix} -16/5 \\ -2/5 \\ 0 \end{bmatrix}, \qquad [T(v_3)]_B = \begin{bmatrix} 24/5 \\ 8/5 \\ -2 \end{bmatrix}.

And therefore,

A = \begin{bmatrix} -18/5 & -16/5 & 24/5 \\ 4/5 & -2/5 & 8/5 \\ 0 & 0 & -2 \end{bmatrix}.
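The whole of Example 92 can be automated; the sketch below, assuming sympy (the helper functions T and coords are ours), rebuilds the matrix representation column by column:

```python
import sympy as sp

t = sp.symbols('t')
basis = [t - 1, 3 + 2*t, t**2 + 1]

def T(v):
    """T(v) = 4 v' - 2 v from Example 92."""
    return sp.expand(4 * sp.diff(v, t) - 2 * v)

def coords(p):
    """Coordinates of a polynomial p (degree <= 2) in the ordered basis above."""
    c = sp.symbols('c0:3')
    combo = sum(ci * bi for ci, bi in zip(c, basis))
    eqs = sp.Poly(combo - p, t).all_coeffs()  # each coefficient must vanish
    sol = sp.solve(eqs, c)
    return sp.Matrix([sol[ci] for ci in c])

A = sp.Matrix.hstack(*(coords(T(b)) for b in basis))
print(A)  # Matrix([[-18/5, -16/5, 24/5], [4/5, -2/5, 8/5], [0, 0, -2]])
```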
