
Linear Algebra


Analysis Series

M. Winklmeier

Chigüiro Collection
Work in progress. Use at your own risk.

Contents

1 Introduction 7
1.1 Examples of systems of linear equations; coefficient matrices . . . . . . . . . . . . . . 8
1.2 Linear 2 × 2 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2 R2 and R3 25
2.1 Vectors in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Inner product in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3 Orthogonal Projections in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4 Vectors in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5 Vectors in R3 and the cross product . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.6 Lines and planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.7 Intersections of lines and planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3 Linear Systems and Matrices 69


3.1 Linear systems and Gauß and Gauß-Jordan elimination . . . . . . . . . . . . . . . . 69
3.2 Homogeneous linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
3.3 Matrices and linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
3.4 Matrices as functions from Rn to Rm ; composition of matrices . . . . . . . . . . . . 85
3.5 Inverses of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

3.6 Matrices and linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99


3.7 The transpose of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
3.8 Elementary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

4 Determinants 121
4.1 Determinant of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2 Properties of the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.3 Geometric interpretation of the determinant . . . . . . . . . . . . . . . . . . . . . . . 133
4.4 Inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141


5 Vector spaces 143


5.1 Definitions and basic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
5.2 Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
5.3 Linear combinations and linear independence . . . . . . . . . . . . . . . . . . . . . . 156
5.4 Basis and dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179

6 Linear transformations and change of bases 187


6.1 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.2 Matrices as linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
6.3 Change of bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.4 Linear maps and their matrix representations . . . . . . . . . . . . . . . . . . . . . . 214
6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

7 Orthonormal bases and orthogonal projections in Rn 233

7.1 Orthonormal systems and orthogonal bases . . . . . . . . . . . . . . . . . . . . . . . 233
7.2 Orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.3 Orthogonal complements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.4 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.5 The Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
7.6 Application: Least squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
8 Symmetric matrices and diagonalisation 269
8.1 Complex vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.2 Similar matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276
8.3 Eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
8.4 Properties of the eigenvalues and eigenvectors . . . . . . . . . . . . . . . . . . . . . . 288
8.5 Symmetric and Hermitian matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
8.6 Application: Conic Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
8.6.1 Solutions of ax² + bxy + cy² = d as conic sections . . . . . . . . . . . . . . . 309

8.6.2 Solutions of ax² + bxy + cy² + rx + sy = d . . . . . . . . . . . . . . . . . . . 311


8.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
8.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

A Complex Numbers 319

Index 325


Chapter 1

Introduction

This chapter serves as an introduction to the main themes of linear algebra, namely the problem of solving systems of linear equations for several unknowns. We are not only interested in an efficient way to find their solutions, but we also wish to understand how the solutions could possibly look and what we can say about their structure. For the latter, it will be crucial to find a geometric interpretation of systems of linear equations. In this chapter we will use the “solve and insert” strategy for solving linear systems. A systematic and efficient formalism will be given in Chapter 3. Everything we discuss in this chapter will appear again later on, so you may read it quickly or even skip (parts of) it.
A linear system is a set of equations for a number of unknowns which have to be satisfied simultaneously and where the unknowns appear only linearly. If the number of equations is m and the number of unknowns is n, then we call it an m × n linear system. Typically the unknowns are called x, y, z or x1, x2, . . . , xn. The following is an example of a linear system of 3 equations for 5 unknowns:

x1 + x2 + x3 + x4 + x5 = 3, 2x1 + 3x2 − 5x3 + x4 = 1, 3x1 − 8x5 = 0.

An example of a non-linear system is

x1 x2 + x3 + x4 + x5 = 3, 2x1 + 3x2 − 5x3 + x4 = 1, 3x1 − 8x5 = 0

because in the first equation we have a product of two of the unknowns. Also expressions like x², ∛x, xyz, x/y or sin x make a system non-linear.
Now let us briefly discuss the simplest non-trivial case: A system consisting of one linear equation
for one unknown x. Its most general form is

ax = b (1.1)

where a and b are given constants and we want to find all x ∈ R which satisfy (1.1). Clearly, the
solution to this problem depends on the coefficients a and b. We have to distinguish several cases.
Case 1. a ≠ 0. In this case, there is only one solution, namely x = b/a.
Case 2. a = 0, b ≠ 0. In this case, there is no solution because whatever value we choose for x,
the left hand side ax will always be zero and therefore cannot be equal to b.


Case 3. a = 0, b = 0. In this case, there are infinitely many solutions. In fact, every x ∈ R solves
the equation.
So we see that already in this simple case we have three very different types of solution of the
system (1.1): no solution, exactly one solution or infinitely many solutions.
Now let us look at a system of one linear equation for two unknowns x, y. Its most general form is

ax + by = c. (1.1’)

Here, a, b, c are given constants and we want to find all pairs x, y so that the equation is satisfied. For example, if a = b = 0 and c ≠ 0, then the system has no solution, whereas if for example a ≠ 0, then there are infinitely many solutions because no matter how we choose y, we can always satisfy the system by taking x = (c − by)/a.

Question 1.1
Is it possible that the system has exactly one solution?
(Come back to this question again after you have studied Chapter 3.)

The general form of a system of two linear equations for one unknown is

a1 x = b1 ,
a2 x = b2

and that of a system of two linear equations for two unknowns is

a11 x + a12 y = c1 ,
a21 x + a22 y = c2

where a1 , a2 , b1 , b2 , respectively a11 , a12 , a21 , a22 , c1 , c2 are constants and x, respectively x, y are the unknowns.
Question 1.2
Can you find examples for the coefficients such that the systems have

(i) no solution, (iii) exactly two solutions,


(ii) exactly one solution, (iv) infinitely many solutions?
D

Can you maybe even give a general rule for which behaviour occurs in which case?
(Come back to this question again after you have studied Chapter 3.)

Before we discuss general linear systems, we will discuss in this introductory chapter the special case of a system of two linear equations with two unknowns. Although this is a very special type of system, it exhibits many properties of general linear systems, and such systems appear very often in problems.

1.1 Examples of systems of linear equations; coefficient matrices
Let us start with a few examples of systems of linear equations.


Example 1.1. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. How many motorcycles and cars are in the shop?

Solution. First, we give names to the quantities we want to calculate. So let M = number of motorcycles, C = number of cars in the dealership. If we write the information given in the exercise
in formulas, we obtain

1 M + C = 25, (total number of vehicles)


2 2M + 4C = 80, (total number of wheels)

since we assume that every motorcycle has 2 wheels and every car has 4 wheels. Equation 1 tells
us that M = 25 − C. If we insert this into equation 2 , we find

80 = 2(25 − C) + 4C = 50 − 2C + 4C = 50 + 2C =⇒ 2C = 30 =⇒ C = 15.

This implies that M = 25 − C = 25 − 15 = 10. Note that in our calculations and arguments, all the implication arrows go “from left to right”, so what we can conclude at this instance is that the
system has only one possible candidate for a solution and this candidate is M = 10, C = 15. We
have not (yet) shown that it really is a solution. However, inserting these numbers in the original
equation we see easily that our candidate is indeed a solution.
So the answer is: There are 10 motorcycles and 15 cars (and there is no other possibility). 
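The solution can also be cross-checked numerically. The following is a small sketch (not part of the original notes; it assumes Python with NumPy is available) that solves the 2 × 2 system of Example 1.1:

import numpy as np

# Coefficient matrix and right hand side of Example 1.1:
#    M +  C = 25   (total number of vehicles)
#   2M + 4C = 80   (total number of wheels)
A = np.array([[1.0, 1.0],
              [2.0, 4.0]])
b = np.array([25.0, 80.0])

M, C = np.linalg.solve(A, b)   # works because A is invertible
print(M, C)                    # expected: 10.0 15.0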

Let us put one more equation into the system.


Example 1.2. Assume that a car dealership sells motorcycles and cars. Altogether they have 25 vehicles in their shop with a total of 80 wheels. Moreover, the shop arranges them in 7 distinct areas of the shop so that in each area there are either 3 cars or 5 motorcycles. How many motorcycles and cars are in the shop?

Solution. Again, let M = number of motorcycles, C = number of cars. The information of the exercise leads to the following system of equations:

1   M +  C = 25,        (total number of vehicles)
2  2M + 4C = 80,        (total number of wheels)
3  M/5 + C/3 = 7.       (total number of areas)

As in the previous exercise, we obtain from 1 and 2 that M = 10, C = 15. Clearly, this also
satisfies equation 3 . So again the answer is: There are 10 motorcycles and 15 cars (and there is
no other possibility). 

Example 1.3. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. Moreover, the shop arranges them in 5 distinct areas
of the shop so that in each area there are either 3 cars or 5 motorcycles. How many motorcycles
and cars are in the shop?


Solution. Again, let M = number of motorcycles, C = number of cars. The information of the
exercise gives the following equations:

1 M+ C = 25, (total number of vehicles)


2 2M + 4C = 80, (total number of wheels)
3 M/5 + C/3 = 5. (total number of areas)

As in the previous exercise, we obtain that M = 10, C = 15 using only equations 1 and 2 .
However, this does not satisfy equation 3 ; so there is no way to choose M and C such that all
three equations are satisfied simultaneously. Therefore, a shop as in this example does not exist. 

Example 1.4. Assume that a zoo has birds and cats. The total count of legs of the animals is 60.
Feeding a bird takes 5 minutes, feeding a cat takes 10 minutes. The total time to feed the animals
is 150 minutes. How many birds and cats are in the zoo?

Solution. Let B = number of birds, C = number of cats in the zoo. The information of the exercise gives the following equations:

1 2B + 4C = 60, (total number of legs)


2 5B + 10C = 150, (total time for feeding)

The first equation gives B = 30 − 2C. Inserting this into the second equation, gives

150 = 5(30 − 2C) + 10C = 150 − 10C + 10C = 150


which is always true, independently of the choice of B and C. Indeed, for instance B = 10, C = 10
or B = 14, C = 8, or B = 0, C = 15 are solutions. We conclude that the information given in the
exercise is not sufficient to calculate the number of animals in the zoo. 

Remark. The reason for this is that both equations 1 and 2 are basically the same equation.
If we divide the first one by 2 and the second one by 5, then we end up in both cases with the
equation B + 2C = 30, so both equations contain exactly the same information.

Algebraically, the linear system has infinitely many solutions. But our variables represent animals and they only come in nonnegative integer quantities, so we have the 16 different solutions B = 30 − 2C where C ∈ {0, 1, . . . , 15}.
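As a quick illustration of the last point, one can enumerate these finitely many solutions directly; the snippet below is a hypothetical check in plain Python, not part of the original text:

# All nonnegative integer solutions of B + 2C = 30 (birds and cats, Example 1.4).
solutions = [(30 - 2 * C, C) for C in range(16)]
print(len(solutions))   # 16 solutions
print(solutions[:3])    # [(30, 0), (28, 1), (26, 2)]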

We give a few more examples.

Example 1.5. Find a polynomial P of degree at most 3 with

P(0) = 1,  P(1) = 7,  P′(0) = 3,  P′(2) = 23.                      (1.2)

Solution. A polynomial of degree at most 3 is known if we know its 4 coefficients. In this exercise, the unknowns are the coefficients of the polynomial P. If we write P(x) = αx³ + βx² + γx + δ, then we have to find α, β, γ, δ such that (1.2) is satisfied. Note that P′(x) = 3αx² + 2βx + γ. Hence


(1.2) is equivalent to the following system of equations:

P(0) = 1             1   δ = 1,
P(1) = 7             2   α + β + γ + δ = 7,
P′(0) = 3     ⟺      3   γ = 3,
P′(2) = 23           4   12α + 4β + γ = 23.

Clearly, δ = 1 and γ = 3. If we insert this in the remaining equations, we obtain a system of two equations for the two unknowns α, β:

2’ α + β = 3,
4’ 12α + 4β = 20.

From 2’ we obtain β = 3 − α. If we insert this into 4’ , we get that 20 = 12α + 4(3 − α) = 8α + 12, that is, α = (20 − 12)/8 = 1. So the only possible solution is

α = 1, β = 2, γ = 3, δ = 1.

It is easy to verify that the polynomial P(x) = x³ + 2x² + 3x + 1 has all the desired properties. 
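For readers who want to verify this with a computer, a minimal sketch (assuming NumPy, not part of the original notes) sets up the 4 × 4 system for the coefficients and solves it:

import numpy as np

# Unknowns (alpha, beta, gamma, delta) for P(x) = alpha*x^3 + beta*x^2 + gamma*x + delta.
# The rows encode P(0)=1, P(1)=7, P'(0)=3, P'(2)=23.
A = np.array([[0, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 0, 1, 0],
              [12, 4, 1, 0]], dtype=float)
b = np.array([1, 7, 3, 23], dtype=float)

alpha, beta, gamma, delta = np.linalg.solve(A, b)
print(alpha, beta, gamma, delta)   # expected: 1.0 2.0 3.0 1.0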

Example 1.6. A pole is 5 metres long and shall be coated with varnish. There are two types of
varnish available: The blue one adds 3 g per 50 cm to the pole, the red one adds 6 g per meter to
the pole. Is it possible to coat the pole in a combination of the varnishes so that the total weight
added is
(a) 35 g? (b) 30 g?
Solution. (a) We denote by b the length of the pole which will be covered in blue and r the length
of the pole which will be covered in red. Then we obtain the system of equations
1 b+ r = 5 (total length)
2 6b + 6r = 35 (total weight)
The first equation gives r = 5 − b. Inserting into the second equation yields 35 = 6b + 6(5 − b) = 30
which is a contradiction. This shows that there is no solution.
(b) As in (a), we obtain the system of equations

1 b+ r = 5 (total length)
2 6b + 6r = 30 (total weight)
Again, the first equation gives r = 5−b. Inserting into the second equation yields 30 = 6b+6(5−b) =
30 which is always true, independently of how we choose b and r as long as 1 is satisfied. This
means that in order to solve the system of equations, it is sufficient to solve only the first equation
since then the second one is automatically satisfied. So we have infinitely many solutions. Any pair
b, r such that b + r = 5 gives a solution. So for any b that we choose, we only have to set r = 5 − b
and we have a solution of the problem. Of course, we could also fix r and then choose b = 5 − r to
obtain a solution.
For example, we could choose b = 1, then r = 4, or b = 0.00001, then r = 4.99999, or r = −2 then
b = 7. Clearly, the last example does not make sense for the problem at hand, but it still does
satisfy our system of equations. 


Example 1.7. When octane reacts with oxygen, the result is carbon dioxide and water. Find the equation for this reaction.

Solution. The chemical formulas for the substances are C8 H18 , O2 , CO2 and H2 O. Hence the
reaction equation is
a C8 H18 + b O2 −→ c CO2 + d H2 O
with unknown integers a, b, c, d. Clearly the solution will not be unique since if we have one set of numbers a, b, c, d which works and we multiply all of them by the same number, then we obtain another solution. Let us write down the system of equations. To this end we note that the number
of atoms of each element has to be equal on both sides of the equation. We obtain:
1 8a = c (carbon)
2 18a = 2d (hydrogen)
3 2b = 2c + d (oxygen)
or, if we put all the variables on the left hand side,

1   8a − c = 0,
2  18a − 2d = 0,
3  2b − 2c − d = 0.
Let us express all the unknowns in terms of a: 1 and 2 show that c = 8a and d = 9a. Inserting this in 3 we obtain 0 = 2b − 2 · 8a − 9a = 2b − 25a, hence b = (25/2)a. If we want all coefficients to be integers, we can choose a = 2, b = 25, c = 16, d = 18 and the reaction equation becomes
2 C8 H18 + 25 O2 −→ 16 CO2 + 18 H2 O . 
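The balancing can also be done mechanically: fix one unknown and solve the remaining square system. The following sketch (assuming NumPy, not part of the original notes; the choice a = 2 is the same normalisation as above) is one way to do it:

import numpy as np

# Homogeneous system for a C8H18 + b O2 -> c CO2 + d H2O, unknowns (a, b, c, d):
#   8a      -  c       = 0   (carbon)
#  18a           - 2d  = 0   (hydrogen)
#       2b - 2c -  d   = 0   (oxygen)
A = np.array([[8, 0, -1, 0],
              [18, 0, 0, -2],
              [0, 2, -2, -1]], dtype=float)

# Fix a = 2 and move the a-column to the right hand side, then solve for (b, c, d).
rhs = -2 * A[:, 0]
b_, c_, d_ = np.linalg.solve(A[:, 1:], rhs)
print(b_, c_, d_)   # expected: 25.0 16.0 18.0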
All the examples we discussed in this section are so-called systems of linear equations. Let us give
a precise definition of what we mean by this.

Definition 1.8 (Linear system). An m × n system of linear equations (or simply a linear system) is a system of m linear equations for n unknowns of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
  ⋮                                                                 (1.3)
am1 x1 + am2 x2 + · · · + amn xn = bm

The unknowns are x1 , . . . , xn while the numbers aij and bi (i = 1, . . . , m, j = 1, . . . , n) are given. The numbers aij are called the coefficients of the linear system and the numbers b1 , . . . , bm are called the right side of the linear system.
A solution of the system (1.3) is a tuple (x1 , . . . , xn ) such that all m equations of (1.3) are satisfied simultaneously. The system (1.3) is called consistent if it has at least one solution. It is called inconsistent if it has no solution.
In the special case when all bi are equal to 0, the system is called a homogeneous system; otherwise it is called inhomogeneous.


Definition 1.9 (Coefficient matrix). The coefficient matrix A of the system is the collection of all coefficients aij in an array as follows:

A = [ a11  a12  . . .  a1n
      a21  a22  . . .  a2n
       ⋮     ⋮           ⋮
      am1  am2  . . .  amn ] .                                      (1.4)

The numbers aij are called the entries or components of the matrix A.
The augmented coefficient matrix of the system is the collection of all coefficients aij and the right hand side; it is denoted by

(A|b) = [ a11  a12  . . .  a1n | b1
          a21  a22  . . .  a2n | b2
           ⋮     ⋮           ⋮ |  ⋮
          am1  am2  . . .  amn | bm ] .                             (1.5)

The coefficient matrix is nothing else than the collection of the coefficients aij ordered in some sort of table or rectangle such that the place of the coefficient aij is in the ith row of the jth column. The augmented coefficient matrix contains additionally the constants from the right hand side.

Important observation. There is a one-to-one correspondence between linear systems and aug-
mented coefficient matrices: Given a linear system, it is easy to write down its augmented coefficient
matrix and vice versa.
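To make this correspondence concrete, here is a small sketch (assuming NumPy, not part of the original notes) that stores the coefficient matrix and right hand side of Example 1.1 and glues them together into the augmented coefficient matrix:

import numpy as np

A = np.array([[1, 1],
              [2, 4]], dtype=float)      # coefficient matrix of Example 1.1
b = np.array([25, 80], dtype=float)      # right hand side

# The augmented coefficient matrix (A|b) is A with b appended as an extra column.
Ab = np.hstack([A, b.reshape(-1, 1)])
print(Ab)
# [[ 1.  1. 25.]
#  [ 2.  4. 80.]]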
Let us write down the coefficient matrices of our examples.
Example 1.1: This is a 2 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4 and right hand side b1 = 25, b2 = 80. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

A = [ 1  1              (A|b) = [ 1  1 | 25
      2  4 ] ,                    2  4 | 80 ] .

Example 1.2: This is a 3 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4, a31 = 1/5, a32 = 1/3, and right hand side b1 = 25, b2 = 80, b3 = 7. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

A = [ 1    1            (A|b) = [ 1    1   | 25
      2    4                      2    4   | 80
      1/5  1/3 ] ,                1/5  1/3 |  7 ] .

Example 1.3: This is a 3 × 2 system with the same coefficients as in Example 1.2 and right hand side b1 = 25, b2 = 80, b3 = 5. The system has no solution. The coefficient matrix is the same as in Example 1.2, the augmented coefficient matrix is

(A|b) = [ 1    1   | 25
          2    4   | 80
          1/5  1/3 |  5 ] .


Example 1.5: This is a 4 × 4 system with coefficients a11 = 0, a12 = 0, a13 = 0, a14 = 1, a21 = 1, a22 = 1, a23 = 1, a24 = 1, a31 = 0, a32 = 0, a33 = 1, a34 = 0, a41 = 12, a42 = 4, a43 = 1, a44 = 0, and right hand side b1 = 1, b2 = 7, b3 = 3, b4 = 23. The system has a unique solution. The coefficient matrix and the augmented coefficient matrix are

A = [  0  0  0  1              (A|b) = [  0  0  0  1 |  1
       1  1  1  1                         1  1  1  1 |  7
       0  0  1  0                         0  0  1  0 |  3
      12  4  1  0 ] ,                    12  4  1  0 | 23 ] .

Example 1.7: This is a 3 × 4 homogeneous system with coefficients a11 = 8, a12 = 0, a13 = −1, a14 = 0, a21 = 18, a22 = 0, a23 = 0, a24 = −2, a31 = 0, a32 = 2, a33 = −2, a34 = −1, and right hand side b1 = b2 = b3 = 0. The system has infinitely many solutions. The coefficient matrix and the augmented coefficient matrix are

A = [  8  0  −1   0            (A|b) = [  8  0  −1   0 | 0
      18  0   0  −2                      18  0   0  −2 | 0
       0  2  −2  −1 ] ,                   0  2  −2  −1 | 0 ] .

We saw that Examples 1.1, 1.2 and 1.5 have unique solutions. In Examples 1.6 (b) and 1.7 the solution is not unique; they even have infinitely many solutions! Examples 1.3 and 1.6 (a) do not admit solutions. So given an m × n system of linear equations, two important questions arise naturally:
• Existence: Does the system have a solution?
• Uniqueness: If the system has a solution, is it unique?

More generally, we would like to be able to say something about the structure of solutions of linear systems. For example, is it possible that there is only one solution? That there are exactly two solutions? That there are infinitely many solutions? That there is no solution? Can we give criteria for existence and/or uniqueness of solutions?

Can we give criteria for existence of infinitely many solutions? Is there an efficient way to calculate all the solutions of a given linear system?
(Spoiler alert: A system of linear equations has either no or exactly one or infinitely many solutions. It is not possible that it has, e.g., exactly 7 solutions. This will be discussed in detail in Chapter 3.)
Before answering these questions for general m × n systems in Chapter 3, we will have a closer look
at the special case of 2 × 2 systems in the next section.

You should now have understood


• what a linear system is,


• what a coefficient matrix and an augmented coefficient matrix are,


• their relation with linear systems,
• that a linear system can have different types of solutions,
• etc.

You should now be able to


• pass easily from a linear m × n system to its (augmented) coefficient matrix and back,
• solve linear systems by the “solve and substitute”-method,
• etc.

1.2 Linear 2 × 2 systems


Let us come back to the equations from Example 1.1. For convenience, we write now x instead of M and y instead of C. Recall that the system of equations that we are interested in solving is

1   x +  y = 25,
2  2x + 4y = 80.                                                    (1.6)

We want to give a geometric meaning to this system of equations. To this end we think of pairs x, y as points (x, y) in the plane. Let us forget about the equation 2 for a moment and concentrate only on 1 . Clearly, it has infinitely many solutions. If we choose an arbitrary x, we can always find y such that 1 is satisfied (just take y = 25 − x). Similarly, if we choose any y, then we only have to take x = 25 − y and we obtain a solution of 1 .
Where in the xy-plane lie all solutions of 1 ? Clearly, 1 is equivalent to y = 25 − x which we easily identify as the equation of the line L1 in the xy-plane which passes through (0, 25) and has slope −1. In summary, a pair (x, y) is a solution of 1 if and only if it lies on the line L1 , see Figure 1.1.
If we apply the same reasoning to 2 , we find that a pair (x, y) satisfies 2 if and only if (x, y) lies on the line L2 in the xy-plane given by y = (80 − 2x)/4 (this is the line in the xy-plane passing through (0, 20) with slope −1/2).
D

Now it is clear that a pair (x, y) satisfies both 1 and 2 if and only if it lies on both lines L1 and
L2 . So finding the solution of our system (1.6) is the same as finding the intersection of the two
lines L1 and L2 . From elementary geometry we know that there are exactly three possibilities for
their intersection:

(i) L1 and L2 are not parallel. Then they intersect in exactly one point.
(ii) L1 and L2 are parallel and not equal. Then they do not intersect.
(iii) L1 and L2 are parallel and equal. Then L1 = L2 and they intersect in infinitely many points
(they intersect in every point of L1 = L2 ).

In our example we know that the slope of L1 is −1 and that the slope of L2 is −1/2, so they are not parallel and therefore intersect in exactly one point. Consequently, the system (1.6) has exactly one solution.


Figure 1.1: Graphs of the lines L1 , L2 which represent the equations from the system (1.6) (see also Example 1.1). Their intersection represents the unique solution of the system.

If we look again at Example 1.6, we see that in Case (a) we have to determine the intersection of the lines

L1 : y = 5 − x,    L2 : y = 35/6 − x.

Both lines have slope −1 so they are parallel. Since the constant terms in both lines are not equal, they intersect nowhere, showing that the system of equations has no solution, see Figure 1.2.
In Case (b), the two lines that we have to intersect are

G1 : y = 5 − x,    G2 : y = 5 − x.

We see that G1 = G2 , so every point on G1 (or G2 ) is a solution of the system and therefore we have infinitely many solutions, see Figure 1.2.

Important observation. Whether a linear 2 × 2 system has a unique solution or not has nothing to do with the right hand side of the system, because this only depends on whether the two lines are parallel or not, and this in turn depends only on the coefficients on the left hand side.

Now let us consider the general case.

One linear equation with two unknowns


The general form of one linear equation with two unknowns is
αx + βy = γ. (1.7)
For the set of solutions, there are three possibilities:
(i) The set of solutions forms a line. This happens if at least one of the coefficients α or β is different from 0. If β ≠ 0, then the set of all solutions is equal to the line L : y = −(α/β)x + γ/β which is a line with slope −α/β. If β = 0 and α ≠ 0, then the set of solutions of (1.7) is a line parallel to the y-axis passing through (γ/α, 0).


Figure 1.2: Example 1.6. Graphs of the parallel lines L1 : y = 5 − x and L2 : y = 35/6 − x.

(ii) The set of solutions is all of the plane. This happens if α = β = γ = 0. In this case, clearly
every pair (x, y) is a solution of (1.7).
(iii) There is no solution. This happens if α = β = 0 and γ ≠ 0. In this case, no pair (x, y) is a solution of (1.7) since the left hand side is always 0.

In the first two cases, (1.7) has infinitely many solutions, in the last case it has no solution.
Two linear equations with two unknowns

The general form of a system of two linear equations with two unknowns is

1 Ax + By = U
(1.8)
2 Cx + Dy = V.

We are using the letters A, B, C, D instead of a11 , a12 , a21 , a22 in order to make the calculations more readable. If we interpret the system of equations as the intersection of two geometrical objects, in our case lines (which may degenerate to the empty set or to the whole plane), then we already know that the set of solutions must be one of the following:

(i) A point if 1 and 2 describe two non-parallel lines.


(ii) A line if 1 and 2 describe the same line; or if one of the equations is a plane and the other
one is a line.
(iii) A plane if both equations describe a plane.
(iv) The empty set if the two equations describe parallel but different lines; or if one of the
equations has no solution.

In case (i), the system has exactly one solution, in cases (ii) and (iii) the system has infinitely many
solutions and in case (iv) the system has no solution.
In summary, we have the following very important observation.


Remark 1.10. The system (1.8) has either exactly one solution or infinitely many solutions or
no solution.

It is not possible to have for instance exactly 7 solutions.

Question 1.3
What is the geometric interpretation of
(i) a system of 3 linear equations for 2 unknowns?
(ii) a system of 2 linear equations for 3 unknowns?
What can be said about the structure of its solutions?

Algebraic proof of Remark 1.10. Now we want to prove Remark 1.10 algebraically and we want to find a criterion on A, B, C, D which allows us to decide easily how many solutions there are. Let us look at the different cases.

Case 1. B ≠ 0. In this case we can solve 1 for y and obtain y = (1/B)(U − Ax). Inserting this into 2 we find Cx + (D/B)(U − Ax) = V. If we put all terms with x on one side and all other terms on the other side, we obtain

2’ (AD − BC)x = DU − BV.

(i) If AD − BC ≠ 0 then there is at most one solution, namely x = (DU − BV)/(AD − BC) and consequently y = (1/B)(U − Ax) = (AV − CU)/(AD − BC). Inserting these expressions for x and y in our system of equations, we see that they indeed solve the system (1.8), so that we have exactly one solution.

(ii) If AD − BC = 0 then equation 2’ reduces to 0 = DU − BV. This equation has either no solution (if DU − BV ≠ 0) or it is true for every possible choice of x and y (if DU − BV = 0). Since 1 has infinitely many solutions, it follows that the system (1.8) has either no solution or infinitely many solutions.

Case 2. D ≠ 0. This case is analogous to Case 1. In this case we can solve 2 for y and obtain y = (1/D)(V − Cx). Hence 1 becomes Ax + (B/D)(V − Cx) = U. If we put all terms with x on one side and all other terms on the other side, we obtain

1’ (AD − BC)x = DU − BV.

We have the same subcases as before:

(i) If AD − BC ≠ 0 then there is exactly one solution, namely x = (DU − BV)/(AD − BC) and consequently y = (1/D)(V − Cx) = (AV − CU)/(AD − BC).

(ii) If AD − BC = 0 then equation 1’ reduces to 0 = DU − BV. This equation has either no solution (if DU − BV ≠ 0) or holds for every x and y (if DU − BV = 0). Since 2 has infinitely many solutions, it follows that the system (1.8) has either no solution or infinitely many solutions.


Case 3. B = 0 and D = 0. Observe that in this case AD − BC = 0 . In this case the system (1.8)
reduces to
Ax = U, Cx = V. (1.9)
We see that the system no longer depends on y. So, if the system (1.9) has at least one solution,
then we automatically have infinitely many solutions since we may choose y freely. If the system
(1.9) has no solution, then the original system (1.8) cannot have a solution either.
Note that there are no other possible cases for the coefficients.
In summary, we proved the following theorem.

Theorem 1.11. Let us consider the linear system

1 Ax + By = U
(1.10)
2 Cx + Dy = V.

(i) The system (1.10) has exactly one solution if and only if AD − BC ≠ 0. In this case, the solution is

x = (DU − BV)/(AD − BC),    y = (AV − CU)/(AD − BC).               (1.11)

(ii) The system (1.10) has no solution or infinitely many solutions if and only if AD − BC = 0.

Definition 1.12. The number d = AD − BC is called the determinant of the system (1.10).
In Chapter 4.1 we will generalise this concept to n × n systems for n ≥ 3.
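The case distinction of Theorem 1.11 translates directly into a small program. The sketch below (plain Python, not part of the original notes; the helper name solve_2x2 is made up for illustration) returns the unique solution from formula (1.11) when the determinant is nonzero:

def solve_2x2(A, B, U, C, D, V):
    """Solve Ax + By = U, Cx + Dy = V using formula (1.11)."""
    d = A * D - B * C                    # determinant of the system
    if d != 0:
        x = (D * U - B * V) / d
        y = (A * V - C * U) / d
        return x, y
    # d = 0: either no solution or infinitely many, depending on the right hand side
    return None

print(solve_2x2(1, 1, 25, 2, 4, 80))     # Example 1.1: (10.0, 15.0)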

Remark 1.13. Let us see how the determinant connects to our geometric interpretation of the system of equations. Assume that B ≠ 0 and D ≠ 0. Then we can solve 1 and 2 for y to obtain equations for a pair of lines

L1 : y = −(A/B)x + (1/B)U,    L2 : y = −(C/D)x + (1/D)V.

The two lines intersect in exactly one point if and only if they have different slopes, i.e., if −A/B ≠ −C/D. After multiplication by −BD we see that this is the same as AD ≠ BC, or in other words, AD − BC ≠ 0.
On the other hand, the lines are parallel (hence they are either equal or they have no intersection) if −A/B = −C/D. This is the case if and only if AD = BC, or in other words, if AD − BC = 0.

Question 1.4
Consider the cases when B = 0 or D = 0 and make the connection between Theorem 1.11 and
the geometric interpretation of the system of equations.

Let us consider some more examples.


Figure 1.3: Example 1.14(a). Graphs of L1 , L2 and their intersection (5, 3).

Examples 1.14. (a)  1   x + 2y = 11
                    2  3x + 4y = 27.

Clearly, the determinant is d = 4 − 6 = −2 ≠ 0. So the system has exactly one solution. We can check this easily: The first equation gives x = 11 − 2y. Inserting this into the second equation leads to

3(11 − 2y) + 4y = 27   =⇒   −2y = −6   =⇒   y = 3   =⇒   x = 11 − 2 · 3 = 5.

So the solution is x = 5, y = 3. (If we did not have Theorem 1.11, we would have to check
that this is not only a candidate for a solution, but indeed is one.)

Check that the formula (1.11) is satisfied.

(b)  1   x + 2y = 1
     2  2x + 4y = 5.
Here, the determinant is d = 4 − 4 = 0, so we expect either no solution or infinitely many solutions. The first equation gives x = 1 − 2y. Inserting into the second equation gives 2(1 − 2y) + 4y = 5. We see that the terms with y cancel and we obtain 2 = 5 which is a contradiction. Therefore, the system of equations has no solution.

(c) 1 x + 2y = 1
2 3x + 6y = 3.
The determinant is d = 6 − 6 = 0, so again we expect either no solution or infinitely many solutions. The first equation gives x = 1 − 2y. Inserting into the second equation gives 3(1 − 2y) + 6y = 3. We see that the terms with y cancel and we obtain 3 = 3 which is true. Therefore, the system of equations has infinitely many solutions given by x = 1 − 2y.


Figure 1.4: Picture on the left: The lines L1 , L2 from Example 1.14(b) are parallel and do not intersect. Therefore the linear system has no solution. Picture on the right: The lines L1 , L2 from Example 1.14(c) are equal. Therefore the linear system has infinitely many solutions.

Remark. This was somewhat clear since we can obtain the second equation from the first one by multiplying both sides by 3, which shows that both equations carry the same information and we lose nothing if we simply forget about one of them.
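The three behaviours of Example 1.14 can be reproduced on a computer by looking only at the determinants (a sketch assuming NumPy, not part of the original notes):

import numpy as np

systems = {
    "(a)": np.array([[1, 2], [3, 4]], dtype=float),   # x + 2y = 11, 3x + 4y = 27
    "(b)": np.array([[1, 2], [2, 4]], dtype=float),   # x + 2y = 1,  2x + 4y = 5
    "(c)": np.array([[1, 2], [3, 6]], dtype=float),   # x + 2y = 1,  3x + 6y = 3
}
for name, A in systems.items():
    d = np.linalg.det(A)
    print(name, "determinant =", round(d, 10),
          "-> unique solution" if abs(d) > 1e-12 else "-> no or infinitely many solutions")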

Exercise 1.15. Find all k ∈ R such that the system


1 kx + (15/2 − k)y = 1
2 4x + 2ky = 3

has exactly one solution.

Solution. We only need to calculate the determinant and find all k such that it is different from zero. So let us start by calculating

d = k · 2k − (15/2 − k) · 4 = 2k² + 4k − 30 = 2(k² + 2k − 15) = 2[(k + 1)² − 16].

Hence there are exactly two values for k where d = 0, namely k = −1 ± 4, that is k1 = 3, k2 = −5. For all other k, we have that d ≠ 0.
So the answer is: The system has exactly one solution if and only if k ∈ R \ {−5, 3}. 
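The “forbidden” values of k can also be found numerically as the roots of the determinant, viewed as a quadratic polynomial in k (a sketch assuming NumPy, not part of the original notes):

import numpy as np

# d(k) = 2k^2 + 4k - 30, the determinant of the system of Exercise 1.15
roots = np.roots([2, 4, -30])
print(sorted(roots))   # expected: [-5.0, 3.0]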

Remark 1.16. (a) Note that the answer does not depend on the right hand side of the system
of the equation. Only the coefficients on the left hand side determine if there is exactly one
solution or not.

(b) If we wanted to, we could also calculate the solution x, y in the case k ∈ R \ {−5, 3}. We could do it by hand or use (1.11). Either way, we find

x = (1/d)[2k − 3(15/2 − k)] = (5k − 45/2)/(2k² + 4k − 30),    y = (1/d)[3k − 4] = (3k − 4)/(2k² + 4k − 30).


Note that the denominators are equal to d and they are equal to 0 exactly for the “forbidden”
values of k = −5 or k = 3.

(c) What happens if k = −5 or k = 3? In both cases, d = 0, so we will either have no solution or infinitely many solutions.

If k = −5, then the system becomes −5x + (25/2)y = 1, 4x − 10y = 3.

Multiplying the first equation by −4/5 and not changing the second equation, we obtain

4x − 10y = −4/5,    4x − 10y = 3

which clearly cannot be satisfied simultaneously.

If k = 3, then the system becomes 3x + (9/2)y = 1, 4x + 6y = 3.

Multiplying the first equation by 4/3 and not changing the second equation, we obtain

4x + 6y = 4/3,    4x + 6y = 3

which clearly cannot be satisfied simultaneously.

In conclusion, if k = −5 or k = 3, then the linear system has no solution.


You should have understood

• the geometric interpretation of a linear m × 2 system and how it helps to understand the qualitative structure of solutions,

• how the determinant helps to decide whether a linear 2 × 2 system has a unique solution or not,

• that whether a 2 × 2 system has a unique solution depends only on the coefficients; it does not depend on the right side of the equation (the actual values of the solutions of course do depend on the right side of the equation),

• etc.
You should now be able to
• pass easily from a linear m × 2 system to its geometric interpretation and back,
• calculate the determinant of a linear 2 × 2 system,
• determine if a linear 2 × 2 system has a unique, no or infinitely many solutions and calculate
them,
• give criteria for existence/uniqueness of solutions,
• etc.


1.3 Summary
A linear system is a system of equations

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
  ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

where x1 , . . . , xn are the unknowns and the numbers aij and bi (i = 1, . . . , m, j = 1, . . . , n) are given. The numbers aij are called the coefficients of the linear system and the numbers b1 , . . . , bm are called the right side of the linear system.
In the special case when all bi are equal to 0, the system is called homogeneous; otherwise it is called inhomogeneous.
The coefficient matrix A and the augmented coefficient matrix (A|b) of the system are

A = [ a11  a12  . . .  a1n              (A|b) = [ a11  a12  . . .  a1n | b1
      a21  a22  . . .  a2n                        a21  a22  . . .  a2n | b2
       ⋮     ⋮           ⋮                          ⋮     ⋮           ⋮ |  ⋮
      am1  am2  . . .  amn ] ,                    am1  am2  . . .  amn | bm ] .

The general form of a linear 2 × 2 system is

a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2                                                (1.12)

and its determinant is

d = a11 a22 − a21 a12 .

The determinant tells us if the system (1.12) has a unique solution:

• If d ≠ 0, then (1.12) has a unique solution.

• If d = 0, then (1.12) has either no or infinitely many solutions (it depends on b1 and b2 which case prevails).

Observe that d does not depend on the right hand side of the linear system.


Chapter 2

R2 and R3

In this chapter we will introduce the vector spaces R2 , R3 and Rn . We will define algebraic
operations in them and interpret them geometrically. Then we will add some additional structure

to these spaces, namely an inner product. This allows us to assign a norm (length) to a vector and
talk about the angle between two vectors; in particular, it gives us the concept of orthogonality. In
Section 2.3 we will define orthogonal projections in R2 and we will give a formula for the orthogonal
projection of a vector onto another. This formula is easily generalised to projections onto a vector
in Rn with n ≥ 3. Section 2.5 is dedicated to the special and very important case R3 since it is the
space that physicists use in classical mechanics to describe our world. In the last two sections we
study lines and planes in Rn and in R3 . We will see how we can describe them in formulas and we
will learn how to calculate their intersections. This naturally leads to the question on how to solve
linear systems efficiently which will be addressed in the next chapter.

2.1 Vectors in R2
Recall that the xy-plane is the set of all pairs (x, y) with x, y ∈ R. We will denote it by R2 .
Maybe you already encountered vectors in a physics lecture. For instance velocities and forces are
described by vectors. The velocity of a particle says how fast it is and in which direction the particle
moves. Usually, the velocity is represented by an arrow which points in the direction in which the

particle moves and whose length is proportional to the magnitude of the velocity.
Similarly, a force has strength and a direction so it is represented by an arrow which points in the
direction in which it acts and with length proportional to its strength.
Observe that it is not important where in the space R2 or R3 we put the arrow. As long it points
in the same direction and has the same length, it is considered the same vector. We call two arrows
equivalent if they have the same direction and the same length. A vector is the set of all arrows
which are equivalent to a given arrow. Each specific arrow in this set is called a representation of
the vector. A special representation is the arrow that starts in the origin (0, 0). Vectors are usually
denoted by a small letter with an arrow on top, for example ~v .


Given two points P, Q in the xy-plane, we write P Q (with an arrow over it) for the vector which is represented by the arrow that starts in P and ends in Q. For example, let P (2, 1) and Q(4, 4) be points in the xy-plane. Then the arrow from P to Q is P Q = (2, 3)t .

We can identify a point P (p1 , p2 ) in the xy-plane with the vector starting in the point (0, 0) and ending in P . We denote this vector by OP or sometimes by (p1 , p2 )t in order to save space (the subscript t stands for “transposed”). p1 is called its x-coordinate or x-component and p2 is called its y-coordinate or y-component.

Figure 2.1: The vector P Q and several of its representations. The green arrow is the special representation whose initial point is in the origin.

On the other hand, every vector (a, b)t describes a unique point in the xy-plane, namely the tip of the arrow which represents the given vector and starts in the origin. Clearly its coordinates are (a, b). Therefore we can identify the set of all vectors in R2 with R2 itself.

Observe that the slope of the arrow ~v = (a, b)t is b/a if a ≠ 0. If a = 0, then the vector is parallel to the y-axis.

For example, the vector ~v = (2, 5)t can be represented as an arrow whose initial point is in the origin and its tip is at the point (2, 5). If we put its initial point anywhere else, then we find the tip by moving 2 units to the right (parallel to the x-axis) and 5 units up (parallel to the y-axis).

A very special vector is the zero vector (0, 0)t . It is usually denoted by ~0.

We call numbers in R scalars in order to distinguish them from vectors.


Algebra with vectors


If we think of a force and we double its strength then the corresponding vector should be twice as long. If we multiply the force by 5, then the length of the corresponding vector should be 5 times as long, that is, if for instance a force F~ = (3, 4)t is given, then 5F~ should be (5 · 3, 5 · 4)t = (15, 20)t .
In general, if a vector ~v = (a, b)t and a scalar c are given, then c~v = (ca, cb)t . Note that the resulting vector is always parallel to the original one. If c > 0, then the resulting vector points in the same direction as the original one, if c < 0, then it points in the opposite direction, see Figure 2.2.

Figure 2.2: Multiplication of a vector by a scalar.

Given two points P (p1 , p2 ), Q(q1 , q2 ) in the xy-plane, convince yourself that P Q = −QP .

FT
How should we sum two vectors? Again, let us think of forces. Assume we have two forces F~1
and F~2 both acting on the same particle. Then we get the resulting force if we draw the arrow
representing F~1 and attach to its end point the initial point of the arrow representing F~2 . The total
force is then represented by the arrow starting in the initial point of F~1 and ending in the tip of F~2 .

Convince yourself that we obtain the same result if we start with F~2 and put the initial point of
F~1 at the tip of F~2 .
We could also think of the sum of velocities. For example, if a train moves with velocity ~vt and a passenger on the train is moving with relative velocity ~vp , then her total velocity with respect to the ground is the vector sum of the two velocities.
Now assume that ~v = (a, b)t and w~ = (p, q)t . Algebraically, we obtain the components of their sum by summing the components: ~v + w~ = (a + p, b + q)t , see Figure 2.3. When you sum vectors, you should always think of triangles (or polygons if you sum more than two vectors).

Figure 2.3: Sum of two vectors.

Given two points P (p1 , p2 ), Q(q1 , q2 ) in the xy-plane, convince yourself that OP + P Q = OQ and consequently P Q = OQ − OP . How could you write QP in terms of OP and OQ? What is its relation with P Q?

Our discussion of how the product of a vector and a scalar and how the sum of two vectors should be defined leads us to the following formal definition.


   
Definition 2.1. Let ~v = (a, b)t , w~ = (p, q)t ∈ R2 and c ∈ R. Then:

Vector sum: ~v + w~ = (a, b)t + (p, q)t = (a + p, b + q)t ,

Product with a scalar: c~v = c (a, b)t = (ca, cb)t .
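In coordinates, these two operations are just componentwise addition and scaling, which is also how one would implement them; the following minimal Python sketch is only an illustration, not part of the original notes:

def vec_add(v, w):
    """Sum of two vectors in R^2, componentwise."""
    return (v[0] + w[0], v[1] + w[1])

def vec_scale(c, v):
    """Product of a scalar c with a vector v in R^2."""
    return (c * v[0], c * v[1])

print(vec_add((2, 5), (1, -3)))   # (3, 2)
print(vec_scale(5, (3, 4)))       # (15, 20), as for the force 5*F above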

It is easy to see that the vector sum satisfies what one expects from a sum: (~u + ~v ) + w~ = ~u + (~v + w~ ) (associativity) and ~v + w~ = w~ + ~v (commutativity). Moreover, we have the distributivity laws (a + b)~v = a~v + b~v and a(~v + w~ ) = a~v + aw~ . Let us verify for example associativity. To this end, let ~u = (u1 , u2 )t , ~v = (v1 , v2 )t , w~ = (w1 , w2 )t . Then

(~u + ~v ) + w~ = (u1 + v1 , u2 + v2 )t + (w1 , w2 )t = ((u1 + v1 ) + w1 , (u2 + v2 ) + w2 )t
             = (u1 + (v1 + w1 ), u2 + (v2 + w2 ))t = (u1 , u2 )t + (v1 + w1 , v2 + w2 )t = ~u + (~v + w~ ).

In the same fashion, verify commutativity and distributivity of the vector sum.
In the same fashion, verify commutativity and distributivity of the vector sum.

Figure 2.4: The picture illustrates the commutativity of the vector sum.

Figure 2.5: The picture illustrates associativity of the vector sum.

Draw pictures that illustrate the distributivity laws.

We can take these properties and define an abstract vector space. We shall call a set of things, called
vectors, with a “well-behaved” sum of its elements and a “well-behaved” product of its elements
with scalars a vector space. The precise definition is the following.


Vector Space Axioms. Let V be a set together with two operations

vector sum + : V × V → V, (v, w) ↦ v + w,

product of a scalar and a vector · : K × V → V, (λ, v) ↦ λ · v.

Note that we will usually write λv instead of λ · v. Then V is called an R-vector space and its
elements are called vectors if the following holds:

(a) Associativity: (u + v) + w = u + (v + w) for every u, v, w ∈ V .

(b) Commutativity: v + w = w + v for every v, w ∈ V .

(c) Identity element of addition: There exists an element O ∈ V , called the additive identity
such that for every v ∈ V , we have O + v = v + O = v.

(d) Inverse element: For all v ∈ V , we have an inverse element v′ such that v + v′ = O.

(e) Identity element of multiplication by scalar: For every v ∈ V , we have that 1v = v.

(f) Compatibility: For every v ∈ V and λ, µ ∈ R, we have that (λµ)v = λ(µv).

(g) Distributivity laws: For all v, w ∈ V and λ, µ ∈ R, we have

(λ + µ)v = λv + µv and λ(v + w) = λv + λw.

These axioms are fundamental for linear algebra and we will come back to them in Chapter 5.1.
Check that R2 is a vector space, that its additive identity is O = ~0 and that for every vector
~v ∈ R2 , its additive inverse is −~v .

It is important to note that there are vector spaces that do not look like R2 and that we cannot always write vectors as columns. For instance, the set of all polynomials forms a vector space (the sum and scalar multiple of polynomials is again a polynomial, the sum is associative and commutative; the additive identity is the zero polynomial and for every polynomial p, its additive inverse is the polynomial −p; we can multiply polynomials with scalars and obtain another polynomial, etc.). The vectors in this case are polynomials and it does not make sense to speak about their “components” or “coordinates”. (We will however learn how to represent certain subspaces of the space of polynomials as subspaces of some Rn in Chapter 6.3.)

After this brief excursion about abstract vector spaces, let us return to R2 . We know that it can
be identified with the xy-plane. This means that R2 has more structure than only being a vector
space. For example, we can measure angles and lengths. Observe that these concepts do not appear
in the definition of a vector space. They are something in addition to the vector space properties.
Let us now look at some more geometric properties of vectors in R2 . Clearly a vector is known if we know its length and its angle with the x-axis. From the Pythagoras theorem it is clear that the length of a vector ~v = (a, b)t is √(a² + b²).


Figure 2.6: Angle of a vector with the x-axis.

Figure 2.7: The angle of ~v and −~v with the x-axis. Clearly, ϕ′ = ϕ + π.

Definition 2.2 (Norm of a vector in R2 ). The length of ~v = (a, b)t ∈ R2 is denoted by ‖~v ‖. It is given by

‖~v ‖ = √(a² + b²).

Other names for the length of ~v are magnitude of ~v or norm of ~v .

As already mentioned earlier, the slope of a vector ~v = (a, b)t is b/a if a ≠ 0. If ϕ is the angle of the vector ~v with the x-axis then tan ϕ = b/a if a ≠ 0. If a = 0, then ϕ = π/2 or ϕ = −π/2. Recall that the range of arctan is (−π/2, π/2), so we cannot simply take arctan of the fraction b/a in order to obtain ϕ. Observe that arctan(b/a) = arctan((−b)/(−a)), but the vectors (a, b)t and (−a, −b)t = −(a, b)t point in opposite directions, so they do not have the same angle with the x-axis. In fact, their angles differ by π, see Figure 2.7. From elementary geometry, we find tan ϕ = b/a if a ≠ 0 and

ϕ = arctan(b/a)        if a > 0,
ϕ = arctan(b/a) + π    if a < 0,
ϕ = π/2                if a = 0, b > 0,
ϕ = −π/2               if a = 0, b < 0.

Note that this formula gives angles with values in [−π/2, 3π/2).


Remark 2.3. In order to obtain angles with values in (−π, π], we can use the formula

ϕ = arccos( a/√(a² + b²) )     if b > 0,
ϕ = − arccos( a/√(a² + b²) )   if b < 0,
ϕ = 0                          if b = 0, a > 0,
ϕ = π                          if b = 0, a < 0.
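Both angle formulas can be compared with the two-argument arctangent available in most programming languages; math.atan2 returns values in (−π, π], exactly the range of Remark 2.3. The following Python sketch (not part of the original notes; the helper name angle is made up for illustration) implements the formula of Remark 2.3 and compares it with atan2:

import math

def angle(a, b):
    """Angle of the nonzero vector (a, b) with the positive x-axis, in (-pi, pi]."""
    r = math.hypot(a, b)
    if b > 0:
        return math.acos(a / r)
    if b < 0:
        return -math.acos(a / r)
    return 0.0 if a > 0 else math.pi     # b = 0 (the zero vector is excluded)

for a, b in [(1, 1), (-1, 1), (-1, -1), (0, -2)]:
    print((a, b), round(angle(a, b), 4), round(math.atan2(b, a), 4))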

Proposition 2.4 (Properties of the norm). Let λ ∈ R and ~v , w~ ∈ R2 . Then the following is true:

(i) ‖~v ‖ = 0 if and only if ~v = ~0.

(ii) ‖λ~v ‖ = |λ| ‖~v ‖,

(iii) ‖~v + w~ ‖ ≤ ‖~v ‖ + ‖w~ ‖ (triangle inequality).

Proof. Let ~v = (a, b)t , w~ = (c, d)t ∈ R2 and λ ∈ R.

(i) Since ‖~v ‖ = √(a² + b²) it follows that ‖~v ‖ = 0 if and only if a = 0 and b = 0. This is the case if and only if ~v = ~0.

(ii) ‖λ~v ‖ = ‖(λa, λb)t ‖ = √((λa)² + (λb)²) = √(λ²(a² + b²)) = |λ| √(a² + b²) = |λ| ‖~v ‖.

(iii) We postpone the proof of the triangle inequality to Corollary 2.20 when we will have the cosine theorem at our disposal.

Geometrically, the triangle inequality says that in the plane the shortest way to get from one point to the other is a straight line. Figure 2.8 shows that it is shorter to go directly from the origin of the blue vector to its tip than taking a detour along ~v and w~ . In other words, ‖~v + w~ ‖ ≤ ‖~v ‖ + ‖w~ ‖.

Figure 2.8: Triangle inequality.

Definition 2.5. A vector ~v ∈ R2 is called a unit vector if ‖~v ‖ = 1.

Note that every vector ~v ≠ ~0 defines a unit vector pointing in the same direction as itself by (1/‖~v ‖) ~v .

Remark 2.6. (i) The tip of every unit vector lies on the unit circle, and, conversely, every vector
whose initial point is the origin and whose tip lies on the unit circle is a unit vector.
 
(ii) Every unit vector is of the form (cos ϕ, sin ϕ)t where ϕ is its angle with the positive x-axis.
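Passing from ~v to the unit vector (1/‖~v ‖)~v is a very common operation. A short sketch (assuming NumPy, not part of the original notes; the helper name unit_vector is made up for illustration):

import numpy as np

def unit_vector(v):
    """Return the unit vector (1/||v||) * v pointing in the same direction as v (v nonzero)."""
    return v / np.linalg.norm(v)

v = np.array([3.0, 4.0])
print(np.linalg.norm(v))          # 5.0
print(unit_vector(v))             # [0.6 0.8]

# Every unit vector is of the form (cos(phi), sin(phi)) for its angle phi (Remark 2.6):
phi = np.arctan2(0.8, 0.6)
print(np.cos(phi), np.sin(phi))   # 0.6 0.8 (up to rounding)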


Figure 2.9: Unit vectors.

Finally, we define two very special unit vectors:

    ~e1 = (1, 0)^t ,    ~e2 = (0, 1)^t .

Clearly, ~e1 is parallel to the x-axis, ~e2 is parallel to the y-axis and k~e1 k = k~e2 k = 1.

Remark 2.7. Every vector ~v = (a, b)^t can be written as

    ~v = (a, b)^t = (a, 0)^t + (0, b)^t = a~e1 + b~e2 .

Remark 2.8. Another notation for ~e1 and ~e2 is ı̂ and ̂.

You should have understood


• the concept of an abstract vector space and vectors,
• the vector space R2 and how to calculate with vectors in R2 ,
• the difference between a point P (a, b) in R2 and a vector ~v = (a, b)^t in R2 ,
• geometric concepts (angles, length of a vector),
• etc.
You should now be able to
• perform algebraic operations in the vector space R2 and visualise them in the plane,
• calculate lengths and angles,
• calculate unit vectors, scale vectors,
• perform simple abstract proofs (e.g., prove that R2 is a vector space).
• etc.


2.2 Inner product in R2


In this section we will explore further geometric properties of R2 and we will introduce the so-called
inner product. Many of these properties carry over almost literally to R3 and more generally, to
Rn . Let us start with a definition.
   
Definition 2.9 (Inner product). Let ~v = (v1, v2)^t, ~w = (w1, w2)^t be vectors in R2. The inner product of ~v and ~w is

    h~v , ~wi := v1 w1 + v2 w2 .

The inner product is also called scalar product or dot product and it can also be denoted by ~v · ~w.

We usually prefer the notation h~v , ~wi since this notation is used frequently in physics and extends naturally to abstract vector spaces with an inner product. Moreover, the notation with the dot seems to suggest that the dot product behaves like a usual product, whereas in reality it does not, see Remark 2.12.

Before we give properties of the inner product and explore what it is good for, we first calculate a
few examples to familiarise ourselves with it.

Examples 2.10.

(i) h(2, 3)^t , (−1, 5)^t i = 2 · (−1) + 3 · 5 = −2 + 15 = 13.

(ii) h(2, 3)^t , (2, 3)^t i = 2^2 + 3^2 = 4 + 9 = 13. Observe that this is equal to k(2, 3)^t k^2 .

(iii) h(2, 3)^t , (1, 0)^t i = 2,    h(2, 3)^t , (0, 1)^t i = 3.

(iv) h(2, 3)^t , (−3, 2)^t i = 0.
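The computations in Examples 2.10 are easy to reproduce on a computer. Here is a minimal Python sketch (the helper name inner is our own choice, not standard notation):

    def inner(v, w):
        # inner product of two vectors given as tuples of equal length
        return sum(vi * wi for vi, wi in zip(v, w))

    print(inner((2, 3), (-1, 5)))                        # 13
    print(inner((2, 3), (2, 3)))                         # 13, the squared norm of (2, 3)
    print(inner((2, 3), (1, 0)), inner((2, 3), (0, 1)))  # 2 3
    print(inner((2, 3), (-3, 2)))                        # 0, so the vectors are orthogonal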

Proposition 2.11 (Properties of the inner product). Let ~u, ~v, ~w ∈ R2 and λ ∈ R. Then the following holds.

(i) h~v , ~v i = k~v k^2 . In dot notation: ~v · ~v = k~v k^2 .

(ii) h~u , ~v i = h~v , ~ui. In dot notation: ~u · ~v = ~v · ~u.

(iii) h~u , ~v + ~wi = h~u , ~v i + h~u , ~wi. In dot notation: ~u · (~v + ~w) = ~u · ~v + ~u · ~w.

(iv) hλ~u , ~v i = λ h~u , ~v i. In dot notation: (λ~u) · ~v = λ(~u · ~v ).

Proof. Let ~u = (u1, u2)^t, ~v = (v1, v2)^t and ~w = (w1, w2)^t.

(i) h~v , ~v i = v1^2 + v2^2 = k~v k^2 .


(ii) h~u , ~v i = u1 v1 + u2 v2 = v1 u1 + v2 u2 = h~v , ~ui.


   
(iii) h~u , ~v + ~wi = h(u1, u2)^t , (v1 + w1, v2 + w2)^t i = u1 (v1 + w1) + u2 (v2 + w2) = u1 v1 + u2 v2 + u1 w1 + u2 w2 = h(u1, u2)^t , (v1, v2)^t i + h(u1, u2)^t , (w1, w2)^t i = h~u , ~v i + h~u , ~wi.

(iv) hλ~u , ~v i = h(λu1, λu2)^t , (v1, v2)^t i = λu1 v1 + λu2 v2 = λ(u1 v1 + u2 v2) = λ h~u , ~v i.

Remark 2.12. Observe that the proposition shows that the inner product is commutative and
distributive, so it has some properties of the “usual product” that we are used to from the product
in R or C, but there are some properties that show that the inner product is not a product.
(a) The inner product takes two vectors and gives back a number, so it gives back an object that is not of the same type as the two things we put in.

(b) In Example 2.10(iv) we saw that it may happen that ~v 6= ~0 and ~w 6= ~0 but still h~v , ~wi = 0, which is impossible for a “decent” product.

(c) Given a vector ~v 6= ~0 and a number c ∈ R, there are many solutions of the equation h~v , ~xi = c for the vector ~x, in stark contrast to the usual product in R or C. Look for instance at Example 2.10(i) and (ii). Therefore it makes no sense to write something like ~v^(−1).

(d) There is no such thing as a neutral element for the inner product.
RA
Now let us see why the inner product is useful. In fact, it is related to the angle between two vectors
and it will help us to define orthogonal projections of one vector onto another. Let us start with a
definition.

Definition 2.13. Let ~v, ~w be vectors in R2. The angle between ~v and ~w is the smallest nonnegative angle between them, see Figure 2.10. It is denoted by ^(~v , ~w).

Figure 2.10: Angle between two vectors.

The following properties of the angle are easy to see.

Proposition 2.14. (i) ^(~v , ~w) ∈ [0, π] and ^(~v , ~w) = ^(~w , ~v ).


(ii) If λ > 0, then ^(λ~v , ~w) = ^(~v , ~w).

(iii) If λ < 0, then ^(λ~v , ~w) = π − ^(~v , ~w).

Figure 2.11: Angle between the vector ~w and the vectors ~v and −~v. ϕ = ^(~w , ~v ), ψ = ^(~w , −~v ) = π − ^(~w , ~v ) = π − ϕ.

Definition 2.15. (a) Two non-zero vectors ~v and ~w are called parallel if ^(~v , ~w) = 0 or π. In this case we use the notation ~v k ~w.

(b) Two non-zero vectors ~v and ~w are called orthogonal (or perpendicular) if ^(~v , ~w) = π/2. In this case we use the notation ~v ⊥ ~w.

(c) The vector ~0 is parallel and perpendicular to every vector.

The following properties should be intuitively clear from geometry. A formal proof of (ii) and (iii)
can be given easily after Corollary 2.20. The proof of (i) will be given after Remark 2.24.

Proposition 2.16. Let ~v, ~w be vectors in R2. Then:

(i) If ~v k ~w and ~w 6= ~0, then there exists λ ∈ R such that ~v = λ~w.

(ii) If ~v k ~w and λ, µ ∈ R, then also λ~v k µ~w.

(iii) If ~v ⊥ ~w and λ, µ ∈ R, then also λ~v ⊥ µ~w.

Remark 2.17. (i) Observe that (i) is wrong if we do not assume that ~w 6= ~0 because if ~w = ~0, then it is parallel to every vector ~v in R2, but λ~w = ~0 for every λ ∈ R, so λ~w can never become different from ~0 and hence it cannot equal ~v if ~v 6= ~0.

(ii) Observe that the reverse direction in (ii) and (iii) is true only if λ 6= 0 and µ 6= 0.

Without proof, we state the following theorem which should be known.


Theorem 2.18 (Cosine Theorem). Let a, b, c be the sides of a triangle and let ϕ be the angle between the sides a and b. Then

    c^2 = a^2 + b^2 − 2ab cos ϕ.    (2.1)

Theorem 2.19. Let ~v, ~w ∈ R2 and let ϕ = ^(~v , ~w). Then

    h~v , ~wi = k~v k k~wk cos ϕ.


Proof. The vectors ~v and ~w define a triangle in R2, see Figure 2.12. Now we apply the cosine theorem with a = k~v k, b = k~wk, c = k~v − ~wk. We obtain

    k~v − ~wk^2 = k~v k^2 + k~wk^2 − 2 k~v k k~wk cos ϕ.    (2.2)

Figure 2.12: Triangle given by ~v and ~w.

On the other hand,

    k~v − ~wk^2 = h~v − ~w , ~v − ~wi = h~v , ~v i − h~v , ~wi − h~w , ~v i + h~w , ~wi = h~v , ~v i − 2 h~v , ~wi + h~w , ~wi
               = k~v k^2 − 2 h~v , ~wi + k~wk^2 .    (2.3)

Comparison of (2.2) and (2.3) yields

    k~v k^2 + k~wk^2 − 2 k~v k k~wk cos ϕ = k~v k^2 − 2 h~v , ~wi + k~wk^2 ,

which gives the claimed formula.

A very important consequence of this theorem is that we can now determine if two vectors are
parallel or perpendicular to each other by simply calculating their inner product as can be seen
from the following corollary.
Corollary 2.20. Let ~v, ~w ∈ R2 and ϕ = ^(~v , ~w). Then:

(i) ~v k ~w ⇐⇒ k~v k k~wk = |h~v , ~wi|.

(ii) ~v ⊥ ~w ⇐⇒ h~v , ~wi = 0,

(iii) Cauchy-Schwarz inequality: |h~v , ~wi| ≤ k~v k k~wk.

(iv) Triangle inequality:

    k~v + ~wk ≤ k~v k + k~wk.    (2.4)

Proof. The claims are clear if one of the vectors is equal to ~0 since the zero vector is parallel and
orthogonal to every vector in R2 . So let us assume now that ~v 6= ~0 and w~ 6= ~0.

(i) From Theorem 2.19 we have that |h~v , ~wi| = k~v k k~wk if and only if | cos ϕ| = 1. This is the case if and only if ϕ = 0 or π, that is, if and only if ~v and ~w are parallel.

(ii) From Theorem 2.19 we have that h~v , ~wi = 0 if and only if cos ϕ = 0. This is the case if and only if ϕ = π/2, that is, if and only if ~v and ~w are perpendicular.

(iii) By Theorem 2.19 we have that |h~v , ~wi| = k~v k k~wk | cos ϕ| ≤ k~v k k~wk since 0 ≤ | cos ϕ| ≤ 1 for ϕ ∈ [0, π].


(iv) Consider the triangle whose sides are ~v, ~w and ~v + ~w and let ϕ be the angle opposite to the side ~v + ~w (hence ϕ = π − ^(~v , ~w)). The cosine theorem gives

    k~v + ~wk^2 = k~v k^2 + k~wk^2 − 2 k~v k k~wk cos ϕ
               ≤ k~v k^2 + k~wk^2 + 2 k~v k k~wk
               = (k~v k + k~wk)^2 .

Taking the square root on both sides gives us the desired inequality.

Question 2.1
When does equality hold in the triangle inequality (2.4)? Draw a picture and prove your claim
using the calculations in the proof of (iv).

Exercise. Prove (ii) and (iii) of Proposition 2.16 using Corollary 2.20.

Exercise.

(i) Prove Corollary 2.20 (iii) without the cosine theorem.
    Hint. Start with the inequality 0 ≤ k k~wk ~v − k~v k ~w k^2 and expand the right hand side similarly as in the proof of Proposition 8.6. You will find that 0 ≤ 2 k~wk^2 k~v k^2 − 2 k~wk k~v k h~v , ~wi.

(ii) Prove Corollary 2.20 (iv) without the cosine theorem.
    Hint. Cf. the proof of the triangle inequality in Cn (Proposition 8.6).

We give a proof of (iii) and (iv) in Proposition 8.6 without the use of the cosine theorem which works also in the complex case.

Example 2.21. Theorem 2.19 allows us to calculate the angle of a given vector with the x-axis easily (see Figure 2.13):

    cos ϕx = h~v , ~e1 i / (k~v k k~e1 k) ,    cos ϕy = h~v , ~e2 i / (k~v k k~e2 k) .

If we now use that k~e1 k = k~e2 k = 1 and that h~v , ~e1 i = v1 and h~v , ~e2 i = v2, then we can simplify the expressions to

    cos ϕx = v1 / k~v k ,    cos ϕy = v2 / k~v k .

Figure 2.13: Angle of ~v with the axes.
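Theorem 2.19 also gives a convenient numerical recipe for the angle between two arbitrary vectors. The following Python sketch (helper names are our own) computes ^(~v , ~w) from the inner product; the clamping of the cosine guards against rounding errors slightly outside [−1, 1].

    from math import acos, sqrt

    def inner(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    def norm(v):
        return sqrt(inner(v, v))

    def angle(v, w):
        # angle between two non-zero vectors, via <v, w> = ||v|| ||w|| cos(phi)
        c = inner(v, w) / (norm(v) * norm(w))
        c = max(-1.0, min(1.0, c))
        return acos(c)

    v, w = (2, 3), (-3, 2)
    print(angle(v, w))        # pi/2, the vectors are orthogonal
    print(inner(v, w) == 0)   # True, consistent with Corollary 2.20 (ii)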


You should have understood


• the concepts of being parallel and of being perpendicular,
• the relation of the inner product with the length of a vector and the angle between two
vectors,
• that the inner product is commutative and distributive, but that it is not a product,
• etc.
You should now be able to
• calculate the inner product of two vectors,
• use the inner product to calculate angles between vectors
• use the inner product to determine if two vectors are parallel, perpendicular or neither,
• etc.

2.3 Orthogonal Projections in R2

Let ~v and ~w be vectors in R2 and ~w 6= ~0. Geometrically, we have an intuition of what the orthogonal projection of ~v onto ~w should be and that we should be able to construct it as described in the following procedure: We move ~v such that its initial point coincides with that of ~w. Then we extend ~w to a line and construct a line that passes through the tip of ~v and is perpendicular to ~w. The vector from the initial point to the intersection of the two lines should then be the orthogonal projection of ~v onto ~w, see Figure 2.14.

Figure 2.14: Some examples for the orthogonal projection of ~v onto ~w in R2.

This procedure decomposes the vector ~v into a part parallel to ~w and a part perpendicular to ~w so that their sum gives us back ~v. The parallel part is the orthogonal projection of ~v onto ~w. In the following theorem we give the precise meaning of the orthogonal projection, we show that a decomposition as described above always exists and we even derive a formula for the orthogonal projection. A more general version of this theorem is Theorem 7.34.


Theorem 2.22 (Orthogonal projection). Let ~v and ~w be vectors in R2 and ~w 6= ~0. Then there exist uniquely determined vectors ~vk and ~v⊥ (see Figure 2.15) such that

    ~vk k ~w,    ~v⊥ ⊥ ~w    and    ~v = ~vk + ~v⊥ .    (2.5)

The vector ~vk is called the orthogonal projection of ~v onto ~w and it is given by

    ~vk = ( h~v , ~wi / k~wk^2 ) ~w.    (2.6)

Figure 2.15: Examples of decompositions of ~v into ~v = ~vk + ~v⊥ with ~vk k ~w and ~v⊥ ⊥ ~w. Note that by definition ~vk = projw~ ~v .

Proof. Assume we have vectors ~vk and ~v⊥ satisfying (2.5). Since ~vk and ~w are parallel by definition and since ~w 6= ~0, there exists λ ∈ R such that ~vk = λ~w, so in order to find ~vk it is sufficient to determine λ. For this, we notice that ~v = λ~w + ~v⊥ by (2.5). Taking the inner product on both sides with ~w leads to

    h~v , ~wi = hλ~w + ~v⊥ , ~wi = hλ~w , ~wi + h~v⊥ , ~wi = hλ~w , ~wi = λ h~w , ~wi = λ k~wk^2

(here h~v⊥ , ~wi = 0 since ~v⊥ ⊥ ~w), hence

    λ = h~v , ~wi / k~wk^2 .

So if a sum representation of ~v as in (2.5) exists, then the only possibility is

    ~vk = λ~w = ( h~v , ~wi / k~wk^2 ) ~w    and    ~v⊥ = ~v − ~vk = ~v − ( h~v , ~wi / k~wk^2 ) ~w.

This already proves uniqueness of the vectors ~vk and ~v⊥ . It remains to show that they indeed have the desired properties. Clearly, by construction ~vk is parallel to ~w and ~v = ~vk + ~v⊥ since we defined ~v⊥ = ~v − ~vk . It remains to verify that ~v⊥ is orthogonal to ~w. This follows from

    h~v⊥ , ~wi = h ~v − ( h~v , ~wi / k~wk^2 ) ~w , ~w i = h~v , ~wi − ( h~v , ~wi / k~wk^2 ) h~w , ~wi = 0

where in the last step we used that h~w , ~wi = k~wk^2 .

Notation 2.23. Instead of ~vk we often write projw~ ~v , in particular when we want to emphasise
onto which vector we are projecting.


Remark 2.24. (i) projw~ ~v depends only on the direction of ~w. It does not depend on its length.

(ii) For every c ∈ R, we have that projw~ (c~v ) = c projw~ ~v .

(iii) As special cases of the above, we find projw~ (−~v ) = − projw~ ~v and proj−w~ ~v = projw~ ~v .

(iv) ~v k ~w =⇒ projw~ ~v = ~v .

(v) ~v ⊥ ~w =⇒ projw~ ~v = ~0.

(vi) projw~ ~v is the unique vector in R2 such that

    (~v − projw~ ~v ) ⊥ ~w    and    projw~ ~v k ~w.

Proof. (i): By our geometric intuition, this should be clear. Let us give a formal proof. Suppose we want to project ~v onto c~w for some c ∈ R \ {0}. Then

    projcw~ ~v = ( h~v , c~wi / kc~wk^2 ) (c~w) = ( c h~v , ~wi / (c^2 k~wk^2) ) (c~w) = ( h~v , ~wi / k~wk^2 ) ~w = projw~ ~v .

Convince yourself graphically that it does not matter if we project ~v on ~w, on 5~w or on −(7/5)~w; only the direction of ~w matters, not its length.

(ii): Again, by geometric considerations, this should be clear. The corresponding calculation is

    projw~ (c~v ) = ( hc~v , ~wi / k~wk^2 ) ~w = ( c h~v , ~wi / k~wk^2 ) ~w = c projw~ ~v .

(iii) follows directly from (i) and (ii).

(iv), (v) and (vi) follow from the uniqueness of the decomposition of the vector ~v as sum of a vector parallel and a vector perpendicular to ~w.

Now the proof of Proposition 2.16 (i) follows easily.

Proof of Proposition 2.16 (i). We have to show that if ~v k ~w and if ~w 6= ~0, then there exists λ ∈ R such that ~v = λ~w. From Remark 2.24 (iv) it follows that ~v = projw~ ~v = ( h~v , ~wi / k~wk^2 ) ~w, hence the claim follows if we can choose λ = h~v , ~wi / k~wk^2 .

We end this section with some examples.

Example 2.25. Let ~u = 2~e1 + 3~e2 , ~v = 4~e1 − ~e2 .

(i) proj~e1 ~u = ( h~u , ~e1 i / k~e1 k^2 ) ~e1 = (2/1) ~e1 = 2~e1 .

(ii) proj~e2 ~u = ( h~u , ~e2 i / k~e2 k^2 ) ~e2 = (3/1) ~e2 = 3~e2 .

(iii) Similarly, we can calculate proj~e1 ~v = 4~e1 , proj~e2 ~v = −~e2 .

(iv) proj~u ~v = ( h~u , ~v i / k~uk^2 ) ~u = ( (8 − 3) / (2^2 + 3^2) ) ~u = (5/13) ~u = (5/13) (2, 3)^t .


   
4 2
 ,
−1
 
h~ ui
v ,~ 3 8−3 5 5 4
(v) proj~v ~u = k~v k2 ~
v = uk2
k~ ~u = 42 +(−1)2 ~
v = 17 ~
v = .
17 −1
 
a
Example 2.26 (Angle with coordinate axes). Let ~v = (a, b)^t ∈ R2 \ {~0}. Then cos ^(~v , ~e1 ) = a / k~v k, cos ^(~v , ~e2 ) = b / k~v k, hence

    ~v = (a, b)^t = k~v k (cos ^(~v , ~e1 ), cos ^(~v , ~e2 ))^t = k~v k (cos ϕx , cos ϕy )^t

and

    projection of ~v onto the x-axis = proj~e1 ~v = k~v k cos ^(~v , ~e1 ) ~e1 = k~v k cos ϕx ~e1 ,
    projection of ~v onto the y-axis = proj~e2 ~v = k~v k cos ^(~v , ~e2 ) ~e2 = k~v k cos ϕy ~e2 .
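Formula (2.6) can also be checked numerically. Below is a small Python sketch (the helper names inner and proj are our own choice); it reproduces Example 2.25 (iv) and verifies that ~v − projw~ ~v is orthogonal to ~w.

    def inner(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    def proj(v, w):
        # orthogonal projection of v onto the non-zero vector w, formula (2.6)
        factor = inner(v, w) / inner(w, w)
        return tuple(factor * wi for wi in w)

    u, v = (2, 3), (4, -1)
    p = proj(v, u)
    print(p)                              # (10/13, 15/13), i.e. (5/13) * u
    perp = tuple(vi - pi for vi, pi in zip(v, p))
    print(abs(inner(perp, u)) < 1e-12)    # True: v - proj_u(v) is orthogonal to u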

Question 2.2
Let ~w be a vector in R2 \ {~0}.

(i) Can you describe geometrically all the vectors ~v whose projection onto ~w is equal to ~0?

(ii) Can you describe geometrically all the vectors ~v whose projection onto ~w has length 2?

(iii) Can you describe geometrically all the vectors ~v whose projection onto ~w has length 3 k~wk?
You should have understood

• the concept of orthogonal projections in R2 ,
• why the orthogonal projection of ~v onto ~w does not depend on the length of ~w,
• etc.

You should now be able to

• calculate the projection of a given vector onto another vector,
• calculate vectors with a given projection onto another vector,


• etc.

2.4 Vectors in Rn
In this section we extend our calculations from R2 to Rn . If n = 3, then we obtain R3 , which usually serves as a model for our everyday physical world and which you are probably already familiar with from physics lectures. We will discuss R3 and some of its peculiarities in more detail in Section 2.5.
First, let us define Rn .


Definition 2.27. For n ∈ N we define the set

    Rn = { (x1 , . . . , xn )^t : x1 , . . . , xn ∈ R } .

Again we can think of vectors as arrows. As in R2, we can identify every point in Rn with the arrow that starts in the origin of the coordinate system and ends in the given point. The set of all arrows with the same length and the same direction is called a vector in Rn. So every point P (p1 , . . . , pn ) defines a vector ~v = (p1 , . . . , pn )^t and vice versa. As before, we sometimes denote vectors as (p1 , . . . , pn )^t in order to save (vertical) space. The superscript t stands for “transposed”.

Rn becomes a vector space with the operations

    Rn × Rn → Rn ,  ~v + ~w = (v1 + w1 , . . . , vn + wn )^t ,        R × Rn → Rn ,  c~v = (cv1 , . . . , cvn )^t .    (2.7)
vn wn vn + w n cvn

Exercise. Show that Rn is a vector space. That is, you have to show that the vector space
axioms on page 29 hold.

As in R2, we can define the norm of a vector, the angle between two vectors and an inner product. Note that the definition of the angle between two vectors is not different from the one in R2 since when we are given two vectors, they always lie in a common plane which we can imagine as some sort of rotated R2. Let us now give the formal definitions.

Definition 2.28 (Inner product; norm of a vector). For vectors ~v = (v1 , . . . , vn )^t and ~w = (w1 , . . . , wn )^t the inner product (or scalar product or dot product) is defined as

    h~v , ~wi = v1 w1 + · · · + vn wn .

The length of ~v = (v1 , . . . , vn )^t ∈ Rn is denoted by k~v k and it is given by

    k~v k = √(v1^2 + · · · + vn^2) .

Other names for the length of ~v are magnitude of ~v or norm of ~v .


As in R2 , we have the following properties:

(i) Symmetry of the inner product: For all vectors ~v , ~w ∈ Rn , we have that h~v , ~wi = h~w , ~v i.

(ii) Bilinearity of the inner product: For all vectors ~u, ~v , ~w ∈ Rn and all c ∈ R, we have that h~u , ~v + c~wi = h~u , ~v i + c h~u , ~wi.

(iii) Relation of the inner product with the angle between vectors: Let ~v , ~w ∈ Rn and let ϕ = ^(~v , ~w). Then

    h~v , ~wi = k~v k k~wk cos ϕ.

In particular, we have (cf. Proposition 2.16):

(a) ~v k ~w ⇐⇒ ^(~v , ~w) ∈ {0, π} ⇐⇒ |h~v , ~wi| = k~v k k~wk,

(b) ~v ⊥ ~w ⇐⇒ ^(~v , ~w) = π/2 ⇐⇒ h~v , ~wi = 0.

Remark 2.29. In abstract inner product spaces, the inner product is actually used to define orthogonality.

(iv) Relation of the inner product with the norm: For all vectors ~v ∈ Rn , we have k~v k^2 = h~v , ~v i.

(v) Properties of the norm: For all vectors ~v , ~w ∈ Rn and scalars c ∈ R, we have that kc~v k = |c| k~v k and k~v + ~wk ≤ k~v k + k~wk.

(vi) Orthogonal projections of one vector onto another: For all vectors ~v , ~w ∈ Rn with ~w 6= ~0 the orthogonal projection of ~v onto ~w is

    projw~ ~v = ( h~v , ~wi / k~wk^2 ) ~w.    (2.8)

As in R2 , we have n “special vectors” which are parallel to the coordinate axes and have norm 1:

    ~e1 := (1, 0, . . . , 0)^t ,   ~e2 := (0, 1, 0, . . . , 0)^t ,   . . . ,   ~en := (0, . . . , 0, 1)^t .

In the special case n = 3, the vectors ~e1 , ~e2 and ~e3 are sometimes denoted by ı̂, ̂, k̂.
For a given vector ~v 6= ~0, we can now easily determine its projections onto the n coordinate axes and its angle with the coordinate axes. By (2.8), the projection onto the xj-axis is

    proj~ej ~v = vj ~ej .

Let ϕj be the angle between ~v and the xj-axis. Then

    ϕj = ^(~v , ~ej )   =⇒   cos ϕj = h~v , ~ej i / (k~v k k~ej k) = vj / k~v k .

It follows that ~v = k~v k (cos ϕ1 , . . . , cos ϕn )^t . Sometimes the notation

    v̂ := ~v / k~v k = (cos ϕ1 , . . . , cos ϕn )^t

is used for the unit vector pointing in the same direction as ~v . Clearly kv̂k = 1 because kv̂k =
kk~v k−1~v k = k~v k−1 k~v k = 1. Therefore v̂ is indeed a unit vector pointing in the same direction as
the original vector ~v .
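All of the formulas of this section translate directly into code. The following Python sketch (function names are our own) computes norms, unit vectors and projections in Rn for any n; it is only meant as an illustration of the definitions above.

    from math import sqrt

    def inner(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    def norm(v):
        return sqrt(inner(v, v))

    def proj(v, w):
        # orthogonal projection of v onto the non-zero vector w, formula (2.8)
        factor = inner(v, w) / inner(w, w)
        return tuple(factor * wi for wi in w)

    def unit(v):
        # the unit vector pointing in the same direction as the non-zero vector v
        n = norm(v)
        return tuple(vi / n for vi in v)

    v = (1.0, 2.0, 2.0)
    print(norm(v))               # 3.0
    print(unit(v))               # (1/3, 2/3, 2/3), the direction cosines of v
    print(proj(v, (0, 0, 1)))    # (0.0, 0.0, 2.0), the projection onto the x3-axis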

You should have understood


• the vector space Rn and vectors in Rn ,
• geometric concepts (angles, length of a vector) in Rn ,

• that R2 from chapter 2.1 is a special case of Rn from this section,
• etc.
You should now be able to
• perform algebraic operations in the vector space Rn and, in the case n = 3, visualise them in space,
• calculate lengths and angles,
• calculate unit vectors, scale vectors,
• perform simple abstract proofs (e.g., prove that Rn is a vector space).
• etc.

2.5 Vectors in R3 and the cross product


The space R3 is very important since it is used in mechanics to model the space we live in. On R3 we can define an additional operation with vectors, the so-called cross product. Another name for it is vector product. It takes two vectors and gives back another vector. It does have several properties which make it look like a product, however we will see that it is not a product. Here is its definition.

Definition 2.30 (Cross product). Let ~v = (v1 , v2 , v3 )^t , ~w = (w1 , w2 , w3 )^t ∈ R3 . Their cross product (or vector product or wedge product) is

    ~v × ~w = (v1 , v2 , v3 )^t × (w1 , w2 , w3 )^t := (v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 )^t .


Another notation for the cross product is ~v ∧ ~w.

A way to remember this formula is as follows. Write the first and the second component of the vectors underneath them, so that formally you get a column of 5 components. Then make crosses as in the sketch below, starting with the cross consisting of a line from v2 to w3 and then from w2 to v3 . Each line represents a product of the corresponding components; if the line goes from top left to bottom right then it is counted positive, if it goes from top right to bottom left then it is counted negative.

    v1   w1
    v2   w2          v2 w3 − v3 w2
    v3 × w3    =     v3 w1 − v1 w3
    v1   w1          v1 w2 − v2 w1
    v2   w2

The cross product is defined only in R3 !

Before we collect some easy properties of the cross product, let us calculate a few examples.

Examples 2.31. Let ~u = (1, 2, 3)^t , ~v = (5, 6, 7)^t .

• ~u × ~v = (1, 2, 3)^t × (5, 6, 7)^t = (2 · 7 − 3 · 6 , 3 · 5 − 1 · 7 , 1 · 6 − 2 · 5)^t = (14 − 18 , 15 − 7 , 6 − 10)^t = (−4, 8, −4)^t ,

• ~v × ~u = (5, 6, 7)^t × (1, 2, 3)^t = (6 · 3 − 7 · 2 , 7 · 1 − 5 · 3 , 5 · 2 − 6 · 1)^t = (18 − 14 , 7 − 15 , 10 − 6)^t = (4, −8, 4)^t ,

• ~v × ~e1 = (5, 6, 7)^t × (1, 0, 0)^t = (6 · 0 − 7 · 0 , 7 · 1 − 5 · 0 , 5 · 0 − 6 · 1)^t = (0, 7, −6)^t ,

• ~v × ~v = (5, 6, 7)^t × (5, 6, 7)^t = (6 · 7 − 7 · 6 , 7 · 5 − 5 · 7 , 5 · 6 − 6 · 5)^t = (0, 0, 0)^t = ~0 .
5 5 6·7−7·6 0
• ~v × ~v = 6 × 6 = 7 · 5 − 5 · 7 = 0.
7 7 5·6−6·5 0

~ ∈ R3 and let c ∈ R. Then:


Proposition 2.32 (Properties of the cross product). Let ~u, ~v , w

(i) ~u × ~0 = ~0 × ~u = ~0.

(ii) ~u × ~v = −~v × ~u.

(iii) ~u × (~v + w)
~ = (~u × ~v ) + (~u × w).
~


(iv) (c~u) × ~v = c(~u × ~v ).

(v) ~u k ~v =⇒ ~u × ~v = ~0. In particular, ~v × ~v = ~0.

(vi) h~u , ~v × wi
~ = h~u × ~v , wi.
~

(vii) h~u , ~u × ~v i = 0 and h~v , ~u × ~v i = 0, in particular

~v ⊥ ~v × ~u, ~u ⊥ ~v × ~u

In other words: the vector ~v × w


~ is orthogonal to both ~v and w.
~

Proof. The proofs of the formulas (i) – (v) are easy calculations (you should do them!).

(vi) The proof is a long but straightforward calculation:


*u  v  w + *u  v w − v w +
1 1 1 1 2 3 3 2
~ = u2  , v2  × w2  = u2  , v3 w1 − w3 v1 
h~u , ~v × wi

FT
u3 v3 w3 u3 v1 w2 − v2 w1
= u1 (v2 w3 − v3 w2 ) + u2 (v3 w1 − v1 w3 ) + u3 (v1 w2 − v2 w1 )
= u1 v2 w3 − u1 v3 w2 + u2 v3 w1 − u2 v1 w3 + u3 v1 w2 − u3 v2 w1
= u2 v3 w1 − u3 v2 w1 + u3 v1 w2 − u1 v3 w2 + u1 v2 w3 − u2 v1 w3
= (u2 v3 − u3 v2 )w1 + (u3 v1 − u1 v3 )w2 + (u1 v2 − u2 v1 )w3
= h~u × ~v , wi.
~
RA
(vii) It follows from (vi) and (v) that

h~u , ~u × ~v i = h~u × ~u , ~v i = h~0 , ~v i = 0 and h~v , ~u × ~v i = −h~v , ~v × ~ui = 0.

Note that the cross product is distributive but it is neither commutative nor associative.

Exercise. Prove the formulas in (i) – (v).


D

Remark. A geometric interpretation of the number h~u , ~v × wi


~ from (vi) will be given in Proposi-
tion 2.36.

Remark 2.33. The property (vii) explains why the cross product makes sense only in R3 . Given
two non-parallel vectors ~v and w,
~ their cross product is a vector which is orthogonal to both of
them and whose length is k~v k kwk
~ sin ϕ (see Theorem 2.34; ϕ = ^(~v , w))
~ and this should define the
result uniquely up to a factor ±1. This factor has to do with the relative orientation of ~v and w
~ to
each other. However, if n 6= 3, then one of the following holds:

• If we were in R2 , the problem is that “we do not have enough space” because then the only
vector orthogonal to ~v and w~ at the same time would be the zero vector ~0 and it would not
make too much sense to define a product where the result is always ~0.


• If we were in some Rn with n ≥ 4, the problem is that “we have too many choices”. We will
see later in Chapter 7.3 that the orthogonal complement of the plane generated by ~v and w ~
has dimension n − 2 and every vector in the orthogonal complement is orthogonal to both
~v and w.~ For example, if we take ~v = (1, 0, 0, 0)t and w~ = (0, 1, 0, 0)t , then every vector of
t
the form ~a = (0, 0, x, y) is perpendicular to both ~v and w
~ and it easy to find infinitely many
vectors of this form which in addition have norm k~v k kwk~ sin ϕ = 1 (~a = (0, 0, sin ϑ, ± cos ϑ)t
for arbitrary ϑ ∈ R works).

Recall that for the inner product we proved the formula h~v , wi
~ = k~v k kwk
~ cos ϕ where ϕ is the angle
between the two vectors, see Theorem 2.19. In the next theorem we will prove a similar relation
for the cross product.

Theorem 2.34. Let ~v , ~w be vectors in R3 and let ϕ be the angle between them. Then

    k~v × ~wk = k~v k k~wk sin ϕ .

Proof. A long, but straightforward calculation shows that k~v × ~wk^2 = k~v k^2 k~wk^2 − h~v , ~wi^2 . Now it follows from Theorem 2.19 that

    k~v × ~wk^2 = k~v k^2 k~wk^2 − h~v , ~wi^2 = k~v k^2 k~wk^2 − k~v k^2 k~wk^2 (cos ϕ)^2
               = k~v k^2 k~wk^2 (1 − (cos ϕ)^2) = k~v k^2 k~wk^2 (sin ϕ)^2 .

If we take the square root of both sides, we arrive at the claimed formula. (We do not need to
worry about taking the absolute value of sin ϕ because ϕ ∈ [0, π], hence sin ϕ ≥ 0.)
RA
Exercise. Show that k~v × ~wk^2 = k~v k^2 k~wk^2 − h~v , ~wi^2 .

Application: Area of a parallelogram and volume of a parallelepiped

Area of a parallelogram

Let ~v and ~w be two vectors in R3 . Then they define a parallelogram (if the vectors are parallel or one of them is equal to ~0, it is a degenerate parallelogram).
D

w
~ h

~v

Figure 2.16: Parallelogram spanned by ~v and w.


~

Proposition 2.35 (Area of a parallelogram). The area of the parallelogram spanned by the
vectors ~v and w
~ is
A = k~v × wk.
~ (2.9)


Proof. The area of a parallelogram is the product of the length of its base with the height. We can take ~w as base. Let ϕ be the angle between ~w and ~v . Then we obtain that h = k~v k sin ϕ and therefore, with the help of Theorem 2.34,

    A = k~wk h = k~wk k~v k sin ϕ = k~v × ~wk .

Note that in the case when ~v and ~w are parallel, this gives the right answer A = 0.

Volume of a parallelepiped

Any three vectors in R3 define a parallelepiped.

Figure 2.17: Parallelepiped spanned by ~u, ~v , ~w.

Proposition 2.36 (Volume of a parallelepiped). The volume of the parallelepiped spanned by the vectors ~u, ~v and ~w is

    V = |h~u , ~v × ~wi| .    (2.10)

Proof. The volume of a parallelepiped is the product of the area of its base with the height. Let us take the parallelogram spanned by ~v , ~w as base. If ~v and ~w are parallel or one of them is equal to ~0, then (2.10) is true because V = 0 and ~v × ~w = ~0 in this case.


Now let us assume that they are not parallel. By Proposition 2.35 we already know that its base has area A = k~v × ~wk. The height is the length of the orthogonal projection of ~u onto the normal vector of the plane spanned by ~v and ~w. We already know that ~v × ~w is such a normal vector. Hence we obtain that

    h = k proj~v×~w ~uk = k ( h~u , ~v × ~wi / k~v × ~wk^2 ) (~v × ~w) k = ( |h~u , ~v × ~wi| / k~v × ~wk^2 ) k~v × ~wk = |h~u , ~v × ~wi| / k~v × ~wk .

Therefore, the volume of the parallelepiped is

    V = A h = k~v × ~wk · ( |h~u , ~v × ~wi| / k~v × ~wk ) = |h~u , ~v × ~wi| .

Corollary 2.37. Let ~u, ~v , ~w ∈ R3 . Then

    |h~u , ~v × ~wi| = |h~v , ~w × ~ui| = |h~w , ~u × ~v i| .

Proof. The formula holds because each of the expressions describes the volume of the parallelepiped spanned by the three given vectors, since we can take any of the faces of the parallelepiped as its base.
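Formulas (2.9) and (2.10) are straightforward to evaluate numerically. Below is a small Python sketch under the same conventions as before (all helper names are our own):

    def cross(v, w):
        return (v[1] * w[2] - v[2] * w[1],
                v[2] * w[0] - v[0] * w[2],
                v[0] * w[1] - v[1] * w[0])

    def inner(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    def parallelogram_area(v, w):
        # area of the parallelogram spanned by v and w, formula (2.9)
        n = cross(v, w)
        return inner(n, n) ** 0.5

    def parallelepiped_volume(u, v, w):
        # volume of the parallelepiped spanned by u, v, w, formula (2.10)
        return abs(inner(u, cross(v, w)))

    print(parallelogram_area((1, 0, 0), (0, 2, 0)))                  # 2.0
    print(parallelepiped_volume((1, 0, 0), (0, 2, 0), (0, 0, 3)))    # 6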

You should have understood


• the geometric interpretations of the cross product,
• why it exists only in R3
RA
• etc.
You should now be able to
• calculate the cross product,
• use it to say something about the angle between two vectors in R3 ,
• use it calculate the area of a parallelogram and the volume of a parallelepiped,
• etc.
D

2.6 Lines and planes in R3


In this section we discuss lines and planes and how to describe them in formulas. In the next
section, we will calculate, e.g., intersections between them. We work mostly in R3 and only give
some hints on how the concepts discussed here generalise to Rn with n 6= 3. The special case n = 2
should be clear.
The formal definition of lines and planes will be given in Definition 5.55 because this requires the
concept of linear independence. (For the curious: a line is an (affine) one-dimensional subspace of
a vector space; a plane is an (affine) two-dimensional subspace of a vector space; a hyperplane is an
(affine) (n − 1)-dimensional subspace of an n-dimensional vector space). In this section we appeal
to our knowledge and intuition from elementary geometry.


Lines
Intuitively, it is clear what a line in R3 should be. In order to describe a line in R3 completely, it
is not necessary to know all its points. It is sufficient to know either
(a) two different points P, Q on the line
or
(b) one point P on the line and the direction of the line.

Figure 2.18: Line L given by: two points P, Q on L; or by a point P on L and the direction of L.

Clearly, both descriptions are equivalent because: If we have two different points P, Q on the line
# –
L, then its direction is given by the vector P Q. If on the other hand we are given a point P on L
# – # –
and a vector ~v which is parallel to L, then we easily get another point Q on L by OQ = OP + ~v .
Now we want to give formulas for the line.

Vector equation of a line

Given two points P (p1 , p2 , p3 ) and Q(q1 , q2 , q3 ) with P 6= Q, there is exactly one line L which passes
through both points. In formulas, this line is described as
  
    L = { OP + t PQ : t ∈ R } = { (p1 + (q1 − p1 )t , p2 + (q2 − p2 )t , p3 + (q3 − p3 )t)^t : t ∈ R } .    (2.11)

If we are given a point P (p1 , p2 , p3 ) on L and a vector ~v = (v1 , v2 , v3 )^t 6= ~0 parallel to L, then

    L = { OP + t~v : t ∈ R } = { (p1 + v1 t , p2 + v2 t , p3 + v3 t)^t : t ∈ R } .    (2.12)
p 3 + v3 t
 

The formulas are easy to understand. They say: In order to trace the line, we first move to an
# –
arbitrary point on the line (this is the term OP ) and then we move an amount t along the line.
With this procedure we can reach every point on the line, and on the other hand, if we do this,
then we are guaranteed to end up on the line.


The formulas (2.11) and (2.12) are called vector equation for the line L. Note that they are the
same if we set v1 = q1 − p1 , v2 = q2 − p2 , v3 = q3 − p3 . We will mostly use the notation with the
v’s since it is shorter. The vector ~v is called directional vector of the line L.

Question 2.3
# –
Is it true that L passes through the origin if and only if OP = ~0?

Remark 2.38. It is important to observe that a given line has many different parametrisations.

• The vector equation that we write down depends on the points we choose on L. Clearly, we
have infinitely many possibilities to do so.

• Any given line L has many directional vectors. Indeed, if ~v is a directional vector for L, then
c~v is so too for every c ∈ R \ {0}. However, all possible directional vectors are parallel.

Exercise. Check that the following formulas all describe the same line:

FT
         
 1 6   1 12 
(i) L1 = 2 + t 5 : t ∈ R , (ii) L2 = 2 + t 10 : t ∈ R ,
3 4 3 8
   
    
 13 6 
(ii) L3 = 12 + t 5 : t ∈ R .
11 4
 
RA
Question 2.4
• How can you see easily if two given lines are parallel or perpendicular to each other?
• How would you define the angle between two lines? Do they have to intersect so that an
angle between them can be defined?
D

Parametric equation of a line

From the formula (2.12) it is clear that a point (x, y, z) belongs to L if and only if there exists t ∈ R
such that

x = p1 + tv1 , y = p2 + tv2 , z = p3 + tv3 . (2.13)

If we had started with (2.11), then we would have obtained

x = p1 + t(q1 − p1 ), y = p2 + t(q2 − p2 ), z = p3 + t(q3 − p3 ). (2.14)

The system of equations (2.13) or (2.14) are called the parametric equations of L. Here, t is the
parameter.


Symmetric equation of a line

Observe that for (x, y, z) ∈ L, the three equations in (2.13) must hold for the same t. If we assume
that v1 , v2 , v3 6= 0, then we can solve for t and we obtain that

    (x − p1)/v1 = (y − p2)/v2 = (z − p3)/v3 .    (2.15)

If we use (2.14) then we obtain

    (x − p1)/(q1 − p1) = (y − p2)/(q2 − p2) = (z − p3)/(q3 − p3) .    (2.16)

FT
The system of equations (2.15) or (2.16) is called the symmetric equation of L.

If for instance, v1 = 0 and v2 , v3 6= 0, then the line is parallel to the yz-plane and its symmetric
equation is

y − p2 z − p3
x = p1 , = .
v2 v3
RA
If v1 = v2 = 0 and v3 6= 0, then the line is parallel to the z-axis and its symmetric equation is

x = p1 , y = p2 , z ∈ R.

Representations of lines in Rn .

In Rn , the vector form of a line is

    L = { OP + t~v : t ∈ R }

for fixed P ∈ L and a directional vector ~v . Its parametric form is

    x1 = p1 + t v1 ,   x2 = p2 + t v2 ,   . . . ,   xn = pn + t vn ,   t ∈ R,

and, assuming that all vj are different from 0, its symmetric form is

    (x1 − p1)/v1 = (x2 − p2)/v2 = · · · = (xn − pn)/vn .
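The three representations are easy to pass between in code. The following Python sketch (helper names are our own, and all vj are assumed non-zero for the symmetric form) produces points on a line from its vector form and checks them against the symmetric form.

    def point_on_line(p, v, t):
        # the point OP + t*v on the line through P with directional vector v
        return tuple(pi + t * vi for pi, vi in zip(p, v))

    def satisfies_symmetric_form(x, p, v, tol=1e-12):
        # checks (x1 - p1)/v1 = ... = (xn - pn)/vn, assuming all v_j are non-zero
        ratios = [(xi - pi) / vi for xi, pi, vi in zip(x, p, v)]
        return max(ratios) - min(ratios) < tol

    p, v = (1, 2, 3), (4, 5, 6)
    q = point_on_line(p, v, 2.0)
    print(q)                                    # (9.0, 12.0, 15.0)
    print(satisfies_symmetric_form(q, p, v))    # True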


Figure 2.19: Plane E given by: (a) three points P, Q, R on E, (b) a point P on E and two vectors ~v , ~w parallel to E, (c) a point P on E and a vector ~n perpendicular to E.

FT
Question 2.5. Normal form of a line.
In R2 , there is also the normal form of a line:

L : ax + by = d (2.17)

where a, b and d are fixed numbers. This means that L consists of all the points (x, y) whose
coordinates satisfy the equation ax + by = d.
RA
(i) Given a line in the form (2.17), find a vector representation.

(ii) Given a line in vector representation, find a normal form (that is, write it as (2.17)).
 
(iii) What is the geometric interpretation of a, b? (Hint: Draw the line L and the vector (a, b)^t .)

(iv) Can this normal form be extended/generalised to lines in R3 ? If it is possible, how can it
be done? If it is not possible, explain why not.
D

Planes
In order to know a plane E in R3 completely, it is sufficient to know

(a) three points P, Q, R on the plane that do not lie on a common line,

or

(b) one point P on the plane and two non-parallel vectors ~v , ~w which are both parallel to the plane,

or

(c) one point P on the plane and a vector ~n which is perpendicular to the plane.


Figure 2.20: Plane E given with three points P, Q, R on E, two vectors PQ, PR parallel to E, and a vector ~n perpendicular to E. Note that ~n k PQ × PR.

FT
First, let us see how we can pass from one description to another. Clearly, the descriptions (a) and
(b) are equivalent because given three points P, Q, R on E which do not lie on a line, we can form
# – # –
the vectors P Q and P R. These vectors are then parallel to the plane E but are not parallel to each
# – # – # – # –
other. (Of course, we also could have taken QR and QP or RP and RQ.) If, on the other hand,
we have one point P on E and two vectors ~v and w,
# – # –
~ parallel to E and ~v 6k w,
# – # –
two other points on E, for instance by OQ = OP + ~v and OR = OP + w.
P, Q, R lie on E and do not lie on a line.
~ then we can easily get
~ Then the three points
RA
Vector equation of a plane

In formulas, we can now describe our plane E as

    E = { (x, y, z) : (x, y, z)^t = OP + s~v + t~w for some s, t ∈ R } .

As in the case of the vector equation of a line, it is easy to understand the formula. We first move to an arbitrary point on the plane (this is the term OP ) and then we move parallel to the plane as we please (this is the term s~v + t~w). With this procedure we can reach every point on the plane, and on the other hand, if we do this, then we are guaranteed to end up on the plane.
and on the other hand, if we do this, then we are guaranteed to end up on the plane.

Question 2.6
# –
Is it true that E passes through the origin if and only if OP = ~0?

Normal form of a plane

Now we want to use the normal vector of the plane to describe it. Assume that we are given a
point P on E and a vector ~n perpendicular to the plane. This means that every vector which is


parallel to the plane E must be perpendicular to ~n. If we take an arbitrary point Q(x, y, z) ∈ R3 ,
# – # –
then Q ∈ E if and only if P Q is parallel to E, that means that P Q is orthogonal to ~n. Recall that
two vectors are perpendicular if and only if their inner product is 0, so Q ∈ E if and only if
*n  x − p +
# – 1 1
0 = hn , P Qi = n2  , y − p2  = n1 (x − p1 ) + n2 (y − p2 ) + n3 (z − p3 )
n3 z − p3
= n1 x + n2 y + n3 z − (n1 p1 + n2 p2 + n3 − p3 )
If we set d = n1 p1 + n2 p2 + n3 − p3 , then it follows that a point Q(x, y, z) belongs to E if and only
if its coordinates satisfy
n1 x + n2 y + n3 z = d. (2.18)
Equation (2.18) is called the normal form for the plane E and ~n is called a normal vector of E.

Notation 2.39. In order to define E, we write E : n1 x + n2 y + n3 z = d. As a set, we denote E as


E = {(x, y, z) : n1 x + n2 y + n3 z = d}.

FT
Exercise. Show that E passes through the origin if and only if d = 0.

Remark 2.40. As before, note that the normal equation for a plane is not unique. For instance,
x + 2y + 3z = 5 and 2x + 4y + 6z = 10
describe the same plane. The reason is that “the” normal vector of a plane is not unique. If ~n is
normal vector of the plane E, then every c~n with c ∈ R \ {0} is also a normal vector to the plane.
RA
Definition 2.41. The angle between two planes is the angle between their normal vectors.

Note that this definition is consistent with the fact that two planes are parallel if and only if their
normal vectors are parallel.

Remark 2.42. • Assume a plane is given as in (b) (that is, we know a point P on E and two vectors ~v and ~w parallel to E but with ~v 6k ~w). In order to find a description as in (c) (that is, one point on E and a normal vector), we only have to find a vector ~n that is perpendicular to both ~v and ~w. Proposition 2.32(vii) tells us how to do this: we only need to calculate ~v × ~w. Another way to find an appropriate ~n is to find a solution of the linear 2 × 3 system given by {h~v , ~ni = 0, h~w , ~ni = 0}.

• Assume a plane is given as in (c) (that is, we know a point P on E and a normal vector). In order to find vectors ~v and ~w as in (b), we can proceed in many ways:

  – Find two solutions of h~x , ~ni = 0 which are not parallel.

  – Find two points Q, R on the plane such that PQ 6k PR. Then we can take ~v = PQ and ~w = PR.

  – Find one solution ~v 6= ~0 of h~n , ~v i = 0, which is usually easy to guess, and then calculate ~w = ~v × ~n. The vector ~w is perpendicular to ~n and therefore it is parallel to the plane. It is also perpendicular to ~v and therefore it is not parallel to ~v . In total, this vector ~w does what we need.
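The first bullet point of Remark 2.42 can be carried out mechanically. Here is a small Python sketch (function names are our own) that produces a normal form of a plane from a point P and two spanning vectors ~v , ~w via the cross product:

    def cross(v, w):
        return (v[1] * w[2] - v[2] * w[1],
                v[2] * w[0] - v[0] * w[2],
                v[0] * w[1] - v[1] * w[0])

    def plane_normal_form(p, v, w):
        # returns (n1, n2, n3, d) with n = v x w and d = <n, p>,
        # so that the plane is { (x, y, z) : n1*x + n2*y + n3*z = d }
        n = cross(v, w)
        d = sum(ni * pi for ni, pi in zip(n, p))
        return (*n, d)

    # plane through P(1, 0, 0) spanned by v = (0, 1, 0) and w = (0, 0, 1):
    print(plane_normal_form((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # (1, 0, 0, 1), i.e. x = 1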


Representations of planes in Rn .
In Rn , the vector form of a plane is

    E = { OP + t~v + s~w : s, t ∈ R }

for fixed P ∈ E and two vectors ~v , ~w parallel to the plane but not parallel to each other.
Note that there is no normal form of a plane in Rn for n ≥ 4. The reason is that for n ≥ 4, there is more than one normal direction to a given plane, so a normal form of a plane E must consist of more than one equation (more precisely, it must consist of n − 2 equations of the form n1 x1 + · · · + nn xn = d).

You should have understood


• the concept of lines and planes in R3 ,

FT
• how they can be described in formulas,
• etc.
You should now be able to
• pass easily between the different descriptions of lines and planes,
• etc.
RA
2.7 Intersections of lines and planes in R3

Intersection of lines

Given two lines G and L in R3 , there are three possibilities:



(a) The lines intersect in exactly one point. In this case, they cannot be parallel.

(b) The lines intersect in infinitely many points. In this case, the lines have to be equal. In particular they have to be parallel.

(c) The lines do not intersect. Note that in contrast to the case in R2 , the lines do not have to be parallel for this to happen. For example, the line L : x = y = 1 is a line parallel to the z-axis passing through (1, 1, 0), and G : x = z = 0 is a line parallel to the y-axis passing through (0, 0, 0). The lines do not intersect and they are not parallel.


Example 2.43. We consider four lines Lj = { p~j + t~vj : t ∈ R } with

(i) ~v1 = (1, 2, 3)^t , p~1 = (0, 0, 1)^t ,   (ii) ~v2 = (2, 4, 6)^t , p~2 = (2, 4, 7)^t ,

(iii) ~v3 = (1, 1, 2)^t , p~3 = (−1, 0, 0)^t ,   (iv) ~v4 = (1, 1, 2)^t , p~4 = (3, 0, 5)^t .
(iii) ~v3 = 1 , p~3 =  0 , (iv) ~v4 = 1 , p~4 = 0 .
2 0 2 5

We will calculate their mutual intersections.

L1 ∩ L2 = L1

Proof. A point Q(x, y, z) belongs to L1 ∩ L2 if and only if it belongs both to L1 and L2 . This means
# –
that there must exist an s ∈ R such that OQ = p~1 + s~v1 and there must exist a t ∈ R such that
# –
OQ = p~2 + t~v2 . Note that s and t are different parameters. So we are looking for s and t such that

    p~1 + s~v1 = p~2 + t~v2 ,   that is   (0, 0, 1)^t + s (1, 2, 3)^t = (2, 4, 7)^t + t (2, 4, 6)^t .    (2.19)

Once we have solved (2.19) for s and t, we insert them into the equations for L1 and L2 respectively, in order to obtain Q. Note that (2.19) in reality is a system of three equations: one equation for each component of the vector equation. Writing it out and solving each equation for s, we obtain

    0 + s = 2 + 2t           s = 2 + 2t
    0 + 2s = 4 + 4t   ⇐⇒     s = 2 + 2t
    1 + 3s = 7 + 6t          s = 2 + 2t

This means that there are infinitely many solutions of (2.19). Given any point R on L1 , there is a
# – # –
corresponding s ∈ R such that OR = p~1 + s~v1 . Now if we choose t = (s − 2)/2, then OR = p~2 + t~v2
0
holds, hence R ∈ L2 too. If on the other hand we have a point R ∈ L2 , then there is a corresponding
# – # –
t ∈ R such that OR0 = p~2 + t~v2 . Now if we choose s = 2 + 2t, then OR0 = p~1 + s~v1 holds, hence R0 ∈ L1 too. In summary, we showed that L1 = L2 .
D

Remark 2.44. We could also have seen that the directional vectors of L1 and L2 are parallel. In
fact, ~v2 = 2~v1 . It then suffices to show that L1 and L2 have at least one point in common in order
to conclude that the lines are equal.

L1 ∩ L3 = {(1, 2, 4)}

Proof. As before, we need to find s, t ∈ R such that


       
    p~1 + s~v1 = p~3 + t~v3 ,   that is   (0, 0, 1)^t + s (1, 2, 3)^t = (−1, 0, 0)^t + t (1, 1, 2)^t .    (2.20)


We write this as a system of equations, we get

1 0 + s = −1 + t 1 s − t = −1
2 0 + 2s = 0 + t ⇐⇒ 2 2s − t = 0
3 1 + 3s = 0 + 2t 3 3s − 2t = −1

From 1 it follows that s = t − 1. Inserting in 2 gives 0 = 2(t − 1) − t = t − 2, hence t = 2. From


1 we then obtain that s = 2 − 1 = 1. Observe that so far we used only equations 1 and 2 . In
order to see if we really found a solution, we must check if it is consistent with 3 . Inserting our
candidates for s and t, we find that 3 · 1 − 2 · 2 = −1 which is consistent with 3 .
So L1 and L3 intersect in exactly one point. In order to find it, we put s = 1 in the equation for
L1 :      
0 1 1
# –
OQ = p~1 + 1 · ~v1 = 0 + 2 = 2 ,
1 3 4
hence the intersection point is Q(1, 2, 4).

FT
In order to check if this result is correct, we can put t = 2 in the equation for L3 . The result must
be the same. The corresponding calculation is:
     
−1 2 1
# –
OQ = p~3 + 2 · ~v3 =  0 + 2 = 2 ,
0 4 4

which confirms that the intersection point is Q(1, 2, 4).


RA
L1 ∩ L4 = ∅
Proof. As before, we need to find s, t ∈ R such that
       
    p~1 + s~v1 = p~4 + t~v4 ,   that is   (0, 0, 1)^t + s (1, 2, 3)^t = (3, 0, 5)^t + t (1, 1, 2)^t .    (2.21)

We write this as a system of equations and we get


D

1 s=3+ t 1 s− t=3
2 2s = t ⇐⇒ 2 2s − t = 0
3 1 + 3s = 5 + 2t 3 3s − 2t = 5

From 1 it follows that s = t + 3. Inserting in 2 gives 0 = 2(t + 3) − t = t + 6, hence t = −6.


From 1 we then obtain that s = −6 + 3 = −3. Observe that so far we used only equations 1 and
2 . In order to see if we really found a solution, we must check if it is consistent with 3 . Inserting
our candidates for s and t, we find that 3 · (−3) − 2 · (−6) = 3 which is inconsistent with 3 .
Therefore we conclude that there is no pair of real numbers s, t which satisfies all three equations
1 – 3 simultaneously, so the two lines do not intersect.
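The procedure used in these proofs (solve the first two component equations for s and t, then check the third) can be written as a small program. The Python sketch below is our own illustration, not a general-purpose solver: it returns None both when the 2 × 2 subsystem in the first two coordinates is singular and when the third equation turns out to be inconsistent.

    def intersect_lines(p1, v1, p2, v2, tol=1e-9):
        # try to solve p1 + s*v1 = p2 + t*v2 using the first two components,
        # then check the third component for consistency
        a, b = v1[0], -v2[0]
        c, d = v1[1], -v2[1]
        r0, r1 = p2[0] - p1[0], p2[1] - p1[1]
        det = a * d - b * c
        if abs(det) < tol:
            return None
        s = (r0 * d - b * r1) / det
        t = (a * r1 - c * r0) / det
        if abs((p1[2] + s * v1[2]) - (p2[2] + t * v2[2])) > tol:
            return None
        return tuple(pi + s * vi for pi, vi in zip(p1, v1))

    print(intersect_lines((0, 0, 1), (1, 2, 3), (-1, 0, 0), (1, 1, 2)))   # (1.0, 2.0, 4.0)
    print(intersect_lines((0, 0, 1), (1, 2, 3), (3, 0, 5), (1, 1, 2)))    # None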

Exercise. Show that L3 ∩ L4 = ∅.


Intersection of planes
Given two planes E1 and E2 in R3 , there are two possibilities:

(a) The planes intersect. In this case, they necessarily intersect in infinitely many points. Their
intersection is either a line (if E1 and E2 are not parallel) or a plane (if E1 = E2 ).

(b) The planes do not intersect. In this case, the planes must be parallel and not equal.

Example 2.45. We consider the following four planes:

E1 : x + y + 2z = 3, E2 : 2x + 2y + 4z = −4, E3 : 2x + 2y + 4z = 6, E4 : x + y − 2z = 5.

We will calculate their mutual intersections.

E1 ∩ E2 = ∅

FT
Proof. The set of all points Q(x, y, z) which belong both to E1 and E2 is the set of all x, y, z which
simultaneously satisfy

1 x + y + 2z = 3,
2 2x + 2y + 4z = −4.

Now clearly, if x, y, z satisfies 1 , then it cannot satisfy 2 (the right side would be 6). We can
see this more formally if we solve 1 , e.g., for x and then insert into 2 . We obtain from 1 :
x = 3 − y − 2z. Inserting into 2 leads to
RA
−4 = 2(3 − y − 2z) + 2y + 4z = 6,

which is absurd.
   
This result was to be expected since the normal vectors of the planes are ~n1 = (1, 1, 2)^t and ~n2 = (2, 2, 4)^t
respectively. Since they are parallel, the planes are parallel and therefore they either are equal or
D

they have empty intersection. Now we see that for instance (3, 0, 0) ∈ E1 but (3, 0, 0) ∈/ E2 , so the
planes cannot be equal. Therefore they have empty intersection.

E1 ∩ E3 = E1

Proof. The set of all points Q(x, y, z) which belong both to E1 and E3 is the set of all x, y, z which
simultaneously satisfy

1 x + y + 2z = 3,
2 2x + 2y + 4z = 6.

Clearly, both equations are equivalent: if x, y, z satisfies 1 , then it also satisfies 2 and vice versa.
Therefore, E1 = E3 .


Figure 2.21: The left figure shows E1 ∩ E2 = ∅, the right figure shows E1 ∩ E4 which is a line.

FT
    
 4 −1 
E1 ∩ E4 =  0 + t  1 : t ∈ R .
− 12 0
 

   
Proof. First, we notice that the normal vectors ~n1 = (1, 1, 2)^t and ~n4 = (1, 1, −2)^t are not parallel, so we expect that the solution is a line in R3 .
The set of all points Q(x, y, z) which belong both to E1 and E4 is the set of all x, y, z which
simultaneously satisfy

1 x + y + 2z = 3,
2 x + y − 2z = 5.

Equation 1 shows that x = 3 − y − 2z. Inserting into 2 leads to 5 = 3 − y − 2z + y − 2z = 3 − 4z, hence z = −1/2 . Putting this into 1 , we find that x + y = 3 − 2z = 4. So in summary, the intersection consists of all points (x, y, z) which satisfy

    z = −1/2 ,   x = 4 − y   with y ∈ R arbitrary,

in other words,

    (x, y, z)^t = (4 − y, y, −1/2)^t = (4, 0, −1/2)^t + y (−1, 1, 0)^t   with y ∈ R arbitrary.
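A directional vector of the intersection line can also be obtained directly as the cross product of the two normal vectors, since it must be perpendicular to both of them. A short Python sketch for E1 and E4 (helper names are our own):

    def cross(v, w):
        return (v[1] * w[2] - v[2] * w[1],
                v[2] * w[0] - v[0] * w[2],
                v[0] * w[1] - v[1] * w[0])

    n1 = (1, 1, 2)     # E1: x + y + 2z = 3
    n4 = (1, 1, -2)    # E4: x + y - 2z = 5

    v = cross(n1, n4)
    print(v)           # (-4, 4, 0), parallel to (-1, 1, 0) as computed above

    p = (4, 0, -0.5)   # the point found above by setting y = 0
    print(sum(ni * pi for ni, pi in zip(n1, p)),
          sum(ni * pi for ni, pi in zip(n4, p)))   # 3.0 5.0, so p lies on both planes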


Intersection of a line with a plane


Finally we want to calculate the intersection of a plane E with a line L. There are three possibilities:
(a) The plane and the line intersect in exactly one point. This happens if and only if L is not
parallel to E which is the case if and only if L is not perpendicular to the normal vector of E.
(b) The plane and the line do not intersect. In this case, E and L must be parallel, that is, L must be perpendicular to the normal vector of E.
(c) The plane and the line intersect in infinitely many points. In this case, L lies in E, that is, E
and L must be parallel and they must share at least one point.
As an example we calculate E1 ∩ L2 . Since L2 is clearly not parallel to E1 , we expect that their
intersection consists of exactly one point.

E1 ∩ L2 = {(1/9, 2/9, 4/3)}

Proof. The set of all points Q(x, y, z) which belong both to E1 and L2 is the set of all x, y, z which
simultaneously satisfy

x + y + 2z = 3 and x = 2 + 2t, y = 4 + 4t, z = 7 + 6t for some t ∈ R.

Replacing the expression with t from L2 into the equation of the plane E1 , we obtain the following
equation for t:

3 = (2 + 2t) + (4 + 4t) + 2(7 + 6t) = 20 + 18t =⇒ t = −17/18.


Replacing this t into the equation for L2 gives the point of intersection Q(1/9, 2/9, 4/3).
In order to check our result, we insert the coordinates in the equation for E1 and obtain x+y +2z =
1/9 + 2/9 + 2 · 4/3 = 1/3 + 8/3 = 3 which shows that Q ∈ E1 .
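The substitution used in this proof works for any line and plane that are not parallel. A small Python sketch (the function name is our own choice):

    def intersect_line_plane(p, v, n, d, tol=1e-12):
        # intersection of the line p + t*v with the plane <n, x> = d;
        # returns None if the line is parallel to the plane (or contained in it)
        inner = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
        denom = inner(n, v)
        if abs(denom) < tol:
            return None
        t = (d - inner(n, p)) / denom
        return tuple(pi + t * vi for pi, vi in zip(p, v))

    # E1: x + y + 2z = 3 and L2: (2, 4, 7) + t*(2, 4, 6)
    print(intersect_line_plane((2, 4, 7), (2, 4, 6), (1, 1, 2), 3))
    # (0.111..., 0.222..., 1.333...), i.e. the point (1/9, 2/9, 4/3)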

Intersection of several lines and planes


If we wanted to intersect for instance, 5 planes in R3 , then we would have to solve a system of 5
equations for 3 unknowns. Or if we wanted to intersect 7 lines in R3 , then we had to solve a system
D

of 3 equations for 7 unknowns. If we solve them as we did here, the process could become quite
messy. So the next chapter is devoted to find a systematic and efficient way to solve a system of m
linear equations for n unknowns.

You should have understood


• what the intersections of lines and planes can be geometrically and how they depend on their relative orientation,
• the interpretation of a linear system with three unknowns as the intersection of planes in
R3 ,
• etc.
You should now be able to


• calculate the intersection of lines and planes,


• etc.

2.8 Summary
The vector space Rn is given by

    Rn = { (x1 , . . . , xn )^t : x1 , . . . , xn ∈ R } .

For points P (p1 , . . . , pn ), Q(q1 , . . . , qn ), the vector whose initial point is P and final point is Q, is

    PQ = (q1 − p1 , . . . , qn − pn )^t    and    OQ = (q1 , . . . , qn )^t    where O denotes the origin.

On Rn , the sum and product with scalars are defined by

    ~v + ~w = (v1 + w1 , . . . , vn + wn )^t ,        c~v = (cv1 , . . . , cvn )^t .

The norm of a vector is

    k~v k = √(v1^2 + · · · + vn^2) .

If ~v = PQ, then k~v k = kPQk = distance between P and Q.
For vectors ~v and ~w ∈ Rn their inner product is a real number defined by

    h~v , ~wi = v1 w1 + · · · + vn wn .
vn wn

Important formulas involving the inner product


D

• h~v , wi
~ = hw ~ , ~v i,
h~v , cwi
~ = ch~v , wi,~
h~v , w
~ + ~ui = h~v , wi
~ + h~v , ~ui,
• h~v , wi
~ = k~v k kwk ~ cos ϕ,
• k~v + wk
~ ≤ k~v k + kwk
~ Triangle inequality
• ~v ⊥ w ~ ⇐⇒ h~v , wi~ = 0,
2
• h~v , ~v i = k~v k .
The cross product is defined only in R3 . It is a vector defined by
     
v1 w1 v2 w3 − v3 w2
~ = v2  × w2  = v3 w1 − v1 w3  .
~v × w
v3 w3 v1 w2 − v2 w1

Chapter 2. R2 and R3 61

Important formulas involving the cross product


• ~u × ~v = −~v × ~u,
~u × (~v + w)
~ = (~u × ~v ) + (~u × w),
~
(c~u) × ~v = c(~u × ~v ),
• h~u , ~v × wi
~ = h~u × ~v , wi.
~
• ~u k ~v =⇒ ~u × ~v = ~0.
• h~u , ~u × ~v i = 0 and h~v , ~u × ~v i = 0, in particular ~v ⊥ ~v × ~u, ~u ⊥ ~v × ~u .
• k~v × wk
~ = k~v k kwk
~ sin ϕ.

Applications
~ ∈ R3 : A = k~v × wk.
• Area of a parallelogram spanned by ~v , w ~
~ ∈ R3 : V = |h~u , ~v × wi|.
• Volume of a parallelepiped spanned by ~u, ~v , w ~

FT
Representations of lines
n# – o
• Vector equation L = OP + ~t~v : t ∈ R .
P is a point on the line, ~v is called directional vector of L.
• Parametric equation x1 = p1 + tv1 , . . . , xn = pn + tvn , t ∈ R.
Then P (p1 , . . . , pn ) is a point on L and ~v = (v1 , . . . , vn )t is a directional vector of L.
• Symmetric equation x1v−p 1
= x2v−p 2
= · · · = xnv−p n
.
RA
1 2 n
Then P (p1 , . . . , pn ) is a point on L and ~v = (v1 , . . . , vn )t is a directional vector of L.
If one or several of the vj are equal to 0, then the formula above has to be modified.

Representations of planes
n# – o
• Vector equation E = OP + ~t~v + sw
~ : s, t ∈ R .
~ are vectors parallel to E with ~v 6k w.
P is a point on the line, ~v and w ~
D

• Normal form (only in R3 !!) E : ax + by + cz = d.


 
a
The vector ~n =  b  formed by coefficients on the left hand side is perpendicular to E.
c
Moreover, E passes through the origin if and only if d = 0.

The parametrisations are not unique!! (One and the same line (or plane) has many different
parametrisations.)

• The angle between two lines is the angle between their directional vectors.
• Two lines are parallel if and only if their directional vectors are parallel.
Two lines are perpendicular if and only if their directional vectors are perpendicular.


• The angle between two planes is the angle between their normal vectors.
• Two planes are parallel if and only if their normal vectors are parallel.
Two planes are perpendicular if and only if their normal vectors are perpendicular.
• A line is parallel to a plane if and only if its directional vector is perpendicular to the normal vector of the plane.
A line is perpendicular to a plane if and only if its directional vector is parallel to the normal vector of the plane.


2.9 Exercises
 
1. Let P(2, 3), Q(−1, 4) be points in R² and let ~v = (3, −2)^t be a vector in R².
   (a) Compute ~PQ.
   (b) Compute ‖~PQ‖.
   (c) Compute ~PQ + ~v.
   (d) Find all vectors which are orthogonal to ~v.
 
2. Let ~v = (2, 5)^t ∈ R².
   (a) Find all unit vectors whose direction is opposite to that of ~v.
   (b) Find all vectors of length 3 which have the same direction as ~v.
   (c) Find all vectors which have the same direction as ~v and twice its length.
   (d) Find all vectors with norm 2 which are orthogonal to ~v.

3. Show that the following equations describe the same line:
$$\left\{ \begin{pmatrix}1\\2\\3\end{pmatrix} + t\begin{pmatrix}4\\5\\6\end{pmatrix} : t\in\mathbb{R} \right\},\quad \left\{ \begin{pmatrix}1\\2\\3\end{pmatrix} + t\begin{pmatrix}8\\10\\12\end{pmatrix} : t\in\mathbb{R} \right\},\quad \left\{ \begin{pmatrix}1\\2\\3\end{pmatrix} + t\begin{pmatrix}-4\\-5\\-6\end{pmatrix} : t\in\mathbb{R} \right\},$$
$$\left\{ \begin{pmatrix}5\\7\\9\end{pmatrix} + t\begin{pmatrix}4\\5\\6\end{pmatrix} : t\in\mathbb{R} \right\},\qquad \frac{x-1}{4} = \frac{y-2}{5} = \frac{z-3}{6},\qquad \frac{x+3}{4} = \frac{y+3}{5} = \frac{z+3}{6}.$$
Find at least one more vector equation and one more symmetric equation. Find at least two
different parametric equations.

4. For the following vectors ~u and ~v decide whether they are orthogonal, parallel or neither.
   Compute the cosine of the angle between them. If they are parallel, find real numbers λ and µ
   such that ~v = λ~u and ~u = µ~v.
   (a) ~v = (1, 4)^t, ~u = (5, −2)^t,      (b) ~v = (2, 4)^t, ~u = (1, 2)^t,
   (c) ~v = (3, 4)^t, ~u = (−8, 6)^t,      (d) ~v = (−6, 4)^t, ~u = (3, −2)^t.

5. (a) For the following pairs ~v and ~w, find all α ∈ R such that ~v and ~w are parallel:
      (i) ~v = (1, 4)^t, ~w = (α, −2)^t,   (ii) ~v = (2, α)^t, ~w = (1 + α, 2)^t,   (iii) ~v = (α, 5)^t, ~w = (1 + α, 2α)^t.
   (b) For the following pairs ~v and ~w, find all α ∈ R such that ~v and ~w are perpendicular:
      (i) ~v = (1, 4)^t, ~w = (α, −2)^t,   (ii) ~v = (2, α)^t, ~w = (α, 2)^t,   (iii) ~v = (α, 5)^t, ~w = (1 + α, 2)^t.
   
6. Let ~a = (1, 3)^t and ~b = (5, 2)^t.
   (a) Compute proj_~b ~a and proj_~a ~b.
   (b) Find all vectors ~v ∈ R² such that ‖proj_~a ~v‖ = 0. Describe this set geometrically.
   (c) Find all vectors ~v ∈ R² such that ‖proj_~a ~v‖ = 2. Describe this set geometrically.
   (d) Is there a vector ~x such that proj_~a ~x ∥ ~b?
       Is there a vector ~x such that proj_~x ~a ∥ ~b?

7. Let ~a, ~b ∈ R² with ~a ≠ ~0.
   (a) Show that ‖proj_~a ~b‖ ≤ ‖~b‖.
   (b) What must ~a and ~b satisfy in order that ‖proj_~a ~b‖ = ‖~b‖?

8. Let ~a, ~b ∈ Rⁿ with ~b ≠ ~0.
   (a) Show that ‖proj_~b ~a‖ ≤ ‖~a‖.
   (b) Find conditions on ~a and ~b such that ‖proj_~b ~a‖ = ‖~a‖.
   (c) Is it true that ‖proj_~b ~a‖ ≤ ‖~b‖?

9. (a) Compute the area of the parallelogram whose adjacent vertices are A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5), and find the fourth vertex.
   (b) Compute the area of the triangle with vertices A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5).
   (c) Compute the volume of the parallelepiped determined by the vectors
       ~u = (5, 2, 1)^t, ~v = (−1, 4, 3)^t, ~w = (1, −2, 7)^t.

10. (a) Show that there is no neutral element for the cross product in R³. That is: show that there is no vector ~v ∈ R³ such that ~v × ~w = ~w for all ~w ∈ R³.
    (b) Let ~w = (1, 2, 3)^t ∈ R³.
        (i) Find all vectors ~a, ~b ∈ R³ such that ~a × ~w = (2, 1, 0)^t and ~b × ~w = (2, −1, 0)^t.
        (ii) Find all vectors ~v ∈ R³ such that ⟨~v, ~w⟩ = 4.

11. Given the lines L1 and L2 and the point P below, determine:
    • whether L1 and L2 are parallel,
    • whether L1 and L2 have a point of intersection,
    • whether P belongs to L1 and/or L2,
    • a line parallel to L2 which passes through P.
    (a) L1 : ~r(t) = (3, 4, 5)^t + t(1, −1, 3)^t,  L2 : (x − 3)/2 = (y − 2)/3 = (z − 1)/4,  P(5, 2, 11).
    (b) L1 : ~r(t) = (2, 1, −7)^t + t(1, 2, 3)^t,  L2 : x = t + 1, y = 3t − 4, z = −t + 2,  P(5, 7, 2).

12. In R³ consider the plane E given by E : 3x − 2y + 4z = 16.
    (a) Find at least three points which belong to E.
    (b) Find a point on E and two vectors ~v and ~w in E which are not parallel to each other.
    (c) Find a point on E and a vector ~n which is orthogonal to E.
    (d) Find a point on E and two vectors ~a and ~b in E with ~a ⊥ ~b.

13. For the points P(1, 1, 1), Q(1, 0, −1) and each of the following planes E:
    • Find the equation of the plane.
    • Determine whether P belongs to the plane.
    • Find a line which is orthogonal to E and contains the point Q.
    (i) E is the plane which contains the point A(1, 0, 1) and is parallel to the vectors ~v = (1, 1, 0)^t and ~w = (3, 2, 1)^t.
    (ii) E is the plane which contains the points A(1, 0, 1), B(2, 3, 4), C(3, 2, 4).
    (iii) E is the plane which contains the point A(1, 0, 1) and is orthogonal to the vector ~n = (3, 2, 1)^t.

14. Consider the plane E : 2x − y + 3z = 9 and the line L : x = 3t + 1, y = −2t + 3, z = 5t.
    (a) Find E ∩ L.
    (b) Find a line G which intersects neither the plane E nor the line L. Prove your claim.
        How many lines with this property are there?

15. In R³ consider the plane E given by E : 3x − 2y + 4z = 16.
    (a) Show that the vectors ~a = (2, 1, −1)^t, ~b = (2, 5, 1)^t and ~v = (2, 3, 0)^t are parallel to the plane E.
    (b) Find numbers λ, µ ∈ R such that λ~a + µ~b = ~v.
    (c) Show that the vector ~c = (1, 1, 1)^t is not parallel to the plane E and find vectors ~c∥ and ~c⊥
        such that ~c∥ is parallel to E, ~c⊥ is orthogonal to E and ~c = ~c∥ + ~c⊥.


16. Let E be a plane in R³ and let ~a, ~b be vectors parallel to E. Show that for all λ, µ ∈ R
    the vector λ~a + µ~b is parallel to the plane.

17. Let V be a vector space. Prove the following:
    (a) The neutral element is unique.
    (b) 0v = O for all v ∈ V.
    (c) λO = O for all λ ∈ R.
    (d) For every v ∈ V, its additive inverse ṽ is unique.
    (e) For every v ∈ V, its additive inverse ṽ satisfies ṽ = (−1)v.

18. For each of the following sets decide whether it is a vector space with the usual sum and product by scalars.
    (a) V = { (a, a)^t : a ∈ R },
    (b) V = { (a, a²)^t : a ∈ R },
    (c) V is the set of all continuous functions R → R.
    (d) V is the set of all continuous functions f : R → R with f(4) = 0.
    (e) V is the set of all continuous functions f : R → R with f(4) = 1.

Chapter 3

Linear Systems and Matrices

We will rewrite linear systems as matrix equations in order to solve them systematically and effi-
ciently. We will interpret matrices as linear maps from Rn to Rm which then allows us to define

algebraic operations with matrices, specifically we will define the sum and the composition (=mul-
tiplication) of matrices which then leads naturally to the concept of the inverse of a matrix. We
can interpret a matrix as a system which takes some input (the variables x1 , . . . , xn ) and gives us
back as output b1 , . . . , bm via A~x = ~b. Sometimes we are given the input and we want to find
the bj ; and sometimes we are given the output b1 , . . . , bm and we want to find the input x1 , . . . , xn
which produces the desired output. The latter question is usually the harder one. We will see that
a unique input for any given output exists if and only if the matrix is invertible. We can refine
the concept of invertibility of a matrix. We say that A has a left inverse if for any ~b the equation
A~x = ~b has at most one solution, and we say that it has a right inverse if A~x = ~b has at least one
solution for any ~b.
We will discuss in detail the Gauß and Gauß-Jordan elimination which helps us to find solutions
of a given linear system and the inverse of a matrix if it exists. In Section 3.7 we define the trans-
position of matrices and we have a first look at symmetric matrices. They will become important
in Chapter 8. We will also see the interplay of transposing a matrix and the inner product. In the
last section of this chapter we define the so-called elementary matrices which can be seen as the
building blocks of invertible matrices. We will use them in Chapter 4 to prove important properties
of the determinant.

3.1 Linear systems and Gauß and Gauß-Jordan elimination


We start with a linear system as in Definition 1.8:

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\ \,\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m.
\end{aligned} \tag{3.1}$$


Recall that the system is called consistent if it has at least one solution; otherwise it is called
inconsistent. According to (1.4) and (1.5), its associated coefficient matrix and augmented coefficient matrix are
$$A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & & & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \tag{3.2}$$
and
$$(A|b) = \left(\begin{array}{cccc|c} a_{11} & a_{12} & \dots & a_{1n} & b_1\\ a_{21} & a_{22} & \dots & a_{2n} & b_2\\ \vdots & & & \vdots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} & b_m \end{array}\right). \tag{3.3}$$

Definition 3.1. The set of all matrices with m rows and n columns is denoted by M (m × n). If we

want to emphasise that the matrix has only real entries, then we write M (m × n, R) or MR (m × n).
Another frequently used notation is Mm×n . A matrix A is called a square matrix if its number of
rows is equal to its number of columns.

In order to solve (3.1), we could use the first equation, solve for x1 and insert this in all the other
equations. This gives us a new system with m − 1 equations for n − 1 unknowns. Then we solve
the next equation for x2 , insert it in the other equations, and we continue like this until we have
only one equation left. This of course will fail if for example a11 = 0 because in this case we cannot
RA
solve the first equation for x1 . We could save our algorithm by saying: we solve the first equation
for the first unknown whose coefficient is different from 0 (or we could take an equation where the
coefficient of x1 is different from 0 and declare this one to be our first equation. After all, we can
order the equations as we please). Even with this modification, the process of solving and replacing
is error prone.
Another idea is to manipulate the equations. The question is: Which changes to the equations
are allowed without changing the information contained in the system? We don’t want to destroy
information (thus potentially allowing for more solutions) nor introduce more information (thus
potentially eliminating solutions). Or, in more mathematical terms, what changes to the given
D

system of equations result in an equivalent system? Here we call two systems equivalent if they
have the same set of solutions.
We can check whether the new system is equivalent to the original one by checking whether there is a way to restore
the original one.
For example, if we exchange the first and the second row, then nothing really happened and we end
up with an equivalent system. We can come back to the original equation by simply exchanging
again the first and the second row.
If we multiply the first equation on both sides by some non-zero factor, let's say by 2, then
again nothing changes. Assume for example that the first equation is x + 3y = 7. If we multiply
both sides by 2, we obtain 2x + 6y = 14. Clearly, if a pair (x, y) satisfies the first equation, then
it satisfies also the second one and vice versa. Given the new equation 2x + 6y = 14, we can easily
restore the old one by simply dividing both sides by 2.


If we take an equation and multiply both of its sides by 0, then we destroy information because we
end up with 0 = 0 and there is no way to get back the information that was stored in the original
equation. So this is not an allowed operation.
Show that squaring both sides of an equation in general does not give an equivalent equation.
Are there cases when it does?
Squaring an equation or taking the logarithm on both sides or other such things usually are not
interesting to us because the resulting equation will no longer be a linear equation.
Let us denote the jth row of our linear system (3.1) by Rj . The following table contains the so-called
elementary row operations. They are the “allowed” operations because they do not
alter the information contained in a given linear system since they are reversible.
The first column describes the operation in words, the second introduces their shorthand notation
and in the last column we give the inverse operation which allows us to get back to the original system.

     Elementary operation                                      Notation           Inverse operation
 1   Swap rows j and k.                                        Rj ↔ Rk            Rj ↔ Rk
 2   Multiply row j by some λ ≠ 0.                             Rj → λRj           Rj → (1/λ)Rj
 3   Replace row k by the sum of row k and λ times Rj          Rk → Rk + λRj      Rk → Rk − λRj
     and leave row j unchanged.
RA
Exercise. Show that the operation in the third column reverses the operation from the second
column.

Exercise. Show that instead of the operation 3 it suffices to take 3’ : Rk → Rk + Rj because
3 can be written as a composition of operations of the form 2 and 3’ . Show how this can
be done.

Exercise. Show that in reality 1 is not necessary since it can be achieved by a composition of
operations of the form 2 and 3 (or 2 and 3’ ). Show how this can be done.

Let us see in an example how this works.

Example 3.2.
$$\begin{cases} x_1 + x_2 - x_3 = 1\\ 2x_1 + 3x_2 + x_3 = 3\\ \phantom{2x_1 + {}}4x_2 + x_3 = 7 \end{cases}
\;\xrightarrow{R_2 \to R_2 - 2R_1}\;
\begin{cases} x_1 + x_2 - x_3 = 1\\ x_2 + 3x_3 = 1\\ 4x_2 + x_3 = 7 \end{cases}
\;\xrightarrow{R_3 \to R_3 - 4R_2}\;
\begin{cases} x_1 + x_2 - x_3 = 1\\ x_2 + 3x_3 = 1\\ -11x_3 = 3 \end{cases}$$
$$\;\xrightarrow{R_3 \to -\frac{1}{11}R_3}\;
\begin{cases} x_1 + x_2 - x_3 = 1\\ x_2 + 3x_3 = 1\\ x_3 = -3/11. \end{cases}$$


Here we can stop because it is already quite easy to read off the solution. Proceeding from the
bottom to the top, we obtain

x3 = −3/11, x2 = 1 − 3x3 = 20/11, x1 = 1 + x3 − x2 = −12/11.

Note that we could continue our row manipulations to clean up the system even more:
$$\cdots \longrightarrow
\begin{cases} x_1 + x_2 - x_3 = 1\\ x_2 + 3x_3 = 1\\ x_3 = -3/11 \end{cases}
\;\xrightarrow{R_2 \to R_2 - 3R_3}\;
\begin{cases} x_1 + x_2 - x_3 = 1\\ x_2 = 20/11\\ x_3 = -3/11 \end{cases}
\;\xrightarrow{R_1 \to R_1 + R_3}\;
\begin{cases} x_1 + x_2 = 8/11\\ x_2 = 20/11\\ x_3 = -3/11 \end{cases}$$
$$\;\xrightarrow{R_1 \to R_1 - R_2}\;
\begin{cases} x_1 = -12/11\\ x_2 = 20/11\\ x_3 = -3/11. \end{cases}$$

Our strategy was to apply manipulations that successively eliminate the unknowns in the lower
equations and we aimed to get to a form of the system of equations where the last one contains the
least number of unknowns possible.

Convince yourself that the first step of our reduction process is equivalent to solving the first
equation for x1 and inserting it in the other equations in order to eliminate it there. The next step
in the reduction is equivalent to solving the new second equation for x2 and inserting it into the third
equation.

It is important to note that there are infinitely many different routes leading to the final result,
but usually some are quicker than others.

Let us analyse what we did. We looked at the coefficients of the system and we applied trans-
formations such that they become 0 because this results in removing the corresponding unknowns
from the equations. So in the example above we could just as well delete all the xj , keep only the
augmented coefficient matrix and perform the row operations in the matrix. Of course, we have
to remember that the numbers in the first column are the coefficients of x1 , those in the second
column are the coefficients of x2 , etc. Then our calculations are translated into the following:
     
$$\left(\begin{array}{ccc|c} 1 & 1 & -1 & 1\\ 2 & 3 & 1 & 3\\ 0 & 4 & 1 & 7 \end{array}\right)
\xrightarrow{R_2 \to R_2 - 2R_1}
\left(\begin{array}{ccc|c} 1 & 1 & -1 & 1\\ 0 & 1 & 3 & 1\\ 0 & 4 & 1 & 7 \end{array}\right)
\xrightarrow{R_3 \to R_3 - 4R_2}
\left(\begin{array}{ccc|c} 1 & 1 & -1 & 1\\ 0 & 1 & 3 & 1\\ 0 & 0 & -11 & 3 \end{array}\right)$$
$$\xrightarrow{R_3 \to -\frac{1}{11}R_3}
\left(\begin{array}{ccc|c} 1 & 1 & -1 & 1\\ 0 & 1 & 3 & 1\\ 0 & 0 & 1 & -3/11 \end{array}\right).$$


If we translate this back into a linear system, we get

x1 + x2 − x3 = 1
x2 + 3x3 = 1
x3 = −3/11

which can be easily solved from the bottom up.


We did exactly the same calculations as we did with the system of equations but it looks much
tidier in matrix notation since we do not have to write down the unknowns all the time.
If we want to solve a linear system we write it as an augmented matrix and then we perform row
operations until we reach a “nice” form where we can read off the solutions if there are any.
But what is a “nice” form? Remember that if a coefficient is 0, then the corresponding unknown
does not show up in the equation.

• All rows with only zeros should be at the bottom.


• In the first non-zero equation from the bottom, we want as few unknowns as possible and we
want them to be the last unknowns. So as last row we want one that has only zeros in it
or one that starts with zeros until finally we get a non-zero number, say in column k. This
non-zero number can always be made equal to 1 by dividing the row by it. Now we know
how the unknowns xk , . . . , xn are related. Note that all the other unknowns x1 , . . . , xk−1 have
disappeared from the equation since their coefficients are 0.
If k = n, as in our example above, then there is only one solution for xn .
• The second non-zero row from the bottom should also start with zeros until we get to a column,
say column l, with a non-zero entry which we can always make equal to 1. This column should
be to the left of column k (that is, we want l < k), because now we can use what we
know from the last row about the unknowns xk , . . . , xn to say something about the unknowns
xl , . . . , xk−1 .
• We continue like this until all rows are as we want them.

Note that the form of such a “nice” matrix looks a bit like it had a triangle consisting of only zeros
in its lower left part. There may be zeros in the upper right part. If a matrix has the form we just
described, we say it is in row echelon form. Let us give a precise definition.

Definition 3.3 (Row echelon form). We say that a matrix A ∈ M (m × n) is in row echelon
form if:

• All its zero rows are the last rows.


• The first non-zero entry in a row is 1. It is called the pivot of the row.
• The pivot of any row is strictly to the right of that of the row above.

Definition 3.4 (Reduced row echelon form). We say that a matrix A ∈ M (m×n) is in reduced
row echelon form if:

• A is in row echelon form.


• All the entries in A which are above a pivot are equal to 0.


Let us quickly see some examples.

Examples 3.5.
(a) The following matrices are in reduced row echelon form. The pivots are highlighted.
   
  1 1 0 0
1 0 0 0
! !
 1 6 0 0 , 0 0 1 0
  

,
 1 6 0 1 , 1 0 0 0 , 0 1 0 0 .
0 0 1 0  0 0 0 1
0 0 1 1 0 0 1 1 0 0 1 0
   
0 0 0 1  0 0 0 0  
0 0 0 1
0 0 0 0

(b) The following matrices are in row echelon form but not in reduced row echelon form. The
pivots are highlighted.
   
  1 6 3 1 ! ! 1 0 5 0
1 6 3 1 0 0 1 4

FT
  
1 6 1 0 , 1 0 2 0 , 0 1 0 0
 0 0 1 1 ,  ,
  .
0 0 0 1
  
0 0 1 1 0 0 1 1 0 0 1 0
  
0 0 0 1 0 0 0 0   
0 0 0 1
0 0 0 0

(c) The following matrices are not in row echelon form:


 
$$\begin{pmatrix} 1 & 6 & 0 & 0\\ 2 & 0 & 1 & 0\\ 3 & 0 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 1 & 6 & 0 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 1 \end{pmatrix},\quad
\begin{pmatrix} 0 & 3 & 1 & 1\\ 1 & 6 & 0 & 1 \end{pmatrix},\quad
\begin{pmatrix} 0 & 0 & 0 & 1\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix},\quad
\begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 6 & 0 & 0 \end{pmatrix}.$$

Exercise. • Say why the matrices in (b) are not in reduced row echelon form and use ele-
mentary row operations to transform them into a matrix in reduced row echelon form.
• Say why the matrices in (c) are not in row echelon form and use elementary row operations
to transform them into a matrix in row echelon form. Transform them further to obtain a
matrix in reduced row echelon form.

Question 3.1
If we interchange two rows in a matrix this corresponds to writing down the given equations in a
different order. What is the effect on a linear system if we interchange two columns?

Remember: if we translate a linear system to an augmented coefficient matrix (A|b), perform the
row operations to arrive at a (reduced) row echelon form (A′|b′), and translate back to a linear
system, then this new system contains exactly the same information as the original one but it is
“tidied up” and it is easy to determine its solution.
The natural question now is: Can we always transform a matrix into one in (reduced) row echelon
form? The answer is that this is always possible and we can even give an algorithm for it.


Gaußian elimination. Let A ∈ M (m × n) and assume that A is not the zero matrix. Gaußian
elimination is an algorithm that transforms A into a row echelon form. The steps are as follows:

• Find the first column which does not consist entirely of zeros. Interchange rows appropriately
such that the entry in that column in the first row is different from zero.
• Multiply the first row by an appropriate number so that its first non-zero entry is 1.
• Use the first row to eliminate all coefficients below its pivot.
• Now our matrix looks like
$$\begin{pmatrix}
0 & \cdots & 0 & 1 & * & \cdots & *\\
0 & \cdots & 0 & 0 & & &\\
\vdots & & \vdots & \vdots & & A' &\\
0 & \cdots & 0 & 0 & & &
\end{pmatrix}$$
where the $*$ are arbitrary numbers and $A'$ is a matrix with fewer columns than $A$ and $m-1$ rows.
Now repeat the process for $A'$. Note that in doing so the first columns do not change since
we are only manipulating zeros.

Gauß-Jordan elimination. Let A ∈ M (m × n). The Gauß-Jordan elimination is an algorithm


that transforms A into a reduced row echelon form. The steps are as follows:

• Use the Gauß elimination to obtain a row echelon form of A.


• Use the pivots to eliminate the non-zero entries in the columns above them.
Of course, if we do a reduction by hand, then we do not have to follow the steps of the algorithm
strictly if it makes calculations easier. However, these algorithms always work and therefore can be
programmed so that a computer can perform them.
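As an illustration of the previous two algorithms, here is a minimal Python/NumPy sketch of Gauß-Jordan elimination. It is only a sketch under simplifying assumptions (exact arithmetic is replaced by a small tolerance eps, and no special pivoting strategy is used); the function name gauss_jordan and the tolerance are our own choices, not part of the notes.

```python
import numpy as np

def gauss_jordan(M, eps=1e-12):
    """Transform a copy of M into reduced row echelon form using the
    elementary row operations 1-3 from the table above."""
    A = np.array(M, dtype=float)
    m, n = A.shape
    pivot_row = 0
    for col in range(n):
        if pivot_row >= m:
            break
        # Find a row at or below pivot_row with a non-zero entry in this column.
        candidates = np.where(np.abs(A[pivot_row:, col]) > eps)[0]
        if candidates.size == 0:
            continue                               # no pivot in this column
        r = pivot_row + candidates[0]
        A[[pivot_row, r]] = A[[r, pivot_row]]      # operation 1: swap rows
        A[pivot_row] /= A[pivot_row, col]          # operation 2: make the pivot equal to 1
        for i in range(m):                         # operation 3: clear the rest of the column
            if i != pivot_row:
                A[i] -= A[i, col] * A[pivot_row]
        pivot_row += 1
    return A

# Augmented matrix of Example 3.2.
M = [[1, 1, -1, 1],
     [2, 3,  1, 3],
     [0, 4,  1, 7]]
print(gauss_jordan(M))   # last column: -12/11, 20/11, -3/11
```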

Definition 3.6. Two m × n matrices A and B are called row equivalent if there are elementary
row operations that transform A into B. (Clearly then B can be transformed by row operations
into A.)

Remark. Let A be an m × n matrix.


• A can be transformed into infinitely many different row echelon forms.
• There is only one reduced row echelon form that A can be transformed into.

Prove the assertions above.

Before we give examples, we note that from the row echelon form we can immediately tell how
many solutions the corresponding linear system has.

Theorem 3.7. Let (A|b) be the augmented coefficient matrix of a linear m × n system and let
(A′|b′) be a row reduced form.


(1) If there is a row of the form (0 · · · 0|β) with β 6= 0, then the system has no solution.
(2) If there is no row of the form (0 · · · 0|β) with β 6= 0, then one of the following holds:
(2.1) If there is a pivot in every column of A′ then the system has exactly one solution.
(2.2) If there is a column in A′ without a pivot, then the system has infinitely many solutions.

Proof. (1) If (A′|b′) has a row of the form (0 · · · 0|β) with β ≠ 0, then the corresponding equation
is 0x1 + · · · + 0xn = β which clearly has no solution.
(2) Now assume that (A′|b′) has no row of the form (0 · · · 0|β) with β ≠ 0. In case (2.1), the
transformed matrix is then of the form
$$\left(\begin{array}{ccccc|c}
1 & a'_{12} & a'_{13} & \cdots & a'_{1n} & b'_1\\
0 & 1 & a'_{23} & \cdots & a'_{2n} & b'_2\\
 & & \ddots & & \vdots & \vdots\\
0 & \cdots & 0 & 1 & a'_{(n-1)n} & b'_{n-1}\\
0 & \cdots & & 0 & 1 & b'_n\\
0 & \cdots & & & 0 & 0\\
\vdots & & & & \vdots & \vdots\\
0 & \cdots & & & 0 & 0
\end{array}\right) \tag{3.4}$$
Note that the last zero rows appear only if n < m. This system clearly has the unique solution
$$x_n = b'_n, \quad x_{n-1} = b'_{n-1} - a'_{(n-1)n}x_n, \quad \dots, \quad x_1 = b'_1 - a'_{1n}x_n - \cdots - a'_{12}x_2.$$

In case (2.2), the transformed matrix is then of the form
$$\left(\begin{array}{ccccccc|c}
0 \cdots 0 & 1 & * & * & \cdots & * & * & b'_1\\
0 \cdots 0 & 0 & 1 & * & \cdots & * & * & b'_2\\
 & & & 1 & * & \cdots & * & b'_3\\
 & & & & \ddots & & \vdots & \vdots\\
0 \cdots 0 & 0 & \cdots & & 0 & 1 & * & b'_k\\
0 \cdots 0 & 0 & \cdots & & & 0 & 0 & 0\\
\vdots & & & & & & \vdots & \vdots\\
0 \cdots 0 & 0 & \cdots & & & 0 & 0 & 0
\end{array}\right) \tag{3.5}$$
where the stars stand for numbers. (If we continue the reduction until we get to the reduced
row echelon form, then the numbers over the 1’s must be zeros.) Note that we can choose the
unknowns which correspond to the columns without a pivot arbitrarily. The unknowns which
correspond to the columns with pivots can then always be chosen in a unique way such that
the system is satisfied.


Definition 3.8. The variables which correspond to columns without pivots are called free variables.

We will come back to this theorem later on page 99 (the theorem is stated again in the coloured
box).
From the above theorem we get as an immediate consequence the following.

Theorem 3.9. A linear system has either no, exactly one or infinitely many solutions.

Now let us see some examples.

Example 3.10 (Example with a unique solution (no free variables)). We consider the
linear system
2x1 + 3x2 + x3 = 12,
−x1 + 2x2 + 3x3 = 15, (3.6)
3x1 − x3 = 1.
Solution. We form the augmented matrix and perform row reduction.

$$\left(\begin{array}{ccc|c} 2 & 3 & 1 & 12\\ -1 & 2 & 3 & 15\\ 3 & 0 & -1 & 1 \end{array}\right)
\xrightarrow{R_1 \to R_1 + 2R_2}
\left(\begin{array}{ccc|c} 0 & 7 & 7 & 42\\ -1 & 2 & 3 & 15\\ 3 & 0 & -1 & 1 \end{array}\right)
\xrightarrow{R_3 \to R_3 + 3R_2}
\left(\begin{array}{ccc|c} 0 & 7 & 7 & 42\\ -1 & 2 & 3 & 15\\ 0 & 6 & 8 & 46 \end{array}\right)$$
$$\xrightarrow{R_1 \leftrightarrow R_2}
\left(\begin{array}{ccc|c} -1 & 2 & 3 & 15\\ 0 & 7 & 7 & 42\\ 0 & 6 & 8 & 46 \end{array}\right)
\xrightarrow[R_2 \to \frac{1}{7}R_2]{R_1 \to -R_1}
\left(\begin{array}{ccc|c} 1 & -2 & -3 & -15\\ 0 & 1 & 1 & 6\\ 0 & 6 & 8 & 46 \end{array}\right)$$
$$\xrightarrow{R_3 \to R_3 - 6R_2}
\left(\begin{array}{ccc|c} 1 & -2 & -3 & -15\\ 0 & 1 & 1 & 6\\ 0 & 0 & 2 & 10 \end{array}\right)
\xrightarrow{R_3 \to \frac{1}{2}R_3}
\left(\begin{array}{ccc|c} 1 & -2 & -3 & -15\\ 0 & 1 & 1 & 6\\ 0 & 0 & 1 & 5 \end{array}\right).$$
This shows that the system (3.6) is equivalent to the system
x1 − 2x2 − 3x3 = −15,
x2 + x3 = 6, (3.7)
x3 = 5

whose solution is easy to write down:


x3 = 5, x2 = 6 − x3 = 1, x1 = −15 + 2x2 + 3x3 = 2. 

Remark. If we continue the reduction process until we reach the reduced row echelon form, then
we obtain
     
$$\cdots \longrightarrow
\left(\begin{array}{ccc|c} 1 & -2 & -3 & -15\\ 0 & 1 & 1 & 6\\ 0 & 0 & 1 & 5 \end{array}\right)
\xrightarrow{R_2 \to R_2 - R_3}
\left(\begin{array}{ccc|c} 1 & -2 & -3 & -15\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 5 \end{array}\right)
\xrightarrow{R_1 \to R_1 + 3R_3}
\left(\begin{array}{ccc|c} 1 & -2 & 0 & 0\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 5 \end{array}\right)$$
$$\xrightarrow{R_1 \to R_1 + 2R_2}
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 2\\ 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 5 \end{array}\right).$$


Therefore the system (3.6) is equivalent to the system


x1 = 2,
x2 = 1,
x3 = 5.
whose solution can be read off immediately to be

x3 = 5, x2 = 1, x1 = 2.
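As a quick cross-check (only an illustration, not part of the notes' method), NumPy's built-in solver reproduces this solution of (3.6):

```python
import numpy as np

# Coefficient matrix and right hand side of system (3.6).
A = np.array([[ 2.0, 3.0,  1.0],
              [-1.0, 2.0,  3.0],
              [ 3.0, 0.0, -1.0]])
b = np.array([12.0, 15.0, 1.0])

x = np.linalg.solve(A, b)
print(x)                        # approximately [2. 1. 5.]
print(np.allclose(A @ x, b))    # True
```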

Example 3.11 (Example with two free variables). We consider the linear system
3x1 − 2x2 + 3x3 + 3x4 = 3,
2x1 + 6x2 + 2x3 − 9x4 = 2,        (3.8)
x1 + 2x2 + x3 − 3x4 = 1.
Solution. We form the augmented matrix and perform row reduction.

$$\left(\begin{array}{cccc|c} 3 & -2 & 3 & 3 & 3\\ 2 & 6 & 2 & -9 & 2\\ 1 & 2 & 1 & -3 & 1 \end{array}\right)
\xrightarrow{R_2 \to R_2 - 2R_3}
\left(\begin{array}{cccc|c} 3 & -2 & 3 & 3 & 3\\ 0 & 2 & 0 & -3 & 0\\ 1 & 2 & 1 & -3 & 1 \end{array}\right)
\xrightarrow{R_1 \to R_1 - 3R_3}
\left(\begin{array}{cccc|c} 0 & -8 & 0 & 12 & 0\\ 0 & 2 & 0 & -3 & 0\\ 1 & 2 & 1 & -3 & 1 \end{array}\right)$$
$$\xrightarrow{R_1 \leftrightarrow R_3}
\left(\begin{array}{cccc|c} 1 & 2 & 1 & -3 & 1\\ 0 & 2 & 0 & -3 & 0\\ 0 & -8 & 0 & 12 & 0 \end{array}\right)
\xrightarrow{R_3 \to R_3 + 4R_2}
\left(\begin{array}{cccc|c} 1 & 2 & 1 & -3 & 1\\ 0 & 2 & 0 & -3 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right)
\xrightarrow{R_1 \to R_1 - R_2}
\left(\begin{array}{cccc|c} 1 & 0 & 1 & 0 & 1\\ 0 & 2 & 0 & -3 & 0\\ 0 & 0 & 0 & 0 & 0 \end{array}\right).$$

The 3rd and the 4th column do not have pivots and we see that the system (3.8) is equivalent to
the system
$$x_1 + x_3 = 1, \qquad 2x_2 - 3x_4 = 0.$$
Clearly we can choose x3 and x4 (the unknowns corresponding to the columns without a pivot)
arbitrarily. We will always be able to adjust x1 and x2 such that the system is satisfied. In order
to make it clear that x3 and x4 are our free variables, we sometimes call them x3 = t and x4 = s.
Then every solution of the system (3.8) is of the form
$$x_1 = 1 - t, \quad x_2 = \tfrac{3}{2}s, \quad x_3 = t, \quad x_4 = s, \qquad\text{for arbitrary } s, t \in \mathbb{R}.$$

In vector form we can write the solution as follows. A tuple (x1 , x2 , x3 , x4 ) is a solution of (3.8) if
and only if the corresponding vector is of the form
         
$$\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}1-t\\ \tfrac{3}{2}s\\ t\\ s\end{pmatrix} = \begin{pmatrix}1\\0\\0\\0\end{pmatrix} + t\begin{pmatrix}-1\\0\\1\\0\end{pmatrix} + s\begin{pmatrix}0\\ 3/2\\ 0\\ 1\end{pmatrix} \qquad\text{for some } s, t \in \mathbb{R}.$$
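The following small NumPy check (an illustration only; the sample values of s and t are arbitrary) confirms that every vector of this form indeed solves (3.8):

```python
import numpy as np

A = np.array([[3.0, -2.0, 3.0,  3.0],
              [2.0,  6.0, 2.0, -9.0],
              [1.0,  2.0, 1.0, -3.0]])
b = np.array([3.0, 2.0, 1.0])

# Particular solution and the two directions coming from the free variables.
x0  = np.array([1.0, 0.0, 0.0, 0.0])
d_t = np.array([-1.0, 0.0, 1.0, 0.0])
d_s = np.array([0.0, 1.5, 0.0, 1.0])

for t, s in [(0.0, 0.0), (2.0, -1.0), (-3.0, 4.0)]:
    x = x0 + t * d_t + s * d_s
    print(np.allclose(A @ x, b))   # True for every choice of s and t
```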


Remark. Geometrically, the set of all solutions is an affine plane in R4 .

Example 3.12 (Example with no solution). We consider the linear system


2x1 + x2 − x3 = 7,
3x1 + 2x2 − 2x3 = 7, (3.9)
−x1 + 3x2 − 3x3 = 2.
Solution. We form the augmented matrix and perform row reduction.
     
$$\left(\begin{array}{ccc|c} 2 & 1 & -1 & 7\\ 3 & 2 & -2 & 7\\ -1 & 3 & -3 & 2 \end{array}\right)
\xrightarrow{R_1 \to R_1 + 2R_3}
\left(\begin{array}{ccc|c} 0 & 7 & -7 & 11\\ 3 & 2 & -2 & 7\\ -1 & 3 & -3 & 2 \end{array}\right)
\xrightarrow{R_2 \to R_2 + 3R_3}
\left(\begin{array}{ccc|c} 0 & 7 & -7 & 11\\ 0 & 11 & -11 & 13\\ -1 & 3 & -3 & 2 \end{array}\right)$$
$$\xrightarrow{R_1 \leftrightarrow R_3}
\left(\begin{array}{ccc|c} -1 & 3 & -3 & 2\\ 0 & 11 & -11 & 13\\ 0 & 7 & -7 & 11 \end{array}\right)
\xrightarrow{R_3 \to 11R_3 - 7R_2}
\left(\begin{array}{ccc|c} -1 & 3 & -3 & 2\\ 0 & 11 & -11 & 13\\ 0 & 0 & 0 & 30 \end{array}\right).$$

The last line tells us immediately that the system (3.9) has no solution because there is no choice
of x1 , x2 , x3 such that 0x1 + 0x2 + 0x3 = 30. 

You should now have understood

• what it means that two linear systems are equivalent,


• which row operations transform a given system into an equivalent one and why this is so,
• when a matrix is in row echelon and a reduced row echelon form,
• why the linear system associated to a matrix in (reduced) echelon form is easy to solve,
• what the Gauß- and Gauß-Jordan elimination do and why they always work,
• that the Gauß- and Gauß-Jordan elimination is nothing very magical; essentially it is the
same as solving for variables and replacing in the remaining equations. It only does so in a
systematic way;

• why a given matrix can be transformed into many different row echelon forms, but in only
one reduced row echelon form,
• why a linear system always has either no, exactly one or infinitely many solutions,
• etc.
You should now be able to
• identify if a matrix is in row echelon or a reduced row echelon form,
• use the Gauß- or Gauß-Jordan elimination to solve linear systems,
• say if a system has no, exactly one or infinitely many solutions if you know its echelon form,
• etc.


3.2 Homogeneous linear systems


In this short section we deal with the special case of homogeneous linear systems. Recall that a
linear system (3.1) is called homogeneous if b1 = · · · = bm = 0. Such a system always has at least
one solution, the so-called trivial solution x1 = · · · = xn = 0. This is also clear from Theorem 3.7
since no matter what row operations we perform, the right side will always remain equal to 0. Note
that if we perform Gauß or Gauß-Jordan elimination, there is no need to write down the right hand
side since it always will be 0.
If we adapt Theorem 3.7 to the special case of a homogeneous system, we obtain the following.

Theorem 3.13. Let A be the coefficient matrix of a homogeneous linear m × n system and let A′
be a row reduced form.

(i) If there is a pivot in every column then the system has exactly one solution, namely the trivial
solution.
(ii) If there is a column without a pivot, then the system has infinitely many solutions.

Corollary 3.14. A homogeneous linear system has either exactly one or infinitely many solutions.

Let us see an example.

Example 3.15 (Example of a homogeneous system with infinitely many solutions). We


consider the linear system
x1 + 2x2 − x3 = 0,
2x1 + 3x2 − 2x3 = 0, (3.10)
3x1 − x2 − 3x3 = 0.

Solution. We perform row reduction on the associated matrix.


     
$$\begin{pmatrix} 1 & 2 & -1\\ 2 & 3 & -2\\ 3 & -1 & -3 \end{pmatrix}
\xrightarrow{R_2 \to R_2 - 2R_1}
\begin{pmatrix} 1 & 2 & -1\\ 0 & -1 & 0\\ 3 & -1 & -3 \end{pmatrix}
\xrightarrow{R_3 \to R_3 - 3R_1}
\begin{pmatrix} 1 & 2 & -1\\ 0 & -1 & 0\\ 0 & -7 & 0 \end{pmatrix}
\xrightarrow[\text{the 2nd column}]{\text{use } R_2 \text{ to clear}}
\begin{pmatrix} 1 & 0 & -1\\ 0 & -1 & 0\\ 0 & 0 & 0 \end{pmatrix}
\xrightarrow{R_2 \to -R_2}
\begin{pmatrix} 1 & 0 & -1\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{pmatrix}.$$

We see that the third variable is free, so we set x3 = t. The solution is

x1 = t, x2 = 0, x3 = t for t ∈ R.

or in vector form
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = t\begin{pmatrix}1\\0\\1\end{pmatrix} \qquad\text{for } t \in \mathbb{R}.$$


Example 3.16 (Example of a homogeneous system with exactly one solution). We con-
sider the linear system

x1 + 2x2 = 0,
2x1 + 3x2 = 0, (3.11)
3x1 + 5x2 = 0.

Solution. We perform row reduction on the associated matrix.


       
$$\begin{pmatrix} 1 & 2\\ 2 & 3\\ 3 & 5 \end{pmatrix}
\xrightarrow[\text{the 1st column}]{\text{use } R_1 \text{ to clear}}
\begin{pmatrix} 1 & 2\\ 0 & -1\\ 0 & -1 \end{pmatrix}
\xrightarrow[\text{the 2nd column}]{\text{use } R_2 \text{ to clear}}
\begin{pmatrix} 1 & 0\\ 0 & -1\\ 0 & 0 \end{pmatrix}
\xrightarrow{R_2 \to -R_2}
\begin{pmatrix} 1 & 0\\ 0 & 1\\ 0 & 0 \end{pmatrix}.$$

So the only possible solution is x1 = 0 and x2 = 0. 

In the next section we will see the connection between the set of solutions of a linear system and
the corresponding homogeneous linear system.

You should now have understood

• why a homogeneous linear system always has either one or infinitely many solutions,
• etc.
You should now be able to
• use the Gauß- or Gauß-Jordan elimination to solve homogeneous linear systems,
• etc.

3.3 Matrices and linear systems


So far we were given a linear system with a specific right hand side and we asked ourselves which
xj do we have to feed into the system in order to obtain the given right hand side. Problems of
this type are called inverse problems since we are given an output (the right hand side of the system;
the “state” that we want to achieve) and we have to find a suitable input in order to obtain the
desired output.
Now we change our perspective a bit and we ask ourselves: If we put certain x1 , . . . , xn into the
system, what do we get as a result on the right hand side? To investigate this question, it is very
useful to write the system (3.1) in a short form. First note that we can view it as an equality of
the two vectors with m components each:

$$\begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix} = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m \end{pmatrix}. \tag{3.12}$$


Let A be the coefficient matrix and ~x the vector whose components are x1 , . . . , xn . Then we write
the left hand side of (3.12) as
$$A\vec x = A\begin{pmatrix} x_1\\ \vdots\\ x_n\end{pmatrix} := \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix} \tag{3.13}$$
With this notation, the linear system (3.1) can be written very compactly as
$$A\vec x = \vec b \qquad\text{with } \vec b = \begin{pmatrix} b_1\\ \vdots\\ b_m \end{pmatrix}.$$

A way to remember the formula for the multiplication of a matrix and a vector is that we “multiply

each row of the matrix by the column vector”, so we calculate “row by column”. For example, the
jth component of A~x is “(jth row of A) by (column ~x)”.
   
$$A\vec x = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ \vdots & & & \vdots\\ a_{j1} & a_{j2} & \dots & a_{jn}\\ \vdots & & & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ \vdots\\ a_{j1}x_1 + a_{j2}x_2 + \cdots + a_{jn}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}. \tag{3.14}$$

Definition 3.17. The formula in (3.13) is called the multiplication of a matrix and a vector.

An m × n matrix A takes a vector with n components and gives us back a vector with m compo-
nents.
Observe that something like ~xA does not make sense!
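A minimal Python sketch of the "row by column" rule (3.13)/(3.14) may help; the function name matvec and the sample data are our own illustrative choices, and the result is compared with NumPy's built-in product:

```python
import numpy as np

def matvec(A, x):
    """Compute A x 'row by column' as in (3.13)/(3.14)."""
    m, n = A.shape
    b = np.zeros(m)
    for j in range(m):           # jth component of the result ...
        for i in range(n):       # ... is (jth row of A) times (column x)
            b[j] += A[j, i] * x[i]
    return b

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])
print(matvec(A, x))    # [-2. -2.]
print(A @ x)           # NumPy's built-in product gives the same result
```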

Remark 3.18. Formula (3.13) can be interpreted as follows. If A is an m × n matrix and ~x is


a vector in Rn , then A~x is the vector in Rm which is the sum of the columns of A weighted with
coefficients given by ~x since
     
$$A\vec x = \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ a_{21}x_1 + \cdots + a_{2n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1\\ a_{21}x_1\\ \vdots\\ a_{m1}x_1 \end{pmatrix} + \cdots + \begin{pmatrix} a_{1n}x_n\\ a_{2n}x_n\\ \vdots\\ a_{mn}x_n \end{pmatrix} = x_1\begin{pmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{pmatrix} + \cdots + x_n\begin{pmatrix} a_{1n}\\ a_{2n}\\ \vdots\\ a_{mn} \end{pmatrix}. \tag{3.15}$$


Remark 3.19. Recall that ~ej is the vector which has a 1 as its jth component and has zeros
everywhere else. Formula (3.13) shows that for every j = 1, . . . , n
 
$$A\vec e_j = \begin{pmatrix} a_{1j}\\ \vdots\\ a_{mj} \end{pmatrix} = j\text{th column of } A. \tag{3.16}$$

Let us prove some easy properties.

Proposition 3.20. Let A be an m × n matrix, ~x, ~y ∈ Rn and c ∈ R. Then


(i) A(c~x) = cA~x,
(ii) A(~x + ~y ) = A~x + A~y ,
(iii) A~0 = ~0.

Proof. The proofs are not difficult. They follow by using the definitions and carrying out some
straightforward calculations as follows.
(i)
$$A(c\vec x) = A\begin{pmatrix} cx_1\\ \vdots\\ cx_n\end{pmatrix} = \begin{pmatrix} a_{11}cx_1 + \cdots + a_{1n}cx_n\\ \vdots\\ a_{m1}cx_1 + \cdots + a_{mn}cx_n\end{pmatrix} = c\begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix} = cA\vec x.$$
(ii)
$$A(\vec x + \vec y) = A\begin{pmatrix} x_1 + y_1\\ \vdots\\ x_n + y_n\end{pmatrix} = \begin{pmatrix} a_{11}(x_1+y_1) + \cdots + a_{1n}(x_n+y_n)\\ \vdots\\ a_{m1}(x_1+y_1) + \cdots + a_{mn}(x_n+y_n)\end{pmatrix}$$
$$= \begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n\end{pmatrix} + \begin{pmatrix} a_{11}y_1 + \cdots + a_{1n}y_n\\ \vdots\\ a_{m1}y_1 + \cdots + a_{mn}y_n\end{pmatrix} = A\vec x + A\vec y.$$

(iii) To show that A~0 = ~0, we could simply do the calculation (which is very easy!) or we can use
(i):
A~0 = A(0~0) = 0A~0 = ~0.

Note that in (iii) the ~0 on the left hand side is the zero vector in Rn whereas the ~0 on the right
hand side is the zero vector in Rm .
Proposition 3.20 gives an important insight into the structure of solutions of linear systems.

Theorem 3.21. (i) Let ~x and ~y be solutions of the linear system (3.1). Then ~x − ~y is a solution
of the associated homogeneous linear system.


(ii) Let ~x be a solution of the linear system (3.1), let ~z be a solution of the associated homogeneous
linear system and let λ ∈ R. Then ~x + λ~z is a solution of the system (3.1).

Proof. Assume that ~x and ~y are solutions of (3.1), that is

A~x = ~b and A~y = ~b.

By Proposition 3.20 (i) and (ii) we have

A(~x − ~y ) = A~x + A(−~y ) = A~x − A~y = ~b − ~b = ~0

which shows that ~x − ~y solves the homogeneous equation A~v = ~0. Hence (i) is proved.
In order to show (ii), we proceed similarly. If ~x solves the inhomogeneous system (3.1) and ~z solves
the associated homogeneous system, then

A~x = ~b and A~z = ~0.

Now (ii) follows from

FT
A(~x + λ~z) = A~x + Aλ~z = A~x + λA~z = ~b + λ~0 = ~b.

Corollary 3.22. Let ~x be an arbitrary solution of the inhomogeneous system (3.1). Then the set
of all solutions of (3.1) is

{~x + ~z : ~z is a solution of the associated homogeneous system}.


RA
This means that in order to find all solutions of an inhomogeneous system it suffices to find one
particular solution and all solutions of the corresponding homogeneous system.
We will show later that the set of all solutions of a homogeneous system is a vector space. When you
study the set of all solutions of linear differential equations, you will encounter the same structure.

Example 3.23. Let us consider the system



x1 + 2x2 − x3 = 3,
2x1 + 3x2 − 2x3 = 3, (3.10’)
3x1 − x2 − 3x3 = −12.
Solution. We form the augmented matrix and perform row reduction.
     
$$\left(\begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 2 & 3 & -2 & 3\\ 3 & -1 & -3 & -12 \end{array}\right)
\xrightarrow[R_3 \to R_3 - 3R_1]{R_2 \to R_2 - 2R_1}
\left(\begin{array}{ccc|c} 1 & 2 & -1 & 3\\ 0 & -1 & 0 & -3\\ 0 & -7 & 0 & -21 \end{array}\right)
\xrightarrow[\text{the 2nd column}]{\text{use } R_2 \text{ to clear}}
\left(\begin{array}{ccc|c} 1 & 0 & -1 & -3\\ 0 & -1 & 0 & -3\\ 0 & 0 & 0 & 0 \end{array}\right)$$
$$\xrightarrow{R_2 \to -R_2}
\left(\begin{array}{ccc|c} 1 & 0 & -1 & -3\\ 0 & 1 & 0 & 3\\ 0 & 0 & 0 & 0 \end{array}\right).$$


It follows that x2 = 3 and x1 = −3 + x3 . If we take x3 as a parameter, the general solution of the
system in vector form is
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = \begin{pmatrix}-3\\3\\0\end{pmatrix} + t\begin{pmatrix}1\\0\\1\end{pmatrix} \qquad\text{for } t\in\mathbb{R}.$$
Note that the left hand side of the system (3.10’) is the same as that of the homogeneous system
(3.10) in Example 3.15 which has the general solution
$$\begin{pmatrix}x_1\\x_2\\x_3\end{pmatrix} = t\begin{pmatrix}1\\0\\1\end{pmatrix} \qquad\text{for } t\in\mathbb{R}.$$
This shows that indeed we obtain all solutions of the inhomogeneous equation as the sum of the particular
solution (−3, 3, 0)^t and all solutions of the corresponding homogeneous system.
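A short NumPy check of this structure (particular solution plus homogeneous solutions, cf. Theorem 3.21 and Corollary 3.22); the values of λ below are arbitrary sample choices and the snippet is an illustration only:

```python
import numpy as np

A = np.array([[1.0,  2.0, -1.0],
              [2.0,  3.0, -2.0],
              [3.0, -1.0, -3.0]])
b = np.array([3.0, 3.0, -12.0])

x_part = np.array([-3.0, 3.0, 0.0])   # particular solution of (3.10')
z = np.array([1.0, 0.0, 1.0])         # solution of the homogeneous system (3.10)

print(np.allclose(A @ x_part, b))                  # True
print(np.allclose(A @ z, np.zeros(3)))             # True
for lam in [-2.0, 0.5, 7.0]:
    print(np.allclose(A @ (x_part + lam * z), b))  # True, as in Theorem 3.21 (ii)
```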

You should now have understood

• that an m × n matrix can be viewed as an operator that takes vectors in Rn and returns a
vector in Rm ,
• the structure of the set of all solutions of a given linear system,
• etc.
You should now be able to
• calculate expressions like A~x,
RA
• relate the solutions of an inhomogeneous system with those of the corresponding homoge-
neous one,
• etc.

3.4 Matrices as functions from Rn to Rm ; composition of matrices

In the previous section we saw that a matrix A ∈ M (m × n) takes a vector ~x ∈ Rn and returns
a vector A~x in Rm . This allows us to view A as a function from Rn to Rm , and therefore we can
define the sum and composition of two matrices. Before we do this, let us see a few examples of
such matrices. As examples we work with 2 × 2 matrices because their action on R2 can be sketched
in the plane.
 
Example 3.24. Let us consider $A = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}$. This defines a function TA from R² to R² by

TA : R2 → R2 , TA ~x = A~x.

Remark. We write TA to denote the function induced by A, but sometimes we will write simply
A : R2 → R2 when it is clear that we consider the matrix A as a function.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
84 3.4. Matrices as functions from Rn to Rm ; composition of matrices

We calculate easily
           
$$T_A\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad T_A\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\-1\end{pmatrix}, \qquad\text{in general}\quad T_A\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x\\-y\end{pmatrix}.$$
So we see that TA represents the reflection of a vector ~x about the x-axis.
Figure 3.1: Reflection on the x-axis.

Example 3.25. Let us consider $B = \begin{pmatrix} 0 & 0\\ 0 & 1 \end{pmatrix}$. This defines a function TB from R² to R² by
TB : R2 → R2 , TB ~x = B~x.
We calculate easily
           
$$T_B\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}, \qquad T_B\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}, \qquad\text{in general}\quad T_B\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\y\end{pmatrix}.$$
So we see that TB represents the projection of a vector ~x onto the y-axis.
Figure 3.2: Orthogonal projection onto the y-axis.


 
Example 3.26. Let us consider $C = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}$. This defines a function TC from R² to R² by

TC : R2 → R2 , TC ~x = C~x.

We calculate easily
           
$$T_C\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}, \qquad T_C\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\0\end{pmatrix}, \qquad\text{in general}\quad T_C\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}-y\\x\end{pmatrix}.$$

So we see that TC represents the rotation of a vector ~x about 90◦ counterclockwise.


Figure 3.3: Rotation about π/2 counterclockwise.
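For illustration only, the three matrices of Examples 3.24–3.26 can be applied to a sample vector with NumPy; the vector (3, 2)^t is an arbitrary choice:

```python
import numpy as np

A = np.array([[1, 0], [0, -1]])   # reflection about the x-axis (Example 3.24)
B = np.array([[0, 0], [0, 1]])    # projection onto the y-axis (Example 3.25)
C = np.array([[0, -1], [1, 0]])   # rotation by 90 degrees counterclockwise (Example 3.26)

v = np.array([3, 2])              # an arbitrary sample vector
print(A @ v)   # [ 3 -2]
print(B @ v)   # [0 2]
print(C @ v)   # [-2  3]
```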

Just as with other functions, we can sum them or compose them. Remember from your calculus
classes that functions are summed “pointwise”. That means, if we have two functions f, g : R → R,
then the sum f + g is a new function which is defined by

f + g : R → R, (f + g)(x) = f (x) + g(x). (3.17)

The multiplication of a function f with a number c gives the new function cf defined by

cf : R → R, (cf )(x) = c(f (x)). (3.18)

The composition of functions is defined as

f ◦ g : R → R, (f ◦ g)(x) = f (g(x)). (3.19)

Matrix sum
Let us see how this looks in the case of matrices. Let A and B be matrices. First note that
they both must depart from the same space Rn because we want to apply them to the same ~x, that


is, both A~x and B~x must be defined. Therefore A and B must have the same number of columns.
They also must have the same number of rows because we want to be able to sum A~x and B~x. So
let A, B ∈ M (m × n) and let ~x ∈ Rn . Then, by definition of the sum of two functions, we have
     
$$(A+B)\vec x := A\vec x + B\vec x
= \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}
+ \begin{pmatrix} b_{11}x_1 + b_{12}x_2 + \cdots + b_{1n}x_n\\ \vdots\\ b_{m1}x_1 + b_{m2}x_2 + \cdots + b_{mn}x_n \end{pmatrix}$$
$$= \begin{pmatrix} (a_{11}+b_{11})x_1 + (a_{12}+b_{12})x_2 + \cdots + (a_{1n}+b_{1n})x_n\\ \vdots\\ (a_{m1}+b_{m1})x_1 + (a_{m2}+b_{m2})x_2 + \cdots + (a_{mn}+b_{mn})x_n \end{pmatrix}
= \begin{pmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n}\\ \vdots & & & \vdots\\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}.$$

We see that A + B is again a matrix of the same size and that the components of this new matrix
are just the sum of the corresponding components of the matrices A and B.

Multiplication of a matrix by a scalar


Now let c be a number and let A ∈ M (m × n). Then we have
     
$$(cA)\vec x = c(A\vec x) = c\begin{pmatrix} a_{11}x_1 + \cdots + a_{1n}x_n\\ \vdots\\ a_{m1}x_1 + \cdots + a_{mn}x_n \end{pmatrix} = \begin{pmatrix} ca_{11}x_1 + \cdots + ca_{1n}x_n\\ \vdots\\ ca_{m1}x_1 + \cdots + ca_{mn}x_n \end{pmatrix} = \begin{pmatrix} ca_{11} & \cdots & ca_{1n}\\ \vdots & & \vdots\\ ca_{m1} & \cdots & ca_{mn} \end{pmatrix}\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}.$$


We see that cA is again a matrix and that the components of this new matrix are just the product
of the corresponding components of the matrix A with c.

Proposition 3.27. Let A, B, C ∈ M (m × n), let O be the matrix whose entries are all 0 and let
λ, µ ∈ R. Moreover, let Ã be the matrix whose entries are the negatives of the entries of A. Then the
following is true.

(i) Associativity of the matrix sum: (A + B) + C = A + (B + C).

(ii) Commutativity of the matrix sum: A + B = B + A.

(iii) Additive identity: A + O = A.

(iv) Additive inverse: A + Ã = O.

(v) 1A = A.

(vi) (λ + µ)A = λA + µA and λ(A + B) = λA + λB.

Proof. The claims of the proposition can be proved by straightforward calculations.

Prove Proposition 3.27.


From the proposition we obtain immediately the following theorem.

Theorem 3.28. M (m × n) is a vector space.

Composition of two matrices



Now let us calculate the composition of two matrices. This is also called the product of the matrices.
Assume we have A ∈ M (m × n) and we want to calculate AB for some matrix B. Note that A
describes a function from Rn → Rm . In order for AB to make sense, we need that B goes from
some Rk to Rn , that means that B ∈ M (n × k). The resulting function AB will then be a map
from Rk to Rm .

$$\mathbb{R}^k \xrightarrow{\ B\ } \mathbb{R}^n \xrightarrow{\ A\ } \mathbb{R}^m, \qquad AB : \mathbb{R}^k \to \mathbb{R}^m.$$

So let B ∈ M (n × k). Then, by the definition of the composition of two functions, we have for every


$\vec x \in \mathbb{R}^k$
$$(AB)\vec x = A(B\vec x) = A\begin{pmatrix} b_{11}x_1 + b_{12}x_2 + \cdots + b_{1k}x_k\\ \vdots\\ b_{n1}x_1 + b_{n2}x_2 + \cdots + b_{nk}x_k\end{pmatrix}
= \begin{pmatrix} a_{11}[b_{11}x_1 + \cdots + b_{1k}x_k] + \cdots + a_{1n}[b_{n1}x_1 + \cdots + b_{nk}x_k]\\ \vdots\\ a_{m1}[b_{11}x_1 + \cdots + b_{1k}x_k] + \cdots + a_{mn}[b_{n1}x_1 + \cdots + b_{nk}x_k]\end{pmatrix}$$
$$= \begin{pmatrix} [a_{11}b_{11} + \cdots + a_{1n}b_{n1}]x_1 + \cdots + [a_{11}b_{1k} + \cdots + a_{1n}b_{nk}]x_k\\ \vdots\\ [a_{m1}b_{11} + \cdots + a_{mn}b_{n1}]x_1 + \cdots + [a_{m1}b_{1k} + \cdots + a_{mn}b_{nk}]x_k\end{pmatrix}$$
$$= \begin{pmatrix} a_{11}b_{11} + \cdots + a_{1n}b_{n1} & \cdots & a_{11}b_{1k} + \cdots + a_{1n}b_{nk}\\ \vdots & & \vdots\\ a_{m1}b_{11} + \cdots + a_{mn}b_{n1} & \cdots & a_{m1}b_{1k} + \cdots + a_{mn}b_{nk}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_k\end{pmatrix}.$$

We see that AB is a matrix of the size m × k as was to be expected since the composition function
goes from Rk to Rm . The component jℓ of the new matrix (the entry in row j and column ℓ) is
$$c_{j\ell} = \sum_{i=1}^{n} a_{ji} b_{i\ell}.$$

So in order to calculate this entry we need from A only its jth row and from B we only need its
`th column and we multiply them component by component. You can memorise this again as “row
by column”, more precisely:

cj` = component in row j and column ` of AB = (row j of A) × (column ` of B) (3.20)

as in the case of multiplication of a vector by a matrix. Actually, a vector in Rn can be seen as an


n × 1 matrix (a matrix with n rows and one column), hence (3.13) can be viewed as a special case


of (3.20).
    
a11 a12 ... a1n b11 ... b1` ... b1k c11 ... c1` ... c1k
 .. ..   b21 ... b2` ... b2k   .. .. .. 
 . . 
  .. .. ..   . . . 
  
 
 aj1
AB =  aj2 . . . ajn  . ... . .  =  cj1 ... cj` ... cjk 
 

 . ..   .. .. ..   .. .. .. 
 .. .  . ... . .   . . . 
am1 am2 . . . amn bn1 ... bn` ... bnk cm1 ... cm2 ... cmk

with cj` = aj1 b1` + aj2 b2` + · · · + ajn bn` .


 
Example 3.29. Let $A = \begin{pmatrix} 1 & 2 & 3\\ 8 & 6 & 4\end{pmatrix}$ and $B = \begin{pmatrix} 7 & 1 & 2 & 3\\ 2 & 0 & 1 & 4\\ 2 & 6 & -3 & 0\end{pmatrix}$. Then
$$AB = \begin{pmatrix} 1 & 2 & 3\\ 8 & 6 & 4\end{pmatrix}\begin{pmatrix} 7 & 1 & 2 & 3\\ 2 & 0 & 1 & 4\\ 2 & 6 & -3 & 0\end{pmatrix}
= \begin{pmatrix} 1\cdot 7 + 2\cdot 2 + 3\cdot 2 & 1\cdot 1 + 2\cdot 0 + 3\cdot 6 & 1\cdot 2 + 2\cdot 1 + 3\cdot(-3) & 1\cdot 3 + 2\cdot 4 + 3\cdot 0\\ 8\cdot 7 + 6\cdot 2 + 4\cdot 2 & 8\cdot 1 + 6\cdot 0 + 4\cdot 6 & 8\cdot 2 + 6\cdot 1 + 4\cdot(-3) & 8\cdot 3 + 6\cdot 4 + 4\cdot 0\end{pmatrix}$$
$$= \begin{pmatrix} 17 & 19 & -5 & 11\\ 76 & 32 & 10 & 48\end{pmatrix}.$$
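The product can be double-checked numerically; this NumPy snippet is only an illustration of the computation above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [8, 6, 4]])
B = np.array([[7, 1,  2, 3],
              [2, 0,  1, 4],
              [2, 6, -3, 0]])

print(A @ B)
# [[17 19 -5 11]
#  [76 32 10 48]]
```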
Let us see some properties of the algebraic operations for matrices that we just introduced.

Proposition 3.30. Let A ∈ M (m × n), B, C ∈ M (k × m), S, T ∈ M (n × k) and R ∈ M (k × `).


Then the following is true.

(i) Associativity of the matrix product: (AS)R = A(SR).



(ii) Distributivity: A(S + T ) = AS + AT and (B + C)A = BA + CA.

Proof. The claims of the proposition can be proved by straightforward calculations.

Prove Proposition 3.30.

Very important remark.


The matrix multiplication is not commutative, that is, in general

AB 6= BA.

That matrix multiplication is not commutative is to be expected since it is the composition of two
functions (think of functions that you know from your calculus classes. For example, it does make

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
90 3.4. Matrices as functions from Rn to Rm ; composition of matrices

a difference if you first square a variable and then take the arctan or if you first calculate its arctan
and then square the result).
Let us see an example. Let B be the matrix from Example 3.25 and C be the matrix from
Example 3.26. Recall that B represents the orthogonal projection onto the y-axis and that C
represents counterclockwise rotation by 90◦ . If we take ~e1 (the unit vector in x-direction), and we
first rotate and then project, we get the vector ~e2 . If however we project first and rotate then, we
get ~0. That means, BC~e1 6= CB~e1 , therefore BC 6= CB. Let us calculate the products:
    
0 0 0 −1 0 0
BC = = first rotation, then projection,
0 1 1 0 1 0
    
0 −1 0 0 0 −1
CB = = first projection, then rotation.
1 0 0 1 0 0
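The same computation in NumPy (illustration only) confirms that the two products differ:

```python
import numpy as np

B = np.array([[0, 0], [0, 1]])    # projection onto the y-axis
C = np.array([[0, -1], [1, 0]])   # rotation by 90 degrees counterclockwise

print(B @ C)                          # [[0 0], [1 0]]  -- first rotate, then project
print(C @ B)                          # [[0 -1], [0 0]] -- first project, then rotate
print(np.array_equal(B @ C, C @ B))   # False: matrix multiplication is not commutative
```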

Let A be the matrix from Example 3.24, B be the matrix from Example 3.25 and C the matrix
from Example 3.26. Verify that AB 6= BA and AC 6= CA and understand this result geometrically
by following for example where the unit vectors get mapped to.

Note also that usually, when AB is defined, the expression BA is not defined because in general
the number of columns of B will be different from the number of rows of A.

We finish this section with the definition of the so-called identity matrix.
RA
Definition 3.31. Let n ∈ N. Then the n × n identity matrix is the matrix which has 1s on its
diagonal and has zero everywhere else:
   
$$\mathrm{id}_n = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & 1 \end{pmatrix}. \tag{3.21}$$

As notation for the identity matrix, the following symbols are used in the literature: En , idn , Idn ,
In , 1n , 1n . The subscript n can be omitted if the size of the matrix is clear.

Remark 3.32. It can be easily verified that

A idn = A, idn B = B, idn ~x = ~x

for every A ∈ M (m × n), for every B ∈ M (n × k) and for every ~x ∈ Rn .


You should now have understood


• what the sum and the composition of two matrices is and where the formulas come from,
• why the composition of matrices is not commutative,
• that M (m × n) is a vector space,
• etc.

You should now be able to


• calculate the sum and product (composition) of two matrices,
• etc.

3.5 Inverses of matrices


We will give two motivations why we are interested in inverses of matrices before we give the formal
definition.

Inverse of a matrix as a function


The inverse of a given matrix is a matrix that “undoes” what the original matrix did. We will
review the matrices from the Examples 3.24, 3.25 and 3.26.

• Assume we are given the matrix A from Example 3.24 which represents reflection on the
x-axis and we want to find a matrix that restores a vector after we applied A to it. Clearly,
we have to reflect again on the x-axis: reflecting an arbitrary vector ~x twice on the x-axis
leaves the vector where it was. Let us check:
    
$$AA = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}\begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \mathrm{id}_2.$$

That means that for every ~x ∈ R2 , we have that A2 ~x = ~x, hence A is its own inverse.
• Assume we are given the matrix C from Example 3.26 which represents counterclockwise
rotation by 90◦ and we want to find a matrix that restores a vector after we applied C to it.

Clearly, we have to rotate clockwise by 90◦ . Let us assume that there exists a matrix which
represents this rotation and let us call it C−90◦ . By Remark 3.18 it is enough to know how it
acts on ~e1 and ~e2 in order to write it down. Clearly C−90◦~e1 = −~e2 and C−90◦~e2 = ~e1 , hence
C−90◦ = (−~e2 |~e1 ).
Let us check:
$$C_{-90^\circ}C = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \mathrm{id}_2$$
and
$$CC_{-90^\circ} = \begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}\begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix} = \begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix} = \mathrm{id}_2$$
which was to be expected because rotating first 90◦ clockwise and then 90◦ counterclockwise,
leaves any vector where it is.


• Assume we are given the matrix B from Example 3.25 which represents projection onto the
y-axis. In this case, we cannot restore a vector ~x after we projected it onto the y-axis. For
example, if we know that $B\vec x = \begin{pmatrix}0\\2\end{pmatrix}$, then $\vec x$ could have been $\begin{pmatrix}0\\2\end{pmatrix}$ or $\begin{pmatrix}7\\2\end{pmatrix}$ or any other vector
in R² whose second component is equal to 2. This shows that B does not have an inverse.

Inverse of a matrix for solving a linear system

Let us consider the following situation. A grocery sells two different packages of fruits. Type A
contains 1 peach and 3 mangos and type B contains 2 peaches and 1 mango. We can ask two
different types of questions:

(i) Given a certain number of packages of type A and of type B, how many peaches and how
many mangos do we get?

(ii) How many packages of each type do we need in order to obtain a given number of peaches
and mangos?

The first question is quite easy to answer. Let us write down the information that we are given. If

a = number of packages of type A,    b = number of packages of type B,
p = number of peaches,               m = number of mangos,

then
$$p = 1a + 2b, \qquad m = 3a + 1b. \tag{3.22}$$
Using vectors and matrices, we can rewrite this as
    
p 1 2 a
= .
m 3 1 b
 
1 2
Let A = . Then the above becomes simply
3 1
D

   
p a
=A . (3.23)
m b

If we know a and b (that is, we know how many packages of each type we bought), then we can
find the values of p and m by simply evaluating A( ab ) which is relatively easy.

Example 3.33. Assume that we buy 1 package of type A and 3 packages of type B, then we
calculate         
p 1 1 2 1 7
=A = = ,
m 3 3 1 3 6
which shows that we have 9 peaches and 7 mangos.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 93

If on the other hand, we know p and m and we are asked find a and b such that (3.22) holds, we
have to solve a linear system which is much more cumbersome. Of course, we can solve (3.23) using
the Gauß or Gauß-Jordan elimination process, but if we were asked to do this for several pairs p
and m, then it would become long quickly. However, if we had a matrix A0 such that A0 A = id2 ,
then this task would be quite easy since in this case we could manipulate (3.23) as follows:

           
p a 0 p 0 a a a
=A =⇒ A =AA = id2 = .
m b m b b b

If in addition we knew that AA0 = id2 , then we have that

       
p a p a
=A ⇐⇒ A0 = . (3.24)
m b m b

The task to find a and b again reduces to perform a matrix multiplication. The matrix A0 , if it

FT
exists, is called the inverse of A and we will dedicate the rest of this section to give criteria for its
existence, investigate its properties and give a recipe for finding it.
 
−1 2
Exercise. Check that A0 = 1
satisfies A0 A = id2 .
5 3 −1

Example 3.34. Assume that we want to buy 5 peaches and 5 mangos. Then we calculate
RA
        
a 5 1 −1 2 5 1
= A0 = = ,
b 5 5 3 −1 5 2

which shows that we have to by 1 package of type A and 2 packages of type B.


D

Now let us give the precise definition of the inverse of a matrix.

Definition 3.35. A matrix A ∈ M (n×n) is called invertible if there exists a matrix A0 ∈ M (n×n)
such that

AA0 = idn and A0 A = idn

In this case A0 is called the inverse of A and it is denoted by A−1 . If A is not invertible then it is
called non-invertible or singular.

The reason why in the definition we only admit square matrices (matrices with the same number
or rows and columns) is explained in the following remark.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
94 3.5. Inverses of matrices

Remark 3.36. (i) Let A ∈ M (m×n) and assume that there is a matrix B such that BA = idn .
This means that if for some ~b ∈ Rm the equation A~x = ~b has a solution, then it is unique
because
A~x = ~b =⇒ BA~x = B~b =⇒ ~x = B~b.
From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm
the equation A~x = ~b has at most one solution. We know that if A has more columns than
rows, then the number of columns will be larger than the number of pivots. Therefore,
A~x = ~b has either no or infinitely many solutions (see Theorem 3.7). Hence a matrix A with
more columns than rows cannot have an inverse.
(ii) Again, let A ∈ M (m × n) and assume that there is a matrix B such that AB = idm . This
means that for every ~b ∈ Rm the equation A~x = ~b is solved by ~x = B~b because

idm ~b = ~b =⇒ AB~b = ~b =⇒ A(B~b) = ~b.

From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm

FT
the equation A~x = ~b has at least one solution. Assume that A has more rows than columns.
If we apply Gaussian elimination to the augmented matrix A|~b) then the last row of the
row-echelon form has to be (0 · · · 0|βm ). If we chose ~b such that after the reduction βm 6= 0,
then A~x = ~b does not have a solution. Such a ~b is easy to find: We only need to take ~em
(the mth unit vector) and do the steps from the Gauß elimination backwards. If we take
this vector as right hand side of our system, then the last row after the reduction will be
(0 . . . 0|1). Therefore, a matrix A with more rows than columns cannot have an inverse
because there will always be some ~b such that the equation A~x = ~b has no solution.
RA
In conlusion we showed that we must have m = n if A ought to have an inverse matrix.

If A ∈ M (m × n) with n 6= m, then it does not make sense to speak of an inverse of A as explained


above. However, we can define the left inverse and the right inverse.

Definition 3.37. Let A ∈ M (m × n).


(i) A matrix C is called a left inverse of A if CA = idn .
(ii) A matrix D is called a right inverse of A if AD = idm .
D

Note that C and D must be n × m matrices. The following examples show that the left- and right
inverses do not need to exist, and if they do, they are not unique.
 
0 0
Examples 3.38. (i) A = has neither left- nor right inverse.
0 0
 
  1 0
1 0 0
(ii) A = has no left inverse and has right inverse D = 0 1. In fact, for every
0 1 0
0 0
 
1 0
x, y ∈ R the matrix  0 1 is a right inverse of A.
x y

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 95

 
1 0  
1 0 0
(iii) A = 0 1 has no right inverse and has left inverse C =
 . In fact, for every
0 1 0
0 0
 
1 0 x
x, y ∈ R the matrix is a left inverse of A.
0 1 y

Remark 3.39. We will show in Theorem 3.44 that a matrix A ∈ M (n × n) is invertible if and only
if it has a left- and a right inverse.

Examples 3.40. • From the examples at the beginning of this section we have:
   
1 0 −1 1 0
A= =⇒ A = A = ,
0 −1 0 −1
   
0 −1 0 1
C= =⇒ C −1 = ,
1 0 −1 0

FT
 
0 0
B= =⇒ B is not invertible.
0 1
   
4 0 0 0 1/4 0 0 0
0 5 0 0 −1
 0 1/5 0 0
• Let A =  . Then we can easily guess that A =  
0 0 −3 0  0 0 −1/3 0
0 0 0 2 0 0 0 1/2
is an inverse of A. It is easy to check that the product of these matrices gives id4 .
RA
• Let A ∈ M (n × n) and assume that the kth row of A consists of only zeros. Then A is not
invertible because for any matrix B ∈ M (n × n), the kth row of the product matrix AB will
be zero, no matter how we choose B. So there is no matrix B such that AB = idn .
• Let A ∈ M (n × n) and assume that the kth column of A consists of only zeros. Then A is
not invertible because for any matrix B ∈ M (n × n), the kth column of the product matrix
BA will be zero, no matter how we choose B. So there is no matrix B such that BA = idn .

Now let us prove some theorems about inverse matrices. Recall that A ∈ M (n × n) is invertible if
D

and only if there exists a matrix A0 ∈ M (n × n) such that AA0 = A0 A = idn .


First we will show that the inverse matrix, if it exists, is unique.

Theorem 3.41. Let A, B ∈ M (n × n).


(i) If A is invertible, then its inverse is unique.
(ii) If A is invertible, then its inverse A−1 is invertible and its inverse is A.
(iii) If A and B are invertible, then their product AB is invertible and (AB)−1 = B −1 A−1 .

Proof. (i) Assume that A is invertible and that A0 and A00 are inverses of A. Note that this
means that
AA0 = A0 A = idn and AA00 = A00 A = idn . (3.25)

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
96 3.5. Inverses of matrices

We have to show that A0 = A00 . This follows from (3.25) and from the associativity of the
matrix multiplication because

A0 = A0 idn = A0 (AA00 ) = (A0 A)A00 = idn A00 = A00 .

(ii) Assume that A is invertible and let A−1 be its inverse. In order to show that A−1 is invertible,
we need a matrix C such that CA−1 = A−1 C = idn . This matrix C is then the inverse of
A−1 . Clearly, C = A does the trick. Therefore A−1 is invertible and (A−1 )−1 = A.
(iii) Assume that A and B are invertible. In order to show that AB is invertible and (AB)−1 =
B −1 A−1 , we only need to verify that B −1 A−1 (AB) = (AB)B −1 A−1 = idn . We see that this
is true using the associativity of the matrix product:

B −1 A−1 (AB) = B −1 (A−1 A)B = B −1 idn B = B −1 B = idn ,


(AB)B −1 A−1 = A(BB −1 )A−1 = A−1 idn A = A−1 A = idn .

Note that in the proof we guessed the formula for (AB)−1 and then we verified that it indeed is the

FT
inverse of AB. We can also calculate it as follows. Assume that C is a left inverse of AB. Then

C(AB) = idn ⇐⇒ (CA)B = idn ⇐⇒ CA = idn B −1 = B −1 ⇐⇒ C = B −1 A−1

If D is a right inverse of AB then

(AB)D = idn ⇐⇒ A(BD) = idn ⇐⇒ BD = A−1 idn = A−1 ⇐⇒ D = B −1 A−1 .

Since C = D, this is the inverse of AB.


RA
Remark 3.42. In general, the sum of invertible matrices is not invertible. For example, both idn
and − idn are invertible, but their sum is the zero matrix which is not invertible.

Theorem 3.43 in the next section will show us how to find the inverse of a invertible matrix; see in
particular the section on page 100.

You should now have understood


D

• what invertibility of a matrix means and why it does not make sense to speak of the invert-
ibility of a matrix which is not a square matrix,
• that invertibility of matrix of n × n-matrix is equivalent to the fact that for every ~b ∈ Rm
the associated linear system A~x = ~b has exactly one solution.
• etc.
You should now be able to
• guess the inverse of simple invertible matrices, for example of matrices which have a clear
geometric interpretation, or of diagonal matrices,
• verify if two given matrices are inverse to each other,
• give examples of invertible and of non-invertible matrices,
• etc.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 97

3.6 Matrices and linear systems


Let us recall from Theorem 3.7:

For A ∈ M (m × n) and ~b ∈ Rm consider the equation

A~x = ~b. (3.26)

Then the following is true:


(1) Equation (3.26) has no ⇐⇒ The reduced row echelon form of the augmented
solution. system (A|~b) has a row of the form (0 · · · 0|β) with
some β 6= 0.

(2) Equation (3.26) has at least ⇐⇒ The reduced row echelon form of the augmented
one solution. system (A|~b) has no row of the form (0 · · · 0|β)
with some β 6= 0.

FT
In case (2), we have the following two sub-cases:
(2.1) Equation (3.26) has exactly one solution. ⇐⇒ #pivots = #columns.
(2.2) Equation (3.26) has infinitely many solutions. ⇐⇒ #pivots < #columns.

Observe that the case (1), no solution, cannot occur for homogeneous systems.
The next theorem connects the above to invertibility of the matrix representing the system.

Theorem 3.43. Let A ∈ M (n × n). Then the following is equivalent:


RA
(i) A is invertible.
(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iii) The equation A~x = ~0 has exactly one solution.
(iv) Every row-reduced echelon form of A has n pivots.
(v) A is row-equivalent to idn .
D

We will complete this theorem with one more item in Chapter 4 (Theorem 4.11).

Proof. (ii) ⇒ (iii) follows if we choose ~b = ~0.


(iii) ⇒ (iv) If A~x = ~0 has only one solution, then, by the case (2.1) above (or by Theorem 3.7(2.1)),
the number of pivots is equal to n (the number of columns of A) in every row-reduced echelon form
of A.
(iv) ⇒ (v) is clear.
(v) ⇒ (ii) follows from case (2.1) above (or by Theorem 3.7(2.1)) because no row-reduced form of
A can have a row consisting of only zeros.
So far we have shown that (ii) - (v) are equivalent. Now we have to connect them to (i).
(i) ⇒ (ii) Assume that A is invertible and let ~b ∈ Rn . Then A~x = ~b ⇐⇒ ~x = A−1~b which shows
existence and uniqueness of the solution.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
98 3.6. Matrices and linear systems

(ii) ⇒ (i) Assume that (ii) holds. We will construct A−1 as follows (this also tells us how we can
calculate A−1 if it exists). Recall that we need a matrix C such that AC = idn . This C will
then be our candidate for A−1 (we still would have to check that CA = idn ). Let us denote the
columns of C by ~cj for j = 1, . . . , n, so that C = (~c1 | · · · |~cn ). Recall that the kth column of AC is
A(kth column of C) and that the columns of idn are exactly the unit vectors ~ek (the vector with a
1 as kth component and zeros everywhere else). Then AC = idn can be written as
(A~c1 | · · · |A~cn ) = (~e1 | · · · |~en ).
By (ii) we know that equations of the form A~x = ~ej have a unique solution. So we only need
to set ~cj = unique solution of the equation A~x = ~ej . With this choice we then have indeed that
AC = idn .
It remains to show that CA = idn . To this end, note that
A = idn A =⇒ A = ACA =⇒ A − ACA = 0 =⇒ A(idn −CA) = 0.
This means that A(idn −CA)~x = ~0 for every ~x ∈ Rn . Since by (ii) the equation A~y = ~0 has the
unique solution ~y = ~0, it follows that (idn −CA)~x = ~0 for every x ∈ Rn . But this means that

FT
~x = CA~x for every ~x, hence CA must be equal to idn .

Theorem 3.44. Let A ∈ M (n × n).


(i) If A has a left inverse C (that is, if CA = idn ), then A is invertible and A−1 = C.
(ii) If A has a right inverse D (that is, if AD = idn ), then A is invertible and A−1 = D.

Proof. (i) By Theorem 3.43 it suffices to show that A~x = ~0 has a the unique solution ~0. So
assume that ~x ∈ Rn satisfies A~x = ~0. Then ~x = idn ~x = (CA)~x = C(A~x) = C~0 = ~0. This
RA
shows that A is invertible. Moreover, C = C(idn ) = C(AA−1 ) = (CA)A−1 = idn A−1 = A−1 ,
hence C = A−1 .
(ii) By (i) applied to D, it follows that D has an inverse and that D−1 = A, so by Theo-
rem 3.41 (ii), A is invertible and A−1 = (D−1 )−1 = D.

Calculation of the inverse of a given square matrix


Let A be a square matrix. The proof of Theorem 3.43 tells us how to find its inverse if it exists.
D

We only need to solve A~x = ~ek for k = 1, . . . , n. This might be cumbersome and long, but we
already know that if these equations have solutions, then we can find them with the Gauß-Jordan
elimination. We only need to form the augmented matrix (A|~ek ), apply row operations until we get
to (idn |~ck ). Then ~ck is the solution of A~x = ~ek and we obtain the matrix A−1 as the matrix whose
columns are the vectors ~c1 , . . . , ~cn . If it is not possible to reduce A to the identity matrix, then it
is not invertible.
Note that the steps that we have to perform to reduce A to the identiy matrix depend only on
the coefficients in A and not on the right hand side. So we can calculate the n vectors ~c1 , . . . ~cn
with only one (big) Gauß-Jordan elimination if we augment our given matrix A by the n vectors
~e1 , . . . ,~en . But the matrix (~e1 | · · · |~en ) is nothing else than the identity matrix idn . So if we take
(A| idn ) and apply the Gauß-Jordan elimination and if we can reduce A to the identity matrix,
then the columns on the right are the columns of the inverse matrix A−1 . If we cannot get to the
identity matrix, then A is not invertible.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 99

 
1 2
Examples 3.45. (i) Let A = . Let us show that A is invertible by reducing the aug-
3 4
mented matrix (A| id2 ):
     
1 2 1 0 R2 −3R1 →R2 1 2 1 0 R1 +R2 →R1 1 0 2 1
(A| id2 ) = −−−−−−−−→ −−−−−−−−→
3 4 0 1 0 −2 −3 1 0 −2 −3 1
 
−1/2R2 →R2 1 0 −2 1
−−−−−−−−→ .
0 1 3/2 −1/2
 
−2 1
Hence A is invertible and A−1 = .
3/2 −1/2
We can check your result by calculating
      
1 2 −2 1 −2 + 3 1 − 1 1 0
= =
3 4 3/2 −1/2 −6 + 6 3 − 2 0 1
and

FT
      
−2 1 1 2 −2 + 3 −4 + 4 1 0
= = .
3/2 −1/2 3 4 3/2 − 3/2 3−2 0 1
 
1 2
(ii) Let A = . Let us show that A is not invertible by reducing the augmented matrix
−2 −4
(A| id2 ):
   
1 2 1 0 R2 +2R1 →R2 1 2 1 0
(A| id2 ) = −−−−−−−−→ .
−2 −4 0 1 0 0 2 1
RA
Since there is a zero row in the left matrix, we conclude that A is not invertible.
 
1 1 1
(iii) Let A = 0 2 3. Let us show that A is invertible by reducing the augmented matrix
5 5 1
(A| id3 ):
   
1 1 1 1 0 0 1 1 1 1 0 0
R3 −5R1 →R3
(A| id3 ) = 0 2 3 0 1 0 −−−−−−−−→ 0 2 3 0 1 0
D

5 5 1 0 0 1 0 0 −4 −5 0 1
   
1 1 1 1 0 0 4 4 0 −1 0 1
4R2 +3R3 →R2 4R1 +R3 →R1
−−−−−−−−−→ 0 8 0 −15 4 3 −−−−−−−−→ 0 8 0 −15 4 3
0 0 −4 −5 0 1 0 0 −4 −5 0 1
 
8 0 0 13 −4 −1
2R1 −R2 →R1
−−− −−−−−→ 0 8 0 −15 4 3
0 0 −4 −5 0 1
 
1 0 0 13/8 −1/2 −1/8
2R1 −R2 →R1
−−− −−−−−→ 0 1 0 −15/8 1/2 3/8 .
0 0 1 5/4 0 −1/4

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
100 3.6. Matrices and linear systems

   
13/8 −1/2 −1/8 13 −4 −1
Hence A is invertible and A−1 = −15/8 1/2 3/8 = 1 
8
−15 4 3 .
5/4 0 −1/4 10 0 2
We can check your result by calculating
    
1 1 1 13/8 −1/2 −1/8 1 0 0
0 2 3 −15/8 1/2 3/8 = · · · = 0 1 0
5 5 1 5/4 0 −1/4 0 0 1
and
    
13/8 −1/2 −1/8 1 1 1 1 0 0
−15/8 1/2 3/8 0 2 3 = · · · = 0 1 0 .
5/4 0 −1/4 5 5 1 0 0 1

Special case: Inverse of a 2 × 2 matrix


 
a b
Let A = . We already know that A is invertible if and only if its associated homogeneous

FT
c d
linear system has exactly one solution. By Theorem 1.11 this is the case if and only if det A 6= 0.
Recall that det A = ad − bc. So let us assume here that det A 6= 0.
Case 1. a 6= 0.
   
a b 1 0 aR2 −cR1 →R2 a b 1 0
(A| id2 ) = −−−−−−−−−→
c d 0 1 0 ad − bc −c a
RA
bc ab ad ab
b
   
R1 − ad−bc R2 →R1 a 0 1 + ad−bc − ad−bc a 0 ad−bc − ad−bc
−−−−−−−−−−−−→ =
0 ad − bc −c a 0 ad − bc −c a

d b
b
 
R1 − ad−bc R2 →R1 1 0 ad−bc − ad−bc
−−−−−−−−−−−−→ c a .
0 1 − ad−bc ad−bc

It follows that  
1 d −b
A−1 = . (3.27)
ad − bc −c a
D

Case 2. a = 0. Since 0 6= det A = ad − bc = bc in this case, it follows that c 6= 0 and calculations


as above again lead to formula (3.27).

You should now have understood

• the relation between the invertibility of a square matrix A and the existence and uniqueness
of solution of A~x = ~b,
• that inverting a matrix is the same as solving a linear system,
• etc.
You should now be able to

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 101

• calculate the inverse of a square matrix if it exists,


• use the inverse of a square matrix if it exists to solve the associated linear system,
• etc.

3.7 The transpose of a matrix


 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
 .. .. .. 
 
Definition 3.46. Let A = (aij )i=1,...,m =  . .  ∈ M (m × n). Then its trans-
. 

j=1,...,n  . . .
 .. .. .. 

am1 am2 · · · amn


pose At is the n × m matrix whose columns are the rows of A and whose rows are the columns of
A, that is,

FT
 
a11 a21 · · · am1
 a12 a22 · · · am2 
 .. .. .. 
 
t
A = . . .  ∈ M (n × m).

 . . .
 .. .. .. 

a1n a2n · · · amn

If we denote At = (e
aij ) i=1,...,n , then e
aij = aji for i = 1, . . . , n and j = 1, . . . , m.
RA
j=1,...,m

Examples 3.47. The transposes of


 
    1 2 3
1 2 1 2 3 4 5 6
A= , B= , C=
 
3 4 4 5 6 7 7 7
3 2 4

are
D

   
  1 4 1 4 7 3
t 1 3
A = , B t = 2 5 , C t = 2 5 7 2 .
2 4
3 6 3 6 7 4

Proposition 3.48. Let A, B ∈ M (m × n). Then (At )t = A and (A + B)t = At + B t .

Proof. Clear.

Theorem 3.49. Let A ∈ M (m × n) and B ∈ M (n × k). Then (AB)t = B t At .

Proof. Note that both (AB)t and B t At are m × k matrices. In order to show that they are equal,
we only need to show that they are equal in every entry. Let i ∈ {1, . . . , m} and j ∈ {1, . . . , k}.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
102 3.7. The transpose of a matrix

Then

component ij of (AB)t = component ji of AB


= [row j of A] × [column i of B]
= [column j of At ] × [row i of B t ]
= [row i of B t ] × [column j of At ]
= component ij of B t At .

Theorem 3.50. Let A ∈ M (n × n). Then A is invertible if and only if At is invertible. In this
case, (At )−1 = (A−1 )t .

Proof. Assume that A is invertible. Then AA−1 = id. Taking the transpose on both sides, we find

id = idt = (AA−1 )t = (A−1 )t At .

This shows that At is invertible and its inverse is (A−1 )t , see Theorem 3.44. Now assume that

FT
At is invertible. From what we just showed, it follows that then also its transpose (At )t = A is
invertible.
Next we show an important relation between transposition of a matrix and the inner product on
Rn .

Theorem 3.51. Let A ∈ M (m × n).


(i) hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn and all ~y ∈ Rm .
RA
(ii) If hA~x , ~y i = h~x , B~y i for all ~x ∈ Rn and all ~y ∈ Rm , then B = At .

Proof. Let A = (aij )i=1,...,m and B = (bij ) i=1,...,n .


j=1,...,n j=1,...,m
Pn
(i) Observe that the kth component of A~x is (A~x)k = j=1 akj xj . and that the `th coordinate
Pm
of At ~y is (At ~y )` = j=1 aj` yj . Then
m
X m X
X n n X
X m n
X
hA~x , ~y i = (A~x)k yk = akj xj yk = xj akj yk = xj (At ~y )j = h~x , At ~y i.
D

k=1 k=1 j=1 j=1 k=1 j=1

(ii) We have to show: For all i = 1, . . . , m and j = 1, . . . , n we have that aij = bji . Take
~x = ~ej ∈ Rn and ~y = ~ei ∈ Rm . If we take the inner product of A~ej with ~ei , then we obtain
the ith component of A~ej . Recall that A~ej is the jth column of A, hence

hA~ej ,~ei i = aij .

Similarly if we take the inner product of B~ei with ~ej , then we obtain the jth component of
B~ei . Since B~ei is the jth column of B it follows that

h~ej , B~ei i = bji .

By assumption hA~ej ,~ei i = h~ej , B~ei i, hence it follows that aij = bji , hence B = At .

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 103

Definition 3.52. Let A = (aij )ni,j=1 ∈ M (n × n) be a square matrix.

(i) A is called upper triangular if aij = 0 if i > j.


(ii) A is called lower triangular if aij = 0 if i < j.
(iii) A is called diagonal if aij = 0 if i 6= j. Diagonal matrices are sometimes denoted by
diag(c1 , . . . , cn ) where the c1 , . . . , cn are the numbers on the diagonal of the matrix.

That means that for an upper triangular matrix all entries below the diagonal are zero, for a lower
triangular matrix all entries above the diagonal are zero and for a diagonal matrix, all entries except
the ones on the diagonal must be zero. These matrices look as follows:
     
a11 a11 a11



a22 ∗ 
,




a22 0 
,




a22 0 

0 ∗ 0
     
     
ann ann ann

upper triangular matrix,

FT
lower triangular matrix, diagonal matrix diag(a11 , . . . , ann ).

Remark 3.53. A matrix is both upper and lower triangular if and only if it is diagonal.

Examples 3.54.
RA
   
  0 2 4 2 0 0 0 0    
1 2 4 0 2 0 0 0 0 0
0 5 2 0 3 0 0
A = 0 2 5 , B = 
0
, C =   , D = 0 3 0 , E = 0 0 0 .
0 0 8 3 4 0 0
0 0 3 0 0 8 0 0 0
0 0 0 0 5 0 0 1

The matrices A, B, D, E are upper triangular, C, D, E are lower triangular, D, E are diagonal.

Definition 3.55. (i) A matrix A ∈ M (n × n) is called symmetric if At = A. The set of all


symmetric n × n matrices is denoted by Msym (n × n).
D

(ii) A matrix A ∈ M (n × n) is called antisymmetric if At = −A. The set of all antisymmetric


n × n matrices is denoted by Masym (n × n).

Examples 3.56.
       
1 7 4 3 0 4 0 2 −5 0 0 8
A = 7 2 5 , B = 0 4 0 , C = −2 0 −3 , D = 0 3 0 .
4 5 3 4 0 1 5 3 0 2 0 0

The matrices A and B are symmetric, C is antisymmetric and D is neither.

Clearly, every diagonal matrix is symmetric.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
104 3.8. Elementary matrices

Exercise 3.57. • Let A ∈ M (n × n). Show that A + At is symmetric and that A − At is


antisymmetric.
• Show that every matrix A ∈ M (n × n) can be written as the sum of symmetric and an
antisymmetric matrix.

Question 3.2
How many possibilities are there to express a given matrix A ∈ M (n × n) as sum of a symmetric
and an antisymmetric matrix?

Exercise 3.58. Show that the diagonal entries of an antisymmetric matrix are 0.

You should now have understood

FT
• why (AB)t = B t At ,
• what the transpose of a matrix has to do with the inner product,
• etc.
You should now be able to
• calculate the transpose of a given matrix,
• check if a matrix is symmetric, antisymmetric or none,
RA
• etc.

3.8 Elementary matrices


In this section we study three special types of matrices. They are called elementary matrices. Let
us define them.

Definition 3.59. For n ∈ N we define the following matrices in M (n × n):


D

 
1
 
(i) Sj (c) =   for j = 1, . . . , n and c 6= 0. All entries outside the diagonal are 0.
 c 
 
1

column k
 
1 c  row j
 
(ii) Qjk (c) = 


 for j, k = 1, . . . , n with j 6= k and c ∈ R. The number c is
 
 
1

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 105

in row j and column k. All entries apart from c and the diagonal are 0.

col. k col. j
 
1
 
 

 0 1 

row k
 
(iii) Pjk (c) = 


 for j, k = 1, . . . , n. This matrix is obtained from
 
 

 1 0 
 row j
 
1
the identity matrix by swapping rows j and k (or, equivalently, by swapping columns j and
k).

Examples 3.60. Let us see some examples for n = 2.

FT
     
5 0 1 0 0 1
S1 (5) = , Q21 (3) = , P12 = .
0 1 3 1 1 0

Some examples for n = 3:


       
1 0 0 1 0 0 0 0 1 0 1 0
S3 (−2) = 0 1 0 , Q23 (4) = 0 1 4 , P31 = 0 1 0 P21 = 1 0 0 .
RA
0 0 −2 0 0 1 1 0 0 0 0 1

Let us see how these matrices act on other n × n matrices. Let A = (aij )ni,j=1 ∈ M (n × n). We
want to calculate EA where E is an elementary matrix.

    
1 a11 a12 a1n a11 a12 a1n
    
    
• Sj (c)A = 
D

 =  caj1 caj2
    
c  aj1 aj2 ajn cajn 
    
    
    
1 an1 an2 ann an1 an2 ann

   
  a11 a12 a1n a11 a12 a1n
1    
    
 aj1 aj2 ajn
  

 1 c    aj1 + cak1 aj2 + cak2 ajn + cakn 
• Qjk (c)A =   =
  
 
 
  
 ak1 ak2 akn
 
  
  ak1 ak2 akn 

    
1    
an1 an2 ann an1 an2 ann

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
106 3.8. Elementary matrices

    
1 a11 a12 a1n a11 a12 a1n
    
    

 0 1 
 aj1 aj2 ajn   ak1 ak2
 
akn


• Pjk A =   = .
  
 
  
  aj1 aj2 ajn

ak1 ak2 akn

 
1 0

    
 

 
 


1 an1 an2 ann an1 an2 ann

In summary, we see that

Proposition 3.61. • Sj (c) multiplies the jth row of A by c.


• Qjk (c) sums c times the kth row to the jth row of A.
• Pjk swaps the kth and the jth row of A.

FT
These are exactly the row operations from the Gauß or Gauß-Jordan elimination! So we see that
every row operation can be achieved by multiplying from the left by an appropriate elementary
matrix.

Remark 3.62. The form of the elementary matrices is quite easy to remember if you recall that
E idn = E for every matrix E, in particular for an elementary matrix. So, if you want to remember
e.g. how the 5 × 5 matrix looks like which sums 3 times the 2nd row to the 4th, just remember
that this matrix is
RA
 
1 0 0 0 0
 0 1 0 0 0
 
E = E id5 = (take id5 and sum 3 times its 2nd row to its 4th row) =   0 0 1 0 0

 0 3 0 1 0
0 0 0 0 1

which is Q42 (3).

Question 3.3
D

How do elementary matrices act on other matrices if we multiply them from the right?
Hint. There are two ways to find the answer. One is to carry out the matrix multiplication as we
did on page 107. Or you could use that AE = [(AE)t ]t = [E t At ]t . If E is an elementary matrix,
then so is E t , see Proposition 3.64. Since you know how E t At looks like, you can then deduce
how its transpose looks like.

Since the action of an elementary matrix can be “undone” (since the corresponding row operation
can be undone), we expect them to be invertible. The next proposition shows that they indeed are
and that their inverse is again an elementary matrix of the same type.

Proposition 3.63. Every elementary n × n matrix is invertible. More precisely, for j, k = 1, . . . , n


with j 6= k the following holds:

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 107

(i) (Sj (c))−1 = Sj (c−1 ) for c 6= 0.


(ii) (Qjk (c))−1 = Qjk (−c).
(iii) (Pjk )−1 = Pjk .

Proof. Straightforward calculations.

Show that Proposition 3.63 is true. Convince yourself that it is true using their interpretation as
row operations.

Proposition 3.64. The transpose of an elementary n × n matrix is again an elementary matrix.


More precisely, for j, k = 1, . . . , n with j 6= k the following holds:
(i) (Sj (c))t = Sj (c) for c 6= 0.
(ii) (Qjk (c))t = Qkj (c).
(iii) (Pjk )t = Pjk .

Proof. Straightforward calculations.

FT
Exercise 3.65. Show that Qjk (c) = Sk (c−1 )Qjk (1)Sk (c) for c 6= 0. Interpret the formula in
terms of row operations.

Exercise. Show that Pjk can be written as product of matrices of the form Qjk (c) and Sj (c).
RA
Let us come back to the relation of elementary matrices and the Gauß-Jordan elimination process.

Proposition 3.66. Let A ∈ M (n × n) and let A0 be a row echelon form of A. Then there exist
elementary matrices E1 , . . . , Ek such that

A = E1 E2 · · · Ek A0 .

Proof. We know that we can arrive at A0 by applying suitable row operations to A. By Propo-
sition 3.61 they correspond to multiplication of A from the left by suitable elementary matrices
D

Fk , Fk−1 , . . . , F2 , F1 , that is
A0 = Fk Fk−1 · · · F2 F1 A.
We know that all the Fj are invertible, hence their product is invertible and we obtain

A = [Fk Fk−1 · · · F2 F1 ]−1 A0 = F1−1 F2−1 · · · Fk−1


−1
Fk−1 A0 .

We know that the inverse of every elementary matrix Fj is again an elementary matrix, so if we set
Ej = Fj−1 for j = 1, . . . , k, the proposition is proved.

Corollary 3.67. Let A ∈ M (n × n). Then there exist elementary matrices E1 , . . . , Ek and an
upper triangular matrix U such that

A = E1 E2 · · · Ek U.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
108 3.8. Elementary matrices

Proof. This follows immediately from Proposition 3.66 if we recall that every row reduced echelon
form of A is an upper triangular matrix.
The next theorem shows that every invertible matrix is “composed” of elementary matrices.

Theorem 3.68. Let A ∈ M (n × n). Then A is invertible if and only if it can be written as product
of elementary matrices.

Proof. Assume that A is invertible. Then the reduced row echelon form of A is idn . Therefore,
by Proposition 3.66, there exist elementary matrices E1 , . . . , Ek such that A = E1 · · · Ek idn =
E1 · · · Ek .
If, on the other hand, we know that A is the product of elementary matrices, say, A = F1 · · · F` , then
clearly A is invertible since each elementary matrix Fj is invertible and the product of invertible
matrices is invertible.
We finish this section with an exercise where we write an invertible 2 × 2 matrix as product of
elementary matrices. Notice that there are infinitely many ways to write it as product of elementary

FT
matrices just as there are infinitely many ways of performing row reduction to get to the identity
matrix.
 
1 2
Example 3.69. Write the matrix A = as product of elementary matrices.
3 4
Solution. We use the idea of the proof of Theorem 3.43: we apply the Gauß-Jordan elimination
process and write the corresponding row transformations as elementary matrices.
RA
       
1 2 R2 →R2 −3R1 1 2 R1 →R1 +R2 1 0 R2 →− 12 R2 1 0
−−−−−−−−→ −−−−−−−−→ −−−−−− −→
3 4 Q21 (−3) 0 −2 Q12 (1) 0 −2 S2 (− 12 ) 0 1
| {z } | {z } | {z }
=Q21 (−3)A =Q21 (1)Q21 (−3)A =S2 (− 12 )Q21 (1)Q21 (−3)A

So we obtain that
id2 = S2 (− 12 )Q21 (1)Q21 (−3)A. (3.28)
Since the elementary matrices are invertible, we can solve for A and obtain
A = [S2 (− 21 )Q21 (1)Q21 (−3)]−1 id2 = [S2 (− 21 )Q21 (1)Q21 (−3)]−1
D

= [Q21 (−3)]−1 [Q21 (1)]−1 [S2 (− 12 )]−1


= Q21 (3)Q21 (−1)S2 (−2). 
Note that from (3.28) we get the factorisation for A−1 for free. Clearly, we must have
A−1 = S2 (− 12 )Q21 (1)Q21 (−3). (3.29)
If we wanted to we could now use (3.29) to calculate A−1 . It is by no means a surprise that we
actually get first the factorisation of A−1 because the Gauß-Jordan elimination leads to the inverse
of A. So A−1 is the composition of the matrices which leads from A to the identity matrix. (To
get from the identity matrix to A, we need to reverse these steps.)

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 109

You should now have understood


• the relation of the elementary matrices with the Gauß-Jordan process,
• why a matrix is invertible if and only if it is the product of elementary matrices,
• etc.
You should now be able to
• express an invertible matrix as product of elementary matrices,
• etc.

3.9 Summary
Elementary row operations (= operations which lead to an equivalent system) for
solving a linear system.

FT
Elementary operation Notation Inverse Operation
1 Swap rows j and k. Rj ↔ Rk Rj ↔ Rk
2 Multiply row j by some λ ∈ R \ {0} Rj → λRj Rj → λ1 Rj
3 Replace row k by the sum of row k and λ times Rk → Rk + λRj Rk → Rk − λRj
Rj and keep row j unchanged.
RA
On the solutions of a linear system.
• A linear system has either no, exactly one or infinitely many solutions.
• If the system is homogeneous, then it has either exactly one or infinitely many solutions. It
always has at least one solution, namely the trivial one.
• The set of all solutions of a homogeneous linear equations is a vector space.
• The set of all solutions of a inhomogeneous linear equations is an affine vector space.

For A ∈ M (m × n) and ~b ∈ Rm consider the equation A~x = ~b. Then the following is true:
D

(1) No solution ⇐⇒ The reduced row echelon form of the augmented


system (A|~b) has a row of the form (0 · · · 0|β) with
some β 6= 0.

(2) At least one solution ⇐⇒ The reduced row echelon form of the augmented
system (A|~b) has no row of the form (0 · · · 0|β)
with some β 6= 0.
In case (2), we have the following two sub-cases:
(2.1) Exactly one solution ⇐⇒ # pivots= # columns.
(2.2) Infinitely many solutions ⇐⇒ # pivots< # columns.

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
110 3.9. Summary

Algebra with matrices and vectors


A matrix A ∈ M (m × n) can be viewed as a function A : Rn → Rm .

Definition.
    
a11 a12 ··· a1n x1 a11 x1 + a12 x2 + · · · + a1n xn
 a21 a22 ···   x2  a21 x1 + a22 x2 + · · · + a2n xn 
a2n     
A~x =  . ..   ..  =  .. ,

.. ..
 .. . .  .   . . 
am1 am2 ··· amn xn am1 x1 + am2 x2 + · · · + amn xn
   
a11 a12 ··· a1n b11 b12 ··· b1n
 a21 a22 ··· a2n   b21 b22 ··· b2n 
A+B = . ..  +  ..
   
.. .. .. 
 .. . .   . . . 
am1 am2 ··· amn bm1 bm2 · · · bmn
 
a11 + b11 a12 + b12 ··· a1n + b1n

FT
 a21 + b21 a22 + b22 ··· a2n + b2n 
= ,
 
.. .. ..
 . . . 
am1 + bm1 am2 + bm2 ··· amn + bmn
  
a11 a12 ··· a1n b11 b12 ··· b1n
 a21 a22 ··· a2n   b21 b22 ··· b2n 
AB =  .
  
.. ..   .. .. .. 
 .. . .  . . . 
RA
am1 am2 · · · amn bm1 bm2 · · · bmn
 
a11 b11 + a12 b21 + · · · + a1n bn1 · · · a11 b1k + a12 b2k + · · · + a1n bnk
 a21 b11 + a22 b21 + · · · + a2n bn1 · · · a21 b1k + a22 b2k + · · · + a2n bnk 
=
 
.. .. 
 . . 
am1 b11 + am2 b21 + · · · + amn bn1 ··· am1 b1k + am2 b2k + · · · + amn bnk

= (cj` )j`
D

with
n
X
cj` = ajh bh` .
h=1

• Sum of matrices: componentwise,


• Product of matrices with vector or matrix with matrix: “multiply row by column”.

Properties. Let A1 , A2 , A2 ∈ M (m × n), B ∈ M (n × k), C ∈ M (k × r) be matrices, ~x, ~y ∈ Rn ,


~z ∈ Rk and c ∈ K.

• A1 + A2 = A2 + A1 ,
• (A1 + A2 ) + A3 = A1 + (A2 + A3 ),

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 111

• (AB)C = A(BC),
• in general, AB 6= BA,
• A(~x + c~y ) = A~x + cA~y ,
• (A1 + cA2 )~x = A1 ~x + cA2 ~x,
• (AB)~z = A(B~z),

Transposition of matrices

Let A = (aij )i=1,...,m ∈ M (m × n). Then its transpose is the matrix At = (e


aij ) i=1,...,n ∈ M (n × m)
j=1...,n j=1...,m
with e
aij = aji .
For A, B ∈ M (m × n) and C ∈ M (n × k) we have

• (At )t = A,

FT
• (A + B)t = At + B t ,
• (AC)t = C t At ,
• hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn and ~y ∈ Rm .

A matrix A is called symmetric if At = A and antisymmetric if At = −A. Note that only square
matrices can be symmetric.
A matrix A = (aij )i,j=1,...,n ∈ M (n × n) is called
RA
• upper triangular if aij = 0 whenever i > j,
• lower triangular if aij = 0 whenever i < j,
• diagonal if aij = 0 whenever i 6= j.

Clearly, a matrix is diagonal if and only if it is upper and lower triangular. The transpose of an
upper triangular matrix is lower triangular and vice verse. Every diagonal matrix is symmetric.
D

Invertibility of matrices

A matrix A ∈ M (n × n) is called invertible if there exists a matrix B ∈ M (n × n) such that


AB = BA = idn . In this case B is called the inverse of A and it is denoted by A−1 . If A is not
invertible, then it is called singular.

• The inverse of an invertible matrix A is unique.


• If A is invertible, then so is A−1 and (A−1 )−1 = A.
• If A is invertible, then so is At and (At )−1 = (A−1 )t .
• If A and B are invertible, then so is AB and (AB)−1 = B −1 A−1 .

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
112 3.9. Summary

Theorem. Let A ∈ M (n × n). Then the following is equivalent:

(i) A is invertible.
(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iii) The equation A~x = ~0 has exactly one solution.
(iv) Every row-reduced echelon form of A has n pivots.
(v) A is row-equivalent to idn .

Calculation of A−1 using Gauß-Jordan elimination

Let A ∈ M (n × n). Form the augmented matrix (A| idn ) and use the Gauß-Jordan elimination to
reduce A to its reduced row echelon form A0 : (A| idn ) → · · · → (A0 |B). If A0 = idn , then A is
invertible and A−1 = B. If A0 6= idn , then A is not invertible.

FT
Inverse of a 2 × 2 matrix
 
a b
Let A = . Then det A = ad − bc. If det A = 0, then A is not invertible. If det A 6= 0, then
c d  
−1 1 d −b
A is invertible and A = det A .
−c a
RA
Elementary matrices

We have the following three types of elementary matrices:

• Sj (c) = (sik )i,k=1...,n for c 6= 0 where sik = 0 if i 6= k, skk = 1 for k 6= j and sjj = c,

• Qjk (c) = (qi` )i,`=1...,n for j 6= k, where qjk = c, q`` = 1 for all ` = 1, . . . , n and all other
coefficients equal to zero,

• Pjk = (pi` )i,`=1...,n for j 6= k, where p`` = 1 for all ` ∈ {1, . . . , n} \ {j, k}, pjk = pkj = 1 and
D

all other coefficients equal to zero.

col. k col. k
 
column k 1
   
   
1 0 1 row k
1
 
   c 
 row j




Sj (c) = 
 c
,
 Qjk (c) = 


 , Pjk =


 .
     
   
1 1  1 0  row j
 
 
1

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 113

Relation Elementary matrix - Elemntary row operation

Elementary matrix Elementary operation Notation


Pjk Swap rows j with row k Rj ↔ Rk
Sj (c), c 6= 0 Multiply row j by c Rj → cRk
Qjk (c) Sum c times row k to row j Rk → Rk + cRj

3.10 Exercises
1. Vuelva al Capı́tulo 1 y haga los ejercicios otra vez utilizando los conocimientos adquiridos en
este capı́tulo.

2. Encuentre un polinomio de grado a lo más 2 que pase por los puntos (−1, −6), (1, 0), (2, 0).

FT
¿Cuántos tales polinomios hay?

3. (a) ¿Existe un polynomio de grado 1 que pase por los tres puntos del Ejercicio 2? ¿Cuántos
tales polinomios hay?
(b) ¿Existe un polynomio de grado 3 que pase por los tres puntos del Ejercicio 2? ¿Cuántos
tales polinomios hay? Dé por lo menos dos polinomios de grado 3.
RA
2x2 − 4x + 14
4. Encuentre las fracciones parciales de .
x(x − 2)2

5. Encuentre un sistema lineal 2 × 3 cuya solución sea


   
1 4
2 + t 5 , t ∈ R.
3 6
D

¿Existen sistemas 3 × 3 y 4 × 3 con las mismas solucioes? Dé ejemplos o diga por qué no existen.
¿Existe un sistema 4 × 3 con las mismas solucioes? Dé ejemplos o diga por qué no existen.

6. Encuentre un sistema lineal 4 × 4 cuya solución sea


     
1 4 7
2 5 3
  + s  + t ,
3 6 2 s, t ∈ R.
4 7 1

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
114 3.10. Exercises

7. Considere el sistema lineal


x1 + 2x2 + 3x3 = b1
3x1 − x2 + 2x3 = b2
4x1 + x2 + x3 = b3 .

Encuentre todo los posibles b1 , b2 , b3 , o diga por qué no hay, para que el sistema tenga
(a) exactamente una solución,
(b) ninguna solución,
(c) infinitas soluciones.

8. Calcule todas las posibles combinaciones (matriz)(vector):


 
  1 0  
1 0 3 6 1 3 6  
4 8 −1 2 7

FT
A = 4 8 1 0 , B= , C = 4 1 0 , D= ,
1 4 3 −2 2
1 4 4 3 1 4 3
5 −4
  1  
1  4 4  
    1
0 2    3 −3
3 ,
~r =   ~v = , w
~ =  3 ,

 5 ,
~x =   ~y = , ~z = −2 .
3  5 5
π
6 −1
−1
RA
   
2 6 −1 17
9. Sean A = 1 −2 2 y ~b =  6 . Encuentre todos los vectores ~x ∈ R3 tal que A~x = ~b.
1 2 −2 4

 
1 1
10. Sea M = .
−1 3
(a) Demuestre que no existe ~y 6= 0 tal que M ~y ⊥ ~y .
D

(b) Encuentre todos los vectores ~x 6= 0 tal que M~xk~x. Para cada tal ~x, encuentre λ ∈ R tal
que M~x = λ~x.

11. Calcule todas las posibles combinaciones (matriz)(matriz):


 
  1 0  
1 0 3 6 4 8  1 3 6
A = 4 8 1 0 , B =  1 4  , C = 4 1 0 ,
  
1 4 4 3 1 4 3
5 −4
   
−1 2 7 1 0
D= , E= .
3 −2 2 3 6

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 115

12. Determine si las matrices son invertibles. Si lo son, encuentre su matriz inversa.
   
    1 3 6 1 4 6
1 −2 −14 21
A= , B= , D = 4 1 0 , E = 2 1 5 .
2 7 12 −18
1 4 3 3 5 11

13. De las siguientes matrices determine si son invertibles. Si lo son, encuentre su matriz inversa.
 
      1 3 6
1 0 5 2 4 10
A= , B= , C= , D = 4 1 0 .
3 6 8 6 6 15
1 4 3

14. Una tienda vende dos tipos de cajitas de dulces:


Tipo A contiene 1 chocolate y 3 mentas, Tipo B contiene 2 chocolates y 1 menta.

(a) Dé una ecuación de la forma A~x = ~b que describe lo de arriba. Diga que siginifican los
vectores ~x y ~b.

FT
(b) Calcule, usando el resultado de (a), cuantos chocolates y cuantas mentas contienen:
(i) 1 caja de tipo A y 3 de tipo B, (iii) 2 caja de tipo A y 6 de tipo B,
(ii) 4 cajas de tipo A y 2 de tipo B, (iv) 3 cajas de tipo A y 5 de tipo B.

(c) Determine si es posible conseguir


(i) 5 chocolates y 15 mentas, (iii) 21 chocolates y 23 mentas,
(ii) 2 chocolates y 11 mentas, (iv) 14 chocolates y 19 mentas.
RA
comprando cajitas de dulces en la tienda. Si es posible, diga cuántos de cada tipo se
necesitan.

 
1 3
15. Sea Ak = y considere la ecuación
2 k
 
0
Ak ~x = . (∗)
D

(a) Encuentre todos los k ∈ R tal que (∗) tiene exactamente una solución para ~x.
(b) Encuentre todos los k ∈ R tal que (∗) tiene infinitas soluciones para ~x.
(c) Encuentre todos los k ∈ R tal que (∗) tiene ninguna solución para ~x.
 
2
(d) Haga lo mismo para Ak ~x = en vez de (∗).
3
   
b1 b
(e) Haga los mismo para Ak ~x = en vez de (∗) donde 1 es un vector arbitrario distinto
  b 2 b2
0
de .
0

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
116 3.10. Exercises

16. Escriba las matrices invertibles de los Ejercicios 12 y 13 como producto de matrices elementales.

17. Para las sigientes matrices encuentre matrices elementales E1 , . . . , En tal que E1 · E2 · · · · · En A
es de la forma triangular superior.
   
  1 4 −4 1 2 3
7 4
A= , B = 2 1 0  , C = 1 2 0 .
3 5
3 5 3 2 4 3

18. Sea A ∈ M (m × n) y sean ~x, ~y ∈ Rn , λ ∈ R. Demuestre que A(~x + λ~y ) = A~x + λA~y .

19. Demuestre que el espacio M (m × n) es un espacio vectorial con la suma de matrices y producto
con λ ∈ R usual.

FT
20. Sea A ∈ M (n × n).

(a) Demuestre que hA~x, ~y i = h~x, At ~y i para todo ~x ∈ Rn .


(b) Demuestre que hAAt ~x, ~xi ≥ 0 para todo ~x ∈ Rn .

21. Sea A = (aij ) i=1,...,n ∈ M (m × n) y sea ~ek el k-ésimo vector unitario en Rn (es decir, el vector
j=1,...,m
RA
en Rn cuya k-ésima entrada es 1 y las demás son cero). Calcule A~ek para todo k = 1, . . . , n y
describa en palabras la relación del resultado con la matriz A.

22. (a) Sea A ∈ M (m × n) y suponga que A~x = ~0 para todo ~x ∈ Rn . Demuestre que A = 0 (la
matriz cuyas entradas son 0).
(b) Sea x ∈ Rn y suponga que A~x = ~0 para todo A ∈ M (n × n). Demuestre que ~x = ~0.
(c) Encuentre una matriz A ∈ M (2 × 2) y ~v ∈ R2 , ambos distintos de cero, tal que A~v = ~0.
D

(d) Encuentre matrices A, B ∈ M (2 × 2) tal que AB = 0 y BA 6= 0.

   
4 −1
23. Sean ~v = yw
~= .
5 3

(a) Encuentre una matriz A ∈ M (2 × 2) que mapea el vector ~e1 a ~v y el vector ~e2 a w.
~
(b) Encuentre una matriz B ∈ M (2 × 2) que mapea el vector ~v a ~e1 y el vector w
~ a ~e2 .

24. Encuentre una matriz A ∈ M (2 × 2) que describe una rotación por π/3.

25. Sean A ∈ M (m, n), B, C ∈ M (n, k), D ∈ M (k, l).

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 3. Linear Systems and Matrices 117

(a) Demuestre que A(B + C) = AB + AC.


(b) Demuestre que A(BD) = (AB)D.

26. Sean R, S ∈ M (n, n) matrices invertibles. Demuestre que

RS = SR ⇐⇒ R−1 S −1 = S −1 R−1 .

27. Falso o verdadero? Pruebe sus respuestas.


(a) Si A es una matriz simétrica invertible, entonces A−1 es sı́metrica.
(b) Si A, B son matrices simétricas, entonces AB es sı́metrica.
(c) Si AB es una matriz simétrica, entonces A, B son matrices simétricas.
(d) Si A, B son matrices simétricas, entonces A + B es sı́metrica.

FT
(e) Si A + B es una matriz simétrica, entonces A, B son matrices simétricas.
(f) Si A es una matriz simétrica, entonces At es sı́metrica.
(g) AAt = At A.

28. Sea A ∈ M (m × n). Demuestre que AAt y At A son matrices simétricas.


RA
29. Sea A ∈ M nm × n). Demuestre que A + At es simétrica y que Demuestre que A − At es
antisimétrica.

30. Calcule (Sj (c))t , (Qij (c))t , (Pij )t .

31. (a) Sea P12 = ( 01 10 ) ∈ M (2 × 2). Demuestre que P12 se deja expresar como producto de
matrices elementales de la forma Qij (c) y Sk (c).
D

(b) Pruebe el caso general: Sea Pij ∈ M (n × n). Demuestre que Pij se deja expresar como
producto de matrices elementales de la forma Qkl (c) y Sm (c).
Observación: El ejercicio demuestra que en verdad solo hay dos tipos de matrices elementales
ya que el tercero (las permutaciones) se dejan reducir a un producto apropiado de matrices de
tipo Qij (c) y Sj (c).

Last Change: Mi 6. Apr 00:24:13 CEST 2022


Linear Algebra, M. Winklmeier
D
RA
FT
Chapter 4. Determinants 119

Chapter 4

Determinants

In this section we will define the determinant of matrices in M (n × n) for arbitrary n and we will
recognise the determinant for n = 2 defined in Section 1.2 as a special case of our new definition.

FT
We will discuss the main properties of the determinant and we will show that a matrix is invertible
if and only if its determinant is different from 0. We will also give a geometric interpretation of
the determinant and get a glimpse of its importance in geometry and the theory of integration.
Finally we will use the determinant to calculate the inverse of an invertible matrix and we will
prove Cramer’s rule.

4.1 Determinant of a matrix


RA
Recall that in Section 1.2 on page 19 we defined the determinant of a 2 × 2 matrix by
 
a a12
det 11 = a11 a22 − a12 a21 .
a21 a22
Moreover, we know that a 2 × 2 matrix A is invertible if and only if its determinant is different
from 0 because both statements are equivalent to the associated homogeneous system having only
the trivial solution.
In this section we will define the determinant for arbitrary n × n matrices and we will see that
D

again the determinant tells us if a matrix is invertible or not. We will give several formulas for the
determinant. As definition, we use the Leibniz formula because it is non-recursive. First we need
to know what a permutation is.

Definition 4.1. A permutation of a set M is a bijection M → M . The set of all permutations of


the set M = {1, . . . , n} is denoted by Sn . We denote an element σ ∈ Sn by
 
1 2 ··· n−1 n
σ(1) σ(2) · · · σ(n − 1) σ(n).
The sign (or parity) of a permutation σ ∈ Sn is
sign(σ) = (−1)#inversions of σ
where an inversion of σ is a pair i < j with σ(i) > σ(j).

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
120 4.1. Determinant of a matrix

Note that Sn consists of n! permutations.

Examples 4.2. (i) S2 consists of two permutations:


   
1 2 1 2
σ
1 2 2 1

sign(σ) 1 -1

(ii) S3 consists of six permutations:


           
1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
σ
1 2 3 2 3 1 3 1 2 2 1 3 1 3 2 3 2 1

sign(σ) 1 1 1 -1 -1 -1

FT
For instance the second permutation has two inversions (1 < 3 but σ(1) > σ(3) and 2 < 3
but σ(2) > σ(3)), the third permutation has two inversions (1 < 2 but σ(1) > σ(2), 1 < 3 but
σ(1) > σ(3)), etc.

Definition 4.3. Let A = (aij )i,j=1,...,n ∈ M (n × n). Then its determinant is defined by
X
det A = sign(σ) a1σ(1) a2σ(2) · · · anσ(n) . (4.1)
σ∈Sn
RA
The formula in equation (4.1) is called the Leibniz formula.

Remark. Another notation for the determinant is |A|.

Remark 4.4. Note that according to the formula

(a) the determinant is a sum of n! terms,


D

(b) each term is a product of n components of A,


(c) in each product, there is exactly one factor from each row and from each column and all such
products appear in the formula.

So clearly, the Leibniz formula is computational nightmare . . .

Equal rights for rows and columns!


Show that X
det A = sign(σ) aσ(1)1 aσ(2)2 · · · aσ(n)n . (4.2)
σ∈Sn

This means: instead of putting the permutation in the column index, we can just as well put
them in the row index.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 121

Let us check if this new definition coincides with our old definition for the case n = 2.
 
a11 a12 X
det = sign(σ) a1σ(1) a2σ(2) = a11 a22 − a21 a12
a21 a22
σ∈S2

which is the same as our old definition.


Now let us see what the formula gives us for the case n = 3. Using our table with the permutations
in S3 , we find
 
a11 a12 a13 X
det A = det a21 a22 a23  = sign(σ) a1σ(1) a2σ(2) a3σ(3)
a31 a32 a33 σ∈S3

= a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 − a11 a23 a32 − a13 a22 a31 . (4.3)

Now let us group terms with coefficients from the first line of A.

FT
     
det A = a11 a22 a33 − a23 a32 − a12 a21 a33 − a23 a31 + a13 a21 a32 − a22 a31 . (4.4)

We see that the terms in brackets are again determinants:

• a11 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 1.

• a12 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
RA
and column 2.

• a13 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 3.

If we had grouped the terms by coefficients from the second row, we would have obtained something
similar: each term a2j would be multiplied by the determinant of the 2 × 2 matrix obtained from
A by deleting row 2 and column j.
Of course we could also group the terms by coefficients all from the first column. Then the formula
D

would become a sum of terms where the aj1 are multiplied by the determinants of the matrices
obtained from A by deleting row j and column 1.
This motivates the definition of the so-called minors of a matrix.

Definition 4.5. Let A = (aij )i,j=1,...,n ∈ M (n × n). Then the (n − 1) × (n − 1) matrix Mij which
is obtained from A by deleting row i and column j of A is called a minor of A. The corresponding
cofactor is Cij := (−1)i+j det(Mij ).

With these definitions we can write (4.3) as

3
X 3
X
det A = (−1)1+j a1j det M1j = a1j C1j .
j=1 j=1

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
122 4.1. Determinant of a matrix

This formula is called the expansion of the determinant of A along the first row. We also saw that
we can expand along the second or the third row, or along columns, so

3
X 3
X
det A = (−1)k+j akj det Mkj = akj Ckj for k = 1, 2, 3,
j=1 j=1
3
X 3
X
det A = (−1)i+k aik det Mik = aik Cik for k = 1, 2, 3.
i=1 i=1

The first formula is called expansion along the kth row, and the second formula is called expansion
along the kth column. With a little more effort we can show that an analogous formula is true for
arbitrary n.

Theorem 4.6. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let Mij denote its minors. Then

n n

FT
X X
det A = (−1)k+j akj det Mkj = akj Ckj for k = 1, 2, . . . , n, (4.5)
j=1 j=1
Xn n
X
det A = (−1)i+k aik det Mik = aik Cik for k = 1, 2, . . . , n. (4.6)
i=1 i=1

Proof. The formulas can be obtained from the Leibniz formula by straightforward calculations; but
RA
they are long and quite messy so we omit them here.

The formulas (4.5) and (4.5) are called Laplace expansion of the determinant. More precisely, (4.5)
is called expansion along the kth row, (4.6) is called expansion along the kth column.
Note that for calculating for instance the determinant of a 5×5 matrix, we have to calculate five 4×4
determinants for each of which we have to calculate four (3×3) determinants, etc. Computationally,
it is as long as the Leibniz formula, but at least we do not have to find all permutations in Sn first.
Later, we will see how to calculate the determinant using Gaussian elimination. This is computa-
D

tionally much more efficient, see Remark 4.12.

Example 4.7. We use expansion along the second column to calculate


       
3 2 1 3 2 1 3 2 1 3 2 1
det 5 6 4 = − 2 det 5 6 4 + 6 det 5 6 4 − 0 det 5 6 4
8 0 7 8 0 7 8 0 7 8 0 7
     
5 4 3 1 3 1
= −2 det + 6 det − 0 det
8 7 8 7 5 4
   
= −2 5 · 7 − 4 · 8] + 6 3 · 7 − 1 · 8] = −2 35 − 32] + 6 21 − 8] = −6 + 78 = 72.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 123

We obtain the same result if we expand the determinant along e.g. the first row:
       
3 2 1 3 2 1 3 2 1 3 2 1
det 5 6 4 = 3 det 5 6 4 − 2 det 5 6 4 + 1 det 5 6 4
8 0 7 8 0 7 8 0 7 8 0 7
     
6 4 5 4 5 6
= 3 det − 2 det + 1 det
0 7 8 7 8 0
   
= 3 6 · 7 − 4 · 0] − 2 5 · 7 − 4 · 8] + 5 · 0 − 6 · 8] = 3 · 42 − 2 35 − 32] − 40 = 126 − 6 − 48 = 72.

Example 4.8. We give an example of the calculation of the determinant of a 4 × 4 matrix. The
red arrows indicate along which row or column we expand.
 
1 2 3 4        
0 6 0 1 6 0 1 0 0 1 0 6 1 0 6 0
det   = det 0 7 0 − 2 det 2 7 0 + 3 det 2 0 0 − 4 det 2 0 7
2 0 7 0

FT
3 0 1 0 0 1 0 3 1 0 3 0
0 3 0 1
          
6 1 0 1 6 1 2 7
= 7 det − 2 7 det + 3 −2 det − 4 −6 det
3 1 0 1 3 1 0 0
= 7[6 − 3] − 14[0 − 0] − 6[6 − 3] + 24[0 − 0] = 21 − 18 = 3.

Now we calculate the determinant of the same matrix but choose a row with more zeros in the first
step. The advantage is that there are only two 3 × 3 minors whose determinants we really have to
RA
compute.

 
1 2 3 4        
0 6 0 1 2 3 4 1 3 4 1 2 4 1 2 3
det   = −0 det 0 7 0 + 6 det 2 7 0 − 0 det 2 0 0 + det 2 0 7
2 0 7 0
3 0 1 0 0 1 0 3 1 0 3 0
0 3 0 1
         
2 0 1 4 0 7 2 3
= 6 −3 det +7 + det − 2 det
0 1 0 1 3 0 3 0
D

= 6[−6 + 7] + [−21 + 18] = 6 − 3 = 3.

Rule of Sarrus
We finish this section with the so-called rule of Sarrus. From (4.3) we know that
 
det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 + a11 a22 a33 + a13 a22 a31
which can be memorised as follows: Write down the matrix A and append its first and second
column to it. Then we sum the products of the three terms lying on diagonals from the top left to
the bottom right and subtract the products of the terms lying on diagonals from the top right to
the bottom left as in the following picture:

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
124 4.2. Properties of the determinant

a11 a12 a13 a11 a12

a21 a22 a23 a21 a22

a31 a32 a33 a31 a32

 
det A = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a13 a22 a31 + a11 a23 a32 + a12 a21 a33 .

The rule of Sarrus works only for 3 × 3 matrices!!!

Convince yourself that one could also append the first and the second row below the matrix and
make crosses.

FT
Example 4.9 (Rule of Sarrus).

 
1 2 3  
det 4 5 6 = 1 · 5 · 7 + 2 · 6 · 0 + 3 · 4 · 8 − 3 · 5 · 0 + 6 · 8 · 1 + 7 · 2 · 4
0 8 7
= 35 + 96 − [48 + 56] = 131 − 106 = 27.
RA
You should now have understood
• what a permutation is,
• how to derive the Laplace expansion formula from the Leibniz formula,
• etc.
D

You should now be able to


• calculate the determinant of an n × n matrix,
• etc.

4.2 Properties of the determinant

In this section we will show properties of the determinant and we will prove that a matrix is
invertible if and only if its determinant is different from 0.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 125

(D1) The determinant is linear in its rows.


This means the following. Let ~r1 , . . . , ~rn be the row vectors of the matrix A and assume that
~rj = ~sj + γ~tj . Then
       
~r1 ~r1 ~r1 ~r1
 ..   ..   ..   .. 
 . 
 

 . 

 .




 . 

det A = det  ~r = det ~sj + γtj  = det  ~sj  + γ det  ~tj .
 j 

     
 .  ..  . ..
 ..   ..
    
 .    . 
~rn ~rn ~rn ~rn
This is proved easily by expanding the determinant along the jth row, or it can be seen from the
Leibniz formula as well.

(D1’) The determinant is linear in its columns.


This means the following. Let ~c1 , . . . , ~cn be the column vectors of the matrix A and assume that

FT
~cj = ~sj + γ~tj . Then
det A = det(~c1 | · · · |~cj | · · · |~cn ) = det(~c1 | · · · |~sj + γtj | · · · |~cn )
= det(~c1 | · · · |~sj | · · · |~cn ) + γ det(~c1 | · · · |tj | · · · |~cn ).
This is proved easily by expanding the determinant along the jth column, or it can be seen from
the Leibniz formula as well.
RA
(D2) The determinant is alternating in its rows.
If two rows in a matrix are swapped, then the determinant changes its sign. This means: Let
~r1 , . . . , ~rn be the row vectors of the matrix A and i 6= j ∈ {1, . . . , n}. Then
.. ..
   
 .   . 
 ~rj   ~ri 
   
det A = det  .  = − det  ...  .
 ..   
   
 ~ri   ~rj 
D

   
.. ..
. .
This is easy to see when the two rows that shall be interchanged are adjacent. For example, assume
that j = i + 1. Let A be the original matrix and let B be the matrix with rows i and i + 1 swapped.
We expand the determinant of A along the ith row and and the determinant of B along the (i+1)th
A B
row. Note that in both cases the minors are equal, that is, Mik = M(i+1)k (we use superscripts A
and B to distinguish between the minors of A and of B). So we find
n
X n
X n
X
det B = (−1)(i+1)+k M(i+1)k
B
= (−1)(−1)i+k Mik
A
=− (−1)i+k Mik
A
= − det A.
k=1 k=1 k=1

This can seen also via the Leibniz formula. Now let us see what happens if i and j are not adjacent
rows. Without restriction we may assume that i < j. Then we first swap the jth row (j − i) times

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
126 4.2. Properties of the determinant

with the row above until it is in the ith row. The original ith row is now in row (i + 1). Now we
swap it down with its neighbouring rows until it becomes row j. To do this we need j − (i + 1)
swaps. So in total we swapped [j − i] + [j − (i + 1)] = 2j − 2i + 1 times neighbouring rows, so the
determinant of the new matrix is

(−1) · (−1) · · · · · (−1) · det A = (−1)2j−2i+1 det A = − det A.


| {z }
2j−2i+1 times (one factor for each swap)

(D2’) The determinant is alternating in its columns.

If two columns in a matrix are swapped, then the determinant changes its sign. This means: Let
~c1 , . . . , ~cn be the column vectors of the matrix A and i 6= j ∈ {1, . . . , n}. Then

FT
det A = det(· · · |~ci | · · · |~cj | · · · ) = det(· · · |~cj | · · · |~ci | · · · ).

This follows in the same way as the alternating property for rows.

(D3) det idn = 1.


RA
Expansion in the first row shows

det idn = 1 det idn−1 = 12 det idn−2 = · · · = 1n = 1.

Remark 4.10. It can shown: Every function f : M (n × n) → R which satisfies (D1), (D2) and
D

(D3) (or (D1’), (D2’) and (D3)) must be det.

Now let us see some more properties of the determinant.

(D4) det A = det At .

This follows easily from the Leibniz formula or from the Laplace expansion (if you expand A along
the first row and At along the first column, you obtain exactly the same terms). This also shows
that (D1’) follows from (D1) and that (D2’) follows from (D2) and vice versa.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 127

(D5) If one row of A is multiple of another row, or if a column is a multiple of another


column, then det A = 0. In particular, if A has two equal rows or two equal columns
then det A = 0.
Let ~r1 , . . . , ~rn denote the rows of the matrix A and assume that ~rk = c~rj . Then

.. .. .. ..
       
 .    .    . . 

 ~rk 



 c~rj
 (D2)



 (D1)

 ~rj ~rj 

.. .. .. ..
det A = det   = det   = − det   = −c det 
       
 .    .    . . 

 ~rj   ~rj   c~rj   ~rj 
       
.. .. .. ..
. . . .
.. ..
   
 .   . 

 c~rj  
 ~rk 
 
(D1)
= − det 
 ..  = − det  ..  = − det A.
. 
 . 

FT
 


 ~rj  
 ~rj 
 
.. ..
. .

This shows det A = − det A, and therefore det A = 0. If A has a column which is a multiple of
another, then its transpose has a row which is multiple of another row and with the help of (D4) it
follows that det A = det At = 0.
RA
(D6) The determinant of an upper or lower triangular matrix is the product of its
diagonal entries.
Let A be an upper triangular matrix and let us expand its determinant in the first column. Then
only the first term in the Laplace expansion is different from 0 because all coefficients in the first
column are equal to 0 except possibly the one in the first row. We repeat this and obtain
     
c1 c2 c3

det A = det 
c2 ∗  
 = c1 det 
c3 ∗  
 = c1 c2 det 
c4 ∗ 
D

     

0 0 0
     
     
cn cn cn
 
cn−1 0
= · · · = c1 c2 · · · cn−2 det = c1 c2 · · · cn−1 cn .
0 cn

The claim for lower triangular matrices follows from (D4) and what we just showed because the
transpose of an upper triangular matrix is lower triangular and the diagonal entries are the same.
Or we could repeat the above proof but this time we would expand always in the first row (or last
column).

Next we calculate the determinant of elementary matrices.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
128 4.2. Properties of the determinant

(D7) The determinant of elementary matrices.


(i) det Sj (c) = c,
(ii) det Qij (c) = 1,
(iii) det Pij = −1.

The affirmation about Sj (c) and Qij (c) follow from (D6) since they are triangular matrices. The
claim for Pij follows from (D2) and (D3) because swapping row i and row j in Pij gives us the
identity matrix, so det Pij = − det id = −1.
Now we calculate the determinant of a product of an elementary matrix with another matrix.

(D8) Let E be an elementary matrix and let A ∈ M (n × n). Then det(EA) = det E det A.
Let E be an elementary matrix and let us denote the rows of A by ~r1 , . . . , ~rn . We have to distinguish
between the three different types of elementary matrices.
Case 1. E = Sj (c). We know from (D6) that det E = det Sj (c) = c. Using Proposition 3.61 and

FT
(D1) we find that
.. ..
   
  .   . 
det(EA) = det Sj (c)A = det  c~
r j
 = c det  ~rj  = c det A = det Sj (c) det A.
  
.. ..
. .

Case 2. E = Qij (c). We know from (D6) that det E = det Qij (c) = 1. Using Proposition 3.61 and
RA
(D1) and (D5) we find that
.. .. .. ..
       
 .   .   .   . 
~ri + c~rj   ~ri   ~rj   ~ri 
       
 .
.. . .  . 
det(EA) = det Qij (c)A = det   = det  ..  + c det  ..  = det  .. 
     
       
 ~rj   ~rj   ~rj   ~rj 
       
.. .. .. ..
. . . .
D

= det A = det Qij (c) det A.

Case 3. E = Pij . We know from (D6) that det E = det Pjk = −1. Using Proposition 3.61 and
(D2) we find that
.. .. ..
      
  .   .   . 
  ~rj   ~rk   ~rj 
      
  ..  .. ..
det(EA) = det Pjk A = det Pjk  .  = det   = − det 
    
    .  . 

  ~rk   ~rj   ~rk 
      
.. .. ..
. . .
= − det A = det Pjk det A.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 129

If we repeat (D8), then we obtain

det(E1 · · · Ek A) = det(E1 ) · · · det(Ek ) det(A)

for elementary matrices E1 , . . . , Ek .

(D9) Let A ∈ M (n × n). Then A is invertible if and only if det A 6= 0.


Let A0 be the reduced row echelon form of A. By Proposition 3.66 there exist elementary matrices
E1 , . . . , Ek such that A = E1 · · · Ek A0 , hence

det A = det(E1 · · · Ek ) = det(E1 ) · · · det(Ek ) det A0 . (4.7)

Recall that the determinant of an elementary matrix is different from zero, so (4.7) shows that
det A = 0 if and only if det A = 0.
If A is invertible, then A0 = id hence det A0 = 1 6= 0 and therefore also det A 6= 0. If A is not
invertible, then the last row of A0 must be zero, hence det A0 = 0 and therefore also det A = 0.

FT
Next we show that the determinant is multiplicative.

(D10) Let A, B ∈ M (n × n). Then det(AB) = det A det B.


As before, let A0 be the reduced row echelon form of A. By Proposition 3.66 there exist elementary
matrices E1 , . . . , Ek such that A = E1 · · · Ek A0 . It follows from (D9) that

det(AB) = det(E1 · · · Ek A0 B) = det(E1 ) · · · det(Ek ) det(A0 B). (4.8)


RA
If A is invertible, then A0 = id and (4.7) shows that

det(AB) = det(E1 ) · · · det(Ek ) det(B) = det(E1 · · · Ek ) det(B) = det(A) det(B).

If on the other hand A is not invertible, then det A = 0. Moreover, the last row of A0 is zero,
so also the last row of A0 B is zero, hence A0 B is not invertible and therefore det A0 B = 0. So
we have det(AB) = 0 by (4.7), and also det(A) det(B) = 0 det(B) = 0, so also in this case
det(AB) = det A det B.
D

(D11) Let A ∈ M (n × n) be an invertible matrix. Then det(A−1 ) = (det A)−1 .


If A invertible then det A 6= 0 and it follows from (D10) that

1 = det(idn ) = det(AA−1 ) = det(A) det(A−1 ).

Solving for det(A−1 ) gives the desired formula.

Let A ∈ M (n × n). Give two proofs of det(cA) = cn det A using either one of the following:
(i) Apply (D1) or (D1’) n times.

(ii) Use that cA = diag(c, c, . . . , c)A and apply (D10) and (D6).

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
130 4.2. Properties of the determinant

The determinant is not additive!


Recall that det(AB) = det A det B. But in general

det(A + B) 6= det A + det B.


   
1 0 0 0
For example, if A = and B = , then det A + det B = 0 + 0 = 0, but det(A + B) =
0 0 0 1
det id2 = 1.

The following theorem is Theorem 3.43 together with (D9).

Theorem 4.11. Let A ∈ M (n × n). Then the following is equivalent:


(i) A is invertible.
(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iii) The equation A~x = ~0 has exactly one solution.

(vi) det A 6= 0.
FT
(iv) Every row-reduced echelon form of A has n pivots.
(v) A is row-equivalent to idn .

On the computational complexity of the determinant.


Remark 4.12. The above properties provide an efficient way to calculate the determinant of an
RA
n × n matrix. Note that both the Leibniz formula and the Laplace expansion require O(n!) steps
(O(n!) stands for “order of n!”. You can think of it as “roughly n!” or “up to a constant multiple
roughly equal to n!”. Something like O(2n!) is still the same as O(n!)). However, reducing a
matrix with the Gauß-Jordan elimination requires only O(n3 ) steps until we reach a row echelon
form. Since this is always an upper triangular matrix, its determinant can be calculated easily.
If n is big, then n3 is big, too, but n! is a lot bigger, so the Gauß-Jordan elimination is computa-
tionally much more efficient than the Leibniz formula or the Laplace expansion.

Let us illustrate this with an example.


D

       
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4
1 3 4 6 1 0 1 1 2 2 0 1 1 2 3 0 0 1 2
det 
 = det 
 = 5 det 
 = 5 det 
 
1 7 8 9 0 5 5 5 0 1 1 1 0 0 1 1
1 5 3 4 0 3 0 0 0 3 0 0 0 3 0 0
   
1 2 3 4 1 2 3 4
4 0 0 0 2 5 0 3 0 0 7
= 5  = −5 det  = −30.
0 0 1 1 0 0 1 1 
0 3 0 0 0 0 0 2

1 We subtract the first row from all the other rows. The determinant does not change.
2 We factor 5 in the third row.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 131

3 We subtract 1/3 of the last row from rows 2 and 3. The determinant does not change.
4 We subtract row 3 from row 2. The determinant does not change.
5 We swap rows 2 and 4. This gives a factor −1.
6 Easy calculation.

You should now have understood


• the different properties of the determinant,
• why a matrix is invertible if and only if its determinant is different from 0,
• why the Gauß-Jordan elimination is computationally more efficient than the Laplace expan-
sion formula,
• etc.

You should now be able to

FT
• compute determinants using their properties,
• compute abstract determinants,
• use the factorisation of a matrix to compute its determinant,
• etc.

4.3 Geometric interpretation of the determinant


RA
In this short section we show a geometric interpretation of the determinant. This is of course only
a small part of the true importance of the determinant. You will hear more about this in a course
on vector calculus when you discuss the transformation formula (the substitution rule for higher
dimensional integrals), or in a course on Measure Theory or Differential Geometry. Here we content
ourselves with two basic facts.

Area in R2
   
a1 b
D

Let ~a = and ~b = 1 be vectors in R2 and let us consider the matrix A = (~a|~b) the matrix
a2 b2
whose columns are the given vectors. Then

A~e1 = ~a, A~e2 = ~b.

That means that A transforms the unit square spanned by the unit vectors ~e1 and ~e2 into the
parallelogram spanned by the vectors ~a and ~b. Let area(~a, ~b) be the area of the parallelogram
spanned by ~a and ~b. We can view ~a and ~b as vectors in R3 simply by adding a third component.
Then formula (2.9) shows that the area of the parallelogram spanned by ~a and ~b is equal to
     
a1 b2 − a2 b1
a1 b1

a2  × b2  =  0  = |a1 b2 − a2 b1 | = | det A|,

0 0 0

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
132 4.3. Geometric interpretation of the determinant

A
y y

 
b1
~e2 A~e2 = b2

A~e1 = ( aa12 )
x x
~e1

Figure 4.1: The figure shows how the area of the unit square transforms under the linear transforma-
tion A. The area of the square on left hand side is 1, the area of the parallelogram on the right hand
side is | det A|.

hence we obtain the formula

FT
area(~a, ~b) = | det A|. (4.9)
So while A tells us how the shape of the unit square changes, | det A| tells us how its area changes,
see Figure 4.1.
You should also notice the following: The area of the image of the unit square under A is zero
if and only if the two image vectors ~a and ~b are parallel. This is in accordance to the fact that
det A = 0 if and only if the two lines described by the associated linear equations are parallel (or if
one equation describes the whole plane).
RA
Volumes in R3
     
a1 b1 c1
Let ~a = a2 , ~b = b2  and ~c = c2  be vectors in R3 and let us consider the matrix
a3 b3 c3
~
A = (~a | b | ~c) whose columns are the given vectors. Then

A~e1 = ~a, A~e2 = ~b, A~e3 = ~c.


That means that A transforms the unit cube spanned by the unit vectors ~e1 , ~e2 and ~e3 into the
D

parallelepiped spanned by the vectors ~a, ~b and ~c. Let vol(~a, ~b, ~c) be the volume of the parallelepiped
spanned by the vectors ~a, ~b and ~c. According to formula (2.10), vol(~a, ~b, ~c) = |h~a , ~b × ~ci|. We
calculate
*     + *   
a1 b1 c1 a1 b2 c3 − b3 c2 +
|h~a , ~b × ~ci| = a2  , b2  × c2  = a2  , b3 c1 − c3 b1 

a3 b3 c3 a3 b1 c2 − b2 c1
= |a1 (b2 c3 − b3 c2 ) − a2 (c3 b1 − b3 c1 ) + a3 (b1 c2 − b2 c1 )|
= | det A|
hence
vol(~b, ~b, ~c) = | det A| (4.10)

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 133

z A z

A~e3
~e3

~e2 y y

A~e2

~e1 A~e
1

x
x

Figure 4.2: The figure shows how the volume of the unit cube transforms under the linear transfor-
mation A. The volume of the cube on left hand side is 1, the volume of the parallelepiped on the right

FT
hand side is | det A|.

since we recognise the second to last line as the expansion of det A along the first column. So while
A tells us how the shape of the unit cube changes, | det A| tells us how its volume changes.

You should also notice the following: The volume of the image of the unit cube under A is zero if
RA
and only if the three image vectors lie in the same plane. We will see later that this implies that
the range of A is not all of R3 , hence A cannot be invertible. For details, see Section 6.2.

What we saw for n = 2 and n = 3 can be generalised to Rn with n ≥ 4: A matrix A ∈ M (n × n)


transforms the unit cube in Rn spanned by the unit vectors ~e1 , . . . , ~en into a parallelepiped in Rn
and | det A| tells us how its volume changes.

Exercise. Give two proofs of the following statements: One using the formula (4.9) and linearity
D

of the determinant in its columns; and another proof using geometry.


y
(i) Show that the area of the blue parallelogram is w
~
twice the area of the green parallelogram.
2~v

−~v x

−w
~

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
134 4.4. Inverse of a matrix

y
3w
~
(ii) Show that the area of the blue parallelogram is
six times the area of the green parallelogram.

~v 2~v
x
−w
~

w
~
(iii) Show that the area of the blue and the red par-
allelogram is equal to the area of the green par-
~z
allelogram. ~v

FT
x
−~z
−~z
RA
You should now have understood
• the geometric interpretation of the determinant in R2 and R3 ,
• the close relation between the determinant and the cross product in R3 and that this is the
reason why the cross product appears in the formulas for the area of a parallelogram and
the volume of a parallelepiped,
• etc.
D

You should now be able to

• calculate the area of a parallelogram and the volume of a parallelepiped using determinants,
• etc.

4.4 Inverse of a matrix


In this section we prove a method to calculate the inverse of an invertible square matrix using
determinants. Although the formula might look nice, computationally it is not efficient. Here it
goes.
Let A = (aij )i,j=1,...,n ∈ M (n × n) and let Mij be its minors, see Definition 4.5. We already know

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 135

from (4.5) that for every fixed k ∈ {1, . . . , n}


n
X
det A = (−1)k+j akj det Mkj . (4.11)
j=1

Now we want to see that happens if the k in akj and in Mkj are different.

Proposition 4.13. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let k, ` ∈ {1, . . . , n} with k 6= `. Then
n
X
(−1)`+j akj det M`j = 0. (4.12)
j=1

Proof. We build the new matrix B from A by replacing its `th row by the kth row. Then B has
two equal rows (row ` and row k), hence det B = 0. Note that the matrices A and B are equal
B A
everywhere except possibly in the `th row, so their minors along the row ` are equal: M`j = M`j
(we put superscripts A, B in order to distinguish the minors of A and of B). If we expand det B
along the `th row then we find

FT
n
X n
X
0 = det B = (−1)`+j b`j det M`j
B
= (−1)`+j akj det M`j
A
.
j=1 j=1

Using the cofactors Cij of A (see Definition 4.5), formulas (4.11) and (4.12) can be written as
n n
(
X
`+j A
X det A if k = `,
(−1) akj det M`j = akj C`j := (4.13)
j=1 j=1
0 if k 6= `.
RA
Definition 4.14. For A ∈ M (n × n) we define its adjugate matrix adj A as the transpose of its
cofactor matrix:
 t  
C11 C12 · · · C1n C11 C21 ··· Cn1
 C21 C22 · · · C2n   C12 C22 ··· Cn2 
adj A :=  . ..  =  .. ..  .
   
.. ..
 .. . .   . . . 
Cn1 Cn2 · · · Cnn C1n C2n ··· Cnn
D

Theorem 4.15. Let A ∈ M (n × n) be an invertible matrix. Then


1
A−1 = adj A. (4.14)
det A
Proof. Let us calculate PA adj A. By definition of adj A the coefficient ck` in the matrix product
n
A adj A is exactly ck` = j=1 (−1)`+j akj det M`j , so by (4.13) it follows that
 
det A 0 ... 0
 0 det A . . . 0 
A adj A =  . ..  = (det A) idn .
 
. .
. . .
 . . . . 
0 0 ... det A

Rearranging, we obtain that A−1 = 1


det A adj A id−1
n =
1
det A adj A.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
136 4.4. Inverse of a matrix

Remark 4.16. Note that the proof of Theorem 4.15 shows that A adj A = det A idn is true for
every A ∈ M (n × n), even if it is not invertible (in this case, both sides of the formula are equal to
the zero matrix).

Formula (4.14) might look quite nice and innocent, however bear in mind that in order to calculate
A−1 with it you have to calculate one n ×n determinant and n2 determinants of the (n −1)×(n −1)
minors of A. This is a lot more than the O(n3 ) steps needed in the Gauß-Jordan elimination.
Finally, we prove Cramer’s rule for finding the solution of a linear system if the corresponding
matrix is invertible.

Theorem 4.17. Let A ∈ M (n × n) be an invertible matrix and let B~ ∈ Rn . Then the unique
~
solution ~x of A~x = b is given by
~
 
det Ab1
 
x1
~
 x2  det Ab2 

1 
~x =  .  = (4.15)
   . 

FT
 ..  det A 
 .. 

xn ~
det Ab n

~
where Abj is the matrix obtained from the matrix A if we replace its jth column by the vector ~b.

Proof. As usual we write Cij for the cofactors of A and Mij for its minors. Since A is invertible,
we know that ~x = A−1~b = det1 A adj A~b. Therefore we find for j = 1, . . . , n that
RA
n n
1 X 1 X 1 ~
xj = Ckj bk = (−1)k+j bk Ckj = det Abj .
det A det A det A
k=1 k=1

~
The last equality is true because the second to last sum is the expansion of the determinant of Abj
along the kth column.

Note that, even if (4.15) might look quite nice, it involves the computation of n + 1 determinants
of n × n matrices, so it involves O((n + 1)!) steps.
D

 
1 2 3
Example 4.18. Let us calculate the inverse of the matrix A = 4 5 6 from Example 4.9. We
0 8 7
already know that det A = 27. Its cofactors are
     
5 6 4 6 4 5
C11 = det = −13, C12 = − det = −28, C13 = det = 32,
8 7 0 7 0 8
     
2 3 1 3 1 2
C21 = − det = 10, C22 = det = 7, C23 = − det = −8,
8 7 9 7 0 8
     
2 3 1 3 1 2
C31 = det = −3, C32 = − det = 6, C33 = det = −3.
5 6 4 6 4 5

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 137

Therefore
   
C C21 C31 −13 10 −3
1 1  11 1
A−1 = adj A = C12 C22 C32  = −28 7 6 .
det A det A 10
C13 C23 C33 32 −8 −3

You should now have understood


• what the adjugate matrix is and why it can be used to calculate the inverse of a matrix,
• etc.
You should now be able to
• calculate A−1 using adj A.

FT
• etc.

4.5 Summary
The determinant is a function from the square matrices to the real or complex numbers. Let
A = (aij )ni,j−1 ∈ M (n × n).
RA
Formulas for the determinant.

X
det A = sign(σ) a1σ(1) a2σ(2) · · · anσ(n) Leibniz formula
σ∈Sn
Xn n
X
= (−1)k+j akj det Mkj = akj Ckj Laplace expansion along the kth row
D

j=1 j=1
Xn n
X
= (−1)i+k aik det Mik = aik Cik Laplace expansion along the kth column
i=1 i=1

with the following notation

• Sn is the set of all permutations of {1, . . . , n},

• Mij are the minors of A ((n − 1) × (n − 1) matrices obtained from A by deleting row i and
column j),

• Cij = (−1)i+j det Mij are the cofactors of A.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
138 4.5. Summary

Inverse of a matrix using the adjugate matrix


If A ∈ M (n × n) is invertible then
 
C11 C22 ··· Cn1
1 1   C12 C22 ··· Cn2 
A−1 = adj A = ..  .

 . ..
det A det A  .. . . 
C1n C2n ··· Cnn

Geometric interpretation
The determinant of a matrix A gives the oriented volume of the image of the unit cube under A.
• in R2 : area of parallelogram spanned by ~a and ~b = | det A|,
• in R3 : volume of parallelepiped spanned by ~a, ~b and ~c = | det A|.

Properties of the determinant.







det idn = 1.
det A = det At . FT
The determinant is linear in its rows and columns.
The determinant is alternating in its rows and columns.

If one row of A is multiple of another row, or if a column is a multiple of another column,


then det A = 0. In particular, if A has two equal rows or two equal columns then det A = 0.
RA
• The determinant of an upper or lower triangular matrix is the product of its diagonal entries.
• The determinants of the elementary matrices are

det Sj (c) = c, det Qij (c) = 1, det Pij = −1.

• Let A ∈ M (n × n). Then A is invertible if and only if det A 6= 0.


• Let A, B ∈ M (n × n). Then det(AB) = det A det B.

• If A ∈ M (n × n) is invertible, then det(A−1 ) = (det A)−1 .


D

Note however that in general det(A + B) 6= det A + det B.

Theorem. Let A ∈ M (n × n). Then the following is equivalent:


(i) det A 6= 0.
(ii) A is invertible.
(iii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iv) The equation A~x = ~0 has exactly one solution.
(v) Every row-reduced echelon form of A has n pivots.
(vi) A is row-equivalent to idn .

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 4. Determinants 139

4.6 Exercises
1. De las siguientes matrices calcule la determinante. Determine si las matrices son invertibles. Si
lo son, encuentre su matriz inversa y la determinante de la inversa.
   
    1 3 6 1 4 6
1 −2 −14 21
A= , B= . D = 4 1 0 , E = 2 1 5  .
2 7 12 −18
1 4 3 3 5 11

2. De las siguientes matrices calcule el determinante. Determine si las matrices son invertibles. Si
lo son, encuentre su matriz inversa y el determinante de la inversa.
 
  1 2 3 0
  −1 2 3
π 3 0 1 2 2
A= , B =  1 3 1 , C= .
5 2 1 4 0 3
4 3 2
1 1 5 4

1

2 3

FT
3. Encuentre por lo menos cuatro matrices 3 × 3 cuyo determinante es 18.

4. Use las factorizaciones encontrados en los Ejercicios 16 y 16 en Capı́tulo 3 para calcular sus
determinantes.
RA
5. Escribe la matriz A =  1 2 6  como producto de matrices elementales y calcule el
−2 −2 −6
determinante de A usando las matrices elementales encontradas.

6. Determine todos los x ∈ R tal que las siguientes matrices son invertibles:
   
  x x 3 11 − x 5 −50
x 2
A= , B =  1 2 6 , C= 3 −x −15  .
1 x−3
−2 2 −6 2 1 −x − 9
D

7. Suponga que una función y satisface y [n] = bn−1 y [n−1] + · · · b1 y 0 + b0 y donde b0 , . . . , bn−1 son
coeficientes constantes y y [j] denota la derivada j-ésima de y.
Verifique que Y 0 = AY donde
 
0 1 0 0 ... 0  
0 0 1 0 ... 0  y
 ..

.. .. .. ..

  y0

. . . . . 
y 00

A= , Y =
 
. .. ..

.

. . 1 0 
 
   .
0 0 0 ... 0 1  y [n−1]
b1 b2 ... ... ... bn−1

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
140 4.6. Exercises

y calcule el determinante de A.

8. Sin usar fórmulas de expansión para determinantes, encuentre para cada una de las matrices
dadas parámetros x, y tales que el determinante de las siguientes matrices es igual a zero y
explique por qué los parametros encontrado sirven.
 
  1 x y 2
x 2 6 x 0 1 y 
N1 =  2 5 1 , N2 =  x 5 3 y  .

3 4 y
4 x y 8

9. (a) Calcule det Bn donde Bn es la matriz en M (n × n) cuyas entradas en la diagonal son 0 y


todas las demás entradas son 1, es decir:
 
  0 1 1 1 1
  0 1 1 1
  0 1 1 1 0 1 1 1
0 1 1 0 1 1

FT
 
B1 = 0, B2 = , B3 = 1 0 1 , B4 =  1 1 0 1 , B5 = 1 1 0 1
  1 , etc.
1 0
1 1 0 1 1 1 0 1
1 1 1 0
1 1 1 1 0

¿Cómo cambia la respuesta si en vez de 0 hay x en la diagonal?


(b) Calcule det Bn donde Bn es la matriz en M (n × n) cuyas entradas en la diagonal son 0 y
todas las demás entradas satisfacen bij = (−1)i+j , es decir:
 
RA
  0 1 −1 1
  0 1 −1
0 1  1 0 1 −1
B1 = 0, B2 = , B3 =  1 0 1 , B4 =  ,
1 0 −1 1 0 1
−1 1 0
1 −1 1 0
 
0 1 −1 1 −1
 1
 0 1 −1 1
B5 = −1
 1 0 1 −1  , etc.
 1 −1 1 0 1
−1 1 −1 1 0
D

¿Cómo cambia la respuesta si en vez de 0 hay x en la diagonal? Compare con el Ejercicio 8.

Last Change: Mi 6. Apr 00:17:08 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5

Vector spaces

In the following, K always denotes a field. In this chapter, you may always think of K = R,

FT
though almost everything is true also for other fields, like C, Q or Fp where p is a prime number.
Later, in Chapter 8 it will be more useful to work with K = C.

In this chapter we will work with abstract vector spaces. We will first discuss their basic proper-
ties. Then, in Section 5.2 we will define subspaces. These are subsets of vector spaces which are
themselves vector spaces. In Section 5.4 we will introduce bases and the dimension of a vector
space. These concepts are fundamental in linear algebra since they allow us to classify all finite
RA
dimensional vector spaces. In a certain sense, all n-dimensional vector spaces over the same field
K are equal. In Chapter 6 we will study linear maps between vector spaces.

5.1 Definitions and basic properties


First we recall the definition of an abstract vector space from Chapter 2 (p. 29).

Definition 5.1. Let V be a set together with two operations


D

vector sum + : V × V → V, (v, w) 7→ v + w,


product of a scalar and a vector · : K × V → V, (λ, v) 7→ λ · v.

Note that we will usually write λv instead of λ · v. Then V (or more precisely, (V, +, ·)) is called a
vector space over K if for all u, v, w ∈ V and all λ, µ ∈ K the following holds:

(a) Associativity: (u + v) + w = u + (v + w) for every u, v, w ∈ V .


(b) Commutativity: v + w = w + v for every u, v ∈ V .
(c) Identity element of addition: There exists an element O ∈ V , called the additive identity
such that O + v = v + O = v for every v ∈ V .
(d) Inverse element: For every v ∈ V , there exists an inverse element v 0 such that v + v 0 = O.

141
142 5.1. Definitions and basic properties

(e) Identity element of multiplication by scalar: For every v ∈ V , we have that 1v = v.


(f) Compatibility: For every v ∈ V and λ, µ ∈ K, we have that (λµ)v = λ(µv).
(g) Distributivity laws: For all v, w ∈ V and λ, µ ∈ K, we have

(λ + µ)v = λv + µv and λ(v + w) = λv + λw.

We already know that Rn is a vector space over R.

Remark 5.2. (i) Note that we use the notation ~v with an arrow only for the special case of
vectors in Rn or Cn . Vectors in abstract vector spaces are usually denoted without an arrow.
(ii) If K = R, then V is called a real vector space. If K = C, then V is called a complex vector
space.

Before we give examples of vector spaces, we first show some basic properties of vector spaces.

FT
Properties 5.3. (i) The identity element is unique. (Note that in the vector space axioms we
only asked for existence of an additive identity element; we did not ask for uniqueness. So one
could think that there may be several elements which satisfy (c) in Definition 5.1. However,
this is not possible as the following proof shows.)

Proof. Assume there are two neutral elements O and O0 . Then we know that for every v and
w in V the following is true:

v = v + O, w = w + O0 .
RA
Now let us take v = O0 and w = O. Then, using commutativity, we obtain

O0 = O0 + O = O + O0 = O.

(ii) Let x, y, z ∈ V . If x + y = x + z, then y = z.

Proof. Let x0 be an additive inverse of x (that means that x0 + x = O which must exist since
D

V is a vector space). This follows from

y = O + y = (x0 + x) + y = x0 + (x + y) = x0 + (x + z) = (x0 + x) + z O + z = z.

(iii) For every v ∈ V , its inverse element is unique. (Note that in the vector space axioms we
only asked for existence of an additive inverse for every element x ∈ V ; we did not ask
for uniqueness. So one could think that there may be several elements which satisfy (d) in
Definition 5.1. However, this is not possible as the following proof shows.)

Proof. Let v ∈ V and assume that there are elements v 0 , v 00 in V such that

v + v 0 = O, v + v 00 = O.

Now it follows from (ii) that v 0 = v 00 (take x = v, y = v 0 and z = v 00 in (ii)).

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 143

(iv) For every λ ∈ K we have λO = O.

Proof. Observe that λO = λO + O and that λO = λ(O + O) = λO + λO, hence

λO + O = λO + λO.

Now it follows from (ii) that O = λO (take x = λO, y = O and z = λO in (ii)).

(v) For every v ∈ V we have that 0v = O.

Proof. The proof is similar to the one above. Observe that 0v = 0v + O0 and 0v = (0 + 0)v =
0v + 0v, hence
0v + O = 0v + 0v.
Now it follows from (ii) that O = 0v (take x = 0v, y = O and z = 0v in (ii)).

(vi) If λv = O, then either λ = 0 or v = O.

(vii) For every v ∈ V , its inverse is (−1)v.


v=
FT
Proof. If λ = 0, then there is nothing to prove. Now assume that λ 6= 0. Then v is O because
1
λ
1
(λv) = O = O.
λ
RA
Proof. Let v ∈ V . Observe that by (vi), we have that 0v = O. Therefore

O = 0v = (1 + (−1))x = v + (−1)v.
Hence (−1)v is an additive inverse of v. By (iii), the inverse of v is unique, therefore (−1)v is
the inverse of v.

Remark 5.4. From now on, we write −v for the additive inverse of a vector. This notation is
D

justified by Property 5.3 (vii).

Examples 5.5. We give some important examples of vector spaces.

• R is a real vector space. More generally, Rn is a real vector space. The proof is the same
as for R2 in Chapter 2. Associativity and commutativity are clear. The identity element is
the vector whose entries are all equal to zero: ~0 = (0, . . . , 0)t . The inverse for a given vector
~x = (x1 , . . . , xn )t is (−x1 , . . . , −xn )t . The distributivity laws are clear, as is the fact that
1~x = ~x for every ~x ∈ Rn .

• C is a complex vector space. More generally, Cn is a complex space. The proof is the same
as for Rn .

• C can also be viewed as a real vector space.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
144 5.1. Definitions and basic properties

• R is not a complex vector space with the usual definition of the algebraic operations. If it
was, then the vectors would be real numbers and the scalars would be complex numbers. But
then if we take 1 ∈ R and i ∈ C, then the product i1 must be a vector, that is, a real number,
which is not the case.
• R can be seen as a Q-vector space.
• For every n, m ∈ N, the space M (m × n) of all m × n matrices with real coefficients is a real
vector space.

Proof. Note that in this case the vectors are matrices. Associativity and commutativity are
easy to check. The identity element is the matrix whose entries are all equal to zero. Given
a matrix A = (aij )i=1,...,m , its (additive) inverse is the matrix −A = (−aij )i=1,...,m . The
j=1,...,n j=1,...,n
distributivity laws are clear, as is the fact that 1A = A for every A ∈ M (m × n).

• For every n, m ∈ N, the space M (m × n, C) of all m × n matrices with complex coefficients,


is a complex vector space.

FT
Proof. As in the case of real matrices.

• Let C(R) be the set of all continuous functions from R to R. We define the sum of two
functions f and g in the usual way as the new function
f + g : R → R, (f + g)(x) = f (x) + g(x).
The product of a function f with a real number λ gives the new function λf defined by
RA
λf : R → R, (λf )(x) = λf (x).
Then C(R) is a vector space with these new operations.

Proof. It is clear that these operations satisfy associativity, commutativity and distributivity
and that 1f = f for every function f ∈ C(R). The additive identity is the zero function
(the function which is constant to zero). For a given function f , its (additive) inverse is the
function −f .
D

• Let P (R) be the set of all polynomials. With the usual sum and products with scalars, they
form a vector space.

Prove that C is a vector space over R and that R is a vector space over Q.

Observe that the sets M (m × n) and C(R) admit more operations, for example we can multiply
functions, or we can multiply matrices or we can calculate det A for a square matrix. However, all
these operations have nothing to do with the question whether they are vector spaces or not. It
is important to note that for a vector space we only need the sum of two vectors and the product
of a scalar with vector and that they satisfy the axioms from Definition 5.1.

We give more examples.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 145

Examples 5.6. • Consider R2 but we change the usual sum to the new sum ⊕ defined by
     
x a x+a
⊕ = .
y b 0

With this new sum, R2 is not a vector space. The reason is that
 there is no additive identity.
α
To see this, assume that we had an additive identity, say . Then we must have
β
       
x α x x
+ = for all ∈ R2 .
y β y y

However, for example,        


0 α α 0
+ = 6= ,
1 β 0 1

• Consider R2 but we change the usual sum to the new sum ⊕ defined by

FT
     
x a x+b
⊕ = .
y b y+b

With this new sum, R2 is not a vector space. One of the reasons is that the sum is not
commutative. For example
               
1 0 1+1 2 0 1 0+0 0
+ = = , but + = = .
0 1 0+0 0 1 0 1+1 2
RA
Show that there is no additive identity O which satisfies ~x ⊕ O = ~x for all ~x ∈ R2 .
• Let V = R+ = (0, ∞). We make V a real vector space with the following operations: Let
x, y ∈ V and λ ∈ R. We define

x ⊕ y = xy and λ x = xλ .

Then (V, ⊕, ) is a real vector space.


D

Proof. Let u, v, w ∈ V and let λ ∈ R. Then:

(a) Associativity: (u ⊕ v) ⊕ w = (uv) ⊕ w = (uv)w = u(vw) = u(v ⊕ w) = u ⊕ (v ⊕ w).


(b) Commutativity: v ⊕ w = vw = wv = w ⊕ v.
(c) The additive identity of ⊕ is 1 because for every x ∈ V we have that 1 ⊕ x = 1x = x.
(d) Inverse element: For every x ∈ V , its inverse element is x−1 because x⊕x−1 = xx−1 =
1 which is the identity element. (Note that this is in accordance with Properties 5.3 (vi)
since (−1) x = x−1 .)
(e) Identity element of multiplication by scalar: For every x ∈ V , we have that
1 x = 1x = x.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
146 5.2. Subspaces

(f) Compatibility: For every x ∈ V and λ, µ ∈ R, we have that

(λµ) v = v λµ = (v λ )µ = µ (v λ ) = λ (µ v).

(g) Distributivity laws: For all x, y ∈ V and λ, µ ∈ R, we have

(λ + µ) x = xλ+µ = xλ xµ = (λ v)(µ v) = (λ v) ⊕ (µ v)

and

λ (v ⊕ w) = (v ⊕ w)λ = (vw)λ = v λ wλ = v λ ⊕ wλ = (λ v) ⊕ (λ w).

• The example above can be generalised: Let f : R → (a, b) be an injective function. Then the
interval (a, b) becomes a real vector space if we define the sum of two vectors x, y ∈ (a, b) by

x ⊕ y = f (f −1 (x) + f −1 (y))

and the product of a scalar λ ∈ R and a vector x ∈ (a, b) by

FT
λ x = f (λf −1 (x)).

Note that in the example above we had (a, b) = (0, ∞) and f = exp (that is: f (x) = ex ).

You should have understood


• the concept of an abstract vector space,
RA
• that the spaces Rn are examples of vector spaces, but there are many more,
• that “vectors” not necessarily can be written as columns (think of the vector space of all
polynomials, etc.)
• etc.
You should now be able to
• give examples of vector spaces different from Rn or Cn ,
D

• check if a given set with a given addition and multiplication with scalars is a vector space,
• recite the vector space axioms when woken in the middle of the night,
• etc.

5.2 Subspaces
In this section, we work mostly with real vector spaces for the sake of definiteness. However, all
the statements are also true for complex vector spaces. We only have to replace R by C and the
word real by complex everywhere.

In this section we will investigate when a subset of a given vector space is itself a vector space.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 147

Definition 5.7. Let V be a vector space and let W ⊆ V be a subset of V . Then W is called a
subspace of V if W itself is a vector space with the sum and product with scalars inherited from V .
A subspace W is called a proper subspace if W 6= {0} and W 6= V .

First we observe the following basic facts.

Remark 5.8. Let V be a vector space.

• V always contains the following subspaces: {0} and V itself. However, they are not proper
subspaces.
• If V is a vector space, W is a subspace of V and U is a subspace of W , then U is a subspace
of V .
Prove these statements.

Remark 5.9. Let W be a subspace of a vector space V . Let O be the neutral element in V . Then
O ∈ W and it is the neutral element of W .

FT
Proof. Since W is a vector space, it must have a neutral element OW . A priori, it is not clear that
OW = O. However, since OW ∈ W ⊂ V , we know that 0OW = O. On the other hand, since
W is a vector space, it is closed under product with scalars, so O = 0OW ∈ W . Clearly, O is a
neutral element in W . So it follows that O = OW by uniqueness of the neutral element of W , see
Properties 5.3(i).

Now assume that we are given a vector space V and in it a subset W ⊆ V and we would like to
RA
check if W is a vector space. In principle we would have to check all seven vector space axioms
from Definition 5.1. However, if W is a subset of V , then we get some of the vector space axioms
for free. More precisely, the axioms (a), (b), (e), (f) and (g) hold automatically. For example, to
prove (b), we take two elements w1 , w2 ∈ W . They belong also to V since W ⊆ V , and therefore
they commute: w1 + w2 = w2 + w1 .
We can even show the following proposition:

Proposition 5.10. Let V be a real vector space and W ⊆ V a subset. Then W is a subspace of V
if and only if the following three properties hold:
D

(i) W 6= ∅, that is, W is not empty.


(ii) W is closed under sums, that is, if we take w1 and w2 in W , then their sum w1 + w2 belongs
to W .
(iii) W is closed under product with scalars, that is, if we take w ∈ W and λ ∈ R, then λw ∈ W .

Note that (ii) and (iii) can be summarised in the following:

(iv) W is closed under sums and product with scalars, that is, if we take w1 , w2 ∈ W and λ ∈ R,
then λw1 + w2 ∈ W .

Proof of 5.10. Assume that W is a subspace, then clearly (ii) and (iii) hold. (i) holds because every
vector space must contain at least the additive identity O.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
148 5.2. Subspaces

Now suppose that W is a subset of V such that the properties (i), (ii) and (iii) are satisfied. In
order to show that W is a subspace of V , we need to verify the vector space axioms (a) - (f) from
Definition 5.1. By assumptions (ii) and (iii), the sum and product with scalars are well defined in
W . Moreover, we already convinced ourselves that (a), (b), (e), (f) and (g) hold. Now, for the
existence of an additive identity, we take an arbitrary w ∈ W (such a w exists because W is not
empty by assumption (i)). Hence O = 0w ∈ W where O is the additive identity in V . This is then
also the additive identity in W . Finally, given w ∈ W ⊆ V , we know from Properties 5.3 (vi) that
its additive inverse is (−1)w, which, by our assumption (iii), belongs to W . So we have verified
that W satisfies all vector space axioms, so it is a vector space.

The proposition is also true if V is a complex vector space. We only have to replace R everywhere
by C.
In order to verify that a given W ⊆ V is a subspace, one only has to verify (i), (ii) and (iii) from
the preceding proposition. In order to verify that W is not empty, one typically checks if it contains
O.
The following definition is very important in many applications.

is a subspace of V .
FT
Definition 5.11. Let V be a vector space and W ⊆ V a subset. The W is called an affine subspace
if there exists an v0 ∈ V such that set

v0 + W := {v0 + w : w ∈ W }

Clearly, every subspace is also an affine subspace (take v0 = O).


RA
Let us see examples of subspaces and affine subspaces.

Examples 5.12. Let V be a vector space. We assume that V is a real vector space, but everything
works also for a complex vector space (we only have to replace R everywhere by C.)

(i) {0} is a subspace of V . It is called the trivial subspace of V .


D

(ii) V itself is a subspace of V .

(iii) Fix v ∈ V . Then the set W := {λv : λ ∈ R} is a subspace of V .

(iv) More generally, if we fix v1 , . . . vk ∈ V , then the set W := {α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R}


is a subspace of V . This set is called the linear span of v1 , . . . , vk . It will be shown in
Theorem 5.22 that it is indeed a vector space.

(v) If we fix z0 and v1 , . . . vk ∈ V , then the set W := {z0 + α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R} =


z0 + {α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R} is an affine subspace of V . In general it will not be a
subspace.

Exercise. Show that W := {z0 + α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R} is an affine subspace of


V . Show that it is a subspace if and only if z0 ∈ span{v1 , . . . , vk }.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 149

(vi) If W is a subspace of V , then V \ W is not a subspace. This can be easily seen if we recall that
W must contain O. But then V \ W cannot contain O, hence it cannot be a vector space.

Some more examples:

Examples 5.13. • The set of all solutions of a homogeneous system of linear equations is a
vector space.

• The set of all solutions of an inhomogeneous system of linear equations is an affine vector
space if it is not empty.

• The set of all solutions of a homogeneous linear differential equation is a vector space.

• The set of all solutions of an inhomogeneous linear differential equation is an affine vector
space if it is not empty.

Examples 5.14 (Examples and non-examples of subspaces of R2 ).

FT
  
λ
• W = : λ ∈ R is a subspace of R2 . This is actually a subspace of the form (iii) from
0
 
1
Example 5.12 with z = . Note that geometrically W is a line (it is the x-axis).
0
 
v1
• For fixed v1 , v2 ∈ R let ~v = and let W = {λ~v : λ ∈ R}. Then W is a subspace of R2 .
v2
RA
Geometrically, W is the trivial subspace {~0} if ~v = ~0. Otherwise it is the line in R2 passing
through the origin which is parallel to the vector ~v .

W
~v
D

Figure 5.1: The subspace W generated by the vector ~v .

   
a1 v1
• For fixed a1 , a2 , v1 , v2 ∈ R let ~a = and ~v = . Let us assume that ~v 6= ~0 and set
a2 v2
W = {~a + λ~v : λ ∈ R}. Then W is an affine subspace. Geometrically, W represents a line in
R2 parallel to ~v which passes through the point (a1 , a2 ). Note that W is a subspace if and
only if ~a and ~v are parallel.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
150 5.2. Subspaces

~v W

~v W
~a ~a

Figure 5.2: Sketches of W = {~a + λ~v : λ ∈ R}. In the figure on the left hand side, ~a 6k ~v , so W is an
affine subspace of R@ but not a subspace. In the figure on the right hand side, ~a k ~v and therefore W
is a subspace of R2 .

• U = {~x ∈ R2 : k~xk ≥ 3} is not a subspace of R2 since it does not contain ~0.

FT
 
2 2 1
• V = {~x ∈ R : k~xk ≤ 2} is not a subspace of R . For example, take ~z = . Then ~z ∈ W ,
0
however 3~z ∈
/ V.
    
x 2
• W = : x ≥ 0 . Then W is not a vector space. For example, ~z = ∈ W , but
y
  0
−2
(−1)~z = ∈
/ W.
0
RA
Note that geometrically W is a right half plane in R2 .

V 3~z ∈
/V −~z ∈
/W

~z ∈ V ~z ∈ W
D

Figure 5.3: The sets V and W in the figures are not subspaces of R2 .

Examples 5.15 (Examples and non-examples of subspaces of R3 ).


   
 x0 
• For fixed x0 , y0 , z0 ∈ R let W = λ  y0  : λ ∈ R . Then W is a subspace of R3 . Geomet-
z0
 
 
x0
rically, W is a line in R2 passing through the origin which is parallel to the vector  y0 .
z0

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 151

  
 x 
• For fixed a, b, c ∈ R the set W = y  : ax + by + cz = 0 is a subspace of R3 .
z
 

3 ~
Proof. We use Proposition 5.10
 toverify that W isa subspace of R . Clearly, 0 ∈ W since
x1 x2
0a+0b+0c = 0. Now let w~ 1 =  y1  and w~ 2 =  y2  in W and let λ ∈ R. Then w ~2 ∈ W
~ 1 +w
z1 z2
because

a(x1 + x2 ) + b(y1 + y2 ) + c(z1 + z2 ) = (ax1 + by1 + cz1 ) + (ax2 + by2 + cz2 ) = 0 + 0 = 0.

~ 1 ∈ W because
Also λw

a(λx1 ) + b(λy1 ) + c(λz1 ) = λ(ax1 + by1 + cz1 ) = λ0 = 0.

Hence W is closed under sum and product with scalars, so it is a subspace of R.

FT
Remark. Note that W is the set of all solutions of a homogeneouos linear system of equations
(one equation with three unknowns). Therefore W is a vector space by Theorem 3.21 where
it is shown that the sum and the product with a scalar of two solutions of a homogeneous
linear system is again a solution.

Remark. If a = b = c = 0, then W = R3 . If at least one of the numbers a, b, c ∈ R is


different from
 zero,
3
 then W is a plane in R which passes through the origin and has normal
RA
a
vector ~n =  b .
c

• For fixed
a,b, c, d ∈ R with d 6= 
0 and at least of the numbers a, b, c different from 0, the set
 x 
W = y  : ax + by + cz = d is not a subspace of R3 , see Figure 5.4, but it is an affine
z
 
subspace.
D

   
x1 x2
~ 1 =  y1  and w
Proof. Let us see that W is not a vector space. Let w ~ 2 =  y2  in W . Then
z1 z2
w
~1 + w~2 ∈
/ W because

a(x1 + x2 ) + b(y1 + y2 ) + c(z1 + z2 ) = (ax1 + by1 + cz1 ) + (ax2 + by2 + cz2 ) = d + d = 2d 6= d.

~ ∈ W and λ ∈ R \ {1}, then λw


(Alternatively, we could have shown that if w ~ ∈
/ W ; or we
could have shown that ~0 ∈
/ W .)
We know that W is a plane in R3 which has normal vector ~n = (a, b, c)t but does not pass
through the origin. This shows that W is an affine vector space because it can be written as
W = ~v0 + W0 where W0 is the plane parallel to W which passes through the origin and ~v0 is

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
152 5.2. Subspaces

FT
Figure 5.4: The green plane passes through the origin and is a subspace of R3 . The red plane does
not pass through the origin and therefore it is an affine subspace of R3 .

an arbitrary vector from the origin to a point on the plane W . (Note that W0 is the plane
described by ax + by + cz = 0.)
Note that we already showed in Corollary 3.22 that W is an affine vector space.
RA
Remark. If a = b = c = 0, then W = ∅.

• W = {~x ∈ R3 : k~xk ≥ 5} is not a subspace of R3 since it does not contain ~0.


 
5
• W = {~x ∈ R3 : k~xk ≤ 9} is not a subspace of R3 . For example, take ~z = 0. Then ~z ∈ W ,
0
however, for example, 7~z ∈
/ W (or: ~z + ~z ∈
/ W ).
D

    
 x  1
• W = x2  : x ∈ R . Then W is not a vector space. For example, ~a = 1 ∈ W , but
 3
x 1

 
2
2~a = 2 ∈ / W.
2

Examples 5.16 (Examples and non-examples of subspaces of M (m × n). The following


sets are examples for subspaces of M (m × n):

• The set of all matrices with a11 = 0.


• The set of all matrices with a11 = 5a12 .

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 153

• The set of all matrices such that its first row is equal to its last row.

If m = n, then also the following sets are subspaces of M (n × n):

• The set of all symmetric matrices.


• The set of all antisymmetric matrices.
• The set of all diagonal matrices.
• The set of all upper triangular matrices.
• The set of all lower triangular matrices.

The following sets are not subspaces of M (n × n):


• The set of all invertible matrices.
• The set of all non-invertible matrices.
• The set of all matrices with determinant equal to 1.

FT
Examples 5.17 (Examples and non-examples of subspaces of the set all functions from
R to R). Let V be the set of all functions from R to R. Then V clearly is a real vector space.
The following sets are examples for subspaces of V :

• The set of all continuous functions.


• The set of all differentiable functions.
• The set of all bounded functions.
RA
• The set of all polynomials.
• The set of all polynomials with degree ≤ 5.
• The set of all functions f with f (7) = 0.
• The set of all even functions.
• The set of all odd functions.

The following sets are not subspaces of V :


D

• The set of all polynomials with degree 3.


• The set of all polynomials with degree ≥ 3.
• The set of all functions f with f (7) = 13.
• The set of all functions f with f (7) ≥ 0.

Prove the claims above.

Definition 5.18. For n ∈ N0 let Pn be the set of all polynomials of degree less than or equal to n.

Remark 5.19. Pn is a vector space.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
154 5.3. Linear combinations and linear independence

Proof. Clearly, the zero function belongs to Pn (it is a polynomial of degree 0). For polynomials
p, q ∈ Pn and numbers λ ∈ R we clearly have that p + q and λp are again polynomials of degree
at most n, so they belong to Pn . By Proposition 5.10, Pn is a subspace of the space of all real
functions, hence it is a vector space.

You should have understood


• the concept of a subspace of a given vector space,
• why we only have to check if a given subset of a vector space is non-empty, closed under
sum and closed under multiplication with scalars if we want to see if it is a subspace,
• etc.
You should now be able to
• give examples and non-examples of subspaces of vector spaces,

FT
• check if a given subset of a vector space is a subspace,
• etc.

5.3 Linear combinations and linear independence


In this section, we work mostly with real vector spaces for the sake of definiteness. However, all
the statements are also true for complex vector spaces. We only have to replace R by C and the
RA
word real by complex everywhere.

We start with a definition.

Definition 5.20. Let V be a real vector space and let v1 , . . . , vk ∈ V and α1 , . . . , αk ∈ R. Then
every vector of the form
v = α1 v1 + · · · αk vk (5.1)
is called a linear combination of the vectors v1 , . . . , vk ∈ V .
D

       
1 4 9 3
Examples 5.21. • Let V = R3 and let ~v1 = 2 , ~v2 = 5 , ~a = 12 , ~b = 3 .
3 6 15 3
Then ~a and ~b are linear combinations of ~v1 and ~v2 because ~a = ~v1 + 2~v2 and ~b = −~v1 + ~v2 .
       
1 0 0 1 5 7 1 2
• Let V = M (2 × 2) and let A = , B= , R= , S= .
0 1 −1 0 −7 5 −2 3
Then R is a linear combination of A and B because R = 5A+7B. S is not a linear combination
of A and B because clearly every linear combination of A and B is of the form
 
α β
αA + βB =
−β α

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 155

so it can never be equal to S since S has two different numbers on its diagonal.

Definition and Theorem 5.22. Let V be a real vector space and let v1 , . . . , vk ∈ V . Then the
set of all their possible linear combinations is denoted by
span{v1 , . . . , vk } := {α1 v1 + · · · + αk vk : α1 , . . . , αk ∈ R}.
It is a subspace of V and it is called the linear span of the vectors v1 , . . . , vk . The vectors v1 , . . . , vk
are called generators of the subspace span{v1 , . . . , vk }.

Remark. By definition, the vector space generated by the empty set is the vector space which
consists only of the zero vector, that is, span{} := {O}.

Remark. Other names for “linear span” that are commonly used, are subspace generated by
the v1 , . . . , vk or subspace spanned by the v1 , . . . , vk . Instead of span{v1 , . . . , vk } the notation
gen{v1 , . . . , vk } is used frequently. All these names and notations mean exactly the same thing.

FT
Proof of Theorem 5.22. We have to show that W := span{v1 , . . . , vk } is a subspace of V . To this
end we use Proposition 5.10 again. Clearly, W is not empty since at least O ∈ W (we only need
to choose all the αj = 0). Now let u, w ∈ W and λ ∈ R. We have to show that λu + w ∈ W . Since
u, w ∈ W , there are real numbers α1 , . . . , αk and β1 , . . . , βk such that u = α1 v1 + . . . , αk vk and
w = β1 w1 + · · · + βk vk . Then
λu + v = λ(α1 v1 + · · · + αk vk ) + β1 w1 + · · · + βk vk
= λα1 v1 + · · · + λαk vk ) + β1 w1 + · · · + βk vk
RA
= (λα1 + β1 )v1 + · · · + (λαk + βk )vk
which belongs to W since it is a linear combination of the vectors v1 , . . . , vk .

Remark. The generators of a given subspace are not unique.


    
1 0 −1 0 1 −1
For example, let A = , B= , C= . Then
0 1 01 1 1
  
α −β
D

span{A, B} = {αA + βB : α, β ∈ R} = : α, β ∈ R ,
β α
  
α + γ −(β + γ)
span{A, B, C} = {αA + βB + γC : α, β, γ ∈ R} = : α, β, γ ∈ R ,
β+γ α+γ
  
α+γ −γ
span{A, C} = {αA + γC : α, γ ∈ R} = : α, γ ∈ R .
γ α+γ
We see that span{A, B} = span{A, B, C} = span{A, C} (in all cases it consists of exactly those
matrices whose diagonal entries are equal and the off-diagonal entries differ by a minus sign). So
we see that neither the generators nor their number is unique.

Remark. If a vector is a linear combination of other vectors, then the coefficients in the linear
combination are not necessarily unique.

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
156 5.3. Linear combinations and linear independence

For example, if A, B, C are the matrices above, then A + B + C = 2A + 2B = 2C or A + 2B + 3C =


4A + 5B = B + 4C, etc.

Remark 5.23. Let V be a vector space and let v1 , . . . , vn and w1 , . . . , wm be vectors in V . Then
the following are equivalent:

(i) span{v1 , . . . , vn } = span{w1 , . . . , wm }.

(ii) vj ∈ span{w1 , . . . , wm } for every j = 1, . . . , n and wk ∈ span{v1 , . . . , vn } for every k =


1, . . . , m.

Proof. (i) =⇒ (ii) is clear.


(ii) =⇒ (i): Note that vj ∈ span{w1 , . . . , wm } for every j = 1, . . . , n implies that every vj can be
written as a linear combination of the w1 , . . . , wm . Then also every linear combination of v1 , . . . , vn
is a linear combination of w1 , . . . , wm . This implies that span{v1 , . . . , vn } ⊆ span{w1 , . . . , wm }. The
converse inclusion span{w1 , . . . , wm } ⊆ span{v1 , . . . , vn } can be shown analogously. Both inclusions
together show that we must have equality.

FT
Examples 5.24. (i) Pn = span{1, X, X 2 , . . . , X n−1 , X n } since every vector in Pn is a polyno-
mial of the form p = αn X n + αn−1 X n−1 + · · · + α1 X + α0 , so it is a linear combination of
the polynomials X n , X n−1 , . . . , X, 1.

Exercise. Show that {1, 1 + X, X + X 2 , . . . , X n−1 + X n } is also a set of generators of


Pn .
 
0 1
(ii) The set of all antisymmetric 2 × 2 matrices is generated by .
−1 0

~ ∈ R3 \ {~0}.
(iii) Let V = R3 and let ~v , w

• span{~v } is a line which passes through the origin and is parallel to ~v .


• If ~v 6k w,
~ then span{~v , w}
~ is a plane which passes through the origin and is parallel to ~v
and w. ~ If ~v k w,
~ then it is a line which passes through the origin and is parallel to ~v .

Example 5.25. Let p1 = X 2 − X + 1, p2 = X 2 − 2X + 5 ∈ P2 , and let U = span{p1 , p2 }. Check


if q = 2X 2 − X − 2 and r = X 2 + X − 3 belong to U .

Solution. • Let us check if q ∈ U . To this end we have to check if we can find α, β such that
q = αp1 + βp2 . Inserting the expressions for p1 , p2 , q we obtain

2X 2 − X − 2 = α(X 2 − X + 1) + β(X 2 − 2X + 5) = X 2 (α + β) + X(−α − 2β) + α + 5β.

Comparing coefficients of the different powers of X, we obtain the system of equations

α+ β=2
−α − 2β = −1
α + 5β = −2.


We use the Gauß-Jordan process to solve the system:


     
1 1 2 1 1 2 1 0 3
A = −1 −2 −1 −→ 0 −1 1 −→ 0 1 −1
1 5 −2 0 4 −4 0 0 0

It follows that α = 3 and β = −1 is a solution, and therefore q = 2p1 − p2 which shows that
q ∈ U.
• Let us check if r ∈ U . To this end we have to check if we can find α, β such that r = αp1 +βp2 .
Inserting the expressions for p1 , p2 , q we obtain

X 2 + X − 3 = α(X 2 − X + 1) + β(X 2 − 2X + 5) = X 2 (α + β) + X(−α − 2β) + α + 5β.

Comparing coefficients of the different powers of X, we obtain the system of equations

α+ β=1
−α − 2β = 1

α + 5β = −3.

We use the Gauß-Jordan process to solve the system:


     
 1  1  1      1  1  1      1 0  3
A = −1 −2 1 −→ 0 −1 2 −→ 0 1 −2 .
 1  5 −3      0  4 −4      0 0  4

We see that the system is inconsistent. Therefore r is not a linear combination of p1 and p2 , hence r ∈/ U. 
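The same membership test can be done numerically: identifying a polynomial aX 2 + bX + c with its coefficient vector (a, b, c)t , the question whether q or r lies in U becomes the consistency of a 3 × 2 linear system. A small NumPy sketch of this check (added here as an illustration, not part of the original solution):

```python
import numpy as np

# columns: coefficient vectors (X^2, X, 1) of p1 = X^2 - X + 1 and p2 = X^2 - 2X + 5
P = np.array([[ 1.0,  1.0],
              [-1.0, -2.0],
              [ 1.0,  5.0]])
q = np.array([2.0, -1.0, -2.0])   # 2X^2 - X - 2
r = np.array([1.0,  1.0, -3.0])   # X^2 + X - 3

def in_span(v, A):
    # the system A * coeffs = v is consistent iff appending v does not raise the rank
    return np.linalg.matrix_rank(np.column_stack([A, v])) == np.linalg.matrix_rank(A)

print(in_span(q, P))   # True:  q = 3 p1 - p2
print(in_span(r, P))   # False: r is not in U
```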

Definition 5.26. A vector space V is called finitely generated if it has a finite set of generators.

Examples 5.27. The following vector spaces are finitely generated.

• The trivial vector space {O} is finitely generated.


• Rn because clearly Rn = gen{~e1 , . . . , ~en } where ~ej is the jth unit vector.

• M (m × n) because it is generated by the set of all possible matrices which are 0 everywhere
except a 1 in exactly one entry.
• Pn is finitely generated as was shown in Example 5.24.
• Let P be the vector space of all real polynomials. Then P is not finitely generated.
Proof. Assume that P is finitely generated and let q1 , . . . , qk be a system of generators of P .
Note that the qj are polynomials. We will denote their degrees by mj = deg qj and we set
M = max{m1 , . . . , mk }. Then any linear combination of them will be a polynomial of degree
at most M , no matter how we choose the coefficients. However, there are elements in P which
have higher degree, for example X M +1 . Therefore q1 , . . . , qk cannot generate all of P .

Another proof using the concept of dimension will be given in Example 5.54 (f).


Later, in Lemma 5.51, we will see that every subspace of a finitely generated vector space is again
finitely generated.

Now we ask ourselves what is the least number of vectors we need in order to generate Rn . We
know that for example Rn = span{~e1 , . . . , ~en }. So in this case we have n vectors that generate
Rn . Could it be that fewer vectors are sufficient? Clearly, if we take away one of the ~ej , then the
remaining system no longer generates Rn since “one coordinate is missing”. However, could we
maybe find other vectors so that n − 1 or less vectors are enough to generate all of Rn ? The next
proposition says that this is not possible.

Proposition 5.28. Let ~v1 , . . . , ~vk be vectors in Rn . If span{~v1 , . . . , ~vk } = Rn , then k ≥ n.

Proof. Let A = (~v1 | . . . |~vk ) be the matrix whose columns are the given vectors. We know that
there exists an invertible matrix E such that A0 = EA is in reduced echelon form (the matrix E
is the product of elementary matrices which correspond to the steps in the Gauß-Jordan process
to arrive at the reduced echelon form). Now, if k < n, then we know that A0 must have at least

one row which consists of zeros only. If we can find a vector w ~ such that it is transformed to ~en
under the Gauß-Jordan process, then we would have that A~x = w ~ is inconsistent, which means
that w~ ∈
/ span{~v1 , . . . , ~vk }. How do we find such a vector w?
~ Well, we only have to start with ~en
and “do the Gauß-Jordan process backwards”. In other words, we can take w ~ = E −1~en . Now if we
apply the Gauß-Jordan process to the augmented matrix (A|w), ~ we arrive at (EA|E w) ~ = (A0 |~en )
which we already know is inconsistent.
Therefore, k < n is not possible and therefore we must have that k ≥ n.
Note that the proof above is basically the same as the one in Remark 3.36. Observe that the system
of vectors ~v1 , . . . , ~vk ∈ Rn is a set of generators for Rn if and only if the equation A~y = ~b has a
solution for every ~b ∈ Rn (as above, A is the matrix whose columns are the vectors ~v1 , . . . , ~vk ).

Now we will answer the question when the coefficients of a linear combination are unique. The
following remark shows us that we have to answer this question only for the zero vector.

Remark 5.29. Let V be a vector space, let v1 , . . . , vk ∈ V and let w ∈ span{v1 , . . . , vk }. Then
there are unique β1 , . . . , βk ∈ R such that

β1 v 1 + · · · + βk vk = w (5.2)

if and only if there are unique α1 , . . . , αk ∈ R such that

α1 v1 + · · · + αk vk = O. (5.3)

Proof. First note that (5.3) always has at least one solution, namely α1 = · · · = αk = 0. This
solution is called the trivial solution.
Let us assume that (5.2) has two different solutions, so that there are γ1 , . . . , γk ∈ R such that for
at least one j = 1, . . . , k we have that βj 6= γj and

γ1 v1 + · · · + γk vk = w. (5.2’)


Subtracting (5.2) and (5.2’) gives


(β1 − γ1 )v1 + · · · + (βk − γk )vk = w − w = O
where at least one coefficient is different from zero. Therefore also (5.3) has more than one solution.
On the other hand, let us assume that (5.3) has a non-trivial solution, that is, at least one of the
αj in (5.3) is different from zero. But then, if we sum (5.2) and (5.3), we obtain another solution
for (5.2) because
(α1 + β1 )v1 + · · · + (αk + βk )vk = O + w = w.
The proof shows that there are as many solutions of (5.2) as there are of (5.3).
It should also be noted that if (5.3) has one non-trivial solution, then it has automatically infinitely
many solutions, because if α1 , . . . , αk is a solution, then also cα1 , . . . , cαk is a solution for arbitrary
c ∈ R since
cα1 v1 + · · · + cαk vk = c(α1 v1 + · · · + αk vk ) = c O = O.

In fact, the discussion above should remind you of the relation between solutions of an inhomo-
geneous system and the solutions of its associated homogeneous system in Theorem 3.21. Note

that just as in the case of linear systems, (5.2) could have no solution. This happens if and only
if w ∈
/ span{v1 , . . . , vk }.
If V = Rn then the remark above is exactly Theorem 3.21.

So we see that only one of the following two cases can occur: (5.3) has exactly one solution (namely
the trivial one) or it has infinitely many solutions. Note that this is analogous to the situation of
the solutions of homogeneous linear systems: They have either only the trivial solution or they have
infinitely many solutions. The following definition distinguishes between the two cases.
Definition 5.30. Let V be a vector space. The vectors v1 , . . . , vk in V are called linearly
independent if
α1 v1 + · · · + αk vk = O (5.4)
has only the trivial solution. They are called linearly dependent if (5.4) has more than one solution.

Remark 5.31. The empty set is linearly independent since O cannot be written as a nontrivial
linear combination of vectors from the empty set.

Before we continue with the theory, we give a few examples.


   
1 −4
Examples. (i) The vectors v~1 = and v~2 = ∈ R2 are linearly dependent because
2 −8
4v~1 + v~2 = ~0.
   
1 5
(ii) The vectors v~1 = and v~2 = ∈ R2 are linearly independent.
2 0

Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:
α + 5β = 0
2α + 0β = 0.


We can use the Gauß-Jordan process to obtain all solutions. However, in this case we easily
see that α = 0 (from the second line) and then that β = −(1/5)α = 0. Note that we could
also have calculated the determinant of the matrix with columns v~1 and v~2 , which is
1 · 0 − 5 · 2 = −10 6= 0, to conclude that the homogeneous system above
has only the trivial solution. Observe that the columns of this matrix are exactly the given
vectors.
   
(iii) The vectors v~1 = (1, 1, 1)t and v~2 = (2, 3, 4)t ∈ R3 are linearly independent.

Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:

α + 2β = 0
α + 3β = 0
α + 4β = 0.

 
If we subtract the first equation from the second, we obtain β = 0 and then α = −2β = 0. So
again, this system has only the trivial solution and therefore the vectors v~1 and v~2 are linearly
independent.

(iv) Let v~1 = (1, 1, 1)t , v~2 = (−1, 2, 3)t , v~3 = (0, 0, 1)t and v~4 = (0, 6, 8)t ∈ R3 . Then
(a) The system {~v1 , ~v2 , ~v3 } is linearly independent.
(b) The system {~v1 , ~v2 , ~v4 } is linearly dependent.

Proof. (a) Consider the equation αv~1 + β v~2 + γ v~3 = ~0. This equation is equivalent to the
following system of linear equations for α, β and γ:

α − 1β + 0γ = 0
α + 2β + 0γ = 0
α + 3β + 1γ = 0.

We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors ~v1 , ~v2 , ~v3 .
       
1 −1 0 1 −1 0 1 −1 0 1 0 0
A = 1 2 0 −→ 0 3 0 −→ 0 1 0 −→ 0 1 0 .
1 3 1 0 4 1 0 4 1 0 0 1

Therefore the unique solution is α = β = γ = 0 and consequently the vectors ~v1 , ~v2 , ~v3
are linearly independent.
Observe that we also could have calculated det A = 3 6= 0 to conclude that the homoge-
neous system has only the trivial solution.


(b) Consider the equation αv~1 + β v~2 + δ v~4 = ~0. This equation is equivalent to the following
system of linear equations for α, β and δ:
α − 1β + 0δ = 0
α + 2β + 6δ = 0
α + 3β + 8δ = 0.
We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors.
       
1 −1 0 1 −1 0 1 −1 0 1 −1 0
A = 1 2 6 −→ 0 3 6 −→ 0 1 2 −→ 0 1 2
1 3 8 0 4 8 0 1 2 0 0 0
 
1 0 2
−→ 0 1 2 .
0 0 0
So there are infinitely many solutions. If we take δ = t, then α = β = −2t. Consequently
the vectors ~v1 , ~v2 , ~v4 are linearly dependent, because, for example, −2~v1 − 2~v2 + ~v4 = ~0
(taking t = 1).
Observe that we also could have calculated det A = 0 to conclude that the system has
infinitely many solutions.
   
0 1 1 0
(v) The matrices and are linearly independent in M (2 × 2).
0 0 0 0
     
1 1 1 0 0 1
(vi) The matrices A = , B= and C = are linearly dependent in M (2×2)
0 1 0 1 0 0
because A − B − C = 0.
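All the examples in Rn above boil down to one computation: write the vectors as the columns of a matrix and decide whether the homogeneous system has a non-trivial solution. Numerically this is a rank test, as in the following NumPy sketch (an added illustration re-checking examples (i)–(iv)):

```python
import numpy as np

def independent(*vectors):
    # the vectors are linearly independent iff the matrix with these columns has full column rank
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

print(independent([1, 2], [-4, -8]))                   # (i)    False
print(independent([1, 2], [5, 0]))                     # (ii)   True
print(independent([1, 1, 1], [2, 3, 4]))               # (iii)  True
print(independent([1, 1, 1], [-1, 2, 3], [0, 0, 1]))   # (iv a) True
print(independent([1, 1, 1], [-1, 2, 3], [0, 6, 8]))   # (iv b) False
```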

After these examples we will proceed with some facts on linear independence. We start with the
special case when we have only two vectors.

Proposition 5.32. Let v1 , v2 be vectors in a vector space V . Then v1 , v2 are linearly dependent if
and only if one vector is a multiple of the other.

Proof. Assume that v1 , v2 are linearly dependent. Then there exist α1 , α2 ∈ R such that α1 v1 +
α2 v2 = 0 and at least one of the α1 and α2 is different from zero, say α1 6= 0. Then we have
v1 + (α2 /α1 ) v2 = 0, hence v1 = −(α2 /α1 ) v2 .

Now assume on the other hand that, e.g., v1 is a multiple of v2 , that is v1 = λv2 for some λ ∈ R.
Then v1 − λv2 = 0 which is a nontrivial solution of α1 v1 + α2 v2 = 0 because we can take α1 = 1 6= 0
and α2 = −λ (note that λ may be zero).

The proposition above cannot be extended to the case of three or more vectors. For instance, the
vectors ~a = (1, 0)t , ~b = (0, 1)t , ~c = (1, 1)t are linearly dependent because ~a + ~b − ~c = ~0, but none of
them is a multiple of any of the other two vectors.


Proposition 5.33. Let V be a vector space.

(i) Every system of vectors which contains O is linearly dependent.


(ii) Let v1 , . . . , vk ∈ V and assume that there are α1 , . . . , αk ∈ R such that α1 v1 + · · · + αk vk = O.
If α` 6= 0, then v` is a linear combination of the other vj .
(iii) If the vectors v1 , . . . , vk ∈ V are linearly dependent, then for every w ∈ V , the vectors
v1 , . . . , vk , w are linearly dependent.
(iv) If v1 , . . . , vk are vectors in V and w is a linear combination of them, then v1 , . . . , vk , w are
linearly dependent.
(v) If the vectors v1 , . . . , vk ∈ V are linearly independent, then every subset of them is linearly
independent.

Proof. (i) Let v1 , . . . , vk ∈ V . Clearly 1O +0v1 +· · ·+0vk = O is a non-trivial linear combination


which gives O. Therefore the system {O, v1 , . . . , vk } is linearly dependent.

(ii) If α` 6= 0, then we can solve for v` : v` = −(α1 /α` )v1 − · · · − (α`−1 /α` )v`−1 − (α`+1 /α` )v`+1 − · · · − (αk /α` )vk .

(iii) If the vectors v1 , . . . , vk ∈ V are linearly dependent, then there exist α1 , . . . , αk ∈ R such
that at least one of them is different from zero and α1 v1 + · · · + αk vk = O. But then also
α1 v1 + · · · + αk vk + 0w = O which shows that the system {v1 , . . . , vk , w} is linearly dependent.

(iv) Assume that w is a linear combination of v1 , . . . , vk . Then there exist α1 , . . . , αk ∈ R such


that w = α1 v1 + · · · + αk vk . Therefore we obtain w − α1 v1 − · · · − αk vk = O which is a
non-trivial linear combination since the coefficient of w is 1.

(v) Suppose that a subsystem of v1 , . . . , vk ∈ V is linearly dependent. Then, by (iii), every
system in which it is contained, must be linearly dependent too. In particular, the original
system of vectors must be linearly dependent which contradicts our assumption. Note that
also the empty set is linearly independent by Remark 5.31.

Now we specialise to the case when V = Rn . Let us take vectors ~v1 , . . . , ~vk ∈ Rn and let us write
(~v1 | · · · |~vk ) for the n × k matrix whose columns are the vectors ~v1 , . . . , ~vk .

Lemma 5.34. With the above notation, the following statements are equivalent:

(i) ~v1 , . . . , ~vk are linearly dependent.

(ii) There exist α1 , . . . , αk not all equal to zero, such that α1~v1 + · · · + αk~vk = 0.
   
(iii) There exists a vector (α1 , . . . , αk )t 6= ~0 such that (~v1 | · · · |~vk ) (α1 , . . . , αk )t = ~0.

(iv) The homogeneous system corresponding to the matrix (~v1 | · · · |~vk ) has at least one non-trivial
(and therefore infinitely many) solutions.


Proof. (i) =⇒ (ii) is simply the definition of linear dependence. (ii) =⇒ (iii) is only rewriting
the vector equation in matrix form. (iv) only says in words what the equation in (iii) means. And
finally (iv) =⇒ (i) holds because every non-trivial solution of the homogeneous system associated
to (~v1 | · · · |~vk ) gives a non-trivial solution of α1~v1 + · · · + αk~vk = ~0.

Since we know that a homogeneous linear system with more unknowns than equations has infinitely
many solutions, we immediately obtain the following corollary.

Corollary 5.35. Let ~v1 , . . . , ~vk ∈ Rn .

(i) If k > n, then the vectors ~v1 , . . . , ~vk are linearly dependent.
(ii) If the vectors ~v1 , . . . , ~vk are linearly independent, then k ≤ n.

Observe that (ii) does not say that if k ≤ n, then the vectors ~v1 , . . . , ~vk are linearly independent.
It only says that they have a chance to be linearly independent whereas a system with more than
n vectors always is linearly dependent.

Now we specialise further to the case when k = n.

Theorem 5.36. Let ~v1 , . . . , ~vn be vectors in Rn . Then the following are equivalent:

(i) ~v1 , . . . , ~vn are linearly independent.


   
(ii) The only solution of (~v1 | · · · |~vn ) (α1 , . . . , αn )t = ~0 is the zero vector (α1 , . . . , αn )t = ~0.
(iii) The matrix (~v1 | · · · |~vn ) is invertible.
(iv) det(~v1 | · · · |~vn ) 6= 0.

Proof. The equivalence of (i) and (ii) follows from Lemma 5.34. The equivalence of (ii), (iii) and
(iv) follows from Theorem 4.11.

Formulate an analogous theorem for linearly dependent vectors.

Now we can state when a system of n vectors in Rn generates Rn .

Theorem 5.37. Let ~v1 , . . . , ~vn be vectors in Rn and let A = (~v1 | · · · |~vn ) be the matrix whose
columns are the given vectors ~v1 , · · · , ~vn . Then the following are equivalent:

(i) ~v1 , . . . , ~vn are linearly independent.

(ii) Rn = span{~v1 , . . . , ~vn }.

(iii) det A 6= 0.

Proof. (i) ⇐⇒ (iii) is shown in Theorem 5.36.


(ii) ⇐⇒ (iii): The vectors ~v1 , . . . , ~vn generate Rn if and only if for every w~ ∈ Rn there exist
numbers β1 , . . . , βn such that β1~v1 + · · · + βn~vn = w.~ In matrix form that means that
A (β1 , . . . , βn )t = w.~ By
Theorem 3.43 we know that this has a solution for every vector w~ if and only if A is invertible
(because if we apply Gauß-Jordan to A, we must get to the identity matrix).

The proof of the preceding theorem basically goes like this: We consider the equation Aβ~ = w.~
When are the vectors ~v1 , . . . , ~vn linearly independent? – They are linearly independent if and only
if for w~ = ~0 the system has only the trivial solution. This happens if and only if the reduced echelon
form of A is the identity matrix. And this happens if and only if det A 6= 0.
When do the vectors ~v1 , . . . , ~vn generate Rn ? – They do, if and only if for every given vector w
~ ∈ Rn
the system has at least one solution. This happens if and only if the reduced echelon form of A is
the identity matrix. And this happens if and only if det A 6= 0.
Since a square matrix A is invertible if and only if its transpose At is invertible, Theorem 5.37 leads
immediately to the following corollary.

Corollary 5.38. For a matrix A ∈ M (n × n) the following are equivalent:
(i) A is invertible.
(ii) The columns of A are linearly independent.
(iii) The rows of A are linearly independent.
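In practice this gives a one-line test for n vectors in Rn : they are linearly independent, they span Rn and they are the columns (or rows) of an invertible matrix precisely when the determinant is non-zero. A short NumPy sketch with arbitrarily chosen vectors:

```python
import numpy as np

v1, v2, v3 = [1, 2, 3], [4, 5, 6], [0, 2, 1]
A = np.column_stack([v1, v2, v3])

d = np.linalg.det(A)
print(d)                 # 9.0 up to rounding errors
print(abs(d) > 1e-12)    # True: A is invertible, so v1, v2, v3 are linearly independent and span R^3
```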

We end this section with more examples.


Examples. • Recall that Pn is the vector space of all polynomials of degree ≤ n.
In P3 , we consider the vectors p1 = X 3 − 1, p2 = X 2 − 1, p3 = X − 1. These vectors are
linearly independent.

Proof. Let α1 , α2 , α3 ∈ R such that α1 p1 + α2 p2 + α3 p3 = 0. This means that


0 = α1 (X 3 − 1) + α2 (X 2 − 1) + α3 (X − 1)
= α1 X 3 + α2 X 2 + α3 X − (α1 + α2 + α3 ).

Comparing coefficients, it follows that α1 = 0, α2 = 0, α3 = 0 and α1 + α2 + α3 = 0 which


shows that p1 , p2 and p3 are linearly independent.

If in addition we take p4 = X 3 − X 2 , then the system p1 , p2 , p3 and p4 is linearly dependent.

Proof. As before, let α1 , α2 , α3 , α4 ∈ R such that α1 p1 + α2 p2 + α3 p3 + α4 p4 = 0. This means


that
0 = α1 (X 3 − 1) + α2 (X 2 − 1) + α3 (X − 1) + α4 (X 3 − X 2 )
= (α1 + α4 )X 3 + (α2 − α4 )X 2 + α3 X − (α1 + α2 + α3 ).
Comparing coefficients, this is equivalent to α1 +α4 = 0, α2 −α4 = 0, α3 = 0 and α1 +α2 +α3 =
0. This system of equations has infinitely many solutions. They are given by α2 = α4 = −α1 ∈
R, α3 = 0 (verify this!). Therefore p1 , p2 , p3 and p4 are linearly dependent.


Exercise. Show that p1 , p2 , p3 and p5 are linearly independent if p5 = X 3 + X 2 .

• In P2 , we consider the vectors p1 = X 2 + 2X − 1, p2 = 5X + 2, p3 = 2X 2 − 11X − 8. These


vectors are linearly dependent.

Proof. Let α1 , α2 , α3 ∈ R such that α1 p1 + α2 p2 + α3 p3 = 0. This means that

0 = α1 (X 2 + 2X − 1) + α2 (5X + 2) + α3 (2X 2 − 11X − 8)


= (α1 + 2α3 )X 2 + (2α1 + 5α2 − 11α3 )X + (−α1 + 2α2 − 8α3 ).

Comparing coefficients, it follows that α1 +2α3 = 0, 2α1 +5α2 −11α3 = 0, −α1 +2α2 −8α3 = 0.
We write this in matrix form and apply Gauß-Jordan:
       
1 0 2 1 0 2 1 0 2 1 0 2
 2 5 −11 −→ 0 5 −15 −→ 0 1 −3 −→ 0 1 −3 .
−1 2 −8 0 2 −6 0 1 −3 0 0 0

This shows that the system has non-trivial solutions (find them!) and therefore p1 , p2 and p3
are linearly dependent.
     
1 2 1 0 0 5
• In V = M (2 × 2) consider A = ,B = ,C = . Then A, B, C are
2 1 0 1 5 0
linearly dependent because A − B − (2/5) C = 0.
     
1 2 3 2 2 2 1 2 2
• In V = M (2 × 3) consider A = ,B = ,C = . Then A, B, C
4 5 6 1 1 1 2 1 1
are linearly independent.

Exercise. Prove this!

You should have understood


• what a linear combination is,

• the concept of linear independence,


• the concept of linear span and that it consists either of only the zero vector or of infinitely
many vectors,
• geometrically the concept of linear independence in R2 and R3 ,
• that the coefficients in a linear combination are not necessarily unique,
• what the number of solutions of A~x = ~0 says about the linear independence of the columns
of A seen as vectors in Rn ,
• what the existence (or non-existence) of solutions of A~x = ~b for all ~b ∈ Rm says about the
span of the columns of A seen as vectors in Rm ,
• why a matrix A ∈ M (n × n) is invertible if and only if its columns are linearly independent,


• etc.

You should now be able to


• verify if a given vector is a linear combination of a given set of vectors,
• verify if a given vector lies in the linear span of a given set of vectors,
• verify if a given set of vectors is a generator of a given vectors space,
• find a set of generators for a given vectors space,
• verify if a given set of vectors is a linearly independent,
• etc.

5.4 Basis and dimension


In this section, we work mostly with real vector spaces for the sake of definiteness. However, all
the statements are also true for complex vector spaces. We only have to replace R by C and the

word real by complex everywhere.

Definition 5.39. Let V be a vector space. A basis of V is a set of vectors {v1 , . . . , vn } in V which
is linearly independent and generates V .

The following remark shows that a basis is a minimal system of generators of V and at the same
time a maximal system of linear independent vectors.

Remark. Let {v1 , . . . , vn } be a basis of V .


(i) Let w ∈ V . Then {v1 , . . . , vn , w} is not a basis of V because this system of vectors is no
longer linearly independent by Proposition 5.33 (iv).
(ii) If we take away one of the vectors from {v1 , . . . , vn }, then it is no longer a basis of V be-
cause the new system of vectors no longer generates V . For example, if we take away v1 ,
then v1 ∈/ span{v2 , . . . , vn } (otherwise v1 , . . . , vn would be linearly dependent), and therefore
span{v2 , . . . , vn } 6= V .

Remark 5.40. By definition, the empty set is a basis of the trivial vector space {O}.

Remark 5.41. Every basis of Rn has exactly n elements. To see this note that by Corollary 5.35,
a basis can have at most n elements because otherwise it cannot be linearly independent. On the
other hand, if it had fewer than n elements, then, by Proposition 5.28, it cannot generate Rn .
     
 1 0 0 
Examples 5.42. • A basis of R3 is, for example, 0 , 1 , 0 . The vectors of this
0 0 1
 
basis are the standard unit vectors. The basis is called the standard basis (or canonical basis)
of R3 .


Other examples of bases of R3 are


           
 1 1 1   1 4 0 
0 , 1 , 1 , 2 , 5 , 2
0 0 1 3 6 1
   

Exercise. Verify that the systems above are bases of R3 .


The following systems are not bases of R3
                 
 1 4 3   1 4   1 1 1 0 
2 , 5 , 2 , 2 , 5 , 0 , 1 , 1 , 0 .
3 9 5 3 6 0 0 1 1
     

Exercise. Verify that the systems above are not bases of R3 .


• The standard basis in Rn (or canonical basis in Rn ) is {~e1 , . . . ,~en }. Recall that the ~ej are the
standard unit vectors whose jth entry is 1 and all other entries are 0.
Exercise. Verify that they form a basis of Rn .

• The standard basis in Pn (or canonical basis in Pn ) is {1, X, X 2 , . . . , X n }.
Exercise. Verify that they form a basis of Pn .
• Let p1 = X, p2 = 2X 2 + 5X − 1, p3 = 3X 2 + X + 2. Then the system {p1 , p2 , p3 } is a basis
of P2 .

Proof. We have to show that the system is linearly independent and that it generates the
space P2 . Let q = aX 2 + bX + c ∈ P2 . We want to see if there are α1 , α2 , α3 ∈ R such that
q = α1 p1 + α2 p2 + α3 p3 . If we write this equation out, we find
aX 2 + bX + c = α1 X + α2 (2X 2 + 5X − 1) + α3 (3X 2 + X + 2)
= (2α2 + 3α3 )X 2 + (α1 + 5α2 + α3 )X + (−α2 + 2α3 ).
Comparing coefficients, we obtain the following system of linear equations for the αj :
     
2α2 + 3α3 = a  0 2 3 α1 a
α1 + 5α2 + α3 = b in matrix form: 1 5 1 α2  =  b  .
−α2 + 2α3 = c 0 −1 2 α3 c

Now we apply Gauß-Jordan to the augmented matrix:


     
 0  2 3 a      1  5 1 b      1 0 11 b + 5c
 1  5 1 b −→ 0 −1 2 c −→ 0 1 −2 −c .
 0 −1 2 c      0  2 3 a      0 0  7 a + 2c
So we see that there is exactly one solution for any given q. The existence of such a solution
shows that {p1 , p2 , p3 } generates P2 . We also see that for any give q ∈ P2 there is exactly one
way to write it as a linear combination of p1 , p2 , p3 . If we take the special case q = 0, this
shows that the system is linearly independent. In summary, {p1 , p2 , p3 } is a basis of P2 .

• Let p1 = X + 1, p2 = X 2 + X, p3 = X 3 + X 2 , p4 = X 3 + X 2 + X + 1. Then the system


{p1 , p2 , p3 , p4 } is not a basis of P3 .


Exercise. Show this!


• In the spaces M (m × n), the set of all matrices Aij form a basis where Aij is the matrix with
aij = 1 and all other entries equal to 0. For example, in M (2 × 3) we have the following basis:
           
1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0
, , , , , .
0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1
       
1 0 1 0 1 0 1 1
• Let A = , B= , C= , D= . Then {A, B, C, D} is a basis
0 0 1 0 1 1 1 1
of M (2 × 2).
 
a b
Proof. Let M = be an arbitrary 2 × 2 matrix. Consider the equation M = α1 A +
c d
α2 B + α3 C + α4 D. This leads to
         
a b 1 0 1 0 1 0 1 1
= α1 + α2 + α3 + α4

c d 0 0 1 0 1 1 1 1
 
α1 + α2 + α3 + α4 α4
= .
α2 + α3 + α4 α3 + α4

So we obtain the following set of equations for the αj :


     
α1 + α2 +α3 +α4 = a   1 1 1 1 α1 a
α4 = b 0 0 0 1 α2   b 

in matrix form:    =  .
α2 +α3 +α4 = c  0 1 1 1 α3   c 

α3 +α4 = d 0 0 1 1 α4 d

Now we apply Gauß-Jordan to the augmented matrix:


     
1 1 1 1 a 1 1 1 1 a 1 1 1 0 a−b
 0 0 0 1 b 0 1 1 1 c 0 1 1 0 c − b
0 1 1 1 c −→ 0 0 1 1
    −→  
d 0 0 1 0 d − b
0 0 1 1 d 0 0 0 1 b 0 0 0 1 b
   
1 1 0 0 a−d 1 0 0 0 a−c
0 1 0 0 c − d 0 1 0 0 c − d
−→   −→  .
0 0 1 0 d − b 0 0 1 0 d − b
0 0 0 1 b 0 0 0 1 b

We see that there is exactly one solution for any given M ∈ M (2 × 2). Existence of the
solution shows that the matrices A, B, C, D generate M (2 × 2) and uniqueness shows that
they are linearly independent if we choose M = 0.

The next theorem is very important. It says that if V has a basis which consists of n vectors, then
every basis consists of exactly n vectors.

Theorem 5.43. Let V be a vector space and let {v1 , . . . , vn } and {w1 , . . . , wm } be bases of V . Then
n = m.


Proof. Suppose that m > n. We will show that then the vectors w1 , . . . , wm cannot be linearly
independent, hence they cannot be a basis of V . Since the vectors v1 , . . . , vn are a basis of V , every
wj can be written as a linear combination of them. Hence there exist numbers aij such that

w1 = a11 v1 + a12 v2 + · · · + a1n vn


w2 = a21 v1 + a22 v2 + · · · + a2n vn
.. .. (5.5)
. .
wm = am1 v1 + am2 v2 + · · · + amn vn .

Now we consider the equation


c1 w1 + · · · + cm wm = O. (5.6)
If the w1 , . . . , wm were linearly independent, then it should follow that all cj are 0. We insert (5.5)
into (5.6) and obtain

O = c1 (a11 v1 + a12 v2 + · · · + a1n vn ) + c2 (a21 v1 + a22 v2 + · · · + a2n vn )

+ · · · + cm (am1 v1 + am2 v2 + · · · + amn vn )
= (c1 a11 + c2 a21 + · · · + cm am1 )v1 + · · · + (c1 a1n + c2 a2n + · · · + cm amn )vn .

Since the vectors v1 , . . . , vn are linearly independent, the expressions in the parentheses must be
equal to zero. So we find

c1 a11 + c2 a21 + · · · + cm am1 = 0


c1 a12 + c2 a22 + · · · + cm am2 = 0
.. .. (5.7)
. .
c1 a1n + c2 a2n + · · · + cm amn = 0.

This is a homogeneous system of n equations for the m unknowns c1 , . . . , cm . Since n < m it


must have infinitely many solutions. So the system {w1 , . . . , wm } is not linearly independent and
therefore it cannot be a basis of V . Therefore m > n cannot be true and it follows that n ≥ m.
If we assume that n > m, then the same argument as above, with the roles of the vj and the wj
exchanged, leads to a contradiction and it follows n ≤ m.

In summary we showed that both n ≥ m and n ≤ m must be true. Therefore m = n.

Definition 5.44. • Let V be a finitely generated vector space. Then it has a basis by The-
orem 5.45 below and by Theorem 5.43 the number n of vectors needed for a basis does not
depend on the particular chosen basis. This number is called the dimension of V . It is denoted
by dim V .

• If a vector space V is not finitely generated, then we set dim V = ∞.

• The empty set is a basis of the trivial vector space {O}, hence dim{O} = 0.

Next we show that every finitely generated vector space has a basis and therefore a well-defined
dimension.


Theorem 5.45. Let V be a vector space and assume that there are vectors w1 , . . . , wm ∈ V such
that V = span{w1 , . . . , wm }. Then the set {w1 , . . . , wm } contains a basis of V . In particular, V
has a finite basis and dim V ≤ m.

Proof. Without restriction we may assume that all vectors wj are different from O. We start with
the first vector. If V = span{w1 }, then {w1 } is a basis of V and dim V = 1. Otherwise we set
V1 := span{w1 } and we note that V1 6= V . Now we check if w2 ∈ span{w1 }. If it is, we throw it out
because in this case span{w1 } = span{w1 , w2 } so we do not need w2 to generate V . Next we check
if w3 ∈ span{w1 }. If it is, we throw it out, etc. We proceed like this until we find a vector wi2 in
our list which does not belong to span{w1 }. Such an i2 must exist because otherwise we already
had that V1 = V . Then we set V2 := span{w1 , wi2 }. If V2 = V , then we are done. Otherwise, we
proceed as before: We check if wi2 +1 ∈ V2 . If this is the case, then we can throw it out because
span{w1 , wi2 } = span{w1 , wi2 , wi2 +1 }. Then we check wi2 +2 , etc., until we find a wi3 such that
wi3 ∈
/ span{w1 , wi2 } and we set V3 := span{w1 , wi2 , wi3 }. If V3 = V , then we are done. If not, then
we repeat the process. Note that after at most m repetitions, this comes to an end. This shows
that we can extract from the system of generators a basis {w1 , wi2 , . . . , wik } of V .

The following theorem complements the preceding one.

Theorem 5.46. Let V be a finitely generated vector space. Then any system w1 , . . . , wm ∈ V of
linearly independent vectors can be completed to a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .

Proof. Note that dim V < ∞ by Theorem 5.45 and set n = dim V . It follows that n ≥ m because
we have m linearly independent vectors in V . If m = n, then w1 , . . . , wm is already a basis of V
and we are done.
If m < n, then span{w1 , . . . , wm } 6= V and we choose an arbitrary vector vm+1 ∈/ span{w1 , . . . , wm }
and we define Vm+1 := span{w1 , . . . , wm , vm+1 }. Then dim Vm+1 = m + 1. If m + 1 = n,
then necessarily Vm+1 = V and we are done. If m + 1 < n, then we choose an arbitrary vector
vm+2 ∈ V \ Vm+1 and we let Vm+2 := span{w1 , . . . , wm , vm+1 , vm+2 }. If m + 2 = n, then necessarily
Vm+2 = V and we are done. If not, we repeat the step before. Note that after n − m steps we have
found a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .

In summary, the two preceding theorems say the following:



• If the set of vectors v1 , . . . , vm generates the vector space V , then it is always possible to extract
a subset which is a basis of V (we need to eliminate m − dim V vectors).

• If we have a set of linearly independent vectors v1 , . . . vm in a finitely generated vector space


V , then it is possible to find vectors vm+1 , . . . , vn such that {v1 , . . . , vn } is a basis of V (we
need to add dim V − m vectors).
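For subspaces of Rn both procedures can be carried out mechanically: go through a list of candidate vectors and keep exactly those which strictly increase the rank. Taking the generators themselves as candidates extracts a basis from them; appending, say, the standard basis vectors to a linearly independent family extends it to a basis. The following NumPy sketch is one possible implementation (an added illustration with arbitrarily chosen vectors, not the systematic method developed later in the notes); compare also the second bullet of Example 5.48 below.

```python
import numpy as np

def greedy_basis(candidates):
    # keep the candidates that strictly increase the rank; they form a basis of the span
    chosen, rank = [], 0
    for v in candidates:
        v = np.asarray(v, dtype=float)
        trial = np.column_stack(chosen + [v])
        if np.linalg.matrix_rank(trial) > rank:
            chosen.append(v)
            rank += 1
    return chosen

# extract a basis of the span of six generators (illustrative vectors)
gens = [[1, 0, 1], [4, 0, 4], [1, 2, 3], [0, 2, 2], [0, 0, 2], [2, 1, 5]]
print(len(greedy_basis(gens)))               # 3: the kept vectors are the 1st, 3rd and 5th

# extend a linearly independent family to a basis of R^3 by appending the standard basis
family = [[1, 1, 0]]
eye = [list(row) for row in np.eye(3)]
print(len(greedy_basis(family + eye)))       # 3
```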

Corollary 5.47. Let V be a vector space.

• If the vectors v1 , . . . , vk ∈ V are linearly independent, then k ≤ dim V .

• If the vectors v1 , . . . , vm ∈ V generate V , then m ≥ dim V .


  
1 0 1 1
Example 5.48. • Let A = , B= ∈ M (2 × 2) and suppose that we want to
0 0 1 1
complete them to a basis of M (2 × 2) (it is clear that A and B are linearly independent,
so this makes sense). Since dim(M (2 × 2)) = 4, we know  that we need 2 more matrices.
0 1
We take any matrix C ∈ / span{A, B}, for example C = . Finally we need a matrix
 0 0
0 0
D ∈/ span{A, B, C}. We can take for example D = . Then A, B, C, D is a basis of
1 0
M (2 × 2).

Check that D ∈
/ span{A, B, C}

Find other matrices C 0 and D0 such that {A, B, C 0 , D0 } is a basis of M (2 × 2).


           
1 4 1 0 0 2
• Given the vectors ~v1 = 0 , ~v2 = 0 , ~v3 = 2 , ~v4 = 2 , ~v5 = 0 , ~v6 = 1
1 4 3 1 2 5

and we want to find a subset of them which form a basis of R3 .
Note that a priori it is not clear that this is possible because we do not know without further
calculations that the given vectors really generate R3 . If they do not, then of course it is
impossible to extract a basis from them.
Let us start. First observe that we need 3 vectors for a basis since dim R3 = 3. So we start
with the first non-zero vector which is ~v1 . We see that ~v2 = 4~v1 , so we discard it. We keep
~v3 since ~v3 ∈
/ span{~v1 }. Next, ~v4 = ~v3 − ~v1 , so ~v4 ∈ span{~v1 , ~v3 } and we discard it. A little
calculation shows that ~v5 ∈/ span{~v1 , ~v3 }. Hence {~v1 , ~v3 , ~v5 } is a basis of R3 .

Remark 5.49. We will present a more systematic way to solve exercises of this type in
Theorem 6.34 and Remark 6.35.

Theorem 5.50. Let V be a vector space with basis {v1 , . . . , vn }. Then every w ∈ V can be written
in a unique way as a linear combination of the vectors v1 , . . . , vn .

Proof. We have to show existence and uniqueness of numbers c1 , . . . , cn so that w = c1 v1 +· · ·+cn vn .


Existence is clear since the set {v1 , . . . , vn } is a set of generators of V (it is even a basis!).
Uniqueness can be shown as follows. Assume that there are numbers c1 , . . . , cn and d1 , . . . , dn such
that w = c1 v1 + · · · + cn vn and w = d1 v1 + · · · + dn vn . Then it follows that

O = w − w = c1 v1 + · · · + cn vn − (d1 v1 + · · · + dn vn ) = (c1 − d1 )v1 + · · · + (cn − dn )vn .


Then all the coefficients c1 − d1 , . . . , cn − dn have to be zero because the vectors v1 , . . . , vn are
linearly independent. Hence it follows that c1 = d1 , . . . , cn = dn , which shows uniqueness. Note
that the theorem is also true if V = {O} because by definition the empty sum is equal to zero.
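For V = Rn this means that, once a basis is fixed, the coefficients of any vector are found by solving a single invertible linear system. A brief NumPy sketch (basis and vector chosen for illustration):

```python
import numpy as np

# columns of B: (1,0,0), (1,1,0), (1,1,1) -- a basis of R^3
B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
w = np.array([2.0, 3.0, 4.0])

c = np.linalg.solve(B, w)       # the unique coefficients with B @ c = w
print(c)                        # [-1. -1.  4.]
print(np.allclose(B @ c, w))    # True
```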

If we have a vector space V and a subspace W ⊂ V , then we can ask ourselves what the relation
between their dimensions is because W itself is a vector space.


Lemma 5.51. Let V be a finitely generated vector space and let W be a subspace. Then W is
finitely generated and dim W ≤ dim V .

Proof. Let V be a finitely generated vector space with dim V = n < ∞. Let W be a subspace of
V and assume that W is not finitely generated. Then we can construct an arbitrarily large system
of linearly independent vectors in W as follows. Clearly, W cannot be the trivial space, so we can
choose w1 ∈ W \ {O} and we set W1 = span{w1 }. Then W1 is a finitely generated subspace of
W , therefore W1 ( W and we can choose w2 ∈ W \ W1 . Clearly, the set {w1 , w2 } is linearly
independent. Let us set W2 = span{w1 , w2 }. Since W2 is a finitely generated subspace of W , it
follows that W2 ( W and we can choose w3 ∈ W \ W2 . Then the vectors w1 , w2 , w3 are linearly
independent and we set W3 = span{w1 , w2 , w3 }. Continuing with this procedure we can construct
subspaces W1 ( W2 ( · · · ( W with dim Wk = k for every k. In particular, we can find a system
of n + 1 linearly independent vectors in W ⊆ V which contradicts the fact that any system of more
than n = dim V vectors in V must be linearly dependent, see Corollary 5.47. This also shows that
any system of more than n vectors in W must be linearly dependent. Since a basis of W consists of
linearly independent vectors, it follows that dim W ≤ n = dim V .

Theorem 5.52. Let V be a finitely generated vector space and let W ⊆ V be a subspace. Then the
following is true:

(i) dim W ≤ dim V .


(ii) dim W = dim V if and only if W = V .

Proof. (i) follows immediately from Lemma 5.51.


(ii) If V = W , then clearly dim V = dim W . To show the converse, we assume that dim V =
dim W and we have to show that V = W . Let {w1 , . . . , wk } be a basis of W . Then
these vectors are linearly independent in W , and therefore also in V . Since dim W = dim V ,
we know that these vectors form a basis of V . Therefore V = span{w1 , . . . , wk } = W .

Remark 5.53. Note that (i) is true even when V is not finitely generated because dim W ≤ ∞ =
dim V whatever dim W may be. However (ii) is not true in general for infinite dimensional vector
spaces. In Example 5.54 (f) and (g) we will show that dim P = dim C(R) in spite of P 6= C(R).
(Recall that P is the set of all polynomials and that C(R) is the set of all continuous functions. So

we have P ( C(R).)

Now we give a few examples of dimensions of spaces.

Examples 5.54. (a) dim Rn = n, dim Cn = n.

(b) dim M (m × n) = mn. This follows because the set of all m × n matrices Aij which have a 1 in
the ith row and jth column and all other entries are equal to zero form a basis of M (m × n)
and there are exactly mn such matrices.

(c) Let Msym (n × n) be the set of all symmetric n × n matrices. Then dim Msym (n × n) = n(n+1) 2 .
To see this, let Aij be the n × n matrix with aij = aji = 1 and all other entries equal to 0.
Observe that Aij = Aji . It is not hard to see that the set of all Aij with i ≤ j form a basis of


Msym (n × n). The dimension of Msym (n × n) is the number of different matrices of this type.
How many of them are there? If we fix j = 1, then only i = 1 is possible. If we fix j = 2,
then i = 1, 2 is possible, etc. until for j = n the allowed values for i are 1, 2, . . . , n. In total
we have 1 + 2 + · · · + n = n(n+1)
2 possibilities. For example, in the case n = 2, the matrices
are      
 1 0      0 1      0 0
A11 = , A12 = , A22 = .
 0 0      1 0      0 1
In the case n = 3, the matrices are
     
1 0 0 0 1 0 0 0 1
A11 = 0 0 0 , A12 = 1 0 0 , A13 = 0 0 0 ,
0 0 0 0 0 0 1 0 0
     
0 0 0 0 0 0 0 0 0
A22 = 0 1 0 , A23 = 0 0 1 , A33 = 0 0 0 .
0 0 0 0 1 0 0 0 1

Convince yourself that the Aij form a basis of Msym (n × n).

(d) Let Masym (n × n) be the set of all antisymmetric n × n matrices. Then dim Masym (n × n) =
n(n−1)
2 . To see this, for i 6= j let Aij be the n × n matrix with aij = −aji = 1 and all other
entries equal to 0. It is not hard to see that the set of all Aij
with i < j form a basis of Masym (n × n). How many of these matrices are there? If we fix
j = 2, then only i = 1 is possible. If we fix j = 3, then i = 1, 2 is possible, etc. until for j = n
the allowed values for i are 1, 2, . . . , n − 1. In total we have 1 + 2 + · · · + (n − 1) = n(n−1)
2
possibilities. For example, in the case n = 2, the only matrix is
 
0 1
A12 = .
−1 0

In the case n = 3, the matrices are


     
0 1 0 0 0 1 0 0 0
A12 = −1 0 0 , A13 =  0 0 0 , A23 = 0 0 1 .
0 0 0 −1 0 0 0 −1 0

Convince yourself that the Aij form a basis of Masym (n × n).

Remark. Observe that dim Msym (n × n) + dim Masym (n × n) = n2 = dim M (n × n). This
is no coincidence. Note that every n × n matrix M can be written as

M = (1/2)(M + M t ) + (1/2)(M − M t )

and that (1/2)(M + M t ) ∈ Msym (n × n) and (1/2)(M − M t ) ∈ Masym (n × n). Moreover it is easy


to check that Msym (n × n) ∩ Masym (n × n) = {0}. Therefore M (n × n) is the direct sum
of Msym (n × n) and Masym (n × n). (For the definition of the direct sum of subspaces, see
Definition 7.19.) A small numerical sketch of this decomposition is given after these examples.


(e) dim Pn = n + 1 since {1, X, . . . , X n } is a basis of Pn and consists of n + 1 vectors.

(f) dim P = ∞. Recall that P is the space of all polynomials.

Proof. We know that for every n ∈ N, the space Pn is a subspace of P . Therefore for every
n ∈ N, we must have that n + 1 = dim Pn ≤ dim P . This is possible only if dim P = ∞.

(g) dim C(R) = ∞. Recall that C(R) is the space of all continuous functions.

Proof. Since P is a subspace of C(R), it follows that dim P ≤ dim(C(R)), hence dim(C(R)) =
∞.
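The decomposition mentioned in the remark of example (d) is easy to verify numerically: every square matrix is the sum of its symmetric and its antisymmetric part. A small NumPy sketch with an arbitrarily chosen matrix:

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

S = 0.5 * (M + M.T)    # symmetric part
A = 0.5 * (M - M.T)    # antisymmetric part

print(np.allclose(S, S.T))     # True
print(np.allclose(A, -A.T))    # True
print(np.allclose(S + A, M))   # True
```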

Now we use the concept of dimension to classify all subspaces of R2 and R3 . We already know that
for examples lines and planes which pass through the origin are subspaces of R3 . Now we can show
that there are no other proper subspaces.

Subspaces of R2 . Let U be a subspace of R2 . Then U must have a dimension. So we have the
following cases:

• dim U = 0. In this case U = {~0} is the trivial subspace.

• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R2 \ {~0}. Therefore
U is a line parallel to ~v1 passing through the origin.

• dim U = 2. In this case dim U = dim R2 . Hence it follows that U = R2 by Theorem 5.52 (ii).
• dim U ≥ 3 is not possible because 0 ≤ dim U ≤ dim R2 = 2.

In conclusion, the only subspaces of R2 are {~0}, lines passing through the origin and R2 itself.

Subspaces of R3 . Let U be a subspace of R3 . Then U must have a dimension. So we have the


following cases:

• dim U = 0. In this case U = {~0} is the trivial subspace.

• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R3 \ {~0}. Therefore
U is a line parallel to ~v1 passing through the origin.

• dim U = 2. Then U is of the form U = span{~v1 , ~v2 } with linearly independent vectors
~v1 , ~v2 ∈ R3 . Hence U is a plane parallel to the vectors ~v1 and ~v2 which passes through the
origin.

• dim U = 3. In this case dim U = dim R3 . Hence it follows that U = R3 by Theorem 5.52 (ii).

• dim U ≥ 4 is not possible because 0 ≤ dim U ≤ dim R3 = 3.

In conclusion, the only subspaces of R3 are {~0}, lines passing through the origin, planes passing
through the origin and R3 itself.

We conclude this section with the formal definition of lines and planes.


Definition 5.55. Let V be a vector space with dim V = n and let W ⊆ V be a subspace. Then
W is called a

• line if dim W = 1,
• plane if dim W = 2,
• hyperplane if dim W = n − 1.

Note that in R3 the hyperplanes are exactly the planes.

You should have understood


• the concept of a basis of a finite dimensional vector space,
• that a given vector space has infinitely many bases, but the number of vectors in any basis
of the space is the same,

• why and how the concept of dimension helps to classify all subspaces of given vector space,
• why a matrix A ∈ M (n × n) is invertible if and only if its columns are a basis of Rn ,
• etc.
You should now be able to

• check if a system of vectors is a basis for a given vector space,


• find a basis for a given vector space,
• extend a system of linear independent vectors to a basis,
• find the dimension of a given vector space,
• etc.

5.5 Summary
Let V be a vector space over K and let v1 , . . . , vk ∈ V .

Linear combinations and linear independence


• A vector w is called a linear combination of the vectors v1 , . . . , vk if there exist scalars
α1 , . . . , αk ∈ K such that
w = α1 v1 + · · · + αk vk .

• The set of all linear combinations of the vectors v1 , . . . , vk is a subspace of V , called the space
generated by the vectors v1 , . . . , vk or the linear span of the vectors v1 , . . . , vk . Notation:

gen{v1 , . . . , vk } := span{v1 , . . . , vk } := {w ∈ V : w is linear combination of v1 , . . . , vk }


= {α1 v1 + · · · + αk vk : α1 , . . . , αk ∈ K}.


• The vectors v1 , . . . , vk are called linearly independent if the equation

α1 v1 + · · · + αk vk = O

has only the trivial solution α1 = · · · = αk = 0.

Basis and dimension

• A system v1 , . . . , vm of vectors in V is called a basis of V if it is linearly independent and


span{v1 , . . . , vm } = V .

• A vector space V is called finitely generated if it has a finite basis. In this case, every basis
of V has the same number of vectors. The number of vectors needed for a basis of a vector
space V is called the dimension of V .

• If V is not finitely generated, we set dim V = ∞.

• For v1 , . . . , vk ∈ V , it follows that dim(span{v1 , . . . , vk }) ≤ k with equality if and only if the
vectors v1 , . . . , vk are linearly independent.

• If V is finitely generated then every linearly independent system of vectors v1 , . . . , vk ∈ V


can be extended to a basis of V .

• If V = span{v1 , . . . , vk }, then V has a basis consisting of a subsystem of the given vectors


v1 , . . . , vk .
• If U is a subspace of V , then dim U ≤ dim V .

• If V is finitely generated and U is a subspace of V , then dim U = dim V if and only if U = V .


This claim is false if dim V = ∞.

• dim{O} = 0 and {O} has the unique basis ∅.

Examples of the dimensions of some vector spaces:

• dim{O} = 0,

• dim Rn = n, dim Cn = n,

• dim M (m × n) = mn,
n(n+1)
• dim Msym (n × n) = 2 ,
n(n−1)
• dim Masym (n × n) = 2 ,

• dim Pn = n + 1,

• dim P = ∞,

• dim C(R) = ∞.


Linear independence, generator property and bases in Rn and Cn


Let ~v1 , . . . , ~vk ∈ Rn or Cn and let A = (~v1 | . . . |~vk ) ∈ M (n × k) be the matrix whose columns consist
of the given vectors.

• gen{~v1 , . . . , ~vk } = Rn if and only if the system A~x = ~b has at least one solution for every
~b ∈ Rn .

• The vectors ~v1 , . . . , ~vk are linearly independent if and only if the system A~x = ~0 has only the
trivial solution ~x = ~0.

• The vectors ~v1 , . . . , ~vk are a basis of Rn if and only if k = n and A is invertible.

5.6 Exercises
1. Let X be the set of all functions from R to R. Show that X with the usual sum and product
with numbers in R is a vector space.
For the following subsets of X, decide whether they are subspaces of X.

(a) All bounded functions from R to R.
(b) All constant functions.
(c) All continuous functions.
(d) All continuous functions f with f (3) = 0.
(e) All continuous functions f with f (3) = 4.
(f) All functions f with f (3) > 0.
(g) All even functions.
(h) All odd functions.
(i) All polynomials.
(j) All nonnegative functions.
(k) All polynomials of degree ≥ 4.

2. Let A ∈ M (m × n) and let ~a ∈ Rk .

(a) Show that U = {A~x : ~x ∈ Rn } is a subspace of Rm .
(b) Show that W = {~x ∈ Rn : A~x = 0} is a subspace of Rn .
(c) Are the sets R = {~x ∈ Rn : A~x = (1, 1, . . . , 1)t } and S = {~x ∈ Rn : A~x 6= 0}
subspaces of Rn ?

3. Let A ∈ M (m × n) and let ~a ∈ Rk .

(a) Is the set T = {~x ∈ Rk : h~x, ~ai = 0} a subspace of Rk ?


(b) Are the sets

S1 = {~x ∈ Rk : k~xk = 1}, B1 = {~x ∈ Rk : k~xk ≤ 1}, F = {~x ∈ Rk : k~xk ≥ 1}

subspaces of Rk ?

4. Consider the set R2 with the following operations:

⊕ : R2 × R2 → R2 , (x1 , x2 )t ⊕ (y1 , y2 )t = (x1 + y2 , x2 + y1 )t ,
 : R × R2 → R2 , λ (x1 , x2 )t = (λx1 , λx2 )t .

Is R2 with this sum and product with scalars a vector space?

5. Consider the set R2 with the following operations:

 : R2 × R2 → R2 , (x1 , x2 )t  (y1 , y2 )t = (x1 + y1 , 0)t ,
 : R × R2 → R2 , λ  (x1 , x2 )t = (λx1 , λx2 )t .

Is R2 with this sum and product with scalars a vector space?

6. (a) Let V = (−π/2, π/2) and define a sum ⊕ : V × V → V and a product with scalars : R × V → V
by

x ⊕ y = arctan(tan(x) + tan(y)), λ x = arctan(λ tan(x))

for all x, y ∈ V , λ ∈ R. Show that (V, ⊕, ) is a vector space over R.

(b) A generalisation of the construction in (a) is the following:
Let V be a set and let f : Rn → V be a bijective function. Then V is a vector space
with the sum and product with scalars defined as follows:

x ⊕ y = f (f −1 (x) + f −1 (y)), λ x = f (λf −1 (x))

for all x, y ∈ V , λ ∈ R.

7. Let U be a subspace of Rn . Show that Rn \ U is not a subspace of Rn .

8. Let m, n ∈ N. Show that M (m × n, R) with the usual sum and product with numbers in R is a
vector space.
For the following subsets of M (n × n), decide whether they are subspaces.


(a) All matrices with a11 = 0.
(b) All matrices with a11 = 3.
(c) All matrices with a12 = µa11 for a fixed µ ∈ R.
(d) All matrices whose first column coincides with their last column.
For the following items assume that n = m.
(e) All symmetric matrices (that is, all matrices A with At = A).
(f) All matrices which are not symmetric.
(g) All antisymmetric matrices (that is, all matrices A with At = −A).
(h) All diagonal matrices.
(i) All upper triangular matrices.
(j) All lower triangular matrices.
(k) All invertible matrices.
(l) All non-invertible matrices.
(m) All matrices with det A = 1.

9. Show that

V = {(x1 , x2 , x3 , x4 )t : x1 + x2 − 2x3 − x4 = 0, x1 − x2 + x3 + 7x4 = 0}

is a subspace of R4 .

10. Show that

W = {(x1 , x2 , x3 , x4 )t : 3x1 − x2 − 2x3 − x4 = 3, 4x1 + x2 + x3 + 7x4 = 5}

is an affine subspace of R4 .

11. Consider the systems of linear equations

(1)  x + 2y + 3z = 0        (2)  x + 2y + 3z = 3
    4x + 5y + 6z = 0            4x + 5y + 6z = 9
    7x + 8y + 9z = 0            7x + 8y + 9z = 15 .

Let U be the set of all solutions of (1) and let W be the set of all solutions of (2).
Note that they can be viewed as subsets of R3 .

(a) Show that U is a subspace of R3 and describe it geometrically.


(b) Show that W is not a subspace of R3 .
(c) Show that W is an affine subspace of R3 and describe it geometrically.

12. (a) Let v1 = (1, 2)t , v2 = (−2, 5)t ∈ R2 . Write v = (3, 0)t as a linear combination of v1 and v2 .
(b) Is v = (1, 2, 5)t a linear combination of v1 = (1, 7, 2)t , v2 = (1, 5, 2)t ?
(c) Is A = (13 −5; 50 8) a linear combination of
A1 = (1 0; 2 2), A2 = (0 1; −2 2), A3 = (2 1; 5 0), A4 = (1 −1; 5 2)?

13. (a) Are the vectors v1 = (1, 2, 3)t , v2 = (2, 2, 5)t , v3 = (3, 0, 1)t linearly independent in R3 ?
(b) Are the vectors v1 = (1, −2, 2)t , v2 = (1, 7, 2)t , v3 = (1, 5, 2)t linearly independent in R3 ?
(c) Are the vectors p1 = X 2 − X + 2, p2 = X + 3, p3 = X 2 − 1 linearly independent
in P2 ? Are they linearly independent in Pn for n ≥ 3?
(d) Are the vectors A1 = (1 3 1; −2 2 3), A2 = (1 7 3; 2 −1 2), A3 = (1 −1 0; 5 2 8) linearly
independent in M (2 × 3)?

14. Let ~v1 , . . . , ~vk , w~ ∈ Rn . Suppose that w~ 6= ~0 and that w~ is orthogonal to all the vectors
~vj . Show that w~ ∈/ gen{~v1 , . . . , ~vk }. Does it follow that the system w,~ ~v1 , . . . , ~vk is linearly
independent?

15. Determine whether gen{a1 , a2 , a3 , a4 } = gen{v1 , v2 , v3 } for
a1 = (0, 1, 5)t , a2 = (1, 0, 3)t , a3 = (1, 2, 13)t , a4 = (2, 1, 11)t , v1 = (5, −3, 0)t , v2 = (1, 1, 8)t , v3 = (1, −1, −2)t .

16. (a) Do the following matrices generate the space of all symmetric 2 × 2 matrices?

A1 = (2 0; 0 7), A2 = (13 0; 0 5), A3 = (0 3; 3 0).

If they do not, find an M ∈ Msym (2 × 2) \ span{A1 , A2 , A3 }.


(b) Do the following matrices generate the space of all symmetric 2 × 2 matrices?

B1 = (2 0; 0 7), B2 = (13 0; 0 5), B3 = (0 3; −3 0).

(c) Do the following matrices generate the space of upper triangular 2 × 2 matrices?

C1 = (6 0; 0 7), C2 = (0 3; 0 5), C3 = (10 −7; 0 0).

If not, find an upper triangular matrix M which does not belong to span{C1 , C2 , C3 }.

17. Let n ∈ N and let V be the set of symmetric n × n matrices with the usual sum and product
with λ ∈ R.
(a) Show that V is a vector space over R.
(b) Find matrices which generate V . What is the minimal number of matrices needed
to generate V ?

18. Determine whether the following sets of vectors are bases of the indicated vector space.

(a) v1 = (1, 2)t , v2 = (−2, 5)t ; R2 .
(b) A = (1 3; 2 1), B = (5 3; 1 2), C = (0 1; −2 2), D = (2 1; 5 0); M (2 × 2).
(c) p1 = 1 + x, p2 = x + x2 , p3 = x2 + x3 , p4 = 1 + x + x2 + x3 ; P3 .

19. (a) Let F be the plane given by F : 2x − 5y + 3z = 0. Show that F is a subspace of R3 and
find vectors ~u and w~ ∈ R3 such that F = gen{~u, w}.~
(b) Let v1 = (1, 7, 3)t , v2 = (−5, 1, 2)t ∈ R3 and let E be the plane E = gen{v1 , v2 }. Write E in the
form E : ax + by + cz = d.
(c) Find a vector w ∈ R3 , different from v1 and v2 , such that gen{v1 , v2 , w} = E.
(d) Find a vector v3 ∈ R3 such that gen{v1 , v2 , v3 } = R3 .

20. (a) Find a basis for the plane E : x − 2y + 3z = 0 in R3 .
(b) Complete the basis found in (a) to a basis of R3 .

21. Let F := {(x1 , x2 , x3 , x4 )t : 2x1 − x2 + 4x3 + x4 = 0}.

(a) Show that F is a subspace of R4 .

(b) Find a basis for F and compute dim F .
(c) Complete the basis found in (b) to a basis of R4 .

22. Let G := {(x1 , x2 , x3 , x4 )t : 2x1 − x2 + 4x3 + x4 = 0, x1 − x2 + x3 + 2x4 = 0}.

(a) Show that G is a subspace of R4 .
(b) Find a basis for G and compute dim G.
(c) Complete the basis found in (b) to a basis of R4 .

23. Let v1 = (1, 2, 3)t , v2 = (0, 4, 1)t , v3 = (4, 2, 5)t , v4 = (2, 8, 3)t , v5 = (1, 0, 1)t .
Determine whether these vectors generate the space R3 . If they do, choose a basis of R3 from the
given vectors.

24. Let C1 = (6 0; 0 7), C2 = (6 3; 0 12), C3 = (6 −3; 0 2), C4 = (12 −9; 0 −1).
Determine whether these matrices generate the space of upper triangular 2 × 2 matrices. If
they do, choose a basis from the given matrices.

25. Let p1 = x2 + 7, p2 = x + 1, p3 = 3x3 + 7x. Determine whether the polynomials p1 , p2 , p3 are linearly
independent. If they are, complete them to a basis of P3 .

26. For the following sets, determine whether they are vector spaces. If they are, compute their
dimension.

(a) M1 = {A ∈ M (n × n) : A is upper triangular}.
(b) M2 = {A ∈ M (n × n) : A has zeros on the diagonal}.
(c) M3 = {A ∈ M (n × n) : At = −A}.
(d) M4 = {p ∈ P5 : p(0) = 0}.

27. For the following systems of vectors in the vector space V , determine the dimension of the
vector space generated by them and choose a subsystem of them which is a basis of the vector
space generated by the given vectors. Complete this subsystem to a basis of V .

(a) V = R3 , ~v1 = (1, 2, 3)t , ~v2 = (3, 2, 7)t , ~v3 = (3, 2, 1)t .
(b) V = P4 , p1 = x3 + x, p2 = x3 − x2 + 3x, p3 = x2 + 2x − 5, p4 = x3 + 3x + 2.
Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 5. Vector spaces 183

       
1 4 3 0 0 12 9 −12
(c) V = M (2 × 2), A = , B= , C= , D= .
−2 5 1 4 −7 11 10 1

28. Sea V un espacio vectorial. Falso o verdadero?

(a) Suponga v1 , . . . , vk , u, z ∈ V tal que z es combinación lineal de los v1 , . . . , vk . Entonces


que z es combinación lineal de v1 , . . . , vk , u.
(b) Si u es combinación lineal de v1 , . . . , vk ∈ V , entonces v1 , . . . , vk , u es un sistema de vectores
linealmente dependientes.
(c) Si v1 , . . . , vk ∈ V es un sistema de vectores linealmente dependientes, entonces v1 es com-
binación lineal de los v2 , . . . , vk .

29. Sean V y W espacios vectoriales.

(a) Sea U ⊂ V un subspacio y sean u1 , . . . , uk ∈ U . Demuestre que gen{u1 , . . . uk } ⊂ U .

tenemos w` ∈ gen{u1 , . . . , uk }.
FT
(b) Sean u1 , . . . , uk , w1 , . . . , wm ∈ V . Demuestre que lo siguiente es equivalente:
(i) gen{u1 , . . . , uk } = gen{w1 , . . . , wm }.
(ii) Para todo j = 1, . . . , k tenemos uj ∈ gen{w1 , . . . , wm } y para todo ` = 1, . . . , m

(c) Sean v1 , v2 , v3 , . . . , vm ∈ V y sea c ∈ R. Demuestre que


gen{v1 , v2 , v3 , . . . , vm } = gen{v1 + cv2 , v2 , v3 , . . . , vm }.
(d) Sean v1 , . . . , vk ∈ V y sea A : V → W una función lineal invertible. Demuestre que
RA
dim gen{v1 , . . . , vk } = dim gen{Av1 , . . . , Avk }. Es verdad si A no es invertible?

30. (a) ¿Es Cn un espacio vectorial sobre R?


(b) ¿Es Cn un espacio vectorial sobre Q?
(c) ¿Es Rn un espacio vectorial sobre C?
(d) ¿Es Rn un espacio vectorial sobre Q?
(e) ¿Es Qn un espacio vectorial sobre R?
D

(f) ¿Es Qn un espacio vectorial sobre C?

Last Change: Mi 6. Apr 23:59:29 CEST 2022


Linear Algebra, M. Winklmeier
D
RA
FT
Chapter 6

Linear transformations and change


of bases

FT
In the first section of this chapter we will define linear maps between vector spaces and discuss their
properties. These are functions which “behave well” with respect to the vector space structure. For
example, m × n matrices can be viewed as linear maps from Rm to Rn . We will prove the so-called
dimension formula for linear maps. In Section 6.2 we will study the special case of matrices. One of
the main results will be the dimension formula (6.4). In Section 6.4 we will see that, after choice of
a basis, every linear map between finite dimensional vector spaces can be represented as a matrix.
This will allow us to carry over results on matrices to the case of linear transformations.
RA
As in previous chapters, we work with vector spaces over R or C. Recall that K always stands for
either R or C.

6.1 Linear maps


Definition 6.1. Let U, V be vector spaces over the same field K. A function T : U → V is called
a linear map if for all x, y ∈ U and λ ∈ K the following is true:
T (x + y) = T x + T y, T (λx) = λT x. (6.1)
D

Other words for linear map are linear function, linear transformation or linear operator.

Remark. Note that very often one writes T x instead of T (x) when T is a linear function.

Remark 6.2. (i) Clearly, (6.1) is equivalent to


T (x + λy) = T x + λT y for all x, y ∈ U and λ ∈ K. (6.1’‘)

(ii) It follows immediately from the definition that


T (λ1 v1 + · · · + λk vk ) = λ1 T v1 + · · · + λk T vk
for all v1 , . . . , vk ∈ V and λ1 , . . . , λk ∈ K.

185
186 6.1. Linear maps

(iii) The condition (6.1) says that a linear map respects the vector space structures of its
domain and its target space.

Exercise 6.3. Let U, V be vector spaces over K (with K = R or K = C). Let us denote the set
of all linear maps from U to V by L(U, V ). Show that L(U, V ) is a vector spaces over K. That
means you have to show that the sum of two linear maps is a linear map, that a scalar multiple
of linear map is a linear map and that the vector space axioms hold.

Exercise 6.4. Let U, V, W be vector spaces over K (with K = R or K = C).


• Suppose that T : U → V and S : V → W are linear functions. Show that their composition
ST : U → W is a linear function too.
• Suppose that T : U → V is a linear invertible linear function so that we can define its inverse
function T −1 : Im(T ) → U . Show that it is a linear function too.

Examples 6.5 (Linear maps). (a) Every matrix A ∈ M (m × n) can be identified with a linear

FT
map Rn → Rm .
(b) Differentiation is a linear map, for example:
(i) Let C(R) be the space of all continuous functions and C 1 (R) the space of all continuously
differentiable functions. Then

T : C 1 (R) → C(R), Tf = f0

is a linear map.
RA
Proof. First of all note that f 0 ∈ C(R) if f ∈ C 1 (R), so the map T is well-defined. Now
we want to see that it is linear. So we take f, g ∈ C 1 (R) and λ ∈ R. We find

T (λf + g) = (λf + g)0 = (λf )0 + g 0 = λf 0 + g 0 = λT f + T g.

(ii) The following maps are linear, too. Note that their action is the same as the one of T
above, but we changed the vector spaces where it acts on.

R : Pn → Pn−1 , Rf = f 0 , S : P n → Pn , Sf = f 0 .
D

(c) Integration is a linear map. For example:


Z x
I : C([0, 1]) → C([0, 1]), f 7→ If where (If )(x) = f (t) dt.
0

Proof. Clearly I is well-defined since the integral of a continuous function is again continuous.
In order to show that I is linear, we fix f, g ∈ C(R) and λ ∈ R. We find for every x ∈ R:
Z x Z x Z t Z x

I(λf + g) (x) = (λf + g)(t) dt = λf (t) + g(t) dt = λ f (t) dt + g(t) dt
0 0 0 0
= λ(If )(x) + (Ig)(x).

Since this is true for every x, it follows that I(λf + g) = λ(If ) + (Ig).

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 187

(d) As an example for a linear map from M (n × n) to itself, we consider

T : M (n × n) → M (n × n), T (A) = A + At .

Proof that T is a linear map. Let A, B ∈ M (n × n) and let c ∈ R. Then

T (A + cB) = (A + cB) + (A + cB)t = A + cB + At + (cB)t = A + cB + At + cB t


= A + At + c(B + B t ) = T (A) + cT (B).

The next lemma shows that a linear map always maps the zero vector to the zero vector.

Lemma 6.6. If T is a linear map, then T O = O.

Proof. T O = T (O − O) = T O − T O = O.

Definition 6.7. Let T : U → V be a linear map.

FT
(i) T is called injective (or one-to-one) if

x, y ∈ U, x 6= y =⇒ T x 6= T y.

(ii) T is called surjective if for all v ∈ V there exists at least one x ∈ U such that T x = v.
(iii) T is called bijective if it is injective and surjective.
(iv) The kernel of T (or null space of T ) is
RA
ker(T ) := {x ∈ U : T x = 0}.

Sometimes the notations N (T ) or NT are used for ker(T ).


(v) The image of T (or range of T ) is

Im(T ) := {v ∈ V : y = T x for some y ∈ U }.

Sometimes the notations Rg(T ) or R(T ) or T (U ) are used for Im(T ).


D

Remark 6.8. (i) Observe that ker(T ) is a subset of U , Im(T ) is a subset of V . In Proposi-
tion 6.11 we will show that they are even subspaces.
(ii) Clearly, T is injective if and only if for all x, y ∈ U the following is true:

Tx = Ty =⇒ x = y.

(iii) If T is a linear injective map, then its inverse T −1 : Im(T ) → U exists and is linear too.

The following lemma is very useful.

Lemma 6.9. Let T : U → V be a linear map.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
188 6.1. Linear maps

(i) T is injective if and only if ker(T ) = {O}.


(ii) T is surjective if and only if Im(T ) = V .

Proof. (i) From Lemma 6.6, we know that O ∈ ker(T ). Assume that T is injective. Then ker(T )
cannot contain any other element, hence ker(T ) = {O}.
Now assume that ker(T ) = {O} and let x, y ∈ U with T x = T y. By Remark 6.8 it is sufficient to
show that x = y. By assumption, O = T x − T y = T (x − y), hence x − y ∈ ker(T ) = {O}. Therefore
x − y = O, which means that x = y.
(ii) follows directly from the definitions of surjectivity and the image of a linear map.

Examples 6.10 (Kernels and ranges of the linear maps from Examples 6.5).
(a) We will discuss the case of matrices at the beginning of Section 6.2.
(b) If T : C 1 (R) → C(R), T f = f 0 , then it is easy to see that the kernel of T consists exactly
of the constant functions. Moreover T is surjective because every continuousR functions is the
x
derivative of another function because for every f ∈ C(R) we can set g(x) = 0 f (t) dt. Then

FT
g ∈ C 1 (R) and T g = g 0 = f which shows that Im(T ) = C(R).
(c) For the integration operator in Example 6.5((c)) we have that ker(I) = {0} and Im(I) =
C 1 (R). In other words, I is injective but not surjective.

Proof. First we proveR x the claim about the range of I. Suppose that g ∈ Im(I). Then g is
of the form g(x) = 0 f (t) dt for some f ∈ C(R). By the fundamental theorem of calculus,
it follows that g ∈ C 1 (R), so we proved Im(I) ⊆ C 1 (R). To show the other inclusion, let
RA
0
g ∈ C 1 (R). Then g is differentiable
R x 0 and g ∈ C(R) and, again by the fundamental theorem of
calculus, we have that g(x) = 0 g (t) dt, so g ∈ Im(I) and it follows that C 1 (R) ⊆ Im(I).
Rx
Now assume that Ig = 0. If we differentiate, we find that 0 = (Ig)0 (x) = dx
d
0
g(t) dt = g(x)
for all x ∈ R, therefore g ≡ 0, hence ker(I) = {0}.

(d) Let T : M (n × n) → M (n × n), T (A) = A + At . Then ker T = Masym (n × n) (= the space


of all antisymmetric n × n matrices) and Im T = Msym (n × n) (= the space of all symmetric
n × n matrices).
D

Proof. First we prove the claim about the range of T . Clearly, Im(T ) ⊆ Msym (n × n) because
for every A ∈ M (n × n) we have that T (A) is symmetric because (T (A))t = (A + At )t =
At + (At )t = At + A = T (A). To prove Msym (n × n) ⊆ Im(T ) we take some B ∈ Msym (n × n).
Then T ( 21 B) = 12 B +( 12 B)t = 21 B + 12 B = B where we used that B is symmetric. In summary
we showed that Im(T ) = Msym (n × n).
The claim on the kernel of T follows from

A ∈ ker T ⇐⇒ T (A) = 0 ⇐⇒ A+At = 0 ⇐⇒ A = −At ⇐⇒ A ∈ Masym (n×n).

Proposition 6.11. Let T : U → V be a linear map. Then


(i) ker(T ) is a subspace of U .

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 189

(ii) Im(T ) is a subspace of V .

Proof. (i) By Lemma 6.6, O ∈ ker(T ). Let x, y ∈ ker(T ) and λ ∈ K. Then x + λy ∈ ker(T )
because
T (x + λy) = T x + λT y = O + λ0 = O.
Hence ker(T ) is a subspace of U by Proposition 5.10.
(ii) C;early, O ∈ Im(T ). Let v, w ∈ Im(T ) and λ ∈ K. Then there exist x, y ∈ U such that
T x = v and T y = w. Then v + λw = T x + λT y = T (x + λy) ∈ Im(T ). hence v + λw ∈ Im(T ).
Therefore Im(T ) is a subspace of V by Proposition 5.10.

Since we now know that ker(T ) and Im(T ) are subspaces, the following definition makes sense.

Definition 6.12. Let T : U → V be a linear map. We define

dim(ker(T )) = nullity of T, dim(Im(T )) = rank of T.

FT
Sometimes the notations ν(T ) = dim(ker(T )) and ρ(T ) = dim(Im(T )) are used.

Example. Let T : P3 → P3 be defined by T p = p0 . Then Im(T ) = {q ∈ P3 : deg q ≤ 2} and


ker(T ) = {q ∈ P3 : deg q = 0}. In particular dim(Im(T )) = 3 and dim(ker(T )) = 1.

Proof. • First we show the claim about the image of T . We know that differentiation lowers
the degree of a polynomial by 1. Hence Im(T ) ⊆ {q ∈ P3 : deg q ≤ 2}. On the other hand,
we know that every polynomial of degree ≤ 2 is the derivative of a polynomial of degree ≤ 3.
RA
So the claim follows.

• First we show the claim about the kernel of T . Recall that ker(T) = {p ∈ P3 : T p = 0}. So
the kernel of T are exactly those polynomials whose first derivative is 0. These are exactly
the constant polynomials, i.e., the polynomials of degree 0.

Lemma 6.13. Let T : U → V be a linear map between two vector spaces U, V and let {u1 , . . . , uk }
be a basis of U . Then Im T = span{T u1 , . . . , T uk }.

Proof. Clearly, T u1 , . . . , T uk ∈ Im T . Since the image of T is a vector space, all linear combinations
D

of these vectors must belong to Im T too which shows span{T u1 , . . . , T uk } ⊆ Im T . To show the
other inclusion, let y ∈ Im T . Then there is an x ∈ U such that y = T x. Let us express x as linear
combination of the vectors of the basis: x = α1 u1 + . . . αk uk . Then we obtain

y = T x = T (α1 u1 + . . . αk uk ) = α1 T u1 + . . . αk T uk ∈ span{T u1 , . . . , T uk }.

Since y was arbitrary in Im T , we conclude that Im T ⊆ span{T u1 , . . . , T uk }. So in summary we


proved the claim.

Proposition 6.14. Let U, V be K-vector spaces, T : U → V a linear map. Let x1 , . . . , xk ∈ U and


set y1 := T x1 , . . . , yk := T xk . Then the following is true.

(i) If the x1 , . . . , xk are linearly dependent, then y1 , . . . , yk are linearly dependent too.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
190 6.1. Linear maps

(ii) If the y1 , . . . , yk are linearly independent, then x1 , . . . , xk are linearly independent too.

(iii) Suppose additionally that T invertible. Then x1 , . . . , xk are linearly independent if and only
if y1 , . . . , yk are linearly independent.

In general the implication “If x1 , . . . , xk are linearly independent, then y1 , . . . , yk are linearly
independent.” is false. Can you give an example?

Proof of Proposition 6.14. (i) Assume that the vectors x1 , . . . , xk are linearly dependent. Then
there exist λ1 , . . . , λk ∈ K such that λ1 x1 + · · · + λk xk = O and at least one λj 6= 0. But then

O = T O = T (λ1 x1 + · · · + λk xk ) = λ1 T x1 + · · · + λk T xk
= λ1 y1 + · · · + λk yk ,

hence the vectors y1 , . . . , yk are linearly dependent.

(ii) follows directly from (i).

FT
(iii) Suppose that the vectors y1 , . . . , yk are linearly independent. Then so are the x1 , . . . , xk by
(i). Now suppose that x1 , . . . , xk are linearly independent. Note that T is invertible, so T −1
exists. Therefore we can apply (i) to T −1 in order to conclude that the system y1 , . . . , yk is linearly
independent. (Note that xj = T −1 yj .)

Exercise 6.15. Assume that T : U → V is an injective linear map and suppose that {u1 , . . . , u` }
is a set of are linearly independent vectors in U . Show that {T u1 , . . . , T u` } is a set of are linearly
independent vectors in V .
RA
The following lemma is very useful and it is used in the proof of Theorem 6.4.

Proposition 6.16. Let U, V be K-vector spaces with dim U = k < ∞.

(i) If T : U → V is linear transformation, then dim Im(T ) ≤ dim U .


(ii) If T : U → V is an injective linear transformation, then dim Im(T ) = dim U .
(iii) If T : U → V is a bijective linear transformation, then dim U = dim V .
D

Proof. Let u1 , . . . , uk be a basis of U .

(i) From Lemma 6.13 we know that Im T = span{T u1 , . . . , T uk }. Therefore dim Im T ≤ k = dim U
by Theorem 5.45.

(ii) Assume that T is injective. We will show that T u1 , . . . , T uk are linearly independent. Let
α1 , . . . , αk ∈ K such that α1 T u1 + · · · + αk T uk = O. Then

O = α1 T u1 + · · · + αk T uk = T (α1 u1 + · · · + αk uk ).
Since T is injective, it follows that α1 u1 + · · · + αk uk = O, hence α1 = · · · = αk = 0 which
shows that the vectors T u1 , . . . , T uk are indeed linearly independent. Therefore they are a basis
of span{T u1 , . . . , T uk } = Im T and we conclude that dim Im T = k = dim U .

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 191

(iii) Since T is bijective, it is surjective and injective. Surjectivity means that Im T = V and
injectivity of T implies that dim Im T = dim U by (ii). In conclusion,

dim U = dim Im T = dim V.

The previous theorem tells us for example that there is no injective linear map from R5 to R3 ; or
that there is no surjective linear map from R3 to M (2 × 2).

Remark 6.17. Proposition 6.16 is true also for dim U = ∞. In this case, (i) clearly holds whatever
dim Im(T ) may be. To prove (ii) we need to show that dim Im(T ) = ∞ if T is injective. Note that
for every n ∈ N we can find a subspace Un of U with dim Un = n and we define Tn to be the
restriction of T to Un , that is, Tn : Un → V . Since the restriction of an injective map is injective,
it follows from (ii) that dim Im(Tn ) = n. On the other hand, Im(Tn ) is a subspace of V , therefore
dim V ≥ dim Im(Tn ) = n by Theorem 5.52 and Remark 5.53. Since this is true for any n ∈ N, it
follows that dim V = ∞. The proof of (iii) is the same as in the finite dimensional case.

Theorem 6.18. Let U, V be K-vector spaces and T : U → V a linear map. Moreover, let E : U →

FT
U , F : V → V be linear bijective maps. Then the following is true:
(i) Im(T ) = Im(T E), in particular dim(Im(T )) = dim(Im(T E)).
(ii) ker(T E) = E −1 (ker(T )) and dim(ker(T )) = dim(ker(T E)).
(iii) ker(T ) = ker(F T ), in particular dim(ker(T )) = dim(ker(F T )).
(iv) Im(F T ) = F (Im(T )) and dim(Im(T )) = dim(Im(F T )).
RA
In summary we have

ker(F T ) = ker(T ), ker(T E) = E −1 (ker(T )),


(6.2)
Im(F T ) = F (Im(T )), Im(T E) = Im(T ).

and

dim ker(T ) = dim ker(F T ) = dim ker(T E) = dim ker(F T E),


(6.3)
D

dim Im(T ) = dim Im(F T ) = dim Im(T E) = dim Im(F T E).

Proof. (i) Let v ∈ V . If v ∈ Im(T ), then there exists x ∈ U such that T x = v. Set y = E −1 x.
Then v = T x = T EE −1 x = T Ey ∈ Im(T E). On the other hand, if v ∈ Im(T E), then there exists
y ∈ U such that T Ey = v. Set x = E. Then v = T Ey = T x ∈ Im(T ).
(ii) To show ker(T E) = E −1 ker(T ) observe that

ker(T E) = {x ∈ U : Ex ∈ ker(T )} = {E −1 u : u ∈ ker(T )} = E −1 (ker(T )).

It follows that
E −1 : ker T → ker(T E)
is a linear bijection and therefore dim T = dim ker(T E) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with E −1 as T , ker(T ) as U and ker(T E) as V .

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
192 6.1. Linear maps

(iii) Let x ∈ U . Then x ∈ ker(F T ) if and only if F T x = O. Since F is injective, we know that
ker(F ) = {O}, hence it follows that T x = O. But this is equivalent to x ∈ ker(T ).
(iv) To show Im(F T ) = F Im(T ) observe that
Im(F T ) = {y ∈ V : y = F T x for some x ∈ U } = {F v : v ∈ Im(T )} = F (Im(T )).
It follows that
F : Im T → Im(F T )
is a linear bijection and therefore dim T = dim Im(F T ) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with F as T , Im(T ) as U and Im(F T ) as V .

Remark 6.19. Ingeneral,


 ker(T ) = ker(T
 E) and ker(T ) = ker(F T ) is false. Take for example
2 1 0 0 1
U =V =R ,T = and E = F = . Then clearly the hypotheses of the theorem are
0 0 1 0
satisfied and    
0 1
ker(T ) = span , Im(T ) = span ,
1 0

FT
but    
1 0
ker(T E) = span , Im(F T ) = span .
0 1

Draw a picture to visualise the example above, taking into account that T represents √
the projection
onto the x-axis and E and F are rotation by 45◦ and a “stretching” by the factor 2.

We end this section with one of the main theorems of linear algebra. In the next section we will
RA
re-prove it for the special case when T is given by a matrix in Theorem 6.33. The theorem below
can be considered a coordinate free version of Theorem 6.33.
Theorem 6.20. Let U, V be vector spaces with dim U = n < ∞ and let T : U → V be a linear
map. Then
dim(ker(T )) + dim(Im(T )) = n. (6.4)

Proof. Let k = dim(ker(T )) and let {u1 , . . . , uk } be a basis of ker(T ). We complete it to a basis
{u1 , . . . , uk , wk+1 , . . . , wn } of U and we set W := span{wk+1 , . . . , wn }. Note that by construction
D

ker(T ) ∩ W = {O}. (Prove this!) Let us consider Te = T |W the restriction of T to W .


It follows that Te is injective because if Tex = O for some x ∈ W then also T x = Tex = O, hence
x ∈ ker(T ) ∩ W = {O}. It follows from Proposition 6.16(ii) that
dim Im Te = dim W = n − k. (6.5)
To complete the proof, it suffices to show that Im Te = Im T . Recall that by Lemma 6.13, we have
that the range of a linear map is generated by the images of a basis of the initial vector space.
Therefore we find that
Im T = span{T u1 , . . . , T uk , T wk+1 , . . . , T wn } = span{T wk+1 , . . . , T wn }
= span{Tewk+1 , . . . , Tewn }
= Im Te

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 193

where in the second step we used that T u1 = · · · = T uk = O and therefore they do not contribute
to the linear span and in the third step we used that T wj = Tewj for j = k + 1, . . . , n. So we
showed that Im Te = Im T , in particular their dimensions are equal and the claim follows from (6.5)
because, recalling that k = dim ker(T ),

n = dim Im Te + k = dim Im T + dim ker T.

Note that an alternative way to prove the theorem above is to first prove Theorem 6.33 for matrices
and then use the results on representations of linear maps in Section 6.4 to conclude formula (6.4).

You should now have understood


• what a linear map is and why they are the natural maps to consider on vector spaces,
• what injectivity, surjectivity and bijectivity means,
• what the kernel and image of a linear map is,
• why the dimension formula (6.4) is true,

FT
• etc.
You should now be able to

• give examples of linear maps,


• check if a given function is a linear maps,
• find bases and the dimension of kernels and ranges of a given linear map,
• etc.
RA
6.2 Matrices as linear maps
In this section, we work mostly with real vector spaces for definiteness sake. However, all the
statements are also true for complex vector spaces. We only have to replace everywhere R by C
and the word real by complex.
D

Let A ∈ M (m × n). We already know that we can view A as a linear map from Rn to Rm . Hence
ker(A) and Im(A) and the terms injectivity and surjectivity are defined.
Strictly speaking, we should distinguish between a matrix and the linear map induced by it. So
we should write TA : Rn → Rm for the map x 7→ Ax. The reason is that if we view A directly
as a linear map then this implies that we tacitly have already chosen a basis in Rn and Rm , see
Section 6.4 for more on that. However, we will usually abuse notation and write A instead of TA .
If we view a matrix A as a linear map and at the same time as a linear system of equations, then
we obtain the following.

Remark 6.21. Let A ∈ M (m × n) and denote the columns of A by ~a1 , . . . , ~an ∈ Rm . Then the
following is true.

(i) ker(A) = all solutions ~x of the homogeneous system A~x = ~0.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
194 6.2. Matrices as linear maps

(ii) Im(A) = all vectors ~b such that the system A~x = ~b has a solution
= span{~a1 , . . . , ~an }.

Consequently,

(iii) A is injective ⇐⇒ ker(A) = {~0}


⇐⇒ the homogenous system A~x = ~0 has only the trivial solution ~x = ~0.
(iv) A is surjective ⇐⇒ Im(A) = Rm
⇐⇒ for every ~b ∈ Rm , the system A~x = ~b has at least one solution.

Proof. All claims should be clear except maybe the second equality in (ii). This follows from
     

 x1 x1 

n  ..   ..  n
Im A = {A~x : ~x ∈ R } = (~a1 | . . . |~an )  .  :  .  ∈ R
 
xn xn
 

FT
= {x1~a1 + · · · + xn~an ) : x1 , . . . , xn ∈ R}
= span{~a1 , . . . , ~an },

see also Remark 3.18.

To practice a bit, we prove the following two remarks in two ways.

Remark 6.22. Let A ∈ M (m × n). If m > n, then M cannot be surjective.


RA
Proof with Gauß-Jordan. Let A0 be the row reduced echelon form of A. Then there must be an
invertible matrix E such that A = EA0 and A0 the last row of A0 must be zero because it can have
at most n pivots. But then (A0 |~em ) is inconsistent, which means that (A|E −1~em ) is inconsistent.
Hence E −1~em ∈/ Im A so A cannot be surjective. (Basically we say that clearly A0 is not surjective
because we can easily find a right side to that A0 ~x0 = ~b0 is inconsistent. Just pick any vector ~b0 whose
last coordinate is different from 0. The easiest such vector is ~em . Now do the Gauß-Jordan process
backwards on this vector in order to obtain a right hand side ~b such that A~x = ~b is inconsistent.)
D

Proof using the concept of dimension. We already saw that Im A is the linear span of its columns.
Therefore dim Im A ≤ #columns of A = n < m = dim Rm , therefore Im A ( Rm .

Remark 6.23. Let A ∈ M (m × n). If m < n, then M cannot be injective.

Proof with Gauß-Jordan. Let A0 be the row reduced echelon form of A. Then A0 can have at
most m pivots. Since A0 has more columns than pivots, the homogeneous system A~x = ~0 has
infinitely solutions, but then also ker A contains infinitely many vectors, in particular A cannot be
injective.

Proof using the concept of dimension. We already saw that Im A is the linear span of its n columns
in Rm . Since n > m it follows that the column vectors are linearly dependent in Rm , hence A~x = ~0
has a non-trivial solution. Therefore ker A is not trivial and it follows that A is not injective.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 195

Note that the remarks do not imply that A is surjective if m ≤ n or that A is injective if n ≤ m.
Find examples!

From Theorem 3.43 we obtain the following very important theorem for the special case m = n.

Theorem 6.24. Let A ∈ M (n × n) be a square matrix. Then the following is equivalent.

(i) A is invertible.
(ii) A is injective, that is, ker A = {~0}.
(iii) A is surjective, that is, Im A = Rn .
In particular, A is injective if and only if A is surjective if and only if A is bijective.

Definition 6.25. Let A ∈ M (m × n) and let ~c1 , . . . , ~cn be the columns of A and ~r1 , . . . , ~rm be the
rows of A. We define
(i) CA := span{~c1 , . . . , ~cm } =: column space of A ⊆ Rm ,

FT
(ii) RA := span{~r1 , . . . , ~rn } =: row space of A ⊆ Rn ,

The next proposition follows immediately from the definition above and from Remark 6.21(ii).

Proposition 6.26. For A ∈ M (m × n) it follows that


(i) RA = CAt and CA = RAt ,
(ii) CA = Im(A) and RA = Im(At ).
RA
The next proposition follows directly from the general theory in Section 6.1. We will give another
proof at the end of this section.

Proposition 6.27. Let A ∈ M (m × n), E ∈ M (n × n), F ∈ M (m × m) and assume that E and


F are invertible. Then
(i) CA = CAE .
D

(ii) RA = RF A .

Proof. (i) Note that CA = Im(A) = Im(AE) = CAE , where in the first and third equality we
used Proposition 6.26, and in the second equality we used Theorem 6.4.
(ii) Recall that, if F is invertible, then F t is invertible too. With Proposition 6.26(i) and what
we already proved in (i), we obtain RF A = C(F A)t = CAt F t = CAt = RA .
We immediately obtain the following proposition.

Proposition 6.28. Let A, B ∈ M (m × n).


(i) If A and B are row equivalent, then
dim(ker(A)) = dim(ker(B)), dim(Im(A)) = dim(Im(B)), Im(At ) = Im(B t ), RA = RB .

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
196 6.2. Matrices as linear maps

(ii) If A and B are column equivalent, then

dim(ker(A)) = dim(ker(B)), dim(Im(A)) = dim(Im(B)), Im(A) = Im(B), CA = CB .

Proof. We will only prove (i). The claim (ii) can be proved similarly (or can be deduced easily
from (i) by applying (i) to the transposed matrices). That A and B are row equivalent means that
we can transform B into A by row transformations. Since row transformations can be represented
by multiplication by elementary matrices from the left, there are elementary matrices F1 , . . . , Fk ∈
M (m × m) such that A = F1 . . . Fk B. Note that all Fj are invertible, hence F := F1 . . . Fk is
invertible and A = F B. Therefore all the claims in (i) follow from Theorem 6.4 and Proposition 6.27.

The proposition above is very useful to calculate the kernel of a matrix A: Let A0 be the reduced
row-echelon form of A. Then the proposition can be applied to A and A0 , and we find that
ker(A) = ker(A0 ).

In fact, we know this since the first chapter of this course, but back then we did not have fancy

FT
words like “kernel” at our disposal. It says nothing else than: the solutions of a homogenous
system do not change if we apply row transformations, which is exactly why the Gauß-Jordan
elimination works.

In Examples 6.36 and 6.37 we will calculate the kernel and range of a matrix. Now we will prove
two technical lemmas.

Lemma 6.29. Let A ∈ M (m × n). Then there exist elementary matrices E1 , . . . , Ek ∈ M (n × n)


and F1 , . . . , F` ∈ M (m × m) such that
RA
F1 · · · F` AE1 · · · Ek = A00

where A00 is of the form


r n−r
 
1
 0  r
A00 =  (6.6)
 

1
D

 
 
 
 
  m−r
0 0
Proof. Let A0 be the reduced row-echelon form of A. Then there exist F1 , . . . , F` ∈ M (m × m) such
that F1 · · · F` A = A0 and A0 is of the form
 
1 ∗ ∗ 0 ∗ ∗ 0 ∗
 
 1 ∗ ∗ 0 ∗ 
0
A = . (6.7)
 
 1 ∗ 
 
 

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 197

Now clearly we can find “allowed” column transformations such that A0 is transformed into the
form A00 . If we observe that applying column transformations is equivalent to multiplying A0 from
the right by elementary matrices, then we can find elementary matrices E1 , . . . , Ek such that
A0 E1 . . . Ek if of the form (6.6).

Lemma 6.30. Let A00 be as in (6.6). Then

(i) dim(ker(A)) = m − r = number of zero rows of A00 ,


(ii) dim(Im(A)) = r = number of pivots A00 ,
(iii) dim(CA00 ) = dim(RA00 ) = r.

Proof. All assertions are clear if we note that

ker(A00 ) = span{~er+1 , . . . ,~en } and Im(A00 ) = span{~e1 , . . . ,~er }

where the ~ej are the standard unit vectors (that is, their jth component is 1 and all other components

FT
are 0).

Proposition 6.31. Let A ∈ M (m × n) and let A0 be its reduced row-echelon form. Then

dim(Im(A)) = number of pivots of A0 .

Proof. Let F1 , . . . , F` , E1 , . . . , Ek and A00 be as in Lemma 6.29 and set F := F1 · · · F` and E :=


E1 · · · Ek . It follows that A0 = F A and A00 = F AE. Clearly, the number of pivots of A0 and A00
coincide. Therefore, with the help of Theorem 6.4 we obtain
RA
dim(Im(A)) = dim(Im(F AE))
= number of pivots of A00
= number of pivots of A0 .

Proposition 6.32. Let A ∈ M (m × n). Then

dim(Im(A)) = dim CA = dim RA .


D

That means:
(dimension of the range of A) = (dimension of row space) = (dimension of column space).

Proof. Since CA = Im(A) by Proposition 6.26, the first equality is clear.


Now let F1 , . . . , F` , E1 , . . . , Ek and A0 , A00 be as in Lemma 6.29 and set F := F1 · · · F` and E :=
E1 · · · Ek . Then

dim(RA ) = dim(RF AE ) = dim(RA00 ) = r = dim(CA00 ) = dim(CF AE )


= dim(CA ).

As an immediate consequence we obtain the following theorem which is a special case of Theo-
rem 6.20, see also Theorem 6.46.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
198 6.2. Matrices as linear maps

Theorem 6.33. Let A ∈ M (m × n). Then

dim(ker(A)) + dim(Im(A)) = n. (6.8)

Proof. With the notation a above, we obtain

dim(ker(A)) = dim(ker(A00 )) = n − r,
dim(Im(A)) = dim(Im(A00 )) = r

and the desired formula follows.


For the calculation of a basis of Im(A), the following theorem is useful.

Theorem 6.34. Let A ∈ M (m × n) and let A0 be its reduced row-echelon form with columns
~c1 , . . . , ~cn and ~c1 0 , . . . , ~cn 0 respectively. Assume that the pivot columns of A0 are the columns j1 <
· · · < jk . Then dim(Im(A)) = k and a basis of Im(A) is given by the columns ~cj1 , . . . , ~cjk of A.

FT
Proof. Let E be an invertible matrix such that A = EA0 . By assumption on the pivot columns of
A0 , we know that dim(Im(A0 )) = k and that a basis of Im(A0 ) is given by the columns ~cj1 0 , . . . , ~cjk 0 .
By Theorem 6.4 it follows that dim(Im(A)) = dim(Im(A0 )) = k. Now observe that by definition of
E we have that E~c` 0 = ~c` for every ` = 1, . . . , n; in particular this is true for the pivot columns of
A0 . Moreover, since E in invertible and the vectors ~cj1 0 , . . . , ~cjk 0 are linearly independent, it follows
from Theorem 6.14 that the vectors ~cj1 , . . . , ~cjk are linearly independent. Clearly they belong to
Im(A), so we have span{~cj1 , . . . , ~cjk } ⊆ Im(A). Since both spaces have the same dimension, they
must be equal.
RA
Remark 6.35. The theorem above can be used to determine a basis of a subspace given in the
form U = span{~v1 , . . . , ~vk } ⊆ Rm as follows: Define the matrix A = (~v1 | . . . |~vk ). Then clearly
U = Im A and we can apply Theorem 6.34 to find a basis of U .

Example 6.36. Find ker(A), Im(A), dim(ker(A)), dim(Im(A)) and RA for


 
1 1 5 1
3 2 13 1
A= 0 2 4 −1 .

D

4 5 22 1

Solution. First, let us row-reduce the matrix A:


     
1 1 5 1 Q21 (−1) 1 1 5 1 Q32 (2) 1 1 5 1
3 2 13 1 Q41 (−4) 0 −1 −2 −2 Q42 (1) 0 −1 −2 −2
A = − −−−−→   −−−−→  
0 2 4 −1 0 2 4 −1 0 0 0 −5
4 5 22 1 0 1 2 −3 0 0 0 −5
     
S2 (−1) 1 1 5 1 S4 (1/5) 1 0 3 −1 Q14 (1) 1 0 3 0
Q43 (−1) 0 1 2 2 Q12 (−1) 0 1 2 2 Q24 (−2) 0 1 2 0 0
−−−−−→  0 0 0 5 −−−−−→ 0 0 0
  − −−−−→  0 0 0 1 =: A .

1
0 0 0 0 0 0 0 0 0 0 0 0

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 199

Now it follows immediately that dim RA = dim CA = 3 and

dim(Im(A)) = # pivot columns of A0 = 3,


dim(ker(A)) = 4 − dim(Im(A)) = 1

(or: dim(Im(A)) = #non-zero rows of A0 = 3, or: dim(Im(A)) = dim(RA ) = 3 or: dim(ker(A)) =


#non-pivot columns A0 = 1).
Kernel of A: We know that ker(A) = ker(A0 ) by Theorem 6.4 or Proposition 6.28. From the
explicit form of A0 it is clear that A~x = 0 if and only if x4 = 0, x3 arbitrary, x2 = −2x3 and
x1 = −3x3 . Therefore
     

 −3x3 
 
 −3 
  
−2x −2 .
  
3
ker(A) = ker(A0 ) = 

 x3  : x 3 ∈ R = span  1

 
 
 
0 0
   

Image of A: The pivot columns of A0 are the columns 1, 2 and 4. Therefore, by Theorem 6.34, a

FT
basis of Im(A) are the columns 1, 2 and 4 of A:
     
 1
 1 1 
 3 2  1
Im(A) = span  , ,  .
  2 −1 (6.9)
 0
 
4 5 1
 

Alternative method for calculating the image of A: We can uses column manipulations of A
RA
to obtain Im A. (If you fell more comfortable with row operations, you could apply row operations
to At and then transpose the resulting matrix again.) We find (Cj stands for “jth column of A):
  C →C −C    
1 1 5 1 C32 → C32 − 5C11 1 0 0 0 C3 → C3 − 2C2 1 0 0 0
3 2 13 1 C4 → C4 − C1 3 −1 −2 −2 C4 → C4 − 2C2 3 −1 0 0
A =  −−−−−−−−−−→   −−−−−−−−−−→  
0 2 4 −1 0 2 4 −1 0 2 0 −5
4 5 22 1 4 1 2 −3 4 1 0 −5
     
1 0 0 0 C1 → C1 − 3C4 1 0 0 0 1 0 0 0
D

C4 → −1/5C4 C3 ↔ C4
C1 → C1 + 3C2 0 −1 0 0 C2 → C2 − 2C4 0 −1 0 0 C2 → −C2 0 1 0 0
−−−−−−−−−−→  3
 −−−−−−−−−−→   −−−−−−−→   =: A.
e
2 0 1 0 0 0 1 0 0 1 0
7 1 0 1 4 −1 0 1 4 1 1 0

It follows that      

 1 0 0 
     
e = span   ,   , 0 .
0 1

Im(A) = Im(A) (6.9’)


0 0  1 


4 1 1
 

• Explain why the method with the column operations work.


• Show by an explicite calculation that the spaces in (6.9) and (6.9’) are equal.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
200 6.2. Matrices as linear maps

Example 6.37. Find a basis of span{p1 , p2 , p3 , p4 } ⊆ P3 and its dimension for

p1 = x3 − x2 + 2x + 2, p2 = x3 + 2x2 + 8x + 13,
p3 = 3x3 − 6x2 − 5, p3 = 5x3 + 4x2 + 26x − 9.

Solution. First we identify P3 with R4 by ax3 + bx2 + cx + d = b (a, b, c, d)t . The polynomials
p1 , p2 , p3 , p4 correspond to the vectors
       
1 1 3 5
−1 2 −6  4
~v1 = 
 2 , ~v2 =  8  , ~v3 =  0 , ~v4 =  26 .
      

2 13 −5 −9

Now we use Remark 6.35 to find a basis of span{v1 , v2 , v3 , v4 }. To this end we consider the A whose
columns are the vectors ~v1 , . . . , ~v4 :
 
1 1 3 5
−1 2 −6 4

FT
A=
 2 8
.
0 26
2 13 −5 −9

Clearly, span{v1 , v2 , v3 , v4 } = Im(A), so it suffices to find a basis of Im(A). Applying row transfor-
mation to A, we obtain
   
1 1 3 5 1 0 4 5
−1 2 −6 4 0 1 2 3 = A0 .

A=  −→ · · · −→ 
 2 8 0 26 0 0 0 0
RA
2 13 −5 −9 0 0 0 0

The pivot columns of A0 are the first and the second column, hence by Theorem 6.34, a basis of
Im(A) are its first and second columns, i.e. the vectors ~v1 and ~v2 .
It follows that {p1 , p2 } is a basis of span{p1 , p2 , p3 , p4 } ⊆ P3 , hence dim(span{p1 , p2 , p3 , p4 }) = 2.

Remark 6.38. Let us use the abbreviation π = span{p1 , p2 , p3 , p4 }. The calculation above actually
shows that any two vectors of p1 , p2 , p3 , p4 form a basis of π. To see this, observe that clearly any
two of them are linearly independent, hence the dimension of their generated space is 2. On the
D

other hand, this generated space is a subspace of π which has the same dimension 2. Therefore
they must be equal.

Remark 6.39. If we wanted to complete p1 , p2 to a basis of P3 , we have (at least) the two following
options:
(i) In order to find q3 , q4 ∈ P3 such that p1 , p2 , q3 , q4 forms a basis of P3 we can use the reduction
process that was employed to find A0 . Assume that E is an invertible matrix such that
A = EA0 . Such an E can be found by keeping track of the row operations that transform
A into A0 . Let ~ej be the standard unit vectors of R4 . Then we already know that ~v1 = E~e1
and ~v2 = E~e2 . If we set w ~ 3 = E~e3 and w ~ 4 = E~e4 , then ~v1 , ~v2 , w ~ 4 form a basis of R4 .
~ 3, w
This is because ~e1 , . . . ,~e4 are linearly independent and E is injective. Hence E~e1 , . . . , E~e4 are
linearly independent too (by Proposition 6.14).

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 201

(ii) If we already have some knowledge of orthogonal complements as discussed in Chapter 7,


then we know that any basis of the orthogonal complement of span{~v1 , ~v2 } completes them
to a basis of R4 which we then only have to translate back to vectors in P3 . In order find
to two linearly independent vectors which are orthogonal to ~v1 an ~v2 we have to find linearly
independent solutions of the homogenous system of two equations for four unknowns

x1 − x2 +2x3 +2x4 = 0,
x1 +2x2 −6x3 +4x4 = 0

or, in matrix notation, P ~x = 0 where P is the 2 × 4 matrix whose rows are p1 and p2 . Since
clearly Im(P ) ⊆ R2 , it follows that dim(Im(P )) ≤ 2 and therefore dim(ker(P )) ≥ 4 − 2 = 2.

The following theorem is sometimes useful, cf. Lemma 7.27. For the definition of the orthogonal
complement see Definition 7.23.

Theorem 6.40. Let A ∈ M (m × n). Then ker(A) = (RA )⊥ .

FT
Proof. Observe that RA = CAt = Im(At ). So we have to show that ker(A) = (Im(At ))⊥ . Recall
that hAx , yi = hx , At yi. Therefore

x ∈ ker(A) ⇐⇒ Ax = 0 ⇐⇒ Ax ⊥ Rm
⇐⇒ hAx , yi = 0 for all y ∈ Rm
⇐⇒ hx , At yi = 0 for all y ∈ Rm ⇐⇒ x ∈ (Im(A))t .

Alternative proof of Theorem 6.40. Let ~r1 , . . . , ~rn be the rows of A. Since RA = span{~r1 , . . . , ~rn },
RA
it suffices to show that ~x ∈ ker(A) if and only if ~x ⊥ ~rj for all j = 1, . . . , m.
By definition ~x ∈ ker(A) if and only if
    
~r1 x1 h~r1 , ~xi
~0 = A~x =  .  .   . 
 ..   ..  =  .. 
~rm xm h~rm , ~xi

This is the case if and only if h~rj , ~xi for all j = 1, . . . , m, that is, if and only if ~x ⊥ ~rj for all
j = 1, . . . , m. (h· , ·i denotes the inner product on Rn .)
D

You should now have understood


• what the relation between the solutions of a homogeneous system and the kernel of the
associated coefficient matrix is,
• what the relation between the admissible right hand sides of a system of linear equations
and the range of the associated coefficient matrix is,
• why the dimension formula (6.8) holds and why it is only a special case of (6.4),
• why the Gauß-Jordan process works,
• etc.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
202 6.3. Change of bases

You should now be able to


• calculate a basis of the kernel of a matrix and its dimension,
• calculate a basis of the range of a matrix and its dimension,
• etc.

6.3 Change of bases


In this section, we work mostly with real vector spaces for definiteness sake. However, all the
statements are also true for complex vector spaces. We only have to replace everywhere R by C
and the word real by complex.

Usually we represent vectors in Rn as column of numbers, for example


   
3 x

FT
~v =  2 , or more generally, ~ = y  ,
w (6.10)
−1 z

Such columns of numbers are usually interpreted as the Cartesian coordinates of the tip of the
vector if its initial point is in the origin. So for example, we can visualise ~v as the vector which
we obtain when we move 3 units along the x-axis, 2 units along the y-axis and −1 unit along the
z-axis.
If we set ~e1 , ~e2 , ~e3 the unit vectors which are parallel to the x-, y- and z-axis, respectively, then we
RA
can write ~v as a weighted sum of them:
 
3
~v =  2 = 3~e1 + 2~e2 − ~e3 . (6.11)
−1

So the column of numbers which we use to describe ~v in (6.10) can be seen as a convenient way to
abbreviate the sum in (6.11).
Sometimes however, it may make more sense to describe a certain vector not by its Cartesian
D

coordinates. For instance, think of an infinitely large chess field (this is R2 ). Then the rook is
moving a along the Cartesian axis while the bishop moves a along the diagonals, that is along
~b1 = ( 1 ), ~b2 = −1 and the knight moves in directions parallel to ~k1 = ( 2 ), ~k2 = ( 1 ). We

1 1 1 2
suppose that in our imaginary chess game the rook, the bishop and the knight may move in arbitrary
multiples of their directions. Suppose all three of them are situated in the origin of the field and we
want to move them to the field (3, 5). For the rook, this is very easy. It only has to move 3 steps to
the right and then 5 steps up. He would denote his movement as ~vR = ( 35 )R . The bishop cannot
do this. He can move only along the diagonals. So what does he have to do? He has to move 4
steps in the direction of ~b1 and 1 step in the direction of ~b2 . So he would denote his movement with
respect to his bishop coordinate system as ~vB = ( 42 )B . Finally the knight has to move 31 steps in
the direction of ~k1 and 73 steps in the direction of ~k2 to reach the point
 (3, 5). So he would denote
1/3
his movement with respect to his knight coordinate system as ~vK = 7/3
. See Figure 6.1.
K

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 203

y y

P (3, 5) P (3, 5)
5 5
 
~b2 1
4 3
4 1 B 4 7
3 K
7~
3 3 k
3 2

4~b1 ~k2
2 2

~b2 1 1 ~k1
~b1 1~
x k
3 1 x
−1 0 1 2 3 4 −1 0 1 2 3 4

FT
Figure 6.1
The pictures shows the point (3, 5) in “bishop” and “knight” coordinates. The vectors for the
bishop are ~b1 = ~ −1
xB = ( 31 ). The vectors for the knight are ~k1 = ( 21 ), ~k2 = ( 12 )B
1

 ( 1 ), b2 =
1
1 and ~
and ~xK = 3
7 .
3 K

 
1/3
Exercise. Check that ~vB = ( 42 )B = 4~b1 + 2~b2 = ( 35 ) and that ~vK = 7/3
= 1/3~k1 + 7/3~k2 = ( 35 ).
K
RA
Although the three vectors ~v , ~vB and ~vK look very different, they describe the same vector – only
from three different perspectives (the rook, the bishop and the knight perspective). We have to
remember that they have to be interpreted as linear combinations of the vectors that describe their
movements.
What we just did was to perform a change of bases in R2 : Instead of describing a point in the plane
in Cartesian coordinates, we used “bishop”- and “knight”-coordinates.
We can also go in the other direction and transform from “bishop”- or “knight”-coordinates to
Cartesian coordinates. Assume that we know that the bishop moves 3 steps in his direction ~b1 and
D

−2 steps in his direction ~b2 , where does he end up? In his coordinate system, he is displaced by
the vector ~u = −23 B . In Cartesian coordinates this vector is


       
3 ~ ~ 3 2 5
~u = = 3b1 − 2b2 = + = .
−2 B 3 −2 1

~ ~
 3 steps in his direction k1 and −2 step in his direction k2 , that is, we move
If we move the knight
3
him along w
~ = −2 K according to his coordinate system, then in Cartesian coordinates this vector
is        
3 ~ ~ 6 −2 4
w
~= = 3b1 − 2b2 = + = .
−2 K 3 −4 −1
Can the bishop and the knight reach every point in the plane? If so, in how many ways? The

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
204 6.3. Change of bases

y y

3 3
3~b1 ~k2
2 −2~b2 2
  3~k1
~b1
3 ~k1
~b2 1 −2 B 1
−2~k2
x x
−1 1 2 3 4 5 6 −1 1 2 3 4 5 6
 
3
−2 K

3 3
 
Figure 6.2: The pictures shows the vectors −2 B and −2 K .

FT
answer is yes, and they can do so in exactly one way. The reason is that for the bishop and for the
knight, their set of direction vectors each form a basis of R2 (verify this!).
Let us make precise the concept of change of basis. Assume we are given an ordered basis B =
{~b1 , . . . , ~bn } of Rn . If we write
 
x1
~x =  ... (6.12)
 

xn B
RA
then we interprete it as a vector which is expressed with respect to the basis B and
 
x1
 ..
~x =  . := x1~b1 + · · · + xn~bn . (6.13)
xn B

If there is no index attached to the column vector, then we interprete it as a vector with respect to the
canonical basis ~e1 , . . . ,~en of Rn . Now we want to find a way to calculate the Cartesian coordinates
D

(that is, those with respect to the canonical basis) if we are given a vector in B-coordinates and
vice versa.
It will turn out that the following matrix will be very useful:

AB→can = (~v1 | . . . |~vn ) = matrix whose columns are the vectors of the basis B.

We will explain the index “B → can” in a moment.

Transition from representation with respect to a given basis to Cartesian coordinates.


Suppose we are given a vector as in (6.13). How do we obtain its Cartesian coordinates?
This is quite straightforward. We only need to remember what the notation (·)B means. We will
denote by ~xB the representation of the vector with respect to the basis B and by ~x its representation

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 205

with respect to the standard basis of Rn .


     
x1 x1 x1
~x =  .  = x1~b1 + x2~b2 + · · · + xn~bn = (~b1 |~b2 | · · · |~bn )  ..  = AB→can  ...  = AB→can ~xB ,
 ..   .   

xn B
xn xn

that is  
y1
~x = AB→can ~xB =  ...  . (6.14)
 

yn can

The last vector (the one with the y1 , . . . , yn in it) describes the same vector as ~xB , but it does so
with respect to the standard basis of Rn . The matrix AB→can is called the transition matrix from
the basis B to the canonical basis (which explains the subscript “B → can”). The matrix is also
called the change-of-coordinates matrix

FT
Transition from Cartesian coordinates to representation with respect to a given basis.
Suppose we are given a vector ~x in Cartesian coordinates. How do we calculate its coordinates ~xB
with respect to the basis B?
We only need to remember that the relation between ~x and ~xB according to (6.14) is

~x = AB→can ~xB .
RA
In this case, we know the entries of the vector ~xB . So we only need to invert the matrix AB→can in
order to obtain the entries of ~xB :
~xB = A−1
B→can ~x.
This requires of course that AB→can is invertible. But this is guaranteed by Theorem 5.37 since we
know that its columns are linearly independent. So it follows that the transition matrix from the
canonical basis to the basis B is given by

Acan→B = A−1
B→can .
D

 
y1
Note that we could do this also “by hand”: We are given ~x =  ...  and we want to find the
 

yn can
entries x1 , . . . , xn of the vector ~xB which describes the same vector. That is, we need numbers
x1 , . . . , xn such that
~x = x1~b1 + · · · + ~bn xn .
If we know the vectors ~b1 , . . . , ~bn , then we can write this as an n × n system of linear equations
and then solve it for x1 , . . . , xn which
  of course in reality is the same as applying the inverse of the
y1
matrix AB→can to the vector ~x =  ...  .
 

yn can

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
206 6.3. Change of bases

Now assume that we have two ordered bases B = {~b1 , . . . , ~bn } and C = {~c1 , . . . , ~cn } of Rn and we
are given a vector ~xB with respect to the basis B. How can we calculate its representation ~xC with
respect to the basis C? The easiest way is to use the canonical basis of Rn as an auxiliary basis.
So we first calculate the given vector ~xB with respect to the canonical basis, we call this vector ~x.
Then we go from ~x to ~xC . According to the formulas above, this is
~ can→C ~x = Acan→C AB→can ~xB .
~xC = A

Hence the transition matrix from the basis B to the basis C is

AB→C = Acan→C AB→can .

Example 6.41. Let us go back to our example  of our imaginary chess board. We have the “bishop
basis” B = {~b1 , ~b2 } where ~b1 = ( 11 ), ~b2 = −11 and the “knight basis” K = {~k1 , ~k2 } ~k1 = ( 21 ), ~k2 =
( 12 ). Then the transition matrices to the canonical basis are
   
1 −1 2 1
AB→can = , AK→can = ,

FT
1 1 1 2

their inverses are    


1 1 1 1 2 −1
Acan→B = , Acan→K =
2 −1 1 3 −1 2
and the transition matrices from C to K and from K to C are
   
1 3 −3 1 1 3
AB→K = , AK→C = .
3 1 1 2 −1 3
RA
• Given a vector ~x = ( 27 )B in bishop coordinates, what are its knight coordinates?

1 3 −3 −5
 
Solution. (~x)K = AB→K ~xB = 3 1 1 ( 27 ) = 3 K. 

• Given a vector ~y = ( 51 )K in knight coordinates, what are its bishop coordinates?

1 13 3
 
Solution. (~y )B = AK→B ~yK = −1 3 ( 51 ) = −1 B . 
D

• Given a vector ~z = ( 13 ) in standard coordinates, what are its bishop coordinates?

1 11

Solution. (~z)B = Acan→B ~z = 2 −1 1 ( 13 ) = ( 21 )B . 

Example 6.42. Recall the example on page 94 where we had a shop that sold different types of
packages of food. Package type A contains 1 peach and 3 mangos and package type B contains 2
peaches and 1 mango. We asked two types of questions:
Question 1. If we buy a packages of type A and b packages of type B, how many peaches and
mangos will we get? We could rephrase this question so that it becomes more similar to Question
2: How many peaches and mangos do we need in order to fill a packages of type A and b packages
of type B?

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 207

Question 2. How many packages of type A and of type B do we have to buy in order to get p
peaches and m mangos?
Recall that we had the relation
           
a m −1 m a 1 2 −1 1 −1 2
M = , M = where M = and M = .
b p p b 3 1 5 3 −1
(6.15)
We can view these problems in two different coordinate systems. We have the “fruit basis” F =
{~ ~ and the “package basis” P = {A,
p, m} ~ B}
~ where
       
1 0 ~= 1 , B ~ = 2 .
m
~ = , p~ = , A
0 1 3 1

Note that A~=m ~ + 3~ ~ = 2m


p, B ~ + p~, and that m~ = 51 (−A ~ + 3B)
~ and p~ = 1 (2A~ − B)
~ (that means
5
for example that one mango is three fifth of a package B minus one fifth of a package A).
An example for the first question is: How many peaches and mangos do we need to obtain 1 package
of type A and 3 packages of type B? Clearly,
 we
 need 7 peaches and 6 mangos. So the point that we

FT
 
1 7
want to reach is in “package coordinates” and in “fruit coordinates” . This is sketched
3 P 6 F
in Figure 6.3.
An example for the second question is: How many packages of type A and of type B do we have
to buy in order to obtain 5 peaches and 5 mangos? Using (6.15) we find that we need 1 package of
type
  A and 3 packages of type B.Sothe point that we want to reach is in “package coordinates”
1 5
and in “fruit coordinates” . This is sketched in Figure 6.4.
2 P 5 F
RA
In the rest of this section we will apply these ideas to introduce coordinates in abstract (finitely
generated) vector spaces V with respect to a given a basis. This allows us to identify in a certain
sense V with Rn or Cn for an appropriate n.
Assume we are given a real vector space V with an ordered basis B = {v1 , . . . , vn }. Given a vector
w ∈ V , we know that there are uniquely determined real numbers α1 , . . . , αn such that

w = α1 v1 + · · · + αn vn .
D

So, if we are given w, we can find the numbers α1 , . . . , αn . On the other hand, if we are given the
numbers α1 , . . . , αn , we can easily reconstruct the vector w (just replace in the right hand side of
the above equation). Therefore it makes sense to write
 
α1
 .. 
w= . 
αn B

where again the index B reminds us that the column of numbers has to be understood as the
coefficients with respect to the basis B. In this way, we identify V with Rn since every column
vector gives a vector w in V and every vector w gives one column vector in Rn . Note that if we
start with some w in V , calculate its coordinates with respect to a given basis and then go back to
V , we get back our original vector w.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
208 6.3. Change of bases

mangos
7
Type B
6

FT
5 ~
3B 4

4 6m
~
3

3
~
A 7~
p 2
2

1
1 ~
B
RA
p
~
peaches Type A
1 2 3 4 5 6 7 −1 1 2
m
~

(a)
(b)

Figure 6.3: How many peaches and mangos do we need to obtain 1 package of type A and 3 packages
of type B? Answer: 7 peaches and 6 mangos. Figure (a) describes the situation in the “fruit plane”
D

while Figure (b) describes the same situation in the “packages plane”. In both figures we see that
~ + 3B
A ~ = 7~
p + 6m.
~

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 209

mangos
Type B
6

5 4

~
2B
4
3

3 5m
~
~
A 2
2 5~
p
1
1 ~
B
p
~
peaches Type A
1 2 3 4 5 6 −1 1 2
m
~

(a)

FT
(b)

Figure 6.4: How many packages of type A and of type B do we need to get 5 peaches and 5 mangos?
Answer: 1 package of type A and 2 packages of type B. Figure (a) describes the situation in the “fruit
plane” while Figure (b) describes the same situation in the “packages plane”. In both figures we see
~ + 2B
that A ~ = 5~p + 5m.
~

Example 6.43. In P2 , consider the bases B = {p1 , p2 , p3 }, C = {q1 , q2 , q3 }, D = {r1 , r2 , r3 }


RA
where

p1 = 1, p2 = X, p3 = X 2 , q1 = X 2 , q2 = X, q3 = 1, r1 = X 2 + 2X, r2 = 5X + 2, r3 = 1.

We want to write the polynomial π(X) = aX 2 + bX + c with respect to the given basis.
 
c
• Basis B: Clearly, π = cp1 + bp2 + ap3 , therefore π =  b  .
a B
D

 
a
• Basis C: Clearly, π = aq1 + bq2 + cq3 , therefore π =  b  .
c C

• Basis D: This requires some calculations. Recall that we need numbers α, β, γ ∈ R such that
 
α
π = β  = αr1 + βr2 + γr3 .
γ D

This leads to the following equation

aX 2 + bX + c = α(X 2 + 2X) + β(5X + 2) + γ = αX 2 + (2α + 5β)X + 2β + γ.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
210 6.3. Change of bases

Comparing coefficients we obtain


     
α =a 1 0 0 α a
2α + 5β =b in matrix form: 2 5 0  β  =  b  . (6.16)
2β + γ = c. 0 2 1 γ c

Note that the columns of the matrix appearing on the right hand side are exactly  a the vector
representations of r1 , r2 , r3 with respect to the basis C and the column vector b is exactly
c
the vector representation of π with respect to the basis C! The solution of the system is

α = a, β = − 52 a + 15 b, γ = 25 a − 15 b + c,

therefore  
a
π =  − 25 a + 51 b  .
2 1
5a − 5b + c D

 
1
FT
We could have found the solution also by doing a detour through R3 as follows: We identify the
vectors q1 , q2 , q3 with the canonical basis vectors ~e1 , ~e2 ,~e3 of R3 . Then the vectors r1 , r2 , r3
and π correspond to
 
0
 
0
~r10 = 2 , ~r20 = 5 , ~r30 = 0 , ~π 0 =  b  .
0 2 1
 
a

c
RA
Let R = {~r10 , ~r20 , ~r30 }. In order to find the coordinates of ~π 0 with respect to the basis ~r10 , ~r20 , ~r30 ,
we note that
~π 0 = AR→can~πR
0

where AR→can is the transition matrix from the basis R to the canonical basis of R whose
columns consist of the vectors ~r10 , ~r20 , ~r30 . So we see that this is exactly the same equation as
the one in (6.16).

We give an example in a space of matrices.


D

Example 6.44. Consider the matrices


       
1 1 1 0 0 1 2 3
R= , S= , T = , Z= .
1 1 0 3 1 0 3 0

(i) Show that B = {R, S, T } is a basis of Msym (2 × 2) (the space of all symmetric 2 × 2 matrices).
(ii) Write Z in terms of the basis B.
Solution. (i) Clearly, R, S, T ∈ Msym (2 × 2). Since we already know that dim Msym (2 × 2) = 3,
it suffices to show that R, S, T are linearly independent. So let us consider the equation
 
α+β α+γ
0 = αR + βS + γT = .
α + γ α + 3β

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 211

We obtain the system of equations


     
α+ β = 0 1 1 0 α 0
α +γ=0 in matrix form: 1 0 1 β  = 0 . (6.17)
α + 3β =0 1 3 0 γ 0

Doing some calculations, if follows that α = β = γ = 0. Hence we showed that R, S, T are


linearly independent and therefore they are a basis of Msym (2 × 2).
(ii) In order to write Z in terms of the basis B, we need to find α, β, γ ∈ R such that
 
α+β α+γ
Z = αR + βS + γT = .
α + γ α + 3β
We obtain the system of equations
     
α+ β = 2 1 1 0 α 2
α +γ=3 in matrix form: 1 0 1 β  = 3 . (6.18)
α + 3β =0 1 3 0 γ 0

FT
| {z }
=A

Therefore
        
α 2 3 0 −1 2 3
1
β  = A−1 3 = −1 0 1 3 = −1 ,
2
γ 0 −3 2 1 0 0
 3
hence Z = 3R − S = −1 . 
0 B
RA
Now we give an alternative solution (which is essentially
  the same as  the above)doing a detour
1 0 0 0 1 1
through R3 . Let C = {A1 , A2 , A3 } where A1 = , A2 = , A3 = . This is
0 0 0 1 1 0
clearly a basis of Msym (2 × 2). We identify it with the standard basis ~e1 ,~e2 ,~e3 of R3 . Then the
vectors R, S, T in this basis look like
       
1 1 0 2
R0 = 1 , S 0 = 0 , T 0 = 1 and Z 0 = 3 .
1 3 0 0
D

(i) In order to show that R, S, T are linearly independent, we only have to show that the vectors
R0 , S 0 and T 0 are linearly independent in R3 . To this end, we consider the matrix A whose
columns are these vectors. Note that this is the same matrix that appeared in (6.18). It is
easy to show that this matrix is invertible (we already calculated its inverse!). Therefore the
vectors R0 , S 0 , T 0 are linearly independent in R3 , hence R, S, T are linearly independent in
Msym (2 × 2).
(ii) Now in order to find the representation of Z in terms of the basis B, we only need to find the
representation of Z 0 in terms of the basis B 0 = {R0 , S 0 , T 0 }. This is done as follows:
 
2
ZB0 0 = Acan→B0 Z 0 = A−1 Z 0 = 3 .
0

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
212 6.4. Linear maps and their matrix representations

You should now have understood


• the geometric meaning of a change of bases in Rn ,
• how an abstract finite dimensional vector space can be represented as Rn or Cn and that
the representation depends on the chosen basis of V ,
• how the vector representation changes if the chosen basis is reordered,
• etc.
You should now be able to
• perform a change of basis in Rn and Cn given a basis,
• represent vectors in a finite dimensional vector space V as column vectors after the choice
of a basis,
• etc.

FT
6.4 Linear maps and their matrix representations
Let U, V be K-vector spaces and let T : U → V be a linear map. Recall that T satisfies
T (λ1 x1 + · · · + λk xk ) = λ1 T (x1 ) + · · · + λk T (xk )
for all x1 , . . . , xk ∈ U and λ1 , . . . , λk ∈ K. This shows that in order to know T , it is in reality
enough to know how T acts on a basis of U . Suppose that we are given a basis B = {u1 , . . . , un } ∈ U
and take an arbitrary vector w ∈ U . Then there exist uniquely determined λ1 , . . . , λk ∈ K such
RA
that w = λ1 u1 + · · · + λn un . Hence
T w = T (λ1 u1 + · · · + λn un ) = λ1 T u1 + · · · + λn T un . (6.19)
So T w is a linear combination of the vectors T u1 , . . . , T un ∈ V and the coefficients are exactly the
λ1 , . . . , λ n .
Suppose we are given a basis C = {v1 , . . . , vk } of V . Then we know that for every j = 1, . . . , n, the
vector T uj is a linear combination of the basis vectors v1 , . . . , vm of V . Therefore there exist uniquely
determined numbers aij ∈ K (i = 1, . . . , m, j = 1, . . . n) such that T uj = aj1 v1 + · · · + ajm vm , that
is
D

T u1 = a11 v1 + a21 v2 + · · · + am1 vm ,


T u2 = a12 v1 + a22 v2 + · · · + am2 vm ,
.. .. .. .. (6.20)
. . . .
T un = a1n v1 + a2n v2 + · · · + amn vm .

Let us define the matrix AT and the vector ~λ by


   
a11 a12 · · · a1n λ1
 a21 a22 · · · a2n   λ2 
AT =  .

.. ..  ∈ M (m × n),
 ~λ = 
 ..  ∈ Rn .

 .. . .   . 
am1 am2 · · · amn λn

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 213

Note that the first column of AT is the vector representation of T u1 with respect to the basis
v1 , . . . , vm , the second column is the vector representation of T u2 , and so on.
Now let us come back to the calculation of T w and its connection with the matrix AT . From (6.19)
and (6.20) we obtain

T w = λ1 T u1 + λ2 T u2 + · · · + λn T un

= λ1 (a11 v1 + a21 v2 + · · · + am1 vm )


+ λ2 (a12 v1 + a22 v2 + · · · + am2 vm )
+ ···
+ λn (a1n v1 + a2n v2 + · · · + amn vm )

= (a11 λ1 + a12 λ2 + · · · + a1n λn )v1


+ (a21 λ1 + a22 λ2 + · · · + a2n λn )v2
+ ···

FT
+ (am1 λ1 + am2 λ2 + · · · + amn λn )vm .

The calculation shows that for every k the coefficient of vk is the kth component of the vector AT ~λ!
Now we can go one step further. Recall that the choice of the basis B of U and the basis C of V
allows us to write w and T w as a column vectors:
   
λ1 a11 λ1 + a12 λ2 + · · · + a1n λn
λ2   a21 λ1 + a22 λ2 + · · · + a2n λn 
w=w ~B  .  , Tw =   .
   
..
RA
 ..   . 
λ1 B
am1 λ1 + am2 λ2 + · · · + amn λn C

This shows that


(T w)C = AT w
~ B.
For now hopefully obvious reasons, the matrix AT is called the matrix representation of T with
respect to the bases B and C.
So every linear transformation T : U → V can be represented as a matrix AT ∈ M (m × n). On the
D

other hand, every a matrix A(m × n) induces a linear transformation TA : U → V .

Very important remark. This identification of m×n-matrices with linear maps U → V depends
on the choice of the basis! See Example 6.47.

Let us summarise what we have found so far.

Theorem 6.45. Let U, V be finite dimensional vector spaces and let B = {u1 , . . . , un } be an ordered
basis of U and let C = {v1 , . . . , vn } be an ordered basis of V . Then the following is true:

(i) Every linear map T : U → V can be represented as a matrix AT ∈ M (m × n) such that

(T w)C = AT w
~B

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
214 6.4. Linear maps and their matrix representations

where (T w)C is the representation of T w ∈ V with respect to the basis C and w ~ B is the
representation of w ∈ U with respect to the basis B. The entries aij of AT can be calculated
as in (6.20).

(ii) Every matrix A = (aij )i=1,...,m ∈ M (m × n) induces a linear transformation T : U → V


j=1,...,n
defined by
T (uj ) = a1j v1 + . . . amj vm , j = 1, . . . , n.

(iii) T = TAT and A = ATA . , That means: If we start with a linear map T : U → V , calculate its
matrix representation AT and then the linear map TAT : U → V induced by AT , then we get
back our original map T . If on the other hand we start with a matrix A ∈ M (m×n), calculate
the linear map TA : U → V induced by A and then calculate its matrix representation ATA ,
then we get back our original matrix A.

Proof. We already showed (i) and (ii) in the text before the theorem. To see (iii), let us start with a
linear transformation T : U → V and let AT = (aij ) be the matrix representation of T with respect

FT
to the bases B and C. For TAT , the linear map induced by AT , it follows that

T AT uj = a1j v1 + . . . amj vm = T uj , j = 1, . . . , n

Since this is true for all basis vectors and both T and TAT are linear, they must be equal.
If on the other hand we are given a matrix A = (aij )i=1,...,m ∈ M (m × n) then we have that the
j=1,...,n
linear transformation TA induced by A acts on the basis vectors u1 , . . . , un as follows:
RA
TA uj = TAT uj = a1j v1 + . . . amj vm .

But then, by definition of the matrix representation ATA of TA , it follows that ATA = A.

Let us see this “identifications” of matrices with linear transformations a bit more formally. By
choosing a basis B = {u1 , . . . , un } in U and thereby identifying U with Rn , we are in reality defining
a linear bijection
 
λ1
 .. 
D

n
Ψ:U →R , Ψ(λu1 + · · · + λn un ) =  .  .
λn
Recall that we denoted the vector on the right hand side by ~uB .
The same happens if we choose a basis C = {v1 , . . . , vm } of V . We obtain a linear bijection


µ1
Φ : V → Rm , Φ(µv1 + · · · + µm vm ) =  ...  .
 

µm

With these linear maps, we find that

AT = Φ ◦ T ◦ Ψ−1 and TA = Φ−1 ◦ A ◦ Ψ.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 215

The maps Ψ and Φ “translate” the spaces U and V to Rn and Rm where the chosen bases serve
as “dictionary”. Thereby they “translate” linear maps U : U → V to matrices A ∈ M (m × n) and
vice versa. In a diagram this looks likes this:
T
U V
Ψ Φ
AT
Rn Rm
So in order to go from U to V , we can take the detour through Rn and Rm . The diagram above is
called commutative diagram. That means that it does not matter which path we take to go from
one corner of the diagram to another one as long as we move in the directions of the arrows. Note
that in this case we are even allowed to go in the opposite directions of the arrows representing Ψ
and Φ because they are bijections.
What is the use of a matrix representation of a linear map? Sometimes calculations are easier in
the world of matrices. For example, we know how to calculate the range and the kernel of a matrix.
Therefore, using Theorem :

FT
• If we want to calculate Im T , we only need to calculate Im AT and then use Φ to “translate
back” to the range of T . In formula: Im T = Im(Φ−1 AT Ψ) = Im(Φ−1 AT ) = Φ−1 (Im AT ).
• If we want to calculate ker T , we only need to calculate ker AT and then use Ψ to “translate
back” to the kernel of T . In formula: ker T = ker(Φ−1 AT Ψ) = ker(AT Ψ) = Ψ−1 (ker AT ).
• If dim U = dim V , i.e., if n = m, then T is invertible if and only if AT is invertible. This is
the case if and only if det AT 6= 0.
RA
Let us summarise. From Theorem 6.24 we obtain again the following very important theorem, see
Theorem 6.20 and Proposition 6.16.

Theorem 6.46. Let U, V be vector spaces and let T : U → V be a linear transformation. Then

dim U = dim(ker T ) + dim(Im T ). (6.21)

If dim U = dim V , then the following is equivalent:


(i) T is invertible.
D

(ii) T is injective, that is, ker T = {O}.


(iii) T is surjective, that is, Im T = V .

Note that if T is bijective, then we must have that dim U = dim V .

Let us see some examples.

Example 6.47. We consider the operator of differentiation


T : P3 → P3 , T p = p0 .
Note that in this case the vector spaces U and V are both equal to P3 .

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
216 6.4. Linear maps and their matrix representations

(i) Represent T with respect to the basis B = {p1 , p2 , p3 , p4 } and find its kernel where p1 =
1, p2 = X, p3 = X 2 , p4 = X 3 .

Solution. We only need to evaluate T in the elements of the basis and then write the re-
sult again as linear combination of the basis. Since in this case, the bases are “easy”, the
calculations are fairly simple:

T p1 = 0, T p2 = 1 = p1 , T p3 = 2X = 2p2 , T p4 = 3X 2 = 3p3 .

Therefore the matrix representation of T is


 
0 1 0 0
AB 0 0 2 0

T = 0
.
0 0 3
0 0 0 0

The kernel of AT is clearly span{~e1 }, hence ker T = span{p1 } = span{1}. 

FT
(ii) Represent T with respect to the basis C = {q1 , q2 , q3 , q4 } and find its kernel where q1 =
X 3 , q2 = X 2 , q3 = X, q4 = 1.

Solution. Again we only need to evaluate T in the elements of the basis and then write the
result as linear combination of the basis.

T q1 = 3X 2 = 3q2 , T q2 = 2X = 2q3 , T q3 = X = q4 , T q4 = 0.
RA
Therefore the matrix representation of T is
 
0 0 0 0
C
3 0 0 0
AT =   .
0 2 0 0
0 0 1 0

The kernel of AT is clearly span{~e4 }, hence ker T = span{q4 } = span{1}. 

(iii) Represent T with respect to the basis B in the domain of T (in the “left” P3 ) and the basis
D

C in the target space (in the “right” P3 ).

Solution. We calculate

T p1 = 0, T p2 = 1 = q4 , T p3 = 2X = 2q3 , T p4 = 3X 2 = 3q2 .

Therefore the matrix representation of T is


 
0 0 0 0
B,C
0 0 0 3
AT =  0
.
0 2 0
0 1 0 0

The kernel of AT is clearly span{~e1 }, hence ker T = span{p1 } = span{1}. 

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 217

(iv) Represent T with respect to the basis D = {r1 , r2 , r3 , r4 } and find its kernel where
r1 = X 3 + X, r2 = 2X 2 + X 2 + 2X, r3 = 3X 3 + X 2 + 4X + 1, r4 = 4X 3 + X 2 + 4X + 1.

Solution 1. Again we only need to evaluate T in the elements of the basis and then write the
result as linear combination of the basis. This time the calculations are a bit more tedious.
T r1 = 3X 2 + 1 = − 8r1 + 2r2 + r4 ,
2
T r2 = 6X + 2X + 2 = − 14r1 + 4r2 + r3 ,
T r3 = 9X 2 + 2X + 4 = − 24r1 + 5r2 + 2r3 + 2r4 ,
T r4 = 12X 2 + 2X + 4 = 30r1 + 8r2 + 2r3 + 2r4 .
Therefore the matrix representation of T is
 
−8 −14 −24 −30
 2 4 5 8
AD
T = 0
 .
2 2 2
1 0 2 2

FT
In order to calculate the kernel of AT , we apply the Gauß-Jordan process and obtain
   
−8 −14 −24 −30 1 0 0 2
2 4 5 8  −→ · · · −→ 0 1 0 1
AD
  
T = 0
 .
2 2 2 0 0 1 0
1 0 2 2 0 0 0 0
The kernel of AT is clearly span{−2~e1 − ~e2 + ~e4 }, hence ker T = span{−2r1 − r2 + r4 } =
span{1}. 
RA
Solution 2. We already have the matrix representation ACT and we can use it to calculate
AD
T . To this end define the vectors
       
1 2 3 4
0 1 1 1
ρ
~1 = 
1 , ρ
  ~4 =   .
2 ~3 = 4 , ρ
 ~2 =   , ρ
4
0 0 1 1
Note that these vectors are the representations of our basis vectors r1 , . . . , r4 in the basis C.
D

The change-of-bases matrix from C to D and its inverse are, in coordinates,


   
1 2 3 4 0 −2 1 −2
0 1 1 1 −1
 0 1 0 −1
SD→C =  1 2 4 4 ,
 SC→D = SD→C = −1
.
0 1 0
0 0 1 1 1 0 −1 1
It follows that
AD C
T = SC→D AT SD→C
     
0 −2 1 −2 0 0 0 0 1 2 3 4 −8 −14 −24 −30
 0 1 0 −1 3 0 0 0 0 1 1 1  2 4 5 8
=−1
  = .
0 1 0 0 2 0 0 1 2 4 4  0 2 2 2
1 0 −1 1 0 0 1 0 0 0 1 1 1 0 2 2

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
218 6.4. Linear maps and their matrix representations

Let us see how this looks in diagrams. We define the two bijections of P3 with R4 which are
given by choosing the bases C and D by ΨC and ΨD :

Ψ C : P3 → R4 , ΨC (q1 ) = ~e1 , ΨC (q2 ) = ~e2 , ΨC (q3 ) = ~e3 , ΨC (q4 ) = ~e4 ,


4
ΨD : P3 → R , ΨD (r1 ) = ~e1 , ΨD (r2 ) = ~e2 , ΨD (r3 ) = ~e3 , ΨD (r4 ) = ~e4 .

Then we have the following diagrams:

T T
P3 P3 P3 P3
ΨC ΨC ΨD ΨD
AC AD
R4 T
R4 R4 T
R4

We already know everything in the diagram on the left and we want to calculate AD
T in the
diagram on the right. We can put the diagrams together as follows:

FT
T
P3 P3

ΨC ΨC
ΨD ΨD

SD→C AC SC→D
R4 R4 T
R4 R4
AD
T
RA
We can also see that the change-of-basis maps SD→C and SC→D are

SD→C = ΨC ◦ Ψ−1
D , SC→D = ΨD ◦ Ψ−1
C .

For AD
T we obtain
−1
AD C
T = ΨD ◦ T ◦ ΨD = SD→C ◦ AT ◦ SC→D .
D

Another way to draw the diagram above is

T
P3 P3

ΨC ΨC

AC
ΨD R4 T
R4 ΨD
S
C C
→ →
D
SD

AD
T
R4 R4

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 219

B,C
Note that the matrices AB C D
T , AT , AT and AT all look different but they describe the same linear
transformation. The reason why they look different is that in each case we used different bases to
describe them.

Example 6.48. The next example is not very applied but it serves to practice a bit more. We
consider the operator given

T : M (2 × 2) → P2 , T ( ac db ) = (a + c)X 2 + (a − b)X + a − b + d.


Show that T is a linear transformation and represent T with respect to the bases B = {B1 , B2 , B3 , B4 }
of M (2 × 2) and C = {p1 , p2 , p3 } of P2 where
       
1 0 0 1 0 0 0 0
B1 = , B2 = , B3 = , B4 = ,
0 0 0 0 1 0 0 1

and
p1 = 1, p2 = X, p3 = X 2 .

FT
Find bases for ker T and Im T and their dimensions.
a1 b1

Solution. First we verify that T is indeed a linear map. To this end, we take matrices A1 = c1 d1
and A2 = ac22 db22 and λ ∈ R. Then


       
a b1 a b2 λa1 + a2 λb1 + b2
T (λA1 + A2 ) = T λ 1 + 2 =T λ
c1 d1 c2 d2 λc1 + c2 λd1 + d2
= (λa1 + a2 + λc1 + c2 )X 2 + (λa1 + a2 − λb1 − b2 )X + λa1 + a2 − (λb1 + b2 ) + λd1 + d2
RA
= λ[(a1 + c1 )X 2 + (a1 − b1 )X + a1 − b1 + d1 )] + (a2 + c2 )X 2 + (a2 − b2 )X + a2 − b2 + d2 )
 

= λT (A1 ) + T (A2 ).

This shows that T is a linear transformation.


Now we calculate its matrix representation with respect to the given bases.

T B1 = X 2 + X + 1 = p1 + p2 + p3 ,
T B2 = −X = −p2 ,
D

T B3 = X 2 = p3 ,
T B4 = 1 = p1 .

Therefore the matrix representation of T is


 
1 0 0 1
AT =  1 −1 0 0
1 0 1 0

In order to determine the kernel and range of AT , we apply the Gauß-Jordan process:
     
1 0 0 1 1 0 0 1 1 0 0 1
AT = 1 −1 0 0 −→ 0 −1 0 −1 −→ 0 1 0 1 .
1 0 1 0 0 0 1 −1 0 0 1 −1

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
220 6.4. Linear maps and their matrix representations

5
R~z L
4

3
~v ~z
w
~ 2

x
−3 −2 −1 1 2 3 4 5
−1

−2 ~ = −w
Rw ~

FT
Figure 6.5: The pictures shows the reflection R on the line L. The vector ~v is parallel to L, hence
R~v = ~v . The vector w ~ = −w.
~ is perpendicular to L, hence Rw ~

So the range of AT is R3 and its kernel is ker  e1 +~e2 −~e3 −~e3 }. Therefore Im T = P2 and
 AT = span{~
ker T = span{B1 + B2 − B3 − B4 } = span −11 −11 . For their dimensions we find dim(Im T ) = 3
and dim(ker T ) = 1. 

Example 6.49 (Reflection in R2 ). In R2 , consider the line L : 3x − 2y = 0. Let R : R2 → R2


RA
which takes a vector in R2 and reflects it on the line L, see Figure 6.5. Find the matrix representation
of R with respect to the standard basis of R2 .
Observation. Note that L is the line which passes through the origin and is parallel to the vector
~v = ( 23 ).

Solution 1 (use coordinates adapted to the problem). Clearly, there are two directions which
are special in this problem: the direction parallel and the direction orthogonalto the line. So a
~ = −32 . Clearly, R~v = ~v
~ where ~v = ( 23 ) and w
basis which is adapted to the exercise, is B = {~v , w}
and Rw ~ = −w.~ Therefore the matrix representation of R with respect to the basis B is
D

 
B 1 0
AR = .
0 −1

In order to obtain the representation AR with respect to the standard basis, we only need to perform
a change of basis. Recall that change-of-bases matrices are given by
   
2 −3 −1 1 2 3
SB→can = (~v |w)
~ = , Scan→B = SB→can = .
3 2 13 −3 2

Therefore
     
1 2 −3 1 0 2 3 1 −5 12
AR = SB→can AB
R Scan→B = = . 
13 3 2 0 −1 −3 2 13 12 5

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 221

Solution 2 (reduce the problem to a known reflection). The problem would be easy if we
were asked to calculate
 the matrix representation of the reflection on the x-axis. This would simply
1 0
be A0 = . Now we can proceed as follows: First we rotate R2 about the origin such that
0 −1
the line L is parallel to the x-axis, then we reflect on the x-axis and then we rotate back. The result
is the same as reflecting on L. Assume that Rot is the rotation matrix. Then

AT = Rot−1 ◦ A0 ◦ Rot. (6.22)

How can we calculate Rot? We know that Rot~v = ~e1 and that Rotw ~ = ~e2 . It follows that
Rot−1 = (~v |w)
~ = −32 32 . Note that up to a numerical factor, this is SB→can . We can calculate


easily that Rot = (Rot−1 )−1 = 13


1 2 −3 −5 12
 
3 2 . If we insert this in (6.22), we find again AR = 12 5 . 

Solution 3 (straight forward calculation).


 We can form a system of linear equations in order
to find AT . We write AR = ac db with unknown numbers a, b, c, d. Again, we use that we know
~ = −w.
that AT ~v = ~v and AT w ~ This gives the following equations:

FT
      
2 a b 2 2a + 3b
= ~v = AT ~v = = ,
3 c d 3 2c + 3d
      
−3 a b −3 3a − 2b
=w ~ = −AT w ~ =− =
2 c d 2 3c − 2d

which gives the system

2a + 3b = 2, 2c + 3d = 3, 3a − 2b = −3,3c − 2d = 2,
RA
5 −5 12
, b = c = 12 5

Its unique solution is a = − 13 13 , d = 13 , hence AR = 12 5 . 

Example 6.50 (Reflection and orthogonal projection in R3 ). In R3 , consider the plane


E : x − 2y + 3z = 0. Let R : R3 → R3 which takes a vector in R3 and reflects it on the plane E and
let P : R3 → R3 be the orthogonal projection onto E. Find the matrix representation of R with
respect to the standard basis of RE .
Observation. 1 Note
 that E is the plane which
 2  passes through
 0  the origin and is orthogonal to the
vector ~n = −2 . Moreover, if we set ~v = 1 and w~ = 3 , then it is easy to see that {~v , w}
~ is
3 0 2
D

a basis of E.

Solution 1 (use coordinates adapted to the problem). Clearly, a basis which is adapted to
the exercise is B = {~n, ~v , w}
~ because for these vectors we have R~v = ~v , Rw ~ R~n = −~n, and
~ = w,
P ~v = ~v , P w ~ P ~n = ~0. Therefore the matrix representation of R with respect to the basis B is
~ = w,
 
1 0 0
AB
R =
0 1 0
0 0 −1

and the one of P is  


1 0 0
AB
R =  0 1 0
0 0 0

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
222 6.4. Linear maps and their matrix representations

z z

~x ~x
• •
E E
P ~x
R~x
y y

x x

Figure 6.6: The figure shows the plane E : x − 2y + 3z = 0 and for the vector ~
x it shows its orthogonal
projection P ~
x onto E and its reflection R~
x about E, see Example 6.50.

Therefore

2 0
~ n) = 1 3 −2 ,
SB→can = (~v |w|~
0 2
1

3
FT
In order to obtain the representations AR and AP with respect to the standard basis, we only need
to perform a change of basis. Recall that change-of-bases matrices are given by

−1
Scan→B = SB→can =
1
28

13
−3
2 −4
2 −3
6

5 .
6
RA
   
2 0 1 1 0 0 13 2 −3
1 
AR = SB→can AB
R Scan→B = 1 3 −2 0 1 0 −3 6 5
28
0 2 3 0 0 −1 2 −4 6
 
6 2 −3
1
=  2 3 6
7
−3 6 −2

and
D

   
2 0 2 1 0 0 13 2 −3
1 
AP = SB→can AB
P Scan→B = 1 3 −1 0 1 0 −3 6 5
28
0 2 3 0 0 0 2 −4 6
 
13 2 −3
1 
= 2 10 6 
14
−3 6 5

Solution 2 (reduce the problem to a known reflection). The problem would be easy if we
were asked to calculate
 thematrix representation of the reflection on the xy-plane. This would
1 0 0
simply be A0 = 0 1 0. Now we can proceed as follows: First we rotate R3 about the origin
0 0 −1

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 223

such that the plane E is parallel to the xy-axis, then we reflect on the xy-plane and then we rotate
back. The result is the same as reflecting on the plane E. We leave the details to the reader. An
analogous procedure works for the orthogonal projection. 
Solution 3 (straight forward calculation).
 a11 a12 a13  Lastly, we can form a system of linear equations in
order to find AR . We write AR = aa21 aa22 aa23 with unknowns aij . Again, we use that we know
31 32 33
that AR~v = ~v , AR w ~ and AR~n = −~n. This gives a system of 9 linear equations for the nine
~ =w
unknowns aij which can be solved. 

Remark 6.51. Yet another solution is the following. Let Q be the orthogonal projection onto ~n.
We already know how to calculate its representing matrix:
  
1 −2 3 x
h~x , ~ni x − 2y + 3z 1 
Q~x = ~
n = ~
n = −2 4 −6  y  .
k~nk2 14 14
3 −6 9 z
 1 −2 3 
1 −2 4 −6 . Geometrically, it is clear that P = id −Q and R = id −2Q. Hence it
Hence AQ = 14

FT
3 −6 9
follows that
     
1 0 0 1 −2 3 13 2 −3
1  1 
AP = id −AQ = 0 1 0 − −2 4 −6 = 2 10 6
14 14
0 0 1 3 −6 9 −3 6 5
and      
1 0 0 1 −2 3 6 2 −3
1 1
AR = id −2AQ = 0 1 0 − −2 4 −6 =  2 3 6 .
7 7
RA
0 0 1 3 −6 9 −3 6 −2

Change of bases as matrix representation of the identity


Finally let us observe that a change-of-bases matrix is nothing else than the identity matrix written
with respect to different bases. To see this let B = {~v1 , . . . , ~vn } and C = {w
~ 1 , . . . , ~vw } be bases of
Rn . We define the the linear bijections ΨB and ΨC as follows:
D

Ψ B : R n → Rn , ΨB (~e1 ) = ~v1 , . . . , ΨB (~en ) = ~vn ,


n n
ΨC : R → R , ΨC (~e1 ) = w
~ 1 , . . . , ΨC (~en ) = w
~ n,

Moreover we define the change-of-bases matrices


SB→can = (~v1 | · · · |~vn ), ~ 1 | · · · |w
SC→can = (w ~ n ).
Note that these matrices are exactly the matrix representations of ΨB and ΨC . Now let us consider
the diagram
id
Rn Rn
Ψ−1
B Ψ−1
C
Aid
Rn Rn

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
224 6.5. Summary

Therefore
Aid = Ψ−1 −1 −1
C ◦ id ◦ΨB = ΨC ◦ ΨB = SC→can ◦ SB→can = Scan→C ◦ SB→can = SB→C→can .

You should now have understood


• why every linear map between finite dimensional vector spaces can be written as a matrix
and why the matrix depends on the chosen bases,
• how the matrix representation changes if the chosen bases changes,
• in particular, how the matrix representation changes if the chosen bases are reordered,
• etc.
You should now be able to

• represent a linear map between finite dimensional vector spaces as a matrix,

FT
• use the matrix representation of a linear map to calculate its kernel and range,
• interpret a matrix as a linear map between finite dimensional vector spaces,
• etc.

6.5 Summary
Linear maps
RA
A function T : U → V between two K-vector spaces U and V is called linear map (or linear function
or linear transformation) if it satisfies
T (u1 + λu2 ) = T (u1 ) + λT (u2 ) for all u1 , u2 ∈ U and λ ∈ K.
The set of all linear maps from U to V is denoted by L(U, V ).
• The composition of linear maps is a linear map.
• If a linear map is invertible, then its inverse is a linear map.
D

• If U, V are K-vector spaces then L(U, V ) is a K-vector space. This means: If S, T ∈ L(U, V )
and λ ∈ K, then S + λT ∈ L(U, V ).
For a linear map T : U → V we define the following sets
ker T = {u ∈ U : T u = O} ⊆ U,
Im T = {T u : u ∈ U } ⊆ V.
ker T is called kernel of T or null space of T . It is a subspace of U . Im T is called image of T or
range of T . It is a subspace of V .
The linear map T is called injective if T u1 = T u2 implies u1 = u2 for all u1 , u2 ∈ U . The linear
map T is called surjective if for every v ∈ V exist some u ∈ U such that T u = v. The linear map
T is called bijective if it is injective and surjective.
Let T : U → V be a linear map.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 225

• The following are equivalent:


(i) T is injective.
(ii) T u = O implies that u = O.
(iii) ker T = {O}.
• The following are equivalent:
(i) T is surjective.
(ii) Im T = V .
• If T is bijective, then necessarily dim U = dim V . In other words: if dim U 6= dim V , then
there exists no bijection between them.

Let U, V be K-vector spaces and T : U → V a linear map. Moreover, let E : U → U , F : V → V


be linear bijective maps. Then

FT
ker(F T ) = ker(T ), ker(T E) = E −1 (ker(T )),
Im(F T ) = F (Im(T )), Im(T E) = Im(T ).

and

dim ker(T ) = dim ker(F T ) = dim ker(T E) = dim ker(F T E),


dim Im(T ) = dim Im(F T ) = dim Im(T E) = dim Im(F T E).
RA
If dim U = n < ∞ then

dim(ker(T )) + dim(Im(T )) = n.

Linear maps and matrices


Every matrix A ∈ MK (m × n) represents a linear map from Kn to Km by

TA : Kn → Km , ~x 7→ A~x.
D

Very often we write A instead of TA .


On the other hand, every linear map T : U → V between finite dimensional vector spaces U and V
has a matrix representation. Let B = {u1 , . . . , un } be a basis of U and C = {v1 , . . . , vm } be a basis
of V . Assume that T uj = a1j v1 + · · · + amj vm . Then the matrix representation of T with respect
to the basis B and C is AT = (aij )i=1,...,m ∈ M (m × n). Note that the matrix representation of T
j=1,...,n
depends on the chosen bases in U and V .
If we define the functions Ψ and Φ as
   
α1 β1
Ψ : U → Kn , Ψ(α1 u1 + . . . αn un ) =  ...  , Φ : V → Km , Φ(β1 v1 + . . . βm vm ) =  ...  ,
   

αn βm

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
226 6.6. Exercises

then these functions are linear and Φ ◦ AT ◦ Ψ = T and Ψ−1 ◦ T ◦ Φ−1 = AT . In a diagram this is

T
U V
Ψ Φ
n AT m
R R

Matrices
Let A ∈ M (m × n).

• The column space CA of A is the linear span of its column vectors. It is equal to Im A.

• The row space RA of A is the linear span of its row vectors. It is equal to the orthogonal
complement of ker A.

• dim RA = dim CA = dim(Im A) = number of columns with pivots in any echelon form of A.

FT
Kernel and image of A:

• dim(ker A) = number of free variables = number of columns without pivots in any row echelon
form of A.
ker A is equal to the solution set of A~x = ~0 which can be determined for instance with the
Gauß or Gauß-Jordan elimination.

• dim(Im A) = dim CA = number of columns with pivots in any row echelon form of A.
RA
Im(A) be be found by either of the following two methods:

(i) row reduction of A. The columns of the original matrix A which correspond to the
columns of the row reduced echelon form of A are a basis of Im A.
(ii) column reduction of A. The remaining columns are a basis of Im A.

6.6 Exercises
D

1. Determine si las siguientes funciones son lineales. Si lo son, calcule el kernel y la dimensión del
kernel.

 
x  
3 2x + y x−z
(a) A : R → M (2x2), A  y  = ,
x + y − 3z z
z
 
x  
3 2xy x−z
(b) B : R → M (2x2), A y =
  ,
x + y − 3z z
z
(c) C : M (2 × 2) → M (2 × 2), C(M ) = M + M t
(d) D : P3 → P4 , Dp = p0 + xp,

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 227

 
a+b b+c c+d
(e) T : P3 → M (2 × 3), T (ax3 + bx2 + cx + d) = ,
0 a+d 0
 
a+b b+c c+d
(f) T : P3 → M (2 × 3), T (ax3 + bx2 + cx + d) = .
0 a+d 3

2. Sean U, V espacios vectoriales sobre K (con K = R o K = C) y sea T : U → V una función lineal


invertible. Entonces podemos considerar su función inversa T −1 : Im(T ) → U . Demuestre que
es una función lineal.

3. Sean U, V, W espacios vectoriales sobre K (con K = R o K = C) y sean T : U → V , S : V → W


funciones lineales. Demuestre que la composición ST : U → W también es una función lineal.

4. Sean U, V espacios vectoriales sobre K (con K = R o K = C). Con L(U, V ) denotamos el


conjunto de todas las transformaciones lineales de U a V . Demuestre que L(U, V ) es un espacio
vectorial sobre K. ¿Qué se puede decir sobre dim L(U, V )?

una función lineal:

FT
5. Sean U, V espacios vectoriales sobre K (con K = R o K = C). Sabemos de Ejercicio 4 que
L(U, V ) es un espacio vectorial. Fije un vector v0 ∈ V . Demuestre que la siguiente función es

Φv0 : L(U, V ) → U, Φv0 (T ) >= T (v0 ).


RA
6. Sean      
1 3 1 1 1 0
A= , E= , F = .
2 6 −1 1 0 −1

(a) Demuestre que E y F son invertibles. Describa como actuan geométricamente en R2 .


(b) Calcule Im(A), ker(A) y sus dimensiones. Dibuja Im(A) y ker(A), diga qué objetos
geométricas son.
D

(c) Calcule Im(A), Im(F A), Im(AE) y sus dimensiones. Dibújalos y diga cual es la relación
entre ellos.
(d) Calcule ker(A), ker(F A), ker(AE) y sus dimensiones. Dibújalos y diga cual es la relación
entre ellos.

7. De los siguientes matrices, calcule kernel, imagen y las dimensiones correspondientes.


 
  1 1 5 1  
1 4 7 2 3 2 1 2 3
13 1  
A = 2 5 8 4 , B= , C = 1 2 3 .
0 2 7 −1
3 6 9 6 1 2 9
4 5 25 1

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
228 6.6. Exercises

8. Sea A ∈ M (m × n). Demuestre:


(i)A inyectiva =⇒ m ≥ n.
(ii)A sobreyectiva =⇒ n ≥ m.
Demuestre que la implicación “⇐=” en (i) and (ii) en general es falsa.

9. Sea A ∈ M (m × n) y suponga que A es invertible. Demuestre que m = n.

10. Sean m, n ∈ N y A ∈ M (m × n).


(a) ¿Cuáles son las dimensiones posibles de ker A y Im A?
(b) Para cada j = 0, 1, 2, 3 encuentre una matriz Aj ∈ M (2 × 3) con dim(ker Aj ) = j, es decir:
encuentre matrices A0 , A1 , A2 , A3 con dim(ker A0 ) = 0, dim(ker A1 ) = 1, . . . . Si tal matriz
no existe, explique por qué no existe.

11. (a) Encuentre por lo menos dos diferentes funciones lineales biyectivas de M (2 × 2) a P3 .

FT
(b) Existe una función lineal biyectiva S : M (2 × 2) → Pk para k ∈ N, k 6= 3?

12. Sean V y W espacios vectoriales.

(a) Sea U ⊂ V un subspacio y sean u1 , . . . , uk ∈ U . Demuestre que gen{u1 , . . . uk } ⊂ U .


(b) Sean u1 , . . . , uk , w1 , . . . , wm ∈ V . Demuestre que lo siguiente es equivalente:
(i) gen{u1 , . . . , uk } = gen{w1 , . . . , wm }.
RA
(ii) Para todo j = 1, . . . , k tenemos uj ∈ gen{w1 , . . . , wm } y para todo ` = 1, . . . , m
tenemos w` ∈ gen{u1 , . . . , uk }.
(iii) Sean v1 , v2 , v3 , . . . , vm ∈ V y sea c ∈ R. Demuestre que
gen{v1 , v2 , v3 , . . . , vm } = gen{v1 + cv2 , v2 , v3 , . . . , vm }.
(c) Sean v1 , . . . , vk ∈ V y sea A : V → W una función lineal invertible. Demuestre que
dim gen{v1 , . . . , vk } = dim gen{Av1 , . . . , Avk }. Es verdad si A no es invertible?

     
1 0 1
D

13. (a) Sean ~v1 = 4 , ~v2 = 1 , ~v3 = 0 y sea B = {~v1 , ~v2 , ~v3 }. Demuestre que B es una
7 2 2
   
1 0
base de R3 y escriba los vectores ~x = 2 , ~y = 1 en términos de la base B.
3 1
     
1 2 3 2 3 2
14. Sean R = , S= , T = . Demuestre que B = {R, S, T } es una base del
0 3 0 7 0 1
espacio de las matrices triangulares superiores y exprese las matrices
     
1 1 0 0 1 0
K= , L= , M=
0 1 0 1 0 1
en términos de la base B.

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 6. Linear transformations and change of bases 229

       
1 3 −1 3
15. Sean ~a1 = , ~a2 = , ~b1 = , ~b2 = ∈ R2 y sean A = {~a1 , ~a2 }, B = {~b1 , ~b2 }.
2 1 1 2

(a) Demuestre qu A y B son bases de R2 .


 
7
(b) Sea (~x)A = . Encuentre (~x)B y ~x (en la representación estandar).
8
 
3
(c) Sea (~y )B = . Encuentre (~y )A y ~y (en la representación estandar).
5

     
2 −1 4
16. Sea B = {~b1 , ~b2 } una base de R2 y sean ~x1 = , ~x2 = , ~x3 = (dados en
3 1 6
coordenadas cartesianas).
   
3 3
(a) Si se sabe que ~x1 = , ~x2 = , es posible calcular ~b1 y ~b2 ? Si sı́, calcúlelos. Si
1 B 2 B

FT
no, explique por qué no es posible.
   
3 6
(b) Si se sabe que ~x1 = , ~x3 = , es posible calcular ~b1 y ~b2 ? Si sı́, calcúlelos. Si
1 B 2 B
no, explique por qué no es posible.
   
3 6
(c) ¿Existen ~b1 y ~b2 tal que ~x1 = , ~x2 = ? Si sı́, calcúlelos. Si no, explique por
1 B 2 B
qué no es posible.
   
3 2
RA
(d) ¿Existen ~b1 y ~b2 tal que ~x1 = , ~x3 = ? Si sı́, calcúlelos. Si no, explique por
1 B 5 B
qué no es posible.

17. (a) Demuestre que la siguente función es lineal:

Φ : M (2 × 2) → M (2 × 2), Φ(A) = At

(b) Sea B = {E1 , E2 , E3 , E4 } la base estandar1 de M (2 × 2) . Encuentre la matriz que repre-


D

senta a Φ con respecto a esta base.


       
1 2 1 0 0 1 1 0
(c) Sean R = , S= , T = , U = y sea C = {R, S, T, U }.
3 4 0 1 −1 0 1 0
Demuestre que C es una base de M (2 × 2) y escriba Φ como matriz con respecto a esta
base.

18. (a) Demuestre que T : P3 → P3 , T p = p0 es una función lineal.


(b) Determine ker(T ), Im(T ), dim(ker(T )), dim(Im(T )).
(c) Sea B = {1, X, X 2 , X 3 } la base estandar de P3 . Encuentre la matriz que representa a T
con respecto a esta base.
       
1E 1 0 0 1 0 0 0 0
1 = , E2 = , E3 = , E4 = .
0 0 0 0 1 0 0 1

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
230 6.6. Exercises

(d) Sean q1 = X +1, q2 = X −1, q3 = X 2 +X, q4 = X 3 +1. Demuestre que C = {q1 , q2 , q3 , q4 }


es una base de P3 . .
(e) Encuentre la matriz con respecto a la base C que representa a T .

FT
RA
D

Last Change: Sa 14. Mai 22:29:27 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 7

Orthonormal bases and orthogonal


projections in Rn

FT
In this chapter we will work in Rn and not in arbitrary vector spaces since we want to explore in
more detail its geometric properties. In particular we will discuss orthogonality. Note that in an
arbitrary vector space, we do not have the concept of angles or orthogonality. Everything that we
will discuss here can be extended to inner product spaces where the inner product is used to define
angles. Recall that we showed in Theorem 2.19 that for non-zero vectors ~x, ~y ∈ Rn the angle ϕ
between them satisfies the equation
h~x , ~y i
RA
cos ϕ = .
k~xk k~y k

In a general inner product space (V, h· , ·i) this equation is used to define the angle between two
vectors. In particular, two vectors are said to be orthogonal if their inner product is 0. Inner
product spaces are useful for instance in physics, and maybe in some not so distant future there
will be chapter in these lecture notes about them.
First we will define what the orthogonal complement of a subspace of Rn is and we will see that
the direct sum of a subspace and its orthogonal complement gives us all of Rn .
D

We already know what the orthogonal projection of a vector ~x onto another vector ~y 6= ~0 is (see
Section 2.3). Since it is independent of the norm of ~y , we can just as well consider it the orthogonal
projection of ~x onto the line generated by ~y . In this chapter we will generalise the concept of an
orthogonal projection onto a line to the orthogonal projection onto an arbitrary subspace.
As an application, we will discuss the minimal squares method for the approximation of data.

7.1 Orthonormal systems and orthogonal bases


Recall that two vectors ~x and ~y are orthogonal (or perpendicular ) to each other if and only if
h~x , ~y i = 0. In this case we write ~x ⊥ ~y .

231
232 7.1. Orthonormal systems and orthogonal bases

Definition 7.1. (i) A set of vectors ~x1 , . . . , ~xk ∈ Rn is called an orthogonal set if they are
pairwise orthogonal; in formulas we can write this as

h~xj , ~x` i = 0 for j 6= `.

(ii) A set of vectors ~x1 , . . . , ~xk ∈ Rn is called an orthonormal set if they are pairwise orthonormal;
in formulas we can write this as
(
1 for j = `,
h~xj , ~x` i =
0 for j 6= `.

The difference between an orthogonal and an orthonormal set is that in the latter we additionally
require that each vector of the set satisfies h~xj , ~xj i = 1, that is, that k~xj k = 1. Therefore an
orthogonal set may contain vectors of arbitrary lengths, including the vector ~0, whereas in an
orthonormal all vectors set must have length 1. Note that every orthonormal system is also an
orthogonal system. On the other hand, every orthogonal system which does not contain ~0 can be
converted to an orthonormal one by normalising each vector (that is, by dividing each vector by its

FT
norm).

Examples 7.2. (i) The following systems are orthogonal systems but not orthonormal systems
since the norm of at least one of their vectors is different from 1:
     
           1 0 0 
1 3 0 1 3
, , , , , 0 , 1 , −2 .
−1 3 0 −1 3
0 2 1
 
RA
(ii) The systems following systems are orthonormal systems:
     
      1 0 0 
1 1 1 1 1 1
√ , √ , 0 , √ 1 , √ −2 .
2 −1 2 1 
0 5 2 5 1

Lemma 7.3. Every orthonormal system is linearly independent.

Proof. Let ~x1 , . . . , ~xk be an orthonormal system and consider


D

~0 = α1 ~x1 + α2 ~x2 + · · · + αn−1 ~xn−1 + αn ~xn .

We have to show that all αj must be zero. To do this, we take the inner product on both sides
with the vectors ~xj . Let us start with ~x1 . We find

h~0 , ~x1 i = hα1 ~x1 + α2 ~x2 + · · · + αn−1 ~xn−1 + αn ~xn , ~x1 i


= α1 h~x1 , ~x1 i + α2 h~x2 , ~x1 i + · · · + αn−1 h~xn−1 , ~xn−1 i + αn h~xn , ~x1 i.

Since h~0 , ~x1 i = 0, h~x1 , ~x1 i = k~x1 k2 = 1 and h~x2 , ~x1 i = · · · = h~xn−1 , ~xn−1 i = h~xn , ~x1 i = 0, it
follows that
0 = α1 + 0 + · · · + 0 = α1 .
Now we can repeat this process with ~x2 , ~x3 , . . . , ~xn to show that α2 = · · · = αn = 0.

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 7. Orthonormal bases and orthogonal projections in Rn 233

Remark. The lemma shows that every system of n vectors in Rn is a basis of Rn .

Definition 7.4. An orthonormal basis of Rn is a basis whose vectors form an orthogonal set.
Occasionally we will write ONB for “orthonormal basis”.

Examples 7.5 (Orthonormal bases of Rn ).

(i) The canonical basis ~e1 , . . . , ~en is an orthonormal basis of Rn .

(ii) The following systems are examples of orthonormal bases of R2 :


               
1 1 1 1 1 2 1 −3 1 3 1 −4
√ , √ , √ , √ , , , .
2 −1 2 1 13 3 13 2 5 4 5 3

(iii) The following systems are examples of orthonormal bases of R3 :

FT
             
 1 1 1 −1   1 −3 1 
1 1 1 1 1
√ −1 , √ 1 , √  1 , √ 2 , √  0 , √ −5 .
 3 2 0 6   14 10 35
1 2 3 1 3

   
cos ϕ − sin ϕ
Exercise 7.6. Show that every orthonormal basis of R2 is of the form ,
    sin ϕ cos ϕ
cos ϕ sin ϕ
or , for some ϕ ∈ R. See also Exercise 7.13.
sin ϕ − cos ϕ
RA
We will see in Corollary 7.31 that every orthonormal system in Rn can be completed to an or-
thonormal basis. In Section 7.5 we will show how to construct an orthonormal basis of a subspace
of Rn from a given basis. In particular it follows that every subspace of Rn has an orthonormal
basis.
Orthonormal bases are very useful. Among other things it is very easy to write a given vector
~ ∈ Rn as a linear combination of such a basis. Recall that if we are given an arbitrary basis
w
~z1 , . . . , ~zn of Rn and we want to write a vector ~x as linear combination of this basis, then we have
D

to find coefficients α1 , . . . , αn such that ~x = α1 ~z1 +· · ·+αn ~zn , which means we have to solve a n×n
system in order to determine the coefficients. If however the given basis is an orthonormal basis,
then calculating the coefficients reduces to evaluating n inner products as the following theorem
shows.

Theorem 7.7 (Representation of a vector with respect to an ONB). Let ~x1 , . . . , ~xn be an
orthonormal basis of Rn and let w
~ ∈ Rn . Then

~ = hw
w ~ , ~x1 i~x1 + hw
~ , ~x2 i~x2 + · · · + hw
~ , ~xn i~xn .

Proof. Since ~x1 , . . . , ~xn is a basis of Rn , there are α1 , . . . , αn ∈ R such that

~ = α1 ~x1 + α2 ~x2 + · · · + αn ~xn .


w

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
234 7.2. Orthogonal matrices

Now let us take the inner product on both sides with ~xj for j = 1, . . . , n. Note that h~xk , ~xj i = 0
if k 6= j and that h~xj , ~xj i = k~xj k2 = 1.

hw
~ , ~xj i = hα1 ~x1 + α2 ~x2 + · · · + αn ~xn , ~xj i
= α1 h~x1 , ~xj i + α2 h~x2 , ~xj i + · · · + αn h~xn , ~xj i
= αj h~xj , ~xj i = αj .

Note that the proof of this theorem is essentially the same as that of Lemma 7.3. In fact, Lemma 7.3
follows from the theorem above if we choose w ~ = ~0.
Exercise 7.8. If ~x1 , . . . , ~xn are an orthogonal, but not necessarily orthonormal basis of Rn , then
we have for every w~ ∈ Rn that

hw
~ , ~x1 i hw
~ , ~x2 i hw
~ , ~xn i
w
~= ~x1 + ~x2 + · · · + ~xn .
k~x1 k2 k~x2 k2 k~xn k2

(You can either use a modified version of the proof of Theorem 7.7 or you define yj = k~xj k−1 ~xj ,

FT
show that ~y1 , . . . , ~yn is an orthogonal basis and apply the formula from Theorem 7.7.)

You should now have understood


• what an orthogonal system is,
• what an orthonormal system is,
• what an orthonormal basis is,
RA
• why orthogonal bases are useful,
• etc.
You should now be able to

• check if a given set of vectors is an orthogonal/orthonormal system,


• check if a given set of vectors is an orthogonal/orthonormal basis of the given space,
• check if a given basis is an orthogonal or orthonormal basis,
D

• give examples of orthonormal basis,


• find the coefficients of a given vector with respect to a given orthonormal or orthogonal
basis.
• etc.

7.2 Orthogonal matrices


We already saw that it is very easy to express a given vector as linear combination of the members
of an orthonormal basis. In this section we want to explore the properties of the transition matrices
between two orthonormal bases of Rn .
Let B = {~u1 , . . . , ~un } and C = {w ~ n } be orthonormal bases of Rn . Let Q = AB→C be the
~ 1, . . . , w
transition matrix from the basis B to the basis C. We know that its entries qij are the uniquely

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 7. Orthonormal bases and orthogonal projections in Rn 235

determined numbers such that


   
q11 q1n
 .. 
~ 1 + · · · + qn1 w
~u1 =  .  = q11 w ~ n, ..., ~un =  ...  = q1n w
~ 1 + · · · + qnn w
~ n.
 

qn1 C
qnn C

Since C is an orthonormal basis, it follows that qij = h~uj , w


~ i i, see Theorem 7.7. Therefore
 
 h~u1 , w
~ 1 i h~u2 , w
~ 1i h~un , w
~ 1 i
 h~u1 , w
~ 2 i h~u2 , w
~ 2i h~un , w
~ 2 i
 
AB→C =  
.

 
 
 
h~u1 , w
~ n i h~u2 , w
~ ni h~un , w
~ ni

If we exchange the role of B and C and use that hw ~ i , ~uj i = h~uj , w~ i i, then we obtain

FT
   
hw
~ 1 , ~u1 i hw
~ 2 , ~u1 i hw
~ n , ~u1 i  h~u1 , w ~ 1 i h~u1 , w~ 2i h~u1 , w
~ n i
hw
~ 1 , ~u2 i hw
~ 2 , ~u2 i hw
~ n , ~u2 i  h~u2 , w~ 1 i h~u2 , w~ 2i h~u2 , w
~ n i
   
AC→B =  
=
  .

   
   
   
hw
~ 1 , ~un i hw
~ 2 , ~un i hw~ n , ~un i h~un , w
~ 1 i h~un , w~ 2i h~un , w
~ ni
RA
This shows that AC→B = (AB→C )t . If we use that AC→B = (AB→C )−1 , then we find that

(AB→C )−1 = (AB→C )t .

From these calculations, we obtain the following lemma.

Lemma 7.9. Let B = {~u1 , . . . , ~un } and C = {w ~ n } be orthonormal bases of Rn and let
~ 1, . . . , w
Q = AB→C be the transition matrix from the basis B to the basis C. Then

Qt = Q−1 .
D

Definition 7.10. A matrix A ∈ M (n × n) is called an orthogonal matrix if it is invertible and


At = A−1 .

Proposition 7.11. Let Q ∈ M (n × n). Then the following is equivalent:

(i) Q is an orthogonal matrix.


(ii) Qt is an orthogonal matrix.
(iii) Q−1 exists and is an orthogonal matrix.

Proof. (i) =⇒ (ii): Assume that Q is orthogonal. Then it is invertible, hence also Qt is invertible
by Theorem 3.50 and (Qt )−1 = (Q−1 )t = (Qt )t = Q holds. Hence Qt is an orthogonal matrix.

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
236 7.2. Orthogonal matrices

(ii) =⇒ (i): Assume that Qt is an orthogonal matrix. Then (Qt )t = Q must be an orthogonal
matrix too by what we just proved.
(i) =⇒ (iii): Assume that Q is orthogonal. Then it is invertible and (Q−1 )−1 = (Qt )−1 = (Q−1 )t
where in the second step we used Theorem 3.50. Hence Q−1 is an orthogonal matrix.
(iii) =⇒ (i): Assume that Q−1 is an orthogonal matrix. Then its inverse (Q−1 )−1 = Q must be
an orthogonal matrix too by what we just proved.

By Lemma 7.9, every transition matrix from one ONB to another ONB is an orthogonal matrix.
The reverse is also true as the following theorem shows.

Theorem 7.12. Let Q ∈ M (n × n). Then:

(i) Q is an orthogonal matrix if and only if its columns are an orthonormal basis of Rn .

(ii) Q is an orthogonal matrix if and only if its rows are an orthonormal basis of Rn .

FT
(iii) If Q is an orthgonal matrix, then | det Q| = 1.

Proof. (i): Assume that Q is an orthogonal matrix and let ~cj be its columns. We already know
that they are a basis of Rn since Q is invertible. In order to show that they are also an orthonormal
system, we calculate
 
h~c , ~c i h~c1 , ~c2 i h~c1 , ~cn i
 1 1
RA
 
~c1

h~c2 , ~c1 i h~c2 , ~c2 i h~c2 , ~cn i
 
id = Qt Q =  ...  (~c1 | · · · | ~cn ) =  . (7.1)
 
 
~cn
 
 
 
h~cn , ~c1 i h~cn , ~c2 i h~cn , ~cn i

Since the product is equal to the identity matrix, it follows that all the elements on the diagonal
must be equal to 1 and all the other elements must be equal to 0. This means that h~cj , ~cj i = 1 for
j = 1, . . . , n and h~cj , ~ck i = 0 for j 6= k, hence the columns of Q are an orthonormal basis of Rn .
D

Now assume that the columns ~c1 , . . . , ~cn of Q are an orthonormal basis of Rn . Then clearly (7.1)
holds which shows that Q is an orthogonal matrix.
(ii): The rows of Q are the columns of Qt hence they are an orthonormal basis of Rn by (i) and
Proposition 7.11 (ii).
(iii): Recall that det Qt = det Q. Therefore we obtain

1 = det id = det(QQt ) = (det Q)(det Qt ) = (det Q)2 ,

which proves the claim.


 
1 1
Clearly, not every matrix R with | det R| = 1 is an orthogonal matrix. For instance, if R = ,
0 1
Last Change: Mo 16. Mai 00:43:18 CEST 2022
Linear Algebra, M. Winklmeier
Chapter 7. Orthonormal bases and orthogonal projections in Rn 237

   
1 −1 1 0
then det R = 1, but R−1 = is different from Rt = .
0 1 1 1

Question 7.1
Assume that ~a1 , . . . , ~an ∈ Rn are pairwise orthogonal and let R ∈ M (n × n) be the matrix whose
columns are the given vectors. Can you calculate Rt R and RRt ? What are the conditions on the
vectors such that R is invertible? If it is invertible, what is its inverse? (You should be able to
answer the above questions more or less easily if k~aj k = 1 for all j = 1, . . . , n because in this case
R is an orthogonal matrix.)

 
cos ϕ − sin ϕ
Exercise 7.13. Show that every orthogonal 2 × 2 matrix is of the form Q =
sin ϕ cos ϕ
 

FT
cos ϕ sin ϕ
or Q = . Compare this with Exercise 7.6.
sin ϕ − cos ϕ

Exercise 7.14. Use the results from Section 4.3 to prove that | det Q| = 1 if Q is an orthogonal
2 × 2 or 3 × 3 matrix.
RA
It can be shown that every orthogonal matrix represents either a rotation (if its determinant is 1)
or the composition of a rotation and a reflection (if its determinant is −1).

Orthogonal matrices in R2 . Let Q ∈ M (2 × 2) be an orthogonal matrix with columns  ~c1and


cos ϕ
~c2 . Recall that Q~e1 = ~c1 and Q~e2 = ~c2 . Since ~c1 is a unit vector, it is of the form ~c1 = for
sin ϕ
some ϕ ∈ R. Since ~c2 is also a unit
 vector  and in addition
 must
 be orthogonal to ~c1 , there are only
+ − sin ϕ − sin ϕ
the two possible choices ~c2 = or ~c2 = , see Figure 7.1.
− cos ϕ
D

cos ϕ

 
cos ϕ − sin ϕ
• In the first case, det Q = det(~c1 |~c2 + ) = det = cos2 ϕ + sin2 ϕ = 1 and Q
sin ϕ cos ϕ
represents the rotation by ϕ counterclockwise.

 
cos ϕ sin ϕ

• In the second case, det Q = det(~c1 |~c2 ) = det = − cos2 ϕ − sin2 ϕ = −1.
sin ϕ − cos ϕ
and Q represents the rotation by ϕ counterclockwise followed by a reflection on the direction
given by ~c1 (or: reflection on the x-axis followed by the rotation by ϕ counterclockwise).

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
238 7.2. Orthogonal matrices

y
Q y
cos ϕ 
~c1 = sin ϕ
~e2 − sin ϕ
~c2 + =

cos ϕ

ϕ
(a) x x
~e1

y
Q y
cos ϕ 
~c1 = sin ϕ
~e2

ϕ
(b) x x
~e1

FT
~c2 − = sin ϕ

− cos ϕ

Figure 7.1: In case (a), Q represents a rotation and det A = 1. In case (b) it represents rotation
followed by a reflection and det Q = −1.

Exercise 7.15. Let Q be an orthogonal n × n matrix. Show the following.


(i) Q preserves inner products, that is h~x , ~y i = hQ~x , Q~y i for all ~x, ~y ∈ Rn .
RA
(ii) Q preserves lengths, that is k~xk = kQ~xk for all ~x ∈ Rn .
(iii) Q preserves angles, that is ^(~x, ~y ) = ^(Q~x, Q~y ) for all ~x, ~y ∈ Rn \ {~0}.

Exercise 7.16. Let Q ∈ M (n × n)


(i) Assume that Q preserves inner products, that is h~x , ~y i = hQ~x , Q~y i for all ~x, ~y ∈ Rn . Show
that Q is an orthogonal matrix.
(ii) Assume that Q preserves lengths, that is k~xk = kQ~xk for all. Show that Q is an orthogonal
D

matrix.
Exercise 7.15 together with Exercise 7.16 show the following.

A matrix Q is an orthogonal matrix if and only if it preserves lengths if and only if it preserves
angles. That is

Q is orthogonal ⇐⇒ Qt = Q−1
⇐⇒ hQ~x , Q~y i = h~x , ~y i for all ~x, ~y ∈ Rn
⇐⇒ kQ~xk = k~xk for all ~x ∈ Rn .

Definition 7.17. A linear transformation T : Rn → Rm is called an isometry if kT ~xk = k~xk for


all ~x ∈ Rn .

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 7. Orthonormal bases and orthogonal projections in Rn 239

Note that every isometry is injective since T ~x = ~0 if and only if ~x = ~0, therefore necessarily n ≤ m.

You should now have understood


• that a matrix is orthogonal if and only if it represents change of bases between two orthonor-
mal bases,
• that an orthogonal matrix represents either a rotation or a rotation composed with a reflec-
tion,
• etc.
You should now be able to
• check if a given matrix is an orthogonal matrix,
• construct orthogonal matrices,
• etc.

FT
7.3 Orthogonal complements
The first part of this section works for all vector spaces, not necessarily Rn .

Proposition 7.18. Let U, W be subspaces of a vector space V . Then their intersection U ∩ W is


a subspace of V .

Proof. Clearly, U ∩ W 6= ∅ because O ∈ U and O ∈ W , hence O ∈ U ∩ W . Now let z1 , z2 ∈ U ∩ W


RA
and c ∈ K. Then z1 , z2 ∈ U and therefore z1 + cz2 ∈ U because U is a vector space. Analogously
it follows that z1 + cz2 ∈ W , hence z1 + cz2 ∈ U ∩ W .

Observe that U ∩ W is the largest subspace which is contained both in U and in V .


For example, the intersection of two planes in R3 which pass through the origin is either that same
plane (if the two original planes are the same plane), or it is a line passing through the origin. In
either case, it is a subspace of R3 .
Observe however that in general the union of two vector spaces in general is not a vector space.
D

Exercise. • Give an example of two subspaces whose union is not a vector space.
• Give an example of two subspaces whose union is a vector space.

Question 7.2. Union of subspaces.


Can you find a criterion that subspaces must satisfy such that their union is a subspace?

Let us define the sum and the direct sum of vector spaces.

Definition 7.19. Let U, W be subspaces of a vector space V . Then the sum of the vector spaces
U and W is defined as
U + W = {u + w : u ∈ U, w ∈ W }. (7.2)

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
240 7.3. Orthogonal complements

If in addition U ∩ W = {O}, then the sum is called the direct sum of U and W and one writes
U ⊕ W instead of U + W .

Remark. Let U, W be subspaces of a vector space V . Then U + W is again a subspace of V .

Proof. Clearly, U + W 6= ∅ because O ∈ U and O ∈ W , hence O + O = O ∈ U + W . Now let


z1 , z2 ∈ U + W and c ∈ K. Then there exist u1 , u2 ∈ U and w1 , w2 ∈ W with z1 = u1 + w1 and
z2 = u2 + w2 . Therefore
z1 + cz2 = u1 + w1 + c(u2 + w2 ) = (u1 + cu2 ) + (w1 + cw2 ) ∈ U + W
and U + W is a subspace by Proposition 5.10.
Note that U + W consists of all possible linear combinations of vectors from U and from W . We
obtain immediately the following observations.

Remark 7.20. (i) Assume that U = span{u1 , . . . , uk } and that W = span{w1 , . . . , wj }, then
U + W = span{u1 , . . . , uk , w1 , . . . , wj }.

FT
(ii) The space U + W is the smallest vector space which contains both U and W .

Examples 7.21. (i) Let V be a vector space and let U ⊆ V be a subspace. Then we always
have:
(a) U + {O} = U ⊕ {O} = U ,
(b) U + U = U ,
(c) U + V = V .
RA
If U and W are subspaces of V , then
(a) U ⊆ U + W and W ⊆ U + W .
(b) U + W = U if and only if W ⊆ U .
(ii) Let U and W be lines in R2 passing through the origin. Then they are subspaces of R2 and
we have that U + W = U if the lines are parallel and U + W = R2 if they are not parallel.
(iii) Let U and W be lines in R3 passing through the origin. Then they are subspaces of R3 and
D

we have that U + W = U if the lines are parallel; otherwise U + W is the plane containing
both lines.
(iv) Let U be a line and W be a plane in R3 , both passing through the origin. Then they are
subspaces of R3 and we have that U + W = W if the line U is contained in W . If not, then
U + W = R3 .

Prove the statements in the examples above.


Recall that the intersection of two subspaces is again a subspace, see Proposition 7.18. The formula
for the dimension of the sum of two vector spaces in the next proposition can be understood as
follows: If we sum the dimension of the two vector spaces, then we count the part which is common to
both spaces twice; therefore we have to subtract its dimension in order to get the correct dimension
of the sum of the vector spaces.

Last Change: Mo 16. Mai 00:43:18 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 7. Orthonormal bases and orthogonal projections in Rn 241

Proposition 7.22. Let U, W be subspaces of a vector space V . Then

dim(U + W ) = dim U + dim V − dim(U ∩ W ).

In particular, dim(U + W ) = dim U + dim V if U ∩ V = {O}.

Proof. Let dim U = k and dim W = m. Recall that U ∩ W is a subspace of V . and that U ∩ W ⊆ U
and U ∩W ⊆ W . Let v1 , . . . , v` be a basis of U ∩W . By Theorem 5.46 we can complete it to a basis
v1 , . . . , v` , u`+1 , . . . , uk of U . Similarly, we can complete it to a basis v1 , . . . , v` , w`+1 , . . . , wm of
W . Now we claim that v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm is a basis of U + W .

• First we show that the vectors v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm generate U + W . This


follows from Remark 7.20 and

U + W = span{v1 , . . . , v` , u`+1 , . . . , uk } + span{v1 , . . . , v` , w`+1 , . . . , wm }


= span{v1 , . . . , v` , u`+1 , . . . , uk , v1 , . . . , v` , w`+1 , . . . , wm }
= span{v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm }.

FT
• Now we show that the vectors v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm are linearly indepen-
dent. Let α1 , . . . , αn , β`+1 , . . . , βm ∈ R such that

α1 v1 + · · · + α` v` + α`+1 u`+1 + · · · + αk uk + β`+1 w`+1 + · · · + βm wm = O.

It follows that
α1 v1 + · · · + α` v` + α`+1 u`+1 + · · · + αk uk = −(β`+1 w`+1 + · · · + βm wm ) (7.3)
| {z } | {z }
∈U ∈W

and therefore −(β`+1 w`+1 + · · · + βm wm ) ∈ U ∩ W hence it must be a linear combination of


the vectors v1 , . . . , v` because they are a basis of U ∩ W . So we can find γ1 , . . . , γ` ∈ R such
that γ1 v1 + · · · + γ` v` = −(β`+1 w`+1 + · · · + βm wm ). This implies that

γ1 v1 + · · · + γ` v` + β`+1 w`+1 + · · · + βm wm = O.

Since the vectors v1 , . . . , v` , w`+1 , . . . , wm form a basis of W , they are linearly independent,
and we conclude that γ1 = · · · = γ` = β`+1 = · · · = βm = 0. Inserting in (7.3), we obtain

α1 v1 + · · · + α` v` + α`+1 u`+1 + · · · + αk uk = O,

hence α1 = · · · = αk = 0.

It follows that

dim(U + W ) = #{v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm }


= ` + (k − `) + (m − `)
=k+m−`
= dim U + dim W − dim(U ∩ W ).


For the rest of this section, we will work in Rn . First let us define the orthogonal complement of a
given subspace.

Definition 7.23. Let U be a subspace of Rn .

(i) We say that a vector ~x ∈ Rn is perpendicular to U if it is


perpendicular to every vector in U . In this case we write ~x ⊥ U .

(ii) The orthogonal complement of U is denoted by U ⊥ and it is the set of all vectors which are
perpendicular to every vector in U , that is

U ⊥ = {~x ∈ Rn : ~x ⊥ U } = {~x ∈ Rn : ~x ⊥ ~u for every ~u ∈ U }.

We start with some easy observations.

Remark 7.24. Let U be a subspace of Rn .

(i) U ⊥ is a subspace of Rn .

(ii) U ∩ U ⊥ = {~0}.

(iii) (Rn )⊥ = {~0}, {~0}⊥ = Rn .

Proof. (i) Clearly, ~0 ∈ U ⊥ . Let ~x, ~y ∈ U ⊥ and let c ∈ R. Then for every ~u ∈ U we have that
h~x + c~y , ~ui = h~x , ~ui + ch~y , ~ui = 0, hence ~x + c~y ∈ U ⊥ and U ⊥ is a subspace by Theorem 5.10.
(ii) Let ~x ∈ U ∩ U ⊥ . Then it follows that ~x ⊥ ~x, hence k~xk2 = h~x , ~xi = 0 which shows that ~x = ~0
and therefore U ∩ U ⊥ consists only of the vector ~0.

(iii) Assume that ~x ∈ (Rn )⊥ . Then ~x ⊥ ~y for every ~y ∈ Rn , in particular also ~x ⊥ ~x. Therefore
k~xk2 = h~x , ~xi = 0 which shows that ~x = ~0. It follows that (Rn )⊥ = {~0}.

It is clear that h~x , ~0i = 0, hence Rn ⊆ {~0}⊥ ⊆ Rn which proves that {~0}⊥ = Rn .

Examples 7.25. (i) The orthogonal complement of a line in R2 is again a line, see Figure 7.2.

(ii) The orthogonal complement of a line in R3 is the plane perpendicular to the given line. The
orthogonal complement to a plane in R3 is the line perpendicular to the given plane, see
Figure 7.2.

The next goal is to show that dim U + dim U ⊥ = n and to establish a method for calculating
U ⊥ . To this end, the following lemma is useful. It tells us that in order to verify that some ~x is
perpendicular to U we do not have to check that ~x ⊥ ~u for every ~u ∈ U , but that it is enough to
check it for a set of vectors ~u which generate U .

Lemma 7.26. Let U = span{~u1 , . . . , ~uk } ⊆ Rn . Then ~x ∈ U ⊥ if and only if ~x ⊥ ~uj for every
j = 1, . . . , k.


[Figure 7.2]

Figure 7.2: The figure on the left shows the orthogonal complement of the line L in R2 which is the line G. The figure on the right shows the orthogonal complement of the plane U in R3 which is the line H. Note that the orthogonal complement of G is L.

Proof. Suppose that ~x ⊥ U , then ~x ⊥ ~u for every ~u ∈ U , in particular for the generating vectors
~u1 , . . . , ~uk . Now suppose that ~x ⊥ ~uj for all j = 1, . . . , k. Let ~u ∈ U be an arbitrary vector in U .
Then there exist α1 , . . . , αk ∈ R such that ~u = α1 ~u1 + · · · + αk ~uk . So we obtain

h~x , ~ui = h~x , α1 ~u1 + · · · + αk ~uk i = α1 h~x , ~u1 i + · · · + αk h~x , ~uk i = 0.

Since ~u can be chosen arbitrarily in U , it follows that ~x ⊥ U .


The lemma above leads to a method for calculating the orthogonal complement of a given subspace
U of Rn as follows. Note that essentially it was already proved in Theorem 6.40.

Lemma 7.27. Let U = span{~u1 , . . . , ~uk } ⊆ Rn and let A be the matrix whose rows consist of the
vectors ~u1 , . . . , ~uk . Then

U ⊥ = ker A. (7.4)

Proof. Let ~x ∈ Rn . By Lemma 7.26 we know that ~x ∈ U ⊥ if and only if ~x ⊥ ~uj for every
j = 1, . . . , k. This is the case if and only if

h~u1 , ~xi = 0
h~u2 , ~xi = 0
    ...
h~uk , ~xi = 0

which can be written in matrix form as

[ ~u1 ]        [ 0 ]
[ ~u2 ]        [ 0 ]
[  .. ]  ~x =  [ . ]
[ ~uk ]        [ 0 ]

which is the same as A~x = ~0 by definition of A. In conclusion, ~x ⊥ U if and only if A~x = ~0, that is,
if and only if ~x ∈ ker A.
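The identity (7.4) also gives a convenient way to compute an orthogonal complement numerically. The following sketch uses Python with NumPy and SciPy; these libraries are not part of these notes and the snippet is only an illustration of the lemma, applied to the generators of Example 7.32 below.

```python
import numpy as np
from scipy.linalg import null_space

# Rows of A are the generators of U (the vectors from Example 7.32 below).
A = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 0.0, 1.0, 0.0]])

# U^perp = ker A; null_space returns an orthonormal basis of the kernel,
# one basis vector per column.
U_perp_basis = null_space(A)

# Check: every generator of U is orthogonal to every basis vector of U^perp.
print(np.allclose(A @ U_perp_basis, 0))   # True
print(U_perp_basis)                       # a 4 x 2 array
```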


In Example 7.32 we will calculate the orthogonal complement of a subspace of R4 .

The next two theorems are the main results of this section.

Theorem 7.28. For every subspace U ⊆ Rn we have that

dim U + dim U ⊥ = n. (7.5)

Proof. Let ~u1 , . . . , ~uk be a basis of U . Note that k = dim U . Then we have in particular U =
span{~u1 , . . . , ~uk }. As in Lemma 7.27 we consider the matrix A ∈ M (k × n) whose rows are the
vectors ~u1 , . . . , ~uk . Then U ⊥ = ker A, so

dim U ⊥ = dim(ker A) = n − dim(Im A).

Note that dim(Im A) is the dimension of the column space of A which is equal to the dimension of
the row space of A by Proposition 6.32. Since the vectors ~u1 , . . . , ~uk are linearly independent, this
dimension is equal to k. Therefore dim U ⊥ = n − k = n − dim U . Rearranging, we obtain the

desired formula dim U ⊥ + dim U = n.
(We could also have said that the reduced form of A cannot have any zero row because its rows
are linearly independent. Therefore the reduced form must have k pivots and we obtain dim U ⊥ =
dim(ker A) = n − #(pivots of the reduced form of A) = n − k = n − dim U . We basically re-proved
Proposition 6.32.)

Theorem 7.29. Let U ⊆ Rn be a subspace of Rn . Then the following holds.


(i) U ⊕ U ⊥ = Rn .

(ii) (U ⊥ )⊥ = U .

Proof. (i) Recall that U ∩ U ⊥ = {~0} by Remark 7.24, therefore the sum is a direct sum. Now let
us show that U + U ⊥ = Rn . Since U + U ⊥ ⊆ Rn , we only have to show that dim(U + U ⊥ ) =
n because the only n-dimensional subspace of Rn is Rn itself, see Theorem 5.52. From
Proposition 7.22 and Theorem 7.28 we obtain

dim(U + U ⊥ ) = dim(U ) + dim(U ⊥ ) − dim(U ∩ U ⊥ ) = dim(U ) + dim(U ⊥ ) = n

where we used that dim(U ∩ U ⊥ ) = dim{~0} = 0.

(ii) First let us show that U ⊆ (U ⊥ )⊥ . To this end, fix ~u ∈ U . Then, for every ~y ∈ U ⊥ , we have
that h~u , ~y i = 0, hence ~u ⊥ U ⊥ , that is, ~u ∈ (U ⊥ )⊥ . Note that dim(U ⊥ )⊥ = n − dim U ⊥ =
n − (n − dim U ) = dim U . Since we already know that U ⊆ (U ⊥ )⊥ , it follows that they must
be equal by Theorem 5.52.

The next proposition shows that every subspace of Rn has an orthonormal basis. Another proof of
this fact will be given later when we introduce the Gram-Schmidt process in Section 7.5.

Proposition 7.30. Every subspace U ⊆ Rn with dim U > 0 has an orthonormal basis.


Proof. Let U be a subspace of Rn with dim U = k > 0. Then dim U ⊥ = n − k and we can choose
a basis ~wk+1 , . . . , ~wn of U ⊥ . Let A0 ∈ M ((n − k) × n) be the matrix whose rows are the vectors
~wk+1 , . . . , ~wn . Since U = (U ⊥ )⊥ , we know that U = ker A0 . Pick any ~u1 ∈ ker A0 with ~u1 6= ~0.
Then ~u1 ∈ U . Now we form the new matrix A1 ∈ M ((n−k+1)×n) by adding ~u1 as a new row to the
matrix A0 . Note that the rows of A1 are linearly independent, so dim ker(A1 ) = n−(n−k+1) = k−1.
If k−1 > 0, then we pick any vector ~u2 ∈ ker A1 with ~u2 6= ~0. This vector is orthogonal to all the rows
of A1 , in particular it belongs to U (since it is orthogonal to w ~ k+1 , . . . , w
~ n ) and it is perpendicular
to ~u1 ∈ U . Now we form the matrix A2 ∈ M ((n−k +2)×n) by adding the vector ~u2 as a row to A1 .
Again, the rows of A2 are linearly independent and therefore dim(ker A2 ) = n − (n − k + 2) = k − 2.
If k − 2 > 0, then we pick any vector ~u3 ∈ ker A2 with ~u3 6= ~0. This vector is orthogonal to all
the rows of A2 , in particular it belongs to U (since it is orthogonal to w ~ k+1 , . . . , w
~ n ) and it is
perpendicular to ~u1 , ~u2 ∈ U . We continue this process until we have vectors ~u1 , . . . , ~uk ∈ U which
are pairwise orthogonal and the matrix Ak ∈ M (n × n) consists of linearly independent rows, so its
kernel is trivial. By construction, ~u1 , . . . , ~uk is an orthogonal system of k vectors in U with none of
them being equal to ~0. Hence they are linearly independent and therefore they are an orthogonal
basis of U since dim U = k. In order to obtain an orthonormal basis we only have to normalise
each of the vectors.

Corollary 7.31. Every orthonormal system in Rn can be completed to an orthonormal basis.

Proof. Let ~w1 , . . . , ~wk be an orthonormal system in Rn and let W = span{~w1 , . . . , ~wk }. By Proposition 7.30 we can find an orthonormal basis ~u1 , . . . , ~un−k of W ⊥ (take U = W ⊥ in the proposition). Then ~w1 , . . . , ~wk , ~u1 , . . . , ~un−k is an orthonormal basis of W ⊕ W ⊥ = Rn .

We conclude this section with a few examples.
Example 7.32. Find a basis for the orthogonal complement of
   

 1 1 
   
2 , 0 .

U = span 


 3 1 


4 0
 

Solution. Recall that ~x ∈ U ⊥ if and only if it is perpendicular to the vectors which generate U .

Therefore ~x ∈ U ⊥ if and only if it belongs to the kernel of the matrix whose rows are the generators
of U . So we calculate
     
[ 1 2 3 4 ]      [ 1  2  3  4 ]      [ 1 0 1 0 ]
[ 1 0 1 0 ]  −→  [ 0 −2 −2 −4 ]  −→  [ 0 1 1 2 ] .

Hence a basis of U ⊥ is given by

~w1 = (0, −2, 0, 1)t ,    ~w2 = (−1, −1, 1, 0)t .


Example 7.33. Find an orthonormal basis for the orthogonal complement of


   

 1 1 

2 0
 
U = span 
3 ,
  .
1

 
4 0
 

Solution. We will use the method from Proposition 7.30. Another solution of this exercise will be
given in Example 7.48. From the solution of Example 7.32 we can take the first basis vector w ~ 1.
We append it to the matrix from the solution of Example 7.32 and reduce the new matrix (note
that the first few steps are identical to the reduction of the original matrix). We obtain
     
[ 1  2 3 4 ]      [ 1  0 1 0 ]      [ 1 0 1 0 ]
[ 1  0 1 0 ]  −→  [ 0  1 1 2 ]  −→  [ 0 1 1 2 ]
[ 0 −2 0 1 ]      [ 0 −2 0 1 ]      [ 0 0 2 5 ]

whose kernel is generated by (5, 1, −5, 2)t . Hence an orthonormal basis of U ⊥ is given by

~y1 = (1/√5) (0, −2, 0, 1)t ,    ~y2 = (1/√55) (5, 1, −5, 2)t .

You should now have understood

• the concept of sum and direct sum of two subspaces,



• why the formula dim(U + W ) = dim U + dim W − dim(U ∩ W ) makes sense,


• the concept of the orthogonal complement,
• in particular the geometric interpretation of the orthogonal complement of a subspace (at
least in R2 and R3 ),
• etc.

You should now be able to


• find the orthogonal complement of a given subspace of Rn ,
• find an orthogonal basis of a given subspace of Rn ,
• etc.


7.4 Orthogonal projections


Recall that in Section 2.3 we discussed the orthogonal projection of one vector onto another in R2 .
~ ∈ Rn with w
This can clearly be extended to higher dimensions. Let ~v , w ~ 6= ~0. Then

h~v , wi
~
projw~ ~v := w
~ (7.6)
~ 2
kwk

is the unique vector in Rn which is parallel to ~w and satisfies that ~v − projw~ ~v is orthogonal to ~w.
We already know that the projection is independent of the length of ~w. So projw~ ~v should be
regarded as the projection of ~v onto the one-dimensional subspace generated by ~w.
In this section we want to generalise this to orthogonal projections on higher dimensional subspaces,
for instance you could think of the projection in R3 onto a given plane. Then, given a subspace U
of Rn , we want to define the orthogonal projection as the function from Rn to Rn which assigns to
each vector ~v its orthogonal projection onto U . We start with the analogue of Theorem 2.22.

Theorem 7.34 (Orthogonal projection). Let U ⊆ Rn be a subspace and let ~v ∈ Rn . Then there

exist uniquely determined vectors ~vk and ~v⊥ such that

~vk ∈ U, ~v⊥ ⊥ U and ~v = ~vk + ~v⊥ . (7.7)

The vector ~vk is called the orthogonal projection of ~v onto U ; it is denoted by projU ~v .

Proof. First we show the existence of the vectors ~vk and ~v⊥ . If U = Rn , we take ~vk = ~v and
~v⊥ = ~0. If U = {~0}, we take ~vk = ~0 and ~v⊥ = ~v . Otherwise, let 0 < dim U = k < n. Choose
orthonormal bases ~u1 , . . . , ~uk of U and w ~ n of U ⊥ . This is possible by Theorem 7.28
~ k+1 , . . . , w
and Proposition 7.30. Then ~u1 , . . . , ~uk , w ~ n is an orthonormal basis of Rn and for every
~ k+1 , . . . , w
n
~v ∈ R we find with the help of Theorem 7.7 that

~v = h~u1 , ~v i~u1 + · · · + h~uk , ~v i~uk + h~wk+1 , ~v i~wk+1 + · · · + h~wn , ~v i~wn ,

where the first k terms lie in U and the remaining terms lie in U ⊥ .

If we set ~vk = h~u1 , ~v i~u1 + · · · + h~uk , ~v i~uk and ~v⊥ = h~wk+1 , ~v i~wk+1 + · · · + h~wn , ~v i~wn , then they have the desired properties.

Next we show uniqueness of the decomposition of ~v . Assume that there are vectors ~vk and ~zk ∈ U
and ~v⊥ and ~z⊥ ∈ U ⊥ such that ~v = ~vk + ~v⊥ and ~v = ~zk + ~z⊥ . Then ~vk + ~v⊥ = ~zk + ~z⊥ and,
rearranging, we find that
~vk − ~zk = ~z⊥ − ~v⊥ .
| {z } | {z }
∈U ∈U ⊥

Since U ∩ U ⊥ = {~0}, it follows that ~vk − ~zk = ~0 and ~z⊥ − ~v⊥ = ~0, and therefore ~zk = ~vk and
~z⊥ = ~v⊥ .

Definition 7.35. Let U be a subspace of Rn . Then we define the orthogonal projection onto U as
the map which sends ~v ∈ Rn to its orthogonal projection onto U . It is usually denoted by PU , so

PU : Rn → R n , PU ~v = projU ~v .


Remark 7.36 (Formula for the orthogonal projection). The proof of Theorem 7.34 indicates
how we can calculate the orthogonal projection onto a given subspace U ⊆ Rn . If ~u1 , . . . , ~uk is an
orthonormal basis of U , then
PU ~v = h~u1 , ~v i~u1 + · · · + h~uk , ~v i~uk . (7.8)
This shows that PU is a linear transformation since PU (~x + c~y ) = PU ~x + cPU ~y follows easily from
(7.8).
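Formula (7.8) is easy to implement. The following sketch uses Python with NumPy, which is not part of these notes; it is only an illustration and assumes that the orthonormal basis vectors of U are stored as the columns of a matrix Q.

```python
import numpy as np

def project_onto_subspace(Q, v):
    """Orthogonal projection of v onto U = span of the columns of Q.

    The columns of Q are assumed to be orthonormal, so formula (7.8) applies:
    P_U v = <u_1, v> u_1 + ... + <u_k, v> u_k.
    """
    return sum(np.dot(u, v) * u for u in Q.T)

# Example: U = the xy-plane in R^3, spanned by the orthonormal vectors e1, e2.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
v = np.array([3.0, -1.0, 7.0])
print(project_onto_subspace(Q, v))   # [ 3. -1.  0.]
```

The same result can be written more compactly as Q @ (Q.T @ v), which is exactly (7.8) in matrix form.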

Exercise. If ~u1 , . . . , ~uk is an orthogonal basis of U (but not necessarily orthonormal), show that

h~u1 , ~v i h~uk , ~v i
PU ~v = ~u1 + · · · + ~uk . (7.9)
k~u1 k2 k~uk k2

Remark 7.37 (Formula for the orthogonal projection for dim U = 1). If dim U = 1, we obtain again the formula (7.6) which we already know from Section 2.3. To see this, choose ~w ∈ U with ~w 6= ~0. Then ~w0 = k~wk−1 ~w is an orthonormal basis of U and according to (7.8) we have that

proj~w ~v = projU ~v = h~w0 , ~v i ~w0 = k~wk−1 h~w , ~v i k~wk−1 ~w = k~wk−2 h~w , ~v i ~w = ( h~w , ~v i / k~wk² ) ~w .

Remark 7.38 (Pythagoras’s Theorem). Let U be a subspace of Rn , ~v ∈ Rn and let ~vk and ~v⊥
be as in Theorem 7.34. Then
k~v k2 = k~vk k2 + k~v⊥ k2 .

Proof. Using that ~vk ⊥ ~v⊥ , we find


k~v k2 = h~v , ~v i = h~vk + ~v⊥ , ~vk + ~v⊥ i = h~vk , ~vk i + h~vk , ~v⊥ i + h~v⊥ , ~vk i + h~v⊥ , ~v⊥ i
= h~vk , ~vk i + h~v⊥ , ~v⊥ i = k~vk k2 + k~v⊥ k2 .

Exercise 7.39. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk and let w
~ k+1 , . . . , w~ n be a basis
of U ⊥ . Find the matrix representation of PU with respect to the basis ~u1 , . . . , ~uk , w~ k+1 , . . . , w
~ n.

Exercise 7.40. Let U be a subspace of Rn . Show that PU ⊥ = id −PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.39.)

Exercise 7.41. Let U be a subspace of Rn . Show that (PU )2 = PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.39.)

Exercise 7.42. Let U be a subspace of Rn .


(i) Find ker PU and Im PU .
(ii) Find PU ⊥ PU and PU PU ⊥ .

In Theorem 7.34 we used the concept of orthogonality to define the orthogonal projection of ~v onto
a given subspace. We obtained a decomposition of ~v into a part parallel to the given subspace and
a part orthogonal to it. The next theorem shows that the orthogonal projection of ~v onto U gives
us the point in U which is closest to ~v .


[Figure 7.3]

Figure 7.3: The figure shows the orthogonal projection of the vector ~v onto the subspace U (which is a vector) and the distance of ~v to U (which is a number; it is the length of the vector ~v − projU ~v ).

Theorem 7.43. Let U be a subspace of Rn and let ~v ∈ Rn . Then PU ~v is the point in U which is
closest to ~v , that is,
k~v − PU ~v k ≤ k~v − ~uk for every ~u ∈ U.

Proof. Let ~v ∈ Rn and ~u ∈ U ⊆ Rn . Note that ~v − PU ~v ∈ U ⊥ and that PU ~v − ~u ∈ U since both


vectors belong to U . Therefore, the Pythagoras theorem shows that

k~v − ~uk2 = k~v − PU ~v + PU ~v − ~uk2 = k~v − PU ~v k2 + kPU ~v − ~uk2 ≥ k~v − PU ~v k2 .

Taking the square root on both sides shows the desired inequality.

Definition 7.44. Let U be a subspace of Rn and let ~v ∈ Rn . Then we define the distance of ~v to
U as
dist(~v , U ) := k~v − PU ~v k.
This is the shortest distance of ~v to any point in U .
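A small numerical sketch of this definition (Python with NumPy, added here only as an illustration and not part of the original text): given an orthonormal basis of U as the columns of a matrix Q, we compute PU ~v via (7.8) and then dist(~v , U ) = k~v − PU ~v k.

```python
import numpy as np

def dist_to_subspace(Q, v):
    """Distance of v to U = span of the orthonormal columns of Q."""
    v_parallel = Q @ (Q.T @ v)        # P_U v, using formula (7.8)
    return np.linalg.norm(v - v_parallel)

# Example: the distance of (1, 1, 1) to the xy-plane in R^3 is 1.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
print(dist_to_subspace(Q, np.array([1.0, 1.0, 1.0])))   # 1.0
```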

In Remark 7.36 we already found a formula for the orthogonal projection PU of a vector ~v to a
given subspace U . This formula however requires to have an orthonormal basis of U . We want to
give another formula for PU which does not require the knowledge of an orthonormal basis.

Theorem 7.45. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk and let B ∈ M (n × k) be the
matrix whose columns are these basis vectors. Then the following holds.

(i) B is injective.


(ii) B t B : Rk → Rk is a bijection.
(iii) The orthogonal projection onto U is given by the formula

PU = B(B t B)−1 B t .

Proof. (i) By construction, the columns of B are linearly independent. Therefore the unique
solution of B~x = ~0 is ~x = ~0 which shows that B is injective.
(ii) Observe that B t B ∈ M (k × k) and assume that B t B~x = ~0 for some ~x ∈ Rk . Then it follows
that B~x = ~0 because

0 = h~x , B t B~xi = h(B t )t ~x , B~xi = hB~x , B~xi = kB~xk2 .

Since B is injective, this implies ~x = ~0, so B t B is injective. Since it is a square matrix, it


follows that it is even bijective.
(iii) Observe that by construction Im B = U . Now let ~x ∈ Rn . Note that PU ~x ∈ Im B. Hence

FT
there exists exactly one ~z ∈ Rk such that PU ~x = B~z. Moreover, ~x − PU ~x ⊥ U = Im B, hence
for every ~y ∈ Rk we have that

0 = h~x − PU ~x , B~y i = h~x − B~z , B~y i = hB t ~x − B t B~z , ~y i.

Since this is true for every ~y ∈ Rk , it follows that B t ~x − B t B~z = ~0. Now we recall that B t B
is invertible, so we can solve for ~z and obtain ~z = (B t B)−1 B~x. This finally gives

PU ~x = B~z = B(B t B)−1 B~x.


Since this holds for every ~x ∈ Rn , formula (iii) is proved.
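The formula PU = B(B t B)−1 B t translates directly into code. The following sketch uses Python with NumPy (an illustration only, not part of the original text); it builds the projection matrix from an arbitrary, not necessarily orthonormal, basis of U.

```python
import numpy as np

def projection_matrix(B):
    """Orthogonal projection onto the column space of B.

    The columns of B are assumed to be linearly independent (a basis of U),
    so B^t B is invertible and P_U = B (B^t B)^{-1} B^t.
    """
    return B @ np.linalg.inv(B.T @ B) @ B.T

# Example: U = span{(1,1,1)^t, (1,0,-1)^t} in R^3.
B = np.array([[1.0,  1.0],
              [1.0,  0.0],
              [1.0, -1.0]])
P = projection_matrix(B)
print(np.allclose(P @ P, P))   # True: P is idempotent (cf. Exercise 7.41)
print(np.allclose(P, P.T))     # True: P is symmetric
```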

You should now have understood

• the concept of an orthogonal projection onto a subspace of Rn ,


• the geometric interpretation of orthogonal projections and how it is related to the distance
of a point to a subspace,
• etc.

You should now be able to

• calculate the orthogonal projection of a point to a subspace,


• calculate the distance of a point to a subspace,
• etc.

7.5 The Gram-Schmidt process


In this section we will describe the so-called Gram-Schmidt orthonormalisation process. Roughly
speaking, it converts a given basis of a subspace of Rn into an orthonormal basis, thus providing
another proof that every subspace of Rn has an orthonormal basis (Corollary 7.31).


Theorem 7.46. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk . Then there exists an orthonormal
basis ~x1 , . . . , ~xk of U such that

span{~u1 , . . . , ~uj } = span{~x1 , . . . , ~xj } for every j = 1, . . . , k.

Proof. The proof is constructive, that is, we do not only prove the existence of such basis, but it
tells us how to calculate it. The idea is to construct the new basis ~x1 , . . . , ~xk step by step. In order
to simplify notation a bit, we set Uj = span{~u1 , . . . , ~uj } for j = 1, . . . , k. Note that dim Uj = j
and that Uk = U .

• Set ~x1 = k~u1 k−1 ~u1 . Then clearly k~x1 k = 1 and span{~u1 } = span{~x1 } = U1 .

• The vector ~x2 must be a normalised vector in U2 which is orthogonal to ~x1 , that is, it must
be orthogonal to U1 . So we simply take ~u2 and subtract its projection onto U1 :

~w2 = ~u2 − projU1 ~u2 = ~u2 − proj~x1 ~u2 = ~u2 − h~x1 , ~u2 i~x1 .

Clearly ~w2 ∈ U2 because it is a linear combination of vectors in U2 . Moreover, ~w2 ⊥ U1
because

h~w2 , ~x1 i = h ~u2 − h~x1 , ~u2 i~x1 , ~x1 i = h~u2 , ~x1 i − h~x1 , ~u2 ih~x1 , ~x1 i = h~u2 , ~x1 i − h~x1 , ~u2 i = 0.

Hence the vector ~x2 that we are looking for is

~x2 = k~w2 k−1 ~w2 .
Since ~x2 ∈ U2 it follows that span{~x1 , ~x2 } ⊆ U2 . Both spaces have dimension 2, so they must
be equal.

• The vector ~x3 must be a normalised vector in U3 which is orthogonal to U2 = span{~x1 , ~x2 }.
So we simply take ~u3 and subtract its projection onto U2 :

~w3 = ~u3 − projU2 ~u3 = ~u3 − (proj~x1 ~u3 + proj~x2 ~u3 ) = ~u3 − ( h~x1 , ~u3 i~x1 + h~x2 , ~u3 i~x2 ).

Clearly ~w3 ∈ U3 because it is a linear combination of vectors in U3 . Moreover, ~w3 ⊥ U2
because for j = 1, 2 we obtain


h~w3 , ~xj i = h ~u3 − ( h~x1 , ~u3 i~x1 + h~x2 , ~u3 i~x2 ) , ~xj i
           = h~u3 , ~xj i − h~x1 , ~u3 ih~x1 , ~xj i − h~x2 , ~u3 ih~x2 , ~xj i
           = h~u3 , ~xj i − h~xj , ~u3 ih~xj , ~xj i = h~u3 , ~xj i − h~xj , ~u3 i = 0.

Hence the vector ~x3 that we are looking for is

~x3 = k~w3 k−1 ~w3 .

Since ~x3 ∈ U3 it follows that span{~x1 , ~x2 , ~x3 } ⊆ U3 . Since both spaces have dimension 3, they
must be equal.


We repeat this k times until we have constructed the basis ~x1 , . . . , ~xk .
Note that the general procedure is as follows:

• Suppose that we already have constructed ~x1 , . . . , ~x` . Then we first construct

~w`+1 = ~u`+1 − PU` ~u`+1 .

This vector satisfies ~w`+1 ∈ U`+1 and ~w`+1 ⊥ U` . Note that ~w`+1 6= ~0 because otherwise
we would have that ~u`+1 = PU` ~u`+1 ∈ U` which is impossible because ~u`+1 , ~u` , . . . , ~u1 are
linearly independent. Then ~x`+1 = k~w`+1 k−1 ~w`+1 has all the desired properties.
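The construction above can be turned into a short algorithm. The sketch below uses Python with NumPy (only an illustration, not part of the original text) and can be applied, for instance, to the vectors of Example 7.47 below.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors.

    Follows the construction in the proof of Theorem 7.46: subtract from
    u_{l+1} its projection onto span{x_1, ..., x_l} and normalise the result.
    """
    basis = []
    for u in vectors:
        w = u - sum(np.dot(x, u) * x for x in basis)   # u - P_{U_l} u
        basis.append(w / np.linalg.norm(w))
    return basis

# The vectors of Example 7.47 below:
u1 = np.array([1.0, 1.0, 0.0, 1.0, 1.0])
u2 = np.array([-1.0, 4.0, np.sqrt(2), 3.0, 2.0])
u3 = np.array([-2.0, 5.0, 0.0, 0.0, 1.0])
for x in gram_schmidt([u1, u2, u3]):
    print(np.round(x, 4))
```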

Example 7.47. Let U = span{~u1 , ~u2 , ~u3 } where


     
~u1 = (1, 1, 0, 1, 1)t ,    ~u2 = (−1, 4, √2, 3, 2)t ,    ~u3 = (−2, 5, 0, 0, 1)t .

We want to find an orthonormal basis ~x1 , ~x2 , ~x3 of U using the Gram-Schmidt process.
 
Solution. (i) ~x1 = k~u1 k−1 ~u1 = (1/2) ~u1 .

(ii) ~w2 = ~u2 − proj~x1 ~u2 = ~u2 − h~x1 , ~u2 i~x1 = ~u2 − 4~x1 = ~u2 − 2~u1 = (−3, 2, √2, 1, 0)t

=⇒ ~x2 = k~w2 k−1 ~w2 = (1/4) (−3, 2, √2, 1, 0)t .

(iii) ~w3 = ~u3 − projspan{~x1 ,~x2 } ~u3 = ~u3 − ( h~x1 , ~u3 i~x1 + h~x2 , ~u3 i~x2 ) = ~u3 − 2~x1 − 4~x2
     = (−2, 5, 0, 0, 1)t − (1, 1, 0, 1, 1)t − (−3, 2, √2, 1, 0)t = (0, 2, −√2, −2, 0)t

=⇒ ~x3 = k~w3 k−1 ~w3 = (1/√10) (0, 2, −√2, −2, 0)t .

Therefore the desired orthonormal basis of U is

~x1 = (1/2) (1, 1, 0, 1, 1)t ,    ~x2 = (1/4) (−3, 2, √2, 1, 0)t ,    ~x3 = (1/√10) (0, 2, −√2, −2, 0)t .


Note that we would obtain a different basis if we changed the order of the given basis vectors ~u1 , ~u2 , ~u3 .

Example 7.48. We will give another solution of Example 7.33. We were asked to find an orthonormal basis of the orthogonal complement of

U = span{ (1, 2, 3, 4)t , (1, 0, 1, 0)t }.
From Example 7.32 we already know that



  
U ⊥ = span{~w1 , ~w2 }    where    ~w1 = (0, −2, 0, 1)t ,    ~w2 = (−1, −1, 1, 0)t .

We use the Gram-Schmidt process to obtain an orthonormal basis ~x1 , ~x2 of U ⊥ .

(i) ~x1 = k~w1 k−1 ~w1 = (1/√5) ~w1 .

(ii) ~y2 = ~w2 − proj~x1 ~w2 = ~w2 − h~x1 , ~w2 i~x1 = ~w2 − (2/√5) ~x1 = (−1, −1, 1, 0)t − (2/5) (0, −2, 0, 1)t = (1/5) (−5, −1, 5, −2)t

=⇒ ~x2 = k~y2 k−1 ~y2 = (1/√55) (−5, −1, 5, −2)t .

Therefore

~x1 = (1/√5) (0, −2, 0, 1)t ,    ~x2 = (1/√55) (−5, −1, 5, −2)t .
1 2

You should now have understood


• why the Gram-Schmidt process works,
• etc.
You should now be able to

• apply the Gram-Schmidt process in order to generate an orthonormal basis of a given sub-
space,
• etc.


7.6 Application: Least squares

In this section we want to present the least squares method to fit a linear function to certain
measurements. Let us see an example.

Example 7.49. Assume that we want to measure the Hooke constant k of a spring. By Hooke's law
we know that

y = y0 + km (7.10)

where y0 is the elongation of the spring without any mass attached and y is the elongation of the
spring when we attach the mass m to it.


Assume that we measure the elongation for different masses. If Hooke's law is valid and if our
measurements were perfect, then our measured points should lie on a line with slope k. However,
measurements are never perfect and the points will rather be scattered around a line. Assume that
we measured the following.

m 2 3 4 5
y 4.5 5.1 6.1 7.9

Figure 7.4 contains a plot of these measurements in the m-y-plane.


[Figure 7.4]

Figure 7.4: The left plot shows the measured data. In the plot on the right we added the two functions g1 (x) = x + 2.5, g2 (x) = 1.1x + 2 which seem to be reasonable candidates for linear approximations to the measured data.

The plot gives us some confidence that Hooke's law holds since the points seem to lie more or less
on a line. How do we best fit a line through the points? The slope seems to be around 1. We could
make the following guesses:

g1 (x) = x + 2.5 or g2 (x) = 1.1x + 2



Which of the two functions is the better approximation? Are there other approximations that are
even better?

The answer to these questions depends very much on how we measure how “good” an approximation
is. One very common way is the following: For each measured point, we take the difference
∆j := yj − g(mj ) between the measured value and the value of our test function. Then we
square all these differences, sum them and then we take the square root [ Σ_{j=1}^{n} (yj − g(mj ))² ]^{1/2} ,
see also Figure 7.5. The resulting number will be our measure for how good our guess is.


[Figure 7.5]

Figure 7.5: The graph on the left shows points for which we want to find an approximating linear function. The graph on the right shows such a linear function and how to measure the error or discrepancy between the measured points and the proposed line. A measure for the error is [ Σ_{j=1}^{n} ∆j² ]^{1/2} .
Before we do this for our data, we make some simple observations.

(i) If all the measured points lie on a line and we take this line as our candidate, then this method
gives the total error 0, as it should.
(ii) We take the squares of the errors at each measured point so that the error is always counted
positively. Otherwise it could happen that the errors cancel each other. If we simply summed
the errors, then the total error could be 0 while the approximating line is quite far from
all the measured points.
(iii) There are other ways to measure the error, for example one could use Σ_{j=1}^{n} |yj − g(mj )|,
but it turns out that the method with the squares has many advantages. (See some course on
optimisation for further details.)

Now let us calculate the errors for our measured points and our two proposed functions.

m            2    3    4    5        m            2    3    4    5
y (measured) 4.5  5.1  6.1  7.9      y (measured) 4.5  5.1  6.1  7.9
g1 (m)       4.5  5.5  6.5  7.5      g2 (m)       4.2  5.3  6.4  7.5
y − g1       0   −0.4 −0.4  0.4      y − g2       0.3 −0.2 −0.3  0.4

Therefore we find for the errors


Error for function g1 :  ∆(1) = [ 0² + (−0.4)² + (−0.4)² + 0.4² ]^{1/2} = [0.48]^{1/2} ≈ 0.693,
Error for function g2 :  ∆(2) = [ 0.3² + (−0.2)² + (−0.3)² + 0.4² ]^{1/2} = [0.38]^{1/2} ≈ 0.616,


so our second guess seems to be closer to the best linear approximation to our measured points
than the first guess. This exercise will be continued on p. 260.

Now the question arises how we can find the optimal linear approximation.


Best linear approximation. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we
want to find a linear function g(x) = ax + b such that the total error
∆ := [ Σ_{j=1}^{n} (yj − g(xj ))² ]^{1/2}        (7.11)

is minimal. In other words, we have to find the parameters a and b such that ∆ becomes as small
as possible. The key here is to recognise the right hand side on (7.11) as the norm of a vector
(here the particular form of how we chose to measure the error is crucial). Let us rewrite (7.11) as
follows:
 
∆ = [ Σ_{j=1}^{n} (yj − g(xj ))² ]^{1/2} = [ Σ_{j=1}^{n} (yj − (axj + b))² ]^{1/2} = k ( y1 − (ax1 + b), . . . , yn − (axn + b) )t k

  = k ( y1 , . . . , yn )t − [ a (x1 , . . . , xn )t + b (1, . . . , 1)t ] k .

Let us set

~y = ( y1 , . . . , yn )t ,    ~x = ( x1 , . . . , xn )t    and    ~u = ( 1, . . . , 1 )t .        (7.12)

Note that these are vectors in Rn . Then

∆ = k~y − [a~x + b~u]k

and the question is how we have to choose a and b such that this becomes as small as possible. In
other words, we are looking for the point in the vector space spanned by ~x and ~u which is closest
to ~y . By Theorem 7.43 this point is given by the orthogonal projection of ~y onto that plane.
To calculate this projection, set U = span{~x, ~u} and let P be the orthogonal projection onto U .

Then by our reasoning


P ~y = a~x + b~u. (7.13)
Now let us see how we can calculate a and b easily from (7.13).1 In the following we will assume
that ~x and ~u are linearly independent so that U is a plane. This assumption seems to be reasonable
because linear dependence would mean that x1 = · · · = xn (in our example with the spring this would
mean that we always used the same mass in the experiment). Observe that if ~x, ~u were linearly
dependent, then the matrix A below would have only one column; everything else works just the
same.
Recall that by Theorem 7.45 the orthogonal projection onto U is given by

P = A(At A)−1 At
1 Of course, you could simply calculate P ~y and then solve the linear n × 2 system to find the coefficients a and b.


where A is the n × 2 matrix whose columns consist of the vectors ~x and ~u. Therefore (7.13) becomes
 
A(At A)−1 At ~y = a~x + b~u = A (a, b)t .        (7.14)

Since by our assumption the columns of A are linearly independent, it is injective. Therefore we
can conclude from (7.14) that

(At A)−1 At ~y = (a, b)t

which is the formula for the numbers a and b that we were looking for.

Let us summarise our reasoning above in a theorem.

Theorem 7.50. Let (x1 , y1 ), . . . , (xn , yn ) be given. The linear function g(x) = ax + b which min-
imises the total error
∆ := [ Σ_{j=1}^{n} (yj − g(xj ))² ]^{1/2}        (7.15)

is given by

(a, b)t = (At A)−1 At ~y        (7.16)
where ~y , ~x and ~u are as in (7.12) and A is the n × 2 matrix whose columns consist of the vectors ~x
and ~u.

In Remark 7.51 we will show how this formula can be derived with methods from calculus.
Example 7.49 continued. Let us use Theorem 7.50 to calculate the best linear approximation
to the data from Example 7.49. Note that in this case the mj correspond to the xj from the theorem
and we will write ~m instead of ~x. In this case, we have
       
~m = (2, 3, 4, 5)t ,    ~u = (1, 1, 1, 1)t ,    ~y = (4.5, 5.1, 6.1, 7.9)t ,    A = (~m | ~u),

hence

At A = [ 54 14 ]        (At A)−1 = (1/10) [  2 −7 ]
       [ 14  4 ] ,                        [ −7 27 ]

and therefore

(a, b)t = (At A)−1 At ~y = (1/10) [ −3 −1  1  3 ] (4.5, 5.1, 6.1, 7.9)t = (1.12, 1.98)t ,
                                  [ 13  6 −1 −8 ]

where the 2 × 4 matrix in the last step is (At A)−1 At .


We conclude that the best linear approximation is

g(m) = 1.12m + 1.98.

[Figure 7.6]

Figure 7.6: The plot shows the measured data and the linear approximation g(m) = 1.12m + 1.98
calculated with Theorem 7.50.
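The computation above can be checked with a few lines of code. The sketch below uses Python with NumPy (an illustration only, not part of the original text); it applies formula (7.16) to the spring data and compares the result with NumPy's built-in least squares solver.

```python
import numpy as np

m = np.array([2.0, 3.0, 4.0, 5.0])
y = np.array([4.5, 5.1, 6.1, 7.9])

# A has the columns m and u = (1, ..., 1)^t, cf. (7.12).
A = np.column_stack([m, np.ones_like(m)])

# Formula (7.16): (a, b)^t = (A^t A)^{-1} A^t y.
a, b = np.linalg.inv(A.T @ A) @ A.T @ y
print(a, b)                                   # approximately 1.12 1.98

# The same coefficients via the built-in least squares solver.
print(np.linalg.lstsq(A, y, rcond=None)[0])   # [1.12 1.98]
```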

The method above can be generalised to other types of functions. We will show how it can be
adapted to the case of polynomial functions and of exponential functions.
Polynomial functions. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we want
to find a polynomial of degree k which best fits the data points. Let p(x) = ak x^k + ak−1 x^{k−1} + · · · + a1 x + a0 be the desired polynomial. We define the vectors

~y = (y1 , . . . , yn )t ,   ξ~k = (x1^k , . . . , xn^k )t ,   ξ~k−1 = (x1^{k−1} , . . . , xn^{k−1} )t ,   . . . ,   ξ~1 = (x1 , . . . , xn )t ,   ξ~0 = (1, . . . , 1)t .

If the vectors ξ~k , . . . , ξ~0 are linearly independent, then

(ak , . . . , a1 , a0 )t = (At A)−1 At ~y
where A = (ξ~k | . . . | ξ~0 ) is the n × (k + 1) matrix whose columns are the vectors ξ~k , . . . , ξ~0 . Note
that by our assumption k < n (otherwise the vectors ξ~k , . . . , ξ~0 cannot be linearly independent).
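In code, the polynomial fit amounts to the same projection formula with the matrix A = (ξ~k | . . . | ξ~0 ). A minimal sketch (Python with NumPy, only an illustration and not part of the original text):

```python
import numpy as np

def fit_polynomial(x, y, k):
    """Least squares fit of a polynomial of degree k to the points (x_j, y_j).

    Returns the coefficients (a_k, ..., a_1, a_0), obtained from
    (A^t A)^{-1} A^t y with A = (xi_k | ... | xi_1 | xi_0).
    """
    A = np.column_stack([x**i for i in range(k, -1, -1)])
    return np.linalg.inv(A.T @ A) @ A.T @ y

# Degree 2 fit to the three points from the overfitting remark below.
x = np.array([0.0, 1.0, 3.0])
y = np.array([0.25, 0.0, 1.0])
print(fit_polynomial(x, y, 2))   # approximately [ 0.25 -0.5   0.25]
```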

Remark. Generally one should have many more data points than the degree of the polynomial
one wants to fit; otherwise the problem of overfitting might occur. For example, assume that


the curve we are looking for is f (x) = 0.1 + 0.2x and we are given only three measurements:
(0, 0.25), (1, 0), (3, 1). Then a linear fit would give us g(x) = (2/7) x + 1/28 ≈ 0.29x + 0.036. The
fit with a quadratic function gives p(x) = (1/4) x² − (1/2) x + 1/4 which matches the data points perfectly
but is far away from the curve we are looking for. The reason is that we have too many free
parameters in the polynomial, so it fits the data too well. (Note that for any given n + 1 points
(x1 , y1 ), . . . , (xn+1 , yn+1 ) with pairwise distinct x1 , . . . , xn+1 , there exists exactly one polynomial p of degree ≤ n
such that p(xj ) = yj for every j = 1, . . . , n + 1.) If we had a lot more data points and we tried to
fit a polynomial to a linear function, then the leading coefficient would become very small, but this
effect does not appear if we have very few data points.
effect does not appear if we have very few data points.

[Figure 7.7]
Figure 7.7: Example of overfitting when we have too many free variables for a given set of data
points. The dots mark the measured points which are supposed to approximate the red curve f .
Fitting polynomial p of degree 2 leads to the green curve. The blue curve g is the result of a linear fit.
Exponential functions. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we want
to find a function of the form g(x) = c e^{kx} to fit our data points. Without restriction we may assume
that c > 0 (otherwise we fit −g).
Then we only need to define h(x) = ln(g(x)) = ln c + kx so that we can use the method to fit a
linear function to the data points (x1 , ln(y1 )), . . . , (xn , ln(yn )) in order to obtain c and k.
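A sketch of this procedure (Python with NumPy; only an illustration, under the assumption that all measured values yj are positive): fit a line to the points (xj , ln yj ) and then recover c and k.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit g(x) = c * exp(k x) via a linear least squares fit to (x_j, ln y_j)."""
    A = np.column_stack([x, np.ones_like(x)])
    k, log_c = np.linalg.inv(A.T @ A) @ A.T @ np.log(y)
    return np.exp(log_c), k

# Example data: the decay measurements from Exercise 7 in Section 7.8.
t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
P = np.array([7.4, 6.5, 5.7, 5.2, 4.9])
c, k = fit_exponential(t, P)
print(c, k)                  # P0 and k of the model P(t) = P0 * exp(k t)
print(c * np.exp(8 * k))     # estimate for P(8)
```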

Remark 7.51. Let us show how the formula in Theorem 7.50 can be derived with analytic methods.
Recall that the problem is the following: Let (x1 , y1 ), . . . , (xn , yn ) be given. Find a linear function

g(x) = ax + b which minimises the total error


∆ := [ Σ_{j=1}^{n} (yj − g(xj ))² ]^{1/2} = [ Σ_{j=1}^{n} (yj − [axj + b])² ]^{1/2}

Let us consider ∆ as function of a and b. Then we have to find the minimum of


∆(a, b) = [ Σ_{j=1}^{n} (yj − [axj + b])² ]^{1/2}

as a function of the two variables a, b. In order to simplify the calculations a bit, we observe that
it is enough to minimise the square of ∆ since ∆(a, b) ≥ 0 for all a, b, and therefore it is minimal if


and only if its square is minimal. So we want to find a, b which minimise


F (a, b) := (∆(a, b))² = Σ_{j=1}^{n} (yj − axj − b)² .        (7.17)

To this end, we have to differentiate F . Since F : R2 → R, the derivative will be a vector-valued function. We find

DF (a, b) = ( ∂F/∂a (a, b) , ∂F/∂b (a, b) ) = ( Σ_{j=1}^{n} −2xj (yj − axj − b) , Σ_{j=1}^{n} −2(yj − axj − b) )

          = 2 ( a Σ_{j=1}^{n} xj² + b Σ_{j=1}^{n} xj − Σ_{j=1}^{n} xj yj ,  a Σ_{j=1}^{n} xj + nb − Σ_{j=1}^{n} yj ) .

Now we need to find the critical points, that is, a, b such that DF (a, b) = 0. This is the case for
a Σ_{j=1}^{n} xj² + b Σ_{j=1}^{n} xj = Σ_{j=1}^{n} xj yj ,
a Σ_{j=1}^{n} xj + b n = Σ_{j=1}^{n} yj ,

that is,

[ Σ_{j=1}^{n} xj²   Σ_{j=1}^{n} xj ] [ a ]   [ Σ_{j=1}^{n} xj yj ]
[ Σ_{j=1}^{n} xj    n              ] [ b ] = [ Σ_{j=1}^{n} yj    ] .        (7.18)

Now we can multiply on both sides from the left by the inverse of the matrix and obtain the solution
for a, b. This shows that F has only one critical point. Since F tends to infinity for k(a, b)k → ∞,
the function F must indeed have a minimum in this critical point. For details, see a course on
vector calculus or optimisation.
We observe the following: If, as before, we set
       
~x = (x1 , . . . , xn )t ,    ~u = (1, . . . , 1)t ,    A = (~x | ~u),    ~y = (y1 , . . . , yn )t ,

then
Σ_{j=1}^{n} xj² = h~x , ~xi,    Σ_{j=1}^{n} xj = h~x , ~ui,    n = h~u , ~ui,    Σ_{j=1}^{n} xj yj = h~x , ~y i,    Σ_{j=1}^{n} yj = h~u , ~y i.

Therefore the expressions in equation (7.18) can be rewritten as


[ Σ_{j=1}^{n} xj²   Σ_{j=1}^{n} xj ]   [ h~x , ~xi   h~x , ~ui ]   [ ~x ]
[ Σ_{j=1}^{n} xj    n              ] = [ h~u , ~xi   h~u , ~ui ] = [ ~u ] (~x | ~u) = At A ,

[ Σ_{j=1}^{n} xj yj ]   [ h~x , ~y i ]   [ ~x ]
[ Σ_{j=1}^{n} yj    ] = [ h~u , ~y i ] = [ ~u ] ~y = At ~y


and we recognise that equation (7.18) is the same as


 
At A (a, b)t = At ~y

which becomes our equation (7.16) if we multiply both sides of the equation from the left by
(At A)−1 .

You should now have understood

• what the least square method is,


• how it is related to orthogonal projections,
• what overfitting is,
• etc.
You should now be able to

• fit a linear function to given data points,
• fit a polynomial to given data points,
• fit an exponential function to given data points,
• etc.

7.7 Summary
Let U be a subspace of Rn . Then its orthogonal complement is defined by

U ⊥ = {~x ∈ Rn : ~x ⊥ ~u for all ~u ∈ U }.

For any subspace U ⊆ Rn the following is true:

• U ⊥ is a vector space.
• U ⊥ = ker A where A is any matrix whose rows are formed by a basis of U .

• (U ⊥ )⊥ = U .
• dim U + dim U ⊥ = n.
• U ⊕ U ⊥ = Rn .
• U has an orthonormal basis. One way to construct such a basis is to first construct an
arbitrary basis of U and then apply the Gram-Schmidt orthogonalisation process to obtain
an orthonormal basis.

Orthogonal projection onto a subspace U ⊆ Rn


Let PU : Rn → Rn be the orthogonal projection onto U . Then

• PU is a linear transformation.


• PU ~x k U for every ~x ∈ Rn .
• ~x − PU ~x ⊥ U for every ~x ∈ Rn .
• For every ~x ∈ Rn the point in U nearest to ~x is given by PU ~x and dist(~x, U ) = k~x − PU ~xk.
• Formulas for PU :

– If ~u1 , . . . , ~uk is an orthonormal basis of U , then

PU = h~u1 , ·i~u1 + · · · + h~uk , ·i~uk ,

that is PU ~x = h~u1 , ~xi~u1 + · · · + h~uk , ~xi~uk for every ~x ∈ Rn .


– If B is any matrix whose columns form a basis of U , then PU = B(B t B)−1 B t .

Orthogonal matrices
A matrix Q ∈ M (n × n) is called an orthogonal matrix if it is invertible and if Q−1 = Qt . Note

that the following assertions for a matrix Q ∈ M (n × n) are equivalent:

(i) Q is an orthogonal matrix.


(ii) Qt is an orthogonal matrix.
(iii) Q−1 is an orthogonal matrix.
(iv) The columns of Q are an orthonormal basis of Rn .
(v) The rows of Q are an orthonormal basis of Rn .
(vi) Q preserves inner products, that is h~x , ~y i = hQ~x , Q~y i for all ~x, ~y ∈ Rn .
(vii) Q preserves lengths, that is k~xk = kQ~xk for all ~x ∈ Rn .

Every orthogonal matrix represents either a rotation (in this case its determinant is 1) or a com-
position of a rotation with a reflection (in this case its determinant is −1).

7.8 Exercises

 
1. (a) Complete (1/4, √(15/16))t to an orthonormal basis of R2 . How many possibilities are there to do so?

(b) Complete (1/√2, −1/√2, 0)t , (1/√3, 1/√3, 1/√3)t to an orthonormal basis of R3 . How many possibilities are there to do so?

(c) Complete (1/√2, 1/√2, 0)t to an orthonormal basis of R3 . How many possibilities are there to do so?


2. Find a basis for the orthogonal complement of the following vector spaces. Find the dimension of the space and the dimension of its orthogonal complement.

(a) U = span{ (1, 2, 3, 4)t , (2, 3, 4, 5)t } ⊆ R4 ,
(b) U = span{ (1, 2, 3, 4)t , (3, 4, 5, 6)t , (2, 3, 4, 5)t } ⊆ R4 .
   

3. (a) Sea U = {(x, y, z)t ∈ R3 : x + 2y + 3z = 0} ⊆ R3 .


(i) Sea ~v = (0, 2, 5)t . Ecuentre el punto ~x ∈ U que esté más cercano a ~v y calcule la
distancia entre ~v y ~x.
(ii) ¿Hay un punto ~y ∈ U que esté a una distancia máximal de ~v ?
(iii) Encuentre la matriz que representa la proyección ortogonal sobre U (en la base es-
tandar).
(b) Sea W = gen{(1, 1, 1, 1)t , (2, 1, 1, 0)t } ⊆ R4 .

FT
(i) Encuentre una base ortogonal de W .
(ii) Sean ~a1 = (1, 2, 0, 1)t , ~a2 = (11, 4, 4, −3)t , ~a3 = (0, −1, −1, 0)t . Para cada j = 1, 2, 3
encuentre el punto w ~ j ∈ W que esté más cercano a ~aj y calcule la distancia entre ~aj
yw~j.
(iii) Encuentre la matriz que representa la proyección ortogonal sobre W (en la base es-
tandar).

       
    0 −1 1 0
0 1 3 0 0 0
2
 ~ = 2, ~a = 4, ~b =  0 , ~c = 1, d~ = 1.
         
2, w
4. Let ~v =
0 0 0 0
1 5
0 3 1 1

(a) Demuestre que ~v y w ~ son linealmente independientes y encuentre una base ortonormal de
~ ⊆ R4 .
U = span{~v , w}
(b) Demuestre que ~a, ~b, ~c y d~ son linealmente independientes. Use el proceso de Gram-Schmidt
para encontrar una base ortonormal de U = span{~a, ~b, ~c, d} ~ ⊆ R5 . Encuentre una base de
D


U .

5. Find an orthonormal basis of U ⊥ where U = span{(1, 0, 2, 4)t } ⊆ R4 .

6. A ball rolls along the x-axis with constant velocity. Along the trajectory of the ball, the x-coordinates of the ball are measured at certain times t. The measurements are (t in seconds, x in metres):

x 1.5 2.0 3.0 4.0 4.5 6
t 1.4 2.3 4.7 6.6 7.4 10.8

(a) Plot the points in the tx-plane.


(b) Use the least squares method to find the initial position x0 and the velocity v of the ball.
(c) Draw the line in the sketch from (a). Where/how can x0 and v be seen?

Hint. Recall that x(t) = x0 + vt for a motion with constant velocity.

7. An unstable chemical substance is supposed to decay according to the law P (t) = P0 e^{kt} . Suppose that the following measurements were made:

t 1 2 3 4 5
P 7.4 6.5 5.7 5.2 4.9

Using the least squares method applied to ln(P (t)), find P0 and k which best correspond to the measurements. Give an estimate for P (8).

8. Using the least squares method, find the polynomial y = p(x) of degree 2 which best approximates the following data:

x −2 −1 0 1 2 3 4
y 15 8 2.8 −1.2 −4.9 −7.9 −8.7

9. Let n ∈ N and let Q, T ∈ M (n × n).

(a) Show that T is an isometry if and only if hT ~x , T ~y i = h~x , ~y i for all ~x, ~y ∈ Rn (that is: an isometry preserves angles).
(b) Show that Q is an orthogonal matrix if and only if Q is an isometry.

10. (a) Let ϕ ∈ R and let ~v1 = (cos ϕ, − sin ϕ)t , ~v2 = (sin ϕ, cos ϕ)t . Show that ~v1 , ~v2 is an orthonormal basis of R2 .
(b) Let α ∈ R. Find the matrix Q(α) ∈ M (2 × 2) which describes the counterclockwise rotation by α.
(c) Let α, β ∈ R. Explain why it is clear that Q(α)Q(β) = Q(α + β). Use this relation to deduce the trigonometric identities

cos(α + β) = cos α cos β − sin α sin β, sin(α + β) = sin α cos β + cos α sin β.

11. Let O(n) = {Q ∈ M (n × n) : Q is an orthogonal matrix} and SO(n) = {Q ∈ O(n) : det Q = 1}.

(a) Show that O(n) with composition is a group. That is, one has to prove that:
(i) For all Q, R ∈ O(n), the composition QR is an element of O(n).


(ii) There exists an E ∈ O(n) such that QE = Q and EQ = Q for all Q ∈ O(n).
(iii) For every Q ∈ O(n) there exists an inverse element Q̃ such that QQ̃ = Q̃Q = E.
(b) Is O(n) commutative (that is, does QR = RQ hold for all Q, R ∈ O(n))?
(c) Show that SO(n) with composition is a group.

12. Let T : Rn → Rm be an isometry. Show that T is injective and that m ≥ n.

13. Let V be a vector space and let U, W ⊆ V be subspaces.

(a) Show that U ∩ W is a subspace.
(b) Show that dim(U + W ) = dim U + dim W − dim(U ∩ W ).
(c) Suppose that U ∩ W = {0}. Show that dim(U ⊕ W ) = dim U + dim W .
(d) Show that U ⊥ is a subspace of V and that (U ⊥ )⊥ = U .

Chapter 8

Symmetric matrices and


diagonalisation

In this chapter we work mostly in Rn and in Cn . We write MR (n × n) or MC (n × n) only if it is
important if the matrix under consideration is a real or a complex matrix.
The first section is dedicated to Cn . We already know that it is a vector space. But now we
introduce an inner product on it. Moreover we define hermitian and unitary matrices on Cn which
are analogous to symmetric and orthogonal matrices in Rn . We define eigenvalues and eigenvectors
in Section 8.3. It turns out that it is more convenient to work over C because the eigenvalues
are zeros of the so-called characteristic polynomial and in C every polynomial has a zero. The
main theorem is Theorem 8.48 which says that an n × n matrix is diagonalisable if it has enough
eigenvectors to generate Cn (or Rn ). It turns out that every symmetric and every hermitian matrix
is diagonalisable.
We end the chapter with an application of orthogonal diagonalisation to the solution of quadratic
equations in two variables.

8.1 Complex vector spaces


In this section we introduce Cn as an inner product space because some calculations about eigen-
values later in this chapter are more natural in Cn than in Rn . Most of this section may be skipped.
The important part is the definition of the inner product on Cn , the notion of orthogonality derived
from it, and the concept of hermitian and unitary matrices.
Similarly as for Rn , we define the vector space Cn as the set

  
Cn = { (z1 , . . . , zn )t : z1 , . . . , zn ∈ C }

267
268 8.1. Complex vector spaces

together with the sum and multiplication by a scalar c ∈ C:


         
(w1 , . . . , wn )t + (z1 , . . . , zn )t := (w1 + z1 , . . . , wn + zn )t ,    c (z1 , . . . , zn )t := (cz1 , . . . , czn )t .
It is not hard to check that Cn together with these operations satisfies the vector space axioms
from Definition 5.1 with K = C, hence it is a complex vector space. In particular, we have concepts
like linear independence of vectors, basis and dimension of Cn , etc.
Next we introduce an inner product on Cn . As in the case of real vectors, we would like to interpret
h~z , ~zi as the square of the norm of ~z. In particular it should be a nonnegative real number. For
C1 = C, the vectors are just complex numbers ~z = z1 and we would like to have
h~z , ~zi = |z1 |² = z1 z̄1 where z̄ is the complex conjugate of the complex number z. This motivates
us to define the inner product in Cn as follows.
 
Definition 8.1 (Inner product and norm of a vector in Cn ). For vectors ~z = (z1 , . . . , zn )t and ~w = (w1 , . . . , wn )t ∈ Cn the inner product (or scalar product or dot product) is defined as

h~z , ~wi = Σ_{j=1}^{n} zj w̄j = z1 w̄1 + · · · + zn w̄n .

The length of ~z = (z1 , . . . , zn )t ∈ Cn is denoted by k~zk and it is given by

k~zk = ( |z1 |² + · · · + |zn |² )^{1/2} .

Other names for the length of ~z are magnitude of ~z or norm of ~z.

Exercise 8.2. Show that the scalar product from Definition 8.1 can be viewed as an extension of
the scalar product in Rn in the following sense: If the components of ~z and ~v happen to be real,
then they can also be seen as vectors in Rn . The claim is that their scalar product as vectors in
Rn is equal to their scalar product in Cn . The same is true for their norms.
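A quick numerical illustration of Definition 8.1 and of Exercise 8.2 (Python with NumPy; not part of the original text, only a sketch). Note that the definition above conjugates the second argument, so in code we take the sum of z * conj(w).

```python
import numpy as np

def inner(z, w):
    """Inner product <z, w> = z_1 conj(w_1) + ... + z_n conj(w_n) from Definition 8.1."""
    return np.sum(z * np.conj(w))

z = np.array([1 + 2j, 3 - 1j])
w = np.array([2 - 1j, 1j])

print(inner(z, w))
print(np.isclose(inner(w, z), np.conj(inner(z, w))))          # True: symmetry property
print(np.isclose(inner(z, z).real, np.linalg.norm(z)**2))     # True: <z, z> = ||z||^2

# Exercise 8.2: for real vectors the complex inner product is the real one.
a = np.array([1.0, 2.0, 3.0]); b = np.array([4.0, 5.0, 6.0])
print(np.isclose(inner(a, b), np.dot(a, b)))                  # True
```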

Properties 8.3. (i) Norm of a vector: For all vectors ~z ∈ Cn , we have that
h~z , ~zi = k~zk2 .

(ii) Symmetry of the inner product: For all vectors ~v , ~w ∈ Cn , we have (note the complex conjugation on the right hand side!)

h~v , ~wi = \overline{ h~w , ~v i } .


(iii) Sesquilinearity of the inner product: For all vectors ~v , ~w, ~z ∈ Cn and all c ∈ C, we have that

h~v + c~w , ~zi = h~v , ~zi + ch~w , ~zi    and    h~v , ~w + c~zi = h~v , ~wi + c̄ h~v , ~zi.

~ ∈ Cn and c ∈ C, we have that kc~v k = |c|k~v k.


(iv) For all vectors ~v , w
     
Proof. Let ~v = (v1 , . . . , vn )t , ~w = (w1 , . . . , wn )t , ~z = (z1 , . . . , zn )t ∈ Cn and let c ∈ C.

(i) h~z , ~zi = z1 z̄1 + · · · + zn z̄n = |z1 |² + · · · + |zn |² = k~zk² .


(ii) h~v , ~wi = v1 w̄1 + · · · + vn w̄n = \overline{ w1 v̄1 + · · · + wn v̄n } = \overline{ h~w , ~v i } .
(iii) A straightforward calculation shows

~ , ~zi = (v1 + cw1 )w1 + · · · + (vn + cwn )wn


h~v + cw

FT
= v1 w1 + · · · + vn wn + cw1 w1 + · · · + cwn wn
= h~v , ~zi + chw
~ , ~zi.

The second equation can be shown by an analogous calculation. Instead of repeating them,
we can also use the symmetry property of the inner product:

h~v , ~w + c~zi = \overline{ h~w + c~z , ~v i } = \overline{ h~w , ~v i + ch~z , ~v i } = \overline{ h~w , ~v i } + c̄ \overline{ h~z , ~v i } = h~v , ~wi + c̄ h~v , ~zi.
(iv) kc~zk² = hc~z , c~zi = c c̄ h~z , ~zi = |c|² k~zk² . Taking the square root on both sides, we obtain the
desired equality kc~zk = |c|k~zk.

For Cn there is no cosine theorem and in general it does not make too much sense to speak about
the angle between two complex vectors (orthogonality still makes sense!).

Definition 8.4. Let ~z, ~v ∈ Cn .


(i) The vectors ~z, ~v are called orthogonal or perpendicular if h~z , ~v i = 0. In this case we write
~z ⊥ ~v .
D

(ii) If ~v 6= ~0, then the orthogonal projection of ~z onto ~v is proj~v ~z = ( h~z , ~v i / k~v k² ) ~v .

The next proposition shows that orthogonality works Cn as expected.

Proposition 8.5. Let ~z, ~v ∈ Cn .


(i) Pythagoras theorem: If ~z ⊥ ~v , then k~z + ~v k2 = k~zk2 + k~v k2 .
(ii) If ~v 6= ~0, then ~z = ~zk + ~z⊥ with ~zk := proj~v ~z and ~z⊥ := ~z − proj~v ~z and

proj~v ~z k ~v , and ~z − proj~v ~z ⊥ ~v .

Moreover, k proj~v ~zk ≤ k~zk.


(iii) If ~v 6= ~0, then Cn → Cn , ~z 7→ proj~v ~z is a linear map.

Proof. (i) If ~z ⊥ ~v , then k~z +~v k2 = h~z , ~zi + h~z , ~v i + h~v , ~zi + h~v , ~v i = h~z , ~zi + h~v , ~v i = k~zk2 + k~v k2 .
(ii) It is clear that ~z = ~zk + ~z⊥ and that ~zk k ~v by definition of ~zk and ~z⊥ . That ~z⊥ ⊥ ~v follows
from
h~z , ~v i
h~z⊥ , ~v i = h~z − proj~v ~z , ~v i = h~z , ~v i − hproj~v ~z , ~v i = h~z , ~v i − h~v , ~v i = h~z , ~v i − h~z , ~v i = 0.
k~v k2
Finally, by the Pythagoras theorem,

k~zk2 = k(~z − proj~v ~z) + proj~v ~zk2 = k~z − proj~v ~zk2 + k proj~v ~zk2 ≥ k proj~v ~zk2 .

(iii) Assume that ~v 6= ~0 and let ~z1 , ~z2 ∈ Cn and c ∈ C. Then


hz1 + c~z2 , ~v i hz1 , ~v i + ch~z2 , ~v i hz1 , ~v i~v ch~z2 , ~v i
proj~v (~z1 + c~z2 ) = 2
= 2
= 2
=
k~v k k~v k k~v k k~v k2

FT
= proj~v ~z1 + c proj~v ~z2 .

Question 8.1
What changes if in the definition of the orthogonal projection we put h~v , ~zi instead of h~z , ~v i?

Now let us show the triangle inequality. Note the the following inequalities (8.1) and (8.2) were
proved for real vector spaces in Corollary 2.20 using the cosine theorem.
RA
Proposition 8.6. For all vectors ~v , w ~ ∈ Cn and c ∈ C, we have the Cauchy-Schwarz inequality
(which is a special case of the so-called Hölder inequality)

|h~v , wi|
~ ≤ k~v k kwk
~ (8.1)

and the triangle inequality


k~v + wk
~ ≤ k~v k + kwk.
~ (8.2)

~ = ~0 because in this case both sides of the


Proof. We will first show (8.1). It is obviously true if w
D

inequality are equal to 0. So let us assume now that w ~ 6= ~0. Note that for any λ ∈ C we have that

~ 2 = h~v − λw
0 ≤ k~v − λwk ~ = kvk2 − λhw
~ , ~v − λwi ~ + |λ|2 kwk2 .
~ , ~v i − λh~v , wi

If we chose λ = − h~
v ,wi
~
~ 2 , we obtain
kwk

h~v , wi
~ h~v , wi
~ ~ 2
|h~v , wi|
0 ≤ kvk2 − hw~ , ~
v i − h~
v , wi
~ + kwk2
~ 2
kwk ~ 2
kwk ~ 4
kwk
~ 2
|h~v , wi| ~ 2
|h~v , wi|
= kvk2 − 2 + kwk2
~ 2
kwk ~ 4
kwk
~ 2
|h~v , wi| 1 h i
= kvk2 − = kvk 2
kwk 2
− |h~
v , wi|
~ 2
~ 2
kwk kwk~ 2


It follows that kvk2 kwk2 − |h~v , wi|


~ 2 ≥ 0, hence kvk2 kwk2 ≥ |h~v , wi|
~ 2 . We obtain the desired
inequality by taking the square root.
Now let us show the triangle inequality. It is essentially the same as for vectors in Rn , cf. Corol-
lary 2.20.

~ 2 = h~v + w
k~v + wk ~ = h~v , ~v i + h~v , wi
~ , ~v + wi ~ + hw
~ , ~v i + hw
~ , wi
~
~ + h~v , wi
= h~v , ~v i + h~v , wi ~ + hw
~ , wi
~
= k~v k2 + 2 Reh~v , wi ~ 2
~ + kwk
≤ k~v k2 + 2|h~v , wi| ~ 2 ≤ k~v k2 + 2k~v k kwk
~ + kwk ~ 2 = (k~v k + kw|)
~ + kwk ~ 2.

In the first inequality we used that Re a ≤ |a| for any complex number a and in the second inequality
we used (8.1). If we take the square root on both sides we get the triangle inequality.

Remark 8.7. Observe that the choice of λ in the proof of (8.1) is not as arbitrary as it may seem.
Note that for this particular λ

FT
h~v , wi
~
~v − λw
~ = ~v − ~ = ~v − projw~ ~v .
w
~ 2
kwk
Hence this choice of λ minimises the norm of ~v −λw
~ and ~v −projw~ ~v ⊥ w.
~ Therefore, by Pythagoras,

k~v k2 = k(~v − projw~ ~v ) + projw~ ~v k2 = k~v − projw~ ~v k2 + k projw~ ~v k2


2
~ 2

h~v , wi
≥ k projw~ ~v k2 =
~ = |h~v , wi|

2
w
~
kwk~ ~ 2
kwk
RA
which shows that k~v k2 kwk
~ 2 ≥ |h~v , wi|
~ 2.
Another way to see this inequality is

0 ≤ k~v − projw~ ~v k2 = h~v − projw~ ~v , ~v − projw~ ~v i = h~v − projw~ ~v , ~v i = k~v k2 − hprojw~ ~v , ~v i


h~v , wi
~ ~ 2
|h~v , wi|
= k~v k2 − hw~ , ~
v i = k~
v k2

~ 2
kwk ~ 2
kwk

which again gives k~v k2 kwk


~ 2 ≥ |h~v , wi|
~ 2.
D

Important classes of matrices


Recall that for a matrix A ∈ MR (m × n) we defined its transpose At . The important property of
At is that it is the unique matrix such that

hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn , ~y ∈ Rm .

In the complex case, we want for a given matrix A ∈ MC (m × n) a matrix A∗ such that

hA~x , ~y i = h~x , A∗ ~y i for all ~x ∈ Cn , ~y ∈ Cm .


t
It is easy to check that we have to take A∗ = A , where A is the matrix we obtain from A by taking
the complex conjugate of every entry. Clearly, if all entries in A are real numbers, then At = A∗ .


Definition 8.8. The matrix A∗ is called the adjoint matrix of A.

Lemma 8.9. Let $A \in M(n\times n)$. Then $\det(A^*) = \overline{\det A}$ = complex conjugate of $\det A$.

Proof. $\det A^* = \det\big(\overline{A}^{\,t}\big) = \det\overline{A} = \overline{\det A}$. The last equality follows directly from the definition of
the determinant.

A matrix with real entries is symmetric if and only if A = At . The analogue for complex matrices
are hermitian matrices.

Definition 8.10. A matrix A ∈ M (n × n) is called hermitian if A = A∗ .


   
Examples 8.11. • $A = \begin{pmatrix} 1 & 2+3i \\ 5 & 1-7i \end{pmatrix} \implies A^* = \begin{pmatrix} 1 & 5 \\ 2-3i & 1+7i \end{pmatrix}$. The matrix $A$ is not hermitian.
• $A = \begin{pmatrix} 1 & 2+3i \\ 2-3i & 5 \end{pmatrix} \implies A^* = \begin{pmatrix} 1 & 2+3i \\ 2-3i & 5 \end{pmatrix}$. The matrix $A$ is hermitian.
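The hermitian condition is easy to test on a computer: a matrix is hermitian precisely when it equals its conjugate transpose. A short check of the two matrices from Examples 8.11 (an added sketch, assuming NumPy):

    import numpy as np

    def is_hermitian(A):
        # A is hermitian iff A equals its adjoint A* = conj(A).T
        return np.allclose(A, A.conj().T)

    A1 = np.array([[1, 2 + 3j], [5, 1 - 7j]])
    A2 = np.array([[1, 2 + 3j], [2 - 3j, 5]])
    print(is_hermitian(A1), is_hermitian(A2))   # False True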

Exercise 8.12. • Show that the entries on the diagonal of a hermitian matrix must be real.

• Show that the determinant of a hermitian matrix is a real number.

Another important class of real matrices are the orthogonal matrices. Recall that a matrix Q ∈
MR (n × n) is an orthogonal matrix if and only if Qt = Q−1 . We saw that if Q is orthogonal, then
its columns (or rows) form an orthonormal basis for Rn and that | det Q| = 1, hence det Q = ±1.
The analogue in complex vector spaces are so-called unitary matrices.

Definition 8.13. A matrix Q ∈ M (n × n) is called unitary if Q∗ = Q−1 .

It is clear from the definition that a matrix is unitary if and only if its columns (or rows) form an
orthonormal basis for Cn , cf. Theorem 7.12.

Proposition 8.14. Let Q ∈ M (n × n).

(i) The following are equivalent:

(a) Q is unitary.
(b) $\langle Q\vec x\,,\,Q\vec y\rangle = \langle\vec x\,,\,\vec y\rangle$ for all $\vec x, \vec y \in \mathbb{C}^n$.
(c) $\|Q\vec x\| = \|\vec x\|$ for all $\vec x \in \mathbb{C}^n$.

(ii) If Q is unitary, then | det Q| = 1.

Proof. (i) (a) =⇒ (b): Assume that Q is a unitary matrix and let ~x, ~y ∈ Cn . Then

hQ~x , Q~y i = hQ∗ Q~x , ~y i = h~x , ~y i.

Last Change: Mo 16. Mai 01:35:29 CEST 2022


Linear Algebra, M. Winklmeier
Chapter 8. Symmetric matrices and diagonalisation 273

(b) =⇒ (a): Fix ~x ∈ Cn . Then we have hQ~x , Q~y i = h~x , ~y i for all ~y ∈ Cn , hence

0 = hQ~x , Q~y i − h~x , ~y i = hQ∗ Q~x , ~y i − h~x , ~y i = hQ∗ Q~x − ~x , ~y i. = h(Q∗ Q − id)~x , ~y i.

Since this is true for any ~y ∈ Cn , it follows that (Q∗ Q − id)~x = 0. Since ~x ∈ Cn was arbitrary,
we conclude that Q∗ Q − id = 0, in other words, that Q∗ Q = id.
(b) =⇒ (c): It follows from (b) that kQ~xk2 = hQ~x , Q~xi = h~x , ~xi = k~xk2 , hence kQ~xk = k~xk.
(c) =⇒ (b): Observe that the inner product of two vectors in $\mathbb{C}^n$ can be expressed completely
in terms of norms as follows
\[ \langle\vec a\,,\,\vec b\rangle = \tfrac14\Big[ \|\vec a + \vec b\|^2 - \|\vec a - \vec b\|^2 + i\|\vec a + i\vec b\|^2 - i\|\vec a - i\vec b\|^2 \Big] \]
as can be easily verified. Hence we find
\[ \langle Q\vec x\,,\,Q\vec y\rangle = \tfrac14\Big[ \|Q\vec x + Q\vec y\|^2 - \|Q\vec x - Q\vec y\|^2 + i\|Q\vec x + iQ\vec y\|^2 - i\|Q\vec x - iQ\vec y\|^2 \Big] = \tfrac14\Big[ \|Q(\vec x + \vec y)\|^2 - \|Q(\vec x - \vec y)\|^2 + i\|Q(\vec x + i\vec y)\|^2 - i\|Q(\vec x - i\vec y)\|^2 \Big] = \tfrac14\Big[ \|\vec x + \vec y\|^2 - \|\vec x - \vec y\|^2 + i\|\vec x + i\vec y\|^2 - i\|\vec x - i\vec y\|^2 \Big] = \langle\vec x\,,\,\vec y\rangle. \]

(ii) Assume that Q is unitary. Then

\[ 1 = \det\mathrm{id} = \det(QQ^*) = (\det Q)(\det Q^*) = (\det Q)\overline{(\det Q)} = |\det Q|^2. \]


    
Examples 8.15. • The matrix $Q = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}$ is unitary because $QQ^* = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}\begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, hence $Q^* = Q^{-1}$. Note that $\det Q = -i^2 = 1$.
• The matrix $Q = \begin{pmatrix} \mathrm{e}^{i\alpha} & 0 \\ 0 & \mathrm{e}^{i\beta} \end{pmatrix}$ is unitary because $QQ^* = \begin{pmatrix} \mathrm{e}^{i\alpha} & 0 \\ 0 & \mathrm{e}^{i\beta} \end{pmatrix}\begin{pmatrix} \mathrm{e}^{-i\alpha} & 0 \\ 0 & \mathrm{e}^{-i\beta} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, hence $Q^* = Q^{-1}$. Note that $\det Q = \mathrm{e}^{i(\alpha+\beta)}$, hence $|\det Q| = 1$.
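These properties can also be confirmed numerically. The sketch below is an added illustration (assuming NumPy) and checks Q*Q = id, |det Q| = 1 and norm preservation for the first matrix of Examples 8.15:

    import numpy as np

    Q = np.array([[0, 1j], [1j, 0]])
    print(np.allclose(Q.conj().T @ Q, np.eye(2)))    # True: Q is unitary
    print(np.isclose(abs(np.linalg.det(Q)), 1.0))    # True: |det Q| = 1

    # unitary matrices preserve norms: ||Qx|| = ||x||
    x = np.array([1 + 2j, 3 - 1j])
    print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # True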

You should now have understood


• the vector space structure of Cn ,
• the inner product on Cn ,
• that the concept of orthogonality makes sense in Cn and works as in Rn ,
• why hermitian matrices in Cn play the role of symmetric matrices in Rn ,
• why unitary matrices in Cn play the role of orthogonal matrices in Rn ,
• etc.
You should now be able to


• calculate with vectors in Cn ,


• check if vectors in Cn are orthogonal,
• calculate the orthogonal projection of one vector onto another,
• check if a given matrix is hermitian,
• check if a given matrix is unitary,
• etc.

8.2 Similar matrices


Definition 8.16. Let A, B ∈ M (n × n) be (real or complex) matrices. They are called similar if
there exists an invertible matrix C such that

A = C −1 BC. (8.3)

In this case, we write A ∼ B.

Exercise 8.17. Show that $A \sim B$ if and only if there exists an invertible matrix $\widetilde C$ such that
\[ A = \widetilde C B \widetilde C^{-1}. \tag{8.4} \]

Question 8.2
Assume that A and B are similar. Is the matrix C in (8.3) unique or is it possible that there are
different invertible matrices $C_1 \neq C_2$ such that $A = C_1^{-1} B C_1 = C_2^{-1} B C_2$?

Remark 8.18. Similarity is an equivalence relation on the set of all square matrices. This means
that it satisfies the following three properties. Let A1 , A2 , A3 ∈ M (n × n). Then:
(i) Reflexivity: A ∼ A for every A ∈ M (n × n).
(ii) Symmetry: If A1 ∼ A2 , then also A2 ∼ A1 .
(iii) Transitivity: If A1 ∼ A2 and A2 ∼ A3 , then also A1 ∼ A3 .
D

Proof. (i) Reflexivity is clear. We only need to choose C = id.


(ii) Assume that A1 ∼ A2 . Then there exists an invertible matrix C such that A1 = C −1 A2 C.
Multiplication from the left by C and from the right by C −1 gives CA1 C −1 = A2 . Let
$\widetilde C = C^{-1}$. Then $\widetilde C$ is invertible and $\widetilde C^{-1} = C$. Hence we obtain $\widetilde C^{-1} A_1 \widetilde C = A_2$ which shows
that $A_2 \sim A_1$.
(iii) Transitivity: If A1 ∼ A2 and A2 ∼ A3 , then there exist invertible matrices C1 and C2 such
that A1 = C1−1 A2 C1 and A2 = C2−1 A3 C2 . It follows that

\[ A_1 = C_1^{-1} A_2 C_1 = C_1^{-1} C_2^{-1} A_3 C_2 C_1 = (C_2 C_1)^{-1} A_3\, C_2 C_1. \]

Setting $C = C_2 C_1$ shows that $A_1 = C^{-1} A_3 C$, hence $A_1 \sim A_3$.


We can interpret A ∼ B as follows: Let C be an invertible matrix with A = C −1 BC. Since


C is an invertible matrix, its columns ~c1 , . . . , ~cn form a basis of Rn (or Cn ) and we can view C
as the transition matrix from the canonical basis to the basis ~c1 , . . . , ~cn . Since B is the matrix
representation of the map ~x 7→ B~x with respect to the canonical basis of Rn , the equation A =
C −1 BC says that A represents the same linear map but with respect to the basis ~c1 , . . . , ~cn .
On the other hand, if A and B are matrix representations of the same linear transformation but
with respect to possibly different bases, then A = C −1 BC where C is the transition matrix between
the two bases. Hence A and B are similar.
So we showed:

Two matrices A and B ∈ M (n × n) are similar if and only if they represent the same linear
transformation. The matrix C in A = C −1 BC is the transition matrix between the two bases
used in the representations A and B.

Hence the following fact is not very surprising.

Proposition 8.19. If A, B ∈ M (n × n) are similar, then det A = det B.

FT
Proof. Let C ∈ M (n × n) be invertible such that A = C −1 BC. Then

det A = det C −1 BC = det(C −1 ) det B det C = (det C)−1 det B det C = det B.

Exercise 8.20. Show that det A = det B does not imply that A and B are similar.

Exercise 8.21. Assume that A and B are similar. Show that dim(ker A) = dim(ker B) and that
RA
dim(Im A) = dim(Im B). Why is this no surprise?

Question 8.3
Assume that A and B are similar. What is the relation between ker A and ker B? What is the
relation between Im A and Im B?
Hint. Theorem 6.4.

A very nice class of matrices are the diagonal matrices because it is rather easy to calculate with
them. Closely related are the so-called diagonalisable matrices.

Definition 8.22. A matrix A ∈ M (n × n) is called diagonalisable if it is similar to a diagonal


matrix.

In other words, A is diagonalisable if there exists a diagonal matrix D and an invertible matrix C
with
C −1 AC = D. (8.5)
How can we decide if a matrix A is diagonalisable? We know that it is diagonalisable if and only if
it is similar to a diagonal matrix, that is, if and only if there exists a basis ~c1 , . . . , ~cn such that the
representation of A with respect to these vectors is a diagonal matrix. In this case, (8.5) is satisfied
if the columns of C are the basis vectors ~c1 , . . . , ~cn .


Denote the diagonal entries of $D$ by $d_1, \dots, d_n$. Then it is easy to see that $D\vec e_j = d_j\vec e_j$. This means
that if we apply $D$ to some $\vec e_j$, then the image $D\vec e_j$ is parallel to $\vec e_j$. Since $D$ is nothing else than
the representation of $A$ with respect to the basis $\vec c_1, \dots, \vec c_n$, we have $A\vec c_j = d_j\vec c_j$.
We can make this more formal: Take equation (8.5) and multiply both sides from the left by $C$ so
that we obtain $AC = CD$. Recall that for any matrix $B$, we have that $B\vec e_j$ = $j$th column of $B$. Then we obtain
\[ AC\vec e_j = A\vec c_j \quad\text{and}\quad CD\vec e_j = C(d_j\vec e_j) = d_j C\vec e_j = d_j\vec c_j, \qquad\text{so } AC = CD \implies A\vec c_j = d_j\vec c_j. \]

In summary, we found:

A matrix A ∈ M (n × n) is diagonalisable if and only if we can find a basis ~c1 , . . . , ~cn of Rn (or Cn )
and numbers d1 , . . . , dn such that

A~cj = dj ~cj , j = 1, . . . , n.

FT
In this case C −1 AC = D (or equivalently A = CDC −1 ) where D = diag(d1 , . . . , dn ) and C =
(~c1 | · · · |~cn ).

The vectors ~cj are called eigenvectors of A and the numbers dj are called eigenvalues of A. They
will be discussed in greater detail in the next section where we will also see how we can calculate
them.
Diagonalization of a matrix is very useful when we want to calculate powers of the matrix.

Proposition 8.23. Let A ∈ M (n × n) be a diagonalisable matrix and let C be an invertible matrix


and D = diag(d1 , . . . , dn ) such that A = CDC −1 . Then Ak = C diag(dk1 , . . . , dkn )C −1 for all k ∈ N0 .
If A is invertible, then all dj are different from 0 and the formula is true for all k ∈ Z.

Proof. Let $k \in \mathbb{N}_0$. Then
\[ A^k = \big(C\,\mathrm{diag}(d_1,\dots,d_n)\,C^{-1}\big)^k = C\,\mathrm{diag}(d_1,\dots,d_n)\,C^{-1}\,C\,\mathrm{diag}(d_1,\dots,d_n)\,C^{-1}\cdots C\,\mathrm{diag}(d_1,\dots,d_n)\,C^{-1} = C\,\big(\mathrm{diag}(d_1,\dots,d_n)\big)^k\,C^{-1} = C\,\mathrm{diag}(d_1^k,\dots,d_n^k)\,C^{-1}. \]
If all $d_j \neq 0$, then $D$ is invertible with inverse $D^{-1} = \big(\mathrm{diag}(d_1,\dots,d_n)\big)^{-1} = \mathrm{diag}(d_1^{-1},\dots,d_n^{-1})$.
Hence $A$ is invertible with $A^{-1} = (CDC^{-1})^{-1} = CD^{-1}C^{-1}$ and we obtain for $k \in \mathbb{Z}$ with $k < 0$
\[ A^k = A^{-|k|} = (A^{-1})^{|k|} = \big(CD^{-1}C^{-1}\big)^{|k|} = C\,(D^{-1})^{|k|}\,C^{-1} = C\,D^{-|k|}\,C^{-1} = C\,\mathrm{diag}(d_1^k,\dots,d_n^k)\,C^{-1}. \]
= C diag(dk1 , . . . , dkn )C −1 .

Proposition 8.23 is useful for example when we describe dynamical systems by matrices or when
we solve linear differential equations with constant coefficients in higher dimensions.
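For instance, the following sketch (an added illustration, not part of the original notes, assuming NumPy) builds a diagonalisable matrix from a chosen basis and diagonal and compares the formula of Proposition 8.23 with direct matrix powers:

    import numpy as np

    # build a diagonalisable matrix A = C D C^{-1} from a chosen basis and diagonal
    C = np.array([[1.0, 1.0], [1.0, 2.0]])     # invertible: columns are the basis vectors c_1, c_2
    D = np.diag([2.0, 3.0])                    # D = diag(d_1, d_2)
    A = C @ D @ np.linalg.inv(C)

    k = 5
    Ak_direct = np.linalg.matrix_power(A, k)                       # A^k by repeated multiplication
    Ak_diag = C @ np.diag([2.0**k, 3.0**k]) @ np.linalg.inv(C)     # C diag(d_1^k, d_2^k) C^{-1}
    print(np.allclose(Ak_direct, Ak_diag))                         # True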


You should now have understood


• that similar matrices represent the same linear transformation,
• why similar matrices have the same determinant,
• why a matrix is diagonalisable if and only if Rn (or Cn ) admits a basis consisting of eigen-
vectors of A,
• etc.
You should now be able to
• etc.

8.3 Eigenvalues and eigenvectors


Definition 8.24. Let $V$ be a vector space and let $T : V \to V$ be a linear transformation. A number
$\lambda$ is called an eigenvalue of $T$ if there exists a vector $\vec v \neq \vec 0$ such that
\[ T\vec v = \lambda\vec v. \tag{8.6} \]
The vector $\vec v$ is then called an eigenvector.

The reason why we exclude $\vec v = \vec 0$ in the definition above is that for every $\lambda$ it is true that
$T\vec 0 = \vec 0 = \lambda\vec 0$, so (8.6) would be satisfied for any $\lambda$ if we were allowed to choose $\vec v = \vec 0$, in which
case the definition would not make too much sense.
RA
Exercise 8.25. Show that 0 is an eigenvalue of T if and only if dim(ker T ) ≥ 1, that is, if and only
if T is not invertible. Show that v is an eigenvector with eigenvalue 0 if and only if v ∈ ker T \{O}.

Exercise 8.26. Show that all eigenvalues of a unitary matrix have norm 1.

Question 8.4
Let V, W be vector spaces and let T : V → W be a linear transformation. Why does it not make
sense to speak of eigenvalues of T if V 6= W ?

Let us list some properties of eigenvectors that are easy to see.


(i) A vector v is an eigenvector of T if and only if T v k v.
(ii) If v is an eigenvector of T with eigenvalue λ 6= 0, then v ∈ Im T because v = λ1 T v.
(iii) If v is an eigenvector of T with eigenvalue λ, then every non-zero multiple of v is an eigenvector
with the same eigenvalue because
T (cv) = cT v = cλv = λ(cv).

(iv) We can generalise (iii) as follows: If v1 , . . . , vk are eigenvectors of T with the same eigenvalue
λ, then every non-zero linear combination is an eigenvector with the same eigenvalue because
T (α1 v1 + . . . αk vk ) = α1 T v1 + . . . αk T vk = α1 λv1 + · · · + αk λvk = λ(α1 v1 + · · · + αk vk ).


(iv) says that the set of all eigenvectors with the same eigenvalue is almost a subspace. The only
thing missing is the zero vector O. This motivates the following definition.

Definition 8.27. Let V be a vector space and let T : V → V be a linear map with eigenvalue λ.
Then the eigenspace of T corresponding to λ is

Eigλ (T ) := Eig(T, λ) := {v ∈ V : v is eigenvector of T with eigenvalue λ} ∪ {O}


= {v ∈ V : T v = λv}.

The dimension of Eigλ (T ) is called the geometric multiplicity of λ.

Proposition 8.28. Let T : V → V be a linear map and let λ be an eigenvalue of T . Then

Eigλ (T ) = ker(T − λ id).

Proof. Let v ∈ V . Then

v ∈ Eigλ (T ) ⇐⇒ T v = λv ⇐⇒ T v − λv = O ⇐⇒ T v − λ id v = O
⇐⇒ (T − λ id)v = O ⇐⇒ v ∈ ker(T − λ id).

Note that Proposition 8.28 shows again that Eigλ (T ) is a subspace of V . Moreover it shows that
λ is an eigenvalue of T if and only if T − λ id is not invertible. For the special case λ = 0 we
have that Eig0 (T ) = ker T .
RA
Examples 8.29. (a) Let V be a vector space and let T = id. Then for every v ∈ V we have that
T v = v = 1v. Hence T has only one eigenvalue, namely λ = 1 and Eig1 (T ) = ker(T − id) =
ker 0 = V . Its geometric multiplicity is dim(Eig1 (T )) = dim V .

(b) Let V = R2 and let R be reflection on the x-axis. If ~v is an eigenvector of R, then R~v
must be parallel to ~v . This happens if and only if ~v is parallel to the x-axis in which case
R~v = ~v , or if ~v is perpendicular to the x-axis in which case R~v = −~v . All other vectors
change directions under a reflection. Hence we have the eigenvalues λ1 = 1 and λ2 = −1 and
Eig1 (R) = span{~e1 }, Eig−1 (R) = span{~e2 }. Each eigenvalue has geometric multiplicity 1.

Note that the matrix representation of $R$ with respect to the canonical basis of $\mathbb{R}^2$ is $A_R = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ and $A_R\vec x = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix}$. Hence $A_R\vec x$ is parallel to $\vec x$ if and only if
$x_1 = 0$ (in which case $\vec x \in \operatorname{span}\{\vec e_2\}$) or $x_2 = 0$ (in which case $\vec x \in \operatorname{span}\{\vec e_1\}$).

(c) Let $V = \mathbb{R}^2$ and let $R$ be the rotation by $90^\circ$. Then clearly $R\vec v \not\parallel \vec v$ for any $\vec v \in \mathbb{R}^2 \setminus \{\vec 0\}$. Hence
$R$ has no eigenvalues.
Note that the matrix representation of $R$ with respect to the canonical basis of $\mathbb{R}^2$ is $A_R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. If we consider $A_R$ as a real matrix, then it has no eigenvalues. However, if we consider
$A_R$ as a complex matrix, then it has the eigenvalues $\pm i$ as we shall see later.


1 0 0 0 0 0
050000
(d) Let A =  00 00 50 05 00 00 . As always, we identify A with the linear map R6 → R6 , ~x 7→ A~x. It
000080
000000
is not hard to see that the eigenvalues and eigenspaces of A are

λ1 = 1, Eig1 (A) = span{~e1 }, geom. multiplicity: 1,


λ2 = 5, Eig5 (A) = span{~e2 , ~e3 , ~e4 }, geom. multiplicity: 3,
λ3 = 8, Eig8 (A) = span{~e6 , ~e7 }. geom. multiplicity: 2.

Show the claims above.


(e) Let V = C ∞ (R) be the space of all infinitely many times differentiable functions from R
to R and let T : V → V, T f = f 0 . Analogously to Example 6.5 we can show that T is a
linear transformation. The eigenvalues of T are those λ ∈ R such that there exists a function
f ∈ C ∞ (R) with f 0 = λf . We know that for every λ ∈ R this differential equation has a
solution and that every solution is of the form fλ (x) = c eλx for some real number c. Therefore
every λ ∈ R is an eigenvalue of T with eigenspace Eigλ (T ) = span{gλ } where gλ is the function

given by gλ (x) = eλx . In particular, the geometric multiplicity of any λ ∈ R is 1.

(f) Let $V = C^\infty(\mathbb{R})$ be the space of all infinitely many times differentiable functions from $\mathbb{R}$ to
$\mathbb{R}$ and let $T : V \to V$, $Tf = f''$. It is easy to see that $T$ is a linear transformation. The
eigenvalues of $T$ are those $\lambda \in \mathbb{R}$ such that there exists a function $f \in C^\infty(\mathbb{R})$ with $f'' = \lambda f$.
If $\lambda > 0$, then the general solution of this differential equation is $f_\lambda(x) = a\,\mathrm{e}^{\sqrt{\lambda}\,x} + b\,\mathrm{e}^{-\sqrt{\lambda}\,x}$. If
$\lambda < 0$, the general solution is $f_\lambda(x) = a\cos(\sqrt{-\lambda}\,x) + b\sin(\sqrt{-\lambda}\,x)$. If $\lambda = 0$, the general solution is
$f_0(x) = ax + b$. Hence every $\lambda \in \mathbb{R}$ is an eigenvalue of $T$ with geometric multiplicity 2.
RA
Write down the eigenspaces for a given λ.

Find the eigenvalues and eigenspaces if we consider the vector space of infinitely differentiable
functions from R to C.

In the examples above it was relatively easy to guess the eigenvalues. But how do we calculate the
eigenvalues of, e.g., $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ or of the linear transformation $T : M(n\times n) \to M(n\times n)$, $T(A) = A + A^t$?
Since any linear transformation on a finite dimensional vector space V can be “translated” to a
matrix by choosing a basis on V , it is sufficient to find eigenvalues of matrices as the next theorem
shows.

Theorem 8.30. Let $V$ be a finite dimensional vector space with basis $B = \{v_1, \dots, v_n\}$ and let
$T : V \to V$ be a linear transformation. If $A_T$ is the matrix representation of $T$ with respect to
the basis $B$, then the eigenvalues of $T$ and $A_T$ coincide and a vector $v = c_1 v_1 + \dots + c_n v_n$ is an
eigenvector of $T$ with eigenvalue $\lambda$ if and only if $\vec x = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$ is an eigenvector of $A_T$ with the same
eigenvalue $\lambda$. In particular, the dimensions of the eigenspaces of $T$ and of $A_T$ coincide.


Proof. Let $K = \mathbb{R}$ if $V$ is a real vector space and $K = \mathbb{C}$ if $V$ is a complex vector space and
let $\Phi : V \to K^n$ be the linear map defined by $\Phi(v_j) = \vec e_j$, $(j = 1,\dots,n)$. That means that $\Phi$
"translates" a vector $v = c_1 v_1 + \dots + c_n v_n$ into the column vector $\vec x = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}$, cf. Section 6.4.
[Commutative diagram: $T : V \to V$ and $A_T : K^n \to K^n$, connected on both sides by $\Phi$.]
Recall that $T = \Phi^{-1} A_T \Phi$. Let $\lambda$ be an eigenvalue of $T$ with eigenvector $v$, that is, $Tv = \lambda v$. We
express $v$ as a linear combination of the basis vectors from $B$ as $v = c_1 v_1 + \dots + c_n v_n$. Hence
\[ Tv = \lambda v \iff \Phi^{-1} A_T \Phi v = \lambda v \iff A_T \Phi v = \Phi(\lambda v) \iff A_T(\Phi v) = \lambda(\Phi v) \]
which is the case if and only if $\lambda$ is an eigenvalue of $A_T$ and $\Phi v \in \operatorname{Eig}_\lambda(A_T)$.

FT
The proof shows that Eigλ (AT ) = Φ(Eigλ (T )) as was to be expected.

Corollary 8.31. Assume that A and B are similar matrices and let C be an invertible matrix with
A = C −1 BC. Then A and B have the same eigenvalues and for every eigenvalue λ we have that
Eigλ (B) = C Eigλ (A).

Now back to the question about how to calculate the eigenvalues and eigenvectors of a given matrix
A. Recall that λ is an eigenvalue of A if and only if ker(A − λ id) 6= {~0}, see Proposition 8.28. Since
RA
A − λ id is a square matrix, this is the case if and only if det(A − λ id) = 0.

Definition 8.32. The function λ 7→ det(A − λ id) is called the characteristic polynomial of A. It
is usually denoted by pA .

Before we discuss the characteristic polynomial and show that it is indeed a polynomial, we will
describe how to find the eigenvalues and eigenvectors of a given square matrix A.

Procedure to find the eigenvalues and eigenvectors of a given square matrix A.



• Calculate the characteristic polynomial pA (λ) := det(A − λ id).

• Find the zeros λ1 , . . . , λk of the characteristic polynomial. They are the eigenvalues of A.
• For each eigenvalue λj calculate ker(A − λj id), for instance using Gauß-Jordan elimination.
This gives the eigenspaces.

 
Example 8.33. Find the eigenvalues and eigenspaces of $A = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}$.
Solution. • The characteristic polynomial of $A$ is
\[ p_A(\lambda) = \det(A - \lambda\,\mathrm{id}) = \det\begin{pmatrix} 2-\lambda & 1 \\ 3 & 4-\lambda \end{pmatrix} = (2-\lambda)(4-\lambda) - 3 = \lambda^2 - 6\lambda + 5. \]


• Now we can either complete the square or use the solution formula for quadratic equations to
find the zeros of pA . Here we choose to complete the square.
pA (λ) = λ2 − 6λ + 5 = (λ − 3)2 − 4 = (λ − 5)(λ − 1).
Hence the eigenvalues of A are λ1 = 5 and λ2 = 1.
• Now we calculate the eigenspaces using Gauß elimination.
       
∗ $A - 5\,\mathrm{id} = \begin{pmatrix} 2-5 & 1 \\ 3 & 4-5 \end{pmatrix} = \begin{pmatrix} -3 & 1 \\ 3 & -1 \end{pmatrix} \xrightarrow{R_2 \to R_2 + R_1} \begin{pmatrix} -3 & 1 \\ 0 & 0 \end{pmatrix} \xrightarrow{R_1 \to -R_1} \begin{pmatrix} 3 & -1 \\ 0 & 0 \end{pmatrix}$.
Therefore, $\ker(A - 5\,\mathrm{id}) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ 3 \end{pmatrix}\right\}$.
∗ $A - 1\,\mathrm{id} = \begin{pmatrix} 2-1 & 1 \\ 3 & 4-1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 3 & 3 \end{pmatrix} \xrightarrow{R_2 \to R_2 - 3R_1} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$.
Therefore, $\ker(A - 1\,\mathrm{id}) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$.

In summary, we have two eigenvalues,
\[ \lambda_1 = 5, \quad \operatorname{Eig}_5(A) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ 3 \end{pmatrix}\right\}, \quad\text{geom. multiplicity: } 1, \]
\[ \lambda_2 = 1, \quad \operatorname{Eig}_1(A) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}, \quad\text{geom. multiplicity: } 1. \]

   
If we set $\vec v_1 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ and $\vec v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$ we can check our result by calculating
\[ A\vec v_1 = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 5 \\ 15 \end{pmatrix} = 5\begin{pmatrix} 1 \\ 3 \end{pmatrix} = 5\vec v_1, \qquad A\vec v_2 = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix} = \vec v_2. \]
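We can also let the computer confirm the result of Example 8.33. A short check (an added sketch, assuming NumPy; note that np.linalg.eig returns normalised eigenvectors, so they agree with ours only up to a scalar factor):

    import numpy as np

    A = np.array([[2.0, 1.0], [3.0, 4.0]])
    eigenvalues, eigenvectors = np.linalg.eig(A)     # columns of `eigenvectors` are eigenvectors
    print(np.sort(eigenvalues))                      # [1. 5.]

    # each column v satisfies A v = lambda v (up to rounding):
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ v, lam * v))           # True, True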

Before we give more examples, we show that the characteristic polynomial is indeed a polynomial.
First we need a definition.

Definition 8.34. Let A = (aij )ni,j=1 ∈ M (n × n). The trace of A is the sum of its entries on the
diagonal:
tr A := a11 + a22 + . . . ann .

Theorem 8.35. Let A = (aij )ni,j=1 ∈ M (n × n) and let pA (λ) = det(A − λ id) be the characteristic
polynomial of A. Then the following is true.
(i) $p_A$ is a polynomial of degree $n$.
(ii) Let $p_A(\lambda) = c_n\lambda^n + c_{n-1}\lambda^{n-1} + \dots + c_1\lambda + c_0$. Then we have formulas for the coefficients
$c_n$, $c_{n-1}$ and $c_0$:
\[ c_n = (-1)^n, \qquad c_{n-1} = (-1)^{n-1}\operatorname{tr} A, \qquad c_0 = \det A. \]


Proof. By definition,
\[ p_A(\lambda) = \det(A - \lambda\,\mathrm{id}) = \det\begin{pmatrix} a_{11}-\lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22}-\lambda & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn}-\lambda \end{pmatrix}. \]

According to Remark 4.4, the determinant is the sum of products where each product consists of a
sign and n factors chosen such that it contains one entry from each row and from each column of
A − λ id. Therefore it is clear that pA is a polynomial in λ. The term with the most λ in it is the
one of the form
(a11 − λ)(a22 − λ) · · · (ann − λ). (8.7)
All the other terms contain at most n − 2 factors with λ. To see this, assume for example that in
one of the terms the factor from the first row is not (a11 − λ) but some a1j . Then there cannot be

another factor from the jth column, in particular the factor (ajj − λ) cannot appear. So this term
has already two factors without λ, hence the degree of the term as polynomial in λ can be at most
n − 2. This shows that

pA (λ) = (a11 − λ)(a22 − λ) · · · (ann − λ) + terms of order at most n − 2. (8.8)

If we expand the first term and sort by powers of λ, we obtain

\[ (a_{11}-\lambda)(a_{22}-\lambda)\cdots(a_{nn}-\lambda) = (-1)^n\lambda^n + (-1)^{n-1}\lambda^{n-1}(a_{11}+\dots+a_{nn}) + \text{terms of order at most } n-2. \]

Inserting this in (8.8), we find that

pA (λ) = (−1)n λn + (−1)n−1 λn−1 (a11 + · · · + ann ) + terms of order at most n − 2, (8.9)

hence deg(pA ) = n.
Formula (8.9) also shows the claim about cn and cn−1 . The formula for c0 follows from

c0 = pA (0) = det(A − 0 id) = det A.

We immediately obtain the following very important corollary.

Corollary 8.36. An n × n matrix can have at most n different eigenvalues.

Proof. Let A ∈ M (n × n). Then the eigenvalues of A are exactly the zeros of its characteristic
polynomial. Since it has degree n, it can have at most n zeros.
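Returning to Theorem 8.35, the coefficient formulas are easy to test numerically. The sketch below is an added illustration (assuming NumPy); np.poly returns the coefficients of det(λ id − A) = (−1)^n p_A(λ), so multiplying by (−1)^n recovers c_n, …, c_0:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    A = rng.standard_normal((n, n))

    # np.poly(A) gives the coefficients of det(lambda*id - A); multiply by (-1)^n to get p_A
    coeffs = (-1) ** n * np.poly(A)          # [c_n, c_{n-1}, ..., c_1, c_0]

    print(np.isclose(coeffs[0], (-1) ** n))                          # c_n = (-1)^n
    print(np.isclose(coeffs[1], (-1) ** (n - 1) * np.trace(A)))      # c_{n-1} = (-1)^{n-1} tr A
    print(np.isclose(coeffs[-1], np.linalg.det(A)))                  # c_0 = det A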
Now we understand why working with complex vector spaces is more suitable when we are interested
in eigenvalues. They are precisely the zeros of the characteristic polynomial. While a polynomial
may not have real zeros, it always has zeros when we allow them to be complex numbers. Indeed,
any polynomial can always be factorised over C.


Let A ∈ M (n × n) and let pA be its characteristic polynomial. Then there exist complex numbers
λ1 , . . . , λk and integers m1 , . . . , mk ≥ 1 such that
\[ p_A(\lambda) = (\lambda_1 - \lambda)^{m_1}\,(\lambda_2 - \lambda)^{m_2}\cdots(\lambda_k - \lambda)^{m_k}. \]
The numbers λ1 , . . . , λk are precisely the complex eigenvalues of A and m1 +· · ·+mk = deg pA = n.

Definition 8.37. The integer mj is called the algebraic multiplicity of the eigenvalue λj .

The following theorem is very important but we omit its proof.

Theorem 8.38. Let A ∈ M (n × n) and let λ be an eigenvalue of A. Then


geometric multiplicity of λ ≤ algebraic multiplicity of λ.
 
Example 8.39. Let $A = \begin{pmatrix} 1&0&0&0&0&0\\ 0&5&1&0&0&0\\ 0&0&5&1&0&0\\ 0&0&0&5&0&0\\ 0&0&0&0&8&0\\ 0&0&0&0&0&8 \end{pmatrix}$. Since $A - \lambda\,\mathrm{id}$ is an upper triangular matrix, its
determinant is the product of the entries on the diagonal. We obtain
\[ p_A(\lambda) = \det(A - \lambda\,\mathrm{id}) = (1-\lambda)(5-\lambda)^3(8-\lambda)^2. \]
Therefore the eigenvalues of A are λ1 = 1, λ2 = 5, λ3 = 8. Let us calculate the eigenspaces.
• $A - 1\,\mathrm{id} = \begin{pmatrix} 0&0&0&0&0&0\\ 0&4&1&0&0&0\\ 0&0&4&1&0&0\\ 0&0&0&4&0&0\\ 0&0&0&0&7&0\\ 0&0&0&0&0&7 \end{pmatrix} \xrightarrow{\text{permute rows}} \begin{pmatrix} 0&4&1&0&0&0\\ 0&0&4&1&0&0\\ 0&0&0&4&0&0\\ 0&0&0&0&7&0\\ 0&0&0&0&0&7\\ 0&0&0&0&0&0 \end{pmatrix}$. This matrix is in row echelon form and
we can see easily that $\operatorname{Eig}_1(A) = \ker(A - 1\,\mathrm{id}) = \operatorname{span}\{\vec e_1\}$ which has dimension 1.
• $A - 5\,\mathrm{id} = \begin{pmatrix} -4&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&3&0\\ 0&0&0&0&0&3 \end{pmatrix} \xrightarrow{\text{permute rows}} \begin{pmatrix} -4&0&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&3&0\\ 0&0&0&0&0&3\\ 0&0&0&0&0&0 \end{pmatrix}$. This matrix is in row echelon form
and we can see easily that $\operatorname{Eig}_5(A) = \ker(A - 5\,\mathrm{id}) = \operatorname{span}\{\vec e_2\}$ which has dimension 1.
• $A - 8\,\mathrm{id} = \begin{pmatrix} -7&0&0&0&0&0\\ 0&-3&1&0&0&0\\ 0&0&-3&1&0&0\\ 0&0&0&-3&0&0\\ 0&0&0&0&0&0\\ 0&0&0&0&0&0 \end{pmatrix}$. This matrix is in row echelon form and we can see easily that
$\operatorname{Eig}_8(A) = \ker(A - 8\,\mathrm{id}) = \operatorname{span}\{\vec e_5, \vec e_6\}$ which has dimension 2.
In summary, we have
\[ \lambda_1 = 1,\ \operatorname{Eig}_1(A) = \operatorname{span}\{\vec e_1\},\ \text{geom. multiplicity: } 1,\ \text{alg. multiplicity: } 1, \]
\[ \lambda_2 = 5,\ \operatorname{Eig}_5(A) = \operatorname{span}\{\vec e_2\},\ \text{geom. multiplicity: } 1,\ \text{alg. multiplicity: } 3, \]
\[ \lambda_3 = 8,\ \operatorname{Eig}_8(A) = \operatorname{span}\{\vec e_5, \vec e_6\},\ \text{geom. multiplicity: } 2,\ \text{alg. multiplicity: } 2. \]
 
Example 8.40. Find the complex eigenvalues and eigenspaces of $R = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$.


Solution. From Example 8.29 we already know that R has no real eigenvalues. The characteristic
polynomial of R is
 
\[ p_R(\lambda) = \det(R - \lambda\,\mathrm{id}) = \det\begin{pmatrix} -\lambda & -1 \\ 1 & -\lambda \end{pmatrix} = \lambda^2 + 1 = (\lambda - i)(\lambda + i). \]
Hence the eigenvalues are $\lambda_1 = -i$ and $\lambda_2 = i$. Let us calculate the eigenspaces.

• $R - (-i)\,\mathrm{id} = \begin{pmatrix} i & -1 \\ 1 & i \end{pmatrix} \xrightarrow{R_2 \to R_2 + iR_1} \begin{pmatrix} i & -1 \\ 0 & 0 \end{pmatrix}$. Hence $\operatorname{Eig}_{-i}(R) = \ker(R + i\,\mathrm{id}) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ i \end{pmatrix}\right\}$.
• $R - i\,\mathrm{id} = \begin{pmatrix} -i & -1 \\ 1 & -i \end{pmatrix} \xrightarrow{R_2 \to R_2 - iR_1} \begin{pmatrix} -i & -1 \\ 0 & 0 \end{pmatrix}$. Hence $\operatorname{Eig}_{i}(R) = \ker(R - i\,\mathrm{id}) = \operatorname{span}\left\{\begin{pmatrix} 1 \\ -i \end{pmatrix}\right\}$.
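NumPy works over the complex numbers throughout, so it finds the eigenvalues ±i of R directly (an added check, not part of the original notes):

    import numpy as np

    R = np.array([[0.0, -1.0], [1.0, 0.0]])
    eigenvalues, eigenvectors = np.linalg.eig(R)
    print(eigenvalues)                            # approximately [0.+1.j, 0.-1.j]

    # the eigenvectors are scalar multiples of (1, -i) and (1, i):
    for lam, v in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(R @ v, lam * v))        # True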

 
Example 8.41. Find the diagonalisation of $A = \begin{pmatrix} 2 & 1 \\ 3 & 4 \end{pmatrix}$.

Solution. We need to find an invertible matrix $C$ and a diagonal matrix $D$ such that $D = C^{-1}AC$.
By Example 8.33, $A$ has the eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 1$, hence $A$ is indeed diagonalisable. We
know that the diagonal entries of $D$ are the eigenvalues of $A$, hence $D = \mathrm{diag}(5, 1)$, and the columns
of $C$ are the corresponding eigenvectors $\vec v_1 = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ and $\vec v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, hence
\[ D = \begin{pmatrix} 5 & 0 \\ 0 & 1 \end{pmatrix}, \qquad C = \begin{pmatrix} 1 & 1 \\ 3 & -1 \end{pmatrix} \qquad\text{and}\quad D = C^{-1}AC. \]
Alternatively, we could have chosen $\widetilde D = \mathrm{diag}(1, 5)$. Then the corresponding $\widetilde C$ is $\widetilde C = (\vec v_2\,|\,\vec v_1)$
because the $j$th column of the invertible matrix must be an eigenvector corresponding to the $j$th
entry of the diagonal matrix, hence
\[ \widetilde D = \begin{pmatrix} 1 & 0 \\ 0 & 5 \end{pmatrix}, \qquad \widetilde C = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix} \qquad\text{and}\quad \widetilde D = \widetilde C^{-1}A\widetilde C. \]

Observe that up to ordering the diagonal elements, the matrix D is uniquely determined by A. For
the matrix C however we have more choices. For instance, if we multiply each column of C by an
arbitrary constant different from 0, it still works.
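A quick numerical confirmation of Example 8.41, including the freedom in the choice of C (an added sketch, assuming NumPy):

    import numpy as np

    A = np.array([[2.0, 1.0], [3.0, 4.0]])
    C = np.array([[1.0, 1.0], [3.0, -1.0]])    # columns: the eigenvectors v_1, v_2
    D = np.diag([5.0, 1.0])

    print(np.allclose(np.linalg.inv(C) @ A @ C, D))    # True

    # rescaling the columns of C does not change anything:
    C2 = C @ np.diag([2.0, -7.0])
    print(np.allclose(np.linalg.inv(C2) @ A @ C2, D))  # True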

Example 8.42. Let V = M (2 × 2) and let T : V → V, T (M ) = M + M t . Find the eigenvalues


and eigenspaces of T .

Solution. Let $M_1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $M_2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, $M_3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}$, $M_4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$. Then $B = \{M_1, M_2, M_3, M_4\}$
is a basis of $M(2\times 2)$. The matrix representation of $T$ with respect to it is
\[ A_T = \begin{pmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}. \]


The characteristic polynomial is
\[ \det(A_T - \lambda\,\mathrm{id}) = \det\begin{pmatrix} 2-\lambda & 0 & 0 & 0 \\ 0 & 1-\lambda & 1 & 0 \\ 0 & 1 & 1-\lambda & 0 \\ 0 & 0 & 0 & 2-\lambda \end{pmatrix} = (2-\lambda)\det\begin{pmatrix} 1-\lambda & 1 & 0 \\ 1 & 1-\lambda & 0 \\ 0 & 0 & 2-\lambda \end{pmatrix} = (2-\lambda)^2\big((1-\lambda)^2 - 1\big) = \lambda(\lambda - 2)^3. \]
Hence there are two eigenvalues: $\lambda_1 = 0$ and $\lambda_2 = 2$.


Let us find the eigenspaces.
• $A_T - 0\,\mathrm{id} = A_T = \begin{pmatrix} 2&0&0&0\\ 0&1&1&0\\ 0&1&1&0\\ 0&0&0&2 \end{pmatrix} \xrightarrow[R_1\to\frac12 R_1,\ R_4\to\frac12 R_4]{R_3\to R_3 - R_2} \begin{pmatrix} 1&0&0&0\\ 0&1&1&0\\ 0&0&0&0\\ 0&0&0&1 \end{pmatrix} \xrightarrow{R_3\leftrightarrow R_4} \begin{pmatrix} 1&0&0&0\\ 0&1&1&0\\ 0&0&0&1\\ 0&0&0&0 \end{pmatrix},$
• $A_T - 2\,\mathrm{id} = \begin{pmatrix} 0&0&0&0\\ 0&-1&1&0\\ 0&1&-1&0\\ 0&0&0&0 \end{pmatrix} \xrightarrow{R_2\to R_2 + R_3} \begin{pmatrix} 0&0&0&0\\ 0&0&0&0\\ 0&1&-1&0\\ 0&0&0&0 \end{pmatrix} \xrightarrow{R_1\leftrightarrow R_3} \begin{pmatrix} 0&1&-1&0\\ 0&0&0&0\\ 0&0&0&0\\ 0&0&0&0 \end{pmatrix}.$
Hence $\operatorname{Eig}_0(A_T) = \operatorname{span}\left\{\begin{pmatrix} 0\\ 1\\ -1\\ 0 \end{pmatrix}\right\}$ and $\operatorname{Eig}_2(A_T) = \operatorname{span}\left\{\begin{pmatrix} 1\\ 0\\ 0\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 0\\ 0\\ 0\\ 1 \end{pmatrix}\right\}$.
This means that the eigenvalues of $T$ are 0 and 2 and that the eigenspaces are
\[ \operatorname{Eig}_0(T) = \operatorname{span}\{M_2 - M_3\} = \operatorname{span}\left\{\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\right\} = M_{\mathrm{asym}}(2\times 2), \]
\[ \operatorname{Eig}_2(T) = \operatorname{span}\{M_1,\, M_2 + M_3,\, M_4\} = \operatorname{span}\left\{\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}\right\} = M_{\mathrm{sym}}(2\times 2). \]

Remark. We could have calculated the eigenspaces of T directly without calculating those of AT
first as follows.

• A matrix M belongs to Eig0 (T ) if and only if T (M ) = 0. This is the case if and only if
M + M t = 0 which means that M = −M t . So Eig0 (T ) is the space of all antisymmetric 2 × 2
matrices.

• A matrix M belongs to Eig2 (T ) if and only if T (M ) = 2M . This means that M + M t = 2M .
This is the case if and only if M = M t . So Eig2 (T ) is the space of all symmetric 2 × 2 matrices.
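The result of Example 8.42 can also be double-checked by computing the eigenvalues of the 4 × 4 representation matrix A_T (an added sketch, assuming NumPy):

    import numpy as np

    AT = np.array([[2.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 2.0]])

    eigenvalues, _ = np.linalg.eig(AT)
    print(np.sort(eigenvalues))      # [0. 2. 2. 2.]  -> eigenvalue 0 once, eigenvalue 2 three times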


You should now have understood


• the concept of eigenvalues and eigenvectors,
• why an n × n matrix can have at most n eigenvalues,
• why the restriction of A to any of its eigenspaces acts as a multiple of the identity,
• what the characteristic polynomial of a matrix says about its eigenvalues,
• why a n × n matrix is diagonalisable if and only if Kn has a basis consisting of eigenvectors
of A,
• etc.
You should now be able to

• calculate the characteristic polynomial of a square matrix A,


• calculate the eigenvalues and eigenvectors of a square matrix A,
• diagonalise a diagonalisable matrix,

• etc.

8.4 Properties of the eigenvalues and eigenvectors


In this section we collect important properties of eigenvectors.

Proposition 8.43. Let A ∈ M (n × n) and let λ1 , λ2 , . . . , λk be pairwise different eigenvalues of


A with eigenvectors ~v1 , ~v2 , . . . , ~vk . Then the vectors ~v1 , ~v2 , . . . , ~vk are linearly independent.

Proof. We prove the claim by induction.


Basis of the induction: k = 2. Assume that λ1 6= λ2 are eigenvalues of A with eigenvectors ~v1 and
~v2 . Hence A~v1 = λ1~v1 and A~v2 = λ2~v2 and ~v1 6= ~0 6= ~v2 . Let α1 , α2 be numbers such that

α1~v1 + α2~v2 = ~0. (8.10)


Assume that $\alpha_1 \neq 0$. Then $\vec v_1 = -\frac{\alpha_2}{\alpha_1}\vec v_2$ and
\[ \lambda_1\vec v_1 = A\vec v_1 = A\Big(-\frac{\alpha_2}{\alpha_1}\vec v_2\Big) = -\frac{\alpha_2}{\alpha_1}A\vec v_2 = -\frac{\alpha_2}{\alpha_1}\lambda_2\vec v_2 = \lambda_2\Big(-\frac{\alpha_2}{\alpha_1}\vec v_2\Big) = \lambda_2\vec v_1 \implies \vec 0 = (\lambda_1 - \lambda_2)\vec v_1. \]

Since λ1 6= λ2 and ~v1 6= ~0, the last equality is false and therefore we must have α1 = 0. Then,
by (8.10), ~0 = α1~v1 + α2~v2 = α2~v2 , hence also α2 = 0 which proves that ~v1 and ~v2 are linearly
independent.
Induction step: Assume that we already know for some j < k that the vectors ~v1 , . . . , ~vj are linearly
independent. We have to show that then also the vectors ~v1 , . . . , ~vj+1 are linearly independent. To
this end, let α1 , α2 , . . . , αj+1 such that

~0 = α1~v1 + α2~v2 + · · · + αj ~vj + αj+1~vj+1 . (8.11)


On the one hand we apply A on both sides of the equation and use the fact that vectors are
eigenvectors. On the other hand we multiply both sides by λj+1 and then we compare the two
results.

apply A : ~0 = A(α1~v1 + α2~v2 + · · · + αj ~vj + αj+1~vj+1 )


= α1 A~v1 + α2 A~v2 + · · · + αj A~vj + αj+1 A~vj+1
= α1 λ1~v1 + α2 λ2~v2 + · · · + αj λj ~vj + αj+1 λj+1~vj+1 1

multiply by λj+1 : ~0 = α1 λj+1~v1 + α2 λj+1~v2 + · · · + αj λj+1~vj + αj+1 λj+1~vj+1 2

The difference (1) − (2) gives
\[ \vec 0 = \alpha_1(\lambda_1 - \lambda_{j+1})\vec v_1 + \alpha_2(\lambda_2 - \lambda_{j+1})\vec v_2 + \dots + \alpha_j(\lambda_j - \lambda_{j+1})\vec v_j. \]

Note that the term with ~vj+1 cancelled. By the induction hypothesis, the vectors ~v1 , . . . , ~vj are
linearly independent, hence

\[ \alpha_1(\lambda_1 - \lambda_{j+1}) = 0, \quad \alpha_2(\lambda_2 - \lambda_{j+1}) = 0, \quad\dots,\quad \alpha_j(\lambda_j - \lambda_{j+1}) = 0. \]

We also know that λj+1 is not equal to any of the other λ` , hence it follows that

α1 = 0, α2 = 0, ..., αj = 0.

Inserting this in (8.11) gives that also αj+1 = 0 and the proof is complete.
RA
Note that the proposition shows again that an n×n matrix can have at most n different eigenvalues.

Corollary 8.44. Let $A \in M(n\times n)$ and let $\mu_1, \dots, \mu_k$ be the different eigenvalues of $A$. If in each
$\operatorname{Eig}_{\mu_j}(A)$ we choose linearly independent vectors $\vec v_1^{\,j}, \dots, \vec v_{\ell_j}^{\,j}$, then the system of all those vectors
is linearly independent. In particular, if we choose bases in $\operatorname{Eig}_{\mu_j}(A)$, we see that the sum of
eigenspaces is a direct sum
\[ \operatorname{Eig}_{\mu_1}(A) \oplus \dots \oplus \operatorname{Eig}_{\mu_k}(A) \]
and $\dim(\operatorname{Eig}_{\mu_1}(A) \oplus \dots \oplus \operatorname{Eig}_{\mu_k}(A)) = \dim(\operatorname{Eig}_{\mu_1}(A)) + \dots + \dim(\operatorname{Eig}_{\mu_k}(A))$.

Proof. Let $\alpha_j^{(m)}$ be numbers such that
\[ \vec 0 = \alpha_1^{(1)}\vec v_1^{\,1} + \dots + \alpha_{\ell_1}^{(1)}\vec v_{\ell_1}^{\,1} + \alpha_1^{(2)}\vec v_1^{\,2} + \dots + \alpha_{\ell_2}^{(2)}\vec v_{\ell_2}^{\,2} + \dots + \alpha_1^{(k)}\vec v_1^{\,k} + \dots + \alpha_{\ell_k}^{(k)}\vec v_{\ell_k}^{\,k} =: \vec w_1 + \vec w_2 + \dots + \vec w_k \]
with $\vec w_j = \alpha_1^{(j)}\vec v_1^{\,j} + \dots + \alpha_{\ell_j}^{(j)}\vec v_{\ell_j}^{\,j} \in \operatorname{Eig}_{\mu_j}(A)$. Proposition 8.43 implies that $\vec w_1 = \dots = \vec w_k = \vec 0$.
But then also all coefficients $\alpha_j^{(m)} = 0$ because for fixed $m$, the vectors $\vec v_1^{\,m}, \dots, \vec v_{\ell_m}^{\,m}$ are linearly
independent. Now all the assertions are clear.

A very special class of matrices are the diagonal matrices.


 
Theorem 8.45. (i) Let $D = \mathrm{diag}(d_1, \dots, d_n) = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{pmatrix}$ be a diagonal matrix. Then
the eigenvalues of $D$ are precisely the numbers $d_1, \dots, d_n$ and the geometric multiplicity of
each eigenvalue is equal to its algebraic multiplicity.
(ii) Let $B = \begin{pmatrix} d_1 & & * \\ & \ddots & \\ 0 & & d_n \end{pmatrix}$ and $C = \begin{pmatrix} d_1 & & 0 \\ & \ddots & \\ * & & d_n \end{pmatrix}$ be upper and lower triangular matrices
respectively. Then the eigenvalues of $B$ and $C$ are precisely the numbers $d_1, \dots, d_n$ and the algebraic
multiplicity of an eigenvalue is equal to the number of times it appears on the diagonal. In
general, nothing can be said about the geometric multiplicities.

Proof. (i) Since the determinant of a diagonal matrix is the product of its diagonal elements, we
obtain for the characteristic polynomial of $D$
\[ p_D(\lambda) = \det(D - \lambda\,\mathrm{id}) = \det\begin{pmatrix} d_1-\lambda & & 0 \\ & \ddots & \\ 0 & & d_n-\lambda \end{pmatrix} = (d_1-\lambda)\cdots(d_n-\lambda). \]

Since the zeros of the characteristic polynomial are the eigenvalues of D, we showed that the
numbers on the diagonal of D are precisely its eigenvalues. The algebraic multiplicity of an
eigenvalue µ is equal to the number of times it is repeated on the diagonal of D. The geometric
multiplicity of µ is equal to dim(ker(D − µ id)). Note that D − µ id is a diagonal matrix and
the jth entry on its diagonal is 0 if and only if µ = dj . It is not hard to see that the dimension
of the kernel of a diagonal matrix is equal to the number of zeros on its diagonal. So, in
summary we have for an eigenvalue µ of D:

algebraic multiplicity of µ = number of times µ appears in the diagonal of D


= geometric multiplicity of µ.
D

(ii) Since the determinant of a triangular matrix is the product of its diagonal elements, we obtain
for the characteristic polynomial of B
 
d − λ
 1
∗ 
pB (λ) = det(B − λ) = det   = (d1 − λ) · · · · · (dn − λ).
 

 0 

dn − λ

and analogously for C. The reasoning for the algebraic multiplicities of the eigenvalues is as
in the case of a diagonal matrix. However, in general the algebraic and geometric multiplicity
of an eigenvalue of a triangular matrix may be different as Example 8.39 shows.


 
Example 8.46. Let $D = \begin{pmatrix} 5&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&5&0&0&0\\ 0&0&0&8&0&0\\ 0&0&0&0&8&0\\ 0&0&0&0&0&5 \end{pmatrix}$. Then $p_D(\lambda) = (1-\lambda)(5-\lambda)^3(8-\lambda)^2$.
The eigenvalues are 1 (with geom. mult = alg. mult = 1), 5 (with geom. mult = alg. mult = 3)
and 8 (with geom. mult = alg. mult = 2).

Theorem 8.47. If A and B are similar matrices, then they have the same characteristic polyno-
mial. In particular, they have the same eigenvalues with the same algebraic multiplicities. Moreover,
also the geometric multiplicities are equal.

Proof. Let C be an invertible matrix such that A = C −1 BC. Hence

A − λ id = C −1 BC − λ id = C −1 BC − λC −1 C = C −1 (B − λ id)C

and we obtain for the characteristic polynomial of A

pA (λ) = det(A − λ id) = det(C −1 (B − λ id)C) = det(C −1 ) det(B − λ id) det C = det(B − λ id)
= pB (λ).

This shows that A and B have the same eigenvalues and that their algebraic multiplicities coincide.

Now let µ be an eigenvalue. Then


Eigµ (A) = ker(A − µ id) = ker(C −1 (B − µ id)C) = ker((B − µ id)C) = C −1 ker(B − µ id)
= C −1 Eigµ (B)

where in the second to last step we used that C −1 is invertible. The invertibility of C −1 also shows
that dim(C −1 Eigµ (B)) = dim(Eigµ (B)), hence dim(Eigµ (A)) = dim(Eigµ (B)), which proves that the
geometric multiplicity of µ as eigenvalue of A is equal to that of B.

Next we prove a very important theorem about the diagonalisation of matrices.

Theorem 8.48. Let A ∈ MK (n × n) with K = R or K = C. Then the following are equivalent.

(i) A is diagonalisable, that means that there exists a diagonal matrix D and an invertible matrix
C such that C −1 AC = D.
(ii) For every eigenvalue of A, its geometric and algebraic multiplicities are equal.
(iii) A has a set of n linearly independent eigenvectors.
(iv) Kn has a basis consisting of eigenvectors of A.

Proof. Let µ1 , . . . , µk be the different eigenvalues of A and let us denote the algebraic multiplicities
of µj by mj (A) and mj (D) and the geometric multiplicities by nj (A) and nj (D).


(i) =⇒ (ii): By assumption A and D are similar so they have the same eigenvalues by Theorem 8.47
and
mj (A) = mj (D) and nj (A) = nj (D) for all j = 1, . . . , k,
and Theorem 8.45 shows that

mj (D) = nj (D) for all j = 1, . . . , k,

because D is a diagonal matrix. Hence we conclude that also

mj (A) = nj (A) for all j = 1, . . . , k.

(ii) =⇒ (iii): Recall that the geometric multiplicities nj (A) are the dimensions of the kernel of
A − µj id. So in each ker(A − µj ) we may choose a basis consisting of nj (A) vectors. In total we
have n1 (A)+· · ·+nk (A) = m1 (A)+· · ·+mk (A) = n such vectors and they are linearly independent
by Corollary 8.44.
(iii) =⇒ (iv): This is clear because dim Kn = n.

FT
(iv) =⇒ (i): Let B = {~c1 , . . . , ~cn } be a basis of Kn consisting of eigenvectors of A and let d1 , . . . , dn
be the corresponding eigenvalues, that is, A~cj = dj ~cj . Note that the dj are not necessarily pairwise
different. Then the matrix C = (~c1 | · · · |~cn ) is invertible and C −1 AC is the representation of A in
the basis B, hence C −1 AC = diag(d1 , . . . , dn ). In more detail, using that ~cj = C~ej and C −1~cj = ~ej ,

jth column of C −1 AC = C −1 AC~ej = C −1 A~cj = C −1 (dj ~cj ) = dj C −1~cj = dj ~ej ,

hence D = (d1~e1 | · · · |dn~en ) = diag(d1 , . . . , dn ).


An immediate consequence of Theorem 8.48 is the following.

Corollary 8.49. If a matrix A ∈ M (n × n) has n different eigenvalues, then it is diagonalisable.

Proof. If A has n different eigenvalues λ1 , . . . , λn , then for each of them the algebraic multiplicity
is equal to 1. Moreover,

1 ≤ geometric multiplicity ≤ algebraic multiplicity = 1

for each eigenvalue. Hence the algebraic and the geometric multiplicity for each eigenvalue are equal
(both are equal to 1) and the claim follows from Theorem 8.48.

Corollary 8.50. If the matrix A ∈ M (n × n) is diagonalisable, then its determinant is equal to the
product of its eigenvalues.

Proof. Let λ1 , . . . , λn be the (not necessarily different) eigenvalues of A and let C be an invertible
matrix such that C −1 AC = D := diag(λ1 , . . . , λn ). Then
\[ \det A = \det(CDC^{-1}) = (\det C)(\det D)(\det C^{-1}) = \det D = \prod_{j=1}^{n}\lambda_j. \]


Theorem 8.51. Let A ∈ M (n × n) and let µ1 , . . . , µk be its different eigenvalues. Then A is


diagonalisable if and only if
Kn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). (8.12)
where K = R or K = C depending on whether A is acting on R or on C.

Proof. Let us denote the algebraic multiplicity of each µj by mj (A) and its geometric multiplicity
by nj (A).
If A is diagonalisable, then the geometric and algebraic multiplicities are equal for each eigenvalue.
Hence
dim(Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)) = dim(Eigµ1 (A)) + · · · + dim(Eigµk (A))
= n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n.
Since every n-dimensional subspace of Kn is equal to Kn , (8.12) is proved.
Now assume that (8.12) is true. We have to show that A is diagonalisable. In each Eigµj we choose
a basis Bj . By (8.12) the collection of all those basis vectors form a basis of Kn . Therefore we found

FT
a basis of Kn consisting of eigenvectors of A. Hence A is diagonalisable by Theorem 8.48.
The above theorem says that A is diagonalisable if and only if there are enough eigenvectors of
A to span Kn . This is the case if and only if Kn splits in the direct sum of subspaces on each of
which A acts simply by multiplying each vector with the number (namely with the corresponding
eigenvalue).
To practice a bit the notions of algebraic and geometric multiplicities, finish this section with an
alternative proof of Theorem 8.48.
RA
Alternative proof of Theorem 8.48. Let us prove (i) =⇒ (iv) =⇒ (iii) =⇒ (ii) =⇒ (i).
(i) =⇒ (iv): This was already discussed after Definition 8.22. Let D = diag(d1 , . . . , dn ) and let ~c1 , . . . , ~cn
be the columns of C. Clearly they form a basis of Kn because C is invertible. By assumption we know
that AC = CD. Hence we have that
A~cj = jth column of AC = jth column of CD = dj · (jth column of C) = dj ~cj .
Therefore the vectors ~c1 , . . . , ~cn are linearly independent eigenvectors of A and hence they even form a basis of Kn .
D

(iv) =⇒ (iii): Clear.


(iii) =⇒ (ii): Suppose that ~v1 . . . , ~vn is a basis of K n consisting of eigenvectors of A. Clearly, each of them
must belong to some eigenspace of A. Let `j be the number of those vectors which belong to Eigµj (A).
Hence it follows that `j ≤ nj (A) because the vectors are linearly independent and nj (A) = dim Eigµj (A).
So by Theorem 8.38 we have `j ≤ nj (A) ≤ mj (A) where mj (A) is the algebraic multiplicity of µj . Summing
over all eigenvectors, we obtain
n = `1 + · · · + `k ≤ n1 (A) + · · · + nk (A) ≤ m1 (A) + · · · + mk (A) = n
The first equality holds because the vectors are a basis of Kn and the last equality holds by definition
of the algebraic multiplicity. Hence all the ≤ signs are in reality equalities and n1 (A) + · · · + nk (A) =
m1 (A) + · · · + mk (A). Therefore
 
0 = n1 (A) + · · · + nk (A) − m1 (A) + · · · + mk (A)
   
= n1 (A) − m1 (A) + · · · + nk (A) − mk (A) .


Since nj (A) − mj (A) ≤ 0 for all j = 1, . . . , k, each of the terms must be zero which shows that nj (A) = mj (A)
as desired.
(ii) =⇒ (i): For each j = 1, . . . , k let us choose a basis Bj of Eigµj (A). Observe that each basis has
nj (A) vectors. By Corollary 8.44, the system consisting of all these basis vectors is linearly independent.
Moreover, the total number of these vectors is n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n where we
used the assumption that the algebraic and geometric multiplicities are equal for each eigenvalue. Hence
the collection of all those vectors form a basis of Kn . That A is diagonalisable follows now as in the proof
of (iv) =⇒ (i):

You should now have understood

• why the eigenvectors of different eigenvalues of a matrix A are linearly independent,


• more generally, why the sum of the eigenspaces is even a direct sum,
• why a matrix is diagonalisable if and only if the vector space has a basis consisting of
eigenvectors of A,

• algebraic and geometric multiplicities,
• etc.
You should now be able to
• verify if a given matrix is diagonalisable,
• if it is diagonalisable, find its diagonalisation,
• etc.
RA
8.5 Symmetric and Hermitian matrices
In this section we will deal with symmetric and hermitian matrices. The main results are that all
eigenvalues of a hermitian matrix are real, that eigenvectors corresponding to different eigenvalues
are orthogonal and that every hermitian matrix is diagonalisable. Note that symmetric matrices
are a special case of hermitian ones, so whenever we show something about hermitian matrices, the
D

same is true for symmetric matrices.

Theorem 8.52. Let A be a hermitian matrix. Then every eigenvalue λ of A is real.

Proof. Let $A$ be hermitian, that is, $A^* = A$, and let $\lambda$ be an eigenvalue of $A$ with eigenvector $\vec v$.
Then $\vec v \neq \vec 0$ and $A\vec v = \lambda\vec v$. We have to show that $\lambda = \overline{\lambda}$. Therefore
\[ \lambda\|\vec v\|^2 = \lambda\langle\vec v\,,\,\vec v\rangle = \langle\lambda\vec v\,,\,\vec v\rangle = \langle A\vec v\,,\,\vec v\rangle = \langle\vec v\,,\,A^*\vec v\rangle = \langle\vec v\,,\,A\vec v\rangle = \langle\vec v\,,\,\lambda\vec v\rangle = \overline{\lambda}\langle\vec v\,,\,\vec v\rangle = \overline{\lambda}\|\vec v\|^2. \]
Since $\vec v \neq \vec 0$, it follows that $\lambda = \overline{\lambda}$ which means that the imaginary part of $\lambda$ is 0, hence $\lambda \in \mathbb{R}$.

Theorem 8.53. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A with
eigenvectors ~v1 and ~v2 , that is A~v1 = λ1~v1 and A~v2 = λ2~v2 . Then ~v1 ⊥ ~v2 .


Proof. The proof is similar to the proof of Theorem 8.52. We have to show that $\langle\vec v_1\,,\,\vec v_2\rangle = 0$. Note
that by Theorem 8.52, the eigenvalues $\lambda_1, \lambda_2$ are real.
\[ \lambda_1\langle\vec v_1\,,\,\vec v_2\rangle = \langle\lambda_1\vec v_1\,,\,\vec v_2\rangle = \langle A\vec v_1\,,\,\vec v_2\rangle = \langle\vec v_1\,,\,A^*\vec v_2\rangle = \langle\vec v_1\,,\,A\vec v_2\rangle = \langle\vec v_1\,,\,\lambda_2\vec v_2\rangle = \overline{\lambda_2}\langle\vec v_1\,,\,\vec v_2\rangle = \lambda_2\langle\vec v_1\,,\,\vec v_2\rangle. \]

Since λ1 6= λ2 by assumption it follows that h~v1 , ~v2 i = 0.

Corollary 8.54. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A.
Then Eigλ1 (A) ⊥ Eigλ2 (A).

The next theorem is one of the most important theorems in Linear Algebra.

Theorem 8.55. Every hermitian matrix is diagonalisable.

Theorem 8.55*. Every symmetric matrix is diagonalisable.

We postpone the proof of these theorems to end of this section.

FT
As a corollary we obtain the following very important theorem.

Theorem 8.57. A matrix is hermitian if and only if it is unitarily diagonalisable, that is, there
exists a unitary matrix Q and a diagonal matrix D such that D = Q−1 AQ = Q∗ AQ.

The formulation of the above theorem for real matrices is:

Theorem 8.57*. A matrix is symmetric if and only if it is orthogonally diagonalisable, that is,
RA
there exists an orthogonal matrix Q and a diagonal matrix D such that D = Q−1 AQ = Qt AQ.

In both cases, D = diag(λ1 , . . . , λn ) where the λ1 , . . . , λn are the eigenvalues of A and the columns
of Q are the corresponding eigenvectors.
Proof. Let A be a hermitian matrix. From Theorem 8.55 we know that A is diagonalisable. Hence

Cn = Eigµ1 (A) ⊕ . . . Eigµk (A)

where µ1 , . . . , µk are the different eigenvalues of A. In each eigenspace Eigµj (A) we can choose an
D

orthonormal basis Bj consisting of nj vectors ~v1j , . . . , ~vnj j where nj is the geometric multiplicity of
µj . We know that the eigenspaces are pairwise orthogonal by Corollary 8.54. Hence the system of
all these vectors form an orthonormal basis B of Cn . Therefore the matrix Q whose columns are
the vectors of this basis is a unitary matrix and Q−1 AQ = D.
Now assume that A is unitarily diagonalisable. We have to show that A is hermitian. Let Q be a
unitary matrix and let D be a diagonal matrix such that D = Q∗ AQ. Then A = QDQ∗ and

A∗ = (QDQ∗ )∗ = (Q∗ )∗ D∗ Q∗ = QDQ∗ = A

where we used that D∗ = D because D is a diagonal matrix whose entries on the diagonal are real
numbers because they are the eigenvalues of A.
The proof of Theorem 8.57* is the same.
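In practice one uses a routine written specifically for hermitian matrices, such as np.linalg.eigh, which returns real eigenvalues and an orthonormal set of eigenvectors. A short illustration of Theorem 8.57 (an added sketch, assuming NumPy):

    import numpy as np

    A = np.array([[2.0, 2 + 3j], [2 - 3j, 5.0]])        # hermitian: A = A*
    eigenvalues, Q = np.linalg.eigh(A)                   # eigh is tailored to hermitian matrices

    print(np.allclose(eigenvalues.imag, 0))              # eigenvalues are real
    print(np.allclose(Q.conj().T @ Q, np.eye(2)))        # Q is unitary (columns are orthonormal)
    print(np.allclose(Q.conj().T @ A @ Q, np.diag(eigenvalues)))   # Q* A Q = D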


Corollary 8.59. If a matrix A is hermitian (or symmetric), then its determinant is the product
of its eigenvalues.

Proof. This follows from Theorem 8.55 (or Theorem 8.55*) and Corollary 8.50.

Proof of Theorem 8.55. Let A ∈ MC (n × n) be a hermitian matrix and let µ1 , . . . , µk be the


different eigenvalues of A with geometric multiplicities n1 , . . . , nk . By Theorem 8.51 it suffices to
show that
Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A).

Let us denote the right hand side by U , that is, U := Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). Then we have
to show that $U^\perp = \{\vec 0\}$. For the sake of a contradiction, assume that this is not true and let
$\ell = \dim(U^\perp)$. In each $\operatorname{Eig}_{\mu_j}(A)$ we choose an orthonormal basis $\vec v_1^{(j)}, \dots, \vec v_{n_j}^{(j)}$ and we choose an
orthonormal basis $\vec w_1, \dots, \vec w_\ell$ in $U^\perp$. The set of all these vectors is an orthonormal basis $B$ of
$\mathbb{C}^n$ because all the eigenspaces are orthogonal to each other and to $U^\perp$. Let $Q$ be the matrix
(1) (k)
whose columns are these vectors: Q = (~v1 | · · · |~vnk |w ~ 1 | · · · |w
~ ` ). Then Q is a unitary matrix

FT
because its columns are an orthonormal basis of Cn . Next let us define B = Q−1 AQ. Then B is
hermitian because B ∗ = (Q−1 AQ)∗ = Q∗ A∗ (Q−1 )∗ = Q−1 AQ = B where we used that A = A∗
by assumption and that Q−1 = Q∗ because it is a unitary matrix. On the other hand, B being the
matrix representation of A with respect to the basis B, is of the form
 
\[ B = \begin{pmatrix} \mu_1 & & & & & \\ & \ddots & & & & \\ & & \mu_1 & & & \\ & & & \ddots & & \\ & & & & \mu_k & \\ & & & & & C \end{pmatrix}, \]
where each eigenvalue $\mu_j$ appears $n_j$ times on the diagonal.
All the empty spaces are 0 and $C$ is an $\ell\times\ell$ matrix (it is the matrix representation of the restriction
of $A$ to $U^\perp$ with respect to the basis $\vec w_1, \dots, \vec w_\ell$). The characteristic polynomial of $C$ has at least
one zero, hence $C$ has at least one eigenvalue $\lambda$. Clearly, $\lambda$ is then also an eigenvalue of $B$ and if
$\vec y \in \mathbb{C}^\ell$ is an eigenvector of $C$, we obtain an eigenvector of $B$ with the same eigenvalue by putting
0s as its first $n - \ell$ components and $\vec y$ as its last $\ell$ components. Since $A$ and $B$ have the same
eigenvalues, $\lambda$ must be equal to one of the eigenvalues $\mu_1, \dots, \mu_k$, say $\lambda = \mu_{j_0}$. But then the
dimension of the eigenspace Eigµj0 (B) is strictly larger than the dimension of Eigµj0 (A) which
contradicts Theorem 8.47. Therefore U ⊥ = {~0} and the theorem is proved.

Proof of Theorem 8.55*. The proof is essentially the same as that for Theorem 8.55. We only
have to note that, using the notation of the proof above, the matrix C is symmetric (because B
is symmetric). If we view C as a complex matrix, it has at least one eigenvalue λ because in C
its characteristic polynomial has at least one complex zero. However, since C is hermitian, all its
eigenvalues are real, hence λ is real, so it is an eigenvalue of C if we view it as a real matrix.


You should now have understood


• why hermitian and symmetric matrices have real eigenvalues,
• why eigenvectors for different eigenvalues of a hermitian matrix are perpendicular to each
other,
• why a hermitian/symmetric matrix is orthogonally diagonalisable,
• that up to a rotation and maybe reflection, the eigenspaces of a hermitian matrix are gen-
erated by the coordinate axes,
• etc.
You should now be able to
• find eigenvalues and eigenvectors of hermitian/symmetric matrices,
• diagonalise symmetric matrices,
• write Cn (or Rn ) as direct sum of the eigenspaces of a given hermitian (or symmetric)
matrix,
• etc.

8.6 Application: Conic Sections


In this section we will study quadratic equations in x and y. Recall that we know how to deal with
linear equations in two variables. The most general form is
RA
ax + by = d (8.13)

with constants a, b, d. A solution is a tuple (x, y) which satisfies (8.13). We can view the set of all
solutions as a subset in the plane R2 . Since (8.13) is a linear equation (a 1 × 2 system of linear
equations), we know that we have the following possibilities for the solution set:
(a) a line if a 6= 0 or b 6= 0,
(b) the plane R2 if a = 0, b = 0 and d = 0,
D

(c) the empty set (no solution) if a = 0, b = 0 and d 6= 0,


Now we will consider the quadratic equation

ax2 + bxy + cy 2 = d (8.14)

with constants a, b, c, d.

In the following we will always assume that d ≥ 0. This is no loss of generality because if d < 0,
we can multiply both sides of (8.14) by −1 and replace a, b, c by −a, −b, −c. The set of solutions
does not change.

Again, we want to identify the solutions with subsets in R2 and we want to find out what type of
figures they are. The equation (8.14) is not linear, so we have to see what relation (8.14) has with


what we studied so far. It turns out that the left hand side of (8.14) can be written as an inner
product
\[ \left\langle G\begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} x \\ y \end{pmatrix}\right\rangle \qquad\text{with}\quad G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}. \tag{8.15} \]

Question 8.5
The matrix G from (8.15) is not the only possible choice. Find all possible matrices G such that
hG( xy ) , ( xy )i = ax2 + bxy + cy 2 .

The matrix G is very convenient because it is symmetric. This means that up to an orthogonal
transformation, it is a diagonal matrix. So once we know how to solve the problem when G is
diagonal, then we know it for the general case since the solutions differ only by a rotation and
maybe a reflection. This motivates us to first study the case when G is diagonal, that is, when
b = 0.
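The following sketch (an added illustration, not part of the original notes, assuming NumPy) carries out exactly this reduction for a concrete quadratic equation: it builds G, diagonalises it orthogonally and reads off the type of the conic from the signs of the eigenvalues (equivalently, from the sign of det G):

    import numpy as np

    # example: 3x^2 + 2xy + 3y^2 = d, i.e. a = 3, b = 2, c = 3
    a, b, c, d = 3.0, 2.0, 3.0, 1.0
    G = np.array([[a, b / 2], [b / 2, c]])

    # G is symmetric, so it is orthogonally diagonalisable: Q^t G Q = diag(lam1, lam2)
    eigenvalues, Q = np.linalg.eigh(G)
    lam1, lam2 = eigenvalues
    print(np.allclose(Q.T @ G @ Q, np.diag(eigenvalues)))   # True

    # in the rotated coordinates the equation reads lam1*u^2 + lam2*v^2 = d
    if lam1 * lam2 > 0:
        print("det G > 0: an ellipse (or a point / no solution, depending on d)")
    elif lam1 * lam2 < 0:
        print("det G < 0: a hyperbola (or two crossing lines if d = 0)")
    else:
        print("det G = 0: parallel lines, one line, all of R^2 or no solution")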

Quadratic equation without mixed term (b = 0).

If b = 0, then (8.14) becomes


ax2 + cy 2 = d (8.16)
with constants d ≥ 0 and a, c ∈ R.

Remark 8.60. The solution set is symmetric with respect to the x-axis and the y-axis because if
some (x, y) is a solution of (8.16), then so are (−x, y) and (x, −y).
RA
Let us define
\[ \alpha := \sqrt{|a|}, \qquad \gamma := \sqrt{|c|}, \qquad\text{hence}\quad \alpha^2 = \begin{cases} a & \text{if } a \ge 0, \\ -a & \text{if } a < 0 \end{cases} \quad\text{and}\quad \gamma^2 = \begin{cases} c & \text{if } c \ge 0, \\ -c & \text{if } c < 0. \end{cases} \]

We have to distinguish several cases according to whether the coefficients a, c are positive, negative
or 0.
D

Case 1.1: a > 0 and c > 0. In this case, the equation (8.16) becomes

α2 x2 + γ 2 y 2 = d. (8.16.1.1)

(i) If d > 0, then (8.16.1.1) is the equation of an ellipse whose axes are parallel to the x- and
the y-axis. The intersection with the x-axis is at $\pm\frac{\sqrt d}{\alpha} = \pm\sqrt{d/a}$ and the intersection with
the y-axis is at $\pm\frac{\sqrt d}{\gamma} = \pm\sqrt{d/c}$.

(ii) If d = 0, then the only solution of (8.16.1.1) is the point (0, 0) .



Remark 8.61. Note that the length of the semiaxes of the ellipse is proportional to $\sqrt d$. Hence
as d decreases, the ellipse from (i) becomes smaller and for d = 0 it degenerates to the point (0, 0)
from (ii).



Figure 8.1: Solution of (8.16) for det G > 0. If a > 0, c > 0, then the solution is an ellipse (if d > 0)
or the point (0, 0) (if d = 0). The right picture shows ellipses with a and c fixed but decreasing d (from
red to blue). If a < 0, c < 0, d > 0, then there is no solution.

Case 1.2: a < 0 and c < 0. In this case, the equation (8.16) becomes

− α2 x2 − γ 2 y 2 = d. (8.16.1.2)

(i) If d > 0, then (8.16.1.2) has no solution because the left hand side is always less or equal to
0 while the right hand side is strictly positive.
(ii) If d = 0, then the only solution of (8.16.1.2) is the point (0, 0) .

Case 2.1: a > 0 and c < 0. In this case, the equation (8.16) becomes

α2 x2 − γ 2 y 2 = d. (8.16.2.1)
RA
(i) If d > 0, then (8.16.2.1) is the equation of a hyperbola. If x = 0, the equation has no
solution. Indeed, we need $|x| \ge \frac{\sqrt d}{\alpha}$ such that the equation has a solution. Therefore the
hyperbola does not intersect the y-axis (in fact, the hyperbola cannot pass through the strip
$-\frac{\sqrt d}{\alpha} < x < \frac{\sqrt d}{\alpha}$).
• Intersection with the coordinate axes: No intersection with the y-axis. Intersection with
the x-axis at $x = \pm\frac{\sqrt d}{\alpha} = \pm\sqrt{d/a}$.
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes
D

α
y = ± x.
γ
Note that the asymptote does not depend on d.
Proof. It follows from (8.16.2.1) that |x| → ∞ if and only if |y| → ∞ because otherwise
the difference α2 x2 − γ 2 y 2 cannot be constant. Dividing (8.16.2.1) by x2 and by γ 2 and
rearranging leads to
y2 α2 d x large α2 α
2
= 2 − 2 2 ≈ , hence y≈± x.
x γ γ x γ2 γ

(ii) If d = 0, then (8.16.2.1), becomes α2 x2 +γ 2 y 2 = 0, and its solution is the pair of lines y = ± αγ x .

Figure 8.2: Solution of (8.16) for det G < 0. The solutions are hyperbolas (if d > 0) or a pair of intersecting lines (if d = 0). The left picture shows a solution for a > 0, c < 0 and d > 0. The right picture shows hyperbolas for fixed a and c but decreasing d. The blue pair of lines passing through the origin corresponds to the case d = 0.
Remark 8.62. Note that the intersection points of the hyperbola with the x-axis are proportional to $\sqrt d$. Hence as d decreases, the intersection points move closer to 0 and the turn becomes sharper. If d = 0, the intersection points reach 0 and the hyperbola degenerates into two angles which look like two crossing lines.

Case 2.2: a < 0 and c > 0. In this case, the equation (8.16) becomes

$$-\alpha^2 x^2 + \gamma^2 y^2 = d. \tag{8.16.2.2}$$

This case is the same as Case 2.1, only with the roles of x and y interchanged. So we find:

(i) If d > 0, then (8.16.2.2) is the equation of a hyperbola.
• Intersection with the coordinate axes: No intersection with the x-axis. Intersection with the y-axis at $y = \pm\frac{\sqrt d}{\gamma} = \pm\sqrt{d/c}$.
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes $y = \pm\frac{\alpha}{\gamma}\, x$.

(ii) If d = 0, then (8.16.2.2) becomes $-\alpha^2 x^2 + \gamma^2 y^2 = 0$, and its solution is the pair of lines $y = \pm\frac{\alpha}{\gamma}\, x$.

Case 3.1: a > 0 and c = 0. Then (8.16) becomes $\alpha^2 x^2 = d$.
• If d > 0, the solutions are the two parallel lines $x = \pm\frac{\sqrt d}{\alpha}$.
• If d = 0, the solution is the line x = 0.

Case 3.2: a = 0 and c > 0. Then (8.16) becomes $\gamma^2 y^2 = d$.
• If d > 0, the solutions are the two parallel lines $y = \pm\frac{\sqrt d}{\gamma}$.
• If d = 0, the solution is the line y = 0.

Case 3.3: a < 0 and c = 0. Then (8.16) becomes $-\alpha^2 x^2 = d$.
• If d > 0, there is no solution.
• If d = 0, the solution is the line x = 0.

Case 3.4: a = 0 and c < 0. Then (8.16) becomes $-\gamma^2 y^2 = d$.
• If d > 0, there is no solution.
• If d = 0, the solution is the line y = 0.

Case 3.5: a = 0 and c = 0. Then (8.16) becomes 0 = d.
• If d > 0, there is no solution.
• If d = 0, the solution is R2.

Note that in the Cases 1.1 and 1.2 we have det G = ac > 0, in the Cases 2.1 and 2.2 we have det G = ac < 0, and in all remaining cases det G = 0.

Quadratic equation with mixed term.


Now we want to solve (8.14) without the assumption that b = 0. Let $G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$ and $\vec x = \begin{pmatrix} x \\ y \end{pmatrix}$. Then (8.14) is equivalent to
$$\langle G\vec x, \vec x\rangle = d. \tag{8.17}$$

If G were diagonal, then we could immediately give the solution. We know that G is symmetric, hence G can be orthogonally diagonalised. In other words, there exists an orthonormal basis of R2 with respect to which G has a representation as a diagonal matrix. We can even choose this basis such that it is a rotation of the canonical basis ~e1, ~e2 (without an additional reflection).
Let λ1, λ2 be the eigenvalues of G and let D = diag(λ1, λ2). We choose an orthogonal matrix Q such that
$$D = Q^{-1} G Q. \tag{8.18}$$
Denote the columns of Q by ~v1 and ~v2. They are normalised eigenvectors of G with eigenvalues λ1 and λ2, respectively. Recall that for an orthogonal matrix Q we always have det Q = ±1. We may assume that det Q = 1, because if not, we can simply multiply one of its columns by −1. This column then is still a normalised eigenvector of G with the same eigenvalue, hence (8.18) is still valid. With this choice we guarantee that Q is a rotation.
From (8.18) it follows that $G = QDQ^{-1} = QDQ^*$. So we obtain from (8.17) that
$$d = \langle G\vec x, \vec x\rangle = \langle QDQ^*\vec x, \vec x\rangle = \langle DQ^*\vec x, Q^*\vec x\rangle = \langle D\vec x\,', \vec x\,'\rangle = \lambda_1 x'^2 + \lambda_2 y'^2$$

where $\vec x\,' = \begin{pmatrix} x' \\ y' \end{pmatrix} = Q^*\vec x = Q^{-1}\vec x$.
Observe that the column vector $\begin{pmatrix} x' \\ y' \end{pmatrix}$ is the representation of ~x with respect to the basis ~v1, ~v2 (recall that they are eigenvectors of G). Therefore the solution of (8.14) is one of the solutions we found for the case b = 0, only that now the symmetry axes of the figures are no longer the x- and the y-axis, but the directions of the eigenvectors of G. In other words: since Q is a rotation, we obtain the solutions of $ax^2 + bxy + cy^2 = d$ by rotating the solutions of $ax^2 + cy^2 = d$ with the matrix Q.

Procedure to find the solutions of $ax^2 + bxy + cy^2 = d$.

• Write down the symmetric matrix $G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$.
• Find the eigenvalues λ1 and λ2 and eigenvectors of G and define the diagonal matrix D = diag(λ1, λ2) and the orthogonal matrix Q such that det Q = 1 and $D = Q^{-1}GQ$.
• Quadratic form without mixed terms: $d = \lambda_1 x'^2 + \lambda_2 y'^2$ where x', y' are the components of $\vec x\,' = Q^{-1}\vec x$.
• Graphic of the solution: In the xy-coordinate system, indicate the x'-axis (parallel to ~v1) and the y'-axis (parallel to ~v2). Note that these axes are a rotation of the x- and the y-axis. The solutions are then, depending on the eigenvalues, an ellipse, a hyperbola, etc., whose symmetry axes are the x'- and y'-axis.

If we want to know only the shape of the solution, it is enough to calculate the eigenvalues λ1, λ2 of G, or even only det G. Recall that we always assume d ≥ 0.

• If det G > 0, then we obtain an ellipse (which may be degenerate).
– If λ1 > 0 and λ2 > 0, then the solution is an ellipse whose semiaxes have lengths $\sqrt{d/\lambda_1}$ and $\sqrt{d/\lambda_2}$. If d = 0, the ellipse is only the point (0, 0).
– If λ1 < 0 and λ2 < 0, then there is either no solution (if d > 0) or the solution is only the point (0, 0) (if d = 0).

• If det G < 0, then we obtain a hyperbola (which may be degenerate).
– If λ1 > 0 and λ2 < 0, then the solution is a hyperbola which intersects the x'-axis at $\pm\sqrt{d/\lambda_1}$ and has no intersection with the y'-axis.
– If λ1 < 0 and λ2 > 0, then the solution is a hyperbola which intersects the y'-axis at $\pm\sqrt{d/\lambda_2}$ and has no intersection with the x'-axis.
In both cases, the asymptotes of the hyperbola have slope $\pm\sqrt{|\lambda_1/\lambda_2|}$ in the x'y'-coordinates. If d = 0, the hyperbola degenerates to the pair of lines $y' = \pm\sqrt{|\lambda_1/\lambda_2|}\, x'$.

• If det G = 0, then we obtain either the empty set, one of the axes, two lines parallel to one of the axes, or R2.
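
The procedure and the case distinction above are completely mechanical, so they can be checked numerically. The following is a minimal sketch in Python with NumPy; it is not part of the original notes, and the function name classify_conic is only illustrative. It builds G, diagonalises it with numpy.linalg.eigh and reads off the type of the solution set from det G and the signs of the eigenvalues, always assuming d ≥ 0 as in the text.

    import numpy as np

    def classify_conic(a, b, c, d):
        """Rough classification of the solution set of a*x^2 + b*x*y + c*y^2 = d (with d >= 0)."""
        G = np.array([[a, b / 2.0], [b / 2.0, c]])
        eigenvalues, Q = np.linalg.eigh(G)      # ascending eigenvalues; columns of Q are eigenvectors
        det_G = a * c - (b / 2.0) ** 2
        if det_G > 0:
            if eigenvalues[0] > 0:              # both eigenvalues positive
                kind = "ellipse" if d > 0 else "point (0,0)"
            else:                               # both eigenvalues negative
                kind = "point (0,0)" if d == 0 else "empty set"
        elif det_G < 0:
            kind = "hyperbola" if d > 0 else "two lines crossing at the origin"
        else:
            kind = "degenerate: parallel lines, one line, the empty set or R^2"
        return kind, eigenvalues, Q

    print(classify_conic(10, 6, 2, 4)[0])    # "ellipse", see Example 8.64 below
    print(classify_conic(9, -6, 1, 25)[0])   # degenerate case with det G = 0, see Example 8.66 below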

Definition 8.63. The axes of symmetry are called the principal axes.


Example 8.64. Consider the equation

10x2 + 6xy + 2y 2 = 4. (8.19)

(i) Write the equation in matrix form.


(ii) Make a change of coordinates so that the quadratic equation (8.19) has no mixed term.
(iii) Describe the solution of (8.19) in geometrical terms and sketch it. Indicate the principal axes
and important intersections.

Solution. (i) First we write (8.19) in the form $\langle G\vec x, \vec x\rangle = d$ with a symmetric matrix G. Let us define $G = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}$. Then (8.19) is equivalent to
$$\Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = 4. \tag{8.20}$$
(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
det(G − λ).

0 = det(G − λ) = (10 − λ)(2 − λ) − 9 = λ2 − 12λ + 11 = (λ − 6)2 − 25 = (λ − 1)(λ − 11).

Hence the eigenvalues of G are


λ1 = 1, λ2 = 11.
Next we need the normalised eigenvectors. To this end, we calculate ker(G − λj) using Gauß elimination:
• $G - \lambda_1 = \begin{pmatrix} 9 & 3 \\ 3 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 3 & 1 \\ 0 & 0 \end{pmatrix} \implies \vec v_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ -3 \end{pmatrix}$,
• $G - \lambda_2 = \begin{pmatrix} -1 & 3 \\ 3 & -9 \end{pmatrix} \longrightarrow \begin{pmatrix} -1 & 3 \\ 0 & 0 \end{pmatrix} \implies \vec v_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} 3 \\ 1 \end{pmatrix}$.

(Recall that for symmetric matrices the eigenvectors for different eigenvalues are orthogonal.
If you solve such an exercise it might be a good idea to check if the vectors are indeed
orthogonal to each other.)
Observation. With the information obtained so far, we already can sketch the solution.

• The solution is an ellipse because both eigenvalues are positive.


• The principal axes (symmetry axes) are parallel to the vectors ~v1 and ~v2. The ellipse intersects them at $\pm\sqrt{4/1} = \pm 2$ along the axis parallel to ~v1 and at $\pm\sqrt{4/11} = \pm 2/\sqrt{11}$ along the axis parallel to ~v2.

Set
$$Q = (\vec v_1\,|\,\vec v_2) = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ -3 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 11 \end{pmatrix},$$
then
$$Q^{-1} = Q^t \qquad\text{and}\qquad D = Q^{-1} G Q = Q^t G Q.$$
Observe that det Q = 1, so Q is a rotation in R2. It is the rotation by the angle arctan(−3).

If we define
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = Q^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} x - 3y \\ 3x + y \end{pmatrix},$$
then (8.20) gives
$$4 = \Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D Q^t\begin{pmatrix} x \\ y\end{pmatrix}, Q^t\begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D\begin{pmatrix} x' \\ y'\end{pmatrix}, \begin{pmatrix} x' \\ y'\end{pmatrix}\Big\rangle,$$
and therefore
$$4 = x'^2 + 11 y'^2 = \frac{1}{10}(x - 3y)^2 + \frac{11}{10}(3x + y)^2.$$
10 10

FT
(iii) The solution of (8.19) is an ellipse whose
principal axes are parallel to the vectors ~v1 y ~v2 .
x0 is the coordinate along the axis parallel to ~v1 ,
y 0 is the coordinate along the axis parallel to ~v2 .
y


2/ 11
y0
RA
~v2
x

~v1

2
D

x0


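As a quick numerical sanity check of this example (a sketch with NumPy, not part of the notes): the coordinates x', y' with respect to the eigenvectors must remove the mixed term, i.e. $10x^2 + 6xy + 2y^2 = x'^2 + 11y'^2$ for every point (x, y).

    import numpy as np

    G = np.array([[10.0, 3.0], [3.0, 2.0]])
    lam, Q = np.linalg.eigh(G)               # lam = [1., 11.]; columns of Q are normalised eigenvectors
    rng = np.random.default_rng(0)
    for _ in range(5):
        x = rng.normal(size=2)
        lhs = x @ G @ x                      # 10 x^2 + 6 x y + 2 y^2
        xp = Q.T @ x                         # coordinates with respect to v1, v2
        rhs = lam[0] * xp[0] ** 2 + lam[1] * xp[1] ** 2
        assert np.isclose(lhs, rhs)
    print("eigenvalues:", lam)               # [ 1. 11.]
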
Example 8.65. Consider the equation

$$-\frac{47}{17}x^2 - \frac{32}{17}xy + \frac{13}{17}y^2 = 2. \tag{8.21}$$
(i) Write the equation in matrix form.
(ii) Make a change of coordinates so that the quadratic equation (8.21) has no mixed term.
(iii) Describe the solution of (8.21) in geometrical terms and sketch it. Indicate the principal axes
and important intersections.


Solution. (i) First we write (8.21) in the form $\langle G\vec x, \vec x\rangle = d$ with a symmetric matrix G. Let us define $G = \frac{1}{17}\begin{pmatrix} -47 & -16 \\ -16 & 13 \end{pmatrix}$. Then (8.21) is equivalent to
$$\Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = 2. \tag{8.22}$$

(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial:
$$0 = \det(G-\lambda) = \Big(-\tfrac{47}{17}-\lambda\Big)\Big(\tfrac{13}{17}-\lambda\Big) - \tfrac{256}{17^2} = \lambda^2 + \tfrac{34}{17}\lambda - \tfrac{611}{17^2} - \tfrac{256}{17^2} = \lambda^2 + 2\lambda - 3 = (\lambda-1)(\lambda+3).$$

Hence the eigenvalues of G are


λ1 = −3, λ2 = 1.
Next we need the normalised eigenvectors. To this end, we calculate ker(G − λj) using Gauß elimination:
• $G - \lambda_1 = \frac{1}{17}\begin{pmatrix} 4 & -16 \\ -16 & 64 \end{pmatrix} \longrightarrow \frac{1}{17}\begin{pmatrix} 1 & -4 \\ 0 & 0 \end{pmatrix} \implies \vec v_1 = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 \\ 1 \end{pmatrix}$,
• $G - \lambda_2 = \frac{1}{17}\begin{pmatrix} -64 & -16 \\ -16 & -4 \end{pmatrix} \longrightarrow \frac{1}{17}\begin{pmatrix} 4 & 1 \\ 0 & 0 \end{pmatrix} \implies \vec v_2 = \frac{1}{\sqrt{17}}\begin{pmatrix} -1 \\ 4 \end{pmatrix}$.
17 −16 −4 17 0 0 17 4

Observation. With the information obtained so far, we can already sketch the solution.
• The solution is a hyperbola because the eigenvalues have opposite signs.
• The principal axes (symmetry axes) are parallel to the vectors ~v1 and ~v2. The intersections of the hyperbola with the axis parallel to ~v2 are at $\pm\sqrt 2$.

Set
$$Q = (\vec v_1\,|\,\vec v_2) = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} -3 & 0 \\ 0 & 1 \end{pmatrix},$$
then
$$Q^{-1} = Q^t \qquad\text{and}\qquad D = Q^{-1} G Q = Q^t G Q.$$
Observe that det Q = 1, hence Q is a rotation of R2. It is the rotation by the angle arctan(1/4).

If we define
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = Q^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{17}}\begin{pmatrix} 4x + y \\ -x + 4y \end{pmatrix},$$
then (8.22) gives
$$2 = \Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D Q^t\begin{pmatrix} x \\ y\end{pmatrix}, Q^t\begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D\begin{pmatrix} x' \\ y'\end{pmatrix}, \begin{pmatrix} x' \\ y'\end{pmatrix}\Big\rangle,$$
hence
$$2 = -3x'^2 + y'^2 = -\frac{3}{17}(4x+y)^2 + \frac{1}{17}(x - 4y)^2.$$
17 17

(iii) The solution of equation (8.21) is a hyperbola whose principal axes are parallel to the vectors ~v1 and ~v2. x' is the coordinate along the axis parallel to ~v1, y' is the coordinate along the axis parallel to ~v2. The angle between the x- and the x'-axis is arctan(1/4). The hyperbola meets the y'-axis at $\pm\sqrt 2$.

Asymptotes of the hyperbola. In order to calculate the slopes of the asymptotes of the hyperbola, we first work in the x'-y'-coordinate system. Our starting point is the equation $2 = -3x'^2 + y'^2$:
$$2 = -3x'^2 + y'^2 \iff \frac{y'^2}{x'^2} = 3 + \frac{2}{x'^2} \iff \frac{y'}{x'} = \pm\sqrt{3 + \frac{2}{x'^2}}.$$
We see that |y'| → ∞ if and only if |x'| → ∞ and that $\frac{y'}{x'} \approx \pm\sqrt 3$. So the slopes of the asymptotes in x'-y'-coordinates are $\pm\sqrt 3$.
How do we find the slopes in x-y-coordinates?

• Method 1: Use Q. We know that if we rotate our hyperbola by the linear transformation $Q^{-1}$ (i.e. if we rotate by −arctan(1/4)), then we obtain a hyperbola whose symmetry axes are the x- and y-axes and whose asymptotes have slopes $\pm\sqrt 3$. Hence, in order to obtain the asymptotes of our hyperbola, we only need to apply Q to the vectors $\vec w_1$ and $\vec w_2$ which are parallel to these asymptotes. The resulting vectors are then parallel to the asymptotes of our original hyperbola. In our case $\vec w_1 = \begin{pmatrix} 1 \\ \sqrt 3\end{pmatrix}$, $\vec w_2 = \begin{pmatrix} 1 \\ -\sqrt 3\end{pmatrix}$. Hence
$$\widetilde w_1 = Q\vec w_1 = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ \sqrt 3 \end{pmatrix} = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 - \sqrt 3 \\ 1 + 4\sqrt 3 \end{pmatrix},$$
$$\widetilde w_2 = Q\vec w_2 = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 & -1 \\ 1 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ -\sqrt 3 \end{pmatrix} = \frac{1}{\sqrt{17}}\begin{pmatrix} 4 + \sqrt 3 \\ 1 - 4\sqrt 3 \end{pmatrix}.$$

Therefore the slopes of the asymptotes of our hyperbola are
$$\frac{1 + 4\sqrt 3}{4 - \sqrt 3} \qquad\text{and}\qquad \frac{1 - 4\sqrt 3}{4 + \sqrt 3}.$$

• Method 2: Insert in the formulas. The asymptotes are lines which satisfy $\frac{y'}{x'} = \pm\sqrt 3$. Using $x' = \frac{1}{\sqrt{17}}(4x + y)$ and $y' = \frac{1}{\sqrt{17}}(-x + 4y)$, we obtain
$$\pm\sqrt 3 = \frac{y'}{x'} = \frac{\frac{1}{\sqrt{17}}(-x + 4y)}{\frac{1}{\sqrt{17}}(4x + y)} = \frac{-x + 4y}{4x + y}
\iff \pm\sqrt 3\,(4x + y) = -x + 4y
\iff (1 \pm 4\sqrt 3)\,x = (4 \mp \sqrt 3)\,y
\iff \frac{y}{x} = \frac{1 \pm 4\sqrt 3}{4 \mp \sqrt 3}.$$
• Method 3: Adding angles. We know that the angle between the x'-axis and an asymptote is $\arctan\sqrt 3$ and the angle between the x'-axis and the x-axis is arctan(1/4). Therefore the angle between the asymptote and the x-axis is $\arctan\sqrt 3 + \arctan(1/4)$ (see Figure 8.3).

Figure 8.3: Left: the hyperbola $-3x^2 + y^2 = 2$ with asymptote angle $\vartheta = \arctan\sqrt 3$. Right: our hyperbola $-\frac{47}{17}x^2 - \frac{32}{17}xy + \frac{13}{17}y^2 = 2$; it is obtained from the figure on the left by applying the transformation Q to it (that is, by rotating it by $\varphi = \arctan(1/4)$), so its asymptote makes the angle $\alpha = \varphi + \vartheta$ with the x-axis.
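
The slopes obtained above can also be checked directly: a direction (1, m) is an asymptote direction of the hyperbola exactly when the quadratic form $\langle G\vec x, \vec x\rangle$ vanishes on (1, m), since far out on the hyperbola the constant right hand side becomes negligible. A small NumPy sketch (not part of the notes):

    import numpy as np

    G = np.array([[-47.0, -16.0], [-16.0, 13.0]]) / 17.0
    s3 = np.sqrt(3.0)
    for m in [(1 + 4 * s3) / (4 - s3), (1 - 4 * s3) / (4 + s3)]:
        v = np.array([1.0, m])
        print(m, v @ G @ v)   # the quadratic form is (numerically) zero on both asymptote directions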



Example 8.66. Consider the equation

9x2 − 6xy + y 2 = 25. (8.23)

(i) Write the equation in matrix form.


(ii) Make a change of coordinates so that the quadratic equation (8.23) has no mixed term.
(iii) Describe the solution of (8.23) in geometrical terms and sketch it. Indicate the principal axes
and important intersections.

Solution 1. • First we write (8.23) in the form $\langle G\vec x, \vec x\rangle = d$ with a symmetric matrix G. Let us define $G = \begin{pmatrix} 9 & -3 \\ -3 & 1 \end{pmatrix}$. Then (8.23) is equivalent to
$$\Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = 25. \tag{8.24}$$
• Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
$$0 = \det(G - \lambda) = (9 - \lambda)(1 - \lambda) - 9 = \lambda^2 - 10\lambda = \lambda(\lambda - 10).$$
Hence the eigenvalues of G are
$$\lambda_1 = 0, \qquad \lambda_2 = 10.$$

Next we need the normalised eigenvectors. To this end, we calculate ker(G − λj) using Gauß elimination:
• $G - \lambda_1 = \begin{pmatrix} 9 & -3 \\ -3 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 3 & -1 \\ 0 & 0 \end{pmatrix} \implies \vec v_1 = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 \\ 3 \end{pmatrix}$,
• $G - \lambda_2 = \begin{pmatrix} -1 & -3 \\ -3 & -9 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 3 \\ 0 & 0 \end{pmatrix} \implies \vec v_2 = \frac{1}{\sqrt{10}}\begin{pmatrix} -3 \\ 1 \end{pmatrix}$.

Observation. With the information obtained so far, we can already sketch the solution.
– The solution consists of two parallel lines because one of the eigenvalues is zero and the other is positive.
– The lines are parallel to ~v1 and their intersections with the axis parallel to ~v2 are at $\pm\sqrt{25/10} = \pm\sqrt{5/2}$.

Set
$$Q = (\vec v_1\,|\,\vec v_2) = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 & -3 \\ 3 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 10 \end{pmatrix},$$
then
$$Q^{-1} = Q^t \qquad\text{and}\qquad D = Q^{-1} G Q = Q^t G Q.$$
Observe that det Q = 1, hence Q is a rotation in R2. It is the rotation by the angle arctan(3).


If we define
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = Q^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} x + 3y \\ -3x + y \end{pmatrix},$$
then (8.24) gives
$$25 = \Big\langle G\begin{pmatrix} x \\ y\end{pmatrix}, \begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D Q^t\begin{pmatrix} x \\ y\end{pmatrix}, Q^t\begin{pmatrix} x \\ y\end{pmatrix}\Big\rangle = \Big\langle D\begin{pmatrix} x' \\ y'\end{pmatrix}, \begin{pmatrix} x' \\ y'\end{pmatrix}\Big\rangle,$$
therefore
$$25 = 10\,y'^2 = (-3x + y)^2.$$

• The solution of (8.23) consists of two lines parallel to the vector ~v1 which intersect the y'-axis at $\pm\sqrt{25/10} = \pm\sqrt{5/2}$. x' is the coordinate along the axis parallel to ~v1, y' is the coordinate along the axis parallel to ~v2. The angle between the x- and the x'-axis is arctan(3).

Solution 2. Note that
$$25 = 9x^2 - 6xy + y^2 = (3x - y)^2 \iff 5 = |3x - y|.$$
Therefore the solution consists of the two parallel lines given by
$$y = 3x \pm 5,$$
which coincides with the result above. □
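
The degenerate case can also be read off numerically from the zero eigenvalue; a short NumPy sketch (not part of the notes):

    import numpy as np

    lam, Q = np.linalg.eigh(np.array([[9.0, -3.0], [-3.0, 1.0]]))
    print(lam)       # approximately [ 0., 10.]  ->  det G = 0, two parallel lines
    print(Q[:, 0])   # eigenvector for the eigenvalue 0: parallel (up to sign) to (1, 3), the direction of the lines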

8.6.1 Solutions of ax2 + bxy + cy 2 = d as conic sections


The reason why the title of this section is "conic sections" is that most of the solution sets of the quadratic equations above can be obtained as the intersection of a double cone with a plane.


Figure 8.4: Ellipses. The plane in the picture on the left is parallel to the xy-plane. Therefore the intersection with the cone is a circle. If the plane starts to incline, the intersection becomes an ellipse. The more inclined the plane is, the more elongated the ellipse becomes. As long as the plane is not yet parallel to the surface of the cone, it intersects only either the upper or the lower part of the cone and the intersection is an ellipse.

Figure 8.5: Parabola. If the plane is parallel to the surface of the cone and does not pass through
the origin, then the intersection with the cone is a parabola (this is not a possible solution of (8.14)).
If the plane is parallel to the surface of the cone and passes through the origin, then the plane is
tangential to the cone and the intersection is one line.


Figure 8.6: Hyperbolas. If the plane is steeper than the cone, then it intersects both the upper and the lower part of the cone, and the intersections are hyperbolas. If the plane passes through the origin, then the hyperbola degenerates to two intersecting lines. The plane in the picture in the middle is parallel to the yz-plane.

8.6.2 Solutions of ax2 + bxy + cy 2 + rx + sy = d


Let us briefly discuss the case when the quadratic equation (8.14) contains linear terms:
$$ax^2 + bxy + cy^2 + rx + sy = d. \tag{8.25}$$


We want to find a transformation so that (8.25) can be written without the linear terms rx and sy.
Let $G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}$ and let λ1, λ2 be its eigenvalues. Moreover, let D = diag(λ1, λ2) and Q an orthogonal matrix with det Q = 1 and $D = Q^{-1}GQ$.
In the following we assume that G is invertible.
Method 1. First eliminate the mixed term bxy.
If we set $\vec x\,' = Q^{-1}\vec x$, then $ax^2 + bxy + cy^2 = \lambda_1 x'^2 + \lambda_2 y'^2$. Since x' and y' are linear in x and y, equation (8.25) becomes
$$\lambda_1 x'^2 + \lambda_2 y'^2 + r' x' + s' y' = d'.$$
Now we only need to complete the squares on the left hand side to obtain
$$\lambda_1\Big(x' + \frac{r'}{2\lambda_1}\Big)^2 + \lambda_2\Big(y' + \frac{s'}{2\lambda_2}\Big)^2 - \frac{r'^2}{4\lambda_1} - \frac{s'^2}{4\lambda_2} = d'.$$
Note that this can always be done if λ1 and λ2 are not 0 (here we use that G is invertible). If we set $d'' = d' + \frac{r'^2}{4\lambda_1} + \frac{s'^2}{4\lambda_2}$, $x'' = x' + \frac{r'}{2\lambda_1}$ and $y'' = y' + \frac{s'}{2\lambda_2}$, then
$$\lambda_1 x''^2 + \lambda_2 y''^2 = d''. \tag{8.26}$$
Since $\vec x\,'' = \begin{pmatrix} r'/(2\lambda_1) \\ s'/(2\lambda_2) \end{pmatrix} + \vec x\,' = \begin{pmatrix} r'/(2\lambda_1) \\ s'/(2\lambda_2) \end{pmatrix} + Q^{-1}\vec x$, we see that the solution is the solution of $\lambda_1 x^2 + \lambda_2 y^2 = d''$, but rotated by Q and shifted by the vector $-\begin{pmatrix} r'/(2\lambda_1) \\ s'/(2\lambda_2) \end{pmatrix}$ in the x'-y'-coordinates.


Method 2. First eliminate the linear terms rx and sy.
Let us make the ansatz $x = x_0 + \tilde x$ and $y = y_0 + \tilde y$. Inserting in (8.25) gives
$$\begin{aligned} d &= a(x_0+\tilde x)^2 + b(x_0+\tilde x)(y_0+\tilde y) + c(y_0+\tilde y)^2 + r(x_0+\tilde x) + s(y_0+\tilde y)\\ &= a\tilde x^2 + b\tilde x\tilde y + c\tilde y^2 + \big(2ax_0 + by_0 + r\big)\tilde x + \big(2cy_0 + bx_0 + s\big)\tilde y + ax_0^2 + bx_0y_0 + cy_0^2 + rx_0 + sy_0. \end{aligned} \tag{8.27}$$
We want the linear terms in $\tilde x$ and $\tilde y$ to disappear, so we need $2ax_0 + by_0 + r = 0$ and $2cy_0 + bx_0 + s = 0$. In matrix form this is
$$-\begin{pmatrix} r \\ s \end{pmatrix} = \begin{pmatrix} 2a & b \\ b & 2c \end{pmatrix}\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = 2G\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}.$$
Assume that G is invertible. Then we can solve for x0 and y0 and obtain $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = -\frac12 G^{-1}\begin{pmatrix} r \\ s \end{pmatrix}$.
Now if we set $\tilde d = d - ax_0^2 - bx_0y_0 - cy_0^2 - rx_0 - sy_0$, then (8.27) becomes
$$\tilde d = a\tilde x^2 + b\tilde x\tilde y + c\tilde y^2 \tag{8.28}$$

which is now in the form of (8.14) (if $\tilde d$ is negative, then we must multiply both sides of (8.28) by −1; in this case, the eigenvalues of G change their sign, hence D also changes sign, but Q does not). Hence if we set $\vec x\,' = Q^{-1}\tilde{\vec x}$, then
$$\tilde d = \lambda_1 x'^2 + \lambda_2 y'^2$$
and $\vec x\,' = Q^{-1}\tilde{\vec x} = Q^{-1}(\vec x - \vec x_0) = Q^{-1}\vec x - Q^{-1}\vec x_0 = Q^{-1}\vec x + \frac12 Q^{-1}G^{-1}\begin{pmatrix} r \\ s\end{pmatrix}$. So again we see that the solution of (8.25) is the solution of $\lambda_1 x^2 + \lambda_2 y^2 = \tilde d$, but rotated by Q and shifted by the vector $\vec x_0 = -\frac12 G^{-1}\begin{pmatrix} r \\ s\end{pmatrix}$.
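
Under the assumption that G is invertible, both methods can be summarised in a few lines of code. The following NumPy sketch is not part of the notes and the name reduce_conic is only illustrative; it computes the centre $(x_0, y_0) = -\tfrac12 G^{-1}(r, s)$, the reduced right hand side $\tilde d$ and the eigenvalues which determine the shape.

    import numpy as np

    def reduce_conic(a, b, c, r, s, d):
        G = np.array([[a, b / 2.0], [b / 2.0, c]])
        x0, y0 = -0.5 * np.linalg.solve(G, np.array([r, s], dtype=float))
        d_tilde = d - (a * x0**2 + b * x0 * y0 + c * y0**2 + r * x0 + s * y0)
        lam, Q = np.linalg.eigh(G)
        return (x0, y0), d_tilde, lam

    # Example 8.67 below: 10x^2 + 6xy + 2y^2 + 8x - 2y = 4
    print(reduce_conic(10, 6, 2, 8, -2, 4))   # centre (-1, 2), d~ = 10, eigenvalues 1 and 11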

Example 8.67. Find the solutions of
$$10x^2 + 6xy + 2y^2 + 8x - 2y = 4. \tag{8.19'}$$

Solution. We know from Example 8.64 that
$$G = \begin{pmatrix} 10 & 3 \\ 3 & 2 \end{pmatrix}, \qquad Q = \frac{1}{\sqrt{10}}\begin{pmatrix} 1 & 3 \\ -3 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 \\ 0 & 11 \end{pmatrix}$$
and that
$$\begin{pmatrix} x' \\ y' \end{pmatrix} = Q^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} x - 3y \\ 3x + y \end{pmatrix} \qquad\text{and}\qquad \begin{pmatrix} x \\ y \end{pmatrix} = Q\begin{pmatrix} x' \\ y' \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} x' + 3y' \\ -3x' + y' \end{pmatrix}.$$


Method 1. With the notation above, we know from Example 8.64 that (8.19') is
$$\begin{aligned} 4 &= 10x^2 + 6xy + 2y^2 + 8x - 2y = x'^2 + 11y'^2 + 8x - 2y\\ &= x'^2 + 11y'^2 + \frac{8}{\sqrt{10}}(x' + 3y') - \frac{2}{\sqrt{10}}(-3x' + y')\\ &= x'^2 + \frac{14}{\sqrt{10}}x' + 11y'^2 + \frac{22}{\sqrt{10}}y'\\ &= \Big(x' + \frac{7}{\sqrt{10}}\Big)^2 - \frac{49}{10} + 11\Big(y' + \frac{1}{\sqrt{10}}\Big)^2 - \frac{11}{10}, \end{aligned}$$
hence
$$\Big(x' + \frac{7}{\sqrt{10}}\Big)^2 + 11\Big(y' + \frac{1}{\sqrt{10}}\Big)^2 = 4 + \frac{60}{10} = 10.$$
This is an ellipse oriented like the one from Example 8.64, but shifted by $-7/\sqrt{10}$ in the x'-direction and by $-1/\sqrt{10}$ in the y'-direction. The lengths of the semiaxes are $\sqrt{10}$ and $\sqrt{10/11}$.

Method 2. Note that
$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = -\frac12 G^{-1}\begin{pmatrix} r \\ s \end{pmatrix} = -\frac12\cdot\frac{1}{11}\begin{pmatrix} 2 & -3 \\ -3 & 10 \end{pmatrix}\begin{pmatrix} 8 \\ -2 \end{pmatrix} = -\frac{1}{22}\begin{pmatrix} 22 \\ -44 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix}.$$
Set $\tilde x = x - x_0 = x + 1$ and $\tilde y = y - y_0 = y - 2$. Then
$$\begin{aligned} 4 &= 10x^2 + 6xy + 2y^2 + 8x - 2y = 10(\tilde x - 1)^2 + 6(\tilde x - 1)(\tilde y + 2) + 2(\tilde y + 2)^2 + 8(\tilde x - 1) - 2(\tilde y + 2)\\ &= 10\tilde x^2 - 20\tilde x + 10 + 6\tilde x\tilde y + 12\tilde x - 6\tilde y - 12 + 2\tilde y^2 + 8\tilde y + 8 + 8\tilde x - 8 - 2\tilde y - 4\\ &= 10\tilde x^2 + 6\tilde x\tilde y + 2\tilde y^2 - 6, \end{aligned}$$
hence
$$10 = 10\tilde x^2 + 6\tilde x\tilde y + 2\tilde y^2 = \tilde x'^2 + 11\tilde y'^2$$
with
$$\begin{pmatrix} \tilde x' \\ \tilde y' \end{pmatrix} = Q^{-1}\begin{pmatrix} \tilde x \\ \tilde y \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} \tilde x - 3\tilde y \\ 3\tilde x + \tilde y \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} (x+1) - 3(y-2) \\ 3(x+1) + (y-2) \end{pmatrix} = \frac{1}{\sqrt{10}}\begin{pmatrix} x - 3y + 7 \\ 3x + y + 1 \end{pmatrix},$$
which coincides with the result from Method 1.


You should now have understood


• that a symmetric 2×2 matrix which is not a multiple of the identity marks two distinguished
directions in R2 , namely the ones parallel to its eigenvectors,
• why a change of variables is helpful to find solutions of a quadratic equation in two variables,
• etc.
You should now be able to


• find the solutions of quadratic equations in two variables,


• make a change of coordinates such that the transformed equation has no mixed term,
• sketch the solution in the xy-plane,
• etc.

8.7 Summary
Cn as an inner product space
Cn is an inner product space if we set
$$\langle \vec z, \vec w\rangle = \sum_{j=1}^{n} z_j\, \overline{w_j}.$$

We have for all $\vec v, \vec w, \vec z \in \mathbb{C}^n$ and $c \in \mathbb{C}$:

• $\langle \vec v, \vec z\rangle = \overline{\langle \vec z, \vec v\rangle}$,
• $\langle \vec v + c\vec w, \vec z\rangle = \langle \vec v, \vec z\rangle + c\langle \vec w, \vec z\rangle$,  $\langle \vec z, \vec v + c\vec w\rangle = \langle \vec z, \vec v\rangle + \overline{c}\,\langle \vec z, \vec w\rangle$,
• $\langle \vec z, \vec z\rangle = \|\vec z\|^2$,
• $|\langle \vec v, \vec z\rangle| \le \|\vec v\|\,\|\vec z\|$,
• $\|\vec v + \vec z\| \le \|\vec v\| + \|\vec z\|$.
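
In code, the only point to keep in mind is the complex conjugation of the second argument. A small NumPy sketch (not part of the notes):

    import numpy as np

    def inner(z, w):
        # <z, w> = sum_j z_j * conjugate(w_j), the convention used in these notes
        return np.sum(z * np.conj(w))

    z = np.array([1 + 1j, 2 - 1j])
    w = np.array([3j, 1 + 2j])
    print(inner(z, w), np.vdot(w, z))                             # equal; np.vdot conjugates its *first* argument
    print(np.isclose(inner(z, z).real, np.linalg.norm(z) ** 2))   # <z, z> = ||z||^2, a real number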
The adjoint of a matrix $A \in M_{\mathbb{C}}(n\times n)$ is $A^* = \overline{A^t} = (\overline{A})^t$ (= transposed and complex conjugated). The matrix A is called hermitian if $A^* = A$. The matrix Q is called unitary if it is invertible and $Q^* = Q^{-1}$.
Note that $\det A^* = \overline{\det A}$.

Eigenvalues and eigenvectors


Definition. Let A ∈ M(n × n). Then λ is called an eigenvalue of A with eigenvector ~v if ~v ≠ ~0 and A~v = λ~v. The set of all solutions of A~v = λ~v for an eigenvalue λ is called the eigenspace of A for λ. It is denoted by $\operatorname{Eig}_\lambda(A)$.

The eigenvalues of A are exactly the zeros of the characteristic polynomial

pA (λ) = det(A − λ).

It is a polynomial of degree n. Since every polynomial of degree ≥ 1 has at least one complex root, every complex matrix has at least one eigenvalue (but there are real matrices without real eigenvalues). Moreover, an n × n-matrix has at most n eigenvalues. If we factorise pA, we obtain

pA (λ) = (λ − µ1 )m1 · · · (λ − µk )mk

where µ1, . . . , µk are the different eigenvalues of A. The exponent mj is called the algebraic multiplicity of µj. The geometric multiplicity of µj is $\dim(\operatorname{Eig}_{\mu_j}(A))$. Note that


• geometric multiplicity ≤ algebraic multiplicity,


• the sum of all algebraic multiplicities is m1 + · · · + mk = n.

Similar matrices.

• Two matrices A, B ∈ M (n × n) are called similar if there exists an invertible matrix C such
that A = C −1 BC.
• A matrix A is called diagonalisable if it is similar to a diagonal matrix.

Characterisation of diagonalisability. Let A ∈ MC(n × n) and let µ1, . . . , µk be the different eigenvalues of A. We set $n_j = \dim(\operatorname{Eig}_{\mu_j}(A))$ = geometric multiplicity of µj and mj = algebraic multiplicity of µj. Then the following are equivalent:

(i) A is diagonalisable.
(ii) Cn has a basis consisting of eigenvectors of A.

(iii) Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A).
(iv) nj = mj for every j = 1, . . . , k.
(v) n1 + · · · + nk = n.

The same is true for symmetric matrices with Cn replaced by Rn .
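
Criterion (iv) is easy to test in practice because computer algebra systems return both multiplicities. A small SymPy sketch (not part of the notes):

    import sympy as sp

    A = sp.Matrix([[2, 1], [0, 2]])               # a Jordan block: eigenvalue 2, algebraic multiplicity 2
    for eigenvalue, alg_mult, eigenvectors in A.eigenvects():
        geo_mult = len(eigenvectors)              # dimension of the eigenspace
        print(eigenvalue, alg_mult, geo_mult)     # 2, 2, 1  ->  geometric < algebraic
    print(A.is_diagonalizable())                  # False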

Properties of unitary matrices. Let Q be a unitary n × n matrix. Then:


• | det Q| = 1,
• If λ ∈ C is an eigenvalue of Q, then |λ| = 1.
• Q is unitarily diagonalisable (we did not prove this fact), hence Cn has a basis consisting of
eigenvectors of Q. They can be chosen to be mutually orthogonal.

Moreover, Q ∈ M (n × n) is unitary if and only if kQ~zk = k~zk for all ~z ∈ Cn .

Properties of hermitian matrices. Let A ∈ MC (n × n) be a hermitian n × n matrix. Then:


• det A ∈ R,
• If λ is an eigenvalue of A, then λ ∈ R.
• A is unitarily diagonalisable, hence Cn has a basis consisting of eigenvectors of A. They can be chosen to be mutually orthogonal.

Moreover, A ∈ M (n × n) is hermitian if and only if hA~v , ~zi = h~v , A~zi for all ~v , ~z ∈ Cn .

Properties of symmetric matrices. Let A ∈ MR (n × n) be a symmetric n × n matrix. Then:

• A is orthogonally diagonalisable, hence Rn has a basis consisting of eigenvectors of A. They can be chosen to be mutually orthogonal.


Moreover, A is symmetric if and only if hA~v , ~zi = h~v , A~zi for all ~v , ~z ∈ Rn .
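
Numerically, the orthogonal diagonalisation of a symmetric matrix is exactly what numpy.linalg.eigh returns; a short sketch (not part of the notes):

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])                   # a symmetric matrix
    lam, Q = np.linalg.eigh(A)                        # columns of Q: orthonormal eigenvectors
    print(np.allclose(Q.T @ Q, np.eye(3)))            # Q^t Q = id, so Q is orthogonal
    print(np.allclose(Q @ np.diag(lam) @ Q.T, A))     # A = Q D Q^t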

Solution of $ax^2 + bxy + cy^2 = d$. The equation can be rewritten as $\langle G\vec x, \vec x\rangle = d$ with the symmetric matrix
$$G = \begin{pmatrix} a & b/2 \\ b/2 & c \end{pmatrix}.$$
Let λ1, λ2 be the eigenvalues of G and let us assume that d ≥ 0. Then the solutions are:
• an ellipse if det G > 0, more precisely,
– an ellipse whose semiaxes have lengths $\sqrt{d/\lambda_1}$ and $\sqrt{d/\lambda_2}$ if λ1, λ2 > 0 and d > 0,
– the point (0, 0) if d = 0,
– the empty set if λ1, λ2 < 0 and d > 0,
• a hyperbola if det G < 0, more precisely,
– a hyperbola if d > 0,
– two lines crossing at the origin if d = 0,
• two parallel lines, one line, the empty set or R2 if det G = 0.

8.8 Exercises
1. Let Q be a unitary matrix. Show that all its eigenvalues have norm 1.

2. Let A be a matrix with eigenvalues µ1, . . . , µk and let c be a constant.
(a) What can be said about the eigenvalues of cA? What can be said about the eigenvalues of A + c id?

3. Given the matrix A and the vectors u and w:
$$A = \begin{pmatrix} 25 & 15 & -18 \\ -30 & -20 & 36 \\ -6 & -6 & 16 \end{pmatrix}, \qquad u = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \qquad w = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}.$$
(a) Decide whether the vectors u and w are eigenvectors of A. If they are, what are the corresponding eigenvalues?
(b) You may use that $\det(A - \lambda) = -\lambda^3 + 21\lambda^2 - 138\lambda + 280$. Compute all eigenvalues of A.

4. For the following matrices, find the eigenvalues, the eigenspaces, an invertible matrix C and a diagonal matrix D such that $C^{-1}AC = D$.
$$A_1 = \begin{pmatrix} -3 & 5 & -20 \\ 2 & 0 & 8 \\ 2 & 1 & 7 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} -2 & 0 & 1 \\ 0 & 2 & 0 \\ 9 & 0 & 6 \end{pmatrix}, \qquad A_3 = \begin{pmatrix} 1 & 0 & 0 \\ 3 & 2 & 0 \\ 1 & 3 & 2 \end{pmatrix}.$$


5. We consider a string of length L which is fixed at both end points. If it is excited, then its vertical elongation satisfies the partial differential equation $\frac{\partial^2}{\partial t^2}u(t,x) = \frac{\partial^2}{\partial x^2}u(t,x)$. If we make the ansatz $u(t,x) = e^{i\omega t}v(x)$ for some number ω and a function v which depends only on x, we obtain $-\omega^2 v = v''$. If we set $\lambda = -\omega^2$, we see that we have to solve the following eigenvalue problem:
$$T : V \to V, \qquad Tv = v''$$
with
$$V = \{f : [0,L] \to \mathbb{R} : f \text{ is twice differentiable and } f(0) = f(L) = 0\}.$$
(i) Show that V is a vector space.
(ii) Show that T is a well-defined linear operator.
(iii) Find the eigenvalues and eigenspaces of T.

6. For each of the following matrices, determine whether it is diagonalisable. If it is, find a diagonal matrix D to which it is similar, $D = CAC^{-1}$.
$$A_1 = \begin{pmatrix} 3 & 1 & -1 \\ 1 & 3 & -1 \\ -1 & -1 & 5 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 3 \end{pmatrix}, \quad A_3 = \begin{pmatrix} -1 & 4 & 2 & -7 \\ 0 & 5 & -3 & 6 \\ 0 & 0 & -5 & 1 \\ 0 & 0 & 0 & 11 \end{pmatrix}, \quad A_4 = \begin{pmatrix} 3 & 2 & 5 & 1 \\ 2 & 0 & 2 & 6 \\ 5 & 2 & 7 & -1 \\ 1 & 6 & -1 & 3 \end{pmatrix}.$$
RA
7. Find an orthogonal substitution that diagonalises the given quadratic forms and find the diagonal form. Sketch the solutions. If it is an ellipse, compute the lengths of the principal axes and the angle they make with the x-axis. If it is a hyperbola, compute the angle that the asymptotes make with the x-axis.
(a) $10x^2 - 6xy + 2y^2 = 4$,
(b) $x^2 - 9y^2 = 2$,
(c) $x^2 - 9y^2 = 20$ (compare the solution with the one from the previous item!),
(d) $11x^2 - 16xy - y^2 = 30$,
(e) $x^2 + 4xy + 4y^2 = 4$.

8. Find the eigenvalues and eigenspaces of the following n × n matrices:
$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 1 & 1 & \cdots & 1 & 2 \\ \vdots & \vdots & & \vdots & \vdots \\ 1 & 1 & \cdots & 1 & n \end{pmatrix}.$$
Compare with Exercise 9.


 
9. Let $A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$. Compute $e^A := \sum_{n=0}^{\infty} \frac{1}{n!}A^n$.
Hint. Find an invertible matrix C and a diagonal matrix D such that $A = C^{-1}DC$ and use this to compute $A^n$.

10. Let A ∈ M(n × n, C) be a hermitian matrix such that all its eigenvalues are strictly greater than 0. Let ⟨· , ·⟩ be the standard inner product on Cn. Show that A induces an inner product on Cn via
$$\mathbb{C}^n \times \mathbb{C}^n \to \mathbb{C}, \qquad (x, y) := \langle Ax, y\rangle.$$

11. (a) Let Φ : M(2 × 2, R) → M(2 × 2, R), Φ(A) = A^t. Find the eigenvalues and eigenspaces of Φ.
(b) Let P2 be the vector space of polynomials of degree less than or equal to 2 with real coefficients. Find the eigenvalues and eigenspaces of T : P2 → P2, Tp = p' + 3p.
(c) Let R be the reflection in the plane P : x + 2y + 3z = 0 in R3. Compute the eigenvalues and eigenspaces of R.
Appendix A

Complex Numbers

A complex number is an expression of the form
$$z = a + ib$$
where a, b ∈ R and i is called the imaginary unit. The number a is called the real part of z, denoted by Re(z), and b is called the imaginary part of z, denoted by Im(z).
The set of all complex numbers is sometimes called the complex plane and it is denoted by C:
C = {a + ib : a, b ∈ R}.
A complex number can be visualised as a point in the plane R2 where a is the coordinate on the real axis and b is the coordinate on the imaginary axis.
Let a, b, x, y ∈ R. We define the algebraic operations sum and product for complex numbers
z = a + ib, w = x + iy:
z + w = (a + ib) + (x + iy) := a + x + i(b + y),
zw = (a + ib)(x + iy) := ax − by + i(ay + bx).
 
Exercise A.1. Show that if we identify the complex number z = a + ib with the vector $\begin{pmatrix} a \\ b \end{pmatrix} \in \mathbb{R}^2$, then the addition of complex numbers is the same as the addition of vectors in R2.
We will give a geometric interpretation of the multiplication of complex numbers later after formula
(A.5).
It follows from the definition above that i2 = −1. Moreover, we can view the real numbers R as a
subset of C if we identify a real number x with the complex number x + 0i.
Let a, b ∈ R and z = a + ib. Then the complex conjugate of z is
$$\bar z = a - ib$$
and its modulus or norm is
$$|z| = \sqrt{a^2 + b^2}.$$
Geometrically, the complex conjugate is obtained from z by a reflection on the real axis, and its norm is the distance of the point represented by z from the origin of the complex plane.


Figure A.1: Complex plane. Left: the points 3 + 2i, −1 + i and −(3/2)i. Right: a complex number z = a + ib and its complex conjugate $\bar z = a - ib$.

Properties A.2. Let a, b, x, y ∈ R and let z = a + ib, w = x + iy. Then:

(i) z = Re z + i Im z.
(ii) Re(z + w) = Re(z) + Re(w), Im(z + w) = Im(z) + Im(w).
(iii) $\overline{\bar z} = z$, $\overline{z + w} = \bar z + \bar w$, $\overline{zw} = \bar z\, \bar w$.
(iv) $z\bar z = |z|^2$.
(v) $\operatorname{Re} z = \tfrac12(z + \bar z)$, $\operatorname{Im} z = \tfrac{1}{2i}(z - \bar z)$.
RA
Proof. (i) and (ii) should be clear. For (iii) note that $\overline{\bar z} = \overline{a - ib} = a + ib = z$,
$$\overline{z + w} = \overline{a + x + i(b + y)} = a + x - i(b + y) = a - ib + x - iy = \bar z + \bar w,$$
$$\overline{zw} = \overline{ax - by + i(ay + bx)} = ax - by - i(ay + bx) = (a - ib)(x - iy) = \bar z\, \bar w.$$
(iv) follows from
$$z\bar z = (a + ib)\overline{(a + ib)} = (a + ib)(a - ib) = a^2 + b^2 + i(ab - ba) = a^2 + b^2 = |z|^2$$
and (v) follows from
$$z + \bar z = a + ib + \overline{(a + ib)} = a + ib + a - ib = 2a = 2\operatorname{Re}(z),$$
$$z - \bar z = a + ib - \overline{(a + ib)} = a + ib - (a - ib) = 2ib = 2i\operatorname{Im}(z).$$
z + z = a + ib − (a + ib) = a + ib − (a − ib) = 2ib = 2i Im(z).

We call a complex number real if it is of the form z = a + i0 for some a ∈ R and we call it purely imaginary if it is of the form z = 0 + ib for some b ∈ R. Hence
$$z \text{ is real} \iff z = \bar z \iff z = \operatorname{Re}(z),$$
$$z \text{ is purely imaginary} \iff \bar z = -z \iff z = i\operatorname{Im}(z).$$

It turns out that C is a field , that is, it satisfies


(a) Associativity of addition: (u + v) + w = u + (v + w) for every u, v, w ∈ C.

(b) Commutativity of addition: v + w = w + v for every u, v ∈ C.

(c) Identity element of addition: There exists an element 0, called the additive identity such
that for every v ∈ C, we have 0 + v = v + 0 = v.

(d) Additive inverse: For all z ∈ C, we have an inverse element −z such that z + (−z) = 0.

(e) Associativity of multiplication (uv)w = u(vw) for every u, v, w ∈ C.

(f) Commutativity of multiplication vw = wv for every u, v ∈ C.

(g) Identity element of multiplication: There exists an element 1, called the multiplicative identity, such that for every v ∈ C, we have 1 · v = v · 1 = v.

(h) Multiplicative inverse: For all z ∈ C \ {0}, we have an inverse element z −1 such that
z · z −1 = 1.

FT
(i) Distributivity laws: For all u, v, w ∈ C we have

u(w + v) = uw + uv.

It is easy to check that commutativity, associativity and distributivity hold. Clearly, the additive identity is 0 + i0 and the multiplicative identity is 1 + 0i. If z = a + ib, then its additive inverse is −a − ib. If z ∈ C \ {0}, then $z^{-1} = \frac{\bar z}{|z|^2} = \frac{a - ib}{a^2 + b^2}$. This can be seen easily if we recall that $|z|^2 = z\bar z$.

The proof of the next theorem is beyond the scope of these lecture notes.
RA
Theorem A.3 (Fundamental theorem of algebra). Every non-constant complex polynomial
has at least one complex root.

We obtain immediately the following corollary.

Corollary A.4. Every complex polynomial p can be written in the form

$$p(z) = c(z - \lambda_1)^{n_1}(z - \lambda_2)^{n_2}\cdots(z - \lambda_k)^{n_k} \tag{A.1}$$


D

where λ1 , . . . , λk are the different roots of p. Note that n1 + · · · + nk = deg(p).

The integers n1, . . . , nk are called the multiplicities of the corresponding roots.

Proof. Let n = deg(p). If n = 0, then p is constant and it is clearly of the form (A.1). If n > 0, then, by Theorem A.3, there exists µ1 ∈ C such that p(µ1) = 0. Hence there exists some polynomial q1 such that p(z) = (z − µ1)q1(z). Clearly, deg(q1) = n − 1. If q1 is constant, we are done. If q1 is not constant, then it must have a zero µ2. Hence q1(z) = (z − µ2)q2(z) with some polynomial q2 with deg(q2) = n − 2. If we repeat this process n times, we finally obtain that
$$p(z) = c(z - \mu_1)(z - \mu_2)\cdots(z - \mu_n).$$
Now we only have to group all terms with the same µj and we obtain the form (A.1).


Functions of complex numbers


It is more or less obvious how to form a complex polynomial. We can also extend functions which
admit a power series representation to the complex numbers. To this end, we recall (from some
calculus course) that a power series is an expression of the form

$$\sum_{n=0}^{\infty} c_n (z - a)^n \tag{A.2}$$
where the cn are the coefficients and a is where the power series is centred. In our case, they are complex numbers and z is a complex number. Recall that a series $\sum_{n=0}^{\infty} a_n$ is called absolutely convergent if and only if $\sum_{n=0}^{\infty} |a_n|$ is convergent. It can be shown that every absolutely convergent series of complex numbers is convergent. Moreover, for every power series of the form (A.2) there exists a number R ≥ 0 or R = ∞, called the radius of convergence, such that the series converges
absolutely for every z ∈ C with |z − a| < R and it diverges for z with |z − a| > R. That means that
the series converges absolutely for all z in the open disc with radius R centred in a, and it diverges
outside the closed disc with radius R centred in a. For z on the boundary the series may converge

or diverge. Note that R = 0 and R = ∞ are allowed. If R = 0, then the series converges only for
z = a and if R = ∞, then the series converges for all z ∈ C.
Important functions that we know from the real numbers and have a power series are sine, cosine
and the exponential function. We can use their power series representation to define them also for
complex numbers.

Definition A.5. Let z ∈ C. Then we define
$$\sin z = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} z^{2n+1}, \qquad \cos z = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} z^{2n}, \qquad e^z = \sum_{n=0}^{\infty} \frac{1}{n!} z^n. \tag{A.3}$$

Note that for every z the series in (A.3) are absolutely convergent because, for instance, for the series of the sine function we have that $\sum_{n=0}^{\infty} \big|\frac{(-1)^n}{(2n+1)!} z^{2n+1}\big| = \sum_{n=0}^{\infty} \frac{1}{(2n+1)!} |z|^{2n+1}$ is convergent: |z| is a real number and we know that the sine series converges absolutely for every real argument. Hence the sine series is absolutely convergent for every z ∈ C, hence it converges. The same argument shows that the series for the cosine and for the exponential function converge for every z ∈ C.

Remark A.6. Since the series for the sine function contains only odd powers of z, it is an odd
function and cosine is an even function because it contains only even powers of z. In formulas:
sin(−z) = − sin z, cos(−z) = cos z.

Next we show the relation between the trigonometric and the exponential function.

Theorem A.7 (Euler formulas). For every z ∈ C we have that


eiz = cos z + i sin z,
1
cos(z) = (eiz + e−iz ),
2
1
sin(z) = (eiz − e−iz ).
2i


Proof. Let us show the formula for $e^{iz}$. In the calculation we will use that $i^{2n} = (i^2)^n = (-1)^n$ and $i^{2n+1} = (i^2)^n i = (-1)^n i$, and
$$\begin{aligned}
e^{iz} &= \sum_{n=0}^{\infty} \frac{1}{n!}(iz)^n = \sum_{n=0}^{\infty} \frac{1}{n!}\, i^n z^n = \sum_{n=0}^{\infty} \frac{1}{(2n)!}\, i^{2n} z^{2n} + \sum_{n=0}^{\infty} \frac{1}{(2n+1)!}\, i^{2n+1} z^{2n+1}\\
&= \sum_{n=0}^{\infty} \frac{1}{(2n)!}\, (-1)^n z^{2n} + i\sum_{n=0}^{\infty} \frac{1}{(2n+1)!}\, (-1)^n z^{2n+1} = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n)!} z^{2n} + i\sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)!} z^{2n+1}\\
&= \cos z + i\sin z.
\end{aligned}$$

Note that the third step needs some proper justification (see some course on integral calculus).
For the proof of the formula for cos z we note that from what we just proved, it follows that

$$\frac12(e^{iz} + e^{-iz}) = \frac12\big(\cos(z) + i\sin(z) + \cos(-z) + i\sin(-z)\big) = \frac12\big(\cos(z) + i\sin(z) + \cos(z) - i\sin(z)\big) = \cos(z).$$

The formula for the sine function follows analogously.
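
The Euler formula also holds for complex arguments, which can be checked numerically with Python's cmath module (a sketch, not part of the notes):

    import cmath

    z = 0.7 + 0.3j
    print(cmath.exp(1j * z))                  # e^{iz}
    print(cmath.cos(z) + 1j * cmath.sin(z))   # cos z + i sin z, the same value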


Exercise. Let z, w ∈ C. Show the following.

(i) ez ew = ez+w . Hint. Use Cauchy product.


(ii) Use the Euler formulas to prove $\cos\alpha\cos\beta = \frac12(\cos(\alpha-\beta) + \cos(\alpha+\beta))$, $\sin\alpha\sin\beta = \frac12(\cos(\alpha-\beta) - \cos(\alpha+\beta))$, $\sin\alpha\cos\beta = \frac12(\sin(\alpha+\beta) + \sin(\alpha-\beta))$.

(iii) (cos z)2 + (sin z)2 = 1.


(iv) cosh(z) = cos(iz), sinh(z) = −i sin(iz). In particular, sin and cos are not bounded functions on C.
(v) Show that the exponential function is 2πi periodic.

Polar representation of complex numbers

Let z ∈ C with |z| = 1 and let ϕ be the angle between the positive real axis and the line connecting
the origin and z. It is called the argument of z and it is denoted by arg(z). Observe that the
argument is only determined modulo 2π. That means, if we add or subtract any integer multiple
of 2π to the argument, we obtain another valid argument.


Figure A.2: Left picture: If |z| = 1, then $z = \cos\varphi + i\sin\varphi = e^{i\varphi}$. Right picture: If z ≠ 0, then $z = |z|\cos\varphi + i|z|\sin\varphi = |z|\, e^{i\varphi}$.

Then the real and imaginary part of z are Re(z) = cos ϕ and Im(z) = sin ϕ, and therefore $z = \cos\varphi + i\sin\varphi = e^{i\varphi}$. We saw in Remark 2.3 how we can calculate the argument of a complex number.
Now let z ∈ C \ {0} and again let ϕ be the angle between the positive real axis and the line connecting the origin with z. Let $\tilde z = \frac{z}{|z|}$. Then $|\tilde z| = 1$ and therefore $\tilde z = e^{i\varphi}$. It follows that
$$z = |z|\, e^{i\varphi}. \tag{A.4}$$


(A.4) is called the polar representation of z.
Now we can give a geometric interpretation of the product of two complex numbers. Let z, w ∈
C \ {0} and let α = arg z and β = arg w. Then
$$zw = |z|\, e^{i\alpha}\, |w|\, e^{i\beta} = |z|\,|w|\, e^{i(\alpha+\beta)}. \tag{A.5}$$
This shows that the product zw is the complex number whose norm is the product of the norms of
z and w and whose argument is the sum of the arguments of z and w.
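
This geometric description of the product can be checked with Python's cmath module (a sketch, not part of the notes): the moduli multiply and the arguments add up to a multiple of 2π.

    import cmath

    z = 1 + 1j
    w = -2 + 1j
    zw = z * w
    print(abs(zw), abs(z) * abs(w))                           # the norms multiply
    print(cmath.phase(zw), cmath.phase(z) + cmath.phase(w))   # the arguments agree modulo 2*pi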

Figure A.3: Geometric interpretation of the multiplication of two complex numbers.

Index

|A| (determinant of A), 122 additive inverse, 145


k · k, 30, 42, 270 adjoint matrix, 274
⊕, 241 adjugate matrix, 137
h· , ·i, 33, 42, 270 affine subspace, 150
^(~v , w),
~ 34 algebraic multiplicity, 285
k, 35 angle between ~v and w,~ 34
⊥, 35, 244, 271 angle between two planes, 55
∼, 276 antisymmetric matrix, 105

FT
×, 44 approximation by least squares, 256
∧, 44 argument of a complex number, 323
C, 319 augmented coefficient matrix, 13, 70
Cn , 269
M (m × n), 70 bases
R2 , 25, 28 change of, 204
R3 , 44 basis, 168
Rn , 41 orthogonal, 235
RA
Eigλ (T ), 280 bijective, 189
Im, 319
L(U, V ), 188, 229 canonical basis in Rn , 169
Masym (n × n), 105 Cauchy-Schwarz inequality, 36, 272
Msym (n × n), 105 change of bases, 204
Pn , 155 change-of-coordinates matrix, 207
Re, 319 characteristic polynomial, 282
Sn , 121 coefficient matrix, 13, 70
U ⊥ , 244 augmented, 13, 70
D

dist(~v , U ), 251 cofactor, 123


adj A, 137 column space, 197
arg, 323 commutative diagram, 217
gen, 157 complement
pA , 282 orthogonal, 241
projU ~v , 249 complex conjugate, 319
projw~ ~v , 39, 43, 271 complex number, 319
span, 157 complex plane, 319
v̂, 44 component of a matrix, 13
~vk , 39 composition of functions, 87
~v⊥ , 39 cross product, 44

additive identity, 29, 143, 321 determinant, 19, 122

323
324

expansion along the kth row/column, 124 2×, 102


Laplace expansion, 124 invertible, 95
Leibniz formula, 122 isometry, 240
rule of Sarrus, 125
diagonal, 105 kernel, 189
diagonalisable, 277
diagram, 217, 282 Laplace expansion, 124
commutative, 217 least squares approximation, 256
dimension, 171 left inverse, 96
direct sum, 175, 241 Leibniz formula, 122
directional vector, 51 length of a vector, see norm of a vector
distance of ~v to a subspace, 251 line, 49, 177
dot product, 33, 42, 270 directional vector, 51
normal form, 53
eigenspace, 280 parametric equations, 51
eigenvalue, 278, 279 symmetric equation, 52
eigenvector, 278, 279 vector equation, 51

FT
elementary matrix, 106 linear combination, 156
elementary row operations, 71 linear map, 187
empty set, 157, 159, 161, 168, 171 linear maps
entry, 13 matrix representation, 214
equivalence relation, 276 linear operator, see linear map
Euler formulas, 322 linear span, 157
expansion along the kth row/column, 124 linear system, 12, 69
consistent, 12
RA
field, 320 homogeneous, 12
finitely generated, 159 inhomogeneous, 12
free variables, 77 solution, 12
linear transformation, see linear map
Gauß-Jordan elimination, 75 matrix representation, 215
Gaußian elimination, 75 linearly dependent, 161
generator, 157 linearly independent, 161
geometric multiplicity, 280 lower triangular, 105
Gram-Schmidt process, 252
D

magnitude of a vector, see norm of a vector


Hölder inequality, 272 matrix, 70
hermitian matrix, 274, 295 adjoint, 274
homogeneous linear system, 12 adjugate, 137
hyperplane, 49, 177 antisymmetric, 105
change-of-coordinates, 207
image of a linear map, 189 coefficient, 70
imaginary part of z, 319 cofactor, 123
imaginary unit, 319 column/row space, 197
inhomogeneous linear system, 12 diagonal, 105
injective, 189 diagonalisable, 277
inner product, 33, 42, 270 elementary, 106
inverse matrix, 100 hermitian, 274, 295


inverse, 100 orthogonalisation, 252


invertible, 95 orthonormal system, 234
left inverse, 96 overfitting, 261
lower triangular, 105
minor, 123 parallel vectors, 35
orthogonal, 237 parallelepiped, 48
product, 89 parallelogram, 47
reduced row echelon form, 73 parametric equations, 51
right inverse, 96 permutation, 121
row echelon form, 73 perpendicular vectors, 35, 271
row equivalent, 75 pivot, 73
singular, 95 plane, 49, 177
snymmetrix, 295 angle between two planes, 55
square, 70 normal form, 55
symmetric, 105 polar represenation of a complex number, 324
transition, 207 principal axes, 302
unitary, 274 product

FT
upper triangular, 105 inner, 33, 42, 270
matrix representation of a linear transformation, product of vector in R2 with scalar, 28
215 projection
minor, 123 orthogonal, 249
modulus, 319 proper subspace, 149
Multiplicative identity, 321 Pythagoras Theorem, 250, 271
multiplicity
algebraic, 285 radius of convergence, 322
RA
geometric, 280 range, 189
real part of z, 319
norm, 319 reduced row echelon form, 73
norm of a vector, 30, 42, 270 reflection in R2 , 222
normal form reflection in R3 , 223
line, 53 right hand side, 12, 69
plane, 55 right inverse, 96
normal vector of a plane, 55 row echelon form, 73
null space, 189 row equivalent, 75
D

row operations, 71
ONB, 235 row space, 197
one-to-one, 189
orthogonal basis, 235 Sarrus
orthogonal complement, 241, 244 rule of, 125
orthogonal diagonalisation, 295 scalar, 26
orthogonal matrix, 237 scalar product, 33, 42, 270
orthogonal projection, 249, 249 sesquilinear, 271
orthogonal projection in R2 , 39 sign of a permutation, 121
orthogonal projection in Rn , 43, 249, 271 similar matrices, 276
orthogonal projection to a plane in R3 , 223 snymmetrix matrix, 295
orthogonal system, 234 solution
orthogonal vectors, 35, 271 vector form, 78


span, 157
square matrix, 70
standard basis in Rn , 169
standard basis in Pn , 169
subspace, 149
affine, 150
sum of functions, 87
surjective, 189
symmetric equation, 52
symmetric matrix, 105
system
orthogonal, 234
orthonormal, 234

trace, 283
transition matrix, 207
triangle inequality, 31, 36, 272

FT
trivial solution, 80, 160

unit vector, 31
unitary matrix, 274
upper triangular, 105

vector, 29
in R2 , 25
RA
norm, 30, 42, 270
unit, 31
vector equation, 51
vector form of solutions, 78
vector product, 44
vector space, 29, 143
direct sum, 241
generated, 157
intersection, 241
D

polynomials, 155
spanned, 157
subspace, 149
sum, 241
vector sum in R2 , 28
vectors
orthogonal, 35, 271
parallel, 35
perpendicular, 35, 271
