LA Lecture Notes
LA Lecture Notes
FT
RA
D
Analysis Series
M. Winklmeier
Chigüiro Collection
Work in progress. Use at your own risk.
FT
RA
D
Contents
1 Introduction 5
1.1 Examples of systems of linear equations; coefficient matrices . . . . . . . . . . . . . . 6
1.2 Linear 2 × 2 systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2 R2 and R3
2.1 Vectors in R2 . . . . . . . . . . . . . .
2.2 Inner product in R2 . . . . . . . . . .
2.3 Orthogonal Projections in R2 . . . . .
2.4 Vectors in Rn . . . . . . . . . . . . . .
2.5 Vectors in R3 and the cross product .
2.6 Lines and planes in R3 . . . . . . . . .
FT .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
25
25
33
39
43
46
53
RA
2.7 Intersections of lines and planes in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
4 Determinants 137
4.1 Determinant of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
4.2 Properties of the determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
4.3 Geometric interpretation of the determinant . . . . . . . . . . . . . . . . . . . . . . . 152
4.4 Inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3
4 CONTENTS
FT
6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
B Solutions 375
Index 389
Chapter 1
Introduction
This chapter serves as an introduction to the main themes of linear algebra, namely the problem of
solving systems of linear equations for several unknowns. We are not only interested in an efficient
FT
way to find their solutions, but we also wish to understand how the solutions could possibly look
and what we can say about their structure. For the latter, it will be crucial to find a geometric
interpretation of systems of linear equations. In this chapter we will use the “solve and insert”-
strategy for solving linear systems. A systematic and efficient formalism will be given in Chapter 3.
Everything we discuss in this chapter will appear again later on, so you may read it quickly or even
skip (parts of) it.
A linear system is a set of equations for a number of unknowns which have to be satisfied simul-
RA
taneously and where the unknowns appear only linearly. If the number of equations is m and the
number of unknowns is n, then we call it an m × n linear system. Typically the unknowns are
called x, y, z or x1 , x2 . . . , xn . The following is an example of a linear system of 3 equations for 5
unknowns:
because
√ in the first equation we have a product of two of the unknowns. Also expressions like x2 ,
3
x, xyz, x/y or sin x make a system non-linear.
Now let us briefly discuss the simplest non-trivial case: A system consisting of one linear equation
for one unknown x. Its most general form is
ax = b (1.1)
where a and b are given constants and we want to find all x ∈ R which satisfy (1.1). Clearly, the
solution to this problem depends on the coefficients a and b. We have to distinguish several cases.
Case 1. a 6= 0. In this case, there is only one solution, namely x = b/a.
Case 2. a = 0, b 6= 0. In this case, there is no solution because whatever value we choose for x,
the left hand side ax will always be zero and therefore cannot be equal to b.
Case 3. a = 0, b = 0. In this case, there are infinitely many solutions. In fact, every x ∈ R solves
the equation.
So we see that already in this simple case we have three very different types of solution of the
system (1.1): no solution, exactly one solution or infinitely many solutions.
Now let us look at a system of one linear equation for two unknowns x, y. Its most general form is
ax + by = c. (1.1’)
Here, a, b, c are given constants and we want to find all pairs x, y so that the equation is satisfied.
For example, if a = b = 0 and c 6= 0, then the system has no solution, whereas if for example a 6= 0,
then there are infinitely many solutions because no matter how we choose y, we can always satisfy
the system by taking x = a1 (c − y).
Question 1.1
Is it possible that the system has exactly one solution?
(Come back to this question again after you have studied Chapter 3.)
FT
The general form of a system of two linear equations for one unknown is
a1 x = b1 ,
a11 x + a12 y = c1 ,
a2 x = b2
a21 x + a22 y = c2
RA
where a1 , a2 , b1 , b2 , respectively a11 , a12 , a21 , a22 , c1 , c2 are constants and x, respectively x, y are the
unknowns.
Question 1.2
Can you find find examples for the coefficients such that the systems have
Can you maybe even give a general rule for when which behaviour occurs?
(Come back to this question again after you have studied Chapter 3.)
Before we discuss general linear systems, we will discuss in this introductory chapter the special
case of a system of two linear equations with two unknowns. Although this is a very special type
of system, it exhibits many porperties of general linear systems and they appear very often in
problems.
Example 1.1. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. How many motorcycles and cars are in the shop?
Solution. First, we give names to the quantities we want to calculate. So let M = number of
motorcyles, C = number of cars in the dealership. If we write the information given in the exercise
in formulas, we obtain
since we assume that every motorcycle has 2 wheels and every car has 4 wheels. Equation 1 tells
us that M = 25 − C. If we insert this into equation 2 , we find
80 = 2(25 − C) + 4C = 50 − 2C + 4C = 50 + 2C =⇒ 2C = 30 =⇒ C = 15.
This implies that M = 25 − C = 25 − 15 = 10. Note that in our calculations and arguments, all
FT
the implication arrows go “from left to right”, so what we can conclude at this instance is that the
system has only one possible candidate for a solution and this candidate is M = 10, C = 15. We
have not (yet) shown that it really is a solution. However, inserting these numbers in the original
equation we see easily that our candidate is indeed a solution.
So the answer is: There are 10 motorcycles and 15 cars (and there is no other possibility).
Solution. Again, let M = number of motorcyles, C = number of cars. The information of the
exercise leads to the following system of equations:
D
As in the previous exercise, we obtain from 1 and 2 that M = 10, C = 15. Clearly, this also
satisfies equation 3 . So again the answer is: There are 10 motorcycles and 15 cars (and there is
no other possibility).
Example 1.3. Assume that a car dealership sells motorcycles and cars. Altogether they have 25
vehicles in their shop with a total of 80 wheels. Moreover, the shop arranges them in 5 distinct areas
of the shop so that in each area there are either 3 cars or 5 motorcycles. How many motorcycles
and cars are in the shop?
Solution. Again, let M = number of motorcycles, C = number of cars. The information of the
exercise gives the following equations:
As in the previous exercise, we obtain that M = 10, C = 15 using only equations 1 and 2 .
However, this does not satisfy equation 3 ; so there is no way to choose M and C such that all
three equations are satisfied simultaneously. Therefore, a shop as in this example does not exist.
Example 1.4. Assume that a zoo has birds and cats. The total count of legs of the animals is 60.
Feeding a bird takes 5 minutes, feeding a cat takes 10 minutes. The total time to feed the animals
is 150 minutes. How many birds and cats are in the zoo?
Solution. Let B = number of birds, C = number of cats in the zoo. The information of the
FT
exercise gives the following equations:
The first equation gives B = 30 − 2C. Inserting this into the second equation, gives
Remark. The reason for this is that both equations 1 and 2 are basically the same equation.
If we divide the first one by 2 and the second one by 5, then we end up in both cases with the
equation B + 2C = 30, so both equations contain exactly the same information.
D
Algebraically, the linear system has infinitely many solutions. But our variables represent animals
and the only come in nonnegativ integer quantities, so we have the 16 different solutions B = 30−2C
where C ∈ {0, 1, . . . , 15}.
Solution. A polynomial of degree at most 3 is known if we know its 4 coefficients. In this exercise,
the unknowns are the coefficients of the polynomial P . If we write P (x) = αx3 + βx2 + γx + δ,
then we have to find α, β, γ, δ such that (1.2) is satisfied. Note that P 0 (x) = 3αx2 + 2βx + γ. Hence
FT
3 2
It is easy to verify that the polynomial P (x) = x + 2x + 3x + 1 has all the desired properties.
Example 1.6. A pole is 5 metres long and shall be coated with varnish. There are two types of
varnish available: The blue one adds 3 g per 50 cm to the pole, the red one adds 6 g per meter to
the pole. Is it possible to coat the pole in a combination of the varnishes so that the total weight
added is
(a) 35 g? (b) 30 g?
RA
Solution. (a) We denote by b the length of the pole which will be covered in blue and r the length
of the pole which will be covered in red. Then we obtain the system of equations
1 b+ r = 5 (total length)
2 6b + 6r = 35 (total weight)
The first equation gives r = 5 − b. Inserting into the second equation yields 35 = 6b + 6(5 − b) = 30
which is a contradiction. This shows that there is no solution.
(b) As in (a), we obtain the system of equations
D
1 b+ r = 5 (total length)
2 6b + 6r = 30 (total weight)
Again, the first equation gives r = 5−b. Inserting into the second equation yields 30 = 6b+6(5−b) =
30 which is always true, independently of how we choose b and r as long as 1 is satisfied. This
means that in order to solve the system of equations, it is sufficient to solve only the first equation
since then the second one is automatically satisfied. So we have infinitely many solutions. Any pair
b, r such that b + r = 5 gives a solution. So for any b that we choose, we only have to set r = 5 − b
and we have a solution of the problem. Of course, we could also fix r and then choose b = 5 − r to
obtain a solution.
For example, we could choose b = 1, then r = 4, or b = 0.00001, then r = 4.99999, or r = −2 then
b = 7. Clearly, the last example does not make sense for the problem at hand, but it still does
satisfy our system of equations.
Example 1.7. When octane reacts with oxigen, the result is carbon dioxide and water. Find the
equation for this reaction.
Solution. The chemical formulas for the substances are C8 H18 , O2 , CO2 and H2 O. Hence the
reaction equation is
a C8 H18 + b O2 −→ c CO2 + d H2 O
with unkonwn integers a, b, c, d. Clearly the solution will not be unique since if we have one set
of numbers a, b, c, d which works and we multiply all of then by the same number, then we obtain
another solution. Let us write down the system of equations. To this end we note that the number
of atoms of each element has to be equal on both sides of the equation. We obtain:
1 8a = c (carbon)
2 18a = 2d (hydrogen)
3 2b = 2c + d (oxygen)
or, if we put all the variables on the left hand side,
1
2
4
FT 8a
18a
− c = 0,
− 2d = 0,
2b − 2c − d = 0.
Let us express all the unknowns in terms of a: 1 and 2 show that c = 8a and d = 9a. Inserting
this in 3 we obtain 0 = 2b − 2 · 8a − 9a = 2b − 25a, hence b = 25
2 a. If we want all coefficients to
be integers, we can choose a = 2, b = 25, c = 16, d = 18 and the reaction equation becomes
RA
2 C8 H18 + 25 O2 −→ 16 CO2 + 18 H2 O .
All the examples we discussed in this section are so-called systems of linear equations. Let us give
a precise definition of what we mean by this.
Definition 1.8 (Linear system). An m×n system of linear equations (or simply a linear system)
is a system of m linear equations for n unknowns of the form
a11 x1 + a12 x2 + · · · + a1n xn = b1
D
Definition 1.9 (Coefficient matrix). The coefficient matrix A of the system is the collection of
all coefficients aij in an array as follows:
a11 a12 . . . a1n
a21 a22 . . . a2n
A= . .. . (1.4)
.. .
am1 am2 . . . amn
The numbers aij are called the entries or components of the matrix A.
The augmented coefficient matrix A of the system is the collection of all coefficients aij and the
right hand side; it is denoted by
a11 a12 . . . a1n b1
a21 a22 . . . a2n b2
(A|b) = ..
.. .. . (1.5)
. . .
am1 am2 . . . amn bm
FT
The coefficient matrix is nothing else than the collection of the coefficients aij ordered in some sort
of table or rectangle such that the place of the coefficient aij is in the ith row of the jth column.
The augmented coefficient matrix contains additionally the constants from the right hand side.
Important observation. There is a one-to-one correspondence between linear systems and aug-
mented coefficient matrices: Given a linear system, it is easy to write down its augmented coefficient
matrix and vice versa.
RA
Let us write down the coefficient matrices of our examples.
Example 1.1: This is a 2 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4 and
right hand side b1 = 25, b2 = 80. The system has a unique solution. The coefficient matrix and the
augmented coefficient matrix are
1 1 1 1 25
A= , (A|b) = .
2 4 2 4 80
Example 1.2: This is a 3 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4, a31 = 51 ,
D
a32 = 13 , and right hand side b1 = 25, b2 = 80, b3 = 7. The system has a unique solution. The
coefficient matrix and the augmented coefficient matrix are
1 1 1 1 25
A = 2 4 , (A|b) = 2 4 80 ,
1 1 1 1
5 3 5 3 7
Example 1.3: This is a 3 × 2 system with coefficients a11 = 1, a12 = 1, a21 = 2, a22 = 4, a31 = 15 ,
a32 = 13 , and right hand side b1 = 60, b2 = 200, b3 = 100. The system has no solution. The
coefficient matrix is the same as in Example 1.2, the augmented coefficient matrix is
1 1 25
(A|b) = 2 4 80 ,
1 1
5 3 5
Example 1.5: This is a 4 × 4 system with coefficients a11 = 0, a12 = 0, a13 = 0, a14 = 1, a21 = 1,
a22 = 1, a23 = 1, a24 = 1, a31 = 0, a32 = 0, a33 = 1, a34 = 0, a41 = 24, a42 = 8, a43 = 2, a44 = 1,
and right hand side b1 = 1, b2 = 7, b3 = 3, b4 = 23. The system has a unique solution. The
coefficient matrix and the augmented coefficient matrix are
0 0 0 1 0 0 0 1 1
1 1 1 1 1 1 1 1 7
A= 0 0 1 0 , (A|b) = 0 0 1 0 3 .
12 4 1 0 12 4 1 0 23
Example 1.7: This is a 3 × 4 homogeneous system with coefficients a11 = 8, a12 = 0, a13 = −1,
a14 = 0, a21 = 18, a22 = 0, a23 = 0, a24 = −2, a31 = 0, a32 = 2, a33 = −2, a34 = −1, and right
hand side b1 = 1, b2 = 7, b3 = 3, b4 = 23. The system has a unique solution. The coefficient matrix
and the augmented coefficient matrix are
8 0 −1 0 8 0 −1 0 0
FT
A = 18 0 0 −2 , (A|b) = 18 0 0 −2 0 .
0 2 −2 −1 0 2 −2 −1 0
We saw that Examples 1.1, 1.2, 1.5, 1.6 (a) have unique solutions. In Examples 1.6 (b) and 1.7
the solution is not unique; they even have infinitely many solutions! Examples 1.3 and 1.6(a) do
not admit solutions. So given an m × n system of linear equations, two important questions arise
naturally:
RA
• Existence: Does the system have a solution?
• Uniqueness: If the system has a solution, is it unique?
More generally, we would like to be able to say something about the structure of solutions of linear
systems. For example, is it possible that there is only one solution? That there are exactly two
solutions? That there are infinite solutions? That there is is no solution? Can we give criteria for
D
Ejercicios.
De los siguientes sistemas de ecuaciones lineales, encuentre al menos una solucin (si la hay). ¿Cules
tienen solucin nica?
(a) 4x − 6y = 7
6x − 9y = 12
(b) x + 2y − 3z = −4
1.2
2x + y − 3z = 4
Linear 2 × 2 systems
(c)
FT
3x − 5y = 0
15x − 9y = 0
(d) 2x + 4y + 6z = 18
4x + 5y + 6z = 24
(e) 4x + 14y = 23
(f)
6x + 21y = 30
6x + 8y = 12
15x + 20y = 30
RA
Let us come back to the equation from Example 1.1. For convenience, we write now x instead of B
and y instead of C. Recall that the system of equations that we are interested in solving is
1 x + y = 60,
(1.6)
2 2x + 4y = 200.
We want to give a geometric meaning to this system of equations. To this end we think of pairs
x, y as points (x, y) in the plane. Let us forget about the equation 2 for a moment and concentrate
only on 1 . Clearly, it has infinitely many solutions. If we choose an arbitrary x, we can always
D
find y such that 1 satisfied (just take y = 60 − x). Similarly, if we choose any y, then we only have
to take x = 60 − y and we obtain a solution of 1 .
Where in the xy-plane lie all solutions of 1 ? Clearly, 1 is equivalent to y = 60 − x which we easily
identify as the equation of the line L1 in the xy-plane which passes through (0, 60) and has slope
−1. In summary, a pair (x, y) is a solution of 1 if and only if it lies on the line L1 , see Figure 1.1.
If we apply the same reasoning to 2 , we find that a pair (x, y) satisfies 2 if and only if (x, y) lies
on the line L2 in the xy-plane given by y = 41 (200 − 2x) (this is the line in the xy-plane passing
through (0, 50) with slope − 12 ).
Now it is clear that a pair (x, y) satisfies both 1 and 2 if and only if it lies on both lines L1 and
L2 . So finding the solution of our system (1.6) is the same as finding the intersection of the two
lines L1 and L2 . From elementary geometry we know that there are exactly three possibilities for
their intersection:
M M M
40 40 40
30 30 2M + 4C = 80 30
20 M + C = 25 20 20
10 10 10 (15, 10)
C C C
−10 10 20 30 −10 10 20 30 −10 10 20 30
−10 −10 −10
Figure 1.1: Graphs of the lines L1 , L2 which represent the equations from the system (1.6) (see also
Example 1.1). Their intersection represents the unique solution of the system.
FT
(i) L1 and L2 are not parallel. Then they intersect in exactly one point.
(ii) L1 and L2 are parallel and not equal. Then they do not intersect.
(iii) L1 and L2 are parallel and equal. Then L1 = L2 and they intersect in infinitely many points
(they intersect in every point of L1 = L2 ).
In our example we know that the slope of L1 is −1 and that the slope of L2 is − 21 , so they are not
RA
parallel and therefore intersect in exactly one point. Consequently, the system (1.6) has exactly
one solution.
If we look again at Example 1.6, we see that in Case (a) we have to determine the intersection of
the lines
35
L1 : y = 5 − x, L2 : y = − x.
6
Both lines have slope −1 so they are parallel. Since the constant terms in both lines are not equal,
they intersect nowhere, showing that the system of equations has no solution, see Figure 1.2.
D
G1 : y = 5 − x, G2 : y = 5 − x.
We see that G1 = G2 , so every point on G1 (or G2 ) is solution of the system and therefore we have
infinite solutions, see Figure 1.2.
Important observation. If a linear 2 × 2 system has a unique solution or not, has nothing to
do with the right hand side of the system because this only depends on whether the two lines are
parallel or not, and this in turn depends only on the coefficients on the left hand side.
r
6
5
4
3 L1 : y = 5 − x
2 L2 : y = 35/6 − x
1
g
−1 1 2 3 4 5 6
−1
FT
One linear equation with two unknowns
The general form of one linear equation with two unknowns is
αx + βy = γ. (1.7)
In the first two cases, (1.7) has infinitely many solutions, in the last case it has no solution.
1 Ax + By = U
(1.8)
2 Cx + Dy = V.
We are using the letters A, B, C, D instead of a11 , a12 , a21 , a22 in order to make the calculations
more readable. If we interprete the system of equations as intersection of two geometrical objects,
in our case lines, we already know the there are exactly three possible types of solutions:
(i) A point if 1 and 2 describe two non-parallel lines.
(ii) A line if 1 and 2 describe the same line; or if one of the equations is a plane and the other
one is a line.
(iii) A plane if both equations describe a plane.
(iv) The empty set if the two equations describe parallel but different lines; or if one of the
equations has no solution.
In case (i), the system has exactly one solution, in cases (ii) and (iii) the system has infinitely many
solutions and in case (iv) the system has no solution.
In summary, we have the following very important observation.
Remark 1.10. The system (1.8) has either exactly one solution or infinitely many solutions or
no solution.
It is not possible to have for instance exactly 7 solutions.
Question 1.3
What is the geometric interpretation of
FT
(i) a system of 3 linear equations for 2 unknowns?
(ii) a system of 2 linear equations for 3 unknowns?
Algebraic proof of Remark 1.10. Now we want to prove the Remark 1.10 algebraically and we want
RA
to find a criterion on A, B, C, D which allows us to decide easily how many solutions there are. Let
us look at the different cases.
1
Case 1. B 6= 0. In this case we can solve 1 for y and obtain y = B (U − Ax). Inserting 2 we find
D
Cx + − Ax) = V . If we put all terms with x on one side and all other terms on the other side,
B (U
we obtain
2’ (AD − BC)x = DU − BV.
DU −BV
(i) If AD − BC 6= 0 then there is at most one solution, namely x = AD−BC and consequently
AV −CU
y = B1 (U − Ax) = AD−BC . Inserting these expressions for x and y in our system of equations,
D
we see that they indeed solve the system (1.8), so that we have exactly one solution.
DU −BV
(i) If AD − BC 6= 0 then there is exactly one solution, namely x = AD−BC and consequently
1 AV −CU
y= B (U − Ax) = AD−BC .
Case 3. B = 0 and D = 0. Observe that in this case AD − BC = 0 . In this case the system (1.8)
reduces to
Ax = U, Cx = V. (1.9)
We see that the system no longer depends on y. So, if the system (1.9) has at least one solution,
then we automatically have infinitely many solutions since we may choose y freely. If the system
(1.9) has no solution, then the original system (1.8) cannot have a solution either.
Note that there are no other possible cases for the coefficients.
FT
In summary, we proved the following theorem.
1 Ax + By = U
(1.10)
2 Cx + Dy = V.
(i) The system (1.10) has exactly one solution if and only if AD − BC 6= 0 . In this case, the
RA
solution is
DU − BV AV − CU
x= , y= . (1.11)
AD − BC AD − BC
(ii) The system (1.10) has no solution or infinitely many solutions if and only if AD − BC = 0 .
Definition 1.12. The number d = AD − BC is called the determinant of the system (1.10).
Remark 1.13. Let us see how the determinant connects to our geometric interpretation of the
system of equations. Assume that B 6= 0 and D 6= 0. Then we can solve 1 and 2 for y to obtain
equations for a pair of lines
A 1 C 1
L1 : y= − x + U, L2 : y= − x + V.
B B D D
A C
The two lines intersect in exactly one point if and only if they have different slopes, i.e., if − B 6= − D .
After multiplication by −BD we see that this is the same as AD 6= BC, or in other words,
AD − BC 6= 0.
On the other hand, the lines are parallel (hence they are either equal or they have no intersection)
A C
if − B 6= − D . This is the case if and only if AD = BC, or in other words, if AD − BC = 0.
y
7
(5, 3)
3 L1 : x + 2y = 11
L2 : 3x + 4y = 27
1
x
−1 1 3 5 7 9 11
−1
Figure 1.3: Example 1.14(a). Graphs of L1 , L2 and their intersection (5, 3).
Question 1.4
FT
Consider the cases when B = 0 or D = 0 and make the connection between Theorem 1.11 and
the geometric interpretation of the system of equations.
So the solution is x = 5, y = 3. (If we did not have Theorem 1.11, we would have to check
that this is not only a candidate for a solution, but indeed is one.)
Check that the formula (1.11) is satisfied.
(b) 1 x + 2y = 1
2 2x + 4y = 5.
Here, the determinant is d = 4 − 4 = 0, so we expect either no solution or infinitely many
solutions. The first equations gives x = 1 − 2y. Inserting into the second equations gives
2(1 − 2y) + 4y = 5. We see that the terms with y cancel and we obtain 2 = 5 which is a
contradiction. Therefore, the system of equations has no solution.
L1 : x + 2y = 1 L1 : x + 2y = 1
y y
L2 : 3x + 4y = 5 L2 : 3x + 6y = 3
2 2
1 1
L1 = L2
x x
−1 1 2 3 −1 1 2 3
−1 −1
Figure 1.4: Picture on the left: The lines L1 , L2 from Example 1.14(b) are parallel and do not
intersect. Therefore the linear system has no solution.
Picture on the right: The lines L1 , L2 from Example 1.14(c) are equal. Therefore the linear system
FT
has infinitely many solutions.
(c) 1 x + 2y = 1
2 3x + 6y = 3.
The determinant is d = 6 − 6 = 0, so again we expect either no solution or infinitely many
solutions. The first equations gives x = 1 − 2y. Inserting into the second equations gives
RA
3(1 − 2y) + 6y = 3. We see that the terms with y cancel and we obtain 3 = 3 which is true.
Therefore, the system of equations has infinitely many solutions given by x = 1 − 2y.
Remark. This was somewhat clear since we can obtain the second equation from the first one
by multiplying both sides by 3 which shows that both equations carry the same information
and we loose nothing if we simply forget about one of them.
1 kx + (15/2 − k)y = 1
2 4x + 2ky = 3
Solution. We only need to calculate the determinant and find all k such that it is different from
zero. So let us start by calculating
Hence there are exactly two values for k where d = 0, namely k = −1 ± 4, that is k1 = 3, k2 = −5.
For all other k, we have that d 6= 0.
So the answer is: The system has exactly one solution if and only if k ∈ R \ {−5, 3}.
Remark 1.16. (a) Note that the answer does not depend on the right hand side of the system
of the equation. Only the coefficients on the left hand side determine if there is exactly one
solution or not.
(b) If we wanted to, we could also calculate the solution x, y in the case k ∈ R \ {−5, 3}. We
could do it by hand or use (1.11). Either way, we find
1 5k − 45/2 1 6k − 4
x= [2k − 3(15/2 − k)] = 2 , y= [6k − 4] = 2 .
d 2k + 4k − 30 d 2k + 4k − 30
Note that the denominators are equal to d and they are equal to 0 exactly for the “forbidden”
values of k = −5 or k = 3.
(c) What happens if k = −5 or k = 3? In both cases, d = 0, so we will either have no solution or
infinitely many solutions.
If k = −5, then the system becomes −5x + 25/2y = 1, 4x − 10y = 3.
Multiplying the first equation by −4/5 and not changing the second equation, we obtain
FT
4
4x − 10y = − , 4x − 10y = 3
5
which clearly cannot be satisfied simultaneously.
If k = 3, then the system becomes 3x − 9/2y = 1, 4x + 6y = 3.
Multiplying the first equation by 4/3 and not changing the second equation, we obtain
4
4x − 6y = , 4x − 6y = 3
RA
3
which clearly cannot be satisfied simultaneously.
In conclusion, if k = −5 or k = 3, then the linear system has no solution.
• determine if a linear 2 × 2 system has a unique, no or infinitely many solutions and calculate
them,
• give criteria for existence/uniqueness of solutions,
• etc.
Ejercicios.
1. Usando el criterio del determinante, diga cuáles sistemas tienen solucin nica y encuntrela.
En caso de que el determinante sea cero, especifique si el sistema posee infinitas soluciones o
ninguna solucin.
FT
2. ¿Para cules valores de k se cortan las siguientes rectas exactamente en un punto?
k+3 √ 2k − 5 √
3
y= x + π − 2, y= x + 3.
2 3
3. ¿Para cules valores de k se cortan las siguientes rectas exactamente en un punto?
√
(k + 2)x − 3y = 2π, 5kx + (k − 1)y = 3 − e2 .
RA
1.3 Summary
A linear system is a system of equations
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. ..
. . .
am1 x1 + am2 x2 + · · · + amn xn = bm
D
where x1 , . . . , xn are the unknowns and the numbers aij and bi (i = 1, . . . , m, j = 1, . . . , n) are
given. The numbers aij are called the coefficients of the linear system and the numbers b1 , . . . , bn
are called the right side of the linear system.
In the special case when all bi are equal to 0, the system is called a homogeneous; otherwise it is
called inhomogeneous.
The coefficient matrix A and the augmented coefficient matrix (A|b) of the system is are
a11 a12 . . . a1n
a11 a12 . . . a1n b1
a21 a22 . . . a2n a21 a22 . . . a2n b2
A= . , (A|b) = . . .. .
.
. .
.. ..
. . .
am1 am2 ... amn am1 am2 ... amn bn
1.4 Exercises
FT
1. Encuentre el área del triángulo que se encuentra en el primer cuadrante y que está delimitado
por las rectas y = 2x − 4, y = −4x + 20.
2. Suponga que los puntos (1, 5), (−1, 3) y (0, 1) están sobre la parábola y = ax2 + bx + c. Con
esta información, determine los valores de a, b, c.
3. Describa todas las parábolas que pasan por los puntos (1, 1) y (−1, 4).
4. Encuentre todos los valores de t, k ∈ R tal que el siguiente sistema sea consistente.
2x + 8y = 4,
RA
5x + 4ky = 20,
tx + 2y = 1.
5. De un número de tres cifras sabemos que sus tres dı́gitos suman 11, y la suma del primer y
tercer dı́gito es 5. Encuentre todos los números que cumplen la propiedad anterior.
6. El dueño de una tienda vende comida para perros a 40$ y comida para gatos a 20$. Haciendo
cuentas de la semana observa que por concepto de comida de animales recibió 640$ y que 22
clientes entraron esa semana a comprar comida de animales. Si se supone que cada cliente
D
tiene una única mascota ¿Cuántos clientes eran dueños de perros y cuántos de gatos?
7. La suma de la cifra de las decenas y la cifra de las unidades de un número de dos dgitos es
12, y si al número se le resta 18, las cifras se invierten. Hallar el número.
8. Sabemos que la distancia entre Bogotá y Puerto Concordia es de 375 km aproximadamente
y la distancia entre Villavicencio y Puerto Concordia es de aproximadamente 261 km. Un
conductor A parte de Bogotá hacia Villavicencio con una velocidad constante de 57 kmh a las
4:00 am y una hora despéus, un conductor B parte de Puerto Concordia hacia Bogotá a
una velocidad constante de 49 kmh . ¿A qué hora llega el conductor A a Villavicencio? ¿Los
conductores A y B se encuentran en carretera? ¿A qué hora lo hacen?. Repita las preguntas
si suponemos que el conductor A se mueve a una velocidad de 19 km h y el conductor B se
km
mueve a una velocidad de 70 h .
FT
RA
D
Chapter 2
R2 and R3
In this chapter we will introduce the vector spaces R2 , R3 and Rn . We will define algebraic
operations in them and interpret them geometrically. Then we will add some additional structure
FT
to these spaces, namely an inner product. This allows us to assign a norm (length) to a vector and
talk about the angle between two vectors; in particular, it gives us the concept of orthogonality. In
Section 2.3 we will define orthogonal projections in R2 and we will give a formula for the orthogonal
projection of a vector onto another. This formula is easily generalised to projections onto a vector
in Rn with n ≥ 3. Section 2.5 is dedicated to the special and very important case R3 since it is the
space that physicists use in classical mechanics to describe our world. In the last two sections we
study lines and planes in Rn and in R3 . We will see how we can describe them in formulas and we
will learn how to calculate their intersections. This naturally leads to the question on how to solve
RA
linear systems efficiently which will be addressed in the next chapter.
2.1 Vectors in R2
Recall that the xy-plane is the set of all pairs (x, y) with x, y ∈ R. We will denote it by R2 .
Maybe you already encountered vectors in a physics lecture. For instance velocities and forces are
described by vectors. The velocity of a particle says how fast it is and in which direction the particle
moves. Usually, the velocity is represented by an arrow which points in the direction in which the
D
particle moves and whose length is proportional to the magnitude of the velocity.
Similarly, a force has strength and a direction so it is represented by an arrow which points in the
direction in which it acts and with length proportional to its strength.
Observe that it is not important where in the space R2 or R3 we put the arrow. As long it points
in the same direction and has the same length, it is considered the same vector. We call two arrows
equivalent if they have the same direction and the same length. A vector is the set of all arrows
which are equivalent to a given arrow. Each specific arrow in this set is called a representation of
the vector. A special representation is the arrow that starts in the origin (0, 0). Vectors are usually
denoted by a small letter with an arrow on top, for example ~v .
# –
Given two points P, Q in the xy-plane, we write P Q for
the vector which is represented by the arrow that starts y
in P and ends in Q. For example, let P (2, 1) and Q(4, 4)
be pointsinthe xy-plane. Then the arrow from P to Q
# – 2 Q
is P Q = .
3
P#Q–
We can identify a point P (p1 , p2 ) in the xy-plane with
the vector starting in the poiint (0, 0) and ending in P
# – p1 x
P . We denote this vector by OP or or some-
p2
times by (p1 , p2 )t in order to save space (the subscript
t
stands for “transposed”). p1 is called its x-coordinate
or x-component and p2 is called its y-coordinate or y-
# –
component. Figure 2.1: The vector P Q and several of
its representations. The green arrow is the
FT
a special representation whose initial point is
On the other hand, every vector describes a unique in the origin.
b
point in the xy-plane, namely the tip of the arrow which
represents the given vector and starts in the origin.
Clearly its coordinates are (a, b). Therefore we can iden-
tify the set of all vectors in R2 with R2 itself.
RA
a b
Observe that the slope of the arrow ~v = is a if a 6= 0. If a = 0, then the vector is parallel to
b
the y-axis.
2
For example, the vector ~v = can be represented as an arrow whose initial point is in the origin
5
and its tip is at the point (2, 5). If we put its initial point anywhere else, then we find the tip by
moving 2 units to the right (parallel to the x-axis) and 5 units up (parallel to the y-axis).
D
0
A very special vector is the zero vector . Is is usually denoted by ~0.
0
FT
How should we sum two vectors? Again, let us think of forces. Assume we have two forces F~1
and F~2 both acting on the same particle. Then we get the resulting force if we draw the arrow
representing F~1 and attach to its end point the initial point of the arrow representing F~2 . The total
force is then represented by the arrow starting in the initial point of F~1 and ending in the tip of F~2 .
Convince yourself that we obtain the same result if we start with F~2 and put the initial point of
F~1 at the tip of F~2 .
RA
We could also think of the sum of velocities. For example, if a train moves with velocity ~vt and a
passengar on the train is moving with relative velocity ~vp , then her total velocity with respect to
the ground is the vectorsum
of the twovelocities.
a p
Now assume that ~v = and w ~ = . Algebraically,
b q
we obtain the components of their sum by summing the y
a+p ~v + w
~
components: ~v + w~= , see Figure 2.3.
b+q w~
D
Our discussion of how the product of a vector and a scalar and how the sum of two vectors should
be, leads us to the following formal definition.
a p
Definition 2.1. Let ~v = ,w
~= ∈ R2 , c ∈ R. Then:
b q
a p a+p
Vector sum: ~v + w~= + = ,
b q b+q
a ca
Product with a scalar: c~v = c = .
b cb
It is easy to see that the vector sum satisfies what one expects from a sum: (~u +~v ) + w
~ = ~u + (~v + w)
~
(associativity) and ~v + w ~ = w ~ + ~v (commutativity). Moreover, we have the distributivity laws
(a + b)~v= a~v + b~v
anda(~v + w)
~ = a~v + aw.
~ Let us verify for example associativity. To this end,
u1 v1 w1
let ~u = , ~v = ,w ~= . Then
u2 v2 w2
u1 v1 w1 u1 + v1 w1 (u1 + v1 ) + w1
(~u + ~v ) + w
~= + + = + =
u2 v2 w2 u2 + v2 w2 (u2 + v2 ) + w2
FT
u1 + (v1 + w1 ) u1 (v1 + w1 ) u1 v1 w1
= = + = + +
u2 + (v2 + w2 ) u2 (v2 + w2 ) u2 v2 w2
= ~u + (~v + w).
~
In the same fashion, verify commutativity and distributivity of the vector sum.
~v w~ w~ ~v
RA
w
~
~v +
~v
w
~
~+
~v
w
w~
~v
Figure 2.4: The picture illustrates the commutativity of the vector sum.
~z ~z
~z)
~z
D
+
w
~
+
)
w
w~
(w~
~+
~z ~
w
+
~v +
+
(~v
w
~ w
~
~v
~z
~v
~v ~v
We can take these properties and define an abstract vector space. We shall call a set of things, called
vectors, with a “well-behaved” sum of its elements and a “well-behaved” product of its elements
with scalars a vector space. The precise definition is the following.
Note that we will usually write λv instead of λ · v. Then V is called an R-vector space and its
elements are called vectors if the following holds:
(c) Identity element of addition: There exists an element O ∈ V , called the additive identity
such that for every v ∈ V , we have O + v = v + O = v.
(d) Inverse element: For all v ∈ V , we have an inverse element v 0 such that v + v 0 = O.
FT
(f) Compatibility: For every v ∈ V and λ, µ ∈ R, we have that (λµ)v = λ(µv).
These axioms are fundamental for linear algebra and we will come back to them in Chapter 5.1.
RA
Check that R2 is a vector space, that its additive identity is O = ~0 and that for every vector
~v ∈ R2 , its additive inverse is −~v .
It is important to note that there are vector spaces that do not look like R2 and that we cannot
always write vectors as columns. For instance, the set of all polynomials form a vector space (the
sum and scalar multiple of polynomials is again polynomial, the sum is additive and commutative;
the additive identity is the zero polynomial and for every polynomial p, its additive inverse is the
D
polynomial −p; we can multiply polynomials with scalars and obtain another polynomial, etc.). The
vectors in this case are polynomials and it does not make sense to speak about its “components” or
“coordinates”. (We will however learn how to represent certain subspaces of the space of polynomials
as subspaces of some Rn in Chapter 6.3.)
After this brief excursion about abstract vector spaces, let us return to R2 . We know that it can
be identified with the xy-plane. This means that R2 has more structure than only being a vector
space. For example, we can measure angles and lengths. Observe that these concepts do not appear
in the definition of a vector space. They are something in addition to the vector space properties.
Let us now look at some more geometric properties of vectors in R2 . Clearly a vector is known if
we know its length anditsangle with the x-axis. From the Pythagoras theorem it is clear that the
a √
length of a vector ~v = is a2 + b2 .
b
y
y y
~v
ϕ
~v
ϕ
x
x x ϕ
~v
y
~v
ϕ0
ϕ
x
FT
−~v
Figure 2.7: The angle of ~v and −~v with the x-axis. Clearly, ϕ0 = ϕ + π.
2
Definition 2.2 (Norm of a vector in R ). The length of ~v =
a
∈ R2 is denoted by k~v k. It
RA
b
is given by
p
k~v k = a2 + b2 .
As already mentioned earlier, the slope of vector ~v is ab if a 6= 0. If ϕ is the angle of the vector ~v
with the x-axis then tan ϕ = ab if a 6= 0. If a = 0, then ϕ = − π2 or ϕ = π2 . Recall that the range
a
of arctan is (−π/2, π/2), so we cannot simply take arctan
of the fraction b inorder to obtain ϕ.
D
b −b a −a a
Observe that arctan a = arctan −a , but the vectors and =− point in opposite
b −b b
directions, so they do not have the same angle with the x-axis. In fact, their angles differ by π, see
Figure 2.7. From elementary geometry, we find
arctan ab
if a > 0,
b π − arctan b
if a < 0,
tan ϕ = if a 6= 0 and ϕ= a
a π/2
if a = 0, b > 0,
−π/2 if a = 0, b < 0.
Note that this formula gives angles with values in [−π/2, 3π/2).
Remark 2.3. In order to obtain angles with values in (−π, π], we can use the formula
a
arccos √a2 +b2
if b > 0,
ϕ= − arccos √a2a+b2 if b < 0,
π if a < 0, b = 0.
(iii) k~v + wk
~ ≤ k~v k + kwk
~ (triangle inequality),
a c
FT
Proof. Let ~v = ,w
~= ∈ R2 and λ ∈ R.
b d
√
(i) Since k~v k = a2 + b2 it follows that k~v k = 0 if and only if a = 0 and b = 0. This is the case
if and only if ~v = ~0.
√
a λa p p
(ii) kλ~v k = λ = = (λa)2 + (λb)2 = λ2 (a2 + b2 ) = |λ| a2 + b2 = |λ|k~v k.
b λb
RA
(iii) We postpone the proof of the triangle inequality to Corollary 2.20 when we will have the
cosine theorem at our disposal.
rectly from the origin of the blue vector to its tip than taking
~ In other words, k~v +wk
a detour along ~v and w. ~ ≤ k~v k+kwk.
~ ~v
D
Note that every vector ~v 6= ~0 defines a unit vector pointing in the same direction as itself by k~v k−1~v .
Remark 2.6. (i) The tip of every unit vector lies on the unit circle, and, conversely, every vector
whose initial point is the origin and whose tip lies on the unit circle is a unit vector.
cos ϕ
(ii) Every unit vector is of the form where ϕ is its angle with the positive x-axis.
sin ϕ
~v
ϕ
x
1
FT
1 0
~e1 = , ~e2 = .
0 1
Clearly, ~e1 is parallel to the x-axis, ~e2 is parallel to the y-axis and k~e1 k = k~e2 k = 1.
a
Remark 2.7. Every vector ~v = can be written as
b
a a 0
~v = = + = a~e1 + b~e2 .
RA
b 0 b
Remark 2.8. Another notation for ~e1 and ~e2 is ı̂ and ̂.
Ejercicios.
3
1. Sean P (2, 3), Q(−1, 4) puntos en R2 y sea ~v = un vector en R2 .
−2
−−→
(a) Calcule P Q.
−−→
(b) Calcule kP Qk.
−−→
(c) Calcule P Q + ~v .
(d) Encuentre el ángulo que forma ~v con el eje x.
−−→
(e) Encuentre el ángulo que forma P Q con el eje x.
2. (a) Determine con la suma vectorial si los puntos (1, 1), (4, 2), (2, 4) y (−1, 3) forman un
paralelogramo.
(b) Repita el ejercicio anterior con los puntos (1, −3), (2, 0), (3, −2) y (0, 4).
FT
(c) Repita el ejercicio anterior con los puntos (1, 1), (2, 3), (3, 2) y (4, 4).
seems to suggest that the dot product behaves like a usual product, whereas in reality it does not,
see Remark 2.12.
Before we give properties of the inner product and explore what it is good for, we first calculate a
few examples to familiarise ourselves with it.
Examples 2.10.
2 −1
(i) , = 2 · (−1) + 3 · 5 = −2 + 15 = 13.
3 5
2
2 2 2
(ii) , = 22 + 32 = 4 + 9 = 13. Observe that this is equal to .
3 3 3
2 1 2 0
(iii) , = 2, , = 3.
3 0 3 1
2 −3
(iv) , = 0.
3 2
(iii) h~u , ~v + wi
~ = h~u , ~v i + h~u , wi.
~ In dot notation: ~u · (~v + w)
~ = ~u · ~v + ~u · w.
~
FT
u1 v1 w1
Proof. Let ~u = , ~v = and w
~= .
u2 v2 w2
Remark 2.12. Observe that the proposition shows that the inner product is commutative and
distributive, so it has some properties of the “usual product” that we are used to from the product
in R or C, but there are some properties that show that the inner product is not a product.
(a) The inner products takes two vectors and gives back a number, so it gives back an object that
is not of the same type as the two things we put in.
(b) In Example 2.10(iv) we saw that it may happen that ~v 6= ~0 and w
~ 6= ~0 but still h~v , wi
~ =0
which is impossible for a “decent” product.
(c) Given a vector ~v 6= 0 and a number c ∈ R, there are many solutions of the equation h~v , ~xi = c
for the vector ~x, in stark contrast to the usual product in R or C. Look for instance at
Example 2.10(i) and (ii). Therefore it makes no sense to write something like ~v −1 .
(d) There is no such thing as a neutral element for scalar multiplication.
Now let us see why the inner product is useful. In fact, it is related to the angle between two vectors
and it will help us to define orthogonal projections of one vector onto another. Let us start with a
definition.
~v ~v
w
~
ϕ
ϕ ϕ
w
~ ϕ ~v
w
~
w
~
~v
FT
Figure 2.10: Angle between two vectors.
(b) Two non-zero vectors ~v and w~ are called orthogonal (or perpendicular ) if ^(~v , w)
~ = π/2. In
this case we use the notation ~v ⊥ w.
~
The following properties should be intuitively clear from geometry. A formal proof of (ii) and (iii)
can be given easily after Corollary 2.20. The proof of (i) will be given after Remark 2.24.
~ be vectors in R2 . Then:
Proposition 2.16. Let ~v , w
Remark 2.17. (i) Observe that (i) is wrong if we do not assume that w ~ 6= ~0 because if w
~ = ~0,
2
then it is parallel to every vector ~v in R , but there is no λ ∈ R such that λw ~ could ever
become different from ~0.
(ii) Observe that the reverse direction in (ii) and (iii) is true only if λ 6= 0 and µ 6= 0.
Proof.
h~v , wi
~ = k~v kkwk
~ 2 = k~v k2 + kwk
k~v − wk ~ 2 − 2k~v kkwk
~ cos ϕ. (2.2) ϕ
~v
~ 2 = h~v − w
D
k~v − wk ~ , ~v − wi
~ = h~v , ~v i − h~v , wi
~ − hw
~ , ~v i + hw ~ = h~v , ~v i − 2h~v , wi
~ , wi ~ + hw
~ , wi
~
= k~v k2 − 2h~v , wi ~ 2.
~ + kwk (2.3)
k~v k2 + kwk
~ 2 − 2k~v kkwk
~ cos ϕ = k~v k2 − 2h~v , wi ~ 2,
~ + kwk
(i) ~v k w
~ ⇐⇒ k~v k kwk
~ = |h~v , wi|.
~
(ii) ~v ⊥ w
~ ⇐⇒ h~v , wi
~ = 0,
(iii) Cauchy-Schwarz inequality: |h~v , wi|
~ ≤ k~v k kwk.
~
(iv) Triangle inequality:
k~v + wk
~ ≤ k~v k + kwk.
~ (2.4)
Proof. The claims are clear if one of the vectors is equal to ~0 since the zero vector is parallel and
orthogonal to every vector in R2 . So let us assume now that ~v 6= ~0 and w~ 6= ~0.
(i) From Theorem 2.19 we have that |h~v , wi| ~ = k~v k kwk
~ if and only if | cos ϕ| = 1. This is the
case if and only if ϕ = 0 or π, that is, if and only if ~v and w
~ are parallel.
(ii) From Theorem 2.19 we have that |h~v , wi| ~ = 0 if and only if cos ϕ = 0. This is the case if and
only if ϕ = π/2, that is, if and only if ~v and w
~ are perpendicular.
FT
ϕ ∈ [0, π].
w
~
~v +
w
~ ϕ
~ 2 = k~v k2 + kwk
k~v + wk ~ 2 + 2k~v k wk
~ cos ϕ
RA
≤ k~v k2 + kwk
~ 2 + 2k~v k wk
~ ~v
~ 2.
= (k~v k + kwk)
Taking the square root on both sides gives us the desired inequality.
Question 2.1
When does equality hold in the triangle inequality (2.4)? Draw a picture and prove your claim
using the calculations in the proof of (iv).
D
Exercise. Prove (ii) and (iii) of Proposition 2.16 using Corollary 2.20.
Exercise. (i) Prove Corollary 2.20 (iii) without the cosine theorem.
2
Hint. Start with the inequality 0 ≤ kwk~
~ v − k~v kw
~ and expand the right hand side similar
~ 2 k~v k2 − 2(h~v , wi)
as in the proof of Proposition 8.6. You will find that 0 ≤ 2kwk ~ 2.
(ii) Prove Corollary 2.20 (iv) without the cosine theorem.
Hint. Cf. the proof of the triangle inequality in Cn (Proposition 8.6).
We give a proof of (iii) and (iii) in Proposition 8.6 without the use of the cosine theorem which
works also in the complex case.
Example 2.21. Theorem 2.19 allows us to calculate the angle of a given vector with the x-axis
easily (see Figure 2.13):
h~v ,~e1 i h~v ,~e2 i
cos ϕx = , cos ϕy = .
k~v kk~e1 k k~v kk~e2 k
If we now use that k~e1 k = k~e2 k = 1 and that h~v ,~e1 i = v1 and h~v ,~e2 i = v2 , then we can simplify
the expressions to
v1 v2
cos ϕx = , cos ϕy = .
k~v k k~v k
y
~v ϕy
ϕx
x
• use the inner product to determine if two vectors are parallel, perpendicular or neither,
• etc.
Ejercicios.
2
1. Sea ~v = ∈ R2 .
5
(d) Encuentre todos los vectores con norma 2 que son ortogonales a ~v .
2. Para los siguientes vectores ~u y ~v decida si son ortogonales, paralelos o ninguno de los dos.
Calcule el coseno del ángulo entre ellos. Si son paralelos, encuentre números reales λ y µ tales
que ~v = λ~u y ~u = µ~v .
1 5 2 1
(a) ~v = , ~u = , (b) ~v = , ~u = ,
4 −2 4 2
3 −8 −6 3
(c) ~v = , ~u = , (d) ~v = , ~u = .
4 6 4 −2
FT
(iv) ~v = , w
~= , (ii) ~v = , w
~= ,
5 2 α 2α
4;
(v) el ángulo entre ~a y ~b es 5π
6 .
The vector from the initial point to the intersection of the two lines should then be the orthogonal
projection of ~v onto w.
~ see Figure 2.14
~v
~v
~v
w
~
w
~ w
~
~v
~v
·
~v
· w
~ ~vk
~vk w
~ w
~
·
~vk
~ in R2 .
FT
Figure 2.14: Some examples for the orthogonal projection of ~v onto w
This procedure decomposes the vector ~v in a part parallel to w ~ and a part perpendicular to w
~ so
that their sum gives us back ~v . The parallel part is the orthogonal projection of ~v onto w.
~
In the following theorem we give the precise meaning of the orthogonal projection, we show that
a decomposition as described above always exists and we even derive a formula for orthogonal
projection. A more general version of this theorem is Theorem 7.30.
RA
Theorem 2.22 (Orthogonal projection). Let ~v and w ~ 6= ~0. Then there
~ be vectors in R2 and w
exist uniquely determined vectors ~vk and ~v⊥ (see Figure 2.15) such that
~vk k w,
~ ~v⊥ ⊥ w
~ and ~v = ~vk + ~v⊥ . (2.5)
h~v , wi
~
~vk = w.
~ (2.6)
~ 2
kwk
D
~v ~v⊥
~v ~v⊥
·
~v
· w
~
~vk = projw~ ~v
w
~ ~v⊥ w
~
~vk = projw~ ~v
·
~vk = projw~ ~v
Proof. Assume we have vectors ~vk and ~v⊥ satisfying (2.5). Since ~vk and w
~ are parallel by definition
and since w~ 6= ~0, there exists λ ∈ R such that ~vk = λw,
~ so in order to find ~vk it is sufficient to
determine λ. For this, we notice that ~v = λw ~ + ~v⊥ by (2.5). Taking the inner product on both
sides with w
~ leads to
h~v , wi
~ = hλw ~ = hλw
~ + ~v⊥ , wi ~ , wi ~ = hλw
~ + h~v⊥ , wi ~ , wi
~ = λhw
~ , wi ~ 2
~ = λkwk
| {z }
v⊥ ⊥ w
= 0 since ~ ~
h~v , wi
~
=⇒ λ= .
~ 2
kwk
h~v , wi
~ h~v , wi
~
~vk = λw
~= w
~ and ~v⊥ = ~v − ~vk = ~v − w.
~
~ 2
kwk ~ 2
kwk
This already proves uniqueness of the vectors ~vk and ~v⊥ . It remains to show that they indeed have
FT
the desired properties. Clearly, by construction ~vk is parallel to w
~ and ~v = ~vk + ~v⊥ since we defined
~v⊥ = ~v − ~vk . It remains to verify that ~v⊥ is orthogonal to w.
~ This follows from
h~v , wi
~ h~v , wi
~ h~v , wi
~
h~v⊥ , wi
~ = ~v − w
~ , w
~ = h~
v , wi
~ − w
~ , w
~ = h~v , wi
~ − hw
~ , wi
~ =0
~ 2
kwk ~ 2
kwk ~ 2
kwk
Notation 2.23. Instead of ~vk we often write projw~ ~v , in particular when we want to emphasise
RA
onto which vector we are projecting.
Proof. (i): By our geometric intuition, this should be clear. Let us give a formal proof. Suppose
~ for some c ∈ R \ {0}. Then
we want to project ~v onto cw
h~v , cwi
~ ch~v , wi
~ h~v , wi
~
projcw~ ~v = (cw)
~ = 2 (cw)
~ = w
~ = projw~ ~v .
kcwk~ 2 c kwk~ 2 ~ 2
kwk
(ii): Again, by geometric considerations, this should be clear. The corresponding calculation is
hc~v , wi
~ ch~v , wi
~
projw~ (c~v ) = 2
w
~= w
~ = c projw~ ~v .
kwk~ kwk~ 2
Proof of Proposition 2.16 (i). We have to show that if ~v k w ~ 6= ~0, then there exists λ ∈ R
~ and if w
~ = λ~v . From Remark 2.24 (iv) it follows that ~v = projw~ ~v = h~
such that w v ,wi
~
~ 2 w,
kwk ~ hence the claim
h~
v ,wi
~
follows if we can choose λ = ~ 2 .
kwk
FT
We end this section with some examples.
h~
u ,~e2 i 3
(ii) proj~e2 ~u = k~e2 k2 ~ e2 = 12 ~
e2 = 3~e2 .
RA
(iii) Similarly, we can calculate proj~e1 ~v = 4~e1 , proj~e2 ~v = −~e2 .
2
,
5
−1
h~ vi
u ,~ 3 8−3 5 5 2
(iv) proj~u ~v = k~uk2 ~u = uk2
k~ ~u = 22 +32 ~u = 13 ~u = 13 .
3
4 2
,
−1
h~ ui
v ,~ 3 8−3 5 5 4
(v) proj~v ~u = k~vk2 ~u = ~u = 42 +(−1)2 ~u = 17 ~u = 17 .
D
uk2
k~ −1
a
Example 2.26 (Angle with coordinate axes). Let ~v = ∈ R2 \ {~0}. Then cos ^(~v ,~e1 ) =
b
a b
vk ,
k~ cos ^(~v ,~e2 ) = vk ,
k~ hence
a cos ^(~v ,~e1 ) cos ϕx
~v = = k~v k = k~v k
b cos ^(~v ,~e2 ) cos ϕy
and
projection of ~v onto the x-axis = proj~e1 ~v = k~v k cos ^(~v ,~e1 )~e1 = k~v k cos ϕx ~e1 ,
projection of ~v onto the y-axis = proj~e2 ~v = k~v k cos ^(~v ,~e2 )~e2 = k~v k cos ϕy ~e2 .
Question 2.2
~ be a vector in R2 \ {~0}.
Let w
~ is equal to ~0?
(i) Can you describe geometrically all the vectors ~v whose projection onto w
(ii) Can you describe geometrically all the vectors ~v whose projection onto w
~ have length 2?
(iii) Can you describe geometrically all the vectors ~v whose projection onto w
~ have length 3kwk?
~
FT
You should now be able to
• calculate the projection of a given vector onto another vector,
• calculate vectors with a given projection onto another vector,
• etc.
Ejercicios.
RA
1 5
1. Sean ~a = y ~b = .
3 2
2.4 Vectors in Rn
In this section we extend our calculations from R2 to Rn . If n = 3, then we obtain R3 which
usually serves as model for our everyday physical world and which you probably already are familiar
with from physics lectures. We will discuss R3 and some of its peculiarities in more detail in the
Section 2.5.
First, let us define Rn .
Again we can think of vectors as arrows. As in R2 , we can identify every point in Rn with the arrow
that starts in the origin of coordinate system and ends in the given point. The set of all arrows
with the same lengthandthe same direction is called a vector in Rn . So every point P (p1 , . . . , pn )
p1
defines a vector ~v = ... and vice versa. As before, we sometimes denote vectors as (p1 , . . . , pn )t
pn
in order to save (vertical) space. The superscript t stands for “transposed”.
FT
v1 w1 v1 + w 1 cv1
~ = ... + ... =
Rn × Rn → Rn , ~v + w ..
, R × Rn → Rn , c~v = ... . (2.7)
.
vn wn vn + w n cvn
Exercise. Show that Rn is a vector space. That is, you have to show that the vector space
axioms on page 29 hold.
As in R2 , we can define the norm of a vector, the angle between two vectors and an inner product.
RA
Note that the definition of the angle between two vectors is not different from the one in R2 since
when we are given two vectors, they always lie in a common plane which we can imagine as some
sort of rotated R2 . Let us give now the formal definitions.
v1 w1
.. ..
Definition 2.28 (Inner product; norm of a vector). For vectors ~v = . and w
~ = .
vn wn
the inner product (or scalar product or dot product) is defined as
D
* v1 w1 +
.. ..
~ = . , . = v1 w1 + · · · + vn wn .
h~v , wi
vn wn
v1
The length of ~v = ... ∈ Rn is denoted by k~v k and it is given by
vn
q
k~v k = v12 + · · · + vn2 .
(a) ~v k w
~ ⇐⇒ ~ ∈ {0, π}
^(~v , w) ⇐⇒ |h~v , wi|
~ = k~v k kwk,
~
(b) ~v ⊥ w
~ ⇐⇒ ^(~v , w)
~ = π/2 ⇐⇒ h~v , wi
~ = 0.
Remark 2.29. In abstract inner product spaces, the inner product is actually used to define
FT
orthogonality.
(iv) Relation of the inner product with the norm: For all vectors ~v ∈ Rn , we have k~v k2 = h~v , ~v i.
~ ∈ Rn and scalars c ∈ R, we have that kc~v k = |c|k~v k
(v) Properties of the norm: For all vectors ~v , w
and k~v + wk
~ ≤ k~v k + kwk.
~
~ 6= ~0 the
~ ∈ Rn with w
(vi) Orthogonal projections of one vector onto another: For all vectors ~v , w
orthogonal projection of ~v onto w
~ is
RA
h~v , wi
~
projw~ ~v = w.
~ (2.8)
~ 2
kwk
As in R2 , we have n “special vectors” which are parallel to the coordinate axes and have norm 1:
1 0 0
0 1 ..
~e1 := . , ~e2 := . , . . . , ~en := . .
.. .. 0
D
0 0 1
In the special case n = 3, the vectors ~e1 , ~e2 and ~e3 are sometimes denoted by ı̂,̂, k̂.
For a given vector ~v 6= ~0, we can now easily determine its projections onto the n coordinate axes
and its angle with the coordinate axes. By (2.8), the projection onto the xj -axis is
proj~ej ~v = vj~ej .
h~v ,~ej i vj
ϕj = ^(~v ,~ej ) =⇒ cos ϕx = = .
k~v k k~ej k k~v k
cos ϕ1
It follows that ~v = k~v k ... . Sometimes the notation
cos ϕn
cos ϕ1
~v
v̂ := = k~v k ...
k~v k
cos ϕn
is used for the unit vector pointing in the same direction as ~v . Clearly kv̂k = 1 because kv̂k =
kk~v k−1~v k = k~v k−1 k~v k = 1. Therefore v̂ is indeed a unit vector pointing in the same direction as
the original vector ~v .
FT
• that R2 from chapter 2.1 is a special case of Rn from this section,
• etc.
You should now be able to
• perform algebraic operations in the vector space R3 and, in the case n = 3, visualise them
in space,
• calculate lengths and angles,
RA
• calculate unit vectors, scale vectors,
• perform simple abstract proofs (e.g., prove that Rn is a vector space).
• etc.
Ejercicios.
2 0
1 4
1. Sean ~a = ~
0 y b = 5. Calcular:
D
3 1
~
(a) 4~a + 3b. (c) h~a − ~b + 3~e1 , ~b − 5~e4 + ~e3 i.
(b) k3~a − 2~bk. (d) proj~b ~a.
v1 w1
Definition 2.30 (Cross product). Let ~v = v2 , w ~ = w2 ∈ R3 . Their cross product (or
v3 w3
vector product or wedge product) is
v1 w1 v2 w3 − v3 w2
~ = v2 × w2 := v3 w1 − v1 w3 .
~v × w
v3 w3 v1 w2 − v2 w1
A way to remember this formula is as follows. Write the first and the second component of the
vectors underneath them, so that formally you get a column of 5 components. Then make crosses
as in the sketch below, starting with the cross consisting of a line from v2 to w3 and then from w2
to v3 . Each line represents a product of the corresponding components; if the line goes from top
left to bottom right then it is counted positive, if it goes from top right to bottom left then it is
counted negative.
FT v2 w3 − v3 w2
v1 w1
v2 × w2 = v3 w1 − v1 w3
v3 w3 v1 w2 − v2 w1
v1 w1
v2 w2
RA
The cross product is defined only in R3 !
Before we collect some easy properties of the cross product, let us calculate a few examples.
1 5
Examples 2.31. Let ~u = 2, ~v = 6.
3 7
1 5 2·7−3·6 14 − 18 −4
D
(i) ~u × ~0 = ~0 × ~u = ~0.
(iii) ~u × (~v + w)
~ = (~u × ~v ) + (~u × w).
~
(vi) h~u , ~v × wi
~ = h~u × ~v , wi.
~
FT
~v ⊥ ~v × ~u, ~u ⊥ ~v × ~u
Proof. The proofs of the formulas (i) – (iv) are easy calculations (you should do them!).

(v) The implication “=⇒” is easy to check. The other direction follows easily from Theorem 2.34 below. Or it can be seen as follows: Let us assume that ~w × ~v = ~0. If ~v = ~0, then clearly ~v k ~w. If ~v 6= ~0, then we write ~w = ~a + ~b where ~a = proj_~v ~w and ~b ⊥ ~v. We need to show that ~b = ~0. Using that ~v × ~a = ~0 (because they are parallel), we obtain that

    ~0 = ~v × ~w = ~v × (~a + ~b) = (~v × ~a) + (~v × ~b) = ~v × ~b = ( v2 b3 − v3 b2 , v3 b1 − v1 b3 , v1 b2 − v2 b1 )^t .

In addition we know that h~v , ~bi = 0. So in total we have four linear equations for the three components b1 , b2 , b3 of ~b:

    v2 b3 − v3 b2 = 0,   v3 b1 − v1 b3 = 0,   v1 b2 − v2 b1 = 0,   v1 b1 + v2 b2 + v3 b3 = 0.

Since ~v 6= ~0, at least one of its components is different from 0. Let us assume that v1 6= 0. Then we can solve the second and third equations for b3 and b2 and obtain b2 = (v2/v1) b1 and b3 = (v3/v1) b1. If we substitute this in the last equation, we find that

    0 = v1 b1 + (v2²/v1) b1 + (v3²/v1) b1 = (b1/v1)(v1² + v2² + v3²) = (k~vk²/v1) b1 .

Since ~v 6= ~0, this forces b1 = 0, and then also b2 = b3 = 0. Hence ~b = ~0 and therefore ~v k ~w.
Note that the cross product is distributive but it is neither commutative nor associative.
Exercise. Prove the formulas in (i) – (iv) and the implication “=⇒” in (v).
Remark 2.33. The property (vii) explains why the cross product makes sense only in R3. Given two non-parallel vectors ~v and ~w, their cross product is a vector which is orthogonal to both of them and whose length is k~vk k~wk sin ϕ (see Theorem 2.34; ϕ = ^(~v , ~w)), and this should define the result uniquely up to a factor ±1. This factor has to do with the relative orientation of ~v and ~w to each other. However, if n 6= 3, then one of the following holds:

• If we were in R2, the problem is that “we do not have enough space”: the only vector orthogonal to ~v and ~w at the same time would be the zero vector ~0, and it would not make much sense to define a product whose result is always ~0.

• If we were in some Rn with n ≥ 4, the problem is that “we have too many choices”. We will see later in Chapter 7.3 that the orthogonal complement of the plane generated by ~v and ~w has dimension n − 2 and every vector in the orthogonal complement is orthogonal to both ~v and ~w. For example, if we take ~v = (1, 0, 0, 0)^t and ~w = (0, 1, 0, 0)^t , then every vector of the form ~a = (0, 0, x, y)^t is perpendicular to both ~v and ~w, and it is easy to find infinitely many vectors of this form which in addition have norm k~vk k~wk sin ϕ = 1 (~a = (0, 0, sin ϑ, ± cos ϑ)^t for arbitrary ϑ ∈ R works).
Recall that for the inner product we proved the formula h~v , ~wi = k~vk k~wk cos ϕ where ϕ is the angle between the two vectors, see Theorem 2.19. In the next theorem we will prove a similar relation for the cross product.

Theorem 2.34. Let ~v , ~w ∈ R3 and let ϕ = ^(~v , ~w). Then

    k~v × ~wk = k~vk k~wk sin ϕ.

Proof. A long but straightforward calculation shows that k~v × ~wk² = k~vk² k~wk² − h~v , ~wi². Now it follows from Theorem 2.19 that

    k~v × ~wk² = k~vk² k~wk² − h~v , ~wi² = k~vk² k~wk² − k~vk² k~wk² (cos ϕ)²
              = k~vk² k~wk² (1 − (cos ϕ)²) = k~vk² k~wk² (sin ϕ)².

If we take the square root of both sides, we arrive at the claimed formula. (We do not need to worry about taking the absolute value of sin ϕ because ϕ ∈ [0, π], hence sin ϕ ≥ 0.)

Exercise. Show that k~v × ~wk² = k~vk² k~wk² − h~v , ~wi².
Let ~v and ~w be two vectors in R3. Then they define a parallelogram (if the vectors are parallel or one of them is equal to ~0, it is a degenerate parallelogram).

(Sketch: the parallelogram spanned by ~v and ~w, with height h over the base.)
Proposition 2.35 (Area of a parallelogram). The area of the parallelogram spanned by the vectors ~v and ~w is

    A = k~v × ~wk.                                                            (2.9)

Proof. The area of a parallelogram is the product of the length of its base with the height. We can take ~w as base. Let ϕ be the angle between ~w and ~v. Then we obtain that h = k~vk sin ϕ and therefore, with the help of Theorem 2.34,

    A = k~wk h = k~wk k~vk sin ϕ = k~v × ~wk.
Volume of a parallelepiped

Figure 2.17: Parallelepiped spanned by ~u, ~v , ~w; the height is the length of proj_~n ~u where ~n ⊥ ~v , ~w.
Proposition 2.36 (Volume of a parallelepiped). The volume of the parallelepiped spanned by the vectors ~u, ~v , ~w ∈ R3 is V = |h~u , ~v × ~wi|.

Proof. The volume of a parallelepiped is the product of the area of its base with the height. Let us take the parallelogram spanned by ~v and ~w as base; its area is A = k~v × ~wk, and the height is the length of the projection of ~u onto ~v × ~w, that is, h = |h~u , ~v × ~wi| / k~v × ~wk. Therefore

    V = A h = k~v × ~wk · |h~u , ~v × ~wi| / k~v × ~wk = |h~u , ~v × ~wi|.

Note that in the case when two of the vectors ~u, ~v and ~w are parallel, the formula gives the right answer that the volume of the parallelepiped is 0.
Corollary 2.37. Let ~u, ~v , ~w ∈ R3. Then

    |h~u , ~v × ~wi| = |h~v , ~w × ~ui| = |h~w , ~u × ~v i|.

Proof. The formula holds because each of the expressions describes the volume of the parallelepiped spanned by the three given vectors, since we can take any of the faces of the parallelepiped as its base.
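The two formulas above are easy to evaluate with NumPy. The following sketch (ours, not from the notes) computes the area of a parallelogram and the volume of a parallelepiped and checks Corollary 2.37 numerically.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([5.0, 6.0, 7.0])
u = np.array([1.0, 0.0, 1.0])

area = np.linalg.norm(np.cross(v, w))      # A = ||v x w||, Proposition 2.35
volume = abs(np.dot(u, np.cross(v, w)))    # V = |<u, v x w>|, Proposition 2.36
print(area, volume)

# Corollary 2.37: the same volume is obtained no matter which face is taken as base
print(volume == abs(np.dot(v, np.cross(w, u))) == abs(np.dot(w, np.cross(u, v))))  # True
```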
Exercises.

1. (a) Calculate the area of the parallelogram whose adjacent vertices are A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5), and find the fourth vertex.
   (b) Calculate the area of the triangle with vertices A(1, 2, 3), B(2, 3, 4), C(−1, 2, −5).
   (c) Find a point P such that the area of the triangle with vertices B, C, P equals 13. How many such points P are there? Describe them geometrically.

2. Calculate the volume of the parallelepiped determined by the vectors ~u = (5, 2, 1)^t , ~v = (−1, 4, 3)^t , ~w = (1, −2, 7)^t .

3. Use the cross product to find the sine of the angle formed by the vectors (2, 1, −1)^t and (−3, −2, 4)^t .

4. Find all vectors ~a ∈ R3 such that ~a ⊥ (1, −1, 2)^t and ~a ⊥ (2, 0, −3)^t . How many of them have norm 1? Which such ~a satisfy ~a × (1, −1, 2)^t = (8, −2, −5)^t ?
(affine) (n − 1)-dimensional subspace of an n-dimensional vector space). In this section we appeal
to our knowledge and intuition from elementary geometry.
Lines
Intuitively, it is clear what a line in R3 should be. In order to describe a line in R3 completely, it
is not necessary to know all its points. It is sufficient to know either
(a) two different points P and Q on the line, or
(b) one point P on the line and the direction of the line.
Figure 2.18: Line L given by: two points P, Q on L; or by a point P on L and the direction of L.
Clearly, both descriptions are equivalent because: If we have two different points P, Q on the line L, then its direction is given by the vector PQ→. If on the other hand we are given a point P on L and a vector ~v which is parallel to L, then we easily get another point Q on L by OQ→ = OP→ + ~v.
Given two points P(p1 , p2 , p3) and Q(q1 , q2 , q3) with P 6= Q, there is exactly one line L which passes through both points. In formulas, this line is described as

    L = { OP→ + t PQ→ : t ∈ R } = { ( p1 + (q1 − p1)t , p2 + (q2 − p2)t , p3 + (q3 − p3)t )^t : t ∈ R }.     (2.11)

If we are given a point P(p1 , p2 , p3) on L and a vector ~v = (v1 , v2 , v3)^t 6= ~0 parallel to L, then

    L = { OP→ + t~v : t ∈ R } = { ( p1 + v1 t , p2 + v2 t , p3 + v3 t )^t : t ∈ R }.     (2.12)
The formulas are easy to understand. They say: In order to trace the line, we first move to an arbitrary point on the line (this is the term OP→) and then we move an amount t along the line. With this procedure we can reach every point on the line, and on the other hand, if we do this, then we are guaranteed to end up on the line.
The formulas (2.11) and (2.12) are called the vector equation for the line L. Note that they are the same if we set v1 = q1 − p1 , v2 = q2 − p2 , v3 = q3 − p3. We will mostly use the notation with the v’s since it is shorter. The vector ~v is called a directional vector of the line L.
Question 2.3
Is it true that L passes through the origin if and only if OP→ = ~0?
Remark 2.38. It is important to observe that a given line has many different parametrisations.
• The vector equation that we write down depends on the points we choose on L. Clearly, we
have infinitely many possibilities to do so.
• Any given line L has many directional vectors. Indeed, if ~v is a directional vector for L, then
c~v is so too for every c ∈ R \ {0}. However, all possible directional vectors are parallel.
Exercise. Check that the following formulas all describe the same line:

(i) L1 = { (1, 2, 3)^t + t (6, 5, 4)^t : t ∈ R },
(ii) L2 = { (1, 2, 3)^t + t (12, 10, 8)^t : t ∈ R },
(iii) L3 = { (13, 12, 11)^t + t (6, 5, 4)^t : t ∈ R }.
Question 2.4
• How can you see easily if two given lines are parallel or perpendicular to each other?
• How would you define the angle between two lines? Do they have to intersect so that an
angle between them can be defined?
From the formula (2.12) it is clear that a point (x, y, z) belongs to L if and only if there exists t ∈ R such that

    x = p1 + v1 t,   y = p2 + v2 t,   z = p3 + v3 t.                          (2.13)

If we had started with (2.11), then we would have obtained

    x = p1 + (q1 − p1)t,   y = p2 + (q2 − p2)t,   z = p3 + (q3 − p3)t.        (2.14)

The systems of equations (2.13) and (2.14) are called the parametric equations of L. Here, t is the parameter.
Symmetric equation of a line

Observe that for (x, y, z) ∈ L, the three equations in (2.13) must hold for the same t. If we assume that v1 , v2 , v3 6= 0, then we can solve each of them for t and we obtain that

    (x − p1)/v1 = (y − p2)/v2 = (z − p3)/v3 .                                 (2.15)

Starting from (2.14) instead, we obtain

    (x − p1)/(q1 − p1) = (y − p2)/(q2 − p2) = (z − p3)/(q3 − p3).             (2.16)

If v1 = v2 = 0 and v3 6= 0, then the line is parallel to the z-axis and its symmetric equation is x = p1 , y = p2 , z ∈ R.
Representations of lines in Rn.
In Rn, the vector form of a line is

    L = { OP→ + t~v : t ∈ R }

and, assuming that all vj are different from 0, its symmetric form is

    (x1 − p1)/v1 = (x2 − p2)/v2 = · · · = (xn − pn)/vn .
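A small computational check of these representations (our own sketch; the helper name point_on_line is hypothetical, not from the notes): a point Q lies on the line through P with directional vector ~v exactly when Q − P is parallel to ~v, which in R3 can be tested with the cross product.

```python
import numpy as np

def point_on_line(q, p, v, tol=1e-12):
    """True if the point q lies on the line {p + t*v : t in R} (v != 0), in R^3."""
    q, p, v = map(np.asarray, (q, p, v))
    # q is on the line iff q - p is parallel to v, i.e. their cross product vanishes
    return np.linalg.norm(np.cross(q - p, v)) < tol

p = np.array([1.0, 2.0, 3.0])
v = np.array([6.0, 5.0, 4.0])

print(point_on_line([13.0, 12.0, 11.0], p, v))   # True: the point used in L3 above
print(point_on_line([13.0, 12.0, 10.0], p, v))   # False
```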
In R2, there is also the normal form of a line:

    L : ax + by = d                                                           (2.17)

where a, b and d are fixed numbers. This means that L consists of all the points (x, y) whose coordinates satisfy the equation ax + by = d.

(i) Given a line in the form (2.17), find a vector representation.
(ii) Given a line in vector representation, find a normal form (that is, write it as (2.17)).
(iii) What is the geometric interpretation of a, b? (Hint: Draw the line L and the vector (a, b)^t.)
(iv) Can this normal form be extended/generalised to lines in R3? If it is possible, how can it be done? If it is not possible, explain why not.
Planes
Figure 2.19: Plane E given by: (a) three points P, Q, R on E, (b) a point P on E and two vectors ~v , ~w parallel to E, (c) a point P on E and a vector ~n perpendicular to E.
Figure 2.20: Plane E given with three points P, Q, R on E, two vectors PQ→, PR→ parallel to E, and a vector ~n perpendicular to E. Note that ~n k PQ→ × PR→.
If we are given three points P, Q, R on E which do not lie on a line, then the vectors PQ→ and PR→ are parallel to E and not parallel to each other. (Of course, we also could have taken QR→ and QP→, or RP→ and RQ→.) If, on the other hand, we have one point P on E and two vectors ~v and ~w parallel to E with ~v 6k ~w, then we can easily get two other points on E, for instance by OQ→ = OP→ + ~v and OR→ = OP→ + ~w. Then the three points P, Q, R lie on E and do not lie on a line.
As in the case of the vector equation of a line, it is easy to understand the formula. We first move to an arbitrary point on the plane (this is the term OP→) and then we move parallel to the plane as much as we like (this is the term t~v + s~w).
Now suppose we are given a point P(p1 , p2 , p3) on E and a normal vector ~n = (n1 , n2 , n3)^t. A point Q(x, y, z) lies on E if and only if PQ→ ⊥ ~n, that is, if and only if

    0 = h~n , PQ→i = n1 (x − p1) + n2 (y − p2) + n3 (z − p3)
      = n1 x + n2 y + n3 z − (n1 p1 + n2 p2 + n3 p3).

If we set d = n1 p1 + n2 p2 + n3 p3 , then it follows that a point Q(x, y, z) belongs to E if and only if its coordinates satisfy

    n1 x + n2 y + n3 z = d.                                                   (2.18)
Equation (2.18) is called the normal form for the plane E and ~n is called a normal vector of E.
Notation 2.39. In order to define E, we write E : n1 x + n2 y + n3 z = d. As a set, we denote E as
E = {(x, y, z) : n1 x + n2 y + n3 z = d}.
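To experiment with the normal form, here is a minimal sketch (ours, not from the notes; the helper plane_through is hypothetical) that produces equation (2.18) for the plane through three given points, using the cross product to obtain a normal vector as in Remark 2.42 below.

```python
import numpy as np

def plane_through(P, Q, R):
    """Normal form n1*x + n2*y + n3*z = d of the plane through three points not on a line."""
    P, Q, R = map(np.asarray, (P, Q, R))
    n = np.cross(Q - P, R - P)   # a normal vector: n = PQ x PR
    d = np.dot(n, P)             # d = <n, OP>
    return n, d

n, d = plane_through([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
print(n, d)   # [1. 1. 1.] 1.0, i.e. the plane x + y + z = 1
```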
Remark 2.40. As before, note that the normal equation for a plane is not unique. For instance,
x + 2y + 3z = 5 and 2x + 4y + 6z = 10
describe the same plane. The reason is that “the” normal vector of a plane is not unique. If ~n is a normal vector of the plane E, then every c~n with c ∈ R \ {0} is also a normal vector to the plane.
Definition 2.41. The angle between two planes is the angle between their normal vectors.
Note that this definition is consistent with the fact that two planes are parallel if and only if their
normal vectors are parallel.
Remark 2.42. • Assume a plane is given as in (b) (that is, we know a point P on E and two vectors ~v and ~w parallel to E but with ~v 6k ~w). In order to find a description as in (c) (that is, one point on E and a normal vector), we only have to find a vector ~n that is perpendicular to both ~v and ~w. Proposition 2.32(vii) tells us how to do this: we only need to calculate ~v × ~w. Another way to find an appropriate ~n is to find a solution of the linear 2 × 3 system given by {h~v , ~ni = 0, h~w , ~ni = 0}.

• Assume a plane is given as in (c) (that is, we know a point P on E and a normal vector). In order to find vectors ~v and ~w as in (b), we can proceed in many ways:
Representations of planes in Rn.
In Rn, the vector form of a plane is

    E = { OP→ + t~v + s~w : s, t ∈ R }

where P is a point on E and ~v , ~w are vectors parallel to the plane but not parallel to each other.
Note that there is no normal form of a plane in Rn for n ≥ 4. The reason is that for n ≥ 4 there is more than one normal direction to a given plane, so a normal form of a plane E must consist of more than one equation (more precisely, it must consist of n − 2 equations of the form n1 x1 + · · · + nn xn = d).
You should have understood
• the concept of lines and planes in R3 ,
• how they can be described in formulas,
• etc.
You should now be able to
• pass easily between the different descriptions of lines and planes,
• etc.
Exercises.

1. Show that the following equations all describe the same line:

    { (1, 2, 3)^t + t (4, 5, 6)^t : t ∈ R },   { (1, 2, 3)^t + t (8, 10, 12)^t : t ∈ R },   { (1, 2, 3)^t + t (−4, −5, −6)^t : t ∈ R },
    { (5, 7, 9)^t + t (4, 5, 6)^t : t ∈ R },   (x − 1)/4 = (y − 2)/5 = (z − 3)/6,   (x + 3)/4 = (y + 3)/5 = (z + 3)/6.

   (a) Find at least three points which belong to E.
   (b) Find a point on E and two vectors ~v and ~w in E which are not parallel to each other.
   (c) Find a point on E and a vector ~n which is orthogonal to E.
   (d) Find a point on E and two vectors ~a and ~b in E with ~a ⊥ ~b.

4. For the points P(1, 1, 1), Q(1, 0, −1) and the following planes E,
   (i) find the equation of the plane,
   (ii) determine whether P belongs to the plane,
   (iii) find a line which is orthogonal to E and contains the point Q.

   (b) E is the plane containing the points A(1, 0, 1), B(2, 3, 4), C(3, 2, 4).
   (c) E is the plane containing the point A(1, 0, 1) and orthogonal to the vector ~n = (3, 2, 1)^t.
Given two lines in R3, exactly one of the following holds:

(a) The lines intersect in exactly one point. In this case, they cannot be parallel.
(b) The lines intersect in infinitely many points. In this case, the lines have to be equal. In particular they have to be parallel.
(c) The lines do not intersect. Note that in contrast to the case in R2, the lines do not have to be parallel for this to happen. For example, the line L : x = y = 1 is a line parallel to the z-axis passing through (1, 1, 0), and G : x = z = 0 is a line parallel to the y-axis passing through (0, 0, 0). The lines do not intersect and they are not parallel.
Example 2.43. We consider the lines

    L1 = { (0, 0, 1)^t + s (1, 2, 3)^t : s ∈ R },   L2 = { (2, 4, 7)^t + t (2, 4, 6)^t : t ∈ R },
    L3 = { (−1, 0, 0)^t + t (1, 1, 2)^t : t ∈ R },   L4 = { (3, 0, 5)^t + t (1, 1, 2)^t : t ∈ R },

and determine their intersections with L1.

L1 ∩ L2 = L1
Proof. A point Q(x, y, z) belongs to L1 ∩ L2 if and only if it belongs both to L1 and L2. This means that there must exist an s ∈ R such that OQ→ = p~1 + s~v1 and there must exist a t ∈ R such that OQ→ = p~2 + t~v2. Note that s and t are different parameters. So we are looking for s and t such that

    p~1 + s~v1 = p~2 + t~v2 ,   that is   (0, 0, 1)^t + s (1, 2, 3)^t = (2, 4, 7)^t + t (2, 4, 6)^t .     (2.19)

Once we have solved (2.19) for s and t, we insert them into the equations for L1 and L2 respectively, in order to obtain Q. Note that (2.19) in reality is a system of three equations: one equation for each component of the vector equation. Writing it out and solving each equation for s, we obtain

    0 + s = 2 + 2t          s = 2 + 2t
    0 + 2s = 4 + 4t   ⇐⇒    s = 2 + 2t
    1 + 3s = 7 + 6t         s = 2 + 2t

This means that there are infinitely many solutions of (2.19). Given any point R on L1, there is a corresponding s ∈ R such that OR→ = p~1 + s~v1. Now if we choose t = (s − 2)/2, then OR→ = p~2 + t~v2 holds, hence R ∈ L2 too. If on the other hand we have a point R′ ∈ L2, then there is a corresponding t ∈ R such that OR′→ = p~2 + t~v2. Now if we choose s = 2 + 2t, then OR′→ = p~1 + s~v1 holds, hence R′ ∈ L1 too. In summary, we showed that L1 = L2.
L1 ∩ L3 = {(1, 2, 4)}

Proof. As before, we look for s and t with p~1 + s~v1 = p~3 + t~v3. Componentwise this is the system

    1  0 + s = −1 + t          1  s − t = −1
    2  0 + 2s = 0 + t    ⇐⇒    2  2s − t = 0
    3  1 + 3s = 0 + 2t         3  3s − 2t = −1

From 1 and 2 we obtain s = 1 and t = 2. Inserting these candidates for s and t, we find that 3 · 1 − 2 · 2 = −1 which is consistent with 3.
So L1 and L3 intersect in exactly one point. In order to find it, we put s = 1 in the equation for L1:

    OQ→ = p~1 + 1 · ~v1 = (0, 0, 1)^t + (1, 2, 3)^t = (1, 2, 4)^t ,

hence the intersection point is Q(1, 2, 4).
In order to check if this result is correct, we can put t = 2 in the equation for L3. The result must be the same. The corresponding calculation is:

    OQ→ = p~3 + 2 · ~v3 = (−1, 0, 0)^t + (2, 2, 4)^t = (1, 2, 4)^t .
L1 ∩ L4 = ∅

Proof. This time we look for s and t with p~1 + s~v1 = p~4 + t~v4 , that is,

    1  0 + s = 3 + t           1  s − t = 3
    2  0 + 2s = 0 + t    ⇐⇒    2  2s − t = 0
    3  1 + 3s = 5 + 2t         3  3s − 2t = 5

From 1 and 2 we obtain s = −3 and t = −6, but then 3s − 2t = −9 + 12 = 3 6= 5, so 3 cannot be satisfied. Hence the system has no solution and the lines do not intersect.
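Solving p~1 + s~v1 = p~2 + t~v2 is an ordinary (usually overdetermined) linear system in s and t, so it can also be handled numerically. The following least-squares sketch (ours, not from the notes) reproduces L1 ∩ L3 = {(1, 2, 4)}; if the residual were non-zero, the lines would not intersect.

```python
import numpy as np

p1, v1 = np.array([0., 0., 1.]), np.array([1., 2., 3.])    # line L1
p3, v3 = np.array([-1., 0., 0.]), np.array([1., 1., 2.])   # line L3

# p1 + s*v1 = p3 + t*v3  <=>  s*v1 - t*v3 = p3 - p1  (3 equations, 2 unknowns)
A = np.column_stack([v1, -v3])
b = p3 - p1
(s, t), residual, *_ = np.linalg.lstsq(A, b, rcond=None)

print(s, t)            # 1.0 2.0
print(p1 + s * v1)     # [1. 2. 4.] -> the intersection point Q(1, 2, 4)
```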
Intersection of planes
Given two planes E1 and E2 in R3 , there are two possibilities:
(a) The planes intersect. In this case, they necessarily intersect in infinitely many points. Their
intersection is either a line (if E1 and E2 are not parallel) or a plane (if E1 = E2 ).
(b) The planes do not intersect. In this case, the planes must be parallel and not equal.
Example 2.45. We consider the following four planes:

    E1 : x + y + 2z = 3,   E2 : 2x + 2y + 4z = −4,   E3 : 2x + 2y + 4z = 6,   E4 : x + y − 2z = 5.

E1 ∩ E2 = ∅
Proof. The set of all points Q(x, y, z) which belong both to E1 and E2 is the set of all x, y, z which
simultaneously satisfy
1 x + y + 2z = 3,
2 2x + 2y + 4z = −4.
Now clearly, if x, y, z satisfies 1 , then it cannot satisfy 2 (the right side would be 6). We can
see this more formally if we solve 1 , e.g., for x and then insert into 2 . We obtain from 1 :
x = 3 − y − 2z. Inserting into 2 leads to
−4 = 2(3 − y − 2z) + 2y + 4z = 6,
which is absurd.
This result was to be expected since the normal vectors of the planes are ~n1 = (1, 1, 2)^t and ~n2 = (2, 2, 4)^t respectively. Since they are parallel, the planes are parallel and therefore they either are equal or
they have empty intersection. Now we see that for instance (3, 0, 0) ∈ E1 but (3, 0, 0) ∈/ E2 , so the
planes cannot be equal. Therefore they have empty intersection.
E1 ∩ E3 = E1
Proof. The set of all points Q(x, y, z) which belong both to E1 and E3 is the set of all x, y, z which
simultaneously satisfy
1 x + y + 2z = 3,
2 2x + 2y + 4z = 6.
Clearly, both equations are equivalent: if x, y, z satisfies 1 , then it also satisfies 2 and vice versa.
Therefore, E1 = E3 .
E1 ∩ E4 = { (4, 0, −1/2)^t + t (−1, 1, 0)^t : t ∈ R }.

Proof. First, we notice that the normal vectors ~n1 = (1, 1, 2)^t and ~n4 = (1, 1, −2)^t are not parallel, so we expect that the solution is a line in R3.
The set of all points Q(x, y, z) which belong both to E1 and E4 is the set of all x, y, z which simultaneously satisfy

    1  x + y + 2z = 3,
    2  x + y − 2z = 5.

Subtracting 2 from 1 gives 4z = −2, and inserting z = −1/2 into 1 gives x + y = 4. Therefore

    z = −1/2,   x = 4 − y   with y ∈ R arbitrary,

in other words,

    (x, y, z)^t = (4 − y, y, −1/2)^t = (4, 0, −1/2)^t + y (−1, 1, 0)^t   with y ∈ R arbitrary.
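Numerically, the intersection line of two non-parallel planes can be found from a directional vector ~n1 × ~n4 together with any particular solution of the 2 × 3 system. The sketch below (ours, not from the notes) reproduces the parametrisation of E1 ∩ E4 found above.

```python
import numpy as np

n1, d1 = np.array([1., 1., 2.]), 3.    # E1: x + y + 2z = 3
n4, d4 = np.array([1., 1., -2.]), 5.   # E4: x + y - 2z = 5

direction = np.cross(n1, n4)           # directional vector of the intersection line
A = np.vstack([n1, n4])
point, *_ = np.linalg.lstsq(A, np.array([d1, d4]), rcond=None)  # one particular solution

print(direction)   # [-4.  4.  0.], a multiple of (-1, 1, 0)^t
print(point)       # a point lying on both planes
print(A @ point)   # [3. 5.] -> it indeed satisfies both plane equations
```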
Figure 2.21: The left figure shows E1 ∩ E2 = ∅, the right figure shows E1 ∩ E4 which is a line.

Intersection of a plane and a line

Given a plane E and a line L in R3, there are three possibilities:

(a) The plane and the line intersect in exactly one point. This happens if and only if L is not parallel to E, which is the case if and only if L is not perpendicular to the normal vector of E.
(b) The plane and the line do not intersect. In this case, E and L must be parallel, that is, L must be perpendicular to the normal vector of E.
(c) The plane and the line intersect in infinitely many points. In this case, L lies in E, that is, E and L must be parallel and they must share at least one point.
As an example we calculate E1 ∩ L2. Since L2 is clearly not parallel to E1, we expect that their intersection consists of exactly one point.

Proof. The set of all points Q(x, y, z) which belong both to E1 and L2 is the set of all x, y, z which simultaneously satisfy

    x + y + 2z = 3   and   x = 2 + 2t,  y = 4 + 4t,  z = 7 + 6t   for some t ∈ R.

Replacing the expressions with t from L2 into the equation of the plane E1, we obtain the following equation for t:

    (2 + 2t) + (4 + 4t) + 2(7 + 6t) = 3,   that is   20 + 18t = 3,   hence   t = −17/18.

Replacing this t into the equation for L2 gives the point of intersection Q(1/9, 2/9, 4/3).
In order to check our result, we insert the coordinates in the equation for E1 and obtain x + y + 2z = 1/9 + 2/9 + 2 · 4/3 = 1/3 + 8/3 = 3 which shows that Q ∈ E1.
Let us calculate two more examples. We consider the plane E : x + 2y − z = 2 and the lines

    L = { (−1, 0, 2)^t + t (1, 1, 3)^t : t ∈ R },   G = { (3, 2, 5)^t + t (2, 1, 4)^t : t ∈ R },

so that ~n = (1, 2, −1)^t, ~vL = (1, 1, 3)^t and ~vG = (2, 1, 4)^t.
Observe that ~n ⊥ ~vL and ~n ⊥ ~vG. Therefore L k E and G k E (but L 6k G) and we expect for each of the lines that it either does not intersect E or that it lies in E.

E ∩ L = ∅  The parametric equations of L are

    x = −1 + t,   y = t,   z = 2 + 3t.

Inserting them into the equation of E gives

    2 = (−1 + t) + 2t − (2 + 3t) = −3,

which is absurd. Therefore E and L do not have any point in common. In other words, they do not intersect.

E ∩ G = G  The parametric equations of G are

    x = 3 + 2t,   y = 2 + t,   z = 5 + 4t.

Inserting them into the equation of E gives

    2 = (3 + 2t) + 2(2 + t) − (5 + 4t) = 2,

which is true for any t ∈ R. Therefore every point of G belongs also to E. In other words, G ⊆ E or E ∩ G = G.

Remark. Recall that in the example above ~n ⊥ ~vL and ~n ⊥ ~vG. This means that L is parallel to E, therefore it must either lie completely in E or not intersect E at all. The same is true for G. So if we take an arbitrary point on L and this point belongs also to E, then the intersection E ∩ L is not empty, and then we must have E ∩ L = L. If that point does not belong to E, then E and L cannot intersect (because otherwise L ⊆ E and the point would be in E too). For instance, it is easy to see that the point P(−1, 0, 2) belongs to L (just take t = 0 in its parametric equation). Let us put the coordinates of P into the formula for E:

    −1 + 2 · 0 − 2 = −3 6= 2.
Exercises.

1. Consider the plane E : 2x − y + 3z = 9 and the line L : x = 3t + 1, y = −2t + 3, z = 4t.
   (a) Find E ∩ L.
   (b) Find a line G which intersects neither the plane E nor the line L. Prove your claim. How many lines with this property are there?

2. Let L be the line which passes through the point (1, 1, 1) and is parallel to the vector (2, −1, 4)^t. Show that L does not intersect the plane E : x − 2y − z = 1.

5. Of two planes E and F in R3 it is known that they are not parallel and that the points A(1, 2, 3) and B(4, 0, −3) belong to E ∩ F. Can you conclude what E ∩ F is? Give two examples of planes E and F with the property above.

6. A hiker starts at time t = 0 at the point (1, 2) with velocity ~vc = (3, 1)^t. There is a cyclist at the point (12, −3) and a refreshment stand at the point (16, 7). The hiker and the cyclist both move along straight lines with constant speed.
   (a) Show that the hiker passes through the refreshment stand. At which time t does he pass through that point?
   (b) In which direction must the cyclist ride in order to also pass through the refreshment stand?
   (c) Suppose the cyclist starts at the same time as the hiker. How must the cyclist choose his velocity if he wants to meet the hiker at the refreshment stand? How must he choose it if he wants to pass through that point before the hiker?
   (d) Now suppose the cyclist moves with velocity ~w = (3, 7.5)^t. Show that the cyclist passes through the refreshment stand. At what time must he start in order to meet the hiker there?
2.8 Summary

The vector space Rn is given by

    Rn = { (x1 , . . . , xn)^t : x1 , . . . , xn ∈ R }.

For points P(p1 , . . . , pn), Q(q1 , . . . , qn), the vector whose initial point is P and final point is Q is

    PQ→ = (q1 − p1 , . . . , qn − pn)^t   and   OQ→ = (q1 , . . . , qn)^t   where O denotes the origin.

On Rn, the sum and product with scalars are defined by

    Rn × Rn → Rn ,  ~v + ~w = (v1 + w1 , . . . , vn + wn)^t ,      R × Rn → Rn ,  c~v = (cv1 , . . . , cvn)^t .

The norm of a vector is

    k~vk = √(v1² + · · · + vn²).

If ~v = PQ→, then k~vk = kPQ→k = distance between P and Q.
For vectors ~v and ~w ∈ Rn their inner product is a real number defined by

    h~v , ~wi = v1 w1 + · · · + vn wn .
Applications
• Area of a parallelogram spanned by ~v , ~w ∈ R3:  A = k~v × ~wk.
• Volume of a parallelepiped spanned by ~u, ~v , ~w ∈ R3:  V = |h~u , ~v × ~wi|.
Representations of lines
• Vector equation L = { OP→ + t~v : t ∈ R }.
  P is a point on the line, ~v is called a directional vector of L.
• Parametric equation x1 = p1 + tv1 , . . . , xn = pn + tvn , t ∈ R.
  Then P(p1 , . . . , pn) is a point on L and ~v = (v1 , . . . , vn)^t is a directional vector of L.
• Symmetric equation (x1 − p1)/v1 = (x2 − p2)/v2 = · · · = (xn − pn)/vn .
  Then P(p1 , . . . , pn) is a point on L and ~v = (v1 , . . . , vn)^t is a directional vector of L.
  If one or several of the vj are equal to 0, then the formula above has to be modified.
Representations of planes
• Vector equation E = { OP→ + t~v + s~w : s, t ∈ R }.
  P is a point on the plane, ~v and ~w are vectors parallel to E with ~v 6k ~w.
• Normal form (only in R3 !!) E : ax + by + cz = d.
  The vector ~n = (a, b, c)^t formed by the coefficients on the left hand side is perpendicular to E.
  Moreover, E passes through the origin if and only if d = 0.

The parametrisations are not unique!! (One and the same line (or plane) has many different parametrisations.)
• The angle between two lines is the angle between their directional vectors.
• Two lines are parallel if and only if their directional vectors are parallel.
  Two lines are perpendicular if and only if their directional vectors are perpendicular.
• The angle between two planes is the angle between their normal vectors.
• Two planes are parallel if and only if their normal vectors are parallel.
  Two planes are perpendicular if and only if their normal vectors are perpendicular.
• A line is parallel to a plane if and only if its directional vector is perpendicular to the normal vector of the plane.
  A line is perpendicular to a plane if and only if its directional vector is parallel to the normal vector of the plane.
2.9 Exercises

1. Let ~a = (2, −3)^t and ~b = (−1, 4)^t . Find vectors ~u, ~w which satisfy all of the following conditions:
   (a) ~a = ~u + ~w.
   (b) ~u k ~b.
   (c) ~w ⊥ ~b.

2. Let ~a = (1, −3)^t . Find ~b ∈ R2 such that ~a ⊥ ~b and k~ak = k~bk. Repeat the exercise for ~a = (2, 3)^t , (1, 4)^t , (x, y)^t .

3. Let ~a = (1, −2, 3)^t and ~b = (1, 2, −4)^t . Find scalars x, y such that x~a + y~b ⊥ ~b and x~a + y~b 6= ~0.

4. Let ~a, ~b, ~c ∈ Rn . Decide whether the following statements are true or false. If true, prove the statement; if false, give a counterexample.
   (a) If h~a , ~bi = h~a , ~ci and ~a 6= ~0, then ~b = ~c.
   (b) If there exists a vector ~b with h~a , ~bi = 0, then ~a = ~0.
   (c) If h~a , ~bi = 0 for every vector ~b, then ~a = ~0.
   (d) Let n = 3. If h~a , ~bi = 0 and ~a × ~b = ~0, then ~a = ~0 or ~b = ~0 (think about a geometric interpretation before you start calculating).
5. Let ~a = (1, −1, 2)^t and ~b = (0, 3, −1)^t . Find at least 3 distinct vectors ~c such that the volume of the parallelepiped generated by ~a, ~b, ~c is 1. What is the geometric locus described by all vectors ~c with this property?

6. Show that in R3 there is no unit vector whose direction angles are π/6, π/10, π/3.

7. A comet shoots off from the point P(1, 0, 3) and moves with velocity (3, 5, −2)^t ; at the same time an asteroid shoots off from the point R(3, 4, −1) and moves with velocity vector (2, 3, 0)^t . Assuming that both objects travel along straight lines at constant speed and that no other celestial body perturbs their trajectories, answer the following questions:
   (a) i. What are the vector equations describing the trajectories of the asteroid and of the comet?
      ii. Do the two objects collide? At what time do they do so?
      iii. If the comet leaves a trail of ice along its trajectory and the asteroid leaves a trail of dust along its path, do the ice of the comet and the dust of the asteroid mix at some point in space?
   (b) Repeat the previous questions assuming that the asteroid starts from R(10, 5, 2) with velocity (3, 15, −7)^t .
8. Consider an inclined wall given by the equation E : 2x − 3y + 0.5z = 4. Which point on the wall will the laser from the previous item mark if it points in the direction (3, 2/3, 1)^t ?

9. A company produces weighted sacks. There are three materials with which the sacks can be filled: material A has a density of 1 kg/l, material B has a density of 2 kg/l and material C has a density of 3 kg/l. Each sack must weigh 20 kg.
   (a) Interpret the information given in the exercise as a plane in R3 whose axes represent the amount of each material.
   (b) If a sack already contains 5 l of material A and 3 l of material B, how many litres of material C must be added to complete the sack?
   (c) How can the sacks be put together if, in addition, the volume of each sack is required to be 13 l? How many ways of doing so are there? Interpret your calculations as

12. (a) In R2 consider the line L : ax + by = c and a point P(x1 , y1) not on L. Show that the distance d from P to L is given by the formula

        d = |a x1 + b y1 − c| / √(a² + b²).

    (Hint: Remember the geometric meaning of the vector (a, b)^t for the line L. Use projections.)
    (b) In R3 consider the plane E : ax + by + cz = d and a point P(x1 , y1 , z1) not on E. Show that the distance d from P to E is given by the formula:
14. Let E be the plane 3x + y + z = 1 and P(−6, 4, 4). This exercise aims to find the point in E closest to P.
    (a) Verify that P 6∈ E.
    (b) Find the equation of the line L which is parallel to the normal vector of E and passes through P.
    (c) Find the intersection point of E and L, and call it Q.
    (d) Verify that the distance between the point obtained and P is the same as the distance from P to the plane E (Exercise 12, part (b)).
    (e) Justify why Q is the point in E closest to P.
17. Let L1 be the line given by

        (x + 3)/2 = y − 4 = (z + 2)/7

    and L2 be given by its parametric equations:

        x = −3 + s,   y = 2 − 4s,   z = 1 + 6s.

    (a) Show that the vectors ~a = (…, 1, −1)^t , ~b = (…, 5, 1)^t and ~v = (…, 3, 0)^t are parallel to the plane E.
    (b) Find numbers λ, µ ∈ R such that λ~a + µ~b = ~v.
    (c) Show that the vector ~c = (1, 1, 1)^t is not parallel to the plane E and find vectors ~c_k and ~c_⊥ such that ~c_k is parallel to E, ~c_⊥ is orthogonal to E and ~c = ~c_k + ~c_⊥ .

19. Let E be a plane in R2 and let ~a, ~b be vectors parallel to E. Show that for all λ, µ ∈ R the vector λ~a + µ~b is parallel to the plane.

21. For each of the following sets decide whether it is a vector space with its usual sum and product.
    (a) V = { (a, a)^t : a ∈ R },
    (b) V = { (a, a²)^t : a ∈ R },
    (c) V is the set of all continuous functions R → R,
    (d) V is the set of all continuous functions f : R → R with f(4) = 0,
    (e) V is the set of all continuous functions f : R → R with f(4) = 1.
Chapter 3
We will rewrite linear systems as matrix equations in order to solve them systematically and efficiently. We will interpret matrices as linear maps from Rn to Rm , which then allows us to define algebraic operations with matrices; specifically we will define the sum and the composition (= multiplication) of matrices, which then leads naturally to the concept of the inverse of a matrix. We can interpret a matrix as a system which takes some input (the variables x1 , . . . , xn) and gives us back as output b1 , . . . , bm via A~x = ~b. Sometimes we are given the input and we want to find the bj ; and sometimes we are given the output b1 , . . . , bm and we want to find the input x1 , . . . , xn which produces the desired output. The latter question is usually the harder one. We will see that a unique input for any given output exists if and only if the matrix is invertible. We can refine the concept of invertibility of a matrix: we say that A has a left inverse if for any ~b the equation A~x = ~b has at most one solution, and we say that it has a right inverse if A~x = ~b has at least one solution for any ~b.
We will discuss in detail the Gauß and Gauß-Jordan elimination which helps us to find solutions
of a given linear system and the inverse of a matrix if it exists. In Section 3.7 we define the trans-
position of matrices and we have a first look at symmetric matrices. They will become important
in Chapter 8. We will also see the interplay of transposing a matrix and the inner product. In the
last section of this chapter we define the so-called elementary matrices which can be seen as the
building blocks of invertible matrices. We will use them in Chapter 4 to prove important properties
D
of the determinant.
Consider the general m × n linear system

    a11 x1 + a12 x2 + · · · + a1n xn = b1
    a21 x1 + a22 x2 + · · · + a2n xn = b2
      ...
    am1 x1 + am2 x2 + · · · + amn xn = bm .                                   (3.1)

Recall that the system is called consistent if it has at least one solution; otherwise it is called inconsistent. According to (1.4) and (1.5) its associated coefficient matrix and augmented coefficient matrix are

    A = ( a11 a12 . . . a1n
          a21 a22 . . . a2n
          ...
          am1 am2 . . . amn )                                                 (3.2)

and

    (A|b) = ( a11 a12 . . . a1n | b1
              a21 a22 . . . a2n | b2
              ...
              am1 am2 . . . amn | bm ).                                       (3.3)
Definition 3.1. The set of all matrices with m rows and n columns is denoted by M (m × n). If we
want to emphasise that the matrix has only real entries, then we write M (m × n, R) or MR (m × n).
Another frequently used notation is Mm×n . A matrix A is called a square matrix if its number of
rows is equal to its number of columns.
In order to solve (3.1), we could use the first equation, solve for x1 and insert this in all the other
equations. This gives us a new system with m − 1 equations for n − 1 unknowns. Then we solve
the next equation for x2 , insert it in the other equations, and we continue like this until we have
only one equation left. This of course will fail if for example a11 = 0 because in this case we cannot
solve the first equation for x1 . We could save our algorithm by saying: we solve the first equation
for the first unknown whose coefficient is different from 0 (or we could take an equation where the
coefficient of x1 is different from 0 and declare this one to be our first equation. After all, we can
order the equations as we please). Even with this modification, the process of solving and replacing
is error prone.
Another idea is to manipulate the equations. The question is: Which changes to the equations
are allowed without changing the information contained in the system? We don’t want to destroy
information (thus potentially allowing for more solutions) nor introduce more information (thus
potentially eliminating solutions). Or, in more mathematical terms, what changes to the given
system of equations result in an equivalent system? Here we call two systems equivalent if they
have the same set of solutions.
We can check if the new system is equivalent to the original one, if there is a way to restore the
original one.
For example, if we exchange the first and the second row, then nothing really happened and we end
up with an equivalent system. We can come back to the original equation by simply exchanging
again the first and the second row.
If we multiply both sides of the first equation by some factor, let’s say by 2, then again nothing changes. Assume for example that the first equation is x + 3y = 7. If we multiply both sides by 2, we obtain 2x + 6y = 14. Clearly, if a pair (x, y) satisfies the first equation, then it satisfies also the second one and vice versa. Given the new equation 2x + 6y = 14, we can easily restore the old one by simply dividing both sides by 2.
If we take an equation and multiply both of its sides by 0, then we destroy information because we
end up with 0 = 0 and there is no way to get back the information that was stored in the original
equation. So this is not an allowed operation.
Show that squaring both sides of an equation in general does not give an equivalent equation.
Are there cases, when it does?
Squaring an equation or taking the logarithm on both sides or other such things usually are not
interesting to us because the resulting equation will no longer be a linear equation.
Let us denote the jth row of our linear system (3.1) by Rj. The following table contains the so-called elementary row operations. They are the “allowed” operations because they do not alter the information contained in a given linear system since they are reversible.
The first column describes the operation in words, the second introduces their shorthand notation and in the last column we give the inverse operation which allows us to get back to the original system.
    Elementary operation                                        Notation          Inverse operation
    1  Swap rows j and k.                                       Rj ↔ Rk           Rj ↔ Rk
    2  Multiply row j by some λ 6= 0.                           Rj → λRj          Rj → (1/λ)Rj
    3  Replace row k by the sum of row k and λ times Rj         Rk → Rk + λRj     Rk → Rk − λRj
       and leave row j unchanged (j 6= k).
Exercise. Show that the operation in the third column reverses the operation from the second
column.
Exercise. Show that in reality 1 is not necessary since it can be achieved by a composition of
operations of the form 2 and 3 (or 2 and 3’ ). Show how this can be done.
D
Example 3.2.

    x1 + x2 − x3 = 1                     x1 + x2 − x3 = 1                     x1 + x2 − x3 = 1
    2x1 + 3x2 + x3 = 3   —R2→R2−2R1→         x2 + 3x3 = 1    —R3→R3−4R2→          x2 + 3x3 = 1
          4x2 + x3 = 7                       4x2 + x3 = 7                         −11x3 = 3

Here we can stop because it is already quite easy to read off the solution. Proceeding from the bottom to the top, we obtain

    x3 = −3/11,   x2 = 1 − 3x3 = 20/11,   x1 = 1 − x2 + x3 = −12/11.

Note that we could continue our row manipulations to clean up the system even more:

              x1 + x2 − x3 = 1                        x1 + x2 − x3 = 1
    · · · →       x2 + 3x3 = 1   —R3→−1/11 R3→            x2 + 3x3 = 1
                  −11x3 = 3                                   x3 = −3/11

                   x1 + x2 − x3 = 1                 x1 + x2 = 8/11                x1 = −12/11
    —R2→R2−3R3→         x2 = 20/11   —R1→R1+R3→          x2 = 20/11  —R1→R1−R2→       x2 = 20/11
                        x3 = −3/11                       x3 = −3/11                   x3 = −3/11
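For a quick numerical check of the back-substitution above, one can let NumPy solve the same system directly (a small sketch of ours, not part of the notes):

```python
import numpy as np

A = np.array([[1., 1., -1.],
              [2., 3.,  1.],
              [0., 4.,  1.]])
b = np.array([1., 3., 7.])

x = np.linalg.solve(A, b)
print(x)                       # approximately [-12/11, 20/11, -3/11]
print(np.allclose(A @ x, b))   # True
```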
Our strategy was to apply manipulations that successively eliminate the unknowns in the lower
equations and we aimed to get to a form of the system of equations where the last one contains the
least number of unknowns possible.
Convince yourself that the first step of our reduction process is equivalent to solving the first equation for x1 and inserting it into the other equations in order to eliminate it there. The next step in the reduction is equivalent to solving the new second equation for x2 and inserting it into the third equation.
It is important to note that there are infinitely many different routes leading to the final result,
but usually some are quicker than others.
Let us analyse what we did. We looked at the coefficients of the system and we applied trans-
formations such that they become 0 because this results in removing the corresponding unknowns
from the equations. So in the example above we could just as well delete all the xj , keep only the
augmented coefficient matrix and perform the line operations in the matrix. Of course, we have
to remember that the numbers in the first columns are the coefficients of x1 , those in the second
column are the coefficients of x2 , etc. Then our calculations are translated into the following:
    ( 1 1 −1 | 1 )                 ( 1 1 −1 | 1 )                 ( 1 1  −1 | 1 )
    ( 2 3  1 | 3 )  —R2→R2−2R1→    ( 0 1  3 | 1 )  —R3→R3−4R2→    ( 0 1   3 | 1 )
    ( 0 4  1 | 7 )                 ( 0 4  1 | 7 )                 ( 0 0 −11 | 3 )

                      ( 1 1 −1 |  1     )
    —R3→−1/11 R3→     ( 0 1  3 |  1     ) .
                      ( 0 0  1 | −3/11  )

Translating back into a linear system, this is

    x1 + x2 − x3 = 1
         x2 + 3x3 = 1
              x3 = −3/11
want them to be the last unknowns. So as last row we want one that has only zeros in it
or one that starts with zeros, until finally we get a non-zero number say in column k. This
non-zero number can always be made equal to 1 by dividing the row by it. Now we know
how the unknowns xk , . . . , xn are related. Note that all the other unknowns x1 , . . . , xk−1 have
disappeared from the equation since their coefficients are 0.
If k = n as in our example above, then there is only one solution for xn .
• The second non-zero row from the bottom should also start with zeros until we get to a column,
say column l, with non-zero entry which we always can make equal to 1. This column should
be to the left of the column k (that is we want l < k). Because now we can use what we
know from the last row about the unknowns xk , . . . , xn to say something about the unknowns
xl , . . . , xk−1 .
• We continue like this until all rows are as we want them.
Note that the form of such a “nice” matrix looks a bit like it had a triangle consisting of only zeros
in its lower left part. There may be zeros in the upper right part. If a matrix has the form we just
described, we say it is in row echelon form. Let us give a precise definition.
D
Definition 3.3 (Row echelon form). We say that a matrix A ∈ M (m × n) is in row echelon form if:

(i) All rows which consist only of zeros are at the bottom of the matrix.
(ii) The first non-zero entry of every non-zero row is a 1 (it is called the pivot of that row).
(iii) The pivot of every non-zero row lies strictly to the right of the pivot of the row above it.

Definition 3.4 (Reduced row echelon form). We say that a matrix A ∈ M (m × n) is in reduced row echelon form if it is in row echelon form and in addition every pivot is the only non-zero entry in its column.
Examples 3.5.
(a) The following matrices are in reduced row echelon form. The pivots are highlighted.
1 1 0 0
1 0 0 0
! !
1 6 0 0 , 0 0 1 0
,
1 6 0 1 , 1 0 0 0 , 0 1 0 0 .
0 0 1 0 0 0 0 1
0 0 1 1 0 0 1 1 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 1
0 0 0 0
(b) The following matrices are in row echelon form but not in reduced row echelon form. The
pivots are highlighted.
1 6 3 1 ! ! 1 0 5 0
1 6 3 1 0 0 1 4
FT
1 6 1 0 , 1 0 2 0 , 0 1 0 0
0 0 1 1 , ,
.
0 0 0 1
0 0 1 1 0 0 1 1 0 0 1 0
0 0 0 1 0 0 0 0
0 0 0 1
0 0 0 0
Exercise. • Say why the matrices in (b) are not in reduced row echelon form and use ele-
mentary row operations to transform them into a matrix in reduced row echelon form.
• Say why the matrices in (c) are not in row echelon form and use elementary row operations to transform them into a matrix in row echelon form. Transform them further to obtain a matrix in reduced row echelon form.
Question 3.1
If we interchange two rows in a matrix this corresponds to writing down the given equations in a
different order. What is the effect on a linear system if we interchange two columns?
Remember: if we translate a linear system to an augmented coefficient matrix (A|b), perform the
row operations to arrive at a (reduced) row echelon form (A0 |b0 ), and translate back to a linear
system, then this new system contains exactly the same information as the original one but it is
“tidied up” and it is easy to determine its solution.
The natural question now is: Can we always transform a matrix into one in (reduced) row echelon
form? The answer is that this is always possible and we can even give an algorithm for it.
Gaußian elimination. Let A ∈ M (m × n) and assume that A is not the zero matrix. Gaußian
elimination is an algorithm that transforms A into a row echelon form. The steps are as follows:
• Find the first column which does not consist entirely of zeros. Interchange rows appropriately
such that the entry in that column in the first row is different from zero.
• Multiply the first row by an appropriate number so that its first non-zero entry is 1.
• Use the first row to eliminate all coefficients below its pivot.
• Now our matrix has the block form

      ( 0 · · · 0  1  ∗ · · · ∗ )
      ( 0 · · · 0  0            )
      (    ...     .      A′    )
      ( 0 · · · 0  0            )

  where the ∗ are arbitrary numbers and A′ is a matrix with fewer columns than A and m − 1 rows.
• Now repeat the process for A′. Note that in doing so the first columns do not change since we are only manipulating zeros.
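The algorithm just described is straightforward to implement. The sketch below (our own, with no numerical safeguards beyond a simple tolerance) transforms a matrix into a row echelon form using exactly the elementary row operations from the table above.

```python
import numpy as np

def row_echelon(M, tol=1e-12):
    """Return a row echelon form of M via Gaussian elimination."""
    A = np.array(M, dtype=float)
    m, n = A.shape
    row = 0
    for col in range(n):
        # find a row at or below `row` with a non-zero entry in this column
        pivot = next((r for r in range(row, m) if abs(A[r, col]) > tol), None)
        if pivot is None:
            continue
        A[[row, pivot]] = A[[pivot, row]]   # operation 1: swap rows
        A[row] = A[row] / A[row, col]       # operation 2: make the pivot equal to 1
        for r in range(row + 1, m):         # operation 3: eliminate entries below the pivot
            A[r] = A[r] - A[r, col] * A[row]
        row += 1
        if row == m:
            break
    return A

print(row_echelon([[1, 1, -1, 1], [2, 3, 1, 3], [0, 4, 1, 7]]))  # reproduces Example 3.2
```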
Definition 3.6. Two m × n matrices A and B are called row equivalent if there are elementary
row operations that transform A into B. (Clearly then B can be transformed by row operations
into A.)
Before we give examples, we note that from the row echelon form we can immediately tell how
many solutions the corresponding linear system has.
Theorem 3.7. Let (A|b) be the augmented coefficient matrix of a linear m × n system and let
(A0 |b0 ) be a row reduced form.
(1) If there is a row of the form (0 · · · 0|β) with β 6= 0, then the system has no solution.
(2) If there is no row of the form (0 · · · 0|β) with β 6= 0, then one of the following holds:
(2.1) If there is a pivot in every column of A0 then the system has exactly one solution.
(2.2) If there is a column in A0 without a pivot, then the system has infinitely many solutions.
Proof. (1) If (A0 |b0 ) has a row of the form (0 · · · 0|β) with β 6= 0, then the corresponding equation
is 0x1 + · · · + 0xn = β which clearly has no solution.
(2) Now assume that (A0 |b0 ) has no row of the form (0 · · · 0|β) with β 6= 0. In case (2.1), the
transformed matrix is then of the form
0 0 0 0
1 a12 a13 a1n b1
0 1 a023 0 0
a2n b 2
1
FT
. (3.4)
1 a0(n−1)n b0n−1
0
b0n
1
0 0
0
0 0 0
RA
Note that the last zero rows appear only if n < m. This system clearly has the unique solution
xn = b0n , xn−1 = b0n−1 − a(n−1)n xn , ..., x1 = b01 − a1n xn − · · · − a12 x2 .
. (3.5)
0 0 1 ∗ 0
∗ bk
0 0
0 0 0
where the stars stand for numbers. (If we continue the reduction until we get to the reduced
row echelon form, then the numbers over the 1’s must be zeros.) Note that we can choose the
unknowns which correspond to the columns without a pivot arbitrarily. The unknowns which
correspond to the columns with pivots can then always be chosen in a unique way such that
the system is satisfied.
Definition 3.8. The variables which correspond to columns without pivots are called free variables.
We will come back to this theorem later on page 112 (the theorem is stated again in the coloured
box).
From the above theorem we get as an immediate consequence the following.
Theorem 3.9. A linear system has either no, exactly one or infinitely many solutions.
Example 3.10 (Example with a unique solution (no free variables)). We consider the
linear system
2x1 + 3x2 + x3 = 12,
−x1 + 2x2 + 3x3 = 15, (3.6)
3x1 − x3 = 1.
Solution. We form the augmented matrix and perform row reduction.

    (  2 3  1 | 12 )                (  0 7  7 | 42 )                ( 0 7  7 | 42 )
    ( −1 2  3 | 15 )  —R1→R1+2R2→   ( −1 2  3 | 15 )  —R3→R3+3R2→   ( −1 2 3 | 15 )
    (  3 0 −1 |  1 )                (  3 0 −1 |  1 )                ( 0 6  8 | 46 )

                ( −1 2 3 | 15 )   R1→−R1           ( 1 −2 −3 | −15 )
    —R1↔R2→     (  0 7 7 | 42 )   R2→(1/7)R2       ( 0  1  1 |   6 )
                (  0 6 8 | 46 )   ——————→          ( 0  6  8 |  46 )

                    ( 1 −2 −3 | −15 )                     ( 1 −2 −3 | −15 )
    —R3→R3−6R2→     ( 0  1  1 |   6 )   —R3→(1/2)R3→      ( 0  1  1 |   6 ) .
                    ( 0  0  2 |  10 )                     ( 0  0  1 |   5 )

This shows that the system (3.6) is equivalent to the system

    x1 − 2x2 − 3x3 = −15,
          x2 + x3 = 6,                                                        (3.7)
               x3 = 5.
Remark. If we continue the reduction process until we reach the reduced row echelon form, then we obtain

            ( 1 −2 −3 | −15 )                 ( 1 −2 −3 | −15 )                 ( 1 −2 0 | 0 )
    . . . → ( 0  1  1 |   6 )  —R2→R2−R3→     ( 0  1  0 |   1 )  —R1→R1+3R3→    ( 0  1 0 | 1 )
            ( 0  0  1 |   5 )                 ( 0  0  1 |   5 )                 ( 0  0 1 | 5 )

                    ( 1 0 0 | 2 )
    —R1→R1+2R2→     ( 0 1 0 | 1 ) .
                    ( 0 0 1 | 5 )

From this we can read off the solution directly:

    x3 = 5,   x2 = 1,   x1 = 2.
Example 3.11 (Example with two free variables). We consider the linear system

    3x1 − 2x2 + 3x3 + 3x4 = 3,
    2x1 + 6x2 + 2x3 − 9x4 = 2,                                                (3.8)
    x1 + 2x2 + x3 − 3x4 = 1.
Solution. We form the augmented matrix and perform row reduction.

    ( 3 −2 3  3 | 3 )                ( 3 −2 3  3 | 3 )                ( 0 −8 0 12 | 0 )
    ( 2  6 2 −9 | 2 )  —R2→R2−2R3→   ( 0  2 0 −3 | 0 )  —R1→R1−3R3→   ( 0  2 0 −3 | 0 )
    ( 1  2 1 −3 | 1 )                ( 1  2 1 −3 | 1 )                ( 1  2 1 −3 | 1 )

                ( 1  2 1 −3 | 1 )                  ( 1 2 1 −3 | 1 )                ( 1 0 1  0 | 1 )
    —R1↔R3→     ( 0  2 0 −3 | 0 )   —R3→R3+4R2→    ( 0 2 0 −3 | 0 )   —R1→R1−R2→   ( 0 2 0 −3 | 0 ) .
                ( 0 −8 0 12 | 0 )                  ( 0 0 0  0 | 0 )                ( 0 0 0  0 | 0 )
The 3rd and the 4th column do not have pivots and we see that the system (3.8) is equivalent to the system

    x1 + x3 = 1,
    2x2 − 3x4 = 0.

Clearly we can choose x3 and x4 (the unknowns corresponding to the columns without a pivot) arbitrarily. We will always be able to adjust x1 and x2 such that the system is satisfied. In order to make it clear that x3 and x4 are our free variables, we sometimes call them x3 = t and x4 = s. Then every solution of the system (3.8) is of the form

    x1 = 1 − t,   x2 = (3/2) s,   x3 = t,   x4 = s   for some s, t ∈ R.

In vector form we can write the solution as follows. A tuple (x1, x2, x3, x4) is a solution of (3.8) if and only if the corresponding vector is of the form

    (x1 , x2 , x3 , x4)^t = (1 − t , (3/2)s , t , s)^t = (1, 0, 0, 0)^t + t (−1, 0, 1, 0)^t + s (0, 3/2, 0, 1)^t   for some s, t ∈ R.
Remark 3.12. The solution of an inhomogeneous system of linear equations in vector notation is always of the form

    ~x = ~z0 + t1 ~y1 + · · · + tk ~yk .

Example 3.13 (Example with no solution). We consider the linear system

    2x1 + x2 − x3 = 7,
    3x1 + 2x2 − 2x3 = 7,                                                      (3.9)
    −x1 + 3x2 − 3x3 = 2.
Solution. We form the augmented matrix and perform row reduction.

    (  2 1 −1 | 7 )                 (  0 7 −7 | 11 )                 (  0  7  −7 | 11 )
    (  3 2 −2 | 7 )  —R1→R1+2R3→    (  3 2 −2 |  7 )  —R2→R2+3R3→    (  0 11 −11 | 13 )
    ( −1 3 −3 | 2 )                 ( −1 3 −3 |  2 )                 ( −1  3  −3 |  2 )

                ( −1  3  −3 |  2 )                      ( −1  3  −3 |  2 )
    —R1↔R3→     (  0 11 −11 | 13 )   —R3→11R3−7R2→      (  0 11 −11 | 13 ) .
                (  0  7  −7 | 11 )                      (  0  0   0 | 30 )
The last line tells us immediately that the system (3.9) has no solution because there is no choice
of x1 , x2 , x3 such that 0x1 + 0x2 + 0x3 = 30.
D
• why a given matrix can be transformed into may different row echelon forms, but in only
one reduced row echelon form,
• why a linear system always has either no, exactly one or infinitely many solutions,
• etc.
You should now be able to
• identify if a matrix is in row echelon or a reduced row echelon form,
• use the Gauß- or Gauß-Jordan elimination to solve linear systems,
• say if a system has no, exactly one or infinitely many solutions if you know its echelon form,
• etc.
Exercises.

1. Using elementary row operations, bring the following augmented matrices to their reduced row echelon forms and find all solutions of the system of equations associated with each matrix:
1 2 3 −2 5 1 −5 1 0 0 1 −3
(a)
2 −1 −1 1 4 2 2 −4 (g) 0 2 1 1
(d)
1 0 3 2 3 2 1 7
−3 1 4 −1
1 2 3 4 1
2 4 6 −1 1 2 3 0 −2
RA
(b) 4 5 6 2 1 1 5 −1 2 (h)
1 2 0 0 −1
2 7 12 1 (e) 0 1 2 1 0
1 0 0 0 2
2 −1 4 1 13
1 0 0 0 1
6 2 4 0 2 3 1 2 1 0 0 −3
(i)
(c) 1 −2 −4 (f) 2 −6 7 3 3 2 1 0 2
1 1 2 1 −2 5 2 4 3 2 1 5
x + 3y − 2z = a + 1
3x − y + z = b + 6
5x − 5y + 4z = c + 11
3. Find a polynomial of degree at most 2 which passes through the points (−1, −6), (1, 0), (2, 0). How many such polynomials are there?

4. (a) Is there a polynomial of degree 1 which passes through the three points of Exercise 3? How many such polynomials are there?
   (b) Is there a polynomial of degree 3 which passes through the three points of Exercise 3? How many such polynomials are there? Give at least two polynomials of degree 3.

5. A rolo (someone from Bogotá) takes a short tour through Colombia and, reviewing his accounts, notes the following: on hostels he spent $30,000 per day in Medellín, $20,000 per day in Villavicencio and $20,000 per day in Yopal; on food he spent $20,000 per day in Medellín, $30,000 per day in Villavicencio and $20,000 per day in Yopal; and finally on transport he spent on average $10,000 per day in each city. Knowing that during the whole tour he spent $340,000 on hostels, $320,000 on food and $140,000 on transport, calculate the number of days the rolo spent in each city.

6. A farm in the llanos keeps capybaras (chigüiros), rabbits and rice rats. Each capybara consumes on average one kilo of fruit, one kilo of herbs and two kilos of rice per week; each rabbit consumes on average three kilos of fruit, four kilos of herbs and five kilos of rice per week; and each rice rat consumes on average two kilos of fruit, one kilo of herbs and five kilos of rice per week. Every week 25,000 kilos of fruit, 20,000 kilos of herbs and 55,000 kilos of rice are provided. If the three rodent species eat all the food, how many rodents of each type can live on the farm?
3.2 Homogeneous linear systems

In this short section we deal with the special case of homogeneous linear systems. Recall that a linear system (3.1) is called homogeneous if b1 = · · · = bm = 0. Such a system always has at least one solution, the so-called trivial solution x1 = · · · = xn = 0. This is also clear from Theorem 3.7 since no matter what row operations we perform, the right side will always remain equal to 0. Note that if we perform Gauß or Gauß-Jordan elimination, there is no need to write down the right hand side since it always will be 0.
If we adapt Theorem 3.7 to the special case of a homogeneous system, we obtain the following.
Theorem 3.14. Let A be the coefficient matrix of a homogeneous linear m × n system and let A0
be a row reduced form.
(i) If there is a pivot in every column then the system has exactly one solution, namely the trivial
solution.
(ii) If there is a column without a pivot, then the system has infinitely many solutions.
Corollary 3.15. A homogeneous linear system has either exactly one or infinitely many solutions.
Example 3.16 (Example of a homogeneous system with infinitely many solutions). We consider the linear system

    x1 + 2x2 − x3 = 0,
    2x1 + 3x2 − 2x3 = 0,                                                      (3.10)
    3x1 − x2 − 3x3 = 0.

Row reduction shows that x2 = 0 and that x3 is a free variable, so the solutions are

    x1 = t,   x2 = 0,   x3 = t   for t ∈ R,

or in vector form

    (x1 , x2 , x3)^t = t (1, 0, 1)^t   for t ∈ R.
Example 3.17 (Example of a homogeneous system with exactly one solution). We consider the linear system

    x1 + 2x2 = 0,
    2x1 + 3x2 = 0,                                                            (3.11)
    3x1 + 5x2 = 0.

Solution. We perform row reduction on the associated matrix.

    ( 1 2 )   use R1 to clear    ( 1  2 )   use R2 to clear    ( 1  0 )               ( 1 0 )
    ( 2 3 )   the 1st column     ( 0 −1 )   the 2nd column     ( 0 −1 )   —R2→−R2→    ( 0 1 ) .
    ( 3 5 )   ————————→          ( 0 −1 )   ————————→          ( 0  0 )               ( 0 0 )

Hence the only solution is the trivial solution x1 = x2 = 0.
In the next section we will see the connection between the set of solutions of a linear system and
the corresponding homogeneous linear system.
Exercises.

1. Find all solutions of the homogeneous systems associated with the matrices from Exercise 1 of Section 3.1.

2. Determine the solution set of the following homogeneous systems:

3. Find all r ∈ R such that the following system has a unique solution:

    (2 + r)x − 2y = 0
    2x + (1 − r)y = 0
3.3 Matrices and linear systems
So far we were given a linear system with a specific right hand side and we asked ourselves which
xj do we have to feed into the system in order to obtain the given right hand side. Problems of
this type are called inverse problems since we are given an output (the right hand of the system;
the “state” that we want to achieve) and we have to find a suitable input in order to obtain the
desired output.
Now we change our perspective a bit and we ask ourselves: If we put certain x1 , . . . , xn into the
system, what do we get as a result on the right hand side? To investigate this question, it is very
useful to write the system (3.1) in a short form. First note that we can view it as an equality of
two vectors with m components each:

    ( a11 x1 + a12 x2 + · · · + a1n xn )     ( b1 )
    (              ...                 )  =  ( ... )                          (3.12)
    ( am1 x1 + am2 x2 + · · · + amn xn )     ( bm )

Let A be the coefficient matrix and ~x the vector whose components are x1 , . . . , xn. Then we write the left hand side of (3.12) as

    A~x := ( a11 x1 + · · · + a1n xn , . . . , am1 x1 + · · · + amn xn )^t .  (3.13)

With this notation, the linear system (3.1) can be written in the very short form

    A~x = ~b

with ~b = (b1 , . . . , bm)^t .
A way to remember the formula for the multiplication of a matrix and a vector is that we “multiply each row of the matrix by the column vector”, so we calculate “row by column”. For example, the jth component of A~x is “(jth row of A) by (column ~x)”:

    jth component of A~x = aj1 x1 + aj2 x2 + · · · + ajn xn ,   j = 1, . . . , m.     (3.14)
Definition 3.18. The formula in (3.13) is called the multiplication of a matrix and a vector.
An m × n matrix A takes a vector with n components and gives us back a vector with m compo-
nents.
Observe that something like ~xA does not make sense!
Remark 3.20. Recall that ~ej is the vector which has a 1 as its jth component and has zeros everywhere else. Formula (3.13) shows that for every j = 1, . . . , n

    A~ej = (a1j , . . . , amj)^t = jth column of A.                           (3.16)
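In NumPy the matrix–vector product of (3.13) is the @ operator; the short sketch below (ours, not from the notes) also checks (3.16) on a small example.

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])   # a 2 x 3 matrix: it maps R^3 to R^2

x = np.array([1., 1., 2.])
print(A @ x)                    # [ 3. 10.], computed "row by column" as in (3.14)

e2 = np.array([0., 1., 0.])
print(A @ e2)                   # [ 2. -1.] -> the 2nd column of A, as in (3.16)
```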
Proposition 3.21. Let A ∈ M (m × n), let ~x, ~y ∈ Rn and let c ∈ R. Then

    (i) A(c~x) = cA~x,   (ii) A(~x + ~y) = A~x + A~y,   (iii) A~0 = ~0.

Proof. The proofs are not difficult. They follow by using the definitions and carrying out some straightforward calculations as follows.

(i) A(c~x) = ( a11 cx1 + · · · + a1n cxn , . . . , am1 cx1 + · · · + amn cxn )^t
          = c ( a11 x1 + · · · + a1n xn , . . . , am1 x1 + · · · + amn xn )^t = cA~x.

(ii) A(~x + ~y) = ( a11 (x1 + y1) + · · · + a1n (xn + yn) , . . . , am1 (x1 + y1) + · · · + amn (xn + yn) )^t
             = ( a11 x1 + · · · + a1n xn , . . . , am1 x1 + · · · + amn xn )^t
               + ( a11 y1 + · · · + a1n yn , . . . , am1 y1 + · · · + amn yn )^t = A~x + A~y.
(iii) To show that A~0 = ~0, we could simply do the calculation (which is very easy!) or we can use
(i):
A~0 = A(0~0) = 0A~0 = ~0.
Note that in (iii) the ~0 on the left hand side is the zero vector in Rn whereas the ~0 on the right
hand side is the zero vector in Rm .
Proposition 3.21 gives an important insight into the structure of solutions of linear systems. See
also Remark 3.12.
Theorem 3.22. (i) Let ~x and ~y be solutions of the linear system (3.1). Then ~x − ~y is a solution
of the associated homogeneous linear system.
(ii) Let ~x be a solution of the linear system (3.1), let ~z be a solution of the associated homogeneous
linear system and let λ ∈ R. Then ~x + λ~z is solution of the system (3.1).
Proof. To show (i), assume that ~x and ~y are solutions of (3.1), that is, A~x = ~b and A~y = ~b. Then, by Proposition 3.21,

    A(~x − ~y) = A~x − A~y = ~b − ~b = ~0,

which shows that ~x − ~y solves the homogeneous equation A~v = ~0. Hence (i) is proved.
In order to show (ii), we proceed similarly. If ~x solves the inhomogeneous system (3.1) and ~z solves
the associated homogeneous system, then
A~x = ~b and A~z = ~0.
Now (ii) follows from

    A(~x + λ~z) = A~x + λA~z = ~b + λ~0 = ~b.
Corollary 3.23. Let ~x be an arbitrary solution of the inhomogeneous system (3.1). Then the set
of all solutions of (3.1) is
{~x + ~z : ~z is solution of the associated homogeneous system}.
This means that in order to find all solutions of an inhomogeneous system it suffices to find one
particular solution and all solutions of the corresponding homogeneous system.
We will show later that the set of all solutions of a homogeneous system is a vector space. When you
study the set of all solutions of linear differential equations, you will encounter the same structure.
Example 3.24. Consider the inhomogeneous system

    x1 + 2x2 − x3 = 6,
    2x1 + 3x2 − 2x3 = 9,                                                      (3.10')
    3x1 − x2 − 3x3 = −3,

whose left hand side is that of (3.10). Row reduction of the augmented matrix leads to the reduced row echelon form ( 1 0 −1 | 0 ; 0 1 0 | 3 ; 0 0 0 | 0 ).
It follows that x2 = 3 and x1 = x3. If we take x3 = t as parameter, the general solution of the system in vector form is

    (x1 , x2 , x3)^t = (0, 3, 0)^t + t (1, 0, 1)^t   for t ∈ R.
Note that the left hand side of the system (3.10') is the same as that of the homogeneous system (3.10) in Example 3.16 which has the general solution

    (x1 , x2 , x3)^t = t (1, 0, 1)^t   for t ∈ R.
This shows that indeed we obtain all solutions of the inhomogeneous equation as the sum of the particular solution (0, 3, 0)^t and all solutions of the corresponding homogeneous system.
Ejercicios.
1. Sea A = 1 3
1 2 −1 3
−1 2
0 2. Para cada uno de los siguientes vectores, verifique si es una
1 3
solución del sistema homogéneo A~x = ~0:
5 1 10 3 2
−3 2 −6 0 −1
5,
(a)
3,
(b)
10,
(c)
0,
(d)
0.
(e)
2 4 20 −1 1
Posteriormente, encuentre todas las soluciones del sistema homogéneo. ¿Existe una solución
del sistema anterior tal que alguna de sus componentes sea cero pero no sea la solución trivial?
2. En cada ı́tem, escriba el sistema de ecuaciones lineales correspondiente y obtenga todas sus
soluciones:
2 −1 1 3
1 1 3 2 7 3
1 −2
−2
(a) 2 −1 0 4 ~x = 8 (b) 1 −1 1 ~x = 7
0 3 6 0 8 1 5 7 13
1 −7 −5 12
2 1 1 2 3 1 0 2 6
(c) 1 1 −2 ~x = 3 (d) 2 1 −1 0 ~x = 3
1 1 1 1 1 1 1 1 6
x + 3y + z = 3
2x + 7y + 2z = 5
2x + 6y + (a2 − 2)z = a + 4.
3.4 Matrices as functions from Rn to Rm ; composition of matrices
In the previous section we saw that a matrix A ∈ M (m × n) takes a vector ~x ∈ Rn and returns
a vector A~x in Rm . This allows us to view A as a function from Rn to Rm , and therefore we can
define the sum and composition of two matrices. Before we do this, let us see a few examples of
such matrices. As examples we work with 2 × 2 matrices because their action on R2 can be sketched
in the plane.
Example 3.25. Let us consider A = \begin{pmatrix}1&0\\0&-1\end{pmatrix}. This defines a function TA from R2 to R2 by
TA : R2 → R2 , TA ~x = A~x.
Remark. We write TA to denote the function induced by A, but sometimes we will write simply
A : R2 → R2 when it is clear that we consider the matrix A as a function.
We calculate easily
T_A\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}1\\0\end{pmatrix}, \qquad T_A\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\-1\end{pmatrix}, \qquad\text{in general}\qquad T_A\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x\\-y\end{pmatrix}.
[Figure 3.1: Reflection on the x-axis.]
Example 3.26. Let us consider B = \begin{pmatrix}0&0\\0&1\end{pmatrix}. This defines a function TB from R2 to R2 by
TB : R2 → R2 , TB ~x = B~x.
We calculate easily
T_B\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}, \qquad T_B\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}, \qquad\text{in general}\qquad T_B\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\y\end{pmatrix}.
So we see that TB represents the projection of a vector ~x onto the y-axis.
[Figure 3.2: Projection onto the y-axis.]
Example 3.27. Let us consider C = \begin{pmatrix}0&-1\\1&0\end{pmatrix}. This defines a function TC from R2 to R2 by
T_C : R^2 \to R^2, \qquad T_C\vec x = C\vec x.
We calculate easily
T_C\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0\\1\end{pmatrix}, \qquad T_C\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}-1\\0\end{pmatrix}, \qquad\text{in general}\qquad T_C\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}-y\\x\end{pmatrix}.
So we see that TC represents the counterclockwise rotation by π/2 about the origin.
[Figure 3.3: Rotation about π/2 counterclockwise.]
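The three examples above can be checked numerically. The following sketch (ours, not from the notes) applies the matrices A, B and C to a sample vector:

import numpy as np
A = np.array([[1, 0], [0, -1]])   # reflection on the x-axis (Example 3.25)
B = np.array([[0, 0], [0, 1]])    # projection onto the y-axis (Example 3.26)
C = np.array([[0, -1], [1, 0]])   # rotation by 90 degrees counterclockwise (Example 3.27)
v = np.array([2, 1])
print(A @ v)   # [ 2 -1]
print(B @ v)   # [0 1]
print(C @ v)   # [-1  2]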
Just as with other functions, we can sum them or compose them. Remember from your calculus classes that functions are summed "pointwise". That means, if we have two functions f, g : R → R, then the sum f + g is a new function which is defined by
(f + g)(x) := f (x) + g(x) for every x ∈ R.
The multiplication of a function f with a number c gives the new function cf defined by
(cf )(x) := c f (x) for every x ∈ R.
Matrix sum
Let us see how this looks like in the case of matrices. Let A and B be matrices. First note that
they both must depart from the same space Rn because we want to apply them to the same ~x, that
is, both A~x and B~x must be defined. Therefore A and B must have the same number of columns.
They also must have the same number of rows because we want to be able to sum A~x and B~x. So
let A, B ∈ M (m × n) and let ~x ∈ Rn . Then, by definition of the sum of two functions, we have
(A+B)\vec x := A\vec x + B\vec x = \begin{pmatrix}a_{11}&\dots&a_{1n}\\ \vdots&&\vdots\\ a_{m1}&\dots&a_{mn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix} + \begin{pmatrix}b_{11}&\dots&b_{1n}\\ \vdots&&\vdots\\ b_{m1}&\dots&b_{mn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}
= \begin{pmatrix}a_{11}x_1+\dots+a_{1n}x_n\\ \vdots\\ a_{m1}x_1+\dots+a_{mn}x_n\end{pmatrix} + \begin{pmatrix}b_{11}x_1+\dots+b_{1n}x_n\\ \vdots\\ b_{m1}x_1+\dots+b_{mn}x_n\end{pmatrix}
= \begin{pmatrix}(a_{11}+b_{11})x_1+\dots+(a_{1n}+b_{1n})x_n\\ \vdots\\ (a_{m1}+b_{m1})x_1+\dots+(a_{mn}+b_{mn})x_n\end{pmatrix}
= \begin{pmatrix}a_{11}+b_{11}&\dots&a_{1n}+b_{1n}\\ \vdots&&\vdots\\ a_{m1}+b_{m1}&\dots&a_{mn}+b_{mn}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}.
We see that A + B is again a matrix of the same size and that the components of this new matrix
are just the sum of the corresponding components of the matrices A and B.
Proposition 3.28. Let A, B, C ∈ M (m × n), let O be the m × n matrix whose entries are all 0 and let λ, µ ∈ R. Moreover, let Ã be the matrix whose entries are the negatives of the entries of A. Then the following is true.
(iv) Additive inverse: A + Ã = O.
(v) 1A = A.
Proof. The claims of the proposition can be proved by straightforward calculations.
Now let us calculate the composition of two matrices. This is also called the product of the matrices.
Assume we have A ∈ M (m × n) and we want to calculate AB for some matrix B. Note that A
describes a function from Rn → Rm . In order for AB to make sense, we need that B goes from
some Rk to Rn , that means that B ∈ M (n × k). The resulting function AB will then be a map
from Rk to Rm .
\mathbb{R}^k \xrightarrow{\ B\ } \mathbb{R}^n \xrightarrow{\ A\ } \mathbb{R}^m, \qquad AB : \mathbb{R}^k \to \mathbb{R}^m .
So let B ∈ M (n × k). Then, by the definition of the composition of two functions, we have for every
~x ∈ Rk
(AB)\vec x := A(B\vec x) = \begin{pmatrix}a_{11}[b_{11}x_1+\dots+b_{1k}x_k]+a_{12}[b_{21}x_1+\dots+b_{2k}x_k]+\dots+a_{1n}[b_{n1}x_1+\dots+b_{nk}x_k]\\ \vdots\\ a_{m1}[b_{11}x_1+\dots+b_{1k}x_k]+a_{m2}[b_{21}x_1+\dots+b_{2k}x_k]+\dots+a_{mn}[b_{n1}x_1+\dots+b_{nk}x_k]\end{pmatrix}
= \begin{pmatrix}[a_{11}b_{11}+\dots+a_{1n}b_{n1}]x_1+\dots+[a_{11}b_{1k}+\dots+a_{1n}b_{nk}]x_k\\ \vdots\\ [a_{m1}b_{11}+\dots+a_{mn}b_{n1}]x_1+\dots+[a_{m1}b_{1k}+\dots+a_{mn}b_{nk}]x_k\end{pmatrix}
= \begin{pmatrix}a_{11}b_{11}+\dots+a_{1n}b_{n1}&\dots&a_{11}b_{1k}+\dots+a_{1n}b_{nk}\\ \vdots&&\vdots\\ a_{m1}b_{11}+\dots+a_{mn}b_{n1}&\dots&a_{m1}b_{1k}+\dots+a_{mn}b_{nk}\end{pmatrix}\begin{pmatrix}x_1\\ \vdots\\ x_k\end{pmatrix}.    (3.20)
We see that AB is a matrix of the size m × k, as was to be expected since the composition function goes from Rk to Rm . The component cj` of the new matrix (the entry in row j and column `) is
c_{j\ell} = \sum_{r=1}^{n} a_{jr} b_{r\ell}.
So in order to calculate this entry we need from A only its jth row and from B we only need its
`th column and we multiply them component by component. You can memorise this again as “row
by column”, more precisely:
(jth row of A) by (`th column of B) = entry cj` of (3.20):
AB = \begin{pmatrix}a_{11}&a_{12}&\dots&a_{1n}\\ \vdots&&&\vdots\\ a_{j1}&a_{j2}&\dots&a_{jn}\\ \vdots&&&\vdots\\ a_{m1}&a_{m2}&\dots&a_{mn}\end{pmatrix}\begin{pmatrix}b_{11}&\dots&b_{1\ell}&\dots&b_{1k}\\ b_{21}&\dots&b_{2\ell}&\dots&b_{2k}\\ \vdots&&\vdots&&\vdots\\ b_{n1}&\dots&b_{n\ell}&\dots&b_{nk}\end{pmatrix} = \begin{pmatrix}c_{11}&\dots&c_{1\ell}&\dots&c_{1k}\\ \vdots&&\vdots&&\vdots\\ c_{j1}&\dots&c_{j\ell}&\dots&c_{jk}\\ \vdots&&\vdots&&\vdots\\ c_{m1}&\dots&c_{m\ell}&\dots&c_{mk}\end{pmatrix}.
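The formula for cj` translates into a short triple loop. The following Python sketch (our own illustration; the function name is made up) multiplies an m × n matrix with an n × k matrix "row by column" and reproduces the product computed in the example below:

def mat_mul(A, B):
    m, n, k = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "columns of A must equal rows of B"
    # c[j][l] = sum over r of A[j][r] * B[r][l]  (row j of A times column l of B)
    return [[sum(A[j][r] * B[r][l] for r in range(n)) for l in range(k)] for j in range(m)]

print(mat_mul([[1, 2, 3], [8, 6, 4]],
              [[7, 1, 2, 3], [2, 0, 1, 4], [2, 6, -3, 0]]))
# [[17, 19, -5, 11], [76, 32, 10, 48]]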
AB = \begin{pmatrix}1&2&3\\8&6&4\end{pmatrix}\begin{pmatrix}7&1&2&3\\2&0&1&4\\2&6&-3&0\end{pmatrix} = \begin{pmatrix}1\cdot 7+2\cdot 2+3\cdot 2 & 1\cdot 1+2\cdot 0+3\cdot 6 & 1\cdot 2+2\cdot 1+3\cdot(-3) & 1\cdot 3+2\cdot 4+3\cdot 0\\ 8\cdot 7+6\cdot 2+4\cdot 2 & 8\cdot 1+6\cdot 0+4\cdot 6 & 8\cdot 2+6\cdot 1+4\cdot(-3) & 8\cdot 3+6\cdot 4+4\cdot 0\end{pmatrix} = \begin{pmatrix}17&19&-5&11\\76&32&10&48\end{pmatrix}.
Let us see some properties of the algebraic operations for matrices that we just introduced. First, a warning: in general,
AB ≠ BA.
That matrix multiplication is not commutative is to be expected since it is the composition of two
functions (think of functions that you know from your calculus classes. For example, it does make
a difference if you first square a variable and then take the arctan or if you first calculate its arctan
and then square the result).
Let us see an example. Let B be the matrix from Example 3.26 and C be the matrix from
Example 3.27. Recall that B represents the orthogonal projection onto the y-axis and that C
represents counterclockwise rotation by 90◦ . If we take ~e1 (the unit vector in x-direction), and we
first rotate and then project, we get the vector ~e2 . If however we project first and rotate then, we
get ~0. That means, BC~e1 6= CB~e1 , therefore BC 6= CB. Let us calculate the products:
BC = \begin{pmatrix}0&0\\0&1\end{pmatrix}\begin{pmatrix}0&-1\\1&0\end{pmatrix} = \begin{pmatrix}0&0\\1&0\end{pmatrix} \qquad\text{(first rotation, then projection),}
CB = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\begin{pmatrix}0&0\\0&1\end{pmatrix} = \begin{pmatrix}0&-1\\0&0\end{pmatrix} \qquad\text{(first projection, then rotation).}
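A quick numerical sanity check of this non-commutativity (our own sketch, reusing B and C from the examples above):

import numpy as np
B = np.array([[0, 0], [0, 1]])    # projection onto the y-axis
C = np.array([[0, -1], [1, 0]])   # rotation by 90 degrees counterclockwise
print(np.array_equal(B @ C, C @ B))   # False: BC and CB differ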
Let A be the matrix from Example 3.25, B be the matrix from Example 3.26 and C the matrix
from Example 3.27. Verify that AB 6= BA and AC 6= CA and understand this result geometrically
by following for example where the unit vectors get mapped to.
Note also that usually, when AB is defined, the expression BA is not defined because in general
the number of columns of B will be different from the number of rows of A.
We finish this section with the definition of the so-called identity matrix.
Definition 3.32. Let n ∈ N. Then the n × n identity matrix is the matrix which has 1s on its
diagonal and has zero everywhere else:
\mathrm{id}_n = \begin{pmatrix}1&0&\dots&0\\0&1&&\vdots\\ \vdots&&\ddots&0\\0&\dots&0&1\end{pmatrix}.    (3.21)
As notation for the identity matrix, the following symbols are used in the literature: En , idn , Idn , In , 1n , 𝟙n . The subscript n can be omitted if the size of the matrix is clear.
Ejercicios.
1. Para A = \begin{pmatrix}1&3\\2&5\\-1&2\end{pmatrix}, B = \begin{pmatrix}-2&0\\1&4\\-7&5\end{pmatrix} y C = \begin{pmatrix}-1&1\\4&6\\-7&3\end{pmatrix}, calcular:
(a) 2A,
(b) 3C − 2A,
(c) A + B + C,
(d) 2A − 3B + 5C,
(e) una matriz D tal que A + 2B − 3C + D es la matriz de solo ceros.
2. Realice los siguientes cálculos (antes de hacer la multiplicación indicada, especifique cuál será
el tamaño de la matriz resultante al hacer el producto):
2 −3 5 1 4 6 1
(a) 1 0 6 −2 3 5 (d) 3 0 −2 5
2 3 1 1 0 4 −1
1 −1
1 4 −2 0 1
(b) 3 2 1 −2 4 3
3 0 4 2 3 (e)
−6 4 0 3 0 5
2 0
3 −6
2 4 1 6
(c) 1 4 0 2 1
7 1 4
0 (f) 0 4
2 −3 5
−2 3 −2 3
3 −1 −5 2 6 10
5. Sean A = yC = . Encuentre por lo menos una matriz B tal
4 2 1 0 −3 2
que AB = C. ¿Cuántas soluciones a la ecuación matricial AX = C existen?
Cϑ~e2 ~e2
Cϑ~e1
ϑ
ϑ
~e1
7. Si A =
1
0
1
1
8. Sean A, B ∈ M (n × n):
yB=
a
c
b
d
FT
, encuentre condiciones sobre a, b, c, d tal que AB = BA.
• Assume we are given the matrix A from Example 3.25 which represents reflection on the
x-axis and we want to find a matrix that restores a vector after we applied A to it. Clearly,
we have to reflect again on the x-axis: reflecting an arbitrary vector ~x twice on the x-axis
leaves the vector where it was. Let us check:
AA = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}1&0\\0&-1\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix} = \mathrm{id}_2 .
That means that for every ~x ∈ R2 , we have that A2 ~x = ~x, hence A is its own inverse.
• Assume we are given the matrix C from Example 3.27 which represents counterclockwise
rotation by 90◦ and we want to find a matrix that restores a vector after we applied C to it.
Clearly, we have to rotate clockwise by 90◦ . Let us assume that there exists a matrix which
represents this rotation and let us call it C−90◦ . By Remark 3.19 it is enough to know how it
acts on ~e1 and ~e2 in order to write it down. Clearly C−90◦~e1 = −~e2 and C−90◦~e2 = ~e1 , hence
C−90◦ = (−~e2 |~e1 ).
Let us check:
C_{-90°}C = \begin{pmatrix}0&1\\-1&0\end{pmatrix}\begin{pmatrix}0&-1\\1&0\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix} = \mathrm{id}_2
and
CC_{-90°} = \begin{pmatrix}0&-1\\1&0\end{pmatrix}\begin{pmatrix}0&1\\-1&0\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix} = \mathrm{id}_2
which was to be expected because rotating first 90◦ clockwise and then 90◦ counterclockwise,
leaves any vector where it is.
• Assume we are given the matrix B from Example 3.26 which represents projection onto the y-axis. In this case, we cannot restore a vector ~x after we projected it onto the y-axis. For example, if we know that B~x = \begin{pmatrix}0\\2\end{pmatrix}, then ~x could have been \begin{pmatrix}0\\2\end{pmatrix} or \begin{pmatrix}7\\2\end{pmatrix} or any other vector in R2 whose second component is equal to 2. This shows that B does not have an inverse.
(i) Given a certain number of packages of type A and of type B, how many peaches and how
many mangos do we get?
(ii) How many packages of each type do we need in order to obtain a given number of peaches
and mangos?
D
The first question is quite easy to answer. Let us write down the information that we are given. If a and b denote the number of packages of type A and of type B that we buy, and p and m denote the total number of peaches and of mangos that we obtain, then
p = 1a + 2b,
m = 3a + 1b.    (3.22)
Using vectors and matrices, we can rewrite this as
\begin{pmatrix}p\\m\end{pmatrix} = \begin{pmatrix}1&2\\3&1\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}.
Let A = \begin{pmatrix}1&2\\3&1\end{pmatrix}. Then the above becomes simply
\begin{pmatrix}p\\m\end{pmatrix} = A\begin{pmatrix}a\\b\end{pmatrix}.    (3.23)
If we know a and b (that is, we know how many packages of each type we bought), then we can
find the values of p and m by simply evaluating A( ab ) which is relatively easy.
Example 3.34. Assume that we buy 1 package of type A and 3 packages of type B. Then we calculate
\begin{pmatrix}p\\m\end{pmatrix} = A\begin{pmatrix}1\\3\end{pmatrix} = \begin{pmatrix}1&2\\3&1\end{pmatrix}\begin{pmatrix}1\\3\end{pmatrix} = \begin{pmatrix}7\\6\end{pmatrix},
which shows that we get 7 peaches and 6 mangos.
If on the other hand we know p and m and we are asked to find a and b such that (3.22) holds, we
have to solve a linear system which is much more cumbersome. Of course, we can solve (3.23) using
the Gauß or Gauß-Jordan elimination process, but if we were asked to do this for several pairs p
and m, then it would become long quickly. However, if we had a matrix A0 such that A0 A = id2 ,
then this task would be quite easy since in this case we could manipulate (3.23) as follows:
\begin{pmatrix}p\\m\end{pmatrix} = A\begin{pmatrix}a\\b\end{pmatrix} \;\Longrightarrow\; A'\begin{pmatrix}p\\m\end{pmatrix} = A'A\begin{pmatrix}a\\b\end{pmatrix} = \mathrm{id}_2\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}a\\b\end{pmatrix}.
The task of finding a and b then reduces to performing a matrix multiplication. The matrix A0 , if it
exists, is called the inverse of A and we will dedicate the rest of this section to give criteria for its
existence, investigate its properties and give a recipe for finding it.
Exercise. Check that A0 = \frac15\begin{pmatrix}-1&2\\3&-1\end{pmatrix} satisfies A0 A = id2 .
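For the record, the check amounts to a single matrix product (our own verification of the exercise):
A0 A = \frac15\begin{pmatrix}-1&2\\3&-1\end{pmatrix}\begin{pmatrix}1&2\\3&1\end{pmatrix} = \frac15\begin{pmatrix}-1+6&-2+2\\3-3&6-1\end{pmatrix} = \frac15\begin{pmatrix}5&0\\0&5\end{pmatrix} = \mathrm{id}_2 .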
Example 3.35. Assume that we want to buy 5 peaches and 5 mangos. Then we calculate
\begin{pmatrix}a\\b\end{pmatrix} = A0\begin{pmatrix}5\\5\end{pmatrix} = \frac15\begin{pmatrix}-1&2\\3&-1\end{pmatrix}\begin{pmatrix}5\\5\end{pmatrix} = \begin{pmatrix}1\\2\end{pmatrix},
so we should buy 1 package of type A and 2 packages of type B.
Definition 3.36. A matrix A ∈ M (n×n) is called invertible if there exists a matrix A0 ∈ M (n×n)
such that
AA0 = idn and A0 A = idn
In this case A0 is called the inverse of A and it is denoted by A−1 . If A is not invertible then it is
called non-invertible or singular.
The reason why in the definition we only admit square matrices (matrices with the same number of rows and columns) is explained in the following remark.
Remark 3.37. (i) Let A ∈ M (m×n) and assume that there is a matrix B such that BA = idn .
This means that if for some ~b ∈ Rm the equation A~x = ~b has a solution, then it is unique
because
A~x = ~b =⇒ BA~x = B~b =⇒ ~x = B~b.
From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm
the equation A~x = ~b has at most one solution. We know that if A has more columns than
rows, then the number of columns will be larger than the number of pivots. Therefore,
A~x = ~b has either no or infinitely many solutions (see Theorem 3.7). Hence a matrix A with
more columns than rows cannot have an inverse.
(ii) Again, let A ∈ M (m × n) and assume that there is a matrix B such that AB = idm . This
means that for every ~b ∈ Rm the equation A~x = ~b is solved by ~x = B~b because A(B~b) = (AB)~b = idm ~b = ~b.
From the above it is clear that A ∈ M (m × n) can have an inverse only if for every ~b ∈ Rm
the equation A~x = ~b has at least one solution. Assume that A has more rows than columns.
If we apply Gaussian elimination to the augmented matrix (A|~b), then the last row of the row-echelon form has to be (0 · · · 0|βm ). If we choose ~b such that after the reduction βm 6= 0,
then A~x = ~b does not have a solution. Such a ~b is easy to find: We only need to take ~em
(the mth unit vector) and do the steps from the Gauß elimination backwards. If we take
this vector as right hand side of our system, then the last row after the reduction will be
(0 . . . 0|1). Therefore, a matrix A with more rows than columns cannot have an inverse
because there will always be some ~b such that the equation A~x = ~b has no solution.
In conclusion, we have shown that we must have m = n if A is to have an inverse matrix.
A matrix C with CA = idn is called a left inverse of A, and a matrix D with AD = idm is called a right inverse of A. Note that C and D must be n × m matrices. The following examples show that left and right inverses need not exist, and if they do, they need not be unique.
Examples 3.39. (i) A = \begin{pmatrix}0&0\\0&0\end{pmatrix} has neither a left nor a right inverse.
(ii) A = \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix} has no left inverse and has the right inverse D = \begin{pmatrix}1&0\\0&1\\0&0\end{pmatrix}. In fact, for every x, y ∈ R the matrix \begin{pmatrix}1&0\\0&1\\x&y\end{pmatrix} is a right inverse of A.
(iii) A = \begin{pmatrix}1&0\\0&1\\0&0\end{pmatrix} has no right inverse and has the left inverse C = \begin{pmatrix}1&0&0\\0&1&0\end{pmatrix}. In fact, for every x, y ∈ R the matrix \begin{pmatrix}1&0&x\\0&1&y\end{pmatrix} is a left inverse of A.
Remark 3.40. We will show in Theorem 3.45 that a matrix A ∈ M (n × n) is invertible if and only
if it has a left- and a right inverse.
Examples 3.41. • From the examples at the beginning of this section we have:
A = \begin{pmatrix}1&0\\0&-1\end{pmatrix} \implies A^{-1} = A = \begin{pmatrix}1&0\\0&-1\end{pmatrix}, \qquad C = \begin{pmatrix}0&-1\\1&0\end{pmatrix} \implies C^{-1} = \begin{pmatrix}0&1\\-1&0\end{pmatrix}, \qquad B = \begin{pmatrix}0&0\\0&1\end{pmatrix} \implies B \text{ is not invertible.}
• Let A = \begin{pmatrix}4&0&0&0\\0&5&0&0\\0&0&-3&0\\0&0&0&2\end{pmatrix}. Then we can easily guess that A^{-1} = \begin{pmatrix}1/4&0&0&0\\0&1/5&0&0\\0&0&-1/3&0\\0&0&0&1/2\end{pmatrix} is an inverse of A. It is easy to check that the product of these matrices gives id4 .
• Let A ∈ M (n × n) and assume that the kth row of A consists of only zeros. Then A is not
invertible because for any matrix B ∈ M (n × n), the kth row of the product matrix AB will
be zero, no matter how we choose B. So there is no matrix B such that AB = idn .
• Let A ∈ M (n × n) and assume that the kth column of A consists of only zeros. Then A is
not invertible because for any matrix B ∈ M (n × n), the kth column of the product matrix
BA will be zero, no matter how we choose B. So there is no matrix B such that BA = idn .
Now let us prove some theorems about inverse matrices. Recall that A ∈ M (n × n) is invertible if
and only if there exists a matrix A0 ∈ M (n × n) such that AA0 = A0 A = idn .
First we will show that the inverse matrix, if it exists, is unique.
Theorem 3.42. Let A, B ∈ M (n × n).
(i) If A is invertible, then its inverse is unique.
(ii) If A is invertible, then its inverse A−1 is invertible and its inverse is A.
(iii) If A and B are invertible, then their product AB is invertible and (AB)−1 = B −1 A−1 .
Proof. (i) Assume that A is invertible and that A0 and A00 are inverses of A. Note that this
means that
AA0 = A0 A = idn and AA00 = A00 A = idn . (3.25)
We have to show that A0 = A00 . This follows from (3.25) and from the associativity of the
matrix multiplication because A0 = A0 idn = A0 (AA00 ) = (A0 A)A00 = idn A00 = A00 .
(ii) Assume that A is invertible and let A−1 be its inverse. In order to show that A−1 is invertible,
we need a matrix C such that CA−1 = A−1 C = idn . This matrix C is then the inverse of
A−1 . Clearly, C = A does the trick. Therefore A−1 is invertible and (A−1 )−1 = A.
(iii) Assume that A and B are invertible. In order to show that AB is invertible and (AB)−1 =
B −1 A−1 , we only need to verify that (B −1 A−1 )(AB) = (AB)(B −1 A−1 ) = idn . We see that this is true using the associativity of the matrix product:
(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}\,\mathrm{id}_n\,B = B^{-1}B = \mathrm{id}_n , \qquad (AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = A\,\mathrm{id}_n\,A^{-1} = AA^{-1} = \mathrm{id}_n .
Note that in the proof we guessed the formula for (AB)−1 and then we verified that it indeed is the
inverse of AB. We can also calculate it as follows. Assume that C is a left inverse of AB. Then
C(AB) = idn ⇐⇒ (CA)B = idn ⇐⇒ CA = idn B −1 = B −1 ⇐⇒ C = B −1 A−1
Remark 3.43. In general, the sum of invertible matrices is not invertible. For example, both idn
and − idn are invertible, but their sum is the zero matrix which is not invertible.
Theorem 3.44 in the next section will show us how to find the inverse of an invertible matrix; see in
particular the section on page 113.
Ejercicios.
A = \begin{pmatrix}3&1\\5&2\end{pmatrix}, \quad B = \begin{pmatrix}1/3&1\\1/5&1/2\end{pmatrix}, \quad C = \begin{pmatrix}2&-1\\-5&3\end{pmatrix}, \quad D = \begin{pmatrix}-3&-1\\-5&-2\end{pmatrix}, \quad E = \begin{pmatrix}-2&1\\5&-3\end{pmatrix}.
2. Muestre que \begin{pmatrix}-1&0&0\\0&-1&0\\-2&-4&1\end{pmatrix} es su propia inversa.
3. Verifique que \begin{pmatrix}4&-3\\0&0\\-1&1\end{pmatrix} es una inversa a derecha de la matriz A = \begin{pmatrix}1&2&3\\1&3&4\end{pmatrix} y úsela para encontrar una solución particular al sistema A~x = ~b.
4. Sea A = \begin{pmatrix}1&2\\1&2\end{pmatrix}. ¿Es A invertible? Hint. Suponga que AB = id2 para alguna matriz B ∈ M (2 × 2). ¿Cómo se ve la primera columna si uno evalúa el producto AB?
Observe that the case (1), no solution, cannot occur for homogeneous systems.
The next theorem connects the above to invertibility of the matrix representing the system.
We will complete this theorem with one more item in Chapter 4 (Theorem 4.11).
the number of pivots is equal to n (the number of columns of A) in every row-reduced echelon form
of A.
(iv) ⇒ (v) is clear.
(v) ⇒ (ii) follows from case (2.1) above (or by Theorem 3.7(2.1)) because no row-reduced form of
A can have a row consisting of only zeros.
So far we have shown that (ii) - (v) are equivalent. Now we have to connect them to (i).
(i) ⇒ (ii) Assume that A is invertible and let ~b ∈ Rn . Then A~x = ~b ⇐⇒ ~x = A−1~b which shows
existence and uniqueness of the solution.
(ii) ⇒ (i) Assume that (ii) holds. We will construct A−1 as follows (this also tells us how we can
calculate A−1 if it exists). Recall that we need a matrix C such that AC = idn . This C will
then be our candidate for A−1 (we still would have to check that CA = idn ). Let us denote the
columns of C by ~cj for j = 1, . . . , n, so that C = (~c1 | · · · |~cn ). Recall that the kth column of AC is
A(kth column of C) and that the columns of idn are exactly the unit vectors ~ek (the vector with a
1 as kth component and zeros everywhere else). Then AC = idn can be written as A~c1 = ~e1 , . . . , A~cn = ~en .
By (ii) we know that equations of the form A~x = ~ej have a unique solution. So we only need
to set ~cj = unique solution of the equation A~x = ~ej . With this choice we then have indeed that
AC = idn .
It remains to show that CA = idn . To this end, note that A(CA~x) = (AC)(A~x) = idn (A~x) = A~x for every ~x ∈ Rn .
This means that A(idn −CA)~x = ~0 for every ~x ∈ Rn . Since by (ii) the equation A~y = ~0 has the
unique solution ~y = ~0, it follows that (idn −CA)~x = ~0 for every x ∈ Rn . But this means that
~x = CA~x for every ~x, hence CA must be equal to idn .
Theorem 3.45. Let A ∈ M (n × n).
(i) If A has a left inverse C (that is, if CA = idn ), then A is invertible and A−1 = C.
(ii) If A has a right inverse D (that is, if AD = idn ), then A is invertible and A−1 = D.
Proof. (i) By Theorem 3.44 it suffices to show that A~x = ~0 has the unique solution ~0. So
assume that ~x ∈ Rn satisfies A~x = ~0. Then ~x = idn ~x = (CA)~x = C(A~x) = C~0 = ~0. This
shows that A is invertible. Moreover, C = C(idn ) = C(AA−1 ) = (CA)A−1 = idn A−1 = A−1 ,
hence C = A−1 .
(ii) By (i) applied to D, it follows that D has an inverse and that D−1 = A, so by Theo-
rem 3.42 (ii), A is invertible and A−1 = (D−1 )−1 = D.
Let A be a square matrix. The proof of Theorem 3.44 tells us how to find its inverse if it exists.
We only need to solve A~x = ~ek for k = 1, . . . , n. This might be cumbersome and long, but we
already know that if these equations have solutions, then we can find them with the Gauß-Jordan
elimination. We only need to form the augmented matrix (A|~ek ), apply row operations until we get
to (idn |~ck ). Then ~ck is the solution of A~x = ~ek and we obtain the matrix A−1 as the matrix whose
columns are the vectors ~c1 , . . . , ~cn . If it is not possible to reduce A to the identity matrix, then it
is not invertible.
Note that the steps that we have to perform to reduce A to the identity matrix depend only on
the coefficients in A and not on the right hand side. So we can calculate the n vectors ~c1 , . . . ~cn
with only one (big) Gauß-Jordan elimination if we augment our given matrix A by the n vectors
~e1 , . . . ,~en . But the matrix (~e1 | · · · |~en ) is nothing else than the identity matrix idn . So if we take
(A| idn ) and apply the Gauß-Jordan elimination and if we can reduce A to the identity matrix,
then the columns on the right are the columns of the inverse matrix A−1 . If we cannot get to the
identity matrix, then A is not invertible.
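The recipe (A| idn ) → (idn |A−1 ) is easy to automate. The following Python sketch is a minimal illustration of the idea (ours, not from the notes); it works with floating point numbers and uses partial pivoting, a practical detail not discussed in the text:

def inverse(A):
    n = len(A)
    M = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("matrix is not invertible")
        M[col], M[piv] = M[piv], M[col]                      # swap rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]                     # scale the pivot row
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]   # eliminate the column
    return [row[n:] for row in M]                            # the right half is A^{-1}

print(inverse([[1, 2], [3, 4]]))   # [[-2.0, 1.0], [1.5, -0.5]]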
Examples 3.46. (i) Let A = \begin{pmatrix}1&2\\3&4\end{pmatrix}. Let us show that A is invertible by reducing the augmented matrix (A| id2 ):
(A|\mathrm{id}_2) = \left(\begin{array}{cc|cc}1&2&1&0\\3&4&0&1\end{array}\right) \xrightarrow{R_2-3R_1\to R_2} \left(\begin{array}{cc|cc}1&2&1&0\\0&-2&-3&1\end{array}\right) \xrightarrow{R_1+R_2\to R_1} \left(\begin{array}{cc|cc}1&0&-2&1\\0&-2&-3&1\end{array}\right) \xrightarrow{-\frac12 R_2\to R_2} \left(\begin{array}{cc|cc}1&0&-2&1\\0&1&3/2&-1/2\end{array}\right).
Hence A is invertible and A^{-1} = \begin{pmatrix}-2&1\\3/2&-1/2\end{pmatrix}.
We can check our result by calculating
\begin{pmatrix}1&2\\3&4\end{pmatrix}\begin{pmatrix}-2&1\\3/2&-1/2\end{pmatrix} = \begin{pmatrix}-2+3&1-1\\-6+6&3-2\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}
and
\begin{pmatrix}-2&1\\3/2&-1/2\end{pmatrix}\begin{pmatrix}1&2\\3&4\end{pmatrix} = \begin{pmatrix}-2+3&-4+4\\3/2-3/2&3-2\end{pmatrix} = \begin{pmatrix}1&0\\0&1\end{pmatrix}.
(ii) Let A = \begin{pmatrix}1&2\\-2&-4\end{pmatrix}. Let us show that A is not invertible by reducing the augmented matrix (A| id2 ):
(A|\mathrm{id}_2) = \left(\begin{array}{cc|cc}1&2&1&0\\-2&-4&0&1\end{array}\right) \xrightarrow{R_2+2R_1\to R_2} \left(\begin{array}{cc|cc}1&2&1&0\\0&0&2&1\end{array}\right).
Since there is a zero row in the left matrix, we conclude that A is not invertible.
(iii) Let A = \begin{pmatrix}1&1&1\\0&2&3\\5&5&1\end{pmatrix}. Let us show that A is invertible by reducing the augmented matrix (A| id3 ):
(A|\mathrm{id}_3) = \left(\begin{array}{ccc|ccc}1&1&1&1&0&0\\0&2&3&0&1&0\\5&5&1&0&0&1\end{array}\right) \xrightarrow{R_3-5R_1\to R_3} \left(\begin{array}{ccc|ccc}1&1&1&1&0&0\\0&2&3&0&1&0\\0&0&-4&-5&0&1\end{array}\right)
\xrightarrow[4R_1+R_3\to R_1]{4R_2+3R_3\to R_2} \left(\begin{array}{ccc|ccc}4&4&0&-1&0&1\\0&8&0&-15&4&3\\0&0&-4&-5&0&1\end{array}\right) \xrightarrow{2R_1-R_2\to R_1} \left(\begin{array}{ccc|ccc}8&0&0&13&-4&-1\\0&8&0&-15&4&3\\0&0&-4&-5&0&1\end{array}\right)
\xrightarrow[-\frac14 R_3\to R_3]{\frac18 R_1\to R_1,\ \frac18 R_2\to R_2} \left(\begin{array}{ccc|ccc}1&0&0&13/8&-1/2&-1/8\\0&1&0&-15/8&1/2&3/8\\0&0&1&5/4&0&-1/4\end{array}\right).
Hence A is invertible and A^{-1} = \begin{pmatrix}13/8&-1/2&-1/8\\-15/8&1/2&3/8\\5/4&0&-1/4\end{pmatrix} = \frac18\begin{pmatrix}13&-4&-1\\-15&4&3\\10&0&-2\end{pmatrix}.
We can check our result by calculating
\begin{pmatrix}1&1&1\\0&2&3\\5&5&1\end{pmatrix}\begin{pmatrix}13/8&-1/2&-1/8\\-15/8&1/2&3/8\\5/4&0&-1/4\end{pmatrix} = \dots = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}
and
\begin{pmatrix}13/8&-1/2&-1/8\\-15/8&1/2&3/8\\5/4&0&-1/4\end{pmatrix}\begin{pmatrix}1&1&1\\0&2&3\\5&5&1\end{pmatrix} = \dots = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.
Let A = \begin{pmatrix}a&b\\c&d\end{pmatrix} ∈ M (2 × 2). By Theorem 3.44, A is invertible if and only if for every right hand side the corresponding
linear system has exactly one solution. By Theorem 1.11 this is the case if and only if det A 6= 0.
Recall that det A = ad − bc. So let us assume here that det A 6= 0.
Case 1. a 6= 0.
(A|\mathrm{id}_2) = \left(\begin{array}{cc|cc}a&b&1&0\\c&d&0&1\end{array}\right) \xrightarrow{aR_2-cR_1\to R_2} \left(\begin{array}{cc|cc}a&b&1&0\\0&ad-bc&-c&a\end{array}\right) \xrightarrow{R_1-\frac{b}{ad-bc}R_2\to R_1} \left(\begin{array}{cc|cc}a&0&\frac{ad}{ad-bc}&-\frac{ab}{ad-bc}\\0&ad-bc&-c&a\end{array}\right)
\xrightarrow{\frac1a R_1\to R_1,\ \frac{1}{ad-bc}R_2\to R_2} \left(\begin{array}{cc|cc}1&0&\frac{d}{ad-bc}&-\frac{b}{ad-bc}\\0&1&-\frac{c}{ad-bc}&\frac{a}{ad-bc}\end{array}\right).
It follows that
A^{-1} = \frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}.    (3.27)
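As a quick cross-check against Example 3.46 (i) above (our own remark): for A = \begin{pmatrix}1&2\\3&4\end{pmatrix} we have det A = 1·4 − 2·3 = −2, and (3.27) gives
A^{-1} = \frac{1}{-2}\begin{pmatrix}4&-2\\-3&1\end{pmatrix} = \begin{pmatrix}-2&1\\3/2&-1/2\end{pmatrix},
exactly the matrix found there by row reduction.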
• the relation between the invertibility of a square matrix A and the existence and uniqueness
of solution of A~x = ~b,
• that inverting a matrix is the same as solving a linear system,
• etc.
You should now be able to
Ejercicios.
1. Determine la inversa (si es posible) de las siguientes matrices:
1 1 1
1 3 0 −1 6
(a) (b) 2 3 (c)
−2 6 2 −12
5 5 1
1 1 1 1
3 2 1 1 0 0
0 −3
1 2 −1 2
(d) 2 2 (e) 0 0 (f)
1 −1 2 1
0 0 −1 4 3 1
FT
1 3 3 2
1 1 1 1 0 0
(g) 1 2 2 (h) 0 −5 0
0 3 3 0 0 3
2. De los siguientes sistemas de ecuaciones lineales, ¿cuál tiene una solución no trivial?
2x + y − z = 0 x−y−z =0
i) x − 2y − 3z = 0 ii) 2x + y + 2z = 0
−3x − y − z = 0 −2x + 5y + 6z = 0
4. Calcular la inversa de la matriz de rotación Cϑ = \begin{pmatrix}\cos\vartheta&-\sin\vartheta\\ \sin\vartheta&\cos\vartheta\end{pmatrix}. ¿Cómo se interpreta geométricamente Cϑ−1 ? (ver Sección 3.4, Ejercicio 6.).
5. Calcular la inversa de \begin{pmatrix}1&0&0\\0&\cos\vartheta&-\sin\vartheta\\0&\sin\vartheta&\cos\vartheta\end{pmatrix}.
6. Sea A ∈ M (n × n) no invertible. Demuestre que existe B ∈ M (n × n), B 6= O tal que
AB = O. (Hint: considere el sistema homogéneo A~x = ~0).
7. Sean B ∈ M (6 × 5) y C ∈ M (5 × 6). Muestre que BC no puede ser una matriz invertible.
(Hint: considere el sistema homogéneo C~x = ~0).
If we denote A^t = (\tilde a_{ij})_{i=1,\dots,n;\ j=1,\dots,m}, then \tilde a_{ij} = a_{ji} for i = 1, . . . , n and j = 1, . . . , m.
Proof. Clear.
Proof. Note that both (AB)t and B t At are m × k matrices. In order to show that they are equal,
we only need to show that they are equal in every entry. Let i ∈ {1, . . . , m} and j ∈ {1, . . . , k}.
Then
\bigl((AB)^t\bigr)_{ij} = (AB)_{ji} = \sum_{r=1}^{n} a_{jr}b_{ri} = \sum_{r=1}^{n} (B^t)_{ir}(A^t)_{rj} = (B^tA^t)_{ij}.
Theorem 3.51. Let A ∈ M (n × n). Then A is invertible if and only if At is invertible. In this
case, (At )−1 = (A−1 )t .
Proof. Assume that A is invertible. Then AA−1 = id. Taking the transpose on both sides, we find (A−1 )t At = (AA−1 )t = idt = id.
This shows that At is invertible and its inverse is (A−1 )t , see Theorem 3.45. Now assume that
At is invertible. From what we just showed, it follows that then also its transpose (At )t = A is
invertible.
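Both statements are easy to test numerically; here is a small sketch (ours, not from the notes):

import numpy as np
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1, 2], [1, 0, 3]])
print(np.array_equal((A @ B).T, B.T @ A.T))                  # True: (AB)^t = B^t A^t
print(np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T))   # True: (A^t)^{-1} = (A^{-1})^t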
Next we show an important relation between transposition of a matrix and the inner product on
Rn .
Proposition. Let A ∈ M (m × n). Then:
(i) hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn and all ~y ∈ Rm .
(ii) If hA~x , ~y i = h~x , B~y i for all ~x ∈ Rn and all ~y ∈ Rm , then B = At .
(ii) We have to show: For all i = 1, . . . , m and j = 1, . . . , n we have that aij = bji . Take
~x = ~ej ∈ Rn and ~y = ~ei ∈ Rm . If we take the inner product of A~ej with ~ei , then we obtain
the ith component of A~ej . Recall that A~ej is the jth column of A, hence hA~ej , ~ei i = aij .
Similarly, if we take the inner product of B~ei with ~ej , then we obtain the jth component of B~ei . Since B~ei is the ith column of B, it follows that h~ej , B~ei i = bji .
By assumption hA~ej ,~ei i = h~ej , B~ei i, hence it follows that aij = bji , hence B = At .
That means that for an upper triangular matrix all entries below the diagonal are zero, for a lower
triangular matrix all entries above the diagonal are zero and for a diagonal matrix, all entries except
the ones on the diagonal must be zero. These matrices look as follows:
\begin{pmatrix}a_{11}&&*\\&\ddots&\\0&&a_{nn}\end{pmatrix} \text{ (upper triangular matrix)}, \qquad \begin{pmatrix}a_{11}&&0\\&\ddots&\\ *&&a_{nn}\end{pmatrix} \text{ (lower triangular matrix)}, \qquad \begin{pmatrix}a_{11}&&0\\&\ddots&\\0&&a_{nn}\end{pmatrix} \text{ (diagonal matrix } \mathrm{diag}(a_{11},\dots,a_{nn})\text{)}.
Remark 3.54. A matrix is both upper and lower triangular if and only if it is diagonal.
Examples 3.55.
FT
0 2 4 2 0 0 0 0
1 2 4 0 2 0 0 0 0 0
0 5 2 0 3 0 0
A = 0 2 5 , B =
0
, C = , D = 0 3 0 , E = 0 0 0 .
0 0 8 3 4 0 0
0 0 3 0 0 8 0 0 0
0 0 0 0 5 0 0 1
The matrices A, B, D, E are upper triangular, C, D, E are lower triangular, D, E are diagonal.
Definition 3.56. (i) A matrix A ∈ M (n × n) is called symmetric if At = A. The set of all
symmetric n × n matrices is denoted by Msym (n × n).
Examples 3.57.
A = \begin{pmatrix}1&7&4\\7&2&5\\4&5&3\end{pmatrix}, \quad B = \begin{pmatrix}3&0&4\\0&4&0\\4&0&1\end{pmatrix}, \quad C = \begin{pmatrix}0&2&-5\\-2&0&-3\\5&3&0\end{pmatrix}, \quad D = \begin{pmatrix}0&0&8\\0&3&0\\2&0&0\end{pmatrix}.
The matrices A and B are symmetric, C is antisymmetric, and D is neither.
Question 3.2
How many possibilities are there to express a given matrix A ∈ M (n × n) as sum of a symmetric
and an antisymmetric matrix?
Exercise 3.59. Show that the diagonal entries of an antisymmetric matrix are 0.
• calculate the transpose of a given matrix,
• check if a matrix is symmetric, antisymmetric or none,
• etc.
Ejercicios.
1. (a) Encuentre las transpuestas de las siguientes matrices:
RA
0
1 3
−1 4 2 −1 1 5 −1
(a) , (b) −1 2 , (c) , 4 .
(d)
10 8 0 0 4 13
4 5
3
(b) Para cada una de las matrices del punto anterior, verifique la igualdad hA~x , ~y i =
hx , At yi.
1 β + 5 α + 2β − 2
D
4. Si A, B ∈ Msym (n × n), muestre que (AB)t = BA. ¿Se puede concluir que AB es simétrica?.
5. Sean v~1 , v~2 , v~3 ∈ R3 tales que hvi , vj i = 0 si i 6= j y hvi , vj i = 1 si i = j. Sea A = [v~1 v~2 v~3 ]
la matiz cuyas columnas son los vectores dados. Muestre que AAt = id3 (ver Sección 3.6,
Ejercicio 5.).
7. (a) Muestre que una matriz triangular superior (inferior) es invertible si y solo si todas las
entradas de la diagonal principal son distintas de 0.
(b) Sea D ∈ M (n × n) una matriz diagonal. Determine cuando D es invertible y halle su
inversa
(i) Sj (c) = \begin{pmatrix}1&&&&\\&\ddots&&&\\&&c&&\\&&&\ddots&\\&&&&1\end{pmatrix} (the entry c sits in row j and column j), for j = 1, . . . , n and c 6= 0. All entries outside the diagonal are 0.
(ii) Qjk (c) = \begin{pmatrix}1&&&&\\&\ddots&&c&\\&&\ddots&&\\&&&\ddots&\\&&&&1\end{pmatrix} (the number c sits in row j and column k), for j, k = 1, . . . , n with j 6= k and c ∈ R. All entries apart from c and the diagonal are 0.
(iii) Pjk = \begin{pmatrix}1&&&&&\\&0&&1&&\\&&\ddots&&&\\&1&&0&&\\&&&&&1\end{pmatrix}, for j, k = 1, . . . , n. This matrix is obtained from the identity matrix by swapping rows j and k (or, equivalently, by swapping columns j and k).
Let us see how these matrices act on other n × n matrices. Let A = (aij )ni,j=1 ∈ M (n × n). We
want to calculate EA where E is an elementary matrix.
• S_j(c)A = \begin{pmatrix}a_{11}&a_{12}&\dots&a_{1n}\\ \vdots&&&\vdots\\ ca_{j1}&ca_{j2}&\dots&ca_{jn}\\ \vdots&&&\vdots\\ a_{n1}&a_{n2}&\dots&a_{nn}\end{pmatrix}
• Q_{jk}(c)A = \begin{pmatrix}a_{11}&\dots&a_{1n}\\ \vdots&&\vdots\\ a_{j1}+ca_{k1}&\dots&a_{jn}+ca_{kn}\\ \vdots&&\vdots\\ a_{k1}&\dots&a_{kn}\\ \vdots&&\vdots\\ a_{n1}&\dots&a_{nn}\end{pmatrix}
• P_{jk}A = \begin{pmatrix}a_{11}&\dots&a_{1n}\\ \vdots&&\vdots\\ a_{k1}&\dots&a_{kn}\\ \vdots&&\vdots\\ a_{j1}&\dots&a_{jn}\\ \vdots&&\vdots\\ a_{n1}&\dots&a_{nn}\end{pmatrix} (rows j and k of A interchanged).
• Sj (c) multiplies the jth row of A by c.
• Qjk (c) sums c times the kth row to the jth row of A.
• Pjk swaps the kth and the jth row of A.
These are exactly the row operations from the Gauß or Gauß-Jordan elimination! So we see that
every row operation can be achieved by multiplying from the left by an appropriate elementary
matrix.
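To make the correspondence concrete, here is a small numpy sketch (ours, not from the notes; it uses 0-based indices) that builds the three types of elementary matrices and checks their action on a matrix:

import numpy as np

def S(n, j, c):                 # multiply row j by c
    E = np.eye(n); E[j, j] = c; return E

def Q(n, j, k, c):              # add c times row k to row j
    E = np.eye(n); E[j, k] = c; return E

def P(n, j, k):                 # swap rows j and k
    E = np.eye(n); E[[j, k]] = E[[k, j]]; return E

A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print(S(3, 0, 2) @ A)      # first row doubled
print(Q(3, 2, 0, -7) @ A)  # 7 times the first row subtracted from the last row
print(P(3, 0, 1) @ A)      # first two rows swapped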
Remark 3.63. The form of the elementary matrices is quite easy to remember if you recall that
E idn = E for every matrix E, in particular for an elementary matrix. So, if you want to remember
e.g. what the 5 × 5 matrix looks like which sums 3 times the 2nd row to the 4th, just remember that this matrix is
E = E\,\mathrm{id}_5 = (\text{take } \mathrm{id}_5 \text{ and sum 3 times its 2nd row to its 4th row}) = \begin{pmatrix}1&0&0&0&0\\0&1&0&0&0\\0&0&1&0&0\\0&3&0&1&0\\0&0&0&0&1\end{pmatrix}.
Question 3.3
How do elementary matrices act on other matrices if we multiply them from the right?
Hint. There are two ways to find the answer. One is to carry out the matrix multiplication as we
did on page 122. Or you could use that AE = [(AE)t ]t = [E t At ]t . If E is an elementary matrix,
then so is E t , see Proposition 3.65. Since you know how E t At looks like, you can then deduce
how its transpose looks like.
Since the action of an elementary matrix can be “undone” (since the corresponding row operation
can be undone), we expect them to be invertible. The next proposition shows that they indeed are
and that their inverse is again an elementary matrix of the same type.
Show that Proposition 3.64 is true. Convince yourself that it is true using their interpretation as
row operations.
Exercise 3.66. Show that Qjk (c) = Sk (c−1 )Qjk (1)Sk (c) for c 6= 0. Interpret the formula in
terms of row operations.
Exercise. Show that Pjk can be written as product of matrices of the form Qjk (c) and Sj (c).
Let us come back to the relation of elementary matrices and the Gauß-Jordan elimination process.
Proposition 3.67. Let A ∈ M (n × n) and let A0 be a row echelon form of A. Then there exist
elementary matrices E1 , . . . , Ek such that
A = E1 E2 · · · Ek A0 .
Proof. We know that we can arrive at A0 by applying suitable row operations to A. By Propo-
sition 3.62 they correspond to multiplication of A from the left by suitable elementary matrices
Fk , Fk−1 , . . . , F2 , F1 , that is
A0 = Fk Fk−1 · · · F2 F1 A.
We know that all the Fj are invertible, hence their product is invertible and we obtain A = F1−1 F2−1 · · · Fk−1 A0 .
We know that the inverse of every elementary matrix Fj is again an elementary matrix, so if we set
Ej = Fj−1 for j = 1, . . . , k, the proposition is proved.
Corollary 3.68. Let A ∈ M (n × n). Then there exist elementary matrices E1 , . . . , Ek and an
upper triangular matrix U such that
A = E1 E2 · · · Ek U.
Proof. This follows immediately from Proposition 3.67 if we recall that every row reduced echelon
form of A is an upper triangular matrix.
The next theorem shows that every invertible matrix is “composed” of elementary matrices.
Theorem 3.69. Let A ∈ M (n × n). Then A is invertible if and only if it can be written as product
of elementary matrices.
Proof. Assume that A is invertible. Then the reduced row echelon form of A is idn . Therefore,
by Proposition 3.67, there exist elementary matrices E1 , . . . , Ek such that A = E1 · · · Ek idn =
E1 · · · Ek .
If, on the other hand, we know that A is the product of elementary matrices, say, A = F1 · · · F` , then
clearly A is invertible since each elementary matrix Fj is invertible and the product of invertible
matrices is invertible.
We finish this section with an exercise where we write an invertible 2 × 2 matrix as product of
elementary matrices. Notice that there are infinitely many ways to write it as product of elementary
matrices just as there are infinitely many ways of performing row reduction to get to the identity
matrix.
Example 3.70. Write the matrix A = \begin{pmatrix}1&2\\3&4\end{pmatrix} as a product of elementary matrices.
Solution. We use the idea of the proof of Theorem 3.44: we apply the Gauß-Jordan elimination
process and write the corresponding row transformations as elementary matrices.
\begin{pmatrix}1&2\\3&4\end{pmatrix} \xrightarrow[Q_{21}(-3)]{R_2\to R_2-3R_1} \begin{pmatrix}1&2\\0&-2\end{pmatrix} \xrightarrow[Q_{12}(1)]{R_1\to R_1+R_2} \begin{pmatrix}1&0\\0&-2\end{pmatrix} \xrightarrow[S_2(-1/2)]{R_2\to -\frac12 R_2} \begin{pmatrix}1&0\\0&1\end{pmatrix},
where the intermediate matrices are Q21 (−3)A, then Q12 (1)Q21 (−3)A, and finally S2 (−1/2)Q12 (1)Q21 (−3)A. (The operation R1 → R1 + R2 adds the second row to the first, so the corresponding elementary matrix is Q12 (1).) So we obtain that
id2 = S2 (−1/2) Q12 (1) Q21 (−3) A. (3.28)
Since the elementary matrices are invertible, we can solve for A and obtain
A = [Q21 (−3)]−1 [Q12 (1)]−1 [S2 (−1/2)]−1 = Q21 (3) Q12 (−1) S2 (−2).
Note that from (3.28) we get the factorisation for A−1 for free. Clearly, we must have A−1 = S2 (−1/2) Q12 (1) Q21 (−3).
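A quick numerical check of this factorisation (our own sketch):

import numpy as np
Q21_3  = np.array([[1, 0], [3, 1]])    # Q21(3): add 3 times row 1 to row 2
Q12_m1 = np.array([[1, -1], [0, 1]])   # Q12(-1): subtract row 2 from row 1
S2_m2  = np.array([[1, 0], [0, -2]])   # S2(-2): multiply row 2 by -2
print(Q21_3 @ Q12_m1 @ S2_m2)          # [[1 2] [3 4]], i.e. the matrix A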
Ejercicios.
1. Determine cuáles de las siguientes son matrices elementales:
1 0 0 1
1 0 −1 1 0 −3 0
(a) (b) (c) 1
2 1 0 3
2 0 1
1 0 0 0 1 0 0 0
0 1 0 0 1 1 0 1
0 1 0 0
(d) 0 1 (e)
0 0 1 0
(f)
0
0 1 0
1 0 0
0 0 0 1 0 0 1 1
2. Muestre que cada una de las siguientes matrices es invertible y factorı́cela como un producto
de matrices elementales:
1 1 0 0
2 0 4 3 1 1
2 1 3 2 4 0
(a) , (b) 0 1 1 , (c)
0 0 −2
, (d) 0 2 4 .
3 2 1
3 −1 1 0 0 −1
FT
0 0 0 1
3. Escriba cada matriz como producto de matrices elementales y una matriz triangular superior:
2 −2 2 1 3 1 0 0 0 0
(a) , (d) .
−2 6 (b) 0 −3 1, (c) 5 0 0, 2 0
1 0 2 −2 1 3
RA
4. En los siguientes problemas, encuentre una matriz elemental E tal que EA = B:
12 1 2
1 2 3 6
(a) A = , B= (b) A = 3 4 , B = 3 4
−1 3 −1 3
56 1 −2
1 4 7 8 −2 5 3
1 −3 0 5 −5
(c) A = 5 6 , B = 5 6 (d) A = 2 −1 0 4 , B = 2 −1 0 4
7 8 1 4 5 1 −3 2 5 1 −3 2
D
5 1 2 0 11 2
1 −3 −2 0
(e) A = , B= (f) A = −1 3 4 , B = −1 3 4
−1 1 −1 1
1 −2 0 1 −2 0
5. (a) Sea A ∈ M (3 × 3) una matriz triangular superior (inferior) tal que las entradas de su
diagonal son todas distintas de 0. Muestre que A se factoriza como producto de a lo más
seis matrices elementales.
(b) Sean A, B ∈ M (3 × 3) matrices triangulares superiores (inferiores). Muestre que AB es
una matriz triangular superior (inferior).
(c) Sea A ∈ M (3 × 3) una matriz triangular superior (inferior) tal que las entradas de su
diagonal son todas distintas de 0. Muestre que A−1 es triangular superior (inferior).
3.9 Summary
Elementary row operations (= operations which lead to an equivalent system) for
solving a linear system.
• If the system is homogeneous, then it has either exactly one or infinitely many solutions. It
always has at least one solution, namely the trivial one.
• The set of all solutions of a homogeneous linear system is a vector space.
• The set of all solutions of an inhomogeneous linear system is an affine vector space.
For A ∈ M (m × n) and ~b ∈ Rm consider the equation A~x = ~b. Then the following is true:
(1) No solution ⇐⇒ The reduced row echelon form of the augmented
system (A|~b) has a row of the form (0 · · · 0|β) with
some β 6= 0.
(2) At least one solution ⇐⇒ The reduced row echelon form of the augmented
system (A|~b) has no row of the form (0 · · · 0|β)
with some β 6= 0.
In case (2), we have the following two sub-cases:
(2.1) Exactly one solution ⇐⇒ # pivots = # columns.
(2.2) Infinitely many solutions ⇐⇒ # pivots < # columns.
Definition.
A\vec x = \begin{pmatrix}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{2n}\\ \vdots&&&\vdots\\a_{m1}&a_{m2}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}x_1\\x_2\\\vdots\\x_n\end{pmatrix} = \begin{pmatrix}a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n\\a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n\\ \vdots\\a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n\end{pmatrix},
A+B = \begin{pmatrix}a_{11}&\cdots&a_{1n}\\ \vdots&&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix} + \begin{pmatrix}b_{11}&\cdots&b_{1n}\\ \vdots&&\vdots\\b_{m1}&\cdots&b_{mn}\end{pmatrix} = \begin{pmatrix}a_{11}+b_{11}&\cdots&a_{1n}+b_{1n}\\ \vdots&&\vdots\\a_{m1}+b_{m1}&\cdots&a_{mn}+b_{mn}\end{pmatrix},
AB = \begin{pmatrix}a_{11}&\cdots&a_{1n}\\ \vdots&&\vdots\\a_{m1}&\cdots&a_{mn}\end{pmatrix}\begin{pmatrix}b_{11}&\cdots&b_{1k}\\ \vdots&&\vdots\\b_{n1}&\cdots&b_{nk}\end{pmatrix} = \begin{pmatrix}a_{11}b_{11}+\cdots+a_{1n}b_{n1}&\cdots&a_{11}b_{1k}+\cdots+a_{1n}b_{nk}\\ \vdots&&\vdots\\a_{m1}b_{11}+\cdots+a_{mn}b_{n1}&\cdots&a_{m1}b_{1k}+\cdots+a_{mn}b_{nk}\end{pmatrix} = (c_{j\ell})_{j\ell}
with
c_{j\ell} = \sum_{h=1}^{n} a_{jh}b_{h\ell}.
• Product of matrices with vector or matrix with matrix: “multiply row by column”.
• A1 + A2 = A2 + A1 ,
• (A1 + A2 ) + A3 = A1 + (A2 + A3 ),
• (AB)C = A(BC),
• in general, AB 6= BA,
• (AB)~z = A(B~z),
Transposition of matrices
Let A = (a_{ij})_{i=1,\dots,m;\ j=1,\dots,n} ∈ M (m × n). Then its transpose is the matrix A^t = (\tilde a_{ij})_{i=1,\dots,n;\ j=1,\dots,m} ∈ M (n × m) with \tilde a_{ij} = a_{ji}.
For A, B ∈ M (m × n) and C ∈ M (n × k) we have
• (At )t = A,
• (A + B)t = At + B t ,
• (AC)t = C t At ,
• hA~x , ~y i = h~x , At ~y i for all ~x ∈ Rn and ~y ∈ Rm .
A matrix A is called symmetric if At = A and antisymmetric if At = −A. Note that only square
matrices can be symmetric.
A matrix A = (aij )i,j=1,...,n ∈ M (n × n) is called
• upper triangular if aij = 0 whenever i > j,
• lower triangular if aij = 0 whenever i < j,
FT
• diagonal if aij = 0 whenever i 6= j.
Clearly, a matrix is diagonal if and only if it is upper and lower triangular. The transpose of an
upper triangular matrix is lower triangular and vice versa. Every diagonal matrix is symmetric.
Invertibility of matrices
A matrix A ∈ M (n × n) is called invertible if there exists a matrix B ∈ M (n × n) such that
AB = BA = idn . In this case B is called the inverse of A and it is denoted by A−1 . If A is not
RA
invertible, then it is called singular.
• The inverse of an invertible matrix A is unique.
• If A is invertible, then so is A−1 and (A−1 )−1 = A.
• If A is invertible, then so is At and (At )−1 = (A−1 )t .
• If A and B are invertible, then so is AB and (AB)−1 = B −1 A−1 .
(i) A is invertible.
(ii) For every ~b ∈ Rn , the equation A~x = ~b has exactly one solution.
(iii) The equation A~x = ~0 has exactly one solution.
(iv) Every row-reduced echelon form of A has n pivots.
(v) A is row-equivalent to idn .
Inverse of a 2 × 2 matrix
Let A = \begin{pmatrix}a&b\\c&d\end{pmatrix}. Then det A = ad − bc. If det A = 0, then A is not invertible. If det A 6= 0, then A is invertible and A^{-1} = \frac{1}{\det A}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}.
Elementary matrices
We have the following three types of elementary matrices:
• Sj (c) = (sik )i,k=1...,n for c 6= 0 where sik = 0 if i 6= k, skk = 1 for k 6= j and sjj = c,
• Qjk (c) = (qi` )i,`=1...,n for j 6= k, where qjk = c, q`` = 1 for all ` = 1, . . . , n and all other
coefficients equal to zero,
• Pjk = (pi` )i,`=1...,n for j 6= k, where p`` = 1 for all ` ∈ {1, . . . , n} \ {j, k}, pjk = pkj = 1 and
all other coefficients equal to zero.
S_j(c) = \begin{pmatrix}1&&&\\&c&&\\&&\ddots&\\&&&1\end{pmatrix} \text{ ($c$ in row $j$, column $j$)}, \quad Q_{jk}(c) = \begin{pmatrix}1&&c&\\&\ddots&&\\&&\ddots&\\&&&1\end{pmatrix} \text{ ($c$ in row $j$, column $k$)}, \quad P_{jk} = \begin{pmatrix}1&&&&\\&0&&1&\\&&\ddots&&\\&1&&0&\\&&&&1\end{pmatrix} \text{ (rows $j$ and $k$ of } \mathrm{id}_n \text{ interchanged)}.
3.10 Exercises
1. Vuelva al Capı́tulo 1 y haga los ejercicios otra vez utilizando los conocimientos adquiridos en
este capı́tulo.
2. Encuentre las fracciones parciales de \frac{2x^2-4x+14}{x(x-2)^2}.
x1 + 2x2 + 3x3 = b1
3x1 − x2 + 2x3 = b2
4x1 + x2 + x3 = b3 .
Encuentre todo los posibles b1 , b2 , b3 , o diga por qué no hay, para que el sistema tenga
RA
(a) exactamente una solución,
(b) ninguna solución,
(c) infinitas soluciones.
A = 4 8 1 0 , B= , C = 4 1 0 , D= ,
1 4 3 −2 2
1 4 4 3 1 4 3
5 −4
1
1 4 4
1
0 2 3 −3
~r =
3 ,
~v = , w
~ = 3 ,
~x =
5 ,
~y = , ~z = −2 .
3 5 5
π
6 −1
−1
2 6 −1 17
7. Sean A = 1 −2 2 y ~b = 6 . Encuentre todos los vectores ~x ∈ R3 tal que A~x = ~b.
1 2 −2 4
1 1
8. Sea M = .
−1 3
A=
1 −2
2 7
, B=
FT
10. Determine si las matrices son invertibles. Si lo son, encuentre su matriz inversa.
−14 21
12 −18
E = 2 1
1 4
3 5
6
5 .
11
(a) Dé una ecuación de la forma A~x = ~b que describe lo de arriba. Diga que signiifican los
vectores ~x y ~b.
(b) Calcule, usando el resultado de (a), cuantos chocolates y cuantas mentas contienen:
(i) 1 caja de tipo A y 3 de tipo B, (iii) 2 caja de tipo A y 6 de tipo B,
(ii) 4 cajas de tipo A y 2 de tipo B, (iv) 3 cajas de tipo A y 5 de tipo B.
1 3
13. Sea Ak = y considere la ecuación
2 k
0
Ak ~x = . (∗)
0
(a) Encuentre todos los k ∈ R tal que (∗) tiene exactamente una solución para ~x.
(b) Encuentre todos los k ∈ R tal que (∗) tiene infinitas soluciones para ~x.
(c) Encuentre todos los k ∈ R tal que (∗) tiene ninguna solución para ~x.
2
(d) Haga lo mismo para Ak ~x = en vez de (∗).
3
b1 b1
(e) Haga los mismo para Ak ~x = en vez de (∗) donde es un vector arbitrario
b 2 b2
0
distinto de .
0
7 4
1 4 −4
14. Escriba las matrices invertibles de los Ejercicios 10. y 11. como producto de matrices elemen-
tales.
15. Para las siguientes matrices encuentre matrices elementales E1 , . . . , En tal que E1 ·E2 ·· · ··En A
es de la forma triangular superior.
1 2 3
RA
A= , B = 2 1 0 , C = 1 2 0 .
3 5
3 5 3 2 4 3
16. Sea A ∈ M (m × n) y sean ~x, ~y ∈ Rn , λ ∈ R. Demuestre que A(~x + λ~y ) = A~x + λA~y .
17. Demuestre que el espacio M (m×n) es un espacio vectorial con la suma de matrices y producto
con λ ∈ R usual.
19. Sea A = (aij ) i=1,...,n ∈ M (m×n) y sea ~ek el k-ésimo vector unitario en Rn (es decir, el vector
j=1,...,m
en Rn cuya k-ésima entrada es 1 y las demás son cero). Calcule A~ek para todo k = 1, . . . , n
y describa en palabras la relación del resultado con la matriz A.
20. (a) Sea A ∈ M (m × n) y suponga que A~x = ~0 para todo ~x ∈ Rn . Demuestre que A = 0 (la
matriz cuyas entradas son 0).
(b) Sea x ∈ Rn y suponga que A~x = ~0 para todo A ∈ M (n × n). Demuestre que ~x = ~0.
(c) Encuentre una matriz A ∈ M (2 × 2) y ~v ∈ R2 , ambos distintos de cero, tal que A~v = ~0.
(d) Encuentre matrices A, B ∈ M (2 × 2) tal que AB = 0 y BA 6= 0.
4 −1
21. Sean ~v = yw
~= .
5 3
(a) Encuentre una matriz A ∈ M (2 × 2) que mapea el vector ~e1 a ~v y el vector ~e2 a w.
~
(b) Encuentre una matriz B ∈ M (2 × 2) que mapea el vector ~v a ~e1 y el vector w
~ a ~e2 .
24. Sean A =
1 1
,B=
0
FT
23. Sean R, S ∈ M (n, n) matrices invertibles. Demuestre que
RS = SR
2
yC=
⇐⇒ R−1 S −1 = S −1 R−1 .
9 6
. Encuentre X ∈ M (2 × 2) que satisface
RA
1 2 2 0 −7 11
la ecuación
AX + 3X − B = C.
(a) Muestre que idn −A es invertible y encuentre su inversa. (Hint: Considere idn −A2 .
¿Por qué en este caso es correcto factorizar por medio de diferencia de cuadrados? Ver
Sección 3.4, Ejercicio 8.).
(b) Si λ ∈ R, λ 6= 0, muestre que λ idn −A es invertible.
(c) ¿Cómo se pueden generalizar los incisos anteriores si en lugar de suponer que A2 = O
suponemos que Am = O para algún m ∈ N?
29. Calcule (Sj (c))t , (Qij (c))t , (Pij )t .
30. (a) Sea P12 = ( 01 10 ) ∈ M (2 × 2). Demuestre que P12 se deja expresar como producto de
matrices elementales de la forma Qij (c) y Sk (c).
(b) Pruebe el caso general: Sea Pij ∈ M (n × n). Demuestre que Pij se deja expresar como
producto de matrices elementales de la forma Qkl (c) y Sm (c).
Observación: El ejercicio demuestra que en verdad solo hay dos tipos de matrices elementales
ya que el tercero (las permutaciones) se dejan reducir a un producto apropiado de matrices
de tipo Qij (c) y Sj (c).
Chapter 4
Determinants
In this section we will define the determinant of matrices in M (n × n) for arbitrary n and we will
recognise the determinant for n = 2 defined in Section 1.2 as a special case of our new definition.
We will discuss the main properties of the determinant and we will show that a matrix is invertible
if and only if its determinant is different from 0. We will also give a geometric interpretation of
the determinant and get a glimpse of its importance in geometry and the theory of integration.
Finally we will use the determinant to calculate the inverse of an invertible matrix and we will
prove Cramer’s rule.
again the determinant tells us if a matrix is invertible or not. We will give several formulas for the
determinant. As definition, we use the Leibniz formula because it is non-recursive. First we need
to know what a permutation is.
For n = 2 the two permutations of {1, 2} are (1, 2) with sign(σ) = 1 and (2, 1) with sign(σ) = −1. For n = 3 the six permutations of {1, 2, 3} are (1, 2, 3), (2, 3, 1), (3, 1, 2) with sign(σ) = 1 and (1, 3, 2), (2, 1, 3), (3, 2, 1) with sign(σ) = −1.
For instance the second permutation has two inversions (1 < 3 but σ(1) > σ(3) and 2 < 3
but σ(2) > σ(3)), the third permutation has two inversions (1 < 2 but σ(1) > σ(2), 1 < 3 but
σ(1) > σ(3)), etc.
Definition 4.3. Let A = (aij )i,j=1,...,n ∈ M (n × n). Then its determinant is defined by
\det A = \sum_{\sigma\in S_n} \mathrm{sign}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)} .    (4.1)
The formula in equation (4.1) is called the Leibniz formula.
This means: instead of putting the permutation in the column index, we can just as well put
them in the row index.
Let us check if this new definition coincides with our old definition for the case n = 2.
\det\begin{pmatrix}a_{11}&a_{12}\\a_{21}&a_{22}\end{pmatrix} = \sum_{\sigma\in S_2}\mathrm{sign}(\sigma)\,a_{1\sigma(1)}a_{2\sigma(2)} = a_{11}a_{22} - a_{21}a_{12}
= a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a12 a21 a33 − a11 a23 a32 − a13 a22 a31 . (4.3)
Now let us group the terms by coefficients from the first row of A:
\det A = a_{11}\,(a_{22}a_{33}-a_{23}a_{32}) - a_{12}\,(a_{21}a_{33}-a_{23}a_{31}) + a_{13}\,(a_{21}a_{32}-a_{22}a_{31}).    (4.4)
We see that the terms in brackets are again determinants:
• a11 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 1.
• a12 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 2.
• a13 is multiplied by the determinant of the 2 × 2 matrix obtained from A by deleting row 1
and column 3.
If we had grouped the terms by coefficients from the second row, we would have obtained something
similar: each term a2j would be multiplied by the determinant of the 2 × 2 matrix obtained from
A by deleting row 2 and column j.
Of course we could also group the terms by coefficients all from the first column. Then the formula
would become a sum of terms where the aj1 are multiplied by the determinants of the matrices
obtained from A by deleting row j and column 1.
This motivates the definition of the so-called minors of a matrix.
Definition 4.5. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let Mij be the (n − 1) × (n − 1) matrix which is obtained from A by deleting row i and column j. Then the numbers det Mij are called minors of A and Cij := (−1)i+j det(Mij ) are called cofactors of A.
Remark. Some books use a slightly different notation. They call the (n − 1) × (n − 1) matrices
Mij which are obtained from A by deleting row i and column j of A the minors of A (or the
minor matrices of A). However, it seems that the majority of textbooks uses the convention from
Definition 4.5.
This formula is called the expansion of the determinant of A along the first row. We also saw that
we can expand along the second or the third row, or along columns, so
\det A = \sum_{j=1}^{3} (-1)^{k+j} a_{kj}\det M_{kj} = \sum_{j=1}^{3} a_{kj}C_{kj} \qquad\text{for } k = 1, 2, 3,
\det A = \sum_{i=1}^{3} (-1)^{i+k} a_{ik}\det M_{ik} = \sum_{i=1}^{3} a_{ik}C_{ik} \qquad\text{for } k = 1, 2, 3.
The first formula is called expansion along the kth row, and the second formula is called expansion
along the kth column. With a little more effort we can show that an analogous formula is true for
arbitrary n.
Theorem 4.6. Let A = (aij )i,j=1,...,n ∈ M (n × n) and let Mij denote its minors. Then
\det A = \sum_{j=1}^{n} (-1)^{k+j} a_{kj}\det M_{kj} = \sum_{j=1}^{n} a_{kj}C_{kj} \qquad\text{for } k = 1, 2, \dots, n,    (4.5)
\det A = \sum_{i=1}^{n} (-1)^{i+k} a_{ik}\det M_{ik} = \sum_{i=1}^{n} a_{ik}C_{ik} \qquad\text{for } k = 1, 2, \dots, n.    (4.6)
The formulas (4.5) and (4.6) are called Laplace expansion of the determinant. More precisely, (4.5)
is called expansion along the kth row, (4.6) is called expansion along the kth column.
RA
Proof. Let us first prove (4.5) for the case when k = 1. Let A ∈ M (n × n). Note that by definition
det A is the sum of products of the form sign(σ)a1σ(1) . . . an,σ(n) , see Remark 4.4. So the terms
are exactly all possible products of entries of the matrix A with exactly one term of each row and
exactly one term of each column, multiplied by +1 or −1.
The same is true for the formula (4.5): a11 is multiplied by det A11 , but the latter consists of
products with exactly one factor in each row and each column of A11 , that is, exactly one factor
from row 2 to n and column 2 to n of A; a12 is multiplied by det A12 , but the latter consists of
products with exactly one factor in each row and each column of A12 , that is, exactly one factor
from row 2 to n and column 1, 3, to n of A; etc.
So all products that appear in the Leibniz formula (4.1) appear also in the Laplace formula (4.5)
and vice versa. So it only remains to show that they appear with the same factor 1 or −1 in both
formulas.
Let σ ∈ Sn and set σ e : {2, 3, . . . , n} → {1, 2, . . . , n} \ {σ(1)}, σ
e(j) = σ(j). Then
#(inversions of σ) = #(pairs (i, j) such that i < j and σ(i) > σ(j))
= #(pairs (i, j) such that 2 ≤ i < j and σ(i) > σ(j))
+ #(pairs (1, j) such that 1 < j and σ(1) > σ(j))
= #(inversions of σ
e) + #(pairs (1, j) such that 1 < j and σ(1) > σ(j))
e) + σ(1) − 1,
= #(inversions of σ
hence sign(σ) = (−1)#(inversions of σe)+σ(1)−1 = sign(e σ )(−1)σ(1)−1 = sign(e σ )(−1)σ(1)+1 and there-
fore
sign(σ)a1σ(1) a2σ(2) · · · an,σ(n) = (−1)σ(1)+1 a1σ(1) sign(e
σ )a2σ(2) · · · an,σ(n) .
The term in brackets is one of the terms that appear in the determinant of A1σ(1) , hence the
product on the right hand side appears in the formula (4.5) (when j = σ(1)) and it is the only term
that contains the factors a11 , a12 , . . . , a1n . Consequently each term in the Leibniz formula appears
exactly once in the Laplace formula with exactly the same sign and there are no other terms in the
Laplace formula. Hence both formulas are equal.
The reasoning for k > 1 is the same. We only need to take σ e as the restriction of σ to {1, 2, . . . , n} \
{k} and note that sign(σ) = sign(eσ )(−1)σ(k)+k . This is true because
#(inversions of σ) = #(pairs (i, j) such that i < j and σ(i) > σ(j))
= #(pairs (i, j) such that i, j 6= k, 1 ≤ i < j and σ(i) > σ(j))
+ #(pairs (i, k) such that i < k or j = k and σ(i) > σ(k))
+ #(pairs (k, j) such that k < j, or j = k and σ(k) > σ(j))
= #(inversions of σ
e) + #(pairs (1, j) such that 1 < j and σ(1) > σ(j))
e) + σ(k) − k + 2`
= #(inversions of σ
Therefore we obtain
Note that for calculating for instance the determinant of a 5×5 matrix, we have to calculate five 4×4
determinants for each of which we have to calculate four (3×3) determinants, etc. Computationally,
it is as long as the Leibniz formula, but at least we do not have to find all permutations in Sn first.
Later, we will see how to calculate the determinant using Gaussian elimination. This is computa-
tionally much more efficient, see Remark 4.12.
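For small matrices the Laplace expansion is nevertheless easy to code. The following Python sketch (ours, not from the notes) expands recursively along the first row:

def det(A):
    # Laplace expansion along the first row; the cost grows roughly like n!,
    # so for large matrices Gaussian elimination is the method of choice.
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j+1:] for row in A[1:]])
               for j in range(n))

print(det([[3, 2, 1], [5, 6, 4], [8, 0, 7]]))   # 72, as in the example below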
We obtain the same result if we expand the determinant along e.g. the first row:
\det\begin{pmatrix}3&2&1\\5&6&4\\8&0&7\end{pmatrix} = 3\det\begin{pmatrix}6&4\\0&7\end{pmatrix} - 2\det\begin{pmatrix}5&4\\8&7\end{pmatrix} + 1\det\begin{pmatrix}5&6\\8&0\end{pmatrix}
= 3\,[6\cdot 7 - 4\cdot 0] - 2\,[5\cdot 7 - 4\cdot 8] + [5\cdot 0 - 6\cdot 8] = 3\cdot 42 - 2\,[35-32] - 48 = 126 - 6 - 48 = 72.
Example 4.8. We give an example of the calculation of the determinant of a 4 × 4 matrix. The
first computation expands along the first row:
\det\begin{pmatrix}1&2&3&4\\0&6&0&1\\2&0&7&0\\0&3&0&1\end{pmatrix} = \det\begin{pmatrix}6&0&1\\0&7&0\\3&0&1\end{pmatrix} - 2\det\begin{pmatrix}0&0&1\\2&7&0\\0&0&1\end{pmatrix} + 3\det\begin{pmatrix}0&6&1\\2&0&0\\0&3&1\end{pmatrix} - 4\det\begin{pmatrix}0&6&0\\2&0&7\\0&3&0\end{pmatrix}
= 7\det\begin{pmatrix}6&1\\3&1\end{pmatrix} - 2\cdot 7\det\begin{pmatrix}0&1\\0&1\end{pmatrix} + 3\Bigl(-2\det\begin{pmatrix}6&1\\3&1\end{pmatrix}\Bigr) - 4\Bigl(-6\det\begin{pmatrix}2&7\\0&0\end{pmatrix}\Bigr)
= 7[6-3] - 14[0-0] - 6[6-3] + 24[0-0] = 21 - 18 = 3.
Now we calculate the determinant of the same matrix but choose a row with more zeros in the first
step. The advantage is that there are only two 3 × 3 minors whose determinants we really have to
compute. Expanding along the second row gives
\det\begin{pmatrix}1&2&3&4\\0&6&0&1\\2&0&7&0\\0&3&0&1\end{pmatrix} = -0\det\begin{pmatrix}2&3&4\\0&7&0\\3&0&1\end{pmatrix} + 6\det\begin{pmatrix}1&3&4\\2&7&0\\0&0&1\end{pmatrix} - 0\det\begin{pmatrix}1&2&4\\2&0&0\\0&3&1\end{pmatrix} + \det\begin{pmatrix}1&2&3\\2&0&7\\0&3&0\end{pmatrix}
= 6\Bigl(-3\det\begin{pmatrix}2&0\\0&1\end{pmatrix} + 7\det\begin{pmatrix}1&4\\0&1\end{pmatrix}\Bigr) + \Bigl(\det\begin{pmatrix}0&7\\3&0\end{pmatrix} - 2\det\begin{pmatrix}2&3\\3&0\end{pmatrix}\Bigr) = 6(-6+7) + (-21+18) = 6 - 3 = 3.
Rule of Sarrus
We finish this section with the so-called rule of Sarrus. From (4.3) we know that
\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - \bigl(a_{13}a_{22}a_{31} + a_{11}a_{23}a_{32} + a_{12}a_{21}a_{33}\bigr),
which can be memorised as follows: Write down the matrix A and append its first and second
column to it. Then we sum the products of the three terms lying on diagonals from the top left to
the bottom right and subtract the products of the terms lying on diagonals from the top right to
the bottom left as in the following picture:
\det A = a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - \bigl(a_{13}a_{22}a_{31} + a_{11}a_{23}a_{32} + a_{12}a_{21}a_{33}\bigr).
Convince yourself that one could also append the first and the second row below the matrix and
make crosses.
\det\begin{pmatrix}1&2&3\\4&5&6\\0&8&7\end{pmatrix} = 1\cdot 5\cdot 7 + 2\cdot 6\cdot 0 + 3\cdot 4\cdot 8 - \bigl(3\cdot 5\cdot 0 + 6\cdot 8\cdot 1 + 7\cdot 2\cdot 4\bigr) = 35 + 0 + 96 - [0 + 48 + 56] = 131 - 104 = 27.
Ejercicios.
1. Calcule los determinantes de las siguientes matrices:
−2 3 1 0 1 4 6 3 5
(a) 0 2 1 (b) −2 0 −6 (c) 3 −1 4
4 6 5 2 1 0 −2 1 −6
−3 0 0 0
−1 1 0 3 5 −1 −4 7 0 0
(d) 2 1 4 (e) 0 1 −2 (f)
−5
10 −1 0
1 5 6 0 0 2
2 3 −11 6
2 3 −1
√ 20 π
−2 0 0 7
0 1 2 −11 10
1 2 −1 4
(g) 0 0 4 −1 5 (h)
3 0 −1 5
0 0 0 −2 50
4 2 3 0
FT
0 0 0 0 6
2. Sea A = \begin{pmatrix}1&1&1\\a&b&c\\a^2&b^2&c^2\end{pmatrix}. Muestre que det A = (b − a)(c − a)(c − b).
3. Sea A = \begin{pmatrix}0&a&b\\-a&0&c\\-b&-c&0\end{pmatrix}. Muestre que det A = 0.
This means the following. Let ~r1 , . . . , ~rn be the row vectors of the matrix A and assume that
~rj = ~sj + γ~tj . Then
\det A = \det\begin{pmatrix}\vec r_1\\ \vdots\\ \vec r_j\\ \vdots\\ \vec r_n\end{pmatrix} = \det\begin{pmatrix}\vec r_1\\ \vdots\\ \vec s_j+\gamma\vec t_j\\ \vdots\\ \vec r_n\end{pmatrix} = \det\begin{pmatrix}\vec r_1\\ \vdots\\ \vec s_j\\ \vdots\\ \vec r_n\end{pmatrix} + \gamma\det\begin{pmatrix}\vec r_1\\ \vdots\\ \vec t_j\\ \vdots\\ \vec r_n\end{pmatrix}.
This is proved easily by expanding the determinant along the jth row, or it can be seen from the
Leibniz formula as well.
(D2) If two rows of A are interchanged, then the determinant changes its sign:
\det\begin{pmatrix}\vdots\\ \vec r_i\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} = -\det\begin{pmatrix}\vdots\\ \vec r_j\\ \vdots\\ \vec r_i\\ \vdots\end{pmatrix}.
This is easy to see when the two rows that shall be interchanged are adjacent. For example, assume
that j = i + 1. Let A be the original matrix and let B be the matrix with rows i and i + 1 swapped.
We expand the determinant of A along the ith row and the determinant of B along the (i + 1)th row. Note that in both cases the minors are equal, that is, M^A_{ik} = M^B_{(i+1)k} (we use superscripts A and B to distinguish between the minors of A and of B). So we find
\det B = \sum_{k=1}^{n} (-1)^{(i+1)+k} a_{ik}\det M^B_{(i+1)k} = -\sum_{k=1}^{n} (-1)^{i+k} a_{ik}\det M^A_{ik} = -\det A,
where we used that the (i + 1)th row of B is the ith row of A.
This can seen also via the Leibniz formula. Now let us see what happens if i and j are not adjacent
rows. Without restriction we may assume that i < j. Then we first swap the jth row (j − i) times
with the row above until it is in the ith row. The original ith row is now in row (i + 1). Now we
swap it down with its neighbouring rows until it becomes row j. To do this we need j − (i + 1)
further swaps of adjacent rows. In total this amounts to (j − i) + (j − i − 1) = 2(j − i) − 1 swaps, which is an odd number, so the sign changes an odd number of times and we again obtain det B = − det A.
Remark 4.10. It can be shown: Every function f : M (n × n) → R which satisfies (D1), (D2) and
(D3) (or (D1’), (D2’) and (D3)) must be det.
(D5) If one row of A is a multiple of another row, or if a column is a multiple of another column, then det A = 0. In particular, if A has two equal rows or two equal columns then det A = 0.
Let $\vec r_1,\dots,\vec r_n$ denote the rows of the matrix $A$ and assume that $\vec r_k = c\,\vec r_j$. Then
$$\det A = \det\begin{pmatrix}\vdots\\ \vec r_k\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} = \det\begin{pmatrix}\vdots\\ c\vec r_j\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} \overset{(D2)}{=} -\det\begin{pmatrix}\vdots\\ \vec r_j\\ \vdots\\ c\vec r_j\\ \vdots\end{pmatrix} \overset{(D1)}{=} -c\det\begin{pmatrix}\vdots\\ \vec r_j\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} \overset{(D1)}{=} -\det\begin{pmatrix}\vdots\\ c\vec r_j\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} = -\det\begin{pmatrix}\vdots\\ \vec r_k\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} = -\det A.$$
This shows $\det A = -\det A$, and therefore $\det A = 0$. If $A$ has a column which is a multiple of another column, then its transpose has a row which is a multiple of another row, and with the help of (D4) it follows that $\det A = \det A^t = 0$.
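As a small illustration of (D5) (an example of our own choosing): in the matrix below the second row is twice the first row, so the determinant must vanish, as a direct computation confirms:
$$\det\begin{pmatrix}1&2&3\\2&4&6\\0&1&5\end{pmatrix} = 20 + 0 + 6 - (0 + 6 + 20) = 0.$$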
(D6) The determinant of an upper or lower triangular matrix is the product of its
diagonal entries.
Let $A$ be an upper triangular matrix with diagonal entries $c_1,\dots,c_n$ and let us expand its determinant along the first column. Then only the first term in the Laplace expansion is different from 0, because all coefficients in the first column are equal to 0 except possibly the one in the first row. We repeat this and obtain
$$\det A = \det\begin{pmatrix}c_1&*\\0&\ddots\end{pmatrix} = c_1\det\begin{pmatrix}c_2&*\\0&\ddots\end{pmatrix} = c_1c_2\det\begin{pmatrix}c_3&*\\0&\ddots\end{pmatrix} = \dots = c_1c_2\cdots c_{n-2}\det\begin{pmatrix}c_{n-1}&*\\0&c_n\end{pmatrix} = c_1c_2\cdots c_{n-1}c_n.$$
The claim for lower triangular matrices follows from (D4) and what we just showed, because the transpose of an upper triangular matrix is lower triangular and the diagonal entries are the same. Or we could repeat the above proof, but this time we would always expand along the first row (or last column).
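For example (an illustration of (D6) with numbers of our own choosing):
$$\det\begin{pmatrix}2&7&-1\\0&-3&4\\0&0&5\end{pmatrix} = 2\cdot(-3)\cdot 5 = -30.$$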
Next we calculate the determinants of the elementary matrices.
(D7) For the elementary matrices we have det S_j(c) = c, det Q_{ij}(c) = 1 and det P_{ij} = −1.
The first two follow from (D6) since S_j(c) and Q_{ij}(c) are triangular with diagonal entries 1, …, c, …, 1 and 1, …, 1 respectively; the last one follows from (D2) since P_{ij} is obtained from id_n by swapping two rows.
Now we calculate the determinant of a product of an elementary matrix with another matrix.
(D8) Let E be an elementary matrix and let A ∈ M (n × n). Then det(EA) = det E det A.
Let E be an elementary matrix and let us denote the rows of A by ~r1 , . . . , ~rn . We have to distinguish
between the three different types of elementary matrices.
Case 1. $E = S_j(c)$. We know from (D6) that $\det E = \det S_j(c) = c$. Using Proposition 3.62 and (D1) we find that
$$\det(EA) = \det\bigl(S_j(c)A\bigr) = \det\begin{pmatrix}\vdots\\ c\vec r_j\\ \vdots\end{pmatrix} = c\det\begin{pmatrix}\vdots\\ \vec r_j\\ \vdots\end{pmatrix} = c\det A = \det S_j(c)\det A.$$
Case 2. $E = Q_{ij}(c)$. We know from (D6) that $\det E = \det Q_{ij}(c) = 1$. Using Proposition 3.62 and the fact that adding a multiple of one row to another row does not change the determinant (this follows from (D1) and (D5)), we find that $\det(EA) = \det\bigl(Q_{ij}(c)A\bigr) = \det A = \det Q_{ij}(c)\det A$.
Case 3. $E = P_{jk}$. We know from (D7) that $\det E = \det P_{jk} = -1$. Using Proposition 3.62 and (D2) we find that
$$\det(EA) = \det(P_{jk}A) = \det\begin{pmatrix}\vdots\\ \vec r_k\\ \vdots\\ \vec r_j\\ \vdots\end{pmatrix} = -\det\begin{pmatrix}\vdots\\ \vec r_j\\ \vdots\\ \vec r_k\\ \vdots\end{pmatrix} = -\det A = \det P_{jk}\det A.$$
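As a quick numerical check of (D8) (an example of our own choosing, with E a row-swap matrix):
$$\det\left(\begin{pmatrix}0&1\\1&0\end{pmatrix}\begin{pmatrix}1&2\\3&4\end{pmatrix}\right) = \det\begin{pmatrix}3&4\\1&2\end{pmatrix} = 2 = (-1)\cdot(-2) = \det\begin{pmatrix}0&1\\1&0\end{pmatrix}\det\begin{pmatrix}1&2\\3&4\end{pmatrix}.$$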
Recall that the determinant of an elementary matrix is different from zero, so (4.7) shows that $\det A = 0$ if and only if $\det A' = 0$. If $A$ is invertible, then $A' = \mathrm{id}$, hence $\det A' = 1 \neq 0$ and therefore also $\det A \neq 0$. If $A$ is not invertible, then the last row of $A'$ must be zero, hence $\det A' = 0$ and therefore also $\det A = 0$.
Next we show that the determinant is multiplicative: $\det(AB) = \det A\det B$ for all $A, B \in M(n\times n)$. If $A$ is invertible, then it is a product of elementary matrices, $A = E_1\cdots E_k$, and repeated application of (D8) gives $\det(AB) = \det E_1\cdots\det E_k\,\det B = \det A\det B$.
If on the other hand $A$ is not invertible, then $\det A = 0$. Moreover, the last row of $A'$ is zero, so also the last row of $A'B$ is zero, hence $A'B$ is not invertible and therefore $\det(A'B) = 0$. So we have $\det(AB) = 0$ by (4.7), and also $\det(A)\det(B) = 0\cdot\det(B) = 0$, so also in this case $\det(AB) = \det A\det B$.
Let $A \in M(n\times n)$. Give two proofs of $\det(cA) = c^n\det A$ using either one of the following:
(i) Use the linearity (D1) of the determinant in each of the $n$ rows.
(ii) Use that $cA = \operatorname{diag}(c, c, \dots, c)A$ and apply (D10) and (D6).
Theorem 4.11. Let $A \in M(n\times n)$. Then the following are equivalent:
(i) A is invertible.
(ii) For every $\vec b \in \mathbb{R}^n$, the equation $A\vec x = \vec b$ has exactly one solution.
…
(vi) $\det A \neq 0$.
$$\cdots \overset{(4)}{=} 5\det\begin{pmatrix}1&2&3&4\\0&0&0&1\\0&0&1&1\\0&3&0&0\end{pmatrix} \overset{(5)}{=} -5\det\begin{pmatrix}1&2&3&4\\0&3&0&0\\0&0&1&1\\0&0&0&1\end{pmatrix} \overset{(6)}{=} -5\cdot 3 = -15.$$
1. We subtract the first row from all the other rows. The determinant does not change.
2. We factor 5 from the third row.
3. We subtract 1/3 of the last row from rows 2 and 3. The determinant does not change.
4. We subtract row 3 from row 2. The determinant does not change.
5. We swap rows 2 and 4. This gives a factor −1.
6. Easy calculation.
Exercises.
1. Suppose we know that $\det\begin{pmatrix}a&-2b&c\\1&3&-1\\0&5&2\end{pmatrix} = -1$. Compute
$$\det\begin{pmatrix}-3a&6b&-3c\\-1&-3&1\\0&\tfrac{5}{2}&1\end{pmatrix},\qquad \det\begin{pmatrix}a&-2b&c\\1+2a&3-4b&-1+2c\\-a&5+2b&2-c\end{pmatrix},\qquad \det\begin{pmatrix}1&3&-1\\2&11&0\\a+1&-2b+3&c-1\end{pmatrix}.$$
2. Suppose that $\det\begin{pmatrix}a_{11}&a_{12}&a_{13}\\a_{21}&a_{22}&a_{23}\\a_{31}&a_{32}&a_{33}\end{pmatrix} = 6$. Compute
$$\det\begin{pmatrix}a_{31}&a_{32}&a_{33}\\3a_{11}-5a_{31}&3a_{12}-5a_{32}&3a_{13}-5a_{33}\\a_{21}&a_{22}&a_{23}\end{pmatrix}.$$
3. For which values of $a$ does the matrix
$$\begin{pmatrix}2-a&-2&0\\0&1&1+a\\a&2&2a\end{pmatrix}$$
have no inverse?
4. Let $A = \begin{pmatrix}1&4\\0&4\end{pmatrix}$. Find all $\lambda \in \mathbb{R}$ such that $\lambda\,\mathrm{id}_2 - A$ is not invertible.
5. True or false? Let A, B ∈ M(n × n).
(a) If n is odd, then det(−A) = − det(A).
(b) If n is odd and A is antisymmetric, then A is not invertible.
(c) If det A = 0, then A = O.
(d) If det A = 0, then some row or column of A consists only of zeros.
(e) If det A = 0, then at least one entry of A must be 0.
Area in R2
Let $\vec a = \begin{pmatrix}a_1\\a_2\end{pmatrix}$ and $\vec b = \begin{pmatrix}b_1\\b_2\end{pmatrix}$ be vectors in $\mathbb{R}^2$ and let us consider the matrix $A = (\vec a\,|\,\vec b)$ whose columns are the given vectors. Then
A~e1 = ~a, A~e2 = ~b.
That means that A transforms the unit square spanned by the unit vectors ~e1 and ~e2 into the
parallelogram spanned by the vectors ~a and ~b. Let area(~a, ~b) be the area of the parallelogram
spanned by ~a and ~b. We can view ~a and ~b as vectors in R3 simply by adding a third component.
Then formula (2.9) shows that the area of the parallelogram spanned by ~a and ~b is equal to
$$\left\|\begin{pmatrix}a_1\\a_2\\0\end{pmatrix}\times\begin{pmatrix}b_1\\b_2\\0\end{pmatrix}\right\| = \left\|\begin{pmatrix}0\\0\\a_1b_2-a_2b_1\end{pmatrix}\right\| = |a_1b_2 - a_2b_1| = |\det A|,$$
hence we obtain the formula
area(~a, ~b) = | det A|. (4.9)
So while A tells us how the shape of the unit square changes, | det A| tells us how its area changes,
see Figure 4.1.
Figure 4.1: The figure shows how the area of the unit square transforms under the linear transforma-
tion A. The area of the square on left hand side is 1, the area of the parallelogram on the right hand
side is | det A|.
You should also notice the following: The area of the image of the unit square under A is zero
if and only if the two image vectors ~a and ~b are parallel. This is in accordance to the fact that
det A = 0 if and only if the two lines described by the associated linear equations are parallel (or if
one equation describes the whole plane).
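For example (numbers of our own choosing), the parallelogram spanned by $\vec a = \begin{pmatrix}3\\1\end{pmatrix}$ and $\vec b = \begin{pmatrix}1\\2\end{pmatrix}$ has area
$$\operatorname{area}(\vec a,\vec b) = \left|\det\begin{pmatrix}3&1\\1&2\end{pmatrix}\right| = |6-1| = 5.$$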
Volumes in R3
Let $\vec a = \begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}$, $\vec b = \begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix}$ and $\vec c = \begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix}$ be vectors in $\mathbb{R}^3$ and let us consider the matrix $A = (\vec a\,|\,\vec b\,|\,\vec c)$ whose columns are the given vectors. Then
$$A\vec e_1 = \vec a,\qquad A\vec e_2 = \vec b,\qquad A\vec e_3 = \vec c.$$
That means that A transforms the unit cube spanned by the unit vectors ~e1 , ~e2 and ~e3 into the
parallelepiped spanned by the vectors ~a, ~b and ~c. Let vol(~a, ~b, ~c) be the volume of the parallelepiped
spanned by the vectors ~a, ~b and ~c. According to formula (2.10), vol(~a, ~b, ~c) = |h~a , ~b × ~ci|. We
calculate
$$|\langle \vec a, \vec b\times\vec c\rangle| = \left|\left\langle \begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}, \begin{pmatrix}b_1\\b_2\\b_3\end{pmatrix}\times\begin{pmatrix}c_1\\c_2\\c_3\end{pmatrix}\right\rangle\right| = \left|\left\langle \begin{pmatrix}a_1\\a_2\\a_3\end{pmatrix}, \begin{pmatrix}b_2c_3-b_3c_2\\b_3c_1-b_1c_3\\b_1c_2-b_2c_1\end{pmatrix}\right\rangle\right| = |a_1(b_2c_3-b_3c_2) - a_2(b_1c_3-b_3c_1) + a_3(b_1c_2-b_2c_1)| = |\det A|
hence
vol(~a, ~b, ~c) = | det A| (4.10)
since we recognise the second to last line as the expansion of det A along the first column. So while
A tells us how the shape of the unit cube changes, | det A| tells us how its volume changes.
Figure 4.2: The figure shows how the volume of the unit cube transforms under the linear transfor-
mation A. The volume of the cube on left hand side is 1, the volume of the parallelepiped on the right
hand side is | det A|.
You should also notice the following: The volume of the image of the unit cube under A is zero if
and only if the three image vectors lie in the same plane. We will see later that this implies that
the range of A is not all of R3 , hence A cannot be invertible. For details, see Section 6.2.
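For example (our own numbers), for $\vec a = \begin{pmatrix}2\\0\\0\end{pmatrix}$, $\vec b = \begin{pmatrix}1\\3\\0\end{pmatrix}$, $\vec c = \begin{pmatrix}0\\1\\4\end{pmatrix}$, formula (4.10) together with (D6) gives
$$\operatorname{vol}(\vec a,\vec b,\vec c) = \left|\det\begin{pmatrix}2&1&0\\0&3&1\\0&0&4\end{pmatrix}\right| = |2\cdot3\cdot4| = 24.$$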
(ii) Show that the area of the blue parallelogram is six times the area of the green parallelogram.
(iii) Show that the area of the blue and the red parallelogram is equal to the area of the green parallelogram.
[The accompanying figures show the parallelograms spanned by $\vec v$, $\vec w$, $2\vec v$, $3\vec w$, $\vec z$ and $-\vec z$.]
Exercises.
1. Compute the volume of the parallelepiped spanned by the vectors $\begin{pmatrix}1\\3\\-5\end{pmatrix}$, $\begin{pmatrix}2\\0\\1\end{pmatrix}$, $\begin{pmatrix}4\\4\\4\end{pmatrix}$.
2. Let $A = \begin{pmatrix}1&0&2\\0&-1&-2\\2&-2&0\end{pmatrix}$. Compute the volume of the parallelepiped spanned by $A\vec e_1$, $A\vec e_2$ and $A\vec e_3$. How do you interpret the result geometrically?
3. Let $P = (1, 1)$, $Q = (2, 2)$, $R = (0, 3)$ and $W = (-1, 2)$. Show that the points $PQRW$ form a parallelogram and compute its area.
4. Let $P = (x_1, y_1)$, $Q = (x_2, y_2)$ and $R = (x_3, y_3)$ be points in $\mathbb{R}^2$. Show that the area of the triangle $\triangle PQR$ is given by the formula
$$\frac12\left|\det\begin{pmatrix}1&x_1&y_1\\1&x_2&y_2\\1&x_3&y_3\end{pmatrix}\right|.$$
When is this determinant equal to zero?
Let $A = (a_{ij})_{i,j=1,\dots,n} \in M(n\times n)$ and let $M_{ij}$ be its minors, see Definition 4.5. We already know from (4.5) that for every fixed $k \in \{1,\dots,n\}$
$$\det A = \sum_{j=1}^n (-1)^{k+j} a_{kj}\det M_{kj}. \qquad (4.11)$$
Now we want to see what happens if the $k$ in $a_{kj}$ and in $M_{kj}$ are different.
Proposition 4.13. Let $A = (a_{ij})_{i,j=1,\dots,n} \in M(n\times n)$ and let $k, \ell \in \{1,\dots,n\}$ with $k \neq \ell$. Then
$$\sum_{j=1}^n (-1)^{\ell+j} a_{kj}\det M_{\ell j} = 0. \qquad (4.12)$$
Proof. We build the new matrix $B$ from $A$ by replacing its $\ell$th row by the $k$th row. Then $B$ has two equal rows (row $\ell$ and row $k$), hence $\det B = 0$. Note that the matrices $A$ and $B$ are equal everywhere except possibly in the $\ell$th row, so their minors along the row $\ell$ are equal: $M^B_{\ell j} = M^A_{\ell j}$ (we put superscripts $A$, $B$ in order to distinguish the minors of $A$ and of $B$). If we expand $\det B$ along the $\ell$th row, then we find
$$0 = \det B = \sum_{j=1}^n (-1)^{\ell+j} b_{\ell j}\det M^B_{\ell j} = \sum_{j=1}^n (-1)^{\ell+j} a_{kj}\det M^A_{\ell j}.$$
Using the cofactors $C_{ij}$ of $A$ (see Definition 4.5), formulas (4.11) and (4.12) can be written as
$$\sum_{j=1}^n (-1)^{\ell+j} a_{kj}\det M^A_{\ell j} = \sum_{j=1}^n a_{kj} C_{\ell j} = \begin{cases}\det A & \text{if } k = \ell,\\ 0 & \text{if } k \neq \ell.\end{cases} \qquad (4.13)$$
Definition 4.14. For $A \in M(n\times n)$ we define its adjugate matrix $\operatorname{adj} A$ as the transpose of its cofactor matrix:
$$\operatorname{adj} A := \begin{pmatrix}C_{11}&C_{12}&\cdots&C_{1n}\\C_{21}&C_{22}&\cdots&C_{2n}\\ \vdots& &\ddots&\vdots\\ C_{n1}&C_{n2}&\cdots&C_{nn}\end{pmatrix}^{\!t} = \begin{pmatrix}C_{11}&C_{21}&\cdots&C_{n1}\\C_{12}&C_{22}&\cdots&C_{n2}\\ \vdots& &\ddots&\vdots\\ C_{1n}&C_{2n}&\cdots&C_{nn}\end{pmatrix}.$$
Theorem 4.15. Let $A \in M(n\times n)$ be invertible. Then
$$A^{-1} = \frac{1}{\det A}\operatorname{adj} A. \qquad (4.14)$$
The key step in the proof is the identity $A\operatorname{adj} A = (\det A)\,\mathrm{id}_n$, which follows from (4.13).
Remark 4.16. Note that the proof of Theorem 4.15 shows that A adj A = det A idn is true for
every A ∈ M (n × n), even if it is not invertible (in this case, both sides of the formula are equal to
the zero matrix).
Formula (4.14) might look quite nice and innocent, however bear in mind that in order to calculate $A^{-1}$ with it you have to calculate one $n\times n$ determinant and $n^2$ determinants of the $(n-1)\times(n-1)$ minors of $A$. This is a lot more than the $O(n^3)$ steps needed in the Gauß-Jordan elimination.
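For a 2 × 2 matrix the formula can be written out completely (a standard special case, included here as an additional illustration): the cofactors of $A = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ are $C_{11}=d$, $C_{12}=-c$, $C_{21}=-b$, $C_{22}=a$, so
$$\operatorname{adj}A = \begin{pmatrix}d&-b\\-c&a\end{pmatrix},\qquad A^{-1} = \frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}\quad\text{whenever } ad-bc\neq 0.$$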
Finally, we prove Cramer's rule for finding the solution of a linear system if the corresponding matrix is invertible.
Theorem 4.17. Let $A \in M(n\times n)$ be an invertible matrix and let $\vec b \in \mathbb{R}^n$. Then the unique solution $\vec x$ of $A\vec x = \vec b$ is given by
$$\vec x = \begin{pmatrix}x_1\\x_2\\ \vdots\\ x_n\end{pmatrix} = \frac{1}{\det A}\begin{pmatrix}\det A^{\vec b}_1\\ \det A^{\vec b}_2\\ \vdots\\ \det A^{\vec b}_n\end{pmatrix} \qquad (4.15)$$
where $A^{\vec b}_j$ is the matrix obtained from the matrix $A$ if we replace its $j$th column by the vector $\vec b$.
Proof. As usual we write $C_{ij}$ for the cofactors of $A$ and $M_{ij}$ for its minors. Since $A$ is invertible, we know that $\vec x = A^{-1}\vec b = \frac{1}{\det A}\operatorname{adj} A\,\vec b$. Therefore we find for $j = 1,\dots,n$ that
$$x_j = \frac{1}{\det A}\sum_{k=1}^n C_{kj} b_k = \frac{1}{\det A}\sum_{k=1}^n (-1)^{k+j} b_k \det M_{kj} = \frac{1}{\det A}\det A^{\vec b}_j.$$
The last equality is true because the second to last sum is the expansion of the determinant of $A^{\vec b}_j$ along the $j$th column.
Note that, even if (4.15) might look quite nice, it involves the computation of n + 1 determinants
of n × n matrices, so it involves O((n + 1)!) steps.
Example 4.18. Let us calculate the inverse of the matrix $A = \begin{pmatrix}1&2&3\\4&5&6\\0&8&7\end{pmatrix}$ from Example 4.9. We already know that $\det A = 27$. Its cofactors are
$$C_{11} = \det\begin{pmatrix}5&6\\8&7\end{pmatrix} = -13,\quad C_{12} = -\det\begin{pmatrix}4&6\\0&7\end{pmatrix} = -28,\quad C_{13} = \det\begin{pmatrix}4&5\\0&8\end{pmatrix} = 32,$$
$$C_{21} = -\det\begin{pmatrix}2&3\\8&7\end{pmatrix} = 10,\quad C_{22} = \det\begin{pmatrix}1&3\\0&7\end{pmatrix} = 7,\quad C_{23} = -\det\begin{pmatrix}1&2\\0&8\end{pmatrix} = -8,$$
$$C_{31} = \det\begin{pmatrix}2&3\\5&6\end{pmatrix} = -3,\quad C_{32} = -\det\begin{pmatrix}1&3\\4&6\end{pmatrix} = 6,\quad C_{33} = \det\begin{pmatrix}1&2\\4&5\end{pmatrix} = -3.$$
Therefore
$$A^{-1} = \frac{1}{\det A}\operatorname{adj}A = \frac{1}{\det A}\begin{pmatrix}C_{11}&C_{21}&C_{31}\\C_{12}&C_{22}&C_{32}\\C_{13}&C_{23}&C_{33}\end{pmatrix} = \frac{1}{27}\begin{pmatrix}-13&10&-3\\-28&7&6\\32&-8&-3\end{pmatrix}.$$
Example 4.19. Let us use Cramer’s rule to solve the following linear system:
x+y+ z = 8
4y − z = −2
3x − y + 2z = 0.
$$z = \frac{\det\begin{pmatrix}1&1&8\\0&4&-2\\3&-1&0\end{pmatrix}}{\det\begin{pmatrix}1&1&1\\0&4&-1\\3&-1&2\end{pmatrix}} = \frac{-104}{-8} = 13.$$
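The remaining unknowns are obtained in the same way. As a check (this computation is ours; only $z$ is worked out above):
$$x = \frac{1}{-8}\det\begin{pmatrix}8&1&1\\-2&4&-1\\0&-1&2\end{pmatrix} = \frac{62}{-8} = -\frac{31}{4},\qquad y = \frac{1}{-8}\det\begin{pmatrix}1&8&1\\0&-2&-1\\3&0&2\end{pmatrix} = \frac{-22}{-8} = \frac{11}{4},$$
and indeed $x + y + z = -\tfrac{31}{4} + \tfrac{11}{4} + 13 = 8$.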
You should now have understood
• what the adjugate matrix is and why it can be used to calculate the inverse of a matrix,
• etc.
You should now be able to
• calculate A−1 using adj A.
• solve systems of linear equations using Cramer’s rule.
• etc.
Exercises.
1. Using Cramer's rule, solve the following systems of linear equations:
3. Let
$$A = \begin{pmatrix}1&-2&3&1\\-1&5&0&2\\0&3&1&-2\\-2&1&2&4\end{pmatrix}.$$
Compute the entry $a_{32}$ of $A^{-1}$.
4. The aim of the following exercise is to prove the law of cosines using linear algebra. [Figure: a triangle with sides $a$, $b$, $c$, angles $\alpha$, $\beta$, $\gamma$, and the segments $b\cos\alpha$ and $a\cos\beta$ along the side $c$.]
(a) Using elementary trigonometry, deduce the following relations:
$$b\cos\alpha + a\cos\beta = c,\qquad c\cos\alpha + a\cos\gamma = b,\qquad c\cos\beta + b\cos\gamma = a.$$
(b) Consider the relations above as a system of equations with $\cos\alpha$, $\cos\beta$ and $\cos\gamma$ as unknowns. Compute the determinant of the system.
(c) Use Cramer's rule to solve for $\cos\gamma$ and conclude the law of cosines:
$$c^2 = a^2 + b^2 - 2ab\cos\gamma.$$
5. Let $A \in M(n\times n)$.
6. Let $A \in M(n\times n)$ be symmetric and invertible. Show that $\operatorname{adj} A$ is symmetric. Does the statement remain true if we do not assume that $A$ is invertible?
4.5 Summary
The determinant is a function from the square matrices to the real numbers. Later we will also consider matrices with complex entries. In this case, the determinant is a function from the square matrices to the complex numbers. Let $A = (a_{ij})_{i,j=1}^n \in M(n\times n)$.
with the following notation
• Mij are the minors of A ((n − 1) × (n − 1) matrices obtained from A by deleting row i and
column j),
Geometric interpretation
The determinant of a matrix A gives the oriented volume of the image of the unit cube under A.
• det A = det At .
• If one row of A is multiple of another row, or if a column is a multiple of another column,
then det A = 0. In particular, if A has two equal rows or two equal columns then det A = 0.
• The determinant of an upper or lower triangular matrix is the product of its diagonal entries.
• The determinants of the elementary matrices are det S_j(c) = c, det Q_{ij}(c) = 1 and det P_{ij} = −1.
Theorem. Let A ∈ M(n × n). Then the following are equivalent:
(i) det A ≠ 0.
(ii) A is invertible.
(iii) For every ~b ∈ Rn, the equation A~x = ~b has exactly one solution.
(iv) The equation A~x = ~0 has exactly one solution.
(v) Every row-reduced echelon form of A has n pivots.
(vi) A is row-equivalent to idn.
4.6 Exercises
1. Compute the determinant of each of the following matrices. Determine whether the matrices are invertible. If they are, find the inverse matrix and the determinant of the inverse.
$$A = \begin{pmatrix}1&-2\\2&7\end{pmatrix},\quad B = \begin{pmatrix}-14&21\\12&-18\end{pmatrix},\quad D = \begin{pmatrix}1&3&6\\4&1&0\\1&4&3\end{pmatrix},\quad E = \begin{pmatrix}1&4&6\\2&1&5\\3&5&11\end{pmatrix}.$$
2. Let
$$A = \begin{pmatrix}3&5t^3&1\\0&2+t&0\\-t&10-t&t^2\end{pmatrix}.$$
Determine the values of $t \in \mathbb{R}$ for which the system $A\vec x = \begin{pmatrix}\sqrt[4]{\pi}\\ \sqrt{2}\\ \sqrt[3]{2}\end{pmatrix}$ has exactly one solution.
3. Let $A = \begin{pmatrix}1+x&y&z&w&r\\x&1+y&z&w&r\\x&y&1+z&w&r\\x&y&z&1+w&r\\x&y&z&w&1+r\end{pmatrix}$. Show that $\det A = 1 + x + y + z + w + r$.
(Hint: this is easier if you use property (D1) from Section 4.2.)
4. Compute the determinant of each of the following matrices. Determine whether the matrices are invertible. If they are, find the inverse matrix and the determinant of the inverse.
$$A = \begin{pmatrix}\pi&3\\5&2\end{pmatrix},\quad B = \begin{pmatrix}-1&2&3\\1&3&1\\4&3&2\end{pmatrix},\quad C = \begin{pmatrix}1&2&3&0\\0&1&2&2\\1&4&0&3\\1&1&5&4\end{pmatrix}.$$
6. Use the factorizations found in Exercises 14 and 14 of Chapter 3 to compute their determinants.
7. Write the matrix $A = \begin{pmatrix}1&2&3\\1&2&6\\-2&-2&-6\end{pmatrix}$ as a product of elementary matrices and compute its determinant.
9. Suppose that a function $y$ satisfies $y^{[n]} = b_{n-1}y^{[n-1]} + \cdots + b_1y' + b_0y$, where $b_0,\dots,b_{n-1}$ are constant coefficients and $y^{[j]}$ denotes the $j$th derivative of $y$. Write this equation as an equivalent first-order system $\vec y\,' = A\vec y$ and compute the determinant of $A$.
10. Without using expansion formulas for determinants, find for each of the following matrices parameters $x, y$ such that its determinant equals zero, and explain why the parameters you found work.
$$N_1 = \begin{pmatrix}x&2&6\\2&5&1\\3&4&y\end{pmatrix},\qquad N_2 = \begin{pmatrix}1&x&y&2\\x&0&1&y\\x&5&3&y\\4&x&y&8\end{pmatrix}.$$
11. (a) Compute $\det B_n$ where $B_n$ is the matrix in $M(n\times n)$ whose diagonal entries are 0 and all other entries are 1, i.e.
$$B_1 = (0),\quad B_2 = \begin{pmatrix}0&1\\1&0\end{pmatrix},\quad B_3 = \begin{pmatrix}0&1&1\\1&0&1\\1&1&0\end{pmatrix},\quad B_4 = \begin{pmatrix}0&1&1&1\\1&0&1&1\\1&1&0&1\\1&1&1&0\end{pmatrix},\quad B_5 = \begin{pmatrix}0&1&1&1&1\\1&0&1&1&1\\1&1&0&1&1\\1&1&1&0&1\\1&1&1&1&0\end{pmatrix},\ \text{etc.}$$
How does the answer change if instead of 0 there is $x$ on the diagonal?
(b) Compute $\det B_n$ where $B_n$ is the matrix in $M(n\times n)$ whose diagonal entries are 0 and all other entries satisfy $b_{ij} = (-1)^{i+j}$, i.e.
$$B_1 = (0),\quad B_2 = \begin{pmatrix}0&-1\\-1&0\end{pmatrix},\quad B_3 = \begin{pmatrix}0&-1&1\\-1&0&-1\\1&-1&0\end{pmatrix},\quad B_4 = \begin{pmatrix}0&-1&1&-1\\-1&0&-1&1\\1&-1&0&-1\\-1&1&-1&0\end{pmatrix},$$
$$B_5 = \begin{pmatrix}0&-1&1&-1&1\\-1&0&-1&1&-1\\1&-1&0&-1&1\\-1&1&-1&0&-1\\1&-1&1&-1&0\end{pmatrix},\ \text{etc.}$$
Vector spaces
In the following, K always denotes a field. In this chapter, you may always think of K = R,
though almost everything is true also for other fields, like C, Q or Fp where p is a prime number.
Later, in Chapter 8 it will be more useful to work with K = C.
In this chapter we will work with abstract vector spaces. We will first discuss their basic proper-
ties. Then, in Section 5.2 we will define subspaces. These are subsets of vector spaces which are
themselves vector spaces. In Section 5.4 we will introduce bases and the dimension of a vector
space. These concepts are fundamental in linear algebra since they allow us to classify all finite
dimensional vector spaces. In a certain sense, all n-dimensional vector spaces over the same field
K are equal. In Chapter 6 we will study linear maps between vector spaces.
Note that we will usually write λv instead of λ · v. Then V (or more precisely, (V, +, ·)) is called a
vector space over K if for all u, v, w ∈ V and all λ, µ ∈ K the following holds:
Remark 5.2. (i) Note that we use the notation ~v with an arrow only for the special case of
vectors in Rn or Cn . Vectors in abstract vector spaces are usually denoted without an arrow.
(ii) If K = R, then V is called a real vector space. If K = C, then V is called a complex vector
space.
Before we give examples of vector spaces, we first show some basic properties of vector spaces.
Properties 5.3. (i) The identity element is unique. (Note that in the vector space axioms we
only asked for existence of an additive identity element; we did not ask for uniqueness. So one
could think that there may be several elements which satisfy (c) in Definition 5.1. However,
this is not possible as the following proof shows.)
Proof. Assume there are two neutral elements O and O′. Then we know that for every v and w in V the following is true:
v = v + O,   w = w + O′.
Now let us take v = O′ and w = O. Then, using commutativity, we obtain
O′ = O′ + O = O + O′ = O.
(ii) If x, y, z ∈ V satisfy x + y = x + z, then y = z.
Proof. Let x′ be an additive inverse of x (that means that x′ + x = O), which must exist by the vector space axioms. Then
y = O + y = (x′ + x) + y = x′ + (x + y) = x′ + (x + z) = (x′ + x) + z = O + z = z.
(iii) For every v ∈ V, its inverse element is unique. (Note that in the vector space axioms we only asked for existence of an additive inverse for every element x ∈ V; we did not ask for uniqueness. So one could think that there may be several elements which satisfy (d) in Definition 5.1. However, this is not possible as the following proof shows.)
Proof. Let v ∈ V and assume that there are elements v′, v″ in V such that
v + v′ = O,   v + v″ = O.
Then v′ = v′ + O = v′ + (v + v″) = (v′ + v) + v″ = O + v″ = v″.
(iv) For every λ ∈ K we have λO = O.
Proof. Observe that λO = λ(O + O) = λO + λO, hence
λO + O = λO + λO,
and it follows from (ii) that O = λO.
(v) For every v ∈ V we have 0v = O.
Proof. The proof is similar to the one above. Observe that 0v = 0v + O and 0v = (0 + 0)v = 0v + 0v, hence
0v + O = 0v + 0v.
Now it follows from (ii) that O = 0v (take x = 0v, y = O and z = 0v in (ii)).
(vi) For every v ∈ V, the additive inverse of v is (−1)v.
Proof. Using (v) we find
O = 0v = (1 + (−1))v = v + (−1)v.
Hence (−1)v is an additive inverse of v. By (iii), the inverse of v is unique, therefore (−1)v is the inverse of v.
Remark 5.4. From now on, we write −v for the additive inverse of a vector. This notation is justified by the preceding properties.
Examples 5.5 (Examples of vector spaces).
• R is a real vector space. More generally, Rn is a real vector space. The proof is the same
as for R2 in Chapter 2. Associativity and commutativity are clear. The identity element is
the vector whose entries are all equal to zero: ~0 = (0, . . . , 0)t . The inverse for a given vector
~x = (x1 , . . . , xn )t is (−x1 , . . . , −xn )t . The distributivity laws are clear, as is the fact that
1~x = ~x for every ~x ∈ Rn .
• C is a complex vector space. More generally, Cn is a complex space. The proof is the same
as for Rn .
• R is not a complex vector space with the usual definition of the algebraic operations. If it
was, then the vectors would be real numbers and the scalars would be complex numbers. But
then if we take 1 ∈ R and i ∈ C, then the product i1 must be a vector, that is, a real number,
which is not the case.
• R can be seen as a Q-vector space.
• For every n, m ∈ N, the space M (m × n) of all m × n matrices with real coefficients is a real
vector space.
Proof. Note that in this case the vectors are matrices. Associativity and commutativity are easy to check. The identity element is the matrix whose entries are all equal to zero. Given a matrix $A = (a_{ij})_{i=1,\dots,m,\ j=1,\dots,n}$, its (additive) inverse is the matrix $-A = (-a_{ij})_{i=1,\dots,m,\ j=1,\dots,n}$. The distributivity laws are clear, as is the fact that 1A = A for every A ∈ M(m × n).
FT
Proof. As in the case of real matrices.
• Let C(R) be the set of all continuous functions from R to R. We define the sum of two
functions f and g in the usual way as the new function
f + g : R → R, (f + g)(x) = f (x) + g(x).
The product of a function f with a real number λ gives the new function λf defined by
λf : R → R, (λf )(x) = λf (x).
Then C(R) is a vector space with these new operations.
Proof. It is clear that these operations satisfy associativity, commutativity and distributivity
and that 1f = f for every function f ∈ C(R). The additive identity is the zero function
(the function which is constant to zero). For a given function f , its (additive) inverse is the
function −f .
• Let P (R) be the set of all polynomials. With the usual sum and products with scalars, they
form a vector space.
Prove that C is a vector space over R and that R is a vector space over Q.
Observe that the sets M (m × n) and C(R) admit more operations, for example we can multiply
functions, or we can multiply matrices or we can calculate det A for a square matrix. However, all
these operations have nothing to do with the question whether they are vector spaces or not. It
is important to note that for a vector space we only need the sum of two vectors and the product
of a scalar with vector and that they satisfy the axioms from Definition 5.1.
Examples 5.6. • Consider R2 but change the usual sum to the new sum ⊕ defined by
$$\begin{pmatrix}x\\y\end{pmatrix}\oplus\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}x+a\\0\end{pmatrix}.$$
With this new sum, R2 is not a vector space. The reason is that there is no additive identity. To see this, assume that we had an additive identity, say $\begin{pmatrix}\alpha\\\beta\end{pmatrix}$. Then we must have
$$\begin{pmatrix}x\\y\end{pmatrix}\oplus\begin{pmatrix}\alpha\\\beta\end{pmatrix} = \begin{pmatrix}x\\y\end{pmatrix}\quad\text{for all } \begin{pmatrix}x\\y\end{pmatrix}\in\mathbb{R}^2.$$
But the second component of the left-hand side is always 0, so this fails for every vector with $y \neq 0$.
• Consider R2 but change the usual sum to the new sum ⊕ defined by
$$\begin{pmatrix}x\\y\end{pmatrix}\oplus\begin{pmatrix}a\\b\end{pmatrix} = \begin{pmatrix}x+b\\y+a\end{pmatrix}.$$
With this new sum, R2 is not a vector space. One of the reasons is that the sum is not commutative. For example
$$\begin{pmatrix}1\\0\end{pmatrix}\oplus\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1+1\\0+0\end{pmatrix} = \begin{pmatrix}2\\0\end{pmatrix},\quad\text{but}\quad \begin{pmatrix}0\\1\end{pmatrix}\oplus\begin{pmatrix}1\\0\end{pmatrix} = \begin{pmatrix}0+0\\1+1\end{pmatrix} = \begin{pmatrix}0\\2\end{pmatrix}.$$
Show that there is no additive identity O which satisfies ~x ⊕ O = ~x for all ~x ∈ R2.
• Let V = R+ = (0, ∞). We make V a real vector space with the following operations: Let
x, y ∈ V and λ ∈ R. We define
$$x \oplus y = xy \qquad\text{and}\qquad \lambda\odot x = x^{\lambda}.$$
Let us verify, for example, the axioms involving the product with scalars: for $v \in V$ and $\lambda,\mu\in\mathbb{R}$,
$$(\lambda\mu)\odot v = v^{\lambda\mu} = (v^{\mu})^{\lambda} = \lambda\odot(v^{\mu}) = \lambda\odot(\mu\odot v)$$
and
$$(\lambda+\mu)\odot v = v^{\lambda+\mu} = v^{\lambda}v^{\mu} = (\lambda\odot v)(\mu\odot v) = (\lambda\odot v)\oplus(\mu\odot v).$$
• The example above can be generalised: Let f : R → (a, b) be an injective function. Then the
interval (a, b) becomes a real vector space if we define the sum of two vectors x, y ∈ (a, b) by
$$x \oplus y = f\bigl(f^{-1}(x) + f^{-1}(y)\bigr) \qquad\text{and}\qquad \lambda\odot x = f\bigl(\lambda f^{-1}(x)\bigr).$$
Note that in the example above we had (a, b) = (0, ∞) and f = exp (that is: f (x) = ex ).
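A concrete instance (our own illustration, choosing $f = \tanh$): for $(a, b) = (-1, 1)$ the induced operations are
$$x \oplus y = \tanh\bigl(\operatorname{artanh}x + \operatorname{artanh}y\bigr) = \frac{x+y}{1+xy},\qquad \lambda\odot x = \tanh\bigl(\lambda\operatorname{artanh}x\bigr),$$
so the open interval $(-1, 1)$ becomes a real vector space with these operations.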
Exercises.
In the following exercises, decide whether the given set is a vector space over R. If it is not, determine which vector space properties fail.
1. {(x, y) ∈ R2 : y ≤ 0} with the usual sum and scalar product.
2. {(x, 2x, 3x) : x ∈ R} with the usual sum and scalar product.
sum: ~a ⊕ ~b := ~a − ~b,
13. Let V = {a} (note that V has only one element), and define on V the following operations:
sum: a ⊕ a := a,
scalar product: λ ⊙ a = a.
5.2 Subspaces
In this section, we work mostly with real vector spaces for the sake of definiteness. However, all
the statements are also true for complex vector spaces. We only have to replace R by C and the
word real by complex everywhere.
In this section we will investigate when a subset of a given vector space is itself a vector space.
Definition 5.7. Let V be a vector space and let W ⊆ V be a subset of V . Then W is called a
subspace of V if W itself is a vector space with the sum and product with scalars inherited from V .
A subspace W is called a proper subspace if W 6= {O} and W 6= V .
• V always contains the following subspaces: {O} and V itself. However, they are not proper
subspaces.
• If V is a vector space, W is a subspace of V and U is a subspace of W , then U is a subspace
of V .
Prove these statements.
Remark 5.9. Let W be a subspace of a vector space V . Let O be the neutral element in V . Then
O ∈ W and it is the neutral element of W .
Proof. Since W is a vector space, it must have a neutral element OW . A priori, it is not clear that
OW = O. However, since OW ∈ W ⊂ V , we know that 0OW = O. On the other hand, since
W is a vector space, it is closed under product with scalars, so O = 0OW ∈ W . Clearly, O is a
FT
neutral element in W . So it follows that O = OW by uniqueness of the neutral element of W , see
Properties 5.3(i).
Now assume that we are given a vector space V and in it a subset W ⊆ V and we would like to
check if W is a vector space. In principle we would have to check all seven vector space axioms
from Definition 5.1. However, if W is a subset of V , then we get some of the vector space axioms
for free. More precisely, the axioms (a), (b), (e), (f) and (g) hold automatically. For example, to
prove (b), we take two elements w1 , w2 ∈ W . They belong also to V since W ⊆ V , and therefore
they commute: w1 + w2 = w2 + w1 .
We can even show the following proposition:
Proposition 5.10. Let V be a real vector space and W ⊆ V a subset. Then W is a subspace of V if and only if the following three properties hold:
(i) W ≠ ∅, that is, W is not empty.
(ii) W is closed under sums, that is, if we take w1, w2 ∈ W, then w1 + w2 ∈ W.
(iii) W is closed under product with scalars, that is, if we take w ∈ W and λ ∈ R, then λw ∈ W.
Remark. Conditions (ii) and (iii) can be combined into the single condition: if we take w1, w2 ∈ W and λ ∈ R, then λw1 + w2 ∈ W.
Proof of 5.10. Assume that W is a subspace; then clearly (ii) and (iii) hold, and (i) holds because every vector space must contain at least the additive identity O.
Now suppose that W is a subset of V such that the properties (i), (ii) and (iii) are satisfied. In order to show that W is a subspace of V, we need to verify the vector space axioms from Definition 5.1. By assumptions (ii) and (iii), the sum and product with scalars are well defined in W. Moreover, we already convinced ourselves that (a), (b), (e), (f) and (g) hold. For the remaining axioms, pick any w ∈ W (possible by (i)); then O = 0w ∈ W by (iii), and for every w ∈ W also −w = (−1)w ∈ W by (iii), so W contains the additive identity and additive inverses.
In order to verify that a given W ⊆ V is a subspace, one only has to verify (i), (ii) and (iii) from the preceding proposition. In order to verify that W is not empty, one typically checks whether it contains O.
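For example (a small verification of our own): $W = \left\{\begin{pmatrix}x\\3x\end{pmatrix} : x\in\mathbb{R}\right\}\subseteq\mathbb{R}^2$ satisfies (i) since $\vec 0\in W$, (ii) since
$$\begin{pmatrix}x_1\\3x_1\end{pmatrix}+\begin{pmatrix}x_2\\3x_2\end{pmatrix} = \begin{pmatrix}x_1+x_2\\3(x_1+x_2)\end{pmatrix}\in W,$$
and (iii) since $\lambda\begin{pmatrix}x\\3x\end{pmatrix} = \begin{pmatrix}\lambda x\\3\lambda x\end{pmatrix}\in W$; hence W is a subspace of R2.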
Examples 5.11. Let V be a vector space. We assume that V is a real vector space, but everything
works also for a complex vector space (we only have to replace R everywhere by C.)
(i) {0} is a subspace of V . It is called the trivial subspace of V .
(ii) V itself is a subspace of V .
(iii) Fix v ∈ V . Then the set W := {λv : λ ∈ R} is a subspace of V .
(iv) More generally, if we fix v1 , . . . vk ∈ V , then the set W := {α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R}
is a subspace of V . This set is called the linear span of v1 , . . . , vk . It will be shown in
Theorem 5.24 that it is indeed a vector space.
(v) P (R), the set of all real polynomials, is a subspace of C(R), the set of all continuous functions
on R.
(vi) For every n, the polynomials of degree at most n, Pn (R), is a subspace of P (R), and also of
C(R).
(vii) If W is a subspace of V , then V \ W is not a subspace. This can be easily seen if we recall that
W must contain O. But then V \ W cannot contain O, hence it cannot be a vector space.
D
Examples 5.12. (i) The set of all solutions of a homogeneous system of linear equations is a
vector space.
(ii) The set of all solutions of a homogeneous linear differential equation is a vector space.
Proof. We prove only (i) since the proof of (ii) is similar. Assume that the system of equations is
given in matrix form A~x = ~0 for some matrix A ∈ (m × n). We denote by U the set of all solutions,
that is, U := {~x ∈ Rn : A~x = ~0}. Clearly, ~0 ∈ U . Now let ~y , ~z ∈ U and λ ∈ R. We have to show
that then also ~y + λ~z ∈ U . This is true because
Therefore, the vector ~y + λ~z solves the homogeneous equation, so it belongs to U as we wanted to
show.
Note however that the set of all solutions of an inhomogeneous equation is in general not a vector space. Let us consider a system of linear equations given in matrix form by
A~x = ~b
where A ∈ M (m × n) and ~b ∈ Rm with ~b 6= ~0. We denote its set of solutions by
U := {~x ∈ Rn : A~x = ~b}.
Clearly, ~0 ∈ / U because A~0 = ~0 6= ~b. This is already enough to conclude that U is not a vector
space. But we can also see that U is not closed under sums and multiplication by scalars. To this
end, let ~y , ~z ∈ U and λ ∈ R \ {0}. Then
A(~y + λ~z) = A~y + λA~z = ~b + λ~b = (1 + λ)~b 6= ~b.
However, U is “almost” a vector space. Recall that if we write the solutions of A~x = ~b in vector form, they are always of the form
~x = ~z0 + t1~y1 + · · · + tk~yk
where k is the number of free variables and ~y1, . . . , ~yk are solutions of the homogeneous system A~x = ~0. See Remark 3.12. Therefore we can write
U = {~x ∈ Rn : A~x = ~b}
  = {~x = ~z0 + t1~y1 + · · · + tk~yk : t1, . . . , tk ∈ R}
  = ~z0 + {~x′ = t1~y1 + · · · + tk~yk : t1, . . . , tk ∈ R}
  = ~z0 + {~x ∈ Rn : A~x = ~0}.
This shows that U is equal to the vector space W = {~x ∈ Rn : A~x = ~0} shifted by the vector ~z0.
We will call such sets affine subspaces. They are very important in many applications. The formal
definition is as follows.
Definition 5.13. Let V be a vector space and W ⊆ V a subset. Then W is called an affine subspace if there exists a v0 ∈ V such that the set
v0 + W := {v0 + w : w ∈ W}
is a subspace of V.
Examples 5.14. Let V be a vector space. We assume that V is a real vector space, but everything
works also for a complex vector space (we only have to replace R everywhere by C.)
(i) Every subspace of V is also an affine subspace with z0 = O.
(ii) If we fix z0 and v1 , . . . vk ∈ V , then the set W := {z0 + α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R} =
z0 + {α1 v1 + · · · αk vk : α1 , . . . , αk ∈ R} is an affine subspace of V . In general it will not be a
subspace.
Examples 5.15. • The set of all solutions of an inhomogeneous system of linear equations is
an affine vector space if it is not empty.
• The set of all solutions of an inhomogeneous linear differential equation is an affine vector
space if it is not empty.
Now we give several examples and non-examples of subspaces of R2 and R3 . You should try to
generalize them to examples in R4 , R5 , etc. and also try to come up with your own examples.
Examples 5.16 (Examples and non-examples of subspaces of R2).
• $W = \left\{\begin{pmatrix}\lambda\\0\end{pmatrix} : \lambda\in\mathbb{R}\right\}$ is a subspace of R2. This is actually a subspace of the form (iii) from Example 5.11 with $v = \begin{pmatrix}1\\0\end{pmatrix}$. Note that geometrically W is a line (it is the x-axis).
• For fixed $v_1, v_2 \in\mathbb{R}$ let $\vec v = \begin{pmatrix}v_1\\v_2\end{pmatrix}$ and let $W = \{\lambda\vec v : \lambda\in\mathbb{R}\}$. Then W is a subspace of R2. Geometrically, W is the trivial subspace {~0} if ~v = ~0. Otherwise it is the line in R2 passing through the origin which is parallel to the vector ~v.
[Figure: the line W through the origin in direction ~v.]
• For fixed $a_1, a_2, v_1, v_2 \in\mathbb{R}$ let $\vec a = \begin{pmatrix}a_1\\a_2\end{pmatrix}$ and $\vec v = \begin{pmatrix}v_1\\v_2\end{pmatrix}$. Let us assume that $\vec v \neq \vec 0$ and set $W = \{\vec a + \lambda\vec v : \lambda\in\mathbb{R}\}$. Then W is an affine subspace. Geometrically, W represents a line in R2 parallel to ~v which passes through the point (a1, a2). Note that W is a subspace if and only if ~a and ~v are parallel.
Figure 5.2: Sketches of W = {~a + λ~v : λ ∈ R}. In the figure on the left-hand side, ~a ∦ ~v, so W is an affine subspace of R2 but not a subspace. In the figure on the right-hand side, ~a ∥ ~v and therefore W is a subspace of R2.
• $V = \{\vec x\in\mathbb{R}^2 : \|\vec x\|\le 2\}$ is not a subspace of $\mathbb{R}^2$. For example, take $\vec z = \begin{pmatrix}1\\0\end{pmatrix}$. Then $\vec z\in V$, however $3\vec z\notin V$.
• $W = \left\{\begin{pmatrix}x\\y\end{pmatrix} : x\ge 0\right\}$ is not a vector space. For example, $\vec z = \begin{pmatrix}2\\0\end{pmatrix}\in W$, but $(-1)\vec z = \begin{pmatrix}-2\\0\end{pmatrix}\notin W$. Note that geometrically W is a right half plane in R2.
Figure 5.3: The sets V and W in the figures are not subspaces of R2.
• For fixed a, b, c ∈ R the set $W = \left\{\begin{pmatrix}x\\y\\z\end{pmatrix} : ax + by + cz = 0\right\}$ is a subspace of R3.
Proof. We use Proposition 5.10 to verify that W is a subspace of R3. Clearly, ~0 ∈ W since 0a + 0b + 0c = 0. Now let $\vec w_1 = \begin{pmatrix}x_1\\y_1\\z_1\end{pmatrix}$ and $\vec w_2 = \begin{pmatrix}x_2\\y_2\\z_2\end{pmatrix}$ be in W and let λ ∈ R. Then $\vec w_1 + \vec w_2 \in W$ because
$$a(x_1+x_2) + b(y_1+y_2) + c(z_1+z_2) = (ax_1+by_1+cz_1) + (ax_2+by_2+cz_2) = 0 + 0 = 0.$$
Also $\lambda\vec w_1 \in W$ because
$$a(\lambda x_1) + b(\lambda y_1) + c(\lambda z_1) = \lambda(ax_1 + by_1 + cz_1) = \lambda\cdot 0 = 0.$$
Remark. Note that W is the set of all solutions of a homogeneous linear system of equations
(one equation with three unknowns). Therefore W is a vector space by Theorem 3.22 where
it is shown that the sum and the product with a scalar of two solutions of a homogeneous
linear system is again a solution.
• For fixed a, b, c, d ∈ R with d ≠ 0 and at least one of the numbers a, b, c different from 0, the set $W = \left\{\begin{pmatrix}x\\y\\z\end{pmatrix} : ax + by + cz = d\right\}$ is not a subspace of R3, see Figure 5.4, but it is an affine subspace.
Proof. Let us see that W is not a vector space. Let $\vec w_1 = \begin{pmatrix}x_1\\y_1\\z_1\end{pmatrix}$ and $\vec w_2 = \begin{pmatrix}x_2\\y_2\\z_2\end{pmatrix}$ be in W. Then $\vec w_1 + \vec w_2 \notin W$ because
$$a(x_1+x_2) + b(y_1+y_2) + c(z_1+z_2) = (ax_1+by_1+cz_1) + (ax_2+by_2+cz_2) = d + d = 2d \neq d$$
since d ≠ 0.
Figure 5.4: The green plane passes through the origin and is a subspace of R3 . The red plane does
not pass through the origin and therefore it is an affine subspace of R3 .
an arbitrary vector from the origin to a point on the plane W . (Note that W0 is the plane
described by ax + by + cz = 0.)
Note that we already showed in Corollary 3.23 that W is an affine vector space.
RA
Remark. If a = b = c = 0, then W = ∅.
• $W = \left\{\begin{pmatrix}x\\x^2\\x^3\end{pmatrix} : x\in\mathbb{R}\right\}$. Then W is not a vector space. For example, $\vec a = \begin{pmatrix}1\\1\\1\end{pmatrix}\in W$, but $2\vec a = \begin{pmatrix}2\\2\\2\end{pmatrix}\notin W$.
• The set of all matrices such that its first row is equal to its last row.
Examples 5.19 (Examples and non-examples of subspaces of the set of all functions from R to R). Let V be the set of all functions from R to R. Then V clearly is a real vector space. The following sets are examples of subspaces of V:
Definition 5.20. For n ∈ N0 let Pn be the set of all polynomials of degree less than or equal to n.
Proof. Clearly, the zero function belongs to Pn (it is a polynomial of degree 0). For polynomials
p, q ∈ Pn and numbers λ ∈ R we clearly have that p + q and λp are again polynomials of degree
at most n, so they belong to Pn . By Proposition 5.10, Pn is a subspace of the space of all real
functions, hence it is a vector space.
• check if a given subset of a vector space is a subspace,
• etc.
Exercises.
In Exercises 1 to 13, decide whether the subset W is a subspace of the vector space V.
9. V = R3 and W the xy-plane.
(b) Determine W1 ∩ W2 and show that it is also a subspace of V.
Definition 5.22. Let V be a real vector space and let v1, . . . , vk ∈ V and α1, . . . , αk ∈ R. Then every vector of the form
v = α1v1 + · · · + αkvk     (5.1)
is called a linear combination of the vectors v1, . . . , vk ∈ V.
Examples 5.23. • Let V = R3 and let $\vec v_1 = \begin{pmatrix}1\\2\\3\end{pmatrix}$, $\vec v_2 = \begin{pmatrix}4\\5\\6\end{pmatrix}$, $\vec a = \begin{pmatrix}9\\12\\15\end{pmatrix}$, $\vec b = \begin{pmatrix}3\\3\\3\end{pmatrix}$. Then ~a and ~b are linear combinations of ~v1 and ~v2 because ~a = ~v1 + 2~v2 and ~b = −~v1 + ~v2.
• Let V = M(2 × 2) and let $A = \begin{pmatrix}1&0\\0&1\end{pmatrix}$, $B = \begin{pmatrix}0&1\\-1&0\end{pmatrix}$, $R = \begin{pmatrix}5&7\\-7&5\end{pmatrix}$, $S = \begin{pmatrix}1&2\\-2&3\end{pmatrix}$. Then R is a linear combination of A and B because R = 5A + 7B. S is not a linear combination of A and B because clearly every linear combination of A and B is of the form
$$\alpha A + \beta B = \begin{pmatrix}\alpha&\beta\\-\beta&\alpha\end{pmatrix},$$
so it can never be equal to S since S has two different numbers on its diagonal.
Definition and Theorem 5.24. Let V be a real vector space and let v1 , . . . , vk ∈ V . Then the
set of all their possible linear combinations is denoted by
span{v1 , . . . , vk } := {α1 v1 + · · · + αk vk : α1 , . . . , αk ∈ R}.
It is a subspace of V and it is called the linear span of the vectors v1 , . . . , vk . The vectors v1 , . . . , vk
are called generators of the subspace span{v1 , . . . , vk }.
Remark. By definition, the vector space generated by the empty set is the vector space which
consists only of the zero vector, that is, span{} := {O}.
Remark. Other names for “linear span” that are commonly used, are subspace generated by
the v1 , . . . , vk or subspace spanned by the v1 , . . . , vk . Instead of span{v1 , . . . , vk } the notation
gen{v1 , . . . , vk } is used frequently. All these names and notations mean exactly the same thing.
Proof. Set W := span{v1, . . . , vk}. Clearly O ∈ W (we only have to choose all the αj = 0). Now let u, w ∈ W and λ ∈ R. We have to show that λu + w ∈ W. Since u, w ∈ W, there are real numbers α1, . . . , αk and β1, . . . , βk such that u = α1v1 + · · · + αkvk and w = β1v1 + · · · + βkvk. Then
$$\lambda u + w = \lambda(\alpha_1v_1 + \dots + \alpha_kv_k) + \beta_1v_1 + \dots + \beta_kv_k = (\lambda\alpha_1 + \beta_1)v_1 + \dots + (\lambda\alpha_k + \beta_k)v_k,$$
which belongs to W since it is a linear combination of the vectors v1, . . . , vk.
Remark. The generators of a given subspace are not unique.
For example, let $A = \begin{pmatrix}1&0\\0&1\end{pmatrix}$, $B = \begin{pmatrix}0&-1\\1&0\end{pmatrix}$, $C = \begin{pmatrix}1&-1\\1&1\end{pmatrix}$. Then
$$\operatorname{span}\{A, B\} = \{\alpha A + \beta B : \alpha,\beta\in\mathbb{R}\} = \left\{\begin{pmatrix}\alpha&-\beta\\\beta&\alpha\end{pmatrix} : \alpha,\beta\in\mathbb{R}\right\},$$
$$\operatorname{span}\{A, B, C\} = \{\alpha A + \beta B + \gamma C : \alpha,\beta,\gamma\in\mathbb{R}\} = \left\{\begin{pmatrix}\alpha+\gamma&-(\beta+\gamma)\\\beta+\gamma&\alpha+\gamma\end{pmatrix} : \alpha,\beta,\gamma\in\mathbb{R}\right\},$$
$$\operatorname{span}\{A, C\} = \{\alpha A + \gamma C : \alpha,\gamma\in\mathbb{R}\} = \left\{\begin{pmatrix}\alpha+\gamma&-\gamma\\\gamma&\alpha+\gamma\end{pmatrix} : \alpha,\gamma\in\mathbb{R}\right\}.$$
We see that span{A, B} = span{A, B, C} = span{A, C} (in all cases it consists of exactly those matrices whose diagonal entries are equal and whose off-diagonal entries differ by a minus sign). So we see that neither the generators nor their number is unique.
Remark. If a vector is a linear combination of other vectors, then the coefficients in the linear
combination are not necessarily unique.
Remark 5.25. Let V be a vector space and let v1 , . . . , vn and w1 , . . . , wm be vectors in V . Then
the following are equivalent:
(i) span{v1 , . . . , vn } = span{w1 , . . . , wm }.
(ii) vj ∈ span{w1 , . . . , wm } for every j = 1, . . . , n and wk ∈ span{v1 , . . . , vn } for every k =
1, . . . , m.
(ii) The space Pn of polynomials of degree at most n is generated by the polynomials X^n, X^{n−1}, . . . , X, 1.
(iii) Let V = R3 and let ~v, ~w ∈ R3 \ {~0}.
• span{~v} is a line which passes through the origin and is parallel to ~v.
• If ~v ∦ ~w, then span{~v, ~w} is a plane which passes through the origin and is parallel to ~v and ~w. If ~v ∥ ~w, then it is a line which passes through the origin and is parallel to ~v.
It follows that α = 3 and β = −1 is a solution, and therefore q = 3p1 − p2, which shows that q ∈ U.
• Let us check if r ∈ U. To this end we have to check if we can find α, β such that r = αp1 + βp2. Inserting the expressions for p1, p2, r we obtain
α + β = 1
−α − 2β = 1
α + 5β = −3.
Applying Gaussian elimination to the augmented matrix gives
$$\left(\begin{array}{cc|c}1&1&1\\-1&-2&1\\1&5&-3\end{array}\right) \longrightarrow \left(\begin{array}{cc|c}1&1&1\\0&-1&2\\0&4&-4\end{array}\right) \longrightarrow \left(\begin{array}{cc|c}1&1&1\\0&-1&2\\0&0&4\end{array}\right).$$
We see that the system is inconsistent. Therefore r is not a linear combination of p1 and p2, hence r ∉ U.
/ U.
Definition 5.28. A vector space V is called finitely generated if it has a finite set of generators.
Another proof using the concept of dimension will be given in Example 5.56 (f).
Later, in Lemma 5.53, we will see that every subspace of a finitely generated vector space is again
finitely generated.
Now we ask ourselves what is the least number of vectors we need in order to generate Rn . We
know that for example Rn = span{~e1 , . . . ,~en }. So in this case we have n vectors that generate
Rn . Could it be that fewer vectors are sufficient? Clearly, if we take away one of the ~ej , then the
remaining system no longer generates Rn since “one coordinate is missing”. However, could we
maybe find other vectors so that n − 1 or less vectors are enough to generate all of Rn ? The next
proposition says that this is not possible.
Proposition. If the vectors ~v1, . . . , ~vk ∈ Rn generate Rn, then k ≥ n.
Proof. Let A = (~v1| . . . |~vk) be the matrix whose columns are the given vectors. We know that there exists an invertible matrix E such that A′ = EA is in reduced echelon form (the matrix E is the product of elementary matrices which correspond to the steps in the Gauß-Jordan process to arrive at the reduced echelon form). Now, if k < n, then we know that A′ must have at least one row which consists of zeros only. If we can find a vector ~w such that it is transformed to ~en under the Gauß-Jordan process, then we would have that A~x = ~w is inconsistent, which means that ~w ∉ span{~v1, . . . , ~vk}. How do we find such a vector ~w? Well, we only have to start with ~en and “do the Gauß-Jordan process backwards”. In other words, we can take ~w = E⁻¹~en. Now if we apply the Gauß-Jordan process to the augmented matrix (A|~w), we arrive at (EA|E~w) = (A′|~en), which we already know is inconsistent.
Therefore k < n is not possible, and we must have k ≥ n.
Note that the proof above is basically the same as the one in Remark 3.37. Observe that the system
of vectors ~v1 , . . . , ~vk ∈ Rn is a set of generators for Rn if and only if the equation A~y = ~b has a
solution for every ~b ∈ Rn (as above, A is the matrix whose columns are the vectors ~v1 , . . . , ~vk ).
Now we will answer the question when the coefficients of a linear combination are unique. The
following remark shows us that we have to answer this question only for the zero vector.
Remark 5.31. Let V be a vector space, let v1, . . . , vk ∈ V and let w ∈ span{v1, . . . , vk}. Then there are unique β1, . . . , βk ∈ R such that
β1v1 + · · · + βkvk = w     (5.2)
if and only if the only solution of
α1v1 + · · · + αkvk = O     (5.3)
is the trivial one α1 = · · · = αk = 0.
Proof. First note that (5.3) always has at least one solution, namely α1 = · · · = αk = 0. This solution is called the trivial solution.
Let us assume that (5.2) has two different solutions, so that there are γ1, . . . , γk ∈ R such that for at least one j = 1, . . . , k we have βj ≠ γj and
γ1v1 + · · · + γkvk = w.     (5.2')
Subtracting (5.2') from (5.2) we obtain
(β1 − γ1)v1 + · · · + (βk − γk)vk = O,
where at least one coefficient is different from zero. Therefore also (5.3) has more than one solution.
On the other hand, let us assume that (5.3) has a non-trivial solution, that is, at least one of the αj in (5.3) is different from zero. But then, if we sum (5.2) and (5.3), we obtain another solution for (5.2) because
(α1 + β1)v1 + · · · + (αk + βk)vk = O + w = w.
The proof shows that there are as many solutions of (5.2) as there are of (5.3).
It should also be noted that if (5.3) has one non-trivial solution, then it has automatically infinitely
many solutions, because if α1 , . . . , αk is a solution, then also cα1 , . . . , cαk is a solution for arbitrary
c ∈ R since
cα1 v1 + · · · + cαk vk = c(α1 v1 + · · · + αk vk ) = c O = O.
In fact, the discussion above should remind you of the relation between solutions of an inhomo-
geneous system and the solutions of its associated homogeneous system in Theorem 3.22. Note
that just as in the case of linear systems, (5.2) could have no solution. This happens if and only
if w ∈
/ span{v1 , . . . , vk }.
If V = Rn then the remark above is exactly Theorem 3.22.
So we see that only one of the following two cases can occur: (5.4) has exactly one solution (namely the trivial one) or it has infinitely many solutions. Note that this is analogous to the situation of the solutions of homogeneous linear systems: they have either only the trivial solution or they have infinitely many solutions. The following definition distinguishes between the two cases.
Definition 5.32. Let V be a vector space. The vectors v1, . . . , vk in V are called linearly independent if
α1v1 + · · · + αkvk = O     (5.4)
has only the trivial solution. They are called linearly dependent if (5.4) has more than one solution.
Remark 5.33. The empty set is linearly independent since O cannot be written as a nontrivial
linear combination of vectors from the empty set.
Examples. (i) The vectors $\vec v_1$ and $\vec v_2 = -4\vec v_1$ in $\mathbb{R}^2$ (for instance with first entries 2 and −8) are linearly dependent because $4\vec v_1 + \vec v_2 = \vec 0$.
(ii) The vectors $\vec v_1 = \begin{pmatrix}1\\2\end{pmatrix}$ and $\vec v_2 = \begin{pmatrix}3\\0\end{pmatrix} \in \mathbb{R}^2$ are linearly independent.
Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:
α + 3β = 0
2α + 0β = 0.
We can use the Gauß-Jordan process to obtain all solutions. However, in this case we easily see that α = 0 (from the second line) and then that β = −α/3 = 0. Note that we could also have calculated $\det\begin{pmatrix}1&3\\2&0\end{pmatrix} = -6 \neq 0$ to conclude that the homogeneous system above has only the trivial solution. Observe that the columns of the matrix are exactly the given vectors.
(iii) The vectors $\vec v_1 = \begin{pmatrix}1\\1\\1\end{pmatrix}$ and $\vec v_2 = \begin{pmatrix}2\\3\\4\end{pmatrix} \in \mathbb{R}^3$ are linearly independent.
Proof. Consider the equation αv~1 + β v~2 = ~0. This equation is equivalent to the following
system of linear equations for α and β:
α + 2β = 0
α + 3β = 0
α + 4β = 0.
FT
If we subtract the first from the second equation, we obtain β = 0 and then α = −2β = 0. So
again, this system has only the trivial solution and therefore the vectors v~1 and v~2 are linearly
independent.
(iv) Let $\vec v_1 = \begin{pmatrix}1\\1\\1\end{pmatrix}$, $\vec v_2 = \begin{pmatrix}-1\\2\\3\end{pmatrix}$, $\vec v_3 = \begin{pmatrix}0\\0\\1\end{pmatrix}$ and $\vec v_4 = \begin{pmatrix}0\\6\\8\end{pmatrix} \in \mathbb{R}^3$. Then
(a) The system {~v1, ~v2, ~v3} is linearly independent.
(b) The system {~v1, ~v2, ~v4} is linearly dependent.
Proof. (a) Consider the equation αv~1 + β v~2 + γ v~3 = ~0. This equation is equivalent to the
following system of linear equations for α, β and γ:
α − 1β + 0γ = 0
α + 2β + 0γ = 0
α + 3β + 1γ = 0.
We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors ~v1 , ~v2 , ~v3 .
$$A = \begin{pmatrix}1&-1&0\\1&2&0\\1&3&1\end{pmatrix} \longrightarrow \begin{pmatrix}1&-1&0\\0&3&0\\0&4&1\end{pmatrix} \longrightarrow \begin{pmatrix}1&-1&0\\0&1&0\\0&4&1\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$
Therefore the unique solution is α = β = γ = 0 and consequently the vectors ~v1 , ~v2 , ~v3
are linearly independent.
Observe that we also could have calculated det A = 3 6= 0 to conclude that the homoge-
neous system has only the trivial solution.
(b) Consider the equation αv~1 + β v~2 + δ v~4 = ~0. This equation is equivalent to the following
system of linear equations for α, β and δ:
α − 1β + 0δ = 0
α + 2β + 6δ = 0
α + 3β + 8δ = 0.
We use the Gauß-Jordan process to solve the system. Note that the columns of the
matrix associated to the above system are exactly the given vectors.
$$A = \begin{pmatrix}1&-1&0\\1&2&6\\1&3&8\end{pmatrix} \longrightarrow \begin{pmatrix}1&-1&0\\0&3&6\\0&4&8\end{pmatrix} \longrightarrow \begin{pmatrix}1&-1&0\\0&1&2\\0&1&2\end{pmatrix} \longrightarrow \begin{pmatrix}1&-1&0\\0&1&2\\0&0&0\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&2\\0&1&2\\0&0&0\end{pmatrix}.$$
So there are infinitely many solutions. If we take δ = t, then α = β = −2t. Consequently the vectors ~v1, ~v2, ~v4 are linearly dependent because, for example, −2~v1 − 2~v2 + ~v4 = ~0 (taking t = 1).
Observe that we also could have calculated det A = 0 to conclude that the system has infinitely many solutions.
(v) The matrices $\begin{pmatrix}0&1\\0&0\end{pmatrix}$ and $\begin{pmatrix}1&0\\0&0\end{pmatrix}$ are linearly independent in M(2 × 2).
(vi) The matrices $A = \begin{pmatrix}1&1\\0&1\end{pmatrix}$, $B = \begin{pmatrix}1&0\\0&1\end{pmatrix}$ and $C = \begin{pmatrix}0&1\\0&0\end{pmatrix}$ are linearly dependent in M(2 × 2) because A − B − C = 0.
After these examples we will proceed with some facts on linear independence. We start with the
special case when we have only two vectors.
Proposition 5.34. Let v1 , v2 be vectors in a vector space V . Then v1 , v2 are linearly dependent if
and only if one vector is a multiple of the other.
Proof. Assume that v1, v2 are linearly dependent. Then there exist α1, α2 ∈ R such that α1v1 + α2v2 = 0 and at least one of α1 and α2 is different from zero, say α1 ≠ 0. Then we have
$$v_1 + \frac{\alpha_2}{\alpha_1}v_2 = 0, \qquad\text{hence}\qquad v_1 = -\frac{\alpha_2}{\alpha_1}v_2.$$
2
Now assume on the other hand that, e.g., v1 is a multiple of v2 , that is v1 = λv2 for some λ ∈ R.
Then v1 − λv2 = 0 which is a nontrivial solution of α1 v1 + α2 v2 = 0 because we can take α1 = 1 6= 0
and α2 = −λ (note that λ may be zero).
The proposition above cannot be extended to the case of three or more vectors. For instance, the vectors $\vec a = \begin{pmatrix}1\\0\end{pmatrix}$, $\vec b = \begin{pmatrix}0\\1\end{pmatrix}$, $\vec c = \begin{pmatrix}1\\1\end{pmatrix}$ are linearly dependent because $\vec a + \vec b - \vec c = \vec 0$, but none of them is a multiple of any of the other two vectors.
(ii) If $\alpha_\ell \neq 0$, then we can solve for $v_\ell$:
$$v_\ell = -\frac{\alpha_1}{\alpha_\ell}v_1 - \dots - \frac{\alpha_{\ell-1}}{\alpha_\ell}v_{\ell-1} - \frac{\alpha_{\ell+1}}{\alpha_\ell}v_{\ell+1} - \dots - \frac{\alpha_k}{\alpha_\ell}v_k.$$
(iii) If the vectors v1 , . . . , vk ∈ V are linearly dependent, then there exist α1 , . . . , αk ∈ R such
that at least one of them is different from zero and α1 v1 + · · · + αk vk = O. But then also
α1 v1 + · · · + αk vk + 0w = O which shows that the system {v1 , . . . , vk , w} is linearly dependent.
(v) Suppose that a subsystem of v1 , . . . , vk ∈ V are linearly dependent. Then, by (iii) every
system in which it is contained, must be linearly dependent too. In particular, the original
system of vectors must be linearly dependent which contradicts our assumption. Note that
also the empty set is linearly independent by Remark 5.33.
Now we specialise to the case when V = Rn. Let us take vectors ~v1, . . . , ~vk ∈ Rn and let us write (~v1| · · · |~vk) for the n × k matrix whose columns are the vectors ~v1, . . . , ~vk.
Lemma 5.36. With the above notation, the following statements are equivalent:
(i) The vectors ~v1, . . . , ~vk are linearly dependent.
(ii) There exist α1, . . . , αk, not all equal to zero, such that α1~v1 + · · · + αk~vk = ~0.
(iii) There exists a vector $\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_k\end{pmatrix} \neq \vec 0$ such that $(\vec v_1|\cdots|\vec v_k)\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_k\end{pmatrix} = \vec 0$.
(iv) The homogeneous system corresponding to the matrix (~v1| · · · |~vk) has at least one non-trivial (and therefore infinitely many) solutions.
Proof. (i) =⇒ (ii) is simply the definition of linear dependence. (ii) =⇒ (iii) is only rewriting the vector equation in matrix form. (iv) only says in words what the equation in (iii) means. And finally (iv) =⇒ (i) holds because every non-trivial solution of the homogeneous system associated to (~v1| · · · |~vk) gives a non-trivial solution of α1~v1 + · · · + αk~vk = ~0.
Since we know that a homogeneous linear system with more unknowns than equations has infinitely many solutions, we immediately obtain the following corollary.
Corollary 5.37. Let ~v1, . . . , ~vk be vectors in Rn.
(i) If k > n, then the vectors ~v1, . . . , ~vk are linearly dependent.
(ii) If the vectors ~v1, . . . , ~vk are linearly independent, then k ≤ n.
Observe that (ii) does not say that if k ≤ n, then the vectors ~v1 , . . . , ~vk are linearly independent.
It only says that they have a chance to be linearly independent whereas a system with more than
n vectors always is linearly dependent.
Theorem 5.38. Let ~v1, . . . , ~vn be vectors in Rn and let A = (~v1| · · · |~vn). Then the following are equivalent:
(i) The vectors ~v1, . . . , ~vn are linearly independent.
(ii) The homogeneous system corresponding to A has only the trivial solution.
(iii) A is invertible.
(iv) det A ≠ 0.
Proof. The equivalence of (i) and (ii) follows from Lemma 5.36. The equivalence of (ii), (iii) and (iv) follows from Theorem 4.11.
Theorem 5.39. Let ~v1, . . . , ~vn be vectors in Rn and let A = (~v1| · · · |~vn) be the matrix whose columns are the given vectors ~v1, . . . , ~vn. Then the following are equivalent:
(i) The vectors ~v1, . . . , ~vn are linearly independent.
(ii) The vectors ~v1, . . . , ~vn generate Rn.
(iii) det A ≠ 0.
(ii) ⇐⇒ (iii): The vectors ~v1, . . . , ~vn generate Rn if and only if for every ~w ∈ Rn there exist numbers β1, . . . , βn such that β1~v1 + · · · + βn~vn = ~w. In matrix form that means that $A\begin{pmatrix}\beta_1\\ \vdots\\ \beta_n\end{pmatrix} = \vec w$. By Theorem 3.44 we know that this has a solution for every vector ~w if and only if A is invertible (because if we apply Gauß-Jordan to A, we must get to the identity matrix).
The proof of the preceding theorem basically goes like this: We consider the equation $A\vec\beta = \vec w$. When are the vectors ~v1, . . . , ~vn linearly independent? – They are linearly independent if and only if for ~w = ~0 the system has only the trivial solution. This happens if and only if the reduced echelon form of A is the identity matrix. And this happens if and only if det A ≠ 0.
When do the vectors ~v1, . . . , ~vn generate Rn? – They do, if and only if for every given vector ~w ∈ Rn the system has at least one solution. This happens if and only if the reduced echelon form of A is the identity matrix. And this happens if and only if det A ≠ 0.
Since a square matrix A is invertible if and only if its transpose At is invertible, Theorem 5.39 leads immediately to the following corollary.
Corollary 5.40. For a matrix A ∈ M (n × n) the following are equivalent:
(i) A is invertible.
(ii) The columns of A are linearly independent.
(iii) The rows of A are linearly independent.
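For instance (an example of our own choosing): for $A = \begin{pmatrix}1&2\\3&4\end{pmatrix}$ we have
$$\det A = 1\cdot 4 - 2\cdot 3 = -2 \neq 0,$$
so by Corollary 5.40 the columns $\begin{pmatrix}1\\3\end{pmatrix}, \begin{pmatrix}2\\4\end{pmatrix}$ and likewise the rows $(1, 2), (3, 4)$ are linearly independent.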
Comparing coefficients, it follows that α1 + 2α3 = 0, 2α1 + 5α2 − 11α3 = 0, −α1 + 2α2 − 8α3 = 0. We write this in matrix form and apply Gauß-Jordan:
$$\begin{pmatrix}1&0&2\\2&5&-11\\-1&2&-8\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&2\\0&5&-15\\0&2&-6\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&2\\0&1&-3\\0&1&-3\end{pmatrix} \longrightarrow \begin{pmatrix}1&0&2\\0&1&-3\\0&0&0\end{pmatrix}.$$
This shows that the system has non-trivial solutions (find them!) and therefore p1, p2 and p3 are linearly dependent.
• In V = M(2 × 2) consider $A = \begin{pmatrix}1&2\\2&1\end{pmatrix}$, $B = \begin{pmatrix}1&0\\0&1\end{pmatrix}$, $C = \begin{pmatrix}0&5\\5&0\end{pmatrix}$. Then A, B, C are linearly dependent because $A - B - \tfrac25 C = 0$.
• In V = M(2 × 3) consider $A = \begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}$, $B = \begin{pmatrix}2&2&2\\1&1&1\end{pmatrix}$, $C = \begin{pmatrix}1&2&2\\2&1&1\end{pmatrix}$. Then A, B, C are linearly independent.
Example. Let $V = \left\{\begin{pmatrix}x\\y\\z\end{pmatrix}\in\mathbb{R}^3 : x + 2y = 0\right\}$. Find a set of generators for V.
Solution. Clearly, V is a subspace of R3 (it is a plane). Let $\vec x = \begin{pmatrix}x\\y\\z\end{pmatrix} \in V$. By definition of V, we have that x + 2y = 0. We can solve this equation for x and obtain x = −2y. So
$$\vec x = \begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}-2y\\y\\z\end{pmatrix} = y\begin{pmatrix}-2\\1\\0\end{pmatrix} + z\begin{pmatrix}0\\0\\1\end{pmatrix}.$$
Therefore $\left\{\begin{pmatrix}-2\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$ is a set of generators for V.
You should now be able to
• verify if a given vector is a linear combination of a given set of vectors,
• verify if a given vector lies in the linear span of a given set of vectors,
• verify if a given set of vectors generates a given vector space,
• find a set of generators for a given vector space,
• verify if a given set of vectors is linearly independent,
• etc.
Exercises.
In Exercises 1 to 10, find a generating set for the given space. Before doing so, make sure that the given sets are indeed vector spaces.
1. {~x ∈ R3 : x + y + z = 0}.
2. {~x ∈ R3 : x + y + z = 0, x − y − z = 0}.
3. Antisymmetric matrices of size 3 × 3.
4. Polynomials of degree ≤ 3 such that p′′(0) = 0.
5. Polynomials of degree ≤ 3 such that $\int_0^1 x\,p(x)\,dx = 0$.
6. {p ∈ P3 : p(0) = p(1), p′(0) = p′(1)}.
7. The line in R2 given by 2x + y = 0.
8. The plane 3x + 2y − z = 0 in R3.
9. The line in R3 given by $\frac{x}{2} = \frac{y}{3} = 3z$.
12. Show that the set $\{x^3 + 1, x^3 - 5, x^2, x^2 - 1\}$ generates all polynomials of degree ≤ 3 whose first derivative evaluated at zero equals zero.
13. Show that $\begin{pmatrix}2\\1\\3\end{pmatrix}$ does not belong to the span of $\begin{pmatrix}1\\1\\1\end{pmatrix}$, $\begin{pmatrix}0\\1\\1\end{pmatrix}$.
14. In the following items, decide whether the given vectors are linearly independent or dependent.
(a) In R3: $\begin{pmatrix}2\\-1\\4\end{pmatrix}$, $\begin{pmatrix}-4\\2\\-8\end{pmatrix}$.
(b) In R3: $\begin{pmatrix}0\\1\\1\end{pmatrix}$, $\begin{pmatrix}1\\0\\1\end{pmatrix}$, $\begin{pmatrix}1\\1\\0\end{pmatrix}$.
(c) In M(2 × 2): $\begin{pmatrix}2&-1\\4&0\end{pmatrix}$, $\begin{pmatrix}0&-3\\1&5\end{pmatrix}$, $\begin{pmatrix}4&1\\7&-5\end{pmatrix}$.
(d) In P3: $x^3 + 1$, $x^3 - 5$, $x^2$ and $x^2 - 1$.
(e) In P2: $1 - x$, $1 + x$, $x^2$.
(f) In P3: $2x$, $x^3 - 3$, $1 + x - 4x^3$, $x^3 + 18x - 9$.
(g) In C(R): $\cos 2t$, $\sin^2 t$, $\cos^2 t$.
15. For which values of c are the vectors $\begin{pmatrix}1-c\\-c\end{pmatrix}$ and $\begin{pmatrix}c\\1+c\end{pmatrix}$ linearly independent?
16. Determine whether the set $\left\{\begin{pmatrix}1\\2\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\end{pmatrix}, \begin{pmatrix}2\\3\\1\end{pmatrix}\right\}$ is linearly independent and describe its span.
17. For which values of α are the vectors $\begin{pmatrix}1\\2\\3\end{pmatrix}$, $\begin{pmatrix}2\\-1\\4\end{pmatrix}$, $\begin{pmatrix}3\\\alpha\\4\end{pmatrix}$ linearly independent?
18. Determine conditions on a, b, c so that the vectors $\begin{pmatrix}1\\a\\a^2\end{pmatrix}$, $\begin{pmatrix}1\\b\\b^2\end{pmatrix}$, $\begin{pmatrix}1\\c\\c^2\end{pmatrix}$ are linearly independent (see Section 4.1, Exercise 2).
19. For which values of α are the vectors $\begin{pmatrix}2\\-5\\3\end{pmatrix}$, $\begin{pmatrix}-4\\10\\-6\end{pmatrix}$, $\begin{pmatrix}1\\\alpha\\0\end{pmatrix}$ linearly dependent?
20. True or false:
Definition 5.41. Let V be a vector space. A basis of V is a set of vectors {v1, . . . , vn} in V which is linearly independent and generates V.
The following remark shows that a basis is a minimal system of generators of V and at the same time a maximal system of linearly independent vectors.
Remark 5.42. By definition, the empty set is a basis of the trivial vector space {O}.
Remark 5.43. Every basis of Rn has exactly n elements. To see this, note that by Corollary 5.37 a basis can have at most n elements because otherwise it cannot be linearly independent. On the other hand, if it had fewer than n elements, then, by Remark 5.30, it cannot generate Rn.
Examples 5.44. • A basis of R3 is, for example, $\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix}\right\}$. The vectors of this basis are the standard unit vectors. The basis is called the standard basis (or canonical basis) of R3.
Other examples of bases of R3 are
$$\left\{\begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}1\\1\\0\end{pmatrix}, \begin{pmatrix}1\\1\\1\end{pmatrix}\right\}, \qquad \left\{\begin{pmatrix}1\\2\\3\end{pmatrix}, \begin{pmatrix}4\\5\\6\end{pmatrix}, \begin{pmatrix}0\\2\\1\end{pmatrix}\right\}$$
(a determinant check of the second basis is carried out right after these examples).
• The standard basis in Rn (or canonical basis in Rn ) is {~e1 , . . . ,~en }. Recall that the ~ej are the
standard unit vectors whose jth entry is 1 and all other entries are 0.
• Let p1 = X, p2 = 2X² + 5X − 1, p3 = 3X² + X + 2. Then the system {p1, p2, p3} is a basis of P2.
Proof. We have to show that the system is linearly independent and that it generates the space P2. Let q = aX² + bX + c ∈ P2. We want to see if there are α1, α2, α3 ∈ R such that q = α1p1 + α2p2 + α3p3. If we write this equation out, we find
$$aX^2 + bX + c = \alpha_1 X + \alpha_2(2X^2 + 5X - 1) + \alpha_3(3X^2 + X + 2) = (2\alpha_2 + 3\alpha_3)X^2 + (\alpha_1 + 5\alpha_2 + \alpha_3)X - \alpha_2 + 2\alpha_3.$$
Comparing coefficients, we obtain the following system of linear equations for the αj:
$$\begin{aligned}2\alpha_2 + 3\alpha_3 &= a\\ \alpha_1 + 5\alpha_2 + \alpha_3 &= b\\ -\alpha_2 + 2\alpha_3 &= c\end{aligned} \qquad\text{in matrix form:}\qquad \begin{pmatrix}0&2&3\\1&5&1\\0&-1&2\end{pmatrix}\begin{pmatrix}\alpha_1\\\alpha_2\\\alpha_3\end{pmatrix} = \begin{pmatrix}a\\b\\c\end{pmatrix}.$$
The determinant of the coefficient matrix is $-7 \neq 0$, so we see that there is exactly one solution for any given q. The existence of such a solution shows that {p1, p2, p3} generates P2. We also see that for any given q ∈ P2 there is exactly one way to write it as a linear combination of p1, p2, p3. If we take the special case q = 0, this shows that the system is linearly independent. In summary, {p1, p2, p3} is a basis of P2.
FT
c d 0 0 1 0 1 1 1 1
α1 + α2 + α3 + α4 α4
= .
α2 + α3 + α4 α3 + α4
0 1 0 0 c − d 0 1 0 0 c − d
−→ −→ .
0 0 1 0 d − b 0 0 1 0 d − b
0 0 0 1 b 0 0 0 1 b
We see that there is exactly one solution for any given M ∈ M (2 × 2). Existence of the
solution shows that the matrices A, B, C, D generate M (2 × 2) and uniqueness shows that
they are linearly independent if we choose M = 0.
The next theorem is very important. It says that if V has a basis which consists of n vectors, then
every basis consists of exactly n vectors.
Theorem 5.45. Let V be a vector space and let {v1 , . . . , vn } and {w1 , . . . , wn } be bases of V . Then
n = m.
Proof. Suppose that m > n. We will show that then the vectors w1 , . . . , wm cannot be linearly
independent, hence they cannot be a basis of V . Since the vectors v1 , . . . , vn are a basis of V , every
wj can be written as a linear combination of them. Hence there exist numbers aij which
FT
+ · · · + cm (am1 v1 + am2 v2 + · · · + amn vn )
= (c1 a11 + c2 a21 + · · · + cm am1 )v1 + · · · + (c1 a1n + c2 a2n + · · · + cm amn )vn .
Since the vectors v1 , . . . , vn are linearly independent, the expressions in the parentheses must be
equal to zero. So we find
Definition 5.46. • Let V be a finitely generated vector space. Then it has a basis by The-
orem 5.47 below and by Theorem 5.45 the number n of vectors needed for a basis does not
depend on the particular chosen basis. This number is called the dimension of V . It is denoted
by dim V .
• The empty set is a basis of the trivial vector space {O}, hence dim{O} = 0.
Next we show that every finitely generated vector space has a basis and therefore a well-defined
dimension.
Theorem 5.47. Let V be a vector space and assume that there are vectors w1 , . . . , wm ∈ V such
that V = span{w1 , . . . , wm }. Then the set {w1 , . . . , wm } contains a basis of V . In particular, V
has a finite basis and dim V ≤ m.
Proof. Without restriction we may assume that all vectors wj are different from O. We start with
the first vector. If V = span{w1 }, then {w1 } is a basis of V and dim V = 1. Otherwise we set
V1 := span{w1 } and we note that V1 6= V . Now we check if w2 ∈ span{w1 }. If it is, we throw it out
because in this case span{w1 } = span{w1 , w2 } so we do not need w2 to generate V . Next we check
if w3 ∈ span{w1 }. If it is, we throw it out, etc. We proceed like this until we find a vector wi2 in
our list which does not belong to span{w1 }. Such an i2 must exist because otherwise we already
had that V1 = V . Then we set V2 := span{w1 , wi2 }. If V2 = V , then we are done. Otherwise, we
proceed as before: We check if wi2 +1 ∈ V2 . If this is the case, then we can throw it out because
span{w1 , wi2 } = span{w1 , wi2 , wi2 +1 }. Then we check wi2 +2 , etc., until we find a wi3 such that
wi3 ∈
/ span{w1 , wi2 } and we set V3 := span{w1 , wi2 , wi3 }. If V3 = V , then we are done. If not, then
we repeat the process. Note that after at most m repetitions, this comes to an end. This shows
that we can extract from the system of generators a basis {w1 , wi2 , . . . , wik } of V .
FT
The following theorem complements the preceding one.
Theorem 5.48. Let V be a finitely generated vector space. Then any system w1 , . . . , wm ∈ V of
linearly independent vectors can be completed to a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .
Proof. Note that dim V < ∞ by Theorem 5.47 and set n = dim V . It follows that n ≥ m because
we have m linearly independent vectors in V . If m = n, then w1 , . . . , wm is already a basis of V
and we are done.
RA
If m < n, then span{w1 , . . . , wm } = 6 V and we choose an arbitrary vector vm+1 ∈
/ span{w1 , . . . , wm }
and we define Vm+1 := span{w1 , . . . , wm , vm+1 }. Then dim Vm+1 } = m + 1. If m + 1 = n,
then necessarily Vm+1 = V and we are done. If m + 1 < n, then we choose an arbitrary vector
vm+2 ∈ V \ Vm+1 and we let Vm+2 := span{w1 , . . . , wm , vm+1 , vm+2 }. If m + 2 = n, then necessarily
Vm+2 = V and we are done. If not, we repeat the step before. Note that after n − m steps we have
found a basis {w1 , . . . , wm , vm+1 , . . . , vn } of V .
• If the set of vectors v1 , . . . vm generates the vector space V , then it is always possible to extract
a subset which is a basis of V (we need to eliminate m − n vectors).
1 0 1 1
Example 5.50. • Let A = , B= ∈ M (2 × 2) and suppose that we want to
0 0 1 1
complete them to a basis of M (2 × 2) (it is clear that A and B are linearly independent,
so this makes sense). Since dim(M (2 × 2)) = 4, we know that we need 2 more matrices.
0 1
We take any matrix C ∈ / span{A, B}, for example C = . Finally we need a matrix
0 0
0 0
D ∈/ span{A, B, C}. We can take for example D = . Then A, B, C, D is a basis of
1 0
M (2 × 2).
Check that D ∈
/ span{A, B, C}
FT
3
and we want to find a subset of them which form a basis of R .
Note that a priori it is not clear that this is possible because we do not know without further
calculations that the given vectors really generate R3 . If they do not, then of course it is
impossible to extract a basis from them.
Let us start. First observe that we need 3 vectors for a basis since dim R3 = 3. So we start
with the first non-zero vector which is ~v1 . We see that ~v2 = 4~v1 , so we discard it. We keep
~v3 since ~v3 ∈
/ span{~v1 }. Next, ~v4 = ~v3 − ~v1 , so ~v4 ∈ span{~v1 , ~v3 } and we discard it. A little
RA
calculation shows that ~v5 ∈/ span{~v1 , ~v3 }. Hence {~v1 , ~v3 , ~v5 } is a basis of R3 .
Remark 5.51. We will present a more systematic way to solve exercises of this type in
Theorem 6.34 and Remark 6.35.
Theorem 5.52. Let V be a vector space with basis {v1 , . . . , vn }. Then every x ∈ V can be written
in unique way as linear combination of the vectors v1 , . . . , vn .
D
If we have a vector space V and a subspace W ⊂ V , then we can ask ourselves what the relation
between their dimensions is because W itself is a vector space.
Lemma 5.53. Let V be a finitely generated vector space and let W be a subspace. Then W is
finitely generated and dim W ≤ dim V .
Proof. Let V be a finitely generated vector space with dim V = n < ∞. Let W be a subspace of
V and assume that W is not finitely generated. Then we can construct an arbitrary large system
of linear independent vectors in W as follows. Clearly, W cannot be the trivial space, so we can
choose w1 ∈ W \ {O} and we set W1 = span{w1 }. Then W1 is a finitely generated subspace of
W , therefore W1 ( W and we can choose w2 ∈ W \ W1 . Clearly, the set {w1 , w2 } is linearly
independent. Let us set W2 = span{w1 , w2 }. Since W2 is a finitely generated subspace of W , it
follows that W2 ( W and we can choose w3 ∈ W \ W2 . Then the vectors w1 , w2 , w3 are linearly
independent and we set W3 = span{w1 , w2 , w3 }. Continuing with this procedure we can construct
subspaces W1 ( W2 ( · · · W with dim Wk = k for every k. In particular, we can find a system
of n + 1 linear independent vectors in W ⊆ V which contradicts the fact that any system of more
than n = dim V vectors in V must be linearly dependent, see Corollary 5.49. This also shows that
any system of more than n vectors in W must be linear dependent. Since a basis of W consists of
linearly independent vectors, it follows that dim W ≤ n = dim V .
FT
Theorem 5.54. Let V be a finitely generated vector space and let W ⊆ V be a subspace. Then the
following is true:
Remark 5.55. Note that (i) is true even when V is not finitely generated because dim W ≤ ∞ =
dim V whatever dim W may be. However (ii) is not true in general for infinite dimensional vector
spaces. In Example 5.56 (f) and (g) we will show that dim P = dim C(R) in spite of P 6= C(R).
(Recall that P is the set of all polynomials and that C(R) is the set of all continuous functions. So
D
we have P ( C(R).)
(b) dim M (m × n) = mn. This follows because the set of all m × n matrices Aij which have a 1 in
the ith row and jth column and all other entries are equal to zero form a basis of M (m × n)
and there are exactly mn such matrices.
(c) Let Msym (n × n) be the set of all symmetric n × n matrices. Then dim Msym (n × n) = n(n+1) 2 .
To see this, let Aij be the n × n matrix with aij = aji = 1 and all other entries equal to 0.
Observe that Aij = Aji . It is not hard to see that the set of all Aij with i ≤ j form a basis of
Msym (n × n). The dimension of Msym (n × n) is the number of different matrices of this type.
How many of them are there? If we fix j = 1, then only i = 1 is possible. If we fix j = 2,
then i = 1, 2 is possible, etc. until for j = n the allowed values for i are 1, 2, . . . , n. In total
we have 1 + 2 + · · · + n = n(n+1)
2 possibilities. For example, in the case n = 2, the matrices
are
1 0 0 1 0 0
A11 = , A12 = , A12 = .
0 0 1 0 0 1
In the case n = 3, the matrices are
1 0 0 0 1 0 0 0 1
A11 = 0 0 0 , A12 = 1 0 0 , A13 = 0 0 0 ,
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0
A22 = 0 1 0 , A23 = 0 0 1 , A33 = 0 0 0 .
0 0 0 0 1 0 0 0 1
FT
Convince yourself that the Aij form a basis of Msym (n × n).
(d) Let Masym (n × n) be the set of all antisymmetric n × n matrices. Then dim Masym (n × n) =
n(n−1)
2 . To see this, for i 6= j let Aij be the n × n matrix with aij = −aji = 1 and all other
entries equal to 0 form a basis of Msym (n × n). It is not hard to see that the set of all Aij
with i < j form a basis of Masym (n × n). How many of these matrices are there? If we fix
j = 2, then only i = 1 is possible. If we fix j = 3, then i = 1, 2 is possible, etc. until for j = n
the allowed values for i are 1, 2, . . . , n − 1. In total we have 1 + 2 + · · · + (n − 1) = n(n−1)
RA
2
possibilities. For example, in the case n = 2, the only matrix is
0 1
A12 = .
−1 0
0 0 0 −1 0 0 0 −1 0
Remark. Observe that dim Msym (n × n) + dim Masym (n × n) = n2 = dim M (n × n). This
is no coincidence. Note that every n × n matrix M can be written as
M = 12 (M + M t ) + 21 (M − M t )
Proof. We know that for every n ∈ N, the space Pn is a subspace of P . Therefore for every
n ∈ N, we must have that n + 1 = dim Pn ≤ dim P . This is possible only if dim P = ∞.
(g) dim C(R) = ∞. Recall that C(R) is the space of all continuous functions.
Proof. Since P is a subspace of C(R), it follows that dim P ≤ dim(C(R)), hence dim(C(R)) =
∞.
Now we use the concept of dimension to classify all subspaces of R2 and R3 . We already know that
for examples lines and planes which pass through the origin are subspaces of R3 . Now we can show
that there are no other proper subspaces.
FT
Subspaces of R2 . Let U be a subspace of R2 . Then U must have a dimension. So we have the
following cases:
• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R2 \ {~0}. Therefore
U is a line parallel to ~v1 passing through the origin.
• dim U = 2. In this case dim U = dim R2 . Hence it follows that U = R2 by Theorem 5.54 (ii).
RA
• dim U ≥ 3 is not possible because 0 ≤ dim U ≤ dim R2 = 2.
In conclusion, the only subspaces of R2 are {~0}, lines passing through the origin and R2 itself.
• dim U = 1. Then U is of the form U = span{~v1 } with some vector ~v1 ∈ R3 \ {~0}. Therefore
D
• dim U = 2. Then U is of the form U = span{~v1 , ~v2 } with linearly independent vectors
~v1 , ~v2 ∈ R3 . Hence U is a plane parallel to the vectors ~v1 and ~v2 which passes through the
origin.
• dim U = 3. In this case dim U = dim R3 . Hence it follows that U = R3 by Theorem 5.54 (ii).
In conclusion, the only subspaces of R3 are {~0}, lines passing through the origin, planes passing
through the origin and R3 itself.
We conclude this section with the formal definition of lines and planes.
Definition 5.57. Let V be a vector space with dim V = n and let W ⊆ V be a subspace. Then
W is called a
• line if dim W = 1,
• plane if dim W = 2,
• hyperplane if dim W = n − 1.
FT
• why and how the concept of dimension helps to classify all subspaces of given vector space,
• why a matrix A ∈ M (n × n) is invertible if and only if its columns are a basis of Rn ,
• etc.
You should now be able to
• check if a system of vectors is a basis for a given vector space,
• find a basis for a given vector space,
RA
• extend a system of linear independent vectors to a basis,
• find the dimension of a given vector space,
• etc.
Ejercicios.
1. Encuentre bases para los espacios dados en los ejercicios del 1 al 10 de la sección 5.3.
D
w
a+c
a − b
(b) Todos los vectores de la forma b + c.
−a + b
1 −1 0 0
(c) A ∈ M (2 × 2) : A = .
2 −2 0 0
x − 5y = 0
2x − 3y = 0
x − 3y − z = 0
−2x + 2y − 3z = 0
4x − 8y + 5z = 0.
−x + 3y − 2z = 0
2x − 6y + 4z = 0
−3x + 9y − 6z = 0.
(a) En M (2 × 2);
3 1
0 0
,
3 2
0 0
,
FT
3. En los siguientes ejercicios, determine si el conjunto de vectores dado es una base para el
espacio vectorial indicado.
−5 1
0 6
,
0
0 −7
1
.
RA
1
(b) En W = {(x, y) ∈ R2 : 3x − y = 0}; .
3
0 −1 1 2
0 1 1 1
(c) En R4 ; 1, 1, 0, 2.
1 2 0 1
(d) En P2 ; 5 − x2 , 3x.
(e) En P3 ; x3 − x, x3 + x2 , x2 + 1, x − 1.
D
4. En R4 , encontrar una base para el subespacio U = {~x ∈ R4 : h~x , (2, −3, 0, 4)i = 0}. (Note
that U es un hiperplano en R4 .)
x y
5. En R3 , considere la recta L : 5 = 3 = 2z. Encuentre una base para L y complétela a una
base de R3 .
a −b
8. Muestre que los vectores , forman una base de R2 si ab 6= 0. Muestre también que
b a
a −b
⊥ .
b a
11. (a) En M (n × n), muestre que n2 matrices tales que en todas su entrada ann vale cero no
pueden ser linealmente independientes.
(b) En Pn , muestre que n + 1 polinomios cuya primera derivada evaluada en cero se anula
no pueden ser linealmente independientes.
FT
(c) En Pn , ¿existen n + 1 polinomios linealmente independientes tales que el coeficiente de
x0 es 1?
For example, the intersection of two planes in R3 which pass through the origin is either that same
plane (if the two original planes are the same plane), or it is a line passing through the origin. In
either case, it is a subspace of R3 .
Observe however that in general the union of two vector spaces in general is not a vector space. For
instance, in R2 the lines L : y = 0 (this is the x-axis) and G : x = 0 (this is the y-axis) are subspaces
and their union L ∪ G is consists of exactly both axis. This is clearly not a vector space because it
is not closed under sums. For example, ~e1 ∈ L ⊆ L ∪ G and ~e2 ∈ G ⊆ L ∪ G, but ~e1 +~e2 ∈ / L ∪ G. In
order to make it a vector space, we need to include all the missing linear combinations. The space
that we obtain in this way, is called a direct sum, see Definition 5.59.
Exercise. • Give more examples of two subspaces whose union is not a vector space.
• Give an example of two subspaces whose union is a vector space.
Let us define the sum and the direct sum of vector spaces.
Definition 5.59. Let U, W be subspaces of a vector space V . Then the sum of the vector spaces
U and W is defined as
U + W = {u + w : u ∈ U, w ∈ W }. (5.8)
If in addition U ∩ W = {O}, then the sum is called the direct sum of U and W and one writes
U ⊕ W instead of U + W .
FT
Proof. Clearly, U + W 6= ∅ because O ∈ U and O ∈ W , hence O + O = O ∈ U + W . Now let
z1 , z2 ∈ U + W and c ∈ K. Then there exist u1 , u2 ∈ U and w1 , w2 ∈ W with z1 = u1 + w1 and
z2 = u2 + w2 . Therefore
Remark 5.60. (i) Assume that U = span{u1 , . . . , uk } and that W = span{w1 , . . . , wj }, then
U + W = span{u1 , . . . , uk , w1 , . . . , wj }.
(ii) The space U + W is the smallest vector space which contains both U and W .
Examples 5.61. (i) Let V be a vector space and let U ⊆ V be a subspace. Then we always
D
have:
(a) U ⊆ U + W and W ⊆ U + W .
(b) U + W = U if and only if W ⊆ U .
(ii) Let U and W be lines in R2 passing through the origin. Then they are subspaces of R2 and
we have that U + W = U if the lines are parallel and U + W = R2 if they are not parallel.
(iii) Let U and W be lines in R3 passing through the origin. Then they are subspaces of R3 and
we have that U + W = U if the lines are parallel; otherwise U + W is the plane containing
both lines.
(iv) Let U be a line and W be a plane in R3 , both passing through the origin. Then they are
subspaces of R3 and we have that U + W = W if the line U is contained in W . If not, then
U + W = R3 .
Recall that the intersection of two subspaces is again a subspace, see Proposition 5.58. The formula
for the dimension of the sum of two vector spaces in the next proposition can be understood as
follows: If we sum the dimension of the two vector spaces, then we count the part which is common to
both spaces twice; therefore we have to subtract its dimension in order to get the correct dimension
of the sum of the vector spaces.
FT
dim(U + W ) = dim U + dim W − dim(U ∩ W ).
Proof. Let dim U = k and dim W = m. Recall that U ∩ W is a subspace of V . and that U ∩ W ⊆ U
and U ∩W ⊆ W . Let v1 , . . . , v` be a basis of U ∩W . By Theorem 5.48 we can complete it to a basis
v1 , . . . , v` , u`+1 , . . . , uk of U . Similarly, we can complete it to a basis v1 , . . . , v` , w`+1 , . . . , wm of
W . Now we claim that v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm is a basis of U + W .
RA
• First we show that the vectors v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm generate U + W . This
follows from Remark 5.60 and
• Now we show that the vectors v1 , . . . , v` , u`+1 , . . . , uk , w`+1 , . . . , wm are linearly indepen-
D
It follows that
γ1 v1 + · · · + γ` v` + β`+1 w`+1 + · · · + βm wm = O.
Since the vectors v1 , . . . , v` , w`+1 , . . . , wm form a basis of W , they are linearly independent,
and we conclude that γ1 = · · · = γ` = β`+1 = · · · = βm = 0. Inserting in (5.9), we obtain
α1 v1 + · · · + α` v` + α`+1 u`+1 + · · · + αk uk = O,
hence α1 = · · · = αk = 0.
It follows that
FT
and G = span{~a} where
0 1 −4
~v = 1 , w ~ = 0 , ~a = 3 ,
2 1 2
Find E ∩ F , E + F , E ∩ G, E + G and F ∩ G, F + G and their dimensions.
x + 2y − z = 0
A short calculation (Gauß-Jordan) shows that the set of solution is the line H : x = −y = −z, or
in vector form
−1
H = gen{~b} where ~b = 1 . (∗)
1
Solution 2. We can also use the vector forms of E and F . In order to write E in vector form, we
only need to choose two vectors ~r, ~s which are parallel to E, for instance
1 1
E = span{~r, ~s} where ~r = 2 , ~s = 5 .
0 1
A vector ~x = (x, y, z)t belongs to E ∩ F if and only if the vector ~x is a linear combination of
~ and a linear combination of ~r, ~s, that is, if and only if there exist α, β, γ, δ ∈ R such that
~v , w
α~v + β w
~ = γ~r + δ~s, or
~ − γ~r − δ~s = ~0.
α~v + β w
Writing this as a system for the unknowns α, β, γ, δ ∈ R, we obtain
β− γ− δ=0
α − 2γ − 5δ = 0
2α + β − δ=0
E ∩ F = {t~v − tw
~ : t ∈ R} = {t(~v − w) ~ = span{~b} = H
~ : t ∈ R} = span{~v − w}
or equivalently
FT
E ∩ F = {−2t~r + t~s : t ∈ R} = {t(−2~r − ~s) : t ∈ R} = span{−2~r − ~s} = span{~b} = H
E + F Solution 1. We know that E + F = span{~v , w, ~ ~r, ~s}. Now, similarly as inExample 5.50 we
see that the vectors ~v , w,
~ ~r are linearly independent, therefore the dimension of E + F is larger or
equal to 3. Since it is a subspace of R3 , it must be equal to R3 . (We could also use Theorem 6.34
and Remark 6.35) to find a system of generators for E + F .)
RA
Solution 2. We know that
E + G It is easy to see that the three vectors ~r, ~s, ~a are linearly independent, therefore they
generate R3 and hence E + G = span{~r, ~s, ~a} = R3 .
Alternatively we could use thet dim(E + G) = dim E + dim G − dim(E ∩ G) = 2 + 1 − 0 = 3 to
conclude that E + G = R3 .
F ∩ G It is easy to see that the three vectors ~v , w,
~ ~a are linearly dependent. In fact, ~a = 2~v − 4w.
~
Therefore G ⊆ F and consequently G ∩ F = F .
F + G From the above it follows that F + G = span{~v , w, ~ ~a} = span{~v , w}
~ = F.
From the above, it is clear that dim(E ∩ F ) = 1, dim(E + F ) = 3, dim(E ∩ G) = 0, dim(E + G) = 3,
dim(F ∩ G) = 2, dim(F + G) = 3.
FT
−4 −1
x + z + s = 0
−8
−3
U ∩W = (x, y, z, r, s) : y + 3z + 2s = 0 = span 0
,
1
(5.10)
−1 0
4r + s = 0
4 0
Solution 2. We can use vector forms of U and W . We choose any set of linearly indepen-
dent vectors ~u1 , ~u2 , ~u3 , ~u4 in U and w
~ 1, w ~ 3 in W . Then U = span{~u1 , ~u2 , ~u3 , ~u4 } and
~ 2, w
RA
W = span{w ~ 1, w
~ 2, w~ 3 }. For instance, we may take
1 0 0 0 −3 −2 −1
−2 −5 −4 0 −3 2 −3
0 , ~u2 = 1 , ~u3 = 0 , ~u4 = 0 , w
~ 1 = 0 , w
~ 2 = 0 , w
~u1 = ~ 3 = 1 .
0 0 0 1 0 1 0
0 0 1 0 2 0 0
for some αj , βj ∈ R. If the take the difference of the right hand sides, we obtain
α1
1 0 0 0 −3 −2 −1 α2
−2 −5 −4 0 −3 2 −3 α3
~0 = α1 ~u1 + α2 ~u2 + α3 ~u3 + α4 ~u4 − β1 w
~ 1 − β2 w
~ 2 − β3 w
~3 =
0 1 0 0 0 0 1
α4 .
0 0 0 1 0 1 0
β1
0 0 1 0 2 0 0 β2
β3
and therefore
FT
U ∩ W = {(−t − 4s)~u1 + t~u2 + 4s~u3 − s~u4 : t, s ∈ R} = {t(−~u1 + ~u2 ) + s(−4~u1 + 4~u3 − ~u4 ) : t, s ∈ R}
= span{−~u1 + ~u2 , −4~u1 + 4~u3 − ~u4 }
or equivalently
RA
U ∩ W = {−2sw
~ 1 + sw ~ 3 : t, s ∈ R} = {s(−2w
~ 2 + tw ~1 + w ~ 3 : t, s ∈ R}
~ 2 ) + tw
= span{−2w
~1 + w ~ 3 }.
~ 2, w
Ejercicios.
1 −3
5 , 4 y
4
1. En R , sean U = span 2 1
−5 −3
V = {~x ∈ R4 : h~x , (1, 2, −2, −1)t i = 0, h~x , (2, 5, −5, 1)t i = 0}. Determine U ∩ V , U + V y
dim(U + V ).
2. En R3 muestre que si E, F son planos no paralelos que pasan por el origen, entonces
E + F = R3 .
FT
5. Sean U, V subespacios de Rn , responda las siguientes preguntas.
(a) ¿Cuáles son las posibilidades para dim U ∩ V y dim U + V ? Dé ejemplos para cada caso.
(b) ¿Cuáles son las posibilidades para dim U ∩ W y dim U + W ? Dé ejemplos para cada caso.
D
(c) ¿Cuáles son las posibilidades para dim V ∩ W y dim V + W ? Dé ejemplos para cada
caso.
5.6 Summary
Let V be a vector space over K and let v1 , . . . , vk ∈ V .
• The set of all linear combinations of the vectors v1 , . . . , vk is a subspace of V , called the space
generated by the vectors v1 , . . . , vk or the linear span of the vectors v1 , . . . , vk . Notation:
α1 v1 + · · · + αk vk = O
FT
of V has the same number of vectors. The number of vectors needed for a basis of a vector
space V is called the dimension of V .
• If V is not finitely generated, we set dim V = ∞.
• For v1 , . . . , vk ∈ V , it follows that dim(span{v1 , . . . , vk }) ≤ k with equality if and only if the
vectors v1 , . . . , vk are linearly independent.
• If V is finitely generated then every linearly independent system of vectors v1 , . . . , vk ∈ V
can be extended to a basis of V .
RA
• If V = span{v1 , . . . , vk }, then V has a basis consisting of a subsystem of the given vectors
v1 , . . . , vk .
• If U is a subspace of V , then dim U ≤ dim V .
• If V is finitely generated and U is a subspace of V , then dim U = dim V if and only if U = V .
This claim is false if dim V = ∞.
• dim{O} = 0 and {O} has the unique basis ∅.
D
• The vectors ~v1 , . . . , ~vk are linearly independent if and only if the system A~x = ~0 has only the
trivial solution ~x = ~0.
• The vectors ~v1 , . . . , ~vk are a basis of Rn if and only if k = n and A is invertible.
U ∩ V := {v ∈ V : v ∈ U and v ∈ W },
FT
U ∪ V := {v ∈ V : v ∈ U or v ∈ W },
U + V := {u + w : u ∈ U, w ∈ W }.
5.7 Exercises
1. Sea X el conjunto de todas las funciones de R a R. Demuestre que X con la suma y producto
con números en R es un espacio vectorial.
D
2. Sean A ∈ M (m × n) y sea ~a ∈ Rk .
3. Sean A ∈ M (m × n) y sea ~a ∈ Rk .
son subespacios de Rk ?
FT
4. Considere el conjunto R2 con las siguientes operaciones:
⊕ : R 2 × R2 → R2 ,
x1
x2
y
⊕ 1 =
y2
x1 + y2
x2 + y1
,
RA
x1 λx1
: R × R2 → R2 , λ = .
x2 λx2
x2 y2 0
2 2 x1 λx1
:R×R →R , λ = .
x2 λx2
6. (a) Sea V = (− π2 , π
2) y defina suma ⊕ : V × V → V y producto con escalar : R×V → V
por
FT
(d) Todas matrices cuya primera columna coincide con la última columna.
Para los siguientes numerales supongamos que n = m.
(e) Todas las matrices simétricas (es decir, todas las matrices A con At = A).
(f) Todas las matrices que no son simétricas.
(g) Todas las matrices antisimétricas (es decir, todas las matrices A con At = −A).
(h) Todas las matrices diagonales.
RA
(i) Todas las matrices triangular superior.
(j) Todas las matrices triangular inferior.
(k) Todas las matrices invertibles.
(l) Todas las matrices no invertibles.
(m) Todas las matrices con det A = 1.
9. Demuestre que
x1
D
x1 + x2 − 2x3 − x4 = 0
x2
V = :
x3 x1 − x2 + x3 + 7x4 = 0
x4
es un subespacio de R4 .
es un subespacio afı́n de R4 .
Sea U el conjunto de todas las soluciones de (1) y W el conjunto de todas las soluciones de
(2). Note que se pueden ver como subconjuntos de R3 .
1 −2 2 3
12. (a) Sean v1 = , v2 = ∈ R . Escriba v = como combinación lineal de v1 y
2 5 0
FT
v2 .
1 1 1
(b) ¿Es v = 2 combinación lineal de v1 = 7 , v2 = 5?
5 2 2
13 −5
(c) ¿Es A = combinación lineal de
50 8
−1
RA
1 0 0 1 2 1 1
A1 = , A2 = , A3 = , A4 = ?
2 2 −2 2 5 0 5 2
1 2 3
13. (a) ¿Los vectores v1 = 2 , v2 = 2 , v3 = 0 son linealmente independientes en R3 ?
3 5 1
1 1 1
(b) ¿Los vectores v1 = −2 , v2 = 7 , v3 = 5 son linealmente independientes en
D
2 2 2
R3 ?
(c) ¿Los vectores p1 = X 2 − X + 2, p2 = X + 3, p3 = X 2 − 1 son linealmente independientes
en P2 ? Son linealmente independientes en Pn para n ≥ 3?
1 3 1 1 7 3 1 −1 0
(d) ¿Los vectores A1 = , A2 = , A3 = son lineal-
−2 2 3 2 −1 2 5 2 8
mente independientes en M (2 × 3)?
~ ∈ Rn . Suponga que w
14. Sean ~v1 , . . . , ~vk , w ~ 6= ~0 y que w
~ ∈ Rn es ortogonal a todos los vectores
~vj . Demuestre que w ~∈/ gen{~v1 , . . . , ~vm }. ¿Se sigue que el sistema w,
~ ~v1 , . . . , ~vm es linealmente
independiente?
15. Determine
si gen{a
1 } =
, a2 , a3 , a4 , v3 }para
gen{v1 , v2
0 1 1 2 5 1 1
a1 = 1 , a2 = 0 , a3 = 2 , a4 = 1 , v1 = −3 , v2 = 1 , v3 = −1.
5 3 13 11 0 8 −2
16. (a) ¿Las siguientes matrices generan el espacio de todas las matrices simétricas 2 × 2?
2 0 13 0 0 3
A1 = , A2 = , A3 = ,
0 7 0 5 3 0
FT
(c) ¿Las siguientes matrices generan el espacio de las matrices triangulares superiores 2 × 2?
6 0 0 3 10 −7
C1 = , C2 = , C3 = .
0 7 0 5 0 0
17. Sea n ∈ N y sea V el conjunto de las matrices simétricas n × n con la suma y producto con
RA
λ ∈ R usual.
18. Determine si los siguientes conjuntos de vectores son bases del espacio vectorial indicado.
1 −2
D
(a) v1 = , v2 = ; R2 .
2 5
1 3 5 3 0 1 2 1
(b) A = , B= , C= , D= ; M (2 × 2).
2 1 1 2 −2 2 5 0
(c) p1 = 1 + x, p2 = x + x2 , p3 = x2 + x3 , p4 = 1 + x + x2 + x3 ; P3 .
FT
(a) Demuestre que G es un subespacio de R4
(b) Encuentre una base para G y calcule dim G.
(c) Complete la base encontrada en (ii) a una base de R4 .
1 0 4 2 1
23. Sean v1 = 2 , v2 = 4 , v3 = 2 , v4 = 8 , v5 = 0.
3 1 5 3 1
RA
Determine si estos vectoren generan el espacio R3 . Si lo hacen, escoja una base de R3 de los
vectores dados.
6 0 6 3 6 −3 12 −9
24. Sean C1 = , C2 = , C3 = , C4 = .
0 7 0 12 0 2 0 −1
Determine si estas matrices generan el espacio de las matrices triangulares superiores 2 × 2.
Si lo hacen, escoja una base de las matrices dadas.
D
26. Para los siguientes conjuntos, determine si son espacios vectoriales. Si lo son, calcule su
dimensión.
27. Para los siguientes sistemas de vectores en el espacio vectorial V , determine la dimensión del
espacio vectorial generado por ellos y escoja un subsistema de ellos que es base del espacio
vectorial generado por los vectores dados. Complete este subsistema a una base de V .
1 3 3
(a) V = R3 , ~v1 = 2 , ~v2 = 2 , ~v3 = 2.
3 7 1
(b) V = P4 , p1 = x3 + x, p2 = x3 − x2 + 3x, p3 = x2 + 2x − 5, p4 = x3 + 3x + 2.
1 4 3 0 0 12 9 −12
(c) V = M (2 × 2), A = , B= , C= , D= .
−2 5 1 4 −7 11 10 1
FT
vectores linealmente dependientes.
(c) Si v1 , . . . , vk ∈ V es un sistema de vectores linealmente dependientes, entonces v1 es
combinación lineal de los v2 , . . . , vk .
FT
In the first section of this chapter we will define linear maps between vector spaces and discuss their
properties. These are functions which “behave well” with respect to the vector space structure. For
example, m × n matrices can be viewed as linear maps from Rm to Rn . We will prove the so-called
dimension formula for linear maps. In Section 6.2 we will study the special case of matrices. One of
the main results will be the dimension formula (6.4). In Section 6.4 we will see that, after choice of
a basis, every linear map between finite dimensional vector spaces can be represented as a matrix.
This will allow us to carry over results on matrices to the case of linear transformations.
RA
As in previous chapters, we work with vector spaces over R or C. Recall that K always stands for
either R or C.
Other words for linear map are linear function, linear transformation or linear operator.
Remark. Note that very often one writes T x instead of T (x) when T is a linear function.
223
224 6.1. Linear maps
(iii) The condition (6.1) says that a linear map respects the vector space structures of its
domain and its target space.
Exercise 6.3. Let U, V be vector spaces over K (with K = R or K = C). Let us denote the set
of all linear maps from U to V by L(U, V ). Show that L(U, V ) is a vector spaces over K. That
means you have to show that the sum of two linear maps is a linear map, that a scalar multiple
of linear map is a linear map and that the vector space axioms hold.
Examples 6.4 (Linear maps). (a) Every matrix A ∈ M (m × n) can be identified with a linear
map Rn → Rm .
(i) Let C(R) be the space of all continuous functions and C 1 (R) the space of all continuously
differentiable functions. Then
T : C 1 (R) → C(R), Tf = f0
is a linear map.
FT
Proof. First of all note that f 0 ∈ C(R) if f ∈ C 1 (R), so the map T is well-defined. Now
we want to see that it is linear. So we take f, g ∈ C 1 (R) and λ ∈ R. We find
(ii) The following maps are linear, too. Note that their action is the same as the one of T
RA
above, but we changed the vector spaces where it acts on.
R : Pn → Pn−1 , Rf = f 0 , S : P n → Pn , Sf = f 0 .
Proof. Clearly I is well-defined since the integral of a continuous function is again continuous.
In order to show that I is linear, we fix f, g ∈ C(R) and λ ∈ R. We find for every x ∈ R:
Z x Z x Z t Z x
I(λf + g) (x) = (λf + g)(t) dt = λf (t) + g(t) dt = λ f (t) dt + g(t) dt
0 0 0 0
= λ(If )(x) + (Ig)(x).
Since this is true for every x, it follows that I(λf + g) = λ(If ) + (Ig).
T : M (n × n) → M (n × n), T (A) = A + At .
FT
z
0 0 0 0
0 0 0
Then T 1 = , but T −3 1 = T −3 = 6= = −3T 1.
2 6 −6
0 0 0 0
The next lemma shows that a linear map always maps the zero vector to the zero vector.
x, y ∈ U, x 6= y =⇒ T x 6= T y.
D
(ii) T is called surjective if for all v ∈ V there exists at least one x ∈ U such that T x = v.
(iii) T is called bijective if it is injective and surjective.
(iv) The kernel of T (or null space of T ) is
ker(T ) := {x ∈ U : T x = 0}.
Remark 6.7. (i) Observe that ker(T ) is a subset of U , Im(T ) is a subset of V . In Proposi-
tion 6.11 we will show that they are even subspaces.
(ii) Clearly, T is injective if and only if for all x, y ∈ U the following is true:
Tx = Ty =⇒ x = y.
(iii) If T is a linear injective map, then its inverse T −1 : Im(T ) → U exists and is linear too.
• Suppose that T : U → V and S : V → W are linear functions. Show that their composition
ST : U → W is a linear function too.
When you compare Im(ST ) and Im(S), what can you conclude?
When you compare ker(ST ) and ker(S), what can you conclude?
• Suppose that T : U → V is a linear invertible linear function so that we can define its inverse
function T −1 : Im(T ) → U . Show that it is a linear function too.
FT
Lemma 6.9. Let T : U → V be a linear map.
Examples 6.10 (Kernels and ranges of the linear maps from Examples 6.4).
D
(a) We will discuss the case of matrices at the beginning of Section 6.2.
(b) If T : C 1 (R) → C(R), T f = f 0 , then it is easy to see that the kernel of T consists exactly
of the constant functions. Moreover T is surjective because every continuousR functions is the
x
derivative of another function because for every f ∈ C(R) we can set g(x) = 0 f (t) dt. Then
1 0
g ∈ C (R) and T g = g = f which shows that Im(T ) = C(R).
(c) For the integration operator in Example 6.4((c)) we have that ker(I) = {0} and Im(I) =
C 1 (R). In other words, I is injective but not surjective.
0
g ∈ C 1 (R). Then g is differentiable
R x 0 and g ∈ C(R) and, again by the fundamental theorem of
calculus, we have that g(x) = 0 g (t) dt, so g ∈ Im(I) and it follows that C 1 (R) ⊆ Im(I).
Rx
Now assume that Ig = 0. If we differentiate, we find that 0 = (Ig)0 (x) = dx
d
0
g(t) dt = g(x)
for all x ∈ R, therefore g ≡ 0, hence ker(I) = {0}.
Proof. First we prove the claim about the range of T . Clearly, Im(T ) ⊆ Msym (n × n) because
for every A ∈ M (n × n) we have that T (A) is symmetric because (T (A))t = (A + At )t =
At + (At )t = At + A = T (A). To prove Msym (n × n) ⊆ Im(T ) we take some B ∈ Msym (n × n).
Then T ( 12 B) = 12 B +( 12 B)t = 21 B + 12 B = B where we used that B is symmetric. In summary
we showed that Im(T ) = Msym (n × n).
The claim on the kernel of T follows from
FT
A ∈ ker T ⇐⇒ T (A) = 0 ⇐⇒ A+At = 0 ⇐⇒ A = −At ⇐⇒ A ∈ Masym (n×n).
Proof. (i) By Lemma 6.5, O ∈ ker(T ). Let x, y ∈ ker(T ) and λ ∈ K. Then x + λy ∈ ker(T )
RA
because
T (x + λy) = T x + λT y = O + λ0 = O.
Hence ker(T ) is a subspace of U by Proposition 5.10.
(ii) C;early, O ∈ Im(T ). Let v, w ∈ Im(T ) and λ ∈ K. Then there exist x, y ∈ U such that
T x = v and T y = w. Then v + λw = T x + λT y = T (x + λy) ∈ Im(T ). hence v + λw ∈ Im(T ).
Therefore Im(T ) is a subspace of V by Proposition 5.10.
Since we now know that ker(T ) and Im(T ) are subspaces, the following definition makes sense.
D
Sometimes the notations ν(T ) = dim(ker(T )) and ρ(T ) = dim(Im(T )) are used.
• First we show the claim about the kernel of T . Recall that ker(T) = {p ∈ P3 : T p = 0}. So
the kernel of T are exactly those polynomials whose first derivative is 0. These are exactly
the constant polynomials, i.e., the polynomials of degree 0.
Lemma 6.13. Let T : U → V be a linear map between two vector spaces U, V and let {u1 , . . . , uk }
be a basis of U . Then Im T = span{T u1 , . . . , T uk }.
Proof. Clearly, T u1 , . . . , T uk ∈ Im T . Since the image of T is a vector space, all linear combinations
of these vectors must belong to Im T too which shows span{T u1 , . . . , T uk } ⊆ Im T . To show the
other inclusion, let y ∈ Im T . Then there is an x ∈ U such that y = T x. Let us express x as linear
combination of the vectors of the basis: x = α1 u1 + . . . αk uk . Then we obtain
y = T x = T (α1 u1 + . . . αk uk ) = α1 T u1 + . . . αk T uk ∈ span{T u1 , . . . , T uk }.
FT
Proposition 6.14. Let U, V be K-vector spaces, T : U → V a linear map. Let x1 , . . . , xk ∈ U and
set y1 := T x1 , . . . , yk := T xk . Then the following is true.
(i) If the x1 , . . . , xk are linearly dependent, then y1 , . . . , yk are linearly dependent too.
(ii) If the y1 , . . . , yk are linearly independent, then x1 , . . . , xk are linearly independent too.
(iii) Suppose additionally that T invertible. Then x1 , . . . , xk are linearly independent if and only
if y1 , . . . , yk are linearly independent.
RA
In general the implication “If x1 , . . . , xk are linearly independent, then y1 , . . . , yk are linearly
independent.” is false. Can you give an example?
Proof of Proposition 6.14. (i) Assume that the vectors x1 , . . . , xk are linearly dependent. Then
there exist λ1 , . . . , λk ∈ K such that λ1 x1 + · · · + λk xk = O and at least one λj 6= 0. But then
O = T O = T (λ1 x1 + · · · + λk xk ) = λ1 T x1 + · · · + λk T xk
= λ1 y1 + · · · + λk yk ,
D
Exercise 6.15. Assume that T : U → V is an injective linear map and suppose that {u1 , . . . , u` }
is a set of are linearly independent vectors in U . Show that {T u1 , . . . , T u` } is a set of are linearly
independent vectors in V .
The following lemma is very useful and it is used in the proof of Theorem 6.4.
(i) From Lemma 6.13 we know that Im T = span{T u1 , . . . , T uk }. Therefore dim Im T ≤ k = dim U
by Theorem 5.47.
(ii) Assume that T is injective. We will show that T u1 , . . . , T uk are linearly independent. Let
α1 , . . . , αk ∈ K such that α1 T u1 + · · · + αk T uk = O. Then
O = α1 T u1 + · · · + αk T uk = T (α1 u1 + · · · + αk uk ).
FT
Since T is injective, it follows that α1 u1 + · · · + αk uk = O, hence α1 = · · · = αk = 0 which
shows that the vectors T u1 , . . . , T uk are indeed linearly independent. Therefore they are a basis
of span{T u1 , . . . , T uk } = Im T and we conclude that dim Im T = k = dim U .
(iii) Since T is bijective, it is surjective and injective. Surjectivity means that Im T = V and
injectivity of T implies that dim Im T = dim U by (ii). In conclusion,
Remark 6.17. Proposition 6.16 is true also for dim U = ∞. In this case, (i) clearly holds whatever
dim Im(T ) may be. To prove (ii) we need to show that dim Im(T ) = ∞ if T is injective. Note that
for every n ∈ N we can find a subspace Un of U with dim Un = n and we define Tn to be the
restriction of T to Un , that is, Tn : Un → V . Since the restriction of an injective map is injective,
it follows from (ii) that dim Im(Tn ) = n. On the other hand, Im(Tn ) is a subspace of V , therefore
D
dim V ≥ dim Im(Tn ) = n by Theorem 5.54 and Remark 5.55. Since this is true for any n ∈ N, it
follows that dim V = ∞. The proof of (iii) is the same as in the finite dimensional case.
Theorem 6.18. Let U, V be K-vector spaces and T : U → V a linear map. Moreover, let E : U →
U , F : V → V be linear bijective maps. Then the following is true:
In summary we have
and
Proof. (i) Let v ∈ V . If v ∈ Im(T ), then there exists x ∈ U such that T x = v. Set y = E −1 x.
Then v = T x = T EE −1 x = T Ey ∈ Im(T E). On the other hand, if v ∈ Im(T E), then there exists
y ∈ U such that T Ey = v. Set x = E. Then v = T Ey = T x ∈ Im(T ).
FT
ker(T E) = {x ∈ U : Ex ∈ ker(T )} = {E −1 u : u ∈ ker(T )} = E −1 (ker(T )).
It follows that
E −1 : ker T → ker(T E)
is a linear bijection and therefore dim T = dim ker(T E) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with E −1 as T , ker(T ) as U and ker(T E) as V .
RA
(iii) Let x ∈ U . Then x ∈ ker(F T ) if and only if F T x = O. Since F is injective, we know that
ker(F ) = {O}, hence it follows that T x = O. But this is equivalent to x ∈ ker(T ).
It follows that
F : Im T → Im(F T )
D
is a linear bijection and therefore dim T = dim Im(F T ) by Proposition 6.16(iii) (or Remark 6.17 in
the infinite dimensional case) with F as T , Im(T ) as U and Im(F T ) as V .
Draw a picture to visualise the example above, taking into account that T represents √
the projection
onto the x-axis and E and F are rotation by 45◦ and a “stretching” by the factor 2.
We end this section with one of the main theorems of linear algebra. In the next section we will
re-prove it for the special case when T is given by a matrix in Theorem 6.33. The theorem below
can be considered a coordinate free version of Theorem 6.33.
Theorem 6.20. Let U, V be vector spaces with dim U = n < ∞ and let T : U → V be a linear
map. Then
dim(ker(T )) + dim(Im(T )) = n. (6.4)
Proof. Let k = dim(ker(T )) and let {u1 , . . . , uk } be a basis of ker(T ). We complete it to a basis
{u1 , . . . , uk , wk+1 , . . . , wn } of U and we set W := span{wk+1 , . . . , wn }. Note that by construction
ker(T ) ∩ W = {O}. (Prove this!) Let us consider Te = T |W the restriction of T to W .
It follows that Te is injective because if Tex = O for some x ∈ W then also T x = Tex = O, hence
x ∈ ker(T ) ∩ W = {O}. It follows from Proposition 6.16(ii) that
FT
dim Im Te = dim W = n − k.
To complete the proof, it suffices to show that Im Te = Im T . Recall that by Lemma 6.13, we have
that the range of a linear map is generated by the images of a basis of the initial vector space.
Therefore we find that
where in the second step we used that T u1 = · · · = T uk = O and therefore they do not contribute
to the linear span and in the third step we used that T wj = Tewj for j = k + 1, . . . , n. So we
showed that Im Te = Im T , in particular their dimensions are equal and the claim follows from (6.5)
because, recalling that k = dim ker(T ),
D
Note that an alternative way to prove the theorem above is to first prove Theorem 6.33 for matrices
and then use the results on representations of linear maps in Section 6.4 to conclude formula (6.4).
• what a linear map is and why they are the natural maps to consider on vector spaces,
• what injectivity, surjectivity and bijectivity means,
• what the kernel and image of a linear map is,
• why the dimension formula (6.4) is true,
• etc.
Ejercicios.
De los ejercicios 1 al 14 determinar si la función dada es una transformación lineal. Si lo es
demuéstrelo, en caso contrario dé un ejemplo donde no se cumpla la linealidad.
x
x + 2y
y
1. T : R4 → R3 , T z = 3y − 5z .
w
w
FT
x
3 2 x
2. T : R → R , T y = .
1
z
x y
3. T : R2 → R2 , T = .
y x2
x y
4. T : R2 → R2 , T = .
RA
y 0
x
y
5. T : R4 → R, T z = |w|.
w
x
x − y + 2z + 3w
y
6. T : R4 → R3 , T y + 4z + 3w .
z =
D
x + 6z + 6w
w
Rx
7. T : P3 → P4 , T (p) = 0 p(t)dt.
a
8. T : R3 → P3 , T b = (a − 3b)x3 + (b + 2c).
c
9. T : M (3 × 3) → M (3 × 3), T (A) = At − A.
2 4
10. T : M (2 × 2) → M (2 × 2), T (A) = A .
0 1
a11 a12 a13
12. T : M (3 × 3) → R, T a21 a22 a23 = a11 + a22 + a33 .
a31 a32 a33
13. Sea ~x0 ∈ Rn y T : M (m × n) → Rm dada por T (A) = A~x0 .
14. T : M (n × n) → R, T (A) = det A.
15. De los ejercicios anteriores, (salvo el ejercicio 11.) determine Im T , ker T y sus dimensiones.
~ ∈ Rn un vector no nulo y T : Rn → R dada por T (~x) = h~x , wi.
16. Sea w ~ Demuestre que T es
una transformación lineal y determine las dimensiones de ker T e Im T .
17. Sea w~ ∈ Rn un vector no nulo y T : Rn → Rn dada por T (~x) = projw~ ~x. Demuestre que T es
lineal, encuentre Im T , ker T y sus dimensiones.
1
18. Sea T : R3 → R3 dada por T (~x) = ~x × 2, ¿T es una transformación lineal? En caso
3
FT
afirmativo encuentre ker T e Im T y sus dimensiones. (Hint: Sale muy sencillo si lo piensa
geométricamente).
19. Sea T : R2 → R3 una transformación lineal tal que:
1 −3
1 3
T = 0 , T = 0 ,
−1 2
3 −9
RA
7 x
encontrar T y aún más general, determinar T . ¿T es inyectiva?. ¿Cómo cambia la
11 y
0
3
respuesta si T = 1?
2
0
0 4 4
T −1 = 1 , T −3 = −1 ,
T −14 =
0?
2 1 8
−1 −3 −5
Let A ∈ M (m × n). We already know that we can view A as a linear map from Rn to Rm . Hence
ker(A) and Im(A) and the terms injectivity and surjectivity are defined.
Strictly speaking, we should distinguish between a matrix and the linear map induced by it. So
we should write TA : Rn → Rm for the map x 7→ Ax. The reason is that if we view A directly
as a linear map then this implies that we tacitly have already chosen a basis in Rn and Rm , see
Section 6.4 for more on that. However, we will usually abuse notation and write A instead of TA .
If we view a matrix A as a linear map and at the same time as a linear system of equations, then
we obtain the following.
Remark 6.21. Let A ∈ M (m × n) and denote the columns of A by ~a1 , . . . , ~an ∈ Rm . Then the
following is true.
Consequently,
FT
(iii) A is injective ⇐⇒ ker(A) = {~0}
⇐⇒ the homogenous system A~x = ~0 has only the trivial solution ~x = ~0.
(iv) A is surjective ⇐⇒ Im(A) = Rm
⇐⇒ for every ~b ∈ Rm , the system A~x = ~b has at least one solution.
Proof. All claims should be clear except maybe the second equality in (ii). This follows from
x1 x1
RA
Im A = {A~x : ~x ∈ Rn } = (~a1 | . . . |~an ) ... : ... ∈ Rn
xn xn
= {x1~a1 + · · · + xn~an ) : x1 , . . . , xn ∈ R}
= span{~a1 , . . . , ~an },
Proof with Gauß-Jordan. Let A0 be the row reduced echelon form of A. Then there must be an
invertible matrix E such that A = EA0 and A0 the last row of A0 must be zero because it can have
at most n pivots. But then (A0 |~em ) is inconsistent, which means that (A|E −1~em ) is inconsistent.
Hence E −1~em ∈/ Im A so A cannot be surjective. (Basically we say that clearly A0 is not surjective
because we can easily find a right side to that A0 ~x0 = ~b0 is inconsistent. Just pick any vector ~b0 whose
last coordinate is different from 0. The easiest such vector is ~em . Now do the Gauß-Jordan process
backwards on this vector in order to obtain a right hand side ~b such that A~x = ~b is inconsistent.)
Proof using the concept of dimension. We already saw that Im A is the linear span of its columns.
Therefore dim Im A ≤ #columns of A = n < m = dim Rm , therefore Im A ( Rm .
Proof with Gauß-Jordan. Let A0 be the row reduced echelon form of A. Then A0 can have at
most m pivots. Since A0 has more columns than pivots, the homogeneous system A~x = ~0 has
infinitely solutions, but then also ker A contains infinitely many vectors, in particular A cannot be
injective.
Proof using the concept of dimension. We already saw that Im A is the linear span of its n columns
in Rm . Since n > m it follows that the column vectors are linearly dependent in Rm , hence A~x = ~0
has a non-trivial solution. Therefore ker A is not trivial and it follows that A is not injective.
Note that the remarks do not imply that A is surjective if m ≤ n or that A is injective if n ≤ m.
Find examples!
From Theorem 3.44 we obtain the following very important theorem for the special case m = n.
The next proposition follows immediately from the definition above and from Remark 6.21(ii).
The next proposition follows directly from the general theory in Section 6.1. We will give another
proof at the end of this section.
(i) CA = CAE .
(ii) RA = RF A .
Proof. (i) Note that CA = Im(A) = Im(AE) = CAE , where in the first and third equality we
used Proposition 6.26, and in the second equality we used Theorem 6.4.
(ii) Recall that, if F is invertible, then F t is invertible too. With Proposition 6.26(i) and what
we already proved in (i), we obtain RF A = C(F A)t = CAt F t = CAt = RA .
We immediately obtain the following proposition.
Proof. We will only prove (i). The claim (ii) can be proved similarly (or can be deduced easily
from (i) by applying (i) to the transposed matrices). That A and B are row equivalent means that
FT
we can transform B into A by row transformations. Since row transformations can be represented
by multiplication by elementary matrices from the left, there are elementary matrices F1 , . . . , Fk ∈
M (m × m) such that A = F1 . . . Fk B. Note that all Fj are invertible, hence F := F1 . . . Fk is
invertible and A = F B. Therefore all the claims in (i) follow from Theorem 6.4 and Proposition 6.27.
The proposition above is very useful to calculate the kernel of a matrix A: Let A0 be the reduced
row-echelon form of A. Then the proposition can be applied to A and A0 , and we find that
ker(A) = ker(A0 ).
RA
In fact, we know this since the first chapter of this course, but back then we did not have fancy
words like “kernel” at our disposal. It says nothing else than: the solutions of a homogenous
system do not change if we apply row transformations, which is exactly why the Gauß-Jordan
elimination works.
In Examples 6.36 and 6.37 we will calculate the kernel and range of a matrix. Now we will prove
two technical lemmas.
D
Proof. Let A0 be the reduced row-echelon form of A. Then there exist F1 , . . . , F` ∈ M (m × m) such
that F1 · · · F` A = A0 and A0 is of the form
1 ∗ ∗ 0 ∗ ∗ 0 ∗
1 ∗ ∗ 0 ∗
0
A = . (6.7)
1 ∗
Now clearly we can find “allowed” column transformations such that A0 is transformed into the
form A00 . If we observe that applying column transformations is equivalent to multiplying A0 from
the right by elementary matrices, then we can find elementary matrices E1 , . . . , Ek such that
A0 E1 . . . Ek if of the form (6.6).
FT
(i) dim(ker(A)) = m − r = number of zero rows of A00 ,
(ii) dim(Im(A)) = r = number of pivots A00 ,
(iii) dim(CA00 ) = dim(RA00 ) = r.
Proposition 6.31. Let A ∈ M (m × n) and let A0 be its reduced row-echelon form. Then
E1 · · · Ek . It follows that A0 = F A and A00 = F AE. Clearly, the number of pivots of A0 and A00
coincide. Therefore, with the help of Theorem 6.4 we obtain
That means:
(dimension of the range of A) = (dimension of row space) = (dimension of column space).
As an immediate consequence we obtain the following theorem which is a special case of Theo-
rem 6.20, see also Theorem 6.47.
FT
dim(ker(A)) = dim(ker(A00 )) = n − r,
dim(Im(A)) = dim(Im(A00 )) = r
Theorem 6.34. Let A ∈ M (m × n) and let A0 be its reduced row-echelon form with columns
RA
~c1 , . . . , ~cn and ~c1 0 , . . . , ~cn 0 respectively. Assume that the pivot columns of A0 are the columns j1 <
· · · < jk . Then dim(Im(A)) = k and a basis of Im(A) is given by the columns ~cj1 , . . . , ~cjk of A.
Proof. Let E be an invertible matrix such that A = EA0 . By assumption on the pivot columns of
A0 , we know that dim(Im(A0 )) = k and that a basis of Im(A0 ) is given by the columns ~cj1 0 , . . . , ~cjk 0 .
By Theorem 6.4 it follows that dim(Im(A)) = dim(Im(A0 )) = k. Now observe that by definition of
E we have that E~c` 0 = ~c` for every ` = 1, . . . , n; in particular this is true for the pivot columns of
A0 . Moreover, since E in invertible and the vectors ~cj1 0 , . . . , ~cjk 0 are linearly independent, it follows
from Theorem 6.14 that the vectors ~cj1 , . . . , ~cjk are linearly independent. Clearly they belong to
D
Im(A), so we have span{~cj1 , . . . , ~cjk } ⊆ Im(A). Since both spaces have the same dimension, they
must be equal.
Remark 6.35. The theorem above can be used to determine a basis of a subspace given in the
form U = span{~v1 , . . . , ~vk } ⊆ Rm as follows: Define the matrix A = (~v1 | . . . |~vk ). Then clearly
U = Im A and we can apply Theorem 6.34 to find a basis of U .
4 5 22 1
FT
#non-pivot columns A0 = 1).
Kernel of A: We know that ker(A) = ker(A0 ) by Theorem 6.4 or Proposition 6.28. From the
explicit form of A0 it is clear that A~x = 0 if and only if x4 = 0, x3 arbitrary, x2 = −2x3 and
x1 = −3x3 . Therefore
−3x3
−3
−2x3 −2
0
ker(A) = ker(A ) = : x 3 ∈ R = span 1 .
x3
RA
0 0
Image of A: The pivot columns of A0 are the columns 1, 2 and 4. Therefore, by Theorem 6.34, a
basis of Im(A) are the columns 1, 2 and 4 of A:
1
1 1
3 2 1
Im(A) = span , , .
2 −1 (6.9)
0
4 5 1
D
Alternative method for calculating the image of A: We can uses column manipulations of A
to obtain Im A. (If you fell more comfortable with row operations, you could apply row operations
to At and then transpose the resulting matrix again.) We find (Cj stands for “jth column of A):
C →C −C
1 1 5 1 C32 → C32 − 5C11 1 0 0 0 C3 → C3 − 2C2 1 0 0 0
3 2 13 1 C4 → C4 − C1 3 −1 −2 −2 C4 → C4 − 2C2 3 −1 0 0
A = −−−−−−−−−−→ −−−−−−−−−−→
0 2 4 −1 0 2 4 −1 0 2 0 −5
4 5 22 1 4 1 2 −3 4 1 0 −5
C4 → −1/5C4 1 0 0 0 C1 → C1 − 3C4 1 0 0 0 C3 ↔ C4 1 0 0 0
C1 → C1 + 3C2 0 −1 0 0 C2 → C2 − 2C4 0 −1 0 0 C2 → −C2 0 1 0 0
−−−−−−−−−−→ 3
−−−−−−−−−−→ −−−−−−−→ =: A.
e
2 0 1 0 0 0 1 0 0 1 0
7 1 0 1 4 −1 0 1 4 1 1 0
It follows that
1 0 0
e = span , , 0 .
0 1
Im(A) = Im(A) 0 0 1 (6.9’)
4 1 1
p1 = x3 − x2 + 2x + 2, p2 = x3 + 2x2 + 8x + 13,
p3 = 3x3 − 6x2 − 5, p3 = 5x3 + 4x2 + 26x − 9.
Solution. First we identify P3 with R4 by ax3 + bx2 + cx + d = b (a, b, c, d)t . The polynomials
p1 , p2 , p3 , p4 correspond to the vectors
FT
1 1 3 5
−1 2 −6 4
2 , ~v2 = 8 , ~v3 = 0 , ~v4 = 26 .
~v1 =
2 13 −5 −9
Now we use Remark 6.35 to find a basis of span{v1 , v2 , v3 , v4 }. To this end we consider the A whose
columns are the vectors ~v1 , . . . , ~v4 :
RA
1 1 3 5
−1 2 −6 4
A=
2 8
.
0 26
2 13 −5 −9
Clearly, span{~v1 , ~v2 , ~v3 , ~v4 } = Im(A), so it suffices to find a basis of Im(A). Applying row transfor-
mation to A, we obtain
1 1 3 5 1 0 4 5
D
−1 2 −6 4 0 1 2 3 = A0 .
A= −→ · · · −→
2 8 0 26 0 0 0 0
2 13 −5 −9 0 0 0 0
The pivot columns of A0 are the first and the second column, hence by Theorem 6.34, a basis of
Im(A) are its first and second columns, i.e. the vectors ~v1 and ~v2 .
It follows that {p1 , p2 } is a basis of span{p1 , p2 , p3 , p4 } ⊆ P3 , hence dim(span{p1 , p2 , p3 , p4 }) = 2.
Remark 6.38. Let us use the abbreviation π = span{p1 , p2 , p3 , p4 }. The calculation above actually
shows that any two vectors of p1 , p2 , p3 , p4 form a basis of π. To see this, observe that clearly any
two of them are linearly independent, hence the dimension of their generated space is 2. On the
other hand, this generated space is a subspace of π which has the same dimension 2. Therefore
they must be equal.
Remark 6.39. If we wanted to complete p1 , p2 to a basis of P3 , we have (at least) the two following
options:
(i) In order to find q3 , q4 ∈ P3 such that p1 , p2 , q3 , q4 forms a basis of P3 we can use the reduction
process that was employed to find A0 . Assume that E is an invertible matrix such that
A = EA0 . Such an E can be found by keeping track of the row operations that transform
A into A0 . Let ~ej be the standard unit vectors of R4 . Then we already know that ~v1 = E~e1
and ~v2 = E~e2 . If we set w ~ 3 = E~e3 and w ~ 4 = E~e4 , then ~v1 , ~v2 , w ~ 4 form a basis of R4 .
~ 3, w
This is because ~e1 , . . . ,~e4 are linearly independent and E is injective. Hence E~e1 , . . . , E~e4 are
linearly independent too (by Proposition 6.14).
FT
x1 − x2 +2x3 +2x4 = 0,
x1 +2x2 −6x3 +4x4 = 0
or, in matrix notation, P ~x = 0 where P is the 2 × 4 matrix whose rows are ~v1 and ~v2 . Since
clearly Im(P ) ⊆ R2 , it follows that dim(Im(P )) ≤ 2 and therefore dim(ker(P )) ≥ 4 − 2 = 2.
ker(A) = (RA )⊥ .
ker(A) = (Im At )⊥ .
x
y 3x + 2y − 5r
2y + z − w
T z = 5x + 2y − w .
w
w + 3r
r
We want to write T in the form A~x. Note that T can be expressed in the form
x x
y 3 2 0 0 −5
0 2 1 0 −1 y
z
T =
z ,
w 5 2 0 0 −1 w
0 0 0 1 3
r r
y1
y2
This way of expressing T is not arbitrary, since if y3 ∈ Im T then:
y4
x + 2y − 5r = y1
2y + z − w = y2
5x + 2y − w = y3
w + 3r = y4
and we know from section 3.3 that this system can be written in the form:
x
3 2 0 0 −5 y1
0 2 1 0 −1 y y2
5 2 0 0 −1 z = y3 .
w
0 0 0 1 3 y4
r
FT
• what the relation between the solutions of a homogeneous system and the kernel of the
associated coefficient matrix is,
• what the relation between the admissible right hand sides of a system of linear equations
and the range of the associated coefficient matrix is,
• why the dimension formula (6.8) holds and why it is only a special case of (6.4),
RA
• why the Gauß-Jordan process works,
• etc.
You should now be able to
• calculate a basis of the kernel of a matrix and its dimension,
• calculate a basis of the range of a matrix and its dimension,
• etc.
D
Ejercicios.
1. Encuentre una base para el espacio generado de los siguientes conjuntos:
1 1 1
(a) , , .
3 −1 0
2 −4 10
(b) −1 , −2 , −5 .
1 2 5
1 2 1 3
(c) 0 , −2 , 2 , 3 .
1 0 3 5
1 0 1 −1 4 0 3 1
(d) , , , .
1 0 0 0 2 0 0 1
(e) X 3 − X 2 + X + 2, 2X 3 + 4X + 2, 2X 3 + X 2 + 5X + 1, X 3 + 2X 2 + 4X − 1 .
2. En los siguientes ejercicios, exprese T del Example 6.41 de la forma A~x. Determine Im T, ker T
y sus dimensiones.
1
(a) T : R3 → R3 , T (~x) = ~x × 0.
−2
x
x+y
y
(b) T : R4 → R3 , T
z = x − z .
2y + w
w
x−y
FT
x 5x + 4y + 9z
(c) T : R3 → R5 , T y = 2x − 3y − z .
z x+z
2y + 2z
1
(d) Sea w = 3 y T : R3 → R3 dada por T (~x) = projw~ ~x.
−1
RA
3. Encuentre una matriz A(3 × 3) tal que su kernel es el plano E : x + 2y − z = 0.
5. Sea A ∈ M (m × n) tal que para todo ~b ∈ Rm , el sistema A~x = ~b tiene solución. ¿Cuánto vale
dim(Im A)?, ¿que se puede decir de ker A?
6. Sean A ∈ M (m × n) y B ∈ M (n × k).
D
¿Cuándo se tiene igualdad en los ejercicios anteriores? Encuentre ejemplos para igualdad y
ejemplos donde hay desigualdad estricta.
Such columns of numbers are usually interpreted as the Cartesian coordinates of the tip of the
vector if its initial point is in the origin. So for example, we can visualise ~v as the vector which
we obtain when we move 3 units along the x-axis, 2 units along the y-axis and −1 unit along the
z-axis.
FT
If we set ~e1 , ~e2 , ~e3 the unit vectors which are parallel to the x-, y- and z-axis, respectively, then we
can write ~v as a weighted sum of them:
3
~v = 2 = 3~e1 + 2~e2 − ~e3 . (6.11)
−1
So the column of numbers which we use to describe ~v in (6.10) can be seen as a convenient way to
abbreviate the sum in (6.11).
RA
Sometimes however, it may make more sense to describe a certain vector not by its Cartesian
coordinates. For instance, think of an infinitely large chess field (this is R2 ). Then the rook is
moving a along the Cartesian axis while the bishop moves a along the diagonals, that is along
~b1 = ( 1 ), ~b2 = −1 and the knight moves in directions parallel to ~k1 = ( 2 ), ~k2 = ( 1 ). We
1 1 1 2
suppose that in our imaginary chess game the rook, the bishop and the knight may move in arbitrary
multiples of their directions. Suppose all three of them are situated in the origin of the field and we
want to move them to the field (3, 5). For the rook, this is very easy. It only has to move 3 steps
to the right and then 5 steps up. He would denote his movement as ~vR = (3, 5)^t_R where we put the index R to indicate that the numbers in this column vector correspond to the natural coordinate system of the rook. The bishop cannot do this. He can move only along the diagonals. So what does he have to do? He has to move 4 steps in the direction of ~b1 and 1 step in the direction of ~b2. So
he would denote his movement with respect to his bishop coordinate system as ~vB = (4, 1)^t_B. Finally the knight has to move 1/3 steps in the direction of ~k1 and 7/3 steps in the direction of ~k2 to reach the point (3, 5). So he would denote his movement with respect to his knight coordinate system as ~vK = (1/3, 7/3)^t_K. See Figure 6.1.
Exercise. Check that ~vB = (4, 1)^t_B = 4~b1 + ~b2 = (3, 5)^t and that ~vK = (1/3, 7/3)^t_K = (1/3)~k1 + (7/3)~k2 = (3, 5)^t.
Although the three vectors ~vR, ~vB and ~vK look very different, they describe the same vector – only
from three different perspectives (the rook, the bishop and the knight perspective). We have to
remember that they have to be interpreted as linear combinations of the vectors that describe their movements.

Figure 6.1: The picture shows the point (3, 5) in “bishop” and “knight” coordinates. The vectors for the bishop are ~b1 = (1, 1)^t, ~b2 = (−1, 1)^t and ~xB = (4, 1)^t_B. The vectors for the knight are ~k1 = (2, 1)^t, ~k2 = (1, 2)^t and ~xK = (1/3, 7/3)^t_K.
What we just did was to perform a change of bases in R2 : Instead of describing a point in the plane
in Cartesian coordinates, we used “bishop”- and “knight”-coordinates.
We can also go in the other direction and transform from “bishop”- or “knight”-coordinates to
Cartesian coordinates. Assume that we know that the bishop moves 3 steps in his direction ~b1 and
−2 steps in his direction ~b2, where does he end up? In his coordinate system, he is displaced by the vector ~u = (3, −2)^t_B. In Cartesian coordinates this vector is

~u = \begin{pmatrix} 3 \\ -2 \end{pmatrix}_B = 3~b1 − 2~b2 = \begin{pmatrix} 3 \\ 3 \end{pmatrix} + \begin{pmatrix} 2 \\ -2 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \end{pmatrix}.
If we move the knight 3 steps in his direction ~k1 and −2 steps in his direction ~k2, that is, we move him along ~w = (3, −2)^t_K according to his coordinate system, then in Cartesian coordinates this vector is

~w = \begin{pmatrix} 3 \\ -2 \end{pmatrix}_K = 3~k1 − 2~k2 = \begin{pmatrix} 6 \\ 3 \end{pmatrix} + \begin{pmatrix} -2 \\ -4 \end{pmatrix} = \begin{pmatrix} 4 \\ -1 \end{pmatrix}.
Can the bishop and the knight reach every point in the plane? If so, in how many ways? The
answer is yes, and they can do so in exactly one way. The reason is that for the bishop and for the
knight, their set of direction vectors each form a basis of R2 (verify this!).
Figure 6.2: The pictures show the vectors (3, −2)^t_B and (3, −2)^t_K.

Let us make precise the concept of change of basis. Assume we are given an ordered basis B = {~b1, . . . , ~bn} of Rn. If we write a vector as

~x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}_B   (6.12)

then we interpret it as a vector which is expressed with respect to the basis B and

~x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}_B := x_1 ~b1 + · · · + x_n ~bn.   (6.13)
If there is no index attached to the column vector, then we interpret it as a vector with respect to the
canonical basis ~e1 , . . . ,~en of Rn . Now we want to find a way to calculate the Cartesian coordinates
(that is, those with respect to the canonical basis) if we are given a vector in B-coordinates and
vice versa.
It will turn out that the following matrix will be very useful:

A_{B→can} = (~b1 | · · · | ~bn) = matrix whose columns are the vectors of the basis B,

that is,

~x = A_{B→can} ~xB = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}_{can}.   (6.14)
The last vector (the one with the y1 , . . . , yn in it) describes the same vector as ~xB , but it does so
with respect to the standard basis of Rn . The matrix AB→can is called the transition matrix from
the basis B to the canonical basis (which explains the subscript “B → can”). The matrix is also
called the change-of-coordinates matrix
~x = AB→can ~xB .
Now suppose that, conversely, we are given the Cartesian coordinates of a vector, that is, we know the entries of the vector ~x. Then we only need to invert the matrix A_{B→can} in order to obtain

~xB = A_{B→can}^{−1} ~x.
This requires of course that AB→can is invertible. But this is guaranteed by Theorem 5.39 since we
know that its columns are linearly independent. So it follows that the transition matrix from the
canonical basis to the basis B is given by
A_{can→B} = A_{B→can}^{−1}.
Note that we could do this also “by hand”: We are given ~x = (y_1, . . . , y_n)^t_{can} and we want to find the
entries x1 , . . . , xn of the vector ~xB which describes the same vector. That is, we need numbers
x1 , . . . , xn such that
~x = x1~b1 + · · · + ~bn xn .
If we know the vectors ~b1, . . . , ~bn, then we can write this as an n × n system of linear equations for x1, . . . , xn.
Now assume that we have two ordered bases B = {~b1 , . . . , ~bn } and C = {~c1 , . . . , ~cn } of Rn and we
are given a vector ~xB with respect to the basis B. How can we calculate its representation ~xC with
respect to the basis C? The easiest way is to use the canonical basis of Rn as an auxiliary basis.
So we first calculate the given vector ~xB with respect to the canonical basis, we call this vector ~x.
Then we go from ~x to ~xC . According to the formulas above, this is
~xC = A_{can→C} ~x = A_{can→C} A_{B→can} ~xB.
Example 6.42. Let us go back to our example of our imaginary chess board. We have the “bishop basis” B = {~b1, ~b2} where ~b1 = (1, 1)^t, ~b2 = (−1, 1)^t and the “knight basis” K = {~k1, ~k2} with ~k1 = (2, 1)^t, ~k2 = (1, 2)^t. Then the transition matrices to the canonical basis are

A_{B→can} = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},   A_{K→can} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix},

and the transition matrices between the two bases are

A_{B→K} = A_{can→K} A_{B→can} = A_{K→can}^{−1} A_{B→can} = \frac{1}{3}\begin{pmatrix} 1 & -3 \\ 1 & 3 \end{pmatrix},   A_{K→B} = A_{B→K}^{−1} = \frac{1}{2}\begin{pmatrix} 3 & 3 \\ -1 & 1 \end{pmatrix}.

Solution. (~x)_K = A_{B→K} ~xB.

Solution. (~z)_B = A_{can→B} ~z = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}_B.
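The following is a quick numerical check of the transition matrices above. It is only a sketch (these notes do not otherwise use code); NumPy and the variable names are our own choice, not part of the example.

```python
import numpy as np

# Columns are the basis vectors, so these are the transition matrices to the canonical basis.
A_B_to_can = np.array([[1.0, -1.0],
                       [1.0,  1.0]])   # bishop basis b1 = (1,1), b2 = (-1,1)
A_K_to_can = np.array([[2.0, 1.0],
                       [1.0, 2.0]])    # knight basis k1 = (2,1), k2 = (1,2)

# Transition matrix from bishop coordinates to knight coordinates.
A_B_to_K = np.linalg.inv(A_K_to_can) @ A_B_to_can

# The point (3,5) is (4,1) in bishop coordinates and (1/3, 7/3) in knight coordinates.
x_B = np.array([4.0, 1.0])
print(A_B_to_can @ x_B)   # [3. 5.]          -> Cartesian coordinates
print(A_B_to_K @ x_B)     # [0.333... 2.333...] -> knight coordinates
```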
Example 6.43. Recall the example on page 106 where we had a shop that sold different types of
packages of food. Package type A contains 1 peach and 3 mangos and package type B contains 2
peaches and 1 mango. We asked two types of questions:
Question 1. If we buy a packages of type A and b packages of type B, how many peaches and
mangos will we get? We could rephrase this question so that it becomes more similar to Question
2: How many peaches and mangos do we need in order to fill a packages of type A and b packages
of type B?
Question 2. How many packages of type A and of type B do we have to buy in order to get p
peaches and m mangos?
Recall that we had the relation

M \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} p \\ m \end{pmatrix},   \begin{pmatrix} a \\ b \end{pmatrix} = M^{−1} \begin{pmatrix} p \\ m \end{pmatrix}   where   M = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix}   and   M^{−1} = \frac{1}{5}\begin{pmatrix} -1 & 2 \\ 3 & -1 \end{pmatrix},   (6.15)

where p denotes the number of peaches and m the number of mangos.
Figure 6.3: How many peaches and mangos do we need to obtain 1 package of type A and 3 packages of type B? Answer: 7 peaches and 6 mangos. Figure (a) describes the situation in the “fruit plane” while Figure (b) describes the same situation in the “packages plane”. In both figures we see that ~A + 3~B = 7~p + 6~m.
We can view these problems in two different coordinate systems. We have the “fruit basis” F = {~p, ~m} and the “package basis” P = {~A, ~B} where

~p = (1, 0)^t,   ~m = (0, 1)^t,   ~A = (1, 3)^t,   ~B = (2, 1)^t.
An example for the first question is: How many peaches and mangos do we need to obtain 1 package of type A and 3 packages of type B? Clearly, we need 7 peaches and 6 mangos. So the point that we want to reach is (1, 3)^t_P in “package coordinates” and (7, 6)^t_F in “fruit coordinates”. This is sketched in Figure 6.3.
An example for the second question is: How many packages of type A and of type B do we have to buy in order to obtain 5 peaches and 5 mangos? Using (6.15) we find that we need 1 package of type A and 2 packages of type B. So the point that we want to reach is (1, 2)^t_P in “package coordinates” and (5, 5)^t_F in “fruit coordinates”. This is sketched in Figure 6.4.
In the rest of this section we will apply these ideas to introduce coordinates in abstract (finitely generated) vector spaces V with respect to a given basis. This allows us to identify V in a certain sense with Rn.

Figure 6.4: How many packages of type A and of type B do we need to get 5 peaches and 5 mangos? Answer: 1 package of type A and 2 packages of type B. Figure (a) describes the situation in the “fruit plane” while Figure (b) describes the same situation in the “packages plane”. In both figures we see that ~A + 2~B = 5~p + 5~m.

Let B = {v1, . . . , vn} be a basis of V and let w ∈ V. Then there are uniquely determined numbers α1, . . . , αn ∈ R such that

w = α1 v1 + · · · + αn vn.
So, if we are given w, we can find the numbers α1 , . . . , αn . On the other hand, if we are given the
numbers α1 , . . . , αn , we can easily reconstruct the vector w (just replace in the right hand side of
the above equation). Therefore it makes sense to write
w = \begin{pmatrix} α_1 \\ \vdots \\ α_n \end{pmatrix}_B
where again the index B reminds us that the column of numbers has to be understood as the
coefficients with respect to the basis B. In this way, we identify V with Rn since every column
vector gives a vector w in V and every vector w gives one column vector in Rn . Note that if we
start with some w in V , calculate its coordinates with respect to a given basis and then go back to
V , we get back our original vector w.
Consider the vector space P2 with the bases B = {p1, p2, p3}, C = {q1, q2, q3} and D = {r1, r2, r3} where

p1 = 1, p2 = X, p3 = X^2,   q1 = X^2, q2 = X, q3 = 1,   r1 = X^2 + 2X, r2 = 5X + 2, r3 = 1.

We want to write the polynomial π(X) = aX^2 + bX + c with respect to each of the given bases.
• Basis B: Clearly, π = cp1 + bp2 + ap3, therefore π = (c, b, a)^t_B.

• Basis C: Clearly, π = aq1 + bq2 + cq3, therefore π = (a, b, c)^t_C.

• Basis D: This requires some calculations. Recall that we need numbers α, β, γ ∈ R such that

π = (α, β, γ)^t_D = αr1 + βr2 + γr3.
This leads to the following equation:

aX^2 + bX + c = α(X^2 + 2X) + β(5X + 2) + γ = αX^2 + (2α + 5β)X + 2β + γ.

Comparing coefficients we obtain

α = a,   2α + 5β = b,   2β + γ = c;   in matrix form:   \begin{pmatrix} 1 & 0 & 0 \\ 2 & 5 & 0 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} α \\ β \\ γ \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}.   (6.16)
Note that the columns of the matrix appearing in (6.16) are exactly the vector representations of r1, r2, r3 with respect to the basis C, and the column vector (a, b, c)^t is exactly the vector representation of π with respect to the basis C! The solution of the system is

α = a,   β = −\tfrac{2}{5}a + \tfrac{1}{5}b,   γ = \tfrac{4}{5}a − \tfrac{2}{5}b + c,

therefore

π = \begin{pmatrix} a \\ -\tfrac{2}{5}a + \tfrac{1}{5}b \\ \tfrac{4}{5}a - \tfrac{2}{5}b + c \end{pmatrix}_D.
We could have found the solution also by doing a detour through R3 as follows: We identify the vectors q1, q2, q3 with the canonical basis vectors ~e1, ~e2, ~e3 of R3. Then the vectors r1, r2, r3 and π correspond to

~r1' = (1, 2, 0)^t,   ~r2' = (0, 5, 2)^t,   ~r3' = (0, 0, 1)^t,   ~π' = (a, b, c)^t.

Let R = {~r1', ~r2', ~r3'}. In order to find the coordinates of ~π' with respect to the basis ~r1', ~r2', ~r3', we note that

~π' = A_{R→can} ~π'_R

where A_{R→can} is the transition matrix from the basis R to the canonical basis of R3 whose columns consist of the vectors ~r1', ~r2', ~r3'. So we see that this is exactly the same equation as the one in (6.16).
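As a quick numerical illustration of this detour, one can solve the system (6.16) for a concrete polynomial. This is only a sketch in NumPy; the polynomial chosen here is our own example and not part of the notes.

```python
import numpy as np

# Coordinates with respect to C = {X^2, X, 1}: the columns are r1, r2, r3.
A_R_to_can = np.array([[1.0, 0.0, 0.0],
                       [2.0, 5.0, 0.0],
                       [0.0, 2.0, 1.0]])

# Example polynomial pi(X) = 3X^2 + 4X + 1, i.e. (a, b, c) = (3, 4, 1) in C-coordinates.
pi_C = np.array([3.0, 4.0, 1.0])

# Solve A_{R->can} (alpha, beta, gamma)^t = pi_C for the D-coordinates of pi.
alpha, beta, gamma = np.linalg.solve(A_R_to_can, pi_C)
print(alpha, beta, gamma)   # 3.0 -0.4 1.8  ->  pi = 3*r1 - 0.4*r2 + 1.8*r3
```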
α + β = 2,   α + γ = 3,   α + 3β = 0;   in matrix form:   \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 3 & 0 \end{pmatrix}\begin{pmatrix} α \\ β \\ γ \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix}.   (6.17)

Therefore

\begin{pmatrix} α \\ β \\ γ \end{pmatrix} = A^{−1}\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 3 & 0 & -1 \\ -1 & 0 & 1 \\ -3 & 2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \\ 0 \end{pmatrix},

hence Z = 3R − S = (3, −1, 0)^t_B.
Now we give an alternative solution (which is essentially the same as the above) doing a detour through R3. Let C = {A1, A2, A3} where

A1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},   A2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},   A3 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.

This is clearly a basis of Msym(2 × 2). We identify it with the standard basis ~e1, ~e2, ~e3 of R3. Then the vectors R, S, T and Z in this basis look like

R' = (1, 1, 1)^t,   S' = (1, 0, 3)^t,   T' = (0, 1, 0)^t   and   Z' = (2, 3, 0)^t.
(i) In order to show that R, S, T are linearly independent, we only have to show that the vectors
R0 , S 0 and T 0 are linearly independent in R3 . To this end, we consider the matrix A whose
columns are these vectors. Note that this is the same matrix that appeared in (6.18). It is
easy to show that this matrix is invertible (we already calculated its inverse!). Therefore the
vectors R0 , S 0 , T 0 are linearly independent in R3 , hence R, S, T are linearly independent in
Msym (2 × 2).
(ii) Now in order to find the representation of Z in terms of the basis B, we only need to find the representation of Z' in terms of the basis B' = {R', S', T'}. This is done as follows:

Z'_{B'} = A_{can→B'} Z' = A^{−1} Z' = \frac{1}{2}\begin{pmatrix} 3 & 0 & -1 \\ -1 & 0 & 1 \\ -3 & 2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 0 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \\ 0 \end{pmatrix}.
You should now have understood
• the geometric meaning of a change of bases in Rn ,
• how an abstract finite dimensional vector space can be represented as Rn or Cn and that
the representation depends on the chosen basis of V ,
• how the vector representation changes if the chosen basis is reordered,
• etc.
You should now be able to
Exercises.
1. Let B = { \begin{pmatrix}1&1\\0&0\end{pmatrix}, \begin{pmatrix}0&0\\1&1\end{pmatrix}, \begin{pmatrix}0&0\\1&0\end{pmatrix}, \begin{pmatrix}1&0\\0&4\end{pmatrix} }. Show that B is a basis of M(2 × 2) and find [A]_B for A = \begin{pmatrix}a&b\\c&d\end{pmatrix}.
5. Let B1 = { (1, 0)^t, (0, −1)^t, (1, 1)^t } and B2 = { (1, 3)^t, (4, 5)^t, (−2, 4)^t }.
(a) Show that B = { \frac{1}{\sqrt{a^2+b^2}}(a, b)^t, \frac{1}{\sqrt{a^2+b^2}}(−b, a)^t } is a basis of R2.
(b) Show that there exists ϑ ∈ (−π, π] such that B = Bϑ. (Hint: geometric interpretation of B.)
T u1 = a_{11} v1 + a_{21} v2 + · · · + a_{m1} vm,
T u2 = a_{12} v1 + a_{22} v2 + · · · + a_{m2} vm,
  ⋮                                            (6.20)
T un = a_{1n} v1 + a_{2n} v2 + · · · + a_{mn} vm.
Let us define the matrix A_T and the vector ~λ by

A_T = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} ∈ M(m × n),   ~λ = \begin{pmatrix} λ_1 \\ λ_2 \\ \vdots \\ λ_n \end{pmatrix} ∈ Rn.
Note that the first column of AT is the vector representation of T u1 with respect to the basis
v1 , . . . , vm , the second column is the vector representation of T u2 , and so on.
Now let us come back to the calculation of T w and its connection with the matrix AT . From (6.19)
and (6.20) we obtain
T w = λ1 T u1 + λ2 T u2 + · · · + λn T un
The calculation shows that for every k the coefficient of vk is the kth component of the vector AT ~λ!
Now we can go one step further. Recall that the choice of the basis B of U and the basis C of V allows us to write w and T w as column vectors:

w = ~wB = \begin{pmatrix} λ_1 \\ λ_2 \\ \vdots \\ λ_n \end{pmatrix}_B,   T w = \begin{pmatrix} a_{11}λ_1 + a_{12}λ_2 + \cdots + a_{1n}λ_n \\ a_{21}λ_1 + a_{22}λ_2 + \cdots + a_{2n}λ_n \\ \vdots \\ a_{m1}λ_1 + a_{m2}λ_2 + \cdots + a_{mn}λ_n \end{pmatrix}_C,

that is, (T w)_C = A_T ~wB.
Very important remark. This identification of m×n-matrices with linear maps U → V depends
on the choice of the basis! See Example 6.48.
Theorem 6.46. Let U, V be finite dimensional vector spaces and let B = {u1 , . . . , un } be an ordered
basis of U and let C = {v1 , . . . , vm } be an ordered basis of V . Then the following is true:
(i) Every linear map T : U → V can be represented as a matrix AT ∈ M (m × n) such that
(T w)_C = A_T ~wB
where (T w)C is the representation of T w ∈ V with respect to the basis C and w ~ B is the
representation of w ∈ U with respect to the basis B. The entries aij of AT can be calculated
as in (6.20).
(ii) Every matrix A = (a_{ij})_{i=1,...,m; j=1,...,n} ∈ M(m × n) induces a linear transformation T_A : U → V defined by

(T_A w)_C = A ~wB   for all w ∈ U.
(iii) T = T_{A_T} and A = A_{T_A}. That means: If we start with a linear map T : U → V, calculate its
matrix representation AT and then the linear map TAT : U → V induced by AT , then we get
back our original map T . If on the other hand we start with a matrix A ∈ M (m×n), calculate
the linear map TA : U → V induced by A and then calculate its matrix representation ATA ,
then we get back our original matrix A.
Proof. We already showed (i) and (ii) in the text before the theorem. To see (iii), let us start with a
linear transformation T : U → V and let AT = (aij ) be the matrix representation of T with respect
to the bases B and C. For TAT , the linear map induced by AT , it follows that
T_{A_T} u_j = a_{1j} v_1 + · · · + a_{mj} v_m = T u_j,   j = 1, . . . , n.
Since this is true for all basis vectors and both T and TAT are linear, they must be equal.
If on the other hand we are given a matrix A = (aij )i=1,...,m ∈ M (m × n) then we have that the
j=1,...,n
linear transformation TA induced by A acts on the basis vectors u1 , . . . , un as follows:
But then, by definition of the matrix representation ATA of TA , it follows that ATA = A.
Let us see this “identification” of matrices with linear transformations a bit more formally. By choosing a basis B = {u1, . . . , un} in U and thereby identifying U with Rn, we are in reality defining a linear bijection

Ψ : U → Rn,   Ψ(λ1 u1 + · · · + λn un) = (λ1, . . . , λn)^t.

Recall that we denoted the vector on the right hand side by ~uB. The same happens if we choose a basis C = {v1, . . . , vm} of V. We obtain a linear bijection

Φ : V → Rm,   Φ(µ1 v1 + · · · + µm vm) = (µ1, . . . , µm)^t.
With these linear maps we find that T = Φ^{−1} ◦ A_T ◦ Ψ and A_T = Φ ◦ T ◦ Ψ^{−1}. This can be visualised in a so-called commutative diagram. That means that it does not matter which path we take to go from
one corner of the diagram to another one as long as we move in the directions of the arrows. Note
that in this case we are even allowed to go in the opposite directions of the arrows representing Ψ
and Φ because they are bijections.
What is the use of a matrix representation of a linear map? Sometimes calculations are easier in
the world of matrices. For example, we know how to calculate the range and the kernel of a matrix.
Therefore, using Theorem 6.46:
• If we want to calculate Im T , we only need to calculate Im AT and then use Φ to “translate
back” to the range of T . In formula: Im T = Im(Φ−1 AT Ψ) = Im(Φ−1 AT ) = Φ−1 (Im AT ).
• If we want to calculate ker T , we only need to calculate ker AT and then use Ψ to “translate
back” to the kernel of T . In formula: ker T = ker(Φ−1 AT Ψ) = ker(AT Ψ) = Ψ−1 (ker AT ).
Let us summarise. From Theorem 6.24 we obtain again the following very important theorem, see
Theorem 6.20 and Proposition 6.16.
Theorem 6.47. Let U, V be vector spaces and let T : U → V be a linear transformation. Then
Example 6.48. We consider the operator of differentiation

T : P3 → P3,   T p = p′.

Note that in this case the vector spaces U and V are both equal to P3.
(i) Represent T with respect to the basis B = {p1 , p2 , p3 , p4 } and find its kernel where p1 =
1, p2 = X, p3 = X 2 , p4 = X 3 .
Solution. We only need to evaluate T in the elements of the basis and then write the re-
sult again as linear combination of the basis. Since in this case, the bases are “easy”, the
calculations are fairly simple:
T p1 = 0, T p2 = 1 = p1 , T p3 = 2X = 2p2 , T p4 = 3X 2 = 3p3 .
(ii) Represent T with respect to the basis C = {q1 , q2 , q3 , q4 } and find its kernel where q1 =
X 3 , q2 = X 2 , q3 = X, q4 = 1.
Solution. Again we only need to evaluate T in the elements of the basis and then write the
result as linear combination of the basis.
T q1 = 3X^2 = 3q2,   T q2 = 2X = 2q3,   T q3 = 1 = q4,   T q4 = 0.
(iii) Represent T with respect to the basis B in the domain of T (in the “left” P3 ) and the basis
C in the target space (in the “right” P3 ).
Solution. We calculate
T p1 = 0, T p2 = 1 = q4 , T p3 = 2X = 2q3 , T p4 = 3X 2 = 3q2 .
(iv) Represent T with respect to the basis D = {r1 , r2 , r3 , r4 } and find its kernel where
r1 = X 3 + X, r2 = 2X 3 + X 2 + 2X, r3 = 3X 3 + X 2 + 4X + 1, r4 = 4X 3 + X 2 + 4X + 1.
Solution 1. Again we only need to evaluate T in the elements of the basis and then write the result as linear combination of the basis. This time the calculations are a bit more tedious.

T r1 = 3X^2 + 1 = −8r1 + 2r2 + r4,
T r2 = 6X^2 + 2X + 2 = −14r1 + 4r2 + 2r3,
T r3 = 9X^2 + 2X + 4 = −24r1 + 5r2 + 2r3 + 2r4,
T r4 = 12X^2 + 2X + 4 = −30r1 + 8r2 + 2r3 + 2r4.
In order to calculate the kernel of A_T, we apply the Gauß-Jordan process and obtain

A^D_T = \begin{pmatrix} -8 & -14 & -24 & -30 \\ 2 & 4 & 5 & 8 \\ 0 & 2 & 2 & 2 \\ 1 & 0 & 2 & 2 \end{pmatrix} \longrightarrow \cdots \longrightarrow \begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
Solution 2. We already have the matrix representation A^C_T and we can use it to calculate A^D_T. To this end define the vectors

~ρ1 = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},   ~ρ2 = \begin{pmatrix} 2 \\ 1 \\ 2 \\ 0 \end{pmatrix},   ~ρ3 = \begin{pmatrix} 3 \\ 1 \\ 4 \\ 1 \end{pmatrix},   ~ρ4 = \begin{pmatrix} 4 \\ 1 \\ 4 \\ 1 \end{pmatrix}.

Note that these vectors are the representations of our basis vectors r1, . . . , r4 in the basis C. The change-of-basis matrix from D to C and its inverse, the change-of-basis matrix from C to D, are

S_{D→C} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 \\ 1 & 2 & 4 & 4 \\ 0 & 0 & 1 & 1 \end{pmatrix},   S_{C→D} = S_{D→C}^{−1} = \begin{pmatrix} 0 & -2 & 1 & -2 \\ 0 & 1 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 1 \end{pmatrix}.

It follows that

A^D_T = S_{C→D} A^C_T S_{D→C}
= \begin{pmatrix} 0 & -2 & 1 & -2 \\ 0 & 1 & 0 & -1 \\ -1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 1 & 1 & 1 \\ 1 & 2 & 4 & 4 \\ 0 & 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} -8 & -14 & -24 & -30 \\ 2 & 4 & 5 & 8 \\ 0 & 2 & 2 & 2 \\ 1 & 0 & 2 & 2 \end{pmatrix}.
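A quick numerical verification of this change of basis, written as a NumPy sketch (the variable names are ours):

```python
import numpy as np

# Matrix of T = d/dX on P3 with respect to C = {X^3, X^2, X, 1}.
A_C = np.array([[0, 0, 0, 0],
                [3, 0, 0, 0],
                [0, 2, 0, 0],
                [0, 0, 1, 0]], dtype=float)

# Columns: the C-coordinates of r1, ..., r4.
S_D_to_C = np.array([[1, 2, 3, 4],
                     [0, 1, 1, 1],
                     [1, 2, 4, 4],
                     [0, 0, 1, 1]], dtype=float)

S_C_to_D = np.linalg.inv(S_D_to_C)
A_D = S_C_to_D @ A_C @ S_D_to_C
print(np.round(A_D).astype(int))
# [[ -8 -14 -24 -30]
#  [  2   4   5   8]
#  [  0   2   2   2]
#  [  1   0   2   2]]
```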
Let us see how this looks in diagrams. We define the two bijections of P3 with R4 which are given by choosing the bases C and D by ΨC and ΨD:

[Diagram: T : P3 → P3 with ΨC on both vertical arrows and A^C_T : R4 → R4 along the bottom; and the same square with ΨD and A^D_T : R4 → R4.]
We already know everything in the diagram on the left and we want to calculate AD
T in the
diagram on the right. We can put the diagrams together as follows:
[Diagram: T : P3 → P3 on top; ΨD, ΨC, ΨC, ΨD map down to four copies of R4, which are connected along the bottom by S_{D→C}, A^C_T and S_{C→D}; the total composition along the bottom is A^D_T.]
We can also see that the change-of-basis maps S_{D→C} and S_{C→D} are

S_{D→C} = ΨC ◦ ΨD^{−1},   S_{C→D} = ΨD ◦ ΨC^{−1}.

For A^D_T we obtain

A^D_T = ΨD ◦ T ◦ ΨD^{−1} = S_{C→D} ◦ A^C_T ◦ S_{D→C}.
Note that the matrices A^B_T, A^C_T, A^{B,C}_T and A^D_T all look different but they describe the same linear transformation. The reason why they look different is that in each case we used different bases to describe them.
Example 6.49. The next example is not very applied but it serves to practice a bit more. We consider the operator given by

T : M(2 × 2) → P2,   T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (a + c)X^2 + (a − b)X + a − b + d.
Show that T is a linear transformation and represent T with respect to the bases B = {B1, B2, B3, B4} of M(2 × 2) and C = {p1, p2, p3} of P2 where

B1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},   B2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   B3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},   B4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},

and

p1 = 1,   p2 = X,   p3 = X^2.
Find bases for ker T and Im T and their dimensions.
Solution. First we verify that T is indeed a linear map. To this end, we take matrices A1 = \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} and A2 = \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix} and λ ∈ R. Then

T(λA1 + A2) = T\left( λ\begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} + \begin{pmatrix} a_2 & b_2 \\ c_2 & d_2 \end{pmatrix} \right) = T\begin{pmatrix} λa_1 + a_2 & λb_1 + b_2 \\ λc_1 + c_2 & λd_1 + d_2 \end{pmatrix}
= (λa_1 + a_2 + λc_1 + c_2)X^2 + (λa_1 + a_2 − λb_1 − b_2)X + λa_1 + a_2 − (λb_1 + b_2) + λd_1 + d_2
= λ[(a_1 + c_1)X^2 + (a_1 − b_1)X + a_1 − b_1 + d_1] + [(a_2 + c_2)X^2 + (a_2 − b_2)X + a_2 − b_2 + d_2]
= λT(A1) + T(A2).
Next we evaluate T on the basis vectors of B and express the results in the basis C:

T B1 = X^2 + X + 1 = p1 + p2 + p3,
T B2 = −X = −p2,
T B3 = X^2 = p3,
T B4 = 1 = p1.
In order to determine the kernel and range of A_T, we apply the Gauß-Jordan process:

A_T = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & -1 & 0 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & -1 & 0 & -1 \\ 0 & 0 & 1 & -1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{pmatrix}.
So the range of A_T is R3 and its kernel is ker A_T = span{~e1 + ~e2 − ~e3 − ~e4}. Therefore Im T = P2 and ker T = span{B1 + B2 − B3 − B4} = span\left\{ \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \right\}. For their dimensions we find dim(Im T) = 3 and dim(ker T) = 1.
Solution 1 (use coordinates adapted to the problem). Clearly, there are two directions which are special in this problem: the direction parallel and the direction orthogonal to the line. So a natural basis is B = {~v, ~w} with ~v = (2, 3)^t parallel to L and ~w = (3, −2)^t perpendicular to L. For these vectors R~v = ~v and R~w = −~w, hence

A^B_R = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

and therefore

A_R = S_{B→can} A^B_R S_{can→B} = \begin{pmatrix} 2 & 3 \\ 3 & -2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\frac{1}{13}\begin{pmatrix} 2 & 3 \\ 3 & -2 \end{pmatrix} = \frac{1}{13}\begin{pmatrix} -5 & 12 \\ 12 & 5 \end{pmatrix}.

Figure 6.5: The picture shows the reflection R on the line L. The vector ~v is parallel to L, hence R~v = ~v. The vector ~w is perpendicular to L, hence R~w = −~w.
Solution 2 (reduce the problem to a known reflection). The problem would be easy if we were asked to calculate the matrix representation of the reflection on the x-axis. This would simply be

A_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

Now we can proceed as follows: First we rotate R2 about the origin such that the line L is parallel to the x-axis, then we reflect on the x-axis and then we rotate back. The result is the same as reflecting on L. Assume that Rot is the rotation matrix. Then A_R = Rot^{−1} A_0 Rot. How can we calculate Rot? We know that Rot~v = ~e1 and that Rot~w = ~e2. It follows that Rot^{−1} = (~v | ~w) = \begin{pmatrix} 2 & 3 \\ 3 & -2 \end{pmatrix}. Note that up to a numerical factor, this is S_{B→can}. We can calculate
Figure 6.6: The figure shows the plane E : x − 2y + 3z = 0 and for the vector ~x it shows its orthogonal projection P~x onto E and its reflection R~x about E, see Example 6.51.
Alternatively, we can form a system of linear equations in order to find A_R. We write A_R = \begin{pmatrix} a & b \\ c & d \end{pmatrix} with unknown numbers a, b, c, d. Again, we use that we know that A_R ~v = ~v and A_R ~w = −~w. This gives the following equations:
\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2a + 3b \\ 2c + 3d \end{pmatrix} = \begin{pmatrix} 2 \\ 3 \end{pmatrix},   \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 3 \\ -2 \end{pmatrix} = \begin{pmatrix} 3a - 2b \\ 3c - 2d \end{pmatrix} = \begin{pmatrix} -3 \\ 2 \end{pmatrix},

that is,

2a + 3b = 2,   2c + 3d = 3,   3a − 2b = −3,   3c − 2d = 2.
Its unique solution is a = −\tfrac{5}{13}, b = c = \tfrac{12}{13}, d = \tfrac{5}{13}, hence A_R = \frac{1}{13}\begin{pmatrix} -5 & 12 \\ 12 & 5 \end{pmatrix}.
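The reflection matrix can also be checked numerically. This is a NumPy sketch using the adapted basis from Solution 1; it is an illustration, not part of the notes.

```python
import numpy as np

# Reflection on the line L spanned by v = (2, 3): A_R = S A0 S^{-1}
# with S = (v | w), w = (3, -2) orthogonal to v, and A0 the reflection on the x-axis.
S = np.array([[2.0,  3.0],
              [3.0, -2.0]])
A0 = np.diag([1.0, -1.0])
A_R = S @ A0 @ np.linalg.inv(S)
print(A_R * 13)                  # [[-5. 12.] [12.  5.]]

v = np.array([2.0, 3.0]); w = np.array([3.0, -2.0])
print(A_R @ v, A_R @ w)          # v is fixed, w is mapped to -w
```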
Example 6.51. Let E be the plane E : x − 2y + 3z = 0, let R : R3 → R3 be the reflection about E and let P : R3 → R3 be the orthogonal projection onto E. Find the matrix representations of R and P with respect to the standard basis of R3.
Observation. Note that E is the plane which passes through the origin and is orthogonal to the vector ~n = (1, −2, 3)^t. Moreover, if we set ~v = (2, 1, 0)^t and ~w = (0, 3, 2)^t, then it is easy to see that {~v, ~w} is a basis of E.
Solution 1 (use coordinates adapted to the problem). Clearly, a basis which is adapted to the exercise is B = {~v, ~w, ~n} because for these vectors we have R~v = ~v, R~w = ~w, R~n = −~n, and P~v = ~v, P~w = ~w, P~n = ~0. Therefore the matrix representation of R with respect to the basis B is

A^B_R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}
and hence

A_R = S_{B→can} A^B_R S_{can→B} = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 3 & -2 \\ 0 & 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\frac{1}{28}\begin{pmatrix} 13 & 2 & -3 \\ -3 & 6 & 5 \\ 2 & -4 & 6 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} 6 & 2 & -3 \\ 2 & 3 & 6 \\ -3 & 6 & -2 \end{pmatrix}

and

A_P = S_{B→can} A^B_P S_{can→B} = \begin{pmatrix} 2 & 0 & 1 \\ 1 & 3 & -2 \\ 0 & 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\frac{1}{28}\begin{pmatrix} 13 & 2 & -3 \\ -3 & 6 & 5 \\ 2 & -4 & 6 \end{pmatrix} = \frac{1}{14}\begin{pmatrix} 13 & 2 & -3 \\ 2 & 10 & 6 \\ -3 & 6 & 5 \end{pmatrix}.
Solution 2 (reduce the problem to a known reflection). The problem would be easy if we were asked to calculate the matrix representation of the reflection on the xy-plane. This would simply be A_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}. Now we can proceed as follows: First we rotate R3 about the origin such that the plane E is parallel to the xy-plane, then we reflect on the xy-plane and then we rotate back. The result is the same as reflecting on the plane E. We leave the details to the reader. An alternative approach is described in the following remark.
Remark 6.52. Yet another solution is the following. Let Q be the orthogonal projection onto ~n. We already know how to calculate its representing matrix:

Q~x = \frac{h~x , ~ni}{k~nk^2} ~n = \frac{x - 2y + 3z}{14} ~n = \frac{1}{14}\begin{pmatrix} 1 & -2 & 3 \\ -2 & 4 & -6 \\ 3 & -6 & 9 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}.

Hence A_Q = \frac{1}{14}\begin{pmatrix} 1 & -2 & 3 \\ -2 & 4 & -6 \\ 3 & -6 & 9 \end{pmatrix}. Geometrically, it is clear that P = id − Q and R = id − 2Q. Hence it follows that

A_P = id − A_Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} − \frac{1}{14}\begin{pmatrix} 1 & -2 & 3 \\ -2 & 4 & -6 \\ 3 & -6 & 9 \end{pmatrix} = \frac{1}{14}\begin{pmatrix} 13 & 2 & -3 \\ 2 & 10 & 6 \\ -3 & 6 & 5 \end{pmatrix}

and

A_R = id − 2A_Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} − \frac{1}{7}\begin{pmatrix} 1 & -2 & 3 \\ -2 & 4 & -6 \\ 3 & -6 & 9 \end{pmatrix} = \frac{1}{7}\begin{pmatrix} 6 & 2 & -3 \\ 2 & 3 & 6 \\ -3 & 6 & -2 \end{pmatrix}.
A transition matrix can also be interpreted as the matrix representation of the identity map id : Rn → Rn with respect to different bases. To see this let B = {~v1, . . . , ~vn} and C = {~w1, . . . , ~wn} be bases of Rn. We define the linear bijections ΨB and ΨC as follows: ΨB sends a vector to its coordinates with respect to B and ΨC sends a vector to its coordinates with respect to C, that is, ΨB ~x = ~xB = S_{can→B} ~x and ΨC ~x = ~xC = S_{can→C} ~x. Note that these matrices are exactly the matrix representations of ΨB and ΨC. Now let us consider the diagram

[Diagram: id : Rn → Rn on top, ΨB^{−1} and ΨC^{−1} as the vertical arrows, and A_id : Rn → Rn along the bottom.]

Therefore

A_id = ΨC ◦ id ◦ ΨB^{−1} = ΨC ◦ ΨB^{−1} = S_{C→can}^{−1} ◦ S_{B→can} = S_{can→C} ◦ S_{B→can} = S_{B→C}.

You should now have understood
• why every linear map between finite dimensional vector spaces can be written as a matrix
and why the matrix depends on the chosen bases,
• how the matrix representation changes if the chosen bases changes,
• in particular, how the matrix representation changes if the chosen bases are reordered,
• etc.
Exercises.
1. For Exercises 1 to 14 (except 11) of Section 6.1, find the matrix representation of T with respect to the corresponding canonical bases.
2. Find the matrix representation with respect to the corresponding canonical basis of the following transformations. In Pn take the basis {1, X, X^2, . . . , X^n}.
(a) In P4, D(p) = Xp′ − p.
(b) In P4, D(p) = p″.
(c) In Pn, D(p) = p^{(m)}, the m-th derivative of p.
(d) In M(3 × 3), T(A) = A − A^t.
(e) In Pn, J : Pn → R given by J(p) = ∫_0^1 p(t) dt.
3. For each given linear transformation, find its matrix representation with respect to the indicated bases:
(a) T : R2 → R2, T(x, y) = (y, 0), from the canonical basis to B2 = { (1, 1)^t, (−1, 1)^t }.
(b) T : R3 → R2, T(x, y, z)^t = (x + y + z, x − y)^t, from B1 = { (1, 0, −1)^t, (1, 1, 1)^t, (1, 0, 0)^t } to B2 = { (0, 1)^t, (1, 0)^t }.
(c) T : R3 → R3, T(~x) = ~x × (1, 2, 3)^t, in the basis B = { (2, 4, 6)^t, (2, −1, 0)^t, (0, −3, 2)^t }.
(d) T : P2 → R2, T(aX^2 + bX + c) = (2a + b + c, b − 3c)^t, from B1 = {X^2 + 1, X^2 + X, X^2 + X + 1} to B2 = { (2, 3)^t, (1, −1)^t }.
4. Let V = span{cos x, sin x, x cos x, x sin x} and B = {cos x, sin x, x cos x, x sin x}.
(a) Show that B is a basis of V.
(b) For D : V → V given by D(f) = f′, find [D]_B. Is D invertible?
5. Let V = span{1, e^x, e^{−x}}, B1 = {1, e^x, e^{−x}} and B2 = {1, cosh x, sinh x}. Consider D : V → V given by D(f) = f′; find the matrix representation of D from the basis B1 to the basis B2. Is D invertible?
6. Let ~w be a nonzero vector and T : R3 → R3 given by T(~x) = proj_{~w} ~x. Find a basis B of R3 such that [T]_B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
8. (a) Let S : R4 → R be a linear transformation such that S~e1 = 4, S~e2 = −3, S~e3 = 0 and S~e4 = π. Show that there exists ~w ∈ R4 such that S~x = h~x , ~wi for all ~x ∈ R4.
(b) Let S : R4 → R be a linear transformation whose values S~v1 = 1, S~v2 = −2, S~v3 = 3 and S~v4 = −1 are prescribed on a basis {~v1, ~v2, ~v3, ~v4} of R4. Find ~w ∈ R4 such that S~x = h~x , ~wi for all ~x ∈ R4.
10. Let T : Rn → Rn be a linear transformation such that T(~ei) = ~e_{i+1} for 1 ≤ i < n and T(~en) = ~0. Show that T^n = O. What is the matrix representation of T, T^2, . . . , T^{n−1} with respect to the canonical basis?
6.5 Summary
Linear maps
A function T : U → V between two K-vector spaces U and V is called a linear map (or linear function or linear transformation) if it satisfies

T(u + λv) = T(u) + λT(v)   for all u, v ∈ U and all λ ∈ K.
• If U, V are K-vector spaces then L(U, V ) is a K-vector space. This means: If S, T ∈ L(U, V )
and λ ∈ K, then S + λT ∈ L(U, V ).
ker T = {u ∈ U : T u = O} ⊆ U,
Im T = {T u : u ∈ U } ⊆ V.
• The following statements are equivalent:
(i) T is injective.
(ii) T u = O implies that u = O.
(iii) ker T = {O}.
• The following statements are equivalent:
(i) T is surjective.
(ii) Im T = V.
• If T is bijective, then necessarily dim U = dim V . In other words: if dim U 6= dim V , then
there exists no bijection between them.
• If dim U = n < ∞, then

dim(ker(T)) + dim(Im(T)) = n.

• Every matrix A ∈ M(m × n) induces a linear map

T_A : K^n → K^m,   ~x 7→ A~x.
• If U and V are finite dimensional with bases {u1, . . . , un} and {v1, . . . , vm} and we define

Ψ : U → R^n,   Ψ(α1 u1 + · · · + αn un) = (α1, . . . , αn)^t,   Φ : V → R^m,   Φ(β1 v1 + · · · + βm vm) = (β1, . . . , βm)^t,

then these functions are linear and Φ^{−1} ◦ A_T ◦ Ψ = T and Φ ◦ T ◦ Ψ^{−1} = A_T. In a diagram this is

[Diagram: T : U → V on top, Ψ : U → R^n and Φ : V → R^m as vertical arrows, and A_T : R^n → R^m along the bottom.]
Matrices
Let A ∈ M (m × n).
• The column space CA of A is the linear span of its column vectors. It is equal to Im A.
• The row space RA of A is the linear span of its row vectors. It is equal to the orthogonal
complement of ker A.
• dim RA = dim CA = dim(Im A) = number of columns with pivots in any echelon form of A.
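These dimension relations are easy to check on an example. The following is only a small NumPy sketch with a matrix of our own choosing:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],    # a multiple of the first row
              [1.0, 0.0, 1.0]])

r = np.linalg.matrix_rank(A)
print(r)                  # 2 = dim R_A = dim C_A = dim(Im A)
print(A.shape[1] - r)     # 1 = dim(ker A)
```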
6.6 Exercises
1. Determine whether the following functions are linear. If they are, compute the kernel and the dimension of the kernel.
(a) A : R3 → M(2 × 2), A(x, y, z)^t = \begin{pmatrix} 2x + y & x − z \\ x + y − 3z & z \end{pmatrix},
(b) B : R3 → M(2 × 2), B(x, y, z)^t = \begin{pmatrix} 2xy & x − z \\ x + y − 3z & z \end{pmatrix},
(c) D : P3 → P4, Dp = p′ + xp,
(d) T : P3 → M(2 × 3), T(ax^3 + bx^2 + cx + d) = \begin{pmatrix} a+b & b+c & c+d \\ 0 & a+d & 0 \end{pmatrix},
(e) T : P3 → M(2 × 3), T(ax^3 + bx^2 + cx + d) = \begin{pmatrix} a+b & b+c & c+d \\ 0 & a+d & 3 \end{pmatrix}.
6. Let

A = \begin{pmatrix} 1 & 3 \\ 2 & 6 \end{pmatrix},   E = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},   F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.

(d) Compute ker(A), ker(FA), ker(AE) and their dimensions. Sketch them and describe the relation between them.
12. (a) Find a linear transformation from M(5 × 5) to M(3 × 3) different from the zero transformation.
(b) Find at least two different bijective linear maps from M(2 × 2) to P3.
(c) Does there exist a bijective linear map S : M(2 × 2) → Pk for k ∈ N, k ≠ 3?
14. (a) Let ~v1 = (1, 4, 7)^t, ~v2 = (0, 1, 2)^t, ~v3 = (1, 0, 2)^t and let B = {~v1, ~v2, ~v3}. Show that B is a basis of R3 and write the vectors ~x = (1, 2, 3)^t, ~y = (0, 1, 1)^t in terms of the basis B.
15. Let R = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}, S = \begin{pmatrix} 3 & 2 \\ 0 & 7 \end{pmatrix}, T = \begin{pmatrix} 3 & 2 \\ 0 & 1 \end{pmatrix}. Show that B = {R, S, T} is a basis of the space of upper triangular matrices and express the matrices

K = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},   L = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},   M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

in terms of the basis B.
16. Let ~a1 = (1, 2)^t, ~a2 = (3, 1)^t, ~b1 = (−1, 1)^t, ~b2 = (3, 2)^t ∈ R2 and let A = {~a1, ~a2}, B = {~b1, ~b2}.

Φ : M(2 × 2) → M(2 × 2),   Φ(A) = A^t

20. Let T : P3 → P4 be given by T(p) = ∫_0^x p(t) dt and D : P4 → P3 be given by D(p) = p′.
(a) Show that T and D are linear transformations and for each of them find its kernel, its image and the dimensions of the kernel and the image.
(b) Is it true that T(D(p)) = p for all p ∈ P4? If the answer is negative, in which cases does it hold?
(c) Repeat the previous item for D(T(p)) where p ∈ P3.
21. Let ~w ∈ Rn be a nonzero vector. Show that there exists T : Rn → R such that T~w ≠ 0. Compute dim(ker T) and dim(Im T).
22. In Rn, let ϕ : Rn → R be a linear transformation different from the trivial one.
(c) Let ~v1, ~v2, . . . , ~vn be nonzero vectors in R2. Show that there exists ϕ : R2 → R such that ϕ does not vanish on any of them.

E1 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},   E2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},   E3 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},   E4 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
In this chapter we will work in Rn and not in arbitrary vector spaces since we want to explore in
more detail its geometric properties. In particular we will discuss orthogonality. Note that in an
arbitrary vector space, we do not have the concept of angles or orthogonality. Everything that we
will discuss here can be extended to inner product spaces where the inner product is used to define
angles. Recall that we showed in Theorem 2.19 that for non-zero vectors ~x, ~y ∈ Rn the angle ϕ
between them satisfies the equation
cos ϕ = \frac{h~x , ~y i}{k~xk \, k~y k}.
In a general inner product space (V, h· , ·i) this equation is used to define the angle between two
vectors. In particular, two vectors are said to be orthogonal if their inner product is 0. Inner
product spaces are useful for instance in physics, and maybe in some not so distant future there will be a chapter in these lecture notes about them.
First we will define what the orthogonal complement of a subspace of Rn is and we will see that
the direct sum of a subspace and its orthogonal complement gives us all of Rn .
We already know what the orthogonal projection of a vector ~x onto another vector ~y 6= ~0 is (see
Section 2.3). Since it is independent of the norm of ~y , we can just as well consider it the orthogonal
projection of ~x onto the line generated by ~y . In this chapter we will generalise the concept of an
orthogonal projection onto a line to the orthogonal projection onto an arbitrary subspace.
As an application, we will discuss the least squares method for the approximation of data.
7.1 Orthonormal systems and orthogonal bases
Definition 7.1. (i) A set of vectors ~x1, . . . , ~xk ∈ Rn is called an orthogonal set if they are pairwise orthogonal; in formulas we can write this as

h~xj , ~x`i = 0 for j ≠ `.
(ii) A set of vectors ~x1 , . . . , ~xk ∈ Rn is called an orthonormal set if they are pairwise orthonormal;
in formulas we can write this as
h~xj , ~x`i = \begin{cases} 1 & \text{for } j = `, \\ 0 & \text{for } j ≠ `. \end{cases}
The difference between an orthogonal and an orthonormal set is that in the latter we additionally require that each vector of the set satisfies h~xj , ~xj i = 1, that is, that k~xj k = 1. Therefore an orthogonal set may contain vectors of arbitrary lengths, including the vector ~0, whereas in an orthonormal set all vectors must have length 1. Note that every orthonormal system is also an orthogonal system. On the other hand, every orthogonal system which does not contain ~0 can be converted to an orthonormal one by normalising each vector (that is, by dividing each vector by its norm).
Examples 7.2. (i) The following systems are orthogonal systems but not orthonormal systems since the norm of at least one of their vectors is different from 1:

{ (1, −1)^t, (3, 3)^t, (0, 0)^t },   { (1, −1)^t, (3, 3)^t },   { (1, 0, 0)^t, (0, 1, 2)^t, (0, −2, 1)^t }.

(ii) The following systems are orthonormal systems:

{ \tfrac{1}{\sqrt2}(1, −1)^t, \tfrac{1}{\sqrt2}(1, 1)^t },   { (1, 0, 0)^t, \tfrac{1}{\sqrt5}(0, 1, 2)^t, \tfrac{1}{\sqrt5}(0, −2, 1)^t }.
We have to show that all αj must be zero. To do this, we take the inner product on both sides
with the vectors ~xj . Let us start with ~x1 . We find
Since h~0 , ~x1i = 0, h~x1 , ~x1i = k~x1k^2 = 1 and h~x2 , ~x1i = · · · = h~xn , ~x1i = 0, it follows that
Now we can repeat this process with ~x2 , ~x3 , . . . , ~xn to show that α2 = · · · = αn = 0.
Remark. The lemma shows that every orthonormal system of n vectors in Rn is a basis of Rn.
Definition 7.4. An orthonormal basis of Rn is a basis whose vectors form an orthonormal set.
Occasionally we will write ONB for “orthonormal basis”.
Examples 7.5. For instance, the following two systems are orthonormal bases of R3:

{ \tfrac{1}{\sqrt3}(1, −1, 1)^t, \tfrac{1}{\sqrt2}(1, 1, 0)^t, \tfrac{1}{\sqrt6}(−1, 1, 2)^t },   { \tfrac{1}{\sqrt{14}}(1, 2, 3)^t, \tfrac{1}{\sqrt{10}}(−3, 0, 1)^t, \tfrac{1}{\sqrt{35}}(1, −5, 3)^t }.
Exercise 7.6. Show that every orthonormal basis of R2 is of the form { (cos ϕ, sin ϕ)^t, (−sin ϕ, cos ϕ)^t } or { (cos ϕ, sin ϕ)^t, (sin ϕ, −cos ϕ)^t } for some ϕ ∈ R. See also Exercise 7.13.
We will see in Corollary 7.27 that every orthonormal system in Rn can be completed to an or-
thonormal basis. In Section 7.5 we will show how to construct an orthonormal basis of a subspace
of Rn from a given basis. In particular it follows that every subspace of Rn has an orthonormal
basis.
Orthonormal bases are very useful. Among other things it is very easy to write a given vector
~ ∈ Rn as a linear combination of such a basis. Recall that if we are given an arbitrary basis
w
~z1 , . . . , ~zn of Rn and we want to write a vector ~x as linear combination of this basis, then we have
D
to find coefficients α1 , . . . , αn such that ~x = α1 ~z1 +· · ·+αn ~zn , which means we have to solve a n×n
system in order to determine the coefficients. If however the given basis is an orthonormal basis,
then calculating the coefficients reduces to evaluating n inner products as the following theorem
shows.
Theorem 7.7 (Representation of a vector with respect to an ONB). Let ~x1 , . . . , ~xn be an
orthonormal basis of Rn and let w
~ ∈ Rn . Then
~ = hw
w ~ , ~x1 i~x1 + hw
~ , ~x2 i~x2 + · · · + hw
~ , ~xn i~xn .
Now let us take the inner product on both sides with ~xj for j = 1, . . . , n. Note that h~xk , ~xj i = 0
if k 6= j and that h~xj , ~xj i = k~xj k2 = 1.
hw
~ , ~xj i = hα1 ~x1 + α2 ~x2 + · · · + αn ~xn , ~xj i
= α1 h~x1 , ~xj i + α2 h~x2 , ~xj i + · · · + αn h~xn , ~xj i
= αj h~xj , ~xj i = αj .
Note that the proof of this theorem is essentially the same as that of Lemma 7.3. In fact, Lemma 7.3
follows from the theorem above if we choose w ~ = ~0.
Exercise 7.8. If ~x1, . . . , ~xn is an orthogonal, but not necessarily orthonormal, basis of Rn, then we have for every ~w ∈ Rn that

~w = \frac{h~w , ~x1i}{k~x1k^2} ~x1 + \frac{h~w , ~x2i}{k~x2k^2} ~x2 + · · · + \frac{h~w , ~xni}{k~xnk^2} ~xn.

(You can either use a modified version of the proof of Theorem 7.7 or you define ~yj = k~xj k^{−1} ~xj, show that ~y1, . . . , ~yn is an orthonormal basis and apply the formula from Theorem 7.7.)
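Theorem 7.7 is very easy to check numerically: the coefficients are just inner products, so no linear system has to be solved. The following is only a NumPy sketch with an orthonormal basis of R3 of our own choosing.

```python
import numpy as np

# An orthonormal basis of R^3.
x1 = np.array([1.0, -1.0, 1.0]) / np.sqrt(3)
x2 = np.array([1.0,  1.0, 0.0]) / np.sqrt(2)
x3 = np.array([-1.0, 1.0, 2.0]) / np.sqrt(6)

w = np.array([2.0, -1.0, 5.0])
# Theorem 7.7: w = <w,x1> x1 + <w,x2> x2 + <w,x3> x3.
w_rebuilt = (w @ x1) * x1 + (w @ x2) * x2 + (w @ x3) * x3
print(np.allclose(w, w_rebuilt))    # True
```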
Exercises.
1. In each of the following cases, verify whether the given set is an orthonormal basis of the vector space V to which it refers.
(a) V = R2, { \tfrac{1}{\sqrt5}(2, 1)^t, \tfrac{1}{\sqrt5}(2, −1)^t }.
(b) V = R2, { \tfrac{1}{\sqrt2}(1, −1)^t, \tfrac{1}{\sqrt2}(1, 1)^t }.
(c) In R2 consider V = the line 3x − 2y = 0, with the set { \tfrac{1}{\sqrt{13}}(2, 3)^t }.
(d) In R3 consider V = span{ (1, 2, −1)^t, (−1, 2, 3)^t }, with the set { \tfrac{1}{\sqrt{17}}(1, 4, 0)^t, \tfrac{1}{\sqrt{357}}(8, −2, −17)^t }.
(e) In R3 consider V = span{ (1, 2, 3)^t, (−1, 0, 4)^t, (1, 6, 17)^t }, with the set { \tfrac{1}{\sqrt2}(1, 1, 0)^t, \tfrac{1}{\sqrt6}(1, −1, 2)^t, \tfrac{1}{\sqrt3}(−1, 1, −1)^t }.
3. In R2, let ~v1 = (a, b)^t be a nonzero vector. How many orthogonal bases of R2 containing ~v1 are there? How many orthogonal bases {~v1, ~v2} of R2 are there such that k~v1k = k~v2k?
4. The following exercise aims at obtaining an orthonormal basis of the plane E : ax + by + cz = 0 with the tools seen so far.
(a) Consider a vector ~v1 parallel to E with ~v1 ≠ ~0. Let ~n be some normal vector of E and take ~v2 = ~v1 × ~n. Show that { ~v1/k~v1k, ~v2/k~v2k } is an orthonormal basis of E (observe that { ~v1/k~v1k, ~v2/k~v2k, ~n/k~nk } is an orthonormal basis of R3).
(b) For the plane E : x + 2y + 3z = 0, obtain an orthonormal basis and complete it to an orthonormal basis of R3.
(c) Write (1, 2, 3)^t in terms of the basis obtained in the previous item.
(d) Let L : x/3 = y/2 = z/5. Can you obtain an orthonormal basis of R3 containing some direction vector of L?
5. Let B be any basis of Rn and let ~v1, ~v2 ∈ Rn. If h[~v1]_B , [~v2]_B i = 0, does it follow that ~v1 ⊥ ~v2?
If we exchange the roles of B and C and use that h~wi , ~uj i = h~uj , ~wi i, then we obtain

A_{C→B} = \begin{pmatrix} h~w1 , ~u1i & h~w2 , ~u1i & \cdots & h~wn , ~u1i \\ h~w1 , ~u2i & h~w2 , ~u2i & \cdots & h~wn , ~u2i \\ \vdots & & & \vdots \\ h~w1 , ~uni & h~w2 , ~uni & \cdots & h~wn , ~uni \end{pmatrix} = \begin{pmatrix} h~u1 , ~w1i & h~u1 , ~w2i & \cdots & h~u1 , ~wni \\ h~u2 , ~w1i & h~u2 , ~w2i & \cdots & h~u2 , ~wni \\ \vdots & & & \vdots \\ h~un , ~w1i & h~un , ~w2i & \cdots & h~un , ~wni \end{pmatrix}.
This shows that AC→B = (AB→C )t . If we use that AC→B = (AB→C )−1 , then we find that
Lemma 7.9. Let B = {~u1, . . . , ~un} and C = {~w1, . . . , ~wn} be orthonormal bases of Rn and let Q = A_{B→C} be the transition matrix from the basis B to the basis C. Then

Q^t = Q^{−1}.
Proof. (i) =⇒ (ii): Assume that Q is orthogonal. Then it is invertible, hence also Qt is invertible
by Theorem 3.51 and (Qt )−1 = (Q−1 )t = (Qt )t = Q holds. Hence Qt is an orthogonal matrix.
(ii) =⇒ (i): Assume that Qt is an orthogonal matrix. Then (Qt )t = Q must be an orthogonal
matrix too by what we just proved.
(i) =⇒ (iii): Assume that Q is orthogonal. Then it is invertible and (Q−1 )−1 = (Qt )−1 = (Q−1 )t
where in the second step we used Theorem 3.51. Hence Q−1 is an orthogonal matrix.
(iii) =⇒ (i): Assume that Q−1 is an orthogonal matrix. Then its inverse (Q−1 )−1 = Q must be
an orthogonal matrix too by what we just proved.
By Lemma 7.9, every transition matrix from one ONB to another ONB is an orthogonal matrix.
The reverse is also true as the following theorem shows.
(i) Q is an orthogonal matrix if and only if its columns are an orthonormal basis of Rn .
(ii) Q is an orthogonal matrix if and only if its rows are an orthonormal basis of Rn .
(iii) If Q is an orthogonal matrix, then |det Q| = 1.
Proof. (i): Assume that Q is an orthogonal matrix and let ~cj be its columns. We already know
that they are a basis of Rn since Q is invertible. In order to show that they are also an orthonormal
system, we calculate
id = Q^t Q = \begin{pmatrix} ~c_1^t \\ \vdots \\ ~c_n^t \end{pmatrix} (~c_1 | \cdots | ~c_n) = \begin{pmatrix} h~c_1 , ~c_1i & h~c_1 , ~c_2i & \cdots & h~c_1 , ~c_ni \\ h~c_2 , ~c_1i & h~c_2 , ~c_2i & \cdots & h~c_2 , ~c_ni \\ \vdots & & & \vdots \\ h~c_n , ~c_1i & h~c_n , ~c_2i & \cdots & h~c_n , ~c_ni \end{pmatrix}.   (7.1)
Since the product is equal to the identity matrix, it follows that all the elements on the diagonal
must be equal to 1 and all the other elements must be equal to 0. This means that h~cj , ~cj i = 1 for
j = 1, . . . , n and h~cj , ~ck i = 0 for j 6= k, hence the columns of Q are an orthonormal basis of Rn .
Now assume that the columns ~c1 , . . . , ~cn of Q are an orthonormal basis of Rn . Then clearly (7.1)
holds which shows that Q is an orthogonal matrix.
(ii): The rows of Q are the columns of Qt hence they are an orthonormal basis of Rn by (i) and
Proposition 7.11 (ii).
Clearly, not every matrix R with |det R| = 1 is an orthogonal matrix. For instance, if R = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, then det R = 1, but R^{−1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} is different from R^t = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.
Question 7.1
Assume that ~a1 , . . . , ~an ∈ Rn are pairwise orthogonal and let R ∈ M (n × n) be the matrix whose
columns are the given vectors. Can you calculate Rt R and RRt ? What are the conditions on the
vectors such that R is invertible? If it is invertible, what is its inverse? (You should be able to
answer the above questions more or less easily if k~aj k = 1 for all j = 1, . . . , n because in this case
R is an orthogonal matrix.)
Exercise 7.13. Show that every orthogonal 2 × 2 matrix is of the form Q = \begin{pmatrix} \cos ϕ & -\sin ϕ \\ \sin ϕ & \cos ϕ \end{pmatrix} or Q = \begin{pmatrix} \cos ϕ & \sin ϕ \\ \sin ϕ & -\cos ϕ \end{pmatrix}. Compare this with Exercise 7.6.
Exercise 7.14. Use the results from Section 4.3 to prove that | det Q| = 1 if Q is an orthogonal
2 × 2 or 3 × 3 matrix.
It can be shown that every orthogonal matrix represents either a rotation (if its determinant is 1)
or the composition of a rotation and a reflection (if its determinant is −1).
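The two families from Exercise 7.13 can be checked numerically. A minimal NumPy sketch (the angle is arbitrary):

```python
import numpy as np

phi = 0.7
Q_rot = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])       # rotation
Q_ref = np.array([[np.cos(phi),  np.sin(phi)],
                  [np.sin(phi), -np.cos(phi)]])       # rotation composed with a reflection

for Q in (Q_rot, Q_ref):
    print(np.allclose(Q.T @ Q, np.eye(2)), round(np.linalg.det(Q)))   # True 1 / True -1
```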
Let Q be an orthogonal 2 × 2 matrix with columns ~c1 and ~c2. Recall that Q~e1 = ~c1 and Q~e2 = ~c2. Since ~c1 is a unit vector, it is of the form ~c1 = (cos ϕ, sin ϕ)^t for some ϕ ∈ R. Since ~c2 is also a unit vector and in addition must be orthogonal to ~c1, there are only the two possible choices ~c2^+ = (−sin ϕ, cos ϕ)^t or ~c2^− = (sin ϕ, −cos ϕ)^t, see Figure 7.1.

• In the first case, det Q = det(~c1 | ~c2^+) = det\begin{pmatrix} \cos ϕ & -\sin ϕ \\ \sin ϕ & \cos ϕ \end{pmatrix} = cos^2 ϕ + sin^2 ϕ = 1 and Q represents the rotation by ϕ counterclockwise.

• In the second case, det Q = det(~c1 | ~c2^−) = det\begin{pmatrix} \cos ϕ & \sin ϕ \\ \sin ϕ & -\cos ϕ \end{pmatrix} = −cos^2 ϕ − sin^2 ϕ = −1 and Q represents the rotation by ϕ counterclockwise followed by a reflection on the direction given by ~c1 (or: reflection on the x-axis followed by the rotation by ϕ counterclockwise).
Figure 7.1: In case (a), Q represents a rotation and det Q = 1. In case (b) it represents a rotation followed by a reflection and det Q = −1.
(i) Q preserves inner products, that is h~x , ~y i = hQ~x , Q~y i for all ~x, ~y ∈ Rn .
(ii) Q preserves lengths, that is k~xk = kQ~xk for all ~x ∈ Rn .
(iii) Q preserves angles, that is ^(~x, ~y ) = ^(Q~x, Q~y ) for all ~x, ~y ∈ Rn \ {~0}.
(i) Assume that Q preserves inner products, that is h~x , ~y i = hQ~x , Q~y i for all ~x, ~y ∈ Rn . Show
that Q is an orthogonal matrix.
(ii) Assume that Q preserves lengths, that is k~xk = kQ~xk for all ~x ∈ Rn. Show that Q is an orthogonal matrix.
Exercise 7.15 together with Exercise 7.16 show the following.
A matrix Q is an orthogonal matrix if and only if it preserves lengths if and only if it preserves
angles. That is
Q is orthogonal ⇐⇒ Qt = Q−1
⇐⇒ hQ~x , Q~y i = h~x , ~y i for all ~x, ~y ∈ Rn
⇐⇒ kQ~xk = k~xk for all ~x ∈ Rn .
Note that every isometry is injective since T ~x = ~0 if and only if ~x = ~0, therefore necessarily n ≤ m.
Exercises.
1. Verify that the following matrices are orthogonal.

\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos ϑ & -\sin ϑ \\ 0 & \sin ϑ & \cos ϑ \end{pmatrix},   \begin{pmatrix} \cos ϑ & 0 & -\sin ϑ \\ 0 & 1 & 0 \\ \sin ϑ & 0 & \cos ϑ \end{pmatrix},   \begin{pmatrix} \cos ϑ & -\sin ϑ & 0 \\ \sin ϑ & \cos ϑ & 0 \\ 0 & 0 & 1 \end{pmatrix}

What is the geometric interpretation of each of them? See Exercise 6 in Section 3.4.
2. For the plane E : 2x + y − z = 0, obtain an orthonormal basis B of R3 such that its first two vectors are a basis of E. Find A_{B→can} and A_{can→B}.
4. Let A, B ∈ M(n × n):
(a) If AB is orthogonal, can one conclude that A and B must be orthogonal matrices?
(b) If A, B are orthogonal matrices, can one conclude that AB is an orthogonal matrix?
5. Let T : Rn → Rm and Q ∈ M(n × n)
(ii) The orthogonal complement of U is denoted by U⊥ and it is the set of all vectors which are perpendicular to every vector in U, that is

U⊥ = { ~x ∈ Rn : h~x , ~ui = 0 for all ~u ∈ U }.
Remark 7.19. Let U be a subspace of Rn.
(i) U⊥ is a subspace of Rn.
(ii) U ∩ U⊥ = {~0}.
(iii) (Rn)⊥ = {~0} and {~0}⊥ = Rn.
Proof. (i) Clearly, ~0 ∈ U ⊥ . Let ~x, ~y ∈ U ⊥ and let c ∈ R. Then for every ~u ∈ U we have that
h~x + c~y , ~ui = h~x , ~ui + ch~y , ~ui = 0, hence ~x + c~y ∈ U ⊥ and U ⊥ is a subspace by Theorem 5.10.
(ii) Let ~x ∈ U ∩ U ⊥ . Then it follows that ~x ⊥ ~x, hence k~xk2 = h~x , ~xi = 0 which shows that ~x = ~0
and therefore U ∩ U ⊥ consists only of the vector ~0.
(iii) Assume that ~x ∈ (Rn)⊥. Then ~x ⊥ ~y for every ~y ∈ Rn, in particular also ~x ⊥ ~x. Therefore k~xk^2 = h~x , ~xi = 0 which shows that ~x = ~0. It follows that (Rn)⊥ = {~0}.
It is clear that h~x , ~0i = 0, hence Rn ⊆ {~0}⊥ ⊆ Rn which proves that {~0}⊥ = Rn .
Examples 7.20. (i) The orthogonal complement of a line in R2 is again a line, see Figure 7.2.
(ii) The orthogonal complement of a line in R3 is the plane perpendicular to the given line. The orthogonal complement of a plane in R3 is the line perpendicular to the given plane, see Figure 7.2.
The next goal is to show that dim U + dim U ⊥ = n and to establish a method for calculating
U ⊥ . To this end, the following lemma is useful. It tells us that in order to verify that some ~x is
perpendicular to U we do not have to check that ~x ⊥ ~u for every ~u ∈ U , but that it is enough to
check it for a set of vectors ~u which generate U .
Lemma 7.21. Let U = span{~u1 , . . . , ~uk } ⊆ Rn . Then ~x ∈ U ⊥ if and only if ~x ⊥ ~uj for every
j = 1, . . . , k.
Figure 7.2: The figure on the left shows the orthogonal complement of the line L in R2, which is the line G. The figure on the right shows the orthogonal complement of the plane U in R3, which is the line H. Note that the orthogonal complement of H is U.
Proof. Suppose that ~x ⊥ U , then ~x ⊥ ~u for every ~u ∈ U , in particular for the generating vectors
~u1 , . . . , ~uk . Now suppose that ~x ⊥ ~uj for all j = 1, . . . , k. Let ~u ∈ U be an arbitrary vector in U .
Then there exist α1 , . . . , αk ∈ R such that ~u = α1 ~u1 + · · · + ~uk αk . So we obtain
h~x , ~ui = h~x , α1 ~u1 + · · · + αk ~uki = α1 h~x , ~u1i + · · · + αk h~x , ~uki = 0.
Proof. Let ~r1, . . . , ~rm be the rows of A. Since R_A = span{~r1, . . . , ~rm}, it suffices to show that ~x ∈ ker(A) if and only if ~x ⊥ ~rj for all j = 1, . . . , m.
By definition ~x ∈ ker(A) if and only if

~0 = A~x = \begin{pmatrix} ~r_1 \\ \vdots \\ ~r_m \end{pmatrix} ~x = \begin{pmatrix} h~r_1 , ~xi \\ \vdots \\ h~r_m , ~xi \end{pmatrix}.
This is the case if and only if h~rj , ~xi = 0 for all j = 1, . . . , m, that is, if and only if ~x ⊥ ~rj for all
j = 1, . . . , m.
Alternative proof of Theorem 7.22. Observe that RA = CAt = Im(At ). So we have to show that
x ∈ ker(A) ⇐⇒ Ax = 0 ⇐⇒ Ax ⊥ Rm
⇐⇒ hAx , yi = 0 for all y ∈ Rm
⇐⇒ hx , At yi = 0 for all y ∈ Rm ⇐⇒ x ∈ (Im(At ))⊥ .
The theorem above leads to a method for calculating the orthogonal complement of a given subspace
U of Rn as follows.
Lemma 7.23. Let U = span{~u1 , . . . , ~uk } ⊆ Rn and let A be the matrix whose rows consist of the
vectors ~u1 , . . . , ~uk . Then
U ⊥ = ker A. (7.2)
Proof. Let ~x ∈ Rn. By Lemma 7.21 we know that ~x ∈ U⊥ if and only if ~x ⊥ ~uj for every j = 1, . . . , k. This is the case if and only if

h~u1 , ~xi = 0,   h~u2 , ~xi = 0,   . . . ,   h~uk , ~xi = 0,

which can be written in matrix form as

\begin{pmatrix} ~u_1 \\ ~u_2 \\ \vdots \\ ~u_k \end{pmatrix} ~x = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},

which is the same as A~x = ~0 by definition of A. In conclusion, ~x ⊥ U if and only if A~x = ~0, that is, if and only if ~x ∈ ker A.
In Example 7.28 we will calculate the orthogonal complement of a subspace of R4 .
The next two theorems are the main results of this section.
Proof. Let ~u1 , . . . , ~uk be a basis of U . Note that k = dim U . Then we have in particular U =
span{~u1, . . . , ~uk}. As in Lemma 7.23 we consider the matrix A ∈ M(k × n) whose rows are the
vectors ~u1 , . . . , ~uk . Then U ⊥ = ker A, so
Note that dim(Im A) is the dimension of the column space of A which is equal to the dimension of
the row space of A by Proposition 6.32. Since the vectors ~u1 , . . . , ~uk are linear independent, this
dimension is equal to k. Therefore dim U ⊥ = n − k = n − dim U . Rearranging we obtained the
desired formula dim U ⊥ + dim U = n.
(We could also have said that the reduced form of A cannot have any zero row because its rows
are linearly independent. Therefore the reduced form must have k pivots and we obtain dim U ⊥ =
dim(ker A) = n − #(pivots of the reduced form of A) = n − k = n − dim U . We basically re-proved
Proposition 6.32.)
(i) U ⊕ U ⊥ = Rn .
(ii) (U ⊥ )⊥ = U .
Proof. (i) Recall that U ∩ U ⊥ = {~0} by Remark 7.19, therefore the sum is a direct sum. Now let
us show that U + U ⊥ = Rn . Since U + U ⊥ ⊆ Rn , we only have to show that dim(U + U ⊥ ) =
n because the only n-dimensional subspace of Rn is Rn itself, see Theorem 5.54. From
Proposition 5.62 and Theorem 7.24 we obtain
(ii) First let us show that U ⊆ (U⊥)⊥. To this end, fix ~u ∈ U. Then, for every ~y ∈ U⊥, we have that h~u , ~yi = 0, hence ~u ⊥ U⊥, that is, ~u ∈ (U⊥)⊥. Note that dim(U⊥)⊥ = n − dim U⊥ =
n − (n − dim U ) = dim U . Since we already know that U ⊆ (U ⊥ )⊥ , it follows that they must
be equal by Theorem 5.54.
The next proposition shows that every subspace of Rn has an orthonormal basis. Another proof of
this fact will be given later when we introduce the Gram-Schmidt process in Section 7.5.
Proposition 7.26. Every subspace U ⊆ Rn with dim U > 0 has an orthonormal basis.
Proof. Let U be a subspace of Rn with dim U = k > 0. Then dim U ⊥ = n − k and we can choose
a basis w ~ k+1 , . . . , wn of U ⊥ . Let A0 ∈ M ((n − k) × n) be the matrix whose rows are the vectors
~ k+1 , . . . , wn . Since U = (U ⊥ )⊥ , we know that U = ker A0 . Pick any ~u1 ∈ ker A0 with ~u1 6= ~0.
w
Then ~u1 ∈ U . Now we form the new matrix A1 ∈ M ((n−k+1)×n) by adding ~u1 as a new row to the
matrix A0 . Note that the rows of A1 are linearly independent, so dim ker(A1 ) = n−(n−k+1) = k−1.
If k−1 > 0, then we pick any vector ~u2 ∈ ker A1 with ~u2 6= ~0. This vector is orthogonal to all the rows
of A1 , in particular it belongs to U (since it is orthogonal to w ~ k+1 , . . . , w
~ n ) and it is perpendicular
to ~u1 ∈ U . Now we form the matrix A2 ∈ M ((n−k +2)×n) by adding the vector ~u2 as a row to A1 .
Again, the rows of A2 are linearly independent and therefore dim(ker A2 ) = n − (n − k + 2) = k − 2.
If k − 2 > 0, then we pick any vector ~u3 ∈ ker A2 with ~u3 6= ~0. This vector is orthogonal to all
Solution. Recall that ~x ∈ U ⊥ if and only if it is perpendicular to the vectors which generate U .
Therefore ~x ∈ U ⊥ if and only if it belongs to the kernel of the matrix whose rows are the generators
of U . So we calculate
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -2 & -2 & -4 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \end{pmatrix},

so that a basis of U⊥ = ker A is given by

~w1 = (0, −2, 0, 1)^t,   ~w2 = (−1, −1, 1, 0)^t.
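The kernel (and hence the orthogonal complement) can also be obtained numerically from the singular value decomposition. The following is a NumPy sketch for the subspace U = span{(1, 2, 3, 4), (1, 0, 1, 0)} used above; the tolerance and names are our own choices.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0],
              [1.0, 0.0, 1.0, 0.0]])

# U^perp = ker A. The right singular vectors belonging to the zero singular
# values form an orthonormal basis of ker A.
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
kernel_basis = Vt[rank:]           # rows span ker A = U^perp
print(kernel_basis.shape)          # (2, 4): dim U^perp = 4 - dim U = 2
print(np.round(A @ kernel_basis.T, 10))   # zero matrix: each basis vector is orthogonal to U
```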
Solution. We will use the method from Proposition 7.26. Another solution of this exercise will be
given in Example 7.44. From the solution of Example 7.28 we can take the first basis vector w ~ 1.
We append it to the matrix from the solution of Example 7.28 and reduce the new matrix (note
that the first few steps are identical to the reduction of the original matrix). We obtain
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 0 & 1 & 0 \\ 0 & -2 & 0 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & -2 & 0 & 1 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 2 & 5 \end{pmatrix}
Exercises.
1. Find the orthogonal complement of the following sets:
(a) span{ (−1, 5)^t }
(b) The intersection of the planes x + 2y + 5z = 0 and 2x − 3y − 4z = 0.
(c) span{ (1, −2, 3)^t, (−1, 1, 2)^t }
(d) The image of the linear transformation T : R3 → R4 given by

T(x, y, z)^t = (2x + y − 3z, 3x + 2y − 5z, x − y, −x − 3y + 4z)^t.

(d) ~v = (1, −1, 3, 2)^t and W = { (x, y, z, w)^t ∈ R4 : y = 2x + w, z = x − 2w }.
Recall from Section 2.3 that for a vector ~w ≠ ~0 the orthogonal projection

proj_{~w} ~v = \frac{h~w , ~vi}{k~wk^2} ~w   (7.4)

is the unique vector in Rn which is parallel to ~w and satisfies that ~v − proj_{~w} ~v is orthogonal to
w.
~ We already know that the projection is independent on the length of w. ~ So projw~ ~v should be
regarded as the projection of ~v onto the one-dimensional subspace generated by w. ~
In this section we want to generalise this to orthogonal projections on higher dimensional subspaces,
for instance you could think of the projection in R3 onto a given plane. Then, given a subspace U
of Rn , we want to define the orthogonal projection as the function from Rn to Rn which assigns to
each vector ~v its orthogonal projection onto U . We start with the analogue of Theorem 2.22.
Theorem 7.30 (Orthogonal projection). Let U ⊆ Rn be a subspace and let ~v ∈ Rn . Then there
exist uniquely determined vectors ~vk and ~v⊥ such that
~vk ∈ U, ~v⊥ ⊥ U and ~v = ~vk + ~v⊥ . (7.5)
The vector ~vk is called the orthogonal projection of ~v onto U ; it is denoted by projU ~v .
Proof. First we show the existence of the vectors ~vk and ~v⊥ . If U = Rn , we take ~vk = ~v and
~v⊥ = ~0. If U = {~0}, we take ~vk = ~0 and ~v⊥ = ~v . Otherwise, let 0 < dim U = k < n. Choose
orthonormal bases ~u1, . . . , ~uk of U and ~wk+1, . . . , ~wn of U⊥. This is possible by Theorem 7.24 and Proposition 7.26. Then ~u1, . . . , ~uk, ~wk+1, . . . , ~wn is an orthonormal basis of Rn and for every ~v ∈ Rn we find with the help of Theorem 7.7 that
~v ∈ R we find with the help of Theorem 7.7 that
~v = h~u1 , ~vi~u1 + · · · + h~uk , ~vi~uk + h~wk+1 , ~vi~wk+1 + · · · + h~wn , ~vi~wn,

where the first group of terms belongs to U and the second to U⊥.
Since U ∩ U ⊥ = {~0}, it follows that ~vk − ~zk = ~0 and ~z⊥ − ~v⊥ = ~0, and therefore ~zk = ~vk and
~z⊥ = ~v⊥ .
Definition 7.31. Let U be a subspace of Rn . Then we define the orthogonal projection onto U as
the map which sends ~v ∈ Rn to its orthogonal projection onto U . It is usually denoted by PU , so
PU : Rn → Rn , PU ~v = projU ~v .
Remark 7.32 (Formula for the orthogonal projection). The proof of Theorem 7.30 indicates how we can calculate the orthogonal projection onto a given subspace U ⊆ Rn. If ~u1, . . . , ~uk is an orthonormal basis of U, then

P_U ~v = h~u1 , ~vi~u1 + · · · + h~uk , ~vi~uk.   (7.6)
This shows that PU is a linear transformation since PU (~x + c~y ) = PU ~x + cPU ~y follows easily from
(7.6).
Exercise. If ~u1, . . . , ~uk is an orthogonal basis of U (but not necessarily orthonormal), show that

P_U ~v = \frac{h~u1 , ~vi}{k~u1k^2} ~u1 + · · · + \frac{h~uk , ~vi}{k~ukk^2} ~uk.   (7.7)
Remark 7.33 (Formula for the orthogonal projection for dim U = 1). If dim U = 1, we obtain again the formula (7.4) which we already know from Section 2.3. To see this, choose ~w ∈ U with ~w ≠ ~0. Then ~w0 = k~wk^{−1} ~w is an orthonormal basis of U and according to (7.6) we have that

proj_{~w} ~v = proj_U ~v = h~w0 , ~vi ~w0 = k~wk^{−1} h~w , ~vi k~wk^{−1} ~w = k~wk^{−2} h~w , ~vi ~w = \frac{h~w , ~vi}{k~wk^2} ~w.
Remark 7.34 (Pythagoras’s Theorem). Let U be a subspace of Rn , ~v ∈ Rn and let ~vk and ~v⊥
be as in Theorem 7.30. Then
k~v k2 = k~vk k2 + k~v⊥ k2 .
k~v k2 = h~v , ~v i = h~vk + ~v⊥ , ~vk + ~v⊥ i = h~vk , ~vk i + h~vk , ~v⊥ i + h~v⊥ , ~vk i + h~v⊥ , ~v⊥ i
= h~vk , ~vk i + h~v⊥ , ~v⊥ i = k~vk k2 + k~v⊥ k2 .
Exercise 7.35. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk and let w
~ k+1 , . . . , w~ n be a basis
of U ⊥ . Find the matrix representation of PU with respect to the basis ~u1 , . . . , ~uk , w~ k+1 , . . . , w
~ n.
Exercise 7.36. Let U be a subspace of Rn . Show that PU ⊥ = id −PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.35.)
Exercise 7.37. Let U be a subspace of Rn . Show that (PU )2 = PU . (You can show this either
directly or using the matrix representation of PU from Exercise 7.35.)
Figure 7.3: The figure shows the orthogonal projection of the vector ~v onto the subspace U (which is a vector) and the distance of ~v to U (which is a number: the length of the vector ~v − proj_U ~v).
In Theorem 7.30 we used the concept of orthogonality to define the orthogonal projection of ~v onto
a given subspace. We obtained a decomposition of ~v into a part parallel to the given subspace and
a part orthogonal to it. The next theorem shows that the orthogonal projection of ~v onto U gives
us the point in U which is closest to ~v .
Theorem 7.39. Let U be a subspace of Rn and let ~v ∈ Rn . Then PU ~v is the point in U which is
D
Taking the square root on both sides shows the desired inequality.
Definition 7.40. as Let U be a subspace of Rn and let ~v ∈ R. The we define the distance of ~v to
U as
dist(~v , U ) := k~v − PU ~v k.
In Remark 7.32 we already found a formula for the orthogonal projection PU of a vector ~v to a
given subspace U . This formula however requires to have an orthonormal basis of U . We want to
give another formula for PU which does not require the knowledge of an orthonormal basis.
Theorem 7.41. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk and let B ∈ M (n × k) be the
matrix whose columns are these basis vectors. Then the following holds.
(i) B is injective.
(ii) B t B : Rk → Rk is a bijection.
(iii) The orthogonal projection onto U is given by the formula
PU = B(B t B)−1 B t .
Proof. (i) By construction, the columns of B are linearly independent. Therefore the unique
solution of B~x = ~0 is ~x = ~0 which shows that B is injective.
FT
(ii) Observe that B t B ∈ M (k × k) and assume that B t B~x = ~0 for some ~x ∈ Rk . Then it follows
for every ~y ∈ Rk that B~y = ~0 because
Since this is true for every ~y ∈ Rk , it follows that B t ~x − B t B~z = ~0. Now we recall that B t B
is invertible, so we can solve for ~z and obtain ~z = (B t B)−1 B t ~x. This finally gives
Ejercicios.
1. Sea E : x + 2y − z = 1. Encuentre la distancia del punto P (6, 1, 2) al plano E.
x−1 z+1
2. Sea L : 2 =y+2= 3 . Encuentre la distancia del punto P (6, −1, 0) a la recta L.
x y
3. Sea L : 2 = 3 = z.
(a) Encuentre L⊥ .
(b) Sea T (~x) = media rotación de ~x con respecto a la recta L. Encuentre una fórmula
explicita para T . (Hint: ¿Cúal es la proyección de T ~x en L y en L⊥ ? Vea la gráfica).
FT
T ~x ~x
P royL⊥ ~x
L⊥
RA
4. Sea W un subespacio de Rn y PW la proyección ortogonal sobre W .
(a) Sean {w~ 1, w ~ k } una base ortonormal de W . Recuerde que esta base se puede
~ 2, . . . , w
completar a una base ortonormal B de Rn . ¿Cómo es la representación matricial de PW
con respecto a la base B?
D
Theorem 7.42. Let U be a subspace of Rn with basis ~u1 , . . . , ~uk . Then there exists an orthonormal
basis ~x1 , . . . , ~xk of U such that
span{~u1 , . . . , ~uj } = span{~x1 , . . . , ~xj } for every j = 1, . . . , k.
Proof. The proof is constructive, that is, we do not only prove the existence of such basis, but it
tells us how to calculate it. The idea is to construct the new basis ~x1 , . . . , ~xk step by step. In order
to simplify notation a bit, we set Uj = span{~u1 , . . . , ~uj } for j = 1, . . . , k. Note that dim Uj = j
and that Uk = U .
• Set ~x1 = k~v1 k−1~v1 . Then clearly k~x1 k = 1 and span{~u1 } = span{~x1 } = U1 .
FT
• The vector ~x2 must be a normalised vector in U2 which is orthogonal to ~x1 , that is, it must
be orthogonal to U1 . So we simple take ~u2 and subtract its projection onto U1 :
~ 2 = ~u2 − projU1 ~u2 = ~u2 − proj~x1 ~u2 = ~u2 − h~x1 , ~u2 i~x1 .
w
Clearly w ~ 2 ∈ U2 because it is a linear combination of vectors in U2 . Moreover, w ~ 2 ⊥ U1
because
D E
hw
~ 2 , ~x1 i = ~u2 − h~x1 , ~u2 i~x1 , ~x1 = h~u2 , ~x1 i − h~x1 , ~u2 ih~x1 , ~x1 i = h~u2 , ~x1 i − h~x1 , ~u2 i = 0.
RA
Hence the vector ~x2 that we are looking for is
~ 2 k−1 w
~x2 = kw ~ 2.
Since ~x2 ∈ U2 it follows that span{~x1 , ~x2 } ⊆ U2 . Both spaces have dimension 2, so they must
be equal.
• The vector ~x3 must be a normalised vector in U3 which is orthogonal to U2 = span{~x1 , ~x2 }.
So we simple take ~x3 and subtract its projection onto U2 :
w~ 3 = ~u3 − projU2 ~u3 = ~u3 − (proj~x1 ~u3 + proj~x2 ~u3 ) = ~u3 − h~x1 , ~u3 i~x1 + h~x1 , ~u3 i~x1 .
D
We repeat this k times until have constructed the basis ~x1 , . . . , ~xk .
Note that the general procedure is as follows:
• Suppose that we already have constructed ~x1 , . . . , ~x` . Then we first construct
FT
1 2 1
We want to find an orthonormal basis ~x1 , ~x2 , ~x3 of U using the Gram-Schmidt process.
Solution. (i) ~x1 = k~u1 k−1 ~u1 = 12 ~u1 . −3
2
√
(ii) w~ 2 = ~u2 − proj~x1 ~u2 = ~u2 − h~x1 , ~u2 i~x1 = ~u2 − 4~x1 = ~u2 − 2~u1 = 2
−3 1
2 0
1 √
RA
=⇒ ~x2 = kw~ 2 k−1 w
~2 = 2.
4 1
0
(iii) w~ 3 = ~u3 − projspan{~x1 ,~x2 } ~u3 = ~u3 − h~x1 , ~u3 i~x1 + h~x2 , ~u3 i~x1 = ~u3 − 2~x1 + 4~x2
−2 1 −3 0
5 1 2
√ √2
= 0 − 0 − 2 = − 2
0 1 1 −2
D
1 1 0 0
0
−2
−1 1
√
=⇒ ~x3 = kw~ 3k w ~ 3 = √10 2 .
2
0
Therefore the desired orthonormal basis of U is
1 −3 0
1 2 −2
1 1 √ 1 √
~x1 = 0 , ~x2 = 2 , ~x3 = √ 2.
2
4
10
1 1 2
1 0 0
Note that we will obtain a different basis if we change the order of the given basis ~u1 , ~u2 , ~u3 .
Example 7.44. We will give another solution of Example 7.29. We were asked to find an orthonor-
mal basis of the orthogonal complement of
1
1
2 0
U = span , .
3 1
4 0
1 0
FT
We use the Gram-Schmidt process to obtain an orthonormal basis ~x1 , ~x2 of U .
1
(i) ~x1 = k~v1 k−1~v1 = √ ~v1 .
5
−1 0 −5
2 −1 2 −2 1 −1
(ii) ~y2 = w~ 2 − proj~x1 w~ 2 = ~u2 − h~x1 , ~u2 i~x1 = ~u2 − √ ~x1 =
1 − 5 0 = 5 5
5
0 1 −2
RA
5
1 1
=⇒ ~x2 = k~y2 k−1 ~y2 = √
55 −5
2
Therefore
0 5
1 −2 1 −1
~x1 = √ , ~x2 = √ .
5 0 55 5
D
1 2
• apply the Gram-Schmidt process in order to generate an orthonormal basis of a given sub-
space,
• etc.
Ejercicios.
2. Encuentreuna
base
ortonormal
de R4 que contenga una base del subespacio generado por los
1 2
0 −1
1, 0.
vectores
0 1
1 0 0 0
1 1 0 0
3. Sea W = span 0 , 1 , 1 y ~v = 1. Encuentre el elemento en W más cercano
1 1 0 1
FT
a ~v y determine la distancia de ~v a W .
1 2 2 −5
3 2 1 −2
4. Sea A = . Determine una base ortonormal para el espacio columna de
2 0 −1 3
7 −2 1 4
A.
RA
5. Sea E : 2x + 3y
+ z = 0 y PE la proyección ortogonal sobre E. Encuentre bases B1 y B2 tales
5 0 0
que [PE ]B
B1 =
2 0 0 0 .
√
0 0 2
In this section we want to present the least squares method to fit a linear function to certain
measurements. Let us see an example.
Example 7.45. Assume that we want to measure the Hook constant k of a spring. By Hook’s law
we know that
y = y0 + km (7.8)
where y0 is the elongation of the spring without any mass attached and y is the elongation of the
spring when we attach the mass m to it.
y0 y0
ym
m
y y
Assume that we measure the elongation for different masses. If Hook’s law is valid and if our
measurements were perfect, then our measured points should lie on a line with slope k. However,
measurements are never perfect and the points will rather be scattered around a line. Assume that
FT
we measured the following.
m 2 3 4 5
y 4.5 5.1 6.1 7.9
y y
RA
8 8
7 7
6 6
g1
5 5
g2
D
4 4
m m
1 2 3 4 5 1 2 3 4 5
Figure 7.4: The left plot shows the measured data. In the plot on the right we added the two functions
g1 (x) = x + 2.5, g2 (x) = 1.1x + 2 which seem to be reasonable candidates for linear approximations to
the measured data.
The plot gives us some confidence that Hook’s law holds since the points seem to lie more or less
on a line. How do we best fit a line through the points? The slope seems to be around 1. We could
make the following guesses:
Which of the two functions is the better approximation? Are there other approximations that are
even better?
The answer to this questions depend very much on how we measure how “good” an approximation
is. One very common way is the following: For each measured point, we take the difference
∆j := mj − g(mj ) between the measured value and the value of our test function. Then we
P 1
n 2 2
square all these differences, sum them and then we take the square root j=1 (mj − g(m j )) ,
see also Figure 7.5. The resulting number will be our measure for how good our guess is.
y
y
∆n−1 ∆n
FT
∆1 ∆2
m m
x1 x2 x3 . . . xn−1 xn x1 x2 x3 . . . xn−1 xn
Figure 7.5: The graph on the left shows points for which we want to find an approximating linear
function. The graph on the right shows such a linear function and how to measure the error or
1
RA
Pn 2 2
discrepancy between the measured points the proposed line. A measure for the error is j=1 ∆j .
(i) If all the measured point lie on a line and we take this line as our candidate, then this method
gives the total error 0 as it should be.
(ii) We take the squares of the errors in each measured points so that the error is always counted
positive. Otherwise it could happen that the errors cancel each other. If we would simply
D
sum the errors, then the total error could be 0 while the approximating line is quite far from
all the measure points.
Pn
(iii) There are other ways how to measure the error, for example one could use j=1 |mj − g(mj )|,
but it turns out the methods with the squares has many advantages. (See some course on
optimisation for further details.)
Now let us calculate the errors for our measure points and our two proposed functions.
m 2 3 4 5 m 2 3 4 5
y (measured) 4.5 5.1 6.1 7.9 y (measured) 4.5 5.1 6.1 7.9
g1 (m) 4.5 4.5 6.5 7.5 g2 (m) 4.2 5.3 6.4 7.5
y − g1 0 0.6 -0.4 0.4 y − g2 0.3 -0.2 -0.3 0.4
so our second guess seems to be closer to the best linear approximation to our measured points
than the first guess. This exercise will be continued on p. 303.
Now the question arises how we can find the optimal linear approximation.
Best linear approximation. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we
want to find a linear function g(x) = ax + b such that the total error
n
hX i 21
∆ := (yj − g(xj ))2 (7.9)
j=1
FT
is minimal. In other words, we have to find the parameters a and b such that ∆ becomes as small
as possible. The key here is to recognise the right hand side on (7.9) as the norm of a vector (here
the particular form of how we chose to measure the error is crucial). Let us rewrite (7.9) as follows:
y1 − (ax1 − b)
hXn i 12 hX n i 12 y2 − (ax2 − b)
∆= (yj − g(xj ))2 = (yj − (axj − b))2 =
..
j=1 j=1
.
yn − (axn − b)
RA
y1 x1 1
y2 x2 1
= . − a . + b . .
.. .. ..
yn xn 1
Let us set
y1 x1 1
y2 x2 1
D
~y = . , ~x = . and ~u = . . (7.10)
.. .. ..
yn xn 1
Note that these are vectors in Rn . Then
and the question is how we have to choose a and b such that this becomes as small as possible. In
other words, we are looking for the point in the vector space spanned by ~x and ~u which is closest
to ~y . By Theorem 7.39 this point is given by the orthogonal projection of ~y onto that plane.
To calculate this projection, set U = span{~x, ~u} and let P be the orthogonal projection onto U .
Then by our reasoning
P ~y = a~x + b~u. (7.11)
Now let us see how we can calculate a and b easily from (7.11).1 In the following we will assume
that ~x and ~u so that U is a plane. This assumption seems to be reasonable because that they are
linearly dependent would mean that x1 = · · · = xn (in our example with the spring this would
mean that we always used the same mass in the experiment). Observe that if ~x, ~u were linearly
independent, then the matrix A below would have only one column; everything else works just the
same.
Recall that by Theorem 7.41 the orthogonal projection onto U is given by
P = A(At A)−1 At
where A is the n × 2 matrix whose columns consist of the vectors ~x and ~u. Therefore (7.11) becomes
a
A(At A)−1 At ~y = a~x + b~u = A . (7.12)
b
Since by our assumption the columns of A are linearly independent, it is injective. Therefore we
can conclude from (7.12) that
FT
a
(At A)−1 At ~y =
b
which is formula for the numbers a and b that we were looking for.
Theorem 7.46. Let (x1 , y1 ), . . . , (xn , yn ) be given. The linear function g(x) = ax + b which min-
RA
imises the total error
hX n i 21
∆ := (yj − g(xj ))2 (7.13)
j=1
is given by
a
= (At A)−1 At ~y (7.14)
b
where ~y , ~x and ~u are as in (7.10) and A is the n × 2 matrix whose columns consist of the vectors ~x
and ~u.
D
In Remark 7.47 we will show how this formula can be derived with methods from calculus.
Exercise 7.45 continued. . Let us use Theorem 7.46 to calculate the best linear approximation
to the data from Exercise 7.45. Note that in this case the mj correspond to the xj from the theorem
and we will write m
~ instead of ~x. In this case, we have
2 1 2 1 4.5
3 1 3 1 5.1
m
~ =4 , ~u = 1 , A = (~x | ~u) = 4 1 , ~y = 6.1 ,
5 1 5 1 7.9
1 Of y and then plant the linear n × 2 system to find the coefficients a and b.
course, you could simply calculate P ~
hence
2 1
t 2 3 4 5
3 1
= 54 14 t −1 1 2 −7
AA= , (AA ) =
1 1 1 1 4 1 14 4 5 −7 27
5 1
and therefore
4.5 4.5
a t −1 t 1 2 −7 2 3 4 5.1 = 1 −3 −1
5 1 5.1
3
= (A A) A ~y =
b 10 −7 27 1 1 1 1 6.1 10 13 6 −1 −8 6.1
7.9 7.9
1.12
= .
1.98
We conclude that the best linear approximation is
g(m) = 1.12m + 1.98.
6
y
FT g
RA
5
4
m
1 2 3 4 5
Figure 7.6: The plot shows the measured data and the linear approximation g(m) = 1.12m + 1.98
D
The method above can be generalised to other types of functions. We will show how it can be
adapted to the case of polynomial and to exponential functions.
Polynomial functions.. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we want
to find a polynomial of degree k which best fits the data points. Let p(x) = ak xk + ak−1 xk−1 +
· · · + a1 x + a0 be the desired polynomial. We define the vectors
k k−1
y1 x1 x1 x1 1
y2 xk2 xk−1 x2 1
2
~y = . , ξ~k = . , ξ~k−1 = . , . . . , ξ~1 = . , ξ~0 = . .
.. .. .. .. ..
yn xkn xk−1
n xn 1
where A = (ξ~k | . . . | ξ~0 ) is the n × (k + 1) matrix whose columns are the vectors ξ~k , . . . , ξ~0 . Note
that by our assumption k < n (otherwise the vectors ξ~k , . . . , ξ~0 cannot be linearly independent).
Remark. Generally one should have many more data points than the degree of the polynomial
one wants to fit; otherwise the problem of overfitting might occur. For example, assume that
the curve we are looking for is f (x) = 0.1 + 0.2x and we are given only three measurements:
(0, 0.25), (1, 0), (3, 1). Then a linear fit would give us g(x) = 27 x + 28 1
≈ 0.23x + 0.036. The
1 2 1 1
fit with a quadratic function gives p(x) = 4 x − 2 x + 4 which matches the data points perfectly
but is far away from the curve we are looking for. The reason is that we have too many free
FT
parameters in the polynomial so the fit the data too well. (Note that for any given n + 1 points
(x1 , y1 ), . . . , (xn+1 , yn+1 ) with x1 z . . . , xn+1 , there exists exactly one polynomial p of degree ≤ n
such that p(xj ) = yj for every j = 1, . . . , n + 1.) If we had a lot more data points and we tried to
fit a polynomial to a linear function, then the leading coefficient should become very small but this
effect does not appear if we have very few data points.
y
RA
2
p
g
1
f
x
−2 −1 1 2 3 4
D
Figure 7.7: Example of overfitting when we have too many free variables for a given set of data
points. The dots mark the measured points which are supposed to approximate the red curve f .
Fitting polynomial p of degree 2 leads to the green curve. The blue curve g is the result of a linear fit.
Exponential functions.. Assume we are given measured data (x1 , y1 ), . . . , (xn , yn ) and we want
to find a function of form g(x) = c ekx to fit our data point. Without restriction we may assume
that c > 0 (otherwise we fit −g).
Then we only need to define h(x) = ln(g(x)) = ln c + kx so that we can use the method to fit a
linear function to the data points (x1 , ln(y1 )), . . . , (xn , ln(yn )) in order to obtain c and k.
Remark 7.47. Let us show how the formula in Theorem 7.46 can be derived with analytic methods.
Recall that the problem is the following: Let (x1 , y1 ), . . . , (xn , yn ) be given. Find a linear function
as a function of the two variables a, b. In order to simplify the calculations a bit, we observe that
it is enough to minimise the square of ∆ since ∆(a, b) ≥ 0 for all a, b, and therefore it is minimal if
and only if its square is minimal. So we want to find a, b which minimise
n
X
F (a, b) := (∆(a, b))2 = (yj − axj − b)2 . (7.15)
j=1
FT
To this end, we have to derive F . Since F : R2 → R, the derivative will be vector valued function.
We find
X n n
∂F ∂F X
DF (a, b) = (a, b), (a, b) = −2xj (yj − axj − b), −2(yj − axj − b)
∂a ∂b j=1 j=1
X n Xn n
X n
X Xn
2
=2 a xj + b xj − xj yj , a xj + nb − yj .
RA
j=1 j=1 j=1 j=1 j=1
Now we need to find the critical points, that is, a, b such that DF (a, b) = 0. This is the case for
n n n
X X X
2
a xj + b xj = xj yj
Pn
2
Pn ! Pn
j=1 xj j=1 xj a j=1 xj yj
j=1
j=1 j=1
n n
that is Pn = Pn .
X X
j=1 xj n b j=1 yj
a xj + bn = yj
j=1 j=1
D
(7.16)
Now we can multiply on both sides from the left by the inverse of the matrix and obtain the solution
for a, b. This shows that F has only one critical point. Since F tends to infinity for k(a, b)k → ∞,
the function F must indeed have a minimum in this critical point. For details, see a course on
vector calculus or optimisation.
We observe the following: If, as before, we set
x1 1 x1 1 y1
.. .. .. ..
~x = . , ~u = . , A = (~x | ~u) = . , ~y = . ,
xn 1 xn 1 yn
then
n
X n
X n
X n
X
x2j = h~x , ~xi, xj = h~x , ~ui, n = h~u , ~ui, xj yj = h~x , ~y i, yj = h~u , ~y i.
j=1 j=1 j=1 j=1
FT
b
which becomes our equation (7.14) if we multiply both sides of the equation from the left by
(At A)−1 .
Ejercicios.
1. Una bola rueda a lo largo del eje x con velocidad constante. A lo largo de la trayectoria de
la bola se miden las coordenadas x de la bola en ciertos tiempos t. Las siguientes mediciones
son (t en segundos, x en metros):
(b) Use el método de mı́nimos cuadrados para econtrar la posición inicial x0 y la velocidad
v de la bola.
(c) Dibuje la recta en el bosquejo anterior. ¿Dónde/Cómo se ven x0 y v?
2. Se supone que una sustancia quı́mica inestable decaye según la ley P (t) = P0 ekt . Suponga
que se hicieron las siguientes mediciones:
t 1 2 3 4 5
P 7.4 6.5 5.7 5.2 4.9
Con el método de mı́nimos cuadrados aplicado a ln(P (t)), encuentre P0 y k que mejor corre-
sponden con las mediciones. Dé una estimada para P (8).
FT
3. Con el método de mı́nimos cuadrados encuentre el polı́nomio y = p(x) de grado 2 que mejor
aproxima los siguientes datos:
x -2 -1 0 1 2 3 4
y 15 8 2.8 -1.2 -4.9 -7.9 -8.7
7.7 Summary
RA
Let U be a subspace of Rn . Then its orthogonal complement is defined by
• U ⊥ is a vector space.
• U ⊥ = ker A where A is any matrix whose rows are formed by a basis of U .
• (U ⊥ )⊥ = U .
D
• dim U + dim U ⊥ = n.
• U ⊕ U ⊥ = Rn .
• U has an orthonormal basis. One way to construct such a basis is to first construct an
arbitrary basis of U and then apply the Gram-Schmidt orthogonalisation process to obtain
an orthonormal basis.
• PU is a linear transformation.
• PU ~x k U for every ~x ∈ Rn .
• ~x − PU ~x ⊥ U for every ~x ∈ Rn .
• For every ~x ∈ Rn the point in U nearest to ~x is given by ~x − PU ~x and dist(~x, U ) = k~x − PU ~xk.
• Formulas for PU :
Orthogonal matrices
A matrix Q ∈ M (n × n) is called an orthogonal matrix if it is invertible and if Q−1 = Qt . Note
that the following assertions for a matrix Q ∈ M (n × n) are equivalent:
Every orthogonal matrix represents either a rotation (in this case its determinant is 1) or a com-
position of a rotation with a reflection (in this case its determinant is −1).
7.8 Exercises
D
1/4
1. (a) Complete p a una base ortonormal para R2 . ¿Cuántas posibilidades hay para
15/16
hacerlo?
√ √
1/ √2 1/√3
(b) Complete −1/ 2 , 1/√3 a una base ortonormal para R3 . ¿Cuántas posibilidades
0 1/ 3
hay para hacerlo?
√
1/√2
(c) Complete 1/ 2 a una base ortonormal para R3 . ¿Cuántas posibilidades hay para
0
hacerlo?
2. Encuentre una base para el complemento ortogonal de los siguientes espacios vectoriales.
Encuentre la dimensión del espacio y la dimensión de su complemento ortogonal.
1
2 1 3
2
2 3
, , 3 ⊆ R4 .
2 4
, ⊆ R , 4
(a) U = span (b) U = span
3
4
3
5 4
4 5 4 6 5
FT
ii. Sean ~a1 = (1, 2, 0, 1)t , ~a2 = (11, 4, 4, −3)t , ~a3 = (0, −1, −1, 0)t . Para cada j = 1, 2, 3
encuentre el punto w ~ j ∈ W que esté más cercano a ~aj y calcule la distancia entre ~aj
yw~j.
iii. Encuentre la matriz que representa la proyección ortogonal sobre W (en la base
estandar).
0 −1 1 0
0 1 3 0 0 0
2
~ = 2, ~a = 4, ~b = 0 , ~c = 1, d~ = 1.
RA
2, w
4. Sean ~v = 3
0 0 0 0
1 5
0 3 1 1
(a) Demuestre que ~v y w~ son linealmente independientes y encuentre una base ortonormal
~ ⊆ R4 .
de U = span{~v , w}
(b) Demuestre que ~a, ~b, ~c y d~ son linealmente independientes. Use el proceso de Gram-
Schmidt para encontrar una base ortonormal de U = span{~a, ~b, ~c, d}
~ ⊆ R5 . Encuentre
⊥
una base de U .
D
FT
RA
D
FT
In this chapter we work mostly in Rn and in Cn . We write MR (n × n) or MC (n × n) only if it is
important if the matrix under consideration is a real or a complex matrix.
The first section is dedicated to Cn . We already know that it is a vector space. But now we
introduce an inner product on it. Moreover we define hermitian and unitary matrices on Cn which
are analogous to symmetric and orthogonal matrices in Rn . We define eigenvalues and eigenvectors
in Section 8.3. It turns out that it is more convenient to work over C because the eigenvalues
RA
are zeros of the so-called characteristic polynomial and in C every polynomial has a zero. The
main theorem is Theorem 8.48 which says that an n × n matrix is diagonalisable if it has enough
eigenvectors to generate Cn (or Rn ). It turns out that every symmetric and every hermitian matrix
is diagonalisable.
We end the chapter with an application of orthogonal diagonalisation to the solution of quadratic
equations in two variables.
D
z1
n ..
C = . : z1 , . . . , zn ∈ C
zn
313
314 8.1. Complex vector spaces
FT
zn
w1
~ = ... ∈ Cn the inner product (or scalar product or dot product) is defined as
w
wn
* z1 w1 + n
.. .. X
h~z , wi
~ = . , . = zj wj = z1 w1 + · · · + zn wn .
RA
zn wn j=1
z1
The length of ~z = ... ∈ Rn is denoted by k~zk and it is given by
zn
p
k~zk = |z1 |2 + · · · + |zn |2 .
Other names for the length of ~z are magnitude of ~z or norm of ~z.
D
Exercise 8.2. Show that the scalar product from Definition 8.1 can be viewed as an extension of
the scalar product in Rn in the following sense: If the components of ~z and ~v happen to be real,
then they can also be seen as vectors in Rn . The claim is that their scalar product as vectors in
Rn is equal to their scalar product in Cn . The same is true for their norms.
Properties 8.3. (i) Norm of a vector: For all vectors ~z ∈ Cn , we have that
h~z , ~zi = k~zk2 .
(iii) Sesqulinearity of the inner product: For all vectors ~u, ~v , ~z ∈ Cn and all c ∈ C, we have that
h~v + cw
~ , ~zi = h~v , ~zi + chw
~ , ~zi and h~v , w
~ + c~zi = h~v , wi
~ + ch~v , ~zi.
vn wn zn
FT
= v1 w1 + · · · + vn wn + cw1 w1 + · · · + cwn wn
= h~v , ~zi + chw
~ , ~zi.
The second equation can be shown by an analogous calculation. Instead of repeating them,
we can also use the symmetry property of the inner product:
h~v , w
~ + c~zi = hw
~ + c~z , ~v i = hw
~ , ~v i + ch~z , ~v i = hw
~ , ~v i + ch~z , ~v i = h~v , ~zi + ch~v , ~zi.
RA
(iv) kc~zk2 = hc~z , c~zi = cch~z , ~zi = |c|2 k~zk2 . Taking the square root on both sides, we obtain the
desired equality kc~zk = |c|k~zk.
For Cn there is no cosine theorem and in general it does not make too much sense to speak about
the angle between two complex vectors (orthogonality still makes sense!).
h~ vi
z ,~
(ii) If ~v 6= ~0, then the orthogonal projection of ~z onto ~v is proj~v ~z = k~v k2 ~
v.
Proof. (i) If ~z ⊥ ~v , then k~z +~v k2 = h~z , ~zi + h~z , ~v i + h~v , ~zi + h~v , ~v i = h~z , ~zi + h~v , ~v i = k~zk2 + k~v k2 .
(ii) It is clear that ~z = ~zk + ~z⊥ and that ~zk k ~v by definition of ~zk and ~z⊥ . That ~z⊥ ⊥ ~v follows
from
h~z , ~v i
h~z⊥ , ~v i = h~z − proj~v ~z , ~v i = h~z , ~v i − hproj~v ~z , ~v i = h~z , ~v i − h~v , ~v i = h~z , ~v i − h~z , ~v i = 0.
k~v k2
Finally, by the Pythagoras theorem,
k~zk2 = k(~z − proj~v ~z) + proj~v ~zk2 = k~z − proj~v ~zk2 + k proj~v ~zk2 ≥ k proj~v ~zk2 .
FT
= proj~v ~z1 + c proj~v ~z2 .
Question 8.1
What changes if in the definition of the orthogonal projection we put h~v , ~zi instead of h~z , ~v i?
Now let us show the triangle inequality. Note the the following inequalities (8.1) and (8.2) were
proved for real vector spaces in Corollary 2.20 using the cosine theorem.
RA
Proposition 8.6. For all vectors ~v , w ~ ∈ Cn and c ∈ C, we have the Cauchy-Schwarz inequality
(which is a special case of the so-called Hölder inequality)
|h~v , wi|
~ ≤ k~v k kwk
~ (8.1)
inequality are equal to 0. So let us assume now that w ~ 6= ~0. Note that for any λ ∈ C we have that
~ 2 = h~v − λw
0 ≤ k~v − λwk ~ = kvk2 − λhw
~ , ~v − λwi ~ + |λ|2 kwk2 .
~ , ~v i − λh~v , wi
If we chose λ = − h~
v ,wi
~
~ 2 , we obtain
kwk
h~v , wi
~ h~v , wi
~ ~ 2
|h~v , wi|
0 ≤ kvk2 − hw~ , ~
v i − h~
v , wi
~ + kwk2
~ 2
kwk ~ 2
kwk ~ 4
kwk
~ 2
|h~v , wi| ~ 2
|h~v , wi|
= kvk2 − 2 + kwk2
~ 2
kwk ~ 4
kwk
~ 2
|h~v , wi| 1 h i
= kvk2 − = kvk 2
kwk 2
− |h~
v , wi|
~ 2
~ 2
kwk kwk~ 2
~ 2 = h~v + w
k~v + wk ~ = h~v , ~v i + h~v , wi
~ , ~v + wi ~ + hw
~ , ~v i + hw
~ , wi
~
~ + h~v , wi
= h~v , ~v i + h~v , wi ~ + hw
~ , wi
~
= k~v k2 + 2 Reh~v , wi ~ 2
~ + kwk
≤ k~v k2 + 2|h~v , wi| ~ 2 ≤ k~v k2 + 2k~v k kwk
~ + kwk ~ 2 = (k~v k + kw|)
~ + kwk ~ 2.
In the first inequality we used that Re a ≤ |a| for any complex number a and in the second inequality
we used (8.1). If we take the square root on both sides we get the triangle inequality.
Remark 8.7. Observe that the choice of λ in the proof of (8.1) is not as arbitrary as it may seem.
Note that for this particular λ
FT
h~v , wi
~
~v − λw
~ = ~v − ~ = ~v − projw~ ~v .
w
~ 2
kwk
Hence this choice of λ minimises the norm of ~v −λw
~ and ~v −projw~ ~v ⊥ w.
~ Therefore, by Pythagoras,
In the complex case, we want for a given matrix A ∈ MC (m × n) a matrix A∗ such that
Lemma 8.9. Let A ∈ M (n × n). Then det(A∗ ) = det A = complex conjugate of det A.
Proof. det A∗ = det(A)t = det A = det A. The last equality follows directly from the definition of
the determinant.
A matrix with real entries is symmetric if and only if A = At . The analogue for complex matrices
are hermitian matrices.
FT
• A= =⇒ A∗ = . The matrix A is hermitian.
2 − 3i 5 2 − 3i 1 + 7I
Exercise 8.12. • Show that the entries on the diagonal of a hermitian matrix must be real.
Another important class of real matrices are the orthogonal matrices. Recall that a matrix Q ∈
MR (n × n) is an orthogonal matrix if and only if Qt = Q−1 . We saw that if Q is orthogonal, then
RA
its columns (or rows) form an orthonormal basis for Rn and that | det Q| = 1, hence det Q = ±1.
The analogue in complex vector spaces are so-called unitary matrices.
It is clear from the definition that a matrix is unitary if and only if its columns (or rows) form an
orthonormal basis for Cn , cf. Theorem 7.12.
D
(a) Q is unitary.
(b) hQ~x , Q~y i = h~x , ~y i for all ~x, ~y ∈ Rn .
(c) kQ~xk = k~xk for all ~x ∈ Rn .
Proof. (i) (a) =⇒ (b): Assume that Q is a unitary matrix and let ~x, ~y ∈ Cn . Then
(b) =⇒ (a): Fix ~x ∈ Cn . Then we have hQ~x , Q~y i = h~x , ~y i for all ~y ∈ Cn , hence
0 = hQ~x , Q~y i − h~x , ~y i = hQ∗ Q~x , ~y i − h~x , ~y i = hQ∗ Q~x − ~x , ~y i. = h(Q∗ Q − id)~x , ~y i.
Since this is true for any ~y ∈ Cn , it follows that (Q∗ Q − id)~x = 0. Since ~x ∈ Cn was arbitrary,
we conclude that Q∗ Q − id = 0, in other words, that Q∗ Q = id.
(b) =⇒ (c): It follows from (b) that kQ~xk2 = hQ~x , Q~xi = h~x , ~xi = k~xk2 , hence kQ~xk = k~xk.
(c) =⇒ (b): Observe that the inner product of two vectors in Cn can be expressed completely
in terms of norms as follows
1h i
h~a , ~bi = k~a + ~bk2 − k~a − ~bk2 + ik~a + i~bk2 − ik~a − i~bk2
4
as can be easily verified. Hence we find
1h i
hQ~x , Q~y i = kQ~x + Q~y k2 − kQ~x − Q~y k2 + ikQ~x + iQ~y k2 − ikQ~x − iQ~y k2
4
1h
FT
i
= kQ(~x + ~y )k2 − kQ(~x − ~y )k2 + ikQ(~x + i~y )k2 − ikQ(~x − i~y )k2
4
1h i
= k~x + ~y k2 − k~x − ~y k2 + ik~x + i~y k2 − ik~x − i~y k2
4
= h~x , ~y i.
Ejercicios.
1. Sea
i −3 + i 2i
A= 2 4i 2 + 2i .
4+i 2−i 6−i
Encuentre bases para la imagen del espacio columna y el kernel de A.
FT
2. Sea
1−i 1+i 8
A = 3i −3 −12 + 12i
4+i −1 + 4i 12 + 20i
Encuentre bases ortonormales para la imagen del espacio columna y el kernel de A.
1 i
1 − i 1+i
−3 y v2 = 1 y V = span{v1 , v2 }. Verifique que v1 ⊥ v2 y obtenga
RA
3. Sean v1 =
i 3 + 3i
una base ortonormal de V ⊥ .
4. Encuentre a, b, c, d, e, f ∈ R tales que la matriz
1 3a + ib 1 + 3i
A = 7a − 5ib − 4 3 + ic 5e + 3if + 2i
1 − 3i 4e − 6if + 2 − 8i 4 − ic
D
sea hermitiana.
5. Verifique que la matriz
1 −i −1 + i
1
i 1 1+i
2
1−i 1+i 0
es unitaria.
2 z1 −z2
6. Considere V = C y T : V → V dada por T = .
z2 z1
8. Sean A, B ∈ M (n × n) hermitianas.
Muestre que AB es hermitiana si y solo si AB = BA.
FT
Definition 8.16. Let A, B ∈ M (n × n) be (real or complex) matrices. They are called similar if
there exists an invertible matrix C such that
A = C −1 BC. (8.3)
Exercise 8.17. Show that A ∼ B if and only if there exists an invertible matrix C
e such that
RA
A = CB
e Ce −1 . (8.4)
Question 8.2
Assume that A and B are similar. Is the matrix C in (8.3) unique or is it possible that there are
different invertible matrices C1 6= C2 such that A = C1−1 BC1 = C2−1 BC2 ?
Remark 8.18. Similarity is an equivalence relation on the set of all square matrices. This means
D
(ii) Assume that A1 ∼ A2 . Then there exists an invertible matrix C such that A1 = C −1 A2 C.
Multiplication from the left by C and from the right by C −1 gives CA1 C −1 = A2 . Let
Ce = C −1 . Then C e −1 = C. Hence we obtain C
e is invertible and C e −1 A1 C
e = A2 which shows
that A2 ∼ A1 .
(iii) Transitivity: If A1 ∼ A2 and A2 ∼ A3 , then there exist invertible matrices C1 and C2 such
that A1 = C1−1 A2 C1 and A2 = C2−1 A3 C2 . It follows that
FT
Two matrices A and B ∈ M (n × n) are similar if and only if they represent the same linear
transformation. The matrix C in A = C −1 BC is the transition matrix between the two bases
used in the representations A and B.
det A = det C −1 BC = det(C −1 ) det B det C = (det C)−1 det B det C = det B.
Exercise 8.20. Show that det A = det B does not imply that A and B are similar.
Exercise 8.21. Assume that A and B are similar. Show that dim(ker A) = dim(ker B) and that
dim(Im A) = dim(Im B). Why is this no surprise?
D
Question 8.3
Assume that A and B are similar. What is the relation between ker A and ker B? What is the
relation between Im A and Im B?
Hint. Theorem 6.4.
A very nice class of matrices are the diagonal matrices because it is rather easy to calculate with
them. Closely related are the so-called diagonalisable matrices.
In other words, A is diagonalisable if there exists a diagonal matrix D and an invertible matrix C
with
C −1 AC = D. (8.5)
How can we decide if a matrix A is diagonalisable? We know that it is diagonalisable if and only if
it is similar to a diagonal matrix, that is, if and only if there exists a basis ~c1 , . . . , ~cn such that the
representation of A with respect to these vectors is a diagonal matrix. In this case, (8.5) is satisfied
if the columns of C are the basis vectors ~c1 , . . . , ~cn .
Denote the diagonal entries of D by d1 , . . . , dn . Then it easy to see that D~ej = dj~ej . This means
that if we apply D to some ~ej , then the image D~ej is parallel to ~ej . Since D is nothing else than
the representation of A with respect to the basis ~c1 , . . . , ~cn , we have A~cj = dj ~cj .
We can make this more formal: Take equation (8.5) and multiply both sides from the left by C so
that we obtain AC = CD. Recall that for any matrix B, we have that B~ej = jth column of B. If
we obtain
FT
CD~ej = C(dj~ej ) = dj C(~ej ) = dj ~cj .
In summary, we found:
A matrix A ∈ M (n × n) is diagonalisable if and only we can find a basis ~c1 , . . . , ~cn of Rn (or Cn )
and numbers d1 , . . . , dn such that
A~cj = dj ~cj , j = 1, . . . , n.
RA
In this case C −1 AC = D (or equivalently A = CDC −1 ) where D = diag(d1 , . . . , dn ) and C =
(~c1 | · · · |~cn ).
The vectors ~cj are called eigenvectors of A and the numbers dj are called eigenvalues of A. They
will be discussed in greater detail in the next section where we will also see how we can calculate
them.
Diagonalization of a matrix is very useful when we want to calculate powers of the matrix.
D
−1
If all dj 6= 0, then D is invertible with inverse D−1 = diag(d1 , . . . , dn ) = diag(d−1 −1
1 , . . . , dn ).
−1
Hence A is invertible and A−1 = CDC −1 = CD−1 C −1 and we obtain for k ∈ Z with k < 0
|k|
Ak = A−|k| = (A−1 )|k| = CD−1 C −1 = C(D−1 )|k| C −1 = CDk C −1 = CD−|k| C −1
= C diag(dk1 , . . . , dkn )C −1 .
Proposition 8.23 is useful for example when we describe dynamical systems by matrices or when
we solve linear differential equations with constant coefficients in higher dimensions.
FT
• etc.
You should now be able to
• etc.
Ejercicios.
1 0 −1 12 110 7 1 0 −2
RA
1. Sean A = 0 1 3, B = 5 16 −6 y C = 0 3 1. Verifique que AC =
2 −1 0 6 57 4 1 4 −1
CB y concluya que A, B son matrices semejantes.
2. Encuentre tres matrices que son semejantes a la matriz A del Ejercicio 1.. Para cada una de
ellas, encuentra el determinante y la traza.
(a) Sean A, B ∈ M (n × n) tales que det A = det B, entonces A, B son matrices semejantes.
(b) Sean D1 , D2 ∈ M (n × n) matrices diagonales tales que D1 6= D2 , entonces D1 , D2 no
son matrices semejantes.
(c) Si A, B ∈ M (n × n) son matrices equivalentes por filas entonces A, B son matrices
semejantes.
Muestre que D1 , D2 son semejantes. ¿Como puede generalizar este resultado a matrices
D1 , D2 diagonales de cualquier tamaño?
(b) Muestre que dos matrices diagonales D1 , D2 ∈ M (n × n) son semejantes si, salvo el
orden, tienen exactamente los mismos valores en la diagonal.
FT
7. Encuentre matrices A y b con det A = det B que no son semejantes.
T v = λv. (8.6)
RA
The vector v is then called a eigenvector .
The reason why we exclude v = O in the definition above is because for every λ it is true that
T O = O = λO, so (8.8) would be satisfied for any λ if we were allowed to choose v = O, in which
case the definition would not make too much sense.
Exercise 8.25. Show that 0 is an eigenvalue of T if and only if dim(ker T ) ≥ 1, that is, if and only
if T is not invertible. Show that v is an eigenvector with eigenvalue 0 if and only if v ∈ ker T \{O}.
D
Exercise 8.26. Show that all eigenvalues of a unitary matrix have norm 1.
Question 8.4
Let V, W be vector spaces and let T : V → W be a linear transformation. Why does in not make
sense to speak of eigenvalues of T if V 6= W ?
(iv) We can generalise (iii) as follows: If v1 , . . . , vk are eigenvectors of T with the same eigenvalue
λ, then every non-zero linear combination is an eigenvector with the same eigenvalue because
(iv) says that the set of all eigenvectors with the same eigenvalue is almost a subspace. The only
thing missing is the zero vector O. This motivates the following definition.
Definition 8.27. Let V be a vector space and let T : V → V be a linear map with eigenvalue λ.
Then the eigenspace of T corresponding to λ is
FT
= {v ∈ V : T v = λv}.
v ∈ Eigλ (T ) ⇐⇒ T v = λv ⇐⇒ T v − λv = O ⇐⇒ T v − λ id v = O
⇐⇒ (T − λ id)v = O ⇐⇒ v ∈ ker(T − λ id).
Note that Proposition 8.28 shows again that Eigλ (T ) is a subspace of V . Moreover it shows that
that λ is an eigenvalue of T if and only if T − λ id is not invertible. For the special case λ = 0 we
have that Eig0 (T ) = ker T .
D
Examples 8.29. (a) Let V be a vector space and let T = id. Then for every v ∈ V we have that
T v = v = 1v. Hence T has only one eigenvalue, namely λ = 1 and Eig1 (T ) = ker(T − id) =
ker 0 = V . Its geometric multiplicity is dim(Eig1 (T )) = dim V .
(b) Let V = R2 and let R be reflection on the x-axis. If ~v is an eigenvector of R, then R~v
must be parallel to ~v . This happens if and only if ~v is parallel to the x-axis in which case
R~v = ~v , or if ~v is perpendicular to the x-axis in which case R~v = −~v . All other vectors
change directions under a reflection. Hence we have the eigenvalues λ1 = 1 and λ2 = −1 and
Eig1 (R) = span{~e1 }, Eig−1 (R) = span{~e2 }. Each eigenvalue has geometric multiplicity 1.
Note that
the matrix representation
of R
with respect
to the canonical basis of R2 is AR =
1 0 1 0 x1 x1
and AR ~x = = . Hence AR ~x is parallel to ~x if and only if
0 −1 0 −1 x2 −x2
x1 = 0 (in which case ~x ∈ span{~e2 }) or x2 = 0 (in which case ~x ∈ span{~e1 }).
(c) Let V = R2 and let R be rotation about 90◦ . Then clearly R~v 6k ~v for any ~v ∈ R2 \ {~0}. Hence
R has no eigenvalues.
2
Note that
the matrix representation of R with respect to the canonical basis of R is AR =
0 −1
. If we consider AR as a real matrix, then it has no eigenvalues. However, if consider
1 0
AR as a complex matrix, then it has the eigenvalues ±i as we shall see later.
1 0 0 0 0 0
050000
(d) Let A = 00 00 50 05 00 00 . As always, we identify A with the linear map R6 → R6 , ~x 7→ A~x. It
000080
000000
is not hard to see that the eigenvalues and eigenspaces of A are
FT
Show the claims above.
(e) Let V = C ∞ (R) be the space of all infinitely many times differentiable functions from R
to R and let T : V → V, T f = f 0 . Analogously to Example 6.4 we can show that T is a
linear transformation. The eigenvalues of T are those λ ∈ R such that there exists a function
f ∈ C ∞ (R) with f 0 = λf . We know that for every λ ∈ R this differential equation has a
solution and that every solution is of the form fλ (x) = c eλx for some real number c. Therefore
every λ ∈ R is an eigenvalue of T with eigenspace Eigλ (T ) = span{gλ } where gλ is the function
RA
given by gλ (x) = eλx . In particular, the geometric multiplicity of any λ ∈ R is 1.
(f) Let V = C ∞ (R) be the space of all infinitely many times differentiable functions from R to
R and let T : V → V, T f = f 00 . It is easy to see that T is a linear transformation. The
eigenvalues of T are those λ ∈ R such that there exists a function f ∈ C ∞ (R)√with f 00√= λf .
If λ > 0, then the general solution of this differential
√ equation
√ is fλ (x) = a e λx +b e λx . If
λ < 0, the general solution is fλ (x) = a cos λx + b sin λx. If λ = 0, the general solution is
f0 (x) = ax + b. Hence every λ ∈ R is an eigenvalue of T with geometric multiplicity 2.
D
Find the eigenvalues and eigenspaces if we consider the vector space of infinitely differentiable
functions from R to C.
Theorem 8.30. Let V be a finite dimensional vector space with basis B = {v1 , . . . , vn } and let
T : V → V be a linear transformation. If AT is the matrix representation of T with respect to
the basis B, then the eigenvalues of T and AT coincide and a vector v = c1 v1 + · · · + cn vn is an
c1
eigenvector of T with eigenvalue λ if and only if ~x = ... is an eigenvector of AT with the same
cn
eigenvalue λ. In particular, the dimensions of the eigenspaces of T and of AT coincide.
Proof. Let K = R if V is a real vector space and K = C if V is a complex vector space and
let Φ : V → Rn be the linear map defined by Φ(vj ) = ~ej , (j = 1, . . . , n). That means that Φ
c1
..
“translates” a vector v = c1 v1 + · · · + cn vn into the column vector ~x = . , cf. Section 6.4.
cn
FT
V V
Φ Φ
n AT m
K K
Recall that T = Φ−1 AT Φ. Let λ be an eigenvalue of T with eigenvector v, that is, T v = λv. We
express v as linear combination of the basis vectors from B as v = c1 v1 + · · · + cn vn . Hence
Corollary 8.31. Assume that A and B are similar matrices and let C be an invertible matrix with
D
A = C −1 BC. Then A and B have the same eigenvalues and for every eigenvalue λ we have that
Eigλ (B) = C Eigλ (A).
Now back to the question about how to calculate the eigenvalues and eigenvectors of a given matrix
A. Recall that λ is an eigenvalue of A if and only if ker(A − λ id) 6= {~0}, see Proposition 8.28. Since
A − λ id is a square matrix, this is the case if and only if det(A − λ id) = 0.
Definition 8.32. The function λ 7→ det(A − λ id) is called the characteristic polynomial of A. It
is usually denoted by pA .
Before we discuss the characteristic polynomial and show that it is indeed a polynomial, we will
describe how to find the eigenvalues and eigenvectors of a given square matrix A.
• Find the zeros λ1 , . . . , λk of the characteristic polynomial. They are the eigenvalues of A.
• For each eigenvalue λj calculate ker(A − λj ), for instance using Gauß-Jordan elimination.
This gives the eigenspaces.
2 1
Example 8.33. Find the eigenvalues and eigenspaces of A = .
3 4
Solution. • The characteristic polynomial of A is
2−λ 1
pA (λ) = det(A − λ id) = det = (2 − λ)(4 − λ) − 3 = λ2 − 6λ + 5.
3 4−λ
• Now we can either complete the square or use the solution formula for quadratic equations to
FT
find the zeros of pA . Here we choose to complete the square.
pA (λ) = λ2 − 6λ + 5 = (λ − 3)2 − 4 = (λ − 5)(λ − 1).
Hence the eigenvalues of A are λ1 = 5 and λ2 = 1.
• Now we calculate the eigenspaces using Gauß elimination.
2−5 1 −3 1 R2 →R2 +R1 −3 1 R1 →−R1 3 −1
∗ A − 5 id = = −−−−−−−−→ −−−−−−→ .
3 4−5 3 −1 0 0 0 0
RA
1
Therefore, ker(A − 5 id) = span .
3
2−1 1 1 1 R2 →R2 −3R1 1 1
∗ A − 1 id = = −−−−−−−−→ .
3 4−1 3 3 0 0
1
Therefore, ker(A − 1 id) = span .
−1
In summary, we have two eigenvalues,
D
1
λ1 = 5, Eig5 (A) = span , geom. multiplicity: 1,
3
1
λ2 = 1, Eig1 (A) = span , geom. multiplicity: 1.
−1
1 1
If we set ~v1 = and ~v2 = we can check our result by calculating
3 −1
2 1 1 5 1
A~v1 = = =5 = 5~v1 ,
3 4 3 15 3
2 1 1 1
A~v2 = = = ~v2 .
3 4 −1 −1
Before we give more examples, we show that the characteristic polynomial is indeed a polynomial.
First we need a definition.
Definition 8.34. Let A = (aij )ni,j=1 ∈ M (n × n). The trace of A is the sum of its entries on the
diagonal:
tr A := a11 + a22 + . . . ann .
Remark. Note that exercise 5. of section 8.2 shows that the trace of similar matrices coincides,
so if V is a finite-dimensional space and T is a linear transformation, it makes sense to define tr T
as the trace of the matrix representation of T in any base of V.
Theorem 8.35. Let A = (aij )ni,j=1 ∈ M (n × n) and let pA (λ) = det(A − λ id) be the characteristic
polynomial of A. Then the following is true.
(i) pA is a polynomial of degree n.
(i) Let pA (λ) = cn λn + cn−1 λn−1 + · · · + c1 λ + c0 . Then we have formulas for the coefficients
FT
cn , cn−1 and c0 :
Proof. By definition,
a11 − λ a12 a1n
a
21 a22 − λ a2n
RA
pA (λ) = det(A − λ id) = det
.
an1 an2 ann − λ
According to Remark 4.4, the determinant is the sum of products where each product consists of a
sign and n factors chosen such that it contains one entry from each row and from each column of
A − λ id. Therefore it is clear that pA is a polynomial in λ. The term with the most λ in it is the
one of the form
D
pA (λ) = (−1)n λn + (−1)n−1 λn−1 (a11 + · · · + ann ) + terms of order at most n − 2, (8.9)
hence deg(pA ) = n.
Formula (8.9) also shows the claim about cn and cn−1 . The formula for c0 follows from
Proof. Let A ∈ M (n × n). Then the eigenvalues of A are exactly the zeros of its characteristic
polynomial. Since it has degree n, it can have at most n zeros.
FT
Now we understand why working with complex vector spaces is more suitable when we are interested
in eigenvalues. They are precisely the zeros of the characteristic polynomial. While a polynomial
may not have real zeros, it always has zeros when we allow them to be complex numbers. Indeed,
any polynomial can always be factorised over C.
Let A ∈ M (n × n) and let pA be its characteristic polynomial. Then there exist complex numbers
λ1 , . . . , λk and integers m1 , . . . , mk ≥ 1 such that
Definition 8.37. The integer mj is called the algebraic multiplicity of the eigenvalue λj .
0 0 0 0 0 0 0 4 1 0 0 0
0 4 1 0 0 0 permute rows 004100
• A − 1 id = 00 0
0
4
0
1
4
0
0
0 −
0 −−−−−−−→ 00 00 00 40 07 00 . This matrix is in row echelon form and
0 0 0 0 7 0 000007
0 0 0 0 0 7 000000
we can see easily that Eig1 (A) = ker(A − 1 id) = span{~e1 } which has dimension 1.
−4 0 0 0 0 0 −4 0 0 0 0 0
0 0 1 0 0 0 permute rows 0 0 1 0 0 0
• A − 5 id = 0
0
0
0
0
0
1
0
0
0
0 −
0 −−−−−−−→ 0
0
0
0
0
0
1
0
0
3
0 .
0 This matrix is in row echelon form
0 0 0 0 3 0 0 0 0 0 0 3
0 0 0 0 0 3 0 0 0 0 0 0
and we can see easily that Eig5 (A) = ker(A − 5 id) = span{~e2 } which has dimension 1.
−7 0 0 0 0 0
0 −3 1 0 0 0
• A − 8 id = 0 0 −3 1 0 0 .
0 0 0 −3 0 0
This matrix is in row echelon form and we can see easily that
0 0 0 000
0 0 0 000
Eig8 (A) = ker(A − 8 id) = span{~e5 , ~e6 } which has dimension 2.
In summary, we have
λ1 = 1, Eig1 (A) = span{~e1 }, geom. multiplicity: 1, alg. multiplicity: 1,
FT
λ2 = 5, Eig5 (A) = span{~e2 }, geom. multiplicity: 1, alg. multiplicity: 3,
λ3 = 8, Eig8 (A) = span{~e6 , ~e7 }, geom. multiplicity: 2, alg. multiplicity: 2.
0 −1
Example 8.40. Find the complex eigenvalues and eigenspaces of R = .
1 0
Solution. From Example 8.29 we already know that R has no real eigenvalues. The characteristic
polynomial of R is
RA
−λ −1
pR (λ) = det(R − λ) = det = λ2 + 1 = (λ − i)(λ + i).
1 −λ
Hence the eigenvalues are λ1 = −i and λ2 = i. Let us calculate the eigenspaces.
i −1 R2 →R2 +iR1 i −1 1
• R−(−i) id = −−−−−−−−→ . Hence Eig−i (R) = ker(R+i id) = span .
1 i 0 0 i
−i −1 R2 →R2 +iR1 −i −1 1
• R−i id = −−−−−−−−→ . Hence Eigi (R) = ker(R−i id) = span .
1 −i 0 0 −i
D
2 1
Example 8.41. Find the diagonalisation of A = .
3 4
Solution. We need to find an invertible matrix C and a diagonal matrix D such that D = C −1 AC.
By Example 8.33, A has the eigenvalues λ1 = 5 and λ2 = 1, hence A is indeed diagonalisable. We
know that the diagonal entries of D are the eigenvalues
of A,hence
D = diag(5, 1) and the columns
1 1
of C are the corresponding eigenvalues ~v1 = and ~v2 = , hence
3 −1
5 0 1 1
D= , C= and D = C −1 AC.
0 1 3 −1
Alternatively, we could have chosen D e = diag(1, 5). Then the corresponding C e = (~v2 |~v1
e is C
because the jth column of the invertible matrix must be an eigenvector corresponding the the jth
entry of the diagonal matrix, hence
1 0 1 1 e −1 AC.
D=
e , C=
e and D e =C e
0 5 −1 3
Observe that up to ordering the diagonal elements, the matrix D is uniquely determined by A. For
the matrix C however we have more choices. For instance, if we multiply each column of C by an
arbitrary constant different from 0, it still works.
FT
2 0 0 0
0 1 1 0
AT = 0 1 1 0
0 0 0 2
This means
that the eigenvalues of T are 0 and 2 and that the eigenspaces are Eig0 (T ) = span {M2 − M3 } =
0 1
span and
−1 0
0 1
Eig0 (T ) = span {M2 − M3 } = span = Masym (2 × 2),
−1 0
1 0 0 1 0 0
Eig2 (T ) = span {M1 , M2 + M3 , M4 } = span , , = Msym (2 × 2),
0 0 1 0 0 1
Remark. We could have calculated the eigenspaces or T directly without calculating those of AT
first as follows.
• A matrix M belongs to Eig0 (T ) if and only if T (M ) = 0. This is the case if and only if
M + M t = 0 which means that M = −M t . So Eig0 (T ) is the space of all antisymmetric 2 × 2
matrices.
• A matrix M belongs to Eig2 (T ) if and only if T (M ) = 2M . This means that M + M t = 2M .
FT
This is the case if and only if M = M t . So Eig0 (T ) is the space of all symmetric 2×2 matrices.
Ejercicios.
1. Para las siguientes matrices, encuentre los vectorios propios, los espacios propios, una matriz
invertible C y una matriz diagonal D tal que C −1 AC = D.
−3 5 −20 −2 0 1 −2 0 −1 1 0 0
A1 = 2 0 8 , A2 = 0 2 0 , A3 = 0 2 0 , A4 = 3 2 0 .
2 1 7 9 0 6 9 0 6 1 3 2
4. Sea
3 −1
A= .
−2 4
Calcule los valores propios de T y muestre que T es diagonalizable. (Hint: basta escoger una
FT
base adecuada de R2 ).
9. Sea A ∈ M (n × n). Muestre que A y At tienen los mismos valores propios. (Hint: analice el
polinomio caracterı́stico)
10. Sea ~v ∈ R3 un vector no nulo y T : R3 → R3 dada por T (~x) = ~v × ~x. Muestre que 0 es el
único valor propio real.
(a) Muestre que λ idn −A es invertible para todo λ ∈ C − {0}. (Hint: La prueba es la misma
del ejercicio 28. sección 3.5)
(b) Encuentre el polinomio caracterı́stico de A. Observe que en ejercicio 2., D4 = O.
14. Sea A ∈ M (n × n) una matriz tal que A2 = A, muestre que A es diagonalizable. (Hint: Por
el ejercicio 7. de la sección 6.2; elija una base de ker A y complétela a una base de Rn por
medio de una base de Im A)
15. Sea A ∈ M (n×n) distinta de la matriz nula y T : M (n×n) → M (n×n) dada por T (X) = XA.
Muestre que A y T tienen los mismos valores propios.
16. Sea A ∈ M (n × n) tal que todos sus valores propios son 0. ¿Se puede concluir que A = O?
¿Cómo cambia la respuesta si suponemos que A es diagonalizable?
FT
8.4 Properties of the eigenvalues and eigenvectors
In this section we collect important properties of eigenvectors.
α1 α1 α1 α1
Since λ1 6= λ2 and ~v1 6= ~0, the last equality is false and therefore we must have α1 = 0. Then,
by (8.10), ~0 = α1~v1 + α2~v2 = α2~v2 , hence also α2 = 0 which proves that ~v1 and ~v2 are linearly
independent.
Induction step: Assume that we already know for some j < k that the vectors ~v1 , . . . , ~vj are linearly
independent. We have to show that then also the vectors ~v1 , . . . , ~vj+1 are linearly independent. To
this end, let α1 , α2 , . . . , αj+1 such that
On the one hand we apply A on both sides of the equation and use the fact that vectors are
eigenvectors. On the other hand we multiply both sides by λj+1 and then we compare the two
results.
Note that the term with ~vj+1 cancelled. By the induction hypothesis, the vectors ~v1 , . . . , ~vj are
linearly independent, hence
α1 = 0,
FT
We also know that λj+1 is not equal to any of the other λ` , hence it follows that
α2 = 0, ..., αj = 0.
Inserting this in (8.11) gives that also αj+1 = 0 and the proof is complete.
Note that the proposition shows again that an n×n matrix can have at most n different eigenvalues.
RA
Corollary 8.44. Let A ∈ M (n × n) and let µ1 . . . , µk be the different eigenvalues of A. If in each
Eigµj (A) we choose linearly independent vectors ~v1j , . . . , ~v`j1 , then the system of all those vectors
is linearly independent. In particular, if we choose bases in Eigµj (A), we see that the sum of
eigenspaces is a direct sum
Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)
and dim(Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)) = dim(Eigµ1 (A) + · · · + dim Eigµk (A)).
D
(m)
Proof. Let αj be numbers such that
=w
~1 + w
~2 + . . . w
~k
(j) (j)
~ j = α1 ~v1j + · · · + α`1 ~v`j1 ∈ Eigµj . Proposition 8.43 implies that w
with w ~ k = ~0.
~1 = · · · = w
(m) (m) (m)
But then also all coefficients αj = 0 because for fixed m, the vectors ~v1 , . . . , ~v`m are linearly
independent. Now all the assertions are clear.
d1
Theorem 8.45. (i) Let D = diag(d1 , . . . , dn ) =
0 be a diagonal matrix. Then
0 dn
the eigenvalues of D are precisely the numbers d1 , . . . , dn and the geometric multiplicity of
each eigenvalue is equal to its algebraic multiplicity.
d1 ∗ d
(ii) Let B =
and C =
1 0 be upper and lower triangular matrices
∗
0 dn
dn
respectively. Then the eigenvalues of D are precisely the numbers d1 , . . . , dn and the algebraic
multiplicity of an eigenvalue is equal to the number of times it appears on the diagonal. In
general, nothing can be said about the geometric multiplicities.
Proof. (i) Since the determinant of a diagonal matrix is the product of its diagonal elements, we
obtain for the characteristic polynomial of D
FT
d 1 − λ
0
pD (λ) = det(D − λ) = det = (d1 − λ) · · · · · (dn − λ).
0
dn − λ
Since the zeros of the characteristic polynomial are the eigenvalues of D, we showed that the
RA
numbers on the diagonal of D are precisely its eigenvalues. The algebraic multiplicity of an
eigenvalue µ is equal to the number of times it is repeated on the diagonal of D. The algebraic
multiplicity of µ is equal to dim(ker(D − µ id). Note that D − µ id is a diagonal matrix and
the jth entry on its diagonal is 0 if and only if µ = dj . it is not hard to see that the dimension
of the kernel of a diagonal matrix is equal to the number of zeros on its diagonal. So, in
summary we have for an eigenvalue µ of A:
(ii) Since the determinant of a triangular matrix is the product of its diagonal elements, we obtain
for the characteristic polynomial of B
d − λ
1
∗
pB (λ) = det(B − λ) = det = (d1 − λ) · · · · · (dn − λ).
0
dn − λ
and analogously for C. The reasoning for the algebraic multiplicities of the eigenvalues is as
in the case of a diagonal matrix. However, in general the algebraic and geometric multiplicity
of an eigenvalue of a triangular matrix may be different as Example 8.39 shows.
5 0 0 0 0 0
0 1 0 0 0 0
0 0 5 0 0 0
Example 8.46. Let D = . Then pD (λ) = (1 − λ)(5 − λ)3 (5 − λ)2 .
0 0 0 8 0 0
0 0 0 0 8 0
0 0 0 0 0 5
The eigenvalues are 1 (with geom. mult = alg. mult = 1), 5 (with geom. mult = alg. mult = 3)
and 8 (with geom. mult = alg. mult = 2),
Theorem 8.47. If A and B are similar matrices, then they have the same characteristic polyno-
mial. In particular, they have the same eigenvalues with the same algebraic multiplicities. Moreover,
also the geometric multiplicities are equal.
A − λ id = C −1 BC − λ id = C −1 BC − λC −1 C = C −1 (B − λ id)C
FT
and we obtain for the characteristic polynomial of A
pA (λ) = det(A − λ id) = det(C −1 (B − λ id)C) = det(C −1 ) det(B − λ id) det C = det(B − λ id)
= pB (λ).
This shows that A and B have the same eigenvalues and that their algebraic multiplicities coincide.
where in the second to last step we used that C −1 is invertible. The invertibility of C −1 also shows
that dim(C −1 Eigµ (B)) = dim(Eigµ (B), hence dim Eigµ (A) = dim(Eigµ (B), which proves that the
geometric multiplicity of µ as eigenvalue of A is equal to that of B.
D
(i) A is diagonalisable, that means that there exists a diagonal matrix D and an invertible matrix
C such that C −1 AC = D.
(ii) For every eigenvalue of A, its geometric and algebraic multiplicities are equal.
(iii) A has a set of n linearly independent eigenvectors.
(iv) Kn has a basis consisting of eigenvectors of A.
Proof. Let µ1 , . . . , µk be the different eigenvalues of A and let us denote the algebraic multiplicities
of µj by mj (A) and mj (D) and the geometric multiplicities by nj (A) and nj (D).
(i) =⇒ (ii): By assumption A and D are similar so they have the same eigenvalues by Theorem 8.47
and
mj (A) = mj (D) and nj (A) = nj (D) for all j = 1, . . . , k,
and Theorem 8.45 shows that
(ii) =⇒ (iii): Recall that the geometric multiplicities nj (A) are the dimensions of the kernel of
A − µj id. So in each ker(A − µj ) we may choose a basis consisting of nj (A) vectors. In total we
have n1 (A)+· · ·+nk (A) = m1 (A)+· · ·+mk (A) = n such vectors and they are linearly independent
by Corollary 8.44.
(iii) =⇒ (iv): This is clear because dim Kn = n.
FT
(iv) =⇒ (i): Let B = {~c1 , . . . , ~cn } be a basis of Kn consisting of eigenvectors of A and let d1 , . . . , dn
be the corresponding eigenvalues, that is, A~cj = dj ~cj . Note that the dj are not necessarily pairwise
different. Then the matrix C = (~c1 | · · · |~cn ) is invertible and C −1 AC is the representation of A in
the basis B, hence C −1 AC = diag(d1 , . . . , dn ). In more detail, using that ~cj = C~ej and C −1~cj = ~ej ,
Proof. If A has n different eigenvalues λ1 , . . . , λn , then for each of them the algebraic multiplicity
is equal to 1. Moreover,
D
for each eigenvalue. Hence the algebraic and the geometric multiplicity for each eigenvalue are equal
(both are equal to 1) and the claim follows from Theorem 8.48.
Corollary 8.50. If the matrix A ∈ M (n × n) is diagonalisable, then its determinant is equal to the
product of its eigenvalues.
Proof. Let λ1 , . . . , λn be the (not necessarily different) eigenvalues of A and let C be an invertible
matrix such that C −1 AC = D := diag(λ1 , . . . , λn ). Then
n
Y
det A = det(CDC −1 ) = (det C)(det D)(det C −1 ) = det D = λj .
j=1
Proof. Let us denote the algebraic multiplicity of each µj by mj (A) and its geometric multiplicity
by nj (A).
If A is diagonalisable, then the geometric and algebraic multiplicities are equal for each eigenvalue.
Hence
dim(Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A)) = dim(Eigµ1 (A)) + · · · + dim(Eigµk (A))
= n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n.
Since every n-dimensional subspace of Kn is equal to Kn , (8.12) is proved.
Now assume that (8.12) is true. We have to show that A is diagonalisable. In each Eigµj we choose
a basis Bj . By (8.12) the collection of all those basis vectors form a basis of Kn . Therefore we found
FT
a basis of Kn consisting of eigenvectors of A. Hence A is diagonalisable by Theorem 8.48.
The above theorem says that A is diagonalisable if and only if there are enough eigenvectors of
A to span Kn . This is the case if and only if Kn splits in the direct sum of subspaces on each of
which A acts simply by multiplying each vector with the number (namely with the corresponding
eigenvalue).
To practice a bit the notions of algebraic and geometric multiplicities, finish this section with an
alternative proof of Theorem 8.48.
RA
Alternative proof of Theorem 8.48. Let us prove (i) =⇒ (iv) =⇒ (iii) =⇒ (ii) =⇒ (i).
(i) =⇒ (iv): This was already discussed after Definition 8.22. Let D = diag(d1 , . . . , dn ) and let ~c1 , . . . , ~cn
be the columns of C. Clearly they form a basis of Kn because C is invertible. By assumption we know
that AC = CD. Hence we have that
A~cj = jth column of AC = jth column of CD = dj · (jth column of C) = dj ~cj .
Therefore the vectors ~c1 , . . . , ~cn are linearly independent eigenvectors of A and they form a basis of Kn .
(iv) =⇒ (iii): This is clear because a basis of Kn consists of n linearly independent vectors.
(iii) =⇒ (ii): If A has n linearly independent eigenvectors, then n1 (A) + · · · + nk (A) ≥ n, while always nj (A) ≤ mj (A) and m1 (A) + · · · + mk (A) = n. Hence (n1 (A) − m1 (A)) + · · · + (nk (A) − mk (A)) ≥ 0.
Since nj (A) − mj (A) ≤ 0 for all j = 1, . . . , k, each of the terms must be zero, which shows that nj (A) = mj (A) for all j, as desired.
(ii) =⇒ (i): For each j = 1, . . . , k let us choose a basis Bj of Eigµj (A). Observe that each basis has
nj (A) vectors. By Corollary 8.44, the system consisting of all these basis vectors is linearly independent.
Moreover, the total number of these vectors is n1 (A) + · · · + nk (A) = m1 (A) + · · · + mk (A) = n where we
used the assumption that the algebraic and geometric multiplicities are equal for each eigenvalue. Hence
the collection of all those vectors forms a basis of Kn . That A is diagonalisable follows now as in the proof
of (iv) =⇒ (i).
Exercises.
T (A) = 1/2 (A − At ). Show that T is diagonalisable.
4. Let A, B ∈ M (n × n).
is it diagonalisable?
7. Let A ∈ M (n × n) be diagonalisable and let d1 , d2 , . . . , dk be all its distinct eigenvalues.
Show that (A − d1 idn )(A − d2 idn ) · · · (A − dk idn ) = On×n . Does the statement remain true
if we do not assume that A is diagonalisable?
8. Let A ∈ M (n × n) be upper or lower triangular. What is the characteristic polynomial of A?
Can you give conditions for when A is diagonalisable?
9. Let A, B, C ∈ M (2 × 2) and let
V = [ A  C ; O2×2  B ].
(a) Show that the characteristic polynomial of V is the product of the characteristic polynomials of A and B.
(b) If C = O2×2 , show that V is diagonalisable if and only if A and B are diagonalisable.
(c) Is the conclusion of the previous part still true if we do not assume that C = O2×2 ?
In this section we show that the eigenvalues of a hermitian matrix are real, that eigenvectors corresponding to different eigenvalues
are orthogonal and that every hermitian matrix is diagonalisable. Note that real symmetric matrices
are a special case of hermitian ones, so whenever we show something about hermitian matrices, the
same is true for symmetric matrices.
Theorem 8.52. All eigenvalues of a hermitian matrix A are real.
Proof. Let A be hermitian, that is, A∗ = A, and let λ be an eigenvalue of A with eigenvector ~v .
Then ~v ≠ ~0 and A~v = λ~v . We have to show that λ = λ̄. Indeed,
λ k~v k2 = λ h~v , ~v i = hλ~v , ~v i = hA~v , ~v i = h~v , A∗~v i = h~v , A~v i = h~v , λ~v i = λ̄ h~v , ~v i = λ̄ k~v k2 .
Since ~v ≠ ~0, it follows that λ = λ̄, which means that the imaginary part of λ is 0, hence λ ∈ R.
Theorem 8.53. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A with
eigenvectors ~v1 and ~v2 , that is A~v1 = λ1~v1 and A~v2 = λ2~v2 . Then ~v1 ⊥ ~v2 .
Proof. The proof is similar to the proof of Theorem 8.52. We have to show that h~v1 , ~v2 i = 0. Note
that by Theorem 8.52, the eigenvalues λ1 , λ2 are real. Hence
λ1 h~v1 , ~v2 i = hλ1~v1 , ~v2 i = hA~v1 , ~v2 i = h~v1 , A∗~v2 i = h~v1 , A~v2 i = h~v1 , λ2~v2 i = λ̄2 h~v1 , ~v2 i = λ2 h~v1 , ~v2 i.
Since λ1 6= λ2 by assumption it follows that h~v1 , ~v2 i = 0.
Corollary 8.54. Let A be a hermitian matrix and let λ1 , λ2 be two different eigenvalues of A.
Then Eigλ1 (A) ⊥ Eigλ2 (A).
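Both facts are easy to observe numerically. A minimal Python/NumPy sketch (the hermitian matrix below is an arbitrary example chosen only for illustration):

import numpy as np

A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])                      # A* = A, i.e. A is hermitian
lam, V = np.linalg.eig(A)                           # columns of V are eigenvectors
print(np.allclose(lam.imag, 0))                     # True: all eigenvalues are real
print(np.isclose(np.vdot(V[:, 0], V[:, 1]), 0))     # True: eigenvectors for different eigenvalues are orthogonal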
The next theorem is one of the most important theorems in Linear Algebra.
Theorem 8.55. Every hermitian matrix is diagonalisable.
Theorem 8.55*. Every symmetric matrix is diagonalisable.
Theorem 8.57. A matrix is hermitian if and only if it is unitarily diagonalisable, that is, there
exist a unitary matrix Q and a real diagonal matrix D such that D = Q−1 AQ = Q∗ AQ. Similarly,
a real matrix is symmetric if and only if it is orthogonally diagonalisable, that is, there exist an
orthogonal matrix Q and a diagonal matrix D such that D = Q−1 AQ = Qt AQ.
In both cases, D = diag(λ1 , . . . , λn ) where the λ1 , . . . , λn are the eigenvalues of A and the columns
of Q are the corresponding eigenvectors.
Proof. Let A be a hermitian matrix. From Theorem 8.55 we know that A is diagonalisable. Hence
Cn has a basis consisting of eigenvectors of A. By Corollary 8.54 the eigenspaces of A are mutually
orthogonal, so applying the Gram-Schmidt process inside each eigenspace we obtain an orthonormal
basis of Cn consisting of eigenvectors of A. If Q is the matrix whose columns are these vectors, then
Q is unitary and Q∗ AQ = Q−1 AQ is diagonal with the eigenvalues of A on the diagonal.
Corollary 8.59. If a matrix A is hermitian (or symmetric), then its determinant is the product
of its eigenvalues.
Proof. This follows from Theorem 8.55 (or Theorem 8.55*) and Corollary 8.50.
Proof of Theorem 8.55. Let A ∈ MC (n × n) be a hermitian matrix and let µ1 , . . . , µk be the
different eigenvalues of A with geometric multiplicities n1 , . . . , nk . By Theorem 8.51 it suffices to
show that
Cn = Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A).
Let us denote the right hand side by U , that is, U := Eigµ1 (A) ⊕ · · · ⊕ Eigµk (A). Then we have
to show that U ⊥ = {~0}. For the sake of a contradiction, assume that this is not true and let
` = dim(U ⊥ ) ≥ 1. In each Eigµj (A) we choose an orthonormal basis ~v1 (j) , . . . , ~vnj (j) and we choose an
orthonormal basis w~ 1 , . . . , w~ ` of U ⊥ . The set of all these vectors is an orthonormal basis B of
Cn because all the eigenspaces are orthogonal to each other and to U ⊥ . Let Q be the matrix
whose columns are these vectors: Q = (~v1 (1) | · · · |~vnk (k) | w~ 1 | · · · | w~ ` ). Then Q is a unitary matrix
because its columns are an orthonormal basis of Cn . Next let us define B = Q−1 AQ. Then B is
again hermitian and it has the block form B = [ D0  0 ; 0  C ], where D0 is the diagonal matrix
containing the eigenvalues µ1 , . . . , µk , each repeated according to its geometric multiplicity.
All the empty spaces are 0 and C is an ` × ` matrix (it is the matrix representation of the restriction
of A to U ⊥ ). Since C is hermitian, it has an eigenvalue λ with an eigenvector ~y ∈ C` \ {~0}. Then
~u := Q (0, . . . , 0, y1 , . . . , y` )t is an eigenvector of A which lies in U ⊥ . But every eigenvector of A lies
in one of the eigenspaces Eigµj (A) ⊆ U , so ~u ∈ U ∩ U ⊥ = {~0}, a contradiction. Hence U ⊥ = {~0}
and the theorem is proved.
Exercises.
1. Orthogonally diagonalise the following matrices:
(a) [ 1  −2 ; −2  4 ],
(b) [ 6  2  4 ; 2  3  2 ; 4  2  9 ],
(c) [ 2  −1  0 ; −1  3  −1 ; 0  −1  2 ],
(d) [ 1  −3  0  0 ; −3  1  0  0 ; 0  0  3  0 ; 0  0  0  3 ].
2. (b) It is known that ker(A − id) = span{3~e2 − 4~e3 }. Find the eigenspaces of A.
(c) Find a matrix A which satisfies the above.
3. Diagonalise
A = [ 5  3(1 + i) ; 3(1 − i)  2 ].
4. (a) Give a symmetric matrix whose kernel is the plane x − 3y + 2z = 0. What must the
image of the matrix you chose be?
(b) Give a symmetric matrix whose image is the plane 5x − y + z = 0. What is its
kernel?
(c) Characterise all matrices in M (3 × 3) that have exactly one eigenvalue.
V = [ A  C t ; C  B ].
8. Let A, B ∈ Msym (2 × 2) and let V = [ A  O ; O  B ], where O is the zero matrix. Show that
V ∈ Msym (4 × 4) and that, moreover, the eigenvalues of V are the eigenvalues of A together with
the eigenvalues of B.
Let us first recall the situation for a single linear equation in two unknowns,
ax + by = d (8.13)
with constants a, b, d. A solution is a tuple (x, y) which satisfies (8.13). We can view the set of all
solutions as a subset in the plane R2 . Since (8.13) is a linear equation (a 1 × 2 system of linear
equations), we know that we have the following possibilities for the solution set:
(a) a line if a 6= 0 or b 6= 0,
(b) the plane R2 if a = 0, b = 0 and d = 0,
(c) the empty set (no solution) if a = 0, b = 0 and d 6= 0,
Now we pass to the quadratic equation
ax2 + bxy + cy 2 = d (8.14)
with constants a, b, c, d.
In the following we will always assume that d ≥ 0. This is no loss of generality because if d < 0,
we can multiply both sides of (8.14) by −1 and replace a, b, c by −a, −b, −c. The set of solutions
does not change.
Again, we want to identify the solutions with subsets in R2 and we want to find out what type of
figures they are. The equation (8.14) is not linear, so we have to see what relation (8.14) has with
what we studied so far. It turns out that the left hand side of (8.14) can be written as an inner
product
hG ( x, y )t , ( x, y )t i with G = [ a  b/2 ; b/2  c ]. (8.15)
Question 8.5
The matrix G from (8.15) is not the only possible choice. Find all possible matrices G such that
hG( xy ) , ( xy )i = ax2 + bxy + cy 2 .
The matrix G is very convenient because it is symmetric. This means that up to an orthogonal
transformation, it is a diagonal matrix. So once we know how to solve the problem when G is
diagonal, then we know it for the general case since the solutions differ only by a rotation and
maybe a reflection. This motivates us to first study the case when G is diagonal, that is, when
b = 0.
Quadratic equation without mixed term (b = 0). In this case (8.14) reduces to
ax2 + cy 2 = d. (8.16)
Remark 8.60. The solution set is symmetric with respect to the x-axis and the y-axis because if
some (x, y) is a solution of (8.16), then so are (−x, y) and (x, −y).
Let us define α := √|a| and γ := √|c|, hence α2 = a if a ≥ 0 and α2 = −a if a < 0, and similarly
γ 2 = c if c ≥ 0 and γ 2 = −c if c < 0.
We have to distinguish several cases according to whether the coefficients a, c are positive, negative
or 0.
Case 1.1: a > 0 and c > 0. In this case, the equation (8.16) becomes
α2 x2 + γ 2 y 2 = d. (8.16.1.1)
(i) If d > 0, then (8.16.1.1) is the equation of an ellipse whose axes are parallel to the x- and
the y-axis. The intersections with the x-axis are at ±√d/α = ±√(d/a) and the intersections with
the y-axis are at ±√d/γ = ±√(d/c).
(ii) If d = 0, then the only solution of (8.16.1.1) is the point (0, 0).
Figure 8.1: Solution of (8.16) for det G > 0. If a > 0, c > 0, then the solution is an ellipse (if d > 0)
or the point (0, 0) (if d = 0). The right picture shows ellipses with a and c fixed but decreasing d (from
red to blue). If a < 0, c < 0, d > 0, then there is no solution.
Case 1.2: a < 0 and c < 0. In this case, the equation (8.16) becomes
− α2 x2 − γ 2 y 2 = d. (8.16.1.2)
(i) If d > 0, then (8.16.1.2) has no solution because the left hand side is always less than or equal to
0 while the right hand side is strictly positive.
(ii) If d = 0, then the only solution of (8.16.1.2) is the point (0, 0) .
Case 2.1: a > 0 and c < 0. In this case, the equation (8.16) becomes
α2 x2 − γ 2 y 2 = d. (8.16.2.1)
(i) If d > 0, then (8.16.2.1) is the equation of a hyperbola. If x = 0, the equation has no
solution. Indeed, we need |x| ≥ √d/α for the equation to have a solution. Therefore the
hyperbola does not intersect the y-axis (in fact, the hyperbola cannot pass through the strip
−√d/α < x < √d/α).
• Intersection with the coordinate axes: No intersection with the y-axis. Intersection with
the x-axis at x = ±√d/α = ±√(d/a).
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes
y = ± (α/γ) x.
Note that the asymptotes do not depend on d.
Proof. It follows from (8.16.2.1) that |x| → ∞ if and only if |y| → ∞ because otherwise
the difference α2 x2 − γ 2 y 2 cannot be constant. Dividing (8.16.2.1) by x2 and by γ 2 and
rearranging leads to
y 2 /x2 = α2 /γ 2 − d/(γ 2 x2 ) ≈ α2 /γ 2 for large |x|, hence y ≈ ± (α/γ) x.
(ii) If d = 0, then (8.16.2.1) becomes α2 x2 − γ 2 y 2 = 0, and its solution is the pair of lines y = ± (α/γ) x.
Figure 8.2: Solution of (8.16) for det G < 0. The solutions are hyperbolas (if d > 0) or a pair of
intersecting lines (if d = 0). The left picture shows a solution for a > 0, c < 0 and d > 0. The right
picture shows hyperbolas for fixed a and c but decreasing d. The blue pair of lines passing through the
origin corresponds to the case d = 0.
Remark 8.62. Note that the intersection points of the hyperbola with the x-axis are proportional
to √d. Hence as d decreases, the intersection points move closer to 0 and the turn becomes sharper.
If d = 0, the intersection points reach 0 and the hyperbola degenerates into two crossing lines.
Case 2.2: a < 0 and c > 0. In this case, the equation (8.16) becomes
− α2 x2 + γ 2 y 2 = d. (8.16.2.2)
This case is the same as Case 2.1, only with the roles of x and y interchanged. So we find:
• Asymptotics: For |x| → ∞ and |y| → ∞, the hyperbola has the asymptotes y = ± (α/γ) x.
(ii) If d = 0, then (8.16.2.2) becomes −α2 x2 + γ 2 y 2 = 0, and its solution is the pair of lines y = ± (α/γ) x.
Case 3.1: a > 0 and c = 0. Then (8.16) becomes α2 x2 = d.
• If d > 0, the solutions are the two parallel lines x = ±√d/α.
• If d = 0, the solution is the line x = 0.
Case 3.2: a = 0 and c > 0. Then (8.16) becomes γ 2 y 2 = d.
• If d > 0, the solutions are the two parallel lines y = ±√d/γ.
• If d = 0, the solution is the line y = 0.
Case 3.3: a < 0 and c = 0. Then (8.16) becomes −α2 x2 = d.
• If d > 0, there is no solution.
• If d = 0, the solution is the line x = 0.
Case 3.4: a = 0 and c < 0. Then (8.16) becomes −γ 2 y 2 = d.
• If d > 0, there is no solution.
• If d = 0, the solution is the line y = 0.
Quadratic equation with mixed term (b ≠ 0). Now we return to the general equation (8.14), which by (8.15) we can write as
hG~x , ~xi = d (8.17)
with the symmetric matrix G. If G were diagonal, then we could immediately give the solution. We know that G is symmetric,
hence we know that G can be orthogonally diagonalised. In other words, there exists an orthogonal
basis of R2 with respect to which G has a representation as a diagonal matrix. We can even choose
this basis such that it is a rotation of the canonical basis ~e1 and ~e2 (without an additional
reflection).
Let λ1 , λ2 be eigenvalues of G and let D = diag(λ1 , λ2 ). We choose an orthogonal matrix Q such
that
D = Q−1 GQ. (8.18)
Denote the columns of Q by ~v1 and ~v2 . They are normalised eigenvectors of G with eigenvalues λ1
and λ2 respectively. Recall that for an orthogonal matrix Q we always have that det Q = ±1. We
may assume that det Q = 1, because if not we can simply multiply one of its columns by −1. This
column then is still a normalised eigenvector of G with the same eigenvalue, hence (8.18) is still
valid. With this choice we guarantee that Q is a rotation.
From (8.18) it follows that G = QDQ−1 = QDQ∗ . So we obtain from (8.17) that
d = hG~x , ~xi = hQDQ∗ ~x , ~xi = hDQ∗ ~x , Q∗ ~xi = hD~x 0 , ~x 0 i
where ~x 0 = ( x0 , y 0 )t = Q∗ ~x = Q−1 ~x.
Observe that the column vector ( x0 , y 0 )t is the representation of ~x with respect to the basis ~v1 , ~v2
(recall that they are eigenvectors of G). Therefore the solution of (8.14) is one of the solutions
we found for the case b = 0 only now the symmetry axes of the figures are no longer the x- and
y-axis, but they are the directions of the eigenvectors of G. In other words: Since Q is a rotation,
we obtain the solutions of ax2 + bxy + cy 2 = d by rotating the solutions of ax2 + cy 2 = d with the
matrix Q.
• Quadratic form without mixed terms: d = λ1 x02 + λ2 y 02 where x0 , y 0 are the components of
~x 0 = Q−1 ~x.
• Graphic of the solution: In the xy-coordinate system, indicate the x0 -axis (parallel to ~v1 )
and the y 0 -axis (parallel to ~v2 ). Note that these axes are a rotation of the x- and the y-axis.
The solutions are then, depending on the eigenvalues, an ellipse, hyperbola, etc. whose
symmetry axes are the x0 - and y 0 -axis.
If we want to know only the shape of the solution, it is enough to calculate the eigenvalues λ1 , λ2
of G, or even only det G. Recall that we always assume d ≥ 0.
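This classification is easy to automate. The following Python/NumPy sketch (illustration only; the degenerate cases with det G = 0 are lumped together, and the function name and tolerance are my own choices) determines the type of the solution set from the eigenvalues of G:

import numpy as np

def conic_type(a, b, c, d, tol=1e-12):
    # classify the solution set of a x^2 + b x y + c y^2 = d, assuming d >= 0
    G = np.array([[a, b / 2], [b / 2, c]])
    l1, l2 = np.linalg.eigvalsh(G)            # eigenvalues in ascending order
    detG = l1 * l2
    if detG > tol:                            # same sign: ellipse / point / empty set
        if d == 0:
            return "the single point (0, 0)"
        return "an ellipse" if l1 > 0 else "the empty set"
    if detG < -tol:                           # opposite signs: hyperbola / crossing lines
        return "a hyperbola" if d > 0 else "two lines crossing at the origin"
    return "a degenerate case (det G = 0)"    # parallel lines, one line, empty set or R^2

print(conic_type(10, 6, 2, 4))                 # "an ellipse"   (first worked example below)
print(conic_type(-47/17, -32/17, 13/17, 2))    # "a hyperbola"  (second worked example below)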
Definition 8.63. The axes of symmetry are called the principal axes.
Example 8.64. Describe and sketch the solution set of
10x2 + 6xy + 2y 2 = 4. (8.19)
Solution. (i) First we write (8.19) in the form hG~x , ~xi = d with a symmetric matrix G. Let us define
G = [ 10  3 ; 3  2 ]. Then (8.19) is equivalent to
hG ( x, y )t , ( x, y )t i = 4. (8.20)
(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
0 = det(G − λ) = (10 − λ)(2 − λ) − 9 = λ2 − 12λ + 11 = (λ − 1)(λ − 11),
hence λ1 = 1 and λ2 = 11. Next we need the normalised eigenvectors. We calculate ker(G − λj ):
• G − λ1 = [ 9  3 ; 3  1 ] =⇒ ~v1 = 1/√10 ( 1, −3 )t ,
• G − λ2 = [ −1  3 ; 3  −9 ] =⇒ ~v2 = 1/√10 ( 3, 1 )t .
(Recall that for symmetric matrices the eigenvectors for different eigenvalues are orthogonal.
If you solve such an exercise it might be a good idea to check if the vectors are indeed
orthogonal to each other.)
Observation. With the information obtained so far, we already can sketch the solution.
• The solution is an ellipse because both eigenvalues are positive.
• The principal axes are parallel to the vectors ~v1 and ~v2 .
Set
Q = (~v1 |~v2 ) = 1/√10 [ 1  3 ; −3  1 ],  D = [ λ1  0 ; 0  λ2 ] = [ 1  0 ; 0  11 ],
then
Q−1 = Qt and D = Q−1 GQ = Qt GQ.
Observe that det Q = 1, so it is a rotation in R2 . It is a rotation by the angle arctan(−3).
If we define
( x0 , y 0 )t = Q−1 ( x, y )t = 1/√10 ( x − 3y , 3x + y )t ,
then (8.20) gives
4 = hG~x , ~xi = hDQt ~x , Qt ~xi = hD~x 0 , ~x 0 i,
and therefore
4 = x02 + 11y 02 = 1/10 (x − 3y)2 + 11/10 (3x + y)2 .
(iii) The solution of (8.19) is an ellipse whose principal axes are parallel to the vectors ~v1 and ~v2 ;
x0 is the coordinate along the axis parallel to ~v1 , and y 0 is the coordinate along the axis parallel to ~v2 .
[Sketch: the ellipse in the xy-plane with rotated principal axes x0 (parallel to ~v1 ) and y 0 (parallel to ~v2 ).]
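The computations of this example can be verified numerically (a short Python/NumPy check; np.linalg.eigh may order or orient the eigenvectors differently than the text, but the conclusions are the same):

import numpy as np

G = np.array([[10.0, 3.0], [3.0, 2.0]])
lams, Q = np.linalg.eigh(G)                        # eigenvalues in ascending order: 1, 11
print(lams)                                        # [ 1. 11.]
print(np.allclose(Q.T @ G @ Q, np.diag(lams)))     # True: Q^t G Q = diag(1, 11)

xp = np.array([2.0, 0.0])                          # a point (x', y') on the ellipse x'^2 + 11 y'^2 = 4
x, y = Q @ xp                                      # rotate back to (x, y) coordinates
print(np.isclose(10*x**2 + 6*x*y + 2*y**2, 4.0))   # True: it solves the original equation (8.19)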
Example. Consider the quadratic equation
− 47/17 x2 − 32/17 xy + 13/17 y 2 = 2. (8.21)
(i) Write the equation in matrix form.
(ii) Make a change of coordinates so that the quadratic equation (8.21) has no mixed term.
(iii) Describe the solution of (8.21) in geometrical terms and sketch it. Indicate the principal axes
and important intersections.
Solution. (i) First we write (8.21) in the form hG~x , ~xi = d with a symmetric matrix G. Let us define
G = 1/17 [ −47  −16 ; −16  13 ]. Then (8.21) is equivalent to
hG ( x, y )t , ( x, y )t i = 2. (8.22)
(ii) Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
0 = det(G − λ) = (− 47/17 − λ)( 13/17 − λ) − 256/172 = λ2 + 34/17 λ − 611/172 − 256/172 = λ2 + 2λ − 3 = (λ − 1)(λ + 3),
hence λ1 = −3 and λ2 = 1. Next we need the normalised eigenvectors. We calculate ker(G − λj ) using Gauß elimination:
• G − λ1 = 1/17 [ 4  −16 ; −16  64 ] −→ [ 1  −4 ; 0  0 ] =⇒ ~v1 = 1/√17 ( 4, 1 )t ,
• G − λ2 = 1/17 [ −64  −16 ; −16  −4 ] −→ [ 4  1 ; 0  0 ] =⇒ ~v2 = 1/√17 ( −1, 4 )t .
Observation. With the information obtained so far, we already can sketch the solution.
• The solution is a hyperbola because the eigenvalues have opposite signs.
• The principal axes (symmetry axes) are parallel to the vectors ~v1 and ~v2 . The intersections
of the hyperbola with the axis parallel to ~v2 are at ±√2.
Set
Q = (~v1 |~v2 ) = 1/√17 [ 4  −1 ; 1  4 ],  D = [ λ1  0 ; 0  λ2 ] = [ −3  0 ; 0  1 ],
then
Q−1 = Qt and D = Q−1 GQ = Qt GQ.
Observe that det Q = 1, hence Q is a rotation of R2 . It is a rotation by the angle arctan(1/4).
If we define
( x0 , y 0 )t = Q−1 ( x, y )t = 1/√17 ( 4x + y , −x + 4y )t ,
then (8.22) gives
2 = hG~x , ~xi = hDQt ~x , Qt ~xi = hD~x 0 , ~x 0 i,
hence
2 = −3x02 + y 02 = − 3/17 (4x + y)2 + 1/17 (−x + 4y)2 .
Asymptotes of the hyperbola. In order to calculate the slopes of the asymptotes of the
hyperbola, we first calculate in the x0 -y 0 -coordinate system. Our starting point is the equation
2 = −3x02 + y 02 .
2 = −3x02 + y 02 ⇐⇒ y 02 /x02 = 3 + 2/x02 ⇐⇒ y 0 /x0 = ±√(3 + 2/x02 ).
We see that |y 0 | → ∞ if and only if |x0 | → ∞ and that y 0 /x0 ≈ ±√3 for large |x0 |. So the slopes of the
asymptotes in x0 -y 0 -coordinates are ±√3.
How do we find the slope in x − y-coordinates?
• Method 1: Use Q. We know that if we rotate our hyperbola by the linear transformation Q,
its asymptotes are rotated as well. In x0 -y 0 -coordinates the asymptotes have the direction vectors
( 1, √3 )t and ( 1, −√3 )t . Applying Q gives their directions in x-y-coordinates, for instance
w~ = Q ( 1, √3 )t = 1/√17 [ 4  −1 ; 1  4 ] ( 1, √3 )t = 1/√17 ( 4 − √3 , 1 + 4√3 )t ,
so the slopes of the asymptotes are (1 + 4√3)/(4 − √3) and, analogously, (1 − 4√3)/(4 + √3).
• Method 2: Solve for y/x. On the asymptotes we have
±√3 = y 0 /x0 = (−x + 4y)/(4x + y)
⇐⇒ ±√3 (4x + y) = −x + 4y
⇐⇒ (±4√3 + 1) x = (4 ∓ √3) y
⇐⇒ y/x = (1 ± 4√3)/(4 ∓ √3).
• Method 3: Adding angles. We know that the angle between the x0 -axis and an
asymptote is arctan √3 and the angle between the x0 -axis and the x-axis is arctan(1/4).
Therefore the angle between the asymptote and the x-axis is arctan √3 + arctan(1/4)
(see Figure 8.3).
[Two pictures: left, the hyperbola −3x02 + y 02 = 2 with asymptote angle ϑ = arctan √3; right, the hyperbola
− 47/17 x2 − 32/17 xy + 13/17 y 2 = 2 with principal axes along ~v1 and ~v2 , rotation angle ϕ = arctan(1/4)
and asymptote angle α = ϕ + ϑ.]
Figure 8.3: The figure on the right (our hyperbola) is obtained from the figure on the left by applying
the transformation Q to it (that is, by rotating it by arctan(1/4)).
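The slopes obtained by Methods 2 and 3 agree, as a quick numerical cross-check confirms (a small Python sketch, for illustration only):

import numpy as np

slope_angles = np.tan(np.arctan(np.sqrt(3.0)) + np.arctan(0.25))   # Method 3: add the angles
slope_direct = (1 + 4*np.sqrt(3.0)) / (4 - np.sqrt(3.0))           # Method 2: (1 + 4*sqrt(3)) / (4 - sqrt(3))
print(np.isclose(slope_angles, slope_direct))                      # True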
Example. Describe the solution set of
9x2 − 6xy + y 2 = 25. (8.23)
Solution 1.
• First we write (8.23) in the form hG~x , ~xi = d with a symmetric matrix G. Let us define
G = [ 9  −3 ; −3  1 ]. Then (8.23) is equivalent to
hG ( x, y )t , ( x, y )t i = 25. (8.24)
• Now we calculate the eigenvalues of G. They are the roots of the characteristic polynomial
0 = det(G − λ) = (9 − λ)(1 − λ) − 9 = λ2 − 10λ = λ(λ − 10), hence λ1 = 0, λ2 = 10.
Next we need the normalised eigenvectors. To this end, we calculate ker(G−λj ) using Gauß
elimination:
• G − λ1 = [ 9  −3 ; −3  1 ] −→ [ 3  −1 ; 0  0 ] =⇒ ~v1 = 1/√10 ( 1, 3 )t ,
• G − λ2 = [ −1  −3 ; −3  −9 ] −→ [ 1  3 ; 0  0 ] =⇒ ~v2 = 1/√10 ( −3, 1 )t .
Observation. With the information obtained so far, we already can sketch the solution.
– The solution is a pair of parallel lines because one of the eigenvalues is zero and the other
is positive.
– The lines are parallel to ~v1 and their intersections with the axis parallel to ~v2 are at
±√(25/10) = ±√(5/2).
Set
Q = (~v1 |~v2 ) = 1/√10 [ 1  −3 ; 3  1 ],  D = [ λ1  0 ; 0  λ2 ] = [ 0  0 ; 0  10 ],
then
Q−1 = Qt and D = Q−1 GQ = Qt GQ.
Observe that det Q = 1, hence Q is a rotation in R2 . It is a rotation by the angle arctan(3).
If we define
( x0 , y 0 )t = Q−1 ( x, y )t = 1/√10 ( x + 3y , −3x + y )t ,
then (8.24) gives
25 = hG~x , ~xi = hDQt ~x , Qt ~xi = hD~x 0 , ~x 0 i,
therefore
25 = 10y 02 = (−3x + y)2 .
• The solution of (8.23) consists of two lines parallel to the vector ~v1 which intersect the y 0 -axis at
±√(25/10) = ±√(5/2). Explicitly, (−3x + y)2 = 25 gives −3x + y = ±5, that is,
y = 3x ± 5.
[Sketch: the two parallel lines y = 3x ± 5; the x0 -axis (parallel to ~v1 ) makes the angle arctan(3) with the x-axis.]
Figure 8.4: Ellipses. The plane in the picture on the left is parallel to the xy-plane. Therefore
the intersection with the cone is a circle. If the plane starts to incline, the intersection becomes an
ellipse. The more inclined the plane is, the more elongated is the ellipse. As long as the plane is not
yet parallel to the surface of the cone, it intersects only either the upper or the lower part of the
cone and the intersection is an ellipse.
Figure 8.5: Parabola. If the plane is parallel to the surface of the cone and does not pass through
the origin, then the intersection with the cone is a parabola (this is not a possible solution of (8.14)).
If the plane is parallel to the surface of the cone and passes through the origin, then the plane is
tangential to the cone and the intersection is one line.
Figure 8.6: Hyperbolas. If the plane is steeper than the cone, then it intersects both the upper and
the lower part of the cone. The intersection is a hyperbola. If the plane passes through the origin, then
the hyperbola degenerates to two intersecting lines. The plane in the picture in the middle is parallel
to the z-axis; therefore the intersection with the cone is a hyperbola.
Quadratic equations with linear terms. Finally let us consider the general quadratic equation
ax2 + bxy + cy 2 + rx + sy = d. (8.25)
Assume that G is invertible and diagonalise the quadratic part as before: substituting ~x = Q~x 0 (so that ( r0 , s0 )t = Qt ( r, s )t ) the equation becomes
λ1 x02 + λ2 y 02 + r0 x0 + s0 y 0 = d0 .
Now we only need to complete the squares on the left hand side to obtain
λ1 ( x0 + r0 /(2λ1 ) )2 + λ2 ( y 0 + s0 /(2λ2 ) )2 = d0 + r02 /(4λ1 ) + s02 /(4λ2 ).
Note that this can always be done if λ1 and λ2 are not 0 (here we use that G is invertible).
If we set d00 = d0 + r02 /(4λ1 ) + s02 /(4λ2 ), x00 = x0 + r0 /(2λ1 ) and y 00 = y 0 + s0 /(2λ2 ), then the equation becomes λ1 x002 + λ2 y 002 = d00 , which is of the form (8.16).
Alternatively, we can first remove the linear terms by a shift. We look for a point (x0 , y0 ) such that
in the new variables x̃ = x − x0 , ỹ = y − y0 the linear terms disappear. Inserting x = x0 + x̃ and
y = y0 + ỹ into (8.25) gives
d = a(x0 + x̃)2 + b(x0 + x̃)(y0 + ỹ) + c(y0 + ỹ)2 + r(x0 + x̃) + s(y0 + ỹ)
  = a x̃2 + b x̃ỹ + c ỹ 2 + (2ax0 + by0 + r) x̃ + (2cy0 + bx0 + s) ỹ + ax20 + bx0 y0 + cy02 + rx0 + sy0 . (8.27)
If we choose (x0 , y0 ) such that the coefficients of x̃ and ỹ vanish, that is, 2ax0 + by0 + r = 0 and
bx0 + 2cy0 + s = 0 (in matrix form, ~x0 = − 1/2 G−1 ( r, s )t ), and if we set
d̃ = d − (ax20 + bx0 y0 + cy02 + rx0 + sy0 ), then (8.27) becomes
d̃ = a x̃2 + b x̃ỹ + c ỹ 2 (8.28)
which is now of the form (8.14) (if d̃ is negative, then we must multiply both sides of (8.28) by
−1; in this case the eigenvalues of G change their sign, hence D also changes sign, but Q does
not). Hence if we set ~x 0 = Q−1 ~x̃, then
d̃ = λ1 x02 + λ2 y 02
and ~x 0 = Q−1 ~x̃ = Q−1 (~x − ~x0 ) = Q−1 ~x − Q−1 ~x0 = Q−1 ~x + 1/2 Q−1 G−1 ( r, s )t . So again we see that
the solution of (8.25) is the solution of λ1 x2 + λ2 y 2 = d̃, but rotated by Q and shifted by the vector
~x0 = − 1/2 G−1 ( r, s )t .
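This reduction is easy to carry out numerically. The following minimal Python/NumPy sketch uses the coefficients of the example treated next, so the printed centre, constant and eigenvalues can be compared with the calculation below:

import numpy as np

a, b, c, r, s, d = 10.0, 6.0, 2.0, 8.0, -2.0, 4.0       # a x^2 + b x y + c y^2 + r x + s y = d
G = np.array([[a, b/2], [b/2, c]])
x0 = -0.5 * np.linalg.solve(G, np.array([r, s]))        # centre: x0 = -1/2 G^{-1} (r, s)^t  -> (-1, 2)
d_tilde = d - (x0 @ G @ x0 + np.array([r, s]) @ x0)     # constant after the shift           -> 10
lam, Q = np.linalg.eigh(G)                              # eigenvalues and rotation           -> 1, 11
print(x0, d_tilde, lam)   # the equation becomes lam[0]*x'^2 + lam[1]*y'^2 = d_tilde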
As an example with linear terms consider
10x2 + 6xy + 2y 2 + 8x − 2y = 4. (8.19’)
Its quadratic part is the left hand side of (8.19) from Example 8.64, so we can reuse the matrices G, Q and D computed there. Recall that
( x0 , y 0 )t = Q−1 ( x, y )t = 1/√10 ( x − 3y , 3x + y )t  and  ( x, y )t = Q ( x0 , y 0 )t = 1/√10 ( x0 + 3y 0 , −3x0 + y 0 )t .
Method 1. With the notation above, we know from Example 8.64 that (8.19’) describes an ellipse
whose principal axes are parallel to ~v1 and ~v2 . Its centre is ~x0 = − 1/2 G−1 ( 8, −2 )t = ( −1, 2 )t ,
which in x0 -y 0 -coordinates is Q−1 ~x0 = ( −7/√10 , −1/√10 )t . Hence (8.19’) is the ellipse
x̃02 + 11ỹ 02 = 10 (see Method 2 below), that is, the ellipse from Example 8.64 with d = 10 instead of 4,
shifted by −7/√10 in x0 -direction and −1/√10 in y 0 -direction. The lengths of the semiaxes are √10 and √(10/11).
Method 2. Set x̃ = x − x0 = x + 1 and ỹ = y − y0 = y − 2. Then
4 = 10x2 + 6xy + 2y 2 + 8x − 2y
  = 10(x̃ − 1)2 + 6(x̃ − 1)(ỹ + 2) + 2(ỹ + 2)2 + 8(x̃ − 1) − 2(ỹ + 2)
  = 10x̃2 − 20x̃ + 10 + 6x̃ỹ + 12x̃ − 6ỹ − 12 + 2ỹ 2 + 8ỹ + 8 + 8x̃ − 8 − 2ỹ − 4
  = 10x̃2 + 6x̃ỹ + 2ỹ 2 − 6,
hence
10 = 10x̃2 + 6x̃ỹ + 2ỹ 2 = x̃02 + 11ỹ 02
with
( x̃0 , ỹ 0 )t = Q−1 ( x̃, ỹ )t = 1/√10 ( x̃ − 3ỹ , 3x̃ + ỹ )t = 1/√10 ( x − 3y + 7 , 3x + y + 1 )t .
Exercises.
1. Find an orthogonal substitution which diagonalises the given quadratic forms and determine
the diagonal form. Sketch the solutions. If the solution is an ellipse, calculate the lengths of the
principal axes and the angle they make with the x-axis. If it is a hyperbola, calculate the angle
the asymptotes make with the x-axis.
(d) 11x2 − 16xy − y 2 = 30.
(e) x2 + 4xy + 4y 2 = 4.
(f) xy = 1.
(g) 5x2 − 2xy + 5y 2 = −4.
(h) x2 − 2xy + 4y 2 = 0.
2. Find the equation of an ellipse whose semiaxes have lengths 1/2 and 1/3 and whose principal
axes are parallel to ( 1, 2 )t and ( −2, 1 )t .
3. Find the equation of an ellipse whose semiaxes have lengths 3 and 1 and whose first principal
axis makes an angle of 30◦ with the x-axis.
8.7 Summary
Cn as an inner product space. The standard inner product on Cn is h~v , w~ i = v1 w̄1 + · · · + vn w̄n .
We have for all ~v , w~ , ~z ∈ Cn and c ∈ C:
• h~v + w~ , ~zi = h~v , ~zi + hw~ , ~zi and hc~v , w~ i = c h~v , w~ i,
• h~v , w~ + ~zi = h~v , w~ i + h~v , ~zi and h~v , cw~ i = c̄ h~v , w~ i,
• hw~ , ~vi = h~v , w~ i (complex conjugate),
• h~v , ~vi = k~v k2 ≥ 0, and h~v , ~vi = 0 if and only if ~v = ~0.
The characteristic polynomial of A ∈ M (n × n) is pA (λ) = det(A − λ). It is a polynomial of degree n. Since every polynomial of degree ≥ 1 has at least one complex root,
every complex matrix has at least one eigenvalue (but there are real matrices without real eigenvalues).
Moreover, an n × n-matrix has at most n eigenvalues. If we factorise pA , we obtain
pA (λ) = (−1)n (λ − µ1 )m1 · · · (λ − µk )mk
where µ1 , . . . , µk are the different eigenvalues of A. The exponent mj is called the algebraic multiplicity of µj . The geometric multiplicity of µj is dim(Eigµj (A)). Note that
• geometric multiplicity ≤ algebraic multiplicity,
• the sum of all algebraic multiplicities is m1 + · · · + mk = n.
Similar matrices.
• Two matrices A, B ∈ M (n × n) are called similar if there exists an invertible matrix C such
that A = C −1 BC.
• A matrix A is called diagonalisable if it is similar to a diagonal matrix.
Unitary matrices. For a unitary matrix Q ∈ M (n × n, C) (that is, Q∗ Q = id) we have:
• | det Q| = 1,
• If λ ∈ C is an eigenvalue of Q, then |λ| = 1.
• Q is unitarily diagonalisable (we did not prove this fact), hence Cn has a basis consisting of
eigenvectors of Q. They can be chosen to be mutually orthogonal.
Hermitian matrices. For a hermitian matrix A ∈ M (n × n, C) (that is, A∗ = A) we have:
• det A ∈ R,
• if λ is an eigenvalue of A, then λ ∈ R.
• A is unitarily diagonalisable hence Cn has a basis consisting of eigenvectors of A. They can
be chosen to be mutually orthogonal.
Moreover, A ∈ M (n × n) is hermitian if and only if hA~v , ~zi = h~v , A~zi for all ~v , ~z ∈ Cn .
Solution of ax2 +bxy +cy 2 = d. The equation can be rewritten as hG~x , ~xi = d with the symmetric
matrix
G = [ a  b/2 ; b/2  c ].
Let λ1 , λ2 be the eigenvalues of G and let us assume that d ≥ 0. The principal axes of the solution
are parallel to the eigenvectors of G, and the type of the solution is determined by the signs of λ1 , λ2
(equivalently, by det G):
– det G > 0: an ellipse if d > 0 and the λj are positive, the point (0, 0) if d = 0, the empty set if d > 0 and the λj are negative,
– det G < 0: a hyperbola if d > 0, two lines crossing at the origin if d = 0,
– det G = 0: two parallel lines, one line, the empty set or all of R2 , depending on the sign of the non-zero eigenvalue and on d.
8.8 Exercises
1. Let Q be a unitary matrix. Show that all its eigenvalues have norm 1.
A = −30 −20 36 , u = 1 , w = 1 .
−6 −6 16 −1 0
(a) Determine whether the vectors ~u and w~ are eigenvectors of A. If they are, what are the corresponding
eigenvalues?
(b) You may use that det(A − λ) = −λ3 + 21λ2 − 138λ + 280. Calculate all eigenvalues of
A.
7. Let A = [ 1  2 ; 2  4 ]. Calculate eA := ∑∞n=0 1/n! An .
Hint. Find an invertible matrix C and a diagonal matrix D such that A = C −1 DC and use
this to calculate An .
p0 + 3p.
(c) Let R be the reflection in the plane P : x + 2y + 3z = 0 in R3 . Calculate the eigenvalues and
the eigenspaces of R.
9. We consider a string of length L which is fixed at both end points. If it is excited, then its vertical
elongation satisfies the partial differential equation ∂2/∂t2 u(t, x) = ∂2/∂x2 u(t, x). If we make the
ansatz u(t, x) = eiωt v(x) for some number ω and a function v which depends only on x, we
obtain −ω 2 v = v 00 . If we set λ = −ω 2 , we see that we have to solve the following eigenvalue
problem:
T : V → V, T v = v 00
with
V = {f : [0, L] → R : f is twice differentiable and f (0) = f (L) = 0}.
Find all eigenvalues and eigenfunctions of T .
10. Find the eigenvalues and the eigenspaces of the following n × n matrices:
A = [ 1  1  · · ·  1 ; 1  1  · · ·  1 ; . . . ; 1  1  · · ·  1 ] (all entries equal to 1),
B = [ 1  1  · · ·  1  1 ; 1  1  · · ·  1  2 ; . . . ; 1  1  · · ·  1  n ] (equal to A except for the last column, which is (1, 2, . . . , n)t ).
11. Let A ∈ M (n × n, C) be a hermitian matrix such that all its eigenvalues are strictly greater
than 0. Let h· , ·i be the standard inner product on Cn . Show that A induces an inner product
on Cn via h~v , w~ iA := hA~v , w~ i.
Appendix A
Complex Numbers
A complex number z is an expression of the form
a + ib
where a, b ∈ R and i is called the imaginary unit. The number a is called the real part of z, denoted
by Re(z) and b is called the imaginary part of z, denoted by Im(z).
The set of all complex numbers is sometimes called the complex plane and it is denoted by C:
C = {a + ib : a, b ∈ R}.
A complex number can be visualised as a point in the plane R2 where a is the coordinate on the
real axis and b is the coordinate on the imaginary axis.
Let a, b, x, y ∈ R. We define the algebraic operations sum and product for complex numbers
z = a + ib, w = x + iy:
z + w = (a + ib) + (x + iy) := a + x + i(b + y),
zw = (a + ib)(x + iy) := ax − by + i(ay + bx).
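These definitions agree with the built-in complex arithmetic of, for instance, Python (a minimal check; the sample values are arbitrary):

a, b, x, y = 2.0, 3.0, -1.0, 4.0
z, w = complex(a, b), complex(x, y)
# sum and product according to the definitions above
print(z + w == complex(a + x, b + y))             # True
print(z * w == complex(a*x - b*y, a*y + b*x))     # True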
Exercise A.1. Show that if we identify the complex number z = a + ib with the vector ( a, b )t ∈ R2 ,
then the addition of complex numbers is the same as the addition of vectors in R2 .
We will give a geometric interpretation of the multiplication of complex numbers later after formula
(A.5).
It follows from the definition above that i2 = −1. Moreover, we can view the real numbers R as a
subset of C if we identify a real number x with the complex number x + 0i.
Let a, b ∈ R and z = a + ib. Then the complex conjugate of z is
z̄ = a − ib
and its modulus or norm is
|z| = √(a2 + b2 ).
Geometrically, the complex conjugate is obtained from z by a reflection in the real axis, and the
modulus of z is the distance of the point represented by z from the origin of the complex plane.
[Two pictures of the complex plane: left, the numbers 3 + 2i, −1 + i and − 3/2 i as points; right, a
number z = a + ib and its complex conjugate z̄ = a − ib, obtained by reflection in the real axis.]
For all z, w ∈ C the following holds:
(i) z = Re z + i Im z.
(ii) Re(z + w) = Re(z) + Re(w), Im(z + w) = Im(z) + Im(w).
(iii) (z̄)‾ = z, (z + w)‾ = z̄ + w̄, (zw)‾ = z̄ w̄.
(iv) z z̄ = |z|2 .
(v) Re z = 1/2 (z + z̄), Im z = 1/(2i) (z − z̄).
Proof. (i) and (ii) should be clear. For (iii) note that (z̄)‾ = (a − ib)‾ = a + ib = z,
(z + w)‾ = (a + x + i(b + y))‾ = a + x − i(b + y) = a − ib + x − iy = z̄ + w̄,
(zw)‾ = (ax − by + i(ay + bx))‾ = ax − by − i(ay + bx) = (a − ib)(x − iy) = z̄ w̄.
(iv) follows from z z̄ = (a + ib)(a − ib) = a2 + b2 = |z|2 . For (v) observe that
z + z̄ = a + ib + (a − ib) = 2a = 2 Re(z),
z − z̄ = a + ib − (a − ib) = 2ib = 2i Im(z).
We call a complex number real if it is of the form z = a + i0 for some a ∈ R and we call it purely
imaginary if it is of the form z = 0 + ib for some b ∈ R. Hence z is real if and only if z = z̄, and z is
purely imaginary if and only if z̄ = −z.
The set C together with the addition and multiplication defined above is a field. That means that
the following properties hold.
(a) Commutativity of addition: For all z, w ∈ C, we have z + w = w + z.
(b) Associativity of addition: For all u, v, w ∈ C, we have (u + v) + w = u + (v + w).
(c) Identity element of addition: There exists an element 0, called the additive identity such
that for every v ∈ C, we have 0 + v = v + 0 = v.
(d) Additive inverse: For all z ∈ C, we have an inverse element −z such that z + (−z) = 0.
(e) Commutativity of multiplication: For all z, w ∈ C, we have zw = wz.
(f) Associativity of multiplication: For all u, v, w ∈ C, we have (uv)w = u(vw).
(g) Identity element of multiplication: There exists an element 1, called the multiplicative identity,
such that for every v ∈ C, we have 1 · v = v · 1 = v.
(h) Multiplicative inverse: For all z ∈ C \ {0}, we have an inverse element z −1 such that
z · z −1 = 1.
(i) Distributivity laws: For all u, v, w ∈ C we have
u(w + v) = uw + uv.
It is easy to check that commutativity, associativity and distributivity hold. Clearly, the additive
identity is 0 + i0 and the multiplicative identity is 1 + 0i. If z = a + ib, then its additive inverse is
−a − ib. If z ∈ C \ {0}, then z −1 = z̄/|z|2 = (a − ib)/(a2 + b2 ). This can be seen easily if we recall that |z|2 = z z̄.
The proof of the next theorem is beyond the scope of these lecture notes.
RA
Theorem A.3 (Fundamental theorem of algebra). Every non-constant complex polynomial
has at least one complex root.
Every complex polynomial p can be written as a product of linear factors,
p(z) = c (z − µ1 )m1 · · · (z − µk )mk , (A.1)
where c ∈ C, the µj are the distinct roots of p and m1 + · · · + mk = deg(p).
Proof. Let n = deg(p). If n = 0, then p is constant and it is clearly of the form (A.1). If n > 0, then,
by Theorem A.3, there exists µ1 ∈ C such that p(µ1 ) = 0. Hence there exists some polynomial q1
such that p(z) = (z − µ1 )q1 (z). Clearly, deg(q1 ) = n − 1. If q1 is constant, we are done. If q1 is not
constant, then it must have a zero µ2 . Hence q1 (z) = (z − µ2 )q2 (z) with some polynomial q2 with
deg(q2 ) = n − 2. If we repeat this process n times, we finally obtain that
p(z) = c (z − µ1 )(z − µ2 ) · · · (z − µn )
with some constant c ∈ C and µ1 , . . . , µn ∈ C (not necessarily distinct).
Now we only have to group all terms with the same µj and we obtain the form (A.1).
A power series is a series of the form
∑∞n=0 cn (z − a)n , (A.2)
where the cn are the coefficients and a is where the power series is centred. In our case, they are
complex numbers and z is a complex number. Recall that a series ∑∞n=0 an is called absolutely
convergent if and only if ∑∞n=0 |an | is convergent. It can be shown that every absolutely convergent
series of complex numbers is convergent. Moreover, for every power series of the form (A.2) there
exists a number R ∈ [0, ∞], called the radius of convergence, such that the series converges
absolutely for every z ∈ C with |z − a| < R and it diverges for z with |z − a| > R. That means that
the series converges absolutely for all z in the open disc with radius R centred in a, and it diverges
outside the closed disc with radius R centred in a. For z on the boundary the series may converge
or diverge. Note that R = 0 and R = ∞ are allowed. If R = 0, then the series converges only for
z = a and if R = ∞, then the series converges for all z ∈ C.
Important functions that we know from the real numbers and that have a power series are sine, cosine
and the exponential function. We can use their power series representations to define them also for
complex numbers:
sin z := ∑∞n=0 (−1)n /(2n + 1)! z 2n+1 , cos z := ∑∞n=0 (−1)n /(2n)! z 2n , ez := ∑∞n=0 1/n! z n . (A.3)
Note that for every z the series in (A.3) are absolutely convergent because, for instance, for the series
for the sine function we have ∑∞n=0 |(−1)n /(2n + 1)! z 2n+1 | = ∑∞n=0 1/(2n + 1)! |z|2n+1 , which is
convergent because |z| is a real number and we know that the sine series converges absolutely for
every real argument. Hence the sine series is absolutely convergent for any z ∈ C, hence converges.
The same argument shows that the series for the cosine and for the exponential are convergent for
every z ∈ C.
Remark A.6. Since the series for the sine function contains only odd powers of z, it is an odd
function and cosine is an even function because it contains only even powers of z. In formulas:
sin(−z) = − sin z, cos(−z) = cos z.
Next we show the relation between the trigonometric functions and the exponential function (the Euler formulas): for every z ∈ C,
eiz = cos z + i sin z, cos z = 1/2 (eiz + e−iz ), sin z = 1/(2i) (eiz − e−iz ).
Proof. Let us show the formula for eiz . In the calculation we will use that i2n = (i2 )n = (−1)n and
i2n+1 = (i2 )n i = (−1)n i and
eiz = ∑∞n=0 1/n! (iz)n = ∑∞n=0 1/n! in z n = ∑∞n=0 1/(2n)! i2n z 2n + ∑∞n=0 1/(2n + 1)! i2n+1 z 2n+1
    = ∑∞n=0 1/(2n)! (−1)n z 2n + ∑∞n=0 1/(2n + 1)! i (−1)n z 2n+1 = ∑∞n=0 (−1)n /(2n)! z 2n + i ∑∞n=0 (−1)n /(2n + 1)! z 2n+1
    = cos z + i sin z.
Note that the third step needs some proper justification (see a course on integral calculus).
For the proof of the formula for cos z we note that from what we just proved, it follows that
1/2 (eiz + e−iz ) = 1/2 (cos(z) + i sin(z) + cos(−z) + i sin(−z)) = 1/2 (cos(z) + i sin(z) + cos(z) − i sin(z))
= cos(z).
The formula for sin z is shown analogously.
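A quick numerical sanity check of these identities with Python's cmath module (the test value is arbitrary):

import cmath

z = 0.7 + 0.3j
print(cmath.isclose(cmath.exp(1j * z), cmath.cos(z) + 1j * cmath.sin(z)))         # e^{iz} = cos z + i sin z
print(cmath.isclose(cmath.cos(z), (cmath.exp(1j * z) + cmath.exp(-1j * z)) / 2))  # cos z = (e^{iz} + e^{-iz})/2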
in C.
(v) Show that the exponential function is 2πi periodic.
Let z ∈ C with |z| = 1 and let ϕ be the angle between the positive real axis and the line connecting
the origin and z. It is called the argument of z and it is denoted by arg(z). Observe that the
argument is only determined modulo 2π. That means, if we add or subtract any integer multiple
of 2π to the argument, we obtain another valid argument.
[Two pictures: left, a complex number z with |z| = 1 and argument ϕ, with Re(z) = cos ϕ and
Im(z) = sin ϕ; right, a general z and z̃ = z/|z| on the unit circle, with Re(z) = |z| cos ϕ and
Im(z) = |z| sin ϕ.]
Then the real and imaginary parts of z are Re(z) = cos ϕ and Im(z) = sin ϕ, and therefore
z = cos ϕ + i sin ϕ = eiϕ . We saw in Remark 2.3 how we can calculate the argument of a complex
number.
Now let z ∈ C \ {0} and again let ϕ be the angle between the positive real axis and the line
connecting the origin with z. Let z̃ = z/|z|. Then |z̃| = 1 and therefore z̃ = eiϕ . It follows that
z = |z| z̃ = |z| eiϕ .
This gives the geometric interpretation of the multiplication of complex numbers: if z = |z| eiα and
w = |w| eiβ , then
zw = |z| |w| ei(α+β) ,
so the moduli are multiplied and the arguments are added.
[Figure: the product zw in the complex plane; its argument is α + β, where α = arg(z) and β = arg(w).]
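The polar form and the geometric interpretation of the product can be checked with Python's cmath module (a minimal sketch with arbitrary sample values):

import cmath

z, w = 1 + 1j, 2j
r, phi = abs(z), cmath.phase(z)                          # modulus and argument of z
print(cmath.isclose(z, r * cmath.exp(1j * phi)))         # True: z = |z| e^{i phi}
print(cmath.isclose(abs(z * w), abs(z) * abs(w)))        # True: moduli are multiplied
print(cmath.isclose(cmath.phase(z * w),
                    cmath.phase(z) + cmath.phase(w)))    # True here: arguments add (modulo 2*pi)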
Appendix B
Solutions
h .
2. k ∈ R \ {19}.
√ √ • El conductor A llega a Villavicencio a
3. k ∈ R \ {8 − 66, 8 + 66}. las 10am.
• Los dos conductores se encuentran en
Soluciones de Sección 1.4 carretera a las 9am.
1. 6.
2. a = 3, b = 1, c = 1.
3. Si y = ax2 + bx + c es tal parábola,
entonces a + c = 52 y b = − 32 .
4. Las opciones posibles son:
• t 6= 21 , k = 10.
6−9x+3z = 13. Por ejemplo,
son paralelos −1+5x−3y
√
1) si y solo si existen escalares µ, λ tal que P ((41 ± 7117)/106, 0, 0) sirve.
µ~a = λ~b; 2. 188.
2) si y solo si h~a , ~bi = k~ak k~bk;
q
5
3. 29 .
3) si y solo si las rectas que generan son 3
paralelas, es decir, no se intersecan en 4. Todos los ~a que son paralelos a 7 .
3 2
exactamente un punto, lo que se puede 1
Los que tienen norma 1 son 62 7 y
√
verficar con el determinante. 2
3
4.(a)(i) α = −2. − √162 7 .
2
4.(a)(ii) α = 2. √
4.(a)(iii) α = −2(2 ± 3). Soluciones de Sección 2.6
4.(a)(iv) α = 0.
4.(a)(v) No existe tal α. 2.(a)(i) No son paralelas.
4.(b) Cuando α → ∞ el ángulo entre ~a y ~b 2.(a)(ii) No tienen ningún punto en común.
tiende a 3π 2.(a)(iii) P pertenece a L1 , P no pertenece a
4 . Cuando α → −∞ el ángulo entre
~ L2 .
~a y b tiende a π4 . y−2
2.(a)(iv) x−5
2 = 3 =
z−11
4 .
4.4.c(ii) P no pertenece a E.
4.4.c(iii) x = 3t + 1, y = 2t, z = t − 1, t ∈ R.
4.(c) Verdadero.
4.(d) Verdadero.
7.(a)ii. Colisionan en el punto (7, 10, −1) y en
tiempo t = 2.
7.(b)ii. No colisionan.
7.(b)iii. Las dos estelas se mezclan en el
√ 20, −5).
punto (13,
13.(a) 3 6.
14.(c) Q(−3, 5, 5). √
14.(d) La distancia obtenida es 11.
16.(b) Solo existe un único plano que cumple
tal condición.
17.(a) No.
17.(c) 34x − 5y − 9z = 0 es el único plano con
las propiedades deseadas.
18.(b) λ = µ = 12 .
1 0 0 0 1
2.(f) 8 −12 20.
0 1 0 0 −3 −8
−11 7
1.(g) .
0 0 1 0 2 3 5
4. A = .
0 0 0 1 5 1 2
cos ϑ − sin ϑ
2. Los a, b, c ∈ R tales que a − 2b + c = 0. 6. .
sin ϑ cos ϑ
3. −x2 + 3x − 2.
5. El rolo pasó 6 dı́as en Medellı́n, 4 dı́as en 7. a = d, c = 0.
Villavicencio y 4 dı́as en Yopal.
Soluciones de Sección 3.5
Soluciones de Sección 3.2 3. Una solución particular al sistema A~x = ~b
usando
la inversa
a derecha es
x −2 3
2.(a) y = s 1 + t 0 . 4 −3
z 0 1
x 5 3 ~x = 0 0 ~b.
2.(b) yz = s 61 + t −20 . −1 1
w 0 1
2.(c) ( xy )
= t( 41 ). Soluciones de Sección 3.6
3. r ∈ R − {−3, 2}
1 6 −3
1.a) 12 .
2 1
Soluciones de Sección 3.3 1.e) No tiene inversa.
1.(a) Sı́.
21 −3 −3 −6
1.(b) No. 4 −1 −4 1
1.(c) No. 1.f) 91
−1 −2
.
1 2
Las soluciones
del sistema homogéneo son −15 6 6 3
x 5
y −3
=t
z 5 1 0 0
w 2
2. (a) No tiene solución. 1.h) 0 −1/5 0.
(b) No tiene solución. 0 0 3
3.(a) a ∈ R \ {2, −2}. 2.
5.
1 0 0
0 cos ϑ sin ϑ .
0 − sin ϑ cos ϑ
3.
(a) No.
(b) Sı́.
(c) No.
(d) Sı́.
4. No se puede concluir que AB es simétrica.
1 0 0
3.(c) Q21 (5)P23 Q21 (2) 0 1 3.
0 0 0
4.b) E = Q31 (−4).
4.c) E = P13 .
4.e) E = Q12 (3).
7. Hint. Use que A es invertible.
−1 2 4
1 0 2 0
−1 1 2 −2
2.d) .
1 0 −3 3
2 −2 −2 3
4
3. − 37 .
5. Use la relación entre A y adj A.
7. No es subespacio de V .
3.(d) El conjunto dado solo tiene dos vectores.
8. No es subespacio de V .
3.(e) El conjunto dado es linealmente
11. Sı́ es subespacio de V .
dependiente.
14. Sı́ es subespacio de V .
4. Una basedel subespacio
dado es
0 a
17.(b) W1 ∩ W2 = : a, b ∈ R . 2 0 0
a 0
, , 0 .
0 4
0 0 1
−1 3 0
Soluciones de Sección 5.3
6. Hint. Si tiene una base de E, completela
0 a una base de R3 con el vector normal del
2. 1. plano.
−1 10. α ∈ R − {−1, 2}.
4. {1, x, x3 }.
3 2 11.(c) Sı́ existen. Hint. Modifique un poco la
6. {2x
− 3x +x, 1}.
base canónica de Pn .
1 0 1
8. 0 , 1 , 1
3 2 5
x −x
15. 8. Im T =span{x 3
, 1} y 3.(a) Recuerde que sinh x = e −e 2 ,
ex +e−x
6 cosh x = .
2
ker T = span 2 .
1 0 0
−1
3.(b) A = 0 1/2 −1/2
15.)10. Im T = M (2 × 2) y ker T = {O2×2 }. 0 1/2 1/2
15.)12. Im T = R y ker T son las matrices
de
a11 a12 a13 c+b+a
la formaa21 a22 a23 4. [a + bX + cX 2 ]B = 2c + b .
a31 a32 −(a11 + a22 ) c
cos ϑ sin ϑ
17. Im T = span{w}~ y 6. ABϑ →can = y
− sin ϑ cos ϑ
ker T = {~x ∈ Rn : ~x ⊥ w}.
~
cos ϑ − sin ϑ
20. No existe. Acan→Bϑ = .
sin ϑ cos ϑ
√
−3 3 −3
Soluciones de Sección 6.2 8.(a) = .
−3 Bϑ
0
√
2.(a) Im T es
elplano x − 2z = 0 y 1 2
8.(b) = .
−1 B
1 ϑ
0
ker T = span 0 .
−2
9. Hint. Use la relación
1 ABϑ1 →Bϑ2 = ABϑ1 →can Acan→Bϑ2 .
−1
3
2.(b) Im T = R y ker T = span .
1 Soluciones de Sección 6.4
2
0 1
1 −1 1. 4.
0 0
5 4
2
2.(c) Im T = span , y
−3 1 −1 2 3
1 0 1. 6. 0 1 4 3.
0 2 1 0 6 6
0 1 2
0 0 0
1. 8.
.
0 0 0
1 −3 0
2 0 0 0
4 1 0 0
1. 10.
0
.
0 2 0
0 0 4 1
1. 12. 1 0 0 0 1 0 0 0 1 .
−1 0 0 0 0
0 0 0 0 0
0 0 1 0 0.
2.(a)
0 0 0 2 0
0 0 0 0 3
2.(e) (1, 12 , 31 , . . . , n+1
1
).
1 0 1
3.(b) .
0 3 1
0 4/5 2/5
3.(d) .
3 7/5 16/5
4.(a) Hint. Suponga una combinación lineal
de vectores de B igualada a 0, evalúe en
x = 0, x = π.
0 1 1 0
−1 0 0 1
4.(b) [D]B = 0 0 0 1.
0 0 −1 0
~ = (4, −3, 0, π)t .
8.(a) w
~ = (19, −18, 3, −5)t .
8.(b) w
ángulo ϑ el plano xy. 1 1
2. Hint. Recuerde que AB→can es ortogonal. 1. Partiendo de la base 0 , 1 se
−1 0
4.(a) Falso.
4.(a) Verdadero. 1 1
obtiene: √12 0 , √16 2 .
−1 1
Soluciones de Sección 7.3
1 2
5
1.(a) span . 0 −1
1 2. Sea W el generado de
1, 0.
7
1.(c) La recta t 5. 0 1
1 Encuentre una base para W ⊥ y aplique
1.(d) Gram-Schmidt en la base dada de W y en la
complemento ortogonal es
Una base del
base obtenida de W ⊥ .
−5
−7
3 5 0
, .
1 0 1 1
3. projW ~v = 2 y la distancia de ~v a W
0 1
2
1
3 1 1 es √12 .
3.(a) =− +4 . Una base
5 −1 1
1 4. Observe
que dim
ImA =
3 yuna base de
ortornormal de W ⊥ es √12 . 1 1 2
1
3 1 1
10 10 0 Im A es , , .
0 −1
2
3.(c) −1 = −1 + 0. Una base
7 −1 1
6 6 0 Aplicando
Gram-Schmidt
se obtiene
1 0 1 11/3 2
ortonormal de W ⊥ es 0 , √137 1 .
3 4 −1
1 1 1
√ , √35
, √15 .
0 −6 3 7 2 1/3 3
7 7/3 1
−1 1 1 3
⊥
5. Una base
ortonormal para
U es
4 0 2
0 1 0
√ 1 , , √ 1 .
17 0 0 357 −17
−1 0 8
válida. de valores propios de A, por ende
(A − d1 idn )(A − d2 idn ) . . . (A − dk idn ) =
Q(D − d1 idn )(D − d2 idn ) . . . (D − dk idn )Q−1 .
Soluciones de Sección 8.2
9.(c) Si no suponemos que C 6= O2×2 la
3.(a) Falsa. conclusión de 9.(b) es falsa.
3.(b) Falsa.
3.(c) Falsa.
5.(c) Use que B = CAC −1 y la propiedad del Soluciones de Sección 8.5
inciso (b).
5.(d) Observe que las entradas de la diagonal 2.(a) Los valorespropios
de A son 1 y 2.
de At A son las normas al cuadrado de los 0
vectores columna de A. 2.(b) E1 = span 3 y E5 es el plano
−4
3y − 4z = 0.
Soluciones de Sección 8.3
2. Su polinomio caracterı́stico es λ4 . 3.A =
1 1+i 1 8 0 1−i 1
3. El polinomio caracterı́stico de D es .
3 1 −1 + i 0 −1 1 −(1 + i)
(λ − 1)(λ − 2)(λ − 3)(λ − 4).
4. Si empieza diagonalizando
A se obtiene
5. Escoja una base ortonormal de L⊥ y
1 −1 2 0 2 1
A = 31 .
1 2 0 5 −1 1 complétela a una base de R3 con un vector
5. Los valores propios de T son 1 y −1. unitario de L. La representación
matricial
de
6. Resuelva la ecuación diferencial y = λy 0 −1 0 0
sujeta a la condición inicial y(0) = 0. T en esta base será 0 −1 0
8. Si λ es valor propio de A, existe ~x 6= 0 tal 0 0 1
que A~x = λ~x. Multiplique por A−1 .
10. Observe que ~x 6= 0 es un vector propio de 6. Use el teorema 8.55*
T si ~v × ~x = λ~x. ¿Para cuáles λ la igualdad 7.(a) Use el teorema 8.55
anterior es cierta? 7.(b) Use el teorema 8.55 y el ejercicio 8. de
11. Vea el ejercicio 4.(a) de la sección 7.4 la sección 8.1.
0 2
Soluciones de Sección 8.6 1.(e)
5(y) = 4con
ejes de rotacion
√1
−2 1
1.(a)(x0 )2 + 11(y 0 2
) = 4 con ejes de rotación , √15 .
5 1 2
√1
1 −3
10 3
, √110 .
1 y0
y
y x0
2
2 2
1
y0
1
√2 x
11 −2 −1 1 2
−1
−2 −1 1 2
x x0
−2
−1
1.(g) 4(x0 )√
2
+ 6(y 0 )2 = −4.
−2
3. 3x − 4 3xy + 7y 2 = 9
2
y0 espacios propios
son el plano x + 2y + 3z = 0
1
y la recta t 2.
3
x 9.c Los autovalores de T son los valores de la
2 2 2 2
forma − kLπ2 donde k ∈ N y para cada − kLπ2 ,
2 2
su espacio propio es span{sin kLπ2 t}.
×, 48 approximation by least squares, 293
∧, 48 argument of a complex number, 367
C, 363 augmented coefficient matrix, 13, 78
Cn , 307
M (m × n), 78 bases
R2 , 27, 30 change of, 237
R3 , 48 basis, 190
Rn , 45 orthogonal, 271
RA
Eigλ (T ), 320 bijective, 219
Im, 363
L(U, V ), 218, 265 canonical basis in Rn , 191
Masym (n × n), 118 Cauchy-Schwarz inequality, 38, 310
Msym (n × n), 118 change of bases, 237
Pn , 174 change-of-coordinates matrix, 240
Re, 363 characteristic polynomial, 322
Sn , 135 coefficient matrix, 13, 78
U ⊥ , 279 augmented, 13, 78
D
elementary matrix, 120 vector equation, 55
elementary row operations, 79 linear combination, 176
empty set, 177, 179, 181, 190, 193 linear map, 217
entry, 13 linear maps
equivalence relation, 315 matrix representation, 248
Euler formulas, 366 linear operator, see linear map
expansion along the kth row/column, 138 linear span, 177
linear system, 12, 77
field, 364 consistent, 12
finitely generated, 179 homogeneous, 12
free variables, 85 inhomogeneous, 12
solution, 12
Gauß-Jordan elimination, 83 linear transformation, see linear map
Gaußian elimination, 83 matrix representation, 250
generator, 177 linearly dependent, 181
geometric multiplicity, 320 linearly independent, 181
Gram-Schmidt process, 290 lower triangular, 117
transition, 240 principal axes, 346
unitary, 312 product
upper triangular, 117 inner, 35, 46, 308
matrix representation of a linear product of vector in R2 with scalar, 30
transformation, 250 projection
minor, 137 orthogonal, 285
modulus, 363 proper subspace, 167
Multiplicative identity, 365 Pythagoras Theorem, 286, 309
multiplicity
algebraic, 325 radius of convergence, 366
geometric, 320 range, 220
real part of z, 363
norm, 363 reduced row echelon form, 81
norm of a vector, 32, 46, 308 reflection in R2 , 256
normal form reflection in R3 , 258
line, 57 right hand side, 12, 77
plane, 59 right inverse, 107
solution
vector form, 86
span, 177
square matrix, 78
standard basis in Rn , 191
standard basis in Pn , 191
subspace, 167
affine, 169
sum of functions, 98
surjective, 219
symmetric equation, 56
symmetric matrix, 118
system
orthogonal, 270
orthonormal, 270
transition matrix, 240
triangle inequality, 33, 39, 310
trivial solution, 89, 180
unit vector, 33
unitary matrix, 312
upper triangular, 117
vector, 31
in R2 , 27
norm, 32, 46, 308
unit, 33
vector equation, 55
vector form of solutions, 86
vector product, 48
vector space, 31, 161
direct sum, 202
generated, 177
intersection, 201
polynomials, 174
spanned, 177
subspace, 167
sum, 202
vector sum in R2 , 30
vectors
orthogonal, 37, 309
parallel, 37
perpendicular, 37, 309