
Linear Algebra

– in a nutshell –

This book was created and used for the lecture at Hamburg University of Technology in the winter term 2018/19 for General Engineering Science and Computer Science students. It is a modern approach to teaching Linear Algebra, and all topics here have corresponding videos that explain the mathematics in spoken form.

Hamburg, 26th May 2023, Version 2.5

Julian P. Großmann
www.JP-G.de
[email protected]

The author would like to give special thanks


• to Anusch Taraz, Marko Lindner, Marina Antoni and Max Ansorge for the German
lecture notes that were used as a base for this book and some nice pictures that also
appear here,
• to Anton Schiela for an English draft of the Linear Algebra course held at Hamburg
University of Technology,
• to Alexander Haupt and Fabian Gabel for many corrections and remarks,
• to all students who pointed out typos and other problems in the first draft,
• to all supporters of my YouTube channel who motivate me to continue teaching.

Hamburg, 26th May 2023 J.P.G.


Contents

1 Foundations of mathematics 3
1.1 Logic and sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.3 Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 Natural numbers and induction . . . . . . . . . . . . . . . . . . . . . . . . 21
1.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2 Vectors in Rn 25
2.1 What are vectors? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Vectors in the plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 The vector space Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Linear and affine subspaces (and the like) . . . . . . . . . . . . . . . . . . . 34
2.5 Inner product and norm in Rn . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.6 A special product in R3 (!): The vector product or cross product . . . . . . 42
2.7 What are complex numbers? . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

3 Matrices and linear systems 47


3.1 Introduction to systems of linear equations . . . . . . . . . . . . . . . . . . 48
3.2 Some words about matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.3 Looking at the columns and the associated linear map . . . . . . . . . . . . 51
3.4 Looking at the rows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.6 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.7 Linear dependence, linear independence, basis and dimension . . . . . . . . 60
3.8 Identity and inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.9 Transposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.10 The kernel, range and rank of a matrix . . . . . . . . . . . . . . . . . . . . 68
3.11 Solving systems of linear equations . . . . . . . . . . . . . . . . . . . . . . 70
3.11.1 Row operations and the Gauß algorithm . . . . . . . . . . . . . . . 70
3.11.2 Set of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
3.11.3 Gaussian elimination (without pivoting) . . . . . . . . . . . . . . . 74


3.11.4 Row echelon form . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


3.11.5 Gaussian elimination with pivoting and P A = LK decomposition . 81
3.12 Looking at columns and maps . . . . . . . . . . . . . . . . . . . . . . . . . 85
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4 Determinants 91
4.1 Determinant in two dimensions . . . . . . . . . . . . . . . . . . . . . . . . 91
4.2 Determinant as a volume measure . . . . . . . . . . . . . . . . . . . . . . . 93
4.3 The cofactor expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4 Important facts and using Gauß . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5 Determinants for linear maps . . . . . . . . . . . . . . . . . . . . . . . . . 104
4.6 Determinants and systems of equations . . . . . . . . . . . . . . . . . . . . 105
4.7 Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

5 General inner products, orthogonality and distances 109


5.1 General inner products in Rn . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 Orthogonal projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.2.1 Orthogonal projection onto a line . . . . . . . . . . . . . . . . . . . 112
5.2.2 Orthogonal projection onto a subspace . . . . . . . . . . . . . . . . 114
5.3 Orthonormal systems and bases . . . . . . . . . . . . . . . . . . . . . . . . 118
5.4 Orthogonal matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.5 Orthogonalisation: the QR-decomposition . . . . . . . . . . . . . . . . . . 123
5.6 Distances: points, lines and planes . . . . . . . . . . . . . . . . . . . . . . 125
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

6 Eigenvalues and similar things 129


6.1 What is an eigenvalue and an eigenvector? . . . . . . . . . . . . . . . . . . 130
6.2 The characteristic polynomial . . . . . . . . . . . . . . . . . . . . . . . . . 131
6.3 Complex matrices and vectors . . . . . . . . . . . . . . . . . . . . . . . . . 135
6.4 Eigenvalues and similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
6.5 Calculating eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
6.6 The spectral mapping theorem . . . . . . . . . . . . . . . . . . . . . . . . . 142
6.7 Diagonalisation – the optimal coordinates . . . . . . . . . . . . . . . . . . 145
6.8 Some applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151

7 General vector spaces 153


7.1 Vector space in its full glory . . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2 Linear subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.3 Recollection: basis, dimension and other stuff . . . . . . . . . . . . . . . . 158
7.4 Coordinates with respect to a basis . . . . . . . . . . . . . . . . . . . . . . 161
7.4.1 Basis implies coordinates . . . . . . . . . . . . . . . . . . . . . . . . 161
7.4.2 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
7.5 General vector space with inner product and norms . . . . . . . . . . . . . 170
7.5.1 Inner products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
7.5.2 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
7.5.3 Norm in pre-Hilbert spaces . . . . . . . . . . . . . . . . . . . . . . . 177
7.5.4 Recollection: Angles, orthogonality and projection . . . . . . . . . . 178

Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

8 General linear maps 183


8.1 Definition: Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
8.2 Combinations of linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . 185
8.2.1 Sum and multiples of a linear map . . . . . . . . . . . . . . . . . . 185
8.2.2 Composition and inverses . . . . . . . . . . . . . . . . . . . . . . . 187
8.3 Finding the matrix for a linear map . . . . . . . . . . . . . . . . . . . . . . 190
8.3.1 Just know what happens to a basis . . . . . . . . . . . . . . . . . . 190
8.3.2 Matrix of a linear map with respect to bases . . . . . . . . . . . . . 190
8.3.3 Matrix representation for compositions . . . . . . . . . . . . . . . . 195
8.3.4 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
8.3.5 Equivalent and similar matrices . . . . . . . . . . . . . . . . . . . . 199
8.4 Solutions of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.4.1 Existence for solutions . . . . . . . . . . . . . . . . . . . . . . . . . 203
8.4.2 Uniqueness and solution set . . . . . . . . . . . . . . . . . . . . . . 204
8.4.3 Invertibility: unconditional and unique solvability . . . . . . . . . . 205
8.4.4 A link to the matrix representation . . . . . . . . . . . . . . . . . . 205
8.5 Determinants and eigenvalues for linear maps . . . . . . . . . . . . . . . . 208
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9 Some matrix decompositions 211


9.1 Jordan normal form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
9.2 Singular value decomposition . . . . . . . . . . . . . . . . . . . . . . . . . 220
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228

Index 231
Some words

This text should help you to understand the course Linear Algebra. To expand your
knowledge, you can look into the following books:
• Gilbert Strang: Introduction to Linear Algebra,
• Sheldon Axler: Linear Algebra Done Right,
• Gerald Teschl, Susanne Teschl: Mathematik für Informatiker, Band 1.
• Shin Takahashi, Iroha Inoue: The Manga Guide to Linear Algebra.
• Klaus Jänich: Lineare Algebra.
Linear Algebra is a very important topic and useful in different applications. We¹ discuss simple examples later. The main idea is that we have a problem consisting of a lot of quantities where some are fixed and others can be altered or are not known. However, if we know the relations between the quantities, we can use Linear Algebra to find all the possible solutions for the unknowns.

(Diagram: variables / unknowns → simple relations between the variables → Linear Algebra → find all solutions for the unknowns.)

That would be the calculation side of the world regarding Linear Algebra. In this lecture,
we will concentrate on understanding the field as a whole. Of course, this is not an easy
task and it will be a hiking tour that we will do together. The summit and goal is to
understand why solving equations is indeed a meaningful mathematical theory.

¹ In mathematical texts, usually, the first-person plural is used even if there is only one author. Most of the time it simply means “we” = “I (the author) and the reader”.


We start in the valley of mathematics and will shortly scale the first hills. Always stay
in shape, practise and don’t hesitate to ask about the ways up. It is not an easy trip but
you can do it. Maybe the following tips can guide you:
• You will need a lot of time for this course if you really want to understand everything
you learn. Hence, make sure that you have enough time each week to do mathematics
and keep these time slots clear of everything else.
• Work in groups, solve problems together and discuss your solutions. Learning math-
ematics is not a competition.
• Explain the content of the lectures to your fellow students. Only the things you can
illustrate and explain to others are really understood by you.
• Learn the Greek letters that we use in mathematics:
α alpha β beta γ gamma Γ Gamma
δ delta ϵ epsilon ε epsilon ζ zeta
η eta θ theta Θ Theta ϑ theta
ι iota κ kappa λ lambda Λ Lambda
µ mu ν nu ξ xi Ξ Xi
π pi Π Pi ρ rho σ sigma
Σ Sigma τ tau υ upsilon Υ Upsilon
φ phi Φ Phi ϕ phi χ chi
ψ psi Ψ Psi ω omega Ω Omega

This video may help you there:


https://jp-g.de/bsom/la/greek/

• Choosing a book is a matter of taste. Look into different ones and choose the book
that really convinces you.
• Keep interested, fascinated and eager to learn. However, do not expect to under-
stand everything at once.
DON’T PANIC J.P.G.
1 Foundations of mathematics
It is a mistake to think you can solve any major problems just with
potatoes.
Douglas Adams

Before starting with Linear Algebra, we first have to learn the mathematical language,
which consists of symbols, logic, sets, numbers, maps and so on. We also talk about the
concept of a mathematical proof. These things build up the mathematical foundation.
A little bit of knowledge about numbers and how to calculate with them is assumed, but not much more than that. All symbols are introduced such that you know how to work with them. However, if you are interested in a more detailed discussion, I can recommend you my video series about the foundations of mathematics:

Video: Start Learning Mathematics

https://jp-g.de/bsom/la/slm/

1.1 Logic and sets


Basic logic is something we usually get intuitively right. However, in mathematics we have to define it in an unambiguous way, and it may differ a little bit from everyday logic. It is very important and useful to bring to our attention some of the basic rules and notations of logic. For Computer Science students, logic is considered in more detail in other courses.
Let us start with a definition:


Definition 1.1. logical statement, proposition


A logical statement (or proposition) is a statement, which means a meaningful
declarative sentence, that is either true or false.

Instead of true, one often writes T or 1 and instead of false, one often writes F or 0.
Not every meaningful declarative sentence fulfils this requirement. There are opinions, alternative facts, self-contradictory statements, undecidable statements and so on. In fact, a lot of examples here, outside the mathematical world, work only if we give the words unambiguous definitions, which we will implicitly do.

Example 1.2. Which of these are logical statements?


(a) Hamburg is a city.
(b) 1 + 1 = 2.
(c) The number 5 is smaller than the number 2.
(d) Good morning!
(e) x + 1 = 1.
(f) Today is Tuesday.

The last two examples are not logical statements but so-called predicates and will be
considered later.

Logical operations
For given logical statements, one can form new logical statements with so-called logical
operations. In the following, we will consider two logical statements A and B.

Definition 1.3. Negation ¬A (“not A”)


¬A is true if and only if A is false.

Truth table:
    A   ¬A
    T    F        (1.1)
    F    T

Example 1.4. What are the negations of the following logical statements?
(a) The wine bottle is full.
(b) The number 5 is smaller than the number 2.
(c) All students are in the lecture hall.

Definition 1.5. Conjunction A ∧ B (“A and B”)


A ∧ B is true if and only if both A and B are true.

Truth table:
    A   B   A ∧ B
    T   T     T
    T   F     F        (1.2)
    F   T     F
    F   F     F

Definition 1.6. Disjunction A ∨ B (“A or B”)


A ∨ B is true if and only if at least one of A or B is true.

Truth table:
    A   B   A ∨ B
    T   T     T
    T   F     T        (1.3)
    F   T     T
    F   F     F

Definition 1.7. Conditional A → B (“If A then B”)


A → B is only false if A is true but B is false.

Truth table:
    A   B   A → B
    T   T     T
    T   F     F        (1.4)
    F   T     T
    F   F     T

Definition 1.8. Biconditional A ↔ B (“A if and only if B”)


A ↔ B is true if and only if A → B and B → A are true.

Truth table:
    A   B   A ↔ B
    T   T     T
    T   F     F        (1.5)
    F   T     F
    F   F     T

If a conditional or biconditional is true, we have a short notation for this that is used
throughout the whole field of mathematics:

Definition 1.9. Implication and equivalence


If A → B is true, we call this an implication and write:

A⇒B.

If A ↔ B is true, we call this an equivalence and write:

A⇔B.

This means that we speak of equivalence of A and B if the truth values in the truth table
are exactly the same. For example, we have

A ↔ B ⇔ (A → B) ∧ (B → A) .

Now one can ask: What to do with truth-tables? Let us show that ¬B → ¬A is the same
as A → B.
Truth table:
    A   B   ¬A   ¬B   ¬B → ¬A
    T   T    F    F       T
    T   F    F    T       F        (1.6)
    F   T    T    F       T
    F   F    T    T       T
Therefore:
(A → B) ⇔ (¬B → ¬A) .
This is the proof by contraposition:
“Assume that B does not hold; then we can show that A cannot hold either.” Hence A implies B.
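If you want to check such equivalences mechanically, you can let a computer enumerate the truth table. The following small Python sketch (not part of the original notes; the helper name implies is just an illustrative choice) confirms that A → B and ¬B → ¬A always have the same truth value:

    from itertools import product

    def implies(a, b):
        # A -> B is only false if A is true and B is false
        return (not a) or b

    for A, B in product([True, False], repeat=2):
        lhs = implies(A, B)             # A -> B
        rhs = implies(not B, not A)     # the contraposition
        print(A, B, lhs, rhs, lhs == rhs)   # the last column is always True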

Contraposition
If A ⇒ B, then also ¬B ⇒ ¬A.

Rule of thumb: Contraposition


To get the contraposition of A ⇒ B, you should exchange A and B and set a ¬-sign in front of both: ¬B ⇒ ¬A.
It is clear: The contraposition of the contraposition is again A ⇒ B.

The contraposition is an example of a deduction rule, which basically tells us how to get new true propositions from other true propositions. The most important deduction rules are given just by using the implication.

Modus ponens
If A ⇒ B and A is true, then also B is true.

Chain syllogism
If A ⇒ B and B ⇒ C, then also A ⇒ C.

Reductio ad absurdum
If A ⇒ B and A ⇒ ¬B, then ¬A is true.

One can easily prove these rules by truth tables. However, here we do not state every
deduction in this formal manner. We may still use deduction in the intuitive way as well.
Try it here:

Exercise 1.10. Let “All birds can fly” be a true proposition (axiom). Are the following
deductions correct?

• If Seagulls are birds, then Seagulls can fly.


• If Penguins are birds, then Penguins can fly.
• If Butterflies are birds, then Butterflies can fly.
• If Butterflies can fly, then Butterflies are birds.

Sets
Modern mathematics does not say what sets are, but only specifies rules. This is, however,
too difficult for us right now, and we rather cite the attempt of a definition by Georg
Cantor:

“Unter einer ‚Menge‘ verstehen wir jede Zusammenfassung von bestimmten wohlun-
terschiedenen Objekten unserer Anschauung oder unseres Denkens zu einem Gan-
zen.”

Definition 1.11. Set, element


A set is a collection into a whole of definite, distinct objects of our perception or of our thought. Such an object x of a set M is called an element of M and one writes x ∈ M. If x is not such an object of M, we write x ∉ M.

A set is defined by giving all its elements M := {1, 4, 9}.

The symbol “:=” is read as defined by and means that the symbol M is newly
introduced as a set by the given elements.

Example 1.12. • The empty set {} = ∅ is the unique set that has no elements at all.
• The set that contains the empty set is {∅}, which is non-empty since it has exactly one element.
• A finite set of numbers is {1, 2, 3}.

Notation 1.13. Let A, B be sets:


• x ∈ A means x is an element of A
• x ∉ A means x is not an element of A
• A ⊂ B means A is a subset of B: every element of A is contained in B
• A ⊃ B means A is a superset of B: every element of B is contained in A
• A = B means A ⊂ B ∧ A ⊃ B. Note that the order of the elements does not matter in sets. If we want the order to matter, we rather define tuples: (1, 2, 3) ≠ (1, 3, 2). For sets, we always have {1, 2, 3} = {1, 3, 2}.
• A ⊊ B means A is a “proper” subset of B: every element of A is contained in B, but A ≠ B.

The important number sets


• N is the set of the natural numbers 1, 2, 3, . . .;
• N0 is the set of the natural numbers and zero: 0, 1, 2, 3, . . . ;
• Z is the set of the integers, which means . . . , −3, −2, −1, 0, 1, 2, 3, . . .;
• Q is the set of the rational numbers, which means all fractions p/q with p ∈ Z and q ∈ N;
• R is the set of the real numbers (see next semester).

Other ways to define sets:

A = {n ∈ N : 1 ≤ n ≤ 300}
P(B) = {M : M ⊂ B} power set: set of all subsets of B
I = {x ∈ R : 1 ≤ x < π} = [1, π) half-open interval

More about these constructions later.

Definition 1.14. Cardinality


We use vertical bars |·| around a set to denote the number of elements. For example,
we have |{1, 4, 9}| = 3. The number of elements is called the cardinality of the set.

Example 1.15. |{1, 3, 3, 1}| = 2, |{1, 2, 3, . . . , n}| = n, |N| = ∞ (?)

Exercise 1.16. Which of the following logical statements are true?

    3 ∈ N,      12034 ∈ N,     −1 ∈ N,       0 ∈ N,        0 ∈ N0,
    −1 ∈ Z,     0 ∉ Z,         −2.7 ∈ Z,     2/3 ∈ Z,
    2/3 ∈ Q,    −3 ∈ Q,        −2.7 ∈ Q,     2 ∈ Q,
    √2 ∈ R,     √−2 ∈ R,       −2/3 ∈ R,     0 ∈ R.

Predicates and quantifiers

Definition 1.17. Predicate


If X is any set and A(x) is a logical statement depending on x ∈ X (and true or
false for every x ∈ X), we call A(x) a predicate with variable x. Usually, one writes
simply A(x) instead of A(x) = true.

Example 1.18.
X=R A(x) = “x < 0“
Then we can define the set

{x ∈ X : A(x)} = {x ∈ R : x < 0}

Definition 1.19. Quantifiers ∀ and ∃


We use ∀ (“for all”) and ∃ (“there exists”) and call them quantifiers. Moreover, we use the colon “ : ” inside the set brackets, which means “that fulfil”.

The quantifiers and predicates are very useful for a compact notation:
• ∀x ∈ X : A(x) for all x ∈ X A(x) is true
• ∃x ∈ X : A(x) there exists at least one x ∈ X for which A(x) is true
• ∃!x ∈ X : A(x) there exists exactly one x ∈ X for which A(x) is true
Negation of statements with quantifiers:
• ¬(∀x ∈ X : A(x)) ⇔ ∃x ∈ X : ¬A(x)
• ¬(∃x ∈ X : A(x)) ⇔ ∀x ∈ X : ¬A(x)

Example 1.20. There is no greatest natural number:

A(n) = “n is the greatest natural number”

In our notation: ¬(∃n ∈ N : A(n)). This is the same as ∀n ∈ N : ¬A(n), i.e. each n ∈ N is not the greatest natural number. But this is clear, because n + 1 > n.

Rule of thumb: Negation of the quantifier (∀ and ∃)


“¬∀ = ∃¬” and “¬∃ = ∀¬”
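For finite sets, these negation rules can be mimicked in Python with all and any; this is only an illustrative sketch (the universe X and the predicate A are made-up examples, not from the notes):

    X = range(1, 11)            # a finite "universe" {1, ..., 10}
    A = lambda x: x % 2 == 0    # predicate A(x): "x is even"

    lhs = not all(A(x) for x in X)     # not (for all x: A(x))
    rhs = any(not A(x) for x in X)     # there exists x with: not A(x)
    print(lhs == rhs)                  # True, illustrating "¬∀ = ∃¬"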

Example 1.21. The set M := {x ∈ Z : x2 = 25} is defined by the set of each integer x
that squares to 25. We immediately see that this is just −5 and 5.

{x ∈ Z : x2 = 25} = {−5, 5},


{x ∈ N : x2 = 25} = {5},
{x ∈ R : x2 = −25} = ∅.

In other words: The equation x2 = 25 with unknown x has, depending on the number realm in which you want to solve it, one or two solutions, and the equation x2 = −25 has no solution in the real numbers. However, we will find solutions in the complex numbers, as we will see later.

Operations on sets

We remember the important operations for sets:

• M1 ∪ M2 := {x : x ∈ M1 ∨ x ∈ M2 } (union)
• M1 ∩ M2 := {x : x ∈ M1 ∧ x ∈ M2 } (intersection)
• M1 \ M2 := {x : x ∈ M1 ∧ x ∉ M2 } (set difference)

Definition 1.22. Set compositions


• The union M1 ∪ M2 is the new set that consists exactly of the objects that are elements of M1 or M2.

• The intersection M1 ∩ M2 is the new set whose elements are the objects that are elements of M1 and M2.

• The set difference M1 \ M2 is the set whose elements are the objects that are elements of M1 but not elements of M2.

• A subset M1 ⊂ M2 is each set M1 whose elements are also elements of M2.

(Sketches: Venn diagrams of M1 ∪ M2, M1 ∩ M2, M1 \ M2 and M1 ⊂ M2.)

Definition 1.23. Complement set


Let X be a set. Then for a subset M ⊂ X there is a unique complement of M with respect to X:

    M^c := X \ M = {x ∈ X : x ∉ M}

(Sketch: the subset M inside X; everything of X outside M is M^c.)

Definition 1.24. Product set


The Cartesian product of two sets A, B is given as the set of all pairs (two elements
with order):
A × B := {(a, b) : a ∈ A, b ∈ B}

           1        2        3
    x    (x,1)    (x,2)    (x,3)
    y    (y,1)    (y,2)    (y,3)
    z    (z,1)    (z,2)    (z,3)

(Table: the Cartesian product A × B for A = {x, y, z} and B = {1, 2, 3}. Source of the picture: Quartl, Wikipedia.)

In the same sense, for sets A1 , . . . , An the set of all n-tuples is defined:

A1 × · · · × An := {(a1 , . . . , an ) : a1 ∈ A1 , . . . , an ∈ An }
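These set operations are available directly in Python; the following lines are a small illustration (with made-up example sets, not from the notes) of union, intersection, set difference and the Cartesian product:

    from itertools import product

    M1, M2 = {1, 2, 4}, {3, 4, 5}
    print(M1 | M2)    # union: {1, 2, 3, 4, 5}
    print(M1 & M2)    # intersection: {4}
    print(M1 - M2)    # set difference: {1, 2}

    A, B = {'x', 'y', 'z'}, {1, 2, 3}
    print(set(product(A, B)))    # Cartesian product A × B as a set of pairs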

Exercise 1.25. Which statements are correct?

    {1, 3} ∪ {2, 4} = {1, 2, 4},      {1, 2} ∪ {3, 4} = {3, 2, 4, 2, 1},      N ∪ Z = Z,
    {1, 2, 4} ∩ {3, 4, 5} = {4},      {1, 3} ∩ {2, 4} = ∅,                    N ∩ Z = N0,
    {1, 2, 4} \ {3, 4, 5} = {1},      N0 \ N = {0},                           N \ Z = ∅,
    Z \ N = {−x : x ∈ N},             N ⊂ N0,        Z ⊂ N0,                  (Z \ Q) ⊂ N,
    N ⊂ N,       −3 ∈ Z \ N0,         3/7 ∈ Q \ Z,                            2 ∈ R \ Q.

Exercise 1.26. Which claims are correct? Prove or give a counterexample.


(a) (Q \ R) ⊂ N0 .
(b) Let A, B, C be three sets. Then one has A ∪ (B ∩ C) = (A ∪ B) ∩ C.
(c) Let A, B, C be three sets. Then one has A ∩ (B ∩ C) = (A ∩ B) ∩ C.
(d) Let A, B, C be three sets. Then one has A \ (B ∪ C) = (A \ B) ∩ (A \ C).

Exercise 1.27. Describe the following sets and calculate their cardinalities:
(a) X1 := {x ∈ N : ∃a, b ∈ {1, 2, 3} with x = a − b}
(b) X2 := {(a − b) : a, b ∈ {1, 2, 3}}
(c) X3 := {|a − b| : a, b ∈ {1, 2, 3}}
(d) X4 := {1, ..., 20} \ {n ∈ N : ∃a, b ∈ N with 2 ≤ a and 2 ≤ b and n = a · b}.
(e) X5 := {S : S ⊂ {1, 2, 3}}.

1.2 Real Numbers


At school, everybody learns about the rational numbers, the real numbers, and basic arithmetic. However, usually one does not define them in a rigorous way. There are just certain rules that we can apply and, usually, we do not think about them.

In our lecture, we will also get to know other objects than real numbers, like vectors and
matrices, where some of these internalized laws do not apply any more. So we start by
having a fresh look at these rules.
We can add and multiply real numbers. Moreover, we use parentheses to describe the
order of the computations. We have the notational convention that multiplication binds
stronger than addition: ab + c means (ab) + c.

Computational rules for real numbers


For real numbers a, b, c ∈ R, we have:

a + (b + c) = (a + b) + c , a(bc) = (ab)c associative law


a + b = b + a , ab = ba commutative law
a(b + c) = ab + ac distributive law

Furthermore, we are used to having the neutral numbers 0 and 1 with special properties:

    a + 0 = a,    a · 1 = a,

and additive inverse elements, denoted by −a, and also the multiplicative inverses, denoted by a⁻¹ for a ≠ 0. They fulfil a + (−a) = 0 and aa⁻¹ = 1.
A set with such properties is called a field. Here we have the field of real numbers R. More details can be found in the box below and in the videos.
It is also known from school that the real numbers can be ordered, which simply means
that the relation a < b always makes sense. One can show that the following rules are
sufficient to derive all known calculations properties concerning ordering of numbers.

Positive real numbers


The notation a > 0 defines the positive numbers and it fulfils:

• For any a ∈ R, exactly one of the three relations holds:

a > 0, or − a > 0, or a = 0.

• For all a, b ∈ R with a > 0 and b > 0, one has a + b > 0 and ab > 0.

Then, as a definition we write:

a<b :⇔ a−b<0

and
a≤b :⇔ a − b < 0 or a = b .

This order relation is the reason why we can think of the real numbers as a line, the “real line”.
For describing subsets of the real numbers, we will use intervals. Let a, b ∈ R. Then we
define

[a, b] := {x ∈ R : a ≤ x ≤ b}

(a, b] := {x ∈ R : a < x ≤ b}
[a, b) := {x ∈ R : a ≤ x < b}
(a, b) := {x ∈ R : a < x < b}.

Obviously, in the case a > b, all the sets above are empty. We also can define unbounded
intervals:

[a, ∞) := {x ∈ R : a ≤ x}, (a, ∞) := {x ∈ R : a < x}


(−∞, b] := {x ∈ R : x ≤ b}, (−∞, b) := {x ∈ R : x < b}.

Definition 1.28. Absolute value for real numbers


The absolute value of a number x ∈ R is defined by

    |x| := x if x ≥ 0,   and   |x| := −x if x < 0.

Question 1.29. Which of the following claims are true?

    | − 3.14| = 3.14,    |3| = 3,    | − 7/5| = 7/5,    −| − 5/3| = 3/5,    |0| is not well-defined.

Proposition 1.30. Two important properties


For any two real numbers x, y ∈ R, one has

(a) |x · y| = |x| · |y|, (|·| is multiplicative),

(b) |x + y| ≤ |x| + |y|, (|·| fulfils the triangle inequality).

(∗) Supplementary details: Real numbers


The real numbers are a non-empty set R together with the operations + : R × R → R and
· : R × R → R and an ordering relation <: R × R → {True, False} that fulfil the following rules
(A) Addition
(A1) associative: x + (y + z) = (x + y) + z
(A2) neutral element: There is a (unique) element 0 with x + 0 = x for all x.
(A3) inverse element: For all x there is a (unique) y with x + y = 0. We write for this
element simply −x.
(A4) commutative: x + y = y + x
(M) Multiplication
(M1) associative: x · (y · z) = (x · y) · z
(M2) neutral element: There is a (unique) element 1 ≠ 0 with x · 1 = x for all x.
(M3) inverse element: For all x ≠ 0 there is a (unique) y with x · y = 1. We write for this element simply x⁻¹.
(M4) commutative: x · y = y · x
(D) Distributivity: x · (y + z) = x · y + x · z.
(O) Ordering
(O1) for given x, y exactly one of the following three assertions is true: x < y, y < x, x = y.

(O2) transitive: x < y and y < z imply x < z.


(O3) x < y implies x + z < y + z for all z.
(O4) x < y implies x · z < y · z for all z > 0.
(O5) x > 0 and ε > 0 implies x < ε + · · · + ε for sufficiently many summands.
(C) Completeness: Every sequence (an )n∈N with the property [For all ε > 0 there is an N ∈ N
with |an − am | < ε for all n, m > N ] has a limit.

(∗) Supplementary details: Definition: field


Every set M together with the two operations + : M × M → M and · : M × M → M that fulfil (A), (M) and (D) is called a field.

Sums and products


We will use the following notations:

    Σ_{i=1}^{n} ai = a1 + a2 + · · · + an−1 + an
    Π_{i=1}^{n} ai = a1 · a2 · · · · · an−1 · an
    ⋃_{i=1}^{n} Ai = A1 ∪ A2 ∪ · · · ∪ An−1 ∪ An

The union also works for an arbitrary index set I:

    ⋃_{i∈I} Ai = {x : ∃i ∈ I with x ∈ Ai }.

The first is a useful notation for a sum, which is the result of an addition of two or more summands. Instead of using dots, we use the Greek letter Σ. For example,

    3 + 7 + 15 + . . . + 127

is not an unambiguous way to describe the sum. Using the sum symbol, there is no confusion:

    Σ_{i=2}^{7} (2^i − 1).

Of course, the parentheses are necessary here. You can read this as a for loop:

for loop for the sum above


sum := 0;
for i := 2 to 7 do {
    sum := sum + (2^i - 1);
}

Rule of thumb: Let i run from 2 to 7, calculate 2^i − 1 and add.

    index variable: i = 2, first summand:   2^i − 1 = 2^2 − 1 = 4 − 1 = 3
    index variable: i = 3, second summand:  2^i − 1 = 2^3 − 1 = 8 − 1 = 7
    index variable: i = 4, third summand:   2^i − 1 = 2^4 − 1 = 16 − 1 = 15
    index variable: i = 5, fourth summand:  2^i − 1 = 2^5 − 1 = 32 − 1 = 31
    index variable: i = 6, fifth summand:   2^i − 1 = 2^6 − 1 = 64 − 1 = 63
    index variable: i = 7, last summand:    2^i − 1 = 2^7 − 1 = 128 − 1 = 127
    Sum: 3 + 7 + 15 + 31 + 63 + 127 = 246

Example 1.31.

    Σ_{i=1}^{10} (2i − 1) = 1 + 3 + 5 + . . . + 19 = 100    (check!)

    Σ_{i=−10}^{10} i = −10 − 9 − 8 − . . . − 1 + 0 + 1 + · · · + 8 + 9 + 10 = 0    (check!)

With the same construction, we describe the result of a multiplication, called a product, which consists of two or more factors. There we use the Greek letter Π. For example:

    Π_{i=1}^{8} (2i) = (2 · 1) · (2 · 2) · (2 · 3) · . . . · (2 · 8) = 10321920.    (check!)
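If you want to double-check such sums and products, Python's sum and math.prod do exactly the same looping; this is only a verification sketch, not part of the notes:

    import math

    print(sum(2**i - 1 for i in range(2, 8)))     # 3 + 7 + 15 + ... + 127 = 246
    print(sum(2*i - 1 for i in range(1, 11)))     # 1 + 3 + ... + 19 = 100
    print(sum(range(-10, 11)))                    # -10 - 9 - ... + 9 + 10 = 0
    print(math.prod(2*i for i in range(1, 9)))    # (2*1)*(2*2)*...*(2*8) = 10321920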

1.3 Maps

Definition 1.32. Function or map


Let X, Y be non-empty sets. A rule that assigns to each argument x ∈ X a unique
value y ∈ Y is called a map or function from X into Y . One writes for this y
usually f (x).
Notation:
    f : X → Y
    x ↦ f(x)

Here, X is called domain of f , and Y is called codomain.

Attention! Two arrows!


We use the arrow “ → ” only between the sets, domain and codomain, and “ ↦ ” only between the elements.

Example 1.33. (a) f : N → N with f (x) = x2 maps each natural number to its square.

(Sketch: the map f : N → N, x ↦ x², sends 1 ↦ 1, 2 ↦ 4, 3 ↦ 9, 4 ↦ 16, 5 ↦ 25, . . . ; many elements of the codomain N are not hit.)

(b)
    f : R2 → R
    (x1, x2) ↦ x1² + x2²

(c)
    f : Z × N → Q
    (q, p) ↦ q/p

Some words about well-definedness


What can go wrong with the definition of a map? Sometimes, when defining a function,
it is not completely clear if this makes sense. Then one has to work and explain why the
function is well-defined.

Example: the square-root

Try to define a map a ↦ √a in a mathematically rigorous way.
Naive definition:

    √ : R → R
    a ↦ the solution of x² = a.

Problem of well-definedness: the above equation can have two solutions in the case that a > 0. However, in the case of a < 0, there are no solutions at all.
One possible way out: restrict the domain of definition and the codomain to

    R₀⁺ := {a ∈ R : a ≥ 0}.

Then:

    √ : R₀⁺ → R₀⁺
    a ↦ the non-negative solution of x² = a.

This yields the classical square-root function.



Image and preimage

For every well-defined map f : X → Y and A ⊂ X, B ⊂ Y we are interested in the


following sets:

Definition 1.34.
Let f : X → Y be a function and A ⊂ X and B ⊂ Y some sets.

    f(A) := {f(x) : x ∈ A}   is called the image of A under f.

    f⁻¹(B) := {x ∈ X : f(x) ∈ B}   is called the preimage of B under f.

(Sketches: the image f(A) ⊂ Y of a set A ⊂ X, and the preimage f⁻¹(B) ⊂ X of a set B ⊂ Y.)

Note that the preimage can also be the empty set if none of the elements in B are “hit”
by the map.
To describe the behaviour of a map, the following sets are very important:

Definition 1.35. Range and fiber


Let f : X → Y be a map. Then

    Ran(f) := f(X) = {f(x) : x ∈ X}   is called the range of f.

For each y ∈ Y the set

    f⁻¹({y}) := {x ∈ X : f(x) = y}   is called a fiber of f.
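For a map between finite sets, image, preimage and fibers can be computed by brute force. The following Python sketch (an illustration with a made-up finite version of x ↦ x², not part of the notes) does exactly that:

    f = {x: x**2 for x in range(1, 6)}   # the map x -> x^2 on {1, ..., 5}

    def image(f, A):
        return {f[x] for x in A}

    def preimage(f, B):
        return {x for x in f if f[x] in B}

    print(image(f, {1, 2, 3}))       # {1, 4, 9}
    print(preimage(f, {4, 9, 7}))    # {2, 3}; 7 is not hit, so it contributes nothing
    print(preimage(f, {3}))          # set(): the fiber of 3 is empty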

If these definitions seem too abstract, the following video may help you to get used to the
terms.

Video: Range, Image and Preimage

https://jp-g.de/bsom/la/sls5/



Injectivity, surjectivity, bijectivity, inverse

Definition 1.36. Injective, surjective and bijective


A map f : X → Y is called

• injective if every fiber of f has at most one element: x1 ≠ x2 ⇒ f(x1) ≠ f(x2).

• surjective if Ran(f) = Y. With quantifiers: ∀y ∈ Y ∃x ∈ X : f(x) = y.

• bijective if f is both injective and surjective.

Example 1.37. Define the function that maps each student to her or his chair. This
means that X is the set of all students in the room, and Y the set of all chairs in the
room.
• well-defined: every student has a chair
• surjective: every chair is taken
• injective: on each chair there is no more than one student
• bijective: every student has his/her own chair, and no chair is empty

(Sketch: a map X → Y that is neither injective nor surjective.)

Rule of thumb: Surjective, injective, bijective


A map f : X → Y is

surjective ⇔ at each y ∈ Y at least one arrow arrives
           ⇔ f(X) = Y
           ⇔ the equation f(x) = y has a solution for all y ∈ Y

injective  ⇔ at each y ∈ Y at most one arrow arrives
           ⇔ (x1 ≠ x2 ⇒ f(x1) ≠ f(x2))
           ⇔ (f(x1) = f(x2) ⇒ x1 = x2)
           ⇔ the equation f(x) = y has a unique solution for all y ∈ f(X)

bijective  ⇔ at each y ∈ Y exactly one arrow arrives
           ⇔ the equation f(x) = y has a unique solution for all y ∈ Y

Thus, if f is bijective, there is a well-defined inverse map

    f⁻¹ : Y → X
    y ↦ x, where f(x) = y.

Then f is called invertible and f⁻¹ is called the inverse map of f.
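For maps between finite sets, these properties can be tested directly. The following Python sketch (an illustration with made-up helper names, not part of the notes) checks injectivity and surjectivity of a finite map stored as a dictionary and builds the inverse map when it exists:

    def is_injective(f):
        values = list(f.values())
        return len(values) == len(set(values))   # no value is hit twice

    def is_surjective(f, Y):
        return set(f.values()) == set(Y)         # every y in Y is hit

    def inverse(f):
        return {y: x for x, y in f.items()}      # only meaningful if f is bijective

    f = {1: 1, 2: 4, 3: 9}                       # n -> n^2 on a finite domain
    print(is_injective(f))                       # True
    print(is_surjective(f, {1, 4, 9}))           # True (onto the codomain {1, 4, 9})
    print(inverse(f))                            # {1: 1, 4: 2, 9: 3}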

Example 1.38. Consider the function f : N → {1, 4, 9, 16, . . .} given by f(n) = n². This is a bijective function. The inverse map f⁻¹ is given by:

    f⁻¹ : {1, 4, 9, 16, 25, . . .} → N
    m ↦ √m,   or equivalently:   n² ↦ n.

(Sketch: 1 ↦ 1, 4 ↦ 2, 9 ↦ 3, 16 ↦ 4, 25 ↦ 5, . . .)

Example 1.39. For a function f : R → R, we can sketch the graph {(x, f (x)) : x ∈ X}
in the x-y-plane:

(Sketches of the graphs of f : R → R, x ↦ x² − 1;  f : R → R, x ↦ x² + 1;  and f : R → R, x ↦ sin x, in the x-y-plane.)

Which of the functions are injective, surjective or bijective?

These notions might seem a little bit off-putting, but we will use them so often that you need to get used to them. Maybe the following video helps you as well:

Video: Injectivity, Surjectivity and Bijectivity

https://jp-g.de/bsom/la/sls6/

Composition of maps

Definition 1.40.
If f : X → Y and g : Y → Z, we may compose, or concatenate these maps:

    g ◦ f : X → Z
    x ↦ g(f(x))

We call g ◦ f the composition of the two functions.

Usually, g ◦ f ≠ f ◦ g; the latter does not even make sense, in general.

(Sketch: x ∈ X is sent by f to y = f(x) ∈ Y and then by g to z = g(y) ∈ Z; the whole chain X → Y → Z is g ◦ f.)

Example 1.41. (a) f : R → R, x ↦ x²;  g : R → R, x ↦ sin(x)

    g ◦ f : R → R,   x ↦ sin(x²)
    f ◦ g : R → R,   x ↦ (sin(x))²

(b) Let X be a set. Then idX : X → X with x ↦ x is called the identity map. If there is no confusion, one usually writes id instead of idX. Let f : X → X be a function.
Then
f ◦ id = f = id ◦ f.
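In Python, the composition g ◦ f of two functions can be written as a small helper; the sketch below (illustrative only, not part of the notes) reproduces part (a) of the example and shows that g ◦ f and f ◦ g are in general different:

    import math

    f = lambda x: x**2
    g = math.sin

    def compose(g, f):
        return lambda x: g(f(x))     # (g ∘ f)(x) = g(f(x))

    g_after_f = compose(g, f)        # x -> sin(x^2)
    f_after_g = compose(f, g)        # x -> (sin x)^2
    print(g_after_f(2.0), f_after_g(2.0))   # two different numbers in general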

Algebraic vs. analytic properties of maps


Maps are a versatile tool in mathematics and often the main object of interest. Many
other problems can be reformulated with maps.
We have seen here some algebraic properties: injectivity, surjectivity, bijectivity.
Other algebraic properties may be compatibility with operations on X and Y .
Examples:
    f(x − y) = f(x) − f(y)             affine maps
    f(αx) = αf(x)                      homogeneous maps
    f(αx + βy) = αf(x) + βf(y)         linear maps
    f(xy) = f(x)f(y)                   . . .

These are sometimes called “homomorphisms”.
In analysis next semester, we will learn about other properties, like continuity, differenti-
ability, integrability, . . . But for this, we have to define open sets first.

1.4 Natural numbers and induction


The natural numbers are N = {1, 2, 3, . . .}.
Using natural numbers is our first mathematical abstraction. We learn this as children in kindergarten.
What is this abstraction? A number is an abstraction for all finite sets of the same size.
• Question 1: When are two sets S, T of the same size? Have the same cardinality
|S| = |T |? Answer: They have the same size if there is a bijective map S → T . For
example, N and the set of all even numbers have the same cardinality.
• Question 2: When is a set S finite? Answer: It is finite if removing one element
changes the cardinality of S.
In mathematical language: “Natural numbers are equivalence classes of finite sets of the
same cardinality.”

Mathematical induction

Mathematical induction is an important technique of proof: Proof step by step. It is a


close relative to recursion in computer science.
“Assume I can solve a problem of size n. How can I solve one of size n + 1?”
In mathematics:
“If an assertion is true for n, show that it is true for n + 1”

Example 1.42. What is the sum of the first n natural numbers?


    sn := Σ_{k=1}^{n} k = ?

To make this practical, we need three ingredients:


(i) An idea what the result could be. (Induction hypothesis)
(ii) The verification that our hypothesis is true for n = 1 (Base case)
(iii) A proof, that if it holds for n, then also for n + 1. (Induction step)
Getting the first ingredient is often the most difficult one. Often one has to try it out:

s1 = 1
s2 = s1 + 2 = 3
s3 = s2 + 3 = 6
s4 = s3 + 4 = 10
s5 = s4 + 5 = 15
sn+1 = sn + n + 1

Ideas? Let’s take the hypothesis

    sn = (n + 1)n / 2    (Induction hypothesis).

Very good! We can verify our formula for these examples. In particular:

    s1 = (1 + 1) · 1 / 2 = 1    (Base case).

Induction step: We have to show that

    sn+1 = sn + (n + 1) = (n + 1)n / 2 + n + 1

is equal to (n + 2)(n + 1) / 2, where we used the induction hypothesis in the last step. So let us compute:

    sn + (n + 1) = (n + 1)n / 2 + n + 1 = (n² + n + 2n + 2) / 2 = (n + 2)(n + 1) / 2.

This proves that sn = (n + 1)n / 2 for all n ∈ N.
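Because induction is so closely related to recursion and loops, you can also let a computer confirm the formula for small n; the following Python sketch is a plausibility check (not a proof, and not part of the notes) comparing the directly computed sum with the closed formula:

    def s(n):
        return sum(range(1, n + 1))          # 1 + 2 + ... + n

    for n in range(1, 11):
        assert s(n) == (n + 1) * n // 2      # the proven formula
    print("formula confirmed for n = 1, ..., 10")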

We will get plenty of other examples later.

Rule of thumb: Mathematical induction


To show that the predicate A(n) is true for all n ∈ N, we have to show two things:

(1) Show that A(1) is true.

(2) Show that A(n + 1) is true under the assumption that A(n) is true.

Sometimes it can happen that a claim A(n) is indeed false for finitely many natural
numbers, but it is eventually true. This means that the base case cannot be shown for
n = 1 but for some other natural number n0 ∈ N. Then the induction proof shows that
A(n) is true for all natural number n ≥ n0 .

1.5 Summary
• For doing Mathematics, we need logic and sets. A set is just a gathering of its
elements.
• Important symbols: ∈, ∉, ∅, ∀, ∃, ⊂, ⊊, ∩, ∪, \
• Implication A ⇒ B: If A holds, then also B.
• Equivalence A ⇔ B: The statement A holds if and only if B holds.
• Sums and products Σ, Π
• A map or function f : X → Y sends each x ∈ X to exactly one y ∈ Y .
• f is surjective: Each y ∈ Y is “hit” (one or more times).
• f is injective: Each y ∈ Y is “hit” at most one time.
• f is bijective: Each y ∈ Y is “hit” exactly once.
• If f : X → Y is bijective, then the inverse map f⁻¹ : Y → X sends each y ∈ Y to exactly one x ∈ X.
• The composition g ◦ f : X → Z is the application of the function g : Y → Z to the
result of another function f : X → Y : (g ◦ f )(x) = g(f (x)).
• Mathematical induction is a tool for proving mathematical statements for all natural
numbers at once. You have to show a base case and then do the induction step.

1.6 Exercises

Exercise 1
Calculate the following numbers and sets:
(a) Π_{j=2}^{4} j/(j+1),    (b) Σ_{i=0}^{4} 3,    (c) ⋃_{n=0}^{5} [2n, 2n + 2),    (d) Σ_{k=1}^{50} k.

Exercise 2
(a) Consider the two functions f1 : R → R, x ↦ x² and f2 : [0, ∞) → R, x ↦ x². For both functions calculate the preimages of the sets {1}, [4, 9) and (−1, 0).
(b) Consider the two functions g1 : R → [0, 1], x ↦ |sin(x)| and g2 : [0, 2π] → R, x ↦ sin(x). For both functions calculate the images of the sets (0, π/2), [0, π) and (0, 2π].

(c) Consider the two functions h1 : R → R and h2 : [−1, 1] → [√3, 2] given by

    x = (h1(x) + 2)² − 2   and   x² + h2(x)² = 4.

Check whether h1 and h2 respectively are correctly defined.


(d) Consider all 6 functions from above and find out which of them are injective, surjective
and bijective. Try to provide proofs and counterexamples.

Exercise 3
Let X be the set of all fish in a given aquarium. Define a function f : X → Y by mapping
every fish on its species where Y denotes the set of all species of fish. What does it mean
if f is injective or surjective or bijective?

Exercise 4
In the lecture you already learnt about the example (A → B) ⇔ (¬B → ¬A) of two
logically equivalent statements. Show that the following statements are also logically
equivalent by using truth tables:
(a) ¬(A ∧ ¬B) ⇔ (A → B),
(b) ¬(A ∧ B) ⇔ ¬A ∨ ¬B.

Exercise 5
One usually deals with subsets A, B, etc. of a given fixed set X. In such a situation it
is useful to introduce Ac := X \ A which is called the complement of A (with respect to
(w.r.t.) the set X). Show for A, B ⊂ X
(a) A \ B = A ∩ B c ,
(b) (A ∩ B)c = Ac ∪ B c .

Exercise 6
Let A, B and C be sets.
(a) Show A × (B ∪ C) = (A × B) ∪ (A × C).
(b) Let |A| = n and |B| = m where n, m ∈ N. Show that

|A × B| = n · m.
2 Vectors in Rn
This is Frank Drebin, Police Squad. Throw down your guns, and come
on out with your hands up. Or come on out, then throw down your
guns, whichever way you want to do it. Just remember the two key
elements here: one, guns to be thrown down; two, come on out!

After the first chapter about the foundations, we can finally start with the first topics
about Linear Algebra. There is a whole video series which can help you understand the definitions and propositions of the next sections.

Video: Linear Algebra Video Series

https://jp-g.de/bsom/la/la00/

2.1 What are vectors?

In this section we have some informal discussions about the objects of linear algebra. We will turn all of these objects into rigorous definitions later.
When we are talking about a vector, we often mean an object or a quantity that has a length and a direction in some sense. Therefore, we can always visualise this object as an “arrow”, and we write, for example, v⃗ and w⃗ for two vectors.


Now we can do exactly two things with vectors. First of all, we can scale a vector v⃗ by a number λ and get a new vector that has the same direction but a different length. The second operation combines two vectors into a new one: this is vector addition, where one sets the tail of the one arrow at the tip of the other one.

With vectors or arrows, you can do two things:

• Add the two arrows by concatenating them, and call the result v⃗ + w⃗.

• Scale an arrow by a (positive or negative) factor λ, and call the result λv⃗.

(Sketch: the arrows v⃗, w⃗ and v⃗ + w⃗, and the scaled arrows 3v⃗ and −½v⃗.)

With these operations we can combine v⃗ and w⃗ into a large number of arrows, and this is what one calls a linear combination:

If we scale two vectors v⃗ and w⃗ and add them, we get a new vector:

    λv⃗ + µw⃗    (linear combination)

Mostly, there is no confusion about which variables are vectors and which ones are just numbers, so we will omit the arrow from now on. However, we will use bold letters in this script to denote vectors most of the time.

2.2 Vectors in the plane

We already know that we can describe the two-dimensional plane by the Cartesian product R × R, which consists of all the pairs of real numbers. For each point in the plane, there is an arrow whose tail sits at the origin. This is what one calls a position vector.

(Sketch: the position vector v = (3, 2) drawn as an arrow from the origin to the point (3, 2).)

Our vector is given by the point in the coordinate system, which means it consists of exactly two numbers, an x- and a y-coordinate. The arrow is determined if we know these two numbers; in the example above we can write

    v = ( 3 )
        ( 2 ) .

The first number says how many steps we have to go to the right (or left), and the second number says how many steps we have to go upwards (or downwards) parallel to the y-axis. These numerical representations of the arrows are called columns or column vectors.
Now we also know how to add and scale these column vectors:

Define addition and scaling:

    v + w = ( v1 + w1 ),        λv = ( λv1 )
            ( v2 + w2 )               ( λv2 )

These are the two things, we want to do with vectors and now we can describe such
arrows in the two-dimensional plane. We have the geometric view given by arrows and
the numerical view by operating on the coordinates.

Definition 2.1. Vector space R2


The set R2 = R × R is called the vector space R2 if we write the elements in column form

    v = ( v1 )      with v1, v2 ∈ R
        ( v2 )

and use the vector addition and scaling from above. The numbers v1 and v2 are called the components of v.

For describing each point in the plane, the following elements are useful:

Definition 2.2. Zero vector and canonical unit vectors


The two vectors e1, e2 ∈ R2 are called canonical unit vectors and o is called the zero vector:

    o = ( 0 ),    e1 = ( 1 ),    e2 = ( 0 ).
        ( 0 )          ( 0 )          ( 1 )

Note that we can write every vector v ∈ R2 as a linear combination of the two unit vectors:

    v = ( v1 ) = v1 e1 + v2 e2.
        ( v2 )

(Sketch: v decomposed into v1 e1 along the x-axis and v2 e2 along the y-axis.)

Linear combinations

To compare apples and oranges: An apple has 8 mg vitamin C and 4 µg vitamin K. An orange has 85 mg vitamin C and 0.5 µg vitamin K:

    Apple a = ( 8  VitC ),      Orange b = ( 85  VitC )
              ( 4  VitK )                  ( 0.5 VitK )

Fruit salad: How much vitamin C and vitamin K do I get if I eat 3 apples and 2 oranges? Answer:

    3a + 2b = 3 ( 8 ) + 2 ( 85  ) = ( 3 · 8 + 2 · 85  ) = ( 194  VitC )
                ( 4 )     ( 0.5 )   ( 3 · 4 + 2 · 0.5 )   ( 13   VitK )

Here, you can see a rough sketch of this vector addition:



(Sketch: the vectors a, b and the linear combination 3a + 2b in the VitC-VitK plane.)
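Such componentwise computations are easy to reproduce in code. The following Python sketch (illustrative only, not part of the notes) recomputes the fruit salad 3a + 2b with plain lists as column vectors:

    def add(v, w):
        return [vi + wi for vi, wi in zip(v, w)]

    def scale(lam, v):
        return [lam * vi for vi in v]

    a = [8, 4]       # apple:  (VitC, VitK)
    b = [85, 0.5]    # orange: (VitC, VitK)
    print(add(scale(3, a), scale(2, b)))   # [194, 13.0], i.e. 3a + 2b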

A vector written as

λa + µb with λ, µ ∈ R (2.1)

is called a linear combination of a and b. We can expand this definition:

Definition 2.3. Linear combination


Let v1, . . . , vk be vectors in R2 and λ1, . . . , λk ∈ R scalars. Then

    Σ_{j=1}^{k} λj vj = λ1 v1 + · · · + λk vk

is called a linear combination of the vectors.

Orthogonal vector and inner product

Question:
Which vectors v in R2 are perpendicular to the vector u = (2, 1)?

Doing the sketch, one easily recognises that, for example, v = (−1, 2) is perpendicular to u. Of course, all multiples of this vector will also work. In general:

    v ∈ R2 is perpendicular to u = (u1, u2)   ⇐⇒   v = λ (−u2, u1) for a λ ∈ R.    (2.2)

(Sketch: the arrows u and v meeting at a right angle.)

Rule of thumb: orthogonal vector in R2


To find a vector that is orthogonal to (x, y), exchange the x and y and write a minus sign in front of one of the two.

Looking at (2.2), we can reformulate:

    u = (u1, u2) and v = (v1, v2) are orthogonal   ⇔   (v1, v2) = λ (−u2, u1) for a λ ∈ R
                                                   ⇔   u1 v1 = −u2 v2
                                                   ⇔   u1 v1 + u2 v2 = 0

Hence, this means that u and v are orthogonal if the calculation of u1 v1 + u2 v2 gives us 0. Therefore, the term u1 v1 + u2 v2 is used to define the so-called inner product or scalar product.

Definition 2.4. Inner product: ⟨vector, vector⟩ = number

For two vectors u = (u1, u2), v = (v1, v2) ∈ R2, the number

    ⟨u, v⟩ := u1 v1 + u2 v2 = Σ_{i=1}^{2} ui vi

is called the (standard) inner product of u and v. Sometimes it is also called the (standard) scalar product.

Definition 2.5. Orthogonality of two vectors in R2


Two vectors u and v in R2 are called orthogonal (or perpendicular) if ⟨u, v⟩ = 0 holds. We also denote this by u ⊥ v.

By using Pythagoras’ theorem, we can calculate the length of the arrow in the coordinate
system.

    Length of v = √(v1² + v2²)

(Sketch: the right triangle with legs v1 e1 and v2 e2 and hypotenuse v.)

Obviously, we can also define it by using the inner product:

Definition 2.6. Norm of a vector in R2


For a vector v = (v1, v2) ∈ R2, the number

    ‖v‖ := √⟨v, v⟩ = √(v1² + v2²)

is called the norm or length of v.
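Inner product, norm and the orthogonality test translate directly into code. The following Python sketch (illustrative only, not part of the notes) computes them for small example vectors:

    import math

    def inner(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def norm(v):
        return math.sqrt(inner(v, v))

    u = [2, 1]
    print(inner(u, [-1, 2]))   # 0: u and (-1, 2) are orthogonal
    print(inner(u, [1, 1]))    # 3: u and (1, 1) are not orthogonal
    print(norm([3, 2]))        # sqrt(13), the length of (3, 2)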

Lines in R2
For describing points in the plane, we can use the position vectors and just use the vector
operations to define objects in the plane. One of the simplest objects is a line g inside
the plane:

First case: The origin lies on the line g.


We already know that all vectors that are orthogonal to a fixed vector u ∈ R2, which means that ⟨u, v⟩ = 0, build a line through the origin. On the other hand, if we have a line g through the origin, we can find a vector n that is perpendicular to the vectors lying on the line. Such an orthogonal vector is often called a normal vector of the line.
In this first case, where g goes through the origin, we denote the normal vector by n = (α, β) ∈ R2 and get:

    g = {v ∈ R2 : ⟨n, v⟩ = 0} = {(x, y) ∈ R2 : αx + βy = 0}.

Second case: General case.



In this case, there is generally no special point on the line, so we can choose any point P with position vector p = (p1, p2) to fix the line in the plane. Now, if we again fix a normal vector n = (α, β) of the line g, then we can describe all points V (with position vector v = (x, y)) on the line: such a point V lies on g if and only if the vector v − p lies inside the line, which means it is orthogonal to n. Calculating this gives us:

    0 = ⟨n, v − p⟩ = ⟨(α, β), (x − p1, y − p2)⟩ = α(x − p1) + β(y − p2) = αx + βy − (αp1 + βp2),

where we abbreviate δ := αp1 + βp2.

Lines in the plane R2 (Equation in normal form)


For each line g, one has the following representation:

    g = {v ∈ R2 : ⟨n, v − p⟩ = 0} = {(x, y) ∈ R2 : αx + βy = δ}

with δ := αp1 + βp2 = ⟨(α, β), (p1, p2)⟩ = ⟨n, p⟩. If the origin lies on g, then δ = 0 (choose p = o).
(choose p = o).
(Sketch: the line g through the point P with normal vector n; for a point V on g, the vector v − p is orthogonal to n.)

2.3 The vector space Rn


Instead of restricting ourselves to two components, we could also imagine that we have an arbitrary number n of directions. It is easy to visualise a three-dimensional space but harder to do this for an n-dimensional space when n > 3. However, even without a visualisation, we can transfer the calculations from above to column vectors with n components:

    λ ∈ R, v = (v1, . . . , vn)   ⇒   λv := (λv1, . . . , λvn),
    u = (u1, . . . , un), v = (v1, . . . , vn)   ⇒   u + v := (u1 + v1, . . . , un + vn),

where the n-tuples are meant as columns.

Definition 2.7. Vector space Rn


The set Rn = R × · · · × R is called the vector space Rn if we write the elements in column form

    v = (v1, . . . , vn)    with v1, . . . , vn ∈ R

and use the vector addition and scaling from above. The number vi is called the ith component of v.

The same calculation rules as for R2 also hold for the general case. The most important
properties we should note:

Proposition 2.8. Properties of the vector space Rn


The set V = Rn with the addition + and scalar multiplication · fulfils the following:

(1) ∀v, w ∈ V : v+w =w+v (+ is commutative)


(2) ∀u, v, w ∈ V : u + (v + w) = (u + v) + w (+ is associative)
(3) There is a zero vector o ∈ V with the property: ∀v ∈ V we have v + o = v.
(4) For all v ∈ V there is a vector −v ∈ V with v + (−v) = o.
(5) For the number 1 ∈ R and each v ∈ V , one has: 1 · v = v.
(6) ∀λ, µ ∈ R ∀v ∈ V : λ · (µ · v) = (λµ) · v (· is associative)
(7) ∀λ ∈ R ∀v, w ∈ V : λ · (v + w) = (λ · v) + (λ · w) (distributive ·+)
(8) ∀λ, µ ∈ R ∀v ∈ V : (λ + µ) · v = (λ · v) + (µ · v) (distributive +·)

Each set V with an addition and scalar multiplication that satisfies the eight rules above
is called a vector space. We will come back to this in an abstract sense later. First we
will use this notion to talk about vector spaces inside Rn .

Definition 2.9. Zero vector and canonical unit vectors


For i = 1, . . . , n, we denote the ith canonical unit vector by ei ∈ Rn and the zero vector by o ∈ Rn, which means (written as columns):

    o = (0, 0, . . . , 0),  e1 = (1, 0, . . . , 0),  e2 = (0, 1, 0, . . . , 0),  . . . ,  en−1 = (0, . . . , 0, 1, 0),  en = (0, . . . , 0, 0, 1).

2.4 Linear and affine subspaces (and the like)


So far, when we identified vectors in Rn with points in our usual space, we considered
single points, lines, planes, and the space itself. Sometimes these objects went through
the origin, sometimes they did not. Let us develop a general name for these things:

Linear subspaces

Rule of thumb:
Linear subspaces correspond to lines, planes,. . . through the origin.

Definition 2.10. Subspaces in Rn


A nonempty subset U ⊂ Rn is called a (linear) subspace of Rn if all linear combin-
ations of vectors in U remain also in U :
    u1, . . . , uk ∈ U, λ1, . . . , λk ∈ R   =⇒   Σ_{j=1}^{k} λj uj ∈ U.

Since we can set all λj to 0, the zero vector o is always contained in U, and therefore {o} is the smallest possible subspace. On the other hand, Rn itself is the largest possible subspace. Both are called the trivial subspaces.

Each linear subspace U of the vector space Rn is also a vector space in the sense of
the properties given in Proposition 2.8.

Linear combinations remain in U (by definition), and rules are inherited from V .

Proposition 2.11. Characterisation for subspaces


Let U ⊂ Rn with U ≠ ∅, such that

u, v ∈ U , λ, µ ∈ R =⇒ λu + µv ∈ U . (2.3)

Then U is already a linear subspace.

Proof. We do the proof by induction for k vectors like in the definition of a subspace:

Induction hypothesis (IH): Linear combinations of k vectors remain in U .


Base case (BC): For k = 2. This is exactly given by equation (2.3).
Induction step (IS): k → k + 1. Let u1 , . . . , uk+1 ∈ U and λ1 , . . . , λk+1 be given. We can
write:
    v := Σ_{j=1}^{k+1} λj uj = ( Σ_{j=1}^{k} λj uj ) + λk+1 uk+1 = w + λk+1 uk+1 ∈ U,

where we abbreviate w := Σ_{j=1}^{k} λj uj.

By our induction hypothesis, w ∈ U because it is a linear combination of k vectors. Thus,


v ∈ U as well because it is a linear combination of w and uk+1 , see (2.3).

Thus, to check if a given set U is a linear subspace, we only have to check if linear
combinations of two vectors remain in U . Or we can check it separately:

Corollary 2.12. How to check if a set is a subspace


Let U ⊂ Rn such that

(1) o∈U,
(2) u ∈ U , λ ∈ R =⇒ λu ∈ U ,
(3) u, v ∈ U =⇒ u + v ∈ U .

Then U is already a linear subspace.

Rule of thumb: Subspace


A set U is a subspace if, by applying the operations + and λ· on elements of U , one
cannot escape the set U .

Linear hull or span


If we take a set of vectors M ⊂ Rn , we can create a linear subspace by building all possible
linear combinations:

Definition 2.13. Span


Let M ⊂ Rn be any non-empty subset. Then we define:

    Span(M) := { u ∈ Rn : ∃ λ1, . . . , λk ∈ R, u1, . . . , uk ∈ M such that u = Σ_{j=1}^{k} λj uj }.

This subspace is called the span or the linear hull of M . For convenience, we define
Span(∅) := {o}.

An equivalent definition would be: Span (M ) is the smallest linear subspace U ⊂ Rn with
M ⊂ U . See Proposition 2.15.

Rule of thumb: All linear combinations form the span


Every vector in Span (M ) can be written (possibly in several ways) as a linear com-
bination of elements of M . Vice versa, every linear combination of M is contained
in Span (M ).

Most interesting is the case, where M = {u1 , . . . , uk } just consists of finitely many vectors.
We say that U := Span(M ) is spanned by the vectors u1 , . . . , uk or, the other way around,
that {u1 , . . . , uk } is a generating set for U (generates U , spans U ). In this case, we often
write U = Span(u1 , . . . , uk ).

Example 2.14. The vector space Rn is spanned by the n unit vectors

    e1 = (1, 0, . . . , 0),  e2 = (0, 1, 0, . . . , 0),  . . . ,  en−1 = (0, . . . , 0, 1, 0),  en = (0, . . . , 0, 0, 1)

(written as columns), because v = Σ_{i=1}^{n} vi ei for all v ∈ Rn. In short: Rn = Span(e1, . . . , en).

To check, if a vector space is spanned by some vectors, we only have to check this for
some generating set:

Proposition 2.15. Span is smallest linear subspace


Let U ⊂ Rn be a linear subspace and M ⊂ U any set. Then Span (M ) is a linear
subspace and Span(M ) ⊂ U .

Proof. Exercise!

We need one further notation.

Definition 2.16. Addition for subspaces?


If U1 and U2 are linear subspaces in Rn , then one defines

U1 + U2 := Span (U1 ∪ U2 ) = {u1 + u2 : u1 ∈ U1 , u2 ∈ U2 } .

Example 2.17. Let us look at some spans:


(a) Span((3, 1)) ⊂ R2 is the line that “the vector (3, 1) spans”, going through the origin of R2.
(b) Span((1, 0), (0, 1)) is the whole plane R2. Span((1, 0), (1, 1)) is also the whole plane.
(c) Span((1, 0, 0), (0, 1, 0)) is the xy-plane in R3.
(d) Span((1, 2, 3), (2, 4, 7)) is a plane in R3 going through (0, 0, 0), (1, 2, 3) and (2, 4, 7).
(e) Span((1, 0, 0), (0, 1, 0), (0, 0, 1)) is the whole space R3. Span((1, 1, 0), (0, 1, 1), (1, 0, 1)) is also the whole space.
(f) Span((1, 2, 3, 4, 5)) is a “line” in R5, and Span((1, 2, 3, 4, 5), (5, 4, 3, 2, 1)) is a “plane”.

Affine subspaces and convex subsets

Rule of thumb:
Affine subspaces correspond to arbitrary lines, planes,. . . . In other words: translated
linear subspaces.

If we do not want o to be part of our “generalised plane”, we have to replace linear
combinations by affine combinations:
$$ v = \sum_{j=1}^{k} \lambda_j u_j \quad \text{where} \quad \sum_{j=1}^{k} \lambda_j = 1. $$

Example 2.18. Consider the position vectors


   
$$ a = \begin{pmatrix} -1 \\ 2 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 3 \\ 4 \end{pmatrix} $$

corresponding to the points A and B. Find the centre point of the line between A and B.

(Figures: the points A and B with position vectors a and b in the plane, the centre point M of the segment from A to B, and a further point P on that segment.)

The connection vector from A to B is then:


           
$$ -a + b = -\begin{pmatrix} -1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1+3 \\ -2+4 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} =: d $$

The centre point is then given by going only half way in the direction of d:
$$ m = a + \tfrac{1}{2} d = \begin{pmatrix} -1 \\ 2 \end{pmatrix} + \tfrac{1}{2}\begin{pmatrix} 4 \\ 2 \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \tag{2.4} $$

The point M with position vector $m = \binom{1}{3}$ is the wanted centre point. In general, we get
the formula:
$$ m = a + \tfrac{1}{2} d = a + \tfrac{1}{2}(-a + b) = a - \tfrac{1}{2} a + \tfrac{1}{2} b = \tfrac{1}{2} a + \tfrac{1}{2} b = \tfrac{1}{2}(a + b) $$

Instead of using $\tfrac{1}{2}$, we can choose λ ∈ R to divide the line from A to B. We get:
$$ q := a + \lambda \underbrace{(-a + b)}_{d} = (1 - \lambda) a + \lambda b \tag{2.5} $$

The corresponding point Q (with position vector q from the equation above) lies

at point A if λ = 0,
at point B if λ = 1,
at the centre point M if λ = 1/2,
between A and B if λ ∈ [0, 1], (2.6)
on the line through A and B for all λ ∈ R, (2.7)
on the line through A and B, “in front of” A for all λ < 0,
on the line through A and B, “behind” B for all λ > 1.

(Figure: the points Q_λ = (1 − λ)a + λb on the line through A and B, e.g. Q_0 = A, Q_1 = B, Q_{1/2} = M, Q_{−1} “in front of” A and Q_2 “behind” B.)
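
A hedged numerical illustration (not from the notes; Python with NumPy assumed, the helper name point_on_line is invented) of the affine combinations q = (1 − λ)a + λb from equation (2.5):

import numpy as np

a = np.array([-1., 2.])   # position vector of A
b = np.array([3., 4.])    # position vector of B

def point_on_line(lam):
    """Affine combination (1 - lambda) a + lambda b, cf. equation (2.5)."""
    return (1 - lam) * a + lam * b

print(point_on_line(0.0))   # [-1.  2.]  -> A
print(point_on_line(1.0))   # [ 3.  4.]  -> B
print(point_on_line(0.5))   # [ 1.  3.]  -> centre point M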

This brings us to the following:



Definition 2.19. Affine Subspaces in Rn


A subset U ⊂ Rn is called an affine subspace of Rn if all affine combinations of
vectors in U remain also in U :
$$ u_1, \dots, u_k \in U,\ \lambda_1, \dots, \lambda_k \in \mathbb{R} \text{ with } \sum_{j=1}^{k} \lambda_j = 1 \quad\Longrightarrow\quad \sum_{j=1}^{k} \lambda_j u_j \in U $$

Definition 2.20. Convex subsets in Rn


A subset U ⊂ Rn is called a convex subset of Rn if all convex combinations of
vectors in U remain also in U :
$$ u_1, \dots, u_k \in U,\ \lambda_1, \dots, \lambda_k \in [0, 1] \text{ with } \sum_{j=1}^{k} \lambda_j = 1 \quad\Longrightarrow\quad \sum_{j=1}^{k} \lambda_j u_j \in U $$

The analogous formulation to the linear hull is the affine hull. Try to give a definition!

Proposition 2.21. Properties of affine subspaces


(i) Each linear subspace is an affine subspace.

(ii) If an affine subspace contains o, it is a linear subspace.

(iii) Given an affine subspace S ⊂ Rn and a vector v ∈ Rn , the translated set:

v + S := {x ∈ Rn : x = v + s for s ∈ S}

is also an affine subspace.

(iv) Every nonempty affine subspace S can be written in the form S = v + U for
some v ∈ S and U a linear subspace.

Proof. (i): Follows from the definition because each affine combination is a linear combination.
(ii): Let S be an affine subspace with o ∈ S and let λ1 u1 + · · · + λk uk be an arbitrary linear combination of elements of S. By adding the term (1 − λ1 − · · · − λk ) · o, we can make it an affine combination with the same value, which lies in S. Hence S is closed under linear combinations and therefore a linear subspace.
(iii): Let us write an arbitrary affine combination of elements of v + S:
$$ \sum_{j=1}^{k} \lambda_j (s_j + v) = \sum_{j=1}^{k} \lambda_j s_j + \Big(\underbrace{\sum_{j=1}^{k} \lambda_j}_{=1}\Big) v = \underbrace{\sum_{j=1}^{k} \lambda_j s_j}_{\in S} + v \;\in\; v + S . $$

(iv): If S is an affine subspace, choose v ∈ S and define U = −v + S. By (iii), U is an
affine subspace and it contains o. Hence, by (ii), it is a linear subspace. Obviously, we
have S = v + U .

Proposition 2.22. Characterisation of affine subspaces


Let S ⊂ Rn , such that

a, b ∈ S , λ ∈ R =⇒ λa + (1 − λ)b ∈ S

Then S is already an affine subspace.

Proof. We do a proof by mathematical induction:

Induction hypothesis: affine combinations of k vectors remain in S. In other words:
$$ v = \sum_{j=1}^{k} \lambda_j a_j \quad \text{and} \quad \sum_{j=1}^{k} \lambda_j = 1 \quad \text{implies} \quad v \in S $$
for every k and every admissible choice of λj and aj ∈ S.

Base case: by assumption, this is certainly true for k = 2.

Induction step: k → k + 1. Let aj and λj be given for all j ∈ {1, . . . , k + 1}. We may assume λk+1 ≠ 1 (since the λj sum to 1, not all of them can be equal to 1, so we can reorder the terms). By definition
$$ \lambda_1 + \dots + \lambda_k = 1 - \lambda_{k+1} , $$
thus we can write:
$$ v = \sum_{j=1}^{k+1} \lambda_j a_j = (1 - \lambda_{k+1}) \underbrace{\Big( \sum_{j=1}^{k} \frac{\lambda_j}{\lambda_1 + \dots + \lambda_k}\, a_j \Big)}_{\text{affine combination } w} + \lambda_{k+1} a_{k+1} = (1 - \lambda_{k+1})\, w + \lambda_{k+1} a_{k+1} \in S . $$
By our induction hypothesis, w ∈ S, because it is an affine combination of k vectors.
Thus, v ∈ S as well, because it is an affine combination of w and ak+1 .

Conical combinations (an outlook)

There are also other rules for combining vectors. They lead to different classes of sets. For
example, conical combinations of vectors are defined as:
$$ v = \sum_{j=1}^{k} \lambda_j u_j \quad \text{where} \quad \lambda_j \ge 0. $$

The sets which contain all possible conical combinations of their elements are called convex
cones, and we can define the conical hull of a set of vectors.
We can summarise this in the following table:

                        no sign imposed      λj ≥ 0
    no sum imposed      linear               conical
    Σ λj = 1            affine               convex

For all these types of sets we know “. . . combinations” and “. . . hulls”.

This illustrates our strategy: describe things known from R2 and R3 algebraically, and
thus generalise them to arbitrary dimensions.

2.5 Inner product and norm in Rn


We transfer the notion of the inner product, which we use to define orthogonality and the
length of a vector, to the general Rn .

Definition 2.23. Inner product: hVector, Vectori = Number


For two vectors
$$ u = \begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix}, \quad v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n \quad \text{the number} \quad \langle u, v \rangle := u_1 v_1 + \dots + u_n v_n = \sum_{i=1}^{n} u_i v_i $$

is called the (standard) inner product of u and v. Sometimes also called: (standard)
scalar product. If hu, vi = 0, then we call u and v orthogonal.

Proposition 2.24.
The standard inner product h·, ·i : Rn × Rn → R fulfils the following: For all vectors
x, x0 , y ∈ Rn and λ ∈ R, one has

(S1) ⟨x, x⟩ > 0 for all x ≠ o, (positive definite)
(S2) ⟨x + x′ , y⟩ = ⟨x, y⟩ + ⟨x′ , y⟩, (additive)
(S3) ⟨λx, y⟩ = λ⟨x, y⟩, (homogeneous)
(S4) ⟨x, y⟩ = ⟨y, x⟩. (symmetric)

Properties (S2) and (S3) together say that ⟨·, ·⟩ is linear in the first argument.

Definition 2.25. Norm of a vector in Rn


For a vector
$$ v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^n \quad \text{the number} \quad \|v\| := \sqrt{\langle v, v \rangle} = \sqrt{v_1^2 + \dots + v_n^2} $$

is called the norm or length of v.

In general, we just need a map h·, ·i with the properties given in Proposition 6.18 to define
orthogonality as follows:
u ⊥ v :⇔ hu, vi = 0.

From the first binomial formula, we obtain directly a generalisation of the Pythagorean
theorem:
u ⊥ v ⇒ ku + vk2 = kuk2 + kvk2 .
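
As a quick check in code (an illustration only, not part of the notes; it assumes Python with NumPy), one can compute inner products and norms and verify the Pythagorean identity for orthogonal vectors:

import numpy as np

u = np.array([1., 2., 2.])
v = np.array([2., -1., 0.])

inner = float(np.dot(u, v))          # standard inner product <u, v>
norm_u = np.linalg.norm(u)           # length ||u|| = sqrt(<u, u>)

print(inner)                         # 0.0 -> u and v are orthogonal
print(norm_u)                        # 3.0

# Pythagorean theorem for orthogonal vectors: ||u + v||^2 = ||u||^2 + ||v||^2
print(np.isclose(np.linalg.norm(u + v)**2,
                 np.linalg.norm(u)**2 + np.linalg.norm(v)**2))  # True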

For a linear subspace U ⊂ Rn we define the orthogonal complement:

U ⊥ := {v ∈ Rn : hv, ui = 0 ∀u ∈ U } .

However, we come back to such constructions later.

2.6 A special product in R3 (!): The vector product or cross product
The three-dimensional space is in some sense special: One can define a product between
two vectors and gets a vector as a result. In contrast to the inner product, this multiplication
exists only in R3 :

Definition 2.26. Cross product: Vector × Vector = Vector


The cross product or vector product of two vectors
$$ u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix},\quad v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} \in \mathbb{R}^3 \quad \text{is given by} \quad u \times v := \begin{pmatrix} u_2 v_3 - u_3 v_2 \\ u_3 v_1 - u_1 v_3 \\ u_1 v_2 - u_2 v_1 \end{pmatrix} \in \mathbb{R}^3. $$

Rule of thumb: How to remember this formula?

Write the components of u and v twice below each other and multiply crosswise:
$$ \begin{matrix} u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \\ u_1 & v_1 \\ u_2 & v_2 \\ u_3 & v_3 \end{matrix} \qquad \Longrightarrow \qquad u \times v = \begin{pmatrix} +u_2 v_3 - u_3 v_2 \\ +u_3 v_1 - u_1 v_3 \\ +u_1 v_2 - u_2 v_1 \end{pmatrix} $$

Remark:
In some calculations it can be really helpful to use the Levi-Civita symbol:
$$ \varepsilon_{ijk} = \begin{cases} +1 & \text{if } (i,j,k) \text{ is an even permutation of } (1,2,3) \\ -1 & \text{if } (i,j,k) \text{ is an odd permutation of } (1,2,3) \\ 0 & \text{if } i = j \text{, or } j = k \text{, or } k = i \end{cases} $$

Then we find a short notation for the cross product of two vectors u, v ∈ R3 :
$$ u \times v = \sum_{i,j,k} \varepsilon_{ijk}\, u_i v_j\, e_k . $$

Since we have a good imagination for the three-dimensional space, we can interpret the
result of the cross product u × v in a geometric way. It is the only vector in R3 that has
the following three properties:

1.) u × v ⊥ u and u × v ⊥ v

2.) ku × vk is the area of the parallelogram spanned by u and v

3.) Orientation: “right-hand rule”

You can use the cross product, for example,


• to find a vector that is perpendicular to u and v,
• to calculate the area of a parallelogram.
Since every triangle is half of a parallelogram, you can also use it to calculate the area
of a triangle. Keep in mind that you can embed R2 into R3 to use the cross product even
if you have just a two-dimensional problem.
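
A short NumPy sketch (not part of the notes; an illustration only) that verifies these three uses for a concrete pair of vectors:

import numpy as np

u = np.array([1., 0., 0.])
v = np.array([1., 2., 0.])

w = np.cross(u, v)                   # the cross product u x v
print(w)                             # [0. 0. 2.]

print(np.dot(w, u), np.dot(w, v))    # 0.0 0.0 -> w is orthogonal to u and v
print(np.linalg.norm(w))             # 2.0 -> area of the parallelogram spanned by u, v
print(np.linalg.norm(w) / 2)         # 1.0 -> area of the triangle with sides u, v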

2.7 What are complex numbers?


Once we can solve the equation
$$ x^2 - 2 = 0 \quad \text{in } \mathbb{R}, $$
we also would like to solve
$$ x^2 + 1 = 0, \quad \text{i.e.} \quad x = \sqrt{-1}. $$
This has no real solution, because for x ∈ R, x2 ≥ 0 ⇒ x2 + 1 ≥ 1 > 0. However, in a
bigger number set, it is solvable.
Let us see what happens if we postulate the existence of two solutions and call them ±i,
where i stands for “imaginary” (some engineers use the letter j instead). This means that
$$ i^2 = -1, \quad \text{or} \quad i = \sqrt{-1}. $$

Of course, in general we would like to solve an arbitrary quadratic equation:
$$ x^2 + 2bx + c = 0, $$
which has the solutions
$$ x_{1,2} = -b \pm \sqrt{b^2 - c} . $$
If b2 − c ≥ 0, x1,2 can be computed as usual, but otherwise, we have to compute:
$$ \sqrt{\underbrace{b^2 - c}_{<0}} = \sqrt{(-1)\underbrace{(c - b^2)}_{>0}} = i\,\sqrt{c - b^2} \quad \Rightarrow \quad x_{1,2} = -b \pm i \sqrt{c - b^2} . $$

Thus, to write down solutions of quadratic equations, we have to define

Complex numbers C = {x + iy : x, y ∈ R}.

In fact, there is the fundamental theorem of algebra, which says that complex numbers
can even be used to solve any algebraic equation.

Theorem 2.27. Fundamental theorem of algebra


Every algebraic equation:

an xn + an−1 xn−1 + · · · + a1 x + a0 = 0, ak ∈ C : k = 0 . . . n

has at least one zero in C if the left hand side is not constant.

Complex plane

Geometrical identification:
$$ x + iy \in \mathbb{C} \quad \longleftrightarrow \quad \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 $$
(Figure: the complex plane with real axis Re, imaginary axis Im, the point z = x + iy and the angle ϕ between z and the positive real axis.)

(∗) Supplementary details: Complex numbers


The vector space R2 becomes a field by defining a multiplication · : R2 × R2 → R2 in the following
way:
$$ \begin{pmatrix} a \\ b \end{pmatrix} \cdot \begin{pmatrix} c \\ d \end{pmatrix} := \begin{pmatrix} ac - bd \\ bc + ad \end{pmatrix}. $$
One writes $a + ib := \binom{a}{b}$ and calls this field the complex numbers C. The natural embedding
R → C with a 7→ a + i0 justifies the notation R ⊂ C.

Computations in C

Business as usual, only with the new rule i2 = −1. We use two complex numbers w = u + iv and
z = x + iy:
$$ w + z = (u + iv) + (x + iy) = (u + x) + i(v + y) $$
$$ wz = (u + iv)(x + iy) = ux + i(vx + uy) + i^2 vy = (ux - vy) + i(vx + uy) $$
$$ \frac{w}{z} = \frac{u + iv}{x + iy} = \frac{(u + iv)(x - iy)}{(x + iy)(x - iy)} = \frac{ux + ivx - iuy - i^2 vy}{x^2 - (iy)^2} = \frac{ux + vy}{x^2 + y^2} + i\,\frac{vx - uy}{x^2 + y^2} $$

The last formula works if and only if z 6= 0, which means x2 + y 2 6= 0.


If w = u (i.e. v = 0) is a real number, then: uz = u(x + iy).
Thus, complex numbers can be added like vectors, and scaled by real numbers, just like
vectors. So we can think of complex numbers as 2d vectors:
 
$$ z = x + iy \;\cong\; \begin{pmatrix} x \\ y \end{pmatrix} \;\cong\; (x, y) $$

But they are more: vectors cannot be multiplied with each other, but with complex
numbers we can do that. Just like the reals, they are a field (but have no ordering). This
is very special. There is no 3d analogue to the complex numbers.

Definition 2.28.
We can define the following derived quantities for z = x + iy:

Re z = x (real part)
Im z = y (imaginary part)
$\bar z$ = x − iy (complex conjugate)
r = |z| = $\sqrt{x^2 + y^2}$ (absolute value, modulus)
ϕ = arg z = angle of z with the positive real line (argument)

A warning, concerning arg z: its value is not unique: for example

arg(−i) = −π/2 or 3π/2.

Usually, one either takes −π < arg z ≤ π or 0 ≤ arg z < 2π. In cases of ambiguity, one
has to carefully explain, what is meant.
We have:
$$ z = |z| \big(\cos(\arg z) + i \sin(\arg z)\big) = |z|\, e^{i \arg z} $$
It holds
$$ |wz| = |w|\,|z| \quad \text{and} \quad \arg(zw) = \arg z + \arg w. $$
So we can write shortly:
$$ zw = |w|\,|z|\, e^{i(\arg w + \arg z)} $$
However, as usual with an angle, we would like to have arg w + arg z in [0, 2π[. Thus,
$$ zw = |w|\,|z|\, e^{i\varphi} $$
where ϕ is chosen by using a k ∈ Z in a way that 0 ≤ ϕ = arg w + arg z − 2kπ < 2π.

Representations of complex numbers

$$ z = \underbrace{x + iy}_{\text{algebraical representation}} = \underbrace{|z| \cdot (\cos\varphi + i \sin\varphi)}_{\text{trigonometrical representation}} = \underbrace{|z| \cdot e^{i\varphi}}_{\text{exponential representation}} $$

The nth root of a complex number

It is well known that the equation x2 = a has the two solutions ±√a for a ≠ 0. What about
z n = a?
It follows from the multiplication rule that
$$ z^n = |z|^n e^{\,i n \arg z} = |a|\, e^{\,i \arg a} . $$
Thus |z| = |a|1/n and 0 ≤ arg a = n arg z − 2kπ < 2π, where k ∈ N can again be chosen.
Thus, we get the following solutions:
$$ \arg z_0 = \tfrac{1}{n} \arg a $$

$$ \arg z_1 = \tfrac{1}{n} (\arg a + 2\pi) $$
$$ \arg z_2 = \tfrac{1}{n} (\arg a + 2 \cdot 2\pi) $$
$$ \vdots $$

So in general:
$$ z_j = |a|^{1/n}\, e^{\frac{i}{n}(\arg a + j \cdot 2\pi)} \quad \text{for } j = 0, \dots, n-1 . $$
Thus, we have n complex nth roots, which are evenly distributed on the circle around 0
with radius |a|^{1/n} .
It does not matter here, which arg a one takes, if all the results are written again in the
form zk = xk + iyk .
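
A minimal Python sketch (not from the notes; the helper name nth_roots is made up) that computes all n roots with the formula above, using the standard cmath module:

import cmath

def nth_roots(a, n):
    """All n complex solutions z of z**n = a, evenly spaced on a circle."""
    r, phi = abs(a), cmath.phase(a)            # |a| and arg a
    return [r**(1/n) * cmath.exp(1j * (phi + 2*cmath.pi*j) / n)
            for j in range(n)]

# The three cube roots of 8: the real root 2 and two complex roots of modulus 2.
for z in nth_roots(8, 3):
    print(z, z**3)                             # each z**3 is (numerically) 8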

Summary
• Rn denotes the set of all vectors with n real components
• You can add and scale vectors. Both operations in Rn are realised by doing these
inside the components.
• Rn is an example of an abstract concept, called a vector space.
• Combinations like λv + µu are called linear combinations.
• There are linear subspaces and affine linear spaces. They are the generalisation of
lines and planes one can illustrate in R3 .
• The inner product shows you orthogonality and the norm measures the length of a
vector.
• In R3 we have the cross product to calculate orthogonal vectors.
• Complex numbers are given by a multiplication rule on R2 .
3 Matrices and linear systems
Arthur looked up. “Ford!” he said, “there’s an infinite number of
monkeys outside who want to talk to us about this script for Hamlet
they’ve worked out.”
Douglas Adams

In this chapter, we will study matrices in more detail, and after that, describe systems of
linear equations. First of all, a matrix is just a table of numbers. One writes the numbers
in a rectangle with m rows and n columns, where m, n are natural numbers.

Definition 3.1.
The set of all matrices with m rows and n columns is notated as:
   

$$ \mathbb{R}^{m \times n} := \left\{ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} : a_{ij} \in \mathbb{R},\ i = 1, \dots, m,\ j = 1, \dots, n \right\} $$

Since we can add matrices of the same size A + B by adding the entries and scale a matrix
λ · A by scaling the entries, the space Rm×n is also a vector space.

Definition 3.2. Matrix + Matrix = Matrix


Let A, B ∈ Rm×n . The addition A + B ∈ Rm×n is defined by
$$ \underbrace{\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}}_{A} + \underbrace{\begin{pmatrix} b_{11} & \dots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \dots & b_{mn} \end{pmatrix}}_{B} := \underbrace{\begin{pmatrix} a_{11} + b_{11} & \dots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \dots & a_{mn} + b_{mn} \end{pmatrix}}_{A+B} $$

Example 3.3.
       
$$ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 2 & -1 \end{pmatrix} = \begin{pmatrix} 1+1 & 2+0 \\ 3+2 & 4-1 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ 5 & 3 \end{pmatrix}. $$


Attention!
The addition A + B is only defined for matrices with the same height and the same
width.

Definition 3.4. Scalar · Matrix = Matrix


Let A ∈ Rm×n and λ ∈ R. Then the scalar multiplication λ · A ∈ Rm×n is defined
by:
$$ \lambda \cdot \underbrace{\begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix}}_{A} := \underbrace{\begin{pmatrix} \lambda a_{11} & \dots & \lambda a_{1n} \\ \vdots & & \vdots \\ \lambda a_{m1} & \dots & \lambda a_{mn} \end{pmatrix}}_{\lambda \cdot A} . $$

Example 3.5.
         
$$ 2 \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 3 & 2 \cdot 4 \end{pmatrix} = \begin{pmatrix} 2 & 4 \\ 6 & 8 \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} + \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}. $$

(∗) Supplementary details: Rm×n is a vector space


The set V := Rm×n with the addition + and scalar multiplication · is a vector space, which means
it fulfils the following:
(1) ∀A, B ∈ V : A+B =B+A (+ is commutative)
(2) ∀C, A, B ∈ V : C + (A + B) = (C + A) + B (+ is associative)
(3) There is the zero matrix 0 ∈ V with the property: ∀A ∈ V we have A + 0 = A.
(4) For all A ∈ V there is a matrix −A ∈ V with A + (−A) = 0.
(5) For the number 1 ∈ R and each A ∈ V , one has: 1 · A = A.
(6) ∀λ, µ ∈ R ∀A ∈ V : λ · (µ · A) = (λµ) · A (· is associative)
(7) ∀λ ∈ R ∀A, B ∈ V : λ · (A + B) = (λ · A) + (λ · B) (distributive ·+)
(8) ∀λ, µ ∈ R ∀A ∈ V : (λ + µ) · A = (λ · A) + (µ · A) (distributive +·)

To see why it is interesting to study matrices, we first have to look at linear equations.

3.1 Introduction to systems of linear equations


We start with some easy examples:

Xavier is two years older than Yasmin. x−y =2


Together they are 40 years old. x + y = 40
How old is Xavier and how old is Yasmin? x = ?, y = ?

This was an example with two unknowns (x and y). Here we give an example with three
unknowns (x, y and z):

2x −3y +4z = −7
−3x +y −z = 0
20x +10y = 80
10y +25z = 90

You can imagine that we can have an arbitrary number of unknowns and also an arbitrary
number of equations. Often these unknowns are denoted by x1 , x2 , . . . , xn and we search
for suitable values such that all equations are satisfied.
Here, the most important part is that the equations are linear. The exact definition will
follow later. The sloppy way to say that an equation is linear is:

constant · x1 + constant · x2 + · · · + constant · xn = constant . (3.1)

As you can see, there are a lot of constants that have to be numeric.

Definition 3.6. System of linear equations (LES)


Let m, n ∈ N be two natural numbers. A system of linear equations or a linear
equation system (abbreviation: LES) with m equations and n unknowns x1 , x2 , ..., xn
is given by:

$$ \left. \begin{aligned} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n &= b_m \end{aligned} \right\} \tag{LES} $$

Here, aij and bi are given numbers, mostly just real numbers. A solution of the LES
is a choice of values for x1 , ..., xn such that all m equations are satisfied.

Example 3.7. Having three unknowns x1 , x2 , x3 , we could have different cases for the set
of solutions:

(Figures: geometric pictures of the solution set, e.g. two planes E1 and E2 intersecting in a line s in the case of 2 equations, and a corresponding picture in the case of 3 equations.)

While the system with two variables was very well-arranged, the general case seems more
complicated in spite of representing the same idea. At this point matrix and vector
notation comes in very handy:

Definition 3.8. LES in matrix notation


Let A ∈ Rm×n with entries aij ∈ R and b ∈ Rm with entries bi ∈ R. Then

Ax = b

represents (LES) from above, where x ∈ Rn .

The two examples from above in this notation:
$$ \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 \\ 40 \end{pmatrix}, \qquad \begin{pmatrix} 2 & -3 & 4 \\ -3 & 1 & -1 \\ 20 & 10 & 0 \\ 0 & 10 & 25 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -7 \\ 0 \\ 80 \\ 90 \end{pmatrix}. $$

This can be seen as a short notation for a system of linear equations. However, this also
defines a product of a matrix and a vector.

Definition 3.9. Matrix · Vector = Vector


Let m, n ∈ N and
$$ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n} \quad \text{and} \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n . $$
The product Ax = A · x (where we mostly do not use a dot) is given as the vector
$$ Ax := \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n \end{pmatrix} \in \mathbb{R}^m . $$

Attention!
The width of A has to be the same as the height of x! Otherwise Ax is not defined.

3.2 Some words about matrices


For a matrix A ∈ Rm×n the number m is called the number of rows and n the number of
columns. The matrix A is a rectangle with height m and width n.
As special cases, we note:
• A ∈ Rn×n (i.e. m = n) is called a square matrix or quadratic matrix
• A ∈ Rm×1 is a column vector of size m
• A ∈ R1×n is a row vector of size n
• A ∈ R1×1 is a scalar, just a real number.
Then there are certain properties of A, concerning its diagonal:
• The entries aii are called “diagonal entries”. If all entries of A outside the diagonal
are zero, then A is called a diagonal matrix.
• Everything above the diagonal (including the diagonal itself) is the upper triangle, and similarly
we have the lower triangle. If all non-zero entries of A lie in the upper or in the lower triangle,
we call A an upper or lower triangular matrix, respectively.
• If for all indices i, j one has aij = aji , then A is called symmetric (A is reflected
over the diagonal).
• If for all indices i, j one has aij = −aji (so in particular aii = 0), then A is called
skew-symmetric.

3.3 Looking at the columns and the associated linear


map
One way to imagine a matrix in Rm×n is to see it as a collection of n columns of size m:
$$ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} = \begin{pmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{pmatrix}, \quad \text{where} \quad a_i = \begin{pmatrix} a_{1i} \\ \vdots \\ a_{mi} \end{pmatrix}. $$
In this view, the product of the matrix with a vector can just be seen as a linear combin-
ation of the columns of A:

Ax is a linear combination of the columns of A

$$ Ax = \begin{pmatrix} | & & | \\ a_1 & \cdots & a_n \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 a_1 + \dots + x_n a_n \tag{3.2} $$

We can interpret this also as the following: The matrix is a machine where we can put in
vectors x ∈ Rn and we get a new vector Ax ∈ Rm as a linear combination of the columns.
This machine is given by the multiplication of A by x ∈ Rn :

(Figure: a machine that takes a vector x ∈ Rn as input and outputs Ax ∈ Rm ; it multiplies each vector from Rn with the matrix A.)

Such a machine is, of course, nothing else than a “function” or “map” as defined in Section 1.3.
We call this function fA . It maps x ∈ Rn to Ax ∈ Rm .

The function fA defined by the matrix A

$$ f_A : \mathbb{R}^n \to \mathbb{R}^m, \quad f_A : x \mapsto Ax \tag{3.3} $$

Here, the function fA and the matrix A contain the same amount of information.

3.4 Looking at the rows


Above, we have considered a matrix A ∈ Rm×n as a collection of columns and defined a
linear map fA : Rn → Rm . However, we may also see A as a collection of m rows of size n:
$$ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} = \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix}, \quad \text{where} \quad \alpha_i^T = \begin{pmatrix} a_{i1} & \cdots & a_{in} \end{pmatrix}. $$

Here, we use the notation T for the transpose of a column vector. The result is a row
vector with the same entries. We fix this as a space:
$$ \mathbb{R}^{1 \times n} = \{\, x^T = \begin{pmatrix} x_1 & \dots & x_n \end{pmatrix} : x_1, \dots, x_n \in \mathbb{R} \,\} $$

Since a row vector uT ∈ R1×n is just a very flat matrix, the product with a column vector
v ∈ Rn is well-defined:
$$ u^T v = (u_1 v_1 + \dots + u_n v_n) \in \mathbb{R}^{1 \times 1} . $$

Indeed, this is just a scalar and it coincides with the standard inner product we already
know: ⟨u, v⟩ = uT v.
In this view, the product of the matrix with a vector can just be seen as the scalar product
with each row:

Ax is the scalar product of x with the rows of A

$$ Ax = \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix} x = \begin{pmatrix} \alpha_1^T x \\ \vdots \\ \alpha_m^T x \end{pmatrix} \tag{3.4} $$
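
A hedged NumPy check (illustration only, not part of the notes) that the column view (3.2) and the row view (3.4) give the same vector Ax:

import numpy as np

A = np.array([[1., 2., 0.],
              [4., 5., 6.]])
x = np.array([1., -1., 2.])

# (1) direct matrix-vector product
direct = A @ x

# (2) linear combination of the columns of A, cf. equation (3.2)
by_columns = sum(x[i] * A[:, i] for i in range(A.shape[1]))

# (3) inner product of each row of A with x, cf. equation (3.4)
by_rows = np.array([np.dot(A[i, :], x) for i in range(A.shape[0])])

print(direct, by_columns, by_rows)   # all three give [-1. 11.]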

3.5 Matrix multiplication


Let A ∈ Rm×n . Recall that we can build the product of A by a vector b ∈ Rn . The result
is a column vector in Rm . We can do this for several vectors grouped into a matrix B,
and group the result in a matrix again:
   

A b1 . . . bk  = Ab1 Ab2 · · · Abk  for k column vectors b1 , . . . , bk .

The result is a matrix with m rows and k columns and denoted by AB. It is called the
matrix product of A and B.

Definition 3.10. Matrix product


For matrices A ∈ Rm×n and B ∈ Rn×k , the matrix product is defined as
$$ AB = \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix} \begin{pmatrix} | & & | \\ b_1 & \cdots & b_k \\ | & & | \end{pmatrix} = \begin{pmatrix} \alpha_1^T b_1 & \cdots & \alpha_1^T b_k \\ \vdots & & \vdots \\ \alpha_m^T b_1 & \cdots & \alpha_m^T b_k \end{pmatrix} \in \mathbb{R}^{m \times k}. \tag{3.5} $$

Or in other words: AB is the m × k-matrix that has the following entries:
$$ (AB)_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj} $$
for i = 1, . . . , m and j = 1, . . . , k.

Attention!
The product AB is only defined if the width of A coincides with the height of B.
The “inner dimensions” have to match.

Special cases:
• A = aT ∈ R1×n , B = b ∈ Rn×1 : AB = aT b ∈ R

• A = a ∈ Rn×1 , B = bT ∈ R1×m : AB = abT ∈ Rn×m with (AB)ij = ai bj (called a
rank-1 matrix)

Example 3.11. Just calculate some examples:



(a) We combine the following matrix dimensions: (2 × 2) · (2 × 3) ⇒ 2 × 3:
$$ \underbrace{\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}}_{B} = \begin{pmatrix} 1 \cdot 1 + 2 \cdot 4 & 1 \cdot 2 + 2 \cdot 5 & 1 \cdot 3 + 2 \cdot 6 \\ 3 \cdot 1 + 4 \cdot 4 & 3 \cdot 2 + 4 \cdot 5 & 3 \cdot 3 + 4 \cdot 6 \end{pmatrix} = \begin{pmatrix} 9 & 12 & 15 \\ 19 & 26 & 33 \end{pmatrix} $$
The columns of the result are Ab1 , Ab2 and Ab3 , where b1 , b2 , b3 are the columns of B.

(b) Let $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. Then the product of A and B is not defined
since width(A) = 3 ≠ 2 = height(B). The product with the other order BA is defined.

(c) Now the matrix dimensions (3 × 1) · (1 × 3) ⇒ 3 × 3:
$$ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix} $$

(d) Now the matrix dimensions (1 × 3) · (3 × 1) ⇒ 1 × 1:
$$ \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = (1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3) = (14) = 14 $$

(e) A 2 × 2-example (which you shouldn’t take seriously):
(Example by Florian Dalwigk; images from Super Mario World and Super Mario World 2)
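
The computation in (a) and the entry formula (AB)ij can be checked with a small NumPy sketch (not part of the notes, for illustration only):

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print(A @ B)         # the 2x3 product from example (a):
                     # [[ 9. 12. 15.]
                     #  [19. 26. 33.]]

# Entry formula (AB)_ij = sum_r a_ir * b_rj, written out with loops:
m, n = A.shape
_, k = B.shape
C = np.zeros((m, k))
for i in range(m):
    for j in range(k):
        C[i, j] = sum(A[i, r] * B[r, j] for r in range(n))
print(np.allclose(C, A @ B))   # True

# B @ A is not defined here: the "inner dimensions" 3 and 2 do not match.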

We can also ask what happens if we multiply a row vector xT from the left to a matrix
B ∈ Rn×k . By definition, we get:
$$ x^T B = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} \begin{pmatrix} \beta_1^T \\ \vdots \\ \beta_n^T \end{pmatrix} = x_1 \beta_1^T + \dots + x_n \beta_n^T . $$

This means the product xT B is a linear combination of the rows of B. This is in analogy
to the fact that Ax is a linear combination of the columns of A, cf. equation (3.2).

Remark:
Now we can see the matrix product as introduced:
$$ AB = \begin{pmatrix} | & | & & | \\ A b_1 & A b_2 & \cdots & A b_k \\ | & | & & | \end{pmatrix} $$
This means that each column of AB consists of a linear combination of the columns
from A.
Seeing the product with the other eye,
$$ AB = \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix} B = \begin{pmatrix} \alpha_1^T B \\ \vdots \\ \alpha_m^T B \end{pmatrix}, $$
we see that each row of AB consists of a linear combination of the rows from B.

Now, we summarise the properties of the matrix multiplication.

Proposition 3.12. Properties of the matrix product

(a) For all A, B ∈ Rm×n and C ∈ Rn×k and D ∈ R`×m we have:


(A + B) · C = A · C + B · C and D · (A + B) = D · A + D · B.

(b) For all A ∈ Rm×n and B ∈ Rn×k and λ ∈ R we have:


λ · (A · B) = (λ · A) · B = A · (λ · B).

(c) Associative rule: For all A ∈ Rm×n and B ∈ Rn×k and C ∈ Rk×` we have:
A · (B · C) = (A · B) · C.

Proof. All these rules follow from the definition of the matrix product of A and B,
$$ (AB)_{ij} = \sum_{r=1}^{n} a_{ir} b_{rj}, $$
and the fact that these rules hold for the real numbers air , brj ∈ R. For example, for
showing (c):
$$ (A(BC))_{ij} = \sum_{r=1}^{n} a_{ir} (BC)_{rj} = \sum_{r=1}^{n} a_{ir} \Big( \sum_{z=1}^{k} b_{rz} c_{zj} \Big) = \sum_{z=1}^{k} \Big( \sum_{r=1}^{n} a_{ir} b_{rz} \Big) c_{zj} = ((AB)C)_{ij} . $$
Properties (a) and (b) are left as an exercise.

Attention! No commutative rule


In general, we have for two matrices:

AB 6= BA (in general).
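
A two-line NumPy illustration (not from the notes) of this missing commutativity:

import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 0.]])

print(A @ B)   # [[2. 1.]
               #  [4. 3.]]  (right-multiplying by B swaps the columns of A)
print(B @ A)   # [[3. 4.]
               #  [1. 2.]]  (left-multiplying by B swaps the rows of A)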

Remark:
We thus have the following interpretations of the matrix product AB:

• the columns of B are used to build linear combinations of the columns of A,

• the rows of A are used to build linear combinations of the rows of B,

• each row αTi of A and each column bj of B are multiplied to form an entry
of the product: (AB)ij = αTi bj ,

• each column ai of A and each row β Ti of B are combined to a rank-1 matrix
ai β Ti , and these matrices are added up.

All these interpretations are equally valid, and from situation to situation, we can
change our point of view to gain additional insights.

3.6 Linear maps


Each map that conserves the structure of our vector space Rn is called a linear map.
We already know that the only structure we have is the vector addition and the scalar
multiplication.

Definition 3.13. Linearity of maps


A map f : Rn → Rm is called linear if for all x, y ∈ Rn and λ ∈ R, we have:

f (x + y) = f (x) + f (y) (+)

f (λx) = λf (x) (·)

Rule of thumb:
Equation (+) means: First adding, then mapping = First mapping, then adding
Equation ( · ) means: First scaling, then mapping = First mapping, then scaling

We already know that for each matrix A ∈ Rm×n there is an associated map fA . This
map is indeed a linear map.

Proposition 3.14. fA is linear


fA
Let m, n ∈ N, A ∈ Rm×n and fA : Rn → Rm with x 7→ Ax. Then the following
holds:
(a) For all x, y ∈ Rn we have

fA (x + y) = fA (x) + fA (y), i.e. A(x + y) = Ax + Ay. (+)

(b) For all λ ∈ R and x ∈ Rn one has

fA (λx) = λfA (x), i.e. A(λx) = λAx. (·)



Proof. This follows immediately from the properties of the matrix product in Proposi-
tion 3.12. However, it may be helpful to write down a direct proof for the case n = 2.
(a) Let $x = \binom{x_1}{x_2}$ and $y = \binom{y_1}{y_2}$ be vectors in R2 . Then we have:
$$ f_A(x + y) = A(x + y) = \begin{pmatrix} | & | \\ a_1 & a_2 \\ | & | \end{pmatrix} \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix} \overset{(3.2)}{=} (x_1 + y_1) a_1 + (x_2 + y_2) a_2 $$
$$ = x_1 a_1 + x_2 a_2 + y_1 a_1 + y_2 a_2 \overset{(3.2)}{=} \begin{pmatrix} | & | \\ a_1 & a_2 \\ | & | \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} | & | \\ a_1 & a_2 \\ | & | \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = Ax + Ay = f_A(x) + f_A(y). $$
(b) Let λ ∈ R and $x = \binom{x_1}{x_2} \in \mathbb{R}^2$. Then:
$$ f_A(\lambda x) = A(\lambda x) = \begin{pmatrix} | & | \\ a_1 & a_2 \\ | & | \end{pmatrix} \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \end{pmatrix} \overset{(3.2)}{=} (\lambda x_1) a_1 + (\lambda x_2) a_2 = \lambda (x_1 a_1 + x_2 a_2) \overset{(3.2)}{=} \lambda A x = \lambda f_A(x). $$

Now we look at the connection between the composition of maps and the matrix product:


• For A ∈ Rm×k , we have the linear map fA : Rk → Rm : v 7→ Av
• For B ∈ Rk×n , we have the linear map fB : Rn → Rk : x 7→ Bx = v.
Then we can interpret the product AB ∈ Rm×n as a linear map of column vectors:

fA ◦ fB : Rn → Rm
x 7→ A(Bx) = ABx.

where we use the associativity law.


If we have a linear map f : Rn → Rm , we can write it as

f (x) = f (x1 e1 + · · · + xn en ) = x1 f (e1 ) + · · · + xn f (en )

and immediately find:

Remark: Linear maps induce matrices


For each linear map f : Rn → Rm , there is exactly one matrix A ∈ Rm×n with
f = fA . In the columns of A, one finds the images of the canonical unit vectors:
 

A := f (e1 ) · · · f (en ) . (3.6)

A is often called the transformation matrix of f .
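
A small Python sketch (illustration only; the function transformation_matrix and the example map rot90 are made up for this purpose) that builds the transformation matrix from the images of the unit vectors, as in equation (3.6):

import numpy as np

def transformation_matrix(f, n):
    """Build the matrix A of a linear map f: R^n -> R^m from the images
    of the canonical unit vectors, cf. equation (3.6)."""
    return np.column_stack([f(e) for e in np.eye(n)])

# A hypothetical example: the linear map that rotates the plane by 90 degrees.
def rot90(x):
    return np.array([-x[1], x[0]])

A = transformation_matrix(rot90, 2)
print(A)                       # columns are rot90(e1) and rot90(e2)

x = np.array([2., 1.])
print(A @ x, rot90(x))         # both give [-1.  2.]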



Rule of thumb: Linear map = lines stay lines


A linear map f : Rn → Rm conserves the linear structure: If U ⊂ Rn is a linear
subspace then also the image f (U ) ⊂ Rm . Or in other words: Lines on the left stay
lines on the right:

(However, lines could shrink down to the origin.)

A linear map is completely determined when one knows how it acts on the canonical unit
vectors e1 , . . . , en . Therefore, in R2 , a good visualisation is to look at “houses”: A house H
is given by two points. Now what happens under a linear map fA associated to a matrix
A? One just has to look at the corners:

(Figure: the house H with corners o, p, q and its image under fA with corners o′ , p′ , q′ .)

$$ o' := f_A(o) = A \begin{pmatrix} 0 \\ 0 \end{pmatrix} = 0\, a_1 + 0\, a_2 = o $$
$$ p' := f_A(p) = A \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1\, a_1 + 0\, a_2 = a_1 $$
$$ q' := f_A(q) = A \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0\, a_1 + 1\, a_2 = a_2 $$

With the help of the linearity, we also know what happens with the other parts of the
house, for example the corners of the door:

Since $t = \tfrac{1}{2} p$ and $u = \tfrac{1}{4} p$, we have:
$$ f_A(t) = f_A(\tfrac{1}{2} p) \overset{(\cdot)}{=} \tfrac{1}{2} f_A(p) = \tfrac{1}{2} p' $$
$$ f_A(u) = f_A(\tfrac{1}{4} p) \overset{(\cdot)}{=} \tfrac{1}{4} f_A(p) = \tfrac{1}{4} p' $$

(Figure: the house with door corners u, t on the segment from o to p, and their images u′ , t′ under fA .)

Example 3.15. A non-linear map

A map f : R2 → R2 given by
$$ f : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} x - \tfrac{1}{5}(\cos(\pi y) - 1) \\ y + \tfrac{1}{8}\sin(2\pi x) \end{pmatrix} $$
is not linear!

Example 3.16. Some linear house transformations

(Figures: the house H and its image H′ under fA for various 2 × 2 matrices, written here row by row and separated by “;”. Shown are, for example, the scalings A = ( 3 0 ; 0 1 ), B = ( 1 0 ; 0 2 ), C = ( 3 0 ; 0 2 ), the reflections D = ( −1 0 ; 0 1 ), E = ( 1 0 ; 0 −1 ), F = ( −1 0 ; 0 −1 ), the uniform scaling G = ( 5 0 ; 0 5 ) with H′ = 5H, the identity I = ( 1 0 ; 0 1 ) with fI = id and H′ = H, the exchange of coordinates J = ( 0 1 ; 1 0 ), further matrices K, L, M, N, O, P, Q, S of smaller or full rank, and the rotation R = ( cos(π/6) −sin(π/6) ; sin(π/6) cos(π/6) ) by the angle π/6.)

3.7 Linear dependence, linear independence, basis and


dimension
We have seen that in R2 two vectors can be parallel (colinear):

There is a λ ∈ R with a = λb .

Similarly, in R3 three vectors can be in the same plane (coplanar):

There are λ, µ ∈ R with a = λb + µc .

If this is the case, we can build a loop of vectors, starting at o and ending at o again:

o = (−1)a + λb + µc .

Let us generalise this:

Definition 3.17. Linear dependence and independence

A family (v1 , . . . , vk ) of k vectors from Rn is called linearly dependent if we find a
non-trivial linear combination for o. This means that we can find λ1 , . . . , λk ∈ R
that are not all equal to zero such that
$$ \sum_{j=1}^{k} \lambda_j v_j = o . $$
If this is not possible, we call the family (v1 , . . . , vk ) linearly independent. This
means that
$$ \sum_{j=1}^{k} \lambda_j v_j = o \quad \Rightarrow \quad \lambda_1 = \dots = \lambda_k = 0 $$
holds.

Example 3.18. Let us look at examples:

(a) The family $\left( \binom{1}{0}, \binom{1}{1}, \binom{0}{1} \right)$ is linearly dependent since
$$ \binom{1}{1} = \binom{1}{0} + \binom{0}{1} . $$

(b) $\left( \binom{1}{2}, \binom{1}{0}, \binom{2}{1} \right)$ is linearly dependent.

(c) Each family which includes o is linearly dependent. Also, each family that has the
same vector twice or more is linearly dependent.

(d)
$$ e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} $$

These are linearly independent vectors, because
$$ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \lambda_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \lambda_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} $$
yields λ1 = λ2 = λ3 = 0.
If we add an arbitrary additional vector
$$ a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, $$
we can combine it from the other three by setting λi = ai , which means:
$$ a = a_1 e_1 + a_2 e_2 + a_3 e_3 . $$
So the resulting set of vectors is linearly dependent.
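
Numerically (an aside, assuming Python with NumPy; the helper name linearly_independent is invented), linear independence of k vectors can be tested by checking whether the matrix with these columns has rank k:

import numpy as np

def linearly_independent(*vectors):
    """The family is linearly independent iff the matrix with these
    columns has rank equal to the number of vectors."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == len(vectors)

e1, e2, e3 = np.eye(3)
print(linearly_independent(e1, e2, e3))                      # True, cf. example (d)
print(linearly_independent(np.array([1., 0.]),
                           np.array([1., 1.]),
                           np.array([0., 1.])))              # False, cf. example (a)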

Proposition 3.19. Linear dependence


For a family (v1 , . . . , vk ) of vectors from Rn the following claims are equivalent:

(i) (v1 , . . . , vk ) is linearly dependent.

(ii) There is a vector in Span (v1 , . . . , vk ) that has two or more representations as
a linear combination.

(iii) At least one of the vectors in (v1 , . . . , vk ) is a linear combination of the others.

(iv) There is an i ∈ {1, . . . , k} such that we have

Span(v1 , . . . , vk ) = Span(v1 , . . . , vi−1 , vi+1 , . . . , vk ) .

(v) There is an i ∈ {1, . . . , k} with vi ∈ Span(v1 , . . . , vi−1 , vi+1 , . . . , vk ).

Proof. Exercise!

Since the opposite of linear dependence is linear independence, we can simply negate
Proposition 3.19 and get the following:

Proposition 3.20. Linear independence


For a family (v1 , . . . , vk ) of vectors from Rn the following are equivalent:

(i) (v1 , . . . , vk ) is linearly independent.

(ii) Every vector in Span (v1 , . . . , vk ) can be formed by linear combinations in


exactly one way.

(iii) None of the vectors in (v1 , . . . , vk ) is a linear combination of the others.



(iv) For all i ∈ {1, . . . , k} we have:

Span(v1 , . . . , vk ) 6= Span(v1 , . . . , vi−1 , vi+1 , . . . , vk ) .

(v) For all i ∈ {1, . . . , k} we have vi ∉ Span(v1 , . . . , vi−1 , vi+1 , . . . , vk ).

A simple consequence is:

Corollary 3.21.
If the family (v1 , . . . , vk ) is linearly dependent, we can add further vectors and the res-
ulting family is still linearly dependent. On the other hand, if the family (v1 , . . . , vk )
is linearly independent, we can omit vectors and the resulting family is still linearly
independent.

Let now V be a subspace of Rn , which is spanned by the vectors v1 , . . . , vk ∈ Rn . Hence


V = Span(v1 , . . . , vk ).

Question: Efficiency question:


How many vectors do we actually need to span V ?

The quick answer “k” is in general false since the family (v1 , . . . , vk ) could be linearly
dependent.
Our geometric intuition says that on a plane we cannot have more than two linearly
independent vectors, and in three-dimensional space not more than three. We express
this, by saying, a plane is two-dimensional, and space is three-dimensional. We will again
formalise this:

Definition 3.22. Basis, basis vectors


Let V be a subspace of Rn . A family B = (v1 , . . . , vk ) is called a basis of V if

(a) V = Span(B) and

(b) B is linearly independent.

The elements of B are called the basis vectors of V .

We can show that each subspace V ⊂ Rn has a basis. We define:

Proposition & Definition 3.23. Coefficients with respect to a basis


Let B = (v1 , . . . , vk ) be a basis of a subspace V ⊂ Rn . Each x ∈ V can be written as
a linear combination λ1 v1 +. . .+λk vk where the coefficients λ1 , . . . , λk are uniquely
determined. They are called the coordinates of x with respect to B.

Proof. Let B = (v1 , . . . , vk ) be a basis of V . Since Span(B) = V , we can express each


v ∈ V as a linear combination λ1 v1 + . . . + λk vk . The uniqueness of the coefficients
λ1 , . . . , λk follows from the linear independence of the basis vectors. This is an easy
exercise.

Theorem 3.24. Steinitz’s theorem


Consider a basis B = (v1 , . . . , vk ) of a subspace V ⊂ Rn and a linearly independent
set of vectors A = (a1 , . . . , a` ) ⊂ V . Then we can extend A to a basis of V by
adding k − ` elements of B.

Sketch of the proof. Pack B and A together to a linearly dependent set, and remove vec-
tors (starting with elements of B) until it is linearly independent. One has to show now,
that the resulting set has again k elements, and that A remains untouched.

Now, we can record that all bases of V have the same number of elements.

Corollary 3.25.
Let V be a subspace of Rn and let B = (v1 , . . . , vk ) be a basis of V . Then:

(a) Each family (w1 , . . . , wm ) consisting of vectors from V where m > k is linearly
dependent.

(b) Each basis of V has exactly k elements.

So we can define:

Definition 3.26. Dimension of a linear subspace


Let V be a subspace of Rn , and let B be a chosen basis of V . The number of elements
in B is well-defined and called the dimension of V , written as dim(V ). As a special
case, we set dim({o}) = 0.

The unit vectors e1 , . . . , en in Rn form a basis. The linear independence can be seen by:
$$ x_1 e_1 + \dots + x_n e_n = o \ \Rightarrow \ \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = o \ \Rightarrow \ x_1 = \dots = x_n = 0. $$

We obtain, as expected:
dim(Rn ) = n.

Rule of thumb:
The dimension of a vector space V says how many independent degrees of freedom
are needed to build linear combinations of all vectors in V .

Theorem 3.27.
Let U and V be two linear subspaces of Rn .

(i) One has dim(U ) = dim(V ) if and only if there exists a linear bijective map
between U and V .

(ii) If U ⊂ V and dim(U ) = dim(V ), then U = V .



Proof. (i) (⇒): If dim(U ) = dim(V ), a bijection can be defined by mapping the basis
vectors of the subspace U to the basis vectors of V and extending it linearly.
(⇐): If there is a bijection between U and V , the image of the basis of U is a basis of
V (linearly independent by injectivity and spanning by surjectivity). Hence, dim(U ) =
dim(V ).
(ii) A basis of U is a linearly independent family in V with dim(U ) = dim(V ) vectors.
Thus it is also a basis for V , and thus U = V .

Example 3.28.
The following subspaces of Rn are very important:

• The trivial subspace {0} with dim({0}) = 0

• Lines L (through the origin): dim(L) = 1

• Planes P (through the origin): dim(P ) = 2

• Hyperplanes H (through the origin): dim(H) = n − 1

The dimension of an affine subspace W = u0 + U (where U is a linear subspace) is usually


set to the dimension of U .

Corollary 3.29.
A family consisting of more than n vectors in Rn is always linearly dependent.

Proof. Use Corollary 3.25.

3.8 Identity and inverses


For each n ∈ N, we define the identity matrix 1n by
$$ \mathbb{1}_n := \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & & \vdots \\ 0 & 0 & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & 1 \end{pmatrix} \in \mathbb{R}^{n \times n}. $$
The associated linear map Rn → Rn is, of course, the identity id. Other notations for the
identity matrix are 1n , In or En . If the context is clear, one usually omits the index n.
In the space of square matrices A ∈ Rn×n , the identity matrix fulfils:
$$ \mathbb{1}_n x = x . $$

Also we can define inverses A−1 , which may or may not exist:

AA−1 = A−1 A = 1n .

Definition 3.30. Invertible Matrix, A−1


We call a square matrix A ∈ Rn×n invertible or nonsingular if the corresponding
linear map fA : Rn → Rn is bijective. Otherwise, we call A singular. A matrix Ã
with fà = (fA )−1 is called the inverse of A and is usually denoted by A−1 . We have

fA−1 ◦ fA = id and fA ◦ fA−1 = id ,

which means fA−1 = (fA )−1 .


For the matrices, this means:
A−1 (Ax) = x and A(A−1 x) = x for all x ∈ Rn .
In short: A−1 A = 1 and AA−1 = 1.

If A is invertible, the linear system Ax = b has the unique solution x = A−1 b.

Theorem 3.31.
Let A ∈ Rn×n be a square matrix. Then

fA injective ⇔ fA surjective

Hence, if one of these cases holds, then fA is already bijective, i.e., invertible.

Proof. This is a classical dimension argument:


(⇒): If fA is injective, then (f (e1 ), . . . , f (en )) are linearly independent vectors and form
a basis of Rn . This means that fA is also surjective.
(⇐): If fA is surjective, then each y ∈ Rn is given by a linear combination from the family
(f (e1 ), . . . , f (en )) and, hence, it forms a basis of Rn . Therefore, fA is also injective.

For two invertible matrices A and B we have the formula:


(AB)−1 = B −1 A−1 .

Remark:
If f : Rn → Rn is a linear map that is bijective, then f −1 : Rn → Rn is also a linear
map.
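
A brief NumPy illustration (not part of the notes) of inverses and of solving Ax = b with an invertible A:

import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])
b = np.array([3., 2.])

A_inv = np.linalg.inv(A)         # exists since A is nonsingular
print(A_inv @ A)                 # identity matrix 1_2 (up to round-off)

x = np.linalg.solve(A, b)        # the unique solution of Ax = b
print(x, A @ x)                  # [1. 1.] and [3. 2.]

# (AB)^{-1} = B^{-1} A^{-1}:
B = np.array([[1., 2.],
              [0., 1.]])
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))   # True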

3.9 Transposition
Mathematicians found it convenient to write down most things in terms of column vectors,
and mainly think about the columns of matrices. If one wants to talk about the rows of a matrix,
most of the time, one defines the transposed matrix and talks about its columns.
We already know transposition of column vectors:
 T
a1
 .. 
 .  = (a1 . . . an )
an

and similarly, we can define:
$$ \begin{pmatrix} a_1 & \dots & a_n \end{pmatrix}^{T} = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} . $$

Then we have the simple formula (aT )T = a.


For a matrix, we can do the same:

Definition 3.32. Transpose


For a matrix A ∈ Rm×n , we define a matrix AT ∈ Rn×m and call it the transpose
of A. The ith column of A becomes the ith row of AT and the j th row of A becomes
the j th column of AT :
   
$$ \text{For } A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}, \text{ we define } A^T := \begin{pmatrix} a_{11} & a_{21} & \dots & a_{m1} \\ a_{12} & a_{22} & \dots & a_{m2} \\ \vdots & & & \vdots \\ a_{1n} & a_{2n} & \dots & a_{mn} \end{pmatrix}. $$

Example 3.33. (a)
$$ A = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 2 & 0 & 3 & 0 \end{pmatrix} \in \mathbb{R}^{2 \times 4} \ \Rightarrow \ A^T = \begin{pmatrix} 1 & 2 \\ 2 & 0 \\ 0 & 3 \\ 1 & 0 \end{pmatrix} \in \mathbb{R}^{4 \times 2}. $$

(b)
$$ A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \in \mathbb{R}^{2 \times 2} \ \Rightarrow \ A^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} \in \mathbb{R}^{2 \times 2}. $$

(c)
$$ A = \begin{pmatrix} 1 & -2 \\ -2 & 1 \end{pmatrix} \in \mathbb{R}^{2 \times 2} \ \Rightarrow \ A^T = \begin{pmatrix} 1 & -2 \\ -2 & 1 \end{pmatrix} \in \mathbb{R}^{2 \times 2}. $$

(d)
$$ A = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \in \mathbb{R}^{3 \times 1} \ \Rightarrow \ A^T = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} \in \mathbb{R}^{1 \times 3}. $$

(e)
$$ A = \begin{pmatrix} 4 & 5 & 6 & 7 \end{pmatrix} \in \mathbb{R}^{1 \times 4} \ \Rightarrow \ A^T = \begin{pmatrix} 4 \\ 5 \\ 6 \\ 7 \end{pmatrix} \in \mathbb{R}^{4 \times 1}. $$
Since we have exchanged the roles of rows and columns, the order of multiplication
changes, too:
$$ (Ax)^T = x^T A^T \qquad \text{and} \qquad x^T A = (A^T x)^T . $$

Just as with matrix-vector multiplication, transposition reverses the order of matrix-


matrix multiplication:
(AB)T = B T AT .
In particular, if A is invertible, then

1 = 1T = (A−1 A)T = AT (A−1 )T ⇒ AT is invertible and (AT )−1 = (A−1 )T .

Example 3.34. We find
$$ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \cdot \begin{pmatrix} 0 & -2 & 2 \\ 4 & -1 & 2 \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 11 & -4 & 9 \\ 26 & -13 & 24 \end{pmatrix} $$
and
$$ \begin{pmatrix} 0 & 4 & 1 \\ -2 & -1 & 0 \\ 2 & 2 & 1 \end{pmatrix} \cdot \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix} = \begin{pmatrix} 11 & 26 \\ -4 & -13 \\ 9 & 24 \end{pmatrix}. $$

Proposition 3.35. Some rules for transposition

(a) For all A, B ∈ Rm×n we have: (A + B)T = AT + B T .

(b) For all A ∈ Rm×n and for all λ ∈ R, we have: (λ · A)T = λ · AT .

(c) For all A ∈ Rm×n we have: (AT )T = A.

(d) For all A ∈ Rm×n and for all B ∈ Rn×r we have: (A · B)T = B T · AT .

(e) If A ∈ Rn×n is invertible, then AT is also invertible and we get (AT )−1 =
(A−1 )T .

(f ) For all u, v ∈ Rn we have: uT · v = vT · u = hu, vi.

Proof. Denote the entries of a matrix A by Aij . Then we have
$$ (A^T)_{ij} = A_{ji} \quad \text{for all } i, j $$
and from this one can prove all properties. For example, for showing (d), we see
$$ (B^T \cdot A^T)_{ij} = \sum_{k} (B^T)_{ik} (A^T)_{kj} = \sum_{k} A_{jk} B_{ki} = (A \cdot B)_{ji} = ((A \cdot B)^T)_{ij} $$

for all i, j.

Proposition 3.36. What has AT to do with the inner product?


For x ∈ Rn , y ∈ Rm and A ∈ Rm×n , we have for the standard inner product:

hAx, yi = hx, AT yi.



Proof. We already know that for all u, v ∈ Rn , we have hu, vi = uT v. Hence, we conclude
that for x ∈ Rn , y ∈ Rm and A ∈ Rm×n , the following holds hAx, yi = (Ax)T y =
xT AT y = hx, AT yi.

Moreover, AT is the only matrix in B ∈ Rn×m that satisfies the equation hAx, yi = hx, Byi
for all x ∈ Rn and y ∈ Rm . Therefore, some people use this as the definition for AT .
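
A quick numerical check of this identity (illustration only, assuming Python with NumPy):

import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])
x = np.array([1., 2., 3.])       # x in R^3
y = np.array([2., -1.])          # y in R^2

print(np.dot(A @ x, y))          # <Ax, y>
print(np.dot(x, A.T @ y))        # <x, A^T y> -- the same number (-3.0)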

Definition 3.37. Symmetric and skew-symmetric matrices


One typical notation for square matrices:

• If AT = A, then A is called symmetric.

• If AT = −A, then A is called skew-symmetric.

Example 3.38. (a)
$$ A = \begin{pmatrix} 1 & 3 & -4 \\ 3 & 0 & 5 \\ -4 & 5 & 3 \end{pmatrix} \text{ is symmetric since } A^T = \begin{pmatrix} 1 & 3 & -4 \\ 3 & 0 & 5 \\ -4 & 5 & 3 \end{pmatrix} = A. $$

(b)
$$ A = \begin{pmatrix} 0 & 3 & 4 \\ -3 & 0 & -5 \\ -4 & 5 & 0 \end{pmatrix} \text{ is skew-symmetric since } A^T = \begin{pmatrix} 0 & -3 & -4 \\ 3 & 0 & 5 \\ 4 & -5 & 0 \end{pmatrix} = -A. $$

By definition, all skew-symmetric matrices have only zeros on the diagonal.

3.10 The kernel, range and rank of a matrix

Definition 3.39. Range and kernel of matrices


Let A ∈ Rm×n . The set

Ran(A) := {Ax : x ∈ Rn } ⊂ Rm

is called the range or image of the matrix A.


The set
Ker(A) := {x ∈ Rn : Ax = o} ⊂ Rn
is called the kernel or nullspace of the matrix A.

Note that the range of A coincides with the range of the corresponding map fA and
that the kernel of A corresponds to the fiber of fA for the origin o. In other words,
Ran(A) = Ran(fA ) and Ker(A) = fA−1 ({o}).
In our previous study of matrices, we have already found out quite a lot of things about
matrices:
• Ran(A) = Span (a1 , . . . , an ) where the vectors ai are the columns of A.

• Ker(A) is a subspace of Rn
• For a matrix A the linear mapping fA is injective if and only if Ker(A) = {o} and
surjective if and only if Ran(A) = Rm .
Since our ultimate goal is to understand linear systems given in the form
Ax = b,
we would like to know more about Ran(A) (because it tells us for which b our system has
a solution) and Ker(A) (because it tells us about the uniqueness of solutions).

Definition 3.40. Rank of a matrix


Let A ∈ Rm×n . The number

rank(A) := dim(Ran(A)) = dim(Span (a1 , . . . , an )).

is called the rank of the matrix A.

We obviously have:
rank(A) ≤ min{m, n}
A is said to have full rank, if rank(A) = min{m, n}.
Let A be some matrix with, say, m columns. Let us assume that somebody gives us
r = rank(A). This means that A has r linearly independent columns, and these columns
are a basis for Ran(A). Let us again assume that we know these columns, and we have
already reordered them, so that they are the first r columns of A:
$$ A = ( \underbrace{B}_{r} \mid \underbrace{F}_{m-r} ) $$

Later, when we discuss the Gauß algorithm, we will find a way to identify these columns.
With this information, we would like to compute Ker(A), i.e. all x, such that Ax = o. It
is also of interest to obtain its dimension dim(Ker(A)).

Theorem 3.41. Rank-nullity theorem


Consider a matrix A with n columns. Then

dim(Ker(A)) + dim(Ran(A)) = n.

Proof. Choose a basis (b1 , . . . , bk ) of Ker(A). Then choose r := n − k vectors c1 , . . . , cr


such that (b1 , . . . , bk , c1 , . . . , cr ) is a basis of Rn . This is possible by Steinitz’s theorem
in Theorem 3.24. Then we have:
Ran(A) = Span(Ac1 , . . . , Acr ) .
This means that (Ac1 , . . . , Acr ) would be a basis of Ran(A) if this family is linearly
independent. Hence, assume that we have
λ1 Ac1 + · · · + λr Acr = o for some λi ∈ R .
From this, we conclude $A\big(\sum_{i=1}^{r} \lambda_i c_i\big) = o$, which means that $\sum_{i=1}^{r} \lambda_i c_i \in \mathrm{Ker}(A)$. Since
(b1 , . . . , bk , c1 , . . . , cr ) is linearly independent, only λ1 = · · · = λr = 0 is possible. There-
fore dim(Ran(A)) = r which proves the claim.
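
The rank-nullity theorem can be observed numerically (an aside, not from the notes; it assumes Python with NumPy and SciPy, and the example matrix is made up):

import numpy as np
from scipy.linalg import null_space

A = np.array([[2., 3., 4.],
              [4., 6., 8.],      # twice the first row
              [2., 4., 6.]])

rank = np.linalg.matrix_rank(A)          # dim(Ran(A))
kernel_basis = null_space(A)             # columns form a basis of Ker(A)
nullity = kernel_basis.shape[1]          # dim(Ker(A))

print(rank, nullity, rank + nullity)     # 2 1 3  -> rank + nullity = n = 3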

3.11 Solving systems of linear equations


Now we are coming to linear equations. For example, we want to solve the system:

2x1 + 3x2 + 4x3 = 1


4x1 + 6x2 + 9x3 = 1
2x1 + 4x2 + 6x3 = 1

Seeing this as the matrix vector form, we can write Ax = b. Later, we will also put in
the right-hand side and get the augmented matrix (A|b) that simply is
 
2 3 4 1
 4 6 9 1 .
2 4 6 1

There are a lot of viewpoints for the method of solving this system. We start with the
most important one.

3.11.1 Row operations and the Gauß algorithm


If you come across simple linear systems, for example, just with two unknowns, you may
be tempted to just randomly combine the equations to find a solution. For example:

Example 3.42.
A typical 2 × 2 LES might look like this:

E1 : x1 + 3x2 = 7
E2 : 2x1 − x2 = 0.

From equation E2 we can conclude x2 = 2x1 . Putting this expression for x2 into the
first equation E1 , we immediately get:
$$ 7 \overset{E_1}{=} x_1 + 3x_2 = x_1 + 3 \cdot 2x_1 = 7x_1 $$
and therefore x1 = 1. Hence, the system has exactly one solution given by the vector
$x = \binom{x_1}{x_2} = \binom{1}{2}$.

My advice here: Please, do not do that. It works for 2×2 LES but you will be in big trouble
while solving larger systems, most of the time. Therefore, we will now learn a systematic
solving recipe, called the Gauß algorithm or often just called Gaussian elimination.
One idea of Gaussian elimination can be shown with the example above. One does row
operations to get:

E1 : x1 + 3x2 = 7             ⇝   E1′ := E1 :            x1 + 3x2 = 7
E2 : 2x1 − x2 = 0  | −2·E1    ⇝   E2′ := E2 − 2·E1 :        − 7x2 = −14.

The LES on the right-hand side can easily be solved.



The next idea is, as always, to rewrite the system in matrix form Ax = b. Operations on
the system can now be realised by using an invertible matrix M . This reformulates the
LES:

Ax = b ⇔ M Ax = M b

If the product M A has a simpler form than A, this helps to solve our system.

Linear systems can be solved by building linear combinations of rows. This means
multiplication of invertible matrices on A and b from the left.

Recall that for each vector c ∈ Rm the product cT A builds linear combinations of rows of
A which are always denoted by αT1 , . . . , αTm . As a reminder:
   
$$ A = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} = \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix}, \quad \text{where} \quad \alpha_i^T = \begin{pmatrix} a_{i1} & \cdots & a_{in} \end{pmatrix}. $$

Now, we have the following operations, for example:

cT = (0 . . . ci . . . 0) gives: cT A = ci αTi
cT = (0 . . . ci 0 . . . 0 cj . . . 0) gives: cT A = ci αTi + cj αTj

If we put such a vector cT as the ith row into a matrix M , then the ith row of M A will
contain this result. This gives us the row operations for the example above.
Similar things can be done, of course, with the columns of A by right multiplication of
columns. However, row operations are more important right now.

Example 3.43. Adding multiples of rows


In the following 3 × 3 example, λ times the first row is added to the second:
$$ Z_{2+\lambda 1} A = \begin{pmatrix} 1 & & \\ \lambda & 1 & \\ & & 1 \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T \end{pmatrix} = \begin{pmatrix} \alpha_1^T \\ \alpha_2^T + \lambda \alpha_1^T \\ \alpha_3^T \end{pmatrix}. $$
The next example shows how the first row is added λ-times to the last:
$$ Z_{3+\lambda 1} A = \begin{pmatrix} 1 & & \\ & 1 & \\ \lambda & & 1 \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T \end{pmatrix} = \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T + \lambda \alpha_1^T \end{pmatrix}. $$
Undoing this last operation means subtracting this multiple of the row again. Thus, for
the last example, we have
$$ Z_{3-\lambda 1} Z_{3+\lambda 1} A = \begin{pmatrix} 1 & & \\ & 1 & \\ -\lambda & & 1 \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T + \lambda \alpha_1^T \end{pmatrix} = \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T + \lambda \alpha_1^T - \lambda \alpha_1^T \end{pmatrix} = A. $$
In other words: $Z_{3+\lambda 1}^{-1} = Z_{3-\lambda 1}$.

In general, we can define Zj+λi that adds the ith row of A to the j th row, where i 6= j. By
the 3 × 3 examples, it is easy to see how one has to define this matrix in the m × m-case.
This is always an invertible matrix since the inverse of Zj+λi is always the matrix Zj−λi
since this undoes the row addition. One gets:

Zj+λi Zj−λi = 1m . (3.7)

Instead of adding rows, we could also exchange rows: We replace the ith row of A by its
j th row, and vice versa.

Example 3.44. Exchanging rows


In the following 3 × 3 example, the first and the second row are exchanged:
$$ P_{1 \leftrightarrow 2} A = \begin{pmatrix} 0 & 1 & \\ 1 & 0 & \\ & & 1 \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T \end{pmatrix} = \begin{pmatrix} \alpha_2^T \\ \alpha_1^T \\ \alpha_3^T \end{pmatrix}. $$
The next example shows how to exchange the first and the last row:
$$ P_{1 \leftrightarrow 3} A = \begin{pmatrix} & & 1 \\ & 1 & \\ 1 & & \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \alpha_2^T \\ \alpha_3^T \end{pmatrix} = \begin{pmatrix} \alpha_3^T \\ \alpha_2^T \\ \alpha_1^T \end{pmatrix}. $$

The inverse of Pi↔j is Pi↔j again:
$$ P_{i \leftrightarrow j}^{-1} = P_{i \leftrightarrow j} \tag{3.8} $$

because exchanging of rows is undone by exchanging them again! Such an action is a


special case of a permutation and Pi↔j is a permutation matrix. Permutations of columns
are also possible, just multiply Pi↔j from the right: APi↔j .
Next one is scaling a row:

Example 3.45. Scaling rows


Each row is multiplied by a scalar value:
$$ D A = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_m \end{pmatrix} \begin{pmatrix} \alpha_1^T \\ \vdots \\ \alpha_m^T \end{pmatrix} = \begin{pmatrix} d_1 \alpha_1^T \\ \vdots \\ d_m \alpha_m^T \end{pmatrix}. $$
D is called a diagonal matrix. D is invertible if no diagonal entry is 0. Then the inverse
of D is a scaling matrix with the reciprocal entries:
$$ D = \begin{pmatrix} d_1 & & \\ & \ddots & \\ & & d_m \end{pmatrix} \ \Rightarrow \ D^{-1} = \begin{pmatrix} 1/d_1 & & \\ & \ddots & \\ & & 1/d_m \end{pmatrix}. $$

Definition 3.46. Row operations


For a given matrix A ∈ Rm×n , we call the multiplication with the invertible matrices
Zi+λj , Pi↔j , D ∈ Rm×m from the left simply row operations.

Proposition 3.47. Row operations do not change the kernel


Let A ∈ Rm×n and M ∈ Rm×m a matrix consisting of finitely many row operations
(for example $M = Z_{2+\frac{1}{4}1}\, Z_{3+\frac{1}{2}1}\, P_{2\leftrightarrow 3}$). Then we have:

Ker(M A) = Ker(A) and Ran(M A) = M Ran(A) .

Proof. For the kernel:

M Ax = o ⇔ Ax = M −1 o ⇔ Ax = o.

The range formula directly follows from the definition.

Note that the range may change a lot by row operations. What happens with the kernel
and the range if you just do column operations?

3.11.2 Set of solutions


Solving a linear system Ax = b (with A ∈ Rm×n ) means:
• finding out, if a solution exists, i.e. b ∈ Ran(A)
• if yes, computation of all solutions of this system
We usually denote the set of all solution by S = {x ∈ Rn : Ax = b}.

Proposition 3.48. The set of solutions is an affine subspace


Let Ax = b be a LES with A ∈ Rm×n . Then for the set of solutions we have either
S = ∅ or
S = v0 + Ker(A) = {v0 + x0 : x0 ∈ Ker(A)}
for a v0 ∈ Rn .

Proof. Let v0 ∈ S be a solution Av0 = b. By linearity of A we know:

x = v0 + x0 ∈ S ⇐⇒ b = Ax = A(v0 + x0 ) = Av0 + Ax0 = b + Ax0


⇐⇒ Ax0 = o ⇐⇒ x0 ∈ Ker(A) .

If there is no such v0 , then S = ∅.

In combination with Proposition 3.47, we get the most important fact:

Corollary 3.49.
Row operations do not change the set of solutions.

We are looking for a method that allows:



• to decide, if b ∈ Ran(A), and read off rank(A) = r.


• to compute a particular solution Av0 = b
• to compute a basis F of Ker(A), so that S is completely known.
Since triangular systems are easy and straightforward to solve, we would like to bring A
into triangular form, or something similar.

3.11.3 Gaussian elimination (without pivoting)

Goal
For solving Ax = b, use row operations M to bring A into upper triangular form,
which is a matrix that has zeros below the diagonal, or into row echelon form, which
we will define later. Then, one can construct the solution set for M Ax = M b.

Since we use the same row operations on A and b, it is useful to use the augmented matrix
(A|b). In the end, we obtain (M A|M b).

Example 3.50. Let us solve the system:

2x1 + 3x2 − 1x3 = 4
2x1 − 1x2 + 7x3 = 0
6x1 + 13x2 − 4x3 = 9

Then we do the following steps:

E1 : 2  3 −1 | 4                E1′ : 2  3 −1 |  4                 E1′′ : 2  3 −1 |  4
E2 : 2 −1  7 | 0  | −1·E1   ⇝   E2′ : 0 −4  8 | −4            ⇝    E2′′ : 0 −4  8 | −4
E3 : 6 13 −4 | 9  | −3·E1       E3′ : 0  4 −1 | −3  | +1·E2′       E3′′ : 0  0  7 | −7

We get from E3′′ immediately x3 = −7/7 = −1. Putting this in E2′′ , we also get −4x2 +
8 · (−1) = −4 and, hence, x2 = (−4 + 8)/(−4) = −1. The last step is then, using x2 = −1 and
x3 = −1 in E1′′ , to get 2x1 + 3 · (−1) − (−1) = 4, which means x1 = (4 + 3 − 1)/2 = 3. The
unique solution is then
$$ x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \\ -1 \end{pmatrix}. $$

Attention!
Do not use equation E1′ anymore at this point!
Otherwise, you would bring the variable x1 back in the game.
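
As a hedged cross-check (not part of the notes, assuming Python with NumPy), the solution found by hand can be verified numerically; np.linalg.solve internally uses an LU decomposition with pivoting:

import numpy as np

A = np.array([[2.,  3., -1.],
              [2., -1.,  7.],
              [6., 13., -4.]])
b = np.array([4., 0., 9.])

x = np.linalg.solve(A, b)
print(x)                         # [ 3. -1. -1.], the solution found by hand above
print(np.allclose(A @ x, b))     # True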

Gauß with a bug

We start with a square matrix A ∈ Rn×n . Let us write $\tilde A := (A|b)$ as a row matrix:
$$ \tilde A = (A|b) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ a_{21} & a_{22} & \dots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_n \end{pmatrix} = \begin{pmatrix} \tilde\alpha_1^T \\ \tilde\alpha_2^T \\ \vdots \\ \tilde\alpha_n^T \end{pmatrix} $$

So aij is the j th entry of $\tilde\alpha_i^T$.
We can eliminate a21 by adding rows: $\tilde\alpha_2^T \leadsto \tilde\alpha_2^T - \lambda_2 \tilde\alpha_1^T$, where $\lambda_2 = \frac{a_{21}}{a_{11}}$.
This can be written in terms of a matrix, and we obtain:
$$ Z_{2-\lambda_2 1}(A|b) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ 0 & \tilde a_{22} & \dots & \tilde a_{2n} & \tilde b_2 \\ a_{31} & a_{32} & \dots & a_{3n} & b_3 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} & b_n \end{pmatrix} = \begin{pmatrix} \tilde\alpha_1^T \\ \tilde\alpha_2^T - \lambda_2 \tilde\alpha_1^T \\ \tilde\alpha_3^T \\ \vdots \\ \tilde\alpha_n^T \end{pmatrix} $$
Now we can do the same with all other rows, defining $\lambda_i = \frac{a_{i1}}{a_{11}}$, and computing:
$$ \underbrace{Z_{n-\lambda_n 1} \cdots Z_{2-\lambda_2 1}}_{L_1^{-1}}(A|b) = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} & b_1 \\ 0 & \tilde a_{22} & \dots & \tilde a_{2n} & \tilde b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & \tilde a_{n2} & \dots & \tilde a_{nn} & \tilde b_n \end{pmatrix} = \begin{pmatrix} \tilde\alpha_1^T \\ \tilde\alpha_2^T - \lambda_2 \tilde\alpha_1^T \\ \vdots \\ \tilde\alpha_n^T - \lambda_n \tilde\alpha_1^T \end{pmatrix} = L_1^{-1}(A|b) $$
Since $L_1^{-1}$ subtracts multiples of the first row from all others, its inverse is easily seen to be the matrix
that adds these multiples of the first row to all others:
$$ L_1^{-1} = \begin{pmatrix} 1 & & & \\ -\lambda_2 & 1 & & \\ \vdots & & \ddots & \\ -\lambda_n & & & 1 \end{pmatrix}, \qquad L_1 = \begin{pmatrix} 1 & & & \\ \lambda_2 & 1 & & \\ \vdots & & \ddots & \\ \lambda_n & & & 1 \end{pmatrix}. $$

Once we have eliminated the entries a21 , . . . , an1 , we can do the same with $\tilde a_{32}, \dots, \tilde a_{n2}$.
Here we use the factors $\tilde\lambda_i = \frac{\tilde a_{i2}}{\tilde a_{22}}$. We then obtain:
$$ L_2^{-1} L_1^{-1} (A|b) = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \dots & a_{1n} & b_1 \\ 0 & \tilde a_{22} & \tilde a_{23} & \dots & \tilde a_{2n} & \tilde b_2 \\ 0 & 0 & \hat a_{33} & \dots & \hat a_{3n} & \hat b_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \hat a_{n3} & \dots & \hat a_{nn} & \hat b_n \end{pmatrix} $$
It (luckily) turns out that
$$ L_1 L_2 = \begin{pmatrix} 1 & & & & \\ \lambda_2 & 1 & & & \\ \lambda_3 & \tilde\lambda_3 & 1 & & \\ \vdots & \vdots & & \ddots & \\ \lambda_n & \tilde\lambda_n & & & 1 \end{pmatrix} $$
If we do this column by column (and don’t run out of hats), we obtain:
$$ \underbrace{L_{n-1}^{-1} \cdots L_1^{-1}}_{L^{-1}} (A|b) = \begin{pmatrix} u_{11} & \dots & u_{1n} & c_1 \\ & \ddots & \vdots & \vdots \\ & & u_{nn} & c_n \end{pmatrix} = (U|c). $$

where L is unit lower triangular:
$$ L = \begin{pmatrix} 1 & & & \\ l_{21} & 1 & & \\ \vdots & \ddots & \ddots & \\ l_{n1} & \dots & l_{n(n-1)} & 1 \end{pmatrix} $$

LU-decomposition A = LU

We thus have L−1 A = U , and L−1 b = c. Multiplication by L yields the famous


LU -decomposition:
A = LU, b = Lc.
Here L is lower triangular, and U is upper triangular. Once, we have decomposed A, we
can, for given b compute x as follows:

solve Lc = b     (“forward substitution”)
solve U x = c    (“backward substitution”)

Then Ax = LU x = Lc = b.
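To make the two substitution steps concrete, here is a minimal sketch in Python. The lecture
notes themselves only describe the method by hand; the use of the numpy library here is our
own assumption and not part of the text.

    import numpy as np

    def forward_substitution(L, b):
        # Solve L c = b for a unit lower triangular L (diagonal entries are 1).
        n = len(b)
        c = np.zeros(n)
        for i in range(n):
            c[i] = b[i] - L[i, :i] @ c[:i]
        return c

    def backward_substitution(U, c):
        # Solve U x = c for an upper triangular U with non-zero diagonal.
        n = len(c)
        x = np.zeros(n)
        for i in reversed(range(n)):
            x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
        return x

With these two routines, Ax = b is solved by x = backward_substitution(U, forward_substitution(L, b)).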
• Since L and U are both triangular, the above solves can be performed easily.
• Since our factorisation is done once and for all, further problems with the same
matrix but different b can be solved later.
• Another point of view on L is the following: we keep track of what is done during
the transformation of the right hand side b → c.

ci ; ci − λcj ⇔ lij = λ

If later our system has to be solved for another right hand side b, then we can use
the subdiagonal entries of L to do the same computations to this new b, as was
done to the old one. This can be nicely written as c = L−1 b.
The Gauß algorithm can be performed by hand, or implemented on a computer. The
following pseudo-code describes it in detail:
Here U and c are overwritten, so we do not distinguish uij and ũij , and so on.

Gaussian elimination without pivoting (A ∈ R^{n×n})

    (U|c) = (A|b),  L = 1_n
    for j = 1 ... n                        (loop over columns)
        for i = j+1 ... n                  (loop over rows)
            l_ij = u_ij / u_jj
            u_ij = 0                       (eliminate entry)
            for s = j+1 ... n
                u_is = u_is − l_ij u_js    (subtract remaining entries)
            c_i = c_i − l_ij c_j           (subtract rhs)
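A direct translation of this pseudo-code into Python might look as follows. This is only a
sketch under the assumption that numpy is available and that no pivot u_jj becomes zero
(see "the bug" below); it is not part of the original notes.

    import numpy as np

    def lu_no_pivoting(A, b):
        # Returns L (unit lower triangular), U (upper triangular) and c = L^{-1} b
        # such that A = L U and b = L c.  Fails if a diagonal pivot becomes zero.
        U = A.astype(float).copy()
        c = b.astype(float).copy()
        n = A.shape[0]
        L = np.eye(n)
        for j in range(n):                                  # loop over columns
            for i in range(j + 1, n):                       # loop over rows below the diagonal
                L[i, j] = U[i, j] / U[j, j]
                U[i, j:] = U[i, j:] - L[i, j] * U[j, j:]    # eliminate entry, update rest of row
                c[i] = c[i] - L[i, j] * c[j]                # same operation on the rhs
        return L, U, c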

• We recognise three nested loops, and thus, the cost of this algorithm is proportional
  to n^3.

• After that, we have to perform backward substitution to compute x from c.


• If b is only known after the decomposition, we can compute c by forward substitu-
tion.
• In computer libraries A is overwritten by L and U . The upper triangular part is
used to store U , the lower triangular part is used to store L:

    ( u11   u12   ···        u1n )
    ( l21   u22   ···        u2n )
    (  :     :     ⋱          :  )
    ( ln1   ···   l_{n,n−1}  unn )

This is called in place factorisation. It is possible, since we know anyway that U is


zero below the diagonal, L is zero above the diagonal, and lii = 1. We may call this
storage matrix L\U .

Example 3.51. LU decomposition:


We do the row operation and save them in the matrix L:
       
2 1 1 1 0 0 2 1 1 1 0 0 2 1 1
A=  4 4 6  =  2 1 0   0 2 4  =  2 1 0  0 2 4 
−2 3 9 −1 0 1 0 4 10 −1 2 1 0 0 2
| {z }| {z }
=:L =:U

Or if one uses the storage-saving notation:


     
2 1 1 2 1 1 2 1 1
A =  4 4 6  ;  2 2 4  ; L\U =  2 2 4 ,
−2 3 9 −1 4 10 −1 2 2

For a right-hand side b, one simply does the same row calculation steps:
     
1 1 1
b =  1  ;  −1  ;  −1  =: c
1 2 4

Alternatively: c := L^{−1}b. To then find the (unique) solution of Ax = b, just
compute x = U^{−1}c by backward substitution as usual.
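One can verify the factors from Example 3.51 numerically. The following small check is our
own addition and assumes numpy; it is not part of the lecture notes.

    import numpy as np

    A = np.array([[ 2., 1., 1.],
                  [ 4., 4., 6.],
                  [-2., 3., 9.]])
    L = np.array([[ 1., 0., 0.],
                  [ 2., 1., 0.],
                  [-1., 2., 1.]])
    U = np.array([[ 2., 1., 1.],
                  [ 0., 2., 4.],
                  [ 0., 0., 2.]])
    b = np.array([1., 1., 1.])

    print(np.allclose(L @ U, A))     # True: A = L U
    c = np.linalg.solve(L, b)        # forward substitution, c = (1, -1, 4)
    x = np.linalg.solve(U, c)        # backward substitution
    print(np.allclose(A @ x, b))     # True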

Video: LU decomposition - An Example

https: // jp-g. de/ bsom/ la/ lu/



The bug:

• What, if ujj is zero at some stage of the computation? Then we have division by 0.
• Also, on the computer, if ujj is very small, say 10^{−14}, then due to round-off error,
  problems may also occur.

Remark: Gaussian elimination or LU -decomposition?


For solving a system Ax = b you have now two options:

(a) Gaussian elimination of (A|b) without memorising the row operations.

(b) LU -decomposition of A with memorising the row operations in the matrix L.

If you are just interested in the solution(s) of a given LES, then you will just do the
Gaussian elimination step by step until you reach the upper triangle form (or the row
echelon form, see next section).

Example 3.52. Let us a look at a higher dimensional and non-square example:

E1 : x1 + 2x2 + x4 = 3
E2 : 4x1 + 8x2 + 2x3 + 3x4 + 4x5 = 14
(3.9)
E3 : 2x3 + 3x4 + 12x5 = 10
E4 : −3x1 − 6x2 − 6x3 + 8x4 + 4x5 = 4

You should immediately rewrite this in an augmented matrix form:


 
E1 : 1 2 0 1 0 3
E2 :
 4 8 2 3 4 14

(A|b) = 
E3 :  0 0 2 3 12 10
E4 : −3 −6 −6 8 4 4

The entry 1 in grey is the first one we have to consider. All entries below it should become
zero after the first elimination step:
• multiply E1 by 4/1 = 4 and subtract the result from E2,
• multiply E1 by −3/1 = −3 and subtract the result from E4.

         x1  x2  x3  x4  x5                          x1  x2  x3  x4  x5
  E1:  (  1   2   0   1   0 |  3 )            E1':  (  1   2   0   1   0 |  3 )
  E2:  (  4   8   2   3   4 | 14 ) −4·E1  ⇝   E2':  (  0   0   2  −1   4 |  2 )
  E3:  (  0   0   2   3  12 | 10 )            E3':  (  0   0   2   3  12 | 10 )
  E4:  ( −3  −6  −6   8   4 |  4 ) +3·E1      E4':  (  0   0  −6  11   4 | 13 )

The next number on the diagonal is a zero and it seems like that our algorithm has to stop
here. However, since below there are also zeros, the column is already eliminated. We
can just ignore the variable x2 at this point and just restart the algorithm with starting
point 2 .
Just subtract the equation E2' with the right factor from the other rows: 2/2 = 1 times
from E3' and −6/2 = −3 times from E4'. We get:

          x1  x2  x3  x4  x5                            x1  x2  x3  x4  x5
  E1':  (  1   2   0   1   0 |  3 )             E1'':  (  1   2   0   1   0 |  3 )
  E2':  (  0   0   2  −1   4 |  2 )             E2'':  (  0   0   2  −1   4 |  2 )
  E3':  (  0   0   2   3  12 | 10 ) −1·E2'  ⇝   E3'':  (  0   0   0   4   8 |  8 )
  E4':  (  0   0  −6  11   4 | 13 ) +3·E2'      E4'':  (  0   0   0   8  16 | 19 )

Next variable is x4. Now we consider the pivot 4. Multiply E3'' with 8/4 = 2 and subtract
from equation E4'':

           x1  x2  x3  x4  x5                              x1  x2  x3  x4  x5
  E1'':  (  1   2   0   1   0 |  3 )              E1''':  (  1   2   0   1   0 | 3 )
  E2'':  (  0   0   2  −1   4 |  2 )              E2''':  (  0   0   2  −1   4 | 2 )
  E3'':  (  0   0   0   4   8 |  8 )          ⇝   E3''':  (  0   0   0   4   8 | 8 )
  E4'':  (  0   0   0   8  16 | 19 ) −2·E3''      E4''':  (  0   0   0   0   0 | 3 )

Now we cannot use any further rows for elimination and we are finished. We get the
following result:

            x1  x2  x3  x4  x5
  E1'''':  (  1   2   0   1   0 | 3 )
  E2'''':  (  0   0   2  −1   4 | 2 )                                          (3.10)
  E3'''':  (  0   0   0   4   8 | 8 )
  E4'''':  (  0   0   0   0   0 | 3 )

This is not a triangular matrix like in Example 3.42, but it is an upper triangular matrix by
definition, since below the diagonal there are only zeros. This form is called the row echelon
form and is defined below.

3.11.4 Row echelon form


Definition 3.53. Row echelon form, pivot element
A matrix A ∈ Rm×n in the form of the left-hand side of (3.10) is called row echelon
form. This means that the matrix A fulfils:

• all zero rows, if any, are at the bottom of the matrix,

• for each row: the first nonzero number from the left is always strictly to the
right of the first nonzero coefficient from the row above it.

This leading nonzero number in each row is called the pivot.

In the row echelon form we can put the variable into two groups:

Definition 3.54. Free and leading variables


Variables in the column of a pivot are called leading variables.
The other variables are called free variables.

Example 3.55. Looking at equation (3.10) again, we can distinguish the variables:

            x1  x2  x3  x4  x5
  E1'''':  (  1   2   0   1   0 | 3 )
  E2'''':  (  0   0   2  −1   4 | 2 )                                          (3.11)
  E3'''':  (  0   0   0   4   8 | 8 )
  E4'''':  (  0   0   0   0   0 | 3 )

In this example x1, x3 and x4 are the leading variables and x2 and x5 are free.

• Free variables can be chosen independently in R.


• The leading variables are chosen dependently of the free variables.

If you have a LES where the matrix is given in row echelon form, then the solution set
is immediately given. You just have to push the free variables to the right and solve the
leading variables by backward substitution.
For example, the solution set of (3.11) is empty since the last row is not satisfiable. However,
we can give another example:

Example 3.56.
The LES
         x1  x2  x3  x4  x5
  E1:  (  1   2   0   1   0 | 3 )
  E2:  (  0   0   2  −1   4 | 2 )                                              (3.12)
  E3:  (  0   0   0   4   8 | 8 )

is already in row echelon form and can be equivalently written as

         x1  x3  x4
  E1:  (  1   0   1 | 3 − 2x2 )
  E2:  (  0   2  −1 | 2 − 4x5 )                                                (3.13)
  E3:  (  0   0   4 | 8 − 8x5 )

and backward substitution gives us the solution set:

        ( x1 )   ( 1 − 2x2 + 2x5 )     ( 1 )      ( −2 )      (  2 )
        ( x2 )   (      x2       )     ( 0 )      (  1 )      (  0 )
  S = { ( x3 ) = (    2 − 3x5    )  =  ( 2 ) + x2 (  0 ) + x5 ( −3 )  :  x2, x5 ∈ R }
        ( x4 )   (    2 − 2x5    )     ( 2 )      (  0 )      ( −2 )
        ( x5 )   (      x5       )     ( 0 )      (  0 )      (  1 )
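As a sanity check (our own addition, assuming numpy), one can verify that every vector of
this form solves the system (3.12):

    import numpy as np

    A = np.array([[1., 2., 0.,  1., 0.],
                  [0., 0., 2., -1., 4.],
                  [0., 0., 0.,  4., 8.]])
    b = np.array([3., 2., 8.])

    particular = np.array([ 1., 0.,  2.,  2., 0.])
    dir_x2     = np.array([-2., 1.,  0.,  0., 0.])
    dir_x5     = np.array([ 2., 0., -3., -2., 1.])

    for x2, x5 in [(0., 0.), (1., -2.), (3.5, 0.7)]:   # arbitrary free parameters
        x = particular + x2 * dir_x2 + x5 * dir_x5
        assert np.allclose(A @ x, b)                    # every choice solves the LES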

Corollary 3.57.
If A ∈ Rm×n is a row echelon matrix, then rank(A) is the number of leading variables
and dim(Ker(A)) is the number of free variables.

Proof. Obviously, the columns containing pivots are linearly independent vectors, whereas the
columns belonging to free variables are linear combinations of the pivot columns.

In the next section, we will generalise what we did in the example before.

3.11.5 Gaussian elimination with pivoting and P A = LK decomposition
Now we consider the general case of Ax = b with non-square A ∈ Rm×n . Here the role of
the upper triangular U is played by a matrix K in row echelon form, see Definition 3.53.
In such a case, we need to make use of another technique, the exchange of rows of our
temporary matrices. This is again a multiplication from the left by an invertible matrix.
This is called pivoting.

Pen-and-paper strategy: ”non-zero pivoting“

Let j be the current column and assume that we want to eliminate all entries below
krj. (In the standard Gaussian elimination, we always wanted to eliminate all entries
below a diagonal element kjj, but in the general case we have r ≤ j, as in Example 3.62
below.)
Initialise the permutation matrix Prow = 1 to store permutations and start with K := A
and c := b.
• If krj = 0, test for i = r + 1, ..., m whether kij ≠ 0.
• At the first occurrence ipivot, exchange row r and row ipivot of L\K, of c, and of Prow.
  This means that only the subdiagonal entries of L are exchanged, not the diagonal
  entries (which are 1).
• If all tested kij are zero, continue with the next column.

Remark: Pivot search


In other words: If the next entry that we want to choose as a pivot is zero, we just
search the rest of the column below for a non-zero entry and switch the rows.

Question:
One may wonder why only the entries of L below the diagonal are permuted, but not those
on the diagonal. As you recall, L is constructed for the purpose of book-keeping: which row
is subtracted from which row, and by which scaling factor. The unit diagonal of L has
nothing to do with this book-keeping.
If we permute the subdiagonal entries of L, we update the book-keeping according to
the permutations of the rows of A. It is, as if the rows would have been permuted
in the beginning, before the start of the elimination.

Example 3.58. Invertible matrix (with pivoting):


        
  ( 2  3  4 )   ( 1  0  0 ) ( 2  3  4 )            ( 1  0  0 )           ( 2  3  4 )
  ( 4  6  9 ) = ( 2  1  0 ) ( 0  0  1 ) = (P2↔3)²  ( 2  1  0 )  (P2↔3)²  ( 0  0  1 )
  ( 2  4  6 )   ( 1  0  1 ) ( 0  1  2 )            ( 1  0  1 )           ( 0  1  2 )

                 ( 1  0  0 ) ( 2  3  4 )
        = P2↔3   ( 1  1  0 ) ( 0  1  2 )
                 ( 2  0  1 ) ( 0  0  1 )

Or if one uses the storage-saving notation:


     
  ( 2  3  4 )       ( 2  3  4 )              ( 2  3  4 )
  ( 4  6  9 )   ⇝   ( 2  0  1 )   ⇝(P2↔3)    ( 1  1  2 )
  ( 2  4  6 )       ( 1  1  2 )              ( 2  0  1 )

Example 3.59. Invertible matrix (with hindsight and pivoting)


     
  ( 2  3  4 )              ( 2  3  4 )       ( 2  3  4 )
  ( 4  6  9 )   ⇝(P2↔3)    ( 2  4  6 )   ⇝   ( 1  1  2 )
  ( 2  4  6 )              ( 4  6  9 )       ( 2  0  1 )

Since we have exchanged the subdiagonal entries of L, all row exchanges, applied during
the transformation b ; c can be performed at the beginning. Thus, we do not have to
remember when a row exchange took place. We only need the result of all row exchanges,
and apply it at the beginning.
     
       ( 1 )                                 ( 1 )                          ( 1 )
  b =  ( 2 )  → (permute first)  ⇝  Prow b = ( 3 )  ⇝  L^{−1} Prow b = c =  ( 2 ) .
       ( 3 )                                 ( 2 )                          ( 0 )

This algorithm can also be applied to non-square matrices and leads to K in the so-called
row echelon form. This is why there are now two indices, r and j: j runs over the columns,
while r marks the row of the current pivot; we always have r ≤ j.

Gaussian elimination with pivoting (A ∈ R^{m×n})

    K = A, L = 1_m, c = b, r = 1, Prow = 1_m
    for j = 1 ... n                        (loop over columns)
        perform the pivot search for the first non-zero element of K at or below k_rj
        if no such element exists, continue with the next column
        if i_pivot was found, exchange row r and row i_pivot of L\K, of c, and of Prow
        for i = r+1 ... m                  (loop over rows below the pivot)
            l_ir = k_ij / k_rj
            k_ij = 0                       (eliminate entry)
            for s = j+1 ... n
                k_is = k_is − l_ir k_rs    (subtract remaining entries)
            c_i = c_i − l_ir c_r           (subtract rhs)
        r = r + 1                          (consider the next row)

It does, however, not work properly on the computer, because the exact test krj = 0 is
unreliable in the presence of round-off errors. For toy examples, however, it can be used to
find all possible solutions of a non-square linear system.
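For such pen-and-paper-sized examples, the procedure can also be sketched in Python. The
following is our own illustration (assuming numpy), it only produces K and c and does not
store L or Prow, and it uses the exact zero test that, as just remarked, is unsuitable for
serious numerical work.

    import numpy as np

    def row_echelon(A, b):
        # Bring (A|b) into row echelon form (K|c) by Gaussian elimination with pivot search.
        K = A.astype(float).copy()
        c = b.astype(float).copy()
        m, n = K.shape
        r = 0                                                  # row of the next pivot
        for j in range(n):                                     # loop over columns
            rows = [i for i in range(r, m) if K[i, j] != 0]    # exact pivot search
            if not rows:
                continue                                       # column already eliminated
            i_piv = rows[0]
            K[[r, i_piv]] = K[[i_piv, r]]                      # exchange rows of K ...
            c[[r, i_piv]] = c[[i_piv, r]]                      # ... and of the right-hand side
            for i in range(r + 1, m):
                l = K[i, j] / K[r, j]
                K[i, j:] -= l * K[r, j:]
                c[i] -= l * c[r]
            r += 1
        return K, c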

Example 3.60. non-square matrix (no pivoting needed here)


    
  ( 1  2  1  2 )   ( 1  0  0 ) ( 1  2  1  2 )
  ( 1  2  2  3 ) = ( 1  1  0 ) ( 0  0  1  1 )
  ( 2  4  3  5 )   ( 2  0  1 ) ( 0  0  1  1 )

                   ( 1  0  0 ) ( 1  2  1  2 )
                 = ( 1  1  0 ) ( 0  0  1  1 )
                   ( 2  1  1 ) ( 0  0  0  0 )
We observe that rank(A) = 2.

Example 3.61. Modified Example: with pivoting


    
  ( 1  2  1  2 )   ( 1  0  0 ) ( 1  2  1  2 )
  ( 1  2  1  3 ) = ( 1  1  0 ) ( 0  0  0  1 )
  ( 2  4  3  5 )   ( 2  0  1 ) ( 0  0  1  1 )

                         ( 1  0  0 ) ( 1  2  1  2 )
                 = P2↔3  ( 2  1  0 ) ( 0  0  1  1 )
                         ( 1  0  1 ) ( 0  0  0  1 )

The result of pen-and-paper pivoting

Although this exchange of rows happens during the course of the elimination, the outcome of
the resulting algorithm for a matrix A can be written as

    Prow A = LK,

where Prow collects all the performed permutations and K is in row echelon form. Hence, for
a right-hand side b we may solve Ax = b as follows:

    w = Prow b     (row permutations)
    Lc = w         (forward substitution)
    Kx = c         (backward substitution)

Then Prow Ax = LKx = Lc = w = Prow b. Usually, Prow is not stored as a matrix, but
rather as a vector p of indices: wi = b_{p_i}.

Example 3.62. We look at the example:


  E1:                2x3 +  3x4 + 12x5 = 10
  E2:  4x1 + 8x2 +   2x3 +  3x4 +  4x5 = 14
  E3:   x1 + 2x2         +   x4        =  3                                    (3.14)
  E4: −3x1 − 6x2 −   6x3 +  8x4 +  4x5 =  1

As always:
          E1:  (  0   0   2   3  12 | 10 )
          E2:  (  4   8   2   3   4 | 14 )
  (A|b) = E3:  (  1   2   0   1   0 |  3 )
          E4:  ( −3  −6  −6   8   4 |  1 )

We need a row exchange to get a new pivot. Let us exchange E1 with E3:

  E1:  (  0   0   2   3  12 | 10 )        E1':  (  1   2   0   1   0 |  3 )
  E2:  (  4   8   2   3   4 | 14 )   ⇝    E2':  (  4   8   2   3   4 | 14 )
  E3:  (  1   2   0   1   0 |  3 )        E3':  (  0   0   2   3  12 | 10 )
  E4:  ( −3  −6  −6   8   4 |  1 )        E4':  ( −3  −6  −6   8   4 |  1 )

Now there is a grey 1 in the top left corner that we will use for the subtraction:

• Subtract 4/1 = 4 times E1' from E2',
• subtract −3/1 = −3 times E1' from E4'.
Here the solution:

         x1  x2  x3  x4  x5                           x1  x2  x3  x4  x5
  E1':  (  1   2   0   1   0 |  3 )            E1'':  (  1   2   0   1   0 |  3 )
  E2':  (  4   8   2   3   4 | 14 ) −4·E1' ⇝   E2'':  (  0   0   2  −1   4 |  2 )
  E3':  (  0   0   2   3  12 | 10 )            E3'':  (  0   0   2   3  12 | 10 )
  E4':  ( −3  −6  −6   8   4 |  1 ) +3·E1'     E4'':  (  0   0  −6  11   4 | 10 )

Now there is no x1 in the rows 2, 3 and 4. We should not touch the first row any more,
since otherwise x1 comes back into the game.
Also, x2 remains only in row 1. Hence, we do not have to do anything with x2 and can go
on to x3. There, the grey 2 in E2'' is the next pivot. Subtract E2'' with the right multiple
(2/2 = 1) from E3''. Also subtract −6/2 = −3 times E2'' from E4''. We get:

          x1  x2  x3  x4  x5                             x1  x2  x3  x4  x5
  E1'':  (  1   2   0   1   0 |  3 )             E1''':  (  1   2   0   1   0 |  3 )
  E2'':  (  0   0   2  −1   4 |  2 )             E2''':  (  0   0   2  −1   4 |  2 )
  E3'':  (  0   0   2   3  12 | 10 ) −1·E2'' ⇝   E3''':  (  0   0   0   4   8 |  8 )
  E4'':  (  0   0  −6  11   4 | 10 ) +3·E2''     E4''':  (  0   0   0   8  16 | 16 )

Look at x4. Here, 4 is the pivot. We subtract 8/4 = 2 times E3''' from E4''':

           x1  x2  x3  x4  x5                               x1  x2  x3  x4  x5
  E1''':  (  1   2   0   1   0 |  3 )              E1'''':  (  1   2   0   1   0 | 3 )
  E2''':  (  0   0   2  −1   4 |  2 )              E2'''':  (  0   0   2  −1   4 | 2 )
  E3''':  (  0   0   0   4   8 |  8 )          ⇝   E3'''':  (  0   0   0   4   8 | 8 )
  E4''':  (  0   0   0   8  16 | 16 ) −2·E3'''     E4'''':  (  0   0   0   0   0 | 0 )

The elimination algorithm ends. This is the wanted row echelon form:

            x1  x2  x3  x4  x5
  E1'''':  (  1   2   0   1   0 | 3 )
  E2'''':  (  0   0   2  −1   4 | 2 )                                          (3.15)
  E3'''':  (  0   0   0   4   8 | 8 )
  E4'''':  (  0   0   0   0   0 | 0 )

Can you write down the set of all solutions S?



Video: PLU decomposition - An Example

https: // jp-g. de/ bsom/ la/ plu/

3.12 Looking at columns and maps

In the Gaussian elimination everything works on the rows. Now we will look at what we can
say about the columns. As a reminder: the LES Ax = b has at least one solution x if and
only if b can be written as Ax for some x ∈ R^n, which means that b ∈ Ran(A). Looking at
the columns of the matrix

    A = ( a1  a2  ···  an ) ∈ R^{m×n},

we can conclude

    Ax = ( a1  ···  an ) (x1, ..., xn)^T = x1·a1 + ··· + xn·an.

In other words:

Ran(A) = {Ax : x ∈ Rn } = {x1 a1 + . . . + xn an : x1 , . . . , xn ∈ R} ⊂ Rm . (3.16)

Reformulating the fact from above

Corollary 3.63. Solvability in the column picture


For a matrix A ∈ Rm×n and vector b ∈ Rm the following claims are equivalent

(i) Ax = b has at least one solution,

(ii) b ∈ Ran(A),

(iii) b can be written as a linear combination of the columns from A.



(Picture: the linear map fA maps R^n into R^m; its image is Ran(A) ⊂ R^m.)

    b ∈ Ran(A)  ⇒  Ax = b has at least one solution,
    c ∉ Ran(A)  ⇒  Ax = c has no solution.

Example 3.64. Let A = ( 3  6 ; 1  2 ). Then Ax = b has at least one solution if and only if

    b ∈ Ran(A) = { x1·(3, 1)^T + x2·(6, 2)^T : x1, x2 ∈ R }
               = { x1·(3, 1)^T + 2x2·(3, 1)^T : x1, x2 ∈ R }
               = { (x1 + 2x2)·(3, 1)^T : x1, x2 ∈ R }
               = { λ·(3, 1)^T : λ ∈ R }.

This means that b ∈ R^2 lies on the line through (0, 0)^T and (3, 1)^T.


 

Remember that for each matrix A there is a linear map fA : R^n → R^m, cf. Section 3.3,
defined by

    x ∈ R^n  ↦  fA(x) := Ax ∈ R^m.

Of course, solving Ax = b is the same as solving fA(x) = b. This means we want to find
the preimage of the element b with respect to the map fA. Obviously, the image of fA is
exactly the range Ran(A) of A, so we get:

    fA(R^n) = {fA(x) : x ∈ R^n} = {Ax : x ∈ R^n} = Ran(A).

Hence, we find the following:

Proposition 3.65. Unconditional solvability (surjectivity of fA)


For a matrix A ∈ Rm×n the following claims are equivalent:

(i) The LES Ax = b has for every b ∈ Rm at least one solution x.

(ii) All b ∈ Rm lie in Ran(A).

(iii) Ran(A) = Rm .

(iv) rank(A) = m ≤ n.

(v) The row echelon form of A, denoted by A0 , has a pivot in every row.

(vi) fA is surjective.

Row echelon form A' of A (schematically: an m×n staircase reaching the last row):

• each row has a pivot,
• there are no zero rows in A',
• we will never have a row (0 ··· 0 | c) with c ≠ 0 in the last row.

Proof. (i)⇔(ii)⇔(iii)⇔(vi) and (iv)⇔(v) is clear. Assume (iv), rank(A) = m. Then for
each b ∈ Rm we can use rank(A) ≤ rank(A|b) ≤ m and therefore rank(A|b) = m. We
can conclude rank(A) = rank(A|b). This means that b is a linear combination of the
columns of A and, hence, this gives us a solution x, which shows (i).
Show now (i)⇒(v) by contraposition. Assume ¬(v). By doing the elimination A ; A0
to get a row echelon form A0 , we also get at least one zero row. Then, it is possible to
choose b ∈ Rm in such a way that we get a row in (A0 |b0 ) that is given by (0 . . . 0 | c)
with c 6= 0. Such a row cannot be solved and, hence, Ax = b has no solution. Therefore,
we get ¬(i).

Example 3.66. Consider a 3 × 5 matrix A and calculate the row echelon form A0 :
   
1 4 0 2 −1 1 4 0 2 −1
A = −1 2 −2 −2 3  ; · · · ; A0 =  0 6 −2 0 2
−3 0 −4 −3 8 0 0 0 3 1
Each row of A0 has a pivot and (v) from Proposition 3.65 holds. One immediately sees
rank(A) = 3 = m ≤ n = 5.
(i) says that the LES Ax = b has for every right-hand side b ∈ R3 at least one solution.
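Numerically, the rank claim can be confirmed, for instance, with numpy (our own addition,
not part of the notes):

    import numpy as np

    A = np.array([[ 1., 4.,  0.,  2., -1.],
                  [-1., 2., -2., -2.,  3.],
                  [-3., 0., -4., -3.,  8.]])
    print(np.linalg.matrix_rank(A))   # 3 = m, so Ax = b is solvable for every b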

Now we go to the uniqueness

Proposition 3.67. Unique solution (injectivity of fA )


For a matrix A ∈ Rm×n the following claims are equivalent:
(i) Ax = b has for every b ∈ Rm at most one solution x.
(ii) Ax = o has only the solution x = o.
(iii) Ker(A) = {o}.
(iv) rank(A) = n ≤ m.
(v) The row echelon form of A, denoted by A0 , has in every column a pivot.
(vi) fA is injective.
Row echelon form A' of A (schematically: an m×n staircase with a pivot in every column):

• each column has a pivot,
• all variables are leading variables,
• there are no free variables,
• it is not possible to have more than one solution.

Proof. Equivalence (i)⇔(iii) follow from S = v0 + Ker(A), see Proposition 3.48. (i)⇔(vi)
holds by the definition of injectivity. The equivalence (ii)⇔(iii) follows from the definition
of Ker(A). (iii)⇔(iv) holds by Theorem 3.41, the Rank-nullity Theorem. (iv)⇔(v) holds
since row operations do not change the rank of a matrix.

Example 3.68. Consider a 4 × 3 matrix A and calculate the row echelon form A':

       (  2   3   0 )                 (  2   3   0 )
       (  2   2   5 )                 (  0  −1   5 )
  A =  ( −4  −5  −3 )   ⇝ ··· ⇝  A' = (  0   0  −8 )
       (  4   7   1 )                 (  0   0   0 )

Each column in A' has a pivot. One also sees: rank(A) = 3 = n ≤ m = 4.

The LES Ax = b has exactly one solution, x = (1, 0, 0)^T, for b = (2, 2, −4, 4)^T; but for
b = (0, 0, 0, 1)^T there is no solution x.

Both things together:

Proposition 3.69. Existence and Uniqueness of a solution


For a matrix A ∈ Rm×n the following claims are equivalent:
(i) The LES Ax = b has for every b ∈ Rm a unique solution.

(ii) Ker(A) = {o} and Ran(A) = Rm .

(iii) rank(A) = m = n, i.e. A is square with maximal rank.

(iv) fA is bijective.

Row echelon form A' of A (schematically: a square matrix in triangular form):

• each column and each row has a pivot,
• the matrix has to be square,
• we have rank(A) = m = n,
• the row echelon form A' has triangular form.

We look at the special case of square matrices

Proposition 3.70. m=n: square matrices


For a square matrix A ∈ Rn×n the following claims are equivalent:
(i) The LES Ax = b has a solution for every b ∈ Rn .

(ii) The LES Ax = b has for some b ∈ Rn a unique solution.

(iii) The LES Ax = b has a unique solution for every b ∈ Rn .

(iv) Ker(A) = {o}



(v) Ran(A) = Rn .

(vi) rank(A) = n.

(vii) For A, the row echelon form A0 has a pivot in each row.

(viii) For A, the row echelon form A0 has a pivot in each column.

(ix) fA is surjective.

(x) fA is injective.

(xi) fA is bijective.

Proof. Since m = n, the equations rank(A) = m and rank(A) = n from Proposition 3.65
and Proposition 3.67 are equivalent. Therefore all the claims above are equivalent.

We can conclude:

Box 3.71. Fredholm alternative


For square matrices, we have either both claims below or neither of them:

• unconditional solvability (fA is surjective),


• unique solution (Ker(A) = {o}, hence fA is injective)

Summary
• By Rm×n we denote number tables with m rows and n columns.
• We call these number tables matrices and can naturally scale them and add them.
  Both operations in Rm×n are realised componentwise.
• Linear equations look like

constant · x1 + constant · x2 + · · · + constant · xn = constant .

• Systems of linear equations (LES) are finitely many of these linear equations.
• A solution of the system is a choice of all unknowns x1 , . . . , xn such that all equations
are satisfied.
• A short notation for LES is the matrix notation: Ax = b.
• This notation leads us to the general matrix product.
• Each matrix A induces a linear map fA : Rn → Rm . A linear map satisfies two
properties ( · ) and (+).
• If fA is bijective, the corresponding matrix is invertible with respect to the matrix
product.
• Linearly independent vectors are the most efficient method to describe a linear sub-
space.

• A linearly independent family that generates the whole subspace U is called a basis
of U .
• Range, rank and kernel are important objects for matrices.
• For solving a LES, we use Gaussian elimination or equivalently LU -decomposition.
In the general case the upper triangular matrix U is substituted by a row echelon
form.
• Solvability and unique solvability can be equivalently formulated and, for example,
read from the row echelon form.
4 Determinants
A learning experience is one of those things that says, ’You know that
thing you just did? Don’t do that.’
Douglas Adams, The Salmon of Doubt

Let A ∈ Rn×n be a square matrix. The determinant det(A) ∈ R of A is a special real


number, associated to A.

• Geometrically, if A is a column matrix, the absolute value | det(A)| describes the


(generalised) n-dimensional volume of the parallel-epiped, spanned by the columns
of A.
• In particular, this volume is non-zero if and only if the columns of A are linearly
  independent, and this holds if and only if A is invertible.
• The sign (±) of det(A) gives an orientation, where det(1) = +1 (column matrix of
unit vectors in the usual order, unit cube). If we exchange two columns of a matrix,
then the sign of its determinant changes.
• The determinant should have all other properties that one expects from a volume.

4.1 Determinant in two dimensions

We already know how to solve the system Ax = b if A is a square matrix. The determinant
should then tell us if the system has a unique solution before solving it. For a 2 × 2 LES,
we get (for a11 6= 0):

   
  ( a11  a12 | b1 )       ( a11          a12          | b1              )
  ( a21  a22 | b2 )   ⇝   (  0   a11 a22 − a12 a21    | b2 a11 − b1 a21 )

Hence, we know that the LES has a unique solution if and only if in the second column
there is a pivot. This means a11 a22 − a12 a21 6= 0. And that is the determinant of the
system or rather the matrix A.


Definition 4.1.
For a matrix A = ( a11  a12 ; a21  a22 ) ∈ R^{2×2}, we call det(A) := a11 a22 − a12 a21 the
determinant of A.

The determinant in two dimensions has an immediate interpretation when we compare it


to an area measurement:

Consider two vectors u = (u1, u2)^T and v = (v1, v2)^T from R^2 and the parallelogram
they span. We define Area(u, v) ∈ R as the real number that fulfils

    |Area(u, v)| = area of the spanned parallelogram,

and the sign of Area(u, v) is chosen in the following way:

• plus sign if rotating u towards v through the smaller angle between them is a rotation
  in the mathematically positive sense,
• minus sign if this rotation is in the mathematically negative sense.

(Picture: the pair (u, v) has positive orientation, the pair (v, u) negative orientation.)

If you look back at Section 2.6, you already know a possible calculation for the area of
a parallelogram. However, this only works in the three-dimensional space R^3 since it
involves the cross product. If we embed the vectors u, v ∈ R^2, and therefore the whole
parallelogram, into R^3, we can calculate |Area(u, v)| in this way. A possible way is to set
the supplementary third component to zero:

    ũ := (u1, u2, 0)^T   and   ṽ := (v1, v2, 0)^T.

Then we find:

    |Area(u, v)| = ‖ũ × ṽ‖ = ‖ (u2·0 − 0·v2, 0·v1 − u1·0, u1 v2 − u2 v1)^T ‖
                 = ‖ (0, 0, u1 v2 − u2 v1)^T ‖ = |u1 v2 − u2 v1|.

Without the absolute value this coincides with the determinant of the matrix

    A := ( u  v ).

Indeed, also the sign rule from above is fulfilled.



Proposition 4.2.

    Area( (u1, u2)^T, (v1, v2)^T ) = det ( u1  v1 ; u2  v2 ) = u1 v2 − u2 v1.        (4.1)

Example 4.3. (a) If we look at u = (3, −2)^T and v = (2, 1)^T, we get:

    Area(u, v) = Area( (3, −2)^T, (2, 1)^T ) = 3·1 − (−2)·2 = 7.

The area is 7 and the orientation is positive.

(b) The other ordering gives:

    Area( (2, 1)^T, (3, −2)^T ) = 2·(−2) − 1·3 = −7.

The area is again 7 but the orientation is negative.

(c) Choose u = (u1, u2)^T and the scaled vector v = αu = (αu1, αu2)^T. Then:

    Area(u, v) = Area(u, αu) = Area( (u1, u2)^T, (αu1, αu2)^T ) = u1·αu2 − u2·αu1 = 0.

Note that the vectors u and v = αu do not span an actual parallelogram but rather
just a stripe. Therefore, the area has to be zero.
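A tiny numerical illustration of this example (our own addition, assuming numpy; the
three-dimensional cross product reproduces the same number via the embedding above):

    import numpy as np

    def area2(u, v):
        # Signed area of the parallelogram spanned by u, v in R^2: u1*v2 - u2*v1.
        return u[0] * v[1] - u[1] * v[0]

    u, v = np.array([3., -2.]), np.array([2., 1.])
    print(area2(u, v))                       # 7.0  (positive orientation)
    print(area2(v, u))                       # -7.0 (orientation reversed)

    # The same number via the embedding into R^3 and the cross product:
    u3, v3 = np.array([3., -2., 0.]), np.array([2., 1., 0.])
    print(np.linalg.norm(np.cross(u3, v3)))  # 7.0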

4.2 Determinant as a volume measure

In the previous section, we showed that, in two dimensions, the determinant is connected
to a measuring of an area. In three dimensions, we therefore expect that the determinant
will measure a volume. In general, we want the determinant to measure a generalised
n-dimensional volume in Rn . We use the symbol Voln for this. We already know that
Vol2 = Area. Now, we can summarise what one should demand of a meaningful volume
measure.

Definition 4.4. Properties that Voln should have.


The n-dimensional volume function Voln : Rn × · · · × Rn → R that gets n vectors
as input should fulfil:

(1) Voln (u1 , . . . , uj + v, . . . , un ) = Voln (u1 , . . . , uj , . . . , un )+Voln (u1 , . . . , v, . . . , un )


for all u1 , . . . un , v ∈ Rn and j ∈ {1, . . . , n}.

(Picture in the case n = j = 2: the area spanned by (u1, u2 + v) splits into the areas
spanned by (u1, u2) and (u1, v).)

(2) Voln(u1, ..., αuj, ..., un) = α·Voln(u1, ..., uj, ..., un) for all u1, ..., un ∈ R^n,
    α ∈ R and j ∈ {1, ..., n}.

(Picture: scaling one spanning vector v by 3 scales the area by 3.)

(3) Voln(u1, ..., ui, ..., uj, ..., un) = −Voln(u1, ..., uj, ..., ui, ..., un) for all
    u1, ..., un ∈ R^n and i, j ∈ {1, ..., n} with i ≠ j.

(Picture: exchanging the two spanning vectors u and v flips the sign from + to −.)

(4) The unit cube (u1 = e1 , . . . , un = en ) has volume 1: Voln (e1 , . . . , en ) = 1.

In mathematical terms, this means that the volume function is linear in each entry, anti-
symmetric and normalised to the standard basis. For the case n = 2, we can show that
solely these properties imply equation (4.1).
First, we show an easy consequence that follows from the two properties (2) and (3):

Proposition 4.5. Colinear vectors do not have an area.


For all u ∈ R2 and α ∈ R, we have Vol2 (u, αu) = 0.

Proof. Because of (3), we find Vol2 (u, u) = −Vol2 (u, u) and this implies Vol2 (u, u) = 0.
Since (2) holds, we get Vol2 (u, αu) = α Vol2 (u, u) = α 0 = 0.

Now, we can prove the formula (4.1):



Proposition 4.6.
If Vol2 fulfils (1), (2), (3), (4), then for all u = (u1, u2)^T, v = (v1, v2)^T ∈ R^2 the
following holds:

    Vol2(u, v) = Vol2( (u1, u2)^T, (v1, v2)^T ) = u1 v2 − u2 v1.

Proof.

  Vol2(u, v) = Vol2( (u1, 0)^T + (0, u2)^T, v )
             = Vol2( (u1, 0)^T, v ) + Vol2( (0, u2)^T, v )                        by (1)
             = Vol2( (u1, 0)^T, (v1, 0)^T ) + Vol2( (u1, 0)^T, (0, v2)^T )
               + Vol2( (0, u2)^T, (v1, 0)^T ) + Vol2( (0, u2)^T, (0, v2)^T )      by (1)
             = u1 v1 Vol2(e1, e1) + u1 v2 Vol2(e1, e2)
               + u2 v1 Vol2(e2, e1) + u2 v2 Vol2(e2, e2)                          by (2)
             = u1 v1 · 0 + u1 v2 · 1 + u2 v1 · (−1) + u2 v2 · 0
               (using Proposition 4.5 for the first and last term, (4) and (3),(4) for the middle ones)
             = u1 v2 − u2 v1.

Note that this proves that the volume function Vol2 is uniquely defined by the four
properties (1), (2), (3) and (4) alone. We expect the same for arbitrary dimension n and
indeed we prove this now.
   
For a1 = (a11, ..., an1)^T, ..., an = (a1n, ..., ann)^T ∈ R^n, it follows

    Voln(a1, ..., an) = Voln( a11 e1 + ... + an1 en, ..., a1n e1 + ... + ann en ).

Using the linearity in each entry, we can conclude:

    Voln(a1, ..., an) = Σ_{i1=1}^{n} Σ_{i2=1}^{n} ··· Σ_{in=1}^{n} a_{i1,1} a_{i2,2} ··· a_{in,n} Voln(e_{i1}, e_{i2}, ..., e_{in})

with n · ... · n = n^n summands, most of which are zero since Voln(e_{i1}, ..., e_{in}) = 0 if
two indices coincide. The tuples (i1, i2, ..., in) in which all entries are different can
be counted: their number is the number of all permutations of the set {1, ..., n}, which
is exactly n · (n−1) · ... · 1 = n!. Let Pn be the set of these n! permutations, which can
also be denoted by τ = (i1, i2, ..., in).
For a permutation τ ∈ Pn, we define sgn(τ) = 1 if one can use an even number of
exchanges of two elements to get from τ to (1, 2, ..., n). If one needs an odd number of
exchanges of two elements to get from τ to (1, 2, ..., n), we define sgn(τ) = −1.
Repeated use of (3) shows Voln(e_{i1}, ..., e_{in}) = sgn(i1, ..., in). In summary, we get:

Proposition 4.7. Leibniz formula

The volume form Voln is uniquely determined by the properties (1), (2), (3) and (4)
and fulfils for n vectors a1 = (a11, ..., an1)^T, ..., an = (a1n, ..., ann)^T ∈ R^n:

    Voln(a1, ..., an) = Σ_{(i1,...,in) ∈ Pn} sgn(i1, ..., in) a_{i1,1} a_{i2,2} ··· a_{in,n}.     (4.2)

Proof. The calculation from above shows that (4.2) is the only function that fulfils all the
four rules.

Definition 4.8. Determinant of square matrices

For a square matrix A ∈ R^{n×n} with entries

                             ( a11  ···  a1n )
    A := ( a1  ···  an )  =  (  :          :  ) ,
                             ( an1  ···  ann )

we define the determinant as the volume measure of the column vectors:

    det(A) := Voln(a1, ..., an) = Σ_{(i1,...,in) ∈ Pn} sgn(i1, ..., in) a_{i1,1} a_{i2,2} ··· a_{in,n}.

Remark:
You can remember the Leibniz formula of the determinant det(A) in the following
way:

(1) Build a product of n factors out of the entries in A. From each row and each
column you are only allowed to choose one factor.

(2) Sum up all the possibilities for such a product where you add a minus-sign for
the odd permutations.
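The Leibniz formula can also be spelled out directly in code. The following sketch is our own
addition (plain Python, no external libraries) and determines sgn(τ) by counting inversions:

    from itertools import permutations

    def sgn(tau):
        # Sign of a permutation: +1 for an even, -1 for an odd number of inversions.
        inversions = sum(1 for a in range(len(tau)) for b in range(a + 1, len(tau))
                         if tau[a] > tau[b])
        return 1 if inversions % 2 == 0 else -1

    def det_leibniz(A):
        # A is a list of rows; sum over all n! permutations (only feasible for small n).
        n = len(A)
        total = 0
        for tau in permutations(range(n)):
            prod = sgn(tau)
            for col in range(n):
                prod *= A[tau[col]][col]      # factor a_{i_col, col}
            total += prod
        return total

    print(det_leibniz([[1, 2], [3, 4]]))      # -2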

Example 4.9. Consider the matrix P_{k↔ℓ} that we used in the Gaussian elimination to
switch the k-th row with the ℓ-th row. Let us denote the entries by pij; then we know

    pij = 1   if i = j and i, j ∉ {k, ℓ},
    pij = 1   if (i, j) = (k, ℓ) or (i, j) = (ℓ, k),
    pij = 0   else.

This means that in the Leibniz formula there is only one non-vanishing term:

    det(P_{k↔ℓ}) = sgn(τ) p11 ··· p_{k,ℓ} ··· p_{ℓ,k} ··· pnn = −1.

Since the permutation is only one single exchange, the sign is −1. Of course, we expect
this result by property (3) of the volume form.

We have used the Leibniz formula to finally define the determinant of a matrix or, equi-
valently, the volume measure in Rn . However, despite being useful in abstract proofs,
this formula is not a good one for actual calculation. Even for n = 4, we have to sum up
4! = 24 terms. Only for n = 2 and n = 3, we get good calculation formulas, which can be
memorised.
 
        ( a11  a12  a13 )
    det ( a21  a22  a23 ) = + a11 a22 a33 + a12 a23 a31 + a13 a21 a32
        ( a31  a32  a33 )   − a13 a22 a31 − a11 a23 a32 − a12 a21 a33

Rule of thumb: Rule of Sarrus (only for n = 3)

Repeat the first two columns to the right of the matrix and form products along the
diagonals (descending diagonals with a plus sign, ascending ones with a minus sign):

    u1  v1  w1 | u1  v1
    u2  v2  w2 | u2  v2
    u3  v3  w3 | u3  v3

    Vol3(u, v, w) = + u1 v2 w3 + v1 w2 u3 + w1 u2 v3 − u3 v2 w1 − v3 w2 u1 − w3 u2 v1.

Moreover, the sign of the three-dimensional volume can be easily seen by the right-hand
rule.

4.3 The cofactor expansion


We already know that volume measure and the determinant of a matrix coincide. From
now on, we will only consider the determinant of matrices and keep in mind that this is
the volume spanned by the columns of the matrix. We consider a matrix

A = (aij ) with i, j = 1, . . . , n.

We have already encountered these cases:


• A ∈ R1×1 :
det(A) = a11

• A ∈ R2×2 :
det(A) = a11 a22 − a21 a12

• A ∈ R3×3 :
det(A) = + a11 a22 a33 + a12 a23 a31 + a13 a21 a32
− a13 a22 a31 − a11 a23 a32 − a12 a21 a33

Note that it would be helpful to have an algorithm that reduces an n × n-matrix to these
cases.

Checkerboard of signs:

    ( +  −  +  −  ··· )
    ( −  +  −  +  ··· )
    ( +  −  +  −  ··· )
    ( −  +  −  +  ··· )
    ( :   :   :   :   )

The entry (i, j) of this matrix is (−1)^{i+j}.

Cofactors:

For A ∈ Rn×n pick one entry aij .


• Delete the ith row and the j th column of A.
• Call the remaining matrix A(i,j) ∈ R(n−1)×(n−1) .
• The cofactor cij of aij is defined as cij := (−1)^{i+j} det(A^{(i,j)}).

Definition 4.10.
C = (cij) with cij = (−1)^{i+j} det(A^{(i,j)}) is called the cofactor matrix of A.

Using this, we find:

Proposition 4.11. Laplace’s formula

Let A ∈ R^{n×n} and choose the j-th column:

    det(A) = Σ_{i=1}^{n} aij cij = Σ_{i=1}^{n} (−1)^{i+j} aij det(A^{(i,j)}).

This is expanding det(A) along the j-th column.

If you choose the i-th row:

    det(A) = Σ_{j=1}^{n} aij cij = Σ_{j=1}^{n} (−1)^{i+j} aij det(A^{(i,j)}).

One calls this expanding det(A) along the i-th row.

To compute det(A(i,j) ), apply the same formula recursively, until you reach 2 × 2 matrices,
where the corresponding formula can be applied. The proof of this formula follows im-
mediately from the Leibniz formula and is left to the reader.

Rule of thumb: Do not forget the checkerboard matrix


 
    ( +  −  +  −  + )
    ( −  +  −  +  − )
    ( +  −  +  −  + )
    ( −  +  −  +  − )
    ( +  −  +  −  + )

Remember the signs when expanding a matrix along a column or a row.

Rule of thumb: Use the nothingness.


Since it is your choice which of the rows or the columns you want to expand along,
you should search for zeros. If you find a row or column with a lot of zeros, which
means that one has aij = 0 for some indices, you do not need to calculate det(Aij )
for these indices.

Example 4.12. Consider the matrix


 
        ( 0  2  3  4 )
        ( 2  0  0  0 )
    A = ( 1  1  0  0 ) .
        ( 6  0  1  2 )

Here, it would be useful first to expand along the second row since we find three zeros
there:

                 ( 0  2  3  4 )
                 ( 2  0  0  0 )              ( 2  3  4 )
    det(A) = det ( 1  1  0  0 ) = (−2) · det ( 1  0  0 ) = (−2) · (−1) · det ( 3  4 ; 1  2 ) = 4.
                 ( 6  0  1  2 )              ( 0  1  2 )
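Laplace's formula translates into a short recursive routine. The following is a sketch of ours
in plain Python; it always expands along the first row, so it does not exploit zeros the way
the hand computation above does.

    def det_laplace(A):
        # Recursive cofactor expansion along the first row of a square matrix
        # given as a list of rows.
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            if A[0][j] == 0:
                continue                                         # zero entries contribute nothing
            minor = [row[:j] + row[j+1:] for row in A[1:]]       # delete row 1 and column j
            total += (-1) ** j * A[0][j] * det_laplace(minor)
        return total

    A = [[0, 2, 3, 4],
         [2, 0, 0, 0],
         [1, 1, 0, 0],
         [6, 0, 1, 2]]
    print(det_laplace(A))    # 4, as computed in Example 4.12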

If C is the cofactor matrix of A ∈ R^{n×n}, then each entry of C is given by

    cij = (−1)^{i+j} det(A^{(i,j)}) = det( a1 ··· a_{j−1}  e_i  a_{j+1} ··· an ),

as one can easily see by Laplace’s formula. Now we can conclude:

Proposition 4.13.
Let A ∈ R^{n×n} and C be its cofactor matrix. Then

    C^T A = det(A) · 1_n.

In particular, if det(A) ≠ 0, then A is invertible and the inverse is given by

    A^{−1} = C^T / det(A).

Proof. This is just a matrix multiplication where we consider the (i, j)-th entry:

    (C^T A)_{ij} = Σ_{k=1}^{n} c_{ki} a_{kj} = Σ_{k=1}^{n} det( a1 ··· a_{i−1}  e_k  a_{i+1} ··· an ) a_{kj}
                 = det( a1 ··· a_{i−1}  a_j  a_{i+1} ··· an ).

This is det(A) for i = j and otherwise just zero.

4.4 Important facts and using Gauß

Proposition 4.14.
If the square matrix A ∈ R^{n×n} is in triangular form,

        ( a11  a12  ···  a1n )
        (  0   a22  ···  a2n )
    A = (  :         ⋱    :  ) ,
        (  0   ···   0   ann )

then the determinant is given by the product of the diagonal entries:

    det(A) = a11 a22 ··· ann.

Proof. Use Laplace’s formula on the first column recursively.

Proposition 4.15. Determinants for block matrices

Let A ∈ R^{n×n} and C ∈ R^{k×k} be two square matrices. For every matrix B ∈ R^{n×k}
define a so-called block matrix in triangular form

    ( A  B )
    ( 0  C )   ∈ R^{(n+k)×(n+k)},

where 0 denotes the (k × n) zero matrix. Then one has:

    det ( A  B ; 0  C ) = det(A) det(C).

Proof. Exercise! Use the Leibniz formula.



Proposition 4.16.
For each matrix A ∈ R^{n×n}, we have det(A^T) = det(A).

Proof. We can use the definition by the Leibniz formula:

    det(A) = Σ_{σ ∈ Pn} sgn(σ) Π_{i=1}^{n} a_{σ(i),i} = Σ_{σ ∈ Pn} sgn(σ) a_{σ(1),1} a_{σ(2),2} ··· a_{σ(n),n}.

For the transpose A^T we find the following:

    det(A^T) = Σ_{σ ∈ Pn} sgn(σ) Π_{i=1}^{n} a_{i,σ(i)} = Σ_{σ ∈ Pn} sgn(σ) a_{1,σ(1)} a_{2,σ(2)} ··· a_{n,σ(n)}.

We only have to show that both sums consist of the same summands.
By definition of the signum function, sgn(σ ∘ ω) = sgn(σ) sgn(ω). From this we get
sgn(σ^{−1}) = sgn(σ).
The multiplication is commutative and we can rearrange the product a_{σ(1),1} a_{σ(2),2} ··· a_{σ(n),n}.
Hence, we get

    sgn(σ) a_{σ(1),1} a_{σ(2),2} ··· a_{σ(n),n} = sgn(σ^{−1}) a_{1,σ^{−1}(1)} a_{2,σ^{−1}(2)} ··· a_{n,σ^{−1}(n)}.

Now we substitute ω for σ^{−1} and recognise that all summands are, in fact, the same. Here,
it is important that Pn is a so-called group, in which each element has exactly one inverse.
Therefore, we can sum over ω instead of σ without changing anything. In summary, we
have:

    Σ_{σ ∈ Pn} sgn(σ) a_{σ(1),1} a_{σ(2),2} ··· a_{σ(n),n} = Σ_{ω ∈ Pn} sgn(ω) a_{1,ω(1)} a_{2,ω(2)} ··· a_{n,ω(n)}.

Proposition 4.17.
For A, B ∈ R^{n×n}, we have

    det(AB) = det(A) det(B).

In particular, if A is invertible, we have

    det(A^{−1}) = 1/det(A)   and   det(A^{−1}BA) = det(B).

Proof. Denote the column vectors of A by aj and the rows of B by β_j^T. We can write the
matrix product as AB = Σ_j aj β_j^T and get:

    det(AB) = Voln( Σ_{j1} a_{j1} b_{j1,1}, ..., Σ_{jn} a_{jn} b_{jn,n} ).

Now we can use the properties (1), (2), (3) and (4) of the volume form, see Definition 4.4.
We get:

    det(AB) = Σ_{j1,...,jn} b_{j1,1} ··· b_{jn,n} Voln(a_{j1}, ..., a_{jn})
            = Σ_{σ ∈ Pn} b_{σ(1),1} ··· b_{σ(n),n} Voln(a_{σ(1)}, ..., a_{σ(n)})
            = Σ_{σ ∈ Pn} b_{σ(1),1} ··· b_{σ(n),n} sgn(σ) Voln(a1, ..., an)
            = det(A) Σ_{σ ∈ Pn} sgn(σ) b_{σ(1),1} ··· b_{σ(n),n} = det(A) det(B).

The determinant function det : Rn×n → R is therefore multiplicative. This is what we can
use for calculating determinants with the Gaussian elimination since this is nothing more
than multiplying matrices from the left.

Corollary 4.18.
Using the row operations Zi+λj (for i 6= j and λ ∈ R) do not change the determinant.
The switching of rows only changes the sign of the determinant. However, scaling
rows with a diagonal matrix D changes the determinant by the product of these
scaling factors.

Since det(A^T) = det(A), one can equivalently use column operations if you just want to
calculate the determinant. This is not recommended, however, if you actually need the row
echelon form in the end for other applications.

Rule of thumb: Using Gauß for determinants


For calculating det(A), you can add multiples of a row to another row or add mul-
tiples of a column to another column without changing the determinant. If you
exchange two rows or two columns, you simply have to change the sign. Do not
scale rows or columns since this changes the determinant.

In a formal way, we would say:


• Compute P A = LU , count the row permutations, to find either det(P ) = +1 or
det(P ) = −1
• det(A) = det(P ) det(U ).

Example 4.19. We calculate the determinant of the following 5 × 5 matrix A. The third
column already has three zeros, but we can generate a fourth zero by using one Gaussian
step: subtract the fifth row from the second one:

         ( −1   1  0  −2  0 )        ( −1   1  0  −2  0 )
         (  0   2  1  −1  4 )        (  0   4  0  −2  2 )
    A =  (  1   0  0  −3  1 )   ⇝    (  1   0  0  −3  1 )  =: B
         (  1   2  0   0  3 )        (  1   2  0   0  3 )
         (  0  −2  1   1  2 )        (  0  −2  1   1  2 )

Now we know that det(A) = det(B) holds. Having so many zeros, it is best to expand
det(B) along the third column:

                  ( −1   1  0  −2  0 )
                  (  0   4  0  −2  2 )                          ( −1   1  −2  0 )
    det(B) = det  (  1   0  0  −3  1 )  = (−1)^{3+5} · 1 · det  (  0   4  −2  2 )  =: (−1)^{3+5} · 1 · det(C)
                  (  1   2  0   0  3 )                          (  1   0  −3  1 )
                  (  0  −2  1   1  2 )                          (  1   2   0  3 )

Looking at C, we can use Gaussian elimination to get two more zeros in the second row
(add the fourth column to the third column and subtract it from the second column two
times):

         ( −1   1  −2  0 )        ( −1   1  −2  0 )
         (  0   4  −2  2 )        (  0   0   0  2 )
    C =  (  1   0  −3  1 )   ⇝    (  1  −2  −2  1 )  =: D
         (  1   2   0  3 )        (  1  −4   3  3 )

Of course, we have again det(C) = det(D). Now det(D) should be expanded along the
second row:

                  ( −1   1  −2  0 )
                  (  0   0   0  2 )                      ( −1   1  −2 )
    det(D) = det  (  1  −2  −2  1 )  = (−1)^{2+4} · 2 · det (  1  −2  −2 )  =: 2 · det(E).
                  (  1  −4   3  3 )                      (  1  −4   3 )

Now we only have a 3 × 3 matrix and use the formula of Sarrus:

    det(E) = (−1)·(−2)·3 + 1·(−2)·1 + (−2)·1·(−4)
             − 1·(−2)·(−2) − (−4)·(−2)·(−1) − 3·1·1
           = 6 − 2 + 8 − 4 + 8 − 3 = 13.

In summary:

    det(A) = det(B) = 1 · det(C) = det(D) = 2 · det(E) = 2 · 13 = 26.
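For comparison, one can let a library do the work (our own addition, assuming numpy,
which internally uses an LU decomposition as described above):

    import numpy as np

    A = np.array([[-1.,  1., 0., -2., 0.],
                  [ 0.,  2., 1., -1., 4.],
                  [ 1.,  0., 0., -3., 1.],
                  [ 1.,  2., 0.,  0., 3.],
                  [ 0., -2., 1.,  1., 2.]])
    print(np.linalg.det(A))   # 26.0 (up to round-off)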

Remark:
• det(A^{−1}) = 1/det(A) (if the inverse exists).
• If Q is an orthogonal matrix (Q^T Q = 1), then det(Q) = ±1.
• Let P be a row permutation matrix; then det(P) = 1 if the number of row exchanges
  is even, and det(P) = −1 if it is odd.
• If P A = LU, then det(A) = (1/det(P)) det(L) det(U) = det(P) det(U) = ± det(U).
• If A = S^{−1} B S, then det(A) = (1/det(S)) det(B) det(S) = det(B) (similar matrices
  have the same determinant).

Attention! Comparison: n³/3 (Gauß) vs. n! (Laplace/Leibniz formula)

    n     :  2   3    4     5     6      7       8        9        10   ···   20
    n³/3  :  2   9   21    42    72    114     171      243       333   ···   2667
    n!    :  2   6   24   120   720   5040   40320   362880   3628800   ···   2.4·10^18

4.5 Determinants for linear maps

• For each matrix A, there is the linear map fA : Rn → Rn .


• For each linear map f : Rn → Rn , there is exactly one matrix A such that f = fA .
• The columns of A are then the images of the unit cube under fA .
• Then det(A) is the relative change of volume (of the unit cube) caused by fA .

Definition 4.20. Determinant for fA


For a linear map f : Rn → Rn , we define

det(f ) := det(A)

where A is the uniquely determined matrix with f = fA .

In fact det(f ) is the relative change of all volumes and we remind that we have the
following:

Let A, B ∈ Rn×n . We have the formula:

det(fA ◦ fB ) = det(fA ) det(fB )

In general, det(A) = det(fA ) describes the change of volume for every figure:

    det(A) = factor for the change of volume under fA

(Picture: a figure F is mapped by fA to the figure F' = fA(F); the volume of F' is the
volume of F multiplied by det(A).)

(Picture: the composition fAB = fA ∘ fB. Applying fB scales areas by det(B), then fA
scales them by det(A); in total, areas are scaled by det(AB) = det(A) det(B).)

4.6 Determinants and systems of equations


Simple reasoning: if det(A) = 0, then A is not invertible, and vice versa. A matrix with
det(A) = 0 is called singular. Thus, det(A) can be used to check, if a linear system
Ax = b always has a unique solution (for right-hand sides b).
This is helpful, if A depends on a parameter e.g. A(λ).

Example 4.21.

           ( λ  1  2 )
    A(λ) = ( 1  2  3 ) ,     det(A(λ)) = λ(4 − 3) − 1(2 − 2) + 1(3 − 4) = λ − 1.
           ( 1  1  2 )

This matrix is singular if and only if λ = 1, and indeed, for λ = 1, we have for the
column vectors a1(λ) + a2 = a3.

Conclusion: singular matrices do not appear very often. Whatever this means.
Warning: this is only good for pen-and-paper computations. In numerical computations,
det(A + round off) says nothing about invertibility of A, only about change of volume:
 
    det ( ε  0 ; 0  1/ε ) = 1.

We summarise our knowledge:

Proposition 4.22. Nonsingular matrices and LES


Let A ∈ Rn×n . Then the following is equivalent
(i) det(A) 6= 0,
(ii) the columns of A are linearly independent,
(iii) the rows of A are linearly independent,
(iv) rank(A) = n,
(v) A is invertible,

(vi) Ax = b has a unique solution for every b ∈ Rn ,

(vii) Ker(A) = {o}.

Proof. Exercise!

4.7 Cramer’s rule


Consider the linear system of equations, with full rank matrix A:

Ax = b .

Then by our formula for the inverse we get:

    Ax = b   ⇒   x = A^{−1} b = C^T b / det(A).

This lets us say the following about the components of a solution:

Proposition 4.23. Cramer’s rule

Let A ∈ R^{n×n} be invertible and b ∈ R^n. Then the unique solution x = (x1, ..., xn)^T ∈ R^n
of the LES Ax = b is given by

    xi = det( a1 ··· a_{i−1}  b  a_{i+1} ··· an ) / det( a1 ··· a_{i−1}  ai  a_{i+1} ··· an )     for i = 1, ..., n,

where the matrix in the denominator is just A.

Proof. Having the cofactor matrix C, we already know that the solution is given by

    x = A^{−1} b = C^T b / det(A).

Therefore, we just have to look at the i-th entry of the vector C^T b, which is given by:

    (C^T b)_i = Σ_{k=1}^{n} c_{ki} b_k = Σ_{k=1}^{n} det( a1 ··· a_{i−1}  e_k  a_{i+1} ··· an ) b_k
              = det( a1 ··· a_{i−1}  b  a_{i+1} ··· an ).


Attention! Do not use Cramer’s rule to solve a system Ax=b!


Cramer’s rule is less efficient than Gaussian elimination. That is noticeable for
large matrices.

If you calculate the determinants by Laplace, then your work is of order n!. If you use
Gaussian elimination for calculating the determinants, you only need about n³/3 steps for
each component of x, but of course, in this case, you could have solved the whole system
Ax = b by Gaussian elimination in the first place.
For computational reasons, Cramer’s rule can only be used for small matrices; its real
advantage is of theoretical interest. You can use Cramer’s rule in proofs if you need
claims about a single component xi of the solution x.
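For completeness, here is how Cramer's rule would look in code. This sketch is our own
addition and assumes numpy; as just explained, use it for insight, not as a solver.

    import numpy as np

    def cramer(A, b):
        # Solve A x = b for an invertible square A via Cramer's rule.
        det_A = np.linalg.det(A)
        n = len(b)
        x = np.empty(n)
        for i in range(n):
            Ai = A.copy()
            Ai[:, i] = b                 # replace the i-th column by b
            x[i] = np.linalg.det(Ai) / det_A
        return x

    A = np.array([[2., 3., -1.], [2., -1., 7.], [6., 13., -4.]])
    b = np.array([4., 0., 9.])
    print(cramer(A, b))                  # [ 3. -1. -1.], as in Example 3.50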

Summary
• The determinant is the volume form.
• The determinant fulfils three defining properties:
(1) Linear in each column.
(2) Alternating when exchanging columns.
(3) The identity matrix has determinant 1.
• To calculate a determinant, you have the Leibniz formula, the Laplace expansion or
Gaussian elimination (without scaling!).
5 General inner products, orthogonality and distances
Sadly, very little school maths focuses on how to win free drinks in a
pub.
Matt Parker

We have already encountered the standard inner product (also called Euclidean scalar
product) in R^n:

    ⟨x, y⟩ = x^T y = Σ_{i=1}^{n} xi yi,    for x, y ∈ R^n.

With the help of this inner product, we are able to define and compute many useful things:
• length: ‖x‖ = √⟨x, x⟩
• distances: dist(x, y) = ‖x − y‖
• angle: cos(∠(x, y)) := ⟨x, y⟩ / (‖x‖‖y‖)
• orthogonality: x ⊥ y :⇔ ⟨x, y⟩ = 0
• orthogonal projections, e.g. the height
• rotations about an axis by an angle
• reflections at a hyperplane

5.1 General inner products in Rn


The following properties capture everything that is needed to measure angles and lengths
in a useful way:

Definition 5.1. Inner product


Let V be Rn or a subspace of Rn . We call a map of two arguments h·, ·i : V ×V → R
an inner product if it satisfies for all x, y, v ∈ V and λ ∈ R:

(S1) Positive definiteness:


hx, xi > 0 for x 6= o


(S2) Additivity in the first argument:

hx + y, vi = hx, vi + hy, vi

(S3) Homogenity in the first argument:

hλx, vi = λhx, vi

(S4) Symmetry:
hx, yi = hy, xi

We usually summarise (S2) and (S3) to linearity in the first argument. Note that from
(S3) always follows ho, oi = 0 · ho, oi = 0. In combination with (S1), we get:

hx, xi = 0 ⇔ x = o . (5.1)

Also, due to positive definiteness, we can define a norm (or length) via

    ‖x‖ := √⟨x, x⟩.

We call it the associated norm with respect to ⟨·, ·⟩.

• Inner products are also linear in the second argument, by symmetry.


• Later, we will define complex-valued inner products that fulfil instead of (S4):

hx, yi = hy, xi . (5.2)

Then the second argument actually has different properties than the first.
• In the usual real case, the binomial formulas hold:

kx ± yk2 = kxk2 ± 2hx, yi + kyk2


hx + y, x − yi = kxk2 − kyk2 .

Example 5.2. The standard inner product on R^n:

    ⟨x, y⟩_euklid := x^T y = Σ_{i=1}^{n} xi yi.

In contrast to Section 2.5, we now use a subscript to denote this special inner product. If
there is no confusion which inner product we use, we can omit the index.

Remark:
Due to its simplicity, this inner product is prominent in theory and practice. How-
ever, in particular for very large scale problems with special structure other “specially
tailored” inner products play a major role.

Definition 5.3. Positive definite matrix


A matrix A ∈ Rn×n is called positive definite if it is symmetric (AT = A) and
satisfies
hx, Axieuklid = xT Ax > 0
for all x 6= o.

Example 5.4. Each diagonal matrix D ∈ Rn×n with positive entries on the diagonal is
a positive definite matrix.

Let A ∈ Rn×n be a positive definite matrix. Then the following defines an inner product
on Rn :
hx, yiA := hx, Ayieuklid = xT Ay

It is linear in the first argument, by the linearity of the matrix- (row) vector product,
symmetric by symmetry of A (aij = aji , or A = AT ) and positive definite by positive
definiteness of A.
The simplest case is A = 1, so hx, yi1 = xT y is the standard euclidean product.
A simple case, where A = D is a diagonal matrix makes sense, if R3 corresponds to spatial
coordinates, given in different units (say inch and centimeters).
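A small numerical illustration of such a "specially tailored" inner product with a diagonal,
positive definite A (our own addition, assuming numpy):

    import numpy as np

    D = np.diag([1.0, 4.0, 9.0])         # positive diagonal entries => positive definite

    def inner_D(x, y):
        # <x, y>_D = x^T D y
        return x @ D @ y

    x = np.array([1.0, -1.0, 2.0])
    y = np.array([0.5,  2.0, 1.0])

    print(inner_D(x, y), inner_D(y, x))  # symmetric: both give the same value
    print(inner_D(x, x) > 0)             # True: positive definiteness for x != o
    print(np.sqrt(inner_D(x, x)))        # the associated norm ||x||_D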
Our abstract assumptions already yield all the useful formulas, known from our standard
inner product:

Proposition 5.5.
Let h·, ·i be an inner product on a subspace V ⊂ Rn and k · k its associated norm.
Then for all x, y ∈ V and λ ∈ R, we have:

(a) |hx, yi| ≤ kxkkyk (Cauchy-Schwarz inequality). Equality holds if and only if x
and y are colinear (written as x k y).

(b) The norm fulfils three properties:


(N1) kxk ≥ 0, and kxk = 0 ⇔ x = o,
(N2) kλxk = |λ| kxk,
(N3) kx + yk ≤ kxk + kyk.

Proof. We show the Cauchy-Schwarz inequality (CSI) in a short proof. Let y 6= o,


otherwise the CSI reads 0 = 0.
For any λ ∈ R the binomial formula yields:

0 ≤ kx − λyk2 kyk2 = kxk2 kyk2 − 2λhx, yikyk2 + λ2 kyk4 .

(This is zero, if y = o, or x = λy, i.e. x and y are colinear). Setting λ = hx, yi/kyk2 ,
we obtain
0 ≤ kxk2 kyk2 − 2hx, yi2 + hx, yi2 = kxk2 kyk2 − hx, yi2 .
The norm properties are left as an exercise.

No matter which inner product we are using, we can define orthogonality as follows:

x⊥y :⇔ hx, yi = 0.

A norm for matrices

Once we can measure the size of a vector v by a norm ‖v‖, we may think about measuring
the "size" of a linear map. Consider A ∈ R^{m×n} and w = Av. Then the following quotient

    ‖w‖_{R^m} / ‖v‖_{R^n} = ‖Av‖_{R^m} / ‖v‖_{R^n}

tells us how much longer (or shorter) w = Av is, compared to v. A should be "large"
if it produces long vectors from short ones, and "small" if it produces short vectors from
long ones. Thus, we may define

    ‖A‖ := max_{v ≠ 0} ‖Av‖_{R^m} / ‖v‖_{R^n},

and call it the matrix norm of A. Hence, we have:

    ‖w‖_{R^m} = ‖Av‖_{R^m} ≤ ‖A‖ ‖v‖_{R^n}.

It is not easy to compute this norm. We will consider a possibility later.
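For the Euclidean norm one can at least approximate this matrix norm by testing many
vectors; numpy's np.linalg.norm(A, 2) returns it exactly (as the largest singular value). The
comparison below is our own illustration, not part of the notes.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[3.0, 1.0], [0.0, 2.0]])

    # crude estimate: maximise ||Av|| / ||v|| over many random directions
    ratios = []
    for _ in range(10000):
        v = rng.standard_normal(2)
        ratios.append(np.linalg.norm(A @ v) / np.linalg.norm(v))

    print(max(ratios))                 # close to, but never above, the true value
    print(np.linalg.norm(A, 2))        # the matrix (spectral) norm of A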

5.2 Orthogonal projections


In this section h·, ·i denotes an arbitrary inner product in Rn .

5.2.1 Orthogonal projection onto a line

Imagine you ride a rowboat on a river. You want to go in a direction r 6= o. However


water flows in direction x, which is not parallel to r.

The steersman asks:


What is the component of x with respect to the wanted direction r?

In mathematical language: Write U = Span{r} and decompose the vector x into a linear
combination

    x = p + n

consisting of two orthogonal vectors: p is parallel to the wanted direction r and n is
orthogonal to it.

(Picture: the direction of the course r, the flow of the water x, and the orthogonal
projection p of x onto the line U = Span{r}.)

Definition 5.6. Orthogonal projection onto a line


Let h·, ·i be an inner product in Rn and U := Span(r) for r 6= o. For a decomposition
x = p + n for a vector x ∈ Rn into two orthogonal vectors p ∈ U and n ⊥ r, we
call p the orthogonal projection of x onto U , and n is called the normal component
of x with respect to U .

The orthogonal projection and the normal component are indeed well-defined: if there are
two decompositions x = p + n = p' + n' with p, p' ∈ U and n, n' ∈ U⊥, then we can use
the subspace properties to conclude n − n' ∈ U⊥ and p − p' ∈ U. Applying the inner
product to p − p' = n' − n, we conclude ‖p − p'‖ = 0 and ‖n' − n‖ = 0. From the
norm properties, we get n = n' and p = p'.
Calculation of p and n: Because of p ∈ U = Span(r), we have p = λr for a λ ∈ R,
which we simply have to find. Since x = p + n = λr + n and n ⊥ r, we get:

    ⟨x, r⟩ = ⟨λr + n, r⟩ = ⟨λr, r⟩ + ⟨n, r⟩ = λ⟨r, r⟩     and therefore     λ = ⟨x, r⟩ / ⟨r, r⟩.

The case r = o (i.e. p = o and n = x) is omitted here. In summary, we get:

Proposition 5.7. Orthogonal projection and normal component w.r.t. a line
Let x, r ∈ R^n with r ≠ o. For the orthogonal projection p of x onto U = Span(r)
and the associated normal component n of x w.r.t. U, one finds:

    p = (⟨x, r⟩ / ⟨r, r⟩) r     and     n = x − p = x − (⟨x, r⟩ / ⟨r, r⟩) r.
Proof. Obviously, p ∈ U and calculation shows n ∈ U ⊥ .
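In code, this projection is one line of arithmetic. The sketch below is our own addition
(assuming numpy) and uses the standard inner product:

    import numpy as np

    def project_onto_line(x, r):
        # Orthogonal projection p of x onto Span(r) and normal component n,
        # with respect to the standard inner product.
        lam = (x @ r) / (r @ r)
        p = lam * r
        return p, x - p

    x = np.array([2.0, 3.0])
    r = np.array([1.0, 1.0])
    p, n = project_onto_line(x, r)
    print(p, n, p @ n)        # p @ n is 0 up to round-off: n is orthogonal to p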

Rule of thumb: k · k gives length and h·, ·i gives an angle


Geometrically kxk is seen as a length of the vector x. The inner product hx, yi
gives back the angle between x and y.

To define a meaningful angle between vectors, we again look at the triangle given by the
vectors x, p and n. It is right-angled since p ⊥ n is our definition of 90 degrees. The
angle between x and r is called α in the picture.
If α is an acute angle, i.e. α ∈ [0, π/2], then λ ≥ 0 and:

    cos(α)‖x‖ = ‖p‖ = ‖λr‖ = |λ|‖r‖ = λ‖r‖ = (⟨x, r⟩ / ⟨r, r⟩) ‖r‖ = ⟨x, r⟩ / ‖r‖.

We reformulate this:

    ⟨x, r⟩ = ‖x‖‖r‖ cos(α).



If α is not acute, we can do an analogue calculation. In summary, we can give the following
definition for an angle:

Definition 5.8. Angle between two vectors in Rn


For two vectors x, y ∈ Rn \ {o} we write angle(x, y) for the angle α ∈ [0, π] between
x and y, which is defined by

hx, yi
cos(α) = . (5.3)
kxkkyk

Using Proposition 5.5 (Cauchy-Schwarz inequality), we conclude that the angle is well-
defined:

    |⟨x, y⟩| / (‖x‖‖y‖) ≤ 1     and hence     −1 ≤ ⟨x, y⟩ / (‖x‖‖y‖) ≤ 1.

This means the right-hand side of (5.3) is indeed in the range of the cos function. Restricted
to α ∈ [0, π], we know that cos is bijective, and hence α is well-defined by equation (5.3).

Example 5.9. Consider the cube C in R^3 with centre M in the origin and the corners
(±1, ±1, ±1)^T, where all combinations of ±-signs occur.
All diagonals of C go through M and intersect with an angle α, which is calculated with
the vectors x = (1, 1, 1)^T and y = (−1, 1, 1)^T:

    cos(α) = ⟨x, y⟩_euklid / (‖x‖‖y‖) = (−1 + 1 + 1) / (√(1+1+1) √(1+1+1)) = 1/3,

which implies α = arccos(1/3) ≈ 70.53°.

5.2.2 Orthogonal projection onto a subspace


Before we projected a vector onto a line, which is just a 1-dimensional subspace of Rn .
Now we generalise this procedure for arbitrary subspaces of Rn . In order to do this, we
recall the concept of orthogonal complements:

Definition 5.10. Orthogonal complement M ⊥


Let M ⊂ Rn be nonempty. Then we call

M ⊥ := {x ∈ Rn : hx, mi = 0 for all m ∈ M }

the orthogonal complement for M . Instead of x ∈ M ⊥ , we often write x ⊥ M .

Example 5.11. Consider h·, ·ieuklid the standard inner product in Rn .


(a) For M = {o} in Rn , we have M ⊥ = Rn .

(b) For M = {e1 } in R2 , we have M ⊥ = Span(e2 ) ⊂ R2 .


(c) For M = {3e1 } in R2 , we have M ⊥ = Span(e2 ) ⊂ R2 .
(d) For M = {e1 , 3e1 } in R2 , we have M ⊥ = Span(e2 ) ⊂ R2 .
(e) For M = Span(e1 ) in R2 , we have M ⊥ = Span(e2 ) ⊂ R2 .
(f) For M = {e1 , e2 } or M = Span(e1 , e2 ) in R3 , we have M ⊥ = Span(e3 ) ⊂ R3 .
(g) For n ∈ R3 \ {o}, we have {n}⊥ the plane R3 through 0 with normal vector n.
(h) For n ∈ R3 \ {o} and p ∈ R3 , we have p + {n}⊥ , the plane R3 through p with normal
vector n (this is an affine space {x ∈ R3 : hx − p, ni = 0}).
(i) For M = Span(e1 , e2 , e5 ) in R5 , we have M ⊥ = Span(e3 , e4 ) ⊂ R5 .

Proposition 5.12.
For all nonempty sets M ⊂ Rn we have:

(a) M ⊥ = (Span(M ))⊥ ,

(b) M ⊥ is a linear subspace of Rn .

Proof. Exercise!

We state one important property of the orthogonal complement; other important ones can
be found at the end of this section.

Proposition 5.13. Properties of U ⊥


For a linear subspace U ⊂ Rn , we have U ∩ U ⊥ = {o}.

Proof. For x ∈ U ∩ U⊥, we have x ⊥ x, i.e. ⟨x, x⟩ = 0. Using equation (5.1), we get
x = o.

Proposition 5.14. Orthogonal to a basis


Let U be a linear subspace of Rn and B = (u1 , . . . , uk ) a basis of U . Then for all
x ∈ Rn we have:
x⊥U ⇐⇒ x ⊥ B.
In other words: x is orthogonal to all vectors in U if and only if it is orthogonal to
the basis vectors of U .

Proof. ⇒ If x ⊥ u holds for all u ∈ U , then, of course, also for all basis elements
u = ui ∈ B ⊂ U .
⇐ If hx, ui i = 0 holds for i = 1, . . . , k and we have u = α1 u1 + . . . + αk uk ∈ U , then
hx, ui = hx, α1 u1 + . . . + αk uk i = α1 hx, u1 i + . . . + αk hx, uk i = α1 0 + . . . + αk 0 = 0.

Definition 5.15. Orthogonal projection onto a subspace U


Let U be a linear subspace of Rn and let x ∈ Rn . Again, we search for a decomposition

\[
p \in U \quad\text{and}\quad n \perp U \quad\text{with}\quad x = p + n.
\]

In other words, we write x as a sum of two vectors, where one lies in U and the other
one is orthogonal to U .

The (uniquely determined) vector p is called the orthogonal projection of x onto U ,
and n is called the normal component of x w.r.t. U .
For the orthogonal projection p of x onto U , we often simply write x_U .
In this notation, the decomposition x = p + n becomes:

\[
x = x_U + x_{U^\perp}, \quad\text{i.e.}\quad p = x_U \ \text{ and }\ n = x_{U^\perp}.
\]

Calculation of the orthogonal projection x_U : It works exactly the same as in the
one-dimensional case given in Section 5.2.1. The only difference is that U is spanned
by k ≥ 1 vectors. Hence, choose a basis B = (u1 , . . . , uk ) of U . Then we can rewrite
p = x_U ∈ U as x_U = α1 u1 + . . . + αk uk with coefficients α1 , . . . , αk ∈ R, which we now
have to determine. For n = x_{U ⊥} , we only need the information n ⊥ U .
To find α1 , . . . , αk we just consider the inner product of

\[
x = x_U + n = (\alpha_1 u_1 + \ldots + \alpha_k u_k) + n
\]

with respect to all k basis vectors of U : For i = 1, . . . , k, we have

\[
\langle x, u_i\rangle = \langle \alpha_1 u_1 + \ldots + \alpha_k u_k + n,\, u_i\rangle
= \alpha_1\langle u_1, u_i\rangle + \ldots + \alpha_k\langle u_k, u_i\rangle + \underbrace{\langle n, u_i\rangle}_{=\,0}. \tag{5.4}
\]

Now we have k equations and k unknowns α1 , . . . , αk :

Proposition 5.16. Calculating the projection x U


Let x ∈ Rn and U be a linear subspace of Rn where B = (u1 , . . . , uk ) is a basis of
U . Then we get the orthogonal projection

x U = α1 u1 + . . . + αk uk ,

where α1 , . . . , αk are given by the (unique) solution of the LES:


    
\[
\begin{pmatrix}
\langle u_1,u_1\rangle & \langle u_2,u_1\rangle & \cdots & \langle u_k,u_1\rangle\\
\langle u_1,u_2\rangle & \langle u_2,u_2\rangle & \cdots & \langle u_k,u_2\rangle\\
\vdots & \vdots & & \vdots\\
\langle u_1,u_k\rangle & \langle u_2,u_k\rangle & \cdots & \langle u_k,u_k\rangle
\end{pmatrix}
\begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_k\end{pmatrix}
=
\begin{pmatrix}\langle x,u_1\rangle\\ \langle x,u_2\rangle\\ \vdots\\ \langle x,u_k\rangle\end{pmatrix}. \tag{5.5}
\]

The (k × k) matrix on the left-hand side is called the Gramian matrix G(B). The
normal component n = x U ⊥ is then given by n = x − x U .

Proof. The result for αi follows from equation (5.4). Now we show that the Gramian
matrix G(B) =: G is invertible. This means that we have to show Ker(G) = {o}. Let
(β1 , . . . , βk )T ∈ Ker(G). Then for all i = 1, . . . , k, we have:
0 = β1 hu1 , ui i + . . . + βk huk , ui i = hβ1 u1 + . . . + βk uk , ui i.
Hence, u := β1 u1 + . . . + βk uk is orthogonal to all ui . Using Proposition 5.14, we get u ∈
U ⊥ . Per construction, we get u ∈ U , which means u ∈ U ∩ U ⊥ . Using Proposition 5.13,
we conclude u = o. The family (u1 , . . . , uk ) is linearly independent since it is a basis. So
u = o implies β1 = . . . = βk = 0.
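The following NumPy sketch implements Proposition 5.16 directly; the basis vectors are assumed to be the columns of a matrix B, and the helper name project_onto is ours.

import numpy as np

def project_onto(x, B):
    """Orthogonal projection of x onto the span of the columns of B (Proposition 5.16).

    Returns the projection p = x_U and the normal component n = x - x_U."""
    G = B.T @ B                   # Gramian matrix G(B), entries <u_j, u_i>
    rhs = B.T @ x                 # right-hand side <x, u_i>
    alpha = np.linalg.solve(G, rhs)
    p = B @ alpha
    return p, x - p

# Example: project x onto the plane spanned by u1 and u2 in R^3.
B = np.column_stack([[1.0, 1.0, 0.0], [2.0, 0.0, 2.0]])
x = np.array([3.0, 1.0, 2.0])
p, n = project_onto(x, B)
print(p, n, B.T @ n)              # B.T @ n is (numerically) zero: n is orthogonal to U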

Proposition 5.17. Approximation formula


Let x ∈ Rn and U be a linear subspace of Rn . The orthogonal projection x_U minimises
the distance between x and the subspace U :

\[
\lVert x - x_U\rVert = \min_{u\in U}\,\lVert x - u\rVert =: \operatorname{dist}(x, U).
\]

In other words: no other vector of U is as close to x as x_U .

Proof. For all u ∈ U , we get

\[
\lVert x-u\rVert^2
= \lVert \underbrace{(x - x_U)}_{=\,n} + \underbrace{(x_U - u)}_{=:\,v}\rVert^2
= \langle n+v,\, n+v\rangle
= \underbrace{\langle n,n\rangle}_{\lVert n\rVert^2} + 2\underbrace{\langle n,v\rangle}_{0} + \underbrace{\langle v,v\rangle}_{\ge 0}
\;\ge\; \lVert n\rVert^2,
\]

and, hence, \(\lVert x-u\rVert \ge \lVert n\rVert = \lVert x - x_U\rVert\). Equality holds if and only if v = o, i.e.
u = x_U .

Proposition 5.18.
For all nonempty sets M ⊂ Rn we have:

(a) Rn = Span(M ) + M ⊥ and Span(M ) ∩ M ⊥ = {o},

(b) (M ⊥ )⊥ = Span(M ).

Proof. (a): By Proposition 5.16, we know that for each x ∈ Rn there is a decompos-
ition x = p + n with p ∈ Span(M ) and n ∈ (Span(M ))⊥ . Now we just have to use
Proposition 5.12 and Proposition 5.13.
(b): Exercise! (Use part (a))

Corollary 5.19. Properties of U ⊥


For a linear subspace U ⊂ Rn , we have:

(a) Rn = U + U ⊥ and U ∩ U ⊥ = {o}. Usually, one writes in this case

Rn = U ⊕ U ⊥ ,

and calls it direct sum or, more correctly, orthogonal sum of two subspaces.

(b) dim(U ⊥ ) = dim(Rn ) − dim(U ).

(c) (U ⊥ )⊥ = U .

5.3 Orthonormal systems and bases


For some applications it is very useful to have a set of vectors {u1 , . . . , uk } ⊂ Rn which are
mutually orthogonal,

\[
i \ne j \;\Rightarrow\; u_i \perp u_j \;\Leftrightarrow\; \langle u_i, u_j\rangle = 0,
\]

and have unit norm:

\[
\lVert u_i\rVert = \sqrt{\langle u_i, u_i\rangle} = 1.
\]

Using the Kronecker symbol

\[
\delta_{ij} = \begin{cases} 1 & : i = j\\ 0 & : i \ne j\end{cases}
\]

we may write this in short:

\[
\langle u_i, u_j\rangle = \delta_{ij}.
\]

Definition 5.20. OS, ONS, OB, ONB


Let U be a linear subspace of Rn . A family F = (u1 , . . . , uk ) consisting of vectors
from U is called:

• Orthogonal system (OS) if the vectors in F are mutually orthogonal:


hui , uj i = 0 for all i, j ∈ {1, . . . , k} with i 6= j;
• Orthonormal system (ONS) if hui , uj i = δij for all i, j ∈ {1, . . . , k};
• Orthogonal basis (OB) if it is an OS and a basis of U ;
• Orthonormal basis (ONB) if it is an ONS and a basis of U .

If F is an ONB, then the Gramian matrix G(F) is the identity matrix and projections are
very easy to calculate.

Example 5.21. Let h·, ·i = h·, ·ieukl the standard inner product.
(a) The canonical unit vectors
e1 = (1, 0, . . . , 0)T , e2 = (0, 1, 0, . . . , 0)T , ..., en = (0, . . . , 0, 1)T
in Rn define an ONB for U = Rn .
(b) The family F = (u1 , u2 , u3 ) given by
u1 = (1, 0, 1)T , u2 = (1, 0, −1)T , u3 = (0, 1, 0)T
defines an OB of R3 . We show this: We immediately have hu1 , u3 i = 0 and hu2 , u3 i =
0. Moreover, we find
\[
\langle u_1, u_2\rangle = \Big\langle \begin{pmatrix}1\\0\\1\end{pmatrix}, \begin{pmatrix}1\\0\\-1\end{pmatrix}\Big\rangle = 1 + 0 - 1 = 0.
\]

Hence, F is an OS. It remains to show that F is also a basis for R3 . Since dim(R3 ) = 3
and F consists of three linearly independent vectors, we are finished. For showing the
linear independence, the next Proposition 5.22 will always be helpful.
(c) Normalising the vectors from (b), we obtain an ONB ( √12 u1 , √12 u2 , u3 ).

Proposition 5.22. An OS is linearly independent.


Let F = (u1 , . . . , uk ) be an OS in Rn with ui 6= o for i = 1, . . . , k. Then F is
linearly independent.

Proof. Let F be an OS. To show the linear independence of F, we only have to show that
α1 u1 + . . . + αk uk = o always implies α1 = . . . = αk = 0. Using the inner product for ui
with i = 1, . . . , k, we get:

\[
\begin{aligned}
0 = \langle o, u_i\rangle &= \langle \alpha_1 u_1 + \ldots + \alpha_k u_k,\, u_i\rangle\\
&= \alpha_1\underbrace{\langle u_1,u_i\rangle}_{0} + \ldots + \alpha_{i-1}\underbrace{\langle u_{i-1},u_i\rangle}_{0}
+ \alpha_i\underbrace{\langle u_i,u_i\rangle}_{\lVert u_i\rVert^2}
+ \alpha_{i+1}\underbrace{\langle u_{i+1},u_i\rangle}_{0} + \ldots + \alpha_k\underbrace{\langle u_k,u_i\rangle}_{0}\\
&= \alpha_i\,\lVert u_i\rVert^2 .
\end{aligned}
\]

Since kui k2 6= 0, the only possibility is αi = 0 and this holds for all i = 1, . . . , k.

Now we can show how easy it is to calculate Gramian matrices with a basis that is
orthogonal.

Proposition 5.23. Gramian matrix for OB and ONB


The Gramian matrix G(B) for an OB B = (u1 , . . . , uk ) is a diagonal matrix:

\[
G(B) = \begin{pmatrix}
\langle u_1,u_1\rangle & \langle u_2,u_1\rangle & \cdots & \langle u_k,u_1\rangle\\
\langle u_1,u_2\rangle & \langle u_2,u_2\rangle & \cdots & \langle u_k,u_2\rangle\\
\vdots & \vdots & & \vdots\\
\langle u_1,u_k\rangle & \langle u_2,u_k\rangle & \cdots & \langle u_k,u_k\rangle
\end{pmatrix}
= \begin{pmatrix}
\lVert u_1\rVert^2 & & & 0\\
& \lVert u_2\rVert^2 & &\\
& & \ddots &\\
0 & & & \lVert u_k\rVert^2
\end{pmatrix}.
\]

If B actually is an ONB, then we have G(B) = 1.

The orthogonal projection x_U for a vector x ∈ Rn onto the linear subspace U = Span(B)
is then given by the coefficients

\[
\alpha_1 = \frac{\langle x,u_1\rangle}{\lVert u_1\rVert^2},\quad
\alpha_2 = \frac{\langle x,u_2\rangle}{\lVert u_2\rVert^2},\quad \ldots,\quad
\alpha_k = \frac{\langle x,u_k\rangle}{\lVert u_k\rVert^2}
\]

for equation (5.5). We get:

\[
x_U = \frac{\langle x,u_1\rangle}{\lVert u_1\rVert^2}\,u_1 + \ldots + \frac{\langle x,u_k\rangle}{\lVert u_k\rVert^2}\,u_k
\qquad\text{and}\qquad x_{U^\perp} = x - x_U .
\]

If B is even an ONB, then all the denominators \(\lVert u_i\rVert^2\) are equal to 1.



Even if one is not interested in the projection, this can be helpful for calculating the
coefficients for the linear combination.

Corollary 5.24. Fourier expansion w.r.t. an OB or ONB


Let U be a linear subspace of Rn and B = (u1 , . . . , uk ) an OB of U . Then the unique
linear combination for a vector x ∈ U with respect to B is given by:

\[
x = \alpha_1 u_1 + \ldots + \alpha_k u_k \quad\text{with}\quad
\alpha_i = \frac{\langle x, u_i\rangle}{\lVert u_i\rVert^2} \ \text{ for all } i \in \{1,\ldots,k\}. \tag{5.6}
\]

This formula is called the Fourier expansion of x with respect to B, and the numbers
αi are called the associated Fourier coefficients. If B even is an ONB, then

αi = hx, ui i for all i = 1, . . . , k.

Note that in the case U = Rn , we simply set k = n.

Looking at the formula for n = x_{U ⊥} from Proposition 5.23, one recognises a general
principle for constructing orthogonal vectors. We summarise this in the following algorithm.

Remark: Gram-Schmidt orthonormalisation


Let U be a linear subspace of Rn and (u1 , . . . , uk ) a basis of U . The following procedure
will give us an ONB (w1 , . . . , wk ) for U .

(1) Normalise the first vector:
\[
w_1 := \frac{1}{\lVert u_1\rVert}\, u_1 .
\]

(2) Choose the normal component of u2 with respect to Span(w1 ),
\[
v_2 := u_2 - \underbrace{\langle u_2, w_1\rangle w_1}_{(u_2)_{\operatorname{Span}(w_1)}},
\quad\text{and normalise it:}\quad w_2 := \frac{1}{\lVert v_2\rVert}\, v_2 .
\]

(3) Choose the normal component of u3 with respect to Span(w1 , w2 ),
\[
v_3 := u_3 - \underbrace{\big(\langle u_3, w_1\rangle w_1 + \langle u_3, w_2\rangle w_2\big)}_{(u_3)_{\operatorname{Span}(w_1,w_2)}},
\quad\text{and normalise it:}\quad w_3 := \frac{1}{\lVert v_3\rVert}\, v_3 .
\]

⋮

(k) In the last step choose the normal component of uk w.r.t. Span(w1 , . . . , wk−1 ),
\[
v_k := u_k - \underbrace{\sum_{i=1}^{k-1} \langle u_k, w_i\rangle w_i}_{(u_k)_{\operatorname{Span}(w_1,\ldots,w_{k-1})}},
\quad\text{and normalise it:}\quad w_k := \frac{1}{\lVert v_k\rVert}\, v_k .
\]

Example 5.25. Let u1 = (1, 1, 0)T and u2 = (2, 0, 2)T be two vectors in R3 and U =
Span(u1 , u2 ) the spanned plane. We calculate an ONB (w1 , w2 ) for U . The first vector is
\[
w_1 := \frac{1}{\lVert u_1\rVert}\, u_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}.
\]

For the second vector, we first need to calculate:

\[
v_2 := u_2 - \langle u_2, w_1\rangle w_1
= \begin{pmatrix}2\\0\\2\end{pmatrix}
- \Big\langle \begin{pmatrix}2\\0\\2\end{pmatrix}, \tfrac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}\Big\rangle\, \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}
= \begin{pmatrix}2\\0\\2\end{pmatrix} - \begin{pmatrix}1\\1\\0\end{pmatrix}
= \begin{pmatrix}1\\-1\\2\end{pmatrix}.
\]

Then v2 is getting normalised:

\[
w_2 := \frac{1}{\lVert v_2\rVert}\, v_2 = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\-1\\2\end{pmatrix}.
\]

Now we have kw1 k = 1 = kw2 k and hw1 , w2 i = 0 and also Span(w1 , w2 ) = U =


Span(u1 , u2 ).
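A short NumPy sketch of the classical Gram-Schmidt procedure from the remark above; the function name gram_schmidt is ours, and for serious numerical work one would rather use a QR routine (see Section 5.5).

import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: turn a list of linearly independent vectors
    into an orthonormal system spanning the same space."""
    ons = []
    for u in vectors:
        v = u.astype(float)
        for w in ons:
            v = v - (u @ w) * w        # subtract the component along each previous w
        ons.append(v / np.linalg.norm(v))
    return ons

u1 = np.array([1.0, 1.0, 0.0])
u2 = np.array([2.0, 0.0, 2.0])
w1, w2 = gram_schmidt([u1, u2])
print(w1, w2, w1 @ w2)                 # w1 and w2 have norm 1 and w1 ⊥ w2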

We recall Corollary 5.24: Why are such ONB helpful? Usually, if we want to write a
vector v as a linear combination of basis vectors B = (b1 , . . . , bk ), we have to solve a
linear system:
\[
v = \sum_{i=1}^{k} \lambda_i b_i .
\]

If we have an orthonormal basis B = (u1 , . . . , uk ), then we can dispense with this. We
can simply calculate:

\[
\langle v, u_i\rangle = \Big\langle \sum_{j} \lambda_j u_j ,\; u_i\Big\rangle = \lambda_i \langle u_i, u_i\rangle = \lambda_i .
\]

Thus, each coefficient of the linear combination results from a simple inner product.

Remark: Outlook
It is this principle that the so-called Fourier transform is built on: it decomposes a
signal v(t) into frequencies ui (t) = sin(ωi t). This is, however, a problem formulated
in a more abstract vector space.

5.4 Orthogonal matrices


Let us now restrict our attention to the standard inner product

hx, yi = hx, yieuklid = xT y,

and write down our results from above in terms of matrices.



Let B = (u1 , . . . , un ) be a basis for Rn . Then each x ∈ Rn can be uniquely written as:
      
\[
x = \alpha_1 u_1 + \ldots + \alpha_n u_n
= \underbrace{\begin{pmatrix} u_1 & \cdots & u_n\end{pmatrix}}_{=:A}
\begin{pmatrix}\alpha_1\\ \vdots\\ \alpha_n\end{pmatrix}.
\]

For the so-defined matrix A = (u1 · · · un ), we get:

\[
A^T A = \begin{pmatrix} u_1^T\\ \vdots\\ u_n^T\end{pmatrix}
\begin{pmatrix} u_1 & \cdots & u_n\end{pmatrix}
= \begin{pmatrix}
u_1^T u_1 & \cdots & u_n^T u_1\\
\vdots & & \vdots\\
u_1^T u_n & \cdots & u_n^T u_n
\end{pmatrix}
= \begin{pmatrix}
\langle u_1,u_1\rangle & \cdots & \langle u_n,u_1\rangle\\
\vdots & & \vdots\\
\langle u_1,u_n\rangle & \cdots & \langle u_n,u_n\rangle
\end{pmatrix} = G(B). \tag{5.7}
\]

This means that AT A is the Gramian Matrix G(B) for the basis B. For an ONB B, the
matrix G(B) is the identity matrix 1 by Proposition 5.23. This gives us the following:

Definition 5.26. Orthogonal matrix


A matrix A ∈ Rn×n with the property AT A = 1 is called orthogonal.

We immediately see that an orthogonal matrix A has an ONB as columns and fulfils

hAx, Ayi = hx, yi.

The last property says that the corresponding linear map fA preserves the inner product,
and thus angles and lengths.

Proposition 5.27. Defining properties of orthogonal matrices


For a matrix A ∈ Rn×n the following claims are equivalent:

(a) A is an orthogonal matrix

(b) AT A = 1.

(c) AAT = 1.

(d) A−1 = AT .

(e) AT is an orthogonal matrix.

(f ) The columns of A define an ONB of Rn .

(g) The rows of A define an ONB of Rn .

(h) For all x, y ∈ Rn , we get hAx, Ayi = hx, yi.

(i) For all x ∈ Rn , we get kAxk = kxk.

Proof. Exercise!
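A quick numerical illustration of some of these properties; this is only a sketch, and the rotation angle 0.3 is an arbitrary choice of ours.

import numpy as np

theta = 0.3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])      # a rotation, hence orthogonal

print(np.allclose(A.T @ A, np.eye(2)))               # (b): A^T A = 1
print(np.allclose(np.linalg.inv(A), A.T))            # (d): A^{-1} = A^T
x, y = np.random.rand(2), np.random.rand(2)
print(np.isclose((A @ x) @ (A @ y), x @ y))          # (h): <Ax, Ay> = <x, y>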

Such matrices correspond to maps of special geometric interest:


• Rotations
• Reflections
• Special case: permutation matrices
We also see that an LES Ax = b described by an orthogonal matrix A is easy to solve:

\[
x = A^{-1}b = A^T b.
\]

The inverse is now computed more easily than in the general case.

Proposition 5.28. Determinant of orthogonal matrices


For an orthogonal matrix A, we have det(A) = ±1.

Proof. 1 = det(1) = det(AT A) = det(AT ) det(A) = det(A)2 .

Definition 5.29. Rotations and reflections


Let A ∈ Rn×n be an orthogonal matrix. If det(A) = 1, we call A a rotation. If
det(A) = −1, we call the matrix a reflection.

Which matrices in Example 3.16 are rotations or reflections?

Attention! Notions: Rotation or reflection


(a) Not every matrix A ∈ Rn×n with det(A) = 1 (or det(A) = −1) is a rotation (or
a reflection)!

(b) A “reflection” from Definition 5.29 could also be a point reflection in the case
n ≥ 3.

5.5 Orthogonalisation: the QR-decomposition


Since orthonormal systems and bases are so useful, we learnt the Gram-Schmidt procedure
to turn a family (a1 , . . . , ak ) of linearly independent vectors into an orthonormal system

(q1 , . . . , qk ) that spans the same space.


Putting the vectors ai as columns in a matrix A and the vectors qi in a matrix Q,
this can be written down as a decomposition:

\[
A = QR \qquad\text{(QR-decomposition)}
\]

We see that the columns of A can be seen as linear combinations of the columns of Q.
Here R is the matrix whose columns contain the coefficients of these linear combinations of the qi .
Since the lth column of A should always be a linear combination of the first l columns of
Q, we have:

\[
a_l = \sum_{i=1}^{l} r_{il}\, q_i .
\]

This means that R is an upper triangular matrix.
There are at least three alternatives to compute this:
• “Classical Gram-Schmidt”: this is what we learn next (good for pen-and-paper com-
putations), but unstable on the computer
• “Modified Gram-Schmidt”: equivalent to our Gram-Schmidt, order of loops ex-
changed, numerically more stable
• “Householder reflections”: are cheaper and even more stable. This is the method of
choice in numerical computations

If A is a square matrix with rank(A) = n, we get

\[
A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}
= \begin{pmatrix}
Q\begin{pmatrix} r_{11}\\ 0\\ \vdots\\ 0\end{pmatrix} &
Q\begin{pmatrix} r_{12}\\ r_{22}\\ \vdots\\ 0\end{pmatrix} &
\cdots &
Q\begin{pmatrix} r_{1n}\\ r_{2n}\\ \vdots\\ r_{nn}\end{pmatrix}
\end{pmatrix}
= Q \underbrace{\begin{pmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1n}\\
 & r_{22} & r_{23} & \cdots & r_{2n}\\
 & & r_{33} & \cdots & r_{3n}\\
 & & & \ddots & \vdots\\
 & & & & r_{nn}
\end{pmatrix}}_{=:\,R} = QR.
\]

This defines the so-called QR-decomposition of a matrix A.


As a result, we get (q1 , . . . , qn ) as an ONB for the space Span(a1 , . . . , an ) = Ran(A). We
immediately get QT Q = 1n . Since we consider square matrices, m = n, we can write
Q−1 = QT .

Example 5.30. Consider
\[
A = \begin{pmatrix} 2 & -1 & 8\\ 1 & 1 & 1\\ -2 & 4 & 4\end{pmatrix}.
\]
\[
q_1 = \frac{a_1}{\lVert a_1\rVert} = \frac{a_1}{3} = \begin{pmatrix} 2/3\\ 1/3\\ -2/3\end{pmatrix},
\qquad r_{11} = 3,
\]
\[
r_{12} = \langle a_2, q_1\rangle = -3, \qquad
q_2 = \begin{pmatrix} 1/3\\ 2/3\\ 2/3\end{pmatrix}, \qquad r_{22} = 3,
\]
\[
r_{13} = \langle a_3, q_1\rangle = 3, \qquad
r_{23} = \langle a_3, q_2\rangle = 6, \qquad
q_3 = \begin{pmatrix} 2/3\\ -2/3\\ 1/3\end{pmatrix}, \qquad r_{33} = 6;
\]
\[
Q = \begin{pmatrix} 2/3 & 1/3 & 2/3\\ 1/3 & 2/3 & -2/3\\ -2/3 & 2/3 & 1/3\end{pmatrix},
\qquad
R = \begin{pmatrix} 3 & -3 & 3\\ & 3 & 6\\ & & 6\end{pmatrix}.
\]

Example 5.31. For
\[
A = \begin{pmatrix} 2 & -1 & 8\\ 1 & 1 & 1\\ -2 & 4 & 4\end{pmatrix}
\]
Gram-Schmidt gives us
\[
q_1 = \frac{a_1}{\lVert a_1\rVert} = \begin{pmatrix} 2/3\\ 1/3\\ -2/3\end{pmatrix},\qquad
q_2 = \frac{a_2 - (a_2)_{\operatorname{Span}(q_1)}}{\lVert \ldots\rVert} = \begin{pmatrix} 1/3\\ 2/3\\ 2/3\end{pmatrix},\qquad
q_3 = \frac{a_3 - (a_3)_{\operatorname{Span}(q_1,q_2)}}{\lVert \ldots\rVert} = \begin{pmatrix} 2/3\\ -2/3\\ 1/3\end{pmatrix}.
\]
Hence:
\[
Q = \frac{1}{3}\begin{pmatrix} 2 & 1 & 2\\ 1 & 2 & -2\\ -2 & 2 & 1\end{pmatrix}
\qquad\text{and}\qquad
R = Q^T A = \begin{pmatrix} 3 & -3 & 3\\ & 3 & 6\\ & & 6\end{pmatrix}.
\]

As we have seen in the LR-decomposition, we can also use the QR-decomposition for
solving an LES Ax = b. If A is a square matrix (m = n), we know:

\[
Ax = b \iff QRx = b \overset{Q^{-1}=Q^T}{\iff} Rx = Q^T b \tag{5.8}
\]

The last system has triangular form and is solved by backward substitution. A QR-
decomposition is also possible in the non-square case as we will see later in detail.
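As a sketch, np.linalg.qr computes a QR-decomposition (its sign conventions may differ from the hand computation above), and (5.8) then reduces the system to one triangular solve; for brevity we use the generic np.linalg.solve instead of a dedicated back-substitution routine.

import numpy as np

A = np.array([[2.0, -1.0, 8.0],
              [1.0,  1.0, 1.0],
              [-2.0, 4.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)           # A = QR with Q orthogonal, R upper triangular
x = np.linalg.solve(R, Q.T @ b)  # Rx = Q^T b, cf. (5.8)
print(np.allclose(A @ x, b))     # True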

5.6 Distances: points, lines and planes


Recall that we call an affine subspace H in Rn with dimension n − 1 a hyperplane. This
is, for example, a line in R2 or a plane in R3 .

Definition 5.32. Hesse normal form (HNF), distance dist(·,·)


For each hyperplane in Rn , there exists a normal form

{v ∈ Rn : hn, v − pi = 0}

where p ∈ Rn is one chosen point and n ∈ Rn a normal vector. We call it Hesse


normal form (HNF) if knk = 1 holds.
For a given point q ∈ Rn and affine subspaces S, T in Rn , we write:

\[
\operatorname{dist}(q, T) := \min_{t\in T}\, \lVert q - t\rVert
\qquad\text{and}\qquad
\operatorname{dist}(S, T) := \min_{s\in S}\, \operatorname{dist}(s, T) = \min_{s\in S}\min_{t\in T}\, \lVert s - t\rVert
\]

for the shortest distance between q and T and the shortest distance between S and
T , respectively.

If we are using the HNF for a hyperplane, then the expression hn, v − pi can indeed
measure the distances:

Proposition 5.33.
For a hyperplane T = {v ∈ Rn : hn, v − pi = 0} with knk = 1 (this is the HNF),
we have
hn, q − pi = ±dist(q, T ) (5.9)
where the sign “+” holds if q lies on the same side of T as the normal vector n,
and “−” holds if q lies on the other side of T .

Proof. This is an exercise where you should use


\[
\langle n, v-p\rangle = \frac{\langle n, v-p\rangle}{1} = \frac{\langle v-p, n\rangle}{\langle n, n\rangle}
\]
and use projections.

Using equation (5.9), we are able to calculate distances. We summarise all possibilities
for such problems for R3 :

Distances in R3
• Point/Point: dist(p, q) = kp − qk (for completeness's sake).
• Point/Plane: dist(q, E) = |hn, q − pi| for the plane E := p + Span(a, b), where n is
  the unit normal vector of E (HNF), cf. (5.9).
• Line/Plane: For the line g := p + Span(a) and the plane E := q + Span(b, c), we have
  dist(g, E) = dist(p, E) if g is parallel to E. In the other case, g and E intersect,
  and hence dist(g, E) = 0. Note that g is parallel to E exactly if a lies in Span(b, c),
  i.e. the family (a, b, c) is linearly dependent, i.e. det(a b c) = 0.
• Plane/Plane: For the planes E1 := p + Span(a, b) and E2 := q + Span(c, d), we have
  dist(E1 , E2 ) = dist(p, E2 ) if E1 is parallel to E2 . Otherwise, dist(E1 , E2 ) = 0, which
  means that E1 and E2 have an intersection. If E1 and E2 are parallel, then the normal
  vectors n1 := a × b and n2 := c × d of E1 and E2 , respectively, are linearly dependent.
• Line/Line: Let g1 = p + Span(a) and g2 = q + Span(b) be lines in R3 .
  – 1st case: If a and b are parallel, then g1 and g2 are parallel. Let E be the plane
    with p ∈ E and normal vector n := a. The intersection point of E and g2 is denoted
    by p0 (calculation: put the parameter equation of g2 into the normal form of E).
    Then dist(g1 , g2 ) = dist(p, p0 ) = kp − p0 k.
  – 2nd case: If the vectors a and b are not parallel, then
    dist(g1 , g2 ) = dist(p, E) with E := q + Span(a, b).
• Point/Line: Let p be a point in R3 and g = q + Span(a) a line in R3 . Define
  b := (p − q) × a. If b = o, then p lies in g, and hence dist(p, g) = 0. Otherwise, b is
  perpendicular to the plane defined by p and g, and in this case
  dist(p, g) = dist(p, E) with E := q + Span(a, b).
  Alternatively: take the norm of the normal component of p − q with respect to Span(a)
  (Proposition 5.7).

Note that we always have

\[
\operatorname{dist}(p, q + U) = \min_{u\in U}\, \lVert p - (q + u)\rVert = \min_{u\in U}\, \lVert (p - q) - u\rVert = \operatorname{dist}(p - q, U),
\]

and therefore, one usually just considers the case of linear subspaces instead of affine
subspaces.
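A numerical sketch of this reduction: the distance from a point to an affine subspace is the norm of the normal component of p − q, computed via the Gramian system (5.5). The helper name dist_point_affine is ours.

import numpy as np

def dist_point_affine(p, q, B):
    """Distance from point p to the affine subspace q + Span(columns of B)."""
    d = p - q                                   # reduce to a linear subspace
    alpha = np.linalg.solve(B.T @ B, B.T @ d)   # Gramian system (5.5)
    return np.linalg.norm(d - B @ alpha)        # norm of the normal component

# Distance from a point to a plane q + Span(a, b) in R^3:
q = np.array([1.0, 0.0, 0.0])
B = np.column_stack([[1.0, 1.0, 0.0], [2.0, 0.0, 2.0]])
p = np.array([3.0, 1.0, 2.0])
print(dist_point_affine(p, q, B))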

Summary
• Each vector x ∈ Rn can be uniquely decomposed into
– a vector x U in a given subspace U and
– a vector n that is orthogonal to U .
The vector x U is called the orthogonal projection of x onto U . n is equal to x U ⊥ .
• If dim(U ) = 1, the calculation of x U is very easy, where one can use Proposition 5.7;
  if dim(U ) ≥ 2, then one has to choose a basis B for U and either
– solve an LES with the help of the Gramian matrix G(B) (Proposition 5.16) or
– build an ONS or ONB with the help of the Gram-Schmidt procedure and use
Proposition 5.23.
• A matrix A ∈ Rn×n with A−1 = AT is called orthogonal. The determinant is ±1.
Depending on the sign of det(A), the matrix A describes a reflection or a rotation.
6 Eigenvalues and similar things
The first person you should be careful not to fool is yourself. Because
you are the easiest person to fool.
Richard Feynman

Consider again a square matrix A ∈ Rn×n and the associated linear map fA : Rn → Rn
which maps Rn into itself.

Question:
Are there vectors v which are only scaled by fA ? This means that they satisfy:

Av = λv or equivalently (A − λ1)v = o

• λ is called eigenvalue of A,

• v is called eigenvector of A (if v 6= o).

First conclusions:
• Not very interesting (trivial): v = o.
• v ∈ Ker(A) \ {o} ⇒ Av = 0v, so λ = 0.
• v ∈ Ker(A − λ1) \ {o} ⇒ Av = λv, so λ is an eigenvalue.
• v eigenvector ⇒ αv is also an eigenvector (for α 6= 0).

      
Example. (a) \(Av = \begin{pmatrix}1 & 1\\ 0 & 1\end{pmatrix}\begin{pmatrix}v_1\\ v_2\end{pmatrix} = \begin{pmatrix}v_1 + v_2\\ v_2\end{pmatrix}\) ⇒ λ = 1, \(v = \begin{pmatrix}v_1\\ 0\end{pmatrix}\)

(b) \(Av = \begin{pmatrix}3 & 0\\ 0 & 2\end{pmatrix}\begin{pmatrix}v_1\\ v_2\end{pmatrix} = \begin{pmatrix}3v_1\\ 2v_2\end{pmatrix}\) ⇒ λ = 2, \(v = \begin{pmatrix}0\\ v_2\end{pmatrix}\), or λ = 3, \(v = \begin{pmatrix}v_1\\ 0\end{pmatrix}\)
(c) The eigenvalues of a diagonal matrix are the diagonal entries, its eigenvectors are the
(scaled) canonical unit vectors.
(d) Suppose A ∈ R2×2 is a rotation about an angle (not a multiple of 180◦ ). Then
“obviously” there cannot be any eigenvectors (at least no real ones).


6.1 What is an eigenvalue and an eigenvector?


We start with an illustration in the two-dimensional case and consider a matrix

\[
A = \begin{pmatrix} a_1 & a_2 \end{pmatrix} \in \mathbb{R}^{2\times 2}
\]

and the associated linear map fA : R2 → R2 with x 7→ Ax.

Define the two lines g1 := Span(e1 ) and g2 := Span(e2 ), and look at the images under
the map fA , denoted by g10 := fA (g1 ) and g20 := fA (g2 ).
[Figure: the lines g1 , g2 and their images g10 , g20 under fA , together with the special line g3 with g30 = g3 .]
We already know g10 = Span(a1 ) and g20 = Span(a2 ). In the picture above, you see an
example of what could happen under the map fA : Both lines have been rotated but in
different orientations. In this example we chose the matrix
\[
A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}.
\]
By the linearity of fA , we also know that all other lines between g1 and g2 should also be
rotated. However, then in this example, we should also find a line g3 , which goes through
the origin, that is not rotated at all when using fA . This simply means that the image
g30 := fA (g3 ) is equal to g3 .
Of course, this does not mean that all the points of g3 stay fixed after using fA but only
that a point of g3 is mapped to another point on g3 .
In the same sense, we can look at the other quadrants of our coordinate system. There we
also find such a special line:

g40 := fA (g4 ) = g4 .

[Figure: the second special line g4 in the other quadrants, with g40 = g4 .]
For points x on both lines, we find our defining equation for eigenvectors and eigenvalues
again:
Ax1 = fA (x1 ) ∈ g3 , i.e. Ax1 = λ1 x1 with a certain λ1 ∈ R
and
Ax2 = fA (x2 ) ∈ g4 , i.e. Ax2 = λ2 x2 with a certain λ2 ∈ R.
Note that the numbers λ1 or λ2 cannot change for different points on the line since fA is
linear. We fix all this in a definition:

Definition 6.1. Eigenvalue, Eigenvector, spectrum


Let A be a square matrix. A vector x 6= o is called an eigenvector of A, if Ax is a
multiple of x. This scalar λ, which means Ax = λx, is called eigenvalue of A. The
set of all eigenvalues of A is called the spectrum of A and denoted by spec(A).

This is a very general definition and will work later for other cases in the same manner.
Here, we are first interested in matrices A ∈ Rn×n and eigenvalues λ ∈ R. However, you
may already see that this can also work for complex numbers. We may also include λ ∈ C
later.

Proposition 6.2. Multiple of eigenvector = eigenvector


Every multiple (not o) of an eigenvector x for A is also an eigenvector for A,
corresponding to the same eigenvalue λ.

Proof. Let Ax = λx with x 6= 0, which means x is an eigenvector of A for the eigenvalue


λ. Let y = αx with α 6= 0. Then Ay = A(αx) = αAx = αλx = λ(αx) = λy, which
means y(6= o) is also an eigenvector of A associated to the same eigenvalue λ.

Looking again at the pictures above:


• We have Ax = λ1 x for all multiples x of x1 ∈ g3 (which means for all x ∈ g3 ).
• Also we have Ax = λ2 x for all multiples x of x2 ∈ g4 (which means for all x ∈ g4 ).
• Looking at the line g3 , the map fA acts like scaling with the factor λ1 .
• Looking at the line g4 , the map fA acts like scaling with the factor λ2 .

Optimal coordinate system for the map fA


Describing R2 with a coordinate system given by the two lines g3 and g4 (instead of
g1 and g2 ), the acting of the map fA is very simple: The coordinate axes are only
stretched: The one with factor λ1 , and the other one with factor λ2 .

To get this “optimal coordinate system” we need all the eigenvalues λ1 , λ2 and the cor-
responding eigenvectors x1 and x2 .

Question:

(a) How to find the eigenvalues and the eigenvectors of A?

(b) Do you always find n eigenvalues for an n × n matrix A?

(c) Do you find n different directions for eigenvectors?

(d) How to change the coordinate system?

(e) What are applications for this?

6.2 The characteristic polynomial


Our goal is to find λ ∈ R and x 6= o such that (A − λ1)x = o, i.e., (A − λ1) has a
nontrivial kernel. This means that the corresponding map for A − λ1 is not injective and,
hence, it is a singular matrix.

Idea:
Compute det(A − λ1), which yields a polynomial of degree n in λ and determine
its zeros, because

det(A − λ1) = 0 ⇔ A − λ1 is singular


⇔ Ker(A − λ1) is non-trivial
⇔ λ is an eigenvalue

Then, compute a basis for Ker(A − λ1) for each eigenvalue.

Example 6.3.
 
\[
A = \begin{pmatrix} 3 & 2\\ 1 & 4\end{pmatrix}
\]
\[
\det(A - \lambda 1) = \det\begin{pmatrix} 3-\lambda & 2\\ 1 & 4-\lambda\end{pmatrix}
= (3-\lambda)(4-\lambda) - 2\cdot 1 = 10 - 7\lambda + \lambda^2
\]
\[
\lambda_{1,2} = \frac{7 \pm \sqrt{49 - 40}}{2} = \frac{7 \pm 3}{2}
\quad\Rightarrow\quad \lambda_1 = 2,\ \lambda_2 = 5
\]
Thus we have the eigenvalues λ1 = 2 and λ2 = 5. Let us compute the eigenvectors:
\[
o = (A - 2\cdot 1)v = \begin{pmatrix} 1 & 2\\ 1 & 2\end{pmatrix}\begin{pmatrix} v_1\\ v_2\end{pmatrix}
= \begin{pmatrix} v_1 + 2v_2\\ v_1 + 2v_2\end{pmatrix}
\quad\Rightarrow\quad v = \alpha\begin{pmatrix} 2\\ -1\end{pmatrix}
\]
\[
o = (A - 5\cdot 1)v = \begin{pmatrix} -2 & 2\\ 1 & -1\end{pmatrix}\begin{pmatrix} v_1\\ v_2\end{pmatrix}
= \begin{pmatrix} -2v_1 + 2v_2\\ v_1 - v_2\end{pmatrix}
\quad\Rightarrow\quad v = \alpha\begin{pmatrix} 1\\ 1\end{pmatrix}
\]

We summarise what we discovered:

Proposition 6.4. Five properties of an eigenvalue


For a square matrix A and a number λ the following is equivalent:

(i) λ is an eigenvalue A.

(ii) There is a vector x 6= o with Ax = λx.

(iii) The space Ker(A − λ1) contains a vector x 6= o.

(iv) The matrix A − λ1 is not invertible.

(v) det(A − λ1) = 0

Proof. Exercise!

Let A ∈ Rn×n . Then we observe that det(A − λ1) = pA (λ) is a polynomial of degree n in
the variable λ. This means there are coefficients ci such that

\[
p_A(\lambda) = (-1)^n \lambda^n + c_{n-1}\lambda^{n-1} + \ldots + c_1\lambda + c_0 .
\]



Definition 6.5. Characteristic polynomial


For an n × n-Matrix A, the polynomial λ 7→ det(A − λ1) is called the characteristic
polynomial of the matrix A and is denoted by pA .

 
3 2
Example 6.6. Look at A = .
1 2
3 2  
1 0 

3−λ 2

pA (λ) = det(A − λ1) = det −λ = det
1 2 0 1 1 2−λ
= (3 − λ) · (2 − λ) − 2 · 1 = 6 − 3λ − 2λ + λ2 − 2 = λ2 − 5λ + 4

Solving the quadratic equation:


r r
−5 25 5 9 5±3
λ1,2 = − ± −4= ± = ∈ {1, 4}, hence λ1 = 4, λ2 = 1.
2 4 2 4 2

If we observe the polynomial in the largest number space we know, the complex numbers,
we recall the fundamental theorem of algebra:

Theorem 6.7. Fundamental theorem of algebra (Gauß 1799)


Let a0 , a1 , . . . , an ∈ C with an 6= 0. Then the polynomial equation

\[
\underbrace{a_n x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0}_{=:\,p(x)} = 0
\]

has n (not necessarily different) solutions x1 , . . . , xn in C. Moreover, we find for


x ∈ C:
p(x) = an (x − x1 )(x − x2 ) · · · (x − xn ).

For the characteristic polynomial, this means:


• Every polynomial has at least one (possibly) complex root λ1 , so we have at least
one eigenvalue, but not always a real one. (e.g. pA (λ) = λ2 + 1, λ1,2 = ±i).
• Finding roots of a polynomial means factorisation into linear factors:

pA (λ) = (±1)(λ − λ1 )(λ − λ2 ) . . . (λ − λn ).

Sometimes some of the values λ1 . . . λn are equal (e.g. pA (λ) = (λ − 1)2 , and then
λ1 = λ2 = 1), so we have a multiple root.

Definition 6.8.
If the same eigenvalue λ appears α(λ) times in this factorisation, we say:

λ has algebraic multiplicity α(λ) .

• If we have k different eigenvalues λ1 , . . . , λk ∈ C, then α(λ1 ) + · · · + α(λk ) = n,


because polynomials of degree n can be factorised into n linear factors.
• If λ is an eigenvalue, then A − λ1 is singular, so γ(λ) := dim(Ker(A − λ1)) ≥ 1.

Since we can calculate eigenvalues by calculating determinants, we immediately get the


eigenvalues if A has a triangular form or a block triangular form:

Proposition 6.9. Spectrum for triangular matrices


Let A ∈ Rn×n be a square matrix.

(a) For a matrix in triangular form


 
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
0 & a_{22} & & a_{2n}\\
\vdots & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & a_{nn}
\end{pmatrix},
\]
we get spec(A) = {a11 , a22 , . . . , ann }.

(b) For a square block matrix in triangular form


 
\[
A = \begin{pmatrix} B & C\\ 0 & D\end{pmatrix}
\]

with square matrices B and D, we get spec(A) = spec(B) ∪ spec(D).

(c) Also spec(A) = spec(AT ). Hence (a) and (b) also hold for lower triangular
matrices.

Proof. For (b): This immediately follows from Proposition 4.15 since λ ∈ spec(A) if and
only if
\[
0 = \det\left( \begin{pmatrix} B & C\\ 0 & D\end{pmatrix} - \lambda\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}\right)
= \det\begin{pmatrix} B - \lambda 1 & C\\ 0 & D - \lambda 1\end{pmatrix}
= \det(B - \lambda 1)\,\det(D - \lambda 1),
\]

which means that λ ∈ spec(B) or λ ∈ spec(D).


For (a): Use repeatedly (b):
 
\[
0 = \det(A - \lambda 1) = \det\begin{pmatrix}
a_{11}-\lambda & a_{12} & \cdots & a_{1n}\\
0 & a_{22}-\lambda & & a_{2n}\\
\vdots & \ddots & \ddots & \vdots\\
0 & \cdots & 0 & a_{nn}-\lambda
\end{pmatrix}
= (a_{11}-\lambda)\cdot\ldots\cdot(a_{nn}-\lambda)
\]

if and only if λ ∈ {a11 , a22 , . . . , ann }.


For (c): Using Proposition 4.16, one gets 0 = det(A − λ1) = det((A − λ1)T ) = det(AT − λ1T ) = det(AT − λ1).

Example 6.10. We give some examples for Proposition 6.9 (blank entries are zeros).

(a) \(\operatorname{spec}\begin{pmatrix} 1 & 2 & 3 & 4\\ & 5 & 6 & 7\\ & & 8 & 9\\ & & & 10\end{pmatrix} = \{1, 5, 8, 10\}\)

(b) \(\operatorname{spec}\begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ & 6 & 7 & 8 & 9\\ & & 10 & & \\ & & 11 & 12 & \\ & & 13 & 14 & 15\end{pmatrix}
= \operatorname{spec}\begin{pmatrix} 1 & 2\\ & 6\end{pmatrix} \cup \operatorname{spec}\begin{pmatrix} 10 & & \\ 11 & 12 & \\ 13 & 14 & 15\end{pmatrix} = \{1, 6, 10, 12, 15\}\)

(c) \(\operatorname{spec}\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6\\ & 7 & 8 & 9 & 10 & 11\\ & & 12 & & & \\ & & 13 & 14 & & \\ & & 15 & 16 & 17 & 18\\ & & 19 & 20 & & 21\end{pmatrix}
= \operatorname{spec}\begin{pmatrix} 1 & 2\\ & 7\end{pmatrix} \cup \operatorname{spec}\begin{pmatrix} 12 & & & \\ 13 & 14 & & \\ 15 & 16 & 17 & 18\\ 19 & 20 & & 21\end{pmatrix}\)

\(\phantom{(c)}\;= \operatorname{spec}\begin{pmatrix} 1 & 2\\ & 7\end{pmatrix} \cup \operatorname{spec}\begin{pmatrix} 12 & \\ 13 & 14\end{pmatrix} \cup \operatorname{spec}\begin{pmatrix} 17 & 18\\ & 21\end{pmatrix} = \{1, 7, 12, 14, 17, 21\}\)

From the Leibniz formula of the determinant, we conclude:

Remark:
The characteristic polynomial for A ∈ Rn×n is of the following form

\[
p_A(\lambda) = (-1)^n \lambda^n + \operatorname{tr}(A)\,(-1)^{n-1}\lambda^{n-1} + \cdots + \det(A), \tag{6.1}
\]

where \(\operatorname{tr}(A) := \sum_{j=1}^{n} a_{jj}\) is the sum of the diagonal, the so-called trace of A.

6.3 Complex matrices and vectors


We have seen that we need complex numbers when we are talking about eigenvalues.
Eigenvalues given in Definition 6.1 should always be complex numbers. Then we also have
to see the matrix A and the corresponding eigenvector x as general complex entities.

Definition 6.11. Complex matrices


For m, n ∈ N, the set of all m × n matrices with entries in C is denoted by Cm×n .
Analogously, Cn denotes the set of all (column-)vectors with n entries in C.

Naturally, we define the addition and scalar multiplication in Cn and Cm×n as we did for
the objects with real entries. Indeed, everything works the same and we find:

Proposition 6.12. Properties of the vector space Cn


The set V = Cn with the addition + and scalar multiplication · fulfils the following:

(1) ∀v, w ∈ V : v+w =w+v (+ is commutative)


(2) ∀u, v, w ∈ V : u + (v + w) = (u + v) + w (+ is associative)
(3) There is a zero vector o ∈ V with the property: ∀v ∈ V we have v + o = v.
(4) For all v ∈ V there is a vector −v ∈ V with v + (−v) = o.
(5) For the number 1 ∈ C and each v ∈ V , one has: 1 · v = v.
(6) ∀λ, µ ∈ C ∀v ∈ V : λ · (µ · v) = (λµ) · v (· is associative)

(7) ∀λ ∈ C ∀v, w ∈ V : λ · (v + w) = (λ · v) + (λ · w) (distributive ·+)


(8) ∀λ, µ ∈ C ∀v ∈ V : (λ + µ) · v = (λ · v) + (µ · v) (distributive +·)

We already mentioned that a set V with an addition and scalar multiplication that fulfils
the rules above is called a vector space. However, note that we now can also scale vectors
by using complex numbers. To make this clear, we often speak of the complex vector
space Cn .
Recall the notions of linear dependence, linear independence and basis, which we still use
in the complex vector space Cn .

Definition 6.13. Subspaces in Cn


A nonempty subset U ⊂ Cn is called a (linear) subspace of Cn if all linear combin-
ations of vectors in U remain also in U . This means:

(1) o∈U,
(2) u ∈ U , λ ∈ C =⇒ λu ∈ U ,
(3) u, v ∈ U =⇒ u + v ∈ U .

Definition 6.14. Span


Let M ⊂ Cn be any non-empty subset. Then we define:

Span (M ) := {λ1 u1 + · · · + λk uk : u1 , . . . , uk ∈ M , λ1 , . . . , λk ∈ C , k ∈ N} .

This subspace is called the span or the linear hull of M . For convenience, we define
Span(∅) := {o}.

Definition 6.15. Linear dependence and independence


A family (v1 , . . . , vk ) of k vectors from Cn is called linearly dependent if we find a
non-trivial linear combination for o. This means that we can find λ1 , . . . , λk ∈ C
that are not all equal to zero such that

\[
\sum_{j=1}^{k} \lambda_j v_j = o .
\]

If this is not possible, we call the family (v1 , . . . , vk ) linearly independent. This
means that

\[
\sum_{j=1}^{k} \lambda_j v_j = o \;\Rightarrow\; \lambda_1 = \ldots = \lambda_k = 0
\]

holds.

Definition 6.16. Basis, basis vectors


Let V be a subspace of Cn . A family B = (v1 , . . . , vk ) is called a basis of V if
(a) V = Span(B) and

(b) B is linearly independent.

The elements of B are called the basis vectors of V .

Even in the complex vector space Cn , we are able to speak of geometry when endowing the
space with an inner product. We try to generalise what we know from the complex plane
C and the real vector space Rn .

Definition 6.17. Inner product in Cn


For the vectors
\[
u = \begin{pmatrix} u_1\\ \vdots\\ u_n\end{pmatrix},\quad
v = \begin{pmatrix} v_1\\ \vdots\\ v_n\end{pmatrix} \in \mathbb{C}^n
\quad\text{the number}\quad
\langle u, v\rangle := u_1\overline{v_1} + \ldots + u_n\overline{v_n} = \sum_{i=1}^{n} u_i\overline{v_i}
\]
is called the (standard) inner product of u and v. Moreover, we define the real number
\[
\lVert v\rVert := \sqrt{\langle v, v\rangle} = \sqrt{|v_1|^2 + \ldots + |v_n|^2}
\]
and call it the norm of v.

Attention!
In some other books, you might find an alternative definition of the standard inner
product in Cn where the first argument is the complex conjugated one.

Note that hv, vi is always a real number with hv, vi ≥ 0 such that it indeed gives us a length.
Again, we find the important property: hv, vi = 0 if and only if v = o. Hence, \(\sqrt{\langle v,v\rangle}\) is
well-defined and the norm k · k has the same properties as in Rn , see Proposition 6.19
below.

Proposition 6.18.
The standard inner product h·, ·i : Cn × Cn → C fulfils the following: For all vectors
x, x0 , y ∈ Cn and λ ∈ C, one has

(S1) hx, xi > 0 for x 6= o, (positive definite)


(S2) hx + x0 , yi = hx, yi+hx0 , yi, (additive) o
(linear)
(S3) hλx, yi = λhx, yi, (homogeneous)

(S4) hx, yi = hy, xi. (conjugate symmetric)

Proposition 6.19. Norm


The norm k · k : Cn → R defined by using the standard inner product satisfies for
all x, y ∈ Cn and α ∈ C:
(N1) kxk ≥ 0, and kxk = 0 ⇔ x = o, (positive definite)
(N2) kαxk = |α| kxk, (absolutely homogeneous)
(N3) kx + yk ≤ kxk + kyk. (triangle inequality).

Looking at the standard inner product in Rn , we know that AT satisfies the equation
hAx, yi = hx, AT yi for all x ∈ Rn , y ∈ Rm (Proposition 3.36). In the standard inner
product in Cn , we also have the complex conjugation involved.

Proposition & Definition 6.20. Adjoint matrix


For a given matrix
\[
A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \cdots & a_{mn}\end{pmatrix} \in \mathbb{C}^{m\times n},
\quad\text{the matrix}\quad
A^* := \overline{A}^T = \begin{pmatrix} \overline{a_{11}} & \cdots & \overline{a_{m1}}\\ \vdots & & \vdots\\ \overline{a_{1n}} & \cdots & \overline{a_{mn}}\end{pmatrix} \in \mathbb{C}^{n\times m}
\]

is called the adjoint matrix of A. It is the uniquely determined matrix that fulfils
the equation
hAx, yi = hx, A∗ yi
for all x ∈ Cn and y ∈ Cm .

Proof. For u, v ∈ Cn with
\[
u = \begin{pmatrix} u_1\\ \vdots\\ u_n\end{pmatrix},\quad
v = \begin{pmatrix} v_1\\ \vdots\\ v_n\end{pmatrix}
\quad\text{we have}\quad
\langle u, v\rangle = u_1\overline{v_1} + \ldots + u_n\overline{v_n}
= \begin{pmatrix} \overline{v_1} & \cdots & \overline{v_n}\end{pmatrix}\begin{pmatrix} u_1\\ \vdots\\ u_n\end{pmatrix}
= v^* u.
\]
Hence, for all x ∈ Cn and y ∈ Cm :
\[
\langle Ax, y\rangle = y^*(Ax) = (y^*A)x = (A^*y)^*x = \langle x, A^*y\rangle.
\]

In analogy to Proposition 6.9 (c), we get the following for complex matrices:

Proposition 6.21. Spectrum of A∗


For all A ∈ Cn×n , we have \(\operatorname{spec}(A^*) = \{\overline{\lambda} : \lambda \in \operatorname{spec}(A)\}\).

Proof. \(\det(A^* - \overline{\lambda}\, 1) = \det\big((A - \lambda 1)^*\big) = \overline{\det(A - \lambda 1)}\) by Proposition 4.15 and \(\overline{z + w} = \overline{z} + \overline{w}\), \(\overline{z\cdot w} = \overline{z}\cdot\overline{w}\).

Some important notions:

Definition 6.22.
A complex matrix A ∈ Cn×n is called
• selfadjoint if A = A∗ (complex version of “symmetric”),
• skew-adjoint if A = −A∗ (complex version of “skew-symmetric”),
• unitary if AA∗ = 1 = A∗ A (complex version of “orthogonal”),
• normal if AA∗ = A∗ A.

     
Example 6.23. (a) \(A = \begin{pmatrix} 1 & 2i\\ -2i & 0\end{pmatrix} \;\Rightarrow\; A^* = \overline{\begin{pmatrix} 1 & -2i\\ 2i & 0\end{pmatrix}} = \begin{pmatrix} 1 & 2i\\ -2i & 0\end{pmatrix} = A\)

(b) \(A = \begin{pmatrix} i & -1+2i\\ 1+2i & 3i\end{pmatrix} \;\Rightarrow\; A^* = \overline{\begin{pmatrix} i & 1+2i\\ -1+2i & 3i\end{pmatrix}} = \begin{pmatrix} -i & 1-2i\\ -1-2i & -3i\end{pmatrix} = -A\)

(c) \(A = \begin{pmatrix} 1+i & 3-2i\\ 2i & -1\end{pmatrix} \;\Rightarrow\; A^* = \overline{\begin{pmatrix} 1+i & 2i\\ 3-2i & -1\end{pmatrix}} = \begin{pmatrix} 1-i & -2i\\ 3+2i & -1\end{pmatrix} \notin \{A, -A\}\)

If A ∈ Cn×n is a real matrix, i.e. aij ∈ R for all i, j, then we get:

adjoint matrix A∗ = transpose matrix AT ,


selfadjoint = symmetric,
skew-adjoint = skew-symmetric,
unitary = orthogonal.

Proposition 6.24. Where are the eigenvalues?

(a) If A∗ = A (selfadjoint), then all eigenvalues of A lie on the real line.

(b) If A∗ = −A (skew-adjoint), then all eigenvalues of A lie on the imaginary axis.

(c) If A∗ A = 1 (unitary), then all eigenvalues of A lie on the unit circle in C.

Proof. If λ is an eigenvalue of A, then we find an eigenvector x 6= o with Ax = λx.
Using Proposition 6.2, we normalise x, which means that we multiply it by α = 1/kxk. The
resulting vector is an eigenvector with norm 1. Therefore, consider the case kxk = 1.
Then:
λ = λkxk2 = λhx, xi = hλx, xi = hAx, xi. (6.2)
(a): If A = A∗ , then:
\[
\lambda \overset{(6.2)}{=} \langle Ax, x\rangle = \langle x, A^*x\rangle \overset{A^*=A}{=} \langle x, Ax\rangle \overset{(S4)}{=} \overline{\langle Ax, x\rangle} = \overline{\lambda},
\]
which implies λ ∈ R.
(b): If A = −A∗ , then:
\[
\lambda \overset{(6.2)}{=} \langle Ax, x\rangle = \langle x, A^*x\rangle \overset{A^*=-A}{=} \langle x, -Ax\rangle \overset{(S4)}{=} \overline{\langle -Ax, x\rangle} = -\overline{\langle Ax, x\rangle} = -\overline{\lambda},
\]
which implies λ = iy with y ∈ R.
(c): If A∗ A = 1, then:
\[
1 = \lVert x\rVert^2 = \langle x, x\rangle = \langle x, \underbrace{A^*A}_{1}\,x\rangle = \langle Ax, Ax\rangle = \lVert Ax\rVert^2 = \lVert \lambda x\rVert^2 = (|\lambda|\,\lVert x\rVert)^2 = |\lambda|^2 .
\]

6.4 Eigenvalues and similarity

Definition 6.25. Similarity


Two matrices A, B ∈ Cn×n are called similar if there is an invertible S ∈ Cn×n with
A = S −1 BS.

Proposition 6.26.
Similar matrices have the same characteristic polynomial and thus the same eigen-
values.

Proof. Direct computation. Let A = S −1 BS, so

A − λ1 = S −1 BS − λS −1 S = S −1 (B − λ1)S.

Thus, A−λ1 and B−λ1 are similar matrices. Similar matrices have the same determinant:

pA (λ) = det(A − λ1) = det(B − λ1) = pB (λ).

Remark:
Later, we will see that any matrix A ∈ Cn×n is similar to a triangular matrix.

6.5 Calculating eigenvectors


Even for matrices A ∈ Rn×n , we now consider the eigenvalues in C and the eigenvectors
in Cn . This means that we now consider all square matrices as matrices in Cn×n .
 
Example 6.27. Consider A ∈ R2×2 with
\[
A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}.
\]
Then pA (λ) = λ2 + 1 and spec(A) = {−i, i}.
The corresponding map fA : R2 → R2 rotates e1 and e2 , and hence any vector in R2 ,
by an angle of π/2 (or 90◦ ) in the positive sense. In this sense, no line is sent to itself again.
However, this is only a problem if we look at the “real” picture.

We fix two important properties:

Proposition 6.28. Spectrum is not empty


For a square matrix A ∈ Cn×n holds:

(a) spec(A) 6= ∅.

(b) A is invertible if and only if 0 6∈ spec(A).

Proof. (a) follows from the fundamental theorem of algebra (Theorem 6.7): pA has at least one zero in C. For (b): A is not invertible ⇐⇒ 0 = det(A) = det(A − 0 · 1) ⇐⇒ 0 ∈ spec(A).

Looking at Proposition 6.4, we see what we have to do in order to calculate the eigenvectors
of a given matrix A if we already know the eigenvalues λ:

Definition 6.29. Eigenspace


The solution set of the LES (A − λ1)x = o, which means Ker(A − λ1), is called the
eigenspace with respect to the eigenvalue λ and denoted by Eig(λ). Each nonzero
vector x ∈ Eig(λ) \ {o} is an eigenvector w.r.t. the eigenvalue λ.

Note that Eig(λ) is always a linear subspace and makes also sense in the case when λ is
not an eigenvalue of A. In this instance, we simply have Eig(λ) = {o}.

Example 6.30. Consider \(A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}\): a vector xi 6= o is an eigenvector for λi with i ∈ {1, 2} if

\[
Ax_i = \lambda_i x_i , \quad\text{i.e.}\quad (A - \lambda_i 1)x_i = o.
\]

Hence, we have to solve the LES (A − λ1 1)x1 = o and (A − λ2 1)x2 = o.

\[
\lambda_1 = 4:\quad A - \lambda_1 1 = \begin{pmatrix} 3-\lambda_1 & 2\\ 1 & 2-\lambda_1\end{pmatrix} = \begin{pmatrix} -1 & 2\\ 1 & -2\end{pmatrix},
\]
\[
(A - \lambda_1 1)x_1 = o \iff \begin{pmatrix} -1 & 2\\ 1 & -2\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}
\;\Longleftarrow\; x_1 = \begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 2\\ 1\end{pmatrix}
\]

In the same manner:

\[
\lambda_2 = 1:\quad A - \lambda_2 1 = \begin{pmatrix} 3-\lambda_2 & 2\\ 1 & 2-\lambda_2\end{pmatrix} = \begin{pmatrix} 2 & 2\\ 1 & 1\end{pmatrix},
\]
\[
(A - \lambda_2 1)x_2 = o \iff \begin{pmatrix} 2 & 2\\ 1 & 1\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}
\;\Longleftarrow\; x_2 = \begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 1\\ -1\end{pmatrix}
\]

Definition 6.31. Multiplicities


Let A ∈ Cn×n be square matrix. Then the characteristic polynomial can be written
as:
\[
p_A(z) = (\lambda_1 - z)^{\alpha_1}\,(\lambda_2 - z)^{\alpha_2}\cdots(\lambda_k - z)^{\alpha_k} \tag{6.3}
\]
where λ1 , . . . , λk are pairwise different. The natural number αj above is called:

α(λj ) := αj algebraic multiplicity of λj

and tells you how often the eigenvalue λj occurs in the characteristic polynomial.
We also define

γ(λj ) := dim(Eig(λj )) = dim(Ker(A − λj 1)) geometric multiplicity of λj

Remark: Recipe for calculating eigenvectors


Let A ∈ Cn×n be a square matrix.

(1) The eigenvalues λ are the zeros of the characteristic polynomial pA of A. In


other words, the solutions of

pA (λ) = det(A − λ1) = 0.

(2) If A is real, then pA (λ) is a real polynomial. If it has a complex zero λ 6∈ R,
then its complex conjugate \(\overline{\lambda}\) is also a zero.

(3) If one eigenvalue is found, we can reduce the characteristic polynomial by equat-
ing coefficients (or polynomial division).

(4) The eigenvectors x are given by the solutions of the LES (A − λ1)x = o for
each eigenvalue, where only the nonzero solutions x 6= o are interesting.
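A small NumPy sketch of this recipe; the helper eigenspace_basis is ours and extracts an orthonormal basis of Ker(A − λ1) from the singular value decomposition.

import numpy as np

def eigenspace_basis(A, lam, tol=1e-10):
    """Orthonormal basis of Ker(A - lam*1) for a square matrix A, via the SVD."""
    M = A - lam * np.eye(A.shape[0])
    _, s, Vh = np.linalg.svd(M)
    # Right-singular vectors belonging to (numerically) zero singular values span the kernel.
    return Vh[s < tol].conj().T

A = np.array([[3.0, 2.0], [1.0, 2.0]])
for lam in np.linalg.eigvals(A):           # step (1): zeros of p_A
    print(lam, eigenspace_basis(A, lam))   # step (4): a basis of Eig(lam)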

Example 6.32.
p(λ) = −λ3 + 5λ2 − 8λ + 6
• n = 3 is odd: “−λ3 ”
• Try some values and find: λ1 = 3.
Equating coefficients (or polynomial division):
We derive four equations for three unknowns in the following way:
−λ3 + 5λ2 −8λ + 6 = (aλ2 +bλ + c)(λ − 3) = aλ3 + (b − 3a)λ2 + (c − 3b)λ − 3c
−1 =a ⇒ a = −1
5 = b − 3a ⇒ 5=b+3 ⇒ b=2
−8 = c − 3b ⇒ c − 6 = −8 ⇒ c = −2
6 = −3c fulfilled, so λ1 = 3 is really a root of p.
Factorisation:
p(λ) = (−λ2 + 2λ − 2)(λ − 3)
Solution of the quadratic equation:
\[
\lambda_{2,3} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{4 - 8}}{-2} = 1 \pm i.
\]
Result:
\[
p(\lambda) = -\big(\lambda - 3\big)\big(\lambda - (1 + i)\big)\big(\lambda - (1 - i)\big).
\]

Exercise 6.33.
Let A be a square matrix and λ1 , λ2 two different eigenvalues. Show that

Eig(λ1 ) ∩ Eig(λ2 ) = {o}

6.6 The spectral mapping theorem


Let λ ∈ C be an eigenvalue of A ∈ Cn×n corresponding to the eigenvector x ∈ Cn , which
means Ax = λx. Then we get for the powers:
A2 x = A(Ax) = A(λx) = λAx = λλx = λ2 x,
A3 x = A(A2 x) = A(λ2 x) = λ2 Ax = λ2 λx = λ3 x,
⋮
\[
A^m x = \lambda^m x \quad\text{for all } m \in \mathbb{N}. \tag{6.4}
\]
We conclude that Am has also the eigenvector x but now it corresponds to the eigenvalue
λm instead of λ.
Now we could also bring in the addition of the matrices A0 , A1 , A2 , and so on, and get a
similar result.

Proposition 6.34. Polynomial spectral mapping theorem


Let p(λ) = pm λm + pm−1 λm−1 + . . . + p1 λ + p0 be a polynomial and A ∈ Cn×n a
square matrix. Putting the matrix A into p (formally), we get the following matrix:

p(A) := pm Am + pm−1 Am−1 + . . . + p1 A + p0 1 .

It is again an n × n matrix, and we get



spec p(A) = { p(λ) : λ ∈ spec(A) }.

Moreover, each eigenvector of A is also an eigenvector of p(A).

Proof. Let us denote { p(λ) : λ ∈ spec(A) } by M . Then we have to show two inclusions
to prove spec p(A) = M .
(⊃): For λ ∈ spec(A) with eigenvector x we use (6.4) and get
p(A)x = (pm Am + pm−1 Am−1 + . . . + p1 A + p0 1)x
= pm Am x + pm−1 Am−1 x + . . . + p1 Ax + p0 1x
= pm λm x + pm−1 λm−1 x + . . . + p1 λx + p0 x
= (pm λm + pm−1 λm−1 + . . . + p1 λ + p0 )x = p(λ)x.
Hence, the number p(λ) ∈ M is an eigenvalue of the matrix p(A) with the same eigenvector
x.
(⊂): First, assume that p is a constant polynomial p(z) = p0 ∈ C. Then let λ ∈
spec(p(A)), which means 0 = det(p(A) − λ1) = (p0 − λ)n . We conclude λ = p0 and
λ ∈ M.
Now assume that p is not constant and µ ∈ / M = { p(λ) : λ ∈ spec(A) }. (We do a
contraposition). Then the polynomial q(z) := p(z) − µ can be written in linear factors
\[
q(z) = c\prod_{j=1}^{m}(z - a_j)
\]

with c 6= 0 and aj ∈ C. We get p(aj ) = µ for all j, and, hence, aj ∉ spec(A). This means
that det(A − aj 1) 6= 0 for all j, which also implies

\[
\det(p(A) - \mu 1) = \det(q(A)) = c^n \prod_{j=1}^{m} \det(A - a_j 1) \ne 0 .
\]

Therefore, µ is not an eigenvalue of p(A) and we conclude µ ∉ spec(p(A)).


Example 6.35. Let \(A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}\). We want to know the eigenvalues of the matrix

\[
B = 3A^3 - 7A^2 + A - 2\cdot 1.
\]

We do not need to calculate the matrix B since we can use the spectral mapping theorem:
We know the eigenvalues of A from before: λ1 = 4 and λ2 = 1. Using Proposition 6.34,
we only need to put these numbers into \(p(\lambda) = 3\lambda^3 - 7\lambda^2 + \lambda - 2\). Hence, for B, we
find the following eigenvalues:

\[
p(\lambda_1) = 3\lambda_1^3 - 7\lambda_1^2 + \lambda_1 - 2 = 3\cdot 64 - 7\cdot 16 + 4 - 2 = 82
\quad\text{and}\quad
p(\lambda_2) = 3\lambda_2^3 - 7\lambda_2^2 + \lambda_2 - 2 = 3\cdot 1 - 7\cdot 1 + 1 - 2 = -5.
\]

The eigenvectors \(x_1 = \begin{pmatrix}2\\1\end{pmatrix}\) and \(x_2 = \begin{pmatrix}1\\-1\end{pmatrix}\), which we found for A, are also eigenvectors for the
matrix B. Moreover, x1 corresponds to the eigenvalue 82 and x2 to the eigenvalue −5.
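A quick numerical confirmation of this example (a sketch using NumPy; np.linalg.matrix_power computes integer matrix powers):

import numpy as np

A = np.array([[3.0, 2.0], [1.0, 2.0]])
B = (3 * np.linalg.matrix_power(A, 3)
     - 7 * np.linalg.matrix_power(A, 2)
     + A - 2 * np.eye(2))
print(np.linalg.eigvals(A))   # 4 and 1
print(np.linalg.eigvals(B))   # 82 and -5, as predicted by the spectral mapping theorem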

We can expand the spectral mapping theorem also to other functions besides polynomials.
For example, it also works for the negative powers. This means that we can calculate the
eigenvalues of A−1 if you know the eigenvalues of A. In this case, you do not have to
calculate the inverse A−1 :
Let all the eigenvalues of A be nonzero (in this case, recalling Proposition 6.28 the inverse
A−1 exists) and fix one of them as λ with corresponding eigenvector x 6= o. Then we can
multiply the equation
Ax = λx
from the left by A−1 . Hence we get x = λA−1 x and also:

A−1 x = λ−1 x. (6.5)

Rule of thumb:
A−1 has the same eigenvector x as A – but for the eigenvalue λ−1 instead of λ.

We simply get:
spec(A−1 ) = {λ−1 : λ ∈ spec(A)}.
Of course, λ−1 is always well-defined since λ 6= 0.

Example 6.36. Let \(A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}\). Now, one could calculate the inverse, using formulas from
Chapter 4:
\[
A^{-1} = \begin{pmatrix} 1/2 & -1/2\\ -1/4 & 3/4\end{pmatrix}.
\]
This matrix has the eigenvalues µ1 = 1/4 and µ2 = 1 and the eigenvectors \(x_1 = \begin{pmatrix}2\\1\end{pmatrix}\) and \(x_2 = \begin{pmatrix}1\\-1\end{pmatrix}\).
If one is only interested in the eigenvalues, we do not have to calculate the matrix A−1 .
We just use the rule from above and know that the eigenvalues of A−1 are the reciprocals
of λ1 = 4 and λ2 = 1. The eigenvectors are the same as the eigenvectors of A.

We do not have to stop here. We can multiply A−1 again from the left to equation (6.5)
and, doing this repeatedly, we get
A−2 x = λ−2 x, A−3 x = λ−3 x, etc.,
where A−n means (A−1 )n .
Hence, we can expand the equation (6.4) to all numbers m ∈ Z:

Am x = λm x for all m ∈ Z.

Of course, we can also expand it to linear combinations of . . . , A−2 , A−1 , A0 , A1 , A2 , . . . ,
which shows that our spectral mapping theorem is only a special case of a more general
one.

6.7 Diagonalisation – the optimal coordinates


We started this chapter with a two-dimensional picture. Now, we revisit the 2 × 2 example
\(A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}\). We know that λ1 = 4 and λ2 = 1 are the eigenvalues with associated
eigenvectors \(x_1 = \begin{pmatrix}2\\1\end{pmatrix}\) and \(x_2 = \begin{pmatrix}1\\-1\end{pmatrix}\).

Hence we know the two vectors, x1 and x2 , that span the lines g3 and g4 from the start
of the chapter, respectively. Also, we know the “stretch factors” λ1 = 4 and λ2 = 1 that
describe the acting of fA in the direction g3 and g4 , respectively. Therefore, choosing a
coordinate system with respect to (x1 , x2 )-coordinates makes calculations a lot easier:

Optimal coordinates for A


By using for u ∈ R2 the linear combination u = α1 x1 + α2 x2 with coefficients
α1 , α2 ∈ R, we get

Au = A(α1 x1 + α2 x2 ) = α1 (Ax1 ) + α2 (Ax2 ) = α1 (4x1 ) + α2 (1x2 ) = 4α1 x1 + 1α2 x2 .

The component in x1 -direction, which is α1 , is scaled by the factor λ1 = 4, and the


x2 -component α2 is scaled by the factor λ2 = 1.


We immediately see a big advantage of this coordinate system: We can apply A several
times without effort. For example, the operation A^100 is directly given by
\(A^{100}u = 4^{100}\alpha_1 x_1 + 1^{100}\alpha_2 x_2\).

However, we already know that in general we cannot expect to stay in the real numbers. If
the eigenvalues are strictly complex numbers, the picture gets a little bit more complicated
but the properties remain. Let A ∈ Cn×n be an n × n matrix with complex entries. Let
λ1 , . . . , λn ∈ C denote the n eigenvalues of A (which means the zeros of pA ) counted with
algebraic multiplicities, and let x1 , . . . , xn ∈ Cn be the corresponding eigenvectors. Then
we already know:

Ax1 = λ1 x1 , ... , Axn = λn xn . (6.6)


This is what we can put together into a matrix equation:
   

A x1 · · · xn  = Ax1 · · · Axn 


| {z }
=:X
146 6 Eigenvalues and similar things

    
λ1
(6.6) ..
= λ1 x1 · · · λn xn  = x1 · · · xn   . ,
λn
| {z }
=:D

or in short: AX = XD. This means that A is similar to a diagonal matrix if X is


invertible.
Now we can look what happens if AX and XD act on a given vector v = (α1 · · · αn )T ∈ Cn .
The equation AXv = XDv can be written in the following form:

A(α1 x1 + . . . + αn xn ) = λ1 α1 x1 + . . . + λn αn xn .

Hence, if u ∈ Cn is given as a linear combination α1 x1 + . . . + αn xn using only the eigen-


vectors x1 , . . . , xn and a coefficient vector (α1 · · · αn )T , then Au is also a linear combination
of the eigenvectors x1 , . . . , xn , now with coefficient vector (λ1 α1 · · · λn αn )T . Hence in this
coordinate system A acts in this way:
   
\[
\begin{pmatrix} \alpha_1\\ \vdots\\ \alpha_n\end{pmatrix} \mapsto \begin{pmatrix} \lambda_1\alpha_1\\ \vdots\\ \lambda_n\alpha_n\end{pmatrix}.
\]

This is exactly the multiplication with the diagonal matrix D.


In mathematical terms: By changing the basis of Cn from the canonical unit vectors
(e1 , . . . , en ) to a basis consisting of eigenvectors (x1 , . . . , xn ), the matrix A changes to a
simple diagonal matrix D.

Diagonalisation of A
 

Choose X = x1 · · · xn  and


Canonical basis
 
λ1
... . Then: u A· Au
D=
λn

AX = XD. (6.7)

X −1 · X· X −1 · X·
Multiplication (6.7)·X −1
gives:

A = XDX −1 (6.8)    
α1 λ1 α 1
 ..  D·  .. 
.  . 
and in the same ways X −1
·(6.7)
αn λn α n
gives:
Coordinates w.r.t. basis (x1 , ..., xn )
−1
X AX = D.

The important question “Is that even possible?” is equivalent to the following:

• Can we write all u ∈ Cn as α1 x1 + . . . + αn xn ?


• Span(x1 , . . . , xn ) = Cn ?
• Is (x1 , . . . , xn ) a basis of Cn ?
• Is X invertible?

Definition 6.37. Diagonalisability


A square matrix A ∈ Cn×n is called diagonalisable if one can find n different eigen-
vectors x1 , . . . , xn ∈ Cn that form a basis of Cn .


Example 6.38. (a) The matrix \(A = \begin{pmatrix} 1 & 0\\ 0 & 2\end{pmatrix}\) has e1 and e2 as eigenvectors and they
form a basis of C2 . Hence, A is diagonalisable.

(b) The matrix \(B = \begin{pmatrix} 1 & 1\\ 0 & 2\end{pmatrix}\) has \(\begin{pmatrix}1\\0\end{pmatrix}\) and \(\begin{pmatrix}1\\1\end{pmatrix}\) as eigenvectors and they form a basis of
C2 . Hence, B is diagonalisable.

(c) The matrix \(C = \begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix}\) has only eigenvectors in direction \(\begin{pmatrix}1\\0\end{pmatrix}\) and they cannot
form a basis of C2 . Hence, C is not diagonalisable.

Choosing a basis consisting of eigenvectors, we know that A acts like a diagonal matrix.

Proposition 6.39. Different eigenvalues ⇒ linearly ind. eigenvectors


If λ1 , . . . , λk are k different eigenvalues of A, then each family (x1 , . . . , xk ) of cor-
responding eigenvectors is linearly independent.

Proof. We use mathematical induction over k. The case k = 2 was proven Exercise 6.33.
Now the induction hypothesis is that (x1 , . . . , xk ) is linearly independent for k different ei-
genvalues. In the induction step, we now look at k+1 different eigenvalues λ1 , . . . , λk , λk+1
and corresponding eigenvectors (x1 , . . . , xk , xk+1 ). Now assume that this family is linearly
dependent. Then we can choose coefficients βi such that xk+1 = β1 x1 + · · · + βk xk holds.
Hence,

(λk+1 − λ1 )β1 x1 + · · · + (λk+1 − λk )βk xk


= λk+1 (β1 x1 + · · · + βk xk ) − (λ1 β1 x1 + · · · + λk βk xk )
= λk+1 xk+1 − Axk+1 = o

By the induction hypothesis, we conclude (λk+1 − λi )βi = 0 for all i. Since not all βi can
be zero, we find at least one j ∈ {1, . . . , k} with λk+1 = λj , which is a contradiction.

Therefore, we conclude:

Corollary 6.40. n different eigenvalues ⇒ diagonalisable


If A ∈ Cn×n has n different eigenvalues, then A is a diagonalisable.

Proof. A linearly independent family of n eigenvectors forms a basis for Cn .



Example 6.41. (a) \(A = \begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix}\) has eigenvalues λ1 = 4 and λ2 = 1. Corollary 6.40
tells us that A is diagonalisable. We also verify this by looking at the eigenvectors
\(x_1 = \begin{pmatrix}2\\1\end{pmatrix}\) and \(x_2 = \begin{pmatrix}1\\-1\end{pmatrix}\), which form a basis of C2 . Hence,
\[
\begin{pmatrix} 3 & 2\\ 1 & 2\end{pmatrix} = A = XDX^{-1}
= \begin{pmatrix} 2 & 1\\ 1 & -1\end{pmatrix}
\begin{pmatrix} 4 & 0\\ 0 & 1\end{pmatrix}
\begin{pmatrix} 2 & 1\\ 1 & -1\end{pmatrix}^{-1}.
\]

(b) The 90◦ -rotation \(A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}\) has eigenvalues λ1,2 = ±i. From \(A - \lambda_{1,2}\, 1 = \begin{pmatrix} \mp i & -1\\ 1 & \mp i\end{pmatrix}\) we
conclude the eigenvectors \(x_1 = \begin{pmatrix}i\\1\end{pmatrix}\) and \(x_2 = \begin{pmatrix}1\\i\end{pmatrix}\), which span C2 . Hence,
\[
\begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix} = A = XDX^{-1}
= \begin{pmatrix} i & 1\\ 1 & i\end{pmatrix}
\begin{pmatrix} i & 0\\ 0 & -i\end{pmatrix}
\begin{pmatrix} i & 1\\ 1 & i\end{pmatrix}^{-1}.
\]
X and D are strictly complex, while A is a real matrix.

(c) Look at the 3 × 3 matrices:
\[
A = \begin{pmatrix} 4 & 0 & 0\\ 1 & 6 & 3\\ -2 & -4 & -2\end{pmatrix}
\qquad\text{and}\qquad
B = \begin{pmatrix} 8 & 8 & 4\\ -1 & 2 & 1\\ -2 & -4 & -2\end{pmatrix}.
\]
If you calculate the characteristic polynomials, you find

\[
p_A(\lambda) = -\lambda^3 + 8\lambda^2 - 16\lambda = -\lambda(\lambda - 4)^2 = p_B(\lambda)
\]

and, hence, the same eigenvalues λ1 = 0, λ2 = 4 and λ3 = 4. The eigenvalues
λ2,3 = 4 have algebraic multiplicity 2. Since λ1 and λ2,3 are different, we know by
Proposition 6.39 that we have at least two linearly independent eigenvectors: one
corresponding to λ1 = 0 and one corresponding to λ2,3 = 4. If we actually find three
linearly independent vectors, the eigenvalue λ2,3 = 4 is the crucial one.
For A, the eigenspaces are:
\[
\operatorname{Ker}(A - \lambda_1 1) = \operatorname{Span}\Big(\begin{pmatrix}0\\-1\\2\end{pmatrix}\Big)
\qquad\text{and}\qquad
\operatorname{Ker}(A - \lambda_2 1) = \operatorname{Span}\Big(\begin{pmatrix}2\\-1\\0\end{pmatrix}, \begin{pmatrix}-1\\2\\-1\end{pmatrix}\Big).
\]
However, for B the eigenspaces are
\[
\operatorname{Ker}(B - \lambda_1 1) = \operatorname{Span}\Big(\begin{pmatrix}0\\-1\\2\end{pmatrix}\Big)
\qquad\text{and}\qquad
\operatorname{Ker}(B - \lambda_2 1) = \operatorname{Span}\Big(\begin{pmatrix}2\\-1\\0\end{pmatrix}\Big).
\]
While A has three different directions for eigenvectors and is diagonalisable, the matrix
B has for λ2,3 = 4 only one direction for eigenvectors. There are too few vectors for
a basis and B is not diagonalisable.

(d) Let \(A = \begin{pmatrix} 0 & 1\\ 0 & 0\end{pmatrix}\). Using Proposition 6.9, the eigenvalues are λ1 = λ2 = 0. Hence,
\(D = \begin{pmatrix} \lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix} = \begin{pmatrix} 0 & 0\\ 0 & 0\end{pmatrix} = 0\). If A was diagonalisable, then there would be an X
with A = XDX −1 = X0X −1 = 0. Contradiction to A 6= 0. Therefore, A cannot
be diagonalisable. In fact, \(\operatorname{Ker}(A - 0\cdot 1) = \operatorname{Ker}(A) = \operatorname{Span}\big(\begin{pmatrix}1\\0\end{pmatrix}\big)\). Alternatively:
dim(Ker(A − 0 · 1)) = dim(Ker(A)) = 2 − rank(A) = 2 − 1 = 1.
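A numerical companion to part (c); this is a sketch that checks diagonalisability by computing the geometric multiplicity of the critical eigenvalue 4 directly from the rank, which is more robust than counting numerically computed eigenvectors.

import numpy as np

A = np.array([[4.0, 0.0, 0.0], [1.0, 6.0, 3.0], [-2.0, -4.0, -2.0]])
B = np.array([[8.0, 8.0, 4.0], [-1.0, 2.0, 1.0], [-2.0, -4.0, -2.0]])

for name, M in (("A", A), ("B", B)):
    # geometric multiplicity of the eigenvalue 4:  gamma(4) = 3 - rank(M - 4*1)
    gamma = 3 - np.linalg.matrix_rank(M - 4 * np.eye(3))
    print(name, "gamma(4) =", gamma)
# A: gamma(4) = 2 = alpha(4)  -> A is diagonalisable
# B: gamma(4) = 1 < 2 = alpha(4) -> B is not diagonalisable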

Reminder: Algebraic and geometric multiplicity


For each eigenvalue λ of A we consider
• the algebraic multiplicity of λ, denoted by α(λ), given by the multiplicity of λ
as zero of pA , and
• the geometric multiplicity of λ, denoted by γ(λ), given by the dimension of

the eigenspace Ker(A − λ1).

For A from Example 6.41 (c), we find α(0) = 1 = γ(0), α(4) = 2 = γ(4).
For B from Example 6.41 (c), we get α(0) = 1 = γ(0), α(4) = 2 6= 1 = γ(4).

Proposition 6.42. Algebraic vs. geometric multiplicity


Let A ∈ Cn×n be a square matrix, and let λ1 , . . . , λk ∈ C be all eigenvalues of A
(not counted with multiplicities). Then:

(a) α(λ1 ) + . . . + α(λk ) = n.

(b) For all i = 1, . . . , k, we have 1 ≤ γ(λi ) ≤ α(λi ).

Therefore, the following claims are equivalent:

(a) A is diagonalisable,

(b) γ(λ1 ) + . . . + γ(λk ) = n,

(c) γ(λi ) = α(λi ) for all i = 1, . . . , k.

Proof. Exercise.

For symmetric or selfadjoint matrices, we can improve Proposition 6.39 even more:

Proposition 6.43. A=A∗ : orthogonal eigenvectors


Let A ∈ Cn×n be selfadjoint, which means A = A∗ , and let λ, λ0 ∈ C be two
different eigenvalues of A with corresponding eigenvectors x and x0 , respectively.
Then x ⊥ x0 .

Proof. Since \(\langle x, \lambda' x'\rangle \overset{(S4)}{=} \overline{\langle \lambda' x', x\rangle} = \overline{\lambda'}\,\overline{\langle x', x\rangle} \overset{(S4)}{=} \overline{\lambda'}\,\langle x, x'\rangle\), we have
\[
\lambda\langle x, x'\rangle = \langle \lambda x, x'\rangle = \langle Ax, x'\rangle \overset{A=A^*}{=} \langle x, Ax'\rangle = \langle x, \lambda' x'\rangle
\overset{\text{see above}}{=} \overline{\lambda'}\,\langle x, x'\rangle \overset{\text{Prop. 6.24}}{=} \lambda'\langle x, x'\rangle
\]
and, hence, \((\lambda - \lambda')\langle x, x'\rangle = 0\). This means that the second factor has to be zero.

Proposition 6.44. A=A∗ : diagonalisable - ONB of eigenvectors


Let A ∈ Cn×n be selfadjoint, which means A = A∗ . Then A is diagonalisable, and
there is an ONB (x1 , . . . , xn ) for Cn consisting of eigenvectors of A. The matrix
\(X = \begin{pmatrix} x_1 & \cdots & x_n\end{pmatrix}\)

is unitary, i.e. X −1 = X ∗ . Therefore, we have:

A = XDX −1 = XDX ∗ and D = X −1 AX = X ∗ AX . (6.9)



Sketch of proof. Use Proposition 6.43 and Gram-Schmidt for each eigenspace to find an
ONB of Cn . Then X ∗ X = 1 and also X ∗ = X −1 .

Remark: Important!
Actually, we could generalise the Proposition from above and equation (6.9). It
holds if and only if the matrix A is normal (i.e. AA∗ = A∗ A).

Proposition 6.45.
For a diagonalisable A ∈ Cn×n , let λ1 , . . . , λn be the eigenvalues counted with algeb-
raic multiplicities. Then
\[
\det(A) = \prod_{i=1}^{n} \lambda_i \qquad\text{and}\qquad \operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i ,
\]

where \(\operatorname{tr}(A) := \sum_{j=1}^{n} a_{jj}\) is the sum of the diagonal, the so-called trace of A.

Proof. Exercise!

Remark:
Later, we will see that the result of Proposition 6.45 actually holds for all matrices
A ∈ Cn×n .

6.8 Some applications


Here, we look at some of very many possible applications.

Rotation of boxes
A box of the size 10 cm × 20 cm × 30 cm rotates around an axis given by the vector
ω ∈ R³. The whole box has an angular momentum L ∈ R³. The vector L is given by
a linear equation in ω, which means

L = Jω

with a symmetric matrix J ∈ R³ˣ³, which is called the inertia tensor of the box. The
rotation “wobbles” if L, which means Jω, is not parallel to the rotation axis ω. Of
course, we have three special rotation axes given by the eigenvectors of J. They are
called the principal axes of the box.

Curves and areas

Which points x = (x, y)ᵀ ∈ R² satisfy the equation 3x² + 2√3·xy + y² + x − √3·y = 2?

Solution: Rewrite the equation as a vector-matrix equation

2 = 3x² + 2√3·xy + y² + x − √3·y = xᵀA x + bᵀx   with   A := ( 3 √3 ; √3 1 ) (= Aᵀ)   and   bᵀ := (1 −√3),

and diagonalise the symmetric matrix A: λ₁ = 4, λ₂ = 0, x₁ = ½(√3, 1)ᵀ, x₂ = ½(−1, √3)ᵀ, so

( 3 √3 ; √3 1 ) = A = XDX∗ = XDXᵀ   with   X = ½( √3 −1 ; 1 √3 ),  D = ( 4 0 ; 0 0 ).

Then we get 2 = xᵀAx + bᵀx = xᵀ(XDXᵀ)x + bᵀx = (xᵀX)D(Xᵀx) + bᵀX(Xᵀx).
Setting (u, v)ᵀ = u := Xᵀx = ½( √3 1 ; −1 √3 )(x, y)ᵀ simplifies the equation to

2 = uᵀDu + bᵀXu = (u v)( 4 0 ; 0 0 )(u, v)ᵀ + (0 −2)(u, v)ᵀ = 4u² − 2v.

The more complicated equation from above looks a lot simpler in the “optimal”
(x₁, x₂)-coordinate system: 2 = 4u² − 2v, i.e. v = 2u² − 1. There you immediately see
that it is a parabola. The transformation we did, (x, y)ᵀ = x ↦ (u, v)ᵀ = u = Xᵀx,
was just a rotation by 30°.
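Not part of the notes: the diagonalisation used above can be reproduced numerically. Note that numpy.linalg.eigh sorts the eigenvalues in ascending order and fixes eigenvector signs arbitrarily, so the output may differ from the hand computation by order and sign.

import numpy as np

# The symmetric matrix A and the vector b of the quadric 3x^2 + 2*sqrt(3)*xy + y^2 + x - sqrt(3)*y = 2.
s = np.sqrt(3.0)
A = np.array([[3.0, s], [s, 1.0]])
b = np.array([1.0, -s])

lam, X = np.linalg.eigh(A)     # eigenvalues approximately [0, 4], eigenvectors as orthonormal columns
print(lam)
print(np.allclose(X @ np.diag(lam) @ X.T, A))   # A = X D X^T
print(X.T @ b)                 # the linear part b^T X in the new (u, v)-coordinates, up to order/sign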

A simple criterion for definiteness

For n = 2 we have det(A) = λ₁λ₂.
• det(A) > 0 ⇒ the eigenvalues have the same sign ⇒ A is (positive or negative) definite. If
a₁₁ = e₁ᵀAe₁ > 0, then it is positive definite, otherwise negative definite.
• det(A) < 0 ⇒ A is indefinite.
In general: A symmetric matrix A is positive definite if and only if all left upper subdeterminants (leading principal minors) are positive.

Summary
• All matrices A we considered here were square matrices.
• A vector x 6= o, which A only scales, which means Ax = λx, is called an eigenvector ;

the corresponding scaling factor λ is called an eigenvalue. The set of all eigenvalues
of A is called the spectrum.
• λ is an eigenvalue of A if and only if (A − λ1)x = o has non-trivial solutions x 6= o
(namely the eigenvectors). This is fulfilled if and only if det(A − λ1) = 0.
• For A ∈ Cn×n , we define pA (λ) := det(A − λ1), the characteristic polynomial of A,
which is a polynomial of degree n in the variable λ. It has exactly n complex zeros:
the eigenvalues of A.
• The eigenvalues λ are in general complex numbers, also the eigenvectors are complex
x ∈ Cn . All matrices should be considered as A ∈ Cn×n .
• Also in Cⁿ, we can define inner products. Here, we only use the standard inner
product ⟨x, y⟩, defined by x₁ȳ₁ + · · · + xₙȳₙ (the bar denotes complex conjugation). Hence we get a new operation for
matrices: A∗ := Āᵀ = (āⱼᵢ), the conjugate transpose. It satisfies ⟨Ax, y⟩ = ⟨x, A∗y⟩ for all x, y.
• Checking eigenvalues: Product of all eigenvalues of A is equal to det(A); the sum is
equal to tr(A).
• The matrix A is invertible if and only if all eigenvalues are nonzero.
• The eigenvalues of a triangular matrix are the diagonal entries. The eigenvalues of
a block matrix in triangular form are given by the eigenvalues of the blocks on the
diagonal.
• The eigenvalues of Am are given by the eigenvalues of A to the power of m where
m ∈ Z. The eigenvectors stay the same as for A. For example, 3A17 − 2A3 + 5A−6
has the eigenvalues 3λ17 − 2λ3 + 5λ−6 , where λ goes through all eigenvalues of A.
The eigenvectors still stay the same.
• A is called diagonalisable if it can be written as XDX⁻¹, where D is a diagonal
matrix consisting of eigenvalues of A and X has corresponding eigenvectors as its columns. This
only works if there are enough eigenvector directions such that X is invertible.
• A is diagonalisable if and only if for all eigenvalues λ the algebraic multiplicity of λ
is the same as the geometric multiplicity.
• If A = A∗ , then A is diagonalisable and the eigenvalues are real and eigenvectors
can be chosen to be orthonormal.
7 General vector spaces
’Oh man, capitalism sucks!’ cries the Kangaroo as it flips the Monopoly
board.
Marc-Uwe Kling

We already mentioned the general definition of an abstract vector space. However, most
of the time, we focused on the vector spaces Rn and Cn . Now, we really start walking
into this new abstract terrain.

7.1 Vector space in its full glory


We already considered different vector spaces in the previous chapters. Recall that a
vector space is a set where scaling and adding makes sense. The standard example was
always Rn . However, we also saw that the matrices Rm×n form the same structure and
then we expanded this notion to Cn and Cm×n . In this section, we will finally consider all
these spaces in a general context and get a lot more different examples.
We will also recap important definitions like linear combinations, Span(. . .), basis, dimen-
sion, linearity and other things we learnt in Linear Algebra, however, now in a general
and abstract context.
The vector space needs only a notion of adding elements and scaling elements. This scalar
for stretching a vector was either a real number or a complex number. We put both cases
together and use the symbol F that stands either for R or C and stands for a so-called
number field.

Definition 7.1. Real or complex vector spaces


Let F be either R or C. A nonempty set V together with two operations,
• a vector addition + : V × V → V ,
• and a scalar multiplication · : F × V → V ,
where the rules below are satisfied, is called an F-vector space. The elements of V
are called vectors, and the elements of F are called scalars. The two operations have


to satisfy the following rules:

(1) ∀v, w ∈ V : v+w =w+v (+ is commutative)


(2) ∀u, v, w ∈ V : u + (v + w) = (u + v) + w (+ is associative)
(3) There is a zero vector o ∈ V with the property: ∀v ∈ V we have v + o = v.
(4) For all v ∈ V there is a vector −v ∈ V with v + (−v) = o.
(5) For the number 1 ∈ F and each v ∈ V , one has: 1 · v = v.
(6) ∀λ, µ ∈ F ∀v ∈ V : λ · (µ · v) = (λµ) · v (· is associative)
(7) ∀λ ∈ F ∀v, w ∈ V : λ · (v + w) = (λ · v) + (λ · w) (distributive ·+)
(8) ∀λ, µ ∈ F ∀v ∈ V : (λ + µ) · v = (λ · v) + (µ · v) (distributive +·)

Example 7.2. Rn and Cn . At this point, we are very familiar with the space Fn , where
the vectors have n components consisting of numbers from F and the addition and scalar
multiplication is done componentwise:
     
λ ∈ F, v = (v₁, . . . , vₙ)ᵀ  ⇒  λv := (λv₁, . . . , λvₙ)ᵀ,

u = (u₁, . . . , uₙ)ᵀ, v = (v₁, . . . , vₙ)ᵀ  ⇒  u + v := (u₁ + v₁, . . . , uₙ + vₙ)ᵀ.

This is now just a special case of an F-vector space.

Of course: Vectors from Rn and Cn are also vectors in this new sense. However, now we
have a lot more examples:

Example 7.3. Matrices. The set of matrices V := Fm×n together with the matrix
addition and scalar multiplication, defined entry by entry as

A + B := (aᵢⱼ + bᵢⱼ)ᵢⱼ   and   λ·A := (λaᵢⱼ)ᵢⱼ   for A = (aᵢⱼ), B = (bᵢⱼ) ∈ Fᵐˣⁿ and λ ∈ F,

also defines an F-vector space.

Example 7.4. Functions. Let F(R) be the set of functions f : R → R.


For all α ∈ R and f , g ∈ F(R) define α · f and f + g by

(α · f )(x) := α · f (x)
(f + g)(x) := f (x) + g(x)

for all x ∈ R.

(Sketch: the graphs of arctan, cos, the sum arctan + cos and the scaled function 2 · cos.)

This is a natural definition for the α-multiple of a function and the sum of two functions.
Obviously, α · f and f + g are again well-defined functions R → R and hence elements in
F(R). Now we have to check the rules (1)–(8). This is an exercise.
Hence, F(R) with + and · is an F-vector space.

So, we see functions also as vectors since we have the same calculation rules.
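A small illustration (not from the notes): the pointwise operations of Example 7.4 can be written directly in Python, treating functions as objects that can be added and scaled.

import math

def add(f, g):
    """Return the function f + g, defined pointwise."""
    return lambda x: f(x) + g(x)

def scale(alpha, f):
    """Return the function alpha * f, defined pointwise."""
    return lambda x: alpha * f(x)

h = add(math.atan, math.cos)   # the function arctan + cos
k = scale(2.0, math.cos)       # the function 2 * cos

print(h(1.0), math.atan(1.0) + math.cos(1.0))   # same value
print(k(1.0), 2.0 * math.cos(1.0))              # same value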

Lemma 7.5. o=0 · f and −f =(−1) · f


Let V be an F-vector space with the operations + and ·. For all f ∈ V , we have

o=0·f and − f = (−1) · f .

Proof. By (8), we have 0 · f = (0 + 0) · f = (0 · f ) + (0 · f ). Adding the vector −(0 · f ) on both sides,
we get o = 0 · f . For the second claim, we have, by (8) and (5),

o = 0 · f = (1 + (−1)) · f = (1 · f ) + ((−1) · f ) = f + ((−1) · f ).

Adding −f on both sides, we get −f = (−1) · f .

7.2 Linear subspaces


We already defined the term linear subspace in Rn and Cn and, of course, it can now be
generalised for the general vector spaces.
Let us look at a special subset from Example 7.4:

Example 7.6. Polynomial functions. Let P(R) denote the set of polynomial functions
f : R → R. We know that P(R) is a nonempty subset of F(R) (set of all functions
R → R) from Example 7.4. The addition + and scalar multiplication · are just inherited
from F(R).

Is P(R) also a vector space? Before checking (1)–(8), we have to prove that the vector
addition + and the scalar multiplication · are well-defined inside P(R), which means that you
cannot leave P(R) by these operations:

f + g and α · f lie in P(R) for arbitrary f , g ∈ P(R) and α ∈ R. (7.1)

We already know equation (7.1): It means that P(R) is closed under the addition + and
the scalar multiplication ·. We can easily show that (7.1) is correct for polynomials.
Now checking (1)–(8) is very fast because:

(1) f + g = g + f, F(R) all functions


(2) f + (g + h) = (f + g) + h,
(5) 1 · f = f, P(R)
all polynomial
(6) α · (β · f ) = (αβ) · f ,
functions
(7) α · (f + g) = (α · f ) + (α · g),
f
(8) (α + β) · f = (α · f ) + (β · f ) g
hold for all f , g, h ∈ F(R) and α, β ∈ F = R. f +g
Hence, they also hold for f , g, h ∈ P(R) ⊂ F(R). Great!
One also says: “P(R) inherits (1),(2),(5)–(8) from F(R).” 3·f
And what is about property (3) o ∈ P(R) and property
(4) ∀f ∈ P(R) : −f ∈ P(R)?
Both immediately follow from Lemma 7.5 and (7.1)!

This finishes the proof that P(R) is a vector space with respect to + and ·.

Notice that this proof could be done without much work. The only time we had
to look at P(R) itself, which means at the polynomial functions, was for equation (7.1). All the
other properties are inherited from the superset F(R).

Proposition & Definition 7.7. Linear subspace


Let V be an F-vector space and let U be a non-empty subset of V , which is closed
under vector addition and scalar multiplication of V , which means

(a) for all u, v ∈ U , we have u + v ∈ U and

(b) for all α ∈ F and u ∈ U , we have α · u ∈ U .

Then U is also an F-vector space. In this case, U is called a linear subspace of V
or, in short, a subspace of V .

Recall that we introduced the notion of a subspace before for V = Rn and V = Cn . In


the same way, we know that U = {o} and U = V are always subspaces of V . However,
in all non-trivial cases, we have more than these two subspaces.

Example 7.8. Quadratic polynomials. Let P2 (R) be the set of all polynomials with
degree ≤ 2, which means
all functions p : R → R, p(x) = a2 x2 + a1 x + a0 with a2 , a1 , a0 ∈ R.

Is P2 (R) with the vector addition + and · from F(R) a vector space?
Obviously, P2 (R) ⊂ F(R) and P2 (R) 6= ∅. Using Proposition 7.7 we only have to check
that P2 (R) is closed under + and ·, which means that we have to check (a) and (b):
Let p, q ∈ P2 (R) and α ∈ R. Then, there are a2 , a1 , a0 , b2 , b1 , b0 ∈ R such that

p(x) = a2 x2 + a1 x + a0 and q(x) = b2 x2 + b1 x + b0 .

Hence:
(p + q)(x) = p(x) + q(x) = (a2 x2 + a1 x + a0 ) + (b2 x2 + b1 x + b0 )
= (a2 + b2 )x2 + (a1 + b1 )x + (a0 + b0 ),
(α · p)(x) = α · p(x) = α · (a2 x2 + a1 x + a0 )
= (αa2 )x2 + (αa1 )x + (αa0 )

We conclude that p + q ∈ P2 (R) and α · p ∈ P2 (R). The set P2 (R) is a subspace of F(R)
and a vector space for its own.

Analogously, for n ∈ N0 , we define Pn (R) as the set of all polynomials with degree ≤ n.
It forms again a vector space with the operations + and · from F(R).
Here is another vector space:

Example 7.9. Upper triangular matrices. Let n ∈ N and let Rⁿˣⁿ_up denote the set of all upper
triangular matrices A ∈ Rⁿˣⁿ. The operations + and · are the same as before for all
matrices. Since o ∈ Rⁿˣⁿ_up ≠ ∅, the sum of two upper triangular matrices is again an upper
triangular matrix and scaled upper triangular matrices are again upper triangular matrices, we know
that Rⁿˣⁿ_up is a subspace of the R-vector space Rⁿˣⁿ (from Example 7.3) and hence a
vector space itself.

Example 7.10. The set U of all matrices of the form

( a 0 a ; 0 b 0 ; a 0 a )   with a, b ∈ C

is closed under matrix addition and the multiplication with scalars α ∈ C. Therefore, U is a subspace
of C³ˣ³ and a C-vector space for itself.

If we look back at the polynomial spaces, we notice that
we have the following inclusions:

P0 (R) ⊂ P1 (R) ⊂ P2 (R) ⊂ · · · ⊂ P(R) ⊂ F(R).

(Sketch: these spaces drawn as nested sets inside F(R).)

The vector space F(R) is the largest of these. The polynomial space Pn (R) gets smaller if we choose n smaller.
When we talk about sizes of vector spaces, we remember
the definition of the dimension of a subspace. We expect
that the dimension of Pn (R) is n + 1. But first of all, we
have to define all the notions again.

7.3 Recollection: basis, dimension and other stuff


Let F ∈ {R, C} and V be an F-vector space with vector addition + and scalar multiplic-
ation ·.
As we did for Rn and later for Cn , we introduce notions like linear independence, basis,
dimension and related definitions. Even though we now consider abstract vector spaces, the
notions still work exactly the same.

Definition 7.11. Same as before: Basis, dimension, and so on


Let V be an F-vector space with operations + and ·.

• For k ∈ N, vectors v₁, . . . , vₖ ∈ V and scalars α₁, . . . , αₖ ∈ F the vector

α₁v₁ + . . . + αₖvₖ = ∑_{i=1}^{k} αᵢvᵢ ∈ V

is called a linear combination of the vectors v1 , . . . , vk .

• The set of all possible linear combinations for the vectors of a subset M ⊂ V
is called the linear hull or span of M :

Span (M ) := {λ1 u1 + · · · + λk uk : u1 , . . . , uk ∈ M , λ1 , . . . , λk ∈ F , k ∈ N} .

• A family E = (v1 , . . . , vk ) consisting of k vectors from V is called a generating


system for the subspace U ⊂ V , if U = Span(v1 , . . . , vk ).

• A family E = (v1 , . . . , vk ) consisting of k vectors from V is called linearly


dependent if o can be represented by a non-trivial linear combination of vectors
from E. If there is no such non-trivial linear combination, the family is called
linearly independent.

• A family E that is a generating system for U ⊂ V and linearly independent is


called a basis of U .

• The number of elements for a basis of U is called the dimension of U . We


just write dim(U ).

The definitions and proofs for related propositions are literally the same as in Section 3.7
for the vector space Rn and its subspace. Therefore, we just summarise the facts in this
more abstract case:

Rule of thumb: Basis, dimension and similar things


• A generating family E = (v₁, . . . , vₖ) of U is called this way because with linear
combinations of vectors from E we can reach each point in U and no other
points.

• A family E is linearly independent if we need all “family members” to span (or


generate) the subspace Span(E).

• A basis B of U is a smallest generating system of U . (We cannot omit a vector from


B.)

• The dimension of a subspace U


= the number of elements of a basis of U . (All bases have the same number
of elements, just redo the proof in Proposition 3.25.)
= the smallest possible size for a generating system of U . (With less vectors
it is not possible to span the whole space U .)
= the maximal number of vectors from U that form a linearly independent
family. (If you choose more vectors, they are always linearly dependent.)

Let us look at different examples:

Example 7.12. – Matrix vector spaces


(a) The vector space C²ˣ³ of all complex 2 × 3-matrices can be written in the following
way:

C²ˣ³ = { ( α β γ ; δ ε φ ) : α, β, γ, δ, ε, φ ∈ C }
     = { α·( 1 0 0 ; 0 0 0 ) + β·( 0 1 0 ; 0 0 0 ) + γ·( 0 0 1 ; 0 0 0 ) + δ·( 0 0 0 ; 1 0 0 ) + ε·( 0 0 0 ; 0 1 0 ) + φ·( 0 0 0 ; 0 0 1 ) : α, β, γ, δ, ε, φ ∈ C }
     = Span( ( 1 0 0 ; 0 0 0 ), ( 0 1 0 ; 0 0 0 ), ( 0 0 1 ; 0 0 0 ), ( 0 0 0 ; 1 0 0 ), ( 0 0 0 ; 0 1 0 ), ( 0 0 0 ; 0 0 1 ) ).

Hence

B = ( ( 1 0 0 ; 0 0 0 ), ( 0 1 0 ; 0 0 0 ), ( 0 0 1 ; 0 0 0 ), ( 0 0 0 ; 1 0 0 ), ( 0 0 0 ; 0 1 0 ), ( 0 0 0 ; 0 0 1 ) )

is a generating system for C²ˣ³. B is also linearly independent: From

α·( 1 0 0 ; 0 0 0 ) + β·( 0 1 0 ; 0 0 0 ) + γ·( 0 0 1 ; 0 0 0 ) + δ·( 0 0 0 ; 1 0 0 ) + ε·( 0 0 0 ; 0 1 0 ) + φ·( 0 0 0 ; 0 0 1 ) = ( α β γ ; δ ε φ ) = o,

we conclude α = β = γ = δ = ε = φ = 0. Hence, B is a basis of C²ˣ³ and the
dimension of C²ˣ³ is |B| = 6. Analogously, one can prove: dim(Fᵐˣⁿ) = m · n.

(b) In a similar way, we can prove that

B = ( ( 1 0 ; 0 0 ), ( 0 1 ; 0 0 ), ( 0 0 ; 0 1 ) )   (7.2)

forms a basis of the space R²ˣ²_up of upper triangular 2 × 2-matrices. Hence:

{ ( α β ; 0 γ ) : α, β, γ ∈ R } = { α·( 1 0 ; 0 0 ) + β·( 0 1 ; 0 0 ) + γ·( 0 0 ; 0 1 ) : α, β, γ ∈ R }.

We conclude: dim(R²ˣ²_up) = 2 + 1 = 3.
Analogously, for given n ∈ N, one can prove dim(Rⁿˣⁿ_up) = n + (n − 1) + . . . + 1 = n(n + 1)/2.

(c) As a special vector space, we look at:

U = { ( α 0 α ; 0 β 0 ; α 0 α ) : α, β ∈ C } = { α·A + β·B : α, β ∈ C } = Span(A, B),   (7.3)

where A := ( 1 0 1 ; 0 0 0 ; 1 0 1 ) and B := ( 0 0 0 ; 0 1 0 ; 0 0 0 ).

Hence, B := (A, B) is a generating system for U. Again, we show that B is also
linearly independent. From αA + βB = o, one gets

( α 0 α ; 0 β 0 ; α 0 α ) = αA + βB = o = ( 0 0 0 ; 0 0 0 ; 0 0 0 ),

and concludes α = β = 0. Therefore, B is a basis of U and dim(U) = 2.

However, the examples above were all well-known matrix spaces. It would be more in-
teresting to look at our new function spaces like the polynomial space P2 (R) from Ex-
ample 7.8:

Example 7.13. – Polynomial space P2 (R). We define the special polynomials


m0 , m1 , m2 ∈ P2 by
m0 (x) := 1, m1 (x) := x, and m2 (x) := x2 for all x ∈ R
and see:
P2 (R) = {x 7→ a2 x2 + a1 x + a0 : a2 , a1 , a0 ∈ R} = {a2 m2 + a1 m1 + a0 m0 : a2 , a1 , a0 ∈ R}
= Span(m0 , m1 , m2 )
Hence, B := (m0 , m1 , m2 ) is a generating system for P2 (R).
In order to show that B is also linearly independent, we choose a linear combination
αm0 + βm1 + γm2 for the zero vector o and find conditions for possible coefficients
α, β, γ ∈ R. Note that the zero vector o in this vector space is the zero function, which
means the function x 7→ 0. In other words, the equation αm0 + βm1 + γm2 = o should
be read as: For all x ∈ R, we have
α + βx + γx2 = 0. (7.4)
Since the equation (7.4) above holds for all x ∈ R, we can choose some suitable values
for x. We start with trying x = 0, x = 1 and x = −1. The equation (7.4) gives us then
three different linear equations:

x=0: α + β·0 + γ·0 = 0 
x=1: α + β·1 + γ·1 = 0 (7.5)
x = −1 : α + β · (−1) + γ · 1 = 0

Indeed, (7.5) is an LES with 3 equations and 3 unknowns α, β, γ, which we can just solve
with our known methods. We get α = β = γ = 0. Hence, B = (m0 , m1 , m2 ) is linearly
independent and a basis of P2 (R). We also get dim(P2 (R)) = 3.
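Not from the notes: the 3 × 3 system (7.5) can also be checked numerically; since the coefficient matrix has full rank, α = β = γ = 0 is the only solution.

import numpy as np

# Rows correspond to the chosen values x = 0, 1, -1; columns to the unknowns alpha, beta, gamma.
M = np.array([[1.0,  0.0, 0.0],
              [1.0,  1.0, 1.0],
              [1.0, -1.0, 1.0]])

print(np.linalg.matrix_rank(M))            # 3, so M is invertible ...
print(np.linalg.solve(M, np.zeros(3)))     # ... and the only solution is alpha = beta = gamma = 0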

Proposition & Definition 7.14. Monomial basis of Pn (R)


Let n ∈ N0 . The particular polynomials m0 , m1 , . . . , mn ∈ Pn (R) defined by

m0 (x) = 1, m1 (x) = x, ..., mn−1 (x) = xn−1 , mn (x) = xn for all x ∈ R

are called monomials. The family B = (m0 , m1 , . . . , mn ) forms a basis of Pn (R)


and is called the monomial basis. Hence dim(Pn (R)) = n + 1.

Sketch of proof. This works the same as for P2 (R), see Example 7.13. In order to show the
linear independence, we have to choose n+1 different values for x. The (n+1)×(n+1)-LES
always has a unique solution.

Now, by knowing that the monomials are linearly independent, we can always solve equa-
tions by equating coefficients:

Corollary 7.15. The method of equating the coefficients


Let p and q be two real polynomials with degree n ∈ N, which means

p(x) = an xn + . . . + a1 x + a0 and q(x) = bn xn + . . . + b1 x + b0

for some coefficients an , . . . , a1 , a0 , bn , . . . , b1 , b0 ∈ R.


If we have the equality p = q, which means

an x n + . . . + a1 x + a0 = b n x n + . . . + b 1 x + b 0 , (7.6)

for all x ∈ R, then we can conclude an = bn , . . . , a1 = b1 and a0 = b0 .

Proof. Equation (7.6) means (an − bn )mn + . . . + (a1 − b1 )m1 + (a0 − b0 )m0 = o. Because
of the linear independence, we have (an − bn ) = . . . = (a1 − b1 ) = (a0 − b0 ) = 0.

Remark:
Since dim(Pn (R)) = n + 1 and we have the inclusions

P0 (R) ⊂ P1 (R) ⊂ P2 (R) ⊂ · · · ⊂ P(R) ⊂ F(R) ,

we conclude that dim(P(R)) and dim(F(R)) cannot be finite natural numbers. Sym-
bolically, we write dim(P(R)) = ∞ in such a case.

7.4 Coordinates with respect to a basis

7.4.1 Basis implies coordinates


Again, we deal with the case F = R and F = C simultaneously. Therefore, let V be an
F-vector space with the two operations + and ·. Let also n := dim(V ) < ∞ and choose
a basis B = (b1 , . . . , bn ) of V .

Since B is a generating system and linearly independent, each v from V has a linear
combination
v = α1 b1 + . . . + αn bn (7.7)
where the coefficients α1 , . . . , αn ∈ F are uniquely determined. We call these numbers
the coordinates of v with respect to the basis B and sometimes write vB for the vector
consisting of these numbers:

A vector v in V and its coordinate vector vB in Fn


 
v = α₁b₁ + . . . + αₙbₙ ∈ V   ←→   v_B = (α₁, . . . , αₙ)ᵀ ∈ Fⁿ.   (7.8)

One also sees the notation [x]B for the coordinate vector. When fixing a basis B in V ,
then each vector v ∈ V uniquely determines a coordinate vector vB ∈ Fn – and vice versa.

Forming the coordinate vector is a linear map

The translation of a vector v ∈ V into the coordinate vector v_B ∈ Fⁿ defines a linear map:

Φ_B : V → Fⁿ,  Φ_B(v) = v_B.

More concretely: Φ_B(α₁b₁ + . . . + αₙbₙ) = α₁e₁ + . . . + αₙeₙ.

For all x, y ∈ V and λ ∈ F, the map Φ_B satisfies two properties:

Φ_B(x + y) = Φ_B(x) + Φ_B(y)   (+)
Φ_B(λx) = λΦ_B(x)   (·)

Hence

v = α₁b₁ + . . . + αₙbₙ ∈ V   ←→   Φ_B(v) = (α₁, . . . , αₙ)ᵀ ∈ Fⁿ.   (7.9)

(Sketch: Φ_B maps the vector space V to the vector space Fⁿ, and Φ_B⁻¹ maps back.)

The linear map Φ_B is called the basis isomorphism with respect to the basis B and
is completely defined by Φ_B(bⱼ) = eⱼ for j = 1, . . . , n.

Example 7.16. An abstract vector is represented by numbers

The three functions sin, cos and arctan from the vector space F(R), cf. Example 7.4,
span a subspace:

V := Span(sin, cos, arctan) ⊂ F(R).

In the same manner as before, we can show that the three functions are linearly
independent. Hence, they form a basis of V:

B := (sin, cos, arctan).

Now, look at another function v ∈ V given by

v(x) = 8 cos(x) + 15 arctan(x) for all x ∈ R, hence v = 8·cos + 15·arctan.

(Sketch: the graphs of sin, cos, arctan and of v = 8·cos + 15·arctan.)

We can describe v = 8·cos + 15·arctan directly by using its coordinates:

Φ_B(v) = (0, 8, 15)ᵀ.

Rule of thumb: V is completely represented by Fn


Each F-vector space V with n := dim(V) < ∞ is represented by Fⁿ if you fix a basis
B = (b₁, . . . , bₙ). For each vector v ∈ V, there is exactly one coordinate vector
Φ_B(v) ∈ Fⁿ. Instead of using v ∈ V, one can also do calculations with Φ_B(v) = (α₁, . . . , αₙ)ᵀ ∈ Fⁿ.

Of course, calculations in Fn might be simpler and more suitable for a computer than the
calculations in an abstract vector space.

Example 7.17. The polynomials p, q ∈ P₃(R) given by p(x) = 2x³ − x² + 7 and q(x) =
x² + 3 can be represented with the monomial basis B = (m₀, m₁, m₂, m₃) by the coordinate
vectors:

Φ_B(p) = (7, 0, −1, 2)ᵀ ∈ R⁴ and Φ_B(q) = (3, 0, 1, 0)ᵀ ∈ R⁴,   (7.10)

since p = 7m₀ + 0m₁ + (−1)m₂ + 2m₃ and q = 3m₀ + 0m₁ + 1m₂ + 0m₃.
In the same manner the two polynomials (2p)(x) = 4x³ − 2x² + 14 and (p + q)(x) = 2x³ + 10
have the following coordinate vectors:

Φ_B(2p) = (14, 0, −2, 4)ᵀ = 2·Φ_B(p) and Φ_B(p + q) = (10, 0, 0, 2)ᵀ = Φ_B(p) + Φ_B(q).

This shows that we can also calculate with the coordinate vectors from equation (7.10).
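A small illustration (not part of the notes): the coordinate vectors from (7.10) can be handled as ordinary NumPy arrays, and the computations for 2p and p + q are exactly vector operations.

import numpy as np

p = np.array([7.0, 0.0, -1.0, 2.0])   # coordinates of p(x) = 2x^3 - x^2 + 7 w.r.t. (m0, m1, m2, m3)
q = np.array([3.0, 0.0, 1.0, 0.0])    # coordinates of q(x) = x^2 + 3

print(2 * p)      # [14.  0. -2.  4.]  -> coordinates of 2p
print(p + q)      # [10.  0.  0.  2.]  -> coordinates of p + q

# Evaluating p at some x directly from its coordinate vector:
x = 2.0
print(sum(c * x**k for k, c in enumerate(p)))   # 2*8 - 4 + 7 = 19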

Example 7.18. The upper triangular matrix A = ( 1 2 ; 0 3 ) ∈ R²ˣ²_up has the following coordinate vector with
respect to the basis B from equation (7.2):

Φ_B(A) = (1, 2, 3)ᵀ.

The matrix 3A has the coordinate vector (3, 6, 9)ᵀ.
The matrix

C = ( 5 0 5 ; 0 2 0 ; 5 0 5 )

has the following coordinate vector with respect to the basis from equation (7.3): (5, 2)ᵀ.
The matrix 2C then has the coordinate vector (10, 4)ᵀ.

7.4.2 Change of basis


We have seen that we can represent an abstract vector v ∈ V with a very concrete vector
in Fn . However, this representation is heavily dependent on the chosen basis B for V . If
we choose another basis C of V , then the coordinate vector ΦC (v) might be different to
the old coordinate vector ΦB (v). Here, we will talk what happens if we switch the bases.
For a given basis B = (b1 , . . . , bn ) of the vector space V , we have the linear map

ΦB : V → Fn , ΦB (bj ) = ej for all j

which is also invertible. We have called it the basis isomorphism. For a given element
v = α1 b1 + . . . + αn bn , we can write:

v = α₁b₁ + . . . + αₙbₙ = α₁Φ_B⁻¹(e₁) + . . . + αₙΦ_B⁻¹(eₙ) = Φ_B⁻¹(Φ_B(v)).

Also remember the formula for the inverse:

Φ_B⁻¹ : Fⁿ → V,  Φ_B⁻¹(eⱼ) = bⱼ for all j.

Example 7.19. Consider the already introduced monomial basis (now with a different
order!) B = (m₂, m₁, m₀) = (x ↦ x², x ↦ x, x ↦ 1) of the space P₂(R) and the
polynomial p ∈ P₂(R) defined by p(x) = 4x² + 3x − 2. Then:

p = Φ_B⁻¹( (4, 3, −2)ᵀ ) = Φ_B⁻¹(Φ_B(p))   since p = 4m₂ + 3m₁ − 2m₀.

Example 7.20. Let V = R² and choose the basis B = ( (1, 1)ᵀ, (0, 1)ᵀ ). The vector x = (3, 7)ᵀ ∈ V
can then be represented as:

x = (3, 7)ᵀ = 3·(1, 1)ᵀ + 4·(0, 1)ᵀ = ( 1 0 ; 1 1 )(3, 4)ᵀ = Φ_B⁻¹(Φ_B(x)).

Obviously, in this case, the linear map Φ_B⁻¹ : R² → R² has a corresponding 2 × 2-matrix ( 1 0 ; 1 1 ).

Now, let C = (c₁, . . . , cₙ) be another basis of V. Then a given vector v ∈ V can also be
written as a linear combination with respect to this new basis, which means v = α₁′c₁ + . . . + αₙ′cₙ =
Φ_C⁻¹(Φ_C(v)) with some coefficients αⱼ′ ∈ F.

Question: Old versus new coordinates

How do we switch between both coordinate vectors?

B-coordinates Φ_B(v) = (α₁, . . . , αₙ)ᵀ   ←?→   C-coordinates Φ_C(v) = (α₁′, . . . , αₙ′)ᵀ

(Diagram: V on top, with Φ_B⁻¹ coming from one copy of Fⁿ and Φ_C⁻¹ from another; the sought map connects the two copies of Fⁿ.)

We see that we want to create a map Fⁿ → Fⁿ that makes this translation from one
coordinate system into the other. Looking at the linear maps Φ_B and Φ_C, we see that
this is just a composition of two linear maps: we get the map from “left to right” by
f : Fⁿ → Fⁿ, f := Φ_C ∘ Φ_B⁻¹. More concretely, for all canonical unit vectors eⱼ ∈ Fⁿ, we
get:

f(eⱼ) = Φ_C(Φ_B⁻¹(eⱼ)) = Φ_C(bⱼ).   (7.11)
Since f is a linear map, we find a uniquely determined matrix A such that f (x) = Ax for
all x ∈ Fn . This matrix is determined by equation (7.11) and given a suitable name:

Transformation matrix or change-of-basis matrix


 

TC←B := ΦC (b1 ) · · · ΦC (bn ) ∈ Fn×n (7.12)


166 7 General vector spaces

is called the transformation matrix or change-of-basis matrix from B to C.

The corresponding linear map gives us a way of switching from the basis B to the basis C.
A good mnemonic is:

Φ_C⁻¹(T_{C←B} x) = Φ_B⁻¹(x) for all x ∈ Fⁿ.   (7.13)

Now, if we have a vector v ∈ V and its coordinate vectors Φ_B(v) and Φ_C(v), respectively,
then we can calculate:

T_{C←B} Φ_B(v) = Φ_C(Φ_B⁻¹(Φ_B(v))) = Φ_C(v).

We fix our result:

Transformation formula

Φ_C(v) = T_{C←B} Φ_B(v).   (7.14)

If we exchange the roles of B and C, we also get a transformation matrix:
T_{B←C} := ( Φ_B(c₁) | ⋯ | Φ_B(cₙ) ). Analogously to (7.13) and (7.14), we get

Φ_B⁻¹(T_{B←C} x) = Φ_C⁻¹(x) and Φ_B(v) = T_{B←C} Φ_C(v).

(Diagram: v ∈ V with Φ_B and Φ_C pointing down to Φ_B(v) ∈ Fⁿ and Φ_C(v) ∈ Fⁿ; T_{C←B} and T_{B←C} translate between these two coordinate vectors.)

Of course by definition or by looking at (7.13) and (7.14), we get:

T_{B←C} = (T_{C←B})⁻¹.   (7.15)

Rule of thumb: How to get the transformation matrix TC←B


The notation TC←B means: We put the vector in B-coordinates in (from the right)
and get out the vector in C-coordinates. To get the transformation matrix TC←B
write the basis vectors of B in C-coordinates and put them as columns in a matrix.

Example 7.21. We already know the monomial basis

B = (m₂, m₁, m₀) = (x ↦ x², x ↦ x, x ↦ 1), with b₁ := m₂, b₂ := m₁, b₃ := m₀,

in P₂(R). Now, we can easily show that

C = (m₂ − ½m₁, m₂ + ½m₁, m₀), with c₁ := m₂ − ½m₁, c₂ := m₂ + ½m₁, c₃ := m₀,

also defines a basis of P₂(R). Now we know how to change between these two bases.
Therefore, we calculate the transformation matrices. The first thing you should note is
that the basis C is already given in linear combinations of the basis vectors from B. Hence
we get:

Φ_B(c₁) = (1, −½, 0)ᵀ, Φ_B(c₂) = (1, ½, 0)ᵀ, Φ_B(c₃) = (0, 0, 1)ᵀ   ⟹   T_{B←C} = ( 1 1 0 ; −½ ½ 0 ; 0 0 1 ).

Then Φ_B(p) = T_{B←C} Φ_C(p) gives us the wanted translation. If we want to calculate
the reverse translation, we have to calculate the inverse matrix of T_{B←C}.
By calculating the inverse, we get the other transformation matrix T_{C←B} = (T_{B←C})⁻¹:

T_{C←B} = ( Φ_C(b₁) | Φ_C(b₂) | Φ_C(b₃) ) = ( ½ −1 0 ; ½ 1 0 ; 0 0 1 ).

For an arbitrary polynomial p(x) = ax² + bx + c with a, b, c ∈ R, we get:

Φ_C(p) = T_{C←B} Φ_B(p) = ( ½ −1 0 ; ½ 1 0 ; 0 0 1 )(a, b, c)ᵀ = (a/2 − b, a/2 + b, c)ᵀ.

Hence, for our example p(x) = 4x² + 3x − 2, i.e. a = 4, b = 3 and c = −2, we get:

Φ_C(p) = (4/2 − 3, 4/2 + 3, −2)ᵀ = (−1, 5, −2)ᵀ.

Let us check again if this was all correct:

(−1)·(x² − ½x) + 5·(x² + ½x) + (−2)·1 = (−1 + 5)x² + (½ + 5/2)x + (−2)·1 = 4x² + 3x − 2.

Example 7.22. Now, we look at R² with the two bases B = ( (1, 2)ᵀ, (3, 4)ᵀ ) and C = ( (1, 0)ᵀ, (2, 2)ᵀ ).
In this case neither the matrix T_{B←C} nor T_{C←B} is obviously given. In such a case, it might
be helpful to include a third basis which is well known. In R² this third basis should of
course be the standard basis E = (e₁, e₂).

(Diagram: V = R² on top, with Φ_B⁻¹, Φ_E⁻¹ = id and Φ_C⁻¹ coming from three copies of R².)

The idea is to calculate first the transformation matrices T_{E←B} and T_{E←C} and their inverses
and then compose the maps in a way that yields the transformation matrices T_{B←C} and T_{C←B}.
The basis elements of B and C are already given in the coordinates of the standard basis.
Hence:

T_{E←B} = ( 1 3 ; 2 4 ) and T_{E←C} = ( 1 2 ; 0 2 ).

Now, for getting T_{C←B}, we have to combine:

T_{C←B} = T_{C←E} T_{E←B} = (T_{E←C})⁻¹ T_{E←B}.

This means we have to calculate the inverse of T_{E←C} first and then multiply with the matrix
T_{E←B}. However, we can do both things together when doing the Gaussian elimination with
more than one right-hand side (this is often called the Gauß-Jordan algorithm). In this case,
we want to generate the unit matrix on the left:

(T_{E←C} | T_{E←B}) ⇝ (1 | T_{C←B}),   so   ( 1 2 | 1 3 ; 0 2 | 2 4 ) ⇝ ( 1 0 | −1 −1 ; 0 1 | 1 2 ).

In more detail, this means: We solve two LES simultaneously and also do the backward
substitution in the matrix notation. The advantage is that we can read off the solution as a
matrix directly from the right-hand side after all calculation steps.

Question: Can we do a similar thing in the polynomial space? Consider bases B and C
that are not the simple monomial basis:

B = ( 2m₂ − 1m₁, −8m₁ − 2m₀, 1m₂ + 4m₁ + 1m₀ ), with b₁, b₂, b₃ denoting these three polynomials,

and C = ( 1m₁ + 1m₀, 2m₂ + 2m₁, 1m₂ + 1m₀ ), with c₁, c₂, c₃ denoting these three polynomials.

Answer: Yes, we can do the same by adding the monomial basis (or another
well-known basis) in the middle. We call the monomial basis A, which means A =
(m₂, m₁, m₀). Then T_{A←B} and T_{A←C} are immediately given:

T_{A←B} = ( 2 0 1 ; −1 −8 4 ; 0 −2 1 ) and T_{A←C} = ( 0 2 1 ; 1 2 0 ; 1 0 1 ),

and then we get T_{B←C}:

T_{B←C} by using an additional “nice” basis A

T_{A←C} = T_{A←B} T_{B←C} and hence T_{B←C} = (T_{A←B})⁻¹ T_{A←C}.

Since we again have to find an inverse of a matrix, we can use the Gauß-Jordan algorithm
again:

(T_{A←B} | T_{A←C}) ⇝ (1 | T_{B←C}).   (7.16)

For our example, this gives us:

( 2 0 1 | 0 2 1 ; −1 −8 4 | 1 2 0 ; 0 −2 1 | 1 0 1 ) ⇝ ( 1 0 0 | 3 −2 4 ; 0 1 0 | −7/2 3 −4 ; 0 0 1 | −6 6 −7 ).

The matrix on the right-hand side is indeed T_{B←C}.
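A minimal numerical cross-check of this computation (not part of the notes), assuming NumPy; solving T_{A←B}·X = T_{A←C} is the computer analogue of the Gauß-Jordan step (7.16).

import numpy as np

T_AB = np.array([[ 2.0,  0.0, 1.0],
                 [-1.0, -8.0, 4.0],
                 [ 0.0, -2.0, 1.0]])
T_AC = np.array([[0.0, 2.0, 1.0],
                 [1.0, 2.0, 0.0],
                 [1.0, 0.0, 1.0]])

# T_{B<-C} = (T_{A<-B})^{-1} T_{A<-C}
T_BC = np.linalg.solve(T_AB, T_AC)
print(T_BC)
# approximately [[ 3.  -2.   4. ]
#                [-3.5  3.  -4. ]
#                [-6.   6.  -7. ]]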

Change of basis for audio: WAV vs. MP3

Assume you have an audio signal f given at finite time steps t = 1, 2, . . . , 50 (e.g.
milliseconds). Hence, you have some measured values f₁, f₂, . . . , f₅₀ ∈ R.

At this point you know that the audio signal f is a vector in a 50-dimensional space,
which can be represented with respect to the canonical basis B = (b₁, . . . , b₅₀) of R⁵⁰.
The coordinates of f with respect to B are exactly the values f₁, f₂, . . . , f₅₀. For describing
tones (so oscillations) this basis is not optimal!
We want to change to a basis C of R⁵⁰ which is better suited for tones.

(Sketch: the basis vectors c₁, . . . , c₂₄ are sampled sine waves with 1, 2, . . . , 24 oscillations;
c₂₅ is the constant signal with no oscillation; c₂₆, . . . , c₄₉ are sampled cosine waves with
1, 2, . . . , 24 oscillations and c₅₀ is a cosine wave with 25 oscillations.)

One can show: C = (c₁, . . . , c₅₀) is also linearly independent and hence a basis of R⁵⁰.

The signal f from above has the following form:

f = c₃ − c₂₅ + 3c₂₆ + ¼c₄₀.

We reckon that most signals f are a superposition of some “basic tones” cᵢ.

Compression: One stores only the coordinates in Φ_C(f). One can also focus on
the (for humans) important frequencies and ignore the higher and lower ones (e.g.
the MP3 file format). All this saves storage space compared to storing the coordinates
Φ_B(f) = (f₁, . . . , f₅₀)ᵀ (e.g. the WAV file format). Similar ideas exist for two-dimensional
signals like pictures: BMP vs. JPG.

Information: The change of basis from B to C is important for a lot of applications
and known as the Fourier transform. We will consider it in more detail in the
analysis lecture.
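The following is not part of the notes, just a rough numerical sketch of the compression idea using NumPy's real FFT, which expresses a sampled signal in a closely related cosine/sine-type frequency basis (not exactly the basis C described above).

import numpy as np

t = np.arange(50)
f = (np.sin(2 * np.pi * 3 * t / 50)             # a "3 oscillations" tone
     + 0.25 * np.sin(2 * np.pi * 15 * t / 50))  # a weaker, higher tone

coeffs = np.fft.rfft(f)                    # coordinates w.r.t. a Fourier-type basis
compressed = coeffs.copy()
compressed[np.abs(compressed) < 1.0] = 0   # keep only the dominant frequencies

f_restored = np.fft.irfft(compressed, n=50)
print(np.count_nonzero(compressed), "of", coeffs.size, "coefficients kept")
print(np.max(np.abs(f - f_restored)))      # tiny reconstruction error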

7.5 General vector space with inner product and norms

Recall that in the vector spaces Rⁿ and Cⁿ, besides the algebraic structure given by

vector addition + and the scalar multiplication ·,

we also defined a geometric structure by choosing

an inner product ⟨·, ·⟩ and also a norm ‖·‖

for measuring angles and lengths.

Now we want to expand such a geometric structure to general F-vector spaces.

Attention! Convention for F = R and F = C

Since we handle the cases F = R and F = C simultaneously, we also use the notation
of the complex conjugation in the real case. Hence, for α ∈ F we write:

ᾱ := α if F = R,  and  ᾱ := the complex conjugate of α if F = C.

Analogously, for a matrix A ∈ Fᵐˣⁿ with m, n ∈ N:

A∗ := Aᵀ if F = R (transpose),  and  A∗ := the adjoint of A if F = C.

7.5.1 Inner products


Let F ∈ {R, C} and V be an F-vector space.

Definition 7.23. Inner product

A map ⟨·, ·⟩ : V × V → F is called an inner product for V if it fulfils the following. For all
x, x′, y ∈ V and α ∈ F:

(S1) ⟨x, x⟩ > 0 for all x ≠ o,   (positive definite)
(S2) ⟨x + x′, y⟩ = ⟨x, y⟩ + ⟨x′, y⟩,   (additive)
(S3) ⟨αx, y⟩ = α⟨x, y⟩,   (homogeneous; (S2) and (S3) together: linear in the first argument)
(S4) ⟨x, y⟩ = \overline{⟨y, x⟩}.   ((conjugate) symmetric)

A vector space with an inner product is often called a pre-Hilbert space.

Recall all the properties we could derive from these four rules. For example:

⟨x, αy⟩ = ᾱ⟨x, y⟩ for all α ∈ F, x, y ∈ V.

The proof goes like: ⟨x, αy⟩ = \overline{⟨αy, x⟩} = \overline{α⟨y, x⟩} = ᾱ·\overline{⟨y, x⟩} = ᾱ⟨x, y⟩, using (S4), (S3) and (S4) again.

Example 7.24. (a) Let V = Fⁿ.

Standard inner product in Fⁿ

⟨x, y⟩ = ⟨(x₁, . . . , xₙ)ᵀ, (y₁, . . . , yₙ)ᵀ⟩ = x₁ȳ₁ + . . . + xₙȳₙ = (ȳ₁ ⋯ ȳₙ)(x₁, . . . , xₙ)ᵀ = y∗x =: ⟨x, y⟩_euclid,  x, y ∈ Fⁿ.   (7.17)

Again, the standard inner product is the most important one in Rⁿ and Cⁿ. Since it
describes the usual euclidean geometry, we denote it by ⟨x, y⟩_euclid in both cases.

(b) For V = F² and x = (x₁, x₂)ᵀ, y = (y₁, y₂)ᵀ ∈ F² we define an inner product by

⟨x, y⟩ := x₁y₁ + x₁y₂ + x₂y₁ + 4x₂y₂.

(c) For V = F² and x = (x₁, x₂)ᵀ, y = (y₁, y₂)ᵀ ∈ F², we could also define

⟨x, y⟩ := x₁y₂ + x₂y₁.

This is symmetric and linear in the first argument but not positive definite. For
example, x = (1, −1)ᵀ gives us ⟨x, x⟩ = −2.

(d) Let V = P([0, 1], F) be the F-vector space of all polynomial functions f : [0, 1] → F.
Then, we define for f, g ∈ V the inner product:

⟨f, g⟩ := ∫₀¹ f(x) ḡ(x) dx.

You should see the analogy to ⟨x, y⟩_euclid in Fⁿ. All data is now continuously distributed
over [0, 1], and we need an integral instead of a sum. Often, we are in the case
F = R and can ignore the complex conjugation ḡ(x).

Recall that for a general inner product on Rn , there is a uniquely determined positive
matrix A such that:

hx, yi = hAx, yieuclid (7.18)

for all x, y ∈ Rn .
In the same way this also works for the complex vector space Cn . We just have to expand
the definition of positive definite matrices in this case:

Definition 7.25. Positive definite matrix


A matrix A ∈ Fn×n is called positive definite if it is selfadjoint (A∗ = A) and
satisfies
hAx, xieuclid > 0, i.e. x∗ Ax > 0 (7.19)
for all x ∈ Fn \ {o}.

Attention! Positive definite needs selfadjointness

By our definition a positive definite matrix is always selfadjoint. In the complex
case this follows from equation (7.19). However, in the real case, you cannot drop
this assumption. Moreover, ⟨Ax, x⟩_euclid is always real, even in the case F = C, since with A = A∗ we get

⟨Ax, x⟩_euclid = ⟨x, A∗x⟩_euclid = ⟨x, Ax⟩_euclid,

and by (S4) the last term equals the complex conjugate of ⟨Ax, x⟩_euclid, so the number is real.

Some authors might use only equation (7.19) for defining positive definite
matrices in the real case. Therefore, to play it safe, we often talk about matrices
that are “selfadjoint and positive definite”.

We fix the general result:

Proposition 7.26. Positive definite matrix A ⇒ hAx, yieuclid inner product


If A ∈ Fn×n is selfadjoint and positive definite, then

hx, yi := hAx, yieuclid , x, y ∈ Fn

defines an inner product in Fn .



Example 7.27. Let us look at the examples from before:

(a) The identity matrix 1 is positive definite since ⟨1x, x⟩_euclid = ⟨x, x⟩_euclid > 0 for all x ≠ o.

(b) The matrix A = ( 1 1 ; 1 4 ) ∈ R²ˣ² is positive definite since for all x = (x₁, x₂)ᵀ ∈ R² we have

⟨A(x₁, x₂)ᵀ, (x₁, x₂)ᵀ⟩_euclid = x₁x₁ + x₂x₁ + x₁x₂ + 4x₂x₂ = (x₁ + x₂)² + 3(x₂)² ≥ 0.

This can only be 0 if x₁ = −x₂ and x₂ = 0, hence only for x = o.

(c) The matrix A = ( 0 1 ; 1 0 ) is selfadjoint but not positive definite. For example, for x = (1, −1)ᵀ
the value ⟨Ax, x⟩_euclid is negative.

Testing a matrix A ∈ Fn×n for positive definiteness can be much work, even in the case
n = 2. Therefore, the next criterion is very useful:

Proposition 7.28. 4 recognition features for a positive definite matrix


Let A = (aij ) ∈ Fn×n be a selfadjoint matrix. Then the following claims are
equivalent:
(i) A is positive definite.
(ii) All eigenvalues A are positive.
(iii) After using Gaussian elimination only with the matrices Z_{i,−λ,j} (i.e. only adding multiples of one row to another row), all pivots are
positive.
(iv) The leading principal minors of A, i.e. the determinants det(H₁), . . . , det(Hₙ) of the left upper submatrices

H₁ = (a₁₁), H₂ = ( a₁₁ a₁₂ ; a₂₁ a₂₂ ), H₃ = ( a₁₁ a₁₂ a₁₃ ; a₂₁ a₂₂ a₂₃ ; a₃₁ a₃₂ a₃₃ ), . . . , Hₙ = A,

are positive.

We skip the proof here. Keep in mind that “positive” always means strictly greater than
zero (>0)! Claim (iv) is called Sylvester’s criterion.

Example 7.29. Let us check the proposition for the matrix A = ( 1 1 ; 1 4 ). It is positive
definite by Example 7.27 (b). The eigenvalues of A are given by solving

0 = det(A − λ1) = (1 − λ)(4 − λ) − 1 = λ² − 5λ + 3,  so  λ₁,₂ = 5/2 ± √((5/2)² − 3).

Both eigenvalues, λ₁ and λ₂, are positive. The Gaussian elimination gives us:

A = ( 1 1 ; 1 4 ) ⇝ ( 1 1 ; 0 3 ).

Both pivots, 1 and 3, are positive. At last the minors:

det(H₁) = det(1) = 1 > 0 and det(H₂) = det( 1 1 ; 1 4 ) = 3 > 0.
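Not part of the notes: the criteria of Proposition 7.28 can also be checked numerically for this A; the Cholesky factorisation used below exists exactly for selfadjoint positive definite matrices.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 4.0]])        # the selfadjoint matrix from Example 7.29

# (ii) all eigenvalues positive
print(np.linalg.eigvalsh(A))      # both > 0

# (iv) Sylvester: all leading principal minors positive
print([np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)])   # [1.0, 3.0]

# Practical test via Cholesky factorisation
try:
    np.linalg.cholesky(A)
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")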

We have already seen in Proposition 7.26 that hx, yi := hAx, yieuclid defines an inner
product in Fn if A is positive definite. In some sense, also the converse is correct:

Proposition 7.30. Inner products are related to pos. definite matrices


Let V be an F-vector space with inner product ⟨·, ·⟩ and dim(V) = n. Let B be a
basis of V. Then for all x, y ∈ V we have

⟨x, y⟩ = ⟨A·Φ_B(x), Φ_B(y)⟩_euclid,

where ⟨·, ·⟩_euclid is the standard inner product in Fⁿ and

A = G(B) = ( ⟨b₁, b₁⟩ ⋯ ⟨bₙ, b₁⟩ ; ⋮ ⋱ ⋮ ; ⟨b₁, bₙ⟩ ⋯ ⟨bₙ, bₙ⟩ )

is the Gramian matrix w.r.t. B.

Proof. Exercise!

Example 7.31. Look at the R-vector space P₂([0, 1]) of all real polynomial functions
f : [0, 1] → R with degree ≤ 2. The integral

⟨p, q⟩ := ∫₀¹ p(x)q(x) dx,  p, q ∈ P₂,

defines an inner product. Let us check how to use Proposition 7.30 in this case. Choose
a basis B of P₂, for example the monomial basis B = (m₀, m₁, m₂), and calculate the
associated Gramian matrix. First,

⟨mᵢ, mⱼ⟩ = ∫₀¹ xⁱxʲ dx = ∫₀¹ x^{i+j} dx = [x^{i+j+1}/(i+j+1)]₀¹ = 1/(i+j+1),   (7.20)

and hence

G(B) = ( ⟨m₀,m₀⟩ ⟨m₁,m₀⟩ ⟨m₂,m₀⟩ ; ⟨m₀,m₁⟩ ⟨m₁,m₁⟩ ⟨m₂,m₁⟩ ; ⟨m₀,m₂⟩ ⟨m₁,m₂⟩ ⟨m₂,m₂⟩ ) = ( 1/1 1/2 1/3 ; 1/2 1/3 1/4 ; 1/3 1/4 1/5 ).

Then, by Proposition 7.30, for all a, b, c, d, e, f ∈ R we get:

⟨am₀ + bm₁ + cm₂, dm₀ + em₁ + fm₂⟩ = ⟨G(B)·(a, b, c)ᵀ, (d, e, f)ᵀ⟩_euclid
= ad + ½(ae + bd) + ⅓(af + be + cd) + ¼(bf + ce) + ⅕·cf.

Let’s check this:

⟨am₀ + bm₁ + cm₂, dm₀ + em₁ + fm₂⟩ = ∫₀¹ (a + bx + cx²)(d + ex + fx²) dx
= ∫₀¹ ( ad + (ae + bd)x + (af + be + cd)x² + (bf + ce)x³ + cf·x⁴ ) dx
= ad + ½(ae + bd) + ⅓(af + be + cd) + ¼(bf + ce) + ⅕·cf.
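A quick numerical sanity check (not from the notes), assuming NumPy: build the Gramian matrix of the monomials (a so-called Hilbert matrix) and compare an inner product computed via Proposition 7.30 with a direct numerical integration.

import numpy as np

n = 3
G = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])   # G[i, j] = 1/(i+j+1)

p = np.array([1.0, 2.0, 3.0])    # coordinates of p(x) = 1 + 2x + 3x^2
q = np.array([4.0, 0.0, -1.0])   # coordinates of q(x) = 4 - x^2

via_gramian = q @ (G @ p)        # <p, q> via the Gramian matrix

x = np.linspace(0.0, 1.0, 100001)
via_integral = np.mean((1 + 2*x + 3*x**2) * (4 - x**2))   # crude numerical integral over [0, 1]

print(via_gramian, via_integral)   # both approximately 10.567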

Corollary 7.32. Gramian matrix is positive definite.


For a basis B of a vector space V with inner product h·, ·i, the Gramian matrix G(B)
is selfadjoint and positive definite.

Proof. G(B) = G(B)∗ follows from ⟨bᵢ, bⱼ⟩ = \overline{⟨bⱼ, bᵢ⟩}, i.e. from (S4). Using Proposition 7.30, we know
⟨G(B)Φ_B(x), Φ_B(x)⟩_euclid = ⟨x, x⟩ > 0 for all x ∈ V \ {o} and hence for all vectors
Φ_B(x) ∈ Fⁿ \ {o}.

7.5.2 Norms
As always, let F ∈ {R, C} and V be an F-vector space. Even if V does not have an
inner product, we can talk about the length of vectors if we define a length measure:

Definition 7.33. Norm


A map k · k : V → R with the following properties is called a norm on V . For all
x, y ∈ V and α ∈ F, we have:

(N1) kxk ≥ 0, and kxk = 0 ⇔ x = o, (positive definite)

(N2) kαxk = |α| kxk, (absolutely homogeneous)

(N3) kx + yk ≤ kxk + kyk (triangle inequality).


An F-vector space with such a norm is called a normed space.

Example 7.34. (a) We already know that the euclidean norm for Fⁿ, given by

‖x‖ = ‖(x₁, . . . , xₙ)ᵀ‖ = √(|x₁|² + · · · + |xₙ|²),  x ∈ Fⁿ,   (7.21)

satisfies (N1)–(N3) from Definition 7.33.

(b) In equation (7.21), you see squares and a square root that cancel each other in some
sense. This would also work for cubes and the third root. Or even in general:

The p-norm

For each real number p ≥ 1, we set:

‖x‖_p = ‖(x₁, . . . , xₙ)ᵀ‖_p := (|x₁|ᵖ + · · · + |xₙ|ᵖ)^{1/p},  x ∈ Fⁿ.   (7.22)

This defines the so-called p-norm. In fact, proving the triangle inequality (N3) is not
trivial. The euclidean norm (7.21) is hence also called the 2-norm.

(c) Another related norm is given by:

lim_{p→∞} (|x₁|ᵖ + · · · + |xₙ|ᵖ)^{1/p} = max{|x₁|, . . . , |xₙ|}.

Therefore, we define

Maximum norm or ∞-norm

‖x‖_∞ = ‖(x₁, . . . , xₙ)ᵀ‖_∞ := max{|x₁|, . . . , |xₙ|} = lim_{p→∞} ‖x‖_p,  x ∈ Fⁿ.   (7.23)

Let us check for n = 2 that the three properties in Definition 7.33 hold. Let α ∈ F and
x = (x₁, x₂)ᵀ, y = (y₁, y₂)ᵀ ∈ F².

(N1) ‖x‖_∞ = max{|x₁|, |x₂|} is only 0 if x₁ = 0 and x₂ = 0, hence x = o.
(N2) ‖αx‖_∞ = max{|αx₁|, |αx₂|} = max{|α||x₁|, |α||x₂|} = |α| max{|x₁|, |x₂|} = |α|‖x‖_∞.
(N3) The triangle inequality:

‖x + y‖_∞ = max{|x₁ + y₁|, |x₂ + y₂|} ≤ max{|x₁| + |y₁|, |x₂| + |y₂|}
≤ max{|x₁|, |x₂|} + max{|y₁|, |y₂|} = ‖x‖_∞ + ‖y‖_∞.
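Not part of the notes: NumPy's norm function implements exactly these p-norms, which gives a quick way to experiment with them.

import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))        # 7.0      (1-norm, "taxicab")
print(np.linalg.norm(x, 2))        # 5.0      (euclidean 2-norm)
print(np.linalg.norm(x, 5))        # (3^5 + 4^5)^(1/5), approximately 4.17
print(np.linalg.norm(x, np.inf))   # 4.0      (maximum norm)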

(Sketch: the “unit circles” of the p-norms for p = 1, 2, 5 and ∞.) These are the sets

{x ∈ R² : ‖x‖_p = 1},

and such a subset of R² consists of all vectors of length 1, here for different p = 1, 2, 5 and ∞.
For p = 2, this is indeed a usual circle. However, the different geometric pictures for other p
are also interesting:

Assume you are in Manhattan inside a taxicab at a point p. Driving one block costs you $1.
(Sketch: a street grid; with $2 in your pocket you can reach all points at most two blocks
away from p, with $5 all points at most five blocks away.) The ε-neighbourhoods

{x ∈ R² : ‖x − p‖ < ε}

in Manhattan are just squares standing on one corner, and not real circles. This is exactly
the 1-norm ‖·‖₁, which is often alternatively called the “taxicab norm”.

Rule of thumb: Norm gives you lengths and distances


You should imagine kxk as the length of the “vector arrow” x. Hence, kx − yk is the
length of the connection vector between x and y – or in other words: The distance
between x and y.

Example 7.35. – p-norms for polynomials. The p-norms in Fⁿ, which we defined
above, can be generalised to functions. For example, for the R-vector space P([a, b]),
which means all polynomial functions f : [a, b] → R, we can also define such norms:

Norms for polynomials on [a, b]

‖f‖_p := ( ∫ₐᵇ |f(x)|ᵖ dx )^{1/p} for p ∈ [1, ∞)   and   ‖f‖_∞ := max_{x∈[a,b]} |f(x)|.

We can use the same symbol ‖·‖_p as before since the context is always clear.

(Sketch: graphs of some polynomial functions f, g, h ∈ P([a, b]); the area between the graph
of f and the x-axis is ‖f‖₁ = ∫ₐᵇ |f(x)| dx, and the area enclosed between the graphs of g
and h is ‖g − h‖₁ = ∫ₐᵇ |g(x) − h(x)| dx.)

In later lectures, like mathematical analysis, we will prove the three properties (N1), (N2)
and (N3) for all these norms.

7.5.3 Norm in pre-Hilbert spaces


If an F-vector space V is equipped with an inner product ⟨·, ·⟩, then we automatically get
a norm ‖·‖ that is associated to ⟨·, ·⟩ in the following way:

Proposition & Definition 7.36. Induced or associated norm

Let V be a pre-Hilbert space, which is an F-vector space with an inner product ⟨·, ·⟩.
Then

‖x‖ := √⟨x, x⟩,  x ∈ V,

defines a norm, and it is called the induced norm or associated norm w.r.t. ⟨·, ·⟩.

For a proof, we need the next Proposition.

Proposition 7.37. Cauchy-Schwarz inequality


Let V be a pre-Hilbert space. For all x, y ∈ V :

|hx, yi|2 ≤ hx, xihy, yi.

With the associated norm from Proposition& Definition 7.36, we get:

|hx, yi| ≤ kxk kyk.

Equality holds if and only if x and y are linearly dependent.



Proof. Look again at the proof of Proposition 5.5.

We look at the examples of inner products from above and calculate the associated norms.

Example 7.38. (a) The standard inner product ⟨x, y⟩_euclid = x₁ȳ₁ + · · · + xₙȳₙ in Fⁿ
induces the 2-norm ‖x‖ = √(|x₁|² + · · · + |xₙ|²) in Fⁿ.

(b) The associated norm with respect to the inner product ⟨x, y⟩ := ⟨Ax, y⟩_euclid in Fⁿ,
where A ∈ Fⁿˣⁿ is a selfadjoint and positive definite matrix, is given by

‖x‖ = √⟨x, x⟩ = √⟨Ax, x⟩_euclid.

For the example A = ( 1 1 ; 1 4 ), we get

‖(x₁, x₂)ᵀ‖ = √⟨A(x₁, x₂)ᵀ, (x₁, x₂)ᵀ⟩_euclid = √(|x₁|² + x₁x̄₂ + x₂x̄₁ + 4|x₂|²).

(c) Looking at the F-vector space P([a, b]) of all polynomial functions f : [a, b] → F, we
defined the inner product

⟨f, g⟩ = ∫ₐᵇ f(x) ḡ(x) dx.   (7.24)

The associated norm in P([a, b]) is the already introduced 2-norm since

‖f‖ = √⟨f, f⟩ = √( ∫ₐᵇ f(x) f̄(x) dx ) = √( ∫ₐᵇ |f(x)|² dx ) = ‖f‖₂.

7.5.4 Recollection: Angles, orthogonality and projection


Let V be a pre-Hilbert space, which means an F-vector space with given inner product
h·, ·i, and let k.k be the associated norm.
In this case, we have again the geometric structure and can talk about angles, orthogonal
vectors and orthogonal projections:

Proposition & Definition 7.39. Still the same about orthogonality:


• For x, y ∈ V we write x ⊥ y if hx, yi = 0.
• For F = R and x, y ∈ V \ {o} we define:
 
hx, yi
angle(x, y) := arccos .
kxkkyk

• For a nonempty set M ⊂ V we call


M ⊥ := {x ∈ V : x ⊥ m for all m ∈ M }
the orthogonal complement of M . This is always a subspace of V .
Instead of x ∈ M ⊥ , we often write x ⊥ M .
7.5 General vector space with inner product and norms 179

• For x ∈ V and a subspace U of V there is a unique decomposition

x = p + n =: x U + x U ⊥

into the orthogonal projection p =: x U ∈ U and the normal component


n = x U ⊥ ∈ U ⊥ with respect to U . The calculation is given by
 
hx, b1 i
G(B) ΦB (p) =  ...  (7.25)
hx, bn i

for any basis B = (b1 , . . . , bn ) of U , and n = x − p.

• A family B = (u1 , . . . , un ) with vectors from V is called:


– Orthogonal system (OS) if ui ⊥ uj for all i, j = 1, ..., n with i 6= j;
– Orthonormal system (ONS) if, in addition, kui k = 1 for all i = 1, ..., n;
– Orthogonal basis (OB) if it an OS and a basis of V ;
– Orthonormal basis (ONB) if it an ONS and a basis of V .

• OS that do not own the zero vector o are always linearly independent.

• If B = (b1 , . . . , bn ) is an OB of U , then the equation (7.25) is much simpler:


 hx,b i 
1
kb1 k2 hx, b1 i hx, bn i
ΦB (x U ) =  ...  , i.e. x U = b1 + . . . + bn . (7.26)
 
kb1 k 2 kbn k2
hx,bn i
kbn k2

If B is an ONB, then it gets also easier kbi k2 (= 1).

Example 7.40. (a) The vectors x = (1, i)ᵀ and y = (0, 1)ᵀ from C² are not orthogonal w.r.t.
the standard inner product ⟨·, ·⟩_euclid since

⟨(1, i)ᵀ, (0, 1)ᵀ⟩_euclid = 1·0 + i·1 = i ≠ 0.

However, they are orthogonal w.r.t. the inner product given by ⟨x, y⟩ := ⟨Ax, y⟩_euclid
with A = ( 2 i ; −i 1 ), since

⟨x, y⟩ = ⟨A(1, i)ᵀ, (0, 1)ᵀ⟩_euclid = ⟨(1, 0)ᵀ, (0, 1)ᵀ⟩_euclid = 0.

The orthogonal projection of x onto Span(y) can be different for different inner
products. W.r.t. ⟨·, ·⟩ it is o (since x ⊥ y), but w.r.t. ⟨·, ·⟩_euclid it is

x|_Span(y) = (⟨x, y⟩_euclid / ⟨y, y⟩_euclid) y = (i/1)·(0, 1)ᵀ = (0, i)ᵀ.

(b) Looking at the vector space F([0, 2π]), which contains the functions f : [0, 2π] → R, we
define a subspace V that is spanned by the family B = (1, cos, sin). Then, w.r.t. the
inner product defined by

⟨f, g⟩ := ∫₀^{2π} f(x)g(x) dx,

the family B is an OS:

⟨1, cos⟩ = ∫₀^{2π} cos x dx = 0, ⟨1, sin⟩ = ∫₀^{2π} sin x dx = 0, and
⟨cos, sin⟩ = ∫₀^{2π} cos x sin x dx = [½ sin²x]₀^{2π} = (sin²(2π) − sin²(0))/2 = 0.

Because of

⟨1, 1⟩ = ∫₀^{2π} 1 dx = 2π, ⟨cos, cos⟩ = ∫₀^{2π} cos²x dx = π, ⟨sin, sin⟩ = ∫₀^{2π} sin²x dx = π,

the new family ( 1/√(2π), cos/√π, sin/√π ) is an ONB of V.

Recall also the Gram-Schmidt orthonormalisation from Remark 5.3.

Remark: Gram-Schmidt orthonormalisation

Given: Let V be a pre-Hilbert space and C a family of vectors from V.
To find: An ONB B of Span(C).

Algorithm:

Initialise B as the empty family ( );
For all u in C:
  Set v := u − u|_Span(B);
  If v ≠ o:
    Set w := v/‖v‖;
    Add w to B.

If you stop the algorithm at some point, the family built up to this point, B = (w₁, . . . , wₖ), is
an ONB of Span(w₁, . . . , wₖ).
Recall that for this ONB B = (w₁, . . . , wₖ) the orthogonal projection u|_Span(B) is calculated
by

u|_Span(B) = ⟨u, w₁⟩w₁ + . . . + ⟨u, wₖ⟩wₖ.
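A minimal sketch of this algorithm in Python (not from the notes), assuming NumPy and, by default, the standard inner product of Rⁿ; the helper name gram_schmidt and the tolerance 1e-12 are choices of this sketch, not of the lecture.

import numpy as np

def gram_schmidt(vectors, inner=np.dot):
    """Orthonormalise a list of vectors w.r.t. a given inner product."""
    basis = []
    for u in vectors:
        # subtract the orthogonal projection of u onto Span(basis)
        v = u - sum(inner(u, w) * w for w in basis)
        norm = np.sqrt(inner(v, v))
        if norm > 1e-12:          # v != o (up to rounding errors)
            basis.append(v / norm)
    return basis

C = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([2.0, 1.0, 1.0])]   # the third vector is the sum of the first two

B = gram_schmidt(C)
print(len(B))                                               # 2: the dependent vector is skipped
print(np.round([[w1 @ w2 for w2 in B] for w1 in B], 10))    # identity matrix -> orthonormal system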

Example 7.41. The monomials C = (m₀, m₁, m₂) do not form an ONB in P([−1, 1])
w.r.t. ⟨f, g⟩ = ∫₋₁¹ f(x)g(x) dx. We can apply the Gram-Schmidt procedure to C. Here
it is useful to start with the numbering indices 0, 1, 2, . . .

v₀ = m₀ = 1  ⟹  w₀(x) = v₀(x)/‖v₀‖ = 1/√2,

v₁ = m₁ − ⟨m₁, w₀⟩w₀ = m₁ (since ⟨m₁, w₀⟩ = 0)  ⟹  w₁(x) = v₁(x)/‖v₁‖ = √(3/2)·x,

v₂ = m₂ − ⟨m₂, w₀⟩w₀ − ⟨m₂, w₁⟩w₁ (with ⟨m₂, w₁⟩ = 0)  ⟹  w₂(x) = v₂(x)/‖v₂‖ = √(45/8)·(x² − 1/3).

B = (w₀, w₁, w₂) is an ONB for Span(C) = P₂([−1, 1]). The polynomials w₀, w₁, w₂ (or
also with other normalisation factors) are called the Legendre polynomials. If we add the
other monomials m₃, m₄, . . ., we get the next Legendre polynomials.

Summary
• Vectors are elements in a set, called a vector space V , that one can add together
and scale with numbers α from R or C, without leaving the set V . The addition
and scalar multiplication just have to satisfy the rules (1)–(8) from Definition 7.1.
• If you know that a set V with two operations + and α· is a vector space and if
you want to show that also a subset U 6= ∅ of V form a vector space, then you do
not have to check (1)–(8) again, but only (a) and (b) from Proposition 7.7. This is
called a subspace of V .
• The definitions of linear combination, span, generating system, linearly (in)dependent,
basis and dimension are literally the same as in Chapter 3.
• If you fix a basis B = (b₁, . . . , bₙ) in V, then each x ∈ V has a uniquely determined
linear combination x = α₁b₁ + · · · + αₙbₙ. The numbers α₁, . . . , αₙ ∈ F (F is either
R or C) are called the coordinates of x w.r.t. B. This defines the vector Φ_B(x) ∈ Fⁿ.
• Changing the basis of V from B to C also changes the coordinate vector from
Φ_B(x) ∈ Fⁿ to Φ_C(x) ∈ Fⁿ. This change can be described by the transformation
matrix T_{C←B}.
• One always has T_{B←C} = (T_{C←B})⁻¹. Sometimes, it is helpful to take a detour T_{B←C} =
T_{B←A} T_{A←C} where A is a simple and well-known basis.
• An inner product ⟨·, ·⟩ is a map which takes two vectors x, y ∈ V and gives out a
number ⟨x, y⟩ in F. It has to satisfy the rules (S1)–(S4) from Definition 7.23.
• If A ∈ Fⁿˣⁿ is selfadjoint and positive definite, then ⟨x, y⟩ := ⟨Ax, y⟩_euclid defines an
inner product in Fⁿ. Here ⟨·, ·⟩_euclid is the well-known standard inner product in Rⁿ
(Chapter 2) or Cⁿ (Chapter 6).
• A norm ‖·‖ is a map that sends a vector x ∈ V to a number ‖x‖ ∈ R and satisfies the
rules (N1)–(N3) from Definition 7.33.
• An inner product ⟨·, ·⟩ always defines a norm ‖x‖ := √⟨x, x⟩.
• By having an inner product, we can talk about the orthogonal projection x|_U of a vector
x ∈ V w.r.t. a subspace U ⊂ V.
8 General linear maps
It’s dangerous to go alone! Take this.
Old man in a cave

In Chapter 3, we already introduced matrices and linear maps. We have also seen that
for a given matrix A ∈ Rᵐˣⁿ there is an associated map

f_A : Rⁿ → Rᵐ with x ↦ Ax,

which fulfils the two properties (+) and (·) and therefore is called a linear map.
We also discovered that for any map ℓ : Rⁿ → Rᵐ with these properties there is exactly
one matrix A ∈ Rᵐˣⁿ such that the associated map f_A coincides with ℓ. In short: in A,
the action of ℓ is written down. (Sketch: Chapter 3 passes from a matrix A to the linear
map ℓ = f_A; Chapter 8 goes the other way around.)

Now in Chapter 8, with the power of general vector spaces, we can also consider general
linear maps between arbitrary F-vector spaces V and W. Then it is not clear what a
suitable matrix that captures all the information of such a map would be. Later we will
see that the matrix comes from F^{dim(W)×dim(V)} and is built in a similar way as before.

8.1 Definition: Linear maps


Let F be either R or C again. Let V and W be two F-vector spaces. It is important that
for both the same field F is chosen.

Definition 8.1. Linear map


A map ` : V → W is called a linear map, linear function or linear operator if `
satisfies the two following properties. For all x, y ∈ V and α ∈ F:

(L+) `(x + y) = `(x) + `(y), (additive)


(L · ) `(αx) = α `(x). (homogeneous)


If W = F, one often calls ` a linear functional.

Proposition 8.2. Linear maps send o to o.


For a linear map ` : V → W , we have `(oV ) = oW .

Proof. For arbitrary x ∈ V , we use (L·): `(oV ) = `(0x) = 0 `(x) = oW .

In the following examples F stands for R or C.

Example 8.3. (a) For V = W = F, let `(x) = 3x. We can easily check (L+) and (L·).
(b) For V = F and W = F2 , let `(x) = x 31 . Obviously, ` satisfies (L+) and (L·).


(c) Let ` : F3 → F defined by `(x) = hx, aieuclid = a∗ x with fixed a ∈ F3 , e.g.


! ! ! ! !
2 x1 D x1 2 E x1
a= 1 , hence ` : x2 7→ x2 , 1 = (2 1 3) x2 .
3 x3 x3 3 euclid x3
Using the definition of an inner product, we know that ` is linear.
(d) Define ` : F3 → F by `(x) = det(x a2 a3 ) with fixed a2 , a3 ∈ F3 , e.g.
         
1 3 x1 x1 1 3
a2 = 0 , a3 = 1 , hence ` : x = x2  7→ det x a2 a3  = det x2 0 1 .
2 1 x3 x3 2 1
We know from the definition of the determinant that ` is linear. Using Proposition 4.11
(Laplace’s formula) , we can rewrite `:
!

0 1
 
1 3
 
1 3
 x1
`(x) = x1 det −x2 det +x3 det = (−2 5 1) x2
2 1 2 1 0 1 x3
| {z } | {z } | {z }
−2 −5 1

(e) The map ` : F2 → F2 defined by (x1 , x2 )> 7→ (4x1 + 3x2 , x2 + 7)> is not linear
because `(o) = (0, 7)> 6= o.
(f) For A ∈ Fm×n define fA : Fn → Fm by fA : x 7→ Ax. This is a linear map by
Proposition 3.14. For example, for F = R and m = n = 2, look at how fA acts on houses.
Let A = (a1 a2 ) ∈ R2×2 . We know:

o 7→ o, e1 7→ a1 , e2 7→ a2 under fA ,

and the rest of the plane is given by linearity.
[Figure: fA maps the house drawn over e1 , e2 to the sheared house drawn over a1 , a2 .]
The last example (f) includes all the other examples (a)–(e): We always find a corresponding
matrix A with `(x) = Ax. We find: (a) A = 3, (b) A = (3, 1)> , (c) A = (2 1 3),
(d) A = (−2 5 1).
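This correspondence can also be checked numerically. The following NumPy lines are only a small sketch (the test numbers are chosen freely and are not part of the original text); they verify for one test vector that the determinant functional from (d) agrees with multiplication by the row matrix (−2 5 1).

import numpy as np

# Sketch: check Example 8.3 (d) numerically.
# The functional l(x) = det(x a2 a3) should agree with x -> (-2 5 1) x.
a2 = np.array([1.0, 0.0, 2.0])
a3 = np.array([3.0, 1.0, 1.0])
A = np.array([[-2.0, 5.0, 1.0]])            # the matrix found above

x = np.array([2.0, -1.0, 4.0])              # an arbitrary test vector
det_value = np.linalg.det(np.column_stack([x, a2, a3]))
matrix_value = (A @ x).item()
print(det_value, matrix_value)              # both are -5.0 (up to rounding)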
Now let us look at some more abstract vector spaces:

Example 8.4. (a) Let V = F(R), W = R and δ0 : V → W the evaluation of a
function f ∈ V at the origin 0, which means δ0 : f 7→ f (0). Then δ0 is linear. (Show
it!) Another example would be an evaluation at different points combined with linear
combinations: ` : f 7→ 3f (0) − 7f (1/4) + 5f (1).
(b) Let ∂ be the differential operator from V = P3 (R) to W = P2 (R), which means ∂
sends a polynomial f ∈ P3 (R) to its derivative f 0 ∈ P2 (R). Because of (f +g)0 = f 0 +g0
and (αf )0 = α f 0 , the map ∂ is linear. (We will consider derivatives and the rules above
next semester in mathematical analysis. Here, you can see it as a purely algebraic
procedure, e.g. (xn )0 = nxn−1 .)
(c) In the same manner, we can look at the map P3 (R) → P1 (R) with f 7→ f 00 given by the
second derivative. In the same way, a combination is possible, f 7→ f 000 + 3f 00 − 2f 0 + 4f
as a map P3 (R) → P3 (R).
(d) Instead of using the derivative of a polynomial f ∈ P([a, b]) =: V or evaluating it at
one point, we can use integration, hence the map i : f 7→ ∫_a^b f (x) dx. Therefore,
in this case, we have V = P([a, b]) and W = R. Again, we get a linear map:

∫_a^b (f (x) + g(x)) dx = ∫_a^b f (x) dx + ∫_a^b g(x) dx and ∫_a^b αf (x) dx = α ∫_a^b f (x) dx.

(We also talk about the integration in mathematical analysis next semester.)

The linear maps in Example 8.4 are not directly given by a matrix-vector multiplication.
However, we will see that this is possible if we go over to the representation of the vector
spaces when fixing a basis. Recall the coordinate vectors and the basis isomorphism ΦB .
We will do this in Section 8.3.

8.2 Combinations of linear maps

8.2.1 Sum and multiples of a linear map

As seen in Example 7.4, we can add and scale functions f : R → R.
This can be generalised for linear maps:

Definition 8.5. Sum and scaled linear maps


Let V and W be two F-vector spaces (with the same F!) and let k : V → W and
` : V → W be linear maps. Then we define k+` : V → W by

(k+`)(x) := k(x) + `(x) for all x ∈ V,

and for α ∈ F, we define α · ` by

(α · `)(x) := α · `(x) for all x ∈ V.

The operations + and α· on the right-hand side are the operations in W .



Proposition & Definition 8.6. Vector space of linear maps V → W


The maps k+` and α·` from Definition 8.5 are again linear maps from V to W .
The set of all linear maps from V to W equipped with the two operations + and α·
form again an F-vector space. We denote this vector space by L(V, W ).

The zero vector in L(V, W ) is the zero map o : V → W defined by o(x) = o for all
x∈V.

Proof. Let k, ` : V → W be linear and let x, y ∈ V and α ∈ F. Then:

(k + `)(x + y) = k(x + y) + `(x + y) = (k(x) + k(y)) + (`(x) + `(y))
= (k(x) + `(x)) + (k(y) + `(y)) = (k + `)(x) + (k + `)(y)

and

(k + `)(αx) = k(αx) + `(αx) = α k(x) + α `(x) = α (k(x) + `(x)) = α (k + `)(x),

where we used Definition 8.5 and the properties (L+) and (L·) of k and `,

which means k + ` has the two properties (L+) and (L·) and is also linear. In the same
manner, we see that α · ` is linear. Showing the properties (1)-(8) is an exercise for you.
I am serious. It could be an exam question.

From now on, we do not write the two operations + and α· in L(V, W ) in red anymore.
However, keep in mind that these are different operations than + and α· in W .

Example 8.7. – Projection and reflection. Let n ∈ Rn be a vector with knk = 1 and
G := Span(n) the spanned line. For all x ∈ Rn , we can calculate the orthogonal projection

x|G = (hx, nieuclid / hn, nieuclid ) n = hx, nieuclid n = n hx, nieuclid = n(n> x) = (nn> )x

(cf. Proposition 5.7).


Hence the map

projG : Rn → Rn with projG (x) := x G = (nn> )x, (8.1)

defines a linear map Rn → Rn . We also know that it is given by the associated matrix:
projG = fnn> .

Using the orthogonal decomposition

x = x|G + x|E ,

we can also define the linear map

projE : Rn → Rn ,

which is the orthogonal projection onto E := G⊥ = {n}⊥ :

projE (x) := x|E = x − x|G .

Subtracting the orthogonal projection x|G again, we get the reflection of x with respect
to the hyperplane E. Hence, we define:

reflE : Rn → Rn with reflE (x) := x|E − x|G = x − 2x|G .

[Figure: x decomposed into x|G and x|E ; subtracting x|G once more gives reflE (x).]

In other words:

projE = id − projG and reflE = id − 2 projG . (8.2)

Here, id : Rn → Rn is the identity map id : x 7→ x. By these formulas, we can conclude,


projE , reflE ∈ L(Rn , Rn ).
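The following NumPy lines are a small sketch (an illustration, not part of the original text) that builds these three maps as matrices for one concrete unit vector n ∈ R3 and checks the decomposition behind (8.2).

import numpy as np

n = np.array([1.0, 2.0, 2.0])
n = n / np.linalg.norm(n)                 # normalise so that ||n|| = 1

P_G = np.outer(n, n)                      # proj_G = n n^T, see (8.1)
I = np.eye(3)
P_E = I - P_G                             # proj_E = id - proj_G, see (8.2)
R_E = I - 2 * P_G                         # refl_E = id - 2 proj_G, see (8.2)

x = np.array([3.0, -1.0, 0.5])
print(np.allclose(P_G @ x + P_E @ x, x))        # True: x = x|_G + x|_E
print(np.allclose(R_E @ x, x - 2 * (P_G @ x)))  # True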

8.2.2 Composition and inverses


Recall that you can form the composition of two maps ` : U → V and k : V → W by
setting:
(k ◦ `)(x) = k(`(x)) for all x ∈ U. (8.3)

Proposition 8.8. Composition of linear maps is linear.


Let U, V, W be F-vector spaces and let ` : U → V and k : V → W be linear maps.
Then, the composition k ◦ ` : U → W is also linear. In short:

` ∈ L(U, V ), k ∈ L(V, W ) ⇒ k ◦ ` ∈ L(U, W ) .

Proof. For all x, y ∈ U and α ∈ F, we find:


(k ◦ `)(x + y) = k(`(x + y)) = k(`(x) + `(y)) = k(`(x)) + k(`(y)) = (k ◦ `)(x) + (k ◦ `)(y)

and (k ◦ `)(αx) = k(`(αx)) = k(α `(x)) = α k(`(x)) = α (k ◦ `)(x),

where we used (8.3) and the properties (L+) and (L·) of k and `.

Hence k ◦ ` also has the properties (L+) and (L·) and is linear.

Let us compose the maps from Example 8.7:

Example 8.9. Recall both projections

projG : x 7→ nn> x and projE = id − projG

and the reflection


reflE = id − 2 projG .
All three maps act between Rn → Rn and can be composed in all possible ways. We
already know that projecting more than once does not change anything:

projG ◦ projG = projG and projE ◦ projE = projE . (8.4)

For the reflection, we expect that using it two times brings us back to the beginning,
which means that we should get the identity map:

reflE ◦ reflE = (id − 2 projG ) ◦ (id − 2 projG )
= id ◦ id − id ◦ 2 projG − 2 projG ◦ id + 2 projG ◦ 2 projG
= id − 2 projG − 2 projG + 4 projG = id.

Composition of both projections gives us the zero map:

projG ◦ projE = projG ◦ (id − projG ) = projG ◦ id − projG ◦ projG = projG − projG = o. (8.5)

In the same way, projE ◦ projG = o. We can also calculate:

reflE ◦ projG = −projG and reflE ◦ projE = projE . (8.6)

Changing the order gives us the same result.

We again look at more abstract examples:

Example 8.10. (a) Let δ0 : P2 (R) → R given by δ0 : f 7→ f (0) the point evaluation and
∂ : P3 (R) → P2 (R) the differential operator ∂ : f 7→ f 0 from Example 8.4 (a) and (b).
Then, the composition δ0 ◦ ∂ from P3 (R) to R is given by
f 7→ f 0 7→ f 0 (0), hence δ0 ◦ ∂ : f 7→ f 0 (0).

The reverse composition ∂ ◦ δ0 is not defined!


(b) Let ∂ : P3 (R) → P2 (R) be the differentiation f 7→ f 0 and, in addition, ∫ : P2 (R) →
P3 (R) the map that sends f ∈ P2 (R) to the function F with

F(x) = ∫_0^x f (t) dt for all x ∈ [0, 1] .

We get:

f 7→ F 7→ F0 = f , hence ∂ ◦ ∫ : f 7→ f , which means ∂ ◦ ∫ = id : P2 (R) → P2 (R).

We can also build the converse composition of ∂ and ∫ . Is ∫ ◦ ∂ then the identity
map id : P3 (R) → P3 (R)?
Let f ∈ P3 (R) be arbitrary, which means f (x) = ax3 + bx2 + cx + d with some
a, b, c, d ∈ R. Then ∂(f ) = f 0 with f 0 (x) = 3ax2 + 2bx + c. Now, we use ∫ : The
function g := (∫ ◦ ∂)(f ) = ∫ (∂(f )) = ∫ (f 0 ) satisfies:

g(x) = ∫_0^x f 0 (t) dt = ∫_0^x (3at2 + 2bt + c) dt = [ at3 + bt2 + ct ]_0^x = ax3 + bx2 + cx

for all x. Hence, (∫ ◦ ∂)(f ) 6= f if d 6= 0. We see that “+d ” is lost. We conclude
∫ ◦ ∂ 6= id.
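The lost constant can also be seen in a short symbolic computation. The following SymPy sketch (an illustration with freely chosen names, not from the book) differentiates a general cubic and integrates the result from 0 to x.

import sympy as sp

x, t, a, b, c, d = sp.symbols('x t a b c d')
f = a*x**3 + b*x**2 + c*x + d

fprime = sp.diff(f, x)                            # applying the differential operator
g = sp.integrate(fprime.subs(x, t), (t, 0, x))    # applying the integration map

print(sp.expand(g))        # a*x**3 + b*x**2 + c*x   (the "+ d" is gone)
print(sp.simplify(g - f))  # -d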

Reminder: Inverse maps


We call a map f : V → W invertible if there is another map g : W → V with
f ◦ g = idW and g ◦ f = idV
Since g is uniquely determined, it is called the inverse map of f and denoted by f −1 .

Recall that bijective and invertible are equivalent notions for maps.
However, here, we are only interested in linear maps between vector spaces. As mentioned
in Chapter 3, we have the following interesting result:

Proposition 8.11. Inverses are again linear.


If ` : V → W is a linear map that is bijective, then its inverse `−1 : W → V is also
linear

Proof. Let u, v ∈ W be arbitrary and set x := `−1 (u) and y := `−1 (v). Since ` is linear,
we have `(x + y) = `(x) + `(y) = u + v. Hence,

`−1 (u + v) = x + y = `−1 (u) + `−1 (v),


which means `−1 has the property (L+). In the same manner, we can show that `−1
satisfies (L·) as well.

Example 8.12. Recall that we already considered a linear map in Section 7.4, namely
the map ΦB : v 7→ vB , which maps a vector v from an F-vector space V to its coordinate
vector vB ∈ Fn with respect to a basis B. The map ΦB is invertible: It is surjective
because B and the standard basis of Fn are generating families, and it is injective because
B and the standard basis of Fn are linearly independent. By Proposition 8.11, we also
know that Φ−1B : vB 7→ v is linear.

Remark:
A linear map ` : V → W exactly conserves the structure of the vector spaces,
meaning vector addition and scalar multiplication. Therefore, mathematicians call
a linear map a homomorphism. A homomorphism ` that is invertible and has an
inverse `−1 that is also a homomorphism is called an isomorphism.

8.3 Finding the matrix for a linear map

8.3.1 Just know what happens to a basis


We have already seen it for a linear map fA : R2 → R2 associated to a matrix A ∈ R2×2
and the houses. When you know what fA does to the ground side of the house, the first
basis vector e1 , and the left side of the house, the second basis vector e2 of R2 , then we
know what happens to the other parts of the house and indeed to the whole space R2
under the map fA .

Rule of thumb: Linearity makes it easy


For a linear map, you only have to know what happens to a basis. The remaining
part of space “tags along”.

Let ` : V → W be a linear map and B = (b1 , . . . , bn ) some basis of V . For each x ∈ V ,


we denote by ΦB (x) ∈ Fn its coordinate vector, which means

ΦB (x) = (α1 , . . . , αn )> ∈ Fn with x = α1 b1 + · · · + αn bn = Φ−1B ((α1 , . . . , αn )> ).

Then:

`(x) = `(α1 b1 + · · · + αn bn ) = α1 `(b1 ) + · · · + αn `(bn ). (8.7)

Equation (8.7) says everything: If you know the images of all basis elements, which
means `(b1 ), . . . , `(bn ), then you know all images `(x) for each x ∈ V immediately.

Example 8.13. Let V = P3 (R) with the monomial basis B = (m0 , m1 , m2 , m3 ) where
mk (x) = xk . For the differential operator ∂ ∈ L(P3 (R), P2 (R)) where ∂ : f 7→ f 0 , we have
∂(m0 ) = o, ∂(m1 ) = m0 , ∂(m2 ) = 2m1 , ∂(m3 ) = 3m2 .
For an arbitrary p ∈ P3 (R), which means p(x) = ax3 + bx2 + cx + d for a, b, c, d ∈ R or
p = dm0 + cm1 + bm2 + am3 , we have

pB = (d, c, b, a)> and hence ∂(p) = d ∂(m0 ) + c ∂(m1 ) + b ∂(m2 ) + a ∂(m3 ) = cm0 + 2bm1 + 3am2 .
Checking this: p0 (x) = 3ax2 + 2bx + c, hence ∂(p) = p0 = 3am2 + 2bm1 + cm0 .

8.3.2 Matrix of a linear map with respect to bases


Let us consider again two arbitrary finite-dimensional F-vector spaces V and W and linear
maps between them. Set n := dim(V ) and m := dim(W ). Fix B = (b1 , . . . , bn ) as a basis
for V and C = (c1 , . . . , cm ) as a basis for W . The idea is now to use both bases to
represent vectors in the vector spaces and also to represent the linear map ` : V → W as
a matrix A ∈ Fm×n . The following picture shows this idea:

vector space V vector space W


with basis B ` with basis C

x `(x)

ΦB Φ−1
B ΦC Φ−1
C

matrix

ΦB (x) ΦC (`(x))

vector space Fn vector space Fm

Question:
How do we get the map or the matrix at the bottom? How do we send the coordinate
vector ΦB (x) to the coordinate vector ΦC (`(x))?

Of course, this is given by composing the three maps:

ΦC (`(x)) = (ΦC ◦ ` ◦ Φ−1B )(ΦB (x)).

So, f := ΦC ◦ ` ◦ Φ−1B is a linear map from Fn to Fm . We already know that there is always
a corresponding matrix A with f = fA . We get the columns of the matrix by putting the
canonical unit vectors into the map:

(ΦC ◦ ` ◦ Φ−1B )(ej ) = ΦC (`(Φ−1B (ej ))) = ΦC (`(bj )).

This gives us a matrix that really represents the abstract linear map. It depends, of course,
on the chosen bases B and C in the vector spaces V and W , respectively. Therefore, we
choose a good name:

Matrix representation of the linear map


For the linear map ` : V → W , we define the matrix

`C←B := (ΦC (`(b1 )) · · · ΦC (`(bn ))) ∈ Fm×n (8.8)

and call it the matrix representation of the linear map ` with respect to the bases B
and C.

This gets us to:

How to map the coordinates

ΦC (`(x)) = `C←B ΦB (x). (8.9)

This completes our picture:



The matrix `C←B describes how ` acts:

[Diagram: x ∈ V (with basis B) is mapped by ` to `(x) ∈ W (with basis C); on the
coordinate level, ΦB (x) ∈ Fn is mapped by the matrix `C←B to ΦC (`(x)) ∈ Fm ; the basis
isomorphisms ΦB , ΦC and their inverses connect the two levels.]

Example 8.14. (a) Let ∂ : P3 (R) → P2 (R) with f 7→ f 0 be the differential operator. We use
in P3 (R) and P2 (R) the respective monomial bases:

B = (m3 , m2 , m1 , m0 ) and C = (m2 , m1 , m0 ).

We already know:
ΦC (∂(m3 )) = ΦC (3m2 ) = (3, 0, 0)> , ΦC (∂(m2 )) = ΦC (2m1 ) = (0, 2, 0)> ,
ΦC (∂(m1 )) = ΦC (m0 ) = (0, 0, 1)> , ΦC (∂(m0 )) = ΦC (o) = (0, 0, 0)> .

The column vectors from above give us the columns of the matrix ∂C←B :

∂C←B = (ΦC (∂(m3 )) ΦC (∂(m2 )) ΦC (∂(m1 )) ΦC (∂(m0 ))) = [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] . (8.10)

Now we can use the map ∂ just on the coordinate level: For f ∈ P3 (R) given by
f (x) = ax3 + bx2 + cx + d with a, b, c, d ∈ R, we have

ΦB (f ) = (a, b, c, d)> , hence ΦC (∂(f )) = ∂C←B ΦB (f ) = [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] (a, b, c, d)> = (3a, 2b, c)> .

So we get:

∂(f ) = Φ−1C ((3a, 2b, c)> ) = 3am2 + 2bm1 + cm0 .
We check this again by ∂(f ) = f 0 and f 0 (x) = 3ax2 + 2bx + c for all x. Therefore,
∂(f ) = 3am2 + 2bm1 + cm0 . Great!
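The recipe “apply ` to the basis of the domain and write the coordinates w.r.t. the basis of the codomain into the columns”, see (8.8), can also be carried out by a computer algebra system. The following SymPy sketch (an illustration with freely chosen helper names, not from the book) reproduces the matrix (8.10).

import sympy as sp

x = sp.symbols('x')
B = [x**3, x**2, x, 1]      # monomial basis of P3(R), ordered as (m3, m2, m1, m0)
C = [x**2, x, 1]            # monomial basis of P2(R), ordered as (m2, m1, m0)

def coords(p, basis):
    # coordinate vector of the polynomial p w.r.t. the given monomial basis
    poly = sp.Poly(p, x)
    return sp.Matrix([poly.coeff_monomial(b) for b in basis])

# columns of the matrix representation = coordinates of the images of the basis vectors
D = sp.Matrix.hstack(*[coords(sp.diff(b, x), C) for b in B])
print(D)    # Matrix([[3, 0, 0, 0], [0, 2, 0, 0], [0, 0, 1, 0]])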
(b) Looking again at the map ∫ : P2 ([0, 1]) → P3 ([0, 1]) which sends f to its antiderivative
F given by

F(x) = ∫_0^x f (t) dt for all x ∈ [0, 1].

Take again the monomial basis B = (m2 , m1 , m0 ) for P2 ([0, 1]) and C = (m3 , m2 , m1 , m0 )
for P3 ([0, 1]). For getting the matrix ∫C←B , we need the images of B. Because of

(∫ mk )(x) = ∫_0^x tk dt = [ tk+1 /(k + 1) ]_0^x = xk+1 /(k + 1) = 1/(k + 1) · mk+1 (x) for k = 2, 1, 0 ,

we get

ΦC (∫ (m2 )) = ΦC ( 1/3 m3 ) = (1/3, 0, 0, 0)> ,
ΦC (∫ (m1 )) = ΦC ( 1/2 m2 ) = (0, 1/2, 0, 0)> ,
ΦC (∫ (m0 )) = ΦC ( 1/1 m1 ) = (0, 0, 1, 0)> .

The matrix representation ∫C←B is now given by the coordinate vectors with respect
to the basis C:

∫C←B = (ΦC (∫ (m2 )) ΦC (∫ (m1 )) ΦC (∫ (m0 ))) = [ 1/3 0 0 ; 0 1/2 0 ; 0 0 1 ; 0 0 0 ] . (8.11)

(c) Let V = P2 (R) with monomial basis B = (m2 , m1 , m0 ) and W = R with basis
C = (1). Look at the map δ0 : f 7→ f (0) as a linear map V → W . For the basis
vectors from B, we get:
δ0 (m2 ) = m2 (0) = 0, δ0 (m1 ) = m1 (0) = 0, δ0 (m0 ) = m0 (0) = 1
and hence (δ0 )C←B = (0 0 1). If we look at another map given by the evaluation at
x = 1, meaning δ1 : f 7→ f (1), then we get (δ1 )C←B = (1 1 1). Let us check the
calculations for a vector f ∈ P2 (R), which means f (x) = ax2 + bx + c with a, b, c ∈ R.
Then:

f = am2 + bm1 + cm0 = Φ−1B ((a, b, c)> ), hence ΦB (f ) = (a, b, c)>

and also

ΦC (δ0 (f )) = (0 0 1) (a, b, c)> = c and ΦC (δ1 (f )) = (1 1 1) (a, b, c)> = a + b + c.

In fact: δ0 (f ) = f (0) = a · 02 + b · 0 + c = c and δ1 (f ) = f (1) = a · 12 + b · 1 + c = a + b + c.


(d) Let F ∈ {R, C} and m, n ∈ N. Choose

A = (a1 . . . an ) ∈ Fm×n

and the associated linear map fA : Fn → Fm with fA : x 7→ Ax. For a basis in
V = Fn , we choose B = (e1 , . . . , en ) and in W = Fm the canonical basis C = (ê1 , . . . , êm ),
where we choose the hats just to distinguish this basis from B. For getting the matrix
representation (fA )C←B , we look at what fA does with the basis B:

fA (e1 ) = Ae1 = a1 = Φ−1C (a1 ), . . . , fA (en ) = Aen = an = Φ−1C (an ). (∗)

In the step (∗), we just use that Φ−1C = id. For the matrix representation (fA )C←B ,
we write the images into the columns and get:

(fA )C←B = (a1 . . . an ) = A.

The matrix representation (fA )C←B of the linear map fA with respect to the canonical
basis is the associated matrix A.
(e) Let d : R2 → R2 be the rotation by angle ϕ. Choose in V = W = R2 the canonical
basis B = (e1 , e2 ). We apply the rotation d to the basis elements e1 = (1, 0)> and e2 = (0, 1)> :

d(e1 ) = d((1, 0)> ) = (cos ϕ, sin ϕ)> = Φ−1B ((cos ϕ, sin ϕ)> ),
d(e2 ) = d((0, 1)> ) = (− sin ϕ, cos ϕ)> = Φ−1B ((− sin ϕ, cos ϕ)> ).

The matrix representation of d with respect to the standard basis is a so-called rotation
matrix:

“Rotation matrix” = matrix representation of the rotation with angle ϕ

dB←B = [ cos ϕ, − sin ϕ ; sin ϕ, cos ϕ ] . (8.12)

(f) Let n ∈ R3 with knk = 1 and projG : R3 → R3 the linear map given by the orthogonal
projection onto G := Span(n). We choose a basis B = (b1 , b2 , b3 ), the same in both copies
of R3 , which fits our problem: Let b1 := n and b2 and b3 orthogonal to n. Then:

projG : x = αb1 + βb2 + γb3 7→ αb1 , where x|G = αb1 and x|E = βb2 + γb3 ,

or in the coordinate language:

(projG )B←B : ΦB (x) = (α, β, γ)> 7→ ΦB (x|G ) = (α, 0, 0)> .

[Figure: x, its projection x|G onto G = Span(b1 ) and its projection x|E onto the plane E
spanned by b2 and b3 .]
There, we can immediately see the matrix representation (projG )B←B :

(projG )B←B = [ 1 0 0 ; 0 0 0 ; 0 0 0 ] . (8.13)

Alternatively, you would calculate the images:

ΦB (projG (b1 )) = ΦB (b1 ) = (1, 0, 0)> ,
ΦB (projG (b2 )) = ΦB (o) = (0, 0, 0)> ,
ΦB (projG (b3 )) = ΦB (o) = (0, 0, 0)> .

8.3.3 Matrix representation for compositions


In Chapter 3, we introduced the addition, scalar multiplication and matrix multiplication
for matrices, and later we generalised it to complex matrices as well. Now, we show that
these operations are compatible with the operations +, α· and ◦ for linear maps.

Proposition 8.15. Operations for matrix representations

(a) Let V and W be two F-vector spaces with bases B and C, respectively. For linear
maps k, ` ∈ L(V, W ) and α ∈ F, we have

(k + `)C←B = kC←B + `C←B and (α `)C←B = α `C←B .

(b) Let U be a third F-vector space with chosen basis A. For all ` ∈ L(U, V ) and
k ∈ L(V, W ), we have

(k ◦ `)C←A = kC←B `B←A .



Please note that on the left-hand side there are the operations +, α· and ◦ for linear maps
and on the right-hand side there are the operations for matrices.
The zero matrix 0 and the identity matrix 1 are exactly the matrix representations of
the zero map o : V → W with x 7→ o and of the identity map id : V → V with x 7→ x,
respectively.

oC←B = 0 and idB←B = 1.

However, for the last equality, you really have to choose the same basis in V . Otherwise,
see equation (8.15) in section 8.3.4 later.
Now choose ` again as a linear map V → W and also a basis B in V and a basis C in W .
If ` is invertible, we immediately get:
(`−1 )B←C `C←B = (`−1 ◦ `)B←B = idB←B = 1 and `C←B (`−1 )B←C = 1.

Hence:

Matrix representation of inverse = inverse matrix

(`−1 )B←C = (`C←B )−1 . (8.14)

From this, we can conclude a very important result:

Corollary 8.16. Bijectivity not possible, if dim(V ) 6= dim(W )


If dim(V ) 6= dim(W ), then all linear maps ` : V → W are not invertible.

Proof. If ` is invertible, then (8.14) says the m × n-matrix `C←B is invertible. This means
that the matrix is a square one, hence dim(V ) = n = m = dim(W ).

Example 8.17. (a) Let projG ∈ L(R3 , R3 ) be the linear operator given by the orthogonal
projection onto G := Span(n). We choose the same basis B in both R3 like in
Example 8.14 (f). For the projection projE and the reflection reflE with respect to
the plane E := {n}⊥ , Proposition 8.15 gives us:

(projE )B←B = (id − projG )B←B = idB←B − (projG )B←B
            = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] − [ 1 0 0 ; 0 0 0 ; 0 0 0 ] = [ 0 0 0 ; 0 1 0 ; 0 0 1 ] ,
(reflE )B←B = (id − 2 projG )B←B = idB←B − 2 (projG )B←B
            = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] − 2 [ 1 0 0 ; 0 0 0 ; 0 0 0 ] = [ −1 0 0 ; 0 1 0 ; 0 0 1 ] ,
where we used (8.2) and (8.13).

(b) Next, we again consider the differential operator ∂ : P3 (R) → P2 (R) and the
antiderivative operator ∫ : P2 (R) → P3 (R). In P2 (R) and P3 (R) choose the monomial
basis B and C, respectively. From Proposition 8.15 and the equations (8.10) and
(8.11), we conclude

(∂ ◦ ∫ )B←B = ∂B←C ∫C←B = [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] [ 1/3 0 0 ; 0 1/2 0 ; 0 0 1 ; 0 0 0 ] = [ 1 0 0 ; 0 1 0 ; 0 0 1 ] = idB←B

and

(∫ ◦ ∂)C←C = ∫C←B ∂B←C = [ 1/3 0 0 ; 0 1/2 0 ; 0 0 1 ; 0 0 0 ] [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] = [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 0 ] 6= idC←C .
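These two identities are easy to confirm numerically. A small NumPy sketch (illustrative, not part of the book) multiplies the matrix representations (8.10) and (8.11):

import numpy as np

D = np.array([[3, 0, 0, 0],
              [0, 2, 0, 0],
              [0, 0, 1, 0]], dtype=float)   # derivative, see (8.10)
S = np.array([[1/3, 0, 0],
              [0, 1/2, 0],
              [0,   0, 1],
              [0,   0, 0]])                  # antiderivative, see (8.11)

print(np.allclose(D @ S, np.eye(3)))   # True:  differentiating after integrating is the identity
print(np.allclose(S @ D, np.eye(4)))   # False: the constant term is lost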

8.3.4 Change of basis

Let B = (b1 , . . . , bn ) and C = (c1 , . . . , cn ) be two bases of V . Then, the identity map
id : x 7→ x of V with respect to B and C has the following matrix representation:

idC←B = (ΦC (id(b1 )) · · · ΦC (id(bn ))) = (bC1 · · · bCn ) = TC←B . (8.15)

The transformation matrix TC←B from equation (7.12) for the change of basis is in fact
the matrix representation of the identity map id (each x stays where it is).
Now, we can expand the notion of the change of basis: Let ` be a linear map V → W .
With respect to the bases B in V and C in W , we find the matrix representation `C←B of
`. In addition, we now choose second bases B 0 in V and C 0 in W and ask the following:

Question:
What is the relation between `C←B and `C 0 ←B0 ?

Let us try to calculate the matrices `C 0 ←B0 with the help of `C←B :

Change of basis left and right


`C 0 ←B0 = (id ◦ ` ◦ id)C 0 ←B0 = idC 0 ←C `C←B idB←B0 = TC 0 ←C `C←B TB←B0 (8.16)

This gives us a nice picture:



Diagram: Change of basis left and right

[Diagram: on the upper coordinate level, xB ∈ Fn is mapped by the matrix representation
`C←B to `(x)C ∈ Fm ; in the middle, x ∈ V is mapped by ` to `(x) ∈ W ; on the lower
coordinate level, xB0 is mapped by `C 0 ←B0 to `(x)C 0 . The basis isomorphisms ΦB , ΦC , ΦB0 , ΦC 0
and the transformation matrices TB←B0 , TB0 ←B , TC 0 ←C , TC←C 0 connect the levels.]

Example 8.18. Let us consider the differential operator ∂ : P3 (R) → P2 (R) where
V = P3 (R) carries the monomial basis B = (m3 , m2 , m1 , m0 ) and an additional basis
B 0 = (2m3 − m1 , m2 + m0 , m1 + m0 , m1 − m0 ) =: (b01 , b02 , b03 , b04 ) .
Moreover, W = P2 (R) carries the monomial basis C = (m2 , m1 , m0 ) and another basis
C 0 = (m2 − 1/2 m1 , m2 + 1/2 m1 , m0 ) =: (c01 , c02 , c03 ).
We already know the transformation TC 0 ←C for the change of basis, see Example 7.21. The
one matrix representation ∂C←B is also known by equation (8.10). The change of basis
TB←B0 is directly given by B 0 :

ΦB (b01 ) = ΦB (2m3 − m1 ) = (2, 0, −1, 0)> , ΦB (b02 ) = ΦB (m2 + m0 ) = (0, 1, 0, 1)> ,
ΦB (b03 ) = ΦB (m1 + m0 ) = (0, 0, 1, 1)> , ΦB (b04 ) = ΦB (m1 − m0 ) = (0, 0, 1, −1)> .
In summary, we have:

TC 0 ←C = [ 1/2, −1, 0 ; 1/2, 1, 0 ; 0, 0, 1 ] , ∂C←B = [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] , TB←B0 = [ 2 0 0 0 ; 0 1 0 0 ; −1 0 1 1 ; 0 1 1 −1 ] .
Using (8.16), we know that the matrix representation ∂C 0 ←B0 is given by the product of
these three matrices:

∂C 0 ←B0 = TC 0 ←C ∂C←B TB←B0 = [ 3 −2 0 0 ; 3 2 0 0 ; −1 0 1 1 ] . (8.17)

Alternatively, we could directly calculate ∂C 0 ←B0 from ∂ and the bases B 0 and C 0 . In order
to do this, we apply ∂ to the basis elements from B 0 and represent the results with respect
to the basis C 0 :

ΦC 0 (∂(b01 )) = ΦC 0 (∂(2m3 − m1 )) = ΦC 0 (6m2 − m0 ) = (3, 3, −1)> ,
ΦC 0 (∂(b02 )) = ΦC 0 (∂(m2 + m0 )) = ΦC 0 (2m1 ) = (−2, 2, 0)> ,
ΦC 0 (∂(b03 )) = ΦC 0 (∂(m1 + m0 )) = ΦC 0 (m0 ) = (0, 0, 1)> ,
ΦC 0 (∂(b04 )) = ΦC 0 (∂(m1 − m0 )) = ΦC 0 (m0 ) = (0, 0, 1)> .

This gives us, as expected, the same matrix as in (8.17).


However, we can also do another alternative computation. Choose a, b, c, d ∈ R arbitrarily.
Then:

f B0 = (a, b, c, d)> 7−→ f = Φ−1B0 (f B0 ) = a(2m3 − m1 ) + b(m2 + m0 ) + c(m1 + m0 ) + d(m1 − m0 )
                                        = 2am3 + bm2 + (−a + c + d)m1 + (b + c − d)m0
7−→ ∂(f ) = 6am2 + 2bm1 + (−a + c + d)m0
          = 6a ( 1/2 c01 + 1/2 c02 ) + 2b (−c01 + c02 ) + (−a + c + d) c03
          = (3a − 2b)c01 + (3a + 2b)c02 + (−a + c + d)c03
7−→ ΦC 0 (∂(f )) = (3a − 2b, 3a + 2b, −a + c + d)> = [ 3 −2 0 0 ; 3 2 0 0 ; −1 0 1 1 ] (a, b, c, d)> . ✓
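The product in (8.16) can also be checked with a few lines of NumPy. The following sketch (illustrative, not part of the book) reproduces (8.17):

import numpy as np

T_CpC = np.array([[0.5, -1, 0],
                  [0.5,  1, 0],
                  [0,    0, 1]])               # T_{C'<-C}
D_CB  = np.array([[3, 0, 0, 0],
                  [0, 2, 0, 0],
                  [0, 0, 1, 0]], dtype=float)  # derivative w.r.t. the monomial bases, (8.10)
T_BBp = np.array([[ 2, 0, 0,  0],
                  [ 0, 1, 0,  0],
                  [-1, 0, 1,  1],
                  [ 0, 1, 1, -1]], dtype=float)  # T_{B<-B'}

print(T_CpC @ D_CB @ T_BBp)
# [[ 3. -2.  0.  0.]
#  [ 3.  2.  0.  0.]
#  [-1.  0.  1.  1.]]    which is the matrix (8.17)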

8.3.5 Equivalent and similar matrices


Both matrices

∂C←B = [ 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] and ∂C 0 ←B0 = [ 3 −2 0 0 ; 3 2 0 0 ; −1 0 1 1 ]
from Example 8.18 look completely different although they describe the same linear map
∂ ∈ L(P3 (R), P2 (R)), however, with respect to two different bases. We will show that
the rank of the matrix is exactly the criterion for being a matrix representation of the
same linear map. Let V and W be two F-vector spaces with dimension n and m,
respectively. Choose a basis B in V and a basis C in W . Let ` ∈ L(V, W ) be given with a
matrix representation A := `C←B ∈ Fm×n , and let A0 ∈ Fm×n be another matrix (maybe
much simpler than A).

Question:
Are there bases B 0 and C 0 in V and W , respectively, such that A0 is the matrix
representation of `,
A0 = `C 0 ←B0 ?

[Diagram: xB ∈ Fn is mapped by `C←B = A to `(x)C ∈ Fm (bases B in V and C in W );
with T = TB←B0 and S = TC 0 ←C , the coordinate vector xB0 is mapped by `C 0 ←B0 = A0 = SAT
to `(x)C 0 (bases B 0 and C 0 ), where A0 should ideally have a nicer structure than A.]

We already know, cf. (8.16),

`C 0 ←B0 = TC 0 ←C `C←B TB←B0 =: SAT with S := TC 0 ←C and T := TB←B0 .

Choosing all possible bases B 0 and C 0 in V and W , respectively, we get all possible invert-
ible matrices S and T and hence with `C 0 ←B0 all matrices that are equivalent to A:

Proposition & Definition 8.19. Equivalent matrices


A matrix B ∈ Fm×n is called equivalent to another matrix A ∈ Fm×n if there are
invertible matrices S ∈ Fm×m and T ∈ Fn×n with

B = SAT.

In this case, we write B ∼ A. For arbitrary matrices A, B, C ∈ Fm×n , the following


holds:

A ∼ A, A ∼ B ⇒ B ∼ A, A ∼ B ∧ B ∼ C ⇒ A ∼ C.

Equivalent matrices describe the same linear map, just with respect to different bases.
Here, we have a simple test for equivalence:

Proposition 8.20. Equivalence is given by the rank


For two matrices A, B ∈ Fm×n , we have: A ∼ B ⇐⇒ rank(A) = rank(B).

Proof. For a given matrix A, we can use the Gaussian elimination to bring it into a
row echelon form K, namely P A = LK. By using even more row operations (backward
substitution) and column exchanges, we can bring the matrix into a so-called normal
form, given as a block matrix:

A ⇝ [ 1r 0 ; 0 0 ] =: 1r,m,n ∈ Fm×n with r = rank(A).

Note that in all these steps, we used invertible matrices from left and from right and,
therefore, did not change the rank of the matrix. In other words, there are invertible
matrices S and T with 1r,m,n = SAT , which means A ∼ 1r,m,n .
⇐: If rank(A) =: r is the same as rank(B) =: r0 , then we immediately get A ∼ 1r,m,n and
1r0 ,m,n = 1r,m,n ∼ B and therefore A ∼ B.
⇒ Now, let A ∼ B. Then, there are invertible matrices S and T with B = SAT .
Because of

Ran(AT ) = {AT x : x ∈ Fn } = {Ay : y ∈ Ran(T ) = Fn } = Ran(A) ,

we get rank(AT ) = dim(Ran(AT )) = dim(Ran(A)) = rank(A) and can conclude:

rank(A) = rank(AT ) = rank((AT )∗ ) = rank(T ∗ A∗ ) =(∗) rank(T ∗ A∗ S ∗ ) = rank((SAT )∗ ) = rank(B ∗ ) = rank(B).

In (∗), we used that invertible matrices do not change the rank.

In particular, the natural number rank(`C←B ) is only dependent upon ` and not upon the
bases B and C. Hence the proposition below works as an alternative definition for the rank
of a linear map.

Definition 8.21. Rank of a linear map


Let ` ∈ L(V, W ). The number

rank(`) := dim(Ran(`)).

is called the rank of the linear map `.

Recall that the range of any map is the set of all elements that are “hit” by the map.

Proposition 8.22. Rank of linear map or matrix


For ` ∈ L(V, W ), the rank is the same as the rank of `C←B where B and C are any
bases in V and W :
rank(`) = rank(`C←B ).

Example 8.23. The operator projG ∈ L(R3 , R3 ) given by the orthogonal projection onto
G := Span(n) with n ∈ R3 , knk = 1 has the matrix representation with respect to B given

by (8.13), cf. Example 8.14 (f). On the other hand, with respect to the standard basis
E = (e1 , e2 , e3 ), the operator projG is given by nn> , cf. equation (8.1), which means:

(projG )B←B = [ 1 0 0 ; 0 0 0 ; 0 0 0 ] and (projG )E←E = nn> ∈ R3×3 .

Both are matrix representations of the same linear map projG . Hence, they are equivalent
and have the same rank.
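A short NumPy sketch (illustrative, not from the book) confirms that both representations indeed have the same rank 1:

import numpy as np

n = np.array([1.0, 2.0, 2.0])
n = n / np.linalg.norm(n)                  # any unit vector n

rep_B = np.diag([1.0, 0.0, 0.0])           # (proj_G)_{B<-B}, see (8.13)
rep_E = np.outer(n, n)                     # (proj_G)_{E<-E} = n n^T

print(np.linalg.matrix_rank(rep_B), np.linalg.matrix_rank(rep_E))   # 1 1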

Now, we go back to a linear map ` : V → V , which means V = W . There, we can choose
the same basis on the left and on the right, in other words B = C and B 0 = C 0 .
Let ` ∈ L(V, V ) and A := `B←B the matrix representation w.r.t. B and B.

Question:
Which matrices A0 do we get by `B0 ←B0 when B 0 is any basis of V ? What is the
connection to A?

[Diagram: xB is mapped by `B←B = A to `(x)B (basis B in V on both sides); with
T = TB←B0 and S = TB0 ←B = T −1 , the coordinate vector xB0 is mapped by
`B0 ←B0 = A0 = T −1 AT to `(x)B0 , ideally with a nicer structure.]

In the same way as above, we have:

`B0 ←B0 = TB0 ←B `B←B TB←B0 = SAT = T −1 AT with S := TB0 ←B and T := TB←B0 ,

because S = T −1 by equation (7.15). This means the new matrix representation is given
by multiplying with a suitable matrix T from right and with the inverse T −1 from the
left. Hence, we define:

Proposition & Definition 8.24. Similar matrices


A square matrix B ∈ Fn×n is called similar to another matrix A ∈ Fn×n if there is
an invertible T ∈ Fn×n with:
B = T −1 AT.

In this case, we write B ≈ A. For A, B, C ∈ Fn×n , we get:


A ≈ A, A ≈ B ⇒ B ≈ A, A ≈ B ∧ B ≈ C ⇒ A ≈ C.

For square matrices A and B, we have the following implication:

A≈B ⇒ A∼B

The converse implication is in general wrong: 6⇐. For example, we have

[ 1 0 ; 0 0 ] ∼ [ 0 1 ; 0 0 ] but [ 1 0 ; 0 0 ] 6≈ [ 0 1 ; 0 0 ] ,

since both matrices have the same rank 1 but different eigenvalues.

The classification by ∼ is much coarser than the one by ≈. Similarity ≈ is completely


determined by the Jordan normal form (cf. tutorial and Section 9.1).

8.4 Solutions of linear equations


Let V and W be two F-vector spaces. For a linear map ` ∈ L(V, W ) and a vector b ∈ W ,
we can ask about solutions of the equation

`(x) = b.

We are interested in existence and uniqueness of solutions.

Kernel and range of `


Ker(`) := {x ∈ V : `(x) = o} ⊂ V and Ran(`) := {`(x) : x ∈ V } ⊂ W.

8.4.1 Existence for solutions


Clearly: The equation `(x) = b has solutions x ∈ V if b ∈ W is of the form `(x) for some
x ∈ V , which means b ∈ Ran(`) :

b ∈ Ran(`) ⇒ `(x) = b has a solution,
c ∈/ Ran(`) ⇒ `(x) = c has no solution.

[Figure: ` maps V onto Ran(`) ⊂ W ; a right-hand side b inside Ran(`) is hit, a
right-hand side c outside Ran(`) is not.]

The existence of solutions is independent of the right-hand side b if and only if all b ∈ W
lie in Ran(`), which means:

Ran(`) = W (8.18)

We call this, as before, unconditional solvability of the equation `(x) = b and this is
equivalent to the surjectivity of `.

8.4.2 Uniqueness and solution set


If `(x) = b has solutions x ∈ V , then the best case scenario is when there is exactly one
such solution.
For b = o (the homogeneous equation), the solution space is given by Ker(`). Because
of Proposition 8.2, o ∈ V is always a solution and always in Ker(`). Uniqueness of the
solution is then given if o is the only element in Ker(`), hence Ker(`) = {o}.
For b 6= o (the inhomogeneous equation) we get the same criterion for uniqueness: Let
xp ∈ V be a solution, which means `(xp ) = b, and x ∈ V another solution, which means
`(x) = b, then:

`(x − xp ) = `(x) − `(xp ) = b − b = o, and x − xp ∈ Ker(`).

On the other hand, if x − xp ∈ Ker(`) and `(xp ) = b, then also x is a solution because

`(x) = `(xp + k) = `(xp ) + `(k) = b + o = b, where k := x − xp ∈ Ker(`).

In summary:

Solution set of `(x)=b is an affine subspace


Let xp ∈ V be a solution of the equation with right-hand side b ∈ W , which means
`(xp ) = b. The solution set S = {x ∈ V : `(x) = b} is given as an affine subspace:

S = {x = xp + k : `(k) = o} = {xp + k : k ∈ Ker(`)} = xp + Ker(`)

While Ker(`) is a linear subspace of V , the solution set S = xp + Ker(`) is an affine


subspace of V . The kernel is translated by one particular solution xp :

[Figure: the kernel Ker(`) ⊂ V (containing o and k) is mapped to o ∈ W ; the affine
subspace S = xp + Ker(`) (containing xp and xp + k) is mapped to b.]

Therefore, also in the inhomogeneous case, the solution is unique if and only if

Ker(`) = {o} (8.19)

holds since S contains at most one element in this case, namely xp if it exists. If there is no
solution xp ∈ V at all, then S = ∅. We call this unique solvability, and for a linear map
` this is exactly the injectivity.

8.4.3 Invertibility: unconditional and unique solvability


The map ` is bijective if and only if it is surjective and injective. This is again equivalent
to the existence of an inverse map `−1 : W → V .

[Figure: when Ker(`) = {o} and Ran(`) = W , the map ` is a bijection between V and W ;
the solution set is S = {xp } and the inverse `−1 sends b back to xp .]
`−1 maps each right-hand side b ∈ W to the unique solution x = xp ∈ V .

8.4.4 A link to the matrix representation


Let B and C be bases in V and W , respectively, and also denote the dimensions by n and
m. Then for a linear map ` : V → W , we get a matrix representation `C←B ∈ Fm×n .

[Figure: on the abstract level, ` maps V to W with kernel Ker(`) and range Ran(`), and
the solution set S of `(x) = b sits in V ; on the coordinate level, `C←B maps Fn to Fm with
kernel Ker(`C←B ) and range Ran(`C←B ), sending xB to `(x)C ; the basis isomorphisms ΦB
and ΦC connect the two levels.]

Our equation `(x) = b can be translated to the coordinate level to `C←B ΦB (x) = ΦC (b).
We immediately get
x ∈ Ker(`) ⇐⇒ ΦB (x) ∈ Ker(`C←B )
and
b ∈ Ran(`) ⇐⇒ ΦC (b) ∈ Ran(`C←B ).
This means:

Ker(`) = Φ−1B (Ker(`C←B )) and Ran(`) = Φ−1C (Ran(`C←B )). (8.20)

Hence Ker(`) and Ker(`C←B ) can only simultaneously be trivial, which means {o}, and
in the same manner Ran(`) and Ran(`C←B ) can also only simultaneously be the whole
space. For matrices `C←B , we can restate our Propositions 3.65 and 3.67 from Section 3.12:

Proposition 8.25. Unconditional solvability (surjectivity of `)


The following claims are equivalent:
(i) The equation `(x) = b has at least one solution x for each b ∈ W .
(ii) ` is surjective.
(iii) Ran(`) = W .
(iv) Ran(`C←B ) = Fm .
(v) rank(`) = rank(`C←B ) = m ≤ n.
(vi) If using Gaussian elimination to bring `C←B into row echelon form, then each
row has a pivot.

Proposition 8.26. Unique solvability (injectivity of `)


The following claims are equivalent:
(i) The equation `(x) = b has at most one solution x for each b ∈ W .
(ii) ` is injective.
(iii) Ker(`) = {o}.
(iv) Ker(`C←B ) = {o}.
(v) rank(`) = rank(`C←B ) = n ≤ m.
(vi) If using Gaussian elimination to bring `C←B into row echelon form, then each
column has a pivot.

In the case m = n, we get:

Proposition 8.27. Fredholm alternative


In the case m = n, which means dim(V ) = dim(W ) < ∞, either all claims from
Propositions 8.25 and 8.26 are true or none of them are true.

Also the rank-nullity theorem can now be transformed to the abstract case:

Rank-nullity theorem
For two F-vector spaces V , W with dim(V ) < ∞ and a linear map ` : V → W , we
have:

dim(Ker(`)) + dim(Ran(`)) = dim(V ).

Example 8.28. (a) The projection operator projG ∈ L(R3 , R3 ) with G = Span(n) has
the Ker(projG ) = E := {n}⊥ and Ran(projG ) = G. The rank-nullity theorem gives
us 2 + 1 = 3. The map is neither injective nor surjective. The equation projG (x) = b
has a solution only if b ∈ G. The solution set S is then a line that is parallel to E,
hence a translation of the kernel by a particular solution xp .

(b) The reflection reflE = id−2 projG is an invertible map where the inverse is reflE again.
The kernel is {o} and the range R3 . The rank-nullity theorem says 0 + 3 = 3. The
equation reflE (x) = b has a unique solution for all b ∈ R3 .
(c) Consider the differential operator ∂ : f 7→ f 0 , defined as P3 (R) → P3 (R), where P3 (R)
carries the monomial basis B. The matrix representation is:

∂B←B = [ 0 0 0 0 ; 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] . (8.21)

Obviously,

Ker(∂B←B ) = Span((0, 0, 0, 1)> ) and Ran(∂B←B ) = Span((0, 3, 0, 0)> , (0, 0, 2, 0)> , (0, 0, 0, 1)> ),

and hence by (8.20), we have

Ker(∂) = Φ−1B (Ker(∂B←B )) = Span(m0 ) = P0 (R)

and

Ran(∂) = Φ−1B (Ran(∂B←B )) = Span(m2 , m1 , m0 ) = P2 (R) .

This makes sense since f 0 = o is solved exactly by the constant functions, which are
all functions f ∈ P0 (R), and the set of all possible outcomes f 0 for f ∈ P3 (R) is exactly
P2 (R).
The equation ∂(f ) = g, which means searching for an antiderivative of a given function
g ∈ P3 (R), has the solution set S = ∅ if g 6∈ P2 (R) = Ran(∂). For g ∈ P2 (R), we
have
S = fp + Ker(∂) = fp + P0 (R)
with a particular solution fp ∈ P3 (R).
(d) We want to find all solutions f ∈ P3 (R) for the differential equation f 00 + 2f 0 − 8f =
2m2 − 3m1 . In order to do this, we define the linear map ` := ∂ ◦ ∂ + 2 ∂ − 8 id by
P3 (R) → P3 (R) and `(f ) = f 00 + 2f 0 − 8f . Then, we only have to solve the equation
`(f ) = 2m2 − 3m1 in P3 (R). Choose for P3 (R) the monomial basis B, namely on
both sides:

`B←B = (∂ ◦ ∂ + 2 ∂ − 8 id)B←B = (∂B←B )2 + 2 ∂B←B − 8 idB←B
     = [ 0 0 0 0 ; 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ]2 + 2 [ 0 0 0 0 ; 3 0 0 0 ; 0 2 0 0 ; 0 0 1 0 ] − 8 [ 1 0 0 0 ; 0 1 0 0 ; 0 0 1 0 ; 0 0 0 1 ]
     = [ −8 0 0 0 ; 6 −8 0 0 ; 6 4 −8 0 ; 0 2 2 −8 ] ,
using (8.21).

Since the matrix `B←B is invertible (look at the determinant (−8)^4 6= 0), we can use
Proposition 8.25 and Proposition 8.26, which say that we have a unique solution for
all right-hand sides in P3 (R).
Let us now solve our equation from above: Seeing it in the coordinate language, we
can rewrite `(f ) = 2m2 − 3m1 as `B←B f B = (2m2 − 3m1 )B , hence:
[ −8 0 0 0 ; 6 −8 0 0 ; 6 4 −8 0 ; 0 2 2 −8 ] f B = (0, 2, −3, 0)>
(rows and columns ordered as m3 , m2 , m1 , m0 ). Solution: f B = (0, −1/4, 1/4, 0)> ,

and hence f = Φ−1B (f B ) = − 1/4 m2 + 1/4 m1 , i.e. f (x) = − 1/4 x2 + 1/4 x.
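On the coordinate level this is just a 4 × 4 linear system, so it can also be solved numerically. A small NumPy sketch (illustrative, not part of the book):

import numpy as np

L = np.array([[-8,  0,  0,  0],
              [ 6, -8,  0,  0],
              [ 6,  4, -8,  0],
              [ 0,  2,  2, -8]], dtype=float)   # the matrix representation from above
rhs = np.array([0.0, 2.0, -3.0, 0.0])           # coordinates of 2*m2 - 3*m1 w.r.t. B

coeffs = np.linalg.solve(L, rhs)
print(coeffs)   # [ 0.   -0.25  0.25  0.  ]  which means f(x) = -(1/4)*x**2 + (1/4)*x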

8.5 Determinants and eigenvalues for linear maps

Reminder: Determinant and eigenvalues for similar matrices


Let A and B be two square matrices that are similar, A ≈ B. Then:

det(A) = det(B) and spec(A) = spec(B).

For this last section, we consider a linear map ` ∈ L(V, V ), which means V = W . All
matrix representations `B←B , where we have the same basis left and right, are similar
matrices and therefore have the same determinant and eigenvalues (see the reminder above).
Hence, we define:

Definition 8.29. Determinant of a linear map `


The determinant of a linear map ` ∈ L(V, V ) is defined as the determinant of a
matrix representation `B←B for any basis B in V :

det(`) := det(`B←B ).

Knowing the matrix representations, we immediately get the rules for the determinant:
   
det(k◦`) = det (k◦`)B←B = det kB←B `B←B = det kB←B det `B←B = det(k) det(`)
and det(id) = det(idB←B ) = det(1) = 1.

Combining these two rules, we get:

det(`−1 ) det(`) = det(`−1 ◦ `) = det(id) = 1 and det(`−1 ) = 1/ det(`).

The notion of eigenvalues and eigenvectors were introduced for matrices and the associated
linear maps. Indeed the whole definition makes also sense in the abstract setting, so that
we can also use it for linear maps ` : V → V . In the end, this will be totally connected
to the eigenvalues of the matrix representations.

Definition 8.30. Eigenvalue and eigenvector for linear map `


Let ` ∈ L(V, V ). A vector x ∈ V \ {o} is called an eigenvector of ` if `(x) is
a multiple of x. The number λ ∈ F with `(x) = λx is called an eigenvalue of `
associated to the eigenvector x.

Again, we get the following equivalences:

λ is an eigenvalue of `
⇐⇒ the homogeneous equation (` − λ id)(x) = o has a solution x 6= o
⇐⇒ ` − λ id is not injective
⇐⇒ ` − λ id is not invertible (by Proposition 8.27)
⇐⇒ (` − λ id)B←B = `B←B − λ idB←B = `B←B − λ1 is not invertible for any basis B of V (by (8.20))
⇐⇒ λ is an eigenvalue of `B←B for all bases B of V
⇐⇒ det((` − λ id)B←B ) = 0 for all bases B of V
⇐⇒ det(` − λ id) = 0

Example 8.31. (a) The rotation d ∈ L(R2 , R2 ) from Example 8.14 (e) has the determinant 1
since, with the associated matrix representation (8.12) w.r.t. the standard basis B in R2 ,

det(d) = det(dB←B ) = det [ cos ϕ, − sin ϕ ; sin ϕ, cos ϕ ] = (cos ϕ)2 + (sin ϕ)2 = 1.

For F = R, we only find eigenvalues and eigenvectors if ϕ is an integer multiple of


π. For example, for ϕ = π, we have d = −id and hence each vector in R2 is an
eigenvector for the eigenvalue λ = −1.
(b) For the orthogonal projection projG ∈ L(R3 , R3 ) onto the line G := Span(n) and both
variants
projE = id − projG and reflE = id − 2 projG
from Example 8.7, 8.9, 8.14 (f) and 8.17 (a), we find with the help of equation (8.13):

det(projG ) = det [ 1 0 0 ; 0 0 0 ; 0 0 0 ] = 0.

Using Example 8.17 (a), we get:

det(projE ) = det [ 0 0 0 ; 0 1 0 ; 0 0 1 ] = 0 and det(reflE ) = det [ −1 0 0 ; 0 1 0 ; 0 0 1 ] = −1.

For projG each vector from G is an eigenvector for the eigenvalue 1, and each vector
from E is an eigenvector for the eigenvalue 0 since E is the kernel of projG . For projE
we have the same with G ↔ E. For reflE each vector from G is an eigenvector for the
eigenvalue −1, and each vector from E is an eigenvector for the eigenvalue 1.

Summary
• A map ` from one F-vector space V to another F-vector space W is called linear if
`(x + y) = `(x) + `(y) and `(αx) = α`(x) for all x, y ∈ V and α ∈ F. We write:
` ∈ L(V, W ).
• Linear maps in L(V, W ) can be added and scaled with α ∈ F. Hence, L(V, W ) becomes
an F-vector space.
• The composition k ◦ ` of linear maps ` : U → V and k : V → W is linear.
• The inverse map of a bijective linear map is again linear. Therefore a bijective linear
map is called an isomorphism.
• Each linear map ` ∈ L(V, W ), between finite dimensional vector spaces V and W ,
can be identified with a matrix. In order to do this, choose a basis B = (b1 , . . . , bn )
in V and a basis C = (c1 , . . . , cm ) in W . By using the basis isomorphisms ΦB and
ΦC , we get a linear map Fn → Fm . Such a linear map is also represented by an m × n
matrix `C←B := (`(b1 )C · · · `(bn )C ). It is called the matrix representation of ` w.r.t.
B and C.
• The matrix representation of k + ` is the sum of both matrix representations.
• The matrix representation of α` is α times the matrix representation of `.
• The matrix representation of k ◦ ` is the product of both matrix representations.
• The matrix representation of `−1 is the inverse of the matrix representation of `.
• Kernel and range of a linear map ` can be calculated by `C←B .
• By changing the basis of V from B to B 0 and changing the basis of W from C to C 0 ,
the matrix representation of ` : V → W changes from `C←B to `C 0 ←B0 . In this case,
we have `C 0 ←B0 = TC 0 ←C `C←B TB←B0 .
• We call two matrices A and B equivalent and write A ∼ B if there are invertible
matrices S and T with B = SAT .
• We have A ∼ B if and only if rank(A) = rank(B).
• For the special case ` : V → V , one often chooses the same basis B left and right.
How does the matrix `B←B change when changing the basis B to B 0 ? Then, we have
S = T −1 in the formula above.
• Two matrices A and B are called similar and one writes A ≈ B if there is an
invertible matrix T with B = T −1 AT .
• From A ≈ B it follows that det(A) = det(B) and spec(A) = spec(B), but the converse
is in general false.
• det(`) for a linear map ` : V → V is defined by det(`B←B ) for any basis B in V .
• λ ∈ F is an eigenvalue of ` : V → V if `(x) = λx for some x ∈ V \ {o}.
9 Some matrix decompositions
The story so far: In the beginning the Universe was created. This has
made a lot of people very angry and been widely regarded as a bad move.
Douglas Adams

In the previous chapters 7 and 8, we considered general vector spaces and linear maps
between them. We could show that we are able to decode abstract vectors into vectors
from Fn or Fm and also abstract linear maps into matrices from Fm×n . Therefore, even in
this new general setting, it is important to know how to deal with matrices. Accordingly,
in this chapter, we only consider matrices and want to find some simpler structures for a
given matrix. We already observed such transformations or decompositions of a matrix
into simpler forms:
• In Section 3.11.3, we discovered how to decompose a square matrix into a lower
and an upper triangular matrix: A = LU . We also generalised this for rectangular
matrices as A = P LK where K is the row echelon form.
• In Section 5.5, we discovered how to decompose a square matrix A with full rank
into an orthogonal matrix and an upper triangular matrix: A = QR. It is not hard
to generalise this method for complex matrices and for rectangular matrices as well.
• In Section 6.7, we found that some matrices A can be decomposed into three parts

A = XDX −1

where D is a diagonal matrix with eigenvalues on the diagonal and X consists of


eigenvectors in the columns. Note that diagonalisable actually means similar to a
diagonal matrix .
The mentioned diagonalisation aspect is the one we want to generalise. At first, we do
this for all square matrices, which brings us to the Jordan normal form, and then we do
this even for rectangular matrices, which brings us to the singular value decomposition.


9.1 Jordan normal form


We are searching for the best substitute of the usual diagonalisation A = XDX −1 such
that it works for all matrices A ∈ Cn×n . A good thing would be to use a triangular matrix
instead of D if A is not diagonalisable. The next Proposition tells us that we only need
some 1s above the diagonal:

Proposition & Definition 9.1. Jordan normal form


Let A ∈ Cn×n with pairwise different eigenvalues λ1 , . . . , λr ∈ C, where α1 , . . . , αr
denote the corresponding algebraic multiplicities and γ1 , . . . , γr the corresponding
geometric multiplicities. Then, there is an invertible matrix X ∈ Cn×n such that

A = XJX −1 or equivalently X −1 AX = J

and J ∈ Cn×n has the following block diagonal form:


 
J1
..
J = . .
 
Jr

J is called a Jordan normal form (JNF) of A. The entries Ji are again block
matrices, which are called Jordan blocks, and have the following structure:

Ji = Diag(Ji,1 , . . . , Ji,γi ) ∈ Cαi ×αi ,

where the matrices Ji,` are called Jordan boxes and have the following form: Ji,` has λi
in every diagonal entry, 1 in every entry directly above the diagonal and 0 everywhere else,
for example

Ji,` = [ λi 1 0 ; 0 λi 1 ; 0 0 λi ] in the 3 × 3 case.

Note that Ji,` could also be a 1 × 1-matrix.

Example 9.2. If you have a matrix A ∈ C9×9 and find an invertible matrix X with
A = XJX −1 such that

J = Diag( [ 4 1 0 ; 0 4 1 ; 0 0 4 ] , [ 4 1 ; 0 4 ] , [ −3 1 ; 0 −3 ] , (−3) , (−3) ) ∈ C9×9 ,

where the first two boxes form the Jordan block J1 (for the eigenvalue λ1 = 4) and the last
three boxes form the Jordan block J2 (for the eigenvalue λ2 = −3),

then you immediately find the following information about A:


• On the diagonal of J, we find the eigenvalues A (counted with algebraic multiplicity),
namely λ1 = 4 and λ2 = −3.
• The Jordan block J1 , which corresponds to the eigenvalue λ1 = 4, has the size
5 × 5, hence α1 = 5. We conclude that λ1 = 4 is an eigenvalue of A with algebraic
multiplicity 5. The block J1 consists of two boxes and therefore γ1 = 2, which means
the eigenspace for λ1 = 4 is 2-dimensional. Here, the Jordan box J1,1 has the size
3 × 3 and the Jordan box J1,2 has the size 2 × 2.
• The Jordan block J2 , which corresponds to the eigenvalue λ2 = −3, has the size 4×4,
hence α2 = 4. We conclude that λ2 = −3 is an eigenvalue of A with multiplicity 4.
The block J2 owns three boxes, which means γ2 = 3. Here, the box J2,1 has the size
2 × 2 and the two boxes J2,2 and J2,3 have size 1 × 1.
On the other hand, we learn that J is not determined solely by eigenvalues and
multiplicities because also the matrix

Diag( [ 4 1 0 0 ; 0 4 1 0 ; 0 0 4 1 ; 0 0 0 4 ] , (4) , [ −3 1 ; 0 −3 ] , (−3) , (−3) ) ∈ C9×9

would fit to these parameters above,

λ1 = 4, α1 = 5, γ1 = 2, λ2 = −3, α2 = 4, γ2 = 3 .

Construction of J and X: How and why?


First we need the eigenvalues λ1 , . . . , λr of A because on a triangular matrix they have
to be on the diagonal counted with the algebraic multiplicities. So we also determine
α1 , . . . , αr . For each λi , we do the following procedure.

Rule of thumb: Treat the problem for all λi separately.


Each λi has its own Jordan block Ji and corresponding columns in X. Therefore,
we can deal with the problem for each eigenvalue separately and put it together in
the end.

Since A ≈ J, we already know that the characteristic polynomials of A and J coincide
(cf. Proposition 6.26). Hence, both matrices have the same eigenvalues with the same

(cf. Proposition 6.26). Hence, both matrices have the same eigenvalues with the same
algebraic multiplicities. They have to be on the diagonal of J by Proposition 6.9.

Size of Ji
The Jordan block Ji for the eigenvalue λi has the size αi × αi because we need λi as
often on the diagonal of J as the algebraic multiplicity says.

The n columns of X have to be linearly independent vectors from Cn in order that X is


invertible. Just looking at the αi × αi -block Ji , we need αi columns from this matrix X.
How to get them?
Recall that for the diagonalisation, in the case that A is diagonalisable, we had enough
eigenvectors corresponding to the eigenvalue λi , which means vectors from Ker(A − λi 1).
We could choose them as a linearly independent family because

dim(Ker(A − λi 1)) =: γi = αi .

In the case γi < αi (which means A is not diagonalisable), we are missing some columns
in X.

To shorten everything: A − λi 1 =: N

Let us look at an example with αi = 8 and γi = 4. Choose x1 , . . . , x4 ∈ Ker(N ), which


are eigenvectors of A.
[Figure: the first level Ker(N ), of dimension 4 = γi , containing the eigenvectors
x1,1 , x2,1 , x3,1 , x4,1 .]
We need αi = 8 linearly independent vectors for X but at this point we only have γi = 4.
How to get the missing four vectors?
Answer: Since we have not found enough vectors in the kernel of N , we can look at the
kernels of N 2 , N 3 , ... until we have found 8 vectors in total. Clearly:

Ker(N ) ⊂ Ker(N 2 ) ⊂ Ker(N 3 ) ⊂ · · · , since N x = o ⇒ N 2 x = N (N x) = o, . . .

Recall: Ker(N ) has the dimension γi = 4. Suppose that Ker(N 2 ) is of dimension 7 and
that Ker(N 3 ) has dimension 8 = αi . The difference

dk := dim(Ker(N k )) − dim(Ker(N k−1 ))

of the dimensions shows us where to find the four missing vectors. Elements from the
spaces Ker(N k ) are called generalised eigenvectors. To be more clear, we call an element
from Ker(N k ) \ Ker(N k−1 ) a generalised eigenvector of rank k. In this sense, the ordinary
eigenvectors are now generalised eigenvectors of rank 1.

Box 9.3. Levels and Jordan chains

[Figure: three levels drawn on top of each other.
3rd level: Ker(N 3 ) with dimension 8 = αi and d3 = 8 − 7 = 1 new vector x1,3 .
2nd level: Ker(N 2 ) with dimension 7 and d2 = 7 − 4 = 3 new vectors x1,2 , x2,2 , x3,2 .
1st level: Ker(N 1 ) with dimension 4 = γi and the eigenvectors x1,1 , x2,1 , x3,1 , x4,1 .
Multiplying by N moves a vector one level down; the columns x1,3 → x1,2 → x1,1 ,
x2,2 → x2,1 , x3,2 → x3,1 and x4,1 form the 1st, 2nd, 3rd and 4th Jordan chain.]

As you can see in the picture, the vectors form “chains”, from top to bottom. We call each
of these sequences a Jordan chain and it will be related to a Jordan box.

Box 9.4. Number and size of the Jordan boxes


Each Jordan chain ends at an ordinary eigenvector xj,1 ∈ Ker(N ). Therefore, we
have exactly γi Jordan boxes inside the chosen Jordan block Ji . The length of a
Jordan chain is the size of the corresponding Jordan box. All sizes add up to αi
(here: 8), which is exactly the size of the Jordan block Ji .

Looking at our example, we have 4 Jordan boxes of size 3, 2, 2 and 1. Hence:

Ji = Diag( [ λi 1 0 ; 0 λi 1 ; 0 0 λi ] , [ λi 1 ; 0 λi ] , [ λi 1 ; 0 λi ] , (λi ) ) ∈ C8×8 .

At this point, we now know the whole block Ji . The next step is to find the corresponding
columns of X, which means that we have to calculate the generalised eigenvectors xj,k :

Box 9.5. Generalised eigenvectors: Start the Jordan chain


The starting point xj,k for the jth Jordan chain can be chosen in an almost arbitrary
way from the kth level: Let xj,k ∈ Ker(N k ), but

xj,k 6∈ Span( Ker(N k−1 ) ∪ {x1,k , . . . , xj−1,k } ) , (9.1)




where x1,k , . . . , xj−1,k are the vectors from the chains before, 1 to j − 1, which lie
on the same level k. Now you can build the whole chain to the bottom xj,1 . We just
have to multiply with N in each step:
For x ∈ Ker(N k ), we have N x ∈ Ker(N k−1 ) since o = N k x = N k−1 (N x).

Note that equation (9.1) guarantees that all generalised eigenvectors on the kth level are
linearly independent and that the linear independence remains on the levels below. All
these αi generalised eigenvectors are put as columns into X.

Box 9.6. Columns of X regarding λi


Let Xi ∈ Cn×αi be the matrix with columns filled out from left to right:

1st Jordan chain (bottom to top), . . . , γi th Jordan chain (bottom to top).

For our example, this means: Xi = (x1,1 , x1,2 , x1,3 , x2,1 , x2,2 , x3,1 , x3,2 , x4,1 ) ∈ Cn×8 .


After we did the whole procedure for all eigenvalues λ1 , . . . , λr , the only thing that remains
is:

Put everything together

J := Diag(J1 , . . . , Jr ) ∈ Cn×n and X := (X1 , . . . , Xr ) ∈ Cn×n . (9.2)

This is all. Let us summarise the whole story:

Algorithm for calculating a Jordan normal form of A


Given: An arbitrary matrix A ∈ Cn×n .
Wanted: Jordan normal form J and X in Cn×n with A = XJX −1 .
Algorithm

• Calculate all eigenvalues λ1 , . . . , λr (pairwise distinct) of A


and the algebraic multiplicities α1 , . . . , αr .
• For i = 1, . . . , r:

• Set N := A − λi 1.
• Calculate Ker(N ), Ker(N 2 ), . . . , Ker(N m ) until dim(Ker(N m )) = αi .
• Calculate all dk := dim(Ker(N k )) − dim(Ker(N k−1 )).
• Draw the levels 1, . . . , m and the Jordan chains. (Box 9.3)
• Write down the Jordan block Ji . (Box 9.4)
• Calculate all generalised eigenvectors. (Box 9.5)
• Define Xi with all generalised eigenvectors. (Box 9.6)

• Set J := Diag(J1 , . . . , Jr ) and X := X1 , . . . , Xr as in (9.2).




Why does this work? Let us look at the X-columns regarding one Jordan chain and its
corresponding Jordan box. Choose the first chain from our example. The chain was given
by x1,2 = N x1,3 and x1,1 = N x1,2 . In this way, we get.

x1,2 = N x1,3 = (A − λi 1)x1,3 = Ax1,3 − λi x1,3 , hence Ax1,3 = x1,2 + λi x1,3


and x1,1 = N x1,2 = (A − λi 1)x1,2 = Ax1,2 − λi x1,2 , hence Ax1,2 = x1,1 + λi x1,2 .

In summary:

A (x1,1 x1,2 x1,3 ) = (Ax1,1 Ax1,2 Ax1,3 ) = (λi x1,1 , x1,1 + λi x1,2 , x1,2 + λi x1,3 )
= (x1,1 x1,2 x1,3 ) [ λi 1 0 ; 0 λi 1 ; 0 0 λi ] =: (x1,1 x1,2 x1,3 ) Ji,1 .
By using the definition of the Jordan chain, we get the 1s above the diagonal in the matrix
Ji,1 . Only at the ordinary eigenvectors (here: x1,1 ), the chain stops. There, you do not
find a 1 but only λi since Ax1,1 = λi x1,1 .
By putting all Jordan boxes together into a Jordan block, we get γi equations (one per
Jordan box), given by

A (xj,1 xj,2 · · · xj,k ) = (xj,1 xj,2 · · · xj,k ) Ji,j , j = 1, . . . , γi ,

and hence one matrix equation AXi = Xi Ji for the ith Jordan block. The final assembling, cf.
(9.2), of the Jordan blocks Ji to the whole matrix J gives us then AX = XJ, which is
exactly the factorisation A = XJX −1 .
Now let us practise:

Example 9.7. Let

A = [ 5 0 1 0 0 ; 0 1 0 0 0 ; −1 0 3 0 0 ; 0 0 0 1 0 ; 0 0 0 0 4 ] .
• The characteristic polynomial is
det(A − λ1) = (4 − λ)3 (1 − λ)2 .
We see that λ1 = 4 with α1 = 3 and λ2 = 1 with α2 = 2.
• Let us start the work (and fun) with the eigenvalue λ1 = 4. For the matrix

N := A − λ1 1 = A − 4 · 1 = [ 1 0 1 0 0 ; 0 −3 0 0 0 ; −1 0 −1 0 0 ; 0 0 0 −3 0 ; 0 0 0 0 0 ]
we get (after solving the LES N x = o) that
Ker(N ) = {x = (−x3 , 0, x3 , 0, x5 )> : x3 , x5 ∈ C}
and hence γ1 = dim(Ker(N )) = 2. Since α1 = 3, we have to calculate

N 2 = [ 0 0 0 0 0 ; 0 9 0 0 0 ; 0 0 0 0 0 ; 0 0 0 9 0 ; 0 0 0 0 0 ]

and we get:
Ker(N 2 ) = {x = (x1 , 0, x3 , 0, x5 )> : x1 , x3 , x5 ∈ C} .
From this, we conclude dim(Ker(N 2 )) = 3. Now we have reached the algebraic multi-
plicity α1 = 3 and do not need to consider any higher powers of N , hence m = 2.
• For the differences of the dimension, we get

d1 := dim(Ker(N 1 )) − dim(Ker(N 0 )) = 2 − 0 = 2,
d2 := dim(Ker(N 2 )) − dim(Ker(N 1 )) = 3 − 2 = 1.

Note that Ker(N 0 ) = Ker(1) = {o} always has dimension 0.


• We have m = 2 levels, where the second level contains d2 = 1 vector and the first level
contains d1 = 2 vectors:
2nd level: x1,2

1st level: x1,1 x2,1

• Since we have a Jordan chain of length 2 and another one of length 1, we know
that the first Jordan block J1 has two Jordan boxes with different sizes:

J1 = [ 4 1 0 ; 0 4 0 ; 0 0 4 ] (a 2 × 2 box and a 1 × 1 box).

• Let us now find and correctly choose the generalised eigenvectors: We have to choose
x1,2 from Ker(N 2 ) \ Ker(N 1 ). Since we already calculated both kernels, we can choose
x1,2 = (1, 0, 0, 0, 0)> for the top level of the first chain. We calculate the chain to the
bottom: x1,1 := N x1,2 = (1, 0, −1, 0, 0)> .
Now, let us do the second chain. On the top level, we find x2,1 ∈ Ker(N 1 ). By choosing
this vector, we have to respect equation (9.1), which means it is not a linear combination
of vectors from Ker(N 0 ) ∪ {x1,1 } = {o, x1,1 }. In this case, this means that x2,1 comes
from Ker(N 1 ) and is not a multiple of x1,1 . Hence, we choose x2,1 := (0, 0, 0, 0, 1)> .
• We have finished the second chain and can give the matrix
 
$$
X_1 = \begin{pmatrix} x_{1,1} & x_{1,2} & x_{2,1} \end{pmatrix}
    = \begin{pmatrix}
 1 & 1 & 0 \\
 0 & 0 & 0 \\
-1 & 0 & 0 \\
 0 & 0 & 0 \\
 0 & 0 & 1
\end{pmatrix}.
$$

Now, we have done everything for the eigenvalue λ1 = 4. Next thing is the eigenvalue
λ2 = 1.
• For the matrix
 
$$
N := A - \lambda_2 1 = A - 1 \cdot 1 = \begin{pmatrix}
 4 & 0 & 1 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 2 & 0 & 0 \\
 0 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 3
\end{pmatrix}
$$

we get (after solving N x = o) that

Ker(N ) = {x = (0, x2 , 0, x4 , 0)> : x2 , x4 ∈ C}

and hence γ2 = dim(Ker(N )) = 2. Since α2 = 2, we do not need to calculate higher


powers of N and set m = 1.
• We compute
$$
d_1 = \dim(\mathrm{Ker}(N^1)) - \dim(\mathrm{Ker}(N^0)) = 2 - 0 = 2 \ \ldots
$$
• . . . and get a somewhat boring picture with only m = 1 level and d1 = 2 vectors:

1st level: x1,1 x2,1

Here, we see two chains with length 1.


• The Jordan block J2 has two Jordan boxes of size 1 and looks like:
 
$$
J_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.
$$

• Let us determine the generalised eigenvectors: x1,1 comes from Ker(N 1 ) \ Ker(N 0 ) and
we could choose x1,1 = (0, 1, 0, 0, 0)> . Now, for the second chain, choose x2,1 ∈ Ker(N 1 )
such that it is not given by a linear combination of vectors from Ker(N 0 ) ∪ {x1,1 } =
{o, x1,1 }, cf. (9.1). Let us set x2,1 = (0, 0, 0, 1, 0)> .
• Hence, we have the matrix
 
$$
X_2 = \begin{pmatrix} x_{1,1} & x_{2,1} \end{pmatrix}
    = \begin{pmatrix}
0 & 0 \\
1 & 0 \\
0 & 0 \\
0 & 1 \\
0 & 0
\end{pmatrix}
$$

and also finished the work for the eigenvalue λ2 .


• In summary, we get:
   
$$
J = \begin{pmatrix} J_1 & \\ & J_2 \end{pmatrix}
  = \begin{pmatrix}
4 & 1 & 0 & 0 & 0 \\
0 & 4 & 0 & 0 & 0 \\
0 & 0 & 4 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\qquad\text{and}\qquad
X = \begin{pmatrix} X_1 & X_2 \end{pmatrix}
  = \begin{pmatrix}
 1 & 1 & 0 & 0 & 0 \\
 0 & 0 & 0 & 1 & 0 \\
-1 & 0 & 0 & 0 & 0 \\
 0 & 0 & 0 & 0 & 1 \\
 0 & 0 & 1 & 0 & 0
\end{pmatrix},
$$

hence, A = XJX −1 .
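If SymPy is available, this hand computation can be cross-checked; the following is only a sanity-check sketch, and note that jordan_form may order the Jordan boxes differently (for example, the blocks for λ2 = 1 could come first), so only the block structure and the product XJX −1 have to agree.

```python
import sympy as sp

A = sp.Matrix([[ 5, 0, 1, 0, 0],
               [ 0, 1, 0, 0, 0],
               [-1, 0, 3, 0, 0],
               [ 0, 0, 0, 1, 0],
               [ 0, 0, 0, 0, 4]])

X, J = A.jordan_form()                    # X and J with A = X J X^{-1}
sp.pprint(J)                              # Jordan boxes for the eigenvalues 1 and 4
print(sp.simplify(X * J * X.inv() - A) == sp.zeros(5, 5))   # True
```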

Video: Jordan normal form


https://jp-g.de/bsom/la/jor/

Corollary 9.8. Eigenvalues give determinant and trace

For A ∈ Cn×n , let λ1 , . . . , λn be the eigenvalues counted with algebraic multiplicities.
Then
$$
\det(A) = \prod_{i=1}^{n} \lambda_i \qquad\text{and}\qquad \mathrm{tr}(A) = \sum_{i=1}^{n} \lambda_i ,
$$
where $\mathrm{tr}(A) := \sum_{j=1}^{n} a_{jj}$ is the sum of the diagonal entries, the so-called trace of A.
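A quick numerical illustration (a sketch assuming NumPy) compares det(A) and tr(A) with the product and the sum of the eigenvalues for a randomly chosen complex matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

lam = np.linalg.eigvals(A)                           # eigenvalues, with multiplicities
print(np.isclose(np.prod(lam), np.linalg.det(A)))    # True
print(np.isclose(np.sum(lam), np.trace(A)))          # True
```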

9.2 Singular value decomposition

In the diagonalisation and in the Jordan decomposition, we had three parts in the form
A = U DV . There we had:

U and V are inverse to each other.

Now, if we drop that condition, we can actually fulfil the following two properties even
for rectangular matrices:

(1) D is diagonal,

(2) U and V are unitary square matrices.

Since we now allow U and V to be unrelated square matrices, which means U ∈ Fm×m and
V ∈ Fn×n , we can consider an arbitrary rectangular matrix A ∈ Fm×n .
We will later show that each matrix A has such a decomposition. Since we want to use
the common notations, we will denote the diagonal matrix D by Σ and use V ∗ on the
right-hand side instead of V . We will see below why this is indeed a good idea. Hence,
the wanted decomposition of A is now written as A = U ΣV ∗ and looks like this:

Singular value decomposition of A

$$
\underbrace{A}_{\substack{m \times n\\ \text{arbitrary}}}
\;=\;
\underbrace{U}_{\substack{m \times m\\ \text{unitary}}}
\cdot
\underbrace{\Sigma}_{\substack{m \times n\\ \text{diagonal}}}
\cdot
\underbrace{V^*}_{\substack{n \times n\\ \text{unitary}}} .
\tag{9.3}
$$

The word “diagonal” for a rectangular matrix Σ is of course not literally correct. It means
the following here:
$$
\Sigma = \begin{pmatrix}
s_1 &        &     & 0      & \cdots & 0 \\
    & \ddots &     & \vdots &        & \vdots \\
    &        & s_m & 0      & \cdots & 0
\end{pmatrix}
\ \text{if } m \le n
\qquad\text{or}\qquad
\Sigma = \begin{pmatrix}
s_1 &        &     \\
    & \ddots &     \\
    &        & s_n \\
0   & \cdots & 0   \\
\vdots &     & \vdots \\
0   & \cdots & 0
\end{pmatrix}
\ \text{if } m \ge n,
\tag{9.4}
$$
where every other entry is zero.


The equation A = U ΣV ∗ tells us that A and Σ are equivalent matrices, A ∼ Σ. The
matrix A is the matrix representation of the linear map ` := fA : Fn → Fm given by
x 7→ Ax with respect to the standard bases B in Fn and C in Fm . The change of basis
to an ONB V = (v1 , . . . , vn ) in Fn and an ONB U = (u1 , . . . , um ) in Fm gives us another
matrix representation `U ←V which is the “diagonal” matrix from (9.4):
$$
\underbrace{\ell_{C \leftarrow B}}_{A}
= \underbrace{T_{C \leftarrow U}}_{U}\,
  \underbrace{\ell_{U \leftarrow V}}_{\Sigma}\,
  \underbrace{T_{V \leftarrow B}}_{V^*}
$$
with
$$
U = \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix} = T_{C \leftarrow U},
\qquad
V = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = T_{B \leftarrow V},
\qquad
V^* = V^{-1} = T_{V \leftarrow B} .
$$

Because of A ∼ Σ and the characterisation of equivalences given by Proposition 8.20, we


know rank(Σ) = rank(A) =: r ≤ min{m, n}. Hence, exactly r of the entries si in (9.4) are
non-zero. Of course, we can choose ui , vi in such an order that we have s1 , . . . , sr as the
non-zero elements.
Hence, we can see the matrix Σ from (9.4) as the following matrix representation:

$$
\Sigma = \ell_{U \leftarrow V} =
\begin{pmatrix}
s_1 &        &     &   &        \\
    & \ddots &     &   &        \\
    &        & s_r &   &        \\
    &        &     & 0 &        \\
    &        &     &   & \ddots
\end{pmatrix}
\in F^{m \times n},
\tag{9.5}
$$
where the columns correspond to the basis vectors v1 , . . . , vr , vr+1 , . . . , vn , the rows to
u1 , . . . , ur , ur+1 , . . . , um , and all entries outside the leading r × r diagonal block are zero.

Multiplying A = U ΣV ∗ from the right with V gives us AV = U Σ. Let us look at this in
more detail:
$$
\begin{pmatrix} Av_1 & \cdots & Av_n \end{pmatrix}
= A \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} = AV = U\Sigma
= \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}
\begin{pmatrix}
s_1 &        &     &   \\
    & \ddots &     &   \\
    &        & s_r &   \\
    &        &     & 0
\end{pmatrix}
= \begin{pmatrix} s_1 u_1 & \cdots & s_r u_r & o & \cdots & o \end{pmatrix} .
$$
Therefore, we have:

Av1 = s1 u1 , Av2 = s2 u2 , . . . , Avr = sr ur , Avr+1 = o, . . . , Avn = o. (9.6)

Analogously, we get from A∗ = (U ΣV ∗ )∗ = V Σ∗ U ∗ that A∗ U = V Σ∗ and hence:

A∗ u1 = s1 v1 , . . . , A∗ ur = sr vr , A∗ ur+1 = o, . . . , A∗ um = o. (9.7)

From (9.5) or (9.6), we get:

Proposition 9.9. Kernel and range of A


Ker(A) = Ker(`) = Span(vr+1 , . . . , vn ), Ker(A)⊥ = Span(v1 , . . . , vr )
Ran(A) = Ran(`) = Span(u1 , . . . , ur ), Ran(A)⊥ = Span(ur+1 , . . . , um )

Please recognise the rank-nullity theorem dim(Ker(A)) + dim(Ran(A)) = dim(Fn ).
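Numerically, Proposition 9.9 is exactly how one reads off bases of the kernel and the range from an SVD. A minimal sketch (assuming NumPy; the 3 × 2 matrix of rank 1 is only an illustrative input):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])            # rank 1: the second column is twice the first

U, s, Vh = np.linalg.svd(A)           # A = U @ Sigma @ Vh with Vh = V*
r = int(np.sum(s > 1e-12))            # numerical rank

range_basis  = U[:, :r]               # Ran(A) = Span(u_1, ..., u_r)
kernel_basis = Vh[r:, :].conj().T     # Ker(A) = Span(v_{r+1}, ..., v_n)

print(r)                              # 1
print(np.allclose(A @ kernel_basis, 0))   # True
```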


Of course, we see that the decomposition A = U ΣV ∗ from (9.3) is useful as a representa-
tion of the corresponding linear map. We will later see how we can use this in applications.
However, the question remains: how do we get U , V and Σ?
Let us go back to the result: From A = U ΣV ∗ , we would get:

A∗ A = (U ΣV ∗ )∗ (U ΣV ∗ ) = V Σ∗ U ∗ U ΣV ∗ = V Σ∗ ΣV ∗ (9.8)
and AA∗ = (U ΣV ∗ )(U ΣV ∗ )∗ = U ΣV ∗ V Σ∗ U ∗ = U ΣΣ∗ U ∗ . (9.9)

Because of (9.5) and s̄i si = si s̄i = |si |2 , we have square matrices
$$
\Sigma^*\Sigma = \begin{pmatrix}
|s_1|^2 &        &         &   &        \\
        & \ddots &         &   &        \\
        &        & |s_r|^2 &   &        \\
        &        &         & 0 &        \\
        &        &         &   & \ddots
\end{pmatrix} \in F^{n \times n}
\qquad\text{and}\qquad
\Sigma\Sigma^* = \begin{pmatrix}
|s_1|^2 &        &         &   &        \\
        & \ddots &         &   &        \\
        &        & |s_r|^2 &   &        \\
        &        &         & 0 &        \\
        &        &         &   & \ddots
\end{pmatrix} \in F^{m \times m},
$$

that are also diagonal. Hence, (9.8) and (9.9) show us the unitary diagonalisations of
the square matrices A∗ A and AA∗ . Recall that both matrices are self-adjoint and have
by Proposition 6.44 in fact an ONB consisting of eigenvectors. These orthonormal eigen-
vectors (w.r.t. the standard inner product!) are chosen as the columns of the matrices U
and V .

Therefore, we find the eigenvalues of A∗ A on the diagonal of Σ∗ Σ and the eigenvalues of


AA∗ on the diagonal of ΣΣ∗ . Interestingly, with the exception of |m − n| zeros, these
are the same eigenvalues: |s1 |2 , . . . , |sr |2 , 0, . . . , 0. To see this, just consider the non-zero
eigenvalues of A∗ A and AA∗ . So choose λ 6= 0 and calculate

$$
A^*Av = \lambda v \ \overset{u := Av}{\Longrightarrow}\ AA^*u = AA^*(Av) = A(A^*Av) = A(\lambda v) = \lambda(Av) = \lambda u,
$$

hence spec(A∗ A) ⊂ spec(AA∗ ). The converse works the same. As you know from the
homework, all eigenvalues are non-negative. This also fits in with the diagonal entries
|s1 |2 , . . . , |sr |2 , 0, . . . , 0 of Σ∗ Σ and ΣΣ∗ . Actually, we can choose the number si as we
want in F as long as |si |2 is the ith eigenvalue of A∗ A or AA∗ . A simple choice is, of
course, s1 , . . . , sr being real and positive numbers.
In summary, we now have everything for U, V and Σ:

Definition 9.10. Singular values and singular vectors


Let A ∈ Fm×n . The (non-negative) square roots of the eigenvalues of A∗ A are
called the singular values of A and we order them from highest to lowest (counted
with multiplicities):
s1 ≥ s2 ≥ · · · ≥ sn ≥ 0 .
The vectors from an orthonormal family (v1 , . . . , vn ) consisting of eigenvectors of
A∗ A, with the same order as for s21 , . . . , s2n , are called the right-singular vectors of A.
In the same way, the vectors from an orthonormal family (u1 , . . . , um ) consisting of
eigenvectors of AA∗ , with the same order as for s21 , . . . , s2n , are called the left-singular
vectors of A.
The factorisation
A = U ΣV ∗ ,
given by U = (u1 · · · um ), V = (v1 · · · vn ) and Σ from (9.4), is called the singular
value decomposition of A. In short: SVD.

To summarise everything, let us state the whole algorithm:

Algorithm for the SVD of A


Given: An arbitrary matrix A ∈ Fm×n .
Wanted: Unitary matrices U ∈ Fm×m , V ∈ Fn×n and diagonal matrix Σ ∈ Rm×n
for the singular value decomposition A = U ΣV ∗ of A.
Algorithm:

• Calculate the matrix A∗ A and all eigenvalues λ1 , . . . , λn (counted with mul-


tiplicities) and use the ordering such that λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0.
• Find an ONB (v1 , . . . , vn ) consisting of eigenvectors for A∗ A.
• Let r be the index of the last non-zero eigenvalue λi .
• For i = 1, . . . , r:

• Set si := √λi .
• Set ui := (1/si ) Avi , cf. (9.6).

• Set sr+1 , . . . , sm := 0.
• Add to (u1 , . . . , ur ) a family (ur+1 , . . . , um ) such that it is an ONB of Fm .
• Set U := (u1 · · · um ), V := (v1 · · · vn ) and Σ as in (9.4) or (9.5).
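The algorithm translates almost literally into code. The following sketch (assuming NumPy; the helper name svd_via_eig is ours, and for simplicity it assumes rank(A) = n ≤ m so that no additional vectors ur+1 , . . . , um have to be added) is meant for illustration only; in practice one calls a library routine such as numpy.linalg.svd.

```python
import numpy as np

def svd_via_eig(A):
    """Sketch of the algorithm above; assumes rank(A) = n <= m."""
    m, n = A.shape
    lam, V = np.linalg.eigh(A.conj().T @ A)   # eigenvalues/ONB of A*A, ascending
    lam, V = lam[::-1], V[:, ::-1]            # reorder: lambda_1 >= ... >= lambda_n
    s = np.sqrt(np.clip(lam, 0.0, None))      # singular values s_i = sqrt(lambda_i)
    U = (A @ V) / s                           # u_i = (1/s_i) A v_i, cf. (9.6)
    Sigma = np.zeros((m, n))
    np.fill_diagonal(Sigma, s)
    return U, Sigma, V.conj().T

A = np.array([[1.0, np.sqrt(3.0)],            # the matrix of Example 9.11 below
              [-2.0, 0.0]])
U, Sigma, Vh = svd_via_eig(A)
print(np.round(Sigma, 3))                     # diag(sqrt(6), sqrt(2))
print(np.allclose(U @ Sigma @ Vh, A))         # True
```

Note that this detour via A∗ A squares the conditioning of the problem, which is one reason why library routines compute the SVD directly from A.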

Rule of thumb: Calculating the singular vectors

• Alternatively, one finds u1 , . . . , um as the eigenvectors of AA∗ . However, that


is more costly than using vi if we already have them.
• In the case m < n, AA∗ is smaller than A∗ A, hence it is better to calculate the
eigenvalues λi and eigenvectors ui of the matrix AA∗ and to use (9.7) for getting
vi .
• Depending on the application, the eigenvectors ui and vi for i > r might not be
important (cf. (9.11)).

Example 9.11. Consider the matrix $A = \begin{pmatrix} 1 & \sqrt{3} \\ -2 & 0 \end{pmatrix}$. We have m = n = 2 and F = R.
• Calculate
$$
A^*A = \begin{pmatrix} 1 & -2 \\ \sqrt{3} & 0 \end{pmatrix}
       \begin{pmatrix} 1 & \sqrt{3} \\ -2 & 0 \end{pmatrix}
     = \begin{pmatrix} 5 & \sqrt{3} \\ \sqrt{3} & 3 \end{pmatrix}
$$
and get $\det(A^*A - \lambda 1) = (5 - \lambda)(3 - \lambda) - 3 = \lambda^2 - 8\lambda + 12 = (\lambda - 2)(\lambda - 6)$.
The eigenvalues of A∗ A are (in decreasing order) λ1 = 6 and λ2 = 2.
• The eigenvector v1 for λ1 = 6: We solve the linear equation (A∗ A − 6 · 1)v1 = o,
$$
\begin{pmatrix} -1 & \sqrt{3} \\ \sqrt{3} & -3 \end{pmatrix} v_1 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Longleftarrow\quad
v_1 = \frac{1}{2} \begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix} \ \text{(normalised)}.
$$
The eigenvector v2 for λ2 = 2: We solve the equation (A∗ A − 2 · 1)v2 = o,
$$
\begin{pmatrix} 3 & \sqrt{3} \\ \sqrt{3} & 1 \end{pmatrix} v_2 = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Longleftarrow\quad
v_2 = \frac{1}{2} \begin{pmatrix} -1 \\ \sqrt{3} \end{pmatrix} \ \text{(normalised)}.
$$
Both vectors v1 and v2 are automatically orthogonal (Proposition 6.43).


• Both eigenvalues, λ1 = 6 and λ2 = 2, are non-zero. Hence r = 2.

• Next, we calculate $s_1 := \sqrt{\lambda_1} = \sqrt{6}$ and
$$
u_1 := \frac{1}{s_1} A v_1
     = \frac{1}{\sqrt{6}} \begin{pmatrix} 1 & \sqrt{3} \\ -2 & 0 \end{pmatrix} \cdot \frac{1}{2} \begin{pmatrix} \sqrt{3} \\ 1 \end{pmatrix}
     = \frac{1}{2\sqrt{6}} \begin{pmatrix} 2\sqrt{3} \\ -2\sqrt{3} \end{pmatrix}
     = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} .
$$
• Then $s_2 := \sqrt{\lambda_2} = \sqrt{2}$ and
$$
u_2 := \frac{1}{s_2} A v_2
     = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & \sqrt{3} \\ -2 & 0 \end{pmatrix} \cdot \frac{1}{2} \begin{pmatrix} -1 \\ \sqrt{3} \end{pmatrix}
     = \frac{1}{2\sqrt{2}} \begin{pmatrix} 2 \\ 2 \end{pmatrix}
     = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} .
$$
• In summary, we get:
$$
U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},
\qquad
V = \frac{1}{2} \begin{pmatrix} \sqrt{3} & -1 \\ 1 & \sqrt{3} \end{pmatrix}
\qquad\text{and}\qquad
\Sigma = \begin{pmatrix} \sqrt{6} & 0 \\ 0 & \sqrt{2} \end{pmatrix}.
$$

Because we started in F = R, we could do the whole calculation inside R.
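One can let NumPy confirm this result (a sketch; numerical SVD routines may flip the signs of individual pairs ui , vi , so only the singular values and the product U ΣV ∗ have to agree):

```python
import numpy as np

A = np.array([[1.0, np.sqrt(3.0)],
              [-2.0, 0.0]])

U, s, Vh = np.linalg.svd(A)
print(s)                                      # [2.449..., 1.414...] = [sqrt(6), sqrt(2)]
print(np.allclose(U @ np.diag(s) @ Vh, A))    # True
```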

The unitary matrices U and V are indeed orthogonal matrices if all entries are real and,
by our Definition in 5.29, describe rotations (if det(·) = 1) or reflections (if det(·) = −1).
In the example above, both matrices are rotations: U rotates R2 by −45◦ and V rotates
it by 30◦ .
In Chapter 3, we have seen that linear maps fA : R2 → R2 with x 7→ Ax can only stretch,
rotate and reflect. Hence, a linear map changes the unit circle into an ellipse or it collapses
into a line or a point. By using the SVD, we can explain in more detail what happens
exactly. Let us look at our Example 9.11:

A = U ΣV ∗ means: fA = rotating, stretching, rotating

[Figure: the unit circle together with the ONB (v1 , v2 ) is mapped by fA onto an ellipse
with semi-axes s1 u1 and s2 u2 . Following the factorisation A = U ΣV ∗ : the multiplication
by V ∗ rotates by −30◦ , the multiplication by Σ stretches along the coordinate axes with
the different factors s1 and s2 , and the multiplication by U rotates by −45◦ .]
The factorisation A = U ΣV ∗ decomposes fA into a composition of three maps:

(1) a rotation by −30◦ (that is multiplication by V ∗ = V −1 ),


(2) stretching separately into two directions (with factor s1 = √6 in direction
of the x1 -axis and with factor s2 = √2 in direction of the x2 -axis) and
(3) a rotation by −45◦ (multiplication by U ).

The major axis and the minor axis of the ellipse, which is constructed by fA from the unit
circle, are given by the vectors u1 , u2 , and their lengths are given by the singular
values s1 and s2 .

The singular values si give us the stretching factors in certain (orthogonal) directions.
For the largest singular value, s1 , we have

s1 = ks1 u1 k = kAv1 k = max{kAxk : x ∈ Fn , kxk = 1} =: kAk. (9.10)

The number kAk defined here is the matrix norm of A that we already introduced. It tells
us how long the vector Ax ∈ Fm can be at most when x ∈ Fn has length 1. The matrix
norm fulfils the three properties of a norm.
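In NumPy, this matrix norm is the spectral norm; a short sketch checking that kAk coincides with the largest singular value of the matrix from Example 9.11:

```python
import numpy as np

A = np.array([[1.0, np.sqrt(3.0)],
              [-2.0, 0.0]])

s = np.linalg.svd(A, compute_uv=False)
print(np.isclose(np.linalg.norm(A, 2), s[0]))   # True: ||A|| = s_1 = sqrt(6)
```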
Now we look at an important application of the SVD. We start with a calculation:

A as sum of r dyadic products


$$
A = U\Sigma V^*
  = U \begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix} V^*
  = U \begin{pmatrix} s_1 & & \\ & 0 & \\ & & \ddots \end{pmatrix} V^*
    + \cdots +
    U \begin{pmatrix} \ddots & & \\ & s_r & \\ & & 0 \end{pmatrix} V^*
$$
$$
  = s_1 u_1 v_1^* + \cdots + s_r u_r v_r^*
  = \sum_{i=1}^{r} s_i u_i v_i^* \tag{9.11}
$$

As we know, A has rank r. Each of the r terms in (9.11) has rank 1. Depending on the
rate of decay of the singular values

s1 ≥ s2 ≥ · · · ≥ sr > 0

we could omit some terms in the sum (9.11) without changing the matrix too much. We
call this a low-rank matrix approximation of A.
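Truncating the sum (9.11) is easy to do in code. The following sketch (assuming NumPy; the helper name low_rank_approx is ours, and a random matrix stands in for an actual image) shows the mechanics and already uses the error formula (9.12) derived after Example 9.12:

```python
import numpy as np

def low_rank_approx(A, k):
    # Truncate the sum (9.11) after k terms: A_k = sum_{i=1}^k s_i u_i v_i^*.
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

# A random matrix stands in for the image matrix of the following example;
# its singular values decay slowly, so this only demonstrates the mechanics.
rng = np.random.default_rng(1)
A = rng.random((500, 500))
s = np.linalg.svd(A, compute_uv=False)

for k in (50, 30, 10, 5):
    Ak = low_rank_approx(A, k)
    # The approximation error in the matrix norm is the first omitted
    # singular value s_{k+1}, cf. (9.12) below.
    print(k, np.isclose(np.linalg.norm(A - Ak, 2), s[k]))   # True for every k
```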

Example 9.12. Let us look at an 8-bit-grey picture with 500 × 500 pixels (which shows
a beautiful duck on water):

This can be saved as a matrix A ∈ R500×500 where, in the entries, only integer values
0, 1, . . . , 255 are allowed.

$$
A = \begin{pmatrix}
1 & 3 & 7 & 5 & \cdots & 16 & 8 \\
3 & 7 & 3 & 3 & \cdots & 11 & 12 \\
\vdots & \vdots & & & & \vdots & \vdots \\
6 & 7 & \cdots & 248 & \cdots & 7 & 6 \\
\vdots & \vdots & & & & \vdots & \vdots \\
4 & 8 & 8 & 5 & \cdots & 4 & 8 \\
4 & 8 & 3 & 3 & \cdots & 6 & 6 \\
2 & 3 & 3 & 4 & \cdots & 9 & 9
\end{pmatrix}
\qquad \text{(187 KB)}
$$
For calculations, we convert the entries into the range [0, 1] instead of [0, 255].
Most pictures should be full-rank matrices, and here we can indeed calculate the rank and
get r = n. Now let us write A in the representation given by equation (9.11), but stop
the summation after k = 50, 30, 10 or 5 terms instead of after r = 500 steps. We get the
following pictures:

[Pictures: the rank-k approximations for k = 50, 30, 10, 5, requiring roughly 50 KB, 30 KB, 10 KB and 5 KB of storage, respectively.]
The decay of the singular values s1 ≥ · · · ≥ s500 , shown in the plot below, explains why we
already have a very good approximation with only 30 terms in the sum.

The first singular values (s1 ≈ 262, s2 ≈ 28 and s3 ≈ 21) are not shown in the plot, for
obvious reasons.

In Example 9.12, we have seen that a given matrix A with rank r = 500 can be well
approximated by matrices Ak with rank k = 50, 30, 10 or 5. For
$$
A = \sum_{i=1}^{r} s_i u_i v_i^*
\qquad\text{and}\qquad
k \in \{1, \ldots, r\}
\qquad\text{we set}\qquad
A_k := \sum_{i=1}^{k} s_i u_i v_i^* .
$$
Ak has rank k and is in fact the best m × n-matrix with rank k for the approximation of
A. We measure the error of approximation by using the matrix norm in equation (9.10):
   
$$
\|A - A_k\| = \left\| U \left[
\begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_r & \\ & & & 0 \end{pmatrix}
-
\begin{pmatrix} s_1 & & & \\ & \ddots & & \\ & & s_k & \\ & & & 0 \end{pmatrix}
\right] V^* \right\|
$$
$$
= \left\| U \begin{pmatrix} 0 & & & & \\ & s_{k+1} & & & \\ & & \ddots & & \\ & & & s_r & \\ & & & & 0 \end{pmatrix} V^* \right\|
= s_{k+1} \quad \text{(largest singular value left)}.
$$

In short:

sk+1 = distance of A to the set of all matrices with rank k (9.12)

In particular, s1 is the distance of A to the set of all matrices with rank 0, which consists
only of the zero matrix 0.

At the end, let us take a look at the special case m = n, which means A and Σ are square
matrices. In this case, eigenvalues and singular values are related in the following sense:
• A is invertible if and only if all the singular values are non-zero (see also Proposi-
tion 6.28 for the same claim with eigenvalues).
The smallest singular value of A, sn , gives the distance of A to the set of all n × n-
matrices with rank n − 1 or smaller (which are exactly the singular matrices) by
equation (9.12).
The equation A−1 = (U ΣV ∗ )−1 = V Σ−1 U ∗ gives the SVD of A−1 . Therefore, the
singular values of A−1 are 1/s1 , . . . , 1/sn . The largest of these, meaning 1/sn , is
kA−1 k.
• We know from Corollary 9.8 that the product of all eigenvalues of a given matrix A
is exactly det(A). Since
det(A) = det(U ΣV ∗ ) = det(U ) det(Σ) det(V ∗ ) and | det(U )| = | det(V ∗ )| = 1, hence | det(A)| = det(Σ) ,
we know that the product of all singular values, which is det(Σ), is equal to the
absolute value of det(A).
• If A is normal, which means A∗ A = AA∗ , then A can be diagonalised by using a
unitary matrix: A = XDX ∗ . Then D = diag(d1 , . . . , dn ) is a diagonal matrix with
the eigenvalues of A as entries and X = (x1 · · · xn ) consists of eigenvectors for A.
Hence A∗ A = XD∗ DX ∗ = X diag(|d1 |2 , . . . , |dn |2 ) X ∗ . The eigenvalues λi of A∗ A
are, on the one hand, given by λi = di di = |di |2 and, on the other hand, they can be
written as λi = s2i by using the singular values si ≥ 0 of A. Therefore, we get:
si = |di |.
The singular values of A are exactly the absolute values of the eigenvalues of A.
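This last relation is easy to check numerically; a sketch (assuming NumPy) with a randomly generated normal matrix A = QDQ∗ :

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
A = Q @ np.diag(d) @ Q.conj().T               # a normal matrix with eigenvalues d_i

s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s, np.sort(np.abs(d))[::-1]))   # True: s_i = |d_i| (both sorted)
```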

Summary
• A lot of techniques in Linear Algebra deal with suitable factorisations of a given
matrix A:
• From Section 3.11.5: The steps of the Gaussian elimination can be summarised by a left
multiplication with a lower triangular matrix and a permutation matrix P . Hence, with
K being the row echelon form obtained from P A, we have P A = LK with a lower
triangular matrix L (the inverse of the elimination matrix).

• From Section 5.5: A linearly independent family of vectors (a1 , . . . , an ) from Fm can
be transformed into an ONS (q1 , . . . , qn ) by using the Gram-Schmidt procedure.
Thereby, we have ak ∈ Span(q1 , . . . , qk ) for every k = 1, . . . , n. For the matrices
A := (a1 . . . an ) and Q := (q1 . . . qn ) we find A = QR, where R ∈ Fn×n is an
invertible upper triangular matrix.
• If we decompose A into a product U DV , then we have different approaches.
• For diagonalisable matrices, we can choose U = X and V = X −1 , where the columns
of X are eigenvectors of A and form a basis. Then D has the eigenvalues of A
on the diagonal, counted with multiplicities. See Chapter 6. We also know that
selfadjoint and even normal matrices A are always diagonalisable; there we can choose
the eigenvectors in such a way that they form an ONB, which means X ∗ = X −1 .
• For non-diagonalisable matrices we can still write A = XDX −1 but now D is not
diagonal. We use the Jordan normal form as a substitute. We get the important
result that all (square) matrices A ∈ Cn×n have such a Jordan normal form and
therefore this decomposition. Note that we actually need the complex numbers
here.
• For the singular value decomposition, the two matrices U and V are not connected
to each other, so that we can also bring rectangular matrices A into a “diagonal” structure.
On the diagonal of D (which is often denoted by Σ), we find the so-called singular
values of A. The singular value decomposition is used for low-rank approximation.
Index

LU -decomposition, 76 characteristic polynomial, 133


ith canonical unit vector, 33 closed under, 156
ith component, 33 codomain, 15
n-tupels, 11 cofactor matrix, 98
p-norm, 175 colinear, 60
(linear) subspace, 34, 136 column vector, 51
(standard) inner product, 30, 41, 137 column vectors, 27
(standard) scalar product, 30, 41 columns, 51
complement, 10
absolute value, modulus, 45 complex numbers C, 44
argument , 45 complex vector space, 136
complex conjugate, 45 components, 28
imaginary part , 45 composition, 20
real part , 45 convex combinations, 39
absolute value, 13 convex subset, 39
adjoint matrix, 138 coordinates, 62, 162
affine combinations, 39 coplanar, 60
affine subspace, 39 cross product, 42
algebraic multiplicity, 133, 141, 148
deduction rule, 6
argument, 15
determinant, 92, 96
associated norm, 110, 177
determinant of a linear map, 208
augmented matrix, 70
diagonal, 51
basis, 62, 136, 158 diagonal matrix, 51
basis isomorphism, 162, 164 diagonalisable, 147
basis vectors, 62, 137 dimension, 63, 158
bijective, 18 direct sum, 118
domain, 15
canonical unit vectors, 28
cardinality, 8 eigenspace, 140
Cartesian product, 10 eigenvalue, 129, 130, 209
Cauchy-Schwarz inequality, 111 eigenvector, 129, 130, 209
change-of-basis matrix, 166 element, 7


equivalence, 5 Jordan normal form (JNF), 212


equivalent, 200
expanding det(A) along the ith row, 99 kernel, 68
expanding det(A) along the j th column,
leading principal minors, 173
98
leading variables, 79
factors, 15 left-singular vectors, 223
fiber, 17 Legendre polynomials, 181
field, 12, 14, 153 length, 31, 41
Fourier coefficients, 120 LES, 49
Fourier expansion, 120 Levi-Civita symbol, 42
free variables, 79 linear, 56
function, 15 linear combination, 26, 29, 158
linear equation system, 49
Gaussian elimination, 70 linear function, 183
Gauß algorithm, 70 linear functional, 184
Gauß-Jordan algorithm, 168 linear hull, 35, 136, 158
generalised eigenvector of rank k, 214 linear map, 56, 183
generalised eigenvectors, 214 linear operator, 183
generating set, 36 linear subspace, 156
generating system, 158 linearly dependent, 60, 136, 158
geometric multiplicity, 141, 148 linearly independent, 60, 136, 158
Gramian matrix, 116 logical statement, 4
low-rank matrix approximation, 226
Hesse normal form (HNF), 125 lower triangle, 51
homogeneous equation, 204 lower triangular matrix, 51
homomorphism, 189
hyperplane, 125 map, 15
matrices with m rows and n columns, 47
identity map, 20 matrix norm, 112, 226
identity matrix, 64 matrix product, 53
image, 17, 68 matrix product of A and B, 53
implication, 5 matrix representation, 191
in place factorisation, 77 monomial basis, 161
induced norm, 177 monomials, 161
inertia tensor, 150
inhomogeneous equation, 204 natural numbers, 8
injective, 18 nonsingular, 65
inner product, 109 norm, 31, 41, 111, 137
inner product for V , 171 norm on V , 175
intersection, 10 normal, 138, 150
intervals, 12 normal component, 113, 116, 179
inverse, 65 normal form, 201
inverse map, 19, 189 normal vector, 31
invertible, 19, 65, 189 normed space, 175
isomorphism, 189 nullspace, 68

Jordan blocks, 212 orthogonal, 30, 41, 122


Jordan boxes, 212 Orthogonal basis, 118, 179
Jordan chain, 215 orthogonal complement, 114, 178

orthogonal projection, 113, 116, 179 singular value decomposition, 223


orthogonal sum, 118 singular values, 223
Orthogonal system, 118, 179 skew-adjoint, 138
orthogonality, 112 skew-symmetric, 51, 68
Orthonormal basis, 118, 179 solution, 49
Orthonormal system, 118, 179 span, 35, 136, 158
spectrum, 130
pairs, 10 square matrix, 51
perpendicular, 30 storage matrix, 77
pivot, 79 subset, 10
position vector, 27 subspace, 156
positive definite, 111, 172 sum, 14
positive numbers, 12 summands, 14
pre-Hilbert space, 171 surjective, 18
predicate, 8 SVD, 223
preimage, 17 Sylvester’s criterion, 173
product, 15 symmetric, 51, 68
proposition, 4 system of linear equations, 49
QR-decomposition, 124
trace, 135, 150, 220
quadratic matrix, 51
transformation matrix, 57, 166
quantifiers, 9
transpose, 52, 66
range, 17, 68
unconditional solvability, 203
rank, 69, 201
union, 10
Rank-nullity theorem, 69, 206
unique solvability, 204
reflection, 123
unitary, 138
right-singular vectors, 223
upper, 51
rotation, 123
upper triangle, 51
rotation matrix, 194
upper triangular form, 74
row echelon form, 74, 79
row operations, 73 value, 15
row vector, 51, 52 vector, 25
rows, 51 vector addition, 153
scalar multiplication, 153 vector product, 42
scalars, 153 vector space, 33, 48, 136, 153
selfadjoint, 138 vector space R2 , 28
set, 7 vector space Rn , 33
set difference, 10 vectors, 153
set of all solution, 73 well-defined, 16
similar, 103, 139, 202
singular, 65, 105 zero vector, 28, 34
