Linear Guest

Contents

7 Matrices
  7.1 Linear Transformations and Matrices
    7.1.1 Basis Notation
    7.1.2 From Linear Operators to Matrices
  7.2 Review Problems

8 Determinants
  8.1 The Determinant Formula
    8.1.1 Simple Examples
    8.1.2 Permutations
  8.2 Elementary Matrices and Determinants
    8.2.1 Row Swap
    8.2.2 Row Multiplication
    8.2.3 Row Addition
    8.2.4 Determinant of Products
  8.3 Review Problems
  8.4 Properties of the Determinant
    8.4.1 Determinant of the Inverse
    8.4.2 Adjoint of a Matrix
    8.4.3 Application: Volume of a Parallelepiped
  8.5 Review Problems

13 Diagonalization
  13.1 Diagonalizability
  13.2 Change of Basis
  13.3 Changing to a Basis of Eigenvectors
  13.4 Review Problems

B Fields

Index
1 What is Linear Algebra?
Many difficult problems can be handled easily once relevant information is organized in a certain way. This text aims to teach you how to organize information in cases where certain mathematical structures are present. Linear algebra is, in general, the study of those structures. Namely,

Linear algebra is the study of vectors and linear functions.
In broad terms, vectors are things you can add and linear functions are
functions of vectors that respect vector addition. The goal of this text is to
teach you to organize information about vector spaces in a way that makes
problems involving linear functions of many variables easy. (Or at least
tractable.)
To get a feel for the general idea of organizing information, of vectors, and of linear functions, this chapter has brief sections on each. We start
here in hopes of putting students in the right mindset for the odyssey that
follows; the latter chapters cover the same material at a slower pace. Please
be prepared to change the way you think about some familiar mathematical
objects and keep a pencil and piece of paper handy!
1.1 Organizing Information
But let's think carefully; what is the left-hand side of this equation doing? Functions and equations are different mathematical objects, so why is the equal sign necessary?
If someone says

"V is the function that takes in the numbers of shares of three companies and returns the dollar value of that portfolio, at $24 per share of G, $80 per share of N and $35 per share of A,"

we do not quite have all the information we need to determine the relationship between inputs and outputs. Do we multiply the first number of the input by 24 or by 35? No one has specified an order for the variables, so we do not know how to calculate an output associated with a particular input.¹
A different notation for V can clear this up; we can denote V itself as an ordered triple of numbers that reminds us what to do to each number from the input.
¹Of course we would know how to calculate an output if the input is described in a tedious form such as "1 share of G, 2 shares of N and 3 shares of A", but that is unacceptably tedious! We want to use ordered triples of numbers to concisely describe inputs.
Denote V by $\begin{pmatrix}24 & 80 & 35\end{pmatrix}$ and thus write

$$V\begin{pmatrix}1\\2\\3\end{pmatrix}_{\!B} = \begin{pmatrix}24 & 80 & 35\end{pmatrix}\begin{pmatrix}1\\2\\3\end{pmatrix}_{\!B}.$$
If we change the order of the variables, we should change the notation for V:

Denote V by $\begin{pmatrix}35 & 80 & 24\end{pmatrix}$ and thus write

$$V\begin{pmatrix}1\\2\\3\end{pmatrix}_{\!B'} = \begin{pmatrix}35 & 80 & 24\end{pmatrix}\begin{pmatrix}1\\2\\3\end{pmatrix}_{\!B'}.$$
The subscripts B and B′ on the columns of numbers are just symbols² reminding us of how to interpret the column of numbers. But the distinction is critical; as shown above, V assigns completely different numbers to the same columns of numbers with different subscripts.
There are six different ways to order the three companies. Each way will give different notation for the same function V, and a different way of assigning numbers to columns of three numbers. Thus, it is critical to make clear which ordering is used if the reader is to understand what is written. Doing so is a way of organizing information.
²We were free to choose any symbol to denote these orders. We chose B and B′ because we are hinting at a central idea in the course: choosing a basis.
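To make the bookkeeping concrete, here is a small Python sketch of ours (not part of the text); the prices 24, 80, 35 and the labels G, N, A come from the example above, while the function name V and everything else is our own illustration.

```python
# A sketch of how an ordering of the companies turns the function V
# into a row of numbers; prices and labels are from the example above.
prices = {"G": 24, "N": 80, "A": 35}

def V(column, ordering):
    """Value a column of share counts, interpreted via an ordering."""
    return sum(prices[name] * shares for name, shares in zip(ordering, column))

B      = ["G", "N", "A"]   # the ordering the text calls B
Bprime = ["A", "N", "G"]   # the different ordering, B'

print(V([1, 2, 3], B))       # 24*1 + 80*2 + 35*3 = 289
print(V([1, 2, 3], Bprime))  # 35*1 + 80*2 + 24*3 = 267
```

The same column of numbers (1, 2, 3) produces different values under the two orderings, which is exactly why the subscripts B and B′ matter.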
This example is a hint at a much bigger idea central to the text; our choice of order is an example of choosing a basis. Among other things, this idea will let us uncover aspects of functions that don't change with the choice (chapter 12).
Unfortunately, because the subject (at least for those learning it) requires
seemingly arcane and tedious computations involving large arrays of numbers
known as matrices, the key concepts and the wide applicability of linear
algebra are easily missed. So we reiterate,
In broad terms, vectors are things you can add and linear functions are
functions of vectors that respect vector addition.
1.2 What are Vectors?
(C) Polynomials: If $p(x) = 1 + x - 2x^2 + 3x^3$ and $q(x) = x + 3x^2 - 3x^3 + x^4$ then their sum $p(x) + q(x)$ is the new polynomial $1 + 2x + x^2 + x^4$.

(D) Power series: If $f(x) = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \cdots$ and $g(x) = 1 - x + \frac{1}{2!}x^2 - \frac{1}{3!}x^3 + \cdots$ then $f(x) + g(x) = 2 + \frac{2}{2!}x^2 + \frac{2}{4!}x^4 + \cdots$ is also a power series.

(E) Functions: If $f(x) = e^x$ and $g(x) = e^{-x}$ then their sum $f(x) + g(x)$ is the new function $2\cosh x$.
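If it helps to see example (C) computationally, here is a minimal Python sketch of ours (not part of the text) where a polynomial is stored as its list of coefficients, constant term first; vector addition is then coefficient-wise addition.

```python
# Polynomials as coefficient lists (constant term first); adding the
# "vectors" p and q means adding coefficients term by term.
def add_poly(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

p = [1, 1, -2, 3]       # 1 + x - 2x^2 + 3x^3
q = [0, 1, 3, -3, 1]    # x + 3x^2 - 3x^3 + x^4
print(add_poly(p, q))   # [1, 2, 1, 0, 1], i.e. 1 + 2x + x^2 + x^4
```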
There are clearly different kinds of vectors. Stacks of numbers are not the only things that are vectors, as examples C, D, and E show. Vectors of different kinds can not be added; what possible meaning could the following have?

$$\begin{pmatrix}9\\3\end{pmatrix} + e^x$$

In fact, you should think of all five kinds of vectors above as different kinds, and understand that you should not add vectors that are not of the same kind. On the other hand, any two things of the same kind "can be added". This is the reason you should now start thinking of all the above objects as vectors!
In Chapter 5 we will give the precise rules that vector addition must obey.
In the above examples, however, notice that the vector addition rule stems
from the rules for adding numbers.
When adding the same vector over and over, for example
x + x, x + x + x, x + x + x + x, ... ,
we will write
2x , 3x , 4x , . . . ,
respectively. For example
$$4\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1\\1\\0\end{pmatrix} + \begin{pmatrix}1\\1\\0\end{pmatrix} + \begin{pmatrix}1\\1\\0\end{pmatrix} + \begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}4\\4\\0\end{pmatrix}.$$
So far we have seen many types of objects that can be thought of as vectors:
• numbers
• n-vectors
• 2nd order polynomials
• polynomials
• power series
• functions with a certain domain
1.3 What are Linear Functions?
In linear algebra, the functions we study will have vectors (of some type)
as both inputs and outputs. We just saw that vectors are objects that can be
added or scalar multiplied—a very general notion—so the functions we are
going to study will look novel at first. So that things don't get too abstract, here
are five questions that can be rephrased in terms of functions of vectors.
(D) What power series $f(x)$ satisfies $x\,\frac{d}{dx}f(x) - 2f(x) = 0$?

⁴The cross product appears in this equation.
$$x \mapsto 10x$$

This is just like a function f from calculus that takes in a number x and spits out the number 10x. (You might write $f(x) = 10x$ to indicate this.)
For part (B), we need something more sophisticated:

$$\begin{pmatrix}x\\y\\z\end{pmatrix} \mapsto \begin{pmatrix}z\\-z\\y-x\end{pmatrix}.$$

The inputs and outputs are both 3-vectors. The output is the cross product of the input with... how about you complete this sentence to make sure you understand.
The machine needed for example (C) looks like it has just one input and two outputs; we input a polynomial and get a 2-vector as output:

$$p \mapsto \begin{pmatrix}\int_{-1}^{1} p(y)\,dy \\[4pt] \int_{-1}^{1} y\,p(y)\,dy\end{pmatrix}.$$
While this sounds complicated, linear algebra is the study of simple functions of vectors; it's time to describe the essential characteristics of linear functions.
Let’s use the letter L to denote an arbitrary linear function and think
again about vector addition and scalar multiplication. Also, suppose that v
and u are vectors and c is a number. Since L is a function from vectors to
vectors, if we input u into L, the output L(u) will also be some sort of vector.
The same goes for L(v). (And remember, our input and output vectors might
be something other than stacks of numbers!) Because vectors are things that
can be added and scalar multiplied, u + v and cu are also vectors, and so
they can be used as inputs. The essential characteristic of linear functions is
what can be said about L(u + v) and L(cu) in terms of L(u) and L(v).
Before we tell you this essential characteristic, ruminate on this picture. The "blob" on the left represents all the vectors that you are allowed to input into the function L, the blob on the right denotes the possible outputs, and the lines tell you which inputs are turned into which outputs.⁶ A full pictorial description of the function would require all inputs and outputs to be drawn, which is generally impossible.

⁶The domain, codomain, and rule of correspondence of the function are represented by the left blob, right blob, and arrows, respectively.
The key to the whole class, from which everything else follows:

1. Additivity:
$$L(u + v) = L(u) + L(v)\,.$$

2. Homogeneity:
$$L(cu) = cL(u)\,.$$
Most functions of vectors do not obey this requirement.⁷ At its heart, linear algebra is the study of functions that do.

Notice that the additivity requirement says that the function L respects vector addition: it does not matter if you first add u and v and then input their sum into L, or first input u and v into L separately and then add the outputs. The same holds for scalar multiplication; try writing out the scalar multiplication version of the italicized sentence. When a function of vectors obeys the additivity and homogeneity properties we say that it is linear (this is the "linear" of linear algebra). Together, additivity and homogeneity are called linearity. Are there other, equivalent, names for linear functions? Yes.
⁷E.g.: If $f(x) = x^2$ then $f(1+1) = 4 \neq f(1) + f(1) = 2$. Try any other function you can think of!
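As a quick numerical illustration (a sketch of ours, and a spot-check rather than a proof), one can test the two defining properties at particular inputs; the function below called L passes, while the squaring function from footnote 7 fails.

```python
# Spot-check additivity and homogeneity at particular inputs.
def looks_linear(F, u, v, c):
    additive    = F(u + v) == F(u) + F(v)
    homogeneous = F(c * u) == c * F(u)
    return additive and homogeneous

L = lambda x: 10 * x   # a linear function of numbers
f = lambda x: x ** 2   # footnote 7: f(1 + 1) = 4 but f(1) + f(1) = 2

print(looks_linear(L, 1, 1, 3))  # True
print(looks_linear(f, 1, 1, 3))  # False
```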
And now for a hint at the power of linear algebra. The questions in
examples (A-D) can all be restated as
Lv = w
If we view functions as vectors with addition given by addition of functions and with
scalar multiplication given by multiplication of functions by constants, then these
familiar properties of derivatives are just the linearity property of linear maps.
Before introducing matrices, notice that for linear maps L we will often write simply Lu instead of L(u). This is because the linearity property of a linear function implies statements like $L(cu + dv) = cLu + dLv$,
which feels a lot like the regular rules of algebra for numbers. Notice though,
that “uL” makes no sense here.
This idea will take some time to develop, but we provided an elementary
example in Section 1.1. A good starting place to learn about matrices is by
studying systems of linear equations.
1.4 So, What is a Matrix?
Each bag contains 2 apples and 4 bananas and each box contains 6 apples and 8
bananas. There are 20 apples and 28 bananas in the room. Find x and y.
The values are the numbers x and y that simultaneously make both of the following equations true:

$$2x + 6y = 20$$
$$4x + 8y = 28\,.$$
Here we have an example of a System of Linear Equations. It's a collection of equations in which variables are multiplied by constants and summed, and no variables are multiplied together: there are no powers of variables (like $x^2$ or $y^5$), no non-integer or negative powers of variables (like $y^{1/7}$ or $x^{-3}$), and no places where variables are multiplied together (like $xy$).
Writing our fruity equations as an equality between 2-vectors and then using these rules we have:

$$\begin{array}{r}2x + 6y = 20\\ 4x + 8y = 28\end{array} \iff \begin{pmatrix}2x+6y\\4x+8y\end{pmatrix} = \begin{pmatrix}20\\28\end{pmatrix} \iff x\begin{pmatrix}2\\4\end{pmatrix} + y\begin{pmatrix}6\\8\end{pmatrix} = \begin{pmatrix}20\\28\end{pmatrix}.$$

Now we introduce a function which takes in 2-vectors and gives out 2-vectors. We denote it by an array of numbers called a matrix. The function $\begin{pmatrix}2&6\\4&8\end{pmatrix}$ is defined by

$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} := x\begin{pmatrix}2\\4\end{pmatrix} + y\begin{pmatrix}6\\8\end{pmatrix}.$$

In other words, the matrix acts as the rule

$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}2x+6y\\4x+8y\end{pmatrix}.$$
Indeed this is an example of the general rule that you have probably seen before:

$$\begin{pmatrix}p&q\\r&s\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} := \begin{pmatrix}px+qy\\rx+sy\end{pmatrix} = x\begin{pmatrix}p\\r\end{pmatrix} + y\begin{pmatrix}q\\s\end{pmatrix}.$$

Notice that the second way of writing the output on the right-hand side of this equation is very useful because it tells us what all possible outputs of a matrix times a vector look like: they are just sums of the columns of the matrix multiplied by scalars. The set of all possible outputs of a matrix times a vector is called the column space (it is also the image of the linear function defined by the matrix).
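Here is a short Python sketch of the general rule just stated (our illustration, not the text's): the output is computed as x times the first column plus y times the second column.

```python
# Matrix times vector, computed as a sum of scaled columns.
def mat_vec(matrix, vec):
    (p, q), (r, s) = matrix
    x, y = vec
    return [x * p + y * q, x * r + y * s]   # x*(p, r) + y*(q, s)

print(mat_vec([[2, 6], [4, 8]], [1, 2]))    # [14, 20]
```

Ranging over all x and y, the outputs sweep out combinations of the columns (2, 4) and (6, 8), which is the column space of this matrix.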
Thus matrices can be viewed as linear functions. The statement of this for the matrix in our fruity example is as follows.

1. $\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(c\begin{pmatrix}x\\y\end{pmatrix}\right) = c\,\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$ and
2. $\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(\begin{pmatrix}x\\y\end{pmatrix} + \begin{pmatrix}x'\\y'\end{pmatrix}\right) = \begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} + \begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x'\\y'\end{pmatrix}.$
These equalities can be verified using the rules we introduced so far.

Example 8 Verify that $\begin{pmatrix}2&6\\4&8\end{pmatrix}$ is a linear operator.

The matrix-function is homogeneous if the expressions on the left-hand side and right-hand side of the first equation are indeed equal:

$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(c\begin{pmatrix}a\\b\end{pmatrix}\right) = \begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}ca\\cb\end{pmatrix} = ca\begin{pmatrix}2\\4\end{pmatrix} + cb\begin{pmatrix}6\\8\end{pmatrix} = \begin{pmatrix}2ca\\4ca\end{pmatrix} + \begin{pmatrix}6cb\\8cb\end{pmatrix} = \underline{\begin{pmatrix}2ca + 6cb\\ 4ca + 8cb\end{pmatrix}}$$

while

$$c\left(\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}\right) = c\left(a\begin{pmatrix}2\\4\end{pmatrix} + b\begin{pmatrix}6\\8\end{pmatrix}\right) = c\begin{pmatrix}2a+6b\\4a+8b\end{pmatrix} = \underline{\begin{pmatrix}2ca + 6cb\\ 4ca + 8cb\end{pmatrix}}.$$

The underlined expressions are identical, so the matrix is homogeneous.

The matrix-function is additive if the left and right side of the second equation are indeed equal:

$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(\begin{pmatrix}a\\b\end{pmatrix} + \begin{pmatrix}c\\d\end{pmatrix}\right) = \begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a+c\\b+d\end{pmatrix} = (a+c)\begin{pmatrix}2\\4\end{pmatrix} + (b+d)\begin{pmatrix}6\\8\end{pmatrix}$$
$$= \begin{pmatrix}2(a+c)\\4(a+c)\end{pmatrix} + \begin{pmatrix}6(b+d)\\8(b+d)\end{pmatrix} = \underline{\begin{pmatrix}2a+2c+6b+6d\\4a+4c+8b+8d\end{pmatrix}},$$

which we need to compare to

$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix} + \begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}c\\d\end{pmatrix} = a\begin{pmatrix}2\\4\end{pmatrix} + b\begin{pmatrix}6\\8\end{pmatrix} + c\begin{pmatrix}2\\4\end{pmatrix} + d\begin{pmatrix}6\\8\end{pmatrix}$$
$$= \begin{pmatrix}2a\\4a\end{pmatrix} + \begin{pmatrix}6b\\8b\end{pmatrix} + \begin{pmatrix}2c\\4c\end{pmatrix} + \begin{pmatrix}6d\\8d\end{pmatrix} = \underline{\begin{pmatrix}2a+2c+6b+6d\\4a+4c+8b+8d\end{pmatrix}}.$$

Thus multiplication by a matrix is additive and homogeneous, and so it is, by definition, linear.
We have come full circle; matrices are just examples of the kinds of linear
operators that appear in algebra problems like those in section 1.3. Any
equation of the form M v = w with M a matrix, and v, w n-vectors is called
a matrix equation. Chapter 2 is about efficiently solving systems of linear
equations, or equivalently matrix equations.
The output of the first machine would be fed into the second:

$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}2x+6y\\4x+8y\end{pmatrix} \mapsto \begin{pmatrix}1\cdot(2x+6y) + 2\cdot(4x+8y)\\ 0\cdot(2x+6y) + 1\cdot(4x+8y)\end{pmatrix} = \begin{pmatrix}10x+22y\\4x+8y\end{pmatrix}.$$

Notice that the same final result could be achieved with a single machine:

$$\begin{pmatrix}x\\y\end{pmatrix} \mapsto \begin{pmatrix}10x+22y\\4x+8y\end{pmatrix}.$$
$$g \circ f : U \to W,$$

where

$$(g \circ f)(u) = g(f(u))\,.$$

This is called the composition of functions. Matrix multiplication is the tool required for computing the composition of linear functions.
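The two machines above make this concrete; the following numpy sketch (ours, not the text's) checks that feeding the output of the first matrix into the second agrees with acting once by the product matrix.

```python
import numpy as np

M = np.array([[2, 6], [4, 8]])   # the first machine
N = np.array([[1, 2], [0, 1]])   # the second machine
v = np.array([3, 5])             # any sample input

print(N @ (M @ v))   # compose by hand: [140  52]
print((N @ M) @ v)   # single machine, same answer: [140  52]
print(N @ M)         # [[10 22] [ 4  8]], i.e. 10x + 22y and 4x + 8y
```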
$$\begin{pmatrix}2a\\ 2a+2b\\ b+2c\end{pmatrix}_{\!B} = \left[\begin{pmatrix}2&0&0\\2&2&0\\0&1&2\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}\right]_{B}.$$

That is, our notational convention for quadratic functions has induced a notation for the differential operator $\frac{d}{dx}+2$ as a matrix. We can use this notation to see that the following two equations say exactly the same thing:

$$\left(\frac{d}{dx}+2\right)f = x+1 \quad\Longleftrightarrow\quad \left[\begin{pmatrix}2&0&0\\2&2&0\\0&1&2\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}\right]_{B} = \begin{pmatrix}0\\1\\1\end{pmatrix}_{\!B}.$$

Our notational convention has served as an organizing principle to yield the system of equations

$$\begin{array}{rcl}2a &=& 0\\ 2a + 2b &=& 1\\ b + 2c &=& 1\end{array}$$

with solution $\begin{pmatrix}0\\ \frac12\\ \frac14\end{pmatrix}_{\!B}$, where the subscript B is used to remind us that this stack of numbers encodes the vector $\frac12 x + \frac14$, which is indeed the solution to our equation since, substituting for f, it yields the true statement $\left(\frac{d}{dx}+2\right)\left(\frac12 x + \frac14\right) = x+1$.
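Since the matrix above encodes $\frac{d}{dx}+2$ on quadratic functions, the system can be handed to any linear solver. A sketch of ours, using the convention B of the text (in which the stack (a, b, c) encodes the quadratic whose coefficients appear in that order):

```python
import numpy as np

D = np.array([[2, 0, 0],
              [2, 2, 0],
              [0, 1, 2]])       # the matrix of d/dx + 2 in convention B
rhs = np.array([0, 1, 1])       # x + 1 written in the same convention

print(np.linalg.solve(D, rhs))  # [0.  0.5 0.25]  ->  f = x/2 + 1/4
```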
A simple example with the knowns (L and V are $\frac{d}{dx}$ and 3, respectively) is shown below, although the detour is unnecessary in this case since you know how to antidifferentiate.
To drive home the point that we are not studying matrices but rather lin-
ear functions, and that those linear functions can be represented as matrices
under certain notational conventions, consider how changeable the notational
conventions are.
Example 10 (A different matrix for the same linear algebra problem)

Notice that we have obtained a different matrix for the same linear function. The equation we started with,
$$\left(\frac{d}{dx}+2\right)f = x+1 \quad\Longleftrightarrow\quad \left[\begin{pmatrix}2&1&0\\0&2&2\\0&0&2\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}\right]_{B'} = \begin{pmatrix}1\\1\\0\end{pmatrix}_{\!B'}$$

$$\Longleftrightarrow\quad \begin{array}{rcl}2a + b &=& 1\\ 2b + 2c &=& 1\\ 2c &=& 0\,,\end{array}$$

has the solution $\begin{pmatrix}\frac14\\ \frac12\\ 0\end{pmatrix}_{\!B'}$. Notice that we have obtained a different 3-vector for the same vector, since in the notational convention B′ this 3-vector represents $\frac14 + \frac12 x$.
1.5 Review Problems
Probably you will spend most of your time on the following review questions:
1. Problems A, B, and C of example 3 can all be written as $Lv = w$ where

$$L : V \longrightarrow W$$

(read this as "L maps the set of vectors V to the set of vectors W"). For each case write down the sets V and W where the vectors v and w come from.
2. Torque is a measure of "rotational force". It is a vector whose direction is the (preferred) axis of rotation. Upon applying a force F on an object at point r the torque τ is the cross product $r \times F = \tau$:
3. The function P(t) gives gas prices (in units of dollars per gallon) as a function of t the year (in A.D. or C.E.), and g(t) is the gas consumption rate measured in gallons per year by a driver as a function of their age. The function g is certainly different for different people. Assuming a lifetime is 100 years, what function gives the total amount spent on gas during the lifetime of an individual born in an arbitrary year t? Is the operator that maps g to this function linear?
$$f(t) = f(0)e^{2t}$$

satisfies the DE for any number f(0). The number 2 in the DE is called the constant of proportionality. A similar DE

$$\frac{d}{dt}f = \frac{2}{t}\,f$$

has a time-dependent "constant of proportionality".
Are these matrices diagonal and why? Use the rule you found in problem 6 to compute the matrix products DD′ and D′D. What do you observe? Do you think the same property holds for arbitrary matrices? What about products where only one of the matrices is diagonal?
(ii) Choose an ordering on {∗, ⋆, #}, and then use it to write your function from part (i) as a triple of numbers.

(iii) Choose a new ordering on {∗, ⋆, #} and then write your function from part (i) as a triple of numbers.
(iv) Your answers for parts (ii) and (iii) are different yet represent the same function; explain!
2 Systems of Linear Equations

2.1 Gaussian Elimination
Systems of linear equations can be written as matrix equations. Now you
will learn an efficient algorithm for (maximally) simplifying a system of linear
equations (or a matrix equation) – Gaussian elimination.
Entries left of the divide carry two indices; subscripts denote column number
and superscripts row number. We emphasize: the superscripts here do not
denote exponents. Make sure you can write out the system of equations and
the associated matrix equation for any augmented matrix.
We now have three ways of writing the same question. Let’s put them
side by side as we solve the system by strategically adding and subtracting
equations. We will not tell you the motivation for this particular series of
steps yet, but let you develop some intuition first.
Example 11 (How matrix equations and augmented matrices change in elimination)

$$\begin{array}{r}x + y = 27\\ 2x - y = 0\end{array} \iff \begin{pmatrix}1&1\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}27\\0\end{pmatrix} \iff \left(\begin{array}{rr|r}1&1&27\\2&-1&0\end{array}\right).$$

With the first equation replaced by the sum of the two equations this becomes

$$\begin{array}{r}3x + 0 = 27\\ 2x - y = 0\end{array} \iff \begin{pmatrix}3&0\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}27\\0\end{pmatrix} \iff \left(\begin{array}{rr|r}3&0&27\\2&-1&0\end{array}\right).$$

Let the new first equation be the old first equation divided by 3:

$$\begin{array}{r}x + 0 = 9\\ 2x - y = 0\end{array} \iff \begin{pmatrix}1&0\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}9\\0\end{pmatrix} \iff \left(\begin{array}{rr|r}1&0&9\\2&-1&0\end{array}\right).$$

Replace the second equation by the second equation minus two times the first equation:

$$\begin{array}{r}x + 0 = 9\\ 0 - y = -18\end{array} \iff \begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}9\\-18\end{pmatrix} \iff \left(\begin{array}{rr|r}1&0&9\\0&-1&-18\end{array}\right).$$

Let the new second equation be the old second equation divided by $-1$:

$$\begin{array}{r}x + 0 = 9\\ 0 + y = 18\end{array} \iff \begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}9\\18\end{pmatrix} \iff \left(\begin{array}{rr|r}1&0&9\\0&1&18\end{array}\right).$$
Did you see what the strategy was? We eliminated y from the first equation and then eliminated x from the second. The result was the solution to the system.

Here is the big idea: everywhere in the instructions above we can replace the word "equation" with the word "row" and interpret them as telling us what to do with the augmented matrix instead of the system of equations. Performed systematically, the result is the Gaussian elimination algorithm.
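The same series of row operations can be typed almost verbatim; here is a small numpy sketch of ours replaying the steps of example 11 on the augmented matrix.

```python
import numpy as np

A = np.array([[1.0,  1.0, 27.0],    # x + y = 27
              [2.0, -1.0,  0.0]])   # 2x - y = 0

A[0] = A[0] + A[1]       # new row 1 = sum of the two rows
A[0] = A[0] / 3          # divide row 1 by 3
A[1] = A[1] - 2 * A[0]   # row 2 minus twice row 1
A[1] = A[1] / -1         # divide row 2 by -1
print(A)                 # [[ 1.  0.  9.] [ 0.  1. 18.]] -> x = 9, y = 18
```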
Note that in going from the first to second augmented matrix, we used the top left 1
to make the bottom left entry zero. For this reason we call the top left entry a pivot.
Similarly, to get from the second to third augmented matrix, the bottom right entry
(before the divide) was used to make the top right one vanish; so the bottom right
entry is also called a pivot.
This name pivot is used to indicate the matrix entry used to “zero out”
the other entries in its column; the pivot is the number used to eliminate
another number in its column.
called the Identity Matrix, since this would give the simple statement of a solution x = a, y = b. The same goes for larger systems of equations for which the identity matrix I has 1's along its diagonal and all off-diagonal entries vanish:

$$I = \begin{pmatrix}1 & 0 & \cdots & 0\\ 0 & 1 & & 0\\ \vdots & & \ddots & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix}$$
For many systems, it is not possible to reach the identity in the augmented
matrix via Gaussian elimination. In any case, a certain version of the matrix
that has the maximum number of components eliminated is said to be the
Row Reduced Echelon Form (RREF).
This example demonstrates that if one equation is a multiple of the other, the identity matrix can not be reached. This is because the first step in elimination will make the second row a row of zeros. Notice that solutions still exist; (1, 1) is a solution. The last augmented matrix here is in RREF; no more than two components can be eliminated.

This system of equations has a solution only if there exist two numbers x and y such that 0 + 0 = 1. That is a tricky way of saying there are no solutions. The last form of the augmented matrix here is the RREF.
and then give up because the upper left slot can not function as a pivot, since the 0 that lives there can not be used to eliminate the entries below it. Of course, the right thing to do is to change the order of the equations before starting:

$$\begin{array}{r}x - y = 7\\ 0x + y = 2\end{array} \iff \left(\begin{array}{rr|r}1&-1&7\\0&1&2\end{array}\right) \sim \left(\begin{array}{rr|r}1&0&9\\0&1&2\end{array}\right) \iff \begin{array}{r}x + 0 = 9\\ 0 + y = 2\,.\end{array}$$

The third augmented matrix above is the RREF of the first and second. That is to say, you can swap rows on your way to RREF.
For larger systems of equations redundancy and inconsistency are the obstructions to obtaining the identity matrix, and hence to a simple statement of a solution in the form x = a, y = b, . . . . What can we do to maximally simplify a system of equations in general? We need to perform operations that simplify our system without changing its solutions. Because exchanging the order of equations, multiplying one equation by a non-zero constant, or adding equations does not change the system's solutions, we are led to three operations:

• Row Swap: exchange any two rows.

• Scalar Multiplication: multiply any row by a non-zero constant.

• Row Sum: add a multiple of one row to another row.

These are called Elementary Row Operations, or EROs for short, and are studied in detail in section 2.3. Suppose now we have a general augmented matrix for which the first entry in the first row does not vanish. Then, using just the three EROs, we could¹ then perform the following.
¹This is a "brute force" algorithm; there will often be more efficient ways to get to RREF.
This algorithm and its variations are known as Gaussian elimination. The endpoint of the algorithm is an augmented matrix of the form

$$\left(\begin{array}{cccccccc|c}
1 & * & 0 & * & 0 & \cdots & 0 & * & b^{1}\\
0 & 0 & 1 & * & 0 & \cdots & 0 & * & b^{2}\\
0 & 0 & 0 & 0 & 1 & \cdots & 0 & * & b^{3}\\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & 0 & \cdots & 1 & * & b^{k}\\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & b^{k+1}\\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots\\
0 & 0 & 0 & 0 & 0 & \cdots & 0 & 0 & b^{r}
\end{array}\right)$$

This is called Reduced Row Echelon Form (RREF). The asterisks denote the possibility of arbitrary numbers (e.g., the second 1 in the top line of example 13).
Learning to perform this algorithm by hand is the first step to learning
linear algebra; it will be the primary means of computation for this course.
You need to learn it well. So start practicing as soon as you can, and practice
often.
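Hand computations can also be checked by machine; for instance sympy ships an rref routine (a sketch of ours; any computer algebra system with RREF would do equally well).

```python
from sympy import Matrix

aug = Matrix([[1,  1, 27],
              [2, -1,  0]])
rref_form, pivot_columns = aug.rref()
print(rref_form)       # Matrix([[1, 0, 9], [0, 1, 18]])
print(pivot_columns)   # (0, 1): the first two columns hold the pivots
```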
The reason we need the asterisks in the general form of RREF is that
not every column need have a pivot, as demonstrated in examples 13 and 16.
Here is an example where multiple columns have no pivot:
Example 18 (Consecutive columns with no pivot in RREF)

$$\begin{array}{r}x + y + z + 0w = 2\\ 2x + 2y + 2z + 2w = 4\end{array} \iff \left(\begin{array}{rrrr|r}1&1&1&0&2\\2&2&2&2&4\end{array}\right) \sim \left(\begin{array}{rrrr|r}1&1&1&0&2\\0&0&0&1&0\end{array}\right)$$

$$\iff \left\{\begin{array}{r}x + y + z = 2\\ w = 0\,.\end{array}\right.$$

Note that there was no hope of reaching the identity matrix, because of the shape of the augmented matrix we started with.
It is important that you are able to convert RREF back into a system of equations. The first thing you might notice is that if any of the numbers $b^{k+1}, \ldots, b^{r}$ in the general RREF form above are non-zero then the system of equations is inconsistent and has no solutions. Our next task is to extract all possible solutions from an RREF augmented matrix.
There are always exactly enough non-pivot variables to index your solutions. In any approach, the variables which are not expressed in terms of the other variables are called free variables. The standard approach is to use the non-pivot variables as free variables.

Non-standard approach: solve for w in terms of z and substitute into the other equations. You now have an expression for each component in terms of z. But why pick z instead of y or x? (Or x + y?) The standard approach not only feels natural, but is canonical, meaning that everyone will get the same RREF and hence choose the same variables to be free. However, it is important to remember that, so long as the resulting set of solutions is the same, any choice of free variables is fine. (You might think of this as the difference between using Google Maps™ or Mapquest™; although their maps may look different, the place (home) they are describing is the same!)

When you see an RREF augmented matrix with two columns that have no pivot, you know there will be two free variables.
$$\left(\begin{array}{rrrr|r}1 & 0 & 7 & 0 & 4\\ 0 & 1 & 3 & 4 & 1\\ 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\end{array}\right) \iff \left\{\begin{array}{rcl}x + 7z &=& 4\\ y + 3z + 4w &=& 1\end{array}\right.$$

$$\left\{\begin{array}{l}x = 4 - 7z\\ y = 1 - 3z - 4w\\ z = z\\ w = w\end{array}\right\} \iff \begin{pmatrix}x\\y\\z\\w\end{pmatrix} = \begin{pmatrix}4\\1\\0\\0\end{pmatrix} + z\begin{pmatrix}-7\\-3\\1\\0\end{pmatrix} + w\begin{pmatrix}0\\-4\\0\\1\end{pmatrix}$$
You can imagine having three, four, or fifty-six non-pivot columns and the same number of free variables indexing your solutions set. In general a solution set to a system of equations with n free variables will be of the form

$$\{x^P + \mu_1 x_1^H + \mu_2 x_2^H + \cdots + \mu_n x_n^H : \mu_1, \ldots, \mu_n \in \mathbb{R}\}.$$

The parts of these solutions play special roles in the associated matrix equation. This will come up again and again long after we complete this discussion of basic calculation methods, so we will use the general language of linear algebra to give names to these parts now.
Check now that the parts of the solutions with free variables as coefficients from the previous examples are homogeneous solutions, and that by adding a homogeneous solution to a particular solution one obtains a solution to the matrix equation. This will come up over and over again. As an example without matrices, consider the differential equation $\frac{d^2}{dx^2}f = 3$. A particular solution is $\frac{3}{2}x^2$, while every solution of the associated homogeneous equation $\frac{d^2}{dx^2}f = 0$ has the form $ax + b$; the general solution is the sum $\frac{3}{2}x^2 + ax + b$.
2.2 Review Problems
1. State whether the following augmented matrix is in RREF and compute its solution set:

$$\left(\begin{array}{ccccccc|c}
1 & 1 & 0 & 1 & 0 & 1 & 0 & 1\\
0 & 0 & 1 & 2 & 0 & 2 & 0 & 1\\
0 & 0 & 0 & 0 & 1 & 3 & 0 & 1\\
0 & 0 & 0 & 0 & 0 & 2 & 0 & 2\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{array}\right).$$
2. Solve the following linear system:

$$\begin{array}{r}
2x_1 + 5x_2 - 8x_3 + 2x_4 + 2x_5 = 0\\
6x_1 + 2x_2 - 10x_3 + 6x_4 + 8x_5 = 6\\
3x_1 + 6x_2 + 2x_3 + 3x_4 + 5x_5 = 6\\
3x_1 + x_2 - 5x_3 + 3x_4 + 4x_5 = 3\\
6x_1 + 7x_2 - 3x_3 + 6x_4 + 9x_5 = 9
\end{array}$$

Be sure to set your work out carefully with equivalence signs ∼ between each step, labeled by the row operations you performed.
3. Check that the following two matrices are row-equivalent:

$$\begin{pmatrix}1 & 4 & 7 & 10\\ 2 & 9 & 6 & 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}0 & 1 & -8 & -20\\ 4 & 18 & 12 & 0\end{pmatrix}.$$

Now remove the third column from each matrix, and show that the resulting two matrices (shown below) are row-equivalent:

$$\begin{pmatrix}1 & 4 & 10\\ 2 & 9 & 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix}0 & 1 & -20\\ 4 & 18 & 0\end{pmatrix}.$$

Now remove the fourth column from each of the original two matrices, and show that the resulting two matrices, viewed as augmented matrices (shown below), are row-equivalent:

$$\left(\begin{array}{rr|r}1 & 4 & 7\\ 2 & 9 & 6\end{array}\right) \quad\text{and}\quad \left(\begin{array}{rr|r}0 & 1 & -8\\ 4 & 18 & 12\end{array}\right).$$

Explain why row-equivalence is never affected by removing columns.
4. Check that the system of equations corresponding to the augmented matrix

$$\left(\begin{array}{rr|r}1 & 4 & 10\\ 3 & 13 & 9\\ 4 & 17 & 20\end{array}\right)$$
has no solutions. If you remove one of the rows of this matrix, does the new matrix have any solutions? In general, can row equivalence be affected by removing rows? Explain why or why not.
$$\begin{array}{r}x - 3y = 6\\ x + 3z = 3\\ 2x + ky + (3-k)z = 1\end{array}$$
8. Show that this pair of augmented matrices are row equivalent, assuming $ad - bc \neq 0$:

$$\left(\begin{array}{rr|r}a & b & e\\ c & d & f\end{array}\right) \sim \left(\begin{array}{rr|c}1 & 0 & \frac{de - bf}{ad - bc}\\[4pt] 0 & 1 & \frac{af - ce}{ad - bc}\end{array}\right)$$
11. Equivalence of augmented matrices does not come from equality of their
solution sets. Rather, we define two matrices to be equivalent if one
can be obtained from the other by elementary row operations.
Find a pair of augmented matrices that are not row equivalent but do
have the same solution set.
2.3 Elementary Row Operations
On the right, we have listed the relations between old and new rows in matrix notation.
$$\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\left(\begin{array}{rrr|r}0&1&1&7\\2&0&0&4\\0&0&1&4\end{array}\right) = \left(\begin{array}{rrr|r}2&0&0&4\\0&1&1&7\\0&0&1&4\end{array}\right)$$

$$\sim\quad \begin{pmatrix}\frac12&0&0\\0&1&0\\0&0&1\end{pmatrix}\left(\begin{array}{rrr|r}2&0&0&4\\0&1&1&7\\0&0&1&4\end{array}\right) = \left(\begin{array}{rrr|r}1&0&0&2\\0&1&1&7\\0&0&1&4\end{array}\right)$$

$$\sim\quad \begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\left(\begin{array}{rrr|r}1&0&0&2\\0&1&1&7\\0&0&1&4\end{array}\right) = \left(\begin{array}{rrr|r}1&0&0&2\\0&1&0&3\\0&0&1&4\end{array}\right)$$

Here we have multiplied the augmented matrix by the matrices that act on rows, listed on the right of example 21.
$$\begin{array}{rcl}6x &=& 12\\ \Leftrightarrow\quad 3^{-1}\cdot 6x &=& 3^{-1}\cdot 12\\ \Leftrightarrow\quad 2x &=& 4\\ \Leftrightarrow\quad 2^{-1}\cdot 2x &=& 2^{-1}\cdot 4\\ \Leftrightarrow\quad 1x &=& 2\end{array}$$
This is another way of thinking about Gaussian elimination which feels more
like elementary algebra in the sense that you “do something to both sides of
an equation” until you have a solution.
As we changed the left side from the matrix M to the identity matrix, the right side changed from the identity matrix to the matrix which undoes M.

Example 26 (Checking that one matrix undoes another)

$$\begin{pmatrix}0&\frac12&0\\1&0&-1\\0&0&1\end{pmatrix}\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$

If the matrices are composed in the opposite order, the result is the same:

$$\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}0&\frac12&0\\1&0&-1\\0&0&1\end{pmatrix} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$

$$\cdots E_2 E_1 M = I\,.$$
How to find $M^{-1}$:

$$(M\,|\,I) \sim (I\,|\,M^{-1})$$
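The recipe $(M|I) \sim (I|M^{-1})$ is easy to try in sympy; the 2 × 2 matrix below is our own invertible example, not one from the text.

```python
from sympy import Matrix, eye

M = Matrix([[2, 1],
            [1, 1]])
aug = M.row_join(eye(2))   # build the block matrix (M | I)
rref_form, _ = aug.rref()  # row reduce to (I | M^{-1})
Minv = rref_form[:, 2:]
print(Minv)                # Matrix([[1, -1], [-1, 2]])
print(M * Minv)            # the identity matrix, as a check
```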
Much use is made of the fact that invertible matrices can be undone with EROs. To begin with, since each elementary row operation has an inverse,

$$M = E_1^{-1} E_2^{-1} \cdots,$$

so any invertible matrix is a product of elementary matrices. Each kind of elementary row operation is performed by an elementary matrix:

• Row Swap: the identity matrix with two of its rows swapped.

• Scalar Multiplication: the identity matrix with one diagonal entry replaced by a non-zero constant.

• Row Sum: The identity matrix with one off-diagonal entry not 0.
• The row swap matrix that swaps the 2nd and 4th row is the identity matrix with the 2nd and 4th row swapped:

$$\begin{pmatrix}1&0&0&0&0\\0&0&0&1&0\\0&0&1&0&0\\0&1&0&0&0\\0&0&0&0&1\end{pmatrix}.$$

• The scalar multiplication matrix that replaces the 3rd row with 7 times the 3rd row is the identity matrix with 7 in the 3rd row instead of 1:

$$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&7&0\\0&0&0&1\end{pmatrix}.$$

• The row sum matrix that replaces the 4th row with the 4th row plus 9 times the 2nd row is the identity matrix with a 9 in the 4th row, 2nd column:

$$\begin{pmatrix}1&0&0&0&0&0&0\\0&1&0&0&0&0&0\\0&0&1&0&0&0&0\\0&9&0&1&0&0&0\\0&0&0&0&1&0&0\\0&0&0&0&0&1&0\\0&0&0&0&0&0&1\end{pmatrix}.$$
The inverses of the ERO matrices (corresponding to the reverse row manipulations) are

$$E_1^{-1} = \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},\quad E_2^{-1} = \begin{pmatrix}2&0&0\\0&1&0\\0&0&1\end{pmatrix},\quad E_3^{-1} = \begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}.$$
We calculated the product of the first three factors in the previous example; it was named L there, and we will reuse that name here. The product of the next three factors is diagonal and we will name it D. The last factor we named U (the name means something different in this example than in the last example). The LDU factorization of our matrix is

$$\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&0&0\\0&1&0&0\\0&0&3&0\\0&0&0&-3\end{pmatrix}\begin{pmatrix}1&0&-\frac32&\frac12\\0&1&2&2\\0&0&1&\frac43\\0&0&0&1\end{pmatrix}.$$
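An LDU factorization can also be extracted programmatically. Below is a sketch of ours of Gaussian elimination without row swaps: the multipliers fill L, the diagonal of the eliminated matrix gives D, and rescaling its rows gives the unit upper-triangular U. It assumes no zero pivots are met, as is the case in the example above.

```python
import numpy as np

def ldu(A):
    """LDU by elimination without row swaps (assumes non-zero pivots)."""
    U_full = A.astype(float).copy()
    n = len(U_full)
    L = np.eye(n)
    for j in range(n):
        for i in range(j + 1, n):
            L[i, j] = U_full[i, j] / U_full[j, j]   # store the multiplier
            U_full[i] -= L[i, j] * U_full[j]        # eliminate below pivot
    D = np.diag(np.diag(U_full))
    U = np.linalg.inv(D) @ U_full                   # unit upper triangular
    return L, D, U

M = np.array([[2, 0, -3, 1], [0, 1, 2, 2], [-4, 0, 9, 2], [0, -1, 1, -1]])
L, D, U = ldu(M)
print(np.allclose(M, L @ D @ U))   # True
print(np.diag(D))                  # [ 2.  1.  3. -3.]
```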
$$P = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix} = P^{-1}$$

$$\begin{pmatrix}0&1&2&2\\2&0&-3&1\\-4&0&9&2\\0&-1&1&-1\end{pmatrix} = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&0&0\\0&1&0&0\\0&0&3&0\\0&0&0&-3\end{pmatrix}\begin{pmatrix}1&0&-\frac32&\frac12\\0&1&2&2\\0&0&1&\frac43\\0&0&0&1\end{pmatrix}$$
2.4 Review Problems
$$\begin{pmatrix}3&6&2\\5&9&4\\2&4&2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}3\\1\\0\end{pmatrix}$$
3. Solve this vector equation by finding the inverse of the matrix through $(M|I) \sim (I|M^{-1})$ and then applying $M^{-1}$ to both sides of the equation.

$$\begin{pmatrix}2&1&1\\1&1&1\\1&1&2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}9\\6\\7\end{pmatrix}$$
5. Multiple matrix equations with the same matrix can be solved simul-
taneously.
6. How can you convince your fellow students never to make this mistake?

$$\left(\begin{array}{rrrr}1&0&2&3\\0&1&2&3\\2&0&1&4\end{array}\right) \overset{\begin{subarray}{l}R_1' = R_1 + R_2\\ R_2' = R_1 - R_2\\ R_3' = R_1 + 2R_2\end{subarray}}{\sim} \left(\begin{array}{rrrr}1&1&4&6\\1&-1&0&0\\1&2&6&9\end{array}\right)$$

2.5 Solution Sets for Systems of Linear Equations
1. One solution

2a. No solutions

2b. Infinitely many solutions

In each case the linear operator is a 1 × 1 matrix. In the first case, the linear operator is invertible. In the other two cases it is not. In the first case, the solution set is a point on the number line; in case 2b the solution set is the whole number line.
Let's examine similar situations with larger matrices: 2 × 2 matrices.

1. $\begin{pmatrix}6&0\\0&2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}12\\6\end{pmatrix}$ has one solution: $\begin{pmatrix}2\\3\end{pmatrix}$.

2a. $\begin{pmatrix}1&3\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}4\\1\end{pmatrix}$ has no solutions.

2bi. $\begin{pmatrix}1&3\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}4\\0\end{pmatrix}$ has solution set $\left\{\begin{pmatrix}4\\0\end{pmatrix} + y\begin{pmatrix}-3\\1\end{pmatrix} : y \in \mathbb{R}\right\}$.

2bii. $\begin{pmatrix}0&0\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}0\\0\end{pmatrix}$ has solution set $\left\{\begin{pmatrix}x\\y\end{pmatrix} : x, y \in \mathbb{R}\right\}$.

Again, in the first case the linear operator is invertible while in the other cases it is not. When a 2 × 2 matrix from a matrix equation is not invertible the solution set can be empty, a line in the plane, or the plane itself.
For a system of equations with r equations and k variables, one can have a number of different outcomes. For example, consider the case of r equations in three variables. Each of these equations is the equation of a plane in three-dimensional space. To find solutions to the system of equations, we look for the common intersection of the planes (if an intersection exists). Here we have five different possibilities:
2bi. Line. The planes intersect in a common line; any point on that line
then gives a solution to the system of equations.
2bii. Plane. Perhaps you only had one equation to begin with, or else all
of the equations coincide geometrically. In this case, you have a plane
of solutions, with two free parameters.
Following the standard approach, express the pivot variables in terms of the non-pivot variables and add "empty equations". Here $x_3$ and $x_4$ are non-pivot variables:

$$\left.\begin{array}{l}x_1 = 1 - x_3 + x_4\\ x_2 = 1 + x_3 - x_4\\ x_3 = x_3\\ x_4 = x_4\end{array}\right\} \iff \begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}1\\1\\0\\0\end{pmatrix} + x_3\begin{pmatrix}-1\\1\\1\\0\end{pmatrix} + x_4\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}.$$

Notice that the first two components of the second two terms come from the non-pivot columns. Another way to write the solution set is

$$S = \left\{x^P + \mu_1 x_1^H + \mu_2 x_2^H : \mu_1, \mu_2 \in \mathbb{R}\right\},$$

where

$$x^P = \begin{pmatrix}1\\1\\0\\0\end{pmatrix},\quad x_1^H = \begin{pmatrix}-1\\1\\1\\0\end{pmatrix},\quad x_2^H = \begin{pmatrix}1\\-1\\0\\1\end{pmatrix}.$$

Here $x^P$ is a particular solution while $x_1^H$ and $x_2^H$ are called homogeneous solutions. The solution set forms a plane.
This is a consequence of the fact that matrices are linear operators. Thus

$$M(x^P + \mu_1 x_1^H + \mu_2 x_2^H) = Mx^P + \mu_1 Mx_1^H + \mu_2 Mx_2^H = v\,,$$
$$Mx^P = v\,,\qquad Mx_1^H = 0\,,\qquad Mx_2^H = 0\,.$$

Here $x_1^H$ and $x_2^H$ are examples of what are called homogeneous solutions to the system. They do not solve the original equation $Mx = v$, but instead its associated homogeneous equation $My = 0$.
We have just learnt a fundamental lesson of linear algebra: the solution
set to Ax = b, where A is a linear operator, consists of a particular solution
plus homogeneous solutions.
Example 33 Consider the matrix equation of example 32. It has solution set

$$S = \left\{\begin{pmatrix}1\\1\\0\\0\end{pmatrix} + \mu_1\begin{pmatrix}-1\\1\\1\\0\end{pmatrix} + \mu_2\begin{pmatrix}1\\-1\\0\\1\end{pmatrix} : \mu_1, \mu_2 \in \mathbb{R}\right\}.$$

Then $Mx^P = v$ says that $\begin{pmatrix}1\\1\\0\\0\end{pmatrix}$ is a solution to the original matrix equation, which is certainly true, but this is not the only solution.

$Mx_1^H = 0$ says that $\begin{pmatrix}-1\\1\\1\\0\end{pmatrix}$ is a solution to the homogeneous equation.
$Mx_2^H = 0$ says that $\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}$ is a solution to the homogeneous equation.

Notice how adding any multiple of a homogeneous solution to the particular solution yields another particular solution.
2.6 Review Problems

2. Invent a simple linear system that has multiple solutions. Use the standard approach for solving linear systems and a non-standard approach to obtain different descriptions of the solution set. Is the solution set different with different approaches?
3. Let

$$M = \begin{pmatrix}a_1^1 & a_2^1 & \cdots & a_k^1\\ a_1^2 & a_2^2 & \cdots & a_k^2\\ \vdots & \vdots & & \vdots\\ a_1^r & a_2^r & \cdots & a_k^r\end{pmatrix} \quad\text{and}\quad x = \begin{pmatrix}x^1\\ x^2\\ \vdots\\ x^k\end{pmatrix}.$$
Note: $x^2$ does not denote the square of the column vector x. Instead $x^1, x^2, x^3$, etc., denote different variables (the components of x); the superscript is an index. Although confusing at first, this notation was invented by Albert Einstein, who noticed that quantities like $a_1^2x^1 + a_2^2x^2 + \cdots + a_k^2x^k =: \sum_{j=1}^k a_j^2x^j$ can be written unambiguously as $a_j^2x^j$. This is called Einstein summation notation. The most important thing to remember is that the index j is a dummy variable, so that $a_j^2x^j$ means the same thing as $a_k^2x^k$.
Show that your rule for multiplying a matrix by a vector obeys the
linearity property.
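Incidentally, numpy's einsum uses exactly this dummy-index convention; the sketch below (ours, with an arbitrary matrix) shows that summing over the repeated index j reproduces the usual matrix-times-vector rule.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([1, 0, -1])

# 'ij,j->i' reads: sum a^i_j x^j over the dummy index j.
print(np.einsum('ij,j->i', M, x))   # [-2 -2]
print(M @ x)                        # the same result
```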
4. The standard basis vector ei is a column vector with a one in the ith
row, and zeroes everywhere else. Using the rule for multiplying a matrix
times a vector in problem 3, find a simple rule for multiplying M ei ,
where M is the general matrix defined there.
3 The Simplex Method
In Chapter 2, you learned how to handle systems of linear equations. However, there are many situations in which inequalities appear instead of equalities. In such cases we are often interested in an optimal solution extremizing a particular quantity of interest. Questions like this are a focus of fields such as mathematical optimization and operations research. For the case where the functions involved are linear, these problems go under the title linear programming. Originally these ideas were driven by military applications, but by now they are ubiquitous in science and industry. Gigantic computers are dedicated to implementing linear programming methods such as George Dantzig's simplex algorithm, the topic of this chapter.
Finally, Pablo knows that oranges have twice as much sugar as apples and that apples have 5 grams of sugar each. Too much sugar is unhealthy, so Pablo wants to keep the children's sugar intake as low as possible. How many oranges and apples should Pablo suggest that the school board put on the menu?

$$x \geq 5 \quad\text{and}\quad y \geq 7\,,$$

to fulfill the school board's politically motivated wishes. The teachers' and parents' fruit requirement means that

$$x + y \geq 15\,,$$

but to keep the canteen tidy

$$x + y \leq 25\,.$$

Now let

$$s = 5x + 10y\,.$$

This linear function of (x, y) represents the grams of sugar in x apples and y oranges. The problem is asking us to minimize s subject to the four linear inequalities listed above.
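For comparison with the graphical and simplex solutions that follow, here is a sketch of ours of Pablo's problem fed to scipy's linear-programming routine; note that linprog minimizes and expects constraints in the form A_ub x ≤ b_ub, so the inequality x + y ≥ 15 is flipped by negating it.

```python
from scipy.optimize import linprog

c = [5, 10]                      # sugar to minimize: s = 5x + 10y
A_ub = [[-1, -1],                # -(x + y) <= -15, i.e. x + y >= 15
        [ 1,  1]]                #   x + y  <= 25
b_ub = [-15, 25]
bounds = [(5, None), (7, None)]  # x >= 5 and y >= 7

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(result.x, result.fun)      # [8. 7.] 110.0 -> 8 apples, 7 oranges
```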
3.2 Graphical Solutions
$$x \geq 5\,,\qquad y \geq 7\,,\qquad 15 \leq x + y \leq 25\,.$$
You might be able to see the solution to Pablo's problem already. Oranges are very sugary, so they should be kept low, thus y = 7. Also, the less fruit the better, so the answer had better lie on the line x + y = 15. Hence, the answer must be at the vertex (8, 7). Actually this is a general feature of linear programming problems: an optimal solution always lies at a vertex of the feasible region.
The plot of a linear function of two variables is a plane through the origin.
Restricting the variables to the feasible region gives some lamina in 3-space.
Since the function we want to optimize is linear (and assumedly non-zero), if
we pick a point in the middle of this lamina, we can always increase/decrease
the function by moving out to an edge and, in turn, along that edge to a
corner. Applying this to the above picture, we see that Pablo’s best option
is 110 grams of sugar a week, in the form of 8 apples and 7 oranges.
It is worthwhile to contrast the optimization problem for a linear function
with the non-linear case you may have seen in calculus courses:
Here we have plotted the curve f(x) = d in the case where the function f is linear and in the case where it is non-linear. To optimize f in the interval [a, b], for the linear case we just need to compute and compare the values f(a) and f(b). In contrast, for non-linear functions it is necessary to also compute the derivative df/dx to study whether there are extrema inside the interval.
3.3 Dantzig's Algorithm
$$\begin{array}{rcl}c_1 := x + y + z + w &=& 5\\ c_2 := x + 2y + 3z + 2w &=& 6\,,\end{array}$$

where $x \geq 0$, $y \geq 0$, $z \geq 0$ and $w \geq 0$. The objective function $f = 3x - 3y - z + 4w$ is encoded by the equation

$$-3x + 3y + z - 4w + f = 0\,.$$

Keep in mind that the first four columns correspond to the positive variables (x, y, z, w) and that the last row has the information of the function f. The general case is depicted in figure 3.1.

Now the system is written as an augmented matrix where the last row encodes the objective function and the other rows the constraints. Clearly we can perform row operations on the constraint rows since this will not change the solutions to the constraints. Moreover, we can add any amount of the constraint rows to the last row, since this just amounts to adding a constant to the function we want to extremize.
Precisely because we chose the second row to perform our row operations, all entries in the last column remain positive. This allows us to continue the algorithm.

We now repeat the above procedure: there is a −1 in the first column of the last row. We want to zero it out while adding as little to f as possible. This is achieved by adding twice the first row to the last row:

$$\left(\begin{array}{cccccc|l} \frac12 & 0 & -\frac12 & 0 & 0 & 2 &\quad c_1 - \frac12 c_2 = 2\\ 1 & 2 & 3 & 2 & 0 & 6 &\quad c_2 = 6\\ 0 & 7 & 6 & 0 & 1 & 16 &\quad f = 16 - 7y - 6z\,. \end{array}\right)$$

The Dantzig algorithm terminates if all the coefficients in the last row (save perhaps for the last entry, which encodes the value of the objective) are positive. To see why we are done, let's write out what our row operations have done in terms of the function f and the constraints (c₁, c₂). First we have

$$f = 16 - 7y - 6z\,,$$

so, since y and z must be non-negative, the largest possible value of f is achieved by setting y = 0 = z, giving

$$f = 16\,.$$

Finally, we check that the constraints can be solved with y = 0 = z and positive (x, w). Indeed, they can, by taking x = 4, w = 1.
3.4 Pablo Meets Dantzig

$$x_1 + x_2 \geq 3\,,\qquad x_1 + x_2 \leq 13\,,$$

so are not of the form Mx = v. To achieve this we introduce two new positive variables $x_3 \geq 0$, $x_4 \geq 0$ and write

$$c_1 := x_1 + x_2 - x_3 = 3\,,\qquad c_2 := x_1 + x_2 + x_4 = 13\,.$$
These are called slack variables because they take up the "slack" required to convert inequality to equality. This pair of equations can now be written as $Mx = v$:

$$\begin{pmatrix}1 & 1 & -1 & 0\\ 1 & 1 & 0 & 1\end{pmatrix}\begin{pmatrix}x_1\\x_2\\x_3\\x_4\end{pmatrix} = \begin{pmatrix}3\\13\end{pmatrix}.$$
Finally, Pablo wants to minimize sugar $s = 5x + 10y$, but the standard problem maximizes f. Thus we take as the so-called objective function $f = -s + 95 = -5x_1 - 10x_2$. (Notice that it makes no difference whether we maximize $-s$ or $-s + 95$; we choose the latter since it is a linear function of $(x_1, x_2)$.) Now we can build an augmented matrix whose last row reflects the objective function equation $5x_1 + 10x_2 + f = 0$:

$$\left(\begin{array}{ccccc|c}1 & 1 & -1 & 0 & 0 & 3\\ 1 & 1 & 0 & 1 & 0 & 13\\ 5 & 10 & 0 & 0 & 1 & 0\end{array}\right).$$
Here it seems that the simplex algorithm already terminates, because the last row only has positive coefficients, so that setting $x_1 = 0 = x_2$ would be optimal. However, this does not solve the constraints (for positive values of the slack variables $x_3$ and $x_4$). Thus one more (very dirty) trick is needed. We add two more positive (so-called) artificial variables $x_5$ and $x_6$ to the problem, which we use to shift each constraint:

$$c_1 \to c_1 + x_5\,,\qquad c_2 \to c_2 + x_6\,.$$

The idea being that for large positive α, the modified objective function

$$f - \alpha x_5 - \alpha x_6$$

is only maximal when the artificial variables vanish, so the underlying problem is unchanged. Let's take α = 10 (our solution will not depend on this choice) so that our augmented matrix reads

$$\left(\begin{array}{ccccccc|c}1 & 1 & -1 & 0 & 1 & 0 & 0 & 3\\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13\\ 5 & 10 & 0 & 0 & 10 & 10 & 1 & 0\end{array}\right)$$

$$\overset{R_3' = R_3 - 10R_1 - 10R_2}{\sim} \left(\begin{array}{ccccccc|c}1 & 1 & -1 & 0 & 1 & 0 & 0 & 3\\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13\\ -15 & -10 & 10 & -10 & 0 & 0 & 1 & -160\end{array}\right).$$

Here we performed one row operation to zero out the coefficients of the artificial variables. Now we are ready to run the simplex algorithm exactly as in section 3.3.
The first row operation uses the 1 in the top of the first column to zero out the most negative entry in the last row:

$$\left(\begin{array}{ccccccc|c}1 & 1 & -1 & 0 & 1 & 0 & 0 & 3\\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13\\ 0 & 5 & -5 & -10 & 15 & 0 & 1 & -115\end{array}\right)$$

$$\overset{R_2' = R_2 - R_1}{\sim} \left(\begin{array}{ccccccc|c}1 & 1 & -1 & 0 & 1 & 0 & 0 & 3\\ 0 & 0 & 1 & 1 & -1 & 1 & 0 & 10\\ 0 & 5 & -5 & -10 & 15 & 0 & 1 & -115\end{array}\right)$$

$$\overset{R_3' = R_3 + 10R_2}{\sim} \left(\begin{array}{ccccccc|c}1 & 1 & -1 & 0 & 1 & 0 & 0 & 3\\ 0 & 0 & 1 & 1 & -1 & 1 & 0 & 10\\ 0 & 5 & 5 & 0 & 5 & 10 & 1 & -15\end{array}\right).$$

Now the variables $(x_2, x_3, x_5, x_6)$ have positive coefficients in the last row, so they must be set to zero to maximize f. The optimum value is f = −15, so s = −f + 95 = 110, exactly as before. Finally, solving the constraints gives $x_1 = 3$ and $x_4 = 10$, so that x = 8 and y = 7, which also agrees with our previous result.
Clearly, performed by hand, the simplex algorithm was slow and complex
for Pablo’s problem. However, the key point is that it is an algorithm that
can be fed to a computer. For problems with many variables, this method is
much faster than simply checking all vertices as we did in section 3.2.
3.5 Review Problems

$$x \geq 0\,,\quad y \geq 0\,,\quad x + 2y \leq 2\,,\quad 2x + y \leq 2\,.$$
their profit (all of which goes to shareholders, not operating costs). The
quality of oil from well A is better than from well B, so is worth 50%
more per barrel. The Greasy government cares about the environment
and will not allow Conoil to pump in total more than 6 million barrels
per year. Well A costs twice as much as well B to operate. Conoil’s
yearly operating budget is only sufficient to pump at most 10 million
barrels from well B per year. Using both a graphical method and then
(as a double check) Dantzig’s algorithm, determine how many barrels
Conoil should pump from each well to maximize their profits.
4 Vectors in Space, n-Vectors
To continue our linear algebra journey, we must discuss n-vectors with an
arbitrarily large number of components. The simplest way to think about
these is as ordered lists of numbers,
$$a = \begin{pmatrix}a^1\\ \vdots\\ a^n\end{pmatrix}.$$
4.2 Hyperplanes
Vectors in $\mathbb{R}^n$ are impossible to visualize unless n is 1, 2, or 3. However, familiar objects like lines and planes still make sense for any value of n: the line L along the direction defined by a vector v and through a point P labeled by a vector u can be written as

$$L = \{u + tv \mid t \in \mathbb{R}\}\,.$$
Example 45 $\left\{\begin{pmatrix}1\\2\\3\\4\end{pmatrix} + t\begin{pmatrix}1\\0\\0\\0\end{pmatrix} \,\middle|\, t \in \mathbb{R}\right\}$ describes a line in $\mathbb{R}^4$ parallel to the $x_1$-axis.
Any two non-zero vectors u and v determine a plane, unless both vectors are in the same line, in which case one of the vectors is a scalar multiple of the other. The sum of u and v corresponds to laying the two vectors head-to-tail and drawing the connecting vector. If u and v determine a plane, then their sum lies in the plane determined by u and v.
$$\{P + su + tv \mid s, t \in \mathbb{R}\}\,.$$
We can generalize the notion of a plane with the following recursive def-
inition. (That is, infinitely many things are defined in the following line.)
$$= \left\{\begin{pmatrix}3\\1\\4\\1\\5\\9\end{pmatrix} + a\begin{pmatrix}1\\0\\0\\0\\0\\0\end{pmatrix} + b\begin{pmatrix}0\\1\\0\\0\\0\\0\end{pmatrix} \,\middle|\, a, b \in \mathbb{R}\right\}$$
You might sometimes encounter the word "hyperplane" without the qualifier "k-dimensional". When the dimension k is not specified, one usually assumes that k = n − 1 for a hyperplane inside $\mathbb{R}^n$. This is the kind of object that is specified by one algebraic equation in n variables.
is

$$\left\{\begin{pmatrix}1\\0\\0\\0\\0\end{pmatrix} + s_2\begin{pmatrix}-1\\1\\0\\0\\0\end{pmatrix} + s_3\begin{pmatrix}-1\\0\\1\\0\\0\end{pmatrix} + s_4\begin{pmatrix}-1\\0\\0\\1\\0\end{pmatrix} + s_5\begin{pmatrix}-1\\0\\0\\0\\1\end{pmatrix} \,\middle|\, s_2, s_3, s_4, s_5 \in \mathbb{R}\right\},$$

a 4-dimensional hyperplane in $\mathbb{R}^5$.
4.3 Directions and Magnitudes

Using the Law of Cosines, we can then figure out the angle between two vectors. Given two vectors v and u that span a plane in $\mathbb{R}^n$, we can connect the ends of v and u with the vector v − u.
Thus,

$$\|u\|\,\|v\|\cos\theta = u^1v^1 + \cdots + u^nv^n\,.$$

Note that in the above discussion, we have assumed (correctly) that Euclidean lengths in $\mathbb{R}^n$ give the usual notion of lengths of vectors for any plane in $\mathbb{R}^n$. This now motivates the definition of the dot product.

Definition The dot product of $u = \begin{pmatrix}u^1\\ \vdots\\ u^n\end{pmatrix}$ and $v = \begin{pmatrix}v^1\\ \vdots\\ v^n\end{pmatrix}$ is

$$u \cdot v := u^1v^1 + \cdots + u^nv^n\,.$$
The sum above is the one Gauß, according to legend, could do in kindergarten.

$$u \cdot v = \|u\|\,\|v\|\cos\theta\,.$$
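A quick numerical illustration of this relation (our sketch, with two arbitrary 4-vectors):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([4.0, 3.0, 2.0, 1.0])

dot = u @ v                                         # 20.0
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
print(dot, np.degrees(np.arccos(cos_theta)))        # 20.0, about 48.19 deg
```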
1. Symmetric:
$$u \cdot v = v \cdot u\,,$$

2. Distributive:
$$u \cdot (v + w) = u \cdot v + u \cdot w\,,$$

3. Bilinear, which is to say, linear in both inputs:
$$u \cdot (cv + dw) = c\,u \cdot v + d\,u \cdot w\,,$$
and
$$(cu + dw) \cdot v = c\,u \cdot v + d\,w \cdot v\,.$$

4. Positive Definite:
$$u \cdot u \geq 0\,,$$
and $u \cdot u = 0$ only when u itself is the 0-vector.
There are, in fact, many different useful ways to define lengths of vectors. Notice in the definition above that we first defined the dot product, and then defined everything else in terms of the dot product. So if we change our idea of the dot product, we change our notion of length and angle as well. The dot product determines the Euclidean length and angle between two vectors.

Other definitions of length and angle arise from inner products, which have all of the properties listed above (except that in some contexts the positive definite requirement is relaxed). Instead of writing · for other inner products, we usually write ⟨u, v⟩ to avoid confusion.
• separated by a time $\sqrt{\langle X_1, X_2\rangle}$ if $\langle X_1, X_2\rangle \geq 0$.

In particular, the difference in time coordinates $t_2 - t_1$ is not the time between the two points! (Compare this to using polar coordinates, for which the distance between two points $(r, \theta_1)$ and $(r, \theta_2)$ is not $\theta_2 - \theta_1$; coordinate differences are not necessarily distances.)
$$\|u + v\| \leq \|u\| + \|v\|.$$

Proof.

$$\begin{aligned}\|u+v\|^2 &= (u+v)\cdot(u+v)\\ &= u\cdot u + 2\,u\cdot v + v\cdot v\\ &= \|u\|^2 + \|v\|^2 + 2\,\|u\|\,\|v\|\cos\theta\\ &= (\|u\| + \|v\|)^2 + 2\,\|u\|\,\|v\|(\cos\theta - 1)\\ &\leq (\|u\| + \|v\|)^2\,.\end{aligned}$$

That is, the square of the left-hand side of the triangle inequality is less than or equal to the square of the right-hand side. Since both the things being squared are positive, the inequality holds without the square:

$$\|u + v\| \leq \|u\| + \|v\|\,.$$
Example 54 Let

$$a = \begin{pmatrix}1\\2\\3\\4\end{pmatrix} \quad\text{and}\quad b = \begin{pmatrix}4\\3\\2\\1\end{pmatrix},$$

so that

$$a \cdot a = b \cdot b = 1 + 2^2 + 3^2 + 4^2 = 30$$

$$\Rightarrow\quad \|a\| = \sqrt{30} = \|b\| \quad\text{and}\quad (\|a\| + \|b\|)^2 = (2\sqrt{30})^2 = 120\,.$$

Since

$$a + b = \begin{pmatrix}5\\5\\5\\5\end{pmatrix},$$

we have

$$\|a + b\|^2 = 5^2 + 5^2 + 5^2 + 5^2 = 100 < 120 = (\|a\| + \|b\|)^2$$

as predicted by the triangle inequality.

Notice also that $a \cdot b = 1\cdot4 + 2\cdot3 + 3\cdot2 + 4\cdot1 = 20 < \sqrt{30}\cdot\sqrt{30} = 30 = \|a\|\,\|b\|$, in accordance with the Cauchy–Schwarz inequality.
4.4 Vectors, Lists and Functions: $\mathbb{R}^S$
What you have really done here is assign a number to each element of the
set S. In other words, the second list is a function
$$f : S \to \mathbb{R}\,.$$
Given two lists like the second one above, we could easily add them – if you
plan to buy 5 apples and I am buying 3 apples, together we will buy 8 apples!
In fact, the second list is really a 5-vector in disguise.
In general it is helpful to think of an n-vector as a function whose domain
is the set {1, . . . , n}. This is equivalent to thinking of an n-vector as an
ordered list of n numbers. These two ideas give us two equivalent notions for
the set of all n-vectors:
$$\mathbb{R}^n := \left\{\begin{pmatrix}a^1\\ \vdots\\ a^n\end{pmatrix} \,\middle|\, a^1, \ldots, a^n \in \mathbb{R}\right\} = \{a : \{1, \ldots, n\} \to \mathbb{R}\} =: \mathbb{R}^{\{1,\ldots,n\}}$$

The notation $\mathbb{R}^{\{1,\ldots,n\}}$ is used to denote the set of all functions from $\{1, \ldots, n\}$ to $\mathbb{R}$.

Similarly, for any set S the notation $\mathbb{R}^S$ denotes the set of functions from S to $\mathbb{R}$:

$$\mathbb{R}^S := \{f : S \to \mathbb{R}\}\,.$$
When S is an ordered set like {1, . . . , n}, it is natural to write the components
in order. When the elements of S do not have a natural ordering, doing so
might cause confusion.
$$a_\star = 3\,,\quad a_\# = 5\,,\quad a_* = 2\,,$$

because the elements of S do not have an ordering, since as sets $\{*, \star, \#\} = \{\star, \#, *\}$.

$$a_\star = 3\,,\quad a_\# = 5\,,\quad a_* = 2$$

and

$$b_\star = 2\,,\quad b_\# = 4\,,\quad b_* = 13\,,$$

then $a + b \in \mathbb{R}^S$ is the function such that
4.5 Review Problems
A = (200, 300, 50, 50, 100, 100, 200, 500, 1000, 100) .
He also listed the number of times he mowed each lawn in a given year,
for the year 1988 that ordered list was
f = (20, 1, 2, 4, 1, 5, 2, 1, 10, 6) .
2. (2) Find the angle between the diagonal of the unit square in R2 and
one of the coordinate axes.
(3) Find the angle between the diagonal of the unit cube in R3 and
one of the coordinate axes.
(n) Find the angle between the diagonal of the unit (hyper)-cube in
Rn and one of the coordinate axes.
4.5 Review Problems 99
where the vector p labels a given point on the plane and n is a vector
normal to the plane. Let N and P be vectors in R101 and
0 1
x1
B x2 C
B C
X = B .. C .
@ . A
x101
7. Let

$$u = \begin{pmatrix}1\\1\\1\\ \vdots\\ 1\end{pmatrix} \quad\text{and}\quad v = \begin{pmatrix}1\\2\\3\\ \vdots\\ 101\end{pmatrix}.$$

Find the projection of v onto u and the projection of u onto v. (Hint: Remember that two vectors u and v define a plane, so first work out how to project one vector onto another in a plane. The picture from Section 14.4 could help.)
8. If the solution set to the equation $A(x) = b$ is the set of vectors whose tips lie on the paraboloid $z = x^2 + y^2$, then what can you say about the function A?

10. If A is a linear operator and both v and cv (for any real number c) are solutions to $Ax = b$, then what can you say about b?
5 Vector Spaces
As suggested at the end of chapter 4, the vector spaces Rn are not the only
vector spaces. We now give a general definition that includes Rn for all
values of n, and RS for all sets S, and more. This mathematical structure is
applicable to a wide range of real-world problems and allows for tremendous
economy of thought; the idea of a basis for a vector space will drive home
the main idea of vector spaces; they are sets with very simple structure.
The two key properties of vectors are that they can be added together
and multiplied by scalars. Thus, before giving a rigorous definition of vector
spaces, we restate the main idea.
Remark Rather than writing (V, +, ·, ℝ), we will often say "let V be a vector space over ℝ". If it is obvious that the numbers used are real numbers, then "let V be a vector space" suffices. Also, don't confuse the scalar product · with the dot product. The scalar product is a function that takes as its two inputs one number and one vector and returns a vector as its output. This can be written

$$\cdot : \mathbb{R} \times V \to V\,.$$

Similarly

$$+ : V \times V \to V\,.$$

On the other hand, the dot product takes two vectors and returns a number. Succinctly: $\cdot : V \times V \to \mathbb{R}$. Once the properties of a vector space have been verified, we'll just write scalar multiplication with juxtaposition, $cv = c \cdot v$, to keep our notation efficient.
5.1 Examples of Vector Spaces
Example 58

$$\mathbb{R}^{\mathbb{N}} = \{f \mid f : \mathbb{N} \to \mathbb{R}\}$$

Here the vector space is the set of functions that take in a natural number n and return a real number. The addition is just addition of functions: $(f_1 + f_2)(n) = f_1(n) + f_2(n)$. Scalar multiplication is just as simple: $c \cdot f(n) = cf(n)$.

We can think of these functions as infinitely large ordered lists of numbers: $f(1) = 1^3 = 1$ is the first component, $f(2) = 2^3 = 8$ is the second, and so on. Then for example the function $f(n) = n^3$ would look like this:

$$f = \begin{pmatrix}1\\ 8\\ 27\\ \vdots\\ n^3\\ \vdots\end{pmatrix}.$$

Thinking this way, $\mathbb{R}^{\mathbb{N}}$ is the space of all infinite sequences. Because we can not write a list infinitely long (without infinite time and ink), one can not define an element of this space explicitly; definitions that are implicit, as above, or algebraic as in $f(n) = n^3$ (for all $n \in \mathbb{N}$) suffice.

Let's check some axioms.

(+iv) (Zero) We need to propose a zero vector. The constant zero function $g(n) = 0$ works because then $f(n) + g(n) = f(n) + 0 = f(n)$.

The other axioms should also be checked. This can be done using properties of the real numbers.
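The pointwise operations of this example translate directly into code; here is a sketch of ours where vectors in $\mathbb{R}^{\mathbb{N}}$ are Python functions and addition and scalar multiplication are defined exactly as above.

```python
# Vectors are functions from the naturals to the reals; the vector
# space operations are defined pointwise.
def add(f, g):
    return lambda n: f(n) + g(n)

def scale(c, f):
    return lambda n: c * f(n)

f = lambda n: n ** 3    # the sequence 1, 8, 27, ...
zero = lambda n: 0      # the zero vector proposed in axiom (+iv)

print([add(f, zero)(n) for n in range(1, 5)])   # [1, 8, 27, 64]
print([scale(2, f)(n) for n in range(1, 5)])    # [2, 16, 54, 128]
```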
$$\mathbb{R}^{\mathbb{R}} = \{f \mid f : \mathbb{R} \to \mathbb{R}\}$$
You can probably figure out how to show that $\mathbb{R}^S$ is a vector space for any set S. This might lead you to guess that all vector spaces are of the form $\mathbb{R}^S$ for some set S. The following is a counterexample.
Example 61 Another very important example of a vector space is the space of all differentiable functions:

$$\left\{f : \mathbb{R} \to \mathbb{R} \,\middle|\, \frac{d}{dx}f \text{ exists}\right\}.$$

From calculus, we know that the sum of any two differentiable functions is differentiable, since the derivative distributes over addition. A scalar multiple of a function is also differentiable, since the derivative commutes with scalar multiplication ($\frac{d}{dx}(cf) = c\frac{d}{dx}f$). The zero function is just the function such that $0(x) = 0$ for every x. The rest of the vector space properties are inherited from addition and scalar multiplication in $\mathbb{R}$.
This example is called a subspace because it gives a vector space inside another vector space. See chapter 9 for details. Indeed, because it is determined by the linear map given by the matrix M, it is called ker M, or in words, the kernel of M; for this see chapter 16.
A hyperplane which does not contain the origin cannot be a vector space because it fails condition (+iv).

It is also possible to build new vector spaces from old ones using the product of sets. Remember that if V and W are sets, then their product is the new set

$$V \times W = \{(v, w) \mid v \in V, w \in W\}\,,$$

or in words, all ordered pairs of elements from V and W. In fact V × W is a vector space if V and W are. We have actually been using this fact already:

Example 64 The real numbers ℝ form a vector space (over ℝ). The new vector space

$$\mathbb{R} \times \mathbb{R} = \{(x, y) \mid x \in \mathbb{R}, y \in \mathbb{R}\}$$

is just $\mathbb{R}^2$, the vector space of 2-vectors.
5.1.1 Non-Examples
The solution set to a linear non-homogeneous equation is not a vector space because it does not contain the zero vector and therefore fails (+iv).
Do notice that if just one of the vector space rules is broken, the example is
not a vector space.
Most sets of n-vectors are not vector spaces.
Example 66 $P := \left\{\begin{pmatrix}a\\b\end{pmatrix} \,\middle|\, a, b \geq 0\right\}$ is not a vector space because the set fails (·i): $\begin{pmatrix}1\\1\end{pmatrix} \in P$ but $-2\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}-2\\-2\end{pmatrix} \notin P$.
does not form a vector space because it does not satisfy (+i). The functions $f(x) = x^2 + 1$ and $g(x) = -5$ are in the set, but their sum $(f+g)(x) = x^2 - 4 = (x+2)(x-2)$ is not, since $(f+g)(2) = 0$.
5.2 Other Fields

The main example of a field beyond the real numbers is the set of complex numbers,

$$\mathbb{C} = \left\{x + iy \mid i^2 = -1,\ x, y \in \mathbb{R}\right\}.$$
Example 68 In quantum physics, vector spaces over ℂ describe all possible states a physical system can have. For example,

$$V = \left\{\begin{pmatrix}\lambda\\ \mu\end{pmatrix} \,\middle|\, \lambda, \mu \in \mathbb{C}\right\}$$

is the set of possible states for an electron's spin. The vectors $\begin{pmatrix}1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\end{pmatrix}$ describe, respectively, an electron with spin "up" and "down" along a given direction. Other vectors, like $\begin{pmatrix}i\\ -i\end{pmatrix}$, are permissible, since the base field is the complex numbers. Such states represent a mixture of spin up and spin down for the given direction (a rather counterintuitive yet experimentally verifiable concept), but a given spin in some other direction.
Complex numbers are very useful because of a special property that they enjoy: every polynomial over the complex numbers factors into a product of linear polynomials. For example, the polynomial

$$x^2 + 1$$

doesn't factor over the real numbers, but over the complex numbers it factors into

$$(x + i)(x - i)\,,$$

where i is a solution of the equation

$$x^2 = -1\,.$$

$$B_2 = \mathbb{Z}_2 = \{0, 1\}\,,$$

with addition and multiplication given by the tables

$$\begin{array}{c|cc} + & 0 & 1\\ \hline 0 & 0 & 1\\ 1 & 1 & 0 \end{array} \qquad\qquad \begin{array}{c|cc} \times & 0 & 1\\ \hline 0 & 0 & 0\\ 1 & 0 & 1 \end{array}$$
5.3 Review Problems
(b) What would happen if you used ℝ as the base field (try comparing to problem 1)?

3. (a) Consider the set of convergent sequences, with the same addition and scalar multiplication that we defined for the space of sequences:

$$V = \left\{f \,\middle|\, f : \mathbb{N} \to \mathbb{R},\ \lim_{n\to\infty} f(n) \in \mathbb{R}\right\} \subset \mathbb{R}^{\mathbb{N}}\,.$$