MAM2084 Notes v1.1

UNIVERSITY OF CAPE TOWN

MAM2084F/S

Thomas van Heerden
Contents

1 Fundamentals
Vectors & Vector Spaces
Linear Combinations
Systems of Linear Equations
Subspaces
Gauss Reduction
Generating Sets
Linear Independence
Basis & Dimension

2 Determinants
Calculating Determinants
Calculating Determinants by Gauss Reduction
Adjoints and Cramer's Rule
An Important and Useful Fact About Determinants

3 Differential Equations
Separable Differential Equations
Exact Differential Equations
Linear Differential Equations
Linear Independence and the Wronskian
Linear, Homogeneous Differential Equations with Constant Coefficients
Variation of Parameters

4 Diagonalisation
A First Example
Eigenvalues and Eigenvectors
More Facts about Eigenvalues and Eigenvectors
Diagonalisation
Inner Products, Norms and Orthogonality
Orthogonal Diagonalisation
Quadratic Forms
Rotation and Reflection in R2 and R3

5 Laplace Transforms
The Basics
Some Useful Facts
The Inverse Transform
Solving Differential Equations
Convolution
Step Functions
1 Fundamentals

This course is an introduction to linear algebra and differential


equations. There are a number of really fundamental ideas you need
to understand at the start of the course that we will use throughout
the rest of the course. You will find it difficult at first – this course
is more abstract than any of the mathematics courses you have done
before – but our experience teaching this course is that if you keep
thinking about it everything will make sense by the time you have to
write the exam. The problems you regard as difficult at the start of
the course will strike you as obvious by the end.
There are a number of things you learnt last year that are going to
be necessary in this course. You should understand matrix multipli-
cation, Gauss reduction and the modulus-argument form of complex
numbers from mam1021s. If you don’t remember any of that then
you should review that material.

Vectors & Vector Spaces

The first of these fundamental ideas are vectors and vector spaces.
What is a vector? You may already have some idea from your previ-
ous courses.

phy1031f A vector is something with a magnitude and direction, e.g.


a force or displacement.

mam1021s A vector is an element of Rn that follows certain rules.

These definitions are not wrong, but they are very narrow. We are
going to introduce a much broader, more general definition which
encompasses the answers above, and includes other useful objects.
Definition. We call the set V = {u, v, w, . . .} a vector space and its
elements vectors if the following two conditions hold. (This definition is
not rigorous – if you are interested you can find the formal definition of
a vector space on Wikipedia.)

1. we can combine any two vectors u and v through addition to produce a
new vector (u + v). V must be closed under addition, i.e. the result
(u + v) must also be an element of V.

2. we can multiply any element u by a scalar α to produce a new vector
(αu). V must be closed under scalar multiplication, i.e. the result (αu)
must also be an element of V.

The scalars we use for multiplication are often (but not always!) the real
numbers. Most of the facts and explanations that we provide in this course
will default to using the reals, even though this comes at the expense of
generality.

Let’s consider some concrete examples of vector spaces. In each


example we will first introduce a set, then check that we can add the
elements of the set and multiply the elements of the set by scalars,
and finally that the set is closed under those operations. If this is the
case, then the set is a vector space.
Example. R3 (more generally, Rn )

R3 = {( x1 , x2 , x3 )| xi ∈ R}

We can add elements in this set by adding the components of each vector,
and we can multiply a vector by a scalar by multiplying each of its
components by that scalar.

Let $u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}$ and $v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}$. Then $u + v = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{pmatrix}$.

Note that the result of this addition is another vector in R3 since
ui + vi is just a real number. If you want something more concrete:

Let $u = \begin{pmatrix} 1 \\ 6 \\ -3 \end{pmatrix}$ and $v = \begin{pmatrix} 5 \\ 0 \\ 2 \end{pmatrix}$. Then $u + v = \begin{pmatrix} 6 \\ 6 \\ -1 \end{pmatrix}$.

For scalar multiplication we have

$u = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}$ and $\alpha \in \mathbb{R}$. Then $\alpha u = \begin{pmatrix} \alpha u_1 \\ \alpha u_2 \\ \alpha u_3 \end{pmatrix}$,

which is again in V since αui is just a real number. More concretely:

Let $u = \begin{pmatrix} 1 \\ 6 \\ -3 \end{pmatrix}$ and $\alpha = 3$. Then $\alpha u = \begin{pmatrix} 3 \\ 18 \\ -9 \end{pmatrix}$.
We can show that Rn is a vector space similarly.
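These closure checks are easy to mirror numerically. The snippet below is
an illustrative sketch only (it assumes Python with numpy, which is not
part of the course): componentwise addition and scalar multiplication of
triples of real numbers again give triples of real numbers.

```python
import numpy as np

# Two vectors in R^3, represented as arrays of real numbers.
u = np.array([1.0, 6.0, -3.0])
v = np.array([5.0, 0.0, 2.0])

# Vector addition is componentwise, so the result is again a triple of reals.
print(u + v)        # [ 6.  6. -1.]

# Scalar multiplication scales every component, again giving a triple of reals.
alpha = 3.0
print(alpha * u)    # [  3.  18.  -9.]
```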

At this point you might start thinking that all sets are closed under
any operation. After all, it seems so obvious for vector addition and
scalar multiplication. This is not true. The dot product u · v combines
two vectors but gives a scalar as a result. R3 is not closed under the
dot product.
Example. RR
The second set we look at is the set of all functions R → R, i.e.
all functions that take any real number as an input, and give a real
number as an output.

RR = { f : R → R }

If we define addition as ( f + g)( x ) = f ( x ) + g( x ) then the function


( f + g)( x ) takes in any real number as an input, and its output is the
sum of two real numbers, so it is also a member of RR .

If we define scalar multiplication as (α f )( x ) = α f ( x ) for α ∈ R


then the function (α f )( x ) takes in any real number as an input, and
its output is the product of two real numbers, so it is also a member
of RR .

At this point you might start thinking that all vector spaces are
infinitely big. After all, if we take any element of the vector space u
then all the multiples αu should also be in the vector space since the
space is closed under scalar multiplication. This does not necessarily
mean the space is infinite, as we see in the final example below.
Example. F32 (Binary codes of length 3)
F2 is the set {0, 1} and F32 is just three copies of this set:

F32 = {( a, b, c)| a, b, c ∈ F2 } .

This is a finite set – it has only eight elements. Three are listed
below:

Let u = (1, 1, 1)
v = (1, 1, 0)
w = (0, 0, 1)

Can you list the other 5 elements?



We define addition of two elements in the following way: we will add the
components of each element, but 1 + 1 ≡ 0 (mod 2) rather than 1 + 1 = 2.
The congruence sign '≡' and '(mod 2)' are reminders that this is a special
type of addition on F2 and not the regular addition we are used to.
Figure 1.1 (addition modulo 2) may make it easier to understand.

If we define addition in this way then v + w = u. This seems reasonable,
but it also means that u + v = w and u + w = v, which is very unusual.
(Check that these claims are true using the rule for addition above!)
F32 is closed under this operation.

For scalar multiplication αu we will multiply each component of u


by α. However, we will restrict α to elements of F2 , i.e. we can only
multiply by 1 or by 0. F32 is closed under this scalar multiplication
since if u is in the space to begin with, then so is 1u = u and 0u =
(0, 0, 0). This set is a vector space.
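As a quick illustration of this arithmetic – a sketch only, assuming Python
with numpy is available, and using the vectors u, v and w from the example
above – addition modulo 2 and scalar multiplication by 0 or 1 can be
checked componentwise:

```python
import numpy as np

u = np.array([1, 1, 1])
v = np.array([1, 1, 0])
w = np.array([0, 0, 1])

# Addition in F_2^3 is componentwise addition modulo 2.
print((v + w) % 2)   # [1 1 1]  -> equals u
print((u + v) % 2)   # [0 0 1]  -> equals w
print((u + w) % 2)   # [1 1 0]  -> equals v

# The only scalars are 0 and 1, and both keep us inside the set.
print((0 * u) % 2)   # [0 0 0]
print((1 * u) % 2)   # [1 1 1]
```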

This example might seem a little contrived and useless for any-
thing other than scaring engineers who refuse to eat their vegeta-
bles. It’s not. The theory of codes is incredibly important in modern
telecommunications. Unfortunately there isn’t enough time to cover
this in any depth, but if you are interested in reading up by your-
self you could start by looking at the Wikipedia page for Hamming
codes.

Linear Combinations

Definition. Let v1 , v2 , . . . , vn be vectors belonging to a vector space V
and α1 , α2 , . . . , αn be scalars associated to that vector space. We call
the sum
$$\alpha_1 v_1 + \alpha_2 v_2 + \ldots + \alpha_n v_n \quad\text{or}\quad \sum_{i=1}^{n} \alpha_i v_i$$
a linear combination of these vectors. (We normally use αi for sums of
unknown size and α, β, γ, δ for smaller sums.)


 
Example. Give all possible linear combinations of the vector $\begin{pmatrix} 3 \\ 1 \\ 3 \end{pmatrix} \in \mathbb{R}^3$.

Solution. A linear combination of one vector may seem odd, but it is just
the set of all scalar multiples of that vector, i.e. a straight line.
$$S = \left\{ \alpha \begin{pmatrix} 3 \\ 1 \\ 3 \end{pmatrix} ; \ \alpha \in \mathbb{R} \right\}.$$

Example. Give all possible linear combinations of the vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -2 \\ 1 \end{pmatrix} \in \mathbb{R}^2$.

Solution. The set of all possible linear combinations is
$$S = \left\{ \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} ; \ \alpha, \beta \in \mathbb{R} \right\}.$$

After some thought, and perhaps consulting Figure 1.2 (two noncollinear
vectors generate a plane), we might realise that any vector in R2 can be
written as a linear combination of these two vectors, i.e. S = R2 . We will
revisit this idea later in the course.

Example. Give all possible linear combinations of the vectors
f ( x ) = 2x2 , g( x ) = π, h( x ) = −3x ∈ RR .

Solution. The set of all possible linear combinations is
S = {α f ( x ) + βg( x ) + γh( x ); α, β, γ ∈ R} .

Again, after some thought we may realise that this can be simplified. The
resulting set S is the set of all polynomials of degree two or less, i.e.
$$S = \left\{ ax^2 + bx + c ; \ a, b, c \in \mathbb{R} \right\}.$$

Example. Give all possible linear combinations of the vectors (1, 1, 0), (0, 1, 0) ∈ F32 .
Solution. The set of all possible linear combinations is

S = {( a, b, 0); a, b ∈ F2 } .

This is a finite set containing four elements:

(1, 1, 0) = 1 × (1, 1, 0) + 0 × (0, 1, 0),


(1, 0, 0) = 1 × (1, 1, 0) + 1 × (0, 1, 0),
(0, 1, 0) = 0 × (1, 1, 0) + 1 × (0, 1, 0),
(0, 0, 0) = 0 × (1, 1, 0) + 0 × (0, 1, 0).

You should convince yourself that this listing is exhaustive – both that it
contains every element of S as we described it above, and that it covers
every possible linear combination of the two vectors we were given.
Linear combinations of vectors are important in understanding
systems of linear equations Ax = b, which is where we focus now.

Systems of Linear Equations

Definition. A system of linear equations has the form

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \ldots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \ldots + a_{2n} x_n &= b_2 \\
&\ \ \vdots \\
a_{m1} x_1 + a_{m2} x_2 + a_{m3} x_3 + \ldots + a_{mn} x_n &= b_m
\end{aligned}$$

This is a system of m equations and n unknowns. It can be written more
compactly using an m × n matrix A, an n × 1 vector x and an m × 1 vector b
in the following form:

Ax = b.

The i-th entries of the vectors x and b are xi and bi respectively, and the
entry in the i-th row and j-th column of A is the coefficient aij .
Definition. If a system of linear equations doesn’t have a solution
we call it inconsistent. If it has at least one solution we call it consis-
tent.
Fact 1.1. The result of the matrix multiplication Ax is a linear combination
of the columns of A, i.e.
$$Ax = \begin{pmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{pmatrix} x_1 + \begin{pmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{pmatrix} x_2 + \ldots + \begin{pmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{pmatrix} x_n . \tag{1.1}$$

Explanation. This follows directly from the definition of matrix
multiplication. Despite how obvious it is, it is still an important result.
We will use it a number of times later in the course, so don't forget about
it.
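A quick numerical check of Fact 1.1 – a sketch assuming numpy, where the
particular matrix and vector are arbitrary choices for illustration:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [1.0, 3.0],
              [3.0, 6.0]])
x = np.array([2.0, -1.0])

# Ax computed directly ...
direct = A @ x

# ... equals the linear combination of the columns of A weighted by the x_i.
as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]

print(direct)                              # [-2. -1.  0.]
print(np.allclose(direct, as_columns))     # True
```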
Fact 1.2. The system Ax = b is consistent if and only if (iff) the vector b is
a linear combination of the columns of the matrix A.
Explanation. Both implications follow from Fact 1.1.
Another way of thinking about the product Ax is as a linear trans-
formation of the vector x from Rn to Rm by the matrix A.
Definition. We say a transformation T : V → W between two
vector spaces is a linear transformation if it satisfies the following two
conditions
1. T (u + v) = T (u) + T (v)

2. T (αu) = αT (u)
for any vectors u and v in V and any scalar α associated with V.
Fact 1.3. Every linear transformation from Rn to Rm can be represented by
an m × n matrix, and every m × n matrix corresponds to a linear transfor-
mation between Rn and Rm .
Example. A linear transformation T sends $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ to $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and it sends the vector $\begin{pmatrix} -2 \\ 1 \end{pmatrix}$ to $\begin{pmatrix} 0 \\ -2 \end{pmatrix}$. Find its matrix representation.

Solution. Let’s start by convincing ourselves that the information


above is enough to solve this problem; that by specifying how these
two vectors transform we know everything about the transformation
T. First, notice that any point in R2 can be expressed as a linear
combination of these two vectors , i.e. for any v we have This was an example we did in the
section above.
! !
1 −2
v=α +β .
0 1

Then the transformation of v is just
$$\begin{aligned}
T(v) &= T\left( \alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -2 \\ 1 \end{pmatrix} \right) \\
&= \alpha T\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta T\begin{pmatrix} -2 \\ 1 \end{pmatrix} \\
&= \alpha \begin{pmatrix} 1 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ -2 \end{pmatrix}.
\end{aligned}$$

We have used the linearity of T to move from the first line to the second.
The information given completely describes a linear transformation, so
let's find it. Let A be the matrix that represents this transformation.
Remember that the matrix multiplication Ax is just a linear combination of
the columns of A.
$$A \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1 \times \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + 0 \times \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \ \Rightarrow\ A = \begin{pmatrix} 1 & a_{12} \\ 1 & a_{22} \end{pmatrix}$$
and
$$A \begin{pmatrix} -2 \\ 1 \end{pmatrix} = -2 \times \begin{pmatrix} 1 \\ 1 \end{pmatrix} + 1 \times \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} = \begin{pmatrix} 0 \\ -2 \end{pmatrix} \ \Rightarrow\ A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}.$$
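As a sanity check (again a sketch assuming numpy), this matrix really does
send the two given vectors to their stated images:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [1.0, 0.0]])

# The two vectors whose images were specified, and the stated images.
print(A @ np.array([1.0, 0.0]))    # [1. 1.]
print(A @ np.array([-2.0, 1.0]))   # [ 0. -2.]
```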

Homogeneous Systems

Definition. We call a system of linear equations homogeneous if it has


the form Ax = 0, i.e. if the vector x is transformed by A to the zero
vector.

Fact 1.4. All homogeneous systems are consistent.


Explanation. All homogeneous systems are solved by x = 0. This is
sometimes called the trivial solution.

Fact 1.5. If x and y are two solutions to a homogeneous system, then every
linear combination of x and y is a solution to that homogeneous system.

Explanation. Let x and y be solutions to a homogeneous system, i.e.


Ax = 0 and Ay = 0. Then
A(αx + βy) = A(αx) + A( βy)
= αAx + βAy
= α0 + β0
=0
so any linear combination of the two is also a solution.
Fact 1.6. If a homogeneous system has at least one non-trivial solution it
has an infinite number of non-trivial solutions.
Explanation. Let x ≠ 0 be a solution to a homogeneous system, i.e.
Ax = 0. Then αx is a solution to the same system for any α ∈ R since
A(αx) = αAx
= α0
=0

Definition. The null space (or kernel) of a matrix A is the set


NS( A) = {x; Ax = 0} .
Example. Consider the matrix $P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which projects onto the x-axis. The null space of the matrix is the set
$$NS(P) = \left\{ \alpha \begin{pmatrix} 0 \\ 1 \end{pmatrix} ; \ \alpha \in \mathbb{R} \right\}.$$
Any point on the y-axis is projected onto the origin by this matrix (see
Figure 1.3, the null space of a projection matrix).
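For matrices whose null space is not obvious by inspection, a computer
algebra system will produce a basis for it directly. A sketch assuming
sympy is installed, using the projection matrix from the example:

```python
from sympy import Matrix

P = Matrix([[1, 0],
            [0, 0]])

# nullspace() returns a list of basis vectors for NS(P).
print(P.nullspace())   # [Matrix([[0], [1]])]
```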

Inhomogeneous Systems

Definition. We call a system of linear equations inhomogeneous if it has
the form Ax = b and b ≠ 0. Every inhomogeneous system has an associated
homogeneous system Ax = 0.
Fact 1.7. If x is a solution to an inhomogeneous system and z is a solution
to the associated homogeneous system, i.e. z ∈ NS( A), then x + z is a
solution to the inhomogeneous system.
Explanation. We have x and z that satisfy Ax = b and Az = 0.
Consider how x + z is transformed by the matrix A:
A(x + z) = Ax + Az
= b+0
=b

so x + z is another solution to the inhomogeneous system.

Fact 1.8. If x and y are two solutions to the inhomogeneous system then
x − y satisfies the associated homogeneous system, i.e. x − y ∈ NS( A).
Explanation. We have x and y that satisfy Ax = b and Ay = b.
Consider their difference, x − y:

A(x − y) = Ax − Ay
= b−b
=0

so A(x − y) = 0 as claimed.
Together these two facts imply that if we have a solution x to an
inhomogeneous system of linear equations we can generate another
solution by adding a solution to the associated homogeneous equa-
tion, and further, every other solution to the inhomogeneous system
can be generated in this way.

Fact 1.9. An inhomogeneous system of linear equations Ax = b has either

1. no solutions,

2. exactly one solution,

3. an infinite number of solutions.

Explanation. The inhomogeneous system is either consistent or


inconsistent. If it is inconsistent there is no solution, so 1. above
holds and we are done. Consider what happens when the system
is consistent.
If the inhomogeneous system is consistent, consider the associated
homogeneous system. It has either exactly one solution, the zero
vector, or an infinite number of solutions (Fact 1.6).
If the homogeneous system has exactly one solution then so does
the inhomogeneous system (Fact 1.8) and 2. holds. If it has an infinite
number of solutions then so does the inhomogeneous system (Fact
1.7) and 3. holds.

Fact 1.10. Although any linear combination of solutions to a homogeneous


system is again a solution to that system (Fact 1.5), the same does not hold
for inhomogeneous systems.
Explanation. Let x and y satisfy Ax = b and Ay = b. Consider a

linear combination of these two, e.g. 3x + 2y:

A(3x + 2y) = A(3x) + A(2y)


= 3Ax + 2Ay
= 3b + 2b
= 5b
≠ b

so this linear combination is not a solution to the original inhomoge-


neous system.

Subspaces

Definition. We say that S is a subset of V (written S ⊆ V) if every element
of S is also an element of V.

Example. The set $\left\{ \begin{pmatrix} 1 \\ 5 \end{pmatrix}, \begin{pmatrix} -2 \\ 0 \end{pmatrix} \right\} \subseteq \mathbb{R}^2$.

Definition. If V is a vector space, and S is a nonempty subset of V
(nonempty means that there must be at least one element in S), we call S a
subspace of V if it is

1. closed under vector addition, and

2. closed under scalar multiplication.

We use the same vector addition and scalar multiplication for S that
we do for V.
Example. Consider the following subsets of R2 and decide whether or not
they are subspaces.

1. $S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} ; \ x^2 + y^2 \le 1 \right\}$

2. $S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} ; \ x \ge 0, y \ge 0 \right\}$

3. $S = \left\{ \begin{pmatrix} x \\ y \end{pmatrix} ; \ y = 3x \right\}$

Solution.

1. No. The set is not closed under vector addition or scalar multiplication, e.g. $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ are both in S but their sum, $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, is not.

2. No. The set is not closed under scalar multiplication, e.g. $\begin{pmatrix} 3 \\ 7 \end{pmatrix}$ is in S but $-1 \begin{pmatrix} 3 \\ 7 \end{pmatrix} = \begin{pmatrix} -3 \\ -7 \end{pmatrix}$ is not.

3. Yes, this is a subspace.


To disprove a statement it’s enough to provide one example where
it fails (like we did in the previous two examples), but to prove
a statement we need to make a generalised argument. We show
below that S is closed under the addition of any two vectors, and
under multiplication of any vector with a scalar.
Let u and v be elements of S and α ∈ R. Then $u = \begin{pmatrix} u_x \\ u_y \end{pmatrix}$ where $u_y = 3u_x$. Similarly $v = \begin{pmatrix} v_x \\ v_y \end{pmatrix}$ where $v_y = 3v_x$. The sum of these vectors is
$$u + v = \begin{pmatrix} u_x \\ u_y \end{pmatrix} + \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \begin{pmatrix} u_x + v_x \\ u_y + v_y \end{pmatrix}.$$

We must check whether this new vector is still in S:
$$(u_y + v_y) = (3u_x) + (3v_x) = 3(u_x + v_x).$$

The sum is in S, so S is closed under vector addition.


We leave as an exercise for the reader the proof that S is closed
under scalar multiplication.

Fact 1.11. Any vector space V has at least two subspaces, {0} and V
itself.

Fact 1.12. The subspaces of R2 are

(a) R2 ,
(b) straight lines through the origin,
(c) the origin, i.e. {0}.

Explanation. Convince yourself. (See this week’s tutorial).

Fact 1.13. The subspaces of R3 are

(a) R3 ,
(b) planes through the origin,
(c) straight lines through the origin,

(d) the origin, i.e. {0}.

Explanation. Similar to the fact above.


We may have a strong intuition that these types of subspaces are
fundamentally different from each other in some way. Although we haven't
yet talked about the dimension of a vector space or subspace, it seems
reasonable to describe these subspaces as having dimension 3 (R3), 2 (the
planes), 1 (the lines), or 0 (the single point).

Fact 1.14. The null space of a matrix A is a subspace

Explanation. We have already done all the hard lifting in the


previous section, but for the sake of completeness let x and y be
elements of the null space, then

A(αx + βy) = αAx + βAy = 0

so we have closure under linear combinations (i.e. under vector


addition and scalar multiplication).

Fact 1.15. The set of solutions to an inhomogeneous system is not a


subspace.

Explanation. Again, we have done the hard lifting in an earlier


section, see Fact 1.10.

Example. Here’s an example in the space of functions. Consider


the differential equation
y0 = xy.
Let y1 and y2 be solutions to this differential equation. Is the linear
combination (αy1 + βy2 ) also a solution?

$$\begin{aligned}
(\alpha y_1 + \beta y_2)' &= \alpha y_1' + \beta y_2' \\
&= \alpha x y_1 + \beta x y_2 \\
&= x(\alpha y_1 + \beta y_2),
\end{aligned}$$
so the linear combination is indeed another solution.

Gauss Reduction

You should remember how to Gauss reduce to solve a system of


linear equations from mam1021s but it is so important in actually
doing computations in this course that we will review it here. We
outline the general procedure below, and apply it to an example as a
concrete illustration.
Given the system Ax = b, we perform the following steps to
obtain the solution x (when it exists).

1. Create an augmented matrix ( A|b).


The subsequent steps can be divided into two parts, first we work
from top to bottom and left to right (steps 2 to 4) and then from
bottom to top and right to left (steps 5 to 7).

2. Move the leftmost, nonzero entry in the matrix to the top row
by interchanging rows if necessary. Scale the top row so that the
leftmost nonzero entry is 1. This entry is our first pivot element.

3. Make all the entries underneath the pivot equal to zero by sub-
tracting a suitable multiple of the pivot row from each row be-
neath it.

4. Repeat from step two, ignoring the top row.


While performing this first part of the Gauss reduction we may
generate a row of zeros in the left of the augmented matrix, i.e.

(0 0 . . . 0 | α ).

If α = 0 the entire row of zeros should be left at the bottom of the


augmented matrix. It has no bearing on the solution. If α 6= 0 then
the system is inconsistent, there is no solution, and we should stop
the calculation.
Following the procedure to this point will transform the aug-
mented matrix into row echelon form – each pivot is to the left of the
pivots beneath it.

5. The leftmost, nonzero entry in the bottom row becomes our first
pivot element.

6. Make all the entries above the pivot equal to zero by subtracting
a suitable multiple of the pivot row from each row above it.

7. Repeat from step five, ignoring the bottom row.


The matrix is now in reduced row echelon form – each pivot is to
the left of the pivots beneath it and each pivot is the only nonzero
entry in its column.

8. Read off the answer. Each column corresponds to an unknown


variable. Columns without pivots are free and additional variables
should be introduced for each of these columns to indicate that the
corresponding unknowns can take any value.

Example. Solve the system
$$\begin{pmatrix} 1 & 4 & 1 & 2 & 1 \\ 1 & 3 & 1 & 1 & 1 \\ 3 & 6 & 3 & 4 & 5 \end{pmatrix} x = \begin{pmatrix} 4 \\ 1 \\ 2 \end{pmatrix}.$$

The corresponding augmented matrix is
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 2 & 1 & 4 \\ 1 & 3 & 1 & 1 & 1 & 1 \\ 3 & 6 & 3 & 4 & 5 & 2 \end{array}\right).$$

We subtract row 1 from row 2, and three copies of row 1 from row 3 to make
all the entries in the first column under the pivot zero. We indicate these
changes to the matrix with the following notation: R2 → R2 − R1,
R3 → R3 − 3R1.
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 2 & 1 & 4 \\ 0 & -1 & 0 & -1 & 0 & -3 \\ 0 & -6 & 0 & -2 & 2 & -10 \end{array}\right).$$

Multiply the second row by negative one so that the first non-zero entry in
the row is one: R2 → −R2.
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 2 & 1 & 4 \\ 0 & 1 & 0 & 1 & 0 & 3 \\ 0 & -6 & 0 & -2 & 2 & -10 \end{array}\right).$$

Add six copies of row two to row three to make the entry under the pivot
zero. We can also think of this as subtracting negative six copies. In
either case we denote it with: R3 → R3 + 6R2.
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 2 & 1 & 4 \\ 0 & 1 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 4 & 2 & 8 \end{array}\right).$$

Scale the third row so that the first non-zero entry is equal to one:
R3 → (1/4)R3.
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 2 & 1 & 4 \\ 0 & 1 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & \tfrac{1}{2} & 2 \end{array}\right).$$

We now work from right to left, making the entries above each pivot zero.
We subtract two copies of row three from row one, and row three from row
two: R1 → R1 − 2R3, R2 → R2 − R3.
$$\left(\begin{array}{ccccc|c} 1 & 4 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -\tfrac{1}{2} & 1 \\ 0 & 0 & 0 & 1 & \tfrac{1}{2} & 2 \end{array}\right).$$

Subtract four copies of row two from row one: R1 → R1 − 4R2.
$$\left(\begin{array}{ccccc|c} 1 & 0 & 1 & 0 & 2 & -4 \\ 0 & 1 & 0 & 0 & -\tfrac{1}{2} & 1 \\ 0 & 0 & 0 & 1 & \tfrac{1}{2} & 2 \end{array}\right).$$

The augmented matrix is now in reduced row echelon form and we can read off
the answer. The third and fifth columns don't contain a pivot. These
variables are free, i.e. x3 = α and x5 = β, so we know the solution will
have the form
$$x = \begin{pmatrix} \ast \\ \ast \\ 0 \\ \ast \\ 0 \end{pmatrix} + \alpha \begin{pmatrix} \ast \\ \ast \\ 1 \\ \ast \\ 0 \end{pmatrix} + \beta \begin{pmatrix} \ast \\ \ast \\ 0 \\ \ast \\ 1 \end{pmatrix}.$$

To determine the missing entries (marked $\ast$) consider each row of the
matrix separately. The first row $\begin{pmatrix} 1 & 0 & 1 & 0 & 2 & -4 \end{pmatrix}$ is equivalent to the equation x1 + x3 + 2x5 = −4. Substituting in for x3 and x5 and rearranging we see that x1 = −α − 2β − 4. Thus:
$$x = \begin{pmatrix} -4 \\ \ast \\ 0 \\ \ast \\ 0 \end{pmatrix} + \alpha \begin{pmatrix} -1 \\ \ast \\ 1 \\ \ast \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -2 \\ \ast \\ 0 \\ \ast \\ 1 \end{pmatrix}.$$

We can do the same for the remaining two rows. After scaling the final
vector by a factor of two to remove the fractions we arrive at the final
answer
$$x = \begin{pmatrix} -4 \\ 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + \alpha \begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} -4 \\ 1 \\ 0 \\ -1 \\ 2 \end{pmatrix}.$$
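The whole reduction can be checked mechanically. The sketch below (assuming
sympy is available) reproduces the reduced row echelon form and confirms
that one member of the solution family above satisfies the system:

```python
from sympy import Matrix

A = Matrix([[1, 4, 1, 2, 1],
            [1, 3, 1, 1, 1],
            [3, 6, 3, 4, 5]])
b = Matrix([4, 1, 2])

# rref() returns the reduced row echelon form and the pivot columns.
rref, pivots = Matrix.hstack(A, b).rref()
print(rref)      # matches the hand computation above
print(pivots)    # (0, 1, 3): columns 3 and 5 (0-indexed 2 and 4) are free

# Check one member of the solution family, e.g. alpha = 1, beta = 1.
x = Matrix([-4, 1, 0, 2, 0]) + Matrix([-1, 0, 1, 0, 0]) + Matrix([-4, 1, 0, -1, 2])
print(A * x == b)   # True
```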

Generating Sets

Let S ⊆ V and T ⊆ V. (S and T are subsets of V.)

Definition. We denote by ⟨S⟩ a new set that is formed by taking all possible
linear combinations of vectors in S. If ⟨S⟩ = T then we say that T is
generated by S, or that S is a generating set for T.

Example. If $S = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right\}$, what is ⟨S⟩?

Solution. Since S is a singleton set, i.e. it has only one element, all the
possible linear combinations of vectors in S will be just the scalar
multiples of that vector,
$$\langle S \rangle = \left\{ \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} ; \ \alpha \in \mathbb{R} \right\}.$$
(Figure 1.4: The set generated by a single vector in R2 is a straight line.)

Example. If $S = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 6 \end{pmatrix} \right\}$, what is ⟨S⟩?

Solution. The set of all possible linear combinations is
$$\langle S \rangle = \left\{ \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 3 \\ 6 \end{pmatrix} ; \ \alpha, \beta \in \mathbb{R} \right\}.$$

This answer can be simplified if we realise that the two vectors in S are
just scalar multiples of each other. If we call α + 3β by γ then
$$\langle S \rangle = \left\{ \gamma \begin{pmatrix} 1 \\ 2 \end{pmatrix} ; \ \gamma \in \mathbb{R} \right\}.$$

Example. If $S = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\}$, what is ⟨S⟩?

Solution. The set of all possible linear combinations is
$$\langle S \rangle = \left\{ \alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 0 \end{pmatrix} ; \ \alpha, \beta \in \mathbb{R} \right\}.$$

This can also be simplified. We claim that
$$\langle S \rangle = \mathbb{R}^2;$$
any vector in R2 can be expressed as a linear combination of the vectors in
S. You should be able to show that this is true by solving
$$\alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}$$
for values of α and β in terms of x and y. Make sure you understand why
this is sufficient to show ⟨S⟩ = R2 and then do it.

Fact 1.16. For any S ⊆ V the set ⟨S⟩ is a subspace of V.

Explanation. To show that ⟨S⟩ is a subspace we must show that it is closed
under scalar multiplication and vector addition.

Let u and v be two elements of ⟨S⟩, and β i , γi ∈ R. Since u is in ⟨S⟩ we
can express it as

u = β 1 x1 + β 2 x2 + . . . + β n xn

for some xi in S. Similarly

v = γ1 y1 + γ2 y2 + . . . + γm ym

for some y j in S.
The addition of u and v is

u + v = β 1 x1 + β 2 x2 + . . . + β n xn + γ1 y1 + γ2 y2 + . . . + γm ym

which is a linear combination of vectors in S since the vectors xi and y j
are all in S. (Where x p = yq we can group these vectors with a new
coefficient δr = β p + γq .) Thus u + v is an element of ⟨S⟩ and the set is
closed under vector addition.

Similarly we can show that the set is closed under scalar multiplication
(do it) and therefore ⟨S⟩ is a subspace.

In addition to working forward (finding ⟨S⟩ from a given S) we should also
be able to work backward (checking whether a given set T is equal to ⟨S⟩,
or proposing a set S that would generate it).
Example. Let $S = \left\{ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} \right\}$. Is $\begin{pmatrix} 3 \\ 5 \\ 5 \end{pmatrix}$ in ⟨S⟩?

Solution. We need to find α, β and γ so that
$$\alpha \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + \beta \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix} + \gamma \begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 5 \\ 5 \end{pmatrix}.$$

This is a system of linear equations that can be represented by the
following augmented matrix:
$$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 3 \\ 1 & 2 & 3 & 5 \\ 1 & 3 & 1 & 5 \end{array}\right).$$

This matrix can be Gauss reduced (do it) to
$$\left(\begin{array}{ccc|c} 1 & 4 & -1 & 3 \\ 0 & 1 & -2 & -2 \\ 0 & 0 & 0 & -2 \end{array}\right).$$

There is no solution, so $\begin{pmatrix} 3 \\ 5 \\ 5 \end{pmatrix}$ is not in ⟨S⟩.
We might notice that
$$\begin{pmatrix} -1 \\ 3 \\ 1 \end{pmatrix} = 7 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - 2 \begin{pmatrix} 4 \\ 2 \\ 3 \end{pmatrix}.$$

One of the vectors in S is a linear combination of the other two. This


means that the set generated by S will be a plane and the reason we
couldn’t find a solution to this question was because the specified
vector does not lie in this plane.
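Membership in ⟨S⟩ is just a consistency question, so it can be checked
mechanically too. A sketch (assuming sympy) for the example above:

```python
from sympy import Matrix, linsolve, symbols

a, b, c = symbols('a b c')
v1 = Matrix([1, 1, 1])
v2 = Matrix([4, 2, 3])
v3 = Matrix([-1, 3, 1])
target = Matrix([3, 5, 5])

# Solve a*v1 + b*v2 + c*v3 = target; an empty solution set means
# the target vector is not in the span of S.
system = Matrix.hstack(v1, v2, v3)
print(linsolve((system, target), [a, b, c]))   # EmptySet
```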
Example. F is the vector space of all functions R → R. Let T be
the subspace of all polynomials of degree two or less. Suggest the
smallest set S that generates T.

Solution. Consider S = {x2 , x, 1}. Then
$$\langle S \rangle = \left\{ \alpha x^2 + \beta x + \gamma ; \ \alpha, \beta, \gamma \in \mathbb{R} \right\} = T.$$

Example. T is the plane in R3 defined by 3x − y + 2z = 0. Find a set


of two vectors that generates T.
Solution. Do it.

Linear Independence

Definition. We say that a set of vectors S = {v1 , v2 , v3 . . . vn } is


linearly independent if the only solution to

α1 v1 + α2 v2 + . . . + αn vn = 0 (1.2)

is α1 = α2 = . . . = αn = 0.
Note that any set of vectors will have this trivial solution to (1.2).
We call the set linearly independent if it is the only solution. Con-
versely, some sets are linearly dependent.
Definition. We say that a set of vectors S = {v1 , v2 , v3 . . . vn } is
linearly dependent if (1.2) has at least one solution where one of the
coefficients αi ≠ 0.

Example. Is $S = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \begin{pmatrix} 5 \\ 4 \\ 0 \end{pmatrix} \right\}$ linearly dependent or independent?

Solution. The set is linearly dependent since
$$5 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix} - \begin{pmatrix} 5 \\ 4 \\ 0 \end{pmatrix} = 0.$$

Example. Is a singleton set S = {v1 } linearly dependent or indepen-


dent?
Solution. We are looking for solutions to the equation

α1 v1 = 0.

If the vector v1 ≠ 0 then the only solution is α1 = 0 and the set is
linearly independent. On the other hand, if v1 = 0 then any value of α1
satisfies the equation, so the set is linearly dependent.

Example. Is the set $S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 1 \\ 5 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix} \right\}$ linearly dependent or independent?

Solution. Unlike the earlier example we cannot solve this by inspection –
there isn't an obvious linear combination of these three vectors that
produces the zero vector. We might suspect that this is linearly
independent, but to show that it is the case we will do some Gauss
reduction. Consider the equation
$$\alpha \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + \beta \begin{pmatrix} 4 \\ 1 \\ 5 \end{pmatrix} + \gamma \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix} = 0.$$

This is equivalent to the augmented matrix
$$\left(\begin{array}{ccc|c} 1 & 4 & 2 & 0 \\ 2 & 1 & 3 & 0 \\ 3 & 5 & -1 & 0 \end{array}\right).$$

After some Gauss reduction (do it) we see that this is equivalent to
$$\left(\begin{array}{ccc|c} 1 & 4 & 2 & 0 \\ 0 & -7 & -1 & 0 \\ 0 & 0 & -6 & 0 \end{array}\right).$$

At this stage we can conclude that the only solution is α = β = γ = 0, so
this set of vectors is linearly independent.

Example. Is the set $S = \left\{ \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 1 \\ 5 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ 5 \end{pmatrix} \right\}$ linearly dependent or independent?

Solution. Set up an augmented matrix and Gauss reduce.
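The same check can be done by computing a rank: a set of n column vectors
in Rn is linearly independent exactly when the matrix with those vectors as
columns has rank n. The sketch below (assuming numpy) checks the first of
the two examples above; swapping in the second set of vectors answers the
exercise.

```python
import numpy as np

# Columns are the three vectors from the earlier example.
M = np.array([[1, 4, 2],
              [2, 1, 3],
              [3, 5, -1]], dtype=float)

# Rank 3 means the only solution of M @ (alpha, beta, gamma) = 0 is trivial.
print(np.linalg.matrix_rank(M))   # 3
```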

Fact 1.17. Consider a set S of two or more vectors. Then the following two
statements are equivalent:

1. S is linearly dependent.

2. At least one vector in S can be expressed as a linear combination of the


others.

Explanation. There are two steps to showing that Fact 1.17 is true.
First we must show linear dependence implies that one vector is a
linear combination of the others, then we must do the converse.
Assume that S is linearly dependent, i.e. there is an αk 6= 0 so that

α1 v1 + α2 v2 + . . . + αk vk + . . . αn vn = 0.

Rearranging shows that

α1 v1 + α2 v2 + . . . + αk−1 vk−1 + αk+1 vk+1 + . . . αn vn = −αk vk .



Now we can divide by −αk since this is non-zero and we see that
$$-\frac{\alpha_1}{\alpha_k} v_1 - \frac{\alpha_2}{\alpha_k} v_2 - \ldots - \frac{\alpha_{k-1}}{\alpha_k} v_{k-1} - \frac{\alpha_{k+1}}{\alpha_k} v_{k+1} - \ldots - \frac{\alpha_n}{\alpha_k} v_n = v_k .$$
We have shown that vk is a linear combination of the other vectors.
Now assume that one vector in S can be expressed as a linear
combination of the others, i.e.

α1 v1 + α2 v2 + . . . + αk−1 vk−1 + αk+1 vk+1 + . . . αn vn = vk

Rearrange:

α1 v1 + α2 v2 + . . . + αk−1 vk−1 + (−1)vk + αk+1 vk+1 + . . . αn vn = 0.

At least one of the coefficients in this sum is non-zero (the coefficient
of vk is −1), so this set of vectors is linearly dependent.
Example. Is the set S = {x2 , x, 1} linearly dependent or independent? (We
will see that there is an easier way of checking the independence of
functions in section 3.)

Solution. We are looking for α, β and γ so that

αx2 + βx + γ = 0

and this must be true for any value of x. In particular it must be true
when x = 1, when x = 0 and when x = −1:

α+β+γ = 0
γ=0
α−β+γ = 0

This is a fairly easy to solve system of three equations in three un-


knowns which has only one solution α = β = γ = 0. The functions
are linearly independent.
Example. Is the set S = {x2 + x − 1, 2x2 + 3x + 1, x2 − 4} linearly
dependent or independent?
Solution. We can exploit the linear independence of x2 , x and 1 that
we have just shown to solve this problem. We start with

α( x2 + x − 1) + β(2x2 + 3x + 1) + γ( x2 − 4) = 0.

Rearrange and group commensurate powers of x:

x2 (α + 2β + γ) + x (α + 3β) + (−α + β − 4γ) = 0.

Since the different powers of x are linearly independent this can only sum
to zero if the coefficients of each power are individually equal to zero.
This gives three equations and three unknowns which we represent with an
augmented matrix (make sure you understand how we produced this matrix):

 
$$\left(\begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 1 & 3 & 0 & 0 \\ -1 & 1 & -4 & 0 \end{array}\right).$$

Again, Gauss reduction (do it) gives the equivalent matrix
$$\left(\begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

Since there is a free variable we immediately know there are an in-


finite number of solutions and so these functions form a linearly
dependent set. To make this more concrete, let γ = t since it is a free
variable. Then α = −3t and β = t. Letting t = 1 gives us

−3( x2 + x − 1) + (2x2 + 3x + 1) + ( x2 − 4) = −3x2 + 2x2 + x2 − 3x + 3x + 3 + 1 − 4 = 0.

Basis & Dimension

Definition. A subset B of V is called a basis for a vector space V if

1. B is linearly independent, and

2. ⟨B⟩ = V.
Example. Show that $\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$ is a basis for R2 .

We need to show that this set is linearly independent and generates all of
R2 . Consider
$$\alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$

By looking at the first component we see that α = 0, and by looking at the
second component β = 0, so the set is linearly independent.

Now we show that the set generates R2 . Choose an arbitrary vector in R2 ,
i.e. $v = \begin{pmatrix} x \\ y \end{pmatrix}$. We need to find two constants α and β so that
$$\alpha \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}.$$

In this case we set α = x and β = y. The set generates all of R2 and


so it is a basis.
This example was very straightforward because of the structure of
this basis. This basis is so straightforward to work with that it has a
special name. We call it the standard or canonical basis for R2 , and we
denote these basis vectors by e1 and e2 .
Example. The standard or canonical basis for R3 is $\left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}$ or {e1 , e2 , e3 }. Write down the standard basis for R4 .

Solution. The canonical basis for R4 is $\{e_1, e_2, e_3, e_4\} = \left\{ \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} \right\}$.
Note that the notation e1 is slightly ambiguous. This is the canon-
ical basis vector where the first entry is 1 and the remaining entries
are 0, but without context we don’t know how many other entries
there are. We have used it to denote a 2-, 3- and 4-dimensional vector
above. This ambiguity is only removed by reading through the sur-
rounding text and understanding which vector space we are working
in.
Example. Show that $\left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 3 \\ 5 \end{pmatrix} \right\}$ is also a basis for R2 .

Solution. First we show that this set generates R2 . Consider an arbitrary
vector $\begin{pmatrix} x \\ y \end{pmatrix}$. Then we want to find α and β so that
$$\alpha \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta \begin{pmatrix} 3 \\ 5 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}.$$

The corresponding augmented matrix is
$$\left(\begin{array}{cc|c} 1 & 3 & x \\ 2 & 5 & y \end{array}\right),$$
which can be Gauss reduced (do it) to
$$\left(\begin{array}{cc|c} 1 & 0 & 3y - 5x \\ 0 & 1 & 2x - y \end{array}\right).$$

From here we see that we have α = 3y − 5x and β = 2x − y, for example
$$\begin{pmatrix} -1 \\ 4 \end{pmatrix} = 17 \begin{pmatrix} 1 \\ 2 \end{pmatrix} - 6 \begin{pmatrix} 3 \\ 5 \end{pmatrix}.$$
To show that the set is linearly independent we can reuse the computation
we have just done. Setting x = y = 0 gives α = β = 0, which shows linear
independence. This set is a basis for R2 .
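Finding the coefficients α and β for a particular vector is just solving a
2 × 2 linear system, which is easy to mirror in code. A sketch assuming
numpy:

```python
import numpy as np

# Basis vectors as the columns of B.
B = np.array([[1.0, 3.0],
              [2.0, 5.0]])
v = np.array([-1.0, 4.0])

# Solve B @ (alpha, beta) = v for the coordinates of v in this basis.
alpha, beta = np.linalg.solve(B, v)
print(alpha, beta)   # 17.0 -6.0
```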
There are an infinite number of bases for R2 . Any set of two non-
collinear vectors is a basis. We may wonder whether there are bases
for R2 with a different number of elements, perhaps 1 or 3. There are
not.

Consider a possible one-element basis S = {v1 } for R2 . This set cannot
generate the entire vector space since ⟨S⟩ is simply {αv1 ; α ∈ R}, which is
either a straight line (if v1 is non-zero) or the origin. In either case
there are many vectors in R2 that are not generated by S and it fails to be
a basis. (Figure 1.5: A one-element set cannot generate R2 and fails to be
a basis.)

If we consider a three-element set S = {v1 , v2 , v3 } we may find that it
generates all of R2 , but it will not be linearly independent. If we want to
solve
αv1 + βv2 + γv3 = 0,
which we can write as
$$\left(\begin{array}{ccc|c} x_1 & x_2 & x_3 & 0 \\ y_1 & y_2 & y_3 & 0 \end{array}\right),$$
then no matter what the components of the vectors vi are, upon Gauss
reducing this matrix we will find at most two pivots, which means that one
of α, β or γ is free and there are an infinite number of solutions. S is
linearly dependent and fails to be a basis.

Fact 1.18. For any vector space V, all its bases have the same number of
elements.
Definition. The dimension of a vector space is the number of elements in
any of its bases. (This all gets significantly more complicated for
infinite-dimensional vector spaces. We do not discuss them in detail in
this course.)

Example. Let P2 [ x ] be the vector space of all polynomials in x of degree
two or less. What is the dimension of P2 [ x ]?

Solution. We know from an earlier example that the set {x2 , x, 1} is
linearly independent. It's easy to show that it generates P2 [ x ] and is
hence a basis. It contains three elements, so this space is
three-dimensional.
Example. What is the dimension of the subspace S of R3 described below?
$$S = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} ; \ x - 2y + 5z = 0 \right\}$$

Solution. We recast the expression that defines S from Cartesian to vector
form. Note that $\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 5 \\ 2 \end{pmatrix}$ are two non-collinear vectors that lie in the plane. Then
$$S = \left\{ \alpha \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + \beta \begin{pmatrix} 0 \\ 5 \\ 2 \end{pmatrix} ; \ \alpha, \beta \in \mathbb{R} \right\}.$$

A natural choice for the basis of S is the set
$$B = \left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 5 \\ 2 \end{pmatrix} \right\}.$$
You should show that B is linearly independent and that ⟨B⟩ = S. Do it. The
basis has two elements, so this plane is a two-dimensional subspace of R3 .
Fact 1.19. If V is an n-dimensional vector space and S is a subset of V that
contains k > n elements then S is linearly dependent. Equivalently, if T is
a linearly independent subset of an n-dimensional vector space V, then T
contains j ≤ n elements.
Explanation. Let V be an n-dimensional vector space.
Consider a basis for V

B0 = {b1 , b2 , b3 , . . . , bn }

and let T be a linearly independent subset of V



T = {t1 , t2 , t3 , . . . , t j }.

We will show that j ≤ n.


We start by introducing a new set

B̃1 = {t1 , b1 , b2 , . . . , bn } .

Since B0 was a basis for V we can express t1 as a linear combination of the
vectors in B0 :
$$t_1 = \sum_{i=1}^{n} \alpha_i b_i ,$$
and not all of the coefficients αi can be equal to zero. Why? Think.
Let α p be a non-zero coefficient. Then we can rearrange and write
$$b_p = \sum_{i \neq p} \left( -\frac{\alpha_i}{\alpha_p} \right) b_i + \frac{1}{\alpha_p} t_1 .$$

We drop b p from the set B̃1 to create a new set
$$B_1 = \{ t_1, b_1, b_2, \ldots, b_{p-1}, b_{p+1}, \ldots, b_n \}.$$

This set is a basis for V since it is linearly independent and still spans
V. Why? Think.
Now add t2 to form the set B̃2 , then drop one of the elements
in the set, bq , to create a new basis B2 . We repeat this procedure
until we have added all the elements in T to our new basis. Note
that we will never be forced to pull out one of the elements that we
previously added. Why? Think!
Since we can add all elements of T without running out of original
elements of B to remove it must be the case that j ≤ n.
2 Determinants

The determinant is a function which takes as its input an N by N


matrix (or N vectors each with N entries), and produces as its output
a single value. There are many ways that we could define such a
function, e.g. we could introduce a function that assigns to every
square matrix the value in its top-left corner, i.e.

top left( A) = a11 .

This is easy to calculate, but not particularly useful.


The determinant is a much more useful function. It’s defined in
the following way (but don’t worry too much about this definition).
Definition. The determinant is the unique function det : (R N , R N , . . . , R N ) → R
which is defined by the following three properties

1. The function det is alternating, i.e. switching two of the inputs


changes the sign of the output.

2. The function det is multilinear, i.e. det(v1 , v2 , . . . , αvk , . . . , vn ) = α det(v1 , . . . , vn )


and det(v1 , v2 , . . . , vk + w, . . . , vn ) = det(v1 , . . . , vn ) + det(v1 , . . . , vk−1 , w, vk+1 , . . . vn ).

3. The function det assigns to the set of canonical basis vectors a


value of one, i.e. det(e1 , e2 , . . . , en ) = 1.

This definition is difficult to work with, and it’s not at all clear
how the determinant of a particular matrix should be calculated.
We’re going to adopt a more pragmatic approach in the following
section; we jump right in to computing determinants. All you need
to know is that the slightly odd-looking computations have a sound
theoretical justification and agree with the definition above. We will
show that this is true a little later in the course.

Calculating Determinants

Instead of the formal definition presented above we are going to adopt a
more practical approach to determinants. We start with the calculation for
the 2 × 2 case. Consider the matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Its determinant is det(A) = ad − bc. This is the difference of the products of the diagonals. Note that det(A) is also written |A|.
$$\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

Example. What is $\det \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$?
Solution. 1 × 4 − 2 × 3 = −2.
Determinants of larger matrices 3 × 3, 4 × 4, n × n are calculated
recursively using the determinants of smaller matrices using the
Laplace or cofactor expansion. We work through a 3 × 3 example first
and then discuss the general case.

Example. Calculate the determinant of $A = \begin{pmatrix} 1 & 0 & 2 \\ 1 & -1 & 1 \\ 2 & 1 & 2 \end{pmatrix}$.
Solution. The first step in calculating this determinant is to select a
row or column of the matrix along which we will expand. The calcu-
lation is simplified if any of the entries in the chosen row or column
are 0, so we will expand along the first row. (We could also expand
along the second column to take advantage of the zero entry).

$$\begin{vmatrix} 1 & 0 & 2 \\ 1 & -1 & 1 \\ 2 & 1 & 2 \end{vmatrix}$$

For each entry in the chosen row or column we calculate the determinant of
a smaller 2 × 2 matrix formed by omitting the row and column corresponding
to that entry.

The determinant of the original 3 × 3 matrix is the sum of the determinants
of the 2 × 2 matrices weighted by the corresponding coefficient and the
alternating factor $(-1)^{i+j}$, where i and j are the row and column
indexes of the coefficients. In this case
$$\begin{aligned}
\det(A) &= +1 \begin{vmatrix} -1 & 1 \\ 1 & 2 \end{vmatrix} - 0 \begin{vmatrix} 1 & 1 \\ 2 & 2 \end{vmatrix} + 2 \begin{vmatrix} 1 & -1 \\ 2 & 1 \end{vmatrix} \\
&= 1(-1 \times 2 - 1 \times 1) - 0(1 \times 2 - 1 \times 2) + 2(1 \times 1 - (-1) \times 2) \\
&= 3.
\end{aligned}$$
Make sure that you get the same answer if you calculate the deter-
minant by expanding along the second column.
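The cofactor expansion translates directly into a short recursive function.
The sketch below is plain Python (not how determinants should be computed
in practice, as the next section explains, but it mirrors the expansion
along the first row):

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 0, 2],
     [1, -1, 1],
     [2, 1, 2]]
print(det(A))   # 3
```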
Definition. Given an n × n matrix A, the minor Mij corresponding to the
entry aij is the determinant of the (n − 1) × (n − 1) matrix formed by
omitting the i-th row and j-th column of A. The cofactor Cij corresponding
to aij is $(-1)^{i+j} M_{ij}$.

The determinant of A can be calculated by using the Laplace or cofactor
expansion along any row or column, i.e. for a fixed i
$$\det(A) = \sum_{j=1}^{n} a_{ij} (-1)^{i+j} M_{ij} = \sum_{j=1}^{n} a_{ij} C_{ij}$$
or for a fixed j
$$\det(A) = \sum_{i=1}^{n} a_{ij} (-1)^{i+j} M_{ij} = \sum_{i=1}^{n} a_{ij} C_{ij} .$$
(I realise these two formulae look complicated. Make sure you understand
how they relate to the examples above and below.)
 
Example. Let $A = \begin{pmatrix} 1 & 2 & -1 \\ 3 & 1 & 1 \\ -1 & 3 & 2 \end{pmatrix}$. Calculate |A|.

There are no zero entries in this matrix, so expanding along one row or
column will be as much work as any other row or column. We will expand
along the second row.
$$\begin{aligned}
|A| &= \sum_j a_{2j} (-1)^{2+j} M_{2j} \\
&= 3 \times (-1) \times \begin{vmatrix} 2 & -1 \\ 3 & 2 \end{vmatrix} + 1 \times 1 \times \begin{vmatrix} 1 & -1 \\ -1 & 2 \end{vmatrix} + 1 \times (-1) \times \begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix} \\
&= -21 + 1 - 5 \\
&= -25
\end{aligned}$$

You should check that you get the same answer if you expand along
another row or column.

Example. Let $B = \begin{pmatrix} 2 & 2 & 6 & 3 \\ 1 & 0 & 2 & -1 \\ 3 & 0 & 1 & 1 \\ -1 & 0 & 3 & 2 \end{pmatrix}$. Calculate |B|.
Calculating the determinant of an arbitrary 4 × 4 matrix using
this method is tedious. Fortunately there is only one non-zero entry
in the second column of B, so expanding along this column greatly
reduces the amount of work.
We might also notice that the minor associated with the only non-
zero entry in the second column is just the matrix A from the previ-
ous example. Together these two imply that | B| = 50. Make sure you
can fill in the details yourself.

Calculating Determinants by Gauss Reduction

Calculating determinants recursively is computationally expensive (Computer
Science Majors – it's O(n!)). We might not notice how hard it is because we
are only ever asked to calculate determinants of small (i.e. 3 × 3)
matrices. We should try to do better. Here are some facts that help. (If
you ever want to convince yourself that it really is tediously long,
calculate the determinant of a 5 × 5 matrix with randomly chosen, non-zero
integer entries between negative nine and nine.)
Fact 2.1. If all the entries in an entire row or column of a square matrix are
zero, then the determinant of the matrix is zero.
Explanation. Calculate the determinant by expanding along the row
or column of zeros.
Definition. A matrix A is called diagonal if aij = 0 for i ≠ j. The entries
on the main diagonal may be non-zero (or zero) but the entries off the main
diagonal are zero.
$$\begin{pmatrix} a_{11} & 0 & 0 & \cdots & 0 \\ 0 & a_{22} & 0 & \cdots & 0 \\ 0 & 0 & a_{33} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{pmatrix} \quad \text{(diagonal matrix)}$$

Definition. A matrix A is called upper triangular (or lower triangular) if
aij = 0 for j < i (or i < j).
$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{pmatrix} \quad \text{(upper triangular matrix)}$$

Fact 2.2. The determinant of a diagonal, upper triangular, or lower
triangular matrix is equal to the product of the entries on its main
diagonal. (This implies that the determinant of the identity matrix is 1.)

Explanation. Repeatedly expand along the first column (diagonal, upper
triangular) or row (lower triangular).
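A quick numerical illustration of Fact 2.2 – a sketch assuming numpy, with
an arbitrary upper triangular matrix chosen for the example:

```python
import numpy as np

U = np.array([[2.0, 5.0, -1.0],
              [0.0, 3.0,  4.0],
              [0.0, 0.0,  7.0]])

# For a triangular matrix the determinant is the product of the diagonal entries.
print(np.prod(np.diag(U)))     # 42.0
print(np.linalg.det(U))        # ~42.0 (up to floating point error)
```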
As it turns out we can use Fact 2.2 to easily calculate the determi-
nant of an arbitrary square matrix. We will Gauss reduce the square
matrix into row echelon (i.e. upper triangular) form, keeping track of
which Gauss reduction operations we performed. Each of the Gauss
reduction operations changes the determinant in a simple way.
Example. Consider the four determinants below.
$$\begin{vmatrix} 3 & 6 \\ 2 & 1 \end{vmatrix} = -9 \qquad \begin{vmatrix} 1 & 2 \\ 2 & 1 \end{vmatrix} = -3 \qquad \begin{vmatrix} 2 & 1 \\ 3 & 6 \end{vmatrix} = 9 \qquad \begin{vmatrix} 3 & 6 \\ 5 & 7 \end{vmatrix} = -9$$

From these calculations it seems that

1. scaling one row of a matrix by a constant scales the determinant of


that matrix by the same constant,

2. exchanging two rows of a matrix changes the sign of the determi-


nant,

3. adding one row of a matrix to another row doesn’t change the


determinant at all.

These observations hold for all square matrices, and we show why
below.

Fact 2.3. If the square matrix B is obtained from A by scaling one row of A
by the constant k then
det( B) = k det( A).

Explanation. Assume that it was the p-th row of the matrix A that
was scaled by k to produce B. We will calculate det( A) and det( B) by

expanding along row p.
$$\det(A) = \sum_j a_{pj} (-1)^{p+j} M_{pj}$$
$$\begin{aligned}
\det(B) &= \sum_j b_{pj} (-1)^{p+j} N_{pj} \\
&= k \sum_j a_{pj} (-1)^{p+j} M_{pj} \\
&= k \det(A)
\end{aligned}$$
(The minors M pj and N pj are the same since the matrices A and B differ
only in row p, which is omitted to calculate these minors.)

Fact 2.4. If the matrix B is obtained from A by swapping two rows of A


then
det( B) = − det( A).

Explanation. Tutorial problem.

Fact 2.5. If two rows of a square matrix are identical then the determinant
of that matrix is zero.
Explanation. This is a corollary of the previous fact. Let A be a
matrix with two identical rows and B be the matrix produced by
swapping those identical rows. Then

det( B) = − det( A)

since we have swapped two rows, but

det( B) = det( A)

since the two matrices are identical. This implies that det( A) = − det( A)
which is only possible if the determinant is zero.

Fact 2.6. If one row of a square matrix is a scalar multiple of another row
then the determinant of that matrix is zero.
Explanation. Do it yourself – it uses the same idea as the previous
problem but there is one additional step with Fact 2.3.

Fact 2.7. If A, B and C are square matrices, identical except for their p-th
row where
c pj = a pj + b pj

then
det(C ) = det( A) + det( B).

Explanation. Expand along row p to find det(C).
$$\begin{aligned}
\det(C) &= \sum_j c_{pj} (-1)^{p+j} M_{pj} \\
&= \sum_j a_{pj} (-1)^{p+j} M_{pj} + \sum_j b_{pj} (-1)^{p+j} M_{pj} \\
&= \det(A) + \det(B)
\end{aligned}$$
(The minors M pj are the same for all three matrices since they are
identical apart from row p.)

Fact 2.8. If the matrix B is obtained from A by adding k × rowp to rowq


then det( B) = det( A).
Explanation. This follows from Fact 2.6 and 2.7.

Example. Evaluate $\begin{vmatrix} 2 & 4 & 6 \\ 2 & 6 & 5 \\ 1 & 2 & 7 \end{vmatrix}$.

Since this is only a 3 × 3 matrix we could solve it by expanding. Instead
we will solve it by performing Gauss reduction, since this approach is much
faster for larger matrices.
$$\begin{vmatrix} 2 & 4 & 6 \\ 2 & 6 & 5 \\ 1 & 2 & 7 \end{vmatrix} = 2 \begin{vmatrix} 1 & 2 & 3 \\ 2 & 6 & 5 \\ 1 & 2 & 7 \end{vmatrix} \qquad \text{(scale row one)}$$
$$= 2 \begin{vmatrix} 1 & 2 & 3 \\ 0 & 2 & -1 \\ 0 & 0 & 4 \end{vmatrix} \qquad \text{(zero the entries under the pivot)}$$
$$= 16$$
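The bookkeeping in this method is easy to automate. The sketch below (plain
Python, an illustration rather than production code) reduces a matrix to
upper triangular form and adjusts the determinant for each operation
exactly as Facts 2.3, 2.4 and 2.8 describe:

```python
def det_by_elimination(M):
    """Determinant via Gauss reduction to upper triangular form."""
    A = [row[:] for row in M]      # work on a copy
    n = len(A)
    det = 1.0
    for i in range(n):
        # Find a non-zero pivot in column i, swapping rows if needed.
        pivot_row = next((r for r in range(i, n) if A[r][i] != 0), None)
        if pivot_row is None:
            return 0.0             # no pivot available: determinant is zero
        if pivot_row != i:
            A[i], A[pivot_row] = A[pivot_row], A[i]
            det = -det             # a row swap changes the sign (Fact 2.4)
        det *= A[i][i]             # dividing a row by its pivot is compensated here (Fact 2.3)
        A[i] = [x / A[i][i] for x in A[i]]
        for r in range(i + 1, n):  # adding multiples of rows changes nothing (Fact 2.8)
            factor = A[r][i]
            A[r] = [a - factor * b for a, b in zip(A[r], A[i])]
    return det

print(det_by_elimination([[2, 4, 6], [2, 6, 5], [1, 2, 7]]))   # 16.0
```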



Example. Show that $\begin{vmatrix} 1 & -1 & -2 & -3 \\ -5 & 1 & 2 & 4 \\ 1 & -1 & 6 & 1 \\ 1 & -2 & 3 & 4 \end{vmatrix} = -200.$
Do this yourselves. You can either expand directly, or you can do
some Gauss reduction first.
I strongly suggest that if you have any doubts about how useful
Gauss reduction is for these types of problem you try to expand it
directly.
If you do choose to do it by Gauss reducing, you don’t need to
reduce it all the way to row echelon form (although you can). Even a
single step will make it much easier to evaluate the determinant.
Definition. The transpose of a matrix A (written A T ) is created by
interchanging rows and columns.
Example. If $A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$ then $A^T = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$.

Fact 2.9. The determinant of a matrix is equal to the determinant of its
transpose, det(A) = det(A^T).

Explanation. This is easy to see for 2 × 2 matrices:
$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc = \begin{vmatrix} a & c \\ b & d \end{vmatrix}.$$

For 3 × 3 matrices, expanding along the first row of a matrix and the first
column of its transpose produce the same answer. We use the result for
2 × 2 matrices to rearrange the intermediate calculation.
$$\begin{aligned}
\begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}
&= a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \\
&= a \begin{vmatrix} e & h \\ f & i \end{vmatrix} - b \begin{vmatrix} d & g \\ f & i \end{vmatrix} + c \begin{vmatrix} d & g \\ e & h \end{vmatrix} \\
&= \begin{vmatrix} a & d & g \\ b & e & h \\ c & f & i \end{vmatrix}
\end{aligned}$$

Similarly the 4 × 4 case can be proven using the result for the 3 × 3 case,
then the 5 × 5, 6 × 6 and so on.

Fact 2.10. Since det( A) = det( A T ) all the previous facts about rows are
also true for columns.

Fact 2.11. If A and B are both n × n then

det( AB) = det( A) det( B).

Explanation. A full explanation is too long, but you should be able


to convince yourself that this is true for two diagonal or upper/lower
triangular matrices

Fact 2.12. If A is invertible, then det( A) ≠ 0.


Explanation.

det( A) det( A−1 ) = det( AA−1 )


= det( I )
=1

Adjoints and Cramer’s Rule

Consider a square matrix
$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix}.$$

Every element aij in A has a cofactor $c_{ij} = (-1)^{i+j} M_{ij}$, from
which we can define a new matrix of cofactors
$$C = \begin{pmatrix} c_{11} & c_{12} & c_{13} & \cdots & c_{1n} \\ c_{21} & c_{22} & c_{23} & \cdots & c_{2n} \\ c_{31} & c_{32} & c_{33} & \cdots & c_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & c_{n3} & \cdots & c_{nn} \end{pmatrix}.$$

Definition. The adjoint of A is the transpose of its matrix of cofac-


tors
adj( A) = C T .

Fact 2.13. Aadj( A) = det( A) I.


Explanation. The matrix det(A) I is diagonal. We will show that this result
is true in two parts. First we show that the entries on the diagonal are
all det(A), then we show that all the entries off the diagonal are zero.

Diagonal entries: Consider the entry in the first row and first column of
the product A adj(A). This entry is calculated by summing the products of
the entries in the first row of A with the entries in the first column of
adj(A), i.e.
$$a_{11} c_{11} + a_{12} c_{12} + \ldots + a_{1n} c_{1n} = \sum_j a_{1j} (-1)^{1+j} M_{1j} = \det(A).$$
Similarly the i-th diagonal entry is $\sum_j a_{ij} (-1)^{i+j} M_{ij} = \det(A)$.
Off-diagonal entries: Consider the entry in the first row and second
column of the product,

a11 c21 + a12 c22 + . . . + a1n c2n .

Unlike the diagonal entries, this expression does not look immediately
useful. What we can do is introduce a new matrix
$$\tilde{A} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix}.$$
Here we have replaced the second row of the matrix A with a copy of
the first row. We know that the determinant of this matrix is zero, but
we also know that we can find an expression for this determinant by
expanding along the second row:
$$\begin{aligned}
0 = \det(\tilde{A}) &= \sum_j \tilde{a}_{2j} (-1)^{2+j} \tilde{M}_{2j} \\
&= \sum_j a_{1j} (-1)^{2+j} M_{2j} \\
&= \sum_j a_{1j} c_{2j} \\
&= a_{11} c_{21} + a_{12} c_{22} + \ldots + a_{1n} c_{2n} .
\end{aligned}$$


In the calculation above we used M̃2j = M2j . This is true because the
minors are calculated by ignoring the second row, which is the only
row where the entries of A and à differ. To show that the entry in
the i-th row and j-th column is zero (i 6= j) we do a similar thing – all
that must change is that in the matrix à row j must be replaced by a
copy of row i.
Fact 2.14. If A is invertible then $A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)$.
Explanation. This is easily shown by rearranging the equation in
Fact 2.13. Although this result is useful theoretically, it is rubbish for
computation. If you need to find the inverse of a matrix, use Gauss
reduction.
Fact 2.15 (Cramer's Rule). If A is invertible and Ax = b then the solution
x = (x1 , x2 , . . . , xn )^T can be found by using
$$x_j = \frac{\det(A_j)}{\det(A)}$$
where A j is the matrix A with its j-th column replaced by b.
Explanation. Tutorial problem.
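Cramer's rule is easy to state in code, and comparing it with a direct
solver is a good way to convince yourself it works. A sketch assuming numpy
(like the adjoint formula, this is an illustration, not an efficient way to
solve systems):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b using Cramer's rule (assumes det(A) != 0)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        A_j = A.copy()
        A_j[:, j] = b           # replace the j-th column of A by b
        x[j] = np.linalg.det(A_j) / det_A
    return x

A = np.array([[1.0, 3.0],
              [2.0, 5.0]])
b = np.array([-1.0, 4.0])

print(cramer_solve(A, b))       # [17. -6.]
print(np.linalg.solve(A, b))    # same answer
```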

An Important and Useful Fact About Determinants

The following theorem ties together all of the work that we have
done in this section.

Important Fact 2.16. Let A be an n × n matrix. The following state-


ments are equivalent.

1. det( A) ≠ 0.

2. The matrix A is invertible.

3. The equation Ax = 0 has a unique solution x = 0.

4. The columns of A form a linearly independent set of n vectors.

4.b The same is true for the rows of A. (Since the transpose doesn’t change
the value of the determinant).

Explanation. The quickest way to show that this is true is to show


that 1 ⇒ 2, 2 ⇒ 3, 3 ⇒ 4 and 4 ⇒ 1.
1 ⇒ 2: The matrix A is invertible if we can find another matrix B so that
AB = I. In this case we write B = A−1 . We know that
$$A \operatorname{adj}(A) = \det(A) I.$$
Since det(A) ≠ 0 we can divide:
$$A \left( \frac{1}{\det(A)} \operatorname{adj}(A) \right) = I.$$
The inverse of A is $\frac{1}{\det(A)} \operatorname{adj}(A)$.
2 ⇒ 3: Assume that A−1 exists.
$$\begin{aligned}
Ax &= 0 \\
A^{-1} A x &= A^{-1} 0 \\
I x &= 0 \\
x &= 0
\end{aligned}$$

3 ⇒ 4: The product Ax is a linear combination of the columns of A


(Fact 1.1). If the only solution to Ax = 0 is x = 0 then the only linear
combination of the columns of A that sums to the zero is the trivial
one, which is exactly the definition of linear independence.
4 ⇒ 1: Since the columns of A are linearly independent the augmented
matrix (A|0) is reduced to (I|0) by Gauss reduction. (Why? Linear
independence implies exactly one solution, so every row must have a
pivot.) The determinant of the identity matrix is 1. Although the Gauss
reduction operations change the value of the determinant, it is only by
scaling with another non-zero constant. The original matrix must also
have had a non-zero determinant.
This is enough to complete the proof. You should try to do some
of the other links for practice.
3
Differential Equations

Given some information about the derivatives of an unknown


function, can we recover the function? Answering this question is
both useful and hard, and depends on exactly what information is
given.
Before we start solving some differential equations (DEs) it’s pru-
dent to introduce some terminology.
Definition. The order of a DE is the highest derivative that appears
in the DE, e.g.
    \frac{d^2 y}{dx^2} + γ \frac{dy}{dx} + ω^2 y = 0
is a second-order differential equation.
Definition. DEs may be ordinary or partial differential equations
(ODEs or PDEs). If the unknown function is a function of one vari-
able, e.g. f ( x ) the DE is an ODE, and if it is of two or more variables,
e.g. g( x, y, t) the DE is a PDE.
Definition. DEs may be linear or non-linear. Linearity refers to the
appearance of the unknown function and its derivatives in the DE,
e.g.
    \frac{dy}{dx} + y = 0

is linear but

    \left( \frac{dy}{dx} \right)^2 + y^3 = 0
is not. We discuss linear DEs in more detail later.

Separable Differential Equations

This section is revision of material from MAM1021 so you should


find it fairly straightforward.

Definition. We say that an ODE of the form

    \frac{dy}{dx} = f(x) g(y)

is separable. Separable DEs can be easily solved by dividing by g(y)
and then integrating with respect to x.

    \frac{dy}{dx} = f(x) g(y)
    \frac{1}{g(y)} \frac{dy}{dx} = f(x)
    \int \frac{1}{g(y)} \frac{dy}{dx}\, dx = \int f(x)\, dx
    \int \frac{1}{g(y)}\, dy = \int f(x)\, dx

Now each side can be integrated and an expression for y( x ) can be


recovered.
Example. Solve \frac{dy}{dx} = x\sqrt{y} with y(0) = 0.

Solution.

    \frac{dy}{dx} = x\sqrt{y}
    \frac{1}{\sqrt{y}} \frac{dy}{dx} = x        (Only possible to divide through if y ≠ 0.)
    \int \frac{1}{\sqrt{y}} \frac{dy}{dx}\, dx = \int x\, dx
    2\sqrt{y} = \frac{1}{2} x^2 + C

Since y(0) = 0 we see that the constant C = 0. Thus 2\sqrt{y} = \frac{1}{2}x^2
which can be rearranged to find that y = \frac{x^4}{16}. Note that this is not the
only solution! The trivial solution y = 0 which we excluded earlier
also solves the DE. This non-uniqueness of solutions is typical of
non-linear DEs.
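If you want to check an answer like this one, sympy's dsolve handles most separable equations. A rough sketch (our own aside, not part of the notes; the exact output form may differ):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

# dy/dx = x*sqrt(y); dsolve returns the general (non-trivial) solution.
sol = sp.dsolve(sp.Eq(y(x).diff(x), x * sp.sqrt(y(x))), y(x))
print(sol)                      # expect something like y(x) = (C1 + x**2/4)**2

# The condition y(0) = 0 forces the constant to zero, giving y = x**4/16;
# the trivial solution y = 0 is a second, singular solution that dsolve does not report.
check = x**4 / 16
print(sp.simplify(check.diff(x) - x * sp.sqrt(check)))   # 0
```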
Example. Find a family of curves so that each member of the family
is orthogonal to each member of the family of hyperbolas x2 − y2 =
C.
Solution. If x^2 − y^2 = C then 2x − 2y\frac{dy}{dx} = 0 which means \frac{dy}{dx} = \frac{x}{y}.
We recall from school that the product of the slopes of two orthogonal
lines is negative one, so the derivative of the new family of curves is
\frac{dy}{dx} = −\frac{y}{x}.

    \frac{dy}{dx} = −\frac{y}{x}
    \frac{1}{y} \frac{dy}{dx} = −\frac{1}{x}
    \int \frac{1}{y}\, dy = −\int \frac{1}{x}\, dx
    \ln|y| = −\ln|x| + C
    e^{\ln|y|} = A e^{−\ln|x|}        (Since e^C = A with A > 0.)
    y = \frac{A}{|x|}                 (Relaxing the condition to A ≠ 0 when we change |y| to y.)

In addition there are two singular solutions y = 0 and x = 0. This


describes a second family of hyperbolae orthogonal to the first.
The idea behind separable DEs can be extended to solve certain
PDEs using the method of separation of variables.

Exact Differential Equations

Solving exact differential equations requires partial derivatives. If


you have already been through MAM2083 you should be familiar
with them already.
Definition. The partial derivative of a function f(x_1, x_2, \ldots, x_n) of
more than one variable is written \frac{\partial f}{\partial x_i} or f_{x_i} and can be calculated by
holding all other variables constant and differentiating with respect
to x_i.

Example. Let f(x, y) = x^2 y + y^3 − x.

    \frac{\partial f}{\partial x} = 2xy − 1
    \frac{\partial f}{\partial y} = x^2 + 3y^2

Definition. We call a differential equation

    M(x, y) + N(x, y) \frac{dy}{dx} = 0

an exact differential equation if

    \frac{\partial M}{\partial y} = \frac{\partial N}{\partial x}.

To solve an exact DE we look for an underlying potential function
F(x, y) where        (How do we know F exists? Read your MAM2083 notes.)

    \frac{\partial F}{\partial x} = M  and  \frac{\partial F}{\partial y} = N.
∂x ∂y

Example. Solve 3x(xy − 2) + (x^3 + 2y)\frac{dy}{dx} = 0.

Solution. First we check whether or not this is an exact DE. Since
M(x, y) = 3x(xy − 2) and N(x, y) = x^3 + 2y we have M_y = 3x^2 = N_x
and so it is exact.

Next we look for a function F(x, y) so that F_x = M. Do this by
integrating M with respect to x.

    F(x, y) = \int M(x, y)\, \partial x
            = \int \left( 3x^2 y − 6x \right) \partial x
            = x^3 y − 3x^2 + g(y)        (Instead of a constant of integration we have a function of y.)

There is no reason for us to prefer one variable over the other, so we
should also integrate N with respect to y.

    F(x, y) = \int N(x, y)\, \partial y
            = \int \left( x^3 + 2y \right) \partial y
            = x^3 y + y^2 + h(x)

Finally we must reconcile these two different expressions for
F(x, y). Since we have

    F(x, y) = x^3 y − 3x^2 + g(y)
    F(x, y) = x^3 y + y^2 + h(x)

we conclude that F(x, y) = x^3 y − 3x^2 + y^2 + C.

The solution to our exact DE is the implicitly defined curve x^3 y − 3x^2 + y^2 + C = 0,
where the constant C may be determined if a condition like y(a) = b
is specified. Geometrically, the solution is a level curve generated by
intersecting a plane with the surface z = F(x, y).
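A quick symbolic check of the exactness condition and of the potential function found above, using sympy (our own sketch, not examinable):

```python
import sympy as sp

x, y = sp.symbols('x y')
M = 3*x*(x*y - 2)
N = x**3 + 2*y

# Exactness test: dM/dy should equal dN/dx.
print(sp.simplify(sp.diff(M, y) - sp.diff(N, x)))   # 0, so the DE is exact

# The potential F found above; its partial derivatives should recover M and N.
F = x**3*y - 3*x**2 + y**2
print(sp.simplify(sp.diff(F, x) - M), sp.simplify(sp.diff(F, y) - N))  # 0 0
```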
   
Example. Check that

    \left( \frac{1}{1 + x^2} + \arctan(y) \right) + \left( \frac{x}{1 + y^2} + y^3 − 1 \right) \frac{dy}{dx} = 0

is exact. Solve it.

Solution. Since M(x, y) = \frac{1}{1 + x^2} + \arctan(y) and N(x, y) = \frac{x}{1 + y^2} + y^3 − 1
the partial derivatives M_y = \frac{1}{1 + y^2} = N_x are equal and the DE is exact.
We look for F(x, y) starting with M and then again with N.

    F(x, y) = \int M(x, y)\, \partial x
            = \int \left( \frac{1}{1 + x^2} + \arctan(y) \right) \partial x
            = \arctan(x) + x \arctan(y) + g(y)

Now integrate N with respect to y.

    F(x, y) = \int N(x, y)\, \partial y
            = \int \left( \frac{x}{1 + y^2} + y^3 − 1 \right) \partial y
            = x \arctan(y) + \frac{1}{4} y^4 − y + h(x)

Reconciling the two gives us F(x, y) = x \arctan(y) + \arctan(x) + \frac{1}{4} y^4 − y + C.
Our solution is x \arctan(y) + \arctan(x) + \frac{1}{4} y^4 − y + C = 0.
Any differential equation of the form M(x, y) + N(x, y)\frac{dy}{dx} = 0
can be made exact by multiplying the entire equation by a suitable
integrating factor. (Finding a suitable integrating factor is often difficult.)

Example. Solve ty + (t^2 + t e^y)\frac{dy}{dt} = 0.

Solution. Since M(t, y) = ty and N(t, y) = t^2 + t e^y we have
M_y = t ≠ 2t + e^y = N_t. This differential equation is not exact. If
we multiply through by \frac{1}{t} we recover a new differential equation

    y + (t + e^y) \frac{dy}{dt} = 0

which is exact and can be solved in the same manner as the previous
problems (details are left to the reader).

All separable equations are also exact since

    \frac{dy}{dx} = f(x) g(y)
    ⇒ −f(x) + \frac{1}{g(y)} \frac{dy}{dx} = 0

and the partial derivatives M_y = 0 = N_x are equal.

Linear Differential Equations

Definition. A linear differential equation of order n has the form

    \underbrace{a_n(x) \frac{d^n y}{dx^n} + a_{n−1}(x) \frac{d^{n−1} y}{dx^{n−1}} + \ldots + a_1(x) \frac{dy}{dx} + a_0(x) y}_{T(y)} = f(x)

where a_n(x) ≠ 0. If f(x) = 0 we call this a homogeneous differential
equation. We will see that many of the results we have shown for
systems of linear equations hold true for linear differential equations.

Fact 3.1. The set of all solutions to a linear, homogeneous differential
equation form a subspace of the vector space of all functions R → R.

Explanation. A linear, homogeneous differential equation can be


written
T (y) = 0.

Given y1 and y2 that satisfy this differential equation and a constant α


we want to show that

1. αy1 is a solution (i.e. T (αy1 ) = 0),

2. and y1 + y2 is a solution (i.e. T (y1 + y2 ) = 0).

Both of these are easy to show using the linearity of the derivative.

    T(αy_1) = a_n(x) \frac{d^n}{dx^n}(αy_1) + a_{n−1}(x) \frac{d^{n−1}}{dx^{n−1}}(αy_1) + \ldots + a_1(x) \frac{d}{dx}(αy_1) + a_0(x) αy_1
            = α a_n(x) \frac{d^n}{dx^n}(y_1) + α a_{n−1}(x) \frac{d^{n−1}}{dx^{n−1}}(y_1) + \ldots + α a_1(x) \frac{d}{dx}(y_1) + α a_0(x) y_1
            = α \left( a_n(x) \frac{d^n}{dx^n}(y_1) + a_{n−1}(x) \frac{d^{n−1}}{dx^{n−1}}(y_1) + \ldots + a_1(x) \frac{d}{dx}(y_1) + a_0(x) y_1 \right)
            = α T(y_1)
            = 0

The proof that T (y1 + y2 ) = 0 is left for the reader.


More generally, we note that the action of T(·) is linear. It's often
called a linear operator rather than a linear transformation since its inputs
are functions.
The subspace of solutions associated with a linear, homogeneous
differential equation will have a dimension equal to the order of the
differential equation. We will discuss this in detail later (once we
have seen the Wronskian).

Fact 3.2. The set of all solutions to a non-homogeneous differential equation


does not form a subspace.
Explanation. It is easy to see that y1 = sin( x ) is a solution of y0 =
cos( x ). It is just as easy to see that 2y1 = 2 sin( x ) is not.

Fact 3.3. The general solution to a non-homogeneous linear differential


equation is the sum of the solutions to the associated homogeneous differen-
tial equation and a particular solution to the non-homogeneous differential
equation.
y g = y p + y0 .

(This result is analogous to the ones we showed earlier for systems of linear
equations).
Explanation. Consider a linear non-homogeneous differential equa-
tion T (y) = f ( x ) with two solutions y1 and y2 . The difference be-

tween these two solutions solves the associated homogeneous equa-


tion since

T ( y1 − y2 ) = T ( y1 ) − T ( y2 )
= 0.

Conversely, given a solution to the associated homogeneous equa-


tion y0 the sum y1 + y0 is still a solution to the non-homogeneous
differential equation

T ( y1 + y0 ) = T ( y1 ) + T ( y0 )
= f ( x ).

Together this shows that any solution to the non-homogeneous differ-


ential equation can be written as the sum of a solution to the associ-
ated homogeneous differential equation and a particular solution to
the non-homogeneous equation

y g = y p + y0 .

Example. Check that y p = e x satisfies y00 + y = 2e x and hence write


down the general solution to this equation.
Solution. y p = e x so y00p = e x . Substitution shows that this satisfies
our equation. The solution to y00 + y = 0 is y0 = A cos( x ) + B sin( x )
so the general solution to our non-homogeneous equation is

y = e x + A cos( x ) + B sin( x ).
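The same check can be done symbolically. A small sympy sketch (an aside, not examinable):

```python
import sympy as sp

x, A, B = sp.symbols('x A B')

yp = sp.exp(x)
print(sp.simplify(yp.diff(x, 2) + yp - 2*sp.exp(x)))   # 0: y_p satisfies y'' + y = 2e^x

y = sp.exp(x) + A*sp.cos(x) + B*sp.sin(x)
print(sp.simplify(y.diff(x, 2) + y - 2*sp.exp(x)))     # 0 for every A and B
```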

First-Order Linear Differential Equations


A first-order linear differential equation

    a_1(x) \frac{dy}{dx} + a_0(x) y = f(x)

can be rewritten as

    \frac{dy}{dx} + P(x) y = Q(x).

This can be solved by multiplying through by an appropriate integrating
factor. For these problems the appropriate integrating factor is

    e^{\int^x P(s)\, ds}.

After multiplication we have

    \underbrace{e^{\int^x P(s)\, ds} \frac{dy}{dx} + P(x) e^{\int^x P(s)\, ds} y}_{\text{product rule}} = Q(x) e^{\int^x P(s)\, ds}

where we should notice that the terms on the left hand side of the
equation can be rewritten as a single derivative

    \frac{d}{dx}\left[ e^{\int^x P(s)\, ds}\, y \right] = Q(x) e^{\int^x P(s)\, ds}.
The solution y can be easily determined by integrating both sides
with respect to x and rearranging.
Example. Solve y′ − \frac{4}{x} y = x^5 e^x subject to y(1) = 0.

Solution. First calculate the integrating factor:

    \int^x −\frac{4}{s}\, ds = −4 \ln|x| = \ln\left( \frac{1}{x^4} \right)
    ⇒ e^{\int^x −\frac{4}{s}\, ds} = e^{\ln\left( \frac{1}{x^4} \right)} = \frac{1}{x^4}

Now multiply by the integrating factor and solve.

    \frac{1}{x^4} y′ − \frac{4}{x^5} y = x e^x
    \frac{d}{dx}\left[ \frac{y}{x^4} \right] = x e^x
    \frac{y}{x^4} = \int x e^x\, dx        (Integrate by parts.)
    \frac{y}{x^4} = x e^x − e^x + C
    ⇒ y = \underbrace{x^5 e^x − x^4 e^x}_{y_p} + \underbrace{C x^4}_{y_0}

We can see that the solution y can be split into two pieces: the solution
to the associated homogeneous equation and a particular solution.
Finally we use the specified condition y(1) = 0 to determine
the value of the constant of integration. In this case C = 0.
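Again, the answer is easy to verify by machine. A sympy sketch (our own aside; dsolve should cope since this is a first-order linear problem):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) - 4/x*y(x), x**5*sp.exp(x))
sol = sp.dsolve(ode, y(x), ics={y(1): 0})
# Compare with the hand-derived answer y = x^5 e^x - x^4 e^x.
print(sp.simplify(sol.rhs - (x**5*sp.exp(x) - x**4*sp.exp(x))))   # 0
```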
Example. A tank with a capacity of 500 litres holds 300 litres of pure
water. Brine with a concentration of 0.05 kg of salt per litre flows in
at a rate of 3 litres per minute. The mixture is thoroughly stirred and
drains at the same rate.
Draw up a differential equation that describes the rate of change
of salt in the tank. Use this differential equation to find an expression
for S(t), the total amount of salt in the tank at time t.
Solution. Since we started with pure water the initial condition is
S(0) = 0. The rate of change of salt in the tank can be written as

dS
= Salt In − Salt Out.
dt

The rate of salt coming in and out can be calculated by taking the
product of the flow rate with the concentration. Note that all the
terms will have units of kilograms per minute.
 
    \frac{dS}{dt} = (3\,\text{l/min})(0.05\,\text{kg/l}) − (3\,\text{l/min}) \frac{S(t)}{300\,\text{l}}.

After some simplification and rearrangement of terms we recover the
following linear first-order differential equation,

    \frac{dS}{dt} + \frac{1}{100} S(t) = 0.15.

The integrating factor associated with this differential equation is
e^{t/100}. Multiply through by the integrating factor, recognise the left
hand side as the result of applying the product rule, and integrate.

    \int \frac{d}{dt}\left[ e^{t/100} S(t) \right] dt = \frac{15}{100} \int e^{t/100}\, dt.

After integration we can rearrange to find an expression for S(t),
S(t) = 15 + C e^{−t/100}. The initial condition can now be used to
determine C. In this case C = −15. The behaviour of this function is
plotted in figure 3.1.

[Figure 3.1: Salt in the mixing tank as a function of time; S(t) rises towards 15 kg.]
Example. As above, but with an outflow rate of 2 litres per minute.

Solution. Note that the difference in the outflow and inflow rates
mean the total volume of brine in the mixing tank will change, in fact
V(t) = 300 + t.
Since the total capacity of the tank is only 500 litres the solution we
now derive will only be valid for 0 ≤ t ≤ 200. The change to our
differential equation is evident in the term describing the outflow of
salt.

    \frac{dS}{dt} = (3\,\text{l/min})(0.05\,\text{kg/l}) − (2\,\text{l/min}) \frac{S(t)}{(300 + t)\,\text{l}}.

This can again be solved by the use of an integrating factor. After
some work we find that

    S(t) = 15 + 0.05t − 15 \frac{300^2}{(300 + t)^2}.

[Figure 3.2: Mismatched inflow and outflow rates mean the solution is only valid for 0 ≤ t ≤ 200.]
(300 + t)2

Linear Independence and the Wronskian

It will be important for us to be able to decide on the linear independence
of a set of functions. Given a set F = { f_1(x), f_2(x), . . . , f_n(x) },
to decide on linear (in)dependence we must solve

    α_1 f_1(x) + α_2 f_2(x) + . . . + α_n f_n(x) = 0.        (3.1)

If the only solution is α1 = α2 = . . . = αn = 0 then the set is linearly


independent.
We will always speak about linear independence on a particular
interval I. The equation 3.1 must be satisfied for all values of x ∈ I.
Example. Is the set F = { x^2 + x, 4x, 2 } linearly independent on R?


Solution. We should check what coefficients satisfy

α( x2 + x ) + β(4x ) + γ(2) = 0.

These coefficients must work for all values of x. In particular they


should work when x = 0, which immediately forces γ = 0. It should
also work when x = −1, which makes β = 0 and now consideration
of any other value of x gives α = 0. Since all the coefficients are zero
the set is linearly independent.
Example. Is the set G = { cos^2(x), sin^2(x), cos(2x) } linearly independent
on [0, 2π]?
Solution. Recall that cos(2x ) = cos2 ( x ) − sin2 ( x ). Substitution in
equation 3.1 gives

(α + γ) cos2 ( x ) + ( β − γ) sin2 ( x ) = 0.

This has an infinite number of non-trivial solutions, e.g. α = −2 and


β = γ = 2.
This is all very ad hoc. We have used different approaches to dif-
ferent problems, for different types of functions, and we’d like to
develop a more systematic way of checking for independence.
Consider a set F = { f_1(x), f_2(x) }. Is this linearly independent on I?

    α_1 f_1(x) + α_2 f_2(x) = 0
    ⇒ α_1 f_1′(x) + α_2 f_2′(x) = 0        (By taking the derivative on both sides.)

This can be rewritten as

    \begin{pmatrix} f_1(x) & f_2(x) \\ f_1′(x) & f_2′(x) \end{pmatrix} \begin{pmatrix} α_1 \\ α_2 \end{pmatrix} = 0

or more compactly as
Mα = 0.
If the matrix M is invertible then α = 0, i.e. the only solution to 3.1 is
α1 = α2 = 0 so the set is linearly independent. Consequently, if we
can find some value x0 ∈ I so that the determinant of M is non-zero
at x0 then F is linearly independent.
Definition. The Wronskian of a set F = { f 1 ( x ), f 2 ( x ), . . . , f n ( x )}, de-
noted W ( x ), may be defined on an interval I where all the functions
are at least n − 1 times differentiable and

    W(x) = \begin{vmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1′(x) & f_2′(x) & \cdots & f_n′(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n−1)}(x) & f_2^{(n−1)}(x) & \cdots & f_n^{(n−1)}(x) \end{vmatrix}

Fact 3.4. If there is an x_0 ∈ I so that W(x_0) ≠ 0 then F is linearly
independent.
Explanation. The idea is the same as the 2 × 2 case above. We can
extend it to larger sets by induction.
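For concrete functions the Wronskian is easy to compute by machine. A small sympy sketch (our own illustration, not examinable) for the set {x² + x, 4x, 2} from earlier:

```python
import sympy as sp

x = sp.symbols('x')
fs = [x**2 + x, 4*x, 2]

# Wronskian: determinant of the matrix whose rows are successive derivatives.
W = sp.Matrix([[sp.diff(f, x, k) for f in fs] for k in range(len(fs))]).det()
print(sp.simplify(W))   # -16, non-zero, so the set is linearly independent
```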
Although a non-zero Wronskian implies linear independence,
linear independence does not imply the Wronskian is non-zero at any
point. Similarly, a Wronskian that is zero everywhere does not imply
linear dependence. The example below should make this clear.
Example. Consider the set F = { x^2, x|x| }.

We can show that this set is linearly independent on R by solving

    α x^2 + β x|x| = 0.

If x > 0 then α = −β and if x < 0 then α = β. The only solution is
α = β = 0. The set is linearly independent.

If we recall that \frac{d}{dx}(x|x|) = 2|x| we can see that the Wronskian

    W(x) = \begin{vmatrix} x^2 & x|x| \\ 2x & 2|x| \end{vmatrix} = 2x^2|x| − 2x^2|x| = 0.

This example makes it clear that in general the implication in


Fact 3.4 runs in only one direction. It can be shown that if the set F
is a set of solutions to a linear, homogeneous differential equation
then implication runs in both directions. Using the Wronskian the
following important result can also be shown.

Fact 3.5. Let T(y) = 0 be a linear homogeneous differential equation of


order n then

1. the dimension of the solution space is n,

2. any linearly independent set F = {y1 , y2 , . . . , yn } of n solutions to


T (yi ) = 0 is a basis and any solution can be rewritten as

y = α1 y1 + α2 y2 + . . . + α n y n .

Explanation. Check Wikipedia if you are interested.


Example. Show that { 1, e^{x^2/2} } is a basis for the solution space of

    x y′′ − (x^2 + 1) y′ = 0

and hence write down the general solution to this equation.

Solution. First we check that the two functions y_1 = 1 and y_2 = e^{x^2/2}
are both solutions to this differential equation. The first function
is clearly a solution since its first and second derivatives are zero. For
the second function

    y′ = x e^{x^2/2}  and  y′′ = e^{x^2/2} + x^2 e^{x^2/2}.

Substituting into the differential equation gives

    x (e^{x^2/2} + x^2 e^{x^2/2}) − (x^2 + 1) x e^{x^2/2} = 0

and so this is a solution. Next we check that these are two independent
solutions.

    W(x) = \begin{vmatrix} 1 & e^{x^2/2} \\ 0 & x e^{x^2/2} \end{vmatrix} = x e^{x^2/2}.

There are many values of x that make the Wronskian non-zero, e.g.
W(1) ≠ 0, so this is an independent set. The general solution can be
written as

    y = α + β e^{x^2/2}.

Linear, Homogeneous Differential Equations with Constant Coefficients

A linear, homogeneous differential equation has the form

    a_n(x) \frac{d^n y}{dx^n} + a_{n−1}(x) \frac{d^{n−1} y}{dx^{n−1}} + \ldots + a_1(x) \frac{dy}{dx} + a_0(x) y = 0.
If the functions ak ( x ) are constants then we say this is a differential
equation with constant coefficients. These types of differential equa-
tions can be easily solved and the solutions all involve exponentials.
Definition. Every linear, constant coefficient differential equation
has an associated auxiliary polynomial. There are two ways to calculate
this polynomial.
1. Substitute y = eλx into the equation.

Example. Let y00 − 3y0 + 2y = 0. Then substitution gives λ2 eλx − 3λeλx + 2eλx = 0
and after dividing through by the exponential we recover λ2 − 3λ + 2 = 0.
The auxiliary polynomial p(λ) = λ2 − 3λ + 2.

2. Rewrite the differential equation using D for the derivative, i.e.


an D n y + an−1 D n−1 y + . . . + a0 y = 0. Consider this as y being acted
on by a linear operator P( D ), i.e.
 
    \underbrace{\left( a_n D^n + a_{n−1} D^{n−1} + \ldots + a_0 \right)}_{P(D)} y = 0.

The auxiliary polynomial is p(λ).

Example. The equation y00 − 3y0 + 2y = 0 can be rewritten as ( D2 −


3D + 2)y = 0 so the auxiliary polynomial is p(λ) = λ2 − 3λ + 2.
Some remarks. Any polynomial can be factored into a product
of (possibly repeated) linear and irreducible quadratic factors. The
multiplicity of a root is the number of times the corresponding factor
is repeated.
Example. The polynomial p(λ) = (λ − 4)(λ + 3)3 (λ − 1)6 has three
roots. The multiplicity of the root λ = 4 is 1. The multiplicity of
λ = −3 is 3 and of λ = 1 is 6.
Our approach to solving these types of differential equations will
be to factorise the auxiliary polynomial and sum the contributions
from each of the linear and quadratic factors. The contributions from
these factors is given by Facts 3.6 and 3.7 below.
Fact 3.6. The linear, homogeneous differential equation with constant
coefficients
( D − β)k y = 0
has a corresponding auxiliary polynomial p(λ) = (λ − β)k , k independent
solutions

y1 = e βx
y2 = xe βx
..
.
yk = x k−1 e βx

and a general solution

y = α1 e βx + α2 xe βx + . . . + αk x k−1 e βx .

Explanation. We do this by induction. It’s clear that y = e βx satisfies


y0 − βy = 0. Now assume that y∗ = x n−1 e βx satisfies ( D − β)n y = 0.
Then
1. this same function also satisfies ( D − β)n+1 y = 0 since

( D − β)n+1 y∗ = ( D − β)( D − β)n y∗ = ( D − β)0 = 0.

2. the function xy∗ satisfies ( D − β)n+1 y since

( D − β)n+1 xy∗ = ( D − β)n ( D − β) xy∗


= ( D − β)n (y∗ + βxy∗ − βxy∗ ) Using the product rule

= ( D − β)n y∗
= 0.

Fact 3.7. The linear, homogeneous differential equation with constant


coefficients
    \left( (D − β)^2 + ω^2 \right)^k y = 0

has an auxiliary polynomial p(λ) = \left( (λ − β)^2 + ω^2 \right)^k, whose quadratic factor is irreducible
over the reals (but has a complex conjugate pair of roots β ± iω), 2k independent
solutions

y1 = e βx cos(ωx ) y2 = e βx sin(ωx )
y3 = xe βx cos(ωx ) y4 = xe βx sin(ωx )
.. ..
. .
y2k−1 = x k−1 e βx cos(ωx ) y2k = x k−1 e βx sin(ωx )

and a general solution

y = α1 e βx cos(ωx ) + α2 e βx sin(ωx ) + . . . + α2k−1 x k−1 e βx cos(ωx ) + α2k x k−1 e βx sin(ωx ).

Explanation. The explanation for why the repeated roots lead to ad-
ditional powers of x in the solution is the same as the previous fact’s.
It might be useful to review why the k = 1 case gives trigonometric
functions in our answer.
We start with the differential equation ( D − β)2 y + ω 2 y = 0. All
these constants are real and the function y is real-valued. If we as-
sume that y = eλx we find an auxiliary polynomial p(λ) = (λ − β)2 + ω 2
which is irreducible over the reals.
To solve this problem we factorise over the complex numbers. We
are allowed to introduce complex numbers for intermediate steps
of our calculation as long as we make sure that in the final solution
all the imaginary parts have disappeared and we have a real-valued
function y. The complex conjugate pair of roots are λ = β ± iω. We
write down our general solution

y = α1 e βx+iωx + α2 e βx−iωx .

This is clearly no good - it seems obvious that the solution y is no


longer real-valued. We can fix this by choosing α_2 = \overline{α_1}. If we write
α1 = a + ib this gives
 
y = e βx a(eiωx + e−iωx ) + ib(eiωx − e−iωx ) .

We can simplify this using Euler’s formula:

y = e βx (2a cos(ωx ) − 2b sin(ωx ))

and relabel the constants to recover

y = α1 e βx cos(ωx ) + α2 e βx sin(ωx ).

Fact 3.8. The general solution to any linear, homogeneous differential


equation with constant coefficients can be found by
1. determining its auxiliary polynomial and factorising it completely into
linear and irreducible quadratic factors,

2. solving the differential equation associated with each set of repeated


linear or quadratic factors,

3. taking a linear combination of the solutions to those differential equa-


tions.

Explanation. This follows from the linearity of the differential equa-


tion and of the combination of solutions.
Example. Solve y00 − 3y0 + 2y = 0.
Solution. The corresponding polynomial is p(λ) = λ2 − 3λ + 2 = (λ − 2)(λ − 1)
so the solution is
y = αe x + βe2x .

Example. Solve y′′ + y′ + y = 0.

Solution. Here p(λ) = λ^2 + λ + 1 which is irreducible. The complex
conjugate pair of roots is λ = −\frac{1}{2} ± i\frac{\sqrt{3}}{2} so the solution is

    y = α e^{−x/2} \cos\left( \frac{\sqrt{3}}{2} x \right) + β e^{−x/2} \sin\left( \frac{\sqrt{3}}{2} x \right).

Example. Solve \frac{d^5 y}{dx^5} + 32y = 0.

Solution. The associated polynomial is λ^5 + 32 = 0. If we let
cos(π/5) = C and sin(π/5) = S then the roots are λ_1 = −2,
λ_{2,3} = 2C ± 2iS and λ_{4,5} = −2(C^2 − S^2) ± 4iSC and the solution is
(Make sure you know how to find these!)

    y = α_1 e^{−2x} + α_2 e^{2Cx} \cos(2Sx) + α_3 e^{2Cx} \sin(2Sx) + α_4 e^{−2(C^2 − S^2)x} \cos(4SCx) + α_5 e^{−2(C^2 − S^2)x} \sin(4SCx).
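The roots of an auxiliary polynomial can always be checked numerically. A NumPy sketch (our own aside, not examinable) for the last example:

```python
import numpy as np

# Roots of the auxiliary polynomial p(lambda) = lambda^5 + 32.
roots = np.roots([1, 0, 0, 0, 0, 32])
print(np.sort_complex(roots))

C, S = np.cos(np.pi/5), np.sin(np.pi/5)
expected = [-2,
            2*C + 2j*S, 2*C - 2j*S,
            -2*(C**2 - S**2) + 4j*S*C, -2*(C**2 - S**2) - 4j*S*C]
print(np.sort_complex(np.array(expected)))   # the same five numbers
```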

Variation of Parameters

Variation of parameters is a method used for solving inhomogeneous


linear differential equations. In principle we can use it to solve n-th
order differential equations, but the calculations become very messy
so we will restrict our attention to second-order problems

    a_2(x) \frac{d^2 y}{dx^2} + a_1(x) \frac{dy}{dx} + a_0(x) y = f(x).
From the theory we have developed earlier we know that the general
solution to this equation can be written as

y = y p + αy1 + βy2 .

Here y_p is a particular solution to the inhomogeneous equation and


αy1 + βy2 is the solution to the associated homogeneous equation.
Variation of parameters allows us to determine y p from y1 and y2 .

All the details


The first step in the variation of parameters is to make the assump-
tion that the particular solution to the inhomogeneous equation has
the form
y p = v1 ( x ) y1 + v2 ( x ) y2 .
This assumption was first made in the 1700s by Euler and Lagrange.
It's motivated by perturbation theory – put simply we are
assuming that if the homogeneous equation changes slightly (by
the introduction of the function f (t)) then the solution changes
slightly. Instead of a linear combination αy1 + βy2 we have a solu-
tion v1 ( x )y1 + v2 ( x )y2 . What remains is to determine the unknown
functions v1 ( x ) and v2 ( x ).
Since we have two unknown functions that must be determined
we will need two equations. The first equation is the obvious one.
The proposed solution v1 ( x )y1 + v2 ( x )y2 must satisfy the differential
equation. The second equation we use is

v10 ( x )y1 + v20 ( x )y2 = 0. (3.2)

This second equation is chosen because it makes a number of the


necessary algebraic manipulations significantly simpler. Specifically,
the derivative y0p (suppressing arguments of v1 and v2 for brevity) is
now
y0p = v1 y10 + v2 y20
and the second derivative

y00p = v10 y10 + v1 y100 + v20 y20 + v2 y200 .

Substitution into the differential equation (still suppressing arguments)
gives

    a_2 (v_1′ y_1′ + v_1 y_1′′ + v_2′ y_2′ + v_2 y_2′′) + a_1 (v_1 y_1′ + v_2 y_2′) + a_0 (v_1 y_1 + v_2 y_2) = f.

Multiplying out and regrouping the terms

    \underbrace{v_1 (a_2 y_1′′ + a_1 y_1′ + a_0 y_1)}_{\text{zero}} + \underbrace{v_2 (a_2 y_2′′ + a_1 y_2′ + a_0 y_2)}_{\text{zero}} + a_2 v_1′ y_1′ + a_2 v_2′ y_2′ = f.

Two sets of terms are zero since y_1 and y_2 are solutions to the homogeneous
differential equation, so this simplifies to

    v_1′ y_1′ + v_2′ y_2′ = \frac{f}{a_2}.        (3.3)

The two equations 3.2 and 3.3 can be written concisely as

    \begin{pmatrix} y_1 & y_2 \\ y_1′ & y_2′ \end{pmatrix} \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{f}{a_2} \end{pmatrix}.

The short version


Unless explicitly asked to perform variation of parameters from
scratch we
1. solve the associated homogeneous differential equation for two
independent solutions y1 and y2 ,

2. write down the matrix equation


    \begin{pmatrix} y_1 & y_2 \\ y_1′ & y_2′ \end{pmatrix} \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = \begin{pmatrix} 0 \\ \frac{f}{a_2} \end{pmatrix}.

3. find an explicit expression for v10 and v20


    v_1′ = −\frac{1}{W} y_2 \frac{f}{a_2}        (W is the Wronskian.)
    v_2′ = \frac{1}{W} y_1 \frac{f}{a_2}

4. integrate v10 and v20 ,

5. write down the solution

y = v1 y1 + v2 y2 + αy1 + βy2 .

Example. Solve y′′ − y = e^x subject to y(0) = 1 and y′(0) = 0.

Solution. The associated homogeneous equation has two independent
solutions y_1 = e^x and y_2 = e^{−x}. We must therefore solve the
equation

    \begin{pmatrix} e^x & e^{−x} \\ e^x & −e^{−x} \end{pmatrix} \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = \begin{pmatrix} 0 \\ e^x \end{pmatrix}.

After inverting and multiplying through we have

    \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = −\frac{1}{2} \begin{pmatrix} −e^{−x} & −e^{−x} \\ −e^x & e^x \end{pmatrix} \begin{pmatrix} 0 \\ e^x \end{pmatrix}

which gives v_1′ = \frac{1}{2} and v_2′ = −\frac{1}{2} e^{2x}. Integration is easy: v_1 = \frac{1}{2}x + C
and v_2 = −\frac{1}{4} e^{2x} + D. (We will see in a moment that these constants
of integration are unnecessary. In future we will leave them out.) This
means that the solution to our inhomogeneous differential equation is

    y = \left( \frac{1}{2}x + C \right) e^x + \left( −\frac{1}{4} e^{2x} + D \right) e^{−x} + α e^x + β e^{−x}
      = \frac{1}{2} x e^x + α e^x + β e^{−x}

(Some of these terms, including the ones that arise from the non-zero
constants of integration, can be absorbed into the associated homogeneous
solution.)

What remains is to determine the value of the constants α and β. The
two conditions specified are

    y(0) = 1 ⇒ α + β = 1
    y′(0) = 0 ⇒ α − β = −\frac{1}{2}

solving these gives α = \frac{1}{4} and β = \frac{3}{4}.
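A quick symbolic check of this answer (an aside, not examinable):

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Rational(1, 2)*x*sp.exp(x) + sp.Rational(1, 4)*sp.exp(x) + sp.Rational(3, 4)*sp.exp(-x)

print(sp.simplify(y.diff(x, 2) - y - sp.exp(x)))   # 0: satisfies y'' - y = e^x
print(y.subs(x, 0), y.diff(x).subs(x, 0))          # 1 and 0: satisfies both conditions
```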
Example. Solve the differential equation ( D2 + 1)y = tan( x ).
Solution. The associated homogeneous equation has two indepen-
dent solutions y1 = cos( x ) and y2 = sin( x ). We must therefore solve
the equation
    \begin{pmatrix} \cos(x) & \sin(x) \\ −\sin(x) & \cos(x) \end{pmatrix} \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = \begin{pmatrix} 0 \\ \tan(x) \end{pmatrix}.

After inverting and multiplying through we have

    \begin{pmatrix} v_1′ \\ v_2′ \end{pmatrix} = \begin{pmatrix} \cos(x) & −\sin(x) \\ \sin(x) & \cos(x) \end{pmatrix} \begin{pmatrix} 0 \\ \tan(x) \end{pmatrix}

which gives v_1′ = −\sin(x)\tan(x) and v_2′ = \cos(x)\tan(x) = \sin(x). Integration
of v_2′ is easy and we recover one of the functions v_2 = −\cos(x).
Integrating v_1′ is a little harder:

    v_1 = \int −\sin(x)\tan(x)\, dx
        = \int \frac{−\sin^2(x)}{\cos(x)}\, dx
        = \int \frac{\cos^2(x) − 1}{\cos(x)}\, dx
        = \int \cos(x) − \sec(x)\, dx
        = \sin(x) − \ln|\sec(x) + \tan(x)|        (Constants of integration are unnecessary.)

This means that the solution to our inhomogeneous differential equation
is (many of the terms cancel)

    y = −\cos(x) \ln|\sec(x) + \tan(x)| + α \cos(x) + β \sin(x).


4
Diagonalisation

Vector spaces have more than one basis, and some of these bases
are more useful than others for thinking about particular problems.
For example, consider the standard first-year physics/statics problem
of a box resting on an inclined plane. Although we are used to de-
scribing R2 with the canonical basis {(1, 0) T , (0, 1) T } the first step we
might take in analysing the box on the plane is to describe the forces
acting on the box in terms of a new basis whose elements are a unit
vector parallel to the plane and perpendicular to it. It is this idea –
choosing an appropriate basis to describe a problem in a simple way
– that motivates this section on diagonalisation.

A First Example

To illustrate how useful diagonalisation is we will solve a system of


linear differential equations. You won’t be able to follow all the steps
– the margins include the questions you might be asking yourself.
Consider

    x_1′ = 2x_1 − x_2
    x_2′ = 5x_1 − 4x_2

This can be written more compactly as

    x′ = Ax        (4.1)

where the vector x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} and the matrix A = \begin{pmatrix} 2 & −1 \\ 5 & −4 \end{pmatrix}. Let
x = Py where P = \begin{pmatrix} 1 & 1 \\ 1 & 5 \end{pmatrix}. (Why? Where did this matrix P come from?)
If we substitute into the equation 4.1 it becomes

    Py′ = APy.

Multiply on both sides by P^{−1} to get (How do we know P is invertible?)

    y′ = P^{−1}APy.

The product P^{−1}AP is a diagonal matrix D = \begin{pmatrix} 1 & 0 \\ 0 & −3 \end{pmatrix}. (It's really
diagonal? Yes – check it yourself. Why is it diagonal?) This is
equivalent to the system of differential equations

    y_1′ = 1y_1 + 0y_2
    y_2′ = 0y_1 − 3y_2

Each of these equations contains only one variable. We have decoupled
them and can solve them easily: y_1 = αe^t and y_2 = βe^{−3t}. Since
x = Py we can use these solutions to recover the solutions to the
original system

    x = \begin{pmatrix} 1 & 1 \\ 1 & 5 \end{pmatrix} \begin{pmatrix} αe^t \\ βe^{−3t} \end{pmatrix} = \begin{pmatrix} αe^t + βe^{−3t} \\ αe^t + 5βe^{−3t} \end{pmatrix} = α \begin{pmatrix} 1 \\ 1 \end{pmatrix} e^t + β \begin{pmatrix} 1 \\ 5 \end{pmatrix} e^{−3t}.

This example illustrates the basic motivation for why diagonalisation


is useful, but doesn’t explain how to diagonalise a given matrix – to
be specific, what is the matrix P? We answer this question below.

Eigenvalues and Eigenvectors

Definition. Let A be an n × n matrix. We call λ an eigenvalue of A


if we can find a non-zero vector v so that

Av = λv.

We call v the eigenvector of A corresponding to λ.


We will see later (Fact 4.6) that a matrix is diagonalisable if it has
n linearly independent eigenvectors. The matrix P that diagonalises
A is a matrix whose columns are the n independent eigenvectors and
the resulting diagonal matrix D has the corresponding eigenvalues as
its diagonal entries. For now we are concerned with the calculation of
eigenvalues and eigenvectors.
The eigenvalues of a matrix are the roots of its characteristic poly-
nomial, equivalently, they are the values that satisfy its characteristic
equation.

Av = λv
Av − λIv = 0
( A − λI )v = 0
det( A − λI ) = 0 Since we want to find a non-zero
solution.
The characteristic polynomial is p(λ) = det( A − λI ) and the charac-
teristic equation is p(λ) = 0.

Example. Find the eigenvalues of A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.

Solution. The characteristic equation is

    \begin{vmatrix} 1−λ & 2 \\ 2 & 1−λ \end{vmatrix} = 0
    (1 − λ)^2 − 4 = 0
    λ^2 − 2λ − 3 = 0
    (λ − 3)(λ + 1) = 0.
The eigenvalues of A are λ1 = −1 and λ2 = 3.
The eigenvectors can be calculated by solving the equation ( A − λI )v = 0
for each different eigenvalue.
Example. Now find the eigenvectors of A.
Solution. The first eigenvalue we found is λ1 = −1. Substitution
into the equation gives us
    \begin{pmatrix} 1 − (−1) & 2 \\ 2 & 1 − (−1) \end{pmatrix} v = 0.

We write this as an augmented matrix and Gauss reduce:

    \left( \begin{array}{cc|c} 2 & 2 & 0 \\ 2 & 2 & 0 \end{array} \right) ∼ \left( \begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 0 & 0 \end{array} \right)

the solutions are v = α \begin{pmatrix} 1 \\ −1 \end{pmatrix} where α ∈ R, α ≠ 0.

The second eigenvalue is λ_2 = 3. Substitution gives us the augmented
matrix (and its Gauss reduced form)

    \left( \begin{array}{cc|c} −2 & 2 & 0 \\ 2 & −2 & 0 \end{array} \right) ∼ \left( \begin{array}{cc|c} 1 & −1 & 0 \\ 0 & 0 & 0 \end{array} \right)

which has solutions v = α \begin{pmatrix} 1 \\ 1 \end{pmatrix} where α ∈ R, α ≠ 0.
Note that for both eigenvalues when we substitute in and Gauss
reduce one of the rows of the augmented matrix is reduced to a row
of zeros. This will always happen – if it does not you have made a
mistake, either in the Gauss reduction or in your calculation of the
eigenvalues.
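Numerically, numpy.linalg.eig returns eigenvalues and eigenvectors together. A sketch for checking hand calculations (our own aside; the returned eigenvectors are scaled to unit length and may appear in a different order):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
vals, vecs = np.linalg.eig(A)
print(vals)        # e.g. [ 3. -1.]
print(vecs)        # columns are unit eigenvectors, proportional to (1,1) and (1,-1)

# Check the defining property Av = lambda*v, column by column.
for lam, v in zip(vals, vecs.T):
    print(np.allclose(A @ v, lam * v))   # True, True
```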
In this example the eigenvalues were distinct. It is possible that
a matrix may have repeated eigenvalues, as we see in the following
example.
 
Example. Find the eigenvalues and eigenvectors of B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 2 \\ 1 & 2 & 3 \end{pmatrix}.

Solution. First we find the eigenvalues.



    \begin{vmatrix} 1−λ & 0 & 0 \\ 1 & 3−λ & 2 \\ 1 & 2 & 3−λ \end{vmatrix} = 0
    (1 − λ) \begin{vmatrix} 3−λ & 2 \\ 2 & 3−λ \end{vmatrix} = 0
    (1 − λ)\left( (3 − λ)^2 − 4 \right) = 0
    (1 − λ)(λ^2 − 6λ + 5) = 0
    (λ − 1)^2 (λ − 5) = 0.

The eigenvalues are λ_1 = 1 and λ_2 = 5. Since λ_1 is a repeated
root of the characteristic polynomial we say that it has an algebraic
multiplicity of 2. We calculate eigenvectors next. You should check
for yourself that the eigenvectors corresponding to λ_2 = 5 are
v = α \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} with α ∈ R, α ≠ 0. The calculation for λ_1 = 1 is more
interesting:

    \left( \begin{array}{ccc|c} 1−1 & 0 & 0 & 0 \\ 1 & 3−1 & 2 & 0 \\ 1 & 2 & 3−1 & 0 \end{array} \right) ∼ \left( \begin{array}{ccc|c} 1 & 2 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right)

and the solution is v = α \begin{pmatrix} −2 \\ 1 \\ 0 \end{pmatrix} + β \begin{pmatrix} −2 \\ 0 \\ 1 \end{pmatrix} with α, β ∈ R, α ≠ 0 or β ≠ 0.
Here the eigenvalue was repeated twice, two rows of the
augmented matrix were reduced to zero and two linearly independent
eigenvectors can be found.
Definition. We call

Eλ = {v ∈ Rn ; Av = λv}

the eigenspace corresponding to the eigenvalue λ. Note that the


eigenspace contains the zero vector even though it is not an eigen-
vector. Eigenspaces are subspaces (see below) and their dimension is
the geometric multiplicity of λ. The geometric multiplicity of an eigen-
value is always less than or equal to the algebraic multiplicity of that
eigenvalue.

Fact 4.1. The eigenspaces of a matrix are subspaces.


Explanation. Let u and v be vectors in an eigenspace. The sum of

these vectors is still in the eigenspace since

A(u + v) = Au + Av
= λu + λv
= λ (u + v)

so the eigenspace is closed under vector addition. Similarly

A(αu) = αAu
= αλu
= λ(αu)

and so the eigenspace is also closed under scalar multiplication.


Example. The matrix A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} had two eigenvalues λ = 3 and
λ = −1. Its eigenspaces are

    E_3 = \left\{ α \begin{pmatrix} 1 \\ 1 \end{pmatrix} ; α ∈ R \right\},
    E_{−1} = \left\{ β \begin{pmatrix} 1 \\ −1 \end{pmatrix} ; β ∈ R \right\}.

Example. The matrix B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 2 \\ 1 & 2 & 3 \end{pmatrix} had two eigenvalues λ = 1
and λ = 5. Its eigenspaces are

    E_1 = \left\{ α \begin{pmatrix} −2 \\ 0 \\ 1 \end{pmatrix} + β \begin{pmatrix} −2 \\ 1 \\ 0 \end{pmatrix} ; α, β ∈ R \right\},
    E_5 = \left\{ γ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} ; γ ∈ R \right\}

Geometric Interpretation
A 2 × 2 matrix is a linear transformation of the plane, e.g. a projection,
reflection, rotation and so on.

For certain linear transformations – ones that can be represented
by a symmetric matrix (i.e. a matrix equal to its own transpose) –
there is an easy geometric interpretation of the matrix's eigenvalues
and eigenvectors. The image of a unit circle under this transformation
is an ellipse whose axes are aligned with the eigenvectors, and
whose lengths are given by the absolute value of the eigenvalues.

[Figure 4.1: The image of a unit circle under A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} is an ellipse.]

(In higher dimensions this generalises as you might expect to unit
spheres and ellipsoids.)

Example. Figure 4.1 shows the transformation of a unit circle by the
matrix A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.

The red unit circle is stretched into the blue ellipse. The solid
green arrows represent the eigenvectors of A that lie on the unit
circle, and the dashed arrows are their image under the matrix A.

Example. The projection matrix P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} projects onto the
x-axis. It is relatively easy to show that its eigenvalues are λ_1 = 0
and λ_2 = 1.

The eigenspace corresponding to λ_1 = 0 is the y-axis. All the vectors
in this eigenspace are squashed onto the origin. The eigenspace
corresponding to λ_2 = 1 is the x-axis. Since they are already on the
x-axis the projection does nothing to them.

Example. The reflection matrix R = \begin{pmatrix} −1 & 0 \\ 0 & 1 \end{pmatrix} reflects about the
y-axis. Its eigenvalues are λ_1 = −1 and λ_2 = 1.

The corresponding eigenspaces are the x- and y-axes which are
reflected (λ_1 = −1) or left unchanged (λ_2 = 1).
Example. Let A be a 3 × 3 matrix that represents orthogonal projec-
tion onto the plane x + 2y + 3z = 0.
The eigenspaces of A are
   

 1 

E0 = α  2  ; α ∈ R
 
 
 3 

since vectors normal to the plane through the origin are squashed flat
onto the origin and
     

 −3 −2 

E1 = β  0  + γ  1  ; β, γ ∈ R
   
 
 1 0 

since vectors in the plane are unchanged by the projection.

More Facts about Eigenvalues and Eigenvectors

For certain types of matrices it is very easy to find the eigenvalues.


 
Example. Find the eigenvalues of A = \begin{pmatrix} 5 & 6 & 7 \\ 0 & 8 & 9 \\ 0 & 0 & 10 \end{pmatrix}.

Solution. The shape of this matrix lends itself to a very simple cal-
culation. We simply expand along the first column to evaluate the
determinant:

    \begin{vmatrix} 5−λ & 6 & 7 \\ 0 & 8−λ & 9 \\ 0 & 0 & 10−λ \end{vmatrix} = (5 − λ) \begin{vmatrix} 8−λ & 9 \\ 0 & 10−λ \end{vmatrix} = (5 − λ)(8 − λ)(10 − λ).

The eigenvalues of this matrix are therefore λ1 = 5, λ2 = 8 and


λ3 = 10.

Fact 4.2. If A is

1. upper triangular,

2. lower triangular, or

3. a diagonal matrix.

then the eigenvalues of A are just the entries on the diagonal.


Explanation. The idea is the same as the example above – simply
expand repeatedly along the first column.
The eigenvalues of an arbitrary matrix (i.e. not just upper/lower
triangular and diagonal matrices) also satisfy the following two equa-
tions.

Fact 4.3. Let A be an n × n matrix with eigenvalues λ1 , λ2 , . . . , λn (possi-


bly repeated). Then

1. the sum of the eigenvalues is equal to the trace of the matrix


    trace(A) = \sum_{i=1}^{n} a_{ii} = \sum_{i=1}^{n} λ_i,

2. and the product of the eigenvalues is equal to the determinant of the


matrix
det( A) = λ1 λ2 · · · λn .

Explanation. Tutorial problem.


 
Example. Consider the matrix B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 3 & 2 \\ 1 & 2 & 3 \end{pmatrix}. We already know
that the eigenvalues of this matrix are λ_1 = 5 and λ_2 = λ_3 = 1.

The trace of this matrix is the sum of its diagonal elements, i.e.
1 + 3 + 3 = 7 which is the same as the sum of the eigenvalues
1 + 1 + 5 = 7.

The determinant of this matrix is 5 and the product of the eigenvalues
is 1 × 1 × 5 = 5. (Check the determinant yourself.)

Example. If the matrix A has a trace of 5 and a determinant of 6, find


its eigenvalues.
Solution. We look for two values λ1 and λ2 so that λ1 λ2 = 6 and
λ1 + λ2 = 5. This is easy to solve – we recover λ1 = 2 and λ2 = 3.
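Fact 4.3 also makes a handy numerical sanity check. A short NumPy sketch (our own aside, not examinable) using the matrix B from above:

```python
import numpy as np

B = np.array([[1.0, 0.0, 0.0],
              [1.0, 3.0, 2.0],
              [1.0, 2.0, 3.0]])
vals = np.linalg.eigvals(B)

print(np.isclose(np.trace(B), vals.sum()))        # True: trace = sum of eigenvalues
print(np.isclose(np.linalg.det(B), vals.prod()))  # True: det = product of eigenvalues
```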
Fact 4.4. If v1 , v2 , . . . , vk are eigenvectors of A that correspond to k dis-
tinct eigenvalues of A then the set

{v1 , v2 , . . . , vk }
is linearly independent.
Explanation. Consider the case k = 2, i.e. we have only two eigen-
vectors v1 and v2 belonging to the distinct eigenvalues λ1 and λ2 . We
are interested in solutions to the equation

αv1 + βv2 = 0 Apply A to both sides.

A(αv1 + βv2 ) = A0
αλ1 v1 + βλ2 v2 = 0

Multiply the first of these equations by λ1 and subtract the last.

αλ1 v1 + βλ1 v2 − αλ1 v1 − βλ2 v2 = λ1 0 − 0


β(λ1 − λ2 )v2 = 0

Since the eigenvalues are distinct and eigenvectors are by definition


non-zero this is only possible if β = 0. In turn this forces α = 0 and
so the set is linearly independent.
The case k = 3 can be tackled in a similar manner. We have

αv1 + βv2 + γv3 = 0

and
αλ1 v1 + βλ2 v2 + γλ3 v3 = 0.
After manipulation we have

α(λ3 − λ1 )v1 + β(λ3 − λ2 )v2 = 0.

Now we relabel α(λ3 − λ1 ) as α̃ and β(λ3 − λ2 ) as β̃ and apply the


reasoning of the k = 2 case to

α̃v1 + β̃v2 = 0.

Note that α̃ = 0 ⇒ α = 0.
For k = 4 we can reduce it to the k = 3 case and so on.
Fact 4.5. If an n × n matrix has n distinct eigenvalues then there is a basis
for Rn consisting of eigenvectors of A.
Explanation. This is a corollary of the previous fact. Note that if
the eigenvalues are not distinct we may still be able to find a basis of
eigenvectors but it is not guaranteed.

Diagonalisation

Definition. An n × n matrix is diagonalisable if and only if it has n


linearly independent eigenvectors.

Fact 4.6. For any square matrix A, any matrix P (whose columns are
eigenvectors of A) and associated D (a diagonal matrix with corresponding
eigenvalues as entries) satisfy

AP = PD. (4.2)

If A is diagonalisable then P can be chosen to be invertible and

P−1 AP = D.

Explanation. Let P be a matrix of eigenvectors, i.e.

P = (v1 |v2 | · · · |vn )

and D a diagonal matrix of the corresponding eigenvalues


 
    D = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}.

The k-th column of the product AP on the left hand side of equation
4.2 is simply Avk . The k-th column of the product PD on the right
is λk vk . If A is diagonalisable the eigenvectors v1 , v2 , . . . vn can be If you can’t see this immediately start
chosen to form a linearly independent set – this makes P invertible calculating the products and you
should notice it.
(Fact 2.16).
Given a matrix A, the process of finding an appropriate matrix P
so that P−1 AP = D is called diagonalising A.
Example. Let A = \begin{pmatrix} 1 & 2 \\ −1 & 4 \end{pmatrix}. Find, if possible, a diagonal matrix
D and another matrix P so that D = P^{−1}AP. If not possible, explain
why.

Solution. We start by calculating the eigenvalues. We need to solve

    \begin{vmatrix} 1−λ & 2 \\ −1 & 4−λ \end{vmatrix} = 0
    (1 − λ)(4 − λ) + 2 = 0
    λ^2 − 5λ + 6 = 0
    (λ − 3)(λ − 2) = 0

so the eigenvalues are λ_1 = 2 and λ_2 = 3. We find the corresponding
eigenvectors:

    \left( \begin{array}{cc|c} 1−2 & 2 & 0 \\ −1 & 4−2 & 0 \end{array} \right) ∼ \left( \begin{array}{cc|c} −1 & 2 & 0 \\ 0 & 0 & 0 \end{array} \right) ⇒ v = α \begin{pmatrix} 2 \\ 1 \end{pmatrix}, α ≠ 0,

and

    \left( \begin{array}{cc|c} 1−3 & 2 & 0 \\ −1 & 4−3 & 0 \end{array} \right) ∼ \left( \begin{array}{cc|c} 1 & −1 & 0 \\ 0 & 0 & 0 \end{array} \right) ⇒ v = β \begin{pmatrix} 1 \\ 1 \end{pmatrix}, β ≠ 0.

Then we have

    D = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}  and  P = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}.

For the sake of completeness we will check that P^{−1}AP really does
give D.

    P^{−1}AP = \begin{pmatrix} 1 & −1 \\ −1 & 2 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ −1 & 4 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
             = \begin{pmatrix} 1 & −1 \\ −1 & 2 \end{pmatrix} \begin{pmatrix} 4 & 3 \\ 2 & 3 \end{pmatrix}
             = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}.
0 3

Example. Let B = \begin{pmatrix} 3 & 4 \\ −1 & 7 \end{pmatrix}. Find, if possible, a diagonal matrix
D and another matrix P so that D = P^{−1}BP. If not possible, explain
why.

Solution. Again, start by calculating eigenvalues.

    \begin{vmatrix} 3−λ & 4 \\ −1 & 7−λ \end{vmatrix} = 0
    (3 − λ)(7 − λ) + 4 = 0
    λ^2 − 10λ + 25 = 0
    (λ − 5)^2 = 0.

We have only one eigenvalue λ_1 = 5 which has an algebraic multiplicity
of 2. We may not be able to diagonalise in this case, but we
will have to do more calculations before we decide.

    \left( \begin{array}{cc|c} 3−5 & 4 & 0 \\ −1 & 7−5 & 0 \end{array} \right) ∼ \left( \begin{array}{cc|c} 1 & −2 & 0 \\ 0 & 0 & 0 \end{array} \right) ⇒ v = α \begin{pmatrix} 2 \\ 1 \end{pmatrix}, α ≠ 0.

The corresponding eigenspace is one-dimensional. We can still find
matrices P and D that satisfy BP = PD, in this case

    D = \begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix}  and  P = \begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix},

but P is not invertible so we don't have P^{−1}BP = D.


 
Example. Diagonalise A = \begin{pmatrix} 9 & 1 & 1 \\ 1 & 9 & 1 \\ 1 & 1 & 9 \end{pmatrix}.

Solution. You will have to fill in some of the details yourself. The
characteristic equation is −(λ − 8)^2 (λ − 11) = 0 which means we
have eigenvalues λ_1 = 11 and λ_2 = 8 (with algebraic multiplicity 2).
Next we find corresponding eigenvectors. For λ_1 they are

    v = α \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} ; α ≠ 0.

For λ_2 we must solve

    \left( \begin{array}{ccc|c} 9−8 & 1 & 1 & 0 \\ 1 & 9−8 & 1 & 0 \\ 1 & 1 & 9−8 & 0 \end{array} \right) ∼ \left( \begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right).

We are fortunate – the eigenspace

    E_8 = \left\{ α \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix} + β \begin{pmatrix} −1 \\ 0 \\ 1 \end{pmatrix} ; α, β ∈ R \right\}

is two dimensional so we are able to find a set of three linearly independent
eigenvectors. Let

    P = \begin{pmatrix} −1 & −1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}  and  D = \begin{pmatrix} 8 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 11 \end{pmatrix}

then P^{−1}AP = D. The order of the entries in D is important. It must
correspond to the order of the columns of P. (Check this yourselves by
doing the multiplication. It's worth working through this tedious
calculation at least once for the 3 × 3 case to appreciate what a great
result this is.)
Example. Solve the system of differential equations

    \frac{dx_1}{dt} = x_1 + 2x_2 + 2t
    \frac{dx_2}{dt} = −x_1 + 4x_2 + t.

Solution. We start by rewriting the system in a more useful form:

    \frac{dx}{dt} = \underbrace{\begin{pmatrix} 1 & 2 \\ −1 & 4 \end{pmatrix}}_{A} x + \begin{pmatrix} 2t \\ t \end{pmatrix}.

 
Now let x = Py, where P = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}, a matrix of eigenvectors of A. Then

    \frac{d}{dt}(Py) = APy + \begin{pmatrix} 2t \\ t \end{pmatrix}
    P^{−1} P \frac{dy}{dt} = P^{−1}APy + P^{−1} \begin{pmatrix} 2t \\ t \end{pmatrix}
    \frac{dy}{dt} = Dy + \begin{pmatrix} 0 \\ t \end{pmatrix}.

This is equivalent to the system of differential equations

    \frac{dy_1}{dt} = 3y_1
    \frac{dy_2}{dt} = 2y_2 + t

which is now decoupled, i.e. each equation contains only a single unknown
function y_1 or y_2 instead of both. These differential equations
are easy to solve: y_1 = αe^{3t} and y_2 = βe^{2t} − \frac{1}{2}t − \frac{1}{4} and from here we
can recover the solutions x_1 and x_2 by recalling that x = Py. In this
case

    x = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} αe^{3t} \\ βe^{2t} − \frac{1}{2}t − \frac{1}{4} \end{pmatrix}

and after multiplication we find

    x_1 = αe^{3t} + 2βe^{2t} − t − \frac{1}{2},
    x_2 = αe^{3t} + βe^{2t} − \frac{1}{2}t − \frac{1}{4}.

Inner Products, Norms and Orthogonality

Even fairly simple real-valued matrices may have complex eigenval-


ues. Consider a rotation of the plane described by a matrix
    R = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix}.

Its eigenvalues are determined by solving |R − λI| = 0 which in this
case is

    (\cos θ − λ)^2 + \sin^2 θ = 0
    λ^2 − 2\cos(θ)λ + 1 = 0
    ⇒ λ = e^{±iθ}.

In this section we generalise the notion of a dot product to an


inner product, which can be applied to complex-valued vectors. First,
a quick refresher on the dot product.

Definition. If u and v are two vectors in Rn then the dot product of u


and v, written u · v is

u · v = u1 v1 + u2 v2 + . . . + u n v n
= uT v.

The dot product is useful. It allowed us to speak about the length


of vectors
|u|2 = u · u
and of orthogonal vectors

u · v = 0 ⇐⇒ u ⊥ v.

Definition. The inner product of two complex-valued vectors u, v ∈


Cn is written hu, vi and is defined as

    ⟨u, v⟩ = u_1 \overline{v_1} + u_2 \overline{v_2} + \ldots + u_n \overline{v_n} = u^T \overline{v}.

The dot product of two real-valued vectors is a special case of this


inner product.
   
Example. Let a = \begin{pmatrix} i \\ 0 \\ 1−i \end{pmatrix} and b = \begin{pmatrix} 2+3i \\ 1 \\ 7−2i \end{pmatrix}. Calculate ⟨a, b⟩
and ⟨b, a⟩.
Solution.

ha, bi = i (2 − 3i ) + 0 + (1 − i )(7 + 2i )
= 2i + 3 + 7 + 2 − 7i − 2i
= 12 − 3i
hb, ai = (2 + 3i )(−i ) + 0 + (7 − 2i )(1 + i )
= −2i + 3 + 7 + 2 − 2i + 7i
= 12 + 3i
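When checking inner products by machine, be careful about which argument is conjugated. A NumPy sketch (our own aside, not examinable) using the definition above, which conjugates the second argument:

```python
import numpy as np

a = np.array([1j, 0, 1 - 1j])
b = np.array([2 + 3j, 1, 7 - 2j])

# <u, v> as defined here conjugates the *second* argument.
inner = lambda u, v: np.sum(u * np.conj(v))
print(inner(a, b))   # (12-3j)
print(inner(b, a))   # (12+3j)
# Note: np.vdot conjugates its *first* argument, so np.vdot(b, a) matches <a, b>.
```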

Fact 4.7. Some properties of the inner product:

1. ⟨x, y⟩ = \overline{⟨y, x⟩}

2. ⟨αx, y⟩ = α⟨x, y⟩ = ⟨x, \overline{α}y⟩

3. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ and ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩

4. For a real matrix A, ⟨Ax, y⟩ = ⟨x, A^T y⟩

Explanation. The first three are tutorial problems. For the last property
note that

    ⟨Ax, y⟩ = (Ax)^T \overline{y} = x^T A^T \overline{y}

Definition. The norm of a complex vector (denoted by kvk) can be


defined using the inner product
    ‖v‖ = \sqrt{⟨v, v⟩}.

Note that kvk of a real-valued vector reduces to the familiar Eu-


clidean length.

Fact 4.8. Some properties of the norm:

1. k0k = 0 and kxk > 0 if x 6= 0

2. kαxk = |α| kxk

3. kx + yk ≤ kxk + kyk (This is the triangle inequality)

Explanation.
1. ‖0‖ = \sqrt{⟨0, 0⟩} = \sqrt{0 + 0 + \ldots + 0} = 0.

   For any non-zero z ∈ C, z\overline{z} = ℜ(z)^2 + ℑ(z)^2 > 0. If x ≠ 0 it has
   at least one non-zero component, and hence at least one term in
   the sum ⟨x, x⟩ is strictly positive and the rest are non-negative. The
   square root of a positive number is again positive.

2.
    ‖αx‖ = \sqrt{⟨αx, αx⟩}
          = \sqrt{αx_1 \overline{αx_1} + αx_2 \overline{αx_2} + \ldots + αx_n \overline{αx_n}}
          = |α| \sqrt{x_1 \overline{x_1} + x_2 \overline{x_2} + \ldots + x_n \overline{x_n}}
          = |α| ‖x‖

3. We're not going to provide a completely rigorous algebraic proof
   of the triangle inequality. Just realise that it's equivalent to saying
   that the length of one side of a triangle cannot be longer than the
   sum of the lengths of the other two sides.

[Figure 4.2: The length of one side of a triangle cannot be larger than the sum of the lengths of the other two sides.]

Definition. Two vectors are orthogonal if and only if their inner product
is zero, i.e.

    ⟨u, v⟩ = 0 ⟺ u ⊥ v.
   
Example. Are x = \begin{pmatrix} i \\ 2−i \\ −7 \end{pmatrix} and y = \begin{pmatrix} 1 \\ 3−i \\ 1 \end{pmatrix} orthogonal?

Solution. Check the inner product.

    ⟨x, y⟩ = i + (2 − i)(3 + i) + (−7)
           = i + 6 + 1 − 3i + 2i − 7
           = 0

These vectors are orthogonal.


Definition. A set of vectors is orthonormal if

1. every vector in the set has a norm of 1, and

2. the vectors are mutually orthogonal, i.e. every pair of vectors in


the set is orthogonal.
Example. The set \left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\} is orthonormal. So are all the
canonical bases {e_1, e_2, . . . , e_n} for R^n.

Example. Is the set \left\{ \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{pmatrix}, \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{−1}{\sqrt{2}} \end{pmatrix} \right\} orthonormal?

Solution. Yes. Both vectors have a norm of 1 since

    \sqrt{ \frac{1}{\sqrt{2}}\frac{1}{\sqrt{2}} + \frac{±1}{\sqrt{2}}\frac{±1}{\sqrt{2}} } = \sqrt{ \frac{1}{2} + \frac{1}{2} } = 1

and the inner product of the two vectors is zero.


     
 1
 −1 1 

Example. Is the set  0  ,  1  ,  2  orthonormal?
     
 
 1 1 −1 
Solution. Call the three vectors x, y and z. It’s easy to see that each
pair of vectors is orthogonal, e.g.

hx, yi = −1 + 0 + 1 = 0.

It’s equally easy to see that the vectors do not have a norm of 1, e.g.
√ √
kxk = 1 + 1 = 2.

We can scale each vector by the reciprocal of its norm to create an


orthonormal set:
 
  √ −1
 
 √1 √1 

 2 2   2  
  1
√ , √  . 2
 0 ,

 3   6 
√1 √1 −1



 

2 3 6

Fact 4.9. Every set of mutually orthogonal vectors is also linearly indepen-
dent.

Not all linearly independent sets are mutually orthogonal (but they can
be made orthogonal using the Gram-Schmidt procedure).
Explanation. The explanation of the first part is a tutorial problem.
The second part, the Gram-Schmidt procedure, will be discussed a
little later on. We will need it to solve some problems and it’s best
explained through demonstration.
Definition. An invertible real-valued matrix Q is called orthogonal if
Q^{−1} = Q^T. An invertible complex-valued matrix is called unitary if
Q^{−1} = \overline{Q}^T.
Example. Consider the rotation matrix
    R = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix}.

It is easy to see that

    RR^T = \begin{pmatrix} \cos θ & −\sin θ \\ \sin θ & \cos θ \end{pmatrix} \begin{pmatrix} \cos θ & \sin θ \\ −\sin θ & \cos θ \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

The inverse and transpose of R are the same. R is orthogonal.


Note that the columns of R form an orthonormal set. This is not a
coincidence.

Fact 4.10. The columns of an orthogonal matrix form an orthonormal set.


Explanation. Consider an orthogonal matrix A which we write as

A = (a1 |a2 | · · · |an ) ,

where ai is a column vector. Since A is orthogonal we know that


AA T = A T A is the identity matrix, i.e.
    A^T A = \begin{pmatrix} a_1^T \\ a_2^T \\ \vdots \\ a_n^T \end{pmatrix} (a_1 | a_2 | \cdots | a_n)
          = \begin{pmatrix} a_1^T a_1 & a_1^T a_2 & \ldots & a_1^T a_n \\ a_2^T a_1 & a_2^T a_2 & \ldots & a_2^T a_n \\ \vdots & \vdots & \ddots & \vdots \\ a_n^T a_1 & a_n^T a_2 & \ldots & a_n^T a_n \end{pmatrix}
          = I

The diagonal elements aiT ai = 1, i.e. the vectors all have a norm of
one. The off-diagonal elements aiT a j = 0 which means that every pair
of vectors is orthogonal. The set is thus orthonormal.
Orthogonal matrices have a number of important properties.

Fact 4.11. Let Q be an orthogonal matrix. Then

1. Q preserves norms, i.e. kxk = k Qxk,

2. Q preserves orthogonality of vectors, i.e. hx, yi = 0 if and only if


h Qx, Qyi = 0.
In the special case that Q is 2 × 2 or 3 × 3 and the vectors are real-valued
then Q preserves the angles between them.

3. all the eigenvalues λi of Q lie on the unit circle in the complex plane, i.e.
|λi | = 1.

Explanation.

1.

k Qxk2 = h Qx, Qxi


= hx, Q T Qxi
= hx, xi
= kxk2

2. In general, if hx, yi = 0 then

h Qx, Qyi = hx, Q T Qyi


= hx, yi
= 0.

In the special case of Q : R3 → R3

hx, yi = x · y = |x||y| cos θ

and
h Qx, Qyi = Qx · Qy = |x||y| cos φ

which is possible only if θ = φ + 2πn.

3. Let λ be an eigenvalue of Q.

hv, vi = h Qv, Qvi


= hλv, λvi
= |λ|2 hv, vi

which means that |λ|2 = 1 and so |λ| = 1.



Orthogonal Diagonalisation

Definition. A square matrix A is orthogonally diagonalisable if we can


find a diagonal matrix D and an orthogonal matrix Q so that

D = Q−1 AQ = Q T AQ.

Definition. A square matrix A is symmetric if A T = A.

Fact 4.12. If A is orthogonally diagonalisable then it is symmetric.


Explanation.

Q T AQ = D
⇒ A = QDQ T
Then A T = ( QDQ T ) T
AT = (QT )T D T QT
A T = QDQ T
AT = A

Fact 4.13. If A is a real-valued symmetric matrix then

1. it is orthogonally diagonalisable,

2. its eigenvalues are real,

3. eigenvectors that belong to different eigenspaces are orthogonal.

Explanation.

1. The explanation is tricky. Take it on faith. Sorry.

2. Let λ be an eigenvalue of A. We want to show that λ = \overline{λ}. We
manipulate the equation Ax = λx in two different ways to show
this. First:

    Ax = λx
    \overline{x}^T A x = \overline{x}^T λ x
    \overline{x}^T A x = λ ‖x‖^2

Second:

    Ax = λx
    \overline{Ax} = \overline{λx}
    A \overline{x} = \overline{λ}\, \overline{x}        (A is real-valued.)
    x^T A \overline{x} = x^T \overline{λ}\, \overline{x}
    x^T A \overline{x} = \overline{λ} ‖x‖^2

Subtract these two expressions to recover

    \overline{x}^T A x − x^T A \overline{x} = (λ − \overline{λ}) ‖x‖^2.

By the symmetry of A the left hand side of this equation is zero.
Since x is an eigenvector it is non-zero and so the right hand side
is zero only if λ = \overline{λ}.

3. Let x and y be eigenvectors with distinct eigenvalues λ and µ.

λhx, yi = hλx, yi
= h Ax, yi
T
= hx, A yi
= hx, Ayi
= hx, µyi
= µhx, yi

Since µ 6= λ this is possible only if hx, yi = 0, i.e. the eigenvectors


are orthogonal.
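For symmetric matrices NumPy provides numpy.linalg.eigh, which returns real eigenvalues and an orthogonal matrix of eigenvectors – exactly the content of Fact 4.13. A short sketch (our own aside, not examinable) using the matrix from the example that follows:

```python
import numpy as np

A = np.array([[9.0, 1.0, 1.0],
              [1.0, 9.0, 1.0],
              [1.0, 1.0, 9.0]])

vals, Q = np.linalg.eigh(A)
print(vals)                                     # [ 8.  8. 11.]
print(np.allclose(Q.T @ Q, np.eye(3)))          # True: columns are orthonormal
print(np.allclose(Q.T @ A @ Q, np.diag(vals)))  # True: Q^T A Q = D
```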
 
Example. Orthogonally diagonalise A = \begin{pmatrix} 9 & 1 & 1 \\ 1 & 9 & 1 \\ 1 & 1 & 9 \end{pmatrix}.

Solution. We need to find a diagonal D and an orthogonal Q so that
D = Q^{−1}AQ. We have worked with this matrix earlier, and our
calculations then gave

    E_8 = \left\{ α \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix} + β \begin{pmatrix} −1 \\ 0 \\ 1 \end{pmatrix} ; α ≠ 0 or β ≠ 0 \right\}

and

    E_{11} = \left\{ γ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} ; γ ≠ 0 \right\}.

Given this information, we need to produce an orthonormal set of
three eigenvectors. We know that the eigenspaces are orthogonal,
so one of the eigenvectors we desire is simply \frac{1}{\sqrt{3}}(1, 1, 1)^T. What
remains is to choose two orthogonal eigenvectors from E_8.

We will start with the two eigenvectors (−1, 1, 0)^T and (−1, 0, 1)^T
and generate two new orthogonal vectors in E_8. This process of generating
new orthogonal vectors from a linearly independent set of
vectors is the Gram-Schmidt procedure. In general we can apply it to
a set of n vectors but we illustrate it here on two.

[Figure 4.3: Projecting v_2 onto u_1 to find u_2; v_2 = v_{∥u} + v_{⊥u} with v_{⊥u} = u_2.]

Let v_1 = (−1, 1, 0)^T and v_2 = (−1, 0, 1)^T. The first vector in our
orthogonal set is simply u_1 = v_1. To find a second vector u_2 ⊥ u_1 we
will project v_2 onto u_1 and then subtract this projection from v_2, i.e.

    v_2 = v_{∥u} + \underbrace{v_{⊥u}}_{u_2}.

The projection of v_2 onto u_1 is v_{∥u} = \frac{⟨v_2, u_1⟩}{⟨u_1, u_1⟩} u_1.
(You should convince yourself that this is true in R^2 and R^3 by using the dot product and doing some trigonometry.)
In this case

    v_{∥u} = \frac{⟨v_2, u_1⟩}{⟨u_1, u_1⟩} u_1 = \frac{1}{2} \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix}

    u_2 = v_{⊥u} = v_2 − v_{∥u} = \begin{pmatrix} −1 \\ 0 \\ 1 \end{pmatrix} − \frac{1}{2} \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} −1 \\ −1 \\ 2 \end{pmatrix}

We now have a set of three mutually orthogonal eigenvectors

    \left\{ \begin{pmatrix} −1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} −1 \\ −1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right\}

and all that remains is to scale each by their norm

    \left\{ \begin{pmatrix} −\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \begin{pmatrix} −\frac{1}{\sqrt{6}} \\ −\frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{pmatrix}, \begin{pmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \end{pmatrix} \right\}

so the matrix

    Q = \begin{pmatrix} −\frac{1}{\sqrt{2}} & −\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & −\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{2}{\sqrt{6}} & \frac{1}{\sqrt{3}} \end{pmatrix}

and

    D = \begin{pmatrix} 8 & 0 & 0 \\ 0 & 8 & 0 \\ 0 & 0 & 11 \end{pmatrix}

and

    D = Q^{−1}AQ.

The Gram-Schmidt Procedure


The Gram-Schmidt procedure can be used to convert a set of $n$ linearly independent vectors $\{v_1, v_2, \ldots, v_n\}$ into an orthonormal set $\{e_1, e_2, \ldots, e_n\}$.

1. Let $u_1 = v_1$.

2. Create $u_k$ for $2 \leq k \leq n$ by

(a) calculating the projections $\dfrac{\langle v_k, u_j\rangle}{\langle u_j, u_j\rangle}u_j$ for $1 \leq j < k$,

(b) subtracting these projections from $v_k$, i.e.
$$u_k = v_k - \frac{\langle v_k, u_1\rangle}{\langle u_1, u_1\rangle}u_1 - \frac{\langle v_k, u_2\rangle}{\langle u_2, u_2\rangle}u_2 - \ldots - \frac{\langle v_k, u_{k-1}\rangle}{\langle u_{k-1}, u_{k-1}\rangle}u_{k-1}.$$

3. Scale each vector so that it has a norm of one, i.e.
$$e_i = \frac{u_i}{\|u_i\|}.$$
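A short script can be used to check a Gram-Schmidt calculation numerically. The sketch below is my own (it is not part of the notes, and it assumes Python with numpy is available); it runs the procedure on the three eigenvectors from the example above and confirms that the resulting matrix $Q$ orthogonally diagonalises $A$.

```python
# A minimal Gram-Schmidt sketch in Python/numpy (my own code, not from the notes).
import numpy as np

def gram_schmidt(vectors):
    """Convert a list of linearly independent vectors into an orthonormal list."""
    orthonormal = []
    for v in vectors:
        u = np.array(v, dtype=float)
        # Subtract the components along the orthonormal vectors found so far.
        for e in orthonormal:
            u = u - np.dot(u, e) * e
        orthonormal.append(u / np.linalg.norm(u))
    return orthonormal

# The two eigenvectors spanning E_8, followed by the eigenvector spanning E_11.
e1, e2, e3 = gram_schmidt([[-1, 1, 0], [-1, 0, 1], [1, 1, 1]])
Q = np.column_stack([e1, e2, e3])

A = np.array([[9, 1, 1], [1, 9, 1], [1, 1, 9]])
print(np.round(Q.T @ Q, 10))      # the identity, so Q is orthogonal
print(np.round(Q.T @ A @ Q, 10))  # the diagonal matrix D = diag(8, 8, 11)
```

Because the third vector already lies in a different eigenspace it is automatically orthogonal to the first two, so the procedure only rescales it.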

Quadratic Forms

In MAM1020 we discussed conic sections – parabolas, hyperbolas


and ellipses. The standard forms are

\begin{align*}
\text{Parabola:} \quad & c(y - y_0) = (x - x_0)^2 \quad\text{or}\quad c(x - x_0) = (y - y_0)^2 \\
\text{Hyperbola:} \quad & \frac{(x - x_0)^2}{l_x^2} - \frac{(y - y_0)^2}{l_y^2} = \pm 1 \\
\text{Ellipse:} \quad & \frac{(x - x_0)^2}{l_x^2} + \frac{(y - y_0)^2}{l_y^2} = 1
\end{align*}

The general quadratic form in two variables is
$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0. \tag{4.3}$$

Figure 4.4: The ellipse $\frac{(x - x_0)^2}{l_x^2} + \frac{(y - y_0)^2}{l_y^2} = 1$.

Notice that

1. all of the standard forms for these conic sections can be expressed as a general quadratic form,

2. but some general quadratic forms are not standard forms of conic sections. Specifically, if $b \neq 0$, the term $2bxy$ does not appear when any of the standard forms are expanded.

Figure 4.5: The hyperbola $\frac{(x - x_0)^2}{l_x^2} - \frac{(y - y_0)^2}{l_y^2} = 1$. The slopes of the asymptotes are $\pm\frac{l_y}{l_x}$.

What type of curves are these quadratic forms with $b \neq 0$? We will use orthogonal diagonalisation to show that they are parabolas, hyperbolas and ellipses, but they are rotated so that they are not square to the $xy$-axes.

Example. Plot the conic section represented by
$$16x^2 + 9y^2 + 64x - 18y - 71 = 0.$$

Solution. This quadratic form has no $xy$ term so we can reduce it to one of the standard forms by completing the square.
\begin{align*}
(16x^2 + 64x) + (9y^2 - 18y) &= 71 \\
16(x^2 + 4x) + 9(y^2 - 2y) &= 71 \\
16(x + 2)^2 - 64 + 9(y - 1)^2 - 9 &= 71 \\
\frac{(x + 2)^2}{3^2} + \frac{(y - 1)^2}{4^2} &= 1.
\end{align*}
This ellipse is plotted in figure 4.6 alongside.

Figure 4.6: The ellipse $\frac{(x + 2)^2}{3^2} + \frac{(y - 1)^2}{4^2} = 1$.

When there is a non-zero $xy$ term in the quadratic form the problem is significantly more difficult.

Example. Plot the conic section represented by the quadratic form
$$8x^2 - 8xy + 2y^2 + \sqrt{5}x + 2\sqrt{5}y = 5.$$
Solution. This quadratic form has a non-zero xy term. For these
types of problems we should write the quadratic form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0$$
in the form
$$\begin{pmatrix}x & y\end{pmatrix}\begin{pmatrix}a & b \\ b & c\end{pmatrix}\begin{pmatrix}x \\ y\end{pmatrix} + \begin{pmatrix}d & e\end{pmatrix}\begin{pmatrix}x \\ y\end{pmatrix} = -f.$$
If we let $\mathbf{x} = (x\ y)^T$ this particular quadratic form can be rewritten as
$$\mathbf{x}^T\underbrace{\begin{pmatrix}8 & -4 \\ -4 & 2\end{pmatrix}}_{A}\mathbf{x} + \sqrt{5}\begin{pmatrix}1 & 2\end{pmatrix}\mathbf{x} = 5. \tag{4.4}$$

Some comments:

1. If you aren’t sure that the quadratic form can be rewritten in this
way multiply out xT Ax and confirm that you recover the original
expression.

2. We have some freedom about how we choose the entries in A. It


makes sense to split the coefficient of the xy term evenly so that A
is symmetric, and hence orthogonally diagonalisable.

Having rewritten the quadratic form in this way we will now orthog-
onally diagonalise A. First we must find its eigenvalues.

$$\begin{vmatrix} 8 - \lambda & -4 \\ -4 & 2 - \lambda \end{vmatrix} = 0$$
\begin{align*}
(8 - \lambda)(2 - \lambda) - 16 &= 0 \\
\lambda(\lambda - 10) &= 0.
\end{align*}

The eigenvalues of $A$ are $\lambda_1 = 0$ and $\lambda_2 = 10$. We find the corresponding eigenvectors.
$$\begin{pmatrix} 8 & -4 & 0 \\ -4 & 2 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 2 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
so the corresponding eigenvectors are $v = \alpha\begin{pmatrix}1\\2\end{pmatrix}$ where $\alpha \neq 0$, and
$$\begin{pmatrix} -2 & -4 & 0 \\ -4 & -8 & 0 \end{pmatrix} \longrightarrow \begin{pmatrix} 1 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$
so the corresponding eigenvectors are $w = \beta\begin{pmatrix}-2\\1\end{pmatrix}$ where $\beta \neq 0$.
The matrix Q that will orthogonally diagonalise A is
$$Q = \begin{pmatrix} \frac{1}{\sqrt{5}} & -\frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \end{pmatrix}$$
and the result of $Q^{-1}AQ$ will be a diagonal matrix
$$D = \begin{pmatrix} 0 & 0 \\ 0 & 10 \end{pmatrix}.$$

In equation 4.4 we make the substitution $\mathbf{x} = Q\mathbf{X}$ where $\mathbf{X} = (X\ Y)^T$. Then
\begin{align*}
(Q\mathbf{X})^T A Q\mathbf{X} + \sqrt{5}\begin{pmatrix}1 & 2\end{pmatrix}Q\mathbf{X} &= 5 \\
\mathbf{X}^T Q^T A Q\mathbf{X} + \sqrt{5}\begin{pmatrix}1 & 2\end{pmatrix}Q\mathbf{X} &= 5 \\
\mathbf{X}^T D\mathbf{X} + \sqrt{5}\begin{pmatrix}1 & 2\end{pmatrix}Q\mathbf{X} &= 5 \\
\begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}0 & 0 \\ 0 & 10\end{pmatrix}\begin{pmatrix}X \\ Y\end{pmatrix} + \begin{pmatrix}1 & 2\end{pmatrix}\begin{pmatrix}1 & -2 \\ 2 & 1\end{pmatrix}\begin{pmatrix}X \\ Y\end{pmatrix} &= 5 \\
\begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}0 \\ 10Y\end{pmatrix} + \begin{pmatrix}1 & 2\end{pmatrix}\begin{pmatrix}X - 2Y \\ 2X + Y\end{pmatrix} &= 5 \\
10Y^2 + X - 2Y + 4X + 2Y &= 5 \\
X &= 1 - 2Y^2
\end{align*}

We have recovered a standard conic form in the variables $X$ and $Y$. It's a parabola, and we plot it alongside.

Figure 4.7: Our parabola plotted against the $XY$-axes.

What remains is to transform back to the original variables $x$ and $y$. To do this recall that the original and new variables are related through $\mathbf{x} = Q\mathbf{X}$. In terms of the variables $X$ and $Y$ the $X$ axis points in the direction $(1\ 0)^T$. The corresponding direction in terms of $x$ and $y$ can be recovered by calculating

\begin{align*}
\mathbf{x} &= Q\mathbf{X} \\
\mathbf{x} &= Q\begin{pmatrix}1 \\ 0\end{pmatrix} \\
\mathbf{x} &= \begin{pmatrix}\frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}}\end{pmatrix}.
\end{align*}

Similarly we can see that the $Y$ axis points (in terms of $x$ and $y$) in the direction
$$\begin{pmatrix}-\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}}\end{pmatrix}.$$
If we plot our quadratic form we recover the following.

Figure 4.8: Our parabola plotted against the $xy$-axes.
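The eigenvalue and eigenvector calculation in this example can also be checked numerically. The sketch below is my own and assumes numpy is available; note that numpy may order the eigenvalues differently or flip the sign of an eigenvector, in which case the rotated equation differs from the one above only by a relabelling or reflection of the new axes.

```python
# A rough numerical check of the worked example (my own code, not from the notes).
import numpy as np

A = np.array([[8.0, -4.0], [-4.0, 2.0]])

# eigh is designed for symmetric matrices: it returns real eigenvalues in
# ascending order and an orthogonal matrix whose columns are unit eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)
print(eigenvalues)                  # approximately [0, 10]
print(np.round(Q.T @ A @ Q, 10))    # approximately diag(0, 10)

# The linear term sqrt(5)*(1, 2) expressed in the new (X, Y) coordinates.
print(np.sqrt(5) * np.array([1.0, 2.0]) @ Q)
```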

Quadratic Forms of Three Variables


All of this analysis can be extended to quadratic forms of three variables. In general the quadratic form can be written as
$$ax^2 + by^2 + cz^2 + 2dxy + 2eyz + 2fzx + gx + hy + iz + j = 0.$$
These quadratic forms define surfaces in $\mathbb{R}^3$, e.g. ellipsoids, cones, paraboloids and so on. If $d = e = f = 0$ the quadratic form can be put into a standard form by completing the square. If one of those coefficients is non-zero we rewrite the quadratic form as
$$\mathbf{x}^T\begin{pmatrix} a & d & f \\ d & b & e \\ f & e & c \end{pmatrix}\mathbf{x} + \begin{pmatrix}g & h & i\end{pmatrix}\mathbf{x} = -j$$
and orthogonally diagonalise.
Example. What sort of surface is
$$x^2 + y^2 - 2z^2 - 8xy - 4yz + 4xz + \sqrt{18}x + \sqrt{18}y = 3?$$

Solution. This surface includes mixed terms (i.e. non-zero $xy$, $yz$ and $zx$ terms) so we cannot simply complete the squares. Instead introduce the vector $\mathbf{x} = (x\ y\ z)^T$ and rewrite it as
$$\mathbf{x}^T\underbrace{\begin{pmatrix} 1 & -4 & 2 \\ -4 & 1 & -2 \\ 2 & -2 & -2 \end{pmatrix}}_{A}\mathbf{x} + \sqrt{18}\begin{pmatrix}1 & 1 & 0\end{pmatrix}\mathbf{x} = 3.$$

The matrix $A$ is real-valued and symmetric so it is orthogonally diagonalisable. You should check for yourself that if
$$Q = \begin{pmatrix} \frac{2}{3} & \frac{1}{\sqrt{18}} & \frac{1}{\sqrt{2}} \\ -\frac{2}{3} & -\frac{1}{\sqrt{18}} & \frac{1}{\sqrt{2}} \\ \frac{1}{3} & -\frac{4}{\sqrt{18}} & 0 \end{pmatrix}$$
and
$$D = \begin{pmatrix} 6 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & -3 \end{pmatrix}$$
then $Q^{-1}AQ = D$. Let $\mathbf{X} = (X\ Y\ Z)^T$ and make the substitution $\mathbf{x} = Q\mathbf{X}$ in the equation that describes the surface. After some simplification we recover
$$X^2 - \frac{Y^2}{2} - \frac{(Z - 1)^2}{2} = 0.$$
This is the standard form of a cone, parallel to the $X$ axis. In terms of our original variables $x$, $y$, and $z$, it is a cone oriented along $Q(1\ 0\ 0)^T$, i.e. along the direction $(2\ {-2}\ 1)^T$.
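As the notes suggest, it is worth checking the claimed $Q$ and $D$. A quick numerical check (my own sketch, assuming numpy is available) is:

```python
# Checking that Q is orthogonal and that Q^T A Q = diag(6, -3, -3) (my own check).
import numpy as np

A = np.array([[1, -4, 2], [-4, 1, -2], [2, -2, -2]], dtype=float)
Q = np.column_stack([
    np.array([2.0, -2.0, 1.0]) / 3,             # eigenvector for lambda = 6
    np.array([1.0, -1.0, -4.0]) / np.sqrt(18),  # eigenvector for lambda = -3
    np.array([1.0, 1.0, 0.0]) / np.sqrt(2),     # eigenvector for lambda = -3
])
print(np.round(Q.T @ Q, 10))       # the identity matrix
print(np.round(Q.T @ A @ Q, 10))   # diag(6, -3, -3)
```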

Rotation and Reflection in R2 and R3

In this section we are going to tackle the following types of problem.


Example. Find a matrix A that represents reflection in the plane
x + y + z = 0.
This is a hard problem to solve because the plane defined by
x + y + z = 0 is not square to the xyz-axes. The key idea to solving
this problem is the following: we introduce new XYZ-axes which are square to this plane, find the matrix that represents reflection with
respect to these axes, and then describe this matrix with respect to
the original xyz-axes.

Fact 4.14. Suppose that the matrix $A$ represents a linear transformation of $\mathbb{R}^3$ with respect to the canonical basis
$$\left\{ \begin{pmatrix}1\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\end{pmatrix} \right\} = \{e_1, e_2, e_3\}.$$
(These vectors describe the $x$, $y$ and $z$ axes.) If we have a second orthonormal basis for $\mathbb{R}^3$
$$\{\hat{u}_1, \hat{u}_2, \hat{u}_3\}$$
(these vectors describe the $X$, $Y$ and $Z$ axes) we can create an orthogonal matrix whose columns are simply these vectors
$$Q = (\hat{u}_1 \,|\, \hat{u}_2 \,|\, \hat{u}_3)$$

and the same linear transformation can be represented with respect to this second basis by a matrix $B$ where
$$B = Q^{-1}AQ = Q^T A Q$$
or equivalently
$$A = QBQ^{-1} = QBQ^T.$$

Explanation. Consider figure 4.9, which shows the action of $A$, $B$, $Q$ and $Q^{-1}$ between $\mathbb{R}^3$ described with the usual basis and $\mathbb{R}^3$ described with the new basis. Since matrix multiplication is simply the composition of functions it's obvious that $A = QBQ^{-1}$.

Figure 4.9: The transformation represented by $A$ is the same as $QBQ^{-1}$.

Let's use this to solve the problem above.

Example. Find a matrix $A$ that represents reflection in the plane $x + y + z = 0$.

Solution. We are going to introduce a new basis for $\mathbb{R}^3$. This basis,

{û1 , û2 , û3 }, will have two vectors û1 and û2 which lie in this plane
and û3 perpendicular to the plane. If we think in terms of a new
XYZ-axes the plane above will lie in the XY-plane and its normal
will run parallel to the Z-axis.

It’s easiest to start with û3 which describes the Z-axis. This should
be normal to the plane, so we take
 
1
u3 =  1 
 
1
and scale by its norm to recover
 
√1
3
√1
 
û3 = 
 3
.

√1
3

Next we find two vectors that lie inside the plane, and that are orthogonal to each other. Finding two vectors in the plane is easy, we take
$$v_1 = \begin{pmatrix}1\\-1\\0\end{pmatrix} \quad\text{and}\quad v_2 = \begin{pmatrix}1\\0\\-1\end{pmatrix}.$$
Next we apply the Gram-Schmidt procedure and recover
$$\hat{u}_1 = \begin{pmatrix}\frac{1}{\sqrt{2}}\\ -\frac{1}{\sqrt{2}}\\ 0\end{pmatrix} \quad\text{and}\quad \hat{u}_2 = \begin{pmatrix}\frac{1}{\sqrt{6}}\\ \frac{1}{\sqrt{6}}\\ -\frac{2}{\sqrt{6}}\end{pmatrix}.$$

We form a matrix $Q$ whose columns are the vectors $\hat{u}_i$. The matrix $B$ which represents this linear transformation, i.e. this reflection, with respect to the new $XYZ$-axes is very simple:
$$B = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}.$$

The matrix $A$ is simply $QBQ^T$. Some calculation gives
$$A = \frac{1}{3}\begin{pmatrix} 1 & -2 & -2 \\ -2 & 1 & -2 \\ -2 & -2 & 1 \end{pmatrix}.$$
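The change-of-basis construction is easy to check numerically. The sketch below (my own code, assuming numpy) rebuilds $A = QBQ^T$ and compares it with the direct formula $I - 2nn^T/\|n\|^2$ for reflection in a plane through the origin with normal $n$.

```python
# Reflection in the plane x + y + z = 0 built two ways (my own check, not from the notes).
import numpy as np

n = np.array([1.0, 1.0, 1.0])                    # normal to the plane
u3 = n / np.linalg.norm(n)                       # unit normal, the new Z direction
u1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)     # lies in the plane
u2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)     # lies in the plane, orthogonal to u1

Q = np.column_stack([u1, u2, u3])
B = np.diag([1.0, 1.0, -1.0])                    # reflection w.r.t. the new axes
A = Q @ B @ Q.T

print(np.round(3 * A))                           # [[1,-2,-2],[-2,1,-2],[-2,-2,1]]
print(np.allclose(A, np.eye(3) - 2 * np.outer(n, n) / n.dot(n)))  # True
```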

Example. Find a matrix $A$ that represents anticlockwise rotation of $\mathbb{R}^3$ by $\frac{\pi}{4}$ about the line passing through the origin in the direction $(1\ 1\ 1)^T$.

Solution. We describe $\mathbb{R}^3$ using a new basis. Since the given line is normal to the plane from the previous example we can reuse some of our previous calculations. All that changes is that the matrix $B$ is
$$B = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

The matrix $A$ is $QBQ^T$. Some calculation gives
$$A = \frac{1}{6}\begin{pmatrix} 2 + 2\sqrt{2} & 2 - \sqrt{2} + \sqrt{6} & 2 - \sqrt{2} - \sqrt{6} \\ 2 - \sqrt{2} - \sqrt{6} & 2 + 2\sqrt{2} & 2 - \sqrt{2} + \sqrt{6} \\ 2 - \sqrt{2} + \sqrt{6} & 2 - \sqrt{2} - \sqrt{6} & 2 + 2\sqrt{2} \end{pmatrix}.$$

Fact 4.15. If $Q$ is orthogonal and $A$ and $B$ are two square matrices so that
$$A = QBQ^T$$

then

1. the eigenvalues of A and B are the same (and so are the determinants and
trace),

2. A is symmetric if and only if B is symmetric,

3. A is orthogonal if and only if B is orthogonal.

Explanation. The explanation of point 1 is a tutorial problem. You


should be able to convince yourself that points 2 and 3 are true.
5
Laplace Transforms

As a general rule, solving differential equations is hard but solv-


ing algebraic equations is easy. The Laplace transform turns differ-
ential equations into algebraic equations and the inverse Laplace
transform turns solutions to algebraic equations into solutions to dif-
ferential equations. This suggests the following approach to solving
differential equations:

Figure 5.1: Using the Laplace transform to solve a differential equation. The direct solution of the differential equation is too hard, so instead: (1) apply the Laplace transform $\mathcal{L}\{\cdot\}$ to turn the differential equation into an algebraic equation, (2) solve the algebraic equation, and (3) apply the inverse Laplace transform $\mathcal{L}^{-1}\{\cdot\}$ to turn the algebraic solution into the solution of the differential equation.
Before we can implement this approach we are going to have to
review some of the basic properties of the Laplace transform and the
inverse Laplace transform.

The Basics

Definition. The Laplace transform of a function $f(t)$ defined for $t \geq 0$ is written $\mathcal{L}\{f(t)\}$ or $F(s)$ and is given by
$$\mathcal{L}\{f(t)\} = F(s) = \int_0^\infty e^{-st}f(t)\,dt.$$

Example. Find the Laplace transform of $f(t) = 1$.

Solution. Simply apply the definition.
\begin{align*}
\mathcal{L}\{1\} &= \int_0^\infty e^{-st}\cdot 1\,dt \\
&= \lim_{k\to\infty}\int_0^k e^{-st}\,dt \\
&= \lim_{k\to\infty}\left[\frac{-1}{s}e^{-st}\right]_0^k
\end{align*}
This integral diverges for $s \leq 0$ but converges for $s > 0$.
$$\mathcal{L}\{1\} = \frac{1}{s}; \quad s > 0.$$

Example. Find the Laplace transform of $f(t) = t$.

Solution. Again, apply the definition. The integration is a little trickier (we need integration by parts) but it is otherwise just like the example above.
\begin{align*}
\mathcal{L}\{t\} &= \int_0^\infty e^{-st}t\,dt \\
&= \lim_{k\to\infty}\int_0^k e^{-st}t\,dt \\
&= \lim_{k\to\infty}\left(\left[\frac{-t}{s}e^{-st}\right]_0^k + \frac{1}{s}\int_0^k e^{-st}\,dt\right) \\
&= \lim_{k\to\infty}\left[\frac{-t}{s}e^{-st} - \frac{1}{s^2}e^{-st}\right]_0^k \\
&= \frac{1}{s^2}; \quad s > 0
\end{align*}

We can do the same to find the Laplace transform of $t^2$ and $t^3$ and so on. If we do we might spot a pattern and conclude (correctly) that
$$\mathcal{L}\{t^n\} = \frac{n!}{s^{n+1}}.$$
(To actually prove this is true for all values of $n$ we could use mathematical induction.) This transform, along with a number of others, is included on the summary sheet at the back of these notes and will be provided in the exam. You do still need to be able to calculate the transform of a function using the definition though.
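If you have access to a computer algebra system you can also check the entries in the table. The sketch below is my own and assumes sympy is installed; depending on the version the results may be printed in a different but equivalent form.

```python
# Checking a few table entries with sympy (my own check, not part of the notes).
import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)

# noconds=True drops the region of convergence and returns just F(s).
print(sp.laplace_transform(sp.Integer(1), t, s, noconds=True))  # 1/s
print(sp.laplace_transform(t**4, t, s, noconds=True))           # 24/s**5
print(sp.laplace_transform(sp.exp(a*t), t, s, noconds=True))    # 1/(s - a)
print(sp.laplace_transform(sp.sin(a*t), t, s, noconds=True))    # a/(s**2 + a**2)
```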
Example. Find the Laplace transform of $f(t) = e^{at}$.

Solution. Apply the definition as we have done for the previous two examples.
\begin{align*}
\mathcal{L}\{e^{at}\} &= \int_0^\infty e^{-st}e^{at}\,dt \\
&= \lim_{k\to\infty}\int_0^k e^{(a-s)t}\,dt \\
&= \lim_{k\to\infty}\left[\frac{1}{a-s}e^{(a-s)t}\right]_0^k \\
&= \frac{1}{s-a}; \quad s > a
\end{align*}

Example. Find the Laplace transform of $f(t) = \sin(at)$.

Solution. Apply the definition. This time the integration is not straightforward. We will integrate by parts twice, exploiting the fact that $\sin(t)$ becomes $\cos(t)$ and then returns to $\sin(t)$ after repeated differentiation/integration. Consider first the indefinite integral:
\begin{align*}
I &= \int e^{-st}\sin(at)\,dt \\
&= -\frac{1}{a}\cos(at)e^{-st} - \frac{s}{a}\int e^{-st}\cos(at)\,dt \\
&= -\frac{1}{a}\cos(at)e^{-st} - \frac{s}{a^2}e^{-st}\sin(at) - \frac{s^2}{a^2}\underbrace{\int e^{-st}\sin(at)\,dt}_{I} \\
\left(1 + \frac{s^2}{a^2}\right)I &= -\left(\frac{1}{a}\cos(at) + \frac{s}{a^2}\sin(at)\right)e^{-st} \\
I &= \frac{-\left(a\cos(at) + s\sin(at)\right)e^{-st}}{s^2 + a^2}.
\end{align*}
Then the Laplace transformation
\begin{align*}
\mathcal{L}\{\sin(at)\} &= \lim_{k\to\infty}[I]_0^k \\
&= \frac{a}{s^2 + a^2}; \quad s > 0
\end{align*}

Not all functions have Laplace transforms.

Example. Find the Laplace transform of $f(t) = \dfrac{1}{t^2}$ if possible.

Solution. We apply the definition as before, but we will find that the improper integral diverges.
\begin{align*}
\mathcal{L}\left\{\frac{1}{t^2}\right\} &= \lim_{k\to\infty}\int_0^k e^{-st}\frac{1}{t^2}\,dt \\
&= \int_0^1 e^{-st}\frac{1}{t^2}\,dt + \underbrace{\lim_{k\to\infty}\int_1^k e^{-st}\frac{1}{t^2}\,dt}_{\text{converges } (s > 0)} \\
&\geq \underbrace{e^{-s}\int_0^1 \frac{1}{t^2}\,dt}_{\text{diverges}} + \lim_{k\to\infty}\int_1^k e^{-st}\frac{1}{t^2}\,dt
\end{align*}
(The inequality holds because $s > 0$, so $e^{-st}$ is a decreasing function of $t$ and $e^{-st} \geq e^{-s}$ on $[0, 1]$.) By the comparison test for improper integrals the original integral also diverges. There is no Laplace transform for $\dfrac{1}{t^2}$.

Some Useful Facts

Fact 5.1. The Laplace transform is linear.


1. L { f (t) + g(t)} = L { f (t)} + L { g(t)}

2. L {α f (t)} = αL { f (t)}

Explanation. This follows immediately from the definition of the


Laplace transform as an integral. Since the integral of a sum is the sum of the integrals, and scaling the integrand by a constant scales the integral by the same constant, we have linearity of the Laplace transform.
Fact 5.2. Let $\mathcal{L}\{f(t)\} = F(s)$. Then $\mathcal{L}\{f(t)e^{at}\} = F(s - a)$.

Example. Since $\mathcal{L}\{1\} = \dfrac{1}{s}$ we expect $\mathcal{L}\{1\cdot e^{at}\} = \dfrac{1}{s - a}$, which we have shown above.

Explanation. Use the definition.
\begin{align*}
\mathcal{L}\{f(t)e^{at}\} &= \int_0^\infty e^{-st}f(t)e^{at}\,dt \\
&= \int_0^\infty e^{-(s-a)t}f(t)\,dt && \text{Let } \tilde{s} = s - a. \\
&= \int_0^\infty e^{-\tilde{s}t}f(t)\,dt \\
&= F(\tilde{s}) \\
&= F(s - a)
\end{align*}
Fact 5.3. The Laplace transforms of the derivatives of $f(t)$ are
\begin{align*}
\mathcal{L}\{f'(t)\} &= s\mathcal{L}\{f(t)\} - f(0) \\
\mathcal{L}\{f''(t)\} &= s^2\mathcal{L}\{f(t)\} - sf(0) - f'(0) \\
&\ \ \vdots \\
\mathcal{L}\{f^{(n)}(t)\} &= s^n\mathcal{L}\{f(t)\} - s^{n-1}f(0) - s^{n-2}f'(0) - \ldots - f^{(n-1)}(0)
\end{align*}

Explanation. A full explanation requires induction. We illustrate the basic idea and you can fill in the gaps if you are interested. First find the Laplace transformation of $f'(t)$ by integrating by parts.
\begin{align*}
\mathcal{L}\{f'(t)\} &= \int_0^\infty e^{-st}f'(t)\,dt \\
&= \lim_{k\to\infty}\left[f(t)e^{-st}\right]_0^k + s\int_0^k e^{-st}f(t)\,dt \\
&= s\mathcal{L}\{f(t)\} - f(0)
\end{align*}
To find the Laplace transformation of $f''(t)$ start by letting $f'(t) = g(t)$.
\begin{align*}
\mathcal{L}\{f''(t)\} &= \mathcal{L}\{g'(t)\} \\
&= s\mathcal{L}\{g(t)\} - g(0) && \text{By the previous calculation.} \\
&= s\mathcal{L}\{f'(t)\} - f'(0) \\
&= s^2\mathcal{L}\{f(t)\} - sf(0) - f'(0) && \text{By the previous calculation.}
\end{align*}
Now we could use mathematical induction to extend this to the $n$-th derivative of $f(t)$.
This fact can be used to calculate the Laplace transformation of functions that are otherwise tricky to determine.

Example. Find the Laplace transform of $f(t) = t\cos(at)$.

Solution. The approach to this problem is to calculate $f''(t)$, determine $\mathcal{L}\{f''(t)\}$ in two different ways and then set the results of the two approaches equal to each other. Note that $f''(t) = -2a\sin(at) - a^2t\cos(at)$, and that $f(0) = 0$ and $f'(0) = 1$.

The first approach is to use Fact 5.3.
\begin{align*}
\mathcal{L}\{f''(t)\} &= s^2\mathcal{L}\{f(t)\} - sf(0) - f'(0) \\
&= s^2\mathcal{L}\{f(t)\} - 1
\end{align*}
The second approach is to use the expression for $f''(t)$ and linearity of the transform.
\begin{align*}
\mathcal{L}\{f''(t)\} &= \mathcal{L}\{-2a\sin(at) - a^2t\cos(at)\} \\
&= -2a\mathcal{L}\{\sin(at)\} - a^2\mathcal{L}\{t\cos(at)\} \\
&= \frac{-2a^2}{s^2 + a^2} - a^2\mathcal{L}\{f(t)\}
\end{align*}
These two expressions must be equal.
\begin{align*}
s^2\mathcal{L}\{f(t)\} - 1 &= \frac{-2a^2}{s^2 + a^2} - a^2\mathcal{L}\{f(t)\} \\
(s^2 + a^2)\mathcal{L}\{f(t)\} &= 1 - \frac{2a^2}{s^2 + a^2} \\
\mathcal{L}\{f(t)\} &= \frac{s^2 - a^2}{(s^2 + a^2)^2}
\end{align*}
Similarly $\mathcal{L}\{t\sin(at)\} = \dfrac{2as}{(s^2 + a^2)^2}$.
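This result is easy to check symbolically. The one-liner below is my own (it assumes sympy is installed) and should print zero.

```python
# Checking L{t cos(at)} = (s^2 - a^2)/(s^2 + a^2)^2 with sympy (my own check).
import sympy as sp

t, s, a = sp.symbols('t s a', positive=True)
F = sp.laplace_transform(t * sp.cos(a*t), t, s, noconds=True)
print(sp.simplify(F - (s**2 - a**2) / (s**2 + a**2)**2))  # 0
```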

The Inverse Transform

For well-behaved functions the Laplace transform is one-to-one.

Fact 5.4. If $f_1(t)$ and $f_2(t)$ are both continuous on $[0, \infty)$ then $\mathcal{L}\{f_1(t)\} = \mathcal{L}\{f_2(t)\}$ only if $f_1(t) = f_2(t)$.

This result is difficult to prove but it does allow us to define an inverse Laplace transform.

Definition. The inverse Laplace transform of a function $F(s)$ is defined by:
$$\mathcal{L}^{-1}\{F(s)\} = f(t) \iff F(s) = \mathcal{L}\{f(t)\}.$$

Fact 5.5. The inverse Laplace transform is linear.

1. L−1 { F (s) + G (s)} = L−1 { F (s)} + L−1 { G (s)}

2. L−1 {αF (s)} = αL−1 { F (s)}

Explanation. This follows from the linearity of the Laplace trans-


form.
There is an integral representation of the inverse Laplace transform,
$$\mathcal{L}^{-1}\{F(s)\} = \frac{1}{2\pi i}\lim_{T\to\infty}\int_{\gamma - iT}^{\gamma + iT} e^{st}F(s)\,ds,$$
but it is difficult to work with. Instead, whenever we want to calculate the inverse Laplace transform of some function we will try to express it as a linear combination of known transformed functions. It is often useful to do a partial fraction decomposition.
 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{1}{s^5}\right\}$.

Solution. By checking the table of known transformations we see that $\mathcal{L}\{t^n\} = \dfrac{n!}{s^{n+1}}$. That suggests the following approach.
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{1}{s^5}\right\} &= \mathcal{L}^{-1}\left\{\frac{1}{4!}\,\frac{4!}{s^{4+1}}\right\} \\
&= \frac{1}{4!}\mathcal{L}^{-1}\left\{\frac{4!}{s^{4+1}}\right\} \\
&= \frac{1}{4!}t^4
\end{align*}

 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{1}{s^2 - 1}\right\}$.

Solution. We check the table of known transformations but this time there is nothing immediately useful so we do a partial fraction expansion.
\begin{align*}
\frac{1}{s^2 - 1} &= \frac{1}{(s - 1)(s + 1)} = \frac{\alpha}{s - 1} + \frac{\beta}{s + 1} \\
1 &= \alpha(s + 1) + \beta(s - 1) \\
&\Rightarrow \alpha = \frac{1}{2} \text{ and } \beta = -\frac{1}{2}
\end{align*}
So $\dfrac{1}{s^2 - 1} = \dfrac{1}{2}\dfrac{1}{s - 1} - \dfrac{1}{2}\dfrac{1}{s + 1}$. These two fractions do appear in the table.
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{1}{s^2 - 1}\right\} &= \mathcal{L}^{-1}\left\{\frac{1}{2}\frac{1}{s - 1} - \frac{1}{2}\frac{1}{s + 1}\right\} \\
&= \frac{1}{2}\mathcal{L}^{-1}\left\{\frac{1}{s - 1}\right\} - \frac{1}{2}\mathcal{L}^{-1}\left\{\frac{1}{s + 1}\right\} \\
&= \frac{1}{2}e^t - \frac{1}{2}e^{-t} \\
&= \sinh(t)
\end{align*}
 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{1}{s^2 + 64}\right\}$.

Solution. The table shows $\mathcal{L}\{\sin(8t)\} = \dfrac{8}{s^2 + 8^2}$ so $\mathcal{L}^{-1}\left\{\dfrac{1}{s^2 + 64}\right\} = \dfrac{1}{8}\sin(8t)$.
 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{3s + 5}{s^2 + 7}\right\}$.

Solution. From the table we see that $\mathcal{L}\{\cos(\sqrt{7}t)\} = \dfrac{s}{s^2 + 7}$ and $\mathcal{L}\{\sin(\sqrt{7}t)\} = \dfrac{\sqrt{7}}{s^2 + 7}$. Then
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{3s + 5}{s^2 + 7}\right\} &= \mathcal{L}^{-1}\left\{3\frac{s}{s^2 + 7} + \frac{5}{\sqrt{7}}\frac{\sqrt{7}}{s^2 + 7}\right\} \\
&= 3\mathcal{L}^{-1}\left\{\frac{s}{s^2 + 7}\right\} + \frac{5}{\sqrt{7}}\mathcal{L}^{-1}\left\{\frac{\sqrt{7}}{s^2 + 7}\right\} \\
&= 3\cos(\sqrt{7}t) + \frac{5}{\sqrt{7}}\sin(\sqrt{7}t).
\end{align*}
 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{3}{s^2 + 4s + 5}\right\}$.

Solution. We complete the square in the denominator: $s^2 + 4s + 5 = (s + 2)^2 + 1$, so
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{3}{s^2 + 4s + 5}\right\} &= 3\mathcal{L}^{-1}\left\{\frac{1}{(s + 2)^2 + 1}\right\} \\
&= 3e^{-2t}\sin(t)
\end{align*}
using the transform of $\sin(t)$ and Fact 5.2.
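Inverse transforms can also be checked with a computer algebra system. The sketch below is my own and assumes sympy; note that sympy attaches a Heaviside(t) factor to each answer, which simply records that the transform only describes the function for $t \geq 0$.

```python
# Checking some inverse transforms with sympy (my own check, not from the notes).
import sympy as sp

t, s = sp.symbols('t s', positive=True)

print(sp.inverse_laplace_transform(1 / s**5, s, t))              # t**4/24, times Heaviside(t)
print(sp.inverse_laplace_transform(1 / (s**2 - 1), s, t))        # equivalent to sinh(t) for t >= 0
print(sp.inverse_laplace_transform(3 / (s**2 + 4*s + 5), s, t))  # 3*exp(-2*t)*sin(t), times Heaviside(t)
```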

Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{a^2}{s(s^2 + a^2)}\right\}$.

Solution. To find the inverse transform we must first decompose into partial fractions.
\begin{align*}
\frac{a^2}{s(s^2 + a^2)} &= \frac{\alpha}{s} + \frac{\beta s + \gamma}{s^2 + a^2} \\
a^2 &= \alpha(s^2 + a^2) + (\beta s + \gamma)s \\
a^2 &= (\alpha + \beta)s^2 + \gamma s + \alpha a^2 \\
&\Rightarrow \alpha = 1,\ \beta = -1 \text{ and } \gamma = 0
\end{align*}
Then
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{a^2}{s(s^2 + a^2)}\right\} &= \mathcal{L}^{-1}\left\{\frac{1}{s} - \frac{s}{s^2 + a^2}\right\} \\
&= 1 - \cos(at)
\end{align*}

We will see later in the course that we can solve this problem without
doing a partial fraction decomposition. We could instead use the
convolution theorem.

Solving Differential Equations

Example. Solve $\dfrac{dy}{dx} - y = 4$ subject to $y(0) = 6$.

Solution. The solution is split into three steps as outlined in figure 5.1. First we transform this differential equation into an algebraic equation.
\begin{align*}
\frac{dy}{dx} - y &= 4 \\
\mathcal{L}\left\{\frac{dy}{dx} - y\right\} &= \mathcal{L}\{4\} \\
\mathcal{L}\left\{\frac{dy}{dx}\right\} - \mathcal{L}\{y\} &= \mathcal{L}\{4\} \\
s\mathcal{L}\{y\} - 6 - \mathcal{L}\{y\} &= \frac{4}{s}
\end{align*}
Next we solve this algebraic equation.
\begin{align*}
s\mathcal{L}\{y\} - 6 - \mathcal{L}\{y\} &= \frac{4}{s} \\
(s - 1)\mathcal{L}\{y\} &= \frac{4}{s} + 6 \\
\mathcal{L}\{y\} &= \frac{4}{s(s - 1)} + \frac{6}{s - 1} \\
\mathcal{L}\{y\} &= \frac{-4}{s} + \frac{10}{s - 1}
\end{align*}

Finally we transform back to find a solution to the original problem.
\begin{align*}
\mathcal{L}\{y\} &= \frac{-4}{s} + \frac{10}{s - 1} \\
\mathcal{L}^{-1}\{\mathcal{L}\{y\}\} &= \mathcal{L}^{-1}\left\{\frac{-4}{s} + \frac{10}{s - 1}\right\} \\
y &= \mathcal{L}^{-1}\left\{\frac{-4}{s}\right\} + \mathcal{L}^{-1}\left\{\frac{10}{s - 1}\right\} \\
y &= -4 + 10e^x
\end{align*}
At this stage we can check that we have done everything correctly by noting that $y(0) = -4 + 10e^0 = 6$ as required, and if we substitute this expression into the original differential equation we find that
$$10e^x - (-4 + 10e^x) = 4$$
so this really is the solution.
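The three steps of figure 5.1 can be mirrored in a computer algebra system. This sketch is my own (it assumes sympy is installed; the symbol Y simply stands for $\mathcal{L}\{y\}$).

```python
# Solving dy/dx - y = 4, y(0) = 6 by the transform method in sympy (my own sketch).
import sympy as sp

x, s = sp.symbols('x s', positive=True)
Y = sp.Symbol('Y')  # stands for L{y}

# Step 1: transform the equation, using L{y'} = s L{y} - y(0) with y(0) = 6.
algebraic_equation = sp.Eq(s*Y - 6 - Y, 4/s)

# Step 2: solve the algebraic equation for Y.
Y_solution = sp.solve(algebraic_equation, Y)[0]

# Step 3: invert the transform (apart() does the partial fraction decomposition).
y = sp.inverse_laplace_transform(sp.apart(Y_solution, s), s, x)
print(y)  # equivalent to 10*exp(x) - 4 for x >= 0 (sympy includes a Heaviside factor)
```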


Example. Solve $y'' - 6y' + 9y = t^2e^{3t}$ subject to $y(0) = 2$ and $y'(0) = 6$.

Solution. This is a little trickier than the previous example. We could solve it through variation of parameters, but we will use the Laplace transform.
\begin{align*}
\mathcal{L}\{y'' - 6y' + 9y\} &= \mathcal{L}\{t^2e^{3t}\} \\
\mathcal{L}\{y''\} - 6\mathcal{L}\{y'\} + 9\mathcal{L}\{y\} &= \mathcal{L}\{t^2e^{3t}\} \\
(s^2\mathcal{L}\{y\} - 2s - 6) - 6(s\mathcal{L}\{y\} - 2) + 9\mathcal{L}\{y\} &= \frac{2}{(s - 3)^3} \\
\mathcal{L}\{y\}(s^2 - 6s + 9) &= 2s + 6 - 12 + \frac{2}{(s - 3)^3} \\
\mathcal{L}\{y\}(s - 3)^2 &= 2(s - 3) + \frac{2}{(s - 3)^3} \\
\mathcal{L}\{y\} &= \frac{2}{s - 3} + \frac{2}{(s - 3)^5} \\
\mathcal{L}^{-1}\{\mathcal{L}\{y\}\} &= \mathcal{L}^{-1}\left\{\frac{2}{s - 3} + \frac{2}{(s - 3)^5}\right\} \\
y &= 2\mathcal{L}^{-1}\left\{\frac{1}{s - 3}\right\} + \frac{2}{4!}\mathcal{L}^{-1}\left\{\frac{4!}{(s - 3)^5}\right\} \\
y &= 2e^{3t} + \frac{1}{12}t^4e^{3t}
\end{align*}

Example. Solve $y'' + 16y = \cos(4t)$ subject to $y(0) = 0$ and $y'(0) = 1$.

Solution. Again, this is a problem that could be solved using alternative methods, e.g. variation of parameters, but we will use Laplace transforms.
\begin{align*}
\mathcal{L}\{y'' + 16y\} &= \mathcal{L}\{\cos(4t)\} \\
s^2\mathcal{L}\{y\} - 1 + 16\mathcal{L}\{y\} &= \frac{s}{s^2 + 4^2} \\
(s^2 + 16)\mathcal{L}\{y\} &= 1 + \frac{s}{s^2 + 16} \\
\mathcal{L}\{y\} &= \frac{1}{s^2 + 16} + \frac{s}{(s^2 + 16)^2} \\
y &= \frac{1}{4}\sin(4t) + \frac{1}{8}t\sin(4t)
\end{align*}
This differential equation models the behaviour of an undamped, driven simple harmonic oscillator. In this case the driving force, $\cos(4t)$, is matched to the resonant frequency of the system, which is why the solution “blows up”: the $\frac{1}{8}t\sin(4t)$ term means the amplitude of the oscillation grows without bound as $t \to \infty$.
The differential equations we have solved so far all have initial
conditions specified at t = 0. If the conditions on y are imposed at
some other time we must make the following adjustment.
Example. Solve $y' - y = 0$ subject to $y(1) = 4$.

Solution. This is a very simple differential equation, but the condition on $y$ is at $1$ rather than $0$ so it's not immediately obvious how we would apply Fact 5.3 to calculate the transform of $y'$. In cases like this we introduce a new condition $y(0) = \alpha$ (and $y'(0) = \beta$, $y''(0) = \gamma$ and so on as necessary).
\begin{align*}
\mathcal{L}\{y' - y\} &= 0 \\
\mathcal{L}\{y'\} - \mathcal{L}\{y\} &= 0 \\
s\mathcal{L}\{y\} - \alpha - \mathcal{L}\{y\} &= 0 \\
\mathcal{L}\{y\} &= \frac{\alpha}{s - 1} \\
y &= \alpha e^t
\end{align*}
Now we solve for $\alpha$ (and any other constants $\beta$, $\gamma$, ...) by using the given condition(s).
$$4 = \alpha e^1 \Rightarrow \alpha = \frac{4}{e}$$

Convolution

Definition. The convolution of two functions $f(t)$ and $g(t)$ is
$$(f * g)(t) = \int_0^t f(w)g(t - w)\,dw.$$

(There is a very good visual representation of what exactly the convolution is on Wikipedia.) We sometimes write $f(t) * g(t)$ for the convolution but this notation can be ambiguous when the arguments of the function are not just $t$.

Fact 5.6. The Convolution Theorem. Consider two functions $f(t)$ and $g(t)$ with Laplace transforms $F(s)$ and $G(s)$. Then
$$\mathcal{L}\{(f * g)(t)\} = F(s) \times G(s)$$
equivalently
$$(f * g)(t) = \mathcal{L}^{-1}\{F(s) \times G(s)\}.$$

A proof of the convolution theorem is beyond the scope of the


course. We will use the convolution theorem mostly in its second
statement, i.e. as a tool for calculating inverse transforms.
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{a^2}{s(s^2 + a^2)}\right\}$.
Solution. We rewrite the function to be transformed as a product and apply the Convolution theorem.
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{a^2}{s(s^2 + a^2)}\right\} &= \mathcal{L}^{-1}\left\{a\,\frac{a}{s^2 + a^2} \times \frac{1}{s}\right\} \\
&= a\left(\mathcal{L}^{-1}\left\{\frac{a}{s^2 + a^2}\right\} * \mathcal{L}^{-1}\left\{\frac{1}{s}\right\}\right) \\
&= a\,\sin(at) * 1 \\
&= a\int_0^t \sin(aw)\cdot 1\,dw \\
&= 1 - \cos(at)
\end{align*}
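Both the convolution integral and the convolution theorem can be checked symbolically. The sketch below is my own and assumes sympy is installed.

```python
# Checking (a sin(at)) * 1 and its transform with sympy (my own check).
import sympy as sp

t, s, w, a = sp.symbols('t s w a', positive=True)

convolution = sp.integrate(a * sp.sin(a*w) * 1, (w, 0, t))  # the convolution integral
print(sp.simplify(convolution))                              # 1 - cos(a*t)

# Its transform should equal F(s) x G(s) = (a/(s^2 + a^2)) x (1/s).
print(sp.simplify(sp.laplace_transform(convolution, t, s, noconds=True)
                  - a**2 / (s * (s**2 + a**2))))             # 0
```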

 
Example. Calculate $\mathcal{L}^{-1}\left\{\dfrac{1}{(s - 1)(s^2 + 1)}\right\}$ without doing a partial fraction expansion.

Solution. We will use the convolution theorem instead.
\begin{align*}
\mathcal{L}^{-1}\left\{\frac{1}{(s - 1)(s^2 + 1)}\right\} &= \mathcal{L}^{-1}\left\{\frac{1}{s - 1} \times \frac{1}{s^2 + 1}\right\} \\
&= e^t * \sin(t)
\end{align*}
by application of the convolution theorem and the table of known transforms. This much is easy. Now we have to calculate the convolution of these two functions. To simplify the calculation we will first work with an indefinite integral, integrating by parts twice.
\begin{align*}
I &= \int e^w\sin(t - w)\,dw \\
&= e^w\sin(t - w) + \int e^w\cos(t - w)\,dw \\
&= e^w\sin(t - w) + e^w\cos(t - w) - \underbrace{\int e^w\sin(t - w)\,dw}_{I} \\
2I &= e^w\left(\sin(t - w) + \cos(t - w)\right) \\
\Rightarrow e^t * \sin(t) = [I]_0^t &= \frac{1}{2}e^t - \frac{1}{2}\sin(t) - \frac{1}{2}\cos(t)
\end{align*}
Fact 5.7. The convolution of $f$ with $g$ is the same as the convolution of $g$ with $f$, i.e.
$$(f * g)(t) = (g * f)(t).$$

Explanation. We will make the change of variable $v = t - w$ (hence $w = t - v$ and $dw = -dv$) to show equivalence.
\begin{align*}
(f * g)(t) &= \int_0^t f(w)g(t - w)\,dw \\
&= -\int_t^0 f(t - v)g(v)\,dv \\
&= \int_0^t g(v)f(t - v)\,dv \\
&= (g * f)(t)
\end{align*}

Step Functions

Definition. The unit step function $u(t)$ is
$$u(t) = \begin{cases} 0 & t < 0 \\ 1 & t \geq 0 \end{cases}.$$
It is sometimes called the Heaviside function and may be denoted by $H(t)$ or $\Theta(t)$.

Figure 5.2: The unit step function.

The unit step function is useful in engineering contexts because it allows us to “turn on” (and “turn off”) other functions which may represent current that starts flowing through a circuit, or a catalyst added to a reaction vessel, or a load transferred onto a structure, or any other situation where there is an abrupt change.
Example. Plot $f(t) = u(t)\sin(t)$.

Solution. When $t < 0$ the unit step function is “off” so the function $f(t)$ is zero. When $t \geq 0$ it is “on” and the function $f(t)$ is just $\sin(t)$.
$$f(t) = \begin{cases} 0 & t < 0 \\ \sin(t) & t \geq 0 \end{cases}$$

Figure 5.3: “Turning on” the sine function.

The unit step function $u(t)$ “turns on” at zero. If we want to “turn on” at a different point we can use a shifted unit step function $u(t - a)$
$$u(t - a) = \begin{cases} 0 & t < a \\ 1 & t \geq a \end{cases}.$$
By combining two shifted unit step functions we can “turn on” and then “turn off” at a later point.

Example. Plot $f(t) = u(t + 1) - u(t - 2)$.

Solution. This function is zero when $t$ is less than $-1$, one from $-1$ to $2$, and then zero again after that. This example is slightly unphysical. We normally restrict ourselves to non-negative time values.

Figure 5.4: “Turning on” and “turning off”.

Of particular interest to us are functions of the form $u(t - a)f(t - a)$, e.g. $u(t - 1)\sin(t - 1)$. These are functions that are shifted so that they “start” as the step function “turns on”.

Figure 5.5: “Turning on” and shifting the sine function: $u(t - 1)\sin(t - 1)$. Compare to Figure 5.3 above.

Fact 5.8. Let $F(s)$ be the Laplace transform of $f(t)$. For $a > 0$
$$\mathcal{L}\{u(t - a)f(t - a)\} = e^{-as}\mathcal{L}\{f(t)\} = e^{-as}F(s).$$
Equivalently,
$$\mathcal{L}^{-1}\{e^{-as}F(s)\} = u(t - a)f(t - a).$$


Explanation. We use the definition of the Laplace transform and make the change of variable $\tau = t - a$.
\begin{align*}
\mathcal{L}\{u(t - a)f(t - a)\} &= \int_0^\infty e^{-st}u(t - a)f(t - a)\,dt \\
&= \int_a^\infty e^{-st}f(t - a)\,dt \\
&= \int_0^\infty e^{-s(\tau + a)}f(\tau)\,d\tau \\
&= e^{-as}\int_0^\infty e^{-s\tau}f(\tau)\,d\tau \\
&= e^{-as}\mathcal{L}\{f(\tau)\} \\
&= e^{-as}F(s)
\end{align*}

Example. Find the Laplace transformations of the following functions.

1. $u(t - 5)$

2. $u(t - 2)(t - 2)^3$

3. $u(t - 2)t^2$

Solution. The first two are fairly straightforward. The third requires some manipulation.

1. $\mathcal{L}\{u(t - 5)\} = e^{-5s}\dfrac{1}{s}$

2. $\mathcal{L}\{u(t - 2)(t - 2)^3\} = e^{-2s}\dfrac{3!}{s^4}$

3. Start by rewriting $t^2$ as $((t - 2) + 2)^2$, apply Fact 5.8 and then square out the bracket.
\begin{align*}
\mathcal{L}\{u(t - 2)t^2\} &= \mathcal{L}\{u(t - 2)((t - 2) + 2)^2\} \\
&= e^{-2s}\mathcal{L}\{(t + 2)^2\} \\
&= e^{-2s}\mathcal{L}\{t^2 + 4t + 4\} \\
&= e^{-2s}\left(\frac{2!}{s^3} + \frac{4}{s^2} + \frac{4}{s}\right)
\end{align*}
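Fact 5.8 and the manipulation in part 3 can be checked with sympy's Heaviside function. This sketch is my own; the output may be arranged differently depending on the sympy version.

```python
# Checking L{u(t - 2) t^2} with sympy (my own check, not from the notes).
import sympy as sp

t, s = sp.symbols('t s', positive=True)

F = sp.laplace_transform(sp.Heaviside(t - 2) * t**2, t, s, noconds=True)
expected = sp.exp(-2*s) * (2/s**3 + 4/s**2 + 4/s)
print(sp.simplify(F - expected))  # 0
```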

Example. Find the inverse Laplace transform $\mathcal{L}^{-1}\left\{\dfrac{e^{-s}}{s^2 + 9}\right\}$.

Solution. We know that $\mathcal{L}^{-1}\left\{\dfrac{1}{s^2 + 9}\right\} = \dfrac{1}{3}\sin(3t)$. All that remains is to correct for the exponential using Fact 5.8.
$$\mathcal{L}^{-1}\left\{e^{-s}\frac{1}{s^2 + 9}\right\} = u(t - 1)\frac{1}{3}\sin\left(3(t - 1)\right)$$

Systems of Differential Equations


Laplace transformation can also be used to solve systems of differen-
tial equations.
Example. Solve the system of differential equations
\begin{align*}
x' + 4y &= 6 & x(0) &= 0 \\
y' + 4y - x &= 0 & y(0) &= 0
\end{align*}

Solution. Let $\mathcal{L}\{x(t)\} = X(s)$ and $\mathcal{L}\{y(t)\} = Y(s)$. We transform both differential equations. This gives us a system of two algebraic equations that we can solve.
\begin{align}
sX - 0 + 4Y &= \frac{6}{s} \tag{5.1}\\
sY - 0 + 4Y - X &= 0 \tag{5.2}
\end{align}
From the second equation we see that $X = (s + 4)Y$. Substitute into the first equation.
\begin{align*}
s(s + 4)Y + 4Y &= \frac{6}{s} \\
Y &= \frac{6}{s(s + 2)^2} \\
&= \frac{a}{s} + \frac{b}{s + 2} + \frac{c}{(s + 2)^2}
\end{align*}
Some algebra happens, giving
\begin{align*}
Y &= \frac{3/2}{s} - \frac{3/2}{s + 2} - \frac{3}{(s + 2)^2} \\
y(t) &= \mathcal{L}^{-1}\left\{\frac{3/2}{s}\right\} - \mathcal{L}^{-1}\left\{\frac{3/2}{s + 2}\right\} - \mathcal{L}^{-1}\left\{\frac{3}{(s + 2)^2}\right\} \\
&= \frac{3}{2} - \frac{3}{2}e^{-2t} - 3te^{-2t}
\end{align*}
We can find x (t) in two different ways. Either find X and transform
back, or substitute y(t) into the original differential equation.
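The same transform-solve-invert steps also work in a computer algebra system, and can be sketched as below (my own code, assuming sympy; the symbols X and Y stand for $\mathcal{L}\{x\}$ and $\mathcal{L}\{y\}$).

```python
# Solving the system with Laplace transforms in sympy (my own sketch).
import sympy as sp

t, s = sp.symbols('t s', positive=True)
X, Y = sp.symbols('X Y')

eq1 = sp.Eq(s*X + 4*Y, 6/s)      # transform of x' + 4y = 6 with x(0) = 0
eq2 = sp.Eq(s*Y + 4*Y - X, 0)    # transform of y' + 4y - x = 0 with y(0) = 0

solution = sp.solve([eq1, eq2], [X, Y])
y = sp.inverse_laplace_transform(sp.apart(solution[Y], s), s, t)
x = sp.inverse_laplace_transform(sp.apart(solution[X], s), s, t)
print(sp.simplify(y))  # 3/2 - 3*exp(-2*t)/2 - 3*t*exp(-2*t), up to a Heaviside factor
print(sp.simplify(x))  # the corresponding x(t)
```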
