Vector and Matrix Calculus
Jeffrey R. Chasnov
The Hong Kong University of Science and Technology
Department of Mathematics
Clear Water Bay, Kowloon
Hong Kong
This work is licensed under the Creative Commons Attribution 3.0 Hong Kong License. To view a copy of this
license, visit http://creativecommons.org/licenses/by/3.0/hk/ or send a letter to Creative Commons, 171 Second
Street, Suite 300, San Francisco, California, 94105, USA.
Preface
View the promotional video on YouTube
These are the lecture notes for my online Coursera course, Vector Calculus for Engineers. Students
who take this course are expected to already know single-variable differential and integral calculus to
the level of an introductory college calculus course. Students should also be familiar with matrices,
and be able to compute a three-by-three determinant.
I have divided these notes into chapters called Lectures, with each Lecture corresponding to a
video on Coursera. I have also uploaded all my Coursera videos to YouTube, and links are placed at
the top of each Lecture.
There are some problems at the end of each lecture chapter. These problems are designed to
exemplify the main ideas of the lecture. Students taking a formal university course in multivariable
calculus will usually be assigned many more problems, some of them quite difficult, but here I follow
the philosophy that less is more. I give enough problems for students to solidify their understanding
of the material, but not so many that students feel overwhelmed. I do encourage students to attempt
the given problems, but, if they get stuck, full solutions can be found in the Appendix. I have also
included practice quizzes as an additional source of problems, with solutions also given.
Jeffrey R. Chasnov
Hong Kong
October 2019
Contents

I Vectors
1 Vectors
2 Cartesian coordinates
3 Dot product
4 Cross product
5 Analytic geometry of lines
6 Analytic geometry of planes
7 Kronecker delta and Levi-Civita symbol
8 Vector identities
9 Scalar and vector fields

II Differentiation
10 Partial derivatives
11 The method of least squares
12 Chain rule
13 Triple product rule
14 Triple product rule: example
15 Gradient
16 Divergence
17 Curl
18 Laplacian
19 Vector derivative identities
20 Vector derivative identities (proof)
21 Electromagnetic waves

III Integration and Curvilinear Coordinates
22 Double and triple integrals
23 Example: Double integral with triangle base
24 Polar coordinates
25 Central force
26 Change of variables (single integral)
27 Change of variables (double integral)
28 Cylindrical coordinates
29 Spherical coordinates (Part A)
30 Spherical coordinates (Part B)
31 Line integral of a vector field
32 Surface integral of a vector field

IV Fundamental Theorems
33 Gradient theorem
34 Conservative vector fields
35 Divergence theorem
36 Divergence theorem: Example I
37 Divergence theorem: Example II
38 Continuity equation
39 Green's theorem
40 Stokes' theorem
41 Meaning of the divergence and the curl
42 Maxwell's equations

Appendix
Vectors
In this week's lectures, we learn about vectors. Vectors are line segments with both length and direction, and are fundamental to engineering mathematics. We will define vectors and show how to add and subtract them, and how to multiply them using the scalar and vector products (dot and cross products). We
use vectors to learn some analytical geometry of lines and planes, and introduce the Kronecker delta
and the Levi-Civita symbol to prove vector identities. The important concepts of scalar and vector
fields are discussed.
Lecture 1
Vectors
View this lecture on YouTube
We define a vector in three-dimensional Euclidean space as having a length (or magnitude) and a
direction. A vector is depicted as an arrow starting at one point in space and ending at another point.
All vectors that have the same length and point in the same direction are considered equal, no matter
where they are located in space. (Variables that are vectors will be denoted in print by boldface, and
in hand by an arrow drawn over the symbol.) In contrast, scalars have magnitude but no direction.
Zero can either be a scalar or a vector and has zero magnitude. The negative of a vector reverses its
direction. Examples of vectors are velocity and acceleration; examples of scalars are mass and charge.
Vectors can be added to each other and multiplied by scalars. A simple example is a mass m acted
on by two forces F1 and F2 . Newton’s equation then takes the form ma = F1 + F2 , where a is the
acceleration vector of the mass. Vector addition is commutative and associative:
A + B = B + A, (A + B) + C = A + (B + C);
and scalar multiplication is distributive:
k(A + B) = kA + kB.
Multiplication of a vector by a positive scalar changes the length of the vector but not its direction.
Vector addition can be represented graphically by placing the tail of one of the vectors on the head of
the other. Vector subtraction adds the first vector to the negative of the second. Notice that when the
tails of A and B are placed at the same point, the vector B − A points from the head of A to the head
of B, or equivalently, from the tail of A to the head of B.
[Figure: the parallelogram construction showing A + B = B + A, and the vector B − A drawn from the head of A to the head of B.]
2. Using vectors, prove that the line segment joining the midpoints of two sides of a triangle is parallel
to the third side and half its length.
Lecture 2
Cartesian coordinates
View this lecture on YouTube
To solve a physical problem, we usually impose a coordinate system. The familiar three-dimensional
x-y-z coordinate system is called the Cartesian coordinate system. Three mutually perpendicular
lines called axes intersect at a point called the origin, denoted as (0, 0, 0). All other points in three-
dimensional space are identified by their coordinates as ( x, y, z) in the standard way. The positive
directions of the axes are usually chosen to form a right-handed coordinate system. When one points
the right hand in the direction of the positive x-axis, and curls the fingers in the direction of the
positive y-axis, the thumb should point in the direction of the positive z-axis.
A vector has a length and a direction. If we impose a Cartesian coordinate system and place the
tail of a vector at the origin, then the head points to a specific point. For example, if the vector A has
head pointing to ( A1 , A2 , A3 ), we say that the x-component of A is A1 , the y-component is A2 , and
the z-component is A3 . The length of the vector A, denoted by |A|, is a scalar and is independent of
the orientation of the coordinate system. Application of the Pythagorean theorem in three dimensions
results in
|A| = √(A1² + A2² + A3²).
We can define standard unit vectors i, j and k, to be vectors of length one that point along the
positive directions of the x-, y-, and z-coordinate axes, respectively. Using these unit vectors, we write
A = A1 i + A2 j + A3 k.
With also B = B1i + B2j + B3k, vector addition and scalar multiplication are expressed component-wise as
A + B = (A1 + B1)i + (A2 + B2)j + (A3 + B3)k, kA = kA1i + kA2j + kA3k.
The position vector, r, is defined as the vector that points from the origin to the point ( x, y, z), and is
used to locate a specific point in space. It can be written in terms of the standard unit vectors as
r = xi + yj + zk.
A displacement vector is the difference between two position vectors. For position vectors r1 and r2 ,
the displacement vector that points from the head of r1 to the head of r2 is given by
r2 − r1 = (x2 − x1)i + (y2 − y1)j + (z2 − z1)k.
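These component operations are easy to check numerically. Here is a minimal Python sketch, assuming NumPy is available; the particular vectors are arbitrary examples:

```python
import numpy as np

# A vector is represented by its three components in the i, j, k basis.
A = np.array([1.0, 2.0, 3.0])

# Length from the three-dimensional Pythagorean theorem.
assert np.isclose(np.linalg.norm(A), np.sqrt(A[0]**2 + A[1]**2 + A[2]**2))

# Displacement vector between two position vectors r1 and r2.
r1, r2 = np.array([1.0, 1.0, 0.0]), np.array([3.0, 2.0, 5.0])
displacement = r2 - r1   # points from the head of r1 to the head of r2
print(displacement)      # [2. 1. 5.]
```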
b) Newton’s law of universal gravitation states that two point masses attract each other along the
line connecting them, with a force proportional to the product of their masses and inversely
proportional to the square of the distance between them. The magnitude of the force acting on
each mass is therefore
F = G m1m2/r²,
where m1 and m2 are the two masses, r is the distance between them, and G is the gravitational
constant. Let the masses m1 and m2 be located at the position vectors r1 and r2 . Write down the
vector form for the force acting on m1 due to its gravitational attraction to m2 .
Lecture 3
Dot product
View this lecture on YouTube
We define the dot product (or scalar product) between two vectors A = A1 i + A2 j + A3 k and
B = B1 i + B2 j + B3 k as
A · B = A1 B1 + A2 B2 + A3 B3 .
One can prove that the dot product is commutative, distributive over addition, and associative with
respect to scalar multiplication; that is,
A · B = B · A, A · (B + C) = A · B + A · C, A · (kB) = (kA) · B = k(A · B).
A geometric interpretation of the dot product is also possible. Given any two vectors A and B,
place the vectors tail-to-tail, and impose a coordinate system with origin at the tails such that A is
parallel to the x-axis and B lies in the x-y plane, as shown in the figure. The angle between the two
vectors is denoted as θ. With A = |A|i and B = |B| cos θ i + |B| sin θ j, one finds the coordinate-independent relationship
A · B = |A||B| cos θ.

[Figure: A and B placed tail-to-tail with angle θ between them; A lies along the x-axis.]
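As a quick numerical check of A · B = |A||B| cos θ, here is a short Python sketch, assuming NumPy; the vectors are chosen so that the angle between them is 45° by construction:

```python
import numpy as np

A = np.array([2.0, 0.0, 0.0])   # along the x-axis
B = np.array([1.0, 1.0, 0.0])   # in the x-y plane, 45 degrees from A
theta = np.pi / 4               # angle between A and B by construction

algebraic = np.dot(A, B)        # A1*B1 + A2*B2 + A3*B3
geometric = np.linalg.norm(A) * np.linalg.norm(B) * np.cos(theta)
assert np.isclose(algebraic, geometric)
```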
a) A · B = B · A;
b) A · (B + C ) = A · B + A · C;
2. Determine all the combinations of dot products between the standard unit vectors i, j, and k.
3. Let C = A − B. Calculate the dot product of C with itself and thus derive the law of cosines.
Lecture 4
Cross product
View this lecture on YouTube
We define the cross product (or vector product) between two vectors A = A1 i + A2 j + A3 k and
B = B1 i + B2 j + B3 k as
        | i   j   k  |
A × B = | A1  A2  A3 | = (A2B3 − A3B2)i + (A3B1 − A1B3)j + (A1B2 − A2B1)k.
        | B1  B2  B3 |
The three-by-three determinant is a useful mnemonic for remembering the formula. One can prove
that the cross product is anticommutative, distributive over addition, and associative with respect to
scalar multiplication; that is,
A × B = −B × A, A × (B + C) = A × B + A × C, A × (kB) = (kA) × B = k(A × B).

[Figure: A and B with angle θ between them; the height |B| sin θ of the parallelogram spanned by A and B.]
A geometric interpretation of the cross product is also possible. Given two vectors A and B with
angle θ between them, impose a coordinate system so that A is parallel to the x-axis and B lies in the
x-y plane. Then A = |A|i, B = |B| cos θ i + |B| sin θ j, and A × B = |A||B| sin θ k. The coordinate-independent relationship is
|A × B| = |A||B| sin θ,
where θ lies between zero and 180°. Furthermore, the vector A × B points in the direction perpendicular to the plane formed by A and B, and its sign is determined by the right-hand rule.
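The geometric properties of the cross product can be verified the same way; a minimal sketch, again assuming NumPy:

```python
import numpy as np

A = np.array([2.0, 0.0, 0.0])   # along the x-axis
B = np.array([1.0, 1.0, 0.0])   # in the x-y plane, 45 degrees from A
theta = np.pi / 4               # angle between A and B by construction

C = np.cross(A, B)              # (A2B3 - A3B2, A3B1 - A1B3, A1B2 - A2B1)
assert np.isclose(np.linalg.norm(C),
                  np.linalg.norm(A) * np.linalg.norm(B) * np.sin(theta))

# C is perpendicular to both A and B, along +k by the right-hand rule.
assert np.isclose(np.dot(C, A), 0.0) and np.isclose(np.dot(C, B), 0.0)
print(C)                        # [0. 0. 2.]
```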
a) A × B = −B × A;
b) A × (B + C) = A × B + A × C;
2. Determine all the combinations of cross products between the standard unit vectors i, j, and k.
3. Show that the cross product is not in general associative. That is, find an example using unit vectors
such that
A × (B × C) ≠ (A × B) × C.
a) A · B = B · A
b) A + (B + C) = (A + B) + C
c) A × (B × C) = (A × B) × C
d) A · (B + C) = A · B + A · C
a) a2b3 − a3b2
b) a3b1 − a1b3
c) a1b2 − a2b1
d) a1b3 − a3b1
a) i × (j × k)
b) (i × j) × k
c) (i × i) × j
d) i × (i × j)
Lecture 5
Analytic geometry of lines
View this lecture on YouTube
In two dimensions, the equation for a line in slope-intercept form is y = mx + b, and in point-slope
form is y − y1 = m(x − x1). In three dimensions, a line is most commonly expressed as a parametric
equation.
Suppose that a line passes through a point with position vector r0 and in a direction parallel to
the vector u. Then, from the definition of vector addition, we can specify the position vector r for any
point on the line by
r = r0 + ut,
x = x0 + u1 t, y = y0 + u2 t, z = z0 + u3 t;
Example: Find the parametric equation for a line that passes through the points (1, 2, 3) and (3, 2, 1). Determine
the intersection point of the line with the z = 0 plane.
To find a vector parallel to the direction of the line, we first compute the displacement vector between
the two given points:
u = (3 − 1)i + (2 − 2)j + (1 − 3)k = 2i − 2k.
Choosing a point on the line with position vector r0 = i + 2j + 3k, the parametric equation for the
line is given by
r = (1 + 2t)i + 2j + (3 − 2t)k.
The line crosses the z = 0 plane when 3 − 2t = 0, or t = 3/2, where (x, y) = (4, 2).
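This example is simple to script; a short Python sketch, assuming NumPy (the variable names are just illustrative):

```python
import numpy as np

p1, p2 = np.array([1.0, 2.0, 3.0]), np.array([3.0, 2.0, 1.0])
r0, u = p1, p2 - p1        # point on the line and direction vector

t = -r0[2] / u[2]          # solve z(t) = 3 - 2t = 0 for t
print(t, r0 + t * u)       # 1.5 [4. 2. 0.]
```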
Lecture 6
Analytic geometry of planes
View this lecture on YouTube
A plane in three-dimensional space is determined by three non-collinear points. Two linearly indepen-
dent displacement vectors with direction parallel to the plane can be formed from these three points,
and the cross-product of these two displacement vectors will be a vector that is orthogonal to the
plane. We can use the dot product to express this orthogonality.
So let three points that define a plane be located by the position vectors r1, r2, and r3, and construct
any two displacement vectors, such as s1 = r2 − r1 and s2 = r3 − r2. A vector normal to the plane is
given by N = s1 × s2, and for any point in the plane with position vector r, and for any one of the
given position vectors ri, we have N · (r − ri) = 0. With r = xi + yj + zk, N = ai + bj + ck and
d = N · ri, the equation for the plane can be written as N · r = N · ri, or
ax + by + cz = d.
Notice that the coefficients of x, y and z are the components of the normal vector to the plane.
Example: Find an equation for the plane defined by the three points (2, 1, 1), (1, 2, 1), and (1, 1, 2). Determine
the equation for the line in the x-y plane formed by the intersection of this plane with the z = 0 plane.
To find two vectors parallel to the plane, we compute two displacement vectors from the three points:
s1 = (1 − 2)i + (2 − 1)j + (1 − 1)k = −i + j, s2 = (1 − 1)i + (1 − 2)j + (2 − 1)k = −j + k.
A vector normal to the plane is then
              |  i   j   k |
N = s1 × s2 = | −1   1   0 | = i + j + k.
              |  0  −1   1 |
(i + j + k ) · ( xi + yj + zk ) = (i + j + k ) · (2i + j + k ), or x + y + z = 4.
The intersection of this plane with the z = 0 plane forms the line given by y = −x + 4.
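The normal-vector construction in this example is easy to reproduce numerically; a minimal NumPy sketch:

```python
import numpy as np

r1, r2, r3 = np.array([2., 1., 1.]), np.array([1., 2., 1.]), np.array([1., 1., 2.])

N = np.cross(r2 - r1, r3 - r2)   # normal vector to the plane
d = np.dot(N, r1)                # N . r = d for every point r in the plane
print(N, d)                      # [1. 1. 1.] 4.0

for r in (r1, r2, r3):           # all three defining points satisfy x + y + z = 4
    assert np.isclose(np.dot(N, r), d)
```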
a) ti + (1 + t)j + (1 + 2t)k
b) ti + (1 − t)j + (1 + 2t)k
c) ti + (1 + t)j + (1 − 2t)k
d) ti + (1 − t)j + (1 − 2t)k
a) y = x
b) y = x + 1
c) y = x − 1
d) y = 1
Lecture 7
Kronecker delta and Levi-Civita symbol
View this lecture on YouTube

We define the Kronecker delta δij to be +1 if i = j and 0 otherwise, and the Levi-Civita symbol εijk
to be +1 if i, j, and k are a cyclic permutation of (1, 2, 3) (that is, one of (1, 2, 3), (2, 3, 1) or (3, 1, 2));
−1 if an anticyclic permutation of (1, 2, 3) (that is, one of (3, 2, 1), (2, 1, 3) or (1, 3, 2)); and 0 if any two
indices are equal. More formally,

δij = { 1, if i = j;
        0, if i ≠ j;

and

εijk = { +1, if (i, j, k) is (1, 2, 3), (2, 3, 1) or (3, 1, 2);
         −1, if (i, j, k) is (3, 2, 1), (2, 1, 3) or (1, 3, 2);
          0, if i = j, or j = k, or k = i.
For convenience, we will use the Einstein summation convention when working with these symbols,
where a repeated index implies summation over that index. For example, δii = δ11 + δ22 + δ33 = 3; and
εijk εijk = 6, where we have summed over i, j, and k. This latter expression contains a total of 3³ = 27
terms in the sum, where six of the terms are equal to one and the remaining terms are equal to zero.
There is a remarkable relationship between the product of Levi-Civita symbols and the Kronecker
delta, given by the determinant
             | δil  δim  δin |
εijk εlmn =  | δjl  δjm  δjn | .
             | δkl  δkm  δkn |
The Kronecker delta, Levi-Civita symbol, and the Einstein summation convention are used to
derive some common vector identities. The dot product is written as A · B = AiBi, and the Levi-Civita symbol is used to write the ith component of the cross product as [A × B]i = εijk Aj Bk, which
can be made self-evident by explicitly writing out the three components. The Kronecker delta finds
use as δij Aj = Ai.
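The Einstein summation convention maps directly onto NumPy's einsum function. Here is a short sketch that builds δij and εijk explicitly and checks the statements above (the arrays index 0, 1, 2 rather than 1, 2, 3):

```python
import numpy as np

# Build the Levi-Civita symbol as a 3x3x3 array.
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0     # cyclic permutations
    eps[j, i, k] = -1.0    # one interchange flips the sign

delta = np.eye(3)          # the Kronecker delta

# Repeated indices are summed over, as in the Einstein convention.
assert np.einsum('ii', delta) == 3.0           # delta_ii = 3
assert np.einsum('ijk,ijk', eps, eps) == 6.0   # eps_ijk eps_ijk = 6

# The ith component of the cross product: [A x B]_i = eps_ijk A_j B_k.
A, B = np.array([1., 2., 3.]), np.array([4., 5., 6.])
assert np.allclose(np.einsum('ijk,j,k', eps, A, B), np.cross(A, B))
```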
a) δij Aj = Ai;
4. Given the most general identity relating the Levi-Civita symbol to the Kronecker delta,
εijk εlmn = δil(δjm δkn − δjn δkm) − δim(δjl δkn − δjn δkl) + δin(δjl δkm − δjm δkl),
Lecture 8
Vector identities
View this lecture on YouTube

Three useful vector identities are
A · (B × C) = B · (C × A) = C · (A × B), (8.1)
A × (B × C) = (A · C)B − (A · B)C, (8.2)
(A × B) · (C × D) = (A · C)(B · D) − (A · D)(B · C). (8.3)
Parentheses are optional when expressions have only one possible interpretation, but for clarity they
are often written. Proofs of these vector identities make use of the following Kronecker delta and
Levi-Civita identities: εijk = εjki = εkij; εijk εimn = δjm δkn − δjn δkm; and δij Aj = Ai.
The first identity, called the scalar triple product, can be proved using the cyclic property of the
Levi-Civita tensor:
Ai εijk Bj Ck = Bj εjki Ck Ai = Ck εkij Ai Bj.
Another proof writes the scalar triple product as the three-by-three determinant
              | A1  A2  A3 |
A · (B × C) = | B1  B2  B3 | ,
              | C1  C2  C3 |
and uses the property that the determinant changes sign under row interchange. The scalar triple
product is also the volume of the parallelepiped defined by the three vectors.
The second identity, called the vector triple product, can be proved by writing the ith component
as
[A × (B × C)]i = εijk Aj (B × C)k = εijk εklm Aj Bl Cm = (δil δjm − δim δjl) Aj Bl Cm
               = Bi (Aj Cj) − Ci (Aj Bj) = [(A · C)B − (A · B)C]i.
The third identity, called the scalar quadruple product, has proof
(A × B) · (C × D) = εijk Aj Bk εilm Cl Dm = (δjl δkm − δjm δkl) Aj Bk Cl Dm
                  = (A · C)(B · D) − (A · D)(B · C).
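All three identities can be spot-checked numerically with randomly drawn vectors; a minimal NumPy sketch (a check, of course, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C, D = rng.standard_normal((4, 3))

# (8.1) Scalar triple product.
s = np.dot(A, np.cross(B, C))
assert np.isclose(s, np.dot(B, np.cross(C, A)))
assert np.isclose(s, np.dot(C, np.cross(A, B)))

# (8.2) Vector triple product.
assert np.allclose(np.cross(A, np.cross(B, C)),
                   np.dot(A, C) * B - np.dot(A, B) * C)

# (8.3) Scalar quadruple product.
assert np.isclose(np.dot(np.cross(A, B), np.cross(C, D)),
                  np.dot(A, C) * np.dot(B, D) - np.dot(A, D) * np.dot(B, C))
```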
A × (B × C) + B × (C × A) + C × (A × B) = 0.
|A × B|² = |A|²|B|² − (A · B)².
a) B × (C × A)
b) A × (C × B)
c) (A × B) × C
d) (C × B) × A
a) A · (B × B)
b) A · (A × B)
c) A × (A × B)
d) B · (A × B)
Lecture 9
Scalar and vector fields
View this lecture on YouTube
In some physical problems scalars and vectors can be functions of both space and time. We call
these types of variables fields. For example, the temperature in some spatial domain is a scalar field,
and we can write
T (r, t) = T ( x, y, z; t),
where we use the common notation of a semicolon on the right-hand-side to separate the space and
time dependence. Notice that the position vector r is used to locate the temperature in space. As
another example, the velocity vector u of a flowing fluid is a vector field, and we can write
u(r, t) = u1(x, y, z; t)i + u2(x, y, z; t)j + u3(x, y, z; t)k.
The equations governing a field are sometimes called the field equations, and these equations com-
monly take the form of partial differential equations. For example, the equations for the electric and
magnetic vector fields are the famous Maxwell’s equations, and the equation for the velocity vector
field is called the Navier-Stokes equation. The equation for the scalar field (called the wave function)
in non-relativistic quantum mechanics is called the Schrödinger equation.
B(x, y) = (−y i + x j)/(x² + y²),
Differentiation
In this week’s lectures, we learn about the derivatives of scalar and vector fields. We define the partial
derivative and derive the method of least squares as a minimization problem. We learn how to use
the chain rule for a function of several variables, and derive the triple product rule used in chem-
istry. From the del differential operator, we define the gradient, divergence, curl and Laplacian. We
learn some useful vector derivative identities and how to derive them using the Kronecker delta and
Levi-Civita symbol. Vector identities are then used to derive the electromagnetic wave equation from
Maxwell’s equations in free space. Electromagnetic waves are fundamental to all modern communica-
tion technologies.
Lecture 10
Partial derivatives
View this lecture on YouTube
For a function f = f ( x, y) of two variables, we define the partial derivative of f with respect to x
as
∂f/∂x = lim_{h→0} [f(x + h, y) − f(x, y)]/h,
and similarly for the partial derivative of f with respect to y. To take a partial derivative with respect
to a variable, take the derivative with respect to that variable treating all other variables as constants.
As an example, consider
f(x, y) = 2x³y² + y³.
We have
∂f/∂x = 6x²y², ∂f/∂y = 4x³y + 3y².
Second derivatives are defined as the derivatives of the first derivatives, so we have
∂²f/∂x² = 12xy², ∂²f/∂y² = 4x³ + 6y;
and for twice continuously differentiable functions, the mixed second partial derivatives are independent of
the order in which the derivatives are taken,
∂²f/∂x∂y = 12x²y = ∂²f/∂y∂x.
To simplify notation, we introduce the standard subscript notation for partial derivatives,
fx = ∂f/∂x, fy = ∂f/∂y, fxx = ∂²f/∂x², fxy = ∂²f/∂x∂y, fyy = ∂²f/∂y², etc.
The Taylor series of f ( x, y) about the origin is developed by expanding the function in a multivariable
power series that agrees with the function value and all its partial derivatives at the origin. We have
f(x, y) = f + fx x + fy y + (1/2!)(fxx x² + 2fxy xy + fyy y²) + …,
where the function and all its partial derivatives on the right-hand side are evaluated at the origin and
are constants.
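The derivatives in this example can be checked symbolically; a short sketch, assuming SymPy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 2*x**3*y**2 + y**3

print(sp.diff(f, x))      # 6*x**2*y**2
print(sp.diff(f, y))      # 4*x**3*y + 3*y**2
print(sp.diff(f, x, 2))   # 12*x*y**2
# Mixed partials agree, independent of the order of differentiation.
assert sp.diff(f, x, y) == sp.diff(f, y, x) == 12*x**2*y
```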
f(x, y, z) = 1/(x² + y² + z²)ⁿ.
2. Given the function f = f (t, x ), find the Taylor series expansion of the expression
Lecture 11
The method of least squares
View this lecture on YouTube

Suppose there are n data points (xi, yi) to be fit by a straight line y = β0 + β1x. The least-squares fit minimizes the sum of the squared residuals,
f(β0, β1) = Σ (β0 + β1 xi − yi)²,
where the sum runs from i = 1 to n.
Here, the data is assumed given and the unknowns are the fitting parameters β0 and β1. It should be
clear from the problem specification that there must be values of β0 and β1 that minimize the function
f = f(β0, β1). To determine these values, we set ∂f/∂β0 = ∂f/∂β1 = 0. This results in the equations
Σ (β0 + β1 xi − yi) = 0, Σ xi(β0 + β1 xi − yi) = 0;
or, rearranging into the so-called normal equations,
β0 n + β1 Σ xi = Σ yi, β0 Σ xi + β1 Σ xi² = Σ xi yi.
Solving this linear system yields
β0 = (Σ xi² Σ yi − Σ xi yi Σ xi) / (n Σ xi² − (Σ xi)²), β1 = (n Σ xi yi − (Σ xi)(Σ yi)) / (n Σ xi² − (Σ xi)²),
where all sums run from i = 1 to n.
Lecture 12
Chain rule
View this lecture on YouTube
Partial derivatives are used in applying the chain rule to a function of several variables. Consider
a two-dimensional scalar field f = f(x, y), and define the total differential of f to be
df = (∂f/∂x) dx + (∂f/∂y) dy.
If x = x(t) and y = y(t), then dividing by dt we can write df/dt as
df/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt).
And if one has f = f(x(r, θ), y(r, θ)), say, then the corresponding chain rule is given by
∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r), ∂f/∂θ = (∂f/∂x)(∂x/∂θ) + (∂f/∂y)(∂y/∂θ).
Example: Consider the differential equation dx/dt = u(t, x(t)). Determine a formula for d²x/dt² in terms of u and its
partial derivatives.
Differentiating and applying the chain rule, we have
d²x/dt² = ∂u/∂t + (∂u/∂x)(dx/dt)
        = ∂u/∂t + u ∂u/∂x.
The above formula is called the material derivative and in three dimensions forms a part of the Navier-
Stokes equation for fluid flow.
b) Eliminate x and y in favor of r and θ and compute the partial derivatives directly.

Lecture 13
Triple product rule
View this lecture on YouTube
Suppose that three variables x, y and z are related by the equation f ( x, y, z) = 0, and that it is possible
to write x = x(y, z) and z = z(x, y). Taking differentials of x and z, we have
dx = (∂x/∂y) dy + (∂x/∂z) dz, dz = (∂z/∂x) dx + (∂z/∂y) dy.
We can make use of the second equation to eliminate dz in the first equation to obtain
dx = (∂x/∂y) dy + (∂x/∂z)[(∂z/∂x) dx + (∂z/∂y) dy];
or, collecting terms,
[1 − (∂x/∂z)(∂z/∂x)] dx = [(∂x/∂y) + (∂x/∂z)(∂z/∂y)] dy.
Since dx and dy are independent variations, the terms in parentheses must be zero. The left-hand side
results in the reciprocity relation
(∂x/∂z)(∂z/∂x) = 1,
which states the intuitive result that ∂z/∂x and ∂x/∂z are multiplicative inverses of each other. The
right-hand side results in
∂x/∂y = −(∂x/∂z)(∂z/∂y),
which, when making use of the reciprocity relation, yields the counterintuitive triple product rule,
(∂x/∂y)(∂y/∂z)(∂z/∂x) = −1.
Lecture 14
Triple product rule: example
View this lecture on YouTube

Example: Demonstrate the triple product rule using the ideal gas law
PV = nRT,
where P is the pressure, V is the volume, T is the absolute temperature, n is the number of moles of
the gas, and R is the ideal gas constant. We say P, V and T are the state variables, and the ideal gas
law is a relation of the form
f(P, V, T) = PV − nRT = 0.
Solving for each state variable in terms of the other two gives
P = nRT/V, V = nRT/P, T = PV/(nR);
with partial derivatives
∂P/∂V = −nRT/V², ∂V/∂T = nR/P, ∂T/∂P = V/(nR).
The triple product rule is then confirmed:
(∂P/∂V)(∂V/∂T)(∂T/∂P) = (−nRT/V²)(nR/P)(V/(nR)) = −nRT/(PV) = −1,
where we make use of the ideal gas law in the last equality.
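A symbolic check of this example, assuming SymPy:

```python
import sympy as sp

P, V, T, n, R = sp.symbols('P V T n R', positive=True)

dPdV = sp.diff(n*R*T/V, V)     # dP/dV at fixed T:  -nRT/V^2
dVdT = sp.diff(n*R*T/P, T)     # dV/dT at fixed P:   nR/P
dTdP = sp.diff(P*V/(n*R), P)   # dT/dP at fixed V:   V/(nR)

product = sp.simplify(dPdV * dVdT * dTdP)   # -nRT/(PV)
# Substituting the ideal gas law T = PV/(nR) gives -1.
assert sp.simplify(product.subs(T, P*V/(n*R))) == -1
```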
a) 2(x + y)/(x² + y² + z²)^(5/2)
b) (x + y)²/(x² + y² + z²)^(5/2)
c) (x² + y²)/(x² + y² + z²)^(5/2)
d) 3xy/(x² + y² + z²)^(5/2)
2. The least-squares line through the data points (0, 1), (1, 3), (2, 3) and (3, 4) is given by
a) y = 7/5 + 9x/10
b) y = 5/7 + 9x/10
c) y = 7/5 + 10x/9
d) y = 5/7 + 10x/9
3. Let f = f(x, y) with x = r cos θ and y = r sin θ. Which of the following is true?
a) ∂f/∂θ = x ∂f/∂x + y ∂f/∂y
b) ∂f/∂θ = −x ∂f/∂x + y ∂f/∂y
c) ∂f/∂θ = y ∂f/∂x + x ∂f/∂y
d) ∂f/∂θ = −y ∂f/∂x + x ∂f/∂y
Lecture 15
Gradient
View this lecture on YouTube
Consider the three-dimensional scalar field f = f(x, y, z), and the differential df, given by
df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.
Using the del operator
∇ = i ∂/∂x + j ∂/∂y + k ∂/∂z,
we can write df = ∇f · dr, where the gradient of f is the vector field
∇f = (∂f/∂x)i + (∂f/∂y)j + (∂f/∂z)k.

Example: Compute the gradient of f(x, y, z) = xyz. Differentiating in turn with respect to x, y and z, we obtain
∇f = yz i + xz j + xy k.
a) φ(x, y, z) = x² + y² + z²;
b) φ(x, y, z) = 1/√(x² + y² + z²).
Solutions to the Problems
Lecture 16
Divergence
View this lecture on YouTube
The divergence of a vector field u = u1i + u2j + u3k is the scalar field
∇ · u = ∂u1/∂x + ∂u2/∂y + ∂u3/∂z.
Here, the dot product is used between a vector differential operator ∇ and a vector field u. The divergence measures how much a vector field spreads out, or diverges, from a point. A more math-based
description will be given later.

Example: Compute the divergence of F = r/|r|³, that is,
F = F1i + F2j + F3k = x/(x² + y² + z²)^(3/2) i + y/(x² + y² + z²)^(3/2) j + z/(x² + y² + z²)^(3/2) k.
Differentiating the first component,
∂F1/∂x = 1/(x² + y² + z²)^(3/2) − 3x²/(x² + y² + z²)^(5/2),
and analogous results for ∂F2/∂y and ∂F3/∂z. Adding the three derivatives results in
∇ · F = 3/|r|³ − 3(x² + y² + z²)/|r|⁵ = 3/|r|³ − 3/|r|³ = 0.
Lecture 17
Curl
View this lecture on YouTube
The curl of a vector field u = u1i + u2j + u3k is the vector field
        |  i      j      k   |
∇ × u = | ∂/∂x   ∂/∂y   ∂/∂z | = (∂u3/∂y − ∂u2/∂z)i + (∂u1/∂z − ∂u3/∂x)j + (∂u2/∂x − ∂u1/∂y)k.
        |  u1     u2     u3  |
Here, the cross product is used between a vector differential operator and a vector field. The curl
measures how much a vector field rotates, or curls, around a point. A more math-based description
will be given later.
Example: Show that the curl of a gradient is zero, that is, ∇ × (∇f) = 0.
We have
           |   i       j       k   |
∇ × (∇f) = |  ∂/∂x    ∂/∂y    ∂/∂z |
           | ∂f/∂x   ∂f/∂y   ∂f/∂z |
         = (∂²f/∂y∂z − ∂²f/∂z∂y)i + (∂²f/∂z∂x − ∂²f/∂x∂z)j + (∂²f/∂x∂y − ∂²f/∂y∂x)k = 0,
using the equality of mixed partial derivatives.
Lecture 18
Laplacian
View this lecture on YouTube
The Laplacian is the differential operator ∇ · ∇ = ∇², given by
∇² = ∂²/∂x² + ∂²/∂y² + ∂²/∂z².
The Laplacian can be applied to either a scalar field or a vector field. The Laplacian applied to a scalar
field, f = f ( x, y, z), can be written as the divergence of the gradient, that is,
∇²f = ∇ · (∇f) = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z².
The Laplacian applied to a vector field, acts on each component of the vector field separately. With
u = u1 ( x, y, z)i + u2 ( x, y, z)j + u3 ( x, y, z)k, we have
∇²u = ∇²u1 i + ∇²u2 j + ∇²u3 k.
The Laplacian appears in some classic partial differential equations. The Laplace equation, wave
equation, and diffusion equation all contain the Laplacian and are given, respectively, by
∇²Φ = 0, ∂²Φ/∂t² = c²∇²Φ, ∂Φ/∂t = D∇²Φ.

Example: Compute the Laplacian of f(x, y, z) = x² + y² + z².
We have ∇²f = 2 + 2 + 2 = 6.
Lecture 19
Vector derivative identities
View this lecture on YouTube

With f a scalar field and u and v vector fields, some useful vector derivative identities are
∇ × (∇f) = 0, ∇ · (∇ × u) = 0,
∇ × (∇ × u) = ∇(∇ · u) − ∇²u,
∇ · (f u) = u · ∇f + f ∇ · u,
∇ × (f u) = ∇f × u + f ∇ × u,
∇(u · v) = (u · ∇)v + (v · ∇)u + u × (∇ × v) + v × (∇ × u),
∇ · (u × v) = v · (∇ × u) − u · (∇ × v),
∇ × (u × v) = u(∇ · v) − v(∇ · u) + (v · ∇)u − (u · ∇)v.
Here, the operator u · ∇ is defined as
u · ∇ = u1 ∂/∂x1 + u2 ∂/∂x2 + u3 ∂/∂x3,
so that, acting on a scalar field f,
u · ∇f = u1 ∂f/∂x1 + u2 ∂f/∂x2 + u3 ∂f/∂x3;
and (u · ∇)v is obtained by applying this operator to each component of v.
In some of these identities, the parentheses are optional when the expression has only one possible
interpretation. For example, it is common to see (u · ∇)v written as u · ∇v. The parentheses are
mandatory when the expression can be interpreted in more than one way; for example, ∇ × u × v
could mean either ∇ × (u × v) or (∇ × u) × v, and these two expressions are usually not equal.
Proof of all of these identities is most readily done by manipulating the Kronecker delta and Levi-
Civita symbols, and I give an example in the next lecture.
Lecture 20
Vector derivative identities (proof)
View this lecture on YouTube
To prove the vector derivative identities, we use component notation, the Einstein summation con-
vention, the Levi-Civita symbol and the Kronecker delta. The ith component of the curl of a vector
field is written using the Levi-Civita symbol as
(∇ × u)i = εijk ∂uk/∂xj;
and the divergence of a vector field is written as
∇ · u = ∂ui/∂xi.
The contraction of the Kronecker delta with a vector field is given by δij uj = ui, and the Levi-Civita
symbol is invariant under cyclic permutation of its indices, that is, εijk = εjki = εkij. If only two
indices are interchanged, then the symbol changes sign; for example, εijk = −εjik. Furthermore, a
useful identity when a vector derivative identity contains two cross products is
εijk εimn = δjm δkn − δjn δkm.

Example: Prove the vector derivative identity
∇ · (u × v) = v · (∇ × u) − u · (∇ × v).
We have
∇ · (u × v) = ∂/∂xi (εijk uj vk)
            = εijk (∂uj/∂xi) vk + εijk uj (∂vk/∂xi)
            = vk εkij ∂uj/∂xi − uj εjik ∂vk/∂xi
            = v · (∇ × u) − u · (∇ × v).
The crucial step in the proof is the use of the product rule for the derivative. The rest of the proof just
requires facility with the notation and the manipulation of the indices of the Levi-Civita symbol.
a) ∇ · (f u) = u · ∇f + f ∇ · u;
b) ∇ × (∇ × u) = ∇(∇ · u) − ∇²u.
dr/dt = u(t, r(t)),
where
r = x1i + x2j + x3k, u = u1i + u2j + u3k.
a) Write down the differential equations for dx1 /dt, dx2 /dt and dx3 /dt;
b) Use the chain rule to determine formulas for d²x1/dt², d²x2/dt² and d²x3/dt²;
c) Write your solution for d²r/dt² as a vector equation using the ∇ differential operator.
Lecture 21
Electromagnetic waves
View this lecture on YouTube
Maxwell's equations in free space are most simply written using the del operator, and are given by
∇ · E = 0, ∇ · B = 0, ∇ × E = −∂B/∂t, ∇ × B = µ0ε0 ∂E/∂t.
Here I use the so-called SI units familiar to engineering students, where the constants ε0 and µ0 are
called the permittivity and permeability of free space, respectively.
From the four Maxwell’s equations, we would like to obtain a single equation for the electric field
E. To do so, we can make use of the curl of the curl identity
∇ × (∇ × E) = ∇(∇ · E) − ∇²E.
To obtain an equation for E, we take the curl of the third Maxwell's equation and commute the time
and space derivatives:
∇ × (∇ × E) = −∂(∇ × B)/∂t.
We apply the curl of the curl identity to obtain
∇(∇ · E) − ∇²E = −∂(∇ × B)/∂t,
and then apply the first Maxwell’s equation to the left-hand-side, and the fourth Maxwell’s equation
to the right-hand-side. Rearranging terms, we obtain the three-dimensional wave equation given by
∂²E/∂t² = c²∇²E,
with c the wave speed given by c = 1/√(µ0ε0) ≈ 3 × 10⁸ m/s. This is, of course, the speed of light in
vacuum.
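A quick symbolic check that a plane wave of this form satisfies the wave equation, assuming SymPy:

```python
import sympy as sp

z, t, c = sp.symbols('z t c', positive=True)
E = sp.sin(z - c*t)   # x-component of a plane wave moving at speed c

# E depends only on z and t, so the Laplacian reduces to d^2E/dz^2.
assert sp.simplify(sp.diff(E, t, 2) - c**2 * sp.diff(E, z, 2)) == 0
```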
a) ∇ × (∇f)
b) ∇ · (∇ × u)
c) ∇ · (∇f)
d) ∇ × (∇(∇ · u))
3. Suppose the electric field is given by E(r, t) = sin(z − ct)i. Then which of the following is a valid
free-space solution for the magnetic field B = B(r, t)?
a) B(r, t) = (1/c) sin(z − ct)i
b) B(r, t) = (1/c) sin(z − ct)j
c) B(r, t) = (1/c) sin(x − ct)i
d) B(r, t) = (1/c) sin(x − ct)j
Solutions to the Practice quiz
Week III
Integration and Curvilinear Coordinates
In this week’s lectures, we learn about integrating scalar and vector fields. Double and triple integrals
of scalar fields are taught, as are line integrals and surface integrals of vector fields. The important
technique of using curvilinear coordinates, namely polar coordinates in two dimensions, and cylin-
drical and spherical coordinates in three dimensions, is used to simplify problems with cylindrical
or spherical symmetry. The change of variables formula for multidimensional integrals using the
Jacobian of the transformation is explained.
Lecture 22
Double and triple integrals
View this lecture on YouTube

Double and triple integrals, written as
∬_A f(x, y) dA and ∭_V f(x, y, z) dV,
are the limits of the sums of ΔxΔy (or ΔxΔyΔz) multiplied by the integrand. A single integral is the
area under a curve y = f(x); a double integral is the volume under a surface z = f(x, y). A triple
integral is used, for example, to find the mass of an object by integrating over its density.
To perform a double or triple integral, the correct limits of integration need to be determined,
and the integral is performed as two (or three) single integrals. For example, an integration over a
rectangle in the x-y plane can be written as either
∫_{y0}^{y1} ∫_{x0}^{x1} f(x, y) dx dy or ∫_{x0}^{x1} ∫_{y0}^{y1} f(x, y) dy dx.
In the first double integral, the x integration is done first (holding y fixed), and the y integral is done
second. In the second, the order of integration is reversed. Either order of integration will give the
same result.
Example: Compute the volume of the surface z = x²y above the x-y plane with base given by a unit square with
vertices (0, 0), (1, 0), (1, 1), and (0, 1).
To find the volume, we integrate z = x²y over its base. The integral over the unit square is given by
either of the double integrals
∫₀¹ ∫₀¹ x²y dx dy or ∫₀¹ ∫₀¹ x²y dy dx.
In this case, an even simpler integration method separates the x and y dependence and writes
∫₀¹ ∫₀¹ x²y dx dy = (∫₀¹ x² dx)(∫₀¹ y dy) = (1/3)(1/2) = 1/6.
Lecture 23
Example: Double integral with triangle base
View this lecture on YouTube

Example: Compute the volume under the surface z = x²y above the triangular base in the x-y plane with vertices (0, 0), (1, 0), and (0, 1).

[Figure: the triangular region, bounded by the coordinate axes and the line y = 1 − x (equivalently, x = 1 − y).]

The integral over the triangle (see the figure) is given by either one of the double integrals
∫₀¹ ∫₀^{1−y} x²y dx dy or ∫₀¹ ∫₀^{1−x} x²y dy dx.
Computing the second integral, with the y integration done first, we have
∫₀¹ ∫₀^{1−x} x²y dy dx = ∫₀¹ [x²y²/2]_{y=0}^{y=1−x} dx = (1/2) ∫₀¹ x²(1 − x)² dx = (1/2) ∫₀¹ (x² − 2x³ + x⁴) dx
= (1/2) [x³/3 − x⁴/2 + x⁵/5]₀¹ = (1/2)(1/3 − 1/2 + 1/5) = 1/60.
a) 3.0 g
b) 1.5 g
c) 1.33 g
d) 1.0 g
3. The volume of the surface z = xy above the x-y plane with base given by the triangle with vertices
(0, 0), (1, 1), and (2, 0) is equal to
a) 1/6
b) 1/5
c) 1/4
d) 1/3
Solutions to the Practice quiz
Lecture 24
Polar coordinates
View this lecture on YouTube
In two dimensions, polar coordinates is the most commonly used curvilinear coordinate system. The
relationship between the Cartesian coordinates and polar coordinates is given by
x = r cos θ, y = r sin θ.

[Figure: a point at distance r from the origin and angle θ from the x-axis, with local unit vectors r̂ and θ̂.]
One defines unit vectors r̂ and θ̂ to be orthogonal and in the direction of increasing r and θ, respectively (see the figure). The r̂-θ̂ unit vectors are rotated an angle θ from the i-j unit vectors. Simple
trigonometry shows that
r̂ = cos θ i + sin θ j, θ̂ = −sin θ i + cos θ j.
It is important to remember that the directions of the unit vectors in curvilinear coordinates are not
fixed, but depend on their location. Here, r̂ = r̂(θ) and θ̂ = θ̂(θ), and by differentiating the polar unit
vectors, one can show that
dr̂/dθ = θ̂, dθ̂/dθ = −r̂.
The Cartesian partial derivatives can be transformed into polar form using the chain rule. Using
the relationship between the Cartesian coordinates and polar coordinates, we have for a scalar field
f = f(x(r, θ), y(r, θ)),
∂f/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) = cos θ ∂f/∂x + sin θ ∂f/∂y,
∂f/∂θ = (∂f/∂x)(∂x/∂θ) + (∂f/∂y)(∂y/∂θ) = −r sin θ ∂f/∂x + r cos θ ∂f/∂y;
and these two relations can be inverted to yield
∂f/∂x = cos θ ∂f/∂r − (sin θ/r) ∂f/∂θ, ∂f/∂y = sin θ ∂f/∂r + (cos θ/r) ∂f/∂θ.
a) Given
r̂ = cos θ i + sin θ j, θ̂ = −sin θ i + cos θ j,
solve for the unit vectors i and j.
b) Given
∂f/∂r = cos θ ∂f/∂x + sin θ ∂f/∂y, ∂f/∂θ = −r sin θ ∂f/∂x + r cos θ ∂f/∂y,
invert a two-by-two matrix to solve for ∂f/∂x and ∂f/∂y.
Lecture 25
Central force
View this lecture on YouTube

A central force is a force acting on a point mass and pointing directly towards a fixed point in space.
The origin of the coordinate system is chosen at this fixed point, and the axis orientated such that the
initial position and velocity of the mass lies in the x-y plane. The subsequent motion of the mass is
then two dimensional, and polar coordinates can be employed.
The position vector of the point mass in polar coordinates is given by
r = rr̂.
The velocity of the point mass is obtained by differentiating r with respect to time. The added dif-
ficulty here is that the unit vectors r̂ = r̂ (q (t)) and ✓ˆ = ✓ˆ (q (t)) are also functions of time. When
differentiating, we will need to use the chain rule in the form
dr̂/dt = (dr̂/dθ)(dθ/dt) = θ̇ θ̂, dθ̂/dt = (dθ̂/dθ)(dθ/dt) = −θ̇ r̂.
As is customary, here we will use the dot notation for the time derivative. For example, ẋ = dx/dt
and ẍ = d2 x/dt2 .
The velocity of the point mass is then given by
ṙ = ṙ r̂ + r dr̂/dt = ṙ r̂ + r θ̇ θ̂;
and differentiating again, the acceleration is
r̈ = r̈ r̂ + ṙ dr̂/dt + ṙ θ̇ θ̂ + r θ̈ θ̂ + r θ̇ dθ̂/dt
  = (r̈ − r θ̇²) r̂ + (2ṙ θ̇ + r θ̈) θ̂.
A central force can be written as F = f r̂, where usually f = f(r). Newton's equation mr̈ = F then
becomes the two equations
m(r̈ − r θ̇²) = f, m(2ṙ θ̇ + r θ̈) = 0.
The second equation is usually expressed as conservation of angular momentum, and after multipli-
cation by r, is written in the form
(d/dt)(mr²θ̇) = 0, or mr²θ̇ = constant.
l = r × p,
where r is the position vector of the mass and p = mṙ is the momentum of the mass. Show that
Lecture 26
Change of variables (single integral)
View this lecture on YouTube

Consider the single-variable integral
I = ∫_{x₀}^{x_f} f(x) dx.
Let u(s) be a differentiable and invertible function. We can change variables in this integral by letting
x = u(s) so that dx = u′(s) ds. The integral in the new variable s then becomes
I = ∫_{u⁻¹(x₀)}^{u⁻¹(x_f)} f(u(s)) u′(s) ds.
The key piece for us here is the transformation of the infinitesimal length dx = u′(s) ds.
We can be more concrete by examining a specific transformation. Consider the calculation of the
area of a circle of radius R, given by the integral
A = 4 ∫₀^R √(R² − x²) dx.
To more easily perform this integral, we can let x = R cos θ so that dx = −R sin θ dθ. The integral then
becomes
A = 4R² ∫₀^{π/2} √(1 − cos²θ) sin θ dθ = 4R² ∫₀^{π/2} sin²θ dθ = πR²,
where the minus sign from dx is absorbed by swapping the limits of integration, and the final integral
is done using sin²θ = (1 − cos 2θ)/2.
Lecture 27
Change of variables (double integral)
View this lecture on YouTube

Consider the double integral
∬_A f(x, y) dx dy
over an area A in the x-y plane. We would like to change variables from (x, y) to (s, t). For simplicity, we will write this change of
variables as x = x(s, t) and y = y(s, t). The region A in the x-y domain transforms into a region A′ in
the s-t domain, and the integrand becomes a function of the new variables s and t by the substitution
f(x, y) = f(x(s, t), y(s, t)). We now consider how to transform the infinitesimal area dx dy.
The transformation of dx dy is obtained by considering how an infinitesimal rectangle is trans-
formed into an infinitesimal parallelogram, and how the area of the two are related by the absolute
value of a determinant. The main result, which we do not derive here, is given by
dx dy = |det(J)| ds dt,
where J is the Jacobian matrix of the transformation,
J = ( ∂x/∂s  ∂x/∂t
      ∂y/∂s  ∂y/∂t ).
To be more concrete, we again calculate the area of a circle. Here, using a two-dimensional integral,
the area of a circle can be written as
A = ∬_A dx dy,
where the integral subscript A denotes the region in the x-y plane that defines the circle. To perform
this integral, we can change from Cartesian to polar coordinates. Let
x = r cos θ, y = r sin θ.
We have
dx dy = |det ( ∂x/∂r  ∂x/∂θ ; ∂y/∂r  ∂y/∂θ )| dr dθ = |det ( cos θ  −r sin θ ; sin θ  r cos θ )| dr dθ = r dr dθ.
The region in the r-θ plane that defines the circle is 0 ≤ r ≤ R and 0 ≤ θ ≤ 2π. The integral then
becomes
A = ∫₀^{2π} ∫₀^R r dr dθ = (∫₀^{2π} dθ)(∫₀^R r dr) = πR².
Suppose a circular disk of radius R has mass density ρ0 at its center and ρ1 at its edge, and its density
is a linear function of the distance from the center. Find the total mass of the disk.
2. Compute the Gaussian integral given by I = ∫_{−∞}^{∞} e^{−x²} dx. Use the well-known trick
I² = (∫_{−∞}^{∞} e^{−x²} dx)² = (∫_{−∞}^{∞} e^{−x²} dx)(∫_{−∞}^{∞} e^{−y²} dy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{−(x²+y²)} dx dy.
a) xi + yj
b) xi − yj
c) −yi + xj
d) yi + xj
2. dθ̂/dθ is equal to
a) r̂
b) −r̂
c) θ̂
d) −θ̂
3. Suppose a circular disk of radius 1 cm has mass density 10 g/cm² at its center, and 1 g/cm² at its
edge, and its density is a linear function of the distance from the center. The total mass of the disk is
equal to
a) 8.80 g
b) 10.21 g
c) 12.57 g
d) 17.23 g
Lecture 28
Cylindrical coordinates
View this lecture on YouTube
In cylindrical coordinates (r, φ, z), defined by x = r cos φ, y = r sin φ and z = z, the Laplacian is given by
∇² = ∂²/∂r² + (1/r) ∂/∂r + (1/r²) ∂²/∂φ² + ∂²/∂z²
   = (1/r) ∂/∂r (r ∂/∂r) + (1/r²) ∂²/∂φ² + ∂²/∂z².
The divergence and curl of a vector field, A = Ar(r, φ, z)ρ̂ + Aφ(r, φ, z)φ̂ + Az(r, φ, z)ẑ, are given by
∇ · A = (1/r) ∂(rAr)/∂r + (1/r) ∂Aφ/∂φ + ∂Az/∂z,
∇ × A = ρ̂ [(1/r) ∂Az/∂φ − ∂Aφ/∂z] + φ̂ [∂Ar/∂z − ∂Az/∂r] + ẑ [(1/r) ∂(rAφ)/∂r − (1/r) ∂Ar/∂φ].
∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z,
and
∂/∂x = cos φ ∂/∂r − (sin φ/r) ∂/∂φ, ∂/∂y = sin φ ∂/∂r + (cos φ/r) ∂/∂φ.
2. Compute ∇ · ρ̂ in two ways:
3. Using cylindrical coordinates, compute ∇ × ρ̂, ∇ · φ̂ and ∇ × φ̂.
Lecture 29
Spherical coordinates (Part A)
View this lecture on YouTube

In spherical coordinates (r, θ, φ), defined by
x = r sin θ cos φ, y = r sin θ sin φ, z = r cos θ,
the volume element is
dx dy dz = r² sin θ dr dθ dφ.
The spherical coordinate unit vectors can be written in terms of the Cartesian unit vectors by
r̂ = sin θ cos φ i + sin θ sin φ j + cos θ k,
θ̂ = cos θ cos φ i + cos θ sin φ j − sin θ k,
φ̂ = −sin φ i + cos φ j.
By differentiating the unit vectors, we can derive the sometimes useful identities
∂r̂/∂θ = θ̂, ∂θ̂/∂θ = −r̂, ∂φ̂/∂θ = 0;
∂r̂/∂φ = sin θ φ̂, ∂θ̂/∂φ = cos θ φ̂, ∂φ̂/∂φ = −sin θ r̂ − cos θ θ̂.
3. Consider a scalar field f = f(r) that depends only on the distance from the origin. Using dx dy dz =
r² sin θ dr dθ dφ, and an integration region V inside a sphere of radius R centered at the origin, show
that
∫_V f dV = 4π ∫₀^R r² f(r) dr.
4. Suppose a sphere of radius R has mass density ρ0 at its center, and ρ1 at its surface, and its density
is a linear function of the distance from the center. Find the total mass of the sphere. What is the
average density of the sphere?
Lecture 30
Spherical coordinates (Part B)
View this lecture on YouTube

First, we determine the gradient in spherical coordinates. Consider the scalar field f = f(r, θ, φ).
Our definition of a total differential is
df = (∂f/∂r) dr + (∂f/∂θ) dθ + (∂f/∂φ) dφ = ∇f · dr.
In spherical coordinates,
r = r r̂(θ, φ),
and using
∂r̂/∂θ = θ̂, ∂r̂/∂φ = sin θ φ̂,
we have
dr = dr r̂ + r (∂r̂/∂θ) dθ + r (∂r̂/∂φ) dφ = dr r̂ + r dθ θ̂ + r sin θ dφ φ̂.
Using the orthonormality of the unit vectors, we can therefore write df as
df = [(∂f/∂r) r̂ + (1/r)(∂f/∂θ) θ̂ + (1/(r sin θ))(∂f/∂φ) φ̂] · (dr r̂ + r dθ θ̂ + r sin θ dφ φ̂),
showing that
∇f = (∂f/∂r) r̂ + (1/r)(∂f/∂θ) θ̂ + (1/(r sin θ))(∂f/∂φ) φ̂.
Some messy algebra will yield for the Laplacian
∇²f = (1/r²) ∂/∂r (r² ∂f/∂r) + (1/(r² sin θ)) ∂/∂θ (sin θ ∂f/∂θ) + (1/(r² sin²θ)) ∂²f/∂φ²;
and for the divergence and curl of a vector field, A = Ar(r, θ, φ)r̂ + Aθ(r, θ, φ)θ̂ + Aφ(r, θ, φ)φ̂,
∇ · A = (1/r²) ∂(r²Ar)/∂r + (1/(r sin θ)) ∂(sin θ Aθ)/∂θ + (1/(r sin θ)) ∂Aφ/∂φ,
∇ × A = (r̂/(r sin θ)) [∂(sin θ Aφ)/∂θ − ∂Aθ/∂φ] + (θ̂/r) [(1/sin θ) ∂Ar/∂φ − ∂(rAφ)/∂r] + (φ̂/r) [∂(rAθ)/∂r − ∂Ar/∂θ].
a) (i, j, k)
b) (i, k, j)
c) (i, −k, j)
d) (i, −k, −j)
3. Suppose a sphere of radius 5 cm has mass density 10 g/cm³ at its center, and 5 g/cm³ at its surface,
and its density is a linear function of the distance from the center. The total mass of the sphere is given
by
a) 3927 g
b) 3491 g
c) 3272 g
d) 3142 g
Lecture 31
Line integral of a vector field
View this lecture on YouTube

The line integral of a vector field u along a directed curve C is written as one of
∫_C u · dr or ∮_C u · dr,
where the latter form is used when the curve C is closed, with ending point equal to starting point.
To define the line integral, we break the curve into small displacement vectors, take the dot product of
the average value of u on each piece of the curve with its displacement vector, and sum over all these
scalar values.
A general method to calculate the line integral is to parameterize the curve. Let the curve be
parameterized by the function r = r(t) as t goes from t₀ to t_f. Using dr = (dr/dt) dt, the line integral
becomes
∫_C u · dr = ∫_{t₀}^{t_f} u(r(t)) · (dr/dt) dt.
Sometimes the curve is simple enough that dr can be computed directly. Either way, line integrals are
always done by converting them into single-variable integrals.
Example: Compute the line integral of r = xi + yj in the x-y plane along two curves from the origin to the
point ( x, y) = (1, 1). The first curve C1 consists of two line segments, the first from (0, 0) to (1, 0), and the
second from (1, 0) to (1, 1). The second curve C2 is a straight line from the origin to (1, 1).
The computation along the first curve requires two separate integrations. For the curve along the
x-axis, we use dr = dxi and for the curve at x = 1 in the direction of j, we use dr = dyj. The line
integral is therefore given by
∫_{C1} r · dr = ∫₀¹ x dx + ∫₀¹ y dy = 1.
For the second curve, we parameterize the line by r (t) = t(i + j ) as t goes from 0 to 1, so that
dr = dt(i + j ), and the integral becomes
∫_{C2} r · dr = ∫₀¹ 2t dt = 1.
The two line integrals are equal, and for this case only depend on the starting and ending points of
the curve.
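Line integrals by parameterization are mechanical enough to script; a minimal SymPy sketch of the second curve (the helper function is illustrative):

```python
import sympy as sp

t = sp.symbols('t')

def line_integral(u, r, t0, t1):
    # Integrate u(r(t)) . (dr/dt) over t in [t0, t1].
    return sp.integrate(u(r).dot(r.diff(t)), (t, t0, t1))

u = lambda r: r                    # the field u = x i + y j
C2 = sp.Matrix([t, t])             # straight line from (0, 0) to (1, 1)
print(line_integral(u, C2, 0, 1))  # 1
```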
2. In the x-y plane, calculate the line integral of the vector field u = −yi + xj counterclockwise around
the circle of radius R centered at the origin.
Lecture 32
Surface integral of a vector field
View this lecture on YouTube

The surface integral of the normal component of a vector field u computed over a surface S is written
as one of
∫_S u · dS or ∮_S u · dS,
where the latter is used when the surface S is closed. One can write dS = n̂dS, where n̂ is a normal
unit vector to the surface. To define the surface integral, we break the surface into little areas, take
the dot product of the average value of u on each area with the normal unit vector times the area
itself, and then sum. For an open surface, the direction of the normal vectors needs to be specified as
there are two choices, but for a closed surface, by convention the direction is always chosen to point
outward.
This surface integral is often called a flux integral. If u is the fluid velocity (length divided by
time), and ρ is the fluid density (mass divided by volume), then the surface integral
∫_S ρu · dS
computes the mass flux, that is, the mass passing through the surface S per unit time.
Example: Compute the flux integral of r = xi + yj + zk over a square box centered at the origin with sides
parallel to the axes and side lengths equal to L.
From symmetry, we need only compute the surface integral through one face and then multiply by
six. Consider the face located at x = L/2. The normal vector to this face is i, and the infinitesimal
surface area is dy dz. The integral over this surface, say S1, is therefore given by
∫_{S1} r · dS = ∫_{−L/2}^{L/2} ∫_{−L/2}^{L/2} (L/2) dy dz = L³/2.
Multiplying by six, the flux integral over the entire surface of the box is 3L³.
a) 0
b) L²/2
c) L²
d) 2L²
2. Consider a right circular cylinder of radius R and length L centered on the z-axis. The surface
integral of u = xi + yj over the cylinder is given by
a) 0
b) πR²L
c) 2πR²L
d) 4πR²L
3. The flux integral of u = zk over the upper hemisphere of a sphere of radius R centered at the origin
with normal vector r̂ is given by
a) 2πR³/3
b) 4πR³/3
c) 2πR³
d) 4πR³
Week IV
Fundamental Theorems
In this week’s lectures, we learn about the fundamental theorems of vector calculus. These include
the gradient theorem, the divergence theorem, and Stokes’ theorem. We show how these theorems
are used to derive continuity equations, define the divergence and curl in coordinate-free form, and
convert the integral version of Maxwell’s equations to differential form.
Lecture 33
Gradient theorem
View this lecture on YouTube
The gradient theorem is a generalization of the fundamental theorem of calculus for line integrals.
Let ∇φ be the gradient of a scalar field φ, and let C be a directed curve that begins at the point r1 and
ends at r2. Suppose we can parameterize the curve C by r = r(t), where t1 ≤ t ≤ t2, r(t1) = r1, and
r(t2) = r2. Then using the chain rule in the form
r (t2 ) = r2 . Then using the chain rule in the form
d dr
f(r ) = rf(r ) · ,
dt dt
t2 t2
dr d
ˆ ˆ ˆ
rf · dr = rf(r ) · dt = f(r ) dt
C t1 dt t1 dt
= f(r (t2 )) f(r (t1 )) = f(r2 ) f ( r1 ) .
Alternatively, in terms of the differential
dφ = ∇φ · dr,
we have
∫_C ∇φ · dr = ∫_C dφ = φ(r2) − φ(r1).
We have thus shown that the line integral of the gradient of a function is path independent, depending
only on the endpoints of the curve. In particular, we have the general result that
∮_C ∇φ · dr = 0
for any closed curve C.
Example: Using the gradient theorem, redo the line integral of r = xi + yj from the origin to the point (1, 1)
computed in Lecture 31. We have
∫_C r · dr = (1/2) ∫_C ∇(x² + y²) · dr = (1/2)(x² + y²) |_{(0,0)}^{(1,1)} = 1.
a) Compute ∇φ.
c) Compute ∫_C ∇φ · dr along the line segments from (0, 0, 0) to (1, 0, 0) to (1, 1, 0) to (1, 1, 1).
Lecture 34
Conservative vector fields
View this lecture on YouTube

For a vector field u defined on R³, except perhaps at isolated singularities, the following conditions
are equivalent:
1. ∇ × u = 0;
2. u = ∇φ for some scalar field φ;
3. ∫_C u · dr is independent of the path C connecting two fixed endpoints.
Example: Show that the vector field u = x²(1 + y³)i + y²(1 + x³)j is conservative, and determine the scalar field φ satisfying u = ∇φ.
The curl of u is computed from
        |      i            j         k   |
∇ × u = |     ∂/∂x         ∂/∂y      ∂/∂z | = (3x²y² − 3x²y²)k = 0.
        | x²(1 + y³)   y²(1 + x³)     0   |
To find φ, we solve
∂φ/∂x = x²(1 + y³), ∂φ/∂y = y²(1 + x³).
Integrating the first equation with respect to x, we obtain
φ = ∫ x²(1 + y³) dx = (x³/3)(1 + y³) + f(y),
where f = f (y) is a function that depends only on y. Differentiating f with respect to y and using the
second equation, we obtain
x³y² + f′(y) = y²(1 + x³), or f′(y) = y².
One more integration results in f(y) = y³/3 + c, with c constant, and the scalar field is given by
φ(x, y) = (1/3)(x³ + x³y³ + y³) + c.
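The construction of the scalar field φ can be automated symbolically; a minimal SymPy sketch following the same steps:

```python
import sympy as sp

x, y = sp.symbols('x y')
u1, u2 = x**2 * (1 + y**3), y**2 * (1 + x**3)

# The two-dimensional curl (the k-component) vanishes, so u is conservative.
assert sp.simplify(sp.diff(u2, x) - sp.diff(u1, y)) == 0

phi = sp.integrate(u1, x)                    # x^3(1 + y^3)/3 + f(y)
f_prime = sp.simplify(u2 - sp.diff(phi, y))  # the remainder depends on y only
phi += sp.integrate(f_prime, y)

assert sp.simplify(sp.diff(phi, x) - u1) == 0
assert sp.simplify(sp.diff(phi, y) - u2) == 0
print(sp.expand(phi))   # x**3*y**3/3 + x**3/3 + y**3/3
```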
a) 0
b) 1
c) 2
d) 3
2. Let u = yi + xj. The value of ∮_C u · dr, where C is the unit circle centered at the origin, is given by
a) 0
b) 1
c) 2
d) 3
a) (x + y)² + z
b) (x − y)² + z
c) x² + xy + y² + z
d) x² − xy + y² + z
Lecture 35
Divergence theorem
View this lecture on YouTube
Let u be a differentiable vector field defined inside and on a smooth closed surface S enclosing a
volume V. The divergence theorem states that the integral of the divergence of u over the enclosed
volume is equal to the flux of u through the bounding surface; that is,
∫_V (∇ · u) dV = ∮_S u · dS.
We first prove the divergence theorem for a rectangular solid with sides parallel to the axes. Let
the rectangular solid be defined by a ≤ x ≤ b, c ≤ y ≤ d, and e ≤ z ≤ f. With u = u1i + u2j + u3k,
the volume integral over V becomes
∫_V (∇ · u) dV = ∫_e^f ∫_c^d ∫_a^b (∂u1/∂x + ∂u2/∂y + ∂u3/∂z) dx dy dz.
The three terms in the integral can be integrated separately using the fundamental theorem of calculus.
Each term in succession is integrated as
∫_e^f ∫_c^d (∫_a^b ∂u1/∂x dx) dy dz = ∫_e^f ∫_c^d [u1(b, y, z) − u1(a, y, z)] dy dz;
∫_e^f ∫_a^b (∫_c^d ∂u2/∂y dy) dx dz = ∫_e^f ∫_a^b [u2(x, d, z) − u2(x, c, z)] dx dz;
∫_c^d ∫_a^b (∫_e^f ∂u3/∂z dz) dx dy = ∫_c^d ∫_a^b [u3(x, y, f) − u3(x, y, e)] dx dy.
The integrals on the right-hand sides correspond exactly to flux integrals over opposite sides of the
rectangular solid. For example, the side located at x = b corresponds with dS = i dy dz and the side
located at x = a corresponds with dS = −i dy dz. Summing all three integrals yields the divergence
theorem for the rectangular solid.
Now, given any volume enclosed by a smooth surface, we can subdivide the volume by a very fine
three-dimensional rectangular grid and apply the above result to each rectangular solid in the grid. All
the volume integrals over the rectangular solids add. The internal rectangular solids, however, share
connecting side faces through which the flux integrals cancel, and the only flux integrals that remain
are those from the rectangular solids on the boundary of the volume with outward facing surfaces.
The result is the divergence theorem for any volume V enclosed by a smooth surface S.
Lecture 36
Divergence theorem: Example I
View this lecture on YouTube

Example: Test the divergence theorem using u = xy i + yz j + zx k for a cube of side L lying in the first octant
with a vertex at the origin.
Here, Cartesian coordinates are appropriate and we use ∇ · u = y + z + x. We have for the left-hand
side of the divergence theorem,
∫_V (∇ · u) dV = ∫₀^L ∫₀^L ∫₀^L (x + y + z) dx dy dz
              = L⁴/2 + L⁴/2 + L⁴/2
              = 3L⁴/2.
For the right-hand side of the divergence theorem, the flux integral only has nonzero contributions
from the three sides located at x = L, y = L and z = L. The corresponding unit normal vectors are i,
j and k, and the corresponding integrals are
∮_S u · dS = ∫₀^L ∫₀^L Ly dy dz + ∫₀^L ∫₀^L Lz dx dz + ∫₀^L ∫₀^L Lx dx dy
           = L⁴/2 + L⁴/2 + L⁴/2
           = 3L⁴/2.
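Both sides of this example can be computed symbolically; a minimal SymPy sketch:

```python
import sympy as sp

x, y, z, L = sp.symbols('x y z L', positive=True)
u = sp.Matrix([x*y, y*z, z*x])

# Volume integral of div u over the cube [0, L]^3.
div_u = sum(sp.diff(u[i], v) for i, v in enumerate((x, y, z)))
lhs = sp.integrate(div_u, (x, 0, L), (y, 0, L), (z, 0, L))

# Flux integral: only the faces at x = L, y = L, z = L contribute.
rhs = (sp.integrate(L*y, (y, 0, L), (z, 0, L))     # face x = L, normal i
       + sp.integrate(L*z, (x, 0, L), (z, 0, L))   # face y = L, normal j
       + sp.integrate(L*x, (x, 0, L), (y, 0, L)))  # face z = L, normal k
assert sp.simplify(lhs - rhs) == 0
print(lhs)   # 3*L**4/2
```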
2. Compute the flux integral of r = xi + yj + zk over a square box with side lengths equal to L by
applying the divergence theorem to convert the flux integral into a volume integral.
Lecture 37
Divergence theorem: Example II
View this lecture on YouTube

Example: Test the divergence theorem using u = r²r̂ for a sphere of radius R centered at the origin.
To compute the left-hand side of the divergence theorem, we recall the formula for the divergence of
a vector field u in spherical coordinates:
∇ · u = (1/r²) ∂(r²ur)/∂r + (1/(r sin θ)) ∂(sin θ uθ)/∂θ + (1/(r sin θ)) ∂uφ/∂φ.
With u = r²r̂, only the radial term contributes, and
∇ · u = (1/r²) d(r⁴)/dr = 4r.
The volume integral is then
∫_V (∇ · u) dV = ∫₀^{2π} ∫₀^{π} ∫₀^{R} 4r · r² sin θ dr dθ dφ = 4πR⁴.
For the right-hand side of the divergence theorem, we have u = R²r̂ on the surface and dS = r̂ R² sin θ dθ dφ, so that
∮_S u · dS = ∫₀^{2π} ∫₀^{π} R⁴ sin θ dθ dφ = 4πR⁴.
2. Compute the flux integral of r = xi + yj + zk over a sphere of radius R by applying the divergence
theorem to convert the flux integral into a volume integral.
Lecture 38
Continuity equation
View this lecture on YouTube
The divergence theorem is often used to derive a continuity equation, which expresses the local con-
servation of some physical quantity such as mass or electric charge. Here, we derive the continuity
equation for a compressible fluid such as a gas. Let r(r, t) be the fluid density at position r and time
t, and u(r, t) be the fluid velocity. We will assume no sources or sinks of fluid. We place a small test
volume V in the fluid flow and consider the change in the fluid mass M inside V.
The fluid mass M in V varies because of the mass flux through the surface S surrounding V, and
one has
dM/dt = −∮_S ρu · dS.
With M = ∫_V ρ dV, applying the divergence theorem to the surface integral yields
(d/dt) ∫_V ρ dV = −∫_V ∇ · (ρu) dV.
Taking the time derivative inside the integral on the left-hand side, and combining the two integrals
yields
∫_V (∂ρ/∂t + ∇ · (ρu)) dV = 0.
Since this integral vanishes for any test volume placed in the fluid, the integrand itself must be zero,
and we have derived the continuity equation
∂ρ/∂t + ∇ · (ρu) = 0.
For an incompressible fluid for which the density ρ is uniform and constant, the continuity equation
reduces to
∇ · u = 0.
∂ρ/∂t + u · ∇ρ + ρ∇ · u = 0.
2. The electric charge density (charge per unit volume) is given by ρ(r, t) and the volume current
density (current per unit area) is given by J(r, t). Local conservation of charge states that the time
rate of change of the total charge within a volume is equal to the negative of the charge flowing out of
that volume, resulting in the equation
(d/dt) ∫_V ρ(r, t) dV = −∮_S J · dS.
From this law of charge conservation, derive the electrodynamics continuity equation.
a) 0
b) πRL√(R² + L²)
c) 2πRL√(R² + L²)
d) 3πRL√(R² + L²)
2. The surface integral ∮_S r · dS over a right circular cylinder of radius R and length L is equal to
a) 0
b) πR²L
c) 2πR²L
d) 3πR²L
d) u = (x + y)²i + (x − y)²j
Lecture 39
Green’s theorem
View this lecture on YouTube
Green’s theorem is a two-dimensional version of Stokes’ theorem, and serves as a simpler introduction.
Let u = u1 ( x, y)i + u2 ( x, y)j be a differentiable two-dimensional vector field defined on the x-y plane.
Green’s theorem relates an area integral over S in the plane to a line integral around C surrounding
this area, and is given by
∬_S (∂u2/∂x − ∂u1/∂y) dS = ∮_C (u1 dx + u2 dy).
We first prove Green's theorem for a rectangle lying in the x-y plane with a ≤ x ≤ b and c ≤ y ≤ d. The left-hand side of Green's theorem is given by
∬_S (∂u2/∂x − ∂u1/∂y) dS = ∫_c^d ∫_a^b (∂u2/∂x) dx dy − ∫_a^b ∫_c^d (∂u1/∂y) dy dx.

[Figure: the rectangle a ≤ x ≤ b, c ≤ y ≤ d with boundary curve C traversed counterclockwise.]
The inner integrals can be done using the fundamental theorem of calculus, and we obtain
∬_S (∂u2/∂x − ∂u1/∂y) dS = ∫_c^d [u2(b, y) − u2(a, y)] dy + ∫_a^b [u1(x, c) − u1(x, d)] dx = ∮_C (u1 dx + u2 dy).
Note that the line integral is done so that the bounded area is always to the left, which means counterclockwise.
Now, given any closed smooth curve in the x-y plane enclosing an area, we can subdivide the
area by a very fine two-dimensional rectangular grid and apply the above result to each rectangle
in the grid. All the area integrals over the internal rectangles add. The internal rectangles share
connecting sides over which the line integrals cancel, and the only line integrals that remain are those
that approximate the given bounding curve. The result is Green’s theorem for any area S in the plane
bounded by a curve C.
2. Test Green's theorem using u = −yi + xj for a circle of radius R centered at the origin.

Lecture 40
Stokes' theorem
View this lecture on YouTube
The integrand on the left-hand side of Green's theorem is the k-component of a curl,
∂u2/∂x − ∂u1/∂y = (∇ × u) · k;
and with dS = k dS and u1 dx + u2 dy = u · dr, Green's theorem can be rewritten in the form
∬_S (∇ × u) · dS = ∮_C u · dr.
This three-dimensional extension of Green’s theorem is called Stokes’ theorem. Here, S is a general
three-dimensional surface bounded by a closed spatial curve C. A simple example would be a hemi-
sphere located anywhere in space bounded by a circle. The orientation of the closed curve and the
normal vector to the surface should follow the right-hand rule. If the fingers of your right hand point
in the direction of the line integral, your thumb should point in the direction of the normal vector to
the surface.
a) y-z plane;
b) z-x plane.
2. Test Stokes' theorem using u = −yi + xj for a hemisphere of radius R with z > 0 bounded by a
circle of radius R lying in the x-y plane with center at the origin.
1. Let u = −yi + xj. Compute ∮_C u · dr for the quarter circle of radius R as illustrated [figure: a quarter circle in the first quadrant centered at the origin]. Here, it is simpler to apply Stokes' theorem to compute an area integral. The answer is
a) 0
b) πR²/2
c) πR²
d) 2πR²
2. Let u = −y/(x² + y²) i + x/(x² + y²) j. Compute the value of ∬_S (∇ × u) · dS over a circle of radius R centered
at the origin in the x-y plane with normal vector k. Here, because u is singular at r = 0, it is necessary
to apply Stokes' theorem and compute a line integral. The answer is
a) 0
b) π
c) 2π
d) 4π
3. Let u = x²yi + xy²j. Compute ∮_C u · dr for a unit square in the first quadrant with a vertex at the
origin. Here, it is simpler to compute an area integral. The answer is
a) 0
b) 1/3
c) 2/3
d) 1
Lecture 41
Meaning of the divergence and the curl
View this lecture on YouTube
With u a differentiable vector field defined inside and on a smooth closed surface S enclosing a
volume V, the divergence theorem states
∫_V ∇ · u dV = ∮_S u · dS.
We can limit this expression by shrinking the integration volume down to a point to obtain a coordinate-
free representation of the divergence as
∇ · u = lim_{V→0} (1/V) ∮_S u · dS.
Picture V as the volume of a small sphere with surface S and u as the velocity field of some fluid of
constant density. Then if the flow of fluid into the sphere is equal to the flow of fluid out of the sphere,
the surface integral will be zero and r · u = 0. However, if more fluid flows out of the sphere than in,
then r · u > 0 and if more fluid flows in than out, r · u < 0. Positive divergence indicates a source
of fluid and negative divergence indicates a sink of fluid.
Now consider a surface S bounded by a curve C. Stokes’ theorem states that
ˆ ˛
(r ⇥ u) · dS = u · dr.
S C
We can limit this expression by shrinking the integration surface down to a point. With n a unit
normal vector to the surface, with direction given by the right-hand rule, we obtain
1
˛
(r ⇥ u) · n = lim u · dr.
S !0 S C
Picture S as the area of a small disk bounded by a circle C and again picture u as the velocity field
of a fluid. The line integral of u · dr around the circle C is called the flow’s circulation and measures
the swirl of the fluid around the center of the circle. The vector field ! = r ⇥ u is called the vorticity
of the fluid. The vorticity is most decidedly nonzero in a wirling (say, turbulent) fluid, composed of
eddies of all different sizes.
123
124 LECTURE 41. MEANING OF THE DIVERGENCE AND THE CURL
∂u 1
+ (u · r)u = rp + nr2 u,
∂t r
a) By taking the divergence of the Navier-Stokes equation, derive the following equation for the
pressure in terms of the velocity field:
∂ui ∂u j
r2 p = r .
∂x j ∂xi
b) By taking the curl of the Navier-Stokes equation, and defining the vorticity as ! = r ⇥ u, derive
the vorticity equation
∂!
+ (u · r)! = (! · r)u + nr2 !.
∂t
You can use all the vector identities presented in these lecture notes, but you will need to prove
that
1
u ⇥ (r ⇥ u) = r(u · u) (u · r)u.
2
Maxwell’s equations
View this lecture on YouTube
qenc
˛
E · dS = , (Gauss’s law for electric fields)
#0
˛S
B · dS = 0, (Gauss’s law for magnetic fields)
˛S
d
ˆ
E · dr = B · dS, (Faraday’s law)
C dt S
✓ ◆
d
˛ ˆ
B · dr = µ0 Ienc + # 0 E · dS , (Ampère-Maxwell law)
C dt S
where E and B are the electric and magnetic fields, qenc and Ienc are the charge or current enclosed
by the bounding surface or curve, and # 0 and µ0 are dimensional constants called the permittivity and
permeability of free space.
The transformation from integral to differential form is a straightforward application of both the
divergence and Stokes’ theorem. The charge q and the current I are related to the charge density r
and the current density J by ˆ ˆ
q= r dV, I= J · dS.
V S
We apply the divergence theorem to the surface integrals and Stokes’ theorem to the line integrals,
replace qenc and Ienc by integrals, and combine results into single integrals to obtain
ˆ ✓ ◆ ˆ ✓ ◆
r ∂B
r·E dV = 0, r⇥E+ · dS = 0,
V #0 S ∂t
ˆ ✓ ✓ ◆◆
∂E
ˆ
(r · B ) dV = 0, r⇥B µ0 J + # 0 · dS = 0.
V S ∂t
Since the integration volumes and surfaces are of arbitrary size and shape, the integrands must vanish
and we obtain the aesthetically appealing differential forms for Maxwell’s equations given by
r ∂B
r·E = , r⇥E = ,
#0 ∂t
✓ ◆
∂E
r · B = 0, r ⇥ B = µ0 J + # 0 .
∂t
125
126 LECTURE 42. MAXWELL’S EQUATIONS
determine the magnetic field of a current carrying infinite wire placed on the z-axis. Assume the
magnetic field has cylindrical symmetry.
127
Appendix A
The first row of matrix A has elements a11 and a12 ; the second row has elements a21 and a22 . The
first column has elements a11 and a21 ; the second column has elements a12 and a22 . Matrices can be
multiplied by scalars and added. This is done element-by-element as follows:
! !
ka11 ka12 a11 + b11 a12 + b12
kA = , A+B = .
ka21 ka22 a21 + b21 a22 + b22
Matrices can also be multiplied. Matrix multiplication does not commute, and two matrices can be
multiplied only if the number of columns of the matrix on the left equals the number of rows of the
matrix on the right. One multiplies matrices by going across the rows of the first matrix and down the
columns of the second matrix. The two-by-two example is given by
! ! !
a11 a12 b11 b12 a11 b11 + a12 b21 a11 b12 + a12 b22
= .
a21 a22 b21 b22 a21 b11 + a22 b21 a21 b12 + a22 b22
Making use of the definition of matrix multiplication, a system of linear equations can be written
in matrix form. For instance, a general system with two equations and two unknowns is given by
Ax = b.
129
130 APPENDIX A. MATRIX ADDITION AND MULTIPLICATION
Appendix B
1 1
AA =A A = I,
It can be shown that a matrix A is invertible if and only if its determinant is not zero. Here, we only
need two-by-two and three-by-three determinants. The two-by-two determinant, using the vertical bar
notation, is given by
a11 a12
= a11 a22 a12 a21 ;
a21 a22
that is, multiply the diagonal elements and subtract the product of the off-diagonal elements.
The three-by-three determinant is given in terms of two-by-two determinants as
The rule here is to go across the first row of the matrix, multiplying each element in the row by the
determinant of the matrix obtained by crossing out that element’s row and column, and adding the
results with alternating signs.
We will need to invert two-by-two and three-by-three matrices, but this will mainly be simple
because our matrices will be orthogonal. The rows (or columns) of an orthogonal matrix, considered
as components of a vector, are orthonormal. For example, the following two matrices are orthogonal
matrices: 0 1
! sin q cos f sin q sin f cos q
cos q sin q B C
, @cos q cos f cos q sin f sin q A .
sin q cos q
sin f cos f 0
For the first matrix, the row vectors r̂ = cos qi + sin qj and ✓ˆ = sin qi + cos qj have unit length and
are orthogonal, and the same can be said for the rows of the second matrix.
The inverse of an orthogonal matrix is simply given by its transpose, obtained by interchanging
the matrices rows and columns. For example,
! 1 !
cos q sin q cos q sin q
= .
sin q cos q sin q cos q
131
132 APPENDIX B. MATRIX DETERMINANTS AND INVERSES
For more general two-by-two matrices, the inverse can be found from
! 1 !
a b 1 d b
= .
c d ad bc c a
In words, switch the diagonal elements, negate the off-diagonal elements, and divide by the determi-
nant.
Appendix C
Problem solutions
133
134 APPENDIX C. PROBLEM SOLUTIONS
C
A
B+
+B C
A
(A + B) + C = A + (B + C)
2. Draw a triangle with sides composed of the vectors A, B, and C, with C = A + B. Then draw the
vector X pointing from the midpoint of C to the midpoint of B.
B
A
1 1
(A + B ) + X = A + B,
2 2
and solving for X yields X = 12 A. Therefore X is parallel to A and one-half its length.
135
1.
r2 r1 (x x1 )i + ( y2 y1 )j + ( z2 z1 )k
= p2 .
| r2 r1 | ( x2 x1 )2 + ( y2 y1 )2 + ( z2 z1 )2
b) The force acting on m1 with position vector r1 due to the mass m2 with position vector r2 is
written as
r2 r1 ( x2 x1 )i + ( y2 y1 )j + ( z2 z1 )k
F = Gm1 m2 = Gm1 m2 .
| r2 r1 | 3 [( x2 x1 )2 + (y2 y1 )2 + (z2 z1 )2 ]3/2
136 APPENDIX C. PROBLEM SOLUTIONS
1.
a) A · B = A1 B1 + A2 B2 + A3 B3 = B1 A1 + B2 A2 + B3 A3 = B · A;
b) A · (B + C ) = A1 ( B1 + C1 ) + A2 ( B2 + C2 ) + A3 ( B3 + C3 ) = A1 B1 + A1 C1 + A2 B2 + A2 C2 +
A3 B3 + A3 C3 = ( A1 B1 + A2 B2 + A3 B3 ) + ( A1 C1 + A2 C2 + A3 C3 ) = A · B + A · C;
2. The dot product of a unit vector with itself is one, and the dot product of a unit vector with one
perpendicular to itself is zero. That is,
i · i = j · j = k · k = 1; i · j = i · k = j · k = 0; j · i = k · i = k · j = 0.
θ
B
With C = A B, we have
where q is the angle between vectors A and B. In the usual notation, if A, B and C are the lengths of
the sides of a triangle, and q is the angle opposite side C, then
C 2 = A2 + B2 2AB cos q.
137
1.
a)
i j k i j k
A ⇥ B = A1 A2 A3 = B1 B2 B3 = B ⇥ A.
B1 B2 B3 A1 A2 A3
b)
i j k
A ⇥ (B + C ) = A1 A2 A3
B1 + C1 B2 + C2 B3 + C3
i j k i j k
= A1 A2 A3 + A1 A2 A3 = A ⇥ B + A ⇥ C.
B1 B2 B3 C1 C2 C3
c)
i j k i j k i j k
A ⇥ (kB ) = A1 A2 A3 = k A1 A2 A3 = kA1 kA2 kA3
kB1 kB2 kB3 B1 B2 B3 B1 B2 B3
= k(A ⇥ B ) = (kA) ⇥ B.
2. The cross product of a unit vector with itself is equal to the zero vector, the cross product of a unit
vector with another (keeping the order cyclical in i, j, k) is equal to the third unit vector, and reversing
the order of multiplication changes the sign. That is,
i ⇥ i = 0, j ⇥ j = 0, k ⇥ k = 0;
i ⇥ j = k, j ⇥ k = i, k ⇥ i = j;
k⇥j = i, j ⇥i = k, i⇥k = j.
i ⇥ (i ⇥ k ) = i⇥j = k,
(i ⇥ i) ⇥ k = 0 ⇥ k = 0.
138 APPENDIX C. PROBLEM SOLUTIONS
1. c. As an example, i ⇥ (i ⇥ j ) 6= (i ⇥ i) ⇥ j.
2. b.
i j k
(A ⇥ B ) · j = a1 a2 a3 · j = a3 b1 a1 b3 .
b1 b2 b3
3. d.
i ⇥ (j ⇥ k ) = i ⇥ i = 0, (i ⇥ j ) ⇥ k = k ⇥ k = 0, (i ⇥ i) ⇥ j = 0 ⇥ j = 0, i ⇥ (i ⇥ j ) = i ⇥ k = j.
139
1. We first compute the displacement vector between (1, 1, 1) and (2, 3, 2):
Choosing a point on the line to be (1, 1, 1), the parametric equation for the line is given by
The line crosses the x = 0 and z = 0 planes when t = 1 at the intersection point (0, 1, 0), and
crosses the y = 0 plane when t = 1/2 at the intersection point (1/2, 0, 1/2).
140 APPENDIX C. PROBLEM SOLUTIONS
1. We find two vectors parallel to the plane defined by the three points, ( 1, 1, 1), (1, 1, 1), and
(1, 1, 0):
i j k
1
N = s 1 ⇥ s2 = 1 1 1 = i+j 2k.
2
0 2 1
(i + j 2k ) · ( xi + yj + zk ) = (i + j 2k ) · (i + j + k ), or x+y 2z = 0.
The intersection of this plane with the z = 0 plane forms the line given by y = x.
141
1. d. Write the parametric equation as r = r0 + ut. Using the point (0, 1, 1), we take r0 = j + k and
from both points (0, 1, 1) and (1, 0, 1), we have u = (1 0)i + (0 1)j + ( 1 1)k = i j 2k.
Therefore r = j + k + (i j 2k )t = ti + (1 t )j + (1 2t)k.
2. a. The line is parameterized as r = ti + (1 t)j + (1 2t)k. The intersection with the z = 0 plane
1 1 1 1
occurs when t = 1/2 so that r = i + j. The intersection point is therefore ( , , 0).
2 2 2 2
3. d. We first find the parametric equation for the plane. From the points (1, 1, 1), (1, 1, 2) and (2, 1, 1),
we construct the two displacement vectors
N = s1 ⇥ s2 = k ⇥ ( i k) = k ⇥ i k ⇥ k = j.
or y = 1. This plane is parallel to the x-z plane and when z = 0 is simply the line y = 1 for all values
of x. Note now that we could have guessed this result because all three points defining the plane are
located at y = 1.
142 APPENDIX C. PROBLEM SOLUTIONS
1.
a) If ijk is a cyclic permutation of (1, 2, 3), then eijk = e jki = ekij = 1. If ijk is an anticyclic per-
mutation of (1, 2, 3), then eijk = e jki = ekij = 1. And if any two indices are equal, then
eijk = e jki = ekij = 0. The use is that we can cyclically permute the indices of the Levi-Civita
tensor without changing its value.
b) If ijk is a cyclic permutation of (1, 2, 3), then eijk = 1 and e jik = ekji = eikj = 1. If ijk is an
anticyclic permutation of (1, 2, 3), then eijk = 1 and e jik = ekji = eikj = 1. And if any two
indices are equal, then eijk = e jik = eikj = 0. The use is that we can swap any two indices of the
Levi-Civita symbol if we change its sign.
2. We have
3.
a) Now, dij A j = di1 A1 + di2 A2 + di3 A3 . The only nonzero term has the index of A equal to i,
therefore dij A j = Ai .
b) Now, dik dkj = di1 d1j + di2 d2j + di3 d3j . If i 6= j, then every term in the sum is zero. If i = j, then only
one term is nonzero and equal to one. Therefore, dik dkj = dij . This result could also be viewed as
an application of Part (a).
4. We make use of the identities dii = 3 and dik djk = dij . For the Kronecker delta, the order of the
indices doesn’t matter. We also use
eijk elmn = dil (djm dkn djn dkm ) dim (djl dkn djn dkl ) + din (djl dkm djm dkl ).
a)
eijk eimn = dii (djm dkn djn dkm ) dim (dji dkn djn dki ) + din (dji dkm djm dki )
= 3(djm dkn djn dkm ) (djm dkn djn dkm ) + (djn dkm djm dkn )
= djm dkn djn dkm .
1.
A ⇥ (B ⇥ C ) + B ⇥ (C ⇥ A) + C ⇥ (A ⇥ B )
= [(A · C )B (A · B )C ] + [(B · A)C (B · C )A] + [(C · B )A (C · A)B ]
= [(A · C )B (C · A)B ] + [(B · A)C (A · B )C ] + [(C · B )A (B · C )A]
= 0.
1. c. The relevant formula from the lecture is eijk eilm = djl dkm djm dkl . To apply this formula, we need
to rearrange the indices keeping cyclical order:
3. c. Use the facts that A ⇥ B is orthogonal to both A and B, A · B is zero if A and B are orthogonal,
and A ⇥ B is zero if A and B are parallel.
145
2. Define
f (t + e, x + d) = g(e, d).
Applying this expansion to f (t + aDt, x + bDt f (t, x )), we have to first-order in Dt,
 xi2  yi  xi yi  xi n  xi yi ( xi )( yi )
b0 = , b1 = ,
n  xi2 (  x i )2 n  xi2 ( xi )2
2
y
1 2 3
x
148 APPENDIX C. PROBLEM SOLUTIONS
1.
∂f ∂ f ∂x ∂ f ∂y
= +
∂r ∂x ∂r ∂y ∂r
= ye xy cos q + xe xy sin q
2 cos q sin q 2 cos q sin q
= r sin q cos qer + r sin q cos qer
2 cos q sin q
= 2r sin q cos qer ,
and
∂f ∂ f ∂x ∂ f ∂y
= +
∂q ∂x ∂q ∂y ∂q
= ye xy ( r sin q ) + xe xy (r cos q )
2 cos q sin q 2 cos q sin q
= r2 sin2 qer + r2 cos2 qer
2 cos q sin q
= r2 (cos2 q sin2 q )er .
2 cos q sin q
b) Substituting for x and y, we have f = er . Then
∂f 2
= 2r cos q sin qer cos q sin q ,
∂r
∂f 2
= r2 (cos2 q sin2 q )er cos q sin q .
∂q
149
by cz ax cz ax by
x= , y= , z= .
a b c
by cz dt ax cz dt ax by dt ax by cz
x= , y= , z= , t= .
a b c d
∂x b ∂y c ∂z d ∂t a
= , = , = , = ;
∂y a ∂z b ∂t c ∂x d
Apparently an odd number of products yields 1 and an even number of products yields +1.
150 APPENDIX C. PROBLEM SOLUTIONS
∂f x
= 2 ;
∂x ( x + y2 + z2 )3/2
∂2 f 3xy
= 2
∂x∂y ( x + y2 + z2 )5/2
2. a. From the data points (0, 1), (1, 3), (2, 3) and (3, 4), we compute
3. d. Let f = f ( x, y) with x = r cos q and y = r sin q. Then application of the chain rule results in
∂f ∂ f ∂x ∂ f ∂y
= +
∂q ∂x ∂q ∂y ∂q
∂f ∂f
= r sin q + r cos q
∂x ∂y
∂f ∂f
= y +x .
∂x ∂y
151
1.
r(r2 ) = 2r.
1
b) Let f( x, y, z) = p . The gradient is given by
x2 + y2 + z2
!
1
rf = r p
x 2 + y2 + z2
x y z
= i j k.
( x + y + z2 )3/2
2 2 ( x2 + y2+ z2 )3/2 ( x2 + y2+ z2 )3/2
1.
∂ ∂ ∂
r·F = ( xy) + (yz) + (zx )
∂x ∂y ∂z
= y + z + x = x + y + z.
∂ ∂ ∂
r·F = (yz) + ( xz) + ( xy) = 0.
∂x ∂y ∂z
153
1.
i j k
r ⇥ F = ∂/∂x ∂/∂y ∂/∂z = yi zj xk.
xy yz zx
i j k
r ⇥ F = ∂/∂x ∂/∂y ∂/∂z = ( x x )i + ( y y )j + ( z z)k = 0.
yz xz xy
154 APPENDIX C. PROBLEM SOLUTIONS
1. We have
✓ ◆ ! ! !
21 ∂2 1 ∂2 1 ∂2 1
r = 2 p + 2 p + 2 p .
r ∂x x 2 + y2 + z2 ∂y x 2 + y2 + z2 ∂z x 2 + y2 + z2
We can compute the derivatives with respect to x and use symmetry to find the other two terms. We
have !
∂ 1 x
p = ;
∂x x 2 + y2 + z2 ( x2 + y2 + z2 )3/2
and
✓ ◆
∂ x ( x2 + y2 + z2 )3/2 + 3x2 ( x2 + y2 + z2 )1/2
=
∂x ( x2 + y2 + z2 )3/2 ( x 2 + y2 + z2 )3
1 3x2
= + .
( x2 + y2 + z2 )3/2 ( x2 + y2 + z2 )5/2
1. d. We have
✓ ◆ ✓ ◆
1 1
r =r
r2 x 2 + y2 + z2
2x 2y 2z
= 2 i+ 2 j+ 2 k
( x + y2 + z2 )2 ( x + y2 + z2 )2 ( x + y2 + z2 )2
2r
= .
r4
2. b. We use !
xi + yj + zk
r·F = r· p .
x 2 + y2 + z2
Now,
! p
∂ x x 2 + y2 + z2 x 2 ( x 2 + y2 + z2 ) 1/2
p =
∂x x 2 + y2 + z2 x 2 + y2 + z2
1 x2
= ,
r r3
and similarly for the partial derivatives with respect to y and z. Adding all three partial derivatives
results in
3 x 2 + y2 + z2 3 1 2
r·F = = = .
r r3 r r r
3.
b. We have
i j k
r ⇥ r = ∂/∂x ∂/∂y ∂/∂z = 0.
x y z
156 APPENDIX C. PROBLEM SOLUTIONS
1.
a) We compute:
∂
r · ( f u) = ( f ui )
∂xi
∂f ∂u
= u +f i
∂xi i ∂xi
= u · r f + f r · u.
Therefore, r ⇥ (r ⇥ u) = r(r · u) r2 u.
2.
d2 x1 ∂u ∂u dx ∂u dx ∂u dx
2
= 1+ 1 1+ 1 2+ 1 3
dt ∂t ∂x1 dt ∂x2 dt ∂x3 dt
∂u1 ∂u1 ∂u1 ∂u1
= + u1 + u2 + u3 .
∂t ∂x1 ∂x2 ∂x3
d2 x2 ∂u ∂u ∂u ∂u d2 x3 ∂u ∂u ∂u ∂u
2
= 2 + u1 2 + u2 2 + u3 2 , 2
= 3 + u1 3 + u2 3 + u3 3 .
dt ∂t ∂x1 ∂x2 ∂x3 dt ∂t ∂x1 ∂x2 ∂x3
d2 r ∂u
2
= + u · ru.
dt ∂t
This expression is called the material acceleration, and is found in the Navier-Stokes equation of
fluid mechanics.
158 APPENDIX C. PROBLEM SOLUTIONS
∂B ∂E
r · E = 0, r · B = 0, r⇥E = , r ⇥ B = µ 0 e0 .
∂t ∂t
Take the curl of the fourth Maxwell’s equation, and commute the time and space derivatives to obtain
∂
r ⇥ ( r ⇥ B ) = µ 0 e0 (r ⇥ E ),
∂t
∂
r(r · B ) r 2 B = µ 0 e0 (r ⇥ E ).
∂t
Apply the second Maxwell’s equation to the left-hand-side, and the third Maxwell’s equation to the
right-hand-side. Rearranging terms, we obtain the three-dimensional wave equation given by
∂2 B
= c2 r2 B,
∂t2
p
where c = 1/ µ0 e0 .
159
Setting v = u, we have
r(u · u) = 2(u · r)u + 2u ⇥ (r ⇥ u).
Therefore,
1
r(u · u) = u ⇥ (r ⇥ u) + (u · r)u.
2
2. c. The curl of a gradient (a. and d.) and the divergence of a curl (b.) are zero. The divergence of a
gradient (c) is the Laplacian and is not always zero.
i j k
r⇥E = ∂/∂x ∂/∂y ∂/∂z = cos (z ct)j.
sin (z ct) 0 0
∂B
Maxwell’s equation r ⇥ E = then results in
∂t
∂B
= cos (z ct)j,
∂t
1
B= sin (z ct)j.
c
160 APPENDIX C. PROBLEM SOLUTIONS
To determine the mass of the cube, we place our coordinate system so that one corner of the cube is at
the origin and the adjacent corners are on the positive x, y and z axes. We assume that the density of
the cube is only a function of z, with
z
r ( z ) = r1 + ( r2 r1 ).
L
1.
1
x = y/3
0.8
0.6
y
0.4
0.2
x = 1 + y/3
0
0 0.2 0.4 0.6 0.8 1 1.2
x
1 ˆ 1+y/3 1 x =1+y/3
x3 y
ˆ ˆ
x2 y dx dy = dy
0 y/3 0 3 x =y/3
1 ✓ ◆3 ✓ ◆3 !
1 1 1
ˆ
= y 1+ y y dy
3 0 3 3
✓ 1 ◆
1 1
ˆ
= y 1 + y + y2 dy
03 3
✓ ◆ 1
1 1 2 1 3 1
= y + y + y4
3 2 3 12 0
✓ ◆
1 1 1 1 11
= + + = .
3 2 3 12 36
162 APPENDIX C. PROBLEM SOLUTIONS
2. b. To determine the mass of the cube, we place our coordinate system so that one corner of the
cube is at the origin and the adjacent corners are on the positive x, y and z axes. We assume that the
density of the cube is only a function of z, with
r(z) = (1 + z) g/cm3 .
1ˆ 1ˆ 1 1 1 1
1
ˆ ˆ ˆ ˆ
1
M= (1 + z) dx dy dz = dx dy (1 + z) dz = (z + z2 ) 0
= 1.5 g.
0 0 0 0 0 0 2
3. d. We draw a picture of the triangle and illustrate the chosen direction of integration.
0.6
y
0.4
0.2
0 0.5 1 1.5 2
x
1ˆ 2 y 1 2 y 1 h i
1 2 1
ˆ ˆ ˆ
xy dx dy = x y dy = y (2 y )2 y2 dy
0 y 0 2 y 2 0
1 ✓ ◆
1 1 1
ˆ
=2 (y y2 ) dy = 2 = .
0 2 3 3
163
1.
a) The matrix form for the relationship between r̂, ✓ˆ and i, j is given by
! ! !
r̂ cos q sin q i
= .
✓ˆ sin q cos q j
Therefore,
i = cos q r̂ ˆ
sin q ✓, ˆ
j = sin q r̂ + cos q ✓.
b) The matrix form for the relationship between ∂ f /∂r, ∂ f /∂q and ∂ f /∂x, ∂ f /∂y is given by
! ! !
∂ f /∂r cos q sin q ∂ f /∂x
= .
∂ f /∂q r sin q r cos q ∂ f /∂y
Therefore,
∂f ∂f sin q ∂ f ∂f ∂f cos q ∂ f
= cos q , = sin q + .
∂x ∂r r ∂q ∂y ∂r r ∂q
2. We have
rr̂ = r cos q i + r sin q j = xi + yj,
and
r✓ˆ = r sin q i + r cos q j = yi + xj.
164 APPENDIX C. PROBLEM SOLUTIONS
1. We have
l = r ⇥ p = r ⇥ (mṙ ) = mrr̂ ⇥ (ṙr̂ + r q̇ ✓ˆ ) = mr2 q̇ (r̂ ⇥ ✓ˆ ).
s (r ) = r0 + ( r1 r0 )(r/R).
Integrating the mass density in polar coordinates to find the total mass of the disk, we have
ˆ 2p ˆ R
M= [ r0 + ( r1 r0 )(r/R)] r dr dq
0 0
r=R
r0 r 2 (r r0 )r 3
= 2p + 1
2 3R r =0
1
= pR2 (r0 + 2r1 ).
3
Therefore, ˆ • p
x2
I= e dx = p.
•
166 APPENDIX C. PROBLEM SOLUTIONS
d✓ˆ
= cos qi sin qj = r̂.
dq
s = s (r ) = (10 9r ) g/cm2 .
The mass is found by integrating in polar coordinates using dx dy = r dr dq. Calculating in grams, we
have
ˆ 2p ˆ 1
M= (10 9r )r dr dq
0 0
ˆ 2p ˆ 1
= dq (10 9r )r dr
0 0
1
= 2p (5r2 3r3 ) = 4p ⇡ 12.57 g.
0
167
1. We have
∂ ∂ ∂
r = x̂ + ŷ + ẑ
∂x ∂y ∂z
✓ ◆ ✓ ◆
ˆ ∂ sin f ∂ ˆ ∂ cos f ∂ ∂
= cos f⇢ˆ sin f cos f + sin f⇢ˆ + cos f sin f + + ẑ
∂r r ∂f ∂r r ∂f ∂z
✓ ◆
2 ∂ cos f sin f ∂ 2 ∂ sin f cos f ∂
= ⇢ˆ cos f + sin f +
∂r r ∂f ∂r r ∂f
!
2 2f ∂
∂ sin f ∂ ∂ cos ∂
+ ˆ sin f cos f + + cos f sin f + + ẑ
∂r r ∂f ∂r r ∂f ∂z
∂ 1 ∂ ∂
= ⇢ˆ + ˆ + ẑ .
∂r r ∂f ∂z
a)
1 ∂ 1
r · ⇢ˆ = (r) = ;
r ∂r r
b)
r · ⇢ˆ = r · (cos fi + sin fj )
!
x y
= r· p i+ p j
x 2 + y2 x 2 + y2
! !
∂ x ∂ y
= p + p
∂x x 2 + y2 ∂y x 2 + y2
p p
x2 + y2 x2 ( x2 + y2 ) 1/2 x 2 + y2 y2 ( x 2 + y2 ) 1/2
= +
x 2 + y2 x 2 + y2
p p
2 x 2 + y2 x 2 + y2
= 2 2
x +y
1 1
= p = .
2
x +y 2 r
3. r ⇥ ⇢ˆ = 0, r · ˆ = 0 and
1 ∂ 1
r ⇥ ˆ = ẑ (r) = ẑ.
r ∂r r
168 APPENDIX C. PROBLEM SOLUTIONS
1. The spherical coordinate unit vectors can be written in terms of the Cartesian unit vectors by
The columns (and rows) of the transforming matrix Q are observed to be orthonormal so that Q is an
orthogonal matrix. We have Q 1 = QT so that
0 1 0 10 1
i sin q cos f cos q cos f sin f r̂
B C B CB C
@ j A = @ sin q sin f cos q sin f cos fA @ ✓ˆ A ;
k cos q sin q 0 ˆ
or in expanded form
2. We need the relationship between the Cartesian and the spherical coordinates, given by
= r2 sin q.
3. We have
ˆ ˆ 2p ˆ p ˆ R
f dV = f (r )r2 sin q dr dq df
V 0 0 0
ˆ 2p ˆ p ˆ R
= df sin q dq r2 f (r ) dr
0 0 0
ˆ R
= 4p r2 f (r ) dr,
0
´ 2p ´p p
where we have used 0 df = 2p and 0 sin q dq = cos q 0
= 2.
r (r ) = r0 + ( r1 r0 )(r/R),
The average density of the sphere is its mass divided by its volume, given by
1 3
r= r0 + r1 .
4 4
170 APPENDIX C. PROBLEM SOLUTIONS
1. We begin with
Differentiating,
∂r̂ ˆ
= cos q cos f i + cos q sin f j sin q k = ✓;
∂q
and
∂r̂
= sin q sin f i + sin q cos f j = sin q ˆ .
∂f
1 ∂ 2 2
r · r̂ = 2
(r ) = , r ⇥ r̂ = 0;
r ∂r r
1 ∂ cos q ˆ ∂ ˆ
r · ✓ˆ = (sin q ) = , r ⇥ ✓ˆ = (r ) = ;
r sin q ∂q r sin q r ∂r r
r̂ ∂ ˆ
✓ ∂ r̂ cos q ✓ˆ
r · ˆ = 0, r⇥ ˆ = (sin q ) (r ) = .
r sin q ∂q r ∂r r sin q r
2. c. When r = xi, the position vector points along the x-axis. Then r̂ also points along the x-axis, ✓ˆ
points along the negative z-axis and ˆ points along the y-axis. We have (r̂, ✓,
ˆ ˆ ) = (i, k, j ).
where r is the object’s mass density. Here, with the density r in units of g/cm3 , we have
r = r(r ) = 10 r.
The integral is easiest to do in spherical coordinates, and using dx dy dz = r2 sin q dr dq df, and com-
puting in grams, we have
ˆ 2p ˆ p ˆ 5
M= (10 r ) r2 sin q dr dq df
0 0 0
ˆ 5
= 4p (10r2 r3 ) dr
0
✓ ◆ 5
10 3 1 4 3125p
= 4p r r = g
3 4 0 3
⇡ 3272 g ⇡ 3.3 kg.
172 APPENDIX C. PROBLEM SOLUTIONS
1. In spherical coordinates, on the surface of a sphere of radius R centered at the origin, we have
r = Rr̂ and dS = r̂ R2 sin q dq df. Therefore,
˛ ˆ 2p ˆ p
r · dS = R3 sin q dq df = 4pR3 .
S 0 0
174 APPENDIX C. PROBLEM SOLUTIONS
3. a. We perform the flux integral in spherical coordinates. On the surface of the sphere of radius R,
we have
u = zk = ( R cos q )(cos q r̂ sin q ✓ˆ ),
and
dS = r̂ R2 sin q dq df.
a) rf = (2xy + y2 )i + ( x2 + 2xy)j + k
c) Integrating over the three directed line segments given by (1) (0, 0, 0) to (1, 0, 0); (2) (1, 0, 0) to
(1, 1, 0), and; (3) (1, 1, 0) to (1, 1, 1):
ˆ ˆ ˆ ˆ
rf · dr = rf · dr + rf · dr + rf · dr
C C1 C2 C3
ˆ 1 ˆ 1
= 0+ (1 + 2y) dy + dz
0 0
= 3.
176 APPENDIX C. PROBLEM SOLUTIONS
a)
i j k
r⇥u = ∂/∂x ∂/∂y ∂/∂z
2xy + z2 2yz + x2 2zx + y2
= (2y 2y)i + (2z 2z)j + (2x 2x )k
= 0.
b) We need to satisfy
∂f ∂f ∂f
= 2xy + z2 , = 2yz + x2 , = 2zx + y2 .
∂x ∂y ∂z
Take the derivative with respect to y and satsify the second equation:
∂f ∂f
x2 + = 2yz + x2 or = 2yz.
∂y ∂y
Take the derivative of f = x2 y + xz2 + y2 z + g(z) with respect to z and satisfy the last gradient
equation:
2xz + y2 + g0 (z) = 2zx + y2 or g0 (z) = 0.
2. a. Since r ⇥ u = r ⇥ (yi + xj ) = 0, the line integral u around any closed curve is zero. By
inspection, we can also observe that u = rf, where f = xy.
3. c. To solve the multiple choice question, we can always take the gradients of the four choices.
Without the advantage of multiple choice, however, we need to compute f and we do so here. We
solve
∂f ∂f ∂f
= 2x + y, = 2y + x, = 1.
∂x ∂y ∂z
Integrating the first equation with respect to x holding y and z fixed, we find
ˆ
f= (2x + y) dx = x2 + xy + f (y, z).
∂f ∂f
x+ = 2y + x or = 2y.
∂y ∂y
Another integration results in f (y, z) = y2 + g(z). Finally, differentiating f with respect to z yields
g0 (z) = 1, or g(z) = z + c. The final solution is
f( x, y, z) = x2 + xy + y2 + z + c.
1. Using spherical coordinates, let u = ur (r, q, f)r̂ + uq (r, q, f)✓ˆ + uf (r, q, f) ˆ . Then the volume
integral becomes
2p p R ✓ ◆
1 ∂ 2 1 ∂ 1 ∂uf
ˆ ˆ ˆ ˆ
(r · u) dV = (r ur ) + (sin q uq ) + r2 sin q dr dq df.
V 0 0 0 r2 ∂r r sin q ∂q r sin q ∂f
Each term in the integrand can be integrated once. The first term is integrated as
✓ ◆ !
2p p R 2p p R
1 ∂ 2 ∂ 2
ˆ ˆ ˆ ˆ ˆ ˆ
2
(r ur ) r sin q dr dq df = (r ur ) dr sin q dq df
0 0 0 r2 ∂r 0 0 0 ∂r
ˆ 2p ˆ p
= ur ( R, q, f) R2 sin q dq df.
0 0
2p p R ✓ ◆ 2p R ✓ˆ p ◆
1 ∂ ∂
ˆ ˆ ˆ ˆ ˆ
2
(sin q uq ) r sin q dr dq df = (sin q uq ) dq r dr df
0 0 0 r sin q ∂q 0 0 0 ∂q
ˆ 2p ˆ R
= (sin (p ) uq (r, p, f) sin (0) uq (r, 0, f)) r dr df
0 0
= 0,
since uf (r, q, 2p ) = uf (r, q, 0) because f is a periodic variable with same physical location at 0 and 2p.
Therefore, we have
ˆ ˆ 2p ˆ p ˛
2
(r · u) dV = ur ( R, q, f) R sin q dq df = u · dS,
V 0 0 S
where S is a sphere of radius R located at the origin, with unit normal vector given by r̂, and infinites-
imal surface area given by dS = R2 sin q dq df.
179
1. With u = x2 y i + y2 z j + z2 x k, we use r · u = 2xy + 2yz + 2zx. We have for the left-hand side of
the divergence theorem,
ˆ ˆ Lˆ Lˆ L
(r · u) dV = 2 ( xy + yz + zx ) dx dy dz
V 0 0 0
"ˆ #
L ˆ L ˆ L ˆ L ˆ L ˆ L ˆ L ˆ L ˆ L
=2 x dx y dy dz + dx y dy z dz + x dx dy z dz
0 0 0 0 0 0 0 0 0
= 2( L5 /4 + L5 /4 + L5 /4)
= 3L5 /2.
For the right-hand side of the divergence theorem, the flux integral only has nonzero contributions
from the three sides located at x = L, y = L and z = L. The corresponding unit normal vectors are i,
j and k, and the corresponding integrals are
˛ ˆ Lˆ L ˆ Lˆ L ˆ Lˆ L
2 2
u · dS = L y dy dz + L z dx dz + L2 x dx dy
S 0 0 0 0 0 0
= L /2 + L5 /2 + L5 /2
5
= 3L5 /2.
Note that the integral is equal to three times the volume of the box and is independent of the placement
and orientation of the coordinate system.
180 APPENDIX C. PROBLEM SOLUTIONS
1 d 1
r·u = (r ) = 2 .
r2 dr r
R ✓ ◆
1
ˆ ˆ
(r · u) dV = 4p r2 dr = 4pR.
V 0 r2
For the right-hand side of the divergence theorem, we have for a sphere of radius R centered at the
origin, dS = r̂ dS and
1 4pR2
˛ ˛
u · dS = dS = = 4pR.
S S R R
Note that the integral is equal to three times the volume of the sphere and is independent of the
placement and orientation of the coordinate system.
provided r 6= 0 where f is singular. Therefore, if the volume V does not contain the origin, then
✓ ◆
1
ˆ
2
r dV = 0, (0, 0, 0) 2
/ V.
V r
However, if V contains the origin, we need only integrate over a small sphere of volume V 0 2 V
centered at the origin, since r2 (1/r ) = 0 outside of V 0 . We therefore have from the divergence
theorem ✓ ◆ ✓ ◆ ✓ ◆
1 2 1 1
ˆ ˆ ˛
2
r dV = r dV = r · dS,
V r V0 r S0 r
where the surface S0 is now the surface of a sphere of radius R, say, centered at the origin. Using
spherical coordinates, ✓ ◆ ✓ ◆
1 d 1 r̂
r = r̂ = ,
r dr r r2
and since dS = r̂dS, we have
✓ ◆
1 1
˛ ˛
r · dS = dS = 4p,
S0 r R2 S0
For those of you familar with the Dirac delta function, say from my course Differential Equations for
Engineers, what we have here is ✓ ◆
1
r2 = 4pd(r ),
r
where d(r ) is the three-dimensional Dirac delta function satisfying
d(r ) = 0, when r 6= 0,
and ˆ
d(r ) dV = 1, provided the origin is in V.
V
182 APPENDIX C. PROBLEM SOLUTIONS
∂r
+ r · (ru) = 0.
∂t
∂r
+ u · rr + rr · u = 0.
∂t
2. We begin with
d
ˆ ˛
r(r, t) dV = J · dS.
dt V S
and combining both sides of the equation and bringing the time derivative inside the integral results
in ˆ ✓ ◆
∂r
+r·J dV = 0.
V ∂t
Since the integral is zero for any volume V, we obtain the electrodynamics continuity equation given
by
∂r
+ r · J = 0.
∂t
183
2. d. ˛ ˆ ˆ
r · dS = (r · r ) dV = 3 dV = 3pR2 L.
S V V
1. With u = yi + xj, we use ∂u2 /∂x ∂u1 /∂y = 2. For a square of side L, we have for the left-hand
side of Green’s theorem ˆ ✓ ◆
∂u2 ∂u1
ˆ
dA = 2 dA = 2L2 .
A ∂x ∂y A
When the square lies in the first quadrant with vertex at the origin, we have for the right-hand side of
Green’s theorem,
˛ ˆ L ˆ 0 ˆ 0 ˆ L
(u1 dx + u2 dy) = 0 dx + ( L) dx + 0 dy + L dy = 2L2 .
C 0 L L 0
2. With u = yi + xj, we use ∂u2 /∂x ∂u1 /∂y = 2. For a circle of radius R, we have for the left-hand
side of Green’s theorem, ˆ ✓ ◆
∂u2 ∂u1
ˆ
dA = 2 dA = 2pR2 .
A ∂x ∂y A
For a circle of radius R centered at the origin, we change variables to x = R cos q and y = R sin q. Then
dx = R sin q and dy = R cos q, and we have for the right-hand side of Green’s theorem,
˛ ˛ ˆ 2p
(u1 dx + u2 dy) = ( y dx + x dy) = ( R2 sin2 q + R2 cos2 q )dq = 2pR2 .
C C 0
185
1. Let u = u1 ( x, y, z) i + u2 ( x, y, z) j + u3 ( x, y, z) k. Then
✓ ◆ ✓ ◆ ✓ ◆
∂u3 ∂u2 ∂u1 ∂u3 ∂u2 ∂u1
r⇥u = i+ j+ k.
∂y ∂z ∂z ∂x ∂x ∂y
a) For an area lying in the y-z plane bounded by a curve C, the normal vector to the area is i.
Therefore, Green’s theorem is given by
ˆ ✓ ◆
∂u3 ∂u2
˛
dA = (u2 dy + u3 dz);
A ∂y ∂z C
b) For an area lying in the z-x plane bounded by a curve C, the normal vector to the area is j.
Therefore, Green’s theorem is given by
ˆ ✓ ◆
∂u1 ∂u3
˛
dA = (u3 dz + u1 dx );
A ∂z ∂x C
The correct orientation of the curves are determined by the right-hand rule, using a right-handed
coordinate system.
2. We have u = yi + xj. The right-hand side of Stokes’ theorem was computed in an earlier problem
on Green’s theorem and we repeat the solution here. For a circle of radius R lying in the x-y plane
with center at the origin, we change variables to x = R cos f and y = R sin f. Then dx = R sin f and
dy = R cos f, and we have for the right-hand side of Stokes’ theorem,
˛ ˛ ˛ ˆ 2p
u · dr = (u1 dx + u2 dy) = ( y dx + x dy) = ( R2 sin2 f + R2 cos2 f)dq = 2pR2 .
C C C 0
i j k
r ⇥ u = ∂/∂x ∂/∂y ∂/∂z = 2k;
y x 0
With
k = cos q r̂ ˆ
sin q ✓,
we have
k · r̂ = cos q;
and
ˆ ˆ 2p ˆ p/2
2
(r ⇥ u) · dS = 2R df cos q sin q dq
S 0 0
p/2
= 2pR2 sin2 q 0
= 2pR2 .
186 APPENDIX C. PROBLEM SOLUTIONS
1
˛ ˆ ˆ
u · dr = (r ⇥ u) · dS = 2 dA = pR2 ,
C S 2
1
where we have used dS = k dA and the area of the quarter circle is pR2 .
4
y x
2. c. With u = i+ 2 j, one can show by differentiating that r ⇥ u = 0 provided
x 2 + y2 x + y2
( x, y) 6= (0, 0). However, the integration region contains the origin so the integral is best done by
applying Stokes’ theorem. We use cylindrical coordinates to write
y x ˆ
u= i + j = .
x 2 + y2 x 2 + y2 r
Then, ✓ ◆
ˆ ˛ ˆ 2p ˆ ˆ 2p
(r ⇥ u) · dS = u · dr = · ˆ r df = df = 2p.
S C 0 r 0
1ˆ 1 1 1 1 1
2
˛ ˆ ˆ ˆ ˆ ˆ ˆ
2 2 2
u · dr = (r ⇥ u) · dS = ( x + y ) dx dy = x dx dy + dx y2 dy = ,
C S 0 0 0 0 0 0 3
1.
∂u 1
+ (u · r)u = rp + nr2 u, r · u = 0.
∂t r
Taking the divergence of the Navier-Stokes equation and using the continuity equation results in
1 2
r · ((u · r)u) = r p.
r
Now, !
∂ ∂u ∂ui ∂u j
r · ((u · r)u) = uj i = .
∂xi ∂x j ∂x j ∂xi
Therefore,
∂ui ∂u j
r2 p = r .
∂x j ∂xi
b) Taking the curl of the Navier-Stokes equation, and using ! = r ⇥ u and r ⇥ rp = 0, we obtain
∂!
+ r ⇥ (u · r)u = nr2 !.
∂t
1
u ⇥ (r ⇥ u) = r(u · u) (u · r)u.
2
∂um ∂um
[u ⇥ (r ⇥ u)]i = eijk u j eklm = ekij eklm u j
∂xl ∂xl
∂um ∂u j ∂ui
= (dil djm dim djl )u j = uj uj
∂xl ∂xi ∂x j
1 ∂ ∂ui
= (u u ) uj
2 ∂xi j j ∂x j
1
= r(u · u) [(u · r)u]i .
2 i
Therefore, using that the curl of a gradient and the divergence of a curl is equal to zero, and
! = r ⇥ u and r · u = 0, we have
✓ ◆
1
r ⇥ (u · r)u = r ⇥ r(u · u) u ⇥ (r ⇥ u)
2
= r ⇥ (u ⇥ ! )
= [u(r · ! ) ! (r · u) + (! · r)u (u · r)! ]
= (! · r)u + (u · r)!.
∂!
+ (u · r)! = (! · r)u + nr2 !.
∂t
188 APPENDIX C. PROBLEM SOLUTIONS
1. The electric field from a point charge at the origin should be spherically symmetric. We therefore
write using spherical coordinates, E (r ) = E(r )r̂. Integrating Gauss’s law over a spherical shell of
radius r, we have
q
˛ ˛
E · dS = E(r ) dS = 4pr2 E(r ) = .
S S #0
Therefore, the electric field is given by
q
E (r ) = r̂.
4p# 0 r2
2. The magnetic field from a current carrying infinite wire should have cylindrical symmetry. We
therefore write using cylindrical coordinates, B (r ) = B(r) ˆ . Integrating Ampère’s law over a circle
of radius r in the x-y plane in the counterclockwise direction, we obtain
˛ ˛
B · dr = B(r) dr = 2prB(r) = µ0 I,
C C
µ0 I ˆ
B (r ) = .
2pr
Matrix Algebra for Engineers
Jeffrey R. Chasnov
The Hong Kong University of Science and Technology
Department of Mathematics
Clear Water Bay, Kowloon
Hong Kong
This work is licensed under the Creative Commons Attribution 3.0 Hong Kong License. To view a copy of this
license, visit http://creativecommons.org/licenses/by/3.0/hk/ or send a letter to Creative Commons, 171 Second
Street, Suite 300, San Francisco, California, 94105, USA.
Preface
View the promotional video on YouTube
These are my lecture notes for my online Coursera course, Matrix Algebra for Engineers. I have
divided these notes into chapters called Lectures, with each Lecture corresponding to a video on
Coursera. I have also uploaded all my Coursera videos to YouTube, and links are placed at the top of
each Lecture.
There are problems at the end of each lecture chapter and I have tried to choose problems that
exemplify the main idea of the lecture. Students taking a formal university course in matrix or linear
algebra will usually be assigned many more additional problems, but here I follow the philosophy
that less is more. I give enough problems for students to solidify their understanding of the material,
but not too many problems that students feel overwhelmed and drop out. I do encourage students to
attempt the given problems, but if they get stuck, full solutions can be found in the Appendix.
There are also additional problems at the end of coherent sections that are given as practice quizzes
on the Coursera platform. Again, students should attempt these quizzes on the platform, but if a
student has trouble obtaining a correct answer, full solutions are also found in the Appendix.
The mathematics in this matrix algebra course is at the level of an advanced high school student, but
typically students would take this course after completing a university-level single variable calculus
course. There are no derivatives and integrals in this course, but student’s are expected to have a
certain level of mathematical maturity. Nevertheless, anyone who wants to learn the basics of matrix
algebra is welcome to join.
Jeffrey R. Chasnov
Hong Kong
July 2018
iii
Contents
I Matrices 1
1 Definition of a matrix 5
3 Special matrices 9
4 Transpose matrix 13
6 Inverse matrix 17
7 Orthogonal matrices 21
8 Rotation matrices 23
9 Permutation matrices 25
10 Gaussian elimination 33
12 Computing inverses 39
13 Elementary matrices 43
14 LU decomposition 45
v
vi CONTENTS
15 Solving (LU)x = b 47
16 Vector spaces 57
17 Linear independence 59
19 Gram-Schmidt process 65
21 Null space 71
23 Column space 77
25 Orthogonal projections 83
29 Laplace expansion 99
Matrices
1
3
In this week’s lectures, we learn about matrices. Matrices are rectangular arrays of numbers or
other mathematical objects and are fundamental to engineering mathematics. We will define matrices
and how to add and multiply them, discuss some special matrices such as the identity and zero matrix,
learn about transposes and inverses, and define orthogonal and permutation matrices.
4
Lecture 1
Definition of a matrix
View this lecture on YouTube
An m-by-n matrix is a rectangular array of numbers (or other mathematical objects) with m rows
and n columns. For example, a two-by-two matrix A, with two rows and two columns, looks like
!
a b
A= .
c d
The first row has elements a and b, the second row has elements c and d. The first column has elements
a and c; the second column has elements b and d. As further examples, two-by-three and three-by-two
matrices look like 0 1
! a d
a b c B C
B= , C = @b eA .
d e f
c f
Of special importance are column matrices and row matrices. These matrices are also called vectors.
The column vector is in general n-by-one and the row vector is one-by-n. For example, when n = 3,
we would write a column vector as 0 1
a
B C
x = @bA ,
c
Here, the matrix element of A in the ith row and the jth column is denoted as aij .
5
6 LECTURE 1. DEFINITION OF A MATRIX
a) Write down the three-by-three matrix with ones on the diagonal and zeros elsewhere.
b) Write down the three-by-four matrix with ones on the diagonal and zeros elsewhere.
c) Write down the four-by-three matrix with ones on the diagonal and zeros elsewhere.
Matrices can be added only if they have the same dimension. Addition proceeds element by element.
For example, ! ! !
a b e f a+e b+ f
+ = .
c d g h c+g d+h
Matrices can also be multiplied by a scalar. The rule is to just multiply every element of the matrix.
For example, ! !
a b ka kb
k = .
c d kc kd
Matrices (other than the scalar) can be multiplied only if the number of columns of the left matrix
equals the number of rows of the right matrix. In other words, an m-by-n matrix on the left can only
be multiplied by an n-by-k matrix on the right. The resulting matrix will be m-by-k. Evidently, matrix
multiplication is generally not commutative. We illustrate multiplication using two 2-by-2 matrices:
! ! ! ! ! !
a b e f ae + bg a f + bh e f a b ae + c f be + d f
= , = .
c d g h ce + dg c f + dh g h c d ag + ch bg + dh
First, the first row of the left matrix is multiplied against and summed with the first column of the right
matrix to obtain the element in the first row and first column of the product matrix. Second, the first
row is multiplied against and summed with the second column. Third, the second row is multiplied
against and summed with the first column. And fourth, the second row is multiplied against and
summed with the second column.
In general, an element in the resulting product matrix, say in row i and column j, is obtained by
multiplying and summing the elements in row i of the left matrix with the elements in column j of
the right matrix. We can formally write matrix multiplication in terms of the matrix elements. Let A
be an m-by-n matrix with matrix elements aij and let B be an n-by-p matrix with matrix elements bij .
Then C = AB is an m-by-p matrix, and its ij matrix element can be written as
n
cij = Â aik bkj .
k =1
Notice that the second index of a and the first index of b are summed over.
7
8 LECTURE 2. ADDITION AND MULTIPLICATION OF MATRICES
4. Prove the associative law for matrix multiplication. That is, let A be an m-by-n matrix, B an n-by-p
matrix, and C a p-by-q matrix. Then prove that A(BC) = (AB)C.
Special matrices
View this lecture on YouTube
The zero matrix, denoted by 0, can be any size and is a matrix consisting of all zero elements. Multi-
plication by a zero matrix results in a zero matrix. The identity matrix, denoted by I, is a square matrix
(number of rows equals number of columns) with ones down the main diagonal. If A and I are the
same sized square matrices, then
AI = IA = A,
and multiplication by the identity matrix leaves the matrix unchanged. The zero and identity matrices
play the role of the numbers zero and one in matrix multiplication. For example, the two-by-two zero
and identity matrices are given by
! !
0 0 1 0
0= , I= .
0 0 0 1
A diagonal matrix has its only nonzero elements on the diagonal. For example, a two-by-two diagonal
matrix is given by !
d1 0
D= .
0 d2
Usually, diagonal matrices refer to square matrices, but they can also be rectangular.
A band (or banded) matrix has nonzero elements only on diagonal bands. For example, a three-by-
three band matrix with nonzero diagonals one above and one below a nonzero main diagonal (called
a tridiagonal matrix) is given by
0 1
d1 a1 0
B C
B = @ b1 d2 a2 A .
0 b2 d3
An upper or lower triangular matrix is a square matrix that has zero elements below or above the
diagonal. For example, three-by-three upper and lower triangular matrices are given by
0 1 0 1
a b c a 0 0
B C B C
U = @0 d eA , L = @b d 0A .
0 0 f c e f
9
10 LECTURE 3. SPECIAL MATRICES
Construct a two-by-two matrix B such that AB is the zero matrix. Use two different nonzero columns
for B.
2. Verify that ! ! !
a1 0 b1 0 a1 b1 0
= .
0 a2 0 b2 0 a2 b2
Prove in general that the product of two diagonal matrices is a diagonal matrix, with elements given
by the product of the diagonal elements.
3. Verify that ! ! !
a1 a2 b1 b2 a1 b1 a1 b2 + a2 b3
= .
0 a3 0 b3 0 a3 b3
Prove in general that the product of two upper triangular matrices is an upper triangular matrix, with
the diagonal elements of the product given by the product of the diagonal elements.
a) A and C only
b) A and D only
c) B and C only
d) B and D only
11
12 LECTURE 3. SPECIAL MATRICES
Lecture 4
Transpose matrix
View this lecture on YouTube
The transpose of a matrix A, denoted by AT and spoken as A-transpose, switches the rows and
columns of A. That is,
0 1 0 1
a11 a12 ··· a1n a11 a21 ··· am1
B C B C
B a21 a22 ··· a2n C B a12 T
a22 ··· am2 C
if A = B
B .. .. .. .. C
C, then A = B
B .. .. .. .. C
C.
@ . . . . A @ . . . . A
am1 am2 ··· amn a1n a2n ··· amn
Evidently, if A is m-by-n then AT is n-by-m. As a simple example, view the following transpose pair:
0 1T
a d !
B C a b c
@b eA = .
d e f
c f
A less obvious fact is that the transpose of the product of matrices is equal to the product of the
transposes with the order of multiplication reversed, i.e.,
(AB)T = BT AT .
If A is a square matrix, and AT = A, then we say that A is symmetric. If AT = A, then we say that A
is skew symmetric. For example, three-by-three symmetric and skew symmetric matrices look like
0 1 0 1
a b c 0 b c
B C B C
@ b d eA , @ b 0 eA .
c e f c e 0
13
14 LECTURE 4. TRANSPOSE MATRIX
2. Show using the transpose operator that any square matrix A can be written as the sum of a sym-
metric and a skew-symmetric matrix.
The inner product (or dot product or scalar product) between two vectors is obtained from the ma-
trix product of a row vector times a column vector. A row vector can be obtained from a column
vector by the transpose operator. With the 3-by-1 column vectors u and v, their inner product is given
by
0 1
⇣ ⌘ v1
B C
uT v = u1 u2 u3 @ v2 A = u1 v1 + u2 v2 + u3 v3 .
v3
If the inner product between two nonzero vectors is zero, we say that the vectors are orthogonal. The
norm of a vector is defined by
⇣ ⌘1/2 ⇣ ⌘1/2
||u|| = uT u = u21 + u22 + u23 .
If the norm of a vector is equal to one, we say that the vector is normalized. If a set of vectors are
mutually orthogonal and normalized, we say that these vectors are orthonormal.
An outer product is also defined, and is used in some applications. The outer product between u
and v is given by
0 1 0 1
u1 ⇣ ⌘ u1 v1 u1 v2 u1 v3
B C B C
uvT = @u2 A v1 v2 v3 = @ u2 v1 u2 v2 u2 v3 A .
u3 u3 v1 u3 v2 u3 v3
Notice that every column is a multiple of the single vector u, and every row is a multiple of the single
vector vT .
15
16 LECTURE 5. INNER AND OUTER PRODUCTS
2. The trace of a square matrix B, denoted as Tr B, is the sum of the diagonal elements of B. Prove that
Tr(AT A) is the sum of the squares of all the elements of A.
Inverse matrix
View this lecture on YouTube
Square matrices may have inverses. When a matrix A has an inverse, we say it is invertible and
denote its inverse by A 1. The inverse matrix satisfies
1 1
AA =A A = I.
and try to solve for x1 , y1 , x2 and y2 in terms of a, b, c, and d. There are two inhomogeneous and two
homogeneous linear equations:
To solve, we can eliminate y1 and y2 using the two homogeneous equations, and find x1 and x2 using
the two inhomogeneous equations. The solution for the inverse matrix is found to be
! 1 !
a b 1 d b
= .
c d ad bc c a
The term ad bc is just the definition of the determinant of the two-by-two matrix:
!
a b
det = ad bc.
c d
The determinant of a two-by-two matrix is the product of the diagonals minus the product of the
off-diagonals. Evidently, a two-by-two matrix A is invertible only if det A 6= 0. Notice that the inverse
of a two-by-two matrix, in words, is found by switching the diagonal elements of the matrix, negating
the off-diagonal elements, and dividing by the determinant.
Later, we will show that an n-by-n matrix is invertible if and only if its determinant is nonzero.
This will require a more general definition of the determinant.
17
18 LECTURE 6. INVERSE MATRIX
a) AT BT CT
b) AT CT BT
c) CT AT BT
d) CT BT AT
a) A + AT
b) AAT
c) A AT
d) AT A
!
2 2
3. Which matrix is the inverse of ?
1 2
!
1 2 2
a)
2 1 2
!
1 2 2
b)
2 1 2
!
1 2 2
c)
2 1 2
!
1 2 2
d)
2 1 2
19
20 LECTURE 6. INVERSE MATRIX
Lecture 7
Orthogonal matrices
View this lecture on YouTube
1
Q = QT
QQT = I and QT Q = I.
We can more easily understand orthogonal matrices by examining a general two-by-two example. Let
Q be the orthogonal matrix given by
!
q11 q12 ⇣ ⌘
Q= = q1 q2 ,
q21 q22
where q1 and q2 are the two-by-one column vectors of the matrix Q. Then
! !
qT1 ⇣ ⌘ qT1 q1 qT1 q2
QT Q = q1 q2 = .
qT2 qT2 q1 qT2 q2
That is, the columns of Q form an orthonormal set of vectors. The same argument can also be made
for the rows of Q.
Therefore, an equivalent definition of an orthogonal matrix is a square matrix with real entries
whose columns (and also rows) form a set of orthonormal vectors.
There is a third equivalent definition of an orthogonal matrix. Let Q be an n-by-n orthogonal
matrix, and let x be an n-by-one column vector. Then the length squared of the vector Qx is given by
The length of Qx is therefore equal to the length of x, and we say that an orthogonal matrix is a matrix
that preserves lengths. In the next lecture, an example of an orthogonal matrix will be the matrix that
rotates a two-dimensional vector in the plane.
21
22 LECTURE 7. ORTHOGONAL MATRICES
Rotation matrices
View this lecture on YouTube
A matrix that rotates a vector in space doesn’t change the vector’s length and so should be an orthog-
y'
y
r
x' x
onal matrix. Consider the two-by-two rotation matrix that rotates a vector through an angle q in the
x-y plane, shown above. Trigonometry and the addition formula for cosine and sine results in
x 0 = r cos (q + y) y0 = r sin (q + y)
= r (cos q cos y sin q sin y) = r (sin q cos y + cos q sin y)
= x cos q y sin q = x sin q + y cos q.
The above two-by-two matrix is a rotation matrix and we will denote it by Rq . Observe that the rows
and columns of Rq are orthonormal and that the inverse of Rq is just its transpose. The inverse of Rq
rotates a vector by q.
23
24 LECTURE 8. ROTATION MATRICES
2. Find the three-by-three matrix that rotates a three-dimensional vector an angle q counterclockwise
around the z-axis.
Permutation matrices
View this lecture on YouTube
Another type of orthogonal matrix is a permutation matrix. A permutation matrix, when multiplying
on the left, permutes the rows of a matrix, and when multiplying on the right, permutes the columns.
Clearly, permuting the rows of a column vector will not change its length.
For example, let the string {1, 2} represent the order of the rows of a two-by-two matrix. Then
the two possible permutations of the rows are given by {1, 2} and {2, 1}. The first permutation is
no permutation at all, and the corresponding permutation matrix is simply the identity matrix. The
second permutation of the rows is achieved by
! ! !
0 1 a b c d
= .
1 0 c d a b
The rows of a three-by-three matrix have 3! = 6 possible permutations, namely {1, 2, 3}, {1, 3, 2},
{2, 1, 3}, {2, 3, 1}, {3, 1, 2}, {3, 2, 1}. For example, the row permutation {3, 1, 2} is achieved by
0 10 1 0 1
0 0 1 a b c g h i
B CB C B C
@1 0 0A @ d e f A = @a b cA .
0 1 0 g h i d e f
Notice that the permutation matrix is obtained by permuting the corresponding rows of the identity
matrix, with the rows of the identity matrix permuted as {1, 2, 3} ! {3, 1, 2}. That a permutation
matrix is just a row-permuted identity matix is made evident by writing
PA = (PI)A,
where P is a permutation matrix and PI is the identity matrix with permuted rows. The identity matrix
is orthogonal, and so is the permutation matrix obtained by permuting the rows of the identity matrix.
25
26 LECTURE 9. PERMUTATION MATRICES
2. Find the inverses of all the three-by-three permutation matrices. Explain why some matrices are
their own inverses, and others are not.
2. Which matrix rotates a three-by-one column vector an angle q counterclockwise around the x-axis?
0 1
1 0 0
B C
a) @0 cos q sin q A
0 sin q cos q
0 1
sin q 0 cos q
B C
b) @ 0 1 0 A
cos q 0 sin q
0 1
cos q sin q 0
B C
c) @ sin q cos q 0A
0 0 1
0 1
cos q sin q 0
B C
d) @ sin q cos q 0A
0 0 1
27
28 LECTURE 9. PERMUTATION MATRICES
3. Which matrix, when left multiplying another matrix, moves row one to row two, row two to row
three, and row three to row one?
0 1
0 1 0
B C
a) @0 0 1A
1 0 0
0 1
0 0 1
B C
b) @1 0 0A
0 1 0
0 1
0 0 1
B C
c) @0 1 0A
1 0 0
0 1
1 0 0
B C
d) @0 0 1A
0 1 0
29
31
In this week’s lectures, we learn about solving a system of linear equations. A system of linear
equations can be written in matrix form, and we can solve using Gaussian elimination. We will learn
how to bring a matrix to reduced row echelon form, and how this can be used to compute a matrix
inverse. We will also learn how to find the LU decomposition of a matrix, and how to use this
decomposition to efficiently solve a system of linear equations.
32
Lecture 10
Gaussian elimination
View this lecture on YouTube
3x1 + 2x2 x3 = 1,
6x1 6x2 + 7x3 = 7,
3x1 4x2 + 4x3 = 6,
or symbolically as Ax = b.
The standard numerical algorithm used to solve a system of linear equations is called Gaussian
elimination. We first form what is called an augmented matrix by combining the matrix A with the
column vector b: 0 1
3 2 1 1
B C
@ 6 6 7 7A .
3 4 4 6
Row reduction is then performed on this augmented matrix. Allowed operations are (1) interchange
the order of any rows, (2) multiply any row by a constant, (3) add a multiple of one row to another
row. These three operations do not change the solution of the original equations. The goal here is
to convert the matrix A into upper-triangular form, and then use this form to quickly solve for the
unknowns x.
We start with the first row of the matrix and work our way down as follows. First we multiply the
first row by 2 and add it to the second row. Then we add the first row to the third row, to obtain
0 1
3 2 1 1
B C
@ 0 2 5 9A .
0 2 3 7
33
34 LECTURE 10. GAUSSIAN ELIMINATION
We then go to the second row. We multiply this row by 1 and add it to the third row to obtain
0 1
3 2 1 1
B C
@ 0 2 5 9A .
0 0 2 2
The original matrix A has been converted to an upper triangular matrix, and the transformed equations
can be determined from the augmented matrix as
3x1 + 2x2 x3 = 1,
2x2 + 5x3 = 9,
2x3 = 2.
These equations can be solved by back substitution, starting from the last equation and working
backwards. We have
x3 = 1,
1
x2 = ( 9 5x3 ) = 2,
2
1
x1 = ( 1 + x3 2x2 ) = 2.
3
When performing Gaussian elimination, the matrix element that is used during the elimination proce-
dure is called the pivot. To obtain the correct multiple, one uses the pivot as the divisor to the matrix
elements below the pivot. Gaussian elimination in the way done here will fail if the pivot is zero. If
the pivot is zero, a row interchange must first be performed.
Even if no pivots are identically zero, small values can still result in an unstable numerical compu-
tation. For very large matrices solved by a computer, the solution vector will be inaccurate unless row
interchanges are made. The resulting numerical technique is called Gaussian elimination with partial
pivoting, and is usually taught in a standard numerical analysis course.
35
(a)
(b)
x1 2x2 + 3x3 = 1,
x1 + 3x2 x3 = 1,
2x1 5x2 + 5x3 = 1.
If we continue the row elimination procedure so that all the pivots are one, and all the entries above
and below the pivots are eliminated, then we say that the resulting matrix is in reduced row echelon
form. We notate the reduced row echelon form of a matrix A as rref(A). For example, consider the
three-by-four matrix
0 1
1 2 3 4
B C
A = @4 5 6 7A .
6 7 8 9
We say that the matrix A has two pivot columns, that is two columns that contain a pivot position with
a one in the reduced row echelon form. Note that rows may need to be exchanged when computing
the reduced row echelon form.
37
38 LECTURE 11. REDUCED ROW ECHELON FORM
(a)
0 1
3 7 2 7
B C
A=@ 3 5 1 5A
6 4 0 2
(b)
0 1
1 2 1
B C
A = @2 4 1A
3 6 2
Computing inverses
View this lecture on YouTube
By bringing an invertible matrix to reduced row echelon form, that is, to the identity matrix, we
can compute the matrix inverse. Given a matrix A, consider the equation
1
AA = I,
for the unknown inverse A 1. Let the columns of A 1 be given by the vectors a1 1 , a2 1 , and so on.
The matrix A multiplying the first column of A 1 is the equation
⇣ ⌘T
Aa1 1 = e1 , with e1 = 1 0 ... 0 ,
1
Aai = ei ,
for i = 1, 2, . . . , n. The method then is to do row reduction on an augmented matrix which attaches
the identity matrix to A. To find A 1, elimination is continued until one obtains rref(A) = I.
We illustrate below:
0 1 0 1
3 2 1 1 0 0 3 2 1 1 0 0
B C B C
@ 6 6 7 0 1 0A ! @ 0 2 5 2 1 0A !
3 4 4 0 0 1 0 2 3 1 0 1
0 1 0 1
3 2 1 1 0 0 3 0 4 3 1 0
B C B C
@ 0 2 5 2 1 0A ! @ 0 2 5 2 1 0A !
0 0 2 1 1 1 0 0 2 1 1 1
0 1 0 1
3 0 0 1 1 2 1 0 0 1/3 1/3 2/3
B C B C
@ 0 2 0 1/2 3/2 5/2A ! @ 0 1 0 1/4 3/4 5/4A ;
0 0 2 1 1 1 0 0 1 1/2 1/2 1/2
39
40 LECTURE 12. COMPUTING INVERSES
41
42 LECTURE 12. COMPUTING INVERSES
0 1
3 7 2
B C
3. The inverse of @ 3 5 1A is
6 4 0
0 1
4/3 2/3 1/2
B C
a) @ 2 1 1/2A
3 5 1
0 1
2/3 1/2 4/3
B C
b) @ 1 1/2 2A
3 5 1
0 1
2/3 4/3 1/2
B C
c) @ 1 2 1/2A
5 3 1
0 1
2/3 4/3 1/2
B C
d) @ 1 2 1/2A
3 5 1
Elementary matrices
View this lecture on YouTube
The row reduction algorithm of Gaussian elimination can be implemented by multiplying elemen-
tary matrices. Here, we show how to construct these elementary matrices, which differ from the
identity matrix by a single elementary row operation. Consider the first row reduction step for the
following matrix A:
0 1 0 1 0 1
3 2 1 3 2 1 1 0 0
B C B C B C
A=@ 6 6 7A ! @ 0 2 5 A = M1 A, where M1 = @2 1 0A .
3 4 4 3 4 4 0 0 1
To construct the elementary matrix M1 , the number two is placed in column-one, row-two. This matrix
multiplies the first row by two and adds the result to the second row.
The next step in row elimination is
0 1 0 1 0 1
3 2 1 3 2 1 1 0 0
B C B C B C
@ 0 2 5A ! @ 0 2 5 A = M2 M1 A, where M2 = @0 1 0A .
3 4 4 0 2 3 1 0 1
Here, to construct M2 the number one is placed in column-one, row-three, and the matrix multiplies
the first row by one and adds the result to the third row.
The last step in row elimination is
0 1 0 1 0 1
3 2 1 3 2 1 1 0 0
B C B C B C
@ 0 2 5A ! @ 0 2 5A = M3 M2 M1 A, where M3 = @ 0 1 0A .
0 2 3 0 0 2 0 1 1
Here, to construct M3 the number negative-one is placed in column-two, row-three, and this matrix
multiplies the second row by negative-one and adds the result to the third row.
We have thus found that
M3 M2 M1 A = U,
where U is an upper triangular matrix. This discussion will be continued in the next lecture.
43
44 LECTURE 13. ELEMENTARY MATRICES
LU decomposition
View this lecture on YouTube
In the last lecture, we have found that row reduction of a matrix A can be written as
M3 M2 M1 A = U,
A = M1 1 M2 1 M3 1 U.
Now, the matrix M1 multiples the first row by two and adds it to the second row. To invert this
operation, we simply need to multiply the first row by negative-two and add it to the second row, so
that 0 1 0 1
1 0 0 1 0 0
B C B C
M1 = @ 2 1 0A , M1 1 = @ 2 1 0A .
0 0 1 0 0 1
Similarly,
0 1 0 1 0 1 0 1
1 0 0 1 0 0 1 0 0 1 0 0
B C B C B C B C
M2 = @ 0 1 0A , M2 1 = @ 0 1 0A ; M3 = @0 1 0A , M3 1 = @0 1 0A .
1 0 1 1 0 1 0 1 1 0 1 1
Therefore,
L = M1 1 M2 1 M3 1
is given by
0 10 10 1 0 1
1 0 0 1 0 0 1 0 0 1 0 0
B CB CB C B C
L=@ 2 1 0A @ 0 1 0A @0 1 0A = @ 2 1 0A ,
0 0 1 1 0 1 0 1 1 1 1 1
which is lower triangular. Also, the non-diagonal elements of the elementary inverse matrices are
simply combined to form L. Our LU decomposition of A is therefore
0 1 0 10 1
3 2 1 1 0 0 3 2 1
B C B CB C
@ 6 6 7A = @ 2 1 0A @ 0 2 5A .
3 4 4 1 1 1 0 0 2
45
46 LECTURE 14. LU DECOMPOSITION
Solving (LU)x = b
View this lecture on YouTube
The LU decomposition is useful when one needs to solve Ax = b for many right-hand-sides. With the
LU decomposition in hand, one writes
(LU)x = L(Ux) = b,
and lets y = Ux. Then we solve Ly = b for y by forward substitution, and Ux = y for x by backward
substitution. It is possible to show that for large matrices, solving (LU)x = b is substantially faster
than solving Ax = b directly.
y1 = 1,
y2 = 7 + 2y1 = 9,
y3 = 6 + y1 y2 = 2.
47
48 LECTURE 15. SOLVING (LU)X = B
x3 = 1,
1
x2 = ( 9 5x3 ) = 2,
2
1
x1 = ( 1 2x2 + x3 ) = 2,
3
51
52 LECTURE 15. SOLVING (LU)X = B
0 1
3 7 2
B C
2. Which of the following is the LU decomposition of @ 3 5 1A?
6 4 0
0 10 1
1 0 0 3 7 2
B CB C
a) @ 1 1 0 A @0 2 1A
2 5 1/2 0 0 2
0 10 1
1 0 0 3 7 2
B CB C
b) @ 1 1 0A @0 2 1A
2 5 1 0 0 1
0 10 1
1 0 0 3 7 2
B CB C
c) @ 1 2 1A @0 1 1A
2 10 6 0 0 1
0 10 1
1 0 0 3 7 2
B CB C
d) @ 1 1 0A @ 0 2 1A
4 5 1 6 14 3
0 1 0 1 0 1
1 0 0 3 7 2 1
B C B C B C
3. Suppose L = @ 1 1 0A, U = @0 2 1A, and b = @ 1A. Solve LUx = b by letting
2 5 1 0 0 1 1
y = Ux. The solutions for y and x are
0 1 0 1
1 1/6
B C B C
a) y = @ 0A, x = @1/2A
1 1
0 1 0 1
1 1/6
B C B C
b) y = @ 0A, x = @ 1/2A
1 1
0 1 0 1
1 1/6
B C B C
c) y = @ 0A, x = @ 1/2A
1 1
0 1 0 1
1 1/6
B C B C
d) y = @ 0A, x = @ 1/2A
1 1
Vector Spaces
53
55
In this week’s lectures, we learn about vector spaces. A vector space consists of a set of vectors
and a set of scalars that is closed under vector addition and scalar multiplication and that satisfies
the usual rules of arithmetic. We will learn some of the vocabulary and phrases of linear algebra,
such as linear independence, span, basis and dimension. We will learn about the four fundamental
subspaces of a matrix, the Gram-Schmidt process, orthogonal projection, and the matrix formulation
of the least-squares problem of drawing a straight line to fit noisy data.
56
Lecture 16
Vector spaces
View this lecture on YouTube
A vector space consists of a set of vectors and a set of scalars. Although vectors can be quite gen-
eral, for the purpose of this course we will only consider vectors that are real column matrices, and
scalars that are real numbers.
For the set of vectors and scalars to form a vector space, the set of vectors must be closed under
vector addition and scalar multiplication. That is, when you multiply any two vectors in the set by
real numbers and add them, the resulting vector must still be in the set.
As an example, consider the set of vectors consisting of all three-by-one matrices, and let u and
v be two of these vectors. Let w = au + bv be the sum of these two vectors multiplied by the real
numbers a and b. If w is still a three-by-one matrix, then this set of vectors is closed under scalar
multiplication and vector addition, and is indeed a vector space. The proof is rather simple. If we let
0 1 0 1
u1 v1
B C B C
u = @ u2 A , v = @ v2 A ,
u3 v3
then 0 1
au1 + bv1
B C
w = au + bv = @ au2 + bv2 A
au3 + bv3
57
58 LECTURE 16. VECTOR SPACES
2. Explain why the following sets of three-by-one matrices (with real number scalars) are vector spaces:
(a) The set of three-by-one matrices with zero in the first row;
(b) The set of three-by-one matrices with first row equal to the second row;
(c) The set of three-by-one matrices with first row a constant multiple of the third row.
Linear independence
View this lecture on YouTube
The set of vectors, {u1 , u2 , . . . , un }, are linearly independent if for any scalars c1 , c2 , . . . , cn , the equation
c1 u1 + c2 u2 + · · · + cn un = 0
has only the solution c1 = c2 = · · · = cn = 0. What this means is that one is unable to write any of
the vectors u1 , u2 , . . . , un as a linear combination of any of the other vectors. For instance, if there was
a solution to the above equation with c1 6= 0, then we could solve that equation for u1 in terms of the
other vectors with nonzero coefficients.
As an example consider whether the following three three-by-one column vectors are linearly
independent:
0 1 0 1 0 1
1 0 2
B C B C B C
u = @0A , v = @1A , w = @3A .
0 0 0
Indeed, they are not linearly independent, that is, they are linearly dependent, because w can be written
in terms of u and v. In fact, w = 2u + 3v.
Now consider the three three-by-one column vectors given by
0 1 0 1 0 1
1 0 0
B C B C B C
u = @0A , v = @1A , w = @0A .
0 0 1
These three vectors are linearly independent because you cannot write any one of these vectors as a
linear combination of the other two. If we go back to our definition of linear independence, we can
see that the equation
0 1 0 1
a 0
B C B C
au + bv + cw = @bA = @0A
c 0
59
60 LECTURE 17. LINEAR INDEPENDENCE
Given a set of vectors, one can generate a vector space by forming all linear combinations of that
set of vectors. The span of the set of vectors {v1 , v2 , . . . , vn } is the vector space consisting of all linear
combinations of v1 , v2 , . . . , vn . We say that a set of vectors spans a vector space.
For example, the set of vectors given by
80 1 0 1 0 19
< 1
> 0 2 >=
B C B C B C
@0A , @1A , @3A
>
: >
;
0 0 0
spans the vector space of all three-by-one matrices with zero in the third row. This vector space is a
vector subspace of all three-by-one matrices.
One doesn’t need all three of these vectors to span this vector subspace because any one of these
vectors is linearly dependent on the other two. The smallest set of vectors needed to span a vector
space forms a basis for that vector space. Here, given the set of vectors above, we can construct a basis
for the vector subspace of all three-by-one matrices with zero in the third row by simply choosing two
out of three vectors from the above spanning set. Three possible bases are given by
80 1 0 19 80 1 0 19 80 1 0 19
< 1
> 0 >
= < 1
> 2 >
= < 0
> 2 >
=
B C B C B C B C B C B C
@0A , @1A , @0A , @3A , @1A , @3A .
>
: >
; >
: >
; >
: >
;
0 0 0 0 0 0
Although all three combinations form a basis for the vector subspace, the first combination is usually
preferred because this is an orthonormal basis. The vectors in this basis are mutually orthogonal and
of unit norm.
The number of vectors in a basis gives the dimension of the vector space. Here, the dimension of
the vector space of all three-by-one matrices with zero in the third row is two.
61
62 LECTURE 18. SPAN, BASIS AND DIMENSION
b) The set of three-by-one matrices with the sum of all the rows equal to one.
c) The set of three-by-one matrices with the first row equal to the third row.
d) The set of three-by-one matrices with the first row equal to the sum of the second and third rows.
63
64 LECTURE 18. SPAN, BASIS AND DIMENSION
3. Which of the following is an orthonormal basis for the vector space of all three-by-one matrices
with the sum of all rows equal to zero?
8 0 1 0 19
>
< 1 1 1 >=
B C 1 B C
a) p @ 1A , p @ 1A
>
: 2 2 >
;
0 0
8 0 1 0 19
>
< 1 1 1 >
=
B C 1 B C
b) p @ 1A , p @ 1A
>
: 2 6 >
;
0 2
8 0 1 0 1 0 19
>
< 1 1 1 0 >
=
B C 1 B C 1 B C
c) p @ 1A , p @ 0A , p @ 1A
>
: 2 2 2 >
;
0 1 1
8 0 1 0 1 0 19
>
< 1 2 1 1 >
=
B C 1 B C 1 B C
d) p @ 1A , p @ 2A , p @ 1A
>
: 6 6 6 >
;
1 1 2
Gram-Schmidt process
View this lecture on YouTube
Given any basis for a vector space, we can use an algorithm called the Gram-Schmidt process to
construct an orthonormal basis for that space. Let the vectors v1 , v2 , . . . , vn be a basis for some n-
dimensional vector space. We will assume here that these vectors are column matrices, but this process
also applies more generally.
We will construct an orthogonal basis u1 , u2 , . . . , un , and then normalize each vector to obtain an
orthonormal basis. First, define u1 = v1 . To find the next orthogonal basis vector, define
(uT1 v2 )u1
u2 = v2 .
uT1 u1
Observe that u2 is equal to v2 minus the component of v2 that is parallel to u1 . By multiplying both
sides of this equation with uT1 , it is easy to see that uT1 u2 = 0 so that these two vectors are orthogonal.
The next orthogonal vector in the new basis can be found from
Here, u3 is equal to v3 minus the components of v3 that are parallel to u1 and u2 . We can continue in
this fashion to construct n orthogonal basis vectors. These vectors can then be normalized via
u1
u
b1 = T
, etc.
(u1 u1 )1/2
Since uk is a linear combination of v1 , v2 , . . . , vk , the vector subspace spanned by the first k basis
vectors of the original vector space is the same as the subspace spanned by the first k orthonormal
vectors generated through the Gram-Schmidt process. We can write this result as
span{u1 , u2 , . . . , uk } = span{v1 , v2 , . . . , vk }.
65
66 LECTURE 19. GRAM-SCHMIDT PROCESS
and construct an orthonormal basis for this subspace. Let u1 = v1 . Then u2 is found from
(uT1 v2 )u1
u2 = v2
uT1 u1
0 1 0 1 0 1
0 1 2
B C 2B C 1B C
= @1A @1A = @ 1A .
3 3
1 1 1
Notice that the initial two vectors v1 and v2 span the vector subspace of three-by-one column matri-
ces for which the second and third rows are equal. Clearly, the orthonormal basis vectors constructed
from the Gram-Schmidt process span the same subspace.
67
68 LECTURE 20. GRAM-SCHMIDT PROCESS EXAMPLE
Use the Gram-Schmidt process to construct an orthonormal basis for this subspace.
2. Consider a subspace of all four-by-one column vectors with the following basis:
80 1 0 1 0 19
> 1
>
> 0 0 >>
>
<B C B C B C> >
B1C B1C B0C =
B C B C B
W = B C,B C,B C . C
>
>
> @1A @1A @1A > >
>
>
: 1 >
1 1 ;
Use the Gram-Schmidt process to construct an orthonormal basis for this subspace.
a) v1
b) v2
c) v3
d) v4
( ! !)
1 1
2. The Gram-Schmidt process applied to {v1 , v2 } = , results in
1 1
( ! !)
1 1 1 1
a) {u
b1 , u
b2 } = p ,p
2 1 2 1
( ! !)
1 1 0
b) {u
b1 , u
b2 } = p ,
2 1 0
( ! !)
1 0
c) {u
b1 , u
b2 } = ,
0 1
( ! !)
1 1 1 2
d) {u
b1 , u
b2 } = p ,p
3 2 3 1
69
70 LECTURE 20. GRAM-SCHMIDT PROCESS EXAMPLE
80 1 0 19
>
< 1 0 >
=
B C B C
3. The Gram-Schmidt process applied to {v1 , v2 } = @ 1A , @ 1A results in
>
: >
;
1 1
8 0 1 0 19
>
< 1 1 0 >
=
B C 1 B C
a) {u
b1 , u
b 2 } = p @ 1A , p @1A
>
: 3 2 >
;
1 1
8 0 1 0 19
>
< 1 1 2 >
=
B C 1 B C
b) {u
b1 , u
b 2 } = p @ 1A , p @ 1A
>
: 3 6 >
;
1 1
8 0 1 0 19
>
< 1 1 1 >
=
B C 1 B C
c) {u
b1 , u
b 2 } = p @ 1A , p @ 1A
> 3
: 2 >
;
1 0
8 0 1 0 19
>
< 1 1 1 >=
B C 1 B C
d) {u
b1 , u
b2 } = p @ 1A , p @0A
>
: 3 2 >
;
1 1
Null space
Clearly, if x and y are in the null space of A, then so is ax + by so that the null space is closed under
vector addition and scalar multiplication. If the matrix A is m-by-n, then Null(A) is a vector subspace
of all n-by-one column matrices. If A is a square invertible matrix, then Null(A) consists of just the
zero vector.
To find a basis for the null space of a noninvertible matrix, we bring A to reduced row echelon
form. We demonstrate by example. Consider the three-by-five matrix given by
0 1
3 6 1 1 7
B C
A=@ 1 2 2 3 1A .
2 4 5 8 4
By judiciously permuting rows to simplify the arithmetic, one pathway to construct rref(A) is
0 1 0 1 0 1
3 6 1 1 7 1 2 2 3 1 1 2 2 3 1
B C B C B C
@ 1 2 2 3 1A ! @ 3 6 1 1 7A ! @0 0 5 10 10A !
2 4 5 8 4 2 4 5 8 4 0 0 1 2 2
0 1 0 1
1 2 2 3 1 1 2 0 1 3
B C B C
@0 0 1 2 2 A ! @0 0 1 2 2A .
0 0 5 10 10 0 0 0 0 0
We call the variables associated with the pivot columns, x1 and x3 , basic variables, and the variables
associated with the non-pivot columns, x2 , x4 and x5 , free variables. Writing the basic variables on the
left-hand side of the Ax = 0 equations, we have from the first and second rows
x1 = 2x2 + x4 3x5 ,
x3 = 2x4 + 2x5 .
71
72 LECTURE 21. NULL SPACE
Eliminating x1 and x3 , we can write the general solution for vectors in Null(A) as
0 1 0 1 0 1 0 1
2x2 + x4 3x5 2 1 3
B C B C B C B C
B x2 C B1C B 0C B 0C
B C B C B C B C
B C = x2 B0C + x4 B 2C + x5 B
C 2C
B 2x4 + 2x5 C B C B B C,
B C B C B C B C
@ x4 A @0A @ 1A @ 0A
x5 0 0 1
where the free variables x2 , x4 , and x5 can take any values. By writing the null space in this form, a
basis for Null(A) is made evident, and is given by
80 1 0 1 0 19
>
> 2 1 3 > >
>
> B C B C B C>>
>
> 1C B 0C 0C >
<BB C B C
B
B C=
>
B0C , B 2C , B C
2C .
> B C B C B
>
> B C B C B C>>
>
> @0A @ 1A @ 0A >>
>
>
: >
;
0 0 1
The null space of A is seen to be a three-dimensional subspace of all five-by-one column matrices. In
general, the dimension of Null(A) is equal to the number of non-pivot columns of rref(A).
73
An under-determined system of linear equations Ax = b with more unknowns than equations may
not have a unique solution. If u is the general form of a vector in the null space of A, and v is any
vector that satisfies Av = b, then x = u + v satisfies Ax = A(u + v) = Au + Av = 0 + b = b. The
general solution of Ax = b can therefore be written as the sum of a general vector in Null(A) and a
particular vector that satisfies the under-determined system.
As an example, suppose we want to find the general solution to the linear system of two equations
and three unknowns given by
2x1 + 2x2 + x3 = 0,
2x1 2x2 x3 = 1,
The null space satisfying Au = 0 is determined from u1 = 0 and u2 = u3 /2, and we can write
80 19
>
< 0 >
=
B C
Null(A) = span @ 1A .
>
: >
;
2
A particular solution for the inhomogeneous system satisfying Av = b is found by solving v1 = 1/4
and v2 + v3 /2 = 1/4. Here, we simply take the free variable v3 to be zero, and we find v1 = 1/4
and v2 = 1/4. The general solution to the original underdetermined linear system is the sum of the
null space and the particular solution and is given by
0 1 0 1 0 1
x1 0 1
B C B C 1B C
@ x2 A = a @ 1A + @ 1A .
4
x3 2 0
75
76 LECTURE 22. APPLICATION OF THE NULL SPACE
3x1 + 6x2 x3 + x4 = 7,
x1 2x2 + 2x3 + 3x4 = 1,
2x1 4x2 + 5x3 + 8x4 = 4.
Column space
View this lecture on YouTube
The column space of a matrix is the vector space spanned by the columns of the matrix. When a
matrix is multiplied by a column vector, the resulting vector is in the column space of the matrix, as
can be seen from ! ! ! ! !
a b x ax + by a b
= =x +y .
c d y cx + dy c d
In general, Ax is a linear combination of the columns of A. Given an m-by-n matrix A, what is the
dimension of the column space of A, and how do we find a basis? Note that since A has m rows, the
column space of A is a subspace of all m-by-one column matrices.
Fortunately, a basis for the column space of A can be found from rref(A). Consider the example
0 1 0 1
3 6 1 1 7 1 2 0 1 3
B C B C
A=@ 1 2 2 3 1A , rref(A) = @0 0 1 2 2A .
2 4 5 8 4 0 0 0 0 0
The matrix equation Ax = 0 expresses the linear dependence of the columns of A, and row operations
on A do not change the dependence relations. For example, the second column of A above is 2 times
the first column, and after several row operations, the second column of rref(A) is still 2 times the
first column.
It should be self-evident that only the pivot columns of rref(A) are linearly independent, and the
dimension of the column space of A is therefore equal to its number of pivot columns; here it is two.
A basis for the column space is given by the first and third columns of A, (not rref(A)), and is
80 1 0 19
>
< 3 1 >=
B C B C
@ 1A , @ 2A .
>
: >
;
2 5
Recall that the dimension of the null space is the number of non-pivot columns—equal to the
number of free variables—so that the sum of the dimensions of the null space and the column space
is equal to the total number of columns. A statement of this theorem is as follows. Let A be an m-by-n
matrix. Then
dim(Col(A)) + dim(Null(A)) = n.
77
78 LECTURE 23. COLUMN SPACE
In addition to the column space and the null space, a matrix A has two more vector spaces asso-
ciated with it, namely the column space and null space of AT , which are called the row space and the
left null space.
If A is an m-by-n matrix, then the row space and the null space are subspaces of all n-by-one
column matrices, and the column space and the left null space are subspaces of all m-by-one column
matrices.
The null space consists of all vectors x such that Ax = 0, that is, the null space is the set of all
vectors that are orthogonal to the row space of A. We say that these two vector spaces are orthogonal.
A basis for the row space of a matrix can be found from computing rref(A), and is found to be
rows of rref(A) (written as column vectors) with pivot columns. The dimension of the row space of A
is therefore equal to the number of pivot columns, while the dimension of the null space of A is equal
to the number of nonpivot columns. The union of these two subspaces make up the vector space of all
n-by-one matrices and we say that these subspaces are orthogonal complements of each other.
Furthermore, the dimension of the column space of A is also equal to the number of pivot columns,
so that the dimensions of the column space and the row space of a matrix are equal. We have
dim(Col(A)) = dim(Row(A)).
We call this dimension the rank of the matrix A. This is an amazing result since the column space and
row space are subspaces of two different vector spaces. In general, we must have rank(A) min(m, n).
When the equality holds, we say that the matrix is of full rank. And when A is a square matrix and of
full rank, then the dimension of the null space is zero and A is invertible.
79
80 LECTURE 24. ROW SPACE, LEFT NULL SPACE AND RANK
Check to see that null space is the orthogonal complement of the row space, and the left null space is
the orthogonal complement of the column space. Find rank(A). Is this matrix of full rank?
81
82 LECTURE 24. ROW SPACE, LEFT NULL SPACE AND RANK
x1 + 2x2 + x4 = 1,
2x1 + 4x2 + x3 + x4 = 1,
3x1 + 6x2 + x3 + x4 = 1,
is
0 1 0 1
0 2
B C B C
B0C B 1C
a) aB C B C
B0C + B 0C
@ A @ A
1 0
0 1 0 1
2 0
B C B C
B 1C B0C
b) aB C B C
B 0C + B0C
@ A @ A
0 1
0 1 0 1
0 0
B C B C
B0C B 0C
c) aB C B C
B0C + B 3C
@ A @ A
1 2
0 1 0 1
0 0
B C B C
B 0C B0C
d) aB C B C
B 3C + B0C
@ A @ A
2 1
0 1
1 2 0 1
B C
3. What is the rank of the matrix @2 4 1 1A?
3 6 1 1
a) 1
b) 2
c) 3
d) 4
Orthogonal projections
View this lecture on YouTube
Suppose that V is the n-dimensional vector space of all n-by-one matrices and W is a p-dimensional
subspace of V. Let {s1 , s2 , . . . , s p } be an orthonormal basis for W. Extending the basis for W, let
{s1 , s2 , . . . , s p , t1 , t2 , . . . , tn p } be an orthonormal basis for V.
Any vector v in V can be expanded using the basis for V as
v = a1 s1 + a2 s2 + · · · + a p s p + b1 t1 + b2 t2 + bn p tn p ,
where the a’s and b’s are scalar coefficients. The orthogonal projection of v onto W is then defined as
vprojW = a1 s1 + a2 s2 + · · · + a p s p ,
w = c1 s1 + c2 s2 + · · · + c p s p .
The distance between v and w is given by the norm ||v w||, and we have
or ||v vprojW || ||v w||, a result that will be used later in the problem of least squares.
83
84 LECTURE 25. ORTHOGONAL PROJECTIONS
Suppose there is some experimental data that you want to fit by a straight line. This is called a
linear regression problem and an illustrative example is shown below.
y
Linear regression
y1 = b 0 + b 1 x1 , y2 = b 0 + b 1 x2 , ..., yn = b 0 + b 1 xn .
These equations constitute a system of n equations in the two unknowns b 0 and b 1 . The corresponding
matrix equation is given by
0 1 0 1
1 x1 y1
B C ! B C
B1 x2 C b0 B y2 C
B. .. C =B C
B. C b1 B .. C .
@. . A @.A
1 xn yn
This is an overdetermined system of equations with no solution. The problem of least squares is to
find the best solution.
We can generalize this problem as follows. Suppose we are given a matrix equation, Ax = b, that
has no solution because b is not in the column space of A. So instead we solve Ax = bprojCol(A) , where
bprojCol(A) is the projection of b onto the column space of A. The solution is then called the least-squares
solution for x.
85
86 LECTURE 26. THE LEAST-SQUARES PROBLEM
A unique solution to this matrix equation exists when the columns of A are linearly independent.
An interesting formula exists for the matrix which projects b onto the column space of A. Multi-
plying the normal equations on the left by A(AT A) 1, we obtain
Ax = A(AT A) 1
AT b = bprojCol(A) .
Notice that the projection matrix P = A(AT A) 1 AT satisfies P2 = P, that is, two projections is the
same as one. If A itself is a square invertible matrix, then P = I and b is already in the column space
of A.
As an example of the application of the normal equations, consider the toy least-squares problem of
fitting a line through the three data points (1, 1), (2, 3) and (3, 2). With the line given by y = b 0 + b 1 x,
the overdetermined system of equations is given by
0 1 0 1
1 1 ! 1
B C b0 B C
@1 2A = @3A .
b1
1 3 2
or ! ! !
3 6 b0 6
= .
6 14 b1 13
We can using Gaussian elimination to determine b 0 = 1 and b 1 = 1/2, and the least-squares line is
given by y = 1 + x/2. The graph of the data and the line is shown below.
87
88 LECTURE 27. SOLUTION OF THE LEAST-SQUARES PROBLEM
2
y
1 2 3
x
2. Suppose we have data points given by ( xn , yn ) = (1, 1), (2, 1), and (3, 3). If the data is to be fit by
the line y = b 0 + b 1 x, which is the overdetermined equation for b 0 and b 1 ?
0 1 0 1
1 1 ! 1
B C b0 B C
a) @1 1A = @2A
b1
3 1 3
0 1 0 1
1 1 ! 1
B C b0 B C
b) @2 1A = @1A
b1
3 1 3
0 1 0 1
1 1 ! 1
B C b0 B C
c) @1 1A = @2A
b1
1 3 3
0 1 0 1
1 1 ! 1
B C b0 B C
d) @1 2A = @1A
b1
1 3 3
91
92 LECTURE 27. SOLUTION OF THE LEAST-SQUARES PROBLEM
3. Suppose we have data points given by ( xn , yn ) = (1, 1), (2, 1), and (3, 3). Which is the best fit line
to the data?
1
a) y = +x
3
1
b) y = +x
3
1
c) y = 1 + x
3
1
d) y = 1 x
3
Solutions to the Practice quiz
Week IV
93
95
In this week’s lectures, we will learn about determinants and the eigenvalue problem. We will
learn how to compute determinants using a Laplace expansion, the Leibniz formula, or by row or
column elimination. We will formulate the eigenvalue problem and learn how to find the eigenvalues
and eigenvectors of a matrix. We will learn how to diagonalize a matrix using its eigenvalues and
eigenvectors, and how this leads to an easy calculation of a matrix raised to a power.
96
Lecture 28
We already showed that a two-by-two matrix A is invertible when its determinant is nonzero, where
a b
det A = = ad bc.
c d
If A is invertible, then the equation Ax = b has the unique solution x = A 1 b. But if A is not invertible,
then Ax = b may have no solution or an infinite number of solutions. When det A = 0, we say that
the matrix A is singular.
It is also straightforward to define the determinant for a three-by-three matrix. We consider the
system of equations Ax = 0 and determine the condition for which x = 0 is the only solution. With
0 10 1
a b c x1
B CB C
@d e f A @ x2 A = 0,
g h i x3
one can do the messy algebra of elimination to solve for x1 , x2 , and x3 . One finds that x1 = x2 = x3 = 0
is the only solution when det A 6= 0, where the definition, apart from a constant, is given by
An easy way to remember this result is to mentally draw the following picture:
0 1 0 1
B a b c a b C B a b c a b C
B C B C
B d e f d e C B e C
B C — B d e f d C
B C B C
@ A @ A
g h i g h g h i g h
.
The matrix A is periodically extended two columns to the right, drawn explicitly here but usually only
imagined. Then the six terms comprising the determinant are made evident, with the lines slanting
down towards the right getting the plus signs and the lines slanting down towards the left getting the
minus signs. Unfortunately, this mnemonic only works for three-by-three matrices.
97
98 LECTURE 28. TWO-BY-TWO AND THREE-BY-THREE DETERMINANTS
2. Show that the three-by-three determinant changes sign when the first two rows are interchanged.
3. Let A and B be two-by-two matrices. Prove by direct computation that det AB = det A det B.
Laplace expansion
View this lecture on YouTube
There is a way to write the three-by-three determinant that generalizes. It is called a Laplace expansion
(also called a cofactor expansion or expansion by minors). For the three-by-three determinant, we have
a b c
d e f = aei + b f g + cdh ceg bdi afh
g h i
= a(ei f h) b(di f g) + c(dh eg),
a b c
e f d f d e
d e f =a b +c .
h i g i g h
g h i
Evidently, the three-by-three determinant can be computed from lower-order two-by-two determi-
nants, called minors. The rule here for a general n-by-n matrix is that one goes across the first row of
the matrix, multiplying each element in the row by the determinant of the matrix obtained by crossing
out that element’s row and column, and adding the results with alternating signs.
In fact, this expansion in minors can be done across any row or down any column. When the minor
is obtained by deleting the ith-row and j-th column, then the sign of the term is given by ( 1)i+ j . An
easy way to remember the signs is to form a checkerboard pattern, exhibited here for the three-by-three
and four-by-four matrices: 0 1
0 1 + +
+ + B C
B C B + +C
@ + A, B C.
B+ + C
@ A
+ +
+ +
We first expand in minors down the second column. The only nonzero contribution comes from the
99
100 LECTURE 29. LAPLACE EXPANSION
two in the third row, and we cross out the second column and third row (and multiply by a minus
sign) to obtain a three-by-three determinant:
1 0 0 1
1 0 1
3 0 0 5
= 2 3 0 5 .
2 2 4 3
1 5 0
1 0 5 0
We then again expand in minors down the second column. The only nonzero contribution comes
from the five in the third row, and we cross out the second column and third row (and mutiply by a
minus sign) to obtain a two-by-two determinant, which we then compute:
1 0 1
1 1
2 3 0 5 = 10 = 80.
3 5
1 5 0
The trick here is to expand by minors across the row or column containing the most zeros.
101
Leibniz formula
View this lecture on YouTube
Another way to generalize the three-by-three determinant is called the Leibniz formula, or more de-
scriptively, the big formula. The three-by-three determinant can be written as
a b c
d e f = aei afh + bfg bdi + cdh ceg,
g h i
where each term in the formula contains a single element from each row and from each column. For
example, to obtain the third term b f g, b comes from the first row and second column, f comes from
the second row and third column, and g comes from the third row and first column. As we can choose
one of three elements from the first row, then one of two elements from the second row, and only
one element from the third row, there are 3! = 6 terms in the formula, and the general n-by-n matrix
without any zero entries will have n! terms.
The sign of each term depends on whether the choice of columns as we go down the rows is an even
or odd permutation of the columns ordered as {1, 2, 3, . . . , n}. An even permutation is when columns
are interchanged an even number of times, and an odd permutation is when they are interchanged an
odd number of times. Even permutations get a plus sign and odd permutations get a minus sign.
For the determinant of the three-by-three matrix, the plus terms aei, b f g, and cdh correspond to
the column orderings {1, 2, 3}, {2, 3, 1}, and {3, 1, 2}, which are even permutations of {1, 2, 3}, and
the minus terms a f h, bdi, and ceg correspond to the column orderings {1, 3, 2}, {2, 1, 3}, and {3, 2, 1},
which are odd permutations.
103
104 LECTURE 30. LEIBNIZ FORMULA
Properties of a determinant
View this lecture on YouTube
The determinant is a function that maps a square matrix to a scalar. It is uniquely defined by the
following three properties:
Property 1: The determinant of the identity matrix is one;
Property 2: The determinant changes sign under row interchange;
Property 3: The determinant is a linear function of the first row, holding all other rows fixed.
Using two-by-two matrices, the first two properties are illustrated by
1 0 a b c d
=1 and = ;
0 1 c d a b
ka kb a b a + a0 b + b0 a b a0 b0
=k and = + .
c d c d c d c d c d
Both the Laplace expansion and Leibniz formula for the determinant can be proved from these three
properties. Other useful properties of the determinant can also be proved:
• The determinant is a linear function of any row, holding all other rows fixed;
• The determinant of an upper or lower triangular matrix is the product of the diagonal elements;
• The determinant of the product of two matrices is equal to the product of the determinants;
• The determinant of the inverse matrix is equal to the reciprical of the determinant;
• The determinant of the transpose of a matrix is equal to the determinant of the matrix.
Notably, these properties imply that Gaussian elimination, done on rows or columns or both, can be
used to simplify the computation of a determinant. Row interchanges and multiplication of a row by
a constant change the determinant and must be treated correctly.
105
106 LECTURE 31. PROPERTIES OF A DETERMINANT
2. Using the defining properties of a determinant, prove that the determinant is a linear function of
any row, holding all other rows fixed.
3. Using the results of the above problems, prove that if we add k times row-i to row-j, the determinant
doesn’t change.
a) 48
b) 42
c) 42
d) 48
0 1
a e 0 0
B C
Bb f g 0C
2. The determinant of B
Bc
C is equal to
@ 0 h iCA
d 0 0 j
c) agij beij + ce f j de f h
d) agij + beij ce f j de f h
3. Assume A and B are invertible n-by-n matrices. Which of the following identities is false?
a) det A 1 = 1/ det A
b) det AT = det A
107
108 LECTURE 31. PROPERTIES OF A DETERMINANT
Lecture 32
Let A be a square matrix, x a column vector, and l a scalar. The eigenvalue problem for A solves
Ax = lx
for eigenvalues li with corresponding eigenvectors xi . Making use of the identity matrix I, the eigen-
value problem can be rewritten as
(A lI)x = 0,
where the matrix (A lI) is just the matrix A with l subtracted from its diagonal. For there to be
nonzero eigenvectors, the matrix (A lI) must be singular, that is,
det (A lI) = 0.
This equation is called the characteristic equation of the matrix A. From the Leibniz formula, the char-
acteristic equation of an n-by-n matrix is an n-th order polynomial equation in l. For each found li , a
corresponding eigenvector xi can be determined directly by solving (A li I)x = 0 for x.
For illustration, we compute the eigenvalues of a general two-by-two matrix. We have
a l b
0 = det (A lI) = = (a l)(d l) bc = l2 ( a + d)l + ( ad bc);
c d l
l2 Tr A l + det A = 0,
109
110 LECTURE 32. THE EIGENVALUE PROBLEM
We compute here the two real eigenvalues and eigenvectors ! of a two-by-two matrix.
0 1
Example: Find the eigenvalues and eigenvectors of A = .
1 0
The characteristic equation of A is given by
l2 1 = 0,
The equation from the second row is just a constant multiple of the equation from the first row and
this will always be the case for two-by-two matrices. From the first row, say, we find x2 = x1 . The
second eigenvector is found by solving (A l2 I)x = 0, or
! !
1 1 x1
= 0,
1 1 x2
The eigenvectors can be multiplied by an arbitrary nonzero constant. Notice that l1 + l2 = Tr A and
that l1 l2 = det A, and analogous relations are true for any n-by-n matrix. In particular, comparing
the sum over all the eigenvalues and the matrix trace provides a simple algebra check.
111
112 LECTURE 33. FINDING EIGENVALUES AND EIGENVECTORS (1)
l2 = 0,
so that there is a degenerate eigenvalue of zero. The eigenvector associated with the zero eigenvalue
is found from Bx = 0 and has zero second component. This matrix therefore has only one eigenvalue
and eigenvector, given by !
1
l = 0, x = .
0
!
0 1
Example: Find the eigenvalues of C = .
1 0
The characteristic equation of C is given by
l2 + 1 = 0,
which has the imaginary solutions l = ±i. Matrices with complex eigenvalues play an important role
in the theory of linear differential equations.
113
114 LECTURE 34. FINDING EIGENVALUES AND EIGENVECTORS (2)
a) 1 ± 3i
p
b) 1 ± 3
p
c) 3 3 ± 1
d) 3 ± i
115
116 LECTURE 34. FINDING EIGENVALUES AND EIGENVECTORS (2)
0 1
2 1 0
B C
3. Which of the following is an eigenvector of @1 2 1A?
0 1 2
0 1
1
B C
a) @0A
1
0 1
1
Bp C
b) @ 2A
1
0 1
0
B C
c) @1A
0
0p 1
2
B C
d) @ 1 A
p
2
Matrix diagonalization
View this lecture on YouTube
For concreteness, consider a two-by-two matrix A with eigenvalues and eigenvectors given by
! !
x11 x12
l 1 , x1 = ; l 2 , x2 = .
x21 x22
Generalizing, we define S to be the matrix whose columns are the eigenvectors of A, and L to be
the diagonal matrix with eigenvalues down the diagonal. Then for any n-by-n matrix with n linearly
independent eigenvectors, we have
AS = SL,
where S is an invertible matrix. Multiplying both sides on the right or the left by S 1, we derive the
relations
1 1
A = SLS or L=S AS.
To remember the order of the S and S 1 matrices in these formulas, just remember that A should be
multiplied on the right by the eigenvectors placed in the columns of S.
117
118 LECTURE 35. MATRIX DIAGONALIZATION
2. Prove that if the columns of an n-by-n matrix are linearly independent, then the matrix is invertible.
(An n-by-n matrix whose columns are eigenvectors corresponding to distinct eigenvalues is therefore
invertible.)
!
a b
Example: Diagonalize the matrix A = .
b a
The eigenvalues of A are determined from
a l b
det(A lI) = = (a l )2 b2 = 0.
b a l
Solving for l, the two eigenvalues are given by l1 = a + b and l2 = a b. The corresponding
eigenvector for l1 is found from (A l1 I)x1 = 0, or
! ! !
b b x11 0
= ;
b b x21 0
Solving for the eigenvectors and normalizing them, the eigenvalues and eigenvectors are given by
! !
1 1 1 1
l1 = a + b, x1 = p ; l2 = a b, x2 = p .
2 1 2 1
119
120 LECTURE 36. MATRIX DIAGONALIZATION EXAMPLE
Powers of a matrix
View this lecture on YouTube
Diagonalizing a matrix facilitates finding powers of that matrix. Suppose that A is diagonalizable,
and consider
A2 = (SLS 1
)(SLS 1
) = SL2 S 1
,
In general, L p has the eigenvalues raised to the power of p down the diagonal, and
1
A p = SL p S .
121
122 LECTURE 37. POWERS OF A MATRIX
1 2 1
ex = 1 + x + x + x3 + . . . .
2! 3!
1 2 1
eA = I + A + A + A3 + . . . .
2! 3!
where 0 1
e l1 0 ... 0
B C
B 0 e l2 ... 0 C
L
e =B
B .. .. .. .. C.
C
@ . . . . A
0 0 ... eln
123
124 LECTURE 38. POWERS OF A MATRIX EXAMPLE
125
126 LECTURE 38. POWERS OF A MATRIX EXAMPLE
1.
0 1
0 1 0 1 1 0 0
1 0 0 1 0 0 0 B C
B C B C B0 1 0C
a) @0 1 0A b) @0 1 0 0A c) B
B0
C
@ 0 1CA
0 0 1 0 0 1 0
0 0 0
127
128 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
!
4 7
2. AB = AC = .
8 14
0 1 0 1
2 3 4 2 2 2
B C B C
3. AD = @2 6 12A , DA = @3 6 9 A.
2 9 16 4 12 16
n n p p n p
4. [A(BC)]ij = Â aik [BC]kj = Â Â aik bkl clj = Â Â aik bkl clj = Â [AB]il clj = [(AB)C)]ij .
k =1 k =1 l =1 l =1 k =1 l =1
129
2. Let A be an m-by-p diagonal matrix, B a p-by-n diagonal matrix, and let C = AB. The ij element of
C is given by
p
cij = Â aik bkj .
k =1
Since A is a diagonal matrix, the only nonzero term in the sum is k = i and we have cij = aii bij . And
since B is a diagonal matrix, the only nonzero elements of C are the diagonal elements cii = aii bii .
3. Let A and B be n-by-n upper triangular matrices, and let C = AB. The ij element of C is given by
n
cij = Â aik bkj .
k =1
Since A and B are upper triangular, we have aik = 0 when k < i and bkj = 0 when k > j. Excluding the
zero terms from the summation, we have
j
cij = Â aik bkj ,
k =i
which is equal to zero when i > j proving that C is upper triangular. Furthermore, cii = aii bii .
130 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
! ! !
1 1 1 1 2 2
2. a. =
1 1 1 1 2 2
3. b. For upper triangular matrices A and B, aik = 0 when k < i and bkj = 0 when k > j.
131
p p
cTij = c ji = Â a jk bki = Â bikT aTkj .
k =1 k =1
2. The square matrix A + AT is symmetric, and the square matrix A AT is skew symmetric. Using
these two matrices, we can write
1⇣ ⌘ 1⇣ ⌘
A= A + AT + A AT .
2 2
(AT A)T = AT A.
132 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. 0 1
! a d !
T a b c B C a2 + b2 + c2 ad + be + c f
A A= @b eA = .
d e f ad + be + c f d2 + e2 + f 2
c f
n n m m n
Tr(AT A) = Â (AT A) jj = Â Â aTji aij = Â Â a2ij ,
j =1 j =1 i =1 i =1 j =1
1 1 1
(AB) =B A .
1 1
AA = I and A A = I.
Taking the transpose of both sides of these two equations, using both IT = I and (AB)T = BT AT , we
obtain
1 T 1 T
(AA ) = (A ) AT = IT = I and (A 1
A ) T = AT ( A 1 T
) = IT = I.
1 T
(A ) AT = I and AT (A 1 T
) = I,
4. Let A be an invertible matrix, and suppose B and C are its inverse. To prove that B = C, we write
B = BI = B(AC) = (BA)C = C.
134 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
3. a. Exchange the diagonal elements, negate the off-diagonal elements, and divide by the determinant.
! 1 !
2 2 1 2 2
We have = .
1 2 2 1 2
135
1
(Q1 Q2 ) = Q2 1 Q1 1 = QT2 QT1 = (Q1 Q2 )T .
2. The z-coordinate stays fixed, and the vector rotates an angle q in the x-y plane. Therefore,
0 1
cos q sin q 0
B C
Rz = @ sin q cos q 0A .
0 0 1
137
1. 0 1 0 1 0 1
1 0 0 1 0 0 0 1 0
B C B C B C
P123 = @0 1 0A , P132 = @0 0 1A , P213 = @1 0 0A ,
0 0 1 0 1 0 0 0 1
0 1 0 1 0 1
0 1 0 0 0 1 0 0 1
B C B C B C
P231 = @0 0 1A , P312 = @1 0 0A , P321 = @0 1 0A .
1 0 0 0 1 0 1 0 0
2.
1 1 1 1
P123 = P123 , P132 = P132 , P213 = P213 , P321 = P321 ,
1 1
P231 = P312 , P312 = P231 .
The matrices that are their own inverses correspond to either no permutation or a single permutation
of rows (or columns), e.g., {1, 3, 2}, which permutes row (column) two and three. The matrices that are
not their own inverses correspond to two permutations, e.g., {2, 3, 1}, which permutes row (column)
one and two, and then two and three. For example, commuting rows by left multiplication, we have
Because matrices in general do not commute, P2311 6= P231 . Note also that the permutation matrices
are orthogonal, so that the inverse matrices are equal to the transpose matrices. Therefore, only the
symmetric permutation matrices can be their own inverses.
138 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. d. An!orthogonal matrix has orthonormal rows and columns. The rows and columns of the matrix
1 1
are not orthonormal and therefore this matrix is not an orthogonal matrix.
0 0
2. a. The rotation matrix representing a counterclockwise rotation around the x-axis in the y-z plane
can be obtained from the rotation matrix representing a counterclockwise rotation around the z-axis in
the x-y plane by shifting the elements to the right one column and down one row, assuming a periodic
0 1
1 0 0
B C
extension of the matrix. The result is @0 cos q sin q A.
0 sin q cos q
0 1 0 1
1 0 0 0 0 1
B C B C
3. b. Interchange the rows of the identity matrix: @0 1 0A ! @1 0 0A.
0 0 1 0 1 0
139
1.
x3 = 6,
1
x2 = ( x3 2) = 4,
2
1
x1 = (7x2 + 2x3 7) = 3.
3
x3 = 1,
x2 = 2x3 = 2,
x1 = 2x2 3x3 + 1 = 8.
1.
1.
0 1 0 1
3 7 2 1 0 0 3 7 2 1 0 0
B C B C
@ 3 5 1 0 1 0A ! @ 0 2 1 1 1 0A !
6 4 0 0 0 1 0 10 4 2 0 1
0 1 0 1
3 0 3/2 5/2 7/2 0 3 0 3/2 5/2 7/2 0
B C B C
@0 2 1 1 1 0A ! @0 2 1 1 1 0A !
0 0 1 3 5 1 0 0 1 3 5 1
0 1
1 0 0 2/3 4/3 1/2
B C
@0 1 0 1 2 1/2A .
0 0 1 3 5 1
Therefore,
0 1 1 0 1
3 7 2 2/3 4/3 1/2
B C B C
@ 3 5 1A =@ 1 2 1/2A .
6 4 0 3 5 1
142 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
2. c. A matrix in reduced row echelon form has all its pivots equal to one, and all the entries
above and below the pivots eliminated. The only matrix that is not in reduced row echelon form is
0 1
1 0 1 0
B C
@0 1 0 0A. The pivot in the third row, third column has a one above it in the first row, third
0 0 1 1
column.
3. d. There are many ways to do this computation by hand, and here is one way:
0 1 0 1 0 1
3 7 2 1 0 0 3 7 2 1 0 0 3 7 2 1 0 0
B C B C B C
@ 3 5 1 0 1 0A ! @0 2 1 1 1 0A ! @0 2 1 1 1 0A !
6 4 0 0 0 1 0
10 4 2 0 1 0 0 1 3 5 1
0 1 0 1 0 1
3 7 0 5 10 2 3 7 0 5 10 2 3 0 0 2 4 3/2
B C B C B C
@0 2 1 1 1 0A ! @0 2 0 2 4 1A ! @0 1 0 1 2 1/2A !
0 0 1 3 5 1 0 0 1 3 5 1 0 0 1 3 5 1
0 1 0 1 1 0 1
1 0 0 2/3 4/3 1/2 3 7 2 2/3 4/3 1/2
B C B C B C
@0 1 0 1 2 1/2A . Therefore, @ 3 5 1A =@ 1 2 1/2A.
0 0 1 3 5 1 6 4 0 3 5 1
143
1. 0 1
1 0 0 0
B C
B0 1 0 0C
M=B
B0
C.
@ 0 1 0CA
0 2 0 1
144 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. 0 1 0 1 0 10 1
3 7 2 3 7 2 1 0 0 3 7 2
B C B C B CB C
@ 3 5 1A ! @ 0 2 1A = @1 1 0A @ 3 5 1A
6 4 0 6 4 0 0 0 1 6 4 0
0 1 0 1 0 10 1
3 7 2 3 7 2 1 0 0 3 7 2
B C B C B CB C
@ 0 2 1A ! @0 2 1A = @ 0 1 0A @ 0 2 1A
6 4 0 0 10 4 2 0 1 6 4 0
0 1 0 1 0 10 1
3 7 2 3 7 2 1 0 0 3 7 2
B C B C B CB C
@0 2 1A ! @0 2 1A = @0 1 0A @0 2 1A
0 10 4 0 0 1 0 5 1 0 10 4
Therefore, 0 1 0 10 1
3 7 2 1 0 0 3 7 2
B C B CB C
@ 3 5 1A = @ 1 1 0A @0 2 1A .
6 4 0 2 5 1 0 0 1
145
1. We know 0 1 0 10 1
3 7 2 1 0 0 3 7 2
B C B CB C
A=@ 3 5 1A = @ 1 1 0A @0 2 1A = LU.
6 4 0 2 5 1 0 0 1
To solve LUx = b, we let y = Ux, solve Ly = b for y, and then solve Ux = y for x.
(a)
0 1
3
B C
b = @ 3A
2
y1 = 3,
y1 + y2 = 3,
2y1 5y2 + y3 = 2,
(b)
0 1
1
B C
b = @ 1A
1
y1 = 1,
y1 + y2 = 1,
2y1 5y2 + y3 = 1,
2. b. 0 1 0 1 0 1
3 7 2 3 7 2 1 0 0
B C B C B C
A=@ 3 5 1A ! @0 2 1A = M1 A,
where M1 = @1 1 0A ;
6 4 0 6 4 0 0 0 1
0 1 0 1 0 1
3 7 2 3 7 2 1 0 0
B C B C B C
@0 2 1A ! @0 2 1A = M2 M1 A, where M2 = @ 0 1 0A ;
6 4 0 0 10 4 2 0 1
0 1 0 1 0 1
3 7 2 3 7 2 1 0 0
B C B C B C
@0 2 1A ! @0 2 1A = M3 M2 M1 A, where M3 = @ 0 1 0A .
0 10 4 0 0 1 0 5 1
Therefore, 0 10 1
1 0 0 3 7 2
B CB C
A=@ 1 1 0A @0 2 1A .
2 5 1 0 0 1
3. b. To solve LUx = b, let y = Ux. Then solve Ly = b for y and Ux = y for x. The equations given by
Ly = b are
y1 = 1,
y1 + y2 = 1,
2y1 5y2 + y3 = 1.
0 1
1
B C
Solution by forward substitution gives y = @ 0A. The equations given by Ux = y are
1
1. Let v be a vector in the vector space. Then both 0v and v + ( 1)v must be vectors in the vector
space and both of them are the zero vector.
2. In all of the examples, the vector spaces are closed under scalar multiplication and vector addition.
148 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. Only (a) and (b) are linearly independent. (c) is linearly dependent.
149
1. b. A vector space must be closed under vector addition and scalar multiplication. The set of three-
by-one matrices with the sum of all the rows equal to one is not closed under vector addition and
scalar multiplication. For example, if you multiply a vector whose sum of all rows is equal to one by
the scalar k, then the resulting vector’s sum of all rows is equal to k.
3. b. Since a three-by-one matrix has three degrees of freedom, and the constraint that the sum
of all rows equals zero eliminates one degree of freedom, the basis should consist of two vectors.
0 1
1
B C
We can arbitrarily take the first unnormalized vector to be @ 1A. The vector orthogonal to this
0
0 1
1
B C
first vector with sum of all rows equal to zero is @ 1A. Normalizing both of these vectors, we get
2
8 0 1 0 19
>
< 1 1 1 >
B C 1 B C=
p @ 1A , p @ 1A .
>
: 2 6 >
;
0 2
151
1.
(uT1 v4 )u1 (uT2 v4 )u2 (uT3 v4 )u3
u 4 = v4 .
uT1 u1 uT2 u2 uT3 u3
152 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. Define 80 1 0 19
>
< 0 1 >
=
B C B C
{v1 , v2 } = @ 1A , @ 1A .
>
: >
;
1 1
(uT1 v2 )u1
u 2 = v2
uT1 u1
0 1 0 1 0 1
1 0 1
B C B C B C
= @ 1A @ 1A = @0A .
1 1 0
2. Define 80 1 0 1 0 19
> 1
>
> 0 0 >>
>
<B C B C B C> >
B1C B1C B0C =
B C B C B C
{v1 , v2 , v3 } = B C , B C , B C .
>
>
> @1A @1A @1A > >
>
>
: >
1 1 1 ;
(uT1 v2 )u1
u 2 = v2
uT1 u1
0 1 0 1 0 1
0 1 3
B C B C B C
B1C 3 B1C 1 B 1C
B
=B C B C
B1C 4 B1C = 4 B 1C ;
C
@ A @ A @ A
1 1 1
2. a. Since the vectors are already orthogonal, we need only normalize them to find
( ! !)
1 1 1 1
{u
b1 , u
b2 } = p ,p .
2 1 2 1
0 1
1
B C
3. b. Let u1 = v1 = @ 1A. Then,
1
0 1 0 1 0 1
0 1 2
(uT1 v2 )u1 B C 2B C 1B C
u 2 = v2 = @ 1A @ 1A = @ 1A .
uT1 u1 3 3
1 1 1
The equation Ax = 0 with the pivot variables on the left-hand sides is given by
x1 = 2x4 , x2 = x4 , x3 = x4 ,
0 1 0 1
2x4 2
B C B C
C = x4 B 1C. A basis for the null
B x4 C B C
and a general vector in the nullspace can be written as B
B x C B 1C
@ 4 A @ A
x4 1
0 1
2
B C
B 1C
space is therefore given by the single vector B C
B 1C.
@ A
1
156 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
We form the augmented matrix and bring the first four columns to reduced row echelon form:
0 1 0 1
3 6 1 1 7 1 2 0 1 3
B C B C
@ 1 2 2 3 1A ! @0 0 1 2 2A .
2 4 5 8 4 0 0 0 0 0
The null space is found from the first four columns solving Au = 0, and writing the basic variables on
the left-hand side, we have the system
u1 = 2u2 + u4 , u3 = 2u4 ;
from which we can write the general form of the null space as
0 1 0 1 0 1
2u2 + u4 2 1
B C B C B C
u2 C = u2 B1C + u4 B 0C
B C B C B
B C.
B 2u4 C B0C B 2C
@ A @ A @ A
u4 0 1
v1 2v2 v4 = 3, v3 + 2v4 = 2.
The free variables v2 and v4 can be set to zero, and the particular solution is determined to be v1 = 3
and v3 = 2. The general solution to the underdetermined system of equations is therefore given by
0 1 0 1 0 1
2 1 3
B C B C B C
B1C 0CC + B 0C .
B B C
x = aB C+bB
B C
B
@0A @ 2A B
C
@ 2A
C
0 1 0
157
1. We find 0 1 0 1
1 1 1 0 1 0 0 2
B C B C
A = @1 1 0 1A , rref(A) = @0 1 0 1A ,
1 0 1 1 0 0 1 1
and dim(Col(A)) = 3, with a basis for the column space given by the first three columns of A.
158 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
Columns one, two, and four are pivot columns of A and columns one, two, and three are pivot columns
of AT . Therefore, the column space of A is given by
80 1 0 1 0 19
>
>
> 2 3 1 >>
>
<B C B C B C>>
B 1C B 1C B 1C =
C
Col(A) = span B C B C B
B 1C , B 2C , B ;
>
>
> @ A @ A @ 1C
A>>
>
>
: >
1 2 1 ;
x1 = x3 + x5 , x2 = x3 2x5 , x4 = 2x5 ,
It can be checked that the null space is the orthogonal complement of the row space and the left null
space is the orthogonal complement of the column space. The rank(A) = 3, and A is not of full rank.
160 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. d. To find the null space of a matrix, bring it to reduced row echelon form. We have
0 1 0 1 0 1 0 1
1 2 0 1 1 2 0 1 1 2 0 1 1 2 0 0
B C B C B C B C
@2 4 1 1A ! @0 0 1 1A ! @0 0 1 1A ! @0 0 1 0A .
3 6 1 1 0 0 1 2 0 0 0 1 0 0 0 1
80 19
>
>
> 2 >>
>
<B C>>
B 1C =
With x1 = 2x2 , x3 = 0, and x4 = 0, a basis for the null space is B C .
B 0C >
>
> @ A>
>
> >
>
: 0 ;
2. b. This system of linear equations is underdetermined, and the solution will be a general vector
in the null space of the multiplying matrix plus a particular vector that satisfies the underdetermined
system of equations. The linear system in matrix form is given by
0 1
0 1x1 0 1
1 2 0 1 B C 1
B CB
B x2 C
C B C
@2 4 1 1A B C = @1A .
@ x3 A
3 6 1 1 1
x4
0 1 0 1
1 2 0 1 1 2 0 0
B C B C
3. c. The matrix in reduced row echelon form is @2 4 1 1A ! @0 0 1 0A . The number of
3 6 1 1 0 0 0 1
pivot columns is three, and this is the rank.
161
1. We first use the Gram-Schmidt process to find an orthonormal basis for W. We assign u1 =
⇣ ⌘T
1 1 1 . Then,
0 1
⇣ 0
⌘
B C
0 1 1 1 1 @1A 0 1
0 1
B C 1 B C
u2 = @1A 0 1 @1A
1 ⇣ ⌘ 1
B C 1
1 1 1 @1A
1
0 1 0 1 0 1
0 1 2
B C 2B C 1B C
= @1A @1A = @ 1A .
3 3
1 1 1
When a = 1, b = c = 0, we have
0 1 0 0 11
1 2 1
1B C 1B C B C
vprojW = @1A @ 1A = @0A ;
3 3
1 1 0
1. 0 1 0 1
1 0 ! B1C
B C
B1 1C b0 B3C
B C =B C
B1 2C b1 B3C
@ A @ A
1 3 4
163
or ! ! !
4 6 b0 11
= .
6 14 b1 21
The solution is b 0 = 7/5 and b 1 = 9/10, and the least-squares line is given by y = 7/5 + 9x/10.
164 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
b 0 + b 1 x1 = y1 ,
b 0 + b 1 x2 = y2 ,
b 0 + b 1 x3 = y3 .
Substituting in the values for x and y, and writing in matrix form, we have
0 1 0 1
1 1 ! 1
B C b0 B C
@1 2A = @1A .
b1
1 3 3
1
The best fit line is therefore y = + x.
3
165
1.
1 0 0
0 1 0 = 1 ⇥ 1 ⇥ 1 = 1.
0 0 1
2.
d e f
a b c = dbi + ecg + f ah f bg eai dch
g h i
a b c
= ( aei + b f g + cdh ceg bdi a f h) = d e f .
g h i
3. Let ! !
a b e f
A= , B= .
c d g h
Then !
ae + bg a f + bh
AB = ,
ce + dg c f + dh
and
6 3 2 4 0
3 2 4 0
9 0 4 1 0
0 4 1 0
8 5 6 7 2 =2 .
5 6 7 2
2 0 0 0 0
0 3 2 0
4 0 3 2 0
3 2 4 0
3 2 4
0 4 1 0
2 =4 0 4 1 .
5 6 7 2
0 3 2
0 3 2 0
3 2 4
4 1
4 0 4 1 = 12 = 60.
3 2
0 3 2
167
1. For each element chosen from the first row, there is only a single way to choose nonzero elements
from all subsequent rows. Considering whether the columns chosen are even or odd permutations of
the ordered set {1, 2, 3, 4}, we obtain
a b c d
e f 0 0
= a f hj behj + cegj degi.
0 g h 0
0 0 i j
168 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
1. Suppose the square matrix A has two zero rows. If we interchange these two rows, the determinant
of A changes sign according to Property 2, even though A doesn’t change. Therefore, det A = det A,
or det A = 0.
2. To prove that the determinant is a linear function of row i, interchange rows 1 and row i using
Property 2. Use Property 3, then interchange rows 1 and row i again.
3. Consider a general n-by-n matrix. Using the linear property of the jth row, and that a matrix with
two equal rows has zero determinant, we have
.. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . .
ai1 ... ain ai1 ... ain ai1 ... ain ai1 ... ain
.. ..
.
..
. . = ... ..
.
..
. +k
..
.
..
.
.. .
. = ..
..
.
..
. .
a j1 + kai1 ... a jn + kain a j1 ... a jn ai1 ... ain a j1 ... a jn
.. .. .. .. .. .. .. .. .. .. .. ..
. . . . . . . . . . . .
4.
2 0 1 2 0 1 2 0 1
3 1 1 = 0 1 5/2 = 0 1 5/2 = 2 ⇥ 1 ⇥ 7/2 = 7.
0 1 1 0 1 1 0 0 7/2
169
1. a. To find the determinant of a matrix with many zero elements, perform a Laplace expansion
across the row or down the column with the most zeros. Choose the correct sign. We have for one
expansion choice,
3 0 2 0 0
3 0 2 0
2 2 2 0 0 3 0 2
2 2 2 0 3 0
0 0 2 0 0 =2 = 4 2 2 2 =8 = 48.
0 0 2 0 2 2
3 0 3 2 3 0 0 2
3 3 3 2
3 3 3 0 2
2. b. We can apply the Leibniz formula by going down the first column. For each element in the first
column there is only one possible choice of elements from the other three columns. We have
0 1
a e 0 0
B C
Bb
B f g 0CC = a f hj
Bc behj + cegj degi.
@ 0 h iCA
d 0 0 j
The signs are obtained by considering whether the following permutations of the rows {1, 2, 3, 4} are
even or odd: a f hj = {1, 2, 3, 4} (even); behj = {2, 1, 3, 4} (odd); cegj = {3, 1, 2, 4} (even); degi =
{4, 1, 2, 3} (odd).
a l b c
0 = det(A lI) = d e l f
g h i l
= (a l)(e l)(i l) + b f g + cdh c(e l) g bd(i l) (a l) f h
3 2
= l + ( a + e + i)l ( ae + ai + ei bd cg f h)l + aei + b f g + cdh ceg bdi a f h.
e f a c a b
ae + ai + ei bd cg f h = (ei f h) + ( ai cg) + ( ae bd) = + + ,
h i g i d e
which are the sum of the minors obtained by crossing out the rows and columns of the diagonal
elements. The cubic equation in more memorable form is therefore
2 l 7
0 = det (A lI) = = (2 l )2 49.
7 2 l
2 l 1 0 ⇣ ⌘
0 = det (A lI) = 1 2 l 1 = (2 l ) (2 l )2 2 .
0 1 2 l
p p
Therefore, l1 = 2, l2 = 2 2, and l3 = 2 + 2. The eigenvector for l1 = 2 are found from
0 10 1
0 1 0 x1
B CB C
@1 0 1A @ x2 A = 0,
0 1 0 x3
0 1
1
B C p
or x2 = 0 and x1 + x3 = 0, or x1 = @ 0A. The eigenvector for l2 = 2 2 is found from
1
0p 10 1
2 1 0 x1
B p CB C
@ 1 2 1 A @ x2 A = 0.
p
0 1 2 x3
1 l 1
0 = det (A lI) = = (1 l)2 + 1.
1 1 l
1. b. The characteristic equation det (A lI) = 0 for a two-by-two matrix A results in the quadratic
equation l2 TrA l + det A = 0, which for
p the given matrix
p yields l
2
3l + 1 = 0. Application of
3± 9 4 3 5
the quadratic formula results in l± = = ± .
2 2 2
3 l 1 p
2. d. The characteristic equation is det (A lI) = = (3 l)2 + 1 = 0. With i = 1,
1 3 l
the solution is l± = 3 ± i.
3. b. One can either compute the eigenvalues and eigenvectors of the matrix, or test the given possible
answers. If we test the answers, then only one is an eigenvector, and we have
0 10 1 0 p 1 0 1
2 1 0 1 2+ 2 1
B C Bp C B p C p Bp C
@1 2 1A @ 2A = @2 + 2 2A = (2 + 2) @ 2A .
p
0 1 2 1 2+2 1
175
c1 x1 + c2 x2 = 0.
To prove that x1 and x2 are linearly independent, we need to show that c1 = c2 = 0. Multiply the
above equation on the left by A and use Ax1 = l1 x1 and Ax2 = l2 x2 to obtain
c1 l1 x1 + c2 l2 x2 = 0.
( l1 l2 )c1 x1 = 0, ( l2 l1 )c2 x2 = 0,
from which we conclude that if l1 6= l2 , then c1 = c2 = 0 and x1 and x2 are linearly independent.
dim(Col(A)) + dim(Null(A)) = n.
Since the columns of A are linearly independent, we have dim(Col(A)) = n and dim(Null(A)) = 0.
If the only solution to Ax = 0 is the zero vector, then det A 6= 0 and A is invertible.
176 APPENDIX A. PROBLEM AND PRACTICE QUIZ SOLUTIONS
Notice that the three eigenvectors are mutually orthogonal. This will happen when the matrix is
symmetric. If we normalize the eigenvectors, the matrix with eigenvectors as columns will be an
orthogonal matrix. Normalizing the orthogonal eigenvectors (so that S 1 = ST ) , we have
0 p 1
1/ 2 1/2 1/2
B p p C
S=@ 0 1/ 2 1/ 2A .
p
1/ 2 1/2 1/2
We therefore find
0 1 0 p p 10 10 p 1
2 0 0 1/ 2 0 1/ 2 2 1 0 1/ 2 1/2 1/2
B p C B p CB CB p p C
@0 2 2 0 A = @ 1/2 1/ 2 1/2 A @1 2 1A @ 0 1/ 2 1/ 2A
p p p
0 0 2+ 2 1/2 1/ 2 1/2 0 1 2 1/ 2 1/2 1/2
177
1.
1
eA = eSLS
SL2 S 1 SL3 S 1
= I + SLS 1 + + +...
2! 3!
✓ 2 3 ◆
L L 1
= S I+L+ + +... S
2! 3!
= SeL S 1
.
Because L is a diagonal matrix, the powers of L are also diagonal matrices with the diagonal elements
raised to the specified power. Each diagonal element of eL contains a power series of the form
l2i l3
1 + li + + i +...,
2! 3!
to find !n !
1 1 2n 1 2n 1
= 1 1
.
1 1 2n 2n
179
!2 ! !100 0 !2 150
0 1 1 0 0 1 0 1
2. d. A simple calculation shows that = = I. Therefore =@ A =
1 0 0 1 1 0 1 0
I50= I.
A more complicated calculation diagonalizes this symmetric
! matrix. The eigenvalues! and orthonor-
1 1 1 1
mal eigenvectors are found to be l1 = 1, v1 = p and l2 = 1, v2 = p . The diagonal-
2 1 2 1
! ! ! !
0 1 1 1 1 1 0 1 1
ization then takes the form = . Then,
1 0 2 1 1 0 1 1 1
!100 ! !100 ! ! ! !
0 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1
= = = I.
1 0 2 1 1 0 1 1 1 2 1 1 0 1 1 1
3. a. We have
✓ ◆ !
I2 I3 1 1 e 0
e = I + I + + + · · · = I 1 + 1 + + + . . . = Ie1 =
I
.
2! 3! 2! 3! 0 e