Math 214 Notes
Kenneth Kuttler
August 19, 2011
Contents
0.1 Introduction
1 Fundamentals
1.0.1 Outcomes
1.1 R^n
1.2 Algebra in R^n
1.3 Geometric Meaning Of Vector Addition In R^3
1.4 Lines
1.5 Distance in R^n
1.6 Geometric Meaning Of Scalar Multiplication In R^3
1.7 Exercises
1.8 Exercises With Answers
3 Determinants
3.0.1 Outcomes
3.1 Basic Techniques And Properties
3.1.1 Cofactors And 2 × 2 Determinants
3.1.2 The Determinant Of A Triangular Matrix
3.1.3 Properties Of Determinants
3.1.4 Finding Determinants Using Row Operations
3.2 Applications
3.2.1 A Formula For The Inverse
3.2.2 Cramer's Rule
3.3 Exercises
3.4 Exercises With Answers
II Vectors In R^n
5 Vector Products
5.0.1 Outcomes
5.1 The Dot Product
5.2 The Geometric Significance Of The Dot Product
5.2.1 The Angle Between Two Vectors
5.2.2 Work And Projections
5.2.3 The Parabolic Mirror, An Application
5.2.4 The Dot Product And Distance In C^n
5.3 Exercises
5.4 Exercises With Answers
5.5 The Cross Product
5.5.1 The Distributive Law For The Cross Product
5.5.2 Torque
5.5.3 Center Of Mass
5.5.4 Angular Velocity
5.5.5 The Box Product
5.6 Vector Identities And Notation
5.7 Exercises
5.8 Exercises With Answers
III Vector Calculus
IV
14 The Riemann Integral On R^n
14.0.1 Outcomes
14.1 Methods For Double Integrals
14.1.1 Density And Mass
14.2 Exercises
14.3 Methods For Triple Integrals
14.3.1 Definition Of The Integral
14.3.2 Iterated Integrals
14.3.3 Mass And Density
14.4 Exercises
14.5 Exercises With Answers
0.1 Introduction
Multivariable calculus is just calculus which involves more than one variable. To do it
properly, you have to use some linear algebra. Otherwise it is impossible to understand.
This book presents the necessary linear algebra and then uses it as a framework upon
which to build multivariable calculus. This is not the usual approach in beginning
courses but it is the correct approach, leaving open the possibility that at least some
students will learn and understand the topics presented. For example, the derivative of
a function of many variables is a linear transformation. If you don't know what a linear
transformation is, then you can't understand the derivative because that is what it is
and nothing else can be correctly substituted for it. The chain rule is best understood
in terms of products of matrices which represent the various derivatives. The concepts
involving multiple integrals involve determinants. The understandable version of the
second derivative test uses eigenvalues, etc.
The purpose of this book is to present this subject in a way which can be understood
by a motivated student. Because of the inherent difficulty, any treatment which is easy
for the majority of students will not yield a correct understanding. However, the attempt
is being made to make it as easy as possible.
Many applications are presented. Some of these are very difficult but worthwhile.
Hard sections are starred in the table of contents. Most of these sections are enrichment material and can be omitted if one desires nothing more than what is usually
done in a standard calculus class. Stunningly difficult sections having substantial mathematical content are also decorated with a picture of a battle between a dragon slayer
and a dragon, the outcome of the contest uncertain. These sections are for fearless
students who want to understand the subject more than they want to preserve their
egos. Sometimes the dragon wins.
Part I
1 Fundamentals
1.0.1 Outcomes
1.1 R^n
The notation $\mathbb{R}^n$ refers to the collection of ordered lists of $n$ real numbers. More
precisely, consider the following definition.
Definition 1.1.1 Define
$$\mathbb{R}^n \equiv \{(x_1, \cdots, x_n) : x_j \in \mathbb{R} \text{ for } j = 1, \cdots, n\}.$$
$(x_1, \cdots, x_n) = (y_1, \cdots, y_n)$ if and only if for all $j = 1, \cdots, n$, $x_j = y_j$. When $(x_1, \cdots, x_n) \in \mathbb{R}^n$,
it is conventional to denote $(x_1, \cdots, x_n)$ by the single bold face letter $\mathbf{x}$. The
numbers $x_j$ are called the coordinates. The set
$$\{(0, \cdots, 0, t, 0, \cdots, 0) : t \in \mathbb{R}\}$$
for $t$ in the $i$th slot is called the $i$th coordinate axis, the $x_i$ axis for
short. The point $\mathbf{0} \equiv (0, \cdots, 0)$ is called the origin.
Thus $(1, 2, 4) \in \mathbb{R}^3$ and $(2, 1, 4) \in \mathbb{R}^3$ but $(1, 2, 4) \neq (2, 1, 4)$ because, even though
the same numbers are involved, they don't match up. In particular, the first entries are
not equal.
Why would anyone be interested in such a thing? First consider the case when
$n = 1$. Then from the definition, $\mathbb{R}^1 = \mathbb{R}$. Recall that $\mathbb{R}$ is identified with the points of
a line. Look at the number line again. Observe that this amounts to identifying a point
on this line with a real number. In other words a real number determines where you
are on this line. Now suppose n = 2 and consider two lines which intersect each other
at right angles as shown in the following picture.
[Figure: two perpendicular coordinate axes in the plane with the points $(2, 6)$ and $(-8, 3)$ marked.]
Notice how you can identify a point shown in the plane with the ordered pair, (2, 6) .
You go to the right a distance of 2 and then up a distance of 6. Similarly, you can identify
another point in the plane with the ordered pair $(-8, 3)$. Go to the left a distance of 8
and then up a distance of 3. The reason you go to the left is that there is a $-$ sign on the
eight. From this reasoning, every ordered pair determines a unique point in the plane.
Conversely, taking a point in the plane, you could draw two lines through the point,
one vertical and the other horizontal and determine unique points, x1 on the horizontal
line in the above picture and x2 on the vertical line in the above picture, such that
the point of interest is identified with the ordered pair, (x1 , x2 ) . In short, points in the
plane can be identified with ordered pairs similar to the way that points on the real
line are identified with real numbers. Now suppose n = 3. As just explained, the first
two coordinates determine a point in a plane. Letting the third component determine
how far up or down you go, depending on whether this number is positive or negative,
this determines a point in space. Thus, $(1, 4, -5)$ would mean to determine the point
in the plane that goes with (1, 4) and then to go below this plane a distance of 5 to
obtain a unique point in space. You see that the ordered triples correspond to points in
space just as the ordered pairs correspond to points in a plane and single real numbers
correspond to points on a line.
You can't stop here and say that you are only interested in $n \leq 3$. What if you were
interested in the motion of two objects? You would need three coordinates to describe
where the first object is and you would need another three coordinates to describe
where the other object is located. Therefore, you would need to be considering $\mathbb{R}^6$. If
the two objects moved around, you would need a time coordinate as well. As another
example, consider a hot object which is cooling and suppose you want the temperature
of this object. How many coordinates would be needed? You would need one for the
temperature, three for the position of the point in the object and one more for the
time. Thus you would need to be considering $\mathbb{R}^5$. Many other examples can be given.
Sometimes n is very large. This is often the case in applications to business when they
are trying to maximize profit subject to constraints. It also occurs in numerical analysis
when people try to solve hard problems on a computer.
There are other ways to identify points in space with three numbers but the one
presented is the most basic. In this case, the coordinates are known as Cartesian
coordinates after Descartes1 who invented this idea in the first half of the seventeenth
century. I will often not bother to draw a distinction between the point in n dimensional
space and its Cartesian coordinates.
1 René Descartes (1596-1650) is often credited with inventing analytic geometry although it seems
the ideas were actually known much earlier. He was interested in many different subjects, physiology,
chemistry, and physics being some of them. He also wrote a large book in which he tried to explain
the book of Genesis scientifically. Descartes ended up dying in Sweden.
1.2 Algebra in R^n
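There are two algebraic operations done with elements of $\mathbb{R}^n$. One is addition and the
other is multiplication by numbers, called scalars.
Definition 1.2.1 If $\mathbf{x} \in \mathbb{R}^n$ and $a$ is a number, also called a scalar, then $a\mathbf{x} \in \mathbb{R}^n$ is
defined by
$$a\mathbf{x} = a(x_1, \cdots, x_n) \equiv (ax_1, \cdots, ax_n). \tag{1.1}$$
This is known as scalar multiplication. If $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ then $\mathbf{x} + \mathbf{y} \in \mathbb{R}^n$ and is defined
by
$$\mathbf{x} + \mathbf{y} = (x_1, \cdots, x_n) + (y_1, \cdots, y_n) \equiv (x_1 + y_1, \cdots, x_n + y_n) \tag{1.2}$$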
(1.4)
(1.5)
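$$\mathbf{v} + (-\mathbf{v}) = \mathbf{0}, \tag{1.6}$$
(1.7)
$$(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}, \tag{1.8}$$
$$\alpha(\beta\mathbf{v}) = \alpha\beta(\mathbf{v}), \tag{1.9}$$
$$1\mathbf{v} = \mathbf{v}. \tag{1.10}$$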
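The two operations of Definition 1.2.1 can be tried out on a computer. The following is only a minimal sketch in Python, not part of the original notes. The names `scalar_multiply` and `add` and the sample vector `v` are chosen here for illustration. It checks three of the listed properties, 1.6, 1.8 and 1.10, on one example.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def scalar_multiply(a: float, x: Sequence[float]) -> Vector:
    """Return a*x by multiplying every coordinate by a, as in (1.1)."""
    return tuple(a * xj for xj in x)


def add(x: Sequence[float], y: Sequence[float]) -> Vector:
    """Return x + y by adding matching coordinates, as in (1.2)."""
    if len(x) != len(y):
        # Addition is only defined for two elements of the same R^n.
        raise ValueError("vectors must have the same number of coordinates")
    return tuple(xj + yj for xj, yj in zip(x, y))


# An illustrative element of R^3 (hypothetical sample data).
v = (1.0, 2.0, -3.0)
zero = (0.0, 0.0, 0.0)
neg_v = scalar_multiply(-1.0, v)

print(add(v, neg_v))            # compare with (1.6)
print(scalar_multiply(1.0, v))  # compare with (1.10)
```

Note that `add` refuses to add lists of different lengths, which is the point of asking whether a sum of an element of $\mathbb{R}^2$ and an element of $\mathbb{R}^3$ makes sense.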
1.3 Geometric Meaning Of Vector Addition In R^3
It was explained earlier that an element of $\mathbb{R}^n$ is an $n$ tuple of numbers and it was also
shown that this can be used to determine a point in three dimensional space in the case
where $n = 3$ and in two dimensional space, in the case where $n = 2$. This point was
specified relative to some coordinate axes.
Consider the case where $n = 3$ for now. If you draw an arrow from the point in
three dimensional space determined by $(0, 0, 0)$ to the point $(a, b, c)$ with its tail sitting
at the point $(0, 0, 0)$ and its point at the point $(a, b, c)$, this arrow is called the position
vector of the point determined by $\mathbf{u} \equiv (a, b, c)$. One way to get to this point is to start
at $(0, 0, 0)$ and move in the direction of the $x_1$ axis to $(a, 0, 0)$ and then in the direction of
the $x_2$ axis to $(a, b, 0)$ and finally in the direction of the $x_3$ axis to $(a, b, c)$. It is evident
that the same arrow (vector) would result if you began at the point $\mathbf{v} \equiv (d, e, f)$, moved
in the direction of the $x_1$ axis to $(d + a, e, f)$, then in the direction of the $x_2$ axis to
$(d + a, e + b, f)$, and finally in the $x_3$ direction to $(d + a, e + b, f + c)$ only this time, the
arrow would have its tail sitting at the point determined by $\mathbf{v} \equiv (d, e, f)$ and its point at
(d + a, e + b, f + c) . It is said to be the same arrow (vector) because it will point in the
same direction and have the same length. It is like you took an actual arrow, the sort
of thing you shoot with a bow, and moved it from one location to another keeping it
pointing the same direction. This is illustrated in the following picture in which v + u
is illustrated. Note the parallelogram determined in the picture by the vectors u and v.
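[Figure: the vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{u} + \mathbf{v}$ drawn relative to the $x_1$, $x_2$, $x_3$ axes, with the parallelogram determined by $\mathbf{u}$ and $\mathbf{v}$.]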
First here is a picture of u + v. You first draw u and then at the point of u you
place the tail of v as shown. Then u + v is the vector which results which is drawn in
the following pretty picture.
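[Figure: the vector $\mathbf{u}$, the vector $\mathbf{v}$ with its tail at the point of $\mathbf{u}$, and the resulting vector $\mathbf{u} + \mathbf{v}$.]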
Next consider $\mathbf{u} - \mathbf{v}$. This means $\mathbf{u} + (-\mathbf{v})$. From the above geometric description
of vector addition, $-\mathbf{v}$ is the vector which has the same length but which points in the
opposite direction to $\mathbf{v}$. Here is a picture.
[Figure: the vectors $\mathbf{u}$, $-\mathbf{v}$ and $\mathbf{u} + (-\mathbf{v})$.]
Finally consider the vector u+2v. Here is a picture of this one also.
[Figure: the vectors $\mathbf{u}$, $2\mathbf{v}$ and $\mathbf{u} + 2\mathbf{v}$.]
1.4 Lines
To begin with consider the case $n = 1, 2$. In the case where $n = 1$, the only line is just
$\mathbb{R}^1 = \mathbb{R}$. Therefore, if $x_1$ and $x_2$ are two different points in $\mathbb{R}$, consider
$$x = x_1 + t(x_2 - x_1)$$
where $t \in \mathbb{R}$ and the totality of all such points will give $\mathbb{R}$. You see that you can
always solve the above equation for $t$, showing that every point on $\mathbb{R}$ is of this form.
Now consider the plane. Does a similar formula hold? Let $(x_1, y_1)$ and $(x_2, y_2)$ be two
different points in $\mathbb{R}^2$ which are contained in a line, $l$. Suppose that $x_1 \neq x_2$. Then if
$(x, y)$ is an arbitrary point on $l$,
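[Figure: the line $l$ through $(x_1, y_1)$ and $(x_2, y_2)$ with an arbitrary point $(x, y)$ on it.]
$$\frac{y_2 - y_1}{x_2 - x_1} = \frac{y - y_1}{x - x_1}$$
$$\mathbf{x} = \mathbf{x}^1 + t\left(\mathbf{x}^2 - \mathbf{x}^1\right)$$
where $t \in \mathbb{R}$. This is known as a parametric equation and the variable $t$ is called the
parameter.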
Often t denotes time in applications to Physics. Note this definition agrees with the
usual notion of a line in two dimensions and so this is consistent with earlier concepts.
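Lemma 1.4.2 Let $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ with $\mathbf{a} \neq \mathbf{0}$. Then $\mathbf{x} = t\mathbf{a} + \mathbf{b}$, $t \in \mathbb{R}$, is a line.
Proof: Let $\mathbf{x}^1 = \mathbf{b}$ and let $\mathbf{x}^2 - \mathbf{x}^1 = \mathbf{a}$ so that $\mathbf{x}^2 \neq \mathbf{x}^1$. Then $t\mathbf{a} + \mathbf{b} = \mathbf{x}^1 + t\left(\mathbf{x}^2 - \mathbf{x}^1\right)$
and so $\mathbf{x} = t\mathbf{a} + \mathbf{b}$ is a line containing the two different points $\mathbf{x}^1$ and $\mathbf{x}^2$.
This proves the lemma.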
Definition 1.4.3 The vector a in the above lemma is called a direction vector for
the line.
Definition 1.4.4 Let $\mathbf{p}$ and $\mathbf{q}$ be two points in $\mathbb{R}^n$, $\mathbf{p} \neq \mathbf{q}$. The directed line segment
from $\mathbf{p}$ to $\mathbf{q}$, denoted by $\overrightarrow{\mathbf{pq}}$, is defined to be the collection of points,
$$\mathbf{x} = \mathbf{p} + t(\mathbf{q} - \mathbf{p}), \quad t \in [0, 1]$$
with the direction corresponding to increasing $t$. In the definition, when $t = 0$, the point
$\mathbf{p}$ is obtained and as $t$ increases other points on this line segment are obtained until when
$t = 1$, you get the point $\mathbf{q}$. This is what is meant by saying the direction corresponds
to increasing $t$.
Think of $\overrightarrow{\mathbf{pq}}$ as an arrow whose point is on $\mathbf{q}$ and whose base is at $\mathbf{p}$ as shown in
the following picture.
(1.11)
Sometimes people elect to write a line like the above in the form
$$x = 1 + t, \quad y = 2 + 2t, \quad z = t, \quad t \in \mathbb{R}. \tag{1.12}$$
This is a set of scalar parametric equations which amounts to the same thing as 1.11.
There is one other form for a line which is sometimes considered useful. It is the so
called symmetric form. Consider the line of 1.12. You can solve for the parameter, $t$ to
write
$$t = x - 1, \quad t = \frac{y - 2}{2}, \quad t = z.$$
Therefore,
$$x - 1 = \frac{y - 2}{2} = z.$$
This is the symmetric form of the line.
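Example 1.4.7 Suppose the symmetric form of a line is
$$\frac{x - 2}{3} = \frac{y - 1}{2} = z + 3.$$
Find the line in parametric form.
Let $t = \frac{x - 2}{3}$, $t = \frac{y - 1}{2}$ and $t = z + 3$. Then solving for $x, y, z$ gives
$$x = 3t + 2, \quad y = 2t + 1, \quad z = t - 3, \quad t \in \mathbb{R}.$$
Written in terms of vectors this is
$$(2, 1, -3) + t(3, 2, 1) = (x, y, z), \quad t \in \mathbb{R}.$$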
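It may help to check numerically that the parametric form found in Example 1.4.7 really does satisfy the symmetric form. The following Python sketch is an illustration only. The function names `point_on_line` and `symmetric_values` and the sample parameter values are assumptions made here, not part of the notes. For each $t$ it computes a point on the line and then evaluates the three expressions of the symmetric form, which should all return $t$.

```python
def point_on_line(t):
    """Point (x, y, z) = (2, 1, -3) + t (3, 2, 1) from Example 1.4.7."""
    base = (2.0, 1.0, -3.0)
    direction = (3.0, 2.0, 1.0)
    return tuple(b + t * d for b, d in zip(base, direction))


def symmetric_values(point):
    """Evaluate (x - 2)/3, (y - 1)/2 and z + 3 at the given point."""
    x, y, z = point
    return ((x - 2.0) / 3.0, (y - 1.0) / 2.0, z + 3.0)


# Hypothetical sample values of the parameter t.
samples = [-2.0, 0.0, 0.5, 4.0]
results = [symmetric_values(point_on_line(t)) for t in samples]

for t, values in zip(samples, results):
    print(t, values)
```

All three printed values agree with $t$ up to rounding, which is exactly the statement that the point lies on the line given in symmetric form.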
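1.5 Distance in R^n
$$|\mathbf{x} - \mathbf{y}| \equiv \left(\sum_{k=1}^{n} |x_k - y_k|^2\right)^{1/2}.$$
This is called the distance formula. Thus $|\mathbf{x}| \equiv |\mathbf{x} - \mathbf{0}|$. The symbol $B(\mathbf{a}, r)$ is
defined by
$$B(\mathbf{a}, r) \equiv \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{a}| < r\}.$$
This is called an open ball of radius $r$ centered at $\mathbf{a}$. It gives all the points in $\mathbb{R}^n$ which
are closer to $\mathbf{a}$ than $r$.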
First of all note this is a generalization of the notion of distance in $\mathbb{R}$. There the
distance between two points, $x$ and $y$ was given by the absolute value of their difference.
Thus $|x - y|$ is equal to the distance between these two points on $\mathbb{R}$. Now $|x - y| = \left((x - y)^2\right)^{1/2}$
where the square root is always the positive square root. Thus it is
the same formula as the above definition except there is only one term in the sum.
Geometrically, this is the right way to define distance which is seen from the Pythagorean
theorem. Consider the following picture in the case that $n = 2$.
[Figure: the points $(x_1, x_2)$ and $(y_1, y_2)$ joined by a solid line, with the rectangle having a corner at $(y_1, x_2)$ shown in dotted lines.]
There are two points in the plane whose Cartesian coordinates are $(x_1, x_2)$ and
$(y_1, y_2)$ respectively. Then the solid line joining these two points is the hypotenuse of a
right triangle which is half of the rectangle shown in dotted lines. What is its length?
Note the lengths of the sides of this triangle are $|y_1 - x_1|$ and $|y_2 - x_2|$. Therefore, the
Pythagorean theorem implies the length of the hypotenuse equals
$$\left(|y_1 - x_1|^2 + |y_2 - x_2|^2\right)^{1/2} = \left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}$$
[Figure: the points $(x_1, x_2, x_3)$, $(y_1, x_2, x_3)$ and $(y_1, y_2, x_3)$ in three dimensional space.]
By the Pythagorean theorem, the length of the dotted line joining $(x_1, x_2, x_3)$ and
$(y_1, y_2, x_3)$ equals
$$\left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}$$
while the length of the line joining $(y_1, y_2, x_3)$ to $(y_1, y_2, y_3)$ is just $|y_3 - x_3|$. Therefore,
by the Pythagorean theorem again, the length of the line joining the points $(x_1, x_2, x_3)$
and $(y_1, y_2, y_3)$ equals
$$\left\{\left[\left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}\right]^2 + (y_3 - x_3)^2\right\}^{1/2} = \left((y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2\right)^{1/2},$$
which is again just the distance formula above.
This completes the argument that the above definition is reasonable. Of course you
cannot continue drawing pictures in ever higher dimensions but there is no problem
with the formula for distance in any number of dimensions. Here is an example.
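Example 1.5.2 Find the distance between the points in $\mathbb{R}^4$,
$$\mathbf{a} = (1, 2, -4, 6) \text{ and } \mathbf{b} = (2, 3, -1, 0).$$
Use the distance formula and write
$$|\mathbf{a} - \mathbf{b}|^2 = (1 - 2)^2 + (2 - 3)^2 + (-4 - (-1))^2 + (6 - 0)^2 = 47.$$
Therefore, $|\mathbf{a} - \mathbf{b}| = \sqrt{47}$.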
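The distance formula is easy to compute with in any number of dimensions. The following Python sketch is only an illustration, with the function name `distance` chosen here. It applies the formula to the two points of Example 1.5.2, taking them to be $(1, 2, -4, 6)$ and $(2, 3, -1, 0)$.

```python
import math


def distance(x, y):
    """Distance between two points of R^n given by the distance formula."""
    if len(x) != len(y):
        raise ValueError("points must lie in the same R^n")
    return math.sqrt(sum((xk - yk) ** 2 for xk, yk in zip(x, y)))


a = (1, 2, -4, 6)
b = (2, 3, -1, 0)

squared = sum((ak - bk) ** 2 for ak, bk in zip(a, b))
print(squared)         # the sum of squares from the example
print(distance(a, b))  # its positive square root
```

The same function works unchanged for points on a line, in the plane, or in $\mathbb{R}^n$ for large $n$, since only the number of terms in the sum changes.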
All this amounts to defining the distance between two points as the length of a
straight line joining these two points. However, there is nothing sacred about using
straight lines. One could define the distance to be the length of some other sort of line
joining these points. It won't be done in this book but sometimes this sort of thing is
done.
Another convention which is usually followed, especially in $\mathbb{R}^2$ and $\mathbb{R}^3$ is to denote
the first component of a point in $\mathbb{R}^2$ by $x$ and the second component by $y$. In $\mathbb{R}^3$ it is
customary to denote the first and second components as just described while the third
component is called z.
Example 1.5.3 Describe the points which are at the same distance from $(1, 2, 3)$
and $(0, 1, 2)$.
Let $(x, y, z)$ be such a point. Then
$$\sqrt{(x - 1)^2 + (y - 2)^2 + (z - 3)^2} = \sqrt{x^2 + (y - 1)^2 + (z - 2)^2}.$$
Squaring both sides
$$(x - 1)^2 + (y - 2)^2 + (z - 3)^2 = x^2 + (y - 1)^2 + (z - 2)^2$$
and so
$$x^2 - 2x + 14 + y^2 - 4y + z^2 - 6z = x^2 + y^2 - 2y + 5 + z^2 - 4z$$
which implies
$$-2x + 14 - 4y - 6z = -2y + 5 - 4z$$
and so
$$2x + 2y + 2z = 9. \tag{1.13}$$
Since these steps are reversible, the set of points which is at the same distance from the
two given points consists of the points, $(x, y, z)$ such that 1.13 holds.
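The conclusion of Example 1.5.3 can be tested on a few points. The following Python sketch is an illustration only. The sample points and the helper names `on_plane` and `equidistant` are chosen here and do not come from the notes. It uses `math.dist` from the Python standard library (available in Python 3.8 and later) for the usual distance.

```python
import math

P = (1.0, 2.0, 3.0)
Q = (0.0, 1.0, 2.0)


def on_plane(point, tol=1e-9):
    """True if the point satisfies 2x + 2y + 2z = 9, which is 1.13."""
    x, y, z = point
    return abs(2 * x + 2 * y + 2 * z - 9) <= tol


def equidistant(point, tol=1e-9):
    """True if the point is at the same distance from P and from Q."""
    return abs(math.dist(point, P) - math.dist(point, Q)) <= tol


# Hypothetical sample points chosen to satisfy 1.13.
plane_points = [(4.5, 0.0, 0.0), (0.0, 4.5, 0.0), (1.0, 2.0, 1.5), (-3.0, 10.0, -2.5)]
# A point which does not satisfy 1.13.
off_plane_point = (0.0, 0.0, 0.0)

for p in plane_points:
    print(p, on_plane(p), equidistant(p))
print(off_plane_point, on_plane(off_plane_point), equidistant(off_plane_point))
```

A finite check like this proves nothing, of course. The algebra in the example is the proof, and the computation only guards against slips of sign.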
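The following lemma is fundamental. It is a form of the Cauchy Schwarz inequality.
$$\left|\sum_{i=1}^{n} x_i y_i\right| \leq |\mathbf{x}| \, |\mathbf{y}|. \tag{1.14}$$
Proof: Let $\theta$ be either $1$ or $-1$, chosen such that
$$\theta \sum_{i=1}^{n} x_i y_i = \sum_{i=1}^{n} x_i (\theta y_i) = \left|\sum_{i=1}^{n} x_i y_i\right|$$
and consider $p(t) \equiv \sum_{i=1}^{n} (x_i + t\theta y_i)^2$. Then for all $t \in \mathbb{R}$,
$$0 \leq p(t) = \sum_{i=1}^{n} x_i^2 + 2t \sum_{i=1}^{n} x_i \theta y_i + t^2 \sum_{i=1}^{n} y_i^2 = |\mathbf{x}|^2 + 2t \left|\sum_{i=1}^{n} x_i y_i\right| + t^2 |\mathbf{y}|^2.$$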
If $|\mathbf{y}| = 0$ then 1.14 is obviously true because both sides equal zero. Therefore, assume
$|\mathbf{y}| \neq 0$ and then $p(t)$ is a polynomial of degree two whose graph opens up. Therefore,
it either has no zeros, two zeros or one repeated zero. If it has two zeros, the above
inequality must be violated because in this case the graph must dip below the x axis.
Therefore, it either has no zeros or exactly one. From the quadratic formula this happens
exactly when
$$4\left(\sum_{i=1}^{n} x_i y_i\right)^2 - 4 |\mathbf{x}|^2 |\mathbf{y}|^2 \leq 0$$
and so
$$\left|\sum_{i=1}^{n} x_i y_i\right| \leq |\mathbf{x}| \, |\mathbf{y}|$$
$$\sum_{i=1}^{n} (x_i + y_i)^2 = \sum_{i=1}^{n} x_i^2 + 2\sum_{i=1}^{n} x_i y_i + \sum_{i=1}^{n} y_i^2$$
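The inequality 1.14 and the expansion of $\sum_{i=1}^{n} (x_i + y_i)^2$ displayed above can both be checked numerically. The following Python sketch is only an illustration. The helper names `dot` and `norm` and the sample pairs are chosen here. The second pair is a case where the two sides of 1.14 are equal, and the third has one vector equal to zero.

```python
import math


def dot(x, y):
    """The sum of the products x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))


def norm(x):
    """The length |x| from the distance formula."""
    return math.sqrt(dot(x, x))


# Hypothetical sample pairs of vectors.
pairs = [
    ((1.0, 2.0, 3.0), (4.0, -5.0, 6.0)),
    ((1.0, -1.0), (2.0, -2.0)),
    ((0.0, 0.0, 0.0), (1.0, 2.0, 3.0)),
]

for x, y in pairs:
    lhs = abs(dot(x, y))
    rhs = norm(x) * norm(y)
    # The expansion of sum (x_i + y_i)^2 into three sums.
    expanded = dot(x, x) + 2 * dot(x, y) + dot(y, y)
    direct = sum((xi + yi) ** 2 for xi, yi in zip(x, y))
    print(lhs, rhs, direct, expanded)
```

A small tolerance is needed when comparing the two sides in floating point, since the square roots are only computed approximately.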
1.6 Geometric Meaning Of Scalar Multiplication In R^3
As discussed earlier, $\mathbf{x} = (x_1, x_2, x_3)$ determines a vector. You draw the line from $\mathbf{0}$ to
$\mathbf{x}$ placing the point of the vector on $\mathbf{x}$. What is the length of this vector? The length
of this vector is defined to equal $|\mathbf{x}|$ as in Definition 1.5.1. Thus the length of $\mathbf{x}$ equals
$\sqrt{x_1^2 + x_2^2 + x_3^2}$. When you multiply $\mathbf{x}$ by a scalar $\alpha$, you get $(\alpha x_1, \alpha x_2, \alpha x_3)$ and the
length of this vector is defined as
$$\sqrt{(\alpha x_1)^2 + (\alpha x_2)^2 + (\alpha x_3)^2} = |\alpha| \sqrt{x_1^2 + x_2^2 + x_3^2}.$$
Thus the following holds.
$$|\alpha \mathbf{x}| = |\alpha| \, |\mathbf{x}|.$$
In other words, multiplication by a scalar magnifies the length of the vector. What
about the direction? You should convince yourself by drawing a picture that if $\alpha$ is
negative, it causes the resulting vector to point in the opposite direction while if $\alpha > 0$
it preserves the direction the vector points. One way to see this is to first observe that
if $\alpha \neq 1$, then $\mathbf{x}$ and $\alpha \mathbf{x}$ are both points on the same line.
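The rule $|\alpha \mathbf{x}| = |\alpha| \, |\mathbf{x}|$ can be seen in a small computation. The following Python sketch is an illustration only. The names `length` and `scale`, the vector `x` and the list of scalars are chosen here and are not part of the notes.

```python
import math


def length(x):
    """Length of a vector: the square root of the sum of squared coordinates."""
    return math.sqrt(sum(c * c for c in x))


def scale(alpha, x):
    """Multiply every coordinate of x by the scalar alpha."""
    return tuple(alpha * c for c in x)


# A hypothetical sample vector of length 3 and a few scalars.
x = (1.0, -2.0, 2.0)
scalars = [-2.0, 0.5, 3.0]

for alpha in scalars:
    print(alpha, scale(alpha, x), length(scale(alpha, x)), abs(alpha) * length(x))
```

For the negative scalar every coordinate changes sign, so the vector points the opposite way, while its length is still multiplied by $|\alpha|$.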
1.7 Exercises
(a) x+1
3 =
(b)
(c)
(d)
(e)
(f)
6. Parametric equations for a line are given. Find symmetric equations for the line
if possible. If it is not possible to do it explain why.
(a) x = 1 + 2t, y = 3 - t, z = 5 + 3t
(b) x = 1 + t, y = 3 - t, z = 5 - 3t
(c) x = 1 + 2t, y = 3 + t, z = 5 + 3t
(d) x = 1 - 2t, y = 1, z = 1 + t
(e) x = 1 - t, y = 3 + 2t, z = 5 - 3t
(f) x = t, y = 3 - t, z = 1 + t
7. The first point given is a point on the line. The second point given is a
direction vector for the line. Find parametric equations for the line determined
by this information.
(a) (1, 2, 1) , (2, 0, 3)
(b) (1, 0, 1) , (1, 1, 3)
(c) (1, 2, 0) , (1, 1, 0)
(d) (1, 0, 6) , (2, 1, 3)
(e) (1, 2, 1) , (2, 1, 1)
(f) (0, 0, 0) , (2, 3, 1)
8. Parametric equations for a line are given. Determine a direction vector for this
line.
(a) x = 1 + 2t, y = 3 - t, z = 5 + 3t
(b) x = 1 + t, y = 3 + 3t, z = 5 - t
(c) x = 7 + t, y = 3 + 4t, z = 5 - 3t
(d) x = 2t, y = 3t, z = 3t
(e) x = 2t, y = 3 + 2t, z = 5 + t
(f) x = t, y = 3 + 3t, z = 5 + t
9. A line contains the given two points. Find parametric equations for this line.
Identify the direction vector.
(a) (0, 1, 0) , (2, 1, 2)
(b) (0, 1, 1) , (2, 5, 0)
(c) (1, 1, 0) , (0, 1, 2)
(d) (0, 1, 3) , (0, 3, 0)
(e) (0, 1, 0) , (0, 6, 2)
(f) (0, 1, 2) , (2, 0, 2)
10. Draw a picture of the points in $\mathbb{R}^2$ which are determined by the following ordered
pairs.
(a) (1, 2)
(b) (2, 2)
(c) (2, 3)
(d) (2, 5)
11. Does it make sense to write (1, 2) + (2, 3, 1)? Explain.
12. Draw a picture of the points in $\mathbb{R}^3$ which are determined by the following ordered
triples.
(a) (1, 2, 0)
(b) (2, 2, 1)
(c) (2, 3, 2)
13. You are given two points in $\mathbb{R}^3$, $(4, 5, 4)$ and $(2, 3, 0)$. Show the distance from the
point $(3, 4, 2)$ to the first of these points is the same as the distance from this
point to the second of the original pair of points. Note that $3 = \frac{4+2}{2}$, $4 = \frac{5+3}{2}$.
Obtain a theorem which will be valid for general pairs of points, $(x, y, z)$ and
$(x_1, y_1, z_1)$ and prove your theorem using the distance formula.
14. A sphere is the set of all points which are at a given distance from a single given
point. Find an equation for the sphere which is the set of all points that are at a
distance of 4 from the point $(1, 2, 3)$ in $\mathbb{R}^3$.
15. A parabola is the set of all points (x, y) in the plane such that the distance from
the point (x, y) to a given point, (x0 , y0 ) equals the distance from (x, y) to a given
line. The point, (x0 , y0 ) is called the focus and the line is called the directrix.
Find the equation of the parabola which results from the line y = l and (x0 , y0 ) a
given focus with y0 < l. Repeat for y0 > l.
16. A sphere centered at the point $(x_0, y_0, z_0) \in \mathbb{R}^3$ having radius $r$ consists of all
points, $(x, y, z)$ whose distance to $(x_0, y_0, z_0)$ equals $r$. Write an equation for this
sphere in $\mathbb{R}^3$.
17. Suppose the distance between $(x, y)$ and $(x', y')$ were defined to equal the larger
of the two numbers $|x - x'|$ and $|y - y'|$. Draw a picture of the sphere centered at
the point, $(0, 0)$ if this notion of distance is used.
18. Repeat the same problem except this time let the distance between the two points
be $|x - x'| + |y - y'|$.
19. If $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ are two points such that $|(x_i, y_i, z_i)| = 1$ for $i = 1, 2$,
show that in terms of the usual distance, $\left|\left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}, \frac{z_1 + z_2}{2}\right)\right| < 1$. What would
happen if you used the way of measuring distance given in Problem 17 ($|(x, y, z)| =$
maximum of $|z|, |x|, |y|$)?
20. Give a simple description using the distance formula of the set of points which are
at an equal distance between the two points (x1 , y1 , z1 ) and (x2 , y2 , z2 ) .
21. Suppose you are given two points, $(-a, 0)$ and $(a, 0)$ in $\mathbb{R}^2$ and a number, $r > 2a$.
The set of points described by
the ellipse. Simplify this to the form $\left(\frac{x - A}{\alpha}\right)^2 + \left(\frac{y}{\beta}\right)^2 = 1$. This is a nice exercise
in messy algebra.
22. Suppose you are given two points, $(-a, 0)$ and $(a, 0)$ in $\mathbb{R}^2$ and a number, $r > 2a$.
The set of points described by
is known as a hyperbola. The two given points are known as the focus points
of the hyperbola. Simplify this to the form $\left(\frac{x - A}{\alpha}\right)^2 - \left(\frac{y}{\beta}\right)^2 = 1$. This is a nice
exercise in messy algebra.
1.8 Exercises With Answers
12x
3
3+2y
2
= z + 1. Then x =
12x
3
3t1
2 , y
=
=
3+2y
2
= z + 1. Find parametric
2t3
2 ,z
= t 1.
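4. Parametric equations for a line are $x = 1 - t$, $y = 3 + 2t$, $z = 5 - 3t$. Find symmetric
equations for the line if possible. If it is not possible to do it explain why.
Solve the parametric equations for $t$. This gives
$$t = 1 - x = \frac{y - 3}{2} = \frac{5 - z}{3}.$$
Thus symmetric equations for this line are
$$1 - x = \frac{y - 3}{2} = \frac{5 - z}{3}.$$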
6. The first point given is a point on the line. The second point given is a
direction vector for the line. Find parametric equations for the line determined
by this information. $(1, 1, 2)$, $(2, 1, -3)$. Parametric equations are equivalent to
$(x, y, z) = (1, 1, 2) + t(2, 1, -3)$. Written parametrically, $x = 1 + 2t$, $y = 1 + t$, $z =
2 - 3t$.
7. Parametric equations for a line are given. Determine a direction vector for this
line. x = t, y = 3 + 2t, z = 5 + t
A direction vector is (1, 2, 1) . You just form the vector which has components equal
to the coefficients of t in the parametric equations for x, y, and z respectively.
8. A line contains the given two points. Find parametric equations for this line.
Identify the direction vector. $(1, -2, 0)$, $(2, 1, 2)$.
A direction vector is $(1, 3, 2)$ and so parametric equations are equivalent to $(x, y, z) =
(2, 1, 2) + t (1, 3, 2)$. Of course you could also have written $(x, y, z) = (1, -2, 0) +
t (1, 3, 2)$ or $(x, y, z) = (1, -2, 0) - t (1, 3, 2)$ or $(x, y, z) = (1, -2, 0) + t (2, 6, 4)$,
etc. As explained above, there are always infinitely many parameterizations for a
given line.
Outcomes
2.1
Matrix Arithmetic
2.1.1
Addition And Scalar Multiplication Of Matrices
When people speak of vectors and matrices, it is common to refer to numbers as scalars.
In this book, scalars will always be real numbers.
A matrix is a rectangular array of numbers. Several of them are referred to as
matrices. For example, here is a matrix.
\[ \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{pmatrix} \]
The size or dimension of a matrix is defined as $m \times n$ where m is the number of rows
and n is the number of columns. The above matrix is a $3 \times 4$ matrix because there
are three rows and four columns. The first row is $(1\ 2\ 3\ 4)$, the second row is $(5\ 2\ 8\ 7)$
and so forth. The first column is $\begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix}$. When specifying the size of a matrix, you
always list the number of rows before the number of columns. Also, you can remember
the columns are like columns in a Greek temple. They stand upright while the rows
just lie there like rows made by a tractor in a plowed field. Elements of the matrix are
identified according to position in the matrix. For example, 8 is in position 2, 3 because
it is in the second row and the third column. You might remember that you always
list the rows before the columns by using the phrase Rowman Catholic. The symbol
$(a_{ij})$ refers to a matrix. The entry in the $i$th row and the $j$th column of this matrix is
denoted by $a_{ij}$. Using this notation on the above matrix, $a_{23} = 8$, $a_{32} = 9$, $a_{12} = 2$,
etc.
There are various operations which are done on matrices. Matrices can be added,
multiplied by a scalar, and multiplied by other matrices. To illustrate scalar multiplication, consider the following example in which a matrix is being multiplied by the scalar 3.
\[ 3 \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 9 & 12 \\ 15 & 6 & 24 & 21 \\ 18 & 27 & 3 & 6 \end{pmatrix}. \]
The new matrix is obtained by multiplying every entry of the original matrix by the
given scalar. If A is an $m \times n$ matrix, $-A$ is defined to equal $(-1) A$.
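Two matrices must be the same size to be added. The sum of two matrices is a
matrix which is obtained by adding the corresponding entries. Thus
\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{pmatrix} + \begin{pmatrix} -1 & 4 \\ 2 & 8 \\ 6 & -4 \end{pmatrix} = \begin{pmatrix} 0 & 6 \\ 5 & 12 \\ 11 & -2 \end{pmatrix}. \]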
Two matrices are equal exactly when they are the same size and the corresponding
entries are identical. Thus
\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \neq \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \]
because they are different sizes. As noted above, you write $(c_{ij})$ for the matrix C whose
$ij$th entry is $c_{ij}$. In doing arithmetic with matrices you must define what happens in
terms of the $c_{ij}$, sometimes called the entries of the matrix or the components of the
matrix.
The above discussion stated for general matrices is given in the following definition.
Definition 2.1.1 (Scalar Multiplication) If $A = (a_{ij})$ and k is a scalar, then $kA =
(k a_{ij})$.
Example 2.1.2 $7 \begin{pmatrix} 2 & 0 \\ 1 & 4 \end{pmatrix} = \begin{pmatrix} 14 & 0 \\ 7 & 28 \end{pmatrix}$.
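Definition 2.1.3 (Addition) If $A = (a_{ij})$ and $B = (b_{ij})$ are two $m \times n$ matrices, then
$A + B = C$ where
\[ C = (c_{ij}) \]
for $c_{ij} = a_{ij} + b_{ij}$.
Example 2.1.4
\[ \begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 4 \end{pmatrix} + \begin{pmatrix} 5 & 2 & 3 \\ -6 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 6 & 4 & 6 \\ -5 & 2 & 5 \end{pmatrix} \]
The $2 \times 3$ zero matrix is
\[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]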
Note there are $2 \times 3$ zero matrices, $3 \times 4$ zero matrices, etc. In fact there is a zero
matrix for every size.
Definition 2.1.7 (Equality of matrices) Let A and B be two matrices. Then $A = B$
means that the two matrices are of the same size and for $A = (a_{ij})$ and $B = (b_{ij})$,
$a_{ij} = b_{ij}$ for all $1 \le i \le m$ and $1 \le j \le n$.
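The entrywise definitions above translate directly into a few lines of code. The following is a minimal Python sketch, assuming a matrix is stored as a list of rows; the names `scalar_multiply` and `add` are chosen here for illustration only, and the matrices passed to `add` are made up for the demonstration.

```python
# A matrix is stored as a list of rows (an assumption of this sketch).

def scalar_multiply(k, A):
    """Return kA: every entry of A multiplied by the scalar k."""
    return [[k * entry for entry in row] for row in A]

def add(A, B):
    """Return A + B, adding corresponding entries; sizes must agree."""
    if len(A) != len(B) or any(len(r) != len(s) for r, s in zip(A, B)):
        raise ValueError("matrices must be the same size to be added")
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]

A = [[1, 2, 3, 4], [5, 2, 8, 7], [6, 9, 1, 2]]
tripled = scalar_multiply(3, A)                    # 3 times the 3 x 4 matrix above
total = add([[1, 2], [3, 4]], [[5, 6], [7, 8]])    # illustrative 2 x 2 matrices

# Comparing nested lists checks both size and entries, as in Definition 2.1.7.
zeros_differ = [[0, 0], [0, 0], [0, 0]] != [[0, 0], [0, 0]]
```

Here `tripled` reproduces the scalar multiplication example, and `zeros_differ` is true because the two zero matrices have different sizes.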
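The following properties of matrices can be easily verified. You should do so.
Commutative Law Of Addition.
\[ A + B = B + A, \qquad (2.1) \]
Associative Law For Addition.
\[ (A + B) + C = A + (B + C), \qquad (2.2) \]
Existence Of An Additive Identity.
\[ A + 0 = A, \qquad (2.3) \]
Existence Of An Additive Inverse.
\[ A + (-A) = 0, \qquad (2.4) \]
Also, for $\alpha, \beta$ scalars, the following hold.
Distributive Law Over Matrix Addition.
\[ \alpha (A + B) = \alpha A + \alpha B, \qquad (2.5) \]
Distributive Law Over Scalar Addition.
\[ (\alpha + \beta) A = \alpha A + \beta A, \qquad (2.6) \]
Associative Law For Scalar Multiplication.
\[ \alpha (\beta A) = (\alpha \beta) A, \qquad (2.7) \]
Rule For Multiplication By 1.
\[ 1A = A. \qquad (2.8) \]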
2.1.2
Multiplication Of Matrices
An $n \times 1$ matrix
\[ x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \]
is also called a column vector. The $1 \times n$ matrix
\[ (x_1 \cdots x_n) \]
is called a row vector.
Although the following description of matrix multiplication may seem strange, it is
in fact the most important and useful of the matrix operations. To begin with consider
the case where a matrix is multiplied by a column vector. First consider a special case.
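\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix} = ? \]
One way to remember this is as follows. Slide the vector, placing it on top of the two rows
as shown, and then do the indicated operation.
\[ \begin{pmatrix} \stackrel{7}{1} & \stackrel{8}{2} & \stackrel{9}{3} \\ \stackrel{7}{4} & \stackrel{8}{5} & \stackrel{9}{6} \end{pmatrix} \rightarrow \begin{pmatrix} 7 \times 1 + 8 \times 2 + 9 \times 3 \\ 7 \times 4 + 8 \times 5 + 9 \times 6 \end{pmatrix} = \begin{pmatrix} 50 \\ 122 \end{pmatrix}. \]
Multiply the numbers on the top by the numbers on the bottom and add them up to
get a single number for each row of the matrix as shown above.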
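In more general terms,
\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \end{pmatrix}. \]
Another way to think of this is
\[ x_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix} + x_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix} + x_3 \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}. \]
Thus you take $x_1$ times the first column, add to $x_2$ times the second column, and finally
$x_3$ times the third column. In general, here is the definition of how to multiply an
$(m \times n)$ matrix times an $(n \times 1)$ matrix.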
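Definition 2.1.9 Let $A = (A_{ij})$ be an $m \times n$ matrix and let v be an $n \times 1$ matrix,
\[ v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}. \]
Then $Av$ is an $m \times 1$ matrix and the $i$th component of this matrix is
\[ (Av)_i = A_{i1} v_1 + A_{i2} v_2 + \cdots + A_{in} v_n = \sum_{j=1}^{n} A_{ij} v_j. \]
Thus
\[ Av = \begin{pmatrix} \sum_{j=1}^{n} A_{1j} v_j \\ \vdots \\ \sum_{j=1}^{n} A_{mj} v_j \end{pmatrix}. \qquad (2.9) \]
In other words, if
\[ A = (a_1, \cdots, a_n) \]
where the $a_k$ are the columns,
\[ Av = \sum_{k=1}^{n} v_k a_k. \]
This follows from 2.9 and the observation that the $j$th column of A is
\[ \begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{mj} \end{pmatrix} \]
so 2.9 reduces to
\[ v_1 \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix} + v_2 \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix} + \cdots + v_n \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix}. \]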
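For example, consider the product
\[ \begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}. \]
The entry in the second row of the result is
\[ \sum_{k=1}^{4} a_{2k} v_k = 0 \times 1 + 2 \times 2 + 1 \times 0 + (-2) \times 1 = 2. \]
Computing the other entries in the same way,
\[ \begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 8 \\ 2 \\ 5 \end{pmatrix}. \]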
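The two descriptions of $Av$ given above, one entry per row and a combination of the columns, can be compared numerically. The following Python sketch assumes a matrix is a list of rows and a vector is a plain list; the function names `matvec_rows` and `matvec_columns` are hypothetical and only serve the illustration.

```python
def matvec_rows(A, v):
    """(Av)_i = sum over j of A_ij v_j, computed one row at a time."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matvec_columns(A, v):
    """Av = v_1 a_1 + ... + v_n a_n, where a_k is the k-th column of A."""
    result = [0] * len(A)
    for k, vk in enumerate(v):
        for i in range(len(A)):
            result[i] += vk * A[i][k]   # add v_k times entry i of column k
    return result

A = [[1, 2, 3], [4, 5, 6]]
v = [7, 8, 9]
by_rows = matvec_rows(A, v)
by_columns = matvec_columns(A, v)
```

Both computations give the column with entries 50 and 122 found in the special case worked at the start of this discussion.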
The next task is to multiply an $m \times n$ matrix times an $n \times p$ matrix. Before doing
so, the following may be helpful.
For A and B matrices, in order to form the product AB, the number of columns of
A must equal the number of rows of B.
\[ (m \times \overbrace{n)\ (n}^{\mbox{these must match!}} \times p) = m \times p \]
Note the two outside numbers give the size of the product. Remember:
If the two middle numbers don't match, you can't multiply the matrices!
Definition 2.1.11 When the number of columns of A equals the number of rows of
B the two matrices are said to be conformable and the product AB is obtained as
follows. Let A be an $m \times n$ matrix and let B be an $n \times p$ matrix. Then B is of the form
\[ B = (b_1, \cdots, b_p) \]
where $b_k$ is an $n \times 1$ matrix or column vector. Then the $m \times p$ matrix AB is defined
as follows:
\[ AB \equiv (Ab_1, \cdots, Ab_p) \qquad (2.10) \]
where $Ab_k$ is an $m \times 1$ matrix or column vector which gives the $k$th column of AB.
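Example 2.1.12 Multiply the following.
\[ \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix} \]
The first thing you need to check before doing anything else is whether it is possible
to do the multiplication. The first matrix is a $2 \times 3$ and the second matrix is a $3 \times 3$.
Therefore, it is possible to multiply these matrices. According to the above discussion
it should be a $2 \times 3$ matrix of the form
\[ \left( \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}}^{\mbox{First column}},\ \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}}^{\mbox{Second column}},\ \overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}}^{\mbox{Third column}} \right). \]
Carrying out these three products,
\[ \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix} = \begin{pmatrix} -1 & 9 & 3 \\ -2 & 7 & 3 \end{pmatrix}. \]
The product in the other order,
\[ \begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}, \]
cannot be formed because it is of the form $(3 \times 3)(2 \times 3)$ and the middle numbers do not match.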
2.1.3
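The $j$th column of B is
\[ b_j = \begin{pmatrix} B_{1j} \\ \vdots \\ B_{nj} \end{pmatrix} \]
and from the above definition, the $i$th entry of $Ab_j$, which is the $ij$th entry of AB, is
\[ \sum_{k=1}^{n} A_{ik} B_{kj}. \qquad (2.11) \]
In terms of pictures of the matrices, you are doing
\[ \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} & \cdots & B_{1p} \\ B_{21} & B_{22} & \cdots & B_{2p} \\ \vdots & \vdots & & \vdots \\ B_{n1} & B_{n2} & \cdots & B_{np} \end{pmatrix}. \]
Then the $j$th column of the product is of the form
\[ \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{m1} & A_{m2} & \cdots & A_{mn} \end{pmatrix} \begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix} \]
which is an $m \times 1$ matrix or column vector which equals
\[ \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix} B_{1j} + \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix} B_{2j} + \cdots + \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix} B_{nj}. \]
The second entry of this $m \times 1$ matrix is
\[ A_{21} B_{1j} + A_{22} B_{2j} + \cdots + A_{2n} B_{nj} = \sum_{k=1}^{n} A_{2k} B_{kj}. \]
Similarly, the $i$th entry of this $m \times 1$ matrix is
\[ A_{i1} B_{1j} + A_{i2} B_{2j} + \cdots + A_{in} B_{nj} = \sum_{k=1}^{n} A_{ik} B_{kj}. \]
This shows the following definition for matrix multiplication in terms of the $ij$th entries
of the product coincides with Definition 2.1.11.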
Definition 2.1.14 Let $A = (A_{ij})$ be an $m \times n$ matrix and let $B = (B_{ij})$ be an $n \times p$
matrix. Then AB is an $m \times p$ matrix and
\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \qquad (2.12) \]
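Example 2.1.15 Multiply if possible $\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \end{pmatrix}$.
First check to see if this is possible. It is of the form $(3 \times 2)(2 \times 3)$ and since the
inside numbers match, the two matrices are conformable and it is possible to do the
multiplication. The result should be a $3 \times 3$ matrix. The answer is of the form
\[ \left( \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 \\ 7 \end{pmatrix},\ \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 3 \\ 6 \end{pmatrix},\ \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \right) \]
where the commas separate the columns in the resulting product. Thus the above
product equals
\[ \begin{pmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{pmatrix}, \]
a $3 \times 3$ matrix as desired. In terms of the $ij$th entries and the above definition, the entry
in the third row and second column of the product should equal
\[ \sum_{k} a_{3k} b_{k2} = a_{31} b_{12} + a_{32} b_{22} = 2 \times 3 + 6 \times 6 = 42. \]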
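The entry formula $(AB)_{ij} = \sum_k A_{ik} B_{kj}$, together with the check that the middle numbers match, can be sketched in Python as follows. This is only an illustration under the assumption that matrices are lists of rows; the name `matmul` is not from the notes.

```python
def matmul(A, B):
    """Return AB using (AB)_ij = sum over k of A_ik B_kj."""
    n = len(A[0])                      # number of columns of A
    if len(B) != n:                    # must equal the number of rows of B
        raise ValueError("not conformable: the middle numbers do not match")
    p = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]

A = [[1, 2], [3, 1], [2, 6]]           # 3 x 2
B = [[2, 3, 1], [7, 6, 2]]             # 2 x 3
product = matmul(A, B)                 # a 3 x 3 matrix
entry_3_2 = product[2][1]              # third row, second column (0-based indices)
```

With these matrices `product` agrees with the $3 \times 3$ matrix computed by hand, and `entry_3_2` is the number 42 obtained from the sum.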
Example 2.1.16 Multiply if possible $\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix}$.
This is not possible because it is of the form $(3 \times 2)(3 \times 3)$ and the middle numbers
don't match. In other words the two matrices are not conformable in the indicated
order.
Example 2.1.17 Multiply if possible $\begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}$.
This is possible because in this case it is of the form $(3 \times 3)(3 \times 2)$ and the middle
numbers do match so the matrices are conformable. When the multiplication is done it
equals
\[ \begin{pmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{pmatrix}. \]
Check this and be sure you come up with the same answer.
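For another example, a $3 \times 1$ matrix times a $1 \times 4$ matrix is a $3 \times 4$ matrix:
\[ \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{pmatrix} \]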
2.1.4
As pointed out above, sometimes it is possible to multiply matrices in one order but
not in the other order. What if it makes sense to multiply them in either order? Will
the two products be equal then?
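Example 2.1.19 Compare $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$.
The first product is
\[ \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}. \]
The second product is
\[ \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}. \]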
You see these are not equal. Again you cannot conclude that AB = BA for matrix
multiplication even when multiplication is defined in both orders. However, there are
some properties which do hold.
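The failure of $AB = BA$ is easy to reproduce. The short Python sketch below recomputes both products from Example 2.1.19; it assumes matrices are lists of rows, and `matmul` is a helper written for this illustration.

```python
def matmul(A, B):
    """Return AB for conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
AB = matmul(A, B)       # multiplying by B on the right swaps the columns of A
BA = matmul(B, A)       # multiplying by B on the left swaps the rows of A
commute = AB == BA
```

Here `commute` is false even though both products are defined and have the same size.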
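Proposition 2.1.20 If all multiplications and additions make sense, the following hold
for matrices A, B, C and a, b scalars.
\[ A (aB + bC) = a (AB) + b (AC) \qquad (2.13) \]
\[ (B + C) A = BA + CA \qquad (2.14) \]
\[ A (BC) = (AB) C \qquad (2.15) \]
Proof: Using Definition 2.1.14,
\[ (A (aB + bC))_{ij} = \sum_k A_{ik} (aB + bC)_{kj} = \sum_k A_{ik} (a B_{kj} + b C_{kj}) \]
\[ = a \sum_k A_{ik} B_{kj} + b \sum_k A_{ik} C_{kj} \]
\[ = a (AB)_{ij} + b (AC)_{ij} \]
\[ = (a (AB) + b (AC))_{ij}. \]
Thus $A (aB + bC) = a (AB) + b (AC)$ as claimed. Formula 2.14 is entirely similar.
Formula 2.15 is the associative law of multiplication. Using Definition 2.1.14,
\[ (A (BC))_{ij} = \sum_k A_{ik} (BC)_{kj} = \sum_k A_{ik} \sum_l B_{kl} C_{lj} = \sum_l (AB)_{il} C_{lj} = ((AB) C)_{ij}. \]
This proves 2.15.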
2.1.5
The Transpose
Another important operation on matrices is that of taking the transpose. The following
example shows what is meant by this operation, denoted by placing a T as an exponent
on the matrix.
\[ \begin{pmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 & 2 \\ 4 & 1 & 6 \end{pmatrix} \]
What happened? The first column became the first row and the second column became
the second row. Thus the $3 \times 2$ matrix became a $2 \times 3$ matrix. The number 3 was in the
second row and the first column and it ended up in the first row and second column.
Here is the definition.
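Definition 2.1.21 Let A be an $m \times n$ matrix. Then $A^T$ denotes the $n \times m$ matrix
which is defined as follows.
\[ (A^T)_{ij} = A_{ji} \]
Example 2.1.22
\[ \begin{pmatrix} 1 & 2 & 6 \\ 3 & 5 & 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 \\ 2 & 5 \\ 6 & 4 \end{pmatrix}. \]
Lemma 2.1.23 Let A be an $m \times n$ matrix and let B be an $n \times p$ matrix. Then
\[ (AB)^T = B^T A^T \qquad (2.16) \]
and if $\alpha$ and $\beta$ are scalars,
\[ (\alpha A + \beta B)^T = \alpha A^T + \beta B^T \qquad (2.17) \]
Proof: From the definition,
\[ ((AB)^T)_{ij} = (AB)_{ji} = \sum_k A_{jk} B_{ki} = \sum_k (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}. \]
The proof of Formula 2.17 is left as an exercise and this proves the lemma.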
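The rule that the transpose of a product is the product of the transposes in the reverse order can be spot-checked on particular matrices. The following Python sketch is only a numerical check on one pair of matrices, not a proof; `transpose` and `matmul` are helper names assumed for this example, with matrices stored as lists of rows.

```python
def transpose(A):
    """(A^T)_ij = A_ji: the columns of A become the rows of A^T."""
    return [list(column) for column in zip(*A)]

def matmul(A, B):
    """Return AB for conformable matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2], [3, 1], [2, 6]]            # 3 x 2
B = [[2, 3, 1], [7, 6, 2]]              # 2 x 3
left = transpose(matmul(A, B))          # (AB)^T
right = matmul(transpose(B), transpose(A))   # B^T A^T
```

Note that $A^T B^T$ would be a $2 \times 2$ matrix here, so only the reversed order can possibly agree with the $3 \times 3$ matrix $(AB)^T$.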
Definition 2.1.24 An $n \times n$ matrix A is said to be symmetric if $A = A^T$. It is said
to be skew symmetric if $A = -A^T$.
Example 2.1.25 Let
\[ A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 5 & 3 \\ 3 & 3 & 7 \end{pmatrix}. \]
Then A is symmetric.
Example 2.1.26 Let
\[ A = \begin{pmatrix} 0 & 1 & 3 \\ -1 & 0 & 2 \\ -3 & -2 & 0 \end{pmatrix}. \]
Then A is skew symmetric.
2.1.6
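There is a special matrix called I and referred to as the identity matrix. It is always a
square matrix, meaning the number of rows equals the number of columns and it has
the property that there are ones down the main diagonal and zeroes elsewhere. Here
are some identity matrices of various sizes.
\[ (1),\ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},\ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]
The first is the $1 \times 1$ identity matrix, the second is the $2 \times 2$ identity matrix, the third is
the $3 \times 3$ identity matrix, and the fourth is the $4 \times 4$ identity matrix. By extension, you
can likely see what the $n \times n$ identity matrix would be. It is so important that there is
a special symbol to denote the $ij$th entry of the identity matrix,
\[ I_{ij} = \delta_{ij} \]
where $\delta_{ij}$ is the Kronecker symbol defined by
\[ \delta_{ij} = \left\{ \begin{array}{ll} 1 & \mbox{if } i = j \\ 0 & \mbox{if } i \neq j \end{array} \right. \]
Multiplying by the identity matrix changes nothing. For A an $m \times n$ matrix and I the $n \times n$ identity matrix,
\[ (AI)_{ij} = \sum_k A_{ik} \delta_{kj} = A_{ij}. \]
An $n \times n$ matrix A has an inverse $A^{-1}$ if $A A^{-1} = A^{-1} A = I$. The inverse is unique when it exists, because if $AB = BA = I$, then
\[ A^{-1} = A^{-1} I = A^{-1} (AB) = (A^{-1} A) B = IB = B. \]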
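Unlike ordinary multiplication of numbers, it can happen that $A \neq 0$ but A may fail
to have an inverse. This is illustrated in the following example. Let
\[ A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \]
One might think A would have an inverse because it does not equal zero. However,
\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
and if $A^{-1}$ existed, this could not happen because you could write
\[ \begin{pmatrix} 0 \\ 0 \end{pmatrix} = A^{-1} \begin{pmatrix} 0 \\ 0 \end{pmatrix} = A^{-1} \left( A \begin{pmatrix} -1 \\ 1 \end{pmatrix} \right) = (A^{-1} A) \begin{pmatrix} -1 \\ 1 \end{pmatrix} = I \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \end{pmatrix}, \]
a contradiction. Thus A has no inverse.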
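Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. Show $\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$ is the inverse of A.
To check this, multiply
\[ \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
and
\[ \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \]
showing that this matrix is the inverse of A.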
In the last example, how would you find $A^{-1}$? You wish to find a matrix $\begin{pmatrix} x & z \\ y & w \end{pmatrix}$
such that
\[ \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x & z \\ y & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]
This requires the solution of the systems of equations,
\[ x + y = 1,\ x + 2y = 0 \]
and
\[ z + w = 0,\ z + 2w = 1. \]
The first pair of equations has the solution $y = -1$ and $x = 2$. The second pair of
equations has the solution $w = 1$, $z = -1$. Therefore, from the definition of the inverse,
\[ A^{-1} = \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}. \]
To be sure it is the inverse, you should multiply on both sides of the original matrix. It
turns out that if it works on one side, it will always work on the other. The consideration
of this as well as a more detailed treatment of inverses is a good topic for a linear
algebra course.
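The procedure just described, solving one pair of equations for each column of the inverse, can be sketched in Python. This is an illustration for the $2 \times 2$ case only; the helper names `solve_2x2`, `inverse_2x2` and `matmul` are assumptions of this sketch, and the two equations are solved by elimination using exact fractions.

```python
from fractions import Fraction

def solve_2x2(A, rhs):
    """Solve a x + b y = e, c x + d y = f by elimination; A = [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    e, f = rhs
    denom = Fraction(a * d - b * c)
    if denom == 0:
        raise ValueError("the system has no unique solution, so A has no inverse")
    return [(e * d - b * f) / denom, (a * f - e * c) / denom]

def inverse_2x2(A):
    """Columns of the inverse solve A (x, y)^T = (1, 0)^T and A (z, w)^T = (0, 1)^T."""
    x, y = solve_2x2(A, [1, 0])
    z, w = solve_2x2(A, [0, 1])
    return [[x, z], [y, w]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1], [1, 2]]
A_inv = inverse_2x2(A)
identity = [[1, 0], [0, 1]]
works_on_both_sides = matmul(A, A_inv) == identity and matmul(A_inv, A) == identity
```

For the matrix with all entries equal to 1, `solve_2x2` raises the error, which is consistent with the earlier observation that this matrix has no inverse.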
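2.2
Linear Transformations
Example 2.2.1 Consider the matrix $\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix}$. Think of it as a function which
takes vectors in $\mathbb{R}^3$ and makes them into vectors in $\mathbb{R}^2$ as follows. For $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ a vector
in $\mathbb{R}^3$, multiply on the left by the given matrix to obtain the vector in $\mathbb{R}^2$. Here are some
numerical examples.
\[ \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix},\quad \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ -2 \\ 3 \end{pmatrix} = \begin{pmatrix} -3 \\ 0 \end{pmatrix}, \]
\[ \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} 10 \\ 5 \\ 3 \end{pmatrix} = \begin{pmatrix} 20 \\ 25 \end{pmatrix},\quad \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 7 \\ 3 \end{pmatrix} = \begin{pmatrix} 14 \\ 7 \end{pmatrix}. \]
More generally,
\[ \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y \\ 2x + y \end{pmatrix}. \]
The idea is to define a function which takes vectors in $\mathbb{R}^3$ and delivers new vectors in
$\mathbb{R}^2$.
This is an example of something called a linear transformation.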
Definition 2.2.2 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a function. Thus for each $x \in \mathbb{R}^n$, $Tx \in \mathbb{R}^m$.
Then T is a linear transformation if whenever $\alpha, \beta$ are scalars and $x_1$ and $x_2$ are
vectors in $\mathbb{R}^n$,
\[ T (\alpha x_1 + \beta x_2) = \alpha T x_1 + \beta T x_2. \]
In words, linear transformations distribute across + and allow you to factor out
scalars. At this point, recall the properties of matrix multiplication. The pertinent
property is 2.13. Recall it states that for a and b scalars,
\[ A (aB + bC) = aAB + bAC. \]
In particular, for A an $m \times n$ matrix and B and C, $n \times 1$ matrices (column vectors), the
above formula holds, which is nothing more than the statement that matrix multiplication gives an example of a linear transformation.
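The defining property of a linear transformation can be tested on the matrix of Example 2.2.1. The following Python sketch checks the identity for one choice of scalars and vectors, which illustrates the definition but of course does not prove it; the names `T`, `alpha` and `beta` and the particular numbers are chosen here for the demonstration.

```python
M = [[1, 2, 0], [2, 1, 0]]

def T(x):
    """Multiply the column vector x on the left by the matrix M."""
    return [sum(a * c for a, c in zip(row, x)) for row in M]

alpha, beta = 3, -2                 # arbitrary scalars
x1, x2 = [1, 2, 3], [0, 7, 3]       # arbitrary vectors in R^3

# T(alpha x1 + beta x2)
left = T([alpha * a + beta * b for a, b in zip(x1, x2)])
# alpha T(x1) + beta T(x2)
right = [alpha * a + beta * b for a, b in zip(T(x1), T(x2))]
```

Both `left` and `right` come out to the same vector in $\mathbb{R}^2$, as the definition requires.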
Definition 2.2.3 A linear transformation is called one to one (often written as $1 - 1$)
if it never takes two different vectors to the same vector. Thus T is one to one if
whenever $x \neq y$
\[ Tx \neq Ty. \]
Equivalently, if $T(x) = T(y)$, then $x = y$.
In the case that a linear transformation comes from matrix multiplication, it is common usage to refer to the matrix as a one to one matrix when the linear transformation
it determines is one to one.
2.3
It turns out that if T is any linear transformation which maps $\mathbb{R}^n$ to $\mathbb{R}^m$, there is always
an $m \times n$ matrix A with the property that
\[ Ax = Tx \qquad (2.18) \]
for all $x \in \mathbb{R}^n$. Here is why. If $x \in \mathbb{R}^n$, then
\[ x = \sum_{i=1}^{n} x_i e_i \]
where $e_i$ is the vector which has zeros in every slot but the $i$th and a
1 in this slot. Then since T is linear,
\[ Tx = \sum_{i=1}^{n} x_i T (e_i) = \begin{pmatrix} | & & | \\ T (e_1) & \cdots & T (e_n) \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \]
and so you see that the matrix desired is obtained from letting the $i$th column equal
$T (e_i)$. This yields the following theorem.
Theorem 2.3.1 Let T be a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. Then the matrix A
satisfying 2.18 is given by
\[ \begin{pmatrix} | & & | \\ T (e_1) & \cdots & T (e_n) \\ | & & | \end{pmatrix} \]
where $T e_i$ is the $i$th column of A.
Sometimes you need to find a matrix which represents a given linear transformation
which is described in geometrical terms. The idea is to produce a matrix which you can
multiply a vector by to get the same thing as some geometrical description. A good
example of this is the problem of rotation of vectors.
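Example 2.3.2 Determine the matrix which represents the linear transformation defined by rotating every vector through an angle of $\theta$.
Let $e_1 \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $e_2 \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. These identify the geometric vectors which point
along the positive x axis and positive y axis as shown.
[Figure: the vectors $e_1$ and $e_2$ along the positive x and y axes]
From the above, you only need to find $T e_1$ and $T e_2$, the first being the first column
of the desired matrix A and the second being the second column. From drawing a
picture and doing a little geometry, you see that
\[ T e_1 = \begin{pmatrix} \cos \theta \\ \sin \theta \end{pmatrix},\quad T e_2 = \begin{pmatrix} -\sin \theta \\ \cos \theta \end{pmatrix}. \]
Therefore, from Theorem 2.3.1,
\[ A = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. \]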
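Example 2.3.3 Find the matrix of the linear transformation which is obtained by first
rotating all vectors through an angle of $\phi$ and then through an angle $\theta$. Thus you want
the linear transformation which rotates all vectors through an angle of $\theta + \phi$.
Let $T_{\theta + \phi}$ denote the linear transformation which rotates every vector through an
angle of $\theta + \phi$. Then to get $T_{\theta + \phi}$, you could first do $T_\phi$ and then do $T_\theta$ where $T_\phi$ is the
linear transformation which rotates through an angle of $\phi$ and $T_\theta$ is the linear transformation which rotates through an angle of $\theta$. Denoting the corresponding matrices by
$A_{\theta + \phi}$, $A_\phi$, and $A_\theta$, you must have for every x
\[ A_{\theta + \phi} x = T_{\theta + \phi} x = T_\theta T_\phi x = A_\theta A_\phi x. \]
Consequently, you must have
\[ A_{\theta + \phi} = \begin{pmatrix} \cos (\theta + \phi) & -\sin (\theta + \phi) \\ \sin (\theta + \phi) & \cos (\theta + \phi) \end{pmatrix} = A_\theta A_\phi = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} \begin{pmatrix} \cos \phi & -\sin \phi \\ \sin \phi & \cos \phi \end{pmatrix}. \]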
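You know how to multiply matrices. Do so to the pair on the right. This yields
\[ \begin{pmatrix} \cos (\theta + \phi) & -\sin (\theta + \phi) \\ \sin (\theta + \phi) & \cos (\theta + \phi) \end{pmatrix} = \begin{pmatrix} \cos \theta \cos \phi - \sin \theta \sin \phi & -\cos \theta \sin \phi - \sin \theta \cos \phi \\ \sin \theta \cos \phi + \cos \theta \sin \phi & \cos \theta \cos \phi - \sin \theta \sin \phi \end{pmatrix}. \]
Don't these look familiar? They are the usual trig. identities for the sum of two angles
derived here using linear algebra concepts.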
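The statement that rotating through $\phi$ and then through $\theta$ is the same as rotating through $\theta + \phi$ can be checked numerically. The Python sketch below is a floating point check for one pair of angles, chosen arbitrarily here; `rotation` and `matmul` are helper names assumed for this illustration.

```python
import math

def rotation(angle):
    """Matrix of the counterclockwise rotation through the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

theta, phi = 0.7, 0.4                                  # arbitrary angles
composed = matmul(rotation(theta), rotation(phi))      # A_theta A_phi
direct = rotation(theta + phi)                         # A_(theta + phi)

# Floating point arithmetic is inexact, so compare up to a small tolerance.
agree = all(math.isclose(composed[i][j], direct[i][j], abs_tol=1e-12)
            for i in range(2) for j in range(2))
```

Comparing the entries of `composed` and `direct` is exactly the comparison that produces the angle sum identities.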
You do not have to stop with two dimensions. You can consider rotations and other
geometric concepts in any number of dimensions. This is one of the major advantages
of linear algebra. You can break down a difficult geometrical procedure into small steps,
each corresponding to multiplication by an appropriate matrix. Then by multiplying the
matrices, you can obtain a single matrix which can give you numerical information on
the results of applying the given sequence of simple procedures. That which you could
never visualize can still be understood to the extent of finding exact numerical answers.
The following is a more routine example quite typical of what will be important in the
calculus of several variables.
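Example 2.3.4 Let $T (x_1, x_2) = \begin{pmatrix} x_1 + 3x_2 \\ x_1 - x_2 \\ x_1 \\ 3x_2 + 5x_1 \end{pmatrix}$. Thus $T : \mathbb{R}^2 \to \mathbb{R}^4$. Explain why
T is a linear transformation and write $T (x_1, x_2)$ in the form $A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$ where A is an
appropriate matrix.
From the definition of matrix multiplication,
\[ T (x_1, x_2) = \begin{pmatrix} 1 & 3 \\ 1 & -1 \\ 1 & 0 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \]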
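Theorem 2.3.1 gives a recipe that is easy to automate: apply T to each $e_i$ and use the results as columns. The following Python sketch does this for the map of Example 2.3.4, taking its second component to be $x_1 - x_2$; the function name `matrix_of` and the test vector are assumptions made for the illustration.

```python
def T(x):
    """The map of Example 2.3.4, with second component x1 - x2."""
    x1, x2 = x
    return [x1 + 3 * x2, x1 - x2, x1, 3 * x2 + 5 * x1]

def matrix_of(transformation, n):
    """Build the matrix whose i-th column is transformation(e_i)."""
    columns = []
    for i in range(n):
        e_i = [1 if k == i else 0 for k in range(n)]   # standard basis vector
        columns.append(transformation(e_i))
    return [list(row) for row in zip(*columns)]        # columns -> list of rows

A = matrix_of(T, 2)

# Check Ax = Tx on one arbitrary vector.
x = [2, 5]
Ax = [sum(a * c for a, c in zip(row, x)) for row in A]
```

The matrix `A` built this way has the rows (1, 3), (1, -1), (1, 0), (5, 3), and `Ax` equals `T(x)` for the vector tried.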
2.4
Exercises
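1. Here are some matrices:
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 7 \end{pmatrix},\ B = \begin{pmatrix} 3 & 1 & 2 \\ 3 & 2 & 1 \end{pmatrix},\ C = \begin{pmatrix} 1 & 2 \\ 3 & 1 \end{pmatrix},\ D = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix},\ E = \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \]
Find if possible $3A$, $3B - A$, $AC$, $CB$, $AE$, $EA$. If it is not possible explain why.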
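2. Here are some matrices:
\[ A = \begin{pmatrix} 1 & 2 \\ 3 & 2 \\ 1 & 1 \end{pmatrix},\ B = \begin{pmatrix} 2 & 5 & 2 \\ 3 & 2 & 1 \end{pmatrix},\ C = \begin{pmatrix} 1 & 2 \\ 5 & 0 \end{pmatrix},\ D = \begin{pmatrix} 1 & 1 \\ 4 & 3 \end{pmatrix},\ E = \begin{pmatrix} 1 \\ 3 \end{pmatrix}. \]
Find if possible $3A$, $3B - A$, $AC$, $CA$, $AE$, $EA$, $BE$, $DE$. If it is not possible explain why.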
1 2
2 5 2
3 2 ,B =
,
3 2 1
1 1
1 2
1 1
1
,D =
,E =
.
5 0
4 3
3
A
C
1 2
2 5 2
3 2
=
,B =
,
3 2 1
1 1
1 2
1
1
=
,D =
,E =
.
5 0
4
3
Find the following if possible and explain why it is not possible if this is the case.
$AD$, $DA$, $D^T B$, $D^T BE$, $E^T D$, $DE^T$.
1
1
1
1 3
1
1
2
0 .
5. Let A = 2 1 , B =
, and C = 1 2
2 1 2
1
2
3 1 0
Find if possible.
(a) AB
(b) BA
(c) AC
(d) CA
(e) CB
(f) BC
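6. Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 2 \\ 3 & k \end{pmatrix}$. Is it possible to choose k such that $AB =
BA$? If so, what should k equal?
7. Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 2 \\ 1 & k \end{pmatrix}$. Is it possible to choose k such that $AB =
BA$? If so, what should k equal?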
x1 + 4x2
x2 + 2x1
x1
transformation and write T (x1 , x2 ) in the form A
where A is an approx2
priate matrix.
x1 x2
x1
2
4
x1
transformation and write T (x1 , x2 ) in the form A
where A is an approx2
priate matrix.
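15. Let $T (x_1, x_2, x_3, x_4) = \begin{pmatrix} x_1 - x_2 + 2x_3 \\ 2x_3 + x_1 \\ 3x_3 \\ 3x_4 + 3x_2 + x_1 \end{pmatrix}$. Thus $T : \mathbb{R}^4 \to \mathbb{R}^4$. Explain why T
is a linear transformation and write $T (x_1, x_2, x_3, x_4)$ in the form $A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ where A is an
appropriate matrix.
16. Let $T (x_1, x_2) = \begin{pmatrix} x_1^2 + 4x_2 \\ x_2 + 2x_1 \end{pmatrix}$. Thus $T : \mathbb{R}^2 \to \mathbb{R}^2$. Explain why T cannot
possibly be a linear transformation.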
13. Let T (x1 , x2 ) =
17. Suppose A and B are square matrices of the same size. Which of the following
are correct?
(a) $(A - B)^2 = A^2 - 2AB + B^2$
(b) $(AB)^2 = A^2 B^2$
(c) $(A + B)^2 = A^2 + 2AB + B^2$
(d) $(A + B)^2 = A^2 + AB + BA + B^2$
(e) $A^2 B^2 = A (AB) B$
18. Let $A = \begin{pmatrix} 1 & 1 \\ 3 & 3 \end{pmatrix}$. Find all $2 \times 2$ matrices B such that $AB = 0$.
3
19. In 2.1 - 2.8 describe A and 0.
20. Let A be an n n matrix. Show A equals the sum
and a skew
of a symmetric
symmetric matrix. Hint: Consider the matrix 12 A + AT . Is this matrix symmetric?
21. If A is a skew symmetric matrix, what can be concluded about An where n =
1, 2, 3, ?
22. Show every skew symmetric matrix has all zeros down the main diagonal. The
main diagonal consists of every entry of the matrix which is of the form aii . It
runs from the upper left down to the lower right.
A=
1 2
2 1
A=
1 0
2 3
A=
1 2
2 1
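$(AB)^{-1} = B^{-1} A^{-1}$.
41. Show that if A is an invertible $n \times n$ matrix, then so is $A^T$ and $\left( A^T \right)^{-1} = \left( A^{-1} \right)^T$.
42. Show that if A is an $n \times n$ invertible matrix and x is an $n \times 1$ matrix such that
$Ax = b$ for b an $n \times 1$ matrix, then $x = A^{-1} b$.
$(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$ by verifying that
$(ABC) \left( C^{-1} B^{-1} A^{-1} \right) = \left( C^{-1} B^{-1} A^{-1} \right) (ABC) = I$.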
2.5
1 2 1
A =
,B =
2 0 7
1 2
0
C =
,D =
3 1
2
1 2
,
2 1
2
2
,E =
.
3
1
0
3
Find if possible 3A, 3B A, AC, CB, AE, EA. If it is not possible explain why.
1 2 1
3A = (3)
=
2 0 7
1 2
0 1 2
3B A = 3
2 0
3 2 1
3 6
6 0
1
=
7
3
21
1
11
5 5
6 4
1 2
0 1 2
6 3 4
CB =
=
.
3 1
3 2 1
3 1 7
You can't multiply AE because it is of the form $(2 \times 3)(2 \times 1)$ and the inside
numbers don't match. EA also cannot be multiplied because it is of the form
$(2 \times 1)(2 \times 3)$.
2. Here are some matrices:
1 2
0 5 2
3 2 ,B =
,
3 1 1
1 1
1 2
1 1
1
,D =
,E =
.
3 1
4 2
1
Find if possible $3A^T$, $3B - A^T$, $AC$, $CA$, $AE$, $E^T B$, $EE^T$, $E^T E$. If it is not possible explain why.
1
3AT = 3 3
1
3B AT = 3
0
3
5 2
1 1
1
AC = 3
1
2
2
1
2
3
2
=
6
1
1 2
3 2
1 1
1
2
=
3 1
9
6
3
3
1
11
7 4
9 8
2 1
18 5
1
4
1 2
3
1
AE = 3 2
= 5
1
1 1
0
ET B =
1
1
0
3
5 2
1 1
4 3
ET E =
1
1
1
1
=2
EE T =
In this case you
4
3. Let A = 2
1
if possible.
1
1
1
1
T
=
1
1
1
1
have (2 1) (1 2) = 2 2.
1
1
1
0
2
0 , B =
, and C = 0
2 1 2
2
3
4 1
1
2 0
(a) AB =
2
1 2
1 0 2
(b) BA =
2 1 2
0
1
2
2
6 1
= 2 0
5 2
1
2
1
3
0 . Find
0
6
4
2
4 1
2 3
2 0 =
8 6
1 2
4 1
1
1 3
2
0 = (3 2) (3 3) = nonsense
(c) AC = 2 0 0
1 2
3 1 0
1
1 3
4 1
1 5
2
0 2 0 = 4
0
(d) CA = 0
3 1 0
1 2
10 3
1
1 3
1 0 2
2
0
(e) CB = 0
= (3 3) (2 3) = nonsense
2 1 2
3 1 0
1 2
1 2
,B =
. Is it possible to choose k such that AB =
3 4
0 k
BA? If so, what should k equal?
1 2
1 2
1 2 + 2k
1 2
1 2
AB =
=
while BA =
=
3 4
0 k
3 6 + 4k
0 k
3 4
7 10
If AB = BA, then from what was just shown, you would need to
3k 4k
have 1 = 7 and this is not true. Therefore, there is no way to choose k such that
these two matrices commute.
4. Let A =
50
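5. Write $\begin{pmatrix} x_1 - x_2 + 2x_3 \\ 2x_3 + x_1 \\ 3x_3 + x_1 + x_4 \\ 3x_4 + 3x_2 + x_1 \end{pmatrix}$ in the form $A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ where A is an appropriate
matrix.
\[ \begin{pmatrix} 1 & -1 & 2 & 0 \\ 1 & 0 & 2 & 0 \\ 1 & 0 & 3 & 1 \\ 1 & 3 & 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \]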
6. Suppose A and B are square matrices of the same size. Which of the following
are correct?
(a) $(A - B)^2 = A^2 - 2AB + B^2$
(b) $(AB)^2 = A^2 B^2$
(c) $(A + B)^2 = A^2 + 2AB + B^2$
(d) $(A + B)^2 = A^2 + AB + BA + B^2$
(e) $A^2 B^2 = A (AB) B$
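7. Let $A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}$. Find all $2 \times 2$ matrices B such that $AB = 0$.
You need a matrix $\begin{pmatrix} x & y \\ z & w \end{pmatrix}$ such that
\[ \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} x + z & y + w \\ 2x + 2z & 2y + 2w \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}. \]
Thus you need $x = -z$ and $y = -w$. It appears you can pick z and w at random
and any matrix of the form $\begin{pmatrix} -z & -w \\ z & w \end{pmatrix}$ will work.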
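8. Let
\[ A = \begin{pmatrix} 3 & 2 & 3 \\ 2 & 1 & 2 \\ 1 & 0 & 2 \end{pmatrix}. \]
\[ \begin{pmatrix} 3 & 2 & 3 \\ 2 & 1 & 2 \\ 1 & 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} -2 & 4 & -1 \\ 2 & -3 & 0 \\ 1 & -2 & 1 \end{pmatrix} \]
9. Let
\[ A = \begin{pmatrix} 0 & 0 & 3 \\ 2 & 4 & 4 \\ 1 & 0 & 1 \end{pmatrix}. \]
\[ \begin{pmatrix} 0 & 0 & 3 \\ 2 & 4 & 4 \\ 1 & 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} -\frac{1}{3} & 0 & 1 \\ -\frac{1}{6} & \frac{1}{4} & -\frac{1}{2} \\ \frac{1}{3} & 0 & 0 \end{pmatrix} \]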
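10. Let
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 3 & 7 \end{pmatrix}. \]
Observe that
\[ \begin{pmatrix} 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 3 & 7 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \end{pmatrix}. \]
If $A^{-1}$ existed then you could multiply on the right side in the above equation
and find
\[ \begin{pmatrix} 1 & 1 & -1 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \end{pmatrix} \]
which is not true.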
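Determinants
3.0.1
Outcomes
3.1
Basic Techniques And Properties
3.1.1
Cofactors And 2 × 2 Determinants
Definition 3.1.1 Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then
\[ \det (A) \equiv ad - cb. \]
The determinant is also often denoted by enclosing the matrix with two vertical lines.
Thus
\[ \det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \left| \begin{array}{cc} a & b \\ c & d \end{array} \right|. \]
For example, consider $\det \begin{pmatrix} 2 & 4 \\ -1 & 6 \end{pmatrix}$.
From the definition this is just $(2)(6) - (-1)(4) = 16$.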
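Having defined what is meant by the determinant of a $2 \times 2$ matrix, what about a
$3 \times 3$ matrix?
For a $3 \times 3$ matrix, the $ij$th minor is the determinant of the $2 \times 2$ matrix which results
when you delete the $i$th row and the $j$th column. For example, consider the matrix
\[ \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}. \]
The (1, 2) minor is the determinant of the $2 \times 2$ matrix which results when you delete
the first row and the second column. This minor is therefore
\[ \det \begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = -2. \]
The (2, 3) minor is the determinant of the $2 \times 2$ matrix which results when you delete
the second row and the third column. This minor is therefore
\[ \det \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = -4. \]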
Definition 3.1.5 Suppose A is a $3 \times 3$ matrix. The $ij$th cofactor is defined to be
$(-1)^{i+j} \times (ij\mbox{th minor})$. In words, you multiply $(-1)^{i+j}$ times the $ij$th minor to get
the $ij$th cofactor. The cofactors of a matrix are so important that special notation is
appropriate when referring to them. The $ij$th cofactor of a matrix A will be denoted
by $\mathrm{cof} (A)_{ij}$. It is also convenient to refer to the cofactor of an entry of a matrix as
follows. For $a_{ij}$ an entry of the matrix, its cofactor is just $\mathrm{cof} (A)_{ij}$. Thus the cofactor
of the $ij$th entry is just the $ij$th cofactor.
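Example 3.1.6 Consider the matrix,
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}. \]
The (1, 2) minor is the determinant of the $2 \times 2$ matrix which results when you delete
the first row and the second column. This minor is therefore
\[ \det \begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = -2. \]
It follows
\[ \mathrm{cof} (A)_{12} = (-1)^{1+2} \det \begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = (-1)^{1+2} (-2) = 2. \]
The (2, 3) minor is the determinant of the $2 \times 2$ matrix which results when you delete
the second row and the third column. This minor is therefore
\[ \det \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = -4. \]
Therefore,
\[ \mathrm{cof} (A)_{23} = (-1)^{2+3} \det \begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = (-1)^{2+3} (-4) = 4. \]
Similarly,
\[ \mathrm{cof} (A)_{22} = (-1)^{2+2} \det \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix} = -8. \]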
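The determinant of a $3 \times 3$ matrix is obtained by picking a row or a column, multiplying
each entry in it by its cofactor, and adding these numbers together. Consider again
\[ A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}. \]
Expanding along the first column,
\[ \det (A) = 1 \overbrace{(-1)^{1+1} \left| \begin{array}{cc} 3 & 2 \\ 2 & 1 \end{array} \right|}^{\mathrm{cof}(A)_{11}} + 4 \overbrace{(-1)^{2+1} \left| \begin{array}{cc} 2 & 3 \\ 2 & 1 \end{array} \right|}^{\mathrm{cof}(A)_{21}} + 3 \overbrace{(-1)^{3+1} \left| \begin{array}{cc} 2 & 3 \\ 3 & 2 \end{array} \right|}^{\mathrm{cof}(A)_{31}} = 0. \]
You see, I just followed the rule stated above. I took the 1 in the first column
and multiplied it by its cofactor, the 4 in the first column and multiplied it by its
cofactor, and the 3 in the first column and multiplied it by its cofactor. Then I added
these numbers together.
You could also expand the determinant along the second row as follows.
\[ 4 \overbrace{(-1)^{2+1} \left| \begin{array}{cc} 2 & 3 \\ 2 & 1 \end{array} \right|}^{\mathrm{cof}(A)_{21}} + 3 \overbrace{(-1)^{2+2} \left| \begin{array}{cc} 1 & 3 \\ 3 & 1 \end{array} \right|}^{\mathrm{cof}(A)_{22}} + 2 \overbrace{(-1)^{2+3} \left| \begin{array}{cc} 1 & 2 \\ 3 & 2 \end{array} \right|}^{\mathrm{cof}(A)_{23}} = 0. \]
Observe this gives the same number. You should try expanding along other rows and
columns. If you don't make any mistakes, you will always get the same answer.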
What about a $4 \times 4$ matrix? You know now how to find the determinant of a $3 \times 3$
matrix. The pattern is the same.
Definition 3.1.9 Suppose A is a $4 \times 4$ matrix. The $ij$th minor is the determinant
of the $3 \times 3$ matrix you obtain when you delete the $i$th row and the $j$th column. The
$ij$th cofactor, $\mathrm{cof} (A)_{ij}$, is defined to be $(-1)^{i+j} \times (ij\mbox{th minor})$. In words, you multiply
$(-1)^{i+j}$ times the $ij$th minor to get the $ij$th cofactor.
For example, let
\[ A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 4 & 2 & 3 \\ 1 & 3 & 4 & 5 \\ 3 & 4 & 3 & 2 \end{pmatrix}. \]
As in the case of a $3 \times 3$ matrix, you can expand this along any row or column. Let's
pick the third column.
\[ \det (A) = 3 (-1)^{1+3} \left| \begin{array}{ccc} 5 & 4 & 3 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{array} \right| + 2 (-1)^{2+3} \left| \begin{array}{ccc} 1 & 2 & 4 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{array} \right| + 4 (-1)^{3+3} \left| \begin{array}{ccc} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 3 & 4 & 2 \end{array} \right| + 3 (-1)^{4+3} \left| \begin{array}{ccc} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 1 & 3 & 5 \end{array} \right| \]
Now you know how to expand each of these $3 \times 3$ matrices along a row or a column.
If you do so, you will get $-12$ assuming you make no mistakes. You could expand this
matrix along any row or any column and assuming you make no mistakes, you will
always get the same thing which is defined to be the determinant of the matrix A. This
method of evaluating a determinant by expanding along a row or a column is called the
method of Laplace expansion.
Note that each of the four terms above involves three terms consisting of determinants of $2 \times 2$ matrices and each of these will need 2 terms. Therefore, there will be
$4 \times 3 \times 2 = 24$ terms to evaluate in order to find the determinant using the method of
Laplace expansion. Suppose now you have a $10 \times 10$ matrix and you follow the above
pattern for evaluating determinants. By analogy to the above, there will be $10! =
3,628,800$ terms involved in the evaluation of such a determinant by Laplace expansion
along a row or column. This is a lot of terms.
In addition to the difficulties just discussed, you should regard the above claim
that you always get the same answer by picking any row or column with considerable
skepticism. It is incredible and not at all obvious. However, it requires a little effort
to establish it. This is done in the section on the theory of the determinant. The above
examples motivate the following definitions, the second of which is incredible.
Definition 3.1.12 Let $A = (a_{ij})$ be an $n \times n$ matrix and suppose the determinant of
an $(n - 1) \times (n - 1)$ matrix has been defined. Then a new matrix called the cofactor
matrix, $\mathrm{cof} (A)$, is defined by $\mathrm{cof} (A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i$th row and
the $j$th column of A, take the determinant of the $(n - 1) \times (n - 1)$ matrix which results
(this is called the $ij$th minor of A), and then multiply this number by $(-1)^{i+j}$. Thus
$(-1)^{i+j} \times (\mbox{the } ij\mbox{th minor})$ equals the $ij$th cofactor. To make the formulas easier to
remember, $\mathrm{cof} (A)_{ij}$ will denote the $ij$th entry of the cofactor matrix.
With this definition of the cofactor matrix, here is how to define the determinant of
an $n \times n$ matrix.
Definition 3.1.13 Let A be an $n \times n$ matrix where $n \geq 2$ and suppose the determinant
of an $(n - 1) \times (n - 1)$ matrix has been defined. Then
\[ \det (A) = \sum_{j=1}^{n} a_{ij} \, \mathrm{cof} (A)_{ij} = \sum_{i=1}^{n} a_{ij} \, \mathrm{cof} (A)_{ij}. \qquad (3.1) \]
The first formula consists of expanding the determinant along the $i$th row and the second
expands the determinant along the $j$th column. This is called the method of Laplace
expansion.
Theorem 3.1.14 Expanding the $n \times n$ matrix along any row or column always gives
the same answer so the above definition is a good definition.
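The recursive nature of Laplace expansion, where an $n \times n$ determinant is defined through $(n-1) \times (n-1)$ determinants, can be written out directly. The following Python sketch always expands along the first row; that choice, the helper name `submatrix`, and taking the determinant of a $1 \times 1$ matrix to be its single entry are assumptions of this illustration.

```python
def submatrix(A, i, j):
    """Delete row i and column j (counted from 0) of the square matrix A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]

def det(A):
    """Determinant by Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    # Entry times cofactor; with 0-based column index j the sign is (-1)**j.
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(len(A)))

A3 = [[1, 2, 3], [4, 3, 2], [3, 2, 1]]
A4 = [[1, 2, 3, 4], [5, 4, 2, 3], [1, 3, 4, 5], [3, 4, 3, 2]]
det3 = det(A3)      # 0
det4 = det(A4)      # -12
```

As the count of terms above suggests, this approach does on the order of $n!$ multiplications, so it is only practical for small matrices.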
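3.1.2
The Determinant Of A Triangular Matrix
A matrix M is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix equals
zero below the main diagonal, as shown.
\[ \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix} \]
A lower triangular matrix is defined similarly as a matrix for which all entries above
the main diagonal are equal to zero.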
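You should verify the following using the above theorem on Laplace expansion.
Corollary 3.1.16 Let M be an upper (lower) triangular matrix. Then $\det (M)$ is obtained by taking the product of the entries on the main diagonal.
Example 3.1.17 Let
\[ A = \begin{pmatrix} 1 & 2 & 3 & 77 \\ 0 & 2 & 6 & 7 \\ 0 & 0 & 3 & 33.7 \\ 0 & 0 & 0 & -1 \end{pmatrix}. \]
Find $\det (A)$. Expanding along the first column gives
\[ 1 \left| \begin{array}{ccc} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{array} \right| + 0 (-1)^{2+1} \left| \begin{array}{ccc} 2 & 3 & 77 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{array} \right| + 0 (-1)^{3+1} \left| \begin{array}{ccc} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 0 & -1 \end{array} \right| + 0 (-1)^{4+1} \left| \begin{array}{ccc} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 3 & 33.7 \end{array} \right| \]
and the only nonzero term in the expansion is
\[ 1 \left| \begin{array}{ccc} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{array} \right|. \]
Now expand this along the first column to obtain
\[ 1 \times \left( 2 \times \left| \begin{array}{cc} 3 & 33.7 \\ 0 & -1 \end{array} \right| + 0 (-1)^{2+1} \left| \begin{array}{cc} 6 & 7 \\ 0 & -1 \end{array} \right| + 0 (-1)^{3+1} \left| \begin{array}{cc} 6 & 7 \\ 3 & 33.7 \end{array} \right| \right) = 1 \times 2 \times \left| \begin{array}{cc} 3 & 33.7 \\ 0 & -1 \end{array} \right|. \]
Next expand this last determinant along the first column to obtain the above equals
\[ 1 \times 2 \times 3 \times (-1) = -6 \]
which is just the product of the entries down the main diagonal of the original matrix.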
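The shortcut for triangular matrices can be compared with a full expansion. The Python sketch below assumes the upper triangular matrix of the example above with $-1$ as its last diagonal entry; the names `det_triangular` and `det_first_column` are illustrative, and because one entry is the decimal 33.7 the comparison is made up to a small floating point tolerance.

```python
from math import isclose, prod

def det_triangular(M):
    """Product of the main diagonal entries; valid only for triangular M."""
    return prod(M[i][i] for i in range(len(M)))

def det_first_column(A):
    """Laplace expansion along the first column, for comparison."""
    if len(A) == 1:
        return A[0][0]
    total = 0
    for i in range(len(A)):
        minor = [row[1:] for r, row in enumerate(A) if r != i]
        total += (-1) ** i * A[i][0] * det_first_column(minor)
    return total

A = [[1, 2, 3, 77],
     [0, 2, 6, 7],
     [0, 0, 3, 33.7],
     [0, 0, 0, -1]]

shortcut = det_triangular(A)           # 1 * 2 * 3 * (-1)
expanded = det_first_column(A)
same = isclose(shortcut, expanded, abs_tol=1e-9)
```

The entries above the diagonal, including 77 and 33.7, never affect the result, which is the content of the corollary.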
3.1.3
Properties Of Determinants
There are many properties satisfied by determinants. Some of these properties have to
do with row operations which are described below.
Definition 3.1.18 The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by a multiple of another row added to itself.
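Theorem 3.1.19 Let A be an $n \times n$ matrix and let $A_1$ be a matrix which results from
multiplying some row of A by a scalar c. Then $c \det (A) = \det (A_1)$.
Example 3.1.20 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $A_1 = \begin{pmatrix} 2 & 4 \\ 3 & 4 \end{pmatrix}$. Then $\det (A) = -2$ and $\det (A_1) = -4$.
Theorem 3.1.21 Let A be an $n \times n$ matrix and let $A_1$ be a matrix which results from
switching two rows of A. Then $\det (A) = -\det (A_1)$. Also, if one row of A is a multiple
of another row of A, then $\det (A) = 0$.
Example 3.1.22 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and let $A_1 = \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}$. Then $\det (A) = -2$ and $\det (A_1) = 2$.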
Theorem 3.1.23 Let A be an n n matrix and let A1 be a matrix which results from
applying row operation 3. That is you replace some row by a multiple of another row
added to itself. Then det (A) = det (A1 ).
1 2
1 2
Example 3.1.24 Let A =
and let A1 =
. Thus the second row of
3 4
4 6
A1 is one times the first row added to the second row. det (A) = 2 and det (A1 ) = 2.
Theorem 3.1.25 In Theorems 3.1.19 - 3.1.23 you can replace the word "row" with
the word "column".
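The effect of each row operation on the determinant can be checked on the $2 \times 2$ matrix used in the examples. This is only an illustrative sketch; the helper `det2` and the variable names are assumptions made here, not notation from the notes.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as a list of two rows."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


A = [[1, 2], [3, 4]]

# Row operation 2: multiply the first row by the scalar c = 2.
scaled = [[2 * a for a in A[0]], A[1]]

# Row operation 1: switch the two rows.
swapped = [A[1], A[0]]

# Row operation 3: add one times the first row to the second row.
added = [A[0], [a + b for a, b in zip(A[0], A[1])]]

print(det2(A), det2(scaled), det2(swapped), det2(added))
```

Scaling a row scales the determinant, a switch changes its sign, and adding a multiple of one row to another leaves it unchanged.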
There are two other major properties of determinants which do not involve row
operations.
Theorem 3.1.26 Let A and B be two $n \times n$ matrices. Then
$$\det(AB) = \det(A)\det(B).$$
Also,
$$\det(A) = \det\left(A^T\right).$$
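Example 3.1.27 Compare $\det(AB)$ and $\det(A)\det(B)$ for
$$A = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix}.$$
First
$$AB = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 11 & 4 \\ -1 & -4 \end{pmatrix}$$
and so
$$\det(AB) = \det \begin{pmatrix} 11 & 4 \\ -1 & -4 \end{pmatrix} = -40.$$
Now
$$\det(A) = \det \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix} = 8$$
and
$$\det(B) = \det \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix} = -5.$$
Thus $\det(A)\det(B) = 8 \times (-5) = -40$.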
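The product rule and the transpose rule of Theorem 3.1.26 can be verified on these matrices with a few lines of code. The sketch below assumes the entry in the lower left corner of A is -3; the helper names are illustrative only.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


A = [[1, 2], [-3, 2]]
B = [[3, 2], [4, 1]]
AB = matmul(A, B)

print(AB, det2(AB), det2(A) * det2(B), det2(transpose(A)))
```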
3.1.4 Finding Determinants Using Row Operations
Theorems 3.1.23 - 3.1.25 can be used to find determinants using row operations. As
pointed out above, the method of Laplace expansion will not be practical for any matrix
of large size. Here is an example in which all the row operations are used.
Example 3.1.28 Find the determinant of the matrix,
$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 1 & 2 & 3 \\ 4 & 5 & 4 & 3 \\ 2 & 2 & -4 & 5 \end{pmatrix}$$
Replace the second row by $(-5)$ times the first row added to it. Then replace the
third row by $(-4)$ times the first row added to it. Finally, replace the fourth row by
$(-2)$ times the first row added to it. This yields the matrix,
$$B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -9 & -13 & -17 \\ 0 & -3 & -8 & -13 \\ 0 & -2 & -10 & -3 \end{pmatrix}$$
and from Theorem 3.1.23, it has the same determinant as A. Now using other row operations, $\det(B) = \left(-\frac{1}{3}\right)\det(C)$ where
$$C = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 11 & 22 \\ 0 & -3 & -8 & -13 \\ 0 & 6 & 30 & 9 \end{pmatrix}.$$
The second row was replaced by $(-3)$ times the third row added to the second row. By
Theorem 3.1.23 this didn't change the value of the determinant. Then the last row was
multiplied by $(-3)$. By Theorem 3.1.19 the resulting matrix has a determinant which is
$(-3)$ times the determinant of the unmultiplied matrix. Therefore, I multiplied by $-1/3$
to retain the correct value. Now replace the last row with 2 times the third added to it.
This does not change the value of the determinant by Theorem 3.1.23. Finally switch
the third and second rows. This causes the determinant to be multiplied by $(-1)$. Thus
$\det(C) = -\det(D)$ where
$$D = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -3 & -8 & -13 \\ 0 & 0 & 11 & 22 \\ 0 & 0 & 14 & -17 \end{pmatrix}$$
You could do more row operations or you could note that this can be easily expanded
along the first column followed by expanding the $3 \times 3$ matrix which results along its
first column. Thus
$$\det(D) = 1\,(-3) \begin{vmatrix} 11 & 22 \\ 14 & -17 \end{vmatrix} = 1485$$
and so $\det(C) = -1485$ and $\det(A) = \det(B) = \left(-\frac{1}{3}\right)(-1485) = 495$.
Example 3.1.29 Find the determinant of the matrix
$$\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & -3 & 2 & 1 \\ 2 & 1 & 2 & 5 \\ 3 & -4 & 1 & 2 \end{pmatrix}$$
Replace the second row by $(-1)$ times the first row added to it. Next take $-2$ times
the first row and add to the third and finally take $-3$ times the first row and add to the
last row. This yields
$$\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -1 & -1 \\ 0 & -3 & -4 & 1 \\ 0 & -10 & -8 & -4 \end{pmatrix}.$$
By Theorem 3.1.23 this matrix has the same determinant as the original matrix. Remember you can work with the columns also. Take $-5$ times the last column and add
to the second column. This yields
$$\begin{pmatrix} 1 & -8 & 3 & 2 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{pmatrix}$$
By Theorem 3.1.25 this matrix has the same determinant as the original matrix. Now
take $(-1)$ times the third row and add to the top row. This gives
$$\begin{pmatrix} 1 & 0 & 7 & 1 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{pmatrix}$$
which by Theorem 3.1.23 has the same determinant as the original matrix. Let's expand
it now along the first column. This yields the following for the determinant of the
original matrix.
$$\det \begin{pmatrix} 0 & -1 & -1 \\ -8 & -4 & 1 \\ 10 & -8 & -4 \end{pmatrix}$$
which equals
$$8 \det \begin{pmatrix} -1 & -1 \\ -8 & -4 \end{pmatrix} + 10 \det \begin{pmatrix} -1 & -1 \\ -4 & 1 \end{pmatrix} = -82$$
Do not try to be fancy in using row operations. That is, stick mostly to the one
which replaces a row or column with a multiple of another row or column added to it.
Also note there is no way to check your answer other than working the problem more
than one way. To be sure you have gotten it right you must do this.
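One way to work the problem a second way is to let a short program carry out the row operations. The following is a rough sketch under stated assumptions: the function name `det_by_row_ops` is invented for illustration, exact rational arithmetic comes from Python's `fractions` module, and the two matrices are those of Examples 3.1.28 and 3.1.29 with the signs as reconstructed above. Only row switches (which flip the sign) and adding multiples of one row to another (which change nothing) are used, so the determinant is the signed product of the pivots.

```python
from fractions import Fraction


def det_by_row_ops(rows):
    """Reduce to upper triangular form and multiply the diagonal entries."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available, so the determinant is zero
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # switching two rows changes the sign
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            # Adding a multiple of another row does not change the determinant.
            m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
        det *= m[col][col]
    return det


A28 = [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 4, 3], [2, 2, -4, 5]]
A29 = [[1, 2, 3, 2], [1, -3, 2, 1], [2, 1, 2, 5], [3, -4, 1, 2]]

print(det_by_row_ops(A28), det_by_row_ops(A29))
```

This takes on the order of n^3 arithmetic operations, which is why row operations remain practical where Laplace expansion is not.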
3.2 Applications
3.2.1 A Formula For The Inverse
The definition of the determinant in terms of Laplace expansion along a row or column
also provides a way to give a formula for the inverse of a matrix. Recall the definition of
the inverse of a matrix in Definition 2.1.28 on Page 39. Also recall the definition of the
cofactor matrix given in Definition 3.1.12 on Page 56. This cofactor matrix was just the
matrix which results from replacing the ij-th entry of the matrix with the ij-th cofactor.
The following theorem says that to find the inverse, take the transpose of the cofactor
matrix and divide by the determinant. The transpose of the cofactor matrix is called
the adjugate or sometimes the classical adjoint of the matrix A. In other words,
$A^{-1}$ is equal to one divided by the determinant of A times the adjugate matrix of A.
This is what the following theorem says with more precision.
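Theorem 3.2.1 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = \left(a^{-1}_{ij}\right)$ where
$$a^{-1}_{ij} = \det(A)^{-1} \operatorname{cof}(A)_{ji}$$
for $\operatorname{cof}(A)_{ij}$ the ij-th cofactor of A.
Example 3.2.2 Find the inverse of the matrix,
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{pmatrix}$$
First find the determinant of this matrix. Using Theorems 3.1.23 - 3.1.25 on Page
58, the determinant of this matrix equals the determinant of the matrix,
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & -6 & -8 \\ 0 & 0 & -2 \end{pmatrix}$$
which equals 12. The cofactor matrix of A is
$$\begin{pmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{pmatrix}.$$
Each entry of A was replaced by its cofactor. Therefore, from the above theorem, the
inverse of A should equal
$$\frac{1}{12} \begin{pmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{pmatrix}^T = \begin{pmatrix} -\frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ -\frac{1}{6} & -\frac{1}{6} & \frac{2}{3} \\ \frac{1}{2} & 0 & -\frac{1}{2} \end{pmatrix}.$$
To check, multiply the matrices:
$$\begin{pmatrix} -\frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ -\frac{1}{6} & -\frac{1}{6} & \frac{2}{3} \\ \frac{1}{2} & 0 & -\frac{1}{2} \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
and so it is correct.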
Example 3.2.3 Find the inverse of the matrix,
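$$A = \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ -\frac{1}{6} & \frac{1}{3} & -\frac{1}{2} \\ -\frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \end{pmatrix}$$
First find its determinant. This determinant is $\frac{1}{6}$. The inverse is therefore equal to 6 times the transpose of the cofactor matrix. Expanding all the $2 \times 2$ determinants in the cofactor matrix, this yields
$$6 \begin{pmatrix} \frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{6} & -\frac{1}{3} \\ -\frac{1}{6} & \frac{1}{6} & \frac{1}{6} \end{pmatrix}^T = \begin{pmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{pmatrix}.$$
Always check your work.
$$\begin{pmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ -\frac{1}{6} & \frac{1}{3} & -\frac{1}{2} \\ -\frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$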
and so it is correct. If the result of multiplying these matrices had been something other
than the identity matrix, you would know there was an error. When this happens, you
need to search for the mistake if you are interested in getting the right answer. A
common mistake is to forget to take the transpose of the cofactor matrix.
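The recipe "transpose the cofactor matrix and divide by the determinant" is short enough to write out as code, which also makes the transpose step hard to forget. What follows is a hedged sketch rather than a recommended numerical method: the names `minor`, `det` and `inverse_by_cofactors` are invented here, and the two test matrices are the ones from the examples above with the signs as reconstructed.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 0:
        return 1  # convention for the empty matrix, so 1 x 1 cofactors work
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def inverse_by_cofactors(m):
    """Inverse as the transposed cofactor matrix divided by the determinant."""
    n = len(m)
    d = det(m)
    if d == 0:
        raise ValueError("determinant is zero, so no inverse exists")
    cof = [[(-1) ** (i + j) * det(minor(m, i, j)) for j in range(n)]
           for i in range(n)]
    # Entry (i, j) of the inverse uses cofactor (j, i): this is the transpose.
    return [[Fraction(cof[j][i]) / d for j in range(n)] for i in range(n)]


F = Fraction
A1 = [[1, 2, 3], [3, 0, 1], [1, 2, 1]]
A2 = [[F(1, 2), 0, F(1, 2)],
      [F(-1, 6), F(1, 3), F(-1, 2)],
      [F(-5, 6), F(2, 3), F(-1, 2)]]

print(inverse_by_cofactors(A1))
print(inverse_by_cofactors(A2))
```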
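Proof of Theorem 3.2.1: From the definition of the determinant in terms of expansion along a column, and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,
$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A)\det(A)^{-1} = 1.$$
Now consider
$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}$$
when $k \neq r$. Replace the k-th column with the r-th column to obtain a matrix, $B_k$ whose
determinant equals zero by Theorem 3.1.21. However, expanding this matrix, $B_k$ along
the k-th column yields
$$0 = \det(B_k)\det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}.$$
Summarizing,
$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk} \equiv \begin{cases} 1 & \text{if } r = k \\ 0 & \text{if } r \neq k \end{cases}.$$
Now
$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)^T_{ki}$$
which is the kr-th entry of $\operatorname{cof}(A)^T A$. Therefore,
$$\frac{\operatorname{cof}(A)^T}{\det(A)} A = I. \qquad (3.2)$$
Using expansion along a row and similar reasoning,
$$\sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk}.$$
Now
$$\sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} = \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)^T_{jk}$$
which is the rk-th entry of $A \operatorname{cof}(A)^T$. Therefore,
$$A \frac{\operatorname{cof}(A)^T}{\det(A)} = I, \qquad (3.3)$$
and it follows from (3.2) and (3.3) that $A^{-1} = \left(a^{-1}_{ij}\right)$, where
$$a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}.$$
In other words,
$$A^{-1} = \frac{\operatorname{cof}(A)^T}{\det(A)}.$$
Now suppose $A^{-1}$ exists. Then by Theorem 3.1.26, $1 = \det(I) = \det\left(AA^{-1}\right) = \det(A)\det\left(A^{-1}\right)$, so $\det(A) \neq 0$.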
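Example 3.2.4 Suppose
$$A(t) = \begin{pmatrix} e^t & 0 & 0 \\ 0 & \cos t & \sin t \\ 0 & -\sin t & \cos t \end{pmatrix}$$
Show that $A(t)^{-1}$ exists and then find it.
First note $\det(A(t)) = e^t \neq 0$ so $A(t)^{-1}$ exists. The cofactor matrix is
$$C(t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix}$$
and so the inverse is
$$\frac{1}{e^t} \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix}^T = \begin{pmatrix} e^{-t} & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}.$$
3.2.2 Cramer's Rule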
This formula for the inverse also implies a famous procedure known as Cramer's rule.
Cramer's rule gives a formula for the solutions, x, to a system of equations, Ax = y in
the special case that A is a square matrix. Note this rule does not apply if you have a
system of equations in which there is a different number of equations than variables.
In case you are solving a system of equations, Ax = y for x, it follows that if $A^{-1}$
exists,
$$x = \left(A^{-1}A\right)x = A^{-1}(Ax) = A^{-1}y$$
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$
given above. Using this formula,
$$x_i = \sum_{j=1}^{n} a^{-1}_{ij} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji} y_j.$$
By the formula for the expansion of a determinant along a column,
$$x_i = \frac{1}{\det(A)} \det \begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},$$
where here the i-th column of A is replaced with the column vector, $(y_1, \cdots, y_n)^T$, and
the determinant of this modified matrix is taken and divided by $\det(A)$. This formula
is known as Cramer's rule.
Procedure 3.2.5 Suppose A is an $n \times n$ matrix and it is desired to solve the system Ax = y, $y = (y_1, \cdots, y_n)^T$ for $x = (x_1, \cdots, x_n)^T$. Then Cramer's rule says
$$x_i = \frac{\det A_i}{\det A}$$
where $A_i$ is obtained from A by replacing the i-th column of A with the column $(y_1, \cdots, y_n)^T$.
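Example 3.2.6 Find x, y if
$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$
From Cramer's rule,
$$x = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{1}{2}$$
Now to find y,
$$y = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = -\frac{1}{7}$$
$$z = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 2 & -3 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{11}{14}$$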
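The pattern in this example, replacing one column at a time by the right-hand side and dividing by the determinant of the coefficient matrix, translates directly into a loop. This is a sketch only: `cramer`, `det` and `minor` are names chosen for illustration, and the coefficient matrix assumes the entry -3 in the third row as used in the computation above.

```python
from fractions import Fraction


def minor(m, i, j):
    """Matrix obtained by deleting row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cramer(a, y):
    """Solve a x = y for square a with nonzero determinant by Cramer's rule."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule needs a nonzero determinant")
    solution = []
    for i in range(len(a)):
        # a_i: the i-th column of a replaced by the column y.
        a_i = [row[:i] + [y[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(a_i)) / d)
    return solution


A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]
y = [1, 2, 3]
x = cramer(A, y)
print(x)
```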
You see the pattern. For large systems Cramer's rule is less than useful if you want to
find an answer. This is because to use it you must evaluate determinants. However,
you have no practical way to evaluate determinants for large matrices other than row
operations and if you are using row operations, you might just as well use them to solve
the system to begin with. It will be a lot less trouble. Nevertheless, there are situations
in which Cramer's rule is useful.
Example 3.2.7 Solve for z if
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix}$$
You could do it by row operations but it might be easier in this case to use Cramer's
rule because the matrix of coefficients does not consist of numbers but of functions.
Thus
$$z = \frac{\begin{vmatrix} 1 & 0 & 1 \\ 0 & e^t \cos t & t \\ 0 & -e^t \sin t & t^2 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{vmatrix}} = t\left((\cos t)\,t + \sin t\right)e^{-t}.$$
You end up doing this sort of thing sometimes in ordinary differential equations in the
method of variation of parameters.
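A closed-form answer like the one in Example 3.2.7 can be sanity-checked by evaluating both sides at a few sample values of t. The sketch below uses floating point arithmetic from Python's `math` module; the function names and the chosen sample points are assumptions for illustration, and agreement at a handful of points is evidence, not a proof.

```python
import math


def det3(m):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def z_by_cramer(t):
    """Third unknown of the system in Example 3.2.7, by Cramer's rule."""
    et = math.exp(t)
    coeff = [[1.0, 0.0, 0.0],
             [0.0, et * math.cos(t), et * math.sin(t)],
             [0.0, -et * math.sin(t), et * math.cos(t)]]
    rhs = [1.0, t, t * t]
    # Replace the third column of the coefficient matrix by the right side.
    replaced = [row[:2] + [rhs[r]] for r, row in enumerate(coeff)]
    return det3(replaced) / det3(coeff)


def z_formula(t):
    """Closed form stated in the example."""
    return t * (math.cos(t) * t + math.sin(t)) * math.exp(-t)


for t in (0.0, 0.7, 2.0, -1.3):
    print(t, z_by_cramer(t), z_formula(t))
```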
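3.3 Exercises
1. Find the determinants of the following matrices.
(a) $\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 2 \\ 0 & 9 & 8 \end{pmatrix}$ (The answer is 31.)
(b) $\begin{pmatrix} 4 & 3 & 2 \\ 1 & 7 & 8 \\ 3 & -9 & 3 \end{pmatrix}$ (The answer is 375.)
(c) $\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 2 & 3 \\ 4 & 1 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{pmatrix}$, (The answer is $-2$.)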
2. Find the following determinant by expanding along the first row and second column.
1 2 1
2 1 3
2 1 1
3. Find the following determinant by
row.
1
1
1
4. Find the following determinant by expanding along the second row and first column.
1 2 1
2 1 3
2 1 1
5. Compute the determinant by cofactor expansion. Pick the easiest row or column
to use.
1 0 0 1
2 1 1 0
0 0 0 2
2 1 3 1
1 2 1
2 3 2
4 1 2
2 1 3
2 4 2
1 4 5
1 2 1
2
3 1 2 3
1 0 3
1
2 3 2 2
1 4 1
2
3 2 2 3
1 0 3
3
2 1 2 2
10. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
11. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} c & d \\ a & b \end{pmatrix}$$
12. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} a & b \\ a+c & b+d \end{pmatrix}$$
13. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} a & b \\ 2c & 2d \end{pmatrix}$$
14. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} b & a \\ d & c \end{pmatrix}$$
(e) If $A^{-1}$ exists then $\det\left(A^{-1}\right) = \det(A)^{-1}$.
(f) If B is obtained by multiplying a single row of A by 4 then $\det(B) = 4\det(A)$.
n
2
1
2
1
6
6
3
12
6
19. If $A^{-1}$ exists, what is the relationship between $\det(A)$ and $\det\left(A^{-1}\right)$? Explain
your answer.
20. Is it true that $\det(A + B) = \det(A) + \det(B)$? If this is so, explain why it is so
and if it is not so, give a counterexample.
21. Let A be an $r \times r$ matrix and suppose there are $r - 1$ rows (columns) such that
all rows (columns) are linear combinations of these $r - 1$ rows (columns). Show
$\det(A) = 0$.
22. Show $\det(aA) = a^n \det(A)$ where here A is an $n \times n$ matrix and a is a scalar.
23. Suppose A is an upper triangular matrix. Show that $A^{-1}$ exists if and only if all
elements of the main diagonal are nonzero. Is it true that $A^{-1}$ will also be upper
triangular? Explain. Is everything the same for lower triangular matrices?
24. Let A and B be two $n \times n$ matrices. $A \sim B$ (A is similar to B) means there
exists an invertible matrix, S such that $A = S^{-1}BS$. Show that if $A \sim B$, then
$B \sim A$. Show also that $A \sim A$ and that if $A \sim B$ and $B \sim C$, then $A \sim C$.
25. In the context of Problem 24 show that if $A \sim B$, then $\det(A) = \det(B)$.
26. Two $n \times n$ matrices, A and B, are similar if $B = S^{-1}AS$ for some invertible $n \times n$
matrix, S. Show that if two matrices are similar, they have the same characteristic polynomials. The characteristic polynomial of an $n \times n$ matrix, M is the
polynomial, $\det(\lambda I - M)$.
27. Prove by doing computations that $\det(AB) = \det(A)\det(B)$ if A and B are $2 \times 2$
matrices.
28. Illustrate with an example of $2 \times 2$ matrices that the determinant of a product
equals the product of the determinants.
29. An $n \times n$ matrix is called nilpotent if for some positive integer, k it follows
$A^k = 0$. If A is a nilpotent matrix and k is the smallest possible integer such that
$A^k = 0$, what are the possible values of $\det(A)$?
30. Use Cramer's rule to find the solution to
$$x + 2y = 1$$
$$2x - y = 2$$
31. Use Cramer's rule to find the solution to
$$x + 2y + z = 1$$
$$2x - y - z = 2$$
$$x + z = 1$$
32. Here is a matrix,
$$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 2 & 1 \\ 3 & 1 & 0 \end{pmatrix}$$
Determine whether the matrix has an inverse by finding whether the determinant
is non zero.
33. Here is a matrix,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
34. Here is a matrix,
$$\begin{pmatrix} 1 & t & t^2 \\ 0 & 1 & 2t \\ t & 0 & 2 \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
35. Here is a matrix,
$$\begin{pmatrix} e^t & e^{-t}\cos t & e^{-t}\sin t \\ e^t & -e^{-t}\cos t - e^{-t}\sin t & -e^{-t}\sin t + e^{-t}\cos t \\ e^t & 2e^{-t}\sin t & -2e^{-t}\cos t \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
36. Here is a matrix,
$$\begin{pmatrix} e^t & \cosh t & \sinh t \\ e^t & \sinh t & \cosh t \\ e^t & \cosh t & \sinh t \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
37. Use the formula for the inverse in terms of the cofactor matrix to find
the inverses of the matrices
1 2
1 1
, 0 2
1 2
4 1
1
0 .
2
e
0
0
.
et cos t
et sin t
A= 0
t
t
t
t
0 e cos t e sin t e cos t + e sin t
39. Find the inverse if it exists of the matrix,
$$\begin{pmatrix} e^t & \cos t & \sin t \\ e^t & -\sin t & \cos t \\ e^t & -\cos t & -\sin t \end{pmatrix}.$$
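40. Let
$$F(t) = \det \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}.$$
Verify
$$F'(t) = \det \begin{pmatrix} a'(t) & b'(t) \\ c(t) & d(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) \\ c'(t) & d'(t) \end{pmatrix}.$$
Now suppose
$$F(t) = \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix}.$$
Use Laplace expansion and the first part to verify
$$F'(t) = \det \begin{pmatrix} a'(t) & b'(t) & c'(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d'(t) & e'(t) & f'(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g'(t) & h'(t) & i'(t) \end{pmatrix}.$$
Conjecture a general result valid for $n \times n$ matrices and explain why it will be
true. Can a similar thing be done with the columns?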
41. Let $Ly = y^{(n)} + a_{n-1}(x) y^{(n-1)} + \cdots + a_1(x) y' + a_0(x) y$ where the $a_i$ are given
continuous functions defined on an interval, $(a, b)$ and y is some function
which has n derivatives so it makes sense to write Ly. Suppose $Ly_k = 0$ for
$k = 1, 2, \cdots, n$. The Wronskian of these functions, $y_i$ is defined as
$$W(y_1, \cdots, y_n)(x) \equiv \det \begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{pmatrix}$$
Show that for $W(x) = W(y_1, \cdots, y_n)(x)$ to save space,
$$W'(x) = \det \begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n)}(x) & \cdots & y_n^{(n)}(x) \end{pmatrix}$$
Now use the differential equation, $Ly = 0$ which is satisfied by each of these functions, $y_i$ and properties of determinants presented above to verify that $W' + a_{n-1}(x) W = 0$. Give an explicit solution of this linear differential equation,
Abel's formula, and use your answer to verify that the Wronskian of these solutions to the equation, $Ly = 0$ either vanishes identically on $(a, b)$ or never. Hint:
To solve the differential equation, let $A'(x) = a_{n-1}(x)$ and multiply both sides of
the differential equation by $e^{A(x)}$ and then argue the left side is the derivative of
something.
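3.4 Exercises With Answers
1. Find the following determinant by expanding along the first row and second column.
$$\begin{vmatrix} 1 & 2 & 1 \\ 0 & 4 & 3 \\ 2 & 1 & 1 \end{vmatrix}$$
Expanding along the first row you would have
$$1 \begin{vmatrix} 4 & 3 \\ 1 & 1 \end{vmatrix} - 2 \begin{vmatrix} 0 & 3 \\ 2 & 1 \end{vmatrix} + 1 \begin{vmatrix} 0 & 4 \\ 2 & 1 \end{vmatrix} = 5.$$
Expanding along the second column you would have
$$-2 \begin{vmatrix} 0 & 3 \\ 2 & 1 \end{vmatrix} + 4 \begin{vmatrix} 1 & 1 \\ 2 & 1 \end{vmatrix} - 1 \begin{vmatrix} 1 & 1 \\ 0 & 3 \end{vmatrix} = 5$$
Be sure you understand how you must multiply by $(-1)^{i+j}$ to get the term which goes with the ij-th entry.
2. Find the following determinant by expanding along the first column and third row.
$$\begin{vmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{vmatrix}$$
Expanding along the first column you get
$$1 \begin{vmatrix} 0 & 1 \\ 1 & 1 \end{vmatrix} - 1 \begin{vmatrix} 2 & 1 \\ 1 & 1 \end{vmatrix} + 2 \begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} = 2$$
Expanding along the third row you get
$$2 \begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} - 1 \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} + 1 \begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = 2$$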
3. Compute the determinant by cofactor expansion. Pick the easiest row or column
to use.
1 0 0 1
0 1 1 0
0 0 0 3
2 1 3 1
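Probably it is easiest to expand along the third row. This gives
$$(-1)\, 3 \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 2 & 1 & 3 \end{vmatrix} = -3 \times 1 \times \begin{vmatrix} 1 & 1 \\ 1 & 3 \end{vmatrix} = -6$$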
Notice how I expanded the three by three matrix along the top row.
4. Find the determinant using row operations.
$$\begin{vmatrix} 11 & 2 & 1 \\ 2 & 7 & 2 \\ -4 & 1 & 2 \end{vmatrix}$$
The following is a sequence of numbers which, according to the theorems on row operations, all equal the original determinant.
$$\begin{vmatrix} 11 & 2 & 1 \\ 2 & 7 & 2 \\ 0 & 15 & 6 \end{vmatrix}$$
To get this one, I added 2 times the second row to the last row. This gives a matrix
which has the same determinant as the original matrix. Next I will multiply the
second row by 11 and the top row by 2. This has the effect of producing a matrix
whose determinant is 22 times too large. Therefore, I need to divide the result by
22.
$$\begin{vmatrix} 22 & 4 & 2 \\ 22 & 77 & 22 \\ 0 & 15 & 6 \end{vmatrix} \frac{1}{22}.$$
Next I will add $(-1)$ times the top row to the second row. This leaves things
unchanged.
$$\begin{vmatrix} 22 & 4 & 2 \\ 0 & 73 & 20 \\ 0 & 15 & 6 \end{vmatrix} \frac{1}{22}$$
That 73 looks pretty formidable so I shall take $-3$ times the third column and
add to the second column. This will leave the number unchanged.
$$\begin{vmatrix} 22 & -2 & 2 \\ 0 & 13 & 20 \\ 0 & -3 & 6 \end{vmatrix} \frac{1}{22}$$
Now I will divide the bottom row by 3. To compensate for the damage inflicted,
I must then multiply by 3.
$$\begin{vmatrix} 22 & -2 & 2 \\ 0 & 13 & 20 \\ 0 & -1 & 2 \end{vmatrix} \frac{3}{22}$$
I don't like the 13 so I will take 13 times the bottom row and add to the middle.
This will leave the number unchanged.
$$\begin{vmatrix} 22 & -2 & 2 \\ 0 & 0 & 46 \\ 0 & -1 & 2 \end{vmatrix} \frac{3}{22}$$
Finally, I will switch the two bottom rows. This will change the sign. Therefore,
after doing this row operation, I need to multiply the result by $(-1)$ to compensate
for the damage done by the row operation.
$$-\begin{vmatrix} 22 & -2 & 2 \\ 0 & -1 & 2 \\ 0 & 0 & 46 \end{vmatrix} \frac{3}{22} = 138$$
The final matrix is upper triangular so to get its determinant, just multiply the
entries on the main diagonal.
5. Find the determinant using row operations.
$$\begin{vmatrix} 1 & 2 & 1 & 2 \\ 3 & 1 & -2 & 3 \\ -1 & 0 & 3 & 1 \\ 2 & 3 & 2 & -2 \end{vmatrix}$$
In this case, you can do row operations on the matrix which are of the sort where
a row is replaced with itself added to a multiple of another row, without switching any rows, and
eventually end up with
$$\begin{vmatrix} 1 & 2 & 1 & 2 \\ 0 & -5 & -5 & -3 \\ 0 & 0 & 2 & \frac{9}{5} \\ 0 & 0 & 0 & -\frac{63}{10} \end{vmatrix}$$
Each of these row operations does not change the value of the determinant of the
matrix and so the determinant is 63 which is obtained by multiplying the entries
which are down the main diagonal.
6. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$
In this case the transpose of the matrix on the left was taken. The new matrix
will have the same determinant as the original matrix.
7. An operation is done to get from the first matrix to the second. Identify what
was done and tell how it will affect the value of the determinant.
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \begin{pmatrix} a & b \\ a+c & b+d \end{pmatrix}$$
This simply replaced the second row with the first row added to the second row.
The new matrix will have the same determinant as the original one.
1
y =
1
2
1 1
2 1
1 1
=0
5
1
Here is a matrix,
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
You should take the determinant and remember the identity that $\cos^2(t) + \sin^2(t) = 1$.
Here is a matrix,
$$\begin{pmatrix} 1 & t & t^2 \\ 0 & 1 & 2t \\ t^3 & 0 & 2 \end{pmatrix}$$
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
$$\begin{vmatrix} 1 & t & t^2 \\ 0 & 1 & 2t \\ t^3 & 0 & 2 \end{vmatrix} = 2 + t^5$$
and so if $t = -\sqrt[5]{2}$, then this matrix fails to have an inverse. However, it has
an inverse for all other values of t.
16. Use the formula for the inverse in terms of the cofactor matrix to find
the inverses of the matrices
1 2
1 1
, 0 2
1 2
4 1
1
0 .
2
Part II
Vectors In Rn
Outcomes
4.1
Eventually, one must consider functions which are defined on subsets of $\mathbb{R}^n$ and their
properties. The next definition will end up being quite important. It describes a type of
subset of $\mathbb{R}^n$ with the property that if x is in this set, then so is y whenever y is close
enough to x.
Definition 4.1.1 Let $U \subseteq \mathbb{R}^n$. U is an open set if whenever $x \in U$, there exists $r > 0$
such that $B(x, r) \subseteq U$. More generally, if U is any subset of $\mathbb{R}^n$, $x \in U$ is an interior
point of U if there exists $r > 0$ such that $x \in B(x, r) \subseteq U$. In other words U is an open
set exactly when every point of U is an interior point of U.
If there is something called an open set, surely there should be something called a
closed set and here is the definition of one.
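Definition 4.1.2 A subset, C, of $\mathbb{R}^n$ is called a closed set if $\mathbb{R}^n \setminus C$ is an open set.
The symbol, $\mathbb{R}^n \setminus C$ denotes everything in $\mathbb{R}^n$ which is not in C. It is also called the
complement of C. The symbol, $S^C$ is a short way of writing $\mathbb{R}^n \setminus S$.
To illustrate this definition, consider the following picture.
[Figure: an open set U drawn with a dotted boundary, containing a point x and a ball B(x, r) inside U.]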
You see in this picture how the edges are dotted. This is because an open set cannot
include the edges or the set would fail to be open. For example, consider what
would happen if you picked a point out on the edge of U in the above picture. Every
open ball centered at that point would have in it some points which are outside U .
Therefore, such a point would violate the above definition. You also see the edges of
B (x, r) dotted suggesting that B (x, r) ought to be an open set. This is intuitively clear
but does require a proof. This will be done in the next theorem and will give examples
of open sets. Also, you can see that if x is close to the edge of U, you might have to
take r to be very small.
It is roughly the case that open sets don't have their skins while closed sets do. Here
is a picture of a closed set, C.
[Figure: a closed set C, with a point x outside C and a ball B(x, r) contained in the complement of C.]
Note that $x \notin C$ and since $\mathbb{R}^n \setminus C$ is open, there exists a ball, $B(x, r)$ contained
entirely in $\mathbb{R}^n \setminus C$. If you look at $\mathbb{R}^n \setminus C$, what would be its skin? It can't be in $\mathbb{R}^n \setminus C$
and so it must be in C. This is a rough heuristic explanation of what is going on with
these definitions. Also note that $\mathbb{R}^n$ and $\emptyset$ are both open and closed. Here is why. If
$x \in \emptyset$, then there must be a ball centered at x which is also contained in $\emptyset$. This must be
considered to be true because there is nothing in $\emptyset$ so there can be no example to show
it false$^1$. Therefore, from the definition, it follows $\emptyset$ is open. It is also closed because
if $x \notin \emptyset$, then $B(x, 1)$ is also contained in $\mathbb{R}^n \setminus \emptyset = \mathbb{R}^n$. Therefore, $\emptyset$ is both open and
closed. From this, it follows $\mathbb{R}^n$ is also both open and closed.
$^1$ To a mathematician, the statement: "Whenever a pig is born with wings it can fly" must be taken
as true. We do not consider biological or aerodynamic considerations in such statements. There is no
such thing as a winged pig and therefore, all winged pigs must be superb flyers since there can be no
example of one which is not. On the other hand we would also consider the statement: "Whenever a
pig is born with wings it can't possibly fly", as equally true. The point is, you can say anything you
want about the elements of the empty set and no one can gainsay your statement. Therefore, such
statements are considered as true by default. You may say this is a very strange way of thinking about
truth and ultimately this is because mathematics is not about truth. It is more about consistency and
logic.
Theorem 4.1.3 Let $x \in \mathbb{R}^n$ and let $r \geq 0$. Then $B(x, r)$ is an open set. Also,
$$D(x, r) \equiv \{y \in \mathbb{R}^n : |y - x| \leq r\}$$
is a closed set.
Proof: Suppose $y \in B(x, r)$. It is necessary to show there exists $r_1 > 0$ such that
$B(y, r_1) \subseteq B(x, r)$. Define $r_1 \equiv r - |x - y|$. Then if $|z - y| < r_1$, it follows from the
above triangle inequality that
$$|z - x| = |z - y + y - x| \leq |z - y| + |y - x| < r_1 + |y - x| = r - |x - y| + |y - x| = r.$$
[Figure: the ball B(x, r) of radius r about x, containing a point y and the smaller ball of radius $r_1$ about y; a second picture shows D(x, r).]
$$\prod_{i=1}^{n} A_i \equiv \{(x_1, x_2, \cdots, x_n) : \text{each } x_i \in A_i\}.$$
Theorem 4.1.6 Let U be an open set in $\mathbb{R}^m$ and let V be an open set in $\mathbb{R}^n$. Then
$U \times V$ is an open set in $\mathbb{R}^{n+m}$. If C is a closed set in $\mathbb{R}^m$ and H is a closed set in $\mathbb{R}^n$,
then $C \times H$ is a closed set in $\mathbb{R}^{n+m}$. If C and H are bounded, then so is $C \times H$.
Proof: Let $(x, y) \in U \times V$. Since U is open, there exists $r_1 > 0$ such that $B(x, r_1) \subseteq U$.
Similarly, there exists $r_2 > 0$ such that $B(y, r_2) \subseteq V$. Now
$$B((x, y), \delta) \equiv \left\{ (s, t) \in \mathbb{R}^{n+m} : \sum_{k=1}^{m} |x_k - s_k|^2 + \sum_{j=1}^{n} |y_j - t_j|^2 < \delta^2 \right\}$$
Therefore, if $\delta \equiv \min(r_1, r_2)$ and $(s, t) \in B((x, y), \delta)$, then it follows that $s \in B(x, r_1) \subseteq U$
and that $t \in B(y, r_2) \subseteq V$ which shows that $B((x, y), \delta) \subseteq U \times V$. Hence $U \times V$ is
open as claimed.
Next suppose $(x, y) \notin C \times H$. It is necessary to show there exists $\delta > 0$ such that
$B((x, y), \delta) \subseteq \mathbb{R}^{n+m} \setminus (C \times H)$. Either $x \notin C$ or $y \notin H$ since otherwise $(x, y)$ would be
a point of $C \times H$. Suppose therefore, that $x \notin C$. Since C is closed, there exists $r > 0$ such
that $B(x, r) \subseteq \mathbb{R}^m \setminus C$. Consider $B((x, y), r)$. If $(s, t) \in B((x, y), r)$, it follows that
$s \in B(x, r)$ which is contained in $\mathbb{R}^m \setminus C$. Therefore, $B((x, y), r) \subseteq \mathbb{R}^{n+m} \setminus (C \times H)$
showing $C \times H$ is closed. A similar argument holds if $y \notin H$.
If C is bounded, there exist $[a_i, b_i]$ such that $C \subseteq \prod_{i=1}^{m} [a_i, b_i]$ and if H is bounded,
$H \subseteq \prod_{i=m+1}^{m+n} [a_i, b_i]$ for intervals $[a_{m+1}, b_{m+1}], \cdots, [a_{m+n}, b_{m+n}]$. Therefore,
$C \times H \subseteq \prod_{i=1}^{m+n} [a_i, b_i]$ and this establishes the last part of this theorem.
4.2 Physical Vectors
Suppose you push on something. What is important? There are really two things
which are important, how hard you push and the direction you push. This illustrates
the concept of force.
Definition 4.2.1 Force is a vector. The magnitude of this vector is a measure of how
hard it is pushing. It is measured in units such as Newtons or pounds or tons. Its
direction is the direction in which the push is taking place.
Of course this is a little vague and will be left a little vague until the presentation
of Newton's second law later.
Vectors are used to model force and other physical vectors like velocity. What
was just described would be called a force vector. It has two essential ingredients, its
magnitude and its direction. Geometrically think of vectors as directed line segments
or arrows as shown in the following picture in which all the directed line segments are
considered to be the same vector because they have the same direction, the direction in
which the arrows point, and the same magnitude (length).
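Because of this fact that only direction and magnitude are important, it is always
possible to put a vector in a certain particularly simple form. Let $\overrightarrow{pq}$ be a directed line
segment or vector. Then from Definition 1.4.4 it follows that $\overrightarrow{pq}$ consists of the points
of the form
$$p + t(q - p)$$
where $t \in [0, 1]$. Subtract p from all these points to obtain the directed line segment
consisting of the points
$$0 + t(q - p), \quad t \in [0, 1].$$
The point in $\mathbb{R}^n$, $q - p$, will represent the vector.
Geometrically, the arrow, $\overrightarrow{pq}$, was slid so it points in the same direction and the
base is at the origin, 0. For example, see the following picture.
[Figure: a directed line segment from p to q and the same arrow slid so that its base is at the origin.]
The magnitude of the vector determined by the directed line segment $\overrightarrow{pq}$ is the distance between the points p and q. By the distance formula this equals
$$\left( \sum_{k=1}^{n} (q_k - p_k)^2 \right)^{1/2} = |p - q|$$
and for v any vector in $\mathbb{R}^n$ the magnitude of v equals
$$\left( \sum_{k=1}^{n} v_k^2 \right)^{1/2} = |v|.$$
For example, if $v = (1, 2, 3)$ then
$$|v| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}.$$
What is the geometric significance of scalar multiplication? If $a = (a_1, \cdots, a_n)$ represents the vector v and r is a scalar, then
$$|rv| = \left( \sum_{k=1}^{n} (r a_k)^2 \right)^{1/2} = \left( \sum_{k=1}^{n} r^2 (a_k)^2 \right)^{1/2} = \left( r^2 \right)^{1/2} \left( \sum_{k=1}^{n} a_k^2 \right)^{1/2} = |r|\,|v|.$$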
Thus the magnitude of rv equals |r| times the magnitude of v. If r is positive, then the
vector represented by rv has the same direction as the vector, v because multiplying
by the scalar, r, only has the effect of scaling all the distances. Thus the unit distance
along any coordinate axis now has length r and in this rescaled system the vector is
represented by a. If r < 0 similar considerations apply except in this case all the ai also
change sign. From now on, a will be referred to as a vector instead of an element of
Rn representing a vector as just described. The following picture illustrates the effect
of scalar multiplication.
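The identity $|rv| = |r|\,|v|$ is easy to test numerically. The following sketch uses the vector (1, 2, 3) whose magnitude was computed above and an arbitrarily chosen negative scalar; the function name `magnitude` and the value of `r` are assumptions made for illustration.

```python
import math


def magnitude(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


v = (1, 2, 3)
r = -2.5  # an arbitrary negative scalar, chosen only for this check
rv = tuple(r * x for x in v)

print(magnitude(v), magnitude(rv), abs(r) * magnitude(v))
```

A negative scalar reverses the direction of every component but the length is still scaled by the absolute value of r.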
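[Figure: the vectors v, 2v, and -2v.]
Note there are n special vectors which point along the coordinate axes. These are
$$e_i \equiv (0, \cdots, 0, 1, 0, \cdots, 0)$$
where the 1 is in the i-th slot and there are zeros in all the other spaces. See the picture
in the case of $\mathbb{R}^3$.
[Figure: the coordinate axes in $\mathbb{R}^3$ with the vectors $e_1$, $e_2$, $e_3$ along them.]
Every vector $v = (a_1, \cdots, a_n)$ can then be written as
$$v = \sum_{i=1}^{n} a_i e_i.$$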
What does addition of vectors mean physically? Suppose two forces are applied to
some object. Each of these would be represented by a force vector and the two forces
acting together would yield an overall force acting on the object which would also be
a force vector known as the resultant. Suppose the two vectors are $a = \sum_{i=1}^{n} a_i e_i$ and
$b = \sum_{i=1}^{n} b_i e_i$. Then the vector, a involves a component in the i-th direction, $a_i e_i$ while
the component in the i-th direction of b is $b_i e_i$. Then it seems physically reasonable that
the resultant vector should have a component in the i-th direction equal to $(a_i + b_i) e_i$.
This is exactly what is obtained when the vectors, a and b are added.
$$a + b = (a_1 + b_1, \cdots, a_n + b_n) = \sum_{i=1}^{n} (a_i + b_i) e_i.$$
Thus the addition of vectors according to the rules of addition in Rn which were
presented earlier, yields the appropriate vector which duplicates the cumulative effect
of all the vectors in the sum.
What is the geometric significance of vector addition? Suppose u, v are vectors,
$$u = (u_1, \cdots, u_n), \quad v = (v_1, \cdots, v_n)$$
Then $u + v = (u_1 + v_1, \cdots, u_n + v_n)$. How can one obtain this geometrically? Consider
the directed line segment, $\overrightarrow{0u}$ and then, starting at the end of this directed line segment,
follow the directed line segment $\overrightarrow{u(u + v)}$ to its end, $u + v$. In other words, place the
vector u in standard position with its base at the origin and then slide the vector v till
its base coincides with the point of u. The point of this slid vector determines $u + v$.
To illustrate, see the following picture
[Figure: the vector u from the origin, the vector v slid so its base is at the point of u, and the sum u + v.]
Note the vector u + v is the diagonal of a parallelogram determined from the two
vectors u and v and that identifying u + v with the directed diagonal of the parallelogram determined by the vectors u and v amounts to the same thing as the above
procedure.
An item of notation should be mentioned here. In the case of $\mathbb{R}^n$ where $n \leq 3$, it is
standard notation to use i for $e_1$, j for $e_2$, and k for $e_3$. Now here are some applications
of vector addition to some problems.
Example 4.2.4 There are three ropes attached to a car and three people pull on these
ropes. The first exerts a force of $2i + 3j - 2k$ Newtons, the second exerts a force of
$3i + 5j + k$ Newtons and the third exerts a force of $5i - j + 2k$ Newtons. Find the total
force in the direction of i.
To find the total force add the vectors as described above. This gives $10i + 7j + k$
Newtons. Therefore, the force in the i direction is 10 Newtons.
As mentioned earlier, the Newton is a unit of force like pounds.
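Adding forces component by component can be written in one line. This is an illustrative sketch; the tuples hold the i, j, k components of the three forces from Example 4.2.4, with the signs as reconstructed there.

```python
# Each force is stored as its (i, j, k) components in Newtons.
forces = [
    (2, 3, -2),
    (3, 5, 1),
    (5, -1, 2),
]

# zip(*forces) groups the i components, the j components and the k components.
total = tuple(sum(component) for component in zip(*forces))

print(total)     # resultant force
print(total[0])  # its component in the i direction
```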
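Example 4.2.5 An airplane flies North East at 100 miles per hour. Write this as a
vector.
A picture of this situation follows.
[Figure: a velocity vector of length 100 pointing North East.]
The vector has length 100. Now using that vector as the hypotenuse
of a right triangle having equal sides, the sides should each be of length $100/\sqrt{2}$.
Thus the velocity vector in the above example is $\left(100/\sqrt{2}\right) i + \left(100/\sqrt{2}\right) j$.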
Example 4.2.7 The velocity of an airplane is $100i + j + k$ measured in kilometers per
hour and at a certain instant of time its position is $(1, 2, 1)$. Here imagine a Cartesian
coordinate system in which the third component is altitude and the first and second
components are measured on a line from West to East and a line from South to North.
Find the position of this airplane one minute later.
Consider the vector $(1, 2, 1)$, which is the initial position vector of the airplane. As it moves,
the position vector changes. After one minute the airplane has moved in the i direction
a distance of $100 \times \frac{1}{60} = \frac{5}{3}$ kilometer. In the j direction it has moved $\frac{1}{60}$ kilometer
during this same time, while it moves $\frac{1}{60}$ kilometer in the k direction. Therefore, the
new displacement vector for the airplane is
$$(1, 2, 1) + \left( \frac{5}{3}, \frac{1}{60}, \frac{1}{60} \right) = \left( \frac{8}{3}, \frac{121}{60}, \frac{61}{60} \right)$$
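The position update is "old position plus velocity times elapsed time", applied to each component. The sketch below redoes the arithmetic of Example 4.2.7 with exact fractions; the variable names are illustrative, and one minute is expressed as 1/60 of an hour so the units match the velocity.

```python
from fractions import Fraction

velocity = (100, 1, 1)            # kilometers per hour, components along i, j, k
start = (1, 2, 1)                 # initial position
elapsed_hours = Fraction(1, 60)   # one minute

# New position = start + velocity * time, component by component.
new_position = tuple(p + v * elapsed_hours for p, v in zip(start, velocity))

print(new_position)
```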
Example 4.2.8 A certain river is one half mile wide with a current flowing at 4 miles
per hour from East to West. A man swims directly toward the opposite shore from the
South bank of the river at a speed of 3 miles per hour. How far down the river does he
find himself when he has swam across? How far does he end up swimming?
Consider the following picture.
[Figure: the swimmer's velocity of 3 toward the opposite shore and the current of 4 along the river.]
You should write these vectors in terms of components. The velocity of the swimmer
in still water would be $3j$ while the velocity of the river would be $-4i$. Therefore, the
velocity of the swimmer is $-4i + 3j$. Since the component of velocity in the direction
across the river is 3, it follows the trip takes $1/6$ hour or 10 minutes. The speed at
which he travels is $\sqrt{4^2 + 3^2} = 5$ miles per hour and so he travels $5 \times \frac{1}{6} = \frac{5}{6}$ miles. Now
to find the distance downstream he finds himself, note that if x is this distance, x and
$1/2$ are two legs of a right triangle whose hypotenuse equals $5/6$ miles. Therefore, by
the Pythagorean theorem the distance downstream is
$$\sqrt{(5/6)^2 - (1/2)^2} = \frac{2}{3} \text{ miles.}$$
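The numbers in Example 4.2.8 can be reproduced step by step. This is a small sketch with made-up variable names; it computes the crossing time from the component of velocity across the river, then the drift downstream and the total distance swum, and finally checks the Pythagorean relation used above.

```python
import math
from fractions import Fraction

width = Fraction(1, 2)   # miles across the river
current_speed = 4        # miles per hour, along the river
swim_speed = 3           # miles per hour, straight across

crossing_time = width / swim_speed          # hours needed to cross
downstream = current_speed * crossing_time  # drift along the river, in miles
ground_speed = math.hypot(current_speed, swim_speed)
distance_swum = ground_speed * float(crossing_time)

# The drift should also be the remaining leg of the right triangle.
leg = math.sqrt(distance_swum ** 2 - float(width) ** 2)

print(crossing_time, downstream, ground_speed, distance_swum, leg)
```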
4.3 Exercises
1. The wind blows from West to East at a speed of 50 miles per hour and an airplane
which travels at 300 miles per hour in still air is heading North West. What is
the velocity of the airplane relative to the ground? What is the component of this
velocity in the direction North?
2. In the situation of Problem 1 how many degrees to the West of North should the
airplane head in order to fly exactly North. What will be the speed of the airplane
relative to the ground?
3. In the situation of 2 suppose the airplane uses 34 gallons of fuel every hour at that
air speed and that it needs to fly North a distance of 600 miles. Will the airplane
have enough fuel to arrive at its destination given that it has 63 gallons of fuel?
4. An airplane is flying due north at 150 miles per hour. A wind is pushing the
airplane due east at 40 miles per hour. After 1 hour, the plane starts flying 30°
East of North. Assuming the plane starts at (0, 0) , where is it after 2 hours? Let
North be the direction of the positive y axis and let East be the direction of the
positive x axis.
5. City A is located at the origin while city B is located at (300, 500) where distances
are in miles. An airplane flies at 250 miles per hour in still air. This airplane
wants to fly from city A to city B but the wind is blowing in the direction of the
positive y axis at a speed of 50 miles per hour. Find a unit vector such that if the
plane heads in this direction, it will end up at city B having flown the shortest
possible distance. How long will it take to get there?
6. A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man swims directly toward the opposite shore from the South
bank of the river at a speed of 3 miles per hour. How far down the river does he
find himself when he has swum across? How far does he end up swimming?
7. A certain river is one half mile wide with a current flowing at 2 miles per hour
from East to West. A man can swim at 3 miles per hour in still water. In what
direction should he swim in order to travel directly across the river? What would
the answer to this problem be if the river flowed at 3 miles per hour and the man
could swim only at the rate of 2 miles per hour?
8. Three forces are applied to a point which does not move. Two of the forces are
2i + j + 3k Newtons and i − 3j + 2k Newtons. Find the third force.
9. The total force acting on an object is to be 2i + j + k Newtons. A force of
i + j + k Newtons is being applied. What other force should be applied to
achieve the desired total force?
10. A bird flies from its nest 5 km. in the direction 60° north of east where it stops to
rest on a tree. It then flies 10 km. in the direction due southeast and lands atop
a telephone pole. Place an xy coordinate system so that the origin is the bird's
nest, and the positive x axis points east and the positive y axis points north.
Find the displacement vector from the nest to the telephone pole.
11. A car is stuck in the mud. There is a cable stretched tightly from this car to a tree
which is 20 feet long. A person grasps the cable in the middle and pulls with a
force of 100 pounds perpendicular to the stretched cable. The center of the cable
moves two feet and remains still. What is the tension in the cable? The tension
in the cable is the force exerted on this point by the part of the cable nearer the
car as well as the force exerted on this point by the part of the cable nearer the
tree.
12. Let $U = \{(x, y, z) \text{ such that } z > 0\}$. Determine whether $U$ is open, closed or
neither.
13. Let $U = \{(x, y, z) \text{ such that } z \ge 0\}$. Determine whether $U$ is open, closed or
neither.
14. Let $U = \left\{(x, y, z) \text{ such that } \sqrt{x^2 + y^2 + z^2} < 1\right\}$. Determine whether $U$ is open,
closed or neither.
15. Let $U = \left\{(x, y, z) \text{ such that } \sqrt{x^2 + y^2 + z^2} \le 1\right\}$. Determine whether $U$ is open,
closed or neither.
16. Show carefully that $\mathbb{R}^n$ is both open and closed.
17. Show that every open set in $\mathbb{R}^n$ is the union of open balls contained in it.
18. Show the intersection of any two open sets is an open set.
19. If $S$ is a nonempty subset of $\mathbb{R}^p$, a point, $\mathbf{x}$ is said to be a limit point of $S$ if
$B(\mathbf{x}, r)$ contains infinitely many points of $S$ for each $r > 0$. Show this is equivalent
to saying that $B(\mathbf{x}, r)$ contains a point of $S$ different from $\mathbf{x}$ for each $r > 0$.
20. Closed sets were defined to be those sets which are complements of open sets.
Show that a set is closed if and only if it contains all its limit points.
4.4 Exercises With Answers
1. The wind blows from West to East at a speed of 30 kilometers per hour and an
airplane which travels at 300 Kilometers per hour in still air is heading North
West. What is the velocity of the airplane relative to the ground? What is the
component of this velocity in the direction North?
Let the positive y axis point in the direction North and let the positive x axis
point in the direction East. The velocity of the wind is $30\mathbf{i}$. The plane moves
in the direction $-\mathbf{i} + \mathbf{j}$. A unit vector in this direction is $\frac{1}{\sqrt{2}}(-\mathbf{i} + \mathbf{j})$. Therefore, the
velocity of the plane relative to the ground is
$$30\mathbf{i} + \frac{300}{\sqrt{2}}(-\mathbf{i} + \mathbf{j}) = 150\sqrt{2}\,\mathbf{j} + \left(30 - 150\sqrt{2}\right)\mathbf{i}.$$
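Therefore, you need to have $\sin\theta = 1/10$, which means $\theta = .10017$ radians.
Therefore, the degrees should be $\frac{.1 \times 180}{\pi} = 5.7296$ degrees. In this case the velocity
of the plane relative to the ground is $300\left(\frac{\sqrt{99}}{10}\right)\mathbf{j}$.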
3. In the situation of 2 suppose the airplane uses 34 gallons of fuel every hour at that
air speed and that it needs to fly North a distance of 600 kilometers. Will the
airplane have enough fuel to arrive at its destination given that it has 63 gallons
of fuel?
300
600
99
10
99
10
. Therefore, it
Vector Products
5.0.1
Outcomes
1. Evaluate a dot product from the angle formula or the coordinate formula.
2. Interpret the dot product geometrically.
3. Evaluate the following using the dot product:
(a) the angle between two vectors
(b) the magnitude of a vector
(c) the work done by a constant force on an object
4. Evaluate a cross product from the angle formula or the coordinate formula.
5. Interpret the cross product geometrically.
6. Evaluate the following using the cross product:
(a) the area of a parallelogram
(b) the area of a triangle
(c) physical quantities such as the torque and angular velocity.
7. Find the volume of a parallelepiped using the box product.
8. Recall, apply and derive the algebraic properties of the dot and cross products.
5.1
There are two ways of multiplying vectors which are of great importance in applications.
The first of these is called the dot product, also called the scalar product and
sometimes the inner product.
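Definition 5.1.1 Let $\mathbf{a}, \mathbf{b}$ be two vectors in $\mathbb{R}^n$. Define $\mathbf{a} \cdot \mathbf{b}$ as
$$\mathbf{a} \cdot \mathbf{b} \equiv \sum_{k=1}^{n} a_k b_k.$$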
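With this definition, there are several important properties satisfied by the dot
product. In the statement of these properties, $\alpha$ and $\beta$ will denote scalars and $\mathbf{a}, \mathbf{b}, \mathbf{c}$
will denote vectors.
$$\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a} \tag{5.1}$$
$$\mathbf{a} \cdot \mathbf{a} \ge 0 \text{ and equals zero if and only if } \mathbf{a} = \mathbf{0} \tag{5.2}$$
$$(\alpha\mathbf{a} + \beta\mathbf{b}) \cdot \mathbf{c} = \alpha(\mathbf{a} \cdot \mathbf{c}) + \beta(\mathbf{b} \cdot \mathbf{c}) \tag{5.3}$$
$$\mathbf{c} \cdot (\alpha\mathbf{a} + \beta\mathbf{b}) = \alpha(\mathbf{c} \cdot \mathbf{a}) + \beta(\mathbf{c} \cdot \mathbf{b}) \tag{5.4}$$
$$|\mathbf{a}|^2 = \mathbf{a} \cdot \mathbf{a} \tag{5.5}$$
You should verify these properties. Also be sure you understand that 5.4 follows from
the first three and is therefore redundant. It is listed here for the sake of convenience.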
Example 5.1.3 Find $(1, 2, 0, -1) \cdot (0, 1, 2, 3)$.
This equals $0 + 2 + 0 + (-3) = -1$.
Example 5.1.4 Find the magnitude of $\mathbf{a} = (2, 1, 4, 2)$. That is, find $|\mathbf{a}|$.
This is $\sqrt{(2, 1, 4, 2) \cdot (2, 1, 4, 2)} = 5$.
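The two examples above can be reproduced with a few lines of code. This is only an illustrative sketch; the helper names `dot` and `norm` are not from the notes, and the first vector of Example 5.1.3 is taken to be (1, 2, 0, −1), which is the reading under which the stated arithmetic works out.

```python
import math

def dot(a, b):
    """Coordinate formula for the dot product: the sum of a_k * b_k."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Magnitude of a vector, |a| = sqrt(a . a)."""
    return math.sqrt(dot(a, a))

print(dot((1, 2, 0, -1), (0, 1, 2, 3)))  # -1
print(norm((2, 1, 4, 2)))                # 5.0
```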
The dot product satisfies a fundamental inequality known as the Cauchy Schwarz
inequality.
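Theorem 5.1.5 The dot product satisfies the inequality
$$|\mathbf{a} \cdot \mathbf{b}| \le |\mathbf{a}|\,|\mathbf{b}|. \tag{5.6}$$
Proof: If $\mathbf{b} = \mathbf{0}$, both sides of 5.6 equal zero, so assume $\mathbf{b} \ne \mathbf{0}$. For $t \in \mathbb{R}$ define
$f(t) \equiv (\mathbf{a} + t\mathbf{b}) \cdot (\mathbf{a} + t\mathbf{b})$. By the properties of the dot product, $f(t) \ge 0$ and
$$f(t) = |\mathbf{a}|^2 + 2t(\mathbf{a} \cdot \mathbf{b}) + |\mathbf{b}|^2 t^2.$$
Now
$$f(t) = |\mathbf{b}|^2\left(t^2 + 2t\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2} + \frac{|\mathbf{a}|^2}{|\mathbf{b}|^2}\right)$$
$$= |\mathbf{b}|^2\left(t^2 + 2t\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2} + \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)^2 - \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)^2 + \frac{|\mathbf{a}|^2}{|\mathbf{b}|^2}\right)$$
$$= |\mathbf{b}|^2\left(\left(t + \frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)^2 + \left(\frac{|\mathbf{a}|^2}{|\mathbf{b}|^2} - \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)^2\right)\right) \ge 0$$
for all $t \in \mathbb{R}$. In particular $f(t) \ge 0$ when $t = -\mathbf{a} \cdot \mathbf{b}/|\mathbf{b}|^2$, which implies
$$\frac{|\mathbf{a}|^2}{|\mathbf{b}|^2} - \left(\frac{\mathbf{a} \cdot \mathbf{b}}{|\mathbf{b}|^2}\right)^2 \ge 0. \tag{5.7}$$
Multiplying both sides by $|\mathbf{b}|^4$,
$$|\mathbf{a}|^2 |\mathbf{b}|^2 \ge (\mathbf{a} \cdot \mathbf{b})^2,$$
which yields 5.6.
Theorem 5.1.6 (Triangle inequality) For $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$,
$$|\mathbf{a} + \mathbf{b}| \le |\mathbf{a}| + |\mathbf{b}| \tag{5.8}$$
and equality holds if and only if one of the vectors is a nonnegative scalar multiple of
the other. Also
$$\big||\mathbf{a}| - |\mathbf{b}|\big| \le |\mathbf{a} - \mathbf{b}| \tag{5.9}$$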
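Proof: By properties of the dot product and the Cauchy Schwarz inequality,
$$|\mathbf{a} + \mathbf{b}|^2 = (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b})$$
$$= (\mathbf{a} \cdot \mathbf{a}) + (\mathbf{a} \cdot \mathbf{b}) + (\mathbf{b} \cdot \mathbf{a}) + (\mathbf{b} \cdot \mathbf{b})$$
$$= |\mathbf{a}|^2 + 2(\mathbf{a} \cdot \mathbf{b}) + |\mathbf{b}|^2$$
$$\le |\mathbf{a}|^2 + 2|\mathbf{a} \cdot \mathbf{b}| + |\mathbf{b}|^2$$
$$\le |\mathbf{a}|^2 + 2|\mathbf{a}|\,|\mathbf{b}| + |\mathbf{b}|^2$$
$$= (|\mathbf{a}| + |\mathbf{b}|)^2.$$
Taking square roots of both sides you obtain 5.8.
It remains to consider when equality occurs. If either vector equals zero, then that
vector equals zero times the other vector and the claim about when equality occurs is
verified. Therefore, it can be assumed both vectors are nonzero. To get equality in the
second inequality above, Theorem 5.1.5 implies one of the vectors must be a multiple
of the other. Say $\mathbf{b} = \alpha\mathbf{a}$. If $\alpha < 0$ then equality cannot occur in the first inequality
because in this case
$$(\mathbf{a} \cdot \mathbf{b}) = \alpha|\mathbf{a}|^2 < 0 < |\alpha|\,|\mathbf{a}|^2 = |\mathbf{a} \cdot \mathbf{b}|.$$
Therefore, $\alpha \ge 0$.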
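To get the other form of the triangle inequality, write $\mathbf{a} = \mathbf{a} - \mathbf{b} + \mathbf{b}$ so that
$$|\mathbf{a}| = |\mathbf{a} - \mathbf{b} + \mathbf{b}| \le |\mathbf{a} - \mathbf{b}| + |\mathbf{b}|.$$
Therefore,
$$|\mathbf{a}| - |\mathbf{b}| \le |\mathbf{a} - \mathbf{b}|. \tag{5.10}$$
Similarly,
$$|\mathbf{b}| - |\mathbf{a}| \le |\mathbf{b} - \mathbf{a}| = |\mathbf{a} - \mathbf{b}|. \tag{5.11}$$
It follows from 5.10 and 5.11 that 5.9 holds. This is because $\big||\mathbf{a}| - |\mathbf{b}|\big|$ equals the left
side of either 5.10 or 5.11 and either way, $\big||\mathbf{a}| - |\mathbf{b}|\big| \le |\mathbf{a} - \mathbf{b}|$. This proves the theorem.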
5.2
5.2.1
Given two vectors, a and b, the included angle is the angle between these two vectors
which is less than or equal to 180 degrees. The dot product can be used to determine
the included angle between two vectors. To see how to do this, consider the following
picture.
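[Figure: the vectors a and b drawn from a common point with included angle θ, and the vector a − b joining their tips.]
By the law of cosines,
$$|\mathbf{a} - \mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}|\,|\mathbf{b}|\cos\theta.$$
Also from the properties of the dot product,
$$|\mathbf{a} - \mathbf{b}|^2 = (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,\mathbf{a} \cdot \mathbf{b}$$
and so comparing the above two formulas,
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}|\,|\mathbf{b}|\cos\theta. \tag{5.12}$$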
In words, the dot product of two vectors equals the product of the magnitude of the
two vectors multiplied by the cosine of the included angle. Note this gives a geometric
description of the dot product which does not depend explicitly on the coordinates of
the vectors.
Example 5.2.1 Find the angle between the vectors 2i + j − k and 3i + 4j + k.
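Formula 5.12 can be solved for the included angle. The sketch below is an illustration only: the function name `included_angle` is made up, and the two vectors are assumed to be 2i + j − k and 3i + 4j + k.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def included_angle(a, b):
    """Included angle in radians, from a . b = |a| |b| cos(theta).

    Both vectors must be nonzero.
    """
    cos_theta = dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
    # Guard against rounding pushing the cosine slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)

theta = included_angle((2, 1, -1), (3, 4, 1))
print(theta)  # about 0.766 radians
```

Here the dot product is 9 and the magnitudes are $\sqrt{6}$ and $\sqrt{26}$, so the cosine of the included angle is $9/\sqrt{156}$.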
5.2.2
Our first application will be to the concept of work. The physical concept of work does
not in any way correspond to the notion of work employed in ordinary conversation. For
example, if you were to slide a 150 pound weight off a table which is three feet high and
shuffle along the floor for 50 yards, sweating profusely and exerting all your strength
to keep the weight from falling on your feet, keeping the height always three feet and
then deposit this weight on another three foot high table, the physical concept of work
would indicate that the force exerted by your arms did no work during this project even
though the muscles in your hands and arms would likely be very tired. The reason for
such an unusual definition is that even though your arms exerted considerable force on
the weight, enough to keep it from falling, the direction of motion was at right angles
to the force they exerted. The only part of a force which does work in the sense of
physics is the component of the force in the direction of motion (This is made more
precise below.). The work is defined to be the magnitude of the component of this force
times the distance over which it acts in the case where this component of force points
in the direction of motion and (−1) times the magnitude of this component times the
distance in case the force tends to impede the motion. Thus the work done by a force
on an object as the object moves from one point to another is a measure of the extent
to which the force contributes to the motion. This is illustrated in the following picture
in the case where the given force contributes to the motion.
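[Figure: an object moving along the straight line from p1 to p2 under a force F, with F resolved into F|| along the direction of motion and F⊥ perpendicular to it.]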
In this picture the force, $\mathbf{F}$ is applied to an object which moves on the straight line
from $\mathbf{p}_1$ to $\mathbf{p}_2$. There are two vectors shown, $\mathbf{F}_{||}$ and $\mathbf{F}_{\perp}$ and the picture is intended to
indicate that when you add these two vectors you get $\mathbf{F}$ while $\mathbf{F}_{||}$ acts in the direction
of motion and $\mathbf{F}_{\perp}$ acts perpendicular to the direction of motion. Only $\mathbf{F}_{||}$ contributes
to the work done by $\mathbf{F}$ on the object as it moves from $\mathbf{p}_1$ to $\mathbf{p}_2$. $\mathbf{F}_{||}$ is called the
component of the force in the direction of motion. From trigonometry, you see the
magnitude of $\mathbf{F}_{||}$ should equal $|\mathbf{F}|\,|\cos\theta|$. Thus, since $\mathbf{F}_{||}$ points in the direction of the
vector from $\mathbf{p}_1$ to $\mathbf{p}_2$, the total work done should equal
$$|\mathbf{F}|\,\left|\overrightarrow{\mathbf{p}_1\mathbf{p}_2}\right|\cos\theta = |\mathbf{F}|\,|\mathbf{p}_2 - \mathbf{p}_1|\cos\theta.$$
If the included angle had been obtuse, then the work done by the force, $\mathbf{F}$ on the object
would have been negative because in this case, the force tends to impede the motion
from $\mathbf{p}_1$ to $\mathbf{p}_2$ but in this case, $\cos\theta$ would also be negative and so it is still the case
that the work done would be given by the above formula. Thus from the geometric
description of the dot product given above, the work equals
$$|\mathbf{F}|\,|\mathbf{p}_2 - \mathbf{p}_1|\cos\theta = \mathbf{F} \cdot (\mathbf{p}_2 - \mathbf{p}_1).$$
This explains the following definition.
Definition 5.2.7 Let $\mathbf{F}$ be a force acting on an object which moves from the point, $\mathbf{p}_1$
to the point $\mathbf{p}_2$. Then the work done on the object by the given force equals $\mathbf{F} \cdot (\mathbf{p}_2 - \mathbf{p}_1)$.
The concept of writing a given vector, F in terms of two vectors, one which is parallel
to a given vector, D and the other which is perpendicular can also be explained with
no reliance on trigonometry, completely in terms of the algebraic properties of the dot
product. As before, this is mathematically more significant than any approach involving
geometry or trigonometry because it extends to more interesting situations. This is done
next.
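Theorem 5.2.8 Let $\mathbf{F}$ and $\mathbf{D}$ be nonzero vectors. Then there exist unique vectors $\mathbf{F}_{||}$
and $\mathbf{F}_{\perp}$ such that
$$\mathbf{F} = \mathbf{F}_{||} + \mathbf{F}_{\perp} \tag{5.14}$$
where $\mathbf{F}_{||}$ is a scalar multiple of $\mathbf{D}$, also referred to as
$$\operatorname{proj}_{\mathbf{D}}(\mathbf{F}),$$
and $\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$. The vector $\operatorname{proj}_{\mathbf{D}}(\mathbf{F})$ is called the projection of $\mathbf{F}$ onto $\mathbf{D}$.
Proof: Suppose 5.14 and $\mathbf{F}_{||} = \alpha\mathbf{D}$. Taking the dot product of both sides with $\mathbf{D}$
and using $\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$, this yields
$$\mathbf{F} \cdot \mathbf{D} = \alpha|\mathbf{D}|^2$$
which requires $\alpha = \mathbf{F} \cdot \mathbf{D}/|\mathbf{D}|^2$. Thus there can be no more than one vector, $\mathbf{F}_{||}$. It
follows $\mathbf{F}_{\perp}$ must equal $\mathbf{F} - \mathbf{F}_{||}$. This verifies there can be no more than one choice for
both $\mathbf{F}_{||}$ and $\mathbf{F}_{\perp}$.
Now let
$$\mathbf{F}_{||} \equiv \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2}\mathbf{D}$$
and let
$$\mathbf{F}_{\perp} = \mathbf{F} - \mathbf{F}_{||} = \mathbf{F} - \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2}\mathbf{D}.$$
Then $\mathbf{F}_{||} = \alpha\mathbf{D}$ where $\alpha = \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2}$. It only remains to verify $\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$. But
$$\mathbf{F}_{\perp} \cdot \mathbf{D} = \mathbf{F} \cdot \mathbf{D} - \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2}\,\mathbf{D} \cdot \mathbf{D} = \mathbf{F} \cdot \mathbf{D} - \mathbf{F} \cdot \mathbf{D} = 0.$$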
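$$\frac{1}{4 + 9 + 16}\big((\mathbf{i} - 2\mathbf{j} + \mathbf{k}) \cdot (2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k})\big)(2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k})$$
$$= -\frac{8}{29}(2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k}) = -\frac{16}{29}\mathbf{i} - \frac{24}{29}\mathbf{j} + \frac{32}{29}\mathbf{k}.$$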
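The decomposition in Theorem 5.2.8 translates directly into code. The following sketch is illustrative only; the names `proj` and `perp` are assumptions, and exact rational arithmetic is used so that it applies to vectors with integer components such as those in the computation above.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj(F, D):
    """Projection of F onto D: (F . D / |D|^2) D, for integer vectors."""
    dd = dot(D, D)
    if dd == 0:
        raise ValueError("D must be a nonzero vector")
    alpha = Fraction(dot(F, D), dd)
    return tuple(alpha * d for d in D)

def perp(F, D):
    """The part of F perpendicular to D, namely F - proj_D(F)."""
    return tuple(f - p for f, p in zip(F, proj(F, D)))

F, D = (1, -2, 1), (2, 3, -4)
print(proj(F, D))          # components -16/29, -24/29, 32/29
print(dot(perp(F, D), D))  # 0
```

The second printed value confirms that the perpendicular part has zero dot product with D, as the theorem requires.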
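Example 5.2.11 Suppose $\mathbf{a}$, and $\mathbf{b}$ are vectors and $\mathbf{b}_{\perp} = \mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})$. What is the
magnitude of $\mathbf{b}_{\perp}$ in terms of the included angle?
$$|\mathbf{b}_{\perp}|^2 = \left(\mathbf{b} - \frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2}\mathbf{a}\right) \cdot \left(\mathbf{b} - \frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2}\mathbf{a}\right)$$
$$= |\mathbf{b}|^2 - 2\frac{(\mathbf{b} \cdot \mathbf{a})^2}{|\mathbf{a}|^2} + \left(\frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2}\right)^2|\mathbf{a}|^2$$
$$= |\mathbf{b}|^2\left(1 - \frac{(\mathbf{b} \cdot \mathbf{a})^2}{|\mathbf{a}|^2|\mathbf{b}|^2}\right)$$
$$= |\mathbf{b}|^2\left(1 - \cos^2\theta\right) = |\mathbf{b}|^2\sin^2(\theta)$$
where $\theta$ is the included angle between $\mathbf{a}$ and $\mathbf{b}$ which is less than $\pi$ radians. Therefore,
taking square roots,
$$|\mathbf{b}_{\perp}| = |\mathbf{b}|\sin\theta.$$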
5.2.3
When light is reflected the angle of incidence is always equal to the angle of reflection.
This is illustrated in the following picture in which a ray of light reflects off something
like a mirror.
[Figure: a ray of light reflecting off a reflecting surface, with the angle of incidence equal to the angle of reflection.]
An interesting problem is to design a curved mirror which has the property that it
will direct all rays of light coming from a long distance away (essentially parallel rays
of light) to a single point. You might be interested in a reflecting telescope for example
or some sort of scheme for achieving high temperatures by reflecting the rays of the sun
to a small area. Turning things around, you could place a source of light at the single
point and desire to have the mirror reflect this in a beam of light consisting of parallel
rays. How can you design such a mirror?
It turns out this is pretty easy given the above techniques for finding the angle
between vectors. Consider the following picture.
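[Figure: a piece of the curved mirror through the point (x, y(x)); a ray parallel to the y axis is reflected there to the point (0, p), and the two marked angles are equal.]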
It suffices to consider this in a plane for x > 0 and then let the mirror be obtained as
a surface of revolution. In the above picture, let (0, p) be the special point at which all
the parallel rays of light will be directed. This is set up so the rays of light are parallel
to the y axis. The two indicated angles will be equal and the equation of the indicated
curve will be y = y (x) while the reflection is taking place at the point (x, y (x)) as
shown. To say the two angles are equal is to say their cosines are equal. Thus from the
above,
$$\frac{(0, 1) \cdot (1, y'(x))}{\sqrt{1 + y'(x)^2}} = \frac{(-x, p - y) \cdot (-1, -y'(x))}{\sqrt{x^2 + (y - p)^2}\,\sqrt{1 + y'(x)^2}}.$$
This follows because the vectors forming the sides of one of the angles are $(0, 1)$ and
$(1, y'(x))$ while the vectors forming the other angle are $(-x, p - y)$ and $(-1, -y'(x))$.
Therefore, this yields the differential equation,
$$y'(x) = \frac{-y'(x)(p - y) + x}{\sqrt{x^2 + (y - p)^2}}$$
which is written more simply as
$$\left(\sqrt{x^2 + (y - p)^2} + (p - y)\right)y' = x.$$
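Now let $y - p = xv$ so that $y' = xv' + v$. Then in terms of $v$ the differential equation is
$$xv' = \frac{1}{\sqrt{1 + v^2} - v} - v.$$
This reduces to
$$\left(\frac{1}{\frac{1}{\sqrt{1 + v^2} - v} - v}\right)\frac{dv}{dx} = \frac{1}{x}.$$
If $G \in \int\left(\frac{1}{\frac{1}{\sqrt{1 + v^2} - v} - v}\right)dv$, then a solution to the differential equation is of the form
$$G(v) - \ln|x| = C$$
where $C$ is a constant. This is because if you differentiate both sides with respect to $x$,
$$G'(v)\frac{dv}{dx} - \frac{1}{x} = \left(\frac{1}{\frac{1}{\sqrt{1 + v^2} - v} - v}\right)\frac{dv}{dx} - \frac{1}{x} = 0.$$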
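To find $G \in \int\left(\frac{1}{\frac{1}{\sqrt{1 + v^2} - v} - v}\right)dv$, use a trig. substitution, $v = \tan\theta$. Then in terms of
$\theta$, the antiderivative becomes
$$\int\left(\frac{1}{\frac{1}{\sec\theta - \tan\theta} - \tan\theta}\right)\sec^2\theta\,d\theta = \int\sec\theta\,d\theta = \ln|\sec\theta + \tan\theta| + C.$$
Now in terms of $v$, this is
$$\ln\left(v + \sqrt{1 + v^2}\right) = \ln x + c.$$
There is no loss of generality in letting $c = \ln C$ because $\ln$ maps onto $\mathbb{R}$. Therefore,
from laws of logarithms,
$$\ln\left(v + \sqrt{1 + v^2}\right) = \ln x + c = \ln x + \ln C = \ln Cx.$$
Therefore,
$$v + \sqrt{1 + v^2} = Cx$$
and so
$$\sqrt{1 + v^2} = Cx - v.$$
Now square both sides to get $1 + v^2 = C^2x^2 + v^2 - 2Cxv$, which shows
$$1 = C^2x^2 - 2Cx\,\frac{y - p}{x} = C^2x^2 - 2C(y - p).$$
Solving this for $y$ yields
$$y = \frac{C}{2}x^2 + \left(p - \frac{1}{2C}\right)$$
and for this to correspond to reflection as described above, it must be that $C > 0$. As
described in an earlier section, this is just the equation of a parabola. Note it is possible
to choose $C$ as desired adjusting the shape of the mirror.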
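The conclusion can be tested numerically: a ray travelling parallel to the y axis and reflected off the parabola $y = \frac{C}{2}x^2 + p - \frac{1}{2C}$ should pass through $(0, p)$. The sketch below is not from the notes; the function names are illustrative, and the reflection rule used, $\mathbf{r} = \mathbf{d} - 2(\mathbf{d} \cdot \mathbf{n})\mathbf{n}$ for a unit normal $\mathbf{n}$, is the standard way of expressing equal angles of incidence and reflection.

```python
import math

def mirror_height(x, C, p):
    """Height of the mirror y = (C/2) x^2 + p - 1/(2C) at horizontal position x."""
    return 0.5 * C * x * x + p - 1.0 / (2.0 * C)

def reflected_ray_hits_y_axis_at(x, C, p):
    """Height at which a downward vertical ray, reflected at (x, y(x)),
    crosses the y axis. Requires x != 0 and C > 0."""
    y = mirror_height(x, C, p)
    slope = C * x                          # y'(x)
    n_len = math.hypot(slope, 1.0)
    nx, ny = -slope / n_len, 1.0 / n_len   # unit normal to the mirror
    dx, dy = 0.0, -1.0                     # incoming ray, parallel to the y axis
    d_dot_n = dx * nx + dy * ny
    rx = dx - 2.0 * d_dot_n * nx           # reflected direction r = d - 2 (d . n) n
    ry = dy - 2.0 * d_dot_n * ny
    t = -x / rx                            # parameter value where the ray reaches x = 0
    return y + t * ry

for x in (0.5, 1.0, 3.0):
    print(reflected_ray_hits_y_axis_at(x, C=0.5, p=2.0))  # each close to p = 2.0
```

Changing C changes the shape of the mirror but not the point where the reflected rays meet, in agreement with the remark above.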
5.2.4
Notice how you put the conjugate on the entries of the vector, y. It makes no
difference if the vectors happen to be real vectors but with complex vectors you must
do it this way. The reason for this is that when you take the dot product of a vector
with itself, you want to get the square of the length of the vector, a positive number.
Placing the conjugate on the components of y in the above definition assures this will
take place. Thus
$$\mathbf{x} \cdot \mathbf{x} = \sum_j x_j\overline{x_j} = \sum_j |x_j|^2 \ge 0.$$
If you didn't place a conjugate as in the above definition, things wouldn't work out
correctly. For example,
$$(1 + i)^2 + 2^2 = 4 + 2i$$
and this is not a positive number.
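The effect of the conjugate is easy to see with Python's built-in complex numbers. This is a small sketch for illustration; the names `cdot` and `naive_dot` are made up, and the vector used is the one from the computation above.

```python
def cdot(x, y):
    """Dot product in C^n with the conjugate placed on the entries of y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def naive_dot(x, y):
    """The same sum without the conjugate, shown only for comparison."""
    return sum(a * b for a, b in zip(x, y))

x = (1 + 1j, 2 + 0j)
print(cdot(x, x))       # (6+0j), the square of the length of x
print(naive_dot(x, x))  # (4+2j), not a positive number
```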
The following properties of the dot product follow immediately from the definition
and you should verify each of them.
Properties of the dot product:
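1. $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$.
2. If $a, b$ are numbers and $\mathbf{u}, \mathbf{v}, \mathbf{z}$ are vectors then $(a\mathbf{u} + b\mathbf{v}) \cdot \mathbf{z} = a(\mathbf{u} \cdot \mathbf{z}) + b(\mathbf{v} \cdot \mathbf{z})$.
3. $\mathbf{u} \cdot \mathbf{u} \ge 0$ and it equals 0 if and only if $\mathbf{u} = \mathbf{0}$.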
The norm is defined in the usual way.
Definition 5.2.13 For $\mathbf{x} \in \mathbb{C}^n$,
$$|\mathbf{x}| \equiv \left(\sum_{k=1}^{n}|x_k|^2\right)^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2}.$$
Recall that for $z = x + iy$, $\overline{z} = x - iy$ and $z\overline{z} = |z|^2$.
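Theorem 5.2.15 (Cauchy Schwarz) The following inequality holds for $x_i$ and $y_i \in \mathbb{C}$.
$$|(\mathbf{x} \cdot \mathbf{y})| = \left|\sum_{i=1}^{n} x_i\overline{y_i}\right| \le \left(\sum_{i=1}^{n}|x_i|^2\right)^{1/2}\left(\sum_{i=1}^{n}|y_i|^2\right)^{1/2} = |\mathbf{x}|\,|\mathbf{y}| \tag{5.15}$$
Proof: Let $\theta \in \mathbb{C}$ be such that $|\theta| = 1$ and
$$\theta\sum_{i=1}^{n} x_i\overline{y_i} = \left|\sum_{i=1}^{n} x_i\overline{y_i}\right|.$$
Thus
$$\theta\sum_{i=1}^{n} x_i\overline{y_i} = \sum_{i=1}^{n} x_i\overline{\left(\overline{\theta}y_i\right)} = \left|\sum_{i=1}^{n} x_i\overline{y_i}\right|.$$
Consider $p(t) \equiv \sum_{i=1}^{n}\left(x_i + t\overline{\theta}y_i\right)\overline{\left(x_i + t\overline{\theta}y_i\right)}$ where $t \in \mathbb{R}$. Then
$$0 \le p(t) = \sum_{i=1}^{n}|x_i|^2 + 2t\operatorname{Re}\left(\theta\sum_{i=1}^{n} x_i\overline{y_i}\right) + t^2\sum_{i=1}^{n}|y_i|^2$$
$$= |\mathbf{x}|^2 + 2t\left|\sum_{i=1}^{n} x_i\overline{y_i}\right| + t^2|\mathbf{y}|^2.$$
If $|\mathbf{y}| = 0$ then 5.15 is obviously true because both sides equal zero. Therefore, assume
$|\mathbf{y}| \ne 0$ and then $p(t)$ is a polynomial of degree two whose graph opens up. Therefore,
it either has no zeros, two zeros or one repeated zero. If it has two zeros, the above
inequality must be violated because in this case the graph must dip below the horizontal axis.
Therefore, it either has no zeros or exactly one. From the quadratic formula this happens
exactly when
$$4\left|\sum_{i=1}^{n} x_i\overline{y_i}\right|^2 - 4|\mathbf{x}|^2|\mathbf{y}|^2 \le 0$$
and so
$$\left|\sum_{i=1}^{n} x_i\overline{y_i}\right| \le |\mathbf{x}|\,|\mathbf{y}|$$
as claimed.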
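Theorem 5.2.17 For length defined in Definition 5.2.16, the following hold.
$$|\mathbf{z}| \ge 0 \text{ and } |\mathbf{z}| = 0 \text{ if and only if } \mathbf{z} = \mathbf{0} \tag{5.16}$$
$$\text{If } \alpha \text{ is a scalar, } |\alpha\mathbf{z}| = |\alpha|\,|\mathbf{z}| \tag{5.17}$$
$$|\mathbf{z} + \mathbf{w}| \le |\mathbf{z}| + |\mathbf{w}|. \tag{5.18}$$
Proof: The first two claims are left as exercises. To establish the third, you use the
same argument which was used in $\mathbb{R}^n$.
$$|\mathbf{z} + \mathbf{w}|^2 = (\mathbf{z} + \mathbf{w}, \mathbf{z} + \mathbf{w})$$
$$= \mathbf{z} \cdot \mathbf{z} + \mathbf{w} \cdot \mathbf{w} + \mathbf{w} \cdot \mathbf{z} + \mathbf{z} \cdot \mathbf{w}$$
$$= |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2\operatorname{Re}\,\mathbf{w} \cdot \mathbf{z}$$
$$\le |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2|\mathbf{w} \cdot \mathbf{z}|$$
$$\le |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2|\mathbf{w}|\,|\mathbf{z}| = (|\mathbf{z}| + |\mathbf{w}|)^2.$$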
All other considerations such as open and closed sets and the like are identical in this
more general context with the corresponding definition in Rn . The main difference is
that here the scalars are complex numbers.
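Definition 5.2.18 Suppose you have a vector space, $V$ and for $\mathbf{z}, \mathbf{w} \in V$ and $\alpha$ a scalar
a norm is a way of measuring distance or magnitude which satisfies the properties 5.16
- 5.18. Thus a norm is something which does the following.
$$||\mathbf{z}|| \ge 0 \text{ and } ||\mathbf{z}|| = 0 \text{ if and only if } \mathbf{z} = \mathbf{0} \tag{5.19}$$
$$\text{If } \alpha \text{ is a scalar, } ||\alpha\mathbf{z}|| = |\alpha|\,||\mathbf{z}|| \tag{5.20}$$
$$||\mathbf{z} + \mathbf{w}|| \le ||\mathbf{z}|| + ||\mathbf{w}||. \tag{5.21}$$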
5.3 Exercises
1. Use formula 5.12 to verify the Cauchy Schwarz inequality and to show that equality occurs if and only if one of the vectors is a scalar multiple of the other.
2. For $\mathbf{u}, \mathbf{v}$ vectors in $\mathbb{R}^3$, define the product, $\mathbf{u} * \mathbf{v} \equiv u_1v_1 + 2u_2v_2 + 3u_3v_3$. Show
the axioms for a dot product all hold for this funny product. Prove
$$|\mathbf{u} * \mathbf{v}| \le (\mathbf{u} * \mathbf{u})^{1/2}(\mathbf{v} * \mathbf{v})^{1/2}.$$
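22. Prove from the axioms of the dot product the parallelogram identity,
$|\mathbf{a} + \mathbf{b}|^2 + |\mathbf{a} - \mathbf{b}|^2 = 2|\mathbf{a}|^2 + 2|\mathbf{b}|^2$.
23. Let $A$ be a real $m \times n$ matrix and let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. Show
$(A\mathbf{x}, \mathbf{y})_{\mathbb{R}^m} = \left(\mathbf{x}, A^T\mathbf{y}\right)_{\mathbb{R}^n}$ where $(\cdot, \cdot)_{\mathbb{R}^k}$ denotes the dot product in $\mathbb{R}^k$. In the notation above,
$A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^T\mathbf{y}$. Use the definition of matrix multiplication to do this.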
T
= B T AT without
Show this dot product satisfies conditions 5.1 - 5.5. Explain why the Cauchy
Schwarz inequality continues to hold in this context and state the Cauchy Schwarz
inequality in terms of integrals.
5.4 Exercises With Answers
$\cos\theta = \frac{-3}{\sqrt{9 + 1 + 1}\,\sqrt{1 + 16 + 4}} = -.19739$. Therefore, you have to solve the equation
$\cos\theta = -.19739$. The solution is $\theta = 1.7695$ radians. You need to use a calculator
or table to solve this.
vu
uu u.
1
14
(1, 2, 3) .
3. If F is a force and D is a vector, show projD (F) = (|F| cos ) u where u is the unit
vector in the direction of D, u = D/ |D| and is the included angle between the
two vectors, F and D. |F| cos is sometimes called the component of the force, F
in the direction, D.
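$$\operatorname{proj}_{\mathbf{D}}(\mathbf{F}) = \frac{\mathbf{F} \cdot \mathbf{D}}{\mathbf{D} \cdot \mathbf{D}}\mathbf{D} = \left(|\mathbf{F}|\,|\mathbf{D}|\cos\theta\right)\frac{1}{|\mathbf{D}|^2}\mathbf{D} = |\mathbf{F}|\cos\theta\,\frac{\mathbf{D}}{|\mathbf{D}|}.$$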
4. A boy drags a sled for 100 feet along the ground by pulling on a rope which is 40
degrees from the horizontal with a force of 10 pounds. How much work does this
force do?
The component of force is $10\cos\left(\frac{40}{180}\pi\right)$ and it acts for 100 feet so the work done
is
$$10\cos\left(\frac{40}{180}\pi\right) \times 100 = 766.04$$
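The same number comes out of the definition of work as a dot product. In the sketch below, which is only an illustration, the coordinate system is an assumption: the sled moves along the positive x axis and the rope lies in the xy plane, 40 degrees above the horizontal. The function name `work` is also made up.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def work(force, p1, p2):
    """Work done by a constant force on an object moving from p1 to p2."""
    displacement = tuple(b - a for a, b in zip(p1, p2))
    return dot(force, displacement)

angle = math.radians(40)
# 10 pounds along a rope 40 degrees above the horizontal (assumed axes).
force = (10 * math.cos(angle), 10 * math.sin(angle))
print(work(force, (0.0, 0.0), (100.0, 0.0)))  # about 766.04 foot pounds
```

Only the horizontal component of the force contributes, because the vertical component is perpendicular to the displacement.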
8. Prove from the axioms of the dot product the parallelogram identity,
$|\mathbf{a} + \mathbf{b}|^2 + |\mathbf{a} - \mathbf{b}|^2 = 2|\mathbf{a}|^2 + 2|\mathbf{b}|^2$.
Use the properties of the dot product and the definition of the norm in terms of
the dot product.
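9. Let $A$ be a real $m \times n$ matrix and let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. Show
$(A\mathbf{x}, \mathbf{y})_{\mathbb{R}^m} = \left(\mathbf{x}, A^T\mathbf{y}\right)_{\mathbb{R}^n}$ where $(\cdot, \cdot)_{\mathbb{R}^k}$ denotes the dot product in $\mathbb{R}^k$. In the notation above,
$A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^T\mathbf{y}$. Use the definition of matrix multiplication to do this.
Remember the $i$th entry of $A\mathbf{x}$ is $\sum_j A_{ij}x_j$. Therefore,
$$A\mathbf{x} \cdot \mathbf{y} = \sum_i (A\mathbf{x})_i y_i = \sum_i\sum_j A_{ij}x_j y_i.$$
Recall now that $\left(A^T\right)_{ij} = A_{ji}$. Use this to write a formula for $\left(\mathbf{x}, A^T\mathbf{y}\right)_{\mathbb{R}^n}$.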
5.5
The cross product is the other way of multiplying two vectors in R3 . It is very different
from the dot product in many ways. First the geometric meaning is discussed and then
a description in terms of coordinates is given. Both descriptions of the cross product are
important. The geometric description is essential in order to understand the applications
to physics and geometry while the coordinate description is the only way to practically
compute the cross product.
Definition 5.5.1 Three vectors, a, b, c form a right handed system if when you extend
the fingers of your right hand along the vector, a and close them in the direction of b,
the thumb points roughly in the direction of c.
For an example of a right handed system of vectors, see the following picture.
[Figure: a right handed system of vectors a, b, c.]
In this picture the vector c points upwards from the plane determined by the other
two vectors. You should consider how a right hand system would differ from a left hand
system. Try using your left hand and you will see that the vector, c would need to point
in the opposite direction as it would for a right hand system.
From now on, the vectors, i, j, k will always form a right handed system. To repeat,
if you extend the fingers of your right hand along i and close them in the direction j, the
thumb points in the direction of k.
[Figure: the vectors i, j, k forming a right handed system.]
The following is the geometric description of the cross product. It gives both the
direction and the magnitude and therefore specifies the vector.
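Definition 5.5.2 Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors in $\mathbb{R}^3$. Then $\mathbf{a} \times \mathbf{b}$ is defined by the following two rules.
1. $|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}|\sin\theta$ where $\theta$ is the included angle.
2. $\mathbf{a} \times \mathbf{b} \cdot \mathbf{a} = 0$, $\mathbf{a} \times \mathbf{b} \cdot \mathbf{b} = 0$, and $\mathbf{a}, \mathbf{b}, \mathbf{a} \times \mathbf{b}$ forms a right hand system.
Note that $|\mathbf{a} \times \mathbf{b}|$ is the area of the parallelogram spanned by $\mathbf{a}$ and $\mathbf{b}$.
[Figure: the parallelogram spanned by a and b, with height |b| sin(θ).]
The cross product satisfies the following properties.
$$\mathbf{a} \times \mathbf{b} = -(\mathbf{b} \times \mathbf{a}), \tag{5.22}$$
For $\alpha$ a scalar,
$$(\alpha\mathbf{a}) \times \mathbf{b} = \alpha(\mathbf{a} \times \mathbf{b}) = \mathbf{a} \times (\alpha\mathbf{b}), \tag{5.23}$$
For $\mathbf{a}, \mathbf{b}$, and $\mathbf{c}$ vectors, one obtains the distributive laws,
$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}, \tag{5.24}$$
$$(\mathbf{b} + \mathbf{c}) \times \mathbf{a} = \mathbf{b} \times \mathbf{a} + \mathbf{c} \times \mathbf{a}. \tag{5.25}$$
Formula 5.22 follows immediately from the definition. The vectors $\mathbf{a} \times \mathbf{b}$ and $\mathbf{b} \times \mathbf{a}$
have the same magnitude, $|\mathbf{a}|\,|\mathbf{b}|\sin\theta$, and an application of the right hand rule shows
they have opposite direction. Formula 5.23 is also fairly clear. If $\alpha$ is a nonnegative
scalar, the direction of $(\alpha\mathbf{a}) \times \mathbf{b}$ is the same as the direction of $\mathbf{a} \times \mathbf{b}$, $\alpha(\mathbf{a} \times \mathbf{b})$ and
$\mathbf{a} \times (\alpha\mathbf{b})$ while the magnitude is just $\alpha$ times the magnitude of $\mathbf{a} \times \mathbf{b}$ which is the same
as the magnitude of $\alpha(\mathbf{a} \times \mathbf{b})$ and $\mathbf{a} \times (\alpha\mathbf{b})$. Using this yields equality in 5.23. In the
case where $\alpha < 0$, everything works the same way except the vectors are all pointing in
the opposite direction and you must multiply by $|\alpha|$ when comparing their magnitudes.
The distributive laws are much harder to establish but the second follows from the first
quite easily. Thus, assuming the first, and using 5.22,
$$(\mathbf{b} + \mathbf{c}) \times \mathbf{a} = -\mathbf{a} \times (\mathbf{b} + \mathbf{c})$$
$$= -(\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c})$$
$$= \mathbf{b} \times \mathbf{a} + \mathbf{c} \times \mathbf{a}.$$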
A proof of the distributive law is given in a later section for those who are interested.
Now from the definition of the cross product,
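$$\mathbf{i} \times \mathbf{j} = \mathbf{k} \qquad \mathbf{j} \times \mathbf{i} = -\mathbf{k}$$
$$\mathbf{k} \times \mathbf{i} = \mathbf{j} \qquad \mathbf{i} \times \mathbf{k} = -\mathbf{j}$$
$$\mathbf{j} \times \mathbf{k} = \mathbf{i} \qquad \mathbf{k} \times \mathbf{j} = -\mathbf{i}$$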
With this information, the following gives the coordinate description of the cross product.
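Proposition 5.5.3 Let $\mathbf{a} = a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}$ and $\mathbf{b} = b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}$ be two vectors.
Then
$$\mathbf{a} \times \mathbf{b} = (a_2b_3 - a_3b_2)\mathbf{i} + (a_3b_1 - a_1b_3)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k}. \tag{5.26}$$
Proof: From the above table and the properties of the cross product listed,
$$(a_1\mathbf{i} + a_2\mathbf{j} + a_3\mathbf{k}) \times (b_1\mathbf{i} + b_2\mathbf{j} + b_3\mathbf{k}) =$$
$$a_1b_2\,\mathbf{i} \times \mathbf{j} + a_1b_3\,\mathbf{i} \times \mathbf{k} + a_2b_1\,\mathbf{j} \times \mathbf{i} + a_2b_3\,\mathbf{j} \times \mathbf{k} + a_3b_1\,\mathbf{k} \times \mathbf{i} + a_3b_2\,\mathbf{k} \times \mathbf{j}$$
$$= a_1b_2\mathbf{k} - a_1b_3\mathbf{j} - a_2b_1\mathbf{k} + a_2b_3\mathbf{i} + a_3b_1\mathbf{j} - a_3b_2\mathbf{i}$$
$$= (a_2b_3 - a_3b_2)\mathbf{i} + (a_3b_1 - a_1b_3)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k} \tag{5.27}$$
A convenient way to remember this formula is to write
$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} \tag{5.28}$$
where you expand the determinant along the top row. This yields
$$(a_2b_3 - a_3b_2)\mathbf{i} - (a_1b_3 - a_3b_1)\mathbf{j} + (a_1b_2 - a_2b_1)\mathbf{k} \tag{5.29}$$
which is the same as 5.27.
Example 5.5.4 Find $(\mathbf{i} - \mathbf{j} + 2\mathbf{k}) \times (3\mathbf{i} - 2\mathbf{j} + \mathbf{k})$.
Use 5.28 to compute this.
$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -1 & 2 \\ 3 & -2 & 1 \end{vmatrix} = \begin{vmatrix} -1 & 2 \\ -2 & 1 \end{vmatrix}\mathbf{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}\mathbf{j} + \begin{vmatrix} 1 & -1 \\ 3 & -2 \end{vmatrix}\mathbf{k} = 3\mathbf{i} + 5\mathbf{j} + \mathbf{k}.$$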
Example 5.5.5 Find the area of the parallelogram determined by the vectors, $(\mathbf{i} - \mathbf{j} + 2\mathbf{k})$
and $(3\mathbf{i} - 2\mathbf{j} + \mathbf{k})$. These are the same two vectors in Example 5.5.4.
From Example 5.5.4 and the geometric description of the cross product, the area is
just the norm of the vector obtained in Example 5.5.4. Thus the area is $\sqrt{9 + 25 + 1} = \sqrt{35}$.
Example 5.5.6 Find the area of the triangle determined by $(1, 2, 3)$, $(0, 2, 5)$, and $(5, 1, 2)$.
This triangle is obtained by connecting the three points with lines. Picking $(1, 2, 3)$
as a starting point, there are two displacement vectors, $(-1, 0, 2)$ and $(4, -1, -1)$ such
that the given vector added to these displacement vectors gives the other two vectors.
The area of the triangle is half the area of the parallelogram determined by $(-1, 0, 2)$
and $(4, -1, -1)$. Thus $(-1, 0, 2) \times (4, -1, -1) = (2, 7, 1)$ and so the area of the triangle
is $\frac{1}{2}\sqrt{4 + 49 + 1} = \frac{3}{2}\sqrt{6}$.
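Observation 5.5.7 In general, if you have three points (vectors) in $\mathbb{R}^3$, $\mathbf{P}, \mathbf{Q}, \mathbf{R}$ the
area of the triangle is given by
$$\frac{1}{2}\left|(\mathbf{Q} - \mathbf{P}) \times (\mathbf{R} - \mathbf{P})\right|.$$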
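The coordinate formula 5.26 and the area formula in Observation 5.5.7 can be combined in a short sketch. The function names `cross`, `norm` and `triangle_area` are illustrative, not from the notes; the data are those of Examples 5.5.4 and 5.5.6.

```python
import math

def cross(a, b):
    """Cross product in R^3 from the coordinate formula 5.26."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def triangle_area(P, Q, R):
    """Area of the triangle with vertices P, Q, R: half of |(Q - P) x (R - P)|."""
    PQ = tuple(q - p for p, q in zip(P, Q))
    PR = tuple(r - p for p, r in zip(P, R))
    return 0.5 * norm(cross(PQ, PR))

print(cross((1, -1, 2), (3, -2, 1)))                    # (3, 5, 1)
print(triangle_area((1, 2, 3), (0, 2, 5), (5, 1, 2)))   # equals (3/2) * sqrt(6)
```

The norm of the first result, $\sqrt{35}$, is the parallelogram area of Example 5.5.5.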
5.5.1
This section gives a proof for 5.24, a fairly difficult topic. It is included here for the
interested student. If you are satisfied with taking the distributive law on faith, it is
not necessary to read this section. The proof given here is quite clever and follows the
one given in [7]. Another approach, based on volumes of parallelepipeds is found in [25]
and is discussed a little later.
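Lemma 5.5.8 Let $\mathbf{b}$ and $\mathbf{c}$ be two vectors. Then $\mathbf{b} \times \mathbf{c} = \mathbf{b} \times \mathbf{c}_{\perp}$ where $\mathbf{c}_{||} + \mathbf{c}_{\perp} = \mathbf{c}$
and $\mathbf{c}_{\perp} \cdot \mathbf{b} = 0$.
Proof: Consider the following picture.
[Figure: the vectors b and c with included angle θ, and the component c⊥ of c perpendicular to b.]
Now $\mathbf{c}_{\perp} = \mathbf{c} - \left(\mathbf{c} \cdot \frac{\mathbf{b}}{|\mathbf{b}|}\right)\frac{\mathbf{b}}{|\mathbf{b}|}$ and so $\mathbf{c}_{\perp}$ is in the plane determined by $\mathbf{c}$ and $\mathbf{b}$. Therefore,
from the geometric definition of the cross product, $\mathbf{b} \times \mathbf{c}$ and $\mathbf{b} \times \mathbf{c}_{\perp}$ have the same
direction. Now, referring to the picture,
$$|\mathbf{b} \times \mathbf{c}_{\perp}| = |\mathbf{b}|\,|\mathbf{c}_{\perp}|$$
$$= |\mathbf{b}|\,|\mathbf{c}|\sin\theta$$
$$= |\mathbf{b} \times \mathbf{c}|.$$
Therefore, $\mathbf{b} \times \mathbf{c}$ and $\mathbf{b} \times \mathbf{c}_{\perp}$ also have the same magnitude and so they are the same
vector.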
With this, the proof of the distributive law is in the following theorem.
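Theorem 5.5.9 Let $\mathbf{a}, \mathbf{b}$, and $\mathbf{c}$ be vectors in $\mathbb{R}^3$. Then
$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c} \tag{5.30}$$
Proof: Suppose first that $\mathbf{a} \cdot \mathbf{b} = \mathbf{a} \cdot \mathbf{c} = 0$, and picture $\mathbf{a}$ as a vector coming out of the page.
[Figure: on the right, the parallelogram with sides b and c and diagonal b + c; on the left, the four sided figure with sides a × b and a × c and diagonal a × (b + c).]
Then $\mathbf{a} \times \mathbf{b}$, $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$, and $\mathbf{a} \times \mathbf{c}$ are each vectors in the same plane, perpendicular
to $\mathbf{a}$ as shown. Thus $\mathbf{a} \times \mathbf{c} \cdot \mathbf{c} = 0$, $\mathbf{a} \times (\mathbf{b} + \mathbf{c}) \cdot (\mathbf{b} + \mathbf{c}) = 0$, and $\mathbf{a} \times \mathbf{b} \cdot \mathbf{b} = 0$. This
implies that to get $\mathbf{a} \times \mathbf{b}$ you move counterclockwise through an angle of $\pi/2$ radians
from the vector, $\mathbf{b}$. Similar relationships exist between the vectors $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ and $\mathbf{b} + \mathbf{c}$
and the vectors $\mathbf{a} \times \mathbf{c}$ and $\mathbf{c}$. Thus the angle between $\mathbf{a} \times \mathbf{b}$ and $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ is the same
as the angle between $\mathbf{b} + \mathbf{c}$ and $\mathbf{b}$ and the angle between $\mathbf{a} \times \mathbf{c}$ and $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ is the
same as the angle between $\mathbf{c}$ and $\mathbf{b} + \mathbf{c}$. In addition to this, since $\mathbf{a}$ is perpendicular to
these vectors,
$$|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}|\,|\mathbf{b}|, \quad |\mathbf{a} \times (\mathbf{b} + \mathbf{c})| = |\mathbf{a}|\,|\mathbf{b} + \mathbf{c}|, \quad \text{and} \quad |\mathbf{a} \times \mathbf{c}| = |\mathbf{a}|\,|\mathbf{c}|.$$
Therefore,
$$\frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{b} + \mathbf{c}|} = \frac{|\mathbf{a} \times \mathbf{c}|}{|\mathbf{c}|} = \frac{|\mathbf{a} \times \mathbf{b}|}{|\mathbf{b}|} = |\mathbf{a}|$$
and so
$$\frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{a} \times \mathbf{c}|} = \frac{|\mathbf{b} + \mathbf{c}|}{|\mathbf{c}|}, \qquad \frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{a} \times \mathbf{b}|} = \frac{|\mathbf{b} + \mathbf{c}|}{|\mathbf{b}|}$$
showing the triangles making up the parallelogram on the right and the four sided figure
on the left in the above picture are similar. It follows the four sided figure on the left
is in fact a parallelogram and this implies the diagonal is the vector sum of the vectors
on the sides, yielding 5.30.
Now suppose it is not necessarily the case that $\mathbf{a} \cdot \mathbf{b} = \mathbf{a} \cdot \mathbf{c} = 0$. Then write $\mathbf{b} = \mathbf{b}_{||} + \mathbf{b}_{\perp}$
where $\mathbf{b}_{\perp} \cdot \mathbf{a} = 0$. Similarly $\mathbf{c} = \mathbf{c}_{||} + \mathbf{c}_{\perp}$. By the above lemma and what was just
shown,
$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times (\mathbf{b} + \mathbf{c})_{\perp}$$
$$= \mathbf{a} \times (\mathbf{b}_{\perp} + \mathbf{c}_{\perp})$$
$$= \mathbf{a} \times \mathbf{b}_{\perp} + \mathbf{a} \times \mathbf{c}_{\perp}$$
$$= \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}.$$
This proves the theorem.
The result of Problem 17 of the exercises 5.3 is used to go from the first to the second
line.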
5.5.2 Torque
Imagine you are using a wrench to loosen a nut. The idea is to turn the nut by applying
a force to the end of the wrench. If you push or pull the wrench directly toward or away
from the nut, it should be obvious from experience that no progress will be made in
turning the nut. The important thing is the component of force perpendicular to the
wrench. It is this component of force which will cause the nut to turn. For example see
the following picture.
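[Figure: a force F applied at the end of a wrench at position R, together with its component F⊥ perpendicular to the wrench.]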
i
j k i
j
k
1 1 2 1 5 1
Example 5.5.11 Find if possible a single force vector, $\mathbf{F}$ which if applied at the point
$\mathbf{i} + \mathbf{j} + \mathbf{k}$ will produce the same torque as the above two forces acting at the given points.
This is fairly routine. The problem is to find $\mathbf{F} = F_1\mathbf{i} + F_2\mathbf{j} + F_3\mathbf{k}$ which produces
the above torque vector. Therefore,
$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 1 & 1 \\ F_1 & F_2 & F_3 \end{vmatrix} = -27\mathbf{i} - 8\mathbf{j} - 8\mathbf{k}$$
which reduces to $(F_3 - F_2)\mathbf{i} + (F_1 - F_3)\mathbf{j} + (F_2 - F_1)\mathbf{k} = -27\mathbf{i} - 8\mathbf{j} - 8\mathbf{k}$. This amounts
to solving the system of three equations in three unknowns, $F_1$, $F_2$, and $F_3$,
$$F_3 - F_2 = -27$$
$$F_1 - F_3 = -8$$
$$F_2 - F_1 = -8$$
However, there is no solution to these three equations. (Why?) Therefore no single
force acting at the point $\mathbf{i} + \mathbf{j} + \mathbf{k}$ will produce the given torque.
5.5.3 Center Of Mass
The mass of an object is a measure of how much stuff there is in the object. An object
has mass equal to one kilogram, a unit of mass in the metric system, if it would exactly
balance a known one kilogram object when placed on a balance. The known object
is one kilogram by definition. The mass of an object does not depend on where the
balance is used. It would be one kilogram on the moon as well as on the earth. The
weight of an object is something else. It is the force exerted on the object by gravity
and has magnitude gm where g is a constant called the acceleration of gravity. Thus
the weight of a one kilogram object would be different on the moon which has much less
gravity, smaller g, than on the earth. An important idea is that of the center of mass.
This is the point at which an object will balance no matter how it is turned.
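Definition 5.5.12 Let an object consist of $p$ point masses, $m_1, \cdots, m_p$ with the position
of the $k$th of these at $\mathbf{R}_k$. The center of mass of this object, $\mathbf{R}_0$ is the point satisfying
$$\sum_{k=1}^{p}(\mathbf{R}_k - \mathbf{R}_0) \times gm_k\mathbf{u} = \mathbf{0}$$
for all unit vectors, $\mathbf{u}$.
Using the properties of the cross product, this says
$$\left(\sum_{k=1}^{p}\mathbf{R}_k gm_k - \mathbf{R}_0\sum_{k=1}^{p} gm_k\right) \times \mathbf{u} = \mathbf{0}$$
for any choice of unit vector, $\mathbf{u}$. You should verify that if $\mathbf{a} \times \mathbf{u} = \mathbf{0}$ for all $\mathbf{u}$, then it
must be the case that $\mathbf{a} = \mathbf{0}$. Then the above formula requires that
$$\sum_{k=1}^{p}\mathbf{R}_k gm_k - \mathbf{R}_0\sum_{k=1}^{p} gm_k = \mathbf{0}.$$
Dividing by $g$, and then by $\sum_{k=1}^{p} m_k$,
$$\mathbf{R}_0 = \frac{\sum_{k=1}^{p}\mathbf{R}_k m_k}{\sum_{k=1}^{p} m_k}. \tag{5.32}$$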
This is the formula for the center of mass of a collection of point masses. To consider
the center of mass of a solid consisting of continuously distributed masses, you need the
methods of calculus.
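Example 5.5.13 Let $m_1 = 5$, $m_2 = 6$, and $m_3 = 3$ where the masses are in kilograms.
Suppose $m_1$ is located at $2\mathbf{i} + 3\mathbf{j} + \mathbf{k}$, $m_2$ is located at $\mathbf{i} - 3\mathbf{j} + 2\mathbf{k}$ and $m_3$ is located at
$2\mathbf{i} - \mathbf{j} + 3\mathbf{k}$. Find the center of mass of these three masses.
Using 5.32
$$\mathbf{R}_0 = \frac{5(2\mathbf{i} + 3\mathbf{j} + \mathbf{k}) + 6(\mathbf{i} - 3\mathbf{j} + 2\mathbf{k}) + 3(2\mathbf{i} - \mathbf{j} + 3\mathbf{k})}{5 + 6 + 3} = \frac{11}{7}\mathbf{i} - \frac{3}{7}\mathbf{j} + \frac{13}{7}\mathbf{k}$$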
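Formula 5.32 is a mass weighted average of the positions, which takes only a few lines to compute. The sketch below is illustrative; the function name `center_of_mass` is an assumption, and exact fractions are used so the result can be compared with the worked example (integer masses and coordinates are assumed).

```python
from fractions import Fraction

def center_of_mass(masses, positions):
    """Center of mass of point masses: sum(m_k R_k) / sum(m_k).

    Expects integer masses and integer coordinates so Fraction stays exact.
    """
    total = sum(masses)
    if total == 0:
        raise ValueError("total mass must be nonzero")
    dim = len(positions[0])
    return tuple(
        Fraction(sum(m * r[i] for m, r in zip(masses, positions)), total)
        for i in range(dim)
    )

R0 = center_of_mass([5, 6, 3], [(2, 3, 1), (1, -3, 2), (2, -1, 3)])
print(R0)  # components 11/7, -3/7, 13/7
```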
5.5.4 Angular Velocity
Definition 5.5.14 In a rotating body, a vector, $\boldsymbol{\Omega}$ is called an angular velocity vector if the velocity of a point having position vector, $\mathbf{u}$ relative to the body is given by
$\boldsymbol{\Omega} \times \mathbf{u}$.
The existence of an angular velocity vector is the key to understanding motion in
a moving system of coordinates. It is used to explain the motion on the surface of the
rotating earth. For example, have you ever wondered why low pressure areas rotate
counter clockwise in the upper hemisphere but clockwise in the lower hemisphere? To
quantify these things, you will need the concept of an angular velocity vector. Details
are presented later for interesting examples. In the above example, think of a coordinate
system fixed in the rotating body. Thus if you were riding on the rotating body, you
would observe this coordinate system as fixed even though it is not.
Example 5.5.15 A wheel rotates counter clockwise about the vector i + j + k at 60
revolutions per minute. This means that if the thumb of your right hand were to point
in the direction of i + j + k your fingers of this hand would wrap in the direction of
rotation. Find the angular velocity vector for this wheel. Assume the unit of distance is
meters and the unit of time is minutes.
Let $\omega = 60 \times 2\pi = 120\pi$. This is the number of radians per minute corresponding to 60 revolutions per minute. Then the angular velocity vector is $\frac{120\pi}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$. Note this
gives what you would expect in the case the position vector to the point is perpendicular
to i + j + k and at a distance of r. This is because of the geometric description of the
cross product. The magnitude of the vector is $r120\pi$ meters per minute and corresponds
to the speed and an exercise with the right hand shows the direction is correct also.
However, if this body is rigid, this will work for every other point in it, even those for
which the position vector is not perpendicular to the given vector. A complete analysis
of this is given later.
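Example 5.5.16 A wheel rotates counter clockwise about the vector i + j + k at 60 revolutions per minute exactly as in Example 5.5.15. Let $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ denote an orthogonal right handed system attached to the rotating wheel in which $\mathbf{u}_3 = \frac{1}{\sqrt{3}}(\mathbf{i} + \mathbf{j} + \mathbf{k})$. Thus $\mathbf{u}_1$ and $\mathbf{u}_2$ depend on time. Find the velocity of the point of the wheel located at the point $2\mathbf{u}_1 + 3\mathbf{u}_2 - \mathbf{u}_3$. Note this point is not fixed in space. It is moving.
Since $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is a right handed system like i, j, k, everything applies to this system in the same way as with i, j, k. Thus the cross product is given by
$$(a\mathbf{u}_1 + b\mathbf{u}_2 + c\mathbf{u}_3) \times (d\mathbf{u}_1 + e\mathbf{u}_2 + f\mathbf{u}_3) = \begin{vmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ a & b & c \\ d & e & f \end{vmatrix}.$$
In terms of the vectors $\mathbf{u}_i$, the angular velocity vector is $120\pi\mathbf{u}_3$, so the velocity of the given point is
$$\begin{vmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \\ 0 & 0 & 120\pi \\ 2 & 3 & -1 \end{vmatrix} = -360\pi\mathbf{u}_1 + 240\pi\mathbf{u}_2$$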
in meters per minute. Note how this gives the answer in terms of these vectors which
are fixed in the body, not in space. Since ui depends on t, this shows the answer in
this case does also. Of course this is right. Just think of what is going on with the
wheel rotating. Those vectors which are fixed in the wheel are moving in space. The
velocity of a point in the wheel should be constantly changing. However, its speed will
not change. The speed will be the magnitude of the velocity and this is
$$\sqrt{(-360\pi\mathbf{u}_1 + 240\pi\mathbf{u}_2) \cdot (-360\pi\mathbf{u}_1 + 240\pi\mathbf{u}_2)}$$
which from the properties of the dot product equals
$$\sqrt{(-360\pi)^2 + (240\pi)^2} = 120\sqrt{13}\,\pi$$
because the ui are given to be orthogonal.
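The velocity and speed in Example 5.5.16 can be reproduced numerically. This is only a sketch: it works with components relative to $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$, and it assumes these vectors have unit length as well as being orthogonal, so that the usual coordinate formula for the cross product applies. The helper name `cross` is illustrative.

```python
import math


def cross(a, b):
    """Coordinate formula for the cross product in a right handed orthonormal system."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


# Components relative to u1, u2, u3 (assumed orthonormal).
omega = (0.0, 0.0, 120 * math.pi)   # angular velocity 120*pi*u3, radians per minute
point = (2.0, 3.0, -1.0)            # the point 2u1 + 3u2 - u3

velocity = cross(omega, point)      # components of Omega x u
speed = math.sqrt(sum(c * c for c in velocity))
print(velocity, speed)
```

The components come out as $-360\pi$, $240\pi$, and $0$, and the speed agrees with $120\sqrt{13}\,\pi$ meters per minute.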
5.5.5
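[Figure: the parallelepiped determined by a, b, and c, showing $\mathbf{a} \times \mathbf{b}$ and the angle $\theta$ between c and $\mathbf{a} \times \mathbf{b}$.]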
You notice the area of the base of the parallelepiped, the parallelogram determined
by the vectors, a and b has area equal to $|\mathbf{a} \times \mathbf{b}|$ while the altitude of the parallelepiped
is $|\mathbf{c}| \cos\theta$ where $\theta$ is the angle shown in the picture between c and $\mathbf{a} \times \mathbf{b}$. Therefore,
the volume of this parallelepiped is the area of the base times the altitude which is just
$$|\mathbf{a} \times \mathbf{b}|\,|\mathbf{c}| \cos\theta = \mathbf{a} \times \mathbf{b} \cdot \mathbf{c}.$$
This expression is known as the box product and is sometimes written as [a, b, c] . You
should consider what happens if you interchange the b with the c or the a with the c.
You can see geometrically from drawing pictures that this merely introduces a minus
sign. In any case the box product of three vectors always equals either the volume of
the parallelepiped determined by the three vectors or else minus this volume.
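Example 5.5.18 Find the volume of the parallelepiped determined by the vectors, $\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}$, $\mathbf{i} + 3\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.
According to the above discussion, pick any two of these, take the cross product and
then take the dot product of this with the third of these vectors. The result will be
either the desired volume or minus the desired volume.
$$(\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}) \times (\mathbf{i} + 3\mathbf{j} - 6\mathbf{k}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -5 \\ 1 & 3 & -6 \end{vmatrix} = 3\mathbf{i} + \mathbf{j} + \mathbf{k}$$
Now take the dot product of this vector with the third which yields
$$(3\mathbf{i} + \mathbf{j} + \mathbf{k}) \cdot (3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) = 9 + 2 + 3 = 14.$$
This shows the volume of this parallelepiped is 14 cubic units.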
There is a fundamental observation which comes directly from the geometric definitions of the cross product and the dot product.
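Lemma 5.5.19 Let a, b, and c be vectors. Then $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$.
Proof: This follows from observing that either $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$ and $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ both give
the volume of the parallelepiped or they both give $-1$ times the volume.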
An Alternate Proof Of The Distributive Law
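Here is another proof of the distributive law for the cross product. Let x be a vector.
From the above observation,
$$\begin{aligned}
\mathbf{x} \cdot \mathbf{a} \times (\mathbf{b} + \mathbf{c}) &= (\mathbf{x} \times \mathbf{a}) \cdot (\mathbf{b} + \mathbf{c}) \\
&= (\mathbf{x} \times \mathbf{a}) \cdot \mathbf{b} + (\mathbf{x} \times \mathbf{a}) \cdot \mathbf{c} \\
&= \mathbf{x} \cdot \mathbf{a} \times \mathbf{b} + \mathbf{x} \cdot \mathbf{a} \times \mathbf{c} \\
&= \mathbf{x} \cdot (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}).
\end{aligned}$$
Therefore,
$$\mathbf{x} \cdot [\mathbf{a} \times (\mathbf{b} + \mathbf{c}) - (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c})] = 0$$
for all x. In particular, this holds for $\mathbf{x} = \mathbf{a} \times (\mathbf{b} + \mathbf{c}) - (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c})$ showing that
$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}$ and this proves the distributive law for the cross product
another way.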
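Observation 5.5.20 Suppose you have three vectors, u = (a, b, c), v = (d, e, f), and
w = (g, h, i). Then $\mathbf{u} \cdot \mathbf{v} \times \mathbf{w}$ is given by the following.
$$\begin{aligned}
\mathbf{u} \cdot \mathbf{v} \times \mathbf{w} &= (a, b, c) \cdot \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ d & e & f \\ g & h & i \end{vmatrix} \\
&= a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \\
&= \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}.
\end{aligned}$$
The message is that to take the box product, you can simply take the determinant of
the matrix which results by letting the rows be the rectangular components of the given
vectors in the order in which they occur in the box product.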
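As a rough numerical check of this observation, the box product can be computed three ways: as $(\mathbf{u} \times \mathbf{v}) \cdot \mathbf{w}$, as $\mathbf{u} \cdot (\mathbf{v} \times \mathbf{w})$, and as a 3 by 3 determinant. The sketch below uses the vectors of Example 5.5.18; the helper names `cross`, `dot`, and `det3` are illustrative and not from the text.

```python
def cross(a, b):
    """Cross product of two vectors given by rectangular components."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two vectors of the same length."""
    return sum(x * y for x, y in zip(a, b))


def det3(rows):
    """Determinant of a 3 by 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = rows
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


u, v, w = (1, 2, -5), (1, 3, -6), (3, 2, 3)

box_cross_first = dot(cross(u, v), w)   # (u x v) . w
box_dot_first = dot(u, cross(v, w))     # u . (v x w)
box_det = det3([u, v, w])               # rows are u, v, w in order
print(box_cross_first, box_dot_first, box_det)
```

All three values agree with the volume 14 found in Example 5.5.18, which is also what Lemma 5.5.19 predicts for the first two.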
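5.6
$$\mathbf{v} \times \mathbf{w} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = (v_2 w_3 - v_3 w_2)\mathbf{i} + (w_1 v_3 - v_1 w_3)\mathbf{j} + (v_1 w_2 - v_2 w_1)\mathbf{k}$$
Next consider $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$ which is given by
$$\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ (v_2 w_3 - v_3 w_2) & (w_1 v_3 - v_1 w_3) & (v_1 w_2 - v_2 w_1) \end{vmatrix}.$$
Expanding this determinant and collecting terms gives
$$\mathbf{u} \times (\mathbf{v} \times \mathbf{w}) = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{u} \cdot \mathbf{v})\mathbf{w}. \tag{5.33}$$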
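A related formula is
$$\begin{aligned}
(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} &= -[\mathbf{w} \times (\mathbf{u} \times \mathbf{v})] \\
&= -[\mathbf{u}(\mathbf{w} \cdot \mathbf{v}) - \mathbf{v}(\mathbf{w} \cdot \mathbf{u})] \\
&= \mathbf{v}(\mathbf{w} \cdot \mathbf{u}) - \mathbf{u}(\mathbf{w} \cdot \mathbf{v}).
\end{aligned} \tag{5.34}$$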
This derivation is simply wretched and it does nothing for other identities which may
arise in applications. Actually, the above two formulas, 5.33 and 5.34 are sufficient for
most applications if you are creative in using them, but there is another way. This other
way allows you to discover such vector identities as the above without any creativity
or any cleverness. Therefore, it is far superior to the above nasty computation. It is
a vector identity discovering machine and it is this which is the main topic in what
follows.
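There are two special symbols, $\delta_{ij}$ and $\varepsilon_{ijk}$ which are very useful in dealing with
vector identities. To begin with, here is the definition of these symbols.
Definition 5.6.1 The symbol, $\delta_{ij}$, called the Kronecker delta symbol is defined as follows.
$$\delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}.$$
With the Kronecker symbol, i and j can equal any integer in $\{1, 2, \cdots, n\}$ for any $n \in \mathbb{N}$.
Definition 5.6.2 For i, j, and k integers in the set, {1, 2, 3}, $\varepsilon_{ijk}$ is defined as follows.
$$\varepsilon_{ijk} \equiv \begin{cases} 1 & \text{if } (i, j, k) = (1, 2, 3), (2, 3, 1), \text{ or } (3, 1, 2) \\ -1 & \text{if } (i, j, k) = (3, 2, 1), (2, 1, 3), \text{ or } (1, 3, 2) \\ 0 & \text{if there are any repeated integers} \end{cases}$$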
The way to think of $\varepsilon_{ijk}$ is that $\varepsilon_{123} = 1$ and if you switch any two of the numbers in
the list i, j, k, it changes the sign. Thus $\varepsilon_{ijk} = -\varepsilon_{jik}$ and $\varepsilon_{ijk} = -\varepsilon_{kji}$ etc. You should
check that this rule reduces to the above definition. For example, it immediately implies
that if there is a repeated index, the answer is zero. This follows because $\varepsilon_{iij} = -\varepsilon_{iij}$
and so $\varepsilon_{iij} = 0$.
It is useful to use the Einstein summation convention when dealing with these symbols. Simply
stated, the convention is that you sum over the repeated index. Thus $a_i b_i$
means $\sum_i a_i b_i$. Also, $\delta_{ij} x_j$ means $\sum_j \delta_{ij} x_j = x_i$. When you use this convention, there
is one very important thing to never forget. It is this: Never have an index be repeated
more than once. Thus $a_i b_i$ is all right but $a_{ii} b_i$ is not. The reason for this is that you
end up getting confused about what is meant. If you want to write $\sum_i a_i b_i c_i$ it is best
to simply use the summation notation. There is a very important reduction identity
connecting these two symbols.
Lemma 5.6.3 The following holds.
$$\varepsilon_{ijk}\varepsilon_{irs} = (\delta_{jr}\delta_{ks} - \delta_{kr}\delta_{js}).$$
Proof: If $\{j, k\} \neq \{r, s\}$ then in every term in the sum on the left either $\varepsilon_{ijk}$
or $\varepsilon_{irs}$ contains a repeated index. Therefore, the left side equals zero. The right side
also equals zero in this case. To see this, note that if the two sets are not equal, then
there is one of the indices in one of the sets which is not in the other set. For example,
it could be that j is not equal to either r or s. Then the right side equals zero.
Therefore, it can be assumed $\{j, k\} = \{r, s\}$. If j = r and k = s for $s \neq r$, then there
is exactly one term in the sum on the left which is nonzero and it equals 1. The right also reduces to 1
in this case. If j = s and k = r, there is exactly one term in the sum on the left which
is nonzero and it must equal -1. The right side also reduces to -1 in this case. If there
is a repeated index in {j, k}, then every term in the sum on the left equals zero. The
right also reduces to zero in this case because then j = k = r = s and so the right side
becomes $(1)(1) - (1)(1) = 0$.
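Since the indices only range over {1, 2, 3}, the reduction identity can also be confirmed by brute force over all 81 choices of (j, k, r, s). The following sketch does exactly that; the names `eps` and `delta` are illustrative, and the permutation symbol is coded directly from the rule that $\varepsilon_{123} = 1$ with a sign change under any switch of two indices.

```python
from itertools import product

EVEN = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}
ODD = {(3, 2, 1), (2, 1, 3), (1, 3, 2)}


def eps(i, j, k):
    """Permutation symbol for indices in {1, 2, 3}."""
    if (i, j, k) in EVEN:
        return 1
    if (i, j, k) in ODD:
        return -1
    return 0  # some index is repeated


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


idx = (1, 2, 3)
identity_holds = all(
    sum(eps(i, j, k) * eps(i, r, s) for i in idx)       # left side, summed over i
    == delta(j, r) * delta(k, s) - delta(k, r) * delta(j, s)
    for j, k, r, s in product(idx, repeat=4)
)
print(identity_holds)
```

This does not replace the proof, but it is a quick way to catch a sign or index slip when using the identity.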
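Proposition 5.6.4 Let u, v be vectors in $\mathbb{R}^n$ where the Cartesian coordinates of u are
$(u_1, \cdots, u_n)$ and the Cartesian coordinates of v are $(v_1, \cdots, v_n)$. Then $\mathbf{u} \cdot \mathbf{v} = u_i v_i$. If
u, v are vectors in $\mathbb{R}^3$, then
$$(\mathbf{u} \times \mathbf{v})_i = \varepsilon_{ijk} u_j v_k.$$
Also, $\delta_{ik} a_k = a_i$.
Proof: The first claim is obvious from the definition of the dot product. The second
is verified by simply checking it works. For example,
$$\mathbf{u} \times \mathbf{v} \equiv \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}$$
and so
$$(\mathbf{u} \times \mathbf{v})_1 = (u_2 v_3 - u_3 v_2).$$
From the above formula in the proposition,
$$\varepsilon_{1jk} u_j v_k \equiv u_2 v_3 - u_3 v_2,$$
the same thing. The cases for $(\mathbf{u} \times \mathbf{v})_2$ and $(\mathbf{u} \times \mathbf{v})_3$ are verified similarly. The last
claim follows directly from the definition.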
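The formula $(\mathbf{u} \times \mathbf{v})_i = \varepsilon_{ijk} u_j v_k$ translates almost word for word into a double sum over j and k. The sketch below is one way to write it; the function names are illustrative, and the permutation symbol is computed here by a closed-form expression that is an assumption of this sketch rather than something stated in the text (it gives 1, -1, or 0 for indices in {1, 2, 3}).

```python
def eps(i, j, k):
    """Permutation symbol on {1, 2, 3}; the product below is always -2, 0, or 2."""
    return (i - j) * (j - k) * (k - i) // 2


def cross_index(u, v):
    """Cross product via (u x v)_i = eps_ijk u_j v_k, summing over j and k."""
    idx = (1, 2, 3)
    return tuple(
        sum(eps(i, j, k) * u[j - 1] * v[k - 1] for j in idx for k in idx)
        for i in idx
    )


def cross_coords(u, v):
    """Cross product from the usual coordinate description, for comparison."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


u, v = (1, 2, -5), (1, 3, -6)
print(cross_index(u, v), cross_coords(u, v))
```

Both routes give the same components, here those of $3\mathbf{i} + \mathbf{j} + \mathbf{k}$.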
With this notation, you can easily discover vector identities and simplify expressions
which involve the cross product.
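For instance, for $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$,
$$\begin{aligned}
((\mathbf{u} \times \mathbf{v}) \times \mathbf{w})_i &= \varepsilon_{ijk} (\mathbf{u} \times \mathbf{v})_j w_k \\
&= \varepsilon_{ijk}\varepsilon_{jrs} u_r v_s w_k \\
&= -\varepsilon_{jik}\varepsilon_{jrs} u_r v_s w_k \\
&= -(\delta_{ir}\delta_{ks} - \delta_{is}\delta_{kr}) u_r v_s w_k \\
&= -(u_i v_k w_k - u_k v_i w_k) \\
&= \mathbf{u} \cdot \mathbf{w}\, v_i - \mathbf{v} \cdot \mathbf{w}\, u_i \\
&= ((\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u})_i.
\end{aligned}$$
Since this holds for all i, it follows that
$$(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}.$$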
This is good notation and it will be used in the rest of the book whenever convenient.
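An identity found this way can be spot checked on integer vectors, where the arithmetic is exact. The sketch below tests $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$ on a few triples; the helper names and the particular test vectors are arbitrary choices, and agreement on examples is evidence, not a proof.

```python
def cross(a, b):
    """Cross product from rectangular components."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product."""
    return sum(x * y for x, y in zip(a, b))


def lhs(u, v, w):
    """(u x v) x w computed directly."""
    return cross(cross(u, v), w)


def rhs(u, v, w):
    """(u . w) v - (v . w) u."""
    uw, vw = dot(u, w), dot(v, w)
    return tuple(uw * vi - vw * ui for ui, vi in zip(u, v))


triples = [
    ((1, 2, -5), (1, 3, -6), (3, 2, 3)),
    ((1, 0, 0), (0, 1, 0), (0, 1, 0)),
    ((2, -1, 4), (0, 7, 1), (-3, 5, 2)),
]
checks = [lhs(u, v, w) == rhs(u, v, w) for u, v, w in triples]
print(checks)
```

The second triple is $(\mathbf{i} \times \mathbf{j}) \times \mathbf{j}$, which gives $-\mathbf{i}$, whereas $\mathbf{i} \times (\mathbf{j} \times \mathbf{j})$ is the zero vector, so the position of the parentheses matters.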
5.7
Exercises
11. Find the area of the parallelogram determined by the vectors, (1, 0, 3) and (4, 2, 1) .
12. Find the area of the parallelogram determined by the vectors, (1, 2, 2) and
(3, 1, 1) .
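13. Find the volume of the parallelepiped determined by the vectors, $\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$, $\mathbf{i} - 2\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.
14. Find the volume of the parallelepiped determined by the vectors, $\mathbf{i} + \mathbf{j} - 5\mathbf{k}$, $\mathbf{i} + 5\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + \mathbf{j} + 3\mathbf{k}$.
15. Find the volume of the parallelepiped determined by the vectors, $\mathbf{i} + 6\mathbf{j} + 5\mathbf{k}$, $\mathbf{i} + 5\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + \mathbf{j} + \mathbf{k}$.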
16. Suppose a, b, and c are three vectors whose components are all integers. Can you
conclude the volume of the parallelepiped determined from these three vectors will
always be an integer?
17. What does it mean geometrically if the box product of three vectors gives zero?
18. It is desired to find an equation of a plane containing the two vectors, a and b
and the point 0. Using Problem 17, show an equation for this plane is
$$\begin{vmatrix} x & y & z \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} = 0$$
That is, the set of all (x, y, z) such that the above expression equals zero.
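19. Using the notion of the box product yielding either plus or minus the volume of
the parallelepiped determined by the given three vectors, show that
$$(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$$
In other words, the dot and the cross can be switched as long as the order of the
vectors remains the same. Hint: There are two ways to do this, by the coordinate
description of the dot and cross product and by geometric reasoning.
20. Is $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$? What is the meaning of $\mathbf{a} \times \mathbf{b} \times \mathbf{c}$? Explain. Hint:
Try $(\mathbf{i} \times \mathbf{j}) \times \mathbf{j}$.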
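21. Verify directly that the coordinate description of the cross product, $\mathbf{a} \times \mathbf{b}$ has the
property that it is perpendicular to both a and b. Then show by direct computation that this coordinate description satisfies
$$|\mathbf{a} \times \mathbf{b}|^2 = |\mathbf{a}|^2 |\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2 = |\mathbf{a}|^2 |\mathbf{b}|^2 \left(1 - \cos^2(\theta)\right)$$
where $\theta$ is the angle included between the two vectors. Explain why $|\mathbf{a} \times \mathbf{b}|$ has the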
correct magnitude. All that is missing is the material about the right hand rule.
Verify directly from the coordinate description of the cross product that the right
thing happens with regards to the vectors i, j, k. Next verify that the distributive
law holds for the coordinate description of the cross product. This gives another
way to approach the cross product. First define it in terms of coordinates and
then get the geometric properties from this.
22. Discover a vector identity for $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$.
$$\left(A^{-1}\right)_{ks} = \frac{1}{2\det(A)} \varepsilon_{rps}\varepsilon_{ijk} A_{pj} A_{ri}.$$
31. When you have a rotating rigid body with angular velocity vector, $\boldsymbol{\Omega}$ then the
velocity, $\mathbf{u}'$ is given by $\mathbf{u}' = \boldsymbol{\Omega} \times \mathbf{u}$. It turns out that all the usual calculus rules
such as the product rule hold. Also, $\mathbf{u}''$ is the acceleration. Show using the product
rule that for $\boldsymbol{\Omega}$ a constant vector,
$$\mathbf{u}'' = \boldsymbol{\Omega} \times (\boldsymbol{\Omega} \times \mathbf{u}).$$
It turns out this is the centripetal acceleration. Note how it involves cross products. Things get really interesting when you move about on the rotating body.
Weird forces are felt. This is in the section on moving coordinate systems.
5.8
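1. If you only assume 5.31 holds for u = i, j, k, show that this implies 5.31 holds for
all unit vectors, u.
Suppose then that $\left(\sum_{k=1}^{p} \mathbf{R}_k g m_k - \mathbf{R}_0 \sum_{k=1}^{p} g m_k\right) \times \mathbf{u} = \mathbf{0}$ for u = i, j, k. Then if
u is an arbitrary unit vector, u must be of the form ai + bj + ck. Now from the
distributive property of the cross product and letting $\mathbf{w} = \left(\sum_{k=1}^{p} \mathbf{R}_k g m_k - \mathbf{R}_0 \sum_{k=1}^{p} g m_k\right)$,
this says
$$\begin{aligned}
\left(\sum_{k=1}^{p} \mathbf{R}_k g m_k - \mathbf{R}_0 \sum_{k=1}^{p} g m_k\right) \times \mathbf{u} &= \mathbf{w} \times (a\mathbf{i} + b\mathbf{j} + c\mathbf{k}) \\
&= a\mathbf{w} \times \mathbf{i} + b\mathbf{w} \times \mathbf{j} + c\mathbf{w} \times \mathbf{k} \\
&= \mathbf{0} + \mathbf{0} + \mathbf{0} = \mathbf{0}.
\end{aligned}$$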
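2. Let $m_1 = 4$, $m_2 = 3$, and $m_3 = 1$ where the masses are in kilograms and the
distance is in meters. Suppose $m_1$ is located at $2\mathbf{i} - \mathbf{j} + \mathbf{k}$, $m_2$ is located at
$2\mathbf{i} - 3\mathbf{j} + \mathbf{k}$ and $m_3$ is located at $2\mathbf{i} + \mathbf{j} + 3\mathbf{k}$. Find the center of mass of these three
masses.
Let the center of mass be located at ai + bj + ck. Then $(4 + 3 + 1)(a\mathbf{i} + b\mathbf{j} + c\mathbf{k}) = 4(2\mathbf{i} - \mathbf{j} + \mathbf{k}) + 3(2\mathbf{i} - 3\mathbf{j} + \mathbf{k}) + 1(2\mathbf{i} + \mathbf{j} + 3\mathbf{k}) = 16\mathbf{i} - 12\mathbf{j} + 10\mathbf{k}$. Therefore,
$a = 2$, $b = -\frac{3}{2}$ and $c = \frac{5}{4}$. The center of mass is then $2\mathbf{i} - \frac{3}{2}\mathbf{j} + \frac{5}{4}\mathbf{k}$.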
3. Find the angular velocity vector of a rigid body which rotates counter clockwise
about the vector $\mathbf{i} - \mathbf{j} + \mathbf{k}$ at 20 revolutions per minute. Assume distance is measured in meters.
The angular velocity is $20 \times 2\pi = 40\pi$. Then $\boldsymbol{\Omega} = 40\pi \frac{1}{\sqrt{3}}(\mathbf{i} - \mathbf{j} + \mathbf{k})$.
4. Find the area of the triangle determined by the three points, (1, 2, 3), (1, 2, 6) and
(-3, 2, 1).
The three points determine two displacement vectors from the point (1, 2, 3), (0, 0, 3)
and (-4, 0, -2). To find the area of the parallelogram determined by these two
displacement vectors, you simply take the norm of their cross product. To find
the area of the triangle, you take one half of that. Thus the area is
$$(1/2)\,|(0, 0, 3) \times (-4, 0, -2)| = \frac{1}{2}|(0, -12, 0)| = 6.$$
5. Find the area of the parallelogram determined by the vectors, (1, 0, 3) and (4, 2, 1) .
$$\begin{vmatrix} 1 & -7 & -5 \\ 1 & 2 & -6 \\ 3 & -3 & 1 \end{vmatrix} = 162$$
7. Suppose a, b, and c are three vectors whose components are all integers. Can you
conclude the volume of the parallelepiped determined from these three vectors will
always be an integer?
Hint: Consider what happens when you take the determinant of a matrix which
has all integers.
8. Using the notion of the box product yielding either plus or minus the volume of
the parallelepiped determined by the given three vectors, show that
$$(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$$
In other words, the dot and the cross can be switched as long as the order of the
vectors remains the same. Hint: There are two ways to do this, by the coordinate
description of the dot and cross product and by geometric reasoning. It is best if
you use the geometric reasoning. Here is a picture which might help.
[Figure: the parallelepiped determined by a, b, and c, with the vectors $\mathbf{a} \times \mathbf{b}$ and $\mathbf{b} \times \mathbf{c}$ drawn.]
In this picture there is an angle between $\mathbf{a} \times \mathbf{b}$ and c. Call it $\theta$. Now if you take
$|\mathbf{a} \times \mathbf{b}|\,|\mathbf{c}| \cos\theta$ this gives the area of the base of the parallelepiped determined by
a and b times the altitude of the parallelepiped, $|\mathbf{c}| \cos\theta$. This is what is meant
by the volume of the parallelepiped. It also equals $\mathbf{a} \times \mathbf{b} \cdot \mathbf{c}$ by the geometric
description of the dot product. Similarly, there is an angle between $\mathbf{b} \times \mathbf{c}$ and a.
Call it $\alpha$. Then if you take $|\mathbf{b} \times \mathbf{c}|\,|\mathbf{a}| \cos\alpha$ this would equal the area of the face
determined by the vectors b and c times the altitude measured from this face,
$|\mathbf{a}| \cos\alpha$. Thus this also is the volume of the parallelepiped, and it equals $\mathbf{a} \cdot \mathbf{b} \times \mathbf{c}$.
The picture is not completely representative. If you switch the labels of two of
these vectors, say b and c, explain why it is still the case that $\mathbf{a} \cdot \mathbf{b} \times \mathbf{c} = \mathbf{a} \times \mathbf{b} \cdot \mathbf{c}$.
You should draw a similar picture and explain why in this case you get $-1$ times
the volume of the parallelepiped.
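9. Discover a vector identity for $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$.
$$\begin{aligned}
((\mathbf{u} \times \mathbf{v}) \times \mathbf{w})_i &= \varepsilon_{ijk} (\mathbf{u} \times \mathbf{v})_j w_k = \varepsilon_{ijk}\varepsilon_{jrs} u_r v_s w_k \\
&= -\varepsilon_{jik}\varepsilon_{jrs} u_r v_s w_k = -(\delta_{ir}\delta_{ks} - \delta_{is}\delta_{kr}) u_r v_s w_k \\
&= \mathbf{u} \cdot \mathbf{w}\, v_i - \mathbf{v} \cdot \mathbf{w}\, u_i.
\end{aligned}$$
Therefore, $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w} = (\mathbf{u} \cdot \mathbf{w})\mathbf{v} - (\mathbf{v} \cdot \mathbf{w})\mathbf{u}$.
10. Discover a vector identity for $(\mathbf{u} \times \mathbf{v}) \cdot (\mathbf{z} \times \mathbf{w})$.
Start with $\varepsilon_{ijk} u_j v_k \varepsilon_{irs} z_r w_s$ and then go to work on it using the reduction identities
for the permutation symbol.
11. Discover a vector identity for $(\mathbf{u} \times \mathbf{v}) \times (\mathbf{z} \times \mathbf{w})$ in terms of box products.
You will save time if you use the identity for $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$ or $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$.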
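$$\left(A^{-1}\right)_{ks} = \frac{1}{2\det(A)} \varepsilon_{rps}\varepsilon_{ijk} A_{pj} A_{ri}.$$
Just show the expression on the right acts like the ks th entry of the inverse. Using
the repeated index summation convention this amounts to showing
$$\frac{1}{2\det(A)} \varepsilon_{rps}\varepsilon_{ijk} A_{ri} A_{pj} A_{sl} = \delta_{kl}.$$
From Problem 13, $\varepsilon_{rps} \det(A) = \varepsilon_{ijl} A_{ri} A_{pj} A_{sl}$. Therefore,
$$6 \det(A) = \varepsilon_{rps}\varepsilon_{rps} \det(A) = \varepsilon_{rps}\varepsilon_{ijl} A_{ri} A_{pj} A_{sl}$$
and so $\det(A) = \det\left(A^T\right)$.
Hence
$$\frac{1}{2\det(A)} \varepsilon_{rps}\varepsilon_{ijk} A_{ri} A_{pj} A_{sl} = \frac{1}{2\det(A)} \varepsilon_{ijl}\varepsilon_{ijk} \det(A) = \delta_{kl}$$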
Outcomes
6.1
Planes
You have an idea of what a plane is already. It is the span of some vectors. However,
it can also be considered geometrically in terms of a dot product. To find the equation
of a plane, you need two things, a point contained in the plane and a vector normal to
the plane. Let p0 = (x0 , y0 , z0 ) denote the position vector of a point in the plane, let
p = (x, y, z) be the position vector of an arbitrary point in the plane, and let n denote
a vector normal to the plane. This means that
$$\mathbf{n} \cdot (\mathbf{p} - \mathbf{p}_0) = 0$$
whenever p is the position vector of a point in the plane. The following picture illustrates
the geometry of this idea.
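[Figure: the points $\mathbf{p}_0$ and $\mathbf{p}$ in the plane, with the normal vector n.]
Expressed equivalently, the plane is just the set of all points p such that the vector,
$\mathbf{p} - \mathbf{p}_0$ is perpendicular to the given normal vector, n.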
Example 6.1.1 Find the equation of the plane with normal vector, n = (1, 2, 3) containing the point (2, -1, 5).
From the above, the equation of this plane is just
$$(1, 2, 3) \cdot (x - 2, y + 1, z - 5) = x + 2y + 3z - 15 = 0$$
Example 6.1.2 $2x + 4y - 5z = 11$ is the equation of a plane. Find the normal vector
and a point on this plane.
Letting (x, y, z) be a point on the plane, the volume of the parallelepiped spanned
by $(x, y, z) - (1, 2, 1)$ and the two vectors, (2, -3, 1) and (3, 0, 0) must be equal to zero.
Thus the equation of the plane is
$$\det \begin{pmatrix} 3 & 0 & 0 \\ 2 & -3 & 1 \\ x - 1 & y - 2 & z - 1 \end{pmatrix} = 0.$$
Hence $-9z + 15 - 3y = 0$ and dividing by 3 yields the same answer as the above.
Proposition 6.1.6 If $(a, b, c) \neq (0, 0, 0)$, then ax + by + cz = d is the equation of a
plane with normal vector ai + bj + ck. Conversely, any plane can be written in this form.
Proof: One of a, b, c is nonzero. Suppose for example that $c \neq 0$. Then the equation
can be written as
$$a(x - 0) + b(y - 0) + c\left(z - \frac{d}{c}\right) = 0$$
$$\det \begin{pmatrix} 1 & 2 & 1 & 1 \\ 3 & -1 & 2 & 1 \\ 4 & 2 & 1 & 1 \\ x & y & z & 1 \end{pmatrix} = 0$$
because the matrix sends a nonzero vector, $(a, b, c, -d)$ to zero and is therefore, not
one to one. Consequently from Theorem 3.2.1 on Page 61, its determinant equals zero.
Hence upon evaluating the determinant, $-15 + 9z + 3y = 0$ which reduces to $3z + y = 5$.
Example 6.1.8 Find the equation of the plane containing the point (1, 2, 3) and the
line (0, 1, 1) + t (2, 1, 2) = (x, y, z).
There are several ways to do this. One is to find three points and use any of the
above procedures. Let t = 0 and then let t = 1 to get two points on the line. This yields
(1, 2, 3), (0, 1, 1), and (2, 2, 3). Then the equation of the plane is
$$\det \begin{pmatrix} x & y & z & 1 \\ 1 & 2 & 3 & 1 \\ 0 & 1 & 1 & 1 \\ 2 & 2 & 3 & 1 \end{pmatrix} = -2y + z + 1 = 0,$$
that is, $2y - z = 1$.
Example 6.1.9 Find the equation of the plane which contains the two lines, given by
the following parametric expressions in which $t \in \mathbb{R}$.
$$(2t, 1 + t, 1 + 2t) = (x, y, z), \quad (2t + 2, 1, 3 + 2t) = (x, y, z)$$
Note first that you don't know there even is such a plane. However, if there is, you
could find it by obtaining three points, two on one line and one on another and then
using any of the above procedures for finding the plane. From the first line, two points
are (0, 1, 1) and (2, 2, 3) while a third point can be obtained from the second line, (2, 1, 3).
You need a normal vector and then use any of these points. To get a normal vector, form
$(2, 0, 2) \times (2, 1, 2) = (-2, 0, 2)$. Therefore, the plane is $-2x + 0(y - 1) + 2(z - 1) = 0$.
This reduces to $z - x = 1$. If there is a plane, this is it. Now you can simply verify
that both of the lines are really in this plane. From the first, $(1 + 2t) - 2t = 1$ and the
second, $(3 + 2t) - (2t + 2) = 1$ so both lines lie in the plane.
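The procedure of Example 6.1.9 (take three points, form two displacement vectors, cross them to get a normal, then test the lines) can be sketched in a few lines. The function `plane_through` and the other names are hypothetical helpers, not anything defined in the text; the plane is returned as a pair (n, d) meaning $\mathbf{n} \cdot (x, y, z) = d$.

```python
def cross(a, b):
    """Cross product from rectangular components."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product."""
    return sum(x * y for x, y in zip(a, b))


def plane_through(p, q, r):
    """Return (n, d) so that the plane through p, q, r is n . x = d."""
    a = tuple(qi - pi for pi, qi in zip(p, q))
    b = tuple(ri - pi for pi, ri in zip(p, r))
    n = cross(a, b)
    if n == (0, 0, 0):
        raise ValueError("the three points lie on a line")
    return n, dot(n, p)


# Points used in Example 6.1.9: (0, 1, 1) and (2, 2, 3) on the first line,
# (2, 1, 3) on the second.
n, d = plane_through((0, 1, 1), (2, 1, 3), (2, 2, 3))


def line1(t):
    return (2 * t, 1 + t, 1 + 2 * t)


def line2(t):
    return (2 * t + 2, 1, 3 + 2 * t)


both_lines_in_plane = all(
    dot(n, line1(t)) == d and dot(n, line2(t)) == d for t in range(-5, 6)
)
print(n, d, both_lines_in_plane)
```

Here n = (-2, 0, 2) and d = 2, which is the plane $z - x = 1$ after dividing by 2. Checking finitely many values of t is enough in this case because a line which meets a plane in two points lies in it.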
One way to understand how a plane looks is to connect the points where it intercepts
the x, y, and z axes. This allows you to visualize the plane somewhat and is a good way
to sketch the plane. Not surprisingly these points are called intercepts.
Example 6.1.10 Sketch the plane which has intercepts (2, 0, 0) , (0, 3, 0) , and (0, 0, 4) .
[Figure: the plane through the intercepts (2, 0, 0), (0, 3, 0), and (0, 0, 4).]
You see how connecting the intercepts gives a fairly good geometric description of
the plane. These lines which connect the intercepts are also called the traces of the
plane. Thus the line which joins (0, 3, 0) to (0, 0, 4) is the intersection of the plane with
the yz plane. It is the trace on the yz plane.
Example 6.1.11 Identify the intercepts of the plane, $3x - 4y + 5z = 11$.
The easy way to do this is to divide both sides by 11.
$$\frac{x}{(11/3)} + \frac{y}{(-11/4)} + \frac{z}{(11/5)} = 1$$
The intercepts are (11/3, 0, 0), (0, -11/4, 0) and (0, 0, 11/5). You can see this by letting
both y and z equal zero to find the point on the x axis which is intersected by the
plane. The other axes are handled similarly.
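Setting two of the variables equal to zero is simple enough to automate. The sketch below assumes all of a, b, c, d are nonzero, so that each axis is crossed exactly once away from the origin; the function name `intercepts` is illustrative.

```python
from fractions import Fraction


def intercepts(a, b, c, d):
    """Axis intercepts of the plane ax + by + cz = d, assuming a, b, c, d are all nonzero."""
    return (
        (Fraction(d, a), 0, 0),   # let y = z = 0
        (0, Fraction(d, b), 0),   # let x = z = 0
        (0, 0, Fraction(d, c)),   # let x = y = 0
    )


# The plane 3x - 4y + 5z = 11 of Example 6.1.11.
pts = intercepts(3, -4, 5, 11)
print(pts)
```

Note the negative y intercept, which comes from the minus sign on the y term.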
6.2
Quadric Surfaces
In the above it was shown that the equation of an arbitrary plane is an equation of
the form ax + by + cz = d. Such equations are called level surfaces. There are some
standard level surfaces which involve certain variables being raised to a power of 2 which
are sufficiently important that they are given names, usually involving the portentous
semi-word "oid". These are graphed below using Maple, a computer algebra system.
[Figures: hyperbolic paraboloid, $z = x^2/a^2 - y^2/b^2$; elliptic paraboloid, $z = x^2/a^2 + y^2/b^2$.]
Why do the graphs of these level surfaces look the way they do? Consider first the
hyperboloid of two sheets. The equation defining this surface can be written in the form
$$\frac{z^2}{a^2} - 1 = \frac{x^2}{b^2} + \frac{y^2}{c^2}.$$
Suppose you fix a value for z. What ordered pairs, (x, y) will satisfy the equation?
If $\frac{z^2}{a^2} < 1$, there is no such ordered pair because the above equation would require a
negative number to equal a nonnegative one. This is why there is a gap and there are
two sheets. If $\frac{z^2}{a^2} > 1$, then the above equation is the equation for an ellipse. That is
why if you slice the graph by letting $z = z_0$ the result is an ellipse in the plane $z = z_0$.
Consider the hyperboloid of one sheet.
$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 + \frac{z^2}{c^2}.$$
This time, it doesn't matter what value z takes. The resulting equation for (x, y) is an
ellipse.
Similar considerations apply to the elliptic paraboloid as long as z > 0 and the
ellipsoid. The elliptic cone is like the hyperboloid of two sheets without the 1. Therefore,
z can have any value. In case z = 0, (x, y) = (0, 0). Viewed from the side, it appears
straight, not curved like the hyperboloid of two sheets. This is because if (x, y, z) is a
point on the surface, then if t is a scalar, it follows (tx, ty, tz) is also on this surface.
The most interesting of these graphs is the hyperbolic paraboloid, $z = \frac{x^2}{a^2} - \frac{y^2}{b^2}$. If
z > 0 this is the equation of a hyperbola which opens to the right and left while if z < 0
it is a hyperbola which opens up and down. As z passes from positive to negative, the
hyperbola changes type and this is what yields the shape shown in the picture.
Not surprisingly, you can find intercepts and traces of quadric surfaces just as with
planes.
Example 6.2.1 Find the trace on the xy plane of the hyperbolic paraboloid, $z = x^2 - y^2$.
This occurs when z = 0 and so this reduces to $y^2 = x^2$. In other words, this trace is
just the two straight lines, $y = x$ and $y = -x$.
Example 6.2.2 Find the intercepts of the ellipsoid, $x^2 + 2y^2 + 4z^2 = 9$.
To find the intercept on the x axis, let y = z = 0 and this yields $x = \pm 3$. Thus
there are two intercepts, (3, 0, 0) and (-3, 0, 0). The other intercepts are left for you to
find. You can see this is an aid in graphing the quadric surface. The surface is said to
be bounded if there is some number, C such that whenever, (x, y, z) is a point on the
surface, $\sqrt{x^2 + y^2 + z^2} < C$. The surface is called unbounded if no such constant, C
exists. Ellipsoids are bounded but the other quadric surfaces are not bounded.
Example 6.2.3 Why is the hyperboloid of one sheet, $x^2 + 2y^2 - z^2 = 1$ unbounded?
Let z be very large. Does there correspond (x, y) such that (x, y, z) is a point
on the hyperboloid of one sheet? Certainly. Simply pick any (x, y) on the ellipse
$x^2 + 2y^2 = 1 + z^2$. Then $\sqrt{x^2 + y^2 + z^2}$ is large, at least as large as $|z|$. Thus it is unbounded.
You can also find intersections between lines and surfaces.
Example 6.2.4 Find the points of intersection of the line (x, y, z) = (1 + t, 1 + 2t, 1 + t)
with the surface, $z = x^2 + y^2$.
First of all, there is no guarantee there is any intersection at all. But if it exists, you
have only to solve the equation for t
$$1 + t = (1 + t)^2 + (1 + 2t)^2$$
This occurs at the two values of $t = -\frac{1}{2} + \frac{1}{10}\sqrt{5}$, $t = -\frac{1}{2} - \frac{1}{10}\sqrt{5}$. Therefore, the two
points are
$$(1, 1, 1) + \left(-\frac{1}{2} + \frac{1}{10}\sqrt{5}\right)(1, 2, 1), \text{ and } (1, 1, 1) + \left(-\frac{1}{2} - \frac{1}{10}\sqrt{5}\right)(1, 2, 1)$$
That is
$$\left(\frac{1}{2} + \frac{1}{10}\sqrt{5}, \frac{1}{5}\sqrt{5}, \frac{1}{2} + \frac{1}{10}\sqrt{5}\right), \left(\frac{1}{2} - \frac{1}{10}\sqrt{5}, -\frac{1}{5}\sqrt{5}, \frac{1}{2} - \frac{1}{10}\sqrt{5}\right).$$
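Expanding the equation for t in Example 6.2.4 gives the quadratic $5t^2 + 5t + 1 = 0$, which the quadratic formula handles. The following sketch solves it numerically and confirms that both resulting points satisfy $z = x^2 + y^2$; the variable names are illustrative, and floating point is used, so the comparisons are only approximate.

```python
import math

# Substituting (x, y, z) = (1 + t, 1 + 2t, 1 + t) into z = x^2 + y^2 gives
# 1 + t = (1 + t)^2 + (1 + 2t)^2, that is 5t^2 + 5t + 1 = 0.
a, b, c = 5.0, 5.0, 1.0
disc = b * b - 4 * a * c  # positive here, so the line meets the surface twice

roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
points = [(1 + t, 1 + 2 * t, 1 + t) for t in roots]

for x, y, z in points:
    print((x, y, z), math.isclose(z, x * x + y * y))
```

If the discriminant had been negative, there would be no real t and hence no point of intersection.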
6.3
Exercises
1. Determine whether the lines (1, 1, 2) + t (1, 0, 3) and (4, 1, 3) + t (3, 0, 1) have a
point of intersection. If they do, find the cosine of the angle between the two
lines. If they do not intersect, explain why they do not.
2. Determine whether the lines (1, 1, 2) + t (1, 0, 3) and (4, 2, 3) + t (3, 0, 1) have a
point of intersection. If they do, find the cosine of the angle between the two
lines. If they do not intersect, explain why they do not.
3. Find where the line (1, 0, 1) + t (1, 2, 1) intersects the surface $x^2 + y^2 + z^2 = 9$ if
possible. If there is no intersection, explain why.
4. Find a parametric equation for the line through the points (2, 3, 4, 5) and (2, 3, 0, 1) .
5. Find the equation of a line through (1, 2, 3, 0) which has direction vector, (2, 1, 3, 1) .
6. Let (x, y) = (2 cos (t), 2 sin (t)) where $t \in [0, 2\pi]$. Describe the set of points encountered as t changes.
7. Let (x, y, z) = (2 cos (t), 2 sin (t), t) where $t \in \mathbb{R}$. Describe the set of points encountered as t changes.
8. If there is a plane which contains the two lines, (2t + 2, 1 + t, 3 + 2t) = (x, y, z)
and (4 + t, 3 + 2t, 4 + t) = (x, y, z) find it. If there is no such plane tell why.
9. If there is a plane which contains the two lines, (2t + 4, 1 + t, 3 + 2t) = (x, y, z)
and (4 + t, 3 + 2t, 4 + t) = (x, y, z) find it. If there is no such plane tell why.
10. Find the equation of the plane which contains the three points (1, 2, 3) , (2, 3, 4) ,
and (3, 1, 2) .
11. Find the equation of the plane which contains the three points (1, 2, 3) , (2, 0, 4) ,
and (3, 1, 2) .
12. Find the equation of the plane which contains the three points (0, 2, 3) , (2, 3, 4) ,
and (3, 5, 2) .
13. Find the equation of the plane which contains the three points (1, 2, 3) , (0, 3, 4) ,
and (3, 6, 2) .
14. Find the equation of the plane having a normal vector, $5\mathbf{i} + 2\mathbf{j} - 6\mathbf{k}$ which contains
the point (2, 1, 3) .
15. Find the equation of the plane having a normal vector, $\mathbf{i} + 2\mathbf{j} - 4\mathbf{k}$ which contains
the point (2, 0, 1) .
16. Find the equation of the plane having a normal vector, $2\mathbf{i} + \mathbf{j} - 6\mathbf{k}$ which contains
the point (1, 1, 2) .
17. Find the equation of the plane having a normal vector, $\mathbf{i} + 2\mathbf{j} - 3\mathbf{k}$ which contains
the point (1, 0, 3) .
18. Find the cosine of the angle between the two planes $2x + 3y - z = 11$ and $3x + y + 2z = 9$.
19. Find the cosine of the angle between the two planes $x + 3y - z = 11$ and $2x + y + 2z = 9$.
20. Find the cosine of the angle between the two planes $2x + y - z = 11$ and $3x + 5y + 2z = 9$.
21. Find the cosine of the angle between the two planes $x + 3y + z = 11$ and $3x + 2y + 2z = 9$.
22. Determine the intercepts and sketch the plane $3x - 2y + z = 4$.
23. Determine the intercepts and sketch the plane $x - 2y + z = 2$.
24. Determine the intercepts and sketch the plane x + y + z = 3.
25. Based on an analogy with the above pictures, sketch or otherwise describe the
graph of $y = \frac{x^2}{a^2} - \frac{z^2}{b^2}$.
26. Based on an analogy with the above pictures, sketch or otherwise describe the
graph of $\frac{z^2}{b^2} + \frac{y^2}{c^2} = 1 + \frac{x^2}{a^2}$.
27. The equation of a cone is $z^2 = x^2 + y^2$. Suppose this cone is intersected with the
plane, z = ay + 1. Consider the projection of the intersection of the cone with this
plane. This means $\left\{(x, y) : (ay + 1)^2 = x^2 + y^2\right\}$. Show this sometimes results in
a parabola, sometimes a hyperbola, and sometimes an ellipse depending on a.
28. Find the intercepts of the quadric surface, $x^2 + 4y^2 - z^2 = 4$ and sketch the surface.
29. Find the intercepts of the quadric surface, $x^2 - 4y^2 + z^2 = 4$ and sketch the
surface.
30. Find the intersection of the line (x, y, z) = (1 + t, t, 3t) with the surface, $x^2/9 + y^2/4 + z^2/16 = 1$ if possible.
Part III
Vector Calculus
Outcomes
7.1
Example 7.1.3 Let $\mathbf{f}(x, y, z) = \left(\frac{x + y}{z}, \sqrt{1 - x^2}, y\right)$. Then D (f) would consist of the set
of all (x, y, z) such that $|x| \leq 1$ and $z \neq 0$.
There are many ways to make new functions from old ones.
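Definition 7.1.4 Let f, g be functions with values in $\mathbb{R}^p$. Let a, b be elements of $\mathbb{R}$
(scalars). Then af + bg is the name of a function whose domain is $D(\mathbf{f}) \cap D(\mathbf{g})$ which
is defined as
$$(a\mathbf{f} + b\mathbf{g})(\mathbf{x}) = a\mathbf{f}(\mathbf{x}) + b\mathbf{g}(\mathbf{x}).$$
$\mathbf{f} \cdot \mathbf{g}$ or (f, g) is the name of a function whose domain is $D(\mathbf{f}) \cap D(\mathbf{g})$ which is defined
as
$$(\mathbf{f}, \mathbf{g})(\mathbf{x}) \equiv \mathbf{f} \cdot \mathbf{g}(\mathbf{x}) \equiv \mathbf{f}(\mathbf{x}) \cdot \mathbf{g}(\mathbf{x}).$$
Example 7.1.5 Let $\mathbf{f}(t) \equiv (t, 1 + t, 2)$ and $\mathbf{g}(t) \equiv (t^2, t, t)$. Then $\mathbf{f} \cdot \mathbf{g}$ is the name of
the function satisfying
$$\mathbf{f} \cdot \mathbf{g}(t) = \mathbf{f}(t) \cdot \mathbf{g}(t) = t^3 + t + t^2 + 2t = t^3 + t^2 + 3t$$
Note that in this case it was assumed the domains of the functions consisted of all
of $\mathbb{R}$ because this was the set on which the two both made sense. Also note that f and
g map $\mathbb{R}$ into $\mathbb{R}^3$ but $\mathbf{f} \cdot \mathbf{g}$ maps $\mathbb{R}$ into $\mathbb{R}$.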
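The pointwise definition of $\mathbf{f} \cdot \mathbf{g}$ is easy to mirror in code, which makes the last remark concrete: the two inputs return triples while the product returns a single number. This is a sketch of Example 7.1.5 only, with hypothetical function names.

```python
def f(t):
    """f(t) = (t, 1 + t, 2), a function from R into R^3."""
    return (t, 1 + t, 2)


def g(t):
    """g(t) = (t^2, t, t), a function from R into R^3."""
    return (t ** 2, t, t)


def f_dot_g(t):
    """(f . g)(t) = f(t) . g(t), a function from R into R."""
    return sum(a * b for a, b in zip(f(t), g(t)))


values = [f_dot_g(t) for t in range(-3, 4)]
print(values)
```

Each value agrees with the closed form $t^3 + t^2 + 3t$ found above.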
7.2
Vector Fields
Some people find it useful to try and draw pictures to illustrate a vector valued function.
This can be a very useful idea in the case where the function takes points in $D \subseteq \mathbb{R}^2$
and delivers a vector in $\mathbb{R}^2$. For many points, $(x, y) \in D$, you draw an arrow of the
appropriate length and direction with its tail at (x, y). The picture of all these arrows
can give you an understanding of what is happening. For example if the vector valued
function gives the velocity of a fluid at the point, (x, y) , the picture of these arrows can
give an idea of the motion of the fluid. When they are long the fluid is moving fast, and when
they are short, the fluid is moving slowly. The direction of these arrows is an indication
of the direction of motion. The only sensible way to produce such a picture is with a
computer. Otherwise, it becomes a worthless exercise in busy work. Furthermore, it is
of limited usefulness in three dimensions because in three dimensions such pictures are
too cluttered to convey much insight.
Example 7.2.1 Draw a picture of the vector field, (x, y) which gives the velocity of
a fluid flowing in two dimensions.
In this example, drawn by Maple, you can see how the arrows indicate the motion
of this fluid.
Example 7.2.2 Draw a picture of the vector field (y, x) for the velocity of a fluid flowing
in two dimensions.
So much for art. Get the computer to do it and it can be useful. If you try to do it,
you will mainly waste time.
Example 7.2.3 Draw a picture of the vector field $(y \cos(x) + 1, x \sin(y) - 1)$ for the
velocity of a fluid flowing in two dimensions.
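The first step in having a computer draw such a picture is to evaluate the field on a grid of points, giving one arrow per grid point. The sketch below does only that step for the field of Example 7.2.3, reading the second component as $x \sin(y) - 1$; the function names and the grid spacing are arbitrary choices, and the drawing itself would be left to whatever plotting program is at hand.

```python
import math


def field(x, y):
    """Velocity (y cos(x) + 1, x sin(y) - 1) at the point (x, y)."""
    return (y * math.cos(x) + 1, x * math.sin(y) - 1)


def sample_field(xs, ys):
    """Return one (x, y, vx, vy) tuple per grid point: the tail and the arrow."""
    return [(x, y, *field(x, y)) for x in xs for y in ys]


grid = [i * 0.5 for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
arrows = sample_field(grid, grid)

# The longest arrow marks where the fluid is moving fastest on this grid.
fastest = max(arrows, key=lambda a: math.hypot(a[2], a[3]))
print(len(arrows), fastest)
```

Each tuple holds the tail of an arrow together with its components, which is the information a picture of the field displays.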
7.3
Continuous Functions
What was done in beginning calculus for scalar functions is generalized here to include
the case of a vector valued function.
Definition 7.3.1 A function $f : D(f) \subseteq \mathbb{R}^p \to \mathbb{R}^q$ is continuous at $x \in D(f)$ if for
each $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $y \in D(f)$ and
$|y - x| < \delta$
it follows that
$|f(x) - f(y)| < \varepsilon.$
$f$ is continuous if it is continuous at every point of $D(f)$.
Note the total similarity to the scalar valued case.
7.3.1
The next theorem is a fundamental result which will allow us to worry less about the
definition of continuity.
Theorem 7.3.2 The following assertions are valid.
1. The function $af + bg$ is continuous at $x$ whenever $f$, $g$ are continuous at $x \in D(f) \cap D(g)$ and $a, b \in \mathbb{R}$.
2. If $f$ is continuous at $x$, $f(x) \in D(g) \subseteq \mathbb{R}^p$, and $g$ is continuous at $f(x)$, then $g \circ f$
is continuous at $x$.
3. If $f = (f_1, \dots, f_q) : D(f) \to \mathbb{R}^q$, then $f$ is continuous if and only if each $f_k$ is a
continuous real valued function.
4. The function $f : \mathbb{R}^p \to \mathbb{R}$, given by $f(x) = |x|$ is continuous.
The proof of this theorem is in the last section of this chapter. Its conclusions are not
surprising. For example the first claim says that (af + bg) (y) is close to (af + bg) (x)
when y is close to x provided the same can be said about f and g. For the second
claim, if y is close to x, f (x) is close to f (y) and so by continuity of g at f (x), g (f (y))
is close to g (f (x)) . To see the third claim is likely, note that closeness in Rp is the
same as closeness in each coordinate. The fourth claim is immediate from the triangle
inequality.
For functions defined on Rn , there is a notion of polynomial just as there is for
functions defined on R.
Definition 7.3.3 Let $\alpha$ be an $n$ dimensional multi-index. This means
$\alpha = (\alpha_1, \dots, \alpha_n)$
where each $\alpha_i$ is a natural number or zero. Also, let
$|\alpha| \equiv \sum_{i=1}^{n} |\alpha_i|$
7.4 Limits Of A Function
As in the case of scalar valued functions of one variable, a concept closely related to
continuity is that of the limit of a function. The notion of limit of a function makes
sense at points, x, which are limit points of D (f ) and this concept is defined next.
Definition 7.4.1 Let $A \subseteq \mathbb{R}^m$ be a set. A point $x$ is a limit point of $A$ if $B(x, r)$
contains infinitely many points of $A$ for every $r > 0$.
Definition 7.4.2 Let $f : D(f) \subseteq \mathbb{R}^p \to \mathbb{R}^q$ be a function and let $x$ be a limit point of
$D(f)$. Then
$\lim_{y \to x} f(y) = L$
if and only if the following condition holds. For all $\varepsilon > 0$ there exists $\delta > 0$ such that if
$0 < |y - x| < \delta$ and $y \in D(f)$,
then
$|L - f(y)| < \varepsilon.$
Theorem 7.4.3 If $\lim_{y \to x} f(y) = L$ and $\lim_{y \to x} f(y) = L_1$, then $L = L_1$.
Proof: Let $\varepsilon > 0$ be given. There exists $\delta > 0$ such that if $0 < |y - x| < \delta$ and
$y \in D(f)$, then
$|f(y) - L| < \varepsilon, \quad |f(y) - L_1| < \varepsilon.$
Pick such a $y$. There exists one because $x$ is a limit point of $D(f)$. Then
$|L - L_1| \le |L - f(y)| + |f(y) - L_1| < \varepsilon + \varepsilon = 2\varepsilon.$
Since $\varepsilon > 0$ was arbitrary, this shows $L = L_1$.
As in the case of functions of one variable, one can define what it means for
$\lim_{y \to x} f(y) = \pm\infty$.
Definition 7.4.4 If $f(y) \in \mathbb{R}$, $\lim_{y \to x} f(y) = \infty$ if for every number $l$, there exists
$\delta > 0$ such that whenever $|y - x| < \delta$ and $y \in D(f)$, then $f(y) > l$.
The following theorem is just like the one variable version of calculus.
Theorem 7.4.5 Suppose $\lim_{y \to x} f(y) = L$ and $\lim_{y \to x} g(y) = K$ where $K, L \in \mathbb{R}^q$.
Then if $a, b \in \mathbb{R}$,
$\lim_{y \to x} (a f(y) + b g(y)) = aL + bK,$ (7.1)
$\lim_{y \to x} f \cdot g\,(y) = L \cdot K$ (7.2)
yx
(7.3)
yx
(7.4)
Proof: The proof of 7.1 is left for you. It is like a corresponding theorem for
continuous functions. Now 7.2 is to be verified. Let $\varepsilon > 0$ be given. Then by the triangle
inequality,
$|f \cdot g\,(y) - L \cdot K| \le |f \cdot g\,(y) - f(y) \cdot K| + |f(y) \cdot K - L \cdot K|$
$\le |f(y)| |g(y) - K| + |K| |f(y) - L|.$
There exists $\delta_1$ such that if $0 < |y - x| < \delta_1$ and $y \in D(f)$, then
$|f(y) - L| < 1,$
and so for such $y$, the triangle inequality implies $|f(y)| < 1 + |L|$. Therefore, for
$0 < |y - x| < \delta_1$,
$|f \cdot g\,(y) - L \cdot K| \le (1 + |K| + |L|) [|g(y) - K| + |f(y) - L|].$ (7.5)
$|f(y) - L| < \frac{\varepsilon}{2(1 + |K| + |L|)}, \quad |g(y) - K| < \frac{\varepsilon}{2(1 + |K| + |L|)}.$
$< |b - f(y)|,$
a contradiction to the assumption that $|b - f(y)| \le r$.
Theorem 7.4.6 For $f : D(f) \to \mathbb{R}^q$ and $x \in D(f)$ a limit point of $D(f)$, $f$ is continuous at $x$ if and only if
$\lim_{y \to x} f(y) = f(x).$
$\lim_{y \to x} f(y) = L$ (7.6)
if and only if
$\lim_{y \to x} f_k(y) = L_k$ (7.7)
for each $k$, and
$\lim_{y \to x} f(y) \times g(y) = L \times K.$ (7.8)
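Proof: Suppose 7.6. Then letting $\varepsilon > 0$ be given there exists $\delta > 0$ such that if
$0 < |y - x| < \delta$, it follows
$|f_k(y) - L_k| \le |f(y) - L| < \varepsilon$
which verifies 7.7.
Now suppose 7.7 holds. Then letting $\varepsilon > 0$ be given, there exists $\delta_k$ such that if
$0 < |y - x| < \delta_k$, then $|f_k(y) - L_k| < \varepsilon / \sqrt{p}$. Let $0 < \delta < \min(\delta_1, \dots, \delta_p)$. Then if $0 < |y - x| < \delta$,
$|f(y) - L| = \left( \sum_{k=1}^{p} |f_k(y) - L_k|^2 \right)^{1/2} < \left( \sum_{k=1}^{p} \frac{\varepsilon^2}{p} \right)^{1/2} = \varepsilon.$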
It remains to verify 7.8. But from the first part of this theorem and the description of
the cross product presented earlier in terms of the permutation symbol,
$\lim_{y \to x} (f(y) \times g(y))_i = \lim_{y \to x} \varepsilon_{ijk} f_j(y) g_k(y) = \varepsilon_{ijk} L_j K_k = (L \times K)_i.$
Therefore, from the first part of this theorem, this establishes 7.8. This completes the
proof.
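Example 7.4.8 Find $\lim_{(x,y) \to (3,1)} \left( \frac{x^2 - 9}{x - 3}, y \right)$.
Since $\frac{x^2 - 9}{x - 3} = x + 3$ for $x \ne 3$, the first component converges to 6 and the second converges to 1. Therefore, by Theorem 7.4.7 this limit equals $(6, 1)$.
Now consider $\lim_{(x,y) \to (0,0)} \frac{xy}{x^2 + y^2}$.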
First of all observe the domain of the function is R2 \ {(0, 0)} , every point in R2
except the origin. Therefore, (0, 0) is a limit point of the domain of the function so
it might make sense to take a limit. However, just as in the case of a function of one
variable, the limit may not exist. In fact, this is the case here. To see this, take points on
the line y = 0. At these points, the value of the function equals 0. Now consider points
on the line y = x where the value of the function equals 1/2. Since arbitrarily close to
(0, 0) there are points where the function equals 1/2 and points where the function has
the value 0, it follows there can be no limit. Just take $\varepsilon = 1/10$ for example. You can't
be within 1/10 of 1/2 and also within 1/10 of 0 at the same time.
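The two-path argument above can be checked numerically. The following is only a sketch: the step sizes are arbitrary choices, and a finite table of values illustrates the failure of the limit rather than proving it.

```python
def f(x, y):
    # The function from the example; it is undefined at the origin.
    return x * y / (x**2 + y**2)

# Approach (0, 0) along y = 0 and along y = x with shrinking step sizes.
steps = [10.0**(-k) for k in range(1, 8)]
along_axis = [f(h, 0.0) for h in steps]      # values on the line y = 0
along_diagonal = [f(h, h) for h in steps]    # values on the line y = x
```

Every entry of `along_axis` is 0 while every entry of `along_diagonal` is (up to rounding) 1/2, so no single number can be within 1/10 of both.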
Note it is necessary to rely on the definition of the limit much more than in the
case of a function of one variable and there are no easy ways to do limit problems for
functions of more than one variable. It is what it is and you will not deal with these
concepts without suffering and anguish.
7.5
Functions of p variables have many of the same properties as functions of one variable.
First there is a version of the extreme value theorem generalizing the one dimensional
case.
Theorem 7.5.1 Let $C$ be closed and bounded and let $f : C \to \mathbb{R}$ be continuous. Then
$f$ achieves its maximum and its minimum on $C$. This means there exist $x_1, x_2 \in C$
such that for all $x \in C$,
$f(x_1) \le f(x) \le f(x_2).$
There is also the long technical theorem about sums and products of continuous
functions. These theorems are proved in the next section.
Theorem 7.5.2 The following assertions are valid.
1. The function $af + bg$ is continuous at $x$ when $f$, $g$ are continuous at $x \in D(f) \cap D(g)$ and $a, b \in \mathbb{R}$.
2. If $f$ and $g$ are each real valued functions continuous at $x$, then $fg$ is continuous at $x$. If, in addition to this, $g(x) \ne 0$, then $f/g$ is continuous at $x$.
3. If $f$ is continuous at $x$, $f(x) \in D(g) \subseteq \mathbb{R}^p$, and $g$ is continuous at $f(x)$, then $g \circ f$
is continuous at $x$.
4. If $f = (f_1, \dots, f_q) : D(f) \to \mathbb{R}^q$, then $f$ is continuous if and only if each $f_k$ is a
continuous real valued function.
5. The function $f : \mathbb{R}^p \to \mathbb{R}$, given by $f(x) = |x|$ is continuous.
7.6 Exercises
1. Let $f(t) = \left( t, t^2 + 1, \frac{t}{t+1} \right)$ and let $g(t) = \left( t + 1, 1, \frac{t}{t^2+1} \right)$. Find $f \cdot g$.
2. Let $f$, $g$ be given in the previous problem. Find $f \times g$.
3. Find $D(f)$ if $f(x, y, z, w) = \left( \frac{xy}{zw}, \sqrt{6 - x^2 y^2} \right)$.
4. Let $f(t) = (t, t^2, t^3)$, $g(t) = (1, t, t^2)$, and $h(t) = (\sin t, t, 1)$. Find the time rate
of change of the volume of the parallelepiped spanned by the vectors $f$, $g$, and $h$.
5. Let $f(t) = (t, \sin t)$. Show $f$ is continuous at every point $t$.
6. Suppose $|f(x) - f(y)| \le K |x - y|$ where $K$ is a constant. Show that $f$ is everywhere continuous. Functions satisfying such an inequality are called Lipschitz
functions.
11. Let $f(x, y) \equiv \frac{xy}{x^2 + y^2}$ if $(x, y) \ne (0, 0)$, and $f(x, y) \equiv 0$ if $(x, y) = (0, 0)$.
Find $\lim_{(x,y) \to (0,0)} f(x, y)$ if it exists. If it does not exist, tell why it does not
exist. Hint: Consider along the line $y = x$ and along the line $y = 0$.
12. Find the following limits if possible
(a) $\lim_{(x,y) \to (0,0)} \frac{x^2 - y^2}{x^2 + y^2}$
(b) $\lim_{(x,y) \to (0,0)} \frac{x(x^2 - y^2)}{(x^2 + y^2)}$
(c) $\lim_{(x,y) \to (0,0)} \frac{(x^2 - y^4)}{(x^2 + y^4)^2}$
$\frac{1}{x^2 + y^2}$
13. In the definition of limit, why must $x$ be a limit point of $D(f)$? Hint: If $x$ were
not a limit point of $D(f)$, show there exists $\delta > 0$ such that $B(x, \delta)$ contains no
points of $D(f)$ other than possibly $x$ itself. Argue that 33.3 is a limit and that so
is 22 and 7 and 11. In other words the concept is totally worthless.
14. Suppose $\lim_{x \to 0} f(x, 0) = 0 = \lim_{y \to 0} f(0, y)$. Does it follow that
$\lim_{(x,y) \to (0,0)} f(x, y) = 0$?
(x, y, z) : x2 + y 2 + 2z 2 8 ?
Explain why.
20. Suppose $x$ is defined to be a limit point of a set $A$ if and only if for all $r > 0$,
$B(x, r)$ contains a point of $A$ different than $x$. Show this is equivalent to the above
definition of limit point.
21. Give an example of a set of points in $\mathbb{R}^3$ which has no limit points. Show that if
$D(f)$ equals this set, then $f$ is continuous. Show that more generally, if $f$ is any
function for which $D(f)$ has no limit points, then $f$ is continuous.
22. Let $\{x_k\}_{k=1}^{n}$ be any finite set of points in $\mathbb{R}^p$. Show this set has no limit points.
23. Suppose $S$ is any set of points such that every pair of points is at least as far apart
as 1. Show $S$ has no limit points.
24. Find $\lim_{x \to 0} \frac{\sin(|x|)}{|x|}$
25. Suppose $g$ is a continuous vector valued function of one variable defined on $[0, \infty)$.
Prove
$\lim_{x \to x_0} g(|x|) = g(|x_0|).$
26. Give some examples of limit problems for functions of many variables which have
limits and prove your assertions.
7.7 Some Fundamentals
This section contains the proofs of the theorems which were stated without proof
along with some other significant topics which will be useful later. These topics are of
fundamental significance but are difficult.
Theorem 7.7.1 The following assertions are valid.
1. The function $af + bg$ is continuous at $x$ when $f$, $g$ are continuous at $x \in D(f) \cap D(g)$ and $a, b \in \mathbb{R}$.
2. If $f$ and $g$ are each real valued functions continuous at $x$, then $fg$ is continuous at $x$. If, in addition to this, $g(x) \ne 0$, then $f/g$ is continuous at $x$.
3. If $f$ is continuous at $x$, $f(x) \in D(g) \subseteq \mathbb{R}^p$, and $g$ is continuous at $f(x)$, then $g \circ f$
is continuous at $x$.
4. If $f = (f_1, \dots, f_q) : D(f) \to \mathbb{R}^q$, then $f$ is continuous if and only if each $f_k$ is a
continuous real valued function.
5. The function $f : \mathbb{R}^p \to \mathbb{R}$, given by $f(x) = |x|$ is continuous.
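Proof: Begin with 1.) Let $\varepsilon > 0$ be given. By assumption, there exists $\delta_1 > 0$ such
that whenever $|x - y| < \delta_1$, it follows that $|f(x) - f(y)| < \frac{\varepsilon}{2(|a| + |b| + 1)}$, and there exists $\delta_2 > 0$ such that whenever $|x - y| < \delta_2$, it follows that $|g(x) - g(y)| < \frac{\varepsilon}{2(|a| + |b| + 1)}$. Then let
$0 < \delta \le \min(\delta_1, \delta_2)$. If $|x - y| < \delta$, then everything happens at once. Therefore, using
the triangle inequality,
$|af(x) + bg(x) - (af(y) + bg(y))| \le |a| |f(x) - f(y)| + |b| |g(x) - g(y)|$
$< |a| \frac{\varepsilon}{2(|a| + |b| + 1)} + |b| \frac{\varepsilon}{2(|a| + |b| + 1)} < \varepsilon.$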
Now begin on 2.) There exists $\delta_1 > 0$ such that if $|y - x| < \delta_1$, then $|f(x) - f(y)| < 1$. Therefore, for such $y$,
$|f(y)| < 1 + |f(x)|.$
It follows that for such $y$,
$|fg(x) - fg(y)| \le |f(x)g(x) - g(x)f(y)| + |g(x)f(y) - f(y)g(y)|$
$\le |g(x)| |f(x) - f(y)| + |f(y)| |g(x) - g(y)|$
$\le (1 + |g(x)| + |f(y)|) [|g(x) - g(y)| + |f(x) - f(y)|]$
$\le (2 + |g(x)| + |f(x)|) [|g(x) - g(y)| + |f(x) - f(y)|].$
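Now let $\varepsilon > 0$ be given. There exists $\delta_2$ such that if $|x - y| < \delta_2$, then
$|g(x) - g(y)| < \frac{\varepsilon}{2(2 + |g(x)| + |f(x)|)},$
and there exists $\delta_3$ such that if $|x - y| < \delta_3$, then
$|f(x) - f(y)| < \frac{\varepsilon}{2(2 + |g(x)| + |f(x)|)}.$
Now let $0 < \delta \le \min(\delta_1, \delta_2, \delta_3)$. Then if $|x - y| < \delta$, all the above hold at once and
$|fg(x) - fg(y)| \le (2 + |g(x)| + |f(x)|) [|g(x) - g(y)| + |f(x) - f(y)|]$
$< (2 + |g(x)| + |f(x)|) \left( \frac{\varepsilon}{2(2 + |g(x)| + |f(x)|)} + \frac{\varepsilon}{2(2 + |g(x)| + |f(x)|)} \right) = \varepsilon.$
This proves the first part of 2.)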
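To obtain the second part, let $\delta_1$ be as described above and let $\delta_0 > 0$ be such that for $|x - y| < \delta_0$, $|g(x) - g(y)| < |g(x)| / 2$. By the triangle inequality, this implies $|g(y)| \ge |g(x)| / 2$ and $|g(y)| < 3 |g(x)| / 2$. Then if $|x - y| < \min(\delta_0, \delta_1)$,
$\left| \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} \right| = \frac{|f(x)g(y) - f(y)g(x)|}{|g(x)g(y)|} \le \frac{2 |f(x)g(y) - f(y)g(x)|}{|g(x)|^2}$
$\le \frac{2}{|g(x)|^2} \left[ |g(y)| |f(x) - f(y)| + |f(y)| |g(y) - g(x)| \right]$
$\le \frac{2}{|g(x)|^2} \left[ \frac{3}{2} |g(x)| |f(x) - f(y)| + (1 + |f(x)|) |g(y) - g(x)| \right]$
$\le \frac{2}{|g(x)|^2} (1 + 2|f(x)| + 2|g(x)|) \left[ |f(x) - f(y)| + |g(y) - g(x)| \right] \equiv M \left[ |f(x) - f(y)| + |g(y) - g(x)| \right]$
where
$M \equiv \frac{2}{|g(x)|^2} (1 + 2|f(x)| + 2|g(x)|).$
Now let $\delta_2$ be such that if $|x - y| < \delta_2$, then $|f(x) - f(y)| < \frac{\varepsilon}{2} M^{-1}$, and let $\delta_3$ be such that if $|x - y| < \delta_3$, then $|g(y) - g(x)| < \frac{\varepsilon}{2} M^{-1}$. Then if $0 < \delta \le \min(\delta_0, \delta_1, \delta_2, \delta_3)$ and $|x - y| < \delta$, everything holds and
$\left| \frac{f(x)}{g(x)} - \frac{f(y)}{g(y)} \right| \le M \left[ |f(x) - f(y)| + |g(y) - g(x)| \right] < M \left[ \frac{\varepsilon}{2} M^{-1} + \frac{\varepsilon}{2} M^{-1} \right] = \varepsilon.$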
This completes the proof of the second part of 2.) Note that in these proofs no effort is
made to find some sort of best $\delta$. The problem is one which has a yes or a no answer.
Either it is or it is not continuous.
Now begin on 3.) If $f$ is continuous at $x$, $f(x) \in D(g) \subseteq \mathbb{R}^p$, and $g$ is continuous
at $f(x)$, then $g \circ f$ is continuous at $x$. Let $\varepsilon > 0$ be given. Then there exists $\eta > 0$ such
that if $|y - f(x)| < \eta$ and $y \in D(g)$, it follows that $|g(y) - g(f(x))| < \varepsilon$. It follows
from continuity of $f$ at $x$ that there exists $\delta > 0$ such that if $|x - z| < \delta$ and $z \in D(f)$,
then $|f(z) - f(x)| < \eta$. Then if $|x - z| < \delta$ and $z \in D(g \circ f) \subseteq D(f)$, all the above
hold and so
$|g(f(z)) - g(f(x))| < \varepsilon.$
|g (y) g (x)| <
$|f_k(x) - f_k(y)| \le |f(x) - f(y)| \le \sum_{i=1}^{q} |f_i(x) - f_i(y)|.$ (7.9)
Suppose first that $f$ is continuous at $x$. Then there exists $\delta > 0$ such that if $|x - y| < \delta$,
then $|f(x) - f(y)| < \varepsilon$. The first part of the above inequality then shows that for
each $k = 1, \dots, q$, $|f_k(x) - f_k(y)| < \varepsilon$. This shows the only if part. Now suppose each
function $f_k$ is continuous. Then if $\varepsilon > 0$ is given, there exists $\delta_k > 0$ such that whenever
$|x - y| < \delta_k$,
$|f_k(x) - f_k(y)| < \varepsilon / q.$
Now let $0 < \delta \le \min(\delta_1, \dots, \delta_q)$. For $|x - y| < \delta$, the above inequality holds for all $k$
and so the last part of 7.9 implies
$|f(x) - f(y)| \le \sum_{i=1}^{q} |f_i(x) - f_i(y)| < \sum_{i=1}^{q} \frac{\varepsilon}{q} = \varepsilon.$
7.7.1
Lemma 7.7.2 Let $I_k = \prod_{i=1}^{p} [a_i^k, b_i^k] \equiv \{ x \in \mathbb{R}^p : x_i \in [a_i^k, b_i^k] \}$ and suppose that for
all $k = 1, 2, \dots$,
$I_k \supseteq I_{k+1}.$
Then there exists a point $c \in \mathbb{R}^p$ which is an element of every $I_k$.
Proof: Since $I_k \supseteq I_{k+1}$, it follows that for each $i = 1, \dots, p$, $[a_i^k, b_i^k] \supseteq [a_i^{k+1}, b_i^{k+1}]$.
This implies that for each $i$,
$a_i^k \le a_i^{k+1}, \quad b_i^k \ge b_i^{k+1}.$ (7.10)
Consequently, if $k \le l$,
$a_i^l \le b_i^l \le b_i^k.$ (7.11)
Now define
$c_i \equiv \sup \{ a_i^l : l = 1, 2, \dots \}.$
By the first inequality in 7.10,
$c_i = \sup \{ a_i^l : l = k, k + 1, \dots \}$ (7.12)
for each $k = 1, 2, \dots$. Therefore, picking any $k$, 7.11 shows that $b_i^k$ is an upper bound for
the set $\{ a_i^l : l = k, k + 1, \dots \}$ and so it is at least as large as the least upper bound of
this set which is the definition of $c_i$ given in 7.12. Thus, for each $i$ and each $k$,
$a_i^k \le c_i \le b_i^k.$
Defining $c \equiv (c_1, \dots, c_p)$, $c \in I_k$ for all $k$. This proves the lemma.
If you don't like the proof, you could prove the lemma for the one variable case first
and then do the following.
Lemma 7.7.3 Let $I_k = \prod_{i=1}^{p} [a_i^k, b_i^k] \equiv \{ x \in \mathbb{R}^p : x_i \in [a_i^k, b_i^k] \}$ and suppose that for
all $k = 1, 2, \dots$,
$I_k \supseteq I_{k+1}.$
Then there exists a point $c \in \mathbb{R}^p$ which is an element of every $I_k$.
7.7.2
Definition 7.7.4 A set $C \subseteq \mathbb{R}^p$ is said to be bounded if $C \subseteq \prod_{i=1}^{p} [a_i, b_i]$ for some
choice of intervals $[a_i, b_i]$ where $-\infty < a_i < b_i < \infty$. The diameter of a set $S$ is
defined as
$\operatorname{diam}(S) \equiv \sup \{ |x - y| : x, y \in S \}.$
A function $f$ having values in $\mathbb{R}^p$ is said to be bounded if the set of values of $f$ is a
bounded set.
Thus diam (S) is just a careful description of what you would think of as the diameter. It measures how stretched out the set is.
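For a finite set the supremum in the definition of the diameter is a maximum over pairs, so it can be computed directly. The sketch below assumes a finite list of points; the helper name `diam` and the sample box are illustrative choices.

```python
import math
from itertools import combinations

def diam(points):
    """Largest Euclidean distance between two points of a finite set."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)),
               default=0.0)

# Corners of the box [0, 3] x [0, 4]; the largest distance is the diagonal.
corners = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
box_diameter = diam(corners)
```

Here the diagonal has length $(3^2 + 4^2)^{1/2} = 5$, which agrees with the formula $\left( \sum_i |b_i - a_i|^2 \right)^{1/2}$ for the diameter of a box.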
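$\operatorname{diam}(I_0) = \left( \sum_{i=1}^{p} |b_i - a_i|^2 \right)^{1/2}$
because for $x, y \in I_0$, $|x_i - y_i| \le |a_i - b_i|$ for each $i = 1, \dots, p$. For $x, y$ in one of the sets obtained by halving each side of $I_0$,
$|x - y| = \left( \sum_{i=1}^{p} |x_i - y_i|^2 \right)^{1/2} \le 2^{-1} \left( \sum_{i=1}^{p} |b_i - a_i|^2 \right)^{1/2} \equiv 2^{-1} \operatorname{diam}(I_0).$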
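Denote by $\{J_1, \dots, J_{2^p}\}$ these sets determined above. It follows the diameter of each set
is no larger than $2^{-1} \operatorname{diam}(I_0)$. In particular, since $d \equiv (d_1, \dots, d_p)$ and $c \equiv (c_1, \dots, c_p)$
are two such points, for each $J_k$,
$\operatorname{diam}(J_k) \equiv \left( \sum_{i=1}^{p} |d_i - c_i|^2 \right)^{1/2} \le 2^{-1} \operatorname{diam}(I_0).$
$C = \bigcup_{k=1}^{2^p} J_k \cap C.$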
If $f$ is not bounded on $C$, it follows that for some $k$, $f$ is not bounded on $J_k \cap C$. Let $I_1 \equiv J_k$ and let $C_1 = C \cap I_1$. Now do to $I_1$ and $C_1$ what was done to $I_0$ and $C$ to obtain
$I_2 \subseteq I_1$, and for $x, y \in I_2$,
$|x - y| \le 2^{-1} \operatorname{diam}(I_1) \le 2^{-2} \operatorname{diam}(I_0),$
and $f$ is unbounded on $I_2 \cap C_1 \equiv C_2$. Continue in this way obtaining sets $I_k$ such that
$I_k \supseteq I_{k+1}$ and $\operatorname{diam}(I_k) \le 2^{-k} \operatorname{diam}(I_0)$ and $f$ is unbounded on $I_k \cap C$. By the nested
interval lemma, there exists a point $c$ which is contained in each $I_k$.
Claim: $c \in C$.
Proof of claim: Suppose $c \notin C$. Since $C$ is a closed set, there exists $r > 0$ such that
$B(c, r)$ is contained completely in $\mathbb{R}^p \setminus C$. In other words, $B(c, r)$ contains no points of
$C$. Let $k$ be so large that $\operatorname{diam}(I_0) 2^{-k} < r$. Then since $c \in I_k$, and any two points of
$I_k$ are closer than $\operatorname{diam}(I_0) 2^{-k}$, $I_k$ must be contained in $B(c, r)$ and so has no points
of $C$ in it, contrary to the manner in which the $I_k$ are defined in which $f$ is unbounded
on $I_k \cap C$. Therefore, $c \in C$ as claimed.
Now for $k$ large enough, and $x \in C \cap I_k$, the continuity of $f$ implies $|f(c) - f(x)| < 1$,
contradicting the manner in which $I_k$ was chosen since this inequality implies $f$ is
bounded on $I_k \cap C$. This proves the theorem.
Here is a proof of the extreme value theorem.
Theorem 7.7.6 Let $C$ be closed and bounded and let $f : C \to \mathbb{R}$ be continuous. Then
$f$ achieves its maximum and its minimum on $C$. This means there exist $x_1, x_2 \in C$
such that for all $x \in C$,
$f(x_1) \le f(x) \le f(x_2).$
$\frac{1}{M - f(x)}$
7.7.3
write the sequence, not as $f$ but as $\{f_i\}_{i=k}^{\infty}$ or just $\{f_i\}$ for short. The letter used for
the name of the sequence is not important. Thus it is all right to let $a$ be the name of a
sequence or to refer to it as $\{a_i\}$. When the sequence has values in $\mathbb{R}^p$, it is customary
to write it in bold face. Thus $\{\mathbf{a}_i\}$ would refer to a sequence having values in $\mathbb{R}^p$ for
some $p > 1$.
$b_k = (2k - 1)^2 + 1 = 4k^2 - 4k + 2.$
Definition 7.7.12 A sequence $\{a_k\}$ is said to converge to $a$ if for every $\varepsilon > 0$
there exists $n_\varepsilon$ such that if $n > n_\varepsilon$, then $|a - a_n| < \varepsilon$. The usual notation for this
is $\lim_{n \to \infty} a_n = a$ although it is often written as $a_n \to a$.
The following theorem says the limit, if it exists, is unique.
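Theorem 7.7.13 If a sequence $\{a_n\}$ converges to $a$ and to $b$ then $a = b$.
Proof: Let $\varepsilon > 0$ be given. There exists $n_\varepsilon$ such that if $n > n_\varepsilon$ then $|a_n - a| < \frac{\varepsilon}{2}$ and
$|a_n - b| < \frac{\varepsilon}{2}$. Then pick such an $n$.
$|a - b| \le |a - a_n| + |a_n - b| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$
Since $\varepsilon$ is arbitrary, $a = b$.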
$\sum_{k=1}^{n_1} |a_k|.$
$\frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$
showing that, since $\varepsilon > 0$ is arbitrary, $\{a_n\}$ is a Cauchy sequence. It remains to show
the last claim. Suppose then that $\{a_n\}$ is a Cauchy sequence and $a = \lim_{k \to \infty} a_{n_k}$
where $\{a_{n_k}\}_{k=1}^{\infty}$ is a subsequence. Let $\varepsilon > 0$ be given. Then there exists $K$ such
that if $k, l \ge K$, then $|a_k - a_l| < \frac{\varepsilon}{2}$. Then if $k > K$, it follows $n_k > K$ because
$n_1, n_2, n_3, \dots$ is strictly increasing as the subscript increases. Also, there exists $K_1$ such
that if $k > K_1$, $|a_{n_k} - a| < \frac{\varepsilon}{2}$. Then letting $n > \max(K, K_1)$, pick $k > \max(K, K_1)$.
Then
$|a - a_n| \le |a - a_{n_k}| + |a_{n_k} - a_n| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$
This proves the theorem.
Definition 7.7.17 A set, K in Rp is said to be sequentially compact if every sequence in K has a subsequence which converges to a point of K.
Theorem 7.7.18 If $I_0 = \prod_{i=1}^{p} [a_i, b_i]$, $p \ge 1$, where $a_i \le b_i$, then $I_0$ is sequentially
compact.
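$|x - y| = \left( \sum_{i=1}^{p} |x_i - y_i|^2 \right)^{1/2} \le 2^{-1} \left( \sum_{i=1}^{p} |b_i - a_i|^2 \right)^{1/2} \equiv 2^{-1} \operatorname{diam}(I_0).$
$D_1 \equiv \left( \sum_{i=1}^{p} |d_i - c_i|^2 \right)^{1/2} \le 2^{-1} \operatorname{diam}(I_0)$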
Denote by $\{J_1, \dots, J_{2^p}\}$ these sets determined above. Since the union of these sets
equals all of $I_0 \equiv I$, it follows that some $J_k$ contains $a_i$ for infinitely many values of $i$.
Let that one be called $I_1$. Next do for $I_1$ what was done for $I_0$ to
get $I_2 \subseteq I_1$ such that the diameter is half that of $I_1$ and $I_2$ contains $a_k$ for infinitely
many values of $k$. Continue in this way obtaining a nested sequence of intervals $\{I_k\}$
such that $I_k \supseteq I_{k+1}$, and if $x, y \in I_k$, then $|x - y| \le 2^{-k} \operatorname{diam}(I_0)$, and $I_n$ contains
$a_k$ for infinitely many values of $k$ for each $n$. Then by the nested interval lemma, there
exists $c$ such that $c$ is contained in each $I_k$. Pick $a_{n_1} \in I_1$. Next pick $n_2 > n_1$ such that
$a_{n_2} \in I_2$. If $a_{n_1}, \dots, a_{n_k}$ have been chosen, let $a_{n_{k+1}} \in I_{k+1}$ and $n_{k+1} > n_k$. This can
be done because in the construction, $I_n$ contains $a_k$ for infinitely many $k$. Thus the
distance between $a_{n_k}$ and $c$ is no larger than $2^{-k} \operatorname{diam}(I_0)$ and so $\lim_{k \to \infty} a_{n_k} = c \in I_0$.
This proves the theorem.
7.7.4
Just as in the case of a function of one variable, there is a very useful way of thinking
of continuity in terms of limits of sequences found in the following theorem. In words,
it says a function is continuous if it takes convergent sequences to convergent sequences
whenever possible.
Theorem 7.7.20 A function $f : D(f) \to \mathbb{R}^q$ is continuous at $x \in D(f)$ if and only if,
whenever $x_n \to x$ with $x_n \in D(f)$, it follows $f(x_n) \to f(x)$.
Proof: Suppose first that $f$ is continuous at $x$ and let $x_n \to x$. Let $\varepsilon > 0$ be given. By
continuity, there exists $\delta > 0$ such that if $|y - x| < \delta$, then $|f(x) - f(y)| < \varepsilon$. However,
there exists $n_\delta$ such that if $n \ge n_\delta$, then $|x_n - x| < \delta$ and so for all $n$ this large,
$|f(x) - f(x_n)| < \varepsilon$
which shows $f(x_n) \to f(x)$.
Now suppose the condition about taking convergent sequences to convergent sequences holds at $x$. Suppose $f$ fails to be continuous at $x$. Then there exists $\varepsilon > 0$ and
$x_n \in D(f)$ such that $|x - x_n| < \frac{1}{n}$, yet
$|f(x) - f(x_n)| \ge \varepsilon.$
But this is clearly a contradiction because, although $x_n \to x$, $f(x_n)$ fails to converge to
$f(x)$. It follows $f$ must be continuous after all. This proves the theorem.
7.8 Exercises
5. Suppose every Cauchy sequence converges in R. Show this implies the least upper
bound axiom which is the usual way to state completeness for R. Explain why
the convergence of Cauchy sequences is equivalent to every nonempty set which
is bounded above has a least upper bound in R.
6. From Problem 2 every closed and bounded set is sequentially compact. Are these
the only sets which are sequentially compact? Explain.
7. A set whose elements are open sets, $\mathcal{C}$, is called an open cover of $H$ if $\cup \mathcal{C} \supseteq H$.
In other words, $\mathcal{C}$ is an open cover of $H$ if every point of $H$ is in at least one
set of $\mathcal{C}$. Show that if $\mathcal{C}$ is an open cover of a closed and bounded set $H$ then
there exists $\delta > 0$ such that whenever $x \in H$, $B(x, \delta)$ is contained in some set
of $\mathcal{C}$. This number $\delta$ is called a Lebesgue number. Hint: If there is no
Lebesgue number for $H$, let $H \subseteq I = \prod_{i=1}^{n} [a_i, b_i]$. Use the process of chopping
the intervals in half to get a sequence of nested intervals $I_k$ contained in $I$ where
$\operatorname{diam}(I_k) \le 2^{-k} \operatorname{diam}(I)$ and there is no Lebesgue number for the open cover on
$H_k \equiv H \cap I_k$. Now use the nested interval theorem to get $c$ in all these $H_k$. For
some $r > 0$ it follows $B(c, r)$ is contained in some open set of $\mathcal{C}$. But for large $k$,
it must be that $H_k \subseteq B(c, r)$ which contradicts the construction. You fill in the
details.
8. A set is compact if for every open cover of the set, there exists a finite subset of
the open cover which also covers the set. Show every closed and bounded set in
Rp is compact. Next show that if a set in Rp is compact, then it must be closed
and bounded. This is called the Heine Borel theorem.
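9. Suppose $S$ is a nonempty set in $\mathbb{R}^p$. Define
$\operatorname{dist}(x, S) \equiv \inf \{ |x - y| : y \in S \}.$
Show that
$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| \le |x - y|.$
Hint: Suppose $\operatorname{dist}(x, S) < \operatorname{dist}(y, S)$. If these are equal there is nothing to show.
Explain why there exists $z \in S$ such that $|x - z| < \operatorname{dist}(x, S) + \varepsilon$. Now explain
why
$|\operatorname{dist}(x, S) - \operatorname{dist}(y, S)| = \operatorname{dist}(y, S) - \operatorname{dist}(x, S) \le |y - z| - (|x - z| - \varepsilon).$
Now use the triangle inequality and observe that $\varepsilon$ is arbitrary.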
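The inequality in Problem 9 can be spot-checked numerically when $S$ is a finite set, in which case the infimum is a minimum. This is a sketch under that assumption only; the random points, the seed, and the helper names are illustrative, and a passing check is evidence rather than a proof.

```python
import math
import random

def dist(x, S):
    """Distance from the point x to a finite nonempty set S."""
    return min(math.dist(x, s) for s in S)

random.seed(0)  # fixed seed so the run is repeatable
S = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]

def lipschitz_holds(trials=1000):
    """Check |dist(x,S) - dist(y,S)| <= |x - y| on random pairs x, y."""
    for _ in range(trials):
        x = (random.uniform(-3, 3), random.uniform(-3, 3))
        y = (random.uniform(-3, 3), random.uniform(-3, 3))
        # Small tolerance guards against floating point rounding.
        if abs(dist(x, S) - dist(y, S)) > math.dist(x, y) + 1e-12:
            return False
    return True

result = lipschitz_holds()
```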
10. Suppose $H$ is a closed set and $H \subseteq U \subseteq \mathbb{R}^p$, an open set. Show there exists a
continuous function $f$ defined on $\mathbb{R}^p$ such that $f(\mathbb{R}^p) \subseteq [0, 1]$, $f(x) = 0$ if $x \notin U$
and $f(x) = 1$ if $x \in H$. Hint: Try something like
$\frac{\operatorname{dist}(x, U^C)}{\operatorname{dist}(x, U^C) + \operatorname{dist}(x, H)},$
where $U^C \equiv \mathbb{R}^p \setminus U$, a closed set. You need to explain why the denominator is
never equal to zero. The rest is supplied by Problem 9. This is a special case of
a major theorem called Urysohn's lemma.
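To see what the function in the hint looks like, here is a one dimensional sketch. The particular sets $H = [-1, 1]$ and $U = (-2, 2)$ are assumptions chosen for illustration, and the two distance functions are written out by hand for these sets.

```python
def dist_to_complement_of_U(x):
    # U = (-2, 2), so its complement is (-inf, -2] together with [2, inf).
    return max(0.0, 2.0 - abs(x))

def dist_to_H(x):
    # H = [-1, 1].
    return max(0.0, abs(x) - 1.0)

def urysohn(x):
    """dist(x, U^C) / (dist(x, U^C) + dist(x, H)) for the sets above."""
    a = dist_to_complement_of_U(x)
    # The denominator is positive: if a == 0 then |x| >= 2, so dist_to_H >= 1.
    return a / (a + dist_to_H(x))
```

The function equals 1 on $H$, equals 0 outside $U$, and passes through intermediate values in between, for instance `urysohn(1.5)` is 0.5.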
Outcomes
8.1
Limits of vector valued functions have been considered earlier. Here it is desired to
consider
$\lim_{h \to 0} \frac{f(t_0 + h) - f(t_0)}{h}$
Specializing to functions of one variable, one can give a meaning to
$\lim_{s \to t+} f(s)$, $\lim_{s \to t-} f(s)$, $\lim_{s \to \infty} f(s)$,
and
$\lim_{s \to -\infty} f(s).$
$\lim_{s \to t+} f(s) = L$
if and only if for all $\varepsilon > 0$ there exists $\delta > 0$ such that if
$0 < s - t < \delta,$
then
$|f(s) - L| < \varepsilon.$
In the case where $D(f)$ is only assumed to satisfy $D(f) \supseteq (t - r, t)$,
$\lim_{s \to t-} f(s) = L$
if and only if for all $\varepsilon > 0$ there exists $\delta > 0$ such that if
$0 < t - s < \delta,$
then
$|f(s) - L| < \varepsilon.$
One can also consider limits as a variable approaches infinity. Of course nothing is
close to infinity and so this requires a slightly different definition.
$\lim_{t \to \infty} f(t) = L$
if for every $\varepsilon > 0$ there exists $l$ such that whenever $t > l$,
$|f(t) - L| < \varepsilon$ (8.1)
and
$\lim_{t \to -\infty} f(t) = L$
if for every $\varepsilon > 0$ there exists $l$ such that whenever $t < l$, 8.1 holds.
Note that in all of this the definitions are identical to the case of scalar valued
functions. The only difference is that here $|\cdot|$ refers to the norm or length in $\mathbb{R}^p$ where
maybe $p > 1$.
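Example 8.1.2 Let $f(t) = (\cos t, \sin t, t^2 + 1, \ln(t))$. Find $\lim_{t \to \pi/2} f(t)$.
Use Theorem 7.4.7 on Page 139 and the continuity of the functions to write this
limit equals
$\left( \lim_{t \to \pi/2} \cos t, \lim_{t \to \pi/2} \sin t, \lim_{t \to \pi/2} (t^2 + 1), \lim_{t \to \pi/2} \ln(t) \right) = \left( 0, 1, \frac{\pi^2}{4} + 1, \ln \frac{\pi}{2} \right).$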
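The componentwise computation can be checked by measuring the norm $|f(t) - L|$ in $\mathbb{R}^4$ as $t$ approaches $\pi/2$. This is a sketch; the helper name `distance_to_limit` and the chosen step sizes are illustrative assumptions.

```python
import math

def f(t):
    return (math.cos(t), math.sin(t), t**2 + 1, math.log(t))

t0 = math.pi / 2
# Claimed limit, taken one component at a time.
expected = (0.0, 1.0, math.pi**2 / 4 + 1, math.log(math.pi / 2))

def distance_to_limit(t):
    """Norm |f(t) - L| in R^4 for the claimed limit L."""
    return math.dist(f(t), expected)

# The gap shrinks as t approaches pi/2 from the right.
gaps = [distance_to_limit(t0 + 10.0**(-k)) for k in range(1, 7)]
```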
Example 8.1.3 Let f (t) =
Recall that limt0
(1, 0, 1) .
sin t
t
sin t
t
8.2
The following definition is on the derivative and integral of a vector valued function of
one variable.
Definition 8.2.1 The derivative of a function, $f'(t)$, is defined as the following limit
whenever the limit exists. If the limit does not exist, then neither does $f'(t)$.
$\lim_{h \to 0} \frac{f(t + h) - f(t)}{h} \equiv f'(t)$
The function of $h$ on the left is called the difference quotient just as it was for a scalar
valued function. If $f(t) = (f_1(t), \dots, f_p(t))$ and $\int_a^b f_i(t)\,dt$ exists for each $i = 1, \dots, p$,
then $\int_a^b f(t)\,dt$ is defined as the vector
$\left( \int_a^b f_1(t)\,dt, \dots, \int_a^b f_p(t)\,dt \right).$
As before, the derivative can also be written as
$f'(x) = \lim_{y \to x} \frac{f(y) - f(x)}{y - x}.$
As in the case of a scalar valued function, differentiability implies continuity but not
the other way around.
Theorem 8.2.2 If $f'(t)$ exists, then $f$ is continuous at $t$.
Proof: Suppose $\varepsilon > 0$ is given and choose $\delta_1 > 0$ such that if $|h| < \delta_1$,
$\left| \frac{f(t + h) - f(t)}{h} - f'(t) \right| < 1.$
Then for such $h$, the triangle inequality implies
$|f(t + h) - f(t)| < |h| + |f'(t)| |h|.$
Now letting $\delta < \min \left( \delta_1, \frac{\varepsilon}{1 + |f'(t)|} \right)$ it follows if $|h| < \delta$, then
$|f(t + h) - f(t)| < \varepsilon.$
Letting $y = h + t$, this shows that if $|y - t| < \delta$,
$|f(y) - f(t)| < \varepsilon$
which proves $f$ is continuous at $t$. This proves the theorem.
As in the scalar case, there is a fundamental theorem of calculus.
Theorem 8.2.3 If $f \in R([a, b])$ and if $f$ is continuous at $t \in (a, b)$, then
$\frac{d}{dt} \left( \int_a^t f(s)\,ds \right) = f(t).$
$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} 0 = 0$
Example 8.2.5 Let $f(t) = (at, bt)$ where $a, b$ are constants. Find $f'(t)$.
From the above discussion this derivative is just the vector valued function whose
components consist of the derivatives of the components of $f$. Thus $f'(t) = (a, b)$.
8.2.1
Suppose r is a vector valued function of a parameter, t not necessarily time and consider
the following picture of the points traced out by r.
[Figure: the curve traced out by r, with the points r(t) and r(t + h) marked.]
In this picture there are unit vectors in the direction of the vector from $r(t)$ to
$r(t + h)$. You can see that it is reasonable to suppose these unit vectors, if they converge,
converge to a unit vector $T$ which is tangent to the curve at the point $r(t)$. Now each
of these unit vectors is of the form
$\frac{r(t + h) - r(t)}{|r(t + h) - r(t)|} \equiv T_h.$
Thus $T_h \to T$, a unit tangent vector to the curve at the point $r(t)$. Therefore,
$r'(t) \equiv \lim_{h \to 0} \frac{r(t + h) - r(t)}{h} = \lim_{h \to 0} \frac{|r(t + h) - r(t)|}{h} \frac{r(t + h) - r(t)}{|r(t + h) - r(t)|}$
$= \lim_{h \to 0} \frac{|r(t + h) - r(t)|}{h} T_h = |r'(t)| T.$
Therefore,
$\frac{|r(t + h) - r(t)|}{h}$
gives for small $h$, the approximate distance travelled on the time interval $[t, t + h]$
divided by the length of time, $h$. Therefore, this expression is really the average speed
of the object on this small time interval and so the limit as $h \to 0$ deserves to be called
the instantaneous speed of the object. Thus $|r'(t)| T$ represents the speed times a unit
direction vector $T$ which defines the direction in which the object is moving. Thus $r'(t)$
is the velocity of the object. This is the physical significance of the derivative when $t$ is
time.
How do you go about computing $r'(t)$? Letting $r(t) = (r_1(t), \dots, r_q(t))$, the expression
$\frac{r(t_0 + h) - r(t_0)}{h}$ (8.2)
is equal to
$\left( \frac{r_1(t_0 + h) - r_1(t_0)}{h}, \dots, \frac{r_q(t_0 + h) - r_q(t_0)}{h} \right).$
Then as $h$ converges to 0, 8.2 converges to
$v \equiv (v_1, \dots, v_q)$
where $v_k = r_k'(t_0)$. This is by Theorem 7.4.7 on Page 139, which says that the term in 8.2
gets close to a vector $v$ if and only if all the coordinate functions of the term in 8.2 get
close to the corresponding coordinate functions of $v$.
In the case where t is time, this simply says the velocity vector equals the vector
whose components are the derivatives of the components of the displacement vector,
r (t) .
In any case, the vector, T determines a direction vector which is tangent to the
curve at the point, r (t) and so it is possible to find parametric equations for the line
tangent to the curve at various points.
Example 8.2.6 Let $r(t) = (\sin t, t^2, t + 1)$ for $t \in [0, 5]$. Find a tangent line to the
curve parameterized by $r$ at the point $r(2)$.
From the above discussion, a direction vector has the same direction as $r'(2)$. Therefore, it suffices to simply use $r'(2)$ as a direction vector for the line. $r'(2) = (\cos 2, 4, 1)$.
Therefore, a parametric equation for the tangent line is
$(\sin 2, 4, 3) + t (\cos 2, 4, 1) = (x, y, z).$
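The tangent line of Example 8.2.6 can be written as a small function. In this sketch the line parameter is called `s` to keep it apart from the curve parameter, and the function names are illustrative assumptions.

```python
import math

def r(t):
    return (math.sin(t), t**2, t + 1)

def r_prime(t):
    # Componentwise derivative of r.
    return (math.cos(t), 2 * t, 1.0)

def tangent_line(t0, s):
    """Point on the tangent line at r(t0) for line parameter s."""
    return tuple(p + s * d for p, d in zip(r(t0), r_prime(t0)))

base_point = tangent_line(2.0, 0.0)  # equals r(2) = (sin 2, 4, 3)
# Near t = 2 the line stays close to the curve.
gap = math.dist(r(2.001), tangent_line(2.0, 0.001))
```

The small value of `gap` reflects that the tangent line is the best linear approximation to the curve near $r(2)$.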
Example 8.2.7 Let $r(t) = (\sin t, t^2, t + 1)$ for $t \in [0, 5]$. Find the velocity vector when
$t = 1$.
From the above discussion, this is simply $r'(1) = (\cos 1, 2, 1)$.
8.2.2 Differentiation Rules
There are rules which relate the derivative to the various operations done with vectors
such as the dot product, the cross product, and vector addition and scalar multiplication.
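Theorem 8.2.8 Let $a, b \in \mathbb{R}$ and suppose $f'(t)$ and $g'(t)$ exist. Then the following
formulas are obtained.
$(af + bg)'(t) = af'(t) + bg'(t).$ (8.3)
$(f \cdot g)'(t) = f'(t) \cdot g(t) + f(t) \cdot g'(t)$ (8.4)
If $f$, $g$ have values in $\mathbb{R}^3$, then
$(f \times g)'(t) = f(t) \times g'(t) + f'(t) \times g(t)$ (8.5)
The formulas, 8.4, and 8.5 are referred to as the product rule.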
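Proof: The first formula is left for you to prove. Consider the second, 8.4.
$\lim_{h \to 0} \frac{f \cdot g\,(t + h) - f \cdot g\,(t)}{h}$
$= \lim_{h \to 0} \left[ f(t + h) \cdot \frac{(g(t + h) - g(t))}{h} + \frac{(f(t + h) - f(t))}{h} \cdot g(t) \right]$
$= \lim_{h \to 0} \left[ \sum_{k=1}^{n} f_k(t + h) \frac{(g_k(t + h) - g_k(t))}{h} + \sum_{k=1}^{n} \frac{(f_k(t + h) - f_k(t))}{h} g_k(t) \right]$
$= \sum_{k=1}^{n} f_k(t) g_k'(t) + \sum_{k=1}^{n} f_k'(t) g_k(t) = f(t) \cdot g'(t) + f'(t) \cdot g(t).$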
From 8.5 this equals $(2t, \cos t, -\sin t) \times (t, \ln(t + 1), 2t) + (t^2, \sin t, \cos t) \times \left( 1, \frac{1}{t+1}, 2 \right)$.
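The expression just obtained from 8.5 can be compared with a difference quotient of the cross product itself. In the sketch below the two functions are read off from the formula above, the names `r` and `p` are assumptions, and a central difference with a hypothetical step size stands in for the limit.

```python
import math

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def r(t):  return (t**2, math.sin(t), math.cos(t))
def p(t):  return (t, math.log(t + 1), 2 * t)
def dr(t): return (2 * t, math.cos(t), -math.sin(t))
def dp(t): return (1.0, 1 / (t + 1), 2.0)

def product_rule(t):
    # Formula 8.5: (r x p)' = r' x p + r x p'.
    a, b = cross(dr(t), p(t)), cross(r(t), dp(t))
    return tuple(x + y for x, y in zip(a, b))

def central_difference(t, h=1e-6):
    # Numerical derivative of t -> r(t) x p(t).
    hi, lo = cross(r(t + h), p(t + h)), cross(r(t - h), p(t - h))
    return tuple((x - y) / (2 * h) for x, y in zip(hi, lo))

error = math.dist(product_rule(1.0), central_difference(1.0))
```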
Example 8.2.10 Let $r(t) = (t^2, \sin t, \cos t)$. Find $\int_0^\pi r(t)\,dt$.
This equals $\left( \int_0^\pi t^2\,dt, \int_0^\pi \sin t\,dt, \int_0^\pi \cos t\,dt \right) = \left( \frac{1}{3} \pi^3, 2, 0 \right)$.
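Since the integral of a vector valued function is taken one component at a time, any scalar quadrature rule can be applied componentwise. The following sketch uses the composite midpoint rule, which is an illustrative choice and not a method from the notes; the number of subintervals is also an assumption.

```python
import math

def r(t):
    return (t**2, math.sin(t), math.cos(t))

def integrate(f, a, b, n=2000):
    """Composite midpoint rule applied to each component of f."""
    h = (b - a) / n
    dim = len(f(a))
    total = [0.0] * dim
    for i in range(n):
        value = f(a + (i + 0.5) * h)   # sample at the midpoint
        for k in range(dim):
            total[k] += value[k] * h
    return tuple(total)

approx = integrate(r, 0.0, math.pi)
exact = (math.pi**3 / 3, 2.0, 0.0)
```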
Example 8.2.11 An object has position $r(t) = \left( t^3, \frac{t}{1+t}, \sqrt{t^2 + 2} \right)$ kilometers where $t$ is
given in hours. Find the velocity of the object in kilometers per hour when $t = 1$.
Recall the velocity at time $t$ was $r'(t)$. Therefore, find $r'(t)$ and plug in $t = 1$ to
find the velocity.
$r'(t) = \left( 3t^2, \frac{1 (1 + t) - t (1)}{(1 + t)^2}, \frac{1}{2} (t^2 + 2)^{-1/2} 2t \right) = \left( 3t^2, \frac{1}{(1 + t)^2}, \frac{t}{\sqrt{t^2 + 2}} \right)$
When $t = 1$, the velocity is
$r'(1) = \left( 3, \frac{1}{4}, \frac{1}{\sqrt{3}} \right)$ kilometers per hour.
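The velocity found in Example 8.2.11 can be evaluated directly, and the speed is then the length of that vector. A minimal sketch, with function names chosen for illustration:

```python
import math

def r(t):
    return (t**3, t / (1 + t), math.sqrt(t**2 + 2))

def velocity(t):
    # Componentwise derivative, as computed in the example.
    return (3 * t**2, 1 / (1 + t)**2, t / math.sqrt(t**2 + 2))

v1 = velocity(1.0)          # velocity at t = 1, in kilometers per hour
speed = math.hypot(*v1)     # |r'(1)|, the instantaneous speed

# Cross-check against a central difference quotient of r (step is arbitrary).
h = 1e-6
numeric = tuple((a - b) / (2 * h) for a, b in zip(r(1.0 + h), r(1.0 - h)))
```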
Obviously, this can be continued. That is, you can consider the possibility of taking
the derivative of the derivative and then the derivative of that and so forth. The main
thing to consider about this is the notation and it is exactly like it was in the case of a
scalar valued function presented earlier. Thus $r''(t)$ denotes the second derivative.
When you are given a vector valued function of one variable, sometimes it is possible
to give a simple description of the curve which results. Usually it is not possible to do
this!
Example 8.2.12 Describe the curve which results from the vector valued function,
r (t) = (cos 2t, sin 2t, t) where t R.
The first two components indicate that for r (t) = (x (t) , y (t) , z (t)) , the pair,
(x (t) , y (t)) traces out a circle. While it is doing so, z (t) is moving at a steady rate in
the positive direction. Therefore, the curve which results is a corkscrew shaped thing
called a helix.
Here is an interesting application of the theorems for differentiating curves. It is also
a situation where the curve can be identified as something familiar.
Example 8.2.13 Sound waves have the angle of incidence equal to the angle of reflection. Suppose you are in a large room and you make a sound. The sound waves spread
out and you would expect your sound to be inaudible very far away. But what if the room
were shaped so that the sound is reflected off the wall toward a single point, possibly far
away from you? Then you might have the interesting phenomenon of someone far away
hearing what you said quite clearly. How should the room be designed?
Suppose you are located at the point $P_0$ and the point where your sound is to be
reflected is $P_1$. Consider a plane which contains the two points and let $r(t)$ denote a
parameterization of the intersection of this plane with the walls of the room. Then the
condition that the angle of reflection equals the angle of incidence reduces to saying the
angle between $P_0 - r(t)$ and $-r'(t)$ equals the angle between $P_1 - r(t)$ and $r'(t)$. Draw
a picture to see this. Therefore,
$\frac{(P_0 - r(t)) \cdot (-r'(t))}{|P_0 - r(t)| |r'(t)|} = \frac{(P_1 - r(t)) \cdot (r'(t))}{|P_1 - r(t)| |r'(t)|}.$
This reduces to
$\frac{(r(t) - P_0) \cdot (-r'(t))}{|r(t) - P_0|} = \frac{(r(t) - P_1) \cdot (r'(t))}{|r(t) - P_1|}.$ (8.6)
Now
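$\frac{d}{dt} |r(t) - P_1| = \frac{d}{dt} \left( (r(t) - P_1) \cdot (r(t) - P_1) \right)^{1/2}$
$= \frac{1}{2} \left( (r(t) - P_1) \cdot (r(t) - P_1) \right)^{-1/2} 2 ((r(t) - P_1) \cdot r'(t))$
$= \frac{(r(t) - P_1) \cdot (r'(t))}{|r(t) - P_1|}.$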
8.2.3 Leibniz's Notation
8.3
dy
dt
Proof: $(A(t) B(t))' = (C_{ij}'(t))$ where $C_{ij}(t) = A_{ik}(t) B_{kj}(t)$ and the repeated
index summation convention is being used. Therefore,
$C_{ij}'(t) = A_{ik}'(t) B_{kj}(t) + A_{ik}(t) B_{kj}'(t)$
$= (A'(t) B(t))_{ij} + (A(t) B'(t))_{ij}$
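The entrywise computation above says the derivative of a matrix product is $A'B + AB'$. The sketch below checks this numerically on a pair of hypothetical $2 \times 2$ matrix functions; the matrices, the evaluation point, and the step size are all illustrative assumptions.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical 2 x 2 matrix functions and their entrywise derivatives.
def A(t):  return [[t, t**2], [math.sin(t), 1.0]]
def dA(t): return [[1.0, 2 * t], [math.cos(t), 0.0]]
def B(t):  return [[math.exp(t), 0.0], [t, t**3]]
def dB(t): return [[math.exp(t), 0.0], [1.0, 3 * t**2]]

def product_rule(t):
    # (AB)' = A'B + AB'
    return add(matmul(dA(t), B(t)), matmul(A(t), dB(t)))

def central_difference(t, h=1e-6):
    P, M = matmul(A(t + h), B(t + h)), matmul(A(t - h), B(t - h))
    return [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
            for rp, rm in zip(P, M)]

max_error = max(abs(x - y)
                for rx, ry in zip(product_rule(0.5), central_difference(0.5))
                for x, y in zip(rx, ry))
```

Note that the order of the factors matters here, since matrix multiplication is not commutative.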
8.4
Let i (t) , j (t) , k (t) be a right handed1 orthonormal basis of vectors for each t. It is
assumed these vectors are C 1 functions of t. Letting the positive x axis extend in the
direction of i (t) , the positive y axis extend in the direction of j (t), and the positive
z axis extend in the direction of k (t) , yields a moving coordinate system. Now let
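$\mathbf{u} = (u_1, u_2, u_3) \in \mathbb{R}^3$ and let $t_0$ be some reference time. For example you could let
$t_0 = 0$. Then define the components of $\mathbf{u}$ with respect to these vectors, $\mathbf{i}, \mathbf{j}, \mathbf{k}$ at time $t_0$
as
\[
\mathbf{u} \equiv u_1 \mathbf{i}(t_0) + u_2 \mathbf{j}(t_0) + u_3 \mathbf{k}(t_0).
\]
Let $\mathbf{u}(t)$ be defined as the vector which has the same components with respect to $\mathbf{i}, \mathbf{j}, \mathbf{k}$
but at time $t$. Thus
\[
\mathbf{u}(t) \equiv u_1 \mathbf{i}(t) + u_2 \mathbf{j}(t) + u_3 \mathbf{k}(t)
\]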
and the vector has changed although the components have not.
For example, this is exactly the situation in the case of apparently fixed basis vectors
on the earth if u is a position vector from the given spot on the earth's surface to a
point regarded as fixed with the earth due to its keeping the same coordinates relative
to coordinate axes which are fixed with the earth.
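Now define a linear transformation $Q(t)$ mapping $\mathbb{R}^3$ to $\mathbb{R}^3$ by
\[
Q(t)\mathbf{u} \equiv u_1 \mathbf{i}(t) + u_2 \mathbf{j}(t) + u_3 \mathbf{k}(t)
\]
where
\[
\mathbf{u} \equiv u_1 \mathbf{i}(t_0) + u_2 \mathbf{j}(t_0) + u_3 \mathbf{k}(t_0).
\]
Thus letting $\mathbf{v}, \mathbf{u} \in \mathbb{R}^3$ be vectors and $\alpha, \beta$ scalars,
\[
\begin{aligned}
Q(t)(\alpha\mathbf{u} + \beta\mathbf{v}) &\equiv (\alpha u_1 + \beta v_1)\mathbf{i}(t) + (\alpha u_2 + \beta v_2)\mathbf{j}(t) + (\alpha u_3 + \beta v_3)\mathbf{k}(t) \\
&= (\alpha u_1 \mathbf{i}(t) + \alpha u_2 \mathbf{j}(t) + \alpha u_3 \mathbf{k}(t)) + (\beta v_1 \mathbf{i}(t) + \beta v_2 \mathbf{j}(t) + \beta v_3 \mathbf{k}(t)) \\
&= \alpha(u_1 \mathbf{i}(t) + u_2 \mathbf{j}(t) + u_3 \mathbf{k}(t)) + \beta(v_1 \mathbf{i}(t) + v_2 \mathbf{j}(t) + v_3 \mathbf{k}(t)) \\
&\equiv \alpha Q(t)\mathbf{u} + \beta Q(t)\mathbf{v}
\end{aligned}
\]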
showing that $Q(t)$ is a linear transformation. Also, $Q(t)$ preserves all distances because,
since the vectors $\mathbf{i}(t), \mathbf{j}(t), \mathbf{k}(t)$ form an orthonormal set,
\[
|Q(t)\mathbf{u}| = \left(\sum_{i=1}^{3} (u_i)^2\right)^{1/2} = |\mathbf{u}|.
\]
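For simplicity, let $\mathbf{i}(t) = \mathbf{e}_1(t)$, $\mathbf{j}(t) = \mathbf{e}_2(t)$, $\mathbf{k}(t) = \mathbf{e}_3(t)$ and
\[
\mathbf{i}(t_0) = \mathbf{e}_1(t_0), \quad \mathbf{j}(t_0) = \mathbf{e}_2(t_0), \quad \mathbf{k}(t_0) = \mathbf{e}_3(t_0).
\]
Then using the repeated index summation convention,
\[
\mathbf{u}(t) = u_j \mathbf{e}_j(t) = u_j \big(\mathbf{e}_j(t) \cdot \mathbf{e}_i(t_0)\big)\, \mathbf{e}_i(t_0)
\]
and so with respect to the basis $\mathbf{i}(t_0) = \mathbf{e}_1(t_0)$, $\mathbf{j}(t_0) = \mathbf{e}_2(t_0)$, $\mathbf{k}(t_0) = \mathbf{e}_3(t_0)$, the matrix of $Q(t)$ is
\[
Q_{ij}(t) = \mathbf{e}_i(t_0) \cdot \mathbf{e}_j(t).
\]
Recall this means you take a vector $\mathbf{u} \in \mathbb{R}^3$ which is a list of the components of $\mathbf{u}$ with
respect to $\mathbf{i}(t_0), \mathbf{j}(t_0), \mathbf{k}(t_0)$ and when you multiply by $Q(t)$ you get the components
of $\mathbf{u}(t)$ with respect to $\mathbf{i}(t_0), \mathbf{j}(t_0), \mathbf{k}(t_0)$. I will refer to this matrix as $Q(t)$ to save
notation.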
1 Recall
Lemma 8.4.1 Suppose $Q(t)$ is a real, differentiable $n \times n$ matrix which preserves distances.
Then $Q(t)Q(t)^T = Q(t)^T Q(t) = I$. Also, if $\mathbf{u}(t) \equiv Q(t)\mathbf{u}$, then there exists
a vector, $\mathbf{\Omega}(t)$ such that
\[
\mathbf{u}'(t) = \mathbf{\Omega}(t) \times \mathbf{u}(t).
\]
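Proof: Recall that $(\mathbf{z} \cdot \mathbf{w}) = \frac{1}{4}\left(|\mathbf{z} + \mathbf{w}|^2 - |\mathbf{z} - \mathbf{w}|^2\right)$. Therefore,
\[
\begin{aligned}
(Q(t)\mathbf{u} \cdot Q(t)\mathbf{w}) &= \frac{1}{4}\left(|Q(t)(\mathbf{u} + \mathbf{w})|^2 - |Q(t)(\mathbf{u} - \mathbf{w})|^2\right) \\
&= \frac{1}{4}\left(|\mathbf{u} + \mathbf{w}|^2 - |\mathbf{u} - \mathbf{w}|^2\right) \\
&= (\mathbf{u} \cdot \mathbf{w}).
\end{aligned}
\]
This implies
\[
\left(Q(t)^T Q(t)\mathbf{u} \cdot \mathbf{w}\right) = (\mathbf{u} \cdot \mathbf{w})
\]
for all $\mathbf{u}, \mathbf{w}$. Therefore, $Q(t)^T Q(t)\mathbf{u} = \mathbf{u}$ and so $Q(t)^T Q(t) = Q(t)Q(t)^T = I$. This
proves the first part of the lemma.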
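It follows from the product rule, Lemma 8.3.2, applied to $Q(t)Q(t)^T = I$ that
\[
Q'(t)Q(t)^T + Q(t)Q'(t)^T = 0
\]
and so
\[
Q'(t)Q(t)^T = -\left(Q'(t)Q(t)^T\right)^T. \tag{8.7}
\]
From the definition, $Q(t)\mathbf{u} = \mathbf{u}(t)$, so $\mathbf{u} = Q(t)^T \mathbf{u}(t)$ and
\[
\mathbf{u}'(t) = Q'(t)\mathbf{u} = Q'(t)\,Q(t)^T \mathbf{u}(t).
\]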
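Then writing the matrix of $Q'(t)Q(t)^T$ with respect to $\mathbf{i}(t_0), \mathbf{j}(t_0), \mathbf{k}(t_0)$, it follows
from 8.7 that the matrix of $Q'(t)Q(t)^T$ is of the form
\[
\begin{pmatrix}
0 & -\omega_3(t) & \omega_2(t) \\
\omega_3(t) & 0 & -\omega_1(t) \\
-\omega_2(t) & \omega_1(t) & 0
\end{pmatrix}
\]
for some time dependent scalars, $\omega_i$. Therefore,
\[
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}'(t) =
\begin{pmatrix}
0 & -\omega_3(t) & \omega_2(t) \\
\omega_3(t) & 0 & -\omega_1(t) \\
-\omega_2(t) & \omega_1(t) & 0
\end{pmatrix}
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix}(t)
\]
where here the $u_i$ denote the components of the vector $\mathbf{u}(t)$ in terms of the fixed vectors $\mathbf{i}(t_0), \mathbf{j}(t_0), \mathbf{k}(t_0)$. Therefore,
$\mathbf{u}'(t) = \mathbf{\Omega}(t) \times \mathbf{u}(t)$
where
\[
\mathbf{\Omega}(t) = \omega_1(t)\mathbf{i}(t_0) + \omega_2(t)\mathbf{j}(t_0) + \omega_3(t)\mathbf{k}(t_0), \tag{8.8}
\]
because
\[
\mathbf{\Omega}(t) \times \mathbf{u}(t) \equiv
\begin{vmatrix}
\mathbf{i}(t_0) & \mathbf{j}(t_0) & \mathbf{k}(t_0) \\
\omega_1 & \omega_2 & \omega_3 \\
u_1 & u_2 & u_3
\end{vmatrix}.
\]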
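The lemma can be checked numerically in a simple case. The sketch below is only an illustration under stated assumptions: it takes $Q(t)$ to be a rotation about the third axis at a made-up constant rate `RATE`, approximates $Q'(t)$ by a central difference, and reads the $\omega_i$ off the skew-symmetric matrix $Q'(t)Q(t)^T$. All function names are hypothetical helpers, not part of the notes.

```python
import math

RATE = 0.7  # hypothetical spin rate, radians per unit time


def rotation_about_k(angle):
    """Matrix of a rotation by `angle` radians about the third axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def matvec(a, u):
    return [sum(a[i][k] * u[k] for k in range(3)) for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def q(t):
    """The distance preserving matrix Q(t) assumed for this example."""
    return rotation_about_k(RATE * t)


def q_prime(t, h=1e-6):
    """Central difference approximation of Q'(t)."""
    qp, qm = q(t + h), q(t - h)
    return [[(qp[i][j] - qm[i][j]) / (2 * h) for j in range(3)]
            for i in range(3)]


def skew_matrix(t):
    """Approximation of Q'(t) Q(t)^T, which should be skew symmetric."""
    return matmul(q_prime(t), transpose(q(t)))


def omega_from_skew(w):
    """Read (omega_1, omega_2, omega_3) from the entries of the skew matrix."""
    return [w[2][1], w[0][2], w[1][0]]


w = skew_matrix(1.3)
omega = omega_from_skew(w)
u = [1.0, -2.0, 0.5]
print("Omega          :", omega)
print("Q'Q^T u        :", matvec(w, u))
print("Omega cross u  :", cross(omega, u))
```

For this choice of $Q(t)$ the recovered vector should point along the third axis with length close to `RATE`, and the last two printed vectors should agree up to the finite difference error.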
8.5 Exercises
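(b) $\lim_{x \to 0+} \left( \frac{x}{|x|},\ \sec x,\ e^x \right)$
(c) $\lim_{x \to 4} \left( \frac{x^2 - 16}{x + 4},\ x + 7,\ \frac{\tan 4x}{5x} \right)$
(d) $\lim_{x \to \infty} \left( \frac{x}{1 + x^2},\ \frac{x^2}{1 + x^2},\ \frac{\sin x^2}{x} \right)$
2. Find $\lim_{x \to 2} \left( \frac{x^2 - 4}{x + 2},\ x^2 + 2x - 1,\ \frac{x^2 - 4}{x - 2} \right)$.
$a^3 - b^3 = (a - b)\left(a^2 + ab + b^2\right)$.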
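4. Let $\mathbf{r}(t) = \left( 4 + (t - 1)^2,\ \sqrt{t^2 + 1}\,(t - 1)^3,\ \frac{(t - 1)^3}{t^5} \right)$ describe the position of an object
in $\mathbb{R}^3$ as a function of t where t is measured in seconds and $\mathbf{r}(t)$ is measured
in meters. Is the velocity of this object ever equal to zero? If so, find the value of
t at which this occurs and the point in $\mathbb{R}^3$ at which the velocity is zero.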
5.
6.
7.
8.
9.
10.
Let $\mathbf{r}(t) = \left( \sin 2t,\ t^2,\ 2t + 1 \right)$ for $t \in [0, 4]$. Find a tangent line to the curve parameterized by $\mathbf{r}$ at the point $\mathbf{r}(2)$.
Let $\mathbf{r}(t) = \left( t,\ \sin(t^2),\ t + 1 \right)$ for $t \in [0, 5]$. Find a tangent line to the curve parameterized by $\mathbf{r}$ at the point $\mathbf{r}(2)$.
Let $\mathbf{r}(t) = \left( \sin t,\ t^2,\ \cos(t^2) \right)$ for $t \in [0, 5]$. Find a tangent line to the curve
parameterized by $\mathbf{r}$ at the point $\mathbf{r}(2)$.
Let $\mathbf{r}(t) = \left( \sin t,\ \cos(t^2),\ t + 1 \right)$ for $t \in [0, 5]$. Find the velocity when t = 3.
11. Suppose an object has position $\mathbf{r}(t) \in \mathbb{R}^3$ where $\mathbf{r}$ is differentiable and suppose
also that $|\mathbf{r}(t)| = c$ where c is a constant.
(a) Show first that this condition does not require $\mathbf{r}(t)$ to be a constant. Hint:
You can do this either mathematically or by giving a physical example.
(b) Show that you can conclude that $\mathbf{r}'(t) \cdot \mathbf{r}(t) = 0$. That is, the velocity is
always perpendicular to the displacement.
12. Prove 8.5 from the component description of the cross product.
13. Prove 8.5 from the formula $(\mathbf{f} \times \mathbf{g})_i = \varepsilon_{ijk} f_j g_k$.
14. Prove 8.5 directly from the definition of the derivative without considering components.
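15. A Bézier curve in $\mathbb{R}^n$ is a vector valued function of the form
\[
\mathbf{y}(t) = \sum_{k=0}^{n} \binom{n}{k} \mathbf{x}_k (1 - t)^{n-k} t^k
\]
where here the $\binom{n}{k}$ are the binomial coefficients and $\mathbf{x}_k$ are n+1 points in $\mathbb{R}^n$. Show
that $\mathbf{y}(0) = \mathbf{x}_0$, $\mathbf{y}(1) = \mathbf{x}_n$, and find $\mathbf{y}'(0)$ and $\mathbf{y}'(1)$. Recall that $\binom{n}{0} = \binom{n}{n} = 1$
and $\binom{n}{n-1} = \binom{n}{1} = n$. Curves of this sort are important in various computer
programs.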
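Since the exercise mentions computer programs, here is a minimal sketch of how such a curve could be evaluated directly from the defining sum. The control points are made up for illustration, and `bezier` and `forward_difference` are hypothetical helper names. The finite difference is only a way to compare against whatever formula for $\mathbf{y}'(0)$ you derive by hand.

```python
from math import comb


def bezier(points, t):
    """Evaluate y(t) = sum_k C(n, k) x_k (1 - t)^(n - k) t^k."""
    n = len(points) - 1
    dim = len(points[0])
    result = [0.0] * dim
    for k, point in enumerate(points):
        weight = comb(n, k) * (1 - t) ** (n - k) * t ** k
        for d in range(dim):
            result[d] += weight * point[d]
    return tuple(result)


def forward_difference(points, t, h=1e-6):
    """Crude approximation of y'(t) using (y(t + h) - y(t)) / h."""
    y0, y1 = bezier(points, t), bezier(points, t + h)
    return tuple((b - a) / h for a, b in zip(y0, y1))


# Four made-up control points in the plane (so n = 3).
control = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
print(bezier(control, 0.0))   # the first control point
print(bezier(control, 1.0))   # the last control point
print(forward_difference(control, 0.0))
```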
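16. Suppose $\mathbf{r}(t)$, $\mathbf{s}(t)$, and $\mathbf{p}(t)$ are three differentiable functions of t which have
values in $\mathbb{R}^3$. Find a formula for $\left(\mathbf{r}(t) \times \mathbf{s}(t) \cdot \mathbf{p}(t)\right)'$.
17. If $\mathbf{r}'(t) = \mathbf{0}$ for all $t \in (a, b)$, show there exists a constant vector, $\mathbf{c}$ such that
$\mathbf{r}(t) = \mathbf{c}$ for all $t \in (a, b)$.
18. If $\mathbf{F}'(t) = \mathbf{f}(t)$ for all $t \in (a, b)$ and $\mathbf{F}$ is continuous on $[a, b]$, show $\int_a^b \mathbf{f}(t)\,dt = \mathbf{F}(b) - \mathbf{F}(a)$.
19. Verify that if $\mathbf{\Omega} \times \mathbf{u} = \mathbf{0}$ for all $\mathbf{u}$, then $\mathbf{\Omega} = \mathbf{0}$.
20. Verify that if $\mathbf{u} \neq \mathbf{0}$ and $\mathbf{v} \cdot \mathbf{u} = 0$ and both $\mathbf{\Omega}$ and $\mathbf{\Omega}_1$ satisfy $\mathbf{\Omega} \times \mathbf{u} = \mathbf{v}$, then
$\mathbf{\Omega}_1 = \mathbf{\Omega}$.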
8.6
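(a) $\lim_{x \to 0+} \left( \frac{x}{|x|},\ \frac{\sin 2x}{x},\ \frac{\tan x}{x} \right) = (1, 2, 1)$
(b) $\lim_{x \to 0+} \left( \frac{x}{|x|},\ \cos x,\ e^{2x} \right) = (1, 1, 1)$
(c) $\lim_{x \to 4} \left( \frac{x^2 - 16}{x + 4},\ x - 7,\ \frac{\tan 7x}{5x} \right) = \left( 0,\ -3,\ \frac{\tan 28}{20} \right)$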
2
3
2. Let r (t) = 4 + (t 1) , t2 + 1 (t 1) , (t1)
describe the position of an ob5
t
ject in R3 as a function of t where t is measured in seconds and r (t) is measured
in meters. Is the velocity of this object ever equal to zero? If so, find the value of
t at which this occurs and the point in R3 at which the velocity is zero.
2 4t2 t+3
2 2t5
You need to differentiate this. r0 (t) = 2 (t 1) , (t 1)
.
,
(t
1)
t6
2
(t +1)
3. Let $\mathbf{r}(t) = \left( \sin t,\ t^2,\ 2t + 1 \right)$ for $t \in [0, 4]$. Find a tangent line to the curve parameterized by $\mathbf{r}$ at the point $\mathbf{r}(2)$.
$\mathbf{r}'(t) = (\cos t, 2t, 2)$. When t = 2, the point on the curve is $(\sin 2, 4, 5)$. A direction
vector is $\mathbf{r}'(2)$ and so a tangent line is $\mathbf{r}(t) = (\sin 2, 4, 5) + t(\cos 2, 4, 2)$.
4. Let $\mathbf{r}(t) = \left( \sin t,\ \cos(t^2),\ t + 1 \right)$ for $t \in [0, 5]$. Find the velocity when t = 3.
$\mathbf{r}'(t) = \left( \cos t,\ -2t \sin(t^2),\ 1 \right)$. The velocity when t = 3 is just
$\mathbf{r}'(3) = (\cos 3,\ -6 \sin(9),\ 1)$.
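5. Prove 8.5 directly from the definition of the derivative without considering components.
The formula for the derivative of a cross product can be obtained in the usual way
using rules of the cross product.
\[
\begin{aligned}
\frac{\mathbf{u}(t + h) \times \mathbf{v}(t + h) - \mathbf{u}(t) \times \mathbf{v}(t)}{h}
&= \frac{\mathbf{u}(t + h) \times \mathbf{v}(t + h) - \mathbf{u}(t + h) \times \mathbf{v}(t)}{h}
+ \frac{\mathbf{u}(t + h) \times \mathbf{v}(t) - \mathbf{u}(t) \times \mathbf{v}(t)}{h} \\
&= \mathbf{u}(t + h) \times \left( \frac{\mathbf{v}(t + h) - \mathbf{v}(t)}{h} \right)
+ \left( \frac{\mathbf{u}(t + h) - \mathbf{u}(t)}{h} \right) \times \mathbf{v}(t)
\end{aligned}
\]
Doesn't this remind you of the proof of the product rule? Now proceed in the usual
way. If you want to really understand this, you should consider why $\mathbf{u}, \mathbf{v} \to \mathbf{u} \times \mathbf{v}$
is a continuous map. This follows from the geometric description of the cross
product or more easily from the coordinate description.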
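6. Suppose $\mathbf{r}(t)$, $\mathbf{s}(t)$, and $\mathbf{p}(t)$ are three differentiable functions of t which have
values in $\mathbb{R}^3$. Find a formula for $\left(\mathbf{r}(t) \times \mathbf{s}(t) \cdot \mathbf{p}(t)\right)'$.
From the product rules for the cross and dot product, this equals
\[
\mathbf{r}'(t) \times \mathbf{s}(t) \cdot \mathbf{p}(t) + \mathbf{r}(t) \times \mathbf{s}'(t) \cdot \mathbf{p}(t) + \mathbf{r}(t) \times \mathbf{s}(t) \cdot \mathbf{p}'(t).
\]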
7. If $\mathbf{r}'(t) = \mathbf{0}$ for all $t \in (a, b)$, show there exists a constant vector, $\mathbf{c}$ such that
$\mathbf{r}(t) = \mathbf{c}$ for all $t \in (a, b)$.
Do this by applying standard one variable calculus to the components of
$\mathbf{r}(t)$.
8. If $\mathbf{F}'(t) = \mathbf{f}(t)$ for all $t \in (a, b)$ and $\mathbf{F}$ is continuous on $[a, b]$, show $\int_a^b \mathbf{f}(t)\,dt = \mathbf{F}(b) - \mathbf{F}(a)$.
Do this by applying standard one variable calculus to the components of
$\mathbf{F}(t)$.
9. Verify that if $\mathbf{\Omega} \times \mathbf{u} = \mathbf{0}$ for all $\mathbf{u}$, then $\mathbf{\Omega} = \mathbf{0}$.
Geometrically this says that if $\mathbf{\Omega}$ is not equal to zero then it is parallel to every
vector. Why does this make it obvious that $\mathbf{\Omega}$ must equal zero?
8.7
Definition 8.7.1 Let $\mathbf{r}(t)$ denote the position of an object. Then the acceleration of
the object is defined to be $\mathbf{r}''(t)$.
Newton's first law is: Every body persists in its state of rest or of uniform motion
in a straight line unless it is compelled to change that state by forces impressed on it.
Newton's second law is:
\[
\mathbf{F} = m\mathbf{a} \tag{8.9}
\]
where $\mathbf{a}$ is the acceleration and m is the mass of the object.
Newton's third law states: To every action there is always opposed an equal reaction; or, the mutual actions of two bodies upon each other are always equal, and
directed to contrary parts.
Of these laws, only the second two are independent of each other, the first law being
implied by the second. The third law says roughly that if you apply a force to something,
the thing applies the same force back.
The second law is the one of most interest. Note that the statement of this law
depends on the concept of the derivative because the acceleration is defined as a derivative. Newton used calculus and these laws to solve profound problems involving the
motion of the planets and other problems in mechanics. The next example involves the
concept that if you know the force along with the initial velocity and initial position,
then you can determine the position.
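Example 8.7.2 Let $\mathbf{r}(t)$ denote the position of an object of mass 2 kilograms at time t
and suppose the force acting on the object is given by $\mathbf{F}(t) = \left( t,\ 1 - t^2,\ 2e^{-t} \right)$. Suppose
$\mathbf{r}(0) = (1, 0, 1)$ meters, and $\mathbf{r}'(0) = (0, 1, 1)$ meters/sec. Find $\mathbf{r}(t)$.
By Newton's second law, $2\mathbf{r}''(t) = \mathbf{F}(t)$ and so $\mathbf{r}''(t) = \left( t/2,\ (1 - t^2)/2,\ e^{-t} \right)$. Therefore the velocity is given by
\[
\mathbf{r}'(t) = \left( \frac{t^2}{4},\ \frac{t - t^3/3}{2},\ -e^{-t} \right) + \mathbf{c}
\]
where $\mathbf{c}$ is a constant vector which must be determined from the initial condition given
for the velocity. Thus letting $\mathbf{c} = (c_1, c_2, c_3)$,
\[
(0, 1, 1) = (0, 0, -1) + (c_1, c_2, c_3)
\]
which requires $c_1 = 0$, $c_2 = 1$, and $c_3 = 2$. Therefore, the velocity is found.
\[
\mathbf{r}'(t) = \left( \frac{t^2}{4},\ \frac{t - t^3/3}{2} + 1,\ -e^{-t} + 2 \right).
\]
Now from this, the displacement must equal
\[
\mathbf{r}(t) = \left( \frac{t^3}{12},\ \frac{t^2/2 - t^4/12}{2} + t,\ e^{-t} + 2t \right) + (C_1, C_2, C_3)
\]
where the constant vector, $(C_1, C_2, C_3)$ must be determined from the initial condition
for the displacement. Thus
\[
\mathbf{r}(0) = (1, 0, 1) = (0, 0, 1) + (C_1, C_2, C_3)
\]
which means $C_1 = 1$, $C_2 = 0$, and $C_3 = 0$. Therefore, the displacement has also been
found.
\[
\mathbf{r}(t) = \left( \frac{t^3}{12} + 1,\ \frac{t^2/2 - t^4/12}{2} + t,\ e^{-t} + 2t \right) \text{ meters.}
\]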
Actually, in applications of this sort of thing acceleration does not usually come to you
as a nice given function written in terms of simple functions you understand. Rather,
it comes as measurements taken by instruments and the position is continuously being
updated based on this information. Another situation which often occurs is the case
when the forces on the object depend not just on time but also on the position or
velocity of the object.
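To illustrate the remark about updating the position from acceleration data, here is a rough sketch which steps the position and velocity forward in small time increments, using only sampled values of the force. It assumes the data of Example 8.7.2 (mass 2, force $\left( t, 1 - t^2, 2e^{-t} \right)$, the stated initial conditions) and compares the result with the closed form found there. The step rule and the names are illustrative choices, not something taken from the notes.

```python
import math

MASS = 2.0  # kilograms, as in Example 8.7.2


def force(t):
    """Force from Example 8.7.2, sampled at time t."""
    return (t, 1.0 - t * t, 2.0 * math.exp(-t))


def closed_form_position(t):
    """Position obtained in Example 8.7.2 by integrating twice."""
    return (t ** 3 / 12 + 1,
            (t ** 2 / 2 - t ** 4 / 12) / 2 + t,
            math.exp(-t) + 2 * t)


def integrate_position(t_end, steps=20000):
    """Step r and r' forward from t = 0 using sampled accelerations."""
    h = t_end / steps
    r = [1.0, 0.0, 1.0]   # r(0)
    v = [0.0, 1.0, 1.0]   # r'(0)
    t = 0.0
    for _ in range(steps):
        a0 = [f / MASS for f in force(t)]
        a1 = [f / MASS for f in force(t + h)]
        for i in range(3):
            # position uses the current velocity and acceleration
            r[i] += h * v[i] + 0.5 * h * h * a0[i]
            # velocity uses the average of the two sampled accelerations
            v[i] += 0.5 * h * (a0[i] + a1[i])
        t += h
    return tuple(r)


print(integrate_position(2.0))
print(closed_form_position(2.0))
```

The two printed vectors should agree to several decimal places. With measured rather than given accelerations, only the stepping loop would be available.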
Example 8.7.3 An artillery piece is fired at ground level on a level plain. The angle
of elevation is $\pi/6$ radians and the speed of the shell is 400 meters per second. How far
does the shell fly before hitting the ground?
Neglect air resistance in this problem. Also let the direction of flight be along the
positive x axis. Thus the initial velocity is the vector, $400 \cos(\pi/6)\,\mathbf{i} + 400 \sin(\pi/6)\,\mathbf{j}$
while the only force experienced by the shell after leaving the artillery piece is the force
of gravity, $-mg\mathbf{j}$ where m is the mass of the shell. The acceleration of gravity equals
9.8 meters per sec$^2$ and so the following needs to be solved.
\[
m\mathbf{r}''(t) = -mg\mathbf{j}, \quad \mathbf{r}(0) = (0, 0), \quad \mathbf{r}'(0) = 400 \cos(\pi/6)\,\mathbf{i} + 400 \sin(\pi/6)\,\mathbf{j}.
\]
Denoting $\mathbf{r}(t)$ as $(x(t), y(t))$,
\[
x''(t) = 0, \quad y''(t) = -g.
\]
Therefore, $y'(t) = -gt + C$ and from the information on the initial velocity,
\[
C = 400 \sin(\pi/6) = 200.
\]
Thus
\[
y(t) = -4.9t^2 + 200t + D.
\]
D = 0 because the artillery piece is fired at ground level which requires both x and y
to equal zero at this time. Similarly, $x'(t) = 400 \cos(\pi/6)$ so $x(t) = 400 \cos(\pi/6)\,t = 200\sqrt{3}\,t$.
The shell hits the ground when y = 0 and this occurs when $-4.9t^2 + 200t = 0$.
Thus t = 40.8163265306 seconds and so at this time, $x = 200\sqrt{3}\,(40.8163265306) \approx 14139.19$ meters.
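The arithmetic in this example is easy to reproduce. The following sketch assumes the same model as above (no air resistance, level ground, $g = 9.8$ meters per sec$^2$). The function names are made up for illustration.

```python
import math


def flight_time(speed, elevation, g=9.8):
    """Positive time at which y(t) = -(g/2) t^2 + speed*sin(elevation)*t is zero."""
    return 2.0 * speed * math.sin(elevation) / g


def horizontal_range(speed, elevation, g=9.8):
    """Horizontal distance x(t) = speed*cos(elevation)*t at the landing time."""
    return speed * math.cos(elevation) * flight_time(speed, elevation, g)


print(flight_time(400.0, math.pi / 6))       # seconds in the air
print(horizontal_range(400.0, math.pi / 6))  # meters travelled
```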
(8.10)
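\[
e^{(k/m)t}\,\mathbf{r}' = \frac{1}{k} e^{\frac{k}{m}t}\,\mathbf{c} + \mathbf{C}
\]
and using the initial condition, $\mathbf{v} = \mathbf{c}/k + \mathbf{C}$ and so
\[
\mathbf{r}'(t) = (\mathbf{c}/k) + (\mathbf{v} - (\mathbf{c}/k))\, e^{-\frac{k}{m}t}.
\]
Integrating again,
\[
\mathbf{r}(t) = (\mathbf{c}/k)\,t - \frac{1}{k}\, m e^{-\frac{k}{m}t} \left( \mathbf{v} - \frac{\mathbf{c}}{k} \right) + \mathbf{D}
\]
where $\mathbf{D}$ is a constant to be determined from the initial conditions. Thus
\[
\mathbf{u} = -\frac{m}{k} \left( \mathbf{v} - \frac{\mathbf{c}}{k} \right) + \mathbf{D}
\]
and so
\[
\mathbf{r}(t) = (\mathbf{c}/k)\,t - \frac{1}{k}\, m e^{-\frac{k}{m}t} \left( \mathbf{v} - \frac{\mathbf{c}}{k} \right) + \left( \mathbf{u} + \frac{m}{k} \left( \mathbf{v} - \frac{\mathbf{c}}{k} \right) \right). \tag{8.11}
\]
Now apply this to the system 8.10 to find
\[
r_1(t) = -\frac{1}{(.1)}\, 2 \exp\left( \frac{-(.1)}{2} t \right) (880) + \frac{2}{(.1)} (880)
\]
and
\[
r_2(t) = \frac{-64}{(.1)}\, t - \frac{1}{(.1)}\, 2 \exp\left( \frac{-(.1)}{2} t \right) \left( \frac{64}{(.1)} \right) + 30000 + \frac{2}{(.1)} \left( \frac{64}{(.1)} \right).
\]
That is, $r_1(t) = -17600.0 \exp(-.05t) + 17600.0$ and $r_2(t) = -640.0t - 12800.0 \exp(-.05t) + 42800.0$. Differentiating, the velocity is
\[
r_1'(t) = 880.0 \exp(-.05t), \quad r_2'(t) = -640.0 + 640.0 \exp(-.05t). \tag{8.12}
\]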
To determine the velocity when the blue ice hits the ground, it is necessary to find the
value of t when this event takes place and then to use 8.12 to determine the velocity. It
hits ground when $r_2(t) = 0$. Thus it suffices to solve the equation,
\[
0 = -640.0t - 12800.0 \exp(-.05t) + 42800.0.
\]
This is a fairly hard equation to solve using the methods of algebra. In fact, I do not
have a good way to find this value of t using algebra. However, plugging in various
values of t using a calculator, you eventually find that when t = 66.14,
\[
-640.0(66.14) - 12800.0 \exp(-.05(66.14)) + 42800.0 = 1.588 \text{ feet.}
\]
This is close enough to hitting the ground and so plugging in this value for t yields the
approximate velocity,
\[
(880.0 \exp(-.05(66.14)),\ -640.0 + 640.0 \exp(-.05(66.14))) = (32.23,\ -616.56).
\]
Notice how because of air resistance the component of velocity in the horizontal direction
is only about 32 feet per second even though this component started out at 880 feet per
second while the component in the vertical direction is -616 feet per second even though
this component started off at 0 feet per second. You see that air resistance can be very
important so it is not enough to pretend, as is often done in beginning physics courses
that everything takes place in a vacuum. Actually, this problem used several physical
simplifications. It was assumed the force acting on the lump of blue ice by gravity was
constant. This is not really true because it actually depends on the distance between
the center of mass of the earth and the center of mass of the lump. It was also assumed
the air resistance is proportional to the velocity. This is an over simplification when
high speeds are involved. However, increasingly correct models can be studied in a
systematic way as above.
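The trial and error search for the landing time can be automated. The sketch below applies plain bisection to the height equation $0 = -640.0t - 12800.0 \exp(-.05t) + 42800.0$ and then evaluates the velocity components $880.0 \exp(-.05t)$ and $-640.0 + 640.0 \exp(-.05t)$ from this example. Bisection is one reasonable choice among several. The bracket $[0, 100]$ and the function names are assumptions made for this illustration.

```python
import math


def height(t):
    """Vertical coordinate r_2(t) of the lump, in feet."""
    return -640.0 * t - 12800.0 * math.exp(-0.05 * t) + 42800.0


def velocity(t):
    """Velocity components (r_1'(t), r_2'(t)), in feet per second."""
    decay = math.exp(-0.05 * t)
    return (880.0 * decay, -640.0 + 640.0 * decay)


def bisect_root(f, lo, hi, tol=1e-10):
    """Find a zero of f in [lo, hi], assuming f changes sign there."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


t_hit = bisect_root(height, 0.0, 100.0)
print(t_hit)            # close to the value 66.14 found by hand
print(velocity(t_hit))  # close to (32.23, -616.56)
```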
8.7.1 Kinetic Energy
Newton's second law is also the basis for the notion of kinetic energy. When a force is
exerted on an object which causes the object to move, it follows that the force is doing
work which manifests itself in a change of velocity of the object. How is the total work
done on the object by the force related to the final velocity of the object? By Newton's
second law, and letting $\mathbf{v}$ be the velocity,
\[
\mathbf{F}(t) = m\mathbf{v}'(t).
\]
Now in a small increment of time, $(t, t + dt)$, the work done on the object would be
approximately equal to
\[
dW = \mathbf{F}(t) \cdot \mathbf{v}(t)\,dt. \tag{8.13}
\]
If no work has been done at time t = 0, then 8.13 implies
\[
\frac{dW}{dt} = \mathbf{F} \cdot \mathbf{v}, \quad W(0) = 0.
\]
Hence,
\[
\frac{dW}{dt} = m\mathbf{v}'(t) \cdot \mathbf{v}(t) = \frac{m}{2} \frac{d}{dt} |\mathbf{v}(t)|^2.
\]
Therefore, the total work done up to time t would be $W(t) = \frac{m}{2}|\mathbf{v}(t)|^2 - \frac{m}{2}|\mathbf{v}_0|^2$ where
$|\mathbf{v}_0|$ denotes the initial speed of the object. This difference represents the change in the
kinetic energy.
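As a sanity check on this relation, the sketch below integrates $\mathbf{F} \cdot \mathbf{v}$ numerically and compares the result with the change in $\frac{m}{2}|\mathbf{v}|^2$. It reuses, as an assumption, the force $\left( t, 1 - t^2, 2e^{-t} \right)$, mass 2 and velocity $\left( t^2/4, (t - t^3/3)/2 + 1, -e^{-t} + 2 \right)$ of Example 8.7.2. Simpson's rule is just one convenient quadrature. The function names are hypothetical.

```python
import math

MASS = 2.0


def force(t):
    return (t, 1.0 - t * t, 2.0 * math.exp(-t))


def velocity(t):
    return (t * t / 4.0, (t - t ** 3 / 3.0) / 2.0 + 1.0, -math.exp(-t) + 2.0)


def power(t):
    """The integrand F(t) . v(t) in the definition of work."""
    return sum(f * v for f, v in zip(force(t), velocity(t)))


def work(t_end, intervals=2000):
    """Composite Simpson's rule for the integral of power over [0, t_end]."""
    h = t_end / intervals          # intervals must be even
    total = power(0.0) + power(t_end)
    for k in range(1, intervals):
        total += (4 if k % 2 else 2) * power(k * h)
    return total * h / 3.0


def kinetic_energy(t):
    return 0.5 * MASS * sum(v * v for v in velocity(t))


print(work(2.0))
print(kinetic_energy(2.0) - kinetic_energy(0.0))
```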
8.7.2
Work and energy involve a force acting on an object for some distance. Impulse involves
a force which acts on an object for an interval of time.
Definition 8.7.5 Let $\mathbf{F}$ be a force which acts on an object during the time interval,
$[a, b]$. The impulse of this force is
\[
\int_a^b \mathbf{F}(t)\,dt.
\]
This is defined as
\[
\left( \int_a^b F_1(t)\,dt,\ \int_a^b F_2(t)\,dt,\ \int_a^b F_3(t)\,dt \right).
\]
Proof: This is really just the fundamental theorem of calculus and Newton's second
law applied to the components of $\mathbf{F}$.
\[
\int_a^b \mathbf{F}(t)\,dt = \int_a^b m \frac{d\mathbf{v}}{dt}\,dt = m\mathbf{v}(b) - m\mathbf{v}(a) \tag{8.14}
\]
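Now suppose two point masses, A and B collide. Newton's third law says the force
exerted by mass A on mass B is equal in magnitude but opposite in direction to the
force exerted by mass B on mass A. Letting the collision take place in the time interval,
$[a, b]$ and denoting the two masses by $m_A$ and $m_B$ and their velocities by $\mathbf{v}_A$ and $\mathbf{v}_B$ it
follows that
\[
m_A \mathbf{v}_A(b) - m_A \mathbf{v}_A(a) = \int_a^b (\text{Force of B on A})\,dt
\]
and
\[
m_B \mathbf{v}_B(b) - m_B \mathbf{v}_B(a) = \int_a^b (\text{Force of A on B})\,dt
= -\int_a^b (\text{Force of B on A})\,dt
= -\left( m_A \mathbf{v}_A(b) - m_A \mathbf{v}_A(a) \right),
\]
and this shows $m_B \mathbf{v}_B(b) + m_A \mathbf{v}_A(b) = m_B \mathbf{v}_B(a) + m_A \mathbf{v}_A(a)$. In other words, the total momentum is the same before and after the collision.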
8.8
The idea is you have a coordinate system which is moving and this results in strange
forces experienced relative to these moving coordinate systems. A good example is
what we experience every day living on a rotating ball. Relative to our supposedly fixed
coordinate system, we experience forces which account for many phenomena which are
observed.
8.8.1
Imagine a point on the surface of the earth. Now consider unit vectors, one pointing
South, one pointing East and one pointing directly away from the center of the earth.
Denote the first as i, the second as j and the third as k. If you are standing on the
earth you will consider these vectors as fixed, but of course they are not. As the earth
turns, they change direction and so each is in reality a function of t. Nevertheless, it is
with respect to these apparently fixed vectors that you wish to understand acceleration,
velocities, and displacements.
In general, let i (t) , j (t) , k (t) be an orthonormal basis of vectors for each t, like the
vectors described in the first paragraph. It is assumed these vectors are C 1 functions of
t. Letting the positive x axis extend in the direction of i (t) , the positive y axis extend
in the direction of j (t), and the positive z axis extend in the direction of k (t) , yields
a moving coordinate system. By Theorem 8.4.2 on Page 163, there exists an angular
velocity vector, $\mathbf{\Omega}(t)$ such that if $\mathbf{u}(t)$ is any vector which has constant components
with respect to $\mathbf{i}(t)$, $\mathbf{j}(t)$, and $\mathbf{k}(t)$, then
\[
\mathbf{\Omega} \times \mathbf{u} = \mathbf{u}'. \tag{8.15}
\]
Now let $\mathbf{R}(t)$ be a position vector of the moving coordinate system and let
\[
\mathbf{r}(t) = \mathbf{R}(t) + \mathbf{r}_B(t)
\]
where
\[
\mathbf{r}_B(t) \equiv x(t)\mathbf{i}(t) + y(t)\mathbf{j}(t) + z(t)\mathbf{k}(t).
\]
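In the example of the earth, $\mathbf{R}(t)$ is the position vector of a point p(t) on the earth's
surface and $\mathbf{r}_B(t)$ is the position vector of another point from p(t), thus regarding p(t)
as the origin. $\mathbf{r}_B(t)$ is the position vector of a point as perceived by the observer on
the earth with respect to the vectors he thinks of as fixed. Similarly, $\mathbf{v}_B(t)$ and $\mathbf{a}_B(t)$
will be the velocity and acceleration relative to $\mathbf{i}(t), \mathbf{j}(t), \mathbf{k}(t)$, and so $\mathbf{v}_B = x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k}$
and $\mathbf{a}_B = x''\mathbf{i} + y''\mathbf{j} + z''\mathbf{k}$. Then
\[
\mathbf{v} \equiv \mathbf{r}' = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}'.
\]
By 8.15, if $\mathbf{e} \in \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$, $\mathbf{e}' = \mathbf{\Omega} \times \mathbf{e}$ because the components of these vectors with
respect to $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are constant. Therefore,
\[
x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}' = x\,\mathbf{\Omega} \times \mathbf{i} + y\,\mathbf{\Omega} \times \mathbf{j} + z\,\mathbf{\Omega} \times \mathbf{k}
= \mathbf{\Omega} \times (x\mathbf{i} + y\mathbf{j} + z\mathbf{k})
\]
and consequently,
\[
\mathbf{v} = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + \mathbf{\Omega} \times \mathbf{r}_B = \mathbf{R}' + x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k} + \mathbf{\Omega} \times (x\mathbf{i} + y\mathbf{j} + z\mathbf{k}).
\]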
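Now consider the acceleration. Quantities which are relative to the moving coordinate system are distinguished by using the subscript, B.
\[
\begin{aligned}
\mathbf{a} = \mathbf{v}' &= \mathbf{R}'' + x''\mathbf{i} + y''\mathbf{j} + z''\mathbf{k}
+ \overbrace{x'\mathbf{i}' + y'\mathbf{j}' + z'\mathbf{k}'}^{\mathbf{\Omega} \times \mathbf{v}_B}
+ \mathbf{\Omega}' \times \mathbf{r}_B
+ \mathbf{\Omega} \times \Big( \overbrace{x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k}}^{\mathbf{v}_B}
+ \overbrace{x\mathbf{i}' + y\mathbf{j}' + z\mathbf{k}'}^{\mathbf{\Omega} \times \mathbf{r}_B(t)} \Big) \\
&= \mathbf{R}'' + \mathbf{a}_B + \mathbf{\Omega}' \times \mathbf{r}_B + 2\mathbf{\Omega} \times \mathbf{v}_B + \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B).
\end{aligned}
\]
The acceleration $\mathbf{a}_B$ is that perceived by an observer for whom the moving coordinate
system is fixed. The term $\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B)$ is called the centripetal acceleration. Solving
for $\mathbf{a}_B$,
\[
\mathbf{a}_B = \mathbf{a} - \mathbf{R}'' - \mathbf{\Omega}' \times \mathbf{r}_B - 2\mathbf{\Omega} \times \mathbf{v}_B - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B). \tag{8.16}
\]
Here the term $-(\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B))$ is called the centrifugal acceleration, it being an acceleration felt by the observer relative to the moving coordinate system which he regards
as fixed, and the term $-2\mathbf{\Omega} \times \mathbf{v}_B$ is called the Coriolis acceleration, an acceleration
experienced by the observer as he moves relative to the moving coordinate system. The
mass multiplied by the Coriolis acceleration defines the Coriolis force.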
There is a ride found in some amusement parks in which the victims stand next to
a circular wall covered with a carpet or some rough material. Then the whole circular
room begins to revolve faster and faster. At some point, the bottom drops out and the
victims are held in place by friction. The force they feel which keeps them stuck to the
wall is called centrifugal force and it causes centrifugal acceleration. It is not necessary
to move relative to coordinates fixed with the revolving wall in order to feel this force
and it is pretty predictable. However, if the nauseated victim moves relative to the
rotating wall, he will feel the effects of the Coriolis force and this force is really strange.
The difference between these forces is that the Coriolis force is caused by movement
relative to the moving coordinate system and the centrifugal force is not.
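The decomposition of the acceleration into Coriolis and centripetal parts can be tested numerically in a simple situation. The sketch below is an illustration under assumptions of my own choosing: the frame spins about $\mathbf{k}$ at a constant made-up rate `SPIN`, its origin stays put ($\mathbf{R} = \mathbf{0}$), and a point moves outward along $\mathbf{i}(t)$ with $x(t) = t$, $y = z = 0$, so that $\mathbf{a}_B = \mathbf{0}$ and $\mathbf{\Omega}' = \mathbf{0}$. The acceleration seen from fixed space is estimated by second differences and compared with $2\mathbf{\Omega} \times \mathbf{v}_B + \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B)$.

```python
import math

SPIN = 0.5  # hypothetical constant angular speed about k


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def frame_i(t):
    """The moving basis vector i(t), written in fixed coordinates."""
    return (math.cos(SPIN * t), math.sin(SPIN * t), 0.0)


def fixed_position(t):
    """r(t) = x(t) i(t) with x(t) = t, and R(t) = 0."""
    return tuple(t * c for c in frame_i(t))


def acceleration_by_differences(t, h=1e-4):
    """Second central difference approximation of r''(t)."""
    rp, r0, rm = fixed_position(t + h), fixed_position(t), fixed_position(t - h)
    return tuple((rp[k] - 2.0 * r0[k] + rm[k]) / h ** 2 for k in range(3))


def acceleration_by_formula(t):
    """2 Omega x v_B + Omega x (Omega x r_B); the other terms vanish here."""
    omega = (0.0, 0.0, SPIN)
    r_b = fixed_position(t)   # x i(t)
    v_b = frame_i(t)          # x' i(t) with x' = 1
    coriolis = tuple(2.0 * c for c in cross(omega, v_b))
    centripetal = cross(omega, cross(omega, r_b))
    return tuple(coriolis[k] + centripetal[k] for k in range(3))


print(acceleration_by_differences(2.0))
print(acceleration_by_formula(2.0))
```

The two printed vectors should agree up to the error of the difference quotient.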
8.8.2
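Now consider the earth. Let $\mathbf{i}^*, \mathbf{j}^*, \mathbf{k}^*$ be the usual basis vectors attached to the rotating
earth. Thus $\mathbf{k}^*$ is fixed in space with $\mathbf{k}^*$ pointing in the direction of the north pole from
the center of the earth while $\mathbf{i}^*$ and $\mathbf{j}^*$ point to fixed points on the surface of the earth.
Thus $\mathbf{i}^*$ and $\mathbf{j}^*$ depend on t while $\mathbf{k}^*$ does not. Let $\mathbf{i}, \mathbf{j}, \mathbf{k}$ be the unit vectors described
earlier with $\mathbf{i}$ pointing South, $\mathbf{j}$ pointing East, and $\mathbf{k}$ pointing away from the center of
the earth at some point of the rotating earth's surface, p. Letting $\mathbf{R}(t)$ be the position
vector of the point p, from the center of the earth, observe the coordinates of $\mathbf{R}(t)$
are constant with respect to $\mathbf{i}(t), \mathbf{j}(t), \mathbf{k}(t)$. Also, since the earth rotates from West to
East and the speed of a point on the surface of the earth relative to an observer fixed
in space is $\omega |\mathbf{R}| \sin \phi$ where $\omega$ is the angular speed of the earth about an axis through
the poles and $\phi$ is the angle between $\mathbf{k}^*$ and $\mathbf{R}$, it follows from the geometric definition of the cross product that
\[
\mathbf{R}' = \omega \mathbf{k}^* \times \mathbf{R}.
\]
Therefore, $\mathbf{\Omega} = \omega \mathbf{k}^*$ and so
\[
\mathbf{R}'' = \overbrace{\mathbf{\Omega}' \times \mathbf{R}}^{= \mathbf{0}} + \mathbf{\Omega} \times \mathbf{R}' = \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})
\]
since $\mathbf{\Omega}$ does not depend on t. Formula 8.16 implies
\[
\mathbf{a}_B = \mathbf{a} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) - 2\mathbf{\Omega} \times \mathbf{v}_B - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B). \tag{8.17}
\]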
In this formula, you can totally ignore the term $\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{r}_B)$ because it is so small
whenever you are considering motion near some point on the earth's surface. To see
this, note $\omega\,(24)(3600) = 2\pi$, where $(24)(3600)$ is the number of seconds in a day, and so $\omega = 7.2722 \times 10^{-5}$ in radians per second. If you
are using seconds to measure time and feet to measure distance, this term is therefore
no larger than
\[
\left( 7.2722 \times 10^{-5} \right)^2 |\mathbf{r}_B|.
\]
Clearly this is not worth considering in the presence of the acceleration due to gravity
which is approximately 32 feet per second squared near the surface of the earth.
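A quick computation makes the size comparison concrete. The sketch below takes a day to be $(24)(3600)$ seconds, as in the text. The sample distance of one mile (5280 feet) is an arbitrary value chosen only for illustration.

```python
import math

SECONDS_PER_DAY = 24 * 3600

# Angular speed of the earth in radians per second.
omega = 2.0 * math.pi / SECONDS_PER_DAY


def centripetal_bound(distance_feet):
    """Upper bound omega^2 |r_B| on the neglected term, in feet per sec^2."""
    return omega ** 2 * distance_feet


print(omega)                      # about 7.2722e-05
print(centripetal_bound(5280.0))  # tiny compared with 32 feet per sec^2
```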
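If the acceleration $\mathbf{a}$ is due to gravity, then
\[
\mathbf{a}_B = \mathbf{a} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) - 2\mathbf{\Omega} \times \mathbf{v}_B
= \overbrace{-\frac{GM(\mathbf{R} + \mathbf{r}_B)}{|\mathbf{R} + \mathbf{r}_B|^3} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})}^{\equiv\, \mathbf{g}} - 2\mathbf{\Omega} \times \mathbf{v}_B
\equiv \mathbf{g} - 2\mathbf{\Omega} \times \mathbf{v}_B.
\]
Note that
\[
\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R}) = (\mathbf{\Omega} \cdot \mathbf{R})\mathbf{\Omega} - |\mathbf{\Omega}|^2 \mathbf{R}
\]
and so $\mathbf{g}$, the acceleration relative to the moving coordinate system on the earth is
not directed exactly toward the center of the earth except at the poles and at the
equator, although the components of acceleration which are in other directions are
very small when compared with the acceleration due to the force of gravity and are
often neglected. Therefore, if the only force acting on an object is due to gravity, the
following formula describes the acceleration relative to a coordinate system moving with
the earth's surface.
\[
\mathbf{a}_B = \mathbf{g} - 2(\mathbf{\Omega} \times \mathbf{v}_B)
\]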
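While the vector, $\mathbf{\Omega}$ is quite small, if the relative velocity, $\mathbf{v}_B$ is large, the Coriolis
acceleration could be significant. This is described in terms of the vectors $\mathbf{i}(t), \mathbf{j}(t), \mathbf{k}(t)$
next.
Letting $(\rho, \theta, \phi)$ be the usual spherical coordinates of the point p(t) on the surface
taken with respect to $\mathbf{i}^*, \mathbf{j}^*, \mathbf{k}^*$ the usual way with $\phi$ the polar angle, it follows the
$\mathbf{i}^*, \mathbf{j}^*, \mathbf{k}^*$ coordinates of this point are
\[
\begin{pmatrix} \rho \sin(\phi) \cos(\theta) \\ \rho \sin(\phi) \sin(\theta) \\ \rho \cos(\phi) \end{pmatrix}.
\]
It follows,
\[
\mathbf{i} = \cos(\phi)\cos(\theta)\,\mathbf{i}^* + \cos(\phi)\sin(\theta)\,\mathbf{j}^* - \sin(\phi)\,\mathbf{k}^*, \quad
\mathbf{j} = -\sin(\theta)\,\mathbf{i}^* + \cos(\theta)\,\mathbf{j}^*,
\]
and
\[
\mathbf{k} = \sin(\phi)\cos(\theta)\,\mathbf{i}^* + \sin(\phi)\sin(\theta)\,\mathbf{j}^* + \cos(\phi)\,\mathbf{k}^*.
\]
To write $\mathbf{k}^*$ in terms of $\mathbf{i}, \mathbf{j}, \mathbf{k}$, solve for a, b, c in $\mathbf{k}^* = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$, which says
\[
\overbrace{\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}}^{\mathbf{k}^*} =
\begin{pmatrix}
\cos(\phi)\cos(\theta) & -\sin(\theta) & \sin(\phi)\cos(\theta) \\
\cos(\phi)\sin(\theta) & \cos(\theta) & \sin(\phi)\sin(\theta) \\
-\sin(\phi) & 0 & \cos(\phi)
\end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}. \tag{8.18}
\]
The first column is $\mathbf{i}$, the second is $\mathbf{j}$ and the third is $\mathbf{k}$ in the above matrix. The solution
is $a = -\sin(\phi)$, $b = 0$, and $c = \cos(\phi)$.
Now the Coriolis acceleration on the earth equals
\[
2(\mathbf{\Omega} \times \mathbf{v}_B) = 2\omega\, \overbrace{\left( -\sin(\phi)\,\mathbf{i} + 0\,\mathbf{j} + \cos(\phi)\,\mathbf{k} \right)}^{\mathbf{k}^*} \times (x'\mathbf{i} + y'\mathbf{j} + z'\mathbf{k})
= 2\omega \left[ (-y' \cos\phi)\,\mathbf{i} + (x' \cos\phi + z' \sin\phi)\,\mathbf{j} - (y' \sin\phi)\,\mathbf{k} \right]. \tag{8.19}
\]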
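Remember $\phi$ is fixed and pertains to the fixed point, p(t) on the earth's surface. Therefore, if the acceleration, $\mathbf{a}$ is due to gravity,
\[
\mathbf{a}_B = \mathbf{g} - 2\omega \left[ (-y' \cos\phi)\,\mathbf{i} + (x' \cos\phi + z' \sin\phi)\,\mathbf{j} - (y' \sin\phi)\,\mathbf{k} \right]
\]
where $\mathbf{g} = -\frac{GM(\mathbf{R} + \mathbf{r}_B)}{|\mathbf{R} + \mathbf{r}_B|^3} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})$ as explained above. The term $\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})$ is
pretty small and so it will be neglected. However, the Coriolis force will not be neglected.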
Example 8.8.1 Suppose a rock is dropped from a tall building. Where will it strike?
Assume $\mathbf{a} = -g\mathbf{k}$ and the $\mathbf{j}$ component of $\mathbf{a}_B$ is approximately
\[
-2\omega (x' \cos\phi + z' \sin\phi).
\]
The dominant term in this expression is clearly the second one because $x'$ will be small.
Also, the $\mathbf{i}$ and $\mathbf{k}$ contributions will be very small. Therefore, the following equation is
descriptive of the situation.
\[
\mathbf{a}_B = -g\mathbf{k} - 2z'\omega \sin\phi\,\mathbf{j}.
\]
Since $z' = -gt$ approximately, the $\mathbf{j}$ component of $\mathbf{a}_B$ is $2\omega g t \sin\phi$.
Two integrations give $\left( \omega g t^3 / 3 \right) \sin\phi$ for the $\mathbf{j}$ component of the relative displacement
at time t.
This shows the rock does not fall directly towards the center of the earth as expected
but slightly to the east.
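To get a feel for the size of this effect, the following sketch evaluates the eastward displacement $\left( \omega g t^3 / 3 \right) \sin\phi$ for a made-up drop. The height of 400 feet and the polar angle $\phi = \pi/4$ are assumptions for illustration only. The value $g = 32$ feet per second squared and the 24 hour day are the ones used in this section.

```python
import math

OMEGA = 2.0 * math.pi / (24 * 3600)  # radians per second


def fall_time(height_feet, g=32.0):
    """Time to fall from rest through height_feet, ignoring air resistance."""
    return math.sqrt(2.0 * height_feet / g)


def eastward_deflection(t, phi, g=32.0, omega=OMEGA):
    """The j component (omega g t^3 / 3) sin(phi) of the displacement, in feet."""
    return omega * g * t ** 3 * math.sin(phi) / 3.0


t_fall = fall_time(400.0)                     # 5 seconds
drift = eastward_deflection(t_fall, math.pi / 4)
print(t_fall, drift, 12.0 * drift)            # seconds, feet, inches
```

For these numbers the drift is under an inch, which is why the effect goes unnoticed in everyday experience.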
Example 8.8.2 In 1851 Foucault set a pendulum vibrating and observed the earth rotate
out from under it. It was a very long pendulum with a heavy weight at the end so that
it would vibrate for a long time without stopping3 . This is what allowed him to observe
the earth rotate out from under it. Clearly such a pendulum will take 24 hours for the
plane of vibration to appear to make one complete revolution at the north pole. It is also
reasonable to expect that no such observed rotation would take place on the equator. Is
it possible to predict what will take place at various latitudes?
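Using 8.19, in 8.17,
\[
\mathbf{a}_B = \mathbf{a} - \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})
- 2\omega \left[ (-y' \cos\phi)\,\mathbf{i} + (x' \cos\phi + z' \sin\phi)\,\mathbf{j} - (y' \sin\phi)\,\mathbf{k} \right].
\]
Neglecting the small term, $\mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{R})$, this becomes
\[
= -g\mathbf{k} + \mathbf{T}/m - 2\omega \left[ (-y' \cos\phi)\,\mathbf{i} + (x' \cos\phi + z' \sin\phi)\,\mathbf{j} - (y' \sin\phi)\,\mathbf{k} \right]
\]
where $\mathbf{T}$, the tension in the string of the pendulum, is directed towards the point
at which the pendulum is supported, and m is the mass of the pendulum bob. The
pendulum can be thought of as the position vector from $(0, 0, l)$ to the surface of the
sphere $x^2 + y^2 + (z - l)^2 = l^2$. Therefore,
\[
\mathbf{T} = -T\frac{x}{l}\,\mathbf{i} - T\frac{y}{l}\,\mathbf{j} + T\frac{l - z}{l}\,\mathbf{k}
\]
and consequently, the differential equations of relative motion are
\[
x'' = -T\frac{x}{ml} + 2\omega y' \cos\phi,
\]
\[
y'' = -T\frac{y}{ml} - 2\omega (x' \cos\phi + z' \sin\phi)
\]
and
\[
z'' = T\frac{l - z}{ml} - g + 2\omega y' \sin\phi.
\]
If the vibrations of the pendulum are small so that for practical purposes, $z'' = z = 0$,
the last equation may be solved for T to get
\[
gm - 2\omega y' \sin(\phi)\, m = T.
\]
Therefore, the first two equations become
\[
x'' = -(gm - 2\omega m y' \sin\phi)\frac{x}{ml} + 2\omega y' \cos\phi
\]
and
\[
y'' = -(gm - 2\omega m y' \sin\phi)\frac{y}{ml} - 2\omega (x' \cos\phi + z' \sin\phi).
\]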
3 There is such a pendulum in the Eyring building at BYU and to keep people from touching it,
there is a little sign which says Warning! 1000 ohms.
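All terms of the form $xy'$ or $y'y$ can be neglected because it is assumed x and y remain
small. Also, the pendulum is assumed to be long with a heavy weight so that $x'$ and $y'$
are also small. With these simplifying assumptions, the equations of motion become
\[
x'' + g\frac{x}{l} = 2\omega y' \cos\phi
\]
and
\[
y'' + g\frac{y}{l} = -2\omega x' \cos\phi.
\]
These equations are of the form
\[
x'' + a^2 x = b y', \quad y'' + a^2 y = -b x' \tag{8.20}
\]
where $a^2 = \frac{g}{l}$ and $b = 2\omega \cos\phi$. Then it is fairly tedious but routine to verify that for
each constant, c,
\[
x = c \sin\left( \frac{bt}{2} \right) \sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right), \quad
y = c \cos\left( \frac{bt}{2} \right) \sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right) \tag{8.21}
\]
yields a solution to 8.20 along with the initial conditions,
\[
x(0) = 0, \quad y(0) = 0, \quad x'(0) = 0, \quad y'(0) = \frac{c\sqrt{b^2 + 4a^2}}{2}. \tag{8.22}
\]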
It is clear from experiments with the pendulum that the earth does indeed rotate out
from under it causing the plane of vibration of the pendulum to appear to rotate. The
purpose of this discussion is not to establish these self evident facts but to predict how
long it takes for the plane of vibration to make one revolution. Therefore, there will be
some instant in time at which the pendulum will be vibrating in a plane determined by
k and j. (Recall k points away from the center of the earth and j points East. ) At
this instant in time, defined as t = 0, the conditions of 8.22 will hold for some value
of c and so the solution to 8.20 having these initial conditions will be those of 8.21 by
uniqueness of the initial value problem. Writing these solutions differently,
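\[
\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}
= c \begin{pmatrix} \sin\left( \frac{bt}{2} \right) \\ \cos\left( \frac{bt}{2} \right) \end{pmatrix}
\sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right).
\]
This is very interesting! The vector, $c \begin{pmatrix} \sin\left( \frac{bt}{2} \right) \\ \cos\left( \frac{bt}{2} \right) \end{pmatrix}$ always has magnitude equal to $|c|$
but its direction changes very slowly because b is very small. The plane of vibration is
determined by this vector and the vector $\mathbf{k}$. The term $\sin\left( \frac{\sqrt{b^2 + 4a^2}}{2}\, t \right)$ changes relatively
fast and takes values between $-1$ and 1. This is what describes the actual observed
vibrations of the pendulum. Thus the plane of vibration will have made one complete
revolution when t = P for
\[
\frac{bP}{2} \equiv 2\pi.
\]
Therefore, the time it takes for the earth to turn out from under the pendulum is
\[
P = \frac{4\pi}{2\omega \cos\phi} = \frac{2\pi}{\omega} \sec\phi.
\]
Since $\omega$ is the angular speed of the rotating earth, it follows $\omega = \frac{2\pi}{24} = \frac{\pi}{12}$ in radians per
hour. Therefore, the above formula implies
\[
P = 24 \sec\phi.
\]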
I think this is really amazing. You could actually determine latitude, not by taking
readings with instruments using the North Star but by doing an experiment with a
big pendulum. You would set it vibrating, observe P in hours, and then solve the
above equation for $\phi$. Also note the pendulum would not appear to change its plane of
vibration at the equator because $\lim_{\phi \to \pi/2} \sec\phi = \infty$.
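The latitude experiment can be mimicked in a few lines. The sketch below assumes the period formula $P = 24 \sec\phi$ hours obtained above, with $\phi$ the polar angle measured from the north pole and restricted to $0 \leq \phi \leq \pi/2$. The function names and the sample values are illustrative.

```python
import math


def foucault_period_hours(phi):
    """P = 24 sec(phi) hours, for polar angle phi in [0, pi/2]."""
    cosine = math.cos(phi)
    if abs(cosine) < 1e-12:
        return math.inf  # at the equator the plane never turns
    return 24.0 / cosine


def polar_angle_from_period(period_hours):
    """Invert P = 24 sec(phi) to recover phi from an observed period."""
    return math.acos(24.0 / period_hours)


print(foucault_period_hours(0.0))          # north pole: one turn per day
print(foucault_period_hours(math.pi / 3))  # twice as long
print(math.degrees(polar_angle_from_period(48.0)))
```

Since the usual latitude is 90 degrees minus the polar angle, an observed period of 48 hours would correspond to a latitude of about 30 degrees north.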
The Coriolis acceleration is also responsible for the phenomenon of the next example.
Example 8.8.3 It is known that low pressure areas rotate counterclockwise as seen from
above in the Northern hemisphere but clockwise in the Southern hemisphere. Why?
Neglect accelerations other than the Coriolis acceleration and the following acceleration which comes from an assumption that the point p (t) is the location of the lowest
pressure.
\[
\mathbf{a} = -a(r_B)\,\mathbf{r}_B
\]
where $r_B = r$ will denote the distance from the fixed point p(t) on the earth's surface
which is also the lowest pressure point. Of course the situation could be more complicated but this will suffice to explain the above question. Then the acceleration observed
by a person on the earth relative to the apparently fixed vectors, $\mathbf{i}, \mathbf{j}, \mathbf{k}$, is
\[
\mathbf{a}_B = -a(r_B)(x\mathbf{i} + y\mathbf{j} + z\mathbf{k}) - 2\omega \left[ -y' \cos(\phi)\,\mathbf{i} + (x' \cos(\phi) + z' \sin(\phi))\,\mathbf{j} - (y' \sin(\phi))\,\mathbf{k} \right]
\]
Therefore, one obtains some differential equations from $\mathbf{a}_B = x''\mathbf{i} + y''\mathbf{j} + z''\mathbf{k}$ by matching
the components. These are
\[
\begin{aligned}
x'' + a(r_B)\,x &= 2\omega y' \cos\phi \\
y'' + a(r_B)\,y &= -2\omega x' \cos\phi - 2\omega z' \sin(\phi) \\
z'' + a(r_B)\,z &= 2\omega y' \sin\phi
\end{aligned}
\]
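Now remember, the vectors, $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are fixed relative to the earth and so are constant
vectors. Therefore, from the properties of the determinant and the above differential
equations,
\[
(\mathbf{r}_B' \times \mathbf{r}_B)' =
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x' & y' & z' \\ x & y & z \end{vmatrix}'
= \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x'' & y'' & z'' \\ x & y & z \end{vmatrix}
\]
\[
= \begin{vmatrix}
\mathbf{i} & \mathbf{j} & \mathbf{k} \\
-a(r_B)x + 2\omega y' \cos\phi & -a(r_B)y - 2\omega x' \cos\phi - 2\omega z' \sin\phi & -a(r_B)z + 2\omega y' \sin\phi \\
x & y & z
\end{vmatrix}
\]
Then the kth component of this cross product equals
\[
\omega \cos(\phi) \left( y^2 + x^2 \right)' + 2\omega x z' \sin(\phi).
\]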
The first term will be negative because it is assumed p(t) is the location of low pressure
causing $y^2 + x^2$ to be a decreasing function. If it is assumed there is not a substantial
motion in the $\mathbf{k}$ direction, so that z is fairly constant and the last term can be neglected,
then the kth component of $(\mathbf{r}_B' \times \mathbf{r}_B)'$ is negative provided $\phi \in \left( 0, \frac{\pi}{2} \right)$ and positive if
$\phi \in \left( \frac{\pi}{2}, \pi \right)$. Beginning with a point at rest, this implies $\mathbf{r}_B' \times \mathbf{r}_B = \mathbf{0}$ initially and
then the above implies its kth component is negative in the upper hemisphere when
$\phi < \pi/2$ and positive in the lower hemisphere when $\phi > \pi/2$. Using the right hand rule and
the geometric definition of the cross product, this shows clockwise rotation in the lower
hemisphere and counter clockwise rotation in the upper hemisphere.
Note also that as $\phi$ gets close to $\pi/2$ near the equator, the above reasoning tends
to break down because $\cos(\phi)$ becomes close to zero. Therefore, the motion towards
the low pressure has to be more pronounced in comparison with the motion in the $\mathbf{k}$
direction in order to draw this conclusion.
8.9 Exercises
1. Show the solution to $\mathbf{v}' + r\mathbf{v} = \mathbf{c}$ with the initial condition, $\mathbf{v}(0) = \mathbf{v}_0$ is
$\mathbf{v}(t) = \left( \mathbf{v}_0 - \frac{\mathbf{c}}{r} \right) e^{-rt} + (\mathbf{c}/r)$. If $\mathbf{v}$ is velocity and r = k/m where k is a constant for air
resistance and m is the mass, and $\mathbf{c} = \mathbf{f}/m$, argue from Newton's second law that
this is the equation for finding the velocity, $\mathbf{v}$ of an object acted on by air resistance
proportional to the velocity and a constant force, $\mathbf{f}$, possibly from gravity. Does
there exist a terminal velocity? What is it? Hint: To find the solution to the
equation, multiply both sides by $e^{rt}$. Verify that then $\frac{d}{dt}\left( e^{rt}\mathbf{v} \right) = \mathbf{c}e^{rt}$. Then
integrating both sides, $e^{rt}\mathbf{v}(t) = \frac{1}{r}\mathbf{c}e^{rt} + \mathbf{C}$. Now you need to find $\mathbf{C}$ from using
the initial condition which states $\mathbf{v}(0) = \mathbf{v}_0$.
2. Verify Formula 8.14 carefully by considering the components.
3. Suppose that the air resistance is proportional to the velocity but it is desired to
find the constant of proportionality. Describe how you could find this constant.
4. Suppose an object having mass equal to 5 kilograms experiences a time dependent
force, $\mathbf F(t)=e^{t}\mathbf i+\cos(t)\mathbf j+t^{2}\mathbf k$ Newtons. Suppose also that the object is
at the point (0, 1, 1) meters at time $t=0$ and that its initial velocity at this time
is $\mathbf v=\mathbf i+\mathbf j-\mathbf k$ meters per sec. Find the position of the object as a function of $t$.
5. Fill in the details for the derivation of kinetic energy. In particular verify that
$m\mathbf v'(t)\cdot\mathbf v(t)=\frac{m}{2}\frac{d}{dt}|\mathbf v(t)|^{2}$. Also, why would $dW=\mathbf F(t)\cdot\mathbf v(t)\,dt$?
6. Suppose the force acting on an object, $\mathbf F$ is always perpendicular to the velocity of
the object. Thus $\mathbf F\cdot\mathbf v=0$. Show the kinetic energy of the object is constant. Such
forces are sometimes called forces of constraint because they do not contribute to
the speed of the object, only its direction.
7. A cannon is fired at an angle, $\theta$ from ground level on a vast plain. The speed of
the ball as it leaves the mouth of the cannon is known to be $s$ meters per second.
Neglecting air resistance, find a formula for how far the cannon ball goes before
hitting the ground. Show the maximum range for the cannon ball is achieved
when $\theta=\pi/4$.
8. Suppose in the context of Problem 7 that the cannon ball has mass 10 kilograms
and it experiences a force of air resistance which is $-.01\mathbf v$ Newtons where $\mathbf v$ is the
velocity in meters per second. The acceleration of gravity is 9.8 meters per sec$^{2}$.
Also suppose that the initial speed is 100 meters per second. Find a formula for
the displacement, $\mathbf r(t)$ of the cannon ball. If the angle of elevation equals $\pi/4$, use
a calculator or other means to estimate the time before the cannon ball hits the
ground.
9. Show that Newton's first law can be obtained from the second law.
10. Show that if $\mathbf v'(t)=\mathbf 0$, for all $t\in(a,b)$, then there exists a constant vector, $\mathbf z$
independent of $t$ such that $\mathbf v(t)=\mathbf z$ for all $t$.
11. Suppose an object moves in three dimensional space in such a way that the only
force acting on the object is directed toward a single fixed point in three dimensional
space. Verify that the motion of the object takes place in a plane. Hint: Let
$\mathbf r(t)$ denote the position vector of the object from the fixed point. Then the force
acting on the object must be of the form $g(\mathbf r(t))\,\mathbf r(t)$ and by Newton's second
law, this equals $m\mathbf r''(t)$. Therefore,
$$m\mathbf r''\times\mathbf r=g(\mathbf r)\,\mathbf r\times\mathbf r=\mathbf 0.$$
Now argue that $\mathbf r''\times\mathbf r=(\mathbf r'\times\mathbf r)'$, showing that $(\mathbf r'\times\mathbf r)$ must equal a constant
vector, $\mathbf z$. Therefore, what can be said about $\mathbf z$ and $\mathbf r$?
12. Suppose the only forces acting on an object are the force of gravity, $-mg\mathbf k$ and a
force, $\mathbf F$ which is perpendicular to the motion of the object. Thus $\mathbf F\cdot\mathbf v=0$. Show
the total energy of the object,
$$E\equiv\frac{1}{2}m|\mathbf v|^{2}+mgz$$
is constant. Here $\mathbf v$ is the velocity and the first term is the kinetic energy while
the second is the potential energy. Hint: Use Newton's second law to show the
time derivative of the above expression equals zero.
13. Using Problem 12, suppose an object slides down a frictionless inclined plane
from a height of 100 feet. When it reaches the bottom, how fast will it be going?
Assume it starts from rest.
14. The ballistic pendulum is an interesting device which is used to determine the
speed of a bullet. It is a large massive block of wood hanging from a long string.
A rifle is fired into the block of wood which then moves. The speed of the bullet
can be determined from measuring how high the block of wood rises. Explain
how this can be done and why. Hint: Let v be the speed of the bullet which has
mass m and let the block of wood have mass M. By conservation of momentum
mv = (m + M ) V where V is the speed of the block of wood immediately after
the collision. Thus the energy is $\frac{1}{2}(m+M)V^{2}$ and this block of wood rises to a
height of h. Now use Problem 12.
15. In the experiment of Problem 14, show the kinetic energy before the collision
is greater than the kinetic energy after the collision. Thus linear momentum is
conserved but energy is not. Such a collision is called inelastic.
16. There is a popular toy consisting of identical steel balls suspended from strings of
equal length as illustrated in the following picture.
The ball at the right is lifted and allowed to swing. When it collides with the
other balls, the ball on the left is observed to swing away from the others with
the same speed the ball on the right had when it collided. Why does this happen?
Why don't two or more of the stationary balls start to move, perhaps at a slower
speed? This is an example of an elastic collision because energy is conserved. Of
course this could change if you fixed things so the balls would stick to each other.
17. An illustration used in many beginning physics books is that of firing a rifle
horizontally and dropping an identical bullet from the same height above the perfectly
flat ground followed by an assertion that the two bullets will hit the ground at
exactly the same time. Is this true on the rotating earth assuming the experiment
takes place over a large perfectly flat field so the curvature of the earth is not an
issue? Explain. What other irregularities will occur? Recall the Coriolis force is
$2\omega\left[(-y'\cos\phi)\,\mathbf i+(x'\cos\phi+z'\sin\phi)\,\mathbf j-(y'\sin\phi)\,\mathbf k\right]$ where $\mathbf k$ points away from the
center of the earth, $\mathbf j$ points East, and $\mathbf i$ points South.
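18. Suppose you have $n$ masses, $m_1,\cdots,m_n$. Let the position vector of the $i$th mass
be $\mathbf r_i(t)$. The center of mass of these is defined to be
$$\mathbf R(t)\equiv\frac{\sum_{i=1}^{n}\mathbf r_i m_i}{\sum_{i=1}^{n}m_i}\equiv\frac{\sum_{i=1}^{n}\mathbf r_i(t)\,m_i}{M}.$$
Let $\mathbf r_{Bi}(t)=\mathbf r_i(t)-\mathbf R(t)$. Show that
$$\sum_{i=1}^{n}m_i\mathbf r_i(t)-\sum_{i}m_i\mathbf R(t)=\mathbf 0.$$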
19. Suppose you have $n$ masses, $m_1,\cdots,m_n$ which make up a moving rigid body. Let
R (t) denote the position vector of the center of mass of these n masses. Find a
formula for the total kinetic energy in terms of this position vector, the angular
velocity vector, and the position vector of each mass from the center of mass.
Hint: Use Problem 18.
8.10 Exercises With Answers
1. Show the solution to $v'+rv=c$ with the initial condition, $v(0)=v_0$ is
$v(t)=\left(v_0-\frac{c}{r}\right)e^{-rt}+(c/r)$. If $v$ is velocity and $r=k/m$ where $k$ is a constant for air
resistance and $m$ is the mass, and $c=f/m$, argue from Newton's second law that
this is the equation for finding the velocity, $v$ of an object acted on by air resistance
proportional to the velocity and a constant force, $f$, possibly from gravity. Does
there exist a terminal velocity? What is it?
Multiply both sides of the differential equation by $e^{rt}$. Then the left side becomes
$\frac{d}{dt}\left(e^{rt}v\right)=e^{rt}c$. Now integrate both sides. This gives $e^{rt}v(t)=C+\frac{e^{rt}}{r}c$. You
finish the rest.
2. Suppose an object having mass equal to 5 kilograms experiences a time dependent
force, $\mathbf F(t)=e^{t}\mathbf i+\cos(t)\mathbf j+t^{2}\mathbf k$ Newtons. Suppose also that the object is
at the point (0, 1, 1) meters at time $t=0$ and that its initial velocity at this time
is $\mathbf v=\mathbf i+\mathbf j-\mathbf k$ meters per sec. Find the position of the object as a function of $t$.
This is done by using Newton's law. Thus $5\frac{d^{2}\mathbf r}{dt^{2}}=e^{t}\mathbf i+\cos(t)\mathbf j+t^{2}\mathbf k$ and so
$5\frac{d\mathbf r}{dt}=e^{t}\mathbf i+\sin(t)\mathbf j+\left(t^{3}/3\right)\mathbf k+\mathbf C$. Find the constant, $\mathbf C$ by using the given
initial velocity. Next do another integration obtaining another constant vector
which will be determined by using the given initial position of the object.
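3. Fill in the details for the derivation of kinetic energy. In particular verify that
$m\mathbf v'(t)\cdot\mathbf v(t)=\frac{m}{2}\frac{d}{dt}|\mathbf v(t)|^{2}$. Also, why would $dW=\mathbf F(t)\cdot\mathbf v(t)\,dt$?
$0=\mathbf F\cdot\mathbf v=m\mathbf v'\cdot\mathbf v$. Explain why this is $\frac{d}{dt}\left(m\frac{1}{2}|\mathbf v|^{2}\right)$, the derivative of the kinetic
energy.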
8.11 Line Integrals
The concept of the integral can be extended to functions which are not defined on an
interval of the real line but on some curve in Rn . This is done by defining things in such
a way that the more general concept reduces to the earlier notion. First it is necessary
to consider what is meant by arc length.
8.11.1
The application of the integral considered here is the concept of the length of a curve.
$C$ is a smooth curve in $\mathbb R^{n}$ if there exists an interval, $[a,b]\subseteq\mathbb R$ and functions
$x_i:[a,b]\to\mathbb R$ such that the following conditions hold
1. $x_i$ is continuous on $[a,b]$.
2. $x_i'$ exists and is continuous and bounded on $[a,b]$, with $x_i'(a)$ defined as the derivative from the right,
$$\lim_{h\to0+}\frac{x_i(a+h)-x_i(a)}{h},$$
and $x_i'(b)$ defined similarly as the derivative from the left.
3. For $\mathbf p(t)\equiv(x_1(t),\cdots,x_n(t))$, $t\to\mathbf p(t)$ is one to one on $(a,b)$.
4. $|\mathbf p'(t)|\equiv\left(\sum_{i=1}^{n}|x_i'(t)|^{2}\right)^{1/2}\neq0$ for all $t\in[a,b]$.
[Figure: polygonal approximation of the curve through the points p0, p1, p2.]
Now consider what happens when the partition is refined by including more points.
You can see from the following picture that the polygonal approximation would appear
to be even better and that as more points are added in the partition, the sum of the
lengths of the line segments seems to get close to something which deserves to be defined
as the length of the curve, C.
[Figure: refined polygonal approximation of the curve through the points p0, p1, p2, p3.]
Thus the length of the curve is approximated by
$$\sum_{k=1}^{n}|\mathbf p(t_k)-\mathbf p(t_{k-1})|.$$
|p0 (t)| dt
i=1
Thus this new definition which is valid for smooth curves which may not be straight
line segments gives the usual length for straight line segments.
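The convergence of the polygonal sums to the integral of $|\mathbf p'(t)|$ can be watched numerically. The following is only a sketch: the helix used as a test curve and the function names are assumptions chosen for illustration, not part of the notes.

```python
import math

def polygonal_length(p, a, b, n):
    """Sum of |p(t_k) - p(t_{k-1})| over a uniform partition of [a, b] into n pieces."""
    points = [p(a + (b - a) * k / n) for k in range(n + 1)]
    return sum(math.dist(points[k - 1], points[k]) for k in range(1, n + 1))

def helix(t):
    # Hypothetical test curve. Here |p'(t)| = sqrt(2), so one turn has length 2*pi*sqrt(2).
    return (math.cos(t), math.sin(t), t)

exact = 2 * math.pi * math.sqrt(2)
sizes = (4, 16, 64, 256)
errors = [exact - polygonal_length(helix, 0.0, 2 * math.pi, n) for n in sizes]
for n, err in zip(sizes, errors):
    print(n, err)
```

Each chord is shorter than the arc it spans, so the polygonal sums approach the length from below as the partition is refined.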
The proof that curve length is well defined for a smooth curve contains a result
which deserves to be stated as a corollary. It is proved in Lemma 8.14.13 on Page 194
but the proof is mathematically fairly advanced so it is presented later.
Corollary 8.11.1 Let $C$ be a smooth curve and let $\mathbf f:[a,b]\to C$ and $\mathbf g:[c,d]\to C$
be two parameterizations satisfying 1 - 5. Then $\mathbf g^{-1}\circ\mathbf f$ is either strictly increasing or
strictly decreasing.
Definition 8.11.2 If $\mathbf g^{-1}\circ\mathbf f$ is increasing, then $\mathbf f$ and $\mathbf g$ are said to be equivalent
parameterizations and this is written as $\mathbf f\sim\mathbf g$. It is also said that the two parameterizations
give the same orientation for the curve when $\mathbf f\sim\mathbf g$.
When the parameterizations are equivalent, they preserve the direction of motion
along the curve and this also shows there are exactly two orientations of the curve since
either $\mathbf g^{-1}\circ\mathbf f$ is increasing or it is decreasing. This is not hard to believe. In simple
language, the message is that there are exactly two directions of motion along a curve.
The difficulty is in proving this is actually the case.
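Lemma 8.11.3 The following hold for $\sim$.
$$\mathbf f\sim\mathbf f, \tag{8.23}$$
$$\text{If }\mathbf f\sim\mathbf g\text{ then }\mathbf g\sim\mathbf f, \tag{8.24}$$
$$\text{If }\mathbf f\sim\mathbf g\text{ and }\mathbf g\sim\mathbf h,\text{ then }\mathbf f\sim\mathbf h. \tag{8.25}$$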
So what is the thing to remember from all this? First, there are certain conditions
which must be satisfied for a curve to be smooth. These are listed in 1 - 5. Next, if you
have any curve, there are two directions you can move over this curve, each called an
orientation. This is illustrated in the following picture.
8.11.2
[Figure: the force F(x(t)) acting at the point x(t) on a curve, with the nearby point x(t + h).]
In this picture, the work done by $\mathbf F$ on an object which moves from the point $\mathbf x(t)$ to
the point $\mathbf x(t+h)$ along the straight line shown would equal $\mathbf F\cdot(\mathbf x(t+h)-\mathbf x(t))$. It is
reasonable to assume this would be a good approximation to the work done in moving
along the curve joining $\mathbf x(t)$ and $\mathbf x(t+h)$ provided $h$ is small enough. Also, provided $h$
is small,
$$\mathbf x(t+h)-\mathbf x(t)\approx\mathbf x'(t)\,h$$
where the wriggly equal sign indicates the two quantities are close. In the notation of
Leibniz, one writes $dt$ for $h$ and
$$dW=\mathbf F(\mathbf x(t))\cdot\mathbf x'(t)\,dt$$
or in other words,
$$\frac{dW}{dt}=\mathbf F(\mathbf x(t))\cdot\mathbf x'(t).$$
Defining the total work done by the force at $t=a$, corresponding to the first endpoint
of the curve, to equal zero, the work would satisfy the following initial value problem.
$$\frac{dW}{dt}=\mathbf F(\mathbf x(t))\cdot\mathbf x'(t),\quad W(a)=0.$$
This motivates the following definition of work.
Definition 8.11.7 Let $\mathbf F(\mathbf x)$ be given above. Then the work done by this force field on
an object moving over the curve $C$ in the direction determined by the specified orientation
is defined as
$$\int_C\mathbf F\cdot d\mathbf R\equiv\int_a^b\mathbf F(\mathbf x(t))\cdot\mathbf x'(t)\,dt$$
where the function, $\mathbf x$ is one of the allowed parameterizations of $C$ in the given orientation
of $C$. In other words, there is an interval, $[a,b]$ and as $t$ goes from $a$ to $b$, $\mathbf x(t)$
moves in the direction determined from the given orientation of the curve.
Theorem 8.11.8 The symbol, $\int_C\mathbf F\cdot d\mathbf R$, is well defined in the sense that every
parameterization in the given orientation of $C$ gives the same value for $\int_C\mathbf F\cdot d\mathbf R$.
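Proof: Suppose $\mathbf g:[c,d]\to C$ is another allowed parameterization. Thus $\mathbf g^{-1}\circ\mathbf f$ is
an increasing function, $\phi$. Then since $\phi$ is increasing,
$$\int_c^d\mathbf F(\mathbf g(s))\cdot\mathbf g'(s)\,ds=\int_a^b\mathbf F(\mathbf g(\phi(t)))\cdot\mathbf g'(\phi(t))\,\phi'(t)\,dt$$
$$=\int_a^b\mathbf F(\mathbf f(t))\cdot\frac{d}{dt}\left(\mathbf g\left(\mathbf g^{-1}\circ\mathbf f(t)\right)\right)dt=\int_a^b\mathbf F(\mathbf f(t))\cdot\mathbf f'(t)\,dt.$$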
$$\int_0^1\mathbf F\cdot(\mathbf q-\mathbf p)\,dt=\mathbf F\cdot(\mathbf q-\mathbf p).$$
Therefore, the new definition adds to but does not contradict the old one.
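Example 8.11.9 Suppose for $t\in[0,\pi]$ the position of an object is given by $\mathbf r(t)=$
$t\mathbf i+\cos(2t)\mathbf j+\sin(2t)\mathbf k$. Also suppose there is a force field defined on $\mathbb R^{3}$, $\mathbf F(x,y,z)\equiv$
$2xy\mathbf i+x^{2}\mathbf j+\mathbf k$. Find
$$\int_C\mathbf F\cdot d\mathbf R$$
where $C$ is the curve traced out by this object which has the orientation determined by
the direction of increasing $t$.
To find this line integral use the above definition and write
$$\int_C\mathbf F\cdot d\mathbf R=\int_0^{\pi}\left(2t\cos(2t),t^{2},1\right)\cdot\left(1,-2\sin(2t),2\cos(2t)\right)dt$$
$$=\int_0^{\pi}\left(2t\cos(2t)-2t^{2}\sin(2t)+2\cos(2t)\right)dt=\pi^{2}.$$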
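Example 8.11.10 Let $C$ denote the oriented curve obtained by $\mathbf r(t)=\left(t,\sin t,t^{3}\right)$
where the orientation is determined by increasing $t$ for $t\in[0,2]$. Also let $\mathbf F=(x,y,xz+z)$.
Find $\int_C\mathbf F\cdot d\mathbf R$.
You use the definition.
$$\int_C\mathbf F\cdot d\mathbf R=\int_0^{2}\left(t,\sin t,t^{4}+t^{3}\right)\cdot\left(1,\cos t,3t^{2}\right)dt$$
$$=\int_0^{2}\left(t+\sin(t)\cos(t)+3t^{6}+3t^{5}\right)dt=\frac{1251}{14}-\frac{1}{2}\cos^{2}(2).$$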
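The value in Example 8.11.10 can be checked numerically straight from the definition of the line integral. This is a minimal sketch; the helper names `simpson` and `line_integral` are hypothetical and Simpson's rule is just one convenient choice of quadrature.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def line_integral(F, r, dr, a, b):
    """Approximate the integral over [a, b] of F(r(t)) . r'(t) dt."""
    def integrand(t):
        return sum(Fi * di for Fi, di in zip(F(*r(t)), dr(t)))
    return simpson(integrand, a, b)

F = lambda x, y, z: (x, y, x * z + z)          # the field of Example 8.11.10
r = lambda t: (t, math.sin(t), t ** 3)         # the parameterization
dr = lambda t: (1.0, math.cos(t), 3 * t ** 2)  # its derivative, computed by hand

numeric = line_integral(F, r, dr, 0.0, 2.0)
closed_form = 1251 / 14 - 0.5 * math.cos(2) ** 2
print(numeric, closed_form)
```

The two printed numbers should agree to many digits, which is a quick way to catch sign or algebra slips in hand computations of this kind.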
Suppose you have a curve specified by $\mathbf r(s)=(x(s),y(s),z(s))$ and it has the
property that $|\mathbf r'(s)|=1$ for all $s\in[0,b]$. Then the length of this curve for $s$ between
0 and $s_1$ is
$$\int_0^{s_1}|\mathbf r'(s)|\,ds=\int_0^{s_1}1\,ds=s_1.$$
This parameter is therefore called arc length because the length of the curve up to s
equals s. Now you can always change the parameter to be arc length.
Proposition 8.11.11 Suppose $C$ is an oriented smooth curve parameterized by $\mathbf r(t)$
for $t\in[a,b]$. Then letting $l$ denote the total length of $C$, there exists $\mathbf R(s)$, $s\in[0,l]$
another parameterization for this curve which preserves the orientation and such that
$|\mathbf R'(s)|=1$ so that $s$ is arc length.
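Proof: Let $\phi(t)\equiv\int_a^t|\mathbf r'(\tau)|\,d\tau$ and let $\mathbf R(s)\equiv\mathbf r\left(\phi^{-1}(s)\right)$. Then
$$\mathbf R'(s)=\mathbf r'\left(\phi^{-1}(s)\right)\left(\phi^{-1}\right)'(s)=\frac{\mathbf r'\left(\phi^{-1}(s)\right)}{\phi'\left(\phi^{-1}(s)\right)}=\frac{\mathbf r'\left(\phi^{-1}(s)\right)}{\left|\mathbf r'\left(\phi^{-1}(s)\right)\right|}$$
and so $|\mathbf R'(s)|=1$ as claimed. $\mathbf R(l)=\mathbf r\left(\phi^{-1}(l)\right)=\mathbf r\left(\phi^{-1}\left(\int_a^b|\mathbf r'(\tau)|\,d\tau\right)\right)=\mathbf r(b)$
and $\mathbf R(0)=\mathbf r\left(\phi^{-1}(0)\right)=\mathbf r(a)$ and $\mathbf R$ delivers the same set of points in the same order
as $\mathbf r$ because $\frac{ds}{dt}>0$.
The arc length parameter is just like any other parameter in so far as considerations
of line integrals are concerned because it was shown above that line integrals
are independent of parameterization. However, when things are defined in terms of
the arc length parameterization, it is clear they depend only on geometric properties
of the curve itself and for this reason, the arc length parameterization is important in
differential geometry.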
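The construction in Proposition 8.11.11 can be imitated numerically: integrate the speed to get the arc length function, invert it, and compose. The sketch below assumes a parabola as the test curve; the names `phi`, `phi_inverse` and the use of bisection are illustrative choices, not the method of the notes.

```python
import math

def r(t):
    return (t, t * t)                  # hypothetical example curve, a parabola

def speed(t):
    return math.hypot(1.0, 2.0 * t)    # |r'(t)| for this particular r

def phi(t, a=0.0, n=400):
    """phi(t) = integral of |r'(tau)| from a to t, by Simpson's rule."""
    if t == a:
        return 0.0
    h = (t - a) / n
    total = speed(a) + speed(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * speed(a + k * h)
    return total * h / 3

def phi_inverse(s, a=0.0, b=1.0):
    """Solve phi(t) = s by bisection; phi is strictly increasing because |r'| > 0."""
    lo, hi = a, b
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if phi(mid) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def R(s):
    return r(phi_inverse(s))           # the arc length parameterization

length = phi(1.0)
s, h = 0.5 * length, 1e-4
unit_speed = math.dist(R(s + h), R(s - h)) / (2 * h)   # approximates |R'(s)|
print(length, unit_speed)
```

The second printed number should be close to 1, in line with the claim that $|\mathbf R'(s)|=1$.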
8.11.3
Definition 8.11.12 Let $\mathbf F(x,y,z)=(P(x,y,z),Q(x,y,z),R(x,y,z))$ and let $C$ be
an oriented curve. Then another way to write $\int_C\mathbf F\cdot d\mathbf R$ is
$$\int_C P\,dx+Q\,dy+R\,dz$$
8.12 Exercises
1. Suppose for t [0, 2] the position of an object is given by r (t) = ti + cos (2t) j +
sin (2t) k. Also suppose there is a force field defined on R3 ,
Z
F dR
C
where C is the curve traced out by this object which has the orientation determined
by the direction of increasing t.
where C is the curve traced out by this object which has the orientation determined
by the direction of increasing t. Repeat the problem for r (t) = ti + t2 j + tk.
5. Suppose for $t\in[0,1]$ the position of an object is given by $\mathbf r(t)=t\mathbf i+t\mathbf j+t\mathbf k$. Also
suppose there is a force field defined on $\mathbb R^{3}$, $\mathbf F(x,y,z)\equiv z\mathbf i+xz\mathbf j+xy\mathbf k$. Find
$$\int_C\mathbf F\cdot d\mathbf R$$
where $C$ is the curve traced out by this object which has the orientation determined
by the direction of increasing $t$. Repeat the problem for $\mathbf r(t)=t\mathbf i+t^{2}\mathbf j+t\mathbf k$.
6. Let $\mathbf F(x,y,z)$ be a given force field and suppose it acts on an object having mass,
$m$ on a curve with parameterization, $(x(t),y(t),z(t))$ for $t\in[a,b]$. Show directly
that the work done equals the difference in the kinetic energy. Hint:
$$\int_a^b\mathbf F(x(t),y(t),z(t))\cdot\left(x'(t),y'(t),z'(t)\right)dt=\int_a^b\cdots$$
etc.
8.13 Exercises With Answers
1. Suppose for t [0, 2] the position of an object is given by r (t) = 2ti + cos (t) j +
sin (t) k. Also suppose there is a force field defined on R3 ,
Z
F dR
C
where C is the curve traced out by this object which has the orientation determined
by the direction of increasing t.
You might think of dR = r0 (t) dt to help remember what to do. Then from the
definition,
Z
F dR =
C
2 (2t) (sin t) , 4t2 + 2 sin (t) cos (t) , sin2 (t) (2, sin (t) , cos (t)) dt
Z
=
Z
F dR =
C
/4
2 sin (2t) , cos2 (2t) + t, 4t sin (2t) (2 sin (2t) , 4 cos (2t) , 1) dt
/4
4
4 sin2 2t + 4 cos2 2t + t cos 2t + 4t sin 2t dt =
3
Z
=
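3. Suppose for $t\in[0,1]$ the position of an object is given by $\mathbf r(t)=t\mathbf i+t\mathbf j+t\mathbf k$. Also
suppose there is a force field defined on $\mathbb R^{3}$,
$$\mathbf F(x,y,z)\equiv yz\mathbf i+xz\mathbf j+xy\mathbf k.$$
Find
$$\int_C\mathbf F\cdot d\mathbf R$$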
where C is the curve traced out by this object which has the orientation determined
by the direction of increasing t. Repeat the problem for r (t) = ti + t2 j + tk.
You should get the same answer in this case. This is because the vector field
happens to be conservative. (More on this later.)
8.14 Independence Of Parameterization
If some other parameterization were used to trace out C, would the same answer be
obtained? To answer this question in a satisfactory manner requires some hard calculus.
8.14.1 Hard Calculus
if and only if for every $\varepsilon>0$ there exists $n_\varepsilon$ such that whenever $n\geq n_\varepsilon$,
$$|a_n-a|<\varepsilon.$$
In words the definition says that given any measure of closeness, $\varepsilon$, the terms of the
sequence are eventually all this close to $a$. Note the similarity with the concept of limit.
Here, the word "eventually" refers to $n$ being sufficiently large. Earlier, it referred to $y$
being sufficiently close to $x$ on one side or another or else $x$ being sufficiently large in
either the positive or negative directions. The limit of a sequence, if it exists, is unique.
Theorem 8.14.2 If $\lim_{n\to\infty}a_n=a$ and $\lim_{n\to\infty}a_n=a_1$ then $a_1=a$.
Proof: Suppose $a_1\neq a$. Then let $0<\varepsilon<|a_1-a|/2$ in the definition of the limit.
It follows there exists $n_\varepsilon$ such that if $n\geq n_\varepsilon$, then $|a_n-a|<\varepsilon$ and $|a_n-a_1|<\varepsilon$.
Therefore, for such $n$,
$$|a_1-a|\leq|a_1-a_n|+|a_n-a|<\varepsilon+\varepsilon<|a_1-a|/2+|a_1-a|/2=|a_1-a|,$$
a contradiction.
Definition 8.14.3 Let $\{a_n\}$ be a sequence and let $n_1<n_2<n_3<\cdots$ be any strictly
increasing list of integers such that $n_1$ is at least as large as the first index used to define
the sequence $\{a_n\}$. Then if $b_k\equiv a_{n_k}$, $\{b_k\}$ is called a subsequence of $\{a_n\}$.
Theorem 8.14.4 Let $\{x_n\}$ be a sequence with $\lim_{n\to\infty}x_n=x$ and let $\{x_{n_k}\}$ be a
subsequence. Then $\lim_{k\to\infty}x_{n_k}=x$.
Proof: Let $\varepsilon>0$ be given. Then there exists $n_\varepsilon$ such that if $n>n_\varepsilon$, then $|x_n-x|<\varepsilon$.
Suppose $k>n_\varepsilon$. Then $n_k\geq k>n_\varepsilon$ and so
$$|x_{n_k}-x|<\varepsilon$$
showing $\lim_{k\to\infty}x_{n_k}=x$ as claimed.
There is a very useful way of thinking of continuity in terms of limits of sequences
found in the following theorem. In words, it says a function is continuous if it takes
convergent sequences to convergent sequences whenever possible.
Theorem 8.14.5 A function $f:D(f)\to\mathbb R$ is continuous at $x\in D(f)$ if and only if,
whenever $x_n\to x$ with $x_n\in D(f)$, it follows $f(x_n)\to f(x)$.
Proof: Suppose first that $f$ is continuous at $x$ and let $x_n\to x$. Let $\varepsilon>0$ be given.
By continuity, there exists $\delta>0$ such that if $|y-x|<\delta$, then $|f(x)-f(y)|<\varepsilon$.
However, there exists $n_\delta$ such that if $n\geq n_\delta$, then $|x_n-x|<\delta$ and so for all $n$ this
large,
$$|f(x)-f(x_n)|<\varepsilon$$
which shows $f(x_n)\to f(x)$.
Now suppose the condition about taking convergent sequences to convergent sequences
holds at $x$. Suppose $f$ fails to be continuous at $x$. Then there exists $\varepsilon>0$ and
$x_n\in D(f)$ such that $|x-x_n|<\frac{1}{n}$, yet
$$|f(x)-f(x_n)|\geq\varepsilon.$$
But this is clearly a contradiction because, although $x_n\to x$, $f(x_n)$ fails to converge to
$f(x)$. It follows $f$ must be continuous after all. This proves the theorem.
Definition 8.14.6 A set, $K\subseteq\mathbb R$ is sequentially compact if whenever $\{a_n\}\subseteq K$ is a
sequence, there exists a subsequence, $\{a_{n_k}\}$ such that this subsequence converges to a
point of $K$.
The following theorem is part of a major advanced calculus theorem known as the
Heine Borel theorem.
Theorem 8.14.7 Every closed interval, [a, b] is sequentially compact.
Therefore, $\phi(a)<\phi(x)$ whenever $x\in(a,b)$. Similarly $\phi(b)>\phi(x)$ for all $x\in(a,b)$.
It only remains to verify $\phi^{-1}$ is continuous. Suppose then that $s_n\to s$ where $s_n$
and $s$ are points of $\phi([a,b])$. It is desired to verify that $\phi^{-1}(s_n)\to\phi^{-1}(s)$. If this
does not happen, there exists $\varepsilon>0$ and a subsequence, still denoted by $s_n$ such that
Corollary 8.14.9 Let $f:(a,b)\to\mathbb R$ be one to one and continuous. Then $f(a,b)$ is an
open interval, $(c,d)$ and $f^{-1}:(c,d)\to(a,b)$ is continuous.
Proof: Since $f$ is either strictly increasing or strictly decreasing, it follows that
$f(a,b)$ is an open interval, $(c,d)$. Assume $f$ is decreasing. Now let $x\in(a,b)$. Why is
$f^{-1}$ continuous at $f(x)$? Since $f$ is decreasing, if $f(x)<f(y)$, then $y\equiv f^{-1}(f(y))<$
$x\equiv f^{-1}(f(x))$ and so $f^{-1}$ is also decreasing. Let $\varepsilon>0$ be given. Let $\varepsilon>\eta>0$ and
$(x-\eta,x+\eta)\subseteq(a,b)$. Then $f(x)\in(f(x+\eta),f(x-\eta))$. Let
$$\delta=\min\left(f(x)-f(x+\eta),f(x-\eta)-f(x)\right).$$
Then if
$$|f(z)-f(x)|<\delta,$$
it follows
$$z\equiv f^{-1}(f(z))\in(x-\eta,x+\eta)\subseteq(x-\varepsilon,x+\varepsilon)$$
so
$$\left|f^{-1}(f(z))-x\right|=\left|f^{-1}(f(z))-f^{-1}(f(x))\right|<\varepsilon.$$
This proves the theorem in the case where $f$ is strictly decreasing. The case where $f$ is
increasing is similar.
Theorem 8.14.10 Let $f:[a,b]\to\mathbb R$ be continuous and one to one. Suppose $f'(x_1)$
exists for some $x_1\in[a,b]$ and $f'(x_1)\neq0$. Then $\left(f^{-1}\right)'(f(x_1))$ exists and is given by
the formula, $\left(f^{-1}\right)'(f(x_1))=\frac{1}{f'(x_1)}$.
Proof: By Lemma 8.14.8 $f$ is either strictly increasing or strictly decreasing and $f^{-1}$
is continuous. Therefore, given $\varepsilon>0$, there exists $\eta>0$ such that if $0<|f(x_1)-f(x)|<\eta$,
then
$$\left|\frac{f^{-1}(f(x))-f^{-1}(f(x_1))}{f(x)-f(x_1)}-\frac{1}{f'(x_1)}\right|=\left|\frac{x-x_1}{f(x)-f(x_1)}-\frac{1}{f'(x_1)}\right|<\varepsilon.$$
Therefore, since $\varepsilon>0$ is arbitrary,
$$\lim_{y\to f(x_1)}\frac{f^{-1}(y)-f^{-1}(f(x_1))}{y-f(x_1)}=\frac{1}{f'(x_1)}.$$
exists for some $x_1\in(a,b)$ and $f'(x_1)\neq0$. Then $\left(f^{-1}\right)'(f(x_1))$ exists and is given by
the formula, $\left(f^{-1}\right)'(f(x_1))=\frac{1}{f'(x_1)}$.
This is one of those theorems which is very easy to remember if you neglect the
difficult questions and simply focus on formal manipulations. Consider the following.
$$f^{-1}(f(x))=x.$$
Now use the chain rule on both sides to write
$$\left(f^{-1}\right)'(f(x))\,f'(x)=1,$$
and then divide both sides by $f'(x)$ to obtain
$$\left(f^{-1}\right)'(f(x))=\frac{1}{f'(x)}.$$
Of course this gives the conclusion of the above theorem rather effortlessly and it is
formal manipulations like this which aid many of us in remembering formulas such as
the one given in the theorem.
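The formula can also be tested numerically on a concrete function. In this sketch the cubic $f(x)=x^{3}+x$ is an assumed example, and the inverse is computed by bisection rather than by any closed form.

```python
def f(x):
    return x ** 3 + x            # hypothetical strictly increasing function

def f_prime(x):
    return 3 * x ** 2 + 1

def f_inverse(y, lo=-10.0, hi=10.0):
    """Invert f by bisection; valid because f is continuous and strictly increasing."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x1 = 1.5
y1 = f(x1)
h = 1e-5
# Central difference quotient for the derivative of the inverse at y1 = f(x1).
numeric = (f_inverse(y1 + h) - f_inverse(y1 - h)) / (2 * h)
formula = 1.0 / f_prime(x1)
print(numeric, formula)
```

The difference quotient of the inverse and the value $1/f'(x_1)$ should agree to several digits.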
8.14.2 Independence Of Parameterization
Theorem 8.14.12 Let $\phi:[a,b]\to[c,d]$ be one to one and suppose $\phi'$ exists and is
continuous on $[a,b]$. Then if $f$ is a continuous function defined on $[c,d]$ which is Riemann
integrable$^{4}$,
$$\int_c^d f(s)\,ds=\int_a^b f(\phi(t))\,|\phi'(t)|\,dt$$
Proof: Let $F'(s)=f(s)$. (For example, let $F(s)=\int_a^s f(r)\,dr$.) Then the first
integral equals $F(d)-F(c)$ by the fundamental theorem of calculus. By Lemma 8.14.8,
$\phi$ is either strictly increasing or strictly decreasing. Suppose $\phi$ is strictly decreasing.
Then $\phi(a)=d$ and $\phi(b)=c$. Therefore, $\phi'\leq0$ and the second integral equals
$$-\int_a^b f(\phi(t))\,\phi'(t)\,dt=\int_b^a\frac{d}{dt}\left(F(\phi(t))\right)dt=F(\phi(a))-F(\phi(b))=F(d)-F(c).$$
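A small numerical experiment shows why the decreasing case needs care. The sketch assumes the change of variables formula is read with $|\phi'(t)|$, as the decreasing case in the proof suggests; the integrand and the map `phi` below are hypothetical examples.

```python
def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

f = lambda s: s * s              # hypothetical integrand on [c, d] = [0, 1]
phi = lambda t: 1.0 - t * t      # one to one and decreasing on [a, b] = [0, 1]
dphi = lambda t: -2.0 * t        # derivative of phi, nonpositive on [0, 1]

lhs = simpson(f, 0.0, 1.0)
rhs_abs = simpson(lambda t: f(phi(t)) * abs(dphi(t)), 0.0, 1.0)
rhs_signed = simpson(lambda t: f(phi(t)) * dphi(t), 0.0, 1.0)
print(lhs, rhs_abs, rhs_signed)
```

With the absolute value the two integrals agree; without it the right side comes out with the opposite sign, because a decreasing $\phi$ reverses the limits.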
Proof: Let $C$ be the curve and suppose $\mathbf f:[a,b]\to C$ and $\mathbf g:[c,d]\to C$ both satisfy
conditions 1 - 5. Is it true that $\int_a^b|\mathbf f'(t)|\,dt=\int_c^d|\mathbf g'(s)|\,ds$?
Let $\phi(t)\equiv\mathbf g^{-1}\circ\mathbf f(t)$ for $t\in[a,b]$. Then by the above lemma $\phi$ is either strictly
increasing or strictly decreasing on $[a,b]$. Suppose for the sake of simplicity that it is
strictly increasing. The decreasing case is handled similarly.
Let $s_0\in\phi([a+\delta,b-\delta])\subset(c,d)$. Then by assumption 4, $g_i'(s_0)\neq0$ for some $i$. By
continuity of $g_i'$, it follows $g_i'(s)\neq0$ for all $s\in I$ where $I$ is an open interval contained
in $[c,d]$ which contains $s_0$. It follows that on this interval, $g_i$ is either strictly increasing
or strictly decreasing. Therefore, $J\equiv g_i(I)$ is also an open interval and you can define
a differentiable function, $h_i:J\to I$ by
$$h_i(g_i(s))=s.$$
This implies that for $s\in I$,
$$h_i'(g_i(s))=\frac{1}{g_i'(s)}. \tag{8.26}$$
Now letting $s=\phi(t)$ for $s\in I$, it follows $t\in J_1$, an open interval. Also, for $s$ and $t$
related this way, $\mathbf f(t)=\mathbf g(s)$ and so in particular, for $s\in I$,
$$g_i(s)=f_i(t).$$
Consequently,
$$s=h_i(f_i(t))=\phi(t)$$
and so, for $t\in J_1$,
$$\phi'(t)=h_i'(f_i(t))\,f_i'(t)=h_i'(g_i(s))\,f_i'(t)=\frac{f_i'(t)}{g_i'(\phi(t))} \tag{8.27}$$
which shows that $\phi'$ exists and is continuous on $J_1$, an open interval containing $\phi^{-1}(s_0)$.
Since $s_0$ is arbitrary, this shows $\phi'$ exists on $[a+\delta,b-\delta]$ and is continuous there.
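Now $\mathbf f(t)=\mathbf g\circ\left(\mathbf g^{-1}\circ\mathbf f\right)(t)=\mathbf g(\phi(t))$ and it was just shown that $\phi'$ is a continuous
function on $[a+\delta,b-\delta]$. It follows
$$\mathbf f'(t)=\mathbf g'(\phi(t))\,\phi'(t)$$
and so, by Theorem 8.14.12,
$$\int_{\phi(a+\delta)}^{\phi(b-\delta)}|\mathbf g'(s)|\,ds=\int_{a+\delta}^{b-\delta}|\mathbf g'(\phi(t))|\,|\phi'(t)|\,dt=\int_{a+\delta}^{b-\delta}|\mathbf f'(t)|\,dt.$$
Now using the continuity of $\phi$, $\mathbf g'$, and $\mathbf f'$ on $[a,b]$ and letting $\delta\to0+$ in the above, yields
$$\int_c^d|\mathbf g'(s)|\,ds=\int_a^b|\mathbf f'(t)|\,dt.$$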
Outcomes
1. Recall the definitions of unit tangent, unit normal, and osculating plane.
2. Calculate the curvature for a space curve.
3. Given the position vector function of a moving object, calculate the velocity, speed,
and acceleration of the object and write the acceleration in terms of its tangential
and normal components.
4. Derive formulas for the curvature of a parameterized curve and the curvature of
a plane curve given as a function.
9.1
Space Curves
A fly buzzing around the room, a person riding a roller coaster, and a satellite orbiting
the earth all have something in common. They are moving over some sort of curve in
three dimensions.
Denote by R (t) the position vector of the point on the curve which occurs at time
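$t$. Assume that $\mathbf R'$, $\mathbf R''$ exist and are continuous. Thus $\mathbf R'=\mathbf v$, the velocity, and $\mathbf R''=\mathbf a$
is the acceleration.
[Figure: a space curve with the position vector R(t).]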
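Lemma 9.1.1 Define $\mathbf T(t)\equiv\mathbf R'(t)/|\mathbf R'(t)|$. Then $|\mathbf T(t)|=1$ and if $\mathbf T'(t)\neq\mathbf 0$, then
there exists a unit vector, $\mathbf N(t)$ perpendicular to $\mathbf T(t)$ and a scalar valued function,
$\kappa(t)$, with $\mathbf T'(t)=\kappa(t)\,|\mathbf v|\,\mathbf N(t)$.
Proof: It follows from the definition that $|\mathbf T|=1$. Therefore, $\mathbf T\cdot\mathbf T=1$ and so,
upon differentiating both sides,
$$\mathbf T'\cdot\mathbf T+\mathbf T\cdot\mathbf T'=2\mathbf T'\cdot\mathbf T=0.$$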
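$$\frac{\mathbf T'}{|\mathbf T'|}.$$
$$\mathbf a=\frac{d|\mathbf v|}{dt}\mathbf T+\kappa|\mathbf v|^{2}\mathbf N \tag{9.1}$$
$$=a_T\mathbf T+a_N\mathbf N.$$
$$\mathbf a=\frac{d\mathbf v}{dt}=\frac{d}{dt}\left(\mathbf R'\right)=\frac{d}{dt}\left(|\mathbf v|\mathbf T\right)=\frac{d|\mathbf v|}{dt}\mathbf T+|\mathbf v|\mathbf T'=\frac{d|\mathbf v|}{dt}\mathbf T+\kappa|\mathbf v|^{2}\mathbf N.$$
$$|\mathbf a|^{2}=\left(a_T\mathbf T+a_N\mathbf N\right)\cdot\left(a_T\mathbf T+a_N\mathbf N\right)=a_T^{2}\,\mathbf T\cdot\mathbf T+2a_Na_T\,\mathbf T\cdot\mathbf N+a_N^{2}\,\mathbf N\cdot\mathbf N=a_T^{2}+a_N^{2}$$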
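Finally, it is well to point out that the curvature is a property of the curve itself,
and does not depend on the parameterization of the curve. If the curve is given by two
different vector valued functions, $\mathbf R(t)$ and $\mathbf R(\tau)$, then from the formula above for the
curvature,
$$\kappa(t)=\frac{|\mathbf T'(t)|}{|\mathbf v(t)|}=\frac{\left|\frac{d\mathbf T}{d\tau}\frac{d\tau}{dt}\right|}{\left|\frac{d\mathbf R}{d\tau}\frac{d\tau}{dt}\right|}=\frac{\left|\frac{d\mathbf T}{d\tau}\right|}{\left|\frac{d\mathbf R}{d\tau}\right|}\equiv\kappa(\tau).$$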
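$$\mathbf R(t)=\left(r\cos\left(\frac{v}{r}t\right),r\sin\left(\frac{v}{r}t\right)\right).$$
Therefore, $\mathbf T=\left(-\sin\left(\frac{v}{r}t\right),\cos\left(\frac{v}{r}t\right)\right)$ and
$$\mathbf T'=\left(-\frac{v}{r}\cos\left(\frac{v}{r}t\right),-\frac{v}{r}\sin\left(\frac{v}{r}t\right)\right).$$
Thus, $\kappa=|\mathbf T'(t)|/v=\frac{1}{r}$. It follows
$$\mathbf a=\frac{dv}{dt}\mathbf T+v^{2}\kappa\mathbf N=\frac{v^{2}}{r}\mathbf N.$$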
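$$\mathbf a\times\mathbf v=\frac{d|\mathbf v|}{dt}\mathbf T\times\mathbf v+|\mathbf v|^{2}\kappa\,\mathbf N\times\mathbf v=\frac{d|\mathbf v|}{dt}\mathbf T\times\mathbf v+|\mathbf v|^{3}\kappa\,\mathbf N\times\mathbf T$$
Now $\mathbf T$ and $\mathbf v$ have the same direction so the first term on the right equals zero. Taking
the magnitude of both sides, and using the fact that $\mathbf N$ and $\mathbf T$ are two perpendicular
unit vectors,
$$|\mathbf a\times\mathbf v|=|\mathbf v|^{3}\kappa$$
and so
$$\kappa=\frac{|\mathbf a\times\mathbf v|}{|\mathbf v|^{3}} \tag{9.2}$$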
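Formula 9.2 translates directly into a few lines of code. As a sanity check, the sketch below applies it to a circle of radius $r$ traversed at constant speed, where the curvature should be $1/r$; the function names and the particular numbers are assumptions for illustration.

```python
import math

def cross(u, w):
    return (u[1] * w[2] - u[2] * w[1],
            u[2] * w[0] - u[0] * w[2],
            u[0] * w[1] - u[1] * w[0])

def norm(u):
    return math.sqrt(sum(c * c for c in u))

def curvature(v, a):
    """kappa = |a x v| / |v|^3, as in formula 9.2."""
    return norm(cross(a, v)) / norm(v) ** 3

# Circle of radius r in the plane z = 0, traversed at constant speed v0.
r, v0, t = 2.0, 3.0, 0.7
w = v0 / r
vel = (-v0 * math.sin(w * t), v0 * math.cos(w * t), 0.0)
acc = (-v0 * w * math.cos(w * t), -v0 * w * math.sin(w * t), 0.0)
kappa = curvature(vel, acc)
print(kappa, 1 / r)
```

Only the velocity and acceleration at a single time are needed, which is what makes this formula convenient compared with differentiating $\mathbf T$.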
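Example 9.1.4 Let $\mathbf R(t)=\left(\cos(t),t,t^{2}\right)$ for $t\in[0,3]$. Find the speed, velocity,
curvature, and write the acceleration in terms of normal and tangential components.
First of all $\mathbf v(t)=(-\sin t,1,2t)$ and so the speed is given by
$$|\mathbf v|=\sqrt{\sin^{2}(t)+1+4t^{2}}.$$
Therefore,
$$a_T=\frac{d}{dt}\sqrt{\sin^{2}(t)+1+4t^{2}}=\frac{\sin(t)\cos(t)+4t}{\sqrt{2+4t^{2}-\cos^{2}t}}.$$
It remains to find $a_N$. To do this, you can find the curvature first if you like.
$$\mathbf a(t)=\mathbf R''(t)=(-\cos t,0,2).$$
Then
$$\kappa=\frac{|\mathbf a\times\mathbf v|}{|\mathbf v|^{3}}=\frac{\sqrt{4+\left(-2\sin(t)+2(\cos(t))t\right)^{2}+\cos^{2}(t)}}{\left(\sin^{2}(t)+1+4t^{2}\right)^{3/2}}.$$
Then
$$a_N=\kappa|\mathbf v|^{2}=\frac{\sqrt{4+\left(-2\sin(t)+2(\cos(t))t\right)^{2}+\cos^{2}(t)}}{\sqrt{\sin^{2}(t)+1+4t^{2}}}.$$
You can observe the formula $a_N^{2}+a_T^{2}=|\mathbf a|^{2}$ holds. Indeed $a_N^{2}+a_T^{2}=$
$$\left(\frac{\sqrt{4+\left(-2\sin(t)+2(\cos(t))t\right)^{2}+\cos^{2}(t)}}{\sqrt{\sin^{2}(t)+1+4t^{2}}}\right)^{2}+\left(\frac{\sin(t)\cos(t)+4t}{\sqrt{2+4t^{2}-\cos^{2}t}}\right)^{2}=\cos^{2}(t)+4=|\mathbf a|^{2}.$$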
9.1.1
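$$\mathbf a=a_T\mathbf T+a_N\mathbf N \tag{9.3}$$
where $a_T=\frac{d|\mathbf v|}{dt}$ and $a_N=\kappa|\mathbf v|^{2}$. Of course one way to find $a_T$ and $a_N$ is to just find
$|\mathbf v|$, $\frac{d|\mathbf v|}{dt}$ and $\kappa$ and plug in. However, there is another way which might be easier. Take
the dot product of both sides with $\mathbf T$. This gives,
$$\mathbf a\cdot\mathbf T=a_T\,\mathbf T\cdot\mathbf T+a_N\,\mathbf N\cdot\mathbf T=a_T.$$
Thus
$$\mathbf a=(\mathbf a\cdot\mathbf T)\,\mathbf T+a_N\mathbf N$$
and so
$$\mathbf a-(\mathbf a\cdot\mathbf T)\,\mathbf T=a_N\mathbf N \tag{9.4}$$
and taking norms of both sides,
$$|\mathbf a-(\mathbf a\cdot\mathbf T)\,\mathbf T|=a_N.$$
Also from 9.4,
$$\frac{\mathbf a-(\mathbf a\cdot\mathbf T)\,\mathbf T}{|\mathbf a-(\mathbf a\cdot\mathbf T)\,\mathbf T|}=\frac{a_N\mathbf N}{a_N|\mathbf N|}=\mathbf N.$$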
Also recall
$$\kappa=\frac{|\mathbf a\times\mathbf v|}{|\mathbf v|^{3}}$$
This is usually easier than computing T0 / |T0 | . To illustrate the use of these simple
observations, consider the example worked above which was fairly messy. I will make it
easier by selecting a value of t and by using the above simplifying techniques.
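Example 9.1.5 Let $\mathbf R(t)=\left(\cos(t),t,t^{2}\right)$ for $t\in[0,3]$. Find the speed, velocity,
curvature, and write the acceleration in terms of normal and tangential components when
$t=0$. Also find $\mathbf N$ at the point where $t=0$.
First I need to find the velocity and acceleration. Thus
$$\mathbf v=(-\sin t,1,2t),\quad\mathbf a=(-\cos t,0,2)$$
and consequently,
$$\mathbf T=\frac{(-\sin t,1,2t)}{\sqrt{\sin^{2}(t)+1+4t^{2}}}.$$
When $t=0$ this gives $\mathbf v(0)=\mathbf T(0)=(0,1,0)$ and $\mathbf a(0)=(-1,0,2)$, so $a_T=\mathbf a\cdot\mathbf T=0$ and
$a_N=|\mathbf a-(\mathbf a\cdot\mathbf T)\,\mathbf T|=\sqrt5$. Thus
$$(-1,0,2)=0\cdot\mathbf T+\sqrt5\,\mathbf N$$
and so
$$\sqrt5=\kappa|\mathbf v(0)|^{2}=\kappa\cdot1=\kappa,\qquad\mathbf N=\frac{1}{\sqrt5}(-1,0,2).$$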
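The dot product method for finding $a_T$, $a_N$ and $\mathbf N$ is easy to mechanize. The following sketch applies it to the data of Example 9.1.5 at $t=0$; the function name `decompose` is hypothetical, and the code assumes $a_N\neq0$ so that $\mathbf N$ is defined.

```python
import math

def dot(u, w):
    return sum(x * y for x, y in zip(u, w))

def norm(u):
    return math.sqrt(dot(u, u))

def decompose(v, a):
    """Return (a_T, a_N, T, N, kappa) from velocity v and acceleration a, assuming a_N != 0."""
    speed = norm(v)
    T = tuple(c / speed for c in v)
    a_T = dot(a, T)                                   # a . T = a_T
    normal_part = tuple(ai - a_T * Ti for ai, Ti in zip(a, T))
    a_N = norm(normal_part)                           # |a - (a . T) T| = a_N
    N = tuple(c / a_N for c in normal_part)
    kappa = a_N / speed ** 2                          # from a_N = kappa |v|^2
    return a_T, a_N, T, N, kappa

t = 0.0
v = (-math.sin(t), 1.0, 2 * t)
a = (-math.cos(t), 0.0, 2.0)
a_T, a_N, T, N, kappa = decompose(v, a)
print(a_T, a_N, N, kappa)
```

Changing `t` gives the decomposition at other points of the curve without redoing the algebra of Example 9.1.4.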
|v|
$$\frac{|f''(t)|}{\left(1+f'(t)^{2}\right)^{3/2}}.$$
9.2
If you are interested in more on space curves, you should read this section. Otherwise,
proceed to the exercises. Denote by $\mathbf R(s)$ the function which takes $s$ to a point on this
curve where s is arc length. Thus R (s) equals the point on the curve which occurs when
you have traveled a distance of s along the curve from one end. This is known as the
parameterization of the curve in terms of arc length. Note also that it incorporates an
orientation on the curve because there are exactly two ends you could begin measuring
length from. In this section, assume anything about smoothness and continuity to
make the following manipulations valid. In particular, assume that $\mathbf R'$ exists and is
continuous.
Lemma 9.2.1 Define $\mathbf T(s)\equiv\mathbf R'(s)$. Then $|\mathbf T(s)|=1$ and if $\mathbf T'(s)\neq\mathbf 0$, then there
exists a unit vector, $\mathbf N(s)$ perpendicular to $\mathbf T(s)$ and a scalar valued function, $\kappa(s)$ with
$\mathbf T'(s)=\kappa(s)\,\mathbf N(s)$.
Proof: First, $s=\int_0^s|\mathbf R'(r)|\,dr$ because of the definition of arc length. Therefore,
from the fundamental theorem of calculus, $1=|\mathbf R'(s)|=|\mathbf T(s)|$. Therefore, $\mathbf T\cdot\mathbf T=1$
and so differentiating this on both sides yields $\mathbf T'\cdot\mathbf T+\mathbf T\cdot\mathbf T'=0$ which shows
$\mathbf T\cdot\mathbf T'=0$. Therefore, the vector, $\mathbf T'$ is perpendicular to the vector, $\mathbf T$. In case $\mathbf T'(s)\neq\mathbf 0$,
let $\mathbf N(s)=\frac{\mathbf T'(s)}{|\mathbf T'(s)|}$ and so $\mathbf T'(s)=|\mathbf T'(s)|\,\mathbf N(s)$, showing the scalar valued function
is $\kappa(s)=|\mathbf T'(s)|$. This proves the lemma.
The radius of curvature is defined as $\rho=\frac{1}{\kappa}$. Thus at points where there is a lot of
curvature, the radius of curvature is small and at points where the curvature is small,
the radius of curvature is large. The plane determined by the two vectors, $\mathbf T$ and $\mathbf N$ is
called the osculating plane. It identifies a particular plane which is in a sense tangent
to this space curve. In the case where $|\mathbf T'(s)|=0$ near the point of interest, $\mathbf T(s)$
equals a constant and so the space curve is a straight line which it would be supposed
has no curvature. Also, the principal normal is undefined in this case. This makes
sense because if there is no curving going on, there is no special direction normal to the
curve at such points which could be distinguished from any other direction normal to
the curve. In the case where $|\mathbf T'(s)|=0$, $\kappa(s)=0$ and the radius of curvature would
be considered infinite.
Definition 9.2.2 The vector, $\mathbf T(s)$ is called the unit tangent vector and the vector,
$\mathbf N(s)$ is called the principal normal. The function, $\kappa(s)$ in the above lemma is called
the curvature. When $\mathbf T'(s)\neq\mathbf 0$ so the principal normal is defined, the vector, $\mathbf B(s)\equiv$
$\mathbf T(s)\times\mathbf N(s)$ is called the binormal.
The binormal is normal to the osculating plane and $\mathbf B'$ tells how fast this vector
changes. Thus it measures the rate at which the curve twists.
Lemma 9.2.3 Let $\mathbf R(s)$ be a parameterization of a space curve with respect to arc
length and let the vectors, $\mathbf T$, $\mathbf N$, and $\mathbf B$ be as defined above. Then $\mathbf B'=\mathbf T\times\mathbf N'$ and
there exists a scalar function, $\tau(s)$ such that $\mathbf B'=\tau\mathbf N$.
Proof: From the definition $\mathbf B=\mathbf T\times\mathbf N$, you can differentiate both sides and
get $\mathbf B'=\mathbf T'\times\mathbf N+\mathbf T\times\mathbf N'$. Now recall that $\mathbf T'$ is a multiple called curvature multiplied
by $\mathbf N$ so the vectors, $\mathbf T'$ and $\mathbf N$ have the same direction and $\mathbf B'=\mathbf T\times\mathbf N'$. Therefore,
$\mathbf B'$ is either zero or is perpendicular to $\mathbf T$. But also, from the definition of $\mathbf B$, $\mathbf B$ is a unit
vector and so $\mathbf B(s)\cdot\mathbf B(s)=1$. Differentiating this, $\mathbf B'(s)\cdot\mathbf B(s)+\mathbf B(s)\cdot\mathbf B'(s)=0$ showing
that $\mathbf B'$ is perpendicular to $\mathbf B$ also. Therefore, $\mathbf B'$ is a vector which is perpendicular to
both vectors, $\mathbf T$ and $\mathbf B$ and since this is in three dimensions, $\mathbf B'$ must be some scalar
multiple of $\mathbf N$ and it is this multiple called $\tau$. Thus $\mathbf B'=\tau\mathbf N$ as claimed.
Let's go over this last claim a little more. The following situation is obtained. There
are two vectors, $\mathbf{T}$ and $\mathbf{B}$, which are perpendicular to each other, and both $\mathbf{B}'$ and $\mathbf{N}$
are perpendicular to these two vectors, hence perpendicular to the plane determined by
them. Therefore, $\mathbf{B}'$ must be a multiple of $\mathbf{N}$. Take a piece of paper and draw two unit
vectors on it which are perpendicular. Then you can see that any two vectors which are
perpendicular to this plane must be multiples of each other.
The scalar function $\tau$ is called the torsion. In case $\mathbf{T}' = \mathbf{0}$, none of this is defined
because in this case there is not a well defined osculating plane. The conclusion of the
following theorem is called the Serret Frenet formulas.
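Theorem 9.2.4 (Serret Frenet) Let $\mathbf{R}(s)$ be the parameterization with respect to arc
length of a space curve and $\mathbf{T}(s) = \mathbf{R}'(s)$ is the unit tangent vector. Suppose $|\mathbf{T}'(s)| \neq 0$
so the principal normal, $\mathbf{N}(s) = \frac{\mathbf{T}'(s)}{|\mathbf{T}'(s)|}$, is defined. The binormal is the vector
$\mathbf{B} \equiv \mathbf{T} \times \mathbf{N}$ so $\mathbf{T}, \mathbf{N}, \mathbf{B}$ forms a right handed system of unit vectors each of which is
perpendicular to every other. Then the following system of differential equations holds
in $\mathbb{R}^9$.
$$\mathbf{B}' = \tau \mathbf{N}, \quad \mathbf{T}' = \kappa \mathbf{N}, \quad \mathbf{N}' = -\kappa \mathbf{T} - \tau \mathbf{B}$$
where $\kappa$ is the curvature and is nonnegative and $\tau$ is the torsion.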
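Proof: $\kappa \geq 0$ because $\kappa = |\mathbf{T}'(s)|$. The first two equations are already established.
To get the third, note that $\mathbf{B} \times \mathbf{T} = \mathbf{N}$ which follows because $\mathbf{T}, \mathbf{N}, \mathbf{B}$ is given to form
a right handed system of unit vectors each perpendicular to the others. (Use your right
hand.) Now take the derivative of this expression. Thus
$$\mathbf{N}' = \mathbf{B}' \times \mathbf{T} + \mathbf{B} \times \mathbf{T}' = \tau \mathbf{N} \times \mathbf{T} + \kappa \mathbf{B} \times \mathbf{N} = -\tau \mathbf{B} - \kappa \mathbf{T}.$$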
problem for a system of ordinary differential equations. Having done this, you can
reconstruct the entire space curve starting at some point, $\mathbf{R}_0$, because $\mathbf{R}'(s) = \mathbf{T}(s)$
and so $\mathbf{R}(s) = \mathbf{R}_0 + \int_0^s \mathbf{T}(r)\,dr$.
The vectors, B, T, and N are vectors which are functions of position on the space
curve. Often, especially in applications, you deal with a space curve which is parameterized by a function of t where t is time. Thus a value of t would correspond to a point
on this curve and you could let B (t) , T (t) , and N (t) be the binormal, unit tangent,
and principal normal at this point of the curve. The following example is typical.
Example 9.2.5 Given the circular helix, $\mathbf{R}(t) = (a \cos t)\,\mathbf{i} + (a \sin t)\,\mathbf{j} + (bt)\,\mathbf{k}$, find
the arc length, $s(t)$, the unit tangent vector, $\mathbf{T}(t)$, the principal normal, $\mathbf{N}(t)$, the
binormal, $\mathbf{B}(t)$, the curvature, $\kappa(t)$, and the torsion, $\tau(t)$. Here $t \in [0, T]$.
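The arc length is $s(t) = \int_0^t \sqrt{a^2 + b^2}\,dr = \sqrt{a^2 + b^2}\,t$. Now the tangent vector
is obtained using the chain rule as
$$\mathbf{T} = \frac{d\mathbf{R}}{ds} = \frac{d\mathbf{R}}{dt}\frac{dt}{ds} = \frac{1}{\sqrt{a^2 + b^2}}\mathbf{R}'(t) = \frac{1}{\sqrt{a^2 + b^2}}\left((-a \sin t)\,\mathbf{i} + (a \cos t)\,\mathbf{j} + b\,\mathbf{k}\right).$$
The principal normal:
$$\frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{dt}\frac{dt}{ds} = \frac{1}{a^2 + b^2}\left((-a \cos t)\,\mathbf{i} + (-a \sin t)\,\mathbf{j} + 0\,\mathbf{k}\right)$$
and so
$$\mathbf{N} = \frac{d\mathbf{T}}{ds} \Big/ \left|\frac{d\mathbf{T}}{ds}\right| = -\left((\cos t)\,\mathbf{i} + (\sin t)\,\mathbf{j}\right).$$
The binormal:
$$\mathbf{B} = \frac{1}{\sqrt{a^2 + b^2}} \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ -a \sin t & a \cos t & b \\ -\cos t & -\sin t & 0 \end{vmatrix} = \frac{1}{\sqrt{a^2 + b^2}}\left((b \sin t)\,\mathbf{i} - (b \cos t)\,\mathbf{j} + a\,\mathbf{k}\right).$$
Now the curvature is
$$\kappa(t) = \left|\frac{d\mathbf{T}}{ds}\right| = \sqrt{\left(\frac{a \cos t}{a^2 + b^2}\right)^2 + \left(\frac{a \sin t}{a^2 + b^2}\right)^2} = \frac{a}{a^2 + b^2}.$$
Note the curvature is constant in this example. The final task is to find the torsion. Recall that
$\mathbf{B}' = \tau \mathbf{N}$ where the derivative on $\mathbf{B}$ is taken with respect to arc length. Therefore,
remembering that $t$ is a function of $s$,
$$\mathbf{B}'(s) = \frac{1}{\sqrt{a^2 + b^2}}\left((b \cos t)\,\mathbf{i} + (b \sin t)\,\mathbf{j}\right)\frac{dt}{ds} = \frac{1}{a^2 + b^2}\left((b \cos t)\,\mathbf{i} + (b \sin t)\,\mathbf{j}\right) = \tau\left(-(\cos t)\,\mathbf{i} - (\sin t)\,\mathbf{j}\right) = \tau \mathbf{N}$$
and it follows $-b/(a^2 + b^2) = \tau$.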
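The helix computation can be checked numerically. The following is only a sketch under stated assumptions: it assumes $a > 0$, and it uses the standard cross product formulas $\kappa = |\mathbf{R}' \times \mathbf{R}''| / |\mathbf{R}'|^3$ and $(\mathbf{R}' \times \mathbf{R}'') \cdot \mathbf{R}''' / |\mathbf{R}' \times \mathbf{R}''|^2$, which are not derived in these notes. The sign of the torsion is flipped to match the convention $\mathbf{B}' = \tau \mathbf{N}$ used here. The function name `helix_frame` is made up for the illustration.

```python
import math

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def helix_frame(a, b, t):
    """Return (T, N, B, kappa, tau) for R(t) = (a cos t, a sin t, b t), a > 0."""
    r1 = (-a * math.sin(t), a * math.cos(t), b)      # R'(t)
    r2 = (-a * math.cos(t), -a * math.sin(t), 0.0)   # R''(t)
    r3 = (a * math.sin(t), -a * math.cos(t), 0.0)    # R'''(t)
    speed = norm(r1)
    T = tuple(x / speed for x in r1)
    c = cross(r1, r2)                 # points along the binormal
    B = tuple(x / norm(c) for x in c)
    N = cross(B, T)                   # right handed system: B x T = N
    kappa = norm(c) / speed ** 3
    tau = -dot(c, r3) / dot(c, c)     # minus sign: convention B' = tau N
    return T, N, B, kappa, tau

T, N, B, kappa, tau = helix_frame(3.0, 4.0, 0.7)
print(kappa, 3.0 / 25.0)    # both should agree with a / (a^2 + b^2)
print(tau, -4.0 / 25.0)     # both should agree with -b / (a^2 + b^2)
```

With $a = 3$ and $b = 4$ the two printed pairs should agree to rounding error, and `N` should equal $(-\cos t, -\sin t, 0)$.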
An important application of the usefulness of these ideas involves the decomposition
of the acceleration in terms of these vectors of an object moving over a space curve.
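Corollary 9.2.6 Let $\mathbf{R}(t)$ be a space curve and denote by $\mathbf{v}(t)$ the velocity, $\mathbf{v}(t) = \mathbf{R}'(t)$,
and let $v(t) \equiv |\mathbf{v}(t)|$ denote the speed and let $\mathbf{a}(t)$ denote the acceleration. Then
$\mathbf{v} = v\mathbf{T}$ and $\mathbf{a} = \frac{dv}{dt}\mathbf{T} + v^2 \kappa \mathbf{N}$.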
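Proof: $\mathbf{T} = \frac{d\mathbf{R}}{ds} = \frac{d\mathbf{R}}{dt}\frac{dt}{ds} = \mathbf{v}\frac{dt}{ds}$. Also, $s = \int_0^t v(r)\,dr$ and so $\frac{ds}{dt} = v$ which implies
$\frac{dt}{ds} = \frac{1}{v}$. Therefore, $\mathbf{T} = \mathbf{v}/v$ which implies $\mathbf{v} = v\mathbf{T}$ as claimed.
Now the acceleration is just the derivative of the velocity and so by the Serret Frenet
formulas,
$$\mathbf{a} = \frac{dv}{dt}\mathbf{T} + v\frac{d\mathbf{T}}{dt} = \frac{dv}{dt}\mathbf{T} + v\frac{d\mathbf{T}}{ds}v = \frac{dv}{dt}\mathbf{T} + v^2 \kappa \mathbf{N}.$$
Note how this decomposes the acceleration into a component tangent to the curve and
one which is normal to it. Also note that from the above, $v\,|\mathbf{T}'|\,\frac{\mathbf{T}'(t)}{|\mathbf{T}'|} = v^2 \kappa \mathbf{N}$ and so
$\frac{|\mathbf{T}'|}{v} = \kappa$ and $\mathbf{N} = \frac{\mathbf{T}'(t)}{|\mathbf{T}'|}$.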
From this, it is possible to give an important formula from physics. Suppose an
object orbits a point at constant speed, v. What is the centripetal acceleration of this
object? You may know from a physics class that the answer is v 2 /r where r is the
radius. This follows from the above quite easily. The parameterization of the object
which is as described is
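$$\mathbf{R}(t) = \left(r \cos\left(\frac{v}{r}t\right),\ r \sin\left(\frac{v}{r}t\right)\right).$$
Therefore, $\mathbf{T} = \left(-\sin\left(\frac{v}{r}t\right), \cos\left(\frac{v}{r}t\right)\right)$ and $\mathbf{T}' = \left(-\frac{v}{r}\cos\left(\frac{v}{r}t\right), -\frac{v}{r}\sin\left(\frac{v}{r}t\right)\right)$. Thus,
$$\kappa = |\mathbf{T}'(t)| / v = \frac{1}{r}.$$
It follows $\mathbf{a} = \frac{dv}{dt}\mathbf{T} + v^2 \kappa \mathbf{N} = \frac{v^2}{r}\mathbf{N}$. The vector $\mathbf{N}$ points from the object toward the center.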
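The decomposition $\mathbf{a} = a_T \mathbf{T} + a_N \mathbf{N}$ can also be computed directly from a velocity vector and an acceleration vector. The sketch below is one way to do it, assuming the speed is not zero; the helper name `accel_components` and the sample values of $r$, $v$, and $t$ are made up for the illustration. It takes $a_T = \mathbf{a} \cdot \mathbf{T}$ and lets $a_N$ be the length of what is left over.

```python
import math

def accel_components(vel, acc):
    """Split acceleration into tangential and normal parts.

    vel, acc: velocity and acceleration vectors (tuples of floats).
    Returns (a_T, a_N) with a = a_T*T + a_N*N and a_N >= 0.
    """
    speed = math.sqrt(sum(x * x for x in vel))
    a_T = sum(a * v for a, v in zip(acc, vel)) / speed   # a . T
    # Whatever remains after removing the tangential part is the normal part.
    normal_part = [a - a_T * v / speed for a, v in zip(acc, vel)]
    a_N = math.sqrt(sum(x * x for x in normal_part))
    return a_T, a_N

# Uniform circular motion: R(t) = (r cos(vt/r), r sin(vt/r)).
r, v, t = 2.0, 3.0, 0.4
w = v / r
vel = (-v * math.sin(w * t), v * math.cos(w * t))
acc = (-v * w * math.cos(w * t), -v * w * math.sin(w * t))
a_T, a_N = accel_components(vel, acc)
print(a_T, a_N, v * v / r)   # a_T should be near 0 and a_N near v^2 / r
```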
9.3 Exercises
9. An object moves over the helix, (cos t, sin t, t) . Find the normal and tangential
components of the acceleration of this object as a function of t and write the
acceleration in the form aT T + aN N.
10. An object moves in $\mathbb{R}^3$ according to the formula, $(\cos 3t, \sin 3t, t^2)$. Find the normal and tangential components of the acceleration of this object as a function of
t and write the acceleration in the form $a_T \mathbf{T} + a_N \mathbf{N}$.
11. An object moves over the helix, $(\cos t, \sin t, 2t)$. Find the osculating plane at the
point of the curve corresponding to $t = \pi/4$.
12. An object moves over a circle of radius $r$ according to the formula,
$$\mathbf{r}(t) = (r \cos(\omega t), r \sin(\omega t))$$
where $v = r\omega$. Show that the speed of the object is constant and equals $v$. Tell
why $a_T = 0$ and find $a_N$, $\mathbf{N}$. This yields the formula for centripetal acceleration
from beginning physics classes.
13. Suppose $|\mathbf{R}(t)| = c$ where $c$ is a constant and $\mathbf{R}(t)$ is the position vector of an
object. Show the velocity, $\mathbf{R}'(t)$, is always perpendicular to $\mathbf{R}(t)$.
14. An object moves in three dimensions and the only force on the object is a central
force. This means that if $\mathbf{r}(t)$ is the position of the object, $\mathbf{a}(t) = k(\mathbf{r}(t))\,\mathbf{r}(t)$
where $k$ is some function. Show that if this happens, then the motion of the
object must be in a plane. Hint: First argue that $\mathbf{a} \times \mathbf{r} = \mathbf{0}$. Next show that
$(\mathbf{a} \times \mathbf{r}) = (\mathbf{v} \times \mathbf{r})'$. Therefore, $(\mathbf{v} \times \mathbf{r})' = \mathbf{0}$. Explain why this requires $\mathbf{v} \times \mathbf{r} = \mathbf{c}$
for some vector, $\mathbf{c}$, which does not depend on $t$. Then explain why $\mathbf{c} \cdot \mathbf{r} = 0$. This
implies the motion is in a plane. Why? What are some examples of central forces?
17. Let $\mathbf{R}(t) = e^{5t}\,\mathbf{i} + e^{-5t}\,\mathbf{j} + 5\sqrt{2}\,t\,\mathbf{k}$. Find the arc length, $s$, as a function of the
parameter, $t$, if $t = 0$ is taken to correspond to $s = 0$.
18. An object moves along the x axis toward (0, 0) and then along the curve $y = x^2$ in
the direction of increasing x at constant speed. Is the force acting on the object
a continuous function? Explain. Is there any physically reasonable way to make
this force continuous by relaxing the requirement that the object move at constant
speed? If the curve were part of a railroad track, what would happen at the point
where x = 0?
19. An object of mass $m$ moving over a space curve is acted on by a force, $\mathbf{F}$. Show
the work done by this force equals $m a_T$ (length of the curve). In other words, it
is only the tangential component of the force which does work.
20. The edge of an elliptical skating rink represented in the following picture has a
light at its left end and satisfies the equation $\frac{x^2}{900} + \frac{y^2}{256} = 1$. (Distances measured
in yards.)
[Figure: the elliptical rink, with labels T, s(x, y), z, u, and L.]
A hockey puck slides from the point, T, towards the center of the rink at the rate
of 2 yards per second. What is the speed of its shadow along the wall when $z = 8$?
Hint: You need to find $\sqrt{x'^2 + y'^2}$ at the instant described.
Outcomes
10.1 Polar Coordinates
So far points have been identified in terms of Cartesian coordinates but there are other
ways of specifying points in two and three dimensional space. These other ways involve
using a list of two or three numbers which have a totally different meaning than Cartesian coordinates to specify a point in two or three dimensional space. In general these
lists of numbers which have a different meaning than Cartesian coordinates are called
Curvilinear coordinates. Probably the simplest curvilinear coordinate system is that of
polar coordinates. The idea is suggested in the following picture.
[Figure: a point in the plane labelled with its Cartesian coordinates (x, y) and its polar coordinates (r, θ).]
You see in this picture, the number $r$ identifies the distance of the point from the
origin, (0, 0), while $\theta$ is the angle shown between the positive x axis and the line from
the origin to the point. This angle will always be given in radians and is in the interval
$[0, 2\pi)$. Thus the given point, indicated by a small dot in the picture, can be described
in terms of the Cartesian coordinates, $(x, y)$, or the polar coordinates, $(r, \theta)$. How are
the two coordinate systems related? From the picture,
$$x = r \cos(\theta), \quad y = r \sin(\theta). \qquad (10.1)$$
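Example 10.1.1 The polar coordinates of a point in the plane are $\left(5, \frac{\pi}{6}\right)$. Find the
Cartesian or rectangular coordinates of this point.
From 10.1, $x = 5 \cos\left(\frac{\pi}{6}\right) = \frac{5}{2}\sqrt{3}$ and $y = 5 \sin\left(\frac{\pi}{6}\right) = \frac{5}{2}$. Thus the Cartesian coordinates
are $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$.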
Example 10.1.2 Suppose the Cartesian coordinates of a point are (3, 4) . Find the polar
coordinates.
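Going from Cartesian to polar coordinates can be sketched in a few lines of code. This is only an illustration, assuming the convention used above that $r \geq 0$ and the angle lies in $[0, 2\pi)$; the function name `to_polar` is made up. It inverts 10.1 by taking $r = \sqrt{x^2 + y^2}$ and using the two argument arctangent to pick the angle in the correct quadrant.

```python
import math

def to_polar(x, y):
    """Return (r, theta) with r >= 0 and theta in [0, 2*pi)."""
    r = math.hypot(x, y)
    # atan2 returns an angle in (-pi, pi]; shift negative angles up by 2*pi.
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta

r, theta = to_polar(3, 4)
print(r, theta)   # r is 5 and theta is the first quadrant angle with tan(theta) = 4/3
```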
10.1.1
Just as in the case of rectangular coordinates, it is possible to use relations between the
polar coordinates to specify points in the plane. The process of sketching their graphs
is very similar to that used to sketch graphs of functions in rectangular coordinates. I
will only consider the case where the relation between the polar coordinates is of the
form, $r = f(\theta)$. To graph such a relation, you can make a table of the form

$\theta$ | $r$
$\theta_1$ | $f(\theta_1)$
$\theta_2$ | $f(\theta_2)$
$\vdots$ | $\vdots$

and then graph the resulting points and connect them up with a curve. The following
picture illustrates how to begin this process.
[Figure: a point plotted along a ray from the origin.]
To obtain the point in the plane which goes with the pair $(\theta, f(\theta))$, you draw the ray
through the origin which makes an angle of $\theta$ with the positive x axis. Then you move
along this ray a distance of $f(\theta)$ to obtain the point. As in the case with rectangular
coordinates, this process is tedious and is best done by a computer algebra system.
Example 10.1.3 Graph the polar equation, $r = 1 + \cos\theta$.
To do this, I will use Maple. The command which produces the polar graph of this
is: > plot(1+cos(t),t=0..2*Pi,coords=polar); It tells Maple that $r$ is given by $1 + \cos(t)$
and that $t \in [0, 2\pi]$. The variable $t$ is playing the role of $\theta$. It is easier to type t than
$\theta$ in Maple.
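If no computer algebra system is at hand, the table of values described above can be produced directly. The following sketch is not the Maple command; it is a plain tabulation under the assumption that equally spaced angles are good enough. The name `polar_table` is made up. Each row also carries the Cartesian point obtained from 10.1, which is what you would actually plot.

```python
import math

def polar_table(f, a, b, n):
    """Return n+1 rows (theta, r, x, y) for r = f(theta), theta in [a, b]."""
    rows = []
    for i in range(n + 1):
        theta = a + (b - a) * i / n
        r = f(theta)
        rows.append((theta, r, r * math.cos(theta), r * math.sin(theta)))
    return rows

cardioid = polar_table(lambda t: 1 + math.cos(t), 0.0, 2 * math.pi, 8)
for theta, r, x, y in cardioid:
    print(f"{theta:6.3f} {r:6.3f} {x:7.3f} {y:7.3f}")
```

Joining the points $(x, y)$ in order of increasing angle gives a rough picture of the curve; more rows give a smoother one.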
You can also see just from your knowledge of the trig. functions that the graph
should look something like this. When $\theta = 0$, $r = 2$ and then as $\theta$ increases to $\pi/2$, you
see that $\cos\theta$ decreases to 0. Thus the line from the origin to the point on the curve
should get shorter as $\theta$ goes from 0 to $\pi/2$. Then from $\pi/2$ to $\pi$, $\cos\theta$ gets negative,
eventually equaling $-1$ at $\theta = \pi$. Thus $r = 0$ at this point. Viewing the graph, you see
this is exactly what happens. The above function is called a cardioid.
Here is another example. This is the graph obtained from $r = 3 + \sin\left(\frac{7\theta}{6}\right)$.
Example 10.1.4 Graph $r = 3 + \sin\left(\frac{7\theta}{6}\right)$ for $\theta \in [0, 14\pi]$.
10.2
How can you find the area of the region determined by $0 \leq r \leq f(\theta)$ for $\theta \in [a, b]$,
assuming this is a well defined set of points in the plane? See Example 10.1.5 with
$\theta \in [0, 2\pi]$ to see something which it would be better to avoid. I have in mind the
situation where every ray through the origin having angle $\theta$ for $\theta \in [a, b]$ intersects the
graph of $r = f(\theta)$ in exactly one point. To see how to find the area of such a region,
consider the following picture.
[Figure: a thin triangle between two rays from the origin, with side length $f(\theta)$.]
This is a representation of a small triangle obtained from two rays whose angles
differ by only $d\theta$. What is the area of this triangle, $dA$? It would be
$$\frac{1}{2}\sin(d\theta)\,f(\theta)^2 \approx \frac{1}{2}f(\theta)^2\,d\theta = dA$$
with the approximation getting better as the angle gets smaller. Thus the area should
solve the initial value problem,
$$\frac{dA}{d\theta} = \frac{1}{2}f(\theta)^2, \quad A(a) = 0.$$
Therefore, the total area would be given by the integral,
$$\frac{1}{2}\int_a^b f(\theta)^2\,d\theta. \qquad (10.2)$$
Example 10.2.1 Find the area of the cardioid, $r = 1 + \cos\theta$ for $\theta \in [0, 2\pi]$.
From the graph of the cardioid presented earlier, you can see the region of interest
satisfies the conditions above that every ray intersects the graph in only one point.
Therefore, from 10.2 this area is
$$\frac{1}{2}\int_0^{2\pi}(1 + \cos(\theta))^2\,d\theta = \frac{3}{2}\pi.$$
$$\frac{1}{2}\int_0^{2\pi} a^2\,d\theta = \pi a^2.$$
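Formula 10.2 is easy to check numerically. The sketch below approximates the integral with Simpson's rule, which is a choice made here for the illustration and not something used in these notes; the name `polar_area` and the number of subintervals are likewise made up. For the cardioid the result should be close to $\frac{3}{2}\pi$.

```python
import math

def polar_area(f, a, b, n=1000):
    """Approximate (1/2) * integral_a^b f(theta)^2 d(theta) by Simpson's rule.

    n is the number of subintervals and must be even.
    """
    h = (b - a) / n
    g = lambda th: 0.5 * f(th) ** 2
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

area = polar_area(lambda th: 1 + math.cos(th), 0.0, 2 * math.pi)
print(area, 3 * math.pi / 2)   # the two numbers should agree closely
```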
Example 10.2.3 Find the area of the region inside the cardioid, $r = 1 + \cos\theta$ and
outside the circle, $r = 1$ for $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.
As is usual in such cases, it is a good idea to graph the curves involved to get an
idea what is wanted.
[Figure: the cardioid and the circle, with the desired region marked.]
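The area of this region would be the area of the part of the cardioid corresponding
to $\theta \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$ minus the area of the part of the circle in the first and fourth quadrants. Thus the
area is
$$\frac{1}{2}\int_{-\pi/2}^{\pi/2}(1 + \cos(\theta))^2\,d\theta - \frac{1}{2}\int_{-\pi/2}^{\pi/2} 1\,d\theta = \frac{1}{4}\pi + 2.$$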
This example illustrates the following procedure for finding the area between the
graphs of two curves given in polar coordinates.
Procedure 10.2.4 Suppose that for all $\theta \in [a, b]$, $0 < g(\theta) < f(\theta)$. To find the area
of the region defined in terms of polar coordinates by $g(\theta) < r < f(\theta)$, $\theta \in [a, b]$, you
do the following.
$$\frac{1}{2}\int_a^b \left(f(\theta)^2 - g(\theta)^2\right) d\theta.$$
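The procedure can be tried out on the cardioid and circle of Example 10.2.3. As before, this is only a numerical sketch: Simpson's rule and the name `area_between` are choices made for the illustration. The printed value should be close to $\frac{1}{4}\pi + 2$.

```python
import math

def area_between(f, g, a, b, n=1000):
    """Simpson approximation of (1/2) * integral_a^b (f^2 - g^2) d(theta).

    n is the number of subintervals and must be even.
    """
    h = (b - a) / n
    integrand = lambda th: 0.5 * (f(th) ** 2 - g(th) ** 2)
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3

area = area_between(lambda th: 1 + math.cos(th),   # outer curve, the cardioid
                    lambda th: 1.0,                # inner curve, the circle
                    -math.pi / 2, math.pi / 2)
print(area, math.pi / 4 + 2)
```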
10.3 Exercises
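1. The following are the polar coordinates of points. Find the rectangular coordinates.
(a) $\left(5, \frac{\pi}{6}\right)$
(b) $\left(3, \frac{\pi}{3}\right)$
(c) $\left(4, \frac{2\pi}{3}\right)$
(d) $\left(2, \frac{3\pi}{4}\right)$
(e) $\left(3, \frac{7\pi}{6}\right)$
(f) $\left(8, \frac{11\pi}{6}\right)$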
2. The following are the rectangular coordinates of points. Find the polar coordinates
of these points.
(a)
(b)
(c)
(d)
(e)
(f)
2, 25 2
3 3
2, 2 3
5 5
2 2, 2 2
5 5
2, 2 3
3, 1
3 3
2, 2 3
2
3. In general it is a stupid idea to try to use algebra to invert and solve for a set of
curvilinear coordinates such as polar or cylindrical coordinates in terms of Cartesian
coordinates. Not only is it often very difficult or even impossible to do it,¹ but also
it takes you in entirely the wrong direction because the whole point of introducing
the new coordinates is to write everything in terms of these new coordinates and
not in terms of Cartesian coordinates. However, sometimes this inversion can be
done. Describe how to solve for $r$ and $\theta$ in terms of $x$ and $y$ in polar coordinates.
4. Suppose $r = \frac{a}{1 + e \sin\theta}$ where $e \in [0, 1]$. By changing to rectangular coordinates,
show this is either a parabola, an ellipse or a hyperbola. Determine the values of
$e$ which correspond to the various cases.
5. In Example 10.1.4 suppose you graphed it for $\theta \in [0, k\pi]$ where $k$ is a positive
integer. What is the smallest value of $k$ such that the graph will look exactly like
the one presented in the example?
6. Suppose you were to graph $r = 3 + \sin\left(\frac{m}{n}\theta\right)$ where $m, n$ are integers. Can you
give some description of what the graph will look like for $\theta \in [0, k\pi]$ for $k$ a very
large positive integer? How would things change if you did $r = 3 + \sin(\alpha\theta)$ where
$\alpha$ is an irrational number?
7. Graph $r = 1 + \sin\theta$ for $\theta \in [0, 2\pi]$.
8. Graph $r = 2 + \sin\theta$ for $\theta \in [0, 2\pi]$.
9. Graph $r = 1 + 2\sin\theta$ for $\theta \in [0, 2\pi]$.
10. Graph $r = 2 + \sin(2\theta)$ for $\theta \in [0, 2\pi]$.
11. Graph $r = 1 + \sin(2\theta)$ for $\theta \in [0, 2\pi]$.
12. Graph $r = 1 + \sin(3\theta)$ for $\theta \in [0, 2\pi]$.
13. Find the area of the bounded region determined by $r = 1 + \sin(3\theta)$ for $\theta \in [0, 2\pi]$.
14. Find the area inside $r = 1 + \sin\theta$ and outside the circle $r = 1/2$.
15. Find the area inside the circle $r = 1/2$ and outside the region defined by $r = 1 + \sin\theta$.
10.4
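1. The following are the polar coordinates of points. Find the rectangular coordinates.
(b) $\left(2, \frac{\pi}{3}\right)$ Rectangular coordinates: $\left(2\cos\frac{\pi}{3}, 2\sin\frac{\pi}{3}\right) = \left(1, \sqrt{3}\right)$
(c) $\left(7, \frac{2\pi}{3}\right)$ Rectangular coordinates: $\left(7\cos\frac{2\pi}{3}, 7\sin\frac{2\pi}{3}\right) = \left(-\frac{7}{2}, \frac{7}{2}\sqrt{3}\right)$
(d) $\left(6, \frac{3\pi}{4}\right)$ Rectangular coordinates: $\left(6\cos\frac{3\pi}{4}, 6\sin\frac{3\pi}{4}\right) = \left(-3\sqrt{2}, 3\sqrt{2}\right)$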
2. The following are the rectangular coordinates of points. Find the polar coordinates
of these points.
1 It is no problem for these simple cases of curvilinear coordinates. However, it is a major difficulty
in general. Algebra is simply not adequate to solve systems of nonlinear equations.
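(a) $\left(5\sqrt{2}, 5\sqrt{2}\right)$ Polar coordinates: $\theta = \pi/4$ because $\tan(\theta) = 1$.
$r = \sqrt{\left(5\sqrt{2}\right)^2 + \left(5\sqrt{2}\right)^2} = 10$
(b) $\left(3, 3\sqrt{3}\right)$ Polar coordinates: $\theta = \pi/3$ because $\tan(\theta) = \sqrt{3}$.
$r = \sqrt{(3)^2 + \left(3\sqrt{3}\right)^2} = 6$
(c) $\left(\sqrt{2}, \sqrt{2}\right)$ Polar coordinates: $\theta = \pi/4$ because $\tan(\theta) = 1$.
$r = \sqrt{\left(\sqrt{2}\right)^2 + \left(\sqrt{2}\right)^2} = 2$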
7. Find the area of the bounded region determined by $r = 1 + \sin(4\theta)$ for $\theta \in [0, 2\pi]$.
First you should graph this thing to get an idea what is needed.
You see that you could simply take the area of one of the petals and then multiply
by 4. To get the one which is mostly in the first quadrant, you should let $\theta$ go
from $-\pi/8$ to $3\pi/8$. Thus the area of one petal is $\frac{1}{2}\int_{-\pi/8}^{3\pi/8}(1 + \sin(4\theta))^2\,d\theta = \frac{3}{8}\pi$.
Then you would need to multiply this by 4 to get the whole area. This gives $3\pi/2$.
Alternatively, you could just do
$$\frac{1}{2}\int_0^{2\pi}(1 + \sin(4\theta))^2\,d\theta = \frac{3}{2}\pi.$$
Be sure to always graph the polar function to be sure what you have in mind
is appropriate. Sometimes, as indicated above, funny things happen with polar
graphs.
10.5
Sometimes you have information about forces which act not in the direction of the coordinate axes but in some other direction. When this is the case, it is often useful to
express things in terms of different coordinates which are consistent with these directions. A good example of this is the force exerted by the sun on a planet. This force is
always directed toward the sun and so the force vector changes as the planet moves. To
discuss this, consider the following simple diagram in which two unit vectors, $\mathbf{e}_r$ and $\mathbf{e}_\theta$,
are shown.
[Figure: the unit vectors $\mathbf{e}_r$ and $\mathbf{e}_\theta$ drawn at the point $(r, \theta)$.]
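The vector, $\mathbf{e}_r = (\cos\theta, \sin\theta)$ and the vector, $\mathbf{e}_\theta = (-\sin\theta, \cos\theta)$. You should
convince yourself that the picture above corresponds to this definition of the two vectors.
Note that $\mathbf{e}_r$ is a unit vector pointing away from $\mathbf{0}$ and
$$\mathbf{e}_\theta = \frac{d\mathbf{e}_r}{d\theta}, \quad \mathbf{e}_r = -\frac{d\mathbf{e}_\theta}{d\theta}. \qquad (10.3)$$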
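Now consider the position vector from $\mathbf{0}$ of a point in the plane, $\mathbf{r}(t)$. Then
$$\mathbf{r}(t) = r(t)\,\mathbf{e}_r(\theta(t))$$
where $r(t) = |\mathbf{r}(t)|$. Thus $r(t)$ is just the distance from the origin, $\mathbf{0}$, to the point. What
is the velocity and acceleration? Using the chain rule,
$$\frac{d\mathbf{e}_r}{dt} = \frac{d\mathbf{e}_r}{d\theta}\theta'(t), \quad \frac{d\mathbf{e}_\theta}{dt} = \frac{d\mathbf{e}_\theta}{d\theta}\theta'(t)$$
and so from 10.3,
$$\frac{d\mathbf{e}_r}{dt} = \theta'(t)\,\mathbf{e}_\theta, \quad \frac{d\mathbf{e}_\theta}{dt} = -\theta'(t)\,\mathbf{e}_r \qquad (10.4)$$
Using 10.4 as needed along with the product rule and the chain rule,
$$\mathbf{r}'(t) = r'(t)\,\mathbf{e}_r + r(t)\frac{d}{dt}\left(\mathbf{e}_r(\theta(t))\right) = r'(t)\,\mathbf{e}_r + r(t)\,\theta'(t)\,\mathbf{e}_\theta.$$
Next consider the acceleration. Differentiating again and using 10.4,
$$\mathbf{r}''(t) = r''\,\mathbf{e}_r + r'\frac{d\mathbf{e}_r}{dt} + r'\theta'\,\mathbf{e}_\theta + r\theta''\,\mathbf{e}_\theta + r\theta'\frac{d\mathbf{e}_\theta}{dt}$$
$$= \left(r''(t) - r(t)\,\theta'(t)^2\right)\mathbf{e}_r + \left(2r'(t)\,\theta'(t) + r(t)\,\theta''(t)\right)\mathbf{e}_\theta. \qquad (10.5)$$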
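Formula 10.5 can be sanity checked against a direct numerical second derivative of the Cartesian position. The sketch below does this for one made up motion, $r(t) = 1 + t^2$ and $\theta(t) = 3t$, chosen only because its derivatives are easy to write down. The names `polar_acceleration` and `position` and the step size are assumptions of the illustration.

```python
import math

def polar_acceleration(r, dr, d2r, theta, dtheta, d2theta):
    """Cartesian acceleration assembled from formula 10.5."""
    radial = d2r - r * dtheta ** 2               # coefficient of e_r
    transverse = 2 * dr * dtheta + r * d2theta   # coefficient of e_theta
    e_r = (math.cos(theta), math.sin(theta))
    e_th = (-math.sin(theta), math.cos(theta))
    return (radial * e_r[0] + transverse * e_th[0],
            radial * e_r[1] + transverse * e_th[1])

# Hypothetical motion used only for the check: r(t) = 1 + t^2, theta(t) = 3t.
def position(t):
    r, th = 1 + t * t, 3 * t
    return (r * math.cos(th), r * math.sin(th))

t, h = 0.5, 1e-4
exact = polar_acceleration(1 + t * t, 2 * t, 2.0, 3 * t, 3.0, 0.0)
# Central second difference of the Cartesian position.
numeric = tuple((p - 2 * q + s) / h ** 2
                for p, q, s in zip(position(t + h), position(t), position(t - h)))
print(exact)
print(numeric)   # should match the line above to several decimal places
```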
Example 10.5.2 A platform rotates at a constant speed in the counter clockwise direction and an object of mass m moves from the center of the platform toward the edge at
constant speed. What forces act on this object?
Let v denote the constant speed of the object moving toward the edge of the platform.
Then
$$r'(t) = v, \quad r''(t) = 0, \quad \theta''(t) = 0,$$
while $\theta'(t) = \omega$, a positive constant. From 10.5
$$m\mathbf{r}''(t) = -m\,r(t)\,\omega^2\,\mathbf{e}_r + m\,2v\omega\,\mathbf{e}_\theta.$$
Thus the object experiences centripetal force from the first term and also a funny force
from the second term which is in the direction of rotation of the platform. You can
observe this by experiment if you like. Go to a playground and have someone spin one
of those merry go rounds while you ride it and move from the center toward the edge.
The term $2r'\theta'$ is called the Coriolis force.
Suppose at each point of space, r is associated a force, F (r) which a given object
of mass $m$ will experience if its position vector is $\mathbf{r}$. This is called a force field. A force
field is a central force field if $\mathbf{F}(\mathbf{r}) = g(\mathbf{r})\,\mathbf{e}_r$. Thus in a central force field, the force
an object experiences will always be directed toward or away from the origin, 0. The
following simple lemma is very interesting because it says that in a central force field,
objects must move in a plane.
Lemma 10.5.3 Suppose an object moves in three dimensions in such a way that the
only force acting on the object is a central force. Then the motion of the object is in a
plane.
Proof: Let $\mathbf{r}(t)$ denote the position vector of the object. Then from the definition
of a central force and Newton's second law,
$$m\mathbf{r}'' = g(\mathbf{r})\,\mathbf{r}.$$
10.6 Planetary Motion
Kepler's laws of planetary motion state that planets move around the sun along an
ellipse, the equal area law described above holds, and there is a formula for the time it
takes for the planet to move around the sun. These laws, discovered by Kepler, were
shown by Newton to be consequences of his law of gravitation which states that the
force acting on a mass, $m$, by a mass, $M$, is given by
$$\mathbf{F} = -GMm\left(\frac{1}{r^3}\right)\mathbf{r} = -GMm\left(\frac{1}{r^2}\right)\mathbf{e}_r$$
where $r$ is the distance between centers of mass and $\mathbf{r}$ is the position vector from $M$ to
$m$. Here $G$ is the gravitation constant. This is called an inverse square law. Gravity
acts according to this law and so does electrostatic force. The constant, $G$, is very small
when usual units are used and it has been computed using a very delicate experiment.
It is now accepted to be
$$6.67 \times 10^{-11}\ \text{Newton meter}^2/\text{kilogram}^2.$$
The experiment involved a light source shining on a mirror attached to a quartz fiber
from which was suspended a long rod with two equal masses at the ends which were
attracted by two larger masses. The gravitation force between the suspended masses
and the two large masses caused the fiber to twist ever so slightly and this twisting
was measured by observing the deflection of the light reflected from the mirror on a
scale placed some distance from the fiber. The constant was first measured successfully
by Henry Cavendish in 1798 and the present accepted value was obtained in 1942.
Experiments like these are major accomplishments.
In the following argument, M is the mass of the sun and m is the mass of the planet.
(It could also be a comet or an asteroid.)
10.6.1
An object moves in three dimensions in such a way that the only force acting on the
object is a central force. Then the object moves in a plane and the radius vector from
the origin to the object sweeps out area at a constant rate. This is the equal area rule.
In the context of planetary motion it is called Kepler's second law.
Lemma 10.5.3 says the object moves in a plane. From the assumption that the force
field is a central force field, it follows from 10.5 that
$$2r'(t)\,\theta'(t) + r(t)\,\theta''(t) = 0$$
Multiply both sides of this equation by $r$. This yields
$$2rr'\theta' + r^2\theta'' = \left(r^2\theta'\right)' = 0. \qquad (10.6)$$
Consequently,
$$r^2\theta' = c \qquad (10.7)$$
for some constant, $c$. Now consider the following picture.
In this picture, $d\theta$ is the indicated angle and the two lines determining this angle
are position vectors for the object at point $t$ and point $t + dt$. The area of the sector,
$dA$, is essentially that of a circular sector of radius $r$ and angle $d\theta$ and so $dA = \frac{1}{2}r^2\,d\theta$. Therefore,
$$\frac{dA}{dt} = \frac{1}{2}r^2\frac{d\theta}{dt} = \frac{c}{2}. \qquad (10.8)$$
10.6.2
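Consider the first of Kepler's laws, the one which states that planets move along ellipses.
From Lemma 10.5.3, the motion is in a plane. Now from 10.5 and Newton's second law,
$$\left(r''(t) - r(t)\,\theta'(t)^2\right)\mathbf{e}_r + \left(2r'(t)\,\theta'(t) + r(t)\,\theta''(t)\right)\mathbf{e}_\theta = -\frac{GMm}{m}\left(\frac{1}{r^2}\right)\mathbf{e}_r = -k\left(\frac{1}{r^2}\right)\mathbf{e}_r$$
Thus $k = GM$ and
$$r''(t) - r(t)\,\theta'(t)^2 = -k\left(\frac{1}{r^2}\right), \quad 2r'(t)\,\theta'(t) + r(t)\,\theta''(t) = 0. \qquad (10.9)$$
As in 10.6, $\left(r^2\theta'\right)' = 0$ and so there exists a constant, $c$, such that
$$r^2\theta' = c. \qquad (10.10)$$
Now the other part of 10.9 and 10.10 implies
$$r''(t) - r(t)\frac{c^2}{r^4} = r''(t) - \frac{c^2}{r^3} = -k\left(\frac{1}{r^2}\right). \qquad (10.11)$$
It is only $r$ as a function of $\theta$ which is of interest. Using the chain rule,
$$r' = \frac{dr}{d\theta}\frac{d\theta}{dt} = \frac{dr}{d\theta}\left(\frac{c}{r^2}\right) \qquad (10.12)$$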
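and so also
$$r'' = \frac{d^2r}{d\theta^2}\frac{d\theta}{dt}\left(\frac{c}{r^2}\right) + \frac{dr}{d\theta}(-2)(c)\,r^{-3}\frac{dr}{d\theta}\frac{d\theta}{dt} = \frac{d^2r}{d\theta^2}\left(\frac{c}{r^2}\right)^2 - 2\left(\frac{dr}{d\theta}\right)^2\left(\frac{c^2}{r^5}\right). \qquad (10.13)$$
Using 10.13 in 10.11 and multiplying both sides by $r^4/c^2$ yields
$$\frac{d^2r}{d\theta^2} - 2\left(\frac{dr}{d\theta}\right)^2\frac{1}{r} - r = \frac{-kr^2}{c^2}. \qquad (10.14)$$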
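This is a nice differential equation for $r$ as a function of $\theta$ but it is not clear what its
solution is. It turns out to be convenient to define a new dependent variable, $\rho \equiv r^{-1}$
so $r = \rho^{-1}$. Then
$$\frac{dr}{d\theta} = (-1)\rho^{-2}\frac{d\rho}{d\theta}, \quad \frac{d^2r}{d\theta^2} = 2\rho^{-3}\left(\frac{d\rho}{d\theta}\right)^2 + (-1)\rho^{-2}\frac{d^2\rho}{d\theta^2}.$$
Substituting this in to 10.14 yields
$$2\rho^{-3}\left(\frac{d\rho}{d\theta}\right)^2 + (-1)\rho^{-2}\frac{d^2\rho}{d\theta^2} - 2\left(\rho^{-2}\frac{d\rho}{d\theta}\right)^2\rho - \rho^{-1} = \frac{-k\rho^{-2}}{c^2}$$
which simplifies to
$$(-1)\rho^{-2}\frac{d^2\rho}{d\theta^2} - \rho^{-1} = \frac{-k\rho^{-2}}{c^2}$$
since those two terms which involve $\left(\frac{d\rho}{d\theta}\right)^2$ cancel. Now multiply both sides by $-\rho^2$ and
this yields
$$\frac{d^2\rho}{d\theta^2} + \rho = \frac{k}{c^2}, \qquad (10.15)$$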
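which is a much nicer differential equation. Let $R = \rho - \frac{k}{c^2}$. Then in terms of $R$, this differential equation is
$$\frac{d^2R}{d\theta^2} + R = 0.$$
Multiply both sides by $\frac{dR}{d\theta}$. This gives
$$\frac{1}{2}\frac{d}{d\theta}\left(\left(\frac{dR}{d\theta}\right)^2 + R^2\right) = 0$$
and so
$$\left(\frac{dR}{d\theta}\right)^2 + R^2 = \delta^2 \qquad (10.16)$$
for some $\delta > 0$. Therefore, there exists an angle, $\psi = \psi(\theta)$, such that
$$R = \delta\sin(\psi), \quad \frac{dR}{d\theta} = \delta\cos(\psi)$$
because 10.16 says $\left(\frac{1}{\delta}\frac{dR}{d\theta}, \frac{1}{\delta}R\right)$ is a point on the unit circle. But differentiating the first
of the above equations,
$$\frac{dR}{d\theta} = \delta\cos(\psi)\frac{d\psi}{d\theta} = \delta\cos(\psi)$$
and so $\frac{d\psi}{d\theta} = 1$. Therefore, $\psi = \theta + \phi$. Choosing the coordinate system appropriately,
you can assume $\phi = 0$. Therefore,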
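$$R = \rho - \frac{k}{c^2} = \frac{1}{r} - \frac{k}{c^2} = \delta\sin(\theta)$$
and so, solving for $r$,
$$r = \frac{1}{\frac{k}{c^2} + \delta\sin\theta} = \frac{c^2/k}{1 + (c^2/k)\,\delta\sin\theta} = \frac{p\varepsilon}{1 + \varepsilon\sin\theta}$$
where
$$\varepsilon = \left(c^2/k\right)\delta \ \text{ and } \ p = c^2/(k\varepsilon). \qquad (10.17)$$
Here all these constants are nonnegative. Thus $r + \varepsilon r\sin\theta = \varepsilon p$ and so $r = \varepsilon p - \varepsilon y$. Then squaring both sides,
$$x^2 + y^2 = (\varepsilon p - \varepsilon y)^2 = \varepsilon^2p^2 - 2p\varepsilon^2y + \varepsilon^2y^2$$
And so
$$x^2 + \left(1 - \varepsilon^2\right)y^2 = \varepsilon^2p^2 - 2p\varepsilon^2y. \qquad (10.18)$$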
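10.6.3
Kepler's third law involves the time it takes for the planet to orbit the sun. From 10.18
you can complete the square and obtain
$$x^2 + \left(1 - \varepsilon^2\right)\left(y + \frac{p\varepsilon^2}{1 - \varepsilon^2}\right)^2 = \varepsilon^2p^2 + \frac{p^2\varepsilon^4}{(1 - \varepsilon^2)} = \frac{\varepsilon^2p^2}{(1 - \varepsilon^2)},$$
and this yields
$$x^2 \Big/ \left(\frac{\varepsilon^2p^2}{1 - \varepsilon^2}\right) + \left(y + \frac{p\varepsilon^2}{1 - \varepsilon^2}\right)^2 \Big/ \left(\frac{\varepsilon^2p^2}{(1 - \varepsilon^2)^2}\right) = 1. \qquad (10.19)$$
Now note this is the equation of an ellipse and that the diameter of this ellipse is
$$\frac{2\varepsilon p}{(1 - \varepsilon^2)} \equiv 2a. \qquad (10.20)$$
This follows because
$$\frac{\varepsilon^2p^2}{(1 - \varepsilon^2)^2} \geq \frac{\varepsilon^2p^2}{1 - \varepsilon^2}.$$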
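Now let $T$ denote the time it takes for the planet to make one revolution about the sun.
Using this formula, and 10.8 the following equation must hold.
$$\overbrace{\pi\,\frac{\varepsilon p}{\sqrt{1 - \varepsilon^2}}\,\frac{\varepsilon p}{(1 - \varepsilon^2)}}^{\text{area of ellipse}} = T\,\frac{c}{2}$$
Therefore,
$$T = \frac{2\pi\varepsilon^2p^2}{c\,(1 - \varepsilon^2)^{3/2}}$$
and so
$$T^2 = \frac{4\pi^2\varepsilon^4p^4}{c^2\,(1 - \varepsilon^2)^3}$$
Now using 10.17, recalling that $k = GM$, and 10.20,
$$T^2 = \frac{4\pi^2\varepsilon^4p^4}{k\varepsilon p\,(1 - \varepsilon^2)^3} = \frac{4\pi^2(\varepsilon p)^3}{k\,(1 - \varepsilon^2)^3} = \frac{4\pi^2a^3}{k} = \frac{4\pi^2a^3}{GM}.$$
Written in terms of the diameter, this says
$$T^2 = \frac{4\pi^2}{GM}\left(\frac{\text{diameter of ellipse}}{2}\right)^3. \qquad (10.21)$$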
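To get a feel for the sizes involved, 10.21 can be evaluated numerically. The sketch below uses the value of $G$ quoted above. The mass of the sun, about $1.99 \times 10^{30}$ kilograms, and the semi-major axis, about $149 \times 10^9$ meters, are assumed inputs chosen for the illustration and are not derived in these notes. The function name `orbital_period` is made up.

```python
import math

G = 6.67e-11  # Newton meter^2 / kilogram^2, the value quoted in the text

def orbital_period(a, M):
    """Period in seconds from T^2 = 4 pi^2 a^3 / (G M).

    a: half the diameter of the ellipse, in meters.
    M: mass of the central body, in kilograms.
    """
    return 2 * math.pi * math.sqrt(a ** 3 / (G * M))

# Assumed inputs, not taken from the derivation above.
T = orbital_period(149e9, 1.99e30)
print(T / 86400)   # period in days, roughly one year
```

Note that the mass of the orbiting body does not appear, in agreement with 10.21.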
10.7 Exercises
1. Suppose you know how the spherical coordinates of a moving point change as a
function of $t$. Can you figure out the velocity of the point? Specifically, suppose
$\phi(t) = t$, $\theta(t) = 1 + t$, and $\rho(t) = t$. Find the speed and the velocity of the object
in terms of Cartesian coordinates. Hint: You would need to find $x'(t)$, $y'(t)$,
and $z'(t)$. Then in terms of Cartesian coordinates, the velocity would be $x'(t)\,\mathbf{i} + y'(t)\,\mathbf{j} + z'(t)\,\mathbf{k}$.
5. Suppose an object moves in such a way that $r^2\theta'$ is a constant. Show the only
force acting on the object is a central force.
6. Explain why low pressure areas rotate counter clockwise in the Northern hemisphere and clockwise in the Southern hemisphere. Hint: Note that from the point
of view of an observer fixed in space above the North pole, the low pressure area
already has a counter clockwise rotation because of the rotation of the earth and
its spherical shape. Now consider 10.7. In the low pressure area stuff will move
toward the center so r gets smaller. How are things different in the Southern
hemisphere?
7. What are some physical assumptions which are made in the above derivation of
Kepler's laws from Newton's laws of motion?
8. The orbit of the earth is pretty nearly circular and the distance from the sun to
the earth is about $149 \times 10^6$ kilometers. Using 10.21 and the above value of the
universal gravitation constant, determine the mass of the sun. The earth goes
around it in 365 days. (Actually it is 365.256 days.)
9. It is desired to place a satellite above the equator of the earth which will rotate
about the center of mass of the earth every 24 hours. Is it necessary that the orbit
be circular? What if you want the satellite to stay above the same point on the
earth at all times? If the orbit is to be circular and the satellite is to stay above
the same point, at what distance from the center of mass of the earth should the
satellite be? You may use that the mass of the earth is $5.98 \times 10^{24}$ kilograms.
Such a satellite is called geosynchronous.
10.8
Now consider two three dimensional generalizations of polar coordinates. The following
picture serves as motivation for the definition of these two other coordinate systems.
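[Figure: the x, y, and z axes with a point labelled $(\rho, \phi, \theta)$, $(r, \theta, z_1)$, and $(x_1, y_1, z_1)$, together with its projection $(x_1, y_1, 0)$ onto the plane $z = 0$ and the length $r$.]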
In this picture, $\rho$ is the distance between the origin, the point whose Cartesian
coordinates are (0, 0, 0), and the point indicated by a dot and labelled as $(x_1, y_1, z_1)$,
$(r, \theta, z_1)$, and $(\rho, \phi, \theta)$. The angle between the positive z axis and the line between the
origin and the point indicated by a dot is denoted by $\phi$, and $\theta$ is the angle between the
positive x axis and the line joining the origin to the point $(x_1, y_1, 0)$ as shown, while $r$ is
the length of this line. Thus $r$ and $\theta$ determine a point in the plane determined by letting
$z = 0$ and $r$ and $\theta$ are the usual polar coordinates. Thus $r \geq 0$ and $\theta \in [0, 2\pi)$. Letting $z_1$
denote the usual z coordinate of a point in three dimensions, like the one shown as a dot,
$(r, \theta, z_1)$ are the cylindrical coordinates of the dotted point. The spherical coordinates
are determined by $(\rho, \phi, \theta)$. When $\rho$ is specified, this indicates that the point of interest
is on some sphere of radius $\rho$ which is centered at the origin. Then when $\phi$ is given,
the location of the point is narrowed down to a circle and finally, $\theta$ determines which
point is on this circle. Let $\phi \in [0, \pi]$, $\theta \in [0, 2\pi)$, and $\rho \in [0, \infty)$. The picture shows
how to relate these new coordinate systems to Cartesian coordinates. For cylindrical
coordinates,
$$x = r\cos(\theta), \quad y = r\sin(\theta), \quad z = z$$
and for spherical coordinates,
$$x = \rho\sin(\phi)\cos(\theta), \quad y = \rho\sin(\phi)\sin(\theta), \quad z = \rho\cos(\phi).$$
Spherical coordinates should be especially interesting to you because you live on the
surface of a sphere. This has been known for several hundred years. You may also know
that the standard way to determine position on the earth is to give the longitude and
latitude. The latitude corresponds to $\phi$ and the longitude corresponds to $\theta$.³
³ Actually latitude is determined on maps and in navigation by measuring the angle from the equator
rather than the pole but it is essentially the same idea.
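The relations between spherical and Cartesian coordinates translate directly into code. The sketch below is an illustration only. It uses the ordering $(\rho, \phi, \theta)$ of these notes, with $\phi$ measured from the positive z axis, and it assumes the point is not the origin when converting back. The function names are made up.

```python
import math

def spherical_to_cartesian(rho, phi, theta):
    """(rho, phi, theta) -> (x, y, z), phi measured from the positive z axis."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (rho, phi, theta); assumes (x, y, z) is not the origin."""
    rho = math.sqrt(x * x + y * y + z * z)
    phi = math.acos(z / rho)                    # in [0, pi]
    theta = math.atan2(y, x) % (2 * math.pi)    # in [0, 2*pi)
    return rho, phi, theta

rho, phi, theta = cartesian_to_spherical(math.sqrt(2), math.sqrt(6), 2 * math.sqrt(2))
print(rho, phi, theta)   # close to 4, pi/4, pi/3
print(spherical_to_cartesian(rho, phi, theta))
```

The cylindrical coordinates of the same point are then $(\rho\sin\phi, \theta, \rho\cos\phi)$.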
$z = \frac{1}{\sqrt{3}}\sqrt{x^2 + y^2}$ in spherical coordinates.
This is
$$\rho\cos(\phi) = \frac{1}{\sqrt{3}}\sqrt{(\rho\sin(\phi))^2} = \frac{1}{3}\sqrt{3}\,\rho\sin\phi.$$
10.9 Exercises
1. The following are the cylindrical coordinates of points. Find the rectangular and
spherical coordinates.
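(a) $\left(5, \frac{5\pi}{6}, 3\right)$
(b) $\left(3, \frac{\pi}{3}, 4\right)$
(c) $\left(4, \frac{2\pi}{3}, 1\right)$
(d) $\left(2, \frac{3\pi}{4}, 2\right)$
(e) $\left(3, \frac{3\pi}{2}, 1\right)$
(f) $\left(8, \frac{11\pi}{6}, 11\right)$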
2. The following are the rectangular coordinates of points. Find the cylindrical and
spherical coordinates of these points.
(a) 52 2, 25 2, 3
(b) 32 , 23 3, 2
(c) 52 2, 52 2, 11
(d) 52 , 52 3, 23
(e) 3, 1, 5
(f) 32 , 23 3, 7
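3. The following are spherical coordinates of points in the form $(\rho, \phi, \theta)$. Find the
rectangular and cylindrical coordinates.
(a) $\left(4, \frac{\pi}{4}, \frac{5\pi}{6}\right)$
(b) $\left(2, \frac{\pi}{3}, \frac{2\pi}{3}\right)$
(c) $\left(3, \frac{5\pi}{6}, \frac{3\pi}{2}\right)$
(d) $\left(4, \frac{\pi}{2}, \frac{7\pi}{4}\right)$
(e) $\left(4, \frac{2\pi}{3}, \frac{\pi}{6}\right)$
(f) $\left(4, \frac{3\pi}{4}, \frac{5\pi}{3}\right)$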
4. The following are rectangular coordinates of points. Find the spherical and cylindrical coordinates.
(a)
2, 6, 2 2
1 3
(b) 2 3, 2 , 1
(c) 34 2, 34 2, 32 3
(d) 3, 1, 2 3
(e) 14 2, 14 6, 12 2
9
(f) 94 3, 27
4 , 2
5. Describe how to solve the problem of finding spherical coordinates given rectangular coordinates.
6. A point has Cartesian coordinates, (1, 2, 3) . Find its spherical and cylindrical
coordinates using a calculator or other electronic gadget.
7. Describe the following surface in rectangular coordinates. $\phi = \pi/4$ where $\phi$ is the
polar angle in spherical coordinates.
8. Describe the following surface in rectangular coordinates. $\theta = \pi/4$ where $\theta$ is the
angle measured from the positive x axis in spherical coordinates.
9. Describe the following surface in rectangular coordinates. $\theta = \pi/4$ where $\theta$ is the
angle measured from the positive x axis in cylindrical coordinates.
10. Describe the following surface in rectangular coordinates. $r = 5$ where $r$ is one of
the cylindrical coordinates.
11. Describe the following surface in rectangular coordinates. $\rho = 4$ where $\rho$ is the
distance to the origin.
12. Give the cone, $z = \sqrt{x^2 + y^2}$, in cylindrical coordinates and in spherical coordinates.
13. Write the following in spherical coordinates.
(a) $z = x^2 + y^2$.
(b) $x^2 - y^2 = 1$
(c) $z^2 + x^2 + y^2 = 6$
(d) $z = \sqrt{x^2 + y^2}$
(e) $y = x$
(f) $z = x$
14. Write the following in cylindrical coordinates.
(a) $z = x^2 + y^2$.
(b) $x^2 - y^2 = 1$
(c) $z^2 + x^2 + y^2 = 6$
(d) $z = \sqrt{x^2 + y^2}$
(e) $y = x$
(f) $z = x$
10.10
1. The following are the cylindrical coordinates of points. Find the rectangular and
spherical coordinates.
(a) 5, 5
3 , 3 Rectangular coordinates:
5 cos
5
6
, 5 sin
5
3
, 3
5
5
=
3,
3, 3
2
2
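(b) $\left(3, \frac{\pi}{2}, 4\right)$ Rectangular coordinates:
$$\left(3\cos\frac{\pi}{2}, 3\sin\frac{\pi}{2}, 4\right) = (0, 3, 4)$$
(c) $\left(4, \frac{3\pi}{4}, 1\right)$ Rectangular coordinates:
$$\left(4\cos\frac{3\pi}{4}, 4\sin\frac{3\pi}{4}, 1\right) = \left(-2\sqrt{2}, 2\sqrt{2}, 1\right)$$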
2. The following are the rectangular coordinates of points. Find the cylindrical and
spherical coordinates of these points.
(a) 52 2, 25 2, 3 Cylindrical coordinates:
s
2
2
1
2 +
2 , , 3 = 5, , 3
2
2
4
4
5
Spherical coordinates:
34, 4 , where cos =
(b) 1, 3, 2 Cylindrical coordinates:
r
2
(1) +
3 , ,2
3
2
34
1
= 2, , 2
3
2 2
so =
4.
3. The following are spherical coordinates of points in the form (, , ) . Find the
rectangular and cylindrical coordinates.
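(a) $\left(4, \frac{\pi}{4}, \frac{5\pi}{6}\right)$ Rectangular coordinates:
$$\left(4\sin\frac{\pi}{4}\cos\frac{5\pi}{6},\ 4\sin\frac{\pi}{4}\sin\frac{5\pi}{6},\ 4\cos\frac{\pi}{4}\right) = \left(-\sqrt{2}\sqrt{3}, \sqrt{2}, 2\sqrt{2}\right)$$
Cylindrical coordinates: $\left(4\sin\frac{\pi}{4}, \frac{5\pi}{6}, 4\cos\frac{\pi}{4}\right) = \left(2\sqrt{2}, \frac{5\pi}{6}, 2\sqrt{2}\right)$.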
(b) 2, 3 , 3
Rectangular coordinates:
4
3 1
5
5
cos
, 2 sin
sin
, 2 cos
= ,
2 sin
3, 1
3
6
3
6
3
2 2
3
Cylindrical coordinates: 2 sin 3 , 3
=
3, 4 , 1 .
4 , 2 cos 3
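(c) $(2, \pi/6, 3\pi/2)$
Rectangular coordinates:
\[ \left(2\sin\frac{\pi}{6}\cos\frac{3\pi}{2},\ 2\sin\frac{\pi}{6}\sin\frac{3\pi}{2},\ 2\cos\frac{\pi}{6}\right) = \left(0,\ -1,\ \sqrt{3}\right) \]
Cylindrical coordinates: $\left(2\sin\frac{\pi}{6},\ \frac{3\pi}{2},\ 2\cos\frac{\pi}{6}\right) = \left(1,\ \frac{3\pi}{2},\ \sqrt{3}\right)$.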
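4. The following are rectangular coordinates of points. Find the spherical and cylindrical coordinates.
(a) $\left(\sqrt{2}, \sqrt{6}, 2\sqrt{2}\right)$ To find $\theta$, note that $\tan\theta = \sqrt{6}/\sqrt{2} = \sqrt{3}$ and so $\theta = \pi/3$.
$\rho = \sqrt{2 + 6 + 8} = 4$. $\cos\phi = \frac{2\sqrt{2}}{4} = \frac{\sqrt{2}}{2}$ so $\phi = \pi/4$. The spherical coordinates
are, therefore, $(4, \pi/4, \pi/3)$. The cylindrical coordinates are
$\left(4\sin\frac{\pi}{4},\ \frac{\pi}{3},\ 4\cos\frac{\pi}{4}\right) = \left(2\sqrt{2},\ \frac{\pi}{3},\ 2\sqrt{2}\right)$. I can't stand to do any more of these but you can do the
others the same way.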
5. Describe how to solve the problem of finding spherical coordinates given rectangular coordinates.
This is not easy and is somewhat unpleasant but everyone should do this once in
their life. If x, y, z are the rectangular coordinates, you can get $\rho$ as $\sqrt{x^2 + y^2 + z^2}$.
Now $\cos\phi = z/\rho$. Finally, you need $\theta$. You know $\phi$ and $\rho$. $x = \rho\sin\phi\cos\theta$ and
$y = \rho\sin\phi\sin\theta$. Therefore, you can find $\theta$ in the same way you did for polar
coordinates. Here $r = \rho\sin\phi$.
6. A point has Cartesian coordinates, (1, 2, 3) . Find its spherical and cylindrical
coordinates using a calculator or other electronic gadget.
See how to do it using Problem 5.
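The recipe in Problem 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, the point is assumed not to be the origin, and $\theta$ is reported in $[0, 2\pi)$.

```python
import math

def rect_to_spherical(x, y, z):
    """Return (rho, phi, theta) for a point other than the origin."""
    rho = math.sqrt(x * x + y * y + z * z)     # distance to the origin
    phi = math.acos(z / rho)                   # polar angle, from cos(phi) = z / rho
    theta = math.atan2(y, x) % (2 * math.pi)   # angle from the positive x axis
    return rho, phi, theta

def rect_to_cylindrical(x, y, z):
    """Return (r, theta, z)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta, z

# The point of Problem 6.
rho, phi, theta = rect_to_spherical(1.0, 2.0, 3.0)
r, theta_cyl, z = rect_to_cylindrical(1.0, 2.0, 3.0)
print("spherical   (rho, phi, theta):", (rho, phi, theta))
print("cylindrical (r, theta, z):    ", (r, theta_cyl, z))
```

Using `atan2` rather than an inverse tangent of y/x picks the correct quadrant for $\theta$ automatically, which is the step that is easy to get wrong by hand.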
7. Describe the following surface in rectangular coordinates. $\phi = \pi/3$ where $\phi$ is the
polar angle in spherical coordinates.
This is a cone such that the angle between the positive z axis and the side of the
cone seen from the side equals $\pi/3$.
8. Give the cone, $z = 2\sqrt{x^2 + y^2}$ in cylindrical coordinates and in spherical coordinates.
Cylindrical: $z = 2r$. Spherical: $\rho\cos\phi = 2\rho\sin\phi$. So it is $\tan\phi = \frac{1}{2}$.
9. Write the following in spherical coordinates.
(a) $z = 2\left(x^2 + y^2\right)$.
$\rho\cos\phi = 2\rho^2\sin^2\phi$ or in other words $\cos\phi = 2\rho\sin^2\phi$
Part IV
Outcomes
11.1
With vector valued functions of many variables, it doesn't take long before it is impossible to draw meaningful pictures. This is because one needs more than three dimensions
to accomplish the task and we can only visualize things in three dimensions. Ultimately,
one of the main purposes of calculus is to free us from the tyranny of art. In calculus,
we are permitted and even required to think in a meaningful way about things which
cannot be drawn. However, it is certainly interesting to consider some things which
can be visualized and this will help to formulate and understand more general notions
which make sense in contexts which cannot be visualized. One of these is the concept
of a scalar valued function of two variables.
Let f (x, y) denote a scalar valued function of two variables evaluated at the point
(x, y) . Its graph consists of the set of points, (x, y, z) such that z = f (x, y) . How does
one go about depicting such a graph? The usual way is to fix one of the variables, say x
and consider the function z = f (x, y) where y is allowed to vary and x is fixed. Graphing
this would give a curve which lies in the surface to be depicted. Then do the same thing
for other values of x and the result would depict the desired graph. Computers
do this very well. The following is the graph of the function z = cos (x) sin (2x + y)
drawn using Maple, a computer algebra system.1
Notice how elaborate this picture is. The lines in the drawing correspond to taking
one of the variables constant and graphing the curve which results. The computer did
this drawing in seconds but you couldn't do it as well if you spent all day on it. I used
a grid consisting of 70 choices for x and 70 choices for y.
Sometimes attempts are made to understand three dimensional objects like the above
graph by looking at contour graphs in two dimensions. The contour graph of the above
three dimensional graph is below and comes from using the computer algebra system
again.
[Figure: contour graph of z = cos (x) sin (2x + y) in the xy plane]
This is in two dimensions and the different lines in two dimensions correspond to
points on the three dimensional graph which have the same z value. If you have looked
at a weather map, these lines are called isotherms or isobars depending on whether the
function involved is temperature or pressure. In a contour geographic map, the contour
lines represent constant altitude. If many contour lines are close to each other, this
indicates rapid change in the altitude, temperature, pressure, or whatever else may be
measured.
A scalar function of three variables cannot be visualized because four dimensions are
required. However, some people like to try to visualize even these examples. This is
done by looking at level surfaces in $\mathbb{R}^3$ which are defined as surfaces where the function
assumes a constant value. They play the role of contour lines for a function of two
variables. As a simple example, consider $f(x, y, z) = x^2 + y^2 + z^2$. The level surfaces
of this function would be concentric spheres centered at 0. (Why?) Another way to
visualize objects in higher dimensions involves the use of color and animation. However,
there really are limits to what you can accomplish in this direction. So much for art.
However, the concept of level curves is quite useful because these can be drawn.
Example 11.1.1 Determine from a contour map where the function,
$f(x, y) = \sin\left(x^2 + y^2\right)$
is steepest.
[Figure: contour map of $f(x, y) = \sin\left(x^2 + y^2\right)$ in the xy plane]
1 I used Maple and exported the graph as an eps. file which I then imported into this document.
In the picture, the steepest places are where the contour lines are close together
because they correspond to various values of the function. You can look at the picture
and see where they are close and where they are far. This is the advantage of a contour
map.
11.2
Review Of Limits
Example 11.2.2 Let S denote the set, $\left\{(x, y, z) \in \mathbb{R}^3 : x, y, z \text{ are all in } \mathbb{N}\right\}$. Which
points are limit points?
This set does not have any because any two of these points are at least as far apart
as 1. Therefore, if x is any point of R3 , B (x, 1/4) contains at most one point.
Example 11.2.3 Let U be an open set in $\mathbb{R}^3$. Which points of U are limit points of U?
They all are. From the definition of U being open, if $\mathbf{x} \in U$, there exists $B(\mathbf{x}, r) \subseteq U$
for some $r > 0$. Now consider the line segment $\mathbf{x} + tr\mathbf{e}_1$ where $t \in [0, 1/2]$. This describes
infinitely many points and they are all in $B(\mathbf{x}, r)$ because
\[ |\mathbf{x} + tr\mathbf{e}_1 - \mathbf{x}| = tr < r. \]
Therefore, every point of U is a limit point of U.
The case where U is open will be the one of most interest but many other sets have
limit points.
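Definition 11.2.4 Let $\mathbf{f} : D(\mathbf{f}) \subseteq \mathbb{R}^p \to \mathbb{R}^q$ where $q, p \geq 1$ be a function and let $\mathbf{x}$ be a
limit point of $D(\mathbf{f})$. Then
\[ \lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) = \mathbf{L} \]
if and only if the following condition holds. For all $\varepsilon > 0$ there exists $\delta > 0$ such that if
\[ 0 < |\mathbf{y} - \mathbf{x}| < \delta \text{ and } \mathbf{y} \in D(\mathbf{f}) \]
then,
\[ |\mathbf{L} - \mathbf{f}(\mathbf{y})| < \varepsilon. \]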
The condition that x must be a limit point of D (f ) if you are to take a limit at x is
what makes the limit well defined.
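Proposition 11.2.5 Let $\mathbf{f} : D(\mathbf{f}) \subseteq \mathbb{R}^p \to \mathbb{R}^q$ where $q, p \geq 1$ be a function and let $\mathbf{x}$ be
a limit point of $D(\mathbf{f})$. Then if $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y})$ exists, it must be unique.
Proof: Suppose $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) = \mathbf{L}_1$ and $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) = \mathbf{L}_2$. Then for $\varepsilon > 0$ given,
let $\delta_i > 0$ correspond to $\mathbf{L}_i$ in the definition of the limit and let $\delta = \min(\delta_1, \delta_2)$. Since
$\mathbf{x}$ is a limit point, there exists $\mathbf{y} \in B(\mathbf{x}, \delta) \cap D(\mathbf{f})$ with $\mathbf{y} \neq \mathbf{x}$. Therefore,
\[ |\mathbf{L}_1 - \mathbf{L}_2| \leq |\mathbf{L}_1 - \mathbf{f}(\mathbf{y})| + |\mathbf{f}(\mathbf{y}) - \mathbf{L}_2| < \varepsilon + \varepsilon = 2\varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, this shows $\mathbf{L}_1 = \mathbf{L}_2$.
The following theorem summarizes many
important interactions involving continuity. Most of this theorem has been proved in
Theorem 7.4.5 on Page 137 and Theorem 7.4.7 on Page 139.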
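Theorem 11.2.6 Suppose $\mathbf{x}$ is a limit point of $D(\mathbf{f})$ and $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) = \mathbf{L}$, $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{g}(\mathbf{y}) = \mathbf{K}$
where $\mathbf{K}$ and $\mathbf{L}$ are vectors in $\mathbb{R}^p$ for $p \geq 1$. Then if $a, b \in \mathbb{R}$,
\[ \lim_{\mathbf{y}\to\mathbf{x}} \left(a\mathbf{f}(\mathbf{y}) + b\mathbf{g}(\mathbf{y})\right) = a\mathbf{L} + b\mathbf{K}, \tag{11.1} \]
\[ \lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}\cdot\mathbf{g}(\mathbf{y}) = \mathbf{L}\cdot\mathbf{K} \tag{11.2} \]
(11.3)
For a vector valued function, $\mathbf{f}(\mathbf{y}) = (f_1(\mathbf{y}), \cdots, f_q(\mathbf{y}))^T$, $\lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) = \mathbf{L} = (L_1, \cdots, L_q)^T$
if and only if
\[ \lim_{\mathbf{y}\to\mathbf{x}} f_k(\mathbf{y}) = L_k \tag{11.4} \]
for each $k = 1, \cdots, q$.
In the case where $\mathbf{f}$ and $\mathbf{g}$ have values in $\mathbb{R}^3$
\[ \lim_{\mathbf{y}\to\mathbf{x}} \mathbf{f}(\mathbf{y}) \times \mathbf{g}(\mathbf{y}) = \mathbf{L} \times \mathbf{K}. \tag{11.5} \]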
11.3
11.3.1
The directional derivative is just what its name suggests. It is the derivative of a function
in a particular direction. The following picture illustrates the situation in the case of a
function of two variables.
[Figure: the surface z = f (x, y) above the xy plane, showing the unit vector v based at the point (x0 , y0 )]
In this picture, $\mathbf{v} \equiv (v_1, v_2)$ is a unit vector in the xy plane and $\mathbf{x}_0 \equiv (x_0, y_0)$ is a
point in the xy plane. When (x, y) moves in the direction of $\mathbf{v}$, this results in a change
in z = f (x, y) as shown in the picture. The directional derivative in this direction is
defined as
\[ \lim_{t\to 0} \frac{f(x_0 + tv_1, y_0 + tv_2) - f(x_0, y_0)}{t}. \]
It tells how fast z is changing in this direction. If you looked at it from the side, you
would be getting the slope of the indicated tangent line. A simple example of this is a
person climbing a mountain. He could go various directions, some steeper than others.
The directional derivative is just a measure of the steepness in a given direction. This
motivates the following general definition of the directional derivative.
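Definition 11.3.1 Let $f : U \to \mathbb{R}$ where U is an open set in $\mathbb{R}^n$ and let $\mathbf{v}$ be a unit
vector. For $\mathbf{x} \in U$, define the directional derivative of f in the direction, $\mathbf{v}$, at the
point $\mathbf{x}$ as
\[ D_{\mathbf{v}} f(\mathbf{x}) \equiv \lim_{t\to 0} \frac{f(\mathbf{x} + t\mathbf{v}) - f(\mathbf{x})}{t}. \]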
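Example 11.3.2 Find the directional derivative of the function, $f(x, y) = x^2 y$ in the
direction of $\mathbf{i} + \mathbf{j}$ at the point (1, 2).
First you need a unit vector which has the same direction as the given vector. This
unit vector is $\mathbf{v} \equiv \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$. Then to find the directional derivative from the definition,
\[ \frac{f(\mathbf{x} + t\mathbf{v}) - f(\mathbf{x})}{t} = \frac{\left(1 + \frac{t}{\sqrt{2}}\right)^2 \left(2 + \frac{t}{\sqrt{2}}\right) - 2}{t}. \]
Expanding the numerator and letting $t \to 0$ gives
\[ D_{\mathbf{v}} f(1, 2) = \frac{5}{2}\sqrt{2}. \]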
There is something you must keep in mind about this. The direction vector must
always be a unit vector.2
2 Actually, there is a more general formulation of the notion of directional derivative known as the
Gateaux derivative in which the length of v is not equal to one but it will not be considered.
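As a rough numerical check of Example 11.3.2, the limit in the definition can be approximated by a difference quotient with a small t. The sketch below is not part of the original notes; the helper name `directional_derivative` and the step size are assumptions chosen for illustration, and a symmetric quotient is used because it is more accurate than the one sided one.

```python
import math

def f(x, y):
    return x**2 * y

def directional_derivative(func, point, direction, t=1e-6):
    """Approximate D_v func(point), normalizing `direction` to a unit vector first."""
    norm = math.hypot(direction[0], direction[1])
    v1, v2 = direction[0] / norm, direction[1] / norm
    x0, y0 = point
    # Symmetric difference quotient along the line through the point in direction v.
    return (func(x0 + t * v1, y0 + t * v2) - func(x0 - t * v1, y0 - t * v2)) / (2 * t)

approx = directional_derivative(f, (1.0, 2.0), (1.0, 1.0))  # direction i + j
exact = 5 * math.sqrt(2) / 2                                 # value found in the example
print(approx, exact)
```

Note that the helper divides by the length of the direction vector before doing anything else, which is exactly the requirement that the direction be a unit vector.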
11.3.2
Partial Derivatives
There are some special unit vectors which come to mind immediately. These are the
vectors, $\mathbf{e}_i$ where
\[ \mathbf{e}_i = (0, \cdots, 0, 1, 0, \cdots, 0)^T \]
and the 1 is in the ith position.
Thus in case of a function of two variables, the directional derivative in the direction
$\mathbf{i} = \mathbf{e}_1$ is the slope of the indicated straight line in the following picture.
[Figure: the surface z = f (x, y) with a tangent line in the direction of e1]
As in the case of a general directional derivative, you fix y and take the derivative
of the function, $x \to f(x, y)$. More generally, even in situations which cannot be drawn,
the definition of a partial derivative is as follows.
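Definition 11.3.3 Let U be an open subset of $\mathbb{R}^n$ and let $f : U \to \mathbb{R}$. Then letting
$\mathbf{x} = (x_1, \cdots, x_n)^T$ be a typical element of $\mathbb{R}^n$,
\[ \frac{\partial f}{\partial x_i}(\mathbf{x}) \equiv D_{\mathbf{e}_i} f(\mathbf{x}). \]
This is called the partial derivative of f. Thus,
\[ \frac{\partial f}{\partial x_i}(\mathbf{x}) \equiv \lim_{t\to 0} \frac{f(\mathbf{x} + t\mathbf{e}_i) - f(\mathbf{x})}{t} = \lim_{t\to 0} \frac{f(x_1, \cdots, x_i + t, \cdots, x_n) - f(x_1, \cdots, x_i, \cdots, x_n)}{t}, \]
and to find the partial derivative, differentiate with respect to the variable of interest and
regard all the others as constants. Other notation for this partial derivative is $f_{x_i}$, $f_{,i}$,
or $D_i f$. If $y = f(\mathbf{x})$, the partial derivative of f with respect to $x_i$ may also be denoted
by
\[ \frac{\partial y}{\partial x_i} \text{ or } y_{x_i}. \]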
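Example 11.3.4 Find $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ if $f(x, y, z) = y\sin x + x^2 y + z$.
From the definition above, $\frac{\partial f}{\partial x} = y\cos x + 2xy$, $\frac{\partial f}{\partial y} = \sin x + x^2$, and $\frac{\partial f}{\partial z} = 1$. Having
taken one partial derivative, there is no reason to stop doing it. Thus, one could take the
partial derivative with respect to y of the partial derivative with respect to x, denoted
by $\frac{\partial^2 f}{\partial y \partial x}$ or $f_{xy}$. In the above example,
\[ \frac{\partial^2 f}{\partial y \partial x} = f_{xy} = \cos x + 2x. \]
Also,
\[ \frac{\partial^2 f}{\partial x \partial y} = f_{yx} = \cos x + 2x. \]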
Higher order partial derivatives are defined by analogy to the above. Thus in the
above example,
\[ f_{yxx} = -\sin x + 2. \]
These partial derivatives, $f_{xy}$ are called mixed partial derivatives.
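The mixed partial derivatives of Example 11.3.4 can be checked numerically by nesting difference quotients. The following is a minimal sketch, assuming a hypothetical helper `partial` that is not part of the notes; it differentiates in one coordinate by a symmetric difference quotient and returns a new function, so that the two orders of differentiation can be compared.

```python
import math

def f(x, y, z):
    return y * math.sin(x) + x * x * y + z

def partial(func, i, h=1e-3):
    """Return a function approximating the partial derivative of func in slot i."""
    def derivative(*p):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (func(*plus) - func(*minus)) / (2 * h)
    return derivative

f_xy = partial(partial(f, 0), 1)   # differentiate in x first, then in y
f_yx = partial(partial(f, 1), 0)   # differentiate in y first, then in x

point = (0.5, 1.5, 2.0)
expected = math.cos(point[0]) + 2 * point[0]   # cos x + 2x from the example
print(f_xy(*point), f_yx(*point), expected)
```

Both orders agree with cos x + 2x up to the error of the approximation, which is what the computation in the example predicts.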
There is an interesting relationship between the directional derivatives and the partial derivatives, provided the partial derivatives exist and are continuous.
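Definition 11.3.5 Suppose $f : U \subseteq \mathbb{R}^n \to \mathbb{R}$ where U is an open set and the partial
derivatives of f all exist and are continuous on U. Under these conditions, define the
gradient of f denoted $\nabla f(\mathbf{x})$ to be the vector
\[ \nabla f(\mathbf{x}) = \left(f_{x_1}(\mathbf{x}), f_{x_2}(\mathbf{x}), \cdots, f_{x_n}(\mathbf{x})\right)^T. \]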
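Example 11.3.7 Find the directional derivative of the function, $f(x, y) = \sin\left(2x^2 + y^3\right)$
at (1, 1) in the direction $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)^T$.
First find the gradient.
\[ \nabla f(x, y) = \left(4x\cos\left(2x^2 + y^3\right),\ 3y^2\cos\left(2x^2 + y^3\right)\right)^T. \]
Therefore,
\[ \nabla f(1, 1) = (4\cos(3),\ 3\cos(3))^T. \]
The directional derivative is therefore,
\[ (4\cos(3),\ 3\cos(3)) \cdot \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)^T = \frac{7}{2}(\cos 3)\sqrt{2}. \]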
Another important observation is that the gradient gives the direction in which the
function changes most rapidly.
Proposition 11.3.8 In the situation of Definition 11.3.5, suppose $\nabla f(\mathbf{x}) \neq \mathbf{0}$. Then
the direction in which f increases most rapidly, that is the direction in which the directional derivative is largest, is the direction of the gradient. Thus $\mathbf{v} = \nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|$ is
the unit vector which maximizes $D_{\mathbf{v}} f(\mathbf{x})$ and this maximum value is $|\nabla f(\mathbf{x})|$. Similarly,
$\mathbf{v} = -\nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|$ is the unit vector which minimizes $D_{\mathbf{v}} f(\mathbf{x})$ and this minimum
value is $-|\nabla f(\mathbf{x})|$.
Proof: Let $\mathbf{v}$ be any unit vector. Then from Proposition 11.3.6,
\[ D_{\mathbf{v}} f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot \mathbf{v} = |\nabla f(\mathbf{x})|\,|\mathbf{v}| \cos\theta = |\nabla f(\mathbf{x})| \cos\theta \]
where $\theta$ is the included angle between these two vectors, $\nabla f(\mathbf{x})$ and $\mathbf{v}$. Therefore,
$D_{\mathbf{v}} f(\mathbf{x})$ is maximized when $\cos\theta = 1$ and minimized when $\cos\theta = -1$. The first case
corresponds to the angle between the two vectors being 0 which requires they point in
the same direction in which case, it must be that $\mathbf{v} = \nabla f(\mathbf{x}) / |\nabla f(\mathbf{x})|$ and $D_{\mathbf{v}} f(\mathbf{x}) = |\nabla f(\mathbf{x})|$.
The second case occurs when $\theta$ is $\pi$ and in this case the two vectors point in
opposite directions and the directional derivative equals $-|\nabla f(\mathbf{x})|$.
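Proposition 11.3.8 can be illustrated by brute force: evaluate $\nabla f \cdot \mathbf{v}$ for many unit vectors $\mathbf{v}$ and see which one gives the largest value. The sketch below uses the function and gradient of Example 11.3.7 at the point (1, 1); the one degree sampling grid is an arbitrary choice made for this illustration.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = sin(2x^2 + y^3), as computed in Example 11.3.7."""
    c = math.cos(2 * x * x + y**3)
    return (4 * x * c, 3 * y * y * c)

gx, gy = grad_f(1.0, 1.0)
norm = math.hypot(gx, gy)          # |grad f(1, 1)|

# Directional derivative grad f . v for unit vectors v at angles 0, 1, ..., 359 degrees.
samples = []
for k in range(360):
    a = math.radians(k)
    samples.append((gx * math.cos(a) + gy * math.sin(a), k))

largest, k_max = max(samples)
smallest, k_min = min(samples)
gradient_angle = math.degrees(math.atan2(gy, gx)) % 360

print("largest sampled value :", largest, "at", k_max, "degrees")
print("smallest sampled value:", smallest, "at", k_min, "degrees")
print("|grad f| =", norm, " gradient points at", gradient_angle, "degrees")
```

The largest sampled value never exceeds $|\nabla f|$ and is attained at the grid angle nearest the direction of the gradient, while the smallest is attained in the opposite direction.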
The concept of a directional derivative for a vector valued function is also
easy to define although the geometric significance expressed in pictures is not.
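Definition 11.3.9 Let $\mathbf{f} : U \to \mathbb{R}^p$ where U is an open set in $\mathbb{R}^n$ and let $\mathbf{v}$ be a unit
vector. For $\mathbf{x} \in U$, define the directional derivative of $\mathbf{f}$ in the direction, $\mathbf{v}$, at the point
$\mathbf{x}$ as
\[ D_{\mathbf{v}} \mathbf{f}(\mathbf{x}) \equiv \lim_{t\to 0} \frac{\mathbf{f}(\mathbf{x} + t\mathbf{v}) - \mathbf{f}(\mathbf{x})}{t}. \]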
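Example 11.3.10 Let $\mathbf{f}(x, y) = \left(xy^2, yx\right)^T$. Find the directional derivative in the
direction $(1, 2)^T$ at the point (x, y).
First, a unit vector in this direction is $\left(1/\sqrt{5}, 2/\sqrt{5}\right)^T$ and from the definition, the
desired limit is
\[ \lim_{t\to 0} \frac{\left(\left(x + t\frac{1}{\sqrt{5}}\right)\left(y + t\frac{2}{\sqrt{5}}\right)^2 - xy^2,\ \left(x + t\frac{1}{\sqrt{5}}\right)\left(y + t\frac{2}{\sqrt{5}}\right) - xy\right)}{t} \]
\[ = \lim_{t\to 0} \left(\frac{4}{5}xy\sqrt{5} + \frac{4}{5}xt + \frac{1}{5}\sqrt{5}y^2 + \frac{4}{5}ty + \frac{4}{25}\sqrt{5}t^2,\ \frac{2}{5}x\sqrt{5} + \frac{1}{5}y\sqrt{5} + \frac{2}{5}t\right) \]
\[ = \left(\frac{4}{5}xy\sqrt{5} + \frac{1}{5}\sqrt{5}y^2,\ \frac{2}{5}x\sqrt{5} + \frac{1}{5}y\sqrt{5}\right). \]
You see from this example and the above definition that all you have to do is to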
You see from this example and the above definition that all you have to do is to
form the vector which is obtained by replacing each component of the vector with its
directional derivative. In particular, you can take partial derivatives of vector valued
functions and use the same notation.
Example 11.3.11 Find the partial derivative with respect to x of the function
$\mathbf{f}(x, y, z, w) = \left(xy^2, z\sin(xy), z^3 x\right)^T$.
From the above definition, $\mathbf{f}_x(x, y, z) = D_1\mathbf{f}(x, y, z) = \left(y^2, zy\cos(xy), z^3\right)^T$.
11.4
Under certain conditions the mixed partial derivatives will always be equal. This
astonishing fact is due to Euler in 1734.
Theorem 11.4.1 Suppose $f : U \subseteq \mathbb{R}^2 \to \mathbb{R}$ where U is an open set on which $f_x$, $f_y$,
$f_{xy}$ and $f_{yx}$ exist. Then if $f_{xy}$ and $f_{yx}$ are continuous at the point $(x, y) \in U$, it follows
\[ f_{xy}(x, y) = f_{yx}(x, y). \]
Proof: Since U is open, there exists $r > 0$ such that $B((x, y), r) \subseteq U$. Now let
$|t|, |s| < r/2$ and consider
\[ \Delta(s, t) \equiv \frac{1}{st}\Big\{\overbrace{f(x + t, y + s) - f(x + t, y)}^{h(t)} - \overbrace{\left(f(x, y + s) - f(x, y)\right)}^{h(0)}\Big\}. \tag{11.6} \]
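Note that $(x + t, y + s) \in U$ because
\[ |(x + t, y + s) - (x, y)| = |(t, s)| = \left(t^2 + s^2\right)^{1/2} \leq \left(\frac{r^2}{4} + \frac{r^2}{4}\right)^{1/2} = \frac{r}{\sqrt{2}} < r. \]
As implied above, $h(t) \equiv f(x + t, y + s) - f(x + t, y)$. Therefore, by the mean value theorem
there exists $\alpha \in (0, 1)$ such that
\[ \Delta(s, t) = \frac{1}{st}(h(t) - h(0)) = \frac{1}{st}h'(\alpha t)t = \frac{1}{s}\left(f_x(x + \alpha t, y + s) - f_x(x + \alpha t, y)\right). \]
Applying the mean value theorem again, this time in the second variable, there exists $\beta \in (0, 1)$
such that $\Delta(s, t) = f_{xy}(x + \alpha t, y + \beta s)$. Since $f_{xy}$ is continuous at (x, y),
\[ \lim_{(s,t)\to(0,0)} \Delta(s, t) = f_{xy}(x, y). \]
Grouping the terms of 11.6 the other way, as $f(x + t, y + s) - f(x, y + s)$ and $f(x + t, y) - f(x, y)$,
and repeating the argument with the roles of the two variables interchanged gives
$\lim_{(s,t)\to(0,0)} \Delta(s, t) = f_{yx}(x, y)$. Hence $f_{xy}(x, y) = f_{yx}(x, y)$.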
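\[ f(x, y) = \begin{cases} \dfrac{xy\left(x^2 - y^2\right)}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\ 0 & \text{if } (x, y) = (0, 0) \end{cases} \]
Here is a picture of the graph of this function. It looks innocuous but isn't.
For $(x, y) \neq (0, 0)$,
\[ f_x = y\,\frac{x^4 - y^4 + 4x^2y^2}{\left(x^2 + y^2\right)^2}, \quad f_y = x\,\frac{x^4 - y^4 - 4x^2y^2}{\left(x^2 + y^2\right)^2}. \]
Now
\[ f_{xy}(0, 0) \equiv \lim_{y\to 0} \frac{f_x(0, y) - f_x(0, 0)}{y} = \lim_{y\to 0} \frac{-y^4}{\left(y^2\right)^2} = -1 \]
while
\[ f_{yx}(0, 0) \equiv \lim_{x\to 0} \frac{f_y(x, 0) - f_y(0, 0)}{x} = \lim_{x\to 0} \frac{x^4}{\left(x^2\right)^2} = 1 \]
showing that although the mixed partial derivatives do exist at (0, 0), they are not equal
there.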
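The failure of equality at the origin can also be seen numerically. The sketch below is only an illustration: it approximates $f_x$ and $f_y$ by symmetric difference quotients with a very small inner step h, and then forms the outer difference quotients at (0, 0) with a larger step k. The step sizes and helper names are assumptions made for this sketch, and the inner step must be much smaller than the outer one for the approximation to be meaningful.

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y, h=1e-8):
    """Approximate the partial derivative with respect to x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def f_y(x, y, h=1e-8):
    """Approximate the partial derivative with respect to y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3
f_xy_origin = (f_x(0.0, k) - f_x(0.0, 0.0)) / k   # derivative in y of f_x at (0, 0)
f_yx_origin = (f_y(k, 0.0) - f_y(0.0, 0.0)) / k   # derivative in x of f_y at (0, 0)
print(f_xy_origin, f_yx_origin)
```

The two printed numbers have opposite signs, in agreement with the limits computed above.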
11.5
Partial differential equations are equations which involve the partial derivatives of
some function. The most famous partial differential equations involve the Laplacian,
named after Laplace.3
Definition 11.5.1 Let u be a function of n variables. Then $\Delta u \equiv \sum_{k=1}^{n} u_{x_k x_k}$. This
is also written as $\nabla^2 u$. The symbol, $\Delta$ or $\nabla^2$ is called the Laplacian. When $\Delta u = 0$ the
function, u is called harmonic. Laplace's equation is $\Delta u = 0$. The heat equation
is $u_t - \Delta u = 0$ and the wave equation is $u_{tt} - \Delta u = 0$.
Example 11.5.2 Find the Laplacian of $u(x, y) = x^2 - y^2$.
$u_{xx} = 2$ while $u_{yy} = -2$. Therefore, $\Delta u = u_{xx} + u_{yy} = 2 - 2 = 0$. Thus this function
is harmonic, $\Delta u = 0$.
Example 11.5.3 Find $u_t - \Delta u$ where $u(t, x, y) = e^{-t}\cos x$.
In this case, $u_t = -e^{-t}\cos x$ while $u_{yy} = 0$ and $u_{xx} = -e^{-t}\cos x$ therefore, $u_t - \Delta u = 0$
and so u solves the heat equation, $u_t - \Delta u = 0$.
Example 11.5.4 Let $u(t, x) = \sin t\cos x$. Find $u_{tt} - \Delta u$.
In this case, $u_{tt} = -\sin t\cos x$ while $\Delta u = -\sin t\cos x$. Therefore, u is a solution of
the wave equation, $u_{tt} - \Delta u = 0$.
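Examples 11.5.2 and 11.5.3 can be checked numerically by replacing each second derivative with a second difference. This is a minimal sketch, not a method taken from the notes; the helper names, the sample points and the step size h are assumptions made for illustration.

```python
import math

def laplacian_2d(u, x, y, h=1e-3):
    """Approximate u_xx + u_yy at (x, y) using second differences."""
    u_xx = (u(x + h, y) - 2 * u(x, y) + u(x - h, y)) / h**2
    u_yy = (u(x, y + h) - 2 * u(x, y) + u(x, y - h)) / h**2
    return u_xx + u_yy

def u_harmonic(x, y):
    return x * x - y * y

def u_heat(t, x, y):
    return math.exp(-t) * math.cos(x)

def heat_residual(t, x, y, h=1e-3):
    """Approximate u_t minus the Laplacian of u for u_heat; near 0 if the heat equation holds."""
    u_t = (u_heat(t + h, x, y) - u_heat(t - h, x, y)) / (2 * h)
    lap = laplacian_2d(lambda a, b: u_heat(t, a, b), x, y, h)
    return u_t - lap

print(laplacian_2d(u_harmonic, 0.7, -1.3))   # close to 0: the function is harmonic
print(heat_residual(0.5, 1.2, 0.3))          # close to 0: the heat equation holds
```

Both printed values are zero up to the error of the approximation, which only suggests, and does not prove, what the exact computations in the examples establish.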
11.6
Exercises
5. Find the smallest value of the directional derivative of $f(x, y, z) = x\sin\left(4xy^2\right) + z^2$
at the point (1, 1, 1).
3 Laplace was a great physicist of the 1700s. He made fundamental contributions to mechanics and
astronomy.
$\mathbf{f}(x, y, z, w) = \left(y^2, z^2\sin(xy), z^3 x\right)^T$.
8. Find the partial derivative with respect to x of the function
$\mathbf{f}(x, y, z, w) = \left(wx, zx\sin(xy), z^3 x\right)^T$.
9. Find $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, and $\frac{\partial f}{\partial z}$ for f =
+y 2
z sin (x + y)
2 3
(c) z 2 sin3 ex +y
+z
10. Suppose
(
f (x, y) =
Find
f
x
if (x, y) 6= (0, 0)
0 if (x, y) = (0, 0) .
(0, 0) and
f
y
(0, 0) .
11. Why must the vector in the definition of the directional derivative be a unit vector?
Hint: Suppose not. Would the directional derivative be a correct manifestation
of steepness?
12. Find $f_x, f_y, f_z, f_{xy}, f_{yx}, f_{xz}, f_{zx}, f_{zy}, f_{yz}$ for the following. Verify the mixed partial
derivatives are equal.
(a) $x^2 y^3 z^4 + \sin(xyz)$
(b) $\sin(xyz) + x^2 yz$
(c) $z\ln\left(x^2 + y^2 + 1\right)$
(d) $e^{x^2 + y^2 + z^2}$
is as small as possible. In other words, you want to minimize the function of two
variables, $f(a, b) \equiv \sum_{i=1}^{p} \left(at_i + b - x_i\right)^2$. Find a formula for a and b in terms
of the given ordered pairs. You will be finding the formula for the least squares
regression line.
15. Show that if $v(x, y) = u(\alpha x, \beta y)$, then $v_x = \alpha u_x$ and $v_y = \beta u_y$. State and prove
a generalization to any number of variables.
16. Let f be a function which has continuous derivatives. Show $u(t, x) = f(x - ct)$
solves the wave equation, $u_{tt} - c^2\Delta u = 0$. What about $u(x, t) = f(x + ct)$?
17. D'Alembert found a formula for the solution to the wave equation, $u_{tt} = c^2 u_{xx}$
along with the initial conditions $u(x, 0) = f(x)$, $u_t(x, 0) = g(x)$. Here is how he
did it. He looked for a solution of the form $u(x, t) = h(x + ct) + k(x - ct)$ and
then found h and k in terms of the given functions f and g. He ended up with
something like
\[ u(x, t) = \frac{1}{2c}\int_{x - ct}^{x + ct} g(r)\,dr + \frac{1}{2}\left(f(x + ct) + f(x - ct)\right). \]
Fill in the details.
18. Determine which of the following functions satisfy Laplace's equation.
(a) $x^3 - 3xy^2$
(b) $3x^2 y - y^3$
(c) $x^3 - 3xy^2 + 2x^2 - 2y^2$
(d) $3x^2 y - y^3 + 4xy$
(e) $3x^2 - y^3 + 4xy$
(f) $3x^2 y - y^3 + 4y$
(g) $x^3 - 3x^2 y^2 + 2x^2 - 2y^2$
19. Show that $z = \frac{xy}{y - x}$ is a solution to the partial differential equation,
\[ x^2\frac{\partial^2 z}{\partial x^2} + 2xy\frac{\partial^2 z}{\partial x \partial y} + y^2\frac{\partial^2 z}{\partial y^2} = 0. \]
p
z
z
+ y y
= 0.
20. Show that z = x2 + y 2 is a solution to x x
1 x2 /4c2 t
e
t
Outcomes
1. Define differentiability and explain what the derivative is for a function of n variables.
2. Describe the relation between existence of partial derivatives, continuity, and differentiability.
3. Give examples of functions which have partial derivatives but are not continuous, examples of functions which are differentiable but not $C^1$, and examples of
functions which are continuous without having partial derivatives.
4. Evaluate derivatives of composite functions using the chain rule.
5. Solve related rates problems using the chain rule.
12.1
This observation follows from the definition of the derivative of a function of one variable, namely
\[ f'(x) \equiv \lim_{h\to 0} \frac{f(x + h) - f(x)}{h}. \]
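Definition 12.1.2 A vector valued function of a vector, $\mathbf{v}$ is called $\mathbf{o}(\mathbf{v})$ if
\[ \lim_{|\mathbf{v}|\to 0} \frac{\mathbf{o}(\mathbf{v})}{|\mathbf{v}|} = \mathbf{0}. \tag{12.1} \]
Thus the function $f(x + h) - f(x) - f'(x)h$ is $o(h)$. The expression, $o(h)$, is used
like an adjective. It is like saying the function is white or black or green or fat or thin.
The term is used very imprecisely. Thus
\[ \mathbf{o}(\mathbf{v}) = \mathbf{o}(\mathbf{v}) + \mathbf{o}(\mathbf{v}), \quad \mathbf{o}(\mathbf{v}) = 45\,\mathbf{o}(\mathbf{v}), \quad \mathbf{o}(\mathbf{v}) = \mathbf{o}(\mathbf{v}) - \mathbf{o}(\mathbf{v}), \text{ etc.} \]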
When you add two functions with the property of the above definition, you get another
one having that same property. When you multiply by 45 the property is also retained
as it is when you subtract two such functions. How could something so sloppy be useful?
The notation is useful precisely because it prevents you from obsessing over things which
are not relevant and should be ignored.
Theorem 12.1.3 Let $f : (a, b) \to \mathbb{R}$ be a function of one variable. Then $f'(x)$ exists
if and only if there exists a number p such that
\[ f(x + h) - f(x) = ph + o(h). \tag{12.2} \]
In this case, $p = f'(x)$.
Proof: From the above observation it follows that if $f'(x)$ does exist, then 12.2
holds.
Suppose then that 12.2 is true. Then
\[ \frac{f(x + h) - f(x)}{h} - p = \frac{o(h)}{h}. \]
Taking a limit, you see that
\[ p = \lim_{h\to 0} \frac{f(x + h) - f(x)}{h} \]
and that in fact this limit exists which shows that $p = f'(x)$. This proves the theorem.
This theorem shows that one way to define $f'(x)$ is as the number, p, if there is one
which has the property that
\[ f(x + h) = f(x) + ph + o(h). \]
You should think of p as the linear transformation resulting from multiplication by the
$1 \times 1$ matrix, (p).
Example 12.1.4 Let $f(x) = x^3$. Find $f'(x)$.
\[ f(x + h) = (x + h)^3 = x^3 + 3x^2 h + 3xh^2 + h^3 = f(x) + 3x^2 h + \left(3xh + h^2\right)h. \]
Since $\left(3xh + h^2\right)h = o(h)$, it follows $f'(x) = 3x^2$.
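To see concretely what it means for the leftover term to be o(h), one can divide it by h and watch the quotient shrink as h does. The following sketch does this for $f(x) = x^3$ at x = 2; the function name `remainder_ratio` and the chosen values of h are assumptions made for this illustration.

```python
def remainder_ratio(x, h):
    """Return (f(x + h) - f(x) - 3x^2 h) / h for f(x) = x^3."""
    return ((x + h) ** 3 - x**3 - 3 * x * x * h) / h

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # Algebraically this quotient equals 3xh + h^2, which tends to 0 with h.
    print(h, remainder_ratio(2.0, h))
```

The printed quotients decrease roughly in proportion to h, which is the behavior required by 12.1 with p = 3x^2.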
Example 12.1.5 Let $f(x) = \sin(x)$. Find $f'(x)$.
\[ f(x + h) - f(x) = \sin(x + h) - \sin(x) = \sin(x)\cos(h) + \cos(x)\sin(h) - \sin(x) \]
\[ = \cos(x)\sin(h) + \sin(x)\frac{(\cos(h) - 1)}{h}h \]
\[ = \cos(x)h + \cos(x)\frac{(\sin(h) - h)}{h}h + \sin(x)\frac{(\cos(h) - 1)}{h}h. \]
Now
\[ \cos(x)\frac{(\sin(h) - h)}{h}h + \sin(x)\frac{(\cos(h) - 1)}{h}h = o(h). \tag{12.3} \]
Remember the fundamental limits which allowed you to find the derivative of sin (x)
were
\[ \lim_{h\to 0} \frac{\sin(h)}{h} = 1, \quad \lim_{h\to 0} \frac{\cos(h) - 1}{h} = 0. \tag{12.4} \]
These same limits are what is needed to verify 12.3.
12.2
This way of thinking about the derivative is exactly what is needed to define the derivative of a function of n variables. Recall the following definition.
Definition 12.2.1 A function, T which maps $\mathbb{R}^n$ to $\mathbb{R}^p$ is called a linear transformation
if for every pair of scalars, a, b and vectors, $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, it follows that $T(a\mathbf{x} + b\mathbf{y}) = aT(\mathbf{x}) + bT(\mathbf{y})$.
Recall that from the properties of matrix multiplication, it follows that if A is a
$p \times n$ matrix, and if $\mathbf{x}, \mathbf{y}$ are vectors in $\mathbb{R}^n$, then $A(a\mathbf{x} + b\mathbf{y}) = aA(\mathbf{x}) + bA(\mathbf{y})$. Thus you
can define a linear transformation by multiplying by a matrix. Of course the simplest
example is that of a $1 \times 1$ matrix or number. You can think of the number 3 as a
linear transformation, T mapping $\mathbb{R}$ to $\mathbb{R}$ according to the rule $Tx = 3x$. It satisfies
the properties needed for a linear transformation because $3(ax + by) = a3x + b3y = aTx + bTy$.
The case of the derivative of a scalar valued function of one variable is of
this sort. You get a number for the derivative. However, you can think of this number
as a linear transformation. Of course it is not worth the fuss to do so for a function of
one variable but this is the way you must think of it for a function of n variables.
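Definition 12.2.2 Let $\mathbf{f} : U \to \mathbb{R}^p$ where U is an open set in $\mathbb{R}^n$ for $n, p \geq 1$ and let
$\mathbf{x} \in U$ be given. Then $\mathbf{f}$ is defined to be differentiable at $\mathbf{x} \in U$ if and only if there
exist column vectors, $\mathbf{v}_i$ such that for $\mathbf{h} = (h_1, \cdots, h_n)^T$,
\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \sum_{i=1}^{n} \mathbf{v}_i h_i + \mathbf{o}(\mathbf{h}). \tag{12.5} \]
The derivative of the function, $\mathbf{f}$, denoted by $D\mathbf{f}(\mathbf{x})$, is the linear transformation defined
by multiplying by the matrix whose columns are the $p \times 1$ vectors, $\mathbf{v}_i$. Thus if $\mathbf{w}$ is a
vector in $\mathbb{R}^n$,
\[ D\mathbf{f}(\mathbf{x})\mathbf{w} \equiv \begin{pmatrix} | & & | \\ \mathbf{v}_1 & \cdots & \mathbf{v}_n \\ | & & | \end{pmatrix}\mathbf{w}. \]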
It is common to think of this matrix as the derivative but strictly speaking, this
is incorrect. The derivative is a linear transformation determined by multiplication
by this matrix, called the standard matrix because it is based on the standard basis
vectors for Rn . The subtle issues involved in a thorough exploration of this issue will
be avoided for now. It will be fine to think of the above matrix as the derivative.
Other notations which are often used for this matrix or the linear transformation are
$\mathbf{f}'(\mathbf{x})$, $J(\mathbf{x})$, and even $\frac{\partial\mathbf{f}}{\partial\mathbf{x}}$ or $\frac{d\mathbf{f}}{d\mathbf{x}}$.
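Theorem 12.2.3 Suppose $\mathbf{f}$ is as given above in 12.5. Then
\[ \mathbf{v}_k = \lim_{h\to 0} \frac{\mathbf{f}(\mathbf{x} + h\mathbf{e}_k) - \mathbf{f}(\mathbf{x})}{h} \equiv \frac{\partial\mathbf{f}}{\partial x_k}(\mathbf{x}). \]
Proof: Let $\mathbf{h} = (0, \cdots, h, 0, \cdots, 0)^T = h\mathbf{e}_k$ where the h is in the kth slot. Then 12.5
reduces to
\[ \mathbf{f}(\mathbf{x} + \mathbf{h}) = \mathbf{f}(\mathbf{x}) + \mathbf{v}_k h + \mathbf{o}(h). \]
Therefore, dividing by h
\[ \frac{\mathbf{f}(\mathbf{x} + h\mathbf{e}_k) - \mathbf{f}(\mathbf{x})}{h} = \mathbf{v}_k + \frac{\mathbf{o}(h)}{h} \]
and so
\[ \lim_{h\to 0} \frac{\mathbf{f}(\mathbf{x} + h\mathbf{e}_k) - \mathbf{f}(\mathbf{x})}{h} = \lim_{h\to 0}\left(\mathbf{v}_k + \frac{\mathbf{o}(h)}{h}\right) = \mathbf{v}_k \]
and so, the above limit exists. This proves the theorem.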
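Let $\mathbf{f} : U \to \mathbb{R}^q$ where U is an open subset of $\mathbb{R}^p$ and $\mathbf{f}$ is differentiable. It was just
shown
\[ \mathbf{f}(\mathbf{x} + \mathbf{v}) = \mathbf{f}(\mathbf{x}) + \sum_{j=1}^{p} \frac{\partial\mathbf{f}(\mathbf{x})}{\partial x_j}v_j + \mathbf{o}(\mathbf{v}). \]
Taking the ith coordinate of the above equation yields
\[ f_i(\mathbf{x} + \mathbf{v}) = f_i(\mathbf{x}) + \sum_{j=1}^{p} \frac{\partial f_i(\mathbf{x})}{\partial x_j}v_j + o(\mathbf{v}) \]
and it follows that the term with a sum is nothing more than the ith component of
$J(\mathbf{x})\mathbf{v}$ where $J(\mathbf{x})$ is the $q \times p$ matrix,
\[ \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_p} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_q}{\partial x_1} & \frac{\partial f_q}{\partial x_2} & \cdots & \frac{\partial f_q}{\partial x_p} \end{pmatrix}. \]
This gives the form of the matrix which defines the linear transformation, $D\mathbf{f}(\mathbf{x})$. Thus
\[ \mathbf{f}(\mathbf{x} + \mathbf{v}) = \mathbf{f}(\mathbf{x}) + J(\mathbf{x})\mathbf{v} + \mathbf{o}(\mathbf{v}) \tag{12.6} \]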
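Since column j of $J(\mathbf{x})$ is the partial derivative of $\mathbf{f}$ with respect to $x_j$, the matrix can be approximated one column at a time by difference quotients. The sketch below is an illustration only: the helper name `jacobian`, the step size, and the sample function $\mathbf{f}(x, y) = \left(x^2 y + y^2, y^3 x\right)^T$ evaluated at (1, 2) are choices made for this example.

```python
def f(x, y):
    return (x * x * y + y * y, y**3 * x)

def jacobian(func, point, h=1e-6):
    """Approximate the q x p matrix of partial derivatives of func at point."""
    p = len(point)
    q = len(func(*point))
    J = [[0.0] * p for _ in range(q)]
    for j in range(p):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(q):
            # Entry (i, j) approximates the partial of f_i with respect to x_j.
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

J = jacobian(f, (1.0, 2.0))
for row in J:
    print(row)
```

Differentiating by hand gives the rows (2xy, x^2 + 2y) and (y^3, 3y^2 x), so at (1, 2) the printed rows should be close to (4, 5) and (8, 12).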
12.3
$C^1$ Functions
Given a function of many variables, how can you tell if it is differentiable? Sometimes
you have to go directly to the definition and verify it is differentiable from the definition. For example, you may have seen the following important example in one variable
calculus.
Example 12.3.1 Let
\[ f(x) = \begin{cases} x^2\sin\left(\frac{1}{x}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}. \]
Find $Df(0)$.
$f(h) - f(0) = 0h + h^2\sin\left(\frac{1}{h}\right) = o(h)$ and so $Df(0) = 0$. If you find the derivative
for $x \neq 0$, it is totally useless information if what you want is $Df(0)$. This is because
the derivative turns out to be discontinuous. Try it. Find the derivative for $x \neq 0$ and
try to obtain $Df(0)$ from it. You see, in this example you had to revert to the definition
to find the derivative.
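Both halves of this example can be seen numerically. In the sketch below, which is only an illustration, the difference quotient at 0 shrinks with h, while the formula $2x\sin(1/x) - \cos(1/x)$, obtained from the product and chain rules for $x \neq 0$, stays near $-1$ along the points $x = 1/(2\pi n)$ no matter how close they are to 0. The function names and sample points are assumptions made for this sketch.

```python
import math

def f(x):
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0

def fprime_away_from_zero(x):
    """Derivative of f for x != 0, from the product and chain rules."""
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

steps = (1e-2, 1e-4, 1e-6)
quotients = [(f(h) - f(0.0)) / h for h in steps]          # difference quotients at 0
slopes = [fprime_away_from_zero(1.0 / (2 * math.pi * n))  # derivative at points tending to 0
          for n in (10, 100, 1000)]

print("difference quotients at 0:", quotients)
print("derivative near 0:        ", slopes)
```

The first list tends to 0, consistent with Df(0) = 0, while the second does not, so the derivative cannot be continuous at 0.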
It isn't really too hard to use the definition even for more ordinary examples.
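Example 12.3.2 Let $\mathbf{f}(x, y) = \begin{pmatrix} x^2 y + y^2 \\ y^3 x \end{pmatrix}$. Find $D\mathbf{f}(1, 2)$.
First of all note that the thing you are after is a $2 \times 2$ matrix.
\[ \mathbf{f}(1, 2) = \begin{pmatrix} 6 \\ 8 \end{pmatrix}. \]
Then
\[ \mathbf{f}(1 + h_1, 2 + h_2) - \mathbf{f}(1, 2) = \begin{pmatrix} (1 + h_1)^2(2 + h_2) + (2 + h_2)^2 \\ (2 + h_2)^3(1 + h_1) \end{pmatrix} - \begin{pmatrix} 6 \\ 8 \end{pmatrix} \]
\[ = \begin{pmatrix} 4 & 5 \\ 8 & 12 \end{pmatrix}\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} + \begin{pmatrix} 2h_1 h_2 + 2h_1^2 + h_1^2 h_2 + h_2^2 \\ 12h_1 h_2 + 6h_2^2 + 6h_2^2 h_1 + h_2^3 + h_2^3 h_1 \end{pmatrix} \]
\[ = \begin{pmatrix} 4 & 5 \\ 8 & 12 \end{pmatrix}\begin{pmatrix} h_1 \\ h_2 \end{pmatrix} + \mathbf{o}(\mathbf{h}). \]
Therefore, the standard matrix of the derivative is
\[ \begin{pmatrix} 4 & 5 \\ 8 & 12 \end{pmatrix}. \]
Most of the time, there is an easier way to conclude a derivative exists and to find
it. It involves the notion of a $C^1$ function.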
Definition 12.3.3 When $\mathbf{f} : U \to \mathbb{R}^p$ for U an open subset of $\mathbb{R}^n$ and the vector valued
functions, $\frac{\partial\mathbf{f}}{\partial x_i}$ are all continuous, (equivalently each $\frac{\partial f_i}{\partial x_j}$ is continuous), the function is
said to be $C^1(U)$. If all the partial derivatives up to order k exist and are continuous,
then the function is said to be $C^k$.
It turns out that for a C 1 function, all you have to do is write the matrix described
in Theorem 12.2.3 and this will be the derivative. There is no question of existence for
the derivative for such functions. This is the importance of the next few theorems.
Theorem 12.3.4 Let U be an open subset of $\mathbb{R}^2$ and suppose $f : U \to \mathbb{R}$ has the
property that the partial derivatives $f_x$ and $f_y$ exist for $(x, y) \in U$ and are continuous
at the point $(x_0, y_0)$. Then
\[ f((x_0, y_0) + (v_1, v_2)) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)v_1 + \frac{\partial f}{\partial y}(x_0, y_0)v_2 + o(\mathbf{v}). \]
That is, f is differentiable.
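Proof:
\[ f((x_0, y_0) + (v_1, v_2)) - \left(f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)v_1 + \frac{\partial f}{\partial y}(x_0, y_0)v_2\right) \tag{12.7} \]
\[ = \left(f(x_0 + v_1, y_0 + v_2) - f(x_0, y_0)\right) - \left(\frac{\partial f}{\partial x}(x_0, y_0)v_1 + \frac{\partial f}{\partial y}(x_0, y_0)v_2\right) \]
\[ = \left(f(x_0 + v_1, y_0 + v_2) - f(x_0, y_0 + v_2)\right) + \left(f(x_0, y_0 + v_2) - f(x_0, y_0)\right) - \left(\frac{\partial f}{\partial x}(x_0, y_0)v_1 + \frac{\partial f}{\partial y}(x_0, y_0)v_2\right). \]
By the mean value theorem, there exist numbers s and t in [0, 1] such that this equals
\[ = \left(\frac{\partial f}{\partial x}(x_0 + tv_1, y_0 + v_2)v_1 + \frac{\partial f}{\partial y}(x_0, y_0 + sv_2)v_2\right) - \left(\frac{\partial f}{\partial x}(x_0, y_0)v_1 + \frac{\partial f}{\partial y}(x_0, y_0)v_2\right) \]
\[ = \left(\frac{\partial f}{\partial x}(x_0 + tv_1, y_0 + v_2) - \frac{\partial f}{\partial x}(x_0, y_0)\right)v_1 + \left(\frac{\partial f}{\partial y}(x_0, y_0 + sv_2) - \frac{\partial f}{\partial y}(x_0, y_0)\right)v_2. \]
Therefore, letting $o(\mathbf{v})$ denote the expression in 12.7, and noticing that $|v_1|$ and $|v_2|$ are
both no larger than $|\mathbf{v}|$,
\[ |o(\mathbf{v})| \leq \left(\left|\frac{\partial f}{\partial x}(x_0 + tv_1, y_0 + v_2) - \frac{\partial f}{\partial x}(x_0, y_0)\right| + \left|\frac{\partial f}{\partial y}(x_0, y_0 + sv_2) - \frac{\partial f}{\partial y}(x_0, y_0)\right|\right)|\mathbf{v}|. \]
It follows
\[ \frac{|o(\mathbf{v})|}{|\mathbf{v}|} \leq \left|\frac{\partial f}{\partial x}(x_0 + tv_1, y_0 + v_2) - \frac{\partial f}{\partial x}(x_0, y_0)\right| + \left|\frac{\partial f}{\partial y}(x_0, y_0 + sv_2) - \frac{\partial f}{\partial y}(x_0, y_0)\right|. \]
Since the partial derivatives are continuous at $(x_0, y_0)$, the right side converges to 0 as $\mathbf{v} \to \mathbf{0}$,
and so $o(\mathbf{v})$ satisfies 12.1.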
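Proof: Let $\mathbf{f}(\mathbf{x}) \equiv (f_1(\mathbf{x}), \cdots, f_q(\mathbf{x}))^T$. From the assumption each component
function is differentiable, the following holds for each $k = 1, \cdots, q$.
\[ f_k(\mathbf{x}_0 + \mathbf{v}) = f_k(\mathbf{x}_0) + \sum_{i=1}^{p} \frac{\partial f_k}{\partial x_i}(\mathbf{x}_0)v_i + o_k(\mathbf{v}). \]
Define $\mathbf{o}(\mathbf{v}) \equiv (o_1(\mathbf{v}), \cdots, o_q(\mathbf{v}))^T$. Then 12.1 on Page 243 holds for $\mathbf{o}(\mathbf{v})$ because it
holds for each of the components of $\mathbf{o}(\mathbf{v})$. The above equation is then equivalent to
\[ \mathbf{f}(\mathbf{x}_0 + \mathbf{v}) = \mathbf{f}(\mathbf{x}_0) + \sum_{i=1}^{p} \frac{\partial\mathbf{f}}{\partial x_i}(\mathbf{x}_0)v_i + \mathbf{o}(\mathbf{v}) \]
and so $\mathbf{f}$ is differentiable at $\mathbf{x}_0$.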
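Here is an example to illustrate.
Example 12.3.6 Let $\mathbf{f}(x, y) = \begin{pmatrix} x^2 y + y^2 \\ y^3 x \end{pmatrix}$. Find $D\mathbf{f}(x, y)$.
From Theorem 12.3.4 this function is differentiable because all possible partial
derivatives are continuous. Thus
\[ D\mathbf{f}(x, y) = \begin{pmatrix} 2xy & x^2 + 2y \\ y^3 & 3y^2 x \end{pmatrix}. \]
In particular,
\[ D\mathbf{f}(1, 2) = \begin{pmatrix} 4 & 5 \\ 8 & 12 \end{pmatrix}. \]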
Not surprisingly, the above theorem has an extension to more variables. First this
is illustrated with an example.
Example 12.3.7 Let $\mathbf{f}(x_1, x_2, x_3) = \begin{pmatrix} x_1^2 x_2 + x_2^2 \\ x_2 x_1 + x_3 \\ \sin(x_1 x_2 x_3) \end{pmatrix}$. Find $D\mathbf{f}(x_1, x_2, x_3)$.
All possible partial derivatives are continuous so the function is differentiable. The
matrix for this derivative is therefore the following $3 \times 3$ matrix
\[ \begin{pmatrix} 2x_1 x_2 & x_1^2 + 2x_2 & 0 \\ x_2 & x_1 & 1 \\ x_2 x_3\cos(x_1 x_2 x_3) & x_1 x_3\cos(x_1 x_2 x_3) & x_1 x_2\cos(x_1 x_2 x_3) \end{pmatrix} \]
The following theorem is the general result.
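Theorem 12.3.8 Let U be an open subset of $\mathbb{R}^p$ for $p \geq 1$ and suppose $f : U \to \mathbb{R}$ has
the property that the partial derivatives $f_{x_i}$ exist for all $\mathbf{x} \in U$ and are continuous at
the point $\mathbf{x}_0 \in U$. Then
\[ f(\mathbf{x}_0 + \mathbf{v}) = f(\mathbf{x}_0) + \sum_{i=1}^{p} \frac{\partial f}{\partial x_i}(\mathbf{x}_0)v_i + o(\mathbf{v}). \]
That is, f is differentiable at $\mathbf{x}_0$ and the derivative of f equals the linear transformation
obtained by multiplying by the $1 \times p$ matrix,
\[ \left(\frac{\partial f}{\partial x_1}(\mathbf{x}_0), \cdots, \frac{\partial f}{\partial x_p}(\mathbf{x}_0)\right). \]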
T
Proof: The proof is similar to the case of two variables. Letting v = (v1 , vp ) ,
denote by i v the vector
T
(0, , 0, vi , vi+1 , , vp )
T
!
p
X
f
f (x0 + v) f (x0 ) +
(x0 ) vi
xi
i=1
p
}|
{ X
f
z
=
(x0 ) vi
f (x0 + i1 v) f (x0 + i v)
xi
i=1
i=1
p
X
(12.8)
Now by the mean value theorem there exist numbers si (0, 1) such that the above
expression equals
p
p
X
X
f
f
=
(x0 + i v+si vi ) vi
(x0 ) vi
x
x
i
i
i=1
i=1
p
X
f
|o (v)|
xi (x0 + i v + si vi ) xi (x0 ) |vi |
i=1
p
X
f
p
X
f
|o (v)|
f
lim
lim
(x0 + i v + si vi )
(x0 ) = 0
v0 |v|
v0
xi
xi
i=1
f (x0 ) +
p
X
f
(x0 ) (xi x0i ) + o (v)
x
i
i=1
p
X
f
f (x0 ) +
(x0 ) vi + o (v) .
x
i
i=1
\[ f(1, 2, 3) + 1(x - 1) + 2(y - 2) + 6(z - 3) = 11 + 1(x - 1) + 2(y - 2) + 6(z - 3) = -12 + x + 2y + 6z \]
In the case where f has values in Rq rather than R, is there a similar theorem about
differentiability of a C 1 function?
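Theorem 12.3.10 Let U be an open subset of $\mathbb{R}^p$ for $p \geq 1$ and suppose $\mathbf{f} : U \to \mathbb{R}^q$
has the property that the partial derivatives $\mathbf{f}_{x_i}$ exist for all $\mathbf{x} \in U$ and are continuous
at the point $\mathbf{x}_0 \in U$, then
\[ \mathbf{f}(\mathbf{x}_0 + \mathbf{v}) = \mathbf{f}(\mathbf{x}_0) + \sum_{i=1}^{p} \frac{\partial\mathbf{f}}{\partial x_i}(\mathbf{x}_0)v_i + \mathbf{o}(\mathbf{v}) \tag{12.9} \]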
and so f is differentiable at x0 .
Proof: This follows from Theorem 12.3.5.
When a function is differentiable at x0 it follows the function must be continuous
there. This is the content of the following important lemma.
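Lemma 12.3.11 Let $\mathbf{f} : U \to \mathbb{R}^q$ where U is an open subset of $\mathbb{R}^p$. If $\mathbf{f}$ is differentiable
at $\mathbf{x}_0$, then $\mathbf{f}$ is continuous at $\mathbf{x}_0$. Furthermore, if $C \geq \max\left\{\left|\frac{\partial\mathbf{f}}{\partial x_i}(\mathbf{x}_0)\right|, i = 1, \cdots, p\right\}$, then
whenever $|\mathbf{x} - \mathbf{x}_0|$ is small enough,
\[ |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| \leq (Cp + 1)|\mathbf{x} - \mathbf{x}_0| \tag{12.10} \]
Proof: Suppose $\mathbf{f}$ is differentiable. Since $\mathbf{o}(\mathbf{v})$ satisfies 12.1, there exists $\delta_1 > 0$ such
that if $|\mathbf{x} - \mathbf{x}_0| < \delta_1$, then $|\mathbf{o}(\mathbf{x} - \mathbf{x}_0)| < |\mathbf{x} - \mathbf{x}_0|$. But also, by the triangle inequality,
Corollary 1.5.5 on Page 23,
\[ \left|\sum_{i=1}^{p} \frac{\partial\mathbf{f}}{\partial x_i}(\mathbf{x}_0)(x_i - x_{0i})\right| \leq C\sum_{i=1}^{p} |x_i - x_{0i}| \leq Cp|\mathbf{x} - \mathbf{x}_0|. \]
Therefore, if $|\mathbf{x} - \mathbf{x}_0| < \delta_1$,
\[ |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| \leq \left|\sum_{i=1}^{p} \frac{\partial\mathbf{f}}{\partial x_i}(\mathbf{x}_0)(x_i - x_{0i})\right| + |\mathbf{x} - \mathbf{x}_0| < (Cp + 1)|\mathbf{x} - \mathbf{x}_0| \]
which verifies 12.10. Now letting $\varepsilon > 0$ be given, let $\delta = \min\left(\delta_1, \frac{\varepsilon}{Cp + 1}\right)$. Then for
$|\mathbf{x} - \mathbf{x}_0| < \delta$,
\[ |\mathbf{f}(\mathbf{x}) - \mathbf{f}(\mathbf{x}_0)| < (Cp + 1)|\mathbf{x} - \mathbf{x}_0| < (Cp + 1)\frac{\varepsilon}{Cp + 1} = \varepsilon \]
showing $\mathbf{f}$ is continuous at $\mathbf{x}_0$.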
There have been quite a few terms defined. First there was the concept of continuity.
Next the concept of partial or directional derivative. Next there was the concept of
differentiability and the derivative being a linear transformation determined by a certain
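matrix. Finally, it was shown that if a function is $C^1$, then it has a derivative. To give
a rough idea of the relationships of these topics, here is a picture.
[Figure: diagram relating the functions which are continuous, which have partial derivatives, which have a derivative, and which are $C^1$. The examples $\frac{xy}{x^2+y^2}$ and $|x| + |y|$ are marked in the diagram.]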
You might ask whether there are examples of functions which are differentiable
but not $C^1$. Of course there are. In fact, Example 12.3.1 is just such an example as
explained earlier. You should verify that $f'(x)$ exists for all $x \in \mathbb{R}$ but $f'$ fails to
be continuous at $x = 0$. Thus the function is differentiable at every point of $\mathbb{R}$ but fails
to be $C^1$ on $\mathbb{R}$, because its derivative is not continuous at $0$.
12.3.1
In the case where $f$ is a scalar valued function of two variables, the geometric significance
of the derivative can be exhibited in the following picture. Writing $v \equiv (x - x_0, y - y_0)$,
\[ f(x, y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0) + o(v) \]
The right side of the above without the $o(v)$ term,
\[ z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0), \]
is the equation of a plane approximating the graph of $z = f(x, y)$ for $(x, y)$ near
$(x_0, y_0)$. Saying that the function is differentiable at $(x_0, y_0)$ amounts to saying that
the approximation delivered by this plane is very good if both $|x - x_0|$ and $|y - y_0|$ are
small.
Example 12.3.12 Suppose $f(x, y) = xy$. Find the approximate change in $f$ if $x$ goes
from 1 to 1.01 and $y$ goes from 4 to 3.99.
This can be done by noting that
\[ f(1.01, 3.99) - f(1, 4) \approx f_x(1, 4)(.01) + f_y(1, 4)(-.01) = 4(.01) + 1(-.01) = .03. \]
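As a quick numerical illustration of this example, the following minimal sketch compares the estimate from the approximating plane with the actual change in $f(x, y) = xy$. The variable names are chosen here for illustration only.

```python
def f(x, y):
    return x * y

x0, y0 = 1.0, 4.0
dx, dy = 0.01, -0.01          # x goes 1 -> 1.01, y goes 4 -> 3.99

# For f(x, y) = x*y the partial derivatives are f_x = y and f_y = x.
fx, fy = y0, x0

linear_estimate = fx * dx + fy * dy                  # 4(0.01) + 1(-0.01)
actual_change = f(x0 + dx, y0 + dy) - f(x0, y0)      # 1.01 * 3.99 - 4

print(linear_estimate, actual_change)
```

The two numbers differ only by the product `dx * dy`, which is the $o(v)$ term for this particular function.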
12.4
12.4.1
First recall the chain rule for a function of one variable. Consider the following picture.
\[ I \xrightarrow{\;g\;} J \xrightarrow{\;f\;} \mathbb{R} \]
Here $I$ and $J$ are open intervals and it is assumed that $g(I) \subseteq J$. The chain rule says
that if $f'(g(x))$ exists and $g'(x)$ exists for $x \in I$, then the composition, $f \circ g$ also has
a derivative at $x$ and $(f \circ g)'(x) = f'(g(x))\, g'(x)$. Recall that $f \circ g$ is the name of the
function defined by $f \circ g(x) \equiv f(g(x))$. In the notation of this chapter, the chain rule
is written as
\[ Df(g(x))\, Dg(x) = D(f \circ g)(x). \tag{12.11} \]
12.4.2
\[ U \xrightarrow{\;g\;} V \xrightarrow{\;f\;} \mathbb{R}^q \]
The chain rule says that if the linear transformations (matrices) on the left in 12.11
both exist then the same formula holds in this more general case. Thus
\[ Df(g(x))\, Dg(x) = D(f \circ g)(x) \]
Note this all makes sense because $Df(g(x))$ is a $q \times p$ matrix and $Dg(x)$ is a $p \times n$
matrix. Remember it is all right to do $(q \times p)(p \times n)$. The middle numbers match.
More precisely,
Theorem 12.4.1 (Chain rule) Let $U$ be an open set in $\mathbb{R}^n$, let $V$ be an open set in $\mathbb{R}^p$,
let $g : U \to \mathbb{R}^p$ be such that $g(U) \subseteq V$, and let $f : V \to \mathbb{R}^q$. Suppose $Dg(x)$ exists for
some $x \in U$ and that $Df(g(x))$ exists. Then $D(f \circ g)(x)$ exists and furthermore,
\[ D(f \circ g)(x) = Df(g(x))\, Dg(x). \tag{12.12} \]
In particular,
p
(12.13)
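There is an easy way to remember this in terms of the repeated index summation
convention presented earlier. Let $y = g(x)$ and $z = f(y)$. Then the above says
\[ \frac{\partial z}{\partial y_i} \frac{\partial y_i}{\partial x_k} = \frac{\partial z}{\partial x_k}. \tag{12.14} \]
Remember there is a sum on the repeated index. In particular, for each index $r$,
\[ \frac{\partial z_r}{\partial y_i} \frac{\partial y_i}{\partial x_k} = \frac{\partial z_r}{\partial x_k}. \]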
The proof of this major theorem will be given at the end of this section. It will
include the chain rule for functions of one variable as a special case. First here are some
examples.
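Example 12.4.2 Let $f(u, v) = \sin(uv)$ and let $u(x, y, t) = t \sin x + \cos y$ and $v(x, y, t, s) =
s \tan x + y^2 + ts$. Letting $z = f(u, v)$ where $u, v$ are as just described, find $\frac{\partial z}{\partial t}$ and $\frac{\partial z}{\partial x}$.
From 12.14,
\[ \frac{\partial z}{\partial t} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial t} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial t} = v \cos(uv) \sin(x) + us \cos(uv). \]
Here $y_1 = u$, $y_2 = v$, $t = x_k$. Also,
\[ \frac{\partial z}{\partial x} = \frac{\partial z}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial z}{\partial v}\frac{\partial v}{\partial x} = v \cos(uv)\, t \cos(x) + us \sec^2(x) \cos(uv). \]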
Clearly you can continue in this way taking partial derivatives with respect to any of
the other variables.
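The two chain rule formulas of Example 12.4.2 can be checked numerically. The sketch below is only a sanity check, not part of the example: it compares the formulas with central difference quotients at an arbitrarily chosen sample point, and the helper names are hypothetical.

```python
import math

def u(x, y, t):
    return t * math.sin(x) + math.cos(y)

def v(x, y, t, s):
    return s * math.tan(x) + y ** 2 + t * s

def z(x, y, t, s):
    return math.sin(u(x, y, t) * v(x, y, t, s))

def dz_dt_chain(x, y, t, s):
    # v cos(uv) sin(x) + u s cos(uv)
    uu, vv = u(x, y, t), v(x, y, t, s)
    return vv * math.cos(uu * vv) * math.sin(x) + uu * s * math.cos(uu * vv)

def dz_dx_chain(x, y, t, s):
    # v cos(uv) t cos(x) + u s sec^2(x) cos(uv)
    uu, vv = u(x, y, t), v(x, y, t, s)
    sec2 = 1.0 / math.cos(x) ** 2
    return vv * math.cos(uu * vv) * t * math.cos(x) + uu * s * sec2 * math.cos(uu * vv)

def central_diff(g, h=1e-6):
    # Central difference quotient of a one-variable function at 0.
    return (g(h) - g(-h)) / (2 * h)

x, y, t, s = 0.3, 0.7, 1.1, 0.5      # arbitrary sample point
fd_t = central_diff(lambda h: z(x, y, t + h, s))
fd_x = central_diff(lambda h: z(x + h, y, t, s))

print(fd_t, dz_dt_chain(x, y, t, s))
print(fd_x, dz_dx_chain(x, y, t, s))
```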
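Example 12.4.3 Let $w = f(u_1, u_2) = u_2 \sin(u_1)$ and $u_1 = x^2 y + z$, $u_2 = \sin(xy)$.
Find $\frac{\partial w}{\partial x}$, $\frac{\partial w}{\partial y}$, and $\frac{\partial w}{\partial z}$.
The derivative of $w$ as a function of $(x, y, z)$ is of the form $(w_x, w_y, w_z)$ and so it suffices to find this derivative
using the chain rule. You need to find $Df(u_1, u_2)\, Dg(x, y, z)$ where
\[ g(x, y, z) = \begin{pmatrix} x^2 y + z \\ \sin(xy) \end{pmatrix}. \]
Then
\[ Dg(x, y, z) = \begin{pmatrix} 2xy & x^2 & 1 \\ y \cos(xy) & x \cos(xy) & 0 \end{pmatrix}. \]
Also $Df(u_1, u_2) = (u_2 \cos(u_1), \sin(u_1))$. Therefore, the derivative is
\[ (u_2 \cos(u_1), \sin(u_1)) \begin{pmatrix} 2xy & x^2 & 1 \\ y \cos(xy) & x \cos(xy) & 0 \end{pmatrix} \]
\[ = \left( 2u_2 (\cos u_1)\, xy + (\sin u_1)\, y \cos xy,\; u_2 (\cos u_1)\, x^2 + (\sin u_1)\, x \cos xy,\; u_2 \cos u_1 \right) = (w_x, w_y, w_z) \]
Thus
\[ \frac{\partial w}{\partial x} = 2u_2 (\cos u_1)\, xy + (\sin u_1)\, y \cos xy = 2 \sin(xy) \cos(x^2 y + z)\, xy + \sin(x^2 y + z)\, y \cos xy. \]
Similarly, you can find the other partial derivatives of $w$ by substituting in for
$u_1$ and $u_2$ in the above. Note
\[ \frac{\partial w}{\partial x} = \frac{\partial w}{\partial u_1}\frac{\partial u_1}{\partial x} + \frac{\partial w}{\partial u_2}\frac{\partial u_2}{\partial x}. \]
In matrix notation, if $g(x, y, z) = \begin{pmatrix} u_1(x, y, z) \\ u_2(x, y, z) \end{pmatrix}$, then the derivative is the product
\[ \begin{pmatrix} w_{u_1} & w_{u_2} \end{pmatrix} \begin{pmatrix} u_{1x} & u_{1y} & u_{1z} \\ u_{2x} & u_{2y} & u_{2z} \end{pmatrix}. \]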
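Example 12.4.4 Let $w = f(u_1, u_2, u_3) = u_1^2 + u_3 + u_2$ and
\[ g(x, y, z) = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} x + 2yz \\ x^2 + y \\ z^2 + x \end{pmatrix}. \]
Find $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial z}$.
By the chain rule,
\[ (w_x, w_y, w_z) = \begin{pmatrix} w_{u_1} & w_{u_2} & w_{u_3} \end{pmatrix} \begin{pmatrix} u_{1x} & u_{1y} & u_{1z} \\ u_{2x} & u_{2y} & u_{2z} \\ u_{3x} & u_{3y} & u_{3z} \end{pmatrix} \]
\[ = \left( w_{u_1} u_{1x} + w_{u_2} u_{2x} + w_{u_3} u_{3x},\; w_{u_1} u_{1y} + w_{u_2} u_{2y} + w_{u_3} u_{3y},\; w_{u_1} u_{1z} + w_{u_2} u_{2z} + w_{u_3} u_{3z} \right) \]
Therefore,
\[ w_x = 2u_1 (1) + 1 (2x) + 1 (1) = 2(x + 2yz) + 2x + 1 = 4x + 4yz + 1 \]
and
\[ w_z = 2u_1 (2y) + 1 (0) + 1 (2z) = 4(x + 2yz)\, y + 2z = 4yx + 8y^2 z + 2z. \]
Of course to find all the partial derivatives at once, you just use the chain rule. Thus
you would get
\[ \begin{pmatrix} w_x & w_y & w_z \end{pmatrix} = \begin{pmatrix} 2u_1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2z & 2y \\ 2x & 1 & 0 \\ 1 & 0 & 2z \end{pmatrix} \]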
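Example 12.4.5 Let
\[ f(u_1, u_2) = \begin{pmatrix} u_1^2 + u_2 \\ \sin(u_2) + u_1 \end{pmatrix} \]
and
\[ g(x_1, x_2, x_3) = \begin{pmatrix} u_1(x_1, x_2, x_3) \\ u_2(x_1, x_2, x_3) \end{pmatrix} = \begin{pmatrix} x_1 x_2 + x_3 \\ x_2^2 + x_1 \end{pmatrix}. \]
Find $D(f \circ g)(x_1, x_2, x_3)$.
To do this,
\[ Df(u_1, u_2) = \begin{pmatrix} 2u_1 & 1 \\ 1 & \cos u_2 \end{pmatrix}, \quad Dg(x_1, x_2, x_3) = \begin{pmatrix} x_2 & x_1 & 1 \\ 1 & 2x_2 & 0 \end{pmatrix}. \]
Then
\[ Df(g(x_1, x_2, x_3)) = \begin{pmatrix} 2(x_1 x_2 + x_3) & 1 \\ 1 & \cos(x_2^2 + x_1) \end{pmatrix} \]
and so by the chain rule,
\[ D(f \circ g)(x_1, x_2, x_3) = \overbrace{\begin{pmatrix} 2(x_1 x_2 + x_3) & 1 \\ 1 & \cos(x_2^2 + x_1) \end{pmatrix}}^{Df(g(x))} \overbrace{\begin{pmatrix} x_2 & x_1 & 1 \\ 1 & 2x_2 & 0 \end{pmatrix}}^{Dg(x)} \]
\[ = \begin{pmatrix} (2x_1 x_2 + 2x_3)\, x_2 + 1 & (2x_1 x_2 + 2x_3)\, x_1 + 2x_2 & 2x_1 x_2 + 2x_3 \\ x_2 + \cos(x_2^2 + x_1) & x_1 + 2x_2 \cos(x_2^2 + x_1) & 1 \end{pmatrix} \]
Therefore, in particular,
\[ \frac{\partial (f_1 \circ g)}{\partial x_1}(x_1, x_2, x_3) = (2x_1 x_2 + 2x_3)\, x_2 + 1, \]
\[ \frac{\partial (f_2 \circ g)}{\partial x_3}(x_1, x_2, x_3) = 1, \quad \frac{\partial (f_2 \circ g)}{\partial x_2}(x_1, x_2, x_3) = x_1 + 2x_2 \cos(x_2^2 + x_1), \]
etc.
In different notation, let
\[ \begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = f(u_1, u_2) = \begin{pmatrix} u_1^2 + u_2 \\ \sin(u_2) + u_1 \end{pmatrix}. \]
Then
\[ \frac{\partial z_1}{\partial x_1} = \frac{\partial z_1}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial z_1}{\partial u_2}\frac{\partial u_2}{\partial x_1} = 2u_1 x_2 + 1 = 2(x_1 x_2 + x_3)\, x_2 + 1. \]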
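The matrix product in Example 12.4.5 can be checked against difference quotients of the composition. The following is a minimal sketch under the assumption that a central difference with step `h = 1e-6` is accurate enough for this purpose; the function names and the sample point are hypothetical.

```python
import math

def g(x1, x2, x3):
    # Inner map of Example 12.4.5: returns (u1, u2).
    return (x1 * x2 + x3, x2 ** 2 + x1)

def f(u1, u2):
    # Outer map of Example 12.4.5: returns (z1, z2).
    return (u1 ** 2 + u2, math.sin(u2) + u1)

def chain_rule_jacobian(x1, x2, x3):
    # D(f o g)(x) = Df(g(x)) Dg(x), a (2 x 2)(2 x 3) product.
    u1, u2 = g(x1, x2, x3)
    Df = [[2 * u1, 1.0], [1.0, math.cos(u2)]]
    Dg = [[x2, x1, 1.0], [1.0, 2 * x2, 0.0]]
    return [[sum(Df[i][k] * Dg[k][j] for k in range(2)) for j in range(3)]
            for i in range(2)]

def numeric_jacobian(x, h=1e-6):
    # Central difference approximation of the Jacobian of f o g.
    J = [[0.0] * 3 for _ in range(2)]
    for j in range(3):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(*g(*xp)), f(*g(*xm))
        for i in range(2):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = (0.4, -1.2, 0.9)                      # arbitrary sample point
J_chain = chain_rule_jacobian(*x)
J_num = numeric_jacobian(x)
max_error = max(abs(J_chain[i][j] - J_num[i][j])
                for i in range(2) for j in range(3))
print(max_error)
```

The entry `J_chain[0][0]` is the partial derivative of $z_1$ with respect to $x_1$, that is, $(2x_1 x_2 + 2x_3) x_2 + 1$ evaluated at the sample point.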
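Example 12.4.6 Let
\[ f(u_1, u_2, u_3) = \begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} u_1^2 + u_2 u_3 \\ u_1^2 + u_2^3 \\ \ln(1 + u_3^2) \end{pmatrix} \]
and let
\[ g(x_1, x_2, x_3, x_4) = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} = \begin{pmatrix} x_1 + x_2^2 + \sin(x_3) + \cos(x_4) \\ x_4^2 - x_1 \\ x_3^2 + x_4 \end{pmatrix}. \]
Find $(f \circ g)'(x)$.
\[ Df(u) = \begin{pmatrix} 2u_1 & u_3 & u_2 \\ 2u_1 & 3u_2^2 & 0 \\ 0 & 0 & \frac{2u_3}{1 + u_3^2} \end{pmatrix} \]
Similarly,
\[ Dg(x) = \begin{pmatrix} 1 & 2x_2 & \cos(x_3) & -\sin(x_4) \\ -1 & 0 & 0 & 2x_4 \\ 0 & 0 & 2x_3 & 1 \end{pmatrix}. \]
Then by the chain rule, $D(f \circ g)(x) = Df(u)\, Dg(x)$ where $u = g(x)$ as described
above. Thus $D(f \circ g)(x) =$
\[ \begin{pmatrix} 2u_1 & u_3 & u_2 \\ 2u_1 & 3u_2^2 & 0 \\ 0 & 0 & \frac{2u_3}{1 + u_3^2} \end{pmatrix} \begin{pmatrix} 1 & 2x_2 & \cos(x_3) & -\sin(x_4) \\ -1 & 0 & 0 & 2x_4 \\ 0 & 0 & 2x_3 & 1 \end{pmatrix} \]
For example, $\frac{\partial z_2}{\partial x_4}$ is the entry in the second row and fourth column of this product, so it equals
\[ -2u_1 \sin x_4 + 6u_2^2 x_4 = -2\left( x_1 + x_2^2 + \sin(x_3) + \cos(x_4) \right) \sin(x_4) + 6\left( x_4^2 - x_1 \right)^2 x_4. \]
If you wanted $\frac{\partial z}{\partial x_2}$, it is the second column of this product. Thus $\frac{\partial z}{\partial x_2}$ equals
\[ \begin{pmatrix} \frac{\partial z_1}{\partial x_2} \\ \frac{\partial z_2}{\partial x_2} \\ \frac{\partial z_3}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 4u_1 x_2 \\ 4u_1 x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} 4\left( x_1 + x_2^2 + \sin(x_3) + \cos(x_4) \right) x_2 \\ 4\left( x_1 + x_2^2 + \sin(x_3) + \cos(x_4) \right) x_2 \\ 0 \end{pmatrix}. \]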
I hope that by now it is clear that all the information you could desire about various partial derivatives is available and it all reduces to matrix multiplication and the
consideration of entries of the matrix obtained by multiplying the two derivatives.
12.4.3
Sometimes several variables are related and given information about how one variable is
changing, you want to find how the others are changing. The following law is discussed
later in the book, on Page 387.
Example 12.4.7 Bernoulli's law states that in an incompressible fluid,
\[ \frac{v^2}{2g} + z + \frac{P}{\gamma} = C \]
where $C$ is a constant. Here $v$ is the speed, $P$ is the pressure, and $z$ is the height above
some reference point. The constants, $g$ and $\gamma$ are the acceleration of gravity and the
weight density of the fluid. Suppose measurements indicate that $\frac{dv}{dt} = -3$, and $\frac{dz}{dt} = 2$.
Find $\frac{dP}{dt}$ when $v = 7$ and $z = 8$ in terms of $g$ and $\gamma$.
This is just an exercise in using the chain rule. Differentiate the two sides with
respect to $t$.
\[ \frac{1}{g}\, v \frac{dv}{dt} + \frac{dz}{dt} + \frac{1}{\gamma} \frac{dP}{dt} = 0. \]
Then when $v = 7$ and $z = 8$, finding $\frac{dP}{dt}$ involves nothing more than solving the following
for $\frac{dP}{dt}$.
\[ \frac{7}{g}(-3) + 2 + \frac{1}{\gamma} \frac{dP}{dt} = 0 \]
Thus
\[ \frac{dP}{dt} = \gamma \left( \frac{21}{g} - 2 \right) \]
at this instant in time.
Example 12.4.8 In Bernoulli's law above, each of $v$, $z$, and $P$ is a function of $(x, y, z)$,
the position of a point in the fluid. Find a formula for $\frac{\partial P}{\partial x}$ in terms of the partial
derivatives of the other variables.
This is an example of the chain rule. Differentiate both sides with respect to $x$.
\[ \frac{v}{g}\, v_x + z_x + \frac{1}{\gamma} P_x = 0 \]
and so
\[ P_x = -\left( \frac{v v_x + z_x g}{g} \right) \gamma \]
Example 12.4.9 Suppose a level curve is of the form $f(x, y) = C$ and that near a
point on this level curve, $y$ is a differentiable function of $x$. Find $\frac{dy}{dx}$.
This is an example of the chain rule. Differentiate both sides with respect to $x$. This
gives
\[ f_x + f_y \frac{dy}{dx} = 0. \]
Solving for $\frac{dy}{dx}$ gives
\[ \frac{dy}{dx} = \frac{-f_x(x, y)}{f_y(x, y)}. \]
Example 12.4.10 Suppose a level surface is of the form $f(x, y, z) = C$ and that near
a point, $(x, y, z)$ on this level surface, $z$ is a $C^1$ function of $x$ and $y$. Find a formula for
$z_x$.
This is an example of the use of the chain rule. Differentiate both sides of the equation
with respect to $x$. Since $y_x = 0$, this yields
\[ f_x + f_z z_x = 0. \]
Then solving for $z_x$ gives
\[ z_x = \frac{-f_x(x, y, z)}{f_z(x, y, z)} \]
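To see the level curve formula $\frac{dy}{dx} = -f_x / f_y$ at work, here is a small sketch on a level curve chosen purely for illustration (it is not one of the examples above): the circle $f(x, y) = x^2 + y^2 = 25$, whose upper half is the graph of $y = \sqrt{25 - x^2}$.

```python
import math

def fx(x, y):
    return 2 * x      # partial derivative of x^2 + y^2 with respect to x

def fy(x, y):
    return 2 * y      # partial derivative of x^2 + y^2 with respect to y

C = 25.0

def y_of_x(x):
    # Upper branch of the level curve x^2 + y^2 = C.
    return math.sqrt(C - x * x)

x0 = 3.0
y0 = y_of_x(x0)                                  # the point (3, 4)

slope_implicit = -fx(x0, y0) / fy(x0, y0)        # -f_x / f_y
h = 1e-6
slope_numeric = (y_of_x(x0 + h) - y_of_x(x0 - h)) / (2 * h)

print(slope_implicit, slope_numeric)
```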
rx cos (r sin ) x
rx sin + (r cos ) x
sin ()
r
12.4.4
sin ()
r
Example 12.4.12 Let $f : U \to V$ where $U$ and $V$ are open sets in $\mathbb{R}^n$ and $f$ is one to
one and onto. Suppose also that $f$ and $f^{-1}$ are both differentiable. How are $Df^{-1}$ and
$Df$ related?
This can be done as follows. From the assumptions, $x = f^{-1}(f(x))$. Let $Ix = x$.
Then by Example 12.2.4 on Page 246 $DI = I$. By the chain rule,
\[ I = DI = Df^{-1}(f(x))\, (Df(x)). \]
Therefore,
\[ Df(x)^{-1} = Df^{-1}(f(x)). \]
This is equivalent to
\[ Df\left( f^{-1}(y) \right)^{-1} = Df^{-1}(y) \]
or
\[ Df(x)^{-1} = Df^{-1}(y), \quad y = f(x). \]
This is just like a similar situation for functions of one variable. Remember
\[ \left( f^{-1} \right)'(f(x)) = 1 / f'(x). \]
In terms of the repeated index summation convention, suppose $y = f(x)$ so that $x = f^{-1}(y)$.
Then the above can be written as
\[ \delta_{ij} = \frac{\partial x_i}{\partial y_k}(f(x))\, \frac{\partial y_k}{\partial x_j}(x). \]
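The relation $Df^{-1}(f(x))\, Df(x) = I$ can be illustrated with a concrete map. The sketch below uses the polar coordinate map $f(r, \theta) = (r \cos\theta, r \sin\theta)$ as an assumed example (it is not taken from the text above), restricted to a region where it is one to one, with inverse $(x, y) \mapsto \left(\sqrt{x^2 + y^2}, \operatorname{atan2}(y, x)\right)$.

```python
import math

def Df(r, th):
    # Derivative of f(r, th) = (r cos th, r sin th).
    return [[math.cos(th), -r * math.sin(th)],
            [math.sin(th),  r * math.cos(th)]]

def Df_inverse_map(x, y):
    # Derivative of f^{-1}(x, y) = (sqrt(x^2 + y^2), atan2(y, x)),
    # computed directly from the formulas for r and th (valid away from the origin).
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r, th = 2.0, 0.6                           # arbitrary sample point
x, y = r * math.cos(th), r * math.sin(th)  # (x, y) = f(r, th)

product = matmul2(Df_inverse_map(x, y), Df(r, th))
print(product)   # should be close to the 2 x 2 identity matrix
```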
12.4.5
and fix $\phi$ and $\theta$, letting only $\rho$ change, this gives a curve in the direction of increasing
$\rho$. Thus it is a vector which points away from the origin. Letting only $\phi$ change and
fixing $\theta$ and $\rho$, this gives a vector which is tangent to the sphere of radius $\rho$ and points
South. Similarly, letting $\theta$ change and fixing the other two gives a vector which points
East and is tangent to the sphere of radius $\rho$. It is thought by most people that we live
on a large sphere. The model of a flat earth is not believed by anyone except perhaps
beginning physics students. Given we live on a sphere, what directions would be most
meaningful? Wouldn't it be the directions of the vectors just described?
Let $r(t)$ denote the position vector of the object from the origin. Thus
\[ r(t) = \rho(t)\, e_\rho(t) = (x(t), y(t), z(t))^T \]
Now this implies the velocity is
0
(12.16)
e
1
T
= ( sin sin , sin cos , 0) = e ,
and
e
= 0.
=
=
e d
+
dt
1 d
e +
dt
By 12.16,
r 0 = 0 e +
e d
dt
1 d
e .
dt
d
d
e + e .
dt
dt
(12.17)
Now things get interesting. This must be differentiated with respect to t. To do so,
e
T
= ( sin cos , sin sin , 0) =?
Also,
e
T
= ( cos sin , cos cos , 0) = (cot ) e
and
e
1
T
= ( sin sin , sin cos , 0) = e .
and finally,
e
1
T
= (cos cos , cos sin , sin ) = e .
With these formulas for various partial derivatives, the chain rule is used to obtain r00
which will yield a formula for the acceleration in terms of the spherical coordinates and
these special vectors. By the chain rule,
d
(e )
dt
=
=
d
(e )
dt
e 0 e 0 e 0
+
+
0
0
e + e
e 0 e 0 e 0
+
+
0
= 0 ( cos sin ) e + sin2 e + 0 (cot ) e + e
d
(e ) =
dt
=
e 0 e 0 e 0
+
+
0
0
cot e + 0 (e ) +
e
r00 = 00 e + 00 e + 00 e + 0 (e ) + 0 (e ) + 0 (e )
0
0
e + e + e +
e + e +
0 0 cot e + 0 (e ) +
e
+
0
0
0
0
2
( cos sin ) e + sin e + (cot ) e + e
00
00
00
and now all that remains is to collect the terms. Thus $r''$ equals
\[ r'' = \left( \rho'' - \rho (\phi')^2 - \rho (\theta')^2 \sin^2(\phi) \right) e_\rho + \left( \phi'' + \frac{2\rho' \phi'}{\rho} - (\theta')^2 \cos\phi \sin\phi \right) e_\phi \]
\[ + \left( \theta'' + \frac{2\rho' \theta'}{\rho} + 2\phi' \theta' \cot(\phi) \right) e_\theta. \]
and this gives the acceleration in spherical coordinates. Note the prominent role played
by the chain rule. All of the above is done in books on mechanics for general curvilinear
coordinate systems and in the more general context, special theorems are developed
which make things go much faster but these theorems are all exercises in the chain rule.
As an example of how this could be used, consider a rocket. Suppose for simplicity
that it experiences a force only in the direction of $e_\rho$, directly away from the earth.
Of course this force produces a corresponding acceleration which can be computed as
a function of time. As the fuel is burned, the rocket becomes less massive and so the
acceleration will be an increasing function of $t$. However, this would be a known function,
say $a(t)$. Suppose you wanted to know the latitude and longitude of the rocket as a
function of time. (There is no reason to think these will stay the same.) Then all that
would be required would be to solve the system of differential equations1,
\[ \rho'' - \rho (\phi')^2 - \rho (\theta')^2 \sin^2(\phi) = a(t), \]
\[ \phi'' + \frac{2\rho' \phi'}{\rho} - (\theta')^2 \cos\phi \sin\phi = 0, \]
\[ \theta'' + \frac{2\rho' \theta'}{\rho} + 2\phi' \theta' \cot(\phi) = 0 \]
along with initial conditions, $\rho(0) = \rho_0$ (the distance from the launch site to the center of
the earth), $\rho'(0) = \rho_1$ (the initial vertical component of velocity of the rocket, probably
0) and then initial conditions for $\phi, \phi', \theta, \theta'$. The initial value problems could then be
solved numerically and you would know the distance from the center of the earth as a
function of $t$ along with $\theta$ and $\phi$. Thus you could predict where the booster shells would
fall to earth so you would know where to look for them. Of course there are many
variations of this. You might want to specify forces in the $e_\theta$ and $e_\phi$ direction as well
and attempt to control the position of the rocket or rather its payload. The point is that
if you are interested in doing all this in terms of $\phi$, $\theta$, and $\rho$, the above shows how to do
it systematically and you see it is all an exercise in using the chain rule. More could be
said here involving moving coordinate systems and the Coriolis force. You really might
want to do everything with respect to a coordinate system which is fixed with respect
to the moving earth.
1 You won't be able to find the solution to equations like these in terms of simple functions. The
existence of such functions is being assumed. The reason they exist often depends on the implicit
function theorem, a big theorem in advanced calculus.
12.4.6
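\[ \lim_{v \to 0} \frac{|o(g(x + v) - g(x))|}{|v|} = 0. \tag{12.18} \]
From Lemma 12.3.11, there exists $\delta > 0$ such that if $|v| < \delta$, then
\[ |g(x + v) - g(x)| \le (Cn + 1)\, |v|. \tag{12.19} \]
Now let $\varepsilon > 0$ be given. There exists $\eta > 0$ such that if $|g(x + v) - g(x)| < \eta$, then
\[ |o(g(x + v) - g(x))| < \frac{\varepsilon}{Cn + 1}\, |g(x + v) - g(x)| \tag{12.20} \]
Then for all $|v|$ small enough that both 12.19 and the hypothesis of 12.20 hold,
\[ |o(g(x + v) - g(x))| < \frac{\varepsilon}{Cn + 1}\, |g(x + v) - g(x)| \le \frac{\varepsilon}{Cn + 1}\, (Cn + 1)\, |v| \]
and so
\[ \frac{|o(g(x + v) - g(x))|}{|v|} < \varepsilon \tag{12.21} \]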
(12.22)
p
X
f (g (x))
i=1
yi
p
X
f (g (x))
i=1
yi
(f g) (x + v) = f (g (x)) +
p
X
f (g (x))
yi
i=1
f (g (x)) +
p
X
f (g (x))
n
X
gi (x)
j=1
xj
n
X
gi (x)
vj + o (v) + o (v)
vj +
yi
xj
j=1
!
p
n
X
X
f (g (x)) gi (x)
i=1
= (f g) (x) +
j=1
i=1
yi
xj
p
X
f (g (x))
i=1
yi
o (v) + o (v)
vj + o (v)
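because $\sum_{i=1}^{p} \frac{\partial f(g(x))}{\partial y_i}\, o(v) + o(v) = o(v)$. This establishes 12.22 because of Theorem
12.2.3 on Page 245. Thus
\[ (D(f \circ g)(x))_{kj} = \sum_{i=1}^{p} \frac{\partial f_k(g(x))}{\partial y_i} \frac{\partial g_i(x)}{\partial x_j} = \sum_{i=1}^{p} Df(g(x))_{ki}\, (Dg(x))_{ij}. \]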
12.5
Lagrangian Mechanics
A difficult and important problem is to come up with differential equations which model
mechanical systems. Lagrange gave a way to do this. It will be presented here as a
very interesting and important application of the chain rule. Lagrange developed this
technique back in the 1700's. The presentation here follows [12]. Assume $N$ point
masses, located at the points $x_1, \cdots, x_N$ in $\mathbb{R}^3$ and let the mass of the $\alpha^{th}$ mass be $m_\alpha$.
Then according to Newton's second law,
\[ m_\alpha x_\alpha'' = F_\alpha(x, t). \tag{12.23} \]
N
X
1
dx dx
m
2
dt
dt
=1
N
j
r
X
X
X
G
G
1
G dq
G dq
m
+
+
j dt
r dt
2
q
t
q
t
r
=1
j
"
X1 X
j,r
G G
q j
q r
X1
" N
m
X
X
j=1
=1
r j
q q +
dq k
dt .
G G
t
t
G G
q j
t
qj
(12.24)
Therefore,
G G
q j
q k
#
qj +
XX
G G
q k
t
m
G X G j X
G G
+
=
m k
q
m
q
q j
q k
t
=1
j=1
!
N
X
X
G
G
G G
=
m k x0
+
m
q
t
q k
t
=1
N
X
N
X
G
m x0
k
q
=1
Now using the chain rule and product rule again, along with Newtons second law,
d
dt
T
qk
2
X 2 G
m x0
=
qj +
k q j
k
q
tq
=1
j
N
!
X G
00
+
m x
q k
=1
N
2
X
X 2 G
G
qj +
m x0 +
=
k q j
k
q
tq
=1
j
N
!
X G
+
F
q k
=1
N
2
X
X 2 G
qj +
=
k q j
k
q
tq
=1
j
!! N
!
X G
X G
G
m
qr +
+
F
q r
t
q k
r
=1
N
X
(12.25)
" N
X X
#
N X
X
2 G j
G
q q +
q m
m
k q j
q
t
=1
=1 j
rj
!
N
!
X X 2 G
X 2 G
X G
G r
G
+
+
m
q +
m
F(12.26)
tq k
q r
tq k
t
q k
=1
Next consider
T =
T
.
q k
r j
Recall 12.24,
"
X1 X
j,r
2 G G
q j q k q r
G G
q j
q r
+
X1
2
#
qr qj +
XX
G G
t
t
G G
q j
t
qj
(12.27)
qr qj +
j q k
r
q k
q
q
=1
rj
XX
2 G G
q k q j
t
+
qj +
XX
2 G G
q k t
t
G 2 G
q j q k t
qj
(12.28)
T
qk
N
X
T
G
=
F .
q k
q k
=1
Resolve the force, F into the sum of two forces, F = Fa + Fc where Fc is a force
perpendicular to G
. The only requirement of this sort is placed on Fc . Therefore,
q k
G
G
F =
Fa
q k
q k
and so in the end, you obtain the following interesting equation which is equivalent to
Newtons second law.
d
dt
T
qk
T
q k
N
X
G
Fa
k
q
=1
(12.29)
G a
F ,
q k
(12.30)
where $F^a \equiv (F_1^a, \cdots, F_N^a)$ is referred to as the total applied force.
It is particularly agreeable when the total applied force comes as the gradient of a
potential function. This means there exists a scalar function $\phi$ of $x$, defined near $G(V)$
such that
\[ F^a(x, t) = -\nabla \phi(x, t) \]
where the symbol $\nabla$ denotes the gradient with respect to $x$. More generally,
\[ F^a(x, t) = -\nabla \phi(x, t) + F^d \]
where $F^d$ is a force which is not a force of constraint or the gradient of a given function.
For example, it could be a force of friction. Then
\[ F_\alpha^a(x, t) = -\nabla_\alpha \phi(x, t) + F_\alpha^d \]
where
\[ F^d = \left( F_1^d, \cdots, F_N^d \right) \]
X (x) xj
d L
L
T
d T
+
dt qk
q k
dt qk
q k
xj q k
j
=
G
G
G
(x) + Fd + k = k Fd .
q k
q
q
(12.31)
These are called Lagrange's equations of motion and they are enormously significant
because it is often possible to find the kinetic and potential energy in terms of variables
$q^k$ which are meaningful for a particular problem. The expression, $L(q, \dot{q})$
is called the
Lagrangian. This has proved part of the following theorem.
Theorem 12.5.1 In the above context Newton's second law implies
\[ \frac{d}{dt} \frac{\partial L}{\partial \dot{q}^k} - \frac{\partial L}{\partial q^k} = \frac{\partial G}{\partial q^k} \cdot F^d. \tag{12.32} \]
In particular, if the applied force is the gradient of $-\phi$, the right side reduces to 0 in 12.32.
If, in addition to this, the potential function is time independent then the total energy
is conserved. That is,
\[ T(q, \dot{q}) + \phi(G(q, t)) = C \tag{12.33} \]
for some constant, $C$.
Proof: It remains to verify the assertion about the energy. In terms of the Cartesian
coordinates,
X1
E=
m x x + (x,t) .
2
m x
x +
F x +
(x,t) x +
(x,t) x +
x j +
j
x
t
j
Fa x +
X
X
(x,t) x +
(x,t) + Fd x +
=
t
Fd x +
=
.
t
[Figure: a double pendulum with rods of lengths $l_1$ and $l_2$ and masses $m_1$ and $m_2$.]
It is fairly easy to find the equations of motion in terms of the variables, and .
These variables are the q k mentioned above. Because the two rods joining the masses
have fixed length, a constraint is introduced on the motion of the two masses. It is
clear the position of these masses is specified from the two variables, and . In fact,
letting the origin be located at the point at the top where the pendulum is suspended
and assuming the vibration is in a plane,
x1 = (l1 sin , l1 cos )
and
x2 = (l1 sin + l2 sin , l1 cos l2 cos ) .
Therefore,
x 1
x 2
l1 cos , l1 sin
1
2 + 2l1 (sin ) l2 sin + l2 ()
2 + 1 m1 l2 ()
2 .
T = m2 2l1 (cos ) l2 cos + l12 ()
2
1
2
2
There are forces of constraint acting on these masses and there is the force of gravity
acting on them. The force from gravity on m1 is m1 g and the force from gravity
on m2 is m2 g. Our function, is just the total potential energy. Thus (x1 , x2 ) =
m1 gy1 + m2 gy2 . It follows that (G (q)) = m1 g (l1 cos ) + m2 g (l1 cos l2 cos ) .
Therefore, the Lagrangian, L, is
1
2 + l2 ()
2 + 1 m1 l2 ()
2
m2 2l1 l2 (cos ( )) + l12 ()
2
1
2
2
[m1 g (l1 cos ) + m2 g (l1 cos l2 cos )] .
It now becomes an easy task to find the equations of motion in terms of the two angles,
and .
L
d L
=
dt
(12.34)
L
L
d h
m2 l1 l2 (cos ( )) + m2 l22 + m2 gl2 sin m2 l1 l2 sin ( )
dt
= m2 l1 l2 00 cos ( ) m2 l1 l2 0 sin ( ) 0 0
d
dt
1
2 + l2 ()
2 + 1 m1 l2 ()
2
m2 2l1 l2 (cos ( )) + l12 ()
2
1
2
2
+ [m1 g (l1 cos ) + m2 g (l1 cos l2 cos )] = C.
12.6
Newton's Method
12.6.1
various
equations. For example, suppose you want to find $\sqrt{2}$. The existence of $\sqrt{2}$ is not
difficult to establish by considering the continuous function, $f(x) = x^2 - 2$ which is
negative at $x = 0$ and positive at $x = 2$. Therefore, by the intermediate
value theorem,
there exists $x \in (0, 2)$ such that $f(x) = 0$ and this $x$ must equal $\sqrt{2}$. The problem
consists of how to find this number, not just to prove it exists. The following picture
illustrates the procedure of the Newton Raphson method.
[Figure: graph of $f$ with the first approximation $x_1$ and the second approximation $x_2$ marked on the $x$ axis.]
\[ x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}. \]
This second point, $x_2$ is the second approximation and the same process is done for $x_2$
that was done for $x_1$ in order to get the third approximation, $x_3$. Thus
\[ x_3 = x_2 - \frac{f(x_2)}{f'(x_2)}. \]
Continuing this way gives a sequence of points,
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \tag{12.36} \]
which hopefully has the property that $\lim_{n \to \infty} x_n = x$ where $f(x) = 0$. You can see
from the above picture that this must work out in the case of $f(x) = x^2 - 2$.
Now carry out the computations in the above case for $x_1 = 2$ and $f(x) = x^2 - 2$.
From 12.36,
\[ x_2 = 2 - \frac{2}{4} = 1.5. \]
Then
\[ x_3 = 1.5 - \frac{(1.5)^2 - 2}{2(1.5)} \approx 1.417, \]
\[ x_4 = 1.417 - \frac{(1.417)^2 - 2}{2(1.417)} = 1.414216302046577. \]
What is the true value of $\sqrt{2}$? To several decimal places this is $\sqrt{2} = 1.414213562373095$,
showing that the Newton Raphson method has yielded a very good approximation after only a few iterations, even starting with an initial approximation, 2, which
was not very good.
This method does not always work. For example, suppose you wanted to find the
solution to $f(x) = 0$ where $f(x) = x^{1/3}$. You should check that the sequence of iterates
which results does not converge. This is because, starting with $x_1$ the above procedure
yields $x_2 = -2x_1$ and so as the iteration continues, the sequence oscillates between
positive and negative values as its absolute value gets larger and larger. The problem
is that $f'(0)$ does not exist.
However, if $f(x_0) = 0$ and $f''(x) > 0$ for $x$ near $x_0$, you can draw a picture to
show that the method will yield a sequence which converges to $x_0$ provided the first
approximation, $x_1$ is taken sufficiently close to $x_0$. Similarly, if $f''(x) < 0$ for $x$ near
$x_0$, then the method produces a sequence which converges to $x_0$ provided $x_1$ is close
enough to $x_0$.
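The iteration is short enough to try directly. The following minimal sketch (the function name `newton` and the starting value 0.1 for the cube root example are choices made here, not part of the text) runs the method on $f(x) = x^2 - 2$ starting from 2, and then on $f(x) = x^{1/3}$ to show the iterates doubling in size and alternating in sign.

```python
import math

def newton(f, fprime, x1, steps):
    # Returns the list [x1, x2, ..., x_{steps+1}] of Newton Raphson iterates.
    xs = [x1]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

# f(x) = x^2 - 2 starting from x1 = 2: rapid convergence to sqrt(2).
sqrt2_iterates = newton(lambda x: x * x - 2, lambda x: 2 * x, 2.0, 5)
print(sqrt2_iterates)

# f(x) = x^(1/3), using the real cube root for negative x.
def cbrt(x):
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def cbrt_prime(x):
    return (1.0 / 3.0) * abs(x) ** (-2.0 / 3.0)

cube_root_iterates = newton(cbrt, cbrt_prime, 0.1, 4)
print(cube_root_iterates)   # each iterate is about -2 times the previous one
```

Because the sketch carries full floating point precision instead of rounding $x_3$ to 1.417, its later iterates differ slightly in the last digits from the hand computation above.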
12.6.2
The same formula yields a procedure for finding solutions to systems of functions of n
variables. This is particularly interesting because you can't make any sense of things
from drawing pictures. The technique of graphing and zooming which really works well
for functions of one variable is no longer available.
Procedure 12.6.1 Suppose $f$ is a $C^1$ function of $n$ variables and $f(z) = 0$. Then to
find $z$, you use the same iteration which you would use in one dimension,
\[ x_{k+1} = x_k - Df(x_k)^{-1} f(x_k) \]
\[ Df(x, y) = \begin{pmatrix} 3x^2 - 3y^2 - 6x + 7 & -6xy + 6y \\ 6xy - 6y & 3x^2 - 3y^2 - 6x + 7 \end{pmatrix} \]
Start with an initial guess $(x_0, y_0) = (1, 3)$. Then the next iteration is
1
3
23
0
0
23
1
3. 937 183 7 102
72
0
23
1.0
=
2. 415 625 8
0
3
72
23
0
3. 937 183 7 102
0
18. 155 338
I will not bother to use all the decimals in 2.4156258. The next iteration is
\[ \begin{pmatrix} 1.0 \\ 2.4 \end{pmatrix} - \begin{pmatrix} -7.5301205 \times 10^{-2} & 0 \\ 0 & -7.5301205 \times 10^{-2} \end{pmatrix} \begin{pmatrix} 0 \\ -4.224 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.0819277 \end{pmatrix}. \]
Notice how the process is converging to the solution $(x, y) = (1, 2)$. If you do one more
iteration, you will be really close.
The above was pretty painful because at every step the derivative had to be re
evaluated and the inverse taken. It turns out a simpler procedure will work in which
you don't have to constantly re evaluate the inverse of the derivative.
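Procedure 12.6.3 Suppose $f$ is a $C^1$ function of $n$ variables and $f(z) = 0$. Then to
find $z$, you can use the following iteration procedure
\[ x_{k+1} = x_k - Df(x_0)^{-1} f(x_k) \]
\[ Df(1, 3)^{-1} = \begin{pmatrix} -\frac{1}{23} & 0 \\ 0 & -\frac{1}{23} \end{pmatrix} \]
\[ \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} - \begin{pmatrix} -\frac{1}{23} & 0 \\ 0 & -\frac{1}{23} \end{pmatrix} \begin{pmatrix} 0 \\ -15.0 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.3478261 \end{pmatrix} \]
\[ \begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.3478261 \end{pmatrix} - \begin{pmatrix} -\frac{1}{23} & 0 \\ 0 & -\frac{1}{23} \end{pmatrix} \begin{pmatrix} 0 \\ -3.5505878 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.1934527 \end{pmatrix} \]
The next iteration is
\[ \begin{pmatrix} x_3 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.1934527 \end{pmatrix} - \begin{pmatrix} -\frac{1}{23} & 0 \\ 0 & -\frac{1}{23} \end{pmatrix} \begin{pmatrix} 0 \\ -1.779405 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.1160873 \end{pmatrix} \]
The next iteration is
\[ \begin{pmatrix} x_4 \\ y_4 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.1160873 \end{pmatrix} - \begin{pmatrix} -\frac{1}{23} & 0 \\ 0 & -\frac{1}{23} \end{pmatrix} \begin{pmatrix} 0 \\ -1.0111204 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 2.0721255 \end{pmatrix}. \]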
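Both procedures can be tried in a few lines. In the sketch below the system `f` is an assumption: it is taken to be the real and imaginary parts of $z^3 - 3z^2 + 7z - 5$ with $z = x + iy$, chosen because its derivative matrix has the entries $3x^2 - 3y^2 - 6x + 7$ and $\pm(6xy - 6y)$ and it vanishes at $(1, 2)$. The helper names are hypothetical.

```python
def f(x, y):
    # ASSUMED system: real and imaginary parts of z^3 - 3z^2 + 7z - 5, z = x + iy.
    return (x ** 3 - 3 * x * y ** 2 - 3 * x ** 2 + 3 * y ** 2 + 7 * x - 5,
            3 * x ** 2 * y - y ** 3 - 6 * x * y + 7 * y)

def Df(x, y):
    a = 3 * x ** 2 - 3 * y ** 2 - 6 * x + 7
    b = 6 * x * y - 6 * y
    return ((a, -b), (b, a))

def solve2(A, rhs):
    # Solve the 2 x 2 linear system A d = rhs by Cramer's rule.
    (a, b), (c, d) = A
    det = a * d - b * c
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)

def newton(p, steps):
    # Re-evaluates the derivative at every step.
    for _ in range(steps):
        dx, dy = solve2(Df(*p), f(*p))
        p = (p[0] - dx, p[1] - dy)
    return p

def simplified_newton(p0, steps):
    # Uses the derivative at the initial guess for every step.
    A = Df(*p0)
    p, history = p0, []
    for _ in range(steps):
        dx, dy = solve2(A, f(*p))
        p = (p[0] - dx, p[1] - dy)
        history.append(p)
    return history

history = simplified_newton((1.0, 3.0), 4)
full = newton((1.0, 3.0), 8)
print(history)
print(full)
```

Under this assumption the simplified iteration creeps toward $(1, 2)$ one digit at a time, while the iteration which re-evaluates the derivative gets there in a handful of steps, which is the trade off between the two procedures.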
12.7
Convergence Questions
12.7.1
The message of this section is that under reasonable conditions amounting to an assumption
that $Df(z)^{-1}$ exists, Newton's method will converge whenever you take an
initial approximation sufficiently close to $z$. This is just like the situation for the method
in one dimension.
The proof of convergence rests on the following lemma which is somewhat more
interesting than Newton's method. It is a case of the contraction mapping principle
important in differential and integral equations.
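Lemma 12.7.1 Suppose $T : B(x_0, \delta) \subseteq \mathbb{R}^p \to \mathbb{R}^p$ and it satisfies
\[ |Tx - Ty| \le \frac{1}{2}\, |x - y| \text{ for all } x, y \in B(x_0, \delta). \tag{12.37} \]
Suppose also that $|Tx_0 - x_0| < \frac{\delta}{4}$. Then
\[ |T^n x_0 - x_0| \le \sum_{k=1}^{n} \left| T^k x_0 - T^{k-1} x_0 \right| \le \sum_{k=1}^{n} \left( \frac{1}{2} \right)^{k-1} |Tx_0 - x_0| \le 2\, |Tx_0 - x_0| < 2\, \frac{\delta}{4} = \frac{\delta}{2} < \delta. \]
Thus the sequence remains in the closed ball, $\overline{B(x_0, \delta/2)} \subseteq B(x_0, \delta)$. Also, by similar
reasoning,
\[ |T^n x_0 - T^m x_0| \le \sum_{k=m}^{n} \left| T^{k+1} x_0 - T^k x_0 \right| \le \sum_{k=m}^{n} \left( \frac{1}{2} \right)^{k} |Tx_0 - x_0| \le \frac{\delta}{4}\, \frac{1}{2^{m-1}}. \]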
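The estimates above are easy to watch numerically. The sketch below uses an assumed one dimensional example, $T(x) = \frac{1}{2} \cos x$, which satisfies $|Tx - Ty| \le \frac{1}{2} |x - y|$ everywhere because $|T'(x)| \le \frac{1}{2}$; the values of `x0` and `delta` are chosen here so that $|Tx_0 - x_0| < \delta / 4$.

```python
import math

def T(x):
    # |T'(x)| = |sin(x)| / 2 <= 1/2, so T contracts distances by at least 1/2.
    return 0.5 * math.cos(x)

x0 = 0.4
delta = 0.25
first_step = abs(T(x0) - x0)      # must be smaller than delta / 4

iterates = [x0]
for _ in range(60):
    iterates.append(T(iterates[-1]))

fixed_point = iterates[-1]
largest_excursion = max(abs(x - x0) for x in iterates)

print(first_step, delta / 4)
print(largest_excursion, delta / 2)
print(fixed_point, T(fixed_point))
```

Every iterate stays within $\delta / 2$ of $x_0$, as the first chain of inequalities predicts, and the iterates settle down to a point which $T$ leaves fixed.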
12.7.2
1
2
|x y| and so
and for $A, B \in L(\mathbb{F}^n, \mathbb{F}^m)$,
\[ ||A + B|| \le ||A|| + ||B|| \]
The first two properties are obvious but you should verify them. It remains to verify the
norm is well defined and also to verify the triangle inequality above. First if $|x| \le 1$, and
$(A_{ij})$ is the matrix of the linear transformation with respect to the usual basis vectors,
then
!1/2
X
2
|(Ax)i |
: |x| 1
||A|| = max
2 1/2
X X
= max
A
x
:
|x|
ij
j
i j
x
1
|Ax|
= A ||A||
(12.38)
|x| |x|
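It only remains to verify completeness. Suppose then that $\{A_k\}$ is a Cauchy sequence
in $L(\mathbb{F}^n, \mathbb{F}^m)$. Then from 12.38 $\{A_k x\}$ is a Cauchy sequence for each $x \in \mathbb{F}^n$. This
follows because
\[ |A_k x - A_l x| \le ||A_k - A_l||\, |x| \]
which converges to 0 as $k, l \to \infty$. Therefore, by completeness of $\mathbb{F}^m$, there exists $Ax$,
the name of the thing to which the sequence, $\{A_k x\}$ converges such that
\[ \lim_{k \to \infty} A_k x = Ax. \]
Then $A$ is linear because
\[ A(ax + by) = \lim_{k \to \infty} A_k (ax + by) = a \lim_{k \to \infty} A_k x + b \lim_{k \to \infty} A_k y = aAx + bAy. \]
By the first part of this argument, $||A|| < \infty$ and so $A \in L(\mathbb{F}^n, \mathbb{F}^m)$. This proves the
theorem.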
The following is an interesting exercise which is left for you.
Proposition 12.7.4 Let $A(x) \in L(\mathbb{F}^n, \mathbb{F}^m)$ for each $x \in U \subseteq \mathbb{F}^p$. Then letting
$(A_{ij}(x))$ denote the matrix of $A(x)$ with respect to the standard basis, it follows $A_{ij}$ is
continuous at $x$ for each $i, j$ if and only if for all $\varepsilon > 0$, there exists a $\delta > 0$ such that
if $|x - y| < \delta$, then $||A(x) - A(y)|| < \varepsilon$. That is, $A$ is a continuous function having
values in $L(\mathbb{F}^n, \mathbb{F}^m)$ at $x$.
Proof: Suppose first the second condition holds. Then from the material on linear
transformations,
|Aij (x) Aij (y)| =
2 1/2
X
X
=
(Aij (x) Aij (y)) vj
i j
(12.39)
2 1/2
X
X
|Aij (x) Aij (y)| |vj | .
i
By continuity of each Aij , there exists a > 0 such that for each i, j
2 1/2
X X
|v|
<
n m
i
j
2 1/2
X X
n m
i
j
segment joining the two points lies in $U$.) Suppose also that for all points on this line
segment,
\[ ||Df(x + t(y - x))|| \le M. \]
Then
\[ ||f(y) - f(x)|| \le M\, ||y - x||. \]
Proof: Let
\[ S \equiv \{ t \in [0, 1] : \text{for all } s \in [0, t],\; ||f(x + s(y - x)) - f(x)|| \le (M + \varepsilon)\, s\, ||y - x|| \}. \]
Then $0 \in S$ and by continuity of $f$, it follows that if $t \equiv \sup S$, then $t \in S$ and if $t < 1$,
\[ ||f(x + t(y - x)) - f(x)|| = (M + \varepsilon)\, t\, ||y - x||. \tag{12.40} \]
{hk }k=1
12.7.3
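\[ \left| \left| I - Df(x_0)^{-1} Df(x) \right| \right| < \frac{1}{2}. \tag{12.41} \]
Now pick $x_0 \in B(z, \delta)$ also close enough to $z$ such that
\[ \left| \left| Df(x_0)^{-1} \right| \right|\, |f(x_0)| < \frac{\delta}{4}. \]
Define
\[ Tx \equiv x - Df(x_0)^{-1} f(x). \]
Then the sequence, $\{T^n x_0\}_{n=1}^{\infty}$
converges to $z$.
Proof: First note that $|Tx_0 - x_0| = \left| Df(x_0)^{-1} f(x_0) \right| \le \left| \left| Df(x_0)^{-1} \right| \right|\, |f(x_0)| < \frac{\delta}{4}$.
Also on $B(x_0, \delta) \subseteq B(z, 2\delta)$ the inequality, 12.41, the chain rule, and Theorem 12.7.5
shows that for $x, y \in B(x_0, \delta)$,
\[ |Tx - Ty| \le \frac{1}{2}\, |x - y|. \]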
12.7.4
Newton's Method
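\[ \left| \left| Df(x_2)^{-1} - Df(x_1)^{-1} \right| \right| \le K\, |x_2 - x_1|. \tag{12.42} \]
Then there exists $\delta > 0$ small enough that for all $x_1, x_2 \in B(z, 2\delta)$
\[ \left| x_1 - x_2 - Df(x_2)^{-1} (f(x_1) - f(x_2)) \right| < \frac{1}{4}\, |x_1 - x_2|, \tag{12.43} \]
\[ |f(x_1)| < \frac{1}{4K}. \tag{12.44} \]
Now pick $x_0 \in B(z, \delta)$ also close enough to $z$ such that
\[ \left| \left| Df(x_0)^{-1} \right| \right|\, |f(x_0)| < \frac{\delta}{4}. \]
Define
\[ Tx \equiv x - Df(x)^{-1} f(x). \]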
1
x1 x2 Df (x2 ) (Df (x2 ) (x1 x2 ) + f (x1 ) f (x2 ) Df (x2 ) (x1 x2 ))
1
= Df (x2 ) (f (x1 ) f (x2 ) Df (x2 ) (x1 x2 ))
C |f (x1 ) f (x2 ) Df (x2 ) (x1 x2 )|
1
because 12.42 implies Df (x) is bounded for x B (z, ) . Now use the assumption that f is C 1 and Proposition 12.7.4 to conclude there exists small enough that
||Df (x) Df (z)|| < 18 for all x B (z, 2) . Then let x1 , x2 B (z,2) . Define
h (x) f (x) f (x2 ) Df (x2 ) (x x2 ) . Then
||Dh (x)|| =
2 The following condition as well as the preceding can be shown to hold if you simply assume $f$ is
a $C^2$ function and $Df(z)^{-1}$ exists. This requires the use of the inverse function theorem, one of the
major theorems which should be studied in an advanced calculus class.
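This proves 12.43. 12.44 can be satisfied by taking $\delta$ still smaller if necessary and using
$f(z) = 0$ and the continuity of $f$.
Now let $x_0 \in B(z, \delta)$ be as described. Then
\[ |Tx_0 - x_0| = \left| Df(x_0)^{-1} f(x_0) \right| \le \left| \left| Df(x_0)^{-1} \right| \right|\, |f(x_0)| < \frac{\delta}{4}. \]
Letting $x_1, x_2 \in B(x_0, \delta) \subseteq B(z, 2\delta)$,
\[ |Tx_1 - Tx_2| = \left| x_1 - Df(x_1)^{-1} f(x_1) - \left( x_2 - Df(x_2)^{-1} f(x_2) \right) \right| \]
\[ \le \left| x_1 - x_2 - Df(x_2)^{-1} (f(x_1) - f(x_2)) \right| + \left| \left| Df(x_1)^{-1} - Df(x_2)^{-1} \right| \right|\, |f(x_1)| \]
\[ \le \frac{1}{4}\, |x_1 - x_2| + K\, |x_1 - x_2|\, |f(x_1)| \le \frac{1}{2}\, |x_1 - x_2|. \]
The desired result now follows from Lemma 12.7.1.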
12.8
Exercises
\[ \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}. \]
2. Let $f(x, y) = \begin{cases} xy \sin\frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0 \end{cases}$. Find where $f$ is differentiable and compute
the derivative at all these points.
3. Let
f (x, y) =
Show f is continuous at (0, 0) and that the partial derivatives exist at (0, 0) but
the function is not differentiable at (0, 0) .
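4. Let
\[ f(x, y, z) = \begin{pmatrix} x^2 \sin y + z^3 \\ \sin(x + y) + z^3 \cos x \end{pmatrix}. \]
Find $Df(1, 2, 3)$.
5. Let
\[ f(x, y, z) = \begin{pmatrix} x \tan y + z^3 \\ \cos(x + y) + z^3 \cos x \end{pmatrix}. \]
Find $Df(1, 2, 3)$.
6. Let
\[ f(x, y, z) = \begin{pmatrix} x \sin y + z^3 \\ \sin(x + y) + z^3 \cos x \\ x^5 + y^2 \end{pmatrix}. \]
Find $Df(x, y, z)$.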
7. Let
\[ f(x, y) = \begin{cases} \dfrac{\left( x^2 - y^4 \right)^2}{\left( x^2 + y^4 \right)^2} & \text{if } (x, y) \ne (0, 0) \\ 1 & \text{if } (x, y) = (0, 0) \end{cases}. \]
Show that all directional derivatives of $f$ exist at $(0, 0)$, and are all equal to zero
but the function is not even continuous at $(0, 0)$. Therefore, it is not differentiable.
Why?
8. In the example of Problem 7 show the partial derivatives exist but are not continuous.
9. A certain building is shaped like the top half of the ellipsoid, $\frac{x^2}{900} + \frac{y^2}{900} + \frac{z^2}{400} = 1$
determined by letting $z \ge 0$. Here dimensions are measured in meters. The
building needs to be painted. The paint, when applied is about .005 meters thick.
About how many cubic meters of paint will be needed? Hint: This is going to
replace the numbers, 900 and 400 with slightly larger numbers when the ellipsoid
is fattened slightly by the paint. The volume of the top half of the ellipsoid,
$x^2/a^2 + y^2/b^2 + z^2/c^2 \le 1$, $z \ge 0$ is $(2/3)\, \pi abc$.
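One way to read the hint is sketched below. It assumes, as a simplification, that the paint increases each semi axis $a = b = 30$, $c = 20$ by the thickness $.005$, and it compares the differential of $V = (2/3)\, \pi abc$ with the exact difference of volumes. The names are hypothetical and this is only one possible approach to the problem.

```python
import math

def half_ellipsoid_volume(a, b, c):
    return (2.0 / 3.0) * math.pi * a * b * c

a, b, c = 30.0, 30.0, 20.0     # semi axes: sqrt(900), sqrt(900), sqrt(400)
t = 0.005                      # assumed increase of each semi axis, in meters

# Differential dV = (2/3) pi (bc da + ac db + ab dc) with da = db = dc = t.
linear_estimate = (2.0 / 3.0) * math.pi * (b * c + a * c + a * b) * t

# Exact change in volume under the same assumption.
exact_change = half_ellipsoid_volume(a + t, b + t, c + t) - half_ellipsoid_volume(a, b, c)

print(linear_estimate, exact_change)
```

The linear estimate works out to $7\pi$ cubic meters, and the exact difference agrees with it to within a few thousandths of a cubic meter.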
10. Show carefully that the usual one variable version of the chain rule is a special
case of Theorem 12.4.15.
x1 + x2
2
11. Let z = f (y) = y1 + sin y2 + tan y3 and y = g (x) x22 x1 + x2 .
x22 + x1 + sin x2
z
Find D (f g) (x) . Use to write xi for i = 1, 2.
2
12. Let z = f (y) = y1 + cot y2 + sin y3 and y = g (x)
Find D (f g) (x) . Use to write
z
xi
x1 + x4 + x3
x22 x1 + x2 .
2
x2 + x1 + sin x4
for i = 1, 2, 3, 4.
x1 + x4 + x3
x22 x1 + x2
x1 + x2
and y = g (x) x22 x1 + x2 .
x22 + x1 + sin x2
for i = 1, 2 and k = 1, 2.
zk
xi
y
y
+
y
x
2
1
3
and y = g (x) 2 2 2x1 + x3
16. Let z = f (y) =
x3 + x1 + cos x1
y23 y4 + y1
y1 + y2
x22
zk
Find D (f g) (x) . Use to write xi for i = 1, 2, 3, 4 and k = 1, 2, 3, 4.
\[ \nabla f(r_1(t), r_2(t), r_3(t)) \cdot (r_1'(t), r_2'(t), r_3'(t)) = 0. \]
What geometric fact have you just established?
23. Suppose $f$ is a $C^1$ function which maps $U$, an open subset of $\mathbb{R}^n$ one to one and
onto $V$, an open set in $\mathbb{R}^m$ such that the inverse map, $f^{-1}$ is also $C^1$. What must
be true of $m$ and $n$? Why? Hint: Consider Example 12.4.12 on Page 258. Also
you can use the fact that if $A$ is an $m \times n$ matrix which maps $\mathbb{R}^n$ onto $\mathbb{R}^m$, then
$m \le n$.
Outcomes
13.1
Fundamental Properties
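Proof:
\[ \frac{f(x + tv) - f(x)}{t} = \frac{1}{t} \left( f(x) + \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i}\, t v_i + o(tv) - f(x) \right) \]
\[ = \frac{1}{t} \left( \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i}\, t v_i + o(tv) \right) = \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i}\, v_i + \frac{o(tv)}{t}. \]
Now $\lim_{t \to 0} \frac{o(tv)}{t} = 0$ and so
\[ D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t} = \sum_{i=1}^{n} \frac{\partial f(x)}{\partial x_i}\, v_i \]
as claimed.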
f
x1
T
f
(x) , , x
(x)
n
just as was done in the special case where f is C 1 . As before, this vector is called the
gradient vector.
This defines the gradient for a differentiable scalar valued function. There are ways
to define the gradient for vector valued functions but this will not be attempted in this
book.
It follows immediately from 13.1 that
$$f(x + v) = f(x) + \nabla f(x) \cdot v + o(v) \tag{13.2}$$
As mentioned above, an important aspect of the gradient is its relation with the directional derivative. A repeat of the above argument gives the following. From 13.2, for v
a unit vector,
$$\frac{f(x+tv) - f(x)}{t} = \nabla f(x) \cdot v + \frac{o(tv)}{t} = \nabla f(x) \cdot v + \frac{o(t)}{t}.$$
Therefore, taking $t \to 0$,
$$D_v f(x) = \nabla f(x) \cdot v. \tag{13.3}$$
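$$v = \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right).$$
Note this vector which is given is already a unit vector. Therefore, from the above,
it is only necessary to find $\nabla f(1, 0, 1)$ and take the dot product.
$$\nabla f(x, y, z) = (2x + (\cos xy)\, y, (\cos xy)\, x, 1).$$
Therefore, $\nabla f(1, 0, 1) = (2, 1, 1)$. Therefore, the directional derivative is
$$(2, 1, 1) \cdot \left( \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \right) = \frac{4}{3}\sqrt{3}.$$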
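The computation above can be checked numerically. The following is only a sketch: the function `f` is an assumption, chosen as $x^2 + \sin(xy) + z$ because it has the gradient displayed above, and the names `grad_f` and `directional_derivative` are illustrative. It compares formula 13.3 with a difference quotient for small t.

```python
import math

def f(x, y, z):
    # Assumed function: its gradient is (2x + y cos(xy), x cos(xy), 1).
    return x**2 + math.sin(x * y) + z

def grad_f(x, y, z):
    return (2 * x + math.cos(x * y) * y, math.cos(x * y) * x, 1.0)

def directional_derivative(grad, v):
    # Formula 13.3: the dot product of the gradient with the unit vector v.
    return sum(g * c for g, c in zip(grad, v))

point = (1.0, 0.0, 1.0)
v = tuple(1 / math.sqrt(3) for _ in range(3))  # already a unit vector

exact = directional_derivative(grad_f(*point), v)

# Difference quotient (f(x + t v) - f(x)) / t for a small t.
t = 1e-6
shifted = tuple(p + t * c for p, c in zip(point, v))
estimate = (f(*shifted) - f(*point)) / t

print(exact, estimate)
```

The two printed numbers should agree to several decimal places, and the first one equals $\frac{4}{3}\sqrt{3}$ up to rounding.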
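Because of 13.3 it is easy to find the largest possible directional derivative and the
smallest possible directional derivative. That which follows is a more algebraic treatment
of an earlier result with the trigonometry removed.
$$\max \{ D_v f(x) : |v| = 1 \} = |\nabla f(x)| \tag{13.4}$$
and
$$\min \{ D_v f(x) : |v| = 1 \} = -|\nabla f(x)| \tag{13.5}$$
Furthermore, the maximum in 13.4 occurs when $v = \nabla f(x) / |\nabla f(x)|$ and the minimum
in 13.5 occurs when $v = -\nabla f(x) / |\nabla f(x)|$.
Proof: From 13.3 and the Cauchy Schwarz inequality,
$$|D_v f(x)| \le |\nabla f(x)|$$
and so for any choice of v with $|v| = 1$,
$$-|\nabla f(x)| \le D_v f(x) \le |\nabla f(x)|.$$
The proposition is proved by noting that if $v = -\nabla f(x) / |\nabla f(x)|$, then
$$D_v f(x) = \nabla f(x) \cdot \left( -\frac{\nabla f(x)}{|\nabla f(x)|} \right) = -\frac{|\nabla f(x)|^2}{|\nabla f(x)|} = -|\nabla f(x)|,$$
while if $v = \nabla f(x) / |\nabla f(x)|$, the same computation gives $D_v f(x) = |\nabla f(x)|$.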
The conclusion of the above proposition is important in many physical models. For
example, consider some material which is at various temperatures depending on location.
Because it has cool places and hot places, it is expected that the heat will flow from
the hot places to the cool places. Consider a small surface having a unit normal, n.
Thus n is a normal to this surface and has unit length. If it is desired to find the rate
in calories per second at which heat crosses this little surface in the direction of n it is
defined as $J \cdot nA$ where A is the area of the surface and J is called the heat flux. It
is reasonable to suppose the rate at which heat flows across this surface will be largest
when n is in the direction of greatest rate of decrease of the temperature. In other
words, heat flows most readily in the direction which involves the maximum rate of
decrease in temperature. This expectation will be realized by taking $J = -K \nabla u$ where
K is a positive scalar function which can depend on a variety of things. The above
relation between the heat flux and $\nabla u$ is usually called the Fourier heat conduction law
and the constant, K is known as the coefficient of thermal conductivity. It is a material
property, different for iron than for aluminum. In most applications, K is considered
to be a constant but this is wrong. Experiments show this scalar should depend on
temperature. Nevertheless, things get very difficult if this dependence is allowed. The
constant can depend on position in the material or even on time.
An identical relationship is usually postulated for the flow of a diffusing species. In
this problem, something like a pollutant diffuses. It may be an insecticide in ground
water for example. Like heat, it tries to move from areas of high concentration toward
areas of low concentration. In this case $J = -K \nabla c$ where c is the concentration of the
diffusing species. When applied to diffusion, this relationship is known as Fick's law.
Mathematically, it is indistinguishable from the problem of heat flow.
Note the importance of the gradient in formulating these models.
13.2
Tangent Planes
The gradient has fundamental geometric significance illustrated by the following picture.
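[Figure: a piece of a level surface with two curves through the point $(x_0, y_0, z_0)$, their tangent vectors $\mathbf{x}_1'(t_0)$ and $\mathbf{x}_2'(s_0)$, and the gradient $\nabla f(x_0, y_0, z_0)$ perpendicular to both.]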
In this picture, the surface is a piece of a level surface of a function of three variables, $f(x, y, z)$. Thus the surface is defined by $f(x, y, z) = c$ or more completely as
$\{(x, y, z) : f(x, y, z) = c\}$. For example, if $f(x, y, z) = x^2 + y^2 + z^2$, this would be a piece
of a sphere. There are two smooth curves in this picture which lie in the surface having parameterizations, $\mathbf{x}_1(t) = (x_1(t), y_1(t), z_1(t))$ and $\mathbf{x}_2(s) = (x_2(s), y_2(s), z_2(s))$
which intersect at the point, $(x_0, y_0, z_0)$ on this surface¹. This intersection occurs when
$t = t_0$ and $s = s_0$. Since the points, $\mathbf{x}_1(t)$ for t in an interval lie in the level surface, it
follows
$$f(x_1(t), y_1(t), z_1(t)) = c$$
for all t in some interval. Therefore, taking the derivative of both sides and using the
chain rule on the left,
$$\frac{\partial f}{\partial x}(x_1(t), y_1(t), z_1(t))\, x_1'(t) + \frac{\partial f}{\partial y}(x_1(t), y_1(t), z_1(t))\, y_1'(t) + \frac{\partial f}{\partial z}(x_1(t), y_1(t), z_1(t))\, z_1'(t) = 0.$$
In terms of the gradient, this merely states
$$\nabla f(x_1(t), y_1(t), z_1(t)) \cdot \mathbf{x}_1'(t) = 0.$$
Similarly,
$$\nabla f(x_2(s), y_2(s), z_2(s)) \cdot \mathbf{x}_2'(s) = 0.$$
Letting $s = s_0$ and $t = t_0$, it follows
$$\nabla f(x_0, y_0, z_0) \cdot \mathbf{x}_1'(t_0) = 0, \quad \nabla f(x_0, y_0, z_0) \cdot \mathbf{x}_2'(s_0) = 0.$$
It follows $\nabla f(x_0, y_0, z_0)$ is perpendicular to both the direction vectors of the two indicated curves shown. Surely if things are as they should be, these two direction vectors
would determine a plane which deserves to be called the tangent plane to the level
surface of f at the point $(x_0, y_0, z_0)$ and that $\nabla f(x_0, y_0, z_0)$ is perpendicular to this
tangent plane at the point, $(x_0, y_0, z_0)$.
¹ Do there exist any smooth curves which lie in the level surface of f and pass through the point
$(x_0, y_0, z_0)$? It turns out there do if $\nabla f(x_0, y_0, z_0) \ne 0$ and if the function, f, is $C^1$. However, this is
a consequence of the implicit function theorem, one of the greatest theorems in all mathematics and a
topic for an advanced calculus class. It is also in an appendix to this book.
Example 13.2.1 Find the equation of the tangent plane to the level surface, $f(x, y, z) = 6$
of the function, $f(x, y, z) = x^2 + 2y^2 + 3z^2$ at the point (1, 1, 1).
First note that (1, 1, 1) is a point on this level surface. To find the desired plane it
suffices to find the normal vector to the proposed plane. But $\nabla f(x, y, z) = (2x, 4y, 6z)$
and so $\nabla f(1, 1, 1) = (2, 4, 6)$. Therefore, from this problem, the equation of the plane
is
$$(2, 4, 6) \cdot (x - 1, y - 1, z - 1) = 0$$
or in other words,
$$2x - 12 + 4y + 6z = 0.$$
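As a small sketch of the recipe in this example (the helper name `tangent_plane` is hypothetical, not something from these notes), the plane through a point with a given normal can be written as $ax + by + cz = d$ where $(a, b, c)$ is the gradient and d is the dot product of the gradient with the point.

```python
def grad_f(x, y, z):
    # Gradient of f(x, y, z) = x^2 + 2y^2 + 3z^2.
    return (2 * x, 4 * y, 6 * z)

def tangent_plane(normal, point):
    """Return (a, b, c, d) describing the plane a*x + b*y + c*z = d
    which passes through `point` and has the given normal vector."""
    d = sum(n * p for n, p in zip(normal, point))
    return (*normal, d)

point = (1, 1, 1)
plane = tangent_plane(grad_f(*point), point)
print(plane)
```

Here `plane` comes out as `(2, 4, 6, 12)`, which is the equation $2x + 4y + 6z = 12$, the same plane as $2x - 12 + 4y + 6z = 0$.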
Example 13.2.2 The point, $(\sqrt{3}, 1, 4)$ is on both the surfaces, $z = x^2 + y^2$ and $z = 8 - (x^2 + y^2)$. Find the cosine of the angle between the two tangent planes at this point.
Recall this is the same as the angle between two normal vectors. Of course there
is some ambiguity here because if n is a normal vector, then so is $-n$ and replacing n
with $-n$ in the formula for the cosine of the angle will change the sign. We agree to
look for the acute angle and its cosine rather than the obtuse angle. The normals are
$(2\sqrt{3}, 2, -1)$ and $(2\sqrt{3}, 2, 1)$. Therefore, the cosine of the angle desired is
$$\frac{(2\sqrt{3})^2 + 4 - 1}{17} = \frac{15}{17}.$$
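The arithmetic in this example can be sketched as follows. The function name `cos_angle_between_planes` is illustrative only; the normals are the gradients of $x^2 + y^2 - z$ and $x^2 + y^2 + z - 8$, whose zero level surfaces are the two given surfaces, and the absolute value implements the convention of taking the acute angle.

```python
import math

def cos_angle_between_planes(n1, n2):
    # Cosine of the acute angle between two planes with normals n1 and n2.
    dot = sum(a * b for a, b in zip(n1, n2))
    norm1 = math.sqrt(sum(a * a for a in n1))
    norm2 = math.sqrt(sum(b * b for b in n2))
    return abs(dot) / (norm1 * norm2)

x, y = math.sqrt(3), 1.0
n1 = (2 * x, 2 * y, -1.0)  # gradient of x^2 + y^2 - z at the point
n2 = (2 * x, 2 * y, 1.0)   # gradient of x^2 + y^2 + z - 8 at the point

cosine = cos_angle_between_planes(n1, n2)
print(cosine, 15 / 17)
```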
Example 13.2.3 The point, $(1, \sqrt{3}, 4)$ is on the surface, $z = x^2 + y^2$. Find the line
perpendicular to the surface at this point.
All that is needed is the direction vector of this line. The surface is the level surface,
$x^2 + y^2 - z = 0$. The normal to this surface is given by the gradient at this point. Thus
the desired line is $(1, \sqrt{3}, 4) + t\,(2, 2\sqrt{3}, -1)$.
13.3
Exercises
1 1 1
2, 2, 2 .
(a) $x^2 y + z^3$ at (1, 1, 1)
13.4
Local Extrema
[Figure: the graph of z = f(x) together with its tangent plane.]
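$$\left( x_0, y_0, \sqrt{4 - x_0^2 - y_0^2} \right)$$
and so the equation of the tangent plane at this point is
$$x_0 (x - x_0) + y_0 (y - y_0) + \sqrt{4 - x_0^2 - y_0^2} \left( z - \sqrt{4 - x_0^2 - y_0^2} \right) = 0$$
When $x = y = 0$,
$$z = \frac{4}{\sqrt{4 - x_0^2 - y_0^2}}.$$
When $z = 0 = y$,
$$x = \frac{4}{x_0},$$
and when $z = x = 0$,
$$y = \frac{4}{y_0}.$$
$$\frac{1}{6} \cdot \frac{64}{xy \sqrt{4 - x^2 - y^2}}$$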
This is because in beginning calculus it was shown that the volume of a pyramid is 1/3
the area of the base times the height. Therefore, you simply need to find the gradient
of this and set it equal to zero. Thus upon taking the partial derivatives, you need to
have
$$\frac{-4 + 2x^2 + y^2}{x^2 y \left( -4 + x^2 + y^2 \right) \sqrt{4 - x^2 - y^2}} = 0,$$
and
$$\frac{-4 + x^2 + 2y^2}{x y^2 \left( -4 + x^2 + y^2 \right) \sqrt{4 - x^2 - y^2}} = 0.$$
64 64
+
y
x
To find best dimensions you note these must result in a local minimum.
$$A_x = \frac{y x^2 - 64}{x^2} = 0, \quad A_y = \frac{x y^2 - 64}{y^2}.$$
13.5
There is a version of the second derivative test in the case that the function and its
first and second partial derivatives are all continuous. The proof of this theorem is
dependent on fundamental results in linear algebra which are in an appendix. You can
skip the proof if you like. It is given later.
Definition 13.5.1 The matrix, $H(x)$ whose $ij$th entry at the point x is
$$\frac{\partial^2 f}{\partial x_i \partial x_j}(x)$$
is called the Hessian matrix. The eigenvalues of $H(x)$ are the solutions $\lambda$ to the
equation
$$\det (\lambda I - H(x)) = 0$$
The following theorem says that if all the eigenvalues of the Hessian matrix at
a critical point are positive, then the critical point is a local minimum. If all the
eigenvalues of the Hessian matrix at a critical point are negative, then the critical point
is a local maximum. Finally, if some of the eigenvalues of the Hessian matrix at the
critical point are positive and some are negative then the critical point is a saddle point.
The following picture illustrates the situation.
First $\nabla (10xy + y^2) = (10y, 10x + 2y)$ and so there is one critical point at the point
(0, 0). What is it? The Hessian matrix is
$$\begin{pmatrix} 0 & 10 \\ 10 & 2 \end{pmatrix}$$
and the eigenvalues are of different signs. Therefore, the critical point (0, 0) is a saddle
point. Here is a graph drawn by Maple.
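For a function of two variables the eigenvalue check can be sketched in a few lines. This is only an illustration under stated assumptions: the names `eigenvalues_2x2_symmetric` and `classify` and the tolerance `tol` are hypothetical, and the closed form used is the standard one for a symmetric matrix with rows (a, b) and (b, d).

```python
import math

def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smaller first."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)

def classify(eigs, tol=1e-12):
    # Second derivative test as stated in the text above.
    if all(e > tol for e in eigs):
        return "local minimum"
    if all(e < -tol for e in eigs):
        return "local maximum"
    if any(e > tol for e in eigs) and any(e < -tol for e in eigs):
        return "saddle point"
    return "test gives no information"

# Hessian of 10xy + y^2 at the critical point (0, 0).
eigs = eigenvalues_2x2_symmetric(0, 10, 2)
print(eigs, classify(eigs))
```

The two eigenvalues are $1 \pm \sqrt{101}$, one negative and one positive, so the sketch reports a saddle point in agreement with the example.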
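$f_x(x, y) = 8x^3 - 12x^2 + 28x + 24yx - 12y - 12$ and
$f_y(x, y) = 12x^2 - 12x + 4y + 4$.
The points at which both $f_x$ and $f_y$ equal zero are $\left( \frac{1}{2}, -\frac{1}{4} \right)$, $(0, -1)$, and $(1, -1)$.
The Hessian matrix is
$$\begin{pmatrix} 24x^2 - 24x + 28 + 24y & 24x - 12 \\ 24x - 12 & 4 \end{pmatrix}$$
and the thing to determine is the sign of its eigenvalues evaluated at the critical points.
First consider the point $\left( \frac{1}{2}, -\frac{1}{4} \right)$. The Hessian matrix is
$$\begin{pmatrix} 16 & 0 \\ 0 & 4 \end{pmatrix}$$
and its eigenvalues are 16, 4 showing that this is a local minimum.
Next consider $(0, -1)$. At this point the Hessian matrix is
$$\begin{pmatrix} 4 & -12 \\ -12 & 4 \end{pmatrix}$$
and the eigenvalues are 16, $-8$. Therefore, this point is a saddle point. To determine this, find
the eigenvalues.
$$\det \left( \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} 4 & -12 \\ -12 & 4 \end{pmatrix} \right) = \lambda^2 - 8\lambda - 128 = (\lambda + 8)(\lambda - 16)$$
Finally consider $(1, -1)$. At this point the Hessian matrix is
$$\begin{pmatrix} 4 & 12 \\ 12 & 4 \end{pmatrix}$$
and the eigenvalues are again 16, $-8$, so this point is also a saddle point.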
Example 13.5.5 Suppose $f(x, y) = 6xy^2 - 2x^3 - 3y^4$. Show that (0, 0) is a critical
point for which the second derivative test gives no information.
Before doing anything it might be interesting to look at the graph of this function
of two variables plotted using Maple.
This picture should indicate why this is called a monkey saddle. It is because the
monkey can sit in the saddle and have a place for his tail. Now to see (0, 0) is a critical
point, note that $f_x(0, 0) = f_y(0, 0) = 0$ because $f_x(x, y) = 6y^2 - 6x^2$, $f_y(x, y) = 12xy - 12y^3$
and so (0, 0) is a critical point. So are (1, 1) and $(1, -1)$. Now $f_{xx}(0, 0) = 0$
and so are $f_{xy}(0, 0)$ and $f_{yy}(0, 0)$. Therefore, the Hessian matrix is the zero matrix
and clearly has only the zero eigenvalue. Therefore, the second derivative test is totally
useless at this point.
However, suppose you took $x = t$ and $y = t$ and evaluated this function on this line.
This reduces to $h(t) = f(t, t) = 4t^3 - 3t^4$, which is strictly increasing near $t = 0$. This
shows the critical point (0, 0) of f is neither a local max. nor a local min. Next let
$x = 0$ and $y = t$. Then $p(t) \equiv f(0, t) = -3t^4$. Therefore, along the line, $(0, t)$, f has a
local maximum at (0, 0).
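The behavior along the two lines can be tabulated directly. The following is only a sketch that evaluates the restrictions h and p from this example at a few small values of t; the sample values of t are arbitrary choices.

```python
def f(x, y):
    return 6 * x * y**2 - 2 * x**3 - 3 * y**4

def h(t):
    # Restriction of f to the line x = t, y = t.
    return f(t, t)

def p(t):
    # Restriction of f to the line x = 0, y = t.
    return f(0.0, t)

for t in (-0.1, -0.01, 0.01, 0.1):
    print(t, h(t), p(t))
```

Along the first line the values change sign as t passes through 0, while along the second line they are negative on both sides of 0, which is the situation described above.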
Example 13.5.6 Find the critical points of the following function of three variables
and classify them as local minimums, local maximums or saddle points.
$$f(x, y, z) = \frac{5}{6}x^2 + 4x + 16 - \frac{7}{3}xy - 4y - \frac{4}{3}xz + 12z + \frac{5}{6}y^2 - \frac{4}{3}zy + \frac{1}{3}z^2$$
First you need to locate the critical points. This involves taking the gradient.
$$\nabla \left( \frac{5}{6}x^2 + 4x + 16 - \frac{7}{3}xy - 4y - \frac{4}{3}xz + 12z + \frac{5}{6}y^2 - \frac{4}{3}zy + \frac{1}{3}z^2 \right)$$
$$= \left( \frac{5}{3}x + 4 - \frac{7}{3}y - \frac{4}{3}z,\; -\frac{7}{3}x - 4 + \frac{5}{3}y - \frac{4}{3}z,\; -\frac{4}{3}x + 12 - \frac{4}{3}y + \frac{2}{3}z \right)$$
Next you need to set the gradient equal to zero and solve the equations. This yields
$y = 5$, $x = 3$, $z = -2$. Now to use the second derivative test, you assemble the Hessian
matrix which is
$$\begin{pmatrix} \frac{5}{3} & -\frac{7}{3} & -\frac{4}{3} \\ -\frac{7}{3} & \frac{5}{3} & -\frac{4}{3} \\ -\frac{4}{3} & -\frac{4}{3} & \frac{2}{3} \end{pmatrix}.$$
Note that in this simple example, the Hessian matrix is constant and so all that is left
is to consider the eigenvalues. Writing the characteristic equation and solving yields the
eigenvalues are $2, -2, 4$. Thus the given point is a saddle point.
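The claims in this example can be verified with exact rational arithmetic. This is a sketch only: the names `H`, `b`, `gradient`, `det3` and `char_poly` are illustrative. It checks that the gradient vanishes at $(3, 5, -2)$ and that $\det(\lambda I - H) = 0$ for $\lambda = -2, 2, 4$.

```python
from fractions import Fraction as Fr

# Constant Hessian of f and the coefficients of the linear terms 4x - 4y + 12z.
H = [[Fr(5, 3), Fr(-7, 3), Fr(-4, 3)],
     [Fr(-7, 3), Fr(5, 3), Fr(-4, 3)],
     [Fr(-4, 3), Fr(-4, 3), Fr(2, 3)]]
b = [Fr(4), Fr(-4), Fr(12)]

def gradient(p):
    # For a quadratic function the gradient is H p + b.
    return [sum(H[i][j] * p[j] for j in range(3)) + b[i] for i in range(3)]

def det3(m):
    # Cofactor expansion of a 3 x 3 determinant along the first row.
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def char_poly(lam):
    # det(lam * I - H)
    return det3([[(lam if i == j else 0) - H[i][j] for j in range(3)]
                 for i in range(3)])

print(gradient([3, 5, -2]))
print([char_poly(lam) for lam in (-2, 2, 4)])
```

Both printed lists consist of zeros, confirming the critical point and the three eigenvalues; since the eigenvalues have mixed signs the point is a saddle point.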
13.6
Exercises
1. Use the second derivative test on the critical points (1, 1), and $(1, -1)$ for Example
13.5.5.
2. If $H = H^T$ and $Hx = \lambda x$ while $Hy = \mu y$ for $\lambda \ne \mu$, show $x \cdot y = 0$.
The
must be checked at the critical points. First consider the point
1 eigenvalues
21
,
.
At
this
point, the Hessian is
2
4
24
0
0
2
and its eigenvalues are 24, 2, both negative. Therefore, the function has a local
maximum at this point.
Next consider (0, 4) . At this point the Hessian matrix is
2
10
10
2
2
10
10
2
Check its eigenvalues at the critical points. First consider the point
this point the Hessian is
53
2 , 12
. At
16
3
0
0
6
Check its eigenvalues at the critical points. First consider the point
this point, the Hessian matrix is
16
5
0
0
10
37
2 , 20
. At
10 6
6 10
and the eigenvalues are 16, 4. Therefore, there is a local minimum at this point.
There is also a local minimum at the critical point, (1, 2) .
48x2 8 8y 48x 8x + 4
. Check its eigenvalues
The Hessian matrix is
8x + 4
1 817
at the critical points. First consider the point 2 , 8 . This matrix is
3 0
and its eigenvalues are 3, 8.
0 8
Next consider (0, 2) at this point the Hessian matrix is
8 4
and the eigenvalues are 12, 4. Finally consider the point (1, 2) .
4 8
8 4
, eigenvalues: 12, 4.
4 8
If the eigenvalues are both negative, then local max. If both positive, then local
min. Otherwise the test fails.
7. Find the critical points of the following function of three variables and classify
them according to whether they are local minima, local maxima or saddle points.
$$f(x, y, z) = \frac{1}{3}x^2 + \frac{32}{3}x - \frac{4}{3} - \frac{16}{3}yx - \frac{58}{3}y - \frac{4}{3}zx - \frac{46}{3}z + \frac{1}{3}y^2 - \frac{4}{3}zy - \frac{5}{3}z^2.$$
Answer:
The critical point is at $(-2, 3, -5)$. The eigenvalues of the Hessian matrix at this
point are $-6$, $-2$, and 6.
8. Find the critical points of the following function of three variables and classify
them according to whether they are local minima, local maxima or saddle points.
$$f(x, y, z) = -\frac{5}{3}x^2 + \frac{2}{3}x - \frac{2}{3} + \frac{8}{3}yx + \frac{2}{3}y + \frac{14}{3}zx - \frac{28}{3}z - \frac{5}{3}y^2 + \frac{14}{3}zy - \frac{8}{3}z^2.$$
Answer:
The eigenvalues are 4, $-10$, and $-6$ and the only critical point is (1, 1, 0).
9. Find the critical points of the following function of three variables and classify
them according to whether they are local minima, local maxima or saddle points.
2
f (x, y, z) = 11
3 x +
40
3 x
56
3
+ 83 yx +
10
3 y
43 zx +
22
3 z
11 2
3 y
34 zy 53 z 2 .
10. Find the critical points of the following function of three variables and classify
them according to whether they are local minima, local maxima or saddle points.
f (x, y, z) = 32 x2 +
28
3 x
37
3
14
3 yx
10
3 y
43 zx
26
3 z
23 y 2 34 zy + 73 z 2 .
11. Show that if f has a critical point and some eigenvalue of the Hessian matrix
is positive, then there exists a direction in which when f is evaluated on the
line through the critical point having this direction, the resulting function of one
variable has a local minimum. State and prove a similar result in the case where
some eigenvalue of the Hessian matrix is negative.
12. Suppose = 0 but there are negative eigenvalues of the Hessian at a critical point.
Show by giving examples that the second derivative tests fails.
13. Show the points $\left( \frac{1}{2}, -\frac{9}{2} \right)$, $(0, -5)$, and $(1, -5)$ are critical points of the following
function of two variables and classify them as local minima, local maxima or saddle
points.
$$f(x, y) = 2x^4 - 4x^3 + 42x^2 + 8yx^2 - 8yx - 40x + 2y^2 + 20y + 50.$$
10 2
3 x
44
3 x
64
3
10
3 yx
16
3 y
+ 23 zx
20
3 z
10 2
3 y
+ 23 zy + 43 z 2 .
17. Find the critical points of the following function of three variables and classify
them as local minima, local maxima or saddle points.
f (x, y, z) = 37 x2
146
3 x
83
3
16
3 yx
+ 43 y
14
3 zx
94
3 z
73 y 2
14
3 zy
+ 83 z 2 .
18. Find the critical points of the following function of three variables and classify
them as local minima, local maxima or saddle points.
f (x, y, z) = 32 x2 + 4x + 75
14
3 yx
38y 83 zx 2z + 23 y 2 38 zy 13 z 2 .
19. Find the critical points of the following function of three variables and classify
them as local minima, local maxima or saddle points.
f (x, y, z) = 4x2 30x + 510 2yx + 60y 2zx 70z + 4y 2 2zy + 4z 2 .
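20. Show the critical points of the following function are points of the form, $(x, y, z) = \left( t, 2t^2 - 10t, -t^2 + 5t \right)$ for $t \in \mathbb{R}$ and classify them as local minima, local maxima
or saddle points.
$$f(x, y, z) = -\frac{1}{6}x^4 + \frac{5}{3}x^3 - \frac{25}{6}x^2 + \frac{10}{3}yx^2 - \frac{50}{3}yx + \frac{19}{3}zx^2 - \frac{95}{3}zx - \frac{5}{3}y^2 - \frac{10}{3}zy - \frac{1}{6}z^2.$$
The verification that the critical points are of the indicated form is left for you.
The Hessian is
$$\begin{pmatrix} -2x^2 + 10x - \frac{25}{3} + \frac{20}{3}y + \frac{38}{3}z & \frac{20}{3}x - \frac{50}{3} & \frac{38}{3}x - \frac{95}{3} \\ \frac{20}{3}x - \frac{50}{3} & -\frac{10}{3} & -\frac{10}{3} \\ \frac{38}{3}x - \frac{95}{3} & -\frac{10}{3} & -\frac{1}{3} \end{pmatrix}$$
at a critical point it is
$$\begin{pmatrix} -\frac{4}{3}t^2 + \frac{20}{3}t - \frac{25}{3} & \frac{20}{3}t - \frac{50}{3} & \frac{38}{3}t - \frac{95}{3} \\ \frac{20}{3}t - \frac{50}{3} & -\frac{10}{3} & -\frac{10}{3} \\ \frac{38}{3}t - \frac{95}{3} & -\frac{10}{3} & -\frac{1}{3} \end{pmatrix}.$$
The eigenvalues are
$$0, \quad -\frac{2}{3}t^2 + \frac{10}{3}t - 6 + \frac{2}{3}\sqrt{t^4 - 10t^3 + 493t^2 - 2340t + 2916}, \quad -\frac{2}{3}t^2 + \frac{10}{3}t - 6 - \frac{2}{3}\sqrt{t^4 - 10t^3 + 493t^2 - 2340t + 2916}.$$
If you graph these functions of t you find the second is always positive and the
third is always negative. Therefore, all these critical points are saddle points.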
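21. Show the critical points of the following function are
$$(0, -3, 0), \quad (2, -3, 0), \quad \left( 1, -3, -\frac{1}{3} \right)$$
and classify them as local minima, local maxima or saddle points.
$$f(x, y, z) = -\frac{3}{2}x^4 + 6x^3 - 6x^2 + zx^2 - 2zx - 2y^2 - 12y - 18 - \frac{3}{2}z^2.$$
The Hessian is
$$\begin{pmatrix} -12 + 36x + 2z - 18x^2 & 0 & -2 + 2x \\ 0 & -4 & 0 \\ -2 + 2x & 0 & -3 \end{pmatrix}$$
Now consider the critical point, $\left( 1, -3, -\frac{1}{3} \right)$. At this point the Hessian matrix
equals
$$\begin{pmatrix} \frac{16}{3} & 0 & 0 \\ 0 & -4 & 0 \\ 0 & 0 & -3 \end{pmatrix}.$$
The eigenvalues are $\frac{16}{3}$, $-3$, $-4$, which are of mixed sign, so this point is a saddle point.
Next consider the critical point, $(2, -3, 0)$. At this point the Hessian matrix is
$$\begin{pmatrix} -12 & 0 & 2 \\ 0 & -4 & 0 \\ 2 & 0 & -3 \end{pmatrix}$$
The eigenvalues are $-4$, $-\frac{15}{2} + \frac{1}{2}\sqrt{97}$, $-\frac{15}{2} - \frac{1}{2}\sqrt{97}$, all negative. Therefore,
there is a local max.
Finally consider the critical point, $(0, -3, 0)$. At this point the Hessian is
$$\begin{pmatrix} -12 & 0 & -2 \\ 0 & -4 & 0 \\ -2 & 0 & -3 \end{pmatrix}$$
and the eigenvalues are the same as the above, all negative. Therefore, there is a
local maximum at this point.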
22. Show the critical points of the following function are points of the form, $(x, y, z) = \left( t, 2t^2 + 6t, -t^2 - 3t \right)$ for $t \in \mathbb{R}$ and classify them as local minima, local maxima
or saddle points.
$$f(x, y, z) = -2yx^2 - 6yx - 4zx^2 - 12zx + y^2 + 2yz.$$
23. Show the critical points of the following function are $(0, -1, 0)$, $(4, -1, 0)$, and
$(2, -1, -12)$ and classify them as local minima, local maxima or saddle points.
$$f(x, y, z) = \frac{1}{2}x^4 - 4x^3 + 8x^2 - 3zx^2 + 12zx + 2y^2 + 4y + 2 + \frac{1}{2}z^2.$$
24. Can you establish the following theorem which generalizes Theorem 13.4.3? Suppose U is an open set contained in $D(f)$ such that f is differentiable at $x \in U$
and x is either a local minimum or local maximum for f. Then $\nabla f(x) = 0$. Hint:
It ought to be this way because it works like this for a function of one variable.
Differentiability at the local max. or min. is sufficient. You don't have to know
the function is differentiable near the point, only at the point. This is not hard
to do if you use the definition of the derivative.
25. Suppose $f(x, y)$, a function of two variables defined on all $\mathbb{R}^2$, has all directional
derivatives at (0, 0) and they are all equal to 0 there. Suppose also that for
$h(t) \equiv f(tu, tv)$ and $(u, v)$ a unit vector, it follows that $h''(0) > 0$. By the one
variable second derivative test, this implies that along every straight line through
(0, 0) the function restricted to this line has a local minimum at (0, 0). Can it be
concluded that f has a local minimum at (0, 0)? In other words, can you conclude
a point is a local minimum if it appears to be so along every straight line through
the point? Hint: Consider $f(x, y) = x^2 + y^2$ for $(x, y)$ not on the curve $y = x^2$
for $x \ne 0$ and on this curve, let $f = -1$.
13.7
Lagrange Multipliers
Lagrange multipliers are used to solve extremum problems for a function defined on a
level set of another function. For example, suppose you want to maximize $xy$ given
that $x + y = 4$. This is not too hard to do using methods developed earlier. Solve for
one of the variables, say y, in the constraint equation, $x + y = 4$ to find $y = 4 - x$.
Then the function to maximize is $f(x) = x(4 - x)$ and the answer is clearly $x = 2$.
Thus the two numbers are $x = y = 2$. This was easy because you could easily solve
the constraint equation for one of the variables in terms of the other. Now what if you
wanted to maximize $f(x, y, z) = xyz$ subject to the constraint that $x^2 + y^2 + z^2 = 4$? It
is still possible to do this using similar techniques. Solve for one of the variables
in the constraint equation, say z, substitute it into f, and then find where the partial
derivatives equal zero to find candidates for the extremum. However, it seems you might
encounter many cases and it does look a little fussy. Moreover, sometimes you can't solve
the constraint equation for one variable in terms of the others. Also, what if you had
many constraints? What if you wanted to maximize $f(x, y, z)$ subject to the constraints
$x^2 + y^2 = 4$ and $z = 2x + 3y^2$? Things are clearly getting more involved and messy. It
turns out that at an extremum, there is a simple relationship between the gradient of
the function to be maximized and the gradient of the constraint function. This relation
can be seen geometrically as in the following picture.
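[Figure: a piece of the level surface of g with the point $p = (x_0, y_0, z_0)$ and the vectors $\nabla g(x_0, y_0, z_0)$ and $\nabla f(x_0, y_0, z_0)$ drawn at that point.]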
In the picture, the surface represents a piece of the level surface of $g(x, y, z) = 0$
and $f(x, y, z)$ is the function of three variables which is being maximized or minimized
on the level surface and suppose the extremum of f occurs at the point $(x_0, y_0, z_0)$.
As shown above, $\nabla g(x_0, y_0, z_0)$ is perpendicular to the surface or more precisely to the
tangent plane. However, if $\mathbf{x}(t) = (x(t), y(t), z(t))$ is a point on a smooth curve which
passes through $(x_0, y_0, z_0)$ when $t = t_0$, then the function, $h(t) = f(x(t), y(t), z(t))$
must have either a maximum or a minimum at the point, $t = t_0$. Therefore, $h'(t_0) = 0$.
But this means
$$0 = h'(t_0) = \nabla f(x(t_0), y(t_0), z(t_0)) \cdot \mathbf{x}'(t_0) = \nabla f(x_0, y_0, z_0) \cdot \mathbf{x}'(t_0)$$
and since this holds for any such smooth curve, $\nabla f(x_0, y_0, z_0)$ is also perpendicular to
the surface. This picture represents a situation in three dimensions and you can see
that it is intuitively clear that this implies $\nabla f(x_0, y_0, z_0)$ is some scalar multiple of
$\nabla g(x_0, y_0, z_0)$. Thus
$$\nabla f(x_0, y_0, z_0) = \lambda \nabla g(x_0, y_0, z_0)$$
This $\lambda$ is called a Lagrange multiplier after Lagrange who considered such problems
in the 1700's.
Of course the above argument is at best only heuristic. It does not deal with the
question of existence of smooth curves lying in the constraint surface passing through
(x0 , y0 , z0 ) . Nor does it consider all cases, being essentially confined to three dimensions.
In addition to this, it fails to consider the situation in which there are many constraints.
However, I think it is likely a geometric notion like that presented above which led
Lagrange to formulate the method.
Example 13.7.1 Maximize $xyz$ subject to $x^2 + y^2 + z^2 = 27$.
Here $f(x, y, z) = xyz$ while $g(x, y, z) = x^2 + y^2 + z^2 - 27$. Then $\nabla g(x, y, z) = (2x, 2y, 2z)$
and $\nabla f(x, y, z) = (yz, xz, xy)$. Then at the point which maximizes this
function,
$$(yz, xz, xy) = \lambda (2x, 2y, 2z).$$
Therefore, each of $2\lambda x^2$, $2\lambda y^2$, $2\lambda z^2$ equals $xyz$. It follows that at any point which maximizes $xyz$, $|x| = |y| = |z|$. Therefore, the only candidates for the point where the maximum occurs are $(3, 3, 3)$, $(-3, -3, 3)$, $(-3, 3, 3)$, etc. The maximum occurs at $(3, 3, 3)$
which can be verified by plugging in to the function which is being maximized.
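The "plugging in" step can be sketched as follows. This is only an illustration: it enumerates the eight sign patterns $(\pm 3, \pm 3, \pm 3)$ allowed by $|x| = |y| = |z|$ and the constraint, and picks the largest value of $xyz$ among them.

```python
from itertools import product

def f(x, y, z):
    return x * y * z

def g(x, y, z):
    # Constraint function; the constraint is g = 0.
    return x**2 + y**2 + z**2 - 27

# All points with |x| = |y| = |z| = 3; each one satisfies the constraint.
candidates = [p for p in product((-3, 3), repeat=3) if g(*p) == 0]
best = max(candidates, key=lambda p: f(*p))

print(len(candidates), f(*best), f(3, 3, 3))
```

The largest value among the candidates is 27, attained at (3, 3, 3) and also at the candidates with exactly two negative coordinates.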
The method of Lagrange multipliers allows you to consider maximization of functions
defined on closed and bounded sets. Recall that any continuous function defined on a
closed and bounded set has a maximum and a minimum on the set. Candidates for the
extremum on the interior of the set can be located by setting the gradient equal to zero.
The consideration of the boundary can then sometimes be handled with the method of
Lagrange multipliers.
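Example 13.7.2 Maximize $f(x, y) = xy + y$ subject to the constraint, $x^2 + y^2 \le 1$.
Here I know there is a maximum because the set is the closed disk, a closed and
bounded set. Therefore, it is just a matter of finding it. Look for singular points on
the interior of the circle. $\nabla f(x, y) = (y, x + 1) = (0, 0)$. There are no points on the
interior of the circle where the gradient equals zero, since the only solution, $(-1, 0)$, lies on the boundary. Therefore, the maximum occurs on
the boundary of the circle. That is, the problem reduces to maximizing $xy + y$ subject
to $x^2 + y^2 = 1$. From the above,
$$(y, x + 1) - \lambda (2x, 2y) = 0.$$
Hence $y^2 - 2\lambda xy = 0$ and $x(x + 1) - 2\lambda xy = 0$ so $y^2 = x(x + 1)$. Therefore from the
constraint, $x^2 + x(x + 1) = 1$ and the solution is $x = -1$, $x = \frac{1}{2}$. Then the candidates
for a solution are $(-1, 0)$, $\left( \frac{1}{2}, \frac{\sqrt{3}}{2} \right)$, $\left( \frac{1}{2}, -\frac{\sqrt{3}}{2} \right)$. Then
$$f(-1, 0) = 0, \quad f\left( \frac{1}{2}, \frac{\sqrt{3}}{2} \right) = \frac{3\sqrt{3}}{4}, \quad f\left( \frac{1}{2}, -\frac{\sqrt{3}}{2} \right) = -\frac{3\sqrt{3}}{4}.$$
It follows the maximum value of this function is $\frac{3\sqrt{3}}{4}$ and it occurs at $\left( \frac{1}{2}, \frac{\sqrt{3}}{2} \right)$. The
minimum value is $-\frac{3\sqrt{3}}{4}$ and it occurs at $\left( \frac{1}{2}, -\frac{\sqrt{3}}{2} \right)$.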
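As a rough numerical cross-check of this example (a sketch only, with the number of sample points `n` chosen arbitrarily), one can evaluate $xy + y$ at the three candidate points and compare with a brute-force scan of the boundary circle parameterized by $(\cos\theta, \sin\theta)$.

```python
import math

def f(x, y):
    return x * y + y

# Candidate points on the boundary circle x^2 + y^2 = 1.
candidates = [(-1.0, 0.0),
              (0.5, math.sqrt(3) / 2),
              (0.5, -math.sqrt(3) / 2)]
values = [f(*p) for p in candidates]

# Brute-force scan of the boundary at n equally spaced angles.
n = 100000
scan_max = max(f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
               for k in range(n))

print(max(values), scan_max)
```

The largest candidate value and the scanned maximum should agree closely, both being approximately $\frac{3\sqrt{3}}{4}$.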
2 + 2 2 1
det
=0
1 2 2
which yields
= 1/8
Therefore,
3
y= x
4
(13.6)
2
3
3
2x + 2x x + x = 4
4
4
2
and so
8
8
17 or
17
17
17
Now from 13.6, the points of interest on the boundary of this set are
8
6
8
6
17,
17 , and
17,
17 .
17
17
17
17
x=
8
6
17,
17
17
17
=
=
8
6
f
17,
17
=
17
17
=
8
17
17
112
17
(13.7)
2
6
8
17
17
17
17
2
8
6
8
17
17
17
17
17
17
112
17
It follows the maximum value of this function on the given set occurs at (0, 0) and is
equal to zero and the minimum occurs at either of the two points in 13.7 and has the
value $-112/17$.
This illustrates how to use the method of Lagrange multipliers to identify the extrema
for a function defined on a closed and bounded set. You try and consider the boundary
as a level curve or level surface and then use the method of Lagrange multipliers on it
and look for singular points on the interior of the set.
There are no magic bullets here. It was still required to solve a system of nonlinear
equations to get the answer. However, it does often help to do it this way.
The above generalizes to a general procedure which is described in the following major Theorem. All correct proofs of this theorem will involve some appeal to the implicit
function theorem or to fundamental existence theorems from differential equations. A
complete proof is very fascinating but it will not come cheap. Good advanced calculus
books will usually give a correct proof and there is a proof given in an appendix to this
book. First here is a simple definition explaining one of the terms in the statement of
this theorem.
Definition 13.7.4 Let A be an $m \times n$ matrix. A submatrix is any matrix which can be
obtained from A by deleting some rows and some columns.
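Theorem 13.7.5 Let U be an open subset of $\mathbb{R}^n$ and let $f : U \to \mathbb{R}$ be a $C^1$ function. Then if $x_0 \in U$ is either a local maximum or local minimum of f subject to the
constraints
$$g_i(x) = 0, \quad i = 1, \cdots, m \tag{13.8}$$
and if some $m \times m$ submatrix of
$$Dg(x_0) \equiv \begin{pmatrix} g_{1x_1}(x_0) & g_{1x_2}(x_0) & \cdots & g_{1x_n}(x_0) \\ \vdots & \vdots & & \vdots \\ g_{mx_1}(x_0) & g_{mx_2}(x_0) & \cdots & g_{mx_n}(x_0) \end{pmatrix}$$
has nonzero determinant, then there exist scalars, $\lambda_1, \cdots, \lambda_m$ such that
$$\begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1 \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{13.9}$$
holds.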
To help remember how to use 13.9 it may be helpful to do the following. First write
the Lagrangian,
$$L = f(x) - \sum_{i=1}^{m} \lambda_i g_i(x)$$
and then proceed to take derivatives with respect to each of the components of x and also
derivatives with respect to each $\lambda_i$ and set all of these equations equal to 0. The formula
13.9 is what results from taking the derivatives of L with respect to the components
of x. When you take the derivatives with respect to the Lagrange multipliers, and set
what results equal to 0, you just pick up the constraint equations. This yields $n + m$
equations for the $n + m$ unknowns, $x_1, \cdots, x_n, \lambda_1, \cdots, \lambda_m$. Then you proceed to look for
solutions to these equations. Of course these might be impossible to find using methods
of algebra, but you just do your best and hope it will work out.
Example 13.7.6 Minimize $xyz$ subject to the constraints $x^2 + y^2 + z^2 = 4$ and $x - 2y = 0$.
Form the Lagrangian,
$$L = xyz - \lambda \left( x^2 + y^2 + z^2 - 4 \right) - \mu (x - 2y)$$
and proceed to take derivatives with respect to every possible variable, leading to the
following system of equations.
$$\begin{aligned} yz - 2\lambda x - \mu &= 0 \\ xz - 2\lambda y + 2\mu &= 0 \\ xy - 2\lambda z &= 0 \\ x^2 + y^2 + z^2 &= 4 \\ x - 2y &= 0 \end{aligned}$$
Now you have to find the solutions to this system of equations. In general, this could
be very hard or even impossible. If $\lambda = 0$, then from the third equation, either x or y
must equal 0. Therefore, from the first two equations, $\mu = 0$ also. If $\mu = 0$ and $\lambda \ne 0$,
then from the first two equations, $xyz = 2\lambda x^2$ and $xyz = 2\lambda y^2$ and so either $x = y$ or
$x = -y$, which requires that both x and y equal zero thanks to the last equation. But
then from the fourth equation, $z = \pm 2$ and now this contradicts the third equation.
Thus $\mu$ and $\lambda$ are either both equal to zero or neither one is and the expression, $xyz$
equals zero in this case. However, I know this is not the best value for a minimizer
because $xyz$ takes negative values on the constraint set. The remaining case, in which neither $\lambda$ nor $\mu$ equals zero, yields the points
$$\left( 2\sqrt{\tfrac{8}{15}}, \sqrt{\tfrac{8}{15}}, -\sqrt{\tfrac{4}{3}} \right) \text{ or } \left( -2\sqrt{\tfrac{8}{15}}, -\sqrt{\tfrac{8}{15}}, -\sqrt{\tfrac{4}{3}} \right)$$
and the minimum value of the function subject to the constraints is
$$2 \cdot \frac{8}{15} \cdot \left( -\sqrt{\tfrac{4}{3}} \right) = -\frac{32}{45}\sqrt{3}.$$
You should rework this problem first solving the second easy constraint for x and
then producing a simpler problem involving only the variables y and z.
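The reworking suggested above can be sketched numerically. Under the assumption that $x = 2y$ is substituted first, the sphere constraint gives $y^2 = (4 - z^2)/5$ and the objective becomes $xyz = 2y^2 z = \frac{2}{5}(4 - z^2) z$, a function of z alone on $[-2, 2]$. The name `reduced` and the grid size are illustrative choices.

```python
import math

def reduced(z):
    # xyz after substituting x = 2y and y^2 = (4 - z^2) / 5.
    return 2 * (4 - z**2) / 5 * z

# Setting the derivative (2/5)(4 - 3 z^2) to zero gives z = -2/sqrt(3) for the minimum.
z_min = -2 / math.sqrt(3)
y = math.sqrt((4 - z_min**2) / 5)
x = 2 * y

# Coarse scan of [-2, 2] as an independent check of the minimum value.
scan = min(reduced(-2 + 4 * k / 100000) for k in range(100001))

print(x * y * z_min, reduced(z_min), scan)
```

All three printed numbers should be close to $-\frac{32}{45}\sqrt{3}$, and the value of $y^2$ found this way is $\frac{8}{15}$.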
13.8
Exercises
y2
9
1.
y2
9
1.
x2
4
y2
9
+ z 2 1.
y2
9
+ z 2 1.
8. Find points on xy = 4 farthest from (0, 0) if any exist. If none exist, tell why.
What does this say about the method of Lagrange multipliers?
9. A can is supposed to have a volume of 36 cubic centimeters. Find the dimensions
of the can which minimizes the surface area.
10. A can is supposed to have a volume of 36 cubic centimeters. The top and bottom
of the can are made of tin costing 4 cents per square centimeter and the sides of
the can are made of aluminum costing 5 cents per square centimeter. Find the
dimensions of the can which minimizes the cost.
11. Minimize $\sum_{j=1}^{n} x_j$ subject to the constraint $\sum_{j=1}^{n} x_j^2 = a^2$. Your answer should
be some function of a which you may assume is a positive number.
12. Find the point, $(x, y, z)$ on the level surface, $4x^2 + y^2 - z^2 = 1$ which is closest to
(0, 0, 0).
13. A curve is formed from the intersection of the plane, $2x + 3y + z = 3$ and the
cylinder $x^2 + y^2 = 4$. Find the point on this curve which is closest to (0, 0, 0).
14. A curve is formed from the intersection of the plane, $2x + 3y + z = 3$ and the
sphere $x^2 + y^2 + z^2 = 16$. Find the point on this curve which is closest to (0, 0, 0).
15. Find the point on the plane, 2x + 3y + z = 4 which is closest to the point (1, 2, 3) .
16. Let $A = (A_{ij})$ be an $n \times n$ matrix which is symmetric. Thus $A_{ij} = A_{ji}$ and recall
$(Ax)_i = A_{ij} x_j$ where as usual sum over the repeated index. Show $\frac{\partial}{\partial x_i}(A_{ij} x_j x_i) = 2A_{ij} x_j$.
Show that when you use the method of Lagrange multipliers to maximize
the function, $A_{ij} x_j x_i$ subject to the constraint, $\sum_{j=1}^{n} x_j^2 = 1$, the value of $\lambda$ which
corresponds to the maximum value of this function is such that $A_{ij} x_j = \lambda x_i$.
Thus $Ax = \lambda x$.
T
x2
4
y2
9
26. Let $f(x_1, \cdots, x_n) = x_1^n x_2^{n-1} \cdots x_n^1$. Then f achieves a maximum on the set,
$$S \equiv \left\{ x \in \mathbb{R}^n : \sum_{i=1}^{n} i x_i = 1 \text{ and each } x_i \ge 0 \right\}.$$
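Show the maximum is $\left( r^2 / n \right)^n$. Now show from this that
$$\left( \prod_{i=1}^{n} x_i^2 \right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i^2$$
and finally, conclude that if each $x_i \ge 0$, then
$$\left( \prod_{i=1}^{n} x_i \right)^{1/n} \le \frac{1}{n} \sum_{i=1}^{n} x_i$$
and there exist values of the $x_i$ for which equality holds. This says the geometric
mean is always smaller than the arithmetic mean.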
29. Maximize $x^2 y^2$ subject to the constraint
$$\frac{x^{2p}}{p} + \frac{y^{2q}}{q} = r^2$$
where p, q are real numbers larger than 1 which have the property that
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Show the maximum is achieved when $x^{2p} = y^{2q}$ and equals $r^2$. Now conclude that
if $x, y > 0$, then
$$xy \le \frac{x^p}{p} + \frac{y^q}{q}$$
and there are values of x and y where this inequality is an equation.
13.9
2
2
3
z. x2 + 2 3x
+ (6x) = 9. This gives the values for x as x = 83
166, x =
2
3
83
166. From the three equations above, this also determines the values of z
9
9
18
and y. y = 166
166 or 166
166 and z = 18
83 166 or 83 166. Thus there
are two points to look at. One will give the minimum value and the other will
give the maximum value. You know the minimum
and maximum
exist
because
of
3
9
166, 166
166, 18
166
and
the extreme value theorem. The two points are 83
83
3
9
18
83 166, 166 166, 83 166 . Now you just need to find which is the minimum
and which is the
maximum.
9 Plug these
in to the
function you are trying to
3
maximize. 83
166 + 3 166
166 6 18
166
will clearly be the maximum
83
3
9
value occuring at 83
166, 166
166, 18
166
.
The
other point will obviously
83
yield the minimum because this one is positive
and
the
other
3
9 oneis negative.
18 If you
of coefficients must equal zero. Thus
$$\begin{vmatrix} 2\lambda & -4 \\ -4 & 8\lambda \end{vmatrix} = 16\lambda^2 - 16 = 0.$$
This is because the system of equations is of the form
$$\begin{pmatrix} 2\lambda & -4 \\ -4 & 8\lambda \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
If the matrix has an inverse, then the only solution would be $x = y = 0$ which as noted above can't happen. Therefore, $\lambda = \pm 1$. First suppose $\lambda = 1$. Then the first equation says $2y = x$. Plugging this in to the constraint equation, $x^2 + x^2 = 4$ and so $x = \pm\sqrt{2}$. Therefore, $y = \pm\frac{\sqrt{2}}{2}$. This yields the dimensions of the largest rectangle to be $2\sqrt{2} \times \sqrt{2}$. You can check all the other cases and see you get the same thing in the other cases as well.
3. Maximize $2x + y$ subject to the condition that $\frac{x^2}{4} + y^2 \leq 1$.
The maximum of this function clearly exists because of the extreme value theorem since the condition defines a closed and bounded set in $\mathbb{R}^2$. However, this function does not achieve its maximum on the interior of the given ellipse defined by $\frac{x^2}{4} + y^2 \leq 1$ because the gradient of the function which is to be maximized is never equal to zero. Therefore, this function must achieve its maximum on the set $\frac{x^2}{4} + y^2 = 1$. Thus you want to maximize $2x + y$ subject to $\frac{x^2}{4} + y^2 = 1$. This is just like Problem 1. You can finish this.
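If you want to check your answer to Problem 3 numerically, here is a minimal sketch. It is not part of the original solution. It assumes the boundary of the ellipse is parametrized by $x = 2\cos t$, $y = \sin t$ (a choice made here only for illustration), samples $2x + y$ along it, and compares the largest sample with the value at the point $(8/\sqrt{17}, 1/\sqrt{17})$ which the Lagrange equations $2 = \lambda x/2$, $1 = 2\lambda y$ together with the constraint produce.

```python
import math

def boundary_max(samples=100000):
    """Largest sampled value of 2x + y on the ellipse x^2/4 + y^2 = 1."""
    best = -math.inf
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        x, y = 2.0 * math.cos(t), math.sin(t)  # a point on the boundary
        best = max(best, 2.0 * x + y)
    return best

# Candidate point obtained by hand from the Lagrange multiplier equations.
x_star = 8.0 / math.sqrt(17.0)
y_star = 1.0 / math.sqrt(17.0)
lagrange_value = 2.0 * x_star + y_star

print(boundary_max(), lagrange_value)
```

The two printed numbers should agree to several decimal places, which suggests the maximum is $\sqrt{17}$.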
4. Find the points on $y^2x = 16$ which are closest to $(0, 0)$.
You want to minimize $x^2 + y^2$ subject to $y^2x - 16 = 0$. Of course you really want to minimize $\sqrt{x^2 + y^2}$ but the ordered pair which minimizes $x^2 + y^2$ is the same as the ordered pair which minimizes $\sqrt{x^2 + y^2}$ so it is pointless to drag around the square root.
x nor y can equal zero and solve the constraint. Therefore, the second equation implies $\lambda x = 1$. Hence $\lambda = \frac{1}{x} = \frac{2x}{y^2}$. Therefore, $2x^2 = y^2$ and so $2x^3 = 16$ and
L = x2 + y 2 + z 2 (2x + y + z 3) x2 + y 2 4
Then this yields the equations 2x 2 2x = 0, 2y 2y, and 2z = 0.
The last equation says = 2z and so I will replace with 2z where ever it occurs.
This yields
x 2z x = 0, 2y 2z 2y = 0.
This shows x (1 ) = 2y (1 ) . First suppose = 1. Then from the above equa2
2
tions, z = 0 and so the two constraints
to 2y + x= 3 and
x 6+ y 1=
4 and
3 2 reduce
6
1
3
2
2y+x = 3. The solutions are 5 5 11, 5 + 5 11, 0 , 5 + 5 11, 5 5 11, 0 .
The other case is that 6= 1 in which case x = 2y and the second constraint
yields
that y = 25 and x = 45 . Now from the first constraint, z = 2 5 + 3 in
the case where y = 25 and z = 2 5 + 3 in the other case. This yields the
points 45 , 25 , 2 5 + 3 and 45 , 25 , 2 5 + 3 . This appears to have exhausted all the possibilities and so it is now just a matter of seeing which of
these points gives the best answer. An answer exists because of the extreme
value theorem. After all, this constraint set is closed and bounded. The first
2
2
2
2
the last candidate yields 45 + 25 + 2 5 + 3 = 4 + 2 5 + 3 also
larger than 4. Therefore, there
are two points
ofintersection
on the curve
which
is $\left(\frac{1}{3}, \frac{1}{3}, \frac{1}{3}\right)$ which you could probably see was the answer from the sketch. However, this could be made more difficult rather easily such that the sketch won't help but Lagrange multipliers will.
13.10
2f
xi xj
Now recall the Taylor formula with the Lagrange form of the remainder. Here is a statement and proof of this important theorem.
Theorem 13.10.2 Suppose $f$ has $n + 1$ derivatives on an interval, $(a, b)$ and let $c \in (a, b)$. Then if $x \in (a, b)$, there exists $\xi$ between $c$ and $x$ such that
$$f(x) = f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!}(x - c)^k + \frac{f^{(n+1)}(\xi)}{(n+1)!}(x - c)^{n+1}.$$
(In this formula, the symbol $\sum_{k=1}^0 a_k$ will denote the number 0.)
Proof: If $n = 0$ then the theorem is true because it is just the mean value theorem. Suppose the theorem is true for $n - 1$, $n \geq 1$. It can be assumed $x \neq c$ because if $x = c$ there is nothing to show. Then there exists $K$ such that
$$f(x) - \left( f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!}(x - c)^k + K(x - c)^{n+1} \right) = 0 \tag{13.10}$$
In fact,
$$K = \frac{-f(x) + \left( f(c) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!}(x - c)^k \right)}{-(x - c)^{n+1}}.$$
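It remains to find $K$. Define $F(t)$ for $t$ in the closed interval determined by $x$ and $c$ by
$$F(t) \equiv f(x) - \left( f(t) + \sum_{k=1}^n \frac{f^{(k)}(c)}{k!}(x - t)^k + K(x - t)^{n+1} \right).$$
Then $F(x) = 0$ and, by 13.10, $F(c) = 0$, so by the mean value theorem there exists $t_1$ between $x$ and $c$ such that $F'(t_1) = 0$.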
Therefore,
$$\begin{aligned}
0 &= f'(t_1) - \sum_{k=1}^n \frac{f^{(k)}(c)}{k!} k (x - t_1)^{k-1} - K(n+1)(x - t_1)^n \\
&= f'(t_1) - \left( f'(c) + \sum_{k=1}^{n-1} \frac{f^{(k+1)}(c)}{k!}(x - t_1)^k \right) - K(n+1)(x - t_1)^n \\
&= f'(t_1) - \left( f'(c) + \sum_{k=1}^{n-1} \frac{(f')^{(k)}(c)}{k!}(x - t_1)^k \right) - K(n+1)(x - t_1)^n
\end{aligned}$$
By induction applied to $f'$, there exists $\xi$ between $x$ and $t_1$ such that the above simplifies to
$$\begin{aligned}
0 &= \frac{(f')^{(n)}(\xi)(x - t_1)^n}{n!} - K(n+1)(x - t_1)^n \\
&= \frac{f^{(n+1)}(\xi)(x - t_1)^n}{n!} - K(n+1)(x - t_1)^n
\end{aligned}$$
therefore,
$$K = \frac{f^{(n+1)}(\xi)}{(n+1)\,n!} = \frac{f^{(n+1)}(\xi)}{(n+1)!}$$
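To see the Lagrange form of the remainder at work, here is a small sketch for the particular function $f(x) = e^x$. This choice is an assumption made only for illustration, because every derivative of $e^x$ is again $e^x$, so the equation for the remainder can be solved for $\xi$ directly. The function names are hypothetical.

```python
import math

def taylor_poly_exp(c, x, n):
    """Degree n Taylor polynomial of exp about c, evaluated at x."""
    # Every derivative of exp at c equals exp(c); the k = 0 term is f(c).
    return sum(math.exp(c) * (x - c) ** k / math.factorial(k)
               for k in range(n + 1))

def lagrange_xi_exp(c, x, n):
    """Solve exp(xi) (x - c)^(n+1) / (n+1)! = remainder for xi."""
    remainder = math.exp(x) - taylor_poly_exp(c, x, n)
    ratio = remainder * math.factorial(n + 1) / (x - c) ** (n + 1)
    return math.log(ratio)

xi_right = lagrange_xi_exp(0.0, 1.0, 3)   # x to the right of c
xi_left = lagrange_xi_exp(0.0, -1.0, 2)   # x to the left of c
print(xi_right, xi_left)
```

In both cases the computed $\xi$ should land strictly between $c$ and $x$, as the theorem asserts.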
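$$h'(t) = \frac{\partial f}{\partial x_i}(x + tv)\,v_i, \quad h''(t) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x + tv)\,v_i v_j.$$
$$h''(t) = v^T H(x + tv)\, v.$$
$$\begin{aligned}
f(x + v) &= f(x) + \frac{\partial f}{\partial x_i}(x)\,v_i + \frac{1}{2} v^T H(x + tv)\, v \\
&= f(x) + \frac{\partial f}{\partial x_i}(x)\,v_i + \frac{1}{2} v^T H(x)\, v + \frac{1}{2} v^T \left( H(x + tv) - H(x) \right) v
\end{aligned} \tag{13.11}$$
where the last term satisfies
$$\lim_{|v| \to 0} \frac{\frac{1}{2} v^T \left( H(x + tv) - H(x) \right) v}{|v|^2} = 0 \tag{13.12}$$
because of the continuity of the entries of $H(x)$.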
Theorem 13.10.3 Suppose $x$ is a critical point for $f$. That is, suppose $\frac{\partial f}{\partial x_i}(x) = 0$ for each $i$. Then if $H(x)$ has all positive eigenvalues, $x$ is a local minimum. If $H(x)$ has all negative eigenvalues, then $x$ is a local maximum. If $H(x)$ has a positive eigenvalue, then there exists a direction in which $f$ has a local minimum at $x$, while if $H(x)$ has a negative eigenvalue, there exists a direction in which $f$ has a local maximum at $x$.
Proof: Since $x$ is a critical point,
$$f(x + v) = f(x) + \frac{1}{2} v^T H(x)\, v + \frac{1}{2} v^T \left( H(x + tv) - H(x) \right) v \tag{13.13}$$
and by continuity of the second derivatives, these mixed second derivatives are equal and so $H(x)$ is a symmetric matrix. Thus, by Lemma A.2.27 on Page 425 in the appendix, $H(x)$ has all real eigenvalues. Suppose first that $H(x)$ has all positive eigenvalues and that all are larger than $\delta^2 > 0$. Then by Theorem A.2.29 on Page 425 of the appendix,
$$u^T H(x)\, u \geq \delta^2 |u|^2$$
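Here $\lambda^2$ denotes a positive eigenvalue of $H(x)$ and $v$ a corresponding eigenvector. Replacing $v$ by $sv$ in 13.13,
$$f(x + sv) = f(x) + \frac{1}{2} s^2 v^T H(x)\, v + \frac{1}{2} s^2 v^T \left( H(x + tsv) - H(x) \right) v$$
which implies
$$\begin{aligned}
f(x + sv) &= f(x) + \frac{1}{2} s^2 \lambda^2 |v|^2 + \frac{1}{2} s^2 v^T \left( H(x + tsv) - H(x) \right) v \\
&\geq f(x) + \frac{1}{4} s^2 \lambda^2 |v|^2
\end{aligned}$$
whenever $s$ is small enough. Thus in the direction $v$ the function has a local minimum at $x$. The assertion about the local maximum in some direction follows similarly. This proves the theorem.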
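The test in Theorem 13.10.3 is easy to try out on a function of two variables. The following is only a sketch under stated assumptions: the Hessian is approximated by finite differences with a step size chosen here, the eigenvalues of the symmetric $2 \times 2$ matrix come from the quadratic formula, and the function names and the sample functions are hypothetical.

```python
import math

def hessian(f, x, y, h=1e-4):
    """Finite difference approximation to (f_xx, f_xy, f_yy) at (x, y)."""
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / (h * h)
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / (h * h)
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)
    return fxx, fxy, fyy

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean - radius, mean + radius

def classify(f, x, y, tol=1e-6):
    """Apply the eigenvalue test at a point assumed to be a critical point."""
    lo, hi = eigenvalues_sym2(*hessian(f, x, y))
    if lo > tol:
        return "local minimum"
    if hi < -tol:
        return "local maximum"
    if lo < -tol and hi > tol:
        return "minimum in one direction, maximum in another"
    return "test gives no information"

print(classify(lambda x, y: x * x + x * y + y * y, 0.0, 0.0))
print(classify(lambda x, y: x * x - y * y, 0.0, 0.0))
```

Note the routine does not verify that the point is critical. You must check that the gradient vanishes there before trusting the classification.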
Outcomes
14.1
z = f (x, y)
In this picture, the volume of the little prism which lies above the rectangle $Q$ and below the graph of the function would lie between $M_Q(f)\,v(Q)$ and $m_Q(f)\,v(Q)$ where
$$M_Q(f) \equiv \sup\{f(x) : x \in Q\}, \quad m_Q(f) \equiv \inf\{f(x) : x \in Q\}, \tag{14.1}$$
and $v(Q)$ is defined as the area of $Q$. Now consider the following picture.
In this picture, it is assumed $f$ equals zero outside the circle and $f$ is a bounded nonnegative function. Then each of those little squares is the base of a prism of the sort in the previous picture and the sum of the volumes of those prisms should be the volume under the surface, $z = f(x, y)$. Therefore, the desired volume must lie between the two numbers,
$$\sum_Q M_Q(f)\,v(Q) \text{ and } \sum_Q m_Q(f)\,v(Q)$$
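$(x, y) \in [a, b] \times [c, d]$ means $x \in [a, b]$ and also $y \in [c, d]$ and the points which do this comprise the rectangle just as shown in the picture.
Definition 14.1.1 For $i = 1, 2$, let $\{\alpha_k^i\}_{k=-\infty}^{\infty}$ be points on $\mathbb{R}$ which satisfy
$$\lim_{k \to \infty} \alpha_k^i = \infty, \quad \lim_{k \to -\infty} \alpha_k^i = -\infty, \quad \alpha_k^i < \alpha_{k+1}^i. \tag{14.2}$$
A grid is the collection of rectangles of the form
$$Q = [\alpha_k^1, \alpha_{k+1}^1] \times [\alpha_l^2, \alpha_{l+1}^2]. \tag{14.3}$$
If $G$ is a grid, another grid, $F$ is a refinement of $G$ if every box of $G$ is the union of boxes of $F$.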
The symbol
$$\sum_{Q \in G} M_Q(f)\,v(Q)$$
is called the upper sum associated with the grid, $G$ as described above in the discussion of the volume under a surface. Again, this means to take a rectangle from $G$, multiply $M_Q(f)$ defined in 14.1 by its area, $v(Q)$, and sum all these products for every $Q \in G$. The symbol,
$$\sum_{Q \in G} m_Q(f)\,v(Q),$$
called a lower sum, is defined similarly. With this preparation it is time to give a definition of the Riemann integral of a function of two variables.
Definition 14.1.2 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a bounded function which equals zero for all $(x, y)$ outside some bounded set. Then $\int f\,dV$ is defined to be the unique number which lies between all upper sums and all lower sums. In the case of $\mathbb{R}^2$, it is common to replace the $V$ with $A$ and write this symbol as $\int f\,dA$ where $A$ stands for area.
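Here is a small sketch of what upper and lower sums look like in practice. Everything specific in it is an assumption made for illustration: the function is $f(x, y) = xy$ on the square $[0, 1] \times [0, 1]$, the grid consists of $n^2$ equal squares, and only the squares inside $[0, 1] \times [0, 1]$ are summed. Because this $f$ increases in each variable on the square, the supremum over a grid square is at its upper right corner and the infimum is at its lower left corner.

```python
def upper_and_lower_sums(n):
    """Upper and lower sums of f(x, y) = x*y over [0,1]x[0,1], n-by-n grid."""
    side = 1.0 / n
    area = side * side  # v(Q), the same for every square Q of this grid
    upper = 0.0
    lower = 0.0
    for i in range(n):
        for j in range(n):
            # sup of f on Q is at the upper right corner of Q,
            # inf of f on Q is at the lower left corner of Q.
            upper += (i + 1) * side * (j + 1) * side * area
            lower += i * side * j * side * area
    return upper, lower

for n in (10, 50, 250):
    print(n, upper_and_lower_sums(n))
```

As the grid is refined, the two sums squeeze together around $1/4$, the only number lying between all of them.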
This definition begs a difficult question. For which functions does there exist a
unique number between all the upper and lower sums? This interesting and fundamental
question is discussed in any advanced calculus book and may be seen in the appendix
on the theory of the Riemann integral. It is a hard problem which was only solved in
the first part of the twentieth century. When it was solved, it was also realized that the
Riemann integral was not the right integral to use.
Consider the question: How can the Riemann integral be computed? Consider the
following picture where f is assumed to be 0 outside the base of the solid which is
contained in some rectangle $[a, b] \times [c, d]$.
It depicts a slice taken from the solid defined by $\{(x, y, z) : 0 \leq z \leq f(x, y)\}$. You see
these when you look at a loaf of bread. If you wanted to find the volume of the loaf of
bread, and you knew the volume of each slice of bread, you could find the volume of
the whole loaf by adding the volumes of individual slices. It is the same here. If you
could find the volume of the slice represented in this picture, you could add these up
and get the volume of the solid. The slice in the picture corresponds to y and y + h and
is assumed to be very thin, having thickness equal to h. Denote the volume of the solid
under the graph of $z = f(x, y)$ on $[a, b] \times [c, y]$ by $V(y)$. Then
$$V(y + h) - V(y) \approx h \int_a^b f(x, y)\,dx$$
where the integral is obtained by fixing $y$ and integrating with respect to $x$ and is the area of the cross section corresponding to $y$. It is hoped that the approximation would be increasingly good as $h$ gets smaller. Thus, dividing by $h$ and taking a limit, it is expected that
$$V'(y) = \int_a^b f(x, y)\,dx, \quad V(c) = 0.$$
Therefore, as in the method of cross sections, the volume of the solid under the graph of $z = f(x, y)$ is obtained by doing $\int_c^d$ to both sides,
$$\int_c^d \left( \int_a^b f(x, y)\,dx \right) dy \tag{14.4}$$
but this volume was also the result of $\int f\,dV$. Therefore, it is expected that this is a way to evaluate $\int f\,dV$.
Note what has been gained here. A hard problem, finding $\int f\,dV$, is reduced to a sequence of easier problems. First do
$$\int_a^b f(x, y)\,dx$$
getting a function of $y$, say $F(y)$, and then do
$$\int_c^d \left( \int_a^b f(x, y)\,dx \right) dy = \int_c^d F(y)\,dy.$$
Of course there is nothing special about fixing $y$ first. The same thing should be obtained from the integral,
$$\int_a^b \left( \int_c^d f(x, y)\,dy \right) dx \tag{14.5}$$
These expressions in 14.4 and 14.5 are called iterated integrals. They are tools for evaluating $\int f\,dV$ which would be hard to find otherwise. In practice, the parentheses are usually omitted in these expressions. Thus
$$\int_a^b \left( \int_c^d f(x, y)\,dy \right) dx = \int_a^b \int_c^d f(x, y)\,dy\,dx$$
and it is understood that you are to do the inside integral first and then when you have done it, obtaining a function of $x$, you integrate this function of $x$. Note that this is nothing more than using an integral to compute the area of a cross section and then using this method to find a volume.
However, there is no difference in the general case where $f$ is not necessarily nonnegative as can be seen by applying the method to the nonnegative functions $f^+, f^-$ given by
$$f^+ \equiv \frac{|f| + f}{2}, \quad f^- \equiv \frac{|f| - f}{2}$$
+
f (x, y) dy dx
a
$$f_1(x, y) \equiv \begin{cases} f(x, y) & \text{if } (x, y) \in S \\ 0 & \text{if } (x, y) \notin S \end{cases}.$$
Then
$$\int_S f\,dV \equiv \int f_1\,dV.$$
R
R
f dV.
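This is done using iterated integrals like those defined above. Thus
$$\int_R f\,dV = \int_0^1 \int_0^2 \left( x^2 y + yx \right) dy\,dx.$$
The inside integral yields
$$\int_0^2 \left( x^2 y + yx \right) dy = 2x^2 + 2x$$
and now the process is completed by doing $\int_0^1$ to what was just obtained. Thus
$$\int_0^1 \int_0^2 \left( x^2 y + yx \right) dy\,dx = \int_0^1 \left( 2x^2 + 2x \right) dx = \frac{5}{3}.$$
If the integration is done in the opposite order, the same answer should be obtained.
$$\int_0^2 \int_0^1 \left( x^2 y + yx \right) dx\,dy$$
Now
$$\int_0^1 \left( x^2 y + yx \right) dx = \frac{5}{6} y$$
and so
$$\int_0^2 \int_0^1 \left( x^2 y + yx \right) dx\,dy = \int_0^2 \frac{5}{6} y\,dy = \frac{5}{3}.$$
If a different answer had been obtained it would have been a sign that a mistake had been made.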
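The agreement of the two orders of integration can also be seen numerically. The sketch below is an illustration only. It assumes the integrand $x^2y + yx$ on the rectangle $0 \leq x \leq 1$, $0 \leq y \leq 2$ and uses a simple midpoint rule (a technique chosen here, with a hypothetical helper name) for each one dimensional integral.

```python
def midpoint(g, a, b, n=200):
    """Midpoint rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    return x * x * y + y * x

# Inside integral with respect to y, outside with respect to x.
dy_dx = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 2.0), 0.0, 1.0)

# Inside integral with respect to x, outside with respect to y.
dx_dy = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 1.0), 0.0, 2.0)

print(dy_dx, dx_dy)
```

Both printed values should be close to $5/3$.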
Example 14.1.5 Let $f(x, y) = x^2 y + yx$ for $(x, y) \in R$ where $R$ is the triangular region defined to be in the first quadrant, below the line $y = x$ and to the left of the line $x = 4$. Find $\int_R f\,dV$.
$$\int_R f\,dV = \int_0^4 \int_0^x \left( x^2 y + yx \right) dy\,dx$$
The reason for this is that $x$ goes from 0 to 4 and for each fixed $x$ between 0 and 4, $y$ goes from 0 to the slanted line, $y = x$, the function being defined to be 0 for larger $y$. Now
$$\int_0^x \left( x^2 y + yx \right) dy = \frac{1}{2} x^4 + \frac{1}{2} x^3$$
and so
$$\int_R f\,dV = \int_0^4 \left( \frac{1}{2} x^4 + \frac{1}{2} x^3 \right) dx = \frac{672}{5}.$$
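What of integration in a different order? Let's put the integral with respect to $y$ on the outside and the integral with respect to $x$ on the inside. Then
$$\int_R f\,dV = \int_0^4 \int_y^4 \left( x^2 y + yx \right) dx\,dy$$
Now
$$\int_y^4 \left( x^2 y + yx \right) dx = \frac{88}{3} y - \frac{1}{3} y^4 - \frac{1}{2} y^3$$
and so
$$\int_R f\,dV = \int_0^4 \left( \frac{88}{3} y - \frac{1}{3} y^4 - \frac{1}{2} y^3 \right) dy = \frac{672}{5}.$$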
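[Figure: the triangular region $R$ in the $xy$ plane, with $x$ running from 0 to 4.]
Put the integral with respect to $x$ on the outside first. Then
$$\int_R f\,dV = \int_0^4 \int_0^{2x} x^2 y\,dy\,dx$$
Now $\int_0^{2x} x^2 y\,dy = 2x^4$ and so
$$\int_R f\,dV = \int_0^4 2x^4\,dx = \frac{2048}{5}$$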
Now do the integral in the other order. Here the integral with respect to y will be
on the outside. What are the limits of this integral? Look at the triangle and note that
x goes from 0 to 4 and so 2x = y goes from 0 to 8. Now for fixed y between 0 and 8,
where does x go? It goes from the x coordinate on the line y = 2x which corresponds
to this y to 4. What is the x coordinate on this line which goes with y? It is x = y/2.
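Therefore, the iterated integral is
$$\int_0^8 \int_{y/2}^4 x^2 y\,dx\,dy.$$
Now
$$\int_{y/2}^4 x^2 y\,dx = \frac{64}{3} y - \frac{1}{24} y^4$$
and so
$$\int_R f\,dV = \int_0^8 \left( \frac{64}{3} y - \frac{1}{24} y^4 \right) dy = \frac{2048}{5}$$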
$$\int_R \sin\left(y^2\right) dV.$$
[Figure: the region $R$ in the $xy$ plane, with $x$ running from 0 to 4.]
Setting this up to have the integral with respect to $y$ on the inside yields
$$\int_0^4 \int_{2x}^8 \sin\left(y^2\right) dy\,dx.$$
Unfortunately, there is no antiderivative in terms of elementary functions for $\sin\left(y^2\right)$ so there is an immediate problem in evaluating the inside integral. It doesn't work out so the next step is to do the integration in another order and see if some progress can be made. This yields
$$\int_0^8 \int_0^{y/2} \sin\left(y^2\right) dx\,dy = \int_0^8 \frac{y}{2} \sin\left(y^2\right) dy$$
and $\int_0^8 \frac{y}{2} \sin\left(y^2\right) dy = -\frac{1}{4}\cos 64 + \frac{1}{4}$ which you can verify by making the substitution, $u = y^2$. Thus
$$\int_R \sin\left(y^2\right) dV = -\frac{1}{4}\cos 64 + \frac{1}{4}.$$
This illustrates an important idea. The integral $\int_R \sin\left(y^2\right) dV$ is defined as a number. It is the unique number between all the upper sums and all the lower sums. Finding it is another matter. In this case it was possible to find it using one order of integration but not the other. The iterated integral in this other order also is defined as a number but it can't be found directly without interchanging the order of integration. Of course sometimes nothing you try will work out.
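Even when no antiderivative is available, a number like this one can be approximated. The sketch below is an illustration under stated assumptions: it uses a midpoint rule (a technique chosen here, with hypothetical helper names) on the iterated integral $\int_0^8 \int_0^{y/2} \sin\left(y^2\right) dx\,dy$ and compares the result with $\frac{1}{4} - \frac{1}{4}\cos 64$.

```python
import math

def midpoint(g, a, b, n):
    """Midpoint rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def dx_dy_order(n_outer=20000):
    """Integrate sin(y^2) over 0 <= x <= y/2, 0 <= y <= 8, x on the inside."""
    # The integrand does not depend on x, so one midpoint cell is exact
    # for the inside integral.
    return midpoint(
        lambda y: midpoint(lambda x: math.sin(y * y), 0.0, y / 2.0, 1),
        0.0, 8.0, n_outer)

closed_form = 0.25 - 0.25 * math.cos(64.0)
print(dx_dy_order(), closed_form)
```

Many subintervals are used on the outside because $\sin\left(y^2\right)$ oscillates rapidly when $y$ is near 8.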
14.1.1
Consider a two dimensional material. Of course there is no such thing but a flat plate might be modeled as one. The density $\rho$ is a function of position and is defined as follows. Consider a small chunk of area, $dV$ located at the point whose Cartesian coordinates are $(x, y)$. Then the mass of this small chunk of material is given by $\rho(x, y)\,dV$. Thus if the material occupies a region in two dimensional space, $U$, the total mass of this material would be
$$\int_U \rho\,dV$$
In other words you integrate the density to get the mass. Now by letting $\rho$ depend on position, you can include the case where the material is not homogeneous. Here is an example.
Example 14.1.8 Let $\rho(x, y)$ denote the density of the plane region determined by the curves $\frac{1}{3}x + y = 2$, $x = 3y^2$, and $x = 9y$. Find the total mass if $\rho(x, y) = y$.
You need to first draw a picture of the region, $R$. A rough sketch follows.
[Figure: rough sketch of the region $R$ bounded by the curves $x = 3y^2$, $(1/3)x + y = 2$ and $x = 9y$, with corners at $(0, 0)$, $(3, 1)$ and $(9/2, 1/2)$.]
This region is in two pieces, one having the graph of $x = 9y$ on the bottom and the graph of $x = 3y^2$ on the top and another piece having the graph of $x = 9y$ on the bottom and the graph of $\frac{1}{3}x + y = 2$ on the top. Therefore, in setting up the integrals, with the integral with respect to $x$ on the outside, the double integral equals the following sum of iterated integrals.
$$\overbrace{\int_0^3 \int_{x/9}^{\sqrt{x/3}} y\,dy\,dx}^{\text{has } x = 3y^2 \text{ on top}} + \overbrace{\int_3^{9/2} \int_{x/9}^{2 - \frac{1}{3}x} y\,dy\,dx}^{\text{has } \frac{1}{3}x + y = 2 \text{ on top}}$$
You notice it is not necessary to have a perfect picture, just one which is good enough to figure out what the limits should be. The dividing line between the two cases is $x = 3$ and this was shown in the picture. Now it is only a matter of evaluating the iterated integrals which in this case is routine and gives 1.
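If you would like to confirm the value 1 without doing the integrals by hand, here is a minimal sketch. It assumes the limits are as in the sum of iterated integrals for Example 14.1.8, with $y = x/9$ on the bottom of both pieces, and it uses a midpoint rule chosen here for illustration. The helper names are hypothetical.

```python
import math

def midpoint(g, a, b, n):
    """Midpoint rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def mass_piece(x_lo, x_hi, top, n=300):
    """Integrate the density y over x_lo <= x <= x_hi, x/9 <= y <= top(x)."""
    # The density y is linear in y, so a single midpoint cell gives the
    # inside integral exactly.
    return midpoint(
        lambda x: midpoint(lambda y: y, x / 9.0, top(x), 1),
        x_lo, x_hi, n)

total_mass = (mass_piece(0.0, 3.0, lambda x: math.sqrt(x / 3.0))
              + mass_piece(3.0, 4.5, lambda x: 2.0 - x / 3.0))
print(total_mass)
```

The printed number should be very close to 1.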
14.2
Exercises
1. Let $\rho(x, y)$ denote the density of the plane region closest to $(0, 0)$ which is between the curves $\frac{1}{4}x + y = 6$, $x = 4y^2$, and $x = 16y$. Find the total mass if $\rho(x, y) = y$. Your answer should be $\frac{1168}{75}$.
2. Let $\rho(x, y)$ denote the density of the plane region determined by the curves $\frac{1}{5}x + y = 6$, $x = 5y^2$, and $x = 25y$. Find the total mass if $\rho(x, y) = y + 2x$. Your answer should be $\frac{1735}{3}$.
3. Let $\rho(x, y)$ denote the density of the plane region determined by the curves $y = 3x$, $y = x$, $3x + 3y = 9$. Find the total mass if $\rho(x, y) = y + 1$. Your answer should be $\frac{81}{32}$.
4. Let $\rho(x, y)$ denote the density of the plane region determined by the curves $y = 3x$, $y = x$, $4x + 2y = 8$. Find the total mass if $\rho(x, y) = y + 1$.
5. Let $\rho(x, y)$ denote the density of the plane region determined by the curves $y = 3x$, $y = x$, $2x + 2y = 4$. Find the total mass if $\rho(x, y) = x + 2y$.
6. Let $\rho(x, y)$ denote the density of the plane region determined by the curves $y = 3x$, $y = x$, $5x + 2y = 10$. Find the total mass if $\rho(x, y) = y + 1$.
7. Find $\int_0^4 \int_{y/2}^2 \frac{1}{x} e^{2\frac{y}{x}}\,dx\,dy$. Your answer should be $e^4 - 1$. You might need to interchange the order of integration.
8. Find $\int_0^8 \int_{y/2}^4 \frac{1}{x} e^{3\frac{y}{x}}\,dx\,dy$.
9. Find
10. Find
R8R4
0
y
1 3x
e
y/2 x
R4R2
0
y
1 3x
e
y/2 x
dx dy.
dx dy.
1
2
RR
0
1
2
x
x
sin y
y
sin y
y
dy dx.
dy dx
14.3
14.3.1
(14.6)
MQ (f ) v (Q)
QG
and $k$ and the symbol means you simply add up all the $a_{ijk}$. By the commutative law of addition, these may be added systematically in the form, $\sum_k \sum_j \sum_i a_{ijk}$. A similar process is used to evaluate triple integrals and since integrals are like sums, you might expect it to be valid. Specifically,
$$\int f\,dV = \int_?^? \int_?^? \int_?^? f(x, y, z)\,dx\,dy\,dz.$$
In words, sum with respect to $x$ and then sum what you get with respect to $y$ and finally, with respect to $z$. Of course this should hold in any other order such as
$$\int f\,dV = \int_?^? \int_?^? \int_?^? f(x, y, z)\,dz\,dy\,dx.$$
(14.8)
Q=
ji , iji +1 .
(14.9)
i=1
(14.10)
n
Y
(bi ai ) , Q
i=1
n
Y
[ai , bi ] .
i=1
Now define upper sums, $U_G(f)$ and lower sums, $L_G(f)$ with respect to the indicated grid, by the formulas
$$U_G(f) \equiv \sum_{Q \in G} M_Q(f)\,v(Q), \quad L_G(f) \equiv \sum_{Q \in G} m_Q(f)\,v(Q).$$
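$$\mathcal{X}_E(x) \equiv \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}.$$
$$\int_E f\,dV \equiv \int \mathcal{X}_E f\,dV \text{ when } f\mathcal{X}_E \in \mathcal{R}\left(\mathbb{R}^n\right).$$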
1 All of these fundamental questions about integrals can be considered more easily in the context of
the Lebesgue integral. However, this integral is more abstract than the Riemann integral.
14.3.2
Iterated Integrals
Then
$$\int_E f\,dV = \int_R \int_{a(x,y)}^{b(x,y)} f(x, y, z)\,dz\,dA$$
It might be helpful to think of $dV = dz\,dA$. Now $\int_{a(x,y)}^{b(x,y)} f(x, y, z)\,dz$ is a function of $x$ and $y$ and so you have reduced the triple integral to a double integral over $R$ of this function of $x$ and $y$. Similar reasoning would apply if the region in $\mathbb{R}^3$ were of the form $\{(x, y, z) : a(y, z) \leq x \leq b(y, z)\}$ or $\{(x, y, z) : a(x, z) \leq y \leq b(x, z)\}$.
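Example 14.3.6 Find the volume of the region, $E$ in the first octant between $z = 1 - (x + y)$ and $z = 0$.
In this case, $R$ is the region shown.
[Figure: the triangle $R$ in the $xy$ plane, and the region $E$ lying over it.]
Thus the volume is
$$\int_0^1 \int_0^{1-x} \int_0^{1-(x+y)} dz\,dy\,dx = \frac{1}{6}$$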
Of course iterated integrals have a life of their own although this will not be explored here. You can just write them down and go to work on them. Here are some examples.
Example 14.3.7 Find $\int_2^3 \int_3^x \int_{3y}^x (x - y)\,dz\,dy\,dx$.
The inside integral yields $\int_{3y}^x (x - y)\,dz = x^2 - 4xy + 3y^2$. Next this must be integrated with respect to $y$ to give $\int_3^x \left( x^2 - 4xy + 3y^2 \right) dy = -3x^2 + 18x - 27$. Finally the third integral gives
$$\int_2^3 \int_3^x \int_{3y}^x (x - y)\,dz\,dy\,dx = \int_2^3 \left( -3x^2 + 18x - 27 \right) dx = -1.$$
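An iterated integral such as the one in Example 14.3.7 can be checked by a direct computation. The following sketch is an illustration only. It uses Simpson's rule, a technique chosen here because it is exact for polynomials of degree at most three, and each of the three integrands in this example is such a polynomial in its own variable. The helper name is hypothetical.

```python
def simpson(g, a, b, n=2):
    """Composite Simpson rule on n (even) subintervals from a to b."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

# Innermost integral is with respect to z from 3y to x, then y from 3 to x,
# then x from 2 to 3, exactly as the iterated integral is written.
value = simpson(
    lambda x: simpson(
        lambda y: simpson(lambda z: x - y, 3.0 * y, x),
        3.0, x),
    2.0, 3.0)
print(value)
```

Notice the rule works just as well when a lower limit is larger than the upper limit, which happens here since $x \leq 3$.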
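Example 14.3.8 Find $\int_0^\pi \int_0^{3y} \int_0^{y+z} \cos(x + y)\,dx\,dz\,dy$.
The inside integral is $\int_0^{y+z} \cos(x + y)\,dx = 2\cos z \sin y \cos y + 2\sin z \cos^2 y - \sin z - \sin y$. Now this has to be integrated.
$$\int_0^{3y} \int_0^{y+z} \cos(x + y)\,dx\,dz = \int_0^{3y} \left( 2\cos z \sin y \cos y + 2\sin z \cos^2 y - \sin z - \sin y \right) dz$$
Finally, integrating this with respect to $y$ from 0 to $\pi$ gives
$$\int_0^\pi \int_0^{3y} \int_0^{y+z} \cos(x + y)\,dx\,dz\,dy = -3\pi$$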
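Example 14.3.9 Here is an iterated integral: $\int_0^2 \int_0^{3 - \frac{3}{2}x} \int_0^{x^2} dz\,dy\,dx$. Write as an iterated integral in the order $dz\,dx\,dy$.
The inside integral is just a function of $x$ and $y$. (In fact, only a function of $x$.) The order of the last two integrals must be interchanged. Thus the iterated integral which needs to be done in a different order is
$$\int_0^2 \int_0^{3 - \frac{3}{2}x} f(x, y)\,dy\,dx.$$
[Figure: the triangle in the $xy$ plane bounded by the axes and the line $3 - \frac{3}{2}x = y$, which meets the $x$ axis at 2.]
Thus this double integral equals
$$\int_0^3 \int_0^{\frac{2}{3}(3 - y)} f(x, y)\,dx\,dy.$$
Now substituting for $f(x, y)$,
$$\int_0^3 \int_0^{\frac{2}{3}(3 - y)} \int_0^{x^2} dz\,dx\,dy.$$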
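[Figure: the triangle in the $yz$ plane cut off by the line $3y + 3z = 2$, which meets each axis at $\frac{2}{3}$. The region lies between $x = 0$ and $x = 16 - y^2$.]
Therefore, the outside integrals taken with respect to $z$ and $y$ are of the form $\int_0^{\frac{2}{3}} \int_0^{\frac{2}{3} - y} dz\,dy$ and now for any choice of $(y, z)$ in the above triangular region, $x$ goes from 0 to $16 - y^2$. Therefore, the iterated integral is
$$\int_0^{\frac{2}{3}} \int_0^{\frac{2}{3} - y} \int_0^{16 - y^2} dx\,dz\,dy = \frac{860}{243}$$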
Example 14.3.11 Find the volume of the region determined by the intersection of the two cylinders, $x^2 + y^2 \leq 9$ and $y^2 + z^2 \leq 9$.
The first listed cylinder intersects the $xy$ plane in the disk, $x^2 + y^2 \leq 9$. What is the volume of the three dimensional region which is between this disk and the two surfaces, $z = \sqrt{9 - y^2}$ and $z = -\sqrt{9 - y^2}$? An iterated integral for the volume is
$$\int_{-3}^3 \int_{-\sqrt{9 - y^2}}^{\sqrt{9 - y^2}} \int_{-\sqrt{9 - y^2}}^{\sqrt{9 - y^2}} dz\,dx\,dy = 144.$$
Note I drew no picture of the three dimensional region. If you are interested, here it is.
One of the cylinders is parallel to the $z$ axis, $x^2 + y^2 \leq 9$ and the other is parallel to the $x$ axis, $y^2 + z^2 \leq 9$. I did not need to be able to draw such a nice picture in order
to work this problem. This is the key to doing these. Draw pictures in two dimensions
and reason from the two dimensional pictures rather than attempt to wax artistic and
consider all three dimensions at once. These problems are hard enough without making
them even harder by attempting to be an artist.
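The volume 144 for the two intersecting cylinders can be confirmed by working one dimension at a time, in the same spirit as reasoning from two dimensional pictures. The sketch below is only an illustration. It assumes the limits $-\sqrt{9 - y^2} \leq x, z \leq \sqrt{9 - y^2}$ for $-3 \leq y \leq 3$ and uses a midpoint rule chosen here, with hypothetical helper names.

```python
import math

def midpoint(g, a, b, n):
    """Midpoint rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def cylinder_intersection_volume(n=600):
    def cross_section(y):
        s = math.sqrt(9.0 - y * y)
        # For fixed y, both x and z run from -s to s and the integrand is 1,
        # so a single midpoint cell is exact for the two inside integrals.
        return midpoint(lambda x: midpoint(lambda z: 1.0, -s, s, 1), -s, s, 1)
    return midpoint(cross_section, -3.0, 3.0, n)

print(cylinder_intersection_volume())
```

For each fixed $y$ the cross section is a square of side $2\sqrt{9 - y^2}$, which is why the outside integrand is just $4\left(9 - y^2\right)$.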
14.3.3
As an example of the use of triple integrals, consider a solid occupying a set of points, $U \subseteq \mathbb{R}^3$ having density $\rho$. Thus $\rho$ is a function of position and the total mass of the solid equals
$$\int_U \rho\,dV.$$
This is just like the two dimensional case. The mass of an infinitesimal chunk of the solid located at $x$ would be $\rho(x)\,dV$ and so the total mass is just the sum of all these, $\int_U \rho(x)\,dV$.
Example 14.3.12 Find the volume of $R$ where $R$ is the bounded region formed by the plane $\frac{1}{5}x + y + \frac{1}{5}z = 1$ and the planes $x = 0$, $y = 0$, $z = 0$.
When $z = 0$, the plane becomes $\frac{1}{5}x + y = 1$. Thus the intersection of this plane with the $xy$ plane is this line shown in the following picture.
[Figure: the line $\frac{1}{5}x + y = 1$ in the $xy$ plane, running from 1 on the $y$ axis to 5 on the $x$ axis.]
Therefore, the bounded region is between the triangle formed in the above picture by the $x$ axis, the $y$ axis and the above line and the surface given by $\frac{1}{5}x + y + \frac{1}{5}z = 1$ or $z = 5\left(1 - \left(\frac{1}{5}x + y\right)\right) = 5 - x - 5y$. Therefore, an iterated integral which yields the volume is
$$\int_0^5 \int_0^{1 - \frac{1}{5}x} \int_0^{5 - x - 5y} dz\,dy\,dx = \frac{25}{6}.$$
Example 14.3.13 Find the mass of the bounded region, $R$ formed by the plane $\frac{1}{3}x + \frac{1}{3}y + \frac{1}{5}z = 1$ and the planes $x = 0$, $y = 0$, $z = 0$ if the density is $\rho(x, y, z) = z$.
This is done just like the previous example except in this case there is a function to integrate. Thus the answer is
$$\int_0^3 \int_0^{3 - x} \int_0^{5 - \frac{5}{3}x - \frac{5}{3}y} z\,dz\,dy\,dx = \frac{75}{8}.$$
Example 14.3.14 Find the total mass of the bounded solid determined by $z = 9 - x^2 - y^2$ and $x, y, z \geq 0$ if the density is given by $\rho(x, y, z) = z$.
When $z = 0$ the surface, $z = 9 - x^2 - y^2$ intersects the $xy$ plane in a circle of radius 3 centered at $(0, 0)$. Since $x, y \geq 0$, it is only a quarter of a circle of interest, the part where both these variables are nonnegative. For each $(x, y)$ inside this quarter circle, $z$ goes from 0 to $9 - x^2 - y^2$. Therefore, the iterated integral is of the form,
$$\int_0^3 \int_0^{\sqrt{9 - x^2}} \int_0^{9 - x^2 - y^2} z\,dz\,dy\,dx = \frac{243}{8}\pi$$
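Here is a sketch which approximates the mass in Example 14.3.14 directly from the iterated integral, with limits $0 \leq z \leq 9 - x^2 - y^2$, $0 \leq y \leq \sqrt{9 - x^2}$, $0 \leq x \leq 3$. It is an illustration only, using a midpoint rule chosen here and hypothetical helper names, and it compares the result with $\frac{243}{8}\pi$.

```python
import math

def midpoint(g, a, b, n):
    """Midpoint rule approximation to the integral of g from a to b."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def total_mass(n=300):
    def over_y(x):
        y_max = math.sqrt(9.0 - x * x)
        def over_z(y):
            z_max = 9.0 - x * x - y * y
            # The density z is linear in z, so one midpoint cell is exact.
            return midpoint(lambda z: z, 0.0, z_max, 1)
        return midpoint(over_z, 0.0, y_max, n)
    return midpoint(over_y, 0.0, 3.0, n)

print(total_mass(), 243.0 * math.pi / 8.0)
```

The two printed numbers should agree to about two decimal places with the default number of subintervals.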
1
7x
while the plane, $x + \frac{1}{7}y + \frac{1}{4}z = 1$ intersects the $xy$ plane in the line whose equation is
$$x + \frac{1}{7}y = 1.$$
Furthermore, the two planes intersect when $x = y$ as can be seen from the equations, $x + \frac{1}{7}y = 1 - \frac{z}{4}$ and $\frac{1}{7}x + y = 1 - \frac{z}{4}$ which imply $x = y$. Thus the two dimensional picture to look at is depicted in the following picture.
[Figure: the lines in the $xy$ plane coming from the planes $x + \frac{1}{7}y + \frac{1}{4}z = 1$ and $y + \frac{1}{7}x + \frac{1}{4}z = 1$, together with the line $y = x$ which splits the base into the two triangles $R_1$ and $R_2$.]
You see in this picture, the base of the region in the $xy$ plane is the union of the two triangles, $R_1$ and $R_2$. For $(x, y) \in R_1$, $z$ goes from 0 to what it needs to be to be on the plane, $\frac{1}{7}x + y + \frac{1}{4}z = 1$. Thus $z$ goes from 0 to $4\left(1 - \frac{1}{7}x - y\right)$. Similarly, on $R_2$, $z$ goes from 0 to $4\left(1 - \frac{1}{7}y - x\right)$. Therefore, the integral needed is
$$\int_{R_1} \int_0^{4\left(1 - \frac{1}{7}x - y\right)} dz\,dV + \int_{R_2} \int_0^{4\left(1 - \frac{1}{7}y - x\right)} dz\,dV$$
and now it only remains to consider $\int_{R_1} dV$ and $\int_{R_2} dV$. The point of intersection of these lines shown in the above picture is $\left(\frac{7}{8}, \frac{7}{8}\right)$ and so an iterated integral is
Z
7/8
14.4
1 x
7
Z 4(1 17 xy)
7/8
dz dy dx +
1 y7
Exercises
R 4 R 2x R x
R 3 R 25x R 2x2y
0
2y
dz dy dx
2x dz dy dx
R 2 R 13x R 33x2y
x dz dy dx
R 5 R 3x R x
4. Evaluate the integral 2 4 4y (x y) dz dy dx
0
Z 4(1 17 yx)
0
dz dx dy =
7
6
14.4. EXERCISES
5. Evaluate the integral
6. Evaluate the integral
325
R R 3y R y+z
0
R R 4y R y+z
0
cos (x + y) dx dz dy
sin (x + y) dx dz dy
R1RzRz
R?R?R?
7. Fill in the missing limits. 0 0 0 f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dx dz dy,
R?R?R?
R 1 R z R 2z
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dy dz dx,
0 0 0
R1RzRz
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dz dy dx,
0 0 0
R 1 R z R y+z
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dx dz dy,
0 z/2 0
R6R6R4
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dz dy dx.
4 2 0
8. Find the volume of R where R is the bounded region formed by the plane
1
1
3 y + 4 z = 1 and the planes x = 0, y = 0, z = 0.
1
5x
9. Find the volume of R where R is the bounded region formed by the plane
1
1
2 y + 4 z = 1 and the planes x = 0, y = 0, z = 0.
1
4x
10. Find the mass of the bounded region, R formed by the plane 14 x + 31 y + 12 z = 1
and the planes x = 0, y = 0, z = 0 if the density is (x, y, z) = y + z
11. Find the mass of the bounded region, R formed by the plane 14 x + 21 y + 15 z = 1
and the planes x = 0, y = 0, z = 0 if the density is (x, y, z) = y
R 2 R 1 1 x R x2
12. Here is an iterated integral: 0 0 2 0 dz dy dx. Write as an iterated integral
in the following orders: dz dx dy, dx dz dy, dx dy dz, dy dx dz, dy dz dx.
13. Find the volume of the bounded region determined by 2y + z = 3, x = 9 y 2 , y =
0, x = 0, z = 0.
14. Find the volume of the bounded region determined by 3y + 2z = 5, x = 9 y 2 , y =
0, x = 0.
Your answer should be
11 525
648
250
3 .
26. Find
27. Find
28. Find
R 1 R 183z R 6z
0
1
3x
R 2 R 244z R 6z
0
1
4y
(6 z) exp y 2 dy dx dz.
(6 z) exp x2 dx dy dz.
R 1 R 124z R 3z
0
1
4y
sin x
x
dx dy dz.
R 20 R 1 R 5z
R 25 R 5 1 y R 5z
29. Find 0 0 1 y sinx x dx dz dy + 20 0 5 1 y
5
5
try doing it in the order, dy dx dz
14.5
sin x
x
R 7 R 3x R x
4
5y
dz dy dx
Answer:
3417
2
R 4 R 25x R 42xy
2. Find 0 0
(2x) dz dy dx
0
Answer:
2464
3
R 2 R 25x R 14x3y
3. Find 0 0
(2x) dz dy dx
0
Answer:
196
3
4. Evaluate the integral
R 8 R 3x R x
5
4y
(x y) dz dy dx
Answer:
114 607
8
R R 4y R y+z
0
cos (x + y) dx dz dy
Answer:
4
6. Evaluate the integral
R R 2y R y+z
0
sin (x + y) dx dz dy
Answer:
19
4
R1RzRz
R?R?R?
7. Fill in the missing limits. 0 0 0 f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dx dz dy,
R 1 R z R 2z
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dy dz dx,
0 0 0
R1RzRz
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dz dy dx,
0 0 0
R 1 R z R y+z
R?R?R?
f (x, y, z) dx dy dz = ? ? ? f (x, y, z) dx dz dy,
0 z/2 0
Answer:
R1RzRz
f (x, y, z) dx dy dz =
327
R?R?R?
?
f (x, y, z) dz dy dx.
R1R1Rz
f (x, y, z) dx dz dy,
0 y 0
R2R1 Rz
f (x, y, z) dx dy dz = 0 x/2 0 f (x, y, z) dy dz dx,
0 0 0
i
R1R1
R 1 hR x R 1
R1RzRz
f
(x,
y,
z)
dz
dy
dx,
f
(x,
y,
z)
dz
dy
+
f
(x,
y,
z)
dx
dy
dz
=
0 x
x y
0
0 0 0
0
f (x, y, z) dx dy dz =
R 1 R z R 2z
R 1 R z R y+z
0
z/2
f (x, y, z) dx dy dz =
R 1/2 R 2y R y+z
R 1 R 1 R y+z
f (x, y, z) dx dz dy + 1/2 y2 0 f (x, y, z) dx dz dy
R7R5R3
R3R5R7
f (x, y, z) dx dy dz = 0 2 5 f (x, y, z) dz dy dx
5 2 0
y2
8. Find the volume of R where R is the bounded region formed by the plane
y + 41 z = 1 and the planes x = 0, y = 0, z = 0.
Answer:
R 5 R 1 15 x R 4 45 x4y
0
1
5x
10
3
dz dy dx =
9. Find the volume of R where R is the bounded region formed by the plane
1
1
2 y + 4 z = 1 and the planes x = 0, y = 0, z = 0.
Answer:
R 5 R 2 25 x R 4 45 x2y
1
5x
20
3
dz dy dx =
10. Find the mass of the bounded region, R formed by the plane 14 x + 21 y + 13 z = 1
and the planes x = 0, y = 0, z = 0 if the density is (x, y, z) = y
Answer:
R 4 R 2 12 x R 3 34 x 32 y
0
(y) dz dy dx = 2
11. Find the mass of the bounded region, R formed by the plane 12 x + 21 y + 14 z = 1
and the planes x = 0, y = 0, z = 0 if the density is (x, y, z) = z 2
Answer:
R 2 R 2x R 42x2y 2
z dz dy dx =
0 0
0
64
15
R 3 R 3x R x2
12. Here is an iterated integral: 0 0
dz dy dx. Write as an iterated integral in
0
the following orders: dz dx dy, dx dz dy, dx dy dz, dy dx dz, dy dz dx.
Answer:
Z
x2
3x
dy dz dx,
0
0
3y
x2
0
3
dz dx dy,
0
3x
dy dx dz,
0
(3y)2
3 z
3y
dx dy dz,
3y
dx dz dy
z
dx dz dy =
1168
375
328
dx dz dy =
375
256
dx dz dy =
23
4
16. Find the volume of the region bounded by x2 + y 2 = 16, z = 3x, z = 0, and x 0.
Answer:
R 4 R (16x2 ) R 3x
dz dy dx = 128
0
0
2
(16x )
17. Find the volume of the region bounded by x2 + y 2 = 25, z = 2x, z = 0, and x 0.
Answer:
R 5 R (25x2 ) R 2x
dz dy dx =
0
0
2
(25x )
500
3
18. Find the volume of the region determined by the intersection of the two cylinders,
x2 + y 2 9 and y 2 + z 2 9.
Answer:
R 3 R (9y2 ) R (9y2 )
8 0 0
dz dx dy = 144
0
19. Find the total mass of the bounded solid determined by z = a2 x2 y 2 and
x, y, z 0 if the mass is given by (x, y, z) = z
Answer:
R 4 R (16x2 ) R 16x2 y2
0
(z) dz dy dx =
512
3
20. Find the total mass of the bounded solid determined by z = a2 x2 y 2 and
x, y, z 0 if the mass is given by (x, y, z) = x + 1
Answer:
R 5 R (25x2 ) R 25x2 y2
0
(x + 1) dz dy dx =
625
8
1250
3
dz dy dx = 45
3
0
2
(9x )
dz dy dx +
R 32 R 1 12 y R 22xy
0
dz dx dy =
4
9
dz dy dx +
R 87 R 1 17 y R 33x 37 y
0
dz dx dy =
7
8
329
(x) dz dy dx =
1
0
2
3
5
25. Find
(925x )
R 1 R 355z R 7z
0
1
5x
81
1000
(7 z) cos y 2 dy dx dz.
Answer:
You need to interchange the order of integration.
5
5
4 cos 36 4 cos 49
R 2 R 123z R 4z
26. Find 0 0
(4 z) exp y 2 dy dx dz.
1
x
R 1 R 7z R 5y
0
(7 z) cos y 2 dx dy dz =
Answer:
You need to interchange the order of integration.
= 43 e4 9 + 34 e16
R 2 R 255z R 5z
27. Find 0 0
(5 z) exp x2 dx dy dz.
1
y
R 2 R 4z R 3y
0
(4 z) exp y 2 dx dy dz
Answer:
You need to interchange the order of integration.
Z 2 Z 5z Z 5x
5
5
(5 z) exp x2 dy dx dz = e9 20 + e25
4
4
0
0
0
28. Find
R 1 R 102z R 5z
0
1
2y
sin x
x
dx dy dz.
Answer:
You need to interchange the order of integration.
Z 1 Z 5z Z 2x
sin x
dy dx dz =
x
0
0
0
2 sin 1 cos 5 + 2 cos 1 sin 5 + 2 2 sin 5
29. Find
R 20 R 2 R 6z
0
1
5y
sin x
x
dx dz dy +
R 30 R 6 15 y R 6z
20
1
5y
sin x
x
dx dz dy.
Answer:
You need to interchange the order of integration.
Z 2 Z 305z Z 6z
Z 2 Z 6z Z 5x
sin x
sin x
dx dy dz =
dy dx dz
1
x
x
0
0
0
0
0
5y
= 5 sin 2 cos 6 + 5 cos 2 sin 6 + 10 5 sin 6
Outcomes
15.1
Different Coordinates
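$$P(u_1, u_2, u_3) \equiv \left\{ \sum_{j=1}^3 s_j u_j : s_j \in [0, 1] \right\}.$$
Lemma 15.1.2 The volume of the parallelepiped, $P(u_1, u_2, u_3)$ is given by $\left| \det \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \right|$.
Proof: Recall from the discussion of the box product or triple product,
$$\text{volume of } P(u_1, u_2, u_3) \equiv \left| [u_1, u_2, u_3] \right| = \left| \det \begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix} \right|$$
where $\begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix}$ is the matrix having rows equal to the vectors, $u_1$, $u_2$ and $u_3$ arranged horizontally. Since the determinant of a matrix equals the determinant of its transpose, this equals $\left| \det \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} \right|$, which is the claimed formula.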
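$$P(u_1, u_2) \equiv \left\{ \sum_{j=1}^2 s_j u_j : s_j \in [0, 1] \right\}.$$
Lemma 15.1.4 The area of the parallelogram, $P(u_1, u_2)$ is given by $\left| \det \begin{pmatrix} u_1 & u_2 \end{pmatrix} \right|$
$$\left| \det \begin{pmatrix} u_1 & u_2 \end{pmatrix} \right|$$
where $\begin{pmatrix} u_1 & u_2 \end{pmatrix}$ denotes the matrix having the two vectors $u_1, u_2$ as columns. This proves the lemma.
It always works this way. The $n$ dimensional volume of the $n$ dimensional parallelepiped determined by the vectors, $\{v_1, \cdots, v_n\}$ is always
$$\left| \det \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix} \right|$$
This general fact will not be used in what follows.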
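These determinant formulas are easy to try on specific vectors. The sketch below is an illustration only: the vectors are made up, the function names are hypothetical, and the three dimensional determinant is compared with the absolute value of the box product $u_1 \times u_2 \cdot u_3$.

```python
def det2(u1, u2):
    """Determinant of the 2x2 matrix having u1, u2 as columns."""
    return u1[0] * u2[1] - u2[0] * u1[1]

def det3(u1, u2, u3):
    """Determinant of the 3x3 matrix having u1, u2, u3 as columns."""
    # Cofactor expansion along the first row.
    return (u1[0] * (u2[1] * u3[2] - u3[1] * u2[2])
            - u2[0] * (u1[1] * u3[2] - u3[1] * u1[2])
            + u3[0] * (u1[1] * u2[2] - u2[1] * u1[2]))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Made up example vectors.
area = abs(det2((2.0, 0.0), (1.0, 3.0)))
u1, u2, u3 = (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (1.0, 1.0, 3.0)
volume = abs(det3(u1, u2, u3))
box_product = abs(dot(cross(u1, u2), u3))
print(area, volume, box_product)
```

The parallelogram has base 2 and height 3, so its area is 6, and the volume from the determinant agrees with the box product.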
15.1.1
u3
having a corner at the point, u0 and sides as indicated. The image of this square is also
represented.
h(u0 + u2 e2 )
s
6
u2 e2
h(V)
V
u0
u1 e1
s
h(u0 + u1 e1 )
t
h(u0 )
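For small $\Delta u_i$ you would expect the sides going from $h(u_0)$ to $h(u_0 + \Delta u_1 e_1)$ and from $h(u_0)$ to $h(u_0 + \Delta u_2 e_2)$ to be almost the same as the vectors, $h(u_0 + \Delta u_1 e_1) - h(u_0)$ and $h(u_0 + \Delta u_2 e_2) - h(u_0)$ which are approximately equal to $\frac{\partial h}{\partial u_1}(u_0)\,\Delta u_1$ and $\frac{\partial h}{\partial u_2}(u_0)\,\Delta u_2$ respectively. Therefore, the area of $h(V)$ for small $\Delta u_i$ is essentially equal to the area of the parallelogram determined by the two vectors, $\frac{\partial h}{\partial u_1}(u_0)\,\Delta u_1$ and $\frac{\partial h}{\partial u_2}(u_0)\,\Delta u_2$. By Lemma 15.1.4 this equals
$$\left| \det \begin{pmatrix} \frac{\partial h}{\partial u_1}(u_0)\,\Delta u_1 & \frac{\partial h}{\partial u_2}(u_0)\,\Delta u_2 \end{pmatrix} \right| = \left| \det \begin{pmatrix} \frac{\partial h}{\partial u_1}(u_0) & \frac{\partial h}{\partial u_2}(u_0) \end{pmatrix} \right| \Delta u_1 \Delta u_2$$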
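$$\int_{h(U)} f(x)\,dV(x) = \int_U f(h(u)) \left| \det \begin{pmatrix} \frac{\partial h}{\partial u_1}(u) & \frac{\partial h}{\partial u_2}(u) \end{pmatrix} \right| dV(u)$$
$$\det \begin{pmatrix} \frac{\partial h_1}{\partial u_1}(u_1, u_2) & \frac{\partial h_1}{\partial u_2}(u_1, u_2) \\ \frac{\partial h_2}{\partial u_1}(u_1, u_2) & \frac{\partial h_2}{\partial u_2}(u_1, u_2) \end{pmatrix}$$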
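$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$$
Therefore, the volume (area) element is
$$\left|\det\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}\right| dr\,d\theta = r\,dr\,d\theta.$$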
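The polar area element can be checked numerically. Here is a small sketch, offered only as an illustration: `jacobian_det_2d` and `polar` are hypothetical helper names, and the partial derivatives are approximated by central differences rather than computed exactly.

```python
import math


def jacobian_det_2d(h, u1, u2, eps=1e-6):
    """Approximate det(dh/du1  dh/du2) at (u1, u2) by central differences.
    h maps (u1, u2) to a pair (x, y)."""
    a, b = h(u1 + eps, u2), h(u1 - eps, u2)
    c, d = h(u1, u2 + eps), h(u1, u2 - eps)
    col1 = ((a[0] - b[0]) / (2 * eps), (a[1] - b[1]) / (2 * eps))  # dh/du1
    col2 = ((c[0] - d[0]) / (2 * eps), (c[1] - d[1]) / (2 * eps))  # dh/du2
    return col1[0] * col2[1] - col2[0] * col1[1]


def polar(r, theta):
    """The polar coordinate map (r, theta) -> (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))


# At r = 2.5 the determinant should be close to r itself.
print(jacobian_det_2d(polar, 2.5, 0.7))
```

The same function can be pointed at any other smooth map of the plane to see how it stretches small squares.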
15.1.2
Three Dimensions
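The situation is no different for coordinate systems in any number of dimensions although I will concentrate here on three dimensions. $x = f(u)$ where $u \in U$, a subset of $\mathbb{R}^3$ and $x$ is a point in $V = f(U)$, a subset of 3 dimensional space. Thus, letting the Cartesian coordinates of $x$ be given by $x = (x_1,x_2,x_3)^T$, each $x_i$ being a function of $u$, an infinitesimal box located at $u_0$ corresponds to an infinitesimal parallelepiped located at $f(u_0)$ which is determined by the 3 vectors $\left\{\frac{\partial x(u_0)}{\partial u_i}du_i\right\}_{i=1}^{3}$. From Lemma 15.1.2, the volume of this infinitesimal parallelepiped is given by
$$\left|\left[\frac{\partial x(u_0)}{\partial u_1}du_1, \frac{\partial x(u_0)}{\partial u_2}du_2, \frac{\partial x(u_0)}{\partial u_3}du_3\right]\right| = \left|\left[\frac{\partial x(u_0)}{\partial u_1}, \frac{\partial x(u_0)}{\partial u_2}, \frac{\partial x(u_0)}{\partial u_3}\right]\right| du_1\,du_2\,du_3 \qquad (15.1)$$
$$= \left|\det\begin{pmatrix} \frac{\partial x(u_0)}{\partial u_1} & \frac{\partial x(u_0)}{\partial u_2} & \frac{\partial x(u_0)}{\partial u_3} \end{pmatrix}\right| du_1\,du_2\,du_3$$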
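Definition 15.1.7 Let $x = f(u)$ be as described above. Then for $n = 2, 3$, the symbol, $\frac{\partial(x_1,\cdots,x_n)}{\partial(u_1,\cdots,u_n)}$, called the Jacobian determinant, is defined by
$$\det\begin{pmatrix} \frac{\partial x(u_0)}{\partial u_1} & \cdots & \frac{\partial x(u_0)}{\partial u_n} \end{pmatrix} \equiv \frac{\partial(x_1,\cdots,x_n)}{\partial(u_1,\cdots,u_n)}.$$
Also, the symbol, $\left|\frac{\partial(x_1,\cdots,x_n)}{\partial(u_1,\cdots,u_n)}\right| du_1\cdots du_n$ is called the volume element.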
This has given motivation for the following fundamental procedure often called the
change of variables formula which holds under fairly general conditions.
$$\int_U h(f(u)) \left|\frac{\partial(x_1,\cdots,x_n)}{\partial(u_1,\cdots,u_n)}\right| dV = \int_{f(U)} h(x)\,dV.$$
2 This will cause non overlapping infinitesimal boxes in U to be mapped to non overlapping infinitesimal parallelepipeds in V. Also, in the context of the Riemann integral we should say more about the set U and about the function, h. These conditions are mainly technical however, and since a mathematically respectable treatment will not be attempted for this theorem, I think it best to give a memorable version of it which is essentially correct in all examples of interest.
Now consider spherical coordinates. Recall the geometrical meaning of these coordinates illustrated in the following picture.
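[Figure: a point with rectangular coordinates $(x_1,y_1,z_1)$, cylindrical coordinates $(r,\theta,z_1)$ and spherical coordinates $(\rho,\phi,\theta)$, shown with its projection $(x_1,y_1,0)$ onto the xy plane.]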
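$$\left|\det\begin{pmatrix} \frac{\partial x(\rho_0,\phi_0,\theta_0)}{\partial\rho} & \frac{\partial x(\rho_0,\phi_0,\theta_0)}{\partial\phi} & \frac{\partial x(\rho_0,\phi_0,\theta_0)}{\partial\theta} \end{pmatrix}\right| d\rho\,d\phi\,d\theta.$$
$$x = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \rho\sin\phi\cos\theta \\ \rho\sin\phi\sin\theta \\ \rho\cos\phi \end{pmatrix} = f(\rho,\phi,\theta)$$
Therefore,
$$\det\begin{pmatrix} \sin\phi\cos\theta & \rho\cos\phi\cos\theta & -\rho\sin\phi\sin\theta \\ \sin\phi\sin\theta & \rho\cos\phi\sin\theta & \rho\sin\phi\cos\theta \\ \cos\phi & -\rho\sin\phi & 0 \end{pmatrix} = \rho^2\sin\phi \qquad (15.3)$$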
volume equal to zero. It leaves out the point at the origin is all. Therefore, the volume
of the ball is
$$\int_{B_R} 1\,dV = \int_U \rho^2\sin\phi\,dV = \int_0^{2\pi}\int_0^{\pi}\int_0^{R} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{4}{3}\pi R^3.$$
The reason this was effortless, is that the ball, BR is realized as a box in terms of
the spherical coordinates. Remember what was pointed out earlier about setting up
iterated integrals over boxes.
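Because the ball is a box in spherical coordinates, even a crude numerical rule handles it. The sketch below is an illustration only, with the hypothetical function name `ball_volume_spherical`: it applies the midpoint rule on the box $[0,R]\times[0,\pi]\times[0,2\pi]$ to the integrand $\rho^2\sin\phi$ and compares the result with $\frac{4}{3}\pi R^3$.

```python
import math


def ball_volume_spherical(R, n=200):
    """Midpoint rule for the iterated integral of rho^2 sin(phi)
    over the box [0, R] x [0, pi] x [0, 2 pi]."""
    d_rho = R / n
    d_phi = math.pi / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            phi = (j + 0.5) * d_phi
            total += rho ** 2 * math.sin(phi)
    # The integrand does not depend on theta, so that integral is just 2 pi.
    return total * d_rho * d_phi * 2 * math.pi


R = 2.0
print(ball_volume_spherical(R), 4.0 / 3.0 * math.pi * R ** 3)
```

The limits of integration are constants, which is exactly what makes the region a box in these coordinates.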
Example 15.1.10 A cone is cut out of a ball of radius R as shown in the following picture, the diagram on the left being a side view. The angle of the cone is $\pi/3$. Find the volume of what is left.
$$\int_0^{2\pi}\int_{\pi/6}^{\pi}\int_0^{R} \rho^2\sin(\phi)\,d\rho\,d\phi\,d\theta = \frac{2}{3}\pi R^3 + \frac{1}{3}\sqrt{3}\,\pi R^3$$
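Now change the example a little by cutting out a cone at the bottom which has an angle of $\pi/2$ as shown. What is the volume of what is left?
$$\int_0^{2\pi}\int_{\pi/6}^{3\pi/4}\int_0^{R} \rho^2\sin(\phi)\,d\rho\,d\phi\,d\theta = \frac{1}{3}\sqrt{2}\,\pi R^3 + \frac{1}{3}\sqrt{3}\,\pi R^3$$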
Example 15.1.11 Next suppose the ball of radius R is a sort of an orange and you remove a slice as shown in the picture. What is the volume of what is left? Assume the slice is formed by the two half planes $\theta = 0$ and $\theta = \pi/4$.
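$$\int_{\pi/4}^{2\pi}\int_{\pi/6}^{3\pi/4}\int_0^{R} \rho^2\sin(\phi)\,d\rho\,d\phi\,d\theta = \frac{7}{24}\sqrt{2}\,\pi R^3 + \frac{7}{24}\sqrt{3}\,\pi R^3$$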
Example 15.1.13 Set up the integrals to find the volume of the cone $0 \le z \le 4$, $z = \sqrt{x^2+y^2}$.
This is entirely the wrong coordinate system to use for this problem but it is a good exercise. Here is a side view.
[Figure: side view of the cone.]
You need to figure out what $\rho$ is as a function of $\phi$ which goes from 0 to $\pi/4$. You should get
$$\int_0^{2\pi}\int_0^{\pi/4}\int_0^{4\sec(\phi)} \rho^2\sin(\phi)\,d\rho\,d\phi\,d\theta = \frac{64}{3}\pi$$
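$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \\ z \end{pmatrix}$$
$$\det\begin{pmatrix} \cos\theta & -r\sin\theta & 0 \\ \sin\theta & r\cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = r.$$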
3
0
0
r
Example 15.1.16 This example uses spherical coordinates to verify an important conclusion about gravitational force. Let the hollow sphere, $H$ be defined by $a^2 < x^2+y^2+z^2 < b^2$ and suppose this hollow sphere has constant density taken to equal 1. Now place a unit mass at the point $(0,0,z_0)$ where $|z_0| \notin [a,b]$. Show the force of gravity acting on this unit mass is $\left(\alpha G\int_H \frac{(z-z_0)}{\left[x^2+y^2+(z-z_0)^2\right]^{3/2}}\,dV\right)k$ and then show that if $|z_0| > b$ then the force of gravity acting on this point mass is the same as if the entire mass of the hollow sphere were placed at the origin, while if $|z_0| < a$, the total force acting on the point mass from gravity equals zero. Here $G$ is the gravitation constant and $\alpha$ is the density. In particular, this shows that the force a planet exerts on an object is as though the entire mass of the planet were situated at its center.$^3$
Without loss of generality, assume $z_0 > 0$. Let $dV$ be a little chunk of material located at the point $(x,y,z)$ of $H$ the hollow sphere. Then according to Newton's law of gravity, the force this small chunk of material exerts on the given point mass equals
$$\frac{xi+yj+(z-z_0)k}{|xi+yj+(z-z_0)k|}\,\frac{1}{x^2+y^2+(z-z_0)^2}\,G\alpha\,dV = (xi+yj+(z-z_0)k)\,\frac{1}{\left(x^2+y^2+(z-z_0)^2\right)^{3/2}}\,G\alpha\,dV.$$
Therefore, the total force is
$$\int_H (xi+yj+(z-z_0)k)\,\frac{1}{\left(x^2+y^2+(z-z_0)^2\right)^{3/2}}\,G\alpha\,dV.$$
3 This was shown by Newton in 1685 and allowed him to assert his law of gravitation applied to the planets as though they were point masses. It was a major accomplishment.
By the symmetry of the sphere, the $i$ and $j$ components will cancel out when the integral is taken. This is because there is the same amount of stuff for negative $x$ and $y$ as there is for positive $x$ and $y$. Hence what remains is
$$\alpha G k\int_H \frac{(z-z_0)}{\left[x^2+y^2+(z-z_0)^2\right]^{3/2}}\,dV$$
as claimed. Now for the interesting part, the integral is evaluated. In spherical coordinates this integral is
$$\int_0^{2\pi}\int_a^b\int_0^{\pi} \frac{(\rho\cos\phi - z_0)\,\rho^2\sin\phi}{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)^{3/2}}\,d\phi\,d\rho\,d\theta. \qquad (15.4)$$
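Rewrite the inside integral and use integration by parts to obtain this inside integral equals
$$\frac{1}{2z_0}\int_0^{\pi} \left(\rho^2\cos\phi - \rho z_0\right)\frac{(2z_0\rho\sin\phi)}{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)^{3/2}}\,d\phi =$$
$$\frac{1}{2z_0}\left(-2\,\frac{-\rho^2-\rho z_0}{\sqrt{\left(\rho^2+z_0^2+2\rho z_0\right)}} + 2\,\frac{\rho^2-\rho z_0}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\right)}} - \int_0^{\pi} 2\rho^2\frac{\sin\phi}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)}}\,d\phi\right). \qquad (15.5)$$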
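There are some cases to consider here.
First suppose $z_0 < a$ so the point is on the inside of the hollow sphere and it is always the case that $\rho > z_0$. Then in this case, the two first terms reduce to
$$\frac{2\rho(\rho+z_0)}{\sqrt{(\rho+z_0)^2}} + \frac{2\rho(\rho-z_0)}{\sqrt{(\rho-z_0)^2}} = \frac{2\rho(\rho+z_0)}{(\rho+z_0)} + \frac{2\rho(\rho-z_0)}{\rho-z_0} = 4\rho$$
and so the expression in 15.5 equals
$$\frac{1}{2z_0}\left(4\rho - \int_0^{\pi} 2\rho^2\frac{\sin\phi}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)}}\,d\phi\right)$$
$$= \frac{1}{2z_0}\left(4\rho - \frac{\rho}{z_0}\int_0^{\pi} \frac{2\rho z_0\sin\phi}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)}}\,d\phi\right)$$
$$= \frac{1}{2z_0}\left(4\rho - \frac{2\rho}{z_0}\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)^{1/2}\Big|_0^{\pi}\right)$$
$$= \frac{1}{2z_0}\left(4\rho - \frac{2\rho}{z_0}\left[(\rho+z_0)-(\rho-z_0)\right]\right) = 0.$$
Therefore, in this case the inner integral of 15.4 equals zero and so the original integral will also be zero.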
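The other case is when $z_0 > b$ and so it is always the case that $z_0 > \rho$. In this case the first two terms of 15.5 are
$$\frac{2\rho(\rho+z_0)}{\sqrt{(\rho+z_0)^2}} + \frac{2\rho(\rho-z_0)}{\sqrt{(\rho-z_0)^2}} = \frac{2\rho(\rho+z_0)}{(\rho+z_0)} + \frac{2\rho(\rho-z_0)}{z_0-\rho} = 0$$
and so the expression in 15.5 equals
$$\frac{1}{2z_0}\left(-\int_0^{\pi} 2\rho^2\frac{\sin\phi}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)}}\,d\phi\right) = -\frac{\rho}{2z_0^2}\int_0^{\pi} \frac{2\rho z_0\sin\phi}{\sqrt{\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)}}\,d\phi$$
which equals
$$-\frac{\rho}{z_0^2}\left(\rho^2+z_0^2-2\rho z_0\cos\phi\right)^{1/2}\Big|_0^{\pi} = -\frac{\rho}{z_0^2}\left[(\rho+z_0)-(z_0-\rho)\right] = -\frac{2\rho^2}{z_0^2}.$$
Thus the inner integral of 15.4 reduces to the above simple expression. Therefore, 15.4 equals
$$\int_0^{2\pi}\int_a^b \left(-\frac{2}{z_0^2}\rho^2\right)d\rho\,d\theta = -\frac{4}{3}\pi\,\frac{b^3-a^3}{z_0^2}$$
and so
$$\alpha G k\int_H \frac{(z-z_0)}{\left[x^2+y^2+(z-z_0)^2\right]^{3/2}}\,dV = \alpha G k\left(-\frac{4}{3}\pi\,\frac{b^3-a^3}{z_0^2}\right) = -kG\,\frac{\text{total mass}}{z_0^2}.$$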
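Both cases can be sanity checked numerically. The following is only a sketch under stated assumptions: the function name `shell_integral` is made up, the constants $\alpha$ and $G$ are left out, and the integral 15.4 is approximated by the midpoint rule in $\rho$ and $\phi$ (the $\theta$ integral just contributes $2\pi$).

```python
import math


def shell_integral(a, b, z0, n=300):
    """Midpoint-rule approximation of integral (15.4) for the hollow
    sphere a < rho < b and a unit mass at (0, 0, z0), z0 not in [a, b]."""
    d_rho = (b - a) / n
    d_phi = math.pi / n
    total = 0.0
    for i in range(n):
        rho = a + (i + 0.5) * d_rho
        for j in range(n):
            phi = (j + 0.5) * d_phi
            w = rho ** 2 + z0 ** 2 - 2 * rho * z0 * math.cos(phi)
            total += (rho * math.cos(phi) - z0) * rho ** 2 * math.sin(phi) / w ** 1.5
    return 2 * math.pi * total * d_rho * d_phi


a, b = 1.0, 2.0
outside = shell_integral(a, b, 3.0)   # unit mass outside the shell
inside = shell_integral(a, b, 0.5)    # unit mass inside the hollow
print(outside, -4.0 / 3.0 * math.pi * (b ** 3 - a ** 3) / 3.0 ** 2)
print(inside)
```

The first printed pair should agree closely and the last number should be near zero, matching the two cases worked out above.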
15.2
Exercises
7. Here are three vectors. (4, 1, 2) , (5, 0, 2) , and (3, 1, 3) . These vectors determine a parallelepiped, R, which is occupied by a solid having density equal to x. Find the mass of this solid.
8. Here are three vectors. (5, 1, 6) , (6, 0, 6) , and (4, 1, 7) . These vectors determine a parallelepiped, R, which is occupied by a solid having density equal to y. Find the mass of this solid.
9. Here are three vectors. (5, 2, 9) , (6, 1, 9) , and (4, 2, 10) . These vectors determine a parallelepiped, R, which is occupied by a solid having density equal to y + x. Find the mass of this solid.
10. Let $D = \left\{(x,y) : x^2+y^2 \le 25\right\}$. Find $\int_D e^{25x^2+25y^2}\,dx\,dy$.
11. Let $D = \left\{(x,y) : x^2+y^2 \le 16\right\}$. Find $\int_D \cos\left(9x^2+9y^2\right)dx\,dy$.
12. The ice cream in a sugar cone is described in spherical coordinates by $\rho \in [0,10]$, $\phi \in \left[0,\frac{1}{3}\pi\right]$, $\theta \in [0,2\pi]$. If the units are in centimeters, find the total volume in cubic centimeters of this ice cream.
13. Find the volume between $z = 5 - x^2 - y^2$ and $z = 2\sqrt{x^2+y^2}$.
14. A ball of radius 3 is placed in a drill press and a hole of radius 2 is drilled out with
the center of the hole a diameter of the ball. What is the volume of the material
which remains?
15. A ball of radius 9 has density equal to $\sqrt{x^2+y^2+z^2}$ in rectangular coordinates. The top of this ball is sliced off by a plane of the form z = 2. What is the mass of what remains?
16. Find $\int_S \frac{y}{x}\,dV$ where S is described in polar coordinates as $1 \le r \le 2$ and $0 \le \theta \le \pi/4$.
17. Find $\int_S \left(\left(\frac{y}{x}\right)^2 + 1\right)dV$ where S is given in polar coordinates as $1 \le r \le 2$ and $0 \le \theta \le \frac{1}{6}\pi$.
18. Use polar coordinates to evaluate the following integral. Here S is given in terms of the polar coordinates. $\int_S \sin\left(2x^2+2y^2\right)dV$ where $r \le 2$ and $0 \le \theta \le \frac{3}{2}\pi$.
19. Find $\int_S e^{2x^2+2y^2}\,dV$ where S is given in terms of the polar coordinates, $r \le 2$ and $0 \le \theta \le \pi$.
20. Compute the volume of a sphere of radius R using cylindrical coordinates.
21. In Example 15.1.16 on Page 338 check out all the details by working the integrals to be sure the steps are right.
22. What if the hollow sphere in Example 15.1.16 were in two dimensions and everything, including Newton's law still held? Would similar conclusions hold? Explain.
23. Fill in all details for the following argument that $\int_0^{\infty} e^{-x^2}\,dx = \frac{1}{2}\sqrt{\pi}$. Let $I = \int_0^{\infty} e^{-x^2}\,dx$. Then
$$I^2 = \int_0^{\infty}\int_0^{\infty} e^{-(x^2+y^2)}\,dx\,dy = \int_0^{\pi/2}\int_0^{\infty} re^{-r^2}\,dr\,d\theta = \frac{1}{4}\pi$$
from which the result follows.
24. Show $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1$. Here $\sigma$ is a positive number called the standard deviation and $\mu$ is a number called the mean.
25. Show using Problem 23 that $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. Recall $\Gamma(\alpha) \equiv \int_0^{\infty} e^{-t}t^{\alpha-1}\,dt$.
26. Let $p, q > 0$ and define $B(p,q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx$. Show $\Gamma(p)\Gamma(q) = B(p,q)\,\Gamma(p+q)$. Hint: It is fairly routine if you start with the left side and proceed to change variables.
15.3
1 v
1
v
x=
,y = u
31+u
3 1+u
Now
(x, y)
= det
(u, v)
v
13 (1+u)
2
1
v
3 (1+u)2
1
3+3u
1 u
3 1+u
!
=
1
v
.
9 (1 + u)2
dV =
R
dv du =
9 (1 + u)2
1
v
7
dv du =
2
9 (1 + u)
40
x=
v
v
,y = u
5+u
5+u
Now
(x, y)
= det
(u, v)
v
(5+u)
2
v
5 (5+u)2
1
5+u
u
5+u
!
=
v
(5 + u)
2.
dV =
R
Z 5Z 9
12
v
v
dv
du
=
dv du =
2
2
(5 + u)
7
2
1 (5 + u)
v
v
,y = u
x=
5 + 3u
5 + 3u
Now
(x, y)
= det
(u, v)
343
v
3 (5+3u)
2
v
5 (5+3u)2
1
5+3u
u
5+3u
!
=
v
(5 + 3u)
2.
dV =
R
v
5 + 3u
v
v
dv du =
5 + 3u (5 + 3u)2
v
2
(5 + 3u)
dv du =
4123
.
19 360
1 v
1
v
x=
,y = u
.
21+u
2 1+u
Now
(x, y)
= det
(u, v)
v
12 (1+u)
2
v
1
2 (1+u)2
1
2+2u
1 u
2 1+u
!
=
1
v
.
4 (1 + u)2
(x + 1)
dv du
4 (1 + u)2
4
1
!
Z 5 Z 10
v
1
dv du
(x + 1)
4 (1 + u)2
4
1
Z
dV
10
1 v
1
v
x=
,y = u
.
22+u
2 2+u
Now
(x, y)
= det
(u, v)
v
12 (2+u)
2
v
(2+u)2
1
4+2u
1 u
2 2+u
!
=
v
1
.
4 (2 + u)2
dV =
R
v
1
u
2 2+u
v
1
dv du = 4 ln 2 + 4 ln 3
2
4 (2 + u)
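6. Find the volume of the region, E, bounded by the ellipsoid, $\frac{1}{4}x^2 + \frac{1}{9}y^2 + \frac{1}{49}z^2 = 1$.
Answer:
Let $u = \frac{1}{2}x$, $v = \frac{1}{3}y$, $w = \frac{1}{7}z$. Then $(u,v,w)$ is a point in the unit ball, B. Therefore,
$$\int_B \left|\frac{\partial(x,y,z)}{\partial(u,v,w)}\right| dV = \int_E dV.$$
But $\frac{\partial(x,y,z)}{\partial(u,v,w)} = 42$ and so the volume of E is $\frac{4}{3}\pi \cdot 42 = 56\pi$.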
7. Here are three vectors. (4, 1, 4) , (5, 0, 4) , and(3, 1, 5) . These vectors determine
a parallelepiped, R, which is occupied by a solid having density = x. Find the
mass of this solid.
Answer:
4 5
Let 1 0
4 4
3
u
x
1 v = y . Then this maps the unit cube,
5
w
z
Q [0, 1] [0, 1] [0, 1]
onto R and
(x, y, z)
= det 1
(u, v, w)
4
5 3
0 1 = |9| = 9
4 5
so the mass is
Z
Z
x dV =
0
T
8. Here are three vectors. (3, 2, 6) , (4, 1, 6) , and (2, 2, 7) . These vectors determine a parallelepiped, R, which is occupied by a solid having density = y. Find
the mass of this solid.
Answer:
3 4
Let 2 1
6 6
2
u
x
2 v = y . Then this maps the unit cube,
7
w
z
Q [0, 1] [0, 1] [0, 1]
onto R and
(x, y, z)
= det 2
(u, v, w)
6
and so the mass is
4 2
1 2 = |11| = 11
6 7
345
Z
x dV =
=
0
0
T
55
.
2
9. Here are three vectors. (2, 2, 4) , (3, 1, 4) , and (1, 2, 5) . These vectors determine a parallelepiped, R, which is occupied by a solid having density = y + x.
Find the mass of this solid.
Answer:
2 3
Let 2 1
4 4
x
u
1
2 v = y . Then this maps the unit cube,
5
w
z
Q [0, 1] [0, 1] [0, 1]
onto R and
(x, y, z)
= det 2
(u, v, w)
4
3 1
1 2 = |8| = 8
4 5
Z
x dV =
Q
1
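10. Let $D = \left\{(x,y) : x^2+y^2 \le 25\right\}$. Find $\int_D e^{36x^2+36y^2}\,dx\,dy$.
Answer:
This is easy in polar coordinates. $x = r\cos\theta$, $y = r\sin\theta$. Thus $\frac{\partial(x,y)}{\partial(r,\theta)} = r$ and in terms of these new coordinates, the disk, D, is the rectangle,
$$R = \left\{(r,\theta) \in [0,5]\times[0,2\pi]\right\}.$$
Therefore,
$$\int_D e^{36x^2+36y^2}\,dV = \int_R e^{36r^2}r\,dV = \int_0^5\int_0^{2\pi} e^{36r^2}r\,d\theta\,dr = \frac{1}{36}\pi\left(e^{900}-1\right).$$
Note you wouldn't get very far without changing the variables in this.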
R
11. Let D = (x, y) : x2 + y 2 9 . Find D cos 36x2 + 36y 2 dx dy.
Answer:
This is easy in polar coordinates. x = r cos , y = r sin . Thus
terms of these new coordinates, the disk, D, is the rectangle,
R = {(r, ) [0, 3] [0, 2]} .
(x,y)
(r,)
= r and in
346
Therefore,
Z
cos 36r2 r dV =
R
1
cos 36r2 r d dr =
(sin 324) .
36
12. The
1 ice
cream in a sugar cone is described in spherical coordinates by [0, 8] ,
0, 4 , [0, 2] . If the units are in centimeters, find the total volume in cubic
centimeters of this ice cream.
Answer:
Remember that in spherical coordinates, the volume element is 2 sin dV and so
R 8 R 1 R 2
2 + 1024
the total volume of this is 0 04 0 2 sin d d d = 512
3
3 .
p
13. Find the volume between z = 5 x2 y 2 and z = (x2 + y 2 ).
Answer:
Use cylindrical coordinates. In terms of these coordinates the shape is
1
1
2
h r z r, r 0,
21
, [0, 2] .
2
2
Also,
(x,y,z)
(r,,z)
2
0
1
2
21 12
5r 2
r dz dr d =
0
39
1
+ 21
4
4
14. A ball of radius 12 is placed in a drill press and a hole of radius 4 is drilled out with the center of the hole a diameter of the ball. What is the volume of the material which remains?
Answer:
You know the formula for the volume of a sphere and so if you find out how much stuff is taken away, then it will be easy to find what is left. To find the volume of what is removed, it is easiest to use cylindrical coordinates. This volume is
$$\int_0^4\int_0^{2\pi}\int_{-\sqrt{(144-r^2)}}^{\sqrt{(144-r^2)}} r\,dz\,d\theta\,dr = -\frac{4096}{3}\sqrt{2}\,\pi + 2304\pi.$$
Therefore, the volume of what remains is $\frac{4}{3}\pi(12)^3$ minus the above. Thus the volume of what remains is
$$\frac{4096}{3}\sqrt{2}\,\pi.$$
p
15. A ball of radius 11 has density equal to x2 + y 2 + z 2 in rectangular coordinates.
The top of this ball is sliced off by a plane of the form z = 1. What is the mass of
what remains?
Answer:
Z
0
Z
0
2
arcsin( 11
30)
sec
Z
3
sin d d d+
0
2
arcsin( 11
30)
11
0
3 sin d d d
347
=
16. Find
/4.
y
S x
24 623
Answer:
Use x = r cos and y = r sin . Then the integral in polar coordinates is
Z
/4
(r tan ) dr d =
0
17. Find
R y 2
S
3
ln 2.
4
0 14 .
Answer:
Use x = r cos and y = r sin . Then the integral in polar coordinates is
Z
1
4
1 + tan2 r dr d.
R
of the polar coordinates. S sin 4x2 + 4y 2 dV where r 2 and 0 16 .
Answer:
Z
1
6
sin 4r2 r dr d.
R
2
2
19. Find S e2x +2y dV where S is given in terms of the polar coordinates, r 2 and
0 13 .
Answer:
The integral is
Z
0
1
3
re2r dr d =
1 8
e 1 .
12
15.4
R 2 R R R R2 r2
0
R2 r 2
r dz dr d = 34 R3 .
348
15.4.1
@
@
@y
R
For the purpose of this discussion, consider the top as a large number of point masses, $m_i$, located at the positions, $r_i(t)$ for $i = 1,2,\cdots,N$ and these masses are symmetrically arranged relative to the axis of the top. As the top spins, the axis of symmetry is observed to move around the z axis. This is called precession and you will see it occur whenever you spin a top. What is the speed of this precession? In other words, how fast does the axis of the top turn about the z axis? The following discussion follows one given in Sears and Zemansky [26].
Imagine a coordinate system which is fixed relative to the moving top. Thus in this coordinate system the points of the top are fixed. Let the standard unit vectors of the coordinate system moving with the top be denoted by $i(t), j(t), k(t)$. From Theorem 8.4.2 on Page 163, there exists an angular velocity vector $\Omega(t)$ such that if $u(t)$ is the position vector of a point fixed in the top, ($u(t) = u_1 i(t) + u_2 j(t) + u_3 k(t)$),
$$u'(t) = \Omega(t)\times u(t).$$
The vector $\Omega_a$ shown in the picture is the vector for which
$$r_i'(t) \equiv \Omega_a\times r_i(t)$$
is the velocity of the ith point mass due to rotation about the axis of the top. Thus $\Omega(t) = \Omega_a(t) + \Omega_p(t)$ and it is assumed $\Omega_p(t)$ is very small relative to $\Omega_a$. In other words, it is assumed the axis of the top moves very slowly relative to the speed of the points in the top which are spinning very fast around the axis of the top. The angular momentum, L is defined by
$$L \equiv \sum_{i=1}^{N} r_i\times m_i v_i \qquad (15.6)$$
where $v_i$ equals the velocity of the ith point mass. Thus $v_i = \Omega(t)\times r_i$ and from the above assumption, $v_i$ may be taken equal to $\Omega_a\times r_i$. Therefore, L is essentially given by
$$L = \sum_{i=1}^{N} m_i r_i\times(\Omega_a\times r_i) = \sum_{i=1}^{N} m_i\left(|r_i|^2\Omega_a - (r_i\cdot\Omega_a)r_i\right).$$
By symmetry of the top, this last expression equals a multiple of $\Omega_a$. Thus L is parallel to $\Omega_a$. Also,
$$L\cdot\Omega_a = \sum_{i=1}^{N} m_i\Omega_a\cdot r_i\times(\Omega_a\times r_i) = \sum_{i=1}^{N} m_i(\Omega_a\times r_i)\cdot(\Omega_a\times r_i)$$
$$= \sum_{i=1}^{N} m_i|\Omega_a\times r_i|^2 = \sum_{i=1}^{N} m_i|\Omega_a|^2|r_i|^2\sin^2(\beta_i)$$
where $\beta_i$ denotes the angle between the position vector of the ith point mass and the axis of the top. Since this expression is positive, this also shows L has the same direction as $\Omega_a$. Let $|\Omega_a| \equiv \omega$. Then the above expression is of the form
$$L\cdot\Omega_a = I\omega^2,$$
where
$$I \equiv \sum_{i=1}^{N} m_i|r_i|^2\sin^2(\beta_i).$$
Thus, to get I you take the mass of the ith point mass, multiply it by the square of its distance to the axis of the top and add all these up. This is defined as the moment of inertia of the top about the axis of the top. Letting u denote a unit vector in the direction of the axis of the top, this implies
$$L = I\omega u. \qquad (15.7)$$
Note the simple description of the angular momentum in terms of the moment of inertia. Referring to the above picture, define the vector, y to be the projection of the vector, u on the xy plane. Thus
$$y = u - (u\cdot k)k$$
and
$$(u\cdot i) = (y\cdot i) = \sin\alpha\cos\theta,$$
where $\alpha$ is the angle between the axis of the top and the z axis and $\theta$ is the angle between y and the x axis.
Now also from 15.6,
$$\frac{dL}{dt} = \sum_{i=1}^{N}\Big(\overbrace{m_i r_i'\times v_i}^{=0} + r_i\times m_i v_i'\Big) = \sum_{i=1}^{N} r_i\times m_i v_i' = -\sum_{i=1}^{N} r_i\times m_i g k \qquad (15.8)$$
where g is the acceleration of gravity. From 15.7, 15.8, and the above,
$$\frac{dL}{dt}\cdot i = I\omega\frac{du}{dt}\cdot i = I\omega\frac{dy}{dt}\cdot i = (-I\omega\sin\alpha\sin\theta)\,\theta'$$
$$= -\sum_{i=1}^{N} r_i\times m_i g k\cdot i = -\sum_{i=1}^{N} m_i g r_i\cdot k\times i = -\sum_{i=1}^{N} m_i g r_i\cdot j. \qquad (15.9)$$
To simplify this further, recall the following definition of the center of mass.
Definition 15.4.1 Define the total mass, M by
$$M = \sum_{i=1}^{N} m_i$$
and the center of mass, $r_0$ by
$$r_0 \equiv \frac{\sum_{i=1}^{N} r_i m_i}{M}. \qquad (15.10)$$
In terms of the center of mass, the last expression in 15.9 equals
$$-Mg\,r_0\cdot j = -Mg\,|r_0|\sin\alpha\cos\left(\frac{\pi}{2}-\theta\right).$$
Note that by symmetry, $r_0(t)$ is on the axis of the top, is in the same direction as L, u, and $\Omega_a$, and also $|r_0|$ is independent of t. Therefore, from the second line of 15.9,
$$(I\omega\sin\alpha\sin\theta)\,\theta' = Mg\,|r_0|\sin\alpha\sin\theta.$$
which shows
$$\theta' = \frac{Mg\,|r_0|}{I\omega}. \qquad (15.11)$$
From 15.11, the angular velocity of precession does not depend on $\alpha$ in the picture. It also is slower when $\omega$ is large and I is large.
The above discussion is a considerable simplification of the problem of a spinning top obtained from an assumption that $\Omega_a$ is approximately equal to $\Omega$. It also leaves out all considerations of friction and the observation that the axis of symmetry wobbles. This wobbling is called nutation. The full mathematical treatment of this problem involves the Euler angles and some fairly complicated differential equations obtained using techniques discussed in advanced physics classes. Lagrange studied these types of problems back in the 1700's.
15.4.2
Kinetic Energy
The next problem is that of understanding the total kinetic energy of a collection of moving point masses. Consider a possibly large number of point masses, $m_i$ located at the positions $r_i$ for $i = 1,2,\cdots,N$. Thus the velocity of the ith point mass is $r_i' = v_i$. The kinetic energy of the mass $m_i$ is defined by
$$\frac{1}{2}m_i|r_i'|^2.$$
(This is a very good time to review the presentation on kinetic energy given on Page 170.) The total kinetic energy of the collection of masses is then
$$E = \sum_{i=1}^{N}\frac{1}{2}m_i|r_i'|^2. \qquad (15.12)$$
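As these masses move about, so does the center of mass, $r_0$. Thus $r_0$ is a function of t just as the other $r_i$. From 15.12 the total kinetic energy is
$$E = \sum_{i=1}^{N}\frac{1}{2}m_i|r_i'-r_0'+r_0'|^2 = \sum_{i=1}^{N}\frac{1}{2}m_i\left[|r_i'-r_0'|^2 + |r_0'|^2 + 2\,(r_i'-r_0')\cdot r_0'\right]. \qquad (15.13)$$
Now
$$\sum_{i=1}^{N} m_i(r_i'-r_0')\cdot r_0' = \left(\sum_{i=1}^{N} m_i(r_i-r_0)\right)'\cdot r_0' = 0$$
because
$$\sum_{i=1}^{N} m_i(r_i-r_0) = \sum_{i=1}^{N} m_i r_i - \sum_{i=1}^{N} m_i r_0 = \sum_{i=1}^{N} m_i r_i - \sum_{i=1}^{N} m_i\left(\frac{\sum_{i=1}^{N} r_i m_i}{\sum_{i=1}^{N} m_i}\right) = 0.$$
Let $M \equiv \sum_{i=1}^{N} m_i$. Then 15.13 reduces to
$$E = \sum_{i=1}^{N}\frac{1}{2}m_i\left[|r_i'-r_0'|^2 + |r_0'|^2\right] = \frac{1}{2}M|r_0'|^2 + \sum_{i=1}^{N}\frac{1}{2}m_i|r_i'-r_0'|^2. \qquad (15.14)$$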
The first term is just the kinetic energy of a point mass equal to the sum of all the masses
involved, located at the center of mass of the system of masses while the second term
represents kinetic energy which comes from the relative velocities of the masses taken
with respect to the center of mass. It is this term which is considered more carefully in
the case where the system of masses maintain distance between each other.
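The splitting in 15.14 is easy to test on made up data. The sketch below is an illustration only: `kinetic_energy_split` is a hypothetical helper name, and the masses and velocities are arbitrary sample values. It computes the total kinetic energy directly and then as the sum of the center of mass term and the relative term.

```python
def kinetic_energy_split(masses, velocities):
    """Return (total, center_of_mass_part, relative_part) for point masses
    with the given velocities (each velocity is a 3-tuple)."""
    M = sum(masses)
    # Velocity of the center of mass: mass-weighted average of the velocities.
    v0 = tuple(sum(m * v[k] for m, v in zip(masses, velocities)) / M
               for k in range(3))
    total = sum(0.5 * m * sum(c * c for c in v)
                for m, v in zip(masses, velocities))
    com_part = 0.5 * M * sum(c * c for c in v0)
    rel_part = sum(0.5 * m * sum((v[k] - v0[k]) ** 2 for k in range(3))
                   for m, v in zip(masses, velocities))
    return total, com_part, rel_part


# Arbitrary sample data (an assumption for illustration).
masses = [1.0, 2.0, 3.0]
velocities = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (-1.0, 1.0, 3.0)]
total, com_part, rel_part = kinetic_energy_split(masses, velocities)
print(total, com_part + rel_part)
```

The two printed numbers agree up to rounding, whatever masses and velocities are chosen, because the cross term in 15.13 always sums to zero.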
To illustrate the contrast between the case where the masses maintain a constant distance and one in which they don't, take a hard boiled egg and spin it and then take a raw egg and give it a spin. You will certainly feel a big difference in the way the two eggs respond. Incidentally, this is a good way to tell whether the egg has been hard boiled or is raw and can be used to prevent messiness which could occur if you think it is hard boiled and it really isn't.
Now let $e_1(t), e_2(t)$, and $e_3(t)$ be an orthonormal set of vectors which is fixed in the body undergoing rigid body motion. This means that $r_i(t)-r_0(t)$ has components which are constant in t with respect to the vectors, $e_i(t)$. By Theorem 8.4.2 on Page 163 there exists a vector, $\Omega(t)$ which does not depend on i such that
$$r_i'(t)-r_0'(t) = \Omega(t)\times(r_i(t)-r_0(t)).$$
Now using this in 15.14,
$$E = \frac{1}{2}M|r_0'|^2 + \sum_{i=1}^{N}\frac{1}{2}m_i|\Omega(t)\times(r_i(t)-r_0(t))|^2$$
$$= \frac{1}{2}M|r_0'|^2 + \frac{1}{2}\left(\sum_{i=1}^{N} m_i|r_i(t)-r_0(t)|^2\sin^2\theta_i\right)|\Omega(t)|^2$$
$$= \frac{1}{2}M|r_0'|^2 + \frac{1}{2}\left(\sum_{i=1}^{N} m_i|r_i(0)-r_0(0)|^2\sin^2\theta_i\right)|\Omega(t)|^2$$
where $\theta_i$ is the angle between $\Omega(t)$ and the vector, $r_i(t)-r_0(t)$. Therefore, $|r_i(t)-r_0(t)|\sin\theta_i$ is the distance between the point mass, $m_i$ located at $r_i$ and a line through the center of mass, $r_0$ with direction, $\Omega$ as indicated in the following picture.
[Figure: the point mass $m_i$, the vector $r_i(t)-r_0(t)$, the direction $\Omega(t)$ and the angle $\theta_i$ between them.]
Thus the expression, $\sum_{i=1}^{N} m_i|r_i(0)-r_0(0)|^2\sin^2\theta_i$ plays the role of a mass in the definition of kinetic energy except instead of the speed, substitute the angular speed, $|\Omega(t)|$. It is this expression which is called the moment of inertia about the line whose direction is $\Omega(t)$.
In both of these examples, the center of mass and the moment of inertia occurred
in a natural way.
15.4.3
The methods used to evaluate multiple integrals make possible the determination of
centers of mass and moments of inertia. In the case of a solid material rather than
finitely many point masses, you replace the sums with integrals. The sums are essentially
approximations of the integrals which result. This leads to the following definition.
Definition 15.4.2 Let a solid occupy a region R such that its density is $\delta(x)$ for x a point in R and let L be a line. For $x \in R$, let $l(x)$ be the distance from the point, x to the line L. The moment of inertia of the solid is defined as
$$\int_R l(x)^2\,\delta(x)\,dV.$$
(x)
dV
(x) dV
R
RR
z (x) dV
z = RR
(x) dV
R
$$\left(x^2+y^2\right)\delta(x)\,dV$$
where the Cartesian coordinates of the point x are $(x,y,z)$. Then summing these up as an integral, yields the following for the moment of inertia.
$$\int_R \left(x^2+y^2\right)\delta(x)\,dV. \qquad (15.15)$$
To find the center of mass, sum up $r\,\delta\,dV$ for the points in R and divide by the total mass. In Cartesian coordinates, where $r = (x,y,z)$, this means to sum up vectors of the form $(x\delta\,dV, y\delta\,dV, z\delta\,dV)$ and divide by the total mass. Thus the Cartesian coordinates of the center of mass are
$$\left(\frac{\int_R x\delta\,dV}{\int_R \delta\,dV}, \frac{\int_R y\delta\,dV}{\int_R \delta\,dV}, \frac{\int_R z\delta\,dV}{\int_R \delta\,dV}\right) \equiv \frac{\int_R r\,\delta\,dV}{\int_R \delta\,dV}.$$
Here is a specific example.
Example 15.4.4 Find the moment of inertia about the z axis and center of mass of the solid which occupies the region, R defined by $9 - \left(x^2+y^2\right) \ge z \ge 0$ if the density is $\delta(x,y,z) = \sqrt{x^2+y^2}$.
This moment of inertia is $\int_R \left(x^2+y^2\right)\sqrt{x^2+y^2}\,dV$ and the easiest way to find this integral is to use cylindrical coordinates. Thus the answer is
$$\int_0^{2\pi}\int_0^3\int_0^{9-r^2} r^3 r\,dz\,dr\,d\theta = \frac{8748}{35}\pi.$$
To find the center of mass, note the x and y coordinates of the center of mass,
$$\frac{\int_R x\delta\,dV}{\int_R \delta\,dV},\quad \frac{\int_R y\delta\,dV}{\int_R \delta\,dV}$$
both equal zero because the above shape is symmetric about the z axis and $\delta$ is also symmetric in its values. Thus $x\delta\,dV$ will cancel with $-x\delta\,dV$ and a similar conclusion will hold for the y coordinate. It only remains to find the z coordinate of the center of mass, $\bar z$. In polar coordinates, $\delta = r$ and so,
$$\bar z = \frac{\int_R z\delta\,dV}{\int_R \delta\,dV} = \frac{\int_0^{2\pi}\int_0^3\int_0^{9-r^2} zr^2\,dz\,dr\,d\theta}{\int_0^{2\pi}\int_0^3\int_0^{9-r^2} r^2\,dz\,dr\,d\theta} = \frac{18}{7}.$$
Exercises
1. Let R denote the finite region bounded by $z = 4 - x^2 - y^2$ and the xy plane. Find $z_c$, the z coordinate of the center of mass if the density is a constant.
2. Let R denote the finite region bounded by $z = 4 - x^2 - y^2$ and the xy plane. Find $z_c$, the z coordinate of the center of mass if the density at $(x,y,z)$ equals $z$.
3. Find the mass and center of mass of the region between the surfaces $z = -y^2 + 8$ and $z = 2x^2 + y^2$ if the density equals 1.
4. Find the mass and center of mass of the region between the surfaces $z = -y^2 + 8$ and $z = 2x^2 + y^2$ if the density at $(x,y,z)$ equals $x^2$.
5. The two cylinders, $x^2 + y^2 = 4$ and $y^2 + z^2 = 4$ intersect in a region, R. Find the mass and center of mass if the density at $(x,y,z)$ is $z^2$.
6. The two cylinders, $x^2 + y^2 = 4$ and $y^2 + z^2 = 4$ intersect in a region, R. Find the mass and center of mass if the density at $(x,y,z)$ is $4 + z$.
7. Find the mass and center of mass of the set, $(x,y,z)$ such that $\frac{x^2}{4} + \frac{y^2}{9} + z^2 \le 1$ if the density at $(x,y,z)$ is $4 + y + z$.
8. Let R denote the finite region bounded by $z = 9 - x^2 - y^2$ and the xy plane. Find the moment of inertia of this shape about the z axis given the density equals 1.
9. Let R denote the finite region bounded by $z = 9 - x^2 - y^2$ and the xy plane. Find the moment of inertia of this shape about the x axis given the density equals 1.
10. Let B be a solid ball of constant density and radius R. Find the moment of inertia about a line through a diameter of the ball. You should get $\frac{2}{5}R^2M$ where M equals the mass.
11. Let B be a solid ball whose density equals $\rho$, where $\rho$ is the distance to the center of the ball, which has radius R. Find the moment of inertia about a line through a diameter of the ball. Write your answer in terms of the total mass and the radius as was done in the constant density case.
12. Let C be a solid cylinder of constant density and radius R. Find the moment of inertia about the axis of the cylinder. You should get $\frac{1}{2}R^2M$ where M is the mass.
13. Let C be a solid cylinder of constant density and radius R and mass M and let B be a solid ball of radius R and mass M. The cylinder and the sphere are placed on the top of an inclined plane and allowed to roll to the bottom. Which one will arrive first and why?
14. Suppose a solid of mass M occupying the region, B has moment of inertia, $I_l$ about a line, l which passes through the center of mass of M and let $l_1$ be another line parallel to l and at a distance of a from l. Then the parallel axis theorem states $I_{l_1} = I_l + a^2M$. Prove the parallel axis theorem. Hint: Choose axes such that the z axis is l and $l_1$ passes through the point $(a, 0)$ in the xy plane.
15. Using the parallel axis theorem find the moment of inertia of a solid ball of radius R and mass M about an axis located at a distance of a from the center of the ball. Your answer should be $Ma^2 + \frac{2}{5}MR^2$.
16. Consider all axes in computing the moment of inertia of a solid. Will the smallest possible moment of inertia always result from using an axis which goes through the center of mass?
17. Find the moment of inertia of a solid thin rod of length l, mass M, and constant density about an axis through the center of the rod perpendicular to the axis of the rod. You should get $\frac{1}{12}l^2M$.
18. Using the parallel axis theorem, find the moment of inertia of a solid thin rod of length l, mass M, and constant density about an axis through an end of the rod perpendicular to the axis of the rod. You should get $\frac{1}{3}l^2M$.
19. Let the angle between the z axis and the sides of a right circular cone be $\alpha$. Also assume the height of this cone is h. Find the z coordinate of the center of mass of this cone in terms of $\alpha$ and h assuming the density is constant.
20. Let the angle between the z axis and the sides of a right circular cone be $\alpha$. Also assume the height of this cone is h. Assuming the density equals 1, find the moment of inertia about the z axis in terms of $\alpha$ and h.
21. Let R denote the part of the solid ball, $x^2 + y^2 + z^2 \le R^2$ which lies in the first octant. That is $x, y, z \ge 0$. Find the coordinates of the center of mass if the density is constant. Your answer for one of the coordinates for the center of mass should be $(3/8)R$.
22. Show that in general for L angular momentum,
$$\frac{dL}{dt} = \Gamma$$
where $\Gamma$ is the total torque,
$$\Gamma \equiv \sum r_i\times F_i.$$
Outcomes
16.1
Consider the boundary of some three dimensional region such that a function, f is
defined on this boundary. Imagine taking the value of this function at a point, multiplying this value by the area of an infinitesimal chunk of area located at this point
and then adding these up. This is just the notion of the integral presented earlier only
now there is a difference because this infinitesimal chunk of area should be considered
as two dimensional even though it is in three dimensions. However, it is not really all
that different from what was done earlier. It all depends on the following fundamental
definition which is just a review of the fact presented earlier that the area of a parallelogram determined by two vectors in R3 is the norm of the cross product of the two
vectors.
Definition 16.1.1 Let $u_1, u_2$ be vectors in $\mathbb{R}^3$. The 2 dimensional parallelogram determined by these vectors will be denoted by $P(u_1,u_2)$ and it is defined as
$$P(u_1,u_2) \equiv \left\{\sum_{j=1}^{2} s_j u_j : s_j \in [0,1]\right\}.$$
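$x = f(u)$. (No sum on the repeated index.) From Definition 16.1.1, the volume of this infinitesimal parallelepiped located at $f(u_0)$ is given by
$$\left|\frac{\partial x(u_0)}{\partial u_1}du_1\times\frac{\partial x(u_0)}{\partial u_2}du_2\right| = \left|\frac{\partial x(u_0)}{\partial u_1}\times\frac{\partial x(u_0)}{\partial u_2}\right| du_1\,du_2 \qquad (16.1)$$
$$= |f_{u_1}\times f_{u_2}|\,du_1\,du_2 \qquad (16.2)$$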
It might help to think of a lizard. The infinitesimal parallelepiped is like a very small
scale on a lizard. This is the essence of the idea. To define the area of the lizard sum
up areas of individual scales1 . If the scales are small enough, their sum would serve as
a good approximation to the area of the lizard.
This motivates the following fundamental procedure which I hope is extremely familiar from the earlier material.
Procedure 16.1.2 Suppose U is a subset of $\mathbb{R}^2$ and suppose $f : U \to f(U) \subseteq \mathbb{R}^3$ is a one to one and $C^1$ function. Then if $h : f(U) \to \mathbb{R}$, define the 2 dimensional surface integral, $\int_{f(U)} h(x)\,dA$ according to the following formula.
$$\int_{f(U)} h(x)\,dA \equiv \int_U h(f(u))\,|f_{u_1}(u)\times f_{u_2}(u)|\,du_1\,du_2.$$
Definition 16.1.3 It is customary to write |fu1 (u) fu2 (u)| = (x
(u1 ,u2 ) because this
new notation generalizes to far more general situations for which the cross product is
not defined. For example, one can consider three dimensional surfaces in R8 .
Example 16.1.4 Find the area of the region labeled A in the following picture. The
two circles are of radius 1, one has center (0, 0) and the other has center (1, 0) .
359
(x1 ,x2 )
(,r) .
i
j
(x1 , x2 , x3 ) x1 x2
=
(, r)
x1 x2
r
k
i
x3
r sin
=
x3
cos
r
j
r cos
sin
k
0
0
= r
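$$f_x = \begin{pmatrix} 1 \\ 0 \\ 2x \end{pmatrix},\quad f_y = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$$
and so
$$|f_x\times f_y| = \left|\begin{pmatrix} 1 \\ 0 \\ 2x \end{pmatrix}\times\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}\right| = \sqrt{1+4x^2}$$
and so the area element is $\sqrt{1+4x^2}\,dx\,dy$ and the surface area is obtained by integrating the function, $h(x) \equiv 1$. Therefore, this area is
$$\int_{f(U)} dA = \int_0^1\int_0^1 \sqrt{1+4x^2}\,dx\,dy = \frac{1}{2}\sqrt{5} - \frac{1}{4}\ln\left(-2+\sqrt{5}\right)$$
which can be obtained by using the trig. substitution, $2x = \tan\theta$ on the inside integral.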
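When the trig. substitution is not appealing, the area can also be approximated directly from the parametrization. The sketch below is only an illustration: `surface_area` and `cross` are hypothetical helpers, the partial derivatives are approximated by central differences, and the double integral by the midpoint rule. It is applied to the surface $f(x,y) = (x,y,x^2)^T$ over $[0,1]\times[0,1]$.

```python
import math


def cross(a, b):
    """Cross product of two vectors in R^3."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def surface_area(f, u0, u1, v0, v1, n=200, eps=1e-6):
    """Midpoint-rule approximation of the integral of |f_u x f_v| over
    [u0, u1] x [v0, v1], with f_u and f_v found by central differences."""
    du = (u1 - u0) / n
    dv = (v1 - v0) / n
    total = 0.0
    for i in range(n):
        u = u0 + (i + 0.5) * du
        for j in range(n):
            v = v0 + (j + 0.5) * dv
            fu = [(p - q) / (2 * eps) for p, q in zip(f(u + eps, v), f(u - eps, v))]
            fv = [(p - q) / (2 * eps) for p, q in zip(f(u, v + eps), f(u, v - eps))]
            normal = cross(fu, fv)
            total += math.sqrt(sum(c * c for c in normal))
    return total * du * dv


def parabolic_sheet(x, y):
    """The surface z = x^2 written parametrically."""
    return (x, y, x * x)


exact = 0.5 * math.sqrt(5.0) - 0.25 * math.log(-2.0 + math.sqrt(5.0))
print(surface_area(parabolic_sheet, 0.0, 1.0, 0.0, 1.0), exact)
```

The same function accepts any other parametrization, which is useful for surfaces whose area element has no elementary antiderivative.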
Note this all depends on being able to write the surface in the form, $x = f(u)$ for $u \in U \subseteq \mathbb{R}^p$. Surfaces obtained in this form are called parametrically defined surfaces. These are best but sometimes you have some other description of a surface and in these cases things can get pretty intractable. For example, you might have a level surface of the form $3x^2 + 4y^4 + z^6 = 10$. In this case, you could solve for z using methods of algebra. Thus $z = \sqrt[6]{10 - 3x^2 - 4y^4}$ and a parametric description of part of this level
this particular relation. However, if a point satisfying this relation can be identified, the
implicit function theorem from advanced calculus can usually be used to assert one of
the variables is a function of the others, proving the existence of a parameterization at
least locally. The problem is, this theorem doesn't give the answer in terms of known
functions so this isn't much help. Finding a parametric description of a surface is a
hard problem and there are no easy answers. This is a good example which illustrates
the gulf between theory and practice.
Example 16.1.6 Let $U=[0,12]\times[0,2\pi]$ and let $f:U\to\mathbb{R}^3$ be given by $f(t,s)\equiv(2\cos t+\cos s,\ 2\sin t+\sin s,\ t)^T$. Find a double integral for the surface area. A graph of this surface is drawn below.
$$f_t=\begin{pmatrix}-2\sin t\\2\cos t\\1\end{pmatrix},\quad f_s=\begin{pmatrix}-\sin s\\\cos s\\0\end{pmatrix}$$
and
$$f_t\times f_s=\begin{pmatrix}-\cos s\\-\sin s\\-2\sin t\cos s+2\cos t\sin s\end{pmatrix}$$
and so
$$\frac{\partial(x_1,x_2,x_3)}{\partial(t,s)}=|f_t\times f_s|=\sqrt{5-4\sin^2t\sin^2s-8\sin t\sin s\cos t\cos s-4\cos^2t\cos^2s}.$$
Therefore, the desired integral giving the area is
$$\int_0^{2\pi}\int_0^{12}\sqrt{5-4\sin^2t\sin^2s-8\sin t\sin s\cos t\cos s-4\cos^2t\cos^2s}\,dt\,ds.$$
If you really needed to find the number this equals, how would you go about finding it?
This is an interesting question and there is no single right answer. You should think
about this. Here is an example for which you will be able to find the integrals.
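Returning for a moment to the integral of Example 16.1.6, one possible answer to the question of how to get a number is numerical quadrature. The Python sketch below is only an illustration under stated assumptions: it uses the midpoint rule on a grid whose size (`nt`, `ns`) is an arbitrary choice, and it recomputes $|f_t\times f_s|$ directly from $f(t,s)=(2\cos t+\cos s,\,2\sin t+\sin s,\,t)^T$ rather than from the expanded square root.

```python
import math

def area_element(t, s):
    """|f_t x f_s| for f(t, s) = (2cos t + cos s, 2sin t + sin s, t)."""
    ft = (-2 * math.sin(t), 2 * math.cos(t), 1.0)
    fs = (-math.sin(s), math.cos(s), 0.0)
    cx = ft[1] * fs[2] - ft[2] * fs[1]
    cy = ft[2] * fs[0] - ft[0] * fs[2]
    cz = ft[0] * fs[1] - ft[1] * fs[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)

def midpoint_area(nt=240, ns=120):
    """Midpoint rule for the area integral over [0, 12] x [0, 2*pi]."""
    dt, ds = 12.0 / nt, 2 * math.pi / ns
    total = 0.0
    for i in range(nt):
        t = (i + 0.5) * dt
        for j in range(ns):
            total += area_element(t, (j + 0.5) * ds)
    return total * dt * ds

area = midpoint_area()
print(area)
```

Since the integrand equals $\sqrt{1+4\sin^2(t-s)}$ and is periodic in $s$, the inner integral does not depend on $t$, so the same number can also be obtained from a single one dimensional integral multiplied by 12. This gives an independent check on the quadrature.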
Example 16.1.7 Let $U=[0,2\pi]\times[0,2\pi]$ and for $(t,s)\in U$, let
2 At Volwerth's in Hancock, Michigan, they make excellent sausages and hot dogs. The best are made
from natural casings, which are the linings of intestines.
and so $|f_t\times f_s|=(\cos s+2)$ so the area element is $(\cos s+2)\,ds\,dt$ and the area is
$$\int_0^{2\pi}\int_0^{2\pi}(\cos s+2)\,ds\,dt=8\pi^2$$
$\int_{f(U)}h\,dV$ where $h(x,y,z)=x^2$.
Everything is the same as the preceding example except this time it is an integral
of a function. The area element is (cos s + 2) ds dt and so the integral called for is
$$\int_{f(U)}h\,dA=\int_0^{2\pi}\int_0^{2\pi}\Big(\overbrace{2\cos t+\cos t\cos s}^{x\text{ on the surface}}\Big)^2(\cos s+2)\,ds\,dt=22\pi^2$$
16.1.1
The special case where a surface is in the form $z=f(x,y)$, $(x,y)\in U$, yields a simple formula which is used most often in this situation. You write the surface parametrically in the form $\mathbf{f}(x,y)=(x,y,f(x,y))^T$ such that $(x,y)\in U$. Then
$$\mathbf{f}_x=\begin{pmatrix}1\\0\\f_x\end{pmatrix},\quad\mathbf{f}_y=\begin{pmatrix}0\\1\\f_y\end{pmatrix}$$
and
$$|\mathbf{f}_x\times\mathbf{f}_y|=\sqrt{1+f_y^2+f_x^2}$$
so the area element is
$$\sqrt{1+f_y^2+f_x^2}\,dx\,dy.$$
When the surface of interest comes in this simple form, people generally use this area
element directly rather than worrying about a parameterization and taking cross products.
In the case where the surface is of the form $x=f(y,z)$ for $(y,z)\in U$, the area element is obtained similarly and is
$$\sqrt{1+f_y^2+f_z^2}\,dy\,dz.$$
I think you can guess what the area element is if y = f (x, z) .
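The area element $\sqrt{1+f_x^2+f_y^2}\,dx\,dy$ translates directly into a few lines of code. The Python sketch below is an illustration only: the function name `graph_area`, the rectangular domain, the central-difference step `h`, and the grid size `n` are all assumptions made here, not part of the notes.

```python
import math

def graph_area(f, x0, x1, y0, y1, n=200, h=1e-5):
    """Approximate the area of z = f(x, y) over [x0, x1] x [y0, y1].

    Uses the area element sqrt(1 + f_x^2 + f_y^2) dx dy with central-difference
    partial derivatives and the midpoint rule on an n-by-n grid.
    """
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * dx
        for j in range(n):
            y = y0 + (j + 0.5) * dy
            fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
            fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
            total += math.sqrt(1 + fx * fx + fy * fy)
    return total * dx * dy

# A plane z = 2x + 3y over the unit square has area sqrt(1 + 4 + 9) = sqrt(14).
plane_area = graph_area(lambda x, y: 2 * x + 3 * y, 0, 1, 0, 1)
# The surface z = x^2 over the unit square, treated earlier in closed form.
parabola_area = graph_area(lambda x, y: x * x, 0, 1, 0, 1)
print(plane_area, parabola_area)
```

For the plane the integrand is constant, so the only error comes from the finite differences. For $z=x^2$ the result can be compared with $\frac12\sqrt5-\frac14\ln(-2+\sqrt5)$.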
There is also a simple geometric description of these area elements. Consider the
surface $z=f(x,y)$. This is a level surface of the function of three variables $z-f(x,y)$. In fact the surface is simply $z-f(x,y)=0$. Now consider the gradient of this function of three variables. The gradient is perpendicular to the surface and the third component is positive in this case. This gradient is $(-f_x,-f_y,1)$ and so the unit upward normal is just $\frac{1}{\sqrt{1+f_x^2+f_y^2}}(-f_x,-f_y,1)$. Now consider the following picture.
[Figure: a piece of the surface of area $dV$ seen on edge, with unit normal $n$ making the angle $\theta$ with the vertical unit vector $k$, lying above its projection of area $dx\,dy$.]
In this picture, you are looking at a chunk of area on the surface seen on edge and
so it seems reasonable to expect to have $dx\,dy=dV\cos\theta$. But it is easy to find $\cos\theta$ from the picture and the properties of the dot product.
$$\cos\theta=\frac{n\cdot k}{|n|\,|k|}=\frac{1}{\sqrt{1+f_x^2+f_y^2}}.$$
Therefore, $dA=\sqrt{1+f_x^2+f_y^2}\,dx\,dy$ as claimed. In this context, the surface involved is
referred to as S because the vector valued function, f giving the parameterization will
not have been identified.
where $h(x,y,z)=x+z$ and $S$ is the surface described as $\left(x,y,\sqrt{x^2+y^2}\right)$ for $(x,y)\in U$.
Here you can see directly the angle in the above picture is $\frac{\pi}{4}$ and so $dV=\sqrt2\,dx\,dy$. If you don't see this or if it is unclear, simply compute $\sqrt{1+f_x^2+f_y^2}$ and you will find it is $\sqrt2$. Therefore,
$$\int_Sh\,dS=\int_U\left(x+\sqrt{x^2+y^2}\right)\sqrt2\,dA=\sqrt2\int_0^{2\pi}\int_0^2(r\cos\theta+r)\,r\,dr\,d\theta=\frac{16}{3}\sqrt2\,\pi.$$
f1 (U1 )
f2 (U2 )
Admittedly, the set $C$ gets added in twice but this doesn't matter because its 2 dimensional volume equals zero and therefore, the integrals over this set will also be zero.
I have been purposely vague about precise mathematical conditions necessary for
the above procedures. This is because the precise mathematical conditions which are
usually cited are very technical and at the same time far too restrictive. The most
general conditions under which these sorts of procedures are valid include things like
Lipschitz functions defined on very general sets. These are functions satisfying a Lipschitz condition of the form $|f(x)-f(y)|\le K|x-y|$. For example, $y=|x|$ is Lipschitz
continuous. However, this function does not have a derivative at every point. So it
is with Lipschitz functions. However, it turns out these functions have derivatives at
enough points to push everything through but this requires considerations involving the
Lebesgue integral. Lipschitz functions are also not the most general kind of function for
which the above is valid.
16.2 Exercises
14. For $(\theta,\alpha)\in[0,2\pi]\times[0,2\pi]$, let $f(\theta,\alpha)\equiv(\cos\theta\,(4+\cos\alpha),\ \sin\theta\,(4+\cos\alpha),\ \sin\alpha)$.
Find the area of $f([0,2\pi]\times[0,2\pi])$.
15. For $(\theta,\alpha)\in[0,2\pi]\times[0,2\pi]$, let $f(\theta,\alpha)$
T
$\int_{f([0,2\pi]\times[0,2\pi])}h\,dA$.
$\int_{f([0,2\pi]\times[0,2\pi])}h\,dA$.
T
.
Find a double integral which gives the area of $f([0,2\pi]\times[0,2\pi])$.
19. In spherical coordinates, $\phi=c$, $\rho\in[0,R]$ determines a cone. Find the area of this
cone without doing any work involving Jacobians and such.
16.3
(x, y, z) = t 73 , t 23 , t
2. Find a parameterization for the intersection of the plane $4x+2y+4z=0$ and the circular cylinder $x^2+y^2=16$.
Answer:
The cylinder is of the form $x=4\cos t$, $y=4\sin t$ and $z=z$. Therefore, from the equation of the plane, $16\cos t+8\sin t+4z=0$. Therefore, $z=-4\cos t-2\sin t$ and this shows the parameterization is of the form $(x,y,z)=(4\cos t,\ 4\sin t,\ -4\cos t-2\sin t)$ where $t\in[0,2\pi]$.
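A parameterization of an intersection curve can always be checked by substituting it back into both defining equations. For this problem, dividing $16\cos t+8\sin t+4z=0$ by 4 gives $z=-4\cos t-2\sin t$, and the short Python sketch below evaluates the residuals of the plane and cylinder equations at sample values of $t$. The function name `curve` and the choice of 100 sample points are made here for illustration.

```python
import math

def curve(t):
    """Candidate parameterization of the plane/cylinder intersection."""
    x = 4 * math.cos(t)
    y = 4 * math.sin(t)
    z = -4 * math.cos(t) - 2 * math.sin(t)   # from 16cos t + 8sin t + 4z = 0
    return x, y, z

residuals = []
for k in range(100):
    x, y, z = curve(2 * math.pi * k / 100)
    plane = abs(4 * x + 2 * y + 4 * z)        # should vanish on the plane
    cylinder = abs(x * x + y * y - 16)        # should vanish on the cylinder
    residuals.append(max(plane, cylinder))
worst = max(residuals)
print(worst)
```

Both residuals should be zero up to rounding error for every $t$.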
3. Find a parameterization for the intersection of the plane $3x+2y+z=4$ and the elliptic cylinder $x^2+4z^2=1$.
Answer:
The cylinder is of the form $x=\cos t$, $2z=\sin t$ and $y=y$. Therefore, from the equation of the plane, $3\cos t+2y+\frac12\sin t=4$. Therefore, $y=2-\frac32\cos t-\frac14\sin t$ and this shows the parameterization is of the form $(x,y,z)=\left(\cos t,\ 2-\frac32\cos t-\frac14\sin t,\ \frac12\sin t\right)$ where $t\in[0,2\pi]$.
4. Find a parameterization for the straight line joining $(4,3,2)$ and $(1,7,6)$.
Answer:
$(x,y,z)=(4,3,2)+t(-3,4,4)=(4-3t,\ 3+4t,\ 2+4t)$ where $t\in[0,1]$.
.
1
3
1
3
2
2
Therefore, the parameterization is (x, y, z) = t, 2t 2 + 4 t, 4 t + 2 + 2t .
6. Find the area of $S$ if $S$ is the part of the circular cylinder $x^2+y^2=16$ which lies between $z=0$ and $z=4+y$.
Answer:
Use the parameterization, $x=4\cos v$, $y=4\sin v$ and $z=u$ with the parameter domain described as follows. The parameter, $v$ goes from $-\frac{\pi}{2}$ to $\frac{3\pi}{2}$ and for each $v$ in this interval, $u$ should go from 0 to $4+4\sin v$. To see this observe that the
cylinder has its axis parallel to the z axis and if you look at a side view of the
surface you would see something like this:
The positive $x$ axis is coming out of the paper toward you in the above picture and the angle $v$ is the usual angle measured from the positive $x$ axis. Therefore, the area is just $A=\int_{-\pi/2}^{3\pi/2}\int_0^{4+4\sin v}4\,du\,dv=32\pi$.
7. Find the area of $S$ if $S$ is the part of the cone $x^2+y^2=9z^2$ between $z=0$ and $z=h$.
Answer:
When $z=h$, $x^2+y^2=9h^2$ which is the boundary of a circle of radius $3h$. A parameterization of this surface is $x=u$, $y=v$, $z=\frac13\sqrt{u^2+v^2}$ where $(u,v)\in D$, a disk centered at the origin having radius $3h$. Therefore, the area is just
$$\int_D\sqrt{1+z_u^2+z_v^2}\,dA=\int_{-3h}^{3h}\int_{-\sqrt{9h^2-u^2}}^{\sqrt{9h^2-u^2}}\frac13\sqrt{10}\,dv\,du=3\pi h^2\sqrt{10}$$
$$|f_u\times f_v|=\left|\begin{pmatrix}0\\0\\1\end{pmatrix}\times\begin{pmatrix}-2\sin v\\2\cos v\\0\end{pmatrix}\right|=2.$$
R 2 R 1
2 3/2
2 r 2 r dr d =
1
+
4h
1
+
4h
1
.
2
6h
0
0
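11. Evaluate $\int_S(1+x)\,dA$ where $S$ is the part of the plane $2x+3y+3z=18$ which is in the first octant.
Answer:
The plane meets the $x$ axis at $x=9$, and on the plane $z=6-\frac23x-y$, so the area element is $\frac13\sqrt{22}\,dy\,dx$. Therefore, the integral is
$$\int_0^9\int_0^{6-\frac23x}(1+x)\,\frac13\sqrt{22}\,dy\,dx=36\sqrt{22}$$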
12. Evaluate $\int_S(1+x)\,dA$ where $S$ is the part of the cylinder $x^2+y^2=16$ between $z=0$ and $z=h$.
Answer:
Parametrize the cylinder as $x=4\cos\theta$ and $y=4\sin\theta$ while $z=t$ and the parameter domain is just $[0,2\pi]\times[0,h]$. Then the integral to evaluate would be
$$\int_0^{2\pi}\int_0^h(1+4\cos\theta)\,4\,dt\,d\theta=8h\pi.$$
Note how $4\cos\theta$ was substituted for $x$ and the area element is $4\,dt\,d\theta$.
13. Evaluate $\int_S(1+x)\,dA$ where $S$ is the hemisphere $x^2+y^2+z^2=16$ between $x=0$ and $x=4$.
Answer:
Parametrize the sphere as $x=4\sin\phi\cos\theta$, $y=4\sin\phi\sin\theta$, and $z=4\cos\phi$ and consider the values of the parameters. Since it is referred to as a hemisphere and involves $x>0$, $\theta\in\left[-\frac{\pi}{2},\frac{\pi}{2}\right]$ and $\phi\in[0,\pi]$. Then the area element is $16\sin\phi\,d\theta\,d\phi$ and so the integral to evaluate is
$$\int_0^{\pi}\int_{-\pi/2}^{\pi/2}(1+4\sin\phi\cos\theta)\,16\sin\phi\,d\theta\,d\phi$$
Answer:
sin () (2 + cos )
cos sin
0
cos
1/2
4 + 4 cos + cos2
|f f | =
=
1/2
4 + 4 cos + cos2
d d.
Therefore, the area is
Z 2 Z 2
Z
1/2
4 + 4 cos + cos2
d d =
0
(2 + cos ) d d = 8 2 .
R
f ([0,2][0,2])
h dA.
Answer:
sin () (4 + 2 cos )
2 cos sin
0
2 cos
1/2
= 64 + 64 cos + 16 cos2
|f f |
1/2
64 + 64 cos + 16 cos2
d d.
Therefore, the desired integral is
Z 2 Z 2
1/2
(cos ) 64 + 64 cos + 16 cos2
d d
0
=
0
(cos ) (8 + 4 cos ) d d = 8 2
R
f ([0,2][0,2])
Answer:
h dV.
The area element is
1/2
9 + 6 cos + cos2
d d.
=
0
cos2 (3 + cos ) d d = 6 2
1/2
68 + 64 cos + 12 cos2
d d
and so the surface area is
Z
0
68 + 64 cos + 12 cos2
1/2
d d.
After many computations, the area element is $\left(4+4\cos\alpha+\cos^2\alpha\right)^{1/2}d\theta\,d\alpha$. Therefore, the area is $\int_0^{2\pi}\int_0^{2\pi}(2+\cos\alpha)\,d\theta\,d\alpha=8\pi^2$.
Outcomes
1. Define and evaluate the divergence of a vector field in terms of Cartesian coordinates.
2. Define and evaluate the Curl of a vector field in Cartesian coordinates.
3. Discover vector identities involving the gradient, divergence, and curl.
4. Recall and verify the divergence theorem.
5. Apply the divergence theorem.
17.1
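$$\operatorname{div}f(x)\equiv\sum_{i=1}^{p}\frac{\partial f_i}{\partial x_i}(x).$$
$$\operatorname{curl}f\equiv\begin{vmatrix}i&j&k\\\frac{\partial}{\partial x}&\frac{\partial}{\partial y}&\frac{\partial}{\partial z}\\f_1&f_2&f_3\end{vmatrix}$$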
Note the similarity with the cross product. Sometimes the curl is called rot. (Short for
rotation not decay.) Also
$$\nabla^2f\equiv\nabla\cdot(\nabla f).$$
This last symbol is important enough that it is given a name, the Laplacian. It is also denoted by $\Delta$. Thus $\nabla^2f=\Delta f$. In addition for $f$ a vector field, the symbol $f\cdot\nabla$ is defined as a differential operator in the following way.
$$f\cdot\nabla(g)\equiv f_1(x)\frac{\partial g(x)}{\partial x_1}+f_2(x)\frac{\partial g(x)}{\partial x_2}+\cdots+f_p(x)\frac{\partial g(x)}{\partial x_p}.$$
Thus $f\cdot\nabla$ takes vector fields and makes them into new vector fields.
This definition is in terms of a given coordinate system but later coordinate free
definitions of the curl and div are presented. For now, everything is defined in terms of
a given Cartesian coordinate system. The divergence and curl have profound physical
significance and this will be discussed later. For now it is important to understand their
definition in terms of coordinates. Be sure you understand that for $f$ a vector field, $\operatorname{div}f$ is a scalar field meaning it is a scalar valued function of three variables. For a scalar field, $f$, $\nabla f$ is a vector field described earlier on Page 282. For $f$ a vector field having values in $\mathbb{R}^3$, $\operatorname{curl}f$ is another vector field.
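Example 17.1.2 Let $f(x)=xy\,i+(z-y)\,j+(\sin(x)+z)\,k$. Find $\operatorname{div}f$ and $\operatorname{curl}f$.
First the divergence of $f$ is
$$\frac{\partial(xy)}{\partial x}+\frac{\partial(z-y)}{\partial y}+\frac{\partial(\sin(x)+z)}{\partial z}=y+(-1)+1=y.$$
Now $\operatorname{curl}f$ is obtained by evaluating
$$\begin{vmatrix}i&j&k\\\frac{\partial}{\partial x}&\frac{\partial}{\partial y}&\frac{\partial}{\partial z}\\xy&z-y&\sin(x)+z\end{vmatrix}=i\left(\frac{\partial}{\partial y}(\sin(x)+z)-\frac{\partial}{\partial z}(z-y)\right)+j\left(\frac{\partial}{\partial z}(xy)-\frac{\partial}{\partial x}(\sin(x)+z)\right)+k\left(\frac{\partial}{\partial x}(z-y)-\frac{\partial}{\partial y}(xy)\right)=-i-\cos(x)\,j-x\,k.$$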
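The coordinate definitions of divergence and curl can be checked numerically for the field $f=xy\,i+(z-y)\,j+(\sin(x)+z)\,k$, whose divergence is $y$ and whose curl is $-i-\cos(x)\,j-x\,k$. The following Python sketch replaces each partial derivative with a central difference. The helper names `partial`, `div`, `curl`, the step `h`, and the sample point are assumptions made for this illustration.

```python
import math

def F(x, y, z):
    """The field (xy, z - y, sin(x) + z)."""
    return (x * y, z - y, math.sin(x) + z)

def partial(field, comp, var, p, h=1e-5):
    """Central-difference estimate of d(field[comp]) / d(p[var]) at the point p."""
    hi, lo = list(p), list(p)
    hi[var] += h
    lo[var] -= h
    return (field(*hi)[comp] - field(*lo)[comp]) / (2 * h)

def div(field, p):
    return sum(partial(field, i, i, p) for i in range(3))

def curl(field, p):
    return (
        partial(field, 2, 1, p) - partial(field, 1, 2, p),
        partial(field, 0, 2, p) - partial(field, 2, 0, p),
        partial(field, 1, 0, p) - partial(field, 0, 1, p),
    )

p = (0.4, -1.3, 2.0)
div_numeric = div(F, p)      # compare with y
curl_numeric = curl(F, p)    # compare with (-1, -cos x, -x)
print(div_numeric, curl_numeric)
```

At the sample point the output should be close to $y=-1.3$ for the divergence and to $(-1,-\cos 0.4,-0.4)$ for the curl.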
17.1.1 Vector Identities
There are many interesting identities which relate the gradient, divergence and curl.
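Theorem 17.1.3 Assuming $f$, $g$ are $C^2$ vector fields whenever necessary, the following identities are valid.
1. $\nabla\cdot(\nabla\times f)=0$
2. $\nabla\times\nabla\phi=0$
3. $\nabla\times(\nabla\times f)=\nabla(\nabla\cdot f)-\nabla^2f$ where $\nabla^2f$ is a vector field whose $i$th component is $\nabla^2f_i$.
4. $\nabla\cdot(f\times g)=g\cdot(\nabla\times f)-f\cdot(\nabla\times g)$
5. $\nabla\times(f\times g)=(\nabla\cdot g)f-(\nabla\cdot f)g+(g\cdot\nabla)f-(f\cdot\nabla)g$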
Proof: These are all easy to establish if you use the repeated index summation
convention and the reduction identities discussed on Page 116.
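$$\begin{aligned}\nabla\cdot(\nabla\times f)&=\partial_i(\nabla\times f)_i\\&=\partial_i(\varepsilon_{ijk}\partial_jf_k)\\&=\varepsilon_{ijk}\partial_i(\partial_jf_k)\\&=\varepsilon_{jik}\partial_j(\partial_if_k)\\&=-\varepsilon_{ijk}\partial_j(\partial_if_k)\\&=-\varepsilon_{ijk}\partial_i(\partial_jf_k)\\&=-\nabla\cdot(\nabla\times f).\end{aligned}$$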
This establishes the first formula. The second formula is done similarly. Now consider
the third.
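$$\begin{aligned}(\nabla\times(\nabla\times f))_i&=\varepsilon_{ijk}\partial_j(\nabla\times f)_k\\&=\varepsilon_{ijk}\partial_j(\varepsilon_{krs}\partial_rf_s)\\&=\overbrace{\varepsilon_{kij}}^{=\varepsilon_{ijk}}\varepsilon_{krs}\partial_j(\partial_rf_s)\\&=(\delta_{ir}\delta_{js}-\delta_{is}\delta_{jr})\partial_j(\partial_rf_s)\\&=\partial_j(\partial_if_j)-\partial_j(\partial_jf_i)\\&=\partial_i(\partial_jf_j)-\partial_j(\partial_jf_i)\\&=\left(\nabla(\nabla\cdot f)-\nabla^2f\right)_i\end{aligned}$$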
This establishes the third identity.
Consider the fourth identity.
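$$\begin{aligned}\nabla\cdot(f\times g)&=\partial_i(f\times g)_i\\&=\partial_i\varepsilon_{ijk}f_jg_k\\&=\varepsilon_{ijk}(\partial_if_j)g_k+\varepsilon_{ijk}f_j(\partial_ig_k)\\&=(\varepsilon_{kij}\partial_if_j)g_k-(\varepsilon_{jik}\partial_ig_k)f_j\\&=\nabla\times f\cdot g-\nabla\times g\cdot f.\end{aligned}$$
This establishes the fourth identity. Consider the fifth.
$$\begin{aligned}(\nabla\times(f\times g))_i&=\varepsilon_{ijk}\partial_j(f\times g)_k\\&=\varepsilon_{ijk}\partial_j\varepsilon_{krs}f_rg_s\\&=\varepsilon_{kij}\varepsilon_{krs}\partial_j(f_rg_s)\\&=(\delta_{ir}\delta_{js}-\delta_{is}\delta_{jr})\partial_j(f_rg_s)\\&=\partial_j(f_ig_j)-\partial_j(f_jg_i)\\&=(\partial_jg_j)f_i+g_j\partial_jf_i-(\partial_jf_j)g_i-f_j(\partial_jg_i)\\&=\left((\nabla\cdot g)f+(g\cdot\nabla)(f)-(\nabla\cdot f)g-(f\cdot\nabla)(g)\right)_i\end{aligned}$$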
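The proofs of the third and fifth identities rest on the reduction identity $\varepsilon_{kij}\varepsilon_{krs}=\delta_{ir}\delta_{js}-\delta_{is}\delta_{jr}$ (summed over $k$). Since the indices only take three values, this can be verified by brute force. The Python sketch below does so with indices 0, 1, 2. The closed formula used for the permutation symbol in `eps` is a convenience chosen here, not something taken from the notes.

```python
def eps(i, j, k):
    """Permutation symbol for indices in {0, 1, 2}: +1, -1, or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

R = range(3)
mismatches = [
    (i, j, r, s)
    for i in R for j in R for r in R for s in R
    if sum(eps(k, i, j) * eps(k, r, s) for k in R)
    != delta(i, r) * delta(j, s) - delta(i, s) * delta(j, r)
]
print(mismatches)   # an empty list means the identity holds for all 81 index choices
```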
17.1.2 Vector Potentials
In verifying this you need to use the following manipulation which will generally hold
under reasonable conditions but which has not been carefully shown yet.
$$\frac{\partial}{\partial x}\int_a^bh(x,t)\,dt=\int_a^b\frac{\partial h}{\partial x}(x,t)\,dt.\tag{17.2}$$
The above formula seems plausible because the integral is a sort of a sum and the
derivative of a sum is the sum of the derivatives. However, this sort of sloppy reasoning
will get you into all sorts of trouble. The formula involves the interchange of two limit
operations, the integral and the limit of a difference quotient. Such an interchange can
only be accomplished through a theorem. The following gives the necessary result. This
lemma is stated without proof.
Lemma 17.1.4 Suppose $h$ and $\frac{\partial h}{\partial x}$ are continuous. Then 17.2 holds.
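A numerical experiment does not prove the interchange of derivative and integral in 17.2, but it does illustrate it. The Python sketch below uses the sample function $h(x,t)=\sin(xt)$ on $[a,b]=[0,1]$, which is an arbitrary smooth choice made here, and compares a central-difference derivative of the integral with the integral of $\frac{\partial h}{\partial x}=t\cos(xt)$.

```python
import math

def integral(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    w = (b - a) / n
    return sum(g(a + (k + 0.5) * w) for k in range(n)) * w

def h(x, t):
    return math.sin(x * t)

def dh_dx(x, t):
    return t * math.cos(x * t)

x, step = 0.7, 1e-5
# Left side of 17.2: derivative (central difference) of the integral.
lhs = (integral(lambda t: h(x + step, t), 0, 1)
       - integral(lambda t: h(x - step, t), 0, 1)) / (2 * step)
# Right side of 17.2: integral of the partial derivative.
rhs = integral(lambda t: dh_dx(x, t), 0, 1)
print(lhs, rhs)
```

The two numbers should agree to about the accuracy of the difference quotient.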
The second formula of Theorem 17.1.3 states $\nabla\times\nabla\phi=0$. This suggests the following question: Suppose $\nabla\times f=0$, does it follow there exists $\phi$, a scalar field such that $\nabla\phi=f$? The answer to this is often yes and a theorem will be given and proved after the presentation of Stokes' theorem. This scalar field, $\phi$, is called a scalar potential for $f$.
17.1.3
There is also a fundamental result having great significance which involves $\nabla^2$ called the maximum principle. This principle says that if $\nabla^2u\ge0$ on a bounded open set, $U$, then $u$ achieves its maximum value on the boundary of $U$.
Theorem 17.1.5 Let $U$ be a bounded open set in $\mathbb{R}^n$ and suppose $u\in C^2(U)\cap C\left(\overline{U}\right)$
Therefore, $u(x)+\varepsilon|x|^2$ also has its maximum in $U$ because for $\varepsilon$ small enough,
$$u(x_0)+\varepsilon|x_0|^2>u(x_0)>\max\left\{u(x)+\varepsilon|x|^2:x\in\partial U\right\}$$
for all $x\in\partial U$.
Now let $x_1$ be the point in $U$ at which $u(x)+\varepsilon|x|^2$ achieves its maximum. As an exercise you should show that $\nabla^2(f+g)=\nabla^2f+\nabla^2g$ and therefore, $\nabla^2\left(u(x)+\varepsilon|x|^2\right)=\nabla^2u(x)+2n\varepsilon$. (Why?) Therefore,
$$0\ge\nabla^2u(x_1)+2n\varepsilon\ge2n\varepsilon,$$
a contradiction. This proves the theorem.
17.2 Exercises
(a) $\left(xyz,\ x^2+\ln(xy),\ \sin x^2+z\right)^T$
(d) $(x-2,\ y-3,\ z-6)^T$
(e) $\left(y^2,\ 2xy,\ \cos z\right)^T$
(d) $\ln\left(x^2+y^2\right)$
(e) $1/\sqrt{x^2+y^2+z^2}$
10. Verify the formula given in 17.1 is a vector potential for g assuming that div g = 0.
11. Show that if $\nabla^2u_k=0$ for each $k=1,2,\cdots,m$, and $c_k$ is a constant, then $\nabla^2\left(\sum_{k=1}^mc_ku_k\right)=0$ also.
12. In Theorem 17.1.5 why is $\nabla^2\left(\varepsilon|x|^2\right)=2n\varepsilon$?
13. Using Theorem 17.1.5 prove the following: Let $f\in C(\partial U)$ ($f$ is continuous on $\partial U$.) where $U$ is a bounded open set. Then there exists at most one solution, $u\in C^2(U)\cap C\left(\overline{U}\right)$ and $\nabla^2u=0$ in $U$ with $u=f$ on $\partial U$. Hint: Suppose there are two solutions, $u_i$, $i=1,2$ and let $w=u_1-u_2$. Then use the maximum principle.
14. Suppose $B$ is a vector field and $\nabla\times A=B$. Thus $A$ is a vector potential for $B$. Show that $A+\nabla\phi$ is also a vector potential for $B$. Here $\phi$ is just a $C^2$ scalar field. Thus the vector potential is not unique.
17.3
The divergence theorem relates an integral over a set to one on the boundary of the set.
It is also called Gauss's theorem.
Definition 17.3.1 A subset, $V$ of $\mathbb{R}^3$ is called cylindrical in the $x$ direction if it is of the form
$$V=\{(x,y,z):\phi(y,z)\le x\le\psi(y,z)\text{ for }(y,z)\in D\}$$
where $D$ is a subset of the $yz$ plane. $V$ is cylindrical in the $z$ direction if
$$V=\{(x,y,z):\phi(x,y)\le z\le\psi(x,y)\text{ for }(x,y)\in D\}$$
where $D$ is a subset of the $xy$ plane, and $V$ is cylindrical in the $y$ direction if
$$V=\{(x,y,z):\phi(x,z)\le y\le\psi(x,z)\text{ for }(x,z)\in D\}$$
where $D$ is a subset of the $xz$ plane. If $V$ is cylindrical in the $z$ direction, denote by $\partial V$ the boundary of $V$ defined to be the points of the form $(x,y,\phi(x,y))$, $(x,y,\psi(x,y))$ for $(x,y)\in D$, along with points of the form $(x,y,z)$ where $(x,y)\in\partial D$ and $\phi(x,y)\le z\le\psi(x,y)$. Points on $\partial D$ are defined to be those for which every open ball contains points which are in $D$ as well as points which are not in $D$. A similar definition holds for $\partial V$ in the case that $V$ is cylindrical in one of the other directions.
The following picture illustrates the above definition in the case of V cylindrical in
the z direction.
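[Figure: a region $V$ cylindrical in the $z$ direction, bounded above by the surface $z=\psi(x,y)$ and below by the surface $z=\phi(x,y)$.]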
Of course, many three dimensional sets are cylindrical in each of the coordinate
directions. For example, a ball or a rectangle or a tetrahedron are all cylindrical in each
direction. The following lemma allows the exchange of the volume integral of a partial
derivative for an area integral in which the derivative is replaced with multiplication by
an appropriate component of the unit exterior normal.
Lemma 17.3.2 Suppose $V$ is cylindrical in the $z$ direction and that $\phi$ and $\psi$ are the functions in the above definition. Assume $\phi$ and $\psi$ are $C^1$ functions and suppose $F$ is a $C^1$ function defined on $V$. Also, let $n=(n_x,n_y,n_z)$ be the unit exterior normal to $\partial V$. Then
$$\int_V\frac{\partial F}{\partial z}(x,y,z)\,dV=\int_{\partial V}Fn_z\,dA.$$
Proof: From the fundamental theorem of calculus,
$$\begin{aligned}\int_V\frac{\partial F}{\partial z}(x,y,z)\,dV&=\int_D\int_{\phi(x,y)}^{\psi(x,y)}\frac{\partial F}{\partial z}(x,y,z)\,dz\,dx\,dy\\&=\int_D\left[F(x,y,\psi(x,y))-F(x,y,\phi(x,y))\right]dx\,dy\end{aligned}\tag{17.3}$$
Now the unit exterior normal on the top of $V$, the surface $(x,y,\psi(x,y))$ is
$$\frac{1}{\sqrt{\psi_x^2+\psi_y^2+1}}\left(-\psi_x,-\psi_y,1\right).$$
This follows from the observation that the top surface is the level surface, $z-\psi(x,y)=0$ and so the gradient of this function of three variables is perpendicular to the level surface. It points in the correct direction because the $z$ component is positive. Therefore, on the top surface,
$$n_z=\frac{1}{\sqrt{\psi_x^2+\psi_y^2+1}}$$
Similarly, the unit normal to the surface on the bottom is
$$\frac{1}{\sqrt{\phi_x^2+\phi_y^2+1}}\left(\phi_x,\phi_y,-1\right)$$
and so on the bottom surface,
$$n_z=\frac{-1}{\sqrt{\phi_x^2+\phi_y^2+1}}$$
Note that here the $z$ component is negative because since it is the outer normal it must point down. On the lateral surface, the one where $(x,y)\in\partial D$ and $z\in[\phi(x,y),\psi(x,y)]$, $n_z=0$.
The area element on the top surface is $dA=\sqrt{\psi_x^2+\psi_y^2+1}\,dx\,dy$ while the area element on the bottom surface is $\sqrt{\phi_x^2+\phi_y^2+1}\,dx\,dy$. Therefore, the last expression in 17.3 is of the form,
$$\int_DF(x,y,\psi(x,y))\overbrace{\frac{1}{\sqrt{\psi_x^2+\psi_y^2+1}}}^{n_z}\overbrace{\sqrt{\psi_x^2+\psi_y^2+1}\,dx\,dy}^{dA}+\int_DF(x,y,\phi(x,y))\overbrace{\left(\frac{-1}{\sqrt{\phi_x^2+\phi_y^2+1}}\right)}^{n_z}\overbrace{\sqrt{\phi_x^2+\phi_y^2+1}\,dx\,dy}^{dA}+\int_{\text{Lateral surface}}Fn_z\,dA,$$
General formulations of the divergence theorem involve Hausdorff measures and the
Lebesgue integral, a better integral than the old fashioned Riemann integral which has
been obsolete now for almost 100 years. When all is said and done, one finds that the
conclusion of the divergence theorem is usually true and the theorem can be used with
confidence.
Example 17.3.5 Let $V=[0,1]\times[0,1]\times[0,1]$. That is, $V$ is the cube in the first octant having the lower left corner at $(0,0,0)$ and the sides of length 1. Let $F(x,y,z)=xi+yj+zk$. Find the flux integral in which $n$ is the unit exterior normal.
$$\int_{\partial V}F\cdot n\,dS$$
You can certainly inflict much suffering on yourself by breaking the surface up into 6 pieces corresponding to the 6 sides of the cube, finding a parameterization for each face and adding up the appropriate flux integrals. For example, $n=k$ on the top face and $n=-k$ on the bottom face. On the top face, a parameterization is $(x,y,1):(x,y)\in[0,1]\times[0,1]$. The area element is just $dx\,dy$. It isn't really all that hard to do it this way but it is much easier to use the divergence theorem. The above integral equals
$$\int_V\operatorname{div}(F)\,dV=\int_V3\,dV=3.$$
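For comparison, the face-by-face computation for the unit cube and $F(x,y,z)=xi+yj+zk$ can be handed to a computer. The Python sketch below adds up the six flux integrals with the midpoint rule. The function name `face_flux` and the grid size are choices made for this illustration. On the face where one coordinate is fixed, the outward normal is plus or minus the corresponding coordinate vector, so $F\cdot n$ is plus or minus one component of $F$.

```python
def F(x, y, z):
    return (x, y, z)

def face_flux(axis, value, sign, n=20):
    """Flux of F through the face {coordinate[axis] = value} of the unit cube.

    The outward unit normal is sign * e_axis, so F.n = sign * F[axis].
    """
    w = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            p = [(i + 0.5) * w, (j + 0.5) * w]
            p.insert(axis, value)          # put the fixed coordinate in place
            total += sign * F(*p)[axis] * w * w
    return total

# Three faces at coordinate 1 (normal +e_axis) and three at coordinate 0 (normal -e_axis).
flux = sum(face_flux(a, 1.0, +1) + face_flux(a, 0.0, -1) for a in range(3))
print(flux)
```

The total should agree with the value 3 obtained from the volume integral of $\operatorname{div}F=3$.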
(x, y, z) : x2 + y 2 + z 2 1
and
=
0
V
1
Example 17.3.7 Suppose $V$ is an open set in $\mathbb{R}^3$ for which the divergence theorem holds. Let $F(x,y,z)=xi+yj+zk$. Then show
$$\int_{\partial V}F\cdot n\,dS=3\times\text{volume}(V).$$
The message of the divergence theorem is the relation between the volume integral
and an area integral. This is the exciting thing about this marvelous theorem. It is not
its utility as a method for evaluations of boring problems. This will be shown in the
examples of its use which follow.
17.3.1
The divergence theorem also makes possible a coordinate free definition of the divergence.
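Theorem 17.3.8 Let $B(x,\delta)$ be the ball centered at $x$ having radius $\delta$ and let $F$ be a $C^1$ vector field. Then letting $v(B(x,\delta))$ denote the volume of $B(x,\delta)$ given by
$$\int_{B(x,\delta)}dV,$$
it follows
$$\operatorname{div}F(x)=\lim_{\delta\to0+}\frac{1}{v(B(x,\delta))}\int_{\partial B(x,\delta)}F\cdot n\,dA.\tag{17.4}$$
Proof: The divergence theorem holds for balls because they are cylindrical in every direction. Therefore,
$$\frac{1}{v(B(x,\delta))}\int_{\partial B(x,\delta)}F\cdot n\,dA=\frac{1}{v(B(x,\delta))}\int_{B(x,\delta)}\operatorname{div}F(y)\,dV.$$
Therefore, since $\operatorname{div}F(x)$ is a constant,
$$\begin{aligned}\left|\operatorname{div}F(x)-\frac{1}{v(B(x,\delta))}\int_{\partial B(x,\delta)}F\cdot n\,dA\right|&=\left|\operatorname{div}F(x)-\frac{1}{v(B(x,\delta))}\int_{B(x,\delta)}\operatorname{div}F(y)\,dV\right|\\&=\left|\frac{1}{v(B(x,\delta))}\int_{B(x,\delta)}(\operatorname{div}F(x)-\operatorname{div}F(y))\,dV\right|\\&\le\frac{1}{v(B(x,\delta))}\int_{B(x,\delta)}|\operatorname{div}F(x)-\operatorname{div}F(y)|\,dV\\&\le\frac{1}{v(B(x,\delta))}\int_{B(x,\delta)}\frac{\varepsilon}{2}\,dV<\varepsilon\end{aligned}$$
whenever $\delta$ is small enough due to the continuity of $\operatorname{div}F$. Since $\varepsilon$ is arbitrary, this shows 17.4.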
How is this definition independent of coordinates? It only involves geometrical notions of volume and dot product. This is why. Imagine rotating the coordinate axes,
keeping all distances the same and expressing everything in terms of the new coordinates. The divergence would still have the same value because of this theorem.
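Formula 17.4 can be tried out numerically: compute the flux of a field out of a small sphere, divide by the volume of the ball, and compare with the coordinate formula for the divergence. The Python sketch below is an illustration under assumptions made here: it reuses the field $xy\,i+(z-y)\,j+(\sin(x)+z)\,k$, whose divergence is $y$, parametrizes the sphere by spherical angles, and uses the midpoint rule with an arbitrary grid size `n`.

```python
import math

def F(x, y, z):
    """Sample field (xy, z - y, sin(x) + z), whose divergence is y."""
    return (x * y, z - y, math.sin(x) + z)

def flux_over_volume(field, center, radius, n=60):
    """Flux of field out of the sphere |p - center| = radius, divided by the ball's volume."""
    dphi, dtheta = math.pi / n, 2 * math.pi / n
    flux = 0.0
    for a in range(n):
        phi = (a + 0.5) * dphi
        for b in range(n):
            theta = (b + 0.5) * dtheta
            # Unit exterior normal at the sample point on the sphere.
            nrm = (math.sin(phi) * math.cos(theta),
                   math.sin(phi) * math.sin(theta),
                   math.cos(phi))
            p = [c + radius * u for c, u in zip(center, nrm)]
            dA = radius ** 2 * math.sin(phi) * dphi * dtheta
            flux += sum(f * u for f, u in zip(field(*p), nrm)) * dA
    return flux / (4.0 / 3.0 * math.pi * radius ** 3)

center = (0.3, 0.7, -0.2)
estimates = [flux_over_volume(F, center, r) for r in (0.5, 0.1, 0.02)]
print(estimates)   # each entry should be close to the y coordinate of the center
```

Nothing in `flux_over_volume` refers to partial derivatives with respect to particular axes, only to normals, areas, and dot products, which is the point of the coordinate free description.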
17.4
17.4.1 Hydrostatic Pressure
Imagine a fluid which does not move which is acted on by an acceleration, g. Of course
the acceleration is usually the acceleration of gravity. Also let the density of the fluid be $\rho$, a function of position. What can be said about the pressure, $p$, in the fluid? Let $B(x,\varepsilon)$ be a small ball centered at the point, $x$. Then the force the fluid exerts on this ball would equal
$$-\int_{\partial B(x,\varepsilon)}p\,n\,dA.$$
Here $n$ is the unit exterior normal at a small piece of $\partial B(x,\varepsilon)$ having area $dA$. By the divergence theorem, (see Problem 1 on Page 390) this integral equals
$$-\int_{B(x,\varepsilon)}\nabla p\,dV.$$
Also the force acting on the fluid in this small ball due to the acceleration is
$$\int_{B(x,\varepsilon)}\rho\,\mathbf{g}\,dV.$$
Since it is given that the fluid does not move, the sum of these forces must equal zero. Thus
$$\int_{B(x,\varepsilon)}\rho\,\mathbf{g}\,dV=\int_{B(x,\varepsilon)}\nabla p\,dV.$$
Since this must hold for any ball in the fluid of any radius, it must be that
$$\nabla p=\rho\,\mathbf{g}.\tag{17.5}$$
It turns out that the pressure in a lake at depth z is equal to 62.5z. This is easy to
see from 17.5. In this case, $\mathbf{g}=gk$ where $g=32$ feet/sec$^2$. The weight of a cubic foot of water is 62.5 pounds. Therefore, the mass in slugs of this water is 62.5/32. Since it is a cubic foot, this is also the density of the water in slugs per cubic foot. Also, it is normally assumed that water is incompressible$^1$. Therefore, this is the density of water at any depth. Therefore,
$$\frac{\partial p}{\partial x}i+\frac{\partial p}{\partial y}j+\frac{\partial p}{\partial z}k=\frac{62.5}{32}\times32\,k$$
and so $p$ does not depend on $x$ and $y$ and is only a function of $z$. It follows $p(0)=0$, and $p'(z)=62.5$. Therefore, $p(x,y,z)=62.5z$. This establishes the claim. This is interesting but 17.5 is more interesting because it does not require $\rho$ to be constant.
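When the density is not constant, 17.5 in the lake setting reduces to $p'(z)=\rho(z)g$ with $p(0)=0$, and the pressure is found by integration. The Python sketch below does this with the midpoint rule. The function name `pressure_at_depth` and the stratified density profile are hypothetical and purely illustrative. Only the constant value of 62.5 pounds per cubic foot comes from the discussion above.

```python
def pressure_at_depth(depth, weight_density, n=1000):
    """Integrate dp/dz = rho*g from the surface (p = 0) down to `depth`.

    weight_density(z) returns rho(z)*g in pounds per cubic foot at depth z feet,
    so the result is in pounds per square foot. Midpoint rule with n slices.
    """
    dz = depth / n
    return sum(weight_density((k + 0.5) * dz) for k in range(n)) * dz

# Constant density: water at 62.5 pounds per cubic foot, so p = 62.5 * depth.
p_const = pressure_at_depth(10.0, lambda z: 62.5)

# Hypothetical stratified lake (illustrative only): density grows slightly with depth.
p_strat = pressure_at_depth(10.0, lambda z: 62.5 * (1 + 0.001 * z))
print(p_const, p_strat)
```

With constant density the function reproduces $62.5z$. With the made-up profile it returns the slightly larger value obtained by integrating $62.5(1+0.001z)$ from 0 to 10.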
17.4.2
Archimedes' principle states that when a solid body is immersed in a fluid the net force acting on the body by the fluid is directly up and equals the total weight of the fluid displaced.
Denote the set of points in three dimensions occupied by the body as $V$. Then for $dA$ an increment of area on the surface of this body, the force acting on this increment of area would equal $-p\,dA\,n$ where $n$ is the exterior unit normal. Therefore, since the fluid does not move,
$$\int_{\partial V}-p\,n\,dA=\int_V-\nabla p\,dV=\int_V\rho g\,dV\,k$$
which equals the total weight of the displaced fluid and you note the force is directed upward as claimed. Here $\rho$ is the density and 17.5 is being used. There is an interesting point in the above explanation. Why does the second equation hold? Imagine that $V$ were filled with fluid. Then the equation follows from 17.5 because in this equation $\mathbf{g}=-gk$.
17.4.3
Let x be a point in three dimensional space and let (x1 , x2 , x3 ) be Cartesian coordinates
of this point. Let there be a three dimensional body having density, $\rho=\rho(x,t)$.
The heat flux, $J$, in the body is defined as a vector which has the following property.
$$\text{Rate at which heat crosses }S=\int_SJ\cdot n\,dA$$
where $n$ is the unit normal in the desired direction. Thus if $V$ is a three dimensional body,
$$\text{Rate at which heat leaves }V=\int_{\partial V}J\cdot n\,dA$$
1 There is no such thing as an incompressible fluid but this doesn't stop people from making this assumption.
(17.6)
Take an arbitrary V for which the divergence theorem holds. Then the time rate of
change of the heat in V is
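$$\frac{d}{dt}\int_V\rho(x,t)c(x,t)u(x,t)\,dV=\int_V\frac{\partial(\rho(x,t)c(x,t)u(x,t))}{\partial t}\,dV$$
where, as in the preceding example, this is a physical derivation so the consideration of hard mathematics is not necessary. Therefore, from the Fourier law of heat conduction, $\frac{d}{dt}\int_V\rho(x,t)c(x,t)u(x,t)\,dV=$
$$\begin{aligned}\int_V\frac{\partial(\rho(x,t)c(x,t)u(x,t))}{\partial t}\,dV&=\overbrace{-\int_{\partial V}J\cdot n\,dA}^{\text{rate at which heat enters}}+\int_Vf(x,u,t)\,dV\\&=\int_{\partial V}k\nabla(u)\cdot n\,dA+\int_Vf(x,u,t)\,dV=\int_V\left(\nabla\cdot(k\nabla(u))+f\right)dV.\end{aligned}$$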
Since this holds for every sample volume, V it must be the case that the above
reaction diffusion equation, 17.6 holds. Note that more interesting equations can be
obtained by letting more of the quantities in the equation depend on temperature.
However, the above is a fairly hard equation and people usually assume the coefficient
of thermal conductivity depends only on x and that the reaction term, f depends only
on $x$ and $t$ and that $\rho$ and $c$ are constant. Then it reduces to the much easier equation,
$$\frac{\partial}{\partial t}u(x,t)=\frac{1}{\rho c}\nabla\cdot(k(x)\nabla u(x,t))+f(x,t).\tag{17.7}$$
This is often referred to as the heat equation. Sometimes there are modifications of this
in which k is not just a scalar but a matrix to account for different heat flow properties
in different directions. However, they are not much harder than the above. The major
mathematical difficulties result from allowing k to depend on temperature.
It is known that the heat equation is not correct even if the thermal conductivity
did not depend on u because it implies infinite speed of propagation of heat. However,
this does not prevent people from using it.
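To see what solutions of the heat equation look like, here is a minimal numerical sketch in Python. Everything specific in it is an assumption made for illustration and is not taken from the derivation above: one space dimension on $[0,1]$, $\rho=c=k=1$, no reaction term, temperature held at 0 at both ends, initial temperature $\sin(\pi x)$, and the simple explicit finite difference scheme. For these choices the exact solution is $e^{-\pi^2t}\sin(\pi x)$, which gives something to compare against.

```python
import math

def heat_step(u, r):
    """One explicit (forward-time, centered-space) step; end values stay fixed at 0."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
    return new

n = 50                      # number of grid intervals on [0, 1]
dx = 1.0 / n
r = 0.4                     # r = dt / dx^2; this explicit scheme needs r <= 1/2
dt = r * dx * dx
steps = 625                 # final time T = steps * dt = 0.1

u = [math.sin(math.pi * i * dx) for i in range(n + 1)]
u[-1] = 0.0                 # sin(pi) is not exactly 0 in floating point
for _ in range(steps):
    u = heat_step(u, r)

T = steps * dt
exact_mid = math.exp(-math.pi ** 2 * T)   # exact solution at x = 1/2
numeric_mid = u[n // 2]
print(numeric_mid, exact_mid)
```

The restriction on `r` is a property of this particular scheme. Choosing a larger time step makes the computed values oscillate and blow up even though the true solution decays.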
17.4.4 Balance Of Mass
Let y be a point in three dimensional space and let (y1 , y2 , y3 ) be Cartesian coordinates
of this point. Let V be a region in three dimensional space and suppose a fluid having
$$\frac{\partial\rho}{\partial t}+\nabla\cdot(\rho v)=f(y,t)\tag{17.8}$$
To see this is so, take an arbitrary V for which the divergence theorem holds. Then
the time rate of change of the mass in V is
$$\frac{\partial}{\partial t}\int_V\rho(y,t)\,dV=\int_V\frac{\partial\rho(y,t)}{\partial t}\,dV$$
where the derivative was taken under the integral sign with respect to $t$. (This is a physical derivation and therefore, it is not necessary to fuss with the hard mathematics related to the change of limit operations. You should expect this to be true under fairly general conditions because the integral is a sort of sum and the derivative of a sum is the sum of the derivatives.) Therefore,
$$\int_V\frac{\partial\rho(y,t)}{\partial t}\,dV=-\int_{\partial V}\rho(y,t)v(y,t)\cdot n\,dA+\int_Vf(y,t)\,dV=\int_V\left(-\nabla\cdot(\rho(y,t)v(y,t))+f(y,t)\right)dV.$$
Since this holds for every sample volume, V it must be the case that the equation
of continuity holds. Again, there are interesting mathematical questions here which can
be explored but since it is a physical derivation, it is not necessary to dwell too much
on them. If all the functions involved are continuous, it is certainly true but it is true
under far more general conditions than that.
Also note this equation applies to many situations and f might depend on more
than just $y$ and $t$. In particular, $f$ might depend also on temperature and the density, $\rho$.
This would be the case for example if you were considering the mass of some chemical
and f represented a chemical reaction. Mass balance is a general sort of equation valid
in many contexts.
17.4.5 Balance Of Momentum
This example is a little more substantial than the above. It concerns the balance of
momentum for a continuum. To see a full description of all the physics involved, you
should consult a book on continuum mechanics. The situation is of a material in three
dimensions and it deforms and moves about in three dimensions. This means this
material is not a rigid body. Let B0 denote an open set identifying a chunk of this
material at time t = 0 and let Bt be an open set which identifies the same chunk of
material at time t > 0.
Let y (t, x) = (y1 (t, x) , y2 (t, x) , y3 (t, x)) denote the position with respect to Cartesian coordinates at time t of the point whose position at time t = 0 is x = (x1 , x2 , x3 ) .
The coordinates, x are sometimes called the reference coordinates and sometimes the
material coordinates and sometimes the Lagrangian coordinates. The coordinates, y are
called the Eulerian coordinates or sometimes the spatial coordinates and the function, $(t,x)\to y(t,x)$ is called the motion. Thus
$$y(0,x)=x.\tag{17.9}$$
The derivative,
$$D_2y(t,x)\equiv D_xy(t,x)$$
is called the deformation gradient. Recall the notation means you fix $t$ and consider the function, $x\to y(t,x)$, taking its derivative. Since it is a linear transformation, it is represented by the usual matrix, whose $ij$th entry is given by
$$F_{ij}(x)=\frac{\partial y_i(t,x)}{\partial x_j}.$$
Let $\rho(t,y)$ denote the density of the material at time $t$ at the point, $y$ and let $\rho_0(x)$ denote the density of the material at the point, $x$. Thus $\rho_0(x)=\rho(0,x)=\rho(0,y(0,x))$. The first task is to consider the relationship between $\rho(t,y)$ and $\rho_0(x)$. The following picture is useful to illustrate the ideas.
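[Figure: the chunk $V_0$ containing the point $x$, with $N$ marked on it, is carried by the motion $y=y(t,x)$ to the chunk $V_t$ containing the point $y$.]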
Lemma 17.4.1 $\rho_0(x)=\rho(t,y(t,x))\det(F)$ and in any reasonable physical motion, $\det(F)>0$.
Proof: Let $V_0$ represent a small chunk of material at $t=0$ and let $V_t$ represent the same chunk of material at time $t$. I will be a little sloppy and refer to $V_0$ as the small chunk of material at time $t=0$ and $V_t$ as the chunk of material at time $t$ rather than an open set representing the chunk of material. Then by the change of variables formula for multiple integrals,
$$\int_{V_t}dV=\int_{V_0}|\det(F)|\,dV.$$
If $\det(F)=0$ for some $t$ the above formula shows that the chunk of material went from positive volume to zero volume and this is not physically possible. Therefore, it is impossible that $\det(F)$ can equal zero. However, at $t=0$, $F=I$, the identity because of 17.9. Therefore, $\det(F)=1$ at $t=0$ and if it is assumed $t\to\det(F)$ is continuous it follows by the intermediate value theorem that $\det(F)>0$ for all $t$. Of course it is not known for sure this function is continuous but the above shows why it is at least reasonable to expect $\det(F)>0$.
Now using the change of variables formula,
$$\text{mass of }V_t=\int_{V_t}\rho(t,y)\,dV=\int_{V_0}\rho(t,y(t,x))\det(F)\,dV=\text{mass of }V_0=\int_{V_0}\rho_0(x)\,dV.$$
Since $V_0$ is arbitrary, it follows that $\rho_0(x)=\rho(t,y(t,x))\det(F)$ as claimed. Note this shows that $\det(F)$ is a magnification factor for the density.
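The relation $\rho_0(x)=\rho(t,y(t,x))\det(F)$ is easy to experiment with. The Python sketch below uses a made-up motion (a stretch, a shear, and a second stretch, chosen only so that $y(0,x)=x$ and $\det(F)=(1+t)(1+t/2)$ can be checked by hand), estimates the deformation gradient by central differences, and uses the lemma to get the current density from a reference density. All names and numbers are assumptions made for illustration.

```python
def motion(t, x):
    """Hypothetical motion y(t, x): stretch along e1, shear of e2 by x3, stretch along e3."""
    x1, x2, x3 = x
    return ((1 + t) * x1, x2 + t * x3, (1 + 0.5 * t) * x3)

def deformation_gradient(t, x, h=1e-6):
    """F_ij = d y_i / d x_j, estimated by central differences."""
    F = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        hi, lo = list(x), list(x)
        hi[j] += h
        lo[j] -= h
        yh, yl = motion(t, hi), motion(t, lo)
        for i in range(3):
            F[i][j] = (yh[i] - yl[i]) / (2 * h)
    return F

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

rho0 = 1000.0                        # reference density, arbitrary units
t, x = 0.5, (0.2, -0.4, 1.0)
detF = det3(deformation_gradient(t, x))
rho_t = rho0 / detF                  # from rho0 = rho * det(F)
print(detF, rho_t)
```

For this motion the material has expanded, $\det(F)>1$, so the density at time $t$ is smaller than the reference density, as the lemma predicts.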
Now consider a small chunk of material, $V_t$ at time $t$ which corresponds to $V_0$ at time $t=0$. The total linear momentum of this material at time $t$ is
$$\int_{V_t}\rho(t,y)v(t,y)\,dV$$
where $v$ is the velocity. By Newton's second law, the time rate of change of this linear momentum should equal the total force acting on the chunk of material. In the following derivation, $dV(y)$ will indicate the integration is taking place with respect to the variable, $y$. By Lemma 17.4.1 and the change of variables formula for multiple integrals
$$\begin{aligned}\frac{d}{dt}\int_{V_t}\rho(t,y)v(t,y)\,dV(y)&=\frac{d}{dt}\int_{V_0}\rho(t,y(t,x))v(t,y(t,x))\det(F)\,dV(x)\\&=\frac{d}{dt}\int_{V_0}\rho_0(x)v(t,y(t,x))\,dV(x)\\&=\int_{V_0}\rho_0(x)\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]dV(x)\\&=\int_{V_t}\overbrace{\rho(t,y)\det(F)}^{\rho_0(x)}\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]\frac{1}{\det(F)}\,dV(y)\\&=\int_{V_t}\rho(t,y)\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]dV(y).\end{aligned}$$
Having taken the derivative of the total momentum, it is time to consider the total force
acting on the chunk of material.
The force comes from two sources, a body force, b and a force which acts on the
boundary of the chunk of material called a traction force. Typically, the body force
is something like gravity in which case, $b=-g\rho k$, assuming the Cartesian coordinate system has been chosen in the usual manner. The traction force is of the form
$$\int_{\partial V_t}s(t,y,n)\,dA$$
where $n$ is the unit exterior normal. Thus the traction force depends on position, time, and the orientation of the boundary of $V_t$. Cauchy showed the existence of a linear transformation, $T(t,y)$ such that $T(t,y)n=s(t,y,n)$. It follows there is a matrix, $T_{ij}(t,y)$ such that the $i$th component of $s$ is given by $s_i(t,y,n)=T_{ij}(t,y)n_j$. Cauchy also showed this matrix is symmetric, $T_{ij}=T_{ji}$. It is called the Cauchy stress. Using Newton's second law to equate the time derivative of the total linear momentum with the applied forces and using the usual repeated index summation convention,
$$\int_{V_t}\rho(t,y)\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]dV(y)=\int_{V_t}b(t,y)\,dV(y)+\int_{\partial V_t}T_{ij}(t,y)n_j\,dA.$$
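Here is where the divergence theorem is used. In the last integral, the multiplication by
$n_j$ is exchanged for the jth partial derivative and an integral over $V_t$. Thus
\[
\int_{V_t}\rho(t,y)\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]dV(y)
= \int_{V_t} b(t,y)\,dV(y) + \int_{V_t}\frac{\partial\left(T_{ij}(t,y)\right)}{\partial y_j}\,dV(y).
\]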
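Since $V_t$ was arbitrary, it follows
\[
\rho(t,y)\left[\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}\right]
= b(t,y) + \frac{\partial T_{ij}}{\partial y_j}.
\]
The term $\frac{\partial v}{\partial t}+\frac{\partial v}{\partial y_i}\frac{\partial y_i}{\partial t}$ is the total derivative with respect to t of the velocity v. Thus,
writing $\operatorname{div}(T)$ for the vector whose ith component is $\partial T_{ij}/\partial y_j$, you might see this written as
\[
\rho\,\dot{v} = b + \operatorname{div}(T).
\]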
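The above formulation of the balance of momentum involves the spatial coordinates, y, but people also like to formulate momentum balance in terms of the material
coordinates, x. Of course this changes everything.
The momentum in terms of the material coordinates is
\[
\int_{V_0}\rho_0(x)\,v(t,x)\,dV
\]
and so, since $V_0$ does not depend on t, the balance of momentum takes the form
\[
\int_{V_0}\rho_0(x)\,v_t(t,x)\,dV = \int_{V_0} b_0(t,x)\,dV + \int_{\partial V_t} T_{ij}\,n_j\,dA, \qquad (17.10)
\]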
the first term on the right being the contribution of the body force given per unit volume
in the material coordinates and the last term being the traction force discussed earlier.
The task is to write this last integral as one over $\partial V_0$. For $y \in \partial V_t$ there is a unit outer
normal, n. Here $y = y(t,x)$ for $x \in \partial V_0$. Then define N to be the unit outer normal to
$V_0$ at the point, x. Near the point $y \in \partial V_t$ the surface, $\partial V_t$, is given parametrically in
the form $y = y(s,t)$ for $(s,t) \in D \subseteq \mathbb{R}^2$ and it can be assumed the unit normal to $\partial V_t$
near this point is
\[
n = \frac{y_s(s,t)\times y_t(s,t)}{\left|y_s(s,t)\times y_t(s,t)\right|}
\]
with the area element given by $\left|y_s(s,t)\times y_t(s,t)\right| ds\,dt$. This is true for $y \in P_t \subseteq \partial V_t$,
a small piece of $\partial V_t$. Therefore, the last integral in 17.10 is the sum of integrals over
small pieces of the form
\[
\int_{P_t} T_{ij}\,n_j\,dA \qquad (17.11)
\]
where $P_t$ is parametrized by $y(s,t)$, $(s,t)\in D$. Thus the integral in 17.11 is of the form
\[
\int_D T_{ij}(y(s,t))\left(y_s(s,t)\times y_t(s,t)\right)_j ds\,dt.
\]
By the chain rule this equals
\[
\int_D T_{ij}(y(s,t))\left(\frac{\partial y}{\partial x_\alpha}\frac{\partial x_\alpha}{\partial s}\times\frac{\partial y}{\partial x_\beta}\frac{\partial x_\beta}{\partial t}\right)_j ds\,dt.
\]
Remember $y = y(t,x)$ and it is always assumed the mapping $x \to y(t,x)$ is one to one
and so, since on the surface $\partial V_t$ near y, the points are functions of $(s,t)$, it follows x is
also a function of $(s,t)$. Now by the properties of the cross product, this last integral
equals
\[
\int_D T_{ij}(x(s,t))\,\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\left(\frac{\partial y}{\partial x_\alpha}\times\frac{\partial y}{\partial x_\beta}\right)_j ds\,dt \qquad (17.12)
\]
where here $x(s,t)$ is the point of $\partial V_0$ which corresponds with $y(s,t)\in\partial V_t$. Thus
$T_{ij}(x(s,t)) = T_{ij}(y(s,t))$.
(Perhaps this is a slight abuse of notation because $T_{ij}$ is defined on $\partial V_t$, not on $\partial V_0$,
but it avoids introducing extra symbols.) Next 17.12 equals
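\[
\begin{aligned}
&\int_D T_{ij}(x(s,t))\,\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\,\varepsilon_{jab}\frac{\partial y_a}{\partial x_\alpha}\frac{\partial y_b}{\partial x_\beta}\,ds\,dt\\
&= \int_D T_{ij}(x(s,t))\,\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\,\varepsilon_{cab}\,\delta_{jc}\frac{\partial y_a}{\partial x_\alpha}\frac{\partial y_b}{\partial x_\beta}\,ds\,dt\\
&= \int_D T_{ij}(x(s,t))\,\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\,\varepsilon_{cab}\overbrace{\frac{\partial y_c}{\partial x_p}\frac{\partial x_p}{\partial y_j}}^{=\delta_{jc}}\frac{\partial y_a}{\partial x_\alpha}\frac{\partial y_b}{\partial x_\beta}\,ds\,dt\\
&= \int_D T_{ij}(x(s,t))\,\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\frac{\partial x_p}{\partial y_j}\overbrace{\varepsilon_{cab}\frac{\partial y_c}{\partial x_p}\frac{\partial y_a}{\partial x_\alpha}\frac{\partial y_b}{\partial x_\beta}}^{=\varepsilon_{p\alpha\beta}\det(F)}\,ds\,dt\\
&= \int_D (\det F)\,T_{ij}(x(s,t))\,\varepsilon_{p\alpha\beta}\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t}\frac{\partial x_p}{\partial y_j}\,ds\,dt.
\end{aligned}
\]
Now
\[
\frac{\partial x_p}{\partial y_j} = F^{-1}_{pj}
\]
and also
\[
\varepsilon_{p\alpha\beta}\frac{\partial x_\alpha}{\partial s}\frac{\partial x_\beta}{\partial t} = (x_s\times x_t)_p
\]
so the result just obtained is of the form
\[
\int_D (\det F)\,F^{-1}_{pj}\,T_{ij}(x(s,t))\,(x_s\times x_t)_p\,ds\,dt
= \int_D (\det F)\left(T(x(s,t))\,F^{-T}\right)_{ip}(x_s\times x_t)_p\,ds\,dt.
\]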
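This has transformed the integral over $P_t$ to one over $P_0$, the part of $\partial V_0$ which corresponds with $P_t$. Thus the last integral is of the form
\[
\int_{P_0}\det(F)\left(TF^{-T}\right)_{ip}N_p\,dA.
\]
Summing these up over the pieces of $\partial V_t$ and $\partial V_0$ yields the last integral in 17.10 equals
\[
\int_{\partial V_0}\det(F)\left(TF^{-T}\right)_{ip}N_p\,dA
\]
and so the balance of momentum in terms of the material coordinates becomes
\[
\int_{V_0}\rho_0(x)\,v_t(t,x)\,dV = \int_{V_0} b_0(t,x)\,dV + \int_{\partial V_0}\det(F)\left(TF^{-T}\right)_{ip}N_p\,dA.
\]
The divergence theorem applied to the last integral gives
\[
\int_{V_0}\rho_0(x)\,v_t(t,x)\,dV = \int_{V_0} b_0(t,x)\,dV + \int_{V_0}\frac{\partial\left(\det(F)\left(TF^{-T}\right)_{ip}\right)}{\partial x_p}\,dV.
\]
Since $V_0$ is arbitrary, a balance law for momentum in terms of the material coordinates
is obtained
\[
\rho_0(x)\,v_t(t,x) = b_0(t,x) + \frac{\partial\left(\det(F)\left(TF^{-T}\right)_{ip}\right)}{\partial x_p}. \qquad (17.13)
\]
The matrix $\det(F)\,TF^{-T}$ is called the Piola Kirchhoff stress, S. Thus the relation between the Cauchy stress, T, and S is
\[
S = \det(F)\,TF^{-T}, \qquad (17.14)
\]
perhaps not the first thing you would think of.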
The main purpose of this presentation is to show how the divergence theorem is used
in a significant way to obtain balance laws and to indicate a very interesting direction
for further study. To continue, one needs to specify T or S as an appropriate function
of things related to the motion, y. Often the thing related to the motion is something
called the strain and such relationships are known as constitutive laws.
17.4.6 Frame Indifference
The proper formulation of constitutive laws involves more physical considerations such
as frame indifference in which it is required the response of the system cannot depend
on the manner in which the Cartesian coordinate system for the spatial coordinates was
chosen.
For $Q(t)$ an orthogonal transformation and
\[
y' = q(t) + Q(t)\,y,\qquad n' = Qn,
\]
the new spatial coordinates are denoted by $y'$. Recall an orthogonal transformation is
just one which satisfies
\[
Q(t)^T Q(t) = Q(t)\,Q(t)^T = I.
\]
The stress has to do with the traction force area density produced by internal changes
in the body and has nothing to do with the way the body is observed. Therefore, it is
required that
\[
T'n' = QTn.
\]
Thus
\[
T'Qn = QTn.
\]
Since this is true for any n normal to the boundary of any piece of the material considered, it must be the case that
\[
T'Q = QT
\]
and so
\[
T' = QTQ^T.
\]
This is called frame indifference.
In terms of the Piola Kirchhoff stress,
\[
S' = \det(F')\,T'\,(F')^{-T}
\]
but
\[
F' = D_x y' = Q(t)\,D_x y = QF
\]
and so, taking $\det(Q) = 1$ so that $\det(F') = \det(F)$, frame indifference in terms of S is
\[
\begin{aligned}
S' &= \det(F)\,QTQ^T(QF)^{-T}\\
&= \det(F)\,QTQ^TQF^{-T}\\
&= QS.
\end{aligned}
\]
This principle of frame indifference is sometimes ignored and there are certainly
interesting mathematical models which have resulted from doing this, but such things
cannot be considered physically acceptable.
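The relation $S' = QS$ can be checked numerically. The following is only a sketch under stated assumptions: the rotation angle, the deformation gradient F and the symmetric stress T are made-up illustrative numbers, not data from these notes, and the helper names (`matmul`, `inverse3`, `piola_kirchhoff`) are hypothetical. It forms $S = \det(F)\,TF^{-T}$, then repeats the computation with $T' = QTQ^T$ and $F' = QF$.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def inverse3(A):
    """Inverse via the cofactor formula: A^{-1} = adj(A) / det(A)."""
    d = det3(A)
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r = [k for k in range(3) if k != i]
            c = [k for k in range(3) if k != j]
            minor = A[r[0]][c[0]] * A[r[1]][c[1]] - A[r[0]][c[1]] * A[r[1]][c[0]]
            cof[i][j] = (-1) ** (i + j) * minor
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

def piola_kirchhoff(T, F):
    """S = det(F) T F^{-T}, as in 17.14."""
    TF = matmul(T, transpose(inverse3(F)))
    d = det3(F)
    return [[d * TF[i][j] for j in range(3)] for i in range(3)]

# Illustrative data only (assumptions, not taken from the notes).
a = 0.7                                    # rotation angle about the z axis
Q = [[math.cos(a), -math.sin(a), 0.0],
     [math.sin(a),  math.cos(a), 0.0],
     [0.0, 0.0, 1.0]]
F = [[1.2, 0.1, 0.0], [0.0, 0.9, 0.3], [0.2, 0.0, 1.1]]
T = [[2.0, 0.5, 0.1], [0.5, 1.0, -0.3], [0.1, -0.3, 3.0]]  # symmetric

S = piola_kirchhoff(T, F)
T_new = matmul(matmul(Q, T), transpose(Q))   # T' = Q T Q^T
F_new = matmul(Q, F)                         # F' = Q F
S_new = piola_kirchhoff(T_new, F_new)
QS = matmul(Q, S)
max_diff = max(abs(S_new[i][j] - QS[i][j]) for i in range(3) for j in range(3))
print(max_diff)   # should be at the level of rounding error
```

The rotation used here has determinant 1, which is the case in which $\det(F') = \det(F)$.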
There are also many other physical properties which can be included and which
require a certain form for the constitutive equations. These considerations are outside
the scope of this book and require a considerable amount of linear algebra.
There are also balance laws for energy which you may study later but these are more
problematic than the balance laws for mass and momentum. However, the divergence
theorem is used in these also.
17.4.7 Bernoulli's Principle
Consider a possibly moving fluid with constant density, $\rho$, and let P denote the pressure
in this fluid. If B is a part of this fluid the force exerted on B by the rest of the fluid is
$\int_{\partial B} -Pn\,dA$ where n is the outer normal from B. Assume this is the only force which
matters so for example there is no viscosity in the fluid. Thus the Cauchy stress in
rectangular coordinates should be
\[
T = \begin{pmatrix} -P & 0 & 0\\ 0 & -P & 0\\ 0 & 0 & -P \end{pmatrix}.
\]
Then
\[
\operatorname{div} T = -\nabla P.
\]
Also suppose the only body force is from gravity, a force of the form
\[
-\rho g k
\]
and so from the balance of momentum
\[
\rho\,\dot{v} = -\rho g k - \nabla P(x). \qquad (17.15)
\]
Now in all this the coordinates are the spatial coordinates and it is assumed they are
rectangular. Thus
\[
x = (x,y,z)^T
\]
and v is the velocity while $\dot{v}$ is the total derivative of $v = (v_1, v_2, v_3)$ given by $v_t + v_i v_{,i}$.
Take the dot product of both sides of 17.15 with v. This yields
\[
\frac{\rho}{2}\frac{d}{dt}|v|^2 = -\rho g\frac{dz}{dt} - \frac{d}{dt}P(x).
\]
Therefore,
\[
\frac{d}{dt}\left(\frac{\rho|v|^2}{2} + \rho g z + P(x)\right) = 0
\]
and so there is a constant, $C'$, such that
\[
\frac{\rho|v|^2}{2} + \rho g z + P(x) = C'.
\]
For convenience define $\gamma$ to be the weight density of this fluid. Thus $\gamma = \rho g$. Divide by
$\gamma$. Then
\[
\frac{|v|^2}{2g} + z + \frac{P(x)}{\gamma} = C.
\]
This is Bernoulli's principle. Note how if you keep the height the same, then if you
raise $|v|$, it follows the pressure drops.
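Here is a small numerical illustration of the last remark, offered as a sketch only. The value of g, the water-like density of 1000 kg/m^3, and the speeds and pressure are all assumptions chosen for the example, and the function names are hypothetical. It evaluates the constant C at one state and then solves $\frac{|v|^2}{2g} + z + \frac{P}{\gamma} = C$ for the pressure at a higher speed and the same height.

```python
G = 9.81  # m/s^2, assumed value of g

def bernoulli_constant(speed, height, pressure, weight_density):
    """C = |v|^2/(2g) + z + P/gamma."""
    return speed ** 2 / (2.0 * G) + height + pressure / weight_density

def pressure_at(speed, height, constant, weight_density):
    """Solve |v|^2/(2g) + z + P/gamma = C for P."""
    return weight_density * (constant - height - speed ** 2 / (2.0 * G))

gamma = 1000.0 * G            # weight density, gamma = rho * g (rho assumed)
C = bernoulli_constant(speed=2.0, height=0.0, pressure=150000.0,
                       weight_density=gamma)
p_fast = pressure_at(speed=4.0, height=0.0, constant=C, weight_density=gamma)
print(p_fast)   # lower than 150000: the pressure drops as the speed rises
```

With these numbers the drop is $\rho(4^2 - 2^2)/2 = 6000$ in the same units as P, independent of g.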
This is often used to explain the lift of an airplane wing. The top surface is curved
which forces the air to go faster over the top of the wing causing a drop in pressure
which creates lift. It is also used to explain the concept of a venturi tube in which
the air loses pressure due to being pinched which causes it to flow faster. In many of
these applications, the assumptions used, in which $\rho$ is constant and there is no other
contribution to the traction force on $\partial B$ than pressure so in particular, there is no
viscosity, are not correct. However, it is hoped that the effects of these deviations from
the ideal situation above are small enough that the conclusions are still roughly true.
You can see how using balance of momentum can be used to consider more difficult
situations. For example, you might have a body force which is more involved than
gravity.
17.4.8
(17.16)
17.4.9 A Negative Observation
Many of the above applications of the divergence theorem are based on the assumption
that matter is continuously distributed in a way that the above arguments are correct. In
other words, a continuum. However, there is no such thing as a continuum. It has been
known for some time now that matter is composed of atoms. It is not continuously
distributed through some region of space as it is in the above. Apologists for this
contradiction with reality sometimes say to consider enough of the material in question
that it is reasonable to think of it as a continuum. This mystical reasoning is then
violated as soon as they go from the integral form of the balance laws to the differential
equations expressing the traditional formulation of these laws. See Problem 9 below, for
example. However, these laws continue to be used and seem to lead to useful physical
models which have value in predicting the behavior of physical systems. This is what
justifies their use, not any fundamental truth.
17.4.10 Electrostatics
Coulomb's law says that the electric field intensity at x of a charge q located at point,
$x_0$, is given by
\[
E = k\,\frac{q\,(x - x_0)}{|x - x_0|^3}
\]
where the electric field intensity is defined to be the force experienced by a unit positive
charge placed at the point, x. Note that this is a vector and that its direction depends
on the sign of q. It points away from $x_0$ if q is positive and points toward $x_0$ if q is
negative. The constant, k, is a physical constant like the gravitation constant. It has
been computed through careful experiments similar to those used with the calculation
of the gravitation constant.
The interesting thing about Coulomb's law is that E is the gradient of a function.
In fact,
\[
E = -qk\,\nabla\frac{1}{|x - x_0|}.
\]
The other thing which is significant about this is that in three dimensions and for
$x \neq x_0$,
\[
\nabla\cdot\left(-qk\,\nabla\frac{1}{|x - x_0|}\right) = \nabla\cdot E = 0. \qquad (17.17)
\]
This is left as an exercise for you to verify.
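These observations will be used to derive a very important formula for the integral,
\[
\int_{\partial U} E\cdot n\,dS
\]
where E is the electric field intensity due to a charge, q, located at the point, $x_0 \in U$, a
bounded open set for which the divergence theorem holds.
Let $U_\varepsilon$ denote the open set obtained by removing the open ball, $B_\varepsilon$, centered at $x_0$ which
has radius $\varepsilon$ where $\varepsilon$ is small enough that the following picture is a correct representation
of the situation.
[Figure: the set U with a small ball of radius $\varepsilon$ centered at $x_0$ removed.]
On $\partial B_\varepsilon$ the unit outer normal to $U_\varepsilon$ is $-(x - x_0)/|x - x_0|$ and $|x - x_0| = \varepsilon$. Therefore,
\[
\int_{\partial B_\varepsilon} E\cdot n\,dS = -\int_{\partial B_\varepsilon}\frac{kq}{\varepsilon^2}\,dS = -\frac{kq}{\varepsilon^2}\,4\pi\varepsilon^2 = -4\pi kq.
\]
Therefore, from the divergence theorem and observation 17.17,
\[
-4\pi kq + \int_{\partial U} E\cdot n\,dS = \int_{\partial U_\varepsilon} E\cdot n\,dS = \int_{U_\varepsilon}\nabla\cdot E\,dV = 0.
\]
It follows that
\[
4\pi kq = \int_{\partial U} E\cdot n\,dS.
\]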
If there are several charges located inside U, say $q_1, q_2, \cdots, q_n$, then letting $E_i$ denote
the electric field intensity of the ith charge and E denoting the total resulting electric
field intensity due to all these charges,
\[
\int_{\partial U} E\cdot n\,dS = \sum_{i=1}^n\int_{\partial U} E_i\cdot n\,dS = \sum_{i=1}^n 4\pi kq_i = 4\pi k\sum_{i=1}^n q_i.
\]
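The formula $\int_{\partial U} E\cdot n\,dS = 4\pi kq$ can be tested numerically. The sketch below makes several assumptions of its own: U is taken to be the cube $[-1,1]^3$, the charge and its location are made up, $k = 1$, and the surface integral is approximated by a midpoint rule on each face. None of these choices come from the notes. It also shows that a charge placed outside the cube contributes essentially zero flux.

```python
import math

def field(p, x0, q, k=1.0):
    """Coulomb field E = k q (x - x0) / |x - x0|^3 at the point p."""
    d = [p[i] - x0[i] for i in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    return [k * q * c / r ** 3 for c in d]

def flux_through_cube(x0, q, half=1.0, n=100, k=1.0):
    """Midpoint-rule approximation of the outward flux through the
    boundary of the cube [-half, half]^3."""
    h = 2.0 * half / n
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):          # the two faces normal to this axis
            for i in range(n):
                for j in range(n):
                    p = [0.0, 0.0, 0.0]
                    p[axis] = sign * half
                    p[(axis + 1) % 3] = -half + (i + 0.5) * h
                    p[(axis + 2) % 3] = -half + (j + 0.5) * h
                    E = field(p, x0, q, k)
                    # outward unit normal is sign * e_axis
                    total += sign * E[axis] * h * h
    return total

inside = flux_through_cube(x0=(0.2, -0.1, 0.3), q=2.0)
outside = flux_through_cube(x0=(3.0, 0.0, 0.0), q=2.0)
print(inside, 4.0 * math.pi * 2.0)   # the two numbers should be close
print(outside)                        # should be close to zero
```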
17.5 Exercises
1. To prove the divergence theorem, it was shown first that the spatial partial
derivative in the volume integral could be exchanged for multiplication by an
appropriate component of the exterior normal. This problem starts with the
divergence theorem and goes the other direction. Assuming the divergence theorem
holds for a region, V, show that $\int_{\partial V} nu\,dA = \int_V \nabla u\,dV$. Note this implies
$\int_V \frac{\partial u}{\partial x}\,dV = \int_{\partial V} n_1 u\,dA$.
2. Let V be such that the divergence theorem holds. Show that
$\int_V \left(u\nabla^2 v + \nabla u\cdot\nabla v\right)dV = \int_{\partial V} u\frac{\partial v}{\partial n}\,dA$
where n is the exterior normal and $\frac{\partial v}{\partial n}$ denotes the directional derivative
of v in the direction n.
3. Let V be such that the divergence theorem holds. Show that
$\int_V \left(v\nabla^2 u - u\nabla^2 v\right)dV = \int_{\partial V}\left(v\frac{\partial u}{\partial n} - u\frac{\partial v}{\partial n}\right)dA$
where n is the exterior normal and $\frac{\partial u}{\partial n}$ is defined in Problem 2.
4. Let V be a ball and suppose $\nabla^2 u = f$ in V while $u = g$ on $\partial V$. Show there is at
most one solution to this boundary value problem which is $C^2$ in V and continuous
on V with its boundary. Hint: You might consider $w = u - v$ where u and v are
solutions to the problem. Then use the result of Problem 2 and the identity
\[
w\nabla^2 w = \nabla\cdot(w\nabla w) - \nabla w\cdot\nabla w
\]
to conclude $\nabla w = 0$. Then show this implies w must be a constant by considering
$h(t) = w(tx + (1-t)y)$ and showing h is a constant. Alternatively, you might
consider the maximum principle.
5. Show that $\int_{\partial V}\nabla\times v\cdot n\,dA = 0$ where V is a region for which the divergence
theorem holds and v is a $C^2$ vector field.
6. Let $F(x,y,z) = (x,y,z)$ be a vector field in $\mathbb{R}^3$ and let V be a three dimensional
shape and let $n = (n_1, n_2, n_3)$. Show $\int_{\partial V}(xn_1 + yn_2 + zn_3)\,dA = 3\times$ volume of
V.
7. Does the divergence theorem hold for higher dimensions? If so, explain why it
does. How about two dimensions?
8. Let $F = xi + yj + zk$ and let V denote the tetrahedron formed by the planes,
$x = 0$, $y = 0$, $z = 0$, and $\frac{1}{3}x + \frac{1}{3}y + \frac{1}{5}z = 1$. Verify the divergence theorem for this
example.
9. Suppose $f : U \to \mathbb{R}$ is continuous where U is some open set and for all $B \subseteq U$
where B is a ball, $\int_B f(x)\,dV = 0$. Show this implies $f(x) = 0$ for all $x \in U$.
10. Let U denote the box centered at (0, 0, 0) with sides parallel to the coordinate
planes which has width 4, length 2 and height 3. Find the flux integral $\int_{\partial U} F\cdot n\,dS$
where $F = (x + 3, 2y, 3z)$. Hint: If you like, you might want to use the divergence
theorem.
11. Verify 17.16 from 17.13 and the assumption that $S = kF$.
12. Fick's law for diffusion states the flux of a diffusing species, J, is proportional to
the gradient of the concentration, c. Write this law getting the sign right for the
constant of proportionality and derive an equation similar to the heat equation
for the concentration, c. Typically, c is the concentration of some sort of pollutant
or a chemical.
13. Show that if $u_k$, $k = 1, 2, \cdots, n$ each satisfies 17.7 then for any choice of constants,
$c_1, \cdots, c_n$, so does
\[
\sum_{k=1}^n c_k u_k.
\]
14. Suppose $k(x) = k$, a constant and $f = 0$. Then in one dimension, the heat
equation is of the form $u_t = \alpha u_{xx}$. Show $u(x,t) = e^{-\alpha n^2 t}\sin(nx)$ satisfies the
heat equation.
15. In a linear, viscous, incompressible fluid, the Cauchy stress is of the form
Also, p denotes the pressure. Show, using the balance of mass equation that
incompressible implies div v = 0. Next show the balance of momentum equation
requires
v
v
v =
+
vi v = b p.
v
2
t
yi
2
This is the famous Navier Stokes equation for incompressible viscous linear fluids.
There are still open questions related to this equation, one of which is worth
$1,000,000 at this time.
Outcomes
18.1 Green's Theorem
Green's theorem is an important theorem which relates line integrals to integrals over
a surface in the plane. It can be used to establish the seemingly more general Stokes'
theorem but is interesting for its own sake. Historically, theorems like it were important
in the development of complex analysis. I will first establish Green's theorem for regions
of a particular sort and then show that the theorem holds for many other regions also.
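Suppose a region is of the form indicated in the following picture in which
\[
\begin{aligned}
U &= \{(x,y) : x\in(a,b) \text{ and } y\in(b(x),t(x))\}\\
&= \{(x,y) : y\in(c,d) \text{ and } x\in(l(y),r(y))\}.
\end{aligned}
\]
[Figure: the region U, bounded on the left by x = l(y), on the right by x = r(y), above by y = t(x), and below by y = b(x).]
I will refer to such a region as being convex in both the x and y directions.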
Lemma 18.1.1 Let $F(x,y) \equiv (P(x,y), Q(x,y))$ be a $C^1$ vector field defined near U
where U is a region of the sort indicated in the above picture which is convex in both the
x and y directions. Suppose also that the functions, r, l, t, and b in the above picture are
all $C^1$ functions and denote by $\partial U$ the boundary of U oriented such that the direction
of motion is counter clockwise. (As you walk around U on $\partial U$, the points of U are on
your left.) Then
\[
\int_{\partial U} P\,dx + Q\,dy \equiv \int_{\partial U} F\cdot dR = \int_U\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA. \qquad (18.1)
\]
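Proof: First consider the right side of 18.1.
\[
\begin{aligned}
\int_U\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA
&= \int_c^d\int_{l(y)}^{r(y)}\frac{\partial Q}{\partial x}\,dx\,dy - \int_a^b\int_{b(x)}^{t(x)}\frac{\partial P}{\partial y}\,dy\,dx\\
&= \int_c^d\left(Q(r(y),y) - Q(l(y),y)\right)dy + \int_a^b\left(P(x,b(x)) - P(x,t(x))\right)dx. \qquad (18.2)
\end{aligned}
\]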
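Now consider the left side of 18.1. Denote by V the vertical parts of $\partial U$ and by H the
horizontal parts. Split $F = (0,Q) + (P,0)$. For the $(0,Q)$ part, describe $\partial U$ by the curves
$x = r(y)$ and $x = l(y)$ together with the horizontal parts H, on which $(0,Q)\cdot dR = 0$. For
the $(P,0)$ part, describe $\partial U$ by the curves $y = b(x)$ and $y = t(x)$ together with the vertical
parts V, on which terms such as $\int_V (P(s,b(s)),0)\cdot(0,\pm 1)\,ds$ vanish. Therefore,
\[
\int_{\partial U} F\cdot dR = \int_c^d Q(r(s),s)\,ds - \int_c^d Q(l(s),s)\,ds + \int_a^b P(s,b(s))\,ds - \int_a^b P(s,t(s))\,ds
\]
which coincides with 18.2. This proves the lemma.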
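\[
\int_U\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA
= \sum_{k=1}^m\int_{U_k}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA
= \sum_{k=1}^m\int_{\partial U_k} F\cdot dR = \int_{\partial U} F\cdot dR
\]
[Figure: a region divided into two pieces, $U_1$ and $U_2$, with oriented boundaries.]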
Similarly, if $U \subseteq V$ and if also $\partial U \subseteq V$ and both U and V are open sets for which
18.1 holds, then the open set, $V\setminus(U\cup\partial U)$ consisting of what is left in V after deleting
U along with its boundary also satisfies 18.1. Roughly speaking, you can drill holes in
a region for which 18.1 holds and get another region for which this continues to hold
provided 18.1 holds for the holes. To see why this is so, consider the following picture
which typifies the situation just described.
[Figure: a region V with a hole U removed, showing the orientations of $\partial V$ and $\partial U$.]
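Then
\[
\begin{aligned}
\int_{\partial V} F\cdot dR &= \int_V\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA\\
&= \int_U\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA + \int_{V\setminus U}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA\\
&= \int_{\partial U} F\cdot dR + \int_{V\setminus U}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA
\end{aligned}
\]
and so
\[
\int_{V\setminus U}\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA = \int_{\partial V} F\cdot dR - \int_{\partial U} F\cdot dR
\]
which equals
\[
\int_{\partial(V\setminus U)} F\cdot dR
\]
where $\partial V$ is oriented as shown in the picture. (If you walk around the region, $V\setminus U$
with the area on the left, you get the indicated orientation for this curve.)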
You can see that 18.1 is valid quite generally. This verifies the following theorem.
Theorem 18.1.3 (Green's Theorem) Let U be an open set in the plane and let $\partial U$ be
piecewise smooth and let $F(x,y) = (P(x,y), Q(x,y))$ be a $C^1$ vector field defined near
U. Then it is often$^1$ the case that
\[
\int_{\partial U} F\cdot dR = \int_U\left(\frac{\partial Q}{\partial x}(x,y) - \frac{\partial P}{\partial y}(x,y)\right)dA.
\]
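A quick numerical check of Green's theorem may help fix ideas. This is a sketch under assumptions of my own choosing: the region is the rectangle $[0,2]\times[0,1]$, the field is $F = (x\sin y,\ y\sin x)$, and both integrals are approximated with a midpoint rule. The function names are hypothetical. For this field $\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = y\cos x - x\cos y$.

```python
import math

def P(x, y):
    return x * math.sin(y)

def Q(x, y):
    return y * math.sin(x)

def area_integral(a, b, c, d, n=200):
    """Midpoint rule for the integral of Q_x - P_y over [a,b] x [c,d]."""
    hx, hy = (b - a) / n, (d - c) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(n):
            y = c + (j + 0.5) * hy
            total += (y * math.cos(x) - x * math.cos(y)) * hx * hy
    return total

def boundary_integral(a, b, c, d, n=2000):
    """Line integral of P dx + Q dy around the rectangle, counter clockwise."""
    corners = [(a, c), (b, c), (b, d), (a, d), (a, c)]
    total = 0.0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:]):
        dx, dy = (x1 - x0) / n, (y1 - y0) / n
        for k in range(n):
            t = (k + 0.5) / n
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            total += P(x, y) * dx + Q(x, y) * dy
    return total

lhs = boundary_integral(0.0, 2.0, 0.0, 1.0)
rhs = area_integral(0.0, 2.0, 0.0, 1.0)
exact = 0.5 * math.sin(2.0) - 2.0 * math.sin(1.0)   # computed by hand
print(lhs, rhs, exact)   # all three should agree to several digits
```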
Here is an alternate proof of Green's theorem from the divergence theorem.
Theorem 18.1.4 (Green's Theorem) Let U be an open set in the plane and let $\partial U$ be
piecewise smooth and let $F(x,y) = (P(x,y), Q(x,y))$ be a $C^1$ vector field defined near
U. Then it is often the case that
\[
\int_{\partial U} F\cdot dR = \int_U\left(\frac{\partial Q}{\partial x}(x,y) - \frac{\partial P}{\partial y}(x,y)\right)dA.
\]
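Proof: Suppose the divergence theorem holds for U. Consider the following picture.
[Figure: the region U with the tangent vector $(x', y')$ and the normal vector $(y', -x')$ drawn at a point of the boundary.]
Since it is assumed that motion around U is counter clockwise, the tangent vector,
$(x', y')$ is as shown. Now the unit exterior normal is either
\[
\frac{1}{\sqrt{(x')^2 + (y')^2}}\,(-y', x')
\]
or
\[
\frac{1}{\sqrt{(x')^2 + (y')^2}}\,(y', -x').
\]
Again, the counter clockwise motion shows the correct unit exterior normal is the second
of the above. To see this note that since the area should be on the left as you walk
around the edge, you need to have the unit normal point in the direction of $(x', y', 0)\times k$
which equals $(y', -x', 0)$. Now let $F(x,y) = (Q(x,y), -P(x,y))$. Also note the area
(length) element on the bounding curve $\partial U$ is $\sqrt{(x')^2 + (y')^2}\,dt$. Suppose the boundary of U
consists of m smooth curves, the ith of which is parameterized by $(x_i, y_i)$ with the parameter
$t \in [a_i, b_i]$. Then by the divergence theorem,
\[
\begin{aligned}
\int_U\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)dA &= \int_U \operatorname{div}(F)\,dA = \int_{\partial U} F\cdot n\,dS\\
&= \sum_{i=1}^m\int_{a_i}^{b_i}\left(Q(x_i(t),y_i(t)), -P(x_i(t),y_i(t))\right)\cdot\frac{(y_i', -x_i')}{\sqrt{(x_i')^2 + (y_i')^2}}\overbrace{\sqrt{(x_i')^2 + (y_i')^2}\,dt}^{dS}\\
&= \sum_{i=1}^m\int_{a_i}^{b_i}\left(Q(x_i(t),y_i(t)), -P(x_i(t),y_i(t))\right)\cdot(y_i', -x_i')\,dt\\
&= \sum_{i=1}^m\int_{a_i}^{b_i} Q(x_i(t),y_i(t))\,y_i'(t) + P(x_i(t),y_i(t))\,x_i'(t)\,dt \equiv \int_{\partial U} P\,dx + Q\,dy.
\end{aligned}
\]
This proves Green's theorem from the divergence theorem.
1 For a general version see the advanced calculus book by Apostol. The general versions involve the
concept of a rectifiable (finite length) Jordan curve.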
where F (x, y) =
1
2
1
2
1
2
=
Example 18.1.7 Find
R
U
(ab) dt = ab.
0
(x, y) : x2 + 3y 2 9
81
81
=
cos 4 6 sin 3
cos 2.
4
4
This is much easier than computing the line integral because you don't have to break
the boundary in pieces and consider each separately.
Example 18.1.9 Find $\int_{\partial U} F\cdot dR$ where U is the set,
\[
\{(x,y) : 2\le x\le 4,\ x\le y\le 3\}
\]
and $F(x,y) = (x\sin y, y\sin x)$.
From Green's theorem this line integral equals
\[
\int_2^4\int_x^3 (y\cos x - x\cos y)\,dy\,dx = -\frac{3}{2}\sin 4 - 6\sin 3 - 8\cos 4 - \frac{9}{2}\sin 2 + 4\cos 2.
\]
18.2
Stokes' theorem is a generalization of Green's theorem which relates the integral over
a surface to the integral around the boundary of the surface. These terms are a little
different from what occurs in $\mathbb{R}^2$. To describe this, consider a sock. The surface is the
sock and its boundary will be the edge of the opening of the sock in which you place
your foot. Another way to think of this is to imagine a region in $\mathbb{R}^2$ of the sort discussed
above for Green's theorem. Suppose it is on a sheet of rubber and the sheet of rubber
is stretched in three dimensions. The boundary of the resulting surface is the result of
the stretching applied to the boundary of the original region in $\mathbb{R}^2$. Here is a picture
describing the situation.
[Figure: a surface S and its oriented boundary $\partial S$.]
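Recall the following definition of the curl of a vector field.
Definition 18.2.1 Let
\[
F(x,y,z) = (F_1(x,y,z), F_2(x,y,z), F_3(x,y,z))
\]
be a $C^1$ vector field defined on an open set, V in $\mathbb{R}^3$. Then
\[
\nabla\times F \equiv \begin{vmatrix} i & j & k\\ \frac{\partial}{\partial x} & \frac{\partial}{\partial y} & \frac{\partial}{\partial z}\\ F_1 & F_2 & F_3 \end{vmatrix}
\equiv \left(\frac{\partial F_3}{\partial y} - \frac{\partial F_2}{\partial z}\right)i + \left(\frac{\partial F_1}{\partial z} - \frac{\partial F_3}{\partial x}\right)j + \left(\frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y}\right)k.
\]
This is also called curl (F) and written as indicated, $\nabla\times F$.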
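The component formula for the curl is easy to turn into a numerical routine. The following sketch approximates each partial derivative by a central difference; the step size h and the two test fields are assumptions chosen for illustration, and the function name `curl` is hypothetical.

```python
def curl(F, p, h=1e-5):
    """Approximate curl F at the point p = (x, y, z) using central differences.
    F is a function of (x, y, z) returning a 3-tuple (F1, F2, F3)."""
    def partial(component, var):
        plus, minus = list(p), list(p)
        plus[var] += h
        minus[var] -= h
        return (F(*plus)[component] - F(*minus)[component]) / (2.0 * h)
    return (partial(2, 1) - partial(1, 2),    # dF3/dy - dF2/dz
            partial(0, 2) - partial(2, 0),    # dF1/dz - dF3/dx
            partial(1, 0) - partial(0, 1))    # dF2/dx - dF1/dy

def rotation_field(x, y, z):
    return (-y, x, 0.0)

def sample_field(x, y, z):
    return (x * x * y, z + y, x * x)

print(curl(rotation_field, (0.3, -1.2, 2.0)))   # approximately (0, 0, 2)
print(curl(sample_field, (1.0, 2.0, 3.0)))      # approximately (-1, -2, -1)
```

For the second field the formula gives $\nabla\times F = (-1, -2x, -x^2)$ by hand, which is $(-1, -2, -1)$ at the point used.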
The following lemma gives the fundamental identity which will be used in the proof
of Stokes' theorem.
Lemma 18.2.2 Let $R : U \to V \subseteq \mathbb{R}^3$ where U is an open subset of $\mathbb{R}^2$ and V is an
open subset of $\mathbb{R}^3$. Suppose R is $C^2$ and let F be a $C^1$ vector field defined in V. Then
\[
(R_u\times R_v)\cdot(\nabla\times F)(R(u,v)) = \left((F\circ R)_u\cdot R_v - (F\circ R)_v\cdot R_u\right)(u,v). \qquad (18.3)
\]
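Proof: Start with the left side and let $x_i = R_i(u,v)$ for short.
\[
\begin{aligned}
(R_u\times R_v)\cdot(\nabla\times F)(R(u,v)) &= \varepsilon_{ijk}\,x_{ju}x_{kv}\,\varepsilon_{irs}\frac{\partial F_s}{\partial x_r}\\
&= (\delta_{jr}\delta_{ks} - \delta_{js}\delta_{kr})\,x_{ju}x_{kv}\frac{\partial F_s}{\partial x_r}\\
&= x_{ju}x_{kv}\frac{\partial F_k}{\partial x_j} - x_{ju}x_{kv}\frac{\partial F_j}{\partial x_k}\\
&= R_v\cdot\frac{\partial(F\circ R)}{\partial u} - R_u\cdot\frac{\partial(F\circ R)}{\partial v}
\end{aligned}
\]
which proves 18.3.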
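Definition 18.2.3 A vector valued function, R, defined on $\overline{U}\subseteq\mathbb{R}^m$ with values in $\mathbb{R}^n$ is said to be in
$C^k\left(\overline{U},\mathbb{R}^n\right)$ if it is the restriction to $\overline{U}$ of a vector valued function which is defined on
$\mathbb{R}^m$ and is $C^k$. That is, this function has continuous partial derivatives up to order k.
Theorem 18.2.4 (Stokes' Theorem) Let U be any region in $\mathbb{R}^2$ for which the conclusion
of Green's theorem holds and let $R\in C^2\left(\overline{U},\mathbb{R}^3\right)$ be a one to one function satisfying
$|(R_u\times R_v)(u,v)|\neq 0$ for all $(u,v)\in U$ and let S denote the surface,
\[
S \equiv \{R(u,v) : (u,v)\in U\},\qquad \partial S \equiv \{R(u,v) : (u,v)\in\partial U\}
\]
where the orientation on $\partial S$ is consistent with the counter clockwise orientation on $\partial U$. Then for F a $C^1$ vector field defined near S,
\[
\int_{\partial S} F\cdot dR = \int_S \operatorname{curl}(F)\cdot n\,dS
\]
where n is the normal to S defined by
\[
n \equiv \frac{R_u\times R_v}{|R_u\times R_v|}.
\]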
\[
= \int_C\left((F\circ R)\cdot R_u,\ (F\circ R)\cdot R_v\right)\cdot dr.
\]
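By the assumption that the conclusion of Green's theorem holds for U, this equals
\[
\begin{aligned}
&\int_U\left[\left((F\circ R)\cdot R_v\right)_u - \left((F\circ R)\cdot R_u\right)_v\right]dA\\
&= \int_U\left[(F\circ R)_u\cdot R_v + (F\circ R)\cdot R_{vu} - (F\circ R)\cdot R_{uv} - (F\circ R)_v\cdot R_u\right]dA\\
&= \int_U\left[(F\circ R)_u\cdot R_v - (F\circ R)_v\cdot R_u\right]dA
\end{aligned}
\]
the last step holding by equality of mixed partial derivatives, a result of the assumption
that R is $C^2$. Now by Lemma 18.2.2, this equals
\[
\begin{aligned}
&\int_U (R_u\times R_v)\cdot(\nabla\times F)\,dA\\
&= \int_U \nabla\times F\cdot(R_u\times R_v)\,dA\\
&= \int_S \nabla\times F\cdot n\,dS
\end{aligned}
\]
because $dS = |(R_u\times R_v)|\,dA$ and $n = \frac{(R_u\times R_v)}{|(R_u\times R_v)|}$. Thus
\[
(R_u\times R_v)\,dA = \frac{(R_u\times R_v)}{|(R_u\times R_v)|}\,|(R_u\times R_v)|\,dA = n\,dS.
\]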
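The chain of equalities above suggests a direct numerical test of Stokes' theorem. The sketch below is an illustration under assumptions that are not from the notes: the surface is $R(u,v) = (u, v, 4 - u^2 - v^2)$ over the disk $u^2 + v^2 \le 4$, so $R_u\times R_v = (2u, 2v, 1)$, and the field is $F = (z - y,\ x,\ xz)$, whose curl, computed by hand, is $(0,\ 1 - z,\ 2)$. Both sides should come out close to $8\pi$.

```python
import math

def F(x, y, z):
    return (z - y, x, x * z)

def curl_F(x, y, z):
    # Curl of the field above, computed by hand.
    return (0.0, 1.0 - z, 2.0)

def surface_side(n_r=200, n_t=400):
    """Integral over U of curl F(R(u,v)) . (R_u x R_v) dA, in polar coordinates."""
    dr = 2.0 / n_r
    dt = 2.0 * math.pi / n_t
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_t):
            t = (j + 0.5) * dt
            u, v = r * math.cos(t), r * math.sin(t)
            z = 4.0 - u * u - v * v
            c = curl_F(u, v, z)
            normal = (2.0 * u, 2.0 * v, 1.0)          # R_u x R_v
            total += sum(a * b for a, b in zip(c, normal)) * r * dr * dt
    return total

def boundary_side(n=4000):
    """Line integral of F around the boundary circle of radius 2 in the plane
    z = 0, traversed counter clockwise when viewed from above."""
    dt = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        x, y, z = 2.0 * math.cos(t), 2.0 * math.sin(t), 0.0
        dx, dy, dz = -2.0 * math.sin(t), 2.0 * math.cos(t), 0.0
        f = F(x, y, z)
        total += (f[0] * dx + f[1] * dy + f[2] * dz) * dt
    return total

surface_value = surface_side()
boundary_value = boundary_side()
print(surface_value, boundary_value, 8.0 * math.pi)
```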
18.2.1
the point p being a corner of the parallelogram S. Then orient $\partial S$ consistent with
the counter clockwise orientation on $\partial Q$. Thus, following this orientation on S you go
from p to p + a to p + a + b to p + b to p. Then Stokes' theorem implies that with
this orientation on $\partial S$, $\int_{\partial S} F\cdot dR = \int_S \nabla\times F\cdot n\,ds$ where $n = R_u\times R_v/|R_u\times R_v| = a\times b/|a\times b|$. Now recall a, b, $a\times b$ forms a right hand system.
[Figure: the parallelogram S with corners p, p + a, p + a + b, and p + b, and sides a and b.]
Thus, if you were walking around $\partial S$ in the direction of the orientation with your
left hand over the surface S, the normal vector $a\times b$ would be pointing in the direction
of your head.
More generally, if S is a surface which is not necessarily a parallelogram but is instead
as described in Theorem 18.2.4, you could consider a small rectangle Q contained in U
and orient the boundary of R (Q) as described in that theorem. Then if the rectangle is
small enough, as you walk around $\partial R(Q)$ in the direction of the described orientation,
your head would point roughly in the direction of $R_u\times R_v$. This is because for small
enough Q, the normal to the tangent parallelogram would point in roughly the same
direction as $R_u\times R_v$ at each point of R (Q) and your head would also point roughly
in the same direction if you were on R (Q) or the tangent parallelogram. You can
imagine essentially filling U with non overlapping rectangles, $Q_i$. Then orienting $\partial R(Q_i)$
consistent with the counter clockwise orientation on $Q_i$ and adding the resulting line
integrals, the line integrals over the common sides cancel and the result is essentially
the line integral over $\partial S$. Thus there is a simple relation between the field of normal
vectors on S and the orientation of $\partial S$. It is simply this. If you walk along $\partial S$ in the
direction mandated by the orientation, with your left hand over the surface, the nearby
normal vectors in Stokes' theorem will point roughly in the direction of your head.
This also illustrates that you can define an orientation for $\partial S$ by specifying a field
of normal vectors for the surface which varies continuously over the surface, and require
that the motion over the boundary of the surface is such that your head points roughly
in the direction of nearby normal vectors and your left hand is over the surface. The
existence of such a continuous field of normal vectors is what constitutes an orientable
surface.
18.2.2
It turns out there are more general formulations of Stokes' theorem than what is presented above. However, it is always necessary for the surface, S to be orientable. This
means it is possible to obtain a vector field for a unit normal to the surface which is a
continuous function of position on S.
An example of a surface which is not orientable is the famous Möbius band, obtained
by taking a long rectangular piece of paper and gluing the ends together after putting
a twist in it. Here is a picture of one.
There is something quite interesting about this Möbius band and this is that it
can be written parametrically with a simple parameter domain. The picture above is a
Maple graph of the parametrically defined surface
\[
R(\theta,v) \equiv \begin{cases} x = 4\cos\theta + v\cos\frac{\theta}{2}\\ y = 4\sin\theta + v\cos\frac{\theta}{2}\\ z = v\sin\frac{\theta}{2} \end{cases},\qquad \theta\in[0,2\pi],\ v\in[-1,1].
\]
An obvious question is why the normal vector, $R_{,\theta}\times R_{,v}/|R_{,\theta}\times R_{,v}|$ is not a continuous
function of position on S. You can see easily that it is a continuous function of both $\theta$
and v. However, the map, R is not one to one. In fact, $R(0,0) = R(2\pi,0)$. Therefore,
near this point on S, there are two different values for the above normal vector. In fact,
a tedious computation will show this normal vector is
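\[
\frac{\left(4\sin\frac{\theta}{2}\cos\theta - \frac{v}{2},\ \ 4\sin\frac{\theta}{2}\sin\theta + \frac{v}{2},\ \ -8\cos^2\frac{\theta}{2}\sin\frac{\theta}{2} - 8\cos^3\frac{\theta}{2} + 4\cos\frac{\theta}{2}\right)}{D}
\]
where
\[
D = \left(16\sin^2\frac{\theta}{2} + \frac{v^2}{2} + 4\sin\frac{\theta}{2}\,v\,(\sin\theta - \cos\theta) + 4^3\cos^2\frac{\theta}{2}\left(\cos\frac{\theta}{2}\sin\frac{\theta}{2} + \cos^2\frac{\theta}{2} - \frac{1}{2}\right)^2\right)^{1/2}
\]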
and you can verify that the denominator will not vanish. Letting $v = 0$ and $\theta = 0$ and
$2\pi$ yields the two vectors, $(0, 0, -1)$, $(0, 0, 1)$ so there is a discontinuity. This is why I
was careful to say in the statement of Stokes' theorem given above that R is one to one.
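The jump in the normal vector can be seen numerically. The sketch below assumes the parametrization reads $R(\theta,v) = \left(4\cos\theta + v\cos\frac{\theta}{2},\ 4\sin\theta + v\cos\frac{\theta}{2},\ v\sin\frac{\theta}{2}\right)$, differentiates it by hand, and evaluates the unit normal $R_{,\theta}\times R_{,v}/|R_{,\theta}\times R_{,v}|$ at $(\theta,v) = (0,0)$ and $(2\pi,0)$, which are the same point of the surface. The function names are hypothetical.

```python
import math

def r_theta(theta, v):
    """Partial derivative of R with respect to theta (computed by hand)."""
    half = theta / 2.0
    return (-4.0 * math.sin(theta) - 0.5 * v * math.sin(half),
            4.0 * math.cos(theta) - 0.5 * v * math.sin(half),
            0.5 * v * math.cos(half))

def r_v(theta, v):
    """Partial derivative of R with respect to v (computed by hand)."""
    half = theta / 2.0
    return (math.cos(half), math.cos(half), math.sin(half))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(theta, v):
    n = cross(r_theta(theta, v), r_v(theta, v))
    length = math.sqrt(sum(c * c for c in n))
    return tuple(c / length for c in n)

n_start = unit_normal(0.0, 0.0)
n_end = unit_normal(2.0 * math.pi, 0.0)
print(n_start)   # approximately (0, 0, -1)
print(n_end)     # approximately (0, 0, 1)
```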
The Möbius band has some usefulness. In old machine shops the equipment was
run by a belt which was given a twist to spread the surface wear on the belt over twice
the area.
The above explanation shows that $R_{,\theta}\times R_{,v}/|R_{,\theta}\times R_{,v}|$ fails to deliver an orientation for the Möbius band. However, this does not answer the question whether there
is some orientation for it other than this one. In fact there is none. You can see this
by looking at the first of the two pictures below or by making one and tracing it with a
pencil. There is only one side to the Möbius band. An oriented surface must have two
sides, one side identified by the given unit normal which varies continuously over the
surface and the other side identified by the negative of this normal. The second picture
below was taken by Ouyang when he was at meetings in Paris and saw it at a museum.
18.2.3
Thus the line integral is path independent in this case. This function, $\phi$, is called a
scalar potential for F.
Proof: To save space and fussing over things which are unimportant, denote by
$p(x_0, x)$ a polygonal curve from $x_0$ to x. Thus the orientation is such that it goes from
$x_0$ to x. The curve $p(x, x_0)$ denotes the same set of points but in the opposite order.
Suppose first F is conservative. Fix $x_0 \in U$ and let
\[
\phi(x) \equiv \int_{p(x_0,x)} F\cdot dR.
\]
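This is well defined because if $q(x_0, x)$ is another polygonal curve joining $x_0$ to x, then
the curve obtained by following $p(x_0, x)$ from $x_0$ to x and then from x to $x_0$ along
$q(x, x_0)$ is a closed piecewise smooth curve and so by assumption, the line integral
along this closed curve equals 0. However, this integral is just
\[
\int_{p(x_0,x)} F\cdot dR + \int_{q(x,x_0)} F\cdot dR = \int_{p(x_0,x)} F\cdot dR - \int_{q(x_0,x)} F\cdot dR
\]
which shows
\[
\int_{p(x_0,x)} F\cdot dR = \int_{q(x_0,x)} F\cdot dR
\]
and that $\phi$ is well defined. For small t,
\[
\frac{\phi(x + te_i) - \phi(x)}{t} = \frac{\int_{p(x_0,x)} F\cdot dR + \int_{p(x,x+te_i)} F\cdot dR - \int_{p(x_0,x)} F\cdot dR}{t} = \frac{\int_{p(x,x+te_i)} F\cdot dR}{t}.
\]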
Since U is open, for small t, the ball of radius $|t|$ centered at x is contained in U.
Therefore, the line segment from x to $x + te_i$ is also contained in U and so one can take
$p(x, x + te_i)(s) = x + s(te_i)$ for $s \in [0,1]$. Therefore, the above difference quotient
reduces to
\[
\frac{1}{t}\int_0^1 F(x + s(te_i))\cdot te_i\,ds = \int_0^1 F_i(x + s(te_i))\,ds = F_i(x + s_t(te_i))
\]
by the mean value theorem for integrals. Here $s_t$ is some number between 0 and 1. By
continuity of F, this converges to $F_i(x)$ as $t \to 0$. Therefore, $\nabla\phi = F$ as claimed.
Conversely, if $\nabla\phi = F$, then if $R : [a,b] \to \mathbb{R}^p$ is any $C^1$ curve joining x to y,
\[
\begin{aligned}
\int_a^b F(R(t))\cdot R'(t)\,dt &= \int_a^b \nabla\phi(R(t))\cdot R'(t)\,dt\\
&= \int_a^b \frac{d}{dt}\left(\phi(R(t))\right)dt\\
&= \phi(R(b)) - \phi(R(a))\\
&= \phi(y) - \phi(x)
\end{aligned}
\]
and this verifies 18.4 in the case where the curve joining the two points is smooth. The
general case follows immediately from this by using this result on each of the pieces of
the piecewise smooth curve. For example if the curve goes from x to p and then from p
to y, the above would imply the integral over the curve from x to p is $\phi(p) - \phi(x)$ while
from p to y the integral would yield $\phi(y) - \phi(p)$. Adding these gives $\phi(y) - \phi(x)$.
The formula 18.4 implies the line integral over any closed curve equals zero because the
starting and ending points of such a curve are the same. This proves the theorem.
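Path independence is easy to observe numerically. In the sketch below the potential $\phi(x,y,z) = \sin x + y\cos(xz)$ is an illustrative choice of mine, and F is its gradient, computed by hand. The two paths, the endpoints, and the midpoint-rule integrator are likewise assumptions made for the example. Both line integrals should agree with $\phi(y) - \phi(x)$.

```python
import math

def phi(x, y, z):
    # Illustrative potential (an assumption for this sketch).
    return math.sin(x) + y * math.cos(x * z)

def F(x, y, z):
    # Gradient of phi, computed by hand.
    return (math.cos(x) - y * z * math.sin(x * z),
            math.cos(x * z),
            -y * x * math.sin(x * z))

A = (math.pi, 1.0, 1.0)            # starting point
B = (math.pi / 2.0, 3.0, 2.0)      # ending point
W = (1.0, -2.0, 0.5)               # direction used to bend the second path

def straight(t):
    return tuple(a + t * (b - a) for a, b in zip(A, B))

def d_straight(t):
    return tuple(b - a for a, b in zip(A, B))

def bent(t):
    s = math.sin(math.pi * t)      # vanishes at t = 0 and t = 1
    return tuple(p + s * w for p, w in zip(straight(t), W))

def d_bent(t):
    c = math.pi * math.cos(math.pi * t)
    return tuple(d + c * w for d, w in zip(d_straight(t), W))

def line_integral(path, dpath, n=20000):
    """Midpoint rule for the integral of F(R(t)) . R'(t) over [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        f = F(*path(t))
        total += sum(fi * di for fi, di in zip(f, dpath(t))) * h
    return total

along_straight = line_integral(straight, d_straight)
along_bent = line_integral(bent, d_bent)
potential_difference = phi(*B) - phi(*A)
print(along_straight, along_bent, potential_difference)
```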
Example 18.2.8 Let $F(x,y,z) = (\cos x - yz\sin(xz), \cos(xz), -yx\sin(xz))$. Let C
be a piecewise smooth curve which goes from $(\pi, 1, 1)$ to $\left(\frac{\pi}{2}, 3, 2\right)$. Find $\int_C F\cdot dR$.
The specifics of the curve are not given so the problem is nonsense unless the vector
field is conservative. Therefore, it is reasonable to look for the function, $\phi$, satisfying
$\nabla\phi = F$. Such a function satisfies
\[
\phi_x = \cos x - y(\sin xz)z
\]
[Figure: a surface with boundary curve C.]
This is like a sock. The surface is the sock and the curve, C goes around the opening
of the sock.
As an application of Stokes' theorem, here is a useful theorem which gives a way to
check whether a vector field is conservative.
Theorem 18.2.11 For a three dimensional simply connected open set, V and F a $C^1$
vector field defined in V, F is conservative if $\nabla\times F = 0$ in V.
Thus F is conservative.
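Example 18.2.12 Determine whether the vector field,
\[
\left(4x^3 + 2\left(\cos\left(x^2 + z^2\right)\right)x,\ 1,\ 2\left(\cos\left(x^2 + z^2\right)\right)z\right)
\]
is conservative.
Since this vector field is defined on all of $\mathbb{R}^3$, it only remains to take its curl and see
if it is the zero vector.
\[
\begin{vmatrix} i & j & k\\ \partial_x & \partial_y & \partial_z\\ 4x^3 + 2\left(\cos\left(x^2 + z^2\right)\right)x & 1 & 2\left(\cos\left(x^2 + z^2\right)\right)z \end{vmatrix}.
\]
This is obviously equal to zero. Therefore, the given vector field is conservative. Can
you find a potential function for it? Let $\phi$ be the potential function. Matching
$\phi_y = 1$ and $\phi_z = 2\left(\cos\left(x^2 + z^2\right)\right)z$ shows $\phi(x,y,z) = y + g(x) + \sin\left(x^2 + z^2\right)$
for some function g. Taking the derivative with respect to x, you get
$4x^3 + 2\left(\cos\left(x^2 + z^2\right)\right)x = g'(x) + 2x\cos\left(x^2 + z^2\right)$ and so it suffices to take $g(x) = x^4$. Hence $\phi(x,y,z) = y + x^4 + \sin\left(x^2 + z^2\right)$.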
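It is always worth checking a potential function once you have one. The following sketch compares a central difference approximation of $\nabla\phi$ for $\phi(x,y,z) = y + x^4 + \sin\left(x^2 + z^2\right)$ with the given vector field at a few sample points. The step size and the sample points are arbitrary choices, and the function names are hypothetical.

```python
import math

def phi(x, y, z):
    return y + x ** 4 + math.sin(x * x + z * z)

def field(x, y, z):
    c = math.cos(x * x + z * z)
    return (4.0 * x ** 3 + 2.0 * c * x, 1.0, 2.0 * c * z)

def numerical_gradient(f, p, h=1e-5):
    """Central difference approximation of the gradient of f at p."""
    grad = []
    for i in range(3):
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        grad.append((f(*plus) - f(*minus)) / (2.0 * h))
    return tuple(grad)

sample_points = [(0.5, -1.0, 2.0), (1.3, 0.7, -0.4), (-2.0, 3.0, 1.1)]
worst = 0.0
for p in sample_points:
    g = numerical_gradient(phi, p)
    f = field(*p)
    worst = max(worst, max(abs(a - b) for a, b in zip(g, f)))
print(worst)   # small, so grad(phi) matches the field at these points
```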
18.2.4 Some Terminology
18.2.5
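Many of the ideas presented above are useful in analyzing Maxwell's equations. These
equations are derived in advanced physics courses. They are
\[
\begin{aligned}
\nabla\times E + \frac{1}{c}\frac{\partial B}{\partial t} &= 0 &\qquad (18.5)\\
\nabla\cdot E &= 4\pi\rho &\qquad (18.6)\\
\nabla\times B - \frac{1}{c}\frac{\partial E}{\partial t} &= \frac{4\pi}{c}f &\qquad (18.7)\\
\nabla\cdot B &= 0 &\qquad (18.8)
\end{aligned}
\]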
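Let $A_1$ be a vector potential for B, so that $\nabla\times A_1 = B$. Then from 18.5,
\[
\nabla\times\left(E + \frac{1}{c}\frac{\partial A_1}{\partial t}\right) = 0.
\]
It follows $E + \frac{1}{c}\frac{\partial A_1}{\partial t}$ is the gradient of a scalar function, $\psi_1$. That is,
\[
\nabla\psi_1 = E + \frac{1}{c}\frac{\partial A_1}{\partial t}. \qquad (18.9)
\]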
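Now suppose $\phi$ is a time dependent scalar field satisfying
\[
\nabla^2\phi - \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{1}{c}\frac{\partial\psi_1}{\partial t} - \nabla\cdot A_1. \qquad (18.10)
\]
Next define
\[
A \equiv A_1 + \nabla\phi,\qquad \psi \equiv \psi_1 + \frac{1}{c}\frac{\partial\phi}{\partial t}. \qquad (18.11)
\]
Therefore, in terms of the new variables, 18.10 becomes
\[
\nabla^2\phi - \frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{1}{c}\left(\frac{\partial\psi}{\partial t} - \frac{1}{c}\frac{\partial^2\phi}{\partial t^2}\right) - \nabla\cdot A + \nabla^2\phi
\]
which yields
\[
0 = \frac{\partial\psi}{\partial t} - c\,\nabla\cdot A. \qquad (18.12)
\]
Then it follows from Theorem 17.1.3 on Page 370 that A is also a vector potential for
B. That is
\[
\nabla\times A = B. \qquad (18.13)
\]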
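From 18.9
\[
\nabla\left(\psi - \frac{1}{c}\frac{\partial\phi}{\partial t}\right) = E + \frac{1}{c}\left(\frac{\partial A}{\partial t} - \nabla\frac{\partial\phi}{\partial t}\right)
\]
and so
\[
\nabla\psi = E + \frac{1}{c}\frac{\partial A}{\partial t}. \qquad (18.14)
\]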
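Using 18.13 and 18.14 in 18.7,
\[
\nabla\times(\nabla\times A) - \frac{1}{c}\frac{\partial}{\partial t}\left(\nabla\psi - \frac{1}{c}\frac{\partial A}{\partial t}\right) = \frac{4\pi}{c}f. \qquad (18.15)
\]
Since $\nabla\times(\nabla\times A) = \nabla(\nabla\cdot A) - \nabla^2 A$, this says
\[
\nabla(\nabla\cdot A) - \nabla^2 A - \frac{1}{c}\frac{\partial}{\partial t}(\nabla\psi) + \frac{1}{c^2}\frac{\partial^2 A}{\partial t^2} = \frac{4\pi}{c}f
\]
and using 18.12, this gives
\[
\frac{1}{c^2}\frac{\partial^2 A}{\partial t^2} - \nabla^2 A = \frac{4\pi}{c}f. \qquad (18.16)
\]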
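Also from 18.14, 18.6, and 18.12,
\[
\nabla^2\psi = \nabla\cdot E + \frac{1}{c}\frac{\partial}{\partial t}(\nabla\cdot A) = 4\pi\rho + \frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2}
\]
and so
\[
\frac{1}{c^2}\frac{\partial^2\psi}{\partial t^2} - \nabla^2\psi = -4\pi\rho. \qquad (18.17)
\]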
This is very interesting. If a solution to the wave equations, 18.17, and 18.16 can be
found along with a solution to 18.12, then letting the magnetic field be given by 18.13
and letting E be given by 18.14 the result is a solution to Maxwell's equations. This
is significant because wave equations are easier to think of than Maxwell's equations.
Note the above argument also showed that it is always possible, by solving another wave
equation, to get 18.12 to hold.
18.3 Exercises
3. Determine whether the vector field, $\left(2xy^3\sin z + z,\ 3x^2y^2\sin z + 2xy,\ x^2y^3\cos z + x\right)$
is conservative. If it is conservative, find a potential function.
4. Find scalar potentials for the following vector fields if it is possible to do so. If it
is not possible to do so, explain why.
(d) $\left(xy, z^2, y^3\right)$
7. Here is a vector field: $F \equiv \left(2xy, x^2 - 5y^4, 3z^2\right)$. Find $\int_C F\cdot dR$ where C is a curve
which goes from (1, 2, 3) to (4, 2, 1).
8. Here is a vector field: $F \equiv \left(2xy, x^2 - 5y^4, 3\left(\cos z^3\right)z^2\right)$. Find $\int_C F\cdot dR$ where C
is a curve which goes from (1, 0, 1) to (4, 2, 1).
9. Find $\int_{\partial U} F\cdot dR$ where U is the set, $\{(x,y) : 2\le x\le 4,\ 0\le y\le x\}$ and $F(x,y) =
(x\sin y, y\sin x)$.
10. Find $\int_{\partial U} F\cdot dR$ where U is the set, $\left\{(x,y) : 2\le x\le 3,\ 0\le y\le x^2\right\}$ and $F(x,y) =
(x\cos y, y + x)$.
11. Find $\int_{\partial U} F\cdot dR$ where U is the set, $\{(x,y) : 1\le x\le 2,\ x\le y\le 3\}$ and $F(x,y) =
(x\sin y, y\sin x)$.
18.3. EXERCISES
409
18. Let $r(t) = (\cos^3(t), \sin^3(t))$ where $t \in [0, 2\pi]$. Sketch this curve and find the area
enclosed by it using Green's theorem.
19. Consider the vector field $\left( \frac{-y}{x^2+y^2}, \frac{x}{x^2+y^2}, 0 \right) \equiv F$. Show that $\nabla \times F = 0$ but
that for the closed curve whose parameterization is $R(t) = (\cos t, \sin t, 0)$ for
$t \in [0, 2\pi]$, $\int_C F \cdot dR \neq 0$. Therefore, the vector field is not conservative. Does this
contradict Theorem 18.2.11? Explain.
20. Let x be a point of $\mathbb{R}^3$ and let n be a unit vector. Let $D_r$ be the circular disk of radius r containing x which is perpendicular to n. Placing the tail of n at x and viewing $D_r$ from the point of n, orient $\partial D_r$ in the counter clockwise direction. Now suppose F is a vector field defined near x. Show $\operatorname{curl}(F) \cdot n = \lim_{r \to 0} \frac{1}{\pi r^2} \int_{\partial D_r} F \cdot dR$.
This last integral is sometimes called the circulation density of F. Explain how
this shows that $\operatorname{curl}(F) \cdot n$ measures the tendency for the vector field to curl
around the point, the vector n at the point x.
21. The cylinder $x^2 + y^2 = 4$ is intersected with the plane $x + y + z = 2$. This yields a
closed curve, C. Orient this curve in the counter clockwise direction when viewed
from a point high on the z axis. Let $F = (x^2 y, z + y, x^2)$. Find $\int_C F \cdot dR$.
22. The cylinder $x^2 + 4y^2 = 4$ is intersected with the plane $x + 3y + 2z = 1$. This
yields a closed curve, C. Orient this curve in the counter clockwise direction when
viewed from a point high on the z axis. Let $F = (y, z + y, x^2)$. Find $\int_C F \cdot dR$.
23. The cylinder $x^2 + y^2 = 4$ is intersected with the plane $x + 3y + 2z = 1$. This yields
a closed curve, C. Orient this curve in the clockwise direction when viewed from
a point high on the z axis. Let $F = (y, z + y, x)$. Find $\int_C F \cdot dR$.
24. Let $F = (xz, z^2 (y + \sin x), z^3 y)$. Find the surface integral $\int_S \operatorname{curl}(F) \cdot n \, dA$ where
S is the surface $z = 4 - (x^2 + y^2)$, $z \ge 0$.
25. Let $F = (xz, (y^3 + x), z^3 y)$. Find the surface integral $\int_S \operatorname{curl}(F) \cdot n \, dA$ where S
is the surface $z = 16 - (x^2 + y^2)$, $z \ge 0$.
R
z
FdR if F = 2 , xy, xz . Hint: This is not too hard if you show you can use
C
Stokes theorem on a domain in the xy plane.
27. Suppose solutions have been found to 18.17, 18.16, and 18.12. Then define E and
B using 18.14 and 18.13. Verify Maxwell's equations hold for E and B.
28. Suppose now you have found solutions to 18.17 and 18.16, 1 and A1 . Then go
show again that if satisfies 18.10 and 1 + 1c
t , while A A1 + , then
18.12 holds for A and .
29. Why consider Maxwell's equations? Why not just consider 18.17, 18.16, and
18.12?
30. Tell which open sets are simply connected.
(a) The inside of a car radiator.
(b) A donut.
(c) The solid part of a cannon ball which contains a void on the interior.
(d) The inside of a donut which has had a large bite taken out of it.
(e) All of $\mathbb{R}^3$ except the z axis.
(f) All of $\mathbb{R}^3$ except the xy plane.
31. Let P be a polygon with vertices $(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n), (x_1, y_1)$ encountered as you move over the boundary of the polygon in the counter clockwise
direction. Using Problem 13, find a nice formula for the area of the polygon in
terms of the vertices.
A.1
It is easiest to give a different definition of the determinant which is clearly well defined
and then prove the earlier one in terms of Laplace expansion. Let $(i_1, \cdots, i_n)$ be an
ordered list of numbers from $\{1, \cdots, n\}$. This means the order is important so $(1, 2, 3)$
and $(2, 1, 3)$ are different. There will be some repetition between this section and the
earlier section on determinants. The main purpose is to give all the missing proofs.
Two books which give a good introduction to determinants are Apostol [2] and Rudin
[23]. A recent book which also has a good introduction is Baker [4].
The following Lemma will be essential in the definition of the determinant.
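Lemma A.1.1 There exists a unique function, $\operatorname{sgn}_n$, which maps each list of numbers
from $\{1, \cdots, n\}$ to one of the three numbers, 0, 1, or $-1$, which also has the following
properties.
\[ \operatorname{sgn}_n(1, \cdots, n) = 1 \tag{1.1} \]
\[ \operatorname{sgn}_n(i_1, \cdots, p, \cdots, q, \cdots, i_n) = -\operatorname{sgn}_n(i_1, \cdots, q, \cdots, p, \cdots, i_n) \tag{1.2} \]
In words, the second property states that if two of the numbers are switched, the value
of the function is multiplied by $-1$. Also, in the case where $n > 1$ and $\{i_1, \cdots, i_n\} =
\{1, \cdots, n\}$ so that every number from $\{1, \cdots, n\}$ appears in the ordered list $(i_1, \cdots, i_n)$,
\[ \operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n) \equiv (-1)^{n-\theta} \operatorname{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n) \tag{1.3} \]
where $n = i_\theta$ in the ordered list $(i_1, \cdots, i_n)$.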
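The three properties 1.1, 1.2 and 1.3 can be checked by brute force for a small n. The following is a minimal sketch, not the construction used in the proof: it assumes the sign is computed by counting inversions (a swapped-in but equivalent technique), and the helper names `sgn` and `swap` are illustrative only.

```python
from itertools import permutations

def sgn(lst):
    """Sign of an ordered list of numbers: 0 if any entry repeats,
    otherwise (-1) raised to the number of inversions."""
    if len(set(lst)) != len(lst):
        return 0
    inversions = sum(
        1
        for a in range(len(lst))
        for b in range(a + 1, len(lst))
        if lst[a] > lst[b]
    )
    return -1 if inversions % 2 else 1

def swap(lst, r, s):
    """Return a copy of lst with the entries in positions r and s exchanged."""
    out = list(lst)
    out[r], out[s] = out[s], out[r]
    return tuple(out)

n = 4
assert sgn(tuple(range(1, n + 1))) == 1                  # property 1.1
for p in permutations(range(1, n + 1)):
    for r in range(n):
        for s in range(r + 1, n):
            assert sgn(swap(p, r, s)) == -sgn(p)         # property 1.2
    theta = p.index(n) + 1                               # 1-based position of n
    rest = p[:theta - 1] + p[theta:]                     # list with n removed
    assert sgn(p) == (-1) ** (n - theta) * sgn(rest)     # property 1.3
assert sgn((2, 2, 1, 3)) == 0                            # a repeated entry gives 0
```

Property 1.3 holds here because moving n from position theta to the last position takes n - theta switches of adjacent entries, and appending the largest number to the end of a list creates no new inversions.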
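Proof: To begin with, it is necessary to show the existence of such a function. This
is clearly true if $n = 1$. Define $\operatorname{sgn}_1(1) \equiv 1$ and observe that it works. No switching
is possible. In the case where $n = 2$, it is also clearly true. Let $\operatorname{sgn}_2(1, 2) = 1$ and
$\operatorname{sgn}_2(2, 1) = -1$ while $\operatorname{sgn}_2(2, 2) = \operatorname{sgn}_2(1, 1) = 0$ and verify it works. Assuming such a
function exists for n, $\operatorname{sgn}_{n+1}$ will be defined in terms of $\operatorname{sgn}_n$. If there are any repeated
numbers in $(i_1, \cdots, i_{n+1})$, $\operatorname{sgn}_{n+1}(i_1, \cdots, i_{n+1}) \equiv 0$. If there are no repeats, then $n + 1$
appears somewhere in the ordered list. Let $\theta$ be the position of the number $n + 1$ in the
list. Thus, the list is of the form $(i_1, \cdots, i_{\theta-1}, n + 1, i_{\theta+1}, \cdots, i_{n+1})$. From 1.3 it must
be that
\[ \operatorname{sgn}_{n+1}(i_1, \cdots, i_{\theta-1}, n + 1, i_{\theta+1}, \cdots, i_{n+1}) \equiv (-1)^{n+1-\theta} \operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_{n+1}). \]
It is necessary to verify this satisfies 1.1 and 1.2 with n replaced with $n + 1$. The first
of these is obviously true because
\[ \operatorname{sgn}_{n+1}(1, \cdots, n, n + 1) \equiv (-1)^{n+1-(n+1)} \operatorname{sgn}_n(1, \cdots, n) = 1. \]
If there are repeated numbers in $(i_1, \cdots, i_{n+1})$, then it is obvious 1.2 holds because
both sides would equal zero from the above definition. It remains to verify 1.2 in the
case where there are no numbers repeated in $(i_1, \cdots, i_{n+1})$. Consider
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right), \]
where the r above the p indicates the number p is in the r-th position and the s above
the q indicates that the number q is in the s-th position. Suppose first that $r < \theta < s$.
Then
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) \equiv (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \]
while
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{p}, \cdots, i_{n+1}\right) = (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s-1}{p}, \cdots, i_{n+1}\right) \]
and so, by induction, a switch of p and q introduces a minus sign in the result. Similarly,
if $\theta > s$ or if $\theta < r$ it also follows that 1.2 holds. The interesting case is when $\theta = r$ or
$\theta = s$. Consider the case where $\theta = r$ and note the other case is entirely similar.
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \tag{1.4} \]
while
\[ \operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right) = (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right). \tag{1.5} \]
By making $s - 1 - r$ switches, move the q which is in the $(s-1)$-th position in 1.4 to the
r-th position in 1.5. By induction, each of these switches introduces a factor of $-1$ and
so
\[ \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) = (-1)^{s-1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right). \]
Therefore,
\begin{align*}
\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right)
&= (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \\
&= (-1)^{n+1-r} (-1)^{s-1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) \\
&= (-1)^{n+s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right)
= (-1)^{2s-1} (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) \\
&= -\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right).
\end{align*}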
A.2 The Determinant
A.2.1 The Definition
In what follows sgn will often be used rather than $\operatorname{sgn}_n$ because the context supplies
the appropriate n.
Definition A.2.1 Let f be a real valued function which has the set of ordered lists of
numbers from $\{1, \cdots, n\}$ as its domain. Define
\[ \sum_{(k_1, \cdots, k_n)} f(k_1 \cdots k_n) \]
to be the sum of all the $f(k_1 \cdots k_n)$ for all possible choices of ordered lists $(k_1, \cdots, k_n)$
of numbers of $\{1, \cdots, n\}$. For example,
\[ \sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2). \]
The determinant of an $n \times n$ matrix $A = (a_{ij})$ is then defined by
\[ \det(A) \equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{nk_n} \]
where the sum is taken over all ordered lists of numbers from $\{1, \cdots, n\}$. Note it suffices
to take the sum over only those ordered lists in which there are no repeats because if
there are, $\operatorname{sgn}(k_1, \cdots, k_n) = 0$ and so that term contributes 0 to the sum.
Let A be an $n \times n$ matrix, $A = (a_{ij})$, and let $(r_1, \cdots, r_n)$ denote an ordered list of n
numbers from $\{1, \cdots, n\}$. Let $A(r_1, \cdots, r_n)$ denote the matrix whose k-th row is the $r_k$
row of the matrix A. Thus
\[ \det(A(r_1, \cdots, r_n)) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{1.6} \]
and
\[ A(1, \cdots, n) = A. \]
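The sum over ordered lists can be evaluated directly for small matrices. The sketch below is only an illustration under stated assumptions: it sums over lists without repeats (which suffices, since repeated lists contribute zero), computes sgn by counting inversions, and uses the hypothetical helper names `det` and `select_rows`, the latter standing for the matrix $A(r_1, \cdots, r_n)$.

```python
from itertools import permutations

def sgn(ks):
    """Sign of an ordered list: 0 with repeats, else (-1)**(number of inversions)."""
    if len(set(ks)) != len(ks):
        return 0
    inv = sum(ks[a] > ks[b] for a in range(len(ks)) for b in range(a + 1, len(ks)))
    return -1 if inv % 2 else 1

def det(A):
    """Sum of sgn(k_1..k_n) * a_{1 k_1} * ... * a_{n k_n} over lists with no repeats."""
    n = len(A)
    total = 0
    for ks in permutations(range(1, n + 1)):
        term = sgn(ks)
        for row, k in enumerate(ks):
            term *= A[row][k - 1]        # entry in row (row + 1), column k
        total += term
    return total

def select_rows(A, rs):
    """The matrix A(r_1, ..., r_n): its k-th row is row r_k of A (1-based)."""
    return [A[r - 1] for r in rs]

A = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]
assert det(select_rows(A, (1, 2, 3))) == det(A)      # A(1, ..., n) = A
assert det(select_rows(A, (2, 1, 3))) == -det(A)     # two rows switched
assert det(select_rows(A, (1, 1, 3))) == 0           # a repeated row
```

This costs on the order of n! multiplications, so it is useful for checking the definition on small examples rather than for computation.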
A.2.2
(1.7)
(k1 , ,kn )
det (A (r1 , , rn )) .
(1.8)
(1.9)
(k1 , ,kn )
sgn k1 , ,
z }| {
kr , , ks
(k1 , ,kn )
(1.10)
Consequently,
\[ \det(A(1, \cdots, s, \cdots, r, \cdots, n)) = -\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = -\det(A). \]
Now letting $A(1, \cdots, s, \cdots, r, \cdots, n)$ play the role of A, and continuing in this way,
switching pairs of numbers,
p
A.2.3 A Symmetric Definition
n!
(1.11)
And also $\det(A^T) = \det(A)$ where $A^T$ is the transpose of A. (Recall that for
$A^T = (a^T_{ij})$, $a^T_{ij} = a_{ji}$.)
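Proof: From Proposition A.2.3, if the $r_i$ are distinct,
\[ \det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}. \]
Summing over all ordered lists $(r_1, \cdots, r_n)$ where the $r_i$ are distinct (if the $r_i$ are not
distinct, $\operatorname{sgn}(r_1, \cdots, r_n) = 0$ and so there is no contribution to the sum),
\[ n! \det(A) = \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}. \]
This proves the corollary since the formula gives the same number for A as it does for
$A^T$.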
A.2.4
Corollary A.2.6 If two rows or two columns in an $n \times n$ matrix, A, are switched, the
determinant of the resulting matrix equals $(-1)$ times the determinant of the original
matrix. If A is an $n \times n$ matrix in which two rows are equal or two columns are equal
then $\det(A) = 0$. Suppose the i-th row of A equals $(xa_1 + yb_1, \cdots, xa_n + yb_n)$. Then
\[ \det(A) = x \det(A_1) + y \det(A_2) \]
where the i-th row of $A_1$ is $(a_1, \cdots, a_n)$ and the i-th row of $A_2$ is $(b_1, \cdots, b_n)$, all other
rows of $A_1$ and $A_2$ coinciding with those of A. In other words, det is a linear function
of each row of A. The same is true with the word "row" replaced with the word "column".
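The three assertions of Corollary A.2.6 can be spot-checked on a small integer example. This is a minimal sketch; the particular rows `top`, `a`, `b`, `bottom` and the scalars `x`, `y` are arbitrary illustrative choices, and `det` is a brute-force permutation sum rather than an efficient routine.

```python
from itertools import permutations

def det(A):
    """Determinant as the signed sum over permutations of the column indices."""
    n = len(A)
    total = 0
    for ks in permutations(range(n)):
        inv = sum(ks[a] > ks[b] for a in range(n) for b in range(a + 1, n))
        term = -1 if inv % 2 else 1
        for row in range(n):
            term *= A[row][ks[row]]
        total += term
    return total

a = [1, 4, 2]
b = [0, 3, 5]
x, y = 2, -3
top, bottom = [2, 1, 0], [1, 1, 7]

A1 = [top, a, bottom]                                        # second row is a
A2 = [top, b, bottom]                                        # second row is b
A = [top, [x * ai + y * bi for ai, bi in zip(a, b)], bottom] # second row is x*a + y*b

assert det(A) == x * det(A1) + y * det(A2)   # linearity in the second row
assert det([a, top, bottom]) == -det(A1)     # switching two rows changes the sign
assert det([top, a, a]) == 0                 # two equal rows give determinant 0
```

Integer entries keep every comparison exact, so no floating point tolerance is needed.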
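Proof: By Proposition A.2.3 when two rows are switched, the determinant of the
resulting matrix is $(-1)$ times the determinant of the original matrix. By Corollary
A.2.5 the same holds for columns because the columns of the matrix equal the rows
of the transposed matrix. Thus if $A_1$ is the matrix obtained from A by switching two
columns,
\[ \det(A) = \det(A^T) = -\det(A_1^T) = -\det(A_1). \]
If A has two equal columns or two equal rows, then switching them results in the same
matrix. Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$.
It remains to verify the last assertion.
\begin{align*}
\det(A) &\equiv \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots (x a_{k_i} + y b_{k_i}) \cdots a_{nk_n} \\
&= x \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n}
 + y \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n} \\
&\equiv x \det(A_1) + y \det(A_2).
\end{align*}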
A.2.5
a1
ar
a1
ar
Pr
k=1 ck ak
an1
By Corollary A.2.6
\[ \det(A) = \sum_{k=1}^{r} c_k \det\begin{pmatrix} a_1 & \cdots & a_r & \cdots & a_{n-1} & a_k \end{pmatrix} = 0. \]
The case for rows follows from the fact that $\det(A) = \det(A^T)$. This proves the corollary.
A.2.6
One of the most important rules about determinants is that the determinant of a
product equals the product of the determinants.
Theorem A.2.10 Let A and B be $n \times n$ matrices. Then
\[ \det(AB) = \det(A) \det(B). \]
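A quick numerical check of the product rule on two integer matrices follows. It is a sketch only: the matrices are arbitrary examples, `matmul` is a hypothetical helper written for this illustration, and `det` is the brute-force permutation sum.

```python
from itertools import permutations

def det(A):
    """Determinant as the signed sum over permutations of the column indices."""
    n = len(A)
    total = 0
    for ks in permutations(range(n)):
        inv = sum(ks[a] > ks[b] for a in range(n) for b in range(a + 1, n))
        term = -1 if inv % 2 else 1
        for row in range(n):
            term *= A[row][ks[row]]
        total += term
    return total

def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]
B = [[1, 2, 0],
     [0, 1, 4],
     [5, 0, 1]]

assert det(matmul(A, B)) == det(A) * det(B)
assert det(matmul(B, A)) == det(A) * det(B)   # holds even though AB != BA in general
```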
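Proof: Let $c_{ij}$ be the ij-th entry of AB. Then by Proposition A.2.3,
\begin{align*}
\det(AB) &= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, c_{1k_1} \cdots c_{nk_n} \\
&= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \left( \sum_{r_1} a_{1r_1} b_{r_1 k_1} \right) \cdots \left( \sum_{r_n} a_{nr_n} b_{r_n k_n} \right) \\
&= \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, b_{r_1 k_1} \cdots b_{r_n k_n} \left( a_{1r_1} \cdots a_{nr_n} \right) \\
&= \sum_{(r_1, \cdots, r_n)} \operatorname{sgn}(r_1 \cdots r_n)\, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A) \det(B).
\end{align*}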
A.2.7 Cofactor Expansions
A
M=
0 a
or
M=
0
a
(1.12)
(1.13)
Letting $\theta$ denote the position of n in the ordered list $(k_1, \cdots, k_n)$, then using the earlier
conventions used to prove Lemma A.1.1, $\det(M)$ equals
\[ \sum_{(k_1, \cdots, k_n)} (-1)^{n-\theta} \operatorname{sgn}_{n-1}(k_1, \cdots, k_{\theta-1}, k_{\theta+1}, \cdots, k_n)\, m_{1k_1} \cdots m_{nk_n}. \]
Now suppose 1.13. Then if $k_n \neq n$, the term involving $m_{nk_n}$ in the above expression
equals zero. Therefore, the only terms which survive are those for which $\theta = n$ or in
other words, those for which $k_n = n$. Therefore, the above expression reduces to
\[ a \sum_{(k_1, \cdots, k_{n-1})} \operatorname{sgn}_{n-1}(k_1, \cdots, k_{n-1})\, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a \det(A). \]
To get the assertion in the situation of 1.12 use Corollary A.2.5 and 1.13 to write
\[ \det(M) = \det(M^T) = \det\begin{pmatrix} A^T & 0 \\ * & a \end{pmatrix} = a \det(A^T) = a \det(A). \]
i=1
The first formula consists of expanding the determinant along the ith row and the second
expands the determinant along the j th column.
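Expansion along any row or any column can be compared numerically. The following is a minimal sketch, with hypothetical helper names `minor`, `cof`, `expand_row` and `expand_col`; indices in the code are 0-based, which does not change the parity of $i + j$.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row; the empty matrix has det 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cof(A, i, j):
    """The ij-th cofactor: (-1)**(i+j) times the determinant of the minor."""
    return (-1) ** (i + j) * det(minor(A, i, j))

def expand_row(A, i):
    """Sum over j of a_ij * cof(A)_ij."""
    return sum(A[i][j] * cof(A, i, j) for j in range(len(A)))

def expand_col(A, j):
    """Sum over i of a_ij * cof(A)_ij."""
    return sum(A[i][j] * cof(A, i, j) for i in range(len(A)))

A = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]
d = det(A)
assert all(expand_row(A, i) == d for i in range(3))   # every row gives det(A)
assert all(expand_col(A, j) == d for j in range(3))   # every column gives det(A)
```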
Proof: Let $(a_{i1}, \cdots, a_{in})$ be the i-th row of A. Let $B_j$ be the matrix obtained from A
by leaving every row the same except the i-th row which in $B_j$ equals $(0, \cdots, 0, a_{ij}, 0, \cdots, 0)$.
Then by Corollary A.2.6,
\[ \det(A) = \sum_{j=1}^{n} \det(B_j). \]
Denote by $A^{ij}$ the $(n-1) \times (n-1)$ matrix obtained by deleting the i-th row and the
j-th column of A. Thus $\operatorname{cof}(A)_{ij} \equiv (-1)^{i+j} \det(A^{ij})$. At this point, recall that from
Proposition A.2.3, when two rows or two columns in a matrix, M, are switched, this
results in multiplying the determinant of the old matrix by $-1$ to get the determinant
of the new matrix. Therefore, by Lemma A.2.11,
\begin{align*}
\det(B_j) &= (-1)^{n-j} (-1)^{n-i} \det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} \\
&= (-1)^{i+j} \det\begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = a_{ij} \operatorname{cof}(A)_{ij}.
\end{align*}
Therefore,
\[ \det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} \]
which is the formula for expanding $\det(A)$ along the i-th row. Also,
\[ \det(A) = \det(A^T) = \sum_{j=1}^{n} a^T_{ij} \operatorname{cof}(A^T)_{ij} = \sum_{j=1}^{n} a_{ji} \operatorname{cof}(A)_{ji} \]
which is the formula for expanding $\det(A)$ along the i-th column. This proves the
theorem.
Note that this gives an easy way to write a formula for the inverse of an $n \times n$
matrix. Recall the definition of the inverse of a matrix in Definition 2.1.28 on Page 39.
A.2.8
$A^{-1} = \left( a^{-1}_{ij} \right)$ where
\[ a^{-1}_{ij} = \det(A)^{-1} \operatorname{cof}(A)_{ji} \]
for $\operatorname{cof}(A)_{ij}$ the ij-th cofactor of A.
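The cofactor formula for the inverse can be tried on a small matrix with exact rational arithmetic. This is a sketch under the assumption that $\det(A) \neq 0$; the helper names are illustrative, and the cofactor formula is far too slow for matrices of any real size.

```python
from fractions import Fraction

def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row; the empty matrix has det 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cof(A, i, j):
    return (-1) ** (i + j) * det(minor(A, i, j))

def inverse(A):
    """Entry (i, j) of the inverse is cof(A)_ji / det(A); note the transposed indices."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    n = len(A)
    return [[Fraction(cof(A, j, i), d) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][r] * B[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 0, 1],
     [1, 3, 2],
     [0, 1, 1]]
identity = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
assert matmul(A, inverse(A)) == identity
assert matmul(inverse(A), A) == identity
```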
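Proof: By Theorem A.2.13 and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A) \det(A)^{-1} = 1. \]
Now consider
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} \]
when $k \neq r$. Replace the k-th column with the r-th column to obtain a matrix, $B_k$, whose
determinant equals zero by Corollary A.2.6. However, expanding this matrix along the
k-th column yields
\[ 0 = \det(B_k) \det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}. \]
Summarizing,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk}. \]
Using the other formula in Theorem A.2.13 and similar reasoning,
\[ \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk}. \]
This shows that if $\det(A) \neq 0$, then $A^{-1}$ exists with
\[ a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}. \]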
\[ A^{-1} = (BA) A^{-1} = B \left( A A^{-1} \right) = BI = B. \]
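A.2.9 Cramer's Rule
In case you are solving a system of equations, $Ax = y$ for x, it follows that if $A^{-1}$ exists,
\[ x = \left( A^{-1} A \right) x = A^{-1} (Ax) = A^{-1} y \]
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$
given above. Using this formula,
\[ x_i = \sum_{j=1}^{n} a^{-1}_{ij} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji}\, y_j. \]
By the formula for the expansion of a determinant along a column,
\[ x_i = \frac{1}{\det(A)} \det\begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix}, \]
where here the i-th column of A is replaced with the column vector $(y_1, \cdots, y_n)^T$, and
the determinant of this modified matrix is taken and divided by $\det(A)$. This formula
is known as Cramer's rule.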
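Cramer's rule translates almost line for line into code. The sketch below assumes $\det(A) \neq 0$ and uses exact fractions; the function name `cramer` and the sample system are illustrative choices only.

```python
from fractions import Fraction

def minor(A, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row; the empty matrix has det 1."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cramer(A, y):
    """Solve Ax = y: x_i = det(A with column i replaced by y) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det(A) != 0")
    xs = []
    for i in range(len(A)):
        Ai = [row[:i] + [y[r]] + row[i + 1:] for r, row in enumerate(A)]
        xs.append(Fraction(det(Ai), d))
    return xs

A = [[2, 1],
     [1, 3]]
y = [3, 5]
x = cramer(A, y)
# Substitute back: each row of A applied to x reproduces the entry of y.
assert [sum(a * xi for a, xi in zip(row, x)) for row in A] == y
```

For this system the solution is $x_1 = 4/5$, $x_2 = 7/5$, which the final assertion confirms by substitution.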
A.2.10
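A matrix M is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix equals
zero below the main diagonal, as shown.
\[ \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix} \]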
A lower triangular matrix is defined similarly as a matrix for which all entries above
the main diagonal are equal to zero.
With this definition, here is a simple corollary of Theorem A.2.13.
Corollary A.2.17 Let M be an upper (lower) triangular matrix. Then det (M ) is
obtained by taking the product of the entries on the main diagonal.
A.2.11
..
..
.
.
.
ai1 j1 ai1 jr
..
..
.
.
air j1
Thus
\[ 0 = a_{lp} C + \sum_{k=1}^{r} C_k a_{i_k p} \]
which implies
\[ a_{lp} = \sum_{k=1}^{r} \frac{-C_k}{C} a_{i_k p} \equiv \sum_{k=1}^{r} m_k a_{i_k p}. \]
Since this is true for every p and since $m_k$ does not depend on p, this has shown the
l-th row is a linear combination of the $i_1, i_2, \cdots, i_r$ rows. The determinant rank does
not change when you replace A with $A^T$. Therefore, the same conclusion holds for the
columns. This proves the theorem.
A.2.12
The following theorem is of fundamental importance and ties together many of the ideas
presented above.
Theorem A.2.20 Let A be an $n \times n$ matrix. Then the following are equivalent.
1. $\det(A) = 0$.
2. A, $A^T$ are not one to one.
3. A is not onto.
Proof: Suppose $\det(A) = 0$. Then the determinant rank of A is $r < n$. Therefore,
there exist r columns such that every other column is a linear combination of these
columns by Theorem A.2.19. In particular, it follows that for some m, the m-th column
is a linear combination of all the others. Thus letting $A = \begin{pmatrix} a_1 & \cdots & a_m & \cdots & a_n \end{pmatrix}$
where the columns are denoted by $a_i$, there exist scalars $\alpha_i$ such that
\[ a_m = \sum_{k \neq m} \alpha_k a_k. \]
Now consider the column vector $x \equiv \begin{pmatrix} \alpha_1 & \cdots & -1 & \cdots & \alpha_n \end{pmatrix}^T$, which has $-1$ in the m-th position. Then
\[ Ax = -a_m + \sum_{k \neq m} \alpha_k a_k = 0. \]
Since also $A0 = 0$, it follows A is not one to one. Similarly, $A^T$ is not one to one by the
same argument applied to $A^T$. This verifies that 1.) implies 2.).
Now suppose 2.). Then since $A^T$ is not one to one, it follows there exists $x \neq 0$ such
that
\[ A^T x = 0. \]
Taking the transpose of both sides yields
\[ x^T A = 0^T \]
where the $0^T$ is a $1 \times n$ matrix or row vector. Now if $Ay = x$, then
\[ |x|^2 = x^T (Ay) = \left( x^T A \right) y = 0^T y = 0 \]
contrary to $x \neq 0$. Consequently there can be no y such that $Ay = x$ and so A is not
onto. This shows that 2.) implies 3.).
Finally, suppose 3.). If 1.) does not hold, then $\det(A) \neq 0$ but then from Theorem
A.2.14 $A^{-1}$ exists and so for every $y \in \mathbb{F}^n$ there exists a unique $x \in \mathbb{F}^n$ such that
$Ax = y$. In fact $x = A^{-1} y$. Thus A would be onto contrary to 3.). This shows 3.)
implies 1.) and proves the theorem.
Corollary A.2.21 Let A be an $n \times n$ matrix. Then the following are equivalent.
1. $\det(A) \neq 0$.
2. A and $A^T$ are one to one.
3. A is onto.
Proof: This follows immediately from the above theorem.
A.2.13 Schur's Theorem
\[ \sum_{j=1}^{n} a_{ij} x_j = 0, \quad i = 1, 2, \cdots, m \tag{1.14} \]
\[ \begin{pmatrix} A \\ 0 \end{pmatrix}, \]
an $n \times n$ matrix having $n - m$ rows of zeros on the bottom, it follows this matrix has
determinant equal to 0. Therefore, from Theorem A.2.20, there exists $x \neq 0$ such that
$Ax = 0$. This proves the theorem.
Definition A.2.23 A set of vectors in $\mathbb{R}^n$, $\{x_1, \cdots, x_k\}$, is called an orthonormal set
of vectors if
\[ x_i \cdot x_j = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]
Theorem A.2.24 Let $v_1$ be a unit vector ($|v_1| = 1$) in $\mathbb{R}^n$, $n > 1$. Then there exist
vectors $\{v_2, \cdots, v_n\}$ such that
\[ \{v_1, \cdots, v_n\} \]
is an orthonormal set of vectors.
Proof: The equation for x
\[ v_1 \cdot x = 0 \]
has a nonzero solution x by Theorem A.2.22. Pick such a solution and divide by its
magnitude to get $v_2$, a unit vector such that $v_1 \cdot v_2 = 0$. Now suppose $v_1, \cdots, v_k$ have
been chosen such that $\{v_1, \cdots, v_k\}$ is an orthonormal set of vectors. Then consider
the equations
\[ v_j \cdot x = 0, \quad j = 1, 2, \cdots, k. \]
This amounts to the situation of Theorem A.2.22 in which there are more variables
than equations. Therefore, by this theorem, there exists a nonzero x solving all these
equations. Divide by its magnitude and this gives $v_{k+1}$. This proves the theorem.
Definition A.2.25 If U is an $n \times n$ matrix whose columns form an orthonormal set of
vectors, then U is called an orthogonal matrix. Note that from the way we multiply
matrices,
\[ U^T U = U U^T = I. \]
Thus $U^{-1} = U^T$.
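A unit vector can be extended to an orthonormal set numerically. Note that the sketch below does not follow the proof of Theorem A.2.24, which solves homogeneous systems; it swaps in the Gram-Schmidt process applied to the standard basis vectors. The function name `extend_to_orthonormal` and the tolerance `tol` are assumptions made for this illustration, and floating point arithmetic means the checks are only approximate.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def extend_to_orthonormal(v1, tol=1e-9):
    """Extend the unit vector v1 to an orthonormal list of n vectors by
    orthogonalizing the standard basis vectors against those already chosen."""
    n = len(v1)
    basis = [list(v1)]
    for k in range(n):
        e = [1.0 if i == k else 0.0 for i in range(n)]
        for b in basis:                       # remove the components along earlier vectors
            c = dot(e, b)
            e = [ei - c * bi for ei, bi in zip(e, b)]
        norm = math.sqrt(dot(e, e))
        if norm > tol:                        # skip candidates already in the span
            basis.append([ei / norm for ei in e])
        if len(basis) == n:
            break
    return basis

v1 = [1 / 3, 2 / 3, 2 / 3]                    # a unit vector in R^3
vs = extend_to_orthonormal(v1)

# The matrix U with columns v_i satisfies U^T U = I, that is v_i . v_j = delta_ij.
for i in range(3):
    for j in range(3):
        expected = 1.0 if i == j else 0.0
        assert abs(dot(vs[i], vs[j]) - expected) < 1e-9
```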
(1.15)
where T is an upper triangular matrix having the eigenvalues of A on the main diagonal
listed according to multiplicity as zeros of the characteristic equation.
Proof: The theorem is clearly true if A is a $1 \times 1$ matrix. Just let $U = 1$, the $1 \times 1$
matrix which has 1 down the main diagonal and zeros elsewhere. Suppose it is true
for $(n-1) \times (n-1)$ matrices and let A be an $n \times n$ matrix. Then let $v_1$ be a unit
eigenvector for A. Then there exists $\lambda_1$ such that
\[ A v_1 = \lambda_1 v_1, \quad |v_1| = 1. \]
By Theorem A.2.24 there exists $\{v_1, \cdots, v_n\}$, an orthonormal set in $\mathbb{R}^n$. Let $U_0$ be a
matrix whose i-th column is $v_i$. Then from the above, it follows $U_0$ is orthogonal. Then
from the way you multiply matrices $U_0^T A U_0$ is of the form
\[ \begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix} \]
where $A_1$ is an $(n-1) \times (n-1)$ matrix. The above matrix is similar to A so it has the
same eigenvalues and indeed the same characteristic equation. Also the eigenvalues of
$A_1$ are all real because each of these eigenvalues is an eigenvalue of the above matrix
and is therefore an eigenvalue of A. Now by induction there exists an $(n-1) \times (n-1)$
orthogonal matrix $\widetilde{U}_1$ such that
\[ \widetilde{U}_1^T A_1 \widetilde{U}_1 = T_{n-1}, \]
an upper triangular matrix. Consider
\[ U_1 \equiv \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}. \]
This is an orthogonal matrix and
\[ U_1^T U_0^T A U_0 U_1 = \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1^T \end{pmatrix} \begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix} = \begin{pmatrix} \lambda_1 & * \\ 0 & T_{n-1} \end{pmatrix} \equiv T \]
where T is upper triangular. Then let $U = U_0 U_1$. Since $(U_0 U_1)^T = U_1^T U_0^T$, it follows A
is similar to T and that $U_0 U_1$ is orthogonal. Hence A and T have the same characteristic
polynomials and since the eigenvalues of T are the diagonal entries listed according to
algebraic multiplicity, this proves the theorem.
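A.2.14 Symmetric Matrices
\[ \lambda x^T \overline{x} = (Ax)^T \overline{x} = x^T A^T \overline{x} = x^T A \overline{x} = x^T \overline{Ax} = \overline{\lambda} x^T \overline{x} \]
and so, cancelling $x^T \overline{x}$, it follows $\lambda = \overline{\lambda}$ showing $\lambda$ is real. This proves the lemma.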
Theorem A.2.28 Let A be a real symmetric matrix. Then there exists a diagonal
matrix D consisting of the eigenvalues of A down the main diagonal and an orthogonal
matrix U such that
\[ U^T A U = D. \]
Proof: Since A has all real eigenvalues, it follows from Theorem A.2.26, there exists
an orthogonal matrix U such that
\[ U^T A U = T \]
where T is upper triangular. Now
\[ T^T = U^T A^T U = U^T A U = T \]
and so in fact T is a diagonal matrix having the eigenvalues of A down the diagonal.
This proves the theorem.
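Theorem A.2.29 Let A be a real symmetric matrix which has all positive eigenvalues
$0 < \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. Then
\[ (Ax \cdot x) \equiv x^T A x \ge \lambda_1 |x|^2. \]
Proof: Let U be the orthogonal matrix of Theorem A.2.28, so that $A = U D U^T$. Then
\begin{align*}
(Ax \cdot x) &= x^T A x = x^T U D U^T x \\
&= \left( U^T x \right)^T D \left( U^T x \right) = \sum_i \lambda_i \left| \left( U^T x \right)_i \right|^2 \\
&\ge \lambda_1 \sum_i \left| \left( U^T x \right)_i \right|^2 = \lambda_1 \left( U^T x \cdot U^T x \right) \\
&= \lambda_1 \left( U^T x \right)^T U^T x = \lambda_1 x^T U U^T x = \lambda_1 x^T I x = \lambda_1 |x|^2.
\end{align*}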
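The inequality of Theorem A.2.29 can be checked on a concrete $2 \times 2$ symmetric matrix. This is a minimal sketch: the matrix is an arbitrary example, and the smallest eigenvalue is taken from the closed form for $2 \times 2$ symmetric matrices (an assumption of this illustration, not a method from the text).

```python
import math

def smallest_eigenvalue_2x2(a, b, d):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, d]], from the
    quadratic formula applied to its characteristic polynomial."""
    return (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b ** 2)

def quadratic_form(a, b, d, x, y):
    """x^T A x for A = [[a, b], [b, d]] and the vector (x, y)."""
    return a * x * x + 2 * b * x * y + d * y * y

a, b, d = 2, 1, 2                        # A = [[2, 1], [1, 2]], eigenvalues 1 and 3
lam1 = smallest_eigenvalue_2x2(a, b, d)
assert lam1 > 0                          # all eigenvalues are positive

# Check x^T A x >= lam1 * |x|^2 on a grid of integer vectors.
for x in range(-5, 6):
    for y in range(-5, 6):
        assert quadratic_form(a, b, d, x, y) >= lam1 * (x * x + y * y) - 1e-12
```

For this matrix the difference $x^T A x - \lambda_1 |x|^2$ equals $(x + y)^2$, so equality occurs exactly along the eigenvector direction $(1, -1)$.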
A.3 Exercises
1. Let $m < n$ and let A be an $m \times n$ matrix. Show that A is not one to one. Hint:
Consider the $n \times n$ matrix $A_1$ which is of the form
\[ A_1 \equiv \begin{pmatrix} A \\ 0 \end{pmatrix} \]
where the 0 denotes an $(n - m) \times n$ matrix of zeros. Thus $\det A_1 = 0$ and so $A_1$
is not one to one. Now observe that $A_1 x$ is the vector
\[ A_1 x = \begin{pmatrix} Ax \\ 0 \end{pmatrix} \]
which equals zero if and only if $Ax = 0$.
2. Show that matrix multiplication is associative. That is, (AB) C = A (BC) .
3. Show the inverse of a matrix, if it exists, is unique. Thus if $AB = BA = I$, then
$B = A^{-1}$.
4. In the proof of Theorem A.2.14 it was claimed that $\det(I) = 1$. Here $I = (\delta_{ij})$.
Prove this assertion. Also prove Corollary A.2.17.
5. Let $v_1, \cdots, v_n$ be vectors in $\mathbb{F}^n$ and let $M(v_1, \cdots, v_n)$ denote the matrix whose
i-th column equals $v_i$. Define
\[ d(v_1, \cdots, v_n) \equiv \det(M(v_1, \cdots, v_n)). \]
Prove that d is linear in each variable (multilinear), that
\[ d(v_1, \cdots, v_i, \cdots, v_j, \cdots, v_n) = -d(v_1, \cdots, v_j, \cdots, v_i, \cdots, v_n), \tag{1.16} \]
and
\[ d(e_1, \cdots, e_n) = 1 \tag{1.17} \]
where here $e_j$ is the vector in $\mathbb{F}^n$ which has a zero in every position except the j-th
position in which it has a one.
6. Suppose $f : \mathbb{F}^n \times \cdots \times \mathbb{F}^n \to \mathbb{F}$ satisfies 1.16 and 1.17 and is linear in each variable.
Show that $f = d$.
7. Show that if you replace a row (column) of an $n \times n$ matrix A with itself added
to some multiple of another row (column) then the new matrix has the same
determinant as the original one.
8. If $A = (a_{ij})$, show $\det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{k_1 1} \cdots a_{k_n n}$.
9. Use the result of Problem 7 to evaluate
\[ \det\begin{pmatrix} 1 & 2 & 3 & 2 \\ 6 & 3 & 2 & 3 \\ 5 & 2 & 2 & 3 \\ 3 & 4 & 6 & 4 \end{pmatrix}. \]
e
cos t
sin t
et sin t cos t .
et cos t sin t
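11. Let $Ly = y^{(n)} + a_{n-1}(x) y^{(n-1)} + \cdots + a_1(x) y' + a_0(x) y$ where the $a_i$ are given
continuous functions defined on a closed interval, $(a, b)$, and y is some function
which has n derivatives so it makes sense to write Ly. Suppose $Ly_k = 0$ for
$k = 1, 2, \cdots, n$. The Wronskian of these functions, $y_i$, is defined as
\[ W(y_1, \cdots, y_n)(x) \equiv \det\begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{pmatrix} \]
\[ W'(x) = \det\begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n)}(x) & \cdots & y_n^{(n)}(x) \end{pmatrix} \]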
The implicit function theorem is one of the greatest theorems in mathematics. There
are many versions of this theorem which are of far greater generality than the one given
here. The proof given here is like one found in one of Caratheodory's books on the
calculus of variations. It is not as elegant as some of the others which are based on a
contraction mapping principle but it may be more accessible. However, it is an advanced
topic. Don't waste your time with it unless you have first read and understood the
material on rank and determinants found in the chapter on the mathematical theory of
determinants. You will also need to use the extreme value theorem for a function of n
variables and the chain rule as well as everything about matrix multiplication.
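Definition B.0.1 Suppose U is an open set in $\mathbb{R}^n \times \mathbb{R}^m$ and $(x, y)$ will denote a typical
point of $\mathbb{R}^n \times \mathbb{R}^m$ with $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Let $f : U \to \mathbb{R}^p$ be in $C^1(U)$. Then define
\[ D_1 f(x, y) \equiv \begin{pmatrix} f_{1,x_1}(x, y) & \cdots & f_{1,x_n}(x, y) \\ \vdots & & \vdots \\ f_{p,x_1}(x, y) & \cdots & f_{p,x_n}(x, y) \end{pmatrix}, \]
\[ D_2 f(x, y) \equiv \begin{pmatrix} f_{1,y_1}(x, y) & \cdots & f_{1,y_m}(x, y) \\ \vdots & & \vdots \\ f_{p,y_1}(x, y) & \cdots & f_{p,y_m}(x, y) \end{pmatrix}. \]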
\[ f(x_0, y_0) = 0, \quad D_1 f(x_0, y_0)^{-1} \text{ exists.} \tag{2.1} \]
Then there exist positive constants, $\delta, \eta$, such that for every $y \in B(y_0, \eta)$ there exists a
unique $x(y) \in B(x_0, \delta)$ such that
\[ f(x(y), y) = 0. \tag{2.2} \]
f (x, y) =
f1 (x, y)
f2 (x, y)
..
.
fn (x, y)
Define for x1 , , x
f1,x1 x1 , y
..
J x1 , , xn , y
.
fn,x1 (xn , y)
f1,xn x1 , y
..
.
.
fn,xn (xn , y)
Then by the assumption of continuity of all the partial derivatives and the extreme
value theorem, there exists r > 0 and 0 , 0 > 0 such that if 0 and 0 , it
n
follows that for all x1 , , xn B (x0 , ) and y B (y0 , ),
(2.3)
and B (x0 , 0 ) B (y0 , 0 ) U . By continuity of all the partial derivatives and the
extreme value theorem, it can also be assumed there exists a constant, K such that for
all (x, y) B (x0 , 0 ) B (y0 , 0 ) and i = 1, 2, , n, the ith row of D2 f (x, y) , given by
D2 fi (x, y) satisfies
|D2 fi (x, y)| < K,
(2.4)
1
n
and for all x , , xn B (x0 , 0 ) and y B (y0 , 0 ) the ith row of the matrix,
1
J x1 , , xn , y
1
which equals eTi J x1 , , xn , y
satisfies
1
T
ei J x1 , , xn , y
< K.
(2.5)
(Recall that ei is the column vector consisting of all zeros except for a 1 in the ith
position.)
To begin with it is shown that for a given $y \in B(y_0, \eta)$ there is at most one $x \in
B(x_0, \delta)$ such that $f(x, y) = 0$.
Pick $y \in B(y_0, \eta)$ and suppose there exist $x, z \in B(x_0, \delta)$ such that $f(x, y) =
f(z, y) = 0$. Consider $f_i$ and let
\[ h(t) \equiv f_i(x + t(z - x), y). \]
Then $h(1) = h(0)$ and so by the mean value theorem, $h'(t_i) = 0$ for some $t_i \in (0, 1)$.
Therefore, from the chain rule and for this value of $t_i$,
\[ h'(t_i) = D_1 f_i(x + t_i(z - x), y)(z - x) = 0. \tag{2.6} \]
Then denote by $x^i$ the vector $x + t_i(z - x)$. It follows from 2.6 that
\[ J\left( x^1, \cdots, x^n, y \right)(z - x) = 0 \]
and so from 2.3 $z - x = 0$. (The matrix in the above is invertible since its determinant
is nonzero.) Now it will be shown that if $\eta$ is chosen sufficiently small, then for all
$y \in B(y_0, \eta)$, there exists a unique $x(y) \in B(x_0, \delta)$ such that $f(x(y), y) = 0$.
Claim: If $\eta$ is small enough, then the function $h_y(x) \equiv |f(x, y)|^2$ achieves its
minimum value on $\overline{B(x_0, \delta)}$ at a point of $B(x_0, \delta)$. (The existence of a point in $\overline{B(x_0, \delta)}$
at which $h_y$ achieves its minimum follows from the extreme value theorem.)
Proof of claim: Suppose this is not the case. Then there exists a sequence $\eta_k \to 0$
and for some $y_k$ having $|y_k - y_0| < \eta_k$, the minimum of $h_{y_k}$ on $\overline{B(x_0, \delta)}$ occurs on a
point $x_k$ of $\overline{B(x_0, \delta)}$ such that $|x_0 - x_k| = \delta$. Now taking a subsequence, still denoted
by k, it can be assumed that $x_k \to \overline{x}$ with $|\overline{x} - x_0| = \delta$ and $y_k \to y_0$. This follows
from the fact that $\left\{ x \in \overline{B(x_0, \delta)} : |x - x_0| = \delta \right\}$ is a closed and bounded set and is
therefore sequentially compact. Let $\varepsilon > 0$. Then for k large enough, the continuity of
$y \to h_y(x_0)$ implies $h_{y_k}(x_0) < \varepsilon$ because $h_{y_0}(x_0) = 0$ since $f(x_0, y_0) = 0$. Therefore,
from the definition of $x_k$, it is also the case that $h_{y_k}(x_k) < \varepsilon$. Passing to the limit yields
$h_{y_0}(\overline{x}) \le \varepsilon$. Since $\varepsilon > 0$ is arbitrary, it follows that $h_{y_0}(\overline{x}) = 0$ which contradicts the
first part of the argument in which it was shown that for $y \in B(y_0, \eta)$ there is at most
one point x of $B(x_0, \delta)$ where $f(x, y) = 0$. Here two have been obtained, $x_0$ and $\overline{x}$.
This proves the claim.
Choose $\eta < \eta_0$ and also small enough that the above claim holds and let $x(y)$ denote
a point of $B(x_0, \delta)$ at which the minimum of $h_y$ on $\overline{B(x_0, \delta)}$ is achieved. Since $x(y)$
is an interior point, you can consider $h_y(x(y) + tv)$ for $|t|$ small and conclude this
function of t has a zero derivative at $t = 0$. Now
\[ h_y(x(y) + tv) = \sum_{i=1}^{n} f_i^2(x(y) + tv, y) \]
and so from the chain rule,
\[ \frac{d}{dt} h_y(x(y) + tv) = \sum_{i=1}^{n} \sum_{j=1}^{n} 2 f_i(x(y) + tv, y) \frac{\partial f_i(x(y) + tv, y)}{\partial x_j} v_j. \]
Therefore, letting $t = 0$, it is required that for every v,
\[ \sum_{i=1}^{n} \sum_{j=1}^{n} 2 f_i(x(y), y) \frac{\partial f_i(x(y), y)}{\partial x_j} v_j = 0. \]
In terms of matrices this reduces to
\[ 0 = 2 f(x(y), y)^T D_1 f(x(y), y)\, v \]
for every vector v. Therefore,
\[ 0 = f(x(y), y)^T D_1 f(x(y), y). \]
From 2.3, it follows $f(x(y), y) = 0$. This proves the existence of the function $y \to x(y)$
such that $f(x(y), y) = 0$ for all $y \in B(y_0, \eta)$.
It remains to verify this function is a C 1 function. To do this, let y1 and y2 be
points of B (y0 , ) . Then as before, consider the ith component of f and consider the
same argument using the mean value theorem to write
0 = fi (x (y1 ) , y1 ) fi (x (y2 ) , y2 )
= fi (x (y1 ) , y1 ) fi (x (y2 ) , y1 ) + fi (x (y2 ) , y1 ) fi (x (y2 ) , y2 )
= D1 fi xi , y1 (x (y1 ) x (y2 )) + D2 fi x (y2 ) , yi (y1 y2 ) .
(2.7)
where yi is a point on the line segment joining y1 and y2 . Thus from 2.4 and the Cauchy
Schwarz inequality,
Therefore,
letting
M y1 , , yn M denote the matrix having the ith row equal to
D2 fi x (y2 ) , yi , it follows
|M (y1 y2 )|
!1/2
2
K |y1 y2 |
mK |y1 y2 | .
(2.8)
M (y1 y2 )
|x (y1 ) x (y2 )| = J x1 , , xn , y1
n
!
2 1/2
X
1
1
T
n
=
M (y1 y2 )
ei J x , , x , y1
i=1
n
X
= K
i=1
!1/2
2
K |M (y1 y2 )|
n
X
mK |y1 y2 |
(2.9)
(2.10)
!1/2
i=1
mn |y1 y2 |
(2.11)
(2.12)
Now let y B (y0 , ) and let |v| be sufficiently small that y + v B (y0 , ) . Then
0
= f (x (y + v) , y + v) f (x (y) , y)
= f (x (y + v) , y + v) f (x (y + v) , y) + f (x (y + v) , y) f (x (y) , y)
Therefore,
x (y + v) x (y) = D1 f (x (y) , y)
1
B.1
D2 f (x (y) , y) v + o (v)
(2.13)
be a collection of equality constraints with $m < n$. Now consider the system of nonlinear
equations
\[ f(x) = a, \quad g_i(x) = 0,\ i = 1, \cdots, m. \]
Recall $x_0$ is a local maximum if $f(x_0) \ge f(x)$ for all x near $x_0$ which also satisfies
the constraints 2.13. A local minimum is defined similarly. Let $F : U \times \mathbb{R} \to \mathbb{R}^{m+1}$ be
defined by
\[ F(x, a) \equiv \begin{pmatrix} f(x) - a \\ g_1(x) \\ \vdots \\ g_m(x) \end{pmatrix}. \tag{2.14} \]
Now consider the $(m + 1) \times n$ matrix,
\[ \begin{pmatrix} f_{x_1}(x_0) & \cdots & f_{x_n}(x_0) \\ g_{1x_1}(x_0) & \cdots & g_{1x_n}(x_0) \\ \vdots & & \vdots \\ g_{mx_1}(x_0) & \cdots & g_{mx_n}(x_0) \end{pmatrix}. \]
If this matrix has rank $m + 1$ then some $(m + 1) \times (m + 1)$ submatrix has nonzero determinant.
It follows from the implicit function theorem there exist $m + 1$ variables, $x_{i_1}, \cdots, x_{i_{m+1}}$,
such that the system
\[ F(x, a) = 0 \tag{2.15} \]
specifies these $m + 1$ variables as a function of the remaining $n - (m + 1)$ variables and
a in an open set of $\mathbb{R}^{n-m}$. Thus there is a solution $(x, a)$ to 2.15 for some x close to $x_0$
whenever a is in some open interval. Therefore, $x_0$ cannot be either a local minimum or
a local maximum. It follows that if $x_0$ is either a local maximum or a local minimum,
then the above matrix must have rank less than $m + 1$ which requires the rows to be
linearly dependent. Thus, there exist m scalars,
\[ \lambda_1, \cdots, \lambda_m, \]
and a scalar $\mu$, not all zero such that
\[ \mu \begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1 \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix}. \tag{2.16} \]
If the column vectors
\[ \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix}, \cdots, \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{2.17} \]
are linearly independent, then $\mu \neq 0$ and dividing by $\mu$ yields an expression of the form
\[ \begin{pmatrix} f_{x_1}(x_0) \\ \vdots \\ f_{x_n}(x_0) \end{pmatrix} = \lambda_1 \begin{pmatrix} g_{1x_1}(x_0) \\ \vdots \\ g_{1x_n}(x_0) \end{pmatrix} + \cdots + \lambda_m \begin{pmatrix} g_{mx_1}(x_0) \\ \vdots \\ g_{mx_n}(x_0) \end{pmatrix} \tag{2.18} \]
at every point $x_0$ which is either a local maximum or a local minimum. This proves the
following theorem.
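The rank condition behind 2.18 can be seen on a small example with $n = 2$ variables and $m = 1$ constraint. The sketch below is illustrative only: the functions $f(x, y) = xy$ and $g(x, y) = x + y - 2$ and the point $(1, 1)$ are assumptions chosen for this example, with gradients written out by hand.

```python
def f(x, y):
    return x * y                     # function to be maximized (illustrative choice)

def grad_f(x, y):
    return (y, x)                    # (f_x, f_y)

def grad_g(x, y):
    return (1, 1)                    # gradient of the constraint g(x, y) = x + y - 2

x0 = (1, 1)                          # candidate point; it satisfies g(x0) = 0
gf, gg = grad_f(*x0), grad_g(*x0)

# The (m+1) x n matrix with rows grad f and grad g has rank less than m + 1 = 2:
assert gf[0] * gg[1] - gf[1] * gg[0] == 0

# Since grad g is nonzero, grad f = lambda * grad g for some scalar lambda.
lam = gf[0] / gg[0]
assert all(a == lam * b for a, b in zip(gf, gg))

# Along the constraint, points near x0 are (1 + t, 1 - t) and f = 1 - t**2 <= f(x0).
for t in (-0.5, -0.1, 0.1, 0.5):
    assert f(1 + t, 1 - t) <= f(*x0)
```

Here the multiplier is $\lambda = 1$, and the last loop confirms that $(1, 1)$ is a local maximum of f on the constraint set.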
B.2
T
h (x) = x1 (x) xn
.
Thus, h is primitive if it only changes one of the variables. A function, F : Rn Rn
is called a flip if
T
F (x1 , , xk , , xl , , xn ) = (x1 , , xl , , xk , , xn ) .
Thus a function is a flip if it interchanges two coordinates. Also, for m = 1, 2, , n,
T
Pm (x) x1 x2 xm 0 0
1
1 (x)
1,1 (0)
n (x)
n,1 (0)
T
T
k
where k,1 denotes
x1 . Since Dh (0) is one to one, the right side of this expression
cannot be zero. Hence there exists some k such that k,1 (0) 6= 0. Now define
G1 (x)
k (x)
k,1 (0)
0
1
..
.
0
x2
..
xn
k,n (0)
0
..
.
and its determinant equals k,1 (0) 6= 0. Therefore, by the inverse function theorem,
there exists an open set, U1 containing 0 and an open set, V2 containing 0 such that
G1 (U1 ) = V2 and G1 is one to one and onto such that it and its inverse are both C 1 .
Let F1 denote the flip which interchanges xk with x1 . Now define
h2 (y) F1 h1 G1
1 (y)
Thus
h2 (G1 (x))
F1 h1 (x)
k (x)
=
Therefore,
P1 h2 (G1 (x)) =
Also
P1 (G1 (x)) =
1 (x)
k (x) 0
k (x) x2
n (x)
T
0
xn
(2.19)
T
1
so P1 h2 (y) = P1 (y) for all y V2 . Also, h2 (0) = 0 and Dh2 (0) exists because of
the definition of h2 above and the chain rule. Also, since F21 = identity, it follows from
2.19 that
h (x) = h1 (x) = F1 h2 G1 (x) .
(2.20)
Suppose then that for m 2,
Pm1 hm (x) = Pm1 (x)
(2.21)
1
x1
xm1
1 (x)
n (x)
exists.
where these k are different than the ones used earlier. Then
T
Dhm (0) em = 0 0 1,m (0) n,m (0)
6 0
=
1
because Dhm (0) exists. Therefore, there exists a k such that k,m (0) 6= 0, not the
same k as before. Define
T
Gm+1 (x) x1 xm1 k (x) xm+1 xn
(2.22)
1
Then Gm+1 (0) = 0 and DGm+1 (0) exists similar to the above. In fact det (DGm+1 (0)) =
k,m (0). Therefore, by the inverse function theorem, there exists an open set, Vm+1
containing 0 such that Vm+1 = Gm+1 (Um ) with Gm+1 and its inverse being one to one
continuous and onto. Let Fm be the flip which flips xm and xk . Then define hm+1 on
Vm+1 by
hm+1 (y) = Fm hm G1
m+1 (y) .
Thus for x Um ,
hm+1 (Gm+1 (x)) = (Fm hm ) (x) .
(2.23)
(2.24)
and consequently,
It follows
Pm hm+1 (Gm+1 (x))
=
=
and
Pm (Gm+1 (x)) =
x1
Pm (Fm hm ) (x)
x1 xm1 k (x)
xm1
k (x) 0
0
T
As before, hm+1 (0) = 0 and Dhm+1 (0)
edly, obtaining the following:
h (x) =
=
F1 h2 G1 (x)
F1 F2 h3 G2 G1 (x)
..
.
F1 Fn1 hn Gn1 G1 (x)
where
Pn1 hn (x) = Pn1 (x) =
x1
xn1
x1
xn1
(x)
Therefore, define the primitive function, Gn (x) to equal hn (x). This proves the theorem.
The definition of the Riemann integral of a function of n variables uses the following
definition.
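Definition C.0.3 For $i = 1, \dots, n$, let $\{\alpha_k^i\}_{k=-\infty}^{\infty}$ be points on $\mathbb{R}$ which satisfy
$$\lim_{k\to\infty} \alpha_k^i = \infty, \quad \lim_{k\to-\infty} \alpha_k^i = -\infty, \quad \alpha_k^i < \alpha_{k+1}^i. \tag{3.1}$$
A grid on $\mathbb{R}^n$, denoted by $\mathcal{G}$ or $\mathcal{F}$, is the collection of boxes of the form
$$Q = \prod_{i=1}^{n} \left[\alpha^i_{j_i}, \alpha^i_{j_i+1}\right]. \tag{3.2}$$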
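Let $\{\gamma_k^i\}$ denote the union of the points $\{\alpha_k^i\}_{k=-\infty}^{\infty}$ and $\{\beta_k^i\}_{k=-\infty}^{\infty}$ used to construct $\mathcal{G}$ and $\mathcal{F}$. It is necessary to show that for each $i$ these points can be arranged in order. To do so, let $\gamma_0^i \equiv \alpha_0^i$. Now if
$$\gamma_{-j}^i, \dots, \gamma_0^i, \dots, \gamma_j^i$$
have been chosen such that they are in order and all distinct, let $\gamma_{j+1}^i$ be the first element of
$$\{\alpha_k^i\}_{k=-\infty}^{\infty} \cup \{\beta_k^i\}_{k=-\infty}^{\infty} \tag{3.3}$$
which is larger than $\gamma_j^i$ and let $\gamma_{-(j+1)}^i$ be the last element of 3.3 which is strictly smaller than $\gamma_{-j}^i$. The assumption 3.1 ensures such a first and last element exist. Now let the grid $\mathcal{G} \vee \mathcal{F}$ consist of boxes of the form
$$Q \equiv \prod_{i=1}^{n} \left[\gamma^i_{j_i}, \gamma^i_{j_i+1}\right].$$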
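The Riemann integral is only defined for functions $f$ which are bounded and are equal to zero off some bounded set, $D$. In what follows $f$ will always be such a function.
Definition C.0.5 Let $f$ be a bounded function which equals zero off a bounded set, $D$, and let $\mathcal{G}$ be a grid. For $Q \in \mathcal{G}$, define
$$M_Q(f) \equiv \sup\{f(x) : x \in Q\}, \quad m_Q(f) \equiv \inf\{f(x) : x \in Q\}. \tag{3.4}$$
Also define, for $Q$ a box, the volume of $Q$, denoted by $v(Q)$, by
$$v(Q) \equiv \prod_{i=1}^{n} (b_i - a_i), \quad Q \equiv \prod_{i=1}^{n} [a_i, b_i].$$
Now define upper sums, $U_{\mathcal{G}}(f)$, and lower sums, $L_{\mathcal{G}}(f)$, with respect to the indicated grid, by the formulas
$$U_{\mathcal{G}}(f) \equiv \sum_{Q \in \mathcal{G}} M_Q(f)\, v(Q), \quad L_{\mathcal{G}}(f) \equiv \sum_{Q \in \mathcal{G}} m_Q(f)\, v(Q).$$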
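As a small numerical illustration of these definitions, the following Python sketch computes upper and lower sums on a uniform grid of the unit square. The function names, the choice $f(x, y) = x + y$, and the restriction to boxes inside $[0,1]^2$ are assumptions made for this example and are not part of the notes. Because this $f$ is increasing in each variable, its supremum and infimum on a box sit at the upper right and lower left corners; neighbouring boxes that merely touch the boundary of the square are ignored here.

```python
from itertools import product


def upper_lower_sums(f, n_per_side):
    """Upper and lower sums of f over a uniform grid of [0, 1]^2.

    Assumption (for this sketch only): f is nondecreasing in each
    variable, so sup/inf over a box are attained at opposite corners.
    """
    h = 1.0 / n_per_side
    volume = h * h                          # v(Q) for every box of the grid
    upper = lower = 0.0
    for i, j in product(range(n_per_side), repeat=2):
        m_q = f(i * h, j * h)               # inf of f on Q (lower-left corner)
        M_q = f((i + 1) * h, (j + 1) * h)   # sup of f on Q (upper-right corner)
        upper += M_q * volume
        lower += m_q * volume
    return upper, lower


def f(x, y):
    return x + y


U, L = upper_lower_sums(f, 100)
print(U, L, U - L)   # U - L shrinks like 2 / n_per_side
```

For this particular $f$ the sums can be worked out by hand as $1 + 1/N$ and $1 - 1/N$ with $N$ boxes per side, so both squeeze the value 1 as the grid is refined.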
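$$\sum_{Q \in \mathcal{F}} m_Q(f)\, v(Q) = \sum_{P \in \mathcal{G}} \sum_{Q \subseteq P} m_Q(f)\, v(Q) \geq \sum_{P \in \mathcal{G}} m_P(f) \sum_{Q \subseteq P} v(Q) = \sum_{P \in \mathcal{G}} m_P(f)\, v(P) \equiv L_{\mathcal{G}}(f).$$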
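$$U_{\mathcal{G}}(f) - L_{\mathcal{F}}(f) \leq \overline{I}(f) + \frac{\varepsilon}{2} - \left(\underline{I}(f) - \frac{\varepsilon}{2}\right) = \varepsilon.$$
Then letting $\mathcal{H} = \mathcal{G} \vee \mathcal{F}$, Lemma C.0.6 implies
$$U_{\mathcal{H}}(f) - L_{\mathcal{H}}(f) \leq U_{\mathcal{G}}(f) - L_{\mathcal{F}}(f) < \varepsilon.$$
Conversely, if for all $\varepsilon > 0$ there exists $\mathcal{G}$ such that
$$U_{\mathcal{G}}(f) - L_{\mathcal{G}}(f) < \varepsilon,$$
then
$$\overline{I}(f) - \underline{I}(f) \leq U_{\mathcal{G}}(f) - L_{\mathcal{G}}(f) < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, this proves the theorem.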
C.1 Basic Properties
(3.5)
2 >
QG
Pk
v (Q)
Pk
and so for k = f, g,
>>
v (Q) .
(3.6)
Pk
Suppose for k = f, g,
MQ (k) mQ (k) .
Then if x1 , x2 Q,
|f (x1 ) f (x2 )| < , and |g (x1 ) g (x2 )| < .
Therefore,
|h (x1 ) h (x2 )| | (f (x1 ) , g (x1 )) (f (x2 ) , g (x2 ))| <
and it follows that
|MQ (h) mQ (h)| .
Now let
S {Q G : 0 < MQ (k) mQ (k) , k = f, g} .
Thus the union of the boxes in S is contained in some large box, R, which depends only
on f and g and also, from the assumption that (0, 0) = 0, MQ (h) mQ (h) = 0 unless
Q R. Then
X
(MQ (h) mQ (h)) v (Q) +
UG (h) LG (h)
QPf
X
QPg
v (Q) .
QS
Now since K is compact, it follows (K) is bounded and so there exists a constant,
C, depending only on h and such that MQ (h) mQ (h) < C. Therefore, the above
inequality implies
X
X
X
UG (h) LG (h) C
v (Q) + C
v (Q) +
v (Q) ,
QPf
QPg
QS
Rn
and
$$\int_{\mathbb{R}^n} |f|\, dx \geq \left| \int_{\mathbb{R}^n} f\, dx \right|. \tag{3.8}$$
QG
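and so
$$\left| \int k\, dx - \sum_{Q \in \mathcal{G}} k(x_Q)\, v(Q) \right| < \varepsilon. \tag{3.9}$$
Consequently, since
$$\sum_{Q \in \mathcal{G}} (af + bg)(x_Q)\, v(Q) = a \sum_{Q \in \mathcal{G}} f(x_Q)\, v(Q) + b \sum_{Q \in \mathcal{G}} g(x_Q)\, v(Q),$$
it follows
$$\begin{aligned} \left| \int (af + bg)\, dx - a \int f\, dx - b \int g\, dx \right| \leq{} & \left| \int (af + bg)\, dx - \sum_{Q \in \mathcal{G}} (af + bg)(x_Q)\, v(Q) \right| \\ & + \left| a \sum_{Q \in \mathcal{G}} f(x_Q)\, v(Q) - a \int f\, dx \right| + \left| b \sum_{Q \in \mathcal{G}} g(x_Q)\, v(Q) - b \int g\, dx \right| \\ \leq{} & \varepsilon + |a|\varepsilon + |b|\varepsilon. \end{aligned}$$
Since $\varepsilon$ is arbitrary, this establishes Formula 3.7 and shows the integral is linear.
It remains to establish the inequality 3.8. By 3.9, and the triangle inequality for sums,
$$\int |f|\, dx + \varepsilon \geq \sum_{Q \in \mathcal{G}} |f(x_Q)|\, v(Q) \geq \left| \sum_{Q \in \mathcal{G}} f(x_Q)\, v(Q) \right| \geq \left| \int f\, dx \right| - \varepsilon.$$
Then since $\varepsilon$ is arbitrary, this establishes the desired inequality. This proves the corollary.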
C.2
Which functions are in R (Rn )? As in the case of integrals of functions of one variable,
this is an important question. It turns out the Riemann integrable functions are characterized by being continuous except on a very small set. This has to do with Jordan
content.
Definition C.2.1 A bounded set, $E$, has Jordan content 0 or content 0 if for every $\varepsilon > 0$ there exists a grid, $\mathcal{G}$, such that
$$\sum_{Q \cap E \neq \emptyset} v(Q) < \varepsilon.$$
This symbol says to sum the volumes of all boxes from $\mathcal{G}$ which have nonempty intersection with $E$.
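To make the definition concrete, here is a hedged Python sketch that adds up the volumes of the grid boxes meeting the diagonal segment $\{(t,t) : 0 \le t \le 1\}$ in $\mathbb{R}^2$. The choice of set, the uniform grid of side $1/n$, and the function name are assumptions for illustration only. The total volume tends to 0 as the grid is refined, which is what content 0 requires.

```python
from fractions import Fraction


def diagonal_cover_volume(n):
    """Total volume of the grid boxes of side 1/n that meet the diagonal
    {(t, t) : 0 <= t <= 1}.  Boxes are [i/n, (i+1)/n] x [j/n, (j+1)/n]."""
    count = 0
    for i in range(-1, n + 1):
        for j in range(-1, n + 1):
            # The closed box meets the diagonal iff some t in [0, 1] lies in
            # both [i/n, (i+1)/n] and [j/n, (j+1)/n]; compare in units of 1/n.
            if max(i, j, 0) <= min(i + 1, j + 1, n):
                count += 1
    return Fraction(count, n * n)


for n in (10, 100, 1000):
    print(n, float(diagonal_cover_volume(n)))
```

The number of boxes met grows only linearly in $n$ while each box has volume $1/n^2$, so the sum can be made smaller than any given $\varepsilon$.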
Next it is necessary to define the oscillation of a function.
Definition C.2.2 Let $f$ be a function defined on $\mathbb{R}^n$ and let
$$\omega_{f,r}(x) \equiv \sup\{|f(z) - f(y)| : z, y \in B(x,r)\}.$$
This is called the oscillation of $f$ on $B(x,r)$. Note that this function of $r$ is nondecreasing in $r$, since shrinking $r$ shrinks the set over which the supremum is taken. Define the oscillation of $f$ as
$$\omega_f(x) \equiv \lim_{r \to 0+} \omega_{f,r}(x).$$
Note that as $r$ decreases, the function $\omega_{f,r}(x)$ decreases. It is also bounded below by 0 and so the limit must exist and equals $\inf\{\omega_{f,r}(x) : r > 0\}$. (Why?) Then the following simple lemma, whose proof follows directly from the definition of continuity, gives the reason for this definition.
Lemma C.2.3 A function $f$ is continuous at $x$ if and only if $\omega_f(x) = 0$.
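The lemma can be seen numerically with a sampled estimate of the oscillation on a ball. The sketch below works in one variable with a step function; the sampling scheme and the names `oscillation_on_ball` and `step` are assumptions of this example, and a finite sample can only underestimate the true supremum.

```python
def oscillation_on_ball(f, x, r, samples=201):
    """Sampled estimate of sup{|f(z) - f(y)| : z, y in B(x, r)} in one variable.

    The supremum of |f(z) - f(y)| over pairs equals max f - min f on the ball.
    """
    points = [x - r + 2 * r * (i + 1) / (samples + 1) for i in range(samples)]
    values = [f(p) for p in points]
    return max(values) - min(values)


def step(t):
    return 1.0 if t >= 0 else 0.0


for r in (1.0, 0.1, 0.001):
    print(r, oscillation_on_ball(step, 0.0, r), oscillation_on_ball(step, 0.5, r))
```

At the jump the estimate stays at 1 however small $r$ is, while at a point of continuity it drops to 0 once the ball no longer reaches the jump.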
This concept of oscillation gives a way to define how discontinuous a function is at
a point. The discussion will depend on the following fundamental lemma which gives
the existence of something called the Lebesgue number.
Definition C.2.4 Let $\mathcal{C}$ be a set whose elements are sets of $\mathbb{R}^n$ and let $K \subseteq \mathbb{R}^n$. The set $\mathcal{C}$ is called a cover of $K$ if every point of $K$ is contained in some set of $\mathcal{C}$. If the elements of $\mathcal{C}$ are open sets, it is called an open cover.
Lemma C.2.5 Let $K$ be sequentially compact and let $\mathcal{C}$ be an open cover of $K$. Then there exists $r > 0$ such that whenever $x \in K$, $B(x, r)$ is contained in some set of $\mathcal{C}$.
Proof: Suppose this is not so. Then letting $r_n = 1/n$, there exists $x_n \in K$ such that $B(x_n, r_n)$ is not contained in any set of $\mathcal{C}$. Since $K$ is sequentially compact, there is a subsequence, $x_{n_k}$, which converges to a point $x \in K$. But there exists $\delta > 0$ such that $B(x, \delta) \subseteq U$ for some $U \in \mathcal{C}$. Let $k$ be so large that $1/k < \delta/2$ and $|x_{n_k} - x| < \delta/2$ also. Then if $z \in B(x_{n_k}, r_{n_k})$, it follows
$$|z - x| \leq |z - x_{n_k}| + |x_{n_k} - x| < \frac{\delta}{2} + \frac{\delta}{2} = \delta$$
and so $B(x_{n_k}, r_{n_k}) \subseteq U$, contrary to supposition. Therefore, the desired number exists after all.
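For a finite cover of an interval by open intervals, a number of this kind can be estimated directly. The following Python sketch is only an illustration under assumed data: the cover, the sample of $K = [0,1]$, and the function name are hypothetical. For each sample point it finds the largest radius for which the ball fits inside a single set of the cover, then takes the smallest such radius over the sample.

```python
def lebesgue_number_estimate(cover, points):
    """For open intervals (a, b) covering the sample points, return the
    smallest over the points of the largest radius r with B(x, r) inside
    a single interval of the cover."""
    def best_radius(x):
        # min(x - a, b - x) is the largest radius fitting in (a, b);
        # it is negative when x is not in (a, b), so max ignores those.
        return max(min(x - a, b - x) for a, b in cover)
    return min(best_radius(x) for x in points)


cover = [(-0.1, 0.6), (0.4, 1.1)]          # hypothetical open cover of [0, 1]
points = [i / 1000 for i in range(1001)]   # sample of K = [0, 1]
print(lebesgue_number_estimate(cover, points))
```

Any positive $r$ below the printed value then has the property stated in the lemma for this particular cover.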
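Theorem C.2.6 Let $f$ be a bounded function which equals zero off a bounded set and let $W$ denote the set of points where $f$ fails to be continuous. Then $f \in \mathcal{R}(\mathbb{R}^n)$ if $W$ has content zero. That is, for all $\varepsilon > 0$ there exists a grid, $\mathcal{G}$, such that
$$\sum_{Q \in \mathcal{G}_W} v(Q) < \varepsilon \tag{3.10}$$
where
$$\mathcal{G}_W \equiv \{Q \in \mathcal{G} : Q \cap W \neq \emptyset\}.$$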
Proof: Let $W$ have content zero. Also let $|f(x)| < C/2$ for all $x \in \mathbb{R}^n$, let $\varepsilon > 0$ be given, and let $\mathcal{G}$ be a grid which satisfies 3.10. Since $f$ equals zero off some bounded set, there exists $R$ such that $f$ equals zero off of $B\left(0, \frac{R}{2}\right)$. Thus $W \subseteq B\left(0, \frac{R}{2}\right)$. Also note that if $\mathcal{G}$ is a grid for which 3.10 holds, then this inequality continues to hold if $\mathcal{G}$ is replaced with a refined grid. Therefore, you may assume the diameter of every box in $\mathcal{G}$ which intersects $B(0, R)$ is less than $\frac{R}{3}$ and so all boxes of $\mathcal{G}$ which intersect the set where $f$ is nonzero are contained in $B(0, R)$. Since $W$ is bounded, $\mathcal{G}_W$ contains only finitely many boxes. Letting
$$Q \equiv \prod_{i=1}^{n} [a_i, b_i]$$
be one of these boxes, enlarge the box slightly as indicated in the following picture.
n
Y
(ai i , bi + i )
i=1
e ,
( bi + i (ai i )) v Q
g
e
and G
W denotes those Q for Q G which have nonempty intersection with W, then
X e
e <
v Q
(3.11)
e G
g
Q
W
ee
where Q
is the box,
n
Y
( bi + 2 i (ai 2 i ))
i=1
(3.12)
ee
because each Q in FW is contained in a set, Q
described above and the sum of the
volumes of these is less than by 3.11. Then
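$$U_{\mathcal{F}}(f) - L_{\mathcal{F}}(f) = \sum_{Q \in \mathcal{F}_W} (M_Q(f) - m_Q(f))\, v(Q) + \sum_{Q \in \mathcal{F}_1 \setminus \mathcal{F}_W} (M_Q(f) - m_Q(f))\, v(Q). \tag{3.13}$$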
If Q F1 \ FW , then Q must be a subset of some set of C \CW since it is not in any set
f1 B (x,rx ) where x
of CW . Say Q Q
/ W . Therefore, from 3.12 and the observation
that x
/ W, it follows f (x) = 0 and so
MQ (f ) mQ (f ) .
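Therefore, from 3.13 and the estimate on $f$,
$$U_{\mathcal{F}}(f) - L_{\mathcal{F}}(f) \leq \sum_{Q \in \mathcal{F}_W} C\, v(Q) + \sum_{Q \in \mathcal{F}_1 \setminus \mathcal{F}_W} \varepsilon\, v(Q) \leq C\varepsilon + \varepsilon (2R)^n,$$
the estimate of the second sum coming from the fact
$$B(0, R) \subseteq \prod_{i=1}^{n} [-R, R].$$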
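$$\mathcal{X}_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}$$
It is called the indicator function because it indicates whether $x$ is in $E$ according to whether it equals 1. For a function $f \in \mathcal{R}(\mathbb{R}^n)$ and $E$ a contented set, $f\mathcal{X}_E \in \mathcal{R}(\mathbb{R}^n)$ by Corollary C.1.2. Then
$$\int_E f\, dV \equiv \int f \mathcal{X}_E\, dV.$$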
(3.14)
Also let K j=1 vn (Qj ) where the Qj are the boxes which intersect E. Let {ai }i=
be a sequence on R, ai < ai+1 for all i, which includes
Q0 G 0
X
m
X
i= j=1
and all sums are bounded because the functions, f and g are given to be bounded.
Therefore, there are no limit considerations needed here. Thus
UG 0 (XP ) LG 0 (XP ) =
m
X
j=1
vn (Qj )
Consider the inside sum with the aid of the following picture.
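[Figure: the boxes $Q_j \times [a_i, a_{i+1}]$ stacked over $Q_j$ along the $x_{n+1}$ axis, crossed by the surfaces $x_{n+1} = g(x)$ and $x_{n+1} = f(x)$, with $m_{Q_j}(g)$ and $M_{Q_j}(g)$ marked.]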
In this picture, the little rectangles represent the boxes $Q_j \times [a_i, a_{i+1}]$ for fixed $j$. The part of $P$ having $x$ contained in $Q_j$ is between the two surfaces, $x_{n+1} = g(x)$ and $x_{n+1} = f(x)$, and there is a zero placed in those boxes for which
$$M_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) - m_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) = 0.$$
You see, $\mathcal{X}_P$ has either the value of 1 or the value of 0 depending on whether $(x, y)$ is contained in $P$. For the boxes shown with 0 in them, either all of the box is contained in $P$ or none of the box is contained in $P$. Either way,
$$M_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) - m_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) = 0$$
on these boxes. However, on the boxes intersected by the surfaces, the value of
$$M_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) - m_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P)$$
is 1 because there are points in this box which are not in $P$ as well as points which are in $P$. Because of the construction of $\mathcal{G}'$ which included all values of
MQj (f XE ) +
, MQj (f XE ) ,
4mK
MQj (gXE ) , mQj (f XE ) , mQj (gXE )
for all j = 1, , m,
i=
1 (ai+1 ai ) +
1 (ai+1 ai )
The first of the sums in 3.15 contains all possible terms for which
$$M_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P) - m_{Q_j \times [a_i, a_{i+1}]}(\mathcal{X}_P)$$
might be 1 due to the graph of the bottom surface $g\mathcal{X}_E$ while the second sum contains all possible terms for which the expression might be 1 due to the graph of the top surface $f\mathcal{X}_E$.
1
m
X
v (Qj ) .
= MQj (gXE ) mQj (gXE ) + MQj (f XE ) mQj (f XE ) +
2m j=1
Therefore, by 3.14,
UG 0 (XP ) LG 0 (XP )
m
X
vn (Qj )
j=1
1
m
X
+
v (Qj )
v (Qj )
2m
j=1
j=1
m
X
UG (f ) LG (f ) + UG (g) LG (g) +
<
+ + = .
4 4 2
a i X Bi
i=1
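Proof: Let $\mathcal{G}$ be a grid with
$$\sum_{Q \cap E \neq \emptyset} v(Q) < \frac{\varepsilon}{1 + (M - m)}.$$
Then
$$U_{\mathcal{G}}(f\mathcal{X}_E) \leq \sum_{Q \cap E \neq \emptyset} M\, v(Q) \leq \frac{\varepsilon M}{1 + (M - m)}$$
and
$$L_{\mathcal{G}}(f\mathcal{X}_E) \geq \sum_{Q \cap E \neq \emptyset} m\, v(Q) \geq \frac{\varepsilon m}{1 + (M - m)}$$
and so
$$U_{\mathcal{G}}(f\mathcal{X}_E) - L_{\mathcal{G}}(f\mathcal{X}_E) \leq \sum_{Q \cap E \neq \emptyset} M\, v(Q) - \sum_{Q \cap E \neq \emptyset} m\, v(Q) = (M - m) \sum_{Q \cap E \neq \emptyset} v(Q) < \frac{\varepsilon (M - m)}{1 + (M - m)} < \varepsilon.$$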
f XE dV M
v (Q) <
QE6=
i=1
Ei
Z
f dV +
Ei
f dV =
Er+1
n
Y
i=1
[ai , bi ] , Q0 =
i=1
n
Y
f dV
Ei
(ai , bi ]
i=1
Z
XQ dV =
This is because
r+1 Z
X
Q0
XQ0 dV = v (Q) .
Q \ Q0 = ni=1 ai
(3.16)
Y
(aj , bj ]
j6=i
a finite union of sets of content 0. It is obvious $\int \mathcal{X}_Q\, dV = v(Q)$ because you can use a grid which has $Q$ as one of the boxes and then the upper and lower sums are the same and equal to $v(Q)$. Therefore, the claim about the equality of the two integrals in 3.16 follows right away from Corollary C.2.11. That $\mathcal{X}_{Q'}$ is integrable follows from
$$\mathcal{X}_{Q'} = \mathcal{X}_Q - \mathcal{X}_{Q \setminus Q'}$$
and each of the two functions on the right is integrable thanks to Theorem C.2.10.
In fact, here is an interesting version of the Riemann criterion which depends on
these half open boxes.
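Lemma C.2.12 Suppose $f$ is a bounded function which equals zero off some bounded set. Then $f \in \mathcal{R}(\mathbb{R}^n)$ if and only if for all $\varepsilon > 0$ there exists a grid, $\mathcal{G}$, such that
$$\sum_{Q \in \mathcal{G}} (M_{Q'}(f) - m_{Q'}(f))\, v(Q) < \varepsilon. \tag{3.17}$$
Proof: Since $Q' \subseteq Q$,
$$M_{Q'}(f) - m_{Q'}(f) \leq M_Q(f) - m_Q(f)$$
and therefore, the only if part of the equivalence is obvious.
Conversely, let $\mathcal{G}$ be a grid such that 3.17 holds with $\varepsilon$ replaced with $\varepsilon/2$. It is necessary to show there is a grid such that 3.17 holds with no primes on the $Q$. Let $\mathcal{F}$ be a refinement of $\mathcal{G}$ obtained by adding the points $\alpha_k^i + \eta_k$ where $\eta_k \leq \eta$ and $\eta$ is also chosen so small that for each $i = 1, \dots, n$,
$$\alpha_k^i + \eta_k < \alpha_{k+1}^i.$$
You only need to have $\eta_k > 0$ for the finitely many boxes of $\mathcal{G}$ which intersect the bounded set where $f$ is not zero. Then for
$$Q \equiv \prod_{i=1}^{n} \left[\alpha^i_{k_i}, \alpha^i_{k_i+1}\right] \in \mathcal{G},$$
let
$$\widehat{Q} \equiv \prod_{i=1}^{n} \left[\alpha^i_{k_i} + \eta_{k_i}, \alpha^i_{k_i+1}\right]$$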
and denote by $\widehat{\mathcal{G}}$ the collection of these smaller boxes. For each set $Q$ in $\mathcal{G}$ there is the smaller set $\widehat{Q}$ along with $n$ boxes, $B_k$, $k = 1, \dots, n$, one of whose sides is of length $\eta_k$ and the remainder of whose sides are shorter than the diameter of $Q$, such that the set $Q$ is the union of $\widehat{Q}$ and these sets $B_k$. Now suppose $f$ equals zero off the ball $B\left(0, \frac{R}{2}\right)$. Then without loss of generality, you may assume the diameter of every box in $\mathcal{G}$ which has nonempty intersection with $B(0, R)$ is smaller than $\frac{R}{3}$. (If this is not so, simply refine $\mathcal{G}$ to make it so, such a refinement leaving 3.17 valid because refinements do not increase the difference between upper and lower sums in this context either.) Suppose there are $P$ sets of $\mathcal{G}$ contained in $B(0, R)$ (so these are the only sets of $\mathcal{G}$ which could have nonempty intersection with the set where $f$ is nonzero) and suppose that for all $x$, $|f(x)| < C/2$. Then
$$\sum_{Q \in \mathcal{F}} (M_Q(f) - m_Q(f))\, v(Q) \leq \sum_{\widehat{Q} \in \widehat{\mathcal{G}}} \left(M_{\widehat{Q}}(f) - m_{\widehat{Q}}(f)\right) v(Q) + \sum_{Q \in \mathcal{F} \setminus \widehat{\mathcal{G}}} (M_Q(f) - m_Q(f))\, v(Q).$$
The first term on the right of the inequality in the above is no larger than $\varepsilon/2$ because $M_{\widehat{Q}}(f) - m_{\widehat{Q}}(f) \leq M_{Q'}(f) - m_{Q'}(f)$ for each $Q$. Therefore, the above is dominated by
$$\varepsilon/2 + CPnR^{n-1}\eta < \varepsilon$$
whenever $\eta$ is small enough. Since $\varepsilon$ is arbitrary, $f \in \mathcal{R}(\mathbb{R}^n)$ as claimed.
C.3 Iterated Integrals
Now the result is clearly a function of $x$ and so, it might be possible to integrate this and write
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(x, y)\, dV_y\, dV_x.$$
This symbol is called an iterated integral, because it involves the iteration of two lower dimensional integrations. Under what conditions are the two iterated integrals equal to the integral
$$\int_{\mathbb{R}^{n+m}} f(z)\, dV\,?$$
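The question can be explored numerically before it is answered. The following Python sketch compares a midpoint approximation of the iterated integral with a single midpoint sum over all boxes, for $n = m = 1$ on the unit square. The integrand $f(x,y) = xy^2$, the midpoint rule, and the function names are assumptions chosen for this example; it illustrates the agreement and does not prove it.

```python
def midpoints(n):
    """Midpoints of n equal subintervals of [0, 1]."""
    return [(k + 0.5) / n for k in range(n)]


def iterated_sum(f, n):
    """Approximate the iterated integral: inner sum in y, then outer sum in x."""
    h = 1.0 / n
    total = 0.0
    for x in midpoints(n):
        inner = sum(f(x, y) for y in midpoints(n)) * h   # approximates the y-integral
        total += inner * h
    return total


def double_sum(f, n):
    """Approximate the integral over the square with one sum over all boxes."""
    h = 1.0 / n
    return sum(f(x, y) for x in midpoints(n) for y in midpoints(n)) * h * h


def f(x, y):
    return x * y ** 2


print(iterated_sum(f, 50), double_sum(f, 50))   # both are close to 1/6
```

For finite sums the two orders of summation agree up to rounding; the content of the theorems in this section is that the agreement survives the passage to the limit under suitable hypotheses.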
ki , iki +1 , ki n + 1
ki , iki +1 , ki n.
i=1
where Q equals zero for all but finitely many Q. Thus is a step function. Recall that
for
n+m
n+m
Y
Y
Q=
[ai , bi ] , Q0
(ai , bi ]
i=1
i=1
The function
=
Q XQ0
QG
is integrable because it is a finite sum of integrable functions, each function in the sum
being integrable because the set of discontinuities has Jordan content 0. (why?) Letting
(x, y) = z,
X X
(z) = (x, y) =
RP XR0 P 0 (x, y)
RGn P Gm
X X
(3.19)
RGn P Gm
(3.20)
(, y) dVy R (Rn ) ,
(3.21)
Z
Rm
and
Z
Rn
Rm
Z
(x, y) dVy dVx =
(z) dV.
(3.22)
Rn+m
Where x R and this is a finite sum of integrable functions because each has set of
discontinuities with Jordan content 0. From the description in 3.19,
Z
X X
(x, y) dVy =
RP XR0 (x) v (P )
Rm
RGn P Gm
RGn
RP v (P ) XR0 (x) ,
(3.23)
P Gm
Rm
RGn P Gm
Z
(z) dV.
Q v (Q) =
Rn+m
QG
!
)
Z
X
X
MR10
(, y) dVy sup
RP v (P ) XR0 (x) : x R10
Rm
RGn
P Gm
R1 P v (P )
(3.24)
P Gm
(, y) dVy has the constant value given in 3.24 for x R10 . Similarly,
)
(
!
Z
X
X
mR10
(, y) dVy inf
RP v (P ) XR0 (x) : x R10
because
Rm
Rm
RGn
P Gm
R1 P v (P ) .
(3.25)
P Gm
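Theorem C.3.4 (Fubini) Let $f \in \mathcal{R}(\mathbb{R}^{n+m})$ and suppose also that $f(x, \cdot) \in \mathcal{R}(\mathbb{R}^m)$ for each $x$. Then
$$\int_{\mathbb{R}^m} f(\cdot, y)\, dV_y \in \mathcal{R}(\mathbb{R}^n) \tag{3.26}$$
and
$$\int_{\mathbb{R}^{n+m}} f(z)\, dV = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(x, y)\, dV_y\, dV_x. \tag{3.27}$$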
QG
X
MR0
f (, y) dVy MR0
(, y) dVy =
MR0 P 0 (f ) v (P )
Rm
Rm
Z
mR0
f (, y) dVy mR0
Rm
Rm
P Gm
(, y) dVy
X
P Gm
mR0 P 0 (f ) v (P ) .
Therefore,
Z
X
0
MR
Rm
RGn
X X
f (, y) dVy
R0
Rm
f (, y) dVy
v (R)
RGn P Gm
This shows, from Lemma C.2.12 and the Riemann criterion, that $\int_{\mathbb{R}^m} f(\cdot, y)\, dV_y \in \mathcal{R}(\mathbb{R}^n)$. It remains to verify 3.27. First note
$$\int_{\mathbb{R}^{n+m}} f(z)\, dV \in [L_{\mathcal{G}}(f), U_{\mathcal{G}}(f)].$$
Next,
Z
Z
LG (f )
Z
dVy dVx
dV =
Rn
Rn+m
Z
Rn
Rm
Z
Rm
Z
Rn
Rn
Rm
Therefore,
Rn+m
Rm
dV UG (f ) .
f (z) dV
n+m
Z Z
(x)
f dV =
P
f (x, y) dy dVx .
E
(x)
Since f is continuous,
Proof:
Z Z
R
(x)
f (x, y) dy dVx
E
(x)
C.4
First recall Theorem B.2.2 on Page 434 which is listed here for convenience.
1
f (s) ds =
c
f ( (t)) 0 (t) dt
Dh (x) v C
|v|
and so the desired formula follows when you multiply both sides by |v|.
Definition C.4.5 Let A be an open set. Write C k (A; Rn ) to denote a C k function
n
n
whose domain
1
h C U ; Rn be one to one and Dh (x) exists for all x U. Then h (U ) = (h (U ))
and (h (U )) has zero content.
Proof: Let x U and let g = h where g is a C 1 function defined on an open set
containing U . By the inverse function theorem, g is locally one to one and an open
mapping near x.
g (x) = h (x) and
is in an open set containing points of g (U )
Thus
and points of g U C . These points of g U C cannot equal any points of h (U ) because
g is one to one locally. Thus h (x) (h (U )) and so h (U ) (h (U )) . Now suppose
y (h (U )) . By the inverse function theorem y cannot be in the open set h (U ) . Since
y (h (U )), every ball centered at y contains points of h (U ) and so y h (U )\h (U ) .
Thus there exists a sequence, {xn } U such that h (xn ) y. But then, by the inverse
and by refining the grid if necessary, no box of $\mathcal{G}$ has nonempty intersection with both U and $H^C$. Refining this grid still more, you can also assume that for all boxes in $\mathcal{G}'$,
$$\frac{l_i}{l_j} < 2$$
where $l_i$ is the length of the $i$th side. (Thus the boxes are not too far from being cubes.) Let $C$ be the constant of Lemma C.4.4 applied to $g$ on $H$.
Now consider one of these boxes, $Q \in \mathcal{G}'$. If $x, y \in Q$, it follows from the chain rule that
$$g(y) - g(x) = \int_0^1 Dg(x + t(y - x))\,(y - x)\, dt$$
|g (y) g (x)|
|x y| dt C diam (Q)
0
n
X
!1/2
li2
C nL
i=1
X
QG 0
v (PQ ) C n nn/2 2n
QG 0
Lemma C.4.8 Let $U$ be a bounded open set and let $f\mathcal{X}_U \in \mathcal{R}(\mathbb{R}^n)$. Then
$$\int f(x + p)\, \mathcal{X}_{U - p}(x)\, dx = \int f(x)\, \mathcal{X}_U(x)\, dx$$
A few more lemmas are needed.
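Lemma C.4.9 Let $S$ be a nonempty subset of $\mathbb{R}^n$. Define
$$f(x) \equiv \operatorname{dist}(x, S) \equiv \inf\{|x - y| : y \in S\}.$$
Then $f$ is continuous.
Proof: Consider $|f(x) - f(x_1)|$ and suppose without loss of generality that $f(x_1) \geq f(x)$. Then choose $y \in S$ such that $f(x) + \varepsilon > |x - y|$. Then
$$|f(x_1) - f(x)| = f(x_1) - f(x) \leq f(x_1) - |x - y| + \varepsilon \leq |x_1 - y| - |x - y| + \varepsilon \leq |x - x_1| + |x - y| - |x - y| + \varepsilon = |x - x_1| + \varepsilon.$$
Since $\varepsilon$ is arbitrary, it follows that $|f(x_1) - f(x)| \leq |x - x_1|$ and this proves the lemma.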
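The inequality $|f(x_1) - f(x)| \leq |x - x_1|$ obtained in the proof can be spot-checked numerically. The Python sketch below does this for a finite set $S$ in the plane, where the infimum is a minimum; the random data, the seed, and the name `dist_to_set` are assumptions of this example.

```python
import math
import random


def dist_to_set(x, S):
    """dist(x, S) = inf{|x - y| : y in S} for a finite set S of points in R^n."""
    return min(math.dist(x, y) for y in S)


random.seed(0)
S = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]

worst = 0.0
for _ in range(1000):
    x = (random.uniform(-2, 2), random.uniform(-2, 2))
    x1 = (random.uniform(-2, 2), random.uniform(-2, 2))
    # gap > 0 would mean the Lipschitz bound |f(x1) - f(x)| <= |x - x1| fails
    gap = abs(dist_to_set(x1, S) - dist_to_set(x, S)) - math.dist(x, x1)
    worst = max(worst, gap)

print(worst)   # never exceeds 0 beyond rounding error
```

A random test of this kind cannot replace the proof, but a positive gap well above rounding error would signal a mistake in the implementation.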
Theorem C.4.10 (Urysohn's lemma for $\mathbb{R}^n$) Let $H$ be a closed subset of an open set, $U$. Then there exists a continuous function, $g : \mathbb{R}^n \to [0, 1]$, such that $g(x) = 1$ for all $x \in H$ and $g(x) = 0$ for all $x \notin U$.
Proof: If $x \notin C$, a closed set, then $\operatorname{dist}(x, C) > 0$ because if not, there would exist a sequence of points of $C$ converging to $x$ and it would follow that $x \in C$. Therefore, $\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C) > 0$ for all $x \in \mathbb{R}^n$. Now define a continuous function, $g$, as
$$g(x) \equiv \frac{\operatorname{dist}(x, U^C)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C)}.$$
It is easy to see this verifies the conclusions of the theorem and this proves the theorem.
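The formula in the proof is easy to evaluate in a concrete case. The sketch below takes $n = 1$ with $H = [-1, 1]$ and $U = (-2, 2)$; these sets and the function names are assumptions made for illustration, and the two distance functions are written out in closed form for this choice.

```python
def dist_to_H(x):
    """Distance from x to H = [-1, 1]."""
    return max(abs(x) - 1.0, 0.0)


def dist_to_U_complement(x):
    """Distance from x to the complement of U = (-2, 2)."""
    return max(2.0 - abs(x), 0.0)


def g(x):
    """The function from the proof: 1 on H, 0 off U, continuous in between."""
    return dist_to_U_complement(x) / (dist_to_H(x) + dist_to_U_complement(x))


for x in (0.0, 1.0, 1.5, 2.0, 3.0):
    print(x, g(x))
```

The denominator is never zero here because no point is at distance 0 from both $H$ and the complement of $U$, which is exactly the observation the proof relies on.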
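Definition C.4.11 Define $\operatorname{spt}(f)$ (support of $f$) to be the closure of the set $\{x : f(x) \neq 0\}$. If $V$ is an open set, $C_c(V)$ will be the set of continuous functions $f$, defined on $\mathbb{R}^n$, having $\operatorname{spt}(f) \subseteq V$.
Definition C.4.12 If $K$ is a compact subset of an open set, $V$, then $K \prec \phi \prec V$ if
$$\phi \in C_c(V), \quad \phi(K) = \{1\}, \quad \phi(\mathbb{R}^n) \subseteq [0, 1].$$
Also for $\phi \in C_c(\mathbb{R}^n)$, $K \prec \phi$ if
$$\phi(\mathbb{R}^n) \subseteq [0, 1] \text{ and } \phi(K) = 1.$$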
and V if
n
X
i=1
for all x K.
i (x) = 1
Wi Ui Vi
smaller than a Lebesgue number for this open cover. Denote by G 0 those boxes of G
whose union equals the set, K. Thus every box of G 0 is contained in one of these Oj . By
Theorem C.4.13 there exists a partition of unity, j on h (K) such that j h (Oj ).
Then
X Z
LG (g)
XQ (x) f (h (x)) |det Dh (x)| dx
QG 0
q Z
X X
(3.28)
QG 0 j=1
Consider the term $\int \mathcal{X}_Q(x)\, (\psi_j f)(h(x))\, |\det Dh(x)|\, dx$. By Lemma C.4.8 and Fubini's theorem this equals
Z
Z
DG1 (x) = x
(x) . Fixing x2 , , xn , change the variable,
1
y1 = (x1 , x2 , , xn ) .
Thus
1
0
x = (x1 , x2 , , xn ) = G1
1 (y1 , x2 , , xn ) G1 (x )
0
0
XQpj G1
j f h (pi ) + F1 Fn1 Gn G1 G1
1 (x )
1 (x )
n1
R R
1 0
1 0
DF Gn G1 G (x ) DGn Gn1 G1 G (x )
1
1
which reduces to
Z
0
XQpj G1
j f (h (pi ) + F1 Fn1 Gn G2 (x0 ))
1 (x )
Rn
|DF (Gn G2 (x0 ))| |DGn (Gn1 G2 (x0 ))| |DGn1 (Gn2 G2 (x0 ))|
|DG2 (x0 )| dVn .
(3.31)
Now use Fubini's theorem again to make the inside integral taken with respect to $x_2$. Exactly the same process yields
Z
Z
1
00
XQpj G1
j f (h (pi ) + F1 Fn1 Gn G3 (x00 ))
1 G2 (x )
Rn1
|DF (Gn G3 (x00 ))| |DGn (Gn1 G3 (x00 ))| |DGn1 (Gn2 G3 (x00 ))|
dy2 dVn1 .
(3.32)
Now $F$ is just a composition of flips and so $|DF(G_n \circ \cdots \circ G_3(x''))| = 1$ and so this term can be replaced with 1. Continuing this process eventually yields an expression of the form
Z
1
1
1
1
XQpj G1
(y) j f (h (pi ) + y) dVn . (3.33)
1 Gn2 Gn1 Gn F
Rn
1
1
1
Denoting by G1 the expression, G1
1 Gn2 Gn1 Gn ,
1
1
1
1
XQpj G1
(y) = 1
1 Gn2 Gn1 Gn F
Rn
X Z
LG (g)
QG 0
=
=
q Z
X X
QG 0 j=1
q Z
X X
QG 0 j=1
X Z
QG 0
Rn
Rn
Xh(U ) (z) f (z) det Dh h1 (z) det Dh1 (z) dVn
Z
=
1 = det Dh h1 (z) det Dh1 (z) .
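Theorem C.4.15 Let $U$ be a bounded open set with $\partial U$ having content 0. Also let $h \in C^1(\overline{U}; \mathbb{R}^n)$ be one to one on $U$ with $Dh(x)^{-1}$ existing for all $x \in U$. Let $f \in C(\overline{U})$. Then
$$\int \mathcal{X}_{h(U)}(z)\, f(z)\, dz = \int \mathcal{X}_U(x)\, f(h(x))\, |\det Dh(x)|\, dx$$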
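A numerical check of the formula in a familiar case may help fix ideas. The sketch below uses polar coordinates, $h(r, \theta) = (r\cos\theta, r\sin\theta)$ on $U = (0,1) \times (0, 2\pi)$, for which $|\det Dh| = r$ and $h(U)$ is the open unit disk minus a slit of content 0. The integrand $f(z) = |z|^2$, the midpoint sums, and the function names are assumptions chosen for this example; both sides should be close to $\pi/2$.

```python
import math


def lhs_disk_integral(f, n):
    """Midpoint sum of X_{h(U)}(z) f(z) over a grid of [-1, 1]^2,
    where h(U) is the open unit disk (minus a slit of content zero)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            z1, z2 = -1.0 + (i + 0.5) * h, -1.0 + (j + 0.5) * h
            if z1 * z1 + z2 * z2 < 1.0:          # indicator of h(U)
                total += f(z1, z2) * h * h
    return total


def rhs_polar_integral(f, n_r, n_theta):
    """Midpoint sum of f(h(r, theta)) |det Dh(r, theta)| over U = (0, 1) x (0, 2*pi)."""
    dr, dtheta = 1.0 / n_r, 2.0 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            z1, z2 = r * math.cos(theta), r * math.sin(theta)   # h(r, theta)
            total += f(z1, z2) * r * dr * dtheta                # |det Dh| = r
    return total


def f(z1, z2):
    return z1 * z1 + z2 * z2


print(lhs_disk_integral(f, 400), rhs_polar_integral(f, 200, 64), math.pi / 2)
```

The left side converges more slowly because the indicator of the disk is discontinuous along the boundary circle, a set of content 0 that the grid can only approximate.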
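Proof: You note that the formula holds for $f^+ \equiv \frac{|f| + f}{2}$ and $f^- \equiv \frac{|f| - f}{2}$. Now $f = f^+ - f^-$ and so
$$\begin{aligned} \int \mathcal{X}_{h(U)}(z)\, f(z)\, dz &= \int \mathcal{X}_{h(U)}(z)\, f^+(z)\, dz - \int \mathcal{X}_{h(U)}(z)\, f^-(z)\, dz \\ &= \int \mathcal{X}_U(x)\, f^+(h(x))\, |\det Dh(x)|\, dx - \int \mathcal{X}_U(x)\, f^-(h(x))\, |\det Dh(x)|\, dx \\ &= \int \mathcal{X}_U(x)\, f(h(x))\, |\det Dh(x)|\, dx. \end{aligned}$$
C.5 Some Observations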
Some of the above material is very technical. This is because it gives complete answers to the fundamental questions on existence of the integral and related theoretical
considerations. However, most of the difficulties are artifacts. They shouldn't even be
considered! It was realized early in the twentieth century that these difficulties occur
because, from the point of view of mathematics, this is not the right way to define
an integral! Better results are obtained much more easily using the Lebesgue integral.
Many of the technicalities related to Jordan content disappear almost magically when
the right integral is used. However, the Lebesgue integral is more abstract than the
Riemann integral and it is not traditional to consider it in a beginning calculus course.
If you are interested in the fundamental properties of the integral and the theory behind it, you should abandon the Riemann integral which is an antiquated relic and
begin to study the integral of the last century. An introduction to it is in [23]. Another
very good source is [12]. This advanced calculus text does everything in terms of the
Lebesgue integral and never bothers to struggle with the inferior Riemann integral. A
more general treatment is found in [18], [19], [24], and [20]. There is also a still more
general integral called the generalized Riemann integral. A recent book on this subject
is [5]. It is far easier to define than the Lebesgue integral but the convergence theorems
are much harder to prove. An introduction is also in [19].
Bibliography
[1] Apostol, T. M., Calculus second edition, Wiley, 1967.
[2] Apostol T. Calculus Volume II Second edition, Wiley 1969.
[3] Apostol, T. M., Mathematical Analysis, Addison Wesley Publishing Co., 1974.
[4] Baker, Roger, Linear Algebra, Rinton Press 2001.
[5] Bartle R.G., A Modern Theory of Integration, Grad. Studies in Math., Amer.
Math. Society, Providence, RI, 2000.
[6] Chahal J. S. , Historical Perspective of Mathematics 2000 B.C. - 2000 A.D.
[7] Davis H. and Snider A., Vector Analysis Wm. C. Brown 1995.
[8] D'Angelo, J. and West D. Mathematical Thinking Problem Solving and Proofs,
Prentice Hall 1997.
[9] Edwards C.H. Advanced Calculus of several Variables, Dover 1994.
[10] Euclid, The Thirteen Books of the Elements, Dover, 1956.
[11] Fitzpatrick P. M., Advanced Calculus a course in Mathematical Analysis, PWS
Publishing Company 1996.
[12] Fleming W., Functions of Several Variables, Springer Verlag 1976.
[13] Greenberg, M. Advanced Engineering Mathematics, Second edition, Prentice
Hall, 1998
[14] Gurtin M. An introduction to continuum mechanics, Academic press 1981.
[15] Hardy G., A Course Of Pure Mathematics, Tenth edition, Cambridge University
Press 1992.
[16] Horn R. and Johnson C. Matrix Analysis, Cambridge University Press, 1985.
[17] Karlin S. and Taylor H. A First Course in Stochastic Processes, Academic
Press, 1975.
[18] Kuttler K. L., Basic Analysis, Rinton Press, 2001.
[19] Kuttler K.L., Modern Analysis CRC Press 1998.
[20] Lang S. Real and Functional analysis third edition Springer Verlag 1993.
[21] Noble B. and Daniel J. Applied Linear Algebra, Prentice Hall, 1977.
[22] Rose, David, A., The College Math Journal, vol. 22, No.2 March 1991.
[23] Rudin, W., Principles of mathematical analysis, McGraw Hill third edition 1976
[24] Rudin W., Real and Complex Analysis, third edition, McGraw-Hill, 1987.
[25] Salas S. and Hille E., Calculus One and Several Variables, Wiley 1990.
[26] Sears and Zemansky, University Physics, Third edition, Addison Wesley 1963.
[27] Tierney John, Calculus and Analytic Geometry, fourth edition, Allyn and Bacon,
Boston, 1969.
[28] Yosida K., Functional Analysis, Springer Verlag, 1978.
Index
C 1 , 247
C k , 247
, 370
2 , 370
D'Alembert, 242
deformation gradient, 382
density and mass, 316
derivative, 245
derivative of a function, 155
determinant, 53, 413
Laplace expansion, 56
product, 416
product of matrices, 58
transpose, 415
diameter, 146
difference quotient, 155
differentiable, 243
differentiable matrix, 160
differentiation rules, 158
directed line segment, 19
direction vector, 19
directional derivative, 235
directrix, 104
distance formula, 20
divergence, 369
divergence theorem, 374
donut, 360
dot product, 91
eigenvalue, 302
Einstein summation convention, 116
entries of a matrix, 30
equality of mixed partial derivatives, 240
Eulerian coordinates, 382
Jacobian, 333
Jacobian determinant, 334
Jordan content, 442
Jordan set, 445
joule, 97
Kepler's first law, 219
Kepler's laws, 219
Kepler's third law, 222
kilogram, 111
kinetic energy, 170
Kronecker delta, 115
main diagonal, 57
mass balance, 381
material coordinates, 381
matrix, 29
inverse, 39
left inverse, 420
lower triangular, 57, 420
right inverse, 420
upper triangular, 57, 420
matrix multiplication
entries, 35
properties, 37
matrix transpose, 38
matrix transpose properties, 38
minor, 54, 56, 418
mixed partial derivatives, 238
moment of a force, 110
motion, 382
moving coordinate system, 161, 172
acceleration , 173
multi-index, 136
Navier, 392
nested interval lemma, 146
Newton, 85
second law, 166
Newton Raphson method, 268
Newton's laws, 166
Newton's method, 269
nilpotent, 69, 74
normal vector to plane, 123
one to one, 41
onto, 42
open cover, 152
open set, 79
operator norm, 272
orientable, 402
orientation, 185
oriented curve, 185
origin, 13
orthogonal matrix, 68, 74, 423
orthonormal, 423
osculating plane, 198, 202
parallelepiped, 113
parameter, 18, 19
parametric equation, 18
parametrization, 182
partial derivative, 236
partition of unity, 456
permutation symbol, 115
perpendicular, 95
Piola Kirchhoff stress, 386
plane containing three points, 124
planes, 123
polynomials in n variables, 136
position vector, 16, 83
precession of a top, 348
principal normal, 198, 203
product of matrices, 34
product rule
cross product, 158
dot product, 158
matrices, 160
projection of a vector, 97
quadric surfaces, 126
radius of curvature, 198, 202
raw eggs, 351
recurrence relation, 148
recursively defined sequence, 148
refinement of a grid, 310, 318
refinement of grids, 437
resultant, 84
Riemann criterion, 439
Riemann integral, 311, 318
Riemann integral, 439
right handed system, 105
rot, 369
row operations, 58
row vector, 32
saddle point, 289
scalar field, 369
scalar multiplication, 15
scalar potential, 403
scalar product, 91
scalars, 15, 29
second derivative test, 308
sequences, 148
sequential compactness, 150, 191
sequentially compact, 150
singular point, 287
skew symmetric, 38
smooth curve, 182
spatial coordinates, 382
span, 416
speed, 86
spherical coordinates, 259
standard matrix, 245
standard position, 83
Stokes' theorem, 399
Stokes, 392
support of a function, 456
symmetric, 38
symmetric form of a line, 20
torque vector, 110
torsion, 203
torus, 360
trace of a surface, 128
traces, 126
triangle inequality, 23, 93
uniformly continuous, 142, 151
unit tangent vector, 198, 203
upper sum, 319, 438
Urysohn's lemma, 152
vector, 15
vector field, 185, 369
vector fields, 134
vector potential, 372
vector valued function
continuity, 136
derivative, 155
integral, 155
limit theorems, 137
vector valued functions, 133
vectors, 82
velocity, 86
volume element, 334
wave equation, 240
work, 186
Wronskian, 71, 427
zero matrix, 30