
Selected Topics in Applied

Mathematics

Charles L. Byrne
Department of Mathematical Sciences
University of Massachusetts Lowell
Lowell, MA 01854

August 1, 2014

(Supplementary readings for 92.530–531 Applied Mathematics I and II)

(The most recent version is available as a pdf file at
http://faculty.uml.edu/cbyrne/cbyrne.html)
Contents

1 Preface 3

I Readings for Applied Mathematics I 5

2 More Fundamentals (Chapter 1) 7
2.1 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 The Gradient and Directional Derivatives . . . . . . . . . . 8
2.3 Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Richardson’s Method . . . . . . . . . . . . . . . . . . . . . . 10
2.6 Leibnitz’s Rule and Distributions . . . . . . . . . . . . . . . 11
2.7 The Complex Exponential Function . . . . . . . . . . . . . 13
2.7.1 Real Exponential Functions . . . . . . . . . . . . . . 13
2.7.2 Why is h(x) an Exponential Function? . . . . . . . . 13
2.7.3 What is e^z , for z complex? . . . . . . . . . . . . . . . . 14
2.8 Complex Exponential Signal Models . . . . . . . . . . . . . 16

3 Differential Equations (Chapters 2,3) 17


3.1 Second-Order Linear ODE . . . . . . . . . . . . . . . . . . . 17
3.1.1 The Standard Form . . . . . . . . . . . . . . . . . . 17
3.1.2 The Sturm-Liouville Form . . . . . . . . . . . . . . . 17
3.1.3 The Normal Form . . . . . . . . . . . . . . . . . . . 18
3.2 Recalling the Wave Equation . . . . . . . . . . . . . . . . . 19
3.3 A Brief Discussion of Some Linear Algebra . . . . . . . . . 22
3.4 Preview of Coming Attractions . . . . . . . . . . . . . . . . 23

4 Extra Credit Problems (Chapters 2,3) 25


4.1 The Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 25


5 Qualitative Analysis of ODEs (Chapter 2,3) 29


5.1 Existence and Uniqueness . . . . . . . . . . . . . . . . . . . 29
5.2 A Simple Example . . . . . . . . . . . . . . . . . . . . . . . 30
5.3 The Sturm Separation Theorem . . . . . . . . . . . . . . . . 30
5.4 From Standard to Normal Form . . . . . . . . . . . . . . . . 30
5.5 On the Zeros of Solutions . . . . . . . . . . . . . . . . . . . 31
5.6 Sturm Comparison Theorem . . . . . . . . . . . . . . . . . . 32
5.6.1 Bessel’s Equation . . . . . . . . . . . . . . . . . . . . 32
5.7 Analysis of y′′ + q(x)y = 0 . . . . . . . . . . . . . . . . . . . 33
5.8 Toward the 20th Century . . . . . . . . . . . . . . . . . . . 33

6 The Trans-Atlantic Cable (Chapters 4,12) 35


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2 The Electrical Circuit ODE . . . . . . . . . . . . . . . . . . 36
6.3 The Telegraph Equation . . . . . . . . . . . . . . . . . . . . 37
6.4 Consequences of Thomson’s Model . . . . . . . . . . . . . . 38
6.4.1 Special Case 1: E(t) = H(t) . . . . . . . . . . . . . . 38
6.4.2 Special Case 2: E(t) = H(t) − H(t − T ) . . . . . . . 39
6.5 Heaviside to the Rescue . . . . . . . . . . . . . . . . . . . . 39
6.5.1 A Special Case: G = 0 . . . . . . . . . . . . . . . . . 39
6.5.2 Another Special Case . . . . . . . . . . . . . . . . . 40

7 The Laplace Transform and the Ozone Layer (Chapter 4) 41


7.1 The Laplace Transform . . . . . . . . . . . . . . . . . . . . 41
7.2 Scattering of Ultraviolet Radiation . . . . . . . . . . . . . . 41
7.3 Measuring the Scattered Intensity . . . . . . . . . . . . . . 42
7.4 The Laplace Transform Data . . . . . . . . . . . . . . . . . 42

8 The Finite Fourier Transform (Chapter 7) 45


8.1 Fourier Series . . . . . . . . . . . . . . . . . . . . . . . . . . 45
8.2 Linear Trigonometric Models . . . . . . . . . . . . . . . . . 45
8.2.1 Equi-Spaced Frequencies . . . . . . . . . . . . . . . . 46
8.2.2 Simplifying the Calculations . . . . . . . . . . . . . . 46
8.3 From Real to Complex . . . . . . . . . . . . . . . . . . . . . 50
8.3.1 More Computational Issues . . . . . . . . . . . . . . 51

9 Transmission and Remote Sensing (Chapter 8) 53


9.1 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . 53
9.2 Fourier Series and Fourier Coefficients . . . . . . . . . . . . 53
9.3 The Unknown Strength Problem . . . . . . . . . . . . . . . 54
9.3.1 Measurement in the Far-Field . . . . . . . . . . . . . 55
9.3.2 Limited Data . . . . . . . . . . . . . . . . . . . . . . 56
9.3.3 Can We Get More Data? . . . . . . . . . . . . . . . 57
9.3.4 Other Forms of Prior Knowledge . . . . . . . . . . . 58

9.4 The Transmission Problem . . . . . . . . . . . . . . . . . . 59


9.4.1 Directionality . . . . . . . . . . . . . . . . . . . . . . 59
9.4.2 The Case of Uniform Strength . . . . . . . . . . . . 59
9.5 Remote Sensing . . . . . . . . . . . . . . . . . . . . . . . . . 60
9.6 One-Dimensional Arrays . . . . . . . . . . . . . . . . . . . . 60
9.6.1 Measuring Fourier Coefficients . . . . . . . . . . . . 60
9.6.2 Over-sampling . . . . . . . . . . . . . . . . . . . . . 62
9.6.3 Under-sampling . . . . . . . . . . . . . . . . . . . . . 63
9.7 Higher Dimensional Arrays . . . . . . . . . . . . . . . . . . 63
9.7.1 The Wave Equation . . . . . . . . . . . . . . . . . . 64
9.7.2 Planewave Solutions . . . . . . . . . . . . . . . . . . 65
9.7.3 Superposition and the Fourier Transform . . . . . . 65
9.7.4 The Spherical Model . . . . . . . . . . . . . . . . . . 66
9.7.5 The Two-Dimensional Array . . . . . . . . . . . . . 66
9.7.6 The One-Dimensional Array . . . . . . . . . . . . . . 66
9.7.7 Limited Aperture . . . . . . . . . . . . . . . . . . . . 67
9.8 An Example: The Solar-Emission Problem . . . . . . . . . . 67

10 Properties of the Fourier Transform (Chapter 8) 75


10.1 Fourier-Transform Pairs . . . . . . . . . . . . . . . . . . . . 75
10.1.1 Decomposing f (x) . . . . . . . . . . . . . . . . . . . 75
10.1.2 The Issue of Units . . . . . . . . . . . . . . . . . . . 76
10.2 Basic Properties of the Fourier Transform . . . . . . . . . . 76
10.3 Some Fourier-Transform Pairs . . . . . . . . . . . . . . . . . 77
10.4 Dirac Deltas . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
10.5 More Properties of the Fourier Transform . . . . . . . . . . 80
10.6 Convolution Filters . . . . . . . . . . . . . . . . . . . . . . . 81
10.6.1 Blurring and Convolution Filtering . . . . . . . . . . 81
10.6.2 Low-Pass Filtering . . . . . . . . . . . . . . . . . . . 82
10.7 Two-Dimensional Fourier Transforms . . . . . . . . . . . . . 83
10.7.1 Two-Dimensional Fourier Inversion . . . . . . . . . . 84
10.7.2 A Discontinuous Function . . . . . . . . . . . . . . . 84

11 Transmission Tomography (Chapter 8) 87


11.1 X-ray Transmission Tomography . . . . . . . . . . . . . . . 87
11.2 The Exponential-Decay Model . . . . . . . . . . . . . . . . 87
11.3 Difficulties to be Overcome . . . . . . . . . . . . . . . . . . 88
11.4 Reconstruction from Line Integrals . . . . . . . . . . . . . . 89
11.4.1 The Radon Transform . . . . . . . . . . . . . . . . . 89
11.4.2 The Central Slice Theorem . . . . . . . . . . . . . . 90

12 The ART and MART (Chapter 15) 93


12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
12.2 The ART in Tomography . . . . . . . . . . . . . . . . . . . 94
12.3 The ART in the General Case . . . . . . . . . . . . . . . . . 95
12.3.1 Calculating the ART . . . . . . . . . . . . . . . . . . 95
12.3.2 When Ax = b Has Solutions . . . . . . . . . . . . . . 96
12.3.3 When Ax = b Has No Solutions . . . . . . . . . . . . 96
12.3.4 The Geometric Least-Squares Solution . . . . . . . . 96
12.4 The MART . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
12.4.1 A Special Case of MART . . . . . . . . . . . . . . . 97
12.4.2 The MART in the General Case . . . . . . . . . . . 98
12.4.3 Cross-Entropy . . . . . . . . . . . . . . . . . . . . . 99
12.4.4 Convergence of MART . . . . . . . . . . . . . . . . . 99

13 Some Linear Algebra (Chapter 15) 103


13.1 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . 103
13.2 Linear Independence and Bases . . . . . . . . . . . . . . . . 104
13.3 Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
13.4 Representing a Linear Transformation . . . . . . . . . . . . 106
13.5 Linear Functionals and Duality . . . . . . . . . . . . . . . . 107
13.6 Linear Operators on V . . . . . . . . . . . . . . . . . . . . . 108
13.7 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . 108
13.8 Using Matrix Representations . . . . . . . . . . . . . . . . . 109
13.9 Matrix Diagonalization and Systems of Linear ODE’s . . . 109
13.10 An Inner Product on V . . . . . . . . . . . . . . . . . . . . 112
13.11 Representing Linear Functionals . . . . . . . . . . . . . . . 112
13.12 The Adjoint of a Linear Transformation . . . . . . . . . . . 113
13.13 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 114
13.14 Normal and Self-Adjoint Operators . . . . . . . . . . . . . . 114
13.15 It is Good to be “Normal” . . . . . . . . . . . . . . . . . . . 115

II Readings for Applied Mathematics II 119

14 Vectors (Chapter 5,6) 121


14.1 Real N -dimensional Space . . . . . . . . . . . . . . . . . . . 121
14.2 Two Roles for Members of RN . . . . . . . . . . . . . . . . 121
14.3 Vector Algebra and Geometry . . . . . . . . . . . . . . . . . 122
14.4 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . 123
14.5 Quaternions . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

15 A Brief History of Electromagnetism (Chapter 5,6) 125


15.1 Who Knew? . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
15.2 “What’s Past is Prologue” . . . . . . . . . . . . . . . . . . . 126
15.3 Are We There Yet? . . . . . . . . . . . . . . . . . . . . . . . 126
15.4 Why Do Things Move? . . . . . . . . . . . . . . . . . . . . . 127
15.5 Go Fly a Kite! . . . . . . . . . . . . . . . . . . . . . . . . . 129
15.6 Bring in the Frogs! . . . . . . . . . . . . . . . . . . . . . . . 129
15.7 Lose the Frogs! . . . . . . . . . . . . . . . . . . . . . . . . . 130
15.8 It’s a Magnet! . . . . . . . . . . . . . . . . . . . . . . . . . . 130
15.9 A New World . . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.10 Do The Math! . . . . . . . . . . . . . . . . . . . . . . . . . 131
15.11 Just Dot the i’s and Cross the t’s? . . . . . . . . . . . . . . 132
15.12 Seeing is Believing . . . . . . . . . . . . . . . . . . . . . . . 134
15.13 If You Can Spray Them, They Exist . . . . . . . . . . . . . 134
15.14 What’s Going On Here? . . . . . . . . . . . . . . . . . . . . 135
15.15 The Year of the Golden Eggs . . . . . . . . . . . . . . . . . 137
15.16 Do Individuals Matter? . . . . . . . . . . . . . . . . . . . . 137
15.17 What’s Next? . . . . . . . . . . . . . . . . . . . . . . . . . . 139
15.18 Unreasonable Effectiveness . . . . . . . . . . . . . . . . . . 139
15.19 Coming Full Circle . . . . . . . . . . . . . . . . . . . . . . . 141

16 Changing Variables in Multiple Integrals (Chapter 5,6) 143


16.1 Mean-Value Theorems . . . . . . . . . . . . . . . . . . . . . 143
16.1.1 The Single-Variable Case . . . . . . . . . . . . . . . 143
16.1.2 The Multi-variate Case . . . . . . . . . . . . . . . . 143
16.1.3 The Vector-Valued Multi-variate Case . . . . . . . . 144
16.2 The Vector Differential for Three Dimensions . . . . . . . . 145

17 Div, Grad, Curl (Chapter 5,6) 147


17.1 The Electric Field . . . . . . . . . . . . . . . . . . . . . . . 147
17.2 The Electric Field Due To A Single Charge . . . . . . . . . 148
17.3 Gradients and Potentials . . . . . . . . . . . . . . . . . . . . 149
17.4 Gauss’s Law . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
17.4.1 The Charge Density Function . . . . . . . . . . . . . 149
17.4.2 The Flux . . . . . . . . . . . . . . . . . . . . . . . . 150
17.5 A Local Gauss’s Law and Divergence . . . . . . . . . . . . . 150
17.5.1 The Laplacian . . . . . . . . . . . . . . . . . . . . . 151
17.6 Poisson’s Equation and Harmonic Functions . . . . . . . . . 151
17.7 The Curl . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
17.7.1 An Example . . . . . . . . . . . . . . . . . . . . . . 152
17.7.2 Solenoidal Fields . . . . . . . . . . . . . . . . . . . . 153
17.7.3 The Curl of the Electrostatic Field . . . . . . . . . . 153
17.8 The Magnetic Field . . . . . . . . . . . . . . . . . . . . . . . 153
17.9 Electro-magnetic Waves . . . . . . . . . . . . . . . . . . . . 154

18 Kepler’s Laws of Planetary Motion (Chapter 5,6) 157


18.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
18.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . 158
18.3 Torque and Angular Momentum . . . . . . . . . . . . . . . 159
18.4 Gravity is a Central Force . . . . . . . . . . . . . . . . . . . 161
18.5 The Second Law . . . . . . . . . . . . . . . . . . . . . . . . 161
18.6 The First Law . . . . . . . . . . . . . . . . . . . . . . . . . 163
18.7 The Third Law . . . . . . . . . . . . . . . . . . . . . . . . . 164
18.8 Dark Matter and Dark Energy . . . . . . . . . . . . . . . . 165
18.9 From Kepler to Newton . . . . . . . . . . . . . . . . . . . . 166
18.10 Newton’s Own Proof of the Second Law . . . . . . . . . . . 168
18.11 Armchair Physics . . . . . . . . . . . . . . . . . . . . . . . . 169
18.11.1 Rescaling . . . . . . . . . . . . . . . . . . . . . . . . 169
18.11.2 Gravitational Potential . . . . . . . . . . . . . . . . 169
18.11.3 Gravity on Earth . . . . . . . . . . . . . . . . . . . . 170

19 Green’s Theorem and Related Topics (Chapter 5,6,13) 173


19.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
19.1.1 Some Terminology . . . . . . . . . . . . . . . . . . . 173
19.1.2 Arc-Length Parametrization . . . . . . . . . . . . . . 174
19.2 Green’s Theorem in Two Dimensions . . . . . . . . . . . . . 174
19.3 Proof of Green-2D . . . . . . . . . . . . . . . . . . . . . . . 175
19.4 Extension to Three Dimensions . . . . . . . . . . . . . . . . 177
19.4.1 Stokes’s Theorem . . . . . . . . . . . . . . . . . . . . 177
19.4.2 The Divergence Theorem . . . . . . . . . . . . . . . 179
19.5 When is a Vector Field a Gradient Field? . . . . . . . . . . 180
19.6 Corollaries of Green-2D . . . . . . . . . . . . . . . . . . . . 182
19.6.1 Green’s First Identity . . . . . . . . . . . . . . . . . 182
19.6.2 Green’s Second Identity . . . . . . . . . . . . . . . . 183
19.6.3 Inside-Outside Theorem . . . . . . . . . . . . . . . . 183
19.6.4 Green’s Third Identity . . . . . . . . . . . . . . . . . 183
19.7 Application to Complex Function Theory . . . . . . . . . . 185
19.8 The Cauchy-Riemann Equations Again . . . . . . . . . . . . 188

20 Introduction to Complex Analysis (Chapter 13) 191


20.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
20.2 Complex-valued Functions of a Complex Variable . . . . . . 191
20.3 Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . 192
20.4 The Cauchy-Riemann Equations . . . . . . . . . . . . . . . 192
20.5 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
20.6 Some Examples . . . . . . . . . . . . . . . . . . . . . . . . . 194
20.7 Cauchy’s Integral Theorem . . . . . . . . . . . . . . . . . . 194
20.8 Taylor Series Expansions . . . . . . . . . . . . . . . . . . . . 195
20.9 Laurent Series: An Example . . . . . . . . . . . . . . . . . . 196

20.9.1 Expansion Within an Annulus . . . . . . . . . . . . 196


20.9.2 Expansion Within the Inner Circle . . . . . . . . . . 197
20.10 Laurent Series Expansions . . . . . . . . . . . . . . . . . . . 197
20.11 Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
20.12 The Binomial Theorem . . . . . . . . . . . . . . . . . . . . 199
20.13 Using Residues . . . . . . . . . . . . . . . . . . . . . . . . . 201
20.14 Cauchy’s Estimate . . . . . . . . . . . . . . . . . . . . . . . 201
20.15 Liouville’s Theorem . . . . . . . . . . . . . . . . . . . . . . 201
20.16 The Fundamental Theorem of Algebra . . . . . . . . . . . . 202
20.17 Morera’s Theorem . . . . . . . . . . . . . . . . . . . . . . . 202

21 The Quest for Invisibility (Chapter 5,6) 203


21.1 Invisibility: Fact and Fiction . . . . . . . . . . . . . . . . . 203
21.2 The Electro-Static Theory . . . . . . . . . . . . . . . . . . . 203
21.3 Impedance Tomography . . . . . . . . . . . . . . . . . . . . 204
21.4 Cloaking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

22 Calculus of Variations (Chapter 16) 207


22.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
22.2 Some Examples . . . . . . . . . . . . . . . . . . . . . . . . . 208
22.2.1 The Shortest Distance . . . . . . . . . . . . . . . . . 208
22.2.2 The Brachistochrone Problem . . . . . . . . . . . . . 208
22.2.3 Minimal Surface Area . . . . . . . . . . . . . . . . . 209
22.2.4 The Maximum Area . . . . . . . . . . . . . . . . . . 209
22.2.5 Maximizing Burg Entropy . . . . . . . . . . . . . . . 210
22.3 Comments on Notation . . . . . . . . . . . . . . . . . . . . 210
22.4 The Euler-Lagrange Equation . . . . . . . . . . . . . . . . . 211
22.5 Special Cases of the Euler-Lagrange Equation . . . . . . . . 212
22.5.1 If f is independent of v . . . . . . . . . . . . . . . . 212
22.5.2 If f is independent of u . . . . . . . . . . . . . . . . 213
22.6 Using the Euler-Lagrange Equation . . . . . . . . . . . . . . 213
22.6.1 The Shortest Distance . . . . . . . . . . . . . . . . . 214
22.6.2 The Brachistochrone Problem . . . . . . . . . . . . . 214
22.6.3 Minimizing the Surface Area . . . . . . . . . . . . . 216
22.7 Problems with Constraints . . . . . . . . . . . . . . . . . . . 216
22.7.1 The Isoperimetric Problem . . . . . . . . . . . . . . 216
22.7.2 Burg Entropy . . . . . . . . . . . . . . . . . . . . . . 217
22.8 The Multivariate Case . . . . . . . . . . . . . . . . . . . . . 218
22.9 Finite Constraints . . . . . . . . . . . . . . . . . . . . . . . 219
22.9.1 The Geodesic Problem . . . . . . . . . . . . . . . . . 219
22.9.2 An Example . . . . . . . . . . . . . . . . . . . . . . 223
22.10 Hamilton’s Principle and the Lagrangian . . . . . . . . . . . 223
22.10.1 Generalized Coordinates . . . . . . . . . . . . . . . . 223
22.10.2 Homogeneity and Euler’s Theorem . . . . . . . . . . 224

22.10.3 Hamilton’s Principle . . . . . . . . . . . . . . . . . . 225


22.11 Sturm-Liouville Differential Equations . . . . . . . . . . . . 226
22.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226

23 Sturm-Liouville Problems (Chapter 10,11) 227


23.1 Recalling Some Matrix Theory . . . . . . . . . . . . . . . . 227
23.2 The Sturm-Liouville Form . . . . . . . . . . . . . . . . . . . 229
23.3 Inner Products and Self-Adjoint Differential Operators . . . 230
23.3.1 An Example of a Self-Adjoint Operator . . . . . . . 230
23.3.2 Another Example . . . . . . . . . . . . . . . . . . . . 230
23.3.3 The Sturm-Liouville Operator . . . . . . . . . . . . . 231
23.4 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . 232
23.5 Normal Form of Sturm-Liouville Equations . . . . . . . . . 233
23.6 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
23.6.1 Wave Equations . . . . . . . . . . . . . . . . . . . . 234
23.6.2 Bessel’s Equations . . . . . . . . . . . . . . . . . . . 235
23.6.3 Legendre’s Equations . . . . . . . . . . . . . . . . . 236
23.6.4 Other Famous Examples . . . . . . . . . . . . . . . . 237

24 Series Solutions for Differential Equations (Chapter 10,11) 239


24.1 First-Order Linear Equations . . . . . . . . . . . . . . . . . 239
24.1.1 An Example . . . . . . . . . . . . . . . . . . . . . . 239
24.1.2 Another Example: The Binomial Theorem . . . . . 240
24.2 Second-Order Problems . . . . . . . . . . . . . . . . . . . . 240
24.3 Ordinary Points . . . . . . . . . . . . . . . . . . . . . . . . . 241
24.3.1 The Wave Equation . . . . . . . . . . . . . . . . . . 241
24.3.2 Legendre’s Equations . . . . . . . . . . . . . . . . . 241
24.3.3 Hermite’s Equations . . . . . . . . . . . . . . . . . . 242
24.4 Regular Singular Points . . . . . . . . . . . . . . . . . . . . 242
24.4.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . 242
24.4.2 Frobenius Series . . . . . . . . . . . . . . . . . . . . 243
24.4.3 Bessel Functions . . . . . . . . . . . . . . . . . . . . 244

25 Bessel’s Equations (Chapter 9,10,11) 245


25.1 The Vibrating String Problem . . . . . . . . . . . . . . . . . 246
25.2 The Hanging Chain Problem . . . . . . . . . . . . . . . . . 247
25.2.1 The Wave Equation for the Hanging Chain . . . . . 247
25.2.2 Separating the Variables . . . . . . . . . . . . . . . . 247
25.2.3 Obtaining Bessel’s Equation . . . . . . . . . . . . . . 248
25.3 Solving Bessel’s Equations . . . . . . . . . . . . . . . . . . . 248
25.3.1 Frobenius-series solutions . . . . . . . . . . . . . . . 248
25.3.2 Bessel Functions . . . . . . . . . . . . . . . . . . . . 249
25.4 Bessel Functions of the Second Kind . . . . . . . . . . . . . 250
25.5 Hankel Functions . . . . . . . . . . . . . . . . . . . . . . . . 250

25.6 The Gamma Function . . . . . . . . . . . . . . . . . . . . . 250


25.6.1 Extending the Factorial Function . . . . . . . . . . . 250
25.6.2 Extending Γ(x) to negative x . . . . . . . . . . . . . 251
25.6.3 An Example . . . . . . . . . . . . . . . . . . . . . . 251
25.7 Representing the Bessel Functions . . . . . . . . . . . . . . 252
25.7.1 Taylor Series . . . . . . . . . . . . . . . . . . . . . . 252
25.7.2 Generating Function . . . . . . . . . . . . . . . . . . 252
25.7.3 An Integral Representation . . . . . . . . . . . . . . 252
25.8 Fourier Transforms and Bessel Functions . . . . . . . . . . . 253
25.8.1 The Case of Two Dimensions . . . . . . . . . . . . . 253
25.8.2 The Case of Radial Functions . . . . . . . . . . . . . 253
25.8.3 The Hankel Transform . . . . . . . . . . . . . . . . . 254
25.9 An Application of the Bessel Functions in Astronomy . . . 255
25.10 Orthogonality of Bessel Functions . . . . . . . . . . . . . . . 256

26 Legendre’s Equations (Chapter 10,11) 259


26.1 Legendre’s Equations . . . . . . . . . . . . . . . . . . . . . . 259
26.2 Rodrigues’ Formula . . . . . . . . . . . . . . . . . . . . . . . 261
26.3 A Recursive Formula for Pn (x) . . . . . . . . . . . . . . . . 261
26.4 A Generating Function Approach . . . . . . . . . . . . . . . 262
26.5 A Two-Term Recursive Formula for Pn (x) . . . . . . . . . . 263
26.6 Legendre Series . . . . . . . . . . . . . . . . . . . . . . . . . 263
26.7 Best Approximation by Polynomials . . . . . . . . . . . . . 263
26.8 Legendre’s Equations and Potential Theory . . . . . . . . . 264
26.9 Legendre Polynomials and Gaussian Quadrature . . . . . . 264
26.9.1 The Basic Formula . . . . . . . . . . . . . . . . . . . 264
26.9.2 Lagrange Interpolation . . . . . . . . . . . . . . . . . 265
26.9.3 Using the Legendre Polynomials . . . . . . . . . . . 265

27 Hermite’s Equations and Quantum Mechanics (Chapter 10,11) 267
27.1 The Schrödinger Wave Function . . . . . . . . . . . . . . . . 267
27.2 Time-Independent Potentials . . . . . . . . . . . . . . . . . 268
27.3 The Harmonic Oscillator . . . . . . . . . . . . . . . . . . . . 268
27.3.1 The Classical Spring Problem . . . . . . . . . . . . . 268
27.3.2 Back to the Harmonic Oscillator . . . . . . . . . . . 269
27.4 Dirac’s Equation . . . . . . . . . . . . . . . . . . . . . . . . 269

28 Array Processing (Chapter 8) 271

29 Matched Field Processing (Chapter 10,11,12) 275


29.1 The Shallow-Water Case . . . . . . . . . . . . . . . . . . . . 275
29.2 The Homogeneous-Layer Model . . . . . . . . . . . . . . . . 276
29.3 The Pekeris Waveguide . . . . . . . . . . . . . . . . . . . . . 278

29.4 The General Normal-Mode Model . . . . . . . . . . . . . . 279


29.4.1 Matched-Field Processing . . . . . . . . . . . . . . . 279

III Appendices 281


30 Inner Products and Orthogonality 283
30.1 The Complex Vector Dot Product . . . . . . . . . . . . . . 283
30.1.1 The Two-Dimensional Case . . . . . . . . . . . . . . 283
30.1.2 Orthogonality . . . . . . . . . . . . . . . . . . . . . . 284
30.2 Generalizing the Dot Product: Inner Products . . . . . . . 285
30.2.1 Defining an Inner Product and Norm . . . . . . . . . 286
30.2.2 Some Examples of Inner Products . . . . . . . . . . 286
30.3 Best Approximation and the Orthogonality Principle . . . . 289
30.3.1 Best Approximation . . . . . . . . . . . . . . . . . . 289
30.3.2 The Orthogonality Principle . . . . . . . . . . . . . . 290
30.4 Gram-Schmidt Orthogonalization . . . . . . . . . . . . . . . 290

31 Chaos 291
31.1 The Discrete Logistics Equation . . . . . . . . . . . . . . . . 291
31.2 Fixed Points . . . . . . . . . . . . . . . . . . . . . . . . . . 292
31.3 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
31.4 Periodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
31.5 Sensitivity to the Starting Value . . . . . . . . . . . . . . . 293
31.6 Plotting the Iterates . . . . . . . . . . . . . . . . . . . . . . 294
31.7 Filled Julia Sets . . . . . . . . . . . . . . . . . . . . . . . . . 294
31.8 The Newton-Raphson Algorithm . . . . . . . . . . . . . . . 295
31.9 Newton-Raphson and Chaos . . . . . . . . . . . . . . . . . . 296
31.9.1 A Simple Case . . . . . . . . . . . . . . . . . . . . . 296
31.9.2 A Not-So-Simple Case . . . . . . . . . . . . . . . . . 297
31.10 The Cantor Game . . . . . . . . . . . . . . . . . . . . . . . 297
31.11 The Sir Pinski Game . . . . . . . . . . . . . . . . . . . . . . 297
31.12 The Chaos Game . . . . . . . . . . . . . . . . . . . . . . . . 298

32 Wavelets 305
32.1 Analysis and Synthesis . . . . . . . . . . . . . . . . . . . . . 305
32.2 Polynomial Approximation . . . . . . . . . . . . . . . . . . 306
32.3 A Radar Problem . . . . . . . . . . . . . . . . . . . . . . . . 306
32.3.1 Stationary Target . . . . . . . . . . . . . . . . . . . 306
32.3.2 Moving Target . . . . . . . . . . . . . . . . . . . . . 307
32.3.3 The Wideband Cross-Ambiguity Function . . . . . . 308
32.4 Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308
32.4.1 Background . . . . . . . . . . . . . . . . . . . . . . . 308
32.4.2 A Simple Example . . . . . . . . . . . . . . . . . . . 309

32.4.3 The Integral Wavelet Transform . . . . . . . . . . . 310


32.4.4 Wavelet Series Expansions . . . . . . . . . . . . . . . 311
32.4.5 More General Wavelets . . . . . . . . . . . . . . . . 311

Bibliography 312

Index 317
Chapter 1

Preface

These are notes on various topics in applied mathematics, designed to
supplement the text for the courses 92.530 Applied Mathematics I and
92.531 Applied Mathematics II. The text for these courses is Advanced
Mathematics for Engineers and Scientists, M. Spiegel, McGraw-Hill
Schaum’s Outline Series, ISBN 978-0-07-163540-0. Chapter references in
the notes are to chapters in this text.
For extra credit, there is one chapter containing well-known problems
in applied mathematics; other exercises are scattered throughout these
notes.

Part I

Readings for Applied Mathematics I

Chapter 2

More Fundamentals (Chapter 1)

2.1 The Dot Product


Let RN denote the collection of all N -dimensional column vectors of real
numbers; for example,
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix},
\]
where each of the xn , n = 1, 2, ..., N is some real number. When N =
1 we write R1 = R, the collection of all real numbers. For notational
convenience, we sometimes write

x^T = (x1 , x2 , ..., xN ),

which is the transpose of the column vector x.


If x and y are members of RN , then the dot product of x and y is the
real number
x · y = x1 y1 + x2 y2 + ... + xN yN .
The magnitude or size of a vector x is
\[
\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_N^2} = \sqrt{x \cdot x}.
\]

When N = 2 or N = 3 we can give more meaning to the dot product.


For N = 2 or N = 3 we have

x · y = ‖x‖‖y‖ cos θ,

where θ is the angle between x and y, when they are viewed as directed
line segments in a plane, emerging from a common base point.
In general, when N is larger, the angle between x and y no longer makes
sense, but we still have a useful inequality, called Cauchy’s Inequality:

|x · y| ≤ ‖x‖‖y‖,

and
|x · y| = ‖x‖‖y‖
precisely when, or if and only if, as mathematicians say, x and y are parallel,
that is, there is a real number α with

y = αx.
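Cauchy’s Inequality is easy to test numerically. Here is a minimal Python
sketch (an added illustration, with vectors chosen arbitrarily); the gap
‖x‖‖y‖ − |x · y| is nonnegative, and vanishes exactly in the parallel case:

```python
import numpy as np

def cauchy_gap(x, y):
    # Returns ||x|| ||y|| - |x . y|, which is >= 0 by Cauchy's Inequality.
    return np.linalg.norm(x) * np.linalg.norm(y) - abs(np.dot(x, y))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, -1.0, 0.5, 3.0])

print(cauchy_gap(x, y))         # strictly positive: x and y are not parallel
print(cauchy_gap(x, -2.5 * x))  # ~0 (up to rounding): y = alpha x gives equality
```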

2.2 The Gradient and Directional Derivatives


Let f (x1 , x2 , ..., xN ) be a real-valued function of N real variables, which we
denote by f : RN → R. For such functions we are interested in their
first partial derivatives. The first partial derivative of f , at the point
(x1 , x2 , ..., xN ), in the direction of xn is defined to be

\[
\lim_{h\to 0}\frac{f(x_1, ..., x_{n-1}, x_n + h, x_{n+1}, ..., x_N) - f(x_1, ..., x_{n-1}, x_n, x_{n+1}, ..., x_N)}{h},
\]
provided that this limit exists. We denote this limit as fn (x1 , ..., xN ), or
∂f/∂xn (x1 , ..., xN ). When all the first partial derivatives of f exist at a point
we say that f is differentiable at that point.
When we are dealing with small values of N , such as N = 3, it is
common to write f (x, y, z), where now x, y, and z are real variables, not
vectors. Then the first partial derivatives can be denoted fx , fy , and fz .
The gradient of the function f : RN → R at the point (x1 , x2 , ..., xN ),
written ∇f (x1 , ..., xN ), is the column vector whose entries are the first
partial derivatives of f at that point.
Let d be a member of RN with kdk = 1; then d is called a direction
vector. The directional derivative of f , at the point (x1 , ..., xN ), in the
direction of d, is
∇f (x1 , ..., xN ) · d.
From Cauchy’s Inequality we see that the absolute value of the directional
derivative at a given point is at most the magnitude of the gradient at
that point, and is equal to that magnitude precisely when d is parallel to
the gradient. It follows that the direction in which the gradient points is
the direction of greatest increase in f , and the opposite direction is the
direction of greatest decrease. The gradient, therefore, is perpendicular to
the tangent plane to the surface of constant value, the level surface, passing
through this point. These facts are important in optimization, when we
try to find the largest and smallest values of f .
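To make these facts concrete, here is a small Python sketch (an added
illustration; the function f is an arbitrary choice) that approximates the
gradient by central differences and checks that, over all unit directions d,
the directional derivative ∇f · d is largest when d points along the gradient:

```python
import numpy as np

def grad(f, x, h=1e-6):
    # Approximate the gradient of f at x by central differences.
    g = np.zeros_like(x)
    for n in range(len(x)):
        e = np.zeros_like(x); e[n] = h
        g[n] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + 3*x[0]*x[1] + np.sin(x[1])  # arbitrary test function
x0 = np.array([1.0, 2.0])
g = grad(f, x0)

# Sample many unit directions; the largest directional derivative is ||grad f||.
thetas = np.linspace(0, 2*np.pi, 1000)
dirs = np.column_stack([np.cos(thetas), np.sin(thetas)])
print((dirs @ g).max(), np.linalg.norm(g))  # nearly equal; maximum attained
                                            # when d is parallel to the gradient
```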

2.3 Optimization
If f : R → R is a differentiable real-valued function of a real variable, and
we want to find its local maxima and minima, we take the derivative and
set it to zero. When f : RN → R is a differentiable real-valued function
of N real variables, we find local maxima and minima by calculating the
gradient and finding out where the gradient is zero, that is, where all the
first partial derivatives are zero.

2.4 Lagrange Multipliers


If f : R → R is differentiable and we want to maximize or minimize f for
x in the interval [a, b], that is, we want to solve a constrained optimization
problem, we must not look only at the places where the derivative is zero,
but we must check the endpoints also. For functions of more than one
variable, constrained optimization problems are more difficult. Consider
the following example.
Let f (x, y) = x² + y² and suppose we want to minimize f , but only for
those points (x, y) with x/2 + y/3 − 1 = 0. One way is to solve for y, getting
y = −(3/2)x + 3, putting this into x² + y² to get a function of x alone, and
then minimizing that function of x. This does not always work, though.
Lagrange multipliers can help in more complicated cases.
Suppose that we want to minimize a differentiable function f (x1 , ..., xN ),
subject to g(x1 , ..., xN ) = 0, where g is another differentiable real-valued
function. The function f determines level surfaces, which are the sets of
all points in RN on which f has the same value; think of elevation lines on
a map. Similarly, g determines its own set of level surfaces. Our constraint
is that we must consider only those points in RN on the level surface where
g = 0. At the solution point (x∗1 , ..., x∗N ), the level surface for g = 0 must
be tangent to a level surface of f , which says that the gradient of g must
be parallel to the gradient of f at that point; in other words, there is a real
number α such that

∇g(x∗1 , ..., x∗N ) = α∇f (x∗1 , ..., x∗N ),



which we can write in the more traditional way as

∇f (x∗1 , ..., x∗N ) + λ∇g(x∗1 , ..., x∗N ) = 0.

Suppose then that we form the function

h(x1 , ..., xN ) = f (x1 , ..., xN ) + λg(x1 , ..., xN ).

Then we want (x∗1 , ..., x∗N ) such that

∇h(x∗1 , ..., x∗N ) = 0.

This is the Lagrange multiplier approach.


Let’s return to the problem of minimizing f (x, y) = x² + y², subject to
the constraint g(x, y) = x/2 + y/3 − 1 = 0. Then
\[
h(x, y) = x^2 + y^2 + \lambda\Big(\frac{x}{2} + \frac{y}{3} - 1\Big).
\]
Setting hx = 0 and hy = 0 we find that we need

2x + λ/2 = 0,

and

2y + λ/3 = 0.
We don’t know what λ is, but we don’t care, because we can write

λ = −4x,

and
λ = −6y,
from which we conclude that y = (2/3)x. This is a second relationship between
x and y and now we can find the answer.
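As a check, the small system above can be handed to a computer algebra
package; here is a minimal sympy sketch (an added illustration, not part
of the original notes):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
f = x**2 + y**2
g = x/2 + y/3 - 1
h = f + lam * g

# Stationarity in x and y, plus the constraint itself.
sols = sp.solve([sp.diff(h, x), sp.diff(h, y), g], [x, y, lam], dict=True)
print(sols)  # x = 18/13, y = 12/13, consistent with y = (2/3)x on the constraint
```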

2.5 Richardson’s Method


We know that if f : R → R is differentiable, then

\[
f'(x) = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}.
\]
Suppose that we want to approximate f′(x) numerically, by taking a small
value of h. Eventually, if h is too small, we run into trouble, because both
the top and the bottom of the difference quotient go to zero as h goes to
zero. So we are dividing a small number by a small number, which is to
be avoided, according to the rules of numerical analysis. The download file
on Richardson’s method that is available on the website shows what can
happen, as we try to calculate f′(3) for f (x) = log x.
We know from the Taylor expansion that, if f is a nice function, then
\[
\frac{f(x+h)-f(x)}{h} = f'(x) + \frac{1}{2!}f''(x)h + \frac{1}{3!}f'''(x)h^2 + ... \tag{2.1}
\]
Similarly,
\[
\frac{f(x+h)-f(x-h)}{2h} = f'(x) + \frac{1}{3!}f'''(x)h^2 + ... \tag{2.2}
\]
Therefore, the left side of Equation (2.1) goes to f′(x) on the order of
h, while the left side of Equation (2.2) goes to f′(x) on the order of h².
This tells us that if we use Equation (2.2) to estimate f′(x) we won’t
need to take h as small to get a good answer. This is the basic idea of
Richardson’s method, which can be applied to other types of problems,
such as approximating integrals.
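The following Python sketch (in the spirit of, though not identical to, the
download file mentioned above) compares the two difference quotients for
f(x) = log x at x = 3, where the exact answer is f′(3) = 1/3; the central
difference of Equation (2.2) reaches a good answer at much larger h:

```python
import math

f = math.log
x, exact = 3.0, 1.0 / 3.0

print(" h       forward error   central error")
for k in range(1, 9):
    h = 10.0 ** (-k)
    forward = (f(x + h) - f(x)) / h          # error of order h,   Equation (2.1)
    central = (f(x + h) - f(x - h)) / (2*h)  # error of order h^2, Equation (2.2)
    print(f"1e-{k}   {abs(forward - exact):.2e}       {abs(central - exact):.2e}")
```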

2.6 Leibnitz’s Rule and Distributions


Let f (x, t) be a real-valued function of two real variables. Then
\[
\int_a^b f(x,t)\,dx = F(t) \tag{2.3}
\]
is a function of t alone. The derivative of F (t), if it exists, can be expressed
in terms of the partial derivative, with respect to t, of f (x, t):
\[
F'(t) = \int_a^b \frac{\partial f}{\partial t}(x,t)\,dx. \tag{2.4}
\]

Leibnitz’s Rule extends this formula to the case in which a and b are also
allowed to depend on t.
Let h(t) and g(t) be real-valued functions of t. For convenience, we
assume that g(t) < h(t) for all t. Let
\[
F(t) = \int_{g(t)}^{h(t)} f(x,t)\,dx. \tag{2.5}
\]

Leibnitz’s Rule then states that


\[
F'(t) = \int_{g(t)}^{h(t)} \frac{\partial f}{\partial t}(x,t)\,dx + f(h(t),t)\,h'(t) - f(g(t),t)\,g'(t). \tag{2.6}
\]

We can use distributions to see why this is plausible.



Distribution theory allows us to extend the notion of derivative to func-
tions that do not possess derivatives in the ordinary sense, such as the
Heaviside function U (x), which equals one for x ≥ 0 and zero for x < 0.
Integration by parts is the key here.
Suppose that v(x) is differentiable and goes to zero as |x| approaches
+∞. Then integration by parts tells us that
\[
\int_{-\infty}^{+\infty} u'(x)v(x)\,dx = -\int_{-\infty}^{+\infty} u(x)v'(x)\,dx. \tag{2.7}
\]

If u(x) doesn’t have a derivative in the usual sense, we define u′(x) as
the generalized function u′(x) that has the property described in Equation
(2.7).
For example, let u(x) = U (x), the Heaviside function. Then U′(x) has
the property that, for all v(x) as above,
\[
\int_{-\infty}^{+\infty} U'(x)v(x)\,dx = -\int_{0}^{+\infty} v'(x)\,dx = v(0). \tag{2.8}
\]
Therefore, U′(x) can be defined by the property
\[
\int_{-\infty}^{+\infty} U'(x)v(x)\,dx = v(0). \tag{2.9}
\]

But Equation (2.9) is also the definition of the generalized function (or
distribution) called the Dirac delta function, denoted δ(x). So U′(x) =
δ(x). We can now use this to motivate Leibnitz’s Rule.
Denote by χ[a,b] (x) the function that is one for a ≤ x ≤ b and zero
otherwise; note that
\[
\chi_{[a,b]}(x) = U(x-a) - U(x-b), \tag{2.10}
\]
so that the derivative of χ[a,b] (x), in the distributional sense, is
\[
\chi'_{[a,b]}(x) = \delta(x-a) - \delta(x-b). \tag{2.11}
\]
Then we can write
\[
F(t) = \int_{g(t)}^{h(t)} f(x,t)\,dx = \int_{-\infty}^{+\infty} \chi_{[g(t),h(t)]}(x)\,f(x,t)\,dx. \tag{2.12}
\]

The function c(x, t) = χ[g(t),h(t)] (x) has the distributional partial derivative,
with respect to t, of
\[
\frac{\partial c}{\partial t}(x,t) = -g'(t)\,\delta(x-g(t)) + h'(t)\,\delta(x-h(t)). \tag{2.13}
\]
Using the product rule and differentiating under the integral sign, we get
\[
F'(t) = \int_{g(t)}^{h(t)} \frac{\partial f}{\partial t}(x,t)\,dx + h'(t)\,f(h(t),t) - g'(t)\,f(g(t),t). \tag{2.14}
\]
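Equation (2.14) can be checked numerically. A minimal Python sketch
(the integrand and limits below are arbitrary choices, not from the text)
compares a finite-difference derivative of F(t) with the right side of
Leibnitz’s Rule:

```python
import numpy as np
from scipy.integrate import quad

f  = lambda x, t: np.sin(x * t)      # arbitrary smooth integrand
ft = lambda x, t: x * np.cos(x * t)  # its partial derivative in t
g  = lambda t: t**2                  # lower limit g(t), with g'(t) = 2t
h  = lambda t: 1.0 + t               # upper limit h(t), with h'(t) = 1

def F(t):
    return quad(lambda x: f(x, t), g(t), h(t))[0]

t, dt = 0.7, 1e-5
numeric = (F(t + dt) - F(t - dt)) / (2 * dt)
leibnitz = (quad(lambda x: ft(x, t), g(t), h(t))[0]
            + f(h(t), t) * 1.0 - f(g(t), t) * 2 * t)
print(numeric, leibnitz)  # the two values agree to several decimal places
```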

2.7 The Complex Exponential Function


The most important function in signal processing is the complex-valued
function of the real variable x defined by

h(x) = cos(x) + i sin(x). (2.15)

For reasons that will become clear shortly, this function is called the com-
plex exponential function. Notice that the magnitude of the complex num-
ber h(x) is always equal to one, since cos²(x) + sin²(x) = 1 for all real x.
Since the functions cos(x) and sin(x) are 2π-periodic, that is, cos(x+2π) =
cos(x) and sin(x + 2π) = sin(x) for all x, the complex exponential function
h(x) is also 2π-periodic.

2.7.1 Real Exponential Functions


In calculus we encounter functions of the form g(x) = a^x, where a > 0 is
an arbitrary constant. These functions are the exponential functions, the
most well-known of which is the function g(x) = e^x. Exponential functions
are those with the property

g(u + v) = g(u)g(v) (2.16)

for every u and v. Recall from calculus that for exponential functions
g(x) = a^x with a > 0 the derivative g′(x) is

g′(x) = a^x ln(a) = g(x) ln(a). (2.17)

Now we consider the function h(x) in light of these ideas.

2.7.2 Why is h(x) an Exponential Function?


We show now that the function h(x) in Equation (2.15) has the property
given in Equation (2.16), so we have a right to call it an exponential func-
tion; that is, h(x) = c^x for some constant c. Since h(x) has complex values,
the constant c cannot be a real number, however.
Calculating h(u)h(v), we find

h(u)h(v) = (cos(u) cos(v) − sin(u) sin(v)) + i(cos(u) sin(v) + sin(u) cos(v))

= cos(u + v) + i sin(u + v) = h(u + v).


So h(x) is an exponential function; h(x) = c^x for some complex constant
c. Inserting x = 1, we find that c is

c = cos(1) + i sin(1).

Let’s find another way to express c, using Equation (2.17). Since

h′(x) = − sin(x) + i cos(x) = i(cos(x) + i sin(x)) = ih(x),

we conjecture that ln(c) = i; but what does this mean?


For a > 0 we know that b = ln(a) means that a = e^b. Therefore, we
say that ln(c) = i means c = e^i; but what does it mean to take e to a
complex power? To define e^i we turn to the Taylor series representation
for the exponential function g(x) = e^x, defined for real x:

e^x = 1 + x + x²/2! + x³/3! + ....

Inserting i in place of x and using the fact that i² = −1, we find that

e^i = (1 − 1/2! + 1/4! − ...) + i(1 − 1/3! + 1/5! − ...);

note that the two series are the Taylor series for cos(1) and sin(1), respec-
tively, so e^i = cos(1) + i sin(1). Then the complex exponential function in
Equation (2.15) is
h(x) = (e^i)^x = e^{ix}.
Inserting x = π, we get

h(π) = e^{iπ} = cos(π) + i sin(π) = −1

or
e^{iπ} + 1 = 0,
which is the remarkable relation discovered by Euler that combines the five
most important constants in mathematics, e, π, i, 1, and 0, in a single
equation.
Note that e^{2πi} = e^{0i} = e^0 = 1, so

e^{(2π+x)i} = e^{2πi} e^{ix} = e^{ix}

for all x.
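These relations are easy to verify in floating point, for instance with
Python’s cmath module (a small added check):

```python
import cmath, math

print(cmath.exp(1j * math.pi) + 1)  # ~0: Euler's relation e^{i pi} + 1 = 0
print(cmath.exp(1j))                # cos(1) + i sin(1)
x = 0.4
print(cmath.exp(1j * (2 * math.pi + x)) - cmath.exp(1j * x))  # ~0: periodicity
```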

2.7.3 What is e^z , for z complex?


We know from calculus what e^x means for real x, and now we also know
what e^{ix} means. Using these we can define e^z for any complex number
z = a + ib by e^z = e^{a+ib} = e^a e^{ib}.
We know from calculus how to define ln(x) for x > 0, and we have just
defined ln(c) = i to mean c = e^i. But we could also say that ln(c) = i(1 +
2πk) for any integer k; that is, the periodicity of the complex exponential
function forces the function ln(x) to be multi-valued.

For any nonzero complex number z = |z|e^{iθ(z)}, we have

ln(z) = ln(|z|) + ln(e^{iθ(z)}) = ln(|z|) + i(θ(z) + 2πk),

for any integer k. If z = a > 0 then θ(z) = 0 and ln(z) = ln(a) + i(kπ)
for any even integer k; in calculus class we just take the value associated
with k = 0. If z = a < 0 then θ(z) = π and ln(z) = ln(−a) + i(kπ) for
any odd integer k. So we can define the logarithm of a negative number; it
just turns out not to be a real number. If z = ib with b > 0, then θ(z) = π/2
and ln(z) = ln(b) + i(π/2 + 2πk) for any integer k; if z = ib with b < 0, then
θ(z) = 3π/2 and ln(z) = ln(−b) + i(3π/2 + 2πk) for any integer k.
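For comparison, Python’s cmath.log returns the principal value, which
corresponds to choosing θ(z) in (−π, π] and k = 0 (a small added
illustration):

```python
import cmath

print(cmath.log(-1))   # i pi:           theta = pi, k = 0
print(cmath.log(2j))   # ln(2) + i pi/2: theta = pi/2, k = 0
print(cmath.log(-3j))  # ln(3) - i pi/2: the principal value uses -pi/2,
                       # which equals the 3*pi/2 above with k = -1
```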
Adding e^{−ix} = cos(x) − i sin(x) to e^{ix} given by Equation (2.15), we get
\[
\cos(x) = \frac{1}{2}(e^{ix} + e^{-ix});
\]
subtracting, we obtain
\[
\sin(x) = \frac{1}{2i}(e^{ix} - e^{-ix}).
\]
These formulas allow us to extend the definition of cos and sin to complex
arguments z:
\[
\cos(z) = \frac{1}{2}(e^{iz} + e^{-iz})
\]
and
\[
\sin(z) = \frac{1}{2i}(e^{iz} - e^{-iz}).
\]
In signal processing the complex exponential function is often used to de-
scribe functions of time that exhibit periodic behavior:

h(ωt + θ) = e^{i(ωt+θ)} = cos(ωt + θ) + i sin(ωt + θ),

where the frequency ω and phase angle θ are real constants and t denotes
time. We can alter the magnitude by multiplying h(ωt + θ) by a positive
constant |A|, called the amplitude, to get |A|h(ωt + θ). More generally, we
can combine the amplitude and the phase, writing

|A|h(ωt + θ) = |A|e^{iθ} e^{iωt} = Ae^{iωt},

where A is the complex amplitude A = |A|e^{iθ}. Many of the functions


encountered in signal processing can be modeled as linear combinations of
such complex exponential functions or sinusoids, as they are often called.

2.8 Complex Exponential Signal Models


In a later chapter we consider signal models f (x) that are sums of trigono-
metric functions;
\[
f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{L}\Big(a_k \cos(\omega_k x) + b_k \sin(\omega_k x)\Big), \tag{2.18}
\]

where the ωk are known, but the ak and bk are not. Now that we see how
to convert sines and cosines to complex exponential functions, using
\[
\cos(\omega_k x) = \frac{1}{2}\Big(\exp(i\omega_k x) + \exp(-i\omega_k x)\Big) \tag{2.19}
\]
and
\[
\sin(\omega_k x) = \frac{1}{2i}\Big(\exp(i\omega_k x) - \exp(-i\omega_k x)\Big), \tag{2.20}
\]
we can write f (x) as
\[
f(x) = \sum_{m=-L}^{L} c_m \exp(i\omega_m x), \tag{2.21}
\]
where c_0 = (1/2)a_0,
\[
c_k = \frac{1}{2}(a_k - i b_k), \tag{2.22}
\]
and
\[
c_{-k} = \frac{1}{2}(a_k + i b_k), \tag{2.23}
\]
for k = 1, ..., L. The complex notation is more commonly used in signal
processing. Note that if the original coefficients ak and bk are real numbers,
then c_{−m} = \overline{c_m}, the complex conjugate of c_m.
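A short numerical check (with arbitrary coefficients, added here for
illustration) that the real form (2.18) and the complex form (2.21)
produce the same signal:

```python
import numpy as np

L = 3
a = np.array([0.5, 1.0, -2.0, 0.3])  # a_0, ..., a_L (arbitrary)
b = np.array([0.0, 0.7, 0.1, -1.2])  # b_0 unused; b_1, ..., b_L
w = np.array([0.0, 1.0, 2.5, 4.0])   # omega_0 = 0; omega_1, ..., omega_L

x = np.linspace(0, 2, 5)

# Real trigonometric form, Equation (2.18).
f_real = 0.5 * a[0] + sum(a[k] * np.cos(w[k] * x) + b[k] * np.sin(w[k] * x)
                          for k in range(1, L + 1))

# Complex form, Equations (2.21)-(2.23), with omega_{-k} = -omega_k.
c = {0: 0.5 * a[0]}
for k in range(1, L + 1):
    c[k] = 0.5 * (a[k] - 1j * b[k])
    c[-k] = np.conj(c[k])
f_cplx = sum(c[m] * np.exp(1j * np.sign(m) * w[abs(m)] * x)
             for m in range(-L, L + 1))

print(np.max(np.abs(f_real - f_cplx.real)), np.max(np.abs(f_cplx.imag)))  # both ~0
```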
Chapter 3

Differential Equations
(Chapters 2,3)

3.1 Second-Order Linear ODE


The most general form of the second-order linear homogeneous ordinary
differential equation with variable coefficients is

R(x)y′′(x) + P(x)y′(x) + Q(x)y(x) = 0. (3.1)

Many differential equations of this type arise when we employ the technique
of separating the variables to solve a partial differential equation. We shall
consider several equivalent forms of Equation (3.1).

3.1.1 The Standard Form


Of course, dividing through by the function R(x) and renaming the co-
efficient functions, we can also write Equation (3.1) in the standard form
as

y′′(x) + P(x)y′(x) + Q(x)y(x) = 0. (3.2)

There are other equivalent forms of Equation (3.1).

3.1.2 The Sturm-Liouville Form


Let S(x) = exp(−F(x)), where F′(x) = (R′(x) − P(x))/R(x). Then we
have
\[
\frac{d}{dx}\big(S(x)R(x)\big) = S(x)P(x).
\]


From Equation (3.1) we obtain

S(x)R(x)y′′(x) + S(x)P(x)y′(x) + S(x)Q(x)y(x) = 0,

so that
\[
\frac{d}{dx}\big(S(x)R(x)y'(x)\big) + S(x)Q(x)y(x) = 0,
\]
which then has the form
\[
\frac{d}{dx}\big(p(x)y'(x)\big) + g(x)y(x) = 0. \tag{3.3}
\]
We shall be particularly interested in special cases having the form
\[
\frac{d}{dx}\big(p(x)y'(x)\big) - w(x)q(x)y(x) + \lambda w(x)y(x) = 0, \tag{3.4}
\]
where w(x) > 0 and λ is a constant. Rewriting Equation (3.4) as
\[
-\frac{1}{w(x)}\frac{d}{dx}\big(p(x)y'(x)\big) + q(x)y(x) = \lambda y(x), \tag{3.5}
\]
we are reminded of eigenvector problems in linear algebra,

Ax = λx, (3.6)

where A is a square matrix, λ is an eigenvalue of A, and x ≠ 0 is an
associated eigenvector. What is now playing the role of A is the linear
differential operator L that operates on a function y to produce the function
Ly given by
\[
(Ly)(x) = -\frac{1}{w(x)}\,\frac{d}{dx}\big(p(x)y'(x)\big) + q(x)y(x). \tag{3.7}
\]

If y(x) satisfies the equation

Ly = λy,

then y(x) is said to be an eigenfunction of L, with associated eigenvalue λ.

3.1.3 The Normal Form


We start now with the differential equation as given by Equation (3.2).
This differential equation can be written in the equivalent normal form

u′′(x) + q(x)u(x) = 0, (3.8)

where
y(x) = u(x)v(x),
\[
v(x) = \exp\Big(-\frac{1}{2}\int P\,dx\Big),
\]
and
\[
q(x) = Q(x) - \frac{1}{4}P(x)^2 - \frac{1}{2}P'(x).
\]
One reason for wanting to put the differential equation into normal form is
to relate the properties of its solutions to the properties of q(x). For exam-
ple, we are interested in the location of zeros of the solutions of Equation
(3.8), as compared with the zeros of the solutions of

u′′(x) + r(x)u(x) = 0. (3.9)

In particular, we want to compare the spacing of zeros of solutions of Equa-
tion (3.8) to that of the known solutions of the equation

u′′(x) + u(x) = 0.

If q(x) < 0, then any non-trivial solution of Equation (3.8) has at most one
zero; think of the equation

u′′(x) − u(x) = 0,

with solutions u(x) = e^x and u(x) = e^{−x}. Therefore, when we study an
equation in normal form, we shall always assume that q(x) > 0.
Determining important properties of the solutions of a differential equa-
tion without actually finding those solutions is called qualitative analysis.
We shall have more to say about qualitative analysis later in these notes.
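The reduction to normal form is mechanical, so it can be scripted. Here is
a minimal sympy sketch (an added illustration; Bessel’s equation, which is
treated in a later chapter, serves as the test case):

```python
import sympy as sp

x, nu = sp.symbols('x nu', positive=True)

def normal_form_q(P, Q):
    # q(x) = Q(x) - P(x)^2/4 - P'(x)/2, from Section 3.1.3.
    return sp.simplify(Q - P**2 / 4 - sp.diff(P, x) / 2)

# Bessel's equation in standard form: y'' + (1/x) y' + (1 - nu^2/x^2) y = 0.
P = 1 / x
Q = 1 - nu**2 / x**2
print(normal_form_q(P, Q))  # equivalent to 1 + (1/4 - nu**2)/x**2

# The factor v in y = u v: exp(-(1/2) * integral of P) = 1/sqrt(x).
v = sp.exp(-sp.Rational(1, 2) * sp.integrate(P, x))
print(sp.simplify(v))
```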

3.2 Recalling the Wave Equation


The one-dimensional wave equation is

φtt (x, t) = c²φxx (x, t), (3.10)

where c > 0 is the propagation speed. Separating variables, we seek a
solution of the form φ(x, t) = f (t)y(x). Inserting this into Equation (3.10),
we get
f′′(t)y(x) = c²f(t)y′′(x),
or
f′′(t)/f(t) = c²y′′(x)/y(x) = −ω²,
where ω > 0 is the separation constant. We then have the separated
differential equations

f′′(t) + ω²f(t) = 0, (3.11)



and
\[
y''(x) + \frac{\omega^2}{c^2}y(x) = 0. \tag{3.12}
\]
Equation (3.12) can be written as an eigenvalue problem:
\[
-y''(x) = \frac{\omega^2}{c^2}y(x) = \lambda y(x), \tag{3.13}
\]
where, for the moment, the λ is unrestricted.
The solutions to Equation (3.12) are
\[
y(x) = \alpha\sin\Big(\frac{\omega}{c}x\Big) + \beta\cos\Big(\frac{\omega}{c}x\Big).
\]
For each arbitrary ω, the corresponding solutions of Equation (3.11) are

f (t) = γ sin(ωt) + δ cos(ωt).

In the vibrating string problem, the string is fixed at both ends, x = 0 and
x = L, so that
φ(0, t) = φ(L, t) = 0,
for all t. Therefore, we must have y(0) = y(L) = 0, so that the solutions
must have the form
\[
y(x) = A_m\sin\Big(\frac{\omega_m}{c}x\Big) = A_m\sin\Big(\frac{\pi m}{L}x\Big),
\]
where ω_m = πcm/L, for any positive integer m. Therefore, the boundary
conditions limit the choices for the separation constant ω, and thereby the
choices for λ. In addition, if the string is not moving at time t = 0, then

f (t) = δ cos(ωm t).

We want to focus on Equation (3.12).


What we have just seen is that the boundary conditions y(0) = y(L) = 0
limit the possible values of λ for which there can be solutions: we must
have
\[
\lambda = \lambda_m = \Big(\frac{\omega_m}{c}\Big)^2 = \Big(\frac{\pi m}{L}\Big)^2,
\]
for some positive integer m. The corresponding solutions
\[
y_m(x) = \sin\Big(\frac{\pi m}{L}x\Big)
\]
are the eigenfunctions. This is analogous to the linear algebra case, in
which Ax = λx, with x non-zero, only holds for special choices of λ.

In the vibrating string problem, we typically have the condition φ(x, 0) =
h(x), where h(x) is some function that describes the initial position of the
string. The problem that remains is to find a linear combination of the
eigenfunctions that satisfies this additional initial condition. Therefore, we
need to find coefficients Am so that

\[
h(x) = \sum_{m=1}^{\infty} A_m \sin\Big(\frac{\pi m}{L}x\Big). \tag{3.14}
\]

This again reminds us of finding a basis for a finite-dimensional vector space
consisting of eigenvectors of a given matrix. As we discuss in the chapter
Some Linear Algebra, this can be done only for a certain special kind
of matrices, the normal matrices, which includes the self-adjoint ones. The
property of a matrix being self-adjoint is one that we shall usefully extend
later to linear differential operators.
Orthogonality of the eigenfunctions ym (x) will help us find the coef-
ficients Am . In this case we know the ym (x) and can demonstrate their
orthogonality directly, using trigonometric identities. But we can also
demonstrate their orthogonality using only the fact that each ym solves
the eigenvalue problem for λm ; this sort of approach is what is done in
qualitative analysis. We multiply the equation
\[
y_m'' = -\lambda_m y_m
\]
by yn and the equation
\[
y_n'' = -\lambda_n y_n
\]
by ym and subtract, to get
\[
y_m'' y_n - y_n'' y_m = (\lambda_n - \lambda_m)(y_m y_n).
\]

Using
\[
y_m'' y_n - y_n'' y_m = (y_n y_m' - y_m y_n')',
\]
and integrating, we get
\[
0 = y_n(L)y_m'(L) - y_m(L)y_n'(L) - y_n(0)y_m'(0) + y_m(0)y_n'(0)
= (\lambda_n - \lambda_m)\int_0^L y_m(x)y_n(x)\,dx,
\]
so that
\[
\int_0^L y_m(x)y_n(x)\,dx = 0,
\]

for m ≠ n. Using this orthogonality of the ym (x), we can easily find the
coefficients Am .
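A numerical sketch (added here; L and h(x) are arbitrary choices)
confirming the orthogonality and using it, together with the standard fact
that the integral of sin²(πmx/L) over [0, L] equals L/2, to recover the
coefficients A_m:

```python
import numpy as np
from scipy.integrate import quad

Len = 1.0
y = lambda m, x: np.sin(np.pi * m * x / Len)

# Orthogonality: the integral vanishes for m != n.
print(quad(lambda x: y(2, x) * y(5, x), 0, Len)[0])  # ~0

# Recovering A_m for h(x) = 3 sin(pi x/L) - 0.5 sin(4 pi x/L):
# A_m = (2/L) * integral of h(x) sin(pi m x / L) over [0, L].
h = lambda x: 3 * y(1, x) - 0.5 * y(4, x)
for m in range(1, 6):
    A_m = (2 / Len) * quad(lambda x: h(x) * y(m, x), 0, Len)[0]
    print(m, round(A_m, 6))  # 3 at m = 1, -0.5 at m = 4, ~0 otherwise
```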

3.3 A Brief Discussion of Some Linear Algebra
In this section we review briefly some notions from linear algebra that we
shall need shortly. For more detail, see the chapter Some Linear Algebra.
Suppose that V is an N -dimensional complex vector space on which
there is defined an inner product, with the inner product of members a
and b denoted ⟨a, b⟩. For example, we take V = CN , the space of all N -
dimensional complex column vectors, with the inner product defined by
the complex dot product
\[
a \cdot b = b^\dagger a = \sum_{n=1}^{N} a_n \overline{b_n}, \tag{3.15}
\]
where b† is the conjugate transpose of b, the row vector whose entries are
the complex conjugates of the entries bn .


For any linear operator T : V → V we define the adjoint of T to be
the linear operator T∗ satisfying ⟨T a, b⟩ = ⟨a, T∗ b⟩, for all a and b in V . A
word of warning: If we change the inner product, the adjoint changes. We
consider two examples.

• Example 1: Let A be any N by N complex matrix, V = CN , and
define the linear operator T on V to be multiplication on the left by
A; that is,
T x = Ax,
for any vector x in V . If the inner product on V is the usual one
coming from the dot product, as in Equation (3.15), then T ∗ is the
operator defined by
T ∗ x = A† x,
where A† is the conjugate transpose of the matrix A.

• Example 2: If, on the other hand, we define an inner product on
V = CN by
⟨a, b⟩ = b†Qa,
where Q is a positive-definite Hermitian matrix, then T ∗ is the linear
operator defined by multiplication on the left by the matrix Q−1 A† Q.

Definition 3.1 Given V and the inner product, we say that a linear op-
erator T on V is self-adjoint if T ∗ = T .

For Example 1, T is self-adjoint if the associated matrix A is Hermitian,
that is, A† = A. For Example 2, T is self-adjoint if the associated matrix
A satisfies QA = A† Q.
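A quick numerical check of Example 2 (added here; the matrices are
random and purely illustrative) that Q⁻¹A†Q really does satisfy the
adjoint property ⟨T a, b⟩ = ⟨a, T∗ b⟩:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))

# Build a positive-definite Hermitian Q as M^dagger M + I.
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
Q = M.conj().T @ M + np.eye(N)

inner = lambda a, b: b.conj().T @ Q @ a    # <a, b> = b^dagger Q a
Astar = np.linalg.inv(Q) @ A.conj().T @ Q  # candidate adjoint Q^{-1} A^dagger Q

a = rng.normal(size=N) + 1j * rng.normal(size=N)
b = rng.normal(size=N) + 1j * rng.normal(size=N)
print(abs(inner(A @ a, b) - inner(a, Astar @ b)))  # ~0
```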

A non-zero vector u in V is said to be an eigenvector of T if there
is a constant λ such that T u = λu; then λ is called the eigenvalue of T
associated with the eigenvector u. We have the following important results
concerning self-adjoint linear operators.
Theorem 3.1 If T is self-adjoint on the inner product space V , then all
its eigenvalues are real numbers.
Proof: By the defining properties of an inner product, we have
⟨λx, y⟩ = λ⟨x, y⟩,
and
⟨x, λy⟩ = λ̄⟨x, y⟩.

Since T∗ = T , we have
λ⟨u, u⟩ = ⟨T u, u⟩ = ⟨u, T u⟩ = ⟨u, λu⟩ = λ̄⟨u, u⟩.
Therefore, λ = λ̄, and so λ is real.
Theorem 3.2 If λm ≠ λn are two eigenvalues of the self-adjoint lin-
ear operator T associated with eigenvectors um and un , respectively, then
⟨um , un⟩ = 0.
Proof: We have
λm⟨um , un⟩ = ⟨T um , un⟩ = ⟨um , T un⟩ = ⟨um , λn un⟩ = λ̄n⟨um , un⟩ = λn⟨um , un⟩,
since λn is real, by Theorem 3.1. Because λm ≠ λn, it follows that ⟨um , un⟩ = 0.

3.4 Preview of Coming Attractions


We have seen that the differential equation (3.1) can be reformulated in
several equivalent forms. The equation
\[
y''(x) + y(x) = \frac{d}{dx}\big(y'(x)\big) + y(x) = 0
\]
is simultaneously in the standard form, the Sturm-Liouville form, and the
normal form. As we shall see, considering its normal form will help us
uncover properties of solutions of Equation (3.8) for other functions q(x).
Under certain boundary conditions, the linear differential operator L given
by Equation (3.7) will be self-adjoint, its eigenvalues will then be real
numbers, and solutions to the eigenfunction problem will enjoy orthogo-
nality properties similar to those we just presented for the ym (x) solving
y′′(x) + y(x) = 0.
Some of our discussion of these subjects in later chapters is taken from
the book by Simmons [42]. Another good source that combines the math-
ematics with the history of the subject is the book by Gonzalez-Velasco
[19].
Chapter 4

Extra Credit Problems
(Chapters 2,3)

4.1 The Problems


• Chemical Reaction: Suppose that two chemical substances in so-
lution react together to form a compound. If the reaction occurs by
the collision and interaction of the molecules of the substances, we
expect the rate of formation of the compound to be proportional to
the number of collisions per unit of time, which in turn is jointly pro-
portional to the amounts of the substances that are untransformed.
A chemical reaction that proceeds in this manner is called a second-
order reaction, and this law of reaction is often referred to as the law
of mass action. Consider a second-order reaction in which x grams of
the compound contain ax grams of the first substance and bx grams
of the second, where a + b = 1. If there are aA grams of the first
substance and bB grams of the second present initially, and if x = 0
when t = 0, find x as a function of t. ([42], p. 18)
• Retarded Fall: If we assume that air exerts a resisting force propor-
tional to the velocity of a falling body, then the differential equation
of the motion is
\[
\frac{d^2 y}{dt^2} = g - c\,\frac{dy}{dt},
\]
where c > 0 is some constant. If the velocity v = dy/dt is zero when
t = 0, find the limiting or (terminal) velocity as t → +∞. If the
retarding force is proportional to the square of the velocity, then the
differential equation becomes
\[
\frac{d^2 y}{dt^2} = g - c\Big(\frac{dy}{dt}\Big)^2.
\]


Find the terminal velocity in this case. ([42], pp. 20, 24.)
• Escape Velocity: The force that gravity exerts on a body of mass
m at the surface of the earth is mg. In space, however, Newton’s law
of gravitation asserts that this force varies inversely as a square of the
distance to the earth’s center. If a projectile fired upward from the
surface is to keep √
traveling indefinitely, show that its initial velocity
must be at least 2gR, where R is the radius of the earth (about
4000 miles). This escape velocity is approximately 7 miles/second or
about 25,000 miles/hour. Hint: If x is the distance from the center
of the earth to the projectile and v = dx dt is its velocity, then

d2 x dv dv dx dv
= = =v .
dt2 dt dx dt dx
([42], p. 24)
Another way to view this problem is to consider an object falling to
earth from space. Calculate its velocity upon impact, as a function
of the distance to the center of the earth at the beginning of its
fall, neglecting all but gravity. Then calculate the upper limit of the
impact velocity as the distance goes to infinity.
• The Snowplow Problem: It began snowing on a certain morning
and the snow continued to fall steadily throughout the day. At noon
a snowplow started to clear a road, at a constant rate, in terms of the
volume of snow removed per hour. The snowplow cleared 2 miles by
2 p.m. and 1 more mile by 4 p.m. When did it start snowing? ([42],
p. 31)
• Torricelli’s Law: According to Torricelli’s Law, water in an open
tank will flow out through a small hole in the bottom with the speed
it would acquire in falling freely from the water level to the hole.
A hemispherical bowl of radius R is initially filled with water, and a
small circular hole of radius r is punched in the bottom at time t = 0.
How long does it take for the bowl to empty itself? ([42], p. 32)
• The Coffee and Cream Problem: The President and the Prime
Minister order coffee and receive cups of equal temperature at the
same time. The President adds a small amount of cool cream im-
mediately, but does not drink his coffee until 10 minutes later. The
Prime Minister waits ten minutes and then adds the same amount of
cool cream and begins to drink. Who drinks the hotter coffee? ([42],
p. 33)
• The Two Tanks Problem: A tank contains 50 gallons of brine in
which 25 pounds of salt are dissolved. Beginning at time t = 0, water
runs into this tank at the rate of 2 gallons/minute, and the mixture
flows out at the same rate through a second tank initially containing
50 gallons of pure water. When will the second tank contain the
greatest amount of salt? ([42], p. 62)
• Torricelli Again: A cylindrical tank is filled with water to a height
of D feet. At height h < D feet a small hole is drilled into the side of
the tank. According to Torricelli's Law, the horizontal velocity with
which the water spurts from the side of the tank is v = \sqrt{2g(D - h)}.
What is the distance d from the base of the tank to where the water
hits the ground? For fixed D, what are the possible values of d as
h varies? Given D and d, can we find h? This last question is an
example of an inverse problem ([24], pp. 26-27). We shall consider
more inverse problems below.
• The Well Problem: A rock is dropped into a well in which the
unknown water level is d feet below the top of the well. If we measure
the time lapse from the dropping of the rock until the hearing of the
splash, can we use this to determine d? ([24], p. 40)
• The Pool Table Problem: Suppose our ‘pool table’ is the unit
square {(x, y)|0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Suppose the cue ball is at
(x1 , y1 ) and the target ball is at (x2 , y2 ). In how many ways can we
hit the target ball with the cue ball using a ‘bank shot’, in which the
cue ball rebounds off the side of the table once before striking the
target ball? Now for a harder problem: there is no pool table now.
The cue ball is launched from the origin into the first quadrant at an
angle θ > 0 with the positive x-axis. It bounces off a straight line
and returns to the positive x-axis at the point r(θ), making an angle
ψ(θ) > 0. Can we determine the equation of the straight line from
this information? What if we do not know r(θ)? ([24], pp. 41-44)
• Torricelli, Yet Again!: A container is formed by revolving the
curve x = f (y) around the (vertical) y-axis. The container is filled
to a height of y and the water is allowed to run out through a hole
of cross-sectional area a in the bottom. The time it takes to drain
is T (y). How does the drain-time function T depend on the shape
function f ? Can we determine f if we know T ? How could we
approximate f from the values T (yn ), n = 1, ..., N ? ([24], pp. 59–66)
• Mixing Problems: Let q(t) denote the quantity of a pollutant in
a container at time t. Then the rate at which q(t) changes with time
is the difference between the rate at which the pollutant enters the
container and the rate at which it is removed. Suppose the container
has volume V , water with a concentration a of pollutant enters the
container at a rate r and the well-stirred mixture leaves the container
again at the rate r. Write the differential equation governing the
behavior of the function q(t). Suppose now that q(0) = 0 and a
and r are unknown. Show that they can be determined from two
measurements of q(t). If, instead, q(0) is also unknown, show that
all three parameters can be determined from three measurements of
q(t). ([24], pp. 92–96)
• Frictionless Sliding: A half-line begins at the origin and continues
into the fourth quadrant, making an angle of α with the positive x-
axis. A particle descends from the origin along this half-line, under
the influence of gravity and without resistance. Let C be a circle
contained in the third and fourth quadrants, passing through the
origin and tangent to the x-axis. Let T be the time required for
the particle to reach the point where the half-line again intersects C.
Show that T is independent of α and depends only on the radius of
the circle. ([24], pp. 96–102)
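
For readers who want to experiment with these problems numerically, here is a minimal Python sketch for the Retarded Fall problem promised above; the values g = 9.8 and c = 0.5 are assumptions chosen purely for illustration. It integrates v' = g − cv and compares the late-time velocity with the predicted terminal velocity g/c.

import numpy as np
from scipy.integrate import solve_ivp

g, c = 9.8, 0.5                    # illustrative values, not from the text

def rhs(t, v):
    # linear drag; replace with [g - c * v[0]**2] for the quadratic-drag case
    return [g - c * v[0]]

sol = solve_ivp(rhs, [0.0, 30.0], [0.0])
print(sol.y[0, -1], g / c)         # the two numbers should nearly agree

Changing one line in rhs lets one check the quadratic-drag answer, \sqrt{g/c}, in the same way.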
Chapter 5

Qualitative Analysis of ODEs (Chapters 2,3)

We are interested in second-order linear differential equations with possibly varying coefficients, as given in Equation (3.1), which we can also write as

y'' + P(x)y' + Q(x)y = 0.   (5.1)

Although we can find explicit solutions of Equation (5.1) in special cases, such as

y'' + y = 0,   (5.2)

generally, we will not be able to do this. Instead, we can try to answer cer-
tain questions about the behavior of the solution, without actually finding
the solution; such an approach is called qualitative analysis. The discussion
here is based on that in Simmons [42].

5.1 Existence and Uniqueness


We begin with the fundamental existence and uniqueness theorem for so-
lutions of Equation (5.1).

Theorem 5.1 Let P(x) and Q(x) be continuous functions on the interval [a, b]. If x_0 is any point in [a, b] and y_0 and y_0' are any real numbers, then there is a unique solution of Equation (5.1) satisfying the conditions y(x_0) = y_0 and y'(x_0) = y_0'.

The proof of this theorem is somewhat lengthy and we shall omit it here.


5.2 A Simple Example


We know that the solution to Equation (5.2) satisfying y(0) = 0 and y'(0) = 1 is y(x) = sin x; with y(0) = 1 and y'(0) = 0, the solution is y(x) = cos x. But suppose that we did not know these solutions; what could we find out without solving for them?
Suppose that y(x) = s(x) satisfies Equation (5.2), with s(0) = 0, s(π) = 0, and s'(0) = 1. As the graph of s(x) leaves the point (0, 0) with x increasing, the slope is initially s'(0) = 1, so the graph climbs above the x-axis. But since y''(x) = −y(x), the second derivative is negative for y(x) > 0, and becomes increasingly so as y(x) climbs higher; therefore, the derivative is decreasing from s'(0) = 1, eventually equaling zero, at say x = m, and continuing on to become negative. The function s(x) will be zero again at x = π, and, by symmetry, we have m = π/2.
Now let y(x) = c(x) solve Equation (5.2), but with c(0) = 1 and c'(0) = 0. Since y(x) = s(x) satisfies Equation (5.2), so does y(x) = s'(x), with s'(0) = 1 and s''(0) = 0. Therefore, by the uniqueness in Theorem 5.1, c(x) = s'(x). Since the derivative of the function s(x)^2 + c(x)^2 is zero, this function must be equal to one for all x. In the section that follows, we shall investigate the zeros of solutions.

5.3 The Sturm Separation Theorem


Theorem 5.2 Let y1 (x) and y2 (x) be linearly independent solutions of
Equation (5.1). Then their zeros are distinct and occur alternately.

Proof: We know that the solutions y_1(x) and y_2(x) are linearly independent if and only if the Wronskian

W(x, y_1, y_2) = y_1(x)y_2'(x) - y_2(x)y_1'(x)

is different from zero for all x in the interval [a, b]. Therefore, when the two functions are linearly independent, the function W(x, y_1, y_2) must have constant sign on the interval [a, b], so the two functions y_1(x) and y_2(x) can have no common zero. Suppose that y_2(x_1) = y_2(x_2) = 0, with x_1 < x_2 successive zeros of y_2(x), and suppose, in addition, that y_2(x) > 0 in the interval (x_1, x_2). Then y_2'(x_1) > 0 and y_2'(x_2) < 0. Since W(x, y_1, y_2) = y_1(x)y_2'(x) at x_1 and x_2, and W has the same sign at both points, y_1(x_1) and y_1(x_2) must have opposite signs, and so, by the Intermediate Value Theorem, y_1(x) must have a zero between x_1 and x_2.

5.4 From Standard to Normal Form


Equation (5.1) is called the standard form of the differential equation. To put the equation into normal form, by which we mean an equation of the form

u''(x) + q(x)u(x) = 0,   (5.3)

we write y(x) = u(x)v(x). Inserting this product into Equation (5.1), we obtain

v\,u'' + (2v' + Pv)\,u' + (v'' + Pv' + Qv)\,u = 0.

With

v = \exp\Bigl(-\frac{1}{2}\int P\,dx\Bigr),

the coefficient of u' becomes zero. Now we set

q(x) = Q(x) - \frac{1}{4}P(x)^2 - \frac{1}{2}P'(x),

to get

u''(x) + q(x)u(x) = 0.
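
The reduction just described is easy to automate. Here is a short symbolic sketch (sympy) of the map from (P, Q) to q; applied to Bessel's Equation, whose standard form has P(x) = 1/x and Q(x) = 1 − ν^2/x^2, it reproduces the normal form that appears in the next section.

import sympy as sp

x, nu = sp.symbols('x nu', positive=True)

def normal_form_q(P, Q):
    # q = Q - P**2/4 - P'/2, as derived above
    return sp.simplify(Q - P**2 / 4 - sp.diff(P, x) / 2)

print(normal_form_q(1 / x, 1 - nu**2 / x**2))
# the output is equivalent to 1 + (1 - 4*nu**2)/(4*x**2)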

5.5 On the Zeros of Solutions


We assume now that u(x) is a non-trivial solution of Equation (5.3). As we shall show shortly, if q(x) < 0 and u(x) satisfies Equation (5.3), then u(x) has at most one zero; for example, the equation

u''(x) - u(x) = 0

has e^x and e^{-x} for solutions. Since we are interested in oscillatory solutions, we restrict q(x) to be (eventually) positive. With q(x) > 0 and

\int_1^\infty q(x)\,dx = \infty,

the solution u(x) will have infinitely many zeros, but only finitely many on any bounded interval.

Theorem 5.3 If q(x) < 0 for all x, then u(x) has at most one zero.

Proof: Let u(x_0) = 0. Since u(x) is not identically zero, we must have u'(x_0) ≠ 0, by Theorem 5.1. Assume, therefore, that u'(x) > 0 for x in the interval [x_0, x_0 + ε], where ε is some positive number. Since u''(x) = -q(x)u(x), we know that u''(x) > 0 also, for x in the interval [x_0, x_0 + ε]. So the slope of u(x) is increasing to the right of x_0, and so there can be no zero of u(x) to the right of x_0. A similar argument shows that there can be no zeros of u(x) to the left of x_0.

Theorem 5.4 If q(x) > 0 for all x > 0 and \int_1^\infty q(x)\,dx = \infty, then u(x) has infinitely many positive zeros.

Proof: Assume, to the contrary, that u(x) has only finitely many positive zeros, and that there are no positive zeros to the right of the positive number x_0. Assume also that u(x_0) > 0. From u''(x) = -q(x)u(x) we know that the slope of u(x) is decreasing to the right of x_0, so long as u(x) remains above the x-axis. If the slope ever becomes negative, the graph of u(x) will continue to drop at an ever increasing rate and will have to cross the x-axis at some point to the right of x_0. Therefore, to avoid having a root beyond x_0, the slope must remain positive. We prove the theorem by showing that the slope eventually becomes negative.
Let v(x) = -u'(x)/u(x), for x ≥ x_0. Then v'(x) = q(x) + v(x)^2, and

v(x) - v(x_0) = \int_{x_0}^{x} q(s)\,ds + \int_{x_0}^{x} v(s)^2\,ds.

Since

\int_1^\infty q(x)\,dx = \infty,

we see that v(x) must eventually become positive, as x → ∞. Therefore, u'(x) and u(x) eventually have opposite signs. Since we are assuming that u(x) remains positive to the right of x_0, it follows that u'(x) becomes negative somewhere to the right of x_0.

5.6 Sturm Comparison Theorem


Solutions to

y'' + 4y = 0

oscillate faster than solutions of Equation (5.2). This leads to the Sturm Comparison Theorem.

Theorem 5.5 Let y'' + q(x)y = 0 and z'' + r(x)z = 0, with 0 < r(x) < q(x), for all x. Then between any two zeros of z(x) there is a zero of y(x).

5.6.1 Bessel’s Equation


Bessel's Equation is

x^2 y'' + x y' + (x^2 - \nu^2)y = 0.   (5.4)

In normal form, it becomes

u'' + \Bigl(1 + \frac{1 - 4\nu^2}{4x^2}\Bigr)u = 0.   (5.5)

Information about the zeros of solutions of Bessel's Equation can be obtained by using Sturm's Comparison Theorem and comparing with solutions of Equation (5.2).
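
For a concrete illustration (an aside, using scipy): when ν = 0 the normal form above has q(x) = 1 + 1/(4x^2) > 1, so by the Sturm Comparison Theorem, and by Lemma 5.2 in the next section with m = 1, consecutive zeros of a solution should be less than π apart, with the spacing approaching π as x grows. Since u and J_0 share their positive zeros, the zeros of J_0 behave exactly this way.

import numpy as np
from scipy.special import jn_zeros

z = jn_zeros(0, 8)       # first 8 positive zeros of the Bessel function J_0
print(np.diff(z))         # every gap is below pi and increases toward it
print(np.pi)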

5.7 Analysis of y'' + q(x)y = 0


Using the Sturm Comparison Theorem, we can prove the following lemma.

Lemma 5.1 Let y'' + q(x)y = 0 and z'' + r(x)z = 0, with 0 < r(x) < q(x). Let y(b_0) = z(b_0) = 0 and z(b_j) = 0, with b_j < b_{j+1}, for j = 1, 2, .... Then y has at least as many zeros as z in [b_0, b_n]. If y(a_j) = 0, for b_0 < a_1 < a_2 < ..., then a_n < b_n.

Lemma 5.2 Suppose that 0 < m^2 < q(x) < M^2 on [a, b], and y(x) solves y'' + q(x)y = 0 on [a, b]. If x_1 and x_2 are successive zeros of y(x), then

\frac{\pi}{M} < x_2 - x_1 < \frac{\pi}{m}.

If y(a) = y(b) = 0 and y(x) = 0 for n - 1 other points in (a, b), then

\frac{m(b-a)}{\pi} < n < \frac{M(b-a)}{\pi}.

Lemma 5.3 Let y_\lambda solve

y'' + \lambda q(x)y = 0,

with y_\lambda(a) = 0 and y_\lambda'(a) = 1. Then there exist \lambda_1 < \lambda_2 < ..., converging to +\infty, such that y_\lambda(b) = 0 if and only if \lambda = \lambda_n, for some n. The solution y_{\lambda_n} has exactly n - 1 roots in (a, b).
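
Lemma 5.3 suggests a numerical "shooting" experiment: integrate y'' + λq(x)y = 0 from y(a) = 0, y'(a) = 1 and search for the values of λ at which y(b) = 0. The sketch below (my construction, not from the text) takes q(x) = 1 on [0, 1], where the eigenvalues are known to be λ_n = (nπ)^2.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

def y_at_b(lam, a=0.0, b=1.0):
    # integrate y'' + lam*y = 0 (q = 1) with y(a) = 0, y'(a) = 1
    sol = solve_ivp(lambda x, y: [y[1], -lam * y[0]],
                    [a, b], [0.0, 1.0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

lams = [brentq(y_at_b, (n * np.pi)**2 - 1.0, (n * np.pi)**2 + 1.0)
        for n in (1, 2, 3)]
print(np.round(lams, 6))                        # close to (n*pi)^2
print([round((n * np.pi)**2, 6) for n in (1, 2, 3)])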

5.8 Toward the 20th Century


The goal in qualitative analysis is to learn something about the solutions of
a differential equation by examining its form, rather than actually finding
solutions. We do this by exploiting similarities between one equation and
another; in other words, we study classes of differential equations all at
once. This is what we did earlier, when we studied problems of the Sturm-
Liouville type. The simplest boundary-value problem,

y''(x) + \lambda y(x) = 0,

with y(0) = y(L) = 0, can be solved explicitly. Its eigenfunction solutions are y_m(x) = \sin(m\pi x/L), which are orthogonal over the interval [0, L], with respect to the inner product defined by

\langle f, g\rangle = \int_0^L f(x)g(x)\,dx.

This suggests that other differential equations that can be written in Sturm-
Liouville form may have eigenfunction solutions that are also orthogonal,
with respect to some appropriate inner product. As we have seen, this
program works out beautifully. What is happening here is a transition from
classical applied mathematics, with its emphasis on particular problems
and equations, to a more modern, 20th century style mathematics, with
an emphasis on families of functions or even more abstract inner-product
spaces, Hilbert spaces, Banach spaces, and so on.
Chapter 6

The Trans-Atlantic Cable (Chapters 4,12)

6.1 Introduction
In 1815, at the end of the war with England, the US was a developing coun-
try, with most people living on small farms, eating whatever they could
grow themselves. Only those living near navigable water could market
their crops. Poor transportation and communication kept them isolated.
By 1848, at the end of the next war, this time with Mexico, things were
different. The US was a transcontinental power, integrated by railroads,
telegraph, steamboats, the Erie Canal, and innovations in mass production
and agriculture. In 1828, the newly elected President, Andrew Jackson,
arrived in Washington by horse-drawn carriage; he left in 1837 by train.
The most revolutionary change was in communication, where the recent
advances in understanding electromagnetism produced the telegraph. It
wasn’t long before efforts began to lay a telegraph cable under the At-
lantic Ocean, even though some wondered what England and the US could
possibly have to say to one another.
The laying of the trans-Atlantic cable was, in many ways, the 19th
century equivalent of landing a man on the moon, involving, as it did,
considerable expense, too frequent failure, and a level of precision in en-
gineering design and manufacturing never before attempted. From a sci-
entific perspective, it was probably more difficult, given that the study of
electromagnetism was in its infancy at the time.
Early on, Faraday and others worried that sending a message across a
vast distance would take a long time, but they reasoned, incorrectly, that
this would be similar to filling a very long hose with water. What they
did not realize initially was that, as William Thomson was to discover,

the transmission of a pulse through an undersea cable was described more


by a heat equation than a wave equation. This meant that a signal that
started out as a sharp pulse would be spread out as time went on, making
communication extremely slow. The problem was the increased capacitance to the ground.
Somewhat later, Oliver Heaviside realized that, when all four of the
basic elements of the electrical circuit, the inductance, the resistance, the
conductance to the ground and the capacitance to the ground, were consid-
ered together, it might be possible to adjust these parameters, in particular,
to increase the inductance, so as to produce undistorted signals. Heaviside
died in poverty, but his ideas eventually were adopted.
In 1858 Queen Victoria sent President Buchanan a 99-word greeting
using an early version of the cable, but the message took over sixteen hours
to be received. By 1866 one could transmit eight words a minute along a
cable that stretched from Ireland to Newfoundland, at a cost of about 1500
dollars per word in today’s money. With improvements in insulation, using
gutta percha, a gum from a tropical tree also used to make golf balls, and
the development of magnetic alloys that increased the inductance of the
cable, messages could be sent faster and more cheaply.
In this chapter we survey the development of the mathematics of the
problem. We focus, in particular, on the partial differential equations that
were used to describe the transmission problem. What we give here is a
brief glimpse; more detailed discussion of this problem is found in the books
by Körner [32], Gonzalez-Velasco [19], and Wylie [47].

6.2 The Electrical Circuit ODE


We begin with the ordinary differential equation that describes the hori-
zontal motion of a block of wood attached to a spring. We let x(t) be the
position of the block relative to the equilibrium position x = 0, with x(0)
and x'(0) denoting the initial position and velocity of the block. When an
external force f (t) is imposed, a portion of this force is devoted to over-
coming the inertia of the block, a portion to compressing or stretching
the spring, and the remaining portion to resisting friction. Therefore, the
differential equation describing the motion is

mx''(t) + ax'(t) + kx(t) = f(t),   (6.1)

where m is the mass of the block, a the coefficient of friction, and k the
spring constant.
The charge Q(t) deposited on a capacitor in an electrical circuit due to
an imposed electromotive force E(t) is similarly described by the ordinary
differential equation

LQ''(t) + RQ'(t) + \frac{1}{C}Q(t) = E(t).   (6.2)
The first term, containing the inductance coefficient L, describes the portion of the force E(t) devoted to overcoming the effect of a change in the current I(t) = Q'(t); here L is analogous to the mass m. The second term, containing the resistance coefficient R, describes that portion of the force E(t) needed to overcome resistance to the current I(t); now R is analogous to the friction coefficient a. Finally, the third term, containing the reciprocal of the capacitance C, describes the portion of E(t) used to store charge on the capacitor; now 1/C is analogous to k, the spring constant.
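
As an aside, Equation (6.2) is easy to explore numerically; the following Python sketch (the circuit values and driving force are invented for illustration) rewrites it as a first-order system in the charge Q and the current I = Q'.

import numpy as np
from scipy.integrate import solve_ivp

L_, R_, C_ = 1.0, 0.5, 2.0        # assumed inductance, resistance, capacitance

def E(t):
    return np.sin(2.0 * t)         # an assumed electromotive force

def rhs(t, y):
    Q, I = y                       # I = Q' is the current
    return [I, (E(t) - R_ * I - Q / C_) / L_]

sol = solve_ivp(rhs, [0.0, 40.0], [0.0, 0.0], max_step=0.01)
print(sol.y[0, -1])                # the charge on the capacitor at t = 40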

6.3 The Telegraph Equation


The objective here is to describe the behavior of u(x, t), the voltage at
location x along the cable, at time t. In the beginning, it was believed that
the partial differential equation describing the voltage would be the wave
equation
u_{xx} = \alpha^2 u_{tt}.
If this were the case, an initial pulse

E(t) = H(t) − H(t − T )

would move along the cable undistorted; here H(t) is the Heaviside function
that is zero for t < 0 and one for t ≥ 0. Thomson (later Sir William
Thomson, and even later, Lord Kelvin) thought otherwise.
Thomson argued that there would be a voltage drop over an interval
[x, x+∆x] due to resistance to the current i(x, t) passing through the cable,
so that
u(x + ∆x, t) − u(x, t) = −Ri(x, t)∆x,
and so
\frac{\partial u}{\partial x} = -Ri.
He also argued that there would be capacitance to the ground, made more
significant under water. Since the apparent change in current due to the
changing voltage across the capacitor is

i(x + ∆x, t) − i(x, t) = -C\,u_t(x, t)\,∆x,

we have

\frac{\partial i}{\partial x} = -C\,\frac{\partial u}{\partial t}.

Eliminating the i(x, t), we can write

u_{xx}(x, t) = CR\,u_t(x, t),   (6.3)

which is the heat equation, not the wave equation.

6.4 Consequences of Thomson’s Model


To see what Thomson’s model predicts, we consider the following problem.
Suppose we have a semi-infinite cable, that the voltage is u(x, t) for x ≥ 0,
and t ≥ 0, and that u(0, t) = E(t). Let U (x, s) be the Laplace transform
of u(x, t), viewed as a function of t. Then, from Thomson’s model we have

U(x, s) = \mathcal{L}(E)(s)\,e^{-\sqrt{CRs}\,x},

where \mathcal{L}(E)(s) denotes the Laplace transform of E(t). Since U(x, s) is the product of two functions of s, the convolution theorem applies. But first, it is helpful to find out which function has for its Laplace transform the function e^{-b\sqrt{s}}. The answer comes from the following fact: the function

\frac{b\,e^{-b^2/4t}}{2\sqrt{\pi}\,t^{3/2}}

has for its Laplace transform the function e^{-b\sqrt{s}}. Therefore, we can write

u(x, t) = \frac{\sqrt{CR}\,x}{2\sqrt{\pi}} \int_0^t E(t - \tau)\,\frac{e^{-CRx^2/4\tau}}{\tau\sqrt{\tau}}\,d\tau.
Now we consider two special cases.

6.4.1 Special Case 1: E(t) = H(t)


Suppose now that E(t) = H(t), the Heaviside function. Using the substitution

z^2 = CRx^2/4\tau,

we find that

u(x, t) = 1 - \frac{2}{\sqrt{\pi}} \int_0^{\sqrt{CR}\,x/(2\sqrt{t})} e^{-z^2}\,dz.   (6.4)

The function

\mathrm{erf}(r) = \frac{2}{\sqrt{\pi}} \int_0^r e^{-z^2}\,dz

is the well-known error function, so we can write

u(x, t) = 1 - \mathrm{erf}\Bigl(\frac{\sqrt{CR}\,x}{2\sqrt{t}}\Bigr).   (6.5)

6.4.2 Special Case 2: E(t) = H(t) − H(t − T )


Now suppose that E(t) is the pulse H(t) − H(t − T). Using the results from the previous subsection, we find that, for t > T,

u(x, t) = \mathrm{erf}\Bigl(\frac{\sqrt{CR}\,x}{2\sqrt{t - T}}\Bigr) - \mathrm{erf}\Bigl(\frac{\sqrt{CR}\,x}{2\sqrt{t}}\Bigr).   (6.6)

For fixed x, u(x, t) is proportional to the area under the function e^{-z^2}, over an interval that, as time goes on, moves steadily to the left and decreases in length. For small t the interval involves only large z, where the function e^{-z^2} is nearly zero and the integral is nearly zero. As t increases, the interval of integration moves to the left, so that the integrand grows larger, but the length of the interval grows smaller. The net effect is that the voltage at x increases gradually over time, and then decreases gradually; the sharp initial pulse is smoothed out in time.
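
Equation (6.6) is simple to evaluate with scipy's error function; the sketch below (the constants are invented for illustration) shows the gradual rise and decay of the voltage at a fixed point x.

import numpy as np
from scipy.special import erf

CR, x, T = 1.0, 5.0, 1.0                      # assumed constants
t = np.linspace(T + 0.01, 30.0, 8)
a = np.sqrt(CR) * x
u = erf(a / (2.0 * np.sqrt(t - T))) - erf(a / (2.0 * np.sqrt(t)))
for ti, ui in zip(t, u):
    print(f"t = {ti:6.2f}   u = {ui:.4f}")     # rises gradually, then decays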

6.5 Heaviside to the Rescue


It seemed that Thomson had solved the mathematical problem and discov-
ered why the behavior was not wave-like. Since it is not really possible to
reduce the resistance along the cable, and capacitance to the ground would
probably remain a serious issue, particularly under water, it appeared that
little could be done to improve the situation. But Heaviside had a solution.
Heaviside argued that Thomson had ignored two other circuit components, the leakage of current to the ground, and the self-inductance of the cable. He revised Thomson's equations, obtaining

u_x = -L\,i_t - R\,i,

and

i_x = -C\,u_t - G\,u,

where L is the inductance and G is the coefficient of leakage of current to the ground. The partial differential equation governing u(x, t) now becomes

u_{xx} = LC\,u_{tt} + (LG + RC)\,u_t + RG\,u,   (6.7)
which is the formulation used by Kirchhoff. As Körner remarks, never
before had so much money been riding on the solution of one partial dif-
ferential equation.

6.5.1 A Special Case: G = 0


If we take G = 0, thereby assuming that no current passes into the ground,
the partial differential equation becomes
u_{xx} = LC\,u_{tt} + RC\,u_t,   (6.8)
or

\frac{1}{CL}\,u_{xx} = u_{tt} + \frac{R}{L}\,u_t.   (6.9)
If R/L could be made small, we would have a wave equation again, but
with a propagation speed of 1/\sqrt{CL}. This suggested to Heaviside that one
way to obtain undistorted signaling would be to increase L, since we cannot
realistically hope to change R. He argued for years for the use of cables
with higher inductance, which eventually became the practice, helped along
by the invention of new materials, such as magnetic alloys, that could be
incorporated into the cables.

6.5.2 Another Special Case


Assume now that E(t) is the pulse. Applying the Laplace transform method
described earlier to Equation (6.7), we obtain

U_{xx}(x, s) = (Cs + G)(Ls + R)\,U(x, s) = \lambda^2 U(x, s),

from which we get

U(x, s) = A(s)e^{\lambda x} + \Bigl(\frac{1}{s}\bigl(1 - e^{-Ts}\bigr) - A(s)\Bigr)e^{-\lambda x}.

If it happens that GL = CR, we can solve easily for \lambda:

\lambda = \sqrt{CL}\,s + \sqrt{GR}.

Then we have

U(x, s) = e^{-\sqrt{GR}\,x}\,\frac{1}{s}\bigl(1 - e^{-Ts}\bigr)\,e^{-\sqrt{CL}\,xs},

so that

u(x, t) = e^{-\sqrt{GR}\,x}\Bigl(H\bigl(t - x\sqrt{CL}\bigr) - H\bigl(t - T - x\sqrt{CL}\bigr)\Bigr).   (6.10)

This tells us that we have an undistorted pulse that arrives at the point x at the time t = x\sqrt{CL}.
In order to have GL = CR, we need L = CR/G. Since C and R are more or less fixed, and G is typically reduced by insulation, L will need to be large. Again, this argues for increasing the inductance in the cable.
Chapter 7

The Laplace Transform and the Ozone Layer (Chapter 4)

In farfield propagation problems, we often find the measured data to be


related to the desired object function by a Fourier transformation. The
image reconstruction problem then becomes one of estimating a function
from finitely many noisy values of its Fourier transform. In this chapter we
consider an inverse problem involving the Laplace transform. The example
is taken from Twomey’s book [44].

7.1 The Laplace Transform


The Laplace transform of the function f(x) defined for 0 ≤ x < +∞ is the function

\mathcal{F}(s) = \int_0^{+\infty} f(x)\,e^{-sx}\,dx.

7.2 Scattering of Ultraviolet Radiation


The sun emits ultraviolet (UV) radiation that enters the Earth’s atmo-
sphere at an angle θ0 that depends on the sun’s position, and with intensity
I(0). Let the x-axis be vertical, with x = 0 at the top of the atmosphere
and x increasing as we move down to the Earth’s surface, at x = X. The
intensity at x is given by

I(x) = I(0)\,e^{-kx/\cos\theta_0}.


Within the ozone layer, the amount of UV radiation scattered in the direc-
tion θ is given by
S(\theta, \theta_0)\,I(0)\,e^{-kx/\cos\theta_0}\,∆p,
where S(θ, θ0 ) is a known parameter, and ∆p is the change in the pressure
of the ozone within the infinitesimal layer [x, x+∆x], and so is proportional
to the concentration of ozone within that layer.

7.3 Measuring the Scattered Intensity


The radiation scattered at the angle θ then travels to the ground, a distance
of X − x, weakened along the way, and reaches the ground with intensity

S(\theta, \theta_0)\,I(0)\,e^{-kx/\cos\theta_0}\,e^{-k(X-x)/\cos\theta}\,∆p.

The total scattered intensity at angle θ is then a superposition of the intensities due to scattering at each of the thin layers, and is then

\int_0^X S(\theta, \theta_0)\,I(0)\,e^{-kX/\cos\theta}\,e^{-x\beta}\,dp,

where

\beta = k\Bigl[\frac{1}{\cos\theta_0} - \frac{1}{\cos\theta}\Bigr].

This superposition of intensity can then be written as

S(\theta, \theta_0)\,I(0)\,e^{-kX/\cos\theta} \int_0^X e^{-x\beta}\,p'(x)\,dx.

7.4 The Laplace Transform Data


Using integration by parts, we get

\int_0^X e^{-x\beta}\,p'(x)\,dx = p(X)e^{-\beta X} - p(0) + \beta \int_0^X e^{-\beta x}\,p(x)\,dx.

Since p(0) = 0 and p(X) can be measured, our data is then the Laplace transform value

\int_0^{+\infty} e^{-\beta x}\,p(x)\,dx;

note that we can replace the upper limit X with +∞ if we extend p(x) as
zero beyond x = X.
The variable β depends on the two angles θ and θ0 . We can alter θ as
we measure and θ0 changes as the sun moves relative to the earth. In this
way we get values of the Laplace transform of p(x) for various values of β.

The problem then is to recover p(x) from these values. Because the Laplace
transform involves a smoothing of the function p(x), recovering p(x) from
its Laplace transform is more ill-conditioned than is the Fourier transform
inversion problem.
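
To make the discussion concrete, here is a small forward-model sketch in Python (the profile p(x) is invented purely for illustration): it computes the handful of smooth numbers, values of the Laplace transform of p, that the scattering measurements would provide.

import numpy as np

x = np.linspace(0.0, 10.0, 2001)          # x = 0 at the top, X = 10 at the ground
dx = x[1] - x[0]
p = np.exp(-0.5 * (x - 4.0)**2)           # an assumed ozone pressure profile
betas = [0.1, 0.2, 0.5, 1.0, 2.0]
data = [np.sum(np.exp(-b * x) * p) * dx for b in betas]
print(np.round(data, 5))

Because all of these values are heavily smoothed versions of p(x), quite different profiles can produce nearly identical data, which is one way to see the ill-conditioning just mentioned.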
Chapter 8

The Finite Fourier Transform (Chapter 7)

8.1 Fourier Series


Suppose that f(x) is a real or complex function defined for 0 ≤ x ≤ 2A, with Fourier series representation

f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty}\Bigl(a_k\cos\bigl(\frac{\pi}{A}kx\bigr) + b_k\sin\bigl(\frac{\pi}{A}kx\bigr)\Bigr).   (8.1)

Then the Fourier coefficients a_k and b_k are

a_k = \frac{1}{A}\int_0^{2A} f(x)\cos\bigl(\frac{\pi}{A}kx\bigr)\,dx,   (8.2)

and

b_k = \frac{1}{A}\int_0^{2A} f(x)\sin\bigl(\frac{\pi}{A}kx\bigr)\,dx.   (8.3)
To obtain the Fourier coefficients we need to know f (x) for all x in the
interval [0, 2A]. In a number of applications, we do not have complete
knowledge of the function f (x), but rather, we have measurements of f (x)
taken at a finite number of values of the variable x. In such circumstances,
the finite Fourier transform can be used in place of Fourier series.

8.2 Linear Trigonometric Models


A popular finite-parameter model is to consider f(x) as a finite sum of trigonometric functions. For example, we may assume that f(x) is a function of the form

f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{M}\Bigl(a_k\cos(\omega_k x) + b_k\sin(\omega_k x)\Bigr),   (8.4)

where the \omega_k are known, but the a_k and b_k are not. We find the unknown a_k and b_k by fitting the model to the data. We obtain data f(x_n) corresponding to the N points x_n, for n = 0, 1, ..., N − 1, where N = 2M + 1, and we solve the system

f(x_n) = \frac{1}{2}a_0 + \sum_{k=1}^{M}\Bigl(a_k\cos(\omega_k x_n) + b_k\sin(\omega_k x_n)\Bigr),

for n = 0, ..., N − 1, to find the a_k and b_k.


When M is large, calculating the coefficients can be time-consuming.
One particular choice for the xn and ωk reduces the computation time
significantly.

8.2.1 Equi-Spaced Frequencies


It is often the case that we can choose the xn at which we evaluate or
measure the function f (x). We suppose now that we have selected N =
2M + 1 evaluation points equi-spaced from x = 0 to x = 2A; that is, xn =
2An π
N , for n = 0, ..., N −1. Now let us select ωk = A k, for k = 1, ..., M . These
are M values of the variable ω, equi-spaced within the interval (0, MAπ ]. Our
model for the function f (x) is now
M 
1 X π π 
f (x) = a0 + ak cos( kx) + bk sin( kx) . (8.5)
2 A A
k=1

In keeping with the common notation, we write fn = f ( 2An


N ) for n =
0, ..., N − 1. Then we have to solve the system
M 
1 X 2π 2π 
fn = a0 + ak cos( kn) + bk sin( kn) , (8.6)
2 N N
k=1

for n = 0, ..., N − 1, to find the N coefficients a0 and ak and bk , for k =


1, ..., M . These N coefficients are known, collectively, as the finite Fourier
transform of the data.

8.2.2 Simplifying the Calculations


Calculating the solution of a system of N linear equations in N unknowns
generally requires the number of multiplications to be on the order of N^3. As we shall see in this subsection, choosing \omega_k = \frac{\pi}{A}k leads to a form of orthogonality that will allow us to calculate the parameters in a relatively simple manner, with the number of multiplications on the order of N^2.
Later, we shall see how to use the fast Fourier transform algorithm to
reduce the number of computations even more.
For fixed j = 1, ..., M consider the sums

\sum_{n=0}^{N-1} f_n\cos\bigl(\frac{2\pi}{N}jn\bigr) = \frac{1}{2}a_0\sum_{n=0}^{N-1}\cos\bigl(\frac{2\pi}{N}jn\bigr)
+ \sum_{k=1}^{M}\Bigl(a_k\sum_{n=0}^{N-1}\cos\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr)
+ b_k\sum_{n=0}^{N-1}\sin\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr)\Bigr),   (8.7)

and

\sum_{n=0}^{N-1} f_n\sin\bigl(\frac{2\pi}{N}jn\bigr) = \frac{1}{2}a_0\sum_{n=0}^{N-1}\sin\bigl(\frac{2\pi}{N}jn\bigr)
+ \sum_{k=1}^{M}\Bigl(a_k\sum_{n=0}^{N-1}\cos\bigl(\frac{2\pi}{N}kn\bigr)\sin\bigl(\frac{2\pi}{N}jn\bigr)
+ b_k\sum_{n=0}^{N-1}\sin\bigl(\frac{2\pi}{N}kn\bigr)\sin\bigl(\frac{2\pi}{N}jn\bigr)\Bigr).   (8.8)

We want to obtain the following:

Lemma 8.1 For N = 2M + 1 and j, k = 1, 2, ..., M, we have

\sum_{n=0}^{N-1} \sin\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr) = 0,

\sum_{n=0}^{N-1} \cos\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr) =
\begin{cases} 0, & \text{if } j \neq k;\\ N/2, & \text{if } j = k \neq 0;\\ N, & \text{if } j = k = 0; \end{cases}

and

\sum_{n=0}^{N-1} \sin\bigl(\frac{2\pi}{N}kn\bigr)\sin\bigl(\frac{2\pi}{N}jn\bigr) =
\begin{cases} 0, & \text{if } j \neq k, \text{ or } j = k = 0;\\ N/2, & \text{if } j = k \neq 0. \end{cases}

Exercise 8.1 Using trigonometric identities, show that

\cos\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr) = \frac{1}{2}\Bigl(\cos\bigl(\frac{2\pi}{N}(k+j)n\bigr) + \cos\bigl(\frac{2\pi}{N}(k-j)n\bigr)\Bigr),

\sin\bigl(\frac{2\pi}{N}kn\bigr)\cos\bigl(\frac{2\pi}{N}jn\bigr) = \frac{1}{2}\Bigl(\sin\bigl(\frac{2\pi}{N}(k+j)n\bigr) + \sin\bigl(\frac{2\pi}{N}(k-j)n\bigr)\Bigr),

and

\sin\bigl(\frac{2\pi}{N}kn\bigr)\sin\bigl(\frac{2\pi}{N}jn\bigr) = -\frac{1}{2}\Bigl(\cos\bigl(\frac{2\pi}{N}(k+j)n\bigr) - \cos\bigl(\frac{2\pi}{N}(k-j)n\bigr)\Bigr).

Exercise 8.2 Use trigonometric identities to show that

\sin\bigl((n + \tfrac{1}{2})x\bigr) - \sin\bigl((n - \tfrac{1}{2})x\bigr) = 2\sin\bigl(\tfrac{x}{2}\bigr)\cos(nx),

and

\cos\bigl((n + \tfrac{1}{2})x\bigr) - \cos\bigl((n - \tfrac{1}{2})x\bigr) = -2\sin\bigl(\tfrac{x}{2}\bigr)\sin(nx).

Exercise 8.3 Use the previous exercise to show that

2\sin\bigl(\tfrac{x}{2}\bigr)\sum_{n=0}^{N-1}\cos(nx) = \sin\bigl((N - \tfrac{1}{2})x\bigr) + \sin\bigl(\tfrac{x}{2}\bigr),

and

2\sin\bigl(\tfrac{x}{2}\bigr)\sum_{n=0}^{N-1}\sin(nx) = \cos\bigl(\tfrac{x}{2}\bigr) - \cos\bigl((N - \tfrac{1}{2})x\bigr).

Hints: sum over n = 0, 1, ..., N − 1 on both sides and note that \sin(\tfrac{x}{2}) = -\sin(-\tfrac{x}{2}).

Exercise 8.4 Use trigonometric identities to show that

\sin\bigl((N - \tfrac{1}{2})x\bigr) + \sin\bigl(\tfrac{x}{2}\bigr) = 2\cos\bigl(\tfrac{N-1}{2}x\bigr)\sin\bigl(\tfrac{N}{2}x\bigr),

and

\cos\bigl(\tfrac{x}{2}\bigr) - \cos\bigl((N - \tfrac{1}{2})x\bigr) = 2\sin\bigl(\tfrac{N}{2}x\bigr)\sin\bigl(\tfrac{N-1}{2}x\bigr).

Hints: Use N - \tfrac{1}{2} = \tfrac{N}{2} + \tfrac{N-1}{2} and \tfrac{1}{2} = \tfrac{N}{2} - \tfrac{N-1}{2}.

Exercise 8.5 Use the previous exercises to show that

\sin\bigl(\tfrac{x}{2}\bigr)\sum_{n=0}^{N-1}\cos(nx) = \sin\bigl(\tfrac{N}{2}x\bigr)\cos\bigl(\tfrac{N-1}{2}x\bigr),

and

\sin\bigl(\tfrac{x}{2}\bigr)\sum_{n=0}^{N-1}\sin(nx) = \sin\bigl(\tfrac{N}{2}x\bigr)\sin\bigl(\tfrac{N-1}{2}x\bigr).

Let m be any integer. Substituting x = \frac{2\pi m}{N} in the equations in the previous exercise, we obtain

\sin\bigl(\frac{\pi}{N}m\bigr) \sum_{n=0}^{N-1} \cos\bigl(\frac{2\pi mn}{N}\bigr) = \sin(\pi m)\cos\bigl(\frac{N-1}{N}\pi m\bigr),   (8.9)

and

\sin\bigl(\frac{\pi}{N}m\bigr) \sum_{n=0}^{N-1} \sin\bigl(\frac{2\pi mn}{N}\bigr) = \sin(\pi m)\sin\bigl(\frac{N-1}{N}\pi m\bigr).   (8.10)

With m = k + j, we have

\sin\bigl(\frac{\pi}{N}(k+j)\bigr) \sum_{n=0}^{N-1} \cos\bigl(\frac{2\pi(k+j)n}{N}\bigr) = \sin(\pi(k+j))\cos\bigl(\frac{N-1}{N}\pi(k+j)\bigr),   (8.11)

and

\sin\bigl(\frac{\pi}{N}(k+j)\bigr) \sum_{n=0}^{N-1} \sin\bigl(\frac{2\pi(k+j)n}{N}\bigr) = \sin(\pi(k+j))\sin\bigl(\frac{N-1}{N}\pi(k+j)\bigr).   (8.12)

Similarly, with m = k − j, we obtain

\sin\bigl(\frac{\pi}{N}(k-j)\bigr) \sum_{n=0}^{N-1} \cos\bigl(\frac{2\pi(k-j)n}{N}\bigr) = \sin(\pi(k-j))\cos\bigl(\frac{N-1}{N}\pi(k-j)\bigr),   (8.13)

and

\sin\bigl(\frac{\pi}{N}(k-j)\bigr) \sum_{n=0}^{N-1} \sin\bigl(\frac{2\pi(k-j)n}{N}\bigr) = \sin(\pi(k-j))\sin\bigl(\frac{N-1}{N}\pi(k-j)\bigr).   (8.14)
Exercise 8.6 Prove Lemma 8.1.



It follows immediately from Lemma 8.1 that

\sum_{n=0}^{N-1} f_n = \frac{N}{2}\,a_0,

and that

\sum_{n=0}^{N-1} f_n\cos\bigl(\frac{2\pi}{N}jn\bigr) = \frac{N}{2}\,a_j,

and

\sum_{n=0}^{N-1} f_n\sin\bigl(\frac{2\pi}{N}jn\bigr) = \frac{N}{2}\,b_j,

for j = 1, ..., M.
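
These formulas translate directly into code; the following sketch builds a test function of the form (8.5) with known coefficients and recovers them from its N samples.

import numpy as np

A, M = 1.0, 4
N = 2 * M + 1
n = np.arange(N)
x = 2.0 * A * n / N                     # the sample points x_n = 2An/N

# a test function of the form (8.5): a_0 = 1, a_1 = 2, b_3 = -1.5
f = 0.5 + 2.0 * np.cos(np.pi * x / A) - 1.5 * np.sin(3.0 * np.pi * x / A)

a0 = (2.0 / N) * np.sum(f)
a = [(2.0 / N) * np.sum(f * np.cos(2.0 * np.pi * j * n / N)) for j in range(1, M + 1)]
b = [(2.0 / N) * np.sum(f * np.sin(2.0 * np.pi * j * n / N)) for j in range(1, M + 1)]
print(a0, np.round(a, 10), np.round(b, 10))   # recovers 1, 2 and -1.5 exactly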

8.3 From Real to Complex


Throughout these notes we have limited the discussion to real data and
models involving only real coefficients and real-valued functions. It is more
common to use complex data and complex-valued models. Limiting the
discussion to the real numbers comes at a price. Although complex vari-
ables may not be as familiar to the reader as real variables, there is some
advantage in allowing the data and the models to be complex, as is the
common practice in signal processing.
Suppose now that f(x) is complex, for 0 ≤ x ≤ 2A, and, as before, we have evaluated f(x) at the N = 2M + 1 points x = \frac{2A}{N}n, n = 0, 1, ..., N − 1. Now we have the N complex numbers f_n = f(\frac{2A}{N}n).
In the model for the real-valued f(x) given by Equation (8.5) it appeared that we used only M + 1 values of \omega_k, including \omega_0 = 0 for the constant term. In fact, though, if we were to express the sine and cosine functions in terms of complex exponential functions, we would see that we have used the frequencies \frac{\pi}{A}j, for j = −M, ..., M, so we have really used 2M + 1 = N frequencies. In the complex version, we explicitly use N frequencies spaced \frac{\pi}{A} apart. It is traditional that we use the frequencies \frac{\pi}{A}k, for k = 0, 1, ..., N − 1, although other choices are possible.
Given the (possibly) complex values f_n = f(\frac{2A}{N}n), n = 0, 1, ..., N − 1, we model the function f(x) as a finite sum of N complex exponentials:

f(x) = \frac{1}{N}\sum_{k=0}^{N-1} F_k \exp\Bigl(-i\frac{\pi}{A}kx\Bigr),   (8.15)

where the coefficients F_k are to be determined from the data f_n, n = 0, 1, ..., N − 1. Setting x = \frac{2A}{N}n in Equation (8.15), we have

f_n = \frac{1}{N}\sum_{k=0}^{N-1} F_k \exp\Bigl(-i\frac{2\pi}{N}kn\Bigr).   (8.16)

Suppose that N = 2M + 1. Using the formula for the sum of a finite geometric progression, we can easily show that

\sum_{m=-M}^{M} \exp(imx) = \frac{\sin\bigl((M + \frac{1}{2})x\bigr)}{\sin\bigl(\frac{x}{2}\bigr)},   (8.17)

whenever the denominator is not zero. From Equation (8.17) we can show that

\sum_{n=0}^{N-1} \exp\Bigl(i\frac{2\pi}{N}kn\Bigr)\exp\Bigl(-i\frac{2\pi}{N}jn\Bigr) = 0,   (8.18)

for j ≠ k. It follows that the coefficients F_k can be calculated as follows:

F_k = \sum_{n=0}^{N-1} f_n \exp\Bigl(i\frac{2\pi}{N}kn\Bigr),   (8.19)

for k = 0, 1, ..., N − 1.
Generally, given any (possibly) complex numbers fn , n = 0, 1, ..., N − 1,
the collection of coefficients Fk , k = 0, 1, ..., N − 1, is called its complex
finite Fourier transform.
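
With the sign conventions used here, the complex finite Fourier transform lines up with numpy's FFT routines: Equation (8.16) says f = fft(F)/N, and Equation (8.19) says F = N·ifft(f). A quick consistency check:

import numpy as np

N = 9
rng = np.random.default_rng(0)
f = rng.normal(size=N) + 1j * rng.normal(size=N)   # arbitrary complex data

n = np.arange(N)
F_direct = np.array([np.sum(f * np.exp(2j * np.pi * k * n / N)) for k in range(N)])
F_fft = N * np.fft.ifft(f)                         # the same numbers via the FFT
print(np.max(np.abs(F_direct - F_fft)))            # on the order of 1e-14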

8.3.1 More Computational Issues


In many applications of signal processing N , the number of measurements
of the function f (x), can be quite large. We have found a relatively in-
expensive way to find the undetermined parameters of the trigonometric
model, but even this way poses computational problems when N is large.
The computation of a single ak , bk or Fk requires N multiplications and we
have to calculate N of these parameters. Thus, the complexity of the prob-
lem is on the order of N squared. Fortunately, there is a fast algorithm,
known as the fast Fourier transform (FFT), that enables us to perform
these calculations in far fewer multiplications.
Chapter 9

Transmission and Remote Sensing (Chapter 8)

9.1 Chapter Summary


In this chapter we illustrate the roles played by Fourier series and Fourier
coefficients in the analysis of signal transmission and remote sensing.

9.2 Fourier Series and Fourier Coefficients


We suppose that f(x) is defined for −L ≤ x ≤ L, with Fourier series representation

f(x) = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}\Bigl(a_n\cos\bigl(\frac{n\pi}{L}x\bigr) + b_n\sin\bigl(\frac{n\pi}{L}x\bigr)\Bigr).   (9.1)

The Fourier coefficients are

a_n = \frac{1}{L}\int_{-L}^{L} f(x)\cos\bigl(\frac{n\pi}{L}x\bigr)\,dx,   (9.2)

and

b_n = \frac{1}{L}\int_{-L}^{L} f(x)\sin\bigl(\frac{n\pi}{L}x\bigr)\,dx.   (9.3)

In the examples in this chapter, we shall see how Fourier coefficients can arise as data obtained through measurements. However, we shall be able to measure only a finite number of the Fourier coefficients. One issue that will concern us is the effect on the representation of f(x) if we use


some, but not all, of its Fourier coefficients.
Suppose that we have a_n and b_n for n = 1, 2, ..., N. It is not unreasonable to try to estimate the function f(x) using the discrete Fourier transform (DFT) estimate, which is

f_{DFT}(x) = \frac{1}{2}a_0 + \sum_{n=1}^{N}\Bigl(a_n\cos\bigl(\frac{n\pi}{L}x\bigr) + b_n\sin\bigl(\frac{n\pi}{L}x\bigr)\Bigr).   (9.4)

In Figure 9.1 below, the function f (x) is the solid-line figure in both graphs.
In the bottom graph, we see the true f (x) and a DFT estimate. The top
graph is the result of band-limited extrapolation, a technique for predicting
missing Fourier coefficients.

Figure 9.1: The non-iterative band-limited extrapolation method (MDFT) (top) and the DFT (bottom) for M = 129, ∆ = 1 and Ω = π/30.

9.3 The Unknown Strength Problem


In this example, we imagine that each point x in the interval [−L, L] is
sending a sine function signal at the frequency ω, each with its own strength
f (x); that is, the signal sent by the point x is

f (x) sin(ωt). (9.5)

In our first example, we imagine that the strength function f (x) is unknown
and we want to determine it. It could be the case that the signals originate
at the points x, as with light or radio waves from the sun, or are simply
reflected from the points x, as is sunlight from the moon or radio waves
in radar. Later in this chapter, we shall investigate a related example, in
which the points x transmit known signals and we want to determine what
is received elsewhere.

9.3.1 Measurement in the Far-Field


Now let us consider what is received by a point P on the circumference
of a circle centered at the origin and having large radius D. The point P
corresponds to the angle θ as shown in Figure 9.2; we use θ in the interval
[0, π]. It takes a finite time for the signal sent from x at time t to reach P ,
so there is a delay.
We assume that c is the speed at which the signal propagates. Because
D is large relative to L, we make the far-field assumption, which allows us
to approximate the distance from x to P by D − x cos(θ). Therefore, what
P receives at time t is what was sent from x at time t − \frac{1}{c}(D − x\cos(\theta)).
At time t, the point P receives from x the signal

f(x)\Bigl(\sin\bigl(\omega(t - \tfrac{D}{c})\bigr)\cos\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr) + \cos\bigl(\omega(t - \tfrac{D}{c})\bigr)\sin\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\Bigr),   (9.6)

and the point Q corresponding to the angle θ + π receives

f(x)\Bigl(\sin\bigl(\omega(t - \tfrac{D}{c})\bigr)\cos\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr) - \cos\bigl(\omega(t - \tfrac{D}{c})\bigr)\sin\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\Bigr).   (9.7)

Adding the quantities in (9.6) and (9.7), we obtain

2f(x)\cos\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\sin\bigl(\omega(t - \tfrac{D}{c})\bigr),   (9.8)

while subtracting the latter from the former, we get

2f(x)\sin\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\cos\bigl(\omega(t - \tfrac{D}{c})\bigr).   (9.9)
Evaluating the signal in Equation (9.8) at the time when

\omega\bigl(t - \tfrac{D}{c}\bigr) = \tfrac{\pi}{2},

and dividing by 2, we get

f(x)\cos\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr),

while evaluating the signal in Equation (9.9) at the time when

\omega\bigl(t - \tfrac{D}{c}\bigr) = 2\pi

and dividing by 2 gives us

f(x)\sin\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr).
Because P and Q receive signals from all the x, not just from one x, what
P and Q receive at time t involves integrating over all x. Therefore, from
our measurements at P and Q we obtain the quantities
\int_{-L}^{L} f(x)\cos\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\,dx,   (9.10)

and

\int_{-L}^{L} f(x)\sin\bigl(\tfrac{\omega\cos(\theta)}{c}x\bigr)\,dx.   (9.11)

If we can select an angle θ for which

\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{L},   (9.12)

then we have a_n and b_n.

9.3.2 Limited Data


Note that we will be able to solve Equation (9.12) for θ only if we have

n ≤ \frac{L\omega}{\pi c}.   (9.13)
This tells us that we can measure only finitely many of the Fourier coeffi-
cients of f (x). It is common in signal processing to speak of the wavelength
of a sinusoidal signal; the wavelength associated with a given ω and c is
\lambda = \frac{2\pi c}{\omega}.   (9.14)

Therefore the number N of Fourier coefficients we can measure is the largest integer not greater than \frac{2L}{\lambda}, which is the length of the interval [−L, L], measured in units of wavelength λ. We get more Fourier coefficients when
the product Lω is larger; this means that when L is small, we want ω to be
large, so that λ is small and N is large. As we saw previously, using these
finitely many Fourier coefficients to calculate the DFT reconstruction of
f (x) can lead to a poor estimate of f (x), particularly when N is small.

9.3.3 Can We Get More Data?


As we just saw, we can make measurements at any points P and Q in the
far-field; perhaps we do not need to limit ourselves to just those angles that
lead to the an and bn . It may come as somewhat of a surprise, but from
the theory of complex analytic functions we can prove that there is enough
data available to us here to reconstruct f (x) perfectly, at least in principle.
The drawback, in practice, is that the measurements would have to be free
of noise and impossibly accurate. All is not lost, however.
Suppose, for the sake of illustration, that we measure the far-field signals
at points P and Q corresponding to angles θ that satisfy

\frac{\omega\cos(\theta)}{c} = \frac{n\pi}{2L}.   (9.15)
Now we have twice as many data points: we now have
A_n = \int_{-2L}^{2L} f(x)\cos\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx = \int_{-L}^{L} f(x)\cos\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx,   (9.16)

and

B_n = \int_{-2L}^{2L} f(x)\sin\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx = \int_{-L}^{L} f(x)\sin\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx,   (9.17)

for n = 0, 1, ..., 2N . We say now that our data is twice over-sampled.


Notice, however, that we have implicitly assumed that the interval of x
values from which signals are coming is now [−2L, 2L], not the true [−L, L];
values of x beyond [−L, L] send no signals, so f (x) = 0 for those x. The
data values we now have allow us to get Fourier coefficients An and Bn
for the function f (x) throughout [−2L, 2L]. We have twice the number
of Fourier coefficients, but must reconstruct f (x) over an interval that is
twice as long. Over half of this interval f (x) = 0, so we waste effort if we
use the An and Bn in the DFT, which will now reconstruct f (x) over the
interval [−2L, 2L], on half of which f (x) is known to be zero. But what
else can we do?
Considerable research has gone into the use of prior knowledge about
f (x) to obtain reconstructions that are better than the DFT. In the ex-
ample we are now considering, we have prior knowledge that f (x) = 0 for
L < |x| ≤ 2L. We can use this prior knowledge to improve our recon-
struction. Suppose that we take as our reconstruction the modified DFT
(MDFT), which is a function defined only for |x| ≤ L and having the form

f_{MDFT}(x) = \frac{1}{2}c_0 + \sum_{n=1}^{2N}\Bigl(c_n\cos\bigl(\tfrac{n\pi}{2L}x\bigr) + d_n\sin\bigl(\tfrac{n\pi}{2L}x\bigr)\Bigr),   (9.18)

where the c_n and d_n are not yet determined. Then we determine the c_n and d_n by requiring that the function f_{MDFT}(x) could be the correct answer; that is, we require that f_{MDFT}(x) be consistent with the measured data.
Therefore, we must have

\int_{-L}^{L} f_{MDFT}(x)\cos\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx = A_n,   (9.19)

and

\int_{-L}^{L} f_{MDFT}(x)\sin\bigl(\tfrac{n\pi}{2L}x\bigr)\,dx = B_n,   (9.20)

for n = 0, 1, ..., 2N . It is important to note now that the cn and dn are


not the An and Bn ; this is because we no longer have orthogonality. For
example, when we calculate the integral
Z L
nπ mπ
cos( ) cos( )dx, (9.21)
−L 2L 2L

for m 6= n, we do not get zero. To find the cn and dn we need to solve a


system of linear equations in these unknowns.
The top graph in Figure (9.1) illustrates the improvement over the DFT
that can be had using the MDFT. In that figure, we took data that was
thirty times over-sampled, not just twice over-sampled, as in our previous
discussion. Consequently, we had thirty times the number of Fourier coeffi-
cients we would have had otherwise, but for an interval thirty times longer.
To get the top graph, we used the MDFT, with the prior knowledge that
f (x) was non-zero only within the central thirtieth of the long interval. The
bottom graph shows the DFT reconstruction using the larger data set, but
only for the central thirtieth of the full period, which is where the original
f (x) is non-zero.
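
The MDFT computation sketched above can be imitated numerically. The toy example below is my own construction, with an invented test object; it is only a sketch of the idea. It builds the basis functions in (9.18), forms the data integrals (9.19)-(9.20) for a test f(x) supported in [−L, L], and solves the resulting linear system using the Gram matrix of the basis.

import numpy as np

L, NF = 1.0, 4                                  # half-length, number of data pairs
x = np.linspace(-L, L, 4001)
dx = x[1] - x[0]
f = np.where(np.abs(x) < 0.3, 1.0, 0.0)         # an assumed test object

rows = [0.5 * np.ones_like(x)]                   # the constant term c_0/2
for m in range(1, 2 * NF + 1):
    rows.append(np.cos(m * np.pi * x / (2 * L)))
for m in range(1, 2 * NF + 1):
    rows.append(np.sin(m * np.pi * x / (2 * L)))
Phi = np.array(rows)

data = Phi @ f * dx                              # the measured integrals
G = Phi @ Phi.T * dx                             # Gram matrix; not diagonal on [-L, L]
coef = np.linalg.solve(G, data)                  # the c_n and d_n
f_mdft = coef @ Phi                              # the MDFT estimate on [-L, L]
print(np.round(coef, 4))

Because the Gram matrix G is not diagonal, the coefficients really do differ from the raw data values, which is the point made above.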

9.3.4 Other Forms of Prior Knowledge


As we just showed, knowing that we have over-sampled in our measure-
ments can help us improve the resolution in our estimate of f (x). We may
have other forms of prior knowledge about f (x) that we can use. If we know
something about large-scale features of f (x), but not about finer details,
we can use the PDFT estimate, which is a generalization of the MDFT.
For example, we may know that f (x) is non-negative, which we have
not assumed explicitly previously in this chapter. Or, we may know that
f (x) is approximately zero for most x, but contains very sharp peaks at
a few places. In more formal language, we may be willing to assume that
f (x) contains a few Dirac delta functions in a flat background. There are
non-linear methods, such as the maximum entropy method, the indirect
PDFT (IPDFT), and eigenvector methods that can be used to advantage
in such cases; these methods are often called high-resolution methods.

9.4 The Transmission Problem


9.4.1 Directionality
Now we turn the table around and suppose that we are designing a broad-
casting system, using transmitters at each x in the interval [−L, L]. At
each x we will transmit f (x) sin(ωt), where both f (x) and ω are chosen by
us. We now want to calculate what will be received at each point P in the
far-field. We may wish to design the system so that the strengths of the
signals received at the various P are not all the same. For example, if we
are broadcasting from Los Angeles, we may well want a strong signal in the
north and south directions, but weak signals east and west, where there are
fewer people to receive the signal. Clearly, our model of a single-frequency
signal is too simple, but it does allow us to illustrate several important
points about directionality in array processing.

9.4.2 The Case of Uniform Strength


For concreteness, we investigate the case in which f(x) = 1 for |x| ≤ L. Since this function is even, we need only the a_n. In this case, the measurement of the signal at the point P gives us

\frac{2c}{\omega\cos(\theta)}\,\sin\Bigl(\frac{L\omega\cos(\theta)}{c}\Bigr),   (9.22)
whose absolute value is then the strength of the signal at P . Is it possible
that the strength of the signal at some P is zero?
To have zero signal strength, we need

\sin\Bigl(\frac{L\omega\cos(\theta)}{c}\Bigr) = 0,

without

\cos(\theta) = 0.

Therefore, we need

\frac{L\omega\cos(\theta)}{c} = n\pi,   (9.23)

for some positive integer n ≥ 1. Notice that this can happen only if

n ≤ \frac{L\omega}{\pi c} = \frac{2L}{\lambda}.   (9.24)

Therefore, if 2L < λ, there can be no P with signal strength zero. The


larger 2L is, with respect to the wavelength λ, the more angles at which
the signal strength is zero.
We have assumed here that each x in the interval [−L, L] is transmit-
ting, but we can get a similar result using finitely many transmitters in
[−L, L]. The graphs in Figures 9.3, 9.4, and 9.5 illustrate the sort of trans-
mission patterns that can be designed by varying ω. The figure captions
refer to parameters used in a separate discussion, but the pictures are still
instructive.

9.5 Remote Sensing


A basic problem in remote sensing is to determine the nature of a distant
object by measuring signals transmitted by or reflected from that object.
If the object of interest is sufficiently remote, that is, is in the farfield, the
data we obtain by sampling the propagating spatio-temporal field is related,
approximately, to what we want by Fourier transformation. The problem
is then to estimate a function from finitely many (usually noisy) values
of its Fourier transform. The application we consider here is a common
one of remote-sensing of transmitted or reflected waves propagating from
distant sources. Examples include optical imaging of planets and asteroids
using reflected sunlight, radio-astronomy imaging of distant sources of radio
waves, active and passive sonar, and radar imaging.

9.6 One-Dimensional Arrays


Now we imagine that the points P are the sources of the signals and we
are able to measure the transmissions at points x in [−L, L]. The P cor-
responding to the angle θ sends F (θ) sin(ωt), where the absolute value of
F (θ) is the strength of the signal coming from P . In narrow-band pas-
sive sonar, for example, we may have hydrophone sensors placed at various
points x and our goal is to determine how much acoustic energy at a spec-
ified frequency is coming from different directions. There may be only a
few directions contributing significant energy at the frequency of interest.

9.6.1 Measuring Fourier Coefficients


To simplify notation, we shall introduce the variable u = cos(θ). We then have

\frac{du}{d\theta} = -\sin(\theta) = -\sqrt{1 - u^2},

so that

d\theta = -\frac{1}{\sqrt{1 - u^2}}\,du.

Now let G(u) be the function

G(u) = \frac{F(\arccos(u))}{\sqrt{1 - u^2}},

defined for u in the interval [−1, 1].
Measuring the signals received at x and −x, we can obtain the integrals

\int_{-1}^{1} G(u)\cos\bigl(\tfrac{x\omega}{c}u\bigr)\,du,   (9.25)

and

\int_{-1}^{1} G(u)\sin\bigl(\tfrac{x\omega}{c}u\bigr)\,du.   (9.26)

The Fourier coefficients of G(u) are

\frac{1}{2}\int_{-1}^{1} G(u)\cos(n\pi u)\,du,   (9.27)

and

\frac{1}{2}\int_{-1}^{1} G(u)\sin(n\pi u)\,du.   (9.28)

Therefore, in order to have our measurements match Fourier coefficients of G(u) we need

\frac{x\omega}{c} = n\pi,   (9.29)
for some positive integer n. Therefore, we need to take measurements at the points x and −x, where

x = n\,\frac{\pi c}{\omega} = n\,\frac{\lambda}{2} = n\Delta,   (9.30)

where ∆ = \frac{\lambda}{2} is the Nyquist spacing. Since x is restricted to [−L, L], there is an upper limit to the n we can use; we must have

n ≤ \frac{L}{\lambda/2} = \frac{2L}{\lambda}.   (9.31)

The upper bound \frac{2L}{\lambda}, which is the length of our array of sensors, in units of wavelength, is often called the aperture of the array.
Once we have some of the Fourier coefficients of the function G(u), we
can estimate G(u) for |u| ≤ 1 and, from that estimate, obtain an estimate
of the original F (θ).

As we just saw, the number of Fourier coefficients of G(u) that we


can measure, and therefore the resolution of the resulting reconstruction
of F (θ), is limited by the aperture, that is, the length 2L of the array of
sensors, divided by the wavelength λ. One way to improve resolution is
to make the array of sensors longer, which is more easily said than done.
However, synthetic-aperture radar (SAR) effectively does this. The idea of
SAR is to mount the array of sensors on a moving airplane. As the plane
moves, it effectively creates a longer array of sensors, a virtual array if you
will. The one drawback is that the sensors in this virtual array are not
all present at the same time, as in a normal array. Consequently, the data
must be modified to approximate what would have been received at other
times.
As in the examples discussed previously, we do have more measurements
we can take, if we use values of x other than those described by Equation
(9.30). The issue will be what to do with these over-sampled measurements.

9.6.2 Over-sampling
One situation in which over-sampling arises naturally occurs in sonar array
processing. Suppose that an array of sensors has been built to operate at
a design frequency of ω0 , which means that we have placed sensors at the
points x in [−L, L] that satisfy the equation
x = n\,\frac{\pi c}{\omega_0} = n\,\frac{\lambda_0}{2} = n\Delta_0,   (9.32)

where λ_0 is the wavelength corresponding to the frequency ω_0 and ∆_0 = \frac{\lambda_0}{2} is the Nyquist spacing for frequency ω_0. Now suppose that we want to operate the sensing at another frequency, say ω. The sensors cannot be moved, so we must make do with sensors at the points x determined by the design frequency.
Consider, first, the case in which the second frequency ω is less than the design frequency ω_0. Then its wavelength λ is larger than λ_0, and the Nyquist spacing ∆ = \frac{\lambda}{2} for ω is larger than ∆_0. So we have over-sampled. The measurements taken at the sensors provide us with the integrals

\frac{1}{K}\int_{-1}^{1} G(u)\cos\bigl(\tfrac{n\pi}{K}u\bigr)\,du,   (9.33)

and

\frac{1}{K}\int_{-1}^{1} G(u)\sin\bigl(\tfrac{n\pi}{K}u\bigr)\,du,   (9.34)

where K = \frac{\omega_0}{\omega} > 1. These are Fourier coefficients of the function G(u), viewed as defined on the interval [−K, K], which is larger than [−1, 1], and

taking the value zero outside [−1, 1]. If we then use the DFT estimate of
G(u), it will estimate G(u) for the values of u within [−1, 1], which is what
we want, as well as for the values of u outside [−1, 1], where we already
know G(u) to be zero. Once again, we can use the modified DFT, the
MDFT, to include the prior knowledge that G(u) = 0 for u outside [−1, 1]
to improve our reconstruction of G(u) and F (θ). In the over-sampled case
the interval [−1, 1] is called the visible region (although audible region seems
more appropriate for sonar), since it contains all the values of u that can
correspond to actual angles of arrival of acoustic energy.

9.6.3 Under-sampling
Now suppose that the frequency ω that we want to consider is greater than
the design frequency ω0 . This means that the spacing between the sensors
is too large; we have under-sampled. Once again, however, we cannot move
the sensors and must make do with what we have.
Now the measurements at the sensors provide us with the integrals
\frac{1}{K}\int_{-1}^{1} G(u)\cos\bigl(\tfrac{n\pi}{K}u\bigr)\,du,   (9.35)

and

\frac{1}{K}\int_{-1}^{1} G(u)\sin\bigl(\tfrac{n\pi}{K}u\bigr)\,du,   (9.36)

where K = \frac{\omega_0}{\omega} < 1. These are Fourier coefficients of the function G(u), viewed as defined on the interval [−K, K], which is smaller than [−1, 1],
and taking the value zero outside [−K, K]. Since G(u) is not necessarily
zero outside [−K, K], treating it as if it were zero there results in a type
of error known as aliasing, in which energy corresponding to angles whose
u lies outside [−K, K] is mistakenly assigned to values of u that lie within
[−K, K]. Aliasing is a common phenomenon; the strobe-light effect is
aliasing, as is the apparent backward motion of the wheels of stage-coaches
in cowboy movies. In the case of the strobe light, we are permitted to view
the scene at times too far apart for us to sense continuous, smooth motion.
In the case of the wagon wheels, the frames of the film capture instants of
time too far apart for us to see the true rotation of the wheels.

9.7 Higher Dimensional Arrays


Up to now, we have considered sensors placed within a one-dimensional
interval [−L, L] and signals propagating within a plane containing [−L, L].
In such an arrangement there is a bit of ambiguity; we cannot tell if a
signal is coming from the angle θ or the angle θ + π. When propagating


signals can come to the array from any direction in three-dimensional space,
there is greater ambiguity. To resolve the ambiguities, we can employ two-
and three-dimensional arrays of sensors. To analyze the higher-dimensional
cases, it is helpful to use the wave equation.

9.7.1 The Wave Equation


In many areas of remote sensing, what we measure are the fluctuations
in time of an electromagnetic or acoustic field. Such fields are described
mathematically as solutions of certain partial differential equations, such
as the wave equation. A function u(x, y, z, t) is said to satisfy the three-
dimensional wave equation if
u_{tt} = c^2(u_{xx} + u_{yy} + u_{zz}) = c^2\nabla^2 u,   (9.37)

where u_{tt} denotes the second partial derivative of u with respect to the time variable t, and c > 0 is the (constant) speed of propagation. More
complicated versions of the wave equation permit the speed of propagation
c to vary with the spatial variables x, y, z, but we shall not consider that
here.
We use the method of separation of variables at this point, to get some
idea about the nature of solutions of the wave equation. Assume, for the
moment, that the solution u(t, x, y, z) has the simple form
u(t, x, y, z) = f(t)g(x, y, z).   (9.38)

Inserting this separated form into the wave equation, we get

f''(t)g(x, y, z) = c^2 f(t)\nabla^2 g(x, y, z),   (9.39)

or

f''(t)/f(t) = c^2\nabla^2 g(x, y, z)/g(x, y, z).   (9.40)

The function on the left is independent of the spatial variables, while the
one on the right is independent of the time variable; consequently, they
must both equal the same constant, which we denote −ω^2. From this we
have two separate equations,

f''(t) + ω^2 f(t) = 0,   (9.41)

and

\nabla^2 g(x, y, z) + \frac{ω^2}{c^2} g(x, y, z) = 0.   (9.42)
Equation (9.42) is the Helmholtz equation.
Equation (9.41) has for its solutions the functions f (t) = cos(ωt) and
sin(ωt). Functions u(t, x, y, z) = f (t)g(x, y, z) with such time dependence
are called time-harmonic solutions.

9.7.2 Planewave Solutions


Suppose that, beginning at time t = 0, there is a localized disturbance.
As time passes, that disturbance spreads out spherically. When the radius
of the sphere is very large, the surface of the sphere appears planar, to
an observer on that surface, who is said then to be in the far field. This
motivates the study of solutions of the wave equation that are constant on
planes; the so-called planewave solutions.
Let s = (x, y, z) and u(s, t) = u(x, y, z, t) = e^{iωt}e^{ik·s}. Then we can show
that u satisfies the wave equation u_{tt} = c^2\nabla^2 u for any real vector k, so long
as ||k||^2 = ω^2/c^2. This solution is a planewave associated with frequency
ω and wavevector k; at any fixed time the function u(s, t) is constant on
any plane in three-dimensional space having k as a normal vector.
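
This claim is easy to verify symbolically; here is a minimal sketch (an
addition, assuming SymPy is available) that computes u_tt − c^2 ∇^2 u for
u = e^{iωt}e^{ik·s} and confirms the residual vanishes exactly when
k_1^2 + k_2^2 + k_3^2 = ω^2/c^2.

import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)
w, c = sp.symbols('omega c', positive=True)
k1, k2, k3 = sp.symbols('k1 k2 k3', real=True)

u = sp.exp(sp.I * w * t) * sp.exp(sp.I * (k1 * x + k2 * y + k3 * z))

utt = sp.diff(u, t, 2)
lap = sp.diff(u, x, 2) + sp.diff(u, y, 2) + sp.diff(u, z, 2)

# The residual, divided by u, is c^2 k1^2 + c^2 k2^2 + c^2 k3^2 - omega^2,
# which is zero exactly when ||k||^2 = omega^2 / c^2:
print(sp.simplify((utt - c**2 * lap) / u))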
In radar and sonar, the field u(s, t) being sampled is usually viewed as
a discrete or continuous superposition of planewave solutions with various
amplitudes, frequencies, and wavevectors. We sample the field at various
spatial locations s, for various times t. Here we simplify the situation a
bit by assuming that all the planewave solutions are associated with the
same frequency, ω. If not, we can perform an FFT on the functions of time
received at each sensor location s and keep only the value associated with
the desired frequency ω.

9.7.3 Superposition and the Fourier Transform


It is notationally convenient now to use the complex exponential functions

e^{iωt} = cos(ωt) + i sin(ωt)

instead of cos(ωt) and sin(ωt).


In the continuous superposition model, the field is

u(s, t) = e^{iωt}\int F(k)e^{ik·s}\,dk.   (9.43)

Our measurements at the sensor locations s give us the values

f(s) = \int F(k)e^{ik·s}\,dk.   (9.44)

The data are then Fourier transform values of the complex function F(k);
F(k) is defined for all three-dimensional real vectors k, but is zero, in
theory, at least, for those k whose squared length ||k||^2 is not equal to
ω^2/c^2. Our goal is then to estimate F(k) from measured values of its
Fourier transform. Since each k is a normal vector for its planewave field
component, determining the value of F(k) will tell us the strength of the
planewave component coming from the direction k.

9.7.4 The Spherical Model


We can imagine that the sources of the planewave fields are the points P
that lie on the surface of a large sphere centered at the origin. For each
P , the ray from the origin to P is parallel to some wavevector k. The
function F (k) can then be viewed as a function F (P ) of the points P . Our
measurements will be taken at points s inside this sphere. The radius of
the sphere is assumed to be orders of magnitude larger than the distance
between sensors. The situation is that of astronomical observation of the
heavens using ground-based antennas. The sources of the optical or electro-
magnetic signals reaching the antennas are viewed as lying on a large sphere
surrounding the earth. Distance to the sources is not considered now, and
all we are interested in are the amplitudes F (k) of the fields associated
with each direction k.

9.7.5 The Two-Dimensional Array


In some applications the sensor locations are essentially arbitrary, while
in others their locations are carefully chosen. Sometimes, the sensors are
collinear, as in sonar towed arrays. Figure 9.6 illustrates a line array.
Suppose now that the sensors are in locations s = (x, y, 0), for various
x and y, so that we have a planar array of sensors. Then the dot product s · k
that occurs in Equation (9.44) is

s · k = xk_1 + yk_2;   (9.45)
we cannot see the third component, k3 . However, since we know the size
of the vector k, we can determine |k3 |. The only ambiguity that remains
is that we cannot distinguish sources on the upper hemisphere from those
on the lower one. In most cases, such as astronomy, it is obvious in which
hemisphere the sources lie, so the ambiguity is resolved.
The function F (k) can then be viewed as F (k1 , k2 ), a function of the
two variables k1 and k2 . Our measurements give us values of f (x, y), the
two-dimensional Fourier transform of F(k_1, k_2). Because of the limitation
||k|| = ω/c, the function F(k_1, k_2) has bounded support. Consequently, its
Fourier transform cannot have bounded support. As a result, we can never
have all the values of f (x, y), and so cannot hope to reconstruct F (k1 , k2 )
exactly, even for noise-free data.

9.7.6 The One-Dimensional Array


If the sensors are located at points s having the form s = (x, 0, 0), then we
have a line array of sensors, as we discussed previously. The dot product
in Equation (9.44) becomes
s · k = xk_1.   (9.46)

Now the ambiguity is greater than in the planar array case. Once we have
k_1, we know that

k_2^2 + k_3^2 = \Big(\frac{ω}{c}\Big)^2 − k_1^2,   (9.47)

which describes points P lying on a circle on the surface of the distant
sphere, with the vector (k_1, 0, 0) pointing at the center of the circle. It
is said then that we have a cone of ambiguity. One way to resolve the
situation is to assume k3 = 0; then |k2 | can be determined and we have
remaining only the ambiguity involving the sign of k2 . Once again, in many
applications, this remaining ambiguity can be resolved by other means.
Once we have resolved any ambiguity, we can view the function F (k)
as F (k1 ), a function of the single variable k1 . Our measurements give us
values of f (x), the Fourier transform of F (k1 ). As in the two-dimensional
case, the restriction on the size of the vectors k means that the function
F (k1 ) has bounded support. Consequently, its Fourier transform, f (x),
cannot have bounded support. Therefore, we shall never have all of f (x),
and so cannot hope to reconstruct F (k1 ) exactly, even for noise-free data.

9.7.7 Limited Aperture


In both the one- and two-dimensional problems, the sensors will be placed
within some bounded region, such as |x| ≤ A, |y| ≤ B for the two-
dimensional problem, or |x| ≤ L for the one-dimensional case. The size
of these bounded regions, in units of wavelength, are the apertures of the
arrays. The larger these apertures are, the better the resolution of the
reconstructions.
In digital array processing there are only finitely many sensors, which
then places added limitations on our ability to reconstruct the field
amplitude function F(k).

9.8 An Example: The Solar-Emission Problem
In [5] Bracewell discusses the solar-emission problem. In 1942, it was
observed that radio-wave emissions in the one-meter wavelength range were
arriving from the sun. Were they coming from the entire disk of the sun
or were the sources more localized, in sunspots, for example? The problem
then was to view each location on the sun’s surface as a potential source of
these radio waves and to determine the intensity of emission corresponding
to each location.
For electromagnetic waves the propagation speed is the speed of light
in a vacuum, which we shall take here to be c = 3 × 10^8 meters per second.
The wavelength λ for gamma rays is around one Angstrom, which is 10^{−10}
meters; for x-rays it is about one millimicron, or 10^{−9} meters. The visible
spectrum has wavelengths that are a little less than one micron, that is,
10^{−6} meters. Microwaves have wavelengths between one centimeter and one
meter, while shortwave and broadcast radio have a λ running from about 10
meters to 1000 meters. The so-called long radio waves can have wavelengths
several thousand meters long, prompting clever methods of antenna design
for radio astronomy.
The sun has an angular diameter of 30 min. of arc, or one-half of a
degree, when viewed from earth, but the needed resolution was more like
3 min. of arc. Such resolution requires a radio telescope 1000 wavelengths
across, which means a diameter of 1 km at a wavelength of 1 meter; in
1942 the largest military radar antennas were less than 5 meters across.
A solution was found, using the method of reconstructing an object from
line-integral data, a technique that surfaced again in tomography.
Figure 9.2: Farfield Measurements.

Figure 9.3: Transmission Pattern A(θ): m = 1, 2, 4, 8 and N = 5.

Figure 9.4: Transmission Pattern A(θ): m = 1, 2, 4, 8 and N = 21.

Figure 9.5: Transmission Pattern A(θ): m = 0.9, 0.5, 0.25, 0.125 and N = 21.

Figure 9.6: A uniform line array sensing a planewave field.

Chapter 10

Properties of the Fourier Transform (Chapter 8)

In this chapter we review the basic properties of the Fourier transform.

10.1 Fourier-Transform Pairs


Let f(x) be defined for the real variable x in (−∞, ∞). The Fourier transform
(FT) of f(x) is the function of the real variable ω given by

F(ω) = \int_{-\infty}^{\infty} f(x)e^{iωx}\,dx.   (10.1)

Having obtained F(ω) we can recapture the original f(x) from the Fourier-
Transform Inversion Formula:

f(x) = \frac{1}{2π}\int_{-\infty}^{\infty} F(ω)e^{−iωx}\,dω.   (10.2)
Precisely how we interpret the infinite integrals that arise in the discus-
sion of the Fourier transform will depend on the properties of the function
f (x).

10.1.1 Decomposing f (x)


One way to view Equation (10.2) is that it shows us the function f(x)
as a superposition of complex exponential functions e^{−iωx}, where ω runs
over the entire real line. The use of the minus sign here is simply for
notational convenience later. For each fixed value of ω, the complex number
F(ω) = |F(ω)|e^{iθ(ω)} tells us that the amount of e^{−iωx} in f(x) is |F(ω)|,
and that this component involves a phase shift by θ(ω).


10.1.2 The Issue of Units


When we write cos π = −1, it is with the understanding that π is a mea-
sure of angle, in radians; the function cos will always have an independent
variable in units of radians. By extension, the same is true of the complex
exponential functions. Therefore, when we write eixω , we understand the
product xω to be in units of radians. If x is measured in seconds, then ω
is in units of radians per second; if x is in meters, then ω is in units of
radians per meter. When x is in seconds, we sometimes use the variable
ω/(2π); since 2π is then in units of radians per cycle, the variable ω/(2π)
is in units of cycles per second, or Hertz. When we sample f(x) at values of
x spaced ∆ apart, the ∆ is in units of x-units per sample, and the reciprocal,
1/∆, which is called the sampling frequency, is in units of samples per
x-unit. If x is in seconds, then ∆ is in units of seconds per sample, and 1/∆
is in units of samples per second.

10.2 Basic Properties of the Fourier Transform
In this section we present the basic properties of the Fourier transform.
Proofs of these assertions are left as exercises.

Exercise 10.1 Let F(ω) be the FT of the function f(x). Use the definitions
of the FT and IFT given in Equations (10.1) and (10.2) to establish the
following basic properties of the Fourier transform operation:

• Symmetry: The FT of the function F(x) is 2πf(−ω). For example,
the FT of the function f(x) = \frac{\sin(Ωx)}{πx} is χ_Ω(ω), so the FT of
g(x) = χ_Ω(x) is G(ω) = 2π\frac{\sin(Ωω)}{πω}.

• Conjugation: The FT of \overline{f(x)} is \overline{F(−ω)}.

• Scaling: The FT of f(ax) is \frac{1}{|a|}F(\frac{ω}{a}) for any nonzero constant a.

• Shifting: The FT of f(x − a) is e^{iaω}F(ω).

• Modulation: The FT of f(x)\cos(ω_0 x) is \frac{1}{2}[F(ω + ω_0) + F(ω − ω_0)].

• Differentiation: The FT of the nth derivative, f^{(n)}(x), is (−iω)^n F(ω).
The IFT of F^{(n)}(ω) is (ix)^n f(x).

• Convolution in x: Let f, F, g, G and h, H be FT pairs, with

h(x) = \int f(y)g(x − y)\,dy,

so that h(x) = (f ∗ g)(x) is the convolution of f(x) and g(x). Then
H(ω) = F(ω)G(ω). For example, if we take g(x) = \overline{f(−x)}, then

h(x) = \int f(x + y)\overline{f(y)}\,dy = \int f(y)\overline{f(y − x)}\,dy = r_f(x)

is the autocorrelation function associated with f(x) and

H(ω) = |F(ω)|^2 = R_f(ω) ≥ 0

is the power spectrum of f(x). (A numerical sanity check of the
convolution property is sketched just after this list.)

• Convolution in ω: Let f, F, g, G and h, H be FT pairs, with h(x) =
f(x)g(x). Then H(ω) = \frac{1}{2π}(F ∗ G)(ω).
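
Here is the numerical sanity check promised above (an addition, assuming
NumPy; the DFT stands in for the FT, so the convolution is circular): the
DFT of a circular convolution equals the product of the DFTs.

import numpy as np

rng = np.random.default_rng(0)
N = 256
f = rng.standard_normal(N)
g = rng.standard_normal(N)

# Circular convolution computed directly from its definition...
h_direct = np.array([sum(f[n] * g[(k - n) % N] for n in range(N))
                     for k in range(N)])

# ...and computed by multiplying DFTs and inverting:
h_fft = np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)).real

print(np.max(np.abs(h_direct - h_fft)))    # ~1e-12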

Definition 10.1 A function f : R → C is said to be even if f(−x) = f(x)
for all x, and odd if f(−x) = −f(x), for all x. Note that a typical function
is neither even nor odd.

Exercise 10.2 Show that f is an even function if and only if its Fourier
transform, F , is an even function.

Exercise 10.3 Show that f is real-valued if and only if its Fourier trans-
form F is conjugate-symmetric, that is, F(−ω) = \overline{F(ω)}. Therefore, f is
real-valued and even if and only if its Fourier transform F is real-valued
and even.

10.3 Some Fourier-Transform Pairs


In this section we present several Fourier-transform pairs.
Exercise 10.4 Show that the Fourier transform of f(x) = e^{−α^2 x^2} is
F(ω) = \frac{\sqrt{π}}{α}e^{−(\frac{ω}{2α})^2}.
Hint: Calculate the derivative F′(ω) by differentiating under the integral
sign in the definition of F and integrating by parts. Then solve the resulting
differential equation. Alternatively, perform the integration by completing
the square.
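
This pair is also easy to confirm numerically; the sketch below (an addition,
assuming NumPy) approximates the integral in Equation (10.1) by a Riemann
sum and compares the result with the claimed closed form.

import numpy as np

alpha = 1.3
x = np.linspace(-20, 20, 400001)
dx = x[1] - x[0]
f = np.exp(-(alpha * x)**2)

for w in [0.0, 0.7, 2.1]:
    F_num = np.sum(f * np.exp(1j * w * x)) * dx       # Riemann sum for (10.1)
    F_exact = (np.sqrt(np.pi) / alpha) * np.exp(-(w / (2 * alpha))**2)
    print(w, abs(F_num - F_exact))                    # agreement to ~1e-12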
Let u(x) be the Heaviside function that is +1 if x ≥ 0 and 0 otherwise.
Let χA (x) be the characteristic function of the interval [−A, A] that is +1
for x in [−A, A] and 0 otherwise. Let sgn(x) be the sign function that is
+1 if x > 0, −1 if x < 0 and zero for x = 0.

Exercise 10.5 Show that the FT of the function f(x) = u(x)e^{−ax} is
F(ω) = \frac{1}{a − iω}, for every positive constant a, where u(x) is the Heaviside
function.

Exercise 10.6 Show that the FT of f(x) = χ_A(x) is F(ω) = \frac{2\sin(Aω)}{ω}.

Exercise 10.7 Show that the IFT of the function F(ω) = 2i/ω is f(x) =
sgn(x).

Hints: Write the formula for the inverse Fourier transform of F(ω) as

f(x) = \frac{1}{2π}\int_{-\infty}^{+\infty}\frac{2i}{ω}\cos(ωx)\,dω − \frac{i}{2π}\int_{-\infty}^{+\infty}\frac{2i}{ω}\sin(ωx)\,dω,

which reduces to

f(x) = \frac{1}{π}\int_{-\infty}^{+\infty}\frac{1}{ω}\sin(ωx)\,dω,

since the integrand of the first integral is odd. For x > 0 consider the
Fourier transform of the function χ_x(t). For x < 0 perform the change of
variables u = −x.
Generally, the functions f (x) and F (ω) are complex-valued, so that we
may speak about their real and imaginary parts. The next exercise explores
the connections that hold among these real-valued functions.

Exercise 10.8 Let f (x) be arbitrary and F (ω) its Fourier transform. Let
F (ω) = R(ω) + iX(ω), where R and X are real-valued functions, and
similarly, let f (x) = f1 (x) + if2 (x), where f1 and f2 are real-valued. Find
relationships between the pairs R,X and f1 ,f2 .

Exercise 10.9 We define the even part of f(x) to be the function

f_e(x) = \frac{f(x) + f(−x)}{2},

and the odd part of f(x) to be

f_o(x) = \frac{f(x) − f(−x)}{2};

define F_e and F_o similarly for F the FT of f. Let F(ω) = R(ω) + iX(ω) be
the decomposition of F into its real and imaginary parts. We say that f is
a causal function if f(x) = 0 for all x < 0. Show that, if f is causal, then
R and X are related; specifically, show that X is the Hilbert transform of
R, that is,

X(ω) = \frac{1}{π}\int_{-\infty}^{\infty}\frac{R(α)}{ω − α}\,dα.

Hint: If f (x) = 0 for x < 0 then f (x)sgn(x) = f (x). Apply the convolution
theorem, then compare real and imaginary parts.

10.4 Dirac Deltas


We saw earlier that the function F(ω) = χ_Ω(ω) has for its inverse Fourier
transform the function f(x) = \frac{\sin Ωx}{πx}; note that f(0) = Ω/π and f(x) = 0
for the first time when Ωx = π, or x = π/Ω. For any Ω-band-limited function
g(x) we have G(ω) = G(ω)χ_Ω(ω), so that, for any x_0, we have

g(x_0) = \int_{-\infty}^{\infty} g(x)\frac{\sin Ω(x − x_0)}{π(x − x_0)}\,dx.

We describe this by saying that the function f(x) = \frac{\sin Ωx}{πx} has the sifting
property for all Ω-band-limited functions g(x).
As Ω grows larger, f(0) approaches +∞, while f(x) goes to zero for
x ≠ 0. The limit is therefore not a function; it is a generalized function
called the Dirac delta function at zero, denoted δ(x). For this reason the
function f(x) = \frac{\sin Ωx}{πx} is called an approximate delta function. The FT
of δ(x) is the function F(ω) = 1 for all ω. The Dirac delta function δ(x)
enjoys the sifting property for all g(x); that is,

g(x_0) = \int_{-\infty}^{\infty} g(x)δ(x − x_0)\,dx.

It follows from the sifting and shifting properties that the FT of δ(x − x0 )
is the function eix0 ω .
The formula for the inverse FT now says

δ(x) = \frac{1}{2π}\int_{-\infty}^{\infty} e^{−ixω}\,dω.   (10.3)

If we try to make sense of this integral according to the rules of calculus we
get stuck quickly. The problem is that the integral formula doesn't mean
quite what it does ordinarily and the δ(x) is not really a function, but
an operator on functions; it is sometimes called a distribution. The Dirac
deltas are mathematical fictions, not in the bad sense of being lies or fakes,
but in the sense of being made up for some purpose. They provide helpful
descriptions of impulsive forces, probability densities in which a discrete
point has nonzero probability, or, in array processing, objects far enough
away to be viewed as occupying a discrete point in space.
We shall treat the relationship expressed by Equation (10.3) as a formal
statement, rather than attempt to explain the use of the integral in what
is surely an unconventional manner.
If we move the discussion into the ω domain and define the Dirac delta
function δ(ω) to be the FT of the function that has the value \frac{1}{2π} for all
x, then the FT of the complex exponential function \frac{1}{2π}e^{−iω_0 x} is δ(ω − ω_0),
visualized as a "spike" at ω_0, that is, a generalized function that has the
value +∞ at ω = ω_0 and zero elsewhere. This is a useful result, in that
it provides the motivation for considering the Fourier transform of a signal
s(t) containing hidden periodicities. If s(t) is a sum of complex exponentials
with frequencies −ωn , then its Fourier transform will consist of Dirac delta
functions δ(ω − ωn ). If we then estimate the Fourier transform of s(t) from
sampled data, we are looking for the peaks in the Fourier transform that
approximate the infinitely high spikes of these delta functions.

Exercise 10.10 Use the fact that sgn(x) = 2u(x) − 1 and the previous
exercise to show that f (x) = u(x) has the FT F (ω) = i/ω + πδ(ω).

Exercise 10.11 Let f, F be a FT pair. Let g(x) = \int_{-\infty}^{x} f(y)\,dy. Show that
the FT of g(x) is G(ω) = πF(0)δ(ω) + \frac{iF(ω)}{ω}.

Hint: For u(x) the Heaviside function we have

\int_{-\infty}^{x} f(y)\,dy = \int_{-\infty}^{\infty} f(y)u(x − y)\,dy.

10.5 More Properties of the Fourier Transform
We can use properties of the Dirac delta functions to extend the Parseval
Equation in Fourier series to Fourier transforms, where it is usually called
the Parseval-Plancherel Equation.

Exercise 10.12 Let f(x), F(ω) and g(x), G(ω) be Fourier transform pairs.
Use Equation (10.3) to establish the Parseval-Plancherel equation

⟨f, g⟩ = \int f(x)\overline{g(x)}\,dx = \frac{1}{2π}\int F(ω)\overline{G(ω)}\,dω,

from which it follows that

||f||^2 = ⟨f, f⟩ = \int |f(x)|^2\,dx = \frac{1}{2π}\int |F(ω)|^2\,dω.

Exercise 10.13 The one-sided Laplace transform (LT) of f is F given by

F(z) = \int_{0}^{\infty} f(x)e^{−zx}\,dx.

Compute F(z) for f (x) = u(x), the Heaviside function. Compare F(−iω)
with the FT of u.

10.6 Convolution Filters


Let h(x) and H(ω) be a Fourier-transform pair. We have mentioned several
times the basic problem of estimating the function H(ω) from finitely many
values of h(x); for convenience now we use the symbols h and H, rather
than f and F , as we did previously. Sometimes it is H(ω) that we really
want. Other times it is the unmeasured values of h(x) that we want, and
we try to estimate them by first estimating H(ω). Sometimes, neither
of these functions is our main interest; it may be the case that what we
want is another function, f (x), and h(x) is a distorted version of f (x).
For example, suppose that x is time and f (x) represents what a speaker
says into a telephone. The phone line distorts the signal somewhat, often
diminishing the higher frequencies. What the person at the other end
hears is not f (x), but a related signal function, h(x). For another example,
suppose that f (x, y) is a two-dimensional picture viewed by someone with
poor eyesight. What that person sees is not f (x, y) but a related function,
h(x, y), that is a distorted version of the true f (x, y). In both examples,
our goal is to recover the original undistorted signal or image. To do this,
it helps to model the distortion. Convolution filters are commonly used for
this purpose.

10.6.1 Blurring and Convolution Filtering


We suppose that what we measure are not values of f (x), but values of
h(x), where the Fourier transform of h(x) is

H(ω) = F (ω)G(ω).

The function G(ω) describes the effects of the system, the telephone line in
our first example, or the weak eyes in the second example, or the refraction
of light as it passes through the atmosphere, in optical imaging. If we
can use our measurements of h(x) to estimate H(ω) and if we have some
knowledge of the system distortion function, that is, some knowledge of
G(ω) itself, then there is a chance that we can estimate F (ω), and thereby
estimate f (x).
If we apply the Fourier Inversion Formula to H(ω) = F(ω)G(ω), we get

h(x) = \frac{1}{2π}\int F(ω)G(ω)e^{−iωx}\,dω.   (10.4)

The function h(x) that results is h(x) = (f ∗ g)(x), the convolution of the
functions f(x) and g(x), with the latter given by

g(x) = \frac{1}{2π}\int G(ω)e^{−iωx}\,dω.   (10.5)


Note that, if f (x) = δ(x), then h(x) = g(x). In the image processing
example, this says that if the true picture f is a single bright spot, the
blurred image h is g itself. For that reason, the function g is called the
point-spread function of the distorting system.
Convolution filtering refers to the process of converting any given func-
tion, say f (x), into a different function, say h(x), by convolving f (x) with a
fixed function g(x). Since this process can be achieved by multiplying F (ω)
by G(ω) and then inverse Fourier transforming, such convolution filters are
studied in terms of the properties of the function G(ω), known in this con-
text as the system transfer function, or the optical transfer function (OTF);
when ω is a frequency, rather than a spatial frequency, G(ω) is called the
frequency-response function of the filter. The magnitude of G(ω), |G(ω)|,
is called the modulation transfer function (MTF). The study of convolu-
tion filters is a major part of signal processing. Such filters provide both
reasonable models for the degradation signals undergo, and useful tools for
reconstruction.
Let us rewrite Equation (10.4), replacing F(ω) with its definition, as
given by Equation (10.1). Then we have

h(x) = \frac{1}{2π}\int\Big(\int f(t)e^{iωt}\,dt\Big)G(ω)e^{−iωx}\,dω.   (10.6)

Interchanging the order of integration, we get

h(x) = \int f(t)\Big(\frac{1}{2π}\int G(ω)e^{iω(t−x)}\,dω\Big)\,dt.   (10.7)

The inner integral is g(x − t), so we have

h(x) = \int f(t)g(x − t)\,dt;   (10.8)

this is the definition of the convolution of the functions f and g.

10.6.2 Low-Pass Filtering


If we know the nature of the blurring, then we know G(ω), at least to some
degree of precision. We can try to remove the blurring by taking mea-
surements of h(x), then estimating H(ω) = F (ω)G(ω), then dividing these
numbers by the value of G(ω), and then inverse Fourier transforming. The
problem is that our measurements are always noisy, and typical functions
G(ω) have many zeros and small values, making division by G(ω) danger-
ous, except where the values of G(ω) are not too small. These values of ω
tend to be the smaller ones, centered around zero, so that we end up with
estimates of F (ω) itself only for the smaller values of ω. The result is a
low-pass filtering of the object f (x).
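
The sketch below (an addition, assuming NumPy; the blur, noise level, and
threshold are invented for illustration) shows the difficulty: naive division
by G(ω) amplifies noise wherever |G(ω)| is small, while keeping only the
frequencies where |G(ω)| is not too small yields a stable, but low-pass-
filtered, estimate.

import numpy as np

rng = np.random.default_rng(1)
N = 512
f = np.zeros(N)
f[200:260] = 1.0                            # a simple object

g = np.exp(-0.5 * ((np.arange(N) - N // 2) / 4.0)**2)
g /= g.sum()                                # Gaussian point-spread function

F = np.fft.fft(f)
G = np.fft.fft(np.fft.ifftshift(g))         # center the PSF at index 0
H = F * G + np.fft.fft(0.01 * rng.standard_normal(N))   # blurred + noise

naive = np.fft.ifft(H / G).real             # division by tiny G: garbage
mask = np.abs(G) > 0.1                      # keep only the "safe" frequencies
F_est = np.where(mask, H / np.where(mask, G, 1.0), 0.0)
lowpass = np.fft.ifft(F_est).real           # stable low-pass estimate of f

print(np.max(np.abs(naive - f)))            # enormous
print(np.max(np.abs(lowpass - f)))          # modest: mostly smoothed edges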

To investigate such low-pass filtering, we suppose that G(ω) = 1, for
|ω| ≤ Ω, and is zero, otherwise. Then the filter is called the ideal Ω-low-
pass filter. In the farfield propagation model, the variable x is spatial,
and the variable ω is spatial frequency, related to how the function f (x)
changes spatially, as we move x. Rapid changes in f (x) are associated with
values of F (ω) for large ω. For the case in which the variable x is time, the
variable ω becomes frequency, and the effect of the low-pass filter on f (x)
is to remove its higher-frequency components.
One effect of low-pass filtering in image processing is to smooth out
the more rapidly changing features of an image. This can be useful if
these features are simply unwanted oscillations, but if they are important
detail, such as edges, the smoothing presents a problem. Restoring such
wanted detail is often viewed as removing the unwanted effects of the low-
pass filtering; in other words, we try to recapture the missing high-spatial-
frequency values that have been zeroed out. Such an approach to image
restoration is called frequency-domain extrapolation . How can we hope
to recover these missing spatial frequencies, when they could have been
anything? To have some chance of estimating these missing values we need
to have some prior information about the image being reconstructed.

10.7 Two-Dimensional Fourier Transforms


More generally, we consider a function f(x, y) of two real variables. Its
Fourier transformation is

F(α, β) = \int\int f(x, y)e^{i(xα+yβ)}\,dx\,dy.   (10.9)

For example, suppose that f(x, y) = 1 for \sqrt{x^2 + y^2} ≤ R, and zero,
otherwise. Then we have

F(α, β) = \int_{-π}^{π}\int_{0}^{R} e^{i(αr\cos θ+βr\sin θ)}\,r\,dr\,dθ.   (10.10)

In polar coordinates, with α = ρ\cos φ and β = ρ\sin φ, we have

F(ρ, φ) = \int_{0}^{R}\int_{-π}^{π} e^{irρ\cos(θ−φ)}\,dθ\,r\,dr.   (10.11)

The inner integral is well known;

\int_{-π}^{π} e^{irρ\cos(θ−φ)}\,dθ = 2πJ_0(rρ),   (10.12)

where J_0 denotes the 0th order Bessel function. Using the identity

\int_{0}^{z} t^n J_{n−1}(t)\,dt = z^n J_n(z),   (10.13)
we have

F(ρ, φ) = \frac{2πR}{ρ}J_1(ρR).   (10.14)

Notice that, since f(x, y) is a radial function, that is, dependent only on
the distance from (0, 0) to (x, y), its Fourier transform is also radial.
The first positive zero of J1 (t) is around t = 4, so when we measure
F at various locations and find F (ρ, φ) = 0 for a particular (ρ, φ), we can
estimate R ≈ 4/ρ. So, even when a distant spherical object, like a star,
is too far away to be imaged well, we can sometimes estimate its size by
finding where the intensity of the received signal is zero [32].
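
The closed form (10.14) can be checked numerically; the sketch below (an
addition, assuming NumPy and SciPy are available) compares a two-dimensional
Riemann sum for the transform of the disk's indicator function with
2πR J_1(ρR)/ρ.

import numpy as np
from scipy.special import j1

R = 1.0
x = np.linspace(-1.5, 1.5, 601)
X, Y = np.meshgrid(x, x)
dA = (x[1] - x[0])**2
disk = (X**2 + Y**2 <= R**2).astype(float)

for rho in [1.0, 2.5, 4.0]:
    alpha, beta = rho, 0.0                 # a radial function: any angle will do
    F_num = np.sum(disk * np.exp(1j * (alpha * X + beta * Y))) * dA
    F_exact = 2 * np.pi * R * j1(rho * R) / rho
    print(rho, abs(F_num - F_exact))       # small, limited by the grid spacing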

10.7.1 Two-Dimensional Fourier Inversion


Just as in the one-dimensional case, the Fourier transformation that pro-
duced F (α, β) can be inverted to recover the original f (x, y). The Fourier
Inversion Formula in this case is

f(x, y) = \frac{1}{4π^2}\int\int F(α, β)e^{−i(αx+βy)}\,dα\,dβ.   (10.15)

It is important to note that this procedure can be viewed as two one-
dimensional Fourier inversions: first, we invert F(α, β), as a function of,
say, β only, to get the function of α and y

g(α, y) = \frac{1}{2π}\int F(α, β)e^{−iβy}\,dβ;   (10.16)

second, we invert g(α, y), as a function of α, to get

f(x, y) = \frac{1}{2π}\int g(α, y)e^{−iαx}\,dα.   (10.17)

If we write the functions f (x, y) and F (α, β) in polar coordinates, we obtain
alternative ways to implement the two-dimensional Fourier inversion. We
shall consider these other ways when we discuss the tomography problem
of reconstructing a function f (x, y) from line-integral data.

10.7.2 A Discontinuous Function


Consider the function f(x) = \frac{1}{2A}, for |x| ≤ A, and f(x) = 0, otherwise.
The Fourier transform of this f(x) is

F(ω) = \frac{\sin(Aω)}{Aω},   (10.18)

for all real ω ≠ 0, and F(0) = 1. Note that F(ω) is nonzero throughout the
real line, except for isolated zeros, but that it goes to zero as we go to the
infinities. This is typical behavior. Notice also that the smaller the A, the
slower F(ω) dies out; the first zeros of F(ω) are at |ω| = π/A, so the main
lobe widens as A goes to zero. The function f(x) is not continuous, so its
Fourier transform cannot be absolutely integrable. In this case, the Fourier-
Transform Inversion Formula must be interpreted as involving convergence
in the L^2 norm.
Chapter 11

Transmission Tomography (Chapter 8)

In this part of the text we focus on transmission tomography. This chapter
will provide a detailed description of how the data is gathered, the mathe-
matical model of the scanning process, and the problem to be solved. The
emphasis here is on the role of the Fourier transform.

11.1 X-ray Transmission Tomography


Although transmission tomography is not limited to scanning living beings,
we shall concentrate here on the use of x-ray tomography in medical diag-
nosis and the issues that concern us in that application. The mathematical
formulation will, of course, apply more generally.
In x-ray tomography, x-rays are transmitted through the body along
many lines. In some, but not all, cases, the lines will all lie in the same
plane. The strength of the x-rays upon entering the body is assumed
known, and the strength upon leaving the body is measured. This data can
then be used to estimate the amount of attenuation the x-ray encountered
along that line, which is taken to be the integral, along that line, of the
attenuation function. On the basis of these line integrals, we estimate the
attenuation function. This estimate is presented to the physician as one or
more two-dimensional images.

11.2 The Exponential-Decay Model


As an x-ray beam passes through the body, it encounters various types of
matter, such as soft tissue, bone, ligaments, air, each weakening the beam
to a greater or lesser extent. If the intensity of the beam upon entry is I_{in}
and I_{out} is its lower intensity after passing through the body, then

I_{out} = I_{in} e^{−\int_L f},

where f = f(x, y) ≥ 0 is the attenuation function describing the two-
dimensional distribution of matter within the slice of the body being scanned
and \int_L f is the integral of the function f over the line L along which the
x-ray beam has passed. To see why this is the case, imagine the line L
parameterized by the variable s and consider the intensity function I(s)
as a function of s. For small ∆s > 0, the drop in intensity from the start
to the end of the interval [s, s + ∆s] is approximately proportional to the
intensity I(s), to the attenuation f (s) and to ∆s, the length of the interval;
that is,
I(s) − I(s + ∆s) ≈ f (s)I(s)∆s.
Dividing by ∆s and letting ∆s approach zero, we get

I′(s) = −f(s)I(s).

Exercise 11.1 Show that the solution to this differential equation is

I(s) = I(0)\exp\Big(−\int_{u=0}^{u=s} f(u)\,du\Big).

Hint: Use an integrating factor.
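
The closed-form solution in Exercise 11.1 can also be confirmed numerically;
the following sketch (an addition, assuming NumPy; the attenuation profile
is invented for illustration) integrates I′(s) = −f(s)I(s) by Euler steps and
compares the result with I(0) exp(−∫f).

import numpy as np

s = np.linspace(0.0, 2.0, 20001)
ds = s[1] - s[0]
f = 0.5 + 0.3 * np.sin(3 * s)**2        # a made-up attenuation along L

I = np.empty_like(s)
I[0] = 1.0                              # intensity upon entering the body
for n in range(len(s) - 1):             # Euler's method for I'(s) = -f(s) I(s)
    I[n + 1] = I[n] - f[n] * I[n] * ds

I_exact = np.exp(-np.cumsum(f) * ds)    # I(0) exp(-integral of f), I(0) = 1
print(abs(I[-1] - I_exact[-1]))         # small, shrinking with the step size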


From knowledge of I_{in} and I_{out}, we can determine \int_L f. If we know \int_L f
for every line in the x, y-plane we can reconstruct the attenuation function
f . In the real world we know line integrals only approximately and only
for finitely many lines. The goal in x-ray transmission tomography is to
estimate the attenuation function f (x, y) in the slice, from finitely many
noisy measurements of the line integrals. We usually have prior informa-
tion about the values that f (x, y) can take on. We also expect to find
sharp boundaries separating regions where the function f (x, y) varies only
slightly. Therefore, we need algorithms capable of providing such images.

11.3 Difficulties to be Overcome


There are several problems associated with this model. X-ray beams are
not exactly straight lines; the beams tend to spread out. The x-rays are not
monochromatic, and their various frequency components are attenuated at
different rates, resulting in beam hardening, that is, changes in the spec-
trum of the beam as it passes through the object (see the appendix on the
Laplace transform). The beams consist of photons obeying statistical laws,
so our algorithms probably should be based on these laws. How we choose
the line segments is determined by the nature of the problem; in certain
cases we are somewhat limited in our choice of these segments. Patients
move; they breathe, their hearts beat, and, occasionally, they shift position
during the scan. Compensating for these motions is an important, and dif-
ficult, aspect of the image reconstruction process. Finally, to be practical
in a clinical setting, the processing that leads to the reconstructed image
must be completed in a short time, usually around fifteen minutes. This
time constraint is what motivates viewing the three-dimensional attenua-
tion function in terms of its two-dimensional slices.
As we shall see, the Fourier transform and the associated theory of con-
volution filters play important roles in the reconstruction of transmission
tomographic images.
The data we actually obtain at the detectors are counts of detected
photons. These counts are not the line integrals; they are random quan-
tities whose means, or expected values, are related to the line integrals.
The Fourier inversion methods for solving the problem ignore its statistical
aspects; in contrast, other methods, such as likelihood maximization, are
based on a statistical model that involves Poisson-distributed emissions.

11.4 Reconstruction from Line Integrals


We turn now to the underlying problem of reconstructing attenuation func-
tions from line-integral data.

11.4.1 The Radon Transform


Our goal is to reconstruct the function f (x, y) ≥ 0 from line-integral data.
Let θ be a fixed angle in the interval [0, π). Form the t, s-axis system with
the positive t-axis making the angle θ with the positive x-axis, as shown
in Figure 11.1. Each point (x, y) in the original coordinate system has
coordinates (t, s) in the second system, where the t and s are given by

t = x cos θ + y sin θ,

and
s = −x sin θ + y cos θ.

If we have the new coordinates (t, s) of a point, the old coordinates are
(x, y) given by
x = t cos θ − s sin θ,

and
y = t sin θ + s cos θ.

We can then write the function f as a function of the variables t and s.


For each fixed value of t, we compute the integral

\int_L f(x, y)\,ds = \int f(t\cos θ − s\sin θ, t\sin θ + s\cos θ)\,ds

along the single line L corresponding to the fixed values of θ and t. We
repeat this process for every value of t and then change the angle θ and
repeat again. In this way we obtain the integrals of f over every line L in
the plane. We denote by r_f(θ, t) the integral

r_f(θ, t) = \int_L f(x, y)\,ds.

The function rf (θ, t) is called the Radon transform of f .

11.4.2 The Central Slice Theorem


For fixed θ the function r_f(θ, t) is a function of the single real variable t;
let R_f(θ, ω) be its Fourier transform. Then

R_f(θ, ω) = \int r_f(θ, t)e^{iωt}\,dt

= \int\int f(t\cos θ − s\sin θ, t\sin θ + s\cos θ)e^{iωt}\,ds\,dt

= \int\int f(x, y)e^{iω(x\cos θ+y\sin θ)}\,dx\,dy = F(ω\cos θ, ω\sin θ),

where F (ω cos θ, ω sin θ) is the two-dimensional Fourier transform of the


function f (x, y), evaluated at the point (ω cos θ, ω sin θ); this relationship
is called the Central Slice Theorem. For fixed θ, as we change the value
of ω, we obtain the values of the function F along the points of the line
making the angle θ with the horizontal axis. As θ varies in [0, π), we get all
the values of the function F . Once we have F , we can obtain f using the
formula for the two-dimensional inverse Fourier transform. We conclude
that we are able to determine f from its line integrals.
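
A discrete sanity check of the Central Slice Theorem (an addition, assuming
NumPy, with the DFT standing in for the FT): for θ = 0, the projection of
an image onto the x-axis is obtained by summing over y, and its one-
dimensional DFT equals the ky = 0 row of the two-dimensional DFT.

import numpy as np

rng = np.random.default_rng(2)
f = rng.random((64, 64))                     # rows index y, columns index x

projection = f.sum(axis=0)                   # r_f(0, t): integrate over s = y
slice_1d = np.fft.fft(projection)            # FT of the projection
slice_2d = np.fft.fft2(f)[0, :]              # the ky = 0 row of F

print(np.max(np.abs(slice_1d - slice_2d)))   # ~1e-12: the slices agree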
Figure 11.1: The Radon transform of f at (t, θ) is the line integral of f
along line L.
Chapter 12

The ART and MART (Chapter 15)

12.1 Overview
In many applications, such as in image processing, the system of linear
equations to be solved is quite large, often several tens of thousands of
equations in about the same number of unknowns. In these cases, issues
such as the costs of storage and retrieval of matrix entries, the computa-
tion involved in apparently trivial operations, such as matrix-vector prod-
ucts, and the speed of convergence of iterative methods demand greater
attention. At the same time, the systems to be solved are often under-
determined, and solutions satisfying certain additional constraints, such as
non-negativity, are required. The ART and the MART are two iterative
algorithms that are designed to address these issues.
Both the algebraic reconstruction technique (ART) and the multiplica-
tive algebraic reconstruction technique (MART) were introduced as two
iterative methods for discrete image reconstruction in transmission tomog-
raphy.
Both methods are what are called row-action methods, meaning that
each step of the iteration uses only a single equation from the system. The
MART is limited to non-negative systems for which non-negative solutions
are sought. In the under-determined case, both algorithms find the solution
closest to the starting vector, in the two-norm or weighted two-norm sense
for ART, and in the cross-entropy sense for MART, so both algorithms
can be viewed as solving optimization problems. For both algorithms, the
starting vector can be chosen to incorporate prior information about the
desired solution. In addition, the ART can be employed in several ways to
obtain a least-squares solution, in the over-determined case.


12.2 The ART in Tomography


For i = 1, ..., I, let Li be the set of pixel indices j for which the j-th pixel
intersects the i-th line segment, as shown in Figure 12.1, and let |Li | be the
cardinality of the set Li . Let Aij = 1 for j in Li , and Aij = 0 otherwise.
With i = k(mod I) + 1, the iterative step of the ART algorithm is

x_j^{k+1} = x_j^k + \frac{1}{|L_i|}\big(b_i − (Ax^k)_i\big),   (12.1)

for j in L_i, and

x_j^{k+1} = x_j^k,   (12.2)

if j is not in L_i. In each step of ART, we take the error, b_i − (Ax^k)_i,
associated with the current x^k and the i-th equation, and distribute it
equally over each of the pixels that intersects L_i.
A somewhat more sophisticated version of ART allows Aij to include
the length of the i-th line segment that lies within the j-th pixel; A_{ij} is
taken to be the ratio of this length to the length of the diagonal of the
j-th pixel.
More generally, ART can be viewed as an iterative method for solving
an arbitrary system of linear equations, Ax = b.

Figure 12.1: Line integrals through a discretized object.



12.3 The ART in the General Case


Let A be a complex matrix with I rows and J columns, and let b be a
member of CI . We want to solve the system Ax = b.
For each index value i, let Hi be the hyperplane of J-dimensional vectors
given by

Hi = {x|(Ax)i = bi }, (12.3)

and P_i the orthogonal projection operator onto H_i. Let x^0 be arbitrary
and, for each nonnegative integer k, let i(k) = k(mod I) + 1. The iterative
step of the ART is

xk+1 = Pi(k) xk . (12.4)

Because the ART uses only a single equation at each step, it has been called
a row-action method. Figures 12.2 and 12.3 illustrate the behavior of the
ART.

12.3.1 Calculating the ART


Given any vector z, the vector in H_i closest to z, in the sense of the Euclidean
distance, has the entries

x_j = z_j + \overline{A_{ij}}\big(b_i − (Az)_i\big)\Big/\sum_{m=1}^{J}|A_{im}|^2.   (12.5)

To simplify our calculations, we shall assume, throughout this chapter, that
the rows of A have been rescaled to have Euclidean length one; that is
\sum_{j=1}^{J}|A_{ij}|^2 = 1,   (12.6)

for each i = 1, ..., I, and that the entries of b have been rescaled accordingly,
to preserve the equations Ax = b. The ART is then the following: begin
with an arbitrary vector x0 ; for each nonnegative integer k, having found
xk , the next iterate xk+1 has entries

x_j^{k+1} = x_j^k + \overline{A_{ij}}\big(b_i − (Ax^k)_i\big).   (12.7)

When the system Ax = b has exact solutions the ART converges to the
solution closest to x0 , in the 2-norm. How fast the algorithm converges
will depend on the ordering of the equations and on whether or not we use
relaxation. In selecting the equation ordering, the important thing is to
avoid particularly bad orderings, in which the hyperplanes Hi and Hi+1
are nearly parallel.
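
A minimal sketch of this iteration for real systems (an addition, not from
the text; it assumes NumPy and follows Equation (12.7) after the row
rescaling (12.6)):

import numpy as np

def art(A, b, x0, sweeps=200):
    """ART for Ax = b; rows of A are rescaled to Euclidean length one."""
    norms = np.linalg.norm(A, axis=1)
    A = A / norms[:, None]
    b = b / norms
    x = x0.astype(float).copy()
    I = A.shape[0]
    for k in range(sweeps * I):
        i = k % I                       # i = k (mod I), zero-based here
        x = x + A[i] * (b[i] - A[i] @ x)
    return x

# A consistent, under-determined system: starting from x0 = 0, ART
# converges to the solution of Ax = b of smallest 2-norm.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
x = art(A, b, np.zeros(3))
print(x, A @ x)                         # A x reproduces b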

12.3.2 When Ax = b Has Solutions


For the consistent case, in which the system Ax = b has exact solutions,
we have the following result.

Theorem 12.1 Let Ax̂ = b and let x0 be arbitrary. Let {xk } be generated
by Equation (12.7). Then the sequence {||x̂ − xk ||2 } is decreasing and {xk }
converges to the solution of Ax = b closest to x0 .

12.3.3 When Ax = b Has No Solutions


When there are no exact solutions, the ART does not converge to a single
vector, but, for each fixed i, the subsequence {xnI+i , n = 0, 1, ...} converges
to a vector z i and the collection {z i |i = 1, ..., I} is called the limit cycle.
The ART limit cycle will vary with the ordering of the equations, and
contains more than one vector unless an exact solution exists. There are
several open questions about the limit cycle.

Open Question: For a fixed ordering, does the limit cycle depend on the
initial vector x0 ? If so, how?

12.3.4 The Geometric Least-Squares Solution


When the system Ax = b has no solutions, it is reasonable to seek an ap-
proximate solution, such as the least squares solution, xLS = (A† A)−1 A† b,
which minimizes ||Ax−b||2 . It is important to note that the system Ax = b
has solutions if and only if the related system W Ax = W b has solutions,
where W denotes an invertible matrix; when solutions of Ax = b exist, they
are identical to those of W Ax = W b. But, when Ax = b does not have
solutions, the least-squares solutions of Ax = b, which need not be unique,
but usually are, and the least-squares solutions of W Ax = W b need not
be identical. In the typical case in which A† A is invertible, the unique
least-squares solution of Ax = b is

(A† A)−1 A† b, (12.8)

while the unique least-squares solution of W Ax = W b is

(A† W † W A)−1 A† W † b, (12.9)

and these need not be the same.


A simple example is the following. Consider the system

x=1

x = 2, (12.10)

which has the unique least-squares solution x = 1.5, and the system

2x = 2

x = 2, (12.11)

which has the least-squares solution x = 1.2.

Definition 12.1 The geometric least-squares solution of Ax = b is the
least-squares solution of W Ax = W b, for W the diagonal matrix whose
entries are the reciprocals of the Euclidean lengths of the rows of A.

In our example above, the geometric least-squares solution for the first
system is found by using W11 = 1 = W22 , so is again x = 1.5, while the
geometric least-squares solution of the second system is found by using
W11 = 0.5 and W22 = 1, so that the geometric least-squares solution is
x = 1.5, not x = 1.2.
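
These two small systems can be checked directly (an addition, assuming
NumPy):

import numpy as np

A1, b1 = np.array([[1.0], [1.0]]), np.array([1.0, 2.0])
A2, b2 = np.array([[2.0], [1.0]]), np.array([2.0, 2.0])

print(np.linalg.lstsq(A1, b1, rcond=None)[0])   # [1.5]
print(np.linalg.lstsq(A2, b2, rcond=None)[0])   # [1.2]

# Geometric least squares: rescale each row to Euclidean length one first.
W = np.diag(1.0 / np.linalg.norm(A2, axis=1))
print(np.linalg.lstsq(W @ A2, W @ b2, rcond=None)[0])   # [1.5]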

Open Question: If there is a unique geometric least-squares solution,
where is it, in relation to the vectors of the limit cycle? Can it be calculated
easily, from the vectors of the limit cycle?
There is a partial answer to the second question. It is known that if
the system Ax = b has no exact solution, and if I = J + 1, then the
vectors of the limit cycle lie on a sphere in J-dimensional space having
the least-squares solution at its center. This is not true more generally,
however.

12.4 The MART


The multiplicative ART (MART) is an iterative algorithm closely related
to the ART. It also was devised to obtain tomographic images, but, like
ART, applies more generally; MART applies to systems of linear equations
Ax = b for which the bi are positive, the Aij are nonnegative, and the so-
lution x we seek is to have nonnegative entries. It is not so easy to see the
relation between ART and MART if we look at the most general formula-
tion of MART. For that reason, we begin with a simpler case, transmission
tomographic imaging, in which the relation is most clearly visible.

12.4.1 A Special Case of MART


We begin by considering the application of MART to the transmission
tomography problem. For i = 1, ..., I, let Li be the set of pixel indices j
for which the j-th pixel intersects the i-th line segment, and let |Li | be the

cardinality of the set Li . Let Aij = 1 for j in Li , and Aij = 0 otherwise.


With i = k(mod I) + 1, the iterative step of the ART algorithm is

x_j^{k+1} = x_j^k + \frac{1}{|L_i|}\big(b_i − (Ax^k)_i\big),   (12.12)

for j in L_i, and

x_j^{k+1} = x_j^k,   (12.13)

if j is not in L_i. In each step of ART, we take the error, b_i − (Ax^k)_i,
associated with the current x^k and the i-th equation, and distribute it
equally over each of the pixels that intersects L_i.
Suppose, now, that each bi is positive, and we know in advance that the
desired image we wish to reconstruct must be nonnegative. We can begin
with x0 > 0, but as we compute the ART steps, we may lose nonnegativity.
One way to avoid this loss is to correct the current xk multiplicatively,
rather than additively, as in ART. This leads to the multiplicative ART
(MART).
The MART, in this case, has the iterative step

x_j^{k+1} = x_j^k\Big(\frac{b_i}{(Ax^k)_i}\Big),   (12.14)

for those j in L_i, and

x_j^{k+1} = x_j^k,   (12.15)

otherwise. Therefore, we can write the iterative step as

x_j^{k+1} = x_j^k\Big(\frac{b_i}{(Ax^k)_i}\Big)^{A_{ij}}.   (12.16)

12.4.2 The MART in the General Case


Taking the entries of the matrix A to be either one or zero, depending on
whether or not the j-th pixel is in the set Li , is too crude. The line Li
may just clip a corner of one pixel, but pass through the center of another.
Surely, it makes more sense to let Aij be the length of the intersection of
line Li with the j-th pixel, or, perhaps, this length divided by the length of
the diagonal of the pixel. It may also be more realistic to consider a strip,
instead of a line. Other modifications to Aij may be made, in order to
better describe the physics of the situation. Finally, all we can be sure of
is that Aij will be nonnegative, for each i and j. In such cases, what is the
proper form for the MART?
The MART, which can be applied only to nonnegative systems, is a
sequential, or row-action, method that uses one equation only at each step
of the iteration.

Algorithm 12.1 (MART) Let x^0 be any positive vector, and i = k(mod I) +
1. Having found x^k for positive integer k, define x^{k+1} by

x_j^{k+1} = x_j^k\Big(\frac{b_i}{(Ax^k)_i}\Big)^{m_i^{−1}A_{ij}},   (12.17)

where m_i = \max\{A_{ij} | j = 1, 2, ..., J\}.

Some treatments of MART leave out the mi , but require only that the
entries of A have been rescaled so that Aij ≤ 1 for all i and j. The mi is
important, however, in accelerating the convergence of MART.
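
A minimal sketch of Algorithm 12.1 (an addition, not from the text; it
assumes NumPy, a nonnegative A, positive b, and a positive starting vector):

import numpy as np

def mart(A, b, x0, sweeps=500):
    """MART for a nonnegative system Ax = b, following Equation (12.17)."""
    x = x0.astype(float).copy()
    I = A.shape[0]
    m = A.max(axis=1)                    # m_i = max_j A_ij
    for k in range(sweeps * I):
        i = k % I
        ratio = b[i] / (A[i] @ x)
        x = x * ratio ** (A[i] / m[i])   # exponent is m_i^{-1} A_ij
    return x

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([2.0, 3.0])
x = mart(A, b, np.ones(3))               # x0 = ones: the entropy solution
print(x, A @ x)                          # A x reproduces b; x stays positive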

12.4.3 Cross-Entropy
For a > 0 and b > 0, let the cross-entropy or Kullback-Leibler distance
from a to b be

KL(a, b) = a\log\frac{a}{b} + b − a,   (12.18)

with KL(a, 0) = +∞, and KL(0, b) = b. Extend to nonnegative vectors
coordinate-wise, so that

KL(x, z) = \sum_{j=1}^{J} KL(x_j, z_j).   (12.19)

Unlike the Euclidean distance, the KL distance is not symmetric; KL(Ax, b)
and KL(b, Ax) are distinct, and we can obtain different approximate so-
lutions of Ax = b by minimizing these two distances with respect to non-
negative x.

12.4.4 Convergence of MART


In the consistent case, by which we mean that Ax = b has nonnegative
solutions, we have the following convergence theorem for MART.

Theorem 12.2 In the consistent case, the MART converges to the unique
nonnegative solution of b = Ax for which the distance \sum_{j=1}^{J} KL(x_j, x_j^0) is
minimized.

If the starting vector x^0 is the vector whose entries are all one, then the
MART converges to the solution that maximizes the Shannon entropy,

SE(x) = \sum_{j=1}^{J} x_j − x_j\log x_j.   (12.20)

As with ART, the speed of convergence is greatly affected by the order-
ing of the equations, converging most slowly when consecutive equations
correspond to nearly parallel hyperplanes.

Open Question: When there are no nonnegative solutions, MART does
not converge to a single vector, but, like ART, is always observed to produce
a limit cycle of vectors. Unlike ART, there is no proof of the existence of
a limit cycle for MART.
Figure 12.2: The ART algorithm in the consistent case.

Figure 12.3: The ART algorithm in the inconsistent case.

Chapter 13

Some Linear Algebra (Chapter 15)

Linear algebra is the study of linear transformations between vector spaces.


Although the subject is not simply matrix theory, there is a close con-
nection, stemming from the role of matrices in representing linear trans-
formations. Throughout this section we shall limit discussion to finite-
dimensional vector spaces.

13.1 Matrix Algebra


If A and B are real or complex M by N and N by K matrices, respectively,
then the product C = AB is defined as the M by K matrix whose entry
C_{mk} is given by

C_{mk} = \sum_{n=1}^{N} A_{mn}B_{nk}.   (13.1)

If x is an N-dimensional column vector, that is, x is an N by 1 matrix,
then the product b = Ax is the M-dimensional column vector with entries

b_m = \sum_{n=1}^{N} A_{mn}x_n.   (13.2)

Exercise 13.1 Show that, for each k = 1, ..., K, Colk (C), the kth column
of the matrix C = AB, is

Colk (C) = AColk (B).


It follows from this exercise that, for given matrices A and C, every column
of C is a linear combination of the columns of A if and only if there is a
third matrix B such that C = AB.
The matrix A† is the conjugate transpose of the matrix A, that is, the
N by M matrix whose entries are

(A†)_{nm} = \overline{A_{mn}}.   (13.3)

When the entries of A are real, A† is just the transpose of A, written A^T.


Exercise 13.2 Let C = AB. Show that B † A† = C † .

13.2 Linear Independence and Bases


As we shall see shortly, the dimension of a finite-dimensional vector space
will be defined as the number of members of any basis. Obviously, we
first need to see what a basis is, and then to convince ourselves that if a
vector space V has a basis with N members, then every basis for V has N
members.

Definition 13.1 The span of a collection of vectors {u1, ..., uN} in V is
the set of all vectors x that can be written as linear combinations of the un;
that is, for which there are scalars c1 , ..., cN , such that

x = c1 u1 + ... + cN uN . (13.4)

Definition 13.2 A collection of vectors {w1, ..., wN} in V is called a span-
ning set for a subspace S if the set S is their span.

Definition 13.3 A subset S of a vector space V is called finite dimensional
if it is contained in the span of a finite set of vectors from V.

This definition tells us what it means to be finite dimensional, but does
not tell us what dimension means, nor what the actual dimension of a finite
dimensional subset is; for that we need the notions of linear independence
and basis.

Definition 13.4 A collection of vectors {u1, ..., uN} in V is linearly inde-
pendent if there is no choice of scalars α1, ..., αN, not all zero, such that

0 = α1 u1 + ... + αN uN . (13.5)

Exercise 13.3 Show that the following are equivalent:


• 1. the set U = {u1 , ..., uN } is linearly independent;
• 2. no un is a linear combination of the other members of U;

• 3. u1 ≠ 0 and no un is a linear combination of the members of U
that precede it in the list.

Definition 13.5 A collection of vectors U = {u1, ..., uN} in V is called
a basis for a subspace S if the collection is linearly independent and S is
their span.

Exercise 13.4 Show that

• 1. if U = {u1, ..., uN} is a spanning set for S, then U is a basis for S
if and only if, after the removal of any one member, U is no longer
a spanning set; and

• 2. if U = {u1, ..., uN} is a linearly independent set in S, then U is a
basis for S if and only if, after including in U any new member from
S, U is no longer linearly independent.

13.3 Dimension
We turn now to the task of showing that every basis for a finite dimensional
vector space has the same number of members. That number will then be
used to define the dimension of that subspace.
Suppose that S is a subspace of V , that {w1 , ..., wN } is a spanning set
for S, and {u1 , ..., uM } is a linearly independent subset of S. Beginning
with w1 , we augment the set {u1 , ..., uM } with wj if wj is not in the span of
the um and the wk previously included. At the end of this process, we have
a linearly independent spanning set, and therefore, a basis, for S (Why?).
Similarly, beginning with w1 , we remove wj from the set {w1 , ..., wN } if wj
is a linear combination of the wk , k = 1, ..., j − 1. In this way we obtain
a linearly independent set that spans S, hence another basis for S. The
following lemma will allow us to prove that all bases for a subspace S have
the same number of elements.

Lemma 13.1 Let G = {w^1, ..., w^N} be a spanning set for a subspace S
in R^I, and H = {v^1, ..., v^M} a linearly independent subset of S. Then
M ≤ N.

Proof: Suppose that M > N. Let B0 = G = {w^1, ..., w^N}. To obtain the
set B1, form the set C1 = {v^1, w^1, ..., w^N} and remove the first member of
C1 that is a linear combination of members of C1 that occur to its left in
the listing; since v^1 has no members to its left, it is not removed. Since G
is a spanning set, v^1 ≠ 0 is a linear combination of the members of G, so
that some member of G is a linear combination of v^1 and the members of
G that precede it in the list; remove the first member of G for which this
is true.

We note that the set B1 is a spanning set for S and has N members.
Having obtained the spanning set Bk , with N members and whose first k
members are v k , ..., v 1 , we form the set Ck+1 = Bk ∪ {v k+1 }, listing the
members so that the first k + 1 of them are {v k+1 , v k , ..., v 1 }. To get the set
Bk+1 we remove the first member of Ck+1 that is a linear combination of
the members to its left; there must be one, since Bk is a spanning set, and
so v k+1 is a linear combination of the members of Bk . Since the set H is
linearly independent, the member removed is from the set G. Continuing
in this fashion, we obtain a sequence of spanning sets B1 , ..., BN , each with
N members. The set BN is BN = {v 1 , ..., v N } and v N +1 must then be
a linear combination of the members of BN , which contradicts the linear
independence of H.

Corollary 13.1 Every basis for a subspace S has the same number of el-
ements.
Exercise 13.5 Let G = {w^1, ..., w^N} be a spanning set for a subspace S
in R^I, and H = {v^1, ..., v^M} a linearly independent subset of S. Let A be
the I by M matrix whose columns are the vectors v m and B the I by N
matrix whose columns are the wn . Prove that there is an N by M matrix
C such that A = BC. Prove Lemma 13.1 by showing that, if M > N , then
there is a non-zero vector x with Cx = 0.
Definition 13.6 The dimension of a subspace S is the number of elements
in any basis.
Lemma 13.2 For any matrix A, the maximum number of linearly inde-
pendent rows equals the maximum number of linearly independent columns.

Proof: Suppose that A is an I by J matrix, and that K ≤ J is the
maximum number of linearly independent columns of A. Select K linearly
independent columns of A and use them as the K columns of an I by K
matrix U . Since every column of A must be a linear combination of these
K selected ones, there is a K by J matrix M such that A = U M . From
A^T = M^T U^T we conclude that every column of A^T is a linear combination
of the K columns of the matrix M^T. Therefore, there can be at most K
linearly independent columns of A^T.

Definition 13.7 The rank of A is the maximum number of linearly independent rows or of linearly independent columns of A.
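
Lemma 13.2 is easy to check numerically: the rank computed from a matrix and from its transpose always agree. A small illustration in Python/numpy (the example matrix is our own):

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0],   # twice the first row
                  [1.0, 0.0, 1.0]])

    # The maximum number of linearly independent rows equals the
    # maximum number of linearly independent columns.
    print(np.linalg.matrix_rank(A))    # 2
    print(np.linalg.matrix_rank(A.T))  # 2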

13.4 Representing a Linear Transformation


Let A = {a1 , a2 , ..., aN } be a basis for the finite-dimensional complex vector
space V . Now that the basis for V is specified, there is a natural association,

an isomorphism, between V and the vector space CN of N -dimensional column vectors with complex entries. Any vector v in V can be written as

v = γ1 a1 + γ2 a2 + ... + γN aN . (13.6)

The column vector γ = (γ1 , ..., γN )^T is uniquely determined by v and the basis A and we denote it by γ = [v]A . Notice that the ordering of the list of members of A matters, so we shall always assume that the ordering has been fixed.
Let W be a second finite-dimensional vector space, and let T be any
linear transformation from V to W . Let B = {b1 , b2 , ..., bM } be a basis for
W . For n = 1, ..., N , let

T an = A1n b1 + A2n b2 + ... + AMn bM . (13.7)

Then the M by N matrix A having the Amn as entries is said to represent T , with respect to the bases A and B.

Exercise 13.6 Show that [T v]B = A[v]A .

Exercise 13.7 Suppose that V , W and Z are vector spaces, with bases
A, B and C, respectively. Suppose also that T is a linear transformation
from V to W and U is a linear transformation from W to Z. Let A
represent T with respect to the bases A and B, and let B represent U with
respect to the bases B and C. Show that the matrix BA represents the linear
transformation U T with respect to the bases A and C.
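
Exercises 13.6 and 13.7 ask for proofs, but the identity [T v]B = A[v]A is easy to test numerically. In the sketch below (our own construction, not from the text: V = R2 with basis the columns of P , W = R3 with basis the columns of Q, and T given by a matrix T_std acting in standard coordinates), coordinate vectors are found by solving linear systems:

    import numpy as np

    # T acts on R^2 by multiplication by T_std (standard coordinates).
    T_std = np.array([[1.0, 2.0],
                      [0.0, 1.0],
                      [3.0, 0.0]])

    P = np.array([[1.0, 1.0],        # columns: the basis A for V = R^2
                  [0.0, 1.0]])
    Q = np.array([[1.0, 0.0, 1.0],   # columns: the basis B for W = R^3
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 1.0]])

    # The matrix representing T: column n holds the B-coordinates of T a_n.
    A_rep = np.linalg.solve(Q, T_std @ P)

    v = np.array([2.0, -1.0])
    coords_v = np.linalg.solve(P, v)           # [v]_A
    lhs = np.linalg.solve(Q, T_std @ v)        # [T v]_B
    print(np.allclose(lhs, A_rep @ coords_v))  # True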

13.5 Linear Functionals and Duality


When the second vector space W is just the space C of complex numbers,
any linear transformation from V to W is called a linear functional. The
space of all linear functionals on V is denoted V ∗ and called the dual space
of V . The set V ∗ is itself a finite-dimensional vector space, so it too has a
dual space, (V ∗ )∗ = V ∗∗ .

Exercise 13.8 Show that the dimension of V ∗ is the same as that of V . Hint: let A = {a1 , ..., aN } be a basis for V , and for each m = 1, ..., N , let f m (an ) = 0, if m ≠ n, and f m (am ) = 1. Show that the collection {f 1 , ..., f N } is a basis for V ∗ .

There is a natural identification of V ∗∗ with V itself. For each v in V , define Jv (f ) = f (v) for each f in V ∗ . Then it is easy to establish that Jv is in V ∗∗ for each v in V . The set JV of all members of V ∗∗ of the form Jv for some v is a subspace of V ∗∗ .

Exercise 13.9 Show that the subspace JV has the same dimension as V ∗∗
itself, so that it must be all of V ∗∗ .

We shall see later that once V has been endowed with an inner product, there is a simple way to describe every linear functional on V : for each f in V ∗ there is a unique vector vf in V with f (v) = ⟨v, vf ⟩, for each v in V . As a result, we have an identification of V ∗ with V itself.

13.6 Linear Operators on V


When W = V , we say that the linear transformation T is a linear operator
on V . In this case, we can also take the basis B to be A, and say that the
matrix A represents the linear operator T , with respect to the basis A. We
then write A = [T ]A .

Exercise 13.10 Suppose that B is a second basis for V . Show that there is
a unique N by N matrix Q having the property that the matrix B = QAQ−1
represents T , with respect to the basis B; that is, we can write

[T ]B = Q[T ]A Q−1 .

Hint: The matrix Q is the change-of-basis matrix, satisfying

[v]B = Q[v]A ,

for all v.

13.7 Diagonalization
Let T : V → V be a linear operator, A a basis for V , and A = [T ]A . As we
change the basis, the matrix representing T also changes. We wonder if it
is possible to find some basis B such that B = [T ]B is a diagonal matrix L.
Let P be the change-of-basis matrix from B to A, that is, the matrix with [v]A = P [v]B for all v. We would then have P −1 AP = L, or A = P LP −1 . When this happens, we say that A has been diagonalized by P .
Suppose that the basis B = {b1 , ..., bN } is such that B = [T ]B = L,
where L is the diagonal matrix L = diag {λ1 , ..., λN }. Then we have AP =
P L, which tells us that pn , the n-th column of P , is an eigenvector of the
matrix A, with λn as its eigenvalue. Since pn = [bn ]A , we have

0 = (A − λn I)pn = (A − λn I)[bn ]A = [(T − λn I)bn ]A ,

from which we conclude that

(T − λn I)bn = 0,

or
T bn = λn bn ;
therefore, bn is an eigenvector of the linear operator T .

13.8 Using Matrix Representations


The matrix A has eigenvalues λn , n = 1, ..., N precisely when these λn are
the roots of the characteristic polynomial

P (λ) = det (A − λI).

We would like to be able to define the characteristic polynomial of T itself to be P (λ); the problem is that we do not yet know that different matrix representations of T have the same characteristic polynomial.

Exercise 13.11 Use the fact that det(GH)=det(G)det(H) for any square
matrices G and H to show that

det([T ]B − λI) = det([T ]C − λI),

for any bases B and C for V .
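
A quick numerical check of this invariance (a sketch of our own; np.poly returns the coefficients of det(λI − A), which agree for similar matrices):

    import numpy as np

    A = np.array([[4.0, -1.0],
                  [2.0,  1.0]])
    Q = np.array([[1.0, 1.0],
                  [1.0, 2.0]])     # any invertible change-of-basis matrix

    B = np.linalg.inv(Q) @ A @ Q   # the same operator in another basis

    # Similar matrices have the same characteristic polynomial.
    print(np.poly(A))   # [ 1. -5.  6.], i.e. (lambda - 2)(lambda - 3)
    print(np.poly(B))   # the same coefficients, up to rounding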

13.9 Matrix Diagonalization and Systems of Linear ODE's
We know that the ordinary linear differential equation

x'(t) = ax(t)

has the solution

x(t) = x(0)e^{at}.

In this section we use matrix diagonalization to generalize this solution to systems of linear ordinary differential equations.
Consider the system of linear ordinary differential equations

x'(t) = 4x(t) − y(t) (13.8)

y'(t) = 2x(t) + y(t), (13.9)
which we write as z'(t) = Az(t), with

A = [ 4  −1 ]
    [ 2   1 ] ,

z(t) = [ x(t) ]
       [ y(t) ] ,

and

z'(t) = [ x'(t) ]
        [ y'(t) ] .
We then have

det(A − λI) = (4 − λ)(1 − λ) + 2 = (λ − 2)(λ − 3),

so the eigenvalues of A are λ = 2 and λ = 3.


The vector u = (1, 2)^T solves the system Au = 2u, and the vector v = (1, 1)^T

solves the system Av = 3v. Therefore, u and v are linearly independent eigenvectors of A. With

B = [ 1  1 ]
    [ 2  1 ] ,

B^{−1} = [ −1   1 ]
         [  2  −1 ] ,

and

D = [ 2  0 ]
    [ 0  3 ] ,
we have A = BDB −1 and B −1 AB = D; this is a diagonalization of A using
its eigenvalues and eigenvectors.
Note that not every N by N matrix A will have such a diagonalization;
we need N linearly independent eigenvectors of A, which need not exist.
They do exist if the eigenvalues of A are all different, as in the example
here, and also if the matrix A is Hermitian or normal. The reader should prove that the matrix

M = [ 1  1 ]
    [ 0  1 ]

has no such diagonalization.
Continuing with our example, we let w(t) = B^{−1} z(t), so that w'(t) = Dw(t). Because D is diagonal, this new system is uncoupled:

w1'(t) = 2w1(t),

and

w2'(t) = 3w2(t).

The solutions are then

w1(t) = w1(0)e^{2t},

and

w2(t) = w2(0)e^{3t}.
It follows from z(t) = Bw(t) that

x(t) = w1(0)e^{2t} + w2(0)e^{3t},

and

y(t) = 2w1(0)e^{2t} + w2(0)e^{3t}.

We want to express x(t) and y(t) in terms of x(0) and y(0). To do this we use z(0) = Bw(0), which tells us that

x(t) = (−x(0) + y(0))e^{2t} + (2x(0) − y(0))e^{3t},

and

y(t) = (−2x(0) + 2y(0))e^{2t} + (2x(0) − y(0))e^{3t}.

We can rewrite this as

z(t) = E(t)z(0),

where

E(t) = [ −e^{2t} + 2e^{3t}     e^{2t} − e^{3t}  ]
       [ −2e^{2t} + 2e^{3t}   2e^{2t} − e^{3t}  ] .
What is the matrix E(t)?
To mimic the solution x(t) = x(0)e^{at} of the problem x'(t) = ax(t), we try

z(t) = e^{tA} z(0),

with the matrix exponential defined by

e^{tA} = I + tA + (1/2!)t^2 A^2 + (1/3!)t^3 A^3 + ... .

Since A = BDB^{−1}, it follows that A^n = BD^n B^{−1}, so that

e^{tA} = B e^{tD} B^{−1}.

Since D is diagonal, we have

e^{tD} = [ e^{2t}   0      ]
         [ 0        e^{3t} ] .

A simple calculation shows that

e^{tA} = B e^{tD} B^{−1} = [ −e^{2t} + 2e^{3t}     e^{2t} − e^{3t}  ]
                           [ −2e^{2t} + 2e^{3t}   2e^{2t} − e^{3t}  ] = E(t).

Therefore, the solution of the original system is

z(t) = e^{tA} z(0).
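
The calculation above is easy to verify in a few lines of Python (our own check, using scipy.linalg.expm for the matrix exponential):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[4.0, -1.0],
                  [2.0,  1.0]])
    B = np.array([[1.0, 1.0],
                  [2.0, 1.0]])     # columns are the eigenvectors u and v
    D = np.diag([2.0, 3.0])

    t = 0.7
    E = np.array([[-np.exp(2*t) + 2*np.exp(3*t),   np.exp(2*t) - np.exp(3*t)],
                  [-2*np.exp(2*t) + 2*np.exp(3*t), 2*np.exp(2*t) - np.exp(3*t)]])

    # e^{tA} computed directly, and via the diagonalization A = B D B^{-1}.
    print(np.allclose(expm(t * A), E))   # True
    print(np.allclose(B @ np.diag(np.exp(t * np.diag(D))) @ np.linalg.inv(B), E))  # True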

13.10 An Inner Product on V


For any two column vectors x = (x1 , ..., xN )^T and y = (y1 , ..., yN )^T in CN , their complex dot product is defined by

x · y = x1 ȳ1 + x2 ȳ2 + ... + xN ȳN = y† x,

where y† is the conjugate transpose of the vector y, that is, y† is the row vector with entries ȳn .
The association of the elements v in V with the complex column vector [v]A can be used to obtain an inner product on V . For any v and w in V , define

⟨v, w⟩ = [v]A · [w]A , (13.10)

where the right side is the ordinary complex dot product in CN . Once we have an inner product on V we can define the norm of a vector in V as ‖v‖ = √⟨v, v⟩.
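
One caution if this is checked in Python: numpy's np.vdot conjugates its first argument, while the convention here, x · y = y† x, conjugates the second. A small sketch (the vectors are our own):

    import numpy as np

    x = np.array([1 + 2j, 3j])
    y = np.array([2 - 1j, 1 + 1j])

    dot_xy = np.sum(x * np.conj(y))   # x . y = y^dagger x, as defined above
    print(dot_xy)
    print(np.conj(y) @ x)             # the same value, written as y^dagger x
    print(np.vdot(y, x))              # np.vdot conjugates its FIRST argument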

Definition 13.8 A collection of vectors {u1 , ..., uN } in an inner product space V is called orthonormal if ‖un ‖^2 = 1, for all n, and ⟨um , un ⟩ = 0, for m ≠ n.

Note that, with respect to this inner product, the basis A becomes an
orthonormal basis.
We assume, throughout the remainder of this section, that V is an
inner-product space. For more detail concerning inner products, see the
chapter Appendix: Inner Products and Orthogonality.

13.11 Representing Linear Functionals


Let f : V → C be a linear functional on the inner-product space V and let
A = {a1 , ..., aN } be the basis for V used to define the inner product, as in
Equation (13.10). The singleton set {1} is a basis for the space W = C,

and the matrix A that represents T = f is a 1 by N matrix, or row vector, A = Af with entries f (an ). Therefore, for each

v = α1 a1 + ... + αN aN

in V , we have

f (v) = Af [v]A = f (a1 )α1 + ... + f (aN )αN .

Consequently, we can write

f (v) = ⟨v, yf ⟩,

for the vector yf with Af = ([yf ]A )† , or

yf = \overline{f (a1 )} a1 + ... + \overline{f (aN )} aN ,

where the overline denotes complex conjugation.

So we see that once V has been given an inner product, each linear functional f on V can be thought of as corresponding to a vector yf in V , so that

f (v) = ⟨v, yf ⟩.
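
As a concrete check (our own sketch, in C3 with the standard orthonormal basis): the vector yf built from the conjugates of the values f (an ) reproduces f through the inner product.

    import numpy as np

    # Values f(a_n) of a linear functional on C^3, chosen arbitrarily.
    f_vals = np.array([1 + 1j, 2.0, -1j])

    def f(v):
        return np.sum(f_vals * v)       # f(v) = sum of f(a_n) alpha_n

    y_f = np.conj(f_vals)               # y_f has coordinates conj(f(a_n))

    v = np.array([0.5 - 1j, 2j, 3.0])
    inner = np.sum(v * np.conj(y_f))    # <v, y_f>, conjugating the second slot
    print(np.allclose(f(v), inner))     # True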

Exercise 13.12 Show that the vector yf associated with the linear functional f is unique by showing that

⟨v, y⟩ = ⟨v, w⟩,

for every v in V implies that y = w.

13.12 The Adjoint of a Linear Transformation
Let T : V → W be a linear transformation from a vector space V to a
vector space W . The adjoint of T is the linear operator T ∗ : W ∗ → V ∗
defined by

(T ∗ g)(v) = g(T v), (13.11)

for each g ∈ W ∗ and v ∈ V .


Once V and W have been given inner products, and V ∗ and W ∗ have
been identified with V and W , respectively, the operator T ∗ can be defined
as a linear operator from W to V as follows. Let T : V → W be a linear

transformation from an inner-product space V to an inner-product space W . For each fixed w in W , define a linear functional f on V by

f (v) = ⟨T v, w⟩.

By our earlier discussion, f has an associated vector yf in V such that

f (v) = ⟨v, yf ⟩.

Therefore,

⟨T v, w⟩ = ⟨v, yf ⟩,

for each v in V . The adjoint of T is the linear transformation T ∗ from W to V defined by T ∗ w = yf .
When W = V , and T is a linear operator on V , then so is T ∗ . In
this case, we can ask whether or not T ∗ T = T T ∗ , that is, whether or not
T is normal, and whether or not T = T ∗ , that is, whether or not T is
self-adjoint.
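
With respect to orthonormal bases the adjoint is represented by the conjugate transpose (see Exercise 13.14 below), so the defining identity ⟨T v, w⟩ = ⟨v, T ∗ w⟩ can be tested directly. A minimal sketch (our own example):

    import numpy as np

    def inner(a, b):
        # Complex inner product <a, b>, conjugating the second argument.
        return np.sum(a * np.conj(b))

    T = np.array([[1 + 1j, 2.0],
                  [0.0,    3j],
                  [1.0,   -1.0]])   # a map from C^2 to C^3
    T_adj = T.conj().T              # the adjoint: conjugate transpose

    v = np.array([1.0 - 2j, 4.0])
    w = np.array([2j, 1.0, 1 + 1j])
    print(np.allclose(inner(T @ v, w), inner(v, T_adj @ w)))  # True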

13.13 Orthogonality
Two vectors v and w in the inner-product space V are said to be orthogonal if ⟨v, w⟩ = 0. A basis U = {u1 , u2 , ..., uN } is called an orthogonal basis if every two vectors in U are orthogonal, and orthonormal if, in addition, ‖un ‖ = 1, for each n.
Exercise 13.13 Let U and V be orthonormal bases for the inner-product
space V , and let Q be the change-of-basis matrix satisfying
[v]U = Q[v]V .
Show that Q−1 = Q† , so that Q is a unitary matrix.
Exercise 13.14 Let U be an orthonormal basis for the inner-product space
V and T a linear operator on V . Show that
[T ∗ ]U = ([T ]U )† . (13.12)

13.14 Normal and Self-Adjoint Operators


Let T be a linear operator on an inner-product space V . We say that T is
normal if T ∗ T = T T ∗ , and self-adjoint if T ∗ = T . A square matrix A is
said to be normal if A† A = AA† , and Hermitian if A† = A.
Exercise 13.15 Let U be an orthonormal basis for the inner-product space
V . Show that T is normal if and only if [T ]U is a normal matrix, and T is
self-adjoint if and only if [T ]U is Hermitian. Hint: use Exercise (13.7).

Exercise 13.16 Compute the eigenvalues for the real square matrix

A = [ 1   2 ]
    [ −2  1 ] .  (13.13)

Note that the eigenvalues are complex, even though the entries of A are
real. The matrix A is not Hermitian.

Exercise 13.17 Show that the eigenvalues of the complex matrix

B = [ 1      2 + i ]
    [ 2 − i  1     ]  (13.14)

are the real numbers λ = 1 + √5 and λ = 1 − √5, with corresponding eigenvectors u = (√5, 2 − i)^T and v = (√5, i − 2)^T , respectively.

Exercise 13.18 Show that the eigenvalues of the real matrix

C = [ 1  1 ]
    [ 0  1 ]  (13.15)

are both equal to one, and that the only eigenvectors are non-zero multiples
of the vector (1, 0)T . Compute C T C and CC T . Are they equal?

13.15 It is Good to be “Normal”


For a given linear operator T on V , when does there exist an orthonormal basis for V consisting of eigenvectors of T ? The answer is: when T is normal.
Consider an N by N matrix A. We use A to define a linear operator T
on the space of column vectors V = CN by T v = Av, that is, the operator
T works by multiplying each column vector v in CN by the matrix A.
Then A represents T with respect to the usual orthonormal basis A for
CN . Suppose now that there is an orthonormal basis U = {u1 , ..., uN } for
CN such that
Aun = λn un ,
for each n. The matrix representing T in the basis U is the matrix B =
Q−1 AQ, where Q is the change-of-basis matrix with

Q[v]U = [v]A .

But we also know that B is the diagonal matrix B = L = diag(λ1 , ..., λN ). Therefore, L = Q−1 AQ, or A = QLQ−1 .
As we saw in Exercise (13.13), the matrix Q is unitary, that is, Q−1 = Q† . Therefore, A = QLQ† . Then, since L is diagonal, so that L† L = LL† , we have

A† A = QL† Q† QLQ† = QL† LQ† = QLL† Q† = QLQ† QL† Q† = AA† ,

so that

A† A = AA† ,

and A is normal.
Two fundamental results in linear algebra are the following.

Theorem 13.1 For a linear operator T on a finite-dimensional complex inner-product space V there is an orthonormal basis of eigenvectors if and only if T is normal.

Corollary 13.2 A self-adjoint linear operator T on a finite-dimensional complex inner-product space V has an orthonormal basis of eigenvectors.

Exercise 13.19 Show that the eigenvalues of a self-adjoint linear operator T on a finite-dimensional complex inner-product space are real numbers. Hint: consider T u = λu, and begin with λ⟨u, u⟩ = ⟨T u, u⟩.

Combining the various results obtained so far, we can conclude the follow-
ing.

Corollary 13.3 Let T be a linear operator on a finite-dimensional real inner-product space V . Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is self-adjoint.

We present a proof of the following theorem.

Theorem 13.2 For a linear operator T on a finite-dimensional complex inner-product space V there is an orthonormal basis of eigenvectors if and only if T is normal.

We saw previously that if V has an orthonormal basis of eigenvectors of T , then T is a normal operator. We need to prove the converse: if T is normal, then V has an orthonormal basis consisting of eigenvectors of T .
A subspace W of V is said to be T -invariant if T w is in W whenever
w is in W . For any T -invariant subspace W , the restriction of T to W ,
denoted TW , is a linear operator on W .
For any subspace W , the orthogonal complement of W is the space

W ⊥ = {v | ⟨w, v⟩ = 0, for all w ∈ W }.

Proposition 13.1 Let W be a T -invariant subspace of V . Then

• (a) if T is self-adjoint, so is TW ;

• (b) W ⊥ is T ∗ -invariant;

• (c) if W is both T - and T ∗ -invariant, then (TW )∗ = (T ∗ )W ;


13.15. IT IS GOOD TO BE “NORMAL” 117

• (d) if W is both T - and T ∗ -invariant, and T is normal, then TW is normal;

• (e) if T is normal and T x = λx, then T ∗ x = λ̄x.

Exercise 13.20 Prove Proposition (13.1).

Proposition 13.2 If T is normal, T u1 = λ1 u1 , T u2 = λ2 u2 , and λ1 ≠ λ2 , then ⟨u1 , u2 ⟩ = 0.

Exercise 13.21 Prove Proposition 13.2. Hint: use (e) of Proposition 13.1.

Proof of Theorem 13.2 The proof is by induction on the dimension of the inner-product space V . To begin with, let N = 1, so that V is simply the span of some unit vector x. Then any linear operator T on V has T x = λx, for some λ, and the set {x} is an orthonormal basis for V .
Now suppose that the theorem is true for every inner-product space of dimension N − 1. We know that every linear operator T on V has at least one eigenvector, say x1 , since its characteristic polynomial has at least one root λ1 in C. Take x1 to be a unit vector. Let W be the span of the vector x1 , and W ⊥ the orthogonal complement of W . Since T x1 = λ1 x1 and T is normal, we know that T ∗ x1 = λ̄1 x1 . Therefore, both W and W ⊥ are T - and T ∗ -invariant, so that TW ⊥ is normal on W ⊥ . By the induction hypothesis, we know that W ⊥ has an orthonormal basis consisting of N − 1 eigenvectors of TW ⊥ , and, therefore, of T . Augmenting this set with the original x1 , we get an orthonormal basis for all of V .

Corollary 13.4 A self-adjoint linear operator T on a finite-dimensional complex inner-product space V has an orthonormal basis of eigenvectors.

Corollary 13.5 Let T be a linear operator on a finite-dimensional real inner-product space V . Then V has an orthonormal basis consisting of eigenvectors of T if and only if T is self-adjoint.

Proving the existence of the orthonormal basis uses essentially the same argument as the induction proof given earlier. The eigenvalues of a self-adjoint linear operator T on a finite-dimensional complex inner-product space are real numbers. If T is a linear operator on a finite-dimensional real inner-product space V and V has an orthonormal basis U = {u1 , ..., uN } consisting of eigenvectors of T , then we have

T un = λn un = λ̄n un = T ∗ un ,

so, since T and T ∗ agree on each member of the basis, they agree everywhere; therefore T = T ∗ and T is self-adjoint.
We close with an example of a real 2 by 2 matrix A with AT A = AAT , but with no eigenvectors in R2 . Take 0 < θ < π and A to be the matrix

A = [ cos θ   − sin θ ]
    [ sin θ     cos θ ] .  (13.16)

This matrix represents rotation through an angle of θ in R2 . Its transpose represents rotation through the angle −θ. These operations obviously can be done in either order, so the matrix A is normal. But there is no non-zero vector in R2 that is an eigenvector. Clearly, A is not symmetric.
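
A final numerical look at this example (our own sketch): the rotation matrix commutes with its transpose, and its eigenvalues e^{iθ} and e^{−iθ} are not real, so its eigenvectors live in C2 rather than R2.

    import numpy as np

    theta = np.pi / 3
    A = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    print(np.allclose(A.T @ A, A @ A.T))    # True: A is normal
    eigvals = np.linalg.eigvals(A)
    expected = np.array([np.exp(-1j * theta), np.exp(1j * theta)])
    print(np.allclose(np.sort_complex(eigvals), np.sort_complex(expected)))  # True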
Part II

Readings for Applied


Mathematics II

Chapter 14

Vectors (Chapter 5,6)

14.1 Real N -dimensional Space


A real N -dimensional row vector is a list x = (x1 , x2 , ..., xN ), where each xn is a real number. In the context of matrix multiplication, we find it convenient to view x as a column vector, that is, as an N by 1 matrix; generally, though, we shall view x as a row vector. We denote by RN the set of all such x.

14.2 Two Roles for Members of RN


Members of RN play two different roles: they can be points in N -dimensional
space, or they can be directed line segments in N -dimensional space. Con-
sider the case of R2 . The graph of the linear equation

3x1 + 2x2 = 6 (14.1)

is a straight line in the plane. A vector x = (x1 , x2 ) is said to be on this


graph if Equation (14.1) holds. For example, x = (2, 0) is on the graph,
as is y = (0, 3); now both x and y are viewed as points in the plane. The
vector a = (3, 2), viewed as a directed line segment, is perpendicular to the
graph; it is orthogonal to the directed line segment b = x − y = (2, −3)
running from y to x that lies along the graph. To see this, note that the
dot product a · b = 0. There is no way to tell from the symbols we use
which role a member of RN is playing at any given moment; we just have
to figure it out from the context.


14.3 Vector Algebra and Geometry


There are several forms of multiplication associated with vectors in RN .
The simplest is multiplication of a vector by a scalar. By scalar we mean
a real (or sometimes a complex) number. When we multiply the vector
x = (2, −3, 6, 1) by the scalar 4 we get the vector
4x = (8, −12, 24, 4).
The length of a vector x in RN is

|x| = √(x1^2 + x2^2 + ... + xN^2). (14.2)

The dot product x·y of two vectors x = (x1 , x2 , ..., xN ) and y = (y1 , y2 , ..., yN )
in RN is defined by
x · y = x1 y1 + x2 y2 + ... + xN yN . (14.3)
For the cases of R2 and R3 we can give geometric meaning to the dot product; the length of x is √(x · x) and

x · y = |x| |y| cos(θ),
where θ is the angle between x and y when they are viewed as directed line
segments positioned to have a common beginning point. We see from this
that two vectors are perpendicular (or orthogonal) when their dot product
is zero.
For R3 we also have the cross product x × y, defined by

x × y = (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ). (14.4)
When x and y are viewed as directed line segments with a common be-
ginning point, the cross product is viewed as a third directed line segment
with the same beginning point, perpendicular to both x and y, and having
for its length the area of the parallelogram formed by x and y. Therefore,
if x and y are parallel, there is zero area and the cross product is the zero
vector. Note that
y × x = −x × y. (14.5)
From the relationships
x · (y × z) = y · (z × x) = z · (x × y) (14.6)
we see that
x · (x × y) = y · (x × x) = 0. (14.7)
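
These identities are easy to experiment with in Python/numpy, where np.dot and np.cross implement Equations (14.3) and (14.4) (the vectors below are our own choices):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0])
    y = np.array([4.0, -1.0, 2.0])
    z = np.array([0.0, 1.0, 5.0])

    c = np.cross(x, y)
    print(np.dot(x, c), np.dot(y, c))   # both 0: x cross y is orthogonal to x and y

    # The scalar triple product is invariant under cyclic permutation (14.6).
    print(np.dot(x, np.cross(y, z)))
    print(np.dot(y, np.cross(z, x)))
    print(np.dot(z, np.cross(x, y)))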
The dot product and cross product are relatively new additions to the
mathematical tool box. They grew out of the 19th century study of quater-
nions.

14.4 Complex Numbers


We may think of complex numbers as members of R2 with extra algebra imposed, mainly the operation of multiplying two complex numbers to get a third complex number. A complex number z can be written several ways:

z = (x, y) = x(1, 0) + y(0, 1) = x + yi,

where i is the shorthand for the complex number i = (0, 1). With w = (u, v) = u + vi a second complex number, the product zw is

zw = (x + yi)(u + vi) = xu + xvi + yui + yvi^2 = (xu − yv, xv + yu), (14.8)

which we obtain by defining i^2 = (0, 1)(0, 1) = (−1, 0) = −1. The idea of allowing −1 to have a square root was used as a trick in the middle ages to solve certain polynomial equations, and was given a solid mathematical foundation in the early part of the 19th century, with the development of the theory of complex-valued functions of a complex variable (complex analysis).
Complex analysis led to amazing new theorems and mathematical tools,
but was limited to two dimensions. As complex analysis was developing,
the theory of electromagnetism (EM) was beginning to take shape. The
EM theory dealt with the physics of three-dimensional space, while complex
analysis dealt only with two-dimensional space. What was needed was a
three-dimensional version of complex analysis.

14.5 Quaternions
It seemed logical that a three-dimensional version of complex analysis would
involve objects of the form
a + bi + cj,
where a, b, and c are real numbers, (1, 0, 0) = 1, i = (0, 1, 0) and j =
(0, 0, 1), and i^2 = j^2 = −1 now. Multiplying a + bi + cj by d + ei + f j led to the question: what are ij and ji? The Irish mathematician Hamilton
eventually hit on the answer, but it forced the search to move from three-
dimensional space to four-dimensional space.
Hamilton discovered that it was necessary to consider objects of the
form a+bi+cj +dk, where 1 = (1, 0, 0, 0), i = (0, 1, 0, 0), j = (0, 0, 1, 0), and
k = (0, 0, 0, 1), and ij = k = −ji. With the other rules i2 = j 2 = k 2 = −1,
jk = i = −kj, and ki = j = −ik, we get what are called the quaternions.
For a while in the latter half of the 19th century it was thought that
quaternions would be the main tool for studying EM theory, but that was
not what happened.

Let x = a + bi + cj + dk = (a, A), where A = (b, c, d) is viewed as a vector in R3 , and y = e + f i + gj + hk = (e, B), where B = (f, g, h) is another member of R3 . When we multiply the quaternion x by the quaternion y to get xy, we find that xy ≠ yx in general, and

xy = (ae − A · B, aB + eA + A × B). (14.9)

This tells us that quaternion multiplication employs all four of the notions
of multiplication that we have encountered previously: ordinary scalar mul-
tiplication, multiplication of a vector by a scalar, the dot product, and the
cross product. It didn’t take people long to realize that it isn’t necessary to
use quaternion multiplication all the time; just use the dot product when
you need it, and the cross product when you need it. Quaternions were
demoted to exercises in abstract algebra texts, while the notions of dot
product and cross product became essential tools in vector calculus and
EM theory.
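
Equation (14.9) translates directly into code. Here is a minimal sketch (our own representation of a quaternion as a scalar part a and a vector part A) that also confirms that xy and yx differ by twice the cross-product term:

    import numpy as np

    def qmul(x, y):
        # Multiply quaternions x = (a, A) and y = (e, B), per Equation (14.9).
        a, A = x
        e, B = y
        return (a * e - np.dot(A, B), a * B + e * A + np.cross(A, B))

    x = (1.0, np.array([2.0, -1.0, 0.5]))
    y = (0.5, np.array([1.0, 3.0, -2.0]))

    s1, V1 = qmul(x, y)
    s2, V2 = qmul(y, x)
    print(np.isclose(s1, s2))                               # scalar parts agree
    print(np.allclose(V1 - V2, 2 * np.cross(x[1], y[1])))   # they differ by 2 A x B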
Chapter 15

A Brief History of
Electromagnetism
(Chapter 5,6)

15.1 Who Knew?


Understanding the connections between magnetism and electricity and ex-
ploiting that understanding for technological innovation dominated science
in the nineteenth century, and yet no one saw it coming. In the index
to Butterfield’s classic history of the scientific revolution [9], which he lo-
cates roughly from 1300 to 1800, the word “electricity” does not appear.
Nobody in 1800 could have imagined that, within a hundred years or so,
people would live in cities illuminated by electric light, work with machinery
driven by electricity, in factories cooled by electric-powered refrigeration,
and go home to listen to a radio and talk to neighbors on a telephone. How
we got there is the subject of this essay.
These days, we tend to value science for helping us to predict things
like hurricanes, and for providing new technology. The scientific activity
we shall encounter in this chapter was not a quest for expanded powers
and new devices, but a search for understanding; the expanded powers and
new devices came later. The truly fundamental advances do not come from
focusing on immediate applications, and, anyway, it is difficult to anticipate
what applications will become important in the future. Nobody in 1960
thought that people would want a computer in their living room, just as
nobody in 1990 wanted a telephone that took pictures.
Electricity, as we now call it, was not completely unknown, of course.
In the late sixteenth century, Gilbert, famous for his studies of magnetism,
discovered that certain materials, mainly crystals, could be made attractive


by rubbing them with a cloth. He called these materials electrics. Among


Gilbert’s accomplishments was his overturning of the conventional wisdom
about magnets, when he showed, experimentally, that magnets could still
attract nails after being rubbed with garlic. Sometime after Gilbert, elec-
trostatic repulsion and induction were discovered, making the analogy with
magnetism obvious. However, until some way was found to study electric-
ity in the laboratory, the mysteries of electricity would remain hidden and
its importance unappreciated.

15.2 “What’s Past is Prologue”


The history of science is important not simply for its own sake, but as a
bridge connecting the arts with the sciences. When we study the history
of science, we begin to see science as an integral part of the broader quest
by human beings to understand themselves and their world. Progress in
science comes not only from finding answers to questions, but from learn-
ing to ask better questions. The questions we are able to ask, indeed the
observations we are able to make, are conditioned by our society, our his-
tory, and our intellectual outlook. Science does not exist in a vacuum. As
Shakespeare’s line, carved into the wall of the National Archives building
in Washington, D.C., suggests, the past sets the stage for what comes next,
indeed, for what can come next.

15.3 Are We There Yet?


We should be careful when we talk about progress, either within science
or more generally. Reasonable people can argue about whether or not
the development of atomic weapons ought to be called progress. Einstein
and others warned, at the beginning of the atomic age, that the emotional
and psychological development of human beings had not kept pace with
technological development, that we did not have the capacity to control
our technology. It does seem that we have a difficult time concerning
ourselves, as a society, with problems that will become more serious in
the future, preferring instead the motto “I won’t be there. You won’t be
there.”
We can certainly agree, though, that science, overall, has led us to a
better, even if not complete, understanding of ourselves and our world and
to the technology that is capable of providing decent life and health to far
more people than in the past. These successes have given science and scien-
tists a certain amount of political power that is not universally welcomed,
however. Recent attempts to challenge the status of science within the
community, most notably in the debate over creation “science” and evo-
lution, have really been attempts to lessen the political power of science,

not debates within science itself; the decades long attacks on science by
the cigarette industry and efforts to weaken the EPA show clearly that it is
not only some religious groups that want the political influence of science
diminished.
Many of the issues our society will have to deal with in the near future,
including nuclear power, terrorism, genetic engineering, energy, climate
change, control of technology, space travel, and so on, involve science and
demand a more sophisticated understanding of science on the part of the
general public. The recent book Physics for Future Presidents: the Science
Behind the Headlines [36] discusses many of these topics, supposedly as an
attempt by the author to educate presidents-to-be, who will be called on
to make decisions, to initiate legislation, and to guide the public debate
concerning these issues.
History reminds us that progress need not be permanent. The tech-
nological expertise and artistic heights achieved by the Romans, even the
mathematical sophistication of Archimedes, were essentially lost, at least
in the west, for fifteen hundred years.
History also teaches us how unpredictable the future can be, which is, in
fact, the underlying theme of this essay. No one in 1800 could have imagined
the electrification that transformed society over the nineteenth century, just
as no one in 1900 could have imagined Hiroshima and Nagasaki, only a few
decades away, let alone the world of today.

15.4 Why Do Things Move?


In his famous “The Origins of Modern Science” [9] Butterfield singles out
the problem of motion as the most significant intellectual hurdle the human
mind has confronted and overcome in the last fifteen hundred years. The
ancients had theories of motion, but for Aristotle, as a scientist perhaps
more of a biologist than a physicist, motion as change in location was
insignificant compared to motion as qualitative change, as, say, when an
acorn grows into a tree. The change experienced by the acorn is clearly
oriented toward a goal, to make a tree. By focusing on qualitative change,
Aristotle placed too much emphasis on the importance of a goal. His idea
that even physical motion was change toward a goal, that objects had
a “natural” place to which they “sought” to return, infected science for
almost two thousand years.
We must not be too quick to dismiss Aristotle’s view, however. General
relativity asserts that space-time is curved and that clocks slow down where
gravity is stronger. Indeed, a clock on the top of the Empire State Building
runs slightly faster than one at street level. As Brian Greene puts it,
Right now, according to these ideas, you are anchored to the floor be-
cause your body is trying to slide down an indentation in space (really,

spacetime) caused by the earth. In a sense, all objects “want” to age as


slowly as possible [23].
The one instance of motion as change in location whose importance
the ancients appreciated was the motion of the heavens. Aristotle (384-
322 B.C.) taught the geocentric theory that the heavens move around the
earth. Aristarchus of Samos (310-230 B.C.) had a different view; according
to Heath [25], “There is not the slightest doubt that Aristarchus was the
first to put forward the heliocentric hypothesis.” This probably explains
why contemporaries felt that Aristarchus should be indicted for impiety.
Ptolemy (100-170 A.D.) based his astronomical system of an earth-centered
universe on the theories of Aristotle. Because the objects in the heavens,
the moon, the planets and the stars, certainly appear to move rapidly, they
must be made of an unearthly material, the quintessence.
The recent film “Agora” portrays the Alexandrian mathematician and
philosopher Hypatia (350-415 A.D.) as an early version of Copernicus, but
this is probably anachronistic. Her death at the hands of a Christian mob
seems to have had more to do with rivalries among Christian leaders than
with her scientific views and her belief in the heliocentric theory.
So things stood until the middle ages. In the fourteenth century the
French theologian Nicole Oresme considered the possibility that the earth
rotated daily around its own axis [34]. This hypothesis certainly simplified
things considerably, and removed the need for the heavens to spin around
the earth daily at enormous speeds. But even Oresme himself was hesitant
to push this idea, since it conflicted with scripture.
Gradually, natural philosophers, the term used to describe scientists
prior to the nineteenth century, began to take a more serious interest in
motion as change in location, due, in part, to their growing interest in
military matters and the trajectory of cannon balls. Now, motion on earth
and motion of the heavenly bodies came to be studied by some of the same
people, such as Galileo, and this set the stage for the unified theory of
motion due to gravity that would come later, with Newton.
Copernicus’ theory of a sun-centered astronomical system, Tycho Brahe’s
naked-eye observations of the heavens, Kepler’s systematizing of planetary
motion, the invention of the telescope and its use by Galileo to observe
the pock-marked moon and the mini-planetary system of Jupiter, Galileo’s
study of balls rolling down inclined planes, and finally Newton’s Law of Uni-
versal Gravitation marked a century of tremendous progress in the study
of motion and put mechanics at the top of the list of scientific paradigms
for the next century. Many of the theoretical developments of the eigh-
teenth century involved the expansion of Newton’s mechanics to ever more
complex systems, so that, by the end of that century, celestial mechanics
and potential theory were well developed mathematical subjects.
As we shall see, the early development of the field we now call elec-
tromagnetism involved little mathematics. As the subject evolved, the

mathematics of potential theory, borrowed from the study of gravitation


and celestial mechanics, was combined with the newly discovered vector
calculus and the mathematical treatment of heat propagation to give the
theoretical formulation of electromagnetism familiar to us today.

15.5 Go Fly a Kite!


The ancients knew about magnets and used them as compasses. Static
electricity was easily observed and thought to be similar to magnetism. As
had been known for centuries, static electricity exhibited both attraction
and repulsion. For that reason, it was argued that there were two distinct
types of electricity. Benjamin Franklin opposed this idea, insisting instead
on two types of charge, positive and negative. Some progress was made
in capturing electricity for study with the invention of the Leyden jar, a
device for storing relatively large electrostatic charge (and giving rather
large shocks). The discharge from the Leyden jar reminded Franklin of
lightning and prompted him and others to fly kites in thunderstorms and to
discover that lightning would charge a Leyden jar; lightning was electricity.
These experiments led to his invention of the lightning rod, a conducting
device attached to houses to direct lightning strikes down to the ground.
The obvious analogies with magnetism had been noticed by Gilbert and
others in the late sixteenth century, and near the end of the eighteenth cen-
tury Coulomb found that both magnetic and electrical attraction fell off as
the square of the distance, as did gravity, according to Newton. Indeed, the
physical connection between magnetism and gravity seemed more plausi-
ble than one between magnetism and electricity, and more worth studying.
But things were about to change.

15.6 Bring in the Frogs!


In 1791 Galvani observed that a twitching of the muscles of a dead frog
he was dissecting seemed to be caused by sparks from a nearby discharge
of a Leyden jar. He noticed that the sparks need not actually touch the
muscles, provided a metal scalpel touched the muscles at the time of dis-
charge. He also saw twitching muscles when the frog was suspended by
brass hooks on an iron railing in a thunderstorm. Eventually, he real-
ized that the Leyden jar and thunderstorm played no essential roles; two
scalpels of different metals touching the muscles were sufficient to produce
the twitching. Galvani concluded that the electricity was in the muscles;
it was animal electricity.
Believing that the electricity could be within the animals is not as far-
fetched as it may sound. It was known at the time that there were certain
“electric” fish that generated their own electricity and used it to attack

their prey. When these animals were dissected, it was noticed that there
were unusual structures within their bodies that other fish did not have.
Later, it became clear that these structures were essentially batteries.

15.7 Lose the Frogs!


In 1800 Volta discovered that electricity could be produced by two dissim-
ilar metals, copper and zinc, say, in salt water; no animal electricity here,
and no further need for the frogs. He had discovered the battery and in-
troduced electrodynamics. His primitive batteries, eventually called voltaic
piles, closely resembled the electricity-producing structures found within
the bodies of “electric” fish. Only six weeks after Volta’s initial report,
Nicholson and Carlisle discovered electrolysis, the loosening up and sep-
arating of distinct atoms in molecules, such as the hydrogen and oxygen
atoms in water.
The fact that chemical reactions produced electric currents suggested
the reverse, that electrical currents could stimulate chemical reactions; this
is electrochemistry, which led to the discovery and isolation of many new
elements in the decades that followed. In 1807 Humphry Davy isolated
some active metals from their liquid compounds and became the first to
isolate sodium, potassium, calcium, strontium, barium, and magnesium.
In 1821 Seebeck found that a circuit made of two dissimilar metals would carry a current as long as the two junctions were kept at different temperatures; this is thermoelectricity, and it provides the basis for the thermocouple, which could then be used as a thermometer.

15.8 It’s a Magnet!


In 1819 Oersted placed a current-carrying wire over a compass, not expect-
ing anything in particular to happen. The needle turned violently perpen-
dicular to the axis of the wire. When Oersted reversed the direction of the
current, the needle jerked around 180 degrees. This meant that magnetism
and electricity were not just analogous, but intimately related; electromag-
netism was born. Soon after, Arago demonstrated that a wire carrying
an electric current behaved like a magnet. Ampere, in 1820, confirmed
that a wire carrying a current was a magnet by demonstrating attraction
and repulsion between two separate current-carrying wires. He also exper-
imented with wires in various configurations and related the strength of
the magnetic force to the strength of the current in the wire. This con-
nection between electric current and magnetism led fairly soon after to the
telegraph, and later in the century, to the telephone.

15.9 A New World


Electric currents produce magnetism. But can magnets produce electric
currents? Can the relationship be reversed? In 1831, Michael Faraday
tried to see if a current would be produced in a wire if it was placed in a
magnetic field created by another current-carrying wire. The experiment
failed, sort of. When the current was turned on in the second wire, gener-
ating the magnetic field, the first wire experienced a brief current, but then
nothing; when the current was turned off, again a brief current in the first
wire. Faraday, an experimental genius who, as a young man, had been an
assistant to Davy, and later the inventor of the refrigerator, made the right
conjecture that it is not the mere presence of the magnetic field that causes
a current, but changes in that magnetic field. He confirmed this conjec-
ture by showing that a current would flow through a coiled wire when a
magnetized rod was moved in and out of the coil; he (and, independently,
Henry in the United States) had discovered electromagnetic induction, invented the electric generator, and, like Columbus, discovered a new world.

15.10 Do The Math!


Mathematics has yet to appear in our brief history of electromagnetism,
but that was about to change. Although Faraday, often described as being
innocent of mathematics, developed his concept of lines of force in what
we would view as an unsophisticated manner, he was a great scientist and
his intuition would prove to be remarkably accurate.
In the summer of 1831, the same summer in which the forty-year old
Faraday first observed the phenomenon of electromagnetic induction, the
creation of an electric current by a changing magnetic field, James Clerk
Maxwell was born in Edinburgh, Scotland.
Maxwell’s first paper on electromagnetism, “On Faraday’s Lines of
Force” , appeared in 1855, when he was about 25 years old. The paper
involved a mathematical development of the results of Faraday and others
and established the mathematical methods Maxwell would use later in his
more famous work “On Physical Lines of Force” .
Although Maxwell did not have available all of the compact vector no-
tation we have today, his work was mathematically difficult. The following
is an excerpt from a letter Faraday himself sent to Maxwell concerning this
point.
There is one thing I would be glad to ask you. When a mathemati-
cian engaged in investigating physical actions and results has arrived at
his conclusions, may they not be expressed in common language as fully,
clearly and definitely as in mathematical formulae? If so, would it not be
a great boon to such as I to express them so? - translating them out of

their hieroglyphics, that we may work upon them by experiment.

Hasn't every beginning student of vector calculus and electromagnetism wished that Maxwell and his followers had heeded Faraday's pleas?
As Zajonc relates in [49], reading Faraday, Maxwell was surprised to
find a kindred soul, someone who thought mathematically, although he
expressed himself in pictures. Maxwell felt that Faraday’s use of “lines of
force” to coordinate the phenomena of electromagnetism showed him to
be “a mathematician of a very high order” .
Maxwell reasoned that, since an electric current sets up a magnetic field,
and a changing magnetic field creates an electrical field, there should be
what we now call electromagnetic waves, as these two types of fields leap-
frog across (empty?) space. These waves would obey partial differential
equations, called Maxwell’s equations, although their familiar form came
later and is due to Heaviside [20]. Analyzing the mathematical properties
of the resulting wave equations, Maxwell discovered that the propagation
speed of these waves was the same as that of light, leading to the conclusion
that light itself is an electromagnetic phenomenon, distinguished from other
electromagnetic radiation only by its frequency. That light also exhibits
behavior more particle-like than wave-like is part of the story of the science
of the 20th century.
Maxwell predicted that electromagnetic radiation could exist at vari-
ous frequencies, not only those associated with visible light. Infrared and
ultraviolet radiation had been known since early in the century, and per-
haps they too were part of a spectrum of electromagnetic radiation. After
Maxwell’s death from cancer at forty-eight, Hertz demonstrated, in 1888,
the possibility of electromagnetic radiation at very low frequencies, radio
waves. In 1895 Röntgen discovered electromagnetic waves at the high-
frequency end of the spectrum, the so-called x-rays.

15.11 Just Dot the i’s and Cross the t’s?


By the end of the nineteenth century, some scientists felt that all that was
left to do in physics was to dot the i’s and cross the t’s. However, others
saw paradoxes and worried that there were problems yet to be solved; how
serious these might turn out to be was not always clear.
Maxwell himself had noted, about 1869, that his work on the specific
heats of gases revealed conflicts between rigorous theory and experimental
findings that he was unable to explain; it seemed that internal vibration of
atoms was being "frozen out" at sufficiently low temperatures, something for
which classical physics could not account. His was probably the first sugges-
tion that classical physics could be “wrong”. There were also the mysteries,
observed by Newton, associated with the partial reflection of light by thick
glass. Advances in geology and biology had suggested strongly that the

earth and the sun were much older than previously thought, which was not
possible, according to the physics of the day; unless a new form of energy
was operating, the sun would have burned out a long time ago.
Newton thought that light was a stream of particles. Others at the
time, notably Robert Hooke and Christiaan Huygens, felt that light was a
wave phenomenon. Both sides were hindered by a lack of a proper scien-
tific vocabulary to express their views. Around 1800 Young demonstrated
that a beam of light displayed interference effects similar to water waves.
Eventually, his work convinced people that Newton had been wrong on
this point and most accepted that light is a wave phenomenon. Faraday,
Maxwell, Hertz and others further developed the wave theory of light and
related light to other forms of electromagnetic radiation.
In 1887 Hertz discovered the photo-electric effect, later offered by Ein-
stein as confirming evidence that light has a particle nature. When light
strikes a metal, it can cause the metal to release an electrically charged par-
ticle, an electron. If light were simply a wave, there would not be enough
energy in the small part of the wave that hits the metal to displace the elec-
tron; in 1905 Einstein would argue that light is quantized, that is, it consists
of individual bundles or particles, later called photons, each with enough
energy to cause the electron to be released.
It was recognized that there were other problems with the wave theory
of light. All known waves required a medium in which to propagate. Sound
cannot propagate in a vacuum; it needs air or water or something. The
sound waves are actually compressions and rarefactions of the medium,
and how fast the waves propagate depends on how fast the material in the
medium can perform these movements; sound travels faster in water than
in air, for example.
Light travels extremely fast, but does not propagate instantaneously,
as Olaus Roemer first demonstrated around 1700. He observed that the
eclipses of the moons of Jupiter appeared to happen sooner when Jupiter
was moving closer to Earth, and later when it was moving away. He rea-
soned, correctly, that the light takes a finite amount of time to travel from
the moons to Earth, and when Jupiter is moving away the distance is
growing longer.
If light travels through a medium, which scientists called the ether, then
the ether must be a very strange substance indeed. The material that makes
up the ether must be able to compress and expand very quickly. Light
comes to us from great distances so the ether must extend throughout all of
space. The earth moves around the sun, and therefore through this ether, at
a great speed, and yet there are no friction effects, while very much slower
winds produce a great deal of weathering. Light can also be polarized,
so the medium must be capable of supporting transverse waves, not just
longitudinal waves, as in acoustics. To top it all off, the Michelson-Morley
experiment, performed in Cleveland in 1887, failed to detect the presence

of the ether. The notion that there is a physical medium that supports
the propagation of light would not go away, however. Late in his long life
Lord Kelvin (William Thomson) wrote “One word characterizes the most
strenuous efforts ... that I have made perseveringly during fifty-five years:
that word is FAILURE.” Thomson refused to give up his efforts to combine
the mathematics of electromagnetism with the mechanical picture of the
world.

15.12 Seeing is Believing


If radio waves can travel through an invisible ether, and if hypnotists can
mesmerize their subjects, why can’t human beings communicate telepath-
ically with each other and with the dead? Why should atoms exist when
we cannot see them, while ghosts must not, even when, as some claimed,
they have shown up in photographs? When is seeing believing?
In the late 1800’s the experimental physicist William Crooke claimed
to have discovered radiant matter [14]. When he passed an electric current
through a glass tube filled with a low-pressure gas, a small object within
the tube could be made to move from one end to the other, driven, so
Crookes claimed, by radiant particles of matter, later called cathode rays,
streaming from one end of the tube to the other. Crookes then went on,
without much success, to find material explanation for some of the alleged
effects of spiritualism. He felt that it ought to be possible for humans to
receive transmissions in much the same way as a radio receives signals. It
was a time of considerable uncertainty, and it was not clear that Crookes's
radiant matter, atoms, x-rays, radio waves, radioactivity, and the ether
were any more real than ghosts, table tapping, and communicating with
the dead; they all called into question established physics.
Crookes felt that scientists had a calling to investigate all these myster-
ies, and should avoid preconceptions about what was true or false. Others
accused him of betraying his scientific calling and of being duped by spiritu-
alists. Perhaps remembering that even the word "scientist" was unknown prior to the 1830's, they knew, nevertheless, that, if the history of the nineteenth century had taught them anything, it was that serious problems lay on the horizon of which they were as yet completely unaware.

15.13 If You Can Spray Them, They Exist


Up through the seventeenth century, philosophy, especially the works of
Aristotle, had colored the way scientists looked at the physical world. By
the end of the nineteenth century, most scientists would have agreed that
philosophy had been banished from science, that statements that could not
be empirically verified, that is, metaphysics, had no place in science. But
15.14. WHAT’S GOING ON HERE? 135

philosophy began to sneak back in, as questions about causality and the
existence of objects we cannot see, such as atoms, started to be asked [1].
Most scientists are probably realists, believing that the objects they study
have an existence independent of the instruments used to probe them. On
the other side of the debate, positivists, or, at least, the more extreme
positivists, hold that we have no way of observing an observer-independent
reality, and therefore cannot verify that there is such a reality. Positivists
hold that scientific theories are simply instruments used to hold together
observed facts and make predictions. They do accept that the theories
describe an empirical reality that is the same for all observers, but not a
reality independent of observation. At first, scientists felt that it was safe
for them to carry on without worrying too much about these philosophical
points, but quantum theory would change things [26].
The idea that matter is composed of very small indivisible atoms goes
back to the ancient Greek thinkers Democritus and Epicurus. The phi-
losophy of Epicurus was popularized during Roman times by Lucretius,
in his lengthy poem De Rerum Natura (“On the Nature of Things” ), but
this work was lost to history for almost a thousand years. The discovery,
in 1417, of a medieval copy of the poem changed the course of history,
according to the author Stephen Greenblatt [22]. Copies of the poem be-
came widely distributed throughout Europe and eventually influenced the
thinking of Galileo, Freud, Darwin, Einstein, Thomas Jefferson, and many
others. But it wasn’t until after Einstein’s 1905 paper on Brownian mo-
tion and subsequent experimental confirmations of his predictions that the
actual existence of atoms was more or less universally accepted.
I recall reading somewhere about a conversation between a philosopher
of science and an experimental physicist, in which the physicist was ex-
plaining how he sprayed an object with positrons. The philosopher then
asked him if he really believed that positrons exist. The physicist answered,
“If you can spray them, they exist.”

15.14 What’s Going On Here?


Experiments with cathode rays revealed that they were deflected by mag-
nets, unlike any form of radiation similar to light, and unresponsive to
gravity. Maybe they were very small electrically charged particles. In
1897 J.J. Thomson established that the cathode rays were, indeed, elec-
trically charged particles, which he called electrons. For this discovery he
was awarded the Nobel Prize in Physics in 1906. Perhaps there were two
fundamental objects in nature, the atoms of materials and the electrons.
However, Volta’s experiments suggested the electrons were within the ma-
terials and involved in chemical reactions. In 1899 Thomson investigated
the photo-electric effect and found that cathode rays could be produced

by shining light on certain metals; the photo-electric effect revealed that


electrons were inside the materials. Were they between the atoms, or in-
side the atoms? If they were within the atoms, perhaps their number and
configuration could help explain Mendeleev’s periodic table and the variety
of elements found in nature.
In 1912, Max von Laue demonstrated that Röntgen’s x-ray beams can
be diffracted; this provided a powerful tool for determining the structure
of crystals and molecules and later played an important role in the dis-
covery of the double-helix structure of DNA. In 1923, the French physicist
Louis de Broglie suggested that moving particles, such as electrons, should
exhibit wave-like properties characterized by a wave-length. In particular,
he suggested that beams of electrons sent through a narrow aperture could
be diffracted. In 1937 G.P. Thomson, the son of J.J. Thomson, shared the
Nobel Prize in Physics with Clinton Davisson for their work demonstrating
that beams of electrons can be diffracted. As someone once put it, “The
father won the prize for showing that electrons are particles, and the son
won it for showing that they aren’t.” Some suggested that, since beams of
electrons exhibited wave-like properties, they should give rise to the sort
of interference effects Young had shown were exhibited by beams of light.
The first laboratory experiment showing double-slit interference effects of
beams of electrons was performed in 1989.
J.J. Thomson also discovered that the kinetic energy of the emitted
electrons depended not at all on the intensity of the light, but only on
its frequency. This puzzling aspect of the photo-electric effect prompted
Einstein to consider the possibility that light is quantized, that is, it comes
in small “packages”, or light quanta, later called photons. Einstein proposed
quantization of light energy in his 1905 work on the photo-electric effect.
It was this work, not his theories of special and general relativity, that
eventually won for Einstein the 1921 Nobel Prize in Physics.
Einstein’s 1905 paper that deals with the photo-electric effect is really
a paper about the particle nature of light. But this idea met with great
resistance, and it was made clear to Einstein that his prize was not for the
whole paper, but for that part dealing with the photo-electric effect. He
was even asked not to mention the particle nature of light in his Nobel
speech.
Around 1900 Max Planck had introduced quantization in his derivation
of the energy distribution as a function of frequency in black-body radi-
ation. Scholars have suggested that he did this simply for computational
convenience, and did not intend, at that moment, to abandon classical
physics. Somewhat later Planck and others proposed that the energy might
need to be quantized, in order to explain the absence of what Ehrenfest
called the ultraviolet catastrophe in black-body radiation.
Were the electrons the only sub-atomic particles? No, as Rutherford’s
discovery of the atomic nucleus in 1911 would reveal. And what is radioac-

tivity, anyway? The new century was dawning, and all these questions
were in the air. It was about 1900, Planck had just discovered the quan-
tum theory, Einstein was in the patent office, where he would remain until
1909, Bohr and Schrödinger were schoolboys, and Heisenberg was not yet born. A new
scientific revolution was about to occur, and, as in 1800, nobody could have
guessed what was coming next [35].

15.15 The Year of the Golden Eggs


As Rigden relates in [39], toward the end of his life Einstein looked back
to 1905, when he was twenty-six, and told Leo Szilard, “They were the
happiest years of my life. Nobody expected me to lay golden eggs.” It
is appropriate to end our story in 1905 because it was both an end and
a beginning. In five great papers published in that year, Einstein solved
several of the major outstanding problems that had worried physicists for
years, but the way he answered them was revolutionary and began a whole
new era of physics. After 1905 the development of electromagnetism merges
with that of quantum mechanics, and becomes too big a story to relate here.
The problems that attracted Einstein involved apparent contradictions,
and his answers were surprising. Is matter continuous or discrete? It is
discrete; atoms do exist. Is light wave-like or particle-like? It is both. Are
the laws of thermodynamics absolute or statistical? They are statistical.
Are the laws of physics the same for observers moving with uniform velocity
relative to one another? Yes; in particular, each will measure the speed of
light to be the same. And, by the way, our notion of three-dimensional
space and a separate dimension of time is wrong (special relativity), and
gravity and acceleration are really the same thing (general relativity). Is
inertial mass the same as gravitational mass? Yes. And what is mass,
anyway? It is really energy, as E = mc2 tells us.

15.16 Do Individuals Matter?


Our brief history of electromagnetism has focused on a handful of extraor-
dinary people. But how important are individuals in the development of
science, or in the course of history generally? An ongoing debate among
those who study history is over the role of the Great Man [13]. On one side
of the debate is the British writer and hero-worshipper Carlyle: “Universal
history, the history of what man has accomplished in this world, is at bot-
tom the History of the Great Men who have worked here.” On the other
side is the German political leader Bismarck: “The statesman’s task is to
hear God’s footsteps marching through history, and to try to catch on to
His coattails as He marches past.”

If Mozart had never lived, nobody else would have composed his music.
If Picasso had never lived, nobody else would have painted his pictures.
If Winston Churchill had never lived, or had he died of his injuries when,
in 1931, he was hit by a car on Fifth Avenue in New York City, western
Europe would probably be different today. If Hitler had died in 1930, when
the car he was riding in was hit by a truck, recent history would certainly
be different, in ways hard for us to imagine. But, I think the jury is still
out on this debate, at least as it applies to science.

I recently came across the following, which I think makes this point
well. Suppose that you were forced to decide which one of these four things
to “consign to oblivion,” that is, to make it never have happened: Mozart’s
opera Don Giovanni, Chaucer’s Canterbury Tales, Newton’s Principia, or
Eiffel’s tower. Which one would you choose? The answer has to be New-
ton’s Principia; it is the only one of the four that is not irreplaceable.

If Newton had never lived, we would still have Leibniz’s calculus. New-
ton’s Law of Universal Gravitation would have been discovered by someone
else. If Faraday had never lived, we would still have Henry’s discovery of
electromagnetic induction. If Darwin had never lived, someone else would
have published roughly the same ideas, at about the same time; in fact,
Alfred Russel Wallace did just that. If Einstein had not lived, somebody
else, maybe Poincaré, would have hit on roughly the same ideas, perhaps a
bit later. Relativity would have been discovered by someone else. The fact
that light behaves both like a wave and like a particle would have become
apparent to someone else. The fact that atoms do really exist would have
been demonstrated by someone else, although perhaps in a different way.

Nevertheless, just as Mozart’s work is unique, even though it was ob-


viously influenced by the times in which he composed and is clearly in the
style of the late 18th century, Darwin’s view of what he was doing differed
somewhat from the view taken by Wallace, and Einstein’s work reflected
his own fascination with apparent contradiction and a remarkable ability,
“to think outside the box,” as the currently popular expression has it.
Each of the people we have encountered in this brief history made a unique
contribution, even though, had they not lived, others would probably have
made their discoveries, one way or another.

People matter in another way, as well. Science is the work of individual


people just as art, music and politics are. The book of nature, as some
call it, is not easily read. Science is a human activity. Scientists are often
mistaken and blind to what their training and culture prevent them from
seeing. The history of the development of science is, like all history, our
own story.

15.17 What’s Next?


The twentieth century has taught us that all natural phenomena are based
on two physical principles, quantum mechanics and relativity. The combi-
nation of special relativity and quantum mechanics led to a unification of
three of the four fundamental forces of nature, electromagnetic force and
the weak and strong nuclear forces, originally thought to be unrelated. The
remaining quest is to combine quantum mechanics with general relativity,
which describes gravity. Such a unification seems necessary if one is to
solve the mysteries posed by dark matter and dark energy [6], which make
up most of the stuff of the universe, but of which nothing is known and
whose existence can only be inferred from their gravitational effects. Per-
haps what will be needed is a paradigm shift, to use Kuhn’s popular phrase;
perhaps the notion of a fundamental particle, or even of an observer, will
need to be abandoned.
The June 2010 issue of Scientific American contains an article called
“Twelve events that will change everything”. The article identifies twelve
events, both natural and man-made, that could happen at any time and
would transform society. It also rates the events in terms of how likely they
are to occur: fusion energy (very unlikely); extraterrestrial intelligence,
nuclear exchange, and asteroid collision (unlikely); deadly pandemic, room-
temperature superconductors, and extra dimensions (50-50); cloning of a
human, machine self-awareness, and polar meltdown (likely); and creation
of life, and Pacific earthquake (almost certain). Our brief study of the
history of electromagnetism should convince us that the event that will
really change everything is not on this list nor on anyone else’s list. As
Brian Greene suggests [23], people in the year 2100 may look back on today
as the time when the first primitive notions of parallel universes began to
take shape.

15.18 Unreasonable Effectiveness


As Butterfield points out in [9], science became modern in the period 1300
to 1800 not when experiment and observation replaced adherence to the
authority of ancient philosophers, but when the experimentation was per-
formed under the control of mathematics. New mathematical tools, loga-
rithms, algebra, analytic geometry, and calculus, certainly played an impor-
tant role, but so did mathematical thinking, measuring quantities, rather
than speculating about qualities, idealizing and abstracting from a phys-
ical situation, and the like. Astronomy and mechanics were the first to
benefit from this new approach. Paradoxically, our understanding of elec-
tromagnetism rests largely on a century or more of intuition, conjecture,
experimentation and invention that was almost completely free of math-

ematics. To a degree, this was because the objects of interest, magnets


and electricity, were close at hand and, increasingly, available for study.
In contrast, Newton’s synthesis of terrestrial and celestial gravitation was
necessarily largely a mathematical achievement; observational data was
available, but experimentation was not possible.
With Maxwell and the mathematicians, electromagnetism became a
modern science. Now electromagnetism could be studied with a pencil and
paper, as well as with generators. Consequences of the equations could be
tested in the laboratory and used to advance technology. The incomplete-
ness of the theory, with regard to the ether, the arrow of time, the finite
speed of light, also served to motivate further theoretical and experimental
investigation.
As electromagnetism, in particular, and physics, generally, became more
mathematical, studies of the very small (nuclear physics), the very large
(the universe), and the very long ago (cosmology) became possible. The
search for unifying theories of everything became mathematical studies, the
consequences of the theories largely beyond observation [43].
One of the great mysteries of science is what the physicist Eugene
Wigner called “the unreasonable effectiveness of mathematics”. Maxwell’s
mathematics suggested to him that visible light was an electromagnetic
phenomenon, occupying only a small part of an electromagnetic spectrum,
and to Hertz that there might be radio waves. Dirac’s mathematics sug-
gested to him the existence of anti-matter, positrons with the mass of an
electron, but with a positive charge, and with the bizarre property that,
when a positron hits an electron, their masses disappear, leaving only en-
ergy. What was fantastic science fiction in 1930 is commonplace today, as
anyone who has had a positron-emission-tomography (PET) scan is well
aware. Mathematics pointed to the existence of the Higgs boson, recently
discovered at CERN.
In 2000 the mathematical physicist Ed Witten wrote a paper describing
the physics of the century just ending [46]. Even the title is revealing; the
quest is for mathematical understanding. He points out that, as physics
became more mathematical in the first half of the twentieth century, with
relativity and non-relativistic quantum mechanics, it had a broad influence
on mathematics itself. The equations involved were familiar to the math-
ematicians of the day, even if the applications were not, and their use in
physics prompted further mathematical development, and the emergence
of new fields, such as functional analysis. In contrast, the physics of the
second half of the century involves mathematics, principally quantum con-
cepts applied to fields, not just particles, the foundations of which are not
well understood by mathematicians. This is mathematics with which even
the mathematicians are not familiar. Providing a mathematical foundation
for the standard model for particle physics should keep the mathematicians
of the next century busy for a while. The most interesting sentence in [46]

is “The quest to understand string theory may well prove to be a central
theme in physics of the twenty-first century.” Are physicists now just trying
to understand their own mathematics, instead of the physical world?

15.19 Coming Full Circle


As we have seen, prior to Maxwell, electromagnetism was an experimental
science. With the coming of quantum mechanics, it became a mathemat-
ical study. Advances came from equations like Dirac’s, more than from
laboratories.
Within the last couple of decades, however, the circle has begun to close.
As scientists began to use computers to study their equations, strange phe-
nomena began to emerge: sensitive dependence on initial conditions in
the equations used to study the weather; chaotic behavior of sequences
of numbers generated by apparently simple formulas; fractal images ap-
pearing when these simple formulas were displayed graphically. At first,
it was thought that the strange behavior was coming from numerical er-
rors, but soon similar behavior was observed in natural systems. Chaos
theory, complexity and the study of emergent phenomena are the products
of computer-driven experimental mathematics.
Chapter 16

Changing Variables in Multiple Integrals (Chapter 5,6)

16.1 Mean-Value Theorems


In this section we review mean-value theorems for several types of functions.

16.1.1 The Single-Variable Case


The mean-value theorem that we learn in Calculus I can be expressed as
follows:

Theorem 16.1 Let f : R → R be a differentiable real-valued function of a


single real variable. Then for any real numbers a and b there is a third real
number c between a and b such that

∆f(a) = f(b) − f(a) = f′(c)(b − a) = f′(c)∆a. (16.1)

When we take b = a + da, where da is an infinitesimal, we have

df(a) = f′(a) da. (16.2)

16.1.2 The Multi-variate Case


Now consider a differentiable real-valued function of J real variables, F :
RJ → R. There is a mean-value theorem for this case, as well.


Theorem 16.2 For any a and b in RJ there is c on the line segment


between a and b such that

∆F (a) = F (b) − F (a) = ∇F (c) · (b − a) = ∇F (c) · ∆a. (16.3)

Proof: We prove this mean-value theorem using the previous one. Any
point x on the line segment joining a with b has the form

x = a + t(b − a) = (1 − t)a + tb,

for some t in the interval [0, 1]. We then define

f (t) = F (a + t(b − a)). (16.4)

The chain rule tells us that

f′(t) = ∇F(a + t(b − a)) · (b − a). (16.5)

Now we apply Equation (16.1) to get

∆F(a) = f(1) − f(0) = f′(τ)(1 − 0)

= ∇F(a + τ(b − a)) · (b − a) = ∇F(c) · ∆a, (16.6)

where c = a + τ (b − a).
When b − a = da we can write

dF (a) = F (b) − F (a) = ∇F (a) · da. (16.7)

16.1.3 The Vector-Valued Multi-variate Case


Our objective in this chapter is to examine the rules for change of coordi-
nates when we integrate functions defined on RJ . This leads us to consider
functions r : RJ → RJ . We write

r(x) = (r1 (x), ..., rJ (x)). (16.8)

Each of the functions rj is a real-valued function of J real variables, so we


can apply the mean-value theorem of the previous section, using F = rj .
Then we get

drj (a) = ∇rj (a) · da. (16.9)

We extend this to r, writing

dr(a) = (dr1 (a), ..., drJ (a)), (16.10)



so that the vector differential of r at a is

dr(a) = (∇r1 (a), ..., ∇rJ (a)) · da. (16.11)

Writing a = (a1 , ..., aJ ), da = (da1 , ..., daJ ), and

∂r/∂aj (a) = (∂r1/∂aj (a), ..., ∂rJ/∂aj (a)), (16.12)

we have

dr(a) = Σ_{j=1}^{J} ∂r/∂aj (a) daj. (16.13)

16.2 The Vector Differential for Three Dimensions
Let r = (x, y, z) be the vector from the origin in three-dimensional space to
the point (x, y, z) in rectangular coordinates. Suppose that there is another
coordinate system, (u, v, w), such that x = f (u, v, w), y = g(u, v, w) and
z = h(u, v, w). Then, with a = (u, v, w), we write

∂r/∂u (a) = (∂x/∂u (a), ∂y/∂u (a), ∂z/∂u (a)), (16.14)

∂r/∂v (a) = (∂x/∂v (a), ∂y/∂v (a), ∂z/∂v (a)), (16.15)

and

∂r/∂w (a) = (∂x/∂w (a), ∂y/∂w (a), ∂z/∂w (a)). (16.16)

The vector differential dr is then

dr(a) = ∂r/∂u (a) du + ∂r/∂v (a) dv + ∂r/∂w (a) dw, (16.17)
which we obtain by applying the mean value theorem of the previous sec-
tion, viewing each of the functions x(u, v, w), y(u, v, w), and z(u, v, w) as
one of the rj . We view dr as the diagonal of an infinitesimal parallelepiped
with one corner at the point (x, y, z). We want to compute the volume of
this parallelepiped.
The vectors A = (∂r/∂u) du, B = (∂r/∂v) dv and C = (∂r/∂w) dw are then three vectors
forming the sides of the parallelepiped. The volume of the parallelepiped
is then the absolute value of the vector triple product A · (B × C).

The triple product A · (B × C) is the determinant of the three by three


Jacobian matrix
J(x, y, z) =
⎡ ∂x/∂u  ∂x/∂v  ∂x/∂w ⎤
⎢ ∂y/∂u  ∂y/∂v  ∂y/∂w ⎥ , (16.18)
⎣ ∂z/∂u  ∂z/∂v  ∂z/∂w ⎦

multiplied by du dv dw. Therefore the infinitesimal volume element dV is

dV = |det(J)| du dv dw. (16.19)
For example, let us consider spherical coordinates, (ρ, φ, θ).
Now we have
x = f (ρ, φ, θ) = ρ sin φ · cos θ, (16.20)

y = g(ρ, φ, θ) = ρ sin φ · sin θ, (16.21)


and
z = h(ρ, φ, θ) = ρ cos φ. (16.22)
Then the Jacobian matrix is
 
J(x, y, z) =
⎡ sin φ · cos θ   ρ cos φ · cos θ   −ρ sin φ · sin θ ⎤
⎢ sin φ · sin θ   ρ cos φ · sin θ    ρ sin φ · cos θ ⎥ , (16.23)
⎣ cos φ          −ρ sin φ            0               ⎦
and
det(J) = ρ² sin φ. (16.24)
Therefore, the infinitesimal volume element in spherical coordinates is
ρ² sin φ dρ dφ dθ. (16.25)
Similar formulas hold in two dimensions, as the example of polar coordi-
nates shows.
In the polar-coordinates system (ρ, θ) in two dimensions we have x =
ρ cos θ, and y = ρ sin θ. Then the Jacobian matrix is
 
J(x, y) =
⎡ cos θ   −ρ sin θ ⎤
⎣ sin θ    ρ cos θ ⎦ , (16.26)
and
det(J) = ρ. (16.27)
Therefore, the infinitesimal area element in polar coordinates is
ρ dρ dθ. (16.28)
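These Jacobian computations are easy to verify by machine. The following is a short illustrative sketch, not part of the original development; it assumes Python with the sympy package installed.

import sympy as sp

rho, phi, theta = sp.symbols('rho phi theta', positive=True)

# Spherical coordinates: (x, y, z) as functions of (rho, phi, theta).
x = rho * sp.sin(phi) * sp.cos(theta)
y = rho * sp.sin(phi) * sp.sin(theta)
z = rho * sp.cos(phi)
J3 = sp.Matrix([x, y, z]).jacobian([rho, phi, theta])
print(sp.simplify(J3.det()))          # rho**2*sin(phi), as in (16.24)

# Polar coordinates in the plane.
J2 = sp.Matrix([rho*sp.cos(theta), rho*sp.sin(theta)]).jacobian([rho, theta])
print(sp.simplify(J2.det()))          # rho, as in (16.27)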
Chapter 17

Div, Grad, Curl (Chapter 5,6)

When we begin to study vector calculus, we encounter a number of new


concepts, divergence, gradient, curl, and so on, all related to the del oper-
ator, ∇. Shortly thereafter, we are hit with a blizzard of formulas relating
these concepts. It is all rather abstract and students easily lose their way.
It occurred to Prof. Schey of MIT to present these ideas to his students
side-by-side with the basics of electrostatics, which, after all, was one of
the main applications that drove the development of the vector calculus
in the first place. Eventually, he wrote a small book [40], which is now a
classic. These notes are based, in part, on that book.

17.1 The Electric Field


The basic principles of the electrostatics are the following:

• 1. there are positive and negative electrical charges, and like charges
repel, unlike charges attract;

• 2. the force is a central force, that is, the force that one charge exerts
on another is directed along the ray between them and, by Coulomb’s
Law, its strength falls off as the square of the distance between them;

• 3. super-position holds, which means that the force that results from
multiple charges is the vector sum of the forces exerted by each one
separately.

Apart from the first principle, this is a good description of gravity and
magnetism as well. According to Newton, every massive body exerts a


gravitational force of attraction on every other massive body. A space craft


heading to the moon feels the attractive force of both the earth and the
moon. For most of the journey, the craft is trying to escape the earth, and
the effect of the moon pulling the craft toward itself is small. But, at some
point in the journey, the attraction of the moon becomes stronger than that
of the earth, and the craft is mainly being pulled toward the moon. Even
before the space craft was launched, something existed up there in space,
waiting for a massive object to arrive and experience attractive force. This
something is the gravitational field due to the totality of massive bodies
doing the attracting. Einstein and others showed that gravity is a bit more
complicated than that, but this is a story for another time and a different
teller.
Faraday, working in England in the first half of the nineteenth century,
was the first to apply this idea of a field to electrostatics. He reasoned
that a distribution of electrical charges sets up something analogous to a
gravitational field, called an electric field, such that, once another charge
is placed within that field, it has a force exerted on it. The important idea
here is that something exists out there even when there is no charge present
to experience this force, just as with the gravitational field. There are also
magnetic fields, and the study of the interaction of electric and magnetic
fields is the focus of electromagnetism.

17.2 The Electric Field Due To A Single Charge


Suppose there is charge q at the origin in three-dimensional space. The
electric field resulting from this charge is
E(x, y, z) = (q/(x² + y² + z²)) u(x, y, z), (17.1)

where

u(x, y, z) = (x/√(x² + y² + z²), y/√(x² + y² + z²), z/√(x² + y² + z²))

is the unit vector pointing from (0, 0, 0) to (x, y, z). The electric field can
be written in terms of its component functions, that is,

E(x, y, z) = (E1 (x, y, z), E2 (x, y, z), E3 (x, y, z)),

where
E1(x, y, z) = qx/(x² + y² + z²)^(3/2),

E2(x, y, z) = qy/(x² + y² + z²)^(3/2),

and

E3(x, y, z) = qz/(x² + y² + z²)^(3/2).
It is helpful to note that these component functions are the three first
partial derivatives of the function
φ(x, y, z) = −q/√(x² + y² + z²). (17.2)

17.3 Gradients and Potentials


Because of the super-position principle, even when the electric field is the
result of multiple charges it will still be true that the component functions
of the field are the three partial derivatives of some scalar-valued function
φ(x, y, z). This function is called the potential function for the field.
For any scalar-valued function f (x, y, z), the gradient of f at the point
(x, y, z) is the vector of its first partial derivatives at (x, y, z), that is,
∇f(x, y, z) = (∂f/∂x (x, y, z), ∂f/∂y (x, y, z), ∂f/∂z (x, y, z));
the vector-valued function ∇f is called the gradient field of f . Therefore,
the electric field E is the gradient field of its potential function.
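As a quick check of this last claim for the single-charge field, here is a short Python sketch (an illustration added to these notes, assuming the sympy package): it computes ∇φ for the potential in Equation (17.2) and compares it with the component functions E1, E2, E3 given above.

import sympy as sp

x, y, z, q = sp.symbols('x y z q', real=True)
r2 = x**2 + y**2 + z**2
phi = -q / sp.sqrt(r2)                      # the potential (17.2)
E = [q*x/r2**sp.Rational(3, 2),             # the components E1, E2, E3
     q*y/r2**sp.Rational(3, 2),
     q*z/r2**sp.Rational(3, 2)]
grad_phi = [sp.diff(phi, v) for v in (x, y, z)]
print([sp.simplify(g - e) for g, e in zip(grad_phi, E)])   # [0, 0, 0]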

17.4 Gauss’s Law


Let’s begin by looking at Gauss’s Law, and then we’ll try to figure out
what it means.

Gauss’s Law:
∫∫_S E · n dS = 4π ∫∫∫_V ρ dV. (17.3)

The integral on the left side is the integral over the surface S, while the
integral on the right side is the triple integral over the volume V enclosed
by the surface S. We must remember to think of integrals as summing, so
on the left we are summing something over the surface, while on the right
we are summing something else over the enclosed volume.

17.4.1 The Charge Density Function


The function ρ = ρ(x, y, z) assigns to each point in space a number, the
charge density at that point. The vector n = n(x, y, z) is the outward unit
normal vector to the surface at the point (x, y, z) on the surface, that is, it
is a unit vector pointing directly out of the surface at the point (x, y, z).

17.4.2 The Flux


The dot product
E · n = E(x, y, z) · n(x, y, z)
is the amplitude, that is, plus or minus the magnitude, of the component
of the electric field vector E(x, y, z) that points directly out of the surface.
The surface integral on the left side of Equation (17.3) is a measure of
the outward flux of the electric field through the surface. If there were no
charges inside the surface S there would be no outward flux. Gauss’s Law
tells us that the total outward flux that does exist is due to how much
charge there is inside the surface, that is, to the totality of charge density
inside the surface.
Our goal is to find a convenient way to determine the electric field
everywhere, assuming we know the charge density function everywhere.
Gauss’s Law is only a partial answer, since it seems to require lots of
surface and volume integrals.
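For a single charge q at the origin and a sphere S of radius R centered there, though, the surface integral is simple: on S we have E · n = q/R², so the flux is (q/R²)(4πR²) = 4πq, which is 4π times the enclosed charge. The sketch below (an added illustration, assuming sympy) carries out this surface integral in spherical coordinates, where dS = R² sin φ dφ dθ.

import sympy as sp

q, R, phi, theta = sp.symbols('q R phi theta', positive=True)
integrand = (q / R**2) * R**2 * sp.sin(phi)        # (E . n) dS on the sphere
flux = sp.integrate(integrand, (theta, 0, 2*sp.pi), (phi, 0, sp.pi))
print(flux)                                        # 4*pi*q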

17.5 A Local Gauss’s Law and Divergence


Gauss’s Law involves arbitrary surfaces and the volumes they enclose. It
would be more helpful if the law could be expressed locally, at each point
in space separately. To achieve this, we consider a fixed point (x, y, z)
in space, and imagine this point to be the center of a sphere. We apply
Gauss’s Law to this sphere and get the flux through its surface. Now we
imagine shrinking the sphere down to its center point. As we shall show
later, in the limit, the ratio of the flux to the volume of the sphere, as the
radius of the sphere goes to zero, is the divergence of the field E, whose
value at the point (x, y, z) is the number
div E(x, y, z) = ∂E1/∂x (x, y, z) + ∂E2/∂y (x, y, z) + ∂E3/∂z (x, y, z). (17.4)
For notational convenience, we also write the divergence function as

div E = ∇ · E,

where the symbol


∇ = (∂/∂x, ∂/∂y, ∂/∂z)
is the del operator.
When we apply the same limiting process to the integral on the right
side of Gauss’s Law, we just get 4πρ(x, y, z). Therefore, the local or differ-
ential form of Gauss’s Law becomes

div E(x, y, z) = 4πρ(x, y, z). (17.5)



This is also the first of the four Maxwell’s Equations. When we substitute
div E(x, y, z) for 4πρ(x, y, z) in Equation (17.3) we get
∫∫_S E · n dS = ∫∫∫_V div E(x, y, z) dV, (17.6)

which is the Divergence Theorem.


Our goal is to determine the electric field from knowledge of the charge
density function ρ. The partial differential equation in (17.5) is not enough,
by itself, since it involves three different unknown functions, E1 , E2 , and
E3 , and only one known function ρ. The next step in solving the problem
involves the potential function for the electric field.

17.5.1 The Laplacian


For a scalar-valued function f (x, y, z) the Laplacian is
∇²f(x, y, z) = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z² = ∇ · (∇f).
For a vector-valued function
F(x, y, z) = (F1 (x, y, z), F2 (x, y, z), F3 (x, y, z)),
the symbol ∇2 F is the vector-valued function whose components are the
Laplacians of the individual F1 , F2 , and F3 , that is,
∇2 F = (∇2 F1 , ∇2 F2 , ∇2 F3 ).

17.6 Poisson’s Equation and Harmonic Functions
As we discussed earlier, the component functions of the electric field are the
three first partial derivatives of a single function, φ(x, y, z), the electrostatic
potential function. Our goal then is to find the potential function. When
we calculate the divergence of the electric field using φ we find that
div E(x, y, z) = ∂²φ/∂x² + ∂²φ/∂y² + ∂²φ/∂z² = ∇ · (∇φ) = ∇²φ.
Therefore, the differential form of Gauss’s Law can be written as
∇2 φ(x, y, z) = 4πρ(x, y, z); (17.7)
this is called Poisson’s Equation. In any region of space where there are no
charges, that is, where ρ(x, y, z) = 0, we have
∇2 φ(x, y, z) = 0. (17.8)

Functions that satisfy Equation (17.8) are called harmonic functions. The
reader may know that both the real and imaginary parts of a complex-
valued analytic function are harmonic functions of two variables. This
connection between electrostatics and complex analysis motivated the (ul-
timately fruitless) search for a three-dimensional extension of complex anal-
ysis.
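As an illustration (again an added sketch, assuming sympy), one can check directly that the point-charge potential of Equation (17.2) is harmonic away from the origin:

import sympy as sp

x, y, z, q = sp.symbols('x y z q', real=True)
phi = -q / sp.sqrt(x**2 + y**2 + z**2)
laplacian = sum(sp.diff(phi, v, 2) for v in (x, y, z))
print(sp.simplify(laplacian))     # 0, so phi is harmonic for (x, y, z) != 0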

17.7 The Curl


The divergence of a vector field is a local measure of the flux, which we
may think of as the outward flow of something. The curl is a measure of the
rotation of that something.
For any vector field F(x, y, z) = (F1 (x, y, z), F2 (x, y, z), F3 (x, y, z)), the
curl of F is the vector field
curl F(x, y, z) = ∇ × F = (∂F3/∂y − ∂F2/∂z, ∂F1/∂z − ∂F3/∂x, ∂F2/∂x − ∂F1/∂y). (17.9)
A useful identity involving the curl is the following:
∇ × (∇ × F) = ∇(∇ · F) − ∇2 F. (17.10)
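The identity (17.10) can be verified symbolically. The following sketch (an added illustration, assuming sympy) builds div, curl, and the Laplacian from partial derivatives and checks the identity componentwise for a generic smooth field:

import sympy as sp

x, y, z = sp.symbols('x y z')
F1, F2, F3 = [sp.Function(n)(x, y, z) for n in ('F1', 'F2', 'F3')]

def curl(F):
    return [sp.diff(F[2], y) - sp.diff(F[1], z),
            sp.diff(F[0], z) - sp.diff(F[2], x),
            sp.diff(F[1], x) - sp.diff(F[0], y)]

div = lambda F: sp.diff(F[0], x) + sp.diff(F[1], y) + sp.diff(F[2], z)
lap = lambda f: sp.diff(f, x, 2) + sp.diff(f, y, 2) + sp.diff(f, z, 2)

F = [F1, F2, F3]
lhs = curl(curl(F))
rhs = [sp.diff(div(F), v) - lap(Fi) for v, Fi in zip((x, y, z), F)]
print([sp.simplify(l - r) for l, r in zip(lhs, rhs)])   # [0, 0, 0]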

17.7.1 An Example
The curve r(t) in three-dimensional space given by
r(t) = (x(t), y(t), z(t)) = r(cos θ(t), sin θ(t), 0)
can be viewed as describing the motion of a point moving in time, revolving
counter-clockwise around the z-axis. The velocity vector at each point is

v(t) = r0 (t) = (x0 (t), y 0 (t), z 0 (t)) =
(−r sin θ(t), r cos θ(t), 0).
dt
Suppressing the dependence on t, we can write the velocity vector field as

v(x, y, z) = (−y, x, 0).
dt
Then
curl v(x, y, z) = (0, 0, 2ω),

where ω = dθ/dt is the angular velocity. The divergence of the velocity field
is
div v(x, y, z) = 0.
The motion here is rotational; there is no outward flow of anything. Here
the curl describes how fast the rotation is, and indicates the axis of rotation;
the fact that there is no outward flow is indicated by the divergence being
zero.
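The same componentwise definitions give a quick check of this example (an added sketch, assuming sympy): for v = ω(−y, x, 0) we should find curl v = (0, 0, 2ω) and div v = 0.

import sympy as sp

x, y, z, omega = sp.symbols('x y z omega')
v = [-omega*y, omega*x, sp.Integer(0)]
curl_v = [sp.diff(v[2], y) - sp.diff(v[1], z),
          sp.diff(v[0], z) - sp.diff(v[2], x),
          sp.diff(v[1], x) - sp.diff(v[0], y)]
div_v = sp.diff(v[0], x) + sp.diff(v[1], y) + sp.diff(v[2], z)
print(curl_v, div_v)                   # [0, 0, 2*omega] 0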

17.7.2 Solenoidal Fields


When the divergence of a vector field is zero, the field is said to be solenoidal;
the velocity field in the previous example is solenoidal. The second of
Maxwell’s four equations is that the magnetic field is solenoidal.

17.7.3 The Curl of the Electrostatic Field


We can safely assume that the mixed second partial derivatives of the
potential function φ satisfy

∂²φ/∂x∂y = ∂²φ/∂y∂x,

∂²φ/∂x∂z = ∂²φ/∂z∂x,

and

∂²φ/∂z∂y = ∂²φ/∂y∂z.
It follows, therefore, that, because the electrostatic field has a potential,
its curl is zero. The third of Maxwell’s Equations (for electrostatics) is

curl E(x, y, z) = 0. (17.11)

17.8 The Magnetic Field


We denote by B(x, y, z) a magnetic field. In the static case, in which
neither the magnetic field nor the electric field is changing with respect to
time, there is no connection between them. The equations that describe
this situation are

Maxwell’s Equations for the Static Case:


• 1. div E = 4πρ;
• 2. curl E = 0;
• 3. div B = 0;
• 4. curl B = 0.
It is what happens in the dynamic case, when the electric and magnetic
fields change with time, that is interesting.
Ampere discovered that a wire carrying a current acts like a magnet.
When the electric field changes with time, there is a current density vector

field J proportional to the rate of change of the electric field, and Item 4
above is replaced by Ampere’s Law:
curl B = a ∂E/∂t,
where a is some constant. Therefore, the curl of the magnetic field is
proportional to the rate of change of the electric field with respect to time.
Faraday (and also Henry) discovered that moving a magnet inside a
wire coil creates a current in the wire. When the magnetic field is changing
with respect to time, the electric field has a non-zero curl proportional to
the rate at which the magnetic field is changing. Then Item 2 above is
replaced by
curl E = b ∂B/∂t,
where b is some constant. Therefore, the curl of the electric field is pro-
portional to the rate of change of the magnetic field. It is this mutual de-
pendence that causes electromagnetic waves: as the electric field changes,
it creates a changing magnetic field, which, in turn, creates a changing
electric field, and so on.

17.9 Electro-magnetic Waves


We consider now the behavior of electric and magnetic fields that are chang-
ing with time, in a region of space where there are no charges or currents.
Maxwell’s Equations are then

• 1. div E = 0;
• 2. curl E = −b ∂B/∂t;

• 3. div B = 0;
• 4. curl B = a ∂E/∂t.

We then have
∇ × (∇ × E) = −b(∇ × ∂B/∂t) = −b ∂/∂t (∇ × B) = −ab ∂/∂t (∂E/∂t) = −ab ∂²E/∂t².
Using Equation (17.10), we can also write

∇ × (∇ × E) = ∇(∇ · E) − ∇2 E = ∇div E − ∇2 E = −∇2 E.

Therefore, we have
∇²E = ab ∂²E/∂t²,

which means that, for each i = 1, 2, 3, the component function Ei satisfies


the three-dimensional wave equation

∂²Ei/∂t² = c²∇²Ei, with c² = 1/ab.
The same is true for the component functions of the magnetic field. Here
the constant c is the speed of propagation of the wave, which turns out to
be the speed of light. It was this discovery that suggested to Maxwell that
light is an electromagnetic phenomenon.
Chapter 18

Kepler’s Laws of Planetary Motion (Chapter 5,6)

18.1 Introduction
Kepler worked from 1601 to 1612 in Prague as the Imperial Mathematician.
Taking over from Tycho Brahe, and using the tremendous amount of data
gathered by Brahe from naked-eye astronomical observation, he formulated
three laws governing planetary motion. Fortunately, among his tasks was
the study of the planet Mars, whose orbit is quite unlike a circle, at least
relatively speaking. This forced Kepler to consider other possibilities and
ultimately led to his discovery of elliptic orbits. These laws, which were
the first “natural laws” in the modern sense, served to divorce astronomy
from theology and philosophy and marry it to physics. At last, the planets
were viewed as material bodies, not unlike earth, floating freely in space
and moved by physical forces acting on them. Although the theology and
philosophy of the time dictated uniform planetary motion and circular or-
bits, nature was now free to ignore these demands; motion of the planets
could be non-uniform and the orbits other than circular.
Although the second law preceded the first, Kepler’s Laws are usually
enumerated as follows:
• 1. the planets travel around the sun not in circles but in elliptical
orbits, with the sun at one focal point;
• 2. a planet’s speed is not uniform, but is such that the line segment
from the sun to the planet sweeps out equal areas in equal time
intervals; and, finally,


• 3. for all the planets, the time required for the planet to complete
one orbit around the sun, divided by the 3/2 power of its average
distance from the sun, is the same constant.
These laws, particularly the third one, provided strong evidence for New-
ton’s law of universal gravitation. How Kepler discovered these laws with-
out the aid of analytic geometry and differential calculus, with no notion of
momentum, and only a vague conception of gravity, is a fascinating story,
perhaps best told by Koestler in [31].
Around 1684, Newton was asked by Edmund Halley, of Halley’s comet
fame, what the path would be for a planet moving around the sun, if the
force of gravity fell off as the square of the distance from the sun. Newton
responded that it would be an ellipse. Kepler had already declared that
planets moved along elliptical orbits with the sun at one focal point, but his
findings were based on observation and imagination, not deduction from
physical principles. Halley asked Newton to provide a proof. To supply
such a proof, Newton needed to write a whole book, the Principia, pub-
lished in 1687, in which he had to deal with such mathematically difficult
questions as what the gravitational force is on a point when the attracting
body is not just another point, but a sphere, like the sun.
With the help of vector calculus, a later invention, Kepler’s laws can be
derived as consequences of Newton’s inverse square law for gravitational
attraction.

18.2 Preliminaries
We consider a body with constant mass m moving through three-dimensional
space along a curve
r(t) = (x(t), y(t), z(t)),
where t is time and the sun is the origin. The velocity vector at time t is
then
v(t) = r0 (t) = (x0 (t), y 0 (t), z 0 (t)),
and the acceleration vector at time t is

a(t) = v0 (t) = r00 (t) = (x00 (t), y 00 (t), z 00 (t)).

The linear momentum vector is

p(t) = mv(t).

One of the most basic laws of motion is that the vector p′(t) = mv′(t) =
ma(t) is equal to the external force exerted on the body. When a body, or
more precisely, the center of mass of the body, does not change location,
all it can do is rotate. In order for a body to rotate about an axis a torque

is required. Just as work equals force times distance moved, work done
in rotating a body equals torque times angle through which it is rotated.
Just as force is the time derivative of p(t), the linear momentum vector, we
find that torque is the time derivative of something else, called the angular
momentum vector.

18.3 Torque and Angular Momentum


Consider a body rotating around the origin in two-dimensional space, whose
position at time t is

r(t) = (r cos θ(t), r sin θ(t)).

Then at time t + ∆t it is at

r(t + ∆t) = (r cos(θ(t) + ∆θ), r sin(θ(t) + ∆θ)).

Therefore, using trig identities, we find that the change in the x-coordinate
is approximately

∆x = −r∆θ sin θ(t) = −y(t)∆θ,

and the change in the y-coordinate is approximately

∆y = r∆θ cos θ(t) = x(t)∆θ.

The infinitesimal work done by a force F = (Fx , Fy ) in rotating the body


through the angle ∆θ is then approximately

∆W = Fx ∆x + Fy ∆y = (Fy x(t) − Fx y(t))∆θ.

Since work is torque times angle, we define the torque to be

τ = Fy x(t) − Fx y(t).

The entire motion is taking place in two dimensional space. Neverthe-


less, it is convenient to make use of the concept of cross product of three-
dimensional vectors to represent the torque. When we rewrite

r(t) = (x(t), y(t), 0),

and
F = (Fx , Fy , 0),
we find that

r(t) × F = (0, 0, Fy x(t) − Fx y(t)) = (0, 0, τ ) = τ.



Now we use the fact that the force is the time derivative of the vector p(t)
to write
τ = (0, 0, τ) = r(t) × p′(t).

Exercise 18.1 Show that

r(t) × p′(t) = d/dt (r(t) × p(t)). (18.1)

By analogy with force as the time derivative of linear momentum, we


define torque as the time derivative of angular momentum, which, from the
calculations just performed, leads to the definition of the angular momen-
tum vector as
L(t) = r(t) × p(t).

We need to say a word about the word “vector”. In our example of


rotation in two dimensions we introduced the third dimension as merely a
notational convenience. It is convenient to be able to represent the torque
as L′(t) = (0, 0, τ), but when we casually call L(t) the angular mo-
vector, physicists would tell us that we haven’t yet shown that angular mo-
mentum is a “vector” in the physicists’ sense. Our example was too simple,
they would point out. We had rotation about a single fixed axis that was
conveniently chosen to be one of the coordinate axes in three-dimensional
space. But what happens when the coordinate system changes?
Clearly, they would say, physical objects rotate and have angular mo-
mentum. The earth rotates around an axis, but this axis is not always
the same axis; the axis wobbles. A well thrown football rotates around its
longest axis, but this axis changes as the ball flies through the air. Can we
still say that the angular momentum can be represented as

L(t) = r(t) × p(t)?

In other words, we need to know that the torque is still the time derivative
of L(t), even as the coordinate system changes. In order for something
to be a “vector” in the physicists’ sense, it needs to behave properly as we
switch coordinate systems, that is, it needs to transform as a vector [15].
In fact, all is well. This definition of L(t) holds for bodies moving along
more general curves in three-dimensional space, and we can go on calling
L(t) the angular momentum vector. Now we begin to exploit the special
nature of the gravitational force.

18.4 Gravity is a Central Force


We are not interested here in arbitrary forces, but in the gravitational force
that the sun exerts on the body, which has special properties that we shall
exploit. In particular, this gravitational force is a central force.
Definition 18.1 We say that the force is a central force if

F(t) = h(t)r(t),

for each t, where h(t) denotes a scalar function of t; that is, the force is
central if it is proportional to r(t) at each t.

Proposition 18.1 If F(t) is a central force, then L′(t) = 0, for all t, so


that L = L(t) is a constant vector and L = ||L(t)|| = ||L|| is a constant
scalar, for all t.

Proof: From Equation (18.1) we have

L′(t) = r(t) × p′(t) = r(t) × F(t) = h(t)r(t) × r(t) = 0.

We see then that the angular momentum vector L(t) is conserved when
the force is central.

Proposition 18.2 If L′(t) = 0, then the curve r(t) lies in a plane.

Proof: We have
 
r(t) · L = r(t) · L(t) = r(t) · r(t) × p(t) ,

which is the volume of the parallelepiped formed by the three vectors r(t),
r(t) and p(t), which is obviously zero. Therefore, for every t, the vector
r(t) is orthogonal to the constant vector L. So, the curve lies in a plane
with normal vector L.

18.5 The Second Law


We know now that, since the force is central, the curve described by r(t)
lies in a plane. This allows us to use polar coordinate notation [42]. We
write
r(t) = ρ(t)(cos θ(t), sin θ(t)) = ρ(t)ur (t),
where ρ(t) is the length of the vector r(t) and

ur(t) = r(t)/||r(t)|| = (cos θ(t), sin θ(t))

is the unit vector in the direction of r(t). We also define

uθ (t) = (− sin θ(t), cos θ(t)),

so that

dur/dθ = uθ(t),

and

duθ/dθ = −ur(t).

Exercise 18.2 Show that

p(t) = mρ′(t)ur(t) + mρ(t)(dθ/dt)uθ(t). (18.2)
Exercise 18.3 View the vectors r(t), p(t), ur (t) and uθ (t) as vectors in
three-dimensional space, all with third component equal to zero. Show that

ur (t) × uθ (t) = k = (0, 0, 1),

for all t. Use this and Equation (18.2) to show that


L = L(t) = mρ(t)²(dθ/dt) k,

so that L = mρ(t)² dθ/dt, the moment of inertia times the angular velocity, is
constant.

Let t0 be some arbitrary time, and for any time t ≥ t0 let A(t) be the
area swept out by the planet in the time interval [t0 , t]. Then A(t2 ) − A(t1 )
is the area swept out in the time interval [t1 , t2 ].
In the very short time interval [t, t + ∆t] the vector r(t) sweeps out a
very small angle ∆θ, and the very small amount of area formed is then
approximately
∆A = (1/2)ρ(t)² ∆θ.

Dividing by ∆t and taking limits, as ∆t → 0, we get

dA/dt = (1/2)ρ(t)² dθ/dt = L/2m.

Therefore, the area swept out between times t1 and t2 is

A(t2) − A(t1) = ∫_{t1}^{t2} (dA/dt) dt = ∫_{t1}^{t2} (L/2m) dt = L(t2 − t1)/2m.

This is Kepler’s Second Law.
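The Second Law is also easy to see numerically. The sketch below (an added illustration, assuming numpy; units are chosen so that GM = 1 and m = 1) integrates an elliptical orbit under an inverse-square force and monitors the areal velocity dA/dt = |r × v|/2, which should stay constant along the orbit:

import numpy as np

def accel(r):
    return -r / np.linalg.norm(r)**3            # inverse-square, GM = 1

r = np.array([1.0, 0.0])                        # initial position
v = np.array([0.0, 1.2])                        # initial velocity (elliptical)
dt, areal = 1e-4, []
for _ in range(200_000):
    a = accel(r)                                # leapfrog (velocity Verlet) step
    v_half = v + 0.5*dt*a
    r = r + dt*v_half
    v = v_half + 0.5*dt*accel(r)
    areal.append(0.5*abs(r[0]*v[1] - r[1]*v[0]))

print(max(areal) - min(areal))                  # ~0: equal areas in equal times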



18.6 The First Law


We saw previously that the angular momentum vector is conserved when
the force is central. When Newton’s inverse-square law holds, there is
another conservation law; the Runge-Lenz vector is also conserved. We
shall use this fact to derive the First Law.
Let M denote the mass of the sun, and G Newton’s gravitational con-
stant.

Definition 18.2 The force obeys Newton’s inverse square law if

F(t) = h(t)r(t) = −(mMG/ρ(t)³) r(t).

Then we can write


F(t) = −(mMG/ρ(t)²) (r(t)/||r(t)||) = −(mMG/ρ(t)²) ur(t).

Definition 18.3 The Runge-Lenz vector is

K(t) = p(t) × L(t) − kur (t),

where k = m²MG.

Exercise 18.4 Show that the velocity vectors r′(t) lie in the same plane
as the curve r(t).

Exercise 18.5 Use the rule

A × (A × B) = (A · B)A − (A · A)B

to show that K′(t) = 0, so that K = K(t) is a constant vector and K =


||K|| is a constant scalar.

So the Runge-Lenz vector is conserved when the force obeys Newton’s


inverse square law.

Exercise 18.6 Use the rule in the previous exercise to show that the con-
stant vector K also lies in the plane of the curve r(t).

Exercise 18.7 Show that

K · r(t) = L² − kρ(t).

It follows from this exercise that

L² − kρ(t) = K · r(t) = Kρ(t) cos α(t),

where α(t) is the angle between the vectors K and r(t). From this we get

ρ(t) = L²/(k + K cos α(t)).

For k > K, this is the equation of an ellipse having eccentricity e = K/k.


This is Kepler’s First Law.
Kepler initially thought that the orbits were “egg-shaped,” but later
came to realize that they were ellipses. Although Kepler did not have the
analytical geometry tools to help him, he was familiar with the mathemat-
ical development of ellipses in the Conics, the ancient book by Apollonius,
written in Greek in Alexandria about 200 BC. Conics, or conic sections,
are the terms used to describe the two-dimensional curves, such as ellipses,
parabolas and hyperbolas, formed when a plane intersects an infinite double
cone (think “hour-glass”).
Apollonius was interested in astronomy and Ptolemy was certainly
aware of the work of Apollonius, but it took Kepler to overcome the bias
toward circular motion and introduce conic sections into astronomy. As
related by Bochner [3], there is a bit of mystery concerning Kepler’s use of
the Conics. He shows that he is familiar with a part of the Conics that
existed only in Arabic until translated into Latin in 1661, well after his
time. How he gained that familiarity is the mystery.

18.7 The Third Law


As the planet moves around its orbit, the closest distance to the sun is

ρmin = L²/(k + K),

and the farthest distance is

ρmax = L²/(k − K).

The average of these two is


a = (1/2)(ρmin + ρmax) = kL²/(k² − K²);

this is the semi-major axis of the ellipse. The semi-minor axis has length
b, where

b² = a²(1 − e²).

Therefore,

b = L√a/√k.

The area of this ellipse is πab. But we know from the Second Law that the
area of the ellipse is L/2m times the time T required to complete a full orbit.
Equating the two expressions for the area, we get

T² = (4π²/MG) a³.
This is the third law.
The first two laws deal with the behavior of one planet; the third law
is different. The third law describes behavior that is common to all the
planets in the solar system, thereby suggesting a universality to the force
of gravity.
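A quick check of the Third Law with approximate textbook values (semi-major axes in astronomical units, periods in years) shows the ratio T²/a³ coming out essentially the same, about 1, for every planet; the short Python sketch below is an added illustration.

a = {'Mercury': 0.387, 'Venus': 0.723, 'Earth': 1.000,
     'Mars': 1.524, 'Jupiter': 5.203, 'Saturn': 9.537}
T = {'Mercury': 0.241, 'Venus': 0.615, 'Earth': 1.000,
     'Mars': 1.881, 'Jupiter': 11.862, 'Saturn': 29.457}
for planet in a:
    print(planet, T[planet]**2 / a[planet]**3)   # all close to 1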

18.8 Dark Matter and Dark Energy


Ordinary matter makes up only a small fraction of the “stuff” in the uni-
verse. About 25 percent of the stuff is dark matter and over two thirds
is dark energy. Because neither of these interacts with electromagnetic
radiation, evidence for their existence is indirect.
Suppose, for the moment, that a planet moves in a circular orbit of
radius a, centered at the sun. The orbital time is T, so, by Kepler's third
law, the speed with which the planet orbits the sun is 2πa/T = √(MG/a), so the farther
away the planet the slower it moves. Spiral galaxies are like large planetary
systems, with some stars nearer to the center of the galaxy than others.
We would expect those stars farther from the center of mass of the galaxy
to be moving more slowly, but this is not the case. One explanation for
this is that there is more mass present, dark mass we cannot detect, spread
throughout the galaxy and not concentrated just near the center.
According to Einstein, massive objects can bend light. This gravita-
tional lensing, distorting the light from distant stars, has been observed
by astronomers, but cannot be simply the result of the observable mass
present; there must be more mass out there. Again, this provides indirect
evidence for dark mass.
The universe is expanding. Until fairly recently, it was believed that,
although it was expanding, the rate of expansion was decreasing; the mass
in the universe was exerting gravitational pull that was slowing down the
rate of expansion. The question was whether or not the expansion would
eventually stop and contraction begin. When the rate of expansion was
measured, it was discovered that the rate was increasing, not decreasing.
The only possible explanation for this seemed to be that dark energy was
operating and with sufficient strength to overcome not just the pull of or-
dinary matter, but of the dark matter as well. Understanding dark matter
and dark energy is one of the big challenges for physicists of the twenty-first
century.

18.9 From Kepler to Newton


Our goal, up to now, has been to show how Kepler’s three laws can be
derived from Newton’s inverse-square law, which, of course, is not how
Kepler obtained the laws. Kepler arrived at his laws empirically, by study-
ing the astronomical data. Newton was aware of Kepler’s laws and they
influenced his work on universal gravitation. When asked what would ex-
plain Kepler’s elliptical orbits, Newton replied that he had calculated that
an inverse-square law would do it. Newton found that the force required
to cause the moon to deviate from a tangent line was approximately that
given by an inverse-square fall-off in gravity.
It is interesting to ask if the inverse-square law can be derived from
Kepler’s three laws; the answer is yes, as we shall see in this section. What
follows is taken from [21].
We found previously that

dA/dt = (1/2)ρ(t)² dθ/dt = L/2m = c. (18.3)

Differentiating with respect to t, we get

ρ(t)ρ′(t) dθ/dt + (1/2)ρ(t)² d²θ/dt² = 0, (18.4)

so that

2ρ′(t) dθ/dt + ρ(t) d²θ/dt² = 0. (18.5)
From this, we shall prove that the force is central, directed towards the
sun.
As we did earlier, we write the position vector r(t) as

r(t) = ρ(t)ur (t),

so, suppressing the dependence on the time t, and using the identities

dur/dt = (dθ/dt) uθ,

and

duθ/dt = −(dθ/dt) ur,

we write the velocity vector as

v = dr/dt = (dρ/dt) ur + ρ (dur/dt) = (dρ/dt) ur + ρ (dur/dθ)(dθ/dt) = (dρ/dt) ur + ρ (dθ/dt) uθ,

and the acceleration vector as


a = (d²ρ/dt²) ur + (dρ/dt)(dur/dt) + (dρ/dt)(dθ/dt) uθ + ρ (d²θ/dt²) uθ + ρ (dθ/dt)(duθ/dt)

= (d²ρ/dt²) ur + (dρ/dt)(dθ/dt) uθ + (dρ/dt)(dθ/dt) uθ + ρ (d²θ/dt²) uθ − ρ (dθ/dt)² ur.

Therefore, we have

a = (d²ρ/dt² − ρ(dθ/dt)²) ur + (2(dρ/dt)(dθ/dt) + ρ(d²θ/dt²)) uθ.

Using Equation (18.5), this reduces to

a = (d²ρ/dt² − ρ(dθ/dt)²) ur, (18.6)
which tells us that the acceleration, and therefore the force, is directed
along the line joining the planet to the sun; it is a central force.

Exercise 18.8 Prove the following two identities:

dρ/dt = (dρ/dθ)(dθ/dt) = (2c/ρ²)(dρ/dθ) (18.7)

and

d²ρ/dt² = (4c²/ρ⁴)(d²ρ/dθ²) − (8c²/ρ⁵)(dρ/dθ)². (18.8)

Therefore, we can write the acceleration vector as


a = ((4c²/ρ⁴)(d²ρ/dθ²) − (8c²/ρ⁵)(dρ/dθ)² − 4c²/ρ³) ur.

To simplify, we substitute u = ρ⁻¹.

Exercise 18.9 Prove that the acceleration vector can be written as


a = (4c²u⁴(−(1/u²)(d²u/dθ²) + (2/u³)(du/dθ)²) − 8c²u⁵((−1/u²)(du/dθ))² − 4c²u³) ur,

so that

a = −4c²u²(d²u/dθ² + u) ur. (18.9)

Kepler’s First Law tells us that

ρ(t) = L²/(k + K cos α(t)) = a(1 − e²)/(1 + e cos α(t)),

where e = K/k and a is the semi-major axis. Therefore,

u = (1 + e cos α(t))/(a(1 − e²)).

Using Equation (18.9), we can write the acceleration as

a = −(4c²/(a(1 − e²))) u² ur = −(4c²/(a(1 − e²))) ρ⁻² ur,

which tells us that the force obeys an inverse-square law. We still must
show that this same law applies to each of the planets, that is, that the
constant c²/(a(1 − e²)) does not depend on the particular planet.

Exercise 18.10 Show that


c²/(a(1 − e²)) = π²a³/T²,
which is independent of the particular planet, according to Kepler’s Third
Law.
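The key step, that u″ + u is constant for a conic, is a one-line symbolic check (an added sketch, assuming sympy): for u = (1 + e cos α)/(a(1 − e²)) the combination d²u/dα² + u simplifies to the constant 1/(a(1 − e²)), so by Equation (18.9) the acceleration is proportional to u² = ρ⁻², an inverse-square force.

import sympy as sp

alpha = sp.symbols('alpha')
a, e = sp.symbols('a e', positive=True)
u = (1 + e*sp.cos(alpha)) / (a*(1 - e**2))
print(sp.simplify(sp.diff(u, alpha, 2) + u))   # 1/(a*(1 - e**2)), a constant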

18.10 Newton’s Own Proof of the Second Law


Although Newton invented calculus, he relied on geometry for many of his
mathematical arguments. A good example is his proof of Kepler’s Second
Law.
He begins by imagining the planet at the point 0 in Figure 18.1. If there
were no force coming from the sun, then, by the principle of inertia, the
planet would continue in a straight line, with constant speed. The distance
∆ from the point 0 to the point 1 is the same as the distance from 1 to 2
and the same as the distance from 2 to 3. The areas of the three triangles
formed by the sun and the points 0 and 1, the sun and the points 1 and
2, and the sun and the points 2 and 3 are all equal, since they all equal
half of the base ∆ times the height H. Therefore, in the absence of a force
from the sun, the planet sweeps out equal areas in equal times. Now what
happens when there is a force from the sun?
Newton now assumes that ∆ is very small, and that during the short
time it would have taken for the planet to move from 1 to 3 there is a force
on the planet, directed toward the sun. Because of the small size of ∆, he
safely assumes that the direction of this force is unchanged and is directed

along the line from 2, the midpoint of 1 and 3, to the sun. The effect of
such a force is to pull the planet away from 3, along the line from 3 to 4.
The areas of the two triangles formed by the sun and the points 2 and 3
and the sun and the points 2 and 4 are both equal to half of the distance
from the sun to 2, times the distance from 2 to B. So we still have equal
areas in equal times.
We can corroborate Newton’s approximations using vector calculus.
Consider the planet at 2 at time t = 0. Suppose that the acceleration
is a(t) = (b, c), where (b, c) is a vector parallel to the line segment from
the sun to 2. Then the velocity vector is v(t) = t(b, c) + (0, ∆), where, for
simplicity, we assume that, in the absence of the force from the sun, the
planet travels at a speed of ∆ units per second. The position vector is then
r(t) = (1/2)t²(b, c) + t(0, ∆) + r(0).
At time t = 1, instead of the planet being at 3, it is now at
r(1) = (1/2)(b, c) + (0, ∆) + r(0).
Since the point 3 corresponds to the position (0, ∆) + r(0), we see that the
point 4 lies along the line from 3 parallel to the vector (b, c).

18.11 Armchair Physics


Mathematicians tend to ignore things like units, when they do calculus
problems. Physicists know that you can often learn a lot just by paying
attention to the units involved, or by asking questions like what happens to
velocity when length is converted from feet to inches and time from minutes
to seconds. This is sometimes called “armchair physics.” To illustrate, we
apply this approach to Kepler’s Third Law.

18.11.1 Rescaling
Suppose that the spatial variables (x, y, z) are replaced by (αx, αy, αz) and
time changed from t to βt. Then velocity, since it is distance divided by
time, is changed from v to αβ⁻¹v. Velocity squared, and therefore kinetic
and potential energies, are changed by a factor of α²β⁻².

18.11.2 Gravitational Potential


The gravitational potential function φ(x, y, z) associated with the gravita-
tional field due to the sun is given by
φ(x, y, z) = −C/√(x² + y² + z²), (18.10)

where C > 0 is some constant and we assume that the sun is at the origin.
The gradient of φ(x, y, z) is
∇φ(x, y, z) = (C/(x² + y² + z²)) (x/√(x² + y² + z²), y/√(x² + y² + z²), z/√(x² + y² + z²)).

The gravitational force on a massive object at point (x, y, z) is F = −∇φ, a
vector of magnitude C/(x² + y² + z²), directed from (x, y, z) toward (0, 0, 0), which
says that the force is central and falls off as the reciprocal of the distance
squared.
The potential function φ(x, y, z) is (−1)-homogeneous, meaning that
when we replace x with αx, y with αy, and z with αz, the new potential
is the old one times α⁻¹.
We also know, though, that when we rescale the space variables by α
and time by β the potential energy is multiplied by a factor of α²β⁻². It
follows that

α⁻¹ = α²β⁻²,

so that

β² = α³. (18.11)
Suppose that we have two planets, P1 and P2 , orbiting the sun in circular
orbits, with the length of the the orbit of P2 equal to α times that of P1 .
We can view the orbital data from P2 as that from P1 , after a rescaling of
the spatial variables by α. According to Equation (18.11), the orbital time
of P2 is then that of P1 multiplied by β = α3/2 . This is Kepler’s Third
Law.
Kepler took several decades to arrive at his third law, which he obtained
not from basic physical principles, but from analysis of observational data.
Could he have saved himself much time and effort if he had stayed in his
armchair and considered rescaling, as we have just done? No. The impor-
tance of Kepler’s Third Law lies in its universality, the fact that it applies
not just to one planet but to all. We have implicitly assumed universality
by postulating a potential function that governs the gravitational field from
the sun.

18.11.3 Gravity on Earth


We turn now to the gravitational pull of the earth on an object near its
surface. We have just seen that the potential function is proportional to
the reciprocal of the distance from the center of the earth to the object.
Let the radius of the earth be R and let the object be at a height h above
the surface of the earth. Then the potential is
φ(R + h) = −B/(R + h),

for some constant B. The potential at the surface of the earth is


φ(R) = −B/R.
The potential difference between the object at height h and the surface of
the earth is then
PD(h) = B/R − B/(R + h) = B( 1/R − 1/(R + h) ) = B (R + h − R)/( R(R + h) ) = Bh/( R(R + h) ).

If h is very small relative to R, then we can say that


PD(h) ≈ (B/R²) h,
so it is linear in h. The potential difference is therefore 1-homogeneous; if we
rescale the spatial variables by α the potential difference is also rescaled by
α. But, as we saw previously, the potential difference is also rescaled by
α²β⁻². Therefore,

α = α²β⁻²,

or

β = α^{1/2}.
This makes sense. Consider a ball dropped from a tall building. In order
to double the time of fall (multiply t by β = 2) we must quadruple the
height from which it is dropped (multiply h by α = β² = 4).
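A quick numerical check of this last claim, assuming the familiar free-fall law h = (1/2)gt², so that t = √(2h/g):

    import math

    g = 9.8                                      # meters per second squared
    fall_time = lambda h: math.sqrt(2.0 * h / g)

    h = 10.0
    print(fall_time(4.0 * h) / fall_time(h))     # prints 2.0: quadrupling h doubles t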

Figure 18.1: Newton’s Own Diagram.


Chapter 19

Green’s Theorem and Related Topics (Chapter 5,6,13)

19.1 Introduction
Green’s Theorem in two dimensions can be interpreted in two different
ways, both leading to important generalizations, namely Stokes’s Theorem
and the Divergence Theorem. In addition, Green’s Theorem has a number
of corollaries that involve normal derivatives, Laplacians, and harmonic
functions, and that anticipate results in analytic function theory, such as
the Cauchy Integral Theorems. A good reference is the book by Flanigan
[16].

19.1.1 Some Terminology


A subset D of R² is said to be open if, for every point x in D, there
is ε > 0 such that the ball centered at x, with radius ε, is completely
contained within D. The set D is connected if it is not the union of two
disjoint non-empty open sets. The set D is said to be a domain if D is
non-empty, open and connected. A subset B of R² is bounded if it is a
subset of a ball of finite radius. The boundary of a set D, denoted ∂D,
is the set of all points x, in D or not, such that every ball centered at x
contains points in D and points not in D.
Because we shall be interested in theorems that relate the behavior of
functions inside a domain to their behavior on the boundary of that domain,
we need to limit our discussion to those domains that have nice boundaries.
A Jordan curve is a piece-wise smooth closed curve that does not cross


itself. A Jordan domain is a bounded domain, whose boundary consists


of finitely many, say k, disjoint Jordan curves, parameterized in such a
way that as a point moves around the curve with increasing parameter,
the domain always lies to the left; this is positive orientation. Then the
domain is called a k-connected Jordan domain. For example, a ball in R²
is 1-connected, while an annulus is 2-connected; Jordan domains can have
holes in them.

19.1.2 Arc-Length Parametrization


Let C be a curve in space with parameterized form

r(t) = (x(t), y(t), z(t)).

For each t, let s(t) be the distance along the curve from the point r(0) to
the point r(t). The function s(t) is invertible, so that we can also express t
as a function of s, t = t(s). Then s(t) is called the arc-length. We can then
rewrite the parametrization, using as the parameter the variable s instead
of t; that is, the curve C can be described as

r(s) = r(t(s)) = (x(t(s)), y(t(s)), z(t(s))). (19.1)

Then

r′(t) = dr/dt = (dr/ds)(ds/dt) = ( dx/ds, dy/ds, dz/ds ) ds/dt. (19.2)

The vector

T(s) = dr/ds = ( dx/ds, dy/ds, dz/ds ) (19.3)

has length one, since

ds² = dx² + dy² + dz², (19.4)

and v = ds/dt, the speed along the curve, satisfies

(ds/dt)² = (dx/dt)² + (dy/dt)² + (dz/dt)². (19.5)

19.2 Green’s Theorem in Two Dimensions


Green’s Theorem for two dimensions relates double integrals over domains
D to line integrals around their boundaries ∂D. Theorems such as this can
be thought of as two-dimensional extensions of integration by parts. Green
published this theorem in 1828, but it was known earlier to Lagrange and
Gauss.

Theorem 19.1 (Green-2D) Let P(x, y) and Q(x, y) have continuous first
partial derivatives for (x, y) in a domain Ω containing both the Jordan domain
D and ∂D. Then

∮_{∂D} P dx + Q dy = ∬_D ( ∂Q/∂x − ∂P/∂y ) dxdy. (19.6)
Let the boundary ∂D be the positively oriented parameterized curve

r(t) = (x(t), y(t)).

Then, for each t, the vector

r′(t) = (x′(t), y′(t))

is tangent to the curve at the point r(t). The vector

N(t) = (y′(t), −x′(t))

is perpendicular to r′(t) and is outwardly normal to the curve at the point
r(t). The integrand on the left side of Equation (19.6) can be written in
two ways:

P dx + Q dy = (P, Q) · r′(t) dt, (19.7)

or as

P dx + Qdy = (Q, −P ) · N(t)dt. (19.8)

In Equation (19.7) we use the dot product of the vector field F = (P, Q)
with a tangent vector; this point of view will be extended to Stokes’s
Theorem. In Equation (19.8) we use the dot product of the vector field
G = (Q, −P ) with a normal vector; this formulation of Green’s Theorem,
also called Gauss’s Theorem in the plane, will be extended to the Diver-
gence Theorem, also called Gauss’s Theorem in three dimensions. Either
of these extensions therefore can legitimately be called Green’s Theorem in
three dimensions.

19.3 Proof of Green-2D


First, we compute the line integral ∮ P dx + Q dy around a small rectangle
in D and then sum the result over all such small rectangles in D. For
convenience, we assume the parameter s is arc-length.
Consider the rectangle with vertices (x0 , y0 ), (x0 +∆x, y0 ), (x0 +∆x, y0 +
∆y), and (x0 , y0 +∆y), where ∆x and ∆y are very small positive quantities.
The boundary curve is counter-clockwise. The line integrals along the four
sides are as follows:

• The right side:

  ∫_{y0}^{y0+∆y} Q(x0 + ∆x, y) dy; (19.9)

• The top:

  ∫_{x0+∆x}^{x0} P(x, y0 + ∆y) dx; (19.10)

• The left side:

  ∫_{y0+∆y}^{y0} Q(x0, y) dy; (19.11)

• The bottom:

  ∫_{x0}^{x0+∆x} P(x, y0) dx. (19.12)

Now consider the double integral

∬_∆ ( ∂Q/∂x − ∂P/∂y ) dx dy, (19.13)

where ∆ denotes the infinitesimal rectangular region. We write the first
half of this integral as

∫_{y0}^{y0+∆y} ( ∫_{x0}^{x0+∆x} Qx(x, y) dx ) dy = ∫_{y0}^{y0+∆y} ( Q(x0 + ∆x, y) − Q(x0, y) ) dy

= ∫_{y0}^{y0+∆y} Q(x0 + ∆x, y) dy − ∫_{y0}^{y0+∆y} Q(x0, y) dy,

which is the sum of the two line integrals (19.9) and (19.11). In the same
way, we can show that the second half of the double integral is equal to the
line integrals along the top and bottom of ∆.
Now consider the contributions to the double integral

∬_D ( ∂Q/∂x − ∂P/∂y ) dx dy, (19.14)
which is the sum of each of the double integrals over all the small rectangles
∆ in D. When we add up the contributions of all these infinitesimal rect-
angles, we need to note that rectangles adjacent to one another contribute

nothing to the line integral from their shared edge, since the unit outward
normals are opposite in direction. Consequently, the sum of all the line
integrals around the small rectangles reduces to the line integral around
the boundary of D, since this is the only curve without any shared edges.
The double integral in Equation (19.14) is then the line integral around the
boundary only, which is the assertion of Green-2D.
Note that we have used the assumption that Qx and Py are continuous
when we replaced the double integral with iterated single integrals and
when we reversed the order of integration.
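As a numerical sanity check on Green-2D, the following minimal Python sketch (an illustration, not part of the proof) compares the two sides of Equation (19.6) for P = −x²y and Q = xy² on the unit disk, whose positively oriented boundary is the unit circle; here Qx − Py = x² + y², and both sides should come out near π/2:

    import math

    P = lambda x, y: -x * x * y
    Q = lambda x, y: x * y * y

    # line integral around the unit circle, positively oriented
    n = 100000
    line = 0.0
    for k in range(n):
        t = 2.0 * math.pi * (k + 0.5) / n
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)        # x'(t), y'(t)
        line += (P(x, y) * dx + Q(x, y) * dy) * (2.0 * math.pi / n)

    # double integral of Qx - Py = x^2 + y^2 over the unit disk, in polar form
    m = 400
    double = 0.0
    for i in range(m):
        r = (i + 0.5) / m
        for j in range(m):
            double += (r * r) * r * (1.0 / m) * (2.0 * math.pi / m)

    print(line, double)   # both close to pi/2 = 1.5707...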

19.4 Extension to Three Dimensions


19.4.1 Stokes’s Theorem
The first extension of Green-2D to three dimensions that we shall discuss is
Stokes’s Theorem. The statement of Stokes’s Theorem involves a curve C
in space and a surface S that is a capping surface for C. A good illustration
of a capping surface is the soap bubble formed when we blow air through
a soapy ring; the ring is C and the bubble formed is S.
Theorem 19.2 Let C be a Jordan curve in space with unit tangent T(s) = dr/ds
and S a capping surface for C, with outward unit normal vectors n(s).
Let F(x, y, z) = (P(x, y, z), Q(x, y, z), R(x, y, z)) be a vector field. The curl
of F is the vector field

curl(F) = (Ry − Qz, Pz − Rx, Qx − Py), (19.15)

where Ry = ∂R/∂y. Then

∮_C F · T ds = ∬_S n · curl(F) dS. (19.16)

Proof: For convenience, we shall assume that there is a region D in the


x, y plane and a real-valued function f (x, y), defined for (x, y) in D, such
that the surface S is the graph of f (x, y), that is, each point (x, y, z) on S
has the form (x, y, z) = (x, y, f (x, y)). The boundary curve of D, denoted
∂D, is the curve in the x, y plane directly below the curve C.
Since we can write
F = (P, Q, R) = P i + Qj + Rk
and
∇ × F = ∇ × (P i) + ∇ × (Qj) + ∇ × (Rk),
we focus on proving the theorem for the simpler case of Q = R = 0. Note
that we have

∇ × (P i) = (∂P/∂z) j − (∂P/∂y) k,

so that

∇ × (P i) · n = (∂P/∂z) n · j − (∂P/∂y) n · k. (19.17)

The vector r(x, y, z) = (x, y, f(x, y)) from the origin to the point (x, y, z)
on the surface S then has

∂r/∂y = j + (∂f/∂y) k.

The vector ∂r/∂y is tangent to the surface at (x, y, z), and so it is perpendicular
to the unit outward normal. This means that

n · j + (∂f/∂y) n · k = 0,

so that

n · j = −(∂f/∂y) n · k. (19.18)

Therefore, using Equations (19.17) and (19.18), we have

∇ × (P i) · n dS = −( (∂P/∂z)(∂f/∂y) + ∂P/∂y ) n · k dS. (19.19)

Note, however, that

(∂P/∂z)(∂f/∂y) + ∂P/∂y = ∂F/∂y,

where F(x, y) = P(x, y, f(x, y)). Therefore, recalling that

n · k dS = dxdy,

we get

∇ × (P i) · n dS = −(∂F/∂y) dxdy. (19.20)

By Green 2-D, we have

∬_S ∇ × (P i) · n dS = −∬_D (∂F/∂y) dxdy = ∮_{∂D} F dx.

But we also have

∮_{∂D} F dx = ∮_C P dx,

since F(x, y) = P(x, y, f(x, y)). Similar calculations for the other two
coordinate-direction components establish the assertion of the theorem.

Suppose that F = (P, Q, 0) and the surface S is a Jordan domain D in
R², with C = ∂D. Then

curl(F) = (0, 0, Qx − Py),

and n = (0, 0, 1). Therefore,

n · curl(F) = Qx − Py.

Also,

F · T ds = P dx + Q dy.
We see then that Stokes’s Theorem has Green-2D as a special case.
Because the curl of a vector field is defined only for three-dimensional
vector fields, it is not obvious that the curl and Stokes’s Theorem extend
to higher dimensions. They do, but the extensions involve more compli-
cated calculus on manifolds and the integration of (n − 1)-forms over a
suitably oriented boundary of an oriented n-manifold; see Fleming [17] for
the details.

19.4.2 The Divergence Theorem


Equation (19.8) suggests that we consider surface integrals of functions
having the form F · n, where n is the outward unit normal to the surface
at each point. The Divergence Theorem, also called Gauss’s Theorem in
three dimensions, is one result in this direction.

Theorem 19.3 Let S be a closed surface enclosing the volume V . Let


F = (P, Q, R) be a vector field with divergence

div(F) = Px + Qy + Rz .

Then

∬_S F · n dS = ∭_V div(F) dV. (19.21)

Proof: We first prove the theorem for a small cube with vertices (x, y, z),
(x, y + ∆y, z), (x, y, z + ∆z) and (x, y + ∆y, z + ∆z) forming the left side
wall, and the vertices (x + ∆x, y, z), (x + ∆x, y + ∆y, z), (x + ∆x, y, z + ∆z)
and (x + ∆x, y + ∆y, z + ∆z) forming the right side wall. The unit outward
normal for the side wall containing the first four of the eight vertices is
n = (−1, 0, 0); for the other side wall, it is n = (1, 0, 0). For the first side
wall the flux is the normal component of the field times the area of the
wall, or
−P (x, y, z)∆y ∆z,

while for the second side wall, it is

P(x + ∆x, y, z) ∆y ∆z.

The total outward flux through these two walls is then

( P(x + ∆x, y, z) − P(x, y, z) ) ∆y ∆z,

or

( ( P(x + ∆x, y, z) − P(x, y, z) ) / ∆x ) ∆x ∆y ∆z.

Taking limits, we get

(∂P/∂x)(x, y, z) dV.
We then perform the same calculations for the other four walls. Finally,
having proved the theorem for small cubes, we view the entire volume as
a sum of small cubes and add up the total flux for all the cubes. Because
outward flux from one cube’s wall is inward flux for its neighbor, they
cancel out, except when a wall has no neighbor; this means that the only
outward flux that remains is through the surface. This is what the theorem
says.
If we let R = 0 and imagine the volume shrinking down to a two-
dimensional planar domain D, with S compressing down to its boundary,
∂D, the unit normal vector becomes
n = ( dy/ds, −dx/ds ),
and Equation (19.21) reduces to Equation (19.6).
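A minimal numerical sketch of Equation (19.21), for illustration: take F = (x², y², z²) on the unit cube [0, 1]³, so that div(F) = 2(x + y + z); the outward flux and the volume integral should both equal 3:

    n = 100
    h = 1.0 / n
    pts = [(i + 0.5) * h for i in range(n)]

    # outward flux: the components of F vanish on the faces x = 0, y = 0, z = 0,
    # so only the three far faces contribute, with F . n = 1 on each
    flux = 0.0
    for u in pts:
        for v in pts:
            flux += 3 * (1.0 ** 2) * h * h

    # volume integral of div(F) = 2(x + y + z)
    vol = 0.0
    for x in pts:
        for y in pts:
            for z in pts:
                vol += 2.0 * (x + y + z) * h ** 3

    print(flux, vol)   # both print 3.0 (up to rounding)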

19.5 When is a Vector Field a Gradient Field?


The following theorem is classical and extends the familiar “test for exactness.”
Theorem 19.4 Let F : D ⊆ R^N → R^N be continuously differentiable on
an open convex set D0 ⊆ D, with

F(x) = (F1(x), F2(x), ..., FN(x)).

Then there is a differentiable function f : D0 → R such that F(x) =
∇f(x) for all x in D0 if and only if

∂Fm/∂xn = ∂Fn/∂xm,

for all m and n; in other words, the Jacobian matrix of F is symmetric.

Proof: If F(x) = ∇f (x) for all x in D0 and is continuously differentiable,


then the second partial derivatives of f (x) are continuous, so that the
mixed second partial derivatives of f (x) are independent of the order of
differentiation.
For notational convenience, we present the proof of the converse only
for the case of N = 3; the proof is the same in general.
Without loss of generality, we assume that the origin is a member of
the set D0 . Define f (x, y, z) by
f(x, y, z) = ∫_0^x F1(u, 0, 0) du + ∫_0^y F2(x, u, 0) du + ∫_0^z F3(x, y, u) du.

We prove that (∂f/∂x)(x, y, z) = F1(x, y, z).


The partial derivative of the first integral, with respect to x, is F1 (x, 0, 0).
The partial derivative of the second integral, with respect to x, obtained
by differentiating under the integral sign, is

∫_0^y (∂F2/∂x)(x, u, 0) du,

which, by the symmetry of the Jacobian matrix, is

∫_0^y (∂F1/∂y)(x, u, 0) du = F1(x, y, 0) − F1(x, 0, 0).

The partial derivative of the third integral, with respect to x, obtained by
differentiating under the integral sign, is

∫_0^z (∂F3/∂x)(x, y, u) du,

which, by the symmetry of the Jacobian matrix, is

∫_0^z (∂F1/∂z)(x, y, u) du = F1(x, y, z) − F1(x, y, 0).

We complete the proof by adding these three integral values. Similar cal-
culations show that ∇f (x) = F(x).
Theorem 19.4 tells us that, for a three-dimensional field

F(x, y, z) = (F1 (x, y, z), F2 (x, y, z), F3 (x, y, z)),

there is a real-valued function f(x, y, z) with F(x, y, z) = ∇f(x, y, z) for
all (x, y, z) if and only if the curl of F(x, y, z) is zero. It follows from
Stokes’s Theorem that the integral ∮_C F · T ds is zero for every closed curve
C. Consequently, for any path C connecting points A and B, the integral
∫_C F · T ds is independent of the path and depends only on the points A
and B; then we can write

∫_C F · T ds = ∫_A^B F · T ds. (19.22)
In addition, the potential function f(x, y, z) can be chosen to be

f(x, y, z) = ∫_{(x0,y0,z0)}^{(x,y,z)} F · T ds, (19.23)

where (x0 , y0 , z0 ) is an arbitrarily selected point in space.


When F(x, y, z) denotes a force field, the integral ∫_A^B F · T ds is the work
done by the force in moving from A to B. When F = ∇f, this work is
simply the change in the potential function f. Such force fields are called
conservative.
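The three-leg construction in the proof of Theorem 19.4 can be carried out numerically. A minimal sketch, for the illustrative field F = (2xy, x², 3z²), which is ∇f for f(x, y, z) = x²y + z³ and so has a symmetric Jacobian:

    def F(x, y, z):
        # F = grad f for f(x, y, z) = x^2 y + z^3
        return (2.0 * x * y, x * x, 3.0 * z * z)

    def integrate(g, a, b, n=2000):
        # midpoint rule for a one-dimensional integral
        h = (b - a) / n
        return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

    def potential(x, y, z):
        # the three integrals from the proof of Theorem 19.4
        leg1 = integrate(lambda u: F(u, 0.0, 0.0)[0], 0.0, x)
        leg2 = integrate(lambda u: F(x, u, 0.0)[1], 0.0, y)
        leg3 = integrate(lambda u: F(x, y, u)[2], 0.0, z)
        return leg1 + leg2 + leg3

    x, y, z = 1.2, -0.7, 0.5
    print(potential(x, y, z), x * x * y + z ** 3)   # agree to quadrature accuracy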

19.6 Corollaries of Green-2D


19.6.1 Green’s First Identity
Let u(x, y) be a differentiable real-valued function of two variables, with
gradient
∇u(x, y) = ( ∂u/∂x, ∂u/∂y ).
Let D be a Jordan domain with boundary C = ∂D. The directional deriva-
tive of u, in the direction of the unit outward normal n, is
∂u/∂n = ∇u · n.
When the curve C is parameterized by arc-length, the unit outward normal
is
n = ( dy/ds, −dx/ds ),
and

∮_C (∂u/∂n) ds = ∮_C −uy dx + ux dy. (19.24)

Theorem 19.5 (Green I) Let ∇²q denote the Laplacian of the function
q(x, y), that is, ∇²q = qxx + qyy. Then

∬_D (∇p) · (∇q) dxdy = ∮_C p (∂q/∂n) ds − ∬_D p ∇²q dxdy. (19.25)

Proof: Evaluate the line integral using Green-2D, with P = −pqy and
Q = pqx.

19.6.2 Green’s Second Identity


An immediate corollary is Green’s Second Identity (Green II).
Theorem 19.6 (Green II)

∮_C ( p ∂q/∂n − q ∂p/∂n ) ds = ∬_D ( p∇²q − q∇²p ) dxdy. (19.26)

19.6.3 Inside-Outside Theorem


The Inside-Outside Theorem, which is a special case of Gauss’s Theorem
in the plane, follows immediately from Green II.
Theorem 19.7 (Inside-Outside Theorem)

∮_C (∂q/∂n) ds = ∬_D ∇²q dxdy. (19.27)

19.6.4 Green’s Third Identity


Green’s Third Identity (Green III) is more complicated than the previous
ones. Let w be any point inside the Jordan domain D in R² and hold w
fixed. For variable z in the plane, let r = |z − w|. A function is said to
be harmonic if its Laplacian is identically zero. We show now that the
function p(z) = log r is harmonic for any z in any domain that does not
contain w. With z = (x, y) and w = (a, b), we have
r² = (x − a)² + (y − b)²,

so that

p(z) = p(x, y) = (1/2) log( (x − a)² + (y − b)² ).
Then

px(x, y) = (x − a)/( (x − a)² + (y − b)² ),

pxx(x, y) = ( (y − b)² − (x − a)² )/( (x − a)² + (y − b)² )²,

py(x, y) = (y − b)/( (x − a)² + (y − b)² ),

and

pyy(x, y) = ( (x − a)² − (y − b)² )/( (x − a)² + (y − b)² )².
Clearly, we have
pxx + pyy = 0,
and so p is harmonic in any region not including w.
The theorem is the following:

Theorem 19.8 (Green III)

q(w) = (1/2π) ∬_D log r ∇²q dxdy − (1/2π) ∮_C log r (∂q/∂n) ds + (1/2π) ∮_C q (∂ log r/∂n) ds. (19.28)
The two line integrals in Equation (19.28) are known as the logarithmic
single-layer potential and logarithmic double-layer potential, respectively,
of the function q.
Notice that we cannot apply Green II directly to the domain D, since
log r is not defined at z = w. The idea is to draw a small circle C′ centered
at w, with interior D′, and consider the new domain that is the original
D, without the ball D′ around w and its boundary; the new domain has a
hole in it, but that is acceptable. Then apply Green II, and finally, let the
radius of the ball go to zero. There are two key steps in the calculation.
First, we use the fact that, for the small circle, n = r/‖r‖ to show that

∂p/∂n = ∇p · n = 1/ρ,

where ρ is the radius of the small circle C′ centered at w and r = z − w
is the vector from w to z. Then

p(z) = (1/2) log(‖r‖²),

so that

∇p(z) = (1/2) (1/‖r‖²) ∇‖r‖² = (1/‖r‖²) r.

Therefore, for z on C′, we have

∂p/∂n = ∇p(z) · n = 1/ρ.

Then

∮_{C′} q (∂p/∂n) ds = (1/ρ) ∮_{C′} q ds,

which, as the radius of C′ goes to zero, is just 2πq(w).
Second, we note that the function ∂q/∂n is continuous, and therefore
bounded by some constant K > 0 on the circle C′; the constant K can
be chosen to be independent of ρ, for ρ sufficiently close to zero. Consequently, we have

| ∮_{C′} log r (∂q/∂n) ds | ≤ K | ∮_{C′} log ρ ds | = 2πK |ρ log ρ|.

Since ρ log ρ goes to zero, as ρ goes to zero, this integral vanishes, in the
limit.
Equation (19.28) tells us that if q is a harmonic function in D, then
its value at any point w inside D is completely determined by what the
functions q and ∂q/∂n do on the boundary C. Note, however, that the normal
derivative of q depends on values of q near the boundary, not just on
the boundary. In fact, q(w) is completely determined by q alone on the
boundary, via

q(w) = −(1/2π) ∮_C q(z) (∂/∂n) G(z, w) ds,

where G(z, w) is the Green’s function for the domain D.
According to the heat equation, the temperature u(x, y, t) in a two-
dimensional region at time t is governed by the partial differential equation
∂u/∂t = c∇²u,
for some constant c > 0. When a steady-state temperature has been
reached, the function u(x, y, t) no longer depends on t and the resulting
function f (x, y) of (x, y) only satisfies ∇2 f = 0; that is, f (x, y) is harmonic.
Imagine the region being heated by maintaining a temperature distribution
around the boundary of the region. It is not surprising that such a steady-
state temperature distribution throughout the region should be completely
determined by the temperature distribution around the boundary of the
region.

19.7 Application to Complex Function Theory
In Complex Analysis the focus is on functions f(z) where both z and f(z)
are complex numbers. Because z = x + iy, for x and y real
variables, we can always write

f (z) = u(x, y) + iv(x, y),

where u(x, y) and v(x, y) are both real-valued functions of two real vari-
ables. So it looks like there is nothing new here; complex function theory
looks like the theory of any two real-valued functions glued together to form
a complex-valued function. There is an important difference, however.
The most important functions in complex analysis are the functions
that are analytic in a domain D in the complex plane. Such functions will
have the property that, for any closed curve C in D,
∮_C f(z) dz = 0.

Writing dz = dx + idy, we have

∮_C f(z) dz = ∮_C (u dx − v dy) + i ∮_C (v dx + u dy).

From Green 2-D it follows that we want ux = vy and uy = −vx ; these


are called the Cauchy-Riemann (CR) equations. The CR equations are a
consequence of the differentiability of f (z). The point here is that complex
function theory is not just the theory of unrelated real-valued functions
glued together; the functions u(x, y) and v(x, y) must be related in the
sense of the CR equations in order for the function f (z) to fit the theory.
If f (z) is an analytic function of the complex variable z, then, because
of the CR equations, the real and imaginary parts of f (z), the functions
u(x, y) and v(x, y), are real-valued harmonic functions of two variables.
Using Green III, we can obtain Cauchy’s Integral Formula, which shows
that the value of f at any point w within the domain D is determined by
the value of the function at the points z of the boundary:
f(w) = (1/2πi) ∮_C f(z)/(z − w) dz. (19.29)
This formula occurs in complex function theory under slightly weaker
assumptions than we use here. We shall assume that f (z) is continuously
differentiable, so that the real and imaginary parts of f are continuously
differentiable; we need this for Green 2-D. In complex function theory,
it is shown that the continuity of f 0 (z) is a consequence of analyticity.
According to complex function theory, we may, without loss of generality,
consider only the case in which C is the circle of radius ρ centered at
w = (a, b), and D is the region enclosed by C, which is what we shall do.
We know from Green III that

u(w) = (1/2π) ∬_D log r ∇²u dxdy − (1/2π) ∮_C log r (∂u/∂n) ds + (1/2π) ∮_C u (∂ log r/∂n) ds, (19.30)
with a similar expression involving v. Because u is harmonic, Equation
(19.30) reduces to
u(w) = −(1/2π) ∮_C log r (∂u/∂n) ds + (1/2π) ∮_C u (∂ log r/∂n) ds, (19.31)
with a similar expression involving the function v.
Consider the first line integral in Equation (19.31),
(1/2π) ∮_C log r (∂u/∂n) ds. (19.32)

Since r = ρ for all z on C, this line integral becomes

(1/2π) ∮_C log ρ (∂u/∂n) ds. (19.33)

But, by the Inside-Outside Theorem, and the fact that u is harmonic, we
know that

∮_C (∂u/∂n) ds = ∬_D ∇²u dxdy = 0. (19.34)

So we need only worry about the second line integral in Equation (19.31),
which is

(1/2π) ∮_C u (∂ log r/∂n) ds. (19.35)
We need to look closely at the term ∂ log r/∂n.
First, we have

∂ log r/∂n = (1/2) ∂(log r²)/∂n. (19.36)

The function log r² can be viewed as

log r² = log( a · a ), (19.37)

where a denotes z − w, thought of as a vector in R². Then

∇ log r² = ∇ log( a · a ) = 2a/‖a‖². (19.38)

Because C is a circle centered at w, the unit outward normal at z on C is

n = a/‖a‖. (19.39)

Putting all this together, we find that

∂ log r/∂n = 1/‖a‖ = 1/|z − w|. (19.40)

Therefore, Green III tells us that

u(w) = (1/2π) ∮_C u(z)/|z − w| ds, (19.41)

with a similar expression involving v. There is one more step we must take
to get to the Cauchy Integral Formula.
We can write z − w = ρe^{iθ} for z on C. Therefore,

dz/dθ = ρie^{iθ}. (19.42)

The arc-length s around the curve C is s = ρθ, so that

ds/dθ = ρ. (19.43)

Therefore, we have

θ = s/ρ, (19.44)

and

z − w = ρe^{is/ρ}. (19.45)

Then,

dz/ds = ie^{is/ρ}, (19.46)

or

ds = (1/i) e^{−iθ} dz. (19.47)

Substituting for ds in Equation (19.41) and in the corresponding equation
involving v, and using the fact that

|z − w| e^{iθ} = z − w, (19.48)

we obtain Cauchy’s Integral Formula (19.29).


For a brief discussion of an interesting application of Maxwell’s equa-
tions, see the chapter on Invisibility.

19.8 The Cauchy-Riemann Equations Again


Let r(t) = (x(t), y(t)) be a curve in R², which we view as a parameterized
set of complex numbers; that is, we write z(t) = x(t) + iy(t). Then z′(t) =
x′(t) + iy′(t).

f (z) = u(x, y) + iv(x, y),

and
g(t) = f (z(t)) = u(x(t), y(t)) + iv(x(t), y(t)).

Then, with f′(z(t)) = c(t) + id(t) and suppressing the t, we have

g′(t) = f′(z(t)) z′(t) = (cx′ − dy′) + i(cy′ + dx′). (19.49)

We also have

g′(t) = (ux x′ + uy y′) + i(vx x′ + vy y′). (19.50)

Comparing Equations (19.49) and (19.50), we find that

cx′ − dy′ = ux x′ + uy y′,

and

cy′ + dx′ = vx x′ + vy y′.

Since these last two equations must hold for any curve r(t), they must hold
when x′(t) = 0 for all t, as well as when y′(t) = 0 for all t. It follows
that c = ux, d = −uy, c = vy, and d = vx, from which we can get the
Cauchy-Riemann equations easily.
Chapter 20

Introduction to Complex Analysis (Chapter 13)

20.1 Introduction
The material in this chapter is taken mainly from Chapter 13 of the text.
In some cases, the ordering of topics has been altered slightly.

20.2 Complex-valued Functions of a Complex Variable
In complex analysis the focus is on functions w = f (z), where both z and
w are complex numbers. With z = x + iy, for x and y real, it follows that

w = f (z) = u(x, y) + iv(x, y), (20.1)

with both u(x, y) and v(x, y) real-valued functions of the two real variables
x and y. Since zx = 1 and zy = i, the differential dz is

dz = dx + idy. (20.2)

For any curve C in the complex plane the line integral of f (z) along C is
defined as
∫_C f(z) dz = ∫_C (u + iv)(dx + idy) = ∫_C u dx − v dy + i ∫_C v dx + u dy. (20.3)


20.3 Differentiability
The derivative of the function f (z) at the point z is defined to be

f′(z) = lim_{∆z→0} ( f(z + ∆z) − f(z) ) / ∆z, (20.4)
whenever this limit exists. Note that ∆z = ∆x + i∆y, so that z + ∆z is
obtained by moving a small distance away from z, by ∆x in the horizontal
direction and ∆y in the vertical direction. When the limit does exist, the
function f (z) is said to be differentiable or analytic at the point z; the
function is then continuous at z as well.
For real-valued functions of a real variable, requiring that the function
be differentiable is not a strong requirement; however, for the functions
w = f (z) it certainly is. What makes differentiability a strong condition is
that, for the complex plane, we can move away from z in infinitely many
directions, unlike in the real case, where all we can do is to move left or
right away from x. As we shall see, in order for f (z) to be differentiable,
the functions u(x, y) and v(x, y) must be related in a special way, called
the Cauchy-Riemann equations.

20.4 The Cauchy-Riemann Equations


We can rewrite the limit in Equation (20.4) as

f′(z) = lim_{∆x,∆y→0} [ (u(x + ∆x, y + ∆y) − u(x, y)) + i(v(x + ∆x, y + ∆y) − v(x, y)) ] / (∆x + i∆y). (20.5)

Suppose now that we take ∆y = 0, so that ∆z = ∆x. Then the limit in
Equation (20.5) becomes

f′(z) = lim_{∆x→0} [ (u(x + ∆x, y) − u(x, y)) + i(v(x + ∆x, y) − v(x, y)) ] / ∆x. (20.6)

Then

f′(z) = ux + ivx. (20.7)

On the other hand, if we take ∆x = 0, so that ∆z = i∆y, we get

f′(z) = lim_{∆y→0} [ (u(x, y + ∆y) − u(x, y)) + i(v(x, y + ∆y) − v(x, y)) ] / (i∆y). (20.8)

Then

f′(z) = vy − iuy. (20.9)



It follows that

ux = vy, and uy = −vx. (20.10)

For example, suppose that

f(z) = z² = (x + iy)² = (x² − y²) + i(2xy).

The derivative is f′(z) = 2z and ux = vy = 2x, while uy = −vx = −2y.


Since the Cauchy-Riemann equations must hold if f(z) is differentiable,
we can use them to find functions that are not differentiable. For example,
if f(z) is real-valued and not constant, then f(z) is not differentiable. In
this case vx = vy = 0, but ux and uy cannot both be zero. Another example
is the conjugate function f(z) = z̄ = x − iy. Here ux = 1, but vy = −1.
However, most of the real-valued differentiable functions of x can be
extended to complex-valued differentiable functions of z simply by replacing
x with z in the formulas. For example, sin z, cos z, and e^z are differentiable,
with derivatives cos z, −sin z, and e^z, respectively. The function (2z − 3)/(3z − 6) is
differentiable throughout any region that does not include the point z = 2,
and we obtain its derivative using the usual quotient rule.
One consequence of the Cauchy-Riemann equations is that the functions
u and v are harmonic, that is

uxx + uyy = 0, and vxx + vyy = 0.
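The Cauchy-Riemann equations are easy to test by finite differences. A minimal Python sketch for f(z) = z², checking ux = vy and uy = −vx at a sample point:

    def f(x, y):
        # f(z) = z^2 = (x^2 - y^2) + i(2xy), returned as the pair (u, v)
        return (x * x - y * y, 2.0 * x * y)

    def partials(x, y, h=1e-6):
        u_x = (f(x + h, y)[0] - f(x - h, y)[0]) / (2 * h)
        u_y = (f(x, y + h)[0] - f(x, y - h)[0]) / (2 * h)
        v_x = (f(x + h, y)[1] - f(x - h, y)[1]) / (2 * h)
        v_y = (f(x, y + h)[1] - f(x, y - h)[1]) / (2 * h)
        return u_x, u_y, v_x, v_y

    ux, uy, vx, vy = partials(1.5, -0.8)
    print(ux, vy)    # both 3.0 = 2x, so ux = vy
    print(uy, -vx)   # both 1.6 = -2y, so uy = -vx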

20.5 Integration
Suppose that f (z) is differentiable on and inside a simple closed curve C,
and suppose that the partial derivatives ux , uy , vx , and vy are continuous.
Using Equation (20.3) and applying Green 2-D separately to both of the
integrals, we get
∮_C f(z) dz = −∬_D (vx + uy) dxdy + i ∬_D (ux − vy) dxdy, (20.11)

where D denotes the interior of the region whose boundary is the curve C.
The Cauchy-Riemann equations tell us that both of the double integrals
are zero. Therefore, we may conclude that ∮_C f(z) dz = 0 for all such simple
closed curves C.
It is important to remember that Green 2-D is valid for regions that
have holes in them; in such cases the boundary C of the region consists
of more than one simple closed curve, so the line integral in Green 2-D is
along each of these curves separately, with the orientation such that the
region remains to the left as we traverse the line.
In a course on complex analysis it is shown that this theorem holds
without the assumptions that the first partial derivatives are continuous;

this is the Cauchy-Goursat Theorem. This theorem greatly improves the


theory of complex analysis, as we shall see.

20.6 Some Examples


Consider the function f(z) = (z − a)^n, where n is an integer. If n is a
non-negative integer, then f(z) is differentiable everywhere and

∮_C (z − a)^n dz = 0,

for every simple closed curve. But what happens when n is negative?
Let C be a simple closed curve with z = a inside C. Using the Cauchy-Goursat
Theorem, we may replace the integral around the curve C with
the integral around the circle centered at a and having radius ε, where ε is
a small positive number. Then z = a + εe^{iθ} for all z on the small circle,
and

dz = iεe^{iθ} dθ.

Then

∮_C (z − a)^n dz = ∫_0^{2π} (εe^{iθ})^n iεe^{iθ} dθ = iε^{n+1} ∫_0^{2π} e^{i(n+1)θ} dθ.

Therefore, if n + 1 is not zero, we have

∮_C (z − a)^n dz = 0,

and if n + 1 = 0,

∮_C (z − a)^{−1} dz = 2πi.
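Both cases can be confirmed numerically with the parametrization just used. A minimal Python sketch (the point a, radius, and step count are arbitrary choices):

    import cmath, math

    def contour_integral(n, a=0.5 + 0.5j, eps=0.1, steps=20000):
        # integrate (z - a)^n around the circle z = a + eps*e^{i theta}
        total = 0.0 + 0.0j
        for k in range(steps):
            th = 2.0 * math.pi * (k + 0.5) / steps
            z = a + eps * cmath.exp(1j * th)
            dz = 1j * eps * cmath.exp(1j * th) * (2.0 * math.pi / steps)
            total += (z - a) ** n * dz
        return total

    for n in (-3, -2, -1, 0, 1, 2):
        print(n, contour_integral(n))   # near 0 except n = -1, which gives 2*pi*i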

20.7 Cauchy’s Integral Theorem


Suppose that C1 and C2 are circles in the complex plane, with common
center z = a; let the radius of C2 be the smaller. Suppose that f (z) is
differentiable in a region containing the annulus whose boundaries are C1
and C2 , and that C is a simple closed curve in the annulus that surrounds
the curve C2 . Then
∮_{C1} f(z) dz = ∮_{C2} f(z) dz = ∮_C f(z) dz. (20.12)

The proof of this result is given, almost, in the worked problem 13.12 on
p. 299 of the text. The difficulty with the proof given there is that the

curve he describes as AQPABRSTBA is not a simple closed curve; this
curve repeats the part AB in both directions, so crosses itself. A more
rigorous proof replaces the return path BA with one very close to BA, call
it B′A′. Then the result is obtained by taking the limit, as the path B′A′
approaches BA.
One way to think of this theorem is that if we can morph the curve C into
C1 or C2 without passing through a point where f(z) is not differentiable,
then the integrals are the same. Now we use this fact to prove the Cauchy
Integral Theorem.
Let f (z) be differentiable on and inside a simple closed curve C, and
let a be a point inside C. Then Cauchy’s Integral Theorem tells us that
f(a) = (1/2πi) ∮_C f(z)/(z − a) dz. (20.13)
Again, we may replace the curve C with the circle centered at a and having
radius ε, for some small positive ε. Then

∮_C f(z)/(z − a) dz = ∫_0^{2π} f(a + εe^{iθ}) (εe^{iθ})^{−1} iεe^{iθ} dθ = i ∫_0^{2π} f(a + εe^{iθ}) dθ.

Letting ε ↓ 0, we get

∮_C f(z)/(z − a) dz = 2πi f(a).
Differentiating with respect to a in Cauchy’s Integral Theorem we find that

f′(a) = (1/2πi) ∮_C f(z)/(z − a)² dz, (20.14)

and more generally

f^{(n)}(a) = (n!/2πi) ∮_C f(z)/(z − a)^{n+1} dz. (20.15)
So not only is f (z) differentiable, but it has derivatives of all orders. This
is one of the main ways in which complex analysis differs from real analysis.
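Equation (20.15) is also easy to check numerically. A minimal Python sketch using f(z) = e^z, whose derivatives at a are all e^a:

    import cmath, math

    def cauchy_coefficient(f, a, n, radius=1.0, steps=20000):
        # (n!/(2*pi*i)) times the contour integral of f(z)/(z - a)^(n+1)
        total = 0.0 + 0.0j
        for k in range(steps):
            th = 2.0 * math.pi * (k + 0.5) / steps
            z = a + radius * cmath.exp(1j * th)
            dz = 1j * radius * cmath.exp(1j * th) * (2.0 * math.pi / steps)
            total += f(z) / (z - a) ** (n + 1) * dz
        return math.factorial(n) * total / (2j * math.pi)

    a = 0.3 + 0.2j
    for n in range(4):
        print(n, cauchy_coefficient(cmath.exp, a, n), cmath.exp(a))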

20.8 Taylor Series Expansions


When we study Taylor series expansions for real-valued functions of a real
variable x, we find that the function f(x) = 1/(x² + 1) poses a bit of a
mystery. We learn that

1/(x² + 1) = 1 − x² + x⁴ − x⁶ + ...,

and that this series converges for |x| < 1 only. But why? The function
f(x) is differentiable for all x, so why shouldn’t the Taylor series converge
for all x, as the Taylor series for sin x or e^x do? The answer comes when
we consider the complex extension, f(z) = 1/(z² + 1). This function is
undefined at z = i and z = −i. The Taylor series for f(z) converges in the
largest circle centered at a = 0 that does not contain a point where f(z)
fails to be differentiable, so must converge only within a circle of radius
one. This must apply on the real line as well.
Let f be differentiable on and inside a circle C centered at a and let
a + h be inside C. Then

f(a + h) = a_0 + a_1 h + a_2 h² + ..., (20.16)

where a_n = f^{(n)}(a)/n!. The Taylor series converges in the largest circle
centered at z = a that does not include a point where f(z) is not differentiable.
Taking a + h inside C and using the Cauchy Integral Theorem, we have

f(a + h) = (1/2πi) ∮_C f(z)/(z − (a + h)) dz.

Writing

1/(z − (a + h)) = 1/((z − a) − h) = (z − a)^{−1} · 1/(1 − h(z − a)^{−1}),

we have

1/(z − (a + h)) = (z − a)^{−1} [1 + h(z − a)^{−1} + h²(z − a)^{−2} + ...].

The series converges because z lies on C and a + h is inside, so that the
absolute value of the ratio h/(z − a) is less than one. Equation (20.16)
follows now from Equation (20.15).

20.9 Laurent Series: An Example


Suppose now that C1 and C2 are concentric circles with common center a,
and the radius of C2 is the smaller. We assume that the function f (z) is
differentiable in a region containing the annulus bounded by C1 and C2 , but
perhaps not differentiable at some points inside C2 . We want an infinite
series expansion for f (z) that is valid within the annulus. We begin with
an example.

20.9.1 Expansion Within an Annulus


Let f(z) = (7z − 2)/((z + 1)z(z − 2)). Then f(z) is not differentiable at
z = −1, z = 0, and z = 2. Suppose that we want a series expansion for
f(z) in terms of powers of z + 1, valid within the annulus 1 < |z + 1| < 3.

To simplify the calculations we replace z with t = z + 1, so that

f(t) = (7t − 9)/( t(t − 1)(t − 3) ),

and seek a series representation of f(t) in terms of powers of t. Using
partial fractions, we obtain

f(t) = −3t^{−1} + (t − 1)^{−1} + 2(t − 3)^{−1}.

For (t − 1)^{−1} we have

(t − 1)^{−1} = 1/(t − 1) = t^{−1} ( 1/(1 − t^{−1}) ) = t^{−1} [1 + t^{−1} + t^{−2} + ...],

which converges for |t| > 1. For (t − 3)^{−1} we have

(t − 3)^{−1} = (−1/3) ( 1/(1 − t/3) ) = (−1/3) [1 + t/3 + (t/3)² + ...],

which converges for |t| < 3. Putting all this together, we get

f(z) = −3(z + 1)^{−1} + (z + 1)^{−1} [1 + (z + 1)^{−1} + (z + 1)^{−2} + ...] − (2/3) [1 + (1/3)(z + 1) + (1/9)(z + 1)² + ...].
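A minimal numerical sketch of this expansion, evaluating a truncated version of the series at a point of the annulus 1 < |z + 1| < 3 and comparing with f(z) directly (the truncation level is an arbitrary choice that is ample at this point):

    def f(z):
        return (7 * z - 2) / ((z + 1) * z * (z - 2))

    def laurent(z, terms=60):
        t = z + 1                                # expansion variable, 1 < |t| < 3
        total = -3.0 / t                         # the -3 t^{-1} term
        total += sum(t ** (-(k + 1)) for k in range(terms))                # (t - 1)^{-1}
        total += -(2.0 / 3.0) * sum((t / 3.0) ** k for k in range(terms))  # 2 (t - 3)^{-1}
        return total

    z = 1.0 + 0.5j                               # |z + 1| is about 2.06, inside the annulus
    print(f(z), laurent(z))                      # agree to many digits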

20.9.2 Expansion Within the Inner Circle


Suppose now that we want a Laurent series expansion of f(z) in powers of
z + 1 that is convergent within the circle centered at z = −1, with radius
one. The function f(z)(z + 1) is differentiable within that circle, and so
has a Taylor series expansion there. To get the Laurent series expansion
for f(z) we simply move the factor z + 1 to the other side, multiplying the
Taylor expansion by (z + 1)^{−1}.
In this case, we write (t − 1)^{−1} as

(t − 1)^{−1} = 1/(t − 1) = −1/(1 − t) = −[1 + t + t² + t³ + ...],

which converges for |t| < 1. The series for (t − 3)^{−1} remains the same as
before.

20.10 Laurent Series Expansions


Let C1 and C2 be two concentric circles with common center a, with the
radius of C2 the smaller. Let f (z) be differentiable in a region containing
the annulus bounded by C1 and C2 . Let a + h be inside the annulus and C
any simple closed curve in the annulus that surrounds the inner circle C2 .
Then

f(a + h) = Σ_{n=−∞}^{∞} a_n h^n, (20.17)

for

a_n = (1/2πi) ∮_C f(z)/(z − a)^{n+1} dz. (20.18)

Note that for non-negative n the integral in Equation (20.18) need not equal
f^{(n)}(a)/n!, since the function f(z) need not be differentiable inside of C2. This
theorem is discussed in problem 13.82 of the text.
To prove this, we use the same approach as in problem 13.12 of the
text. What we find is that, since the two curves form the boundary of the
annulus, Cauchy’s Integral Theorem becomes
f(a + h) = (1/2πi) ∮_{C1} f(z)/(z − (a + h)) dz − (1/2πi) ∮_{C2} f(z)/(z − (a + h)) dz. (20.19)
To obtain the desired result we write the expression 1/(z − (a + h)) in two ways,
depending on whether z lies on C1 or on C2. For z on C1 we write

1/(z − (a + h)) = (z − a)^{−1} [1 + h/(z − a) + ( h/(z − a) )² + ...], (20.20)

while for z on C2 we write

1/(z − (a + h)) = −h^{−1} [1 + (z − a)/h + ( (z − a)/h )² + ...]. (20.21)
Then

∮_{C1} f(z)/(z − (a + h)) dz = Σ_{n=0}^{∞} h^n ∮_{C1} f(z)/(z − a)^{n+1} dz, (20.22)

and

∮_{C2} f(z)/(z − (a + h)) dz = −Σ_{n=−∞}^{−1} h^n ∮_{C2} f(z)/(z − a)^{n+1} dz. (20.23)

Both integrals are equivalent to integrals over the curve C. The desired
result now follows by substituting these expansions into Equation (20.19).

20.11 Residues
Suppose now that we want to integrate f(z) over the simple closed curve
C in the previous section. From Equation (20.18) and n + 1 = 0 we see
that

∮_C f(z) dz = (2πi) a_{−1}. (20.24)

Note that if f(z) is also differentiable inside of C2 then a_{−1} = 0 and the
integral is also zero.
If (z − a)^m f(z) is differentiable on and inside C then the Laurent expansion
becomes

f(z) = a_{−m}(z − a)^{−m} + a_{−m+1}(z − a)^{−m+1} + ... + Σ_{n=0}^{∞} a_n (z − a)^n. (20.25)

If a_{−m} is not zero, then f(z) is said to have a pole of order m at z = a.
We then have

∮_C f(z) dz = (2πi) a_{−1}; (20.26)

the number a_{−1} is called the residue of f(z) at the point z = a. Furthermore,
we have

a_{−1} = lim_{z→a} (1/(m − 1)!) (d^{m−1}/dz^{m−1}) [ (z − a)^m f(z) ]. (20.27)

Note that we can replace the curve C with an arbitrarily small circle cen-
tered at z = a.
If f(z) is differentiable on and inside a simple closed curve C, except
for a finite number of poles, then (1/2πi) ∮_C f(z) dz is the sum of the residues
at these poles; this is the Residue Theorem (see problem 13.25 of the text).
For example, consider again the function f(z) = (7z − 2)/((z + 1)z(z − 2)). For the
annulus 1 < |z + 1| < 3 and the curve C the circle of radius two centered
at z = −1, we have

∮_C f(z) dz = −4πi,

since the residues of f(z) are −3 at the pole z = −1 and 1 at the pole
z = 0, both inside C.
For a second example, consider the function f(z) = z²/((z² + 1)(z − 2)). The
points z = 2, z = i and z = −i are poles of order one. The residue of f(z)
at z = 2 is

lim_{z→2} (z − 2) f(z) = 4/5.
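Both computations can be confirmed with a direct numerical contour integral. A minimal Python sketch for the first example, integrating around the circle |z + 1| = 2:

    import cmath, math

    def f(z):
        return (7 * z - 2) / ((z + 1) * z * (z - 2))

    center, radius, steps = -1.0, 2.0, 20000
    total = 0.0 + 0.0j
    for k in range(steps):
        th = 2.0 * math.pi * (k + 0.5) / steps
        z = center + radius * cmath.exp(1j * th)
        dz = 1j * radius * cmath.exp(1j * th) * (2.0 * math.pi / steps)
        total += f(z) * dz

    print(total, -4j * math.pi)   # both about -12.566i = 2*pi*i*(-3 + 1)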

20.12 The Binomial Theorem


Let f(z) = 1/( z(z + 2)³ ). Suppose that we want to represent f(z) as a Laurent
series involving powers of z. We write

f(z) = (1/(8z)) (1 + z/2)^{−3}.

We need to expand (1 + z/2)^{−3} as a Taylor series involving powers of z.
The binomial theorem tells us that, for any positive integer N,

(1 + x)^N = Σ_{n=0}^{N} (N choose n) x^n, (20.28)

for

(N choose n) = N!/( n!(N − n)! ). (20.29)
Now if α is any real number, we would like to have

(1 + x)^α = Σ_{n=0}^{∞} (α choose n) x^n; (20.30)

The function

f(z) = (1 + z)^α (20.31)

is analytic in the region |z| < 1, and so has a Taylor-series expansion of the
form

(1 + z)^α = a_0 + a_1 z + a_2 z² + ..., (20.32)

where

a_n = f^{(n)}(0)/n! = α(α − 1)(α − 2)···(α − (n − 1))/n!. (20.33)

This tells us how to define (α choose n). We can also see how to do it when we write

(N choose n) = N(N − 1)(N − 2)···(N − (n − 1))/n!; (20.34)

we now write

(α choose n) = α(α − 1)(α − 2)···(α − (n − 1))/n!. (20.35)

Using this extended binomial theorem we have

(1 + z/2)^{−3} = 1 − (3/2)z + (3/2)z² − (5/4)z³ + .... (20.36)
Therefore, we have

f(z) = 1/( z(z + 2)³ ) = 1/(8z) − 3/16 + (3/16)z − (5/32)z² + ... (20.37)

The residue of f(z) at the point z = 0 is then 1/8. Since (z + 2)³ f(z) = z^{−1},
whose second derivative is 2z^{−3}, Equation (20.27) gives the residue of f(z)
at the point z = −2 as (1/2!)( 2(−2)^{−3} ) = −1/8.

20.13 Using Residues


Suppose now that we want to find ∫_0^{2π} 1/(5 + 3 sin θ) dθ. Let z = e^{iθ}. Then

sin θ = (e^{iθ} − e^{−iθ})/(2i) = (z − z^{−1})/(2i),

and

dz = ie^{iθ} dθ = iz dθ.

The integral is then

∫_0^{2π} 1/(5 + 3 sin θ) dθ = ∮_C 2/(3z² + 10iz − 3) dz, (20.38)

where C is the circle of radius one, centered at the origin. The poles of
the integrand are z = −3i and z = −i/3, both of order one. Only the pole
z = −i/3 lies within C. The residue of the integrand at the pole z = −i/3 is
1/(4i), so the integral has the value π/2.
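A minimal numerical check of this value, comparing a direct Riemann sum for the θ-integral with π/2:

    import math

    n = 100000
    total = 0.0
    for k in range(n):
        th = 2.0 * math.pi * (k + 0.5) / n
        total += 1.0 / (5.0 + 3.0 * math.sin(th)) * (2.0 * math.pi / n)

    print(total, math.pi / 2)   # both 1.5707...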
We conclude this chapter with a quick look at several of the more im-
portant consequences of the theory developed so far.

20.14 Cauchy’s Estimate


Again, let f (z) be analytic in a region R, let z0 be in R, and let C within
R be the circle with center z0 and radius r. Let M be the maximum value
of |f (z)| for z on C. Using Equation (20.15), it is easy to show that
|f^{(n)}(z0)| ≤ n!M/r^n. (20.39)
Note that M depends on r and probably changes as r changes. As we shall
see now, this relatively simple calculation has important consequences.

20.15 Liouville’s Theorem


Suppose now that f (z) is analytic throughout the complex plane; that is,
f (z) is an entire function. Suppose also that f (z) is bounded, that is, there
is a constant B > 0 such that |f (z)| ≤ B, for all z. Applying Equation
(20.39) for the case of n = 1, we get

|f′(z0)| ≤ B/r,

where the B now does not change as r changes. Then we write

|f′(z0)| r ≤ B.

Unless f′(z0) = 0, the left side goes to infinity, as r → ∞, while the right
side stays constant. Therefore, f′(z0) = 0 for all z0 and f(z) is constant.
This is Liouville’s Theorem.

20.16 The Fundamental Theorem of Algebra


The fundamental theorem of algebra tells us that every polynomial of de-
gree greater than zero has a (possibly non-real) root. We can prove this
using Liouville’s Theorem.
Suppose P(z) is such a polynomial and P(z) has no roots. Then P(z)^{−1}
is analytic everywhere in the complex plane. Since |P(z)| → ∞ as |z| → ∞,
it follows that P(z)^{−1} is a bounded function. By Liouville’s Theorem,
P(z)^{−1} must be constant; but this is not true. So P(z) must be zero for
some z.

20.17 Morera’s Theorem


We know that if f(z) is analytic in a region R, then ∮_C f(z) dz = 0 for
every simple closed curve C in R. Morera’s Theorem is the converse.

Theorem 20.1 Let f(z) be continuous in a region R and such that ∮_C f(z) dz =
0 for every simple closed curve C in R. Then f(z) is analytic in R.
Proof: Let f(z) = u(x, y) + iv(x, y). For arbitrary fixed z0 = (x0, y0) and
variable z = (x, y) in R, define

F(z) = ∫_{z0}^{z} f(w) dw.

Then, writing
F (z) = U (x, y) + iV (x, y),
we can easily show that
U(x, y) = ∫_{(x0,y0)}^{(x,y)} u dx − v dy,

and

V(x, y) = ∫_{(x0,y0)}^{(x,y)} v dx + u dy.

It follows then, from our previous discussions of the Green’s identities, that
Ux = u, Uy = −v, Vx = v and Vy = u. Therefore, Ux = Vy and Uy = −Vx ;
that is, the Cauchy-Riemann Equations are satisfied. Therefore, since these
partial derivatives are continuous, we can conclude that F(z) is analytic in
R. But then, so is F′(z) = f(z).
Chapter 21

The Quest for Invisibility (Chapter 5,6)

21.1 Invisibility: Fact and Fiction


The military uses special materials and clever design in its stealth technol-
ogy to build aircraft that are nearly invisible to radar. Fictional characters
have it much easier; J.K. Rowling’s hero Harry Potter becomes invisible
when he wraps himself in his special cloak, and the Romulans in Star
Trek can make their entire fleet of ships invisible by selectively bending
light rays. In his Republic Plato, another best-selling author, has his hero
Socrates and his friend Glaucon discuss ethical behavior. Socrates asserts
that being honest and just is a good thing in its own right. Glaucon coun-
ters by recalling the mythical shepherd Gyges, who found a magic ring
that he could use to make himself invisible. Glaucon wonders if anyone
would behave honestly if no one could know whether he did, and there was no
possibility of punishment if he did not.
As Kurt Bryan and Tanya Leise discuss in the article [7], recent research
in impedance tomography suggests that it may be possible, through the use
of special meta-materials with carefully designed microstructure, to render
certain objects invisible to certain electromagnetic probing. This chapter
is a brief sketch of the theory; for more detail, see [7].

21.2 The Electro-Static Theory


Suppose that Ω is the open disk with center at the origin and radius one
in two-dimensional space, and ∂Ω is its boundary, the circle with radius
one centered at the origin. The points of ∂Ω are denoted (cos θ, sin θ) for
0 ≤ θ < 2π.


Let f (θ) describe a time-independent distribution of electrical charge


along the boundary. Then f (θ) induces an electro-static field E(x, y) within
Ω. We know that there is an electro-static potential function u(x, y) such
that E(x, y) = −∇u(x, y).
If f is constant, then so are u and E. If the disk Ω is made of a
perfectly homogeneous conducting material, then current will flow within
Ω; the current vector at (x, y) is denoted J(x, y) and J(x, y) = γE(x, y),
where γ > 0 is the constant conductance. The component of the current
field normal to the boundary at any point is
∂u/∂n(θ) = ∇u · n = −(1/γ) J · n,
where n = n(θ) is the unit outward normal at θ. This outward component
of the current will also be constant over all θ.
If f is not constant, then the induced potential u(x, y) will vary with
(x, y), as will the field E(x, y). Finding the induced potential u(x, y) from
f (θ) is called the Dirichlet problem.
If the conductance is not constant within Ω, then each point (x, y)
will have a direction of maximum conductance and an orthogonal direction
of minimum conductance. Using these as the eigenvectors of a positive-
definite matrix S = S(x, y), we have

∇ · (S∇u) = 0,

and J = S∇u.

21.3 Impedance Tomography


In impedance tomography we attempt to determine the potential u(x, y)
within Ω by first applying a current at the points of the boundary, and then
measuring the outward flux of the induced electro-static field at points of
the boundary. The measured outward flow is called the Neumann data.
When the conductivity within Ω is changed, the relationship between
the applied current and the measured outward flux changes. This suggests
that when there is a non-conducting region D within a homogeneous Ω, we
can detect it by noting the change in the measured outward flux.

21.4 Cloaking
Suppose we want to hide a conducting object within a non-conducting
region D. We can do this, but it will still be possible to “see” the presence
of D and determine its size. If D is large enough to conceal an object of
a certain size, then one might become suspicious. What we need to do is

to make it look like the region D is smaller than it really is, or is not even
there.
By solving Laplace’s equation for the region between the outer bound-
ary, where we have measured the flux, and the inner boundary of D, where
the flux is zero, we can see how the size of D is reflected in the solution
obtained. The presence of D distorts the potential function, and therefore
the measured flux. The key to invisibility is to modify the conductivity in
the region surrounding D in such a way that all (or, at least, most) of the
distortion takes place well inside the boundary, so that at the boundary
the potential looks undistorted.
For more mathematical details and discussion of the meta-materials
needed to achieve this, see [7].
Chapter 22

Calculus of Variations (Chapter 16)

22.1 Introduction
In optimization, we are usually concerned with maximizing or minimizing
real-valued functions of one or several variables, possibly subject to con-
straints. In this chapter, we consider another type of optimization problem,
maximizing or minimizing a function of functions. The functions them-
selves we shall denote by simply y = y(x), instead of the more common
notation y = f (x), and the function of functions will be denoted J(y); in
the calculus of variations, such functions of functions are called functionals.
We then want to optimize J(y) over a class of admissible functions y(x). We
shall focus on the case in which x is a single real variable, although there
are situations in which the functions y are functions of several variables.
When we attempt to minimize a function g(x1 , ..., xN ), we consider
what happens to g when we perturb the values xn to xn + ∆xn . In order
for x = (x1 , ..., xN ) to minimize g, it is necessary that
g(x1 + ∆x1 , ..., xN + ∆xN ) ≥ g(x1 , ..., xN ),
for all perturbations ∆x1 , ..., ∆xN . For differentiable g, this means that
the gradient of g at x must be zero. In the calculus of variations, when
we attempt to minimize J(y), we need to consider what happens when we
perturb the function y to a nearby admissible function, denoted y + ∆y. In
order for y to minimize J(y), we need
J(y + ∆y) ≥ J(y),
for all ∆y that make y + ∆y admissible. We end up with something anal-
ogous to a first derivative of J, which is then set to zero. The result is a


differential equation, called the Euler-Lagrange Equation, which must be


satisfied by the minimizing y.

22.2 Some Examples


In this section we present some of the more famous examples of problems
from the calculus of variations.

22.2.1 The Shortest Distance


Among all the functions y = y(x), defined for x in the interval [0, 1], with
y(0) = 0 and y(1) = 1, the straight-line function y(x) = x has the shortest
length. Assuming the functions are differentiable, the formula for the length
of such curves is
J(y) = ∫_0^1 √( 1 + (dy/dx)² ) dx. (22.1)

Therefore, we can say that the function y(x) = x minimizes J(y), over all
such functions.
In this example, the functional J(y) involves only the first derivative of
y = y(x) and has the form
J(y) = ∫ f(x, y(x), y′(x)) dx, (22.2)

where f = f(u, v, w) is the function of three variables

f(u, v, w) = √(1 + w²). (22.3)

In general, the functional J(y) can come from almost any function f (u, v, w).
In fact, if higher derivatives of y(x) are involved, the function f can be a
function of more than three variables. In this chapter we shall confine our
discussion to problems involving only the first derivative of y(x).

22.2.2 The Brachistochrone Problem


Consider a frictionless wire connecting the two points A = (0, 0) and B =
(1, 1); for convenience, the positive y-axis is downward. A metal ball rolls
from point A to point B under the influence of gravity. What shape should
the wire take in order to make the travel time of the ball the smallest? This
famous problem, known as the Brachistochrone Problem, was posed in 1696
by Johann Bernoulli. This event is viewed as marking the beginning of the
calculus of variations.

The velocity of the ball along the curve is v = ds/dt, where s denotes the
arc-length. Therefore,

dt = ds/v = (1/v) √( 1 + (dy/dx)² ) dx.
Because the ball is falling under the influence of gravity only, the velocity
it attains after falling from (0, 0) to (x, y) is the same as it would have
attained had it fallen y units vertically; only the travel times are different.
This is because the loss of potential energy is the same either way. The
velocity attained after a vertical free fall of y units is √(2gy). Therefore, we
have

dt = √( 1 + (dy/dx)² ) dx / √(2gy).
The travel time from A to B is therefore
J(y) = (1/√(2g)) ∫_0^1 √( 1 + (dy/dx)² ) (1/√y) dx. (22.4)

For this example, the function f(u, v, w) is

f(u, v, w) = √(1 + w²)/√v. (22.5)

22.2.3 Minimal Surface Area


Given a function y = y(x) with y(0) = 1 and y(1) = 0, we imagine revolving
this curve around the x-axis, to generate a surface of revolution. The
functional J(y) that we wish to minimize now is the surface area (dropping
the constant factor 2π). Therefore, we have

J(y) = ∫_0^1 y √( 1 + y′(x)² ) dx. (22.6)

Now the function f(u, v, w) is

f(u, v, w) = v√(1 + w²). (22.7)

22.2.4 The Maximum Area


Among all curves of length L connecting the points (0, 0) and (1, 0), find
the one for which the area A of the region bounded by the curve and the
x-axis is maximized. The length of the curve is given by
L = ∫_0^1 √( 1 + y′(x)² ) dx, (22.8)

and the area, assuming that y(x) ≥ 0 for all x, is


A = ∫_0^1 y(x) dx. (22.9)

This problem is different from the previous ones, in that we seek to optimize
a functional, subject to a second functional being held fixed. Such problems
are called problems with constraints.

22.2.5 Maximizing Burg Entropy


The Burg entropy of a positive-valued function y(x) on [−π, π] is
BE(y) = ∫_{−π}^{π} log( y(x) ) dx. (22.10)

An important problem in signal processing is to maximize BE(y), subject
to

r_n = ∫_{−π}^{π} y(x) e^{−inx} dx, (22.11)

for |n| ≤ N. The r_n are values of the Fourier transform of the function
y(x).

22.3 Comments on Notation


The functionals J(y) that we shall consider in this chapter have the form
J(y) = ∫ f(x, y(x), y′(x)) dx, (22.12)

where f = f(u, v, w) is some function of three real variables. It is common
practice, in the calculus of variations literature, to speak of f = f(x, y, y′),
rather than f(u, v, w). Unfortunately, this leads to potentially confusing
notation, such as when ∂f/∂u is written as ∂f/∂x, which is not the same thing as
the total derivative of f(x, y(x), y′(x)),

(d/dx) f(x, y(x), y′(x)) = ∂f/∂x + (∂f/∂y) y′(x) + (∂f/∂y′) y″(x). (22.13)
Using the notation of this chapter, Equation (22.13) becomes

(d/dx) f(x, y(x), y′(x)) = (∂f/∂u)(x, y(x), y′(x)) + (∂f/∂v)(x, y(x), y′(x)) y′(x) + (∂f/∂w)(x, y(x), y′(x)) y″(x). (22.14)

The common notation forces us to view f(x, y, y′) both as a function of
three unrelated variables, x, y, and y′, and as f(x, y(x), y′(x)), a function
of the single variable x.
For example, suppose that

f(u, v, w) = u² + v³ + sin w,

and

y(x) = 7x².

Then

f(x, y(x), y′(x)) = x² + (7x²)³ + sin(14x), (22.15)

(∂f/∂x)(x, y(x), y′(x)) = 2x, (22.16)

and

(d/dx) f(x, y(x), y′(x)) = (d/dx)( x² + (7x²)³ + sin(14x) ) = 2x + 3(7x²)²(14x) + 14 cos(14x). (22.17)

22.4 The Euler-Lagrange Equation


In the problems we shall consider in this chapter, admissible functions are
differentiable, with y(x1) = y1 and y(x2) = y2; that is, the graphs of the
admissible functions pass through the end points (x1, y1) and (x2, y2). If
y = y(x) is one such function and η(x) is a differentiable function with
η(x1) = 0 and η(x2) = 0, then y(x) + εη(x) is admissible, for all values of
ε. For fixed admissible function y = y(x), we define

J(ε) = J(y(x) + εη(x)), (22.18)

and force J′(ε) = 0 at ε = 0. The tricky part is calculating J′(ε).


Since J(y(x) + εη(x)) has the form

J(y(x) + εη(x)) = ∫_{x1}^{x2} f(x, y(x) + εη(x), y′(x) + εη′(x)) dx, (22.19)

we obtain J′(ε) by differentiating under the integral sign.
Omitting the arguments, we have

J′(ε) = ∫_{x1}^{x2} ( (∂f/∂v) η + (∂f/∂w) η′ ) dx. (22.20)

Using integration by parts and η(x1) = η(x2) = 0, we have

∫_{x1}^{x2} (∂f/∂w) η′ dx = −∫_{x1}^{x2} (d/dx)(∂f/∂w) η dx. (22.21)

Therefore, we have

J′(ε) = ∫_{x1}^{x2} ( ∂f/∂v − (d/dx)(∂f/∂w) ) η dx. (22.22)

In order for y = y(x) to be the optimal function, this integral must be zero
for every appropriate choice of η(x), when ε = 0. It can be shown without
too much trouble that this forces

∂f/∂v − (d/dx)(∂f/∂w) = 0. (22.23)

Equation (22.23) is the Euler-Lagrange Equation.


For clarity, let us rewrite that Euler-Lagrange Equation using the ar-
guments of the functions involved. Equation (22.23) is then

∂f d  ∂f 
(x, y(x), y 0 (x)) − (x, y(x), y 0 (x)) = 0. (22.24)
∂v dx ∂w

22.5 Special Cases of the Euler-Lagrange Equation
The Euler-Lagrange Equation simplifies in certain special cases. Here we
consider two cases: 1) when f (u, v, w) is independent of the variable v, as
in Equation (22.3); and 2) when f (u, v, w) is independent of the variable
u, as in Equations (22.5) and (22.7).

22.5.1 If f is independent of v

If the function f(u, v, w) is independent of the variable v, then the Euler-Lagrange Equation (22.24) becomes
$$\frac{\partial f}{\partial w}(x,y(x),y'(x)) = c, \tag{22.25}$$
for some constant c. If, in addition, the function f(u, v, w) is a function of w alone, then so is ∂f/∂w, and we conclude from the Euler-Lagrange Equation that y'(x) is constant.

22.5.2 If f is independent of u

Note that we can write
$$\frac{d}{dx}f(x,y(x),y'(x)) = \frac{\partial f}{\partial u}(x,y(x),y'(x)) + \frac{\partial f}{\partial v}(x,y(x),y'(x))\,y'(x) + \frac{\partial f}{\partial w}(x,y(x),y'(x))\,y''(x). \tag{22.26}$$
We also have
$$\frac{d}{dx}\Big[y'(x)\,\frac{\partial f}{\partial w}(x,y(x),y'(x))\Big] = y'(x)\,\frac{d}{dx}\Big[\frac{\partial f}{\partial w}(x,y(x),y'(x))\Big] + y''(x)\,\frac{\partial f}{\partial w}(x,y(x),y'(x)). \tag{22.27}$$
Subtracting Equation (22.27) from Equation (22.26), we get
$$\frac{d}{dx}\Big[f(x,y(x),y'(x)) - y'(x)\,\frac{\partial f}{\partial w}(x,y(x),y'(x))\Big] = \frac{\partial f}{\partial u}(x,y(x),y'(x)) + y'(x)\Big(\frac{\partial f}{\partial v} - \frac{d}{dx}\frac{\partial f}{\partial w}\Big)(x,y(x),y'(x)). \tag{22.28}$$
Now, using the Euler-Lagrange Equation, we see that Equation (22.28) reduces to
$$\frac{d}{dx}\Big[f(x,y(x),y'(x)) - y'(x)\,\frac{\partial f}{\partial w}(x,y(x),y'(x))\Big] = \frac{\partial f}{\partial u}(x,y(x),y'(x)). \tag{22.29}$$
If it is the case that ∂f/∂u = 0, then Equation (22.29) leads to
$$f(x,y(x),y'(x)) - y'(x)\,\frac{\partial f}{\partial w}(x,y(x),y'(x)) = c, \tag{22.30}$$
for some constant c.

22.6 Using the Euler-Lagrange Equation


We derive and solve the Euler-Lagrange Equation for each of the examples
presented previously.

22.6.1 The Shortest Distance


In this case, we have
$$f(u,v,w) = \sqrt{1+w^2}, \tag{22.31}$$
so that
$$\frac{\partial f}{\partial v} = 0 \quad\text{and}\quad \frac{\partial f}{\partial u} = 0.$$
We conclude that y'(x) is constant, so y(x) is a straight line.

22.6.2 The Brachistochrone Problem


Equation (22.5) tells us that
$$f(u,v,w) = \frac{\sqrt{1+w^2}}{\sqrt{v}}. \tag{22.32}$$
Then, since
$$\frac{\partial f}{\partial u} = 0 \quad\text{and}\quad \frac{\partial f}{\partial w} = \frac{w}{\sqrt{1+w^2}\,\sqrt{v}},$$
Equation (22.30) tells us that
$$\frac{\sqrt{1+y'(x)^2}}{\sqrt{y(x)}} - y'(x)\,\frac{y'(x)}{\sqrt{1+y'(x)^2}\,\sqrt{y(x)}} = c. \tag{22.33}$$
Equivalently, we have
$$\sqrt{y(x)}\,\sqrt{1+y'(x)^2} = \sqrt{a}. \tag{22.34}$$
Solving for y'(x), we get
$$y'(x) = \sqrt{\frac{a-y(x)}{y(x)}}. \tag{22.35}$$
Separating variables and integrating, using the substitution
$$y = a\sin^2\theta = \frac{a}{2}(1-\cos 2\theta),$$
we obtain
$$x = \int 2a\sin^2\theta\,d\theta = \frac{a}{2}(2\theta - \sin 2\theta) + k. \tag{22.36}$$
From this, we learn that the minimizing curve is a cycloid, that is, the path traced by a point on a circle as the circle rolls along a line.
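
As a sanity check (my addition, assuming SymPy), the parametrized cycloid does satisfy the first integral (22.34):

```python
import sympy as sp

theta, a = sp.symbols('theta a', positive=True)

# Cycloid from Equation (22.36), taking k = 0:
x = (a / 2) * (2 * theta - sp.sin(2 * theta))
y = (a / 2) * (1 - sp.cos(2 * theta))

yprime = sp.diff(y, theta) / sp.diff(x, theta)   # dy/dx by the chain rule
print(sp.simplify(y * (1 + yprime**2) - a))      # 0, so y(1 + y'^2) = a
```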
There is an interesting connection, discussed by Simmons in [42], between the brachistochrone problem and the refraction of light rays. Imagine a ray of light passing from the point A = (0, a), with a > 0, to the point B = (c, b), with c > 0 and b < 0. Suppose that the speed of light is v₁ above the x-axis, and v₂ < v₁ below the x-axis. The path consists of two straight lines, meeting at the point (x, 0). The total time for the journey is then
$$T(x) = \frac{\sqrt{a^2+x^2}}{v_1} + \frac{\sqrt{b^2+(c-x)^2}}{v_2}.$$
Fermat's Principle of Least Time says that the (apparent) path taken by the light ray will be the one for which x minimizes T(x). From calculus, it follows that
$$\frac{x}{v_1\sqrt{a^2+x^2}} = \frac{c-x}{v_2\sqrt{b^2+(c-x)^2}},$$
and from geometry, we get Snell's Law:
$$\frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2},$$
where α₁ and α₂ denote the angles between the upper and lower parts of the path and the vertical, respectively.
Imagine now a stratified medium consisting of many horizontal layers, each with its own speed of light. The path taken by the light would be such that (sin α)/v remains constant as the ray passes from one layer to the next. In the limit of infinitely many infinitely thin layers, the path taken by the light would satisfy the equation (sin α)/v = constant, with
$$\sin\alpha = \frac{1}{\sqrt{1+y'(x)^2}}.$$
As we have already seen, the velocity attained by the rolling ball is v = √(2gy), so the equation to be satisfied by the path y(x) is
$$\sqrt{2gy(x)}\,\sqrt{1+y'(x)^2} = \text{constant},$$
which is what we obtained from the Euler-Lagrange Equation.



22.6.3 Minimizing the Surface Area


For the problem of minimizing the surface area of a surface of revolution, the function f(u, v, w) is
$$f(u,v,w) = v\sqrt{1+w^2}. \tag{22.37}$$
Once again, ∂f/∂u = 0, so we have
$$\frac{y(x)y'(x)^2}{\sqrt{1+y'(x)^2}} - y(x)\sqrt{1+y'(x)^2} = c. \tag{22.38}$$
It follows that
$$y(x) = b\cosh\Big(\frac{x-a}{b}\Big), \tag{22.39}$$
for appropriate a and b.
It is important to note that being a solution of the Euler-Lagrange Equa-
tion is a necessary condition for a differentiable function to be a solution
to the original optimization problem, but it is not a sufficient condition.
The optimal solution may not be a differentiable one, or there may be no
optimal solution. In the case of minimum surface area, there may not be
any function of the form in Equation (22.39) passing through the two given
end points; see Chapter IV of Bliss [2] for details.
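
The existence issue is visible numerically. In the sketch below (my own, assuming SciPy; the end-point height h = 2 is an arbitrary choice), symmetric end points y(−1) = y(1) = h force a = 0 in (22.39), and b must solve b·cosh(1/b) = h, which has two roots, one root, or none:

```python
import numpy as np
from scipy.optimize import brentq

h = 2.0
g = lambda b: b * np.cosh(1.0 / b) - h

# b*cosh(1/b) has its minimum value (about 1.509) near b ~ 0.834; for h
# below that value no catenary passes through the end points (cf. Bliss).
b_deep = brentq(g, 0.1, 0.834)       # deeper catenary (not the minimizer)
b_shallow = brentq(g, 0.834, 50.0)   # shallower catenary (the minimizer)
print(b_deep, b_shallow)
```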

22.7 Problems with Constraints


We turn now to the problem of optimizing one functional, subject to a
second functional being held constant. The basic technique is similar to
ordinary optimization subject to constraints: we use Lagrange multipliers.
We begin with a classic example.

22.7.1 The Isoperimetric Problem


A classic problem in the calculus of variations is the Isoperimetric Prob-
lem: find the curve of a fixed length that encloses the largest area. For
concreteness, suppose the curve connects the two points (0, 0) and (1, 0)
and is the graph of a function y(x). The problem then is to maximize the
area integral
$$\int_0^1 y(x)\,dx, \tag{22.40}$$
subject to the perimeter being held fixed, that is,
$$\int_0^1\sqrt{1+y'(x)^2}\,dx = P. \tag{22.41}$$
With
$$f(x,y(x),y'(x)) = y(x) + \lambda\sqrt{1+y'(x)^2},$$
the Euler-Lagrange Equation becomes
$$\frac{d}{dx}\Big(\frac{\lambda y'(x)}{\sqrt{1+y'(x)^2}}\Big) - 1 = 0, \tag{22.42}$$
or
$$\frac{y'(x)}{\sqrt{1+y'(x)^2}} = \frac{x-a}{\lambda}. \tag{22.43}$$
Using the substitution t = (x−a)/λ and integrating, we find that
$$(x-a)^2 + (y-b)^2 = \lambda^2, \tag{22.44}$$
which is the equation of a circle. So the optimal function y(x) is a portion of a circle.
What happens if the assigned perimeter P is greater than π/2, the length of the semicircle connecting (0, 0) and (1, 0)? In this case, the desired curve is not the graph of a function of x, but a parameterized curve of the form (x(t), y(t)), for, say, t in the interval [0, 1]. Now we have one independent variable, t, but two dependent ones, x and y. We need a generalization of the Euler-Lagrange Equation to the multivariate case.

22.7.2 Burg Entropy

According to the Euler-Lagrange Equation for this case, we have
$$\frac{1}{y(x)} + \sum_{n=-N}^{N}\lambda_n e^{-inx} = 0, \tag{22.45}$$
or
$$y(x) = 1\Big/\sum_{n=-N}^{N}a_n e^{inx}. \tag{22.46}$$
The spectral factorization theorem [37] tells us that if the denominator is positive for all x, then it can be written as
$$\sum_{n=-N}^{N}a_n e^{inx} = \Big|\sum_{m=0}^{N}b_m e^{imx}\Big|^2. \tag{22.47}$$
With a bit more work (see [10]), it can be shown that the desired coefficients b_m are the solution to the system of equations
$$\sum_{m=0}^{N}r_{m-k}\,b_m = 0, \tag{22.48}$$
for k = 1, 2, ..., N, and
$$\sum_{m=0}^{N}r_m\,b_m = 1. \tag{22.49}$$

22.8 The Multivariate Case


Suppose that the integral to be optimized is
$$J(x,y) = \int_a^b f\big(t, x(t), x'(t), y(t), y'(t)\big)\,dt, \tag{22.50}$$
where f(u, v, w, s, r) is a real-valued function of five variables. In such cases, the Euler-Lagrange Equation is replaced by the two equations
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial w}\Big) - \frac{\partial f}{\partial v} = 0, \qquad \frac{d}{dt}\Big(\frac{\partial f}{\partial r}\Big) - \frac{\partial f}{\partial s} = 0. \tag{22.51}$$
We apply this now to the problem of maximum area for a fixed perimeter.
We know from Green's Theorem in two dimensions that the area A enclosed by a curve C is given by the integral
$$A = \frac{1}{2}\oint_C(x\,dy - y\,dx) = \frac{1}{2}\int_0^1\big(x(t)y'(t) - y(t)x'(t)\big)\,dt. \tag{22.52}$$
The perimeter P of the curve is
$$P = \int_0^1\sqrt{x'(t)^2 + y'(t)^2}\,dt. \tag{22.53}$$

So the problem is to maximize the integral in Equation (22.52), subject to


the integral in Equation (22.53) being held constant.
The problem is solved by using a Lagrange multiplier. We write
$$J(x,y) = \int_0^1\Big(\frac{1}{2}\big(x(t)y'(t) - y(t)x'(t)\big) + \lambda\sqrt{x'(t)^2+y'(t)^2}\Big)\,dt. \tag{22.54}$$
The generalized Euler-Lagrange Equations are
$$\frac{d}{dt}\Big(\frac{1}{2}x(t) + \frac{\lambda y'(t)}{\sqrt{x'(t)^2+y'(t)^2}}\Big) + \frac{1}{2}x'(t) = 0, \tag{22.55}$$
and
$$\frac{d}{dt}\Big(-\frac{1}{2}y(t) + \frac{\lambda x'(t)}{\sqrt{x'(t)^2+y'(t)^2}}\Big) - \frac{1}{2}y'(t) = 0. \tag{22.56}$$
It follows that
$$y(t) - \frac{\lambda x'(t)}{\sqrt{x'(t)^2+y'(t)^2}} = c, \tag{22.57}$$
and
$$x(t) + \frac{\lambda y'(t)}{\sqrt{x'(t)^2+y'(t)^2}} = d. \tag{22.58}$$
Therefore,
$$(x-d)^2 + (y-c)^2 = \lambda^2. \tag{22.59}$$
The optimal curve is then a portion of a circle.

22.9 Finite Constraints


Let x, y and z be functions of the independent variable t, with ẋ = x'(t). Suppose that we want to minimize the functional
$$J(x,y,z) = \int_a^b f(x,\dot{x},y,\dot{y},z,\dot{z})\,dt,$$
subject to the constraint
$$G(x,y,z) = 0.$$
Here we suppose that the points (x(t), y(t), z(t)) describe a curve in space
and that the condition G(x(t), y(t), z(t)) = 0 restricts the curve to the
surface G(x, y, z) = 0. Such a problem is said to be one of finite constraints.
In this section we illustrate this type of problem by considering the geodesic
problem.

22.9.1 The Geodesic Problem


The space curve (x(t), y(t), z(t)), defined for a ≤ t ≤ b, lies on the surface described by G(x, y, z) = 0 if G(x(t), y(t), z(t)) = 0 for all t in [a, b]. The geodesic problem is to find the curve of shortest length lying on the surface and connecting the points A = (a₁, a₂, a₃) and B = (b₁, b₂, b₃). The functional to be minimized is the arc length
$$J = \int_a^b\sqrt{\dot{x}^2+\dot{y}^2+\dot{z}^2}\,dt, \tag{22.60}$$
where ẋ = dx/dt. Here the function f is
$$f(x,\dot{x},y,\dot{y},z,\dot{z}) = \sqrt{\dot{x}^2+\dot{y}^2+\dot{z}^2}.$$

We assume that the equation G(x, y, z) = 0 can be rewritten as

z = g(x, y),

that is, we assume that we can solve for the variable z, and that the function
g has continuous second partial derivatives. We may not be able to do this
for the entire surface, as the equation of a sphere G(x, y, z) = x2 + y 2 +
z 2 − r2 = 0 illustrates, but we can usually solve for z, or one of the other
variables, on part of the surface, as, for example, on the upper or lower
hemisphere.
We then have
$$\dot{z} = g_x\dot{x} + g_y\dot{y} = g_x(x(t),y(t))\,\dot{x}(t) + g_y(x(t),y(t))\,\dot{y}(t), \tag{22.61}$$
where g_x = ∂g/∂x.
Substituting for ż in Equation (22.60), we see that the problem is now to minimize the functional
$$J = \int_a^b\sqrt{\dot{x}^2 + \dot{y}^2 + (g_x\dot{x}+g_y\dot{y})^2}\,dt, \tag{22.62}$$
which we write as
$$J = \int_a^b F(x,\dot{x},y,\dot{y})\,dt. \tag{22.63}$$

The Euler-Lagrange Equations are then
$$\frac{\partial F}{\partial x} - \frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) = 0, \tag{22.64}$$
and
$$\frac{\partial F}{\partial y} - \frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{y}}\Big) = 0. \tag{22.65}$$
We want to rewrite the Euler-Lagrange equations.

Lemma 22.1 We have
$$\frac{\partial\dot{z}}{\partial x} = \frac{d}{dt}(g_x).$$
Proof: From Equation (22.61) we have
$$\frac{\partial\dot{z}}{\partial x} = \frac{\partial}{\partial x}(g_x\dot{x} + g_y\dot{y}) = g_{xx}\dot{x} + g_{yx}\dot{y}.$$
We also have
$$\frac{d}{dt}(g_x) = \frac{d}{dt}\big(g_x(x(t),y(t))\big) = g_{xx}\dot{x} + g_{xy}\dot{y}.$$

Since g_{xy} = g_{yx}, the assertion of the lemma follows.

From the Lemma we have both
$$\frac{\partial\dot{z}}{\partial x} = \frac{d}{dt}(g_x), \tag{22.66}$$
and
$$\frac{\partial\dot{z}}{\partial y} = \frac{d}{dt}(g_y). \tag{22.67}$$
Using
$$\frac{\partial F}{\partial x} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial(g_x\dot{x}+g_y\dot{y})}{\partial x} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial x}$$
and
$$\frac{\partial F}{\partial y} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial y},$$
we can rewrite the Euler-Lagrange Equations as
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + g_x\,\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = 0, \tag{22.68}$$
and
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{y}}\Big) + g_y\,\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = 0. \tag{22.69}$$

To see why this is the case, we reason as follows. First,
$$\frac{\partial F}{\partial\dot{x}} = \frac{\partial f}{\partial\dot{x}} + \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial\dot{x}} = \frac{\partial f}{\partial\dot{x}} + g_x\,\frac{\partial f}{\partial\dot{z}},$$
so that
$$\frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(g_x\frac{\partial f}{\partial\dot{z}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial f}{\partial\dot{z}}\,\frac{d}{dt}(g_x) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial x}.$$
Therefore,
$$\frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial F}{\partial x},$$

so that
$$0 = \frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) - \frac{\partial F}{\partial x} = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x. \tag{22.70}$$
Let the function λ(t) be defined by
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = \lambda(t)\,G_z.$$
From G(x, y, z) = 0 and z = g(x, y), we have
$$H(x,y) = G(x,y,g(x,y)) = 0.$$
Then we have
$$H_x = G_x + G_z g_x = 0,$$
so that
$$g_x = -\frac{G_x}{G_z};$$
similarly, we have
$$g_y = -\frac{G_y}{G_z}.$$
Then the Euler-Lagrange Equations become
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) = \lambda(t)\,G_x, \tag{22.71}$$
and
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{y}}\Big) = \lambda(t)\,G_y. \tag{22.72}$$
Eliminating λ(t) and extending the result to include z as well, we have
$$\frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{x}}\big)}{G_x} = \frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{y}}\big)}{G_y} = \frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{z}}\big)}{G_z}. \tag{22.73}$$

Notice that we could obtain the same result by calculating the Euler-Lagrange Equation for the functional
$$\int_a^b\Big(f(\dot{x},\dot{y},\dot{z}) + \lambda(t)\,G(x(t),y(t),z(t))\Big)\,dt. \tag{22.74}$$

22.9.2 An Example

Let the surface be a sphere, with equation
$$0 = G(x,y,z) = x^2 + y^2 + z^2 - r^2.$$
Then Equation (22.73) becomes
$$\frac{f\ddot{x} - \dot{x}\dot{f}}{2xf^2} = \frac{f\ddot{y} - \dot{y}\dot{f}}{2yf^2} = \frac{f\ddot{z} - \dot{z}\dot{f}}{2zf^2}.$$
We can rewrite these equations as
$$\frac{\ddot{x}y - x\ddot{y}}{\dot{x}y - x\dot{y}} = \frac{y\ddot{z} - z\ddot{y}}{y\dot{z} - z\dot{y}} = \frac{\dot{f}}{f}.$$
The numerators are the derivatives, with respect to t, of the denominators, which leads to
$$\log|x\dot{y} - y\dot{x}| = \log|y\dot{z} - z\dot{y}| + c_1.$$
Therefore,
$$x\dot{y} - y\dot{x} = c_1(y\dot{z} - z\dot{y}).$$
Rewriting, we obtain
$$\frac{\dot{x} + c_1\dot{z}}{x + c_1z} = \frac{\dot{y}}{y},$$
or
$$x + c_1z = c_2y,$$
which is a plane through the origin. The geodesics on the sphere are great circles, that is, the intersection of the sphere with a plane through the origin.
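
A numerical illustration (not in the original notes; it assumes SciPy): integrate the geodesic equations on the sphere and confirm that the path stays in a plane through the origin.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Geodesics on the sphere |r| = R, traversed at constant speed, satisfy
# r'' = -(|r'|^2 / R^2) r  (acceleration is purely normal to the surface).
R = 1.0

def rhs(t, u):
    r, v = u[:3], u[3:]
    return np.concatenate([v, -(v @ v) / R**2 * r])

r0 = np.array([1.0, 0.0, 0.0])
v0 = np.array([0.0, 0.6, 0.8])        # tangent direction: r0 . v0 = 0
sol = solve_ivp(rhs, [0, 10], np.concatenate([r0, v0]), rtol=1e-10, atol=1e-10)

n = np.cross(r0, v0)                   # normal of the plane through the origin
print(np.abs(sol.y[:3].T @ n).max())   # ~0: the path stays in that plane
```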

22.10 Hamilton’s Principle and the Lagrangian


22.10.1 Generalized Coordinates
Suppose there are J particles at positions rj (t) = (xj (t), yj (t), zj (t)), with
masses mj , for j = 1, 2, ..., J. Assume that there is a potential function
V (x1 , y1 , z1 , ..., xJ , yJ , zJ ) such that the force acting on the jth particle is
$$F_j = -\Big(\frac{\partial V}{\partial x_j},\ \frac{\partial V}{\partial y_j},\ \frac{\partial V}{\partial z_j}\Big).$$
The kinetic energy is then
$$T = \frac{1}{2}\sum_{j=1}^{J}m_j\big(\dot{x}_j^2 + \dot{y}_j^2 + \dot{z}_j^2\big).$$

Suppose also that the positions of the particles are constrained by the
conditions
φi (x1 , y1 , z1 , ..., xJ , yJ , zJ ) = 0,
for i = 1, ..., I. Then there are N = 3J − I generalized coordinates q1 , ..., qN
describing the behavior of the particles.
For example, suppose that there is one particle moving on the surface
of a sphere with radius R. Then the constraint is that

$$x^2 + y^2 + z^2 = R^2.$$

The generalized coordinates can be chosen to be the two angles describing


position on the surface, or latitude and longitude, say.
We then have
$$\dot{x}_j = \sum_{n=1}^{N}\frac{\partial x_j}{\partial q_n}\,\dot{q}_n,$$

with similar expressions for the other time derivatives.

22.10.2 Homogeneity and Euler’s Theorem


A function f(u, v, w) is said to be n-homogeneous if
$$f(tu, tv, tw) = t^nf(u,v,w),$$
for any scalar t. The kinetic energy T is 2-homogeneous in the variables q̇ₙ.

Lemma 22.2 Let f(u, v, w) be n-homogeneous. Then
$$\frac{\partial f}{\partial u}(au, av, aw) = a^{n-1}\,\frac{\partial f}{\partial u}(u,v,w). \tag{22.75}$$
Proof: We write
$$\frac{\partial f}{\partial u}(au,av,aw) = \lim_{\Delta\to 0}\frac{f(au+a\Delta, av, aw) - f(au,av,aw)}{a\Delta} = \frac{a^n}{a}\,\frac{\partial f}{\partial u}(u,v,w) = a^{n-1}\,\frac{\partial f}{\partial u}(u,v,w).$$

Theorem 22.1 (Euler's Theorem) Let f(u, v, w) be n-homogeneous. Then
$$u\frac{\partial f}{\partial u}(u,v,w) + v\frac{\partial f}{\partial v}(u,v,w) + w\frac{\partial f}{\partial w}(u,v,w) = nf(u,v,w). \tag{22.76}$$
Proof: Define g(a) = f(au, av, aw), so that
$$g'(a) = u\frac{\partial f}{\partial u}(au,av,aw) + v\frac{\partial f}{\partial v}(au,av,aw) + w\frac{\partial f}{\partial w}(au,av,aw).$$
Using Equation (22.75) we have
$$g'(a) = a^{n-1}\Big(u\frac{\partial f}{\partial u}(u,v,w) + v\frac{\partial f}{\partial v}(u,v,w) + w\frac{\partial f}{\partial w}(u,v,w)\Big).$$
But we also know that
$$g(a) = a^nf(u,v,w),$$
so that
$$g'(a) = na^{n-1}f(u,v,w).$$
It follows that
$$u\frac{\partial f}{\partial u}(u,v,w) + v\frac{\partial f}{\partial v}(u,v,w) + w\frac{\partial f}{\partial w}(u,v,w) = nf(u,v,w).$$

Since the kinetic energy T is 2-homogeneous in the variables q̇ₙ, it follows that
$$2T = \sum_{n=1}^{N}\frac{\partial T}{\partial\dot{q}_n}\,\dot{q}_n. \tag{22.77}$$

22.10.3 Hamilton’s Principle


The Lagrangian is defined to be
$$L(q_1,...,q_N,\dot{q}_1,...,\dot{q}_N) = T - V.$$
Hamilton's principle is then that the paths taken by the particles are such that the integral
$$\int_{t_1}^{t_2}L(t)\,dt = \int_{t_1}^{t_2}\big(T(t) - V(t)\big)\,dt$$
is minimized. Consequently, the paths must satisfy the Euler-Lagrange equations
$$\frac{\partial L}{\partial q_n} - \frac{d}{dt}\frac{\partial L}{\partial\dot{q}_n} = 0,$$
for each n. Since the variable t does not appear explicitly in L, we know that
$$\sum_{n=1}^{N}\frac{\partial L}{\partial\dot{q}_n}\,\dot{q}_n - L = E,$$
for some constant E. Noting that
$$\frac{\partial L}{\partial\dot{q}_n} = \frac{\partial T}{\partial\dot{q}_n},$$
since V does not depend on the variables q̇ₙ, and using Equation (22.77), we find that
$$E = 2T - L = 2T - (T-V) = T + V,$$
so that the sum of the kinetic and potential energies is constant.
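
For a concrete instance (my addition, assuming SymPy), the Lagrangian of a pendulum yields its equation of motion through the same Euler-Lagrange machinery:

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

t, m, g, R = sp.symbols('t m g R', positive=True)
q = sp.Function('q')  # generalized coordinate: the pendulum angle

# Pendulum of length R: T = (1/2) m R^2 q'^2, V = -m g R cos(q).
L = sp.Rational(1, 2) * m * R**2 * q(t).diff(t)**2 + m * g * R * sp.cos(q(t))
print(euler_equations(L, [q(t)], [t]))
# -> an equation equivalent to R q'' + g sin(q) = 0, the pendulum equation
```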

22.11 Sturm-Liouville Differential Equations


We have seen how optimizing a functional can lead to a differential equation
that must be solved. If we are given a differential equation to solve, it can
be helpful to know if it is the Euler-Lagrange equation for some functional.
For example, the Sturm-Liouville differential equations have the form
$$\frac{d}{dx}\Big(p(x)\frac{dy}{dx}\Big) + \big(q(x) + \lambda r(x)\big)y = 0.$$
This differential equation is the Euler-Lagrange equation for the constrained problem of minimizing the functional
$$\int_{x_1}^{x_2}\Big(p(x)(y'(x))^2 - q(x)(y(x))^2\Big)\,dx,$$
subject to
$$\int_{x_1}^{x_2}r(x)(y(x))^2\,dx = 1.$$

We have more to say about these differential equations elsewhere in these


notes.

22.12 Exercises
Exercise 22.1 Suppose that the cycloid in the brachistochrone problem connects the starting point (0, 0) with the point (πa, −2a), where a > 0. Show that the time required for the ball to reach the point (πa, −2a) is $\pi\sqrt{a/g}$.

Exercise 22.2 Show that, for the situation in the previous exercise, the time required for the ball to reach (πa, −2a) is again $\pi\sqrt{a/g}$, if the ball begins rolling at any intermediate point along the cycloid. This is the tautochrone property of the cycloid.
Chapter 23

Sturm-Liouville Problems
(Chapter 10,11)

23.1 Recalling Some Matrix Theory


In this chapter we stress the similarities between special types of linear
differential operators and Hermitian matrices. We begin with a review of
the relevant linear algebra.
Every linear operator T on the vector space CN of N -dimensional com-
plex column vectors is multiplication by an N by N matrix; that is, there
is a complex matrix A such that T (x) = Ax for all x in CN . The space CN
is an inner-product space under the usual inner product, or dot product,
⟨x, y⟩ given by
$$\langle x, y\rangle = \sum_{n=1}^{N}x_n\overline{y_n}. \tag{23.1}$$
Note that the inner product can be written as
$$\langle x, y\rangle = y^\dagger x. \tag{23.2}$$
We call a matrix A “real” if all its entries are real numbers. A matrix A
is Hermitian if A† = A, where A† denotes the conjugate transpose of A. If
A is real and Hermitian then AT = A, so A is symmetric.
We have defined what it means for A to be real and to be Hermitian
in terms of the entries of A; if we are to extend these notions to linear
differential operators we will need to define these notions differently. It is
easy to see that A is real if and only if ⟨Au, v⟩ is a real number for every real u and v in C^N, and A is Hermitian if and only if
$$\langle Au, v\rangle = \langle u, Av\rangle, \tag{23.3}$$
for every u and v in C^N. These are definitions that we will be able to extend later.


The Hermitian matrices have the nicest properties, ones we wish to extend to linear differential operators. A non-zero vector u in C^N is an eigenvector of A with associated eigenvalue λ if Au = λu.

Proposition 23.1 If A is Hermitian, then all its eigenvalues are real.
Proof: Let Au = λu, with u ≠ 0. We have
$$\langle Au, u\rangle = \langle\lambda u, u\rangle = \lambda\langle u, u\rangle,$$
and
$$\langle Au, u\rangle = \langle u, Au\rangle = \langle u, \lambda u\rangle = \overline{\lambda}\langle u, u\rangle.$$
Since ⟨u, u⟩ is not zero, we may conclude that λ = λ̄, so that λ is a real number.

Proposition 23.2 If A is Hermitian, Au_m = λ_m u_m and Au_n = λ_n u_n, with λ_m ≠ λ_n, then ⟨u_m, u_n⟩ = 0, so u_m and u_n are orthogonal.

Proof: We have
$$\langle Au_m, u_n\rangle = \lambda_m\langle u_m, u_n\rangle,$$
and, since the eigenvalue λ_n is real,
$$\langle Au_m, u_n\rangle = \langle u_m, Au_n\rangle = \lambda_n\langle u_m, u_n\rangle.$$
Since λ_m ≠ λ_n, it follows that ⟨u_m, u_n⟩ = 0.
When we change the inner product on CN the Hermitian matrices may
no longer be the ones we focus on. For any inner product on CN we say
that a matrix B is self-adjoint if

$$\langle Bu, v\rangle = \langle u, Bv\rangle, \tag{23.4}$$
for all u and v in C^N. For example, suppose that Q is a positive-definite N by N matrix, which means that Q = C², where C is a Hermitian, invertible matrix. We then define the Q-inner product to be
$$\langle u, v\rangle_Q = v^\dagger Qu = (Cv)^\dagger Cu. \tag{23.5}$$

We say that a matrix B is self-adjoint with respect to the Q-inner product if
$$\langle Bu, v\rangle_Q = \langle u, Bv\rangle_Q, \tag{23.6}$$
or, equivalently,
$$v^\dagger QBu = (Bv)^\dagger Qu = v^\dagger B^\dagger Qu, \tag{23.7}$$
for all u and v in C^N. This means that QB = B†Q, or that the matrix QB is Hermitian. If QB = BQ, so that B and Q commute, then B† = B and B is Hermitian; in general, however, B being self-adjoint for the Q-inner product is different from B being Hermitian.
For a general linear operator T on an inner-product space, we shall say that T is self-adjoint for the given inner product if
$$\langle Tu, v\rangle = \langle u, Tv\rangle, \tag{23.8}$$
for all u and v in the space.
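
These matrix facts are easy to test numerically. The sketch below (an illustration of mine, assuming NumPy; the random matrices are arbitrary) checks that a Hermitian matrix has real eigenvalues, and constructs a matrix self-adjoint for a Q-inner product:

```python
import numpy as np

rng = np.random.default_rng(0)
herm = lambda X: (X + X.conj().T) / 2    # Hermitian part of X

# A random Hermitian matrix has (numerically) real eigenvalues.
A = herm(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
print(np.linalg.eigvalsh(A))

# Self-adjointness for <u, v>_Q = v^H Q u means QB is Hermitian,
# so B = Q^{-1} H with H Hermitian is an example.
C = herm(rng.standard_normal((4, 4)))
Q = C @ C + 4 * np.eye(4)                # positive definite
H = herm(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
B = np.linalg.solve(Q, H)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.isclose(v.conj() @ Q @ (B @ u), (B @ v).conj() @ Q @ u))  # True
```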

23.2 The Sturm-Liouville Form


We begin with the second-order linear homogeneous ordinary differential equation in standard form,
$$y''(x) + P(x)y'(x) + Q(x)y(x) = 0. \tag{23.9}$$
Let F(x) = ∫P denote an anti-derivative of P(x), that is, F'(x) = P(x), and let S(x) = exp(F(x)). Then
$$S(x)y''(x) + S(x)F'(x)y'(x) + S(x)Q(x)y(x) = 0, \tag{23.10}$$
so that
$$\frac{d}{dx}\big(S(x)y'(x)\big) + S(x)Q(x)y(x) = 0, \tag{23.11}$$
or
$$\frac{d}{dx}\big(p(x)y'(x)\big) + g(x)y(x) = 0. \tag{23.12}$$
This is the Sturm-Liouville form for the differential equation in Equation (23.9).
We shall be particularly interested in differential equations having the Sturm-Liouville form
$$\frac{d}{dx}\big(p(x)y'(x)\big) - w(x)q(x)y(x) + \lambda w(x)y(x) = 0, \tag{23.13}$$
where w(x) > 0 and λ is a constant. Rewriting Equation (23.13) as
$$-\frac{1}{w(x)}\frac{d}{dx}\big(p(x)y'(x)\big) + q(x)y(x) = \lambda y(x) \tag{23.14}$$
suggests an analogy with the linear algebra eigenvalue problem
$$Au = \lambda u, \tag{23.15}$$
where A is a square matrix, λ is an eigenvalue of A, and u ≠ 0 is an associated eigenvector. It also suggests that we study the linear differential operator
$$(Ly)(x) = -\frac{1}{w(x)}\frac{d}{dx}\big(p(x)y'(x)\big) + q(x)y(x) \tag{23.16}$$
to see if we can carry the analogy with linear algebra further.

23.3 Inner Products and Self-Adjoint Differential Operators

For the moment, let V₀ be the vector space of complex-valued integrable functions f(x), defined for a ≤ x ≤ b, for which
$$\int_a^b|f(x)|^2\,dx < \infty.$$
For any f and g in V₀ the inner product of f and g is then
$$\langle f, g\rangle = \int_a^b f(x)\overline{g(x)}\,dx. \tag{23.17}$$
Let V₁ be the subspace of functions y(x) in V₀ that are twice continuously differentiable. Now let V be the subspace of V₁ consisting of all y(x) with y(a) = y(b) = 0. A linear operator T on V is said to be self-adjoint with respect to the inner product in Equation (23.17) if
$$\int_a^b(Tf)(x)\overline{g(x)}\,dx = \int_a^b f(x)\overline{(Tg)(x)}\,dx, \tag{23.18}$$
for all f(x) and g(x) in V.

23.3.1 An Example of a Self-Adjoint Operator

The linear differential operator Sy = iy' is self-adjoint on V. Using integration by parts and f(a) = f(b) = 0, we have
$$\langle Sf, g\rangle = i\int_a^b f'(x)\overline{g(x)}\,dx = i\Big[f(x)\overline{g(x)}\Big]_a^b - i\int_a^b f(x)\overline{g'(x)}\,dx = \int_a^b f(x)\overline{ig'(x)}\,dx = \langle f, Sg\rangle.$$

23.3.2 Another Example


The linear differential operator
$$Ty = y''$$
is defined on the subspace V.

Proposition 23.3 The operator T y = y 00 is self-adjoint on V .



Proof: Note that T = −S². Therefore, we have
$$\langle Tf, g\rangle = -\langle S^2f, g\rangle = -\langle Sf, Sg\rangle = -\langle f, S^2g\rangle = \langle f, Tg\rangle.$$
It is useful to note that
$$\langle Ty, y\rangle = -\int_a^b|y'(x)|^2\,dx \le 0,$$
for all y(x) in V, which prompts us to say that the differential operator (−T)y = S²y = −y'' is non-negative definite. We then expect all eigenvalues of −T to be non-negative. We know, in particular, that the solutions of
$$-y''(x) = \lambda y(x),$$
with y(0) = y(1) = 0, are y_m(x) = sin(mπx), and the eigenvalues are λ_m = m²π².
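
A finite-difference check (my addition, assuming NumPy): discretizing −y'' on [0, 1] with zero end conditions gives a symmetric matrix whose smallest eigenvalues approximate m²π²:

```python
import numpy as np

n = 500
h = 1.0 / n
main = 2.0 * np.ones(n - 1)
off = -1.0 * np.ones(n - 2)
A = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
vals = np.linalg.eigvalsh(A)
print(vals[:3])                            # ~ [9.87, 39.48, 88.82]
print([(m * np.pi)**2 for m in (1, 2, 3)])
```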

23.3.3 The Sturm-Liouville Operator


We turn now to the differential operator L given by Equation (23.16). We take V₀ to be all complex-valued integrable functions f(x) with
$$\int_a^b|f(x)|^2w(x)\,dx < \infty.$$
We let the inner product of any f(x) and g(x) in V₀ be
$$\langle f, g\rangle = \int_a^b f(x)\overline{g(x)}\,w(x)\,dx. \tag{23.19}$$
Let V₁ be all functions in V₀ that are twice continuously differentiable, and V all the functions y(x) in V₁ with y(a) = y(b) = 0. We then have the following result.

Theorem 23.1 The operator L given by Equation (23.16) is self-adjoint on the inner product space V.

Proof: From
$$(pyz' - pzy')' = (pz')'y - (py')'z$$
we have
$$(Ly)z - y(Lz) = \frac{1}{w(x)}\frac{d}{dx}\big(pyz' - py'z\big).$$

Therefore,
$$\int_a^b\big((Ly)z - y(Lz)\big)w(x)\,dx = \big(pyz' - py'z\big)\Big|_a^b = 0.$$
Therefore, L is self-adjoint on V.
It is interesting to note that
$$\langle Ly, y\rangle = \int_a^b p\,|y'|^2\,dx + \int_a^b q\,|y|^2\,w\,dx,$$
so that, if we have p(x) ≥ 0 and q(x) ≥ 0, then the operator L is non-negative-definite and we expect all its eigenvalues to be non-negative.
A square matrix Q is non-negative definite if and only if it has the form Q = C², for some Hermitian matrix C; the non-negative definite matrices are therefore analogous to the non-negative real numbers in that each is a square. As we just saw, the differential operator Ly = −y'' is self-adjoint and non-negative definite. By analogy with the matrix case, we would expect to be able to write the operator L as L = C², where C is some self-adjoint linear differential operator. In fact, this is true, with C given by Cy = Sy = iy'.

23.4 Orthogonality
Once again, let V be the space of all twice continuously differentiable func-
tions y(x) on [a, b] with y(a) = y(b) = 0. Let λm and λn be distinct
eigenvalues of the linear differential operator L given by Equation (23.16),
with associated eigenfunctions um (x) and un (x), respectively. Let the inner
product on V be given by Equation (23.19).
Theorem 23.2 The eigenfunctions um (x) and un (x) are orthogonal.
Proof: We have
$$\frac{d}{dx}\big(p(x)u_m'(x)\big) - w(x)q(x)u_m(x) = -\lambda_m u_m(x)w(x),$$
and
$$\frac{d}{dx}\big(p(x)u_n'(x)\big) - w(x)q(x)u_n(x) = -\lambda_n u_n(x)w(x),$$
so that
$$u_n(x)\frac{d}{dx}\big(p(x)u_m'(x)\big) - w(x)q(x)u_m(x)u_n(x) = -\lambda_m u_m(x)u_n(x)w(x),$$
and
$$u_m(x)\frac{d}{dx}\big(p(x)u_n'(x)\big) - w(x)q(x)u_m(x)u_n(x) = -\lambda_n u_m(x)u_n(x)w(x).$$

Subtracting one equation from the other, we get
$$u_n(x)\frac{d}{dx}\big(p(x)u_m'(x)\big) - u_m(x)\frac{d}{dx}\big(p(x)u_n'(x)\big) = (\lambda_n - \lambda_m)u_m(x)u_n(x)w(x).$$
The left side of the previous equation can be written as
$$u_n(x)\frac{d}{dx}\big(p(x)u_m'(x)\big) - u_m(x)\frac{d}{dx}\big(p(x)u_n'(x)\big) = \frac{d}{dx}\Big(p(x)u_n(x)u_m'(x) - p(x)u_m(x)u_n'(x)\Big).$$
Therefore,
$$(\lambda_n - \lambda_m)\int_a^b u_m(x)u_n(x)w(x)\,dx = \Big(p(x)u_n(x)u_m'(x) - p(x)u_m(x)u_n'(x)\Big)\Big|_a^b = 0. \tag{23.20}$$
Since λ_m ≠ λ_n, it follows that
$$\int_a^b u_m(x)u_n(x)w(x)\,dx = 0.$$

Note that it is not necessary to have u_m(a) = u_m(b) = 0 for all m in order for the right side of Equation (23.20) to be zero; it is enough to have
$$p(a)u_m(a) = p(b)u_m(b) = 0.$$
We shall make use of this fact in our discussion of Bessel’s and Legendre’s
equations.

23.5 Normal Form of Sturm-Liouville Equations

We can put an equation in the Sturm-Liouville form into normal form by first writing it in standard form. There is a better way, though. With the change of variable from x to µ, where
$$\mu(x) = \int_a^x\frac{1}{p(t)}\,dt,$$
so that
$$\mu'(x) = 1/p(x),$$
we can show that
$$\frac{dy}{dx} = \frac{1}{p(x)}\,\frac{dy}{d\mu}$$
and
$$\frac{d^2y}{dx^2} = \frac{1}{p^2}\,\frac{d^2y}{d\mu^2} - \frac{p'(x)}{p(x)^2}\,\frac{dy}{d\mu}.$$
It follows that the equation takes the form
$$\frac{d^2y}{d\mu^2} + q_1(\mu)\,y = 0. \tag{23.21}$$
For that reason, we study equations of the form
$$y'' + q(x)y = 0. \tag{23.22}$$

23.6 Examples
In this section we present several examples. We shall study these in more
detail later in these notes.

23.6.1 Wave Equations


Separating the variables to solve wave equations leads to important ordi-
nary differential equations.

The Homogeneous Vibrating String


The wave equation for the homogeneous vibrating string is
$$T\frac{\partial^2u}{\partial x^2} = m\frac{\partial^2u}{\partial t^2}, \tag{23.23}$$
where T is the constant tension and m the constant mass density. Separating the variables leads to the differential equation
$$-y''(x) = \lambda y(x). \tag{23.24}$$

The Non-homogeneous Vibrating String


When the mass density m(x) varies with x, the resulting wave equation becomes
$$T\frac{\partial^2u}{\partial x^2} = m(x)\frac{\partial^2u}{\partial t^2}. \tag{23.25}$$
Separating the variables leads to the differential equation
$$-\frac{T}{m(x)}\,y''(x) = \lambda y(x). \tag{23.26}$$

The Vibrating Hanging Chain


In the hanging chain problem, considered in more detail later, the tension is not constant along the chain, since at each point it depends on the weight of the part of the chain below. The wave equation becomes
$$\frac{\partial^2u}{\partial t^2} = g\frac{\partial}{\partial x}\Big(x\frac{\partial u}{\partial x}\Big). \tag{23.27}$$
Separating the variables leads to the differential equation
$$-g\frac{d}{dx}\Big(x\frac{dy}{dx}\Big) = \lambda y(x). \tag{23.28}$$
Note that all three of these differential equations have the form
$$Ly = \lambda y,$$
for L given by Equation (23.16).
If we make the change of variable
$$z = 2\sqrt{\frac{\lambda x}{g}},$$
the differential equation in (23.28) becomes
$$z^2\frac{d^2y}{dz^2} + z\frac{dy}{dz} + (z^2 - 0^2)y = 0. \tag{23.29}$$
As we shall see shortly, this is a special case of Bessel's Equation, with ν = 0.

23.6.2 Bessel’s Equations


For each non-negative constant ν, the associated Bessel's Equation is
$$x^2y''(x) + xy'(x) + (x^2 - \nu^2)y(x) = 0. \tag{23.30}$$
Note that the differential equation in Equation (23.28) has the form Ly = λy, but Equation (23.29) was obtained by a change of variable that absorbed the λ into the z, so we do not expect this form of the equation to be in eigenvalue form. However, we can rewrite Equation (23.30) as
$$-\frac{1}{x}\frac{d}{dx}\big(xy'(x)\big) + \frac{\nu^2}{x^2}\,y(x) = y(x), \tag{23.31}$$
which is in the form of a Sturm-Liouville eigenvalue problem, with w(x) = x = p(x), q(x) = ν²/x², and λ = 1. As we shall discuss again in the chapter

on Bessel's Equations, we can use this fact to obtain a family of orthogonal eigenfunctions.
Let us fix ν and denote by J_ν(x) a solution of Equation (23.30). Then J_ν(x) solves the eigenvalue problem in Equation (23.31), for λ = 1. A little calculation shows that for any a the function u(x) = J_ν(ax) satisfies the eigenvalue problem
$$-\frac{1}{x}\frac{d}{dx}\big(xy'(x)\big) + \frac{\nu^2}{x^2}\,y(x) = a^2y(x). \tag{23.32}$$
Let the γ_m > 0 be the positive roots of J_ν(x) and define y_m(x) = J_ν(γ_m x) for each m. Then we have
$$-\frac{1}{x}\frac{d}{dx}\big(xy_m'(x)\big) + \frac{\nu^2}{x^2}\,y_m(x) = \gamma_m^2\,y_m(x), \tag{23.33}$$
and y_m(1) = 0 for each m. We have the following result.

Theorem 23.3 Let γ_m and γ_n be distinct positive zeros of J_ν(x). Then
$$\int_0^1 y_m(x)y_n(x)\,x\,dx = 0.$$
Proof: The proof is quite similar to the proof of Theorem 23.2. The main point is that now
$$\Big(x\,y_n(x)y_m'(x) - x\,y_m(x)y_n'(x)\Big)\Big|_0^1 = 0,$$
because y_m(1) = 0 for all m and the function p(x) = x is zero when x = 0.
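
Theorem 23.3 is easy to confirm numerically (a sketch of mine, assuming SciPy is available):

```python
import numpy as np
from scipy.special import jv, jn_zeros
from scipy.integrate import quad

nu = 0
g1, g2 = jn_zeros(nu, 2)       # first two positive zeros of J_0

# Weighted orthogonality on [0, 1] from Theorem 23.3; result is ~0.
val, _ = quad(lambda x: x * jv(nu, g1 * x) * jv(nu, g2 * x), 0, 1)
print(val)
```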

23.6.3 Legendre’s Equations


Legendre's equations have the form
$$(1-x^2)y''(x) - 2xy'(x) + p(p+1)y(x) = 0, \tag{23.34}$$
where p is a constant. When p = n is a non-negative integer, there is a solution P_n(x) that is a polynomial of degree n, containing only even or only odd powers, according as n is even or odd; P_n(x) is called the nth Legendre polynomial. Since the differential equation in (23.34) can be written as
$$-\frac{d}{dx}\Big((1-x^2)y'(x)\Big) = p(p+1)y(x), \tag{23.35}$$
it is a Sturm-Liouville eigenvalue problem with w(x) = 1, p(x) = 1−x², and q(x) = 0. The polynomials P_n(x) are eigenfunctions of the Legendre differential operator T given by
$$(Ty)(x) = -\frac{d}{dx}\Big((1-x^2)y'(x)\Big), \tag{23.36}$$

but we have not imposed any explicit boundary conditions. Nevertheless, we have the following orthogonality theorem.

Theorem 23.4 For m ≠ n we have
$$\int_{-1}^{1}P_m(x)P_n(x)\,dx = 0.$$
Proof: In this case, Equation (23.20) becomes
$$(\lambda_n - \lambda_m)\int_{-1}^{1}P_m(x)P_n(x)\,dx = \Big((1-x^2)\big[P_n(x)P_m'(x) - P_m(x)P_n'(x)\big]\Big)\Big|_{-1}^{1} = 0, \tag{23.37}$$
which holds not because we have imposed end-point conditions on the P_n(x), but because p(x) = 1−x² is zero at both ends.

23.6.4 Other Famous Examples


Well-known examples of Sturm-Liouville problems also include
• Chebyshev:
$$\frac{d}{dx}\Big(\sqrt{1-x^2}\,\frac{dy}{dx}\Big) + \lambda(1-x^2)^{-1/2}y = 0;$$
• Hermite:
$$\frac{d}{dx}\Big(e^{-x^2}\frac{dy}{dx}\Big) + \lambda e^{-x^2}y = 0;$$
and
• Laguerre:
$$\frac{d}{dx}\Big(xe^{-x}\frac{dy}{dx}\Big) + \lambda e^{-x}y = 0.$$
Exercise 23.1 For each of the three differential equations just listed, see
if you can determine the interval over which their eigenfunctions will be
orthogonal.
Chapter 24

Series Solutions for Differential Equations
(Chapter 10,11)

24.1 First-Order Linear Equations


There are only a few linear equations that can be solved exactly in closed
form. For the others, we need different approaches. One such is to find a
series representation for the solution. We begin with two simple examples.

24.1.1 An Example
Consider the differential equation

$$y' = y. \tag{24.1}$$
We look for a solution of Equation (24.1) of the form
$$y(x) = a_0 + a_1x + a_2x^2 + \cdots.$$
Writing
$$y'(x) = a_1 + 2a_2x + 3a_3x^2 + \cdots,$$
and inserting these series into the equation y' − y = 0, we have
$$0 = (a_0 - a_1) + (a_1 - 2a_2)x + (2a_2 - 3a_3)x^2 + \cdots.$$
Each coefficient must be zero, from which we determine that
$$a_n = a_0/n!.$$
The solutions then are
$$y(x) = a_0\sum_{n=0}^{\infty}\frac{x^n}{n!} = a_0e^x.$$
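
The recursion a_n = a_{n−1}/n can be checked mechanically (my sketch, assuming SymPy):

```python
import sympy as sp

x = sp.symbols('x')
N = 8
a = [sp.Integer(1)]               # a_0 = 1
for n in range(1, N):
    a.append(a[-1] / n)           # from the coefficient equation n*a_n = a_{n-1}
y = sum(a[n] * x**n for n in range(N))
print(sp.series(sp.exp(x), x, 0, N).removeO() - y)   # 0: same partial sum
```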

24.1.2 Another Example: The Binomial Theorem


Consider now the differential equation
$$(1+x)y' - py = 0, \tag{24.2}$$
with y(0) = 1. Writing
$$y(x) = a_0 + a_1x + a_2x^2 + \cdots,$$
we find that
$$0 = (a_1 - p) + \big(2a_2 - (p-1)a_1\big)x + \big(3a_3 - (p-2)a_2\big)x^2 + \cdots.$$
Setting each coefficient to zero, we find that
$$y(x) = \sum_{n=0}^{\infty}\frac{p(p-1)(p-2)\cdots(p-n+1)}{n!}\,x^n.$$
The function y(x) = (1+x)^p can be shown to be the unique solution of the original differential equation. Therefore, we have
$$(1+x)^p = \sum_{n=0}^{\infty}\frac{p(p-1)(p-2)\cdots(p-n+1)}{n!}\,x^n; \tag{24.3}$$
this is the Binomial Theorem, with the series converging for |x| < 1.

24.2 Second-Order Problems


We turn now to the second-order problem

$$y''(x) + P(x)y'(x) + Q(x)y(x) = 0. \tag{24.4}$$

If both P (x) and Q(x) have Taylor series expansions that converge in a
neighborhood of x = x0 , we say that x0 is an ordinary point for the differen-
tial equation. In that case, we expect to find a Taylor series representation
for the solution that converges in a neighborhood of x0 .
If x0 is not an ordinary point, but both (x − x0 )P (x) and (x − x0 )2 Q(x)
have Taylor series expansions that converge in a neighborhood of x0 , we
say that x0 is a regular singular point of the differential equation. In such
cases, we seek a Frobenius series solution.

24.3 Ordinary Points


We consider several examples of equations for which x = 0 is an ordinary
point.

24.3.1 The Wave Equation


When we separate variables in the vibrating string problem we find that
we have to solve the equation

$$y'' + y = 0. \tag{24.5}$$
Writing the solution as
$$y(x) = \sum_{n=0}^{\infty}a_nx^n,$$
we find that
$$y(x) = a_0\Big(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \cdots\Big) + a_1\Big(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\Big),$$
so that
$$y(x) = a_0\cos x + a_1\sin x.$$

24.3.2 Legendre’s Equations


Legendre's Equations have the form
$$(1-x^2)y'' - 2xy' + p(p+1)y = 0. \tag{24.6}$$
Writing
$$y(x) = \sum_{n=0}^{\infty}a_nx^n,$$
we find that
$$y(x) = a_0\Big(1 - \frac{p(p+1)}{2!}x^2 + \frac{p(p-2)(p+1)(p+3)}{4!}x^4 - \cdots\Big) + a_1\Big(x - \frac{(p-1)(p+2)}{3!}x^3 + \frac{(p-1)(p-3)(p+2)(p+4)}{5!}x^5 - \cdots\Big).$$
If p = n is a positive even integer, the first series terminates, and if p = n is an odd positive integer, the second series terminates. In either case, we get the Legendre polynomial solutions, denoted P_n(x).

24.3.3 Hermite’s Equations


Hermite's Equations have the form
$$y'' - 2xy' + 2py = 0. \tag{24.7}$$
The solutions of Equation (24.7) are
$$y(x) = a_0y_1(x) + a_1y_2(x),$$
where
$$y_1(x) = 1 - \frac{2p}{2!}x^2 + \frac{2^2p(p-2)}{4!}x^4 - \frac{2^3p(p-2)(p-4)}{6!}x^6 + \cdots,$$
and
$$y_2(x) = x - \frac{2(p-1)}{3!}x^3 + \frac{2^2(p-1)(p-3)}{5!}x^5 - \cdots.$$
If p = n is a non-negative integer, one of these series terminates and gives the Hermite polynomial solution H_n(x).
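
The coefficients obey the recursion a_{n+2} = 2(n−p)a_n/((n+1)(n+2)). The sketch below (my addition, assuming SciPy) terminates it at p = 4 and compares with the classical Hermite polynomial H₄, which agrees up to a constant scaling:

```python
import numpy as np
from scipy.special import hermite

p = 4
a = np.zeros(p + 1)
a[0] = 1.0
for n in range(0, p - 1, 2):
    a[n + 2] = 2 * (n - p) / ((n + 1) * (n + 2)) * a[n]

xs = np.linspace(-2, 2, 5)
series = sum(a[n] * xs**n for n in range(p + 1))
print(series / hermite(4)(xs))   # constant ratio: same polynomial up to scale
```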

24.4 Regular Singular Points


We turn now to the case of regular singular points.

24.4.1 Motivation
We motivate the Frobenius series approach by considering Euler’s differen-
tial equation,
$$x^2y'' + pxy' + qy = 0, \tag{24.8}$$
where both p and q are constants and x > 0. Equation (24.8) can be written as
$$y'' + \frac{p}{x}y' + \frac{q}{x^2}y = 0,$$
from which we see that x = 0 is a regular singular point.
Changing variables to z = log x, we obtain
$$\frac{d^2y}{dz^2} + (p-1)\frac{dy}{dz} + qy = 0. \tag{24.9}$$
We seek a solution of the form y(z) = e^{mz}. Inserting this guess into Equation (24.9), we find that we must have
$$m^2 + (p-1)m + q = 0;$$
this is the indicial equation. If the roots m = m₁ and m = m₂ are distinct, the solutions are e^{m₁z} and e^{m₂z}. If m₁ = m₂, then the solutions are e^{m₁z} and ze^{m₁z}. Reverting back to the original variables, we find that the solutions are either y(x) = x^{m₁} and y(x) = x^{m₂}, or y(x) = x^{m₁} and y(x) = x^{m₁} log x.

24.4.2 Frobenius Series


When p is replaced by $\sum_{n=0}^{\infty}p_nx^n$ and q is replaced by $\sum_{n=0}^{\infty}q_nx^n$, we expect solutions to have the form
$$y(x) = x^m\sum_{n=0}^{\infty}a_nx^n,$$
or
$$y(x) = x^m\log x\sum_{n=0}^{\infty}a_nx^n,$$
where m is a root of an indicial equation. This is the Frobenius series approach.
A Frobenius series associated with the singular point x₀ = 0 has the form
$$y(x) = x^m\big(a_0 + a_1x + a_2x^2 + \cdots\big), \tag{24.10}$$
where m is to be determined, and a₀ ≠ 0. Since xP(x) and x²Q(x) are analytic, we can write
$$xP(x) = p_0 + p_1x + p_2x^2 + \cdots, \tag{24.11}$$
and
$$x^2Q(x) = q_0 + q_1x + q_2x^2 + \cdots, \tag{24.12}$$
with convergence for |x| < R. Inserting these expressions into the differential equation, and performing a bit of algebra, we arrive at
$$\sum_{n=0}^{\infty}\Big\{a_n\big[(m+n)(m+n-1) + (m+n)p_0 + q_0\big] + \sum_{k=0}^{n-1}a_k\big[(m+k)p_{n-k} + q_{n-k}\big]\Big\}x^n = 0. \tag{24.13}$$
Setting each coefficient to zero, we obtain a recursive algorithm for finding the a_n. To start with, we have
$$a_0\big[m(m-1) + mp_0 + q_0\big] = 0. \tag{24.14}$$
Since a₀ ≠ 0, we must have
$$m(m-1) + mp_0 + q_0 = 0; \tag{24.15}$$
this is called the Indicial Equation. We solve the quadratic Equation (24.15) for m = m₁ and m = m₂.

24.4.3 Bessel Functions

Applying these results to Bessel's Equation, we see that P(x) = 1/x and Q(x) = 1 − ν²/x², and so p₀ = 1 and q₀ = −ν². The Indicial Equation (24.15) is now
$$m^2 - \nu^2 = 0, \tag{24.16}$$
with solutions m₁ = ν and m₂ = −ν. The recursive algorithm for finding the a_n is
$$a_n = -a_{n-2}\big/\big(n(2\nu+n)\big). \tag{24.17}$$
Since a₀ ≠ 0 and a₋₁ = 0, it follows that the solution for m = ν is
$$y = a_0x^{\nu}\Big[1 - \frac{x^2}{2^2(\nu+1)} + \frac{x^4}{2^4\,2!\,(\nu+1)(\nu+2)} - \cdots\Big]. \tag{24.18}$$
Setting a₀ = 1/(2^ν ν!), we get the νth Bessel function,
$$J_\nu(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,(\nu+n)!}\Big(\frac{x}{2}\Big)^{2n+\nu}. \tag{24.19}$$
The most important Bessel functions are J₀(x) and J₁(x).
There is a potential problem in Equation (24.19). Notice that we have not required that ν be a non-negative integer, so the term (ν+n)! may be undefined. This leads to consideration of the gamma function, which extends the factorial function beyond the non-negative integers. Equation (24.19) should be written
$$J_\nu(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,\Gamma(\nu+n+1)}\Big(\frac{x}{2}\Big)^{2n+\nu}. \tag{24.20}$$
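
Partial sums of (24.20) converge quickly; the following sketch (an illustration of mine, assuming SciPy) compares them with SciPy's built-in J_ν:

```python
import numpy as np
from scipy.special import jv, gamma

def J_series(nu, x, terms=30):
    # Partial sum of the Frobenius series (24.20); n! = gamma(n+1).
    n = np.arange(terms)
    return np.sum((-1)**n / (gamma(n + 1) * gamma(nu + n + 1))
                  * (x / 2.0)**(2 * n + nu))

for nu in (0, 0.5, 1):
    print(nu, J_series(nu, 2.3), jv(nu, 2.3))   # the two columns agree
```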
Chapter 25

Bessel’s Equations
(Chapter 9,10,11)

For each non-negative constant ν, the associated Bessel Equation is
$$x^2\frac{d^2y}{dx^2} + x\frac{dy}{dx} + (x^2 - \nu^2)y = 0, \tag{25.1}$$
which can also be written in the form
$$y'' + P(x)y' + Q(x)y = 0, \tag{25.2}$$
with P(x) = 1/x and Q(x) = 1 − ν²/x².
Solutions of Equation (25.1) are Bessel functions. These functions first
arose in Daniel Bernoulli’s study of the oscillations of a hanging chain, and
now play important roles in many areas of applied mathematics [42].
We begin this note with Bernoulli’s problem, to see how Bessel’s Equa-
tion becomes involved. We then consider Frobenius-series solutions to
second-order linear differential equations with regular singular points; Bessel’s
Equation is one of these. Once we obtain the Frobenius-series solution of
Equation (25.1), we discover that it involves terms of the form p!, for (pos-
sibly) non-integer p. This leads to the Gamma Function, which extends
the factorial function to such non-integer arguments.
The Gamma Function, defined for x > 0 by the integral
$$\Gamma(x) = \int_0^{\infty}e^{-t}t^{x-1}\,dt, \tag{25.3}$$

is a higher transcendental function that cannot be evaluated by purely


algebraic means, and can only be approximated by numerical techniques.
With clever changes of variable, a large number of challenging integration
problems can be rewritten and solved in terms of the gamma function.


We prepare for our discussion of Bernoulli’s hanging chain problem by


recalling some important points in the derivation of the one-dimensional
wave equation for the vibrating string problem.

25.1 The Vibrating String Problem


In the vibrating string problem, the string is fixed at end-points (0, 0) and
(1, 0). The position of the string at time t is given by y(x, t), where x is
the horizontal spatial variable. It is assumed that the string has a constant
mass density, m. Consider the small piece of the string corresponding to
the interval [x, x + ∆x]. Its mass is m∆x, and so, from Newton’s equating
of force with mass times acceleration, we have that the force f on the small
piece of string is related to acceleration by
$$f \approx m(\Delta x)\frac{\partial^2y}{\partial t^2}. \tag{25.4}$$
In this problem, the force is not gravitational, but comes from the tension
applied to the string; we denote by T (x) the tension in the string at x. This
tensile force acts along the tangent to the string at every point. Therefore,
the force acting on the left end-point of the small piece is directed to the
left and is given by −T (x) sin(θ(x)); at the right end-point it is T (x +
∆x) sin(θ(x + ∆x)), where θ(x) is the angle the tangent line at x makes
with the horizontal. For small-amplitude oscillations of the string, the
angles are near zero and the sine can be replaced by the tangent. Since
tan(θ(x)) = ∂y/∂x(x), we can write the net force on the small piece of string as
$$f \approx T(x+\Delta x)\frac{\partial y}{\partial x}(x+\Delta x) - T(x)\frac{\partial y}{\partial x}(x). \tag{25.5}$$
Equating the two expressions for f in Equations (25.4) and (25.5) and
dividing by ∆x, we obtain
$$\frac{T(x+\Delta x)\frac{\partial y}{\partial x}(x+\Delta x) - T(x)\frac{\partial y}{\partial x}(x)}{\Delta x} \approx m\frac{\partial^2y}{\partial t^2}. \tag{25.6}$$
Taking limits, as ∆x → 0, we arrive at the Wave Equation
$$\frac{\partial}{\partial x}\Big(T(x)\frac{\partial y}{\partial x}(x)\Big) = m\frac{\partial^2y}{\partial t^2}. \tag{25.7}$$
For the vibrating string problem, we also assume that the tension function
is constant, that is, T (x) = T , for all x. Then we can write Equation (25.7)
as the more familiar
$$T\frac{\partial^2y}{\partial x^2} = m\frac{\partial^2y}{\partial t^2}. \tag{25.8}$$

We could have introduced the assumption of constant tension earlier in


this discussion, but we shall need the wave equation for variable tension
Equation (25.7) when we consider the hanging chain problem.

25.2 The Hanging Chain Problem


Imagine a flexible chain hanging vertically. Assume that the chain has
a constant mass density m. Let the origin (0, 0) be the bottom of the
chain, with the positive x-axis running vertically, up through the chain.
The positive y-axis extends horizontally to the left, from the bottom of the
chain. As before, the function y(x, t) denotes the position of each point
on the chain at time t. We are interested in the oscillation of the hanging
chain. This is the vibrating string problem turned on its side, except that
now the tension is not constant.

25.2.1 The Wave Equation for the Hanging Chain


The tension at the point x along the chain is due to the weight of the portion
of the chain below the point x, which is then T (x) = mgx. Applying
Equation (25.7), we have
$$\frac{\partial}{\partial x}\Big(mgx\frac{\partial y}{\partial x}(x)\Big) = m\frac{\partial^2y}{\partial t^2}. \tag{25.9}$$
As we normally do at this stage, we separate the variables, to find potential
solutions.

25.2.2 Separating the Variables


We consider possible solutions having the form
$$y(x,t) = u(x)v(t). \tag{25.10}$$
Inserting this y(x, t) into Equation (25.9), and doing a bit of algebra, we arrive at
$$gxu''(x) + gu'(x) + \lambda u(x) = 0, \tag{25.11}$$
and
$$v''(t) + \lambda v(t) = 0, \tag{25.12}$$
where λ is the separation constant. It is Equation (25.11), which can also be written as
$$\frac{d}{dx}\big(gxu'(x)\big) + \lambda u(x) = 0, \tag{25.13}$$
that interests us here.

25.2.3 Obtaining Bessel’s Equation


With a bit more work, using the change of variable $z = 2\sqrt{\lambda/g}\,\sqrt{x}$ and the Chain Rule (no pun intended!), we find that we can rewrite Equation (25.11) as
$$z^2\frac{d^2u}{dz^2} + z\frac{du}{dz} + (z^2 - 0^2)u = 0, \tag{25.14}$$
which is Bessel's Equation (25.1), with the parameter value ν = 0.
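
Numerically (a sketch of mine, assuming SciPy), the oscillation frequencies of the chain follow from the zeros of J₀, since the separated solution u(x) = J₀(2√(λx/g)) must vanish at the fixed top of the chain, x = L:

```python
import numpy as np
from scipy.special import jn_zeros

g, L = 9.81, 1.0                     # chain length L = 1 m (arbitrary choice)
j0n = jn_zeros(0, 4)                 # first four zeros of J_0
lam = g * j0n**2 / (4 * L)           # separation constants from 2*sqrt(lam*L/g) = j0n
print(np.sqrt(lam) / (2 * np.pi))    # frequencies in Hz, from (25.12)
```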

25.3 Solving Bessel’s Equations


Second-order linear differential equations of the form
$$y''(x) + P(x)y'(x) + Q(x)y(x) = 0, \tag{25.15}$$
with neither P(x) nor Q(x) analytic at x = x₀, but with both (x−x₀)P(x) and (x−x₀)²Q(x) analytic, are said to be equations with regular singular points. Writing Equation (25.1) as
$$y''(x) + \frac{1}{x}y'(x) + \Big(1 - \frac{\nu^2}{x^2}\Big)y(x) = 0, \tag{25.16}$$
we see that Bessel’s Equation is such a regular singular point equation,
with the singular point x0 = 0. Solutions to such equations can be found
using the technique of Frobenius series.

25.3.1 Frobenius-series solutions


A Frobenius series associated with the singular point x₀ = 0 has the form
$$y(x) = x^m\big(a_0 + a_1x + a_2x^2 + \cdots\big), \tag{25.17}$$
where m is to be determined, and a₀ ≠ 0. Since xP(x) and x²Q(x) are analytic, we can write
$$xP(x) = p_0 + p_1x + p_2x^2 + \cdots, \tag{25.18}$$
and
$$x^2Q(x) = q_0 + q_1x + q_2x^2 + \cdots, \tag{25.19}$$
with convergence for |x| < R. Inserting these expressions into the differential equation, and performing a bit of algebra, we arrive at
$$\sum_{n=0}^{\infty}\Big\{a_n\big[(m+n)(m+n-1) + (m+n)p_0 + q_0\big] + \sum_{k=0}^{n-1}a_k\big[(m+k)p_{n-k} + q_{n-k}\big]\Big\}x^n = 0. \tag{25.20}$$

Setting each coefficient to zero, we obtain a recursive algorithm for finding the a_n. To start with, we have
$$a_0\big[m(m-1) + mp_0 + q_0\big] = 0. \tag{25.21}$$
Since a₀ ≠ 0, we must have
$$m(m-1) + mp_0 + q_0 = 0; \tag{25.22}$$
this is called the Indicial Equation. We solve the quadratic Equation (25.22) for m = m₁ and m = m₂.

25.3.2 Bessel Functions


Applying these results to Bessel's Equation (25.1), we see that P(x) = 1/x and Q(x) = 1 − ν²/x², and so p₀ = 1 and q₀ = −ν². The Indicial Equation (25.22) is now
$$m^2 - \nu^2 = 0, \tag{25.23}$$
with solutions m₁ = ν and m₂ = −ν. The recursive algorithm for finding the a_n is
$$a_n = -a_{n-2}\big/\big(n(2\nu+n)\big). \tag{25.24}$$
Since a₀ ≠ 0 and a₋₁ = 0, it follows that the solution for m = ν is
$$y = a_0x^{\nu}\Big[1 - \frac{x^2}{2^2(\nu+1)} + \frac{x^4}{2^4\,2!\,(\nu+1)(\nu+2)} - \cdots\Big]. \tag{25.25}$$
Setting a₀ = 1/(2^ν ν!), we get the νth Bessel function,
$$J_\nu(x) = \sum_{n=0}^{\infty}\frac{(-1)^n}{n!\,(\nu+n)!}\Big(\frac{x}{2}\Big)^{2n+\nu}. \tag{25.26}$$
The most important Bessel functions are J₀(x) and J₁(x).

We have a Problem! So far, we have allowed ν to be any real number.


What, then, do we mean by ν! and (n + ν)!? To answer this question, we
need to investigate the gamma function.

25.4 Bessel Functions of the Second Kind


If ν is not an integer, then J_ν(x) and J_{−ν}(x) are linearly independent and the complete solution of Equation (25.1) is
$$y(x) = AJ_\nu(x) + BJ_{-\nu}(x). \tag{25.27}$$
If ν = n is an integer, then
$$J_{-n}(x) = (-1)^nJ_n(x).$$
For n = 0, 1, ..., the Bessel function of the second kind, of order n, is
$$Y_n(x) = \lim_{\nu\to n}\frac{J_\nu(x)\cos\nu\pi - J_{-\nu}(x)}{\sin\nu\pi}. \tag{25.28}$$
The general solution of Equation (25.1), for ν = n, is then
$$y(x) = AJ_n(x) + BY_n(x).$$

25.5 Hankel Functions


The Hankel functions of the first and second kind are
$$H_n^{(1)}(x) = J_n(x) + iY_n(x), \tag{25.29}$$
and
$$H_n^{(2)}(x) = J_n(x) - iY_n(x). \tag{25.30}$$

25.6 The Gamma Function


We want to define ν! for ν not a non-negative integer. The Gamma Function
is the way to do this.

25.6.1 Extending the Factorial Function


As we said earlier, the Gamma Function is defined for x > 0 by
$$\Gamma(x) = \int_0^{\infty}e^{-t}t^{x-1}\,dt. \tag{25.31}$$
Using integration by parts, it is easy to show that
$$\Gamma(x+1) = x\Gamma(x). \tag{25.32}$$
Using Equation (25.32) and the fact that
$$\Gamma(1) = \int_0^{\infty}e^{-t}\,dt = 1, \tag{25.33}$$
we obtain
$$\Gamma(n+1) = n!, \tag{25.34}$$
for n = 0, 1, 2, ....

25.6.2 Extending Γ(x) to negative x


We can use
$$\Gamma(x) = \frac{\Gamma(x+1)}{x} \tag{25.35}$$
to extend Γ(x) to any x < 0, with the exception of the negative integers, at which Γ(x) is unbounded.

25.6.3 An Example
We have
$$\Gamma\Big(\frac{1}{2}\Big) = \int_0^{\infty}e^{-t}t^{-1/2}\,dt. \tag{25.36}$$
Therefore, using the substitution t = u², we have
$$\Gamma\Big(\frac{1}{2}\Big) = 2\int_0^{\infty}e^{-u^2}\,du. \tag{25.37}$$
Squaring, we get
$$\Gamma\Big(\frac{1}{2}\Big)^2 = 4\int_0^{\infty}\int_0^{\infty}e^{-u^2}e^{-v^2}\,du\,dv. \tag{25.38}$$
In polar coordinates, this becomes
$$\Gamma\Big(\frac{1}{2}\Big)^2 = 4\int_0^{\pi/2}\int_0^{\infty}e^{-r^2}r\,dr\,d\theta = 2\int_0^{\pi/2}1\,d\theta = \pi. \tag{25.39}$$
Consequently, we have
$$\Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}. \tag{25.40}$$
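
Both (25.37) and (25.40) are easy to confirm numerically (my sketch, assuming SciPy):

```python
import numpy as np
from scipy.special import gamma
from scipy.integrate import quad

print(gamma(0.5), np.sqrt(np.pi))                        # both ~1.7724538509
val, _ = quad(lambda u: 2.0 * np.exp(-u**2), 0, np.inf)  # Equation (25.37)
print(val)
```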

25.7 Representing the Bessel Functions


There are several equivalent ways to represent the Bessel functions.

25.7.1 Taylor Series


The Bessel function of the first kind and order n, J_n(x), is sometimes defined by the infinite series
$$J_n(x) = \sum_{m=0}^{\infty}\frac{(-1)^m(x/2)^{n+2m}}{m!\,\Gamma(n+m+1)}, \tag{25.41}$$
for n = 0, 1, .... The series converges for all x. From Equation (25.41) we have
$$J_0(x) = 1 - \frac{x^2}{2^2} + \frac{x^4}{2^24^2} - \frac{x^6}{2^24^26^2} + \cdots, \tag{25.42}$$
from which it follows immediately that J₀(−x) = J₀(x).

25.7.2 Generating Function


For each fixed x, the function of the complex variable z given by
$$f(z) = \exp\Big(\frac{x}{2}\big(z - 1/z\big)\Big)$$
has the Laurent series expansion
$$\exp\Big(\frac{x}{2}\big(z - 1/z\big)\Big) = \sum_{n=-\infty}^{\infty}J_n(x)\,z^n. \tag{25.43}$$
Using Cauchy's formula for the coefficients of a Laurent series, we find that
$$J_n(x) = \frac{1}{2\pi i}\oint_C\frac{f(z)}{z^{n+1}}\,dz, \tag{25.44}$$
for any simple closed curve C surrounding the essential singularity z = 0.

25.7.3 An Integral Representation


Now we take as C the circle of radius one about the origin, so that z = e^{iθ}, for θ in the interval [0, 2π]. Rewriting Equation (25.44), we get
$$J_n(x) = \frac{1}{2\pi}\int_0^{2\pi}e^{i(x\sin\theta - n\theta)}\,d\theta. \tag{25.45}$$
By symmetry, we can also write
$$J_n(x) = \frac{1}{\pi}\int_0^{\pi}\cos(x\sin\theta - n\theta)\,d\theta. \tag{25.46}$$
From Equation (25.45) we have
$$J_0(x) = \frac{1}{2\pi}\int_0^{2\pi}e^{ix\sin\theta}\,d\theta, \tag{25.47}$$
or, equivalently,
$$J_0(x) = \frac{1}{2\pi}\int_0^{2\pi}e^{ix\cos\theta}\,d\theta. \tag{25.48}$$

25.8 Fourier Transforms and Bessel Functions


Bessel functions are closely related to Fourier transforms.

25.8.1 The Case of Two Dimensions


Let f(x, y) be a complex-valued function of the two real variables x and y. Then its Fourier transform is the function F(α, β) of two real variables defined by
$$F(\alpha,\beta) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f(x,y)e^{i\alpha x}e^{i\beta y}\,dx\,dy. \tag{25.49}$$
The Fourier Inversion Formula then gives
$$f(x,y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}F(\alpha,\beta)e^{-i\alpha x}e^{-i\beta y}\,d\alpha\,d\beta. \tag{25.50}$$

25.8.2 The Case of Radial Functions


Suppose that we express f(x, y) in polar coordinates r and θ, with x = r cos θ and y = r sin θ. The function f(x, y) is said to be radial if, when expressed in polar coordinates, it is independent of θ. Another way to say this is that there is some function g(t) such that
$$f(x,y) = g\big(\sqrt{x^2+y^2}\big) = g(r), \tag{25.51}$$
for each x and y; that is, f is constant on circles centered at the origin.
When f(x, y) is radial, so is F(α, β), as we now prove. We begin by expressing F(α, β) in polar coordinates as well, with α = ρ cos ω and β = ρ sin ω. Then
$$F(\rho\cos\omega,\rho\sin\omega) = \int_0^{\infty}\int_0^{2\pi}g(r)e^{ir\rho\cos(\theta-\omega)}\,d\theta\,r\,dr = \int_0^{\infty}\Big(\int_0^{2\pi}e^{ir\rho\cos(\theta-\omega)}\,d\theta\Big)g(r)\,r\,dr.$$
By making the variable substitution γ = θ − ω, it is easy to show that the inner integral,
$$\int_0^{2\pi}e^{ir\rho\cos(\theta-\omega)}\,d\theta,$$
is actually independent of ω, which tells us that F is radial; we then write
$$F(\rho\cos\omega,\rho\sin\omega) = H(\rho).$$
From Equation (25.48) we know that
$$2\pi J_0(r\rho) = \int_0^{2\pi}e^{ir\rho\cos\theta}\,d\theta. \tag{25.52}$$
We then have
$$H(\rho) = 2\pi\int_0^{\infty}rg(r)J_0(r\rho)\,dr. \tag{25.53}$$
There are several things to notice here.

25.8.3 The Hankel Transform


First, note that when f(x, y) is radial, its two-dimensional Fourier transform is also radial, but H(ρ) is not the one-dimensional Fourier transform of g(r). The integral in Equation (25.53) tells us that (1/2π)H(ρ) is the Hankel transform of g(r). Because of the similarity between Equations (25.49) and (25.50), we also have
$$g(r) = \frac{1}{2\pi}\int_0^{\infty}\rho H(\rho)J_0(r\rho)\,d\rho. \tag{25.54}$$
For any function s(x) of a single real variable, its Hankel transform is
$$T(\gamma) = \int_0^{\infty}xs(x)J_0(\gamma x)\,dx. \tag{25.55}$$
The inversion formula is
$$s(x) = \int_0^{\infty}\gamma T(\gamma)J_0(\gamma x)\,d\gamma. \tag{25.56}$$
2π 0

25.9 An Application of the Bessel Functions in Astronomy
In remote sensing applications, it is often the case that what we measure is
the Fourier transform of what we really want. This is the case in medical
imaging, for example, in both x-ray tomography and magnetic-resonance
imaging. It is also often the case in astronomy. Consider the problem of
determining the size of a distant star.
We model the star as a distant disk of uniform brightness. Viewed as a function of two variables, it is the function that, in polar coordinates, can be written as f(r, θ) = g(r); that is, it is a radial function, a function of r only, independent of θ. The function g(r) is, say, one for 0 ≤ r ≤ R, where R is the radius of the star, and zero otherwise. From the theory of Fourier transform pairs in two dimensions, we know that the two-dimensional Fourier transform of f is also a radial function; it is the function
$$H(\rho) = 2\pi\int_0^{R}rJ_0(r\rho)\,dr,$$
where J₀ is the zeroth-order Bessel function of the first kind. From the theory of Bessel functions, we learn that
$$\frac{d}{dx}\big[xJ_1(x)\big] = xJ_0(x),$$
so that
$$H(\rho) = \frac{2\pi}{\rho}RJ_1(R\rho).$$
When the star is viewed through a telescope, the image is blurred by the atmosphere. It is commonly assumed that the atmosphere performs a convolution filtering on the light from the star, and that this filter is random and varies somewhat from one observation to another. Therefore, at each observation, it is not H(ρ), but H(ρ)G(ρ) that is measured, where G(ρ) is the filter transfer function operating at that particular time.
When the star is viewed through a telescope, the image is blurred by the
atmosphere. It is commonly assumed that the atmosphere performs a con-
volution filtering on the light from the star, and that this filter is random
and varies somewhat from one observation to another. Therefore, at each
observation, it is not H(ρ), but H(ρ)G(ρ) that is measured, where G(ρ) is
the filter transfer function operating at that particular time.
Suppose we observe the star N times, for each n = 1, 2, ..., N measur-
ing values of the function H(ρ)Gn (ρ). If we then average over the various
measurements, we can safely say that the first zero we observe in our mea-
surements is the first zero of H(ρ), that is, the first zero of J1 (Rρ). The
first zero of J1 (x) is known to be about 3.8317, so knowing this, we can
determine R. Actually, it is not truly R that we are measuring, since we
also need to involve the distance D to the star, known by other means.
What we are measuring is the perceived radius, in other words, half the
subtended angle. Combining this with our knowledge of D, we get R.
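
Schematically (an illustration with made-up numbers, assuming SciPy), the estimate of R comes from dividing the first zero of J₁ by the measured location of the first zero of the averaged data:

```python
import numpy as np
from scipy.special import jn_zeros

# H(rho) first vanishes where R*rho equals the first zero of J1.
j11 = jn_zeros(1, 1)[0]   # ~3.8317
rho0 = 2.5                # hypothetical measured location of the first zero
R = j11 / rho0            # perceived radius (in the units of 1/rho)
print(j11, R)
```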

25.10 Orthogonality of Bessel Functions


As we have seen previously, the orthogonality of trigonometric functions
plays an important role in Fourier series. A similar notion of orthogonality
holds for Bessel functions. We begin with the following theorem.

Theorem 25.1 Let u(x) be a non-trivial solution of u''(x) + q(x)u(x) = 0. If
$$\int_1^{\infty}q(x)\,dx = \infty,$$
then u(x) has infinitely many zeros on the positive x-axis.

Bessel's Equation,
$$x^2y''(x) + xy'(x) + (x^2-\nu^2)y(x) = 0, \tag{25.57}$$
can be written in normal form as
$$y''(x) + \Big(1 + \frac{1-4\nu^2}{4x^2}\Big)y(x) = 0, \tag{25.58}$$
and, as x → ∞,
$$q(x) = 1 + \frac{1-4\nu^2}{4x^2} \to 1,$$
so, according to the theorem, every non-trivial solution of Bessel's Equation has infinitely many positive zeros.
Now consider the following theorem, which is a consequence of the
Sturm Comparison Theorem discussed elsewhere in these notes.

Theorem 25.2 Let y_ν(x) be a non-trivial solution of Bessel's Equation
$$x^2y''(x) + xy'(x) + (x^2-\nu^2)y(x) = 0,$$
for x > 0. If 0 ≤ ν < 1/2, then every interval of length π contains at least one zero of y_ν(x); if ν = 1/2, then the distance between successive zeros of y_ν(x) is precisely π; and if ν > 1/2, then every interval of length π contains at most one zero of y_ν(x).

It follows from these two theorems that, for each fixed ν, the function y_ν(x) has an infinite number of positive zeros, say λ₁ < λ₂ < ..., with λ_n → ∞.
For fixed ν, let y_n(x) = y_ν(λ_n x). As we saw earlier, we have the following orthogonality theorem.

Theorem 25.3 For m ≠ n,
$$\int_0^1 x\,y_m(x)y_n(x)\,dx = 0.$$

Proof: Let u(x) = y_m(x) and v(x) = y_n(x). Then we have
$$u'' + \frac{1}{x}u' + \Big(\lambda_m^2 - \frac{\nu^2}{x^2}\Big)u = 0,$$
and
$$v'' + \frac{1}{x}v' + \Big(\lambda_n^2 - \frac{\nu^2}{x^2}\Big)v = 0.$$
Multiplying both equations by x, cross-multiplying by v and u, and subtracting one equation from the other, we get
$$x(uv'' - vu'') + (uv' - vu') = (\lambda_m^2 - \lambda_n^2)xuv.$$
Since
$$\frac{d}{dx}\Big(x(uv' - vu')\Big) = x(uv'' - vu'') + (uv' - vu'),$$
it follows, by integrating both sides over the interval [0, 1], that
$$x(uv' - vu')\Big|_0^1 = (\lambda_m^2 - \lambda_n^2)\int_0^1 xu(x)v(x)\,dx.$$
But
$$x(uv' - vu')\Big|_0^1 = u(1)v'(1) - v(1)u'(1) = 0.$$
Chapter 26

Legendre’s Equations
(Chapter 10,11)

26.1 Legendre’s Equations


In this chapter we shall be interested in Legendre’s equations of the form

$$(1 - x^2)y''(x) - 2xy'(x) + n(n+1)y(x) = 0, \qquad (26.1)$$

where n is a non-negative integer. In this case, there is a solution $P_n(x)$ that is a polynomial of degree n, containing only even powers or only odd powers, according as n is even or odd; $P_n(x)$ is called the nth Legendre polynomial. Since
the differential equation in (26.1) can be written as

$$-\frac{d}{dx}\Big[(1 - x^2)y'(x)\Big] = n(n+1)y(x), \qquad (26.2)$$
it is a Sturm-Liouville eigenvalue problem with w(x) = 1, p(x) = (1 − x2 )
and q(x) = 0. The polynomials Pn (x) are eigenfunctions of the Legendre
differential operator T given by

$$(Ty)(x) = -\frac{d}{dx}\Big[(1 - x^2)y'(x)\Big], \qquad (26.3)$$
but we have not imposed any explicit boundary conditions. Nevertheless,
we have the following orthogonality theorem.

Theorem 26.1 For $m \ne n$ we have
$$\int_{-1}^1 P_m(x)P_n(x)\,dx = 0.$$


Proof: In this case, Equation (23.20) becomes
$$(\lambda_n - \lambda_m)\int_{-1}^1 P_m(x)P_n(x)\,dx = \Big((1 - x^2)\big[P_n(x)P_m'(x) - P_m(x)P_n'(x)\big]\Big)\Big|_{-1}^1 = 0, \qquad (26.4)$$
which holds not because we have imposed end-point conditions on the $P_n(x)$, but because $p(x) = 1 - x^2$ is zero at both ends.
From the orthogonality we can conclude that
$$\int_{-1}^1 P_N(x)Q(x)\,dx = 0,$$

for any polynomial Q(x) of degree at most N − 1. This is true because


Q(x) can be written as a linear combination of the Legendre polynomials
Pn (x), for n = 0, 1, ..., N − 1.
Using orthogonality we can prove the following theorem:
Theorem 26.2 All the N roots of PN (x) lie in the interval [−1, 1].
Proof: Let $\{x_n \mid n = 1, 2, ..., N\}$ be the roots of $P_N(x)$. For fixed n let
$$Q_n(x) = \prod_{m \ne n}(x - x_m),$$
so that $P_N(x) = c(x - x_n)Q_n(x)$, for some constant c. Then
$$0 = \int_{-1}^1 P_N(x)Q_n(x)\,dx = c\int_{-1}^1 (x - x_n)\prod_{m \ne n}(x - x_m)^2\,dx,$$
from which we conclude that $(x - x_n)$ does not have constant sign on the interval [−1, 1], so that $x_n$ must lie in that interval.
Now that we know that all N roots of PN (x) are real, we can use
orthogonality again to prove that all the roots are distinct.
Theorem 26.3 All the roots of PN (x) are distinct.
Proof: Suppose that $x_1 = x_2$. Then we can write
$$P_N(x) = c(x - x_1)^2\prod_{m=3}^N (x - x_m) = (x - x_1)^2 Q(x),$$
where Q(x) is a polynomial of degree N − 2. Therefore,
$$\int_{-1}^1 P_N(x)Q(x)\,dx = 0,$$

by orthogonality. But
$$\int_{-1}^1 P_N(x)Q(x)\,dx = \int_{-1}^1 (x - x_1)^2 Q(x)^2\,dx,$$

which cannot equal zero, since the integrand is a non-negative polynomial.

26.2 Rodrigues’ Formula


There is a simple formula, called Rodrigues’ Formula, for generating the
successive Legendre polynomials:
$$P_n(x) = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2 - 1)^n. \qquad (26.5)$$
Using Equation (26.5), we find that

P0 (x) = 1,

P1 (x) = x,
$$P_2(x) = \frac{1}{2}(3x^2 - 1),$$
and so on.
Exercise 26.1 Calculate P3 (x).

26.3 A Recursive Formula for Pn (x)


While Rodrigues’ formula is simple to write down, it is not simple to apply.
With n = 0, we get P0 (x) = 1. Then
$$P_1(x) = \frac{1}{2}\frac{d}{dx}\big[(x^2 - 1)\big] = \frac{1}{2}(2x) = x,$$
$$P_2(x) = \frac{1}{8}\frac{d^2}{dx^2}\big[(x^2 - 1)^2\big] = \frac{1}{8}\big[12x^2 - 4\big] = \frac{3}{2}x^2 - \frac{1}{2}.$$
The others follow in the same way, although, as I am sure you will discover,
the calculations become increasingly tedious. One approach that simplifies
the calculation is to use the binomial theorem to expand (x2 − 1)n before
differentiating. Only the arithmetic involving the coefficients remains a
nuisance then.
They say that necessity is the mother of invention, but I think avoiding
tedium can also be a strong incentive. Finding a shortcut is not necessarily
a way to save time; in the time spent finding the shortcut, you could prob-
ably have solved the original problem three times over. It is easy to make

mistakes in long calculations, though, and a shortcut that requires fewer


calculations can be helpful. In a later section we shall see a standard recur-
sive formula that allows us to compute Pn+1 (x) from Pn (x) and Pn−1 (x).
In this section we try to find our own simplification of Rodrigues’ formula.
Here is one idea. Convince yourself that the (n+1)-st derivative of a product of f(x) and g(x) can be written
$$(fg)^{(n+1)} = \sum_{k=0}^{n+1}\binom{n+1}{k} f^{(k)} g^{(n+1-k)},$$
where
$$\binom{n+1}{k} = \frac{(n+1)!}{k!(n+1-k)!}.$$
Now we find $[(x^2 - 1)^{n+1}]^{(n+1)}$ by defining $f(x) = (x^2 - 1)^n$ and $g(x) = x^2 - 1$.
Since $g^{(n+1-k)} = 0$, except for $k = n+1$, $n$, and $n-1$, the sum above has only three terms. Two of the three terms involve $P_n(x)$ and $P_n'(x)$, which
we would already have found. The third term involves the anti-derivative of
Pn (x). We can easily calculate this anti-derivative, except for the constant.
See if you can figure out what the constant must be.
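If the goal is simply to produce the polynomials, a computer algebra system can carry out Rodrigues' formula directly; here is a minimal sympy sketch (the function name legendre_rodrigues is ours).

```python
# Applying Rodrigues' formula (26.5) symbolically with sympy.
import sympy as sp

x = sp.symbols('x')

def legendre_rodrigues(n):
    # P_n(x) = (1 / (2^n n!)) d^n/dx^n (x^2 - 1)^n
    return sp.expand(sp.diff((x**2 - 1)**n, x, n) / (2**n * sp.factorial(n)))

for n in range(4):
    print(n, legendre_rodrigues(n))
# 0 1
# 1 x
# 2 3*x**2/2 - 1/2
# 3 5*x**3/2 - 3*x/2
```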

26.4 A Generating Function Approach


For each fixed x, and for all t sufficiently small, we have
$$\frac{1}{\sqrt{1 - 2xt + t^2}} = P_0(x) + P_1(x)t + P_2(x)t^2 + P_3(x)t^3 + \cdots + P_n(x)t^n + \cdots.$$
This function is called the generating function for the Legendre polynomi-
als.

Exercise 26.2 Use the generating function and the Taylor expansion of
log(t + 1) around t = 0 to prove that
$$\int_{-1}^1 P_n(x)P_n(x)\,dx = \frac{2}{2n+1}.$$

Exercise 26.3 Use the generating function to

• (a) verify that Pn (1) = 1 and Pn (−1) = (−1)n , and


• (b) show that P2n+1 (0) = 0 and

$$P_{2n}(0) = \frac{(-1)^n}{2^n n!}\big(1\cdot 3\cdot 5\cdots(2n-1)\big).$$

26.5 A Two-Term Recursive Formula for Pn (x)


Using the generating function, we can show that
$$(n+1)P_{n+1}(x) = (2n+1)xP_n(x) - nP_{n-1}(x), \qquad (26.6)$$
for n = 1, 2, .... This is a two-term recursive formula for the $P_n(x)$.
Exercise 26.4 Use the recursive formula to compute Pn (x) for n = 3, 4,
and 5.
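A minimal Python sketch of the recursion (26.6), which can be used to check the answers to Exercise 26.4; the starting values are $P_0(x) = 1$ and $P_1(x) = x$.

```python
# Evaluate P_n at a point x using the two-term recursion (26.6).
def legendre(n, x):
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x            # P_0 and P_1
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

# Sanity checks: P_n(1) = 1 and P_2(0) = -1/2.
print(legendre(3, 1.0), legendre(2, 0.0), legendre(5, 0.5))
```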

26.6 Legendre Series


Just as Fourier series deals with the representation of a function on a finite
interval as a (possibly infinite) sum of sines and cosines, Legendre series
involves the representation of a function f (x) on the interval [−1, 1] as
$$f(x) = a_0P_0(x) + a_1P_1(x) + \cdots + a_nP_n(x) + \cdots. \qquad (26.7)$$
Exercise 26.5 Use the orthogonality of the Pn (x) over the interval [−1, 1]
to show that
$$a_n = \Big(n + \frac{1}{2}\Big)\int_{-1}^1 f(x)P_n(x)\,dx. \qquad (26.8)$$

26.7 Best Approximation by Polynomials


Suppose that we want to approximate a function f (x) by a polynomial of
degree n, over the interval [−1, 1]. Which polynomial is the best? Since
much attention is paid to the Taylor expansion of a function, you might
guess that the first n + 1 terms of the Taylor series for f (x) might be what
we want to use, but this is not necessarily the case.
First of all, we need to be clear about what we mean by best. Let us
agree that we want to find the polynomial
$$p(x) = b_0 + b_1x + b_2x^2 + \cdots + b_nx^n$$
that minimizes
$$\int_{-1}^1 \big(f(x) - p(x)\big)^2\,dx. \qquad (26.9)$$

It is helpful to note that any polynomial of degree n can be written as


$$p(x) = c_0P_0(x) + c_1P_1(x) + \cdots + c_nP_n(x), \qquad (26.10)$$
for some coefficients $c_0, ..., c_n$. For example,
$$p(x) = 3x^2 + 4x + 7 = 8P_0(x) + 4P_1(x) + 2P_2(x).$$

Exercise 26.6 Show that the choice of coefficients in Equation (26.10) for which the distance in Equation (26.9) is minimized is $c_m = a_m$, with the $a_m$ as given in Equation (26.8).
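As a numerical illustration of this exercise, the sketch below compares the degree-three Legendre expansion of $f(x) = e^x$, with coefficients computed from Equation (26.8), against the degree-three Taylor polynomial, using the distance (26.9); the choice $f(x) = e^x$ is ours.

```python
# Legendre expansion versus Taylor polynomial in the squared distance (26.9).
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

f = np.exp
a = [(n + 0.5) * quad(lambda x: f(x) * eval_legendre(n, x), -1, 1)[0]
     for n in range(4)]                     # coefficients from (26.8)

def legendre_sum(x):
    return sum(a[n] * eval_legendre(n, x) for n in range(4))

def taylor(x):
    return 1 + x + x**2 / 2 + x**3 / 6

for p in (legendre_sum, taylor):
    err, _ = quad(lambda x: (f(x) - p(x))**2, -1, 1)
    print(err)      # the Legendre expansion gives the smaller error
```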

26.8 Legendre's Equations and Potential Theory
Potential theory is the name given to the study of Laplace's Equation, $\nabla^2 U = 0$, where $U = U(x, y, z, t)$ and the Laplacian operator is with respect to the spatial variables only. Steady-state solutions of the heat equation, $a^2\nabla^2 U = \frac{\partial U}{\partial t}$, satisfy Laplace's equation. An important problem in potential theory is to find a function U satisfying Laplace's equation in the interior of a solid and taking specified values on its boundary. For example, take a sphere where the temperature at each point on its surface does not change with time. Now find the steady-state distribution of temperature inside the sphere.
It is natural to select coordinates that are easily related to the solid in
question. Because the solid here is a sphere, spherical coordinates are the
best choice. When the Laplacian is translated into spherical coordinates
and Laplace’s equation is solved by separation of variables, one of the
equations that results is easily transformed into Legendre’s equation.

26.9 Legendre Polynomials and Gaussian Quadrature
A quadrature method is a way of estimating the integral of a function from finitely many of its values. For example, the two-point trapezoidal method estimates the integral $\int_a^b f(x)\,dx$ as
$$\int_a^b f(x)\,dx \approx \frac{b-a}{2}f(a) + \frac{b-a}{2}f(b).$$
The Legendre polynomials play an important role in one such method,
known as Gaussian Quadrature.

26.9.1 The Basic Formula


Suppose that we are given the points $(x_n, f(x_n))$, for $n = 1, 2, ..., N$, and we want to use these values to estimate the integral $\int_a^b f(x)\,dx$. One way is to use
$$\int_a^b f(x)\,dx \approx \sum_{n=1}^N c_n f(x_n). \qquad (26.11)$$

If we select the cn so that the formula in Equation (26.11) is exact for the
functions 1, x, ..., xN −1 , then the formula will provide the exact value of
the integral for any polynomial f (x) of degree less than N . Remarkably,
we can do better than this if we are allowed to select the xn as well as the
cn .

26.9.2 Lagrange Interpolation


Let $x_n$, $n = 1, 2, ..., N$ be arbitrary points in [a, b]. Then the Lagrange polynomials $L_n(x)$, $n = 1, 2, ..., N$, are
$$L_n(x) = \prod_{m \ne n}\frac{(x - x_m)}{(x_n - x_m)}.$$
Then $L_n(x_n) = 1$ and $L_n(x_m) = 0$, for $m \ne n$. The polynomial


$$P(x) = \sum_{n=1}^N f(x_n)L_n(x)$$

interpolates f (x) at the N points xn , since P (xn ) = f (xn ) for n = 1, 2, ..., N .

26.9.3 Using the Legendre Polynomials


Let N be given, and let $x_n$, $n = 1, 2, ..., N$ be the N roots of the Legendre polynomial $P_N(x)$. We know that all these roots lie in the interval [−1, 1]. For each n let $c_n = \int_{-1}^1 L_n(x)\,dx$. Let P(x) be any polynomial of degree less than 2N. We show that
$$\int_{-1}^1 P(x)\,dx = \sum_{n=1}^N c_n P(x_n);$$

that is, the quadrature method provides the correct answer, not just for
polynomials of degree less than N , but for polynomials of degree less than
2N .
Divide P(x) by $P_N(x)$ to get
$$P(x) = Q(x)P_N(x) + R(x),$$
where both Q(x) and R(x) are polynomials of degree less than N. Then
$$\int_{-1}^1 P(x)\,dx = \int_{-1}^1 Q(x)P_N(x)\,dx + \int_{-1}^1 R(x)\,dx = \int_{-1}^1 R(x)\,dx,$$

since PN (x) is orthogonal to all polynomials of degree less than N . Since


$$\sum_{n=1}^N R(x_n)L_n(x)$$

is a polynomial of degree at most N − 1 that interpolates R(x) at N points,


we must have
$$\sum_{n=1}^N R(x_n)L_n(x) = R(x).$$

In addition,

$$P(x_n) = Q(x_n)P_N(x_n) + R(x_n) = R(x_n),$$

so that
$$\sum_{n=1}^N c_n R(x_n) = \int_{-1}^1 R(x)\,dx = \int_{-1}^1 P(x)\,dx = \sum_{n=1}^N c_n P(x_n).$$
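The exactness claim is easy to verify numerically with numpy's built-in Gauss-Legendre nodes and weights; N = 4 below is an arbitrary choice.

```python
# With N nodes (the roots of P_N) and weights c_n, the rule integrates
# every polynomial of degree < 2N exactly over [-1, 1].
import numpy as np

N = 4
nodes, weights = np.polynomial.legendre.leggauss(N)    # x_n and c_n

for degree in range(2 * N):
    estimate = np.sum(weights * nodes**degree)
    exact = (1 - (-1)**(degree + 1)) / (degree + 1)    # integral of x^degree
    print(degree, abs(estimate - exact) < 1e-12)       # True for all degrees
```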
Chapter 27

Hermite’s Equations and


Quantum Mechanics
(Chapter 10,11)

27.1 The Schrödinger Wave Function


In quantum mechanics, the behavior of a particle with mass m subject to
a potential V (x, t) satisfies the Schrödinger Equation

$$i\hbar\frac{\partial\psi(x,t)}{\partial t} = -\frac{\hbar^2}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2} + V(x,t)\psi(x,t), \qquad (27.1)$$
where $\hbar$ is the reduced Planck constant. Here the x is one-dimensional, but extensions to higher dimensions are also possible.
When the solution ψ(x, t) is selected so that

|ψ(x, t)| → 0,

as $|x| \to \infty$, and
$$\int_{-\infty}^{\infty}|\psi(x,t)|^2\,dx = 1,$$

then, for each fixed t, the function |ψ(x, t)|2 is a probability density function
governing the position of the particle. In other words, the probability of
finding the particle in the interval [a, b] at time t is
$$\int_a^b |\psi(x,t)|^2\,dx.$$

An important special case is that of time-independent potentials.


27.2 Time-Independent Potentials


We say that V (x, t) is time-independent if V (x, t) = V (x), for all t. We
then attempt to solve Equation (27.1) by separating the variables; we take
ψ(x, t) = f (t)g(x) and insert this product into Equation (27.1).
The time function is easily shown to be

$$f(t) = e^{-iEt/\hbar},$$

where E is defined to be the energy. The function g(x) satisfies the time-
independent Schrödinger Equation

$$-\frac{\hbar^2}{2m}g''(x) + V(x)g(x) = Eg(x). \qquad (27.2)$$
An important special case is the harmonic oscillator.

27.3 The Harmonic Oscillator


The case of the harmonic oscillator corresponds to the potential $V(x) = \frac{1}{2}kx^2$.

27.3.1 The Classical Spring Problem


To motivate the development of the harmonic oscillator in quantum me-
chanics, it is helpful to recall the classical spring problem. In this problem
a mass m slides back and forth along a frictionless surface, with position
x(t) at time t. It is connected to a fixed structure by a spring with spring
constant k > 0. The restoring force acting on the mass at any time is −kx,
with x = 0 the equilibrium position of the mass. The equation of motion
is
$$mx''(t) = -kx(t),$$
and the solution is
$$x(t) = x(0)\cos\Big(\sqrt{\frac{k}{m}}\,t\Big).$$
The period of oscillation is $T = 2\pi\sqrt{\frac{m}{k}}$ and the frequency of oscillation is $\nu = \frac{1}{T} = \frac{1}{2\pi}\sqrt{\frac{k}{m}}$, from which we obtain the equation
$$k = 4\pi^2 m\nu^2.$$

The potential energy is $\frac{1}{2}kx^2$, while the kinetic energy is $\frac{1}{2}m\dot{x}^2$. The sum of the kinetic and potential energies is the total energy, E(t). Since $E'(t) = 0$, the energy is constant.

27.3.2 Back to the Harmonic Oscillator


When the potential function is $V(x) = \frac{1}{2}kx^2$, Equation (27.2) becomes
$$\frac{\hbar^2}{2m}g''(x) + \Big(E - \frac{1}{2}kx^2\Big)g(x) = 0, \qquad (27.3)$$
where $k = m\omega^2$, for $\omega = 2\pi\nu$. With $u = \sqrt{m\omega/\hbar}\,x$ and $\epsilon = \frac{2E}{\hbar\omega}$, we have
$$\frac{d^2g}{du^2} + (\epsilon - u^2)g = 0. \qquad (27.4)$$
Equation (27.4) is equivalent to

$$w''(x) + (2p + 1 - x^2)w(x) = 0,$$

which can be transformed into Hermite's Equation

$$y'' - 2xy' + 2py = 0,$$

by writing $y(x) = w(x)e^{x^2/2}$.
In order for the solutions of Equation (27.3) to be physically admissible
solutions, it is necessary that p be a non-negative integer, which means
that
$$E = \hbar\omega\Big(n + \frac{1}{2}\Big),$$
for some non-negative integer n; this gives the quantized energy levels for
the harmonic oscillator.
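A trivial sketch of these energy levels, using scipy's value of $\hbar$; the angular frequency $\omega$ below is an arbitrary example.

```python
# The quantized energies E_n = hbar * omega * (n + 1/2).
from scipy.constants import hbar

omega = 2.0e14    # an illustrative angular frequency, in rad/s
for n in range(4):
    print(n, hbar * omega * (n + 0.5), "J")
```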

27.4 Dirac’s Equation


Einstein’s theory of special relativity tells us that there are four variables,
not just three, that have length for their units of measurement: the familiar
three-dimensional spatial coordinates, and ct, where c is the speed of light
and t is time. Looked at this way, Schrödinger’s Equation (27.1), extended
to three spatial dimensions, is peculiar, in that it treats the variable ct
differently from the others. There is only a first partial derivative in t,
but second partial derivatives in the other variables. In 1930 the British
mathematician Paul Dirac presented his relativistically correct version of
Schrödinger’s Equation.
Dirac’s Equation, a version of which is inscribed on the wall of West-
minster Abbey, is the following:
$$i\hbar\frac{\partial\psi}{\partial t} = \frac{\hbar c}{i}\Big(\alpha_1\frac{\partial\psi}{\partial x_1} + \alpha_2\frac{\partial\psi}{\partial x_2} + \alpha_3\frac{\partial\psi}{\partial x_3}\Big) + \alpha_4 mc^2\psi. \qquad (27.5)$$
Here the αi are the Dirac matrices.

This equation agreed remarkably well with experimental data on the


behavior of electrons in electric and magnetic fields, but it also seemed to
allow for nonsensical solutions, such as spinning electrons with negative
energy. The next year, Dirac realized that what the equation was calling
for was anti-matter, a particle with the same mass as the electron, but with
a positive charge. In the summer of 1932 Carl Anderson, working at Cal
Tech, presented clear evidence for the existence of such a particle, which
we now call the positron. What seemed like the height of science fiction in
1930 has become commonplace today.
When a positron collides with an electron their masses vanish and two
gamma ray photons of pure energy are produced. These photons then
move off in opposite directions. In positron emission tomography (PET)
certain positron-emitting chemicals, such as glucose with radioactive flu-
orine chemically attached, are injected into the patient. When the PET
scanner detects two photons arriving at the two ends of a line segment at
(almost) the same time, called coincidence detection, it concludes that a
positron was emitted somewhere along that line. This is repeated thou-
sands of times. Once all this data has been collected, the mathematicians
take over and use these clues to reconstruct an image of where the glucose
is in the body. It is this image that the doctor sees.
Chapter 28

Array Processing
(Chapter 8)

In radar and sonar, the field u(s, t) being sampled is usually viewed as a
discrete or continuous superposition of planewave solutions with various
amplitudes, frequencies, and wavevectors. We sample the field at various
spatial locations sm , m = 1, ..., M , for t in some finite interval of time.
We simplify the situation a bit now by assuming that all the planewave
solutions are associated with the same frequency, ω. If not, we perform an
FFT on the functions of time received at each sensor location sm and keep
only the value associated with the desired frequency ω.
In the continuous superposition model, the field is
$$u(s, t) = e^{i\omega t}\int f(k)e^{ik\cdot s}\,dk.$$

Our measurements at the sensor locations sm give us the values


$$F(s_m) = \int f(k)e^{ik\cdot s_m}\,dk,$$

for m = 1, ..., M . The data are then Fourier transform values of the complex
function f (k); f (k) is defined for all three-dimensional real vectors k, but
is zero, in theory, at least, for those k whose squared length ||k||2 is not
equal to ω 2 /c2 . Our goal is then to estimate f (k) from finitely many values
of its Fourier transform. Since each k is a normal vector for its planewave
field component, determining the value of f (k) will tell us the strength of
the planewave component coming from the direction k.
The collection of sensors at the spatial locations sm , m = 1, ..., M ,
is called an array, and the size of the array, in units of the wavelength
λ = 2πc/ω, is called the aperture of the array. Generally, the larger the


aperture the better, but what is a large aperture for one value of ω will be
a smaller aperture for a lower frequency.
In some applications the sensor locations are essentially arbitrary, while
in others their locations are carefully chosen. Sometimes, the sensors are
collinear, as in sonar towed arrays. Let’s look more closely at the collinear
case.

Figure 28.1: A uniform line array sensing a planewave field.

We assume now that the sensors are equispaced along the x-axis, at
locations (m∆, 0, 0), m = 1, ..., M , where ∆ > 0 is the sensor spacing; such
an arrangement is called a uniform line array. This setup is illustrated in

Figure 28.1. Our data is then


$$F_m = F(s_m) = F((m\Delta, 0, 0)) = \int f(k)e^{im\Delta k\cdot(1,0,0)}\,dk.$$

Since $k\cdot(1,0,0) = \frac{\omega}{c}\cos\theta$, for $\theta$ the angle between the vector k and the x-axis, we see that there is some ambiguity now; we cannot distinguish the cone of vectors that have the same $\theta$. It is common then to assume that the wavevectors k have no z-component and that $\theta$ is the angle between two vectors in the x, y-plane, the so-called angle of arrival. The wavenumber variable $k = \frac{\omega}{c}\cos\theta$ lies in the interval $[-\frac{\omega}{c}, \frac{\omega}{c}]$, and we imagine that f, originally a function of the vector k, is now a function f(k) of this scalar wavenumber, defined for $|k| \le \frac{\omega}{c}$. The Fourier transform of f(k) is F(s), a function of a single real variable s. Our data is then viewed as the values $F(m\Delta)$, for $m = 1, ..., M$. Since the function f(k) is zero for $|k| > \frac{\omega}{c}$, the Nyquist spacing in s is $\frac{\pi c}{\omega}$, which is $\frac{\lambda}{2}$, where $\lambda = \frac{2\pi c}{\omega}$ is the wavelength.
To avoid aliasing, which now means mistaking one direction of arrival for another, we need to select $\Delta \le \frac{\lambda}{2}$. When we have oversampled, so that $\Delta < \frac{\lambda}{2}$, the interval $[-\frac{\omega}{c}, \frac{\omega}{c}]$, the so-called visible region, is strictly smaller than the interval $[-\frac{\pi}{\Delta}, \frac{\pi}{\Delta}]$. If the model of propagation is accurate, all
the signal component planewaves will correspond to wavenumbers k in the
visible region and the background noise will also appear as a superposition
of such propagating planewaves. In practice, there can be components in
the noise that appear to come from wavenumbers k outside of the visible
region; this means these components of the noise are not due to distant
sources propagating as planewaves, but, perhaps, to sources that are in
the near field, or localized around individual sensors, or coming from the
electronics within the sensors.
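To make this concrete, here is a minimal matched-filter (beamforming) sketch for a uniform line array at Nyquist spacing; the frequency, array size, and arrival angle are invented, and the data follow the idealized single-planewave model described above.

```python
# Estimate the angle of arrival of a single planewave with a matched filter.
import numpy as np

c, omega = 1500.0, 2 * np.pi * 50.0     # sound speed and angular frequency
lam = 2 * np.pi * c / omega             # wavelength
delta = lam / 2                         # Nyquist sensor spacing
m = np.arange(1, 21)                    # M = 20 sensors at (m*delta, 0, 0)

theta_true = np.deg2rad(60.0)
k_true = (omega / c) * np.cos(theta_true)
data = np.exp(1j * m * delta * k_true)  # noiseless sensor values F_m

thetas = np.deg2rad(np.linspace(0, 180, 721))
ks = (omega / c) * np.cos(thetas)       # candidate wavenumbers
# Correlate the data with the steering vector for each candidate k.
response = [abs(np.vdot(np.exp(1j * m * delta * k), data)) for k in ks]
print(np.rad2deg(thetas[np.argmax(response)]))   # close to 60 degrees
```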
Using the relation λω = 2πc, we can calculate the Nyquist spacing for
any particular case of planewave array processing. For electromagnetic
waves the propagation speed is the speed of light, which we shall take here
to be $c = 3\times 10^8$ meters per second. The wavelength $\lambda$ for gamma rays is around one Angstrom, which is $10^{-10}$ meters; for x-rays it is about one millimicron, or $10^{-9}$ meters. The visible spectrum has wavelengths that are a little less than one micron, that is, $10^{-6}$ meters. Microwaves have wavelengths around one millimeter; broadcast radio has a $\lambda$ running from
about 10 meters to 1000 meters, while the so-called long radio waves can
have wavelengths several thousand meters long. At the one extreme it is
impractical (if not physically impossible) to place individual sensors at the
Nyquist spacing of fractions of microns, while at the other end, managing
to place the sensors far enough apart is the challenge.
The wavelengths used in primitive early radar at the start of World War
II were several meters long. Since resolution is proportional to aperture,
which, in turn, is the length of the array, in units of wavelength, antennae
for such radar needed to be quite large. The general feeling at the time was
that the side with the shortest wavelength would win the war. The cavity

magnetron, invented during the war by British scientists, made possible 10


cm wavelength radar, which could then easily be mounted on planes.
In ocean acoustics it is usually assumed that the speed of propagation of sound is around 1500 meters per second; deviations from this ambient sound speed are significant, and, since they are caused by such things as temperature differences in the ocean, they can be used to estimate those differences. At around the frequency $\omega = 50$ Hz, we find sound generated
by man-made machinery, such as motors in vessels, with higher frequency
harmonics sometimes present also; at other frequencies the main sources of
acoustic energy may be wind-driven waves or whales. The wavelength for
50 Hz is λ = 30 meters; sonar will typically operate both above and below
this wavelength. It is sometimes the case that the array of sensors is fixed
in place, so what may be Nyquist spacing for 50 Hz will be oversampling
for 20 Hz.
We have focused here exclusively on planewave propagation, which results when the source is far enough away from the sensors and the speed of propagation is constant. In many important applications these conditions
are violated, and different versions of the wave equation are needed, which
have different solutions. For example, sonar signal processing in environ-
ments such as shallow channels, in which some of the sound reaches the
sensors only after interacting with the ocean floor or the surface, requires
more complicated parameterized models for solutions of the appropriate
wave equation. Lack of information about the depth and nature of the
bottom can also cause errors in the signal processing. In some cases it is
possible to use acoustic energy from known sources to determine the needed
information.
Array signal processing can be done in passive or active mode. In passive
mode the energy is either reflected off of or originates at the object of
interest: the moon reflects sunlight, while ships generate their own noise.
In the active mode the object of interest does not generate or reflect enough
energy by itself, so the energy is generated by the party doing the sensing:
active sonar is sometimes used to locate quiet vessels, while radar is used to
locate planes in the sky or to map the surface of the earth. In the February
2003 issue of Harper’s Magazine there is an article on scientific apocalypse,
dealing with the search for near-earth asteroids. These objects are initially
detected by passive optical observation, as small dots of reflected sunlight;
once detected, they are then imaged by active radar to determine their size,
shape, rotation and such.
Chapter 29

Matched Field Processing


(Chapter 10,11,12)

Previously we considered the array processing problem in the context of


planewave propagation. When the environment is more complicated, the
wave equation must be modified to reflect the physics of the situation and
the signal processing modified to incorporate that physics. A good example
of such modification is provided by acoustic signal processing in shallow
water, the topic of this chapter.

29.1 The Shallow-Water Case


In the shallow-water situation the acoustic energy from the source inter-
acts with the surface and with the bottom of the channel, prior to being
received by the sensors. The nature of this interaction is described by the
wave equation in cylindrical coordinates. The deviation from the ambient
pressure is the function p(t, s) = p(t, r, z, θ), where s = (r, z, θ) is the spa-
tial vector variable, r is the range, z the depth, and θ the bearing angle in
the horizontal. We assume a single frequency, ω, so that

$$p(t, s) = e^{i\omega t}g(r, z, \theta).$$

We shall assume cylindrical symmetry to remove the θ dependence; in many


applications the bearing is essentially known or limited by the environment
or can be determined by other means. The sensors are usually positioned
in a vertical array in the channel, with the top of the array taken to be
the origin of the coordinate system and positive z taken to mean positive
depth below the surface. We shall also assume that there is a single source
of acoustic energy located at range rs and depth zs .


To simplify a bit, we assume here that the sound speed c = c(z) does
not change with range, but only with depth, and that the channel has
constant depth and density. Then, the Helmholtz equation for the function
g(r, z) is
$$\nabla^2 g(r,z) + [\omega/c(z)]^2 g(r,z) = 0.$$
The Laplacian is
$$\nabla^2 g(r,z) = g_{rr}(r,z) + \frac{1}{r}g_r(r,z) + g_{zz}(r,z).$$
We separate the variables once again, writing

g(r, z) = f (r)u(z).

Then, the range function f(r) must satisfy the differential equation
$$f''(r) + \frac{1}{r}f'(r) = -\alpha f(r),$$
and the depth function u(z) satisfies the differential equation
$$u''(z) + k(z)^2u(z) = \alpha u(z),$$
where $\alpha$ is a separation constant and
$$k(z)^2 = [\omega/c(z)]^2.$$

Taking $\lambda^2 = \alpha$, the range equation becomes
$$f''(r) + \frac{1}{r}f'(r) + \lambda^2 f(r) = 0,$$
which is Bessel's equation, with Hankel-function solutions. The depth equation becomes
$$u''(z) + (k(z)^2 - \lambda^2)u(z) = 0,$$
which is of Sturm-Liouville type. The boundary conditions pertaining to
the surface and the channel bottom will determine the values of λ for which
a solution exists.
To illustrate the way in which the boundary conditions become involved,
we consider two examples.

29.2 The Homogeneous-Layer Model


We assume now that the channel consists of a single homogeneous layer of
water of constant density, constant depth d, and constant sound speed c.
We impose the following boundary conditions:

a. Pressure-release surface: u(0) = 0.

b. Rigid bottom: u'(d) = 0.

With $\gamma^2 = k^2 - \lambda^2$, we get $\cos(\gamma d) = 0$, so the permissible values of $\lambda$ are
$$\lambda_m = \big(k^2 - [(2m-1)\pi/2d]^2\big)^{1/2}, \quad m = 1, 2, ....$$
The normalized solutions of the depth equation are now
$$u_m(z) = \sqrt{2/d}\,\sin(\gamma_m z),$$
where
$$\gamma_m = \sqrt{k^2 - \lambda_m^2} = (2m-1)\pi/2d, \quad m = 1, 2, ....$$
For each m the corresponding function of the range satisfies the differential equation
$$f''(r) + \frac{1}{r}f'(r) + \lambda_m^2 f(r) = 0,$$
which has solution $H_0^{(1)}(\lambda_m r)$, where $H_0^{(1)}$ is the zeroth order Hankel-function solution of Bessel's equation. The asymptotic form for this function is
$$\pi i H_0^{(1)}(\lambda_m r) = \sqrt{2\pi/\lambda_m r}\,\exp\Big(-i\big(\lambda_m r + \frac{\pi}{4}\big)\Big).$$
It is this asymptotic form that is used in practice. Note that when λm is
complex with a negative imaginary part, there will be a decaying exponen-
tial in this solution, so this term will be omitted in the signal processing.
Having found the range and depth functions, we write g(r, z) as a superposition of these elementary products, called the modes:
$$g(r, z) = \sum_{m=1}^M A_m H_0^{(1)}(\lambda_m r)u_m(z),$$

where M is the number of propagating modes free of decaying exponentials.


The $A_m$ can be found from the original Helmholtz equation; they are
$$A_m = (i/4)u_m(z_s),$$

where zs is the depth of the source of the acoustic energy. Notice that
the depth of the source also determines the strength of each mode in this
superposition; this is described by saying that the source has excited certain
modes and not others.
The eigenvalues $\lambda_m$ of the depth equation will be complex when
$$k = \frac{\omega}{c} < \frac{(2m-1)\pi}{2d}.$$

If $\omega$ is below the cut-off frequency $\frac{\pi c}{2d}$, then all the $\lambda_m$ are complex and there are no propagating modes (M = 0). The number of propagating modes is the integer part of
$$\frac{1}{2} + \frac{\omega d}{\pi c},$$
that is, of $\frac{1}{2}$ plus the depth of the channel in units of half-wavelengths.
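A small sketch of this mode count and of the corresponding horizontal wavenumbers $\lambda_m$; the depth, sound speed, and frequency below are example values of ours.

```python
# Propagating modes in the homogeneous-layer model.
import numpy as np

c, d = 1500.0, 100.0           # sound speed (m/s) and channel depth (m)
omega = 2 * np.pi * 50.0       # a 50 Hz source
k = omega / c

M = int(0.5 + omega * d / (np.pi * c))   # integer part of 1/2 + depth in half-wavelengths
m = np.arange(1, M + 1)
lam = np.sqrt(k**2 - ((2 * m - 1) * np.pi / (2 * d))**2)
print(M, lam)                  # real lambda_m for the M propagating modes
```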
This model for shallow-water propagation is helpful in revealing a num-
ber of the important aspects of modal propagation, but is of limited prac-
tical utility. A more useful and realistic model is the Pekeris waveguide.

29.3 The Pekeris Waveguide


Now we assume that the water column has constant depth d, sound speed c, and density b. Beneath the water is an infinite half-space with sound speed $c' > c$, and density $b'$. Figure 29.1 illustrates the situation.
Using the new depth variable $v = \frac{\omega z}{c}$, the depth equation becomes
$$u''(v) + \lambda^2 u(v) = 0, \quad \text{for } 0 \le v \le \frac{\omega d}{c},$$
and
$$u''(v) + \Big(\big(\tfrac{c}{c'}\big)^2 - 1 + \lambda^2\Big)u(v) = 0, \quad \text{for } \frac{\omega d}{c} < v.$$
To have a solution, $\lambda$ must satisfy the equation
$$\tan(\lambda\omega d/c) = -(\lambda b/b')\Big/\sqrt{1 - \big(\tfrac{c}{c'}\big)^2 - \lambda^2},$$
with
$$1 - \big(\tfrac{c}{c'}\big)^2 - \lambda^2 \ge 0.$$
The trapped modes are those whose corresponding $\lambda$ satisfies
$$1 \ge 1 - \lambda^2 \ge \big(\tfrac{c}{c'}\big)^2.$$
The eigenfunctions are
$$u_m(v) = \sin(\lambda_m v), \quad \text{for } 0 \le v \le \frac{\omega d}{c},$$
and
$$u_m(v) = \exp\Big(-v\sqrt{1 - \big(\tfrac{c}{c'}\big)^2 - \lambda_m^2}\Big), \quad \text{for } \frac{\omega d}{c} < v.$$
Although the Pekeris model has its uses, it still may not be realistic enough
in some cases and more complicated propagation models will be needed.

29.4 The General Normal-Mode Model


Regardless of the model by which the modal functions are determined, the
general normal-mode expansion for the range-independent case is
$$g(r, z) = \sum_{m=1}^M u_m(z)s_m(r, z_s),$$

where M is the number of propagating modes and sm (r, zs ) is the modal


amplitude containing all the information about the source of the sound.

29.4.1 Matched-Field Processing


In planewave array processing we write the acoustic field as a superposition
of planewave fields and try to find the corresponding amplitudes. This
can be done using a matched filter, although high-resolution methods can
also be used. In the matched-filter approach, we fix a wavevector and then
match the data with the vector that describes what we would have received
at the sensors had there been but a single planewave present corresponding
to that fixed wavevector; we then repeat for other fixed wavevectors. In
more complicated acoustic environments, such as normal-mode propagation
in shallow water, we write the acoustic field as a superposition of fields due
to sources of acoustic energy at individual points in range and depth and
then seek the corresponding amplitudes. Once again, this can be done
using a matched filter.
In matched-field processing we fix a particular range and depth and
compute what we would have received at the sensors had the acoustic field
been generated solely by a single source at that location. We then match the
data with this computed vector. We repeat this process for many different
choices of range and depth, obtaining a function of r and z showing the
likely locations of actual sources. As in the planewave case, high-resolution
nonlinear methods can also be used.

Figure 29.1: The Pekeris Model.


Part III

Appendices

Chapter 30

Inner Products and


Orthogonality

30.1 The Complex Vector Dot Product


An inner product is a generalization of the notion of the dot product be-
tween two complex vectors.

30.1.1 The Two-Dimensional Case


Let u = (a, b) and v = (c, d) be two vectors in two-dimensional space. Let u make the angle $\alpha > 0$ with the positive x-axis and v the angle $\beta > 0$. Let $||u|| = \sqrt{a^2 + b^2}$ denote the length of the vector u. Then $a = ||u||\cos\alpha$, $b = ||u||\sin\alpha$, $c = ||v||\cos\beta$ and $d = ||v||\sin\beta$. So $u\cdot v = ac + bd = ||u||\,||v||(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = ||u||\,||v||\cos(\alpha - \beta)$. Therefore, we have
$$u\cdot v = ||u||\,||v||\cos\theta, \qquad (30.1)$$

where θ = α − β is the angle between u and v. Cauchy’s inequality is

|u · v| ≤ ||u|| ||v||,

with equality if and only if u and v are parallel. From Equation (30.1) we
know that the dot product u · v is zero if and only if the angle between
these two vectors is a right angle; we say then that u and v are mutually
orthogonal.
Cauchy's inequality extends to complex vectors u and v, with
$$u\cdot v = \sum_{n=1}^N u_n\overline{v_n}, \qquad (30.2)$$


and Cauchy’s Inequality still holds.

Proof of Cauchy’s inequality: To prove Cauchy’s inequality for the


complex vector dot product, we write $u\cdot v = |u\cdot v|e^{i\theta}$. Let t be a real variable and consider
$$0 \le ||e^{-i\theta}u - tv||^2 = (e^{-i\theta}u - tv)\cdot(e^{-i\theta}u - tv)$$
$$= ||u||^2 - t[(e^{-i\theta}u)\cdot v + v\cdot(e^{-i\theta}u)] + t^2||v||^2$$
$$= ||u||^2 - t[(e^{-i\theta}u)\cdot v + \overline{(e^{-i\theta}u)\cdot v}] + t^2||v||^2$$
$$= ||u||^2 - 2\mathrm{Re}\big(te^{-i\theta}(u\cdot v)\big) + t^2||v||^2$$
$$= ||u||^2 - 2\mathrm{Re}\big(t|u\cdot v|\big) + t^2||v||^2 = ||u||^2 - 2t|u\cdot v| + t^2||v||^2.$$


This is a nonnegative quadratic polynomial in the variable t, so it can-
not have two distinct real roots. Therefore, the discriminant 4|u · v|2 −
4||v||2 ||u||2 must be non-positive; that is, |u · v|2 ≤ ||u||2 ||v||2 . This is
Cauchy’s inequality.
A careful examination of the proof just presented shows that we did not
explicitly use the definition of the complex vector dot product, but only
some of its properties. This suggested to mathematicians the possibility of
abstracting these properties and using them to define a more general con-
cept, an inner product, between objects more general than complex vectors,
such as infinite sequences, random variables, and matrices. Such an inner
product can then be used to define the norm of these objects and thereby a
distance between such objects. Once we have an inner product defined, we
also have available the notions of orthogonality and best approximation.

30.1.2 Orthogonality
Consider the problem of writing the two-dimensional real vector (3, −2) as
a linear combination of the vectors (1, 1) and (1, −1); that is, we want to
find constants a and b so that (3, −2) = a(1, 1) + b(1, −1). One way to do
this, of course, is to compare the components: 3 = a + b and −2 = a − b;
we can then solve this simple system for the a and b. In higher dimensions
this way of doing it becomes harder, however. A second way is to make
use of the dot product and orthogonality.
The dot product of two vectors (x, y) and (w, z) in R2 is (x, y) · (w, z) =
xw+yz. If the dot product is zero then the vectors are said to be orthogonal;
the two vectors (1, 1) and (1, −1) are orthogonal. We take the dot product
of both sides of (3, −2) = a(1, 1) + b(1, −1) with (1, 1) to get

1 = (3, −2) · (1, 1) = a(1, 1) · (1, 1) + b(1, −1) · (1, 1) = a(1, 1) · (1, 1) + 0 = 2a,

so we see that $a = \frac{1}{2}$. Similarly, taking the dot product of both sides with (1, −1) gives
$$5 = (3, -2)\cdot(1, -1) = a(1, 1)\cdot(1, -1) + b(1, -1)\cdot(1, -1) = 2b,$$
so $b = \frac{5}{2}$. Therefore, $(3, -2) = \frac{1}{2}(1, 1) + \frac{5}{2}(1, -1)$. The beauty of this


approach is that it does not get much harder as we go to higher dimensions.
Since the cosine of the angle $\theta$ between vectors u and v is
$$\cos\theta = \frac{u\cdot v}{||u||\,||v||},$$
where $||u||^2 = u\cdot u$, the projection of vector v on to the line through the origin parallel to u is
$$\mathrm{Proj}_u(v) = \frac{u\cdot v}{u\cdot u}\,u.$$
Therefore, the vector v can be written as

$$v = \mathrm{Proj}_u(v) + (v - \mathrm{Proj}_u(v)),$$

where the first term on the right is parallel to u and the second one is
orthogonal to u.
How do we find vectors that are mutually orthogonal? Suppose we
begin with (1, 1). Take a second vector, say (1, 2), that is not parallel to
(1, 1) and write it as we did v earlier, that is, as a sum of two vectors,
one parallel to (1, 1) and the second orthogonal to (1, 1). The projection
of (1, 2) onto the line parallel to (1, 1) passing through the origin is

$$\frac{(1,1)\cdot(1,2)}{(1,1)\cdot(1,1)}(1,1) = \frac{3}{2}(1,1) = \Big(\frac{3}{2},\frac{3}{2}\Big),$$
so
$$(1,2) = \Big(\frac{3}{2},\frac{3}{2}\Big) + \Big((1,2) - \Big(\frac{3}{2},\frac{3}{2}\Big)\Big) = \Big(\frac{3}{2},\frac{3}{2}\Big) + \Big(-\frac{1}{2},\frac{1}{2}\Big).$$
The vectors $(-\frac{1}{2},\frac{1}{2}) = -\frac{1}{2}(1,-1)$ and, therefore, (1, −1) are then orthogonal to (1, 1). This approach is the basis for the Gram-Schmidt method for constructing a set of mutually orthogonal vectors.

30.2 Generalizing the Dot Product: Inner Products
The proof of Cauchy’s Inequality rests not on the actual definition of the
complex vector dot product, but rather on four of its most basic properties.
We use these properties to extend the concept of the complex vector dot
product to that of inner product. Later in this chapter we shall give several
examples of inner products, applied to a variety of mathematical objects,

including infinite sequences, functions, random variables, and matrices.


For now, let us denote our mathematical objects by u and v and the inner
product between them as $\langle u, v\rangle$. The objects will then be said to be
members of an inner-product space. We are interested in inner products
because they provide a notion of orthogonality, which is fundamental to
best approximation and optimal estimation.

30.2.1 Defining an Inner Product and Norm


The four basic properties that will serve to define an inner product are:

• 1: $\langle u, u\rangle \ge 0$, with equality if and only if u = 0;

• 2: $\langle v, u\rangle = \overline{\langle u, v\rangle}$;

• 3: $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$;

• 4: $\langle cu, v\rangle = c\langle u, v\rangle$ for any complex number c.

The inner product is the basic ingredient in Hilbert space theory. Using
the inner product, we define the norm of u to be
$$||u|| = \sqrt{\langle u, u\rangle}$$

and the distance between u and v to be ||u − v||.

The Cauchy-Schwarz inequality: Because these four properties were


all we needed to prove the Cauchy inequality for the complex vector dot
product, we obtain the same inequality whenever we have an inner product.
This more general inequality is the Cauchy-Schwarz inequality:
$$|\langle u, v\rangle| \le \sqrt{\langle u, u\rangle}\,\sqrt{\langle v, v\rangle},$$
or
$$|\langle u, v\rangle| \le ||u||\,||v||,$$

with equality if and only if there is a scalar c such that v = cu. We say
that the vectors u and v are orthogonal if $\langle u, v\rangle = 0$. We turn now to
some examples.

30.2.2 Some Examples of Inner Products


Here are several examples of inner products.

• Inner product of infinite sequences: Let $u = \{u_n\}$ and $v = \{v_n\}$ be infinite sequences of complex numbers. The inner product is then
$$\langle u, v\rangle = \sum u_n\overline{v_n},$$
and
$$||u|| = \sqrt{\sum |u_n|^2}.$$
The sums are assumed to be finite; the index of summation n is singly or doubly infinite, depending on the context. The Cauchy-Schwarz inequality says that
$$\Big|\sum u_n\overline{v_n}\Big| \le \sqrt{\sum |u_n|^2}\,\sqrt{\sum |v_n|^2}.$$

• Inner product of functions: Now suppose that u = f(x) and v = g(x). Then,
$$\langle u, v\rangle = \int f(x)\overline{g(x)}\,dx$$
and
$$||u|| = \sqrt{\int |f(x)|^2\,dx}.$$
The integrals are assumed to be finite; the limits of integration depend on the support of the functions involved. The Cauchy-Schwarz inequality now says that
$$\Big|\int f(x)\overline{g(x)}\,dx\Big| \le \sqrt{\int |f(x)|^2\,dx}\,\sqrt{\int |g(x)|^2\,dx}.$$

• Inner product of random variables: Now suppose that u = X and v = Y are random variables. Then,
$$\langle u, v\rangle = E(XY)$$
and
$$||u|| = \sqrt{E(|X|^2)},$$
which is the standard deviation of X if the mean of X is zero. The expected values are assumed to be finite. The Cauchy-Schwarz inequality now says that
$$|E(XY)| \le \sqrt{E(|X|^2)}\,\sqrt{E(|Y|^2)}.$$

If E(X) = 0 and E(Y ) = 0, the random variables X and Y are


orthogonal if and only if they are uncorrelated.

• Inner product of complex matrices: Now suppose that u = A and v = B are complex matrices. Then,
$$\langle u, v\rangle = \mathrm{trace}(B^\dagger A)$$
and
$$||u|| = \sqrt{\mathrm{trace}(A^\dagger A)},$$
where the trace of a square matrix is the sum of the entries on the main diagonal. As we shall see later, this inner product is simply the complex vector dot product of the vectorized versions of the matrices involved. The Cauchy-Schwarz inequality now says that
$$|\mathrm{trace}(B^\dagger A)| \le \sqrt{\mathrm{trace}(A^\dagger A)}\,\sqrt{\mathrm{trace}(B^\dagger B)}.$$

• Weighted inner product of complex vectors: Let u and v be complex vectors and let Q be a Hermitian positive-definite matrix; that is, $Q^\dagger = Q$ and $u^\dagger Qu > 0$ for all nonzero vectors u. The inner product is then
$$\langle u, v\rangle = v^\dagger Qu$$
and
$$||u|| = \sqrt{u^\dagger Qu}.$$
We know from the eigenvector decomposition of Q that $Q = C^\dagger C$ for some matrix C. Therefore, the inner product is simply the complex vector dot product of the vectors Cu and Cv. The Cauchy-Schwarz inequality says that
$$|v^\dagger Qu| \le \sqrt{u^\dagger Qu}\,\sqrt{v^\dagger Qv}.$$

• Weighted inner product of functions: Now suppose that u = f(x) and v = g(x), and let w(x) > 0. Then define
$$\langle u, v\rangle = \int f(x)\overline{g(x)}w(x)\,dx$$
and
$$||u|| = \sqrt{\int |f(x)|^2 w(x)\,dx}.$$
The integrals are assumed to be finite; the limits of integration depend on the support of the functions involved. This inner product is simply the inner product of the functions $f(x)\sqrt{w(x)}$ and $g(x)\sqrt{w(x)}$. The Cauchy-Schwarz inequality now says that
$$\Big|\int f(x)\overline{g(x)}w(x)\,dx\Big| \le \sqrt{\int |f(x)|^2 w(x)\,dx}\,\sqrt{\int |g(x)|^2 w(x)\,dx}.$$

Once we have an inner product defined, we can speak about orthogonality


and best approximation. Important in that regard is the orthogonality
principle.

30.3 Best Approximation and the Orthogonality Principle
Imagine that you are standing and looking down at the floor. The point
B on the floor that is closest to N , the tip of your nose, is the unique
point on the floor such that the vector from B to any other point A on the
floor is perpendicular to the vector from N to B; that is, $\langle BN, BA\rangle = 0$.
This is a simple illustration of the orthogonality principle. Whenever we
have an inner product defined we can speak of orthogonality and apply
the orthogonality principle to find best approximations. For notational
simplicity, we shall consider only real inner product spaces.

30.3.1 Best Approximation


Let u and $v_1, ..., v_N$ be members of a real inner-product space. For all choices of scalars $a_1, ..., a_N$, we can compute the distance from u to the member $a_1v_1 + \cdots + a_Nv_N$. Then, we minimize this distance over all choices of the scalars; let $b_1, ..., b_N$ be this best choice.
The distance squared from u to $a_1v_1 + \cdots + a_Nv_N$ is
$$||u - (a_1v_1 + \cdots + a_Nv_N)||^2 = \langle u - (a_1v_1 + \cdots + a_Nv_N),\, u - (a_1v_1 + \cdots + a_Nv_N)\rangle$$
$$= ||u||^2 - 2\sum_{n=1}^N a_n\langle u, v_n\rangle + \sum_{n=1}^N\sum_{m=1}^N a_n a_m\langle v_n, v_m\rangle.$$

Setting the partial derivative with respect to $a_n$ equal to zero, we have
$$\langle u, v_n\rangle = \sum_{m=1}^N b_m\langle v_m, v_n\rangle.$$
With $b = (b_1, ..., b_N)^T$,
$$d = (\langle u, v_1\rangle, ..., \langle u, v_N\rangle)^T,$$
and V the matrix with entries
$$V_{mn} = \langle v_m, v_n\rangle,$$
we find that we must solve the system of equations $Vb = d$. When the vectors $v_n$ are mutually orthogonal and each has norm equal to one, then $V = I$, the identity matrix, and the desired vector b is simply d.

30.3.2 The Orthogonality Principle


The orthogonality principle provides another way to view the calculation of the best approximation: let the best approximation of u be the vector $b_1v_1 + \cdots + b_Nv_N$. Then
$$\langle u - (b_1v_1 + \cdots + b_Nv_N),\, v_n\rangle = 0,$$
for n = 1, 2, ..., N. This leads directly to the system of equations
$$d = Vb,$$
which, as we just saw, provides the optimal coefficients.
To see why the orthogonality principle is valid, fix a value of n and consider the problem of minimizing the distance
$$||u - (b_1v_1 + \cdots + b_Nv_N + \alpha v_n)||$$
as a function of $\alpha$. Writing the norm squared in terms of the inner product, expanding the terms, and differentiating with respect to $\alpha$, we find that the minimum occurs when
$$\alpha = \langle u - (b_1v_1 + \cdots + b_Nv_N),\, v_n\rangle.$$
But we already know that the minimum occurs when $\alpha = 0$. This completes the proof of the orthogonality principle.
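A small numerical sketch of the system Vb = d and of the orthogonality of the residual, using the ordinary dot product on $R^3$ and invented vectors:

```python
# Best approximation of u in the span of v1, v2 via the normal equations.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
vs = [np.array([1.0, 1.0, 0.0]), np.array([0.0, 1.0, 1.0])]

V = np.array([[vm @ vn for vn in vs] for vm in vs])   # Gram matrix V
d = np.array([u @ vn for vn in vs])                   # d_n = <u, v_n>
b = np.linalg.solve(V, d)

approx = sum(bn * vn for bn, vn in zip(b, vs))
# Orthogonality principle: the residual is orthogonal to each v_n.
print([round(float((u - approx) @ vn), 12) for vn in vs])   # zeros
```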

30.4 Gram-Schmidt Orthogonalization


We have seen that the best approximation is easily calculated if the vectors $v^n$ are mutually orthogonal. But how do we get such a mutually orthogonal set, in general? The Gram-Schmidt Orthogonalization Method is one way to proceed.
Let $\{v^1, ..., v^N\}$ be a linearly independent set of vectors in the space $R^M$, where $N \le M$. The Gram-Schmidt method uses the $v^n$ to create an orthogonal basis $\{u^1, ..., u^N\}$ for the span of the $v^n$. Begin by taking $u^1 = v^1$. For $j = 2, ..., N$, let
$$u^j = v^j - \frac{u^1\cdot v^j}{u^1\cdot u^1}u^1 - \cdots - \frac{u^{j-1}\cdot v^j}{u^{j-1}\cdot u^{j-1}}u^{j-1}. \qquad (30.3)$$
One obvious problem with this approach is that the calculations become
increasingly complicated and lengthy as the j increases. In many of the
important examples of orthogonal functions that we study in connection
with Sturm-Liouville problems, there is a two-term recursive formula that
enables us to generate the next orthogonal function from the two previous
ones.
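For comparison, here is a direct implementation of the Gram-Schmidt step (30.3), assuming linearly independent numpy arrays as input:

```python
# Gram-Schmidt orthogonalization using the ordinary dot product.
import numpy as np

def gram_schmidt(vs):
    us = []
    for v in vs:
        # Subtract the projections of v onto the previously built u's.
        u = v - sum((w @ v) / (w @ w) * w for w in us)
        us.append(u)
    return us

vs = [np.array([1.0, 1.0, 0.0]),
      np.array([1.0, 2.0, 0.0]),
      np.array([0.0, 1.0, 1.0])]
us = gram_schmidt(vs)
print([round(float(us[i] @ us[j]), 12)
       for i in range(3) for j in range(i + 1, 3)])   # all zeros
```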
Chapter 31

Chaos

31.1 The Discrete Logistics Equation


Up to now, our study of differential equations has focused on linear ones,
either ordinary or partial. These constitute only a portion of the differential
equations of interest to applied mathematics. In this chapter we look briefly
at a non-linear ordinary differential equation, the logistics equation, and at
its discrete version. We then consider the iterative sequence, or dynamical
system, associated with this discrete logistics equation, and its relation to
chaos theory. The best introduction to chaos theory is the book by James
Gleick [18].
To illustrate the role of iteration in chaos theory, consider the simplest
differential equation describing population dynamics:

$$p'(t) = ap(t), \qquad (31.1)$$

with exponential solutions. More realistic models impose limits to growth,


and may take the form

$$p'(t) = a(L - p(t))p(t), \qquad (31.2)$$

where L is an asymptotic limit for p(t). The solution to Equation (31.2) is


$$p(t) = \frac{p(0)L}{p(0) + (L - p(0))\exp(-aLt)}. \qquad (31.3)$$
Discrete versions of the limited-population problem then have the form

$$x_{k+1} - x_k = a(L - x_k)x_k, \qquad (31.4)$$


which, for $z_k = \frac{a}{1+aL}x_k$, can be written as
$$z_{k+1} = r(1 - z_k)z_k; \qquad (31.5)$$


we shall assume that r = 1 + aL > 1. With T z = r(1 − z)z = f (z) and


zk+1 = T zk , we are interested in the behavior of the sequence, as a function
of r.
Figure 31.1 shows the graphs of the functions $y = f(x) = r(1-x)x$, for a fixed value of r, and $y = x$. The maximum of f(x) occurs at $x = 0.5$ and the maximum value is $f(0.5) = \frac{r}{4}$. In the Figure, r looks to be about 3.8. Figure 31.2 displays the iterations for several values of r, called $\lambda$ in the figures.

31.2 Fixed Points


The operator T has a fixed point at $z_* = 0$, for every value of r, and another fixed point at $z_* = 1 - \frac{1}{r}$, if r > 1. From the Mean-Value Theorem we know that
$$z_* - z_{k+1} = f(z_*) - f(z_k) = f'(c_k)(z_* - z_k), \qquad (31.6)$$
for some $c_k$ between $z_*$ and $z_k$. If $z_k$ is sufficiently close to $z_*$, then $c_k$ will be even closer to $z_*$ and $f'(c_k)$ can be approximated by $f'(z_*)$.
In order for f (x) to be a mapping from [0, 1] to [0, 1] it is necessary and
sufficient that r ≤ 4. For r > 4, it is interesting to ask for which starting
points z0 does the sequence of iterates remain within [0, 1].

31.3 Stability
A fixed point $z_*$ of f(z) is said to be stable if $|f'(z_*)| < 1$, where $f'(z_*) = r(1 - 2z_*)$. Since we are assuming that r > 1, the fixed point $z_* = 0$ is unstable. The point $z_* = 1 - \frac{1}{r}$ is stable if 1 < r < 3. When $z_*$ is a stable fixed point, and $z_k$ is sufficiently close to $z_*$, we have
$$|z_* - z_{k+1}| < |z_* - z_k|, \qquad (31.7)$$
so we get closer to $z_*$ with each iterative step. Such a fixed point is attractive. In fact, if r = 2, $z_* = 1 - \frac{1}{r} = \frac{1}{2}$ is superstable and convergence is quite rapid, since $f'(\frac{1}{2}) = 0$. We can see from Figure 31.3 that, for 1 < r < 3, the iterative sequence $\{z_k\}$ has the single limit point $z_* = 1 - \frac{1}{r}$.
What happens beyond r = 3 is more interesting. For r > 3 the fixed point $z_* = 1 - \frac{1}{r}$ is no longer attracting, so all the fixed points are repelling.
What can the sequence {zk } do in such a case? As we see from Figure
31.3 and the close-up in Figure 31.5, for values of r from 3 to about 3.45,
the sequence {zk } eventually oscillates between two subsequential limits;
the sequence is said to have period two. Then period doubling occurs. For
values of r from about 3.45 to about 3.54, the sequence {zk } has period
four, that is, the sequence eventually oscillates among four subsequential

limit points. Then, as r continues to increase, period doubling happens


again, and again, and again, each time for smaller increases in r than for
the previous doubling. Remarkably, there is a value of r prior to which
infinitely many period doublings have taken place and after which chaos
ensues.

31.4 Periodicity
For 1 < r < 3 the fixed point $z_* = 1 - \frac{1}{r}$ is stable and is an attracting fixed point. For r > 3, the fixed point $z_*$ is no longer attracting; if $z_k$ is near $z_*$ then $z_{k+1}$ will be farther away.
Using the change of variable $x = -rz + \frac{r}{2}$, the iteration in Equation (31.5) becomes
$$x_{k+1} = x_k^2 + \Big(\frac{r}{2} - \frac{r^2}{4}\Big), \qquad (31.8)$$
and the fixed points become $x_* = \frac{r}{2}$ and $x_* = 1 - \frac{r}{2}$.
For r = 3.835 there is a starting point x0 for which the iterates are
periodic with period three, which implies, according to the results of Li
and Yorke, that there are periodic orbits with period n, for all positive
integers n [33].

31.5 Sensitivity to the Starting Value


Using Equation (31.8), the iteration for r = 4 can be written as

$$x_{k+1} = x_k^2 - 2. \qquad (31.9)$$

In [8] Burger and Starbird illustrate the sensitivity of this iterative scheme
to the choice of x0 . The numbers in the first column of Figure 31.6 were
generated by Excel using Equation (31.9) and starting value x0 = 0.5. To
form the second column, the authors retyped the first twelve entries of the
first column, exactly as shown on the page, and then let Excel proceed to
calculate the remaining ones. Obviously, the two columns become quite
different, as the iterations proceed. Why? The answer lies in sensitivity of
the iteration to initial conditions.
When Excel generated the first column, it kept more digits at each
step than it displayed. Therefore, Excel used more digits to calculate the
thirteenth item in the first column than just what is displayed as the twelfth
entry. When the twelfth entry, exactly as displayed, was used to generate
the thirteenth entry of the second column, those extra digits were not
available to Excel. This slight difference, beginning in the tenth decimal
place, was enough to cause the observed difference in the two tables.
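The same effect is easy to reproduce without Excel. The sketch below iterates Equation (31.9) in double precision, then restarts a second trajectory from the twelfth iterate rounded to ten decimal places, mimicking the retyped column.

```python
# Sensitivity of x_{k+1} = x_k^2 - 2 to a change in the tenth decimal place.
x = 0.5
for k in range(12):
    x = x * x - 2
y = round(x, 10)      # the "retyped" twelfth entry, extra digits dropped
for k in range(40):
    x = x * x - 2
    y = y * y - 2
print(x, y)           # the two trajectories are now completely different
```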

For r > 4 the set of starting points in [0, 1] for which the sequence of
iterates never leaves [0, 1] is a Cantor set, which is a fractal. The book by
Devaney [12] gives a rigorous treatment of these topics; Young’s book [48]
contains a more elementary discussion of some of the same notions.

31.6 Plotting the Iterates


Figure 31.4 shows the values of $z_1$ (red), $z_2$ (yellow), $z_3$ (green), $z_4$ (blue), and $z_5$ (violet), for each $z_0$ in the interval [0, 1], for the four values r = 1, 2, 3, 4. For r = 1, we have $z_{k+1} = z_k - z_k^2 < z_k$, for all $z_k > 0$, so that the only limit is zero. For r = 2, $z_* = 0.5$ is the only attractive fixed point and is the limit, for all $z_0$ in (0, 1). For r = 3 we see the beginnings of instability, while by r = 4 chaos reigns.

31.7 Filled Julia Sets


The xk in the iteration in Equation (31.8) are real numbers, but the iter-
ation can be applied to complex numbers as well. For each fixed complex
number c, consider the iterative sequence beginning at z0 = 0, with

$$z_{k+1} = z_k^2 + c. \qquad (31.10)$$

We say that the sequence {zk } is bounded if there is a constant B such that
|zk | ≤ B, for all k. We want to know for which c the sequence generated
by Equation (31.10) is bounded.
In Figure 31.7 those c for which the iterative sequence {zk } is bounded
are in the black Mandelbrot set, and those c for which the sequence is
not bounded are in the white set. It is not apparent from the figure, but
when we zoom in, we find the entire figure repeated on a smaller scale. As we continue to zoom in, the figure reappears again and again, each time smaller than before.
There is a theorem that tells us that if $|z_k| \ge 1 + \sqrt{2}$ for some k, then the sequence is not bounded. Therefore, if c is in the white set, we will know
this for certain after we have computed finitely many iterates. Such sets are
sometimes called recursively enumerable. However, there does not appear
to be an algorithm that will tell us when c is in the black set. The situation
is described by saying that the black set, often called the Mandelbrot set,
is non-recursive.
Previously, we were interested in what happens as we change c, but
start the iteration at z0 = 0 each time. We could modify the problem
slightly, using only a single value of c, but then starting at arbitrary points
z0 . Those z0 for which the sequence is bounded form the new black set,
called the filled Julia set associated with the function f (z) = z 2 + c.

For much more on this subject and related ideas, see the book by Roger
Penrose [38].
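The escape-time test suggested by this theorem is simple to sketch; the two sample values of c below are our own.

```python
# Escape-time test: once |z_k| >= 1 + sqrt(2), the sequence is certainly
# unbounded, so c lies in the white set.
import numpy as np

def escapes(c, max_iter=100):
    z, bound = 0j, 1 + np.sqrt(2)
    for k in range(max_iter):
        z = z * z + c
        if abs(z) >= bound:
            return k       # certainly unbounded
    return None            # undecided: c may belong to the Mandelbrot set

print(escapes(1 + 1j), escapes(-1 + 0j))   # escapes quickly; never escapes
```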

Figure 31.1: The graphs of y = r(1 − x)x and y = x.

31.8 The Newton-Raphson Algorithm


The well known Newton-Raphson (NR) iterative algorithm is used to find
a root of a function g : R → R.

Algorithm 31.1 (Newton-Raphson) Let x0 ∈ R be arbitrary. Having


calculated xk , let

$$x_{k+1} = x_k - g(x_k)/g'(x_k). \qquad (31.11)$$



The operator T is now the ordinary function

$$Tx = x - g(x)/g'(x). \qquad (31.12)$$

If g is a vector-valued function, $g: R^J \to R^J$, then g(x) has the form $g(x) = (g_1(x), ..., g_J(x))^T$, where $g_j: R^J \to R$ are the component functions of g(x). The NR algorithm is then as follows:

Algorithm 31.2 (Newton-Raphson) Let $x^0 \in R^J$ be arbitrary. Having calculated $x^k$, let
$$x^{k+1} = x^k - [\mathcal{J}(g)(x^k)]^{-1}g(x^k). \qquad (31.13)$$

Here $\mathcal{J}(g)(x)$ is the Jacobian matrix of first partial derivatives of the component functions of g; that is, its entries are $\frac{\partial g_m}{\partial x_j}(x)$. The operator T is now
$$Tx = x - [\mathcal{J}(g)(x)]^{-1}g(x). \qquad (31.14)$$

Convergence of the NR algorithm is not guaranteed and depends on the


starting point being sufficiently close to a solution. When it does converge,
however, it does so fairly rapidly. In both the scalar and vector cases, the
limit is a fixed point of T , and therefore a root of g(x).

31.9 Newton-Raphson and Chaos


It is interesting to consider how the behavior of the NR iteration depends
on the starting point.

31.9.1 A Simple Case


The complex-valued function $f(z) = z^2 - 1$ of the complex variable z has two roots, z = 1 and z = −1. The NR method for finding a root now has the iterative step
$$z_{k+1} = Tz_k = \frac{z_k}{2} + \frac{1}{2z_k}. \qquad (31.15)$$

If z0 is selected closer to z = 1 than to z = −1 then the iterative


sequence converges to z = 1; similarly, if z0 is closer to z = −1, the limit is
z = −1. If z0 is on the vertical axis of points with real part equal to zero,
then the sequence does not converge, and is not even defined for z0 = 0.
This axis separates the two basins of attraction of the algorithm.

31.9.2 A Not-So-Simple Case


Now consider the function $f(z) = z^3 - 1$, which has the three roots z = 1, $z = \omega = e^{2\pi i/3}$, and $z = \omega^2 = e^{4\pi i/3}$. The NR method for finding a root now has the iterative step
$$z_{k+1} = Tz_k = \frac{2z_k}{3} + \frac{1}{3z_k^2}. \qquad (31.16)$$

Where are the basins of attraction now? Is the complex plane divided up
as three people would divide a pizza, into three wedge-shaped slices, each
containing one of the roots? Far from it, as Figure 31.8 shows. In this
figure the color of a point indicates the root to which the iteration will
converge, if it is started at that point. In fact, it can be shown that, if the
sequence starting at z0 = a converges to z = 1 and the sequence starting
at z0 = b converges to ω, then there is a starting point z0 = c, closer to a
than b is, whose sequence converges to ω 2 . For more details, see Schroeder’s
delightful book [41].
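A small sketch of the iteration (31.16) that classifies a starting point by the root to which it converges; the starting points below are arbitrary examples.

```python
# Newton-Raphson for f(z) = z^3 - 1: which cube root of unity do we reach?
import numpy as np

roots = [1, np.exp(2j * np.pi / 3), np.exp(4j * np.pi / 3)]

def nr_basin(z, iters=50):
    for _ in range(iters):
        z = 2 * z / 3 + 1 / (3 * z * z)   # the iteration (31.16)
    return min(range(3), key=lambda i: abs(z - roots[i]))

print(nr_basin(2 + 0j), nr_basin(-1 + 1j), nr_basin(-1 - 1j))
```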

31.10 The Cantor Game


The Cantor Game is played as follows. Select a starting point x0 in the
interior of the unit interval [0, 1]. Let a be the end point closer to x0 , with
a = 0 if there is a tie. Let x1 be the point whose distance from a is three
times the distance from a to x0 . Now repeat the process, with x1 replacing
x0 . In order to win the game, we must select a starting point x0 in such
a way that all the subsequent points xn remain within the interior of the
interval [0, 1]. Where are the winning starting points?

31.11 The Sir Pinski Game


In [41] Schroeder discusses several iterative sequences that lead to fractals
or chaotic behavior. The Sir Pinski Game, a two-dimensional variant of the
Cantor Game, has the following rules. Let P0 be a point chosen arbitrarily
within the interior of the equilateral triangle with vertices (1, 0, 0), (0, 1, 0)
and (0, 0, 1). Let V be the vertex closest to P0 and P1 chosen so that P0
is the midpoint of the line segment V P1 . Repeat the process, with P1 in
place of P0 . The game is lost when Pn falls outside the original triangle.
The objective of the game is to select a P0 that allows the player to win the game. Where are these winning points?
The inverse Sir Pinski Game is similar. Select any point P0 in the plane of the equilateral triangle, let V be the most distant vertex, and let P1 be the midpoint of the line segment P0 V. Replace P0 with P1 and repeat the

procedure. The resulting sequence of points is convergent. Which points are limit points of sequences obtained in this way?
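For experimentation, here is a hedged sketch of the inverse game, using planar coordinates for the vertices instead of the (1, 0, 0), (0, 1, 0), (0, 0, 1) of the text; the starting point and iteration counts are arbitrary.

    V = [(0.0, 0.0), (1.0, 0.0), (0.5, 3**0.5 / 2)]   # planar vertex choices

    def farthest(p):
        return max(V, key=lambda v: (v[0] - p[0])**2 + (v[1] - p[1])**2)

    p = (10.0, -7.0)                   # any point in the plane
    orbit = []
    for k in range(100000):
        v = farthest(p)
        p = ((p[0] + v[0]) / 2, (p[1] + v[1]) / 2)
        if k > 20:                     # discard the initial transient
            orbit.append(p)
    # A scatter plot of `orbit` shows where the limit points accumulate.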

31.12 The Chaos Game


Schroeder also mentions Barnsley’s Chaos Game. Select P0 inside the equi-
lateral triangle. Roll a fair die and let V = (1, 0, 0) if 1 or 2 is rolled,
V = (0, 1, 0) if 3 or 4 is rolled, and V = (0, 0, 1) if 5 or 6 is rolled. Let
P1 again be the midpoint of V P0 . Replace P0 with P1 and repeat the
procedure. Which points are limits of such sequences of points?
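A direct simulation, again with planar stand-ins for the vertices, answers the question pictorially; the starting point and sample size are arbitrary choices.

    import random

    V = [(0.0, 0.0), (1.0, 0.0), (0.5, 3**0.5 / 2)]   # planar stand-ins for the vertices

    p = (0.3, 0.2)                      # P0 anywhere inside the triangle
    orbit = []
    for k in range(50000):
        v = random.choice(V)            # the fair die: each vertex with probability 1/3
        p = ((p[0] + v[0]) / 2, (p[1] + v[1]) / 2)
        if k > 20:                      # discard the initial transient
            orbit.append(p)
    # A scatter plot of `orbit` fills in a famous fractal subset of the triangle.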
Figure 31.2: Iterations for various values of r.

Figure 31.3: Limit behavior for various r.

Figure 31.4: Iteration maps.

Figure 31.5: Close-up of Figure 31.3.

Figure 31.6: Sensitivity to initial conditions.

Figure 31.7: The Mandelbrot Set.

Figure 31.8: Basins of attraction for Equation (31.16).


Chapter 32

Wavelets

32.1 Analysis and Synthesis

In our discussion of special functions, we saw that the Bessel functions,


the Legendre polynomials and the Hermite polynomials provide building
blocks, or basis elements for certain classes of functions. Each family ex-
hibits a form of orthogonality that makes it easy to calculate the coefficients
in an expansion. Wavelets provide other important families of orthogonal
basis functions.

An important theme that runs through most of mathematics, from the


geometry of the early Greeks to modern signal processing, is analysis and
synthesis, or, less formally, breaking up and putting back together. The
Greeks estimated the area of a circle by breaking it up into sectors that
approximated triangles. The Riemann approach to integration involves
breaking up the area under a curve into pieces that approximate rectangles
or other simple shapes. Viewed differently, the Riemann approach is first
to approximate the function to be integrated by a step function and then
to integrate the step function.

Along with geometry, Euclid includes a good deal of number theory,


in which we find analysis and synthesis. His theorem that every positive
integer is divisible by a prime is analysis; division does the breaking up
and the simple pieces are the primes. The fundamental theorem of arith-
metic, which asserts that every positive integer can be written in an essen-
tially unique way as the product of powers of primes, is synthesis, with the
putting back together done by multiplication.


32.2 Polynomial Approximation


The individual power functions, xn , are not particularly interesting by
themselves, but when finitely many of them are scaled and added to form a
polynomial, interesting functions can result, as the famous approximation
theorem of Weierstrass confirms [32]:

Theorem 32.1 If f : [a, b] → R is continuous and ε > 0 is given, we can find a polynomial P such that |f(x) − P(x)| ≤ ε for every x in [a, b].

The idea of building complicated functions from powers is carried a


step further with the use of infinite series, such as Taylor series. The sine
function, for example, can be represented for all real x by the infinite power
series
sin x = x − x³/3! + x⁵/5! − x⁷/7! + · · · .
The most interesting thing to note about this is that the sine function has
properties that none of the individual power functions possess; for exam-
ple, it is bounded and periodic. So we see that an infinite sum of simple
functions can be qualitatively different from the components in the sum. If
we take the sum of only finitely many terms in the Taylor series for the sine
function we get a polynomial, which cannot provide a good approximation
of the sine function for all x; that is, the finite sum does not approximate
the sine function uniformly over the real line. The approximation is better
for x near zero and poorer as we move away from zero. However, for any selected x and for any ε > 0, there is a positive integer N, depending on x and on ε, such that the sum of the first n terms of the series is within ε of sin x for all n ≥ N; that is, the series converges pointwise to sin x for each real
x. In Fourier analysis the trigonometric functions themselves are viewed
as the simple functions, and we try to build more complicated functions as
(possibly infinite) sums of trig functions. In wavelet analysis we have more
freedom to design the simple functions to fit the problem at hand.
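The pointwise-but-not-uniform convergence just described for the sine series is easy to observe numerically; the sample points and term counts below are arbitrary choices.

    import math

    def sin_partial(x, n_terms):
        """Sum of the first n_terms terms of the Taylor series for sin x."""
        return sum((-1)**k * x**(2 * k + 1) / math.factorial(2 * k + 1)
                   for k in range(n_terms))

    for x in (1.0, 5.0, 15.0):
        errors = [abs(sin_partial(x, n) - math.sin(x)) for n in (5, 10, 20)]
        print(x, errors)   # errors shrink with n, but far more slowly for large x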

32.3 A Radar Problem


To help motivate wavelets, we turn to a signal-processing problem arising
in radar. The connection between radar signal processing and wavelets is
discussed in some detail in Kaiser’s book [30].

32.3.1 Stationary Target


In radar a real-valued function ψ(t) representing a time-varying voltage is
converted by an antenna in transmission mode into a propagating electro-
magnetic wave. When this wave encounters a reflecting target an echo is

produced. The antenna, now in receiving mode, picks up the echo f (t),
which is related to the original signal by

f (t) = Aψ(t − d(t)),

where d(t) is the time required for the original signal to make the round trip from the antenna to the target and back, arriving at time t. The amplitude A
incorporates the reflectivity of the target as well as attenuation suffered by
the signal. As we shall see shortly, the delay d(t) depends on the distance
from the antenna to the target and, if the target is moving, on its radial
velocity. The main signal-processing problem here is to determine target
range and radial velocity from knowledge of f (t) and ψ(t).
If the target is stationary, at a distance r0 from the antenna, then
d(t) = 2r0 /c, where c is the speed of light. In this case the original signal
and the received echo are related simply by

f (t) = Aψ(t − b),

for b = 2r0 /c.

32.3.2 Moving Target


When the target is moving so that its distance to the antenna, r(t), is
time-dependent, the relationship between f and ψ is more complicated.

Exercise 32.1 Suppose the target is at a distance r0 > 0 from the antenna at time t = 0, and has radial velocity v, with v > 0 indicating motion away from the antenna. Show that the delay function d(t) is now

d(t) = 2(r0 + vt)/(c + v),

and that f(t) is related to ψ(t) according to

f(t) = Aψ((t − b)/a), (32.1)

for

a = (c + v)/(c − v)

and

b = 2r0/(c − v).

Show also that if we select A = ((c − v)/(c + v))^{1/2}, then energy is preserved; that is, ||f|| = ||ψ||.

Exercise 32.2 Let Ψ(ω) be the Fourier transform of the signal ψ(t). Show
that the Fourier transform of the echo f (t) in Equation (32.1) is then

F(ω) = Aa e^{ibω} Ψ(aω). (32.2)

The basic problem is to determine a and b, and therefore the range and radial velocity of the target, from knowledge of f(t) and ψ(t). An obvious approach is to use a matched filter.

32.3.3 The Wideband Cross-Ambiguity Function


Note that the received echo f (t) is related to the original signal by the
operations of rescaling and shifting. We therefore match the received echo
with all the shifted and rescaled versions of the original signal. For each
a > 0 and real b, let

ψa,b(t) = ψ((t − b)/a).

The wideband cross-ambiguity function (WCAF) is

(Wψ f)(b, a) = (1/√a) ∫_{−∞}^{∞} f(t) ψa,b(t) dt. (32.3)
In the ideal case the values of a and b for which the WCAF takes on its
largest absolute value should be the true values of a and b.
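A rough numerical illustration of the matching idea, with a Riemann sum standing in for the integral in Equation (32.3); the window, the true parameters, and the search grid are all invented for the example.

    import numpy as np

    def wcaf(f, psi, t, b, a):
        """Riemann-sum approximation of (W_psi f)(b, a) in Equation (32.3)."""
        dt = t[1] - t[0]
        return np.sum(f(t) * psi((t - b) / a)) * dt / np.sqrt(a)

    # Toy example: psi is a Gaussian-windowed pulse, and the echo is an exact
    # scaled-and-shifted copy with a_true = 1.2 and b_true = 3.0.
    psi = lambda t: np.exp(-t**2) * np.cos(10 * t)
    a_true, b_true = 1.2, 3.0
    f = lambda t: psi((t - b_true) / a_true)

    t = np.linspace(-20.0, 20.0, 20001)
    trials = [(a, b, abs(wcaf(f, psi, t, b, a)))
              for a in (1.0, 1.1, 1.2, 1.3) for b in (2.5, 3.0, 3.5)]
    print(max(trials, key=lambda g: g[2])[:2])   # the peak over this coarse grid
                                                 # sits at the true (a, b) = (1.2, 3.0)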

32.4 Wavelets
32.4.1 Background
The fantastic increase in computer power over the last few decades has
made possible, even routine, the use of digital procedures for solving prob-
lems that were believed earlier to be intractable, such as the modeling of
large-scale systems. At the same time, it has created new applications
unimagined previously, such as medical imaging. In some cases the math-
ematical formulation of the problem is known and progress has come with
the introduction of efficient computational algorithms, as with the Fast
Fourier Transform. In other cases, the mathematics is developed, or per-
haps rediscovered, as needed by the people involved in the applications.
Only later is it realized that the theory already existed, as when computerized tomography was developed without knowledge of Radon’s earlier work on the reconstruction of functions from their line integrals.
It can happen that applications give a theoretical field of mathematics a
rebirth; such seems to be the case with wavelets [28]. Sometime in the 1980s
researchers working on various problems in electrical engineering, quantum
mechanics, image processing, and other areas became aware that what the

others were doing was related to their own work. As connections became
established, similarities with the earlier mathematical theory of approxi-
mation in functional analysis were noticed. Meetings began to take place,
and a common language began to emerge around this reborn area, now
called wavelets. One of the most significant meetings took place in June
of 1990, at the University of Massachusetts Lowell. The keynote speaker
was Ingrid Daubechies; the lectures she gave that week were subsequently
published in the book [11].
There are a number of good books on wavelets, such as [30], [4], and [45].
A recent issue of the IEEE Signal Processing Magazine has an interesting
article on using wavelet analysis of paintings for artist identification [29].
Fourier analysis and synthesis concerns the decomposition, filtering,
compressing, and reconstruction of signals using complex exponential func-
tions as the building blocks; wavelet theory provides a framework in which
other building blocks, better suited to the problem at hand, can be used.
As always, efficient algorithms provide the bridge between theory and prac-
tice.

32.4.2 A Simple Example


Imagine that f (t) is defined for all real t and we have sampled f (t) every
half-second. We focus on the time interval [0, 2). Suppose that f (0) = 1,
f (0.5) = −3, f (1) = 2 and f (1.5) = 4. We approximate f (t) within the
interval [0, 2) by replacing f (t) with the step function that is 1 on [0, 0.5),
−3 on [0.5, 1), 2 on [1, 1.5), and 4 on [1.5, 2); for notational convenience, we
represent this step function by (1, −3, 2, 4). We can decompose (1, −3, 2, 4)
into a sum of step functions

(1, −3, 2, 4) = 1(1, 1, 1, 1) − 2(1, 1, −1, −1) + 2(1, −1, 0, 0) − 1(0, 0, 1, −1).

The first basis element, (1, 1, 1, 1), does not vary over a two-second interval.
The second one, (1, 1, −1, −1), is orthogonal to the first, and does not vary
over a one-second interval. The other two, both orthogonal to the previous
two and to each other, vary over half-second intervals. We can think of these
basis functions as corresponding to different frequency components and
time locations; that is, they are giving us a time-frequency decomposition.
Suppose we let φ0 (t) be the function that is 1 on the interval [0, 1) and
0 elsewhere, and ψ0 (t) the function that is 1 on the interval [0, 0.5) and −1
on the interval [0.5, 1). Then we say that

φ0 (t) = (1, 1, 0, 0),

and
ψ0 (t) = (1, −1, 0, 0).

Then we write
φ−1 (t) = (1, 1, 1, 1) = φ0 (0.5t),

ψ0 (t − 1) = (0, 0, 1, −1),
and
ψ−1 (t) = (1, 1, −1, −1) = ψ0 (0.5t).
So we have the decomposition of (1, −3, 2, 4) as

(1, −3, 2, 4) = 1φ−1 (t) − 2ψ−1 (t) + 2ψ0 (t) − 1ψ0 (t − 1).

In what follows we shall be interested in extending these ideas, to find other functions φ0(t) and ψ0(t) that lead to bases consisting of functions of the form

ψj,k(t) = ψ0(2^j t − k).
These will be our wavelet bases.
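Since the four basis step functions above are mutually orthogonal, the coefficients in the decomposition of (1, −3, 2, 4) can be computed by inner products; a short NumPy check:

    import numpy as np

    f = np.array([1.0, -3.0, 2.0, 4.0])      # samples of f on [0, 2)

    basis = np.array([[1,  1,  1,  1],       # phi_{-1}(t)
                      [1,  1, -1, -1],       # psi_{-1}(t)
                      [1, -1,  0,  0],       # psi_0(t)
                      [0,  0,  1, -1]])      # psi_0(t - 1)

    # By orthogonality, each coefficient is <f, row> / <row, row>.
    coeffs = (basis @ f) / np.sum(basis * basis, axis=1)
    print(coeffs)                            # [ 1. -2.  2. -1.], as in the text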

32.4.3 The Integral Wavelet Transform


For real numbers b and a ≠ 0, the integral wavelet transform (IWT) of the signal f(t) relative to the basic wavelet (or mother wavelet) ψ(t) is

(Wψ f)(b, a) = |a|^{−1/2} ∫_{−∞}^{∞} f(t) ψ((t − b)/a) dt.

This function is also the wideband cross-ambiguity function in radar. The


function ψ(t) is also called a window function and, like Gaussian functions,
it will be relatively localized in time. An example is the Haar wavelet ψHaar(t), which has the value +1 for 0 ≤ t < 1/2, −1 for 1/2 ≤ t < 1, and zero otherwise.
As the scaling parameter a grows larger, the scaled wavelet ψ(t/a) grows wider, so choosing a small value of a permits us to focus on a neighborhood of the time t = b. The IWT then registers the contribution to f(t) made by components with features on the scale determined by a, in the neighborhood of t = b. Calculations involving the uncertainty
principle reveal that the IWT provides a flexible time-frequency window
that narrows when we observe high frequency components and widens for
lower frequencies.
Given the integral wavelet transform (Wψ f )(b, a), it is natural to ask
how we might recover the signal f (t). The following inversion formula
answers that question: at points t where f (t) is continuous we have
f(t) = (1/Cψ) ∫_{−∞}^{∞} ∫_{−∞}^{∞} (Wψ f)(b, a) ψ((t − b)/a) (da/a²) db,

with

Cψ = ∫_{−∞}^{∞} (|Ψ(ω)|²/|ω|) dω,

for Ψ(ω) the Fourier transform of ψ(t).

32.4.4 Wavelet Series Expansions


The Fourier series expansion of a function f (t) on a finite interval is a
representation of f (t) as a sum of orthogonal complex exponentials. Lo-
calized alterations in f (t) affect every one of the components of this sum.
Wavelets, on the other hand, can be used to represent f (t) so that local-
ized alterations in f (t) affect only a few of the components of the wavelet
expansion. The simplest example of a wavelet expansion is with respect to
the Haar wavelets.

Exercise 32.3 Let w(t) = ψHaar(t). Show that the functions wjk(t) = w(2^j t − k) are mutually orthogonal on the interval [0, 1], where j = 0, 1, ... and k = 0, 1, ..., 2^j − 1.

These functions wjk (t) are the Haar wavelets. Every continuous func-
tion f (t) defined on [0, 1] can be written as
f(t) = c0 + Σ_{j=0}^{∞} Σ_{k=0}^{2^j − 1} cjk wjk(t)

for some choice of c0 and cjk . Notice that the support of the function wjk (t),
the interval on which it is nonzero, gets smaller as j increases. Therefore,
the components corresponding to higher values of j in the Haar expansion
of f (t) come from features that are localized in the variable t; such features
are transients that live for only a short time. Such transient components
affect all of the Fourier coefficients but only those Haar wavelet coefficients
corresponding to terms supported in the region of the disturbance. This
ability to isolate localized features is the main reason for the popularity of
wavelet expansions.

32.4.5 More General Wavelets


The orthogonal functions used in the Haar wavelet expansion are them-
selves discontinuous, which presents a bit of a problem when we represent
continuous functions. Wavelets that are themselves continuous, or better
still, differentiable, should do a better job representing smooth functions.

We can obtain other wavelet series expansions by selecting a basic


wavelet ψ(t) and defining ψjk(t) = 2^{j/2} ψ(2^j t − k), for integers j and k.
We then say that the function ψ(t) is an orthogonal wavelet if the family
{ψjk } is an orthonormal basis for the space of square-integrable functions
on the real line, the Hilbert space L2 (R). This implies that for every such
f (t) there are coefficients cjk so that

f(t) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} cjk ψjk(t),

with convergence in the mean-square sense. The coefficients cjk are found
using the IWT:
cjk = (Wψ f)(k/2^j, 1/2^j).
As with Fourier series, wavelet series expansion permits the filtering of
certain components, as well as signal compression. In the case of Fourier
series, we might attribute high frequency components to noise and achieve
a smoothing by setting to zero the coefficients associated with these high
frequencies. In the case of wavelet series expansions, we might attribute to
noise localized small-scale disturbances and remove them by setting to zero
the coefficients corresponding to the appropriate j and k. For both Fourier
and wavelet series expansions we can achieve compression by ignoring those
components whose coefficients are below some chosen level.
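As a concrete sketch of this filtering and compression idea, here is a discrete Haar analysis/synthesis pair in NumPy; the averaging/differencing normalization, the restriction to signals of length 2^n, and the threshold are our own illustrative choices.

    import numpy as np

    def haar_analysis(f):
        """Split a length-2^n signal into details at each scale plus an overall average."""
        f = np.asarray(f, dtype=float)
        details = []
        while len(f) > 1:
            details.append((f[0::2] - f[1::2]) / 2)   # local differences
            f = (f[0::2] + f[1::2]) / 2               # local averages
        details.append(f)                             # the overall average
        return details

    def haar_synthesis(coeffs):
        """Exactly invert haar_analysis."""
        f = coeffs[-1]
        for det in reversed(coeffs[:-1]):
            out = np.empty(2 * len(det))
            out[0::2] = f + det
            out[1::2] = f - det
            f = out
        return f

    signal = np.array([1.0, -3.0, 2.0, 4.0, 4.1, 4.0, 3.9, 4.0])
    coeffs = haar_analysis(signal)
    kept = [np.where(np.abs(c) > 0.2, c, 0.0) for c in coeffs]  # drop small details
    print(haar_synthesis(kept))   # close to the original, with fewer nonzero coefficients

Here the small-scale wiggles in the last four samples are smoothed away, while the large transient at the start survives, which is exactly the localized behavior described above.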
Bibliography

1. Baggott, J. (1992) The Meaning of Quantum Theory, Oxford Univer-


sity Press.

2. Bliss, G.A. (1925) Calculus of Variations, Carus Mathematical Monographs, American Mathematical Society.

3. Bochner, S. (1966) The Role of Mathematics in the Rise of Science.


Princeton University Press.

4. Boggess, A. and Narcowich, F. (2001) A First Course in Wavelets, with


Fourier Analysis. Englewood Cliffs, NJ: Prentice-Hall.

5. Bracewell, R.C. (1979) “Image reconstruction in radio astronomy.” In [27], pp. 81–104.

6. Brockman, M. (2009) What’s Next? Dispatches on the Future of Sci-


ence, Vintage Books, New York.

7. Bryan, K., and Leise, T. (2010) “Impedance imaging, inverse problems, and Harry Potter’s cloak.” SIAM Review, 52(2), pp. 359–377.

8. Burger, E., and Starbird, M. (2006) Coincidences, Chaos, and All That
Math Jazz New York: W.W. Norton, Publ.

9. Butterfield, H. (1957) The Origins of Modern Science: 1300–1800, Free


Press Paperback (MacMillan Co.).

10. Byrne, C. (2005) Signal Processing: A Mathematical Approach, AK


Peters, Publ., Wellesley, MA.

11. Daubechies, I. (1992) Ten Lectures on Wavelets. Philadelphia: Society


for Industrial and Applied Mathematics.

12. Devaney, R. (1989) An Introduction to Chaotic Dynamical Systems,


Addison-Wesley.

13. Diamond, J. (1997) Guns, Germs, and Steel, Norton, Publ.


14. Fara, P. (2009) Science: A Four Thousand Year History, Oxford Uni-
versity Press.

15. Feynman, R., Leighton, R., and Sands, M. (1963) The Feynman Lec-
tures on Physics, Vol. 1. Boston: Addison-Wesley.

16. Flanigan, F. (1983) Complex Variables: Harmonic and Analytic Func-


tions, Dover Publ.

17. Fleming, W. (1965) Functions of Several Variables, Addison-Wesley.

18. Gleick, J. (1987) Chaos: The Making of a New Science. Penguin Books.

19. Gonzalez-Velasco, E. (1996) Fourier Analysis and Boundary Value


Problems. Academic Press.

20. Gonzalez-Velasco, E. (2008) personal communication.

21. Graham-Eagle, J. (2008) unpublished notes in applied mathematics.

22. Greenblatt, S. (2011) The Swerve: How the World Became Modern.
New York: W.W. Norton.

23. Greene, B. (2011) The Hidden Reality: Parallel Universes and the Deep
Laws of the Cosmos. New York: Vintage Books.

24. Groetsch, C. (1999) Inverse Problems: Activities for Undergraduates,


The Mathematical Association of America.

25. Heath, T. (1981) Aristarchus of Samos: The Ancient Copernicus.


Dover Books.

26. Heisenberg, W. (1958) Physics and Philosophy, Harper Torchbooks.

27. Herman, G.T. (ed.) (1979) Image Reconstruction from Projections, Topics in Applied Physics, Vol. 32, Springer-Verlag, Berlin.

28. Hubbard, B. (1998) The World According to Wavelets. Natick, MA: A


K Peters, Inc.

29. Johnson, C., Hendriks, E., Berezhnoy, I., Brevdo, E., Hughes, S.,
Daubechies, I., Li, J., Postma, E., and Wang, J. (2008) “Image Pro-
cessing for Artist Identification” IEEE Signal Processing Magazine,
25(4), pp. 37–48.

30. Kaiser, G. (1994) A Friendly Guide to Wavelets. Boston: Birkhäuser.

31. Koestler, A. (1959) The Sleepwalkers: A History of Man’s Changing


Vision of the Universe, Penguin Books.

32. Körner, T. (1988) Fourier Analysis. Cambridge, UK: Cambridge Uni-


versity Press.
33. Li, T., and Yorke, J.A. (1975) “Period Three Implies Chaos” American
Mathematics Monthly, 82, pp. 985–992.
34. Lindberg, D. (1992) The Beginnings of Western Science, University of
Chicago Press.
35. Lindley, D. (2007) Uncertainty: Einstein, Heisenberg, Bohr, and the
Struggle for the Soul of Science, Doubleday.
36. Muller, R. (2008) Physics for Future Presidents: the Science Behind
the Headlines, Norton.
37. Papoulis, A. (1977) Signal Analysis. New York: McGraw-Hill.
38. Penrose, R. (1989) The Emperor’s New Mind: Concerning Computers,
Minds, and the Laws of Physics. Oxford University Press.
39. Rigden, J. (2005) Einstein 1905: The Standard of Greatness. Harvard
University Press.
40. Schey, H.M. (1973) Div, Curl, Grad, and All That, W.W. Norton.
41. Schroeder, M. (1991) Fractals, Chaos, Power Laws, W.H. Freeman,
New York.
42. Simmons, G. (1972) Differential Equations, with Applications and His-
torical Notes. New York: McGraw-Hill.
43. Smolin, L. (2006) The Trouble with Physics, Houghton Mifflin.
44. Twomey, S. (1996) Introduction to the Mathematics of Inversion in
Remote Sensing and Indirect Measurement. New York: Dover Publ.
45. Walnut, D. (2002) An Introduction to Wavelets. Boston: Birkhäuser.
46. Witten, E. (2002) “Physical law and the quest for mathematical un-
derstanding.” Bulletin of the American Mathematical Society, 40(1),
pp. 21–29.
47. Wylie, C.R. (1966) Advanced Engineering Mathematics. New York:
McGraw-Hill.
48. Young, R. (1992) Excursions in Calculus: An Interplay of the Continu-
ous and Discrete, Dolciani Mathematical Expositions Number 13, The
Mathematical Association of America.
49. Zajonc, A. (1993) Catching the Light: the Entwined History of Light
and Mind. Oxford, UK: Oxford University Press.
Index

T-invariant subspace, 116
χΩ(ω), 77

algebraic reconstruction technique, 93
aliasing, 63
angle of arrival, 273
angular momentum vector, 160
aperture, 61, 271
approximate delta function, 79
array, 271, 275
array aperture, 67
ART, 93, 95

band-limited extrapolation, 54
basic wavelet, 310
basin of attraction, 296
basis, 105
beam-hardening, 88
Brachistochrone Problem, 208
Burg entropy, 210

Cauchy’s Inequality, 8
Cauchy-Schwarz inequality, 286
causal function, 78
Central Slice Theorem, 90
change-of-basis matrix, 108
Chaos Game, 298
characteristic function of a set, 77
characteristic polynomial, 109
complex exponential function, 13
conjugate transpose, 104, 112
conjugate-symmetric function, 77
convolution, 77, 81
cycloid, 215

del operator, 150
DFT, 54
direction vector, 8
directional derivative, 8
discrete Fourier transform, 54
divergence, 150
divergence theorem, 151
dot product, 7, 284
dual space, 107

eigenvalue, 108
eigenvector, 108, 288
Euler, 14
Euler-Lagrange Equation, 212
even function, 77
even part, 78

far-field assumption, 55
Fourier Inversion Formula, 84
Fourier transform, 60, 75
frequency-domain extrapolation, 83
frequency-response function, 82
functional, 207

geometric least-squares solution, 97
Gram-Schmidt method, 290

Haar wavelet, 310, 311
Heaviside function, 77
Helmholtz equation, 64, 276
Hermitian, 288
Hermitian matrix, 114
Hilbert transform, 78

inner product, 284, 285
inner-product space, 286
integral wavelet transform, 310
inverse Sir Pinski Game, 297
isomorphism, 107
Isoperimetric Problem, 216

Jacobian matrix, 296

KL distance, 99
Kullback-Leibler distance, 99

Laplace transform, 80
line array, 66
linear functional, 107
linear independence, 104
linear operator, 108
logarithm of a complex number, 15

MART, 93, 97
matched field, 279
MDFT, 57
modified DFT, 57
modulation transfer function, 82
multiplicative algebraic reconstruction technique, 93
multiplicative ART, 97

Newton-Raphson algorithm, 295
norm, 112, 284, 286
normal matrix, 114
normal mode, 278
normal operator, 114
Nyquist spacing, 61, 273

odd function, 77
odd part, 78
optical transfer function, 82
orthogonal, 283, 284, 286, 311
orthogonal basis, 114
orthogonal complement, 116
orthogonal vectors, 114
orthogonal wavelet, 312
orthogonality principle, 289
orthonormal, 112, 114

Parseval-Plancherel Equation, 80
planar sensor array, 66
planewave, 65, 271, 275
point-spread function, 82
positive-definite, 288

radar, 306
Radon transform, 90
rank of a matrix, 106
remote sensing, 64
row-action method, 95
Runge-Lenz vector, 163

sampling frequency, 76
SAR, 62
self-adjoint operator, 22, 114
separation of variables, 64
sgn, 77
sign function, 77
sinusoid, 15
Sir Pinski Game, 297
span, 104
spanning set, 104
stable fixed point, 292
Sturm Comparison Theorem, 32, 256
synthetic-aperture radar, 62
system transfer function, 82

time-harmonic solutions, 64
transpose, 104

uncorrelated, 287
uniform line array, 273
unitary matrix, 114

visible region, 63, 273

wave equation, 64, 275
wavelength, 56
wavelet, 311
wavenumber, 273
wavevector, 65
Weierstrass approximation theorem, 306
wideband cross-ambiguity function, 308
