Mathematics
Charles L. Byrne
Department of Mathematical Sciences
University of Massachusetts Lowell
Lowell, MA 01854
August 1, 2014
1 Preface
2 More Fundamentals (Chapter 1)
2.1 The Dot Product
2.2 The Gradient and Directional Derivatives
2.3 Optimization
2.4 Lagrange Multipliers
2.5 Richardson's Method
2.6 Leibnitz's Rule and Distributions
2.7 The Complex Exponential Function
2.7.1 Real Exponential Functions
2.7.2 Why is h(x) an Exponential Function?
2.7.3 What is e^z, for z complex?
2.8 Complex Exponential Signal Models
31 Chaos
31.1 The Discrete Logistics Equation
31.2 Fixed Points
31.3 Stability
31.4 Periodicity
31.5 Sensitivity to the Starting Value
31.6 Plotting the Iterates
31.7 Filled Julia Sets
31.8 The Newton-Raphson Algorithm
31.9 Newton-Raphson and Chaos
31.9.1 A Simple Case
31.9.2 A Not-So-Simple Case
31.10 The Cantor Game
31.11 The Sir Pinski Game
31.12 The Chaos Game
32 Wavelets
32.1 Analysis and Synthesis
32.2 Polynomial Approximation
32.3 A Radar Problem
32.3.1 Stationary Target
32.3.2 Moving Target
32.3.3 The Wideband Cross-Ambiguity Function
32.4 Wavelets
32.4.1 Background
32.4.2 A Simple Example
Bibliography
Index
Chapter 1
Preface
Part I
Chapter 2

More Fundamentals (Chapter 1)
x^T = (x_1, x_2, \ldots, x_N),
For N = 2 or N = 3 we have
x \cdot y = \|x\| \|y\| \cos\theta,
where θ is the angle between x and y, when they are viewed as directed
line segments in a plane, emerging from a common base point.
In general, when N is larger, the angle between x and y no longer makes
sense, but we still have a useful inequality, called Cauchy’s Inequality:
|x \cdot y| \le \|x\| \|y\|,
and
|x \cdot y| = \|x\| \|y\|
precisely when, or if and only if, as mathematicians say, x and y are parallel,
that is, there is a real number α with
y = αx.
\lim_{h \to 0} \frac{f(x_1, \ldots, x_{n-1}, x_n + h, x_{n+1}, \ldots, x_N) - f(x_1, \ldots, x_{n-1}, x_n, x_{n+1}, \ldots, x_N)}{h},

provided that this limit exists. We denote this limit as f_n(x_1, \ldots, x_N), or \frac{\partial f}{\partial x_n}(x_1, \ldots, x_N). When all the first partial derivatives of f exist at a point
we say that f is differentiable at that point.
When we are dealing with small values of N , such as N = 3, it is
common to write f (x, y, z), where now x, y, and z are real variables, not
vectors. Then the first partial derivatives can be denoted fx , fy , and fz .
The gradient of the function f : RN → R at the point (x1 , x2 , ..., xN ),
written ∇f (x1 , ..., xN ), is the column vector whose entries are the first
partial derivatives of f at that point.
Let d be a member of RN with kdk = 1; then d is called a direction
vector. The directional derivative of f , at the point (x1 , ..., xN ), in the
direction of d, is
∇f (x1 , ..., xN ) · d.
From Cauchy’s Inequality we see that the absolute value of the directional
derivative at a given point is at most the magnitude of the gradient at
that point, and is equal to that magnitude precisely when d is parallel to
the gradient. It follows that the direction in which the gradient points is
the direction of greatest increase in f , and the opposite direction is the
direction of greatest decrease. The gradient, therefore, is perpendicular to
the tangent plane to the surface of constant value, the level surface, passing
through this point. These facts are important in optimization, when we
try to find the largest and smallest values of f .
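The claim that the gradient points in the direction of greatest increase is easy to test numerically. The sketch below (the function f(x, y) = x^2 + 3y^2 is chosen here for illustration; it is not from the text) approximates the gradient by central differences and scans many unit direction vectors d, confirming that the largest directional derivative equals the magnitude of the gradient:

```python
import math

def f(x, y):
    # a sample differentiable function of two variables
    return x**2 + 3*y**2

def grad(x, y, h=1e-6):
    # central differences approximate the first partial derivatives
    fx = (f(x + h, y) - f(x - h, y)) / (2*h)
    fy = (f(x, y + h) - f(x, y - h)) / (2*h)
    return fx, fy

def directional_derivative(x, y, d):
    gx, gy = grad(x, y)
    return gx*d[0] + gy*d[1]

x0, y0 = 1.0, 1.0
gx, gy = grad(x0, y0)
gnorm = math.hypot(gx, gy)

# scan unit vectors d = (cos t, sin t); the maximum of grad . d is |grad|
best = max(directional_derivative(x0, y0, (math.cos(t/100), math.sin(t/100)))
           for t in range(0, 629))
print(round(gnorm, 3), round(best, 3))
```

The maximum over the scanned directions matches the gradient magnitude to within the scan resolution, as Cauchy's Inequality predicts.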
2.3 Optimization
If f : R → R is a differentiable real-valued function of a real variable, and
we want to find its local maxima and minima, we take the derivative and
set it to zero. When f : RN → R is a differentiable real-valued function
of N real variables, we find local maxima and minima by calculating the
gradient and finding out where the gradient is zero, that is, where all the
first partial derivatives are zero.
2x + \frac{\lambda}{2} = 0,

and

2y + \frac{\lambda}{3} = 0.
We don’t know what λ is, but we don’t care, because we can write
λ = −4x,
and
λ = −6y,
from which we conclude that y = \frac{2}{3}x. This is a second relationship between
x and y and now we can find the answer.
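The displayed conditions 2x + \lambda/2 = 0 and 2y + \lambda/3 = 0 are the Lagrange equations one obtains for f(x, y) = x^2 + y^2 with a constraint of the form x/2 + y/3 = c; the constant c is not given in the excerpt, so c = 1 is assumed below purely for illustration:

```python
# The constraint constant c is an assumption made for this sketch.
c = 1.0

# From 2x + lam/2 = 0 and 2y + lam/3 = 0: lam = -4x = -6y, so y = (2/3)x.
# Substituting into x/2 + y/3 = c gives x/2 + 2x/9 = (13/18)x = c.
x = 18.0 * c / 13.0
y = (2.0 / 3.0) * x
lam = -4.0 * x

# verify both Lagrange equations and the constraint
assert abs(2*x + lam/2) < 1e-12
assert abs(2*y + lam/3) < 1e-12
assert abs(x/2 + y/3 - c) < 1e-12
print(x, y)   # x = 18/13, y = 12/13
```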
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.
Suppose that we want to approximate f 0 (x) numerically, by taking a small
value of h. Eventually, if h is too small, we run into trouble, because both
the top and the bottom of the difference quotient go to zero as h goes to
zero. So we are dividing a small number by a small number, which is to say, inviting serious round-off error.
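A small experiment makes the danger concrete: for the forward difference quotient applied to exp at x = 1, the error first shrinks as h decreases, then grows again once round-off dominates. (The function and the values of h are chosen here for illustration.)

```python
import math

def forward_diff(f, x, h):
    # the difference quotient (f(x+h) - f(x)) / h
    return (f(x + h) - f(x)) / h

true_val = math.e  # the derivative of exp at x = 1
errors = {h: abs(forward_diff(math.exp, 1.0, h) - true_val)
          for h in (1e-1, 1e-8, 1e-15)}

# moderate h: truncation-dominated error; tiny h: round-off-dominated error
print(errors)
```

The error at h = 1e-8 is far smaller than at either h = 0.1 (truncation) or h = 1e-15 (round-off), so making h "as small as possible" is exactly the wrong strategy.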
Leibnitz’s Rule extends this formula to the case in which a and b are also
allowed to depend on t.
Let h(t) and g(t) be real-valued functions of t. For convenience, we
assume that g(t) < h(t) for all t. Let
F(t) = \int_{g(t)}^{h(t)} f(x, t)\, dx.   (2.5)
But Equation (2.9) is also the definition of the generalized function (or
distribution) called the Dirac delta function, denoted δ(x). So U 0 (x) =
δ(x). We can now use this to motivate Leibnitz’s Rule.
Denote by \chi_{[a,b]}(x) the function that is one for a \le x \le b and zero otherwise; note that

\chi_{[a,b]}(x) = U(x - a) - U(x - b),   (2.10)

so that the derivative of \chi_{[a,b]}(x), in the distributional sense, is

\chi'_{[a,b]}(x) = \delta(x - a) - \delta(x - b).   (2.11)
Then we can write
F(t) = \int_{g(t)}^{h(t)} f(x, t)\, dx = \int_{-\infty}^{+\infty} \chi_{[g(t),h(t)]}(x) f(x, t)\, dx.   (2.12)
The function c(x, t) = \chi_{[g(t),h(t)]}(x) has the distributional partial derivative, with respect to t, of

\frac{\partial c}{\partial t}(x, t) = -g'(t)\,\delta(x - g(t)) + h'(t)\,\delta(x - h(t)).   (2.13)
Using the product rule and differentiating under the integral sign, we get
F'(t) = \int_{g(t)}^{h(t)} \frac{\partial f}{\partial t}(x, t)\, dx + h'(t) f(h(t), t) - g'(t) f(g(t), t).   (2.14)
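Leibnitz's Rule can be checked numerically on a concrete example. The choices f(x, t) = xt, g(t) = t, h(t) = t^2 below are made up for illustration; for them F(t) = (t^5 - t^3)/2 has a closed form, so F'(t) can be compared with the right-hand side of the rule:

```python
def F(t):
    # F(t) = integral of x*t dx from x = t to x = t^2 = (t^5 - t^3)/2
    return t * (t**4 - t**2) / 2.0

def rhs(t):
    # integral of df/dt = x over [t, t^2]
    integral = (t**4 - t**2) / 2.0
    # h'(t) f(h(t), t) - g'(t) f(g(t), t) with h = t^2, g = t
    boundary = (2*t) * (t**2 * t) - 1.0 * (t * t)
    return integral + boundary

t0 = 1.5
h = 1e-6
Fprime = (F(t0 + h) - F(t0 - h)) / (2*h)  # central-difference F'(t0)
print(round(Fprime, 4), round(rhs(t0), 4))
```

Both sides agree (the exact value at t = 1.5 is (5t^4 - 3t^2)/2 = 9.28125).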
For reasons that will become clear shortly, this function is called the complex exponential function. Notice that the magnitude of the complex number h(x) is always equal to one, since \cos^2(x) + \sin^2(x) = 1 for all real x.
Since the functions cos(x) and sin(x) are 2π-periodic, that is, cos(x+2π) =
cos(x) and sin(x + 2π) = sin(x) for all x, the complex exponential function
h(x) is also 2π-periodic.
for every u and v. Recall from calculus that for exponential functions g(x) = a^x with a > 0 the derivative g'(x) is
c = cos(1) + i sin(1).
Inserting i in place of x and using the fact that i^2 = -1, we find that

e^i = \sum_{n=0}^{\infty} \frac{i^n}{n!} = \Big(1 - \frac{1}{2!} + \frac{1}{4!} - \cdots\Big) + i\Big(1 - \frac{1}{3!} + \frac{1}{5!} - \cdots\Big);

note that the two series are the Taylor series for \cos(1) and \sin(1), respectively, so e^i = \cos(1) + i\sin(1). Then the complex exponential function in Equation (2.15) is

h(x) = (e^i)^x = e^{ix}.
Inserting x = \pi, we get

e^{i\pi} = \cos(\pi) + i\sin(\pi) = -1,

or

e^{i\pi} + 1 = 0,
which is the remarkable relation discovered by Euler that combines the five
most important constants in mathematics, e, π, i, 1, and 0, in a single
equation.
Note that e^{2\pi i} = e^{0i} = e^0 = 1, so

e^{i(x + 2\pi)} = e^{ix} e^{2\pi i} = e^{ix},

for all x.
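Python's cmath module confirms these identities in floating point; a quick sanity check:

```python
import cmath
import math

# Euler's relation: e^{i*pi} + 1 = 0 (up to round-off)
z = cmath.exp(1j * math.pi) + 1
assert abs(z) < 1e-12

# 2*pi periodicity of e^{ix}, at an arbitrary x
x = 0.7
assert abs(cmath.exp(1j * (x + 2*math.pi)) - cmath.exp(1j * x)) < 1e-12

# |e^{ix}| = 1, since cos^2 + sin^2 = 1
assert abs(abs(cmath.exp(1j * x)) - 1.0) < 1e-12
print("checks pass")
```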
for any integer k. If z = a > 0 then \theta(z) = 0 and \ln(z) = \ln(a) + i(k\pi) for any even integer k; in calculus class we just take the value associated with k = 0. If z = a < 0 then \theta(z) = \pi and \ln(z) = \ln(-a) + i(k\pi) for any odd integer k. So we can define the logarithm of a negative number; it just turns out not to be a real number. If z = ib with b > 0, then \theta(z) = \frac{\pi}{2} and \ln(z) = \ln(b) + i(\frac{\pi}{2} + 2\pi k) for any integer k; if z = ib with b < 0, then \theta(z) = \frac{3\pi}{2} and \ln(z) = \ln(-b) + i(\frac{3\pi}{2} + 2\pi k) for any integer k.
Adding e^{-ix} = \cos(x) - i\sin(x) to e^{ix} given by Equation (2.15), we get

\cos(x) = \frac{1}{2}(e^{ix} + e^{-ix});

subtracting, we obtain

\sin(x) = \frac{1}{2i}(e^{ix} - e^{-ix}).

These formulas allow us to extend the definition of cos and sin to complex arguments z:

\cos(z) = \frac{1}{2}(e^{iz} + e^{-iz})

and

\sin(z) = \frac{1}{2i}(e^{iz} - e^{-iz}).
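These exponential formulas can be checked against cmath, which implements cos and sin for complex arguments directly:

```python
import cmath

def cos_via_exp(z):
    # cos(z) = (e^{iz} + e^{-iz}) / 2
    return (cmath.exp(1j*z) + cmath.exp(-1j*z)) / 2

def sin_via_exp(z):
    # sin(z) = (e^{iz} - e^{-iz}) / (2i)
    return (cmath.exp(1j*z) - cmath.exp(-1j*z)) / (2j)

z = 0.3 + 0.8j  # an arbitrary complex argument
assert abs(cos_via_exp(z) - cmath.cos(z)) < 1e-12
assert abs(sin_via_exp(z) - cmath.sin(z)) < 1e-12

# the Pythagorean identity survives for complex z as well
assert abs(cos_via_exp(z)**2 + sin_via_exp(z)**2 - 1) < 1e-12
print("ok")
```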
In signal processing the complex exponential function is often used to describe functions of time that exhibit periodic behavior:

h(\omega t + \theta) = e^{i(\omega t + \theta)},

where the frequency \omega and phase angle \theta are real constants and t denotes
time. We can alter the magnitude by multiplying h(ωt + θ) by a positive
constant |A|, called the amplitude, to get |A|h(ωt + θ). More generally, we
can combine the amplitude and the phase, writing
f(x) = \frac{1}{2} a_0 + \sum_{k=1}^{L} \Big( a_k \cos(\omega_k x) + b_k \sin(\omega_k x) \Big),

where the \omega_k are known, but the a_k and b_k are not. Now that we see how to convert sines and cosines to complex exponential functions, using
\cos(\omega_k x) = \frac{1}{2}\big( \exp(i\omega_k x) + \exp(-i\omega_k x) \big)   (2.19)

and

\sin(\omega_k x) = \frac{1}{2i}\big( \exp(i\omega_k x) - \exp(-i\omega_k x) \big),   (2.20)
we can write f(x) as

f(x) = \sum_{m=-L}^{L} c_m \exp(i\omega_m x),   (2.21)

where c_0 = \frac{1}{2}a_0,

c_k = \frac{1}{2}(a_k - i b_k),   (2.22)

and

c_{-k} = \frac{1}{2}(a_k + i b_k),   (2.23)
for k = 1, \ldots, L. The complex notation is more commonly used in signal processing. Note that if the original coefficients a_k and b_k are real numbers, then c_{-m} = \overline{c_m}.
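A short check that the real trigonometric form and the complex form (2.21)-(2.23) agree, and that real a_k, b_k force c_{-m} to be the conjugate of c_m. The coefficients and frequencies below are made up for illustration:

```python
import cmath
import math

a = {0: 1.0, 1: 0.5, 2: -0.3}   # a_0, a_1, a_2
b = {1: 0.2, 2: 0.7}            # b_1, b_2
w = {0: 0.0, 1: 1.0, 2: 2.5}    # omega_k, with omega_{-k} = -omega_k
w[-1], w[-2] = -w[1], -w[2]

# the conversion (2.22)-(2.23)
c = {0: a[0] / 2}
for k in (1, 2):
    c[k] = (a[k] - 1j*b[k]) / 2
    c[-k] = (a[k] + 1j*b[k]) / 2

def f_trig(x):
    return a[0]/2 + sum(a[k]*math.cos(w[k]*x) + b[k]*math.sin(w[k]*x)
                        for k in (1, 2))

def f_exp(x):
    return sum(c[m] * cmath.exp(1j*w[m]*x) for m in (-2, -1, 0, 1, 2))

x = 0.9
assert abs(f_exp(x) - f_trig(x)) < 1e-12
# real a_k, b_k  =>  c_{-m} is the complex conjugate of c_m
assert c[-1] == c[1].conjugate() and c[-2] == c[2].conjugate()
print("ok")
```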
Chapter 3

Differential Equations (Chapters 2,3)
Many differential equations of this type arise when we employ the technique
of separating the variables to solve a partial differential equation. We shall
consider several equivalent forms of Equation (3.1).
so that
\frac{d}{dx}\big( S(x) R(x) y'(x) \big) + S(x) Q(x) y(x) = 0,

which then has the form

\frac{d}{dx}\big( p(x) y'(x) \big) + g(x) y(x) = 0.   (3.3)
We shall be particularly interested in special cases having the form
\frac{d}{dx}\big( p(x) y'(x) \big) - w(x) q(x) y(x) + \lambda w(x) y(x) = 0,   (3.4)

where w(x) > 0 and \lambda is a constant. Rewriting Equation (3.4) as

-\frac{1}{w(x)} \frac{d}{dx}\big( p(x) y'(x) \big) + q(x) y(x) = \lambda y(x),   (3.5)
we are reminded of eigenvector problems in linear algebra,
Ax = λx, (3.6)
Ly = λy,
where
y(x) = u(x)v(x),
3.2 Recalling the Wave Equation
v(x) = \exp\Big( -\frac{1}{2} \int P\, dx \Big),

and

q(x) = Q(x) - \frac{1}{4} P(x)^2 - \frac{1}{2} P'(x).
One reason for wanting to put the differential equation into normal form is
to relate the properties of its solutions to the properties of q(x). For exam-
ple, we are interested in the location of zeros of the solutions of Equation
(3.8), as compared with the zeros of the solutions of
If q(x) < 0, then any non-trivial solution of Equation (3.8) has at most one
zero; think of the equation
and

y''(x) + \frac{\omega^2}{c^2}\, y(x) = 0.   (3.12)
Equation (3.12) can be written as an eigenvalue problem:
-y''(x) = \frac{\omega^2}{c^2}\, y(x) = \lambda y(x),   (3.13)
where, for the moment, the λ is unrestricted.
The solutions to Equation (3.12) are
y(x) = \alpha \sin\Big( \frac{\omega}{c} x \Big) + \beta \cos\Big( \frac{\omega}{c} x \Big).
For each arbitrary ω, the corresponding solutions of Equation (3.11) are
In the vibrating string problem, the string is fixed at both ends, x = 0 and
x = L, so that
φ(0, t) = φ(L, t) = 0,
for all t. Therefore, we must have y(0) = y(L) = 0, so that the solutions
must have the form
y_m(x) = A_m \sin\Big( \frac{\omega_m}{c} x \Big) = A_m \sin\Big( \frac{\pi m}{L} x \Big),

where \omega_m = \frac{\pi c m}{L}, for any positive integer m. Therefore, the boundary
conditions limit the choices for the separation constant ω, and thereby the
choices for λ. In addition, if the string is not moving at time t = 0, then
Using
y_m'' y_n - y_n'' y_m = (y_n y_m' - y_m y_n')',
and integrating, we get
0 = y_n(L) y_m'(L) - y_m(L) y_n'(L) - y_n(0) y_m'(0) + y_m(0) y_n'(0)

= (\lambda_n - \lambda_m) \int_0^L y_m(x) y_n(x)\, dx,

so that

\int_0^L y_m(x) y_n(x)\, dx = 0,
for m \ne n. Using this orthogonality of the y_m(x), we can easily find the coefficients A_m.
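The orthogonality relation is easy to confirm with a simple trapezoid-rule quadrature (L = 2 and the step count below are arbitrary choices for this sketch):

```python
import math

def inner(m, n, L=2.0, steps=20000):
    # trapezoid-rule approximation of
    # integral_0^L sin(m*pi*x/L) * sin(n*pi*x/L) dx
    h = L / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * math.sin(m*math.pi*x/L) * math.sin(n*math.pi*x/L)
    return total * h

assert abs(inner(1, 2)) < 1e-6          # m != n: integral is 0
assert abs(inner(2, 5)) < 1e-6
assert abs(inner(3, 3) - 1.0) < 1e-6    # m = n: integral is L/2 = 1 here
print("orthogonal")
```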
Definition 3.1 Given V and the inner product, we say that a linear operator T on V is self-adjoint if T^* = T.
Chapter 4

Extra Credit Problems (Chapters 2,3)
Find the terminal velocity in this case. ([42], pp. 20, 24.)
• Escape Velocity: The force that gravity exerts on a body of mass
m at the surface of the earth is mg. In space, however, Newton’s law
of gravitation asserts that this force varies inversely as a square of the
distance to the earth’s center. If a projectile fired upward from the
surface is to keep traveling indefinitely, show that its initial velocity must be at least \sqrt{2gR}, where R is the radius of the earth (about 4000 miles). This escape velocity is approximately 7 miles/second or about 25,000 miles/hour. Hint: If x is the distance from the center of the earth to the projectile and v = \frac{dx}{dt} is its velocity, then

\frac{d^2 x}{dt^2} = \frac{dv}{dt} = \frac{dv}{dx}\frac{dx}{dt} = v\frac{dv}{dx}.
([42], p. 24)
Another way to view this problem is to consider an object falling to
earth from space. Calculate its velocity upon impact, as a function
of the distance to the center of the earth at the beginning of its
fall, neglecting all but gravity. Then calculate the upper limit of the
impact velocity as the distance goes to infinity.
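A quick numerical check of the stated figures, using g ≈ 32.2 ft/s^2 (a standard value assumed here; the text gives only R ≈ 4000 miles):

```python
import math

g = 32.2                   # ft/s^2, assumed surface gravity
R = 4000 * 5280.0          # earth radius in feet (4000 miles)
v = math.sqrt(2 * g * R)   # escape velocity in ft/s

v_miles_per_sec = v / 5280.0
v_miles_per_hour = v_miles_per_sec * 3600.0
print(round(v_miles_per_sec, 2), round(v_miles_per_hour))
# about 7 miles/second and about 25,000 miles/hour, as stated
```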
• The Snowplow Problem: It began snowing on a certain morning
and the snow continued to fall steadily throughout the day. At noon
a snowplow started to clear a road, at a constant rate, in terms of the
volume of snow removed per hour. The snowplow cleared 2 miles by
2 p.m. and 1 more mile by 4 p.m. When did it start snowing? ([42],
p. 31)
• Torricelli’s Law: According to Torricelli’s Law, water in an open
tank will flow out through a small hole in the bottom with the speed
it would acquire in falling freely from the water level to the hole.
A hemispherical bowl of radius R is initially filled with water, and a
small circular hole of radius r is punched in the bottom at time t = 0.
How long does it take for the bowl to empty itself? ([42], p. 32)
• The Coffee and Cream Problem: The President and the Prime
Minister order coffee and receive cups of equal temperature at the
same time. The President adds a small amount of cool cream im-
mediately, but does not drink his coffee until 10 minutes later. The
Prime Minister waits ten minutes and then adds the same amount of
cool cream and begins to drink. Who drinks the hotter coffee? ([42],
p. 33)
• The Two Tanks Problem: A tank contains 50 gallons of brine in
which 25 pounds of salt are dissolved. Beginning at time t = 0, water
runs into this tank at the rate of 2 gallons/minute, and the mixture
flows out at the same rate through a second tank initially containing
50 gallons of pure water. When will the second tank contain the
greatest amount of salt? ([42], p. 62)
• Torricelli Again: A cylindrical tank is filled with water to a height
of D feet. At height h < D feet a small hole is drilled into the side of
the tank. According to Torricelli's Law, the horizontal velocity with which the water spurts from the side of the tank is v = \sqrt{2g(D - h)}.
What is the distance d from the base of the tank to where the water
hits the ground? For fixed D, what are the possible values of d as
h varies? Given D and d, can we find h? This last question is an
example of an inverse problem ([24], pp. 26-27). We shall consider
more inverse problems below.
• The Well Problem: A rock is dropped into a well in which the
unknown water level is d feet below the top of the well. If we measure
the time lapse from the dropping of the rock until the hearing of the
splash, can we use this to determine d? ([24], p. 40)
• The Pool Table Problem: Suppose our ‘pool table’ is the unit
square {(x, y)|0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Suppose the cue ball is at
(x1 , y1 ) and the target ball is at (x2 , y2 ). In how many ways can we
hit the target ball with the cue ball using a ‘bank shot’, in which the
cue ball rebounds off the side of the table once before striking the
target ball? Now for a harder problem: there is no pool table now.
The cue ball is launched from the origin into the first quadrant at an
angle θ > 0 with the positive x-axis. It bounces off a straight line
and returns to the positive x-axis at the point r(θ), making an angle
ψ(θ) > 0. Can we determine the equation of the straight line from
this information? What if we do not know r(θ)? ([24], pp. 41-44)
• Torricelli, Yet Again!: A container is formed by revolving the
curve x = f (y) around the (vertical) y-axis. The container is filled
to a height of y and the water is allowed to run out through a hole
of cross-sectional area a in the bottom. The time it takes to drain
is T (y). How does the drain-time function T depend on the shape
function f ? Can we determine f if we know T ? How could we
approximate f from the values T (yn ), n = 1, ..., N ? ([24], pp. 59–66)
• Mixing Problems: Let q(t) denote the quantity of a pollutant in
a container at time t. Then the rate at which q(t) changes with time
is the difference between the rate at which the pollutant enters the
container and the rate at which it is removed. Suppose the container
has volume V , water with a concentration a of pollutant enters the
container at a rate r and the well-stirred mixture leaves the container
Chapter 5

Qualitative Analysis of ODEs (Chapters 2,3)
y 00 + y = 0, (5.2)
generally, we will not be able to do this. Instead, we can try to answer cer-
tain questions about the behavior of the solution, without actually finding
the solution; such an approach is called qualitative analysis. The discussion
here is based on that in Simmons [42].
Theorem 5.1 Let P(x) and Q(x) be continuous functions on the interval [a, b]. If x_0 is any point in [a, b] and y_0 and y_0' are any real numbers, then there is a unique solution of Equation (5.1) satisfying the conditions y(x_0) = y_0 and y'(x_0) = y_0'.
The proof of this theorem is somewhat lengthy and we shall omit it here.
Proof: We know that solutions y_1(x) and y_2(x) are linearly independent if and only if the Wronskian

W(x, y_1, y_2) = y_1(x) y_2'(x) - y_2(x) y_1'(x)

is different from zero for all x in the interval [a, b]. Therefore, when the two functions are linearly independent, the function W(x, y_1, y_2) must have constant sign on the interval [a, b]. Therefore, the two functions y_1(x) and y_2(x) have no common zero. Suppose that y_2(x_1) = y_2(x_2) = 0, with x_1 < x_2 successive zeros of y_2(x). Suppose, in addition, that y_2(x) > 0 in the interval (x_1, x_2). Therefore, we have y_2'(x_1) > 0 and y_2'(x_2) < 0. It follows that y_1(x_1) and y_1(x_2) have opposite signs, and there must be a zero of y_1 between x_1 and x_2.
form
has e^x and e^{-x} for solutions. Since we are interested in oscillatory solutions, we restrict q(x) to be (eventually) positive. With q(x) > 0 and

\int_1^{\infty} q(x)\, dx = \infty,

the solution u(x) will have infinitely many zeros, but only finitely many on any bounded interval.
Theorem 5.3 If q(x) < 0 for all x, then u(x) has at most one zero.
Proof: Let u(x_0) = 0. Since u(x) is not identically zero, we must have u'(x_0) \ne 0, by Theorem 5.1. Therefore, assume that u'(x) > 0 for x in the interval [x_0, x_0 + \epsilon], where \epsilon is some positive number. Since u''(x) = -q(x)u(x), we know that u''(x) > 0 also, for x in the interval [x_0, x_0 + \epsilon]. So the slope of u(x) is increasing to the right of x_0, and so there can be no zero of u(x) to the right of x_0. A similar argument shows that there can be no zeros of u(x) to the left of x_0.
Theorem 5.4 If q(x) > 0 for all x > 0 and \int_1^{\infty} q(x)\, dx = \infty, then u(x) has infinitely many positive zeros.
Proof: Assume, to the contrary, that u(x) has only finitely many positive
zeros, and that there are no positive zeros to the right of the positive
number x0 . Assume also that u(x0 ) > 0. From u00 (x) = −q(x)u(x) we
know that the slope of u(x) is decreasing to the right of x0 , so long as u(x)
remains above the x-axis. If the slope ever becomes negative, the graph of
u(x) will continue to drop at an ever increasing rate and will have to cross
the x-axis at some point to the right of x0 . Therefore, to avoid having a
root beyond x0 , the slope must remain positive. We prove the theorem by
showing that the slope eventually becomes negative.
Let v(x) = -u'(x)/u(x), for x \ge x_0. Then v'(x) = q(x) + v^2(x), and

v(x) - v(x_0) = \int_{x_0}^{x} q(x)\, dx + \int_{x_0}^{x} v^2(x)\, dx.
Since

\int_1^{\infty} q(x)\, dx = \infty,
x^2 y'' + x y' + (x^2 - \nu^2) y = 0.   (5.4)
This suggests that other differential equations that can be written in Sturm-
Liouville form may have eigenfunction solutions that are also orthogonal,
with respect to some appropriate inner product. As we have seen, this
program works out beautifully. What is happening here is a transition from
classical applied mathematics, with its emphasis on particular problems
and equations, to a more modern, 20th century style mathematics, with
an emphasis on families of functions or even more abstract inner-product
spaces, Hilbert spaces, Banach spaces, and so on.
Chapter 6

The Trans-Atlantic Cable (Chapters 4,12)

6.1 Introduction
In 1815, at the end of the war with England, the US was a developing coun-
try, with most people living on small farms, eating whatever they could
grow themselves. Only those living near navigable water could market
their crops. Poor transportation and communication kept them isolated.
By 1848, at the end of the next war, this time with Mexico, things were
different. The US was a transcontinental power, integrated by railroads,
telegraph, steamboats, the Erie Canal, and innovations in mass production
and agriculture. In 1828, the newly elected President, Andrew Jackson,
arrived in Washington by horse-drawn carriage; he left in 1837 by train.
The most revolutionary change was in communication, where the recent
advances in understanding electromagnetism produced the telegraph. It
wasn’t long before efforts began to lay a telegraph cable under the At-
lantic Ocean, even though some wondered what England and the US could
possibly have to say to one another.
The laying of the trans-Atlantic cable was, in many ways, the 19th
century equivalent of landing a man on the moon, involving, as it did,
considerable expense, too frequent failure, and a level of precision in en-
gineering design and manufacturing never before attempted. From a sci-
entific perspective, it was probably more difficult, given that the study of
electromagnetism was in its infancy at the time.
Early on, Faraday and others worried that sending a message across a
vast distance would take a long time, but they reasoned, incorrectly, that
this would be similar to filling a very long hose with water. What they
did not realize initially was that, as William Thomson was to discover,
where m is the mass of the block, a the coefficient of friction, and k the
spring constant.
The charge Q(t) deposited on a capacitor in an electrical circuit due to
an imposed electromotive force E(t) is similarly described by the ordinary
differential equation
L Q''(t) + R Q'(t) + \frac{1}{C} Q(t) = E(t).   (6.2)
The first term, containing the inductance coefficient L, describes the por-
tion of the force E(t) devoted to overcoming the effect of a change in the
current I(t) = Q0 (t); here L is analogous to the mass m. The second term,
containing the resistance coefficient R, describes that portion of the force
E(t) needed to overcome resistance to the current I(t); now R is analogous
to the friction coefficient a. Finally, the third term, containing the reciprocal of the capacitance C, describes the portion of E(t) used to store charge on the capacitor; now \frac{1}{C} is analogous to k, the spring constant.
would move along the cable undistorted; here H(t) is the Heaviside function
that is zero for t < 0 and one for t ≥ 0. Thomson (later Sir William
Thomson, and even later, Lord Kelvin) thought otherwise.
Thomson argued that there would be a voltage drop over an interval
[x, x+∆x] due to resistance to the current i(x, t) passing through the cable,
so that
u(x + ∆x, t) − u(x, t) = −Ri(x, t)∆x,
and so

\frac{\partial u}{\partial x} = -R i.
He also argued that there would be capacitance to the ground, made more
significant under water. Since the apparent change in current due to the
changing voltage across the capacitor is

i(x + \Delta x, t) - i(x, t) = -C \frac{\partial u}{\partial t}(x, t)\, \Delta x,

we have

\frac{\partial i}{\partial x} = -C \frac{\partial u}{\partial t}.
where L(E)(s) denotes the Laplace transform of E(t). Since U(x, s) is the product of two functions of s, the convolution theorem applies. But first, it is helpful to find out which function has for its Laplace transform the function e^{-\alpha x \sqrt{s}}. The answer comes from the following fact: the function

\frac{b\, e^{-b^2/4t}}{2\sqrt{\pi}\, t^{3/2}}

has for its Laplace transform the function e^{-b\sqrt{s}}. Therefore, we can write

u(x, t) = \frac{\sqrt{CR}\, x}{2\sqrt{\pi}} \int_0^t E(t - \tau)\, \frac{e^{-CRx^2/4\tau}}{\tau^{3/2}}\, d\tau.
Now we consider two special cases.
The function

\mathrm{erf}(r) = \frac{2}{\sqrt{\pi}} \int_0^r e^{-z^2}\, dz

is the well known error function, so we can write

u(x, t) = 1 - \mathrm{erf}\Big( \frac{\sqrt{CR}\, x}{2\sqrt{t}} \Big).   (6.5)
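Equation (6.5) can be evaluated with math.erf to watch the step response diffuse down the cable; the values of C, R, and x below are made up for illustration:

```python
import math

C, R, x = 1.0, 1.0, 1.0   # illustrative values, not from the text

def u(t):
    # the step response of Equation (6.5) at distance x, time t > 0
    return 1.0 - math.erf(math.sqrt(C * R) * x / (2.0 * math.sqrt(t)))

early, late = u(0.01), u(100.0)
assert 0.0 < early < 0.01            # almost nothing has arrived yet
assert 0.9 < late < 1.0              # the voltage has essentially arrived
assert u(1.0) < u(10.0) < u(100.0)   # monotone rise in t
print(round(early, 4), round(late, 4))
```

The slow, smeared-out rise (rather than a sharp step) is exactly the distortion that worried the cable's designers.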
6.5 Heaviside to the Rescue
or

\frac{1}{CL} u_{xx} = u_{tt} + \frac{R}{L} u_t.   (6.9)
If R/L could be made small, we would have a wave equation again, but with a propagation speed of 1/\sqrt{CL}. This suggested to Heaviside that one
way to obtain undistorted signaling would be to increase L, since we cannot
realistically hope to change R. He argued for years for the use of cables
with higher inductance, which eventually became the practice, helped along
by the invention of new materials, such as magnetic alloys, that could be
incorporated into the cables.
Then we have

U(x, s) = e^{-\sqrt{GR}\, x}\, \frac{1}{s}\big(1 - e^{-Ts}\big)\, e^{-\sqrt{CL}\, x s},

so that

u(x, t) = e^{-\sqrt{GR}\, x} \Big( H(t - x\sqrt{CL}) - H(t - T - x\sqrt{CL}) \Big).   (6.10)

This tells us that we have an undistorted pulse that arrives at the point x at the time t = x\sqrt{CL}.
In order to have GL = CR, we need L = CR/G. Since C and R are
more or less fixed, and G is typically reduced by insulation, L will need to
be large. Again, this argues for increasing the inductance in the cable.
Chapter 7

The Laplace Transform and the Ozone Layer (Chapter 4)
Within the ozone layer, the amount of UV radiation scattered in the direction \theta is given by

S(\theta, \theta_0)\, I(0)\, e^{-kx/\cos\theta_0}\, \Delta p,

where S(\theta, \theta_0) is a known parameter, and \Delta p is the change in the pressure of the ozone within the infinitesimal layer [x, x + \Delta x], and so is proportional to the concentration of ozone within that layer.
where

\beta = k\Big[ \frac{1}{\cos\theta_0} - \frac{1}{\cos\theta} \Big].

This superposition of intensity can then be written as

S(\theta, \theta_0)\, I(0)\, e^{-kX/\cos\theta_0} \int_0^X e^{-x\beta}\, p'(x)\, dx.
Since p(0) = 0 and p(X) can be measured, our data is then the Laplace transform value

\int_0^{+\infty} e^{-\beta x} p(x)\, dx;

note that we can replace the upper limit X with +\infty if we extend p(x) as zero beyond x = X.
The variable β depends on the two angles θ and θ0 . We can alter θ as
we measure and θ0 changes as the sun moves relative to the earth. In this
way we get values of the Laplace transform of p(x) for various values of β.
7.4 The Laplace Transform Data
The problem then is to recover p(x) from these values. Because the Laplace
transform involves a smoothing of the function p(x), recovering p(x) from
its Laplace transform is more ill-conditioned than is the Fourier transform
inversion problem.
Chapter 8

The Finite Fourier Transform (Chapter 7)
where the \omega_k are known, but the a_k and b_k are not. We find the unknown a_k and b_k by fitting the model to the data. We obtain data f(x_n) corresponding to the N points x_n, for n = 0, 1, \ldots, N-1, where N = 2M+1, and we solve the system

f(x_n) = \frac{1}{2} a_0 + \sum_{k=1}^{M} \Big( a_k \cos(\omega_k x_n) + b_k \sin(\omega_k x_n) \Big),
As we shall see in this subsection, choosing \omega_k = \frac{\pi}{A} k leads to a form of
orthogonality that will allow us to calculate the parameters in a relatively
simple manner, with the number of multiplications on the order of N 2 .
Later, we shall see how to use the fast Fourier transform algorithm to
reduce the number of computations even more.
For fixed j = 1, \ldots, M consider the sums

\sum_{n=0}^{N-1} f_n \cos\Big(\frac{2\pi}{N} jn\Big) = \frac{1}{2} a_0 \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi}{N} jn\Big) + \sum_{k=1}^{M} \Bigg( a_k \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi}{N} kn\Big) \cos\Big(\frac{2\pi}{N} jn\Big) + b_k \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi}{N} kn\Big) \cos\Big(\frac{2\pi}{N} jn\Big) \Bigg),   (8.7)

and

\sum_{n=0}^{N-1} f_n \sin\Big(\frac{2\pi}{N} jn\Big) = \frac{1}{2} a_0 \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi}{N} jn\Big) + \sum_{k=1}^{M} \Bigg( a_k \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi}{N} kn\Big) \sin\Big(\frac{2\pi}{N} jn\Big) + b_k \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi}{N} kn\Big) \sin\Big(\frac{2\pi}{N} jn\Big) \Bigg).   (8.8)
and

\sum_{n=0}^{N-1} \sin\Big(\frac{2\pi}{N} kn\Big) \sin\Big(\frac{2\pi}{N} jn\Big) = \begin{cases} 0, & \text{if } j \ne k \text{ or } j = k = 0; \\ N/2, & \text{if } j = k \ne 0. \end{cases}
and

2 \sin\Big(\frac{x}{2}\Big) \sum_{n=0}^{N-1} \sin(nx) = \cos\Big(\frac{x}{2}\Big) - \cos\Big(\Big(N - \frac{1}{2}\Big)x\Big).

Hints: sum over n = 0, 1, \ldots, N-1 on both sides and note that

\sin\Big(\frac{x}{2}\Big) = -\sin\Big(-\frac{x}{2}\Big).
Exercise 8.4 Use trigonometric identities to show that

\sin\Big(\Big(N - \frac{1}{2}\Big)x\Big) + \sin\Big(\frac{x}{2}\Big) = 2 \sin\Big(\frac{N}{2} x\Big) \cos\Big(\frac{N-1}{2} x\Big),

and

\cos\Big(\frac{x}{2}\Big) - \cos\Big(\Big(N - \frac{1}{2}\Big)x\Big) = 2 \sin\Big(\frac{N}{2} x\Big) \sin\Big(\frac{N-1}{2} x\Big).

Hints: Use

N - \frac{1}{2} = \frac{N}{2} + \frac{N-1}{2},

and

\frac{1}{2} = \frac{N}{2} - \frac{N-1}{2}.
\sin\Big(\frac{x}{2}\Big) \sum_{n=0}^{N-1} \cos(nx) = \sin\Big(\frac{N}{2} x\Big) \cos\Big(\frac{N-1}{2} x\Big),

and

\sin\Big(\frac{x}{2}\Big) \sum_{n=0}^{N-1} \sin(nx) = \sin\Big(\frac{N}{2} x\Big) \sin\Big(\frac{N-1}{2} x\Big).
Let m be any integer. Substituting x = \frac{2\pi m}{N} in the equations in the previous exercise, we obtain

\sin\Big(\frac{\pi}{N} m\Big) \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi m n}{N}\Big) = \sin(\pi m) \cos\Big(\frac{N-1}{N} \pi m\Big),   (8.9)

and

\sin\Big(\frac{\pi}{N} m\Big) \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi m n}{N}\Big) = \sin(\pi m) \sin\Big(\frac{N-1}{N} \pi m\Big).   (8.10)
With m = k + j, we have

\sin\Big(\frac{\pi}{N} (k+j)\Big) \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi (k+j) n}{N}\Big) = \sin(\pi(k+j)) \cos\Big(\frac{N-1}{N} \pi (k+j)\Big),   (8.11)

and

\sin\Big(\frac{\pi}{N} (k+j)\Big) \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi (k+j) n}{N}\Big) = \sin(\pi(k+j)) \sin\Big(\frac{N-1}{N} \pi (k+j)\Big).   (8.12)

With m = k - j, we have

\sin\Big(\frac{\pi}{N} (k-j)\Big) \sum_{n=0}^{N-1} \cos\Big(\frac{2\pi (k-j) n}{N}\Big) = \sin(\pi(k-j)) \cos\Big(\frac{N-1}{N} \pi (k-j)\Big),   (8.13)

and

\sin\Big(\frac{\pi}{N} (k-j)\Big) \sum_{n=0}^{N-1} \sin\Big(\frac{2\pi (k-j) n}{N}\Big) = \sin(\pi(k-j)) \sin\Big(\frac{N-1}{N} \pi (k-j)\Big).   (8.14)
and that

\sum_{n=0}^{N-1} f_n \cos\Big(\frac{2\pi}{N} jn\Big) = \frac{N}{2} a_j,

and

\sum_{n=0}^{N-1} f_n \sin\Big(\frac{2\pi}{N} jn\Big) = \frac{N}{2} b_j,

for j = 1, \ldots, M.
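These orthogonality sums give a direct O(N^2) recipe for recovering the a_j and b_j from the N samples; a sketch (the coefficients below are made up for illustration):

```python
import math

M = 3
N = 2*M + 1
a_true = [1.0, 0.5, -0.25, 0.8]   # a_0 .. a_3
b_true = [0.0, 0.3, 0.6, -0.4]    # b_0 is unused by the model

def sample(n):
    # the trigonometric model evaluated at x_n, with omega_k x_n = 2*pi*k*n/N
    return a_true[0]/2 + sum(
        a_true[k]*math.cos(2*math.pi*k*n/N) + b_true[k]*math.sin(2*math.pi*k*n/N)
        for k in range(1, M+1))

f = [sample(n) for n in range(N)]

# the orthogonality sums: (2/N) * sum_n f_n cos(2*pi*j*n/N) = a_j, etc.
a_rec = [(2.0/N) * sum(f[n]*math.cos(2*math.pi*j*n/N) for n in range(N))
         for j in range(M+1)]
b_rec = [(2.0/N) * sum(f[n]*math.sin(2*math.pi*j*n/N) for n in range(N))
         for j in range(M+1)]

assert all(abs(a_rec[j] - a_true[j]) < 1e-12 for j in range(M+1))
assert all(abs(b_rec[j] - b_true[j]) < 1e-12 for j in range(1, M+1))
print("recovered")
```

Each coefficient costs a sum over N samples, hence roughly N^2 multiplications in all; the fast Fourier transform mentioned in the text reduces this further.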
f(x) = \frac{1}{N} \sum_{k=0}^{N-1} F_k \exp\Big(-i \frac{\pi}{A} kx\Big),   (8.15)

0, 1, \ldots, N-1. Setting x = \frac{2A}{N} n in Equation (8.15), we have

f_n = \frac{1}{N} \sum_{k=0}^{N-1} F_k \exp\Big(-i \frac{2\pi}{N} kn\Big).   (8.16)
whenever the denominator is not zero. From Equation (8.17) we can show that

\sum_{n=0}^{N-1} \exp\Big(i \frac{2\pi}{N} kn\Big) \exp\Big(-i \frac{2\pi}{N} jn\Big) = 0,   (8.18)

for k = 0, 1, \ldots, N-1.
Generally, given any (possibly) complex numbers fn , n = 0, 1, ..., N − 1,
the collection of coefficients Fk , k = 0, 1, ..., N − 1, is called its complex
finite Fourier transform.
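A minimal sketch of the complex finite Fourier transform and its inverse, using the synthesis convention of Equation (8.16); the orthogonality relation (8.18) is what makes the round trip exact (up to floating point):

```python
import cmath

N = 8
F = [complex(k, -k/2) for k in range(N)]   # arbitrary coefficients

# synthesis, Equation (8.16): f_n = (1/N) sum_k F_k exp(-2*pi*i*k*n/N)
f = [sum(F[k] * cmath.exp(-2j*cmath.pi*k*n/N) for k in range(N)) / N
     for n in range(N)]

# analysis (the inverse): F_k = sum_n f_n exp(+2*pi*i*k*n/N)
F_back = [sum(f[n] * cmath.exp(2j*cmath.pi*k*n/N) for n in range(N))
          for k in range(N)]

assert all(abs(F_back[k] - F[k]) < 1e-9 for k in range(N))
print("round trip ok")
```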
Chapter 9

Transmission and Remote Sensing (Chapter 8)

and

b_n = \frac{1}{L} \int_{-L}^{L} f(x) \sin\Big(\frac{n\pi}{L} x\Big)\, dx.   (9.3)
In Figure 9.1 below, the function f (x) is the solid-line figure in both graphs.
In the bottom graph, we see the true f (x) and a DFT estimate. The top
graph is the result of band-limited extrapolation, a technique for predicting
missing Fourier coefficients.
In our first example, we imagine that the strength function f (x) is unknown
and we want to determine it. It could be the case that the signals originate
at the points x, as with light or radio waves from the sun, or are simply
reflected from the points x, as is sunlight from the moon or radio waves
in radar. Later in this chapter, we shall investigate a related example, in
which the points x transmit known signals and we want to determine what
is received elsewhere.
\omega\Big(t - \frac{D}{c}\Big) = \frac{\pi}{2},
\frac{\omega \cos(\theta)}{c} = \frac{n\pi}{2L}.   (9.15)
Now we have twice as many data points: we now have

A_n = \int_{-2L}^{2L} f(x) \cos\Big(\frac{n\pi}{2L} x\Big)\, dx = \int_{-L}^{L} f(x) \cos\Big(\frac{n\pi}{2L} x\Big)\, dx,   (9.16)

and

B_n = \int_{-2L}^{2L} f(x) \sin\Big(\frac{n\pi}{2L} x\Big)\, dx = \int_{-L}^{L} f(x) \sin\Big(\frac{n\pi}{2L} x\Big)\, dx,   (9.17)
f_{MDFT}(x) = \frac{1}{2} c_0 + \sum_{n=1}^{2N} \Big( c_n \cos\Big(\frac{n\pi}{2L} x\Big) + d_n \sin\Big(\frac{n\pi}{2L} x\Big) \Big),   (9.18)
where the cn and dn are not yet determined. Then we determine the cn and
dn by requiring that the function fMDFT (x) could be the correct answer;
that is, we require that fMDFT (x) be consistent with the measured data.
Therefore, we must have
∫_{−L}^{L} fMDFT (x) cos(nπx/(2L)) dx = An , (9.19)

and

∫_{−L}^{L} fMDFT (x) sin(nπx/(2L)) dx = Bn , (9.20)
(2c/(ω cos(θ))) sin(Lω cos(θ)/c), (9.22)
whose absolute value is then the strength of the signal at P . Is it possible
that the strength of the signal at some P is zero?
To have zero signal strength, we need
sin(Lω cos(θ)/c) = 0,
without
cos(θ) = 0.
Therefore, we need
Lω cos(θ)/c = nπ, (9.23)
for some positive integer n. Notice that this can happen only if

n ≤ Lω/(πc) = 2L/λ. (9.24)
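Equations (9.23) and (9.24) can be turned into a small routine that lists the null directions. The sketch below assumes an acoustic array; the value c = 343 m/s (the speed of sound in air) and the array parameters are chosen purely for illustration.

```python
import numpy as np

def null_angles(L, omega, c=343.0):
    """Angles theta with zero signal strength, from Equation (9.23):
    L*omega*cos(theta)/c = n*pi, for integers 1 <= n <= 2L/lambda."""
    lam = 2 * np.pi * c / omega           # wavelength
    n_max = int(L * omega / (np.pi * c))  # the bound (9.24): n <= 2L/lambda
    thetas = [np.arccos(n * np.pi * c / (L * omega))
              for n in range(1, n_max + 1)]
    return lam, n_max, thetas

# A 2-meter half-aperture at 1 kHz.
lam, n_max, thetas = null_angles(L=2.0, omega=2 * np.pi * 1000.0)
```

For these illustrative values, 2L/λ ≈ 11.66, so there are eleven null directions.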
G(u) = F (arccos(u))/√(1 − u²),
defined for u in the interval [−1, 1].
Measuring the signals received at x and −x, we can obtain the integrals
∫_{−1}^{1} G(u) cos((xω/c)u) du, (9.25)

and

∫_{−1}^{1} G(u) sin((xω/c)u) du. (9.26)
(1/2) ∫_{−1}^{1} G(u) cos(nπu) du, (9.27)

and

(1/2) ∫_{−1}^{1} G(u) sin(nπu) du. (9.28)
The upper bound 2L/λ, which is the length of our array of sensors, in units
of wavelength, is often called the aperture of the array.
Once we have some of the Fourier coefficients of the function G(u), we
can estimate G(u) for |u| ≤ 1 and, from that estimate, obtain an estimate
of the original F (θ).
9.6.2 Over-sampling
One situation in which over-sampling arises naturally is sonar array
processing. Suppose that an array of sensors has been built to operate at
a design frequency of ω0 , which means that we have placed sensors at the
points x in [−L, L] that satisfy the equation
x = n(πc/ω0 ) = n(λ0 /2) = n∆0 , (9.32)

where λ0 is the wavelength corresponding to the frequency ω0 and ∆0 = λ0 /2
is the Nyquist spacing for frequency ω0 . Now suppose that we want to
operate the array at another frequency, say ω. The sensors cannot be
moved, so we must make do with sensors at the points x determined by
the design frequency.
Consider, first, the case in which the second frequency ω is less than
the design frequency ω0 . Then its wavelength λ is larger than λ0 , and the
Nyquist spacing ∆ = λ/2 for ω is larger than ∆0 . So we have over-sampled.
The measurements taken at the sensors provide us with the integrals
(1/K) ∫_{−1}^{1} G(u) cos(nπu/K) du, (9.33)

and

(1/K) ∫_{−1}^{1} G(u) sin(nπu/K) du, (9.34)
where K = ω0 /ω > 1. These are Fourier coefficients of the function G(u),
viewed as defined on the interval [−K, K], which is larger than [−1, 1], and
taking the value zero outside [−1, 1]. If we then use the DFT estimate of
G(u), it will estimate G(u) for the values of u within [−1, 1], which is what
we want, as well as for the values of u outside [−1, 1], where we already
know G(u) to be zero. Once again, we can use the modified DFT, the
MDFT, to include the prior knowledge that G(u) = 0 for u outside [−1, 1]
to improve our reconstruction of G(u) and F (θ). In the over-sampled case
the interval [−1, 1] is called the visible region (although audible region seems
more appropriate for sonar), since it contains all the values of u that can
correspond to actual angles of arrival of acoustic energy.
9.6.3 Under-sampling
Now suppose that the frequency ω that we want to consider is greater than
the design frequency ω0 . This means that the spacing between the sensors
is too large; we have under-sampled. Once again, however, we cannot move
the sensors and must make do with what we have.
Now the measurements at the sensors provide us with the integrals
(1/K) ∫_{−1}^{1} G(u) cos(nπu/K) du, (9.35)

and

(1/K) ∫_{−1}^{1} G(u) sin(nπu/K) du, (9.36)
where K = ω0 /ω < 1. These are Fourier coefficients of the function G(u),
viewed as defined on the interval [−K, K], which is smaller than [−1, 1],
and taking the value zero outside [−K, K]. Since G(u) is not necessarily
zero outside [−K, K], treating it as if it were zero there results in a type
of error known as aliasing, in which energy corresponding to angles whose
u lies outside [−K, K] is mistakenly assigned to values of u that lie within
[−K, K]. Aliasing is a common phenomenon; the strobe-light effect is
aliasing, as is the apparent backward motion of the wheels of stage-coaches
in cowboy movies. In the case of the strobe light, we are permitted to view
the scene at times too far apart for us to sense continuous, smooth motion.
In the case of the wagon wheels, the frames of the film capture instants of
time too far apart for us to see the true rotation of the wheels.
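The same phenomenon is easy to reproduce numerically: a sinusoid sampled below its Nyquist rate is indistinguishable, at the sample times, from a slower one. A minimal sketch (the frequencies are chosen for illustration):

```python
import numpy as np

# A 9 Hz sinusoid sampled at 10 samples per second (the Nyquist rate
# would be 18) produces exactly the samples of a 1 Hz sinusoid: energy
# at 9 Hz is mistakenly assigned to 1 Hz, which is aliasing.
fs = 10.0
t = np.arange(0, 2, 1 / fs)
high = np.cos(2 * np.pi * 9.0 * t)
low = np.cos(2 * np.pi * 1.0 * t)
assert np.allclose(high, low)   # the two signals agree at every sample
```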
The data are then Fourier transform values of the complex function F (k);
F (k) is defined for all three-dimensional real vectors k, but is zero, in
theory, at least, for those k whose squared length ||k||² is not equal to
ω²/c². Our goal is then to estimate F (k) from measured values of its
Fourier transform. Since each k is a normal vector for its planewave field
component, determining the value of F (k) will tell us the strength of the
planewave component coming from the direction k.
Now the ambiguity is greater than in the planar array case. Once we have
k1 , we know that
k2² + k3² = (ω/c)² − k1², (9.47)
which describes points P lying on a circle on the surface of the distant
sphere, with the vector (k1 , 0, 0) pointing at the center of the circle. It
is said then that we have a cone of ambiguity. One way to resolve the
situation is to assume k3 = 0; then |k2 | can be determined and we have
remaining only the ambiguity involving the sign of k2 . Once again, in many
applications, this remaining ambiguity can be resolved by other means.
Once we have resolved any ambiguity, we can view the function F (k)
as F (k1 ), a function of the single variable k1 . Our measurements give us
values of f (x), the Fourier transform of F (k1 ). As in the two-dimensional
case, the restriction on the size of the vectors k means that the function
F (k1 ) has bounded support. Consequently, its Fourier transform, f (x),
cannot have bounded support. Therefore, we shall never have all of f (x),
and so cannot hope to reconstruct F (k1 ) exactly, even for noise-free data.
The wavelength λ for gamma rays is around one Angstrom, which is 10−10
meters; for x-rays it is about one millimicron, or 10−9 meters. The visi-
ble spectrum has wavelengths that are a little less than one micron, that
is, 10−6 meters. Infrared radiation has wavelengths up to about one
millimeter; microwaves have wavelengths between one centimeter and one meter.
Broadcast radio has a λ running from about 10 meters to 1000 meters. The
so-called long radio waves can have wavelengths several thousand meters
long, prompting clever methods of antenna design for radio astronomy.
The sun has an angular diameter of 30 min. of arc, or one-half of a
degree, when viewed from earth, but the needed resolution was more like
3 min. of arc. Such resolution requires a radio telescope 1000 wavelengths
across, which means a diameter of 1km at a wavelength of 1 meter; in
1942 the largest military radar antennas were less than 5 meters across.
A solution was found, using the method of reconstructing an object from
line-integral data, a technique that surfaced again in tomography.
9.8. AN EXAMPLE: THE SOLAR-EMISSION PROBLEM 69
Figure 9.5: Transmission Pattern A(θ): m = 0.9, 0.5, 0.25, 0.125 and N = 21.
Having obtained F (ω) we can recapture the original f (x) from the Fourier-
Transform Inversion Formula:
f (x) = (1/2π) ∫_{−∞}^{∞} F (ω) e^{−iωx} dω. (10.2)
Precisely how we interpret the infinite integrals that arise in the discus-
sion of the Fourier transform will depend on the properties of the function
f (x).
CHAPTER 10. PROPERTIES OF THE FOURIER TRANSFORM (CHAPTER 8)
Exercise 10.1 Let F (ω) be the FT of the function f (x). Use the defini-
tions of the FT and IFT given in Equations (10.1) and (10.2) to establish
the following basic properties of the Fourier transform operation:
Exercise 10.2 Show that f is an even function if and only if its Fourier
transform, F , is an even function.
Exercise 10.3 Show that f is real-valued if and only if its Fourier trans-
form F is conjugate-symmetric, that is, F (−ω) = F (ω)*, where * denotes
complex conjugation. Therefore, f is real-valued and even if and only if its
Fourier transform F is real-valued and even.
Exercise 10.7 Show that the IFT of the function F (ω) = 2i/ω is f (x) =
sgn(x).
Hints: Write the formula for the inverse Fourier transform of F (ω) as
f (x) = (1/2π) ∫_{−∞}^{+∞} (2i/ω) cos(ωx) dω − (i/2π) ∫_{−∞}^{+∞} (2i/ω) sin(ωx) dω,

which reduces to

f (x) = (1/π) ∫_{−∞}^{+∞} (1/ω) sin(ωx) dω,
since the integrand of the first integral is odd. For x > 0 consider the
Fourier transform of the function χx (t). For x < 0 perform the change of
variables u = −x.
Generally, the functions f (x) and F (ω) are complex-valued, so that we
may speak about their real and imaginary parts. The next exercise explores
the connections that hold among these real-valued functions.
Exercise 10.8 Let f (x) be arbitrary and F (ω) its Fourier transform. Let
F (ω) = R(ω) + iX(ω), where R and X are real-valued functions, and
similarly, let f (x) = f1 (x) + if2 (x), where f1 and f2 are real-valued. Find
relationships between the pairs R,X and f1 ,f2 .
Hint: If f (x) = 0 for x < 0 then f (x)sgn(x) = f (x). Apply the convolution
theorem, then compare real and imaginary parts.
10.4. DIRAC DELTAS 79
We describe this by saying that the function f (x) = sin(Ωx)/(πx) has the sifting
property for all Ω-band-limited functions g(x).
As Ω grows larger, f (0) approaches +∞, while f (x) goes to zero for
x ≠ 0. The limit is therefore not a function; it is a generalized function
called the Dirac delta function at zero, denoted δ(x). For this reason the
function f (x) = sin(Ωx)/(πx) is called an approximate delta function. The FT
of δ(x) is the function F (ω) = 1 for all ω. The Dirac delta function δ(x)
enjoys the sifting property for all g(x); that is,
g(x0 ) = ∫_{−∞}^{∞} g(x) δ(x − x0 ) dx.
It follows from the sifting and shifting properties that the FT of δ(x − x0 )
is the function eix0 ω .
The formula for the inverse FT now says
δ(x) = (1/2π) ∫_{−∞}^{∞} e^{−ixω} dω. (10.3)
Exercise 10.10 Use the fact that sgn(x) = 2u(x) − 1 and the previous
exercise to show that f (x) = u(x) has the FT F (ω) = i/ω + πδ(ω).
Exercise 10.11 Let f, F be a FT pair. Let g(x) = ∫_{−∞}^{x} f (y) dy. Show that
the FT of g(x) is G(ω) = πF (0)δ(ω) + iF (ω)/ω.
Exercise 10.12 Let f (x), F (ω) and g(x), G(ω) be Fourier transform pairs.
Use Equation (10.3) to establish the Parseval-Plancherel equation
⟨f, g⟩ = ∫ f (x) g(x)* dx = (1/2π) ∫ F (ω) G(ω)* dω,

from which it follows that

||f ||² = ⟨f, f ⟩ = ∫ |f (x)|² dx = (1/2π) ∫ |F (ω)|² dω.
Exercise 10.13 The one-sided Laplace transform (LT) of f is F given by
F(z) = ∫_{0}^{∞} f (x) e^{−zx} dx.
Compute F(z) for f (x) = u(x), the Heaviside function. Compare F(−iω)
with the FT of u.
10.6. CONVOLUTION FILTERS 81
H(ω) = F (ω)G(ω).
The function G(ω) describes the effects of the system, the telephone line in
our first example, or the weak eyes in the second example, or the refraction
of light as it passes through the atmosphere, in optical imaging. If we
can use our measurements of h(x) to estimate H(ω) and if we have some
knowledge of the system distortion function, that is, some knowledge of
G(ω) itself, then there is a chance that we can estimate F (ω), and thereby
estimate f (x).
If we apply the Fourier Inversion Formula to H(ω) = F (ω)G(ω), we get
h(x) = (1/2π) ∫ F (ω) G(ω) e^{−iωx} dω. (10.4)
The function h(x) that results is h(x) = (f ∗ g)(x), the convolution of the
functions f (x) and g(x), with the latter given by
g(x) = (1/2π) ∫ G(ω) e^{−iωx} dω. (10.5)
Note that, if f (x) = δ(x), then h(x) = g(x). In the image processing
example, this says that if the true picture f is a single bright spot, the
blurred image h is g itself. For that reason, the function g is called the
point-spread function of the distorting system.
Convolution filtering refers to the process of converting any given func-
tion, say f (x), into a different function, say h(x), by convolving f (x) with a
fixed function g(x). Since this process can be achieved by multiplying F (ω)
by G(ω) and then inverse Fourier transforming, such convolution filters are
studied in terms of the properties of the function G(ω), known in this con-
text as the system transfer function, or the optical transfer function (OTF);
when ω is a frequency, rather than a spatial frequency, G(ω) is called the
frequency-response function of the filter. The magnitude of G(ω), |G(ω)|,
is called the modulation transfer function (MTF). The study of convolu-
tion filters is a major part of signal processing. Such filters provide both
reasonable models for the degradation signals undergo, and useful tools for
reconstruction.
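A discrete sketch of convolution filtering, multiplying transforms and then inverting, can be written with NumPy's FFT. The signal and kernel below are illustrative, and the discrete computation is a circular convolution rather than the continuous one in the text.

```python
import numpy as np

# Convolution filtering: multiply the transforms of f and g, then invert.
# g is a simple moving-average point-spread function; sizes are illustrative.
N = 64
f = np.zeros(N); f[20] = 1.0; f[40] = 1.0     # "true" signal: two spikes
g = np.zeros(N); g[:5] = 1.0 / 5.0            # blurring kernel
h = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))   # H(w) = F(w)G(w)

# The same h, computed directly as a (circular) convolution:
direct = np.array([sum(f[j] * g[(k - j) % N] for j in range(N))
                   for k in range(N)])
assert np.allclose(h, direct)   # the convolution theorem in action
```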
Let us rewrite Equation (10.4), replacing F (ω) with its definition, as
given by Equation (10.1). Then we have
h(x) = (1/2π) ∫ (∫ f (t) e^{iωt} dt) G(ω) e^{−iωx} dω. (10.6)

Interchanging the order of integration, we get

h(x) = ∫ f (t) ((1/2π) ∫ G(ω) e^{iω(t−x)} dω) dt. (10.7)
where J0 denotes the 0th order Bessel function. Using the identity
∫_{0}^{z} t^n J_{n−1}(t) dt = z^n Jn (z), (10.13)
we have
F (ρ, φ) = (2πR/ρ) J1 (ρR). (10.14)
Notice that, since f (x, y) is a radial function, that is, dependent only on
the distance from (0, 0) to (x, y), its Fourier transform is also radial.
The first positive zero of J1 (t) is around t = 4, so when we measure
F at various locations and find F (ρ, φ) = 0 for a particular (ρ, φ), we can
estimate R ≈ 4/ρ. So, even when a distant spherical object, like a star,
is too far away to be imaged well, we can sometimes estimate its size by
finding where the intensity of the received signal is zero [32].
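This rule of thumb is easy to test numerically, assuming SciPy is available; the radius below is illustrative.

```python
import numpy as np
from scipy.special import jn_zeros

# First positive zero of J1: about 3.8317, the text's "around t = 4".
t1 = jn_zeros(1, 1)[0]

R = 2.0                  # true radius, chosen for illustration
rho_null = t1 / R        # smallest rho with F(rho, phi) = 0, by Eq. (10.14)
R_est = 4.0 / rho_null   # the text's estimate R ~ 4/rho
# R_est is about 2.09, within roughly five percent of the true R
```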
infinities. This is typical behavior. Notice also that the smaller the A, the
slower F (ω) dies out; the first zeros of F (ω) are at |ω| = π/A, so the main
lobe widens as A goes to zero. The function f (x) is not continuous, so its
Fourier transform cannot be absolutely integrable. In this case, the Fourier-
Transform Inversion Formula must be interpreted as involving convergence
in the L2 norm.
Chapter 11
Transmission Tomography
(Chapter 8)
to a greater or lesser extent. If the intensity of the beam upon entry is Iin
and Iout is its lower intensity after passing through the body, then
Iout = Iin e^{−∫_L f},

I′(s) = −f (s)I(s).
t = x cos θ + y sin θ,
and
s = −x sin θ + y cos θ.
If we have the new coordinates (t, s) of a point, the old coordinates are
(x, y) given by
x = t cos θ − s sin θ,
and
y = t sin θ + s cos θ.
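A quick check that the two coordinate changes invert each other (the function names are ours):

```python
import numpy as np

def rotate(x, y, theta):
    """New coordinates (t, s) of the point (x, y)."""
    t = x * np.cos(theta) + y * np.sin(theta)
    s = -x * np.sin(theta) + y * np.cos(theta)
    return t, s

def unrotate(t, s, theta):
    """Recover the old coordinates (x, y) from (t, s)."""
    x = t * np.cos(theta) - s * np.sin(theta)
    y = t * np.sin(theta) + s * np.cos(theta)
    return x, y

t, s = rotate(3.0, 4.0, np.pi / 6)
x, y = unrotate(t, s, np.pi / 6)
assert np.isclose(x, 3.0) and np.isclose(y, 4.0)
```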
12.1 Overview
In many applications, such as in image processing, the system of linear
equations to be solved is quite large, often several tens of thousands of
equations in about the same number of unknowns. In these cases, issues
such as the costs of storage and retrieval of matrix entries, the computa-
tion involved in apparently trivial operations, such as matrix-vector prod-
ucts, and the speed of convergence of iterative methods demand greater
attention. At the same time, the systems to be solved are often under-
determined, and solutions satisfying certain additional constraints, such as
non-negativity, are required. The ART and the MART are two iterative
algorithms that are designed to address these issues.
Both the algebraic reconstruction technique (ART) and the multiplica-
tive algebraic reconstruction technique (MART) were introduced as two
iterative methods for discrete image reconstruction in transmission tomog-
raphy.
Both methods are what are called row-action methods, meaning that
each step of the iteration uses only a single equation from the system. The
MART is limited to non-negative systems for which non-negative solutions
are sought. In the under-determined case, both algorithms find the solution
closest to the starting vector, in the two-norm or weighted two-norm sense
for ART, and in the cross-entropy sense for MART, so both algorithms
can be viewed as solving optimization problems. For both algorithms, the
starting vector can be chosen to incorporate prior information about the
desired solution. In addition, the ART can be employed in several ways to
obtain a least-squares solution, in the over-determined case.
94 CHAPTER 12. THE ART AND MART (CHAPTER 15)
xj^{k+1} = xj^k + (1/|Li |)(bi − (Axk )i ), (12.1)

for j in Li , and

xj^{k+1} = xj^k , (12.2)
Hi = {x|(Ax)i = bi }, (12.3)
Because the ART uses only a single equation at each step, it has been called
a row-action method. Figures 12.2 and 12.3 illustrate the behavior of the
ART.
for each i = 1, ..., I, and that the entries of b have been rescaled accordingly,
to preserve the equations Ax = b. The ART is then the following: begin
with an arbitrary vector x0 ; for each nonnegative integer k, having found
xk , the next iterate xk+1 has entries
xj^{k+1} = xj^k + Aij (bi − (Axk )i ). (12.7)
When the system Ax = b has exact solutions the ART converges to the
solution closest to x0 , in the 2-norm. How fast the algorithm converges
will depend on the ordering of the equations and on whether or not we use
relaxation. In selecting the equation ordering, the important thing is to
avoid particularly bad orderings, in which the hyperplanes Hi and Hi+1
are nearly parallel.
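A minimal sketch of the ART iteration of Equation (12.7), assuming the rows of A have been rescaled to have Euclidean norm one (with b rescaled accordingly); the small system is illustrative.

```python
import numpy as np

def art(A, b, x0, sweeps=200):
    """ART per Equation (12.7); rows of A are assumed to have norm one."""
    x = x0.astype(float).copy()
    I = A.shape[0]
    for k in range(sweeps * I):
        i = k % I                       # row-action: one equation per step
        x += A[i] * (b[i] - A[i] @ x)   # project onto the hyperplane H_i
    return x

A = np.array([[3.0, 4.0], [0.0, 5.0]])
b = np.array([11.0, 10.0])              # exact solution is (1, 2)
norms = np.linalg.norm(A, axis=1, keepdims=True)
An, bn = A / norms, b / norms[:, 0]     # rescale rows to norm one
x = art(An, bn, x0=np.zeros(2))
assert np.allclose(x, [1.0, 2.0])
```

Starting from x0 = 0, the iterates converge to the solution of Ax = b closest to the origin, which here is the unique solution.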
Theorem 12.1 Let Ax̂ = b and let x0 be arbitrary. Let {xk } be generated
by Equation (12.7). Then the sequence {||x̂ − xk ||2 } is decreasing and {xk }
converges to the solution of Ax = b closest to x0 .
Open Question: For a fixed ordering, does the limit cycle depend on the
initial vector x0 ? If so, how?
x = 1
x = 2, (12.10)
which has the unique least-squares solution x = 1.5, and the system
2x = 2
x = 2, (12.11)
In our example above, the geometric least-squares solution for the first
system is found by using W11 = 1 = W22 , so is again x = 1.5, while the
geometric least-squares solution of the second system is found by using
W11 = 0.5 and W22 = 1, so that the geometric least-squares solution is
x = 1.5, not x = 1.2.
Algorithm 12.1 (MART) Let x0 be any positive vector, and i = k(mod I)+
1. Having found xk for positive integer k, define xk+1 by
xj^{k+1} = xj^k (bi /(Axk )i )^{Aij /mi} , (12.17)
Some treatments of MART leave out the mi , but require only that the
entries of A have been rescaled so that Aij ≤ 1 for all i and j. The mi is
important, however, in accelerating the convergence of MART.
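A minimal sketch of the MART iteration of Equation (12.17), taking mi = max_j Aij; the small positive system is illustrative.

```python
import numpy as np

def mart(A, b, x0, sweeps=500):
    """MART per Equation (12.17); A, b, and x0 are assumed positive."""
    x = x0.astype(float).copy()
    I = A.shape[0]
    m = A.max(axis=1)                   # m_i = max_j A_ij
    for k in range(sweeps * I):
        i = k % I                       # row-action: one equation per step
        x *= (b[i] / (A[i] @ x)) ** (A[i] / m[i])
    return x

A = np.array([[1.0, 1.0], [1.0, 2.0]])
b = np.array([3.0, 5.0])                # exact positive solution is (1, 2)
x = mart(A, b, x0=np.ones(2))
assert np.allclose(x, [1.0, 2.0], atol=1e-4)
```

Note that the multiplicative update keeps every entry of x positive, which is why MART is restricted to non-negative problems.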
12.4.3 Cross-Entropy
For a > 0 and b > 0, let the cross-entropy or Kullback-Leibler distance
from a to b be
KL(a, b) = a log(a/b) + b − a, (12.18)

with KL(a, 0) = +∞, and KL(0, b) = b. Extend to nonnegative vectors
coordinate-wise, so that

KL(x, z) = Σ_{j=1}^{J} KL(xj , zj ). (12.19)
If the starting vector x0 is the vector whose entries are all one, then the
MART converges to the solution that maximizes the Shannon entropy,
SE(x) = Σ_{j=1}^{J} (xj log xj − xj ). (12.20)
Exercise 13.1 Show that, for each k = 1, ..., K, Colk (C), the kth column
of the matrix C = AB, is Colk (C) = A Colk (B), a linear combination of
the columns of A.
104 CHAPTER 13. SOME LINEAR ALGEBRA (CHAPTER 15)
It follows from this exercise that, for given matrices A and C, every column
of C is a linear combination of the columns of A if and only if there is a
third matrix B such that C = AB.
The matrix A† is the conjugate transpose of the matrix A, that is, the
N by M matrix whose entries are (A† )nm = (Amn )*, the complex conjugates
of the entries of A.
x = c1 u1 + ... + cN uN . (13.4)
0 = α1 u1 + ... + αN uN . (13.5)
13.3 Dimension
We turn now to the task of showing that every basis for a finite dimensional
vector space has the same number of members. That number will then be
used to define the dimension of that subspace.
Suppose that S is a subspace of V , that {w1 , ..., wN } is a spanning set
for S, and {u1 , ..., uM } is a linearly independent subset of S. Beginning
with w1 , we augment the set {u1 , ..., uM } with wj if wj is not in the span of
the um and the wk previously included. At the end of this process, we have
a linearly independent spanning set, and therefore, a basis, for S (Why?).
Similarly, beginning with w1 , we remove wj from the set {w1 , ..., wN } if wj
is a linear combination of the wk , k = 1, ..., j − 1. In this way we obtain
a linearly independent set that spans S, hence another basis for S. The
following lemma will allow us to prove that all bases for a subspace S have
the same number of elements.
We note that the set B1 is a spanning set for S and has N members.
Having obtained the spanning set Bk , with N members and whose first k
members are v k , ..., v 1 , we form the set Ck+1 = Bk ∪ {v k+1 }, listing the
members so that the first k + 1 of them are {v k+1 , v k , ..., v 1 }. To get the set
Bk+1 we remove the first member of Ck+1 that is a linear combination of
the members to its left; there must be one, since Bk is a spanning set, and
so v k+1 is a linear combination of the members of Bk . Since the set H is
linearly independent, the member removed is from the set G. Continuing
in this fashion, we obtain a sequence of spanning sets B1 , ..., BN , each with
N members. The set BN is BN = {v 1 , ..., v N } and v N +1 must then be
a linear combination of the members of BN , which contradicts the linear
independence of H.
Corollary 13.1 Every basis for a subspace S has the same number of el-
ements.
Exercise 13.5 Let G = {w1 , ..., wN } be a spanning set for a subspace S
in RI , and H = {v 1 , ..., v M } a linearly independent subset of S. Let A be
the I by M matrix whose columns are the vectors v m and B the I by N
matrix whose columns are the wn . Prove that there is an N by M matrix
C such that A = BC. Prove Lemma 13.1 by showing that, if M > N , then
there is a non-zero vector x with Cx = 0.
Definition 13.6 The dimension of a subspace S is the number of elements
in any basis.
Lemma 13.2 For any matrix A, the maximum number of linearly inde-
pendent rows equals the maximum number of linearly independent columns.
Exercise 13.7 Suppose that V , W and Z are vector spaces, with bases
A, B and C, respectively. Suppose also that T is a linear transformation
from V to W and U is a linear transformation from W to Z. Let A
represent T with respect to the bases A and B, and let B represent U with
respect to the bases B and C. Show that the matrix BA represents the linear
transformation U T with respect to the bases A and C.
Exercise 13.9 Show that the subspace JV has the same dimension as V ∗∗
itself, so that it must be all of V ∗∗ .
We shall see later that once V has been endowed with an inner product,
there is a simple way to describe every linear functional on V : for each f
in V ∗ there is a unique vector vf in V with f (v) = ⟨v, vf ⟩, for each v in V .
As a result, we have an identification of V ∗ with V itself.
Exercise 13.10 Suppose that B is a second basis for V . Show that there is
a unique N by N matrix Q having the property that the matrix B = QAQ−1
represents T , with respect to the basis B; that is, we can write
[T ]B = Q[T ]A Q−1 .
[v]B = Q[v]A ,
for all v.
13.7 Diagonalization
Let T : V → V be a linear operator, A a basis for V , and A = [T ]A . As we
change the basis, the matrix representing T also changes. We wonder if it
is possible to find some basis B such that B = [T ]B is a diagonal matrix L.
Let P = [I]A B be the change-of-basis matrix from B to A. We would then
have P −1 AP = L, or A = P LP −1 . When this happens, we say that A has
been diagonalized by P .
Suppose that the basis B = {b1 , ..., bN } is such that B = [T ]B = L,
where L is the diagonal matrix L = diag {λ1 , ..., λN }. Then we have AP =
P L, which tells us that pn , the n-th column of P , is an eigenvector of the
matrix A, with λn as its eigenvalue. Since pn = [bn ]A , we have
(T − λn I)bn = 0,
or
T bn = λn bn ;
therefore, bn is an eigenvector of the linear operator T .
Exercise 13.11 Use the fact that det(GH)=det(G)det(H) for any square
matrices G and H to show that
x′(t) = ax(t)
and
w2′(t) = 3w2 (t).
The solutions are then
w1 (t) = w1 (0)e2t ,
and
w2 (t) = w2 (0)e3t .
It follows from z(t) = Bw(t) that
and
y(t) = 2w1 (0)e2t + w2 (0)e3t .
We want to express x(t) and y(t) in terms of x(0) and y(0). To do this we
use z(0) = Bw(0), which tells us that
and
y(t) = (−2x(0) + 2y(0))e2t + (2x(0) − y(0))e3t .
We can rewrite this as
z(t) = E(t)z(0),
where
E(t) = [ −e^{2t} + 2e^{3t}     e^{2t} − e^{3t}  ]
       [ −2e^{2t} + 2e^{3t}   2e^{2t} − e^{3t}  ] .
What is the matrix E(t)?
To mimic the solution x(t) = x(0)eat of the problem x0 (t) = ax(t), we
try
z(t) = etA z(0),
with the matrix exponential defined by
e^{tA} = Σ_{n=0}^{∞} (1/n!) t^n A^n .
e^{tA} = B e^{tD} B^{−1} .
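For the example above, the coefficient matrix A of the system z′(t) = Az(t) can be inferred from E(t), since A = E′(0); its eigenvalues are 2 and 3. The sketch below checks that B e^{tD} B^{−1}, computed from NumPy's eigendecomposition, reproduces E(t).

```python
import numpy as np

# Coefficient matrix inferred from E(t) as A = E'(0); eigenvalues 2 and 3.
A = np.array([[4.0, -1.0], [2.0, 1.0]])

def expm_diag(A, t):
    """e^{tA} = B e^{tD} B^{-1}, valid when A is diagonalizable."""
    lam, B = np.linalg.eig(A)                  # columns of B are eigenvectors
    return (B * np.exp(t * lam)) @ np.linalg.inv(B)

t = 0.3
E = np.array([[-np.exp(2*t) + 2*np.exp(3*t),      np.exp(2*t) - np.exp(3*t)],
              [-2*np.exp(2*t) + 2*np.exp(3*t), 2*np.exp(2*t) - np.exp(3*t)]])
assert np.allclose(expm_diag(A, t), E)         # matches E(t) above
```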
where y† is the conjugate transpose of the vector y, that is, y† is the row
vector whose entries are the complex conjugates (yn )*.
The association of the elements v in V with the complex column vector
[v]A can be used to obtain an inner product on V . For any v and w in V ,
define

⟨v, w⟩ = ⟨[v]A , [w]A ⟩,

where the right side is the ordinary complex dot product in C^N . Once we
have an inner product on V we can define the norm of a vector in V as
‖v‖ = √⟨v, v⟩.
Note that, with respect to this inner product, the basis A becomes an
orthonormal basis.
We assume, throughout the remainder of this section, that V is an
inner-product space. For more detail concerning inner products, see the
chapter Appendix: Inner Products and Orthogonality.
in V , we have

f (v) = Af [v]A = Σ_{n=1}^{N} f (an )αn .
f (v) = ⟨v, yf ⟩,
So we see that once V has been given an inner product, each linear func-
tional f on V can be thought of as corresponding to a vector yf in V , so
that
f (v) = ⟨v, yf ⟩.
Exercise 13.12 Show that the vector yf associated with the linear func-
tional f is unique by showing that
13.13 Orthogonality
Two vectors v and w in the inner-product space V are said to be orthogonal
if ⟨v, w⟩ = 0. A basis U = {u1 , u2 , ..., uN } is called an orthogonal basis if
every two vectors in U are orthogonal, and orthonormal if, in addition,
‖un ‖ = 1, for each n.
Exercise 13.13 Let U and V be orthonormal bases for the inner-product
space V , and let Q be the change-of-basis matrix satisfying
[v]U = Q[v]V .
Show that Q−1 = Q† , so that Q is a unitary matrix.
Exercise 13.14 Let U be an orthonormal basis for the inner-product space
V and T a linear operator on V . Show that
[T ∗ ]U = ([T ]U )† . (13.12)
Exercise 13.16 Compute the eigenvalues for the real square matrix
A = [  1  2 ]
    [ −2  1 ] .  (13.13)
Note that the eigenvalues are complex, even though the entries of A are
real. The matrix A is not Hermitian.
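A quick numerical check of these eigenvalues:

```python
import numpy as np

# Eigenvalues of the real matrix in Equation (13.13) are 1 + 2i and 1 - 2i.
A = np.array([[1.0, 2.0], [-2.0, 1.0]])
vals = np.linalg.eigvals(A)
assert np.allclose(sorted(vals, key=lambda z: z.imag), [1 - 2j, 1 + 2j])
```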
are both equal to one, and that the only eigenvectors are non-zero multiples
of the vector (1, 0)^T. Compute C^T C and CC^T. Are they equal?
Q[v]U = [v]A .
Combining the various results obtained so far, we can conclude the follow-
ing.
• (a) if T is self-adjoint, so is TW ;
• (b) W ⊥ is T ∗ -invariant;
Proving the existence of the orthonormal basis uses essentially the same
argument as the induction proof given earlier. The eigenvalues of a self-
adjoint linear operator T on a finite-dimensional complex inner-product
space are real numbers. If T is a linear operator on a finite-dimensional real
inner-product space V and V has an orthonormal basis U = {u1 , ..., uN }
consisting of eigenvectors of T , then we have
T un = λn un = (λn )* un = T ∗ un ,
so, since T = T ∗ on each member of the basis, these operators are the same
everywhere, so T = T ∗ and T is self-adjoint.
We close with an example of a real 2 by 2 matrix A with A^T A = AA^T,
but with no eigenvectors in R². Take 0 < θ < π and A to be the matrix
A = [ cos θ   − sin θ ]
    [ sin θ     cos θ ] .  (13.16)
Chapter 14

Vectors (Chapter 5,6)
The dot product x·y of two vectors x = (x1 , x2 , ..., xN ) and y = (y1 , y2 , ..., yN )
in RN is defined by
x · y = x1 y1 + x2 y2 + ... + xN yN . (14.3)
For the cases of R² and R³ we can give geometric meaning to the dot
product; the length of x is √(x · x) and
x · y = |x| |y| cos(θ),
where θ is the angle between x and y when they are viewed as directed line
segments positioned to have a common beginning point. We see from this
that two vectors are perpendicular (or orthogonal) when their dot product
is zero.
For R³ we also have the cross product x × y, defined by

x × y = (x2 y3 − x3 y2 , x3 y1 − x1 y3 , x1 y2 − x2 y1 ). (14.4)
When x and y are viewed as directed line segments with a common be-
ginning point, the cross product is viewed as a third directed line segment
with the same beginning point, perpendicular to both x and y, and having
for its length the area of the parallelogram formed by x and y. Therefore,
if x and y are parallel, there is zero area and the cross product is the zero
vector. Note that
y × x = −x × y. (14.5)
From the relationships
x · (y × z) = y · (z × x) = z · (x × y) (14.6)
we see that
x · (x × y) = y · (x × x) = 0. (14.7)
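These identities, and the parallelogram-area interpretation of the cross product, are easy to verify numerically for random vectors in R³:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 3))

# Antisymmetry (14.5) and the triple-product identities (14.6), (14.7).
assert np.allclose(np.cross(y, x), -np.cross(x, y))
assert np.isclose(x @ np.cross(y, z), y @ np.cross(z, x))
assert np.isclose(x @ np.cross(y, z), z @ np.cross(x, y))
assert np.isclose(x @ np.cross(x, y), 0.0)

# |x × y| equals the area |x||y| sin(theta) of the parallelogram.
cos_t = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
area = np.linalg.norm(x) * np.linalg.norm(y) * np.sqrt(1 - cos_t**2)
assert np.isclose(np.linalg.norm(np.cross(x, y)), area)
```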
The dot product and cross product are relatively new additions to the
mathematical tool box. They grew out of the 19th century study of quater-
nions.
14.4. COMPLEX NUMBERS 123
where i is the shorthand for the complex number i = (0, 1). With w =
(u, v) = u + vi a second complex number, the product zw is
14.5 Quaternions
It seemed logical that a three-dimensional version of complex analysis would
involve objects of the form
a + bi + cj,
where a, b, and c are real numbers, (1, 0, 0) = 1, i = (0, 1, 0) and j =
(0, 0, 1), and i² = j² = −1 now. Multiplying a + bi + cj by d + ei + f j led
to the question: what are ij and ji? The Irish mathematician Hamilton
eventually hit on the answer, but it forced the search to move from three-
dimensional space to four-dimensional space.
Hamilton discovered that it was necessary to consider objects of the
form a+bi+cj +dk, where 1 = (1, 0, 0, 0), i = (0, 1, 0, 0), j = (0, 0, 1, 0), and
k = (0, 0, 0, 1), and ij = k = −ji. With the other rules i² = j² = k² = −1,
jk = i = −kj, and ki = j = −ik, we get what are called the quaternions.
For a while in the latter half of the 19th century it was thought that
quaternions would be the main tool for studying EM theory, but that was
not what happened.
This tells us that quaternion multiplication employs all four of the notions
of multiplication that we have encountered previously: ordinary scalar mul-
tiplication, multiplication of a vector by a scalar, the dot product, and the
cross product. It didn’t take people long to realize that it isn’t necessary to
use quaternion multiplication all the time; just use the dot product when
you need it, and the cross product when you need it. Quaternions were
demoted to exercises in abstract algebra texts, while the notions of dot
product and cross product became essential tools in vector calculus and
EM theory.
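The decomposition of the quaternion product into its scalar, dot-product, and cross-product parts can be written out directly. In the sketch below, quaternions are represented as length-4 arrays (a, b, c, d) for a + bi + cj + dk, and the function name is ours.

```python
import numpy as np

def qmul(q, r):
    """Product of quaternions q = (a, u) and r = (b, v), with scalar parts
    a, b and vector parts u, v; it combines scalar multiplication,
    scalar-vector products, the dot product, and the cross product."""
    a, u = q[0], q[1:]
    b, v = r[0], r[1:]
    scalar = a * b - u @ v                    # dot product enters here
    vector = a * v + b * u + np.cross(u, v)   # cross product enters here
    return np.concatenate(([scalar], vector))

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])
assert np.allclose(qmul(i, j), k)               # ij = k
assert np.allclose(qmul(j, i), -k)              # ji = -k
assert np.allclose(qmul(i, i), [-1, 0, 0, 0])   # i^2 = -1
```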
Chapter 15
A Brief History of
Electromagnetism
(Chapter 5,6)
not debates within science itself; the decades-long attacks on science by
the cigarette industry and efforts to weaken the EPA show clearly that it is
not only some religious groups that want the political influence of science
diminished.
Many of the issues our society will have to deal with in the near future,
including nuclear power, terrorism, genetic engineering, energy, climate
change, control of technology, space travel, and so on, involve science and
demand a more sophisticated understanding of science on the part of the
general public. The recent book Physics for Future Presidents: the Science
Behind the Headlines [36] discusses many of these topics, supposedly as an
attempt by the author to educate presidents-to-be, who will be called on
to make decisions, to initiate legislation, and to guide the public debate
concerning these issues.
History reminds us that progress need not be permanent. The tech-
nological expertise and artistic heights achieved by the Romans, even the
mathematical sophistication of Archimedes, were essentially lost, at least
in the west, for fifteen hundred years.
History also teaches us how unpredictable the future can be, which is, in
fact, the underlying theme of this essay. No one in 1800 could have imagined
the electrification that transformed society over the nineteenth century, just
as no one in 1900 could have imagined Hiroshima and Nagasaki, only a few
decades away, let alone the world of today.
their prey. When these animals were dissected, it was noticed that there
were unusual structures within their bodies that other fish did not have.
Later, it became clear that these structures were essentially batteries.
earth and the sun were much older than previously thought, which was not
possible, according to the physics of the day; unless a new form of energy
was operating, the sun would have burned out a long time ago.
Newton thought that light was a stream of particles. Others at the
time, notably Robert Hooke and Christiaan Huygens, felt that light was a
wave phenomenon. Both sides were hindered by a lack of a proper scien-
tific vocabulary to express their views. Around 1800 Young demonstrated
that a beam of light displayed interference effects similar to water waves.
Eventually, his work convinced people that Newton had been wrong on
this point and most accepted that light is a wave phenomenon. Faraday,
Maxwell, Hertz and others further developed the wave theory of light and
related light to other forms of electromagnetic radiation.
In 1887 Hertz discovered the photo-electric effect, later offered by Ein-
stein as confirming evidence that light has a particle nature. When light
strikes a metal, it can cause the metal to release an electrically charged par-
ticle, an electron. If light were simply a wave, there would not be enough
energy in the small part of the wave that hits the metal to displace the electron; in 1905 Einstein would argue that light is quantized, that is, it consists
of individual bundles or particles, later called photons, each with enough
energy to cause the electron to be released.
It was recognized that there were other problems with the wave theory
of light. All known waves required a medium in which to propagate. Sound
cannot propagate in a vacuum; it needs air or water or something. The
sound waves are actually compressions and rarefactions of the medium,
and how fast the waves propagate depends on how fast the material in the
medium can perform these movements; sound travels faster in water than
in air, for example.
Light travels extremely fast, but does not propagate instantaneously,
as Olaus Roemer first demonstrated in 1676. He observed that the
eclipses of the moons of Jupiter appeared to happen sooner when Jupiter
was moving closer to Earth, and later when it was moving away. He rea-
soned, correctly, that the light takes a finite amount of time to travel from
the moons to Earth, and when Jupiter is moving away the distance is
growing longer.
If light travels through a medium, which scientists called the ether, then
the ether must be a very strange substance indeed. The material that makes
up the ether must be able to compress and expand very quickly. Light
comes to us from great distances so the ether must extend throughout all of
space. The earth moves around the sun, and therefore through this ether, at
a great speed, and yet there are no friction effects, while very much slower
winds produce a great deal of weathering. Light can also be polarized,
so the medium must be capable of supporting transverse waves, not just
longitudinal waves, as in acoustics. To top it all off, the Michelson-Morley
experiment, performed in Cleveland in 1887, failed to detect the presence
of the ether. The notion that there is a physical medium that supports
the propagation of light would not go away, however. Late in his long life
Lord Kelvin (William Thomson) wrote “One word characterizes the most
strenuous efforts ... that I have made perseveringly during fifty-five years:
that word is FAILURE.” Thomson refused to give up his efforts to combine
the mathematics of electromagnetism with the mechanical picture of the
world.
philosophy began to sneak back in, as questions about causality and the
existence of objects we cannot see, such as atoms, started to be asked [1].
Most scientists are probably realists, believing that the objects they study
have an existence independent of the instruments used to probe them. On
the other side of the debate, positivists, or, at least, the more extreme
positivists, hold that we have no way of observing an observer-independent
reality, and therefore cannot verify that there is such a reality. Positivists
hold that scientific theories are simply instruments used to hold together
observed facts and make predictions. They do accept that the theories
describe an empirical reality that is the same for all observers, but not a
reality independent of observation. At first, scientists felt that it was safe
for them to carry on without worrying too much about these philosophical
points, but quantum theory would change things [26].
The idea that matter is composed of very small indivisible atoms goes
back to the ancient Greek thinkers Democritus and Epicurus. The phi-
losophy of Epicurus was popularized during Roman times by Lucretius,
in his lengthy poem De Rerum Natura (“On the Nature of Things”), but
this work was lost to history for almost a thousand years. The discovery,
in 1417, of a medieval copy of the poem changed the course of history,
according to the author Stephen Greenblatt [22]. Copies of the poem be-
came widely distributed throughout Europe and eventually influenced the
thinking of Galileo, Freud, Darwin, Einstein, Thomas Jefferson, and many
others. But it wasn’t until after Einstein’s 1905 paper on Brownian mo-
tion and subsequent experimental confirmations of his predictions that the
actual existence of atoms was more or less universally accepted.
I recall reading somewhere about a conversation between a philosopher
of science and an experimental physicist, in which the physicist was ex-
plaining how he sprayed an object with positrons. The philosopher then
asked him if he really believed that positrons exist. The physicist answered,
“If you can spray them, they exist.”
tivity, anyway? The new century was dawning, and all these questions
were in the air. It was about 1900, Planck had just discovered the quan-
tum theory, Einstein was in the patent office, where he would remain until
1909, Bohr and Schrödinger were schoolboys, and Heisenberg was not yet born. A new
scientific revolution was about to occur, and, as in 1800, nobody could have
guessed what was coming next [35].
If Mozart had never lived, nobody else would have composed his music.
If Picasso had never lived, nobody else would have painted his pictures.
If Winston Churchill had never lived, or had he died of his injuries when,
in 1931, he was hit by a car on Fifth Avenue in New York City, western
Europe would probably be different today. If Hitler had died in 1930, when
the car he was riding in was hit by a truck, recent history would certainly
be different, in ways hard for us to imagine. But, I think the jury is still
out on this debate, at least as it applies to science.
I recently came across the following, which I think makes this point
well. Suppose that you were forced to decide which one of these four things
to “consign to oblivion,” that is, to make it never to have happened: Mozart’s
opera Don Giovanni, Chaucer’s Canterbury Tales, Newton’s Principia, or
Eiffel’s tower. Which one would you choose? The answer has to be New-
ton’s Principia; it is the only one of the four that is not irreplaceable.
If Newton had never lived, we would still have Leibniz’s calculus. New-
ton’s Law of Universal Gravitation would have been discovered by someone
else. If Faraday had never lived, we would still have Henry’s discovery of
electromagnetic induction. If Darwin had never lived, someone else would
have published roughly the same ideas, at about the same time; in fact,
Alfred Russel Wallace did just that. If Einstein had not lived, somebody
else, maybe Poincaré, would have hit on roughly the same ideas, perhaps a
bit later. Relativity would have been discovered by someone else. The fact
that light behaves both like a wave and like a particle would have become
apparent to someone else. The fact that atoms do really exist would have
been demonstrated by someone else, although perhaps in a different way.
Chapter 16
Changing Variables in
Multiple Integrals
(Chapter 5,6)
Proof: We prove this mean-value theorem using the previous one. Any point $x$ on the line segment joining $a$ with $b$ has the form $x = a + t(b - a)$ for some $t \in [0, 1]$; applying the previous theorem along this segment gives a point $c = a + \tau(b - a)$.
When $b - a = da$, writing
\[ \frac{\partial \mathbf{r}}{\partial a_j}(a) = \Big(\frac{\partial r_1}{\partial a_j}(a), \ldots, \frac{\partial r_J}{\partial a_j}(a)\Big), \tag{16.12} \]
we have
\[ d\mathbf{r}(a) = \sum_{j=1}^{J} \frac{\partial \mathbf{r}}{\partial a_j}(a)\, da_j. \tag{16.13} \]
\[ \frac{\partial \mathbf{r}}{\partial u}(a) = \Big(\frac{\partial x}{\partial u}(a), \frac{\partial y}{\partial u}(a), \frac{\partial z}{\partial u}(a)\Big), \tag{16.14} \]
\[ \frac{\partial \mathbf{r}}{\partial v}(a) = \Big(\frac{\partial x}{\partial v}(a), \frac{\partial y}{\partial v}(a), \frac{\partial z}{\partial v}(a)\Big), \tag{16.15} \]
and
\[ \frac{\partial \mathbf{r}}{\partial w}(a) = \Big(\frac{\partial x}{\partial w}(a), \frac{\partial y}{\partial w}(a), \frac{\partial z}{\partial w}(a)\Big). \tag{16.16} \]
The vector differential $d\mathbf{r}$ is then
\[ d\mathbf{r}(a) = \frac{\partial \mathbf{r}}{\partial u}(a)\,du + \frac{\partial \mathbf{r}}{\partial v}(a)\,dv + \frac{\partial \mathbf{r}}{\partial w}(a)\,dw, \tag{16.17} \]
which we obtain by applying the mean value theorem of the previous section, viewing each of the functions x(u, v, w), y(u, v, w), and z(u, v, w) as
one of the rj . We view dr as the diagonal of an infinitesimal parallelepiped
with one corner at the point (x, y, z). We want to compute the volume of
this parallelepiped.
The vectors $\mathbf{A} = \frac{\partial \mathbf{r}}{\partial u}\,du$, $\mathbf{B} = \frac{\partial \mathbf{r}}{\partial v}\,dv$ and $\mathbf{C} = \frac{\partial \mathbf{r}}{\partial w}\,dw$ are then three vectors forming the sides of the parallelepiped. The volume of the parallelepiped is then the absolute value of the vector triple product $\mathbf{A} \cdot (\mathbf{B} \times \mathbf{C})$.
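As a numerical illustration of this volume computation (our own sketch, not from the text), the code below takes the spherical-coordinate map as an example of $\mathbf{r}(u, v, w)$, estimates the three partial-derivative vectors by central differences, and compares the volume factor $|\mathbf{A} \cdot (\mathbf{B} \times \mathbf{C})|$ (per unit $du\,dv\,dw$) with the known Jacobian $\rho^2 \sin\varphi$.

```python
# Sketch: volume factor |A . (B x C)| for the spherical-coordinate map
# (u, v, w) = (rho, theta, phi) -> (x, y, z), with partial derivatives
# estimated by central finite differences.  The map and the sample point
# are illustrative choices; the analytic Jacobian is rho^2 * sin(phi).
import math

def r(rho, theta, phi):
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))

def partial(f, args, i, h=1e-6):
    a = list(args); a[i] += h
    b = list(args); b[i] -= h
    return tuple((x - y) / (2 * h) for x, y in zip(f(*a), f(*b)))

def triple(A, B, C):
    # A . (B x C)
    BxC = (B[1]*C[2] - B[2]*C[1],
           B[2]*C[0] - B[0]*C[2],
           B[0]*C[1] - B[1]*C[0])
    return sum(a * c for a, c in zip(A, BxC))

p = (2.0, 0.7, 1.1)          # a sample point (rho, theta, phi)
A = partial(r, p, 0)         # dr/drho
B = partial(r, p, 1)         # dr/dtheta
C = partial(r, p, 2)         # dr/dphi
vol_factor = abs(triple(A, B, C))
print(vol_factor, p[0]**2 * math.sin(p[2]))  # the two should agree
```

The same pattern works for any change of variables: the triple product of the three partial-derivative vectors is the Jacobian determinant of the map.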
• 1. there are positive and negative electrical charges, and like charges
repel, unlike charges attract;
• 2. the force is a central force, that is, the force that one charge exerts
on another is directed along the ray between them and, by Coulomb’s
Law, its strength falls off as the square of the distance between them;
• 3. super-position holds, which means that the force that results from
multiple charges is the vector sum of the forces exerted by each one
separately.
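The three principles can be put into a few lines of code (our own sketch; the units and charge values are arbitrary): an inverse-square central force, summed over several sources by super-position.

```python
# Sketch: Coulomb-style central force with inverse-square falloff,
# summed over a list of source charges (super-position).
# Units and charge values are arbitrary illustrative choices.
import math

def force_on(q_test, pos, sources):
    """Net force on q_test at pos from a list of (q, position) sources."""
    total = [0.0, 0.0, 0.0]
    for q, s in sources:
        d = [p - si for p, si in zip(pos, s)]   # ray from source to test charge
        r = math.sqrt(sum(c * c for c in d))
        mag = q_test * q / r**2                 # inverse-square strength
        for k in range(3):
            total[k] += mag * d[k] / r          # central: directed along the ray
    return total

# Two equal positive charges straddling the origin: by super-position,
# the forces on a positive test charge midway between them cancel.
F = force_on(1.0, (0.0, 0.0, 0.0),
             [(1.0, (1.0, 0.0, 0.0)), (1.0, (-1.0, 0.0, 0.0))])
print(F)  # [0.0, 0.0, 0.0]
```

With like charges the signed factor `q_test * q` is positive and the force points away from the source (repulsion); with unlike charges it is negative (attraction).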
Apart from the first principle, this is a good description of gravity and
magnetism as well. According to Newton, every massive body exerts a
Chapter 17
Div, Grad, Curl
(Chapter 5,6)
where
\[ \mathbf{u}(x, y, z) = \Big(\frac{x}{\sqrt{x^2 + y^2 + z^2}}, \frac{y}{\sqrt{x^2 + y^2 + z^2}}, \frac{z}{\sqrt{x^2 + y^2 + z^2}}\Big) \]
is the unit vector pointing from (0, 0, 0) to (x, y, z). The electric field can
be written in terms of its component functions, that is,
where
\[ E_1(x, y, z) = \frac{qx}{(x^2 + y^2 + z^2)^{3/2}}, \]
\[ E_2(x, y, z) = \frac{qy}{(x^2 + y^2 + z^2)^{3/2}}, \]
and
\[ E_3(x, y, z) = \frac{qz}{(x^2 + y^2 + z^2)^{3/2}}. \]
It is helpful to note that these component functions are the three first
partial derivatives of the function
\[ \phi(x, y, z) = \frac{-q}{\sqrt{x^2 + y^2 + z^2}}. \tag{17.2} \]
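This claim is easy to verify numerically: central-difference approximations to the partial derivatives of $\phi$ match $E_1$, $E_2$, $E_3$. The charge value $q$ and the test point below are arbitrary illustrative choices.

```python
# Sketch: check numerically that the components E1, E2, E3 are the first
# partial derivatives of phi(x, y, z) = -q / sqrt(x^2 + y^2 + z^2).
# q and the test point are arbitrary choices.
import math

q = 1.5

def phi(x, y, z):
    return -q / math.sqrt(x*x + y*y + z*z)

def E(x, y, z):
    r3 = (x*x + y*y + z*z) ** 1.5
    return (q*x/r3, q*y/r3, q*z/r3)

def grad(f, x, y, z, h=1e-6):
    return ((f(x+h, y, z) - f(x-h, y, z)) / (2*h),
            (f(x, y+h, z) - f(x, y-h, z)) / (2*h),
            (f(x, y, z+h) - f(x, y, z-h)) / (2*h))

pt = (1.0, -2.0, 0.5)
g = grad(phi, *pt)
e = E(*pt)
print(all(abs(a - b) < 1e-6 for a, b in zip(g, e)))  # True
```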
Gauss’s Law:
\[ \int\!\!\int_S \mathbf{E} \cdot \mathbf{n}\, dS = 4\pi \int\!\!\int\!\!\int_V \rho\, dV. \tag{17.3} \]
The integral on the left side is the integral over the surface S, while the
integral on the right side is the triple integral over the volume V enclosed
by the surface S. We must remember to think of integrals as summing, so
on the left we are summing something over the surface, while on the right
we are summing something else over the enclosed volume.
div E = ∇ · E,
This is also the first of the four Maxwell’s Equations. When we substitute
div E(x, y, z) for 4πρ(x, y, z) in Equation (17.3) we get
\[ \int\!\!\int_S \mathbf{E} \cdot \mathbf{n}\, dS = \int\!\!\int\!\!\int_V \operatorname{div} \mathbf{E}(x, y, z)\, dV, \tag{17.6} \]
Functions that satisfy Equation (17.8) are called harmonic functions. The
reader may know that both the real and imaginary parts of a complex-
valued analytic function are harmonic functions of two variables. This
connection between electrostatics and complex analysis motivated the (ul-
timately fruitless) search for a three-dimensional extension of complex anal-
ysis.
17.7.1 An Example
The curve r(t) in three-dimensional space given by
r(t) = (x(t), y(t), z(t)) = r(cos θ(t), sin θ(t), 0)
can be viewed as describing the motion of a point moving in time, revolving
counter-clockwise around the z-axis. The velocity vector at each point is
\[ \mathbf{v}(t) = \mathbf{r}'(t) = (x'(t), y'(t), z'(t)) = \frac{d\theta}{dt}(-r\sin\theta(t), r\cos\theta(t), 0). \]
Suppressing the dependence on t, we can write the velocity vector field as
\[ \mathbf{v}(x, y, z) = \frac{d\theta}{dt}(-y, x, 0). \]
Then
\[ \operatorname{curl} \mathbf{v}(x, y, z) = (0, 0, 2\omega), \]
where $\omega = \frac{d\theta}{dt}$ is the angular velocity. The divergence of the velocity field is
\[ \operatorname{div} \mathbf{v}(x, y, z) = 0. \]
The motion here is rotational; there is no outward flow of anything. Here
the curl describes how fast the rotation is, and indicates the axis of rotation;
the fact that there is no outward flow is indicated by the divergence being
zero.
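This example can be checked numerically (our own sketch, with arbitrary $\omega$ and test point): estimate the Jacobian of $\mathbf{v} = \omega(-y, x, 0)$ by central differences and read off the curl and divergence.

```python
# Sketch: numerical curl and divergence of the rotational field
# v(x, y, z) = omega * (-y, x, 0), checking curl v = (0, 0, 2*omega)
# and div v = 0.  omega and the test point are arbitrary choices.
omega = 0.8

def v(x, y, z):
    return (-omega * y, omega * x, 0.0)

def jac(f, x, y, z, h=1e-6):
    cols = []
    for d in ((h, 0, 0), (0, h, 0), (0, 0, h)):
        fp = f(x + d[0], y + d[1], z + d[2])
        fm = f(x - d[0], y - d[1], z - d[2])
        cols.append(tuple((a - b) / (2 * h) for a, b in zip(fp, fm)))
    return cols  # cols[i][j] = d v_j / d x_i

def curl_div(f, x, y, z):
    J = jac(f, x, y, z)
    curl = (J[1][2] - J[2][1], J[2][0] - J[0][2], J[0][1] - J[1][0])
    div = J[0][0] + J[1][1] + J[2][2]
    return curl, div

c, d = curl_div(v, 0.3, -1.2, 0.7)
print(c, d)  # approximately (0, 0, 2*omega) and 0
```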
\[ \frac{\partial^2 \phi}{\partial x \partial y} = \frac{\partial^2 \phi}{\partial y \partial x}, \qquad \frac{\partial^2 \phi}{\partial x \partial z} = \frac{\partial^2 \phi}{\partial z \partial x}, \]
and
\[ \frac{\partial^2 \phi}{\partial z \partial y} = \frac{\partial^2 \phi}{\partial y \partial z}. \]
It follows, therefore, that, because the electrostatic field has a potential,
its curl is zero. The third of Maxwell’s Equations (for electrostatics) is
field J proportional to the rate of change of the electric field, and Item 4
above is replaced by Ampere’s Law:
\[ \operatorname{curl} \mathbf{B} = a \frac{\partial \mathbf{E}}{\partial t}, \]
where a is some constant. Therefore, the curl of the magnetic field is
proportional to the rate of change of the electric field with respect to time.
Faraday (and also Henry) discovered that moving a magnet inside a
wire coil creates a current in the wire. When the magnetic field is changing
with respect to time, the electric field has a non-zero curl proportional to
the rate at which the magnetic field is changing. Then Item 2 above is
replaced by
\[ \operatorname{curl} \mathbf{E} = b \frac{\partial \mathbf{B}}{\partial t}, \]
where b is some constant. Therefore, the curl of the electric field is pro-
portional to the rate of change of the magnetic field. It is this mutual de-
pendence that causes electromagnetic waves: as the electric field changes,
it creates a changing magnetic field, which, in turn, creates a changing
electric field, and so on.
• 1. $\operatorname{div} \mathbf{E} = 0$;
• 2. $\operatorname{curl} \mathbf{E} = -b\,\frac{\partial \mathbf{B}}{\partial t}$;
• 3. $\operatorname{div} \mathbf{B} = 0$;
• 4. $\operatorname{curl} \mathbf{B} = a\,\frac{\partial \mathbf{E}}{\partial t}$.
We then have
\[ \nabla \times (\nabla \times \mathbf{E}) = -b\Big(\nabla \times \frac{\partial \mathbf{B}}{\partial t}\Big) = -b\frac{\partial}{\partial t}(\nabla \times \mathbf{B}) = -ab\frac{\partial}{\partial t}\Big(\frac{\partial \mathbf{E}}{\partial t}\Big) = -ab\frac{\partial^2 \mathbf{E}}{\partial t^2}. \]
Using Equation (17.10), we can also write
\[ \nabla \times (\nabla \times \mathbf{E}) = \nabla(\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E} = -\nabla^2 \mathbf{E}, \]
since $\operatorname{div} \mathbf{E} = 0$. Therefore, we have
\[ \nabla^2 \mathbf{E} = ab \frac{\partial^2 \mathbf{E}}{\partial t^2}, \]
\[ \frac{\partial^2 E_i}{\partial t^2} = c^2 \nabla^2 E_i. \]
The same is true for the component functions of the magnetic field. Here
the constant c is the speed of propagation of the wave, which turns out to
be the speed of light. It was this discovery that suggested to Maxwell that
light is an electromagnetic phenomenon.
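In one spatial dimension the same equation reads $E_{tt} = c^2 E_{xx}$, and any traveling wave $f(x - ct)$ satisfies it. The sketch below (with an arbitrary illustrative wave shape and speed) confirms this using second central differences.

```python
# Sketch: check numerically that a traveling wave E(x, t) = sin(x - c*t)
# satisfies the one-dimensional wave equation E_tt = c^2 E_xx.
# The wave shape, speed c, and sample point are arbitrary choices.
import math

c = 3.0

def E(x, t):
    return math.sin(x - c * t)

def second(f, u, h=1e-4):
    # central second difference
    return (f(u + h) - 2 * f(u) + f(u - h)) / (h * h)

x0, t0 = 0.4, 1.3
E_tt = second(lambda t: E(x0, t), t0)
E_xx = second(lambda x: E(x, t0), x0)
print(abs(E_tt - c * c * E_xx) < 1e-4)  # True
```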
Chapter 18
Kepler’s Laws of
Planetary Motion
(Chapter 5,6)
18.1 Introduction
Kepler worked from 1601 to 1612 in Prague as the Imperial Mathematician.
Taking over from Tycho Brahe, and using the tremendous amount of data
gathered by Brahe from naked-eye astronomical observation, he formulated
three laws governing planetary motion. Fortunately, among his tasks was
the study of the planet Mars, whose orbit is quite unlike a circle, at least
relatively speaking. This forced Kepler to consider other possibilities and
ultimately led to his discovery of elliptic orbits. These laws, which were
the first “natural laws” in the modern sense, served to divorce astronomy
from theology and philosophy and marry it to physics. At last, the planets
were viewed as material bodies, not unlike earth, floating freely in space
and moved by physical forces acting on them. Although the theology and
philosophy of the time dictated uniform planetary motion and circular or-
bits, nature was now free to ignore these demands; motion of the planets
could be non-uniform and the orbits other than circular.
Although the second law preceded the first, Kepler’s Laws are usually
enumerated as follows:
• 1. the planets travel around the sun not in circles but in elliptical
orbits, with the sun at one focal point;
• 2. a planet’s speed is not uniform, but is such that the line segment
from the sun to the planet sweeps out equal areas in equal time
intervals; and, finally,
• 3. for all the planets, the time required for the planet to complete
one orbit around the sun, divided by the 3/2 power of its average
distance from the sun, is the same constant.
These laws, particularly the third one, provided strong evidence for New-
ton’s law of universal gravitation. How Kepler discovered these laws with-
out the aid of analytic geometry and differential calculus, with no notion of
momentum, and only a vague conception of gravity, is a fascinating story,
perhaps best told by Koestler in [31].
Around 1684, Newton was asked by Edmund Halley, of Halley’s comet
fame, what the path would be for a planet moving around the sun, if the
force of gravity fell off as the square of the distance from the sun. Newton
responded that it would be an ellipse. Kepler had already declared that
planets moved along elliptical orbits with the sun at one focal point, but his
findings were based on observation and imagination, not deduction from
physical principles. Halley asked Newton to provide a proof. To supply
such a proof, Newton needed to write a whole book, the Principia, pub-
lished in 1687, in which he had to deal with such mathematically difficult
questions as what the gravitational force is on a point when the attracting
body is not just another point, but a sphere, like the sun.
With the help of vector calculus, a later invention, Kepler’s laws can be
derived as consequences of Newton’s inverse square law for gravitational
attraction.
18.2 Preliminaries
We consider a body with constant mass m moving through three-dimensional
space along a curve
r(t) = (x(t), y(t), z(t)),
where t is time and the sun is the origin. The velocity vector at time t is
then
\[ \mathbf{v}(t) = \mathbf{r}'(t) = (x'(t), y'(t), z'(t)), \]
and the acceleration vector at time t is $\mathbf{a}(t) = \mathbf{v}'(t) = \mathbf{r}''(t)$. The linear momentum vector is
\[ \mathbf{p}(t) = m\mathbf{v}(t). \]
One of the most basic laws of motion is that the vector $\mathbf{p}'(t) = m\mathbf{v}'(t) = m\mathbf{a}(t)$ is equal to the external force exerted on the body. When a body, or
more precisely, the center of mass of the body, does not change location,
all it can do is rotate. In order for a body to rotate about an axis a torque
18.3. TORQUE AND ANGULAR MOMENTUM 159
is required. Just as work equals force times distance moved, work done
in rotating a body equals torque times angle through which it is rotated.
Just as force is the time derivative of p(t), the linear momentum vector, we
find that torque is the time derivative of something else, called the angular
momentum vector.
Then at time t + ∆t it is at
Therefore, using trig identities, we find that the change in the x-coordinate
is approximately
τ = Fy x(t) − Fx y(t).
and
F = (Fx , Fy , 0),
we find that
Now we use the fact that the force is the time derivative of the vector p(t)
to write
\[ \boldsymbol{\tau} = (0, 0, \tau) = \mathbf{r}(t) \times \mathbf{p}'(t). \]
\[ \mathbf{r}(t) \times \mathbf{p}'(t) = \frac{d}{dt}\big(\mathbf{r}(t) \times \mathbf{p}(t)\big). \tag{18.1} \]
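Equation (18.1) can be checked numerically: since $\mathbf{p}(t) = m\mathbf{r}'(t)$, the extra term $\mathbf{r}'(t) \times \mathbf{p}(t)$ in the product rule is a cross product of parallel vectors and vanishes. The curve below is an arbitrary illustrative choice.

```python
# Sketch: for an arbitrary smooth curve r(t) with momentum p = m r'(t),
# check numerically that d/dt (r x p) = r x p', i.e. Equation (18.1).
# The curve and the mass m are arbitrary illustrative choices.
import math

m = 2.0

def r(t):
    return (math.cos(t), math.sin(2 * t), t * t)

def diff(f, t, h=1e-5):
    fp, fm = f(t + h), f(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def p(t):
    return tuple(m * c for c in diff(r, t))

t0 = 0.9
L_dot = diff(lambda t: cross(r(t), p(t)), t0)   # d/dt (r x p)
torque = cross(r(t0), diff(p, t0))              # r x p'
print(max(abs(a - b) for a, b in zip(L_dot, torque)))  # ~ 0
```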
In other words, we need to know that the torque is still the time derivative
of L(t), even as the coordinate system changes. In order for something
to be a “vector” in the physicists’ sense, it needs to behave properly as we
switch coordinate systems, that is, it needs to transform as a vector [15].
In fact, all is well. This definition of L(t) holds for bodies moving along
more general curves in three-dimensional space, and we can go on calling
L(t) the angular momentum vector. Now we begin to exploit the special
nature of the gravitational force.
\[ \mathbf{F}(t) = h(t)\mathbf{r}(t), \]
for each t, where h(t) denotes a scalar function of t; that is, the force is
central if it is proportional to r(t) at each t.
We see then that the angular momentum vector L(t) is conserved when
the force is central.
Proof: We have
\[ \mathbf{r}(t) \cdot \mathbf{L} = \mathbf{r}(t) \cdot \mathbf{L}(t) = \mathbf{r}(t) \cdot \big(\mathbf{r}(t) \times \mathbf{p}(t)\big), \]
which is the volume of the parallelepiped formed by the three vectors r(t), r(t) and p(t), which is obviously zero. Therefore, for every t, the vector r(t) is orthogonal to the constant vector L. So, the curve lies in a plane with normal vector L.
\[ \mathbf{u}_r(t) = \frac{\mathbf{r}(t)}{\|\mathbf{r}(t)\|} = (\cos\theta(t), \sin\theta(t)) \]
so that
\[ \mathbf{u}_\theta(t) = \frac{d}{d\theta}\mathbf{u}_r(t), \]
and
\[ \mathbf{u}_r(t) = -\frac{d}{d\theta}\mathbf{u}_\theta(t). \]
Exercise 18.2 Show that
\[ \mathbf{p}(t) = m\rho'(t)\mathbf{u}_r(t) + m\rho(t)\frac{d\theta}{dt}\mathbf{u}_\theta(t). \tag{18.2} \]
Exercise 18.3 View the vectors r(t), p(t), ur (t) and uθ (t) as vectors in
three-dimensional space, all with third component equal to zero. Show that
Let t0 be some arbitrary time, and for any time t ≥ t0 let A(t) be the
area swept out by the planet in the time interval [t0 , t]. Then A(t2 ) − A(t1 )
is the area swept out in the time interval [t1 , t2 ].
In the very short time interval [t, t + ∆t] the vector r(t) sweeps out a
very small angle ∆θ, and the very small amount of area formed is then
approximately
\[ \Delta A = \frac{1}{2}\rho(t)^2 \Delta\theta. \]
Dividing by ∆t and taking limits, as ∆t → 0, we get
\[ \frac{dA}{dt} = \frac{1}{2}\rho(t)^2 \frac{d\theta}{dt} = \frac{L}{2m}. \]
Therefore, the area swept out between times t1 and t2 is
\[ A(t_2) - A(t_1) = \int_{t_1}^{t_2} \frac{dA}{dt}\, dt = \int_{t_1}^{t_2} \frac{L}{2m}\, dt = \frac{L(t_2 - t_1)}{2m}. \]
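The constancy of the areal velocity can also be seen in a small simulation (our own sketch, in arbitrary units): integrating planar motion under an inverse-square central force with a velocity-Verlet (leapfrog) scheme keeps the z-component of $\mathbf{r} \times \mathbf{v}$, which is proportional to $dA/dt$, essentially constant along the orbit.

```python
# Sketch: integrate planar motion under an inverse-square central force
# with velocity Verlet, and check that x*vy - y*vx (the z-component of
# r x v, proportional to the areal velocity dA/dt) stays constant.
# GM, the step size, and the initial conditions are arbitrary choices.
GM = 1.0
dt = 1e-3

def accel(x, y):
    r3 = (x*x + y*y) ** 1.5
    return (-GM * x / r3, -GM * y / r3)

x, y = 1.0, 0.0
vx, vy = 0.0, 1.2            # a bound, non-circular (elliptical) orbit
ax, ay = accel(x, y)
areal = []
for _ in range(20000):       # about one full orbital period
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
    x += dt * vx; y += dt * vy
    ax, ay = accel(x, y)
    vx += 0.5 * dt * ax; vy += 0.5 * dt * ay
    areal.append(x * vy - y * vx)

drift = max(areal) - min(areal)
print(drift)  # tiny compared with areal[0] (about 1.2)
```

Velocity Verlet happens to conserve this angular-momentum-like quantity exactly for central forces, so the drift here is pure roundoff.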
\[ \mathbf{F}(t) = h(t)\mathbf{r}(t) = -\frac{mMG}{\rho(t)^3}\mathbf{r}(t), \]
where $k = m^2 MG$.
Exercise 18.4 Show that the velocity vectors $\mathbf{r}'(t)$ lie in the same plane
as the curve r(t).
A × (A × B) = (A · B)A − (A · A)B
Exercise 18.6 Use the rule in the previous exercise to show that the con-
stant vector K also lies in the plane of the curve r(t).
\[ \mathbf{K} \cdot \mathbf{r}(t) = L^2 - k\rho(t). \]
where α(t) is the angle between the vectors K and r(t). From this we get
The area of this ellipse is πab. But we know from the second law that the area of the ellipse is $\frac{L}{2m}$ times the time T required to complete a full orbit. Equating the two expressions for the area, we get
\[ T^2 = \frac{4\pi^2}{MG} a^3. \]
This is the third law.
The first two laws deal with the behavior of one planet; the third law
is different. The third law describes behavior that is common to all the
planets in the solar system, thereby suggesting a universality to the force
of gravity.
\[ \frac{dA}{dt} = \frac{1}{2}\rho(t)^2 \frac{d\theta}{dt} = \frac{L}{2m} = c. \tag{18.3} \]
Differentiating with respect to t, we get
\[ \rho(t)\rho'(t)\frac{d\theta}{dt} + \frac{1}{2}\rho(t)^2 \frac{d^2\theta}{dt^2} = 0, \tag{18.4} \]
so that
\[ 2\rho'(t)\frac{d\theta}{dt} + \rho(t)\frac{d^2\theta}{dt^2} = 0. \tag{18.5} \]
From this, we shall prove that the force is central, directed towards the
sun.
As we did earlier, we write the position vector r(t) as $\mathbf{r}(t) = \rho(t)\mathbf{u}_r(t)$, so, suppressing the dependence on the time t, and using the identities
\[ \frac{d\mathbf{u}_r}{dt} = \frac{d\theta}{dt}\mathbf{u}_\theta, \]
and
\[ \frac{d\mathbf{u}_\theta}{dt} = -\frac{d\theta}{dt}\mathbf{u}_r, \]
we write the velocity vector as
\[ \mathbf{v} = \frac{d\mathbf{r}}{dt} = \frac{d\rho}{dt}\mathbf{u}_r + \rho\frac{d\mathbf{u}_r}{dt} = \frac{d\rho}{dt}\mathbf{u}_r + \rho\frac{d\mathbf{u}_r}{d\theta}\frac{d\theta}{dt} = \frac{d\rho}{dt}\mathbf{u}_r + \rho\frac{d\theta}{dt}\mathbf{u}_\theta, \]
Differentiating again, we get
\[ \frac{d\mathbf{v}}{dt} = \frac{d^2\rho}{dt^2}\mathbf{u}_r + \frac{d\rho}{dt}\frac{d\theta}{dt}\mathbf{u}_\theta + \frac{d\rho}{dt}\frac{d\theta}{dt}\mathbf{u}_\theta + \rho\frac{d^2\theta}{dt^2}\mathbf{u}_\theta - \rho\frac{d\theta}{dt}\frac{d\theta}{dt}\mathbf{u}_r. \]
Therefore, we have
\[ \mathbf{a} = \Big(\frac{d^2\rho}{dt^2} - \rho\Big(\frac{d\theta}{dt}\Big)^2\Big)\mathbf{u}_r + \Big(2\frac{d\rho}{dt}\frac{d\theta}{dt} + \rho\frac{d^2\theta}{dt^2}\Big)\mathbf{u}_\theta. \]
Using Equation (18.5), this reduces to
\[ \mathbf{a} = \Big(\frac{d^2\rho}{dt^2} - \rho\Big(\frac{d\theta}{dt}\Big)^2\Big)\mathbf{u}_r, \tag{18.6} \]
which tells us that the acceleration, and therefore the force, is directed
along the line joining the planet to the sun; it is a central force.
\[ \frac{d\rho}{dt} = \frac{d\rho}{d\theta}\frac{d\theta}{dt} = \frac{2c}{\rho^2}\frac{d\rho}{d\theta} \tag{18.7} \]
and
\[ \frac{d^2\rho}{dt^2} = \frac{4c^2}{\rho^4}\frac{d^2\rho}{d\theta^2} - \frac{8c^2}{\rho^5}\Big(\frac{d\rho}{d\theta}\Big)^2, \tag{18.8} \]
so that
\[ \mathbf{a} = -4c^2 u^2\Big(\frac{d^2 u}{d\theta^2} + u\Big)\mathbf{u}_r. \tag{18.9} \]
\[ \rho(t) = \frac{L^2}{k + K\cos\alpha(t)} = \frac{a(1 - e^2)}{1 + e\cos\alpha(t)}, \]
so that
\[ u = \frac{1 + e\cos\alpha(t)}{a(1 - e^2)}. \]
Then
\[ \mathbf{a} = -\frac{4c^2}{a(1 - e^2)}\, u^2\, \mathbf{u}_r = -\frac{4c^2}{a(1 - e^2)}\, \rho^{-2}\, \mathbf{u}_r, \]
which tells us that the force obeys an inverse-square law. We still must show that this same law applies to each of the planets, that is, that the constant $\frac{c^2}{a(1 - e^2)}$ does not depend on the particular planet.
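The computation above can be spot-checked numerically (our own sketch, with arbitrary orbit parameters $a$ and $e$): for the conic-section orbit, $u'' + u$ is the constant $1/(a(1 - e^2))$, so by Equation (18.9) the acceleration magnitude is proportional to $u^2 = \rho^{-2}$.

```python
# Sketch: for u(theta) = (1 + e*cos(theta)) / (a*(1 - e^2)), check
# numerically that u'' + u equals the constant 1/(a*(1 - e^2)) at many
# angles, so the acceleration in (18.9) scales as u^2 = 1/rho^2.
# The semi-major axis and eccentricity are arbitrary choices.
import math

a_axis, e = 1.7, 0.3
const = 1.0 / (a_axis * (1 - e * e))

def u(theta):
    return (1 + e * math.cos(theta)) * const

def second(f, x, h=1e-4):
    # central second difference
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

thetas = [0.1 * k for k in range(60)]
vals = [second(u, th) + u(th) for th in thetas]
print(max(abs(v - const) for v in vals))  # ~ 0
```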
along the line from 2, the midpoint of 1 and 3, to the sun. The effect of
such a force is to pull the planet away from 3, along the line from 3 to 4.
The areas of the two triangles formed by the sun and the points 2 and 3
and the sun and the points 2 and 4 are both equal to half of the distance
from the sun to 2, times the distance from 2 to B. So we still have equal
areas in equal times.
We can corroborate Newton’s approximations using vector calculus.
Consider the planet at 2 at time t = 0. Suppose that the acceleration
is a(t) = (b, c), where (b, c) is a vector parallel to the line segment from
the sun to 2. Then the velocity vector is v(t) = t(b, c) + (0, ∆), where, for
simplicity, we assume that, in the absence of the force from the sun, the
planet travels at a speed of ∆ units per second. The position vector is then
\[ \mathbf{r}(t) = \frac{1}{2}t^2(b, c) + t(0, \Delta) + \mathbf{r}(0). \]
At time t = 1, instead of the planet being at 3, it is now at
\[ \mathbf{r}(1) = \frac{1}{2}(b, c) + (0, \Delta) + \mathbf{r}(0). \]
Since the point 3 corresponds to the position (0, ∆) + r(0), we see that the
point 4 lies along the line from 3 parallel to the vector (b, c).
18.11.1 Rescaling
Suppose that the spatial variables (x, y, z) are replaced by (αx, αy, αz) and
time changed from t to βt. Then velocity, since it is distance divided by
time, is changed from v to αβ −1 v. Velocity squared, and therefore kinetic
and potential energies, are changed by a factor of α2 β −2 .
\[ \phi(x, y, z) = \frac{C}{\sqrt{x^2 + y^2 + z^2}}, \]
where C > 0 is some constant and we assume that the sun is at the origin. The gradient of φ(x, y, z) is
\[ \nabla\phi(x, y, z) = \frac{-C}{x^2 + y^2 + z^2}\Big(\frac{x}{\sqrt{x^2 + y^2 + z^2}}, \frac{y}{\sqrt{x^2 + y^2 + z^2}}, \frac{z}{\sqrt{x^2 + y^2 + z^2}}\Big). \]
The gravitational force on a massive object at point (x, y, z) is therefore a vector of magnitude $\frac{C}{x^2 + y^2 + z^2}$, directed from (x, y, z) toward (0, 0, 0), which says that the force is central and falls off as the reciprocal of the distance squared.
The potential function φ(x, y, z) is (−1)-homogeneous, meaning that
when we replace x with αx, y with αy, and z with αz, the new potential
is the old one times α−1 .
We also know, though, that when we rescale the space variables by α
and time by β the potential energy is multiplied by a factor of α2 β −2 . It
follows that
\[ \alpha^{-1} = \alpha^2\beta^{-2}, \]
so that
\[ \beta^2 = \alpha^3. \tag{18.11} \]
Suppose that we have two planets, P1 and P2 , orbiting the sun in circular
orbits, with the length of the orbit of P2 equal to α times that of P1 .
We can view the orbital data from P2 as that from P1 , after a rescaling of
the spatial variables by α. According to Equation (18.11), the orbital time
of P2 is then that of P1 multiplied by β = α3/2 . This is Kepler’s Third
Law.
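For circular orbits the rescaling conclusion can be checked directly (a sketch using the standard formula for circular-orbit periods, with arbitrary GM and radii): the orbital speed at radius R is √(GM/R), so the period is T = 2π√(R³/GM), and scaling R by α scales T by α^{3/2}.

```python
# Sketch: circular-orbit period T = 2*pi*sqrt(R^3 / GM).  Rescaling the
# radius by alpha rescales the period by alpha**1.5, as Equation (18.11)
# predicts.  GM, R1, and alpha are arbitrary illustrative choices.
import math

GM = 4.0

def period(R):
    return 2 * math.pi * math.sqrt(R**3 / GM)

R1 = 1.3
alpha = 2.5
ratio = period(alpha * R1) / period(R1)
print(ratio, alpha ** 1.5)  # the two agree
```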
Kepler took several decades to arrive at his third law, which he obtained
not from basic physical principles, but from analysis of observational data.
Could he have saved himself much time and effort if he had stayed in his
armchair and considered rescaling, as we have just done? No. The impor-
tance of Kepler’s Third Law lies in its universality, the fact that it applies
not just to one planet but to all. We have implicitly assumed universality
by postulating a potential function that governs the gravitational field from
the sun.
Chapter 19
Green’s Theorem and Related Topics
(Chapter 5,6,13)
19.1 Introduction
Green’s Theorem in two dimensions can be interpreted in two different
ways, both leading to important generalizations, namely Stokes’s Theorem
and the Divergence Theorem. In addition, Green’s Theorem has a number
of corollaries that involve normal derivatives, Laplacians, and harmonic
functions, and that anticipate results in analytic function theory, such as
the Cauchy Integral Theorems. A good reference is the book by Flanigan
[16].
For each t, let s(t) be the distance along the curve from the point r(0) to
the point r(t). The function s(t) is invertible, so that we can also express t
as a function of s, t = t(s). Then s(t) is called the arc-length. We can then
rewrite the parametrization, using as the parameter the variable s instead
of t; that is, the curve C can be described as
Then
\[ \mathbf{r}'(t) = \frac{d\mathbf{r}}{dt} = \frac{d\mathbf{r}}{ds}\frac{ds}{dt} = \Big(\frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds}\Big)\frac{ds}{dt}. \tag{19.2} \]
The vector
\[ \mathbf{T}(s) = \frac{d\mathbf{r}}{ds} = \Big(\frac{dx}{ds}, \frac{dy}{ds}, \frac{dz}{ds}\Big) \tag{19.3} \]
has length one, since
\[ \Big(\frac{ds}{dt}\Big)^2 = \Big(\frac{dx}{dt}\Big)^2 + \Big(\frac{dy}{dt}\Big)^2 + \Big(\frac{dz}{dt}\Big)^2. \tag{19.5} \]
Theorem 19.1 (Green-2D) Let P (x, y) and Q(x, y) have continuous first partial derivatives for (x, y) in a domain Ω containing both the Jordan domain D and its boundary ∂D. Then
\[ \oint_{\partial D} P\,dx + Q\,dy = \int\!\!\int_D \Big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Big)\, dx\, dy. \tag{19.6} \]
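A numerical spot-check of Green-2D (our own illustrative choice of field and domain): take P = −y and Q = x on the unit disk, so that Q_x − P_y = 2 and both sides of Equation (19.6) should equal twice the disk's area, 2π.

```python
# Sketch: evaluate the line integral in Green's theorem for P = -y,
# Q = x around the unit circle and compare with the double integral,
# which is 2 * (area of the unit disk) = 2*pi.  The field and domain
# are illustrative choices, not ones from the text.
import math

n = 200000
line = 0.0
for k in range(n):
    t = 2 * math.pi * (k + 0.5) / n
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)     # derivatives of the path
    line += (-y * dx + x * dy) * (2 * math.pi / n)   # P dx + Q dy

print(line, 2 * math.pi)  # both about 6.283
```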
Let the boundary ∂D be the positively oriented parameterized curve
or as
In Equation (19.7) we use the dot product of the vector field F = (P, Q)
with a tangent vector; this point of view will be extended to Stokes’s
Theorem. In Equation (19.8) we use the dot product of the vector field
G = (Q, −P ) with a normal vector; this formulation of Green’s Theorem,
also called Gauss’s Theorem in the plane, will be extended to the Diver-
gence Theorem, also called Gauss’s Theorem in three dimensions. Either
of these extensions therefore can legitimately be called Green’s Theorem in
three dimensions.
• The top:
\[ \int_{x_0 + \Delta x}^{x_0} P(x, y_0 + \Delta y)\, dx; \tag{19.10} \]
• The bottom:
\[ \int_{x_0}^{x_0 + \Delta x} P(x, y_0)\, dx. \tag{19.12} \]
which is the sum of the two integrals in lines 19.9 and 19.11. In the same
way, we can show that the second half of the double integral is equal to the
line integrals along the top and bottom of ∆.
Now consider the contributions to the double integral
\[ \int\!\!\int_D \Big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Big)\, dx\, dy, \tag{19.14} \]
which is the sum of each of the double integrals over all the small rectangles
∆ in D. When we add up the contributions of all these infinitesimal rect-
angles, we need to note that rectangles adjacent to one another contribute
nothing to the line integral from their shared edge, since the unit outward
normals are opposite in direction. Consequently, the sum of all the line
integrals around the small rectangles reduces to the line integral around
the boundary of D, since this is the only curve without any shared edges.
The double integral in Equation (19.14) is then the line integral around the
boundary only, which is the assertion of Green-2D.
Note that we have used the assumption that Qx and Py are continuous
when we replaced the double integral with iterated single integrals and
when we reversed the order of integration.
so that
\[ \nabla \times (P\mathbf{i}) \cdot \mathbf{n} = \frac{\partial P}{\partial z}\,\mathbf{n} \cdot \mathbf{j} - \frac{\partial P}{\partial y}\,\mathbf{n} \cdot \mathbf{k}. \tag{19.17} \]
The vector r(x, y, z) = (x, y, f (x, y)) from the origin to the point (x, y, z)
on the surface S then has
\[ \frac{\partial \mathbf{r}}{\partial y} = \mathbf{j} + \frac{\partial f}{\partial y}\mathbf{k}. \]
The vector $\frac{\partial \mathbf{r}}{\partial y}$ is tangent to the surface at (x, y, z), and so it is perpendicular to the unit outward normal. This means that
\[ \mathbf{n} \cdot \mathbf{j} + \frac{\partial f}{\partial y}\,\mathbf{n} \cdot \mathbf{k} = 0, \]
so that
\[ \mathbf{n} \cdot \mathbf{j} = -\frac{\partial f}{\partial y}\,\mathbf{n} \cdot \mathbf{k}. \tag{19.18} \]
Therefore, using Equations (19.17) and (19.18), we have
\[ \nabla \times (P\mathbf{i}) \cdot \mathbf{n}\, dS = -\Big(\frac{\partial P}{\partial z}\frac{\partial f}{\partial y} + \frac{\partial P}{\partial y}\Big)\mathbf{n} \cdot \mathbf{k}\, dS. \tag{19.19} \]
Note, however, that
\[ \frac{\partial P}{\partial z}\frac{\partial f}{\partial y} + \frac{\partial P}{\partial y} = \frac{\partial F}{\partial y}, \]
where F (x, y) = P (x, y, f (x, y)). Therefore, recalling that
n · k dS = dxdy,
we get
\[ \nabla \times (P\mathbf{i}) \cdot \mathbf{n}\, dS = -\frac{\partial F}{\partial y}\, dx\, dy. \tag{19.20} \]
By Green 2-D, we have
\[ \int\!\!\int_S \nabla \times (P\mathbf{i}) \cdot \mathbf{n}\, dS = -\int\!\!\int_D \frac{\partial F}{\partial y}\, dx\, dy = \oint_{\partial D} F\, dx. \]
\[ \operatorname{curl}(\mathbf{F}) = (0, 0, Q_x - P_y), \]
\[ \mathbf{n} \cdot \operatorname{curl}(\mathbf{F}) = Q_x - P_y. \]
Also,
\[ \mathbf{F} \cdot \mathbf{T}\, ds = P\,dx + Q\,dy. \]
We see then that Stokes’s Theorem has Green-2D as a special case.
Because the curl of a vector field is defined only for three-dimensional
vector fields, it is not obvious that the curl and Stokes’s Theorem extend
to higher dimensions. They do, but the extensions involve more compli-
cated calculus on manifolds and the integration of (n − 1)-forms over a
suitably oriented boundary of an oriented n-manifold; see Fleming [17] for
the details.
\[ \operatorname{div}(\mathbf{F}) = P_x + Q_y + R_z. \]
Then
\[ \int\!\!\int_S \mathbf{F} \cdot \mathbf{n}\, dS = \int\!\!\int\!\!\int_V \operatorname{div}(\mathbf{F})\, dV. \tag{19.21} \]
Proof: We first prove the theorem for a small cube with vertices (x, y, z),
(x, y + ∆y, z), (x, y, z + ∆z) and (x, y + ∆y, z + ∆z) forming the left side
wall, and the vertices (x + ∆x, y, z), (x + ∆x, y + ∆y, z), (x + ∆x, y, z + ∆z)
and (x + ∆x, y + ∆y, z + ∆z) forming the right side wall. The unit outward
normal for the side wall containing the first four of the eight vertices is
n = (−1, 0, 0); for the other side wall, it is n = (1, 0, 0). For the first side
wall the flux is the normal component of the field times the area of the
wall, or
−P (x, y, z)∆y ∆z,
or P (x + ∆x, y, z) − P (x, y, z)
∆x∆y∆z.
∆x
Taking limits, we get
$$\frac{\partial P}{\partial x}(x, y, z)\,dV.$$
We then perform the same calculations for the other four walls. Finally,
having proved the theorem for small cubes, we view the entire volume as
a sum of small cubes and add up the total flux for all the cubes. Because
outward flux from one cube’s wall is inward flux for its neighbor, they
cancel out, except when a wall has no neighbor; this means that the only
outward flux that remains is through the surface. This is what the theorem
says.
If we let R = 0 and imagine the volume shrinking down to a two-
dimensional planar domain D, with S compressing down to its boundary,
∂D, the unit normal vector becomes
$$\mathbf{n} = \Big(\frac{dy}{ds}, -\frac{dx}{ds}\Big),$$
and Equation (19.21) reduces to Equation (19.6).
We complete the proof by adding these three integral values. Similar cal-
culations show that ∇f (x) = F(x).
Theorem 19.4 tells us that, for a three-dimensional field, $\int_C \mathbf{F}\cdot\mathbf{T}\,ds$ is independent of the path and depends only on the points $A$ and $B$; then we can write
$$\int_C \mathbf{F}\cdot\mathbf{T}\,ds = \int_A^B \mathbf{F}\cdot\mathbf{T}\,ds. \tag{19.22}$$
In addition, the potential function $f(x, y, z)$ can be chosen to be
$$f(x, y, z) = \int_{(x_0, y_0, z_0)}^{(x, y, z)} \mathbf{F}\cdot\mathbf{T}\,ds, \tag{19.23}$$
$$-\frac{1}{2\pi}\oint_C \log r\,\frac{\partial q}{\partial n}\,ds + \frac{1}{2\pi}\oint_C q\,\frac{\partial \log r}{\partial n}\,ds. \tag{19.28}$$
The two line integrals in Equation (19.28) are known as the logarithmic
single-layer potential and logarithmic double-layer potential, respectively,
of the function q.
Notice that we cannot apply Green II directly to the domain D, since
log r is not defined at z = w. The idea is to draw a small circle C 0 centered
at w, with interior D0 and consider the new domain that is the original
D, without the ball D0 around w and its boundary; the new domain has a
hole in it, but that is acceptable. Then apply Green II, and finally, let the
radius of the ball go to zero. There are two key steps in the calculation.
First, we use the fact that, for the small circle, $\mathbf{n} = \mathbf{r}/\|\mathbf{r}\|$ to show that
$$\frac{\partial p}{\partial n} = \nabla p\cdot\mathbf{n} = \frac{1}{\rho},$$
where ρ is the radius of the small circle C 0 centered at w and r = (z − w)
is the vector from w to z. Then
$$p(z) = \frac{1}{2}\log\big(\|\mathbf{r}\|^2\big),$$
so that
$$\nabla p(z) = \frac{1}{2}\,\frac{1}{\|\mathbf{r}\|^2}\,\nabla\|\mathbf{r}\|^2 = \frac{1}{\|\mathbf{r}\|^2}\,\mathbf{r}.$$
Therefore, for $z$ on $C'$, we have
$$\frac{\partial p}{\partial n} = \nabla p(z)\cdot\mathbf{n} = \frac{1}{\rho}.$$
Then
$$\oint_{C'} q\,\frac{\partial p}{\partial n}\,ds = \frac{1}{\rho}\oint_{C'} q\,ds,$$
which, as the radius of C 0 goes to zero, is just 2πq(w).
Second, we note that the function $\frac{\partial q}{\partial n}$ is continuous, and therefore bounded by some constant $K > 0$ on the circle $C'$; the constant $K$ can be chosen to be independent of $\rho$, for $\rho$ sufficiently close to zero. Consequently, we have
$$\Big|\oint_{C'} \log r\,\frac{\partial q}{\partial n}\,ds\Big| \le K\,\Big|\oint_{C'} \log\rho\,ds\Big| = 2\pi K\,|\rho\log\rho|.$$
Since $\rho\log\rho$ goes to zero as $\rho$ goes to zero, this integral vanishes in the limit.
Equation (19.28) tells us that if $q$ is a harmonic function in $D$, then its value at any point $w$ inside $D$ is completely determined by what the functions $q$ and $\frac{\partial q}{\partial n}$ do on the boundary $C$. Note, however, that the normal derivative of $q$ depends on values of $q$ near the boundary, not just on the boundary. In fact, $q(w)$ is completely determined by $q$ alone on the boundary, via
$$q(w) = -\frac{1}{2\pi}\oint_C q(z)\,\frac{\partial}{\partial n}G(z, w)\,ds,$$
where $G(z, w)$ is the Green's function for the domain $D$.
According to the heat equation, the temperature u(x, y, t) in a two-
dimensional region at time t is governed by the partial differential equation
$$\frac{\partial u}{\partial t} = c\,\nabla^2 u,$$
for some constant $c > 0$. When a steady-state temperature has been reached, the function $u(x, y, t)$ no longer depends on $t$, and the resulting function $f(x, y)$ of $(x, y)$ alone satisfies $\nabla^2 f = 0$; that is, $f(x, y)$ is harmonic.
Imagine the region being heated by maintaining a temperature distribution
around the boundary of the region. It is not surprising that such a steady-
state temperature distribution throughout the region should be completely
determined by the temperature distribution around the boundary of the
region.
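The claim that a harmonic steady-state temperature is determined by the boundary can be made concrete with the mean value property. The sketch below is my own illustration, not from the text: for the harmonic function $u(x, y) = x^2 - y^2$, the value at a point equals the average of the values on any circle around that point.

```python
import math

def u(x, y):
    # u(x, y) = x^2 - y^2 is harmonic: u_xx + u_yy = 2 - 2 = 0.
    return x * x - y * y

def circle_average(f, cx, cy, rho, n=10000):
    # Approximate the average of f over the circle of radius rho
    # centered at (cx, cy), using n equally spaced sample points.
    total = 0.0
    for k in range(n):
        theta = 2 * math.pi * k / n
        total += f(cx + rho * math.cos(theta), cy + rho * math.sin(theta))
    return total / n

center_value = u(0.3, -0.7)
avg = circle_average(u, 0.3, -0.7, 2.0)
print(abs(avg - center_value) < 1e-8)
```

The same check fails for a non-harmonic function such as $x^2 + y^2$, whose circle averages exceed the center value.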
where u(x, y) and v(x, y) are both real-valued functions of two real vari-
ables. So it looks like there is nothing new here; complex function theory
looks like the theory of any two real-valued functions glued together to form
a complex-valued function. There is an important difference, however.
The most important functions in complex analysis are the functions
that are analytic in a domain D in the complex plane. Such functions will
have the property that, for any closed curve C in D,
$$\oint_C f(z)\,dz = 0.$$
$$-\frac{1}{2\pi}\oint_C \log r\,\frac{\partial u}{\partial n}\,ds + \frac{1}{2\pi}\oint_C u\,\frac{\partial\log r}{\partial n}\,ds, \tag{19.30}$$
with a similar expression involving v. Because u is harmonic, Equation
(19.30) reduces to
$$u(w) = -\frac{1}{2\pi}\oint_C \log r\,\frac{\partial u}{\partial n}\,ds + \frac{1}{2\pi}\oint_C u\,\frac{\partial\log r}{\partial n}\,ds, \tag{19.31}$$
with a similar expression involving the function v.
Consider the first line integral in Equation (19.31),
$$\frac{1}{2\pi}\oint_C \log r\,\frac{\partial u}{\partial n}\,ds. \tag{19.32}$$
So we need only worry about the second line integral in Equation (19.31),
which is
$$\frac{1}{2\pi}\oint_C u\,\frac{\partial\log r}{\partial n}\,ds. \tag{19.35}$$
We need to look closely at the term
$$\frac{\partial\log r}{\partial n}.$$
First, we have
$$\frac{\partial\log r}{\partial n} = \frac{1}{2}\,\frac{\partial\log r^2}{\partial n}. \tag{19.36}$$
The function $\log r^2$ can be viewed as
$$\log r^2 = \log(\mathbf{a}\cdot\mathbf{a}), \tag{19.37}$$
with a similar expression involving v. There is one more step we must take
to get to the Cauchy Integral Formula.
We can write $z - w = \rho e^{i\theta}$ for $z$ on $C$. Therefore,
$$\frac{dz}{d\theta} = \rho i e^{i\theta}. \tag{19.42}$$
The arc-length $s$ around the curve $C$ is $s = \rho\theta$, so that
$$\frac{ds}{d\theta} = \rho. \tag{19.43}$$
Therefore, we have
$$\theta = \frac{s}{\rho}, \tag{19.44}$$
and
$$z - w = \rho e^{is/\rho}. \tag{19.45}$$
Then,
$$\frac{dz}{ds} = i e^{is/\rho}, \tag{19.46}$$
or
$$ds = \frac{1}{i}\,e^{-i\theta}\,dz. \tag{19.47}$$
Substituting for $ds$ in Equation (19.41) and in the corresponding equation involving $v$, and using the fact that
$$|z - w|\,e^{i\theta} = z - w, \tag{19.48}$$
and
$$g(t) = f(z(t)) = u(x(t), y(t)) + iv(x(t), y(t)).$$
We also have
$$cx' - dy' = u_x x' + u_y y',$$
and
$$cy' + dx' = v_x x' + v_y y'.$$
Since these last two equations must hold for any curve r(t), they must hold
when x0 (t) = 0 for all t, as well as when y 0 (t) = 0 for all t. It follows
that c = ux , d = −uy , c = vy , and d = vx , from which we can get the
Cauchy-Riemann equations easily.
Chapter 20
Introduction to Complex
Analysis (Chapter 13)
20.1 Introduction
The material in this chapter is taken mainly from Chapter 13 of the text.
In some cases, the ordering of topics has been altered slightly.
with both $u(x, y)$ and $v(x, y)$ real-valued functions of the two real variables $x$ and $y$. Since $z_x = 1$ and $z_y = i$, the differential $dz$ is
$$dz = dx + i\,dy. \tag{20.2}$$
For any curve C in the complex plane the line integral of f (z) along C is
defined as
$$\int_C f(z)\,dz = \int_C (u + iv)(dx + i\,dy) = \int_C u\,dx - v\,dy + i\int_C v\,dx + u\,dy. \tag{20.3}$$
20.3 Differentiability
The derivative of the function f (z) at the point z is defined to be
$$f'(z) = \lim_{\Delta z\to 0}\frac{f(z + \Delta z) - f(z)}{\Delta z}, \tag{20.4}$$
whenever this limit exists. Note that ∆z = ∆x + i∆y, so that z + ∆z is
obtained by moving a small distance away from z, by ∆x in the horizontal
direction and ∆y in the vertical direction. When the limit does exist, the
function f (z) is said to be differentiable or analytic at the point z; the
function is then continuous at z as well.
For real-valued functions of a real variable, requiring that the function
be differentiable is not a strong requirement; however, for the functions
w = f (z) it certainly is. What makes differentiability a strong condition is
that, for the complex plane, we can move away from z in infinitely many
directions, unlike in the real case, where all we can do is to move left or
right away from x. As we shall see, in order for f (z) to be differentiable,
the functions u(x, y) and v(x, y) must be related in a special way, called
the Cauchy-Riemann equations.
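The strength of the complex differentiability requirement can be seen numerically. In this sketch (my own, with arbitrarily chosen sample points), difference quotients for the analytic $f(z) = z^2$ agree no matter which direction we approach from, while for $g(z) = \bar{z}$ they depend on the direction.

```python
def quotient(f, z, h):
    # Difference quotient (f(z + h) - f(z)) / h for a complex step h.
    return (f(z + h) - f(z)) / h

z0 = 1.0 + 2.0j
directions = [1.0, 1.0j, (1.0 + 1.0j) / abs(1.0 + 1.0j)]
step = 1e-6

f_vals = [quotient(lambda z: z * z, z0, step * d) for d in directions]
g_vals = [quotient(lambda z: z.conjugate(), z0, step * d) for d in directions]

# f's quotients all approximate f'(z0) = 2*z0, independent of direction.
print(all(abs(v - 2 * z0) < 1e-4 for v in f_vals))
# g's quotients disagree: along 1 the quotient is 1, along 1j it is -1.
print(abs(g_vals[0] - g_vals[1]) > 1.0)
```

The quotient for $\bar{z}$ equals $\bar{h}/h$, which takes every value on the unit circle as the direction of $h$ varies, so no limit exists.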
$$f'(z) = \lim_{\Delta x,\Delta y\to 0}\frac{\big(u(x+\Delta x, y+\Delta y) - u(x, y)\big) + i\big(v(x+\Delta x, y+\Delta y) - v(x, y)\big)}{\Delta x + i\,\Delta y}. \tag{20.5}$$
It follows that
20.5 Integration
Suppose that f (z) is differentiable on and inside a simple closed curve C,
and suppose that the partial derivatives ux , uy , vx , and vy are continuous.
Using Equation (20.3) and applying Green 2-D separately to both of the
integrals, we get
$$\oint_C f(z)\,dz = -\iint_D (v_x + u_y)\,dx\,dy + i\iint_D (u_x - v_y)\,dx\,dy, \tag{20.11}$$
where $D$ denotes the interior of the region whose boundary is the curve $C$. The Cauchy-Riemann equations tell us that both of the double integrals are zero. Therefore, we may conclude that $\oint_C f(z)\,dz = 0$ for all such simple closed curves $C$.
It is important to remember that Green 2-D is valid for regions that
have holes in them; in such cases the boundary C of the region consists
of more than one simple closed curve, so the line integral in Green 2-D is
along each of these curves separately, with the orientation such that the
region remains to the left as we traverse the line.
In a course on complex analysis it is shown that this theorem holds without the assumption that the first partial derivatives are continuous;
for every simple closed curve. But what happens when $n$ is negative?
Let $C$ be a simple closed curve with $z = a$ inside $C$. Using the Cauchy-Goursat Theorem, we may replace the integral around the curve $C$ with the integral around the circle centered at $a$ and having radius $\epsilon$, where $\epsilon$ is a small positive number. Then $z = a + \epsilon e^{i\theta}$ for all $z$ on the small circle, and
$$dz = i\epsilon e^{i\theta}\,d\theta.$$
Then
$$\oint_C (z - a)^n\,dz = \int_0^{2\pi} (\epsilon e^{i\theta})^n\, i\epsilon e^{i\theta}\,d\theta = i\epsilon^{n+1}\int_0^{2\pi} e^{i(n+1)\theta}\,d\theta,$$
and if $n + 1 = 0$,
$$\oint_C (z - a)^{-1}\,dz = 2\pi i.$$
The proof of this result is given, almost, in the worked problem 13.12 on
p. 299 of the text. The difficulty with the proof given there is that the
Letting $\epsilon \downarrow 0$, we get
$$\oint_C \frac{f(z)}{z - a}\,dz = 2\pi i\,f(a).$$
Differentiating with respect to $a$ in Cauchy's Integral Theorem, we find that
$$f'(a) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - a)^2}\,dz, \tag{20.14}$$
and more generally
$$f^{(n)}(a) = \frac{n!}{2\pi i}\oint_C \frac{f(z)}{(z - a)^{n+1}}\,dz. \tag{20.15}$$
So not only is f (z) differentiable, but it has derivatives of all orders. This
is one of the main ways in which complex analysis differs from real analysis.
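Equation (20.15) can be tested numerically by discretizing the contour integral. The sketch below (my own check, not the author's) takes $f = \exp$ and $a = 0$, where every derivative equals $1$, and integrates over the unit circle with the trapezoid rule.

```python
import math
import cmath

def cauchy_derivative(f, a, n, radius=1.0, samples=2000):
    # Approximate f^{(n)}(a) = (n!/(2*pi*i)) * contour integral of
    # f(z)/(z - a)^{n+1} over a circle of the given radius about a.
    total = 0.0 + 0.0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        z = a + radius * cmath.exp(1j * theta)
        dz = 1j * radius * cmath.exp(1j * theta) * (2 * math.pi / samples)
        total += f(z) / (z - a) ** (n + 1) * dz
    return math.factorial(n) / (2j * math.pi) * total

for n in range(5):
    approx = cauchy_derivative(cmath.exp, 0.0, n)
    assert abs(approx - 1.0) < 1e-8
print("derivatives of exp at 0 all equal 1")
```

The trapezoid rule on a periodic analytic integrand converges extremely fast, so a modest number of samples already gives near machine precision.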
$$1/(x^2 + 1) = 1 - x^2 + x^4 - x^6 + \cdots,$$
and that this series converges for $|x| < 1$ only. But why? The function $f(x)$ is differentiable for all $x$, so why shouldn't the Taylor series converge for all $x$, as the Taylor series for $\sin x$ and $e^x$ do? The answer comes when we consider the complex extension, $f(z) = 1/(z^2 + 1)$. This function is undefined at $z = i$ and $z = -i$. The Taylor series for $f(z)$ converges in the largest circle centered at $a = 0$ that does not contain a point where $f(z)$ fails to be differentiable, so it must converge only within a circle of radius one. This must apply on the real line as well.
Let f be differentiable on and inside a circle C centered at a and let
a + h be inside C. Then
$$f(a + h) = a_0 + a_1 h + a_2 h^2 + \cdots, \tag{20.16}$$
where $a_n = f^{(n)}(a)/n!$. The Taylor series converges in the largest circle centered at $z = a$ that does not include a point where $f(z)$ is not differentiable.
Taking $a + h$ inside $C$ and using the Cauchy Integral Theorem, we have
$$f(a + h) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - (a + h)}\,dz.$$
Writing
$$\frac{1}{z - (a + h)} = \frac{1}{(z - a) - h} = (z - a)^{-1}\,\frac{1}{1 - h(z - a)^{-1}},$$
we have
for
$$a_n = \frac{1}{2\pi i}\oint_C \frac{f(z)}{(z - a)^{n+1}}\,dz. \tag{20.18}$$
Note that for non-negative $n$ the integral in Equation (20.18) need not equal $f^{(n)}(a)/n!$, since the function $f(z)$ need not be differentiable inside of $C_2$. This theorem is discussed in problem 13.82 of the text.
To prove this, we use the same approach as in problem 13.12 of the
text. What we find is that, since the two curves form the boundary of the
annulus, Cauchy’s Integral Theorem becomes
$$f(a + h) = \frac{1}{2\pi i}\oint_{C_1} \frac{f(z)}{z - (a + h)}\,dz - \frac{1}{2\pi i}\oint_{C_2} \frac{f(z)}{z - (a + h)}\,dz. \tag{20.19}$$
To obtain the desired result we write the expression $\frac{1}{z - (a + h)}$ in two ways, depending on whether $z$ lies on $C_1$ or on $C_2$. For $z$ on $C_1$ we write
$$\frac{1}{z - (a + h)} = (z - a)^{-1}\Big[1 + \frac{h}{z - a} + \Big(\frac{h}{z - a}\Big)^2 + \cdots\Big], \tag{20.20}$$
while for z on C2 we write
$$\frac{1}{z - (a + h)} = -h^{-1}\Big[1 + \frac{z - a}{h} + \Big(\frac{z - a}{h}\Big)^2 + \cdots\Big]. \tag{20.21}$$
Then
$$\oint_{C_1} \frac{f(z)}{z - (a + h)}\,dz = \sum_{n=0}^{\infty} h^n \oint_{C_1} \frac{f(z)}{(z - a)^{n+1}}\,dz, \tag{20.22}$$
and
$$\oint_{C_2} \frac{f(z)}{z - (a + h)}\,dz = -\sum_{n=-\infty}^{-1} h^n \oint_{C_2} \frac{f(z)}{(z - a)^{n+1}}\,dz. \tag{20.23}$$
Both integrals are equivalent to integrals over the curve C. The desired
result follows by applying Equation (20.15).
20.11 Residues
Suppose now that we want to integrate $f(z)$ over the simple closed curve $C$ in the previous section. From Equation (20.18) with $n + 1 = 0$ we see that
$$\oint_C f(z)\,dz = (2\pi i)\,a_{-1}. \tag{20.24}$$
Note that if f (z) is also differentiable inside of C2 then a−1 = 0 and the
integral is also zero.
If $(z - a)^m f(z)$ is differentiable on and inside $C$ then the Laurent expansion becomes
$$f(z) = a_{-m}(z - a)^{-m} + a_{-m+1}(z - a)^{-m+1} + \cdots + \sum_{n=0}^{\infty} a_n (z - a)^n. \tag{20.25}$$
The number $a_{-1}$ is called the residue of $f(z)$ at the point $z = a$. Furthermore, we have
$$a_{-1} = \lim_{z\to a}\,\frac{1}{(m-1)!}\,\frac{d^{m-1}}{dz^{m-1}}\Big[(z - a)^m f(z)\Big]. \tag{20.27}$$
Note that we can replace the curve C with an arbitrarily small circle cen-
tered at z = a.
If $f(z)$ is differentiable on and inside a simple closed curve $C$, except for a finite number of poles, then $\frac{1}{2\pi i}\oint_C f(z)\,dz$ is the sum of the residues at these poles; this is the Residue Theorem (see problem 13.25 of the text).
For example, consider again the function $f(z) = \frac{7z - 2}{(z + 1)z(z - 2)}$. For the annulus $1 < |z + 1| < 3$ and the curve $C$ the circle of radius two centered at $z = -1$, we have
$$\oint_C f(z)\,dz = -4\pi i,$$
since the residues of f (z) are −3 at the pole z = −1 and 1 at the pole
z = 0, both inside C.
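The worked example can be confirmed by brute force: discretize the contour $|z + 1| = 2$ and sum $f(z)\,dz$. This is my own numerical check, not part of the text.

```python
import math
import cmath

def f(z):
    # The example function with poles at z = -1, z = 0, and z = 2.
    return (7 * z - 2) / ((z + 1) * z * (z - 2))

samples = 20000
total = 0.0 + 0.0j
for k in range(samples):
    theta = 2 * math.pi * k / samples
    z = -1 + 2 * cmath.exp(1j * theta)            # circle of radius 2 at -1
    dz = 2j * cmath.exp(1j * theta) * (2 * math.pi / samples)
    total += f(z) * dz

# Residues -3 (at z = -1) and 1 (at z = 0) lie inside C; z = 2 does not.
expected = -4j * math.pi
print(abs(total - expected) < 1e-6)
```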
For a second example, consider the function $f(z) = \frac{z^2}{(z^2 + 1)(z - 2)}$. The points $z = 2$, $z = i$ and $z = -i$ are poles of order one. The residue of $f(z)$ at $z = 2$ is
$$\lim_{z\to 2}\,(z - 2)f(z) = \frac{4}{5}.$$
for
$$\binom{N}{n} = \frac{N!}{n!(N - n)!}. \tag{20.29}$$
Now if $\alpha$ is any real number, we would like to have
$$(1 + x)^\alpha = \sum_{n=0}^{\infty}\binom{\alpha}{n}x^n; \tag{20.30}$$
The function
$$f(z) = (1 + z)^\alpha \tag{20.31}$$
is analytic in the region $|z| < 1$, and so has a Taylor-series expansion of the form
$$(1 + z)^\alpha = a_0 + a_1 z + a_2 z^2 + \cdots, \tag{20.32}$$
where
$$a_n = f^{(n)}(0)/n! = \alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - (n - 1))/n!. \tag{20.33}$$
This tells us how to define $\binom{\alpha}{n}$. We can also see how to do it when we write
$$\binom{N}{n} = \frac{N(N - 1)(N - 2)\cdots(N - (n - 1))}{n!}; \tag{20.34}$$
we now write
$$\binom{\alpha}{n} = \frac{\alpha(\alpha - 1)(\alpha - 2)\cdots(\alpha - (n - 1))}{n!}. \tag{20.35}$$
Using this extended binomial theorem we have
$$\Big(1 + \frac{z}{2}\Big)^{-3} = 1 - \frac{3}{2}z + \frac{3}{2}z^2 - \frac{5}{4}z^3 + \cdots. \tag{20.36}$$
Therefore, we have
$$f(z) = \frac{1}{z(z + 2)^3} = \frac{1}{8z} - \frac{3}{16} + \frac{3}{16}z - \frac{5}{32}z^2 + \cdots \tag{20.37}$$
The residue of $f(z)$ at the point $z = 0$ is then $\frac{1}{8}$. Since $(z + 2)^3 f(z) = z^{-1}$ and the second derivative is $2z^{-3}$, the residue of $f(z)$ at the point $z = -2$ is $-\frac{1}{8}$.
20.13 Using Residues
$$\sin\theta = \frac{e^{i\theta} - e^{-i\theta}}{2i} = \frac{z - z^{-1}}{2i},$$
and
$$dz = ie^{i\theta}\,d\theta = iz\,d\theta.$$
The integral is then
$$\int_0^{2\pi} \frac{1}{5 + 3\sin\theta}\,d\theta = \oint_C \frac{2}{3z^2 + 10iz - 3}\,dz, \tag{20.38}$$
where $C$ is the circle of radius one, centered at the origin. The poles of the integrand are $z = -3i$ and $z = -\frac{i}{3}$, both of order one. Only the pole $z = -\frac{i}{3}$ lies within $C$. The residue of the integrand at the pole $z = -\frac{i}{3}$ is $\frac{1}{4i}$, so the integral has the value $\frac{\pi}{2}$.
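A direct quadrature (my own check) confirms the value $\pi/2$ obtained from the residue at $z = -i/3$.

```python
import math

# Riemann sum for the periodic integrand 1/(5 + 3 sin(theta)) on [0, 2*pi];
# for periodic integrands this equals the (very accurate) trapezoid rule.
n = 100000
total = 0.0
for k in range(n):
    theta = 2 * math.pi * k / n
    total += 1.0 / (5 + 3 * math.sin(theta))
total *= 2 * math.pi / n

print(abs(total - math.pi / 2) < 1e-8)
```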
We conclude this chapter with a quick look at several of the more im-
portant consequences of the theory developed so far.
$$|f'(z_0)|\,r \le B.$$
Unless f 0 (z0 ) = 0, the left side goes to infinity, as r → ∞, while the right
side stays constant. Therefore, f 0 (z0 ) = 0 for all z0 and f (z) is constant.
This is Liouville’s Theorem.
Then, writing
$$F(z) = U(x, y) + iV(x, y),$$
we can easily show that
$$U(x, y) = \int_{(x_0, y_0)}^{(x, y)} u\,dx - v\,dy,$$
and
$$V(x, y) = \int_{(x_0, y_0)}^{(x, y)} v\,dx + u\,dy.$$
It follows then, from our previous discussions of the Green’s identities, that
Ux = u, Uy = −v, Vx = v and Vy = u. Therefore, Ux = Vy and Uy = −Vx ;
that is, the Cauchy-Riemann Equations are satisfied. Therefore, since these
partial derivatives are continuous, we can conclude that F (z) is analytic in
R. But then, so is F 0 (z) = f (z).
Chapter 21
The Quest for Invisibility
(Chapter 5,6)
$$\nabla\cdot(S\nabla u) = 0,$$
and $\mathbf{J} = S\nabla u$.
21.4 Cloaking
Suppose we want to hide a conducting object within a non-conducting
region D. We can do this, but it will still be possible to “see” the presence
of D and determine its size. If D is large enough to conceal an object of
a certain size, then one might become suspicious. What we need to do is
to make it look like the region D is smaller than it really is, or is not even
there.
By solving Laplace’s equation for the region between the outer bound-
ary, where we have measured the flux, and the inner boundary of D, where
the flux is zero, we can see how the size of D is reflected in the solution
obtained. The presence of D distorts the potential function, and therefore
the measured flux. The key to invisibility is to modify the conductivity in
the region surrounding D in such a way that all (or, at least, most) of the
distortion takes place well inside the boundary, so that at the boundary
the potential looks undistorted.
For more mathematical details and discussion of the meta-materials
needed to achieve this, see [7].
Chapter 22
Calculus of Variations
(Chapter 16)
22.1 Introduction
In optimization, we are usually concerned with maximizing or minimizing
real-valued functions of one or several variables, possibly subject to con-
straints. In this chapter, we consider another type of optimization problem,
maximizing or minimizing a function of functions. The functions themselves we shall denote simply by y = y(x), instead of the more common
notation y = f (x), and the function of functions will be denoted J(y); in
the calculus of variations, such functions of functions are called functionals.
We then want to optimize J(y) over a class of admissible functions y(x). We
shall focus on the case in which x is a single real variable, although there
are situations in which the functions y are functions of several variables.
When we attempt to minimize a function g(x1 , ..., xN ), we consider
what happens to g when we perturb the values xn to xn + ∆xn . In order
for x = (x1 , ..., xN ) to minimize g, it is necessary that
g(x1 + ∆x1 , ..., xN + ∆xN ) ≥ g(x1 , ..., xN ),
for all perturbations ∆x1 , ..., ∆xN . For differentiable g, this means that
the gradient of g at x must be zero. In the calculus of variations, when
we attempt to minimize J(y), we need to consider what happens when we
perturb the function y to a nearby admissible function, denoted y + ∆y. In
order for y to minimize J(y), we need
J(y + ∆y) ≥ J(y),
for all ∆y that make y + ∆y admissible. We end up with something anal-
ogous to a first derivative of J, which is then set to zero. The result is a
Therefore, we can say that the function y(x) = x minimizes J(y), over all
such functions.
In this example, the functional J(y) involves only the first derivative of
y = y(x) and has the form
$$J(y) = \int f(x, y(x), y'(x))\,dx, \tag{22.2}$$
In general, the functional J(y) can come from almost any function f (u, v, w).
In fact, if higher derivatives of y(x) are involved, the function f can be a
function of more than three variables. In this chapter we shall confine our
discussion to problems involving only the first derivative of y(x).
This problem is different from the previous ones, in that we seek to optimize
a functional, subject to a second functional being held fixed. Such problems
are called problems with constraints.
for |n| ≤ N . The rn are values of the Fourier transform of the function
y(x).
$$\frac{\partial f}{\partial v}(x, y(x), y'(x))\,y'(x) + \frac{\partial f}{\partial w}(x, y(x), y'(x))\,y''(x). \tag{22.14}$$
$$f(u, v, w) = u^2 + v^3 + \sin w,$$
and
$$y(x) = 7x^2.$$
Then
$$\frac{\partial f}{\partial x}(x, y(x), y'(x)) = 2x, \tag{22.16}$$
and
$$\frac{d}{dx}f(x, y(x), y'(x)) = \frac{d}{dx}\Big[x^2 + (7x^2)^3 + \sin(14x)\Big].$$
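The distinction between the partial derivative in the first slot and the total derivative along $y(x)$ can be checked numerically. This sketch assumes the example above, $f(u, v, w) = u^2 + v^3 + \sin w$ with $y(x) = 7x^2$.

```python
import math

def f(u, v, w):
    return u ** 2 + v ** 3 + math.sin(w)

def along_curve(x):
    # f evaluated along the curve: u = x, v = y(x) = 7x^2, w = y'(x) = 14x.
    return f(x, 7 * x ** 2, 14 * x)

x0, h = 0.5, 1e-6
# Partial derivative: vary only the first slot.
partial = (f(x0 + h, 7 * x0 ** 2, 14 * x0) - f(x0, 7 * x0 ** 2, 14 * x0)) / h
# Total derivative: vary x everywhere it appears.
total_deriv = (along_curve(x0 + h) - along_curve(x0)) / h

print(abs(partial - 2 * x0) < 1e-4)       # partial is just 2x
# d/dx [x^2 + (7x^2)^3 + sin(14x)] = 2x + 3*(7x^2)^2 * 14x + 14*cos(14x)
exact_total = 2 * x0 + 3 * (7 * x0 ** 2) ** 2 * 14 * x0 + 14 * math.cos(14 * x0)
print(abs(total_deriv - exact_total) < 1e-3)
```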
Therefore, we have
$$J'(\epsilon) = \int_{x_1}^{x_2}\Big[\frac{\partial f}{\partial v} - \frac{d}{dx}\Big(\frac{\partial f}{\partial w}\Big)\Big]\,\eta\,dx. \tag{22.22}$$
In order for $y = y(x)$ to be the optimal function, this integral must be zero for every appropriate choice of $\eta(x)$, when $\epsilon = 0$. It can be shown without too much trouble that this forces
$$\frac{\partial f}{\partial v} - \frac{d}{dx}\Big(\frac{\partial f}{\partial w}\Big) = 0. \tag{22.23}$$
$$\frac{\partial f}{\partial v}(x, y(x), y'(x)) - \frac{d}{dx}\frac{\partial f}{\partial w}(x, y(x), y'(x)) = 0. \tag{22.24}$$
22.5.1 If f is independent of v
If the function f (u, v, w) is independent of the variable v then the Euler-
Lagrange Equation (22.24) becomes
$$\frac{\partial f}{\partial w}(x, y(x), y'(x)) = c, \tag{22.25}$$
22.5.2 If f is independent of u
Note that we can write
$$\frac{d}{dx}f(x, y(x), y'(x)) = \frac{\partial f}{\partial u}(x, y(x), y'(x)) + \frac{\partial f}{\partial v}(x, y(x), y'(x))\,y'(x) + \frac{\partial f}{\partial w}(x, y(x), y'(x))\,y''(x). \tag{22.26}$$
We also have
$$\frac{d}{dx}\Big[y'(x)\,\frac{\partial f}{\partial w}(x, y(x), y'(x))\Big] = y'(x)\,\frac{d}{dx}\frac{\partial f}{\partial w}(x, y(x), y'(x)) + y''(x)\,\frac{\partial f}{\partial w}(x, y(x), y'(x)). \tag{22.27}$$
Subtracting, we get
$$\frac{d}{dx}\Big[f(x, y(x), y'(x)) - y'(x)\,\frac{\partial f}{\partial w}(x, y(x), y'(x))\Big] = \frac{\partial f}{\partial u}(x, y(x), y'(x)) + y'(x)\Big[\frac{\partial f}{\partial v} - \frac{d}{dx}\frac{\partial f}{\partial w}\Big](x, y(x), y'(x)). \tag{22.28}$$
If $f$ is independent of $u$ and $y$ satisfies the Euler-Lagrange Equation, the right side is zero, so that
$$f(x, y(x), y'(x)) - y'(x)\,\frac{\partial f}{\partial w}(x, y(x), y'(x)) = c, \tag{22.30}$$
for some constant $c$.
so that
$$\frac{\partial f}{\partial v} = 0,$$
and
$$\frac{\partial f}{\partial u} = 0.$$
We conclude that $y'(x)$ is constant, so $y(x)$ is a straight line.
Then, since
$$\frac{\partial f}{\partial u} = 0,$$
and
$$\frac{\partial f}{\partial w} = \frac{w}{\sqrt{1 + w^2}\,\sqrt{v}},$$
Equation (22.30) tells us that
$$\frac{\sqrt{1 + y'(x)^2}}{\sqrt{y(x)}} - y'(x)\,\frac{y'(x)}{\sqrt{1 + y'(x)^2}\,\sqrt{y(x)}} = c. \tag{22.33}$$
Equivalently, we have
$$\sqrt{y(x)}\,\sqrt{1 + y'(x)^2} = \sqrt{a}. \tag{22.34}$$
we obtain
$$x = \int 2a\sin^2\theta\,d\theta = \frac{a}{2}(2\theta - \sin 2\theta) + k. \tag{22.36}$$
From this, we learn that the minimizing curve is a cycloid, that is, the path
a point on a circle traces as the circle rolls.
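The cycloid conclusion can be sanity-checked numerically. In the sketch below (my own, using the parametrization $x = \frac{a}{2}(2t - \sin 2t)$, $y = a\sin^2 t$ implied by the substitution above), the curve satisfies $y(1 + y'^2) = a$, which is Equation (22.34) squared.

```python
import math

a = 1.0
for t in [0.3, 0.7, 1.0, 1.3]:
    # Derivatives of the parametrization with respect to t.
    dxdt = a * (1 - math.cos(2 * t))
    dydt = a * math.sin(2 * t)
    slope = dydt / dxdt                  # y'(x) along the curve
    y = a * math.sin(t) ** 2
    # Equation (22.34) squared: y * (1 + y'^2) should equal a.
    assert abs(y * (1 + slope ** 2) - a) < 1e-12
print("cycloid satisfies y(1 + y'^2) = a")
```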
There is an interesting connection, discussed by Simmons in [42], between the brachistochrone problem and the refraction of light rays. Imagine a ray of light passing from the point $A = (0, a)$, with $a > 0$, to the point $B = (c, b)$, with $c > 0$ and $b < 0$. Suppose that the speed of light is $v_1$ above the $x$-axis, and $v_2 < v_1$ below the $x$-axis. The path consists of two straight lines, meeting at the point $(x, 0)$. The total time for the journey is then
$$T(x) = \frac{\sqrt{a^2 + x^2}}{v_1} + \frac{\sqrt{b^2 + (c - x)^2}}{v_2}.$$
Fermat’s Principle of Least Time says that the (apparent) path taken by
the light ray will be the one for which x minimizes T (x). From calculus, it
follows that
$$\frac{x}{v_1\sqrt{a^2 + x^2}} = \frac{c - x}{v_2\sqrt{b^2 + (c - x)^2}},$$
or
$$\frac{\sin\alpha_1}{v_1} = \frac{\sin\alpha_2}{v_2},$$
where α1 and α2 denote the angles between the upper and lower parts of
the path and the vertical, respectively.
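Fermat's conclusion can be tested by minimizing $T(x)$ directly. The values of $a$, $b$, $c$, $v_1$, $v_2$ below are illustrative choices of mine, not from the text; a ternary search works because $T$ is convex.

```python
import math

a, b, c = 1.0, 1.5, 2.0       # A = (0, a), B = (c, -b), crossing at (x, 0)
v1, v2 = 3.0, 1.0             # light is faster above the x-axis

def T(x):
    return math.sqrt(a * a + x * x) / v1 + math.sqrt(b * b + (c - x) ** 2) / v2

# Ternary search for the minimizer of the convex function T on [0, c].
lo, hi = 0.0, c
for _ in range(60):
    third = (hi - lo) / 3
    if T(lo + third) < T(hi - third):
        hi = hi - third
    else:
        lo = lo + third
x = (lo + hi) / 2

sin_a1 = x / math.sqrt(a * a + x * x)
sin_a2 = (c - x) / math.sqrt(b * b + (c - x) ** 2)
print(abs(sin_a1 / v1 - sin_a2 / v2) < 1e-6)    # Snell's law holds
```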
Imagine now a stratified medium consisting of many horizontal layers, each with its own speed of light. The path taken by the light would be such that $\frac{\sin\alpha}{v}$ remains constant as the ray passes from one layer to the next. In the limit of infinitely many infinitely thin layers, the path taken by the light would satisfy the equation $\frac{\sin\alpha}{v} = \text{constant}$, with
$$\sin\alpha = \frac{1}{\sqrt{1 + y'(x)^2}}.$$
As we have already seen, the velocity attained by the rolling ball is $v = \sqrt{2gy}$, so the equation to be satisfied by the path $y(x)$ is
$$\sqrt{2gy(x)}\,\sqrt{1 + y'(x)^2} = \text{constant},$$
$$\frac{y(x)\,y'(x)^2}{\sqrt{1 + y'(x)^2}} - y(x)\sqrt{1 + y'(x)^2} = c. \tag{22.38}$$
It follows that
$$y(x) = b\cosh\Big(\frac{x - a}{b}\Big), \tag{22.39}$$
for appropriate $a$ and $b$.
It is important to note that being a solution of the Euler-Lagrange Equa-
tion is a necessary condition for a differentiable function to be a solution
to the original optimization problem, but it is not a sufficient condition.
The optimal solution may not be a differentiable one, or there may be no
optimal solution. In the case of minimum surface area, there may not be
any function of the form in Equation (22.39) passing through the two given
end points; see Chapter IV of Bliss [2] for details.
With
$$f(x, y(x), y'(x)) = y(x) + \lambda\sqrt{1 + y'(x)^2},$$
the Euler-Lagrange Equation becomes
$$\frac{d}{dx}\Big(\frac{\lambda y'(x)}{\sqrt{1 + y'(x)^2}}\Big) - 1 = 0, \tag{22.42}$$
or
$$\frac{y'(x)}{\sqrt{1 + y'(x)^2}} = \frac{x - a}{\lambda}. \tag{22.43}$$
Using the substitution $t = \frac{x - a}{\lambda}$ and integrating, we find that
$$(x - a)^2 + (y - b)^2 = \lambda^2, \tag{22.44}$$
which is the equation of a circle. So the optimal function $y(x)$ is a portion of a circle.
What happens if the assigned perimeter $P$ is greater than $\frac{\pi}{2}$, the length of the semicircle connecting $(0, 0)$ and $(1, 0)$? In this case, the desired curve is not the graph of a function of $x$, but a parameterized curve of the form $(x(t), y(t))$, for, say, $t$ in the interval $[0, 1]$. Now we have one independent variable, $t$, but two dependent ones, $x$ and $y$. We need a generalization of the Euler-Lagrange Equation to the multivariate case.
With a bit more work (see [10]), it can be shown that the desired coefficients
bm are the solution to the system of equations
$$\sum_{m=0}^{N} r_{m-k}\,b_m = 0, \tag{22.48}$$
$$A = \frac{1}{2}\oint_C (x\,dy - y\,dx) = \frac{1}{2}\int_0^1 \big(x(t)y'(t) - y(t)x'(t)\big)\,dt. \tag{22.52}$$
The perimeter $P$ of the curve is
$$P = \int_0^1 \sqrt{x'(t)^2 + y'(t)^2}\,dt. \tag{22.53}$$
and
$$\frac{d}{dt}\Big(-\frac{1}{2}y(t) + \frac{\lambda x'(t)}{\sqrt{x'(t)^2 + y'(t)^2}}\Big) - \frac{1}{2}y'(t) = 0. \tag{22.56}$$
It follows that
$$\frac{\lambda x'(t)}{\sqrt{x'(t)^2 + y'(t)^2}} - y(t) = c, \tag{22.57}$$
and
$$x(t) + \frac{\lambda y'(t)}{\sqrt{x'(t)^2 + y'(t)^2}} = d. \tag{22.58}$$
Therefore,
where $\dot{x} = \frac{dx}{dt}$. Here the function $f$ is
$$f(x, \dot{x}, y, \dot{y}, z, \dot{z}) = \sqrt{\dot{x}^2 + \dot{y}^2 + \dot{z}^2}.$$
z = g(x, y),
that is, we assume that we can solve for the variable z, and that the function
g has continuous second partial derivatives. We may not be able to do this
for the entire surface, as the equation of a sphere G(x, y, z) = x2 + y 2 +
z 2 − r2 = 0 illustrates, but we can usually solve for z, or one of the other
variables, on part of the surface, as, for example, on the upper or lower
hemisphere.
We then have
which we write as
$$J = \int_a^b F(x, \dot{x}, y, \dot{y})\,dt. \tag{22.63}$$
Using
$$\frac{\partial F}{\partial x} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial(g_x\dot{x} + g_y\dot{y})}{\partial x} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial}{\partial x}\Big(\frac{dg}{dt}\Big) = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial x}$$
and
$$\frac{\partial F}{\partial y} = \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial y},$$
we can rewrite the Euler-Lagrange Equations as
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + g_x\,\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = 0, \tag{22.68}$$
and
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{y}}\Big) + g_y\,\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = 0. \tag{22.69}$$
$$\frac{\partial F}{\partial\dot{x}} = \frac{\partial f}{\partial\dot{x}} + \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial\dot{x}} = \frac{\partial f}{\partial\dot{x}} + g_x\,\frac{\partial f}{\partial\dot{z}},$$
so that
$$\frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(g_x\,\frac{\partial f}{\partial\dot{z}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial f}{\partial\dot{z}}\,\frac{d}{dt}(g_x) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial f}{\partial\dot{z}}\,\frac{\partial\dot{z}}{\partial x}.$$
Therefore,
$$\frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x + \frac{\partial F}{\partial x},$$
so that
$$0 = \frac{d}{dt}\Big(\frac{\partial F}{\partial\dot{x}}\Big) - \frac{\partial F}{\partial x} = \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) + \frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big)g_x. \tag{22.70}$$
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{z}}\Big) = \lambda(t)\,G_z.$$
Then we have
$$H_x = G_x + G_z g_x = 0,$$
so that
$$g_x = -\frac{G_x}{G_z};$$
similarly, we have
$$g_y = -\frac{G_y}{G_z}.$$
Then the Euler-Lagrange Equations become
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{x}}\Big) = \lambda(t)\,G_x, \tag{22.71}$$
and
$$\frac{d}{dt}\Big(\frac{\partial f}{\partial\dot{y}}\Big) = \lambda(t)\,G_y. \tag{22.72}$$
$$\frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{x}}\big)}{G_x} = \frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{y}}\big)}{G_y} = \frac{\frac{d}{dt}\big(\frac{\partial f}{\partial\dot{z}}\big)}{G_z}. \tag{22.73}$$
Notice that we could obtain the same result by calculating the Euler-Lagrange Equation for the functional
$$\int_a^b f(\dot{x}, \dot{y}, \dot{z}) + \lambda(t)\,G(x(t), y(t), z(t))\,dt. \tag{22.74}$$
22.9.2 An Example
Let the surface be a sphere, with equation
$$0 = G(x, y, z) = x^2 + y^2 + z^2 - r^2.$$
$$\frac{f\ddot{x} - \dot{x}\dot{f}}{2xf^2} = \frac{f\ddot{y} - \dot{y}\dot{f}}{2yf^2} = \frac{f\ddot{z} - \dot{z}\dot{f}}{2zf^2}.$$
We can rewrite these equations as
Suppose also that the positions of the particles are constrained by the
conditions
φi (x1 , y1 , z1 , ..., xJ , yJ , zJ ) = 0,
for i = 1, ..., I. Then there are N = 3J − I generalized coordinates q1 , ..., qN
describing the behavior of the particles.
For example, suppose that there is one particle moving on the surface
of a sphere with radius $R$. Then the constraint is that
$$x^2 + y^2 + z^2 = R^2.$$
$$\frac{\partial f}{\partial u}(au, av, aw) = a^{n-1}\,\frac{\partial f}{\partial u}(u, v, w). \tag{22.75}$$
Proof: We write
$$\frac{\partial f}{\partial u}(au, av, aw) = \lim_{\Delta\to 0}\frac{f(au + a\Delta, av, aw) - f(au, av, aw)}{a\Delta} = \frac{a^n}{a}\,\frac{\partial f}{\partial u}(u, v, w) = a^{n-1}\,\frac{\partial f}{\partial u}(u, v, w).$$
$$u\,\frac{\partial f}{\partial u}(u, v, w) + v\,\frac{\partial f}{\partial v}(u, v, w) + w\,\frac{\partial f}{\partial w}(u, v, w) = n\,f(u, v, w). \tag{22.76}$$
so that
$$g'(a) = n\,a^{n-1}\,f(u, v, w).$$
It follows that
$$u\,\frac{\partial f}{\partial u}(u, v, w) + v\,\frac{\partial f}{\partial v}(u, v, w) + w\,\frac{\partial f}{\partial w}(u, v, w) = n\,f(u, v, w).$$
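Euler's identity (22.76) is easy to check numerically. The function below is my own example, homogeneous of degree $n = 3$, with partial derivatives estimated by central differences.

```python
import math

def f(u, v, w):
    # f(au, av, aw) = a^3 f(u, v, w), so f is homogeneous of degree 3.
    return u * u * v + w ** 3

u, v, w, h = 1.2, -0.7, 2.5, 1e-6
fu = (f(u + h, v, w) - f(u - h, v, w)) / (2 * h)
fv = (f(u, v + h, w) - f(u, v - h, w)) / (2 * h)
fw = (f(u, v, w + h) - f(u, v, w - h)) / (2 * h)

# Euler's identity: u f_u + v f_v + w f_w = 3 f(u, v, w).
print(abs(u * fu + v * fv + w * fw - 3 * f(u, v, w)) < 1e-6)
```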
Hamilton's principle is then that the paths taken by the particles are such that the integral
$$\int_{t_1}^{t_2} L(t)\,dt = \int_{t_1}^{t_2} T(t) - V(t)\,dt$$
subject to
$$\int_{x_1}^{x_2} r(x)\,(y(x))^2\,dx = 1.$$
22.12 Exercises
Exercise 22.1 Suppose that the cycloid in the brachistochrone problem connects the starting point $(0, 0)$ with the point $(\pi a, -2a)$, where $a > 0$. Show that the time required for the ball to reach the point $(\pi a, -2a)$ is $\pi\sqrt{\frac{a}{g}}$.
Chapter 23
Sturm-Liouville Problems
(Chapter 10,11)
Proof: We have
$$\langle Au_m, u_n\rangle = \lambda_m\langle u_m, u_n\rangle,$$
and
$$\langle Au_m, u_n\rangle = \langle u_m, Au_n\rangle = \lambda_n\langle u_m, u_n\rangle.$$
Since $\lambda_m \ne \lambda_n$, it follows that $\langle u_m, u_n\rangle = 0$.
When we change the inner product on $\mathbb{C}^N$ the Hermitian matrices may no longer be the ones we focus on. For any inner product on $\mathbb{C}^N$ we say that a matrix $B$ is self-adjoint if
or, equivalently,
$$= i\int_a^b g'(x)f(x)\,dx = \langle Sg, f\rangle = \langle f, Sg\rangle.$$
a
$$Ty = y''$$
for all $y(x)$ in $V$, which prompts us to say that the differential operator $(-T)y = S^2y = -y''$ is non-negative definite. We then expect all eigenvalues of $-T$ to be non-negative. We know, in particular, that solutions of
$$-y''(x) = \lambda y(x),$$
with $y(0) = y(1) = 0$ are $y_m(x) = \sin(m\pi x)$, and the eigenvalues are $\lambda_m = m^2\pi^2$.
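These eigenvalues can be recovered numerically. The sketch below (my own check, using a standard central-difference approximation of $-y''$) applies the discrete operator to samples of $\sin(m\pi x)$ and reads off the eigenvalue at an interior point.

```python
import math

n = 2000
hstep = 1.0 / n
for m in (1, 2, 3):
    x = [k * hstep for k in range(n + 1)]
    y = [math.sin(m * math.pi * xi) for xi in x]
    # Pick an interior index where y is not close to zero.
    k = n // (4 * m)
    # Apply the discrete -y'' and divide by y to estimate lambda.
    lam = -(y[k + 1] - 2 * y[k] + y[k - 1]) / (hstep ** 2) / y[k]
    assert abs(lam - (m * math.pi) ** 2) < 0.1
print("eigenvalues approximate (m*pi)^2")
```

The small discrepancy is the $O(h^2)$ discretization error of the central difference.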
Proof: From
$$(pyz' - pzy')' = (pz')'y - (py')'z$$
we have
$$(Ly)z - y(Lz) = \frac{1}{w(x)}\,\frac{d}{dx}(pyz' - py'z).$$
Therefore,
$$\int_a^b \big[(Ly)z - y(Lz)\big]\,w(x)\,dx = (pyz' - py'z)\Big|_a^b = 0.$$
Therefore, $L$ is self-adjoint on $V$.
It is interesting to note that
$$\langle Ly, y\rangle = \int_a^b p(y')^2\,dx + \int_a^b qy^2\,dx,$$
23.4 Orthogonality
Once again, let V be the space of all twice continuously differentiable func-
tions y(x) on [a, b] with y(a) = y(b) = 0. Let λm and λn be distinct
eigenvalues of the linear differential operator L given by Equation (23.16),
with associated eigenfunctions um (x) and un (x), respectively. Let the inner
product on V be given by Equation (23.19).
Theorem 23.2 The eigenfunctions um (x) and un (x) are orthogonal.
Proof: We have
$$\frac{d}{dx}\big(p(x)u_m'(x)\big) - w(x)q(x)u_m(x) = -\lambda_m u_m(x)w(x),$$
and
$$\frac{d}{dx}\big(p(x)u_n'(x)\big) - w(x)q(x)u_n(x) = -\lambda_n u_n(x)w(x),$$
so that
$$u_n(x)\,\frac{d}{dx}\big(p(x)u_m'(x)\big) - w(x)q(x)u_m(x)u_n(x) = -\lambda_m u_m(x)u_n(x)w(x)$$
and
$$u_m(x)\,\frac{d}{dx}\big(p(x)u_n'(x)\big) - w(x)q(x)u_m(x)u_n(x) = -\lambda_n u_m(x)u_n(x)w(x).$$
Subtracting, we get
$$u_n(x)\,\frac{d}{dx}\big(p(x)u_m'(x)\big) - u_m(x)\,\frac{d}{dx}\big(p(x)u_n'(x)\big) = (\lambda_n - \lambda_m)\,u_m(x)u_n(x)w(x).$$
The left side of the previous equation can be written as
$$u_n(x)\,\frac{d}{dx}\big(p(x)u_m'(x)\big) - u_m(x)\,\frac{d}{dx}\big(p(x)u_n'(x)\big) = \frac{d}{dx}\Big(p(x)u_n(x)u_m'(x) - p(x)u_m(x)u_n'(x)\Big).$$
Therefore,
$$(\lambda_n - \lambda_m)\int_a^b u_m(x)u_n(x)w(x)\,dx = \Big(p(x)u_n(x)u_m'(x) - p(x)u_m(x)u_n'(x)\Big)\Big|_a^b = 0. \tag{23.20}$$
We shall make use of this fact in our discussion of Bessel’s and Legendre’s
equations.
and
$$\mu'(x) = 1/p(x),$$
23.6 Examples
In this section we present several examples. We shall study these in more
detail later in these notes.
$$\frac{\partial^2 u}{\partial t^2} = g\,\frac{\partial}{\partial x}\Big(x\,\frac{\partial u}{\partial x}\Big). \tag{23.27}$$
Separating the variables leads to the differential equation
$$-g\,\frac{d}{dx}\Big(x\,\frac{dy}{dx}\Big) = \lambda y(x). \tag{23.28}$$
Note that all three of these differential equations have the form
$$Ly = \lambda y.$$
$$z^2\frac{d^2y}{dz^2} + z\frac{dy}{dz} + (z^2 - 0^2)y = 0. \tag{23.29}$$
As we shall see shortly, this is a special case of Bessel's Equation, with $\nu = 0$.
Note that the differential equation in Equation (23.28) has the form Ly =
λy, but Equation (23.29) was obtained by a change of variable that ab-
sorbed the λ into the z, so we do not expect this form of the equation to
be in eigenvalue form. However, we can rewrite Equation (23.30) as
$$-\frac{1}{x}\,\frac{d}{dx}\big(xy'(x)\big) + \frac{\nu^2}{x^2}\,y(x) = y(x), \tag{23.31}$$
which is in the form of a Sturm-Liouville eigenvalue problem, with $w(x) = x = p(x)$, $q(x) = \frac{\nu^2}{x^2}$, and $\lambda = 1$. As we shall discuss again in the chapter
Proof: The proof is quite similar to the proof of Theorem 23.2. The main point is that now
$$\big(x\,y_n(x)y_m'(x) - x\,y_m(x)y_n'(x)\big)\Big|_0^1 = 0$$
because $y_m(1) = 0$ for all $m$ and the function $w(x) = x$ is zero when $x = 0$.
$$(1 - x^2)\big[P_n(x)P_m'(x) - P_m(x)P_n'(x)\big]\Big|_{-1}^{1} = 0, \tag{23.37}$$
• Hermite:
$$\frac{d}{dx}\Big(e^{-x^2}\frac{dy}{dx}\Big) + \lambda e^{-x^2}y = 0;$$
and
• Laguerre:
$$\frac{d}{dx}\Big(xe^{-x}\frac{dy}{dx}\Big) + \lambda e^{-x}y = 0.$$
Exercise 23.1 For each of the three differential equations just listed, see
if you can determine the interval over which their eigenfunctions will be
orthogonal.
Chapter 24

Series Solutions for Differential Equations
(Chapter 10,11)
24.1.1 An Example
Consider the differential equation
$$y' = y. \tag{24.1}$$
We seek a power-series solution
$$y(x) = a_0 + a_1x + a_2x^2 + \dots.$$
Writing
$$y'(x) = a_1 + 2a_2x + 3a_3x^2 + \dots,$$
and inserting these series into the equation $y' - y = 0$, we find, by equating the coefficients of like powers of $x$, that
$$a_n = a_0/n!.$$
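As a quick numerical sanity check (a Python sketch; the function name is ours, not the text's), the coefficients $a_n = a_0/n!$ with $a_0 = 1$ reproduce the exponential function, the familiar solution of $y' = y$:

```python
import math

def series_solution(x, a0=1.0, terms=20):
    # Partial sum of y(x) = sum_n a_n x^n with a_n = a0/n!, as derived above.
    return sum(a0 / math.factorial(n) * x**n for n in range(terms))

# With a0 = 1 the series should agree with e^x.
print(abs(series_solution(1.0) - math.e) < 1e-12)  # True
```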
$$(1+x)y' - py = 0, \tag{24.2}$$
and writing
$$y(x) = a_0 + a_1x + a_2x^2 + \dots,$$
we find that $a_n = \binom{p}{n}a_0$, so that
$$y(x) = a_0\sum_{n=0}^{\infty}\binom{p}{n}x^n = a_0(1+x)^p;$$
this is the Binomial Theorem, with the series converging for $|x| < 1$.
If both P (x) and Q(x) have Taylor series expansions that converge in a
neighborhood of x = x0 , we say that x0 is an ordinary point for the differen-
tial equation. In that case, we expect to find a Taylor series representation
for the solution that converges in a neighborhood of x0 .
If x0 is not an ordinary point, but both (x − x0 )P (x) and (x − x0 )2 Q(x)
have Taylor series expansions that converge in a neighborhood of x0 , we
say that x0 is a regular singular point of the differential equation. In such
cases, we seek a Frobenius series solution.
$$y'' + y = 0. \tag{24.5}$$
Inserting a power series as before, we find that
$$y(x) = a_0\Big(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots\Big) + a_1\Big(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\Big),$$
so that
$$y(x) = a_0\cos x + a_1\sin x.$$
Writing
$$y(x) = \sum_{n=0}^{\infty}a_nx^n,$$
we find that
$$y(x) = a_0\Big(1 - \frac{p(p+1)}{2!}x^2 + \frac{p(p-2)(p+1)(p+3)}{4!}x^4 - \dots\Big)$$
$$+\,a_1\Big(x - \frac{(p-1)(p+2)}{3!}x^3 + \frac{(p-1)(p-3)(p+2)(p+4)}{5!}x^5 - \dots\Big).$$
If p = n is a positive even integer, the first series terminates, and if p = n
is an odd positive integer, the second series terminates. In either case, we
get the Legendre polynomial solutions, denoted Pn (x).
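The termination is easy to see from the coefficient recurrence that the series method produces for Legendre's equation, $a_{n+2} = \frac{n(n+1)-p(p+1)}{(n+1)(n+2)}a_n$ (the standard recurrence, not displayed in the text). A small Python sketch, with names of our choosing:

```python
def legendre_series_coeffs(p, n_max=10, a0=1.0, a1=1.0):
    # a_{n+2} = (n(n+1) - p(p+1)) / ((n+1)(n+2)) * a_n, the recurrence
    # behind the two series displayed above.
    a = [a0, a1]
    for n in range(n_max - 1):
        a.append((n * (n + 1) - p * (p + 1)) / ((n + 1) * (n + 2)) * a[n])
    return a

# For p = 2 the even series terminates: a_4 = a_6 = ... = 0.
print(legendre_series_coeffs(2)[2:7:2])  # [-3.0, 0.0, 0.0]
```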
24.4.1 Motivation
We motivate the Frobenius series approach by considering Euler’s differen-
tial equation,
$$x^2y'' + pxy' + qy = 0, \tag{24.8}$$
where both p and q are constants and x > 0. Equation (24.8) can be
written as
$$y'' + \frac{p}{x}y' + \frac{q}{x^2}y = 0,$$
from which we see that x = 0 is a regular singular point.
Changing variables to z = log x, we obtain
$$\frac{d^2y}{dz^2} + (p-1)\frac{dy}{dz} + qy = 0. \tag{24.9}$$
We seek a solution of the form $y(z) = e^{mz}$. Inserting this guess into Equation (24.9), we find that we must have
$$m^2 + (p-1)m + q = 0;$$
this is the indicial equation. If the roots $m = m_1$ and $m = m_2$ are distinct, the solutions are $e^{m_1z}$ and $e^{m_2z}$. If $m_1 = m_2$, then the solutions are $e^{m_1z}$ and $ze^{m_1z}$. Reverting to the original variables, we find that the solutions are either $y(x) = x^{m_1}$ and $y(x) = x^{m_2}$, or $y(x) = x^{m_1}$ and $y(x) = x^{m_1}\log x$.
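A numerical illustration (with $p = 4$, $q = 2$ chosen by us for the example): the indicial roots $m = -1, -2$ of $m^2 + 3m + 2 = 0$ give exact solutions $x^m$ of Euler's equation.

```python
def euler_residual(m, p, q, x):
    # Residual of x^2 y'' + p x y' + q y at y = x^m (x > 0).
    y, yp, ypp = x**m, m * x**(m - 1), m * (m - 1) * x**(m - 2)
    return x**2 * ypp + p * x * yp + q * y

# p = 4, q = 2: indicial equation m^2 + 3m + 2 = 0, roots m = -1, -2.
print([abs(euler_residual(m, 4, 2, 1.7)) < 1e-12 for m in (-1, -2)])  # [True, True]
```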
or
$$y(x) = x^m\log x\sum_{n=0}^{\infty}a_nx^n,$$
where $m$ is a root of an indicial equation. This is the Frobenius series approach.
A Frobenius series associated with the singular point $x_0 = 0$ has the form
$$y(x) = x^m\big(a_0 + a_1x + a_2x^2 + \dots\big), \tag{24.10}$$
and we expand
$$xP(x) = \sum_{n=0}^{\infty}p_nx^n, \qquad x^2Q(x) = \sum_{n=0}^{\infty}q_nx^n,$$
with convergence for $|x| < R$. Inserting these expressions into the differential equation, and performing a bit of algebra, we arrive at
$$\sum_{n=0}^{\infty}\Big\{a_n\big[(m+n)(m+n-1) + (m+n)p_0 + q_0\big] + \sum_{k=0}^{n-1}a_k\big[(m+k)p_{n-k} + q_{n-k}\big]\Big\}x^n = 0. \tag{24.13}$$
Setting the coefficient of $x^0$ to zero gives
$$m(m-1) + mp_0 + q_0 = 0; \tag{24.15}$$
this is called the Indicial Equation. We solve the quadratic Equation (24.15) for $m = m_1$ and $m = m_2$.
For Bessel's Equation the Indicial Equation becomes
$$m^2 - \nu^2 = 0. \tag{24.16}$$
Chapter 25

Bessel's Equations
(Chapter 9,10,11)
with neither P (x) nor Q(x) analytic at x = x0 , but with both (x − x0 )P (x)
and (x − x0 )2 Q(x) analytic, are said to be equations with regular singular
points. Writing Equation (25.1) as
$$y''(x) + \frac{1}{x}y'(x) + \Big(1 - \frac{\nu^2}{x^2}\Big)y(x) = 0, \tag{25.16}$$
we see that Bessel’s Equation is such a regular singular point equation,
with the singular point x0 = 0. Solutions to such equations can be found
using the technique of Frobenius series.
$$y(x) = x^m\big(a_0 + a_1x + a_2x^2 + \dots\big), \tag{25.17}$$
and we expand
$$xP(x) = \sum_{n=0}^{\infty}p_nx^n, \qquad x^2Q(x) = \sum_{n=0}^{\infty}q_nx^n,$$
with convergence for $|x| < R$. Inserting these expressions into the differential equation, and performing a bit of algebra, we arrive at
$$\sum_{n=0}^{\infty}\Big\{a_n\big[(m+n)(m+n-1) + (m+n)p_0 + q_0\big] + \sum_{k=0}^{n-1}a_k\big[(m+k)p_{n-k} + q_{n-k}\big]\Big\}x^n = 0. \tag{25.20}$$
Setting the coefficient of $x^0$ to zero gives
$$m(m-1) + mp_0 + q_0 = 0; \tag{25.22}$$
this is called the Indicial Equation. We solve the quadratic Equation (25.22) for $m = m_1$ and $m = m_2$. For Bessel's Equation this becomes
$$m^2 - \nu^2 = 0. \tag{25.23}$$
If ν = n is an integer, then
and
we obtain
for n = 0, 1, 2, ....
25.6.3 An Example
We have
$$\Gamma\Big(\frac{1}{2}\Big) = \int_0^{\infty}e^{-t}t^{-1/2}\,dt. \tag{25.36}$$
Substituting $t = u^2$, we find
$$\Gamma\Big(\frac{1}{2}\Big) = 2\int_0^{\infty}e^{-u^2}\,du. \tag{25.37}$$
Squaring, we get
$$\Gamma\Big(\frac{1}{2}\Big)^2 = 4\int_0^{\infty}\int_0^{\infty}e^{-u^2}e^{-v^2}\,du\,dv. \tag{25.38}$$
Switching to polar coordinates, this becomes
$$\Gamma\Big(\frac{1}{2}\Big)^2 = 4\int_0^{\pi/2}\int_0^{\infty}e^{-r^2}r\,dr\,d\theta = 2\int_0^{\pi/2}1\,d\theta = \pi. \tag{25.39}$$
Consequently, we have
$$\Gamma\Big(\frac{1}{2}\Big) = \sqrt{\pi}. \tag{25.40}$$
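This value is easy to confirm numerically, using only the Python standard library (a sketch):

```python
import math

# Gamma(1/2) = sqrt(pi), checked via math.gamma and via a crude
# midpoint-rule approximation of 2 * integral_0^inf exp(-u^2) du.
print(abs(math.gamma(0.5) - math.sqrt(math.pi)) < 1e-12)  # True

h, n = 0.001, 10_000  # integrate out to u = 10; the tail is negligible
approx = 2 * sum(math.exp(-((k + 0.5) * h) ** 2) * h for k in range(n))
print(abs(approx - math.sqrt(math.pi)) < 1e-6)  # True
```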
for n = 0, 1, ... . The series converges for all x. From Equation (25.41) we
have
$$J_0(x) = 1 - \frac{x^2}{2^2} + \frac{x^4}{2^24^2} - \frac{x^6}{2^24^26^2} + \dots, \tag{25.42}$$
from which it follows immediately that J0 (−x) = J0 (x).
Using Cauchy’s formula for the coefficients of a Laurent series, we find that
$$J_n(x) = \frac{1}{2\pi i}\oint_C\frac{f(z)}{z^{n+1}}\,dz, \tag{25.44}$$
or, equivalently,
$$J_0(x) = \frac{1}{2\pi}\int_0^{2\pi}e^{ix\cos\theta}\,d\theta. \tag{25.48}$$
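The series (25.42) and the integral representation (25.48) can be compared numerically; the sketch below (function names ours) approximates the integral with a simple Riemann sum, which is very accurate for smooth periodic integrands:

```python
import cmath
import math

def j0_series(x, terms=30):
    # J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2, as in Equation (25.42).
    return sum((-1) ** k * (x / 2) ** (2 * k) / math.factorial(k) ** 2
               for k in range(terms))

def j0_integral(x, n=2000):
    # (1/2pi) * integral_0^{2pi} exp(i x cos(theta)) dtheta, Equation (25.48).
    h = 2 * math.pi / n
    return (sum(cmath.exp(1j * x * math.cos(k * h)) for k in range(n)) * h
            / (2 * math.pi)).real

print(abs(j0_series(2.5) - j0_integral(2.5)) < 1e-10)  # True
```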
2π 0
$$= \int_0^{\infty}\Big(\int_0^{2\pi}e^{ir\rho\cos(\theta-\omega)}\,d\theta\Big)g(r)r\,dr.$$
We then have
$$H(\rho) = 2\pi\int_0^{\infty}rg(r)J_0(r\rho)\,dr. \tag{25.53}$$
For any function s(x) of a single real variable, its Hankel transform is
$$T(\gamma) = \int_0^{\infty}xs(x)J_0(\gamma x)\,dx, \tag{25.55}$$
where $J_0$ is the zero-th order Bessel function of the first kind. From the
theory of Bessel functions, we learn that
$$\frac{d}{dx}\big[xJ_1(x)\big] = xJ_0(x),$$
so that
$$H(\rho) = \frac{2\pi}{\rho}RJ_1(R\rho).$$
When the star is viewed through a telescope, the image is blurred by the
atmosphere. It is commonly assumed that the atmosphere performs a con-
volution filtering on the light from the star, and that this filter is random
and varies somewhat from one observation to another. Therefore, at each
observation, it is not H(ρ), but H(ρ)G(ρ) that is measured, where G(ρ) is
the filter transfer function operating at that particular time.
Suppose we observe the star N times, for each n = 1, 2, ..., N measur-
ing values of the function H(ρ)Gn (ρ). If we then average over the various
measurements, we can safely say that the first zero we observe in our mea-
surements is the first zero of H(ρ), that is, the first zero of J1 (Rρ). The
first zero of J1 (x) is known to be about 3.8317, so knowing this, we can
determine R. Actually, it is not truly R that we are measuring, since we
also need to involve the distance D to the star, known by other means.
What we are measuring is the perceived radius, in other words, half the
subtended angle. Combining this with our knowledge of D, we get R.
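The quoted first zero of $J_1$ is easy to reproduce: the sketch below builds $J_1$ from its power series (the standard series, not displayed in the text) and locates the zero by bisection.

```python
import math

def j1(x, terms=40):
    # J1(x) = sum_k (-1)^k (x/2)^(2k+1) / (k! (k+1)!), the standard series.
    return sum((-1) ** k * (x / 2) ** (2 * k + 1)
               / (math.factorial(k) * math.factorial(k + 1))
               for k in range(terms))

# Bisect for the first positive zero of J1, bracketed in [2, 5].
lo, hi = 2.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if j1(lo) * j1(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(round((lo + hi) / 2, 4))  # 3.8317
```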
Bessel’s Equation
and, as x → ∞,
$$q(x) = 1 + \frac{1-4\nu^2}{4x^2} \to 1,$$
so, according to the theorem, every non-trivial solution of Bessel’s Equation
has infinitely many positive zeros.
Now consider the following theorem, which is a consequence of the
Sturm Comparison Theorem discussed elsewhere in these notes.
It follows from these two theorems that, for each fixed ν, the function
yν (x) has an infinite number of positive zeros, say λ1 < λ2 < ..., with
λn → ∞.
For fixed ν, let yn (x) = yν (λn x). As we saw earlier, we have the follow-
ing orthogonality theorem.
Theorem 25.3 For $m \neq n$, $\int_0^1 xy_m(x)y_n(x)\,dx = 0$.
$$u'' + \frac{1}{x}u' + \Big(\lambda_m^2 - \frac{\nu^2}{x^2}\Big)u = 0,$$
and
$$v'' + \frac{1}{x}v' + \Big(\lambda_n^2 - \frac{\nu^2}{x^2}\Big)v = 0.$$
Multiplying on both sides by x and subtracting one equation from the other, we get
$$x(uv'' - vu'') + (uv' - vu') = (\lambda_m^2 - \lambda_n^2)xu(x)v(x).$$
Since
$$\frac{d}{dx}\big[x(uv' - vu')\big] = x(uv'' - vu'') + (uv' - vu'),$$
it follows, by integrating both sides over the interval [0, 1], that
$$x(uv' - vu')\Big|_0^1 = (\lambda_m^2 - \lambda_n^2)\int_0^1 xu(x)v(x)\,dx.$$
But
$$x(uv' - vu')\Big|_0^1 = u(1)v'(1) - v(1)u'(1) = 0.$$
Chapter 26
Legendre’s Equations
(Chapter 10,11)
$$-\frac{d}{dx}\Big((1-x^2)y'(x)\Big) = n(n+1)y(x), \tag{26.2}$$
it is a Sturm-Liouville eigenvalue problem with $w(x) = 1$, $p(x) = (1-x^2)$ and $q(x) = 0$. The polynomials $P_n(x)$ are eigenfunctions of the Legendre differential operator $T$ given by
$$(Ty)(x) = -\frac{d}{dx}\Big((1-x^2)y'(x)\Big), \tag{26.3}$$
but we have not imposed any explicit boundary conditions. Nevertheless,
we have the following orthogonality theorem.
$$(1-x^2)\Big[P_n(x)P_m'(x) - P_m(x)P_n'(x)\Big]\Big|_{-1}^{1} = 0, \tag{26.4}$$
from which we conclude that (x − xn ) does not have constant sign on the
interval [−1, 1].
Now that we know that all N roots of PN (x) are real, we can use
orthogonality again to prove that all the roots are distinct.
Theorem 26.3 All the roots of PN (x) are distinct.
Proof: Suppose that $x_1 = x_2$. Then we can write
$$P_N(x) = c(x-x_1)^2\prod_{m=3}^{N}(x-x_m) = (x-x_1)^2Q(x),$$
where $Q(x)$ has degree $N-2$, so that
$$\int_{-1}^1P_N(x)Q(x)\,dx = 0,$$
by orthogonality. But
$$\int_{-1}^1P_N(x)Q(x)\,dx = \int_{-1}^1(x-x_1)^2Q(x)^2\,dx > 0,$$
which is a contradiction.
$$P_0(x) = 1, \qquad P_1(x) = x, \qquad P_2(x) = \frac{1}{2}(3x^2-1),$$
and so on.
Exercise 26.1 Calculate P3 (x).
where
$$\binom{n+1}{k} = \frac{(n+1)!}{k!(n+1-k)!}.$$
Now we find $[(x^2-1)^{n+1}]^{(n+1)}$ by defining $f(x) = (x^2-1)^n$ and $g(x) = x^2-1$.
Since $g^{(n+1-k)} = 0$, except for $k = n+1$, $n$, and $n-1$, the sum above has only three terms. Two of the three terms involve $P_n(x)$ and $P_n'(x)$, which
we would already have found. The third term involves the anti-derivative of
Pn (x). We can easily calculate this anti-derivative, except for the constant.
See if you can figure out what the constant must be.
Exercise 26.2 Use the generating function and the Taylor expansion of
log(t + 1) around t = 0 to prove that
$$\int_{-1}^1P_n(x)P_n(x)\,dx = \frac{2}{2n+1}.$$
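This norm can be checked numerically, evaluating $P_n$ by the two-term recurrence $(k+1)P_{k+1} = (2k+1)xP_k - kP_{k-1}$ and the integral by a midpoint rule (a sketch; function names ours):

```python
def legendre(n, x):
    # Bonnet's recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}.
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def norm_sq(n, steps=20000):
    # Midpoint-rule approximation of integral_{-1}^{1} P_n(x)^2 dx.
    h = 2.0 / steps
    return sum(legendre(n, -1 + (k + 0.5) * h) ** 2 * h for k in range(steps))

print(all(abs(norm_sq(n) - 2 / (2 * n + 1)) < 1e-6 for n in range(5)))  # True
```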
$$P_{2n}(0) = \frac{(-1)^n}{2^nn!}\big(1\cdot3\cdot5\cdots(2n-1)\big).$$
Exercise 26.6 Show that the choice of coefficients in Equation (26.10) for
which the distance in Equation (26.9) is minimized is cm = am , for the an
given in Equation (26.8).
If we select the cn so that the formula in Equation (26.11) is exact for the
functions 1, x, ..., xN −1 , then the formula will provide the exact value of
the integral for any polynomial f (x) of degree less than N . Remarkably,
we can do better than this if we are allowed to select the xn as well as the
cn .
that is, the quadrature method provides the correct answer, not just for
polynomials of degree less than N , but for polynomials of degree less than
2N .
Divide $P(x)$ by $P_N(x)$ to get
$$P(x) = Q(x)P_N(x) + R(x),$$
where both $Q(x)$ and $R(x)$ are polynomials of degree less than $N$. Then
$$\int_{-1}^1P(x)\,dx = \int_{-1}^1Q(x)P_N(x)\,dx + \int_{-1}^1R(x)\,dx = \int_{-1}^1R(x)\,dx,$$
since $P_N(x)$ is orthogonal to every polynomial of degree less than $N$. In addition, $P_N(x_n) = 0$ for each node $x_n$, so that $R(x_n) = P(x_n)$, and therefore
$$\sum_{n=1}^Nc_nR(x_n) = \int_{-1}^1R(x)\,dx = \int_{-1}^1P(x)\,dx = \sum_{n=1}^Nc_nP(x_n).$$
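For $N = 2$ the nodes are the roots $\pm1/\sqrt{3}$ of $P_2(x)$ and both weights equal 1; the rule is then exact for polynomials of degree less than $2N = 4$. A quick check (a sketch):

```python
import math

nodes = (-1 / math.sqrt(3), 1 / math.sqrt(3))  # roots of P_2(x)
weights = (1.0, 1.0)

def gauss2(f):
    return sum(w * f(x) for w, x in zip(weights, nodes))

# Exact integrals over [-1, 1]: x^2 -> 2/3, x^3 -> 0.
print(abs(gauss2(lambda x: x**2) - 2 / 3) < 1e-15)  # True
print(abs(gauss2(lambda x: x**3)) < 1e-15)          # True
```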
Chapter 27

Hermite's Equations and Quantum Mechanics
(Chapter 10,11)
$$i\hbar\frac{\partial\psi(x,t)}{\partial t} = -\frac{\hbar}{2m}\frac{\partial^2\psi(x,t)}{\partial x^2} + V(x,t)\psi(x,t), \tag{27.1}$$
where $\hbar$ is Planck's constant. Here the x is one-dimensional, but extensions to higher dimensions are also possible.
When the solution ψ(x, t) is selected so that
$$|\psi(x,t)| \to 0,$$
as $|x| \to \infty$, and
$$\int_{-\infty}^{\infty}|\psi(x,t)|^2\,dx = 1,$$
then, for each fixed t, the function |ψ(x, t)|2 is a probability density function
governing the position of the particle. In other words, the probability of
finding the particle in the interval [a, b] at time t is
$$\int_a^b|\psi(x,t)|^2\,dx.$$
a
$$f(t) = e^{-iEt/\hbar},$$
where E is defined to be the energy. The function g(x) satisfies the time-independent Schrödinger Equation
$$-\frac{\hbar}{2m}g''(x) + V(x)g(x) = Eg(x). \tag{27.2}$$
An important special case is the harmonic oscillator, with spring constant
$$k = 4\pi^2m\nu^2.$$
The potential energy is $\frac{1}{2}kx^2$, while the kinetic energy is $\frac{1}{2}m\dot{x}^2$. The sum of
the kinetic and potential energies is the total energy, E(t). Since E 0 (t) = 0,
the energy is constant.
$$\frac{\hbar}{2m}g''(x) + \Big(E - \frac{1}{2}kx^2\Big)g(x) = 0, \tag{27.3}$$
where $k = m\omega^2$, for $\omega = 2\pi\nu$. With $u = \sqrt{\frac{m\omega}{\hbar}}\,x$ and $\varepsilon = \frac{2E}{\hbar\omega}$, we have
$$\frac{d^2g}{du^2} + (\varepsilon - u^2)g = 0. \tag{27.4}$$
Equation (27.4) is equivalent to Hermite's equation
$$y'' - 2xy' + 2py = 0,$$
with $\varepsilon = 2p+1$, by writing $y(x) = w(x)e^{x^2/2}$, where $w$ solves Equation (27.4).
In order for the solutions of Equation (27.3) to be physically admissible
solutions, it is necessary that p be a non-negative integer, which means
that
$$E = \hbar\omega\Big(n + \frac{1}{2}\Big),$$
for some non-negative integer n; this gives the quantized energy levels for
the harmonic oscillator.
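The admissible solutions involve the Hermite polynomials $H_n$. As a numerical aside (a sketch; the recurrence is the standard one, not derived in the text), one can verify Hermite's equation $y'' - 2xy' + 2ny = 0$ at a sample point by finite differences:

```python
def hermite(n, x):
    # Physicists' Hermite polynomials: H_{k+1} = 2x H_k - 2k H_{k-1}.
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2 * x * h - 2 * k * h_prev
    return h

# Central-difference residual of H_n'' - 2x H_n' + 2n H_n at a sample point.
n, x, d = 4, 0.7, 1e-5
hpp = (hermite(n, x + d) - 2 * hermite(n, x) + hermite(n, x - d)) / d**2
hp = (hermite(n, x + d) - hermite(n, x - d)) / (2 * d)
print(abs(hpp - 2 * x * hp + 2 * n * hermite(n, x)) < 1e-4)  # True
```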
Chapter 28

Array Processing
(Chapter 8)
In radar and sonar, the field u(s, t) being sampled is usually viewed as a
discrete or continuous superposition of planewave solutions with various
amplitudes, frequencies, and wavevectors. We sample the field at various
spatial locations sm , m = 1, ..., M , for t in some finite interval of time.
We simplify the situation a bit now by assuming that all the planewave
solutions are associated with the same frequency, ω. If not, we perform an
FFT on the functions of time received at each sensor location sm and keep
only the value associated with the desired frequency ω.
In the continuous superposition model, the field is
$$u(s,t) = e^{i\omega t}\int f(k)e^{ik\cdot s}\,dk.$$
for m = 1, ..., M . The data are then Fourier transform values of the complex
function f (k); f (k) is defined for all three-dimensional real vectors k, but
is zero, in theory, at least, for those k whose squared length ||k||2 is not
equal to ω 2 /c2 . Our goal is then to estimate f (k) from finitely many values
of its Fourier transform. Since each k is a normal vector for its planewave
field component, determining the value of f (k) will tell us the strength of
the planewave component coming from the direction k.
The collection of sensors at the spatial locations sm , m = 1, ..., M ,
is called an array, and the size of the array, in units of the wavelength
λ = 2πc/ω, is called the aperture of the array. Generally, the larger the
aperture the better, but what is a large aperture for one value of ω will be
a smaller aperture for a lower frequency.
In some applications the sensor locations are essentially arbitrary, while
in others their locations are carefully chosen. Sometimes, the sensors are
collinear, as in sonar towed arrays. Let’s look more closely at the collinear
case.
We assume now that the sensors are equispaced along the x-axis, at
locations (m∆, 0, 0), m = 1, ..., M , where ∆ > 0 is the sensor spacing; such
an arrangement is called a uniform line array. This setup is illustrated in
Since $k\cdot(1,0,0) = \frac{\omega}{c}\cos\theta$, for θ the angle between the vector k and the x-axis, we see that there is some ambiguity now; we cannot distinguish the cone of vectors that have the same θ. It is common then to assume that the wavevectors k have no z-component and that θ is the angle between two vectors in the x, y-plane, the so-called angle of arrival. The wavenumber variable $k = \frac{\omega}{c}\cos\theta$ lies in the interval $[-\frac{\omega}{c}, \frac{\omega}{c}]$, and we imagine that f(k) is now f(k), defined for $|k| \leq \frac{\omega}{c}$. The Fourier transform of f(k) is F(s), a function of a single real variable s. Our data are then viewed as the values F(m∆), for m = 1, ..., M. Since the function f(k) is zero for $|k| > \frac{\omega}{c}$, the Nyquist spacing in s is $\frac{\pi c}{\omega}$, which is $\frac{\lambda}{2}$, where $\lambda = \frac{2\pi c}{\omega}$ is the wavelength.
To avoid aliasing, which now means mistaking one direction of arrival for another, we need to select $\Delta \leq \frac{\lambda}{2}$. When we have oversampled, so that $\Delta < \frac{\lambda}{2}$, the interval $[-\frac{\omega}{c}, \frac{\omega}{c}]$, the so-called visible region, is strictly smaller than the interval $[-\frac{\pi}{\Delta}, \frac{\pi}{\Delta}]$. If the model of propagation is accurate, all
the signal component planewaves will correspond to wavenumbers k in the
visible region and the background noise will also appear as a superposition
of such propagating planewaves. In practice, there can be components in
the noise that appear to come from wavenumbers k outside of the visible
region; this means these components of the noise are not due to distant
sources propagating as planewaves, but, perhaps, to sources that are in
the near field, or localized around individual sensors, or coming from the
electronics within the sensors.
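Aliasing can be demonstrated in a few lines (illustrative parameters of our choosing): with wavelength λ = 1 and spacing ∆ = 0.75 > λ/2, two distinct wavenumbers inside the visible region |k| ≤ 2π/λ differ by 2π/∆ and produce identical sensor samples.

```python
import cmath
import math

lam, delta = 1.0, 0.75         # delta > lam/2, so aliasing is possible
k1 = -5.0
k2 = k1 + 2 * math.pi / delta  # an indistinguishable wavenumber
assert abs(k1) <= 2 * math.pi / lam and abs(k2) <= 2 * math.pi / lam

samples1 = [cmath.exp(1j * k1 * m * delta) for m in range(1, 9)]
samples2 = [cmath.exp(1j * k2 * m * delta) for m in range(1, 9)]
print(max(abs(a - b) for a, b in zip(samples1, samples2)) < 1e-9)  # True
```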
Using the relation λω = 2πc, we can calculate the Nyquist spacing for
any particular case of planewave array processing. For electromagnetic
waves the propagation speed is the speed of light, which we shall take here to be $c = 3\times10^8$ meters per second. The wavelength λ for gamma rays is around one Angstrom, which is $10^{-10}$ meters; for x-rays it is about one millimicron, or $10^{-9}$ meters. The visible spectrum has wavelengths that are a little less than one micron, that is, $10^{-6}$ meters. Microwaves have wavelengths around one millimeter; broadcast radio has a λ running from about 10 meters to 1000 meters, while the so-called long radio waves can
have wavelengths several thousand meters long. At the one extreme it is
impractical (if not physically impossible) to place individual sensors at the
Nyquist spacing of fractions of microns, while at the other end, managing
to place the sensors far enough apart is the challenge.
The wavelengths used in primitive early radar at the start of World War
II were several meters long. Since resolution is proportional to aperture,
which, in turn, is the length of the array, in units of wavelength, antennae
for such radar needed to be quite large. The general feeling at the time was
that the side with the shortest wavelength would win the war. The cavity
To simplify a bit, we assume here that the sound speed c = c(z) does
not change with range, but only with depth, and that the channel has
constant depth and density. Then, the Helmholtz equation for the function
g(r, z) is
$$\nabla^2g(r,z) + [\omega/c(z)]^2g(r,z) = 0.$$
The Laplacian is
$$\nabla^2g(r,z) = g_{rr}(r,z) + \frac{1}{r}g_r(r,z) + g_{zz}(r,z).$$
r
We separate the variables once again, writing
$$g(r,z) = f(r)u(z).$$
Then, the range function f(r) must satisfy the differential equation
$$f''(r) + \frac{1}{r}f'(r) = -\alpha f(r),$$
and the depth function u(z) satisfies the differential equation
$$u''(z) + \big(k(z)^2 - \alpha\big)u(z) = 0, \quad\text{where}\quad k(z)^2 = [\omega/c(z)]^2.$$
where
$$\gamma_m = \sqrt{k^2 - \lambda_m^2} = (2m-1)\pi/2d, \quad m = 1, 2, \dots.$$
For each m the corresponding function of the range satisfies the differential equation
$$f''(r) + \frac{1}{r}f'(r) + \lambda_m^2f(r) = 0,$$
which has solution $H_0^{(1)}(\lambda_mr)$, where $H_0^{(1)}$ is the zeroth order Hankel-function solution of Bessel's equation. The asymptotic form for this function is
$$\pi iH_0^{(1)}(\lambda_mr) = \sqrt{2\pi/\lambda_mr}\,\exp\Big(-i\Big(\lambda_mr + \frac{\pi}{4}\Big)\Big).$$
It is this asymptotic form that is used in practice. Note that when λm is
complex with a negative imaginary part, there will be a decaying exponen-
tial in this solution, so this term will be omitted in the signal processing.
Having found the range and depth functions, we write g(r, z) as a superposition of these elementary products, called the modes:
$$g(r,z) = \sum_{m=1}^{M}A_mH_0^{(1)}(\lambda_mr)u_m(z),$$
with
$$A_m = (i/4)u_m(z_s),$$
where zs is the depth of the source of the acoustic energy. Notice that
the depth of the source also determines the strength of each mode in this
superposition; this is described by saying that the source has excited certain
modes and not others.
The eigenvalues $\lambda_m$ of the depth equation will be complex when
$$k = \frac{\omega}{c} < \frac{(2m-1)\pi}{2d}.$$
$$u''(v) + \lambda^2u(v) = 0, \quad\text{for } 0 \leq v \leq \frac{\omega d}{c},$$
and
$$u''(v) + \Big(\Big(\frac{c}{c'}\Big)^2 - 1 + \lambda^2\Big)u(v) = 0, \quad\text{for } \frac{\omega d}{c} < v.$$
To have a solution, λ must satisfy the equation
$$\tan(\lambda\omega d/c) = -(\lambda b/b')\Big/\sqrt{1 - \Big(\frac{c}{c'}\Big)^2 - \lambda^2},$$
with
$$1 - \Big(\frac{c}{c'}\Big)^2 - \lambda^2 \geq 0.$$
The trapped modes are those whose corresponding λ satisfies
$$1 \geq 1 - \lambda^2 \geq \Big(\frac{c}{c'}\Big)^2.$$
The eigenfunctions are
$$u_m(v) = \sin(\lambda_mv), \quad\text{for } 0 \leq v \leq \frac{\omega d}{c},$$
and
$$u_m(v) = \exp\Big(-v\sqrt{1 - \Big(\frac{c}{c'}\Big)^2 - \lambda^2}\Big), \quad\text{for } \frac{\omega d}{c} < v.$$
Although the Pekeris model has its uses, it still may not be realistic enough
in some cases and more complicated propagation models will be needed.
Appendices
Chapter 30

Inner Products and Orthogonality
$$|u\cdot v| \leq ||u||\,||v||,$$
with equality if and only if u and v are parallel. From Equation (30.1) we
know that the dot product u · v is zero if and only if the angle between
these two vectors is a right angle; we say then that u and v are mutually
orthogonal.
Cauchy’s inequality extends to complex vectors u and v:
$$u\cdot v = \sum_{n=1}^{N}u_n\overline{v_n}, \tag{30.2}$$
30.1.2 Orthogonality
Consider the problem of writing the two-dimensional real vector (3, −2) as
a linear combination of the vectors (1, 1) and (1, −1); that is, we want to
find constants a and b so that (3, −2) = a(1, 1) + b(1, −1). One way to do
this, of course, is to compare the components: 3 = a + b and −2 = a − b;
we can then solve this simple system for the a and b. In higher dimensions
this way of doing it becomes harder, however. A second way is to make
use of the dot product and orthogonality.
The dot product of two vectors (x, y) and (w, z) in R2 is (x, y) · (w, z) =
xw+yz. If the dot product is zero then the vectors are said to be orthogonal;
the two vectors (1, 1) and (1, −1) are orthogonal. We take the dot product
of both sides of (3, −2) = a(1, 1) + b(1, −1) with (1, 1) to get
1 = (3, −2) · (1, 1) = a(1, 1) · (1, 1) + b(1, −1) · (1, 1) = a(1, 1) · (1, 1) + 0 = 2a,
so we see that $a = \frac{1}{2}$. Similarly, taking the dot product of both sides with (1, −1) gives
$$5 = (3,-2)\cdot(1,-1) = a(1,1)\cdot(1,-1) + b(1,-1)\cdot(1,-1) = 2b,$$
so $b = \frac{5}{2}$.
where the first term on the right is parallel to u and the second one is
orthogonal to u.
How do we find vectors that are mutually orthogonal? Suppose we
begin with (1, 1). Take a second vector, say (1, 2), that is not parallel to
(1, 1) and write it as we did v earlier, that is, as a sum of two vectors,
one parallel to (1, 1) and the second orthogonal to (1, 1). The projection
of (1, 2) onto the line parallel to (1, 1) passing through the origin is
$$\frac{(1,1)\cdot(1,2)}{(1,1)\cdot(1,1)}(1,1) = \frac{3}{2}(1,1) = \Big(\frac{3}{2},\frac{3}{2}\Big),$$
so
$$(1,2) = \Big(\frac{3}{2},\frac{3}{2}\Big) + \Big((1,2) - \Big(\frac{3}{2},\frac{3}{2}\Big)\Big) = \Big(\frac{3}{2},\frac{3}{2}\Big) + \Big(-\frac{1}{2},\frac{1}{2}\Big).$$
The vectors $(-\frac{1}{2},\frac{1}{2}) = -\frac{1}{2}(1,-1)$ and, therefore, (1, −1) are then orthogonal to (1, 1). This approach is the basis for the Gram-Schmidt method for
constructing a set of mutually orthogonal vectors.
• 2: $\langle v,u\rangle = \overline{\langle u,v\rangle}$;
The inner product is the basic ingredient in Hilbert space theory. Using
the inner product, we define the norm of u to be
$$||u|| = \sqrt{\langle u,u\rangle}.$$
The Cauchy-Schwarz Inequality holds for any inner product:
$$|\langle u,v\rangle| \leq ||u||\,||v||,$$
with equality if and only if there is a scalar c such that v = cu. We say that the vectors u and v are orthogonal if $\langle u,v\rangle = 0$. We turn now to some examples.
and
$$||u|| = \sqrt{\sum|u_n|^2}.$$
The sums are assumed to be finite; the index of summation n is singly or doubly infinite, depending on the context. The Cauchy-Schwarz inequality says that
$$\Big|\sum u_n\overline{v_n}\Big| \leq \sqrt{\sum|u_n|^2}\sqrt{\sum|v_n|^2}.$$
and
$$||u|| = \sqrt{\int|f(x)|^2\,dx}.$$
$$\langle u,v\rangle = E(XY)$$
and
$$||u|| = \sqrt{E(|X|^2)},$$
which is the standard deviation of X if the mean of X is zero. The expected values are assumed to be finite. The Cauchy-Schwarz inequality now says that
$$|E(XY)| \leq \sqrt{E(|X|^2)}\sqrt{E(|Y|^2)}.$$
and
$$||u|| = \sqrt{\int|f(x)|^2w(x)\,dx}.$$
$$V_{mn} = \langle v_m, v_n\rangle,$$
an orthogonal basis {u1 , ..., uN } for the span of the vn . Begin by taking
$u_1 = v_1$. For $j = 2, \dots, N$, let
$$u_j = v_j - \frac{u_1\cdot v_j}{u_1\cdot u_1}u_1 - \dots - \frac{u_{j-1}\cdot v_j}{u_{j-1}\cdot u_{j-1}}u_{j-1}. \tag{30.3}$$
One obvious problem with this approach is that the calculations become
increasingly complicated and lengthy as the j increases. In many of the
important examples of orthogonal functions that we study in connection
with Sturm-Liouville problems, there is a two-term recursive formula that
enables us to generate the next orthogonal function from the two previous
ones.
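The projection steps of Equation (30.3) are compact in code (a sketch; function names ours), and reproduce the earlier example in which (1, 1) and (1, 2) lead to the orthogonal pair (1, 1) and (−1/2, 1/2):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    # Subtract from each vector its projections onto the u's built so far.
    basis = []
    for v in vectors:
        u = list(v)
        for b in basis:
            coeff = dot(b, v) / dot(b, b)
            u = [ui - coeff * bi for ui, bi in zip(u, b)]
        basis.append(u)
    return basis

u1, u2 = gram_schmidt([(1, 1), (1, 2)])
print(u1, u2)              # [1, 1] [-0.5, 0.5]
print(dot(u1, u2) == 0.0)  # True
```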
Chapter 31
Chaos
31.3 Stability
A fixed point $z_*$ of f(z) is said to be stable if $|f'(z_*)| < 1$, where $f'(z_*) = r(1-2z_*)$. Since we are assuming that r > 1, the fixed point $z_* = 0$ is unstable. The point $z_* = 1 - \frac{1}{r}$ is stable if 1 < r < 3. When $z_*$ is a stable fixed point, and $z_k$ is sufficiently close to $z_*$, we have
$$|z_{k+1} - z_*| < |z_k - z_*|,$$
so we get closer to $z_*$ with each iterative step. Such a fixed point is attractive. In fact, if r = 2, $z_* = 1-\frac{1}{r} = \frac{1}{2}$ is superstable and convergence is quite rapid, since $f'(\frac{1}{2}) = 0$. We can see from Figure 31.3 that, for 1 < r < 3, the iterative sequence $\{z_k\}$ has the single limit point $z_* = 1 - \frac{1}{r}$.
What happens beyond r = 3 is more interesting. For r > 3 the fixed point $z_* = 1 - \frac{1}{r}$ is no longer attracting, so all the fixed points are repelling. What can the sequence $\{z_k\}$ do in such a case? As we see from Figure 31.3 and the close-up in Figure 31.5, for values of r from 3 to about 3.45, the sequence $\{z_k\}$ eventually oscillates between two subsequential limits; the sequence is said to have period two. Then period doubling occurs. For values of r from about 3.45 to about 3.54, the sequence $\{z_k\}$ has period four, that is, the sequence eventually oscillates among four subsequential limits.
31.4 Periodicity
For 1 < r < 3 the fixed point $z_* = 1 - \frac{1}{r}$ is stable and is an attracting fixed point. For r > 3, the fixed point $z_*$ is no longer attracting; if $z_k$ is near $z_*$ then $z_{k+1}$ will be farther away.
Using the change of variable $x = -rz + \frac{r}{2}$, the iteration in Equation (31.5) becomes
$$x_{k+1} = x_k^2 + \Big(\frac{r}{2} - \frac{r^2}{4}\Big), \tag{31.8}$$
and the fixed points become $x_* = \frac{r}{2}$ and $x_* = 1 - \frac{r}{2}$.
For r = 3.835 there is a starting point x0 for which the iterates are
periodic with period three, which implies, according to the results of Li
and Yorke, that there are periodic orbits with period n, for all positive
integers n [33].
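These regimes are easy to see numerically (a sketch, with parameters of our choosing): after a burn-in, the orbit for r = 2.5 sits at the fixed point 1 − 1/r = 0.6, while for r = 3.2 it alternates between two values.

```python
def logistic_orbit(r, z0=0.3, burn_in=2000, keep=8):
    # Iterate z <- r z (1 - z), discard a transient, return rounded samples.
    z = z0
    for _ in range(burn_in):
        z = r * z * (1 - z)
    samples = []
    for _ in range(keep):
        z = r * z * (1 - z)
        samples.append(round(z, 6))
    return samples

print(sorted(set(logistic_orbit(2.5))))  # [0.6]
print(len(set(logistic_orbit(3.2))))     # 2
```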
In [8] Burger and Starbird illustrate the sensitivity of this iterative scheme
to the choice of x0 . The numbers in the first column of Figure 31.6 were
generated by Excel using Equation (31.9) and starting value x0 = 0.5. To
form the second column, the authors retyped the first twelve entries of the
first column, exactly as shown on the page, and then let Excel proceed to
calculate the remaining ones. Obviously, the two columns become quite
different, as the iterations proceed. Why? The answer lies in sensitivity of
the iteration to initial conditions.
When Excel generated the first column, it kept more digits at each
step than it displayed. Therefore, Excel used more digits to calculate the
thirteenth item in the first column than just what is displayed as the twelfth
entry. When the twelfth entry, exactly as displayed, was used to generate
the thirteenth entry of the second column, those extra digits were not
available to Excel. This slight difference, beginning in the tenth decimal
place, was enough to cause the observed difference in the two tables.
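The same phenomenon is easy to reproduce without Excel. The sketch below uses the illustrative chaotic map x ← x² − 2 rather than the text's Equation (31.9); a perturbation of 10⁻¹⁰ in the starting value grows until the two orbits bear no resemblance to each other.

```python
def orbit_gap(x0, eps=1e-10, n=60):
    # Track |a_k - b_k| for two orbits of x <- x^2 - 2 started eps apart.
    a, b, gaps = x0, x0 + eps, []
    for _ in range(n):
        a, b = a * a - 2, b * b - 2
        gaps.append(abs(a - b))
    return gaps

gaps = orbit_gap(0.5)
print(gaps[0] < 1e-9)         # True: the orbits start essentially together
print(max(gaps[30:]) > 1e-3)  # True: the gap has grown by many orders
```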
For r > 4 the set of starting points in [0, 1] for which the sequence of
iterates never leaves [0, 1] is a Cantor set, which is a fractal. The book by
Devaney [12] gives a rigorous treatment of these topics; Young’s book [48]
contains a more elementary discussion of some of the same notions.
We say that the sequence {zk } is bounded if there is a constant B such that
|zk | ≤ B, for all k. We want to know for which c the sequence generated
by Equation (31.10) is bounded.
In Figure 31.7 those c for which the iterative sequence {zk } is bounded
are in the black Mandelbrot set, and those c for which the sequence is
not bounded are in the white set. It is not apparent from the figure, but
when we zoom in, we find the entire figure repeated on a smaller scale. As
we continue to zoom in, the figure reappears again and again, each time smaller than before.
There is a theorem that tells us that if $|z_k| \geq 1 + \sqrt{2}$ for some k, then the sequence is not bounded. Therefore, if c is in the white set, we will know
this for certain after we have computed finitely many iterates. Such sets are
sometimes called recursively enumerable. However, there does not appear
to be an algorithm that will tell us when c is in the black set. The situation
is described by saying that the black set, often called the Mandelbrot set,
is non-recursive.
Previously, we were interested in what happens as we change c, but
start the iteration at z0 = 0 each time. We could modify the problem
slightly, using only a single value of c, but then starting at arbitrary points
z0 . Those z0 for which the sequence is bounded form the new black set,
called the filled Julia set associated with the function f (z) = z 2 + c.
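The escape criterion mentioned earlier turns membership testing for the white set into a finite computation (a sketch; the iteration limit is our choice):

```python
import math

def escapes(c, max_iter=500):
    # z <- z^2 + c from z_0 = 0; escape is certain once |z| >= 1 + sqrt(2).
    bound = 1 + math.sqrt(2)
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) >= bound:
            return True
    return False  # no escape detected; c may belong to the Mandelbrot set

print(escapes(0.5))   # True: 0, 0.5, 0.75, 1.0625, ... is unbounded
print(escapes(-1.0))  # False: 0, -1, 0, -1, ... is periodic
```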
For much more on this subject and related ideas, see the book by Roger
Penrose [38].
Here $\mathcal{J}(g)(x)$ is the Jacobian matrix of first partial derivatives of the component functions of g; that is, its entries are $\frac{\partial g_m}{\partial x_j}(x)$. The operator T is now
$$z_{k+1} = Tz_k = \frac{z_k}{2} + \frac{1}{2z_k}. \tag{31.15}$$
$$z_{k+1} = Tz_k = \frac{2z_k}{3} + \frac{1}{3z_k^2}. \tag{31.16}$$
Where are the basins of attraction now? Is the complex plane divided up
as three people would divide a pizza, into three wedge-shaped slices, each
containing one of the roots? Far from it, as Figure 31.8 shows. In this
figure the color of a point indicates the root to which the iteration will
converge, if it is started at that point. In fact, it can be shown that, if the
sequence starting at z0 = a converges to z = 1 and the sequence starting
at z0 = b converges to ω, then there is a starting point z0 = c, closer to a
than b is, whose sequence converges to $\omega^2$. For more details, see Schroeder's
delightful book [41].
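The iteration of Equation (31.16) is a one-liner, and experimenting with starting points gives a feel for the intertwined basins (a sketch; the starting points are our choices):

```python
import cmath

def newton_cube(z, iters=60):
    # Newton-Raphson for z^3 = 1: z <- 2z/3 + 1/(3 z^2), Equation (31.16).
    for _ in range(iters):
        z = 2 * z / 3 + 1 / (3 * z * z)
    return z

omega = cmath.exp(2j * cmath.pi / 3)  # a complex cube root of unity
print(abs(newton_cube(1 + 0.1j) - 1) < 1e-9)     # True: basin of 1
print(abs(newton_cube(-1 + 1j) - omega) < 1e-9)  # True: basin of omega
```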
Chapter 32

Wavelets
produced. The antenna, now in receiving mode, picks up the echo f(t), which is related to the original signal by
$$f(t) = A\psi(t - d(t)),$$
where d(t) is the time required for the original signal to make the round trip from the antenna to the target and return back at time t. The amplitude A
incorporates the reflectivity of the target as well as attenuation suffered by
the signal. As we shall see shortly, the delay d(t) depends on the distance
from the antenna to the target and, if the target is moving, on its radial
velocity. The main signal-processing problem here is to determine target
range and radial velocity from knowledge of f (t) and ψ(t).
If the target is stationary, at a distance $r_0$ from the antenna, then $d(t) = 2r_0/c$, where c is the speed of light. In this case the original signal and the received echo are related simply by
$$f(t) = A\psi(t - 2r_0/c).$$
Exercise 32.1 Suppose the target is at a distance r0 > 0 from the antenna
at time t = 0, and has radial velocity v, with v > 0 indicating away from
the antenna. Show that the delay function d(t) is now
$$d(t) = 2\,\frac{r_0 + vt}{c + v},$$
and f(t) is related to ψ(t) according to
$$f(t) = A\psi\Big(\frac{t-b}{a}\Big), \tag{32.1}$$
for
$$a = \frac{c+v}{c-v}$$
and
$$b = \frac{2r_0}{c-v}.$$
Show also that if we select $A = \Big(\frac{c-v}{c+v}\Big)^{1/2}$ then energy is preserved; that is, $||f|| = ||\psi||$.
Exercise 32.2 Let Ψ(ω) be the Fourier transform of the signal ψ(t). Show
that the Fourier transform of the echo f (t) in Equation (32.1) is then
The basic problem is to determine a and b, and therefore the range and radial velocity of the target, from knowledge of f(t) and ψ(t). An obvious approach is to use a matched filter.
32.4 Wavelets
32.4.1 Background
The fantastic increase in computer power over the last few decades has
made possible, even routine, the use of digital procedures for solving prob-
lems that were believed earlier to be intractable, such as the modeling of
large-scale systems. At the same time, it has created new applications
unimagined previously, such as medical imaging. In some cases the math-
ematical formulation of the problem is known and progress has come with
the introduction of efficient computational algorithms, as with the Fast
Fourier Transform. In other cases, the mathematics is developed, or per-
haps rediscovered, as needed by the people involved in the applications.
Only later is it realized that the theory already existed, as with the development of computerized tomography without knowledge of Radon's earlier work on the reconstruction of functions from their line integrals.
It can happen that applications give a theoretical field of mathematics a
rebirth; such seems to be the case with wavelets [28]. Sometime in the 1980s
researchers working on various problems in electrical engineering, quantum
mechanics, image processing, and other areas became aware that what the
others were doing was related to their own work. As connections became
established, similarities with the earlier mathematical theory of approxi-
mation in functional analysis were noticed. Meetings began to take place,
and a common language began to emerge around this reborn area, now
called wavelets. One of the most significant meetings took place in June
of 1990, at the University of Massachusetts Lowell. The keynote speaker
was Ingrid Daubechies; the lectures she gave that week were subsequently
published in the book [11].
There are a number of good books on wavelets, such as [30], [4], and [45].
A recent issue of the IEEE Signal Processing Magazine has an interesting
article on using wavelet analysis of paintings for artist identification [29].
Fourier analysis and synthesis concern the decomposition, filtering, compression, and reconstruction of signals using complex exponential functions as the building blocks; wavelet theory provides a framework in which
other building blocks, better suited to the problem at hand, can be used.
As always, efficient algorithms provide the bridge between theory and prac-
tice.
To illustrate, consider the vector (1, −3, 2, 4), whose entries we view as the values of a signal on four consecutive half-second intervals covering [0, 2). We can write
(1, −3, 2, 4) = 1(1, 1, 1, 1) − 2(1, 1, −1, −1) + 2(1, −1, 0, 0) − 1(0, 0, 1, −1).
The first basis element, (1, 1, 1, 1), does not vary over a two-second interval.
The second one, (1, 1, −1, −1), is orthogonal to the first, and does not vary
over a one-second interval. The other two, both orthogonal to the previous
two and to each other, vary over half-second intervals. We can think of these
basis functions as corresponding to different frequency components and
time locations; that is, they are giving us a time-frequency decomposition.
Suppose we let φ0 (t) be the function that is 1 on the interval [0, 1) and
0 elsewhere, and ψ0 (t) the function that is 1 on the interval [0, 0.5) and −1
on the interval [0.5, 1). Sampling on the four half-second intervals of [0, 2), we say that
φ0 (t) = (1, 1, 0, 0)
and
ψ0 (t) = (1, −1, 0, 0).
Then we write
φ−1 (t) = (1, 1, 1, 1) = φ0 (0.5t),
ψ0 (t − 1) = (0, 0, 1, −1),
and
ψ−1 (t) = (1, 1, −1, −1) = ψ0 (0.5t).
So we have the decomposition of (1, −3, 2, 4) as
(1, −3, 2, 4) = 1φ−1 (t) − 2ψ−1 (t) + 2ψ0 (t) − 1ψ0 (t − 1).
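In this four-sample setting the coefficients 1, −2, 2, −1 can be recovered as inner products, since the four basis vectors are orthogonal, though not orthonormal. A quick numerical sketch:

```python
import numpy as np

# Haar-type basis for four half-second samples on [0, 2):
# phi_{-1}, psi_{-1}, psi_0, and psi_0(t - 1) from the text.
basis = np.array([
    [1, 1, 1, 1],    # constant on [0, 2)
    [1, 1, -1, -1],  # +1 on [0, 1), -1 on [1, 2)
    [1, -1, 0, 0],   # +1 on [0, 0.5), -1 on [0.5, 1)
    [0, 0, 1, -1],   # +1 on [1, 1.5), -1 on [1.5, 2)
], dtype=float)

x = np.array([1.0, -3.0, 2.0, 4.0])

# The basis is orthogonal but not orthonormal, so divide each inner
# product by the squared norm of the corresponding basis vector.
coeffs = basis @ x / np.sum(basis**2, axis=1)
print(coeffs)                      # [ 1. -2.  2. -1.]

# Sanity check: the coefficients reproduce x exactly.
assert np.allclose(coeffs @ basis, x)
```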
with
$$C_\psi = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^2}{|\omega|}\,d\omega$$
for $\Psi(\omega)$ the Fourier transform of $\psi(t)$.
Exercise 32.3 Let $w(t) = \psi_{\mathrm{Haar}}(t)$. Show that the functions $w_{jk}(t) = w(2^j t - k)$ are mutually orthogonal on the interval $[0, 1]$, where $j = 0, 1, \dots$ and $k = 0, 1, \dots, 2^j - 1$.
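The orthogonality asserted in Exercise 32.3 can be verified numerically; the following sketch checks all pairs with $j$ up to 3 on a dyadic grid (the grid size and the range of $j$ are arbitrary choices):

```python
import numpy as np

def haar(t):
    """The Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), else 0."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

def w(j, k, t):
    """w_{jk}(t) = haar(2**j * t - k), supported on [k/2**j, (k+1)/2**j)."""
    return haar(2.0**j * t - k)

# Approximate inner products on [0, 1] by a Riemann sum at midpoints,
# which avoids sampling exactly on the dyadic breakpoints.
n = 2**12
t = (np.arange(n) + 0.5) / n
dt = 1.0 / n

fams = [(j, k) for j in range(4) for k in range(2**j)]
for (j1, k1) in fams:
    for (j2, k2) in fams:
        ip = np.sum(w(j1, k1, t) * w(j2, k2, t)) * dt
        if (j1, k1) == (j2, k2):
            assert abs(ip - 2.0**(-j1)) < 1e-9   # ||w_{jk}||^2 = 2**(-j)
        else:
            assert abs(ip) < 1e-9                # distinct pairs: orthogonal
```

The diagonal check also shows that the family is orthogonal but not orthonormal: the squared norm of $w_{jk}$ shrinks like $2^{-j}$ with its support.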
These functions $w_{jk}(t)$ are the Haar wavelets. Every continuous function $f(t)$ defined on $[0, 1]$ can be written as
$$f(t) = c_0 + \sum_{j=0}^{\infty}\sum_{k=0}^{2^j-1} c_{jk}\, w_{jk}(t)$$
for some choice of $c_0$ and $c_{jk}$. Notice that the support of the function $w_{jk}(t)$,
the interval on which it is nonzero, gets smaller as j increases. Therefore,
the components corresponding to higher values of j in the Haar expansion
of f (t) come from features that are localized in the variable t; such features
are transients that live for only a short time. Such transient components
affect all of the Fourier coefficients but only those Haar wavelet coefficients
corresponding to terms supported in the region of the disturbance. This
ability to isolate localized features is the main reason for the popularity of
wavelet expansions.
The expansion converges in the mean-square sense. The coefficients $c_{jk}$ are found using the IWT:
$$c_{jk} = (W_\psi f)\!\left(\frac{k}{2^j}, \frac{1}{2^j}\right).$$
As with Fourier series, wavelet series expansion permits the filtering of
certain components, as well as signal compression. In the case of Fourier
series, we might attribute high frequency components to noise and achieve
a smoothing by setting to zero the coefficients associated with these high
frequencies. In the case of wavelet series expansions, we might attribute to
noise localized small-scale disturbances and remove them by setting to zero
the coefficients corresponding to the appropriate j and k. For both Fourier
and wavelet series expansions we can achieve compression by ignoring those
components whose coefficients are below some chosen level.
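The compression idea can be illustrated with a discrete orthonormal Haar transform; the transform code and the threshold below are this sketch's own choices, not the text's:

```python
import numpy as np

def haar_forward(x):
    """Orthonormal discrete Haar transform of a length-2**m array:
    repeatedly split into pairwise averages and differences."""
    x = np.asarray(x, dtype=float).copy()
    coeffs = []
    while len(x) > 1:
        avg = (x[0::2] + x[1::2]) / np.sqrt(2.0)
        det = (x[0::2] - x[1::2]) / np.sqrt(2.0)
        coeffs.append(det)        # finest details first
        x = avg
    coeffs.append(x)              # overall (scaled) average, kept last
    return coeffs

def haar_inverse(coeffs):
    """Invert haar_forward exactly."""
    x = coeffs[-1].copy()
    for det in reversed(coeffs[:-1]):
        out = np.empty(2 * len(x))
        out[0::2] = (x + det) / np.sqrt(2.0)
        out[1::2] = (x - det) / np.sqrt(2.0)
        x = out
    return x

def compress(x, threshold):
    """Zero every detail coefficient below threshold, then reconstruct."""
    coeffs = haar_forward(x)
    kept = [np.where(np.abs(d) >= threshold, d, 0.0) for d in coeffs[:-1]]
    return haar_inverse(kept + [coeffs[-1]])

# A smooth ramp plus one short-lived transient at a single sample.
x = np.linspace(0.0, 1.0, 64)
x[20] += 5.0
y = compress(x, threshold=1.0)
```

In this example most of the small detail coefficients come from the slowly varying ramp, while the transient at a single sample produces a handful of large coefficients that survive the threshold, just as the discussion above describes.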
Bibliography
8. Burger, E., and Starbird, M. (2006) Coincidences, Chaos, and All That Math Jazz. New York: W.W. Norton.
14. Fara, P. (2009) Science: A Four Thousand Year History, Oxford Uni-
versity Press.
15. Feynman, R., Leighton, R., and Sands, M. (1963) The Feynman Lec-
tures on Physics, Vol. 1. Boston: Addison-Wesley.
18. Gleick, J. (1987) Chaos: The Making of a New Science. Penguin Books.
22. Greenblatt, S. (2011) The Swerve: How the World Became Modern.
New York: W.W. Norton.
23. Greene, B. (2011) The Hidden Reality: Parallel Universes and the Deep
Laws of the Cosmos. New York: Vintage Books.
29. Johnson, C., Hendriks, E., Berezhnoy, I., Brevdo, E., Hughes, S.,
Daubechies, I., Li, J., Postma, E., and Wang, J. (2008) “Image Pro-
cessing for Artist Identification” IEEE Signal Processing Magazine,
25(4), pp. 37–48.