Notes On Classical Mechanics
THIERRY LAURENS
CONTENTS

I Newtonian Mechanics

1 Newton’s equations
  1.1 Empirical assumptions
  1.2 Kinetic energy
  1.3 Potential energy
  1.4 Total energy
  1.5 Linear momentum
  1.6 Angular momentum
  1.7 Exercises

3 Central fields
  3.1 Central fields
  3.2 Periodic orbits
  3.3 Kepler’s problem
  3.4 Virial theorem
  3.5 Exercises

II Lagrangian Mechanics

4 Euler–Lagrange equations
  4.1 Principle of least action
  4.2 Conservative systems
  4.3 Nonconservative systems
  4.4 Equivalence to Newton’s equations
  4.5 Momentum and conservation

5 Constraints
  5.1 D’Alembert–Lagrange principle
  5.2 Gauss’ principle of least constraint
  5.3 Integrability
  5.4 Integral constraints
  5.5 Existence of closed orbits
  5.6 One-form constraints
  5.7 Exercises

6 Hamilton–Jacobi equation
  6.1 Hamilton–Jacobi equation
  6.2 Separation of variables
  6.3 Conditionally periodic motion
  6.4 Geometric optics analogy
  6.5 Exercises

References
PREFACE
PART I
NEWTONIAN MECHANICS
CHAPTER 1
NEWTON’S EQUATIONS
We begin the physical theory with the essential empirical observations and
their immediate consequences. The material for this chapter is based on [Arn89,
Ch. 1], [AKN06, Ch. 1], and [Gol51, Ch. 1].
The system of equations (1.2) is called Newton’s equations (or Newton’s second law). Unless otherwise noted, we will always assume the particle masses mi are constant. In this case, (1.2) takes the form

mi ẍi = Fi (t, x, ẋ).
(c) Galilean boosts: g3 (t, x) = (t, x + vt) for some fixed velocity v ∈ Rd .
In fact, these examples generate the entire Galilean group (cf. Exercise 1.1).
Galileo’s principle of relativity is the experimental observation that for
an isolated system there exists a reference frame—a choice of origin and
coordinate axes for Rd—in which the equations of motion are invariant under any Galilean transformation.
Such a reference frame is called inertial, and the principle also asserts that all
coordinate systems in uniform motion with constant velocity with respect to an
inertial frame must also be inertial. (This is observed, for example, in a car
traveling at constant velocity and noting that motion inside the car is as if the
car were at rest.) We formulate this mathematically as follows:
Definition 1.3 (Galileo’s principle of relativity). A Newtonian system is iso-
lated if there exists a reference frame so that Newton’s equations (1.2) are
invariant under any Galilean transformation.
Physically, this is requiring that the ambient space is both homogeneous and
isotropic and that time is homogeneous. Geometrically, this principle requires
that if we apply a Galilean transformation to a phase portrait, then the resulting
graph still consists of trajectories.
If Newton’s equations (1.2) hold in an inertial coordinate system, then they
must be invariant with respect to the Galilean group. Let x(t) denote a solution
in these coordinates. Applying the Galilean group generators of Example 1.2,
we find the following conditions on Fi :
(a) Translation invariance: Fi (t, x, ẋ) ≡ Fi (xj − xk , ẋ).
(b) Rotation and reflection invariance: Fi (Ax, Aẋ) = AFi (x, ẋ) for A ∈ O(d).
(c) Boost invariance: Fi (xj − xk , ẋ) ≡ Fi (xj − xk , ẋj − ẋk ).
Note that the third type of transformation in Example 1.2 changes neither ẍ
nor xi − xj .
Proposition 1.4 (Newton’s first law, special case). For an isolated Newtonian
system of one particle, the particle’s acceleration in an inertial coordinate sys-
tem vanishes. In particular, the motion is rectilinear: uniform in time with
constant velocity.
Proof. Taking N = 1, the conditions (a)–(c) above require that F is independent
of x, ẋ, t and is rotationally invariant. Therefore F ≡ 0.
The kinetic energy of the ith particle and the total kinetic energy are given by

Ki = ½ mi |ẋi|² = |pi|²/(2mi),   K = ∑_{i=1}^{N} Ki   (1.4)
respectively. From experience, we know that the magnitude of the velocity and
hence the kinetic energy can be increased and decreased by the force Fi acting
on the ith particle, depending on the force’s magnitude and direction. This is
measured through the work Wi done by the force Fi on the ith particle from
time 0 to t, defined by
Wi = ∫_{x(0)}^{x(t)} Fi · dsi = ∫_0^t Fi(x(τ)) · ẋ(τ) dτ.
(We use dsi to denote the line element of the trajectory xi (t), and so the second
equality is the definition of path integration.) Although work is measured in
the physical space Rd , the total work
W = ∫_{x(0)}^{x(t)} F · ds
is independent of the path chosen for the line integral. (Note that the line
integral path is allowed to be arbitrary, and is not limited to trajectories.) This
independence is equivalent to the work around any simple closed path vanishing,
since two paths with the same endpoints can be concatenated to form one closed
path.
One way for this to be satisfied is if there is a potential energy, a function
V such that F = −∇V . Indeed, if this is the case, then by the fundamental
theorem of calculus we have
W = −∫_y^z ∇V(s) · ds = −V(z) + V(y)
on R² ∖ {(0, 0)}. This is the negative gradient of the angle coordinate, which is
defined on all of R² ∖ {(0, 0)} unlike the angle coordinate alone. Consequently,
by the fundamental theorem of calculus the work done on a particle traveling
around any simple closed curve not enclosing the origin is zero. On the other hand,
the work done on a particle traveling once clockwise about the unit circle is 2π,
which reflects the fact that we cannot define a single-valued angle coordinate on
all of R² ∖ {(0, 0)}. Consequently, the system consisting of one particle subject
to this force is an example of a nonconservative system.
Example 1.8 shows that unlike R², a curl-free vector field on R² ∖ {(0, 0)} is
not necessarily a gradient field. Consequently, the fact that Proposition 1.7 holds
on R^{Nd} ∖ ∆ may initially be surprising. To understand why this works, take
d = 3, fix N − 1 particles, and consider only moving the ith particle. Let S be
a smooth 2-dimensional surface in R³ with boundary. Using Stokes’ theorem
we write

0 = ∮_{∂S} Fi · dsi = ∫_S (∇i × Fi) · n dA,

where n is a unit vector field that is perpendicular to S. For this to hold for
all such surfaces S, we must have ∇i × Fi ≡ 0. So the definition of a
conservative force is an integral formulation of Fi being curl-free on all of R³,
which avoids the issue of Fi not being defined on ⋃_{j≠i} {xj}. As the domain is
all of R³, we know that ∇ × Fi ≡ 0 implies Fi = −∇Vi(xi), but with the caveat
that our potential Vi may not be defined on ⋃_{j≠i} {xj} because F is not defined
on the diagonals ∆.
A similar argument also applies to dimensions d ≠ 3 using differential forms:
if Fi is conservative then Fi is a closed 1-form (i.e. dFi ≡ 0) on all of Rd. This
then implies that Fi is exact (i.e. Fi = −dVi), although the 0-form Vi may not
be defined on ⋃_{j≠i} {xj}. In comparison, the 1-form Fi of Example 1.8 is closed
on R² ∖ {0} (if S ⊂ R² ∖ {0} then the work around ∂S vanishes), but it is not
closed on all of R².
We now have two notions of mechanical energy: kinetic and potential. Their
sum E = K + V is the system’s total energy, and it is conserved under the
dynamics:
Proposition 1.9 (Conservation of energy). For a conservative system, the total
energy E is conserved: E(t) = E(0) for all t ∈ R.
Ė = K̇ + ∇V · ẋ = ẋ · F − F · ẋ = 0.
Let I denote the set of t ∈ R for which (1.7) holds. The set I is connected as
t 7→ (x(t), ẋ(t)) is continuous. The set I is open: we just showed that (1.6) at
time t implies (1.7) for the same t, and (1.7) at time t trivially implies that (1.6)
holds on a neighborhood of t. The set I is also closed, since the inequality (1.7)
is a closed condition. Finally, picking r sufficiently small, we can ensure that
the set I contains t = 0 and thus is nonempty. Altogether, the connectedness of
I implies that I = R. In particular, the blowup condition of Corollary A.6 can
never be satisfied, and so the maximal time of existence cannot be finite.
For intuition about the behavior of solutions, we can picture a small ball
rolling down the graph of V(x). Suppose we have a solution x(t) to Newton’s
equations (1.3) with E(x(t)) ≡ E0. As kinetic energy is nonnegative, the ball
at position x(t) is confined to the region where V(x) ≤ E0. A smaller
potential energy yields a greater kinetic energy by Proposition 1.9, which implies
a greater velocity. This means that the ball gains velocity as it rolls downhill.
This picture makes some facts very intuitive, like that local minima and maxima
of V(x) are stable and unstable equilibria for the system, respectively. For a
general bounded region V(x) ≤ E0, the ball rolls right through any minima and
up towards the boundary V⁻¹(E0).
where Fij is the interaction force between the ith and jth particle and Fi^e is
the external force on the ith particle. A system is called closed if there are
no external forces:

Fi^e ≡ 0   for all i.
We will assume that the interaction forces obey the law of action and
reaction (or Newton’s third law): the experimental observation that the
forces two particles exert on each other are equal and opposite, i.e.

Fij = −Fji.
Example 1.12. Any system of the form in Example 1.6 obeys the law of action
and reaction since eij = −eji .
Interaction forces for non-Newtonian systems generally do not obey this law.
For example, a particle with electric charge q placed in an electromagnetic field
is acted upon by the Lorentz force
F = q( E + (1/c) v × H ),
where E, H are the electric and magnetic fields (which satisfy Maxwell’s equa-
tions) and c is the speed of light. For a system of two electrically charged
particles, the cross product term creates a non-collinear interaction force.
The effect of all external forces together can be observed through the total
(linear) momentum

P = ∑_{i=1}^{N} pi,
which is conserved for a closed system:
Proposition 1.13 (Conservation of linear momentum). The rate of change of the total
momentum is equal to the total force. Moreover, if the forces can be decomposed
as (1.8) and satisfy the law of action and reaction, then the rate of change of the total
momentum is equal to the total external force ∑_i Fi^e. In particular, for a closed
system the total linear momentum is conserved.
Proof. By Newton’s equations (1.2) we have

Ṗ = ∑_{i=1}^{N} ṗi = ∑_{i=1}^{N} Fi = ∑_{i≠j} Fij + ∑_{i=1}^{N} Fi^e = ∑_{i=1}^{N} Fi^e,

which is the second claim. In the last equality, we note that Fij = −Fji causes
the double sum over interaction forces to cancel pairwise.
A similar argument shows that we have component-wise conservation under
weaker assumptions:
Corollary 1.14. Suppose that the forces can be decomposed as (1.8) and that
the law of action and reaction holds. If the total external force is perpendicular to
an axis, then the component of the total momentum along that axis is conserved.
Proof. Let a be a unit vector so that

a · ∑_{i=1}^{N} Fi^e = 0.
By Proposition 1.13, we then have

d/dt (a · P) = 0
and thus a · P is conserved.
As we will see later, the conservation of momentum along an axis is asso-
ciated with the system’s invariance under spatial translations along that axis
(cf. Proposition 4.12). Specifically, we say that a system is symmetric in the
direction of the unit vector a if whenever xi (t) is a solution then so is xi (t) + ca
for all i, for any constant c.
The total momentum can also be observed through the system’s center of
mass (or barycenter)
X = ( ∑_{i=1}^{N} mi xi ) / ( ∑_{i=1}^{N} mi ). (1.10)
It turns out that this definition is independent of the choice of origin, and it is
characterized as the point with respect to which the total momentum vanishes
(cf. Exercise 1.4). Moreover, the total momentum is the same as if all of the
mass were lying at the center of mass:
P = ∑_{i=1}^{N} pi = M Ẋ,   where   M = ∑_{i=1}^{N} mi. (1.11)
Proposition 1.15 (Newton’s first law, general case). The center of mass evol-
ves as if all masses were concentrated at it and all forces were applied to it. In
particular, for a closed system the motion of the center of mass is rectilinear
(i.e. uniform in time with constant velocity).
In this section, we will specialize to the case d = 3 so that we may use the
cross product on R3 . The angular momentum (about the origin) of the ith
particle and the total angular momentum are given by
Li = xi × pi,   L = ∑_{i=1}^{N} Li
respectively. The torque (or moment of force) of the ith particle and the
total torque are given by
Ni = xi × Fi,   N = ∑_{i=1}^{N} Ni
respectively.
When the forces can be decomposed as (1.8), we define the external torque
Ni^e = xi × Fi^e.
which is the first claim. In the third equality we noted that the term ẋi × pi
vanishes because pi is parallel to ẋi . Assuming the decomposition (1.8), we
obtain
L̇ = ∑_{i=1}^{N} xi × Fi = ∑_{i,j=1, i≠j}^{N} xi × Fij + ∑_{i=1}^{N} xi × Fi^e = ∑_{i=1}^{N} Ni^e, (1.12)
which is the second claim. In the last equality we noted that Fij = −Fji causes
the double sum to cancel pairwise.
In particular, for a closed system the RHS of (1.12) vanishes and we have
L̇ = 0.
A similar argument shows that we have component-wise conservation under
weaker assumptions:
Corollary 1.17. Suppose that the forces can be decomposed as (1.8) and that
the law of action and reaction holds. If the total external torque is perpendicular
to an axis, then the projection of the total angular momentum onto that axis is
conserved.
which yields the claim for angular momentum. In the last equality we used the
definition of total momentum (1.11) for the first term and Exercise 1.4 for the
vanishing of the square-bracketed terms. The statement for torque follows by
taking a time derivative.
There is one more dynamical quantity of interest: the moment of inertia
of the ith particle (about the origin), which is given by
Ii = mi |xi |2 .
1.7. Exercises
(a) Show that the equations of motion in the new frame are
where
The new forces Φi and Ψi that appear in the equations of motion for z
are called inertial or fictitious forces.
(b) Differentiate the definition of an orthogonal matrix to show that B −1 Ḃ is
antisymmetric, and write B −1 Ḃz = ω × z where ω is the angular velocity
of the moving frame. Now we have
Ψi = −2mi ω × ż.
1.5 (A way to compute π [Gal03]). In this example we will see a geometric aspect
of phase space appear as a physically measurable quantity. This reinforces that
phase space is an inherent object of a Newtonian system and not merely an
abstract concept.
Consider a frictionless horizontal ray with a vertical wall at the origin. One
small block of mass m is initially at rest on the surface, and a big block of mass
M ≫ m is pushed towards the small block so that the small block is sandwiched
between the large block and the wall. We will count the number of collisions N
the small block makes with the big block or the wall.
(a) Let v1 and v2 denote the velocities of the large and small blocks respectively, and consider the rescaling y1 = √M v1, y2 = √m v2. Plot the initial
energy level set in the (y1 , y2 )-plane to which the motion is confined.
(b) Initially we have v1 < 0, v2 = 0; plot this point in the same (y1 , y2 )-plane.
Assume that each collision is purely elastic, so that when the blocks
collide the total momentum is conserved. Plot the total momentum level
set which contains the initial point; the outgoing velocities are determined
by the other intersection of the level sets. After the first collision the small
block will eventually hit the wall, and we will assume that this collision is
also elastic so that v2 < 0 is replaced by −v2 > 0; plot this new point as
well. Plot a few more iterates of this two-collision pattern in the (y1 , y2 )-
plane.
(c) The pattern repeats until v1 > 0 and 0 < v2 < v1 , so that the large block
is moving away and the small block will neither collide with the wall again
nor catch up to the large block. Sketch this final configuration region in
the (y1 , y2 )-plane.
(d) Connect consecutive points occupied by the system in the (y1 , y2 )-plane.
As the lines are either vertical or mutually parallel, the angle θ at a point between
any two consecutive lines is the same. Show that θ = arctan √(m/M).
(e) For any point, the angle θ at that point subtends an arc on the circle
opposite that point. By considering the length of this arc, show that the
total number N of collisions that occurs is the largest integer such that
N θ < π.
(f) Take M = 100ⁿ m for n > 0 an integer. Show that 10⁻ⁿ N → π as n → ∞,
and so the number N of collisions spells out the digits of π for n sufficiently
large.
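The collision count can also be checked numerically. Below is a minimal sketch (an addition to these notes, not part of the original text) that simulates the elastic collisions directly; the function name and the range of n are illustrative choices.

    def count_collisions(n):
        # M = 100^n * m; block-block collisions are elastic, wall collisions reverse v2.
        m, M = 1.0, 100.0 ** n
        v1, v2 = -1.0, 0.0          # big block moves toward the wall, small block at rest
        count = 0
        while True:
            if v2 > v1:             # small block (left) and big block (right) approach each other
                v1, v2 = ((M - m) * v1 + 2 * m * v2) / (M + m), \
                         ((m - M) * v2 + 2 * M * v1) / (M + m)
            elif v2 < 0:            # small block heads back toward the wall
                v2 = -v2
            else:                   # 0 <= v2 <= v1: final configuration of part (c)
                break
            count += 1
        return count

    for n in range(4):
        print(n, count_collisions(n))   # prints 3, 31, 314, 3141

Floating-point roundoff will eventually miscount for large n, which is where the exact geometric argument of parts (d)–(e) takes over.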
1.6 (Book stacking [Nah16]). We will stack books of mass 1 and length 1 on a
table in an effort to produce the maximum amount of overhang.
(a) Place the first book with its left edge at x = 0 and its right edge lined
up with the end of the table at x = 1. By considering the center of mass
of the book, determine the distance S(1) we can slide the book over the
edge of the table before it falls.
(b) Starting with a stack of two books, we can reason as in part (a) and slide
the top book forward a distance of S(1) while keeping the bottom book
stationary. By considering the center of mass of the two books, determine
the distance S(2) we can slide this two-book configuration before it falls.
(c) Now start with three books, slide the top one a distance of S(1) and then
the top two books as in part (b) in order to produce an overhang S(2)
from the edge of the bottom book. Determine the distance S(3) we can
slide the three-book configuration before it falls.
(d) Postulate a formula for S(n) and prove it by induction. Note that the
overhang S(n) tends to infinity as n → ∞.
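For a quick numerical companion to part (d) (an addition to the text), the sketch below tabulates the standard answer S(n) = ∑_{k=1}^{n} 1/(2k), which is the formula one expects to postulate and prove by induction.

    from fractions import Fraction

    def overhang(n):
        # Conjectured formula from part (d): S(n) = sum_{k=1}^n 1/(2k), half the harmonic series.
        return sum(Fraction(1, 2 * k) for k in range(1, n + 1))

    for n in (1, 2, 3, 4, 10, 100):
        print(n, overhang(n), float(overhang(n)))
    # S(n) ~ (1/2) ln n, so the overhang grows without bound, but only logarithmically.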
Suppose we have a Newtonian system with one degree of freedom, and that
the force is only a function of position:
mẍ = F (x).
Any such system is automatically conservative, because we can always find an
antiderivative −V (x) (unique up to an additive constant) so that
mẍ = −V′(x). (2.1)
We want to understand the qualitative behavior of solutions near a given
point x0 . We may assume that x0 = 0 for convenience, after replacing the
variable x by x − x0 if necessary.
First we begin with the generic case V′(0) ≠ 0. In phase space Rx × Rp,
this means that the vector field (ẋ, ṗ) at the origin is nonzero. From the general
theory of ODEs (cf. Proposition A.18), this implies that there is a smooth change
of variables in a neighborhood of the origin so that the vector field is constant.
More specifically, we have ṗ ≈ −V′(0) near the origin, and so the solution x(t)
is accelerating to the left if V′(0) > 0 and to the right if V′(0) < 0.
With this easy case out of the way, we will now assume V′(0) = 0 for the
remainder of this section. The point (x0 , p0 ) = (0, 0) is then a fixed point (or
equilibrium) of the flow, meaning that the constant function x(t) ≡ 0 (and
p(t) ≡ 0) solves the equation. Our first step is to linearize: Taylor expanding
about x = 0 and keeping only the linear term, we obtain

mẍ = −V′(x) ≈ −V′(0) − V′′(0)x = −V′′(0)x. (2.2)
mẍ = kx
for k > 0 a constant. This is a conservative system with potential and total
energy
V(x) = −½kx²,   E(x, p) = p²/(2m) − ½kx².
The trajectories in phase space are confined to the level sets of E, which look
like axes-symmetric hyperbolas. The origin is a saddle node for this linear
system, and we have the explicit solutions
x(t) = x0 cosh(γt) + (p0/(mγ)) sinh(γt),   p(t) = mγx0 sinh(γt) + p0 cosh(γt),

where γ = √(k/m).
mẍ = −kx
for k > 0 a constant. This is a conservative system with potential and total
energy
V(x) = ½kx²,   E(x, p) = p²/(2m) + ½kx².
The trajectories in phase space are confined to the level sets of E, which look
like axes-parallel ellipses centered at the origin. The origin is called a center
for this linear system, and we have the explicit solutions
x(t) = x0 cos(ωt) + (p0/(mω)) sin(ωt),   p(t) = −mωx0 sin(ωt) + p0 cos(ωt),

where ω = √(k/m).
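As a quick numerical check of these formulas (an addition to the text; the parameter values are arbitrary), the following sketch evaluates the explicit solution and confirms that the total energy stays constant along it, so the phase-space trajectory indeed stays on a single ellipse.

    import numpy as np

    m, k = 2.0, 3.0                          # arbitrary positive mass and spring constant
    omega = np.sqrt(k / m)
    x0, p0 = 1.0, 0.5                        # arbitrary initial data

    t = np.linspace(0.0, 10.0, 2001)
    x = x0 * np.cos(omega * t) + (p0 / (m * omega)) * np.sin(omega * t)
    p = -m * omega * x0 * np.sin(omega * t) + p0 * np.cos(omega * t)

    E = p**2 / (2 * m) + 0.5 * k * x**2      # total energy along the trajectory
    print("max |E(t) - E(0)| =", np.abs(E - E[0]).max())   # ~ machine precision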
ẋ = −y + ax(x² + y²),   ẏ = x + ay(x² + y²)   ⟺   ṙ = ar³,   θ̇ = 1
We will see in sections 2.2 and 2.4 that the prediction of Example 2.2 is
indeed accurate for conservative systems, because the mechanical system (2.1)
has special properties in comparison to general ODEs.
Finally, consider the case V′′(0) = 0. Then the linearized equation mẍ = 0 can be
directly integrated to obtain a linear function for x(t), which describes rectilinear
motion (uniform motion with constant velocity). This is of course not a robust
prediction since ṗ is nonzero whenever V′ ≠ 0, and so we cannot draw any
conclusions.
In section 1.4 we saw that for a conservative Newtonian system the total
mechanical energy is constant along trajectories. We will now give this math-
ematical phenomenon a name and examine its consequences. In addition to
conservative mechanical systems, this also applies to some systems of ODEs
which do not arise from mechanics (cf. Exercise 2.2).
Suppose we have the first-order ODE system
ẋ = f (x) (2.3)
Geometrically, this requires that solutions x(t) lie in level sets of E(x), and
so the quantity E restricts the directions in which trajectories may travel. We
require that E is nonconstant on open sets so that E rules out some directions.
For example, the constant function E(x) ≡ 10 is trivially conserved, but it does
not reveal any information about the behavior of solutions.
A point x∗ where f (x∗ ) = 0 is called a fixed point (or equilibrium)
of (2.3). This implies that the constant function x(t) ≡ x∗ is a solution of (2.3).
A fixed point x∗ is attracting if there exists an open ball B_ε(x∗) centered
at x∗ so that for any initial condition x(0) ∈ B_ε(x∗) the corresponding solutions
x(t) converge to x∗ as t → ∞. Likewise, a fixed point x∗ is repulsive if the
same statement holds with −t in place of t.
Proposition 2.5. If the ODE system (2.3) has a conserved quantity, then there
are no attracting (or repulsive) fixed points.
Proof. Suppose x∗ were an attracting fixed point, and let ε > 0 such that
x(t) → x∗ as t → ∞ for all initial conditions x(0) ∈ B_ε(x∗). Using that E is
continuous and is constant on the trajectory x(t), we have

E(x(0)) = lim_{t→∞} E(x(t)) = E(x∗).

As x(0) ∈ B_ε(x∗) was arbitrary we conclude that E is constant on the open ball
B_ε(x∗), which contradicts our definition of a conserved quantity.
Substituting t 7→ −t yields the claim for repulsive fixed points.
Fix ε > 0 sufficiently small so that x∗ is the only fixed point of f in B_ε(x∗),
x∗ is the only critical point of E in B_ε(x∗), and E′′(x) is positive definite on
B_ε(x∗).
Fix x0 ∈ B_ε(x∗) ∖ {x∗}. Let c = E(x0), and consider the component γ of
the level set E⁻¹(c) in B_ε(x∗) containing x0. We claim that γ is a simple closed
contour containing x∗. For each θ ∈ [0, 2π), consider the value r(θ) > 0 such
that

E(r(θ) cos θ, r(θ) sin θ) = c.

As x∗ is a strict local minimum of E, then we may take ε smaller if necessary
to ensure that r(θ) exists for all θ. The choice of r(θ) is then unique since
E is strictly convex on B_ε(x∗). The function θ ↦ (r(θ) cos θ, r(θ) sin θ) now
provides a continuous parameterization of γ, and thus γ is a simple closed
contour containing x∗ . In fact, we know that γ is also smooth by the implicit
function theorem since ∇E is nonzero on γ.
Next, we claim that the trajectory x(t) starting at x0 must be periodic. We
know that x(t) exists for all t ∈ R, because it is confined to the bounded set γ
and so the blowup condition of Corollary A.6 can never be satisfied. Suppose
for a contradiction that x(t) never repeats any value. Consider the sequence
x(1), x(2), . . . . It is contained in the closed and bounded set γ, and thus must
admit a convergent subsequence. Along this sequence the derivative ẋ = f is
converging to zero. As f is continuous then the value of f at the limit point must
be zero, which contradicts that γ does not contain a fixed point. Moreover, x(t)
must hit every point on γ since there are no fixed points on γ and γ is connected.
Therefore the trajectory starting at x0 is periodic. As x0 ∈ B_ε(x∗) ∖ {x∗}
was arbitrary, we conclude that all trajectories in B_ε(x∗) ∖ {x∗} are closed
orbits.
In Theorem 2.6 we must assume that x∗ is an isolated fixed point, otherwise
there could be fixed points on the energy contour (cf. Exercise 2.3).
The star-shaped assumption clearly guarantees that the level sets of E near
x∗ have an interior and exterior. While there do exist topological results that
we could rely upon to weaken this hypothesis (cf. the Jordan curve theorem and
its extensions), Definition 2.7 is suitable for our purposes and is easily verified
in practice. In particular, the star-shaped criterion is satisfied if E′′ is positive
definite, as we will now show. However, note that our definition can also allow
for higher-order minima on a case-by-case basis.
Lemma 2.8. If E : U → R is smooth on an open set U ⊂ Rⁿ, x∗ ∈ U is an
isolated critical point of E with E(x∗) = 0, and the Hessian matrix E′′(x∗) is
positive definite, then there exists ε > 0 sufficiently small so that the sub-level
sets {x ∈ B_ε(x∗) : E(x) ≤ c} for c ∈ E(B_ε(x∗)) are compact and star-shaped
about x∗ with smooth boundary.

Proof. Fix ε > 0 sufficiently small so that x∗ is the only critical point of E in
B_ε(x∗) and E′′(x) is positive definite on B_ε(x∗). For each unit vector ν ∈ Rⁿ,
|ν| = 1, consider the value r(ν) > 0 such that

E(r(ν)ν) = c.
mẍ = −bẋ − kx

with one degree of freedom, where b > 0 is a damping constant. The total energy
is still

E = ½(mẋ² + kx²),

but now

dE/dt = mẋẍ + kxẋ = −bẋ² ≤ 0.
The total energy E is a weak (but not strong) Lyapunov function. The origin
is globally attracting with three qualitatively different phase portraits:
(a) 0 < b < 2√(km): Under-damped. The origin is a stable spiral node and
the system oscillates infinitely many times with exponentially decaying
amplitude.

(b) b = 2√(km): Critically damped. The origin is a stable degenerate node.
The oscillation and friction balance each other so that trajectories barely
fail to make one complete oscillation. In fact, trajectories approach the
origin faster than in the other two cases.

(c) b > 2√(km): Over-damped. The origin is a stable node and the system
returns to the origin without oscillating.
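The three regimes can be observed numerically. The sketch below (an addition to the text; all parameter values are arbitrary) integrates mẍ = −bẋ − kx with one damping value in each regime and records how often x(t) changes sign and how much energy remains.

    import numpy as np

    def simulate(m, k, b, x0=1.0, v0=0.0, dt=1e-3, T=20.0):
        # Integrate m x'' = -b x' - k x with a simple semi-implicit Euler step.
        x, v = x0, v0
        crossings = 0
        for _ in range(int(T / dt)):
            v += dt * (-(b / m) * v - (k / m) * x)
            x_new = x + dt * v
            if x * x_new < 0:
                crossings += 1          # x(t) changed sign: half an oscillation
            x = x_new
        return crossings, 0.5 * m * v**2 + 0.5 * k * x**2

    m, k = 1.0, 1.0                      # then the critical value is 2*sqrt(km) = 2
    for label, b in (("under-damped", 0.5), ("critically damped", 2.0), ("over-damped", 8.0)):
        n, E = simulate(m, k, b)
        print(f"{label:17s} b={b}: sign changes of x = {n}, final energy = {E:.2e}")
    # Only the under-damped run oscillates (many sign changes); in all three the
    # energy decays toward zero, consistent with dE/dt = -b (x')^2 <= 0.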
For our image of the ball rolling down the graph of the potential energy,
the surface of the graph is now slightly sticky. The ball may still roll through a
minimum, but does not have enough energy to approach the boundary V −1 (E0 )
again and so the permitted region for the ball continually shrinks. If V is
shaped like a bowl about x∗ as in the definition of a Lyapunov function, then
we intuitively expect that the ball tends to the bottom of the bowl and hence
x∗ is stable.
Theorem 2.10. Consider the smooth n-dimensional ODE system (2.3) with a
fixed point x∗ .
(a) If there exists a weak Lyapunov function on a neighborhood of the fixed
point x∗, then x∗ is Lyapunov stable: for any ε > 0 there exists δ > 0
such that |x(0) − x∗| < δ implies |x(t) − x∗| < ε for all t ≥ 0.
(b) If n = 2 and there exists a strong Lyapunov function near the fixed point
x∗ , then x∗ is also asymptotically stable: there exists η > 0 so that |x(0) −
x∗ | < η implies x(t) → x∗ as t → ∞.
Proof. (a) Fix ε > 0. After shrinking ε if necessary, we may assume B_ε(x∗) ⊂ U.
As x∗ is a strict local minimum of E, there exists c > 0 sufficiently small so
that the sub-level set {E(x) ≤ c} is contained in B_ε(x∗). Pick δ > 0 so that the
ball B_δ(x∗) is contained within {E(x) < c}.
Fix x(0) ∈ Bδ (x∗ ). We claim that the trajectory {x(t) : t ≥ 0} can never
enter the exterior of E(x) = c. Suppose for a contradiction that there exists
t > 0 such that x(t) is in the exterior of E(x) = c. Then E(x(t)) > E(x(0)).
As E(x(t)) is smooth, the mean value theorem guarantees that there is a time
t0 ∈ [0, t] such that

(d/dt) E(x(t)) |_{t=t0} > 0.
energy graph were the bottom of a tank filled with water. Closed orbits are of
course impossible again (cf. Exercise 2.5).
Example 2.12. In the over-damped limit for the harmonic oscillator we have

ẋ = −(k/b)x,   V(x) = (k/(2b))x².

This has the solution x(t) = x0 e^{−kt/b}, which is the limiting (i.e. slow timescale)
behavior for the over-damped oscillator after the transient (i.e. fast timescale)
behavior becomes negligible. In this limit, the trajectories in the phase portrait
are confined to the line p = mẋ = −(mk/b)x, which agrees with the fact that we
can no longer take a second arbitrary initial condition p(0) for the new one-dimensional system.
mẍ = F(x)
that is independent of time and velocity (or if F is even in velocity and time).
Note that the force F does not have to be conservative. This system is invari-
ant under the change of variables t ↦ −t since ẍ picks up two factors of −1.
Consequently, if x(t) is a solution then so is x(−t).
By premise x∗ is a linear center (cf. Example 2.2), and so solutions y(t) to the
linearized equation
ẏ = Df (x∗ ) · (y − x∗ )
are concentric ellipses centered at x∗ .
Fix x0 ∈ B_ε(x∗) on the line of reflection. We claim there exists ε sufficiently
small so that x(t) is never on the opposite side of x∗ as y(t). Assuming this, the
trajectory x(t) intersects the line of reflection on the other side of x∗ at some
time t > 0 because the trajectory y(t) encloses the origin. By reversibility, we
can reflect this trajectory to obtain a twin trajectory with the same endpoints
but with its arrow reversed. Taking ε > 0 smaller if necessary, we can ensure
that x∗ is the only fixed point in B_ε(x∗), and so the two trajectories together
form a closed orbit.
It only remains to justify that such an ε > 0 exists. Let T > 0 denote the
period of solutions y(t) to the linearized system. Given an initial condition x0,
define the difference

h_t(x0) = x(t) − y(t)

at time t ∈ [0, T] between the nonlinear and linear solutions starting at x0. The
differential equations for x and y match to first order, and so we have

h_t(0) = 0,   Dh_t(0) = 0,

where D denotes the gradient in the spatial coordinates. As f is smooth, there
exists a constant c_ε so that

|Dh_t(x0)| ≤ c_ε   for all x0 ∈ B_ε(x∗), t ∈ [0, T].

Moreover, c_ε → 0 as ε ↓ 0. Using the mean value theorem we estimate

|h_t(x0)| = |h_t(x0) − h_t(0)| ≤ c_ε|x0|   for all x0 ∈ B_ε(x∗), t ∈ [0, T].
The linear solutions y(t) are ellipses, and so there exists a constant a > 0
(depending on the semi-major and semi-minor axes) so that
Combining the previous two inequalities, we conclude that there exists ε > 0
sufficiently small so that

In other words, for x(0) = y(0) ∈ B_ε(x∗) we have |x(t) − y(t)| < |y(t)| for all
t ∈ [0, T], and so x(t) is never on the opposite side of x∗ as y(t).
This argument can also be applied to specific examples to show the exis-
tence of individual closed, homoclinic, and heteroclinic orbits. The key input is
establishing that the trajectory eventually reaches the hyperplane of symmetry,
and then we can extend the trajectory using time-reversal symmetry.
Note that a general involution R can behave very differently in compari-
son to reflections, and so Theorem 2.15 does not clearly generalize to generic
involutions. In particular, there may not be a hypersurface of fixed points; Ex-
ercise 2.8 provides a linear two-dimensional example where the symmetry only
fixes one point.
is conserved by the motion. Note that (2.7) provides a first-order equation for
x(t) in place of the second-order equation (2.6). In this section, we will use this
observation to solve for x(t) and record some consequences.
Suppose that the potential V (x) is shaped like a well, in the sense that
V (x) → +∞ as x → ±∞. The total energy (2.7) is conserved by Proposition 1.9,
and hence E(t) ≡ E is constant. The kinetic energy 12 mẋ2 is nonnegative, and
so the solution x(t) is confined to the region {x : V (x) ≤ E} in configuration
space Rx . This set is bounded since V (x) → +∞ as x → ±∞, and so the
motion is bounded.
By conservation of energy, the velocity ẋ(t) vanishes for values of x with
V (x) = E; these values are called the turning points of the motion. They are
the two endpoints of the interval of {x : V (x) < E} containing x0 , and they are
the extremal points of the motion.
Most trajectories are periodic and oscillate between the two turning points.
Indeed, if we have V′(x) ≠ 0 at the two turning points, then the trajectory
x(t) reaches the turning point in finite time and doubles back. This is not
true if V′(x) vanishes at one of the turning points. One possibility is that the
trajectory starts at an equilibrium point (x0 , p0 ) in phase space, in which case we
have p0 = 0 and V′(x0) = 0 so that ẋ = 0 and ṗ = 0 respectively. Alternatively,
it could be that the turning point is an equilibrium, in which case the motion
x(t) approaches the turning point as t → ∞ but never reaches it. Indeed, if x(t)
reaches the turning point then we have ẋ(t) = 0 by conservation of energy, but
this violates the uniqueness of the equilibrium solution.
Suppose x(t) is periodic with turning points x1 (E) < x2 (E). Solving the
energy equation (2.7) for ẋ we obtain
ẋ = √( 2[E − V(x)]/m ). (2.8)
[Figure: the graph of a potential well V(x) with an energy level E = 1; the turning points are where the level meets the graph.]
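Equation (2.8) also gives a practical way to compute the period numerically: the time to travel between the turning points and back is τ(E) = 2∫_{x1}^{x2} dx / √(2[E − V(x)]/m). The sketch below (an addition to the text; the potentials and the root-finding bracket are arbitrary choices) evaluates this for a harmonic well, where the answer 2π√(m/k) is known, and for a quartic well.

    import numpy as np
    from scipy.integrate import quad
    from scipy.optimize import brentq

    def period(V, E, m=1.0, bracket=10.0):
        # Turning points V(x) = E on either side of the minimum at x = 0.
        x1 = brentq(lambda x: V(x) - E, -bracket, 0.0)
        x2 = brentq(lambda x: V(x) - E, 0.0, bracket)
        # quad copes with the integrable 1/sqrt singularity at the turning points
        # well enough for a sketch (it may print a convergence warning).
        val, _ = quad(lambda x: 1.0 / np.sqrt(2.0 * (E - V(x)) / m), x1, x2)
        return 2.0 * val

    print(period(lambda x: 0.5 * x**2, E=1.0), 2 * np.pi)   # harmonic well: the two agree
    print(period(lambda x: x**4, E=1.0))                    # quartic well: period depends on E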
2.6. Exercises
(b) Sketch the potential energy and the phase portrait, and convince yourself
that the trajectories correspond to a small ball rolling down the graph of
the potential (cf. section 1.4). Note that near the origin in the phase plane
the diagram looks like that of the harmonic oscillator (cf. Example 2.2),
which is a consequence of the small-angle approximation sin x ≈ x. Iden-
tify all equilibria and heteroclinic orbits. How many trajectories make up
the eye-shaped energy level set from −π ≤ x ≤ π that separates different
modes of behavior, and to what motion do they correspond?
(c) Now add a damping term:

ẍ = −bẋ − (g/ℓ) sin x

where b > 0. Show that dE/dt ≤ 0 along all trajectories, and sketch the
new phase portrait.
2.2 (Non-mechanical conservative system). Consider the Lotka–Volterra model
ẋ = xy,   ẏ = −x².
Show that E(x, y) = x² + y² is a conserved quantity and plot the phase portrait
for this system. Although the origin is a minimum for E, it is not an isolated
fixed point nor a center.
2.4 (Over-damped limit for the harmonic oscillator). We will now justify the
over-damped limit approximation in Example 2.12. The objective is to find a
regime for the damped harmonic oscillator
(c) The coefficient of the ẍ term should be mk/b², and so the limit in which
this term is negligible is ε = mk/b² → 0. Find the general solution
x(t) = c1 e^{k1 t} + c2 e^{k2 t} for the linear equation εẍ + ẋ + x = 0.
(d) Recall that 1/|k| is called the characteristic time of e^{kt}, because after a
time 1/|k| the function has decreased (since k < 0) by a factor of 1/e.
Find the leading term in the Taylor expansion of 1/k1 and 1/k2 about
ε = 0; these are called the fast and slow timescales for the solution.
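A symbolic check of parts (c)–(d) (an addition to the text; it assumes the nondimensionalized equation εẍ + ẋ + x = 0 from part (c)):

    import sympy as sp

    eps = sp.Symbol('epsilon', positive=True)
    k = sp.Symbol('k')

    # Characteristic roots of eps*k**2 + k + 1 = 0, from the ansatz x(t) = exp(k*t).
    for root in sp.solve(sp.Eq(eps * k**2 + k + 1, 0), k):
        print("k =", sp.simplify(root))
        print("   1/k =", sp.series(1 / root, eps, 0, 3))
    # One characteristic time is O(1) (the slow decay of Example 2.12) and the
    # other is O(epsilon) (the fast transient), so the timescales separate as eps -> 0.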
2.5. Show that nonconstant periodic solutions are impossible in a gradient sys-
tem by considering the change in V (x) around such an orbit. Conclude that
any one-dimensional first order ODE has no periodic solutions.
2.6 (Low-regularity existence for gradient systems [San17]). The special form
of gradient systems allows us to establish existence and uniqueness with fewer
regularity assumptions. This is particularly useful for gradient PDE systems,
where Rn is replaced by an infinite-dimensional function space.
Suppose F : Rn → R is a convex (and not necessarily smooth) function.
This guarantees that the sub-differential
Show that

(1/τ)( x^τ_{k+1} − x^τ_k ) ∈ −∂F( x^τ_{k+1} ).
For F differentiable, this is simply the implicit Euler scheme for ẋ =
−∇F (x).
(c) In order to extract a convergent sequence as τ ↓ 0, we need a compactness
estimate. Use the definition of the sequence x^τ_k to show that

∑_{k=0}^{K} (1/(2τ)) |x^τ_{k+1} − x^τ_k|² ≤ F(x0) − F(x^τ_{K+1}).
Use the previous part to show there exists a constant C such that

∫_0^T ½ |(x̃^τ)′(t)|² dt ≤ C.
Conclude that

|x̃^τ(t) − x̃^τ(s)| ≤ C|t − s|^{1/2},   |x̃^τ(t) − x^τ(t)| ≤ Cτ^{1/2}.
(e) Assume that F is bounded below. For any T > 0, use the Arzelà–Ascoli
theorem to show that x̃^τ : [0, T] → Rⁿ admits a uniformly convergent subsequence as τ ↓ 0, and that x^τ converges uniformly to the same limit. After
passing to a further subsequence if necessary, show that (x̃^τ)′ converges
weakly in L²([0, T]). Conclude that the limit x(t) solves the gradient
system for F.
when it exists. Different choices of inner products on the RHS yield different
notions of gradients.
on L2 (Rn ; R), show that formally the gradient flow for the energy E is the
heat equation
∂u/∂t = −∆u.
(b) For the inner product

⟨u, v⟩_{Ḣ¹} = ∫ ∇u(x) · ∇v(x) dx
Note that the higher regularity norm yields less regular solutions: it is
well-known that solutions to the heat equation are automatically smooth,
while solutions u(t, x) = e^{−t} u0(x) to this equation are only as smooth as
the initial data.
is reversible with respect to rotation by π. Note that the presence of stable and
unstable nodes guarantees that this system is not conservative.
2.9 (Pendulum period). Show that for the pendulum

ẍ = −(g/ℓ) sin x,

the period of oscillations of amplitude θ0 is

τ = 4 √(ℓ/g) K( sin(θ0/2) ),   where   K(k) = ∫_0^{π/2} dξ / √(1 − k² sin²ξ)

is the complete elliptic integral of the first kind. By Taylor expanding about
θ0 = 0, find the expansion

τ ≈ 2π √(ℓ/g) ( 1 + θ0²/16 + ⋯ ).
Note that the zeroth order term is the constant-period approximation obtained
by taking sin x ≈ x and thus replacing the pendulum by a harmonic oscillator.
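For a numerical illustration (an addition to the text; the values of g, ℓ, and the amplitudes are arbitrary), the exact period can be evaluated with SciPy’s complete elliptic integral and compared with the expansion above.

    import numpy as np
    from scipy.special import ellipk          # note: ellipk(m) takes the parameter m = k^2

    g, ell = 9.81, 1.0
    for theta0 in (0.1, 0.5, 1.0, 2.0):
        k = np.sin(theta0 / 2)
        tau = 4 * np.sqrt(ell / g) * ellipk(k**2)                     # exact period
        approx = 2 * np.pi * np.sqrt(ell / g) * (1 + theta0**2 / 16)  # two-term expansion
        print(f"theta0 = {theta0:3.1f}:  tau = {tau:.4f} s,  expansion = {approx:.4f} s")
    # The expansion is accurate for small amplitudes; the exact period grows with
    # theta0 and diverges as theta0 -> pi (the heteroclinic orbit of Exercise 2.1).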
2.10 (Existence of solitons). In 1844, Scott Russell famously observed a solitary
traveling wave (now commonly referred to as a soliton) in a canal, contradicting
the popular belief that all water waves must either crest and break or disperse.
In order to explain this phenomenon, the Korteweg–de Vries equation
∂u/∂t = −∂³u/∂x³ − 6u ∂u/∂x
was introduced in [KdV95] as a model for the surface u : Rt × Rx → R of a
shallow channel of water.
(a) We seek traveling wave solutions to this PDE. Insert the ansatz u(t, x) =
h(x − ct) where c > 0 is a constant and obtain an ODE for h(x).
(b) Integrate the equation for h once to obtain a second-order ODE, and write
it in the form d²h/dx² = −V′(h) of a conservative mechanical system for some
potential function V(h).
(c) Use the conserved quantity ½(h′)² + V(h) to sketch the phase portrait
in the (h, h′)-plane. Highlight a unique homoclinic orbit connecting the
fixed point at the origin to itself; the corresponding solution h(x) obeys
h(x) → 0 as x → ±∞, and thus describes the profile of a localized wave.
(d) Use the conservation of ½(h′)² + V(h) to obtain a first-order ODE for h.
This equation is separable, and thus can be integrated. Conclude that
solitary traveling wave solutions are given by the formula

u(t, x) = 2β² sech²( β(x − 4β²t − x0) )

for arbitrary constants x0 ∈ R and β > 0. How is the speed of these waves
related to their amplitude?
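The profile in part (d) can be verified symbolically. The sketch below (an addition to the text) checks that h(ξ) = 2β² sech²(βξ) satisfies the once-integrated traveling-wave equation h″ = ch − 3h² with c = 4β², i.e. the conservative system of part (b) with zero integration constant.

    import sympy as sp

    xi, beta = sp.symbols('xi beta', positive=True)
    c = 4 * beta**2                               # wave speed in terms of the width parameter
    h = 2 * beta**2 * sp.sech(beta * xi)**2       # candidate soliton profile

    # Traveling-wave reduction of u_t = -u_xxx - 6 u u_x with u = h(x - c t),
    # integrated once using decay at infinity: h'' = c h - 3 h^2.
    residual = sp.diff(h, xi, 2) - (c * h - 3 * h**2)
    print(sp.simplify(residual.rewrite(sp.exp)))  # prints 0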
CHAPTER 3
CENTRAL FIELDS
We will examine some examples of systems with more than one degree of
freedom. This selection focuses on the most important examples in order to
provide a baseline intuition; a thorough study of classical mechanics should
include many more examples, e.g. rigid bodies and the mechanical top. The
material is based on [Arn89, Ch. 2], [LL76, Ch. 3], and [Gol51, Ch. 3].
In this section we will solve for the motion of a single particle in R3 subject to
a central force F. A vector field F is called central (about the origin) if all of the
vectors are radial and the magnitude is only a function of the radial coordinate
r = |x|; in other words, F ≡ F (r)r̂. (This definition of course extends to Rd , but
soon we will need to specialize to R3 in order to discuss angular momentum.)
A central field must be conservative, and the corresponding potential energy
V ≡ V (r) depends only on the distance to the origin. This is because F (r) is a
function of one variable and thus we can always find an antiderivative −V (r).
Alternatively, if we write F = F (r)r̂, then the work
∫_{x1}^{x2} F · ds = ∫_{|x1|}^{|x2|} F(r) dr
is path independent and so from Proposition 1.7 we know there exists a radial
potential energy V (r) so that F = −∇V .
The torque of the particle is
N = x × F = F (r)r(r̂ × r̂) = 0,
Proposition 3.1 (Kepler’s second law). The rate of change in the total area
swept by the radius vector as a function of time is constant.
Proof. Let (r, φ) denote polar coordinates within the plane of motion. The
velocity in these coordinates is
(d/dt)(r cos φ, r sin φ) = ṙ (cos φ, sin φ) + rφ̇ (−sin φ, cos φ) = ṙ r̂ + rφ̇ φ̂.
to first order. Together, we see that the total area S(t) swept by the radius
vector obeys
Ṡ = ½ r²φ̇ = L/(2m) (3.2)
and thus is constant.
Using (3.1), we can rewrite the total energy as
E = K + V = (m/2)(ṙ² + r²φ̇²) + V(r) = (m/2)ṙ² + L²/(2mr²) + V(r). (3.3)

This is the total energy for a one-dimensional Newtonian system in the coordinate r with the effective potential energy

Veff(r) = V(r) + L²/(2mr²).
The last term on the RHS is called the centrifugal energy. When the effective
potential is equal to the total energy we have ṙ = 0, which is a turning point for
the one-dimensional system. Unlike in section 2.5 though, the actual particle
is not at rest at such a point because the angle is changing (unless the angular
momentum is zero).
Example 3.2. Kepler’s problem seeks the equations of motion for a particle
moving around a fixed gravitational mass, which is governed by the potential
V(r) = −kr⁻¹
[Figure: the effective potential Veff(r) for the Kepler problem as a function of r, with energy levels E0 > 0 and E0 < 0 indicated.]
We can also use the expression (3.4) for ṙ to solve the separable equation (3.1)
for φ:

φ = ∫ L dr / ( r² √(2m[E − Veff(r)]) ) + φ0. (3.6)
This is a formula for φ as a function of r, at least formally.
The original system of differential equations was six-dimensional: 3 dimen-
sions for the position x ∈ R3 and 3 dimensions for the momentum p ∈ R3 . We
then used 4 conserved quantities to reduce the system to 2 first-order equations,
which could then be integrated. Indeed, the conservation of the angle of L
eliminated 2 degrees of freedom by restricting the motion to be coplanar, and
the conservation of the magnitude of L eliminated another degree of freedom
by providing the first-order equation (3.1) for φ. The fourth conserved quantity
was the total energy E, which we used to obtain the first-order equation (3.4)
for r.
As in section 2.5, the motion is confined to the region Veff ≤ E. Nevertheless,
in general the particle may still be able to approach r = ∞. Indeed, if the
potential energy at infinity V∞ = limr→∞ V (r) = limr→∞ Veff (r) exists and is
finite, then there are unbounded trajectories for energies E ≥ V∞ and we can
define the velocity at infinity v∞ ≥ 0 via E = ½mv∞² + V∞. On the other hand,
the particle may also be able to approach r = 0. In order for this to happen,
the potential V (r) must not outgrow the centrifugal energy:
lim sup_{r↓0} [ V(r) + L²/(2mr²) ] < +∞.
In the remaining case, the effective potential has two turning points rmin
and rmax , which confines the motion within the annulus bounded by these two
radii. Points where r = rmin are called pericenters and points where r = rmax
are called apocenters. The time symmetry for the one-dimensional system in
r implies that the trajectory will be symmetric about any ray from the origin
through a pericenter or apocenter. According to the solution (3.6), the angle
between successive pericenters (or apocenters) is then
Φ = 2 ∫_{rmin}^{rmax} L dr / ( r² √(2m[E − Veff(r)]) ). (3.7)
We have seen two special examples of central fields thus far: the harmonic
oscillator potential
V(r) = kr²,   k > 0, (3.8)

and the gravitational potential

V(r) = −kr⁻¹,   k > 0. (3.9)
The objective of this section is to show that these are the only two central fields
for which all bounded orbits are periodic:
Theorem 3.3 ([Arn89, §8D]). Suppose a particle moves in a smooth central field
on R³ ∖ {0} and there exists a bounded trajectory. If all bounded trajectories
are periodic, then the potential is either the harmonic oscillator potential (3.8)
or the gravitational potential (3.9).
For V(r) = −kr^p with p ∈ (−2, 0), the effective potential still has the same
qualitative shape as in the case p = −1 from Example 3.2: Veff (r) → +∞ as
r ↓ 0 and Veff (r) → 0 as r → ∞, with a negative minimum in between. If
p 6= −1, Theorem 3.3 implies that for E0 < 0 there must be trajectories that
are not periodic. In fact, it can be shown that such trajectories are dense in the
annulus {r : Veff (r) ≤ E0 }.
To begin the proof of Theorem 3.3, suppose V (r) is a central field in which
all bounded orbits are closed. The existence of a closed orbit guarantees that
Veff(r) has a (strict) local minimum at some value r > 0. Indeed, if Veff′(r) < 0
for all r > 0, then we eventually have Veff(r) ≤ E0 − a for some a > 0 and all
t large, and hence ṙ ≥ b > 0 for all t large by conservation of energy. Similarly, if
Veff′(r) > 0 for all r > 0, then we eventually have Veff(r) ≤ E0 − a for some a > 0
and all t large, and hence ṙ ≤ −b < 0 for all t large by conservation of energy. In
either case, we eventually have that ṙ is nonvanishing, and so no closed orbits
could exist.
Let r0 denote a local minimum of Veff . If the initial radius is sufficiently
close to r0 then the energy E0 will be close to Veff (r0 ) and the motion will be
confined to a bounded component of {r : V (r) ≤ E0 }. From (3.7), the angle
between successive pericenters or apocenters is
Φ = 2 ∫_{rmin}^{rmax} L dr / ( r² √(2m[E − V(r) − L²/(2mr²)]) ),

where rmin and rmax are the radial turning points. Substituting x = L/(mr), we
obtain

Φ = √(2m) ∫_{xmin}^{xmax} dx / √( E − V(L/(mx)) − mx²/2 ). (3.10)
This is the period integral (2.9) for the one-dimensional system with potential
W(x) = V( L/(mx) ) + (m/2)x².
Next, we compute the limiting period for small oscillations near minima for
the one-dimensional system with potential W :
Lemma 3.4. Consider a conservative one-dimensional Newtonian system with
smooth potential W (x). If W has a local minimum x0 with value E0 , then
lim_{E↓E0} τ(E) = 2π √( m / W′′(x0) ). (3.11)
Without the error term, this is a harmonic oscillator about x0 with spring
constant k = W′′(x0), and in Example 2.2 we found that the solution can be
expressed in terms of trigonometric functions with period

2π √(m/k) = 2π √( m / W′′(x0) ).
This gives us the formula (3.11) for the leading term of τ (E) as E ↓ E0 . The
convergence of τ (E) as E ↓ E0 can be justified by passing the limit inside the
integral expression (2.9).
m²x0³ = LV′(r0).

W′′(x0) = m ( r0 V′′(r0) + 3V′(r0) ) / V′(r0).
Therefore, taking the limit E ↓ Veff(r0) in the angle integral (3.10) and applying (3.11), we conclude that the angle Φ tends to

Φcir = 2π √( m / W′′(x0) ) = 2π √( V′(r0) / ( r0 V′′(r0) + 3V′(r0) ) ) (3.12)
and in both cases Φ is constant. Plugging this back into Φcir , we obtain
(The case α = 0 corresponds to V (r) = b log r.) We will now split into cases.
First consider the case V(r) = b log r. Taking α = 0 in (3.13) we have
Φcir = √2 π, and so Φ is not a rational multiple of 2π.
Next, consider the case V (r) = arα with α > 0. The constant a must be
positive so that there exists a bounded orbit, and hence V (r) → ∞ as r → ∞.
Substituting x = xmax y in the Φ integral (3.10), we have

Φ = √(2m) ∫_{ymin}^{1} dy / √( U(1) − U(y) ),   U(y) = (m/2)y² + (1/xmax²) V( L/(m xmax y) ).
lim_{E→∞} Φ = π.
Kepler’s problem seeks the motion for the central field with potential

V(r) = −kr⁻¹,   k > 0.
The original motivation was to model a celestial body in motion around a fixed
gravitational object, but this also describes the motion of an electrically charged
particle attracted to a fixed charge.
From section 3.1 we know the motion is coplanar, and the radius r evolves
subject to the one-dimensional effective potential
Veff(r) = −kr⁻¹ + L²/(2mr²).
Note that lim_{r↓0} Veff(r) = +∞ and lim_{r→∞} Veff(r) = 0. If L ≠ 0 then the first
derivative

Veff′(r) = k/r² − L²/(mr³)

has exactly one root for r ∈ (0, ∞) at r = L²/(mk), and so Veff has a strict global
minimum with value

Veff( L²/(mk) ) = −mk²/(2L²).
Given an initial condition, we picked the origin for φ so that the integration constant
above is zero. Solving for r as a function of φ, we obtain

r = ( L²/(mk) ) / ( 1 + √(1 + 2EL²/(mk²)) cos φ ).
respectively.
The eccentricities ε = 1 and ε > 1 correspond to parabolas and hyperbolas
respectively, which agrees with the fact that E ≥ 0 yields unbounded orbits.
Likewise ε = 0 and 0 < ε < 1 correspond to circles and ellipses respectively,
which agrees with the fact that E ∈ [−mk²/(2L²), 0) yields bounded orbits.
For the planets in our solar system, the eccentricities are very small and the
trajectories are nearly circular. Consequently, before solving Kepler’s problem,
scientists (such as Copernicus) believed that the planets’ orbits were perfectly
circular with the Sun at the center. Kepler corrected this, and Kepler’s first
law states that the planetary orbits are ellipses with the Sun lying at a focal
point.
Now we will determine the period τ of a bounded elliptic orbit. Integrating
Kepler’s second law (3.2) over one orbit and recalling the area of an ellipse, we
have

πab = S = Lτ/(2m).

This yields the explicit formula

τ = 2πmab/L = πk √( m/(2|E|³) )

for the period as a function of the energy. Using the formula (3.14) for the
semi-major axis a in terms of the energy E, we obtain

τ = 2π √( ma³/k ). (3.15)
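For a gravitating central body of mass M the force constant is k = GMm, so (3.15) reads τ = 2π√(a³/(GM)) (ignoring the two-body correction discussed below). A quick numerical sketch (an addition to the text; the constants are rounded reference values, not taken from these notes):

    import numpy as np

    G = 6.67e-11          # gravitational constant, m^3 kg^-1 s^-2
    M_sun = 1.989e30      # mass of the Sun, kg (approximate)

    for name, a in (("Mercury", 5.79e10), ("Earth", 1.496e11), ("Mars", 2.279e11)):
        tau = 2 * np.pi * np.sqrt(a**3 / (G * M_sun))      # Kepler's third law
        print(f"{name:8s} a = {a:.3e} m  ->  tau = {tau / 86400:6.1f} days")
    # Roughly 88, 365, and 687 days respectively.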
In practice we know that the mass (or charge) sitting at the origin is not
perfectly stationary, but instead is perturbed by the particle’s presence. We
will now remedy this. The two-body problem seeks the motion for a closed
system consisting of two gravitational bodies with positions xi and masses mi ,
for i = 1, 2. The system is conservative with potential
V(x1, x2) = −G m1 m2 / |x1 − x2|, (3.16)
The virial theorem is a general formula for the long-time average of a system’s
kinetic energy. In the special case of a single particle in a homogeneous central
field, it takes a particularly simple form.
Suppose we have a system of N particles in Rd . The virial theorem is based
upon the following simple calculation:
(d/dt) ∑_{i=1}^{N} xi · pi = ∑_{i=1}^{N} ẋi · pi + ∑_{i=1}^{N} xi · ṗi = 2K + ∑_{i=1}^{N} Fi · xi. (3.19)
Therefore, if all particle motion is bounded, then the LHS of (3.21) vanishes.
Altogether, we conclude:
Theorem 3.5 (Clausius’ virial theorem [Cla70]). If the motion of a Newtonian
system of N particles is bounded, then the long-time average (defined by (3.20))
of the kinetic energy obeys
−2⟨K⟩ = ⟨ ∑_{i=1}^{N} Fi · xi ⟩. (3.22)
Proof. It only remains to prove the special case (3.23). Writing V(r) = kr^α for
k ∈ R a constant, we have

⟨K⟩ = −½⟨V⟩.
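A numerical illustration of the Kepler case ⟨K⟩ = −½⟨V⟩ above (an addition to the text; units and initial data are arbitrary): the sketch below integrates a bounded orbit with a leapfrog step and compares the two time averages.

    import numpy as np

    k, m = 1.0, 1.0                        # V(r) = -k/r in arbitrary units
    x = np.array([1.0, 0.0])
    v = np.array([0.0, 0.8])               # energy 0.32 - 1 < 0, so the orbit is bounded

    dt, steps = 2e-4, 200_000              # roughly ten orbital periods
    K_sum = V_sum = 0.0
    for _ in range(steps):
        r = np.linalg.norm(x)
        v += 0.5 * dt * (-k * x / (m * r**3))     # kick
        x = x + dt * v                             # drift
        r = np.linalg.norm(x)
        v += 0.5 * dt * (-k * x / (m * r**3))     # kick
        K_sum += 0.5 * m * (v @ v)
        V_sum += -k / r
    print(K_sum / steps, -0.5 * V_sum / steps)     # the two numbers nearly agree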
The version (3.22) of the virial theorem is commonly used in physical appli-
cations. Mathematically however, the computation (3.19) is arguably equally
as important, particularly in deriving monotonicity formulas.
Example 3.7. Consider a single particle moving in a central field in R3 with
potential V , and suppose that the potential is repulsive in the sense that the
radial component of the force always points away from the origin:
d²/dt² |x(t)|² ≥ (4/m) K > 0,

d²/dt² |x(t)| ≥ |π_x(ẋ)|² / |x| > 0.
Consequently |x(t)| is strictly convex, or equivalently (x/|x|) · ẋ is strictly increasing. Physically, this tells us that a particle initially moving towards the origin
is slowing down, and a particle moving away from the origin cannot reverse
direction.
If the potential V is also nonnegative, then we have
½ m|ẋ|² ≤ E0
(provided that x(t) exists for all t ∈ R and is nonvanishing). Physically, this
tells us that the angular component of the velocity is decaying in time and the
motion becomes predominantly radial.
The estimate (3.26) is an example of a Morawetz inequality, and such
inequalities have played an important role in the context of PDEs. They are
named after Morawetz’s pioneering work on the scattering problem for the linear
wave equation with an obstacle; see [Mor75, LP89] for details. More recently,
Morawetz inequalities have also proven to be a powerful tool in the study of
nonlinear PDEs.
For more examples of monotonicity formulas in the context of Newtonian
systems, see [Tao06, §1.5]. For an introduction to monotonicity formulas for
PDEs, see [KV13, §7] and [Tao06, Ch. 2-3].
3.5. Exercises
V(r) = −kr⁻²
for k > 0 a constant. Show that trajectories with negative energy reach the
origin in finite time by considering the effective potential.
3.2 (Method of similarity). Suppose that the potential energy of a central field
is a homogeneous function of degree α:
V(r) = kr^α
3.4 (Cosmic velocities). Consider the gravitational potential energy of the Earth
as in the previous problem. The escape velocity v2 is sometimes called the second
cosmic velocity. The first cosmic velocity is the speed of a particle in a circular
orbit with radius equal to that of the Earth. Find the first cosmic velocity v1
and show that v2 = √2 v1.
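A short numerical sketch (an addition to the text, using the Earth values quoted in Exercises 3.8–3.9): equating gravitational and centripetal acceleration gives v1 = √(GM/R), and the escape condition ½mv² = GMm/R gives v2 = √(2GM/R) = √2 v1.

    import numpy as np

    G = 6.67e-11      # gravitational constant, m^3 kg^-1 s^-2
    M = 5.98e24       # mass of the Earth, kg
    R = 6.37e6        # radius of the Earth, m

    v1 = np.sqrt(G * M / R)        # circular orbit at the Earth's surface
    v2 = np.sqrt(2) * v1           # escape velocity
    print(f"v1 = {v1 / 1e3:.1f} km/s, v2 = {v2 / 1e3:.1f} km/s")   # about 7.9 and 11.2 km/s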
3.5 (Geosynchronous orbit [Nah16]). It is useful for communication satellites
to be in geosynchronous orbit, so that their orbital period is one day and the
satellite appears to hover in the sky. We will calculate the height of this orbit
for Earth in two different ways.
(a) Let m be the satellite’s mass, M = 5.98 × 1024 kg be the Earth’s mass, v
be the satellite’s velocity, and Rs the radius of the satellite’s circular orbit.
Determine Rs by equating the gravitational and centripetal accelerations
of a circular orbit and writing v = 2πRs /T where T is the length of a day.
(b) Use Kepler’s third law to calculate the same value for Rs .
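A sketch of the computation in part (a) (an addition to the text; T is taken to be 24 hours, ignoring the small difference from the sidereal day):

    import numpy as np

    G, M = 6.67e-11, 5.98e24       # gravitational constant and Earth mass, SI units
    T = 24 * 3600.0                # length of a day in seconds

    # G M m / Rs^2 = m v^2 / Rs with v = 2 pi Rs / T  =>  Rs^3 = G M T^2 / (4 pi^2).
    Rs = (G * M * T**2 / (4 * np.pi**2)) ** (1 / 3)
    print(f"Rs = {Rs / 1e3:.0f} km, i.e. about {(Rs - 6.37e6) / 1e3:.0f} km above the surface")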
3.6 (Satellite paradox [Nah16]). Satellites in low Earth orbit experience signif-
icant atmospheric drag, which actually increases the speed of the satellite.
E = −½mv².

Ė = −cv
3.7 (Solar and lunar tides [Nah16]). The gravitational force between two bodies
of masses m1 and m2 has magnitude

F = G m1 m2 / r²,
where r is the distance between the bodies’ centers and G is the universal
gravitational constant. For the Earth, Sun, and Moon, we have
(a) Find the ratio of the Sun’s and the Moon’s gravitational forces on the
Earth. Even though the Sun is much farther from the Earth, the Sun’s
gravitational force on the Earth is much greater than the Moon’s.
(b) As the Earth is not a point mass, then the Sun’s gravitational force is
stronger (weaker) on the side of the Earth closest (farthest) from the Sun.
This causes water to bulge at the points closest and furthest from the Sun,
which is called the solar high tides. Calculate the maximum difference in
gravitational force in terms of Earth’s radius R for both the Sun and the
Moon.
(c) Extract the leading term in the limit R/Rs ≪ 1 and R/Rm ≪ 1 for each
expression in part (b). Calculate their ratio and conclude that, although
the Sun’s gravitational force is stronger, the lunar tides are more than
twice as large as the solar tides.
3.8 (Energy of the ocean tides [Nah16]). The lunar tides are not directly in line
with the centers of the Earth and Moon, but are rather carried ahead slightly
by the Earth’s rotation and friction. This means that the Moon’s gravitational
pull on both tides produces torque. The Moon’s pull on the farther tide in-
creases the Earth’s rotational speed, but the stronger pull on the nearer tide
is counter-rotational, and so the overall effect decreases the Earth’s rotational
speed. Atomic clocks have measured that the length of a day is increasing at
the rate of about 2 milliseconds per century.
(a) Let Ω denote the angular rotation rate of the Earth and let T denote the
length of a day in seconds, so that ΩT = 2π. By integrating the kinetic
energy over the volume of the Earth, show that the rotational energy E
is given by
E = ½Ω²I,   where   I = ∫_{r≤R} ρ (x² + y²) dx dy dz
and ρ denotes the mass density.
(b) Assuming the Earth is a sphere of constant density, mass M, and radius R, show that
I = (2/5)MR².
The Earth is not a constant-density sphere, and so rather than 2/5 the
coefficient is approximately 0.3444.
(c) Write the rotational energy E as a function of the period τ, and show that
dE/dτ = −(4 × 0.3444 × Mπ²R²)/τ³.
Taking τ to be the length of a day, Δτ to be 2 milliseconds, M =
5.98 × 10²⁴ kg, and R = 6.38 × 10⁶ m, find the change in the Earth's rota-
tional energy ΔE over a century. Dividing ΔE by the number of seconds
in a century, conclude that the power of the ocean tides is 3,260 gigawatts
(which is 4.37 billion horsepower).
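For readers who want to reproduce the figure, here is a brief Python sketch of part (c) (an illustration only; it simply evaluates the formula above with the stated values):

import math

M, R = 5.98e24, 6.38e6         # Earth mass (kg) and radius (m)
tau = 86400.0                  # length of a day, s
dtau = 2e-3                    # increase in day length over a century, s
coeff = 0.3444                 # I = coeff * M * R^2

# E(tau) = (1/2) I (2*pi/tau)^2, so dE/dtau = -4*coeff*M*pi^2*R^2/tau^3.
dE_dtau = -4 * coeff * M * math.pi**2 * R**2 / tau**3
dE = dE_dtau * dtau                          # change over a century, J
seconds_per_century = 100 * 365.25 * 86400
power = abs(dE) / seconds_per_century        # W
print(f"dE ~ {dE:.2e} J per century, power ~ {power/1e9:.0f} GW")   # ~3,260 GW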
3.9 (Moon recession rate [Nah16]). As in Exercise 3.8, tidal friction decreases
the Earth’s rotational angular momentum. Consequently, the Moon’s orbital
angular momentum increases in order to conserve total angular momentum,
which results in the Moon drifting away from the Earth. We will estimate this
recession rate, assuming that all of the momentum is transferred to the Moon’s
orbit (rather than its rotation).
(a) Consider the Moon as a point mass m orbiting circularly about the Earth
at a radius r, with speed v and angular speed ω radians per second. What
is the magnitude Lm of Moon’s orbital angular momentum about the
Earth?
(b) The gravitational force on the Moon by the Earth has magnitude
F = GMm r⁻²,
where M is the mass of the Earth and G is the universal gravitational
constant. Equating the gravitational and centripetal accelerations of the
Moon, find v as a function of r. Use this to determine the angular mo-
mentum Lm as a function of r.
(c) From part (b) of Exercise 3.8, the rotational angular momentum of the
Earth is Lₑ = 0.3444MR²Ω, where Ω is the Earth's rotation rate. Expressing
Ω in terms of the day length T in seconds, find Lₑ as a function of T and
calculate dLₑ/dT.
(d) Using the daily change ΔT = 2 × 10⁻⁵/365 seconds in the length of a day,
approximate the daily and the yearly change ΔLₑ in the Earth's rotational angular momentum.
(e) Equating change in the Moon’s orbital momentum ∆Lm with the change
in Earth’s rotational momentum |∆Le |, find the yearly change in the
Moon’s orbital radius. Using the values
M = mass of the Earth = 5.98 × 1024 kg,
m = mass of the Moon = 7.35 × 1022 kg,
r = radius of Moon’s orbit = 3.84 × 108 m,
R = radius of the Earth = 6.37 × 106 m,
G = gravitational constant = 6.67 × 10−11 m3 kg−1 s−2 ,
conclude that the Moon is receding from the Earth at a rate of 3.75 cen-
timeters (or 1.48 inches) per year. This value is in outstanding agreement
with measurements made by a laser on Earth and corner cube reflectors
on the Moon.
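The arithmetic of parts (c)–(e) can be carried out in a few lines; the following sketch (illustrative only, using the values listed above) reproduces the quoted recession rate.

import math

M, m = 5.98e24, 7.35e22        # Earth and Moon masses, kg
r, R = 3.84e8, 6.37e6          # Moon's orbital radius and Earth's radius, m
G, T = 6.67e-11, 86400.0       # gravitational constant; day length, s

# (c) Le = 0.3444*M*R^2*(2*pi/T), so dLe/dT = -0.3444*M*R^2*2*pi/T^2.
dLe_dT = -0.3444 * M * R**2 * 2 * math.pi / T**2
# (d) Yearly change in the day length (2 ms per century) and in Le.
dLe_year = dLe_dT * 2e-5
# (e) Lm = m*sqrt(G*M*r), so dLm/dr = (m/2)*sqrt(G*M/r).
dLm_dr = 0.5 * m * math.sqrt(G * M / r)
dr_year = abs(dLe_year) / dLm_dr
print(f"Moon recedes ~ {dr_year*100:.2f} cm per year")   # about 3.75 cm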
3.10. Consider again the central field with potential energy
V(r) = −kr⁻²
of Exercise 3.1, and let E denote the total energy. Conclude that trajectories with negative energy
reach the origin in finite time.
3.11 (Central field scattering). In this problem we consider a classical model for
a beam of charged particles passing near a repulsive central charge. Consider a
repulsive central field in R3 that tends to zero as |x| → ∞. Suppose we have a
uniform beam of particles all of the same mass and energy whose motion begins
and ends collinearly at infinity. The intensity I of the beam is the number of
particles crossing unit area normal to the initial direction of travel per unit time.
(a) Define the impact parameter s for a particle of mass m and initial velocity v₀ via
L = mv₀s = s√(2mE).
Using the facts from section 3.1, show that for the solid angle Ω ⊂ S² the
scattering cross section σ(Ω)—the number of particles scattered per unit
solid angle per unit time divided by the incident intensity—is given by
σ(Θ) = −(s/sin Θ)(ds/dΘ),
where Θ is the angle between the incident and scattered beams.
(b) Suppose the incident particles have charge −q < 0 and the fixed particle
has charge −Q < 0, so that the motion is dictated by the repulsive Coulomb force. Show that
sin(Θ/2) = 1/√(1 + (2Es/qQ)²),   cot(Θ/2) = 2Es/qQ.
σ′(θ) = 4 cos θ σ(2θ),   |v₁|/|v₀| = cos θ.
PART II
LAGRANGIAN MECHANICS
CHAPTER 4
EULER–LAGRANGE EQUATIONS
The principle of least action often bears Hamilton’s name, but it was also
independently discovered by Jacobi.
In this section, we will see how to extract equations of motion from the
principle of least action. The principle is called “least” action because it turns
out that the motion is often a minimum; however, we will not make use of this
additional assumption.
The goal of the calculus of variations is to find the extrema of functionals.
A path from a0 ∈ M to a1 ∈ M (not necessarily distinct) starting at time 0
and ending at time T > 0 is a smooth map γ : [0, T ] → M with γ(0) = a0 and
γ(T ) = a1 . Let Ω denote the collection of all such paths. A functional is a
function Φ from Ω into R.
Example 4.3. The arc length of the graph of x(t) from [0, T ] into M = Rn is
a functional. It takes as input a smooth function x : [0, T ] → Rn and returns
the value
Φ(x(t)) = ∫₀ᵀ √(1 + |ẋ|²) dt.
Intuitively, we expect the path of minimum length between two points to be the
line segment connecting those two points. Indeed, by the fundamental theorem
of calculus we have
|(T, a₁) − (0, a₀)| = |∫₀ᵀ (1, ẋ) dt| ≤ ∫₀ᵀ |(1, ẋ)| dt = ∫₀ᵀ √(1 + |ẋ|²) dt,
with equality if and only if ẋ is constant. The more systematic machinery that
we develop in this section should also return this answer.
In order to solve for the optimizer, we want to obtain an equation that must
be satisfied by the extremum of our functional. From calculus we expect that the
first derivative should vanish at an extremum, and so we want to define a notion
of first derivative for functionals. For more concrete examples we do not need to
assume arbitrary smoothness (cf. Exercise 4.2) or even that an extremum exists
a priori (cf. Exercise 4.5), but for clarity we will assume the extremum exists
and that everything is smooth.
A (fixed-endpoint) variation of a path γ ∈ Ω is a smooth map H(s, t) =
Hₛ(t) from (−ε, ε) × [0, T] to M for some ε > 0 such that:
• H₀ = γ,
• Hₛ ∈ Ω for all s ∈ (−ε, ε), and
• H(s, tᵢ) = aᵢ for i = 0, 1 for all s ∈ (−ε, ε).
In other words, the paths Hₛ for various s form a smooth deformation of γ,
which is equal to γ at s = 0 and always connects a₀ to a₁ for other values of s.
In analogy with calculus on Euclidean space, we will call a functional Φ
differentiable at a path γ ∈ Ω with derivative (or first variation) dΦ|γ (or
δΦ) if the limit
dΦ|_γ(∂Hₛ/∂s) = lim_{s→0} (Φ(Hₛ) − Φ(γ))/s
exists for all variations H of γ. In other words, there should exist a function
dΦ|γ so that the Taylor expansion
Φ(Hₛ) = Φ(γ) + s dΦ|_γ(∂Hₛ/∂s) + o(s)
holds. We expect the function dΦ|_γ to be linear in ∂Hₛ/∂s (like v ↦ ∇f(x₀)·v on
Euclidean space). We need dΦ|_γ to be a function of ∂Hₛ/∂s instead of H because the
space of derivatives ∂Hₛ/∂s is linear, unlike the space of variations H. (These details do not
appear in the Euclidean theory, but we present them for the sake of the general
manifold theory.) We can think of Ω as a manifold (albeit infinite-dimensional),
and a variation as a path on Ω through γ at s = 0. The derivative ∂Hₛ/∂s at s = 0 should
then be a tangent vector to Ω at γ, given by the collection of derivatives
(∂H/∂s)(0, t) ∈ T_{γ(t)}M   for t ∈ [0, T].
As
(∂H/∂s)(0, t₀) = 0   and   (∂H/∂s)(0, t₁) = 0
by the fixed-endpoint requirement, we indeed have (∂Hₛ/∂s)(0) ∈ T_γΩ. Now
we define
dΦ|_γ(∂Hₛ/∂s) = (d/ds)|_{s=0} Φ ∘ Hₛ.
One can check that dΦ|γ is well-defined and linear as expected (cf. Exercise 4.1).
Example 4.4. Consider the arc-length functional Φ of Example 4.3. Given
two points a0 , a1 ∈ Rn , the space Ω of paths from a0 to a1 is the set of smooth
functions [0, T] → Rⁿ with x(tᵢ) = aᵢ for i = 0, 1. For any path x : [0, T] → Rⁿ,
a variation Hₛ(t) of x takes the form x(t) + hₛ(t), where hₛ : [0, T] → Rⁿ is
smooth, satisfies hₛ(0) = 0 = hₛ(T) for all s ∈ (−ε, ε), and h₀(t) ≡ 0.
The derivative ∂H/∂s is equal to ∂hₛ/∂s(t). As h₀(t) ≡ 0, the variation
at s = 0 is equal to x(t), and so (∂H/∂s)(0, t) = (∂hₛ/∂s)(0, t) is a vector centered at
x(t) and pointing in the direction of the variation hₛ(t) for small s. Moreover,
(∂H/∂s)(0, tᵢ) = (∂hₛ/∂s)(0, tᵢ) = 0 for i = 0, 1 since hₛ(0) = 0 = hₛ(T) for all s ∈ (−ε, ε).
A tangent vector W to Ω at x(t) is a set of vectors {wt : t ∈ [0, T ]} such that
wt is centered at x(t), the vectors w0 and wT at the endpoints vanish, and wt
depends smoothly on t (in the sense that t 7→ v · wt is smooth, or equivalently
every component of wt is a smooth function of t).
For the rest of this section, we will restrict our attention to the action func-
tional S defined in (4.1). If we take L(t, x, y) = √(1 + |y|²), we recover the
arc-length functional of Example 4.3.
Consequently, a path q(t) is a critical point for the action if and only if q(t)
solves
(d/dt)(∂L/∂q̇)(t, q(t), q̇(t)) − (∂L/∂q)(t, q(t), q̇(t)) = 0. (4.3)
The n second-order differential equations (4.3) are called the Euler–
Lagrange equations for the functional S, or simply the Lagrange equa-
tions when applied to a mechanical system. Note that (4.3) must hold for
any choice of coordinates q, where ∂L/∂q is the gradient of L(t, q, v) in the q
variables. The derivative ∂L/∂q̇ is a convenient notation for the derivative of the
Lagrangian L(t, q, v) in the velocity variables v, and should technically be no-
tated as (∂L/∂v)(t, q, q̇). Although (4.3) is often shortened to
d/dt(∂L/∂q̇) − ∂L/∂q = 0,
we are meant to plug in q and q̇ before taking the total time derivative d/dt, which
for example will turn q̇ into q̈.
Proof. Fix a set of coordinates q on M , let γ be a path, and let H be a variation
of γ. By taking the variation of γ to be supported within the image of the
coordinate patch q and shrinking the interval [0, T ] if necessary, we will work
solely within the domain of q in Rn . Let x(t) denote the path in Rn , L (t, x, ẋ)
denote the Lagrangian in Rn , and x(t) + h(t) the variation Hs (t) in Rn (where
the s-dependence of h is suppressed). Note that the fixed endpoint condition
requires that h(0) = 0 = h(T ).
Now that we may work in Euclidean space, we can now use the key idea that
lies at the heart of the Euler–Lagrange theory. As L is differentiable, we may
Taylor expand and write
L(t, x+h, ẋ+ḣ) = L(t, x, ẋ) + (∂L/∂x)(t, x, ẋ)·h(t) + (∂L/∂ẋ)(t, x, ẋ)·ḣ(t) + O(|h(t)|²).
Therefore,
S(γ + h) − S(γ) = ∫₀ᵀ [L(t, x+h, ẋ+ḣ) − L(t, x, ẋ)] dt
              = ∫₀ᵀ [(∂L/∂x)(t, x, ẋ)·h(t) + (∂L/∂ẋ)(t, x, ẋ)·ḣ(t)] dt + O(|h|²).
We pulled the term O(|h(t)|2 ) outside of the integral to get a term O(|h|2 ) which
is bounded by maximum of |h(t)|2 because L is continuously differentiable. The
integral in the rightmost expression will be the derivative dS|γ . Integrating by
parts yields
∫₀ᵀ (∂L/∂ẋ) · ḣ dt = −∫₀ᵀ (d/dt ∂L/∂ẋ) · h dt + [(∂L/∂ẋ) · h]₀ᵀ.
As h(0) = 0 = h(T ) the second term on the RHS above must vanish, and
inserting this back into the expression for dS|γ yields (4.2) as desired.
It is clear from the formula (4.2) for dS|γ that if (4.3) is satisfied then
dS|γ ≡ 0. Conversely, assuming that dS|γ = 0 for any variation, we obtain
∫₀ᵀ [∂L/∂x − d/dt(∂L/∂ẋ)] · h dt = 0
for all smooth h : [0, T ] → Rn with h(0) = 0 = h(T ). We conclude that the
integrand without h vanishes identically, since otherwise we could pick h(t) to be
a bump function that witnesses any nonzero value and obtain dS|_γ(h) ≠ 0.
In the proof of Proposition 4.5, we only need that the coordinates q are
spanning (and not necessarily independent) in order to have enough directions
h(t) to conclude that the Euler–Lagrange equation holds. Therefore, there is no
obstruction to extending Proposition 4.5 to more than n coordinates provided
that they span the same space.
We also record the following observation:
Corollary 4.6. Given a Lagrangian L(t, q, q̇), adding a total time derivative (d/dt)F(t, q) to the Lagrangian
and adding a constant to S does not affect whether or not a path is a
critical point of S.
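In practice the bookkeeping in (4.3) can be delegated to a computer algebra system. The following sketch (using SymPy and the planar pendulum Lagrangian L = ½mℓ²θ̇² + mgℓ cos θ as an illustrative example; it is a supplement, not part of the original text) recovers the familiar equation of motion ℓθ̈ + g sin θ = 0.

import sympy as sp
from sympy.calculus.euler import euler_equations

t = sp.symbols('t')
m, g, ell = sp.symbols('m g ell', positive=True)
theta = sp.Function('theta')

# Planar pendulum Lagrangian L = (1/2) m ell^2 thetadot^2 + m g ell cos(theta).
L = sp.Rational(1, 2) * m * ell**2 * sp.Derivative(theta(t), t)**2 \
    + m * g * ell * sp.cos(theta(t))

# The Euler-Lagrange equation d/dt dL/d(thetadot) - dL/dtheta = 0, as in (4.3).
eqs = euler_equations(L, [theta(t)], [t])
print(sp.simplify(eqs[0]))
# Equivalent to ell * theta'' + g * sin(theta) = 0.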
4.2. Conservative systems
In section 4.1 we saw how the principle of least action yields the n = dim M
Euler–Lagrange equations
d/dt(∂L/∂q̇) − ∂L/∂q = 0. (4.4)
In this section we will see how (4.4) encodes Newton’s equations, and thus
Newton’s principle of determinacy is implied by Hamilton’s principle of least
action. For this reason, many graduate physics texts opt to begin with the
principle of least action and view Newton’s equations as a consequence.
Let x1 , . . . , xN ∈ Rd be Cartesian coordinates on Euclidean space. For a con-
servative system of N particles, we will see that the right choice of Lagrangian
is
L (t, x, ẋ) = K(ẋ) − V (x), (4.5)
where K and V are the kinetic and potential energies. The principle of least
action implies that the motion x(t) = (x1 (t), . . . , xN (t)) of the system satisfies
the Euler–Lagrange equations (4.4). For this Lagrangian, we have
∂L/∂ẋ = ∂K/∂ẋ = ∂/∂ẋ Σᵢ₌₁ᴺ ½mᵢẋᵢ² = p,
∂L/∂x = −∂V/∂x = −∇V = F.
V = −mgℓ cos θ.
(We picked our integration constant so that V = 0 when the mass is at the
height of the pivot, θ = π/2.) The Lagrangian is given by
frame K̃ moving with small velocity ε relative to an inertial frame K. Then
ṽ = v + ε and |ṽ|² = |v|² + 2v·ε + |ε|², and so
L(|ṽ|²) = L(|v|²) + 2L′(|v|²) v·ε + O(|ε|²). (4.6)
However, as both frames are inertial, the two Lagrangians should be equiva-
lent for all ε. Therefore the linear term of (4.6) should be a total time derivative
(cf. Corollary 4.6). As the Lagrangian can only be a function of |v|², this
term could only be a total time derivative if it is linear in v. We therefore have
that L′(|v|²) is independent of v, and hence L is proportional to |v|². This
allows us to write
L = ½m|v|², (4.7)
where m is the particle's mass. Experimentally, we would observe that a par-
ticle's acceleration is inversely proportional to its mass as in section 1.1. Note
that m cannot be negative, since from the action (4.1) we see that Hamilton's
principle would otherwise yield maxima instead of minima. We did not use the third type
of Galilean transformation, but the expression (4.7) is automatically invariant
with respect to rectilinear motion ṽ = v + v₀. Indeed, we have
which yields (4.5) when there is no potential energy. In practice, to change into
another coordinate system we only need to know the line element ds or metric
ds2 in order to know how to transform |ẋ|2 . If we wish to express the Cartesian
coordinates xi as functions of generalized coordinates q = (q1 , . . . , qn ), then we
obtain
K = ½ Σᵢ,ⱼ₌₁ⁿ aᵢⱼ(q) q̇ᵢq̇ⱼ (4.8)
for functions aij of the coordinates only. That is, the kinetic energy K in
generalized coordinates is still a quadratic function of velocities, but may also
depend on the other coordinates. Mathematically, a conservative Lagrangian
system is determined by a Riemannian manifold—where the metric determines the
kinetic energy—and a potential function.
To describe a general system of particles which may interact as in (4.5), we
add a function to the Lagrangian. For a conservative system, this function is
the potential energy:
L (q, q̇) = K(q, q̇) − V (q). (4.9)
Adding this in, the principle of least action for the new Lagrangian (4.10) yields
0 = d(∫(K + W) dt)|_{q(t)}(h) = ∫₀ᵀ Σⱼ₌₁ⁿ [∂K/∂qⱼ − d/dt(∂K/∂q̇ⱼ) + Qⱼ] · hⱼ dt.
Therefore, the new Lagrangian (4.10) generates the same motion as the conser-
vative Lagrangian L = K − V .
Even when the forces are nonconservative, all we needed in the previous
paragraph was the ability to write the jth component of the generalized force
as
Qⱼ = d/dt(∂V/∂q̇ⱼ) − ∂V/∂qⱼ. (4.15)
In this case, we recover the familiar form of Lagrange’s equations (4.4) with no
RHS, but now with a velocity-dependent potential V (t, q, q̇).
The quantity V = K − L is called the potential energy even for non-
conservative systems, and is generally time-dependent. A common example is
a system A with coordinates qA that is not closed, but it moves in an external
field due to a system B with coordinates qB (t) independent of qA such that the
entire system A + B is closed. This system has a Lagrangian of the form
L = KA (qA , q̇A ) − V (qA , qB (t)). (4.16)
We may ignore KB since it depends only on time and is thus a complete time
derivative. Equation (4.16) is a Lagrangian of the usual type, but with V
being possibly time-dependent. If system A is a single particle, then the Euler–
Lagrange equations yield
mq̈ = −(∂V/∂q)(t, q) = F(t, q). (4.17)
For example, if F ≡ F(t) is uniform (i.e. independent of position) then V = −F·x
in Euclidean space.
4.4. Equivalence to Newton's equations
Now we will see how to obtain the Euler–Lagrange equations (4.4) from
Newtonian mechanics, which will show that Hamilton’s principle of least action
is equivalent to Newton’s principle of determinacy.
A mechanical system with a configuration manifold M can always be—and
in experiment is automatically—embedded in some Euclidean space RN . Within
M , the motion of the system is dictated by some known force F. The effect
of constraining the motion to the manifold M can be thought of as a force N
orthogonal to M , called the constraint force. Newton’s equations for this
system are
mᵢẍᵢ = Fᵢ + Nᵢ.
Rearranging, we see that mi ẍi − Fi = Ni is orthogonal to M , and so
(mi ẍi − Fi ) · ξ i = 0
for all vectors ξ = (ξ 1 , . . . , ξ N ) tangent to M . This is Newton’s equation in
the tangent plane to the surface M . Summing over all particles, we get the
d’Alembert–Lagrange principle:
Σᵢ₌₁ᴺ (mᵢẍᵢ − Fᵢ) · ξᵢ = 0 (4.18)
for all vectors ξ ∈ RN tangent to M . In section 5.1 we will see that this principle
more generally dictates the motion of a system with constraints. Note that for
a free system M = Rn we may take any vector ξ ∈ Rn , and so we recover
Newton’s equations.
Let q = (q1 , . . . , qn ) be local coordinates on M . Then by the chain rule we
have
ẋᵢ = Σⱼ₌₁ⁿ (∂xᵢ/∂qⱼ) q̇ⱼ,
and so we may write the kinetic energy
K(q, q̇) = Σᵢ₌₁ᴺ ½mᵢ|ẋᵢ|² = Σᵢ,ⱼ₌₁ⁿ aᵢⱼ(q) q̇ᵢq̇ⱼ
or equivalently
Qⱼ = Σᵢ₌₁ᴺ Fᵢ · (∂xᵢ/∂qⱼ), (4.19)
are called the generalized forces. They dictate the evolution of the kinetic
energy via the following expression.
Proposition 4.9. The Newtonian motion q(t) of the system satisfies
d/dt(∂K/∂q̇) − ∂K/∂q = Q. (4.20)
Proof. We repeat the argument from section 4.3. The calculation (4.12) of the
variation of the kinetic energy contribution still holds, since we did not use any
equations of motion. Taking a dot product with an arbitrary tangent vector
ξ to M , we can replace the coordinates x ∈ RN with q ∈ M . Similarly, the
calculation (4.13) still holds on M by the definition (4.19) and the d’Alembert–
Lagrange principle (4.18)—that is, Newton’s equations hold on the manifold M
in terms of the generalized forces. Adding these together, we obtain (4.20) as
desired.
For a conservative system we have that the one-form Q dq is exact and may
be written as −dV for a potential energy V (q). For such a system we have the
Lagrangian L = K − V , and hence (4.20) implies that
d/dt(∂L/∂q̇) − ∂L/∂q = 0.
In section 4.1, we saw that this implies q(t) is a critical point for the action
functional. As q(t) was an arbitrary motion of the system, we conclude that the
principle of least action must hold.
4.5. Momentum and conservation
The (generalized) momentum conjugate to the coordinate qᵢ is defined as
pᵢ = ∂L/∂q̇ᵢ. (4.21)
If qᵢ = xᵢ is a Cartesian coordinate, the kinetic energy part of the Lagrangian
has a term ½mᵢẋᵢ², and so pᵢ = mᵢẋᵢ is the linear momentum along the xᵢ-axis. If
qᵢ = φᵢ is the azimuthal angular coordinate in R³, then the kinetic energy about
the z-axis is ½mᵢrᵢ²φ̇ᵢ², and so pᵢ = mᵢrᵢ²φ̇ᵢ is the angular momentum about the
z-axis.
We can similarly define the (generalized) force as
Fᵢ = ∂L/∂qᵢ.
This includes our previous definition, since for a conservative system on Eu-
clidean space we have
E = Σᵢ₌₁ⁿ vᵢ ∂/∂vᵢ (Σⱼ₌₁ⁿ ½mⱼvⱼ² − V) − (Σⱼ₌₁ⁿ ½mⱼvⱼ² − V)
  = Σᵢ₌₁ⁿ mᵢvᵢ² − Σᵢ₌₁ⁿ ½mᵢvᵢ² + V = K + V.
dL/dt = ∂L/∂t + (∂L/∂qᵢ)·q̇ᵢ + (∂L/∂q̇ᵢ)·q̈ᵢ
      = ∂L/∂t + (d/dt ∂L/∂q̇ᵢ)·q̇ᵢ + (∂L/∂q̇ᵢ)·q̈ᵢ = ∂L/∂t + d/dt(q̇ᵢ · ∂L/∂q̇ᵢ).
I(q, v) = (∂L/∂q̇)(q, v) · (d/ds)|_{s=0} hₛ(q). (4.24)
of coordinates.
Proof. As in the set up for the proof of Proposition 4.5, we may fix a coordinate
patch on M and take the variation to be supported within the image in order
to reduce the statement to Euclidean space Rn . Let q(t) : R → Rn denote a
solution to Lagrange’s equations. As hs is a symmetry for each s then hs ◦ q will
also be a solution, because (hₛ ∘ q)(t) = (hₛ)_*(q(t)) is the pushforward of the
point q(t) and (d/dt)(hₛ ∘ q)(t) = d(hₛ)|_{q(t)}(q̇(t)) = (hₛ)_*(q̇(t)) is the pushforward of
the tangent vector q̇(t), and so by (4.23) the Lagrangian evaluated at (hₛ ∘ q)(t)
is the same as the Lagrangian evaluated at q(t).
Consider the map Φ : Rs × Rt → Rn given by Φ(s, t) = (hs ◦ q)(t). As all of
the symmetries hs preserve the Lagrangian L , then
0 = (d/ds) L(Φ, Φ̇) = (∂L/∂q)·(∂Φ/∂s) + (∂L/∂q̇)·(∂Φ̇/∂s), (4.25)
where everything on the RHS is evaluated at (Φ(s, t), Φ̇(s, t)) ∈ T Rn . As we
just noted, for fixed s the map Φ(s, ·) : R → Rn satisfies the Euler–Lagrange
equation
(∂/∂t)(∂L/∂q̇)(Φ(s,t), Φ̇(s,t)) = (∂L/∂q)(Φ(s,t), Φ̇(s,t)).
Inserting this into the RHS of (4.25), we obtain
0 = (∂/∂t)(∂L/∂q̇)(Φ, Φ̇) · (∂Φ/∂s) + (∂L/∂q̇)(Φ, Φ̇) · (∂Φ̇/∂s)
  = d/dt [(∂L/∂q̇) · (∂Φ/∂s)](Φ, Φ̇) = (dI/dt)(Φ, Φ̇)
by the chain rule.
Example 4.13 (Translational symmetry). Consider the conservative N -particle
Lagrangian
L(x, v) = Σᵢ₌₁ᴺ ½mᵢ|vᵢ|² − V(x), (4.26)
4.7. Exercises
4.3 (Geodesics on the sphere [Tro96, Ch. 1]). In Example 4.7 we saw that
the geodesics—paths of shortest length between two given points—in Rn are
straight lines. We will repeat this procedure for the sphere S 2 .
x(t) = (cos φ(t) sin θ(t), sin φ(t) sin θ(t), cos θ(t)), t ∈ [0, 1].
4.4 (Brachistochrone [Tro96, Ch. 6]). The brachistochrone between two points
in the plane is the curve which a frictionless bead would traverse the quickest
subject to a downward gravitational acceleration. Johann Bernoulli in 1696
challenged mathematicians to find the shape of the brachistochrone, and it was
his brother Jakob Bernoulli who provided a solution which was later refined into
the calculus of variations.
(a) After translating, we may assume that the initial point is the origin (0, 0)
and the second point is given by some (x1 , y1 ) with x1 > 0 and y1 < 0.
Explain why it is reasonable to assume that the brachistochrone is the
graph of a function y(x), x ∈ [0, x1 ] as opposed to a general parametric
curve. Show that the time it takes the bead to traverse this curve is
Φ[y(x)] = ∫₀^{x₁} [√(1 + y′(x)²) / v(x)] dx
where v(x) is the bead’s speed.
(b) With constant downward acceleration g, show that v(x) = √(2gy(x)).
(c) Using conservation of energy (4.22), find the first-order differential equation
√(y/(c² − y)) y′ = 1.
(d) Introducing a new dependent variable θ(x) so that
y = c² sin²(θ/2) = ½c²(1 − cos θ), 0 ≤ θ < 2π,
show that
½c²(1 − cos θ) θ′ = 1.
(e) By integrating the two equations of the previous part, obtain the para-
metric equations
x(θ) = c₂(θ − sin θ) + c₁, y(θ) = c₂(1 − cos θ).
In order for x(0) = 0 = y(0), we must have c1 = 0. That is, the brachis-
tochrone is a cycloid; these equations describe the path traced by a fixed
point on a circle of radius c2 as it rolls along the x-axis in the lower
half-plane.
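As a numerical sanity check (an illustrative sketch, not part of the exercise), one can compare the traversal time along the cycloid with the time along the straight chord for a specific endpoint, here measuring y downward so that v = √(2gy):

import math
from scipy.optimize import brentq

g = 9.81
x1, y1 = 1.0, 0.6          # endpoint, with y measured downward (so y1 > 0 here)

# Cycloid through the origin: x = c2*(th - sin th), y = c2*(1 - cos th).
# Solve (1 - cos th1)/(th1 - sin th1) = y1/x1 for the final parameter th1.
f = lambda th: (1 - math.cos(th)) / (th - math.sin(th)) - y1 / x1
th1 = brentq(f, 1e-6, 2 * math.pi - 1e-6)
c2 = y1 / (1 - math.cos(th1))

# Along the cycloid dt = sqrt(c2/g) dtheta, so T = sqrt(c2/g) * th1.
T_cycloid = math.sqrt(c2 / g) * th1
# Straight chord with v = sqrt(2 g y): T = sqrt(2 L^2 / (g y1)), L the chord length.
L = math.hypot(x1, y1)
T_line = math.sqrt(2 * L**2 / (g * y1))
print(f"T_cycloid = {T_cycloid:.4f} s  <  T_line = {T_line:.4f} s")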
4.5 (Lagrangian PDE I [Eva10, Ch. 8]). In this exercise we will prove the
existence of a solution to the elliptic PDE
−∇ · (A(x)∇u(x)) = 0 for x ∈ Ω, u(x) = 0 for x ∈ ∂Ω,
on an open set Ω ⊂ Rn . Here, A(x) = (aij (x)) is a symmetric n × n matrix
with aij ∈ H 2 (Ω) (the Sobolev space), and we also assume that A is uniformly
elliptic:
λI ≤ A(x) ≤ ΛI for x ∈ Ω
(in the sense of positive definite matrices).
(a) For u ∈ H01 (Ω) (the closure of Cc∞ (Ω) in H 1 (Ω)), show that the energy
functional
E(u) = ½ ∫_Ω ∇u(x) · A(x)∇u(x) dx
is finite. Show that for φ ∈ C_c^∞(Ω) the first variation at u is
lim_{ε→0} [E(u + εφ) − E(u)]/ε = ∫_Ω ∇φ · A∇u.
Using Poincaré’s inequality, show that E(u) is bounded below and hence
our sequence is bounded. Conclude that there exists a weakly convergent
subsequence u_{j_k} ⇀ u in H¹(Ω) using the Riesz representation theorem.
(c) Show that E is weakly lower semicontinuous:
E(u) ≤ lim inf_{k→∞} E(u_{j_k}).
(c) For k ≪ 1 we will see that an exchange of energy occurs. Suppose that
at time t = 0 we have θ₁ = 0 = θ₂, θ̇₂ = 0, and θ̇₁ = v₀. Using part (b),
show that the motion is given by
θ₁(t) = (v₀/2)(sin t + (1/ω) sin ωt),  θ₂(t) = (v₀/2)(sin t − (1/ω) sin ωt)
(b) The Lorentz force is nonconservative, but if we can put it in the form (4.15)
then we will have a Lagrangian for this system. The first term qφ is already
of the desired form. Show that the x-component of the rightmost term
v × (∇ × A) may be rewritten as
(v × (∇ × A))ₓ = ∂(v · A)/∂x − dAₓ/dt + ∂Aₓ/∂t.
By symmetry, we get the same relation for the other components with x
replaced by the respective variable.
(c) Show that the x-component of the Lorentz force can be written as
Fₓ = −∂V/∂x + (d/dt)(∂V/∂vₓ)
for the potential energy
V = qφ − (q/c) v · A.
The Lagrangian is then
L = K − V = ½mv² − qφ + (q/c) v · A
4.9 (Lagrangian PDE II). In this exercise, we will explore the formal Lagrangian
structure associated with the wave equation. The Lagrangian formulation of
the wave equation (as opposed to the Hamiltonian formulation) is advantageous
because it shares the Lorentz symmetries of the wave equation. For further
details, see [SS98].
(a) Let
g = diag(−1, 1, . . . , 1) = (g_{αβ})_{α,β=0,...,d}
denote the (d + 1) × (d + 1) matrix associated with the Minkowski metric
ds2 = −dt2 + dx21 + · · · + dx2d on Rt × Rdx , with coordinates (x0 , . . . , xd ) =
(t, x1 , . . . , xd ). Consider the Lagrangian
L(u, ∇u) := ½ Σ_{α,β=0}^d (g⁻¹)^{αβ} (∂u/∂x_α)(∂u/∂x_β) + F(u) = −½(∂u/∂t)² + ½|∇u|² + F(u).
(b) Now we will take F ≡ 0 and derive the conservation laws of the linear wave
equation. Consider the same Lagrangian L (u, ∇u, g) and action S(u, g)
to be functions of the metric g. Given τ : R × Rd → R × Rd a smooth and
compactly supported diffeomorphism, let gs , s ∈ R be the pullback of the
metric g by the map id + sτ , so that
(g_s)_{αβ}|_{s=0} = g_{αβ},   (d/ds)(g_s)_{αβ}|_{s=0} = ∂τ_β/∂x_α + ∂τ_α/∂x_β =: π_{αβ}.
where
T^{αβ} := ∂L/∂(g⁻¹)_{αβ} − ½g_{αβ}L
is the stress-energy tensor. Conclude that T αβ is divergence-free:
Σ_{α=0}^d ∂T^{αβ}/∂x_α = 0   for β = 0, . . . , d.
CHAPTER 5
CONSTRAINTS
(∂f₁/∂q̇)(t, q, q̇) · ξ = · · · = (∂f_k/∂q̇)(t, q, q̇) · ξ = 0. (5.3)
x² + y² − ℓ² = 0,   2xẋ + 2yẏ = 0,
where ℓ is the length of the pendulum arm. Note that the second condition
is a time derivative of the first equation, but is necessary to specify the two-
dimensional submanifold S in the four-dimensional tangent space T R2 = R2 ×
R2 . If we let θ denote the angle from the downward vertical and r the distance
from the pivot, then the Lagrangian becomes
The first constraint does not place any restrictions on the virtual velocities
ξ = (ξ_r, ξ_θ) since ∂f₁/∂(ṙ, θ̇) = 0, but the second condition yields
0 = (∂f₂/∂(ṙ, θ̇)) · ξ = ξ_r.
and q̇(0) = v0 as the conceivable motions; these lie in the constraint sub-
manifold S but do not necessarily satisfy the equations of motion. A released
motion satisfies the unconstrained Euler–Lagrange equations (4.4) (and does
not lie in S), and an actual motion is a conceivable motion satisfying the
d’Alembert–Lagrange principle (5.4) (and hence lies in S).
By considering only variations supported in a fixed coordinate patch, we may
reduce to a neighborhood about the point q0 and work in Euclidean coordinates
with the vector notation q(t) = x(t). We will write an actual motion of the
system xa (t) = xr (t) + δx(t) as a deviation from the released motion xr (t),
and we assume that the initial positions xa (0) = xr (0) = x0 and velocities
ẋa (0) = ẋr (0) = v0 are fixed. Taylor expanding we have
Therefore we have
δx(t) = ½t² δẍ(0) + O(t³). (5.6)
So far, we have not yet used the constraints.
In our local coordinates the d’Alembert–Lagrange principle requires
(d/dt(∂L/∂ẋ) − ∂L/∂x) · ξ = 0 (5.7)
for all virtual velocities ξ. For conservative systems, ∂L/∂x is the system's force
−∇V and d/dt(∂L/∂ẋ) is the total force mẍ (i.e. the system's force plus the fictitious
constraint forces). So in this special case, we see that d/dt(∂L/∂ẋ) − ∂L/∂x is the force
m δẍ due to the constraints. In the general case, we evaluate d/dt(∂L/∂ẋ) − ∂L/∂x at
x_a(t) = x_r(t) + δx(t) and Taylor expand about x_r(t):
[d/dt(∂L/∂ẋ) − ∂L/∂x]|_{x=x_a} = [d/dt(∂L/∂ẋ) − ∂L/∂x]|_{x=x_r}
  + [d/dt(∂²L/∂ẋ² δẋ + ∂²L/∂ẋ∂x δx) − ∂²L/∂x² δx − ∂²L/∂ẋ∂x δẋ]|_{x=x_r} + O(|δx|²).
The first term on the RHS vanishes since the released motion xr solves La-
grange’s equations. For the second term, we use (5.6) and take the limit t → 0
to obtain
[d/dt(∂L/∂ẋ) − ∂L/∂x]|_{x=x_a, t=0} = (∂²L/∂ẋ²)|_{x=x_r, t=0} δẍ(0).
(∂²L/∂q̇²)|_{q=q_r, t=0} (q̈_a(0) − q̈_r(0)) (5.8)
for the Hessian matrix A = ∂²L/∂q̇² of the kinetic energy quadratic form at x = x_r.
5.3. Integrability
Some authors also require that holonomic constraints are integrable (in the
sense of Frobenius). In this section, we will explore what this additional as-
sumption yields.
In this section, we will assume that the motion is constrained within a k-
dimensional distribution ∆ ⊂ T M : for any q0 ∈ M , ∆q0 is a k-dimensional
subspace of Tq0 M and there exist smooth vector fields X1 , . . . , Xk defined on a
neighborhood of q0 such that ∆q is given by the span of X1 (q), . . . , Xk (q) on that
neighborhood. For holonomic constraints, ∆ is the space of virtual velocities.
We will also assume that ∆ is integrable: there exists an embedding i : N → M
such that di(Tq N ) = ∆q for all q ∈ N ; in other words, ∆ is given by the tangent
bundle of a submanifold.
Frobenius’ theorem provides us with a practical condition to determine if
the distribution ∆ is integrable:
Theorem 5.6 (Frobenius’ theorem). The distribution ∆ is integrable if and
only if ∆ is involutive—that is, for any vector fields X, Y ∈ ∆, the Lie bracket
[X, Y ] is also in ∆.
This abstract result expresses a basic idea when applied to differential equa-
tions: a first-order system of equations can be solved locally if and only if they
are consistent. For example, consider a function u = (u1 , . . . , um ) of the vari-
ables q = (q1 , . . . , qk ) which solves the system of equations
(∂u/∂q₁)(q) = F₁(q, u),   . . . ,   (∂u/∂q_k)(q) = F_k(q, u).
Of course, if there exists a solution u, then the right-hand sides F1 , . . . , Fk must
be consistent:
(∂/∂q_j) F_i(q, u(q)) = (∂²u/∂q_j∂q_i)(q) = (∂/∂q_i) F_j(q, u(q))
⟹ ∂F_i/∂q_j + (∂F_i/∂u)·F_j = ∂F_j/∂q_i + (∂F_j/∂u)·F_i
for all i and j. This is the involutivity condition of Frobenius' theorem. Moreover,
the theorem also provides the converse: if the right-hand sides F1 , . . . , Fk are
consistent, then there exists a local solution. For a proof of Frobenius’ theorem,
see [Lee13, Th. 19.12].
If the holonomic constraints are integrable, then the motion q(t) must lie in
a submanifold of M , not merely a submanifold of T M . Now that we know that
the holonomic constraints have a corresponding smooth integrable submanifold
N ⊂ M of dimension d−k, we can show that the d’Alembert–Lagrange condition
is equivalent to the principle of least action holding on N :
Proposition 5.7. Suppose the Lagrangian system (M, L ) has integrable holo-
nomic constraints. Then a constrained path is a motion of the system if and
only if the path is a motion for the system (N, L |N ).
Proof. From Hölder’s principle (Proposition 5.3) we know that the d’Alembert–
Lagrange condition is equivalent to insisting that the action variation dS|γ van-
ishes on a subspace Γ of conceivable variations. Here, Γ is a subspace of the
tangent space Tq(t) ΩM to the paths ΩM on M . On the other hand, the motion
of the Lagrangian system on N is given by taking the action S on the paths
ΩN on N , and then insisting that its variation vanishes on the tangent space
Tq(t) ΩN . The key observation is that the tangent space Tq(t) ΩN is equal to the
subspace Γ, and so the two conditions are identical.
In particular, the motion for integrable holonomic constraints is determined
by the restriction of the Lagrangian to the constraint submanifold N . In this
way, a system with holonomic constraints is like a new mechanical system with
fewer degrees of freedom; this is a characteristic feature of holonomic constraints,
which we do not expect for nonholonomic constraints.
d/dt(∂L/∂q̇) − ∂L/∂q = λ ∂G/∂q. (5.11)
Proof. As in the set up for the proof of Proposition 4.5, we may fix a coordinate
patch on M and take the variation to be supported within the image in order
to reduce the statement to Euclidean space Rn . Let q(t) : R → Rn be a critical
point, and let g = ∂G/∂q denote the gradient of G. By premise, we know that
g(q) ≢ 0. In particular, there exists a smooth function v(t) : R → Rⁿ so that
∫₀ᵀ g(q(t)) · v(t) dt ≠ 0. (5.12)
This follows from the implicit function theorem, since we know that J(0, 0) = 0
and
(∂J/∂τ)(0, 0) = ∫₀ᵀ g(q(t)) · v(t) dt
is nonzero by (5.12).
The perturbation q + σh + τ (σ)v now satisfies the constraint for all σ suf-
ficiently small. As q(t) is a critical point of the constrained action functional,
then we must have
0 = (d/dσ) ∫₀ᵀ L(q + σh + τ(σ)v, q̇ + σḣ + τ(σ)v̇, t) dt.
0 = (d/dσ) J(σ, τ(σ))|_{σ=0} = (∂J/∂σ)(0, 0) + (∂J/∂τ)(0, 0) τ′(0)
  = ∫₀ᵀ g(q(t)) · h(t) dt + τ′(0) ∫₀ᵀ g(q(t)) · v(t) dt,
and so
τ′(0) = − [∫₀ᵀ g(q(t)) · h(t) dt] / [∫₀ᵀ g(q(t)) · v(t) dt].
Inserting this into the derivative (5.14) we arrive at
0 = ∫₀ᵀ [∂L/∂q(q) − (d/dt)(∂L/∂q̇)(q) + λ g(q)] · h dt,
where
λ = − [∫₀ᵀ (∂L/∂q(q) − (d/dt)(∂L/∂q̇)(q)) · v(t) dt] / [∫₀ᵀ g(q(t)) · v(t) dt]
is independent of h. As h(t) was an arbitrary fixed-endpoint variation, we
conclude that (5.11) holds.
E = q̇ · ∂L/∂q̇ − L (5.15)
is conserved. For a conservative Lagrangian of the form
L(q, q̇) = Σᵢ₌₁ⁿ ½mᵢq̇ᵢ² − V(q), (5.16)
For such a system, we seek closed orbits (i.e. periodic solutions) to Lagrange’s
equations. We will explore some well-known results in this field obtained by
variational methods, and then settle our attention on the proof of one specific
result (Theorem 5.12 below). For each result, closed orbits are produced by
proving the existence of an optimizer for a certain variational problem. To
this end, it is more convenient to work with the energy (5.15) instead of the
Lagrangian (5.16) since it is convex when V is. Consequently, these results are
more naturally phrased in terms of Hamiltonian mechanics, even though they
fall under the topic of optimization and constraints.
For one-dimensional systems, we know from section 2.5 that there are many
periodic trajectories. However, the following example illustrates that closed
orbits can be quite exceptional for n ≥ 2 degrees of freedom.
Example 5.9. Consider two particles of mass m = 1 moving according to the
two-dimensional harmonic oscillator potential
On the other hand, the total energy E by itself is convex. Consequently, our
next guess might be to seek critical points x : R/T Z → R2n for
∫₀ᵀ E(x(t)) dt,   subject to the constraint   ∫₀ᵀ ½ ẋ · Jx dt = 1.
In the last equality we integrated by parts, and the boundary terms canceled
by periodicity. As φ̇ can be an arbitrary smooth periodic function, then we
conclude that we must have
ẏ(t) = λJ∇E(y(t)).
It only remains to show that λ > 0. By the equation (5.18) for z, we have
0 = ∫₀ᵀ ż · [∇F(ż) − λJz − β] dt
  = ∫₀ᵀ ż · ∇F(ż) dt − λ ∫₀ᵀ ż · Jz dt − ∫₀ᵀ ż · β dt.
The third integral on the RHS vanishes by periodicity, and the second integral is
equal to 2 by the constraint. As F is strictly convex, we must have ξ ·∇F (ξ) > 0
for all ξ ≠ 0. Altogether, we conclude that λ ≥ 0. If λ = 0 then we have ż ≡ 0,
but this contradicts the constraints.
In the proof we used that ż ≢ 0, which implies that z cannot be identically
constant and so y cannot vanish identically. It then follows that y can never
vanish, because x = 0 is the unique point where ∇E = 0 and so y(t) = 0 implies
y(t) ≡ 0.
In applying Proposition 5.13 to prove Theorem 5.12, there is a problem: the
solution produced by Proposition 5.13 is on some nonzero energy surface, but
not necessarily the one we started with. To solve this issue, we will modify E
so that it is positive homogeneous of degree 2:
Ẽ(λx) = λ²Ẽ(x)   for all x ∈ R²ⁿ, λ > 0,
∇Ẽ(x) = s(x)∇E(x).
of degree 2 and z(t) is nonvanishing, then E(z(t)) > 0 and so we may choose
λ > 0 so that x(t) = λz(t) satisfies E(x(t)) ≡ E0 . Using that E is homogeneous
again, we see that
ẋ = λJ∇E(z) = J∇E(x)
as desired.
In order to extract a minimizer, we will use the following fact from functional
analysis:
for some constants c, C > 0, we see that the functional is well-defined on its
domain and that the optimizing sequence zn is in H 1 (R/2πZ). (Note that the
domain is also nonempty, since it contains harmonic oscillators like those in
Example 5.9.)
Decompose the z_n as the Fourier series
z_n(t) = Σ_{k∈Z∖{0}} ẑ_n(k) e^{ikt}.
(Note that ẑ_n(0) = ∫ z_n dt = 0 by the constraints.) By Parseval we have
2π Σ_{k≠0} k²|ẑ_n(k)|² = ∫ |ż_n|² dt ≤ (1/c) ∫ F(ż_n) dt. (5.19)
Therefore, after initially replacing z_n by a subsequence along which ∫₀^{2π} F(ż_n) dt converges to
lim inf_n ∫₀^{2π} F(ż_n) dt, the RHS above is bounded by this lim inf, as desired.
holonomic constraints. This is not always true in the nonholonomic case how-
ever, and so we would like to develop a new tool for this situation.
Suppose we have m nonholonomic constraints that can be expressed as the
vanishing of one-forms:
Σ_{k=1}^n a_{ℓk}(t, q) dq_k + a_{ℓt}(t, q) dt = 0   for ℓ = 1, . . . , m. (5.20)
However, the constraint (5.20) also includes some nonholonomic constraints; for
example, see (5.29) of Exercise 5.4.
Consider fixed-endpoint variations δq(t) of a path q(t) between the times 0
and T as in the proof of Proposition 4.5. The constraint (5.20) is satisfied for
q(t), and so Taylor expansion yields
Σ_{k=1}^n a_{ℓk}(t, q) δq_k = O(δq²)   for ℓ = 1, . . . , m. (5.21)
The n generalized coordinates qk are not independent since they are related
by the m constraints. However, the first n − m coordinates can be chosen
independently, and the remaining m coordinates are determined by the condi-
tions (5.21). Pick the multipliers λ` such that
d/dt(∂L/∂q̇_k) − ∂L/∂q_k = Σ_{ℓ=1}^m λ_ℓ a_{ℓk}   for n − m < k ≤ n. (5.24)
This causes the last m terms of the summation in the variation (5.23) to vanish,
leaving us with
∫₀ᵀ Σ_{k=1}^{n−m} [∂L/∂q_k − d/dt(∂L/∂q̇_k) + Σ_{ℓ=1}^m λ_ℓ a_{ℓk}] δq_k dt = O(δq²).
obtained from the one-form constraints (5.20). Comparing these new equations
of motion (5.25) to the generalization for nonconservative forces (4.14), we see
that the quantities Σ_ℓ λ_ℓ a_{ℓk} are a manifestation of the constraint forces.
5.7. Exercises
5.1. For the pendulum of Example 5.4, explicitly verify that minimizing the
compulsion Z leads to the familiar equation of motion.
5.2 (Hoop rolling down an inclined plane). Consider a circular disk of mass M
and radius r rolling without slipping due to gravity down a stationary inclined
plane of fixed inclination φ.
(a) In a vertical plane, the disk requires three coordinates: two Cartesian
coordinates (x, y) for the center of mass and an angular coordinate to
measure the disk’s rotation. If we pick the origin such that the surface
of the inclined plane is the line y = r − (tan φ)x, obtain a holonomic
constraint of the form (5.1) for the center of mass corresponding to the
disk sitting on the plane.
(b) Consequently, we now can pick two generalized coordinates to describe
the disk’s motion: let x denote the distance of the disk’s point of contact
and the top of the inclined plane, and θ the angle through which the disk
has rotated from its initial state. By considering the arc length through
which the disk has rolled, show that rolling without slipping poses another
holonomic constraint.
(c) In this case, it is easier to treat rolling without slipping as a nonholonomic
constraint of the type in section 5.6:
r dθ − dx = 0.
Show that the Lagrangian for this system is
L = ½M(ẋ² + r²θ̇²) + Mgx sin φ.
(d) Apply Lagrange’s equations of the form (5.25) to determine the equations
of motion. Here, λ is the force of friction that causes the disk to roll
without slipping. Using the differential equation of constraint
rθ̇ = ẋ,
conclude that
ẍ = (g/2) sin φ,   θ̈ = (g/(2r)) sin φ,   λ = (Mg/2) sin φ.
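The small linear system in part (d) can also be solved symbolically. The sketch below (SymPy; reading the one-form coefficients off r dθ − dx = 0 as a_x = −1, a_θ = r, a sign convention of this illustration) reproduces the accelerations and the multiplier.

import sympy as sp

M, r, g, phi = sp.symbols('M r g phi', positive=True)
xdd, thdd, lam = sp.symbols('xdd thdd lam')

# Lagrange equations with a multiplier, d/dt dL/dqdot - dL/dq = lam * a_q,
# for L = (1/2) M (xdot^2 + r^2 thdot^2) + M g x sin(phi).
a_x, a_th = -1, r
eqs = [
    sp.Eq(M * xdd - M * g * sp.sin(phi), lam * a_x),   # x equation
    sp.Eq(M * r**2 * thdd, lam * a_th),                # theta equation
    sp.Eq(r * thdd, xdd),                              # differentiated constraint
]
sol = sp.solve(eqs, [xdd, thdd, lam], dict=True)[0]
print(sol)   # xdd = g*sin(phi)/2, thdd = g*sin(phi)/(2r), lam = M*g*sin(phi)/2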
5.3 (Catenary [Tro96, Ch. 3]). The catenary is the shape taken by a cable hung
by both ends subject to gravity.
(a) Consider a cable of length L hung between two equal height supports
separated by a distance H < L. Let y denote the vertical coordinate
with y = 0 at the point where the cable is fastened, and let y(s) denote
the shape of the cable where s is the arc length along the cable, so that
y(0) = 0 = y(L). If the weight per unit length is a constant W , explain
why the cable shape y(s) minimizes the mass integral
Z L
F [y(s)] = W y(s) ds.
0
(c) Apply the Euler–Lagrange equation and integrate once to obtain the first-order differential equation
λy′(s)/√(1 − y′(s)²) = s + c
Together with y(s) from part (d), we have parametric equations for the
catenary. Eliminate the variable s and conclude
y(x) = λ cosh((x − H/2)/λ) − √(λ² + ℓ²)
for x ∈ [0, H]. That is, the catenary is the graph of hyperbolic cosine.
(g) Obtain the same expression for y(x) using Proposition 5.8.
5.4 (Solving Kepler’s problem with harmonic oscillators [KS65]). Consider the
system of section 3.3, where a particle x ∈ R3 of mass m moves in a central
potential
V(|x|) = −Mm/|x|.
Recall that the squaring function on C ≃ R² given by
u := u₁ + iu₂ = (u₁, u₂) ↦ x := u² = (u₁² − u₂², 2u₁u₂) (5.26)
is conformal (except at the origin) and maps conic sections centered at the
origin to conic sections with one focus at the origin. As the orbits x(t) are conic
sections, we might hope to apply a transformation with similar properties to
our system in order to turn the elliptic orbits into simple harmonic oscillation.
(As a problem-solving method this may seem rather ad hoc, but an analogous
transformation was used to solve a long-standing open problem in physics.)
Associated to (5.26) is the linear transformation
(dx₁, dx₂)ᵀ = 2 [u₁ −u₂; u₂ u₁] (du₁, du₂)ᵀ
of differential forms. The matrix above has the following key properties:
• its entries are linear homogeneous functions of the ui ,
• it is like an orthogonal matrix, in the sense that the dot product of any
two rows vanishes and each row has squared norm u₁² + u₂² + · · · + u_n².
We would like to find such a transformation on Rn . It turns out that alge-
braically such transformations can only exist for n = 1, 2, 4, or 8. Ultimately
we want a transformation Rn → R3 , so we take n = 4 and choose a matrix
A = [ u₁ −u₂ −u₃  u₄
      u₂  u₁ −u₄ −u₃
      u₃  u₄  u₁  u₂
      u₄ −u₃  u₂ −u₁ ]
which satisfies these properties. (This matrix also can be obtained directly from
quaternion multiplication.) Consequently, we set
(dx₁, dx₂, dx₃, 0)ᵀ = 2A (du₁, du₂, du₃, du₄)ᵀ
  = 2 (u₁du₁ − u₂du₂ − u₃du₃ + u₄du₄,
      u₂du₁ + u₁du₂ − u₄du₃ − u₃du₄,
      u₃du₁ + u₄du₂ + u₁du₃ + u₂du₄,
      u₄du₁ − u₃du₂ + u₂du₃ − u₁du₄)ᵀ. (5.27)
Sadly, only the first three of these equations are exact forms, corresponding to
the quantities
x₁ = u₁² − u₂² − u₃² + u₄²,
x₂ = 2(u₁u₂ − u₃u₄), (5.28)
x₃ = 2(u₁u₃ + u₂u₄)
respectively. The fourth equation of (5.27) is the nonholonomic constraint
Equation (5.28) defines a transformation from R4 into R3 , and in fact there are
explicit formulas for both the one-dimensional kernel and the inverse map.
(a) Let r = |x| denote the distance to the origin in R3 . Show that
Insert the Kepler forces Qi = −∂V (r)/∂ui and the (signed) semi-major
axis
a₀ = (2/r − v²/M)⁻¹
to arrive at
∂²uᵢ/∂s² + (M/(4a₀)) uᵢ = 0.
That is, the preimage of bounded orbits (a₀ > 0) under this transformation
is simple harmonic motion in R⁴ with frequency ω = √(M/(4a₀)). The
harmonic motion can be (rather tediously) transformed into a solution
u(t) by computing and substituting in the physical time t = ∫₀ˢ r ds.
CHAPTER 6
HAMILTON–JACOBI EQUATION
In the last equality, we used the definition (4.21) of the generalized momentum.
By definition (4.1) of the action we have
dS/dt = L. (6.2)
On the other hand, the chain rule applied to S = S(t, q(t)) insists that
dS/dt = ∂S/∂t + (∂S/∂q) · q̇ = ∂S/∂t + p · q̇ (6.3)
by (6.1). Setting (6.2) and (6.3) equal yields
∂S/∂t = L − p · q̇ = −H,
where H is the total energy (or Hamiltonian) (4.22). Using the derivative (6.1),
we recognize this identity as a first order partial differential equation (PDE) for
S(t, q):
Theorem 6.1. If q(t) is a critical point for the action functional, then the action S(t, q)
solves
0 = ∂S/∂t + H(t, q, ∂S/∂q), (6.4)
where H is total energy defined by (4.22).
Equation (6.4) is the Hamilton–Jacobi equation, and for a system with
n degrees of freedom there are n + 1 independent variables (t, q1 , . . . , qn ). It is
often the case in practice that the formula for S(t, q) is unknown, and cannot
be determined from (6.4) alone.
The solution, or complete integral, of this equation has n + 1 integration
constants corresponding to the number of independent variables. Denoting these
constants by α1 , . . . , αn , and A, we write
S = f (t, q1 , . . . , qn , α1 , . . . , αn ) + A.
Indeed, one of these constants must be additive since the action appears in the
PDE (6.4) only through its partial derivatives and hence is invariant under the
addition of a constant. Mathematically, we require that a complete integral
S(t, q, α) to the Hamilton–Jacobi equation (6.4) satisfies
det(∂²S/(∂q ∂α)) ≠ 0
in order to avoid incomplete solutions.
The function f (t, q1 , . . . , qn , α1 , . . . , αn ) induces a change of coordinates.
(We will develop this idea more generally in section 7.6, where we will see that
f is an example of a generating function which generates a canonical transfor-
mation.) Think of α1 , . . . , αn as new momenta, and let β1 , . . . , βn denote new
coordinates to be chosen. Note that by the chain rule,
df/dt = ∂f/∂t + (∂f/∂q) · q̇ + (∂f/∂α) · α̇. (6.5)
The qi derivatives are the momenta by (6.1), and we set
βᵢ := −∂f/∂αᵢ. (6.6)
Φ(t, q₂, . . . , q_n, ∂S′/∂q₂, . . . , ∂S′/∂q_n, φ(q₁, dS₁/dq₁)) = 0. (6.10)
Note that q₁ only influences φ and is entirely independent from the rest of
the expression. As the variables are independent, this can only happen if φ is
constant:
φ(q₁, dS₁/dq₁) = α₁,   Φ(t, q₂, . . . , q_n, ∂S′/∂q₂, . . . , ∂S′/∂q_n, α₁) = 0. (6.11)
∂S/∂q₁ = α₁,   S = S′(q₂, . . . , q_n, t) + α₁q₁. (6.14)
If this cyclic variable is time, then the system is conservative and we saw
in (6.8) that the action is given by
Using (6.23), we may write the angle variables as functions of time. Absorbing
the integration constants of the wi into the coefficients A` , we get
F = Σ_{ℓ∈Zⁿ} A_ℓ exp(i t ℓ · ∂E/∂I). (6.25)
the system is said to be completely degenerate. In the latter case, all motion
is periodic, and so we must have a full set of 2n−1 conserved quantities. Only n
of these will be independent, and so they can be defined to be the action variables
I₁, . . . , I_n. The remaining n − 1 constants may be chosen to be wᵢ ∂E/∂I_k − w_k ∂E/∂Iᵢ
for distinct i, k, since
(d/dt)(wᵢ ∂E/∂I_k − w_k ∂E/∂Iᵢ) = ẇᵢ ∂E/∂I_k − ẇ_k ∂E/∂Iᵢ = (∂E/∂Iᵢ)(∂E/∂I_k) − (∂E/∂I_k)(∂E/∂Iᵢ) = 0.
Note, however, that since the angle variables are not single-valued, neither will
be the n − 1 constants of motion.
Consider a partial degeneracy, say, of frequencies 1 and 2. This means
k₁ ∂E/∂I₁ = k₂ ∂E/∂I₂ (6.26)
for some k₁, k₂ ∈ Z. The quantity w₁k₁ − w₂k₂ will then be conserved, since
(d/dt)(w₁k₁ − w₂k₂) = ẇ₁k₁ − ẇ₂k₂ = k₁ ∂E/∂I₁ − k₂ ∂E/∂I₂ = 0. (6.27)
Note that this quantity is single-valued modulo 2π, and so a trigonometric
function of it will be an actual conserved quantity.
In general, for a system with n degrees of freedom whose action is totally
separable and which has n single-valued integrals of motion, the system state moves
densely in an n-dimensional manifold in the 2n-dimensional phase space. For degen-
erate systems we have more than n integrals of motion, and consequently the
system state is confined to a manifold of dimension less than n. If a system instead
has fewer than n single-valued integrals of motion, then the system state travels within a
manifold of dimension greater than n.
6.4. Geometric optics analogy
Hamilton came up with the principle of least action while studying optics,
and was inspired by Fermat’s optics principle. In this section we will see that the
level sets of the action propagate through configuration space mathematically
similar to how light travels through a medium. This connection brings a physical
analogy to the formerly abstract notion of the action.
Suppose we have a system consisting of one particle moving in the Euclidean
space R3 for which the total energy is conserved. (Although this analogy holds
for systems with multiple particles and more complicated configuration spaces,
we will focus on this simple case for convenience.) Equation (6.15) tells us that
the action is given by
S(q, t) = W (q) − Et. (6.28)
Take Cartesian coordinates q = x = (x, y, z) ∈ R3 , and consider the motion
of the action level surfaces S(t, q) = b with time in R3 . (If we were to gener-
alize this argument to multiple-particle systems, then instead of the particle’s
motion in Cartesian space we must consider the path that the system traces
out in configuration space.) At time t = 0, we have an equation for Hamilton’s
characteristic function W = b, and after a time step Δt we then have
W = b + EΔt,
since dW/dt = E.
The propagation of this surface can be thought of as a wavefront. If we call
the distance traveled normal to the wavefront ds, then we also have
∂W/∂s = |∇W|. (6.29)
The velocity u of the wavefront is then
u = ds/dt = (dW/|∇W|)/(dW/E) = E/|∇W|.
or after rearranging,
(∇W)² = 2m(E − V). (6.30)
Plugging this into the velocity (6.29), we have
u = E/√(2m(E − V)) = E/√(2mK) = E/p.
The faster the particle moves, the slower the action level sets propagate.
The momentum is given by
p = ∂W/∂q = ∇W.
The gradient is of course normal to the level sets, and so this relation tells us
that the particle always moves normal to the level sets of the characteristic
function W .
We will now see how the level sets of the action propagate like waves. For
some scalar-valued function φ, the wave equation of optics is
∇²φ − (n²/c²) ∂²φ/∂t² = 0. (6.31)
If the refractive index n is constant, then there is a family of plane wave solu-
tions:
φ(t, r) = φ0 ei(k·x−ωt) , (6.32)
where A(x) is related to the amplitude of the wave, k₀ = ω/c is the wave number
in vacuum (n ≡ 1), and L(x) is called the optical path length of the wave.
Plugging in (6.33), the wave equation (6.31) becomes
6.5. Exercises
6.1 (Harmonic oscillator). Consider a particle of mass m in the one-dimensional harmonic oscillator potential, with Hamiltonian H = p²/(2m) + ½mω²q².
(a) Write down the Hamilton–Jacobi equation (6.4) for this system. As this
system is conservative, we expect a solution (up to an arbitrary additive
constant) of the form
S(q, α, t) = W (q, α) − αt
where the constant α is the total energy. Plug this ansatz into the
Hamilton–Jacobi equation and conclude that
W = ±mω ∫ √(2α/(mω²) − q²) dq.
This integral can be evaluated further, but it is not necessary for our
purposes.
(b) The quantity β will implicitly give us the equation of motion q(α, β, t).
Using the definition (6.6), show that
t + β = (1/ω) sin⁻¹(√(m/(2α)) ωq) + constant   if ± = +,
t + β = (1/ω) cos⁻¹(√(m/(2α)) ωq) + constant   if ± = −.
Note that we may assume that we are in the case ± = − and absorb the
integration constant into β, which has yet to be determined. Altogether,
this yields the familiar equation of motion
q(t) = √(2α/(mω²)) cos[ω(t + β)].
(c) To find the constants we must apply the initial conditions q(0) = q0 , p(0) =
p₀. Determine α and β using p₀ = (∂S/∂q)|_{t=0} and q(0) = q₀ respectively, and
obtain the solution as a function of the initial values:
q(t) = √(q₀² + p₀²/(m²ω²)) cos(ωt + cos⁻¹[q₀/√(q₀² + p₀²/(m²ω²))]).
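A quick numerical cross-check of part (c) (an illustrative sketch only): integrate Newton's equation mq̈ = −mω²q directly and compare with the closed form, taking p₀ < 0 so that the branch ± = − chosen in part (b) applies.

import numpy as np
from scipy.integrate import solve_ivp

m, w = 1.0, 2.0
q0, p0 = 0.5, -0.3          # initial data with p0 < 0

A = np.sqrt(q0**2 + p0**2 / (m**2 * w**2))
closed_form = lambda t: A * np.cos(w * t + np.arccos(q0 / A))

# Newton's equation m q'' = -m w^2 q as a first-order system (q, p).
rhs = lambda t, y: [y[1] / m, -m * w**2 * y[0]]
ts = np.linspace(0, 10, 500)
sol = solve_ivp(rhs, (0, 10), [q0, p0], t_eval=ts, rtol=1e-10, atol=1e-12)

print(np.max(np.abs(sol.y[0] - closed_form(ts))))   # tiny: the two agree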
S = W1 (r) + αφ φ − Et.
Plug this into the Hamilton–Jacobi equation (6.16) for conservative sys-
tems and integrate to arrive at
W = W₁(r) + α_φ φ = ± ∫ √(2m(E − V(r)) − α_φ² r⁻²) dr + α_φ φ.
6.3 (Kepler’s problem). We will find the frequency of oscillations for Kepler’s
problem using action variables without solving the equations of motion. Con-
sider a particle of mass m in an inverse-square central force field, as in section 3.3:
K = (m/2)(ṙ² + r²θ̇² + r² sin²θ φ̇²),   V = −kr⁻¹,
p_r = ∂L/∂ṙ = mṙ,   p_θ = ∂L/∂θ̇ = mr²θ̇,   p_φ = ∂L/∂φ̇ = mr² sin²θ φ̇,
H = (1/2m)[p_r² + p_θ²/r² + p_φ²/(r² sin²θ)] − k/r.
(a) Write down the Hamilton–Jacobi equation (6.16) for this conservative sys-
tem. Notice that all of the coordinates are separable, and so the charac-
teristic function is of the form
W = W_r(r) + W_θ(θ) + W_φ(φ).
(b) The coordinate φ is cyclic, and thus ∂W/∂φ = dW_φ/dφ = α_φ is constant;
this is the angular momentum about the z-axis. Plug this in, group the
terms involving only θ, and conclude that
(∂W_θ/∂θ)² + α_φ²/sin²θ = α_θ²,   (1/2m)[(∂W_r/∂r)² + α_θ²/r²] − k/r = E.
(c) Use the three differential equations of part (b) to obtain the action vari-
ables:
I_φ = α_φ = p_φ,
I_θ = (1/2π) ∮ √(α_θ² − α_φ²/sin²θ) dθ,
I_r = (1/2π) ∮ √(2mE + 2mk/r − α_θ²/r²) dr.
(d) Let us look at the second action variable Iθ . We know from section 3.1
that this motion is coplanar, so let ψ denote the angle in the plane of orbit.
Set the momentum in the (r, θ, φ) variables and (r, ψ) variables equal, and
conclude that pθ dθ = p dψ − pφ dφ. Conclude that
Iθ = p − pφ = αθ − αφ .
(e) Now for the third action variable Ir . The integral for Ir is evaluated
between two turning points r1 , r2 for which the integrand pr = mṙ must
vanish. We can therefore integrate from r2 to r1 and back to r2 , for which
the integrand is first negative then positive, corresponding to the sign of
the momentum pr = mṙ. In the complex plane, this integrand is analytic
everywhere but r = 0 and along the segment on the real axis connecting
r1 and r2 . Integrate around a counterclockwise contour enclosing r1 and
r2 to obtain
I_r = −α_θ + mk/√(−2mE) = −(I_θ + I_φ) + mk/√(−2mE).
Note that the energy
E(I) = −mk²/(2(I_r + I_θ + I_φ)²)
depends only on the sum of the action variables, and so the frequencies all coincide:
∂E/∂I_r = ∂E/∂I_θ = ∂E/∂I_φ = mk²/(I_r + I_θ + I_φ)³.
This agrees with the fact that the force is rotationally symmetric.
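Part (e) can be checked against Kepler's third law: since E = −k/(2a) for an ellipse of semi-major axis a (a standard Kepler fact), the formula E(I) gives I_r + I_θ + I_φ = √(mka), and the common frequency ∂E/∂I should equal 2π/T with T = 2π√(ma³/k). A short numerical sketch with arbitrary illustrative values:

import math

m, k, a = 2.0, 5.0, 3.0            # arbitrary illustrative values

I = math.sqrt(m * k * a)           # I_r + I_theta + I_phi for this orbit
freq_action = m * k**2 / I**3      # dE/dI from part (e)
freq_kepler = 2 * math.pi / (2 * math.pi * math.sqrt(m * a**3 / k))
print(freq_action, freq_kepler)    # both equal sqrt(k / (m a^3))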
PART III
HAMILTONIAN MECHANICS
CHAPTER 7
HAMILTON’S EQUATIONS
Next, we will show that given a Hamiltonian, Hamilton’s equations are equiv-
alent to Lagrange’s equations for the corresponding Lagrangian. In section 7.2,
we will see how to transform a Lagrangian into a Hamiltonian, and so together
this will complete the equivalence of Hamiltonian and Lagrangian mechanics.
Proposition 7.1 (Principle of least action). The path (p(t), q(t)) is a critical
point for the functional
S(p(t), q(t)) = ∫₀ᵀ [Σᵢ₌₁ⁿ pᵢ(t)q̇ᵢ(t) − H(p(t), q(t))] dt (7.3)
over fixed-endpoint variations if and only if (p(t), q(t)) solves Hamilton’s equa-
tions (7.2).
Heuristically, we can write the action integral (7.3) as
S = ∫ (p dq − H dt).
0 = (d/dε) S(p + εφ, q + εψ)|_{ε=0}
  = ∫₀ᵀ Σᵢ₌₁ⁿ [φᵢ q̇ᵢ + pᵢψ̇ᵢ − (∂H/∂pᵢ) φᵢ − (∂H/∂qᵢ) ψᵢ] dt
  = pψ|_{t=0}^{t=T} + ∫₀ᵀ φ · (q̇ − ∂H/∂p) dt + ∫₀ᵀ (−ṗ − ∂H/∂q) · ψ dt.
The first term on the RHS vanishes as ψ(0) = 0 = ψ(T ). In order for this to
vanish for all such φ, ψ, we must have that Hamilton’s equations (7.2) hold.
Next we will see how conservation of momentum and energy are manifested in
the Hamiltonian perspective. A position variable qk is cyclic if the Hamiltonian
H is independent of qk (but may depend on pk ). For such a variable, the
corresponding component of Hamilton’s equations (7.2) reads
q̇_k = ∂H/∂p_k,   ṗ_k = 0. (7.4)
The second of these equations expresses the conservation of pk , which we record
in the following statement:
Proposition 7.2 (Conservation of momentum). If the Hamiltonian is inde-
pendent of a position variable qk (but possibly dependent on pk ), then the cor-
responding momentum pk is conserved.
where in the second equality we used that q(t) and p(t) solve Hamilton’s equa-
tions (7.2).
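As a small illustration (not from the text), Hamilton's equations (7.2) can be integrated numerically and the conservation of H checked directly, here for the one-dimensional oscillator H = p²/(2m) + ½mω²q²:

import numpy as np
from scipy.integrate import solve_ivp

m, w = 1.0, 1.5

def hamilton(t, y):
    q, p = y
    return [p / m, -m * w**2 * q]     # qdot = dH/dp, pdot = -dH/dq

H = lambda q, p: p**2 / (2 * m) + 0.5 * m * w**2 * q**2
sol = solve_ivp(hamilton, (0, 50), [1.0, 0.0], rtol=1e-10, atol=1e-12)
energies = H(sol.y[0], sol.y[1])
print(energies.max() - energies.min())   # ~1e-7: H is conserved along the flow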
f(x) = ½Ax · x + b · x + c,
0 = (∂F/∂x)(x*) = (∂/∂x)[x · ξ − ½Ax · x − b · x − c]|_{x=x*} = ξ − Ax* − b.
Figure 7.1: Graphically, f*(ξ) is the distance of the origin to the y-intercept of
the supporting hyperplane to the graph of f(x) with slope ξ.
This equation has one critical point x∗ = A−1 (ξ − b), which must be our
maximum. Plugging this back in, we get
\[
\begin{aligned}
f^*(\xi) = F(x^*(\xi), \xi)
&= A^{-1}(\xi - b)\cdot\xi - \tfrac{1}{2}(\xi - b)\cdot A^{-1}(\xi - b) - b\cdot A^{-1}(\xi - b) - c \\
&= \tfrac{1}{2} A^{-1}(\xi - b)\cdot(\xi - b) - c.
\end{aligned}
\]
In particular, if we take A = mI, b = 0, and c = 0, then both
\[ f(x) = \tfrac{1}{2} m|x|^2, \qquad f^*(\xi) = \tfrac{1}{2m}|\xi|^2 \]
are the kinetic energy once we recognize x as the velocity and ξ as the momentum.
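As a small numerical illustration (not from the text; numpy/scipy assumed, with an arbitrary mass value), one can compute the supremum in the Legendre transform directly and compare with the closed form |ξ|²/(2m):

```python
import numpy as np
from scipy.optimize import minimize

m = 2.0                                    # illustrative mass value (an assumption)

def f(x):
    return 0.5 * m * np.dot(x, x)          # kinetic-energy-type convex function

def legendre(f, xi, x0):
    # f*(xi) = sup_x [x·xi - f(x)], computed by minimizing the negative
    res = minimize(lambda x: f(x) - np.dot(x, xi), x0)
    return -res.fun

xi = np.array([0.7, -1.3])
print(legendre(f, xi, np.zeros(2)))        # numerical supremum
print(np.dot(xi, xi) / (2 * m))            # closed form |xi|^2 / (2m)
```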
The Legendre transform can be defined more generally (cf. Exercise 7.3),
but with these hypotheses on f we have that f ∗ satisfies the same hypotheses
(although they are not in one-to-one correspondence):
Theorem 7.5 (Involution property). If f : Rⁿ → R is a smooth, nonnegative, strictly convex function with f(0) = 0 that satisfies f(x)/|x| → +∞ as |x| → ∞, then the Legendre transform (7.5) satisfies the same hypotheses and we have (f*)* = f.
Proof. It is immediate that f* is nonnegative and that f*(0) = 0. Moreover, f* is convex as the supremum of convex functions (indeed, F(x, ξ) is affine in ξ).
Next we differentiate g := f ∗ . As the unique maximizer of the distance
F (x, ξ), we know that x∗ = x∗ (ξ) is the unique solution to
\[ 0 = \frac{\partial F}{\partial x}(x^*) = \xi - \nabla f(x^*). \]
The function ∇f is a local diffeomorphism on Rⁿ by the inverse function theorem, since (∇f)′ = f″ is positive definite and hence invertible. This means that the above equation has a unique solution
\[ x^*(\xi) = (\nabla f)^{-1}(\xi) \tag{7.6} \]
which is smooth because ∇f is, and hence g(ξ) = F(x*(ξ), ξ) is smooth as well. The inverse function theorem also tells us that the derivative of x*(ξ) is
\[ Dx^*(\xi) = \big[ f''(x^*(\xi)) \big]^{-1}, \]
where f″(x) is the Hessian matrix of second derivatives. The first derivative of g(ξ) := f*(ξ) is then
\[ \nabla g(\xi) = x^*(\xi), \tag{7.7} \]
and hence the second derivative is Dx*(ξ) = [f″(x*(ξ))]⁻¹, which is positive definite since f″(x) is. That is, the Legendre transform f* = g is also strictly convex.
Now that we know g := f* is also convex, we may consider its Legendre transform. Let ξ*(x) be
the point which attains the supremum for g ∗ . By (7.6) this point is uniquely
determined by
ξ ∗ (x) = (∇g)−1 (x) = ∇f (x),
where in the last equality we used (7.7). Comparing this with (7.6), we see that this is the inverse of the function x*(ξ). Consequently, the transform of g(ξ) is
given by
and so (f ∗ )∗ = f as desired.
Within the context of mechanics, the importance of the Legendre transform
is contained in the following simple calculation:
Proposition 7.6. Let M be an n × n positive definite matrix. The Legendre transform in the velocity variable of the conservative Lagrangian L(q, q̇) = ½ M q̇ · q̇ − V(q) is the Hamiltonian
\[ H(q, p) = \tfrac{1}{2} M^{-1} p\cdot p + V(q) \]
\[
\frac{dF}{dt} = \frac{\partial F}{\partial t} + \frac{\partial F}{\partial q}\cdot\dot q + \frac{\partial F}{\partial p}\cdot\dot p
= \frac{\partial F}{\partial t} + \frac{\partial F}{\partial q}\cdot\frac{\partial H}{\partial p} - \frac{\partial F}{\partial p}\cdot\frac{\partial H}{\partial q}. \tag{7.9}
\]
{F G, H} = F {G, H} + G {F, H} ,
which can be seen by using the product rule for F G and expanding. We also
have the chain rule
{F, g(H)} = g 0 (H){F, H}. (7.12)
Although slightly less obvious, we claim that the Poisson bracket also satisfies
the Jacobi identity:
Let's focus on one of the terms above, say {H, {F, G}}. As {F, G} is a linear expression in terms of the first derivatives of F and G, {H, {F, G}} is also a linear expression with each term containing exactly one second derivative of F or G. Let D_G(φ) = {G, φ} and D_H(φ) = {H, φ}. Note that the first term of (7.13) does not contain any second derivatives of F, and so all of the second derivatives of F are contained within the difference D_G(D_H(F)) − D_H(D_G(F)). In terms of the coordinates x = (q, p), we may write D_G and D_H as
\[
D_G = \sum_j \left( \frac{\partial G}{\partial p_j}\frac{\partial}{\partial q_j} - \frac{\partial G}{\partial q_j}\frac{\partial}{\partial p_j} \right) = \sum_k \xi_k \frac{\partial}{\partial x_k},
\qquad
D_H = \sum_j \left( \frac{\partial H}{\partial p_j}\frac{\partial}{\partial q_j} - \frac{\partial H}{\partial q_j}\frac{\partial}{\partial p_j} \right) = \sum_k \eta_k \frac{\partial}{\partial x_k}.
\]
In taking the difference of these, we see that the second terms in these last
equalities cancel and we are left with
\[
D_G D_H - D_H D_G = \sum_{k,\ell} \left( \xi_k \frac{\partial\eta_\ell}{\partial x_k} - \eta_k \frac{\partial\xi_\ell}{\partial x_k} \right) \frac{\partial}{\partial x_\ell}.
\]
That is, all of the second derivatives of F cancel, leaving only first derivatives.
By symmetry this must also be true for G and H, and since every term of (7.13)
contains only terms containing exactly one second derivative, then all the terms
must cancel.
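The cancellation above can also be cross-checked symbolically. Here is a minimal sketch with sympy (assumed available), verifying the Jacobi identity for the canonical bracket in one degree of freedom:

```python
import sympy as sp

q, p = sp.symbols('q p')
F, G, H = (sp.Function(name)(q, p) for name in ('F', 'G', 'H'))

def pb(A, B):
    # canonical Poisson bracket {A, B} = A_q B_p - A_p B_q
    return sp.diff(A, q)*sp.diff(B, p) - sp.diff(A, p)*sp.diff(B, q)

jacobi = pb(F, pb(G, H)) + pb(G, pb(H, F)) + pb(H, pb(F, G))
print(sp.simplify(jacobi))   # prints 0: all second-derivative terms cancel
```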
In order to make the equation of motion (7.11) entirely coordinate-free, we
need a new coordinate-free definition of the Poisson bracket. Specifically, out of
the properties we just observed, we take the following as an abstract definition:
The properties of Definition 7.7 are easily verifiable for this matrix. Equivalently,
the entries of this matrix are uniquely determined by the relations
(Again, there is another popular convention that comes from swapping the order
of F and H in the definition of {F, H}.) We call XH the Hamiltonian vector
field associated to H. This is not the only way that we can turn an observable
H into a vector field. Indeed, the gradient vector field ∇H induces the gradient
flow (cf. (2.5)), and this is associated to the dot product structure F 7→ ∇F ·∇H.
The Hamiltonian vector field is the analogous object for the Poisson structure.
In Hamiltonian mechanics, the state of the system is described by a point
x in Rd (or on a manifold) endowed with a Poisson structure. The evolution is
dictated by a Hamiltonian (or total energy) H : Rd → R via Hamilton’s
equations
ẋ = XH = J∇H. (7.15)
These are the generalization of (7.2) and (7.11) to arbitrary Poisson structures.
For future reference, we also note that the above calculation implies
In other words, the evolution of F under the flow of H can be recast as the
evolution of H under the flow of F .
Example 7.12. To put classical mechanics in this framework, we adopt the
canonical bracket of Example 7.10. Here, qi are the position coordinates and pi
are the corresponding momenta. The Hamiltonian is the total energy
\[ H(q, p) = \tfrac{1}{2m} p^2 + V(q). \]
where we used that the Poisson bracket satisfies the chain rule (7.12). The first
equation is the definition of momentum, and the second equation is Newton’s
equation.
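As a concrete illustration of this framework (a sketch under assumed illustrative choices: unit mass and a quartic potential, neither taken from the text), one can integrate Hamilton's equations ẋ = J∇H numerically and observe conservation of H:

```python
import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
V  = lambda q: 0.5 * q**2 + 0.1 * q**4         # illustrative potential (our choice)
dV = lambda q: q + 0.4 * q**3

def hamilton(t, x):
    q, p = x
    return [p / m, -dV(q)]                     # q̇ = ∂H/∂p,  ṗ = -∂H/∂q

H = lambda q, p: p**2 / (2*m) + V(q)

sol = solve_ivp(hamilton, (0.0, 20.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
q, p = sol.y
print(H(q[0], p[0]), H(q[-1], p[-1]))          # the energy is conserved along the flow
```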
Example 7.13. For a free relativistic particle, the total energy is given by
Einstein's equation
\[ H = \sqrt{c^2 p^2 + m^2 c^4}. \]
This yields
\[ \dot q = \frac{cp}{\sqrt{m^2 c^2 + p^2}}, \qquad \dot p = 0. \]
In particular, we see that |q̇| ≤ c with equality if and only if m = 0. These
equations of motion can be used to derive the Lorentz formulas, which lie at the
heart of relativistic mechanics. In fact, Hamiltonian mechanics is compatible with relativity. However, relativity is not built into the Hamiltonian framework, and it turns out to be easier to incorporate relativity into Lagrangian mechanics.
Next, we record some easy consequences:
Lemma 7.14. (a) The observable F is conserved under the flow of H if and
only if {F, H} ≡ 0 (and we may swap F, H in either of these statements).
(b) We have [XF , XH ] = −X{F,H} .
(c) If {F, H} is a constant, then the F -flow commutes with the H-flow.
Proof. (a) This follows from the definition XH F = {F, H}, and we may swap
F and H in either statement by the observation (7.16).
(b) We compute
The RHS is exactly the time derivative of {F, G} under the flow of H.
Example 7.17. If L1 = x2 p3 − x3 p2 and L2 = x3 p1 − x1 p3 are the first and
second components of the angular momentum L = x × p of a particle x ∈ R3 ,
then their bracket
Although powerful, it should be noted that the new quantity {F, G} is not
guaranteed to be nontrivial. For example, a system with n degrees of freedom
can only have up to 2n − 1 independent conserved quantities, and so repeated
applications of Proposition 7.16 must eventually stop producing independent
quantities.
for all i, j, `.
and so λ = µ = 0 as desired.
By the claim and Proposition A.19, we can find a new full set of coordinates
x1 , . . . , xd so that
∂ ∂
= Xp1 , = Xq1 .
∂x1 ∂x2
It is important to know that it is possible to extend to a full set of coordinates at this step. Indeed, in order to even define ∂/∂x₁ we need a full set of coordinates. (In statistical mechanics, the notation (∂E/∂V)_P is expressly intended to resolve this issue.)
Next we claim that (p1 , q1 , x3 , . . . , xd ) are also valid coordinates on a neigh-
borhood of x0 —i.e. their gradients at x0 are linearly independent. Suppose
that
\[ 0 = \nabla\Big( c_1 p_1 + c_2 q_1 + \sum_{\ell=3}^d c_\ell x_\ell \Big)(x_0) \]
for some constants c₁, . . . , c_d. Then
\[
0 = \Big\{ c_1 p_1 + c_2 q_1 + \sum_{\ell=3}^d c_\ell x_\ell,\; p_1 \Big\}
= c_1\{p_1, p_1\} + c_2\{q_1, p_1\} - \sum_{\ell=3}^d c_\ell \frac{\partial x_\ell}{\partial x_1} = c_2,
\]
and so c₂ = 0. Similarly,
\[
0 = \Big\{ c_1 p_1 + c_2 q_1 + \sum_{\ell=3}^d c_\ell x_\ell,\; q_1 \Big\} = -c_1.
\]
We will soon make this relationship precise in Proposition 7.20. Note that if
we replace the canonical structure matrix J0 by the identity matrix I then we
recover the orthogonal group; this reflects that I is the structure matrix for the
dot product.
First, we record some facts about Sp(R2n ):
Proposition 7.19. Sp(R2n ) is a group that is closed under transposition.
Proof. To see that Sp(R2n ) is a group, we check that it is closed under inversion.
Suppose that U satisfies (7.19). Then U has nonzero determinant, and so by
multiplying by U −1 and U −T we see that J0 = U −T J0 U −1 .
The group Sp(R2n ) is also closed under transposition. This does not fol-
low from taking the transpose of (7.19) however, because taking the transpose
of (7.19) does not yield anything new. Instead, we take the inverse of (7.19) to
obtain −U −1 J0 U −T = −J0 . As Sp(R2n ) is closed under inversion, we conclude
that U T ∈ Sp(R2n ).
In the case of the canonical Poisson bracket, we have:
Proposition 7.20. Suppose J = J0 . Then Ψ is canonical if and only if Ψ0 ∈
Sp(R2n ).
Proof. The relation (7.18) reads Ψ0 J0 (Ψ0 )T = J0 , and we know that Sp(R2n ) is
closed under transposition.
Note that the block matrix
\[ U = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \]
is symplectic if and only if
\[ A^T C = C^T A, \qquad B^T D = D^T B, \qquad A^T D - C^T B = I. \tag{7.20} \]
for all i, j = 1, . . . , n.
Proof. We may assume that the Poisson bracket is given in canonical coordinates
(q, p) by Theorem 7.18.
The forward implication is immediate, since we can replace the differentia-
tion variables q, p with Q, P by assumption. For example,
In this section we will prove Liouville’s theorem, which says that Hamiltonian
flows preserve Lebesgue measure on phase space. In other words, the density of
trajectories in phase space surrounding a given trajectory is constant in time.
Given a Borel measure µ on R^d, we say that a continuous map Φ : R^d → R^d preserves the measure µ if
\[ \mu(\Phi^{-1}(A)) = \mu(A) \quad\text{for every Borel set } A \subset \mathbb{R}^d. \tag{7.21} \]
(As in measure theory, we need to use Φ⁻¹ here instead of Φ because Φ⁻¹(A) is Borel when A is by the continuity of Φ, but Φ(A) might not be.) The condition (7.21) is equivalent to
\[ \int (f\circ\Phi)(x)\, d\mu(x) = \int f(x)\, d\mu(x) \quad\text{for all } f \in C_c^\infty(\mathbb{R}^d). \tag{7.22} \]
Indeed, note that taking f = 1A in (7.22) yields (7.21), and the equivalence is
proved by approximating 1_A by smooth functions. The conditions (7.21)–(7.22) are also sometimes conveyed by saying that the pushforward measure µ ∘ Φ⁻¹ is equal to µ. (Recall that measures push forward because they are dual to functions, which pull back.)
Liouville’s theorem follows from the following general fact about ODEs:
\[
\begin{aligned}
\frac{d}{dt}\Big[ (\omega\circ\Phi)(t,\xi)\, \det D_\xi\Phi(t,\xi) \Big]
&= \bigg[ (\nabla\omega\circ\Phi)(t,\xi)\cdot(X\circ\Phi)(t,\xi) + (\omega\circ\Phi)(t,\xi) \sum_{i=1}^d \Big(\frac{\partial X_i}{\partial x_i}\circ\Phi\Big)(t,\xi) \bigg] \det D_\xi\Phi(t,\xi) \\
&= \big[ \nabla\cdot(\omega X) \big]\circ\Phi(t,\xi)\, \det D_\xi\Phi(t,\xi).
\end{aligned}
\]
The RHS vanishes by premise, and so we conclude that the quantity in the time
derivative on the LHS is equal to its initial value:
This is exactly the condition (7.22), and so we conclude that the flow Φ preserves
ω(x) dx.
As an immediate corollary, we obtain Liouville’s theorem for the canonical
Poisson bracket:
\[ \nabla\cdot X_H = \frac{\partial^2 H}{\partial q\,\partial p} - \frac{\partial^2 H}{\partial p\,\partial q} \equiv 0, \]
and so its flow preserves Lebesgue measure by Lemma 7.22.
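Here is a minimal numerical sketch of this corollary (assuming scipy and using the pendulum Hamiltonian H = p²/2 + (1 − cos q) as an illustrative example): integrating the variational equation alongside the flow shows that the Jacobian determinant of the flow map stays equal to 1.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Pendulum Hamiltonian H = p^2/2 + (1 - cos q), an illustrative choice.
def rhs(t, y):
    q, p, *J = y                      # J holds the 2x2 Jacobian of the flow, row-major
    J = np.array(J).reshape(2, 2)
    A = np.array([[0.0, 1.0],         # DX_H = [[0, 1], [-cos q, 0]], which is trace-free
                  [-np.cos(q), 0.0]])
    dJ = A @ J
    return [p, -np.sin(q), *dJ.ravel()]

y0 = [2.0, 0.5, 1.0, 0.0, 0.0, 1.0]    # initial point plus the identity Jacobian
sol = solve_ivp(rhs, (0.0, 30.0), y0, rtol=1e-10, atol=1e-12)

Jfinal = sol.y[2:, -1].reshape(2, 2)
print(np.linalg.det(Jfinal))           # ≈ 1: the flow preserves phase-space area
```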
Next, we would like to extend this fact to all Poisson structures. First, we
will need:
Proposition 7.24. The flow map Φ(t, ·) of any Hamiltonian vector field is
canonical.
Proof. By Theorem 7.18, we may work in local coordinates so that
\[ J = \begin{pmatrix} 0 & I & 0 \\ -I & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \]
In particular, this holds for the flow map of any Hamiltonian vector field.
Proof. Consider the change of variables
\[ y = \Phi(x), \qquad dy = |\det\Phi'|\, dx. \]
As J ∘ Φ = Φ′ J (Φ′)ᵀ, we have
\[
\int f(y)\, \frac{dy}{|\det J(y)|^{1/2}}
= \int (f\circ\Phi)(x)\, \frac{|\det\Phi'|\, dx}{|\det J\circ\Phi|^{1/2}}
= \int (f\circ\Phi)(x)\, \frac{dx}{|\det J|^{1/2}}.
\]
Liouville’s theorem has many important consequences. For example, we
now know that for a Hamiltonian system there can be no asymptotically stable
equilibrium points or asymptotically stable closed trajectories in phase space, since these would require the density of phase curves to increase around such phenomena. We also have the following phenomenon:
Corollary 7.26 (Poincare’s recurrence theorem). Fix t ∈ R and D ⊂ R2n a
bounded region of phase space, and let Φ := Φ(t, ·) denote the flow by time t of a
Hamiltonian vector field. Then for any positive measure set U ⊂ D there exists
x0 ∈ U and a positive integer n such that φn (x0 ) ∈ U .
If the motion is bounded—as is the case for a conservative Newtonian system
with potential energy V (x) → +∞ as |x| → ∞—this means that the system
will return to an arbitrary vicinity of any given possible configuration (q, p) ∈
R2d infinitely often, given enough time. For example, suppose we opened a
connection between chambers of gas and vacuum. Then Corollary 7.26 says that
the gas molecules will eventually all return to the initial chamber, seemingly in
violation of the second law of thermodynamics. Although it may appear that Poincaré's theorem conflicts with the second law, the time scales are often quite large (in our example it is longer than the age of the universe) and so there is no contradiction.
Proof. Given a smooth Hamiltonian, Hamilton’s equations are a smooth system
of ODEs and so the subsequent flow Φ : R2n → R2n is injective by uniqueness
of solutions. Liouville’s theorem (Theorem 7.25) tells us that Φ preserves phase
volume.
Consider the collection of sets U, Φ(U ), Φ2 (U ), · · · ⊂ D. As Φ is volume-
preserving, all of these sets must have the same volume. On the other hand,
D is bounded and thus has finite volume, and so it is impossible for all of
these sets to be disjoint. That is, there exist some distinct j < k such that Φʲ(U) ∩ Φᵏ(U) ≠ ∅. As Φ is injective, this requires Φ^{k−j}(U) ∩ U ≠ ∅. Namely, we can pick some x₀ ∈ U in this intersection, which gives Φ^{k−j}(x₀) ∈ U.
In fact, the proof of Corollary 7.26 shows that the set of points in U which
do not return to U infinitely often has measure zero.
Together, a measure space X with a finite measure µ and a measurable function φ : X → X that is measure preserving constitute a measure-preserving system, which is the fundamental object of study in discrete dynamical systems.
In fact, phase volume is the only measure on phase space that can be pre-
served:
Proposition 7.27. Suppose that J is nondegenerate. If a smooth measure
ω(x) dx is invariant under all Hamiltonian flows, then it must be a scalar mul-
tiple of the phase volume measure.
Proof. By Theorem 7.18, we may work in local canonical coordinates. Suppose that the measure
\[ \omega(p_1, \ldots, p_n, q_1, \ldots, q_n)\, dp_1 \cdots dp_n\, dq_1 \cdots dq_n \]
is invariant under all Hamiltonian flows. Taking the Hamiltonian H = p_k we have the vector field X_H = ∂/∂q_k, and so Lemma 7.22 requires that
\[ 0 = \nabla\cdot(\omega X_H) = \frac{\partial\omega}{\partial q_k} \]
for all k. Taking H = q_k, we similarly conclude that ∂ω/∂p_k = 0 for all k. Therefore ∇ω ≡ 0, and hence ω is a constant.
Let
\[ d\nu := |\det J(x)|^{-1/2}\, dx \]
denote the phase volume measure. The Gibbs' measure or Gibbs' state associated to a Hamiltonian system at temperature T > 0 is
\[ \tfrac{1}{Z}\, e^{-\beta H}\, d\nu, \]
where β = 1/(k_B T), Boltzmann's constant k_B is a universal constant, and the partition function Z is a normalization constant chosen so that the total integral of the Gibbs' measure is one. Note that this definition requires that e^{−βH} is integrable. When the temperature T is small, β is large and hence the Gibbs' measure is supported near the minima of H. When T is large, β is small and the Gibbs' measure is supported more evenly everywhere.
The Gibbs’ measure can also be characterized via a variational principle, but
this still requires phase volume (via the entropy). On the other hand, we also
have the following algebraic characterization:
Proposition 7.28. The Gibbs’ state is the unique probability measure that sat-
isfies the classical Kubo–Martin–Swinger (KMS) condition:
E[{F, G}] = βE[{F, H}G] for all F, G ∈ Cc∞ (Rd ).
Here, the expected value E is defined in terms of a probability measure ω(x) dx
via Z
E[X] = X(x)ω(x) dx.
In other words,
\[ 0 = \int (\omega e^{\beta H})\{F, K\}\, dx \quad\text{for all } F, K \in C_c^\infty(\mathbb{R}^d). \]
7.8. Exercises
\[ L = \tfrac{1}{2} m v^2 - q\phi + \tfrac{q}{c}\, v\cdot A, \]
where φ and A are the scalar and vector potentials for the electric and magnetic fields:
\[ B = \nabla\times A, \qquad E = -\nabla\phi - \frac{1}{c}\frac{\partial A}{\partial t}. \]
(a) Show that the Hamiltonian for this system is
\[ H = \tfrac{1}{2m}\big(p - \tfrac{q}{c}A\big)^2 + q\phi. \]
(b) Although the electric and magnetic fields are uniquely determined, the scalar and vector potentials φ and A are not unique and they appear explicitly in the Hamiltonian. In fact, substituting A′ = A + ∇f for any smooth function f(t, x) leaves B unchanged since the curl of a gradient is zero. Show that for E to remain unchanged we need to also substitute
\[ \phi' = \phi - \frac{1}{c}\frac{\partial f}{\partial t}. \]
Together, replacing (A, φ) → (A0 , φ0 ) is called a gauge transformation,
and a specific pair (A, φ) is called a choice of gauge.
(c) As the electric and magnetic fields are unaffected by the choice of gauge,
then any physical laws in terms of the potentials should also be invariant.
Show that under a gauge transformation the Hamiltonian becomes
\[ H' = H - \frac{q}{c}\frac{\partial f}{\partial t}, \]
and that Hamilton’s equations still hold in the new variables.
7.2 (Young’s inequality). Show that for any Legendre transform pair f (x) and
f ∗ (ξ) we have
x · ξ ≤ f (x) + f ∗ (ξ) for all x, ξ ∈ Rn .
Apply this inequality to the function f (x) = |x|p /p for p ∈ (1, ∞) and conclude
\[ x\cdot\xi \le \frac{|x|^p}{p} + \frac{|\xi|^q}{q}, \qquad\text{where } \frac{1}{p} + \frac{1}{q} = 1. \]
7.3 (Properties of the Legendre transform). Suppose that f : Rn → (−∞, +∞]
is lower semicontinuous and not identically +∞.
(a) Define the sub-differential
∂f (x) = {v ∈ Rn : f (y) − f (x) ≥ v · (y − x) for all y ∈ Rn }.
Show that if f is convex, then ∂f (x) is nonempty for all x ∈ Rn . Show
that the Legendre transform f ∗ (ξ) (defined by (7.5)) is equal to x·ξ −f (x)
if and only if ξ ∈ ∂f (x).
(b) Show that f ∗ (ξ) is a lower semicontinuous and convex function. Moreover,
show that f ∗∗ is the largest lower semicontinuous convex function that is
less than or equal to f , and f ∗∗ = f if f is convex.
7.4 (Example of Poisson’s theorem). Let L = x × p denote the angular mo-
mentum of a particle x ∈ R3 . Show that for any unit vector n we have
{L, L · n} = L × n.
(Hint: Fix Cartesian coordinates with respect to n and write down the vector
field corresponding to rotation about the n axis. Apply this vector field to L and
recognize this as the canonical Poisson bracket of L with a certain Hamiltonian.
Conclude that {Li , Lj } = −Lk whenever i, j, k is a cyclic permutation of 1, 2, 3.)
7.5 (Chain rule for Poisson brackets). Given a Poisson bracket in the sense of
Definition 7.7, show that we have the chain rule
{V (q), F } = V 0 (q){q, F }.
(Hint: Use the product rule of Definition 7.7 and the Taylor expansion V (q) =
V (q0 ) + V 0 (q0 )(q − q0 ) + (q − q0 )2 W (q).)
7.6. (a) Show that a symplectic matrix M is orthogonal if and only if it takes
the form
\[ \begin{pmatrix} A & B \\ -B & A \end{pmatrix} \]
with matrices A and B such that A + iB is a complex unitary matrix.
(Hint: For the reverse direction, start with the relation M T (I + iJ)M =
I + iJ.)
(b) Deduce that such matrices have determinant equal to 1. (Hint: Use a
complex matrix to diagonalize M .)
7.7. Hamilton’s equations also arise from a variational principle, but the func-
tional is unbounded and hence less useful. Given a smooth and strictly convex
Hamiltonian H : R2n → R with H(0) = 0 and an energy level α ∈ R, consider
the functional
\[ E(x(t)) = \tfrac{1}{2}\int_0^1 x(t)\cdot J\dot x(t)\, dt \]
with the domain
\[ M_\alpha = \Big\{ x \in C^1(\mathbb{R}; \mathbb{R}^{2n}) : x(t+1) = x(t),\ \int_0^1 H(x(t))\, dt = \alpha \Big\}. \]
Here, J is the canonical structure matrix (9.6). Show that if x(t) is a critical
point of E on Mα , then x(t) = (q(t), p(t)) is a periodic solution of Hamilton’s
equations (7.2). Show that E is not bounded below on {x ∈ C 1 (R; R2n ) :
x(t + 1) = x(t)}.
CHAPTER 8
NORMAL FORMS
Theorem 8.1. Consider R2n endowed with the canonical Poisson bracket J =
J0 and a function Φ : R2n → R2n and write Φ(p, q) = (P (p, q), Q(p, q)).
\[ (P, Q) = \Phi(p, q) \iff p = \frac{\partial W}{\partial q}, \quad Q = \frac{\partial W}{\partial P}. \tag{8.1} \]
Moreover, we have det(∂²W/∂P∂q) ≠ 0.

(b) Conversely, if W(q, P) is smooth and det(∂²W/∂P∂q) ≠ 0, then (8.1) defines a canonical transformation between neighborhoods of (0, 0) and Φ(0, 0).
Proof. (a) We claim that there exists a function V(p, q) so that
\[
\nabla V = \begin{pmatrix} \dfrac{\partial V}{\partial p_i} \\[8pt] \dfrac{\partial V}{\partial q_i} \end{pmatrix}
= \begin{pmatrix} \sum_j p_j \dfrac{\partial q_j}{\partial p_i} \\[8pt] \sum_j p_j \dfrac{\partial q_j}{\partial q_i} \end{pmatrix}
+ \begin{pmatrix} \sum_j Q_j \dfrac{\partial P_j}{\partial p_i} \\[8pt] \sum_j Q_j \dfrac{\partial P_j}{\partial q_i} \end{pmatrix}
= \begin{pmatrix} 0 \\[2pt] p_i \end{pmatrix}
+ \begin{pmatrix} \sum_j Q_j \dfrac{\partial P_j}{\partial p_i} \\[8pt] \sum_j Q_j \dfrac{\partial P_j}{\partial q_i} \end{pmatrix}. \tag{8.2}
\]
For (8.2) to hold, we simply check the equality of mixed partials. We will verify
this for one example:
\[
\begin{aligned}
\frac{\partial^2 V}{\partial q_k\,\partial q_i}
&= \frac{\partial}{\partial q_k}\Big( p_i + \sum_{j=1}^n Q_j \frac{\partial P_j}{\partial q_i} \Big)
= \sum_{j=1}^n \Big( \frac{\partial Q_j}{\partial q_k}\frac{\partial P_j}{\partial q_i} + Q_j \frac{\partial^2 P_j}{\partial q_k\,\partial q_i} \Big), \\
\frac{\partial^2 V}{\partial q_i\,\partial q_k}
&= \frac{\partial}{\partial q_i}\Big( p_k + \sum_{j=1}^n Q_j \frac{\partial P_j}{\partial q_k} \Big)
= \sum_{j=1}^n \Big( \frac{\partial Q_j}{\partial q_i}\frac{\partial P_j}{\partial q_k} + Q_j \frac{\partial^2 P_j}{\partial q_i\,\partial q_k} \Big).
\end{aligned}
\]
Note that the second terms on the RHS agree. As Φ is canonical, then by Proposition 7.20 we know that
\[
\Phi' = \begin{pmatrix} \dfrac{\partial P}{\partial p} & \dfrac{\partial P}{\partial q} \\[8pt] \dfrac{\partial Q}{\partial p} & \dfrac{\partial Q}{\partial q} \end{pmatrix}
= \begin{pmatrix} A & B \\ C & D \end{pmatrix}
\]
satisfies
\[ A^T C = C^T A, \qquad B^T D = D^T B, \qquad A^T D - C^T B = I, \]
which demonstrates the equality of mixed partials for ∂²V/∂q_k∂q_i and ∂²V/∂q_i∂q_k. (The second relation BᵀD = DᵀB is needed for ∂²V/∂p_i∂p_k and the third relation AᵀD − CᵀB = I for ∂²V/∂p_i∂q_k.) In fact, this demonstrates that (8.2) is locally solvable by some V if and only if Φ is canonical.
Next, as det(∂P/∂p) ≠ 0, the implicit function theorem guarantees that (p, q) ↦ (P, q) is a local diffeomorphism. Therefore we may define
It remains to check that the relations (8.1) hold so that W is the generating function for Φ. To emphasize that we are working with the coordinates (P, q), we may write this as
\[ p = \frac{\partial W}{\partial q}\bigg|_P, \qquad Q = \frac{\partial W}{\partial P}\bigg|_q. \]
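Here is a small symbolic sketch of part (b) of Theorem 8.1 (sympy assumed; the generating function below is an illustrative choice of ours, not from the text): solve (8.1) for (P, Q) in terms of (p, q) and check the canonical bracket {Q, P} = 1.

```python
import sympy as sp

q, p, eps = sp.symbols('q p epsilon')

# Illustrative generating function W(q, P) = qP + eps*q^2*P (our choice).
P = p / (1 + 2*eps*q)          # solves p = ∂W/∂q = P(1 + 2 eps q)
Q = q + eps*q**2               # Q = ∂W/∂P

bracket = sp.diff(Q, q)*sp.diff(P, p) - sp.diff(Q, p)*sp.diff(P, q)
print(sp.simplify(bracket))    # prints 1, so (q, p) -> (Q, P) is canonical
```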
\[ \dot y = JH''(0)\, y. \tag{8.4} \]
This is a Hamiltonian flow for the quadratic Taylor approximation of our Hamiltonian:
\[ H(0) + \tfrac{1}{2}\, x\cdot H''(0)x. \tag{8.5} \]
We will study quadratic Hamiltonians in finer detail in section 8.3.
For general ODEs, the Hartman–Grobman theorem (Theorem A.21) tells us
that if the matrix for the linearized flow (8.4) has no purely imaginary eigen-
values, then the actual flow (8.3) is qualitatively determined by the spectrum
of the matrix for the linearized flow. For Hamiltonian flows, the matrix in (8.4)
must be of the form JH 00 (0) for H 00 (0) real and symmetric, and so we have some
conditions on the spectrum:
Proposition 8.2. The linearized equation (8.4) has the following properties.
Note that property (b) tells us that the trace of JH''(0) is zero, and so the linear flow map e^{tJH''(0)} preserves volume in phase space. This must be the case by Liouville's theorem (Proposition 7.23), since the linear flow is Hamiltonian for the Hamiltonian (8.5).
Proof. (a) This is because the matrix JH''(0) is real.
(b) Let σ(JH''(0)) denote the spectrum of JH''(0). As H''(0) is symmetric, we have
(c) This is because the linear flow is Hamiltonian for the Hamiltonian (8.5). Indeed, the flow map for a Hamiltonian flow is a canonical transformation by Proposition 7.24, and for the canonical Poisson bracket this simply means that the derivative of e^{tJH''(0)} is a symplectic matrix by Proposition 7.20.
While property (c) tells us that e^{tJH''(0)} is always a symplectic matrix, not every symplectic matrix admits the representation e^{tJH''(0)} for a symmetric matrix H''(0). For example, the matrix
\[ \begin{pmatrix} -4 & 0 \\ 0 & -\tfrac{1}{4} \end{pmatrix} \tag{8.6} \]
is symplectic, and it cannot be written as e^{tJH''(0)} since then it would admit a real square root e^{tJH''(0)/2}; this root would then have eigenvalues ±2i and ±½i, which is too many. (Note that this example is not near the identity, while e^{tJH''(0)} tends towards the identity as t → 0.) On the other hand, it can be shown that the polar decomposition matrices of a symplectic matrix are symplectic, and so it follows that every symplectic matrix is the product of two matrices of the form e^{tJH''(0)}.
Here, we take the self-adjoint square root M^{1/2}. This is a canonical transformation because the matrix
\[ \begin{pmatrix} A & 0 \\ 0 & A^{-T} \end{pmatrix} \]
is symplectic whenever A is invertible (which is easily verified using (7.20)). In the new variables, the Hamiltonian (8.7) becomes
\[ H = \tfrac{1}{2}|p|^2 + \tfrac{1}{2}\, q\cdot\widetilde V q \tag{8.8} \]
for the new symmetric matrix \widetilde V = M^{-1/2} V M^{-1/2}.
Next, we diagonalize \widetilde V:
\[ \widetilde V = O \begin{pmatrix} \kappa_1 & & \\ & \ddots & \\ & & \kappa_d \end{pmatrix} O^T, \]
\[ H_i = \tfrac{1}{2} p^2 + \tfrac{1}{2}\omega^2 q^2, \]
which is the harmonic oscillator of Example 2.2. Indeed, the equations of motion take the form
\[ \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = JH''(0) \begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -\omega^2 & 0 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}. \]
From its trace and determinant, we see that the matrix on the RHS has eigenvalues ±iω. The trajectories in phase space are ellipses due to the conservation of H. In fact, we can perform one further canonical change of variables
\[ p_{\mathrm{new}} = \omega^{-1/2} p_{\mathrm{old}}, \qquad q_{\mathrm{new}} = \omega^{1/2} q_{\mathrm{old}} \]
to symmetrize the Hamiltonian
\[ H_i = \tfrac{\omega}{2}(p_i^2 + q_i^2) \tag{8.10} \]
\[ H_i = \tfrac{1}{2} p^2 - \tfrac{1}{2}\gamma^2 q^2, \]
which is the saddle node of Example 2.1. The equations of motion are
\[ \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \gamma^2 & 0 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}, \]
and the matrix on the RHS has eigenvalues ±γ. The trajectories in phase space trace out the hyperbolic level sets of H_i. To symmetrize the Hamiltonian, we can take the canonical change of variables
\[ p_{\mathrm{new}} = \gamma^{-1/2} p_{\mathrm{old}}, \qquad q_{\mathrm{new}} = \gamma^{1/2} q_{\mathrm{old}} \]
to make
\[ H_i = \tfrac{\gamma}{2}(p_i^2 - q_i^2). \]
Alternatively, the canonical change of variables
\[ \begin{pmatrix} p_{\mathrm{new}} \\ q_{\mathrm{new}} \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -\gamma \\ \gamma^{-1} & 1 \end{pmatrix} \begin{pmatrix} p_{\mathrm{old}} \\ q_{\mathrm{old}} \end{pmatrix} \]
makes
\[ H_i = \gamma\, p_i q_i. \tag{8.11} \]
Lastly, in the case κ_i = 0 we have
\[ \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} q \\ p \end{pmatrix}, \]
which corresponds to the free Hamiltonian
\[ H_i = \tfrac{1}{2} p_i^2 \tag{8.12} \]
Proof. This follows immediately from the fact that H is a (weak) Lyapunov
function (cf. Lemma 2.8) and Theorem 2.10.
As a warning, it is false that a Hamiltonian which is not bowl-shaped must have an unstable fixed point. Indeed, saddle-shaped Hamiltonians can have stable fixed points. For example, the Hamiltonian
H = γ(p1 q1 + p2 q2 ) + ω(p1 q2 − p2 q1 )
which consists of a saddle in the first and second variables plus an interaction
term. This leads to the linear ODE with matrix
\[
JH'' = \begin{pmatrix} \gamma & \omega & 0 & 0 \\ -\omega & \gamma & 0 & 0 \\ 0 & 0 & -\gamma & \omega \\ 0 & 0 & -\omega & -\gamma \end{pmatrix}
\]
(in terms of the ordered basis (q1 , q2 , p1 , p2 )), which has eigenvalues ±γ ± iω.
As a side note, if we take γ = log 4 and ω = π then we obtain
\[ e^{JH''} = \begin{pmatrix} -4 & & & \\ & -4 & & \\ & & -\tfrac{1}{4} & \\ & & & -\tfrac{1}{4} \end{pmatrix}. \]
Recall that the 2-dimensional version (8.6) of this matrix could not be expressed
in such an exponential form.
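This claim is easy to check numerically; a minimal sketch (assuming scipy) that forms JH″ for γ = log 4, ω = π and computes its matrix exponential:

```python
import numpy as np
from scipy.linalg import expm

gamma, omega = np.log(4.0), np.pi
JH = np.array([[ gamma,  omega,  0.0,    0.0],
               [-omega,  gamma,  0.0,    0.0],
               [ 0.0,    0.0,   -gamma,  omega],
               [ 0.0,    0.0,   -omega, -gamma]])

print(np.round(expm(JH), 10))   # diag(-4, -4, -1/4, -1/4)
```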
Let H be a Hamiltonian that is real analytic with a fixed point at the origin.
We seek to simplify the structure of H near the origin via a smooth and canonical
change of variables. As in the beginning of section 8.2, it suffices to consider
the canonical Poisson bracket.
In section 8.3, we saw how to reduce the quadratic part of the Hamiltonian
to a normal form. Consequently, we now seek a nonlinear change of variables
(near the identity) to reduce the complexity of the nonlinear terms. The method
we will present is iterative, in the sense that we tidy up the terms in the Taylor
expansion one order at a time.
Inductively assume that we have treated the Taylor expansion up to order
k − 1 for some k ≥ 3. We will prescribe a canonical transformation via a
generating function W (P, q) of the form discussed in section 8.1. Our generating
function will be of the form
W (P, q) = P q + F (P, q), (8.13)
where the first term Pq generates the identity, and the error F is a
polynomial that is homogeneous of degree k. The transformation generated
by (8.13) is
\[ p = P + \frac{\partial F}{\partial q}(P, q), \qquad Q = q + \frac{\partial F}{\partial P}(P, q). \tag{8.14} \]
These coordinates are guaranteed to be a local diffeomorphism near the origin
because the leading term is the identity and the remainders above are degree 2
or higher. Rearranging (8.14), we have
\[
\begin{aligned}
P &= p - \frac{\partial F}{\partial q}(p, q) + \left[ \frac{\partial F}{\partial q}(p, q) - \frac{\partial F}{\partial q}(P, q) \right] \\
&= p - \frac{\partial F}{\partial q}(p, q) + \left[ \frac{\partial F}{\partial q}(p, q) - \frac{\partial F}{\partial q}\Big( p - \frac{\partial F}{\partial q},\, q \Big) \right].
\end{aligned}
\]
We write
\[ F = \sum_{|\alpha+\beta|=k} f_{\alpha\beta}\, p^\alpha q^\beta, \]
Setting
Ij = pj qj ,
we see that all of the terms above are functions only of the Ij :
Thus
It turns out that the computation is more elegant in terms of the complex
variables
\[ z_j = p_j + iq_j, \qquad \bar z_j = p_j - iq_j \]
so that
\[ H_2 = \tfrac{1}{2} \sum_{j=1}^n \omega_j z_j \bar z_j, \]
even though our final choice of canonical transformation will be real. Note that
complex polynomials in pj , qj correspond to complex polynomials in zj , z j . We
also extend the bracket {·, ·} to be bilinear (as opposed to Hermitian), so that
We write
\[ F = \sum_{|\alpha+\beta|=k} f_{\alpha\beta}\, z^\alpha \bar z^\beta, \]
with the requirement that f_{βα} = \overline{f_{αβ}} so that F is real-valued. Using the product rule (cf. Definition 7.7), we compute
\[
\begin{aligned}
\{H_2, F\} &= \sum_{\alpha,\beta} f_{\alpha\beta} \sum_{j=1}^n \tfrac{1}{2}\omega_j \{z_j \bar z_j,\, z^\alpha \bar z^\beta\} \\
&= \sum_{\alpha,\beta} f_{\alpha\beta} \sum_{j=1}^n \tfrac{1}{2}\omega_j \big( z_j\{\bar z_j, z^\alpha \bar z^\beta\} + \bar z_j\{z_j, z^\alpha \bar z^\beta\} \big) \\
&= \sum_{\alpha,\beta} f_{\alpha\beta} \sum_{j=1}^n \tfrac{1}{2}\omega_j \big( -2i\alpha_j + 2i\beta_j \big)\, z^\alpha \bar z^\beta \\
&= -i \sum_{\alpha,\beta} \omega\cdot(\alpha - \beta)\, f_{\alpha\beta}\, z^\alpha \bar z^\beta,
\end{aligned}
\]
Setting
\[ I_j = \tfrac{1}{2} z_j \bar z_j = \tfrac{1}{2}(p_j^2 + q_j^2), \]
we obtain
\[ H = h(I_1, \ldots, I_n) + O\big( |p|^{k+1} + |q|^{k+1} \big). \tag{8.24} \]
Moreover, the variables Ij still satisfy the bracket relations (8.20) and are almost
conserved by the flow in the sense of (8.21).
Together with (8.25), this implies that the quantities Ij are conserved:
(c) The action coordinates I_j are conserved and the angle coordinates φ_j evolve linearly in time:
\[ \dot I_j = 0, \qquad \dot\phi_j = \frac{\partial h}{\partial I_j}(I_1, \ldots, I_n) = \text{constant} \]
for all j.
So far, we have seen that in the elliptic case, if the Birkhoff normal form con-
verges as k → ∞ then the system is completely integrable and we can find a set
of action-angle coordinates. This can also be done in the hyperbolic case (8.17)
and other cases as well, although the angle coordinate is no longer corresponds
to an actual angle. (For example, for the free Hamiltonian (8.12) with rectilin-
ear motion the angle coordinate q(t) = q0 + tp0 traces out a straight line.) As
it turns out, the converse is also true: if a system is completely integrable, then
the Birkhoff normal forms converge.
For the remainder of this section, we will abandon the ellipticity and normal
form assumptions and consider a general completely integrable Hamiltonian H.
By definition, the existence of action-angle coordinates immediately implies that
the system is completely integrable. The converse is also true: if a system is
completely integrable then formally we may find action-angle coordinates. This
process is called Liouville integration, as the action-angle coordinates provide
parametric solutions to the equations of motion.
It should be noted that completely integrable systems are quite exceptional
within the class of Hamiltonian systems. Specifically, Siegel [Sie54] showed that
generic (in the sense of the Baire category theorem) analytic Hamiltonians in a
neighborhood of a fixed point are not integrable.
Nevertheless, complete integrability is general enough to include a rich family
of examples. In addition to the Birkhoff normal form example at the beginning
of this section (and the Toda lattice system that we will study in section 8.6),
we have the following two examples.
Example 8.6 (One-dimensional systems). Every conservative system with one
degree of freedom:
\[ E = \tfrac{1}{2} m\dot x^2 + V(x), \qquad x \in \mathbb{R} \]
is completely integrable. Indeed, phase space is two-dimensional and so this
total energy provides the one conserved quantity that we require. Rearranging
this conservation law, we are able to replace the second-order equation of motion
with the first-order equation
\[ \dot x = \pm\sqrt{ \tfrac{2}{m}\big[ E - V(x) \big] }, \]
which is separable and thus provides a formal solution for the motion. Even
without explicitly evaluating this integral, we were able to use this formula in
section 2.5 to draw many conclusions (Propositions 2.16, 2.18 and 2.19).
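As a numerical illustration of this formula (a sketch with illustrative harmonic-oscillator parameters, which are our own choice), one can evaluate the period integral by quadrature and compare it with the exact value 2π/ω:

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

# Illustrative choice: V(x) = (1/2) m ω² x², for which the period is 2π/ω.
m, omega, E = 1.0, 2.0, 0.37
V = lambda x: 0.5 * m * omega**2 * x**2

xplus = brentq(lambda x: E - V(x), 0.0, 10.0)      # turning point; x- = -x+ by symmetry

# T(E) = sqrt(2m) ∫ dx / sqrt(E - V(x)) over one oscillation between the turning points
T, _ = quad(lambda x: 1.0/np.sqrt(E - V(x)), -xplus, xplus, limit=200)
T *= np.sqrt(2*m)

print(T, 2*np.pi/omega)    # the quadrature reproduces the harmonic period
```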
\[
\mathbb{R}^n \times B_\delta(c_0) \to \mathbb{R}^{2n}, \qquad
(t, c) \mapsto e^{t_1 J\nabla F_1} \cdots e^{t_n J\nabla F_n} x_0(c),
\]
which flows out from the base point x₀(c) within {F = c} along the commuting flows ẋ = J∇F_j(x).
As ∇F1 , . . . , ∇Fn are linearly independent, this map is a local diffeomor-
phism. In fact, we claim that for fixed c this forms a covering map from Rn
onto {F = c}. It only remains to show surjectivity, for which we use con-
nectedness. Consider the set of points in {F = c} accessible by this map. It
is relatively open as the map is a local diffeomorphism. Its complement must
then also be open, by the same argument for a different choice of base point.
Therefore, as {F = c} is connected then this set must contain the whole set.
For each c, let L denote the set of times t ∈ Rn for which the flow et1 J∇F1 · · ·
etn ∇Fn x0 (c) returns to the base point x0 (c). As the Fj flows commute, this set
L is an additive subgroup of Rn . This subgroup must be discrete, since the
map is locally injective in a neighborhood of x0 (c). Therefore L is a lattice.
It is a fact that any such lattice must be of the form { Σᵢ mᵢeᵢ : m ∈ Z^d } for some generators {eᵢ}ᵢ₌₁^d ⊂ Rⁿ and d ≤ n. (To show this, we can pick e₁ closest
to the origin, show that taking the quotient by e1 yields a lattice with the
same properties, and then induct.) Moreover, we must have a full set d = n
of generators because {F = c} is compact. (Indeed, if we had d < n, then the
image of our map would be a cylinder which is not compact.)
θ ∈ Rn /Zn , θ = A(F)t.
However, these coordinates do not retain much of the Poisson structure; we have
where cij (F) is an invertible matrix (since A is), and the bij are smooth functions
that are otherwise unknown.
By the Jacobi identity, we have
Note that {Fk , θi } and {θj , Fk } are only functions of F and not θ. Therefore,
we deduce that {bij , Fk } depends only on F. On the other hand, we have
\[
\{b_{ij}, F_k\} = \sum_{\ell=1}^n \left( \frac{\partial b_{ij}}{\partial\theta_\ell}\{\theta_\ell, F_k\} + \frac{\partial b_{ij}}{\partial F_\ell}\{F_\ell, F_k\} \right).
\]
The second bracket {F_ℓ, F_k} above vanishes, while the first bracket {θ_ℓ, F_k} = c_{ℓk}(F) is invertible and depends only on F. Together, we conclude that ∂b_{ij}/∂θ_ℓ depends only on F. (In other words, the gradient of b_{ij} within {F = c} is constant.) Therefore, each b_{ij} also depends only on F:
The brackets {θ_i, F_k} = c_{ik}(F) are invertible, and so this uniquely prescribes the derivatives ∂I_j/∂F_k. Moreover, we see that ∂I_j/∂F_k is invertible, and so F ↦ I will be a local diffeomorphism by the inverse function theorem, once we know it exists. The values of ∂I_j/∂F_k will correspond to the derivatives of a function I if
we have equality of mixed partials. To avoid the mess of directly applying ∂/∂F_k to (c_{ij})⁻¹, we instead take the bracket of (8.31) with θ_ℓ:
\[
0 = \sum_{k=1}^n \frac{\partial I_j}{\partial F_k} \{\{\theta_i, F_k\}, \theta_\ell\}
+ \sum_{k,m=1}^n \frac{\partial^2 I_j}{\partial F_m\,\partial F_k} \{\theta_i, F_k\}\{F_m, \theta_\ell\}.
\]
We want this expression to force the term ∂²I_j/∂F_m∂F_k to be symmetric in k, m. The brackets {θ_i, F_k} = c_{ik} and {F_m, θ_ℓ} = −c_{ℓm} are invertible, and multiplying by the entries of their inverses yields
\[
\frac{\partial^2 I_j}{\partial F_\nu\,\partial F_\mu}
= \sum_{i,k,\ell} \frac{\partial I_j}{\partial F_k}\, \{\{\theta_i, F_k\}, \theta_\ell\}\, (c^{-1})_{\mu i}\, (c^{-1})_{\nu\ell}. \tag{8.32}
\]
{{θi , Fk }, θ` } = {{θ` , Fk }, θi }.
The first relation is by our requirement (8.31), and the second relation is because
I = I(F). We want to choose f so that the change of variables φ = θ + f (I)
makes
\[ 0 = \{\phi_i, \phi_j\} = \tilde b_{ij}(I) + \frac{\partial f_j}{\partial I_i}(I) - \frac{\partial f_i}{\partial I_j}(I). \]
This prescribes the “curl” of f (I), and it is solvable by some f because
\[ \tilde b_{ij} = -\tilde b_{ji} \]
Indeed, the first relation follows from the corresponding relation (8.33) for θi
and that f is a function of I. These are exactly the canonical relations (8.26)
and (8.28), and so this concludes the proof of Theorem 8.8.
This is a model for a one-dimensional crystal for the specific choice (8.34) of
interaction potential. There is no issue in working with this infinite-dimensional
system. Indeed, local well-posedness follows from the Picard–Lindelöf theorem
in Banach spaces (Theorem A.2). As V ≥ 0, we see that H controls the `2 norm
of q̇, and so conservation of energy implies global-in-time existence of solutions
for localized excitations.
[Figure: the Toda interaction potential V(x), which behaves like e^{−x} as x → −∞, like x as x → +∞, and like ½x² near x = 0.]
\[ \dot q_i = p_i, \qquad \dot p_i = -V'(q_i - q_{i-1}) + V'(q_{i+1} - q_i) = e^{q_{i-1} - q_i} - e^{-q_{i+1} + q_i} \tag{8.36} \]
for i ∈ Z. Note that the contributions from the x term in (8.34) cancel out
because qi appears in the Hamiltonian for the ith and (i − 1)st particles. As
we will see, the infinite-dimensional system (8.36) turns out to be completely
integrable in a certain sense. In order to demonstrate this however, we will
need to exhibit infinitely many conserved quantities. To this end, we will need
to move beyond physical quantities like momentum and energy, and develop a
systematic tool that generates these conservation laws.
Our first step is the following (non-canonical) change of variables due to
Flaschka:
\[ b_i = -\tfrac{1}{2} p_i, \qquad a_i = \tfrac{1}{2} e^{(q_i - q_{i+1})/2}. \]
Note that ai , bi do not determine pi , qi exactly, but rather up to a global trans-
lation in qi . This is okay by the conservation of total momentum. In terms of
Together, L and P form what is called a Lax pair for the Toda lattice: the
equations of motion can be recast as the equation (8.37) for L symmetric and
P antisymmetric. So far, we intentionally are not making any rigorous claim
about the existence of the matrices L, P . Nevertheless, each entry of the equa-
tion (8.37) only involves finitely many entries of L, P and thus makes sense.
The choice of Lax pair is certainly not unique; indeed, we are free to add any
antisymmetric function of L to P without changing the RHS of (8.37). However,
the structure of P is not surprising once we choose L to be tridiagonal: for
example, we need the super-super-diagonal of L̇ to vanish, which dictates the
entries of P .
Formally, the existence of a Lax pair for a system implies that the system is
completely integrable. If we let U (t) denote the solution to
U̇ = P U, U (0) = I,
then
\[ \frac{d}{dt}\big( U^T U \big) = U^T P^T U + U^T P U = 0. \]
Therefore, the formula (8.38) for the solution tells us that L(t) is orthogonally
conjugate to its initial value. As conjugation preserves spectral information
(e.g. it does not change the Jordan normal form), this implies that the eigen-
values of L(t) are conserved.
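Here is a minimal numerical sketch of this conservation (assuming numpy/scipy and using the periodic n × n truncation described just below, with n = 4 and random illustrative data): integrate the Lax equation L̇ = [P, L] and compare the spectrum of L at the initial and final times.

```python
import numpy as np
from scipy.integrate import solve_ivp

n = 4
rng = np.random.default_rng(0)
a0 = 0.5 + 0.2*rng.random(n)      # illustrative initial a_i > 0
b0 = rng.standard_normal(n)       # illustrative initial b_i

def build(a, b):
    L = np.diag(b) + np.diag(a[:-1], 1) + np.diag(a[:-1], -1)
    P = np.diag(a[:-1], 1) - np.diag(a[:-1], -1)
    L[0, -1] = L[-1, 0] = a[-1]               # periodic corner entries
    P[0, -1], P[-1, 0] = -a[-1], a[-1]
    return L, P

def rhs(t, y):
    L = y.reshape(n, n)
    a = np.append(np.diag(L, 1), L[-1, 0])    # read off the a_i from L itself
    _, P = build(a, np.diag(L))
    return (P @ L - L @ P).ravel()            # Lax equation  dL/dt = [P, L]

L0, _ = build(a0, b0)
sol = solve_ivp(rhs, (0.0, 10.0), L0.ravel(), rtol=1e-10, atol=1e-12)

eig0 = np.sort(np.linalg.eigvalsh(L0))
eigT = np.sort(np.linalg.eigvalsh(sol.y[:, -1].reshape(n, n)))
print(np.max(np.abs(eigT - eig0)))            # ≈ 0: the spectrum of L is conserved
```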
In order to make this rigorous, we will restrict to finite dimensional subspaces
of arbitrary size. We will impose that our solutions are periodic with period n:
This periodicity is preserved by the dynamics. Now, the matrix equation (8.37)
holds for the n × n matrices
\[
L = \begin{pmatrix}
b_1 & a_1 & & & a_n \\
a_1 & b_2 & a_2 & & \\
& a_2 & b_3 & \ddots & \\
& & \ddots & \ddots & a_{n-1} \\
a_n & & & a_{n-1} & b_n
\end{pmatrix},
\qquad
P = \begin{pmatrix}
0 & a_1 & & & -a_n \\
-a_1 & 0 & a_2 & & \\
& -a_2 & 0 & \ddots & \\
& & \ddots & \ddots & a_{n-1} \\
a_n & & & -a_{n-1} & 0
\end{pmatrix},
\]
where H is the Hamiltonian (8.35) (with the summation only over one period
i = 1, . . . , n), and in the last equality we noted that the sum of qi+1 − qi is
telescoping and vanishes.
We know that both of the quantities (8.39) and (8.40) are conserved. In
fact, for any k ≥ 1 we have
\[
\begin{aligned}
\frac{d}{dt}\operatorname{tr} L^k
&= \operatorname{tr}\big( \dot L L^{k-1} \big) + \operatorname{tr}\big( L \dot L L^{k-2} \big) + \cdots + \operatorname{tr}\big( L^{k-1} \dot L \big) \\
&= \operatorname{tr}\big( [P, L] L^{k-1} \big) + \operatorname{tr}\big( L[P, L] L^{k-2} \big) + \cdots + \operatorname{tr}\big( L^{k-1} [P, L] \big) \\
&= \operatorname{tr}\big( [P, L^k] \big) = \operatorname{tr}\big( P L^k \big) - \operatorname{tr}\big( L^k P \big) = 0
\end{aligned}
\]
\[
\frac{d}{dt}\big[ (L(t) - \lambda)\psi(t) \big] = [P, L]\psi + (L - \lambda)P\psi = P(L - \lambda)\psi.
\]
By Grönwall’s inequality (Lemma A.3), it follows that ψ remains an eigenvector
for all t ∈ R:
(L(t) − λ)ψ(t) ≡ 0.
(This computation is for real ψ; over the field C, we can multiply by eiθ and
account for degenerate eigenvalues as well.) Moreover, the length of ψ is con-
stant:
\[ \frac{d}{dt}|\psi|^2 = P\psi\cdot\psi + \psi\cdot P\psi = \psi\cdot(P + P^T)\psi = 0. \]
Altogether, we conclude that the dynamics preserve the eigenvalues of L while
the eigenvectors move around. With these eigenvectors in hand, we can con-
struct the solution (8.38) for L(t) to the equation (8.37).
So far, we have seen that the infinite-dimensional system (8.35) is completely
integrable, in the sense that it is the limit of n-dimensional completely integrable
systems as n → ∞. For the remainder of this section, we will examine the corre-
sponding action-angle coordinates. For this, it turns out to be more convenient
to work with a different system corresponding to the matrices
\[
L = \begin{pmatrix}
b_1 & a_1 & & & \\
a_1 & b_2 & a_2 & & \\
& a_2 & b_3 & \ddots & \\
& & \ddots & \ddots & a_{n-1} \\
& & & a_{n-1} & b_n
\end{pmatrix},
\qquad
P = \begin{pmatrix}
0 & a_1 & & & \\
-a_1 & 0 & a_2 & & \\
& -a_2 & 0 & \ddots & \\
& & \ddots & \ddots & a_{n-1} \\
& & & -a_{n-1} & 0
\end{pmatrix}.
\]
We have deleted the an entries in the corners of the matrix. This yields different
finite-dimensional systems, but they correspond to the same Lax pair as n →
∞. These an entries had originated from the x term in (8.34), and thus these
matrices are the Lax pair for the Hamiltonian (8.35) but with the new potential
energy
V (x) = e−x .
Note that this potential no longer has an equilibrium at x = 0.
Our action-angle coordinates will be closely related to the spectral properties
of the operator L. The matrices L, P are now tridiagonal, and L is the operator
for a discrete Sturm–Liouville problem. Consequently, we expect that L should
have similar properties to a Sturm–Liouville operator. In particular, we claim
that L has simple eigenvalues. If Lu = λu for some u ∈ Rn , then the first entry
reads
b1 u1 + a1 u2 = λu1 .
As aj > 0, we see that u1 determines u2 . With these in hand, we then see that
the second entry of Lu = λu determines u3 , and so on. In this way, u1 uniquely
determines the eigenvector u.
By the spectral theorem, we may construct the discrete probability measure
\[ d\mu(\lambda) = \sum_{j=1}^n \mu_j\, \delta(\lambda - \lambda_j)\, d\lambda \qquad\text{with}\qquad \mu_j = u_{1j}^2. \]
This makes
\[ \langle e_1, f(L)e_1 \rangle = \int f(\lambda)\, d\mu(\lambda) \]
We know that the other entries must be zero, because the RHS must have degree
k + 1 and we have the orthogonality condition
\[ \int \lambda p_k(\lambda)\, p_{k-2}(\lambda)\, d\mu(\lambda) = \int p_k(\lambda)\, \lambda p_{k-2}(\lambda)\, d\mu(\lambda) = 0 \]
and that the eigenvectors u evolve according to (8.41). (We proved this for our
previous Lax pair, but the proof only relied on the evolution equation (8.37)
and thus still applies.) Consequently, the evolution of the weights µj is
µ̇j = 2u1j u̇1j = 2u1j a1 u2j = 2(λj − b1 )u21j
X n
= 2(λj − b1 )µj = 2 (λj − λk )µk µj ,
k=1
The change of variables from the λj , µj back to the aj , bj is called the inverse
scattering transformation. (The reason for this name is that it is common
for infinite-dimensional completely integrable systems for the action-angle vari-
ables to be connected to scattering theory, and inverting the change of variables
requires new work.)
Finally, we note that the simple evolution of λj , µj also provides a description
of the long-time behavior. After reordering λ1 > λ2 > · · · > λn , we see that
the exponential e2λ1 t is the dominant term as t → ∞, and so from the explicit
solutions (8.42) we have
dµ → δ(λ − λ1 ) dλ as t → ∞.
Therefore
\[ b_1 = \int \lambda\, d\mu(\lambda) \to \lambda_1, \qquad a_1 \to 0, \]
and so
\[
L \to \begin{pmatrix}
\lambda_1 & 0 & & & \\
0 & * & * & & \\
& * & \ddots & \ddots & \\
& & \ddots & \ddots & * \\
& & & * & *
\end{pmatrix}.
\]
This resembles the first step of the QR algorithm for the symmetric tridiagonal
matrix L. In fact, we can obtain a dynamical version of the full QR algorithm
by removing the first row and column of L and repeating the process. Indeed,
the next order exponential is e2λ2 t which yields
8.7. Exercises
8.1 (Example of Birkhoff normal form). Fix ω > 0 and a ∈ R, and consider the
Hamiltonian
\[ H = \tfrac{1}{2} p^2 + \tfrac{1}{2}\omega^2 q^2 + aq^4. \]
8.2 (Example of Liouville integration). Fix ω > 0 and a ∈ R, and consider the
Hamiltonian
\[ H = \tfrac{1}{2} p^2 + \tfrac{1}{2}\omega^2 q^2 + aq^4. \]
(a) Using conservation of energy under the above Hamiltonian flow, show that
the dynamics yield
q̇ = f (E, q)
for suitable explicit function(s) f .
(b) This equation is separable. Thus one can formally solve for p(t) and q(t)
via integration. Use this method to derive a formula for the period of
oscillation (as a function of energy) in the form of a definite integral.
(a) Find the action variable I and show that the energy as a function of the action variable I is h(I) = ωI.
(b) Write down (but do not evaluate) the integral for the generating function Φ, and show that the angle variable is given by
\[ \phi = \tan^{-1}\frac{x}{\sqrt{\frac{2I}{m\omega} - x^2}} + \text{constant}. \]
(c) Using the linear evolution of φ, find the solution x(t) for the motion.
CHAPTER 9
SYMPLECTIC GEOMETRY
is a symplectic form, where Jij are the entries of the inverse of the matrix J ij .
This is the dual object to Poisson brackets. Indeed, so far we have been
viewing a vector field X as an operator
\[ X(f) = \sum_{i=1}^n X^i(x)\, \frac{\partial f}{\partial x^i} \]
on smooth functions f. We can also view this as the exterior derivative df(X), where df operates on X, and dx^i is defined by
\[ dx^i\Big( \frac{\partial}{\partial x^j} \Big) = \begin{cases} 1 & i = j, \\ 0 & i \ne j, \end{cases} \]
for i, j = 1, . . . , n. Often in differential topology we write ∂qi for the dual vector
to dq i , but physically we like to think of the tangent vector to a point in phase
space as the velocity and (mass-times-)acceleration. The action of ω on these
basis vectors is
and so ξ = 0 as desired.
In fact, the form of Example 9.2 is the fundamental example in the following
sense.
where the sum ranges over all multi-indices I = (i1 , . . . , in ) of length n. Any
term where I contains a repeated index is zero because dq i ∧ dq i = 0, and since
2-forms commute under the wedge product then we can rewrite this as
\[ \omega^d = d!\; dp_1\wedge dq^1 \wedge \cdots \wedge dp_d\wedge dq^d. \]
This is nonvanishing at x by definition of the nondegeneracy of ω.
Next let us see how to use the symplectic structure to obtain the famil-
iar dynamics from a Hamiltonian. Let (M, ω) be a symplectic manifold and
H : M → R be a smooth function. Then dH is a differential one-form which
associates a covector to each point in M . On the other hand, for Hamilton’s
equations we need to specify a vector field for the right-hand side of the differen-
tial equation. By definition, for each x ∈ M the map Tx M → Tx∗ M which takes
ξ 7→ ξyω is invertible, and so let J : Tx∗ M → Tx M denote its inverse. Then
J dH is a vector field on M , called the Hamiltonian vector field associated
to the Hamiltonian H. The induced flow etJ dH : M → M , x0 7→ x(t) defined
to solve the ODE
ẋ(t) = J dH(x(t)), x(0) = x0 (9.4)
is called a Hamiltonian flow on M. In differential geometry the notation ω♭ : T_xM → T_x*M is used for the map ξ ↦ ξ⌟ω, and so in place of J the notations (ω♭)⁻¹ and
\[ X_H := (\omega^\flat)^{-1}(dH) \iff X_H \lrcorner\, \omega = dH \tag{9.5} \]
are often used.
Example 9.4. Consider the canonical symplectic form (9.2) on the Euclidean
phase space T ∗ Rn = Rnq × Rnp . On Euclidean space we already have a natural
identification of vectors and covectors via the dot product, and we can express
J as a linear transformation in terms of this identification. Given a vector field
\[ X = \sum_{i=1}^n \left( b^i \frac{\partial}{\partial q^i} + c^i \frac{\partial}{\partial p_i} \right) \]
and hence
\[ J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \tag{9.6} \]
for I the identity matrix. Writing
\[
dH = \frac{\partial H}{\partial q} dq + \frac{\partial H}{\partial p} dp
= \sum_{i=1}^n \left( \frac{\partial H}{\partial q^i} dq^i + \frac{\partial H}{\partial p_i} dp_i \right)
= \begin{pmatrix} \frac{\partial H}{\partial q} \\[4pt] \frac{\partial H}{\partial p} \end{pmatrix},
\]
we have
\[ \begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = J\, dH = \begin{pmatrix} \frac{\partial H}{\partial p} \\[4pt] -\frac{\partial H}{\partial q} \end{pmatrix} \]
Proof. The map gt is smoothly homotopic to the identity via the family of
diffeomorphisms gs , s ∈ [0, t], in the sense that at time s = 0 the map g0 :
M → M is the identity, and at time s = t the map gt : M → M is what
we are given. Fix N a smooth orientable 2-dimensional submanifold, and let
Ω_N := {g^s(N) : s ∈ [0, t]} denote the image of N under the homotopy. We can think of Ω_N as an orientable 3-manifold in [0, t] × M or as being immersed in M. With this choice of orientation we have
∂ΩN = gt (N ) ∪ (−N ) ∪ (−Ω∂N ). (9.8)
We claim that for any smooth curve γ in M we have
\[ \frac{d}{dt}\int_{\Omega_\gamma} \omega = \int_{g^t(\gamma)} dH, \tag{9.9} \]
and the identity (9.9) follows from the fundamental theorem of calculus.
For a closed curve like ∂N we note that
\[ \int_{g^t(\partial N)} dH = \int_{g^t(\emptyset)} H = 0, \]
From (9.10) we know the last integral vanishes, and so we conclude that ω is an
integral invariant of gt .
As ∂N is closed then these two right-hand sides are equal by premise. There-
fore we conclude that (9.7) holds for the form dη, and hence dη is an integral
invariant.
Noting that d(p dq) = dp ∧ dq, we conclude:
Corollary 9.9. Canonical transformations preserve the symplectic form ω and
the volume form ω n .
In Euclidean phase space T ∗ Rn , any Hamiltonian flow is automatically a
canonical transformation. However, the converse of Proposition 9.8 is not true in
general, and so Theorem 9.6 does not imply the identity (9.11). We must assume
that M is simply connected in order for Hamiltonian flows to be canonical
transformations, since then Theorem 9.6 implies (9.11).
As in (7.9) the Poisson bracket {H, f } is also the evolution of the quantity f
under the Hamiltonian flow H, since the Lie derivative of the function f along
J dH is given by
\[ \mathcal{L}_{J\,dH} f = \frac{d}{dt}\Big|_{t=0} f\circ e^{tJ\,dH} = (J\,dH)(f) = \{H, f\} \tag{9.13} \]
\[ \{f, g\} = (J\,df)(g) = \frac{\partial f}{\partial p}\frac{\partial g}{\partial q} - \frac{\partial f}{\partial q}\frac{\partial g}{\partial p}, \]
and recognizing the Poisson bracket as a Lie derivative via (9.13) we conclude
Note that the first term above is the tautological one-form p dq on M , the
differential of which yields the symplectic form ω. This is the form that we
insisted be preserved by a canonical transformation in (9.11), and the same
notion of canonical transformations holds on a general extended phase space
M × R in terms of the canonical coordinates (q, p, t).
The extended phase space M × R together with the Poincaré–Cartan one-
form τ do indeed define a contact manifold, but as we will see in the next chapter
it is not the natural contact extension of M since τ depends on the system’s
Hamiltonian H.
On M × R we define the extended Hamiltonian vector field
\[ Y_H = X_H + \frac{\partial}{\partial t}, \tag{9.17} \]
where XH (t) is the Hamiltonian vector field on M × {t} defined by (9.4). In
analogy with the second condition of (9.5), the vector field (9.17) is the unique
solution to
\[ Y_H \lrcorner\, d\tau = 0. \tag{9.18} \]
which is just Hamilton’s equations (7.2) joined with the trivial equation ṫ = 1. It
follows that any smooth time-dependent function f on M × R evolves according
to
\[ \frac{df}{dt} = \{H, f\}_{p,q} + \frac{\partial f}{\partial t}, \]
as was the case in section 7.3. In particular, a time-dependent Hamiltonian H
is no longer conserved under its own flow.
\[ H(Q, P, t) := e^{-\gamma t}\frac{P^2}{2m} + e^{\gamma t} V(Q) \tag{9.20} \]
on the extended phase space RQ × RP × Rt , where γ ≥ 0 is a constant and the
coordinates Q, P are related to the physical coordinates q, p by the non-canonical
transformation
P = eγt p, Q = q.
Then the equations of motion (9.19) yield
\[ m\ddot q + m\gamma\dot q + V'(q) = 0. \]
This represents a Newtonian system with a friction force that depends linearly
on the velocity, like the damped harmonic oscillator of Example 2.9.
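A quick numerical sketch of this example (our illustrative parameter values; scipy assumed): integrate Hamilton's equations for (9.20) with V(q) = ½kq² and compare q(t) = Q(t) against the directly integrated damped oscillator.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameters (our choices): unit mass, spring constant k, damping gamma.
m, k, gamma = 1.0, 4.0, 0.3
dV = lambda q: k * q

def extended(t, x):
    Q, P = x
    return [np.exp(-gamma*t) * P / m,            # Q̇ =  ∂H/∂P
            -np.exp(gamma*t) * dV(Q)]            # Ṗ = -∂H/∂Q

def damped(t, x):
    q, v = x
    return [v, -(gamma*v + dV(q)/m)]             # m q̈ + m γ q̇ + V'(q) = 0

t_eval = np.linspace(0.0, 10.0, 200)
solH = solve_ivp(extended, (0, 10), [1.0, 0.0], t_eval=t_eval, rtol=1e-10, atol=1e-12)
solN = solve_ivp(damped,  (0, 10), [1.0, 0.0], t_eval=t_eval, rtol=1e-10, atol=1e-12)

q_from_H = solH.y[0]                             # q = Q in this change of variables
print(np.max(np.abs(q_from_H - solN.y[0])))      # ≈ 0: the two descriptions agree
```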
9.7. Exercises
9.1. Show that the following criteria for a nondegenerate two-form ω are equiv-
alent.
(a) For each x ∈ M the map T_xM → T_x*M which takes ξ ↦ ω(ξ, ·) is invertible.
(b) For each x ∈ M and ξ ∈ T_xM nonzero there exists η ∈ T_xM such that ω(ξ, η) ≠ 0.
(c) The local matrix representation of ω in terms of some (hence every) basis
is invertible.
9.2. Let J ij (x) denote the structure matrix of a non-degenerate Poisson bracket
as in Corollary 7.9 and let Jij (x) denote the entries of the inverse matrix. Show
that
\[ \frac{\partial J_{ab}}{\partial x^c} + \frac{\partial J_{ca}}{\partial x^b} + \frac{\partial J_{bc}}{\partial x^a} = 0 \]
for every triple of indices a, b, and c. Note that this implies that the 2-form (9.1)
is closed. (Hint: Use the Jacobi identity (7.14) for J along with the deriva-
tive (A.15) of the determinant.)
9.3. (a) Show that rotation on the two-dimensional sphere S 2 is a Hamilto-
nian flow.
(b) Show that the translations g^t(q, p) = (q + t, p) on the torus R²/Z² define a locally Hamiltonian flow, but not a globally Hamiltonian flow.
9.4. Let (M, ω) be a 2n-dimensional compact symplectic manifold.
9.5 (Symplectic and complex structure). Identify Euclidean phase space R2n =
Rnq × Rnp with the complex space Cn via zj := qj + ipj .
(a) Check that this Poisson bracket is bilinear, antisymmetric, and satisfies
the Jacobi identity.
(b) Given a Hamiltonian functional H(q), show that a smooth functional F (q)
is constant for solutions q to the PDE associated to H if and only if
{H, F } = 0 as a functional on C ∞ (R/Z).
(c) Show that for any Hamiltonian functional H(q) both the Hamiltonian H
and the mass functional
\[ M(q) := \int_0^1 q(x)\, dx \]
does indeed generate translations, in the sense that the solution to the
associated PDE with initial data q(0, x) = f (x) is q(t, x) = f (x + t).
(e) The Korteweg–de Vries (KdV) equation is the PDE associated to the
Hamiltonian
\[ H_{\mathrm{KdV}}(q) := \int_0^1 \Big( \tfrac{1}{2} q'(x)^2 - q(x)^3 \Big)\, dx, \]
0
and arises as the long-wavelength and shallow-water limit for unidirec-
tional water waves of height q from the undisturbed water level q = 0.
Show that the mass M (q), momentum P (q), and energy HKdV (q) are all
constant for solutions q to KdV. In fact, these are the first three of an
infinite hierarchy of conserved quantities for KdV.
CHAPTER 10
CONTACT GEOMETRY
\[ \eta \wedge (d\eta)^n \ne 0; \tag{10.1} \]
Note that this is the combination of dS and the tautological one-form (9.3) on
T ∗ Rn . A straightforward computation shows that η ∧ (dη)n = dS ∧ dq ∧ dp is
the Euclidean volume form on Rt × T ∗ Rn , but we can also check that dη is a
symplectic tensor. Note that dη = −dp ∧ dq, and so the rank-2n distribution N ⊂ T R^{2n+1} annihilated by η is spanned by the vector fields
\[ X_i = \frac{\partial}{\partial q^i} + p_i \frac{\partial}{\partial S}, \qquad Y_i = \frac{\partial}{\partial p_i} \]
for i = 1, . . . , n. Moreover, we have
Proof. The map Φ which takes X 7→ Xydη defines a smooth bundle homo-
morphism Φ : T M → T ∗ M , and for each x ∈ M it reduces to a linear map
Φx : Tx M → Tx∗ M . As dηx restricted to the subspace Nx is nondegenerate by
definition, then Φx |Nx is injective and hence Φx has rank at least 2n. On the
other hand, we know that Φx cannot have rank 2n + 1 because then dηx would
be nondegenerate and contradict that Tx M is odd-dimensional. Therefore, we
conclude that ker Φx is one-dimensional. Moreover, since ker(Φx ) is not con-
tained in Nx = ker(ηx ) by definition, we know there exists a unique ξ ∈ ker(Φx )
with ηx (ξx ) = 1; these correspond to the two conditions (10.3) respectively.
The smoothness of ξ follows from the smoothness of η. Note that ker Φ ⊂
T M is a smooth rank-one subbundle, and so around any x ∈ M we can pick
a smooth nonvanishing section X of ker Φ near x. As η(X) 6= 0, then we can
write ξ = η(X)−1 X as a composition of smooth maps near x.
Example 10.4. For the Euclidean contact space of Example 10.1, we see that the Reeb field is
\[ \xi = \frac{\partial}{\partial S} \]
as the two conditions (10.3) are easily verified.
satisfies the two conditions (10.4), from which we obtain the differential equation system
\[
\frac{dS}{dt} = \sum_{i=1}^n p_i \frac{\partial H}{\partial p_i} - H, \qquad
\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad
\frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i} - p_i \frac{\partial H}{\partial S}. \tag{10.6}
\]
\[ \dot q = \frac{p}{m}, \qquad \dot p = -V'(q) - \gamma p, \qquad \dot S = \frac{p^2}{2m} - V(q) - \gamma S. \]
This represents a Newtonian system with a friction force that depends linearly
on the velocity, like the damped harmonic oscillator of Example 2.9. Note that
as opposed to Example 9.12, in this approach the momentum coordinate still
coincides with the physical momentum defined via the velocity.
10.3. Dynamics
where H0 (q, p) is the mechanical energy of the system (e.g. H0 (q, p) = p2 /2m +
V (q)), then according to the formula (10.8) the mechanical energy obeys
\[ \frac{dH_0}{dt} = -\sum_{i=1}^n p_i \frac{\partial H_0}{\partial p_i}\, f'(S). \tag{10.11} \]
We interpret this as saying that f (S) is the potential for the system’s dissipative
force. Moreover, the evolution (10.9) of the Hamiltonian can be integrated to obtain
\[ H(t) = H(0)\exp\Big( -\int_0^t f'(S(s))\, ds \Big). \]
\[ \dot H_0 = -m\gamma\, \dot q^2, \]
which agrees with what we found for the damped harmonic oscillator of Ex-
ample 2.9. The energy of this system decays exponentially to zero, according
to
H(t) = H(0)e−γt .
Solving for the action S we obtain
\[ S(q, p, t) = \frac{1}{\gamma}\Big( H(0)e^{-\gamma t} - \frac{p^2}{2m} - V(q) \Big). \]
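A short numerical sketch of these formulas (illustrative parameters, scipy assumed): integrating the contact equations above for V(q) = ½mω₀²q² and checking that H = H₀ + γS decays like e^{−γt}.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative damped harmonic oscillator: V(q) = (1/2) m ω0² q² (our parameter choices).
m, omega0, gamma = 1.0, 2.0, 0.25
V  = lambda q: 0.5 * m * omega0**2 * q**2
dV = lambda q: m * omega0**2 * q

def contact(t, x):
    q, p, S = x
    return [p/m,
            -dV(q) - gamma*p,
            p**2/(2*m) - V(q) - gamma*S]     # the contact equations (10.6) for this H

x0 = [1.0, 0.0, 0.0]
sol = solve_ivp(contact, (0.0, 12.0), x0, t_eval=np.linspace(0, 12, 300),
                rtol=1e-10, atol=1e-12)
q, p, S = sol.y

H = p**2/(2*m) + V(q) + gamma*S
H_init = p[0]**2/(2*m) + V(q[0]) + gamma*S[0]
print(np.max(np.abs(H - H_init*np.exp(-gamma*sol.t))))   # ≈ 0: H decays like e^{-γt}
```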
Writing (S̃, q̃, p̃) as functions of (q, p, S), this is equivalent to the conditions
\[
f = \frac{\partial\tilde S}{\partial S} - \sum_{j=1}^n \tilde p^j \frac{\partial\tilde q^j}{\partial S}, \qquad
-f p_i = \frac{\partial\tilde S}{\partial q^i} - \sum_{j=1}^n \tilde p^j \frac{\partial\tilde q^j}{\partial q^i}, \qquad
0 = \frac{\partial\tilde S}{\partial p_i} - \sum_{j=1}^n \tilde p^j \frac{\partial\tilde q^j}{\partial p_i}.
\]
Plugging this into (10.12), we see that the remaining coordinates are determined in terms of S̃ by
\[
f = \frac{\partial\tilde S}{\partial S}, \qquad
f p_i = -\frac{\partial\tilde S}{\partial q^i}, \qquad
\tilde p_i = \frac{\partial\tilde S}{\partial\tilde q^i}.
\]
From the second condition of (10.4) we know that X_H⌟dη is equal to −dH on N. By the definition (10.3) of the Reeb field it then follows that
\[ \mathcal{L}_{X_H}\eta = -\xi(H)\,\eta = -\frac{\partial H}{\partial S}\,\eta, \tag{10.14} \]
where the last equality is merely the expression in terms of the local canonical co-
ordinates. Comparing this to the definition (10.12) of a contact transformation,
we see that the flow is a contact transformation with f = −ξ(H) = −∂H/∂S.
In particular, if we restrict to the contact structure N , we have η = 0 and hence
LXH η = 0 as desired.
Conversely, assume that V is a contact vector field. Then Cartan's magic formula reads
\[ 0 = (\mathcal{L}_V\eta)\big|_N = \big( d(\eta(V)) \big)\big|_N + (V\lrcorner\, d\eta)\big|_N. \]
Consider the smooth function H = η(V ), defined so that the first condition of
the definition (10.4) holds. Then we obtain the second condition of (10.4) from
the above equality, and so we conclude that V = XH is the contact Hamiltonian
vector field for H.
Heuristically, we do not expect the volume form to be preserved by a general
contact Hamiltonian flow since contact dynamics includes dissipative systems.
Using (10.14) we see that the volume form evolves according to
\[
\begin{aligned}
\mathcal{L}_{X_H}\big[ \eta\wedge(d\eta)^n \big]
&= (\mathcal{L}_{X_H}\eta)\wedge(d\eta)^n + \sum_{i=0}^{n-1} \eta\wedge(d\eta)^i\wedge\big[ d(\mathcal{L}_{X_H}\eta) \big]\wedge(d\eta)^{n-1-i} \\
&= -(n+1)\,\frac{\partial H}{\partial S}\, \eta\wedge(d\eta)^n.
\end{aligned}
\]
This illustrates the connection between the Hamiltonian's S-dependence and the system's dissipation, and consequently systems for which ∂H/∂S is nonvanishing are called dissipative.
Instead, we have a variant of Liouville's theorem due to [BT15], in which a rescaled volume form is preserved away from the zero set H^{−1}(0): the form
|H|^{−(n+1)} η ∧ (dη)^n
is an invariant measure for the contact Hamiltonian flow for H along orbits outside of H^{−1}(0). Moreover, up to scalar multiplication it is the unique such measure whose density with respect to the standard volume form depends only on H.
Proof. For a smooth function ρ on M, a computation using (10.14) shows that
L_{X_H}[ρ η ∧ (dη)^n] = (L_{X_H} ρ) η ∧ (dη)^n − (n + 1) (∂H/∂S) ρ η ∧ (dη)^n
= [ X_H(ρ) − (n + 1) (∂H/∂S) ρ ] η ∧ (dη)^n.
which are the old equations of motion (10.6) joined with the trivial equation ṫ = 1. It follows that any smooth time-dependent function F on M × R evolves according to
dF/dt = −H ∂F/∂S + Σ_{i=1}^n p_i {H, F}_{p_i,S} + {H, F}_{p,q} + ∂F/∂t,   (10.17)
using the notation of eq. (10.8). In particular, we see that under its own flow
the Hamiltonian now changes according to both its dissipation ∂H/∂S and its
time-dependence.
Lastly, let us extend the notion of contact transformations to our extended manifold M × R. In terms of canonical coordinates, a time-dependent contact transformation (q, p, S, t) ↦ (q̃, p̃, S̃, t̃) must satisfy
f = ∂S̃/∂S,   f p_i = −∂S̃/∂q^i,   p̃_i = ∂S̃/∂q̃^i,   f H = ∂S̃/∂t + K.   (10.19)
The first three conditions are unchanged and the last condition defines the new
Hamiltonian K = K(q, q̃, S, t). Taking f ≡ 1, we see that S̃ is related to the
canonical transformation generating function F (q, q̃, t) via
f ∂K/∂S̃ = f ∂H/∂S + df(Y_H).
In the special case f ≡ 1 we note that if H is independent of S then K = 0
is a solution, in which case the last condition of (10.19) becomes the familiar
Hamilton–Jacobi equation (6.4). However, in general f may be S-dependent and
so the notion of contact transformations is strictly more general than even the
physicist’s notion of canonical transformations (cf. the remark of section 7.6).
10.6. Exercises
(b) Let S ⊂ T(R^{2n+1} \ {0}) denote the subbundle spanned by V and W, and let
S^⊥ = ⋃_{p∈S^{2n+1}} { X ∈ T_p R^{2n+2} : dθ_p(V_p, X_p) = dθ_p(W_p, X_p) = 0 }
H = p²/(2m) + (m/2) ω²(t) q² + γS
with time-dependent frequency ω(t) and damping parameter γ. Show that the
new expanding coordinates
(a) We seek a solution F to eq. (10.17) with vanishing left-hand side. Substi-
tuting the quadratic ansatz
α̈ + [ω²(t) − ¼ γ²] α = α^{−3},
Here, α(t) solves the Ermakov equation and the conserved quantities I
and G are determined by the initial conditions.
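The Ermakov equation above is easy to probe numerically. The sketch below is only an illustration: it assumes the rescaled coordinate Q = e^{γt/2} q and checks the standard Ermakov–Lewis combination, here called I, which may differ from the precise quantities I and G intended in the exercise; the frequency ω(t) and all parameters are arbitrary choices.

import numpy as np
from scipy.integrate import solve_ivp

# Damped oscillator  q'' + gamma*q' + omega(t)^2 q = 0  coupled with the
# Ermakov equation  alpha'' + [omega(t)^2 - gamma^2/4] alpha = alpha^(-3).
gamma = 0.2

def omega2(t):
    return 1.0 + 0.3 * np.cos(t)     # arbitrary smooth time-dependent frequency

def rhs(t, y):
    q, qdot, a, adot = y
    return [qdot,
            -gamma * qdot - omega2(t) * q,
            adot,
            -(omega2(t) - 0.25 * gamma**2) * a + a**(-3)]

sol = solve_ivp(rhs, (0.0, 30.0), [1.0, 0.0, 1.0, 0.0],
                rtol=1e-10, atol=1e-12, dense_output=True)

t = np.linspace(0.0, 30.0, 500)
q, qdot, a, adot = sol.sol(t)
Q    = np.exp(0.5 * gamma * t) * q                  # assumed "expanding" coordinate
Qdot = np.exp(0.5 * gamma * t) * (qdot + 0.5 * gamma * q)
I = 0.5 * ((Q / a)**2 + (a * Qdot - adot * Q)**2)   # Ermakov--Lewis combination
print(I.std() / I.mean())                           # ~1e-9: constant along the flow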
APPENDIX A
FUNDAMENTALS OF ODE THEORY
We collect some facts from introductory ODE theory in this chapter for easy reference. The material is based on [CL55].
Throughout this chapter, we will study the initial value problem (IVP)
ẋ(t) = f(t, x(t)),   x(0) = x_0,   (A.1)
together with the corresponding integral equation
x(t) = x_0 + ∫_0^t f(s, x(s)) ds.   (A.2)
Lemma A.1. The following are equivalent: (a) x is C¹ and solves the IVP (A.1); (b) x is continuous and satisfies the integral equation (A.2).
Proof. Both directions easily follow from the fundamental theorem of calculus.
First assume that (a) holds. Then both sides of the IVP (A.1) are continuous,
and so integration yields (A.2) by the fundamental theorem of calculus.
Now assume that (b) holds. Then t 7→ f (t, x(t)) is continuous, and so by
the fundamental theorem of calculus the integral equation (A.2) says that x is
differentiable with derivative f (t, x(t)).
In the case X = Rd , we can give a more general measure-theoretic version of
Lemma A.1 via the following statement of the fundamental theorem of calculus:
for all ψ ∈ C_0^∞([−T, T]). For ODEs, it turns out that this is also equivalent provided that f is continuous, but we will not need this fact.
In trying to argue that solutions to the IVP (A.1) exist, the formulation (b) is better than that of (a). This is because integrals are stable while derivatives are highly unstable. For example, consider the Fourier series g(t) = Σ_{k≠0} c_k e^{2πikt}. If the coefficients c_k are absolutely summable, then this defines a periodic function g(t). Roughly speaking, the rate of decay of c_k corresponds to the smoothness of g(t), because the character e^{2πikt} is rapidly oscillating for large frequencies k. Integration suppresses high frequencies, because it replaces the coefficients c_k of g(t) by the more rapidly decaying sequence c_k/(2πik). Conversely, differentiation amplifies high frequencies, because it replaces the coefficients of g(t) by the more slowly decaying sequence 2πik c_k.
In addition to differential equations, there is also a theory of integral equations. The integral equation (A.2) is special in that the integration is over [0, t], and such equations are said to be of Volterra type (in analogy with triangular matrices). The more general case
x(t) = b(t) + ∫_0^1 K(t, s) x(s) ds
is called a Fredholm integral equation, the linear case of which is displayed above.
The following theorem is a fundamental result on the existence of solutions:
Theorem A.2. If f : R × X → X is continuous and satisfies
|f(t, x) − f(t, y)| ≤ C |x − y|   for all t ∈ R and x, y ∈ X
for some constant C > 0, then for each x_0 ∈ X there exists a unique continuous function x : R → X so that the integral equation (A.2) holds. Moreover, x ∈ C([−T, T] → X) depends continuously upon x_0 ∈ X for all T > 0.
Theorem A.2 says that the ODE (A.1) is (globally-in-time) well-posed (in
the sense of Hadamard): solutions exist, solutions are unique, and solutions
depend continuously upon the initial data. The statement of continuous depen-
dence can take various forms; for example, it follows that given a convergent
sequence of initial data, the corresponding sequence of solutions converges uni-
formly on compact time intervals. Note that continuous dependence upon initial
data is not extremely restrictive; indeed, chaotic systems, where trajectories ex-
hibit complex geometric structure and are sensitive to small changes in initial
conditions, can still be well-posed.
We will only present one proof of Theorem A.2, but there are multiple argu-
ments that apply. Having multiple methods is particularly useful in the study
of PDEs, because different ODE proofs yield distinct PDE statements.
Proof. We will argue by Picard iteration. Recursively define the sequence
x_0(t) ≡ x_0,   x_{n+1}(t) = x_0 + ∫_0^t f(s, x_n(s)) ds.
Using the Lipschitz bound on f we can estimate
|x_{n+1}(t) − x_n(t)| ≤ ∫_0^t |f(s, x_n(s)) − f(s, x_{n−1}(s))| ds ≤ C ∫_0^t |x_n(s_n) − x_{n−1}(s_n)| ds_n,
so that the difference of the previous two approximations now appears on the RHS. Applying this estimate iteratively to the RHS, we obtain
≤ C² ∫_0^t ∫_0^{s_n} |x_{n−1}(s_{n−1}) − x_{n−2}(s_{n−1})| ds_{n−1} ds_n
⋮
≤ C^n ∫_{0<s_1<s_2<⋯<s_n<t} |x_1(s_1) − x_0(s_1)| ds_1 ds_2 ⋯ ds_n.
It follows that
|x_{n+1}(t) − x_n(t)| ≤ sup_{s∈(0,t)} |f(s, x_0)| · C^n t^{n+1}/(n + 1)!.
for all T > 0. To see that x(t) solves the IVP (A.1), we take n → ∞ in the
definition
x_{n+1}(t) = x_0 + ∫_0^t f(s, x_n(s)) ds.
We have f (s, xn (s)) → f (s, x(s)) by continuity, so the integrals converge, and
hence we obtain
x(t) = x_0 + ∫_0^t f(s, x(s)) ds.
Therefore x(t) is a solution to the integral equation (A.2), and consequently the
IVP (A.1) by Lemma A.1.
Next, we claim that the solution is unique. Suppose x(t) and x̃(t) are both
solutions to the IVP (A.1) that are in C([−T, T ] → X). Arguing as before, we
estimate
|x(t) − x̃(t)| ≤ ∫_0^t |f(s, x(s)) − f(s, x̃(s))| ds
≤ C ∫_0^t |x(s_n) − x̃(s_n)| ds_n
≤ C² ∫_0^t ∫_0^{s_n} |x(s_{n−1}) − x̃(s_{n−1})| ds_{n−1} ds_n
⋮
≤ C^n ∫_{0<s_1<s_2<⋯<s_n<t} |x(s_1) − x̃(s_1)| ds_1 ds_2 ⋯ ds_n
≤ ((CT)^n / n!) sup_{s∈(−T,T)} |x(s) − x̃(s)|.
Taking the supremum over t, this yields
sup_{t∈(−T,T)} |x(t) − x̃(t)| ≤ ((CT)^n / n!) sup_{s∈(−T,T)} |x(s) − x̃(s)|
for all n. As the LHS and RHS are finite since x and x̃ are in C([−T, T ] → X),
we send n → ∞ to conclude that x(t) ≡ x̃(t).
Lastly, we claim that the solution depends continuously upon the initial
data. Suppose x(t) and x̃(t) are both solutions to (A.1) in C([−T, T ] → X)
then
f(t) ≤ A exp( ∫_0^t a(s) ds ).
Many authors choose to prove Lemma A.3 first, and then cite it in the proof
of Theorem A.2.
Notice that the proof actually shows that the data-to-solution map x_0 ↦ x(t) is a Lipschitz function from X into C([−T, T] → X), and that the Lipschitz constant is bounded by e^{CT}. In fact, it follows that for fixed t the map x_0 ↦ x(t) is a bi-Lipschitz homeomorphism, because we can reconstruct x(0) from x(t) by solving the ODE backwards in time and citing uniqueness. The Lipschitz continuity here matches the Lipschitz continuity of f, and in general we cannot do better.
Another common proof of Theorem A.2 is based on contraction mapping.
However, this only proves the result for T > 0 sufficiently small. The full
statement of Theorem A.2 requires our iteration argument.
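To see the iteration in action, here is a small numerical sketch: the right-hand side f(t, x) = cos(x) + t, the grid, and the trapezoidal quadrature are arbitrary choices, and the successive Picard iterates are computed on a fixed time grid. The printed sup-differences shrink at the factorial rate predicted by the estimate in the proof, until they hit the quadrature error.

import numpy as np

# Picard iteration x_{n+1}(t) = x0 + int_0^t f(s, x_n(s)) ds on a fixed grid,
# with the integral approximated by a cumulative trapezoidal rule.
def f(t, x):
    return np.cos(x) + t          # globally Lipschitz in x (constant C = 1)

t = np.linspace(0.0, 2.0, 2001)
dt = t[1] - t[0]
x0 = 0.3

def cumtrapz(y):
    out = np.zeros_like(y)
    out[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * dt)
    return out

x = np.full_like(t, x0)           # zeroth iterate x_0(t) = x0
for n in range(12):
    x_new = x0 + cumtrapz(f(t, x))
    print(n, np.max(np.abs(x_new - x)))   # decays roughly like C^n t^{n+1}/(n+1)!
    x = x_new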
Next, we would like to extend our existence result to include equations where
f is not globally Lipschitz, but instead is smooth. Note that f being smooth
does not imply that there are global solutions:
ẋ = x²,   x(0) = 1
has solution
x(t) = 1/(1 − t),
which blows up at t = 1.
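A numerical integrator sees this blow-up directly. In the sketch below (an illustration only; the threshold 10^8 is arbitrary), the solver is stopped by an event once the solution becomes huge, and the stopping time matches the exact value 1 − 10^{−8} at which x(t) = 1/(1 − t) reaches the threshold.

import numpy as np
from scipy.integrate import solve_ivp

# x' = x^2, x(0) = 1 has the exact solution x(t) = 1/(1 - t), blowing up at t = 1.
def rhs(t, x):
    return x**2

def escape(t, x):                 # stop once the solution exceeds a large threshold
    return x[0] - 1e8
escape.terminal = True

sol = solve_ivp(rhs, (0.0, 2.0), [1.0], events=escape, rtol=1e-10, atol=1e-12)
print(sol.t_events[0][0])         # ~0.99999999, i.e. just below t = 1
print(1.0 - 1e-8)                 # exact time at which x(t) reaches 1e8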
Smoothness does guarantee local solutions however:
Lastly, we choose T > 0 small enough to stop the solution from noticing the
change in the RHS.
We also note that solutions x(t) should always be defined on open intervals, since given one defined on a closed interval we can always extend it a bit further.
Proof. We define the maximal interval of existence to be the union of all open
intervals containing t0 on which a solution x(t) exists. By Theorem A.5, this
is an open and connected set and hence is indeed an interval (T− , T+ ). We
may then glue all of these solutions together by uniqueness to obtain a solution
u : (T− , T+ ) → X.
Suppose for a contradiction that T_+ < +∞ and |x(t)| ↛ ∞ as t ↑ T_+. Then there exists an increasing sequence t_n converging to T_+ on which |x(t_n)| is bounded. Together with t = T_+, this is a bounded set on which f′ is continuous and hence bounded. Arguing as in Theorem A.5, we may apply Theorem A.2 to construct a solution defined for a short time after t = T_+, which contradicts the maximality of T_+.
Notice that
x_ε(t) = x_0 + ∫_0^t f(s, x_ε(s)) ds   (A.4)
is Lipschitz uniformly in ε, since
|x_ε(t) − x_ε(s)| ≤ ∫_s^t |f(τ, x_ε(τ))| dτ ≤ |t − s| ‖f‖_{L∞}.
Therefore x(t) solves the IVP by Lemma A.1. We then pick T > 0 small so that
x(t) solves the original IVP.
Then for every x0 ∈ X the IVP (A.1) has at most one solution.
Note that the assumption (A.6) on f is slightly weaker than Lipschitz con-
tinuity.
Proof. Fix two solutions x(t) and x̃(t) with the same initial data. We may assume that |x(t) − x̃(t)| ≤ 1/2 by first restricting our attention to sufficiently small time intervals, and then patching these intervals together. This allows us to estimate
|x(t) − x̃(t)| ≤ ∫_0^t |x(s) − x̃(s)| log(1/|x(s) − x̃(s)|) ds.
Fix A > 0, and define the “barrier” b(t) = exp{−A e^{−t}}, which solves the equation
ḃ = b log(1/b),   b(0) = e^{−A}.   (A.7)
We claim that |x(t) − x̃(t)| ≤ b(t). Suppose this is false. The claim is true at t = 0, and so by continuity there exists a minimal time t_0 > 0 where it fails, i.e.
|x(t) − x̃(t)| < b(t) for all t ∈ [0, t_0),   |x(t_0) − x̃(t_0)| = b(t_0).
Then by continuity,
|x(t_0) − x̃(t_0)| ≤ ∫_0^{t_0} |x(s) − x̃(s)| log(1/|x(s) − x̃(s)|) ds < ∫_0^{t_0} b(s) log(1/b(s)) ds = b(t_0) − b(0) < b(t_0),
which contradicts the choice of t_0.
Proof. We write
∫ e^{2λt} ẇ² dt = ∫ ( (d/dt)(e^{λt} w) − λ e^{λt} w )² dt
= ∫ [ ((d/dt)(e^{λt} w))² − 2λ (e^{λt} w) (d/dt)(e^{λt} w) + λ² e^{2λt} w² ] dt.
The first term on the RHS is nonnegative, and so we can drop it to obtain an inequality. The second term is equal to −λ (d/dt)[(e^{λt} w)²], which integrates to zero since w has compact support. The inequality (A.8) follows.
Proof of Proposition A.11. Let x(t) and x̃(t) be two solutions to the IVP (A.1)
with the same initial data. We may assume that x and x̃ disagree in the future
after substituting t ↦ −t if necessary. We may also modify x̃ so that x̃(t) ≡ x(t)
for all t ≤ 0.
Fix δ > 0, and let χ : R → R be a smooth function so that χ(t) ≡ 1 for
t ≤ δ and χ(t) ≡ 0 for t ≥ 2δ. Then the weight w(t) = χ(t)[x̃(t) − x(t)] is C¹
and compactly supported, and so the Carleman estimate (A.8) yields
λ² ∫ e^{2λt} χ(t)² [x̃(t) − x(t)]² dt
≤ ∫ e^{2λt} ( χ̇ (x̃ − x) + χ [f(t, x̃(t)) − f(t, x(t))] )² dt
≤ C² ∫ e^{2λt} χ(t)² |x̃(t) − x(t)|² dt
+ ∫ e^{2λt} χ̇(t)² |x̃(t) − x(t)|² dt + 2C ∫ e^{2λt} |χ̇(t) χ(t)| |x̃(t) − x(t)|² dt.
Let A = sup{|χ̇(t)| : t ∈ R}, and note that A ≥ c δ^{−1} for some c > 0 since χ decreases by 1 over an interval of length δ. Then
λ² ∫ e^{2λt} χ(t)² [x̃(t) − x(t)]² dt
≤ C² ∫ e^{2λt} χ(t)² |x̃(t) − x(t)|² dt + (A² + 2AC) ∫_δ^{2δ} e^{2λt} · 2(|x̃|² + |x|²) dt.
The second term on the RHS is bounded by a constant times e^{2δλ}. Therefore, for all λ ≤ −1 with |λ| sufficiently large we have
(λ²/4) ∫ e^{2λt} χ(t)² [x̃(t) − x(t)]² dt ≤ C′ e^{−2δ|λ|}
for some constant C′. This implies that the LHS is zero. Indeed, if |x̃(t) − x(t)| ≢ 0 on (0, δ), then there is a time t_0 ∈ (0, δ) such that
(λ²/4) ∫ e^{2λt} χ(t)² [x̃(t) − x(t)]² dt ≥ c λ² e^{−2t_0 |λ|}
for some c > 0, and the RHS cannot be bounded by e^{−2δ|λ|} for all λ ≤ −1 large. As δ > 0 was arbitrary, we conclude that x(t) ≡ x̃(t).
In other applications, the parameter δ often needs to be fixed small for some
other reason. To accommodate this, we can pick t = 0 to be the first time after
which x(t) and x̃(t) disagree.
and hence will solve the IVP (A.1) with initial data ξ by Lemma A.1. Solutions
are unique by premise, and so the limit function must be x(t; ξ).
As the initial subsequence was arbitrary, we conclude that the entire se-
quence xn (t) converges to x(t; ξ).
Observe that |E_h(t)| ≲ 1 uniformly on [0, T], because f and x_0 ↦ x(t; x_0) are both Lipschitz. We also know that E_h(t) → 0 as h ↓ 0 for each t, because |x(t; x_0 + η) − x(t; x_0)| ≲ e^{C|t|} |η| and f is differentiable at x_0.
Now, for η ≠ 0 we have
| x(t; x_0 + η) − x(t; x_0) − η − ∫_0^t f′(s, x(s)) [x(s; x_0 + η) − x(s; x_0)] ds | ≤ |η| ∫_0^T E_{|η|}(s) ds.
Thus
‖V(t; x_0 + η) − V(t; x_0)‖
≤ ∫_0^t ‖f′(s, x(s; x_0 + η)) V(s; x_0 + η) − f′(s, x(s; x_0)) V(s; x_0)‖ ds
≤ ∫_0^t ‖f′(s, x(s))‖_op ‖V(s; x_0 + η) − V(s; x_0)‖ ds
+ ∫_0^T ‖f′(s, x(s; x_0 + η)) − f′(s, x(s; x_0))‖ ‖V(s; x_0 + η)‖ ds.
Also, the factor ‖V(s; x_0 + η)‖ is bounded, since by the differential equation we have
‖V(s; x_0 + η)‖ ≤ 1 · exp( ‖f‖_{Lip} T ).
Therefore, an application of Grönwall’s inequality (Lemma A.3) finishes the
proof.
We can extend this to include both higher degrees of regularity and depen-
dence on parameters:
Corollary A.15. If f : R_t × R^d_x × R^k_μ → R^d is C^r in (x, μ) and Lipschitz in x, then the solution to
ẋ = f(t, x(t), μ),   x(0) = ξ   (A.9)
is C^r in (ξ, μ).
Proof. We iterate the previous theorem. For example, consider the augmented system
(d/dt) ( x(t), ∂x/∂ξ(t), μ(t) ) = ( f(t, x(t), μ), (∂f/∂x)(t, x(t), μ) ∂x/∂ξ(t), 0 )   (A.10)
with initial data (ξ, Id, μ). Assume that f is C² in x and μ. Then the RHS obeys the hypotheses of Proposition A.14, and so we conclude that x(t), ∂x/∂ξ(t), and μ(t) are C¹ functions of ξ and μ.
The augmented system (A.10) is not only useful in proofs, but is also com-
monly used in numerical integration of the system (A.9) with parameter µ. The
time t can also be included in the augmented system, but this yields a weaker
smoothness result.
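As an illustration of this use (a minimal sketch; the scalar right-hand side f(t, x, μ) = μx − x³ and all values are arbitrary choices), one can integrate the pair (x, ∂x/∂ξ) from (A.10) and compare the computed sensitivity with a finite-difference quotient.

import numpy as np
from scipy.integrate import solve_ivp

# Augmented system (A.10) for the scalar example x' = f(t, x, mu) = mu*x - x**3:
# the sensitivity v = dx/dxi satisfies v' = (df/dx)(t, x, mu) * v with v(0) = 1.
mu = 0.8

def f(t, x):
    return mu * x - x**3

def dfdx(t, x):
    return mu - 3.0 * x**2

def augmented(t, y):
    x, v = y
    return [f(t, x), dfdx(t, x) * v]

xi, T = 0.1, 5.0
sol = solve_ivp(augmented, (0.0, T), [xi, 1.0], rtol=1e-10, atol=1e-12)
sensitivity = sol.y[1, -1]

# Centered finite-difference approximation of dx(T)/dxi for comparison.
h = 1e-6
xp = solve_ivp(f, (0.0, T), [xi + h], rtol=1e-12, atol=1e-14).y[0, -1]
xm = solve_ivp(f, (0.0, T), [xi - h], rtol=1e-12, atol=1e-14).y[0, -1]
print(sensitivity, (xp - xm) / (2.0 * h))    # the two values agree to several digits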
for all vector fields X, Y , and Z. This can be verified directly, but is not
particularly illuminating. Ultimately this is true because operators form an
associative algebra, which in turn is true because function composition is always
associative. (The Jacobi identity does not exclusively arise from associative
algebras however; the cross product also obeys the Jacobi identity, but is not
associative.)
A vector field X also has an associated first-order differential equation
ẋ = X(x).   (A.11)
Given a vector field X, we define the flow Φ_X(t) := x(t; ·), which maps the initial data ξ ∈ R^d to the solution x(t; ξ) ∈ R^d at time t for the differential equation ẋ = X(x).
To leading order, the commutator [X, Y] measures the failure of the flows Φ_X and Φ_Y to commute:
Lemma A.16. Let X and Y be smooth vector fields on R^d. Then
Φ_X(t) ∘ Φ_Y(s) − Φ_Y(s) ∘ Φ_X(t) = −st [X, Y] + O((|s| + |t|)³)
as s, t → 0.
Proof. We will Taylor expand the LHS. This is valid because flows are always smooth, and consequently we may also differentiate in any order. We compute
d/dt [Φ_X(t) ∘ Φ_Y(s)] = X ∘ Φ_X(t) ∘ Φ_Y(s),
and so in particular
∂/∂t [Φ_X(t) ∘ Φ_Y(0)] = X ∘ Φ_X(t).
As ΦY (0) = Id, we get the same expression for ΦY (0) ◦ ΦX (t). In this way,
there are no terms involving only t or only s in the Taylor expansion.
Therefore the leading order term in the Taylor expansion is quadratic, with only the st term not necessarily vanishing. We have
∂²/∂s∂t |_{s,t=0} [Φ_X(t) ∘ Φ_Y(s)] = (Y · ∇)X,
∂²/∂s∂t |_{s,t=0} [Φ_Y(s) ∘ Φ_X(t)] = (X · ∇)Y,
and hence
∂²/∂s∂t |_{s,t=0} [Φ_X(t) ∘ Φ_Y(s) − Φ_Y(s) ∘ Φ_X(t)](ξ) = [(Y · ∇)X − (X · ∇)Y](ξ) = −[X, Y](ξ)
as desired.
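This expansion is easy to test numerically. The sketch below (an illustration; the two vector fields on R² are arbitrary choices) compares the difference of the numerically computed flows, divided by st, with −[X, Y](ξ), using the convention [X, Y] = (X · ∇)Y − (Y · ∇)X from the proof above.

import numpy as np
from scipy.integrate import solve_ivp

# Two smooth vector fields on R^2 (arbitrary choices) together with their Jacobians.
def X(p):
    x, y = p
    return np.array([y, -np.sin(x)])

def Y(p):
    x, y = p
    return np.array([x * y, x - y**2])

def JX(p):
    x, y = p
    return np.array([[0.0, 1.0], [-np.cos(x), 0.0]])

def JY(p):
    x, y = p
    return np.array([[y, x], [1.0, -2.0 * y]])

def bracket(p):        # [X, Y](p) = (X.grad)Y - (Y.grad)X = JY(p) X(p) - JX(p) Y(p)
    return JY(p) @ X(p) - JX(p) @ Y(p)

def flow(V, t, p):     # Phi_V(t)(p), computed with an ODE solver
    return solve_ivp(lambda s, q: V(q), (0.0, t), p,
                     rtol=1e-12, atol=1e-14).y[:, -1]

xi = np.array([0.4, -0.3])
s = t = 1e-3
diff = flow(X, t, flow(Y, s, xi)) - flow(Y, s, flow(X, t, xi))
print(diff / (s * t))      # approximately -[X, Y](xi), up to O(|s| + |t|)
print(-bracket(xi))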
Integrating in time, we obtain the following important fact:
Theorem A.17. Let X and Y be smooth vector fields. Then [X, Y] = 0 if and only if the flows commute:
Φ_X(t) ∘ Φ_Y(s) = Φ_Y(s) ∘ Φ_X(t)   for all s, t ∈ R.
To prove that [X, Y] = 0 implies commutativity, fix s and t, write each composition as an N-fold product,
Φ_X(t) ∘ Φ_Y(s) = Φ_X(t/N) ∘ ⋯ ∘ Φ_X(t/N) ∘ Φ_Y(s/N) ∘ ⋯ ∘ Φ_Y(s/N),
and compare the two orderings by swapping one adjacent pair of factors at a time, so that consecutive terms of the resulting telescoping sum
differ around one grid square. Within the summand, there is a difference of two flows that agree up to one point, differ around an N^{−1} × N^{−1} box, and then continue as the flows of two possibly different points. Applying the lemma with [X, Y] = 0, the difference around one box is
Φ_X(t/N) ∘ Φ_Y(s/N) − Φ_Y(s/N) ∘ Φ_X(t/N) = 0 + O( (|s|³ + |t|³)/N³ ).
After this, the flows can then deviate at most exponentially. Altogether, we estimate the whole sum as
Φ_X(t) ∘ Φ_Y(s) − Φ_Y(s) ∘ Φ_X(t) = N² e^{C(|t|+|s|)} O( (|s|³ + |t|³)/N³ ).
Sending N → ∞, the RHS vanishes. As s, t were arbitrary, we conclude that
Φ_X(t) ∘ Φ_Y(s) − Φ_Y(s) ∘ Φ_X(t) ≡ 0.
has determinant c > 0 and hence is nonsingular. (The lower right submatrix is the identity because Φ_X(0; (0, y_2, …, y_d)) = (0, y_2, …, y_d).) Therefore Ψ is a local diffeomorphism by the inverse function theorem. Also, from (A.13) we see that
Ψ′(y) e_1 = (X ∘ Ψ)(y),
and so
ẏ = (Ψ′(y))^{−1} (X ∘ Ψ)(y) ≡ e_1
as desired.
To paraphrase, a nonvanishing vector field X is locally a coordinate vector field ∂/∂x^1 for some choice of coordinates. Unlike general collections of vector fields, coordinate vector fields commute with each other.
Proposition A.19. If X_1, …, X_n are smooth vector fields on R^d that commute and X_1(x_0), …, X_n(x_0) ∈ R^d are linearly independent, then there exists a local diffeomorphism that conjugates X_1, …, X_n into e_1, …, e_n.
Proof. We may assume x_0 = 0 after translating. First, we make a linear change of variables so that X_1(x_0) = e_1, …, X_n(x_0) = e_n, which can be done by linear independence. Let [Φ_{X_i}(t)](ξ) := x_i(t; ξ) denote the flow of the initial data ξ under the equation ẋ_i = X_i by time t. Define
Ψ(y) = [Φ_{X_1}(y_1) ∘ Φ_{X_2}(y_2) ∘ ⋯ ∘ Φ_{X_n}(y_n)](0, …, 0, y_{n+1}, …, y_d).
a study of ODEs. Nevertheless, the local structure of flows near a fixed point is still not fully understood to this day.
A first step is to linearize the vector field X(x):
X^i(x) ≈ 0 + Σ_{j=1}^d (∂X^i/∂x^j)(x_0) x^j.
Ideally, we would like to say that the higher order terms that we have neglected
are small. A fundamental idea to accomplish this is to write the nonlinear flow
as the linear flow plus a perturbation:
Lemma A.20 (Duhamel formula). Suppose A is a d × d matrix and g : R^d → R^d is smooth. Then x ∈ C¹((−T, T) → R^d) solves
ẋ = Ax + g(x)
if and only if it satisfies
x(t) = e^{At} x(0) + ∫_0^t e^{A(t−s)} g(x(s)) ds.   (A.14)
Recalling that y(t) = e^{−At} x(t) and multiplying by e^{At} yields the claim.
This proof method is called variation of parameters, and it is widely applicable. It is useful even when A is replaced by a more general (possibly nonlinear) operator, although the argument is then harder.
Even in the case when g is linear, the Duhamel formula (A.14) is nontrivial. Indeed, if we write g(x) = Bx and iterate, then we get
x(t) = e^{At} x(0) + ∫_0^t e^{A(t−s)} B e^{As} x(0) ds
+ ∬_{0<s_1<s_2<t} e^{A(t−s_2)} B e^{A(s_2−s_1)} B e^{A s_1} x(0) ds_1 ds_2 + ⋯ .
This is the infinite Duhamel expansion, where we sum over the possible histories
of x(t). Starting with x(0), we flow by A; to this, we add the flow by A, bump
by B, and flow by A; then we add the flow by A, bump by B, flow by A, bump
by B, and flow by A; and so on.
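The expansion can be checked directly when everything is linear. The sketch below (an illustration; the matrices, grid, and trapezoidal quadrature are arbitrary choices) iterates Duhamel's formula (A.14) with g(x) = Bx on a time grid and watches the iterates converge to the exact solution e^{(A+B)t} x(0), at a factorial rate until the quadrature error takes over.

import numpy as np
from scipy.linalg import expm

# Iterating Duhamel's formula (A.14) for x' = Ax + Bx: the iterates
#   x_{k+1}(t) = e^{At} x(0) + int_0^t e^{A(t-s)} B x_k(s) ds
# converge to the exact solution e^{(A+B)t} x(0).
A = np.array([[-1.0, 0.0], [0.0, -4.0]])
B = np.array([[0.0, 0.5], [-0.3, 0.2]])
x0 = np.array([1.0, 1.0])

T, N = 2.0, 200
t = np.linspace(0.0, T, N + 1)
dt = T / N
eA = [expm(A * ti) for ti in t]          # uniform grid: e^{A(t_i - t_j)} = eA[i - j]
exact = np.array([expm((A + B) * ti) @ x0 for ti in t])

x = np.array([eA[i] @ x0 for i in range(N + 1)])      # zeroth iterate e^{At} x(0)
for k in range(8):
    new = np.array([eA[i] @ x0 for i in range(N + 1)])
    for i in range(1, N + 1):
        vals = np.array([eA[i - j] @ (B @ x[j]) for j in range(i + 1)])
        w = np.full(i + 1, dt)
        w[0] = w[-1] = 0.5 * dt                       # trapezoidal weights
        new[i] = new[i] + w @ vals
    x = new
    print(k, np.abs(x - exact).max())    # decays factorially, then hits the O(dt^2) floor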
Duhamel's formula (A.14) is much more effective than the integral equation (A.2) at solving equations by iteration. Consider
ẋ = Ax + g(x),   A = diag(−1, −λ).
For λ > 1 large, we have to go to the term k ≈ λ in the iteration before the terms stop growing. On the other hand, e^{At} is very stable, and the Duhamel formula harnesses this.
Let us return to the behavior near a fixed point. If we make a diffeomorphic change of variables x = Ψ(y) with Ψ(0) = x_0, we know from (A.12) that the equation for y is
ẏ = (Ψ′(y))^{−1} X(Ψ(y)).
Expanding the RHS about y = 0, the first term vanishes since X(x_0) = 0. Therefore the matrix ∂Y^i/∂y^j is similar to ∂X^i/∂x^j, because they are conjugated by Ψ′(0). In particular, the Jordan normal form of ∂X^i/∂x^j (and hence all of its spectral properties) is preserved under this change of variables.
The following fundamental result tells us that often the actual flow is quali-
tatively similar to the linearized flow:
Theorem A.21 (Hartman–Grobman). If the matrix (∂X^i/∂x^j)(x_0) has no purely imaginary eigenvalues, then there exists a homeomorphism conjugating the nonlinear flow (A.11) to the linear flow
ẏ^i = Σ_{j=1}^d (∂X^i/∂x^j)(x_0) y^j.
eigenspaces? The stable and unstable manifold theorems tell us that they are
preserved under some additional assumptions. For a general linear system ẋ =
Ax, we define the stable, unstable, and center manifolds:
X_s = ⋃ { span ker((A − λ)^k) : Re λ < 0 },
X_u = ⋃ { span ker((A − λ)^k) : Re λ > 0 },
X_c = ⋃ { span ker((A − λ)^k) : Re λ = 0 },
See [CL55, §13.4] for a proof. To obtain the statement of the unstable manifold theorem we simply need to reverse time, which swaps A ↦ −A and X_s ↔ X_u.
The constant σ in part (a) may need to be taken strictly smaller than the smallest real part inf{|Re λ| : Re λ < 0}, and the constant C must depend on σ. To see this, consider a Jordan block A with eigenvalue λ satisfying Re λ < 0 and all ones on the superdiagonal. Then we have
e^{At} = e^{λt} ( I + tN + (t²/2) N² + ⋯ ),
where N is the nilpotent superdiagonal part, so the first row of the matrix factor is (1, t, ½t², …). Although the factor e^{λt} is exponentially decaying, the matrix factor is initially growing in t.
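For a concrete feel for this transient growth (a small numerical illustration; the size 4 and eigenvalue −1/2 are arbitrary choices), one can print ‖e^{At}‖ for a single Jordan block: the norm rises well above 1 before the exponential decay eventually wins.

import numpy as np
from scipy.linalg import expm

# A 4x4 Jordan block with eigenvalue -1/2: every eigenvalue has negative real part,
# yet the operator norm of e^{At} grows for moderate t before decaying.
lam, d = -0.5, 4
A = lam * np.eye(d) + np.diag(np.ones(d - 1), k=1)

for t in [0.0, 1.0, 3.0, 6.0, 12.0, 25.0]:
    print(t, np.linalg.norm(expm(A * t), 2))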
A.7. Exercises
A.1. (a) Fix f : R × R^d → R^d that is C¹ and consider the initial value problem ẋ = f(t, x) with x(0) = ξ.
Suppose that for some ξ_0 this problem admits a (necessarily unique) solution on the interval [0, T). Show that for each ε > 0 there is a δ > 0 so that if |ξ − ξ_0| < δ, then the initial value problem admits a solution on the interval [0, T − ε).
(b) Compute the maximal (forward) existence time as a function of the initial data for the problem
ẋ = x²/(1 + y² x²),   ẏ = 0.
Deduce that the existence time may fail to be continuous (in the natural topology on [0, ∞]). By part (a), however, it is always lower semi-continuous.
A.2. Suppose that A(t) and B(t) are C¹ square matrices with Ȧ = BA. Show that
(d/dt) det A = tr(B) det(A).   (A.15)
(Hint: Write the derivative of det A in terms of the derivatives of its rows and simplify using row operations. Be careful that you do not implicitly assume the eigenvalues are differentiable, which need not be true.)
A.3 (Trust singular values, not eigenvalues). Suppose A is a 2 × 2 matrix with eigenvalues −λ < 0 and −1, and eigenspaces which form angles ±θ with the horizontal axis. Fix t > 0 and λ > 1 sufficiently large. Show that e^{At} has norm ≥ c θ^{−1} for some constant c > 0 for all θ > 0 sufficiently small. Even though A has negative eigenvalues, the change of basis matrix makes the norm of e^{At} very large.
A.4. Find an example of 2 × 2 matrices A and B which both satisfy X_s = R² but A + B has X_u ≠ {0}. (Hint: Take A and B to be upper and lower triangular respectively, and consider det(A + B).) This phenomenon is known as Turing instability, and was introduced in [Tur52].
A.5. The proof of the stable manifold theorem actually shows that M is a smooth manifold that is tangent to X_s at the origin, by arguing as in the proof of Proposition A.14. With this information in hand, we can derive many properties from the differential equation. For example, consider the system
q̇ = q − 2pq + 3p²,   ṗ = −p + p².
(Incidentally, this system is Hamiltonian, but this is not important to this method.) On M we can write q = ψ(p). Taking a time derivative of this equation, match Taylor coefficients using ψ(0) = 0 = ψ′(0) to compute the second-order expansion of ψ at p = 0.
REFERENCES
[KdV95] D. J. Korteweg and G. de Vries, On the change of form of long waves advancing in
a rectangular canal, and on a new type of long stationary waves, Philos. Mag. (5)
39 (1895), no. 240, 422–443. MR 3363408
[KH95] A. Katok and B. Hasselblatt, Introduction to the modern theory of dynamical sys-
tems, Encyclopedia of Mathematics and its Applications, vol. 54, Cambridge Univer-
sity Press, Cambridge, 1995, With a supplementary chapter by Katok and Leonardo
Mendoza. MR 1326374
[KS65] P. Kustaanheimo and E. Stiefel, Perturbation theory of Kepler motion based on
spinor regularization, J. Reine Angew. Math. 218 (1965), 204–219. MR 180349
[KV13] R. Killip and M. Vişan, Nonlinear Schrödinger equations at critical regularity, Evo-
lution equations, Clay Math. Proc., vol. 17, Amer. Math. Soc., Providence, RI, 2013,
pp. 325–437. MR 3098643
[Lee13] J. M. Lee, Introduction to smooth manifolds, second ed., Graduate Texts in Math-
ematics, vol. 218, Springer, New York, 2013. MR 2954043
[Lie80] S. Lie, Theorie der Transformationsgruppen I, Math. Ann. 16 (1880), no. 4, 441–
528. MR 1510035
[LL76] L. D. Landau and E. M. Lifshitz, Course of theoretical physics. Vol. 1, third ed.,
Pergamon Press, Oxford-New York-Toronto, Ont., 1976, Mechanics, Translated
from the Russian by J. B. Sykes and J. S. Bell. MR 0475051
[LP89] P. D. Lax and R. S. Phillips, Scattering theory, second ed., Pure and Applied
Mathematics, vol. 26, Academic Press, Inc., Boston, MA, 1989, With appendices
by Cathleen S. Morawetz and G. Schmidt. MR 1037774
[Mor75] C. S. Morawetz, Notes on time decay and scattering for some hyperbolic problems,
Regional Conference Series in Applied Mathematics, No. 19, Society for Industrial
and Applied Mathematics, Philadelphia, Pa., 1975. MR 0492919
[MZ05] J. Moser and E. J. Zehnder, Notes on dynamical systems, Courant Lecture Notes
in Mathematics, vol. 12, New York University, Courant Institute of Mathematical
Sciences, New York; American Mathematical Society, Providence, RI, 2005. MR
2189486
[Nah16] P. Nahin, In praise of simple physics, Princeton University Press, 2016.
[Rab78] P. H. Rabinowitz, Periodic solutions of Hamiltonian systems, Comm. Pure Appl.
Math. 31 (1978), no. 2, 157–184. MR 467823
[San17] F. Santambrogio, Euclidean, metric, and Wasserstein gradient flows: an
overview, Bull. Math. Sci. 7 (2017), no. 1, 87–154. MR 3625852
[Sie54] C. L. Siegel, Über die Existenz einer Normalform analytischer Hamiltonscher
Differentialgleichungen in der Nähe einer Gleichgewichtslösung, Math. Ann. 128
(1954), 144–170. MR 67298
[SM95] C. L. Siegel and J. K. Moser, Lectures on celestial mechanics, Classics in Mathe-
matics, Springer-Verlag, Berlin, 1995, Translated from the German by C. I. Kalme,
Reprint of the 1971 translation. MR 1345153
[SS98] J. Shatah and M. Struwe, Geometric wave equations, Courant Lecture Notes in
Mathematics, vol. 2, New York University, Courant Institute of Mathematical
Sciences, New York; American Mathematical Society, Providence, RI, 1998. MR
1674843
[Str08] M. Struwe, Variational methods, fourth ed., Ergebnisse der Mathematik und ihrer
Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in
Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Math-
ematics], vol. 34, Springer-Verlag, Berlin, 2008, Applications to nonlinear partial
differential equations and Hamiltonian systems. MR 2431434
[Str15] S. H. Strogatz, Nonlinear dynamics and chaos, second ed., Westview Press, Boulder,
CO, 2015, With applications to physics, biology, chemistry, and engineering. MR
3837141
[Tao06] T. Tao, Nonlinear dispersive equations, CBMS Regional Conference Series in Math-
ematics, vol. 106, Published for the Conference Board of the Mathematical Sciences,
Washington, DC; by the American Mathematical Society, Providence, RI, 2006, Lo-
cal and global analysis. MR 2233925
[Tro96] J. L. Troutman, Variational calculus and optimal control, second ed., Undergradu-
ate Texts in Mathematics, Springer-Verlag, New York, 1996, With the assistance of
William Hrusa, Optimization with elementary convexity. MR 1363262
[Tur52] A. M. Turing, The chemical basis of morphogenesis, Philos. Trans. Roy. Soc. London
Ser. B 237 (1952), no. 641, 37–72. MR 3363444
[Wei78] A. Weinstein, Periodic orbits for convex Hamiltonian systems, Ann. of Math. (2)
108 (1978), no. 3, 507–518. MR 512430
[Wei83] , The local structure of Poisson manifolds, J. Differential Geom. 18 (1983),
no. 3, 523–557. MR 723816
[Wil36] J. Williamson, On the Algebraic Problem Concerning the Normal Forms of Linear
Dynamical Systems, Amer. J. Math. 58 (1936), no. 1, 141–163. MR 1507138