
NOTES ON CLASSICAL MECHANICS

THIERRY LAURENS

Last updated: January 16, 2023


CONTENTS

I Newtonian Mechanics 1
1 Newton’s equations 2
1.1 Empirical assumptions . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Kinetic energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Potential energy . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Total energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Linear momentum . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Angular momentum . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 One degree of freedom 18


2.1 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Conservative systems . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Nonconservative systems . . . . . . . . . . . . . . . . . . . . . . . 23
2.4 Time reversibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.5 Periodic motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3 Central fields 38
3.1 Central fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Periodic orbits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Kepler’s problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.4 Virial theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

II Lagrangian Mechanics 55
4 Euler–Lagrange equations 56
4.1 Principle of least action . . . . . . . . . . . . . . . . . . . . . . . 56
4.2 Conservative systems . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.3 Nonconservative systems . . . . . . . . . . . . . . . . . . . . . . . 64
4.4 Equivalence to Newton’s equations . . . . . . . . . . . . . . . . . 66
4.5 Momentum and conservation . . . . . . . . . . . . . . . . . . . . 67


4.6 Noether’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 69


4.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

5 Constraints 77
5.1 D’Alembert–Lagrange principle . . . . . . . . . . . . . . . . . . . 77
5.2 Gauss’ principle of least constraint . . . . . . . . . . . . . . . . . 79
5.3 Integrability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4 Integral constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.5 Existence of closed orbits . . . . . . . . . . . . . . . . . . . . . . 85
5.6 One-form constraints . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

6 Hamilton–Jacobi equation 98
6.1 Hamilton–Jacobi equation . . . . . . . . . . . . . . . . . . . . . . 98
6.2 Separation of variables . . . . . . . . . . . . . . . . . . . . . . . . 101
6.3 Conditionally periodic motion . . . . . . . . . . . . . . . . . . . . 102
6.4 Geometric optics analogy . . . . . . . . . . . . . . . . . . . . . . 104
6.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

III Hamiltonian Mechanics 110


7 Hamilton’s equations 111
7.1 Hamilton’s equations . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2 Legendre transform . . . . . . . . . . . . . . . . . . . . . . . . . . 113
7.3 Poisson structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
7.4 Hamiltonian vector fields . . . . . . . . . . . . . . . . . . . . . . 120
7.5 Darboux theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
7.6 Canonical transformations . . . . . . . . . . . . . . . . . . . . . . 125
7.7 Liouville’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

8 Normal forms 135


8.1 Generating functions . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.2 Local structure of Hamiltonian flows . . . . . . . . . . . . . . . . 137
8.3 Normal forms for quadratic Hamiltonians . . . . . . . . . . . . . 139
8.4 Birkhoff normal form . . . . . . . . . . . . . . . . . . . . . . . . . 143
8.5 Completely integrable systems . . . . . . . . . . . . . . . . . . . 147
8.6 Toda lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
8.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

9 Symplectic geometry 161


9.1 Symplectic structure . . . . . . . . . . . . . . . . . . . . . . . . . 161
9.2 Hamiltonian vector fields . . . . . . . . . . . . . . . . . . . . . . 163
9.3 Integral invariants . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.4 Poisson bracket . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

9.5 Time-dependent systems . . . . . . . . . . . . . . . . . . . . . . . 168


9.6 Locally Hamiltonian vector fields . . . . . . . . . . . . . . . . . . 169
9.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

10 Contact geometry 174


10.1 Contact structure . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
10.2 Hamiltonian vector fields . . . . . . . . . . . . . . . . . . . . . . 176
10.3 Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
10.4 Contact transformations . . . . . . . . . . . . . . . . . . . . . . . 178
10.5 Time-dependent systems . . . . . . . . . . . . . . . . . . . . . . . 181
10.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

Appendix A Fundamentals of ODE theory 186


A.1 Picard iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
A.2 Alternative approaches to well-posedness . . . . . . . . . . . . . . 192
A.3 Smooth dependence upon initial data . . . . . . . . . . . . . . . . 196
A.4 Vector fields and flows . . . . . . . . . . . . . . . . . . . . . . . . 199
A.5 Behavior away from fixed points . . . . . . . . . . . . . . . . . . 201
A.6 Behavior near a fixed point . . . . . . . . . . . . . . . . . . . . . 202
A.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

References 207
PREFACE

Although classical, the subject of mechanical systems continues to be im-


portant in modern research on differential equations, with applications to the
studies of ODEs, Lagrangian PDEs, and Hamiltonian PDEs. In each applica-
tion, the general theorems of mechanics serve as overarching principles while
the details for the particular example at hand are worked out. Nevertheless,
the subject of classical mechanics is often elided in today’s physics curricula
in order to make way for more active fields of research. I found this partic-
ularly inconvenient as a student learning mathematical tools originating from
classical mechanics, as neither the physics nor math departments taught the
corresponding physical motivation and intuition. Consequently, these notes are
the product of a personal effort to collect these mathematical ideas and connect
them to their physical inspiration.
The first objective of these notes is to present the core theory of classical
mechanics palatable to mathematicians. The presentation is intended to be self-
contained and mathematically rigorous, while maintaining the example-driven
physical mindset in order to cultivate intuition. From a physics viewpoint how-
ever, this excludes many topics that commonly appear in a first-year graduate
course on the subject (e.g. rigid bodies). For a thorough study of the physical
theory, techniques, and examples of classical mechanics, we refer the reader to
the classic physics texts [Arn89, Gol51, LL76].
The second objective of these notes is to develop the resultant mathematical
theory alongside the physical inspiration, and to recognize the physical system
as an example. These mathematical tools are foundational in the theory of
ODEs, and they are still commonly found throughout modern mathematics.
Note that this does not include the rich field of mathematical methods used
in physics, for which we refer the reader to the excellent mathematics texts
[AKN06, SM95, MZ05].
In order to focus on mathematical tools which arose from mechanics, we will
assume the reader possesses a beginner graduate or advanced undergraduate un-
derstanding of mathematics. Specifically, we assume the reader is familiar with
multivariate real analysis throughout, with manifolds and tangent bundles in
Part II, and with differential forms in Part III. Moreover, we will almost always
work in the class of smooth (i.e. infinitely differentiable) functions. Although
many of the results hold under weaker assumptions, we choose to focus on the
proofs in the smooth case in order to highlight the physical motivation.

PART I

NEWTONIAN MECHANICS

The Newtonian framework is the most fundamental interpretation of the


motion of mechanical systems. The physical theory is based on a few empirical
observations which are taken as axioms, among which are Newton’s equations of
motion. The resulting differential equations inherit special properties from their
corresponding physical systems. Extracting mathematical statements about the
behavior of solutions from these properties illustrates some classical tools from
the theory of ODEs. Along the way, we will also identify the key mathematical
ideas that will allow us to include more general systems in the proceeding parts.

CHAPTER 1

NEWTON’S EQUATIONS

We begin the physical theory with the essential empirical observations and
its immediate consequences. The material for this chapter is based on [Arn89,
Ch. 1], [AKN06, Ch. 1], and [Gol51, Ch. 1].

1.1. Empirical assumptions

Classical mechanics is the study of the motion of particles in Euclidean


space Rd . A particle (or point mass) consists of two pieces of information:
the body’s mass m > 0 and its position x ∈ Rd . This model for physical
objects neglects the object’s spatial dimensions. Given a system of N particles
with positions xi ∈ Rd , the collection of all possible positions x = (x1 , . . . , xN )
constitutes the configuration space R^d × · · · × R^d = R^{Nd}, whose dimension
Nd is the system's degrees of freedom.
The evolution of a system is described by the particles’ trajectories, N
maps xi : I → Rd for I ⊂ R an interval, which together constitute the motion
of the system. In order to determine the system’s evolution we will use the
velocity ẋi , acceleration ẍi , and momentum
pi = mi ẋi (1.1)
of the ith particle. (We use the notation ḟ = df/dt for time derivatives.) The positions
and momenta together span the phase space R^{Nd} × R^{Nd}, whose dimension
2Nd is always twice the degrees of freedom. We will refer to the trajectories of
the system plotted in phase space as the system’s phase portrait.
Newton’s principle of determinacy is the experimental observation that
the initial state of a Newtonian system—all the positions x(t0 ) and velocities
ẋ(t0 ) at some moment t0 ∈ R in time—uniquely determines the system’s motion.
We formulate this mathematically as follows:
Definition 1.1 (Newton’s principle of determinacy). A Newtonian system
is given by N particles and a function F : R × (R^{Nd} ∖ ∆) × R^{Nd} → R^{Nd}, called
the force, such that
ṗ = F(t, x, ẋ). (1.2)
Here, ∆ = ⋃_{i<j} {xi = xj} is the union of diagonals, so that we do not consider
two particles occupying the same position.


The system of equations (1.2) are called Newton’s equations (or New-
ton’s second law). Unless otherwise noted, we will always assume the particle
masses mi are constant. In this case, (1.2) takes the form

mi ẍi = Fi (t, x, ẋ) for i = 1, . . . , N. (1.3)

Newton’s equations are commonly stated in terms of the momentum p instead


of the velocity ẋ because experimentally we can observe that a particle’s accel-
eration by a force is inversely proportional to its mass.
In mathematics, it is common practice to reduce a system of N second-order
ordinary differential equations (ODEs) on configuration space to first-order. To
do this, we group the equations (1.1) for ẋ together with the equations (1.2)
for ṗ to obtain a system of 2N first-order ODEs on phase space. For initial
conditions, we then take the positions x(t0 ) and momenta p(t0 ) at some time
t0 ∈ R. For convenience, we may assume that t0 = 0 after replacing the variable
t by t − t0 .
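The reduction to a first-order system on phase space is easy to carry out numerically. The following minimal sketch (not from the text) implements a hand-rolled classical RK4 step and applies it to the hypothetical force F = −kx of a harmonic oscillator, chosen because with m = k = 1, x(0) = 1, ẋ(0) = 0 the exact solution x(t) = cos t provides a check on the computation.

```python
import math

def rk4_step(f, t, y, h):
    """One classical Runge-Kutta step for the first-order system y' = f(t, y)."""
    k1 = f(t, y)
    k2 = f(t + h/2, [yi + h/2*ki for yi, ki in zip(y, k1)])
    k3 = f(t + h/2, [yi + h/2*ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h*ki for yi, ki in zip(y, k3)])
    return [yi + h/6*(a + 2*b + 2*c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

m, k = 1.0, 1.0   # hypothetical unit mass and spring constant

def phase_field(t, y):
    # y = (x, p): group ẋ = p/m (eq. 1.1) together with ṗ = F (eq. 1.2)
    x, p = y
    return [p / m, -k * x]

t, h, y = 0.0, 0.001, [1.0, 0.0]   # initial state (x(0), p(0))
while t < 1.0 - 1e-12:
    y = rk4_step(phase_field, t, y, h)
    t += h

print(abs(y[0] - math.cos(1.0)))   # small discretization error
```

The first-order formulation is what lets a generic ODE stepper handle Newton's second-order equations at all; the same pattern extends verbatim to N particles in d dimensions by taking y of length 2Nd.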
Unless otherwise stated, we will always assume that the force F is smooth
(i.e. infinitely differentiable) on R × (R^{Nd} ∖ ∆) × R^{Nd}. It then follows (cf. Theorem
A.5) that there exists a unique solution x(t) to the system of differential
equations (1.3) for any initial state. However, the solution is only guaranteed
to exist on a short time interval, and in general may not be able to be extended
for all time (cf. Example A.4). From experience though, we expect that for
naturally occurring mechanical systems the time interval of existence can be
extended to all of R. Indeed, most of our mathematical statements will include
a premise that ensures that the solutions to our mathematical model (1.3) exist
for all time.
We will often impose one additional assumption: that the physical laws
governing the system's motion are independent of the choice of coordinates and
origin on R^d. A transformation R_t × R^d → R_t × R^d is Galilean provided that
it is an affine transformation (a linear transformation and a translation) that
preserves time intervals and for any fixed t ∈ R is an isometry of R^d. In other
words, if we write g(t, x) = (t′, x′), then for each t ∈ R the map x ↦ x′ is the
composition of an orthogonal transformation and a translation. It is straightforward to check that
the set of Galilean transformations forms a group under function composition.

Example 1.2. The following are all Galilean transformations:


(a) Translations: g1 (t, x) = (t + t0 , x + x0 ) for some fixed x0 ∈ Rd , t0 ∈ R.
(b) Rotations and reflections: g2 (t, x) = (t, Ax) for some fixed orthogonal
transformation A ∈ O(d).

(c) Galilean boosts: g3 (t, x) = (t, x + vt) for some fixed velocity v ∈ Rd .
In fact, these examples generate the entire Galilean group (cf. Exercise 1.1).
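The defining properties of a Galilean transformation can be spot-checked numerically for a composition of the three generators above. This is a sketch with arbitrary hypothetical parameters (the particular t0, x0, θ, v below are not from the text), in d = 2:

```python
import math

# The three generators of Example 1.2, with arbitrary fixed parameters.
def translate(t, x, t0=0.5, x0=(1.0, -2.0)):
    return t + t0, (x[0] + x0[0], x[1] + x0[1])

def rotate(t, x, theta=0.3):
    c, s = math.cos(theta), math.sin(theta)
    return t, (c*x[0] - s*x[1], s*x[0] + c*x[1])

def boost(t, x, v=(0.7, 0.2)):
    return t, (x[0] + v[0]*t, x[1] + v[1]*t)

def g(t, x):
    # an arbitrary composition of the generators, hence a Galilean map
    return translate(*rotate(*boost(t, x)))

t, s = 1.3, 2.9                  # two moments in time
a, b = (0.0, 1.0), (3.0, -1.0)   # two simultaneous events at time t

ta, xa = g(t, a)
tb, xb = g(t, b)
sa, _ = g(s, a)

dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
print(ta - tb)                    # ≈ 0: simultaneity is preserved
print(dist(xa, xb) - dist(a, b))  # ≈ 0: an isometry at each fixed t
print((sa - ta) - (s - t))        # ≈ 0: time intervals are preserved
```

The same checks applied to the composition in the other order would verify closure under composition, which is the group property asserted above.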
Galileo’s principle of relativity is the experimental observation that for
an isolated system there exists a reference frame—a choice of origin and
coordinate axes for Rd —that is invariant under any Galilean transformation.

Such a reference frame is called inertial, and the principle also asserts that all
coordinate systems in uniform motion with constant velocity with respect to an
inertial frame must also be inertial. (This is observed, for example, in a car
traveling at constant velocity and noting that motion inside the car is as if the
car were at rest.) We formulate this mathematically as follows:
Definition 1.3 (Galileo’s principle of relativity). A Newtonian system is iso-
lated if there exists a reference frame so that Newton’s equations (1.2) are
invariant under any Galilean transformation.
Physically, this is requiring that the ambient space is both homogeneous and
isotropic and that time is homogeneous. Geometrically, this principle requires
that if we apply a Galilean transformation to a phase portrait, then the resulting
graph still consists of trajectories.
If Newton’s equations (1.2) hold in an inertial coordinate system, then they
must be invariant with respect to the Galilean group. Let x(t) denote a solution
in these coordinates. Applying the Galilean group generators of Example 1.2,
we find the following conditions on Fi :
(a) Translation invariance: Fi (t, x, ẋ) ≡ Fi (xj − xk , ẋ).
(b) Rotation and reflection invariance: Fi (Ax, Aẋ) = AFi (x, ẋ) for A ∈ O(d).
(c) Boost invariance: Fi (xj − xk , ẋ) ≡ Fi (xj − xk , ẋj − ẋk ).
Note that the third type of transformation in Example 1.2 changes neither ẍ
nor xi − xj .
Proposition 1.4 (Newton’s first law, special case). For an isolated Newtonian
system of one particle, the particle’s acceleration in an inertial coordinate sys-
tem vanishes. In particular, the motion is rectilinear: uniform in time with
constant velocity.
Proof. Taking N = 1, the conditions (a)–(c) above require that F is independent
of x, ẋ, t and is rotationally invariant. Therefore F ≡ 0.

1.2. Kinetic energy

The kinetic energy of the ith particle and the total kinetic energy are
given by
Ki = ½ mi |ẋi|² = |pi|²/(2mi),    K = ∑_{i=1}^N Ki  (1.4)
respectively. From experience, we know that the magnitude of the velocity and
hence the kinetic energy can be increased and decreased by the force Fi acting
on the ith particle, depending on the force’s magnitude and direction. This is
measured through the work Wi done by the force Fi on the ith particle from
time 0 to t, defined by
Wi = ∫_{x(0)}^{x(t)} Fi · dsi = ∫_0^t Fi(x(τ)) · ẋ(τ) dτ.

(We use dsi to denote the line element of the trajectory xi (t), and so the second
equality is the definition of path integration.) Although work is measured in
the physical space Rd , the total work
W = ∫_{x(0)}^{x(t)} F · ds

is measured on configuration space R^{Nd}.


The net change in kinetic energy is due to the work done by the force F on
the path x(t) in configuration space:
Proposition 1.5. The increase in total kinetic energy is equal to the total work.

Proof. Differentiating the kinetic energy (1.4), we obtain


K̇ = ∑_{i=1}^N mi ẋi · ẍi = ∑_{i=1}^N ẋi · Fi = ẋ · F  (1.5)

by Newton’s equations (1.3). Integrating, we have


K(t) − K(0) = ∫_0^t K̇ dτ = ∫_0^t ẋ · F dτ = ∫_{x(0)}^{x(t)} F · ds = W.

1.3. Potential energy

For some systems there is also a potential energy. In physics, a Newtonian


system is called conservative if the force F(t, x, ẋ) ≡ F(x) depends only on
the positions x and if the total work along any path connecting two points y, z
in configuration space,

W = ∫_y^z F(s) · ds,

is independent of the path chosen for the line integral. (Note that the line
integral path is allowed to be arbitrary, and is not limited to trajectories.) This
independence is equivalent to the work around any simple closed path vanishing,
since two paths with the same endpoints can be concatenated to form one closed
path.
One way for this to be satisfied is if there is a potential energy, a function
V such that F = −∇V . Indeed, if this is the case, then by the fundamental
theorem of calculus we have
W = − ∫_y^z ∇V(s) · ds = −V(z) + V(y)

for all paths connecting y to z.



Example 1.6. If the interaction forces depend only on particle distances:

Fi = ∑_{j=1}^N Fij,    Fij = fij(|xi − xj|) eij,    eij = (xj − xi)/|xj − xi|,

then the system is conservative with potential energy

V = ∑_{i<j} Vij,    Vij = ∫ fij(r) dr.
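The relation F = −∇V asserted in Example 1.6 can be spot-checked by central finite differences. The sketch below (all constants hypothetical) uses the radial pair force f(r) = c/r², whose antiderivative gives the pair potential V_ij(r) = −c/r, and compares the analytic force on each particle with the numerical gradient of V:

```python
import math

c = 2.0   # hypothetical coupling constant
pts = [(0.0, 0.0, 0.0), (1.0, 0.5, -0.3), (-0.7, 1.2, 0.4)]

def V(pts):
    """Total potential energy: sum of the pair potentials -c/r over i < j."""
    total = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            total += -c / math.dist(pts[i], pts[j])
    return total

def force(i, pts):
    """Analytic force F_i = sum_{j != i} f(|x_i - x_j|) e_ij with f(r) = c/r^2."""
    F = [0.0, 0.0, 0.0]
    for j in range(len(pts)):
        if j == i:
            continue
        r = math.dist(pts[i], pts[j])
        for a in range(3):
            F[a] += (c / r**2) * (pts[j][a] - pts[i][a]) / r
    return F

h = 1e-6
for i in range(len(pts)):
    for a in range(3):
        plus = [list(p) for p in pts]; plus[i][a] += h
        minus = [list(p) for p in pts]; minus[i][a] -= h
        grad = (V(plus) - V(minus)) / (2 * h)
        assert abs(-grad - force(i, pts)[a]) < 1e-5   # F = -grad V, componentwise
print("F = -grad V verified componentwise")
```

Since e_ij = −e_ji, this configuration also satisfies the law of action and reaction of section 1.5, so the forces sum to zero over the system.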

In fact, every conservative system must have a potential energy:


Proposition 1.7. A Newtonian system is conservative if and only if there exists
a potential energy, i.e. a smooth function V : R^{Nd} ∖ ∆ → R such that F = −∇V.
Proof. First suppose that the total work integral is path independent. Fix some
x0 ∈ R^{Nd} ∖ ∆. Then the line integral

V(x) = − ∫_{x0}^x F(s) · ds

is well-defined as a function of x, because its value is independent of the choice


of path from x0 to x. We then have F = −∇V by the fundamental theorem of
calculus.
Conversely, if there is a potential energy V , then the fundamental theorem
of calculus tells us that
∫_{x0}^x F · ds = − ∫_{x0}^x ∇V(s) · ds = −V(x) + V(x0).

That is, the work is independent of the path.


In Proposition 1.7, it is assumed that F and V are defined on all of R^{Nd} ∖ ∆.
The statement does not hold for arbitrary open subsets of the configuration
space R^{Nd}:
Example 1.8. Consider the vector field

F(x, y) = ( y/(x² + y²), −x/(x² + y²) )

on R² ∖ {(0, 0)}. This is the negative gradient of the angle coordinate, and it is
defined on all of R² ∖ {(0, 0)} even though the angle coordinate itself is not.
Consequently, by the fundamental theorem of calculus the work done on a particle
traveling around any simple closed curve not containing the origin is zero. However,
the work done on a particle traveling once clockwise about the unit circle is 2π,
which reflects the fact that we cannot define a single-valued angle coordinate on
all of R² ∖ {(0, 0)}. Consequently, the system consisting of one particle subject
to this force is nonconservative.
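Both computations in Example 1.8 can be reproduced by discretizing the line integral ∮ F · ds along a closed polygonal loop; the loop sizes and point counts below are arbitrary choices:

```python
import math

def F(x, y):
    r2 = x*x + y*y
    return (y / r2, -x / r2)

def work(path):
    """Approximate the work ∮ F · ds along a closed polygonal path (midpoint rule)."""
    W = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:] + path[:1]):
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        Fx, Fy = F(mx, my)
        W += Fx * (x1 - x0) + Fy * (y1 - y0)
    return W

n = 100000
# the unit circle traversed clockwise (the origin is enclosed)
cw = [(math.cos(-2*math.pi*k/n), math.sin(-2*math.pi*k/n)) for k in range(n)]
# a small circle around (3, 0) (the origin is not enclosed)
off = [(3 + 0.5*math.cos(2*math.pi*k/n), 0.5*math.sin(2*math.pi*k/n))
       for k in range(n)]

print(work(cw))    # ≈ 2π, as in the example
print(work(off))   # ≈ 0: the loop avoids the origin
```

The nonzero answer for the enclosing loop is exactly the obstruction to defining a single-valued potential on all of R² ∖ {(0, 0)}.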

Figure 1.1: Configuration space for the system of Example 1.8.

Example 1.8 shows that unlike on R², a curl-free vector field on R² ∖ {(0, 0)} is
not necessarily a gradient field. Consequently, the fact that Proposition 1.7 holds
on R^{Nd} ∖ ∆ may initially be surprising. To understand why this works, take
d = 3, fix N − 1 particles, and consider only moving the ith particle. Let S be
a smooth 2-dimensional surface in R³ with boundary. Using Stokes' theorem
we write

0 = ∮_{∂S} Fi · dsi = ∬_S (∇i × Fi) · n dA,

where n is a unit vector field that is perpendicular to S. For this to hold for
all such surfaces S, we must have ∇i × Fi ≡ 0. So the definition of a
conservative force is an integral formulation of Fi being curl-free on all of R³,
which avoids the issue of Fi not being defined on ⋃_{j≠i} {xj}. As the domain is
all of R³, we know that ∇ × Fi ≡ 0 implies Fi = −∇Vi(xi), but with the caveat
that our potential Vi may not be defined on ⋃_{j≠i} {xj} because F is not defined
on the diagonals ∆.
A similar argument also applies to dimensions d ≠ 3 using differential forms:
if Fi is conservative then Fi is a closed 1-form (i.e. dFi ≡ 0) on all of R^d. This
then implies that Fi is exact (i.e. Fi = −dVi), although the 0-form Vi may not
be defined on ⋃_{j≠i} {xj}. In comparison, the 1-form Fi of Example 1.8 is closed
on R² ∖ {0} (if S ⊂ R² ∖ {0} then the work around ∂S vanishes), but it is not
closed on all of R².

1.4. Total energy

We now have two notions of mechanical energy: kinetic and potential. Their
sum E = K + V is the system’s total energy, and it is conserved under the
dynamics:
Proposition 1.9 (Conservation of energy). For a conservative system, the total
energy E is conserved: E(t) = E(0) for all t ∈ R.

Proof. By the computation (1.5) and Newton’s equations (1.3), we have

Ė = K̇ + ∇V · ẋ = ẋ · F − F · ẋ = 0.

Therefore E(t) must be constant.
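Conservation of energy is also easy to observe along a numerical trajectory. The following sketch (with the hypothetical coercive potential V(x) = x⁴/4 and unit mass, not an example from the text) monitors E = ½ẋ² + V(x) under RK4 time stepping; the tiny residual drift is pure discretization error, since E is exactly conserved for the ODE itself:

```python
def V(x): return x**4 / 4
def dV(x): return x**3

def rk4(y, h):
    """One RK4 step for Newton's equation x'' = -V'(x) as a first-order system."""
    def f(y):
        x, v = y
        return (v, -dV(x))
    k1 = f(y)
    k2 = f((y[0] + h/2*k1[0], y[1] + h/2*k1[1]))
    k3 = f((y[0] + h/2*k2[0], y[1] + h/2*k2[1]))
    k4 = f((y[0] + h*k3[0], y[1] + h*k3[1]))
    return (y[0] + h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0]),
            y[1] + h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1]))

def E(y): return 0.5*y[1]**2 + V(y[0])

y, h = (1.5, 0.0), 0.001   # hypothetical initial state (x(0), v(0))
E0 = E(y)
drift = 0.0
for _ in range(20000):     # integrate to t = 20
    y = rk4(y, h)
    drift = max(drift, abs(E(y) - E0))
print(drift)               # tiny: the numerical energy barely moves
```

Since this V is coercive, Corollary 1.10 below guarantees the true solution exists for all time, and the bounded energy level confines the computed trajectory to a compact region of phase space.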


The conservation of energy is associated with the system's time symmetry.
Clearly a conservative system is automatically time-independent by definition.
As we will see later, converses of this statement are also true (cf. section 2.4,
Proposition 4.11, Exercise 4.8, and Proposition 7.3).
In section 2.2 we will explore the implications of energy conservation in
greater detail, but as immediate consequences we record a couple of statements
regarding the global existence of solutions. The best case scenario is when the
potential is coercive at infinity:
Corollary 1.10. If a conservative system has a smooth nonnegative potential
energy V (x) that satisfies V (x) → +∞ as |x| → ∞, then solutions exist for all
time.
Proof. Fix initial data and let E0 denote the corresponding initial energy. By
conservation of energy (Proposition 1.9), we know that the solution is confined
to the region {(x, ẋ) : ½|ẋ|² + V(x) = E0} for as long as it exists. This region
is contained in the set {½|ẋ|² ≤ E0} ∩ {V(x) ≤ E0}, which is bounded since
V(x) → +∞ as |x| → ∞, and hence our trajectory is bounded as well. In
particular, the blowup condition of Corollary A.6 can never be satisfied, and so
the maximal time of existence cannot be finite.
Even when the potential does not tend to infinity, we can still conclude
global existence near the minima of the potential:
Corollary 1.11. If a conservative system has a smooth potential energy V(x)
such that V(0) = 0, ∇V(0) = 0, and the Hessian matrix V″(0) is positive definite,
then there exists r > 0 so that solutions with initial data (x(0), ẋ(0)) in the ball
Br(0) exist for all time.
Proof. We follow the bootstrap argument from [Tao06].
Let ε > 0 be a small parameter to be chosen and assume that

|x(t)|² + |ẋ(t)|² ≤ (2ε)²  (1.6)

for some time t. Taylor expanding about x = 0, we have

V(x) = ½ x · V″(0)x + O(|x|³).

As V″(0) is positive definite and |x(t)| ≤ 2ε, then we have

V(x(t)) ≥ c|x(t)|² − Cε³

for some constants c, C > 0. By conservation of energy (Proposition 1.9), we
then have

½|ẋ(t)|² + c|x(t)|² ≤ E(t) + Cε³ = E(0) + Cε³.

Picking E(0) and ε sufficiently small, we conclude that

|x(t)|² + |ẋ(t)|² ≤ ε².  (1.7)

Let I denote the set of t ∈ R for which (1.7) holds. As t ↦ (x(t), ẋ(t)) is
continuous, the set I is open: we just showed that (1.6) at time t implies (1.7)
for the same t, and (1.7) at time t trivially implies that (1.6) holds on a
neighborhood of t. The set I is also closed, since the inequality (1.7) is a
closed condition. Finally, picking r sufficiently small, we can ensure that
the set I contains t = 0 and thus is nonempty. Altogether, the connectedness
of R implies that I = R. In particular, the blowup condition of Corollary A.6
can never be satisfied, and so the maximal time of existence cannot be finite.
For intuition about the behavior of solutions, we can picture a small ball
rolling down the graph of V (x). Suppose we have a solution x(t) to Newton’s
equations (1.3) with E(x(t)) ≡ E0 . As kinetic energy is nonnegative, then a
ball at position V (x(t)) is confined to the region where V (x) ≤ E0 . A smaller
potential energy yields a greater kinetic energy by Proposition 1.9, which implies
a greater velocity. This means that the ball gains velocity as it rolls downhill.
This picture makes some facts very intuitive, like that local minima and maxima
of V(x) are stable and unstable equilibria for the system, respectively. For a
general bounded region V(x) ≤ E0, the ball rolls right through any minima and
up towards the boundary V⁻¹(E0).

1.5. Linear momentum

Suppose the force on the ith particle can be decomposed as

Fi = ∑_{j≠i} Fij(t, xi, xj) + F^e_i(t, xi),  (1.8)

where Fij is the interaction force between the ith and jth particle and F^e_i is
the external force on the ith particle. A system is called closed if there are
no external forces:
F^e_i ≡ 0 for all i.
We will assume that the interaction forces obey the law of action and
reaction (or Newton's third law): the experimental observation that the
forces two particles exert on each other are equal and opposite, i.e.

Fij = −Fji for all i, j.

This is a common property of Newtonian systems, and we will often be working
under this assumption. If we are also given an inertial frame, then this
assumption implies that the interaction forces are collinear:

Fij = fij eij,    where eij = (xj − xi)/|xj − xi|.

Example 1.12. Any system of the form in Example 1.6 obeys the law of action
and reaction since eij = −eji .
Interaction forces for non-Newtonian systems generally do not obey this law.
For example, a particle with electric charge q placed in an electromagnetic field
is acted upon by the Lorentz force
F = q(E + (1/c) v × H),

where E, H are the electric and magnetic fields (which satisfy Maxwell’s equa-
tions) and c is the speed of light. For a system of two electrically charged
particles, the cross product term creates a non-collinear interaction force.
The effect of all external forces together can be observed through the total
(linear) momentum
P = ∑_{i=1}^N pi,
which is conserved for a closed system:
Proposition 1.13 (Conservation of linear momentum). The increase in total
momentum is equal to the total force. Moreover, if the forces can be decomposed
as (1.8) and satisfy the law of action and reaction, then the change in the total
momentum is equal to the total external force ∑_i F^e_i. In particular, for a closed
system the total linear momentum is conserved.
Proof. By Newton’s equations (1.2) we have
N
X
Ṗ = Ḟi ,
i=1

which is the first claim. Inserting the decomposition (1.8), we obtain


N
X N
X N
X N
X N
X
Ṗ = ṗi = Fi = Fij + Fei = Fei , (1.9)
i=1 i=1 i,j=1 i=1 i=1
i6=j

which is the second claim. In the last equality, we note that Fij = −Fji causes
the double sum over interaction forces to cancel pairwise.
A similar argument shows that we have component-wise conservation under
weaker assumptions:
Corollary 1.14. Suppose that the forces can be decomposed as (1.8) and that
the law of action and reaction holds. If the total external force is perpendicular to
an axis, then the component of the total momentum along that axis is conserved.
Proof. Let a be a unit vector so that
a · ∑_{i=1}^N F^e_i = 0.

Then taking the dot product of (1.9) with a, we obtain

(d/dt)(a · P) = 0
and thus a · P is conserved.
As we will see later, the conservation of momentum along an axis is asso-
ciated with the system’s invariance under spatial translations along that axis
(cf. Proposition 4.12). Specifically, we say that a system is symmetric in the
direction of the unit vector a if whenever xi (t) is a solution then so is xi (t) + ca
for all i, for any constant c.
The total momentum can also be observed through the system's center of
mass (or barycenter)

X = (∑_{i=1}^N mi xi) / (∑_{i=1}^N mi).  (1.10)

It turns out that this definition is independent of the choice of origin, and it is
characterized as the point with respect to which the total momentum vanishes
(cf. Exercise 1.4). Moreover, the total momentum is the same as if all of the
mass were concentrated at the center of mass:

P = ∑_{i=1}^N pi = M Ẋ,    where M = ∑_{i=1}^N mi.  (1.11)

Proposition 1.15 (Newton's first law, general case). The center of mass evolves
as if all masses were concentrated at it and all forces were applied to it. In
particular, for a closed system the motion of the center of mass is rectilinear
(i.e. uniform in time with constant velocity).

Proof. Differentiating (1.11) yields


M Ẍ = Ṗ = ∑_{i=1}^N Fi.

The RHS vanishes for a closed system by (1.9).
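Propositions 1.13 and 1.15 can be illustrated for a closed two-body system. The sketch below (all masses, initial data, and the linear-spring interaction F12 = −F21 = x2 − x1 are hypothetical choices, not from the text) integrates the motion with leapfrog steps and checks that P stays constant while X travels in a straight line with velocity P/M:

```python
m = (1.0, 3.0)                          # hypothetical masses
x = [[0.0, 0.0], [2.0, 1.0]]            # positions in d = 2
v = [[0.3, -0.1], [0.4, 0.5]]           # velocities

def forces(x):
    """Equal and opposite spring forces: the law of action and reaction."""
    d = [x[1][0] - x[0][0], x[1][1] - x[0][1]]
    return [d, [-d[0], -d[1]]]

def P(v):
    return [m[0]*v[0][a] + m[1]*v[1][a] for a in range(2)]

def X(x):
    M = m[0] + m[1]
    return [(m[0]*x[0][a] + m[1]*x[1][a]) / M for a in range(2)]

P0, X0 = P(v), X(x)
V0 = [p / (m[0] + m[1]) for p in P0]    # center-of-mass velocity P/M

h = 1e-3
for _ in range(5000):                   # leapfrog: kick-drift-kick
    F = forces(x)
    for i in range(2):
        for a in range(2):
            v[i][a] += h/2 * F[i][a] / m[i]
            x[i][a] += h * v[i][a]
    F = forces(x)
    for i in range(2):
        for a in range(2):
            v[i][a] += h/2 * F[i][a] / m[i]

t = 5000 * h
print(max(abs(P(v)[a] - P0[a]) for a in range(2)))               # ≈ 0
print(max(abs(X(x)[a] - (X0[a] + V0[a]*t)) for a in range(2)))   # ≈ 0
```

Because the interaction forces cancel pairwise at every step, the scheme conserves P (and hence the rectilinear motion of X) up to rounding, mirroring the proofs above.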

1.6. Angular momentum

In this section, we will specialize to the case d = 3 so that we may use the
cross product on R3 . The angular momentum (about the origin) of the ith
particle and the total angular momentum are given by
Li = xi × pi,    L = ∑_{i=1}^N Li

respectively. The torque (or moment of force) of the ith particle and the
total torque are given by

Ni = xi × Fi,    N = ∑_{i=1}^N Ni

respectively.
When the forces can be decomposed as (1.8), we define the external torque

N^e_i = xi × F^e_i.

The relationship between angular momentum and torque is analogous to their


linear counterparts:
Proposition 1.16 (Conservation of angular momentum). The increase in total
angular momentum is equal to the total torque. Moreover, if the forces can be
decomposed as (1.8) and satisfy the law of action and reaction, then the change
in total angular momentum is equal to the total external torque ∑_i N^e_i. In
particular, for a closed system the total angular momentum is conserved.
Proof. By Newton’s equations (1.2) we have
N
X N
X N
 X
L̇ = L̇i = xi × ṗi + ẋi × pi = xi × Fi + 0 = N,
i=1 i=1 i=1

which is the first claim. In the third equality we noted that the term ẋi × pi
vanishes because pi is parallel to ẋi . Assuming the decomposition (1.8), we
obtain
N
X N
X N
X N
X
L̇ = x i × Fi = xi × Fij + xi × Fei = Nei , (1.12)
i=1 i,j=1 i=1 i=1
i6=j
which is the second claim. In the last equality we noted that Fij = −Fji makes
the double sum cancel pairwise:
    x_i \times F_{ij} + x_j \times F_{ji} = (x_i - x_j) \times F_{ij} = 0.
In particular, for a closed system the RHS of (1.12) vanishes and we have
L̇ = 0.
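The conservation law can likewise be tested numerically. The sketch below uses an assumed two-body system whose only force is an internal attractive central force (so F12 is parallel to x1 − x2 and F21 = −F12, as the proposition requires), and checks that the total angular momentum is unchanged:

```python
import numpy as np

# Assumed two-body system: an internal central force F12 parallel to
# x1 - x2 with F21 = -F12 exerts no net torque, so the total angular
# momentum L = sum_i x_i x p_i should be conserved.
m = np.array([1.0, 2.0])
x = np.array([[1.0, 0.0, 0.0], [-0.5, 0.0, 0.0]])
v = np.array([[0.0, 0.5, 0.0], [0.0, -0.25, 0.1]])
dt, steps = 1e-4, 20000

def total_L(x, v):
    return m[0] * np.cross(x[0], v[0]) + m[1] * np.cross(x[1], v[1])

L0 = total_L(x, v)
for _ in range(steps):
    r = x[0] - x[1]
    F12 = -r / np.linalg.norm(r)**3        # attractive inverse-square force
    v = v + dt * np.array([F12 / m[0], -F12 / m[1]])
    x = x + dt * v

print(np.allclose(total_L(x, v), L0, atol=1e-8))
```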
A similar argument shows that we have component-wise conservation under
weaker assumptions:
Corollary 1.17. Suppose that the forces can be decomposed as (1.8) and that
the law of action and reaction holds. If the total external torque is perpendicular
to an axis, then the projection of the total angular momentum onto that axis is
conserved.
Proof. Let a be a unit vector so that

    a \cdot \sum_{i=1}^{N} N_i^e = 0.
Then taking the dot product of (1.12) with a, we obtain

    \frac{d}{dt} (a \cdot L) = 0

and thus a · L is conserved.
As we will see later, the conservation of angular momentum about an axis is
associated with invariance under rotations about that axis (cf. Proposition 4.12).
Specifically, we say that the system is rotationally symmetric about the unit
vector a if whenever xi (t) is a solution then so is xi (t) cos θ + a × xi (t) sin θ +
a(a · xi (t))(1 − cos θ) for all i, for any constant θ.
The center of mass also plays an important role for angular momentum.
Indeed, the total angular momentum evolves as if it were all concentrated at
the center of mass and all the external torques were applied to it:
Proposition 1.18. The total angular momentum (or total torque) about the
origin is equal to the sum of the total angular momentum (total torque) about
the center of mass and the angular momentum (torque) of the center of mass
about the origin.
Proof. Expanding about the center of mass, we have
    L = \sum_{i=1}^{N} [(x_i - X) + X] \times m_i [(\dot{x}_i - \dot{X}) + \dot{X}]
      = \sum_{i=1}^{N} (x_i - X) \times m_i (\dot{x}_i - \dot{X}) + X \times M\dot{X}
          + \Big( \sum_{i=1}^{N} m_i (x_i - X) \Big) \times \dot{X} + X \times \Big( \sum_{i=1}^{N} m_i (\dot{x}_i - \dot{X}) \Big)
      = \sum_{i=1}^{N} (x_i - X) \times m_i (\dot{x}_i - \dot{X}) + X \times P + 0 + 0
which yields the claim for angular momentum. In the last equality we used the
definition of total momentum (1.11) for the first term and Exercise 1.4 for the
vanishing of the square-bracketed terms. The statement for torque follows by
taking a time derivative.
There is one more dynamical quantity of interest: the moment of inertia
of the ith particle (about the origin), which is given by
    I_i = m_i |x_i|^2.
The moment of inertia plays a role for the angular velocity

    \omega_i = \frac{x_i \times \dot{x}_i}{|x_i|^2}

analogous to that of mass for linear velocity, in the sense that

    L_i = x_i \times p_i = x_i \times (m_i \omega_i \times x_i) = m_i [(x_i \cdot x_i)\,\omega_i - (x_i \cdot \omega_i)\,x_i] = I_i \omega_i + 0.
As in the previous proposition, the formula L = Iω can also be translated to
the center of mass.
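The identity L_i = I_i ω_i is easy to confirm numerically; the values below are arbitrary sample data for a single particle:

```python
import numpy as np

# Arbitrary sample data for one particle.
m = 2.0
x = np.array([1.0, 2.0, 2.0])
xdot = np.array([0.3, -0.1, 0.4])

I = m * np.dot(x, x)                      # moment of inertia about the origin
omega = np.cross(x, xdot) / np.dot(x, x)  # angular velocity
L = np.cross(x, m * xdot)                 # angular momentum L = x x p

print(np.allclose(L, I * omega))
```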
Unlike mass, the moment of inertia evolves in time:
Proposition 1.19. The total moment of inertia evolves according to

    \ddot{I} = 4K + 2F \cdot x.    (1.13)

In particular, for a conservative system with homogeneous potential V(x) = c|x|^k
we have

    \ddot{I} = 4E - 2(k + 2)V.    (1.14)
Proof. The first claim (1.13) is a straightforward calculation. For the second
claim (1.14), we note that for a homogeneous potential we have

    F \cdot x = -\nabla V \cdot x = -kV, \qquad 4K = 4E - 4V.
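As a numerical illustration of (1.14), take the homogeneous potential V(x) = |x|²/2 (so c = 1/2 and k = 2) with m = 1, for which (1.14) reads Ï = 4E − 8V. The sketch below checks this with a finite-difference second derivative of I(t) along the exact harmonic-oscillator solution; the initial data are arbitrary:

```python
import numpy as np

# Homogeneous potential V(x) = |x|^2 / 2 (c = 1/2, k = 2) with m = 1, so
# (1.14) reads I'' = 4E - 8V.  Differentiate I(t) = |x(t)|^2 numerically
# along the exact solution of x'' = -x.
x0 = np.array([1.0, 0.5])
p0 = np.array([-0.3, 0.8])

def x(t):
    return x0 * np.cos(t) + p0 * np.sin(t)       # exact solution of x'' = -x

def I(t):
    return np.dot(x(t), x(t))                    # moment of inertia

E = 0.5 * np.dot(p0, p0) + 0.5 * np.dot(x0, x0)  # conserved total energy
t, h = 0.7, 1e-4
I_ddot = (I(t + h) - 2 * I(t) + I(t - h)) / h**2  # finite-difference I''
V = 0.5 * np.dot(x(t), x(t))

print(abs(I_ddot - (4 * E - 8 * V)) < 1e-5)
```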
1.7. Exercises
1.1 (Galilean group generators). Show that every Galilean transformation g on
R × R^d can be uniquely written as the composition g1 ◦ g2 ◦ g3 of the three types
of Galilean transformations in Example 1.2. What is the dimension of the group
of Galilean transformations for d = 3? (Hint: Start by writing g as the general
affine transformation
g(t, x) = (b · x + kt + t0 , Ax + vt + x0 ),
and show that b = 0, k = 1, and A ∈ O(d).)
1.2. Suppose a Newtonian system of N particles in Rd is in an inertial frame
and all of the initial velocities are zero. Show that if the particles are initially
contained in a (d − 1)-dimensional linear subspace of Rd then they remain in
that subspace for all time.
1.3 (Rotating reference frame). Suppose we have a system in an inertial co-
ordinate frame z ∈ R3 (e.g. coordinates relative to the sun), so that Newton’s
equations (1.3) obey the conditions (a)–(c) of section 1.1. Consider another
set of non-inertial coordinates x (e.g. coordinates relative to a point on Earth’s
surface) expressed in terms of the coordinates z via
    t \mapsto t, \qquad z \mapsto x = B(t)z + b(t).
Here, b(t) ∈ R3 is the new origin and B(t) ∈ O(3) is a rotation matrix for all t.
(a) Show that the equations of motion in the new frame are

        m_i \ddot{z}_i = F_i\big( z_k - z_j,\; B^{-1}\dot{B}(z_k - z_j) + (\dot{z}_k - \dot{z}_j) \big) + \Phi_i + \Psi_i,

    where

        \Phi_i = -m_i \big( B^{-1}\ddot{B} z_i + B^{-1}\ddot{b} \big), \qquad \Psi_i = -2 m_i B^{-1}\dot{B}\,\dot{z}_i.
The new forces Φi and Ψi that appear in the equations of motion for z
are called inertial or fictitious forces.
(b) Differentiate the definition of an orthogonal matrix to show that B^{-1}\dot{B} is
    antisymmetric, and write B^{-1}\dot{B} z = \omega \times z where \omega is the angular velocity
    of the moving frame. Now we have

        \Psi_i = -2 m_i\, \omega \times \dot{z}_i.
Ψi is called the Coriolis force, and depends on the velocity. In the
northern hemisphere on Earth, it deflects every moving body to the right
and every falling body to the east.
(c) Let w = B^{-1}\ddot{b} denote the acceleration and \alpha = \dot{\omega} denote the angular
    acceleration of the moving frame. Use the time derivative of B^{-1}\dot{B} to
    show that

        \Phi_i = -m_i \big( w + \omega \times (\omega \times z_i) + \alpha \times z_i \big).
The second term is called the centrifugal force and the third term is the
inertial rotation force or the Euler force, both of which only depend
on position. The former is always directed outward from the instantaneous
axis of rotation and acts even on a body at rest in the coordinate system
z. The latter is only present for nonuniform rotation.
1.4 (Center of mass). (a) Show that the center of mass (1.10) does not de-
pend on the choice of origin.
(b) Show that both the “total position” and the total momentum relative to the
    center of mass vanish:

        \sum_{i=1}^{N} m_i (x_i - X) = 0, \qquad \sum_{i=1}^{N} m_i (\dot{x}_i - \dot{X}) = 0.
1.5 (A way to compute π [Gal03]). In this example we will see a geometric aspect
of phase space appear as a physically measurable quantity. This reinforces that
phase space is an inherent object of a Newtonian system and not merely an
abstract concept.
Consider a frictionless horizontal ray with a vertical wall at the origin. One
small block of mass m is initially at rest on the surface, and a big block of mass
M ≫ m is pushed towards the small block so that the small block is sandwiched
between the large block and the wall. We will count the number of collisions N
the small block makes with the big block or the wall.
(a) Let v1 and v2 denote the velocities of the large and small blocks respectively,
    and consider the rescaling y_1 = \sqrt{M}\, v_1, y_2 = \sqrt{m}\, v_2. Plot the initial
    energy level set in the (y1, y2)-plane to which the motion is confined.
(b) Initially we have v1 < 0, v2 = 0; plot this point in the same (y1 , y2 )-plane.
Assume that each collision is purely elastic, so that when the blocks
collide the total momentum is conserved. Plot the total momentum level
set which contains the initial point; the outcome velocities are determined
by the other intersection of the level sets. After the first collision the small
block will eventually hit the wall, and we will assume that this collision is
also elastic so that v2 < 0 is replaced by −v2 > 0; plot this new point as
well. Plot a few more iterates of this two-collision pattern in the (y1 , y2 )-
plane.
(c) The pattern repeats until v1 > 0 and 0 < v2 < v1 , so that the large block
is moving away and the small block will neither collide with the wall again
nor catch up to the large block. Sketch this final configuration region in
the (y1 , y2 )-plane.
(d) Connect consecutive points occupied by the system in the (y1, y2)-plane.
    As the lines are either vertical or parallel, the angle θ at a point between
    any two consecutive lines is the same. Show that θ = \arctan\sqrt{m/M}.
(e) For any point, the angle θ at that point subtends an arc on the circle
opposite that point. By considering the length of this arc, show that the
total number N of collisions that occurs is the largest integer such that
N θ < π.
(f) Take M = 100^n m for n > 0 an integer. Show that 10^{-n} N → π as n → ∞,
and so the number N of collisions spells out the digits of π for n sufficiently
large.
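The collision count can also be simulated directly. The following sketch assumes perfectly elastic collisions throughout, as in the exercise, and reproduces the digits of π:

```python
# Direct simulation of the collision count (elastic collisions throughout).
# v1 is the big block's velocity, v2 the small block's; the big block starts
# moving towards the wall (negative direction) with the small block at rest.
def count_collisions(mass_ratio):
    M, m = float(mass_ratio), 1.0
    v1, v2 = -1.0, 0.0
    count = 0
    while v2 < 0 or v2 > v1:     # more collisions remain
        if v2 > v1:              # blocks collide elastically
            v1, v2 = (((M - m) * v1 + 2 * m * v2) / (M + m),
                      ((m - M) * v2 + 2 * M * v1) / (M + m))
        else:                    # small block bounces off the wall
            v2 = -v2
        count += 1
    return count

print([count_collisions(100**n) for n in range(3)])  # -> [3, 31, 314]
```

The loop exits exactly when the final configuration of part (c) is reached: the big block moves away with 0 ≤ v2 ≤ v1.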
1.6 (Book stacking [Nah16]). We will stack books of mass 1 and length 1 on a
table in an effort to produce the maximum amount of overhang.
(a) Place the first book with its left edge at x = 0 and its right edge lined
up with the end of the table at x = 1. By considering the center of mass
of the book, determine the distance S(1) we can slide the book over the
edge of the table before it falls.
(b) Starting with a stack of two books, we can reason as in part (a) and slide
the top book forward a distance of S(1) while keeping the bottom book
stationary. By considering the center of mass of the two books, determine
the distance S(2) we can slide this two-book configuration before it falls.
(c) Now start with three books, slide the top one a distance of S(1) and then
the top two books as in part (b) in order to produce an overhang S(2)
from the edge of the bottom book. Determine the distance S(3) we can
slide the three-book configuration before it falls.
(d) Postulate a formula for S(n) and prove it by induction. Note that the total
    overhang S(1) + S(2) + · · · + S(n) tends to infinity as n → ∞.
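For reference, a small sketch assuming the standard answer S(n) = 1/(2n) — which part (d) asks you to derive — shows the total overhang exceeding one full book length after only four books:

```python
from fractions import Fraction

# Assuming S(n) = 1/(2n), the total overhang is the harmonic-type sum
#   S(1) + ... + S(n) = (1/2)(1 + 1/2 + ... + 1/n),
# which grows without bound as n -> infinity.
def S(n):
    return Fraction(1, 2 * n)

total = sum(S(k) for k in range(1, 5))
print(total)  # 1/2 + 1/4 + 1/6 + 1/8 = 25/24 > 1
```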
1.7. Consider a conservative system in R^3 with coordinates so that the center
of mass X = 0 is at the origin and is at rest, i.e. Ẋ = 0.

(a) Show that |L|^2 ≤ 2IK.
(b) Show that the trajectories in configuration space must be contained in the
    intersection of the hyperplane {x ∈ R^{3N} : \sum_{i=1}^{N} m_i x_i = 0} with the
    set {x ∈ R^{3N} : V(x) + |L|^2/(2I(x)) ≤ E}, where L and E are constant
    in time.
CHAPTER 2

ONE DEGREE OF FREEDOM
As Newton’s equations model a mechanical system, they come with some
special structure. In this chapter, we will see some of the implications that this
has for solutions. Although many of the definitions extend to arbitrary dimen-
sions, we will focus on the case of one degree of freedom where the consequences
are particularly clear (because trajectories in phase space have codimension
one). Sections 2.1 to 2.4 are based on [Str15, Ch. 5–7] and [JS07, Ch. 10], and
section 2.5 is based on [LL76, Ch. 3] and [Arn89, Ch. 2].
2.1. Linear systems

Suppose we have a Newtonian system with one degree of freedom, and that
the force is only a function of position:

    m\ddot{x} = F(x).

Any such system is automatically conservative, because we can always find an
antiderivative −V(x) (unique up to an additive constant) so that

    m\ddot{x} = -V'(x).    (2.1)
We want to understand the qualitative behavior of solutions near a given
point x0 . We may assume that x0 = 0 for convenience, after replacing the
variable x by x − x0 if necessary.
First we begin with the generic case V'(0) ≠ 0. In phase space R_x × R_p,
this means that the vector field (ẋ, ṗ) at the origin is nonzero. From the general
theory of ODEs (cf. Proposition A.18), this implies that there is a smooth change
of variables in a neighborhood of the origin so that the vector field is constant.
More specifically, we have ṗ ≈ −V'(0) near the origin, and so the solution x(t)
is accelerating to the left if V'(0) > 0 and to the right if V'(0) < 0.

With this easy case out of the way, we will now assume V'(0) = 0 for the
remainder of this section. The point (x0, p0) = (0, 0) is then a fixed point (or
equilibrium) of the flow, meaning that the constant function x(t) ≡ 0 (and
p(t) ≡ 0) solves the equation. Our first step is to linearize: Taylor expanding
about x = 0 and keeping only the linear term, we obtain

    m\ddot{x} = -V'(x) \approx -V'(0) - V''(0)x = -V''(0)x.    (2.2)

The behavior of solutions to this approximate equation depends on the sign of
V''(0). We begin with the case V''(0) < 0:
Example 2.1 (Saddle node). Consider the linear system
    m\ddot{x} = kx
for k > 0 a constant. This is a conservative system with potential and total
energy

    V(x) = -\tfrac{1}{2} k x^2, \qquad E(x, p) = \tfrac{1}{2m} p^2 - \tfrac{1}{2} k x^2.

The trajectories in phase space are confined to the level sets of E, which look
like axes-symmetric hyperbolas. The origin is a saddle node for this linear
system, and we have the explicit solutions

    x(t) = x_0 \cosh(\gamma t) + \tfrac{p_0}{m\gamma} \sinh(\gamma t), \qquad p(t) = m\gamma x_0 \sinh(\gamma t) + p_0 \cosh(\gamma t),

where \gamma = \sqrt{k/m}.
Figure 2.1: Phase portrait for the system of Example 2.1.
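The explicit solutions can be verified numerically; the sketch below (with arbitrary sample values for m, k, x0, p0) checks that x(t) satisfies mẍ = kx and that the energy E is constant along the trajectory:

```python
import numpy as np

# Arbitrary sample values for the saddle-node system of Example 2.1.
m, k = 2.0, 3.0
gamma = np.sqrt(k / m)
x0, p0 = 0.5, -1.0

def x(t):
    return x0 * np.cosh(gamma * t) + p0 / (m * gamma) * np.sinh(gamma * t)

def p(t):
    return m * gamma * x0 * np.sinh(gamma * t) + p0 * np.cosh(gamma * t)

def E(t):
    return p(t)**2 / (2 * m) - 0.5 * k * x(t)**2   # total energy

t, h = 0.4, 1e-5
x_ddot = (x(t + h) - 2 * x(t) + x(t - h)) / h**2   # finite-difference x''
print(abs(m * x_ddot - k * x(t)) < 1e-4)           # m x'' = k x
print(abs(E(t) - E(0)) < 1e-9)                     # energy conserved
```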
We would like to know if the solutions to the approximate equation (2.2)
provide an accurate prediction for the solutions to the actual equation (2.1).
From ODE theory, the Hartman–Grobman theorem (Theorem A.21) tells us
that there is a continuous change of variables from the nonlinear system to
the linearized system, provided that all the eigenvalues of the linearized system
have nonzero real parts. The theorem applies in this case, because we have the
matrix

    \begin{pmatrix} \dot{x} \\ \dot{p} \end{pmatrix} = \begin{pmatrix} 0 & \tfrac{1}{m} \\ k & 0 \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix}

with eigenvalues \pm\sqrt{k/m}.
Next we consider the case V''(0) > 0:
Example 2.2 (Harmonic oscillator). Consider the linear system

    m\ddot{x} = -kx
for k > 0 a constant. This is a conservative system with potential and total
energy

    V(x) = \tfrac{1}{2} k x^2, \qquad E(x, p) = \tfrac{1}{2m} p^2 + \tfrac{1}{2} k x^2.

The trajectories in phase space are confined to the level sets of E, which look
like axes-parallel ellipses centered at the origin. The origin is called a center
for this linear system, and we have the explicit solutions

    x(t) = x_0 \cos(\omega t) + \tfrac{p_0}{m\omega} \sin(\omega t), \qquad p(t) = -m\omega x_0 \sin(\omega t) + p_0 \cos(\omega t),

where \omega = \sqrt{k/m}.
Figure 2.2: Phase portrait for the system of Example 2.2.
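A quick numerical check (with arbitrary sample parameters) confirms that these solutions stay on the energy ellipse and close up after one period 2π/ω:

```python
import numpy as np

# Arbitrary sample parameters: the solutions of Example 2.2 trace the
# ellipse p^2/(2m) + k x^2/2 = E and return to the start after 2 pi/omega.
m, k = 2.0, 3.0
omega = np.sqrt(k / m)
x0, p0 = 0.5, -1.0
E0 = p0**2 / (2 * m) + 0.5 * k * x0**2

t = np.linspace(0.0, 2 * np.pi / omega, 200)
x = x0 * np.cos(omega * t) + p0 / (m * omega) * np.sin(omega * t)
p = -m * omega * x0 * np.sin(omega * t) + p0 * np.cos(omega * t)

on_ellipse = np.allclose(p**2 / (2 * m) + 0.5 * k * x**2, E0)
periodic = np.isclose(x[-1], x0) and np.isclose(p[-1], p0)
print(on_ellipse, periodic)
```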
Unfortunately, we are unable to immediately conclude that this is an accurate
description of the actual equation (2.1) because the Hartman–Grobman
theorem no longer applies. Indeed, our matrix is

    \begin{pmatrix} \dot{x} \\ \dot{p} \end{pmatrix} = \begin{pmatrix} 0 & \tfrac{1}{m} \\ -k & 0 \end{pmatrix} \begin{pmatrix} x \\ p \end{pmatrix}

with purely imaginary eigenvalues \pm i \sqrt{k/m}.
We might hope that the conclusion of the Hartman–Grobman theorem still
persists. This is not obvious however, because the premise of the theorem is
necessary as the following example illustrates:
Example 2.3. Consider the system
    \begin{aligned} \dot{x} &= -y + a x (x^2 + y^2) \\ \dot{y} &= x + a y (x^2 + y^2) \end{aligned}
    \qquad \Longleftrightarrow \qquad
    \begin{aligned} \dot{r} &= a r^3 \\ \dot{\theta} &= 1 \end{aligned}
on R^2, where a ∈ R is a constant. When a = 0 we obtain the linearized system
at (x∗, y∗) = (0, 0), which has eigenvalues ±i and predicts that the origin is a
center. However, when a < 0 (a > 0) we see that r(t) is decreasing (increasing)
monotonically and so the origin becomes a stable (unstable) spiral.
(a) a < 0 (b) a = 0 (c) a > 0

Figure 2.4: Phase portraits for the system of Example 2.3.
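Since ṙ = ar³ separates, it integrates to r(t)^{-2} = r(0)^{-2} − 2at, which makes the spiraling easy to verify. The sketch below (with the arbitrary choice a = −1) integrates the (x, y) system directly and compares against this closed form — the origin attracts, even though the linearization predicts a center:

```python
import numpy as np

# The radial equation r' = a r^3 integrates to r(t)^(-2) = r(0)^(-2) - 2 a t.
# Integrate the (x, y) system with explicit Euler (arbitrary choice a = -1).
a, dt, T = -1.0, 1e-4, 2.0
x, y = 1.0, 0.0
for _ in range(int(T / dt)):
    dx = -y + a * x * (x**2 + y**2)
    dy = x + a * y * (x**2 + y**2)
    x, y = x + dt * dx, y + dt * dy

r_exact = (1.0 - 2 * a * T) ** -0.5   # closed form with r(0) = 1
print(abs(np.hypot(x, y) - r_exact) < 1e-3)
```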
We will see in sections 2.2 and 2.4 that the prediction of Example 2.2 is
indeed accurate for conservative systems, because the mechanical system (2.1)
has special properties in comparison to general ODEs.
Finally, consider the case V''(0) = 0. Then the equation mẍ = 0 can be
directly integrated to obtain a linear function for x(t), which describes rectilinear
motion (uniform motion with constant velocity). This is of course not a robust
prediction since ṗ is nonzero whenever V' ≠ 0, and so we cannot draw any
conclusions.
2.2. Conservative systems
In section 1.4 we saw that for a conservative Newtonian system the total
mechanical energy is constant along trajectories. We will now give this math-
ematical phenomenon a name and examine its consequences. In addition to
conservative mechanical systems, this also applies to some systems of ODEs
which do not arise from mechanics (cf. Exercise 2.2).
Suppose we have the first-order ODE system
    \dot{x} = f(x)    (2.3)
defined for x in an open set U ⊂ R^n. We can always reduce a degree-d system
of ODEs to such a first-order system (at the cost of increasing the dimension)
by treating the time derivatives ẋ, ẍ, . . . , x^{(d−1)} as new independent variables.
Definition 2.4. A conserved quantity for the ODE system (2.3) is a smooth
function E : U → R that is nonconstant on open subsets of U and satisfies
(d/dt) E(x(t)) = 0 for all solutions x(t) to (2.3).
Geometrically, this requires that solutions x(t) lie in level sets of E(x), and
so the quantity E restricts the directions in which trajectories may travel. We
require that E is nonconstant on open sets so that E rules out some directions.
For example, the constant function E(x) ≡ 10 is trivially conserved, but it does
not reveal any information about the behavior of solutions.
A point x∗ where f (x∗ ) = 0 is called a fixed point (or equilibrium)
of (2.3). This implies that the constant function x(t) ≡ x∗ is a solution of (2.3).
A fixed point x∗ is attracting if there exists an open ball B_ε(x∗) centered
at x∗ so that for any initial condition x(0) ∈ B_ε(x∗) the corresponding solutions
x(t) converge to x∗ as t → ∞. Likewise, a fixed point x∗ is repulsive if the
same statement holds with −t in place of t.
Proposition 2.5. If the ODE system (2.3) has a conserved quantity, then there
are no attracting (or repulsive) fixed points.
Proof. Suppose x∗ were an attracting fixed point, and let ε > 0 such that
x(t) → x∗ as t → ∞ for all initial conditions x(0) ∈ B_ε(x∗). Using that E is
continuous and is constant on the trajectory x(t), we have

    E(x(0)) = \lim_{t \to \infty} E(x(t)) = E(x_*).

As x(0) ∈ B_ε(x∗) was arbitrary we conclude that E is constant on the open ball
B_ε(x∗), which contradicts our definition of a conserved quantity.
Substituting t ↦ −t yields the claim for repulsive fixed points.
In order to understand the qualitative behavior of solutions near a fixed
point x∗, we can try to linearize about x∗ as in section 2.1:

    \dot{x} = f(x) \approx f(x_*) + Df(x_*)(x - x_*) = Df(x_*)(x - x_*),    (2.4)
where Df(x∗) is the derivative matrix of f at x∗. Recall from Example 2.3
that when the linearized system predicts that x∗ is a center, we cannot conclude
that solutions to the nonlinear system (2.3) form closed orbits (i.e. periodic
trajectories). This is because centers are delicate in the sense that trajectories
need to perfectly match up after one revolution, and the neglected terms in the
Taylor expansion (2.4) can prevent this by pushing them slightly inwards or
outwards.
In two dimensions, the presence of a conserved quantity is enough to recover
the prediction:
Theorem 2.6 (Nonlinear centers for conservative systems). Suppose the two-
dimensional ODE system (2.3) has an isolated fixed point x∗ and that E(x) is
a conserved quantity. If x∗ is an isolated critical point of E and the Hessian
matrix E''(x∗) is positive (or negative) definite, then all trajectories sufficiently
close to x∗ are closed.
Proof. We will consider the case where x∗ is a minimum of E. Replacing E by
−E then yields the claim for a maximum of E.
Fix ε > 0 sufficiently small so that x∗ is the only fixed point of f in B_ε(x∗),
x∗ is the only critical point of E in B_ε(x∗), and E''(x) is positive definite on
B_ε(x∗).
Fix x0 ∈ B_ε(x∗) \ {x∗}. Let c = E(x0), and consider the component γ of
the level set E^{−1}(c) in B_ε(x∗) containing x0. We claim that γ is a simple closed
contour containing x∗. For each θ ∈ [0, 2π), consider the value r(θ) > 0 such
that

    E(r(θ) cos θ, r(θ) sin θ) = c.

As x∗ is a strict local minimum of E, then we may take ε smaller if necessary
to ensure that r(θ) exists for all θ. The choice of r(θ) is then unique since
E is strictly convex on B_ε(x∗). The function θ ↦ (r(θ) cos θ, r(θ) sin θ) now
provides a continuous parameterization of γ, and thus γ is a simple closed
contour containing x∗. In fact, we know that γ is also smooth by the implicit
function theorem since ∇E is nonzero on γ.
Next, we claim that the trajectory x(t) starting at x0 must be periodic. We
know that x(t) exists for all t ∈ R, because it is confined to the bounded set γ
and so the blowup condition of Corollary A.6 can never be satisfied. Suppose
for a contradiction that x(t) never repeats any value. Consider the sequence
x(1), x(2), . . . . It is contained in the closed and bounded set γ, and thus must
admit a convergent subsequence. Along this sequence the derivative ẋ = f is
converging to zero. As f is continuous then the value of f at the limit point must
be zero, which contradicts that γ does not contain a fixed point. Moreover, x(t)
must hit every point on γ since there are no fixed points on γ and γ is connected.
Therefore the trajectory starting at x0 is periodic. As x0 ∈ B_ε(x∗) \ {x∗}
was arbitrary, we conclude that all trajectories in B_ε(x∗) \ {x∗} are closed
orbits.
In Theorem 2.6 we must assume that x∗ is an isolated fixed point, otherwise
there could be fixed points on the energy contour (cf. Exercise 2.3).
2.3. Nonconservative systems
In practice, we know that mechanical systems are never exactly conservative:
dissipative forces (e.g. kinetic friction) dampen the motion and prevent
trajectories from being perfectly periodic. However, we still have (d/dt) E ≤ 0 in
this case, which has new mathematical consequences for solutions.
Definition 2.7. Consider the first-order ODE system (2.3) with a fixed point
x∗ . Suppose there exists a smooth function E(x) on a connected neighborhood
U of the fixed point x∗ such that x∗ is a strict global minimum with value zero,
and for all c > 0 the sub-level sets {x : E(x) ≤ c} within U are compact and
star-shaped about x∗ with smooth boundary. If also:
(a) (d/dt) E(x) ≤ 0 for all x in U \ {x∗}, then E is a (weak) Lyapunov function.

(b) (d/dt) E(x) < 0 for all x in U \ {x∗}, then E is a strong Lyapunov function.
The star-shaped assumption clearly guarantees that the level sets of E near
x∗ have an interior and exterior. While there do exist topological results that
we could rely upon to weaken this hypothesis (cf. the Jordan curve theorem and
its extensions), Definition 2.7 is suitable for our purposes and is easily verified
in practice. In particular, the star-shaped criterion is satisfied if E'' is positive
definite, as we will now show. However, note that our definition can also allow
for higher-order minima on a case-by-case basis.
Lemma 2.8. If E : U → R is smooth on an open set U ⊂ R^n, x∗ ∈ U is an
isolated critical point of E with E(x∗) = 0, and the Hessian matrix E''(x∗) is
positive definite, then there exists ε > 0 sufficiently small so that the sub-level
sets {x ∈ B_ε(x∗) : E(x) ≤ c} for c ∈ E(B_ε(x∗)) are compact and star-shaped
about x∗ with smooth boundary.

Proof. Fix ε > 0 sufficiently small so that x∗ is the only critical point of E in
B_ε(x∗) and E''(x) is positive definite on B_ε(x∗). For each unit vector ν ∈ R^n,
|ν| = 1, consider the value r(ν) > 0 such that

    E(r(ν)ν) = c.

As x∗ is a strict local minimum of E, then we may take ε smaller if necessary
to ensure that r(ν) exists for every |ν| = 1. The choice of r(ν) is then unique
since E is strictly convex. Finally, r(ν) is smooth since ∇E is nonvanishing on
the level set E(x) = c.
A physical example of a Lyapunov function is the total energy of a dissipative
system:
Example 2.9 (Damped harmonic oscillator). Consider the system
    m\ddot{x} = -b\dot{x} - kx
with one degree of freedom, where b > 0 is a damping constant. The total energy
is still

    E = \tfrac{1}{2}\big( m\dot{x}^2 + k x^2 \big),

but now

    \frac{d}{dt} E = m\dot{x}\ddot{x} + k x \dot{x} = -b\dot{x}^2 \le 0.
The total energy E is a weak (but not strong) Lyapunov function. The origin
is globally attracting with three qualitatively different phase portraits:

(a) 0 < b < 2\sqrt{km}: Underdamped. The origin is a stable spiral node and
the system oscillates infinitely many times with exponentially decaying
amplitude.

(b) b = 2\sqrt{km}: Critically damped. The origin is a stable degenerate node.
The oscillation and friction balance each other so that trajectories barely
fail to make one complete oscillation. In fact, trajectories approach the
origin faster than in the other two cases.
(c) b > 2\sqrt{km}: Overdamped. The origin is a stable node and the system
returns to the origin without oscillating.
(a) Underdamped (b) Critically damped (c) Overdamped

Figure 2.6: Phase portraits for the system of Example 2.9.
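The dissipation can be observed numerically. The sketch below (underdamped sample parameters, simple explicit stepping) checks that the total energy is nonincreasing along the trajectory and has decayed substantially by the end of the run:

```python
import numpy as np

# Underdamped sample parameters (b < 2 sqrt(km)).  The total energy
# E = (m v^2 + k x^2)/2 should be nonincreasing and decay.
m, k, b = 1.0, 1.0, 0.5
x, v = 1.0, 0.0
dt, steps = 1e-3, 20000
energies = []
for _ in range(steps):
    energies.append(0.5 * (m * v**2 + k * x**2))
    v += dt * (-b * v - k * x) / m
    x += dt * v

E = np.array(energies)
print(bool(np.all(np.diff(E) <= 1e-12)), bool(E[-1] < 0.01 * E[0]))
```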
For our image of the ball rolling down the graph of the potential energy,
the surface of the graph is now slightly sticky. The ball may still roll through a
minimum, but does not have enough energy to approach the boundary V −1 (E0 )
again and so the permitted region for the ball continually shrinks. If V is
shaped like a bowl about x∗ as in the definition of a Lyapunov function, then
we intuitively expect that the ball tends to the bottom of the bowl and hence
x∗ is stable.
Theorem 2.10. Consider the smooth n-dimensional ODE system (2.3) with a
fixed point x∗ .
(a) If there exists a weak Lyapunov function on a neighborhood of the fixed
point x∗, then x∗ is Lyapunov stable: for any ε > 0 there exists δ > 0
such that |x(0) − x∗| < δ implies |x(t) − x∗| < ε for all t ≥ 0.
(b) If n = 2 and there exists a strong Lyapunov function near the fixed point
x∗ , then x∗ is also asymptotically stable: there exists η > 0 so that |x(0) −
x∗ | < η implies x(t) → x∗ as t → ∞.
In particular, if there exists a strong Lyapunov function then there can be
no periodic solutions, unlike conservative systems.
In part (a) of Theorem 2.10, we cannot expect the fixed point to be asymp-
totically stable in general: for the harmonic oscillator (Example 2.2) the total
energy is conserved and thus serves as a weak Lyapunov function. On the other
hand, all cases of the damped harmonic oscillator (Example 2.9) only possess
a weak Lyapunov function and yet they still enjoy the additional conclusion
of part (b). This happens because no trajectory can remain in the set where
(d/dt) E(x) = 0. In fact, for such systems the conclusion of case (b) can be recovered
under additional assumptions; see [JS07, §10.5] for details.
Proof. (a) Fix ε > 0. After shrinking ε if necessary, we may assume B_ε(x∗) ⊂ U.
As x∗ is a strict local minimum of E, there exists c > 0 sufficiently small so
that the sub-level set {E(x) ≤ c} is contained in B_ε(x∗). Pick δ > 0 so that the
ball B_δ(x∗) is contained within {E(x) < c}.
Fix x(0) ∈ Bδ (x∗ ). We claim that the trajectory {x(t) : t ≥ 0} can never
enter the exterior of E(x) = c. Suppose for a contradiction that there exists
t > 0 such that x(t) is in the exterior of E(x) = c. Then E(x(t)) > E(x(0)).
As E(x(t)) is smooth, the mean value theorem guarantees that there is a time
t0 ∈ [0, t] such that
    \frac{d}{dt}\Big|_{t=t_0} E(x(t)) > 0.
This contradicts that E is a weak Lyapunov function.

In particular, as {E(x) ≤ c} is contained in B_ε(x∗), then the trajectory
remains in B_ε(x∗).
(b) Fix x(0), and let c = E(x(0)) > 0 so that x(0) lies in the level set
E(x) = c.
We claim that there are no equilibrium points other than x∗ and no closed
orbits. In the former case, we would have (d/dt) E(x(t)) = 0 for all t ≥ 0, which
contradicts that E is a strong Lyapunov function. In the latter case, there would
exist t > 0 such that x(t) = x(0), and hence E(x(t)) = E(x(0)). As E(x(t)) is
smooth, the mean value theorem guarantees that there is a time t0 ∈ [0, t] such
that
    \frac{d}{dt}\Big|_{t=t_0} E(x(t)) = 0,
which contradicts that E is a strong Lyapunov function.

By the Poincaré–Bendixson theorem (cf. [CL55, Ch. 16 Th. 2.1]), the
only remaining possible behavior for the trajectory {x(t) : t ≥ 0} is to approach
x∗ .
Next, we will examine the behavior of solutions in the limit where the friction
term dominates. In this limit, the ẍ term is negligible and we are often left with
an equation of the form ẋ = −∇V (x). For the damped harmonic oscillator
in Example 2.9 this is the limit mk/b^2 → 0, which is rigorously justified in
Exercise 2.4.
Definition 2.11. The ODE system (2.3) is a gradient system if there exists
a smooth function V such that
    \dot{x} = -\nabla V(x).    (2.5)
Although conservative systems ẍ = −∇V(x) and gradient systems ẋ = −∇V(x)
may look similar, they display nearly opposite behavior.
From multivariate calculus we know that the vector field −∇V points in
the direction of steepest descent for V and is orthogonal to the level sets of V .
For our image of the ball rolling down the graph of the potential energy, the
ball slows and never reaches the first minimum it encounters, as if the potential
energy graph were the bottom of a tank filled with water. Closed orbits are of
course impossible again (cf. Exercise 2.5).
Example 2.12. In the overdamped limit for the harmonic oscillator we have

    \dot{x} = -\tfrac{k}{b}\, x, \qquad V(x) = \tfrac{k}{2b}\, x^2.

This has the solution x(t) = x_0 e^{-kt/b}, which is the limiting (i.e. slow timescale)
behavior for the overdamped oscillator after the transient (i.e. fast timescale)
behavior becomes negligible. In this limit, the trajectories in the phase portrait
are confined to the line p = m\dot{x} = -\tfrac{mk}{b}\, x, which agrees with the fact that we
can no longer take a second arbitrary initial condition p(0) for the new one-
dimensional system.
Figure 2.7: Phase portrait for the system of Example 2.12.
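The overdamped limit is easy to test numerically. The sketch below (with the arbitrary choice mk/b² = 0.01) integrates the full second-order system and compares against the limiting slow solution x0 e^{−kt/b} at time T:

```python
import numpy as np

# Arbitrary choice m k / b^2 = 0.01: the full trajectory of
# m x'' = -b x' - k x should track the slow solution x0 exp(-k t / b).
m, b, k = 0.01, 1.0, 1.0
x, v = 1.0, 0.0
dt, T = 1e-5, 1.0
for _ in range(int(T / dt)):
    v += dt * (-b * v - k * x) / m
    x += dt * v

print(abs(x - np.exp(-k * T / b)) < 0.02)
```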
2.4. Time reversibility
In addition to a conserved quantity, conservative Newtonian systems also
possess a symmetry in time. We will now give this property a name and study
its mathematical consequences.
Definition 2.13. The ODE system (2.3) is (time-)reversible if there exists
a smooth involution R : U → U (i.e. R(R(x)) = x for all x ∈ U ) such that the
change of variables t ↦ −t, x ↦ R(x) leaves the system (2.3) invariant (i.e. if
x(t) is a solution, then so is R(x(−t))).
Example 2.14. Consider a Newtonian system of the form
    m\ddot{x} = F(x)
that is independent of time and velocity (or if F is even in velocity and time).
Note that the force F does not have to be conservative. This system is
invariant under the change of variables t ↦ −t since ẍ picks up two factors of −1.
Consequently, if x(t) is a solution then so is x(−t).
As a first-order system of ODEs in phase space, the involution is R(x, p) =
(x, −p). This tells us that trajectories in the phase portrait are symmetric
across the position axes {(xi , pi ) : pi = 0} with the time arrows on trajectories
reversed.
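Reversibility can be checked numerically: flowing forward for time T, reflecting the momentum, and flowing forward again should return the reflected initial state. The sketch below does this for the assumed example F(x) = −sin x with m = 1:

```python
import numpy as np

# Assumed example: m x'' = F(x) with F(x) = -sin(x) and m = 1.  Flow
# forward for time T, apply R(x, p) = (x, -p), and flow forward again:
# by reversibility we should return to (x(0), -p(0)).
def flow(x, p, dt, steps):
    for _ in range(steps):
        p += dt * (-np.sin(x))   # F(x) = -sin(x)
        x += dt * p
    return x, p

x0, p0 = 0.5, 0.3
dt, steps = 1e-4, 10000          # T = 1
xT, pT = flow(x0, p0, dt, steps)
xb, pb = flow(xT, -pT, dt, steps)
print(abs(xb - x0) < 1e-3 and abs(pb + p0) < 1e-3)
```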
Recall from Example 2.3 that when the linearized system (2.4) predicts that
x∗ is a center, we cannot conclude anything about the behavior of nearby solu-
tions to the nonlinear system (2.3). In two dimensions however, reversibility is
enough to recover this prediction:
Theorem 2.15 (Nonlinear centers for reversible systems). Suppose that the 2-
dimensional ODE system (2.3) is reversible with R a reflection across a line, and
that x∗ is an isolated fixed point lying on the line of reflection. If the linearized
system about x∗ predicts a center, then all trajectories sufficiently close to x∗
are closed.
Proof. As f is smooth, Taylor’s theorem guarantees that
    \dot{x} = f(x) = Df(x_*) \cdot (x - x_*) + O\big( |x - x_*|^2 \big).
By premise x∗ is a linear center (cf. Example 2.2), and so solutions y(t) to the
linearized equation
ẏ = Df (x∗ ) · (y − x∗ )
are concentric ellipses centered at x∗ .
Fix x0 ∈ B_ε(x∗) on the line of reflection. We claim there exists ε sufficiently
small so that x(t) is never on the opposite side of x∗ as y(t). Assuming this, the
trajectory x(t) intersects the line of reflection on the other side of x∗ at some
time t > 0 because the trajectory y(t) encloses the origin. By reversibility, we
can reflect this trajectory to obtain a twin trajectory with the same endpoints
but with its arrow reversed. Taking ε > 0 smaller if necessary, we know
that x∗ is the only fixed point in B_ε(x∗), and so the two trajectories together
form a closed orbit.
It only remains to justify that such an ε > 0 exists. Let T > 0 denote the
period of solutions y(t) to the linearized system. Given an initial condition x0,
define the difference

    h_t(x_0) = x(t) - y(t)

at time t ∈ [0, T] between the nonlinear and linear solutions starting at x0. The
differential equations for x and y match to first order, and so we have

    h_t(0) = 0, \qquad Dh_t(0) = 0,

where D denotes the gradient in the spatial coordinates. As f is smooth, there
exists a constant c_ε so that

    |Dh_t(x_0)| \le c_\varepsilon \quad \text{for all } x_0 \in B_\varepsilon(x_*),\; t \in [0, T].

Moreover, c_ε → 0 as ε ↓ 0. Using the mean value theorem we estimate

    |h_t(x_0)| = |h_t(x_0) - h_t(0)| \le c_\varepsilon |x_0| \quad \text{for all } x_0 \in B_\varepsilon(x_*),\; t \in [0, T].
CHAPTER 2. ONE DEGREE OF FREEDOM 29
The linear solutions y(t) are ellipses centered at x∗, and so there exists a constant a > 0 (depending on the semi-major and semi-minor axes) so that

|y(t) − x∗| ≥ a|y(0) − x∗| = a|x0 − x∗|   for all t ∈ [0, T].

Combining the previous two inequalities, we conclude that there exists ε > 0 sufficiently small so that

|x(t) − y(t)| = |h_t(x0)| ≤ ½|y(t) − x∗|   for all x0 ∈ B_ε(x∗), t ∈ [0, T].

In other words, for x(0) = y(0) ∈ B_ε(x∗) we have |x(t) − y(t)| < |y(t) − x∗| for all t ∈ [0, T], and so x(t) is never on the opposite side of x∗ from y(t).
This argument can also be applied to specific examples to show the existence of individual closed, homoclinic, and heteroclinic orbits. The key input is establishing that the trajectory eventually reaches the hyperplane of symmetry, and then we can extend the trajectory using time-reversal symmetry.
Note that a general involution R can behave very differently in comparison to reflections, and so Theorem 2.15 does not clearly generalize to generic involutions. In particular, there may not be a hypersurface of fixed points; Exercise 2.8 provides a linear two-dimensional example where the symmetry only fixes one point.

2.5. Periodic motion

Consider a Newtonian system with one degree of freedom of the form

mẍ = F (x). (2.6)

Any such system is conservative, because we can always find an antiderivative −V(x) (unique up to an additive constant) so that F = −V′(x). From Proposition 1.9 we know that the total energy

E = ½mẋ² + V(x) (2.7)

is conserved by the motion. Note that (2.7) provides a first-order equation for
x(t) in place of the second-order equation (2.6). In this section, we will use this
observation to solve for x(t) and record some consequences.
Suppose that the potential V (x) is shaped like a well, in the sense that
V (x) → +∞ as x → ±∞. The total energy (2.7) is conserved by Proposition 1.9,
and hence E(t) ≡ E is constant. The kinetic energy ½mẋ² is nonnegative, and
so the solution x(t) is confined to the region {x : V (x) ≤ E} in configuration
space Rx . This set is bounded since V (x) → +∞ as x → ±∞, and so the
motion is bounded.
By conservation of energy, the velocity ẋ(t) vanishes for values of x with
V (x) = E; these values are called the turning points of the motion. They are
the two endpoints of the interval {x : V(x) < E} containing x0, and they are
the extremal points of the motion.

Most trajectories are periodic and oscillate between the two turning points.
Indeed, if we have V′(x) ≠ 0 at the two turning points, then the trajectory x(t) reaches the turning point in finite time and doubles back. This is not true if V′(x) vanishes at one of the turning points. One possibility is that the trajectory starts at an equilibrium point (x0, p0) in phase space, in which case we have p0 = 0 and V′(x0) = 0 so that ẋ = 0 and ṗ = 0 respectively.
it could be that the turning point is an equilibrium, in which case the motion
x(t) approaches the turning point as t → ∞ but never reaches it. Indeed, if x(t)
reaches the turning point then we have ẋ(t) = 0 by conservation of energy, but
this violates the uniqueness of the equilibrium solution.
Suppose x(t) is periodic with turning points x1 (E) < x2 (E). Solving the
energy equation (2.7) for ẋ we obtain
ẋ = √((2/m)[E − V(x)]). (2.8)

This is a separable differential equation with solution

t = √(m/2) ∫ dx/√(E − V(x)) + t0.

The period τ is given by the integral from x1 to x2 and back to x1, which is twice the integral from x1 to x2:

τ(E) = √(2m) ∫_{x1(E)}^{x2(E)} dx/√(E − V(x)). (2.9)

Altogether, we have proved the following:


Proposition 2.16. Suppose the one-dimensional conservative Newtonian system (2.6) has a potential energy that obeys V(x) → +∞ as x → ±∞. Then the motion x(t) of the system is bounded. Moreover, when the motion is periodic the period is given by the integral (2.9), where E is the initial energy and x1(E) < x2(E) are the turning points of the motion.
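Proposition 2.16 is easy to check numerically. The sketch below (Python; my own illustration with arbitrary units m = k = 1, not part of the text) evaluates the period integral (2.9) for the harmonic oscillator V(x) = ½kx², using the substitution x = x1 + (x2 − x1)sin²θ to remove the inverse-square-root singularities at the turning points; the result is the amplitude-independent period 2π√(m/k).

```python
import math

def period(V, E, x1, x2, m=1.0, n=4000):
    """Evaluate the period integral (2.9): tau = sqrt(2m) * int dx/sqrt(E - V(x)).

    The substitution x = x1 + (x2 - x1)*sin(theta)^2 removes the
    inverse-square-root singularities at the turning points, after which a
    simple midpoint rule converges quickly."""
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        x = x1 + (x2 - x1) * math.sin(theta) ** 2
        dx_dtheta = (x2 - x1) * math.sin(2 * theta)
        total += dx_dtheta / math.sqrt(E - V(x))
    return math.sqrt(2 * m) * total * h

# Harmonic oscillator V(x) = k x^2 / 2 with m = k = 1: the period should be
# 2*pi*sqrt(m/k) = 2*pi for every energy E, with turning points +-sqrt(2E/k).
k, E = 1.0, 0.5
A = math.sqrt(2 * E / k)
tau = period(lambda x: 0.5 * k * x * x, E, -A, A)
print(tau)   # approximately 6.28319
```

The same routine applies to any potential well once the turning points are known.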
Example 2.17 (Double well). Consider the system (2.6) with the double-well
potential
V(x) = (x + 1)²(x − 1)².
There are three equilibrium points in phase space: the two bottoms of the well,
(x0 , p0 ) = (±1, 0) with energy E = 0, and the unstable equilibrium (x0 , p0 ) =
(0, 0) between the wells with energy E = 1. We know what the phase portrait
looks like near these equilibria by the linear stability analysis of section 2.1,
and we can use the level sets of the energy (2.7) to fill in the rest of the phase
portrait.
For E in (0, 1) or (1, ∞) the motion is periodic. The turning points x1, x2 satisfy V′(x_i) ≠ 0, and so nearby we have V(x) ≈ E + V′(x_i)(x − x_i). Consequently the integrand of (2.9) has a singularity (x − x_i)^{−1/2} at both endpoints, and so the integral converges.

The level set E = 1 in phase space is comprised of three trajectories: the equilibrium (0, 0) and two homoclinic orbits, each traversed in an infinite amount of time. We have V′(0) = 0, and so near the turning point x = 0 we have V(x) ≈ E + ½V″(0)x². Consequently the integrand of (2.9) has a singularity x⁻¹ at the endpoint x = 0, and so the integral diverges to +∞.

Figure 2.9: The system of Example 2.17. (a) Potential energy, with the level E = 1 marked. (b) Phase portrait.
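The divergence of the period at the separatrix can also be seen numerically. The sketch below (my own illustration, with m = 1; not part of the text) evaluates the period integral (2.9) on the right-hand well of the double-well potential, where the turning points solve (x² − 1)² = E: the period grows without bound as E ↑ 1.

```python
import math

def period_right_well(E, n=200000):
    """Period integral (2.9) with m = 1 for the double well V(x) = (x^2 - 1)^2,
    on the orbit in the right-hand well with 0 < E < 1.  Its turning points
    are x1 = sqrt(1 - sqrt(E)) and x2 = sqrt(1 + sqrt(E))."""
    V = lambda x: (x * x - 1.0) ** 2
    x1 = math.sqrt(1.0 - math.sqrt(E))
    x2 = math.sqrt(1.0 + math.sqrt(E))
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        x = x1 + (x2 - x1) * math.sin(theta) ** 2
        dx_dtheta = (x2 - x1) * math.sin(2 * theta)
        total += dx_dtheta / math.sqrt(E - V(x))
    return math.sqrt(2.0) * total * h

tau_mid = period_right_well(0.5)     # energy well inside the well
tau_near = period_right_well(0.999)  # energy close to the separatrix E = 1
print(tau_mid, tau_near)             # the second period is noticeably larger
```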

The period of motion also has a geometric interpretation. Given a periodic trajectory at energy E, let S(E) denote the area enclosed by the trajectory in the phase plane Rx × Rp with turning points x1 < x2. The system (2.6) is time-reversible, and so as in Example 2.14 the trajectory in phase space is symmetric about the x-axis. Therefore, expressing the top half of the curve p = mẋ as a function of position using (2.8), we may write the area as twice the area under the curve:

S(E) = 2 ∫_{x1(E)}^{x2(E)} √(2m[E − V(x)]) dx.
Differentiating this using the Leibniz integral rule, we obtain

dS/dE = 2 ∫_{x1(E)}^{x2(E)} ∂/∂E √(2m[E − V(x)]) dx
         + 2√(2m[E − V(x2)]) dx2/dE − 2√(2m[E − V(x1)]) dx1/dE.

As x1, x2 are turning points we know V(x1) = V(x2) = E, and so the second and third terms vanish. This yields

dS/dE = √(2m) ∫_{x1(E)}^{x2(E)} dx/√(E − V(x)).
Comparing this expression to the period integral (2.9), we conclude:
Proposition 2.18. For the system of Proposition 2.16, the period of a periodic
trajectory is equal to the rate of change of the enclosed area with respect to
energy:
τ(E) = (dS/dE)(E).
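Proposition 2.18 can be verified numerically; the following sketch (Python, units m = k = 1; an illustration of mine, not from the text) computes the enclosed area S(E) for the harmonic oscillator by quadrature and compares a centered difference of S with the known period τ = 2π.

```python
import math

def area(E, m=1.0, k=1.0, n=20000):
    """Phase-plane area S(E) = 2 * int sqrt(2m[E - V(x)]) dx for the harmonic
    oscillator V(x) = k x^2 / 2, with the singularity-removing substitution
    x = -A + 2A sin(theta)^2 between the turning points -A and A."""
    A = math.sqrt(2 * E / k)
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        x = -A + 2 * A * math.sin(theta) ** 2
        dx_dtheta = 2 * A * math.sin(2 * theta)
        total += math.sqrt(2 * m * (E - 0.5 * k * x * x)) * dx_dtheta
    return 2 * total * h

E, dE = 1.0, 1e-4
dSdE = (area(E + dE) - area(E - dE)) / (2 * dE)
print(dSdE)   # approximately 2*pi, the period of the harmonic oscillator
```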

As a partial converse to Proposition 2.16, we will show that the period determines the shape of the potential:
Proposition 2.19. For the system of Proposition 2.16, suppose that on an
interval (−r, r) the potential V has a minimum at x = 0, has no other critical
points, and is even. Then all trajectories are periodic, and the potential V is
uniquely determined (up to an additive constant) by the periods τ (E).
Proof. By premise we have V′(x) > 0 for x > 0 and V′(x) < 0 for x < 0. This guarantees that all trajectories other than the equilibrium solution are periodic. As adding a constant does not change the equations of motion, we may assume that V(0) = 0. Each value V > 0 corresponds to exactly two values of x, which we will denote by x−(V) ∈ (−r, 0) and x+(V) ∈ (0, r). Changing variables in the period integral (2.9), we obtain
τ(E) = √(2m) ∫_{x1(E)}^{0} dx/√(E − V(x)) + √(2m) ∫_{0}^{x2(E)} dx/√(E − V(x))
     = √(2m) ∫_0^E [dx+/dV − dx−/dV] dV/√(E − V).

We divide both sides by √(W − E) and integrate with respect to E from 0 to W:
∫_0^W τ(E) dE/√(W − E) = √(2m) ∫_0^W ∫_0^E [dx+/dV − dx−/dV] dV dE/√((W − E)(E − V))
    = √(2m) ∫_0^W [dx+/dV − dx−/dV] [∫_V^W dE/√((W − E)(E − V))] dV
    = π√(2m) ∫_0^W [dx+/dV − dx−/dV] dV = π√(2m) [x+(W) − x−(W)].
In the last equality we noted x±(0) = 0. Taking W = V we obtain

x+(V) − x−(V) = (1/π√(2m)) ∫_0^V τ(E) dE/√(V − E).
In this way, the period determines the potential energy.

2.6. Exercises

2.1 (Pendulum). Consider a mass m attached to the end of a rigid massless


rod of length ` with the other end suspended at a fixed point. We allow the
rod to rotate in a vertical plane, subject to a constant downward gravitational
acceleration g.
(a) Let x denote the angle from the vertical directly below the pivot, and show
that
ẍ = −(g/ℓ) sin x.
(We have not allowed for configuration spaces other than Euclidean space
yet, so at the moment we are simply viewing x ∈ R.)

(b) Sketch the potential energy and the phase portrait, and convince yourself
that the trajectories correspond to a small ball rolling down the graph of
the potential (cf. section 1.4). Note that near the origin in the phase plane
the diagram looks like that of the harmonic oscillator (cf. Example 2.2),
which is a consequence of the small-angle approximation sin x ≈ x. Iden-
tify all equilibria and heteroclinic orbits. How many trajectories make up
the eye-shaped energy level set from −π ≤ x ≤ π that separates different
modes of behavior, and to what motion do they correspond?
(c) Now add a damping term:

ẍ = −bẋ − (g/ℓ) sin x

where b > 0. Show that (d/dt)E ≤ 0 along all trajectories, and sketch the new phase portrait.
2.2 (Non-mechanical conservative system). Consider the Lotka–Volterra model

ẋ = x(1 − y), ẏ = µy(x − 1)

for a predator-prey system, where µ > 0 is a constant parameter.


(a) Use the chain rule to find a differential equation for dy/dx.

(b) Integrate this separable differential equation to find a conserved quantity E(x, y) for the system.
(c) Show that all trajectories are periodic for initial conditions x(0), y(0) > 0.
2.3 (Conserved quantity minimum that is not a center). Consider the system

ẋ = xy, ẏ = −x2 .

Show that E(x, y) = x² + y² is a conserved quantity and plot the phase portrait for this system. Although the origin is a minimum for E, it is not an isolated fixed point nor a center.
2.4 (Over-damped limit for the harmonic oscillator). We will now justify the
over-damped limit approximation in Example 2.12. The objective is to find a
regime for the damped harmonic oscillator

mẍ = −bẋ − kx, b, k > 0

in which the ẍ term is negligible.


(a) By equating the units of each term in the differential equation, determine
the units of b and k. (Assume that x is measured in units of, say, length.)
(b) Define a new dimensionless variable τ via t = T τ , where T is a constant
with units time to be chosen. Find the new differential equation in terms
of τ . Pick T to make the coefficient of the ẋ term equal to one, and check
that T has units time.

(c) The coefficient of the ẍ term should be mk/b², and so the limit in which this term is negligible is ε = mk/b² → 0. Find the general solution x(t) = c1 e^{k1 t} + c2 e^{k2 t} for the linear equation εẍ + ẋ + x = 0.
(d) Recall that 1/|k| is called the characteristic time of e^{kt}, because after a time 1/|k| the function has decreased (since k < 0) by a factor of 1/e. Find the leading term in the Taylor expansion of 1/k1 and 1/k2 about ε = 0; these are called the fast and slow timescales for the solution.
2.5. Show that nonconstant periodic solutions are impossible in a gradient sys-
tem by considering the change in V (x) around such an orbit. Conclude that
any one-dimensional first order ODE has no periodic solutions.
2.6 (Low-regularity existence for gradient systems [San17]). The special form
of gradient systems allows us to establish existence and uniqueness with fewer
regularity assumptions. This is particularly useful for gradient PDE systems,
where Rn is replaced by an infinite-dimensional function space.
Suppose F : Rn → R is a convex (and not necessarily smooth) function.
This guarantees that the sub-differential

∂F (x) = {v ∈ Rn : F (y) ≥ F (x) + v · (y − x) for all y ∈ Rn }

is nonempty for each x ∈ Rn. Note that F is differentiable at x if and only if ∂F(x) = {∇F(x)}. For x : [0, ∞) → Rn absolutely continuous, consider the gradient system

ẋ(t) ∈ −∂F(x(t)) for almost every t > 0,   x(0) = x0.

(a) Prove uniqueness of solutions by differentiating the squared difference of


two solutions.
(b) Fix τ > 0, and recursively define the sequence

x^τ_0 = x0,   x^τ_{k+1} a minimizer of x ↦ F(x) + (1/2τ)|x − x^τ_k|².

Show that

(1/τ)(x^τ_{k+1} − x^τ_k) ∈ −∂F(x^τ_{k+1}).
For F differentiable, this is simply the implicit Euler scheme for ẋ =
−∇F (x).
(c) In order to extract a convergent sequence as τ ↓ 0, we need a compactness estimate. Use the definition of the sequence x^τ_k to show that

Σ_{k=0}^{K} (1/2τ)|x^τ_{k+1} − x^τ_k|² ≤ F(x0) − F(x^τ_{K+1}).

(d) Define the piecewise constant and linear interpolations

x^τ(t) = x^τ_{k+1},   x̃^τ(t) = x^τ_k + ((x^τ_{k+1} − x^τ_k)/τ)(t − kτ)   for t ∈ (kτ, (k + 1)τ].

Use the previous part to show there exists a constant C such that

∫_0^T ½|(x̃^τ)′(t)|² dt ≤ C.

Conclude that

|x̃^τ(t) − x̃^τ(s)| ≤ C|t − s|^{1/2},   |x̃^τ(t) − x^τ(t)| ≤ Cτ^{1/2}.

(e) Assume that F is bounded below. For any T > 0, use the Arzelà–Ascoli theorem to show that x̃^τ : [0, T] → Rn admits a uniformly convergent subsequence as τ ↓ 0, and that x^τ converges uniformly to the same limit. After passing to a further subsequence if necessary, show that (x̃^τ)′ converges weakly in L²([0, T]). Conclude that the limit x(t) solves the gradient system for F.

2.7 (Gradient flows in PDE). For u : Rn → R define the Dirichlet energy

E(u) = ½ ∫ |∇u(x)|² dx if ∇u ∈ L², and E(u) = +∞ otherwise.

In analogy with directional derivatives for functions on Rn, the gradient ∇E(u) of the functional E at u is defined by

(d/ds)|_{s=0} E(u + sv) = ⟨∇E(u), v⟩ for all v,

when it exists. Different choices of inner product on the RHS yield different notions of gradient.

(a) For the inner product

⟨u, v⟩_{L²} = ∫ u(x)v(x) dx

on L²(Rn; R), show that formally the gradient flow for the energy E is the heat equation

∂u/∂t = −∆u.
(b) For the inner product

⟨u, v⟩_{Ḣ¹} = ∫ ∇u(x) · ∇v(x) dx

on Ḣ¹(Rn; R) = {u : Rn → R : ∫ |∇u|² dx < ∞}, show that formally the gradient flow for the energy E is the equation

∂u/∂t = −u.

Note that the higher regularity norm yields less regular solutions: it is
well-known that solutions to the heat equation are automatically smooth,
while solutions u(t, x) = e^{−t}u0(x) to this equation are only as smooth as
the initial data.

2.8 (A non-mechanical reversible system). Show that the system

ẋ = −2 cos x − cos y, ẏ = −2 cos y − cos x

is reversible with respect to rotation by π. Note that the presence of stable and
unstable nodes guarantees that this system is not conservative.
2.9 (Pendulum period). Show that for the pendulum

ẍ = −(g/ℓ) sin x,

the motion with turning points ±θ0 has period

τ = 4√(ℓ/g) K(sin(θ0/2)),

where

K(k) = ∫_0^{π/2} dξ/√(1 − k² sin²ξ)

is the complete elliptic integral of the first kind. By Taylor expanding about θ0 = 0, find the expansion

τ ≈ 2π√(ℓ/g) [1 + (1/16)θ0² + …].

Note that the zeroth order term is the constant-period approximation obtained by taking sin x ≈ x and thus replacing the pendulum by a harmonic oscillator.
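The stated formulas can be checked numerically. In the sketch below (units g = ℓ = 1 and m = 1; my own illustration, not part of the exercise), K is evaluated by quadrature, 4√(ℓ/g)K(sin(θ0/2)) is compared against a direct evaluation of the period integral (2.9) for the pendulum potential V(x) = (g/ℓ)(1 − cos x), and both are compared with the quadratic expansion.

```python
import math

g, ell = 1.0, 1.0
theta0 = 0.5                          # turning point, in radians

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for int_a^b f."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def K(k):
    """Complete elliptic integral of the first kind, by direct quadrature."""
    return midpoint(lambda xi: 1.0 / math.sqrt(1.0 - (k * math.sin(xi)) ** 2),
                    0.0, math.pi / 2)

tau_elliptic = 4 * math.sqrt(ell / g) * K(math.sin(theta0 / 2))

# Direct evaluation of the period integral (2.9) with m = 1 and the pendulum
# potential V(x) = (g/ell)(1 - cos x); the substitution
# x = -theta0 + 2*theta0*sin(t)^2 removes the turning-point singularities.
V = lambda x: (g / ell) * (1.0 - math.cos(x))
E = V(theta0)
def integrand(t):
    x = -theta0 + 2 * theta0 * math.sin(t) ** 2
    return 2 * theta0 * math.sin(2 * t) / math.sqrt(E - V(x))
tau_direct = math.sqrt(2.0) * midpoint(integrand, 0.0, math.pi / 2)

tau_series = 2 * math.pi * math.sqrt(ell / g) * (1 + theta0**2 / 16)
print(tau_elliptic, tau_direct, tau_series)
```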
2.10 (Existence of solitons). In 1844, Scott Russell famously observed a solitary
traveling wave (now commonly referred to as a soliton) in a canal, contradicting
the popular belief that all water waves must either crest and break or disperse.
In order to explain this phenomenon, the Korteweg–de Vries equation

∂u/∂t = −∂³u/∂x³ − 6u ∂u/∂x
∂t ∂x ∂x
was introduced in [KdV95] as a model for the surface u : Rt × Rx → R of a
shallow channel of water.

(a) We seek traveling wave solutions to this PDE. Insert the ansatz u(t, x) =
h(x − ct) where c > 0 is a constant and obtain an ODE for h(x).
(b) Integrate the equation for h once to obtain a second-order ODE, and write it in the form d²h/dx² = −V′(h) of a conservative mechanical system for some potential function V(h).

(c) Use the conserved quantity ½(h′)² + V(h) to sketch the phase portrait in the (h, h′)-plane. Highlight a unique homoclinic orbit connecting the fixed point at the origin to itself; the corresponding solution h(x) obeys h(x) → 0 as x → ±∞, and thus describes the profile of a localized wave.

(d) Use the conservation of ½(h′)² + V(h) to obtain a first-order ODE for h. This equation is separable, and thus can be integrated. Conclude that
solitary traveling wave solutions are given by the formula

u(t, x) = 2β² sech²(β(x − x0 − 4β²t))

for arbitrary constants x0 ∈ R and β > 0. How is the speed of these waves related to their amplitude?
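As a quick numerical sanity check (my own sketch, not part of the exercise): assuming the once-integrated traveling-wave equation h″ = ch − 3h² (the form from part (b), with the integration constant fixed by decay at infinity), the profile h(ξ) = 2β² sech²(βξ) with c = 4β² should satisfy it identically. Below, h″ is approximated by a centered finite difference.

```python
import math

beta = 0.7
c = 4 * beta**2                       # wave speed tied to the amplitude

def h(xi):
    s = 1.0 / math.cosh(beta * xi)    # sech(beta * xi)
    return 2 * beta**2 * s * s

# Check h'' = c*h - 3*h^2 at a few sample points, approximating h'' by a
# centered second difference.
eps = 1e-4
max_err = 0.0
for xi in (-2.0, -0.5, 0.0, 0.3, 1.7):
    h2 = (h(xi + eps) - 2 * h(xi) + h(xi - eps)) / eps**2
    rhs = c * h(xi) - 3 * h(xi) ** 2
    max_err = max(max_err, abs(h2 - rhs))
print(max_err)   # tiny: the sech^2 profile solves the equation
```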
CHAPTER 3

CENTRAL FIELDS

We will examine some examples of systems with more than one degree of
freedom. This selection focuses on the most important examples in order to
provide a baseline intuition; a thorough study of classical mechanics should
include many more examples, e.g. rigid bodies and the mechanical top. The
material is based on [Arn89, Ch. 2], [LL76, Ch. 3], and [Gol51, Ch. 3].

3.1. Central fields

In this section we will solve for the motion of a single particle in R3 subject to
a central force F. A vector field F is called central (about the origin) if all of the
vectors are radial and the magnitude is only a function of the radial coordinate
r = |x|; in other words, F ≡ F (r)r̂. (This definition of course extends to Rd , but
soon we will need to specialize to R3 in order to discuss angular momentum.)
A central field must be conservative, and the corresponding potential energy
V ≡ V (r) depends only on the distance to the origin. This is because F (r) is a
function of one variable and thus we can always find an antiderivative −V (r).
Alternatively, if we write F = F(r)r̂, then the work

∫_{x1}^{x2} F · ds = ∫_{|x1|}^{|x2|} F(r) dr

is path independent and so from Proposition 1.7 we know there exists a radial potential energy V(r) so that F = −∇V.
The torque of the particle is

N = x × F = F (r)r(r̂ × r̂) = 0,

and so by Proposition 1.16 we know the particle’s angular momentum L = x×p


is conserved. As x is always perpendicular to L, we see that the particle’s motion is confined to the plane orthogonal to L provided that L ≠ 0 initially. If L = 0, then x is parallel to the velocity ẋ, and thus the particle’s motion must be collinear. In both cases, the motion is coplanar.
As a consequence of the conservation of angular momentum, we have:


Proposition 3.1 (Kepler’s second law). The rate of change in the total area
swept by the radius vector as a function of time is constant.
Proof. Let (r, φ) denote polar coordinates within the plane of motion. The velocity in these coordinates is

(d/dt)(r cos φ, r sin φ) = ṙ(cos φ, sin φ) + rφ̇(−sin φ, cos φ) = ṙ r̂ + rφ̇ φ̂.

The magnitude L of the angular momentum L is then

L = |x × p| = |(r r̂) × (mẋ)| = mr²φ̇. (3.1)

Note that the conservation of L requires that φ̇ cannot change sign.


On the other hand, the area ∆S of the angular wedge swept by the radius vector x over an angle ∆φ is given by

∆S = ½r²∆φ + O(∆φ²) = ½r²φ̇∆t + O(∆t²)

to first order. Together, we see that the total area S(t) swept by the radius vector obeys

Ṡ = ½r²φ̇ = L/2m (3.2)

and thus is constant.
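Kepler’s second law is easy to observe numerically. The sketch below (Python; my own illustration with m = k = 1 and an arbitrary initial condition, not part of the text) integrates a particle in the attractive field F = −(k/r²)r̂ with a fourth-order Runge–Kutta step and confirms that the angular momentum L = m(xẏ − yẋ), and with it the areal velocity Ṡ = L/2m, stays constant along the orbit up to integration error.

```python
import math

m, k = 1.0, 1.0

def deriv(s):
    """Right-hand side of the planar equations m x'' = -k x / |x|^3."""
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -k * x / (m * r3), -k * y / (m * r3))

def rk4(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s)
    k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(s, k1, k2, k3, k4))

def ang_mom(s):
    x, y, vx, vy = s
    return m * (x * vy - y * vx)

s = (1.0, 0.0, 0.2, 1.1)       # generic bounded initial condition, L != 0
L0 = ang_mom(s)
drift = 0.0
for _ in range(10000):         # integrate to t = 20
    s = rk4(s, 0.002)
    drift = max(drift, abs(ang_mom(s) - L0))
print(L0, drift)   # the drift is tiny compared to L0
```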
Using (3.1), we can rewrite the total energy as

E = K + V = (m/2)(ṙ² + r²φ̇²) + V(r) = (m/2)ṙ² + (L²/2m)r⁻² + V(r). (3.3)

This is the total energy for a one-dimensional Newtonian system in the coordinate r with the effective potential energy

Veff(r) = V(r) + (L²/2m)r⁻².

The last term on the RHS is called the centrifugal energy. When the effective
potential is equal to the total energy we have ṙ = 0, which is a turning point for
the one-dimensional system. Unlike in section 2.5 though, the actual particle
is not at rest at such a point because the angle is changing (unless the angular
momentum is zero).
Example 3.2. Kepler’s problem seeks the equations of motion for a particle moving around a fixed gravitational mass, which is governed by the potential

V(r) = −kr⁻¹

for k > 0 a constant. This yields the effective potential

Veff(r) = −kr⁻¹ + (L²/2m)r⁻².

The potential V(r) tends to −∞ as r ↓ 0, but the added centrifugal energy makes the effective potential Veff(r) tend to +∞.

If the initial energy E0 is nonnegative, then the trajectory is unbounded. The particle comes in from infinity, swings around the central mass reaching a minimum radius where Veff = E0, and returns towards infinity.
If the initial energy E0 is negative, then the trajectory is bounded. The particle moves within an annulus, oscillating between the inner and outer radii where Veff = E0. As of yet we do not know if the trajectory is periodic, never realigns, or even is dense in the annulus.

Figure 3.1: Effective potential for the system of Example 3.2, with sample energy levels E0 > 0 and E0 < 0.

Rearranging the total energy (3.3), we obtain

ṙ = √((2/m)[E − Veff(r)]). (3.4)

This is a separable differential equation, with solution

t = ∫ dr/√((2/m)[E − Veff(r)]) + t0. (3.5)

We can also use the expression (3.4) for ṙ to solve the separable equation (3.1) for φ:

φ = ∫ (L/r²) dr/√(2m[E − Veff(r)]) + φ0. (3.6)
This is a formula for φ as a function of r, at least formally.
The original system of differential equations was six-dimensional: 3 dimensions for the position x ∈ R3 and 3 dimensions for the momentum p ∈ R3. We then used 4 conserved quantities to reduce the system to 2 first-order equations, which could then be integrated. Indeed, the conservation of the direction of L eliminated 2 degrees of freedom by restricting the motion to be coplanar, and the conservation of the magnitude of L eliminated another degree of freedom by providing the first-order equation (3.1) for φ. The fourth conserved quantity was the total energy E, which we used to obtain the first-order equation (3.4) for r.
As in section 2.5, the motion is confined to the region Veff ≤ E. Nevertheless, in general the particle may still be able to approach r = ∞. Indeed, if the potential energy at infinity V∞ = lim_{r→∞} V(r) = lim_{r→∞} Veff(r) exists and is finite, then there are unbounded trajectories for energies E ≥ V∞, and we can define the velocity at infinity v∞ ≥ 0 via E = ½mv∞² + V∞. On the other hand, the particle may also be able to approach r = 0. In order for this to happen, the potential V(r) must not outgrow the centrifugal energy:

lim sup_{r↓0} [V(r) + (L²/2m)r⁻²] < +∞.

In the remaining case, the effective potential has two turning points rmin and rmax, which confine the motion within the annulus bounded by these two radii. Points where r = rmin are called pericenters and points where r = rmax are called apocenters. The time-reversal symmetry for the one-dimensional system in r implies that the trajectory will be symmetric about any ray from the origin through a pericenter or apocenter. According to the solution (3.6), the angle between successive pericenters (or apocenters) is then

Φ = 2 ∫_{rmin}^{rmax} (L/r²) dr/√(2m[E − Veff(r)]). (3.7)

In general, Φ is not a rational multiple of 2π, and consequently the trajectory is not closed (and, it turns out, is necessarily dense in the annulus).
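The apsidal-angle integral (3.7) can be evaluated numerically. For the Kepler potential V(r) = −k/r treated below, the angle between successive pericenters is exactly 2π (the orbit is a closed ellipse), and the sketch below (Python; the units m = k = L = 1 and the energy E = −0.3 are an arbitrary choice of mine, not from the text) reproduces this.

```python
import math

# Numerical check of (3.7) for the Kepler potential V(r) = -k/r: the
# apsidal angle should come out to exactly 2*pi.
m, k, L, E = 1.0, 1.0, 1.0, -0.3

def V_eff(r):
    return -k / r + L**2 / (2 * m * r**2)

# Turning points solve E = V_eff(r), i.e. E r^2 + k r - L^2/(2m) = 0.
disc = math.sqrt(k**2 + 2 * E * L**2 / m)
r_min = (-k + disc) / (2 * E)
r_max = (-k - disc) / (2 * E)

# The substitution r = r_min + (r_max - r_min) sin^2(theta) removes the
# inverse-square-root singularities at the turning points.
n = 20000
h = (math.pi / 2) / n
Phi = 0.0
for i in range(n):
    theta = (i + 0.5) * h
    r = r_min + (r_max - r_min) * math.sin(theta) ** 2
    dr = (r_max - r_min) * math.sin(2 * theta)
    Phi += (L / r**2) * dr / math.sqrt(2 * m * (E - V_eff(r)))
Phi *= 2 * h
print(Phi)   # approximately 2*pi
```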

3.2. Periodic orbits

We have seen two special examples of central fields thus far: the harmonic
oscillator potential
V(r) = kr², k > 0, (3.8)
and the gravitational potential

V(r) = −kr⁻¹, k > 0. (3.9)

The objective of this section is to show that these are the only two central fields
for which all bounded orbits are periodic:

Theorem 3.3 ([Arn89, §8D]). Suppose a particle moves in a smooth central field on R3 \ {0} and there exists a bounded trajectory. If all bounded trajectories are periodic, then the potential is either the harmonic oscillator potential (3.8) or the gravitational potential (3.9).
For V(r) = −kr^p with p ∈ (−2, 0), the effective potential still has the same qualitative shape as in the case p = −1 from Example 3.2: Veff(r) → +∞ as r ↓ 0 and Veff(r) → 0 as r → ∞, with a negative minimum in between. If p ≠ −1, Theorem 3.3 implies that for E0 < 0 there must be trajectories that are not periodic. In fact, it can be shown that such trajectories are dense in the annulus {r : Veff(r) ≤ E0}.
To begin the proof of Theorem 3.3, suppose V(r) is a central field in which all bounded orbits are closed. The existence of a closed orbit guarantees that Veff(r) has a (strict) local minimum at some value r > 0. Indeed, if V′eff(r) < 0 for all r > 0, then we eventually have Veff(r) ≤ E0 − a for some a > 0 and all t large, and hence ṙ ≥ b > 0 for all t large by conservation of energy. Similarly, if V′eff(r) > 0 for all r > 0, then we eventually have Veff(r) ≤ E0 − a for some a > 0 and all t large, and hence ṙ ≤ −b < 0 for all t large by conservation of energy. In either case, we eventually have that ṙ is nonvanishing, and so no closed orbits could exist.
Let r0 denote a local minimum of Veff. If the initial radius is sufficiently close to r0 then the energy E0 will be close to Veff(r0) and the motion will be confined to a bounded component of {r : Veff(r) ≤ E0}. From (3.7), the angle between successive pericenters or apocenters is

Φ = 2 ∫_{rmin}^{rmax} (L/r²) dr/√(2m[E − V(r) − L²/2mr²]),

where rmin and rmax are the radial turning points. Substituting x = L/mr, we obtain

Φ = √(2m) ∫_{xmin}^{xmax} dx/√(E − V(L/mx) − mx²/2). (3.10)

This is the period integral (2.9) for the one-dimensional system with potential

W(x) = V(L/mx) + (m/2)x².
Next, we compute the limiting period for small oscillations near minima for
the one-dimensional system with potential W :
Lemma 3.4. Consider a conservative one-dimensional Newtonian system with smooth potential W(x). If W has a local minimum x0 with value E0, then

lim_{E↓E0} τ(E) = 2π√(m/W″(x0)). (3.11)

Proof. After substituting W − E0 for W we may assume that E0 = 0. As x0 is a local minimum with value 0, the Taylor expansion of W at x0 is

W(x) = ½W″(x0)(x − x0)² + O((x − x0)³).

This yields the equations of motion

mẍ = −W″(x0)(x − x0) + O((x − x0)²).

Without the error term, this is a harmonic oscillator about x0 with spring constant k = W″(x0), and in Example 2.2 we found that the solution can be expressed in terms of trigonometric functions with period

2π√(m/k) = 2π√(m/W″(x0)).

This gives us the formula (3.11) for the leading term of τ (E) as E ↓ E0 . The
convergence of τ (E) as E ↓ E0 can be justified by passing the limit inside the
integral expression (2.9).

Let r0 denote a local minimum of Veff. Then W has a minimum at

x0 = L/mr0.

Consequently we have W′(x0) = 0, which yields

m²x0³ = LV′(r0).

Inserting this into the expression for W″(x0), we obtain

W″(x0) = m [r0V″(r0) + 3V′(r0)] / V′(r0).

Therefore, taking the limit E ↓ Veff(r0) in the angle integral (3.10) and applying (3.11), we conclude that the angle Φ tends to

Φcir = 2π√(m/W″(x0)) = 2π√(V′(r0)/[r0V″(r0) + 3V′(r0)]) (3.12)

as the initial radius tends to r0.


All of the orbits near r0 are bounded, and so by premise they must be closed. As Φ is the angle between successive pericenters or apocenters, some integer multiple of Φ must be equal to 2πn for some integer n ≥ 1. Therefore Φ is a rational multiple of 2π, and since the integral is a continuous function of the initial condition in a neighborhood of r0, we must have Φ ≡ Φcir on this neighborhood of r0. Setting (3.12) equal to a constant, we obtain the linear differential equation

c r0 V″(r0) + (3c − 1)V′(r0) = 0

for c ∈ R a constant. The solutions are

V(r) = ar^α for α ∈ [−2, 0) ∪ (0, ∞), and V(r) = b log r,

and in both cases Φ is constant. Plugging this back into Φcir, we obtain

Φ ≡ Φcir = 2π/√(α + 2) for α ≥ −2. (3.13)

(The case α = 0 corresponds to V(r) = b log r.) We will now split into cases.
First consider the case V(r) = b log r. Taking α = 0 in (3.13) we have Φcir = √2 π, and so Φ is not a rational multiple of 2π.
Next, consider the case V(r) = ar^α with α > 0. The constant a must be positive so that there exists a bounded orbit, and hence V(r) → ∞ as r → ∞. Substituting x = xmax y in the Φ integral (3.10), we have

Φ = √(2m) ∫_{ymin}^{1} dy/√(U(1) − U(y)),   U(y) = (m/2)y² + (1/xmax²) V(L/(m xmax y)).

As E → ∞ we have xmax → ∞, and the second term in U tends to zero. Moreover, in this limit we also have ymin → 0, and so we obtain

lim_{E→∞} Φ = π.

On the other hand, Φ ≡ Φcir is a constant, and so comparing to (3.13) we conclude that α = 2.
Now consider the case V(r) = ar^α with −2 ≤ α < 0. Taking the limit E ↓ 0 in the integral (3.10), we obtain

lim_{E↓0} Φ = 2 ∫_0^1 dx/√(x^{−α} − x²) = 2π/(2 + α).

Comparing this to (3.13), we conclude α = −1.


Altogether, the only two possible potentials are the harmonic oscillator (3.8)
and the gravitational potential (3.9). This concludes the proof of Theorem 3.3.

3.3. Kepler’s problem

Kepler’s problem seeks the motion for the central field with potential

V(r) = −kr⁻¹, k > 0.

The original motivation was to model a celestial body in motion around a fixed gravitational object, but this also describes the motion of an electrically charged particle attracted to a fixed charge.
From section 3.1 we know the motion is coplanar, and the radius r evolves subject to the one-dimensional effective potential

Veff(r) = −kr⁻¹ + (L²/2m)r⁻².

Note that lim_{r↓0} Veff(r) = +∞ and lim_{r→∞} Veff(r) = 0. If L ≠ 0 then the first derivative

V′eff(r) = kr⁻² − (L²/m)r⁻³

has exactly one root for r ∈ (0, ∞) at r = L²/mk, and so Veff has a strict global minimum with value

Veff(L²/mk) = −mk²/2L².

Consequently, for E ≥ 0 we have unbounded motion, and for E < 0 (with L ≠ 0) we have bounded motion with E ≥ −mk²/2L².
For this potential we can evaluate the formal solution (3.6) for the angular coordinate:

φ(r) = cos⁻¹[ (L/r − mk/L) / √(2mE + m²k²/L²) ].

Given an initial condition, we pick the origin for φ so that the integration constant above is zero. Solving for r as a function of φ, we obtain

r = (L²/mk) / (1 + √(1 + 2EL²/mk²) cos φ).

Define the quantities

ℓ = L²/mk,   ε = √(1 + 2EL²/mk²),

so that we may write

r(φ) = ℓ/(1 + ε cos φ).
This is the parametric equation for a conic section having one focus at the origin, with eccentricity ε ∈ [0, ∞) and latus rectum 2ℓ. By planar geometry, the semi-major and semi-minor axes are given by

a = ℓ/(1 − ε²) = k/2|E|,   b = ℓ/√(1 − ε²) = L/√(2m|E|), (3.14)

respectively.
The eccentricities ε = 1 and ε > 1 correspond to parabolas and hyperbolas respectively, which agrees with the fact that E ≥ 0 yields unbounded orbits. Likewise ε = 0 and 0 < ε < 1 correspond to circles and ellipses respectively, which agrees with the fact that E ∈ [−mk²/2L², 0) yields bounded orbits.
For the planets in our solar system, the eccentricities are very small and the
trajectories are nearly circular. Consequently, before solving Kepler’s problem,
scientists (such as Copernicus) believed that the planets’ orbits were perfectly
circular with the Sun at the center. Kepler corrected this, and Kepler’s first
law states that the planetary orbits are ellipses with the Sun lying at a focal
point.
Now we will determine the period τ of a bounded elliptic orbit. Integrating Kepler’s second law (3.2) over one orbit and recalling the area of an ellipse, we have

πab = S = Lτ/2m.

This yields the explicit formula

τ = 2πm ab/L = πk√(m/2|E|³)

for the period as a function of the energy. Using the formula (3.14) for the semi-major axis a in terms of the energy E, we obtain

τ = 2π√(ma³/k). (3.15)

This demonstrates Kepler’s third law: the period τ of a planet’s orbit is proportional to a^{3/2}, where a is the semi-major axis.
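Kepler’s third law can be tested against a direct numerical integration. The sketch below (Python; the units m = k = 1 and the initial condition are an arbitrary choice of mine, not from the text) computes a from the initial energy via (3.14), predicts τ from (3.15), and checks that a Runge–Kutta-integrated orbit returns to its starting point after time τ.

```python
import math

# Integrate the planar equations of motion m x'' = -k x / |x|^3 and verify
# that the particle returns to its starting point after the predicted
# period tau = 2*pi*sqrt(m a^3 / k).
m, k = 1.0, 1.0

def deriv(s):
    x, y, vx, vy = s
    r3 = (x * x + y * y) ** 1.5
    return (vx, vy, -k * x / (m * r3), -k * y / (m * r3))

def rk4(s, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = deriv(s)
    k2 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = deriv(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = deriv(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6 * (b + 2 * c + 2 * d + e)
                 for a, b, c, d, e in zip(s, k1, k2, k3, k4))

s = (1.0, 0.0, 0.0, 1.2)             # speed below escape velocity: E < 0
E = 0.5 * m * 1.2**2 - k / 1.0       # initial energy (negative)
a = k / (2 * abs(E))                 # semi-major axis from (3.14)
tau = 2 * math.pi * math.sqrt(m * a**3 / k)

n = 20000
for _ in range(n):
    s = rk4(s, tau / n)
print(s[0], s[1])   # back near the starting point (1, 0)
```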

In practice we know that the mass (or charge) sitting at the origin is not
perfectly stationary, but instead is perturbed by the particle’s presence. We
will now remedy this. The two-body problem seeks the motion for a closed
system consisting of two gravitational bodies with positions xi and masses mi ,
for i = 1, 2. The system is conservative with potential

    V(x1, x2) = −Gm1m2 / |x1 − x2|,    (3.16)

where G is the gravitational constant.
By Newton’s first law (cf. Proposition 1.15) we know that the center of mass
moves with constant velocity. We choose a reference frame so that the center of
mass lies at the origin:
m1 x1 + m2 x2 ≡ 0.
Let x = x1 − x2 denote the relative position of the particles. We can recover the
positions from x via
    x1 = m2/(m1 + m2) x,    x2 = −m1/(m1 + m2) x.    (3.17)
The total energy can be written solely in terms of x as

    E = ½m1|ẋ1|² + ½m2|ẋ2|² + V(|x1 − x2|) = ½µ|ẋ|² + V(|x|),    (3.18)

where µ = m1m2/(m1 + m2) is the reduced mass. In other words, the two-body
system is equivalent to a single particle of mass µ moving in the external
central field V (|x|). Solving for the motion of x in the reduced problem (3.18)
yields the solutions to the original problem (3.16) via (3.17). In particular, the
motion of two gravitational bodies (or two attractive charges) will be two conic
sections with a shared focus at the origin.
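The reduced-mass identity in (3.18) can be spot-checked numerically. In this sketch the masses and the relative velocity are arbitrary assumed values; the individual velocities are generated from ẋ via (3.17).

```python
import random

# Spot-check of the reduced-mass identity (3.18); the masses and the relative
# velocity below are arbitrary assumed values.
m1, m2 = 3.0, 5.0
mu = m1*m2/(m1 + m2)                        # reduced mass

xdot = [random.uniform(-1, 1) for _ in range(3)]   # relative velocity ẋ = ẋ1 − ẋ2
x1dot = [m2/(m1 + m2)*c for c in xdot]             # from (3.17)
x2dot = [-m1/(m1 + m2)*c for c in xdot]

K_pair = 0.5*m1*sum(c*c for c in x1dot) + 0.5*m2*sum(c*c for c in x2dot)
K_reduced = 0.5*mu*sum(c*c for c in xdot)
print(abs(K_pair - K_reduced))              # zero up to rounding
```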

3.4. Virial theorem

The virial theorem is a general formula for the long-time average of a system’s
kinetic energy. In the special case of a single particle in a homogeneous central
field, it takes a particularly simple form.
Suppose we have a system of N particles in R^d. The virial theorem is based
upon the following simple calculation:

    d/dt Σ_{i=1}^N xi · pi = Σ_{i=1}^N ẋi · pi + Σ_{i=1}^N xi · ṗi = 2K + Σ_{i=1}^N Fi · xi.    (3.19)

We now take the long-time average

    ⟨f⟩ = lim_{T→∞} (1/T) ∫₀^T f(t) dt    (3.20)
of both sides of (3.19):

    ⟨ d/dt Σ_{i=1}^N xi · pi ⟩ = 2⟨K⟩ + ⟨ Σ_{i=1}^N Fi · xi ⟩.    (3.21)

Note that for any bounded function f we have

    ⟨df/dt⟩ = lim_{T→∞} (f(T) − f(0))/T = 0.

Therefore, if all particle motion is bounded, then the LHS of (3.21) vanishes.
Altogether, we conclude:
Theorem 3.5 (Clausius’ virial theorem [Cla70]). If the motion of a Newtonian
system of N particles is bounded, then the long-time average (defined by (3.20))
of the kinetic energy obeys

    2⟨K⟩ = −⟨ Σ_{i=1}^N Fi · xi ⟩.    (3.22)

In particular, for a single particle in a central field with potential V that is
homogeneous of degree α we have

    ⟨K⟩ = (α/2)⟨V⟩.    (3.23)

Proof. It only remains to prove the special case (3.23). Writing V(r) = kr^α for
k ∈ R a constant, we have

    F · x = −V′(r) r = −αV(r).

Together with (3.22), this yields (3.23).
The quantity on the RHS of (3.22) is called the virial. This name is due to
Clausius, and is derived from the Latin word for “force”.
Similar to how the conservation of momentum and energy are associated
to symmetries, the virial identity (3.23) is connected to a scaling symmetry.
Indeed, for a potential V that is homogeneous of degree α, it is straightforward
to check that if x(t) solves Newton's equation (1.3), then so does λ²x(λ^(α−2) t)
for any constant λ > 0.
Example 3.6. For the gravitational potential (3.9) we have α = −1, and thus

    ⟨K⟩ = −½⟨V⟩.

In particular, for circular orbits the potential V = −kr^(−1) is constant, and so
we deduce that the velocity v is proportional to r^(−1/2). On the other hand, the
period τ is given by 2πr/v and thus is proportional to r^(3/2); this agrees with
Kepler's third law (3.15).
The version (3.22) of the virial theorem is commonly used in physical appli-
cations. Mathematically, however, the computation (3.19) is arguably just as
important, particularly in deriving monotonicity formulas.
Example 3.7. Consider a single particle moving in a central field in R³ with
potential V, and suppose that the potential is repulsive in the sense that the
radial component of the force always points away from the origin:

    x · ∇V(x) < 0 for all x ≠ 0.    (3.24)

From (3.19) we see that

    d/dt (x · p) ≥ 2K > 0,
and thus x · p is a strictly increasing function of time along any trajectory; in
particular, there can be no closed orbits for the system. Equivalently, we can
phrase this in terms of its antiderivative (m/2)|x(t)|²:

    d²/dt² |x(t)|² ≥ (4/m)K > 0,

from which we see that |x(t)|² is a strictly convex function.
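The monotonicity of x · p can be observed in a direct simulation. The sketch below (with assumed constants and an assumed initial condition, not taken from the text) integrates a trajectory in the repulsive field V = k/r with a velocity-Verlet step and checks that x · p increases at every step.

```python
import math

# For the repulsive potential V = k/r (which satisfies x·∇V < 0), the quantity
# x·p should increase strictly along any trajectory; the constants and the
# initial condition below are assumed.
k, m = 1.0, 1.0
x, y = 1.0, 0.5
vx, vy = -0.3, 0.2                 # initially moving toward the origin

def force(x, y):
    r3 = math.hypot(x, y)**3
    return k*x/r3, k*y/r3          # F = -∇V points away from the origin

h, prev, increasing = 1e-3, None, True
fx, fy = force(x, y)
for _ in range(5000):
    vx += 0.5*h*fx/m; vy += 0.5*h*fy/m      # velocity-Verlet step
    x += h*vx; y += h*vy
    fx, fy = force(x, y)
    vx += 0.5*h*fx/m; vy += 0.5*h*fy/m
    xp = m*(x*vx + y*vy)                    # x·p
    if prev is not None and xp <= prev:
        increasing = False
    prev = xp
print(increasing)                  # True
```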


Example 3.8 (Morawetz inequality [Tao06, Ex. 1.32]). Alternatively, it is also
common to work in terms of the function |x(t)|. Consider the system in the
previous example. For x ≠ 0 we compute

    d²/dt² |x(t)| = d/dt ( x/|x| · ẋ ) = |ẋ|²/|x| − (x · ẋ)²/|x|³ + x/|x| · ẍ
                  = |πx(ẋ)|²/|x| − x · ∇V/(m|x|),    (3.25)

where
    πx(v) = v − (x/|x|) ( x/|x| · v )

denotes the component of v perpendicular to x. For a repulsive potential V (in
the sense of (3.24)), we obtain

    d²/dt² |x(t)| ≥ |πx(ẋ)|²/|x| > 0.

Consequently |x(t)| is strictly convex, or equivalently x/|x| · ẋ is strictly increas-
ing. Physically, this tells us that a particle initially moving towards the origin
is slowing down, and a particle moving away from the origin cannot reverse
direction.
If the potential V is also nonnegative, then we have

    ½m|ẋ|² ≤ E₀,

where E₀ is the initial total energy, and thus

    x/|x| · ẋ ≤ |ẋ| ≤ √(2E₀/m).
Integrating (3.25) in time we then obtain

    ∫_{−∞}^{∞} |π_{x(t)}(ẋ(t))|² / |x(t)| dt ≤ 2 √(2E₀/m)    (3.26)

(provided that x(t) exists for all t ∈ R and is nonvanishing). Physically, this
tells us that the angular component of the velocity is decaying in time and the
motion becomes predominantly radial.
The estimate (3.26) is an example of a Morawetz inequality, and such
inequalities have played an important role in the context of PDEs. They are
named after Morawetz’s pioneering work on the scattering problem for the linear
wave equation with an obstacle; see [Mor75, LP89] for details. More recently,
Morawetz inequalities have also proven to be a powerful tool in the study of
nonlinear PDEs.
For more examples of monotonicity formulas in the context of Newtonian
systems, see [Tao06, §1.5]. For an introduction to monotonicity formulas for
PDEs, see [KV13, §7] and [Tao06, Ch. 2-3].

3.5. Exercises

3.1. Consider the central field with potential

    V(r) = −kr^(−2)

for k > 0 a constant. Show that trajectories with negative energy reach the
origin in finite time by considering the effective potential.
3.2 (Method of similarity). Suppose that the potential energy of a central field
is a homogeneous function of degree α:

    V(r) = kr^α

for k ∈ R a constant. Show that if a curve γ is a trajectory, then the rescaled
curve λγ for any constant λ > 0 is also a trajectory. For a periodic trajectory
γ, determine the ratio of the periods of these trajectories. Conclude that the
period is constant for the harmonic oscillator (α = 2), and that the gravitational
potential (α = −1) obeys Kepler’s third law.
3.3 (Escape velocity). Let r0 denote the radius of the Earth, and g the gravi-
tational acceleration at the Earth’s surface. The gravitational potential energy
of the Earth is

    V(r) = −g r0²/r.
Determine the escape velocity, i.e. the minimum velocity a particle must be
given on the surface of the Earth in order for it to travel infinitely far away.
CHAPTER 3. CENTRAL FIELDS 50

3.4 (Cosmic velocities). Consider the gravitational potential energy of the Earth
as in the previous problem. The escape velocity v2 is sometimes called the second
cosmic velocity. The first cosmic velocity is the speed of a particle in a circular
orbit with radius equal to that of the Earth. Find the first cosmic velocity v1
and show that v2 = √2 v1.
3.5 (Geosynchronous orbit [Nah16]). It is useful for communication satellites
to be in geosynchronous orbit, so that their orbital period is one day and the
satellite appears to hover in the sky. We will calculate the height of this orbit
for Earth in two different ways.

(a) Let m be the satellite's mass, M = 5.98 × 10^24 kg be the Earth's mass, v
be the satellite’s velocity, and Rs the radius of the satellite’s circular orbit.
Determine Rs by equating the gravitational and centripetal accelerations
of a circular orbit and writing v = 2πRs /T where T is the length of a day.
(b) Use Kepler’s third law to calculate the same value for Rs .

3.6 (Satellite paradox [Nah16]). Satellites in low Earth orbit experience signif-
icant atmospheric drag, which actually increases the speed of the satellite.

(a) For a circularly orbiting satellite in Earth's gravitational potential, con-
    clude from the virial theorem that the satellite's total energy is

        E = −½mv².

    Alternatively, this relation can be obtained by equating the gravitational
    and centripetal accelerations, solving for v, and substituting v into the
    kinetic energy.
(b) Differentiate the result from part (a) to determine v̇ in terms of Ė. As-
suming the energy loss rate (i.e. dissipated power) of the satellite is

Ė = −cv

for c > 0 a constant, show that v̇ > 0.

3.7 (Solar and lunar tides [Nah16]). The gravitational force between two bodies
of masses m1 and m2 has magnitude

    F(r) = Gm1m2 r^(−2),

where r is the distance between the bodies’ centers and G is the universal
gravitational constant. For the Earth, Sun, and Moon, we have

    Ms = mass of the Sun = 2 × 10^30 kg,
    Mm = mass of the Moon = 7.35 × 10^22 kg,
    Rs = Earth–Sun separation = 93 × 10^6 miles,
    Rm = Earth–Moon separation = 2.39 × 10^5 miles.
(a) Find the ratio of the Sun’s and the Moon’s gravitational forces on the
Earth. Even though the Sun is much farther from the Earth, the Sun’s
gravitational force on the Earth is much greater than the Moon’s.
(b) As the Earth is not a point mass, then the Sun’s gravitational force is
stronger (weaker) on the side of the Earth closest (farthest) from the Sun.
This causes water to bulge at the points closest and furthest from the Sun,
which are called the solar high tides. Calculate the maximum difference in
gravitational force in terms of Earth’s radius R for both the Sun and the
Moon.
(c) Extract the leading term in the limit R/Rs ≪ 1 and R/Rm ≪ 1 for each
expression in part (b). Calculate their ratio and conclude that, although
the Sun’s gravitational force is stronger, the lunar tides are more than
twice as large as the solar tides.

3.8 (Energy of the ocean tides [Nah16]). The lunar tides are not directly in line
with the centers of the Earth and Moon, but are rather carried ahead slightly
by the Earth’s rotation and friction. This means that the Moon’s gravitational
pull on both tides produce torque. The Moon’s pull on the farther tide in-
creases the Earth’s rotational speed, but the stronger pull on the nearer tide
is counter-rotational, and so the overall effect decreases the Earth’s rotational
speed. Atomic clocks have measured that the length of a day is increasing at
the rate of about 2 milliseconds per century.

(a) Let Ω denote the angular rotation rate of the Earth and let T denote the
length of a day in seconds, so that ΩT = 2π. By integrating the kinetic
energy over the volume of the Earth, show that the rotational energy E
    is given by
        E = ½Ω²I,
    where
        I = ∫_{r≤R} (x² + y²) dx dy dz

is the moment of inertia.


(b) Show that for a solid sphere of radius R and constant mass density ρ the
    moment of inertia is
        I = (8π/15) R⁵ ρ,
    or in terms of the total mass M,
        I = (2/5) M R².
    The Earth is not a constant-density sphere, and so rather than 2/5 the
    coefficient is approximately 0.3444.
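The formula in part (b) can be verified by a crude midpoint Riemann sum in spherical coordinates; R = ρ = 1 are assumed normalizations, not values from the text.

```python
import math

# Midpoint Riemann sum for I = ∫ (x^2 + y^2) ρ dV over the ball r ≤ R in
# spherical coordinates; R = ρ = 1 are assumed normalizations.
R, rho, n = 1.0, 1.0, 200
I = 0.0
dr, dth = R/n, math.pi/n
for i in range(n):
    r = (i + 0.5)*dr
    for j in range(n):
        th = (j + 0.5)*dth
        # integrand (x^2 + y^2) = r^2 sin^2(th); volume element r^2 sin(th)
        I += rho * r**4 * math.sin(th)**3 * dr * dth * (2*math.pi)
exact = 8*math.pi/15 * rho * R**5
print(I, exact)              # both close to 1.6755
```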
CHAPTER 3. CENTRAL FIELDS 52

(c) Write the rotational energy E as a function of the period τ, and show that

        dE/dτ = −(4 × 0.3444 × M π² R²)/τ³.

    Taking τ to be the length of a day, ∆τ to be 2 milliseconds, M =
    5.98 × 10^24 kg, and R = 6.38 × 10^6 m, find the change in the Earth's rota-
    tional energy ∆E over a century. Dividing ∆E by the number of seconds
    in a century, conclude that the power of the ocean tides is 3,260 gigawatts
    (which is 4.37 billion horsepower).
3.9 (Moon recession rate [Nah16]). As in Exercise 3.8, tidal friction decreases
the Earth’s rotational angular momentum. Consequently, the Moon’s orbital
angular momentum increases in order to conserve total angular momentum,
which results in the Moon drifting away from the Earth. We will estimate this
recession rate, assuming that all of the momentum is transferred to the Moon’s
orbit (rather than its rotation).
(a) Consider the Moon as a point mass m orbiting circularly about the Earth
at a radius r, with speed v and angular speed ω radians per second. What
is the magnitude Lm of the Moon's orbital angular momentum about the
Earth?
(b) The gravitational force on the Moon by the Earth has magnitude
        F = GMm r^(−2),
where M is the mass of the Earth and G is the universal gravitational
constant. Equating the gravitational and centripetal accelerations of the
Moon, find v as a function of r. Use this to determine the angular mo-
mentum Lm as a function of r.
(c) From part (b) of Exercise 3.8, the rotational angular momentum of the
    Earth is Le = 0.3444 M R² Ω, where Ω is Earth's rotation rate. Expressing
    Ω in terms of the day length T in seconds, find Le as a function of T and
    calculate dLe/dT.

(d) Using the daily change ∆T = 2 × 10^(−5)/365 seconds in the length of a day,
    approximate the daily and the yearly change ∆Le.
(e) Equating the change in the Moon's orbital momentum ∆Lm with the change
in Earth’s rotational momentum |∆Le |, find the yearly change in the
Moon’s orbital radius. Using the values
    M = mass of the Earth = 5.98 × 10^24 kg,
    m = mass of the Moon = 7.35 × 10^22 kg,
    r = radius of Moon's orbit = 3.84 × 10^8 m,
    R = radius of the Earth = 6.37 × 10^6 m,
    G = gravitational constant = 6.67 × 10^(−11) m³ kg⁻¹ s⁻²,
conclude that the Moon is receding from the Earth at a rate of 3.75 cen-
timeters (or 1.48 inches) per year. This value is in outstanding agreement
with measurements made by a laser on Earth and corner cube reflectors
on the Moon.

3.10. Consider the central field with potential

    V(r) = −kr^(−2)

for k > 0 a constant. Show that

    d²/dt² |x(t)|² = (4/m) E,

where E is the total energy. Conclude that trajectories with negative energy
reach the origin in finite time.
3.11 (Central field scattering). In this problem we consider a classical model for
a beam of charged particles passing near a repulsive central charge. Consider a
repulsive central field in R³ that tends to zero as |x| → ∞. Suppose we have a
uniform beam of particles all of the same mass and energy whose motion begins
and ends colinearly at infinity. The intensity I of the beam is the number of
particles crossing unit area normal to the initial direction of travel per unit time.

(a) Define the impact parameter s for a particle of mass m and initial velocity
    v0 via
        L = m v0 s = s √(2mE).
    Using the facts from section 3.1, show that for the solid angle Ω ⊂ S² the
    scattering cross section σ(Ω)—the number of particles scattered per unit
    solid angle per unit time divided by the incident intensity—is given by

        σ(Θ) = −(s / sin Θ) (ds/dΘ),
where Θ is the angle between the incident and scattered beams.
(b) Suppose the incident particles have charge −q < 0 and the fixed particle
    has charge −Q < 0. The motion is dictated by the Coulomb force

        F(r) = qQ r^(−2) r̂ = −∇V,    V(r) = qQ r^(−1).

    Although we assumed that V = −kr^(−1) with k > 0 in section 3.3 in order
    to model celestial motion, we never needed that k is positive, and so the
    explicit solutions are still valid. By relating the total angle of deflection
    Θ to the eccentricity ε of the hyperbolic trajectories, show that

        sin(Θ/2) = 1/ε,    cot(Θ/2) = 2Es/qQ.

    Using the previous part, conclude that

        σ(Θ) = (qQ/4E)² sin⁻⁴(Θ/2).
    This is the Rutherford scattering cross section. As a classical approxima-
    tion of a quantum system, it has some limitations. For example, if we were
to integrate this expression over the sphere we would obtain an infinite
total scattering cross section, which is of course impossible.
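The deflection-angle relation cot(Θ/2) = 2Es/qQ from part (b) can be checked by direct simulation. The sketch below assumes m = qQ = 1 and launches the particle from a large but finite distance, so the agreement is only approximate.

```python
import math

# Check of cot(Theta/2) = 2Es/(qQ) by direct simulation; m = qQ = 1 are
# assumed, and the particle starts far (but not infinitely far) away.
m, qQ = 1.0, 1.0
v0, s = 1.5, 0.8                        # initial speed and impact parameter
E = 0.5*m*v0*v0

x, y = -100.0, s                        # far to the left, offset by s
vx, vy = v0, 0.0

h = 0.01
for _ in range(40000):                  # integrate until well past the charge
    r3 = math.hypot(x, y)**3
    ax, ay = qQ*x/(m*r3), qQ*y/(m*r3)   # repulsive Coulomb acceleration
    vx += h*ax; vy += h*ay              # symplectic Euler step
    x += h*vx; y += h*vy

theta = math.atan2(vy, vx)              # measured deflection angle
predicted = 2*math.atan(qQ/(2*E*s))     # from cot(Theta/2) = 2Es/qQ
print(theta, predicted)                 # approximately equal
```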
(c) As with Kepler’s problem, the central charge is not truly fixed but in-
stead recoils as a result of the scattering. Assume the central particle
is initially stationary. Let the subscript 1 refer to the scattered particle,
primed coordinates denote coordinates relative to the center of mass, and
X represent the position of the center of mass in laboratory coordinates
    so that x1 = X + x′1, v1 = ẋ1 = Ẋ + v′1. To replace the angle Θ between
    the initial and final vectors between the charges, consider the deflection
    angle θ between the initial and final directions of the scattered particles.
    Use the conservation of total momentum to show that

        tan θ = |v′1| sin Θ / (|v′1| cos Θ + |Ẋ|).

    Then by considering the initial relative velocity, show that

        tan θ = sin Θ / (cos Θ + m1/m2).

    As we would expect experimentally, when m2 ≫ m1 the initially station-
    ary particle does not experience significant recoil and θ ≈ Θ.
(d) The number of particles scattered into a fixed element of solid angle must
    be the same in both the laboratory and center-of-mass coordinates. Con-
    sequently, show that the cross section σ′(θ) in the laboratory system is
    given by

        σ′(θ) = σ(Θ) (sin Θ / sin θ) (dΘ/dθ) = σ(Θ) d(cos Θ)/d(cos θ).

(e) Rutherford was interested in α-particle scattering, for which corrections
    to m1/m2 = 1 are negligible. When m1 = m2, show that θ = Θ/2; in
    particular, 0 ≤ θ ≤ π/2 and there can be no back scattering. Conclude

        σ′(θ) = 4 cos θ σ(2θ),    |v1|/|v0| = cos θ.
PART II

LAGRANGIAN MECHANICS

The Lagrangian perspective is based upon the principle of least action,
which is a coordinate-free reformulation of mechanics on configuration space
(and its tangent bundle). Unpacking the principle of least action yields the
Euler–Lagrange equations of motion—a system with one second order differen-
tial equation for each degree of freedom—and showcases some fundamental ideas
from the calculus of variations. The coordinate-independence allows us to work
with any choice of coordinates or on any manifold. In particular, when the coor-
dinates are chosen to align with symmetries of the system, the Euler–Lagrange
equations are effective in identifying the corresponding conserved quantities,
reducing the number of equations, and simplifying the system.

CHAPTER 4

EULER–LAGRANGE EQUATIONS

We will reformulate Newton's equations as the principle of least action. Un-
packing this axiom to obtain the Euler–Lagrange equations of motion will fea-
ture some fundamental ideas from the calculus of variations. This presentation
of the calculus of variations follows [AKN06, Ch. 1], and the rest of the
material is based on [LL76, Ch. 1–2], [Arn89, Ch. 3], and [Gol51, Ch. 1–2].

4.1. Principle of least action

For a mechanical system with N particles, we replace the configuration space
(R^d)^N by a smooth manifold M. While physics texts often focus on the deriva-
tion for Euclidean space, we will present the theory for general manifolds. This
level of generality is not superfluous; indeed, the freedom in the choice of con-
figuration space and coordinates is a powerful feature of Lagrangian mechanics.
(Familiarity with manifolds is not strictly necessary however, because mathe-
matical statements for manifolds will often be easily reducible to that of Eu-
clidean space; cf. Proposition 4.5.)
We will use q to denote any (local) coordinates on M (sometimes called
generalized coordinates in physics), and v (often q̇ in physics) to denote a
tangent vector in the tangent space Tq M at q. The number of components
n := dim M of q is called the degrees of freedom of the system.
Definition 4.1. A Lagrangian system (M, L ) is a smooth n-dimensional
manifold M called the configuration space together with a smooth function
L (t, q, v) : I × T M → R called the Lagrangian, where T M is the tangent
bundle of M and I ⊂ R is an interval.
As we will see in section 4.2, the following abstract statement encodes New-
ton’s equations:
Definition 4.2 (Hamilton’s principle of least action). A smooth curve q(t) from
[0, T ] into M is a motion of the Lagrangian system (M, L ) if it is a critical
point of the action functional

    S(q(t)) = ∫₀^T L(t, q(t), q̇(t)) dt.    (4.1)


The principle of least action often bears Hamilton’s name, but it was also
independently discovered by Jacobi.
In this section, we will see how to extract equations of motion from the
principle of least action. The principle is called “least” action because it turns
out that the motion is often a minimum; however, we will not make use of this
additional assumption.
The goal of the calculus of variations is to find the extrema of functionals.
A path from a0 ∈ M to a1 ∈ M (not necessarily distinct) starting at time 0
and ending at time T > 0 is a smooth map γ : [0, T ] → M with γ(0) = a0 and
γ(T ) = a1 . Let Ω denote the collection of all such paths. A functional is a
function Φ from Ω into R.
Example 4.3. The arc length of the graph of x(t) from [0, T] into M = R^n is
a functional. It takes as input a smooth function x : [0, T] → R^n and returns
the value

    Φ(x(t)) = ∫₀^T √(1 + |ẋ|²) dt.

Intuitively, we expect the path of minimum length between two points to be the
line segment connecting those two points. Indeed, by the fundamental theorem
of calculus we have

    |(T, a1) − (0, a0)| = |∫₀^T (1, ẋ) dt| ≤ ∫₀^T |(1, ẋ)| dt = ∫₀^T √(1 + |ẋ|²) dt,

with equality if and only if ẋ is constant. The more systematic machinery that
we develop in this section should also return this answer.
In order to solve for the optimizer, we want to obtain an equation that must
be satisfied by the extremum of our functional. From calculus we expect that the
first derivative should vanish at an extremum, and so we want to define a notion
of first derivative for functionals. For more concrete examples we do not need to
assume arbitrary smoothness (cf. Exercise 4.2) or even that an extremum exists
a priori (cf. Exercise 4.5), but for clarity we will assume the extremum exists
and that everything is smooth.
A (fixed-endpoint) variation of a path γ ∈ Ω is a smooth map H(s, t) =
Hs(t) from (−ε, ε) × [0, T] to M for some ε > 0 such that:
• H0 = γ,
• Hs ∈ Ω for all s ∈ (−ε, ε), and
• H(s, 0) = a0 and H(s, T) = a1 for all s ∈ (−ε, ε).
In other words, the paths Hs for various s form a smooth deformation of γ,
which is equal to γ at s = 0 and always connects a0 to a1 for other values of s.
In analogy with calculus on Euclidean space, we will call a functional Φ
differentiable at a path γ ∈ Ω with derivative (or first variation) dΦ|γ (or
δΦ) if the limit

    dΦ|γ (∂Hs/∂s) = lim_{s→0} (Φ(Hs) − Φ(γ))/s

exists for all variations H of γ. In other words, there should exist a function
dΦ|γ so that the Taylor expansion

    Φ(Hs) = Φ(γ) + s dΦ|γ (∂Hs/∂s) + o(s)

holds. We expect the function dΦ|γ will be linear in ∂Hs/∂s (like v ↦ ∇f(x0)·v on
Euclidean space). We need dΦ|γ to be a function of ∂Hs/∂s instead of Hs because the
space of derivatives ∂Hs/∂s will be linear, unlike the space of variations Hs which
will be nonlinear if M is not flat.


It remains to define dΦ|γ and ∂H ∂s . These are technical details that do not
s

appear in the Euclidean theory, but we present them for the sake of the general
manifold theory. We can think of Ω as a manifold (albeit infinite-dimensional),
and a variation as a path on Ω through γ at s = 0. The derivative ∂H ∂s should
s

be tangent to Ω at the path Hs . We define a tangent vector W to Ω at a


path γ ∈ Ω to be a function which associates to each t ∈ [0, T ] a tangent
vector Wt ∈ Tγ(t) M , and is smooth in the sense that for every smooth function
f : M → R the function t 7→ df (Wt ) ∈ R is smooth. The tangent space Tγ Ω is
the space of all tangent vectors W at γ such that W0 = 0 = WT . Now we can
define the tangent vector ∂H∂s (0) to the variation Hs at s = 0 to be the set of
s

derivatives
∂H
(0, t) ∈ Tγ(t) M for t ∈ [0, T ].
∂s
As
∂H ∂H
(0, t0 ) = 0 and (0, t1 ) = 0
∂s ∂s
∂Hs
by the fixed-endpoint requirement, then we indeed have ∂s (0) ∈ Tγ Ω. Now
we define  
∂Hs d
dΦ|γ = Φ ◦ Hs .
∂s ds s=0

One can check that dΦ|γ is well-defined and linear as expected (cf. Exercise 4.1).
Example 4.4. Consider the arc-length functional Φ of Example 4.3. Given
two points a0, a1 ∈ R^n, the space Ω of paths from a0 to a1 is the set of smooth
functions [0, T] → R^n with x(0) = a0 and x(T) = a1. For any path x : [0, T] → R^n,
a variation Hs(t) of x takes the form x(t) + hs(t), where hs : [0, T] → R^n is
smooth satisfying hs(0) = 0 = hs(T) for all s ∈ (−ε, ε) and h0(t) ≡ 0.
The derivative ∂Hs/∂s is equal to ∂hs/∂s(t). As h0(t) ≡ 0, then the variation
at s = 0 is equal to x(t), and so (∂Hs/∂s)(0, t) = (∂hs/∂s)(0, t) is a vector centered at
x(t) and pointing in the direction of the variation hs(t) for small s. Moreover,
(∂Hs/∂s)(0, 0) = 0 and (∂Hs/∂s)(0, T) = 0 since hs(0) = 0 = hs(T) for all s ∈ (−ε, ε).
A tangent vector W to Ω at x(t) is a set of vectors {wt : t ∈ [0, T]} such that
wt is centered at x(t), the vectors w0 and wT at the endpoints vanish, and wt
depends smoothly on t (in the sense that t ↦ v · wt is smooth, or equivalently
every component of wt is a smooth function of t).
For the rest of this section, we will restrict our attention to the action func-
tional S defined in (4.1). If we take L(t, x, y) = √(1 + |y|²), we recover the
functional of Example 4.3. The reason we insisted that L be differentiable is


so that the action is differentiable. To study the consequences of the principle
of least action, we seek a critical point γ for Φ, that is a path γ that satisfies
dΦ|γ ≡ 0.

Proposition 4.5 (Euler–Lagrange equations). The action functional (4.1) is
differentiable, with derivative

    dS|γ (∂Hs/∂s) = ∫₀^T ( ∂L/∂q − d/dt ∂L/∂q̇ ) · ∂Hs/∂s dt.    (4.2)

Consequently, a path q(t) is a critical point for the action if and only if q(t)
solves

    d/dt [ ∂L/∂q̇ (t, q(t), q̇(t)) ] − ∂L/∂q (t, q(t), q̇(t)) = 0.    (4.3)
The n-many second-order differential equations (4.3) are called the Euler–
Lagrange equations for the functional S, or simply the Lagrange equa-
tions when applied to a mechanical system. Note that (4.3) must hold for
any choice of coordinates q, where ∂L/∂q is the gradient of L(t, q, v) in the q
variables. The derivative ∂L/∂q̇ is a convenient notation for the derivative of the
Lagrangian L(t, q, v) in the velocity variables v, and should technically be no-
tated as ∂L/∂v (t, q, q̇). Although (4.3) is often shortened to

    d/dt ∂L/∂q̇ − ∂L/∂q = 0,

we are meant to plug in q and q̇ before taking the total time derivative d/dt, which
for example will turn q̇ into q̈.
Proof. Fix a set of coordinates q on M , let γ be a path, and let H be a variation
of γ. By taking the variation of γ to be supported within the image of the
coordinate patch q and shrinking the interval [0, T ] if necessary, we will work
solely within the domain of q in Rn . Let x(t) denote the path in Rn , L (t, x, ẋ)
denote the Lagrangian in Rn , and x(t) + h(t) the variation Hs (t) in Rn (where
the s-dependence of h is suppressed). Note that the fixed endpoint condition
requires that h(0) = 0 = h(T ).
Now that we may work in Euclidean space, we can use the key idea that
lies at the heart of the Euler–Lagrange theory. As L is differentiable, we may
Taylor expand and write

    L(t, x+h, ẋ+ḣ) = L(t, x, ẋ) + ∂L/∂x (t, x, ẋ)·h(t) + ∂L/∂ẋ (t, x, ẋ)·ḣ(t) + O(|h(t)|²).
Therefore,

    S(γ + h) − S(γ) = ∫₀^T [ L(t, x+h, ẋ+ḣ) − L(t, x, ẋ) ] dt
                    = ∫₀^T [ ∂L/∂x (t, x, ẋ)·h(t) + ∂L/∂ẋ (t, x, ẋ)·ḣ(t) ] dt + O(|h|²).

We pulled the term O(|h(t)|²) outside of the integral to get a term O(|h|²) which
is bounded by the maximum of |h(t)|² because L is continuously differentiable. The
integral in the rightmost expression will be the derivative dS|γ. Integrating by
parts yields

    ∫₀^T ∂L/∂ẋ · ḣ dt = −∫₀^T ( d/dt ∂L/∂ẋ ) · h dt + [ ∂L/∂ẋ · h ]₀^T.

As h(0) = 0 = h(T) the second term on the RHS above must vanish, and
inserting this back into the expression for dS|γ yields (4.2) as desired.
It is clear from the formula (4.2) for dS|γ that if (4.3) is satisfied then
dS|γ ≡ 0. Conversely, assuming that dS|γ = 0 for any variation, we obtain

    ∫₀^T ( ∂L/∂x − d/dt ∂L/∂ẋ ) · h dt = 0

for all smooth h : [0, T ] → Rn with h(0) = 0 = h(T ). We conclude that the
integrand without h vanishes identically, since otherwise we could pick h(t) to be
a bump function that witnesses any nonzero value and obtain dS|γ (h) 6= 0.
In the proof of Proposition 4.5, we only need that the coordinates q are
spanning (and not necessarily independent) in order to have enough directions
h(t) to conclude that the Euler–Lagrange equation holds. Therefore, there is no
obstruction to extending Proposition 4.5 to more than n coordinates provided
that they span the same space.
We also record the following observation:
Corollary 4.6. Given a Lagrangian L(t, q, q̇), adding a total time derivative:

    L̃(t, q, q̇) = L(t, q, q̇) + d/dt [f(t, q)]

leaves the Euler–Lagrange equations (4.3) unchanged.
Proof. The new action is given by

    S̃ = ∫₀^T [ L(t, q, q̇) + df/dt ] dt = S + f(T, q(T)) − f(0, q(0)),

and the addition of a constant to S does not affect whether or not a path is a
critical point of S.
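Corollary 4.6 lends itself to a finite-difference check: the Euler–Lagrange residual d/dt(∂L/∂v) − ∂L/∂x along a test path should be unchanged when a total time derivative is added. The Lagrangian, the gauge function f, and the test path below are arbitrary illustrative choices, not taken from the text.

```python
import math

# Finite-difference check that adding a total time derivative d/dt f(t, x)
# leaves the Euler–Lagrange residual unchanged; all choices below are assumed.
def L(t, x, v):
    return 0.5*v*v - x*x               # a sample Lagrangian

def f(t, x):
    return t*x*x                       # an arbitrary gauge function f(t, x)

def L_tilde(t, x, v, h=1e-5):
    df_dt = (f(t + h, x) - f(t - h, x))/(2*h)
    df_dx = (f(t, x + h) - f(t, x - h))/(2*h)
    return L(t, x, v) + df_dt + df_dx*v    # L plus the total derivative of f

def el_residual(L, q, t, h=1e-4):
    """d/dt(dL/dv) - dL/dx along the path q, by central differences."""
    def Lv(u):
        x, v = q(u), (q(u + h) - q(u - h))/(2*h)
        return (L(u, x, v + h) - L(u, x, v - h))/(2*h)
    x, v = q(t), (q(t + h) - q(t - h))/(2*h)
    Lx = (L(t, x + h, v) - L(t, x - h, v))/(2*h)
    return (Lv(t + h) - Lv(t - h))/(2*h) - Lx

q = math.sin                           # an arbitrary test path
r1 = el_residual(L, q, 0.7)
r2 = el_residual(L_tilde, q, 0.7)
print(abs(r1 - r2))                    # small: the gauge term drops out
```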
Example 4.7. We will apply the Euler–Lagrange equations (4.3) to determine
the extrema of the arc length functional of Example 4.3. Using Cartesian coor-
dinates x on R^n, the Euler–Lagrange equations are

    0 = d/dt ∂L/∂ẋ − ∂L/∂x = d/dt ( ẋ/√(1 + |ẋ|²) ) − 0.

Integrating once, we see

    ẋ/√(1 + |ẋ|²) = a

for some constant a ∈ R^n. Squaring shows that the magnitude |ẋ| must be
constant, from which we then note that the direction of ẋ must also be constant.
Together,
    ẋ = b
for a new constant b ∈ R^n. Finally, integrating yields

    x(t) = bt + c

for b, c ∈ R^n. That is, the extrema of the arc length functional are straight
lines. If we had instead chosen, say, polar coordinates, the Euler–Lagrange
equations (4.3) would hold for a new Lagrangian L, yielding different differen-
tial equations with solutions that still describe straight lines.
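The variational characterization can also be seen in a discretized setting: the straight line minimizes the discretized arc-length action among paths with the same endpoints. A sketch with assumed endpoints and random perturbations:

```python
import math
import random

# The straight line minimizes the discretized arc-length action among paths
# with the same endpoints; the endpoints and perturbations are assumed choices.
T, n = 1.0, 100
h = T/n
a0, a1 = 0.0, 2.0

def action(xs):
    return sum(math.sqrt(h*h + (xs[i+1] - xs[i])**2) for i in range(n))

line = [a0 + (a1 - a0)*i/n for i in range(n + 1)]
S_line = action(line)

ok = True
for _ in range(20):
    # random perturbation vanishing at the endpoints
    bump = [random.uniform(-0.1, 0.1)*math.sin(math.pi*i/n) for i in range(n + 1)]
    if action([p + b for p, b in zip(line, bump)]) < S_line:
        ok = False
print(ok)                              # True: every perturbation is longer
```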

4.2. Conservative systems

In section 4.1 we saw how the principle of least action yields the n = dim M
Euler–Lagrange equations

    d/dt ∂L/∂q̇ − ∂L/∂q = 0.    (4.4)
In this section we will see how (4.4) encodes Newton’s equations, and thus
Newton’s principle of determinacy is implied by Hamilton’s principle of least
action. For this reason, many graduate physics texts opt to begin with the
principle of least action and view Newton’s equations as a consequence.
Let x1 , . . . , xN ∈ Rd be Cartesian coordinates on Euclidean space. For a con-
servative system of N particles, we will see that the right choice of Lagrangian
is
    L(t, x, ẋ) = K(ẋ) − V(x),    (4.5)

where K and V are the kinetic and potential energies. The principle of least
action implies that the motion x(t) = (x1(t), . . . , xN(t)) of the system satisfies
the Euler–Lagrange equations (4.4). For this Lagrangian, we have

    ∂L/∂ẋ = ∂K/∂ẋ = ∂/∂ẋ Σ_{i=1}^N ½ mi |ẋi|² = p,
    ∂L/∂x = −∂V/∂x = −∇V = F.
Therefore the Euler–Lagrange equations are simply Newton's equations (1.2).
The advantage of the Lagrangian perspective is that we are now no longer
restricted to Euclidean space.
Example 4.8. Consider a pendulum consisting of a mass m attached to the end
of a rigid massless rod of length ` with the other end fixed, allowed to rotate
in a vertical plane subject to a constant downward gravitational acceleration
g. Let θ denote the angle from the vertical directly below the pivot, which
entirely describes the system. The configuration space is the circle S 1 , and the
Lagrangian is a (time-independent) function defined for (θ, θ̇) ∈ T S 1 = S 1 × R.
The kinetic energy is
K = 21 mv 2 = 12 m`2 θ̇2 ,
and since the force acting on the mass is F = ma = −mg` sin θ (cf. Exercise 2.1),
then the potential energy is

V = −mg` cos θ.

(We picked our integration constant so that V = 0 when the mass is at the
height of the pivot θ = π/2.) The Lagrangian is given by

L = K − V = ½mℓ²θ̇² + mgℓ cos θ,

and so the Euler–Lagrange equation is


0 = d/dt ∂L/∂θ̇ − ∂L/∂θ = mℓ²θ̈ + mgℓ sin θ ⟹ θ̈ = −(g/ℓ) sin θ,
which agrees with what we found using Newton’s equation in Exercise 2.1.
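The notes themselves contain no code, but the pendulum equation above is easy to test numerically. The following Python sketch (not part of the original text; the mass, length, g = 9.81, step size, and initial angle are arbitrary choices) integrates θ̈ = −(g/ℓ) sin θ with a Runge–Kutta step and verifies that the total energy E = ½mℓ²θ̇² − mgℓ cos θ stays constant along the flow:

```python
import math

def pendulum_step(theta, omega, dt, g=9.81, ell=1.0):
    """One RK4 step for the pendulum equation theta'' = -(g/ell) sin(theta)."""
    def f(th, om):
        return om, -(g / ell) * math.sin(th)
    k1 = f(theta, omega)
    k2 = f(theta + 0.5 * dt * k1[0], omega + 0.5 * dt * k1[1])
    k3 = f(theta + 0.5 * dt * k2[0], omega + 0.5 * dt * k2[1])
    k4 = f(theta + dt * k3[0], omega + dt * k3[1])
    theta += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    omega += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return theta, omega

def energy(theta, omega, m=1.0, g=9.81, ell=1.0):
    """Total energy E = K + V for the pendulum Lagrangian above."""
    return 0.5 * m * ell**2 * omega**2 - m * g * ell * math.cos(theta)

theta, omega = 1.0, 0.0          # released from rest at 1 radian
E0 = energy(theta, omega)
for _ in range(10_000):          # integrate for 10 seconds
    theta, omega = pendulum_step(theta, omega, 1e-3)
energy_drift = abs(energy(theta, omega) - E0)
```

Since the energy is conserved, the amplitude can never exceed the initial angle, which the numerics confirm.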
We saw that for conservative systems, one good choice for the Lagrangian
is (4.5). This choice is not arbitrary, and in fact can be derived from Galileo’s
principle of relativity (Definition 1.3) applied to the Lagrangian L rather than
the force. Consider a single particle in space whose position is denoted by the
Cartesian coordinates x ∈ Rd . In an inertial frame, the Lagrangian L (t, x, ẋ) of
this particle cannot be explicitly dependent on position or time by homogeneity,
and so L ≡ L (ẋ). As ∂L/∂x = 0, then the Euler–Lagrange equation for this
particle is
d/dt ∂L/∂ẋ = 0.
The quantity ∂L/∂ẋ is therefore constant in time, and since L ≡ L (ẋ) then ẋ
must be constant. This is the Lagrangian mechanics proof of Newton’s first
law (cf. Proposition 1.4): in an inertial frame, the motion of a free particle is
uniform in time with constant velocity.
Next, we note that L cannot depend on the direction of ẋ either since space
is isotropic, and so L ≡ L (|ẋ|). Let us write this as L ≡ L (|ẋ|2 ) since we
expect L to be smooth. Let v = ẋ denote the velocity, and consider an inertial
frame K̃ moving with small velocity ε relative to an inertial frame K. Then ṽ = v + ε and |ṽ|² = |v|² + 2v·ε + |ε|², and so

L (|ṽ|²) = L (|v|²) + 2L ′(|v|²) v·ε + O(|ε|²), (4.6)

where L ′ denotes the derivative of L with respect to its argument |v|².

However, as both frames are inertial, the two Lagrangians should be equivalent for all ε. Therefore the linear term of (4.6) should be a total time derivative (cf. Corollary 4.6). As the Lagrangian can only be a function of |v|², this term can only be a total time derivative if it is linear in v. We therefore have that L ′(|v|²) is independent of v, and hence L is proportional to |v|². This allows us to write

L = ½m|v|², (4.7)
where m is the particle’s mass. Experimentally, we would observe that a par-
ticle’s acceleration is inversely proportional to its mass as in section 1.1. Note
that m cannot be negative since from the action (4.1) we see that Hamilton’s
principle would yield maxima instead of minima. We did not use the third type
of Galilean transformations, but the expression (4.7) is automatically invariant
with respect to uniform rectilinear motion ṽ = v + v₀. Indeed, we have

L̃(ṽ) = ½m|ṽ|² = ½m|v + v₀|² = ½m|v|² + mv·v₀ + ½m|v₀|² = L (v) + d/dt ( mx·v₀ + ½m|v₀|² t ),
and so the Euler–Lagrange equations are the same by Corollary 4.6.
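The claim that the boost terms form a total time derivative, and therefore cannot affect which paths are critical, can be checked numerically: adding d/dt(mx·v₀ + ½m|v₀|²t) shifts the action of every fixed-endpoint path by the same endpoint constant. A Python sketch (not from the notes; the discretization, mass, and boost speed are arbitrary choices, in one dimension):

```python
import math

m, v0, T = 2.0, 0.5, 1.0   # mass, boost speed, final time (arbitrary)
N = 2000
dt = T / N
ts = [k * dt for k in range(N + 1)]

def action(path, boosted):
    """Riemann-sum action for L = (m/2) v^2, optionally adding the boost
    cross terms m v v0 + (m/2) v0^2 coming from |v + v0|^2."""
    S = 0.0
    for k in range(N):
        v = (path[k + 1] - path[k]) / dt
        L = 0.5 * m * v * v
        if boosted:
            L += m * v * v0 + 0.5 * m * v0 * v0
        S += L * dt
    return S

path_a = [t for t in ts]                                  # straight line 0 -> 1
path_b = [t + 0.3 * math.sin(math.pi * t) for t in ts]    # same endpoints, curved

# the boost terms are d/dt(m x v0 + m v0^2 t / 2), so both actions shift by
# exactly F(T) - F(0), independent of the path in between
endpoint_term = m * 1.0 * v0 + 0.5 * m * v0**2 * T
gap_a = action(path_a, True) - action(path_a, False)
gap_b = action(path_b, True) - action(path_b, False)
```

Both gaps agree with the endpoint term, so the boosted and unboosted actions rank paths identically.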
For a system of free noninteracting particles, the Lagrangian for each indi-
vidual particle cannot be dependent on the coordinates of any other, and so L
must be additive:
L = Σ_{i=1}^{N} ½ m_i |v_i|² = K,

which yields (4.5) when there is no potential energy. In practice, to change into
another coordinate system we only need to know the line element ds or metric
ds2 in order to know how to transform |ẋ|2 . If we wish to express the Cartesian
coordinates xi as functions of generalized coordinates q = (q1 , . . . , qn ), then we
obtain
K = ½ Σ_{i,j=1}^{n} a_{ij}(q) q̇_i q̇_j (4.8)

for functions aij of the coordinates only. That is, the kinetic energy K in
generalized coordinates is still a quadratic function of velocities, but may also
depend on the other coordinates. Mathematically, a conservative Lagrangian
system is determined by a Riemannian manifold—where the metric determines the
kinetic energy—and a potential function.
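To illustrate (4.8) in the simplest case, take plane polar coordinates x = r cos φ, y = r sin φ, for which a_rr = m, a_{φφ} = mr², and the cross terms vanish. A small Python check (not in the original notes; the sample state is chosen arbitrarily) that the chain rule reproduces exactly this quadratic form:

```python
import math

m = 1.3
r, phi = 1.7, 0.4
rdot, phidot = -0.2, 0.9

# chain rule for x = r cos(phi), y = r sin(phi)
xdot = rdot * math.cos(phi) - r * math.sin(phi) * phidot
ydot = rdot * math.sin(phi) + r * math.cos(phi) * phidot
K_cartesian = 0.5 * m * (xdot**2 + ydot**2)

# quadratic form (4.8) with a_rr = m, a_{phi,phi} = m r^2, cross terms zero
K_polar = 0.5 * (m * rdot**2 + m * r**2 * phidot**2)
```

The two expressions agree to rounding error, as the metric ds² = dr² + r²dφ² predicts.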
To describe a general system of particles which may interact as in (4.5), we
add a function to the Lagrangian. For a conservative system, this function is
the potential energy:
L (q, q̇) = K(q, q̇) − V (q). (4.9)

As we are no longer in Euclidean space, we make the additional assumption


that for a conservative system the force one-form is exact (i.e. F = −∇V for a globally defined potential V, so that work is path independent) and not merely closed. Now the time reversibility (cf. section 2.4) is easily seen as the time-independence of the Lagrangian: time reversal
t 7→ −t preserves each product q̇i q̇j in the quadratic kinetic energy (4.8) and
thus preserves the Lagrangian (4.9). This suggests that the total energy is con-
served for Lagrangians of the form (4.9); we will revisit this in more detail in
Proposition 4.11.

4.3. Nonconservative systems

Thus far we have considered Lagrangians which describe conservative sys-


tems and are of the form L = K − V . In this section we will extend Lagrangian
mechanics to include nonconservative systems. We will see that the correct
choice is
L = K + W, (4.10)
where W is the total work
W = Σ_{i=1}^{n} Q_i · q_i (4.11)
and Qi are the (generalized) forces of the system.
It suffices to consider the case of Euclidean space as in the beginning of
the proof of Proposition 4.5, but we will continue to use the notation q for
convenience. Consider a system whose motion is described by the n coordinates
qj (t) ∈ Rd from time 0 to T , and consider a fixed-endpoint variation qj (t)+hj (t).
Repeating the integration by parts procedure from the proof of Proposition 4.5,
we obtain
d( ∫ K dt )|_{q(t)} (h) = ∫₀ᵀ Σ_{j=1}^{n} ( ∂K/∂q_j · h_j + ∂K/∂q̇_j · ḣ_j ) dt
= [ Σ_{j=1}^{n} ∂K/∂q̇_j · h_j ]₀ᵀ + ∫₀ᵀ Σ_{j=1}^{n} ( ∂K/∂q_j − d/dt ∂K/∂q̇_j ) · h_j dt (4.12)
= ∫₀ᵀ Σ_{j=1}^{n} ( ∂K/∂q_j − d/dt ∂K/∂q̇_j ) · h_j dt.

In the last equality we noted that hj (0) = hj (T ) = 0 for a fixed-endpoint


variation.
From (4.11) we compute the variation of the work to be
d( ∫ W dt )|_{q(t)} (h) = ∫₀ᵀ Σ_{j=1}^{n} Q_j · h_j dt. (4.13)

Adding this in, the principle of least action for the new Lagrangian (4.10) yields
0 = d( ∫ (K + W) dt )|_{q(t)} (h) = ∫₀ᵀ Σ_{j=1}^{n} ( ∂K/∂q_j − d/dt ∂K/∂q̇_j + Q_j ) · h_j dt.

As this must be true for all variations hj , then we conclude


d/dt ∂K/∂q̇_j − ∂K/∂q_j = Q_j for j = 1, . . . , n. (4.14)
These are Lagrange’s equations for nonconservative forces. In other
words, the motion of a nonconservative system is given by Lagrange’s equa-
tions (4.4) for the Lagrangian (4.10).
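As a concrete nonconservative instance of (4.14), take one coordinate q with K = ½mq̇² and a generalized force Q = −kq − γq̇ combining a spring with friction; (4.14) then reads mq̈ = −kq − γq̇. The Python sketch below (not part of the notes; parameters and the simple kick-drift-kick integrator are arbitrary choices) confirms that the mechanical energy decays, as expected for a dissipative force:

```python
m, k, gamma = 1.0, 4.0, 0.5     # mass, spring constant, friction coefficient
dt, steps = 1e-3, 20_000        # integrate for 20 time units

def accel(q, v):
    """(4.14) with K = m v^2/2 and generalized force Q = -k q - gamma v."""
    return (-k * q - gamma * v) / m

q, v = 1.0, 0.0
E0 = 0.5 * m * v ** 2 + 0.5 * k * q ** 2
for _ in range(steps):
    v += 0.5 * dt * accel(q, v)      # kick
    q += dt * v                      # drift
    v += 0.5 * dt * accel(q, v)      # kick
E_final = 0.5 * m * v ** 2 + 0.5 * k * q ** 2
```

For γ > 0 the energy ½mv² + ½kq² shrinks by roughly e^{−γt/m} over the run, while setting γ = 0 recovers the conservative case.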
We never needed to assume that the generalized forces are not conservative,
and so (4.14) must reduce to the familiar formulation of Lagrange’s equations
when the forces are conservative. In the case Q = −∇V (q), we can use the same
integration by parts procedure in reverse to obtain
d( ∫ W dt )|_{q(t)} (h) = ∫₀ᵀ Σ_{j=1}^{n} Q_j · h_j dt = −∫₀ᵀ Σ_{j=1}^{n} ( ∂V/∂q_j − d/dt ∂V/∂q̇_j ) · h_j dt
= [ Σ_{j=1}^{n} ∂V/∂q̇_j · h_j ]₀ᵀ − ∫₀ᵀ Σ_{j=1}^{n} ( ∂V/∂q_j · h_j + ∂V/∂q̇_j · ḣ_j ) dt
= −∫₀ᵀ Σ_{j=1}^{n} ( ∂V/∂q_j · h_j + ∂V/∂q̇_j · ḣ_j ) dt = −d( ∫ V dt )|_{q(t)} (h).

Therefore, the new Lagrangian (4.10) generates the same motion as the conser-
vative Lagrangian L = K − V .
Even when the forces are nonconservative, all we needed in the previous
paragraph was the ability to write the jth component of the generalized force
as
Q_j = d/dt ∂V/∂q̇_j − ∂V/∂q_j . (4.15)
In this case, we recover the familiar form of Lagrange’s equations (4.4) with no
RHS, but now with a velocity-dependent potential V (t, q, q̇).
The quantity V = K − L is called the potential energy even for non-
conservative systems, and is generally time-dependent. A common example is
a system A with coordinates qA that is not closed, but it moves in an external
field due to a system B with coordinates qB (t) independent of qA such that the
entire system A + B is closed. This system has a Lagrangian of the form
L = KA (qA , q̇A ) − V (qA , qB (t)). (4.16)
We may ignore K_B since it depends only on time and is thus a total time derivative. Equation (4.16) is a Lagrangian of the usual type, but with V
being possibly time-dependent. If system A is a single particle, then the Euler–
Lagrange equations yield
mq̈ = −∂V/∂q (t, q) = F(t, q). (4.17)
For example, if F ≡ F(t) is uniform (i.e. independent of position), then V = −F·x in Euclidean space.

4.4. Equivalence to Newton’s equations

Now we will see how to obtain the Euler–Lagrange equations (4.4) from
Newtonian mechanics, which will show that Hamilton’s principle of least action
is equivalent to Newton’s principle of determinacy.
A mechanical system with a configuration manifold M can always be—and
in experiment is automatically—embedded in some Euclidean space RN . Within
M , the motion of the system is dictated by some known force F. The effect
of constraining the motion to the manifold M can be thought of as a force N
orthogonal to M , called the constraint force. Newton’s equations for this
system are

m_i ẍ_i = F_i + N_i .
Rearranging, we see that mi ẍi − Fi = Ni is orthogonal to M , and so
(mi ẍi − Fi ) · ξ i = 0
for all vectors ξ = (ξ 1 , . . . , ξ N ) tangent to M . This is Newton’s equation in
the tangent plane to the surface M . Summing over all particles, we get the
d’Alembert–Lagrange principle:
Σ_{i=1}^{N} (m_i ẍ_i − F_i) · ξ_i = 0 (4.18)

for all vectors ξ ∈ RN tangent to M . In section 5.1 we will see that this principle
more generally dictates the motion of a system with constraints. Note that for
a free system M = Rn we may take any vector ξ ∈ Rn , and so we recover
Newton’s equations.
Let q = (q1 , . . . , qn ) be local coordinates on M . Then by the chain rule we
have
ẋ_i = Σ_{j=1}^{n} ∂x_i/∂q_j q̇_j ,
and so we may write the kinetic energy
K(q, q̇) = Σ_{i=1}^{N} ½ m_i |ẋ_i|² = ½ Σ_{i,j=1}^{n} a_{ij}(q) q̇_i q̇_j

as a positive definite quadratic form on M .


Expressing the force in terms of the coordinates qj , the covectors Qj defined
by the one-form equation
Σ_{i=1}^{N} F_i · dx_i = Σ_{j=1}^{n} Q_j dq_j ,

or equivalently
Q_j = Σ_{i=1}^{N} F_i · ∂x_i/∂q_j , (4.19)

are called the generalized forces. They dictate the evolution of the kinetic
energy via the following expression.
Proposition 4.9. The Newtonian motion q(t) of the system satisfies

d/dt ∂K/∂q̇ − ∂K/∂q = Q. (4.20)

Proof. We repeat the argument from section 4.3. The calculation (4.12) of the
variation of the kinetic energy contribution still holds, since we did not use any
equations of motion. Taking a dot product with an arbitrary tangent vector
ξ to M , we can replace the coordinates x ∈ RN with q ∈ M . Similarly, the
calculation (4.13) still holds on M by the definition (4.19) and the d’Alembert–
Lagrange principle (4.18)—that is, Newton’s equations hold on the manifold M
in terms of the generalized forces. Adding these together, we obtain (4.20) as
desired.
For a conservative system we have that the one-form Q dq is exact and may
be written as −dV for a potential energy V (q). For such a system we have the
Lagrangian L = K − V , and hence (4.20) implies that

d/dt ∂L/∂q̇ − ∂L/∂q = 0.

In section 4.1, we saw that this implies q(t) is a critical point for the action
functional. As q(t) was an arbitrary motion of the system, we conclude that the
principle of least action must hold.

4.5. Momentum and conservation

In this section we will investigate two special mathematical cases when we


can replace a second order Euler–Lagrange equation by a first order equation,
which correspond to two physically important conservation laws.
For a Lagrangian system, the momentum (sometimes called generalized
momentum in physics) of a particle is defined to be

p_i = ∂L/∂q̇_i . (4.21)
If qi = xi is a Cartesian coordinate, the kinetic energy part of the Lagrangian
has a term ½m_i ẋ_i² and so p_i = m_i ẋ_i is the linear momentum along the x_i-axis. If q_i = φ_i is the azimuthal angular coordinate in R³, then the kinetic energy about the z-axis is ½m_i r_i² φ̇_i² and so p_i = m_i r_i² φ̇_i is the angular momentum about the
z-axis.
We can similarly define the (generalized) force as

F_i = ∂L/∂q_i .

This way, Lagrange’s equations (4.4) imply


F_i = d/dt ∂L/∂q̇_i = dp_i/dt ,
and so Newton’s equation is still satisfied for these new quantities. This defi-
nition agrees with the one-form definition (4.19) for Qi , and so both forces Fi
and momenta pi should be interpreted as covectors. This fact is built into the
kinetic energy
K_i = ½ v_i · p_i = ½ ⟨v_i , p_i⟩
on Euclidean space, since vi = ẋi is a tangent vector.
These definitions immediately illuminate a special case when we can replace
a second order Euler–Lagrange equation by a first order equation:
Proposition 4.10 (Conservation of momentum). If a Lagrangian is indepen-
dent of the variable qi , then the corresponding momentum is conserved:
p_i = ∂L/∂q̇_i = constant.
Proof. The Euler–Lagrange equations (4.4) yield
dp_i/dt = d/dt ∂L/∂q̇_i = ∂L/∂q_i = 0.
Any such coordinate qi is called cyclic, and so Proposition 4.10 says that if
a coordinate is cyclic then the corresponding momentum is conserved. This fact
includes the conservation laws we saw in Propositions 1.13 and 1.16. One of
the strengths of Lagrangian mechanics is that if we can find a cyclic coordinate
then there is one less equation of motion to solve. For example, in section 3.1
we picked the z-axis to align with the initial angular momentum, which elimi-
nated the equations for the z-coordinate of the position and momentum. Cyclic
coordinates can also be observed geometrically: Proposition 4.10 requires that
the trajectories q(t) in configuration space M lie in the level sets of pi , and so
the system has a translation symmetry in the qi direction.
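The cyclic-coordinate mechanism can be watched numerically. For a particle in a planar central field the azimuthal angle φ is cyclic, so p_φ = mr²φ̇ = m(xẏ − yẋ) should be constant along the motion. A Python sketch (not from the notes; a Kepler-type force with arbitrarily chosen constants and a leapfrog integrator, which here preserves the angular momentum essentially to rounding error):

```python
m = 1.0
x, y = 1.0, 0.0
vx, vy = 0.0, 1.2
dt, steps = 1e-4, 50_000

def acc(x, y):
    """Attractive central force F = -x/|x|^3 (Kepler problem with GM = 1)."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

Lz0 = m * (x * vy - y * vx)
for _ in range(steps):
    # kick-drift-kick leapfrog step
    ax, ay = acc(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
    x += dt * vx
    y += dt * vy
    ax, ay = acc(x, y)
    vx += 0.5 * dt * ax
    vy += 0.5 * dt * ay
Lz = m * (x * vy - y * vx)
```

Each kick is radial and each drift leaves the velocity fixed, so neither substep changes x vy − y vx; the conservation here is exact up to floating-point rounding.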
We can extend this heuristic to include the time coordinate: when the cyclic
coordinate is time, then the conserved quantity is the total energy. We define
the total energy of a system as
E = Σ_{i=1}^{n} q̇_i p_i − L . (4.22)

This includes our previous definition, since for a conservative system on Eu-
clidean space we have
E = Σ_{i=1}^{n} v_i ∂/∂v_i ( Σ_{j=1}^{n} ½ m_j v_j² − V ) − ( Σ_{j=1}^{n} ½ m_j v_j² − V )
= Σ_{i=1}^{n} m_i v_i² − Σ_{i=1}^{n} ½ m_i v_i² + V = K + V.

Proposition 4.11 (Conservation of energy). The solution q(t) to the Euler–Lagrange equations (4.3) satisfies

E(q) = q̇ · ∂L/∂q̇ (t, q, q̇) − L (t, q, q̇) = −∫₀ᵗ ∂L/∂t (s, q(s), q̇(s)) ds + constant.

In particular, if the Lagrangian L is independent of t, then the total energy E(q) is constant.

In the context of variational calculus this is sometimes called the second


Euler–Lagrange equation, which replaces the second order equation (4.3)
with a first order equation.
Proof. Using the chain rule and the Euler–Lagrange equation (4.3), we have

dL /dt = ∂L /∂t + Σ_i ( ∂L/∂q_i · q̇_i + ∂L/∂q̇_i · q̈_i )
= ∂L /∂t + Σ_i ( d/dt ( ∂L/∂q̇_i ) · q̇_i + ∂L/∂q̇_i · q̈_i ) = ∂L /∂t + d/dt Σ_i ( q̇_i · ∂L/∂q̇_i ).

Rearranging, dE/dt = −∂L /∂t, and integrating in time yields the desired result.
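The time-dependent statement can also be checked numerically: for L = ½q̇² − ½k(t)q² the proposition predicts E(t) − E(0) = ∫₀ᵗ ½k′(s)q(s)² ds. A Python sketch (not part of the notes; the choice k(t) = 1 + ½ sin t, the step size, and the tolerance are arbitrary):

```python
import math

def k(t):
    return 1.0 + 0.5 * math.sin(t)      # time-dependent spring "constant"

def kp(t):
    return 0.5 * math.cos(t)            # its derivative k'(t)

def deriv(t, q, v):
    """Euler-Lagrange equation for L = v^2/2 - k(t) q^2/2: q'' = -k(t) q."""
    return v, -k(t) * q

dt, steps = 1e-3, 10_000
q, v, t = 1.0, 0.0, 0.0
E0 = 0.5 * v * v + 0.5 * k(t) * q * q
work = 0.0                               # accumulates -∫ ∂L/∂t ds = ∫ k'(s) q^2/2 ds
for _ in range(steps):
    work += 0.5 * kp(t) * q * q * dt     # rectangle rule
    k1 = deriv(t, q, v)
    k2 = deriv(t + dt / 2, q + dt / 2 * k1[0], v + dt / 2 * k1[1])
    k3 = deriv(t + dt / 2, q + dt / 2 * k2[0], v + dt / 2 * k2[1])
    k4 = deriv(t + dt, q + dt * k3[0], v + dt * k3[1])
    q += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    v += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    t += dt
E_final = 0.5 * v * v + 0.5 * k(t) * q * q
mismatch = abs((E_final - E0) - work)
```

The crude rectangle-rule quadrature already matches the energy change to a few parts in a thousand; a time-independent k would make both sides vanish.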

4.6. Noether’s theorem

The conservation of momentum (Proposition 4.10) says that a system that


is continuously symmetric along the coordinate qi possesses the corresponding
conserved quantity pi . This turns out to be a general phenomenon: every
one-parameter group of symmetries which preserves a Lagrangian system has a
corresponding conserved quantity.
Consider a Lagrangian system consisting of a smooth n-dimensional manifold
M with a time-independent Lagrangian L (q, v) : T M → R. We say that a
diffeomorphism h : M → M is a symmetry of the system if

L (h∗ (q, v)) = L (q, v) for all (q, v) ∈ T M. (4.23)

Here, h∗ (q, v) = dh(q, v) is the pushforward or differential of the point q ∈ M


and tangent vector v ∈ Tq M , and is equal to the point h(q) ∈ M with the
tangent vector dh|q (v) ∈ Th(q) M .
Proposition 4.12 (Noether’s theorem). If the Lagrangian system (M, L ) is
time-independent and has a differentiable one-parameter group of symmetries hˢ : M → M for s ∈ R, then there is a conserved quantity (or “integral”)
I : T M → R. In local coordinates,

I(q, v) = ∂L/∂q̇ (q, v) · d/ds hˢ(q) |_{s=0} . (4.24)

The quantity I is independent of the choice of local coordinates q. Indeed, I measures the rate of change of L (q, v) as v is varied in T_q M in the direction of the tangent vector d/ds hˢ(q)|_{s=0}, which is why the formula (4.24) looks like the chain rule for d/ds L ((hˢ)_*(q, v))|_{s=0}. This does not intrinsically involve a choice of coordinates.
Proof. As in the set up for the proof of Proposition 4.5, we may fix a coordinate
patch on M and take the variation to be supported within the image in order
to reduce the statement to Euclidean space Rn . Let q(t) : R → Rn denote a
solution to Lagrange’s equations. As hˢ is a symmetry for each s, then hˢ ∘ q will also be a solution: (hˢ ∘ q)(t) = (hˢ)_*(q(t)) is the pushforward of the point q(t) and d/dt (hˢ ∘ q)(t) = dhˢ|_{q(t)}(q̇(t)) = (hˢ)_*(q̇(t)) is the pushforward of the tangent vector q̇(t), and so by (4.23) the Lagrangian evaluated at (hˢ ∘ q)(t) is the same as the Lagrangian evaluated at q(t).
Consider the map Φ : Rs × Rt → Rn given by Φ(s, t) = (hs ◦ q)(t). As all of
the symmetries hs preserve the Lagrangian L , then
0 = d/ds L (Φ, Φ̇) = ∂L/∂q · ∂Φ/∂s + ∂L/∂q̇ · ∂Φ̇/∂s , (4.25)
where everything on the RHS is evaluated at (Φ(s, t), Φ̇(s, t)) ∈ T Rn . As we
just noted, for fixed s the map Φ(s, ·) : R → Rn satisfies the Euler–Lagrange
equation

∂/∂t [ ∂L/∂q̇ ( Φ(s, t), Φ̇(s, t) ) ] = ∂L/∂q ( Φ(s, t), Φ̇(s, t) ).
Inserting this into the RHS of (4.25), we obtain

0 = ∂/∂t [ ∂L/∂q̇ (Φ, Φ̇) ] · ∂Φ/∂s + ∂L/∂q̇ (Φ, Φ̇) · ∂Φ̇/∂s = d/dt [ ∂L/∂q̇ · ∂Φ/∂s ] = dI/dt (Φ, Φ̇)
by the chain rule.
Example 4.13 (Translational symmetry). Consider the conservative N -particle
Lagrangian
L (x, v) = Σ_{i=1}^{N} ½ m_i |v_i|² − V (x), (4.26)

where x = (x1 , . . . , xN ), xi ∈ Rd and similarly for v. If the potential energy is


invariant under translations along the first coordinate axis e₁ ∈ R^d, then the system is symmetric with respect to the N translations

h_j^s : (R^d)^N → (R^d)^N , h_j^s (x₁, . . . , x_N) = (x₁, . . . , x_j + s e₁, . . . , x_N)
for j = 1, . . . , N . Noether’s theorem yields N conserved quantities
I_j(x, v) = ∂L/∂ẋ · (0, . . . , 0, e₁, 0, . . . , 0) = m_j v_j · e₁ ,

which we recognize as the first component of the jth particle’s momentum pj =


mj vj (cf. Corollary 1.14).
Example 4.14 (Rotational symmetry). Now suppose d = 3, and that the
conservative N -particle Lagrangian (4.26) is invariant under rotations about
the first coordinate axis e1 ∈ R3 . Then the system is symmetric with respect
to the N rotations

h_j^s (x₁, . . . , x_N) = (x₁, . . . , cos(s) x_j + sin(s) e₁ × x_j + (1 − cos s)(x_j · e₁) e₁, . . . , x_N)

for j = 1, . . . , N . Noether’s theorem returns the N conserved quantities


I_j(x, v) = ∂L/∂ẋ · (0, . . . , 0, e₁ × x_j, 0, . . . , 0) = p_j · (e₁ × x_j) = (x_j × p_j) · e₁ ,
which we recognize as the first component of the jth particle’s angular momen-
tum Lj = xj × pj (cf. Corollary 1.17).
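Example 4.13 can be tested directly: for two particles interacting through a potential depending only on x₁ − x₂, the total momentum m₁v₁ + m₂v₂ should be conserved. A Python sketch (not from the notes; a spring-type potential with all constants chosen arbitrarily, in one dimension):

```python
m1, m2, k = 1.0, 2.0, 3.0
x1, x2 = 0.0, 1.5
v1, v2 = 0.3, -0.1
dt, steps = 1e-3, 10_000

def force(x1, x2):
    """V = (k/2)(x1 - x2 - 1)^2 depends only on x1 - x2, so it is invariant
    under the simultaneous translation x_i -> x_i + s."""
    return -k * (x1 - x2 - 1.0)   # force on particle 1; particle 2 feels the opposite

p0 = m1 * v1 + m2 * v2
for _ in range(steps):
    f = force(x1, x2)
    v1 += 0.5 * dt * f / m1
    v2 -= 0.5 * dt * f / m2
    x1 += dt * v1
    x2 += dt * v2
    f = force(x1, x2)
    v1 += 0.5 * dt * f / m1
    v2 -= 0.5 * dt * f / m2
p_total = m1 * v1 + m2 * v2
```

Because each kick applies equal and opposite impulses, the Noether momentum is preserved to rounding error regardless of the step size.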

4.7. Exercises

4.1 (Functional derivative is well-defined). Show that if Φ is a differentiable


functional, then its differential is linear and is independent of the choice in
variation of γ.
4.2. Repeat the proof of Proposition 4.5 to prove the following stronger state-
ment on Euclidean space: If L ∈ C 1 (Rn × Rn × [0, T ]; R) and q ∈ C 1 ([0, T ]; Rn )
is a (fixed-endpoint) critical point for the action functional defined in (4.1),
then ∂L/∂q̇ (t, q, q̇) is a C¹ function on [0, T ] and q solves the Euler–Lagrange equations (4.3).

4.3 (Geodesics on the sphere [Tro96, Ch. 1]). In Example 4.7 we saw that
the geodesics—paths of shortest length between two given points—in Rn are
straight lines. We will repeat this procedure for the sphere S 2 .

(a) Using coordinates (φ, θ), we can parameterize a path x(t) in S 2 ⊂ R3 as

x(t) = (cos φ(t) sin θ(t), sin φ(t) sin θ(t), cos θ(t)), t ∈ [0, 1].

Find the formula for the arc length functional Φ(x(t)).


(b) After rotating the sphere we may assume that θ(0) = 0, θ(1) = θ1 , and
φ(1) = 0. Find a simple lower bound for Φ[x(t)] using the φ̇ term. By
considering when equality occurs, conclude that the geodesic connecting
the north pole x(0) to the point x(1) is the shorter arc of the great circle
(an equator, or circle of maximum circumference on the sphere) connecting
the two points. This is another example where we can directly confirm
that the critical path for the functional is a minimum.
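The conclusion of this exercise can be sanity-checked numerically using the arc length functional Φ = ∫ √(θ̇² + sin²θ φ̇²) dt from part (a): the meridian to a point at colatitude θ₁ has length θ₁, and adding any φ-wiggle only increases the length. A Python sketch (not part of the notes; the parameterizations, sample counts, and perturbation size are arbitrary choices):

```python
import math

def arc_length(theta, phi, n=20_000):
    """Midpoint-rule length of a path t -> (theta(t), phi(t)), t in [0, 1],
    on the unit sphere with metric ds^2 = dtheta^2 + sin(theta)^2 dphi^2."""
    total, dt, h = 0.0, 1.0 / n, 1e-6
    for i in range(n):
        t = (i + 0.5) * dt
        thd = (theta(t + h) - theta(t - h)) / (2 * h)   # centered differences
        phd = (phi(t + h) - phi(t - h)) / (2 * h)
        total += math.sqrt(thd ** 2 + math.sin(theta(t)) ** 2 * phd ** 2) * dt
    return total

theta1 = 1.0   # colatitude of the endpoint
meridian = arc_length(lambda t: theta1 * t, lambda t: 0.0)
wiggle = arc_length(lambda t: theta1 * t, lambda t: 0.4 * math.sin(math.pi * t))
```

The meridian's length comes out to θ₁ as expected, and the perturbed path with the same endpoints is strictly longer.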

4.4 (Brachistochrone [Tro96, Ch. 6]). The brachistochrone between two points
in the plane is the curve which a frictionless bead would traverse the quickest
subject to a downward gravitational acceleration. Johann Bernoulli in 1696
challenged mathematicians to find the shape of the brachistochrone, and it was
his brother Jakob Bernoulli who provided a solution which was later refined into
the calculus of variations.
(a) After translating, we may assume that the initial point is the origin (0, 0)
and the second point is given by some (x1 , y1 ) with x1 > 0 and y1 < 0.
Explain why it is reasonable to assume that the brachistochrone is the
graph of a function y(x), x ∈ [0, x1 ] as opposed to a general parametric
curve. Show that the time it takes the bead to traverse this curve is
Φ[y(x)] = ∫₀^{x₁} √(1 + y′(x)²) / v(x) dx
where v(x) is the bead’s speed.
(b) With constant downward acceleration g, show that v(x) = √(2g y(x)) (measuring y positively downward from here on).
(c) Using conservation of the energy (4.22), find the first order differential
equation

y′ √( y / (c² − y) ) = 1.
(d) Introducing a new dependent variable θ(x) so that
y = c² sin²(θ/2) = ½ c² (1 − cos θ), 0 ≤ θ < 2π,
show that
½ c² (1 − cos θ) θ′ = 1.
(e) By integrating the two equations of the previous part, obtain the para-
metric equations
x(θ) = ½ c² (θ − sin θ) + c₁ , y(θ) = ½ c² (1 − cos θ).
In order for x(0) = 0 = y(0), we must have c₁ = 0. That is, the brachistochrone is a cycloid; these equations describe the path traced by a fixed point on a circle of radius ½c² as it rolls along the x-axis in the lower half-plane.
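One can also confirm numerically that the cycloid beats a straight chute. Along the cycloid the travel-time integrand collapses to the constant √(R/g) (writing R = ½c² for the rolling radius), so the descent time to the parameter value θ₁ is θ₁√(R/g); a straight incline to the same endpoint is slower. A Python sketch (not part of the notes; g = 9.81 and R = 1 are arbitrary choices, with y measured downward):

```python
import math

g, R = 9.81, 1.0          # gravity and rolling-circle radius R = c^2/2 (arbitrary)
theta1 = math.pi          # parameter of the cycloid's lowest point

# travel time down the cycloid x = R(t - sin t), y = R(1 - cos t):
# dt_time = ds/v with ds = 2R sin(theta/2) dtheta and v = sqrt(2 g y)
n = 100_000
T_cycloid = 0.0
for i in range(n):
    th = (i + 0.5) * theta1 / n
    ds = 2.0 * R * math.sin(th / 2.0)
    v = math.sqrt(2.0 * g * R * (1.0 - math.cos(th)))
    T_cycloid += ds / v * (theta1 / n)

T_exact = theta1 * math.sqrt(R / g)    # the integrand collapses to sqrt(R/g)

# straight chute to the same endpoint (x1, y1) = (pi R, 2R): uniform
# tangential acceleration g*y1/L gives L = (g y1 / L) T^2 / 2
x1, y1 = math.pi * R, 2.0 * R
L = math.hypot(x1, y1)
T_line = math.sqrt(2.0 * L * L / (g * y1))
```

The quadrature reproduces θ₁√(R/g) to rounding error, and the straight chute takes noticeably longer, as the brachistochrone property demands.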
4.5 (Lagrangian PDE I [Eva10, Ch. 8]). In this exercise we will prove the
existence of a solution to the elliptic PDE
−∇ · (A(x)∇u(x)) = 0 for x ∈ Ω, u(x) = 0 for x ∈ ∂Ω,
on an open set Ω ⊂ Rn . Here, A(x) = (aij (x)) is a symmetric n × n matrix
with aij ∈ H 2 (Ω) (the Sobolev space), and we also assume that A is uniformly
elliptic:
λI ≤ A(x) ≤ ΛI for x ∈ Ω
(in the sense of positive definite matrices).

(a) For u ∈ H₀¹(Ω) (the closure of C_c^∞(Ω) in H¹(Ω)), show that the energy functional

E(u) = ½ ∫_Ω ∇u(x) · A(x)∇u(x) dx

is finite. Show that for φ ∈ C_c^∞(Ω) the first variation at u is

lim_{ε→0} [ E(u + εφ) − E(u) ] / ε = ∫_Ω ∇φ · A∇u.

Consequently, if u is a minimum of E, then the above expression vanishes


for all φ ∈ Cc∞ (Ω); such a function u ∈ H01 (Ω) is called a weak solution of
the PDE and boundary condition, since one formal integration by parts
would produce the PDE—but we do not assume u is twice differentiable.
(b) Let uj be a sequence in H01 (Ω) such that
lim_{j→∞} E(u_j) = inf_{H₀¹(Ω)} E.

Using Poincaré’s inequality, show that E(u) is bounded below and hence
our sequence is bounded. Conclude that there exists a weakly convergent
subsequence u_{j_k} ⇀ u in H¹(Ω) using the Riesz representation theorem.
(c) Show that E is weakly lower semicontinuous:
E(u) ≤ lim inf_{k→∞} E(u_{j_k}).

Conclude that u ∈ H₀¹(Ω) is a minimum for E, and hence is a weak solution to the PDE by part (a).
4.6 (Two pendulums connected by a spring). Consider two pendulums of unit
length and unit mass in a constant gravitational field g. Suppose they are
connected by a massless spring with spring constant k whose resting length is
the same as their distance of separation.
(a) Let θ1 and θ2 be the angles the pendulums make with the downward
verticals. Find the Lagrangian for the system for small angles, so that
sin θ ≈ θ.
(b) Define new variables
θ1 + θ2 θ1 − θ2
q1 = √ , q2 = √ ,
2 2
and show that the Lagrangian separates into two harmonic oscillators:

L = ½(q̇₁² + q̇₂²) − ½(ω₁² q₁² + ω₂² q₂²).
Find ω1 and ω2 . When q2 = 0, we have θ1 = θ2 and so both pendulums
swing in phase with each other with frequency ω1 . When q1 = 0, we
have θ2 = −θ1 and so the pendulums swing in exact opposite phase with
frequency ω2 .

(c) For k ≪ 1 we will see that an exchange of energy occurs. Suppose that at time t = 0 we have θ₁ = 0 = θ₂, θ̇₂ = 0, and θ̇₁ = v₀. Using part (b), show that the motion is given by

θ₁(t) = (v₀/2)( sin t + (1/ω) sin ωt ), θ₂(t) = (v₀/2)( sin t − (1/ω) sin ωt )

with ω = ω₂. For k ≪ 1 we have 1/ω ≈ 1, and so

θ₁(t) ≈ v₀ cos(εt) sin(ω̃t), θ₂(t) ≈ −v₀ cos(ω̃t) sin(εt)

for some ε and ω̃. Show that to leading order as k → 0 we have

ε ≈ k/2, ω̃ ≈ 1,

and so after a time T = π/(2ε) ≈ π/k the pendulums have switched roles and now essentially only the second pendulum is oscillating.
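The energy exchange in part (c) can be observed numerically from the closed-form solution: the slow envelope of pendulum 1 behaves like |cos εt| and that of pendulum 2 like |sin εt|, so at time T = π/(2ε) the roles have swapped. A Python sketch (not from the notes; k = 0.01, v₀ = 1, g = 1, and the sampling grid are arbitrary choices):

```python
import math

k, v0 = 0.01, 1.0
w = math.sqrt(1.0 + 2.0 * k)        # omega_2, taking g = 1 for unit pendulums

def theta1dot(t):
    return 0.5 * v0 * (math.cos(t) + math.cos(w * t))

def theta2dot(t):
    return 0.5 * v0 * (math.cos(t) - math.cos(w * t))

eps = 0.5 * (w - 1.0)               # slow "beat" frequency, ~ k/2 for small k
T = math.pi / (2.0 * eps)           # ~ pi/k: time for the pendulums to swap roles

def peak(f, t0):
    """Largest |f| sampled over one fast period starting at t0."""
    return max(abs(f(t0 + 0.01 * j)) for j in range(629))

peak1_start, peak2_start = peak(theta1dot, 0.0), peak(theta2dot, 0.0)
peak1_later, peak2_later = peak(theta1dot, T), peak(theta2dot, T)
```

At t = 0 essentially all the motion is in pendulum 1; one beat quarter-period later it has migrated almost entirely to pendulum 2.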
4.7 (Charged particle in an electromagnetic field). A charged particle moving in
an electromagnetic field is an example of a nonconservative Lagrangian system.
A particle with charge q moving through the vector fields E, D, B, H on R3
with the scalar charge density ρ and vector current density j obeys Maxwell’s
equations (in Gaussian units):
∇ × E + (1/c) ∂B/∂t = 0, ∇ · D = 4πρ,
∇ × H − (1/c) ∂D/∂t = (4π/c) j, ∇ · B = 0.
The force on the charge is given by the Lorentz law

F = q ( E + (1/c)(v × B) ).

(a) The fourth Maxwell equation requires that B is divergence-free, and so


we introduce a vector-valued potential A for B so that B = ∇ × A. Using
the first equation, introduce a scalar potential φ so that the electric field
becomes
E = −∇φ − (1/c) ∂A/∂t
and the Lorentz force is

F = q ( −∇φ − (1/c) ∂A/∂t + (1/c) (v × (∇ × A)) ).

(b) The Lorentz force is nonconservative, but if we can put it in the form (4.15)
then we will have a Lagrangian for this system. The first term qφ is already
of the desired form. Show that the x-component of the rightmost term
v × (∇ × A) may be rewritten as
(v × (∇ × A))_x = ∂/∂x (v · A) − dA_x/dt + ∂A_x/∂t .
By symmetry, we get the same relation for the other components with x
replaced by the respective variable.

(c) Show that the x-component of the Lorentz force can be written as
F_x = −∂V/∂x + d/dt ∂V/∂v_x
for the potential energy

V = qφ − (q/c) v · A.

Symmetrically, the y- and z-components of the Lorentz force are also of


this form if we replace x with the respective coordinates. Consequently,
the Lagrangian for this system is

L = K − V = ½mv² − qφ + (q/c) v · A

where m is the particle’s mass.
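A standard consequence worth checking numerically: with E = 0 the magnetic part of the Lorentz force does no work, so the particle's speed is constant while (v_x, v_y) rotates at the gyrofrequency ω = qB/(mc). A Python sketch (not part of the notes; the Gaussian-unit constants are arbitrary choices, and each step applies the exact in-plane rotation):

```python
import math

q, m, c, B = 1.0, 2.0, 1.0, 3.0      # charge, mass, c, field strength (arbitrary)
vx, vy, vz = 1.0, 0.0, 0.2
speed0 = math.sqrt(vx ** 2 + vy ** 2 + vz ** 2)

# m dv/dt = (q/c) v x (B zhat): (vx, vy) rotates at omega = qB/(mc), vz is constant
omega = q * B / (m * c)
dt, steps = 1e-3, 10_000
for _ in range(steps):
    ang = omega * dt
    vx, vy = (vx * math.cos(ang) + vy * math.sin(ang),
              -vx * math.sin(ang) + vy * math.cos(ang))

speed = math.sqrt(vx ** 2 + vy ** 2 + vz ** 2)
```

The speed is preserved and the parallel component v_z never changes, reflecting that v × B is always perpendicular to v.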

4.8 (Noether’s theorem with time dependence). Consider a Lagrangian system


L : R × T M → R with time dependence.

(a) Prove an extension of Noether’s theorem for this system by applying


Proposition 4.12 to the extended configuration space M1 = M × R with
Lagrangian

L₁( q, t, dq/dτ, dt/dτ ) = L ( q, (dq/dτ)/(dt/dτ), t ) dt/dτ
and the new time variable τ , to obtain a conserved quantity I(t, q, q̇) on
M.
(b) Apply this to a time-independent Lagrangian L (t, q, v) ≡ L (q, v) to con-
clude that the total energy is conserved.

4.9 (Lagrangian PDE II). In this exercise, we will explore the formal Lagrangian
structure associated with the wave equation. The Lagrangian formulation of
the wave equation (as opposed to the Hamiltonian formulation) is advantageous
because it shares the Lorentz symmetries of the wave equation. For further
details, see [SS98].

(a) Let

g = diag(−1, 1, . . . , 1) = (g_{αβ})_{α,β=0,...,d}
denote the (d + 1) × (d + 1) matrix associated with the Minkowski metric
ds2 = −dt2 + dx21 + · · · + dx2d on Rt × Rdx , with coordinates (x0 , . . . , xd ) =
(t, x1 , . . . , xd ). Consider the Lagrangian
L (u, ∇u) := ½ Σ_{α,β=0}^{d} (g⁻¹)^{αβ} ∂u/∂x_α ∂u/∂x_β + F(u) = −½ (∂u/∂t)² + ½ |∇u|² + F(u).

Show formally that a function u : R × Rd → C is a critical point for the


action

S(u) := ∫_{R×R^d} L (u, ∇u) √(− det g) dx dt

if and only if u solves the (semilinear) wave equation

−∂²u/∂t² + ∆u = F′(u).

(b) Now we will take F ≡ 0 and derive the conservation laws of the linear wave
equation. Consider the same Lagrangian L (u, ∇u, g) and action S(u, g)
to be functions of the metric g. Given τ : R × Rd → R × Rd a smooth and
compactly supported diffeomorphism, let gs , s ∈ R be the pullback of the
metric g by the map id + sτ , so that
(g_s)_{αβ} |_{s=0} = g_{αβ} , d/ds (g_s)_{αβ} |_{s=0} = ∂τ_β/∂x_α + ∂τ_α/∂x_β =: π_{αβ} .

Similarly, let u_s be the pullback of u by the map id + sτ , so


that u_s|_{s=0} = u. Using that d/ds S(u_s, g_s)|_{s=0} = 0 for all diffeomorphisms τ
(because the Lagrangian is invariant under the change of variables (t, x) 7→
(t, x) + sτ (t, x)), show formally that if u is a critical point for S(u, g) then
∫_{R×R^d} Σ_{α,β=0}^{d} T^{αβ} π_{αβ} √(− det g) dx dt = 0,

where

T^{αβ} := ∂L /∂(g⁻¹)_{αβ} − ½ g_{αβ} L

is the stress-energy tensor. Conclude that T^{αβ} is divergence-free:
Σ_{α=0}^{d} ∂T^{αβ}/∂x_α = 0 for β = 0, . . . , d.

These yield the microscopic conservation laws


∂T⁰⁰/∂t + Σ_{k=1}^{d} ∂T⁰ᵏ/∂x_k = 0, ∂T⁰ʲ/∂t + Σ_{k=1}^{d} ∂Tʲᵏ/∂x_k = 0 for j = 1, . . . , d.

(c) Integrate in space to formally show that the total energy


E(u) := ∫_{R^d} T⁰⁰(t, x) dx = ∫_{R^d} ( ¼ |∂u/∂t|² + ¼ |∇u|² ) dx

and the components of the momentum


p_j(u) := ∫_{R^d} T⁰ʲ(t, x) dx = −½ Re ∫_{R^d} ∂u/∂t · ∂u/∂x_j dx for j = 1, . . . , d

are conserved. These are the corresponding macroscopic conservation laws.
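The macroscopic conservation law of part (c) can be observed even in a crude discretization: a leapfrog scheme for the 1d wave equation u_tt = u_xx on a periodic interval keeps the discrete energy (using the exercise's normalization ∫ ¼u_t² + ¼u_x² dx) nearly constant. A Python sketch (not part of the notes; grid size, time step, and initial data are arbitrary choices):

```python
import math

N, steps = 128, 1000
dx = 2 * math.pi / N
dt = 0.5 * dx                  # comfortably inside the stability limit dt <= dx

u = [math.sin(i * dx) for i in range(N)]   # initial displacement
v = [0.0] * N                              # initial velocity u_t

def lap(w, i):
    """Periodic second difference approximating u_xx."""
    return (w[(i + 1) % N] - 2 * w[i] + w[(i - 1) % N]) / dx**2

def energy(u, v):
    """Discrete version of the exercise's energy: sum of u_t^2/4 + u_x^2/4."""
    E = 0.0
    for i in range(N):
        ux = (u[(i + 1) % N] - u[i]) / dx
        E += (0.25 * v[i] ** 2 + 0.25 * ux ** 2) * dx
    return E

E0 = energy(u, v)
for _ in range(steps):
    # kick-drift-kick (velocity Verlet) time stepping
    a = [lap(u, i) for i in range(N)]
    v = [v[i] + 0.5 * dt * a[i] for i in range(N)]
    u = [u[i] + dt * v[i] for i in range(N)]
    a = [lap(u, i) for i in range(N)]
    v = [v[i] + 0.5 * dt * a[i] for i in range(N)]

relative_drift = abs(energy(u, v) - E0) / E0
```

The discrete energy oscillates by O(dt²) about its initial value rather than drifting, mirroring the exact conservation of E(u) for the continuum equation.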
CHAPTER 5

CONSTRAINTS

In this chapter we explore some physical and mathematical formulations of


constraints. Note that this does not include methods of solving the equations
of constrained motion; cf. [AKN06, Sec. 1.6]. The treatment of holonomic con-
straints is based on [AKN06, Ch. 1], the treatment of nonholonomic constraints
is based on [Gol51, Ch. 1–2], and section 5.5 is based on [MZ05].

5.1. D’Alembert–Lagrange principle

A holonomic constraint on a Lagrangian system (M, L ) is the require-


ment that the system’s motion is confined to a submanifold S of the phase space
T M that locally can be expressed by

f1 (t, q, q̇) = · · · = fk (t, q, q̇) = 0. (5.1)

A constraint that is not holonomic is called a nonholonomic constraint.


Example 5.1. Rigid-body motion in Euclidean space Rd requires that the dis-
tance between any two particles xi ∈ Rd is fixed. This is a holonomic constraint
because it can be expressed as

|xi − xj |2 − c2ij = 0, (5.2)

where cij are the inter-particle distances.


Conversely, the motion of particles xi ∈ Rd confined to a rigid spherical con-
tainer of radius R centered at the origin is a nonholonomic constraint, because
it is given by
|xj |2 − R2 ≤ 0.
Both of these examples happen to be time-independent, but this need not be
the case.
In this section, we will assume that we have a holonomic constraint. A vector
ξ ∈ Tq M is tangent to the submanifold S provided that

∂f₁/∂q̇ (t, q, q̇) · ξ = · · · = ∂f_k/∂q̇ (t, q, q̇) · ξ = 0. (5.3)


Such a vector ξ is called a virtual velocity of the constrained motion at the


state (q, q̇) ∈ T M and time t. We can ensure that the motion q(t) is constrained
to S by insisting that  
d ∂L ∂L
− ·ξ =0 (5.4)
dt ∂ q̇ ∂q
for all virtual velocities ξ at the state (q(t), q̇(t)).
Definition 5.2 (d’Alembert–Lagrange principle). A motion of the Lagrangian
system (M, L ) subject to the holonomic constraints (5.1) is a smooth trajectory
solving (5.4) for all virtual velocities ξ.
By the definition (5.3) of the virtual velocities, this requires that the LHS
of the Euler–Lagrange equations be within the span of the derivatives ∂f_i/∂q̇. In other words, there exist constants µ₁, . . . , µ_k so that


d/dt ∂L/∂q̇ − ∂L/∂q = Σ_{i=1}^{k} µ_i ∂f_i/∂q̇ . (5.5)

These are called Lagrange’s equations with multipliers. Mathematically,


we recognize the constants µi as Lagrange multipliers: in order for the action
functional to obtain a minimum on the submanifold S, the gradient of the action
(the LHS of (5.5)) must be orthogonal to the submanifold S. Physically, if we
imagine that the particle system is confined to the submanifold S via fictitious
constraint forces given by the RHS of (5.5) (cf. section 4.4), then (5.4) states
that the constraint forces must be orthogonal to the motion and hence do no
work.
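The physical reading of (5.5), namely that the constraint force is orthogonal to every virtual velocity and so does no work, can be checked numerically for the pendulum: reconstructing N = ma − F from a trajectory of θ̈ = −(g/ℓ) sin θ, one finds N parallel to the rod and N · v = 0. A Python sketch (not part of the notes; mass, length, g, and the initial data are arbitrary choices):

```python
import math

m, g, ell = 1.0, 9.81, 1.0
theta, omega = 0.7, 0.0
dt, steps = 1e-4, 5000

for _ in range(steps):
    # kick-drift-kick integration of theta'' = -(g/ell) sin(theta)
    omega += 0.5 * dt * (-(g / ell) * math.sin(theta))
    theta += dt * omega
    omega += 0.5 * dt * (-(g / ell) * math.sin(theta))

# reconstruct the Cartesian state (pivot at the origin, y pointing up)
x, y = ell * math.sin(theta), -ell * math.cos(theta)
vx, vy = ell * math.cos(theta) * omega, ell * math.sin(theta) * omega
alpha = -(g / ell) * math.sin(theta)                  # theta''
ax = ell * (math.cos(theta) * alpha - math.sin(theta) * omega**2)
ay = ell * (math.sin(theta) * alpha + math.cos(theta) * omega**2)

# constraint (rod) force: N = m*a - F with gravity F = (0, -m g)
Nx, Ny = m * ax, m * ay + m * g
power = Nx * vx + Ny * vy          # rate of work done by the constraint force
radial_cross = Nx * y - Ny * x     # zero iff N is parallel to the rod direction
```

Both quantities vanish identically (up to rounding) at every state, independent of the integration accuracy, because they are algebraic consequences of the equation of motion.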
We can also observe the constraint effect in terms of the principle of least
action and the admissible path variations. Given a path γ ∈ Ω, let Γ be the
subspace of the tangent space Tγ Ω consisting of vectors W where Wt is a virtual
velocity for each t. For holonomic constraints, the following proposition states
that the constrained Lagrangian system on M is given by the principle of least
action on S, and hence the system’s degrees of freedom are effectively reduced.

Proposition 5.3 (Hölder’s principle). A path γ ∈ Ω is a motion of the La-


grangian system with the holonomic constraints S if and only if the functional
derivative dS|γ of the action vanishes on the subspace Γ.
Proof. By Proposition 4.5, the first derivative of the action S at the path q(t)
in the direction of the fixed-endpoint variation H_s(t) is

dS|γ (∂H_s/∂s) = ∫₀ᵀ (∂L/∂q − d/dt ∂L/∂q̇) · ∂H_s/∂s dt.

Restricting dS|γ to the subspace Γ is equivalent to requiring that ∂H_s/∂s is a virtual velocity. As every virtual velocity can be attained by some variation ∂H_s/∂s, we see from the above variation that dS|γ vanishes on Γ if and only if the
d’Alembert–Lagrange principle (5.4) is satisfied.

Example 5.4. We can think of the pendulum of Example 4.8 as a particle of mass m in a vertical plane (x, y) ∈ R² subject to a downward gravitational force with potential V(x, y) = mgy and to the holonomic constraints

x² + y² − ℓ² = 0,    2xẋ + 2yẏ = 0,

where ℓ is the length of the pendulum arm. Note that the second condition
is a time derivative of the first equation, but is necessary to specify the two-
dimensional submanifold S in the four-dimensional tangent space T R2 = R2 ×
R2 . If we let θ denote the angle from the downward vertical and r the distance
from the pivot, then the Lagrangian becomes

L = ½ m(r²θ̇² + ṙ²) + mgr cos θ,

and the holonomic conditions are

f₁(r, θ, ṙ, θ̇) = r² − ℓ² = 0,    f₂(r, θ, ṙ, θ̇) = ṙ = 0.

The first constraint does not place any restrictions on the virtual velocities ξ = (ξ_r, ξ_θ) since ∂f₁/∂(ṙ, θ̇) = 0, but the second condition yields

0 = ∂f₂/∂(ṙ, θ̇) · ξ = ξ_r.

The d’Alembert–Lagrange principle (5.4) then yields the condition

(mr̈ − mrθ̇² − mg cos θ)ξ_r + (mr²θ̈ + 2mrṙθ̇ + mgr sin θ)ξ_θ = 0.

As ξ_r = 0 and ξ_θ is arbitrary, we conclude that the second parenthetical term above vanishes. After using the constraints f₁ and f₂, we obtain the familiar equation of motion

θ̈ = −(g/ℓ) sin θ.
The holonomic constraints have effectively discarded the r equation and reduced
the degrees of freedom from two to one.
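As a quick numerical cross-check of this example (an illustration, not part of the text's development), the sketch below integrates the Cartesian pendulum with the multiplier λ eliminated by differentiating the constraint |x|² = ℓ² twice, and compares it against the reduced equation θ̈ = −(g/ℓ) sin θ. The parameter values and the RK4 integrator are arbitrary choices.

```python
import numpy as np

g, ell = 9.8, 1.0

def cart_rhs(s):
    # s = (x, y, vx, vy): unit-mass pendulum in Cartesian coordinates.
    # The multiplier comes from the twice-differentiated constraint x·a + |v|^2 = 0.
    x, y, vx, vy = s
    lam = (g * y - (vx**2 + vy**2)) / ell**2
    return np.array([vx, vy, lam * x, lam * y - g])

def polar_rhs(s):
    th, w = s
    return np.array([w, -(g / ell) * np.sin(th)])

def rk4(f, s, dt, n):
    for _ in range(n):
        k1 = f(s); k2 = f(s + dt/2*k1); k3 = f(s + dt/2*k2); k4 = f(s + dt*k3)
        s = s + dt/6*(k1 + 2*k2 + 2*k3 + k4)
    return s

th0 = 0.5                       # angle measured from the downward vertical
s_cart = rk4(cart_rhs, np.array([ell*np.sin(th0), -ell*np.cos(th0), 0.0, 0.0]), 1e-3, 1000)
s_pol = rk4(polar_rhs, np.array([th0, 0.0]), 1e-3, 1000)
theta_cart = np.arctan2(s_cart[0], -s_cart[1])
print(abs(theta_cart - s_pol[0]))                  # agreement of the two formulations
print(abs(np.hypot(s_cart[0], s_cart[1]) - ell))   # drift off the constraint circle
```

Both integrations track the same motion, and the constraint force λx stays orthogonal to the velocity, so the constraint holds to integrator accuracy.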

5.2. Gauss’ principle of least constraint

Unlike the principle of least action, the d’Alembert–Lagrange principle is not


an optimization problem due to the presence of the virtual velocities. Gauss
sought to recast the constrained system as an optimization problem, and he
discovered that the actual constrained motion is the feasible constrained motion
which deviates the least from the unconstrained motion.
The state of the system at time 0 is determined by the position q0 = q(0)
and velocity v0 = q̇(0) at a fixed time, and then the acceleration a(0) = q̈(0) is
determined by the laws of motion and constraints. Consequently, we will con-
sider q0 and v0 as fixed while we vary a0 . For a fixed state (q0 , v0 ) ∈ T M and
time 0, we will refer to all paths q(t) allowed by the constraints with q(0) = q0

and q̇(0) = v0 as the conceivable motions; these lie in the constraint sub-
manifold S but do not necessarily satisfy the equations of motion. A released
motion satisfies the unconstrained Euler–Lagrange equations (4.4) (and does
not lie in S), and an actual motion is a conceivable motion satisfying the
d’Alembert–Lagrange principle (5.4) (and hence lies in S).
By considering only variations supported in a fixed coordinate patch, we may
reduce to a neighborhood about the point q0 and work in Euclidean coordinates
with the vector notation q(t) = x(t). We will write an actual motion of the
system xa (t) = xr (t) + δx(t) as a deviation from the released motion xr (t),
and we assume that the initial positions xa (0) = xr (0) = x0 and velocities
ẋa (0) = ẋr (0) = v0 are fixed. Taylor expanding we have

x_r(t) = x₀ + tv₀ + ½t² ẍ_r(0) + O(t³),
x_a(t) = x₀ + tv₀ + ½t² ẍ_a(0) + O(t³).

Therefore we have

δx(t) = ½t² δẍ(0) + O(t³).    (5.6)
So far, we have not yet used the constraints.
In our local coordinates the d'Alembert–Lagrange principle requires

(d/dt ∂L/∂ẋ − ∂L/∂x) · ξ = 0    (5.7)

for all virtual velocities ξ. For conservative systems, ∂L/∂x is the system's force −∇V and d/dt ∂L/∂ẋ is the total force mẍ (i.e. the system's force plus the fictitious constraint forces). So in this special case, we see that d/dt ∂L/∂ẋ − ∂L/∂x is the force m δẍ due to the constraints. In the general case, we evaluate d/dt ∂L/∂ẋ − ∂L/∂x at x_a(t) = x_r(t) + δx(t) and Taylor expand about x_r(t):
   
(d/dt ∂L/∂ẋ − ∂L/∂x)|_{x=x_a} = (d/dt ∂L/∂ẋ − ∂L/∂x)|_{x=x_r}
    + [d/dt (∂²L/∂ẋ² δẋ + ∂²L/∂ẋ∂x δx) − ∂²L/∂x² δx − ∂²L/∂ẋ∂x δẋ]|_{x=x_r} + O(|δx|²).

The first term on the RHS vanishes since the released motion x_r solves Lagrange's equations. For the second term, we use (5.6) and take the limit t → 0 to obtain

(d/dt ∂L/∂ẋ − ∂L/∂x)|_{x=x_a, t=0} = (∂²L/∂ẋ²)|_{x=x_r, t=0} δẍ(0).

Therefore, we see that (5.7) requires

(∂²L/∂q̇²)|_{q=q_r, t=0} (q̈_a(0) − q̈_r(0))    (5.8)

to be orthogonal to the constraint submanifold S. On the other hand, (5.8) is the gradient of the functional

Z(q(t)) = ½ (q̈ − q̈_r) · A (q̈ − q̈_r)|_{t=0}    (5.9)

for the Hessian matrix A = ∂²L/∂q̇² of the kinetic energy quadratic form at x = x_r

and t = 0. For conservative systems, A is a positive definite matrix containing


the particle masses. The quantity (5.9) is called Gauss’ compulsion, and it
measures how much the motion q(t) deviates from the released motion.
As the gradient (5.8) of the compulsion functional (5.9) evaluated at the
actual motion qa is orthogonal to the submanifold S, we conclude that the actual
motion qa is a critical point for the compulsion functional. Moreover, all that we
used to prove this was that the constants µi in (5.5) are Lagrange multipliers.
In fact, the converse of this statement is also true. Let f (q) be a smooth
functional on M with a critical point qa when restricted to the submanifold S.
This happens if and only if D(f ◦ φ)|qa = 0 for all coordinates φ from Euclidean
space to a neighborhood of qa in S. Writing D(f ◦ φ) = ∇f · Dφ and noting that
the columns of the matrix Dφ span the tangent space Tq S ⊂ Tq M , we conclude
that q_a is a critical point on S if and only if the gradient ∇f(q_a) ∈ T_q M is orthogonal to T_q S.
Altogether, we have proven the following:
Theorem 5.5 (Gauss’ principle). Among the conceivable motions, the actual
motion q(t) is a critical point for the compulsion (5.9) with respect to the released
motion. In particular, if A is positive definite then the actual motion is the global
minimum for the compulsion with respect to the released motion.

See Exercise 5.1 for an example of Gauss’ principle.
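In the same spirit as Exercise 5.1, here is a hedged numerical sketch of Gauss' principle for the pendulum: the conceivable accelerations are those satisfying x · a = −|v|² (differentiate |x|² = ℓ² twice), and a brute-force scan of the compulsion ½|a − a_released|² along this affine line recovers the d'Alembert–Lagrange acceleration from Example 5.4. The state and grid below are arbitrary choices.

```python
import numpy as np

g, ell = 9.8, 1.0
th = 0.7
x = np.array([ell*np.sin(th), -ell*np.cos(th)])  # position on the circle |x| = ell
v = 1.3*np.array([np.cos(th), np.sin(th)])       # velocity tangent to the circle
a_rel = np.array([0.0, -g])                      # released (unconstrained) acceleration

# conceivable accelerations satisfy x·a = -|v|^2
a0 = (-(v @ v) / (x @ x)) * x                    # a particular solution
u = np.array([-x[1], x[0]]) / ell                # unit tangent direction

# scan the feasible line a0 + tau*u for the minimum of the compulsion
taus = np.linspace(-20.0, 20.0, 400001)
Z = 0.5*np.sum((a0 + taus[:, None]*u - a_rel)**2, axis=1)
a_gauss = a0 + taus[np.argmin(Z)]*u

# acceleration from the d'Alembert-Lagrange multiplier, as in Example 5.4
lam = (g*x[1] - v @ v) / ell**2
a_dal = a_rel + lam*x
print(a_gauss, a_dal)
```

The two accelerations agree to the resolution of the scan, as Theorem 5.5 predicts (here A is the identity, so the compulsion is a plain squared distance).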


Gauss’ principle is analogous to the method of least squares in regression
analysis (which was also discovered by Gauss). In the method of least squares,
there is an unknown data correlation function determined by n parameters and a
large number N > n of observations. The observations deviate slightly from the
desired function’s exact values due to observation error, and hence the overde-
termined system for the correlation function is inconsistent. The remedy is to
construct the square sum error between the correlation function and the data,
and then find the desired function as an error minimizer in the n parameters.
Here, the compulsion is determined by the 2n initial conditions (q0 , v0 ) ∈ T M
where n = dim M is the number of degrees of freedom, and the system is
overdetermined due to the extra constraint conditions. Here, the actual motion
q(t) plays the role of the function we seek determined by the smaller number
of conditions dim T S < 2n, the released motion is the measured data which
overdetermines the correlation function with error, and the compulsion is the
square sum error. Moreover, the matrix A of masses can statistically be inter-
preted as weights in the method of least squares, which are modified based on
the assumed reliability of the data accuracy.

5.3. Integrability

Some authors also require that holonomic constraints are integrable (in the
sense of Frobenius). In this section, we will explore what this additional as-
sumption yields.
In this section, we will assume that the motion is constrained within a k-
dimensional distribution ∆ ⊂ T M : for any q0 ∈ M , ∆q0 is a k-dimensional
subspace of Tq0 M and there exist smooth vector fields X1 , . . . , Xk defined on a
neighborhood of q0 such that ∆q is given by the span of X1 (q), . . . , Xk (q) on that
neighborhood. For holonomic constraints, ∆ is the space of virtual velocities.
We will also assume that ∆ is integrable: there exists an embedding i : N → M
such that di(Tq N ) = ∆q for all q ∈ N ; in other words, ∆ is given by the tangent
bundle of a submanifold.
Frobenius’ theorem provides us with a practical condition to determine if
the distribution ∆ is integrable:
Theorem 5.6 (Frobenius’ theorem). The distribution ∆ is integrable if and
only if ∆ is involutive—that is, for any vector fields X, Y ∈ ∆, the Lie bracket
[X, Y ] is also in ∆.
This abstract result expresses a basic idea when applied to differential equa-
tions: a first-order system of equations can be solved locally if and only if they
are consistent. For example, consider a function u = (u1 , . . . , um ) of the vari-
ables q = (q1 , . . . , qk ) which solves the system of equations
∂u/∂q₁ (q) = F₁(q, u),    . . . ,    ∂u/∂q_k (q) = F_k(q, u).
Of course, if there exists a solution u, then the right-hand sides F₁, . . . , F_k must be consistent:

∂/∂q_j F_i(q, u(q)) = ∂²u/∂q_j ∂q_i (q) = ∂/∂q_i F_j(q, u(q))

=⇒  ∂F_i/∂q_j + ∂F_i/∂u · F_j = ∂F_j/∂q_i + ∂F_j/∂u · F_i
for all i and j. This is the involutive condition of Frobenius' theorem. Moreover,
the theorem also provides the converse: if the right-hand sides F1 , . . . , Fk are
consistent, then there exists a local solution. For a proof of Frobenius’ theorem,
see [Lee13, Th. 19.12].
If the holonomic constraints are integrable, then the motion q(t) must lie in
a submanifold of M , not merely a submanifold of T M . Now that we know that
the holonomic constraints have a corresponding smooth integrable submanifold
N ⊂ M of dimension d−k, we can show that the d’Alembert–Lagrange condition
is equivalent to the principle of least action holding on N :
Proposition 5.7. Suppose the Lagrangian system (M, L ) has integrable holo-
nomic constraints. Then a constrained path is a motion of the system if and
only if the path is a motion for the system (N, L |N ).

Proof. From Hölder’s principle (Proposition 5.3) we know that the d’Alembert–
Lagrange condition is equivalent to insisting that the action variation dS|γ van-
ishes on a subspace Γ of conceivable variations. Here, Γ is a subspace of the
tangent space Tq(t) ΩM to the paths ΩM on M . On the other hand, the motion
of the Lagrangian system on N is given by taking the action S on the paths
ΩN on N , and then insisting that its variation vanishes on the tangent space
Tq(t) ΩN . The key observation is that the tangent space Tq(t) ΩN is equal to the
subspace Γ, and so the two conditions are identical.
In particular, the motion for integrable holonomic constraints is determined
by the restriction of the Lagrangian to the constraint submanifold N . In this
way, a system with holonomic constraints is like a new mechanical system with
fewer degrees of freedom; this is a characteristic feature of holonomic constraints,
which we do not expect for nonholonomic constraints.

5.4. Integral constraints

In this section, we will assume we have a Lagrangian system (M, L) subject to the constraint

∫₀ᵀ G(q(t)) dt = 0,    (5.10)
for some smooth G : M → R that is not a constant function.
Proposition 5.8. If q(t) is a critical point for a functional S(q(t)) of the
form (4.1) subject to the constraint (5.10) and G(q(t)) is nonconstant, then
there exists a Lagrange multiplier λ ∈ R so that q(t) solves

d/dt ∂L/∂q̇ − ∂L/∂q = λ ∂G/∂q.    (5.11)
Proof. As in the set up for the proof of Proposition 4.5, we may fix a coordinate
patch on M and take the variation to be supported within the image in order
to reduce the statement to Euclidean space Rn . Let q(t) : R → Rn be a critical
point, and let g = ∂G/∂q denote the gradient of G. By premise, we know that g(q) ≢ 0. In particular, there exists a smooth function v(t) : R → R^n so that

∫₀ᵀ g(q(t)) · v(t) dt ≠ 0.    (5.12)

Let h(t) be a fixed-endpoint variation. To obtain a differential equation for


q(t) we would like to perturb q(t) by the variation h(t). However, the perturbed
path q(t) + h(t) will not satisfy the constraint for arbitrary h(t), and so we will
need to modify our perturbation. Let
J(σ, τ) := ∫₀ᵀ G(q(t) + σh(t) + τ v(t)) dt.

We claim there is a smooth function τ(σ) defined on a neighborhood of zero such that τ(0) = 0 and

J(σ, τ(σ)) ≡ 0    for all σ.    (5.13)

This follows from the implicit function theorem, since we know that J(0, 0) = 0
and

∂J/∂τ (0, 0) = ∫₀ᵀ g(q(t)) · v(t) dt

is nonzero by (5.12).
The perturbation q + σh + τ (σ)v now satisfies the constraint for all σ suf-
ficiently small. As q(t) is a critical point of the constrained action functional,
then we must have

0 = d/dσ|_{σ=0} ∫₀ᵀ L(q + σh + τ(σ)v, q̇ + σḣ + τ(σ)v̇, t) dt.

Using the derivative of the action functional (4.2), we have

0 = ∫₀ᵀ (∂L/∂q (q) − d/dt ∂L/∂q̇ (q)) · (h + τ′(0)v) dt.    (5.14)

On the other hand, differentiating (5.13) with respect to σ we obtain

0 = d/dσ J(σ, τ(σ))|_{σ=0} = ∂J/∂σ (0, 0) + ∂J/∂τ (0, 0) τ′(0)
  = ∫₀ᵀ g(q(t)) · h(t) dt + τ′(0) ∫₀ᵀ g(q(t)) · v(t) dt,

and so

τ′(0) = − (∫₀ᵀ g(q(t)) · h(t) dt) / (∫₀ᵀ g(q(t)) · v(t) dt).
Inserting this into the derivative (5.14) we arrive at

0 = ∫₀ᵀ (∂L/∂q (q) − d/dt ∂L/∂q̇ (q) + λ g(q)) · h dt,

where

λ = − (∫₀ᵀ [∂L/∂q (q) − d/dt ∂L/∂q̇ (q)] · v(t) dt) / (∫₀ᵀ g(q(t)) · v(t) dt)
is independent of h. As h(t) was an arbitrary fixed-endpoint variation, we
conclude that (5.11) holds.
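A discrete analogue of Proposition 5.8 makes the multiplier concrete (an illustrative sketch, not from the text): minimize the discretized action for L = ½q̇² with q(0) = q(1) = 0 subject to ∫₀¹ q dt = A. Equation (5.11) then reads −q̈ = λ, so the minimizer is the parabola q(t) = 6At(1 − t) with λ = 12A, which the finite-difference solve below reproduces.

```python
import numpy as np

# Discretize: minimize sum (1/2)((q_{i+1}-q_i)/h)^2 h with q_0 = q_n = 0,
# subject to h * sum q_i = A.  Stationarity gives K q = lam * h * 1,
# the discrete version of -q'' = lam.
n, A = 100, 1.0
h = 1.0 / n
main = 2.0 * np.ones(n - 1); off = -np.ones(n - 2)
K = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h   # stiffness matrix

q1 = np.linalg.solve(K, h * np.ones(n - 1))  # solution for lam = 1
lam = A / (h * q1.sum())                     # scale so the constraint holds
q = lam * q1

t = np.linspace(h, 1 - h, n - 1)
q_exact = 6 * A * t * (1 - t)                # continuum solution, lam = 12A
print(np.max(np.abs(q - q_exact)), lam)
```

Because the exact solution is quadratic, the centered second difference reproduces it at the nodes up to the O(h²) quadrature error in the constraint.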

5.5. Existence of closed orbits

In this section we will assume that we have a time-independent Lagrangian


L (q, q̇). By Proposition 4.11, we know that the total energy

E = q̇ · ∂L/∂q̇ − L    (5.15)
is conserved. For a conservative Lagrangian of the form
L(q, q̇) = Σ_{i=1}^n ½ m_i q̇_i² − V(q),    (5.16)

this yields the usual expression

E(q, q̇) = Σ_{i=1}^n ½ m_i q̇_i² + V(q).

For such a system, we seek closed orbits (i.e. periodic solutions) to Lagrange’s
equations. We will explore some well-known results in this field obtained by
variational methods, and then settle our attention on the proof of one specific
result (Theorem 5.12 below). For each result, closed orbits are produced by
proving the existence of an optimizer for a certain variational problem. To
this end, it is more convenient to work with the energy (5.15) instead of the
Lagrangian (5.16) since it is convex when V is. Consequently, these results are
more naturally phrased in terms of Hamiltonian mechanics, even though they
fall under the topic of optimization and constraints.
For one-dimensional systems, we know from section 2.5 that there are many
periodic trajectories. However, the following example illustrates that closed
orbits can be quite exceptional for n ≥ 2 degrees of freedom.
Example 5.9. Consider two particles of mass m = 1 moving according to the
two-dimensional harmonic oscillator potential

V(x₁, x₂) = ½ (k₁x₁² + k₂x₂²),

where k₁, k₂ > 0 are constants. Then Newton's system of equations decouple, and x₁ and x₂ are independent harmonic oscillators with constants k₁ and k₂ respectively. We found the equations of motion in Example 2.2, and x₁(t) and x₂(t) are each periodic with periods 2π/√k₁ and 2π/√k₂ respectively. Therefore, when k₁ = k₂ then all trajectories for x = (x₁, x₂) are periodic, and when √(k₁/k₂) is merely rational we still have infinitely many periodic trajectories. However, when √(k₁/k₂) is irrational, only trajectories with x₁ ≡ p₁ ≡ 0 or x₂ ≡ p₂ ≡ 0 are periodic, and every other trajectory is aperiodic.
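The dichotomy in this example is easy to see numerically; since the oscillators decouple, the exact solutions can be used directly, and the frequency ratios below are arbitrary choices.

```python
import numpy as np

def traj(k1, k2, t):
    # state (x1, x2, v1, v2) of the decoupled oscillators started at x = (1, 1), v = 0
    return np.array([np.cos(np.sqrt(k1)*t), np.cos(np.sqrt(k2)*t),
                     -np.sqrt(k1)*np.sin(np.sqrt(k1)*t),
                     -np.sqrt(k2)*np.sin(np.sqrt(k2)*t)])

x0 = np.array([1.0, 1.0, 0.0, 0.0])
T = 2*np.pi                                       # period of the k = 1 oscillator
closed = np.linalg.norm(traj(1.0, 4.0, T) - x0)   # sqrt(k1/k2) rational: orbit closes
open_ = np.linalg.norm(traj(1.0, 2.0, T) - x0)    # sqrt(k1/k2) irrational: it does not
print(closed, open_)
```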
Nevertheless, under mild assumptions, there must still be some closed orbits. A classical result in this vein is the following theorem of Lyapunov:

Theorem 5.10. Suppose E : R^{2n} → R has a minimum at x = 0 with E″(0) positive definite, and suppose that the eigenvalues of E″(0) satisfy λ_k/λ_ℓ ∉ Z for k ≠ ℓ. Then Lagrange's equations (4.4) have n distinct periodic orbits on the surface E(x) = ε for all ε > 0 sufficiently small.
The proof follows Poincaré’s perturbation method and is elementary, but is
still rather clever; cf. [MZ05, §2.2]. Note that we cannot guarantee more than
n closed orbits, as Example 5.9 illustrates. On the other hand, whether the
assumption that λ_k/λ_ℓ ∉ Z is necessary remains an open question.
Another notion of a flow-invariant family of trajectories is that of geodesics
in differential geometry. In that setting, we have the following result:

Theorem 5.11. A smooth compact surface in Rd whose interior is convex


admits d closed nonconstant and non-self-intersecting geodesics.
The existence of at least one closed geodesic is a famous result of Birkhoff,
and the improvement to d closed geodesics was later made by Lusternik and
Schnirelmann. (Note that d is the right number of geodesics, as can be seen by
an ellipsoid in R3 .) Birkoff’s idea was to rephrase the question as a minimax
problem (cf. [Str08, Th. 4.4]): given an ellipsoid-like object in R3 , we want to
pass it through a rubber band while stretching the band the least. Specifically,
if we take θ ∈ R/2πZ and φ ∈ [0, π] to be coordinates on the sphere S 2 , our
rubber band is given by a φ-dependent closed loop `φ (θ) with `0 being a constant
function at the north pole and `π being the south pole. We then take the
maximum stretch of `φ in θ, and then minimize in φ.
For the remainder of this section, we will focus on the following result of
Weinstein [Wei78] and Rabinowitz [Rab78] and its proof:
Theorem 5.12. Suppose we have a conservative Lagrangian of the form (5.16).
If {x ∈ R2n : E(x) ≤ E0 } is convex, then Lagrange’s equations (4.4) have a
closed orbit on the surface E(x) = E0 .
To prove Theorem 5.12, we will extract the trajectory x(t) = (q(t), q̇(t))
as a minimizer of a functional. Our first idea for a variational principle might
be the principle of least action, which we can rewrite using (5.15) as follows: if x : R/TZ → R^{2n} is a critical point of

∫₀ᵀ [E(x(t)) − ½ ẋ(t) · Jx(t)] dt,    where J = [ 0  I ; −I  0 ],

then x(t) solves


ẋ = J∇E(x) (5.17)
(and this is a first-order rewriting of Lagrange’s equations (4.4) for our La-
grangian (5.16)). However, it is not clear how to guarantee the existence of
a minimizer, because the integrand of the functional above is the difference of
convex functions and hence is not convex.
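Equation (5.17) is straightforward to integrate numerically. As a sketch (with E = ½|x|², the harmonic oscillator, an arbitrary choice), the orbit closes after time 2π and the energy is conserved to integrator accuracy.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])    # the symplectic matrix for n = 1
gradE = lambda x: x                        # E(x) = |x|^2 / 2

def rk4_step(x, dt):
    # one RK4 step for xdot = J grad E(x)
    f = lambda y: J @ gradE(y)
    k1 = f(x); k2 = f(x + dt/2*k1); k3 = f(x + dt/2*k2); k4 = f(x + dt*k3)
    return x + dt/6*(k1 + 2*k2 + 2*k3 + k4)

x = np.array([1.0, 0.0])
dt, n = 2*np.pi/10000, 10000               # integrate over one full period
for _ in range(n):
    x = rk4_step(x, dt)
err = np.linalg.norm(x - np.array([1.0, 0.0]))
print(err, 0.5*(x @ x))                    # closure error and conserved energy
```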

On the other hand, the total energy E by itself is convex. Consequently, our
next guess might be to seek critical points x : R/T Z → R2n for
∫₀ᵀ E(x(t)) dt,    subject to the constraint    ∫₀ᵀ ½ ẋ · Jx dt = 1.

Then, arguing as in Proposition 5.8, a critical point x(t) must satisfy

∇E(x) = −λJ ẋ =⇒ λẋ = J∇E(x)

for a Lagrange multiplier λ ∈ R. If λ = 0, then ∇E(x(t)) ≡ 0 and so x(t) is a


curve of fixed points. If λ 6= 0, then the rescaling y(t) := x(t/λ) is a periodic
solution to (5.17) with period |λ|T . Although this functional is convex, it turns
out that it is difficult to obtain enough control over E(x) to prove Theorem 5.12.
Our proof of Theorem 5.12 will be based on the following idea (which is due
to Clarke [Cla79]): it is easier to control the Legendre transform F = E ∗ , the
object that is dual to the convex function E, rather than E itself. We will study
this transformation more thoroughly in section 7.2 when it arises naturally in
the study of Hamiltonian mechanics, and so we defer the proofs of some basic
facts (cf. Theorem 7.5) which we will borrow.
Proposition 5.13 (Clarke’s dual action principle). Suppose that E and F are
strictly convex smooth functions on R2n such that F = E ∗ (cf. the premise of
Theorem 7.5). If z : R/T Z → R2n is a critical point for
∫₀ᵀ F(ż) dt    subject to    ∫₀ᵀ ½ ż · Jz dt = 1    and    ∫₀ᵀ z dt = 0,

then there exist λ > 0 and β ∈ R2n so that

y = λJz + β solves ẏ = λJ∇E(y).

That is, y is a time-scaled periodic solution to (5.17).


Proof. Let φ : R/TZ → R^{2n} be a smooth function with ∫₀ᵀ φ dt = 0. Given a
critical point z(t), there must exist a Lagrange multiplier λ ∈ R so that
0 = ∫ [∇F(ż) · φ̇ + λφ · Jż] dt = ∫ [∇F(ż) · φ̇ − λφ̇ · Jz] dt.

In the last equality we integrated by parts, and the boundary terms canceled
by periodicity. As φ̇ can be an arbitrary smooth periodic function, then we
conclude that we must have

∇F (ż) − λJz = β. (5.18)

Next, we set y = λJz + β. As F = E*, we have ∇F = (∇E)⁻¹ (cf. the calculation (7.6) and (7.7) in Theorem 7.5) and hence y solves

ẏ(t) = λJ∇E(y(t)).

It only remains to show that λ > 0. By the equation (5.18) for z, we have

0 = ∫₀ᵀ ż · (∇F(ż) − λJz − β) dt
  = ∫₀ᵀ ż · ∇F(ż) dt − λ ∫₀ᵀ ż · Jz dt − ∫₀ᵀ ż · β dt.

The third integral on the RHS vanishes by periodicity, and the second integral is
equal to 2 by the constraint. As F is strictly convex, we must have ξ ·∇F (ξ) > 0
for all ξ ≠ 0. Altogether, we conclude that λ ≥ 0. If λ = 0 then we have ż ≡ 0,
but this contradicts the constraints.
In the proof we used that ż ≢ 0, which implies that z cannot be identically
constant and so y cannot vanish identically. It then follows that y can never
vanish, because x = 0 is the unique point where ∇E = 0 and so y(t) = 0 implies
y(t) ≡ 0.
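The duality ∇F = (∇E)⁻¹ used above can be sanity-checked numerically for a quadratic energy E(x) = ½ x · Mx, whose Legendre transform is F(p) = ½ p · M⁻¹p; the matrix M below is an arbitrary positive definite choice.

```python
import numpy as np

M = np.array([[2.0, 0.5], [0.5, 1.0]])   # positive definite quadratic form
Minv = np.linalg.inv(M)
E = lambda x: 0.5 * x @ M @ x
F = lambda p: 0.5 * p @ Minv @ p         # Legendre transform F = E*

p = np.array([0.7, -1.2])
x_star = Minv @ p                        # x_star = grad F(p) = (grad E)^{-1}(p)
sup_at = p @ x_star - E(x_star)          # value of p·x - E(x) at the optimum

# F(p) = sup_x [p·x - E(x)]: no randomly sampled x does better
rng = np.random.default_rng(0)
others = max(p @ x - E(x) for x in rng.normal(size=(1000, 2)))
print(F(p) - sup_at, np.linalg.norm(M @ x_star - p))
```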
In applying Proposition 5.13 to prove Theorem 5.12, there is a problem: the
solution produced by Proposition 5.13 is on some nonzero energy surface, but
not necessarily the one we started with. To solve this issue, we will modify E
so that it is positive homogeneous of degree 2:

Ẽ(λx) = λ² Ẽ(x)    for all x ∈ R^{2n}, λ > 0

without changing the level surface E = E₀. As E is strictly convex with minimum zero, then for each unit vector ν ∈ R^{2n}, |ν| = 1, there is a unique r(ν) so that

E(r(ν)ν) = E₀.

We then define

Ẽ(x) = E₀ |x|² r(x/|x|)⁻²

for x ≠ 0 and Ẽ(0) = 0, so that E = E₀ and Ẽ = E₀ correspond to the same level surface. Lastly, we check that if we can find a closed orbit for Ẽ, then we also have a closed orbit for E. As E = E₀ and Ẽ = E₀ are the same surface and ∇E and ∇Ẽ are perpendicular to this surface, then there must exist a proportionality constant s(x) so that

∇Ẽ(x) = s(x)∇E(x).

(We may assume that ∇E is nonvanishing on the surface E = E₀, since otherwise we can take x(t) to be identically constant.) This means that solutions to

ẋ = J∇Ẽ(x)    and    ẋ = J∇E(x)

only differ in time-parameterization. Therefore, replacing E by Ẽ, we may assume that E is positive homogeneous of degree 2.
Now suppose that z(t) is a solution in the sense of Proposition 5.13. Then
E(z(t)) is constant, but perhaps not equal to E0 . As E is positive homogeneous

of degree 2 and z(t) is nonvanishing, then E(z(t)) > 0 and so we may choose
λ > 0 so that x(t) = λz(t) satisfies E(x(t)) ≡ E0 . Using that E is homogeneous
again, we see that
ẋ = λJ∇E(z) = J∇E(x)
as desired.
In order to extract a minimizer, we will use the following fact from functional
analysis:

Lemma 5.14 (Mazur’s lemma). Let X be a Banach space. If we have a weakly


convergent sequence xn * x in X, then there exists a sequence of finite convex
combinations of the xn that converge strongly to x.
Proof. Let C denote the set of finite convex combinations of the sequence elements x_n, which is clearly convex. If the statement of the lemma were false, then by Hahn–Banach there would exist a continuous linear functional ℓ ∈ X* so that

sup{ℓ(y) : y ∈ C} < ℓ(x).

This contradicts that ℓ(x_n) → ℓ(x), as is guaranteed by weak convergence.


To conclude the proof of Theorem 5.12, it suffices to show:

Proposition 5.15. The functional z ↦ ∫₀^{2π} F(ż) dt achieves its minimum over the set

{ z : R/2πZ → R^{2n} : ∫ |ż|² dt < ∞,  ∫ z dt = 0,  ∫ ½ ż · Jz dt = 1 }.

Indeed, after rescaling time, by Proposition 5.13 we obtain a nontrivial pe-


riodic solution z(t) to (5.17). As we have assumed that E is homogeneous of
degree 2, then a constant multiple x(t) = λz(t) lies on the energy surface E = E0
and is still a periodic solution. (This solution is in the space H¹(R/2πZ) and
thus only solves the equation pointwise almost everywhere at first, but a stan-
dard bootstrap argument shows that it must in fact be infinitely smooth.)
Proof. Let zn be an optimizing sequence. As

c|x|² ≤ F(x) ≤ C|x|²

for some constants c, C > 0, we see that the functional is well-defined on its
domain and that the optimizing sequence zn is in H 1 (R/2πZ). (Note that the
domain is also nonempty, since it contains harmonic oscillators like those in
Example 5.9.)
Decompose the z_n as the Fourier series

z_n(t) = Σ_{k ∈ Z∖{0}} ẑ_n(k) e^{ikt}.

(Note that ẑ_n(0) = ∫ z_n dt = 0 by the constraints.) By Parseval we have

2π Σ_{k≠0} k² |ẑ_n(k)|² = ∫ |ż_n|² dt ≤ c⁻¹ ∫ F(ż_n) dt.    (5.19)

The RHS converges as the sequence z_n is minimizing, and hence is bounded uniformly in n. Therefore, by Bolzano–Weierstrass we may pass to a subsequence along which each ẑ_n(k) converges to some ẑ(k) as n → ∞. These values are indeed Fourier coefficients for some z(t) by the upper bound (5.19). The function z(t) is then also real-valued, since ẑ(−k) is the complex conjugate of ẑ(k), and z(t) also satisfies the constraint ∫ z dt = ẑ(0) = 0. For the second constraint, we have

1 = ∫ ½ ż_n · Jz_n dt = π Σ_{k≠0} ik ẑ_n(k) · J ẑ_n(−k).

We know the RHS converges by dominated convergence, since each summand converges and we have the uniform bound (5.19). (Alternatively this follows immediately from the Rellich–Kondrashov compactness theorem for the embedding H¹(R/2πZ) ↪ H^{1/2}(R/2πZ), but here we have a simple direct argument.)
Altogether, we now have a function z(t) that satisfies the constraints. In order to conclude that z is a minimizer, it only remains to show that the functional ∫₀^{2π} F(ż) dt is weakly lower semicontinuous: if z_n ⇀ z weakly in H¹(R/2πZ), then

∫ F(ż) dt ≤ lim inf_{n→∞} ∫ F(ż_n) dt.

(This follows immediately from Tonelli's theorem in functional analysis, but we will provide a direct proof.) As z_n ⇀ z weakly in H¹(R/2πZ), by Mazur's lemma (Lemma 5.14) and a diagonal argument there exist finite convex combinations Σ_{m≥n} p_{n,m} z_m that converge to z in H¹(R/2πZ). As F is convex, then we have

∫ F(ż) dt = lim_{n→∞} ∫ F(Σ_{m≥n} p_{n,m} ż_m) dt
          ≤ lim_{n→∞} Σ_{m≥n} p_{n,m} ∫ F(ż_m) dt
          ≤ lim sup_{n→∞} ∫ F(ż_n) dt.

Therefore, after initially replacing z_n by a subsequence along which ∫₀^{2π} F(ż_n) dt converges to its lim inf, the RHS above is lim inf_{n→∞} ∫₀^{2π} F(ż_n) dt as desired.
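The Parseval identity (5.19) that drives this compactness argument can be verified spectrally; the test signal below is an arbitrary zero-mean trigonometric polynomial.

```python
import numpy as np

# check the identity  int_0^{2pi} |z'(t)|^2 dt = 2*pi * sum_k k^2 |zhat(k)|^2
N = 256
t = 2*np.pi*np.arange(N)/N
z = 0.7*np.cos(t) - 1.1*np.sin(2*t) + 0.3*np.cos(5*t)   # zero-mean trig polynomial

Z = np.fft.fft(z)
k = np.fft.fftfreq(N, d=1.0/N)          # integer frequencies 0, 1, ..., -1
zdot = np.fft.ifft(1j*k*Z).real         # spectral derivative z'(t)

lhs = 2*np.pi*np.mean(zdot**2)          # discrete integral of |z'|^2 over [0, 2pi]
rhs = 2*np.pi*np.sum(k**2 * np.abs(Z/N)**2)
print(abs(lhs - rhs))
```

For a band-limited signal the trapezoid sum is exact, so the two sides agree to roundoff.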

5.6. One-form constraints

In the derivation of Lagrange’s equations (Proposition 4.5), the last step


relied upon the generalized coordinates qj being independent. As discussed in
section 5.3, such qj can always be chosen for a system subject to (integrable)

holonomic constraints. This is not always true in the nonholonomic case how-
ever, and so we would like to develop a new tool for this situation.
Suppose we have m nonholonomic constraints that can be expressed as the vanishing of one-forms:

Σ_{k=1}^n a_{ℓk}(t, q) dq_k + a_{ℓt}(t, q) dt = 0    for ℓ = 1, . . . , m.    (5.20)

Note that velocity-independent holonomic constraints also fit this requirement, because if the condition (5.1) is independent of q̇ then taking the differential of both sides yields

Σ_{k=1}^n ∂f/∂q_k dq_k + ∂f/∂t dt = 0.

However, the constraint (5.20) also includes some nonholonomic constraints; for
example, see (5.29) of Exercise 5.4.
Consider fixed-endpoint variations δq(t) of a path q(t) between the times 0
and T as in the proof of Proposition 4.5. The constraint (5.20) is satisfied for
q(t), and so Taylor expansion yields
Σ_{k=1}^n a_{ℓk}(t, q) δq_k = O(δq²)    for ℓ = 1, . . . , m.    (5.21)

If there were no constraints, the principle of least action would require

∫₀ᵀ Σ_{k=1}^n (∂L/∂q_k − d/dt ∂L/∂q̇_k) δq_k dt = 0.    (5.22)

To make (5.21) look like this, we multiply by coefficients λ_ℓ to be chosen, sum over ℓ, and integrate from 0 to T:

∫₀ᵀ Σ_{k=1}^n Σ_{ℓ=1}^m λ_ℓ a_{ℓk} δq_k dt = O(δq²).

Adding this to (5.22) we obtain

∫₀ᵀ Σ_{k=1}^n (∂L/∂q_k − d/dt ∂L/∂q̇_k + Σ_{ℓ=1}^m λ_ℓ a_{ℓk}) δq_k dt = O(δq²).    (5.23)

The n generalized coordinates qk are not independent since they are related
by the m constraints. However, the first n − m coordinates can be chosen
independently, and the remaining m coordinates are determined by the condi-
tions (5.21). Pick the multipliers λ` such that
d/dt ∂L/∂q̇_k − ∂L/∂q_k = Σ_{ℓ=1}^m λ_ℓ a_{ℓk}    for n − m < k ≤ n.    (5.24)

This causes the last m terms of the summation in the variation (5.23) to vanish, leaving us with

∫₀ᵀ Σ_{k=1}^{n−m} (∂L/∂q_k − d/dt ∂L/∂q̇_k + Σ_{ℓ=1}^m λ_ℓ a_{ℓk}) δq_k dt = O(δq²).

As the first n − m coordinates q_k are independent, then we obtain

d/dt ∂L/∂q̇_k − ∂L/∂q_k = Σ_{ℓ=1}^m λ_ℓ a_{ℓk}    for k = 1, . . . , n − m.

Including our choice (5.24) of the last m coordinates, we conclude

d/dt ∂L/∂q̇_k − ∂L/∂q_k = Σ_{ℓ=1}^m λ_ℓ a_{ℓk}    for all k = 1, . . . , n.    (5.25)

This is the extension of Lagrange's equations for nonholonomic constraints. Note that these are n equations in n + m unknowns—the n coordinates q_k and the m multipliers λ_ℓ. To make the system not under-determined, we must also consider the m equations

Σ_{k=1}^n a_{ℓk} q̇_k + a_{ℓt} = 0    for ℓ = 1, . . . , m,

obtained from the one-form constraints (5.20). Comparing these new equations of motion (5.25) to the generalization for nonconservative forces (4.14), we see that the quantities Σ_ℓ λ_ℓ a_{ℓk} are a manifestation of the constraint forces.

5.7. Exercises

5.1. For the pendulum of Example 5.4, explicitly verify that minimizing the
compulsion Z leads to the familiar equation of motion.
5.2 (Hoop rolling down an inclined plane). Consider a circular disk of mass M
and radius r rolling without slipping due to gravity down a stationary inclined
plane of fixed inclination φ.

(a) In a vertical plane, the disk requires three coordinates: two Cartesian
coordinates (x, y) for the center of mass and an angular coordinate to
measure the disk’s rotation. If we pick the origin such that the surface
of the inclined plane is the line y = r − (tan φ)x, obtain a holonomic
constraint of the form (5.1) for the center of mass corresponding to the
disk sitting on the plane.
(b) Consequently, we now can pick two generalized coordinates to describe
the disk’s motion: let x denote the distance of the disk’s point of contact

and the top of the inclined plane, and θ the angle through which the disk
has rotated from its initial state. By considering the arc length through
which the disk has rolled, show that rolling without slipping poses another
holonomic constraint.
(c) In this case, it is easier to treat rolling without slipping as a nonholonomic
constraint of the type in section 5.6:
r dθ − dx = 0.
Show that the Lagrangian for this system is
L = ½ M(ẋ² + r²θ̇²) + Mgx sin φ.

(d) Apply Lagrange’s equations of the form (5.25) to determine the equations
of motion. Here, λ is the force of friction that causes the disk to roll
without slipping. Using the differential equation of constraint
rθ̇ = ẋ,
conclude that
ẍ = (g/2) sin φ,    θ̈ = (g/2r) sin φ,    λ = (Mg/2) sin φ.
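The linear algebra in part (d) can be checked directly: equations (5.25), with a_x = −1 and a_θ = r from the one-form r dθ − dx = 0, together with the differentiated rolling constraint form a 3 × 3 linear system for (ẍ, θ̈, λ). (The Lagrangian in part (c) corresponds to moment of inertia I = Mr², i.e. a hoop; the numerical parameters below are arbitrary.)

```python
import numpy as np

M, r, phi, g = 2.0, 0.3, 0.25, 9.8

# Unknowns (xdd, thdd, lam):
#   M xdd - M g sin(phi) = -lam     (x equation, a_x = -1)
#   M r^2 thdd           =  lam r   (theta equation, a_theta = r)
#   -xdd + r thdd        =  0       (differentiated rolling constraint)
A = np.array([[M,    0.0,     1.0],
              [0.0,  M*r**2,  -r ],
              [-1.0, r,       0.0]])
b = np.array([M*g*np.sin(phi), 0.0, 0.0])
xdd, thdd, lam = np.linalg.solve(A, b)
print(xdd, thdd, lam)   # expect (g/2) sin(phi), (g/2r) sin(phi), (Mg/2) sin(phi)
```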

5.3 (Catenary [Tro96, Ch. 3]). The catenary is the shape taken by a cable hung
by both ends subject to gravity.
(a) Consider a cable of length L hung between two equal height supports
separated by a distance H < L. Let y denote the vertical coordinate
with y = 0 at the point where the cable is fastened, and let y(s) denote
the shape of the cable where s is the arc length along the cable, so that
y(0) = 0 = y(L). If the weight per unit length is a constant W , explain
why the cable shape y(s) minimizes the potential energy integral
F[y(s)] = ∫₀ᴸ W y(s) ds.

If we instead chose the horizontal coordinate x as the independent variable,


it turns out that the resulting functional would not be convex.
(b) The functional with Lagrangian (t, y, y′) ↦ W y is merely convex rather
than strictly convex, and consequently may not have a unique minimizer.
Show that in order to span the supports, the cable must satisfy the constraint
∫₀ᴸ √(1 − y′(s)²) ds = H.
(Note that |y′(s)| = 1 requires x′(s) = 0, which would produce a cusp.)
The new Lagrangian
L(t, y, y′) = W y − λ√(1 − y′²)
is strictly convex for λ > 0.

(c) Apply the Euler–Lagrange equation and integrate once to obtain the first
order differential equation
λy′(s) / √(1 − y′(s)²) = s + c

for the catenary, where c is a constant.


(d) As we expect y(s) to be unique by convexity, there is no harm in making
additional assumptions. Assume that the cable shape is symmetric about
the midpoint ℓ = L/2, and conclude that y′(ℓ) = 0 and c = −ℓ. Solve for
y′(s) and integrate on [0, s] to obtain
y(s) = √(λ² + (ℓ − s)²) − √(λ² + ℓ²)   on [0, ℓ].

(e) Show that the constraint of part (b) yields
∫₀^ℓ λ / √( λ² + (ℓ − s)² ) ds = H/2.
Make the substitution ℓ − s = λ sinh θ, evaluate the integral, and conclude
that
ℓ/λ = sinh( H/(2λ) ),
which determines λ implicitly.

(f) Show that x(s) is given by
x(s) = ∫₀ˢ √(1 − y′(t)²) dt = H/2 − λ sinh⁻¹( (ℓ − s)/λ ).
Together with y(s) from part (d), we have parametric equations for the
catenary. Eliminate the variable s and conclude
y(x) = λ cosh( (x − H/2)/λ ) − √(λ² + ℓ²)
for x ∈ [0, H]. That is, the catenary is the graph of a hyperbolic cosine.
(g) Obtain the same expression for y(x) using Proposition 5.8.
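A numeric sanity check of parts (d) and (f) (not part of the exercise): for an arbitrary choice of λ and ℓ, the parametric equations form an arc-length parametrization whose graph agrees with the closed form y(x). The half-span value H/2 = λ sinh⁻¹(ℓ/λ) used below is the one implied by x(0) = 0.

```python
import math

# Check (not from the text) that the parametric catenary of parts (d), (f)
# matches the closed form y(x) and is parametrized by arc length.
lam, l = 0.8, 1.5
half_H = lam * math.asinh(l / lam)   # H/2 implied by x(0) = 0

def y_param(s):
    return math.sqrt(lam**2 + (l - s)**2) - math.sqrt(lam**2 + l**2)

def x_param(s):
    return half_H - lam * math.asinh((l - s) / lam)

def y_closed(x):
    return lam * math.cosh((x - half_H) / lam) - math.sqrt(lam**2 + l**2)

# Closed form reproduces the parametric curve.
for k in range(11):
    s = l * k / 10
    assert math.isclose(y_closed(x_param(s)), y_param(s), abs_tol=1e-12)

# Arc-length parametrization: x'(s)^2 + y'(s)^2 = 1.
h = 1e-6
for s in (0.2, 0.7, 1.2):
    dx = (x_param(s + h) - x_param(s - h)) / (2 * h)
    dy = (y_param(s + h) - y_param(s - h)) / (2 * h)
    assert abs(dx * dx + dy * dy - 1) < 1e-6
```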

5.4 (Solving Kepler’s problem with harmonic oscillators [KS65]). Consider the
system of section 3.3, where a particle x ∈ R3 of mass m moves in a central
potential
V(|x|) = −Mm/|x|.
Recall that the squaring function on C ≅ R² given by
u := u₁ + iu₂ = (u₁, u₂) ↦ x := u² = (u₁² − u₂², 2u₁u₂)   (5.26)

is conformal (except at the origin) and maps conic sections centered at the
origin to conic sections with one focus at the origin. As the orbits x(t) are conic
sections, we might hope to apply a transformation with similar properties to
our system in order to turn the elliptic orbits into simple harmonic oscillation.
(As a problem-solving method this may seem rather ad hoc, but an analogous
transformation was used to solve a long-standing open problem in physics.)
Associated to (5.26) is the linear transformation
( dx₁ )     ( u₁  −u₂ ) ( du₁ )
( dx₂ ) = 2 ( u₂   u₁ ) ( du₂ )

of differential forms. The matrix above has the following key properties:
• its entries are linear homogeneous functions of the ui ,
• it is like an orthogonal matrix, in the sense that the dot product of any
two rows vanishes and each row has squared norm u₁² + u₂² + · · · + uₙ².
We would like to find such a transformation on Rn . It turns out that alge-
braically such transformations can only exist for n = 1, 2, 4, or 8. Ultimately
we want a transformation Rn → R3 , so we take n = 4 and choose a matrix
 
      ( u₁  −u₂  −u₃   u₄ )
A =   ( u₂   u₁  −u₄  −u₃ )
      ( u₃   u₄   u₁   u₂ )
      ( u₄  −u₃   u₂  −u₁ )

which satisfies these properties. (This matrix also can be obtained directly from
quaternion multiplication.) Consequently, we set
     
( dx₁ )      ( du₁ )     ( u₁ du₁ − u₂ du₂ − u₃ du₃ + u₄ du₄ )
( dx₂ ) = 2A ( du₂ ) = 2 ( u₂ du₁ + u₁ du₂ − u₄ du₃ − u₃ du₄ )   (5.27)
( dx₃ )      ( du₃ )     ( u₃ du₁ + u₄ du₂ + u₁ du₃ + u₂ du₄ )
(  0  )      ( du₄ )     ( u₄ du₁ − u₃ du₂ + u₂ du₃ − u₁ du₄ )

Sadly, only the first three of these equations are exact forms, corresponding to
the quantities
x₁ = u₁² − u₂² − u₃² + u₄²,
x₂ = 2(u₁u₂ − u₃u₄),   (5.28)
x₃ = 2(u₁u₃ + u₂u₄)
respectively. The fourth equation of (5.27) is the nonholonomic constraint

u4 du1 − u3 du2 + u2 du3 − u1 du4 = 0. (5.29)

Equation (5.28) defines a transformation from R4 into R3 , and in fact there are
explicit formulas for both the one-dimensional kernel and the inverse map.
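Both the radial identity of part (a) below and the orthogonality of A are easy to confirm numerically; the following check (not in the text) verifies them for a randomly chosen u ∈ R⁴:

```python
import math
import random

# Check (not from the text): under (5.28), r = |x| equals u1^2+...+u4^2,
# and the rows of A are mutually orthogonal with common squared norm r.
random.seed(0)
u1, u2, u3, u4 = (random.uniform(-2, 2) for _ in range(4))

x1 = u1**2 - u2**2 - u3**2 + u4**2
x2 = 2 * (u1 * u2 - u3 * u4)
x3 = 2 * (u1 * u3 + u2 * u4)
r = math.sqrt(x1**2 + x2**2 + x3**2)
assert math.isclose(r, u1**2 + u2**2 + u3**2 + u4**2)

A = [
    [u1, -u2, -u3, u4],
    [u2, u1, -u4, -u3],
    [u3, u4, u1, u2],
    [u4, -u3, u2, -u1],
]
for i in range(4):
    for j in range(4):
        dot = sum(A[i][k] * A[j][k] for k in range(4))
        assert math.isclose(dot, r if i == j else 0.0, abs_tol=1e-12)
```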

(a) Let r = |x| denote the distance to the origin in R3 . Show that

u₁² + u₂² + u₃² + u₄² = r.



Using the orthogonality of A, invert the transformation (5.27) of differentials
and conclude that for a fixed point u ∈ R⁴ our transformation
conformally maps the space orthogonal to the kernel at u onto R³.
(b) Let u, v ∈ R4 be two orthonormal vectors which satisfy the condition
u4 v1 − u3 v2 + u2 v3 − u1 v4 = 0,
inspired by the constraint (5.29). From part (a) we know that the plane
span{u, v} is mapped conformally onto a plane in R3 . Show that when
this transformation is restricted to span{u, v} onto its image, distances
from the origin are squared and angles at the origin are doubled. (Hint:
The calculation of the image x of a point in span{u, v} is rather lengthy.
To finish, it may help to compare the formula for x to the formula for
Cayley–Klein parameters (cf. [Gol51, Sec. 4.5]).)
(c) In particular, it follows that a conic section centered at the origin in the
plane span{u, v} ⊂ R⁴ gets mapped to another conic section in R³ with
one focus at the origin, the latter of which describes Kepler’s motion.
Using part (b) and polar coordinates in the plane span{u, v}, show that
ellipses and hyperbolas in the plane span{u, v} centered at the origin are
mapped to ellipses and hyperbolas respectively, and from the limit case
conclude that a line in span{u, v} is mapped to a parabola.
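In the plane u₃ = u₄ = 0, the map (5.28) reduces to the complex squaring map (5.26), so the distance-squaring and angle-doubling of part (b) can be seen directly. A minimal illustration (not part of the exercise):

```python
import cmath
import math

# Illustration (not from the text): restricted to u3 = u4 = 0, (5.28)
# reduces to the squaring map (5.26), which squares distances from the
# origin and doubles angles at the origin.
u = cmath.rect(1.7, 0.4)   # |u| = 1.7, arg u = 0.4
x = u * u                  # (x1, x2) = (u1^2 - u2^2, 2 u1 u2)
assert math.isclose(abs(x), abs(u) ** 2)
assert math.isclose(cmath.phase(x), 2 * 0.4)

# Componentwise agreement with (5.28) when u3 = u4 = 0:
u1, u2 = u.real, u.imag
assert math.isclose(x.real, u1**2 - u2**2)
assert math.isclose(x.imag, 2 * u1 * u2)
```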
(d) We want to take the motion x(t) ∈ R3 subject to the force P = (P1 , P2 , P3 )
and transpose it into motion u(t) in R4 subject to the force Q = (Q1 , Q2 ,
Q3 , Q4 ) and the constraint (5.29). Use the transformation of differen-
tials (5.27) to determine the kinetic energy K and the force Q in terms
of the coordinates u and forces P , and use the formula for the force Q to
show that
u4 Q1 − u3 Q2 + u2 Q3 − u1 Q4 = 0. (5.30)
(e) Apply Lagrange’s equations (4.14) for nonconservative forces to obtain
the equations of motion for u(t). These equations have Qi on the RHS, so
add them together according to the identity (5.30), simplify, and integrate
once to obtain
r(u4 u̇1 − u3 u̇2 + u2 u̇3 − u1 u̇4 ) = constant. (5.31)
The parenthetical term above is exactly the constraint (5.29) divided by dt,
and so to ensure that the constraint is upheld we pick the initial conditions
for u and u̇ so that the parenthetical term vanishes initially, and hence for
all time by the conservation of (5.31). (Conversely, it can be shown that
the equations of motion for u(t) and the condition u4 u̇1 − u3 u̇2 + u2 u̇3 −
u1 u̇4 = 0 yield mẍ = P .)
(f) The equations of motion for u have a singularity at r = 0. To deal with
this, change variables t ↦ s where s is the regularizing time
s = ∫₀ᵗ dt/r,   so that   d/dt = (1/r) d/ds.
Insert the Kepler forces Qᵢ = −∂V(r)/∂uᵢ and the (signed) semi-major
axis
a₀ = ( 2/r − v²/M )⁻¹
to arrive at
∂²uᵢ/∂s² + ( M/(4a₀) ) uᵢ = 0.
That is, the preimage of bounded orbits (a₀ > 0) under this transformation
is simple harmonic motion in R⁴ with frequency ω = √(M/(4a₀)). The
harmonic motion can be (rather tediously) transformed into a solution
u(t) by computing and substituting in the physical time t = ∫₀ˢ r(s′) ds′.
CHAPTER 6

HAMILTON–JACOBI EQUATION

Rather than n second-order ODEs, the Hamilton–Jacobi equation of motion
is a single partial differential equation in n + 1 variables. The material for this
chapter is based on [LL76, Ch. 7] and [Gol51, Ch. 9].

6.1. Hamilton–Jacobi equation

In this section, we will derive the Hamilton–Jacobi equation of motion. To


begin, we return our attention to the action functional.
As in section 4.1, consider the variation of the action S(q) at the system's
motion q(t) for paths with fixed endpoints. After fixing a coordinate patch,
we can reduce to an open subset of Euclidean space (cf. the proof of Proposi-
tion 4.5). Write the variation of the actual motion q(t) between times 0 and T
as q(t) + δq(t). Previously we assumed δq(0) = δq(T ) = 0 and then prescribed
the resulting variation δS to vanish in accordance with the principle of least
action. Now we will consider how the action S(t, q(t)) varies as a function of
the coordinates at time T by allowing δq(T ) to vary.
Repeating the calculation of the action variation from Proposition 4.5, we
now have
dS|_q(δq) = [ (∂L/∂q̇) · δq ]₀ᵀ + ∫₀ᵀ ( ∂L/∂q − (d/dt)(∂L/∂q̇) ) · δq dt = (∂L/∂q̇)(T) · δq(T).

The integral vanishes as q(t) satisfies Lagrange’s equations of motion. On the


other hand, now only one of the boundary terms vanishes since δq(0) = 0. If we
let t := T vary now, then we conclude
∂S/∂qᵢ = ∂L/∂q̇ᵢ = pᵢ.   (6.1)

In the last equality, we used the definition (4.21) of the generalized momentum.
By definition (4.1) of the action we have

dS/dt = L.   (6.2)


On the other hand, applying the chain rule to S = S(t, q(t)) yields
dS/dt = ∂S/∂t + (∂S/∂q) · q̇ = ∂S/∂t + p · q̇   (6.3)
by (6.1). Setting (6.2) and (6.3) equal yields
∂S/∂t = L − p · q̇ = −H,
where H is the total energy (or Hamiltonian) (4.22). Using the derivative (6.1),
we recognize this identity as a first order partial differential equation (PDE) for
S(t, q):
Theorem 6.1. If q(t) is a critical point for the action functional, then the
action S(t, q) solves
0 = ∂S/∂t + H( t, q, ∂S/∂q ),   (6.4)
where H is the total energy defined by (4.22).
Equation (6.4) is the Hamilton–Jacobi equation, and for a system with
n degrees of freedom there are n + 1 independent variables (t, q1 , . . . , qn ). It is
often the case in practice that the formula for S(t, q) is unknown, and cannot
be determined from (6.4) alone.
The solution, or complete integral, of this equation has n + 1 integration
constants corresponding to the number of independent variables. Denoting these
constants by α1 , . . . , αn , and A, we write
S = f (t, q1 , . . . , qn , α1 , . . . , αn ) + A.
Indeed, one of these constants must be additive since the action appears in the
PDE (6.4) only through its partial derivatives and hence is invariant under the
addition of a constant. Mathematically, we require that a complete integral
S(t, q, α) to the Hamilton–Jacobi equation (6.4) satisfies
det( ∂²S / ∂q ∂α ) ≠ 0
in order to avoid incomplete solutions.
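As a concrete illustration (not worked in the text), for a free particle with H = p²/(2m) the ansatz S = αq − (α²/2m)t is a complete integral, and the corresponding constant ∂S/∂α recovers the motion q(t) = q(0) + (α/m)t. A numeric sketch:

```python
import math

# Illustration (not from the text): S(t, q; alpha) = alpha*q - (alpha^2/2m) t
# solves dS/dt + (dS/dq)^2/(2m) = 0, and dS/dalpha is a constant of motion.
m, alpha, beta = 1.5, 2.0, 0.7   # beta plays the role of -dS/dalpha

def S(t, q):
    return alpha * q - alpha**2 / (2 * m) * t

# Finite-difference check of the Hamilton-Jacobi equation at a sample point.
h, t0, q0 = 1e-6, 0.3, 1.1
S_t = (S(t0 + h, q0) - S(t0 - h, q0)) / (2 * h)
S_q = (S(t0, q0 + h) - S(t0, q0 - h)) / (2 * h)
assert abs(S_t + S_q**2 / (2 * m)) < 1e-6

# Along the motion q(t) = -beta + (alpha/m) t, the quantity dS/dalpha
# stays constant, as in (6.7).
def dS_dalpha(t, q):
    return q - alpha / m * t

for t in (0.0, 1.0, 2.0):
    q = -beta + alpha / m * t
    assert math.isclose(dS_dalpha(t, q), -beta)
```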
The function f (t, q1 , . . . , qn , α1 , . . . , αn ) induces a change of coordinates.
(We will develop this idea more generally in section 7.6, where we will see that
f is an example of a generating function which generates a canonical transfor-
mation.) Think of α1 , . . . , αn as new momenta, and let β1 , . . . , βn denote new
coordinates to be chosen. Note that by the chain rule,
df/dt = ∂f/∂t + (∂f/∂q) · q̇ + (∂f/∂α) · α̇.   (6.5)
The qi derivatives are the momenta by (6.1), and we set
βᵢ := −∂f/∂αᵢ.   (6.6)

Consider the new total energy


H′(t, α, β) := H + ∂f/∂t = H + ∂S/∂t = 0,
which has corresponding Lagrangian
L′(t, β, α) := α · β̇ − H′ = α · β̇ = −df/dt + ∂f/∂t + (∂f/∂q) · q̇
            = −df/dt − H + p · q̇ = L − df/dt
by the chain rule (6.5). As this new Lagrangian L′ differs from the old L by a
complete time derivative, then they generate the same motion by Corollary 4.6.
For any Lagrangian L , the definition of the total energy (4.22) requires
∂H/∂p = ∂/∂p ( p · q̇ − L(t, q, q̇) ) = q̇,
∂H/∂q = ∂/∂q ( p · q̇ − L(t, q, q̇) ) = −∂L/∂q = −(d/dt) ∂L/∂q̇ = −ṗ.
This calculation may appear questionable at first: it is not clear that the velocity
q̇ is independent of the position q and momentum p. However, it is indeed
correct, and we will later give it thorough justification in section 7.1. Applying
this to our new total energy H′(t, β, α), we observe that
α̇ = 0, β̇ = 0.
In other words, we have used the integration constants α to generate correspond-
ing conserved quantities β. Recalling the definition (6.6) of the coordinates β,
we conclude
∂S/∂αᵢ (t, q, α) = constant   (6.7)
in the n coordinates and time. In fact, even if we can only obtain a partial
solution to the Hamilton–Jacobi equation involving only m constants αi , we
still get the corresponding m constants of motion.
Lastly, let us specialize to a conservative system. In this case, the total
energy H is equal to a constant E and the action integral becomes
S(q(t)) = ∫₀ᵗ L(t, q(t), q̇(t)) dt = ∫₀ᵗ ( p · q̇ − H ) dt
        = ∫₀ᵗ p · q̇ dt − ∫₀ᵗ E dt = S₀(q₁, . . . , qₙ) − Et,
for some S₀(q) which is only a function of the coordinates. As ∂S₀/∂t = 0, the
Hamilton–Jacobi equation then takes the special form
H( q, ∂S/∂q ) = E.   (6.8)
In particular, we are no longer required to know the exact formula for the action.

6.2. Separation of variables

Sometimes we can reduce the Hamilton–Jacobi equation by one coordinate


using separation of variables. Suppose that for some system with n degrees of
freedom, we have a coordinate q₁ whose corresponding derivative ∂S/∂q₁ appears
in the Hamilton–Jacobi equation only in some particular combination φ(q₁, ∂S/∂q₁).
That is, the Hamilton–Jacobi equation (6.4) takes the form
Φ( t, q₂, . . . , qₙ, ∂S/∂q₂, . . . , ∂S/∂qₙ, φ(q₁, ∂S/∂q₁) ) = 0   (6.9)

after rearranging the independent variables. To look for a solution S, we take


the ansatz
S = S′(q₂, . . . , qₙ, t) + S₁(q₁).
Plugging this back into our Hamilton–Jacobi equation (6.9) we get
Φ( t, q₂, . . . , qₙ, ∂S′/∂q₂, . . . , ∂S′/∂qₙ, φ(q₁, dS₁/dq₁) ) = 0.   (6.10)

Note that q1 only influences φ and is entirely independent from the rest of
the expression. As the variables are independent, this can only happen if φ is
constant:
φ( q₁, dS₁/dq₁ ) = α₁,   Φ( t, q₂, . . . , qₙ, ∂S′/∂q₂, . . . , ∂S′/∂qₙ, α₁ ) = 0.   (6.11)

We now have a first order ODE and a Hamilton–Jacobi equation in terms of


the remaining n − 1 coordinates. The ability to remove a coordinate in the
Hamilton–Jacobi consideration even when it is not cyclic is a virtue of this
formulation.
If we do have some cyclic coordinate q1 for the system (i.e. L is independent
of q1 ), then the Hamilton–Jacobi equation (6.4) becomes
 
∂S/∂t + H( q₂, . . . , qₙ, ∂S/∂q₁, . . . , ∂S/∂qₙ, t ) = 0.   (6.12)

Then q1 is of the type (6.9) for the function


 
φ( q₁, ∂S/∂q₁ ) = ∂S/∂q₁.   (6.13)

Our result (6.11) yields

∂S/∂q₁ = α₁,   S = S′(q₂, . . . , qₙ, t) + α₁q₁.   (6.14)

Here, α1 = ∂S/∂q1 is just the conserved momentum corresponding to q1 .



If this cyclic variable is time, then the system is conservative and we saw
in (6.8) that the action is given by

S = W (q1 , . . . , qn ) − Et. (6.15)

Here, −E is the conserved quantity associated with t, although it is not
necessarily always the total energy. We have used W instead of S′ to denote this
special case, but they share the same purpose; this W is called Hamilton’s
characteristic function. The Hamilton–Jacobi equation is now
H( q₁, . . . , qₙ, ∂W/∂q₁, . . . , ∂W/∂qₙ ) = −∂S/∂t = E,   (6.16)
and E is just one integration constant αj of the motion that does not appear
in S′. The corresponding conserved quantity (6.7) will give the coordinates
implicitly as functions of the constants αi , βi , and time:
βᵢ = ∂S/∂αᵢ = ∂W/∂αᵢ − t   for i = j,
βᵢ = ∂S/∂αᵢ = ∂W/∂αᵢ       for i ≠ j.   (6.17)
Only the jth of these equations is time-dependent, and so one of the qi can be
chosen as an independent variable and the rest will be able to be written in
terms of this coordinate. For example, in section 3.1 we saw that for central
forces the angle φ can be solved for as a function of the radius r.

6.3. Conditionally periodic motion

For this section, we will examine a system of n degrees of freedom with


bounded motion, such that every variable can be separated using the method
of the previous section. This means the action takes the form
S = Σᵢ₌₁ⁿ Sᵢ(qᵢ) + S′(t),   (6.18)

where each function Sᵢ is related to the corresponding momentum via


pᵢ = ∂Sᵢ/∂qᵢ,   Sᵢ = ∫ pᵢ dqᵢ.   (6.19)
The motion is bounded, and so this integral represents the area enclosed by a
loop in the phase plane (qi , pi ) (just as with the period integral (2.9)). Every
time qi returns to a value, Si has incremented by an amount 2πIi with
Iᵢ = (1/2π) ∮ pᵢ dqᵢ.   (6.20)

These Ii are called the action variables.

As in section 6.1, the function Si induces a change of variables with the


action variables Ii as the new momenta. (Later, we will see that Si is a gen-
erating function that induces a canonical transformation.) The new position
coordinates, called angle variables, are given by
wᵢ = ∂S/∂Iᵢ = Σₖ₌₁ⁿ ∂Sₖ/∂Iᵢ (qₖ, I₁, . . . , Iₙ)   (6.21)

since this is the time integral of ∂H/∂p = q̇. The functions Sᵢ are time-independent,
and so the new total energy H ≡ E is just the old one in terms of the new
coordinates. The new equations of motion require
İᵢ = −∂H/∂wᵢ = 0,   ẇᵢ = ∂H/∂Iᵢ (I₁, . . . , Iₙ) = ∂E/∂Iᵢ (I₁, . . . , Iₙ),   (6.22)
which can be immediately integrated to yield
Iᵢ = constant,   wᵢ = (∂E/∂Iᵢ) t + constant.   (6.23)
As we have already observed, Sᵢ increments by 2πIᵢ each time qᵢ returns to its
original value, and so the angle variables wᵢ also increment by 2π. Consequently
the derivative ∂E/∂Iᵢ is the frequency of motion in the ith coordinate, which we
were able to identify without solving the entire system.
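For example (not worked in the text), for the harmonic oscillator H = p²/(2m) + mω²q²/2 the loop integral (6.20) evaluates to I = E/ω, so ∂E/∂I = ω reproduces the familiar frequency. A numeric check:

```python
import math

# Check (not from the text): for the harmonic oscillator, the action
# variable (6.20) is I = E/w, so the frequency (6.23) is dE/dI = w.
m, w, E = 1.3, 2.1, 0.9
qmax = math.sqrt(2 * E / m) / w   # turning point

# I = (1/2pi) oint p dq = (2/pi) * int_0^qmax sqrt(2m(E - m w^2 q^2/2)) dq,
# evaluated by the midpoint rule.
n = 200_000
total = 0.0
for k in range(n):
    q = qmax * (k + 0.5) / n
    total += math.sqrt(2 * m * (E - 0.5 * m * w**2 * q**2))
I = 2 * (total * qmax / n) / math.pi
assert abs(I - E / w) < 1e-4
```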
As the motion in these variables is periodic, any single-valued function
F (q, p) of the system coordinates and momenta will be periodic in the angle
variables with period 2π after being transformed to the canonical variables.
Fourier expanding in each of the angle variables, we have
F = Σ_{ℓ∈Zⁿ} A_ℓ e^{iℓ·w} = Σ_{ℓ₁,...,ℓₙ∈Z} A_{ℓ₁...ℓₙ} e^{i(ℓ₁w₁+···+ℓₙwₙ)}.   (6.24)

Using (6.23), we may write the angle variables as functions of time. Absorbing
the integration constants of the wi into the coefficients A` , we get
 
F = Σ_{ℓ∈Zⁿ} A_ℓ exp( i t ℓ · ∂E/∂I ).   (6.25)

Each term of this sum is periodic with frequency ℓ · ∂E/∂I. If the frequencies
∂E/∂Iᵢ are not commensurable, however, then the total quantity F is not periodic.
In particular, the coordinates q, p may not be periodic, and the system may
not return to any given state that it instantaneously occupies. However, if
we wait long enough the system will come arbitrarily close to any given oc-
cupied state—this phenomenon is referred to as Poincaré’s recurrence theorem
(cf. Corollary 7.26). Such motion is called conditionally periodic.
Two frequencies ∂E/∂Iᵢ that are commensurable (i.e. their ratio is a rational
number) are called a degeneracy of the system, and if all n are commensurable

the system is said to be completely degenerate. In the latter case, all motion
is periodic, and so we must have a full set of 2n−1 conserved quantities. Only n
of these will be independent, and so they can be defined to be the action variables
I₁, . . . , Iₙ. The remaining n − 1 constants may be chosen to be wᵢ ∂E/∂Iₖ − wₖ ∂E/∂Iᵢ
for distinct i, k, since
(d/dt)( wᵢ ∂E/∂Iₖ − wₖ ∂E/∂Iᵢ ) = ẇᵢ ∂E/∂Iₖ − ẇₖ ∂E/∂Iᵢ = (∂E/∂Iᵢ)(∂E/∂Iₖ) − (∂E/∂Iₖ)(∂E/∂Iᵢ) = 0.
dt ∂Ik ∂Ii ∂Ik ∂Ii ∂Ii ∂Ik ∂Ik ∂Ii

Note, however, that since the angle variables are not single-valued, neither will
be the n − 1 constants of motion.
Consider a partial degeneracy, say, of frequencies 1 and 2. This means
k₁ ∂E/∂I₁ = k₂ ∂E/∂I₂   (6.26)
for some k₁, k₂ ∈ Z. The quantity w₁k₁ − w₂k₂ will then be conserved, since
(d/dt)(w₁k₁ − w₂k₂) = ẇ₁k₁ − ẇ₂k₂ = k₁ ∂E/∂I₁ − k₂ ∂E/∂I₂ = 0.   (6.27)
Note that this quantity is single-valued modulo 2π, and so a trigonometric
function of it will be an actual conserved quantity.
In general, for a system with n degrees of freedom whose action is totally
separable and has n single-valued integrals of motion, the system state moves
densely in an n-dimensional manifold in 2n-dimensional phase space. For degen-
erate systems we have more than n integrals of motion, and consequently the
system state is confined to a manifold of dimension less than n. When a system
has fewer than n single-valued integrals of motion, the system state travels
within a manifold of dimension greater than n.

6.4. Geometric optics analogy

Hamilton came up with the principle of least action while studying optics,
and was inspired by Fermat’s optics principle. In this section we will see that the
level sets of the action propagate through configuration space mathematically
similar to how light travels through a medium. This connection brings a physical
analogy to the formerly abstract notion of the action.
Suppose we have a system consisting of one particle moving in the Euclidean
space R3 for which the total energy is conserved. (Although this analogy holds
for systems with multiple particles and more complicated configuration spaces,
we will focus on this simple case for convenience.) Equation (6.15) tells us that
the action is given by
S(q, t) = W (q) − Et. (6.28)
Take Cartesian coordinates q = x = (x, y, z) ∈ R3 , and consider the motion
of the action level surfaces S(t, q) = b with time in R3 . (If we were to gener-
alize this argument to multiple-particle systems, then instead of the particle’s

motion in Cartesian space we must consider the path that the system traces
out in configuration space.) At time t = 0, we have an equation for Hamilton’s
characteristic function W = b, and after a time step ∆t we then have

W(∆t) + ∆W = b + E∆t + O(∆t²)
since dW/dt = E.
The propagation of this surface can be thought of as a wavefront. If we call
the distance traveled normal to the wavefront ds, then we also have
∂W/∂s = |∇W|.   (6.29)
The velocity u of the wavefront is then

u = ds/dt = (dW/|∇W|) / (dW/E) = E/|∇W|.

As a conservative system, the Hamilton–Jacobi equation takes the form (6.16),


which in Cartesian coordinates looks like
E = H( q, ∂W/∂q ) = K( ∂W/∂q ) + V = (∇W)²/(2m) + V,

or after rearranging,
(∇W)² = 2m(E − V).   (6.30)
Plugging this into the velocity (6.29), we have

u = E/√(2m(E − V)) = E/√(2mK) = E/p.

The faster the particle moves, the slower the action level sets propagate.
The momentum is given by
p = ∂W/∂q = ∇W.
The gradient is of course normal to the level sets, and so this relation tells us
that the particle always moves normal to the level sets of the characteristic
function W .
We will now see how the level sets of the action propagate like waves. For
some scalar-valued function φ, the wave equation of optics is

∇²φ − (n²/c²) ∂²φ/∂t² = 0.   (6.31)
If the refractive index n is constant, then there is a family of plane wave solu-
tions:
φ(t, r) = φ0 ei(k·x−ωt) , (6.32)

where the vector k ∈ R³ is the propagation direction and the magnitude
|k| = 2π/λ = nω/c is the wave number.
In geometric optics, the refractive index n ≡ n(x) is not assumed to be
constant, but rather to be changing slowly on the scale of the wavelength.
Consequently, we now seek solutions to the wave equation (6.31) with the ansatz
based on the plane waves (6.32):

φ(t, r) = eA(x)+ik0 (L(x)−ct) . (6.33)

where A(x) is related to the amplitude of the wave, k₀ = ω/c is the wave number
in vacuum (n ≡ 1), and L(x) is called the optical path length of the wave.
Plugging in (6.33), the wave equation (6.31) becomes

φ { [ ∇²A + (∇A)² − k₀²(∇L)² + k₀²n² ] + ik₀[ ∇²L + 2∇A · ∇L ] } = 0.

In general, φ is nonzero and so the curly-bracketed expression must vanish.
We want A and L to be real-valued by construction, and so the real and
imaginary parts in square brackets must also vanish:

∇²A + (∇A)² + k₀²[ n² − (∇L)² ] = 0,   ∇²L + 2∇A · ∇L = 0.   (6.34)

Now comes the geometric optics approximation: the wavelength λ = 2π/k is small
compared to the rate of change of the medium. In particular, the wave number
in vacuum k₀ = 2π/λ₀ must be considerably large compared to the derivative terms
in the first equation of (6.34), and thus we require
(∇L)² = n².

This is called the eikonal equation.


Returning to (6.30), we see that the characteristic function W satisfies an
eikonal equation for a “medium” of refractive index √(2m(E − V)) = p (this
equality holds for the single-particle case). This illustrates that the characteris-
tic function is like a wavefront that propagates through the medium of config-
uration space with refractive index p, in the geometric optics approximation.

6.5. Exercises

6.1 (Harmonic oscillator). The harmonic oscillator has total energy


H = p²/(2m) + ½mω²q².

(a) Write down the Hamilton–Jacobi equation (6.4) for this system. As this
system is conservative, we expect a solution (up to an arbitrary additive
constant) of the form

S(q, α, t) = W (q, α) − αt

where the constant α is the total energy. Plug this ansatz into the
Hamilton–Jacobi equation and conclude that
W = ±mω ∫ √( 2α/(mω²) − q² ) dq.

This integral can be evaluated further, but it is not necessary for our
purposes.
(b) The quantity β will implicitly give us the equation of motion q(α, β, t).
Using the definition (6.6), show that
t + β = (1/ω) sin⁻¹( √(m/2α) ωq ) + constant   if ± = +,
t + β = (1/ω) cos⁻¹( √(m/2α) ωq ) + constant   if ± = −.

Note that we may assume that we are in the case ± = − and absorb the
integration constant into β, which has yet to be determined. Altogether,
this yields the familiar equation of motion
q(t) = √( 2α/(mω²) ) cos[ ω(t + β) ].

(c) To find the constants we must apply the initial conditions q(0) = q₀, p(0) =
p₀. Determine α and β using p₀ = (∂S/∂q)|ₜ₌₀ and q(0) = q₀ respectively, and
obtain the solution as a function of the initial values:
q(t) = √( q₀² + p₀²/(m²ω²) ) cos( ωt + cos⁻¹[ q₀ / √( q₀² + p₀²/(m²ω²) ) ] ).
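The final formula is easy to check numerically. The sketch below (not part of the exercise) verifies the initial conditions and the equation of motion; note that with the phase cos⁻¹(q₀/A) ∈ [0, π] the initial velocity is nonpositive, so we take p₀ ≤ 0 on this branch.

```python
import math

# Check (not part of the exercise) of the part (c) solution
# q(t) = A cos(w t + delta), with A and delta as in the formula above.
m, w, q0, p0 = 1.2, 3.0, 0.4, -0.9   # p0 <= 0 for this branch of cos^-1
A = math.sqrt(q0**2 + p0**2 / (m**2 * w**2))
delta = math.acos(q0 / A)

def q(t):
    return A * math.cos(w * t + delta)

# Initial conditions: q(0) = q0 and p(0) = m q'(0) = p0.
h = 1e-6
assert math.isclose(q(0), q0)
assert abs(m * (q(h) - q(-h)) / (2 * h) - p0) < 1e-6

# Equation of motion q'' = -w^2 q at a sample time.
h2, t0 = 1e-5, 0.37
qdd = (q(t0 + h2) - 2 * q(t0) + q(t0 - h2)) / h2**2
assert abs(qdd + w**2 * q(t0)) < 1e-4
```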

6.2 (Central field). Consider the motion of a particle in a central field as in


section 3.1. In polar coordinates, we have
K = (m/2)(ṙ² + r²φ̇²),   V = V(r),
p_r = ∂L/∂ṙ = mṙ,   p_φ = ∂L/∂φ̇ = mr²φ̇,
H = (1/2m)( p_r² + r⁻²p_φ² ) + V(r).

(a) This total energy is both time-independent and cyclic in φ, and so we


expect an action of the form

S = W1 (r) + αφ φ − Et.

Plug this into the Hamilton–Jacobi equation (6.16) for conservative sys-
tems and integrate to arrive at
W = W₁(r) + α_φ φ = ± ∫ √( 2m(E − V(r)) − α_φ² r⁻² ) dr + α_φ φ.

(b) Use (6.17) to obtain the implicit equations of motion


β₁ = ± ∫ m / √( 2m(E − V(r)) − α_φ² r⁻² ) dr − t,
β₂ = ± ∫ −α_φ r⁻² / √( 2m(E − V(r)) − α_φ² r⁻² ) dr + φ.

These match what we found in equations (3.5) and (3.6) where αφ = M


is the angular momentum, which is the constant associated to the cyclic
coordinate φ.

6.3 (Kepler’s problem). We will find the frequency of oscillations for Kepler’s
problem using action variables without solving the equations of motion. Con-
sider a particle of mass m in an inverse-square central force field, as in section 3.3:

K = (m/2)(ṙ² + r²θ̇² + r² sin²θ φ̇²),   V = −kr⁻¹,
p_r = ∂L/∂ṙ = mṙ,   p_θ = ∂L/∂θ̇ = mr²θ̇,   p_φ = ∂L/∂φ̇ = mr² sin²θ φ̇,
H = (1/2m)( p_r² + p_θ²/r² + p_φ²/(r² sin²θ) ) − k/r.

The constant k is positive, so that we have an attractive force in order to have


bounded motion.

(a) Write down the Hamilton–Jacobi equation (6.16) for this conservative sys-
tem. Notice that all of the coordinates are separable, and so the charac-
teristic function is of the form

W = Wr (r) + Wθ (θ) + Wφ (φ).

(b) The total energy is cyclic in φ and thus has constant ∂W/∂φ = ∂W_φ/∂φ = α_φ,
which is the angular momentum about the z-axis. Plug this in, group the
terms involving only θ, and conclude that
( ∂W_θ/∂θ )² + α_φ²/sin²θ = α_θ²,   (1/2m)[ ( ∂W_r/∂r )² + α_θ²/r² ] − k/r = E.

The equations for W_φ, W_θ, and W_r demonstrate the conservation of angu-
lar momentum about the z-axis p_φ, total angular momentum p, and total
energy E, and from here they could be integrated to obtain the equations
of motion.

(c) Use the three differential equations of part (b) to obtain the action vari-
ables:

I_φ = α_φ = p_φ,
I_θ = (1/2π) ∮ √( α_θ² − α_φ²/sin²θ ) dθ,
I_r = (1/2π) ∮ √( 2mE + 2mk/r − α_θ²/r² ) dr.

(d) Let us look at the second action variable Iθ . We know from section 3.1
that this motion is coplanar, so let ψ denote the angle in the plane of orbit.
Set the momentum in the (r, θ, φ) variables and (r, ψ) variables equal, and
conclude that pθ dθ = p dψ − pφ dφ. Conclude that

Iθ = p − pφ = αθ − αφ .

(e) Now for the third action variable Ir . The integral for Ir is evaluated
between two turning points r1 , r2 for which the integrand pr = mṙ must
vanish. We can therefore integrate from r2 to r1 and back to r2 , for which
the integrand is first negative then positive, corresponding to the sign of
the momentum pr = mṙ. In the complex plane, this integrand is analytic
everywhere but r = 0 and along the segment on the real axis connecting
r1 and r2 . Integrate around a counterclockwise contour enclosing r1 and
r2 to obtain
I_r = −α_θ + mk/√(−2mE) = −(I_θ + I_φ) + mk/√(−2mE).
Note that the energy

E(I) = − mk² / ( 2(I_r + I_θ + I_φ)² )

is symmetric in the three action variables, and so the frequency of oscilla-


tions in each coordinate r, θ, φ is the same:

∂E/∂I_r = ∂E/∂I_θ = ∂E/∂I_φ = mk² / (I_r + I_θ + I_φ)³.

This agrees with the fact that the force is rotationally symmetric.
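The common frequency can also be cross-checked against Kepler's third law. The sketch below (not part of the exercise) uses the standard relation E = −k/(2a) for the potential −k/r, with a the semi-major axis, to relate the action sum to the orbit:

```python
import math

# Cross-check (not from the text): the frequency dE/dI = m k^2 / J^3 with
# J = I_r + I_theta + I_phi reproduces Kepler's third law w = sqrt(k/(m a^3)),
# using E = -k/(2a) = -m k^2/(2 J^2).
m, k, a = 2.0, 5.0, 1.7
E = -k / (2 * a)                      # energy of an orbit with semi-major axis a
J = math.sqrt(m * k**2 / (-2 * E))    # action sum implied by E(I)
w_action = m * k**2 / J**3
w_kepler = math.sqrt(k / (m * a**3))
assert math.isclose(w_action, w_kepler)
```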
PART III

HAMILTONIAN MECHANICS

Hamiltonian mechanics treats the position and momentum variables as in-


dependent coordinates on phase space. This yields Hamilton’s equations of
motion—a system with two first-order differential equations for each degree of
freedom—which portrays the system’s motion as the flow of a particular vector
field on phase space. While we are still free to choose coordinates, the position
and momentum variables are of course not truly independent quantities, and
consequently permissible changes of variables must preserve the class of such
vector fields. The underlying structure of these vector fields induces a geometry
on phase space, which is the foundation of symplectic and contact structures
in differential geometry.

CHAPTER 7

HAMILTON’S EQUATIONS

In this chapter we will develop Hamilton’s equations and Poisson structure


on Euclidean space. The material for this chapter is based on [MZ05], [Arn89,
Ch. 3], [Gol51, Ch. 7–8], and [LL76, Ch. 7].

7.1. Hamilton’s equations

A conservative system in Euclidean space with n degrees of freedom obeys


Newton’s equation (1.3), which reads
ṗ = −∇V (x).
Here, we have x = (x₁, . . . , x_N) ∈ Rⁿ and pᵢ = mᵢẋᵢ, which makes this a
second-order equation in x. We can recast this as the first-order system
ẋᵢ = pᵢ/mᵢ,   ṗ = −∇V(x).
In terms of the system’s total energy
E(x, p) = Σᵢ₌₁ᴺ |pᵢ|²/(2mᵢ) + V(x),

this takes the particularly simple form


ẋ = ∂E/∂p,   ṗ = −∂E/∂x.   (7.1)
In this way, the total energy induces the flow of (x(t), p(t)) on the phase space
R2n .
Now suppose we start with a smooth function H(t, q, p) on R × Rn × Rn . In
the spirit of (7.1), we define Hamilton’s equations of motion
q̇ = ∂H/∂p,   ṗ = −∂H/∂q   (7.2)
for the Hamiltonian H(t, q, p). In particular, if H is the total energy E(x, p) of
a conservative system on Rⁿ, then (7.1) shows that Hamilton's equations (7.2)
imply Newton's equations (1.3).


Next, we will show that given a Hamiltonian, Hamilton’s equations are equiv-
alent to Lagrange’s equations for the corresponding Lagrangian. In section 7.2,
we will see how to transform a Lagrangian into a Hamiltonian, and so together
this will complete the equivalence of Hamiltonian and Lagrangian mechanics.
Proposition 7.1 (Principle of least action). The path (p(t), q(t)) is a critical
point for the functional
S(p(t), q(t)) = ∫₀ᵀ ( Σᵢ₌₁ⁿ pᵢ(t)q̇ᵢ(t) − H(p(t), q(t)) ) dt   (7.3)

over fixed-endpoint variations if and only if (p(t), q(t)) solves Hamilton’s equa-
tions (7.2).
Heuristically, we can write the action integral (7.3) as
S = ∫ (p dq − H dt).

We will revisit this interpretation in section 9.5 (cf. (9.16)).


Proof. Consider p(t) + φ(t) and q(t) + ψ(t) with φ, ψ smooth Rn -valued func-
tions satisfying ψ(0) = 0 = ψ(T ). Then

0 = (d/dε)|ε=0 S(p + εφ, q + εψ)
= ∫_0^T Σ_{i=1}^{n} [ φi q̇i + pi ψ̇i − (∂H/∂pi ) φi − (∂H/∂qi ) ψi ] dt
= [p · ψ]_{t=0}^{t=T} + ∫_0^T φ · ( q̇ − ∂H/∂p ) dt + ∫_0^T ψ · ( −ṗ − ∂H/∂q ) dt.

The first term on the RHS vanishes as ψ(0) = 0 = ψ(T ). In order for this to
vanish for all such φ, ψ, we must have that Hamilton’s equations (7.2) hold.
Next we will see how conservation of momentum and energy are manifested in
the Hamiltonian perspective. A position variable qk is cyclic if the Hamiltonian
H is independent of qk (but may depend on pk ). For such a variable, the
corresponding component of Hamilton’s equations (7.2) reads

q̇k = ∂H/∂pk , ṗk = 0. (7.4)
The second of these equations expresses the conservation of pk , which we record
in the following statement:
Proposition 7.2 (Conservation of momentum). If the Hamiltonian is inde-
pendent of a position variable qk (but possibly dependent on pk ), then the cor-
responding momentum pk is conserved.

The first equation of (7.4) is then independent of qk and thus can be directly integrated. In this way, we are left with 2n − 2 equations in terms of the integration constant pk .
When time is the cyclic coordinate, the corresponding conserved quantity is
the Hamiltonian:

Proposition 7.3 (Conservation of energy). If the Hamiltonian H is indepen-


dent of time t, then H is conserved.
Proof. The chain rule yields
(d/dt) H(q(t), p(t)) = ∂H/∂q · q̇ + ∂H/∂p · ṗ = ∂H/∂q · ∂H/∂p − ∂H/∂p · ∂H/∂q = 0,

where in the second equality we used that q(t) and p(t) solve Hamilton’s equa-
tions (7.2).

7.2. Legendre transform

The Legendre transform is an involution on the space of strictly convex func-


tions, which provides a correspondence between the Lagrangian and Hamilto-
nian perspectives. Specifically, Lagrangians and Hamiltonians are often strictly
convex in q̇ and p due to the quadratic kinetic energy term, and they are Legendre transforms of each other in those variables.
Let f : Rn → R be a smooth, nonnegative, strictly convex function with
f (0) = 0 that satisfies f (x)/|x| → +∞ as |x| → ∞. We define

f ∗ (ξ) = sup_{x∈Rn} {x · ξ − f (x)} (7.5)

to be the Legendre transform of f (x). Inside the supremum is the distance F (x, ξ) = x · ξ − f (x) between the graph of the function f (x) and the hyperplane x · ξ of slope ξ. Since f (x)/|x| → +∞ as |x| → ∞, for fixed ξ the function x 7→ F (x, ξ) must attain its maximum. Moreover, as f is strictly convex, x 7→ F (x, ξ) is strictly concave, and so there is a unique point x∗ (ξ) which maximizes F (x, ξ). The Legendre transform of f is then given by f ∗ (ξ) = F (x∗ (ξ), ξ).
In the context of classical mechanics, we will be primarily concerned with the
application of the Legendre transform to positive definite quadratic functions.
Example 7.4. Consider the quadratic function

f (x) = ½ Ax · x + b · x + c,

where A is an n × n real symmetric positive definite matrix, b ∈ Rn , and c ∈ R.


In order to maximize the distance F (x, ξ), we differentiate:

0 = (∂F/∂x)(x∗ ) = ∂/∂x [ x · ξ − ½ Ax · x − b · x − c ]|x=x∗ = ξ − Ax∗ − b.

Figure 7.1: Graphically, f ∗ (ξ) is the distance of the origin to the y-intercept of the supporting hyperplane to the graph of f (x) with slope ξ.

This equation has one critical point x∗ = A−1 (ξ − b), which must be our
maximum. Plugging this back in, we get
f ∗ (ξ) = F (x∗ (ξ), ξ)
= A−1 (ξ − b) · ξ − ½ (ξ − b) · A−1 (ξ − b) − b · A−1 (ξ − b) − c
= ½ A−1 (ξ − b) · (ξ − b) − c.
In particular, if we take A = mI, b = 0, and c = 0, then both

f (x) = ½ m|x|² , f ∗ (ξ) = |ξ|²/(2m)

are the kinetic energy once we recognize x as the velocity and ξ as the momentum.
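As a brute-force sanity check on the definition (7.5), we can approximate the supremum on a grid. The sketch below (illustrative only; the grid bounds and test slopes are arbitrary choices) recovers the closed form f ∗ (ξ) = ξ²/2 for f (x) = x²/2, i.e. Example 7.4 with A = 1, b = 0, c = 0.

```python
# Illustrative check of definition (7.5): approximate sup_x {x*xi - f(x)}
# on a uniform grid and compare with the closed form from Example 7.4.
def f(x):
    return 0.5 * x * x

def legendre(f, xi, lo=-8.0, hi=8.0, n=40001):
    # sup over x of x*xi - f(x), approximated on a uniform grid
    step = (hi - lo) / (n - 1)
    return max((lo + i * step) * xi - f(lo + i * step) for i in range(n))

err = max(abs(legendre(f, xi) - 0.5 * xi * xi)
          for xi in (-2.0, -0.5, 0.0, 1.0, 3.0))
```

Because f is strictly convex, the grid maximum is within O(step²) of the true supremum.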
The Legendre transform can be defined more generally (cf. Exercise 7.3),
but with these hypotheses on f we have that f ∗ satisfies the same hypotheses
(although they are not in one-to-one correspondence):
Theorem 7.5 (Involution property). If f : Rn → R is a smooth, nonnegative,
strictly convex function with f (0) = 0 that satisfies f (x)/|x| → +∞ as |x| → ∞,
then the Legendre transform (7.5) satisfies the same hypotheses, and we have (f ∗ )∗ = f .
Proof. It is immediate that f ∗ is nonnegative and that f ∗ (0) = 0. Moreover, f ∗ is convex, being a supremum of functions that are convex in ξ (indeed, F (x, ξ) is affine in ξ).
Next we differentiate g := f ∗ . As the unique maximizer of the distance
F (x, ξ), we know that x∗ = x∗ (ξ) is the unique solution to
0 = (∂F/∂x)(x∗ ) = ξ − ∇f (x∗ ).
The function ∇f is a local diffeomorphism on Rn by the inverse function theo-
rem, since (∇f )0 = f 00 is positive definite and hence invertible. This means that
the above equation has a unique solution
x∗ (ξ) = (∇f )−1 (ξ) (7.6)

which is smooth because ∇f is, and hence g(ξ) = F (x∗ (ξ), ξ) is smooth as well.
The inverse function theorem also tells us that the derivative of x∗ (ξ) is

∇x∗ (ξ) = (f 00 )−1 (ξ)

where f 00 (x) is the Hessian matrix of second derivatives. The first derivative of
g(ξ) := f ∗ (ξ) is then

∇g(ξ) = ∇{ξ · x∗ (ξ) − f (x∗ (ξ))} = x∗ (ξ) + ξ · ∇x∗ (ξ) − ∇f (x∗ (ξ)) · ∇x∗ (ξ) = x∗ (ξ). (7.7)

Therefore, the second derivative of g is

g 00 (ξ) = (f 00 )−1 (ξ) > 0,

which is positive definite since f 00 (x) is. That is, the Legendre transform f ∗ = g
is also strictly convex.
Now that we know

g(ξ) = x∗ (ξ) · ξ − f (x∗ (ξ))

is also convex, we may now consider its Legendre transform. Let ξ ∗ (x) be
the point which attains the supremum for g ∗ . By (7.6) this point is uniquely
determined by
ξ ∗ (x) = (∇g)−1 (x) = ∇f (x),
where in the last equality we used (7.7). Comparing this with (7.6), we see that
this is the inverse of the function x∗ (ξ). Consequently, the transform of g(ξ) is
given by

g ∗ (x) = ξ ∗ (x) · x − g(ξ ∗ (x))


= ξ ∗ (x) · x − x∗ (ξ ∗ (x)) · ξ ∗ (x) + f (x∗ (ξ ∗ (x))) = f (x),

and so (f ∗ )∗ = f as desired.
Within the context of mechanics, the importance of the Legendre transform
is contained in the following simple calculation:
Proposition 7.6. Let M be an n × n positive definite matrix. The Legendre
transformation of the conservative Lagrangian system

L (q, q̇) = ½ M q̇ · q̇ − V (q)

in q̇ is the corresponding Hamiltonian

H(q, p) = ½ M −1 p · p + V (q)

in p, and vice versa.


Proof. Apply Example 7.4 to A = M , b = 0, and c = −V (q) (which is constant with respect to the velocity q̇).

Given a Hamiltonian system, we saw in Proposition 7.1 that the correspond-


ing Lagrangian is given by (7.3) and that Hamilton’s equations are equivalent
to the principle of least action. Now, given a Lagrangian system, by Propo-
sition 7.6 we see that the corresponding Hamiltonian is given by its Legendre
transform:
H(q, p, t) := q̇ · p − L (q, q̇, t) = Σ_{i=1}^{n} (∂L /∂ q̇i ) q̇i − L .
Recall that in Proposition 4.11 we saw that the Hamiltonian determines the
system’s motion, and that the quantity H is conserved whenever the Lagrangian
L is time-independent.
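Proposition 7.6 can likewise be tested numerically. In the sketch below (an illustration, not part of the original text; the sample potential V, the mass, and the grid bounds are arbitrary choices), the Legendre transform of L(q, q̇) = ½ m q̇² − V(q) in q̇ is computed by grid maximization and compared against p²/(2m) + V(q).

```python
# Illustrative check of Proposition 7.6: Legendre-transforming the sample
# Lagrangian L(q, qd) = m*qd^2/2 - V(q) in qd reproduces the Hamiltonian
# H(q, p) = p^2/(2m) + V(q).  V, m, and the grid are arbitrary choices.
import math

m = 2.0
def V(q):
    return math.cos(q)

def L(q, qd):
    return 0.5 * m * qd * qd - V(q)

def H_via_legendre(q, p, lo=-40.0, hi=40.0, n=80001):
    # sup over qd of p*qd - L(q, qd), approximated on a uniform grid
    step = (hi - lo) / (n - 1)
    return max(p * (lo + i * step) - L(q, lo + i * step) for i in range(n))

q, p = 0.7, 3.0
direct = p * p / (2 * m) + V(q)
err = abs(H_via_legendre(q, p) - direct)
```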
We also see from Proposition 7.6 that the velocity q̇ and the momentum p
are dual variables. This is encoded in the dot product q̇ ·p in the definition (7.5),
which is also twice the kinetic energy. As the velocity q̇ is a tangent vector, then
the momentum p must be a covector and so the Hamiltonian is defined on the
cotangent bundle of phase space.

7.3. Poisson structure

Unlike the principle of least action, Hamilton’s equations (7.2) depend on


the choice of coordinates. In this section, we will see that the Poisson bracket
provides a coordinate-free reformulation of Hamilton’s equations.
Rather than just the position and momentum, let us now consider the evo-
lution of a general observable. If x(t) evolves according to the first-order ODE
system
ẋ = f (t, x),
an observable (resp. time-dependent observable) is a smooth function F
from Rdx (resp. Rt × Rdx ) into R. The complete or material derivative of a
time-dependent observable F is then
(d/dt) F (t, x) = ∂F/∂t + ∇F · f (7.8)
by the chain rule. Note that in addition to the explicit time dependence of F ,
the evolution of F is influenced by the advection term f · ∇F .
Here, our coordinates x are (q, p) ∈ R2n , which evolve under Hamilton’s
equations. Specifically, by the chain rule and Hamilton’s equations (7.2) the
material derivative of a time-dependent observable F (t, q, p) is

dF/dt = ∂F/∂t + ∂F/∂q · q̇ + ∂F/∂p · ṗ = ∂F/∂t + ∂F/∂q · ∂H/∂p − ∂F/∂p · ∂H/∂q. (7.9)

Given two time-dependent observables F and H, we define the (canonical)


Poisson bracket to be the advection term above, i.e.
{F, H} = ∂F/∂q · ∂H/∂p − ∂F/∂p · ∂H/∂q = Σ_{k=1}^{n} ( ∂F/∂qk ∂H/∂pk − ∂F/∂pk ∂H/∂qk ). (7.10)

(Note that there is another popular convention with F and H swapped, or


equivalently with an overall factor of −1.) Then from (7.9) we see that the
material derivative is given by
dF/dt = ∂F/∂t + {F, H}. (7.11)
In particular, if F (q, p) is just a usual observable (i.e. time-independent), then
F is conserved by the flow of the Hamiltonian H if and only if {F, H} = 0.
The evolution (7.11) of observables in terms of the Poisson bracket can be
taken as alternative equations of motion. Indeed, (7.11) implies Hamilton’s
equations (7.2) by taking the observable F to be a position or momentum co-
ordinate:
q̇i = {qi , H} = ∂H/∂pi , ṗi = {pi , H} = −∂H/∂qi .
∂pi ∂qi
As a side note, the equation of motion (7.11) has the added benefit that it is
analogous to the Heisenberg formulation of quantum mechanics, where Poisson
brackets are replaced by operator commutators.
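The canonical bracket (7.10) is straightforward to approximate with central finite differences. The following sketch (the helper names and sample observables are my own choices, not from the text) checks the relation {q, p} = 1 and the antisymmetry of the bracket at an arbitrary phase-space point.

```python
# Finite-difference sketch of the canonical bracket (7.10) on R^2 (n = 1).
# Helper names and test functions are arbitrary illustrative choices.
h = 1e-5

def partial(F, i, z):
    # central difference in coordinate i at the point z = (q, p)
    zp, zm = list(z), list(z)
    zp[i] += h
    zm[i] -= h
    return (F(*zp) - F(*zm)) / (2 * h)

def bracket(F, G, z):
    # {F, G} = (dF/dq)(dG/dp) - (dF/dp)(dG/dq)
    return partial(F, 0, z) * partial(G, 1, z) - partial(F, 1, z) * partial(G, 0, z)

q_ = lambda q, p: q
p_ = lambda q, p: p
F = lambda q, p: q * q * p
G = lambda q, p: p * p + q

z = (0.7, -1.3)
rel_err = abs(bracket(q_, p_, z) - 1.0)             # {q, p} = 1
anti_err = abs(bracket(F, G, z) + bracket(G, F, z))  # {F, G} = -{G, F}
```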
Next, we record some properties of the Poisson bracket. It is easy to see from the definition (7.10) that the Poisson bracket is bilinear (i.e. linear in each entry) and antisymmetric ({F, G} = −{G, F }). The bracket also possesses a product (or Leibniz) rule,

{F G, H} = F {G, H} + G {F, H} ,

which can be seen by using the product rule for F G and expanding. We also
have the chain rule
{F, g(H)} = g 0 (H){F, H}. (7.12)
Although slightly less obvious, we claim that the Poisson bracket also satisfies
the Jacobi identity:

{F, {G, H}} + {G, {H, F }} + {H, {F, G}} = 0. (7.13)

Let’s focus on one of the terms above, say, {H, {F, G}}. As {F, G} is a linear expression in terms of the first derivatives of F and G, {H, {F, G}} is also a linear expression with each term containing exactly one second derivative of F or G. Let DG (φ) = {φ, G} and DH (φ) = {φ, H}. Note that the first term
of (7.13) does not contain any second derivatives of F , and so all of the second
derivatives of F are contained within

{G, {H, F }} + {H, {F, G}} = {G, {H, F }} − {H, {G, F }} = (DG DH − DH DG )F.

To simplify notation and observe some cancellation, we write the operators DG



and DH as

DG = Σ_j ( ∂G/∂pj ∂/∂qj − ∂G/∂qj ∂/∂pj ) = Σ_k ξk ∂/∂xk ,
DH = Σ_j ( ∂H/∂pj ∂/∂qj − ∂H/∂qj ∂/∂pj ) = Σ_k ηk ∂/∂xk ,

where x = (q, p), ξ = (∂G/∂p, −∂G/∂q), and η = (∂H/∂p, −∂H/∂q). Therefore
DG DH = Σ_k ξk ∂/∂xk ( Σ_ℓ ηℓ ∂/∂xℓ ) = Σ_{k,ℓ} ( ξk ∂ηℓ /∂xk ∂/∂xℓ + ξk ηℓ ∂²/∂xk ∂xℓ ),
DH DG = Σ_k ηk ∂/∂xk ( Σ_ℓ ξℓ ∂/∂xℓ ) = Σ_{k,ℓ} ( ηk ∂ξℓ /∂xk ∂/∂xℓ + ηk ξℓ ∂²/∂xk ∂xℓ ).

In taking the difference of these, we see that the second terms in these last
equalities cancel and we are left with
DG DH − DH DG = Σ_{k,ℓ} ( ξk ∂ηℓ /∂xk − ηk ∂ξℓ /∂xk ) ∂/∂xℓ .

That is, all of the second derivatives of F cancel, leaving only first derivatives. By symmetry the same is true for G and H. Since every term of (7.13) contains exactly one second derivative of F , G, or H, we conclude that all of the terms must cancel.
In order to make the equation of motion (7.11) entirely coordinate-free, we
need a new coordinate-free definition of the Poisson bracket. Specifically, out of
the properties we just observed, we take the following as an abstract definition:

Definition 7.7. A Poisson bracket or Poisson structure on Rd is a function


{·, ·} : C ∞ (Rd ) × C ∞ (Rd ) → C ∞ (Rd ) satisfying:
(a) (Antisymmetry) {F, G} = −{G, F } for all F, G ∈ C ∞ (Rd ).

(b) (Bilinearity) {λF +µG, H} = λ{F, H}+µ{G, H} for all F, G, H ∈ C ∞ (Rd )


and λ, µ ∈ R.
(c) (Product rule) {F G, H} = {F, H}G + F {G, H} for all F, G, H ∈ C ∞ (Rd ).
(d) (Jacobi identity) {{F, G}, H} + {{G, H}, F } + {{H, F }, G} = 0 for all
F, G, H ∈ C ∞ (Rd ).
We also say that a Poisson bracket is nondegenerate if ∇F (x) ≠ 0 implies that there exists G ∈ C ∞ (Rd ) so that {F, G}(x) ≠ 0.

In fact, this definition can even be generalized to manifolds. However, we


choose to focus on the Euclidean case for clarity; we will consider general man-
ifolds when we develop symplectic geometry.
Recall the following fact from differential geometry (e.g. [Lee13, Prop. 8.15]):
first-order differential operators are characterized by being linear and satisfying
the product rule.
Proposition 7.8. If L : C ∞ (Rd ) → C ∞ (Rd ) is linear and satisfies the product
rule, then there exist smooth coefficients X 1 , . . . , X d so that
(LF )(x) = Σ_{i=1}^{d} X i (x) (∂F/∂xi )(x).

As a consequence, we see that any Poisson bracket must be given by a


structure matrix J : Rd → Rd×d :
Corollary 7.9. If {·, ·} is a Poisson bracket, then there exist smooth functions
J ij : Rd → R such that
{F, G}(x) = Σ_{i,j=1}^{d} J ij (x) (∂F/∂xi )(∂G/∂xj ).

Moreover, J is antisymmetric (J ij = −J ji for all i, j) and we have the Jacobi


identity:
Σ_{k=1}^{d} ( ∂J ij/∂xk J kℓ + ∂J ℓi/∂xk J kj + ∂J jℓ/∂xk J ki ) = 0 (7.14)

for all i, j, ℓ. Lastly, {·, ·} is nondegenerate if and only if det J is nonvanishing.


Next, we turn to some examples. In the case d = 2n, our first defini-
tion (7.10) of the Poisson bracket provides a key example:
Example 7.10 (Canonical Poisson bracket). Let (q, p) denote the coordinates
on Rn × Rn , and consider the canonical Poisson bracket (7.10). The structure
matrix for this bracket in these coordinates is
 
J0 = [ 0 I; −I 0 ].

The properties of Definition 7.7 are easily verifiable for this matrix. Equivalently,
the entries of this matrix are uniquely determined by the relations

{qi , qj } = 0 = {pi , pj }, {qi , pj } = δij for all i, j = 1, . . . , n.

These relations are called Hamilton’s commutation relations or the canon-


ical relations, and they are one of the remnants of quantum mechanics for
macroscopic dynamics.
The matrix J0 has the property that J0² = −I. Consequently, we may think of J0 as (a choice of) √−1, which is part of the reason why we use the letter

J. Indeed, nondegenerate Poisson brackets are closely related to “almost com-


plex structures”; this relationship is further explored for the canonical structure
matrix J0 in Exercise 9.5.
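The observation J0² = −I (together with the antisymmetry of J0) can be checked in a few lines. The sketch below is pure Python with n = 2 chosen arbitrarily; it is illustrative code, not part of the original notes.

```python
# Minimal check: the canonical structure matrix J0 = [0 I; -I 0] is
# antisymmetric and squares to -I.  Here n = 2 (so d = 4) is arbitrary.
n = 2
d = 2 * n
J0 = [[0.0] * d for _ in range(d)]
for i in range(n):
    J0[i][n + i] = 1.0    # upper-right block:  +I
    J0[n + i][i] = -1.0   # lower-left block:   -I

def matmul(A, B):
    # naive matrix product, enough for a 4x4 check
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]

antisym = all(J0[i][j] == -J0[j][i] for i in range(d) for j in range(d))
J0sq = matmul(J0, J0)
squares_to_minus_I = all(J0sq[i][j] == (-1.0 if i == j else 0.0)
                         for i in range(d) for j in range(d))
```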
We can also have a Poisson bracket on Rd for d ≠ 2n:

Example 7.11 (Casimirs). We augment Example 7.10 by adding coordinates


y i with
{F, y i } ≡ 0
for all smooth F . Any such function y i is called a Casimir. The structure
matrix is now

J = [ 0 I 0; −I 0 0; 0 0 0 ].
In canonical coordinates, introducing Casimirs is relatively uninteresting; how-
ever, the ability to identify a Casimir for an arbitrary Poisson bracket turns out
to be quite useful in practice.

7.4. Hamiltonian vector fields

Given a first-order ODE


ẋ = f (t, x),
we saw in (7.8) that the material derivative of an observable F (x) is the
advection term f · ∇F . Consequently, we can identify the RHS f geometrically
with the vector field f · ∇. As a choice of Poisson bracket and Hamiltonian
produce a first-order ODE via (7.8), we would now like to explore this connection
in terms of vector fields.
Specifically, the map F ↦ {F, H} defines a vector field XH via
{F, H} = XH (F ), where XH = J∇H, i.e. (XH )i = Σ_{j=1}^{d} J ij (x) ∂H/∂xj .

(Again, there is another popular convention that comes from swapping the order
of F and H in the definition of {F, H}.) We call XH the Hamiltonian vector
field associated to H. This is not the only way that we can turn an observable
H into a vector field. Indeed, the gradient vector field ∇H induces the gradient
flow (cf. (2.5)), and this is associated to the dot product structure F ↦ ∇F · ∇H.
The Hamiltonian vector field is the analogous object for the Poisson structure.
In Hamiltonian mechanics, the state of the system is described by a point
x in Rd (or on a manifold) endowed with a Poisson structure. The evolution is
dictated by a Hamiltonian (or total energy) H : Rd → R via Hamilton’s
equations
ẋ = XH = J∇H. (7.15)

Equivalently, the evolution of an observable F (x) is


Ḟ = ∇F · ẋ = ∇F · XH = Σ_{i,j=1}^{d} J ij (∂F/∂xi )(∂H/∂xj ) = {F, H}.

These are the generalization of (7.2) and (7.11) to arbitrary Poisson structures.
For future reference, we also note that the above calculation implies

XH F = {F, H} = −{H, F } = −XF H. (7.16)

In other words, the evolution of F under the flow of H can be recast as the
evolution of H under the flow of F .
Example 7.12. To put classical mechanics in this framework, we adopt the
canonical bracket of Example 7.10. Here, qi are the position coordinates and pi
are the corresponding momenta. The Hamiltonian is the total energy
H(q, p) = p²/(2m) + V (q).

This yields the familiar equations of motion:


q̇ = {q, H} = (1/2m) 2p {q, p} + V ′ (q){q, q} = p/m + 0,
ṗ = {p, H} = (1/2m) 2p {p, p} + V ′ (q){p, q} = 0 − V ′ (q),

where we used that the Poisson bracket satisfies the chain rule (7.12). The first
equation is the definition of momentum, and the second equation is Newton’s
equation.
Example 7.13. For a free relativistic particle, the total energy is given by
Einstein’s equation

H = √(c² p² + m² c⁴).
This yields
q̇ = cp/√(m² c² + p²) , ṗ = 0.
In particular, we see that |q̇| ≤ c with equality if and only if m = 0. These
equations of motion can be used to derive the Lorentz formulas, which lie at the
heart of relativistic mechanics. In fact, Hamiltonian mechanics is compatible
with relativity. However, relativity is not built into the Hamiltonian framework, and it turns out to be easier to incorporate relativity into Lagrangian mechanics.
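A quick numerical illustration of Example 7.13, working in natural units c = m = 1 (a choice made only for this sketch, which is not part of the original notes): the velocity q̇ = cp/√(m²c² + p²) is strictly subluminal for every momentum and approaches c as |p| grows.

```python
# Illustration of Example 7.13 in natural units c = m = 1 (an assumption
# for this sketch): the velocity is always strictly less than c, and tends
# to c in the ultrarelativistic limit p >> mc.
import math

c, m = 1.0, 1.0

def qdot(p):
    # dH/dp for H = sqrt(c^2 p^2 + m^2 c^4)
    return c * p / math.sqrt(m * m * c * c + p * p)

speeds = [abs(qdot(p)) for p in (-5.0, 0.0, 0.1, 2.0, 1e6)]
all_subluminal = all(s < c for s in speeds)
ultrarelativistic = abs(qdot(1e12) - c) < 1e-12
```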
Next, we record some easy consequences:
Lemma 7.14. (a) The observable F is conserved under the flow of H if and
only if {F, H} ≡ 0 (and we may swap F, H in either of these statements).
(b) We have [XF , XH ] = −X{F,H} .
(c) If {F, H} is a constant, then the F -flow commutes with the H-flow.

Proof. (a) This follows from the definition XH F = {F, H}, and we may swap
F and H in either statement by the observation (7.16).
(b) We compute

XF (XH G) = XF {G, H} = {{G, H}, F },


XH (XF G) = {{G, F }, H} = −{{F, G}, H}.

Subtracting these and recalling the Jacobi identity, we have

[XF , XH ]G = {{G, H}, F } + {{F, G}, H}


= −{{H, F }, G} = −{G, {F, H}} = −X{F,H} G.

(c) Note that 0 = [XF , XH ] = −X{F,H} requires that


Σ_{j=1}^{d} J ij ∂{F, H}/∂xj = 0.

In particular, this happens if the gradient of {F, H} vanishes.


Example 7.15. Consider the canonical Poisson bracket on R3 . The compo-
nents of the angular momentum L = q × p generate rotation about the cor-
responding coordinate axes. More generally, the flow of the Hamiltonian a · L
is rotation about the a-axis. Thus, if H is conserved by rotations, then part (a) allows us to turn this statement around: the flow of H conserves L. Analogous conclusions hold component by component. Note how this is
an example of Noether’s theorem: a symmetry implies a conservation law.
To conclude this section, we will record one more observation about Poisson
bracket:
Proposition 7.16 (Poisson’s theorem). If the observables F and G are con-
served under the flow of H, then their Poisson bracket {F, G} is also conserved.
Proof. We use the Jacobi identity (7.13) and note that {F, H} = 0 = {G, H}:

0 = {{F, G}, H} + {{G, H}, F } + {{H, F }, G}


= {{F, G}, H} + {{G, H}, F } − {{F, H}, G}
= {{F, G}, H}.

The RHS is exactly the time derivative of {F, G} under the flow of H.
Example 7.17. If L1 = x2 p3 − x3 p2 and L2 = x3 p1 − x1 p3 are the first and
second components of the angular momentum L = x × p of a particle x ∈ R3 ,
then their bracket

{L1 , L2 } = {x2 p3 , x3 p1 } + {x3 p2 , x1 p3 } = −x2 p1 + p2 x1 = L3

is the remaining component of the angular momentum.
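Example 7.17 can be spot-checked numerically with a finite-difference version of the canonical bracket on R³ × R³ (the helper code and the sample phase-space point below are my own choices, not from the text):

```python
# Spot check of Example 7.17: with the canonical bracket on R^3 x R^3,
# the angular-momentum components satisfy {L1, L2} = L3.
h = 1e-5

def grad(F, z):
    g = []
    for i in range(6):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((F(zp) - F(zm)) / (2 * h))
    return g

def bracket(F, G, z):
    # z = (x1, x2, x3, p1, p2, p3); sum_k dF/dx_k dG/dp_k - dF/dp_k dG/dx_k
    gF, gG = grad(F, z), grad(G, z)
    return sum(gF[k] * gG[3 + k] - gF[3 + k] * gG[k] for k in range(3))

L1 = lambda z: z[1] * z[5] - z[2] * z[4]   # x2 p3 - x3 p2
L2 = lambda z: z[2] * z[3] - z[0] * z[5]   # x3 p1 - x1 p3
L3 = lambda z: z[0] * z[4] - z[1] * z[3]   # x1 p2 - x2 p1

z = [0.3, -1.1, 0.7, 0.2, 0.9, -0.4]
err = abs(bracket(L1, L2, z) - L3(z))
```

Since the angular momenta are bilinear in the coordinates, central differences are exact here up to rounding.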



Although Poisson’s theorem is powerful, it should be noted that the new quantity {F, G} is not guaranteed to be nontrivial. For example, a system with n degrees of freedom
can only have up to 2n − 1 independent conserved quantities, and so repeated
applications of Proposition 7.16 must eventually stop producing independent
quantities.

7.5. Darboux theorem

Now that we have a general notion of a Poisson structure on Rd , we would


like to understand what the possibilities are for how it behaves locally. The Dar-
boux theorem says that Example 7.11 is in fact the only possibility. Although
this result most commonly is associated with Darboux, the Poisson bracket
formulation is due to Lie [Lie80].
Theorem 7.18 (Darboux theorem). If the rank of J(x) is independent of x,
then for any x0 ∈ Rd there exist coordinates (p1 , . . . , pn , q1 , . . . , qn , y1 , . . . , yk )
on a neighborhood of x0 so that

{qi , pj } = δij , {pi , pj } = 0 = {qi , qj }, {·, yℓ } = 0

for all i, j, ℓ.

If the rank of J(x) is full—i.e. the Poisson bracket is nondegenerate—then


this immediately implies the well-known Darboux theorem about symplectic
structures (Theorem 9.3).
If the rank of J(x) is not full, then we must have k ≥ 1 many coordinates yℓ . Such an observable yℓ is called a Casimir. Note that {·, yℓ } = 0 implies that yℓ is conserved by all Hamiltonian flows.
Proof. We follow the proof from [Wei83].
Fix x0 ∈ Rd . For ease of notation, we translate so that x0 = 0. We will
induct on the rank of J near x0 .
For the base case, suppose that the rank of J is zero near x0 . Then J ≡ 0,
and so we take y1 , . . . , yd to be any coordinates on Rd .
Next we turn to the inductive step. Assume that the rank of J near x0 is
nonzero. Choose a function p1 so that Xp1 (x0 ) ≠ 0. In particular, we must
have ∇p1 (x0 ) ≠ 0. By Proposition A.18, we can pick a full set of coordinates x1 , . . . , xd so that ∂/∂x1 = Xp1 . We then take q1 (x) = x1 , which automatically satisfies

{q1 , p1 } = Xp1 q1 = ∂x1 /∂x1 = 1.
We claim that Xp1 and Xq1 at x0 must be linearly independent. Suppose
that λXp1 + µXq1 = 0 at x0 for some µ, λ ∈ R. Then

(λXp1 + µXq1 )q1 = λ{q1 , p1 } + µ{q1 , q1 } = λ,


(λXp1 + µXq1 )p1 = −µ,

and so λ = µ = 0 as desired.
By the claim and Proposition A.19, we can find a new full set of coordinates
x1 , . . . , xd so that
∂/∂x1 = Xp1 , ∂/∂x2 = Xq1 .
It is important to know that it is possible to extend to a full set of coordinates

at this step. Indeed, in order to even define ∂/∂x1 we need a full set of coordinates. (In statistical mechanics, the notation (∂E/∂V )P is expressly intended to resolve this issue.)
Next we claim that (p1 , q1 , x3 , . . . , xd ) are also valid coordinates on a neigh-
borhood of x0 —i.e. their gradients at x0 are linearly independent. Suppose
that
0 = ∇( c1 p1 + c2 q1 + Σ_{ℓ=3}^{d} cℓ xℓ )(x0 )
for some constants c1 , . . . , cd . Then
0 = { c1 p1 + c2 q1 + Σ_{ℓ=3}^{d} cℓ xℓ , p1 } = c1 {p1 , p1 } + c2 {q1 , p1 } + Σ_{ℓ=3}^{d} cℓ ∂xℓ /∂x1 = c2 ,

and so c2 = 0. Similarly,
0 = { c1 p1 + c2 q1 + Σ_{ℓ=3}^{d} cℓ xℓ , q1 } = −c1 .

Now we have a linear combination of the gradients of x3 , . . . , xd at x0 , and since


x1 , . . . , xd are coordinates then we must have c3 = · · · = cd = 0 as desired.
Finally, we would like to invoke the inductive hypothesis for the Poisson
structure restricted to the domain of x3 , . . . , xd . First we check that {xk , x` }
are independent of p1 , q1 for all k, ℓ = 3, . . . , d. Using the Jacobi identity, we
compute

(∂/∂x1 ){xk , xℓ } = Xp1 {xk , xℓ } = {{xk , xℓ }, p1 } = −{{p1 , xk }, xℓ } − {{xℓ , p1 }, xk } = 0,

where in the last equality we noted that the inner Poisson brackets vanish since Xp1 = ∂/∂x1 . A similar computation shows that (∂/∂x2 ){xk , xℓ } = 0.
Therefore the Poisson bracket restricted to the domain of x3 , . . . , xd is well-
defined and has its rank reduced by two. Indeed, by our construction the struc-
ture matrix J in the coordinates x1 , . . . , xd is a block diagonal matrix whose leading 2 × 2 block is

[ 0 1; −1 0 ].

By the inductive hypothesis, we can find a diffeomorphic change of variables


from (x3 , . . . , xd ) to (q2 , . . . , qn , p2 , . . . , pn , y1 , . . . , yk ) near x0 so that the new
coordinates obey the canonical Poisson bracket relations. Adding in p1 , q1 then
yields the desired coordinates.

7.6. Canonical transformations

A fundamental feature of the Hamiltonian perspective is that the positions


and momenta are treated as independent coordinates. Physically, we know that
this is not true—the momentum is defined in terms of the position by (1.1).
Consequently, an arbitrary change of variables will not preserve the physical
system. Canonical transformations are those changes of variables which are
admissible.
Given a Poisson structure on Rd , we say that a map Ψ : Rd → Rd is canonical
if it preserves the bracket:

{F ◦ Ψ, G ◦ Ψ} = {F, G} ◦ Ψ for all F, G. (7.17)

In other words, Ψ satisfies


Σ_{i,j=1}^{d} (∂Ψk /∂xi ) J ij (∂Ψℓ /∂xj ) = J kℓ ◦ Ψ. (7.18)

If J is also nondegenerate, then we call Ψ a symplectomorphism because the


Poisson bracket induces a symplectic structure. Note that if Ψ is a symplecto-
morphism, then Ψ is automatically a diffeomorphism. Indeed, we can see that
Ψ0 is invertible by taking determinants in (7.18).
As a warning, some authors define canonical transformations as any diffeo-
morphism which preserves Hamilton’s equations (7.2). It is false that these
two definitions are equivalent, and this point is even neglected in the litera-
ture (cf. [LL76, Sec. 45]). Indeed, the condition that a transformation preserves
Hamilton’s equations (7.2) is weaker and less geometrically significant. For ex-
ample, P = 2p, Q = q satisfies (7.2) but not (7.18).
The definition (7.18) is closely related to the symplectic group Sp(R2n ),
which is comprised of the invertible 2n × 2n matrices U such that
 
U T J0 U = J0 , where J0 = [ 0 I; −I 0 ]. (7.19)

We will soon make this relationship precise in Proposition 7.20. Note that if
we replace the canonical structure matrix J0 by the identity matrix I then we
recover the orthogonal group; this reflects that I is the structure matrix for the
dot product.
First, we record some facts about Sp(R2n ):
Proposition 7.19. Sp(R2n ) is a group that is closed under transposition.

Proof. To see that Sp(R2n ) is a group, we check that it is closed under inversion.
Suppose that U satisfies (7.19). Then U has nonzero determinant, and so by
multiplying by U −1 and U −T we see that J0 = U −T J0 U −1 .
The group Sp(R2n ) is also closed under transposition. This does not fol-
low from taking the transpose of (7.19) however, because taking the transpose
of (7.19) does not yield anything new. Instead, we take the inverse of (7.19) to
obtain −U −1 J0 U −T = −J0 . As Sp(R2n ) is closed under inversion, we conclude
that U T ∈ Sp(R2n ).
In the case of the canonical Poisson bracket, we have:
Proposition 7.20. Suppose J = J0 . Then Ψ is canonical if and only if Ψ0 ∈
Sp(R2n ).
Proof. The relation (7.18) reads Ψ0 J0 (Ψ0 )T = J0 , and we know that Sp(R2n ) is
closed under transposition.
Note that the block matrix
 
U = [ A B; C D ]

lies in Sp(R2n ) if and only if

AT C = C T A, B T D = DT B, AT D − C T B = I. (7.20)

Taking determinants, we see that (det U )2 = 1. It is not obvious, but we actually


have det U = 1; this is proved in Exercise 7.6. In particular, symplectic matrices
preserve both volume and orientation.
The symplectic group is not only similar to the orthogonal group on R2n
and the unitary group on Cn , but they are also deeply connected. One facet of
this relationship is illustrated in Exercise 7.6. As a warning, unlike orthogonal
and unitary matrices, symplectic matrices are not always diagonalizable. Indeed,
from the conditions (7.20) we see that
 
[ I B; 0 I ] with B = B T
is symplectic. When B = I, this matrix is essentially a Jordan block and hence
is not diagonalizable.
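The defining relation (7.19) is easy to verify directly for small matrices. The sketch below (pure Python, n = 1; illustrative code, not part of the original notes) confirms that the shear [[1, b], [0, 1]] discussed above is symplectic, while a matrix of determinant 2 is not.

```python
# Concrete check of (7.19) for n = 1: U is symplectic iff U^T J0 U = J0.
# With these integer-valued entries, floating-point equality is exact.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

J0 = [[0.0, 1.0], [-1.0, 0.0]]

def is_symplectic(U):
    return matmul(transpose(U), matmul(J0, U)) == J0

ok_shear = is_symplectic([[1.0, 7.0], [0.0, 1.0]])    # shear with b = 7
ok_det = not is_symplectic([[2.0, 0.0], [0.0, 1.0]])  # det = 2, not symplectic
```

For 2 × 2 matrices the symplectic condition reduces to det U = 1, consistent with Exercise 7.6.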
Lastly, we record the following criteria to check if a transformation is canon-
ical in practice:
Proposition 7.21. Suppose that we have a nondegenerate Poisson structure on
R2n . Then Ψ(q, p) = (Q, P ) is a canonical transformation (and hence a sym-
plectomorphism) if and only if the new variables satisfy the canonical relations
with respect to the old variables:

{Qi , Qj }p,q = 0, {Pi , Pj }p,q = 0, {Qi , Pj }p,q = δij

for all i, j = 1, . . . , n.

Proof. We may assume that the Poisson bracket is given in canonical coordinates
(q, p) by Theorem 7.18.
The forward implication is immediate, since we can replace the differentia-
tion variables q, p with Q, P by assumption. For example,

{Qi , Qj }q,p = {Qi , Qj }Q,P = 0.

For the reverse implication, the premise tells us that


[ 0 I; −I 0 ] = [ ∂Q/∂q ∂Q/∂p; ∂P/∂q ∂P/∂p ] [ 0 I; −I 0 ] [ ∂Q/∂q ∂Q/∂p; ∂P/∂q ∂P/∂p ]T .

So given arbitrary observables F, G we have


{F, G}Q,P = ( ∂F/∂Q , ∂F/∂P ) [ 0 I; −I 0 ] ( ∂G/∂Q ; ∂G/∂P )
= ( ∂F/∂Q , ∂F/∂P ) [ ∂Q/∂q ∂Q/∂p; ∂P/∂q ∂P/∂p ] [ 0 I; −I 0 ] [ ∂Q/∂q ∂Q/∂p; ∂P/∂q ∂P/∂p ]T ( ∂G/∂Q ; ∂G/∂P )
= ( ∂F/∂q , ∂F/∂p ) [ 0 I; −I 0 ] ( ∂G/∂q ; ∂G/∂p ) = {F, G}q,p .

Therefore Ψ preserves the Poisson bracket in the sense of (7.17).
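Proposition 7.21 gives a practical test that we can run numerically (n = 1 here; the two sample maps are my own choices, not from the text): (Q, P) = (p, −q) satisfies the canonical relations, while the rescaling P = 2p, Q = q from section 7.6 does not.

```python
# Numerical use of Proposition 7.21 for n = 1, with a finite-difference
# canonical bracket.  The two sample transformations are arbitrary choices.
h = 1e-6

def bracket(F, G, z):
    def partial(F, i):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        return (F(*zp) - F(*zm)) / (2 * h)
    return partial(F, 0) * partial(G, 1) - partial(F, 1) * partial(G, 0)

z = (0.4, 1.7)
# canonical: {Q, P}_{q,p} = 1 for (Q, P) = (p, -q)
rotate_ok = abs(bracket(lambda q, p: p, lambda q, p: -q, z) - 1.0) < 1e-8
# not canonical: {Q, P}_{q,p} = 2 instead of 1 for (Q, P) = (q, 2p)
scale_bad = abs(bracket(lambda q, p: q, lambda q, p: 2 * p, z) - 2.0) < 1e-8
```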

7.7. Liouville’s theorem

In this section we will prove Liouville’s theorem, which says that Hamiltonian
flows preserve Lebesgue measure on phase space. In other words, the density of
trajectories in phase space surrounding a given trajectory is constant in time.
Given a Borel measure µ on Rd , we say that a continuous map Φ : Rd → Rd
preserves the measure µ if

µ(Φ−1 (A)) = µ(A) for all A ⊂ Rd Borel measurable. (7.21)

(As in measure theory, we need to use Φ−1 here instead of Φ because Φ−1 (A)
is Borel when A is by the continuity of Φ, but Φ(A) might not be.) The condi-
tion (7.21) is equivalent to
∫ (f ◦ Φ)(x) dµ(x) = ∫ f (x) dµ(x) for all f ∈ Cc∞ (Rd ). (7.22)

Indeed, note that taking f = 1A in (7.22) yields (7.21), and the equivalence is
proved by approximating 1A by smooth functions. The conditions (7.21)-(7.22) are also sometimes conveyed by saying that the pushforward measure µ ◦ Φ−1 is equal to µ.
functions, which pullback.)
Liouville’s theorem follows from the following general fact about ODEs:

Lemma 7.22. Let X be a smooth autonomous vector field on Rd and ω ∈


C ∞ (Rd ). Then the flow of ẋ = X(x) preserves the measure ω(x) dx if and only
if ∇ · (ωX) ≡ 0.
Proof. Let Φ(t; ξ) denote the solution x(t; ξ) to the differential equation ẋ =
X(x) with initial data x(0) = ξ.
For the forward direction, assume that the flow preserves ω(x) dx. For a smooth compactly supported observable F (x) we take the pullback by Φ:

(d/dt)|t=0 ∫ (F ◦ Φ)(t, ξ) ω(ξ) dξ = ∫ X(ξ) · ∇F (ξ) ω(ξ) dξ = − ∫ F (ξ) ∇ · (ωX)(ξ) dξ.

By premise, the LHS vanishes. As F ∈ Cc∞ was arbitrary, we conclude that


∇ · (ωX) ≡ 0 as desired.
For the reverse direction, assume that ∇ · (ωX) ≡ 0. Using Φ̇ = X ◦ Φ and
the derivative (A.15) of the determinant, we compute

(d/dt)[ (ω ◦ Φ)(t, ξ) det Dξ Φ(t, ξ) ]
= [ (∇ω ◦ Φ)(t, ξ) · (X ◦ Φ)(t, ξ) + (ω ◦ Φ)(t, ξ) ((∇ · X) ◦ Φ)(t, ξ) ] det Dξ Φ(t, ξ)
= ( ∇ · (ωX) ) ◦ Φ(t, ξ) det Dξ Φ(t, ξ).

The RHS vanishes by premise, and so we conclude that the quantity in the time
derivative on the LHS is equal to its initial value:

ω ◦ Φ(t, ξ) det Dξ Φ(t, ξ) = ω(ξ) det I = ω(ξ) for all t.

Consequently, the change of variables x = Φ(t, ξ) yields


∫ (F ◦ Φ)(t, ξ) ω(ξ) dξ = ∫ (F ◦ Φ)(t, ξ) (ω ◦ Φ)(t, ξ) det Dξ Φ(t, ξ) dξ = ∫ F (ξ) ω(ξ) dξ.

This is exactly the condition (7.22), and so we conclude that the flow Φ preserves
ω(x) dx.
As an immediate corollary, we obtain Liouville’s theorem for the canonical
Poisson bracket:

Proposition 7.23 (Liouville’s theorem, special case). Suppose that J = J0 is


the canonical Poisson bracket. Then any Hamiltonian flow on R2n preserves
the Lebesgue measure.

Proof. Given a Hamiltonian H, the corresponding Hamiltonian vector field is


 
    XH = (∂H/∂p, −∂H/∂q).

This vector field is divergence-free since

    ∇ · XH = ∂²H/∂q∂p − ∂²H/∂p∂q ≡ 0,
and so its flow preserves Lebesgue measure by Lemma 7.22.
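As a numerical sanity check (not part of the text), one can test this on a concrete system: below we integrate the pendulum Hamiltonian H = p²/2 − cos q (an arbitrary choice) and verify that the Jacobian determinant of the time-1 flow map is 1, i.e. Lebesgue measure is preserved. The integrator and step sizes are likewise arbitrary choices for the sketch.

```python
import numpy as np

# Pendulum: H = p**2/2 - cos(q), so X_H = (dH/dp, -dH/dq) = (p, -sin(q)).
def X(z):
    q, p = z
    return np.array([p, -np.sin(q)])

def rk4_step(z, dt):
    k1 = X(z); k2 = X(z + 0.5*dt*k1)
    k3 = X(z + 0.5*dt*k2); k4 = X(z + dt*k3)
    return z + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

def flow(z0, t=1.0, steps=1000):
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = rk4_step(z, t/steps)
    return z

# Jacobian of the time-1 flow map by central differences; Liouville's theorem
# says its determinant equals 1 (the flow preserves phase-space area).
z0, h = np.array([0.7, 0.3]), 1e-6
D = np.column_stack([(flow(z0 + h*e) - flow(z0 - h*e)) / (2*h)
                     for e in np.eye(2)])
print(abs(np.linalg.det(D) - 1.0))  # negligibly small
```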
Next, we would like to extend this fact to all Poisson structures. First, we
will need:
Proposition 7.24. The flow map Φ(t, ·) of any Hamiltonian vector field is
canonical.
Proof. By Theorem 7.18, we may work in local coordinates so that
 
        (  0   I   0 )
    J = ( −I   0   0 )
        (  0   0   0 )

is independent of x. Given a Hamiltonian H, the flow map Φ(t, x) is defined by


    ∂Φ/∂t = J(∇H) ◦ Φ,

and so, differentiating in the initial condition x,

    ∂Φ′/∂t = J(H″ ◦ Φ) Φ′.
Let
    E(t) = Φ′ J (Φ′)T − J

denote the failure of Φ′ to be symplectic. Then

    dE/dt = J(H″ ◦ Φ) Φ′ J (Φ′)T + Φ′ J (Φ′)T (H″ ◦ Φ)T J T
          = J(H″ ◦ Φ)(E + J) − (E + J)(H″ ◦ Φ) J
          = J(H″ ◦ Φ) E − E (H″ ◦ Φ) J,

where in the second equality we noted that H″ is symmetric and J is antisym-
metric. The RHS is a bounded linear operator applied to E, and so we conclude
that there exists a constant C so that

    |dE/dt (t)| ≤ C |E(t)|.

As E(0) = 0, Grönwall's inequality (Lemma A.3) implies E(t) ≡ 0 as desired.
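The quantity E(t) from this proof can be monitored numerically. The sketch below (with an arbitrarily chosen Hamiltonian H = p²/2 + q⁴/4, not from the text) integrates the flow together with the variational equation ∂Φ′/∂t = J(H″ ◦ Φ)Φ′ and checks that E = Φ′J(Φ′)ᵀ − J stays at zero.

```python
import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])

# H = p**2/2 + q**4/4; flow the point x = (q, p) and its Jacobian Y (the
# variational equation dY/dt = J H''(x) Y) together.
def rhs(state):
    q, p = state[:2]
    Y = state[2:].reshape(2, 2)
    Hess = np.array([[3 * q**2, 0.0], [0.0, 1.0]])  # Hessian of H at (q, p)
    return np.concatenate(([p, -q**3], (J @ Hess @ Y).ravel()))

state = np.concatenate(([1.0, 0.5], np.eye(2).ravel()))
dt = 1e-3
for _ in range(2000):  # integrate to t = 2 with RK4
    k1 = rhs(state); k2 = rhs(state + 0.5*dt*k1)
    k3 = rhs(state + 0.5*dt*k2); k4 = rhs(state + dt*k3)
    state = state + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

Y = state[2:].reshape(2, 2)
E = Y @ J @ Y.T - J  # the "failure to be symplectic" from the proof
print(np.max(np.abs(E)))  # stays at roundoff/discretization level
```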

We are now prepared to prove:


Theorem 7.25 (Liouville’s theorem, general case). For a nondegenerate Pois-
son structure J, any canonical mapping Φ preserves phase volume:
    ∫ (f ◦ Φ)(x) |det J(x)|^{−1/2} dx = ∫ f (y) |det J(y)|^{−1/2} dy
                                               for all f ∈ Cc∞ (Rd ).

In particular, this holds for the flow map of any Hamiltonian vector field.
Proof. Consider the change of variables

    y = Φ(x),  dy = |det Φ′| dx.

As J ◦ Φ = Φ′ J (Φ′)T implies |det(J ◦ Φ)| = (det Φ′)² |det J|, we have

    ∫ f (y) dy/|det J(y)|^{1/2} = ∫ (f ◦ Φ)(x) |det Φ′(x)| dx/|det(J ◦ Φ)(x)|^{1/2}
                                = ∫ (f ◦ Φ)(x) dx/|det J(x)|^{1/2} .
Liouville's theorem has many important consequences. For example, we
now know that a Hamiltonian system can have no asymptotically stable
equilibrium points or asymptotically stable closed trajectories in phase space,
since these would require the density of phase curves to increase around
such phenomena. We also have the following phenomenon:
Corollary 7.26 (Poincaré's recurrence theorem). Fix t ∈ R and a bounded
region D ⊂ R2n of phase space, and let Φ := Φ(t, ·) denote the flow by time t of a
Hamiltonian vector field. Then for any positive measure set U ⊂ D there exist
x0 ∈ U and a positive integer n such that Φn (x0 ) ∈ U .
If the motion is bounded—as is the case for a conservative Newtonian system
with potential energy V (x) → +∞ as |x| → ∞—this means that the system
will return to an arbitrary vicinity of any given possible configuration (q, p) ∈
R2n infinitely often, given enough time. For example, suppose we opened a
connection between chambers of gas and vacuum. Then Corollary 7.26 says that
the gas molecules will eventually all return to the initial chamber, seemingly in
violation of the second law of thermodynamics. Although it may appear that
Poincaré's theorem conflicts with the second law, the recurrence time scales are
often quite large (in our example it is longer than the age of the universe) and
so there is no contradiction.
Proof. Given a smooth Hamiltonian, Hamilton’s equations are a smooth system
of ODEs and so the subsequent flow Φ : R2n → R2n is injective by uniqueness
of solutions. Liouville’s theorem (Theorem 7.25) tells us that Φ preserves phase
volume.
Consider the collection of sets U, Φ(U ), Φ2 (U ), · · · ⊂ D. As Φ is volume-
preserving, all of these sets must have the same volume. On the other hand,
D is bounded and thus has finite volume, and so it is impossible for all of
these sets to be disjoint. That is, there exist distinct j < k such that
Φj (U ) ∩ Φk (U ) ≠ ∅. As Φ is injective, this requires Φk−j (U ) ∩ U ≠ ∅. Namely,
we can pick some x0 ∈ U in this intersection, which gives Φk−j (x0 ) ∈ U .

In fact, the proof of Corollary 7.26 shows that the set of points in U which
do not return to U infinitely often has measure zero.
Together, a space X with a finite measure µ and a measurable map
φ : X → X that is measure preserving constitute a measure-preserving
system, which is the fundamental object of study in discrete dynamical systems.
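As a toy illustration of recurrence (an irrational rotation of the circle, a standard measure-preserving example; the rotation number, starting point, and interval width below are arbitrary choices, not from the text):

```python
import math

# x -> x + alpha (mod 1) preserves Lebesgue measure on the circle, so
# Poincaré recurrence applies: the orbit re-enters any small interval.
alpha = math.sqrt(2) - 1          # an irrational rotation number
x0, eps = 0.3, 1e-3
x, returns = x0, []
for n in range(1, 200_000):
    x = (x + alpha) % 1.0
    if abs(x - x0) < eps:
        returns.append(n)         # times at which the orbit has returned
        if len(returns) == 3:
            break
print(returns)
```

The return times grow with the denominators of the continued-fraction approximations of alpha, which is why they look sparse.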
In fact, phase volume is the only measure on phase space that can be pre-
served:
Proposition 7.27. Suppose that J is nondegenerate. If a smooth measure
ω(x) dx is invariant under all Hamiltonian flows, then it must be a scalar mul-
tiple of the phase volume measure.
Proof. By Theorem 7.18, we may work in local canonical coordinates. Suppose
that the measure

    ω(p1 , . . . , pn , q1 , . . . , qn ) dp1 · · · dpn dq1 · · · dqn

is invariant under all Hamiltonian flows. Taking the Hamiltonian H = pk we
have the vector field XH = ∂/∂qk , and so Lemma 7.22 requires that

    0 = ∇ · (ωXH ) = ∂ω/∂qk

for all k. Taking H = qk , we similarly conclude that ∂ω/∂pk = 0 for all k.
Therefore ∇ω ≡ 0, and hence ω is a constant.
Let
    dν := |det J(x)|^{−1/2} dx

denote the phase volume measure. The Gibbs measure or Gibbs state
associated to a Hamiltonian system at temperature T > 0 is

    (1/Z) e^{−βH} dν,

where β = 1/(kB T ), Boltzmann's constant kB is a universal constant, and the
partition function Z is a normalization constant chosen so that the total integral
of the Gibbs measure is one. Note that this definition requires that e^{−βH} is
integrable. When the temperature T is small, β is large and hence the Gibbs
measure is concentrated near the minima of H. When T is large, β is small and
the Gibbs measure is spread more evenly everywhere.
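For a concrete illustration (a sketch using the harmonic oscillator H = (p² + q²)/2, where the phase volume is just dq dp), one can compute the partition function and mean energy by quadrature and watch the measure concentrate as β grows; for this H the exact values are Z = 2π/β and mean energy 1/β.

```python
import numpy as np

# Gibbs measure (1/Z) e^{-beta H} dq dp for H = (p**2 + q**2)/2.
q = np.linspace(-12, 12, 801)
Q, P = np.meshgrid(q, q)
H = 0.5 * (Q**2 + P**2)
dA = (q[1] - q[0])**2

for beta in [0.5, 1.0, 2.0]:
    w = np.exp(-beta * H)
    Z = w.sum() * dA                  # partition function, ~ 2*pi/beta
    meanH = (H * w).sum() * dA / Z    # mean energy, ~ 1/beta
    print(beta, Z, meanH)
```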
The Gibbs measure can also be characterized via a variational principle, but
this still requires phase volume (via the entropy). On the other hand, we also
have the following algebraic characterization:
Proposition 7.28. The Gibbs state is the unique probability measure that sat-
isfies the classical Kubo–Martin–Schwinger (KMS) condition:

    E[{F, G}] = β E[{F, H}G] for all F, G ∈ Cc∞ (Rd ).

Here, the expected value E is defined in terms of a probability measure ω(x) dx
via
    E[X] = ∫ X(x) ω(x) dx.

Proof. The KMS condition says that

    E[e^{βH} {F, Ge^{−βH} }] = E[e^{βH} ({F, G}e^{−βH} − βe^{−βH} {F, H}G)]
                             = E[{F, G}] − β E[{F, H}G] = 0.

So if a probability measure ω(x) dx satisfies the KMS condition, then

    E[e^{βH} {F, K}] = 0 for all F, K ∈ Cc∞ (Rd ).

In other words,

    ∫ (ω e^{βH} ){F, K} dx = 0 for all F, K ∈ Cc∞ (Rd ).

Therefore ω e^{βH} dx is a measure that is invariant under all Hamiltonian flows,
and hence ω e^{βH} dx is proportional to the phase volume dν. Rearranging, we
see that ω is equal to the Gibbs measure as claimed.
Example 7.29. Consider an infinite chain of oscillators, with Hamiltonian

    H = Σ_{i∈Z} [ (1/2) pi² + V (qi − qi−1 ) ].

Well-posedness for such an infinite-dimensional system is not an issue (cf. The-


orem A.2) and neither is defining a measure on this infinite-dimensional space
(cf. sequences of random variables in probability theory). However, H = +∞
almost everywhere, and so the factor e^{−βH} ≡ 0 in the Gibbs measure does not
make sense. Nevertheless, the KMS condition does make sense if we also re-
quire that F, G depend on finitely many variables. Using this, we could proceed
to define a Gibbs measure.

7.8. Exercises

7.1 (Charged particle in an electromagnetic field). In Exercise 4.7, we found


the Lagrangian for a charged particle in R3 with charge q and mass m moving
through an electromagnetic field to be

    L = (1/2) m|v|² − qφ + (q/c) v · A,

where φ and A are the scalar and vector potentials for the electric and magnetic
fields:
    B = ∇ × A,  E = −∇φ − (1/c) ∂A/∂t.

(a) Show that the Hamiltonian for this system is

        H = (1/(2m)) |p − (q/c)A|² + qφ.

(b) Although the electric and magnetic fields are uniquely determined, the
    scalar and vector potentials φ and A are not unique, and they ap-
    pear explicitly in the Hamiltonian. In fact, substituting A′ = A + ∇f
    for any smooth function f (t, x) leaves B unchanged since the curl of a
    gradient is zero. Show that for E to remain unchanged we need to
    also substitute
        φ′ = φ − (1/c) ∂f /∂t.
    Together, replacing (A, φ) → (A′ , φ′ ) is called a gauge transformation,
    and a specific pair (A, φ) is called a choice of gauge.

(c) As the electric and magnetic fields are unaffected by the choice of gauge,
    any physical laws in terms of the potentials should also be invariant.
    Show that under a gauge transformation the Hamiltonian becomes
        H′ = H − (q/c) ∂f /∂t,
    and that Hamilton's equations still hold in the new variables.
7.2 (Young’s inequality). Show that for any Legendre transform pair f (x) and
f ∗ (ξ) we have
    x · ξ ≤ f (x) + f ∗ (ξ) for all x, ξ ∈ Rn .

Apply this inequality to the function f (x) = |x|^p /p for p ∈ (1, ∞) and conclude

    x · ξ ≤ |x|^p /p + |ξ|^q /q,  where 1/p + 1/q = 1.
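A quick numerical spot-check of the claimed inequality (random vectors and an arbitrary exponent p; a sketch, not a proof):

```python
import numpy as np

# Check x . xi <= |x|**p / p + |xi|**q / q with 1/p + 1/q = 1.
rng = np.random.default_rng(0)
p = 1.7
q = p / (p - 1)               # conjugate exponent, so 1/p + 1/q = 1
for _ in range(1000):
    x, xi = rng.normal(size=5), rng.normal(size=5)
    lhs = x @ xi
    rhs = np.linalg.norm(x)**p / p + np.linalg.norm(xi)**q / q
    assert lhs <= rhs + 1e-12
print("Young's inequality holds on all samples")
```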
7.3 (Properties of the Legendre transform). Suppose that f : Rn → (−∞, +∞]
is lower semicontinuous and not identically +∞.
(a) Define the sub-differential
∂f (x) = {v ∈ Rn : f (y) − f (x) ≥ v · (y − x) for all y ∈ Rn }.
Show that if f is convex, then ∂f (x) is nonempty for all x ∈ Rn . Show
that the Legendre transform f ∗ (ξ) (defined by (7.5)) is equal to x·ξ −f (x)
if and only if ξ ∈ ∂f (x).
(b) Show that f ∗ (ξ) is a lower semicontinuous and convex function. Moreover,
show that f ∗∗ is the largest lower semicontinuous convex function that is
less than or equal to f , and f ∗∗ = f if f is convex.
7.4 (Example of Poisson’s theorem). Let L = x × p denote the angular mo-
mentum of a particle x ∈ R3 . Show that for any unit vector n we have
{L, L · n} = L × n.
(Hint: Fix Cartesian coordinates with respect to n and write down the vector
field corresponding to rotation about the n axis. Apply this vector field to L and
recognize this as the canonical Poisson bracket of L with a certain Hamiltonian.
Conclude that {Li , Lj } = −Lk whenever i, j, k is a cyclic permutation of 1, 2, 3.)

7.5 (Chain rule for Poisson brackets). Given a Poisson bracket in the sense of
Definition 7.7, show that we have the chain rule

    {V (q), F } = V ′(q){q, F }.

(Hint: Use the product rule of Definition 7.7 and the Taylor expansion V (q) =
V (q0 ) + V ′(q0 )(q − q0 ) + (q − q0 )² W (q).)
7.6. (a) Show that a symplectic matrix M is orthogonal if and only if it takes
the form
        (  A   B )
        ( −B   A )

with matrices A and B such that A + iB is a complex unitary matrix.
(Hint: For the reverse direction, start with the relation M T (I + iJ)M =
I + iJ.)
(b) Deduce that such matrices have determinant equal to 1. (Hint: Use a
complex matrix to diagonalize M .)
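The reverse direction can be spot-checked numerically: build [[A, B], [−B, A]] from a random unitary A + iB and verify it is symplectic, orthogonal, and of determinant 1 (a sketch; the random seed and size are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)                   # a random unitary matrix
A, B = U.real, U.imag

M = np.block([[A, B], [-B, A]])
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])

print(np.allclose(M @ J @ M.T, J))        # symplectic: M J M^T = J
print(np.allclose(M @ M.T, np.eye(2*n)))  # orthogonal
print(round(float(np.linalg.det(M))))     # determinant 1
```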
7.7. Hamilton’s equations also arise from a variational principle, but the func-
tional is unbounded and hence less useful. Given a smooth and strictly convex
Hamiltonian H : R2n → R with H(0) = 0 and an energy level α ∈ R, consider
the functional

    E(x(t)) = ∫₀¹ (1/2) x(t) · J ẋ(t) dt

with the domain

    Mα = { x ∈ C 1 (R; R2n ) : x(t + 1) = x(t), ∫₀¹ H(x(t)) dt = α }.

Here, J is the canonical structure matrix (9.6). Show that if x(t) is a critical
point of E on Mα , then x(t) = (q(t), p(t)) is a periodic solution of Hamilton’s
equations (7.2). Show that E is not bounded below on {x ∈ C 1 (R; R2n ) :
x(t + 1) = x(t)}.
CHAPTER 8

NORMAL FORMS

We will study the local structure of Hamiltonian flows in a neighborhood


of a point, with a focus on common physical examples. The material for this
chapter is based upon various sections from [Arn89, MZ05].

8.1. Generating functions

In practice, we would like a way of coming up with transformations Φ :


(q, p) 7→ (Q, P ) that are canonical as defined in (7.17). In Proposition 7.24 we
saw that one way to do this is to use the flow of a Hamiltonian by some fixed
time. In this section we will present another method: generating functions. In
fact, this will provide a characterization of canonical transformations that are
close to the identity in a certain sense. We will also refer to this material in
section 8.4.
As we will see shortly, canonical transformations Φ(q, p) = (Q, P ) near the
identity are determined by a function W of the initial position q and the fi-
nal momentum P . Specifically, W (q, P ) is the generating function of the
transformation Φ if
    p = ∂W/∂q ,  Q = ∂W/∂P .
For example,
W = Pq
is the generating function for the identity map. Although it may be surprising
at first, it is crucial that W depends on a mix of the initial and
final variables. On the other hand, we have some choice as to which variables we
pick for W to depend on. Indeed, we can do something similar with generating
functions of q, Q, but this does not encode the identity transformation (and thus
it corresponds to canonical transformations that are far from the identity).
These generating functions W (q, P ) characterize canonical transformations
in the following sense:

Theorem 8.1. Consider R2n endowed with the canonical Poisson bracket J =
J0 , and let Φ : R2n → R2n be a map, written Φ(p, q) = (P (p, q), Q(p, q)).


(a) If Φ is canonical and det(∂P/∂p) ≠ 0, then there exists a function W (q, P )
    so that in neighborhoods of (0, 0) and Φ(0, 0) we have

        (P, Q) = Φ(p, q) ⇐⇒ p = ∂W/∂q , Q = ∂W/∂P . (8.1)

    Moreover, we have det(∂²W/∂P ∂q) ≠ 0.

(b) Conversely, if W (q, P ) is smooth and det(∂²W/∂P ∂q) ≠ 0, then (8.1) de-
    fines a canonical transformation between neighborhoods of (0, 0) and Φ(0, 0).
Proof. (a) We claim that there exists a function V (p, q) so that

    ∂V/∂pi = Σj Qj ∂Pj /∂pi ,
    ∂V/∂qi = Σj Qj ∂Pj /∂qi + pi .     (8.2)

For (8.2) to hold, we simply check the equality of mixed partials. We will verify
this for one example:

    ∂²V/∂qk ∂qi = ∂/∂qk [ Σj Qj ∂Pj /∂qi + pi ]
                = Σj [ ∂Qj /∂qk · ∂Pj /∂qi + Qj ∂²Pj /∂qk ∂qi ],

    ∂²V/∂qi ∂qk = ∂/∂qi [ Σj Qj ∂Pj /∂qk + pk ]
                = Σj [ ∂Qj /∂qi · ∂Pj /∂qk + Qj ∂²Pj /∂qi ∂qk ].

Note that the second terms on the RHS agree. As Φ is canonical, then by
Proposition 7.20 we know that

    Φ′ = ( ∂P/∂p  ∂P/∂q ) = ( A  B )
         ( ∂Q/∂p  ∂Q/∂q )   ( C  D )

is a symplectic matrix, and so the blocks satisfy (cf. (7.20))

    AT C = C T A,  B T D = DT B,  AT D − C T B = I.

The second relation implies

    Σj ∂Qj /∂qk · ∂Pj /∂qi = (B T D)ik = (DT B)ik = Σj ∂Qj /∂qi · ∂Pj /∂qk ,

which demonstrates the equality of the mixed partials ∂²V/∂qk ∂qi and
∂²V/∂qi ∂qk . (The first relation AT C = C T A is needed for the pi , pk partials
and the third relation AT D − C T B = I for the pi , qk partials.)
In fact, this demonstrates that (8.2) is locally solvable by
some V if and only if Φ is canonical.

Next, as det(∂P/∂p) ≠ 0, the implicit function theorem guarantees that
(p, q) ↦ (P, q) is a local diffeomorphism. Therefore we may define

    W (q, P ) = V (p(q, P ), q).

It remains to check that the relations (8.1) hold so that W is the generating
function for Φ. To emphasize that we are working with the coordinates (P, q),
we may write this as

    p = (∂W/∂q)P ,  Q = (∂W/∂P )q .

Think of (8.2) as an equality of directional derivatives. (Indeed, in differential
geometry notation we would write (8.2) as Q dP + p dq = dV .) We plug the
direction (∂/∂qk )P into (8.2) to obtain

    (∂W/∂qk )P = 0 + pk .

Likewise, the direction (∂/∂Pk )q yields

    (∂W/∂Pk )q = Qk + 0.
(b) Given W (q, P ) smooth such that det(∂²W/∂P ∂q) ≠ 0, the implicit function
theorem guarantees that (P, q) ↦ (p, q) with p = ∂W/∂q is a local diffeo-
morphism. In particular, Q = ∂W/∂P can be written as a smooth function
of p, q.
To verify that (p, q) ↦ (P, Q) is canonical, it suffices to check that (8.2) holds
for W . This is a straightforward computation using the values of the partial
derivatives guaranteed by the implicit function theorem.
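As an illustration (a hypothetical generating function chosen for this sketch, not from the text), take W (q, P ) = qP + εq²P. Solving the relations (8.1) explicitly gives a transformation whose Jacobian we can check is symplectic; for 2 × 2 matrices this is the same as having determinant 1.

```python
import numpy as np

# Hypothetical generating function W(q, P) = q*P + eps*q**2*P:
#   p = dW/dq = P*(1 + 2*eps*q),   Q = dW/dP = q + eps*q**2.
eps = 0.1

def Phi(q, p):
    P = p / (1 + 2 * eps * q)   # solve p = dW/dq for P
    Q = q + eps * q**2          # Q = dW/dP
    return np.array([Q, P])

# Jacobian of (q, p) -> (Q, P) by central differences, then the
# symplectic condition D J D^T = J.
q0, p0, h = 0.4, -0.7, 1e-6
D = np.column_stack([
    (Phi(q0 + h, p0) - Phi(q0 - h, p0)) / (2 * h),
    (Phi(q0, p0 + h) - Phi(q0, p0 - h)) / (2 * h),
])
J = np.array([[0.0, 1.0], [-1.0, 0.0]])
print(np.allclose(D @ J @ D.T, J, atol=1e-8))  # the map is canonical
```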

8.2. Local structure of Hamiltonian flows

We would like to understand the structure of a Hamiltonian flow in a neigh-


borhood of a point up to changes in coordinates. Given an arbitrary Poisson
structure J(x) and a point x0 , we may choose canonical coordinates on a neigh-
borhood of x0 by the Darboux theorem (Theorem 7.18). If there are Casimirs,
we can restrict our attention to one of their level sets so that the flow takes the
form  
0 I
ẋ = J∇H, J= . (8.3)
−I 0
First, consider the case where ∇H(x0 ) 6= 0. From the general theory of
ODEs (cf. Proposition A.18) this implies that there is a choice of coordinates
so that the flow is uniform, but we can say something more specific about our
Hamiltonian ODE. In our proof of Theorem 7.18, we can make any choice for

the first coordinate p1 , like p1 = H. Therefore, we can construct local canonical


coordinates (H, p2 , . . . , pn , q1 , . . . , qn ), and our flow is then

    q̇1 = 1,  ṗ1 = 0,  q̇i = ṗi = 0 for i = 2, . . . , n.

Next, consider the case of a fixed point ∇H(x0 ) = 0. After translating,


we may assume that x0 = 0. The standard procedure in the theory of ODEs
(cf. section A.6) is to linearize:

    ẏ = JH″(0) y.     (8.4)

This is a Hamiltonian flow for the quadratic Taylor approximation of our Hamil-
tonian:
    H(0) + (1/2) x · H″(0)x.     (8.5)
We will study quadratic Hamiltonians in finer detail in section 8.3.
For general ODEs, the Hartman–Grobman theorem (Theorem A.21) tells us
that if the matrix for the linearized flow (8.4) has no purely imaginary eigen-
values, then the actual flow (8.3) is qualitatively determined by the spectrum
of the matrix for the linearized flow. For Hamiltonian flows, the matrix in (8.4)
must be of the form JH″(0) for H″(0) real and symmetric, and so we have some
conditions on the spectrum:
Proposition 8.2. The linearized equation (8.4) has the following properties.

(a) A complex number λ ∈ C is an eigenvalue for JH″(0) if and only if λ̄ is,
    and λ, λ̄ have equal algebraic and geometric multiplicities.

(b) A complex number λ ∈ C is an eigenvalue for JH″(0) if and only if −λ is,
    and λ, −λ have equal algebraic and geometric multiplicities.

(c) The linear flow map e^{tJH″(0)} is a symplectic matrix for every t ∈ R.

Note that property (b) tells us that the trace of JH″(0) is zero, and so the
linear flow map e^{tJH″(0)} preserves volume in phase space. This must be the
case by Liouville's theorem (Proposition 7.23), since the linear flow is Hamilto-
nian for the Hamiltonian (8.5).
Proof. (a) This is because the matrix JH″(0) is real.
(b) Let σ(JH″(0)) denote the spectrum of JH″(0). As H″(0) is symmetric,
we have

    σ(JH″(0)) = σ([JH″(0)]T ) = σ(−H″(0)J)
              = σ(J[−H″(0)J]J⁻¹) = σ(−JH″(0)).

(c) This is because the linear flow is Hamiltonian for the Hamiltonian (8.5).
Indeed, the flow map for a Hamiltonian flow is a canonical transformation by
Proposition 7.24, and for the canonical Poisson bracket this simply means that
the derivative of e^{tJH″(0)} is a symplectic matrix by Proposition 7.20.
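These symmetries are easy to observe numerically (a sketch with a randomly generated symmetric Hessian, not from the text): the spectrum of JH″ is closed under both λ ↦ −λ and conjugation, so eigenvalues come in quadruples {λ, λ̄, −λ, −λ̄}.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
S = rng.normal(size=(2*n, 2*n))
H2 = S + S.T                       # a random symmetric "Hessian"
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
lam = np.linalg.eigvals(J @ H2)

def same_spectrum(a, b, tol=1e-7):
    # each eigenvalue in a has a partner in b (as multisets, up to tol)
    return all(np.min(np.abs(b - x)) < tol for x in a)

print(same_spectrum(lam, -lam))        # closed under lambda -> -lambda
print(same_spectrum(lam, lam.conj()))  # closed under conjugation
```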

While property (c) tells us that e^{tJH″(0)} is always a symplectic matrix, not
every symplectic matrix admits the representation e^{tJH″(0)} for a symmetric
matrix H″(0). For example, the matrix

    ( −4    0   )
    (  0  −1/4 )     (8.6)

is symplectic, and it cannot be written as e^{tJH″(0)} since then it would admit
a real square root e^{tJH″(0)/2} ; this root would then have eigenvalues ±2i and
±(1/2)i, which is too many for a real 2 × 2 matrix. (Note that this example
is not near the identity, while e^{tJH″(0)} tends towards the identity as t → 0.)
On the other hand, it can be shown that the polar decomposition factors of a
symplectic matrix are symplectic, and so it follows that every symplectic matrix
is the product of two matrices of the form e^{tJH″(0)} .

8.3. Normal forms for quadratic Hamiltonians

In this section, we seek a global understanding of the flow for quadratic


Hamiltonians. As in the beginning of section 8.2, it suffices to consider the
canonical Poisson bracket. At the end of the section we will turn to a discussion
of the general case, but first we will study the important case of the Hamiltonian
    H = (1/2) Σ_{i,j=1}^n (M⁻¹)ij pi pj + (1/2) Σ_{i,j=1}^n Vij qi qj     (8.7)

for M a positive definite matrix and V a symmetric matrix. We call M the


mass matrix so that the first term on the RHS is the kinetic energy, and the
second term is the potential energy landscape.
Such a system arises naturally in practice. Consider a system of particles
with Hamiltonian

    H = Σᵢ₌₁ⁿ (1/(2mi )) pi² + Σ_{i<j} U (qi − qj ).

For such a system, the total momentum P = Σ pi is conserved because H is
invariant under a collective translation; this follows from Proposition 1.13 or
a simple computation. Consequently, it suffices to study the motion from the
center of mass frame, where P = 0 and the center of mass Q = (Σ mi qi )/(Σ mi )
is conserved. As we now have P = 0, we might eliminate one pi , say, pn =
−p1 − · · · − pn−1 . In this way, mass matrices arise naturally. Quadratic potential
energies frequently arise as Taylor approximations for more general systems, and
the behaviors that arise from this special case are fundamental to understanding
the general case.
We would like to boil down the Hamiltonian (8.7) to its essential parts in
order to obtain a simple description of its flow. A first step is to make the mass
matrix go away, via the canonical change of variables
    pnew = M^{−1/2} pold ,  qnew = M^{1/2} qold .

Here, we take the self-adjoint square root M^{1/2} . This is a canonical transforma-
tion because the matrix
    ( A      0    )
    ( 0   A^{−T} )

is symplectic whenever A is invertible (which is easily verified using (7.20)). In
the new variables, the Hamiltonian (8.7) becomes

    H = (1/2)|p|² + (1/2) q · Ṽ q     (8.8)

for the new symmetric matrix Ṽ = M^{−1/2} V M^{−1/2} .
Next, we diagonalize Ṽ :

    Ṽ = O diag(κ1 , . . . , κn ) O T ,

where O is an orthogonal matrix. Consider the canonical change of variables

    pnew = O T pold ,  qnew = O T qold .

This turns the Hamiltonian (8.8) into

    H = (1/2) Σᵢ₌₁ⁿ (pi² + κi qi²).     (8.9)

Altogether, we have reduced the general quadratic system (8.7) to a system of


decoupled one-dimensional particles.
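The reduction above can be carried out numerically (a sketch with randomly generated M and V, not from the text): the κᵢ are the eigenvalues of Ṽ = M^{−1/2}VM^{−1/2}, which agree with those of M⁻¹V since the two matrices are similar.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
A = rng.normal(size=(n, n))
M = A @ A.T + n * np.eye(n)        # a positive definite mass matrix
B = rng.normal(size=(n, n))
V = B + B.T                        # a symmetric potential matrix

# self-adjoint square root of M via its eigendecomposition
w, O = np.linalg.eigh(M)
M_inv_sqrt = O @ np.diag(w ** -0.5) @ O.T

Vt = M_inv_sqrt @ V @ M_inv_sqrt   # the matrix Vtilde from the text
kappa = np.linalg.eigvalsh(Vt)     # the normal-mode "spring constants"

# Vtilde is similar to M^{-1} V, so the kappa_i can be read off there too.
kappa2 = np.sort(np.linalg.eigvals(np.linalg.inv(M) @ V).real)
print(np.allclose(kappa, kappa2))
```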
If κi > 0, then we set ω² = κi . The Hamiltonian for the ith particle is then

    Hi = (1/2) p² + (1/2) ω² q²,

which is the harmonic oscillator of Example 2.2. Indeed, the equations of motion
take the form

    ( q̇ )          ( q )   (  0    1 ) ( q )
    ( ṗ ) = JH″(0) ( p ) = ( −ω²   0 ) ( p ) .
From its trace and determinant, we see that the matrix on the RHS
has eigenvalues ±iω. The trajectories in phase space are ellipses due to the
conservation of H. In fact, we can perform one further canonical change of
variables

    pnew = ω^{−1/2} pold ,  qnew = ω^{1/2} qold

to symmetrize the Hamiltonian

    Hi = (ω/2)(pi² + qi²)     (8.10)

so that the trajectories are circular.



If κi < 0, then we set γ² = −κi . Then the ith Hamiltonian is

    Hi = (1/2) p² − (1/2) γ² q²,

which is the saddle node of Example 2.1. The equations of motion are

    ( q̇ )   ( 0    1 ) ( q )
    ( ṗ ) = ( γ²   0 ) ( p ) ,

and the matrix on the RHS has eigenvalues ±γ. The trajectories in phase space
trace out the hyperbolic level sets of Hi . To symmetrize the Hamiltonian, we
can take the canonical change of variables

    pnew = γ^{−1/2} pold ,  qnew = γ^{1/2} qold

to make
    Hi = (γ/2)(pi² − qi²).

Alternatively, the canonical change of variables

    ( pnew )            (  1    −γ ) ( pold )
    ( qnew ) = (1/√2) ( γ⁻¹     1 ) ( qold )

makes
    Hi = γ pi qi .     (8.11)
Lastly, in the case κi = 0 we have

    ( q̇ )   ( 0  1 ) ( q )
    ( ṗ ) = ( 0  0 ) ( p ) ,

a 2 × 2 Jordan block with eigenvalue zero. The Hamiltonian

    Hi = (1/2) pi²     (8.12)

is shaped like a trough. Consequently, the origin is a non-isolated fixed point
that is not Lyapunov stable; trajectories are straight lines of constant p (with
q(t) = q0 + tp0 ). This is called rectilinear motion (or sometimes secular drift).
Altogether, we notice that if the potential energy is bowl-shaped in the sense
that all κi are positive, then the fixed point at the origin is Lyapunov stable. If
one κi is nonpositive, then the fixed point at the origin is not Lyapunov stable.
Moreover, eigenvalues γ ± iω with both γ, ω ≠ 0 do not occur for such
Hamiltonians.
More generally, H bowl-shaped implies that the fixed point at the bottom
is Lyapunov stable:

Theorem 8.3 (Dirichlet–Lagrange). If the Hamiltonian H has a critical point


at x0 and the Hessian matrix H 00 (x0 ) is positive definite, then x0 is a Lyapunov
stable fixed point for Hamilton’s equations (7.15).

Proof. This follows immediately from the fact that H is a (weak) Lyapunov
function (cf. Lemma 2.8) and Theorem 2.10.
As a warning, it is false that a Hamiltonian that is not bowl-shaped must
have an unstable fixed point. Indeed, saddle-shaped Hamiltonians can have
stable fixed points. For example, the Hamiltonian

    H = (1/2)(p1² + q1²) − (1/2)(p2² + q2²)

consists of a harmonic oscillator in the first variables and a time-reversed har-
monic oscillator in the second variables. In order to determine stability, the key
is how the saddle directions interact with the symplectic structure.
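The linearization of this example bears this out numerically (a small check, not in the text): the eigenvalues of JH″ are purely imaginary even though the Hessian is indefinite, consistent with (though not by itself a proof of) stability.

```python
import numpy as np

# H = (p1**2 + q1**2)/2 - (p2**2 + q2**2)/2 in coordinates (q1, q2, p1, p2):
# the Hessian is diag(1, -1, 1, -1), indefinite, yet the spectrum of J H''
# lies on the imaginary axis.
n = 2
J = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.eye(n), np.zeros((n, n))]])
H2 = np.diag([1.0, -1.0, 1.0, -1.0])
lam = np.linalg.eigvals(J @ H2)
print(np.max(np.abs(lam.real)))   # ~0: no exponentially growing directions
```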
Altogether, we have seen 3 normal forms: (8.10), (8.11), and (8.12). These
are normal forms in the sense that given a Hamiltonian of the form (8.7), there
is a linear canonical transformation decomposing it into a direct sum (cf. (8.9))
of these 3 forms. For general quadratic Hamiltonians, we have the following
result based on the work of Williamson [Wil36]:

Theorem 8.4 (Williamson’s theorem). Given a nondegenerate Poisson bracket,


there are 9 normal forms so that for any quadratic Hamiltonian there exists a
canonical transformation which decomposes it into a direct sum of these forms.
For a complete list of these canonical forms, see [Arn89, App. 6].
It is indeed possible to have complex eigenvalues γ + iω with both γ, ω ≠ 0.
For example, consider the Hamiltonian

    H = γ(p1 q1 + p2 q2 ) + ω(p1 q2 − p2 q1 ),

which consists of a saddle in the first and second variables plus an interaction
term. This leads to the linear ODE with matrix

           (  γ    ω    0    0 )
    JH″ =  ( −ω    γ    0    0 )
           (  0    0   −γ    ω )
           (  0    0   −ω   −γ )

(in terms of the ordered basis (q1 , q2 , p1 , p2 )), which has eigenvalues ±γ ± iω.
As a side note, if we take γ = log 4 and ω = π then we obtain

    e^{JH″} = diag(−4, −4, −1/4, −1/4).

Recall that the 2-dimensional version (8.6) of this matrix could not be expressed
in such an exponential form.
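This can be confirmed numerically (a small check): since JH″ has the four distinct eigenvalues ±γ ± iω, we may exponentiate it through its complex eigendecomposition.

```python
import numpy as np

g, w = np.log(4), np.pi
JH = np.array([[ g,  w,  0,  0],
               [-w,  g,  0,  0],
               [ 0,  0, -g,  w],
               [ 0,  0, -w, -g]])

# exp(JH) via the eigendecomposition (JH is diagonalizable over C)
lam, U = np.linalg.eig(JH)
E = (U @ np.diag(np.exp(lam)) @ np.linalg.inv(U)).real
print(np.round(E, 8))   # diag(-4, -4, -0.25, -0.25)
```

Each 2 × 2 block exponentiates to e^{±γ} times a rotation by π, which is how the diagonal entries −4 and −1/4 arise.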

8.4. Birkhoff normal form

Let H be a Hamiltonian that is real analytic with a fixed point at the origin.
We seek to simplify the structure of H near the origin via a smooth and canonical
change of variables. As in the beginning of section 8.2, it suffices to consider
the canonical Poisson bracket.
In section 8.3, we saw how to reduce the quadratic part of the Hamiltonian
to a normal form. Consequently, we now seek a nonlinear change of variables
(near the identity) to reduce the complexity of the nonlinear terms. The method
we will present is iterative, in the sense that we tidy up the terms in the Taylor
expansion one order at a time.
Inductively assume that we have treated the Taylor expansion up to order
k − 1 for some k ≥ 3. We will prescribe a canonical transformation via a
generating function W (P, q) of the form discussed in section 8.1. Our generating
function will be of the form
    W (P, q) = P q + F (P, q),     (8.13)

where the first term P q generates the identity, and the error F is a
polynomial that is homogeneous of degree k. The transformation generated
by (8.13) is

    p = P + ∂F/∂q (P, q),  Q = q + ∂F/∂P (P, q).     (8.14)
These coordinates are guaranteed to be a local diffeomorphism near the origin
because the leading term is the identity and the remainders above are degree 2
or higher. Rearranging (8.14), we have

    P = p − ∂F/∂q (p, q) + [ ∂F/∂q (p, q) − ∂F/∂q (P, q) ]
      = p − ∂F/∂q (p, q) + [ ∂F/∂q (p, q) − ∂F/∂q (p − ∂F/∂q , q) ].

Note that the input for the first ∂F/∂q on the RHS is p and not P ; we are
treating F as a mathematical function (rather than an observable) and inserting
p in place of P without changing the formula for F . As F is a homogeneous
polynomial of degree k ≥ 3, the square-bracketed term on the RHS is of
degree at least 2k − 3 ≥ k in p, q, and thus is higher order than the first terms:

    P = p − ∂F/∂q (p, q) + O(|p|^k + |q|^k ).     (8.15)

Similarly, we have

    Q = q + ∂F/∂p (p, q) + O(|p|^k + |q|^k ).     (8.16)
We are now prepared to perform our change of variables. Taylor expand
H = H2 + H3 + · · · + Hk + . . . ,

where Hn is a homogeneous polynomial of degree n. We may assume that


H0 = 0 after adding a constant to our Hamiltonian, and H1 = 0 since the origin
is a fixed point. Let Φ(p, q) = (P, Q) be the canonical transformation associated
to the generating function W . By (8.15) and (8.16), the leading order of this
transformation yields
    H ◦ Φ(p, q) = H2 (p, q) + · · · + Hk−1 (p, q) + Hk
                  + [ ∂H2/∂q · ∂F/∂p − ∂H2/∂p · ∂F/∂q ](p, q)
                  + O(|p|^{k+1} + |q|^{k+1} ).
We conclude that the transformation Φ leaves the terms H2 , . . . , Hk−1 that have
already been accounted for untouched, and replaces Hk by Hk + {H2 , F }. It
also changes all of the higher order terms, but these are computable to any finite
degree.
It remains to show that we can pick F so that {H2 , F } cancels Hk . We
will examine how to do this for the examples of normal forms that we saw in
section 8.3.

8.4.1. Hyperbolic case. Suppose that H2 can be fully decomposed into


the direct sum of the normal form (8.11):
    H2 = Σⱼ₌₁ⁿ γj pj qj .     (8.17)

We write
    F = Σ_{|α+β|=k} fαβ p^α q^β ,

where the sum ranges over multiindices α, β ∈ Nⁿ , we use the notations
p^α = p1^{α1} · · · pn^{αn} and |α| = Σⱼ |αj |, and we take coefficients fαβ ∈ R.
We compute
    {H2 , F } = Σ_{α,β} fαβ Σⱼ₌₁ⁿ [ ∂H2/∂qj · ∂(p^α q^β )/∂pj − ∂H2/∂pj · ∂(p^α q^β )/∂qj ]
              = Σ_{α,β} fαβ Σⱼ₌₁ⁿ (γj αj − γj βj ) p^α q^β
              = Σ_{α,β} γ · (α − β) fαβ p^α q^β ,

where γ = (γ1 , . . . , γn ). Consequently, we may cancel any term hαβ p^α q^β in Hk
by setting

    fαβ = − hαβ / (γ · (α − β)),     (8.18)

provided that γ · (α − β) ≠ 0. In particular, we cannot eliminate terms with
α = β. The best case scenario is when γ is nonresonant of degree k:

    Σᵢ₌₁ⁿ ki γi ≠ 0 for all k1 , . . . , kn ∈ Z such that 0 < Σᵢ |ki | ≤ k;

then all terms with α ≠ β can be canceled. If γ is resonant, then we can
still apply this method to remove all nonresonant terms; note that we did not
assume that H3 , . . . , Hk−1 were in normal form, and so the presence of surviving
resonant terms is okay. This leaves us with a resonant normal form, which is still
simpler than the original Hamiltonian. As γ · (α − β) appears in the denominator
of (8.18), we see that when γ · (α − β) is small then we must choose fαβ large—
this resonance phenomenon is called small denominators.
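The size of these denominators can be explored numerically (a sketch with the arbitrarily chosen frequency vector γ = (1, (1 + √5)/2), which is nonresonant since the golden ratio is irrational):

```python
import numpy as np

# min |k . gamma| over integer vectors with 0 < |k1| + |k2| <= K never
# vanishes, but it shrinks as K grows -- the small denominators in (8.18).
gamma = np.array([1.0, (1 + np.sqrt(5)) / 2])
mins = []
for K in [5, 10, 20, 40]:
    best = min(abs(k1 * gamma[0] + k2 * gamma[1])
               for k1 in range(-K, K + 1)
               for k2 in range(-K, K + 1)
               if 0 < abs(k1) + abs(k2) <= K)
    mins.append(best)
    print(K, best)
```

The minima decay along the continued-fraction convergents of the golden ratio, so the coefficients f_{αβ} must grow correspondingly.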
Suppose that γ is nonresonant of degree k. Then for all α ≠ β we may
prescribe the coefficients of F according to (8.18). This yields a canonical change
of variables in a neighborhood of the origin which turns H into

    H = H2 (p, q) + Σⱼ₌₃ᵏ Σ_{|α|=j} cα p^α q^α + O(|p|^{k+1} + |q|^{k+1} ).

Setting
    Ij = pj qj ,

we see that all of the terms above are functions only of the Ij :

    H = h(I1 , . . . , In ) + O(|p|^{k+1} + |q|^{k+1} ).     (8.19)

As Ij depends only on the jth variables, we have

    {Ij , Ik } = 0 for all j, k.     (8.20)

Thus

    İj = {Ij , H} = {Ij , h(I1 , . . . , In )} + {Ij , O(|p|^{k+1} + |q|^{k+1} )}
       = O(|p|^{k+1} + |q|^{k+1} ) = O(|I1 |^{(k+1)/2} + · · · + |In |^{(k+1)/2} ),     (8.21)

and so the Ij are almost conserved by the flow for k large.

8.4.2. Elliptic case. Suppose that H2 can be fully decomposed as the


direct sum of the normal form (8.10):
    H2 = Σⱼ₌₁ⁿ (1/2) ωj (pj² + qj²).     (8.22)

It turns out that the computation is more elegant in terms of the complex
variables
    zj = pj + iqj ,  z̄j = pj − iqj ,

so that
    H2 = Σⱼ₌₁ⁿ (1/2) ωj zj z̄j ,

even though our final choice of canonical transformation will be real. Note that
complex polynomials in pj , qj correspond to complex polynomials in zj , z̄j . We
also extend the bracket {·, ·} to be bilinear (as opposed to Hermitian), so that

    {zj , zk } = 0,  {z̄j , z̄k } = 0,  {zj , z̄k } = 2iδjk .

We write
    F = Σ_{|α+β|=k} fαβ z^α z̄^β ,

with the requirement that fβα = f̄αβ so that F is real-valued. Using the product
rule (cf. Definition 7.7), we compute

{H2, F} = Σ_{α,β} fαβ Σ_{j=1}^{n} ½ ωj {zj z̄j, z^α z̄^β}
        = Σ_{α,β} fαβ Σ_{j=1}^{n} ½ ωj ( zj {z̄j, z^α z̄^β} + z̄j {zj, z^α z̄^β} )
        = Σ_{α,β} fαβ Σ_{j=1}^{n} ½ ωj ( −2iαj + 2iβj ) z^α z̄^β
        = −i Σ_{α,β} ω · (α − β) fαβ z^α z̄^β,
where ω = (ω1, . . . , ωn). Consequently, we may cancel any term hαβ z^α z̄^β in Hk
by setting

fαβ = i hαβ / (ω · (α − β)),   (8.23)

whenever ω · (α − β) ≠ 0. Note that as Hk is real-valued, these coefficients
automatically satisfy the reality condition fβα = f̄αβ.
We conclude that if ω is nonresonant of degree k then we may eliminate
(using a real-valued change of variables) all terms of Hk with α ≠ β, leaving

H = H2(p, q) + Σ_{j=3}^{k} Σ_{|α|=j} cα z^α z̄^α + O(|p|^{k+1} + |q|^{k+1}).
Setting
Ij = ½ zj z̄j = ½ (pj² + qj²),
we obtain

H = h(I1, . . . , In) + O(|p|^{k+1} + |q|^{k+1}).   (8.24)

Moreover, the variables Ij still satisfy the bracket relations (8.20) and are almost
conserved by the flow in the sense of (8.21).
8.5. Completely integrable systems

Suppose that we have a Hamiltonian whose quadratic part is of the elliptic
type (8.22), and assume that the Birkhoff normal form produces a convergent
change of variables as k → ∞. This is a strong assumption, as it requires that
the coefficients fαβ of the kth generating function Fk decay sufficiently rapidly
as k → ∞. In the formulas (8.18) and (8.23) we saw that the frequencies ω of the
quadratic normal form appear in the denominator as the term ω · (α − β). These
denominators will not vanish provided that ω is nonresonant of all degrees, but
they still may be very small rather often. (This is closely related to the concepts
of Diophantine and Liouville numbers in number theory.)
Under this assumption, by taking k → ∞ in (8.24) our new Hamiltonian is
of the form
H = h(I1 , . . . , In ) (8.25)
with h(0) = 0 and ∇h(0) = ω. Moreover, the new variables Ij = ½(pj² + qj²) are
in involution (or Poisson commute) with each other:

{Ij, Ik} = 0   for all j, k.   (8.26)

Together with (8.25), this implies that the quantities Ij are conserved:

İj = {Ij, H} = 0   for all j.   (8.27)

When we have a set of n functionally independent quantities satisfying (8.26)
and (8.27), we say that the Hamiltonian flow of H is completely integrable.
(The condition that the Ij are functionally independent is intentionally a little
vague, but it is essential and will be made concrete in each rigorous statement
that we make.) Note that canonical transformations preserve complete inte-
grability, and so the Hamiltonian flow of H remains completely integrable even
after we undo the Birkhoff normal form transformation.
To fill out these quantities into a complete set of coordinates, we define

φj = arg(pj + iqj) ∈ R   for pj + iqj ∈ C \ {0}.

A direct computation shows that these variables satisfy the relations

{φj, φk} = 0,   {φj, Ik} = δjk   for all j, k.   (8.28)

Together, we obtain a complete set of coordinates satisfying the following properties.
Definition 8.5. Coordinates (φ, I) on R^{2n} are action-angle coordinates for
the Hamiltonian H if:
(a) They satisfy the canonical relations (8.26) and (8.28),

(b) The Hamiltonian H = h(I1, . . . , In) is a function of only I1, . . . , In,

(c) The action coordinates Ij are conserved and the angle coordinates
φj evolve linearly in time:

İj = 0,   φ̇j = ∂h/∂Ij (I1, . . . , In) = constant

for all j.
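Definition 8.5 can be checked numerically for the simplest example. The sketch below is our own illustration (the frequency ω = 2, step size, and initial data are arbitrary choices): for H = ½ω(p² + q²), the action I = ½(p² + q²) stays constant along the flow while the angle φ = arg(p + iq) advances at the constant rate ∂h/∂I = ω.

```python
import numpy as np

# Harmonic oscillator H = (omega/2)(p^2 + q^2); omega and initial data are
# arbitrary choices for this illustration.
omega, dt, steps = 2.0, 1e-3, 5000

def rhs(y):
    q, p = y
    return np.array([omega * p, -omega * q])   # qdot = dH/dp, pdot = -dH/dq

y = np.array([1.0, 0.3])
qs, ps = [y[0]], [y[1]]
for _ in range(steps):                          # classical RK4 steps
    k1 = rhs(y); k2 = rhs(y + 0.5*dt*k1); k3 = rhs(y + 0.5*dt*k2); k4 = rhs(y + dt*k3)
    y = y + (dt/6)*(k1 + 2*k2 + 2*k3 + k4)
    qs.append(y[0]); ps.append(y[1])

t = dt * np.arange(steps + 1)
q, p = np.array(qs), np.array(ps)
I = 0.5 * (p**2 + q**2)                         # action: numerically constant
phi = np.unwrap(np.arctan2(q, p))               # angle: phi(0) + omega * t
print(I.std(), np.polyfit(t, phi, 1)[0])        # ~0 and ~omega
```

The fitted slope of φ against t recovers ω, which is exactly the statement φ̇ = ∂h/∂I = constant.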
So far, we have seen that in the elliptic case, if the Birkhoff normal form con-
verges as k → ∞ then the system is completely integrable and we can find a set
of action-angle coordinates. This can also be done in the hyperbolic case (8.17)
and other cases as well, although the angle coordinate no longer corresponds
to an actual angle. (For example, for the free Hamiltonian (8.12) with rectilin-
ear motion the angle coordinate q(t) = q0 + tp0 traces out a straight line.) As
it turns out, the converse is also true: if a system is completely integrable, then
the Birkhoff normal forms converge.
For the remainder of this section, we will abandon the ellipticity and normal
form assumptions and consider a general completely integrable Hamiltonian H.
By definition, the existence of action-angle coordinates immediately implies that
the system is completely integrable. The converse is also true: if a system is
completely integrable then formally we may find action-angle coordinates. This
process is called Liouville integration, as the action-angle coordinates provide
parametric solutions to the equations of motion.
It should be noted that completely integrable systems are quite exceptional
within the class of Hamiltonian systems. Specifically, Siegel [Sie54] showed that
generic (in the sense of the Baire category theorem) analytic Hamiltonians in a
neighborhood of a fixed point are not integrable.
Nevertheless, complete integrability is general enough to include a rich family
of examples. In addition to the Birkhoff normal form example at the beginning
of this section (and the Toda lattice system that we will study in section 8.6),
we have the following two examples.
Example 8.6 (One-dimensional systems). Every conservative system with one
degree of freedom:
E = ½ mẋ² + V(x),   x ∈ R

is completely integrable. Indeed, phase space is two-dimensional and so this
total energy provides the one conserved quantity that we require. Rearranging
this conservation law, we are able to replace the second-order equation of motion
with the first-order equation

ẋ = ±√( (2/m)[E − V(x)] ),
which is separable and thus provides a formal solution for the motion. Even
without explicitly evaluating this integral, we were able to use this formula in
section 2.5 to draw many conclusions (Propositions 2.16, 2.18 and 2.19).
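The separable equation above can be integrated numerically even when no closed form is available. The sketch below is our own illustration (m = 1 and the quartic potential V(x) = x²/2 + x⁴/4 are hypothetical choices): it evaluates the period T = 2∫dx/√((2/m)(E − V)) by quadrature, using the substitution x = x0 sin θ to remove the turning-point singularity, and cross-checks against direct integration of ẍ = −V′(x).

```python
import numpy as np

# Hypothetical one-dimensional potential (m = 1): V(x) = x^2/2 + x^4/4.
V = lambda x: 0.5 * x**2 + 0.25 * x**4
x0 = 1.0                 # release from rest: turning points are +-x0
E = V(x0)

# Period from the first-order equation xdot = +-sqrt(2(E - V(x))):
# after x = x0 sin(theta), the integrand is smooth on [-pi/2, pi/2].
nodes, w = np.polynomial.legendre.leggauss(200)
theta = 0.5 * np.pi * nodes
x = x0 * np.sin(theta)
T_quad = 2 * np.sum(0.5 * np.pi * w * x0 * np.cos(theta)
                    / np.sqrt(2 * (E - V(x))))

# Cross-check: half the period is the travel time from x0 to -x0, detected
# as the sign change of p = xdot at the far turning point.
def rhs(y):
    return np.array([y[1], -(y[0] + y[0]**3)])   # V'(x) = x + x^3

y, dt, t, half = np.array([x0, 0.0]), 1e-4, 0.0, None
while half is None:
    k1 = rhs(y); k2 = rhs(y + 0.5*dt*k1); k3 = rhs(y + 0.5*dt*k2); k4 = rhs(y + dt*k3)
    y_new = y + (dt/6)*(k1 + 2*k2 + 2*k3 + k4); t += dt
    if y[1] < 0 and y_new[1] >= 0:
        half = t
    y = y_new
print(T_quad, 2 * half)   # the two values agree
```

Since the quartic term stiffens the potential, both values come out below the harmonic period 2π.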
Example 8.7 (Central fields). A particle in R³ moving under the influence of
a radial potential:

H = (1/2m) |p|² + V(|x|),   x ∈ R³

is completely integrable. As phase space is six-dimensional, this requires three
conserved quantities. Spherical symmetry tells us that the 3 components of the
angular momentum L = x × p are conserved (cf. Proposition 1.16). However,
these quantities are not in involution! Indeed, we have
{L1 , L2 } = L3 , {L2 , L3 } = L1 , {L3 , L1 } = L2 ,
which expresses the fact that the components of the angular momentum provide
a representation of the Lie algebra of the rotation group SO(3).
Nevertheless, this system is completely integrable. It is straightforward to
check that the three functionally independent quantities
H, |L|2 , L3
are in involution. In section 3.1, we used this to replace the second-order equa-
tions of motion with first-order equations, which we could integrate to obtain a
formal solution (cf. (3.5) and (3.6)).
For the Kepler potential

V(x) = −k|x|⁻¹,   k ≠ 0
we know that all bounded orbits are closed. However, the 4 conserved quantities
H, L only tell us that orbits are constrained to a two-dimensional surface and
hence may not be—and in general are not—closed (cf. Theorem 3.3). For the
Kepler potential, Laplace discovered that
A = L × p + k x/|x|
is also conserved. Now we have 7 conserved quantities, which is too many! It
turns out that there are exactly 2 redundancies:
L · A = 0,   |A|² = 2H|L|² + k².
This leaves us with 5 functionally independent conserved quantities, which im-
plies that trajectories are restricted to a one-dimensional manifold as desired.
The Poisson brackets of these 5 quantities can also be computed, and it turns
out that they provide a representation of the four-dimensional rotation group
SO(4). Moser (cf. [MZ05, §1.6]) explained this symmetry by showing that for
negative energy, the Kepler problem is diffeomorphic to geodesic motion on the
sphere S 3 ⊂ R4 .
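The conserved quantities of this example are easy to verify numerically. The sketch below is our own illustration (the unit choices m = k = 1 and the initial condition are arbitrary): it integrates the Kepler flow and checks that H, the three components of L, and the three components of the Laplace vector A stay constant, and that the two redundancies L · A = 0 and |A|² = 2H|L|² + k² hold.

```python
import numpy as np

# Kepler flow with m = k = 1 (our unit choices): xdot = p, pdot = -x/|x|^3.
def rhs(y):
    x, p = y[:3], y[3:]
    return np.concatenate([p, -x / np.linalg.norm(x)**3])

def conserved(y):
    x, p = y[:3], y[3:]
    H = 0.5 * p @ p - 1 / np.linalg.norm(x)
    L = np.cross(x, p)
    A = np.cross(L, p) + x / np.linalg.norm(x)   # Laplace vector as in the text
    return H, L, A

y = np.array([1.0, 0.0, 0.0, 0.1, 1.1, 0.2])    # arbitrary bounded orbit (H < 0)
H0, L0, A0 = conserved(y)
dt = 1e-3
for _ in range(20000):                           # RK4 to t = 20 (about two orbits)
    k1 = rhs(y); k2 = rhs(y + 0.5*dt*k1); k3 = rhs(y + 0.5*dt*k2); k4 = rhs(y + dt*k3)
    y = y + (dt/6)*(k1 + 2*k2 + 2*k3 + k4)
H1, L1, A1 = conserved(y)
print(abs(H1 - H0), np.abs(L1 - L0).max(), np.abs(A1 - A0).max())
print(L1 @ A1, A1 @ A1 - (2 * H1 * (L1 @ L1) + 1))  # the two redundancies
```

All seven quantities are conserved to integration accuracy, while the two printed redundancies vanish, leaving the five functionally independent invariants described above.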
These examples illustrate the procedure of Liouville integration: when a
system is completely integrable, there are enough conserved quantities so that
we may integrate the equations of motion and find a formal expression for
the solution. In practice, the exact procedure varies from system to system.
Nevertheless, we have the following general fact:
Theorem 8.8 (Liouville–Arnold–Jost). Suppose that F1, . . . , Fn : R^{2n} → R
are smooth and {x ∈ R^{2n} : F(x) = c0} is a smooth compact connected
n-dimensional manifold on which ∇F1, . . . , ∇Fn are linearly independent. If the Fi
are also in involution with each other, then there exist action-angle coordinates
φ ∈ Rⁿ/2πZⁿ, I ∈ Rⁿ so that the canonical relations (8.26) and (8.28) hold,
I ≡ I(F1, . . . , Fn) is a function only of F, and I is a smooth local diffeomorphism
near c0.
We partially attribute this result to Liouville, as he founded the idea of
Liouville integration. Arnold was the first to make this rigorous statement on
R2n , and Jost extended the result to general manifolds.
Topologically, we interpret Theorem 8.8 as saying that we have a foliation of
R^{2n} by tori in a way that respects the Poisson structure. Indeed, (φ, I) provides
a map from a neighborhood of {F = c0} in R^{2n} into Rⁿ/2πZⁿ × {c ∈ Rⁿ :
|c − c0| < ε}. For fixed c, we call {F = c} a torus because this map provides a
parameterization from {F = c} to Rⁿ/2πZⁿ × {c}.
In this way Theorem 8.8 gives us a good understanding of the motion, but
it is not complete. In particular, we are making no claim about what the fixed
points look like (cf. the double well potential of Example 2.17).
Proof. As ∇F1, . . . , ∇Fn are linearly independent, there exists some δ > 0 so
that for |c − c0| < δ the equations F = c define smooth surfaces near F = c0.
For each c we pick a smoothly varying base point x0(c) ∈ {F = c}, and define
the smooth map

Rⁿ × Bδ(c0) → R^{2n}
(t, c) ↦ e^{t1 J∇F1} · · · e^{tn J∇Fn} x0(c)

which flows out from the base point x0(c) within {F = c} along the commuting
flows ẋ = J∇Fj(x).
As ∇F1 , . . . , ∇Fn are linearly independent, this map is a local diffeomor-
phism. In fact, we claim that for fixed c this forms a covering map from Rn
onto {F = c}. It only remains to show surjectivity, for which we use con-
nectedness. Consider the set of points in {F = c} accessible by this map. It
is relatively open as the map is a local diffeomorphism. Its complement must
then also be open, by the same argument for a different choice of base point.
Therefore, as {F = c} is connected, this set must be the whole set.
For each c, let L denote the set of times t ∈ Rⁿ for which the flow
e^{t1 J∇F1} · · · e^{tn J∇Fn} x0(c) returns to the base point x0(c). As the Fj flows commute, this set
L is an additive subgroup of Rⁿ. This subgroup must be discrete, since the
map is locally injective in a neighborhood of x0(c). Therefore L is a lattice.
It is a fact that any such lattice must be of the form { Σ_i mi ei : m ∈ Z^d } for
some generators {ei}_{i=1}^{d} ⊂ Rⁿ and d ≤ n. (To show this, we can pick e1 closest
to the origin, show that taking the quotient by e1 yields a lattice with the
same properties, and then induct.) Moreover, we must have a full set d = n
of generators because {F = c} is compact. (Indeed, if we had d < n, then the
image of our map would be a cylinder which is not compact.)
So far, we have a smooth map foliating the manifolds {F = c : |c − c0| < δ}
given by the coordinates (t, F), and we have a lattice L(c) ⊂ Rⁿ of full rank that
depends smoothly on c. By applying an invertible matrix A(c) that smoothly
varies with c, we can find new coordinates (θ, F) so that L = Zⁿ:

θ ∈ Rⁿ/Zⁿ,   θ = A(F) t.
However, these coordinates do not retain much of the Poisson structure; we have

{Fi, Fj} = 0,   {θi, Fj} = cij(F),   {θi, θj} = bij,

where cij(F) is an invertible matrix (since A is), and the bij are smooth functions
that are otherwise unknown.
By the Jacobi identity, we have

{bij, Fk} + {{Fk, θi}, θj} + {{θj, Fk}, θi} = 0.
Note that {Fk, θi} and {θj, Fk} are only functions of F and not θ. Therefore,
we deduce that {bij, Fk} depends only on F. On the other hand, we have

{bij, Fk} = Σ_{ℓ=1}^{n} ( ∂bij/∂θℓ {θℓ, Fk} + ∂bij/∂Fℓ {Fℓ, Fk} ).

The second bracket {Fℓ, Fk} above vanishes, while the first bracket {θℓ, Fk} = cℓk(F)
is invertible and depends only on F. Together, we conclude that ∂bij/∂θℓ
depends only on F. (In other words, the gradient of bij within {F = c} is
constant.) Therefore, each bij also depends only on F:

{θi, θj} = bij(F),   (8.29)

since a function that is periodic in θ whose θ-derivatives are independent of θ
must itself be independent of θ. (To verify this, we can expand bij(F, θ) in a
Fourier series in θ with F-dependent coefficients and use that the derivatives
∂bij/∂θℓ are constant in θ.)
Lastly, we will perform one further change of variables

I = I(F),   φ = θ + f(I)   (8.30)

to fix the Poisson bracket relations. We would like to make

δij = {θi, Ij} = Σ_{k=1}^{n} ∂Ij/∂Fk {θi, Fk}.   (8.31)

The brackets {θi, Fk} = cik(F) are invertible, and so this uniquely prescribes
the derivatives ∂Ij/∂Fk. Moreover, we see that ∂Ij/∂Fk is invertible, and so F ↦ I
will be a local diffeomorphism by the inverse function theorem, once we know
it exists. The values of ∂Ij/∂Fk will correspond to the derivatives of a function I if
we have equality of mixed partials. To avoid the mess of directly applying ∂/∂Fk
to (cij)⁻¹, we instead take the bracket of (8.31) with θℓ:

0 = Σ_{k=1}^{n} ∂Ij/∂Fk {{θi, Fk}, θℓ} + Σ_{k,m=1}^{n} ∂²Ij/(∂Fm ∂Fk) {θi, Fk}{Fm, θℓ}.

We want this expression to force the term ∂²Ij/(∂Fm ∂Fk) to be symmetric in k, m. The
brackets {θi, Fk} = cik and {Fm, θℓ} = −cℓm are invertible, and multiplying by
their inverses c^{iµ} and −c^{ℓν} yields

∂²Ij/(∂Fν ∂Fµ) = Σ_{i,k,ℓ} ∂Ij/∂Fk {{θi, Fk}, θℓ} c^{iµ} c^{ℓν}.   (8.32)

Note that the Jacobi identity and (8.29) imply

{{θi, Fk}, θℓ} = {{θℓ, Fk}, θi}.

Together, we see that RHS(8.32) is indeed symmetric in µ and ν, and so the
change of variables I(F) exists so that the bracket relations (8.31) hold.
It only remains to pick the function f in our change of variables (8.30).
Currently, the variables (θ, I) obey

{θi, Ij} = δij,   {Ii, Ij} = 0,   {θi, θj} = b̃ij(I).   (8.33)

The first relation is by our requirement (8.31), and the second relation is because
I = I(F). We want to choose f so that the change of variables φ = θ + f(I)
makes

0 = {φi, φj} = b̃ij(I) + ∂fj/∂Ii (I) − ∂fi/∂Ij (I).

This prescribes the “curl” of f(I), and it is solvable by some f because

b̃ij = −b̃ji

by the antisymmetry of the bracket of θi, θj, and

∂b̃ij/∂Ik + ∂b̃jk/∂Ii + ∂b̃ki/∂Ij = 0

by the Jacobi identity for θi, θj, θk.
Altogether, we have found a change of variables (8.30) which makes

{φi, Ij} = δij,   {Ii, Ij} = 0,   {φi, φj} = 0.
Indeed, the first relation follows from the corresponding relation (8.33) for θi
and that f is a function of I. These are exactly the canonical relations (8.26)
and (8.28), and so this concludes the proof of Theorem 8.8.
8.6. Toda lattice

To conclude this chapter, we will present an introduction to an example of an
infinite-dimensional completely integrable system. Define the one-dimensional
potential

V(x) = e^{−x} − 1 + x,   (8.34)

and consider the following infinite chain of oscillators:

H = Σ_{i∈Z} ( ½ pi² + V(qi+1 − qi) ).   (8.35)

This is a model for a one-dimensional crystal for the specific choice (8.34) of
interaction potential. There is no issue in working with this infinite-dimensional
system. Indeed, local well-posedness follows from the Picard–Lindelöf theorem
in Banach spaces (Theorem A.2). As V ≥ 0, we see that H controls the `2 norm
of q̇, and so conservation of energy implies global-in-time existence of solutions
for localized excitations.
Figure 8.1: The potential energy (8.34): V(x) ∼ e^{−x} as x → −∞, V(x) ∼ x as x → +∞, and V(x) ∼ ½x² near x = 0.
The equations of motion are

q̇i = pi,   ṗi = −V′(qi − qi−1) + V′(qi+1 − qi) = e^{qi−1−qi} − e^{qi−qi+1}   (8.36)

for i ∈ Z. Note that the contributions from the x term in (8.34) cancel out
because qi appears in the Hamiltonian for the ith and (i − 1)st particles. As
we will see, the infinite-dimensional system (8.36) turns out to be completely
integrable in a certain sense. In order to demonstrate this however, we will
need to exhibit infinitely many conserved quantities. To this end, we will need
to move beyond physical quantities like momentum and energy, and develop a
systematic tool that generates these conservation laws.
Our first step is the following (non-canonical) change of variables due to
Flaschka:

bi = −½ pi,   ai = ½ e^{(qi−qi+1)/2}.

Note that ai, bi do not determine pi, qi exactly, but rather up to a global translation
in qi. This is okay by the conservation of total momentum. In terms of
the new variables ai, bi, our system (8.36) becomes

ȧi = (bi+1 − bi) ai,   ḃi = 2(ai² − ai−1²).

A straightforward computation shows that this system is equivalent to the matrix
equation

L̇ = [P, L] = P L − L P   (8.37)

for the matrices

       ( ⋱   ⋱              )          ( ⋱    ⋱              )
       ( ⋱   b0  a0         )          ( ⋱    0   a0         )
L(t) = (     a0  b1  a1     ),  P(t) = (     −a0  0   a1     ),
       (         a1  b2  ⋱  )          (         −a1  0   ⋱  )
       (             ⋱   ⋱  )          (              ⋱   ⋱  )

that is, L is symmetric tridiagonal with diagonal entries bi and off-diagonal
entries ai, and P is antisymmetric with zero diagonal and the same off-diagonal
entries.
Together, L and P form what is called a Lax pair for the Toda lattice: the
equations of motion can be recast as the equation (8.37) for L symmetric and
P antisymmetric. So far, we intentionally are not making any rigorous claim
about the existence of the matrices L, P . Nevertheless, each entry of the equa-
tion (8.37) only involves finitely many entries of L, P and thus makes sense.
The choice of Lax pair is certainly not unique; indeed, we are free to add any
antisymmetric function of L to P without changing the RHS of (8.37). However,
the structure of P is not surprising once we choose L to be tridiagonal: for
example, we need the super-super-diagonal of L̇ to vanish, which dictates the
entries of P .
Formally, the existence of a Lax pair for a system implies that the system is
completely integrable. If we let U(t) denote the solution to

U̇ = P U,   U(0) = I,

then we see that

L(t) = U(t) L(0) U(t)⁻¹   (8.38)

solves (8.37). Moreover, the matrix U(t) remains orthogonal for all time since

d/dt (Uᵀ U) = Uᵀ Pᵀ U + Uᵀ P U = 0.
Therefore, the formula (8.38) for the solution tells us that L(t) is orthogonally
conjugate to its initial value. As conjugation preserves spectral information
(e.g. it does not change the Jordan normal form), this implies that the eigen-
values of L(t) are conserved.
In order to make this rigorous, we will restrict to finite dimensional subspaces
of arbitrary size. We will impose that our solutions are periodic with period n:

qi+n = qi , pi+n = pi for all i.
This periodicity is preserved by the dynamics. Now, the matrix equation (8.37)
holds for the n × n matrices

    ( b1  a1             an )        (  0   a1            −an )
    ( a1  b2  a2            )        ( −a1   0   a2           )
L = (     a2  b3  ⋱         ),   P = (      −a2   0   ⋱       )
    (         ⋱   ⋱   an−1  )        (           ⋱   ⋱  an−1  )
    ( an         an−1  bn   )        ( an       −an−1   0     )

with an = ½ e^{(qn−q1)/2}.
For these finite-dimensional matrices, we are justified in computing their
spectral properties. First, we note that the trace of L,

tr(L) = Σ_{i=1}^{n} bi = −½ Σ_{i=1}^{n} pi,   (8.39)
is proportional to the total momentum. Moreover, we have

tr(L²) = Σ_{i,j=1}^{n} Lij² = Σ_{i=1}^{n} (bi² + 2ai²) = Σ_{i=1}^{n} ½( ½pi² + e^{qi−qi+1} )
       = ½H + ½ Σ_{i=1}^{n} (1 − (qi+1 − qi)) = ½H + ½n,   (8.40)

where H is the Hamiltonian (8.35) (with the summation only over one period
i = 1, . . . , n), and in the last equality we noted that the sum of qi+1 − qi is
telescoping and vanishes.
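The identities (8.39) and (8.40) are easy to confirm numerically. The sketch below is our own check (the random configuration and n = 5 are arbitrary choices): it builds the periodic Lax matrix from the Flaschka variables and evaluates both traces.

```python
import numpy as np

# Random periodic configuration (n = 5 is arbitrary) in Flaschka variables.
rng = np.random.default_rng(1)
n = 5
q, p = rng.standard_normal(n), rng.standard_normal(n)
a = 0.5 * np.exp((q - np.roll(q, -1)) / 2)   # a_i = (1/2) e^{(q_i - q_{i+1})/2}
b = -0.5 * p
L = np.diag(b) + np.diag(a[:-1], 1) + np.diag(a[:-1], -1)
L[0, -1] = L[-1, 0] = a[-1]                  # periodic corner entries a_n

# Hamiltonian over one period, with V(x) = e^{-x} - 1 + x.
H = np.sum(0.5 * p**2 + np.exp(q - np.roll(q, -1)) - 1 + (np.roll(q, -1) - q))
print(np.trace(L) + 0.5 * np.sum(p))             # (8.39): tr L = -(1/2) sum p_i
print(np.trace(L @ L) - (0.5 * H + 0.5 * n))     # (8.40): tr L^2 = H/2 + n/2
```

Both printed residuals vanish to rounding error, confirming that the total momentum and the energy are encoded in the first two spectral moments of L.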
We know that both of the quantities (8.39) and (8.40) are conserved. In
fact, for any k ≥ 1 we have

d/dt tr(L^k) = tr(L̇ L^{k−1}) + tr(L L̇ L^{k−2}) + · · · + tr(L^{k−1} L̇)
            = tr([P, L] L^{k−1}) + tr(L [P, L] L^{k−2}) + · · · + tr(L^{k−1} [P, L]).

Expanding [P, L] = P L − L P, we see that this sum is telescoping and we are
left with

d/dt tr(L^k) = tr([P, L^k]) = tr(P L^k) − tr(L^k P) = 0

by cycling the trace (which is justified for L finite-dimensional).
More generally, for any function f(L) of the self-adjoint matrix L (which
makes sense by the spectral theorem), we have

d/dt tr{f(L)} = tr{[P, f(L)]} = 0.

In particular,

d/dt tr{(L − z)⁻¹} = 0.
The function z ↦ tr{(L − z)⁻¹} is meromorphic with poles at the eigenvalues
of L. From this, we deduce that the n eigenvalues of L are conserved. (This
argument is not the most direct argument for finite-dimensional systems, but it
has the advantage that it can be extended to infinite-dimensional operators.)
It follows that the system is completely integrable. So far, we have found n
conserved quantities, namely the eigenvalues of L. A rather delicate computa-
tion then shows that these n quantities are also in involution; see [MZ05, §3.4]
for details.
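The conservation of the spectrum can also be watched directly. The sketch below is our own check (random initial data, n = 6, and the RK4 stepper are arbitrary choices): it integrates the periodic Flaschka equations and compares the eigenvalues of L before and after.

```python
import numpy as np

# Periodic Flaschka equations: a_i' = a_i (b_{i+1} - b_i), b_i' = 2(a_i^2 - a_{i-1}^2).
def rhs(y, n):
    a, b = y[:n], y[n:]
    return np.concatenate([a * (np.roll(b, -1) - b),
                           2 * (a**2 - np.roll(a, 1)**2)])

def lax(y, n):
    a, b = y[:n], y[n:]
    L = np.diag(b) + np.diag(a[:-1], 1) + np.diag(a[:-1], -1)
    L[0, -1] = L[-1, 0] = a[-1]          # periodic corner entries a_n
    return L

n, dt = 6, 1e-3
rng = np.random.default_rng(0)
q, p = rng.standard_normal(n), rng.standard_normal(n)
y = np.concatenate([0.5 * np.exp((q - np.roll(q, -1)) / 2), -0.5 * p])
y0 = y.copy()
ev0 = np.linalg.eigvalsh(lax(y, n))
for _ in range(2000):                    # RK4 to t = 2
    k1 = rhs(y, n); k2 = rhs(y + 0.5*dt*k1, n)
    k3 = rhs(y + 0.5*dt*k2, n); k4 = rhs(y + dt*k3, n)
    y = y + (dt/6)*(k1 + 2*k2 + 2*k3 + k4)
ev1 = np.linalg.eigvalsh(lax(y, n))
print(np.abs(ev1 - ev0).max())           # small: the spectrum is conserved
```

The variables a, b move substantially over this time interval, yet the eigenvalues of L agree to integration accuracy, as the Lax equation predicts.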
Next, we turn to the eigenvectors of L. Given

ψ(0) ∈ Rⁿ \ {0}   such that   (L(0) − λ)ψ(0) = 0,

let ψ(t) be the solution to

ψ̇ = P ψ.   (8.41)

This is a Lipschitz ODE, and thus has a global solution ψ(t) by the Picard–
Lindelöf theorem (Theorem A.2). Then ψ obeys

d/dt [(L(t) − λ)ψ(t)] = [P, L]ψ + (L − λ)P ψ = P (L − λ)ψ.
By Grönwall’s inequality (Lemma A.3), it follows that ψ remains an eigenvector
for all t ∈ R:
(L(t) − λ)ψ(t) ≡ 0.
(This computation is for real ψ; over the field C, we can multiply by eiθ and
account for degenerate eigenvalues as well.) Moreover, the length of ψ is constant:

d/dt |ψ|² = P ψ · ψ + ψ · P ψ = ψ · (P + Pᵀ)ψ = 0.
Altogether, we conclude that the dynamics preserve the eigenvalues of L while
the eigenvectors move around. With these eigenvectors in hand, we can con-
struct the solution (8.38) for L(t) to the equation (8.37).
So far, we have seen that the infinite-dimensional system (8.35) is completely
integrable, in the sense that it is the limit of n-dimensional completely integrable
systems as n → ∞. For the remainder of this section, we will examine the corre-
sponding action-angle coordinates. For this, it turns out to be more convenient
to work with a different system corresponding to the matrices

    ( b1  a1              )         (  0   a1              )
    ( a1  b2  a2          )         ( −a1   0   a2         )
L = (     a2  b3  ⋱       ),    P = (      −a2   0   ⋱     ).
    (         ⋱   ⋱  an−1 )         (           ⋱   ⋱ an−1 )
    (            an−1  bn )         (        −an−1   0     )
We have deleted the an entries in the corners of the matrix. This yields different
finite-dimensional systems, but they correspond to the same Lax pair as n →
∞. These an entries had originated from the x term in (8.34), and thus these
matrices are the Lax pair for the Hamiltonian (8.35) but with the new potential
energy
V(x) = e^{−x}.
Note that this potential no longer has an equilibrium at x = 0.
Our action-angle coordinates will be closely related to the spectral properties
of the operator L. The matrices L, P are now tridiagonal, and L is the operator
for a discrete Sturm–Liouville problem. Consequently, we expect that L should
have similar properties to a Sturm–Liouville operator. In particular, we claim
that L has simple eigenvalues. If Lu = λu for some u ∈ Rn , then the first entry
reads
b1 u1 + a1 u2 = λu1 .
As aj > 0, we see that u1 determines u2 . With these in hand, we then see that
the second entry of Lu = λu determines u3 , and so on. In this way, u1 uniquely
determines the eigenvector u.
By the spectral theorem, we may construct the discrete probability measure

dµ(λ) = Σ_{j=1}^{n} µj δ(λ − λj) dλ   with   µj = u1j².

Here, (u1j, . . . , unj) is the eigenvector for the eigenvalue λj:

Σ_{ℓ=1}^{n} Lkℓ uℓj = λj ukj.
This makes

⟨e1, f(L)e1⟩ = ∫ f(λ) dµ(λ)

for all f : R → R. As we have just seen, L determines the measure µ. Conversely,
for all f : R → R. As we have just seen, L determines the measure µ. Conversely,


given µ, we can reconstruct L as the matrix representation of the operation
f (λ) 7→ λf (λ) in L2 (dµ) in the basis of orthonormal polynomials (i.e. the result
of the Gram–Schmidt procedure applied to 1, x, x2 , . . . , xn−1 ). In other words,
the three middle entries of the kth row of L must satisfy

λ pk(λ) = ak pk+1(λ) + bk pk(λ) + ak−1 pk−1(λ).

We know that the other entries must be zero, because the RHS must have degree
k + 1 and we have the orthogonality condition

∫ λ pk(λ) pk−2(λ) dµ(λ) = ∫ pk(λ) λ pk−2(λ) dµ(λ) = 0

since λ pk−2 has degree k − 1.
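This reconstruction can be carried out concretely: running Gram–Schmidt on 1, x, . . . , x^{n−1} in L²(dµ) is the same computation as the Lanczos iteration applied to diag(λ1, . . . , λn) with starting vector (√µ1, . . . , √µn). The sketch below is our own round-trip check on random data (the seed and n = 6 are arbitrary choices).

```python
import numpy as np

# Rebuild the tridiagonal L from its spectral measure at e_1 via Lanczos,
# which is equivalent to Gram-Schmidt on the monomials in L^2(dmu).
def jacobi_from_measure(lam, mu):
    n = len(lam)
    a, b = np.zeros(n - 1), np.zeros(n)
    v = np.sqrt(mu)                    # first orthonormal "polynomial"
    v_prev = np.zeros(n)
    beta = 0.0
    for k in range(n):
        w = lam * v - beta * v_prev    # multiply by x, remove previous term
        b[k] = v @ w                   # diagonal entry b_k
        w -= b[k] * v
        if k < n - 1:
            beta = np.linalg.norm(w)
            a[k] = beta                # off-diagonal entry a_k
            v_prev, v = v, w / beta
    return a, b

# Round trip: random tridiagonal L -> (lambda_j, mu_j) -> L again.
rng = np.random.default_rng(2)
n = 6
a0 = rng.uniform(0.5, 1.5, n - 1)      # off-diagonals must be positive
b0 = rng.standard_normal(n)
L = np.diag(b0) + np.diag(a0, 1) + np.diag(a0, -1)
lam, U = np.linalg.eigh(L)
mu = U[0, :]**2                        # weights: squared first components
a1, b1 = jacobi_from_measure(lam, mu)
print(np.abs(a1 - a0).max(), np.abs(b1 - b0).max())  # both ~ machine precision
```

The recovered entries match the originals, illustrating that the measure µ determines L (with positive off-diagonals) uniquely.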
Next, we turn to the evolution of µ. We have already seen that the eigenvalues
λj of L are conserved:

λ̇j = 0,

and that the eigenvectors u evolve according to (8.41). (We proved this for our
previous Lax pair, but the proof only relied on the evolution equation (8.37)
and thus still applies.) Consequently, the evolution of the weights µj is

µ̇j = 2 u1j u̇1j = 2 u1j a1 u2j = 2(λj − b1) u1j²
    = 2(λj − b1) µj = 2 ( Σ_{k=1}^{n} (λj − λk) µk ) µj,

where in the last equality we used

b1 = ⟨e1, L e1⟩ = ∫ λ dµ(λ) = Σ_{k=1}^{n} λk µk.

The solution is given by

µj(t) = e^{2λj t} µj(0) / Σ_k e^{2λk t} µk(0).   (8.42)
This is an exponential with rate 2λj that is normalized so that µ is a probability
measure. Together, the eigenvalues λj and the weights µj provide a complete
and simple description of the flow. The eigenvalues are action variables, as they
are conserved and in involution. Strictly speaking, the weights µj are not an-
gle variables since they evolve exponentially rather than linearly; this is easily
remedied by taking their logarithm, but this is not strictly necessary as the µj
already provide an equally simple description of the motion. Altogether, the
transformation from aj , bj to λj , µj is like a nonlinear version of the Fourier
transform: just like the Fourier transform linearizes a linear constant-coefficient
differential equation, we have found a change of coordinates λj , µj which lin-
earizes the flow.
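The explicit solution (8.42) can be tested against a direct integration of the corner-free system. The sketch below is our own check (random initial data, n = 5, and the RK4 stepper are arbitrary choices): the eigenvalues stay fixed, while the weights follow the normalized exponentials exactly.

```python
import numpy as np

# Free-end Toda in Flaschka variables (a_0 = a_n = 0, so a has n-1 entries).
def rhs(y, n):
    a, b = y[:n-1], y[n-1:]
    apad = np.concatenate([[0.0], a, [0.0]])
    return np.concatenate([a * (b[1:] - b[:-1]),             # a_i' = a_i (b_{i+1} - b_i)
                           2 * (apad[1:]**2 - apad[:-1]**2)])  # b_i' = 2(a_i^2 - a_{i-1}^2)

def spectral_data(a, b):
    lam, U = np.linalg.eigh(np.diag(b) + np.diag(a, 1) + np.diag(a, -1))
    return lam, U[0, :]**2             # eigenvalues and weights mu_j = u_{1j}^2

n, dt, T = 5, 1e-3, 1.0
rng = np.random.default_rng(3)
y = np.concatenate([rng.uniform(0.5, 1.0, n - 1), rng.standard_normal(n)])
lam0, mu0 = spectral_data(y[:n-1], y[n-1:])
for _ in range(int(T / dt)):           # RK4 to time T
    k1 = rhs(y, n); k2 = rhs(y + 0.5*dt*k1, n)
    k3 = rhs(y + 0.5*dt*k2, n); k4 = rhs(y + dt*k3, n)
    y = y + (dt/6)*(k1 + 2*k2 + 2*k3 + k4)
lam1, mu1 = spectral_data(y[:n-1], y[n-1:])
mu_pred = np.exp(2 * lam0 * T) * mu0
mu_pred /= mu_pred.sum()               # the formula (8.42)
print(np.abs(lam1 - lam0).max(), np.abs(mu1 - mu_pred).max())
```

Both residuals vanish to integration accuracy: the pair (λj, µj) really does linearize the flow in the sense described above.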
It remains to invert our transformation and recover aj, bj from λj, µj. Let

cj = ∫ x^j dµ(x),

and consider the k × k Hankel determinant of the cj:

         ( c0    c1   · · ·  ck−1  )
         ( c1    c2   · · ·  ck    )
hk = det ( ⋮     ⋮     ⋱     ⋮     ).
         ( ck−1  ck   · · ·  c2k−2 )
These quantities are closely related to the Gram–Schmidt procedure that produces
the polynomials pk(x). Specifically, we have

             ( c0    c1   · · ·  ck    )
             ( c1    c2   · · ·  ck+1  )
pk(x) ∝ det  ( ⋮     ⋮     ⋱     ⋮     ).   (8.43)
             ( ck−1  ck   · · ·  c2k−1 )
             ( 1     x    · · ·  x^k   )

This is because the RHS is a polynomial of degree k, and the Andréief–Heine


identity  
c0 c1 ··· ck
Z  c1
 c2 ··· ck+1 
`  .. .. .. .. 
x RHS(8.43) dx = det  . . . . 
 
ck−1 ck ··· c2k−1 
c` c`+1 ··· c`+k
implies that the integral vanishes for ` ≤ k − 1 (as required by the Gram–
Schmidt procedure). It turns out that the Hankel determinants of the cj also
allow us to recover the ak via the formula
s
hk−1 hk+1
ak = .
h2k

The change of variables from the λj , µj back to the aj , bj is called the inverse
scattering transformation. (The reason for this name is that it is common
for infinite-dimensional completely integrable systems for the action-angle vari-
ables to be connected to scattering theory, and inverting the change of variables
requires new work.)
Finally, we note that the simple evolution of λj, µj also provides a description
of the long-time behavior. After reordering λ1 > λ2 > · · · > λn, we see that
the exponential e^{2λ1 t} is the dominant term as t → ∞, and so from the explicit
solutions (8.42) we have

dµ → δ(λ − λ1) dλ   as t → ∞.

Therefore

b1 = ∫ λ dµ(λ) → λ1,   a1 → 0,

and so

    ( λ1  0           )
    ( 0   ∗  ∗        )
L → (     ∗  ∗  ⋱     ).
    (        ⋱  ⋱  ∗  )
    (           ∗  ∗  )
This resembles the first step of the QR algorithm for the symmetric tridiagonal
matrix L. In fact, we can obtain a dynamical version of the full QR algorithm
by removing the first row and column of L and repeating the process. Indeed,
the next order exponential is e^{2λ2 t}, which yields

dµ = Z⁻¹ ( δ(λ − λ1) + e^{−2(λ1−λ2)t} δ(λ − λ2) + O(e^{−2(λ1−λ3)t}) ) dλ

for Z a normalization constant, and this suggests b2 → λ2 and a2 → 0. This
can be made rigorous by evaluating the λj in terms of Hankel determinants as
t → ∞; see [MZ05, §3.4] for details.
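This long-time behavior can be traced entirely through (8.42), using b1 = ⟨e1, Le1⟩ = Σ λj µj and ⟨e1, L²e1⟩ = b1² + a1² = Σ λj² µj. In the sketch below the spectrum λ = (2, 1, 0, −1) and the uniform initial weights are hypothetical choices of our own.

```python
import numpy as np

# Evolve the weights explicitly via (8.42) and read off b_1(t) and a_1(t)
# from the first two moments of the measure:
#   b_1 = sum lambda_j mu_j,   a_1^2 = sum lambda_j^2 mu_j - b_1^2.
lam = np.array([2.0, 1.0, 0.0, -1.0])   # hypothetical spectrum, lambda_1 = 2
mu0 = np.full(4, 0.25)                  # uniform initial weights (hypothetical)
for t in [0.0, 2.0, 5.0]:
    mu = np.exp(2 * lam * t) * mu0
    mu /= mu.sum()                      # formula (8.42)
    b1 = lam @ mu
    a1 = np.sqrt(lam**2 @ mu - b1**2)
    print(t, b1, a1)
# b1 climbs toward lambda_1 = 2 while a1 decays like e^{-(lambda_1 - lambda_2) t},
# mirroring the first step of the QR algorithm.
```

By t = 5 the top eigenvalue has essentially decoupled: b1 agrees with λ1 to a few parts in 10⁵ and a1 has shrunk below 10⁻².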
8.7. Exercises

8.1 (Example of Birkhoff normal form). Fix ω > 0 and a ∈ R, and consider the
Hamiltonian

H = ½ p² + ½ ω² q² + a q⁴.
(a) Compute the Birkhoff normal form up to degree 4.
(b) Provide a correspondingly accurate approximation for the period of oscil-
lation as a function of energy.

8.2 (Example of Liouville integration). Fix ω > 0 and a ∈ R, and consider the
Hamiltonian

H = ½ p² + ½ ω² q² + a q⁴.
(a) Using conservation of energy under the above Hamiltonian flow, show that
the dynamics yield
q̇ = f (E, q)
for suitable explicit function(s) f .
(b) This equation is separable. Thus one can formally solve for p(t) and q(t)
via integration. Use this method to derive a formula for the period of
oscillation (as a function of energy) in the form of a definite integral.

8.3 (Harmonic oscillator). The one-dimensional harmonic oscillator has Hamiltonian

H = T + V = (1/2m) p² + ½ m ω² x².

(a) Find the action variable I and show that the energy as a function of I is
h(I) = ωI.

(b) Write down (but do not evaluate) the integral for the generating function
Φ, and show that the angle variable is given by

φ = tan⁻¹( x / √(2I/(mω) − x²) ) + constant.

(c) Using the linear evolution of φ, find the solution x(t) for the motion.
CHAPTER 9

SYMPLECTIC GEOMETRY

Symplectic geometry is the differential geometric generalization of the time-independent
Hamiltonian structure of phase space to general manifolds, and the
physical results for conservative systems from the previous chapter can be lifted
to this perspective. The material for this section is based on [Arn89, Ch. 8]
and [Lee13, Ch. 22].

9.1. Symplectic structure

Let M be a smooth even-dimensional manifold of dimension 2n. A symplectic
form or symplectic structure on M is a two-form ω on M that is
closed (dω = 0) and nondegenerate: for each x ∈ M the map Tx M → Tx∗M
which takes ξ ↦ ξ⌟ω = ω(ξ, ·) is invertible. The pair (M, ω) is called a symplectic
manifold. There are other more easily verifiable criteria for a two-form
ω (cf. Exercise 9.1), but conceptually this is the classification we will rely
upon. Heuristically, the symplectic form ω is an identification of the tangent and
cotangent spaces at each point in M, which bears resemblance to Riemannian
structure but, as we shall see, exhibits drastically different behavior.
Symplectic structures are equivalent to nondegenerate Poisson structures via
inverting the structure matrix J:
Example 9.1. If J^{ij}(x) is the structure matrix of a canonical Poisson bracket

{·, ·} = Σ_{i,j=1}^{n} J^{ij}(x) ∂/∂xi ⊗ ∂/∂xj

as in Corollary 7.9, then

ω = Σ_{i,j=1}^{n} Jij(x) dxi ⊗ dxj   (9.1)

is a symplectic form, where Jij are the entries of the inverse of the matrix J^{ij}.
This is the dual object to Poisson brackets. Indeed, so far we have been
viewing a vector field X as an operator

X(f) = Σ_{i=1}^{n} X^i(x) ∂f/∂xi

on smooth functions f. We can also view this as the exterior derivative df(X),
where f operates on X and dx^i is defined by

dx^i( ∂/∂x^j ) = 1 if i = j,  0 if i ≠ j.

The operator (9.1) is an exterior 2-form because J is antisymmetric. It is
also closed because J satisfies the Jacobi identity (Exercise 9.2).
In particular, if we take the canonical Poisson structure in the previous
example, we obtain:
Example 9.2 (Canonical symplectic form). Consider a cotangent bundle T∗N
of an n-dimensional manifold N, or simply the Euclidean space T∗Rⁿ = Rⁿ_q × Rⁿ_p.
Then the two-form

ω = dp ∧ dq = Σ_{i=1}^{n} dpi ∧ dq^i   (9.2)

is a symplectic form on T∗N, and is called the canonical symplectic form
on the cotangent bundle; we will later see how this form naturally encapsulates
Hamilton's equations. This form is closed, since it is the exterior derivative of
the one-form

p dq = Σ_{i=1}^{n} pi dq^i,   (9.3)

which is in turn called the tautological one-form on T ∗ M ; we first saw this


form in the period integral (2.9) and then again later in defining the action
variable (6.20). To see that ω is nondegenerate, fix (q, p) ∈ T ∗ N and let
(v1 , . . . , vn , a1 , . . . , an ) ∈ T(q,p) (T ∗ N ) denote the dual basis to (dq, dp) so that

dq i (vj ) = δij , dpi (aj ) = δij , dq i (aj ) = dpi (vj ) = 0

for i, j = 1, . . . , n. Often in differential topology we write ∂qi for the dual vector
to dq i , but physically we like to think of the tangent vector to a point in phase
space as the velocity and (mass-times-)acceleration. The action of ω on these
basis vectors is

ω(vi , vj ) = ω(ai , aj ) = 0, ω(vi , aj ) = −ω(aj , vi ) = δij



for i, j = 1, . . . , n. To check nondegeneracy, fix ξ ∈ T(q,p) (TP N ) and suppose we
∗ n
have ω(ξ, η) = 0 for all η ∈ T(q,p) (T N ). Expanding ξ = i=1 (bi vi + ci ai ) for
i i
constants b , c ∈ R, we see that

0 = ω(ξ, vi ) = −ci , 0 = ω(ξ, ai ) = bi

and so ξ = 0 as desired.
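The nondegeneracy argument above amounts to the invertibility of the matrix of ω in the basis (v, a). As a quick numerical sanity check (not part of the text), one can verify that this matrix is antisymmetric and invertible:

```python
import numpy as np

def canonical_omega(n):
    """Matrix of the canonical symplectic form in the basis (v_1..v_n, a_1..a_n),
    i.e. omega(v_i, a_j) = delta_ij and all other basis pairings vanish."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

n = 3
Om = canonical_omega(n)
assert np.allclose(Om, -Om.T)               # omega is alternating
assert abs(np.linalg.det(Om) - 1) < 1e-12   # invertible, hence nondegenerate
# If omega(xi, eta) = 0 for every eta then Om @ xi = 0, which forces xi = 0:
assert np.allclose(np.linalg.solve(Om, np.zeros(2 * n)), 0)
```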
In fact, the form of Example 9.2 is the fundamental example in the following
sense.

Theorem 9.3 (Darboux’s theorem). Let (M, ω) be a 2n-dimensional symplectic


manifold. For any ξ ∈ M there exists local coordinates (q, p) centered at ξ with
respect to which ω has the representation (9.2).
This follows from Theorem 7.18, but also can be proved directly [Lee13,
Th. 22.13].
Symplectic manifolds are automatically orientable because ω^n is a nonvanishing
2n-form on M. Indeed, if we fix x ∈ M and write ω = \sum_i dp_i ∧ dq^i at x,
then the n-fold wedge product is
\[
\omega^n = \sum_I dp_{i_1} \wedge dq^{i_1} \wedge \cdots \wedge dp_{i_n} \wedge dq^{i_n},
\]
where the sum ranges over all multi-indices I = (i_1, . . . , i_n) of length n. Any
term where I contains a repeated index is zero because dq^i ∧ dq^i = 0, and since
2-forms commute under the wedge product we can rewrite this as
\[
\omega^n = n!\, dp_1 \wedge dq^1 \wedge \cdots \wedge dp_n \wedge dq^n.
\]
This is nonvanishing at x by the nondegeneracy of ω.

9.2. Hamiltonian vector fields

Next let us see how to use the symplectic structure to obtain the famil-
iar dynamics from a Hamiltonian. Let (M, ω) be a symplectic manifold and
H : M → R be a smooth function. Then dH is a differential one-form which
associates a covector to each point in M . On the other hand, for Hamilton’s
equations we need to specify a vector field for the right-hand side of the differen-
tial equation. By definition, for each x ∈ M the map Tx M → Tx∗ M which takes
ξ 7→ ξyω is invertible, and so let J : Tx∗ M → Tx M denote its inverse. Then
J dH is a vector field on M , called the Hamiltonian vector field associated
to the Hamiltonian H. The induced flow etJ dH : M → M , x0 7→ x(t) defined
to solve the ODE
ẋ(t) = J dH(x(t)), x(0) = x0 (9.4)
is called a Hamiltonian flow on M. In differential geometry the notation
ω̂ : Tx M → Tx* M is used for the map ξ 7→ ξyω, and so in place of J the
notations ω̂^{-1} and
\[
X_H := \hat\omega^{-1}(dH) \iff X_H \,\lrcorner\, \omega = dH \qquad (9.5)
\]
are often used.
Example 9.4. Consider the canonical symplectic form (9.2) on the Euclidean
phase space T ∗ Rn = Rnq × Rnp . On Euclidean space we already have a natural
identification of vectors and covectors via the dot product, and we can express
J as a linear transformation in terms of this identification. Given a vector field
\[
X = \sum_{i=1}^n \left( b^i \frac{\partial}{\partial q^i} + c^i \frac{\partial}{\partial p_i} \right)
\]
for some smooth coefficients b^i and c^i, we compute
\[
X \,\lrcorner\, \omega = \sum_{i=1}^n \left( c^i\, dq^i - b^i\, dp_i \right),
\]
which under the dot-product identification sends the vector (b, c) to (c, −b), and hence
\[
J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \qquad (9.6)
\]
for I the identity matrix. Writing
\[
dH = \frac{\partial H}{\partial q}\, dq + \frac{\partial H}{\partial p}\, dp = \sum_{i=1}^n \left( \frac{\partial H}{\partial q^i}\, dq^i + \frac{\partial H}{\partial p_i}\, dp_i \right) = \begin{pmatrix} \partial H/\partial q \\ \partial H/\partial p \end{pmatrix},
\]
we have
\[
\begin{pmatrix} \dot q \\ \dot p \end{pmatrix} = J\, dH = \begin{pmatrix} \partial H/\partial p \\ -\partial H/\partial q \end{pmatrix},
\]
which agrees with Hamilton's equations (7.2).


Symplectic geometry is the generalization of time-independent Hamiltonian
dynamics since the Hamiltonian function H is automatically conserved under
its Hamiltonian flow.

Proposition 9.5 (Conservation of energy). Let (M, ω) be a symplectic manifold
and let H be a smooth function on M. Then H is constant along the integral
curves of J dH, and when dH ≠ 0 the vector field J dH is tangent to the level
sets of H.
Proof. Both are a consequence of

(J dH)(H) = dH(J dH) = ω(J dH, J dH) = 0,

which holds since ω is alternating.

9.3. Integral invariants

Consider a smooth diffeomorphism g : M → M. A k-form η is said to be an
integral invariant for g if
\[
\int_{g(N)} \eta = \int_N \eta \qquad (9.7)
\]
for all orientable k-dimensional submanifolds N with boundary.


Theorem 9.6. A Hamiltonian flow gt = etJ dH preserves the symplectic struc-
ture: gt∗ ω = ω. In other words, the symplectic form ω is an integral invariant
of gt .

Proof. The map gt is smoothly homotopic to the identity via the family of
diffeomorphisms gs , s ∈ [0, t], in the sense that at time s = 0 the map g0 :
M → M is the identity, and at time s = t the map gt : M → M is what
we are given. Fix N a smooth orientable 2-dimensional submanifold, and let
ΩN := {gs (N ) : s ∈ [0, t]} denote the image of N under the homotopy. We can
think of ΩN as an orientable 3-manifold in [0, t] × M or as being immersed in
M . With this choice of orientation we have
∂ΩN = gt (N ) ∪ (−N ) ∪ (−Ω∂N ). (9.8)
We claim that for any smooth curve γ in M we have
\[
\frac{d}{dt} \int_{\Omega_\gamma} \omega = \int_{g_t(\gamma)} dH, \qquad (9.9)
\]

where H is the Hamiltonian for the flow g_t. Let φ : [a, b] → M be a
parametrization of γ. Then Ω_γ is parametrized by Φ(s, x) := g_s(φ(x)) and we have
\[
\int_{\Omega_\gamma} \omega = \int_0^t \int_a^b \omega\!\left( \frac{\partial \Phi}{\partial x}, \frac{\partial \Phi}{\partial s} \right) dx\, ds,
\]
where ∂Φ/∂s(s, x) is in T_{Φ(s,x)}M. By definition of the Hamiltonian phase flow
g_t = e^{tJ dH}, the tangent vector ∂Φ/∂s points in the direction of J dH, and so we
have ω(∂Φ/∂x, ∂Φ/∂s) = dH(∂Φ/∂x). Therefore
\[
\int_{\Omega_\gamma} \omega = \int_0^t \int_a^b dH\!\left( \frac{\partial \Phi}{\partial x} \right) dx\, ds = \int_0^t \left( \int_{g_s(\gamma)} dH \right) ds,
\]
and the identity (9.9) follows from the fundamental theorem of calculus.
For a closed curve like ∂N we note that
\[
\int_{g_t(\partial N)} dH = \int_{g_t(\emptyset)} H = 0,
\]
and so the identity (9.9) implies
\[
\int_{\Omega_{\partial N}} \omega = 0. \qquad (9.10)
\]

As ω is closed by definition, then by Stokes’ theorem we have


Z Z
0= dω = ω.
ΩN ∂ΩN

Decomposing the boundary ∂ΩN according to (9.8) we obtain


Z Z Z
0= ω− ω− ω.
gt (N ) N Ω∂N

From (9.10) we know the last integral vanishes, and so we conclude that ω is an
integral invariant of gt .

In the previous section we saw that ω^n defines a volume form on M, and so
we immediately obtain an analog of Theorem 7.25.

Corollary 9.7 (Liouville's theorem). Each of the forms ω, ω², . . . , ωⁿ is
preserved by a Hamiltonian flow. In particular, every Hamiltonian flow preserves
the volume form ωⁿ.
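As a numerical sketch of Liouville's theorem (not from the text; the pendulum Hamiltonian here is my choice), one can estimate the Jacobian of the time-1 flow map by finite differences and check that its determinant is 1, so the flow preserves the area form ω = dp ∧ dq:

```python
import numpy as np

def flow(x0, t=1.0, steps=2000):
    """Time-t flow of the pendulum H(q, p) = p**2/2 - cos(q), via RK4."""
    f = lambda y: np.array([y[1], -np.sin(y[0])])   # qdot = p, pdot = -sin(q)
    y, h = np.array(x0, dtype=float), t / steps
    for _ in range(steps):
        k1 = f(y); k2 = f(y + h/2*k1); k3 = f(y + h/2*k2); k4 = f(y + h*k3)
        y = y + h/6*(k1 + 2*k2 + 2*k3 + k4)
    return y

# Central-difference Jacobian of the flow map at a sample point:
x0, eps = np.array([0.7, 0.3]), 1e-6
Jac = np.column_stack([(flow(x0 + eps*e) - flow(x0 - eps*e)) / (2*eps)
                       for e in np.eye(2)])
assert abs(np.linalg.det(Jac) - 1.0) < 1e-5   # the flow is area-preserving
```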
In section 7.6 we defined a canonical transformation to be a change of coordi-
nates that preserved the Poisson structure. Given a cotangent bundle M = T ∗ Q
with canonical symplectic form ω = dp ∧ dq, we analogously define a (time-
independent) canonical transformation to be a function g : T ∗ Q → T ∗ Q which
satisfies
g ∗ (p dq) = p dq + dS. (9.11)
This definition does not make p dq an integral invariant for a canonical trans-
formation, since the condition (9.7) only holds for closed curves N . Instead, we
employ the following useful observation.
Proposition 9.8. Let g : M → M be a smooth diffeomorphism. If η is a
k-form such that (9.7) holds only for closed orientable k-submanifolds N, then
dη is an integral invariant.
Proof. Let N be an orientable (k + 1)-submanifold. Then by Stokes' theorem
we have
\[
\int_N d\eta = \int_{\partial N} \eta, \qquad \int_{g(N)} d\eta = \int_{\partial g(N)} \eta = \int_{g(\partial N)} \eta.
\]

As ∂N is closed then these two right-hand sides are equal by premise. There-
fore we conclude that (9.7) holds for the form dη, and hence dη is an integral
invariant.
Noting that d(p dq) = dp ∧ dq, we conclude:
Corollary 9.9. Canonical transformations preserve the symplectic form ω and
the volume form ω n .
In Euclidean phase space T ∗ Rn , any Hamiltonian flow is automatically a
canonical transformation. However, the converse of Proposition 9.8 is not true in
general, and so Theorem 9.6 does not imply the identity (9.11). We must assume
that M is simply connected in order for Hamiltonian flows to be canonical
transformations, since then Theorem 9.6 implies (9.11).

9.4. Poisson bracket

In section 7.3 we saw how to phrase Hamiltonian dynamics in terms of the


Poisson bracket, and in this section we will see how this notion manifests in terms
of symplectic structure. For smooth functions f, g ∈ C ∞ (M ) on a symplectic
manifold (M, ω) we define the Poisson bracket of f and g to be

{f, g} = ω(J df, J dg) = dg(J df ) = (J df )(g). (9.12)



As in (7.9) the Poisson bracket {H, f } is also the evolution of the quantity f
under the Hamiltonian flow H, since the Lie derivative of the function f along
J dH is given by

\[
\mathcal{L}_{J\,dH} f = \frac{d}{dt}\Big|_{t=0} f \circ e^{tJ\,dH} = (J\,dH)(f) = \{H, f\} \qquad (9.13)
\]

according to the definition (9.4) of the Hamiltonian flow. In particular, we again


have that f is conserved by the Hamiltonian flow of H if and only if {H, f } = 0.
Example 9.10. Let us check that the new definition (9.12) agrees with the
phase space definition (7.10) on Euclidean space T ∗ Rn = Rnq × Rnp . Using the
calculation of J from Example 9.4 we have
\[
J\,df = J\left( \frac{\partial f}{\partial q}\, dq + \frac{\partial f}{\partial p}\, dp \right) = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix} \begin{pmatrix} \partial f/\partial q \\ \partial f/\partial p \end{pmatrix} = \frac{\partial f}{\partial p} \frac{\partial}{\partial q} - \frac{\partial f}{\partial q} \frac{\partial}{\partial p}.
\]

Consequently, the definition (9.12) yields
\[
\{f, g\} = (J\,df)(g) = \frac{\partial f}{\partial p} \frac{\partial g}{\partial q} - \frac{\partial f}{\partial q} \frac{\partial g}{\partial p},
\]
which agrees with the first definition (7.10).
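The antisymmetry and Jacobi identity of this bracket can be spot-checked symbolically (a sketch, not from the text; the three test functions are arbitrary choices):

```python
import sympy as sp

q, p = sp.symbols('q p')

def pb(f, g):
    # Poisson bracket in the convention of (7.10): {f, g} = f_p g_q - f_q g_p
    return sp.diff(f, p)*sp.diff(g, q) - sp.diff(f, q)*sp.diff(g, p)

f, g, h = q**2 * p, sp.sin(q) + p**3, sp.exp(p) * q

assert sp.expand(pb(f, g) + pb(g, f)) == 0           # antisymmetry
assert sp.expand(pb(pb(f, g), h) + pb(pb(g, h), f)
                 + pb(pb(h, f), g)) == 0             # Jacobi identity (9.15)
```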


The properties listed in section 7.3 resemble those of the commutator because
the Poisson bracket at the level of functions corresponds to the commutator of
the respective Hamiltonian vector fields.
Proposition 9.11. If (M, ω) is a symplectic manifold, then the Poisson bracket
is the Hamiltonian of the commutator of the corresponding Hamiltonian vector
fields:
J d{f, g} = [J df, J dg]. (9.14)
In particular, the vector space C∞(M) is a Lie algebra under the Poisson
bracket: the Poisson bracket is bilinear, antisymmetric, and satisfies the Jacobi
identity.
Proof. In fact, it suffices to prove that the Poisson bracket satisfies the Jacobi
identity
{{f, g}, h} + {{g, h}, f } + {{h, f }, g} = 0 (9.15)
for arbitrary smooth functions f, g, h ∈ C∞(M). This is sufficient because we
can use the Jacobi identity and the antisymmetry of the bracket to write
\[
\{\{f, g\}, h\} = \{f, \{g, h\}\} - \{g, \{f, h\}\},
\]

and recognizing the Poisson bracket as a Lie derivative via (9.13) we conclude

LJ d{f,g} h = LJ df LJ dg h − LJ dg LJ df h = L[J df,J dg] h.

As h ∈ C ∞ (M ) is arbitrary, the identity (9.14) must hold.



It remains to demonstrate the Jacobi identity (9.15). The left-hand side
of (9.15) is a combination of terms which each contain at least one second-order
derivative. Isolating the terms containing second-order derivatives of f and
recognizing the Poisson bracket as a Lie derivative via (9.13), we have
\[
\{\{f, g\}, h\} + \{\{h, f\}, g\} = \{h, \{g, f\}\} - \{g, \{h, f\}\} = \mathcal{L}_{J\,dh} \mathcal{L}_{J\,dg} f - \mathcal{L}_{J\,dg} \mathcal{L}_{J\,dh} f.
\]

However, we know LJ dh LJ dg − LJ dg LJ dh = L[J dh,J dg] is only a first-order
differential operator, and so the terms with second-order derivatives of f vanish.
Lastly, since the left-hand side of (9.15) is cyclically symmetric in f, g, and h,
we conclude that it contains no second-order derivatives at all and hence must
vanish.

9.5. Time-dependent systems

Hamilton’s equations (7.2) remain valid for time-dependent Hamiltonian sys-


tems, and yet thus far we have only allowed H to be a smooth function on phase
space M . This section is dedicated to extending the symplectic geometry de-
veloped thus far to time-dependent Hamiltonian systems.
Let (M, ω) be a 2n-dimensional symplectic manifold, and define the ex-
tended phase space M × R. Given a possibly time-dependent Hamiltonian
H, we define the Poincaré–Cartan one-form locally in terms of the canonical
coordinates (q, p) on M guaranteed by Theorem 9.3 as
\[
\tau = p\,dq - H\,dt = \sum_{i=1}^n p_i\, dq^i - H\,dt. \qquad (9.16)
\]

Note that the first term above is the tautological one-form p dq on M , the
differential of which yields the symplectic form ω. This is the form that we
insisted be preserved by a canonical transformation in (9.11), and the same
notion of canonical transformations holds on a general extended phase space
M × R in terms of the canonical coordinates (q, p, t).
The extended phase space M × R together with the Poincaré–Cartan one-
form τ do indeed define a contact manifold, but as we will see in the next chapter
it is not the natural contact extension of M since τ depends on the system’s
Hamiltonian H.
On M × R we define the extended Hamiltonian vector field

\[
Y_H = X_H + \frac{\partial}{\partial t}, \qquad (9.17)
\]
where XH (t) is the Hamiltonian vector field on M × {t} defined by (9.4). In
analogy with the second condition of (9.5), the vector field (9.17) is the unique
solution to
YH ydτ = 0. (9.18)

The flow of YH is given by


\[
\dot q^i = \frac{\partial H}{\partial p_i}, \qquad \dot p_i = -\frac{\partial H}{\partial q^i}, \qquad \dot t = 1, \qquad (9.19)
\]

which is just Hamilton’s equations (7.2) joined with the trivial equation ṫ = 1. It
follows that any smooth time-dependent function f on M × R evolves according
to
\[
\frac{df}{dt} = \{H, f\}_{p,q} + \frac{\partial f}{\partial t},
\]
as was the case in section 7.3. In particular, a time-dependent Hamiltonian H
is no longer conserved under its own flow.

Example 9.12. The authors of [Cal41, Kan48] showed that in addition to


naturally occurring time-dependent Hamiltonian systems, this framework can
also be applied to some systems which are time-independent and dissipative
by introducing artificial time dependence into the Hamiltonian. Consider the
one-dimensional Hamiltonian system

\[
H(Q, P, t) := e^{-\gamma t}\, \frac{P^2}{2m} + e^{\gamma t}\, V(Q) \qquad (9.20)
\]
on the extended phase space RQ × RP × Rt , where γ ≥ 0 is a constant and the
coordinates Q, P are related to the physical coordinates q, p by the non-canonical
transformation
P = eγt p, Q = q.
Then the equations of motion (9.19) yield

\[
m\ddot q + m\gamma \dot q + V'(q) = 0.
\]

This represents a Newtonian system with a friction force that depends linearly
on the velocity, like the damped harmonic oscillator of Example 2.9.
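A numerical sketch (not from the text; the potential V(Q) = kQ²/2 and the parameter values are my choices) confirms that integrating Hamilton's equations for (9.20) reproduces the damped oscillator in the physical coordinate q = Q:

```python
import numpy as np

m, k, gamma = 1.0, 1.0, 0.1               # assumed parameter values

def f(t, y):                              # Hamilton's equations for (9.20)
    Q, P = y
    return np.array([np.exp(-gamma*t) * P / m,    # Qdot =  dH/dP
                     -np.exp(gamma*t) * k * Q])   # Pdot = -dH/dQ

t, y, h = 0.0, np.array([1.0, 0.0]), 1e-3        # Q(0) = 1, P(0) = 0
for _ in range(5000):                             # RK4 up to t = 5
    k1 = f(t, y);        k2 = f(t + h/2, y + h/2*k1)
    k3 = f(t + h/2, y + h/2*k2); k4 = f(t + h, y + h*k3)
    y, t = y + h/6*(k1 + 2*k2 + 2*k3 + k4), t + h

# Exact underdamped solution of m*qddot + m*gamma*qdot + k*q = 0 with
# q(0) = 1, qdot(0) = 0:
w = np.sqrt(k/m - gamma**2/4)
q_exact = np.exp(-gamma*t/2) * (np.cos(w*t) + gamma/(2*w)*np.sin(w*t))
assert abs(y[0] - q_exact) < 1e-6
```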

9.6. Locally Hamiltonian vector fields

In section 9.1 we called a vector field V on a symplectic manifold (M, ω)


Hamiltonian if it is equal to J dH for some smooth function H ∈ C ∞ (M ),
and in Theorem 9.6 we saw that ω is necessarily invariant under the flow of
J dH. In fact, any smooth vector field V on M whose flow leaves ω invariant
(i.e. (etV )∗ ω = ω) is called symplectic. However, a vector field being Hamil-
tonian is a global condition—in the sense that the corresponding Hamiltonian
H must extend smoothly to all of M —while being symplectic is a pointwise
condition and is hence only local.
Consequently, a smooth vector field V on M is called locally Hamiltonian
(as opposed to globally Hamiltonian) if for each point p there exists a neigh-
borhood on which V is Hamiltonian. As an extension of Theorem 9.6, locally
Hamiltonian vector fields are exactly those which are symplectic.

Proposition 9.13. Let (M, ω) be a symplectic manifold. A smooth vector field


V on M is symplectic if and only if it is locally Hamiltonian. If M is also simply
connected, then every locally Hamiltonian vector field is globally Hamiltonian.
Proof. Differentiating the condition (etV )∗ ω = ω with respect to t at t = 0, we
see that a vector field V is symplectic if and only if LV ω = 0. From Cartan’s
magic formula we have

LV ω = d(V yω) + V ydω = d(V yω)

since ω is closed. Therefore V is symplectic if and only if V yω is closed.


If V is locally Hamiltonian, then in a neighborhood of any point there exists
a function f so that V = J df , and hence V yω = df which is certainly closed.
Conversely, if V is a symplectic vector field, then V yω is closed and hence exact
in a neighborhood of any point, and writing V yω = df we deduce that V = J df
on a neighborhood as desired.
Now suppose M is also simply connected. Then every closed one-form is exact,
and so if V yω is closed then V yω = df for a smooth function f defined on all
of M, whence V = J df.
A smooth vector field V is called an infinitesimal symmetry of the Hamil-
tonian H if both ω and H are invariant under the flow etV of V . The second
condition (etV )∗ H = H can be recast as V (H) = 0, as can be seen by differen-
tiating H ◦ etV = H with respect to t at t = 0. With this notion, we can prove
the following analogue of Proposition 4.12.
Proposition 9.14 (Noether’s theorem). Let (M, ω) be a symplectic manifold
and H a fixed Hamiltonian. If f is a conserved quantity, then the Hamiltonian
vector field J df is an infinitesimal symmetry. Conversely, if M is also simply
connected then each infinitesimal symmetry is the Hamiltonian vector field of a
conserved quantity, and the quantity is unique up to the addition of a function
that is constant on each component of M .
Proof. First suppose f is a conserved quantity. Then from the identity (9.13)
we know that {H, f } = 0. However, since the Poisson bracket is antisymmetric,
we deduce that 0 = {f, H} = (J df )(H) as well. This demonstrates that H is
conserved by J df and from Theorem 9.6 we know that ω is also conserved by
J df , and hence J df is an infinitesimal symmetry.
Now suppose that M is simply connected and V is an infinitesimal symmetry.
Then V is symplectic by definition, and by Proposition 9.13 we know V is
globally Hamiltonian. Writing V = J df, since H is invariant under the flow
of V we have 0 = (J df)(H) = {f, H} = −{H, f}, and therefore f is a conserved
quantity. If
g is another function with J dg = V = J df , then by definition of J we have
d(f − g) = (J df − J dg)yω = 0 and hence f − g is constant on the components
of M .

9.7. Exercises

9.1. Show that the following criteria for a nondegenerate two-form ω are equiv-
alent.

(a) For each x ∈ M the map Tx M → Tx∗ M which takes ξ 7→ ω(ξ, ·) is invert-
ible.
(b) For each x ∈ M and nonzero ξ ∈ Tx M there exists η ∈ Tx M such that
ω(ξ, η) ≠ 0.
(c) The local matrix representation of ω in terms of some (hence every) basis
is invertible.

9.2. Let J ij (x) denote the structure matrix of a non-degenerate Poisson bracket
as in Corollary 7.9 and let Jij (x) denote the entries of the inverse matrix. Show
that
\[
\frac{\partial J_{ab}}{\partial x^c} + \frac{\partial J_{ca}}{\partial x^b} + \frac{\partial J_{bc}}{\partial x^a} = 0
\]
for every triple of indices a, b, and c. Note that this implies that the 2-form (9.1)
is closed. (Hint: Use the Jacobi identity (7.14) for J along with the deriva-
tive (A.15) of the determinant.)
9.3. (a) Show that rotation on the two-dimensional sphere S 2 is a Hamilto-
nian flow.
(b) Show that the translation flow gt(q, p) = (q + t, p) on the torus R²/Z² is
locally Hamiltonian, but not globally Hamiltonian.
9.4. Let (M, ω) be a 2n-dimensional compact symplectic manifold.

(a) Show that the n-fold wedge product ω n is not exact.


(b) Show that the de Rham cohomology groups H^{2k}_{dR}(M) are nontrivial for
k = 1, . . . , n.
(c) Conclude that the two-dimensional sphere S 2 is the only sphere that ad-
mits a symplectic structure.

9.5 (Symplectic and complex structure). Identify Euclidean phase space R2n =
Rnq × Rnp with the complex space Cn via zj := qj + ipj .

(a) Show that matrix multiplication by matrix J (9.6) on R2n corresponds


to multiplication by −i on Cn . This is why the matrix (9.6) is called J,
and in fact some authors (e.g. [Arn89]) choose the opposite sign for the
canonical symplectic form (9.2) and use the notation I in place of J.
(b) Show that the Hermitian inner product (z, w) 7→ \sum_j z_j \bar{w}_j on Cⁿ
corresponds to (ξ, η) 7→ ξ · η + i ξ · Jη on R²ⁿ. In other words, the scalar
product and symplectic product are the real and imaginary parts of the
Hermitian inner product, respectively.

9.6. (Hamiltonian PDE [Gar71]) The Hamiltonian mechanics we have devel-


oped thus far originates from the time derivative of trajectories in phase space
and hence provides a class of ODEs on a finite dimensional manifold; however,
this can be extended to PDE when we consider the time derivative of trajec-
tories in an infinite dimensional function space. Given a smooth functional F
of, say, smooth real-valued periodic functions C ∞ (R/Z) we denote the Fréchet
derivative kernel by δF/δq(x), so that
\[
dF_q(f) = \frac{d}{ds}\Big|_{s=0} F(q + sf) = \int \frac{\delta F}{\delta q}(x)\, f(x)\, dx.
\]
The Fréchet space C ∞ (R/Z; R) will now take the role of the manifold M . Specif-
ically, for two smooth functionals F (q) and G(q) on C ∞ (R/Z) we define the
Poisson bracket
\[
\{F, G\} = \int \frac{\delta F}{\delta q}(x) \left( \frac{\delta G}{\delta q} \right)'(x)\, dx.
\]
In other words, we have replaced the dot product on the Euclidean phase space
R²ⁿ and the matrix J (9.6) with the L² real inner product and the
skew-symmetric operator J = ∂/∂x. Given a smooth functional H(q) on C∞(R/Z),
we define the associated Hamiltonian PDE
 
\[
\frac{\partial q}{\partial t} = \frac{\partial}{\partial x} \left( \frac{\delta H}{\delta q} \right)
\]
in the spirit of (9.4) and (9.13).
It turns out that many of the results of finite dimensional Hamiltonian me-
chanics have analogs in this infinite dimensional setting. For example, the global
minimum of a classical Hamiltonian is a (Liapunov) stable equilibrium, and the
global minimum of a Hamiltonian functional is orbitally stable (i.e. concentra-
tion compactness).

(a) Check that this Poisson bracket is bilinear, antisymmetric, and satisfies
the Jacobi identity.
(b) Given a Hamiltonian functional H(q), show that a smooth functional F (q)
is constant for solutions q to the PDE associated to H if and only if
{H, F } = 0 as a functional on C ∞ (R/Z).
(c) Show that for any Hamiltonian functional H(q) both the Hamiltonian H
and the mass functional
\[
M(q) := \int_0^1 q(x)\, dx
\]

are automatically conserved for solutions q to the PDE associated to H.


(d) Show that the momentum functional
\[
P(q) := \tfrac{1}{2} \int_0^1 q(x)^2\, dx
\]

does indeed generate translations, in the sense that the solution to the
associated PDE with initial data q(0, x) = f (x) is q(t, x) = f (x + t).
(e) The Korteweg–de Vries (KdV) equation is the PDE associated to the
Hamiltonian
\[
H_{\mathrm{KdV}}(q) := \int_0^1 \left( \tfrac{1}{2} q'(x)^2 - q(x)^3 \right) dx,
\]
and arises as the long-wavelength and shallow-water limit for unidirec-
tional water waves of height q from the undisturbed water level q = 0.
Show that the mass M (q), momentum P (q), and energy HKdV (q) are all
constant for solutions q to KdV. In fact, these are the first three of an
infinite hierarchy of conserved quantities for KdV.
CHAPTER 10

CONTACT GEOMETRY

Just as symplectic geometry extends the structure of conservative Hamilto-


nian dynamics on phase space, contact geometry is the natural generalization
of nonconservative dynamics on the product of phase space with the time axis.
The material for this chapter is based on [BCT17] and [Lee13, Ch. 22].

10.1. Contact structure

A contact manifold is a smooth (2n + 1)-dimensional manifold M paired


with a contact form η. A contact form η is a one-form required to satisfy the
following nondegeneracy condition: for each x ∈ M the restriction of dηx to the
subspace ker(ηx ) ⊂ Tx M is nondegenerate (i.e. dηx is a symplectic tensor for all
x ∈ M ). The rank-2n distribution N ⊂ T M satisfying Nx = ker(ηx ) for each
x ∈ M is called a contact structure on M ; it plays a fundamental role and is
sometimes taken in the literature to be the defining geometric concept instead
of η.
Similar to how the symplectic form nondegeneracy condition is equivalent to
the nonvanishing of the n-fold wedge product ωⁿ, this nondegeneracy condition
turns out to be equivalent to
\[
\eta \wedge (d\eta)^n \neq 0; \qquad (10.1)
\]

the proof of this equivalence is Exercise 10.1. Consequently, η ∧ (dη)n defines a


volume form on M , and so in particular M must be orientable. Although the
condition (10.1) is sometimes easily verifiable in practice, we will conceptually
be relying on the first condition.
Example 10.1. On the Euclidean contact space RS × T ∗ Rn = RS × Rnq × Rnp
we have the canonical contact form
\[
\eta = dS - p\,dq = dS - \sum_{i=1}^n p_i\, dq^i. \qquad (10.2)
\]

Note that this is the combination of dS and the tautological one-form (9.3) on
T*Rⁿ. A straightforward computation shows that η ∧ (dη)ⁿ = dS ∧ dq ∧ dp is
the Euclidean volume form on R_S × T*Rⁿ, but we can also check that dη is a


symplectic tensor. Note that dη = −dp ∧ dq, and so the rank-2n distribution
N ⊂ T R2n+1 annihilated by η is spanned by the vector fields
\[
X_i = \frac{\partial}{\partial q^i} + p_i \frac{\partial}{\partial S}, \qquad Y_i = \frac{\partial}{\partial p_i}
\]
for i = 1, . . . , n. Moreover, we have

dη(Xi , Xj ) = 0, dη(Yi , Yj ) = 0, dη(Xi , Yj ) = δij

for i, j = 1, . . . , n, and it follows that dη|N is nondegenerate as in Example 9.2.

This example is again fundamental in the following sense.


Theorem 10.2 (Contact Darboux theorem). If (M, η) is a (2n+1)-dimensional
contact manifold, then for each x ∈ M there exist local coordinates (q, p, S)
centered at x with respect to which η has the representation (10.2).

See [Lee13, Th. 22.31] for a proof.


The contact structure automatically induces an associated vector field called
the Reeb field, which heuristically points orthogonally to the distribution N and
plays the role of S-axis.
Proposition 10.3 (The Reeb field). If (M, η) is a contact manifold, then there
exists a unique smooth vector field ξ on M called the Reeb field satisfying

ξydη = 0, η(ξ) = 1. (10.3)

Proof. The map Φ which takes X 7→ Xydη defines a smooth bundle homo-
morphism Φ : T M → T ∗ M , and for each x ∈ M it reduces to a linear map
Φx : Tx M → Tx∗ M . As dηx restricted to the subspace Nx is nondegenerate by
definition, then Φx |Nx is injective and hence Φx has rank at least 2n. On the
other hand, we know that Φx cannot have rank 2n + 1 because then dηx would
be nondegenerate and contradict that Tx M is odd-dimensional. Therefore, we
conclude that ker Φx is one-dimensional. Moreover, since ker(Φx ) is not con-
tained in Nx = ker(ηx ) by definition, we know there exists a unique ξ ∈ ker(Φx )
with ηx (ξx ) = 1; these correspond to the two conditions (10.3) respectively.
The smoothness of ξ follows from the smoothness of η. Note that ker Φ ⊂
T M is a smooth rank-one subbundle, and so around any x ∈ M we can pick
a smooth nonvanishing section X of ker Φ near x. As η(X) 6= 0, then we can
write ξ = η(X)−1 X as a composition of smooth maps near x.
Example 10.4. For the Euclidean contact space of Example 10.1, we see that
the Reeb field is
\[
\xi = \frac{\partial}{\partial S},
\]
as the two conditions (10.3) are easily verified.
CHAPTER 10. CONTACT GEOMETRY 176

10.2. Hamiltonian vector fields

Given a smooth function H on a contact manifold (M, η), the associated


contact Hamiltonian vector field XH is uniquely determined by the two
conditions
η(XH ) = H, (XH ydη)|N = −dH|N . (10.4)
As dη|N is nondegenerate by definition, there is a unique vector field Y on N
satisfying the second condition of (10.4). The vector field XH := Y + Hξ is
then the unique solution of the conditions (10.4).
In comparison to symplectic Hamiltonian vector fields, the first condition
of (10.4) looks like the primitive of ω(XH ) = dH (which we wrote as XH =
J dH) and the second condition is like XH yω = dH; in the symplectic case
these conditions were redundant (cf. (9.5)), but now we need both in order to
determine XH on and off of the kernel Nx of ηx .
Example 10.5. Let us see what the contact Hamiltonian vector field XH looks
like for the Euclidean contact space of Example 10.1 (and hence also the ex-
pression of XH in the local coordinates guaranteed by Theorem 10.2). Given a
smooth function H(q, p, S) it is easily verified that
\[
X_H = \left( \sum_{i=1}^n p_i \frac{\partial H}{\partial p_i} - H \right) \frac{\partial}{\partial S} + \sum_{i=1}^n \left[ \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q^i} - \left( \frac{\partial H}{\partial q^i} + p_i \frac{\partial H}{\partial S} \right) \frac{\partial}{\partial p_i} \right] \qquad (10.5)
\]

satisfies the two conditions (10.4), from which we obtain the differential equation
system
n
dS X ∂H dq i ∂H dpi ∂H ∂H
= pi i − H, = , = − i − pi . (10.6)
dt i=1
∂p dt ∂pi dt ∂q ∂S

When H is independent of S, the last two sets of equations reduce to
Hamilton's equations of motion (7.2). We also recognize the quantity S as the
action S(q, t); indeed, if we use the formula (6.1) for the momentum and the
Hamilton–Jacobi equation (6.4), we see that the time derivative of the action is
\[
\frac{dS}{dt} = \sum_{i=1}^n \frac{\partial S}{\partial q^i} \frac{dq^i}{dt} + \frac{\partial S}{\partial t} = \sum_{i=1}^n p_i \frac{dq^i}{dt} - H.
\]

To conclude the section, let us consider a one-dimensional physical example


to illustrate how contact geometry encapsulates time-independent
nonconservative dynamics.
Example 10.6. Take n = 1 in the Euclidean contact space of Example 10.1,
and consider the Hamiltonian
\[
H(q, p, S) = \frac{p^2}{2m} + V(q) + \gamma S \qquad (10.7)
\]

where γ is a real constant. The contact Hamilton’s equations (10.6) read

\[
\dot q = \frac{p}{m}, \qquad \dot p = -V'(q) - \gamma p, \qquad \dot S = \frac{p^2}{2m} - V(q) - \gamma S.
\]
This represents a Newtonian system with a friction force that depends linearly
on the velocity, like the damped harmonic oscillator of Example 2.9. Note that
as opposed to Example 9.12, in this approach the momentum coordinate still
coincides with the physical momentum defined via the velocity.
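As a sketch (not from the text; V(q) = q²/2 and the parameter values are my choices), one can integrate these contact equations and verify the exponential decay H(t) = H(0)e^{−γt} that follows from (10.9) when ∂H/∂S = γ:

```python
import numpy as np

m, gamma = 1.0, 0.2                        # assumed parameters
V, dV = (lambda q: q**2 / 2), (lambda q: q)
H = lambda y: y[1]**2/(2*m) + V(y[0]) + gamma*y[2]   # H = p^2/2m + V + gamma*S

def f(y):                                  # contact Hamilton's equations, n = 1
    q, p, S = y
    return np.array([p/m,                              # qdot
                     -dV(q) - gamma*p,                 # pdot
                     p**2/(2*m) - V(q) - gamma*S])     # Sdot

y, h = np.array([1.0, 0.0, 0.0]), 1e-3     # q(0)=1, p(0)=0, S(0)=0
H0 = H(y)
for _ in range(5000):                      # RK4 up to t = 5
    k1 = f(y); k2 = f(y + h/2*k1); k3 = f(y + h/2*k2); k4 = f(y + h*k3)
    y = y + h/6*(k1 + 2*k2 + 2*k3 + k4)

assert abs(H(y) - H0 * np.exp(-gamma * 5.0)) < 1e-6   # H(t) = H(0) e^{-gamma t}
```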

10.3. Dynamics

Using the expression (10.5) of a contact Hamiltonian vector field XH in


terms of the local coordinates (q, p, S) of Theorem 10.2, we see that a smooth
function F (q, p, S) on a contact manifold (M, η) evolves according to
\[
\begin{aligned}
\frac{dF}{dt} = X_H(F) &= -H \frac{\partial F}{\partial S} + \sum_{i=1}^n p_i \left( \frac{\partial H}{\partial p_i} \frac{\partial F}{\partial S} - \frac{\partial H}{\partial S} \frac{\partial F}{\partial p_i} \right) + \sum_{i=1}^n \left( \frac{\partial H}{\partial p_i} \frac{\partial F}{\partial q^i} - \frac{\partial H}{\partial q^i} \frac{\partial F}{\partial p_i} \right) \\
&= -H \frac{\partial F}{\partial S} + \sum_{i=1}^n p_i \{H, F\}_{p_i,S} + \{H, F\}_{p,q}. \qquad (10.8)
\end{aligned}
\]
The last term above we recognize from (7.9) and the first two terms are cor-
rections introduced by the contact structure. The notation {H, F }pi ,S should
be interpreted as a convenient shorthand defined via the above formula with no
deeper meaning.
Taking F = H(q, p, S) to be the Hamiltonian for the vector field XH in the
formula (10.8) above, we see that the Hamiltonian evolves according to
\[
\frac{dH}{dt} = -H \frac{\partial H}{\partial S}. \qquad (10.9)
\]
From this we see that H is a conserved quantity if and only if H is independent
of S or H = 0. In particular, for a conservative Hamiltonian H ≡ H(q, p) we
recover the conservation of the Hamiltonian. However, in general the rate of
decrease of the Hamiltonian is proportional to the system’s energy H and its
dissipation ∂H/∂S.
Specifically, let us consider a Hamiltonian of the form

H(q, p, S) = H0 (q, p) + f (S) (10.10)

where H0 (q, p) is the mechanical energy of the system (e.g. H0 (q, p) = p2 /2m +
V (q)), then according to the formula (10.8) the mechanical energy obeys
\[
\frac{dH_0}{dt} = -\sum_{i=1}^n p_i \frac{\partial H_0}{\partial p_i}\, f'(S). \qquad (10.11)
\]

We interpret this as saying that f (S) is the potential for the system’s dissipative
force. Moreover, the evolution (10.9) of the Hamiltonian can be integrated to
obtain
\[
H(t) = H(0) \exp\left( -\int_0^t f'(S(s))\, ds \right).
\]

Plugging in the Hamiltonian (10.10), we obtain an implicit equation which


in principle determines the action S = S(q, p, t); thus the equations of mo-
tion (10.6) reduce to the 2n equations for the positions q i and momenta pi .
Example 10.7. If we take n = 1 and the dissipative potential f(S) = γS to be
linear as in Example 10.6, then equation (10.11) becomes
\[
\dot H_0 = -m\gamma \dot q^2,
\]

which agrees with what we found for the damped harmonic oscillator of Ex-
ample 2.9. The energy of this system decays exponentially to zero, according
to
H(t) = H(0)e−γt .
Solving for the action S we obtain

\[
S(q, p, t) = \frac{1}{\gamma} \left( H(0) e^{-\gamma t} - \frac{p^2}{2m} - V(q) \right).
\]

10.4. Contact transformations

A contact transformation is a smooth transformation which leaves the


contact structure invariant. As opposed to canonical transformations, we allow
the contact form f η in the new coordinates to differ by a smooth nonvanishing
factor f ∈ C ∞ (M ) since we are interested in the contact structure N . In
terms of the canonical coordinates (q, p, S) guaranteed by Theorem 10.2, the
new coordinates (q̃, p̃, S̃) must satisfy

η̃ = dS̃ − p̃ dq̃ = f (dS − p dq) = f η. (10.12)

Writing (S̃, q̃, p̃) as functions of (q, p, S), this is equivalent to the conditions
\[
f = \frac{\partial \tilde S}{\partial S} - \sum_{j=1}^n \tilde p_j \frac{\partial \tilde q^j}{\partial S}, \qquad
-f p_i = \frac{\partial \tilde S}{\partial q^i} - \sum_{j=1}^n \tilde p_j \frac{\partial \tilde q^j}{\partial q^i}, \qquad
0 = \frac{\partial \tilde S}{\partial p_i} - \sum_{j=1}^n \tilde p_j \frac{\partial \tilde q^j}{\partial p_i}
\]
for i = 1, . . . , n. Note that canonical transformations are independent of S, S̃
and are defined by the condition (10.12) with f ≡ 1.
and are defined by the condition (10.12) with f ≡ 1.
As we allow for a conformal factor f in the definition (10.12), the volume
form is also rescaled. Indeed, if η̃ = f η then dη̃ = df ∧ η + f dη, and so
\[
\tilde\eta \wedge (d\tilde\eta)^n = f^{n+1}\, \eta \wedge (d\eta)^n
\]



is the new volume form. In the case of canonical transformations we have f ≡ 1,


and hence we recover volume preservation.
As with canonical transformations in section 7.6, we may consider a contact transformation as being generated by a generating function. For an example, assume that the coordinates $(q, \tilde q, S)$ are independent. Then the differential of the generating function $\tilde S = \tilde S(q, \tilde q, S)$ may be written
$$d\tilde S = \frac{\partial\tilde S}{\partial S}\,dS + \sum_{i=1}^{n}\frac{\partial\tilde S}{\partial q^i}\,dq^i + \sum_{i=1}^{n}\frac{\partial\tilde S}{\partial\tilde q^i}\,d\tilde q^i.$$

Plugging this into (10.12), we see that the remaining coordinates are determined in terms of $\tilde S$ by
$$f = \frac{\partial\tilde S}{\partial S}, \qquad fp_i = -\frac{\partial\tilde S}{\partial q^i}, \qquad \tilde p_i = \frac{\partial\tilde S}{\partial\tilde q^i}.$$

Taking f ≡ 1, we see that S̃ is related to the canonical transformation generating


function F (q, q̃) via
S̃ = S − F (q, q̃).
In particular, we conclude that all physicist’s time-independent canonical trans-
formations (cf. the remark of section 7.6) are a special case of contact transfor-
mations.
Recall that for symplectic structures the Hamiltonian dynamics generate instantaneous canonical transformations by Proposition 9.13. In analogy with symplectic vector fields, we will call a smooth vector field $V$ on $M$ a contact vector field if its flow $e^{tV}$ preserves the contact structure $N$, in that
$$d\big(e^{tV}\big)_x(N_x) = N_{e^{tV}x} \tag{10.13}$$
for all $t \in \mathbb{R}$ and $x \in M$ in the domain of definition of the flow $e^{tV}$.


Proposition 10.8. Let (M, η) be a contact manifold. A smooth vector field V
on M is a contact vector field if and only if it is a contact Hamiltonian vector
field.
Proof. From the condition (10.13) and the definition of the Lie derivative, we
see that V is a contact vector field if and only if LV η = 0 on N .
First assume that V is a contact Hamiltonian vector field, and write V =
XH for a Hamiltonian H. Then from Cartan’s magic formula and the first
condition (10.4) defining XH we have

LXH η = d [η(XH )] + XH ydη = dH + XH ydη.

From the second condition of (10.4) we know that XH ydη is equal to −dH on
N . By the definition (10.3) of the Reeb field it then follows that
∂H
LXH η = −ξ(H)η = − η, (10.14)
∂S

where the last equality is merely the expression in terms of the local canonical co-
ordinates. Comparing this to the definition (10.12) of a contact transformation,
we see that the flow is a contact transformation with f = −ξ(H) = −∂H/∂S.
In particular, if we restrict to the contact structure N , we have η = 0 and hence
LXH η = 0 as desired.
Conversely, assume that $V$ is a contact vector field. Then Cartan's magic formula reads
$$0 = (\mathcal L_V\eta)|_N = \big(d[\eta(V)]\big)\big|_N + (V\lrcorner\, d\eta)|_N.$$
Consider the smooth function H = η(V ), defined so that the first condition of
the definition (10.4) holds. Then we obtain the second condition of (10.4) from
the above equality, and so we conclude that V = XH is the contact Hamiltonian
vector field for H.
Heuristically, we do not expect the volume form to be preserved by a general contact Hamiltonian flow, since contact dynamics includes dissipative systems. Using (10.14) we see that the volume form evolves according to
$$\mathcal L_{X_H}\big[\eta\wedge(d\eta)^n\big] = (\mathcal L_{X_H}\eta)\wedge(d\eta)^n + \sum_{i=0}^{n-1}\eta\wedge(d\eta)^i\wedge\big[d(\mathcal L_{X_H}\eta)\big]\wedge(d\eta)^{n-1-i} = -(n+1)\frac{\partial H}{\partial S}\,\eta\wedge(d\eta)^n.$$
∂S
This illustrates the connection between the Hamiltonian's $S$-dependence and the system's dissipation; consequently, systems for which $\partial H/\partial S$ is nonvanishing are called dissipative.
Instead, we have a variant of Liouville’s theorem due to [BT15], in which a
rescaled volume form is preserved away from the zero set H −1 (0).

Proposition 10.9 (Canonical measure for dissipative contact systems). Let


(M, η) be a (2n + 1)-dimensional contact manifold and H a smooth function on
M . Then the volume form

|H|−(n+1) η ∧ (dη)n

is an invariant measure for the contact Hamiltonian flow for H along orbits
outside of H −1 (0). Moreover, up to scalar multiplication it is the unique such
measure whose density with respect to the standard volume form depends only
on H.
Proof. For a smooth function $\rho$ on $M$, a computation using (10.14) shows that
$$\mathcal L_{X_H}\big[\rho\,\eta\wedge(d\eta)^n\big] = (\mathcal L_{X_H}\rho)\,\eta\wedge(d\eta)^n - (n+1)\frac{\partial H}{\partial S}\,\rho\,\eta\wedge(d\eta)^n = \left[X_H(\rho) - (n+1)\frac{\partial H}{\partial S}\,\rho\right]\eta\wedge(d\eta)^n.$$

If we assume $\rho = \rho(H)$ then
$$X_H(\rho) = -H\rho'(H)\frac{\partial H}{\partial S},$$
and so the vanishing of $\mathcal L_{X_H}[\rho\,\eta\wedge(d\eta)^n]$ occurs exactly when $\rho$ solves
$$\rho'(H) = -(n+1)H^{-1}\rho.$$
This equation has the solution $\rho(H) = |H|^{-(n+1)}$, and it is unique up to scalar multiplication.
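For $n = 1$ the invariance of this measure can be verified symbolically: in the canonical coordinates $(q, p, S)$ the standard volume is $dq\,dp\,dS$, and invariance amounts to $\rho X_H$ being divergence-free. The following SymPy sketch (an added check, assuming a dissipative Hamiltonian of the form $H = p^2/2m + V(q) + \gamma S$ on a region where $H > 0$) confirms this for $\rho = H^{-2}$.

```python
import sympy as sp

# Sketch (n = 1): check that rho = |H|^{-(n+1)} = H^{-2} makes rho * X_H
# divergence-free with respect to the volume dq dp dS, on a region where
# H > 0.  We assume H = p^2/(2m) + V(q) + gamma*S with V an arbitrary
# smooth potential.
q, p, S, m, g = sp.symbols('q p S m gamma', positive=True)
V = sp.Function('V')(q)

H = p**2 / (2*m) + V + g*S
X = (sp.diff(H, p),                          # dq/dt
     -sp.diff(H, q) - p*sp.diff(H, S),       # dp/dt
     p*sp.diff(H, p) - H)                    # dS/dt
rho = H**(-2)

div = sum(sp.diff(rho * Xi, x) for Xi, x in zip(X, (q, p, S)))
assert sp.simplify(div) == 0
```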

10.5. Time-dependent systems

Thus far we have allowed H to be a function on the contact manifold M ,


and hence have only considered time-independent dissipative systems. In this
section, we present the extension introduced in [BCT17] of contact Hamiltonian
systems to include time-dependence.
For $(M, \eta)$ a $(2n+1)$-dimensional contact manifold, we define the extended manifold $M \times \mathbb{R}$. In analogy with the Poincaré–Cartan one-form (9.16), given a possibly time-dependent Hamiltonian $H$ we extend the contact form to
$$\theta = dS - p\,dq + H\,dt = dS - \sum_{i=1}^n p_i\,dq^i + H\,dt \tag{10.15}$$

in terms of the canonical coordinates $(q, p, S)$ on $M$ guaranteed by Theorem 10.2. On $M \times \mathbb{R}$ we define the extended contact Hamiltonian vector field
$$Y_H = X_H + \frac{\partial}{\partial t}.$$
In place of the conditions (10.4), it can be checked that this vector field is uniquely determined by
$$\theta(Y_H) = 0, \qquad Y_H\lrcorner\, d\theta = -\frac{\partial H}{\partial S}\,\theta. \tag{10.16}$$
Here, the first condition is analogous to how (9.18) replaced (9.5) for time-
dependent symplectic systems, and the second condition is the analog of (10.14)
(which serves as a rephrasing for the second condition of (10.4) that does not
involve N ).
The flow of $Y_H$ is given by
$$\frac{dS}{dt} = \sum_{i=1}^n p_i\frac{\partial H}{\partial p_i} - H, \qquad \frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i} - p_i\frac{\partial H}{\partial S}, \qquad \dot t = 1,$$

which are the old equations of motion (10.6) joined with the trivial equation $\dot t = 1$. It follows that any smooth time-dependent function $F$ on $M \times \mathbb{R}$ evolves according to
$$\frac{dF}{dt} = -H\frac{\partial F}{\partial S} + \sum_{i=1}^n p_i\,\{H,F\}_{p_i,S} + \{H,F\}_{p,q} + \frac{\partial F}{\partial t}, \tag{10.17}$$

using the notation of eq. (10.8). In particular, we see that under its own flow
the Hamiltonian now changes according to both its dissipation ∂H/∂S and its
time-dependence.
Lastly, let us extend the notion of contact transformations to our extended
manifold M ×R. In terms of canonical coordinates, a time-dependent contact
transformation (q, p, S, t) 7→ (q̃, p̃, S̃, t̃) must satisfy

θ̃ = dS̃ − p̃ dq̃ + K dt̃ = f (dS − p dq + H dt) = f θ (10.18)

for a smooth nonvanishing factor $f \in C^\infty(M \times \mathbb{R})$ and a new Hamiltonian $K \in C^\infty(M \times \mathbb{R})$. Expanding $d\tilde q$ and $d\tilde S$, we see that the new Hamiltonian must satisfy
$$fH = \frac{\partial\tilde S}{\partial t} - \sum_{i=1}^n\tilde p_i\frac{\partial\tilde q^i}{\partial t} + K.$$
As before, we may consider a contact transformation as being generated
by a generating function. For example, let us assume that the coordinates
(q, q̃, S, t) are independent. After substituting the differential of the generating
function S̃ = S̃(q, q̃, S, t) into (10.18), we see that the remaining coordinates are
determined in terms of S̃ by

$$f = \frac{\partial\tilde S}{\partial S}, \qquad fp_i = -\frac{\partial\tilde S}{\partial q^i}, \qquad \tilde p_i = \frac{\partial\tilde S}{\partial\tilde q^i}, \qquad fH = \frac{\partial\tilde S}{\partial t} + K. \tag{10.19}$$
∂S ∂q i ∂ q̃ i ∂t
The first three conditions are unchanged and the last condition defines the new
Hamiltonian K = K(q, q̃, S, t). Taking f ≡ 1, we see that S̃ is related to the
canonical transformation generating function F (q, q̃, t) via

S̃ = S − F (q, q̃, t).

However, now there is an additional constraint on $\tilde S$ imposed by the invariance of the second condition of (10.16). After the transformation we must have
$$Y_H\lrcorner\, d\tilde\theta = -\frac{\partial K}{\partial\tilde S}\,\tilde\theta.$$
Using $\tilde\theta = f\theta$, $\tilde Y_H = Y_H$, and the extended contact Hamiltonian vector field conditions (10.16), this yields
$$f\frac{\partial K}{\partial\tilde S} = f\frac{\partial H}{\partial S} + df(Y_H).$$
In the special case f ≡ 1 we note that if H is independent of S then K = 0
is a solution, in which case the last condition of (10.19) becomes the familiar
Hamilton–Jacobi equation (6.4). However, in general f may be S-dependent and
so the notion of contact transformations is strictly more general than even the
physicist’s notion of canonical transformations (cf. the remark of section 7.6).

Example 10.10. In Example 9.12 we introduced a non-canonical coordinate transformation $(q, p) \mapsto (q, e^{\gamma t}p)$ to describe a dissipative system using time-dependent Hamiltonian dynamics. However, it is easily checked that the transformation
$$\tilde q = q, \qquad \tilde p = e^{\gamma t}p, \qquad \tilde S = e^{\gamma t}S, \qquad \tilde t = t$$
satisfies eq. (10.18) and is hence a time-dependent contact transformation. Here, the conformal factor is $f = e^{\gamma t}$, and the Hamiltonians are expressed in their respective coordinates: $H$ is given by eq. (10.7) and $K$ is given by eq. (9.20).

10.6. Exercises

10.1. Show that a smooth one-form η on a (2n + 1)-dimensional manifold M


satisfies the nondegeneracy condition of a contact form if and only if it satisfies
the nonvanishing top form condition (10.1).
10.2 (Contact structure on $S^{2n+1}$). On the Euclidean space $\mathbb{R}^{2n+2}$ consider the coordinates $(x^1, \dots, x^{n+1}, y^1, \dots, y^{n+1})$ and define the one-form
$$\theta := \sum_{i=1}^{n+1}\big( x^i\,dy^i - y^i\,dx^i \big).$$
The standard contact form on the sphere $S^{2n+1}$ is $\eta := \iota^*\theta$, where $\iota : S^{2n+1} \hookrightarrow \mathbb{R}^{2n+2}$ is the inclusion map.

(a) Show that the vector fields
$$V = \sum_{i=1}^{n+1}\left( x^i\frac{\partial}{\partial x^i} + y^i\frac{\partial}{\partial y^i} \right), \qquad W = \sum_{i=1}^{n+1}\left( x^i\frac{\partial}{\partial y^i} - y^i\frac{\partial}{\partial x^i} \right)$$
satisfy $V\lrcorner\, d\theta = 2\theta$ and $W\lrcorner\, d\theta = -d\big(x^2 + y^2\big)$.

(b) Let $\mathcal S \subset T(\mathbb{R}^{2n+2}\smallsetminus\{0\})$ denote the subbundle spanned by $V$ and $W$, and let
$$\mathcal S^\perp = \bigcup_{p\in S^{2n+1}}\big\{ X \in T_p\mathbb{R}^{2n+2} : d\theta_p(V_p, X_p) = d\theta_p(W_p, X_p) = 0 \big\}$$
denote its symplectic complement. Show that $\eta$ is indeed a contact form with respect to the contact structure $\mathcal S^\perp$.
(c) Show that the corresponding Reeb field is given by $W$ restricted to $S^{2n+1}$.

10.3 (Solving the damped parametric oscillator via expanding coordinates). Consider the one-dimensional damped parametric oscillator
$$H = \frac{p^2}{2m} + \frac{m}{2}\omega^2(t)q^2 + \gamma S$$

with time-dependent frequency $\omega(t)$ and damping parameter $\gamma$. Show that the new expanding coordinates
$$\tilde q = e^{\gamma t/2}q, \qquad \tilde p = e^{\gamma t/2}\big(p + \tfrac12 m\gamma q\big), \qquad \tilde S = e^{\gamma t}\big(S + \tfrac14 m\gamma q^2\big), \qquad \tilde t = t$$
define a contact transformation, with respect to which the Hamiltonian takes the form
$$K = \frac{\tilde p^2}{2m} + \frac{m}{2}\big(\omega^2(t) - \tfrac14\gamma^2\big)\tilde q^2.$$
This new Hamiltonian $K$ corresponds to an undamped parametric oscillator with the new frequency $\sqrt{\omega^2(t) - \tfrac14\gamma^2}$. The undamped oscillator has been extensively studied, and solutions for the equations of motion can be obtained. In the harmonic oscillator case $\omega(t) \equiv \omega_0$ the Hamiltonian $K$ is a conserved quantity, and the coordinates expand exponentially in time so that the trajectories form closed orbits at a slower frequency.
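The computation requested in Exercise 10.3 can be automated by comparing the coefficients of $dq$, $dp$, $dS$, $dt$ on both sides of the time-dependent contact condition (10.18). The following SymPy sketch (an added check, not part of the exercise; $\omega$ is treated as a constant symbol, which suffices because no derivative of $\omega$ enters the computation) verifies $\tilde\theta = e^{\gamma t}\theta$.

```python
import sympy as sp

# Sketch: verify Exercise 10.3 by comparing coefficients of dq, dp, dS, dt
# in theta~ = dS~ - p~ dq~ + K dt against f*(dS - p dq + H dt), f = e^{gamma t}.
q, p, S, t, m, g, w = sp.symbols('q p S t m gamma omega', positive=True)

H = p**2/(2*m) + sp.Rational(1, 2)*m*w**2*q**2 + g*S
qt = sp.exp(g*t/2) * q
pt = sp.exp(g*t/2) * (p + sp.Rational(1, 2)*m*g*q)
St = sp.exp(g*t) * (S + sp.Rational(1, 4)*m*g*q**2)
K = pt**2/(2*m) + sp.Rational(1, 2)*m*(w**2 - g**2/4)*qt**2

f = sp.exp(g*t)
for x in (q, p, S, t):
    # coefficient of dx on each side of (10.18)
    lhs = sp.diff(St, x) - pt*sp.diff(qt, x) + K*(1 if x == t else 0)
    rhs = f*((1 if x == S else 0) - p*(1 if x == q else 0) + H*(1 if x == t else 0))
    assert sp.simplify(lhs - rhs) == 0
```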
10.4 (Solving the damped parametric oscillator via conserved quantities). Consider the one-dimensional damped parametric oscillator
$$H = \frac{p^2}{2m} + \frac{m}{2}\omega^2(t)q^2 + \gamma S$$
with time-dependent frequency $\omega(t)$ and damping parameter $\gamma$.

(a) We seek a solution $F$ to eq. (10.17) with vanishing left-hand side. Substituting the quadratic ansatz
$$F(q,p,S,t) := \beta(t)p^2 - 2\xi(t)qp + \eta(t)q^2 + \zeta(t)S,$$
obtain a system of first-order equations for $\beta$, $\eta$, $\xi$, and $\zeta$, where the $\zeta$ equation has solution $\zeta(t) = \zeta_0 e^{\gamma t}$.
(b) Using the substitution $\beta(t) := \frac{1}{2m}e^{\gamma t}\alpha^2(t)$, show that the $\beta$ equation is solved if and only if $\alpha$ satisfies the Ermakov equation
$$\ddot\alpha + \big[\omega^2(t) - \tfrac14\gamma^2\big]\alpha = \alpha^{-3},$$
and the remaining equations become
$$\eta(t) = \tfrac12 m e^{\gamma t}\big\{[\dot\alpha - \tfrac12\gamma\alpha]^2 + \alpha^{-2}\big\}, \qquad \xi(t) = \tfrac12 e^{\gamma t}\alpha\big[\dot\alpha - \tfrac12\gamma\alpha\big] + \tfrac14\zeta_0 e^{\gamma t}.$$

(c) Conclude that the quantity $F(q,p,S,t) = I(q,p,t) + \zeta_0 G(q,p,S,t)$ is conserved, with
$$I = \frac{m}{2}e^{\gamma t}\left\{\left[\alpha\frac{p}{m} - \Big(\dot\alpha - \frac{\gamma}{2}\alpha\Big)q\right]^2 + \Big(\frac{q}{\alpha}\Big)^2\right\}, \qquad G = e^{\gamma t}\big(S - \tfrac12 qp\big),$$
and $\alpha(t)$ solves the Ermakov equation. Moreover, since $F$ is invariant for all initial conditions and $\zeta_0$ is determined solely by initial conditions, $I$ and $G$ must be separately conserved.

(d) Show that the new coordinates
$$\tilde q = \arctan\left[\big(\dot\alpha - \tfrac12\gamma\alpha\big)\alpha - \alpha^2\frac{p}{mq}\right], \qquad \tilde p = I(q,p,t), \qquad \tilde S = G(q,p,S,t)$$
(and $\tilde t = t$) define a contact transformation, with respect to which the new Hamiltonian is simply $K = I\alpha^{-2}$. Solve the new equations of motion for $\tilde q, \tilde p, \tilde S$ and obtain the solution
$$q(t) = \sqrt{\frac{2I}{m}}\,e^{-\gamma t/2}\alpha(t)\cos\varphi(t), \qquad S(t) = Ge^{-\gamma t} + \tfrac12 q(t)p(t),$$
$$p(t) = \sqrt{2mI}\,e^{-\gamma t/2}\left[\Big(\dot\alpha - \frac{\gamma}{2}\alpha\Big)\cos\varphi(t) - \frac{1}{\alpha}\sin\varphi(t)\right], \qquad \varphi(t) = \int_{t_0}^t\frac{d\tau}{\alpha^2(\tau)}.$$
Here, $\alpha(t)$ solves the Ermakov equation and the conserved quantities $I$ and $G$ are determined by the initial conditions.
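The conservation of $I$ claimed in part (c) can also be verified symbolically: differentiate $I$ along the damped-oscillator flow $\dot q = p/m$, $\dot p = -m\omega^2 q - \gamma p$ and impose the Ermakov equation. A SymPy sketch of this check (added here, not part of the exercise):

```python
import sympy as sp

# Sketch: check that I from part (c) is conserved along the damped oscillator
# flow qdot = p/m, pdot = -m w^2 q - gamma p, given the Ermakov equation.
t, m, g, w = sp.symbols('t m gamma omega', positive=True)
q, p = sp.symbols('q p')
a = sp.Function('alpha')(t)

B = a.diff(t) - g/2*a
I = m/2 * sp.exp(g*t) * ((a*p/m - B*q)**2 + (q/a)**2)

# Total time derivative along the flow
dI = I.diff(t) + (p/m)*I.diff(q) + (-m*w**2*q - g*p)*I.diff(p)
# Impose the Ermakov equation alpha'' = alpha^{-3} - (w^2 - g^2/4) alpha
dI = dI.subs(a.diff(t, 2), a**(-3) - (w**2 - g**2/4)*a)
assert sp.simplify(sp.expand(dI)) == 0
```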
APPENDIX A

FUNDAMENTALS OF ODE THEORY

We collect some facts from introductory ODE theory in this chapter for easy
reference. The material is based on [CL55].

A.1. Picard iteration

Throughout this chapter, we will study the initial value problem (IVP)

ẋ = f (t, x(t)), x(0) = x0 , (A.1)

where $\dot g = dg/dt$ denotes a time derivative. The equation (A.1) describes the


evolution of a point x, which we will take to lie in a Banach space (X, | · |).
Although we will primarily be concerned with the case X = Rd in these notes,
this level of generality is useful as it includes some systems with infinite degrees
of freedom (e.g. Example 7.29 and section 8.6). However, this is not broad
enough to really include PDE, except for some boring examples.
Our first step will be to recast the IVP (A.1) as an integral equation:
Lemma A.1. If f : R×X → X is continuous, then the following are equivalent:

(a) (Classical solution) x : (−T, T ) → X is C 1 and solves the IVP (A.1).


(b) (Strong solution) x : (−T, T ) → X is C 0 and solves the integral equation
$$x(t) = x(0) + \int_0^t f(s, x(s))\,ds. \tag{A.2}$$

Proof. Both directions easily follow from the fundamental theorem of calculus.
First assume that (a) holds. Then both sides of the IVP (A.1) are continuous,
and so integration yields (A.2) by the fundamental theorem of calculus.
Now assume that (b) holds. Then t 7→ f (t, x(t)) is continuous, and so by
the fundamental theorem of calculus the integral equation (A.2) says that x is
differentiable with derivative f (t, x(t)).
In the case X = Rd , we can give a more general measure-theoretic version of
Lemma A.1 via the following statement of the fundamental theorem of calculus:


given $g : (-T, T) \to \mathbb{R}^d$ in $L^1$, the function $x : (-T, T) \to \mathbb{R}^d$ is absolutely continuous and solves $\dot x(t) = g(t)$ almost everywhere iff $x(t) = x(0) + \int_0^t g(s)\,ds$. The theory of absolute continuity for Banach-valued functions exists, but is quite involved.
The theory of absolute continuity for Banach-valued functions exists, but is
quite involved.
Lemma A.1 does not apply to many PDE, not even to the transport equation $\frac{\partial u}{\partial t} = \frac{\partial u}{\partial x}$, because spatial differentiation $u \mapsto \frac{\partial u}{\partial x}$ is not continuous in most Banach spaces. (There are some exceptions to this though, like the space of holomorphic functions on a strip $\{z \in \mathbb{C} : |\operatorname{Im} z| < c\}$ containing the real axis.)
Moreover, the conclusion of Lemma A.1 does not hold, and (a) and (b) yield
distinct notions of solutions. Another common notion is that of weak solutions:
an $L^\infty$ function $x : (-T, T) \to X$ that solves
$$\int_{-T}^T x(t)\psi(t)\,dt = x(0)\int_{-T}^T\psi(t)\,dt + \int_{-T}^T\psi(t)\int_0^t f(s, x(s))\,ds\,dt$$

for all ψ ∈ C0∞ ([−T, T ]). For ODEs, it turns out that this is also equivalent
provided that f is continuous, but we will not need this fact.
In trying to argue that solutions to the IVP (A.1) exist, the formulation (b) is better than that of (a). This is because integrals are stable while derivatives are highly unstable. For example, consider the Fourier series $g(t) = \sum_{k\neq 0} c_k e^{2\pi ikt}$. If the coefficients $c_k$ are absolutely summable, then this defines a periodic function $g(t)$. Roughly speaking, the rate of decay of $c_k$ corresponds to the smoothness of $g(t)$, because the character $e^{2\pi ikt}$ is rapidly oscillating for large frequencies $k$. Integration suppresses high frequencies, because it replaces the coefficients $c_k$ of $g(t)$ by the more rapidly decaying sequence $\frac{c_k}{2\pi ik}$. Conversely, differentiation amplifies high frequencies, because it replaces the coefficients by the more slowly decaying sequence $2\pi ik\,c_k$.
In addition to differential equations, there is also a theory of integral equations. The integral equation (A.2) is special in that the integration is over $[0, t]$, and such equations are said to be of Volterra type (in analogy with triangular matrices). The more general case
$$x(t) = b(t) + \int_0^1 K(t,s)x(s)\,ds$$
is called a Fredholm integral equation, of which the linear case is shown above.
The following theorem is a fundamental result on the existence of solutions:

Theorem A.2 (Picard–Lindelöf). If f : R × X → X is continuous and is


Lipschitz in x:

|f (t, x) − f (t, y)| ≤ C|x − y| for all x, y ∈ X (A.3)

for some constant C > 0, then for each x0 ∈ X there exists a unique continuous
function x : R → X so that the integral equation (A.2) holds. Moreover, x ∈
C([−T, T ] → X) depends continuously upon x0 ∈ X for all T > 0.

Theorem A.2 says that the ODE (A.1) is (globally-in-time) well-posed (in
the sense of Hadamard): solutions exist, solutions are unique, and solutions
depend continuously upon the initial data. The statement of continuous depen-
dence can take various forms; for example, it follows that given a convergent
sequence of initial data, the corresponding sequence of solutions converges uni-
formly on compact time intervals. Note that continuous dependence upon initial
data is not extremely restrictive; indeed, chaotic systems, where trajectories ex-
hibit complex geometric structure and are sensitive to small changes in initial
conditions, can still be well-posed.
We will only present one proof of Theorem A.2, but there are multiple argu-
ments that apply. Having multiple methods is particularly useful in the study
of PDEs, because different ODE proofs yield distinct PDE statements.
Proof. We will argue by Picard iteration. Recursively define the sequence
$$x_0(t) \equiv x_0, \qquad x_{n+1}(t) = x_0 + \int_0^t f(s, x_n(s))\,ds$$
of successive approximate solutions. Ultimately we will show $\{x_n\}$ is a Cauchy sequence, and we will take its limit to be our solution.
We want to show that the difference between successive approximations is shrinking. Using the Lipschitz condition (A.3), we estimate
$$|x_{n+1}(t) - x_n(t)| \le \int_0^t |f(s, x_n(s)) - f(s, x_{n-1}(s))|\,ds \le C\int_0^t |x_n(s_n) - x_{n-1}(s_n)|\,ds_n.$$
We now have the difference between the previous two approximations on the RHS. Applying this estimate iteratively to the RHS, we obtain
$$\le C^2\int_0^t\!\int_0^{s_n} |x_{n-1}(s_{n-1}) - x_{n-2}(s_{n-1})|\,ds_{n-1}\,ds_n \le \cdots \le C^n\int_{0<s_1<\cdots<s_n<t} |x_1(s_1) - x_0(s_1)|\,ds_1\,ds_2\cdots ds_n.$$

For the first two functions in our sequence, we can bound
$$|x_1(s_1) - x_0(s_1)| = \left|\int_0^{s_1} f(s_0, x_0)\,ds_0\right| \le s_1\sup_{s\in(0,t)}|f(s, x_0)|.$$
Together, this yields
$$|x_{n+1}(t) - x_n(t)| \le \sup_{s\in(0,t)}|f(s, x_0)|\,\frac{C^n t^{n+1}}{(n+1)!}.$$

The RHS is summable in $n$, and so we conclude that $\{x_n(t)\}$ is a Cauchy sequence in $C([-T,T] \to X)$. Define $x$ to be the limit of $x_n$ in $C([-T,T] \to X)$, which exists because $X$ is complete. As $x_n$ converges to $x$ in this space, we have
$$\sup_{t\in[-T,T]} |x_n(t) - x(t)| \to 0 \quad\text{as } n \to \infty$$

for all $T > 0$. To see that $x(t)$ solves the IVP (A.1), we take $n \to \infty$ in the definition
$$x_{n+1}(t) = x_0 + \int_0^t f(s, x_n(s))\,ds.$$
We have $f(s, x_n(s)) \to f(s, x(s))$ by continuity, so the integrals converge, and hence we obtain
$$x(t) = x_0 + \int_0^t f(s, x(s))\,ds.$$
Therefore $x(t)$ is a solution to the integral equation (A.2), and consequently the IVP (A.1) by Lemma A.1.
Next, we claim that the solution is unique. Suppose $x(t)$ and $\tilde x(t)$ are both solutions to the IVP (A.1) that are in $C([-T,T] \to X)$. Arguing as before, we estimate
$$|x(t) - \tilde x(t)| \le \int_0^t |f(s, x(s)) - f(s, \tilde x(s))|\,ds \le C\int_0^t |x(s_n) - \tilde x(s_n)|\,ds_n \le C^2\int_0^t\!\int_0^{s_n} |x(s_{n-1}) - \tilde x(s_{n-1})|\,ds_{n-1}\,ds_n$$
$$\le \cdots \le C^n\int_{0<s_1<\cdots<s_n<t} |x(s_1) - \tilde x(s_1)|\,ds_1\,ds_2\cdots ds_n \le \frac{C^n t^n}{n!}\sup_{s\in(-T,T)}|x(s) - \tilde x(s)|.$$

Taking a supremum over $t \in (-T, T)$, we obtain
$$\sup_{t\in(-T,T)}|x(t) - \tilde x(t)| \le \frac{(CT)^n}{n!}\sup_{s\in(-T,T)}|x(s) - \tilde x(s)|$$
for all $n$. As the LHS and RHS are finite since $x$ and $\tilde x$ are in $C([-T,T] \to X)$, we send $n \to \infty$ to conclude that $x(t) \equiv \tilde x(t)$.
Lastly, we claim that the solution depends continuously upon the initial data. Suppose $x(t)$ and $\tilde x(t)$ are both solutions to (A.1) in $C([-T,T] \to X)$ with different initial data. Arguing as before, we estimate
$$|x(t) - \tilde x(t)| \le |x(0) - \tilde x(0)| + C\int_0^t |x(s) - \tilde x(s)|\,ds \le \cdots$$
$$\le |x(0) - \tilde x(0)|\left[ 1 + Ct + \frac{C^2t^2}{2!} + \cdots + \frac{C^nt^n}{n!} \right] + \frac{C^{n+1}t^{n+1}}{(n+1)!}\sup_{s\in(-T,T)}|x(s) - \tilde x(s)|.$$

The supremum is finite and independent of $n$ since $x$ and $\tilde x$ are in $C([-T,T] \to X)$. Therefore the last term on the RHS converges to zero, and so sending $n \to \infty$ yields
$$|x(t) - \tilde x(t)| \le |x(0) - \tilde x(0)|\,e^{C|t|}.$$
Taking a supremum over $t \in [-T, T]$, we see that the map $x_0 \mapsto x(t)$ is Lipschitz continuous on any bounded time interval.
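The iteration in the proof above is also a practical numerical scheme once the integrals are discretized. The following Python sketch (an added illustration, using the trapezoid rule and the test problem $\dot x = -x$) runs Picard sweeps on a grid and compares against the exact solution $e^{-t}$.

```python
import math

# Sketch: Picard iteration on a grid for x' = -x, x(0) = 1 (exact solution
# e^{-t}), with the integral in (A.2) approximated by the trapezoid rule.
def picard(f, x0, T, steps=1000, sweeps=25):
    dt = T / steps
    ts = [i * dt for i in range(steps + 1)]
    x = [x0] * (steps + 1)            # x_0(t) = x0
    for _ in range(sweeps):
        new, acc = [x0], 0.0
        for i in range(1, steps + 1):
            acc += dt * 0.5 * (f(ts[i - 1], x[i - 1]) + f(ts[i], x[i]))
            new.append(x0 + acc)      # x_{n+1}(t) = x0 + int_0^t f(s, x_n(s)) ds
        x = new
    return ts, x

ts, x = picard(lambda t, y: -y, 1.0, 2.0)
err = max(abs(xi - math.exp(-ti)) for ti, xi in zip(ts, x))
assert err < 1e-4
```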
Mimicking the proof of continuous dependence, we can also prove the fol-
lowing useful fact:
Lemma A.3 (Grönwall's inequality). If $f : [0, T] \to [0, \infty)$ is continuous, $a : [0, T] \to [0, \infty)$ is $L^1$, and
$$f(t) \le A + \int_0^t a(s)f(s)\,ds,$$
then
$$f(t) \le A\exp\left(\int_0^t a(s)\,ds\right).$$

Many authors choose to prove Lemma A.3 first, and then cite it in the proof
of Theorem A.2.
Notice that the proof actually shows that the data-to-solution map x0 7→
x(t) is a Lipschitz function from X into C([−T, T ] → X), and that the Lipschitz
constant is bounded by eCT . In fact, it follows that for fixed t the map x0 7→ x(t)
is a bi-Lipschitz homeomorphism, because we can reconstruct x(0) from x(t)
by solving the ODE backwards in time and citing uniqueness. The Lipschitz
continuity here matches the Lipschitz continuity of f , and in general we cannot
do better.
Another common proof of Theorem A.2 is based on contraction mapping.
However, this only proves the result for T > 0 sufficiently small. The full
statement of Theorem A.2 requires our iteration argument.
Next, we would like to extend our existence result to include equations where
f is not globally Lipschitz, but instead is smooth. Note that f being smooth
does not imply that there are global solutions:

Example A.4. The equation
$$\dot x = x^2, \qquad x(0) = 1$$
has solution
$$x(t) = \frac{1}{1-t},$$
which blows up at $t = 1$.
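This blowup is also visible numerically: a forward Euler discretization (an added illustration, not part of the text) tracks the exact solution $1/(1-t)$ closely until $t$ approaches $1$, where the values grow without bound.

```python
# Sketch: forward Euler for x' = x^2, x(0) = 1 tracks the exact solution
# 1/(1-t) and grows without bound as t approaches the blowup time t = 1.
def euler(f, x0, t_end, steps):
    dt = t_end / steps
    x = x0
    for _ in range(steps):
        x += dt * f(x)
    return x

for t_end in (0.5, 0.9, 0.99):
    exact = 1.0 / (1.0 - t_end)
    approx = euler(lambda x: x * x, 1.0, t_end, 200000)
    assert abs(approx - exact) / exact < 1e-2
```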
Smoothness does guarantee local solutions however:

Theorem A.5. If f : R×X → X is continuous and is C 1 in x, then given x0 ∈


X there exists T > 0 and a unique solution x : (−T, T ) → X in C([−T, T ] → X)
to the IVP (A.1) that depends continuously upon the initial data.
Proof. As $f'$ is continuous, there exists $\delta > 0$ and $A > 0$ such that $|t| + |x - x_0| < \delta$ implies $\|f'(t, x)\| \le A$. In particular, $f$ is Lipschitz on the set $|t| + |x - x_0| < \delta$.
Let $\psi : [0, \infty) \to \mathbb{R}$ be a smooth cutoff function so that $\psi(r) \equiv 1$ for $r \in [0, \frac{\delta}{2}]$ and $\psi(r) \equiv 0$ for $r \ge \delta$. We can now apply the Picard–Lindelöf theorem (Theorem A.2) to
$$\dot x = f(t, x)\psi(t)\psi(|x - x_0|), \qquad x(0) = x_0.$$
Lastly, we choose $T > 0$ small enough to stop the solution from noticing the change in the RHS.
Lastly, we note that solutions x(t) should always be defined on open intervals,
since given one defined on a closed interval we can always extend it a bit further.

Corollary A.6 (Blowup criterion). Suppose $f : \mathbb{R} \times X \to X$ is continuous and is $C^1$ in $x$, and fix $x_0 \in X$. Then there exists a maximal interval of existence $(T_-, T_+)$ for some $-\infty \le T_- < 0 < T_+ \le \infty$ and a unique solution $x : (T_-, T_+) \to X$ to the IVP (A.1). Moreover, if $T_+$ is finite then $|x(t)| \to \infty$ as $t \uparrow T_+$, and similarly for $T_-$.

Proof. We define the maximal interval of existence to be the union of all open intervals containing $0$ on which a solution $x(t)$ exists. By Theorem A.5, this is an open and connected set and hence is indeed an interval $(T_-, T_+)$. We may then glue all of these solutions together by uniqueness to obtain a solution $x : (T_-, T_+) \to X$.
Suppose for a contradiction that $T_+ < +\infty$ and $|x(t)| \not\to \infty$ as $t \uparrow T_+$. Then there exists an increasing sequence $t_n$ converging to $T_+$ on which $|x(t_n)|$ is bounded. Together with $t = T_+$, this is a bounded set on which $f'$ is continuous and hence bounded. Arguing as in Theorem A.5, we may apply Theorem A.2 to construct a solution defined for a short time after $t = T_+$, but this contradicts the maximality of $T_+$.

A.2. Alternative approaches to well-posedness

In this section, we display some important arguments which should be in-


cluded in a study of ODEs. However, they are not strictly necessary for these
notes, and so this section can be skipped if desired.

A.2.1. Existence. In the Picard–Lindelöf theorem (Theorem A.2), the data-to-solution map is automatically Lipschitz. This is convenient for nice problems, but it limits the applicability to ODEs for which this is true. The following is an alternative method for proving existence based on compactness, and hence it only works for $X = \mathbb{R}^d$.
Theorem A.7 (Cauchy–Peano). Let $f : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$ be continuous. Then for every $x_0 \in \mathbb{R}^d$ the IVP (A.1) has at least one solution $x : (-T, T) \to \mathbb{R}^d$.
Proof. We may assume $f(t, x)$ is continuous and bounded, after replacing $f$ by a truncation as in the proof of Theorem A.5. Let $\varphi(t, x)$ be a smooth nonnegative function with $\int\varphi = 1$, and define $\varphi_\epsilon(t, x) := \epsilon^{-d-1}\varphi(\frac{t}{\epsilon}, \frac{x}{\epsilon})$. We replace $f$ by the convolution
$$f_\epsilon(t, x) := \iint \varphi_\epsilon(s, y)f(t-s, x-y)\,dy\,ds.$$
Then $f_\epsilon$ is bounded pointwise independently of $\epsilon$, and is Lipschitz for fixed $\epsilon$.


By the Picard–Lindelöf theorem (Theorem A.2), there exists a unique solution $x_\epsilon(t)$ to the modified equation
$$\dot x = f_\epsilon(t, x), \qquad x(0) = x_0.$$
Notice that
$$x_\epsilon(t) = x_0 + \int_0^t f_\epsilon(s, x_\epsilon(s))\,ds \tag{A.4}$$
is Lipschitz uniformly in $\epsilon$, since
$$|x_\epsilon(t) - x_\epsilon(s)| \le \left|\int_s^t |f_\epsilon(\tau, x_\epsilon(\tau))|\,d\tau\right| \le |t-s|\,\|f_\epsilon\|_{L^\infty}.$$

Therefore the functions $x_\epsilon : [-1, 1] \to \mathbb{R}^d$ for $\epsilon \in (0, 1]$ form a bounded (take $s = 0$ above) and equicontinuous set of functions. By the Arzelà–Ascoli theorem, there exists a sequence $\epsilon_n \to 0$ such that $x_{\epsilon_n}(t)$ converges uniformly on $[-1, 1]$ to some $x(t)$. (This step requires Euclidean space $\mathbb{R}^d$ instead of a general Banach space $X$, but still works if the image of $f$ is precompact.) Sending $\epsilon_n \to 0$ in the integral equation (A.4), we get
$$x(t) = x_0 + \int_0^t f(s, x(s))\,ds.$$
Therefore $x(t)$ solves the IVP by Lemma A.1. We then pick $T > 0$ small so that $x(t)$ solves the original IVP.

A.2.2. Uniqueness. Note that the statement of Theorem A.7 provides existence, but not uniqueness. This is good for problems where solutions are not unique. For example, the IVP
$$\dot x = 2\sqrt{|x|}, \qquad x(0) = 0$$
has solutions $x(t) \equiv 0$ and $x(t) = t^2$.
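Both claimed solutions are easy to verify symbolically; the following SymPy sketch (an added check) confirms that $x(t) \equiv 0$ and $x(t) = t^2$ each satisfy $\dot x = 2\sqrt{|x|}$ with $x(0) = 0$ for $t \ge 0$.

```python
import sympy as sp

# Sketch: verify that both x(t) = 0 and x(t) = t^2 solve x' = 2 sqrt(|x|)
# with x(0) = 0, for t >= 0.
t = sp.symbols('t', nonnegative=True)
for x in (sp.Integer(0), t**2):
    assert sp.simplify(sp.diff(x, t) - 2*sp.sqrt(sp.Abs(x))) == 0
    assert x.subs(t, 0) == 0
```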


However, for all other IVPs we would like to prove that solutions are unique.
We will now present various arguments for uniqueness.
Our first tool is Grönwall’s inequality:
Proposition A.8 (Uniqueness by Grönwall). Suppose $f : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$ is Lipschitz in $x$. Then for every $x_0 \in \mathbb{R}^d$ the IVP (A.1) has at most one solution.
Proof. Suppose $x(t)$ and $\tilde x(t)$ both solve the IVP (A.1). As $f$ is Lipschitz,
$$|x(t) - \tilde x(t)| \le C\int_0^t |x(s) - \tilde x(s)|\,ds,$$
and so by Grönwall's inequality (Lemma A.3) with $A = 0$ we have $x(t) \equiv \tilde x(t)$.


Second, we have a monotonicity argument:
Proposition A.9 (Uniqueness by monotonicity). Suppose $f : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$ satisfies
$$(x - y)\cdot[f(t,x) - f(t,y)] \le 0. \tag{A.5}$$
Then for every $x_0 \in \mathbb{R}^d$ the IVP (A.1) has at most one solution.
When $d = 1$, the condition (A.5) says that $f$ is nonincreasing in $x$.
Proof. Given two solutions $x(t)$ and $\tilde x(t)$ with the same initial data, we have
$$\frac{d}{dt}|x(t) - \tilde x(t)|^2 = 2[x(t) - \tilde x(t)]\cdot[f(t, x(t)) - f(t, \tilde x(t))] \le 0.$$
So by Grönwall's inequality (Lemma A.3) applied to $|x - \tilde x|^2$ we have $x(t) \equiv \tilde x(t)$.
Note that the proof still applies even if we only have

(x − y) · [f (t, x) − f (t, y)] ≤ C|x − y|2 .

Third, we have a barrier argument:
Proposition A.10 (Uniqueness by barrier). Suppose $f : \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}^d$ satisfies
$$|f(t,x) - f(t,y)| \le |x - y|\log\tfrac{1}{|x-y|} \quad\text{for } |x - y| \le \tfrac12. \tag{A.6}$$
Then for every $x_0 \in \mathbb{R}^d$ the IVP (A.1) has at most one solution.
Note that the assumption (A.6) on $f$ is slightly weaker than Lipschitz continuity.

Proof. Fix two solutions $x(t)$ and $\tilde x(t)$ with the same initial data. We may assume that $|x(t) - \tilde x(t)| \le \frac12$ by first restricting our attention to sufficiently small time intervals, and then patching these intervals together. This allows us to estimate
$$|x(t) - \tilde x(t)| \le \int_0^t |x(s) - \tilde x(s)|\log\tfrac{1}{|x(s) - \tilde x(s)|}\,ds.$$
Fix $A > 0$, and define the "barrier" $b(t) = \exp\{-Ae^{-t}\}$, which solves the equation
$$\dot b = b\log\tfrac1b, \qquad b(0) = e^{-A}. \tag{A.7}$$
We claim that $|x(t) - \tilde x(t)| \le b(t)$. Suppose this is false. The bound holds at $t = 0$, and so by continuity there exists a minimal time $t_0 > 0$ where it fails, i.e.
$$|x(t) - \tilde x(t)| < b(t) \quad\text{for all } t\in[0, t_0), \qquad |x(t_0) - \tilde x(t_0)| = b(t_0).$$
Then by continuity,
$$|x(t_0) - \tilde x(t_0)| \le \int_0^{t_0} |x(s) - \tilde x(s)|\log\tfrac{1}{|x(s)-\tilde x(s)|}\,ds < \int_0^{t_0} b\log\tfrac1b\,ds = b(t_0),$$
which contradicts the choice of $t_0$.


Together, we have shown that for any $A > 0$ we have $|x(t) - \tilde x(t)| \le b(t)$. Sending $A \to \infty$, we conclude that $|x(t) - \tilde x(t)| \le 0$ and hence $x(t) \equiv \tilde x(t)$.
The assumption (A.6) we must place on $f$ is dictated by the differential equation (A.7) for the barrier $b$. All we need to make this argument work, though, is that the differential equation $\dot b = g(b)$ satisfies $\int_0 \frac{db}{g(b)} = \infty$.
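For the barrier $g(b) = b\log\frac1b$ used above, this divergence condition can be checked explicitly: $-\log(\log\frac1b)$ is an antiderivative of $1/g$, and it tends to $-\infty$ as $b \downarrow 0$. A SymPy sketch of this check (added here, not part of the text):

```python
import sympy as sp

# Sketch: for g(b) = b log(1/b), the function F(b) = -log(log(1/b)) is an
# antiderivative of 1/g, and F(b) -> -oo as b -> 0+, so the integral of
# 1/g from 0 diverges.
b = sp.symbols('b', positive=True)
g = b * sp.log(1/b)
F = -sp.log(sp.log(1/b))

assert sp.simplify(sp.diff(F, b) - 1/g) == 0
assert sp.limit(F, b, 0, '+') == -sp.oo
```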
Fourth, we have Carleman's approach:
Proposition A.11 (Uniqueness by Carleman estimate). Suppose that $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is continuous and
$$|f(t,x) - f(t,y)| \le C|x - y| \quad\text{for all } x, y \in \mathbb{R}.$$
Then for every $x_0 \in \mathbb{R}$ the IVP (A.1) has at most one solution.
We begin with an inequality:
Lemma A.12 (A Carleman estimate). If $w : \mathbb{R} \to \mathbb{R}$ is $C^1$ and compactly supported, then
$$\int e^{2\lambda t}\dot w^2\,dt \ge \lambda^2\int e^{2\lambda t}w^2\,dt \quad\text{for all } \lambda \in \mathbb{R}. \tag{A.8}$$

Proof. We write
$$\int e^{2\lambda t}\dot w^2\,dt = \int\left[\frac{d}{dt}(e^{\lambda t}w) - \lambda e^{\lambda t}w\right]^2 dt = \int\left\{\left[\frac{d}{dt}(e^{\lambda t}w)\right]^2 - 2\lambda(e^{\lambda t}w)\frac{d}{dt}(e^{\lambda t}w) + \lambda^2 e^{2\lambda t}w^2\right\}dt.$$

The first term on the RHS is nonnegative, and so we can drop it to obtain an inequality. The second term is equal to $-\lambda\frac{d}{dt}[(e^{\lambda t}w)^2]$, which integrates to zero since $w$ has compact support. The inequality (A.8) follows.
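The inequality (A.8) can also be spot-checked numerically. The following Python sketch (an added illustration, using the $C^1$ bump $w(t) = (1-t^2)^2$ on $[-1,1]$ and a midpoint quadrature) tests it for several values of $\lambda$.

```python
import math

# Sketch: spot-check the Carleman inequality (A.8) with the C^1 bump
# w(t) = (1 - t^2)^2 on [-1, 1] (zero outside), by midpoint quadrature.
def w(t):
    return (1 - t*t)**2 if abs(t) < 1 else 0.0

def wdot(t):
    return -4*t*(1 - t*t) if abs(t) < 1 else 0.0

def weighted_sq_integral(fn, lam, N=20000):
    # midpoint rule for \int e^{2 lam t} fn(t)^2 dt over [-1, 1]
    dt = 2.0 / N
    total = 0.0
    for i in range(N):
        t = -1 + (i + 0.5) * dt
        total += math.exp(2*lam*t) * fn(t)**2 * dt
    return total

for lam in (-3.0, -1.0, 0.5, 2.0):
    assert weighted_sq_integral(wdot, lam) >= lam**2 * weighted_sq_integral(w, lam)
```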
Proof of Proposition A.11. Let $x(t)$ and $\tilde x(t)$ be two solutions to the IVP (A.1) with the same initial data. We may assume that $x$ and $\tilde x$ disagree in the future, after substituting $t \mapsto -t$ if necessary. We may also modify $\tilde x$ so that $\tilde x(t) \equiv x(t)$ for all $t \le 0$.
Fix $\delta > 0$, and let $\chi : \mathbb{R} \to \mathbb{R}$ be a smooth function so that $\chi(t) \equiv 1$ for $t \le \delta$ and $\chi(t) \equiv 0$ for $t \ge 2\delta$. Then the weight $w(t) = \chi(t)[\tilde x(t) - x(t)]$ is $C^1$ and compactly supported, and so the Carleman estimate (A.8) yields
$$\lambda^2\int e^{2\lambda t}\chi(t)^2[\tilde x(t) - x(t)]^2\,dt \le \int e^{2\lambda t}\big|\dot\chi(\tilde x - x) + \chi\big(f(t, \tilde x(t)) - f(t, x(t))\big)\big|^2\,dt$$
$$\le C^2\int e^{2\lambda t}\chi(t)^2|\tilde x(t) - x(t)|^2\,dt + \int e^{2\lambda t}\dot\chi(t)^2|\tilde x(t) - x(t)|^2\,dt + 2C\int e^{2\lambda t}|\dot\chi(t)\chi(t)|\,|\tilde x(t) - x(t)|^2\,dt.$$

Let $A = \sup\{|\dot\chi(t)| : t \in \mathbb{R}\}$, and note that $A \ge c\delta^{-1}$ for some $c > 0$ since $\chi$ decreases by $1$ over an interval of length $\delta$. Then
$$\lambda^2\int e^{2\lambda t}\chi(t)^2[\tilde x(t) - x(t)]^2\,dt \le C^2\int e^{2\lambda t}\chi(t)^2|\tilde x(t) - x(t)|^2\,dt + (A^2 + 2AC)\int_\delta^{2\delta} e^{2\lambda t}\,2\big(|\tilde x|^2 + |x|^2\big)\,dt.$$

The second term on the RHS is bounded by a constant times $e^{2\delta\lambda}$. Therefore, for all $\lambda \le -1$ with $|\lambda|$ sufficiently large, we have
$$\frac{\lambda^2}{4}\int e^{2\lambda t}\chi(t)^2[\tilde x(t) - x(t)]^2\,dt \le C'e^{-2\delta|\lambda|}$$

for some constant $C'$. This implies that the LHS is zero. Indeed, if $|\tilde x(t) - x(t)| \not\equiv 0$ on $(0, \delta)$, then there is a time $t_0 \in (0, \delta)$ such that
$$\frac{\lambda^2}{4}\int e^{2\lambda t}\chi(t)^2[\tilde x(t) - x(t)]^2\,dt \ge c\lambda^2 e^{-2t_0|\lambda|}$$
for some $c > 0$; since $t_0 < \delta$, this cannot be bounded by $C'e^{-2\delta|\lambda|}$ for all $\lambda \le -1$ with $|\lambda|$ large. As $\delta > 0$ was arbitrary, we conclude that $x(t) \equiv \tilde x(t)$.
In other applications, the parameter δ often needs to be fixed small for some
other reason. To accommodate this, we can pick t = 0 to be the first time after
which x(t) and x̃(t) disagree.

One application of Carleman’s argument is Laplace’s equation ∆u = 0 in


Rd . For d = 2, uniqueness can be easily proved by methods of complex analysis.
These methods do not carry over at all to d ≥ 3 however, but Carleman’s
argument does.

A.2.3. Continuous dependence. Once we have proved existence and


uniqueness, we can sometimes recover continuous dependence via a compactness
argument.

Proposition A.13. Let f : R × Rd → Rd be continuous, and suppose all


solutions to the IVP (A.1) exist, are bounded, and are unique on some interval
[0, T ]. Then the IVP (A.1) is well-posed.
Proof. It only remains to show continuous dependence. Consider a convergent
sequence of initial data ξn → ξ. We want to show that the corresponding
solutions xn (t) := x(t; ξn ) converge uniformly to x(t; ξ).
Observe that the sequence {xn (t)} is equicontinuous on [0, T ], since

|ẋn (t)| ≤ |f (t, xn (t))|;

the input xn (t) is bounded uniformly in n ∈ N and t ∈ [0, T ] and f is continuous,


and so the RHS is also uniformly bounded. Fix an arbitrary subsequence of
{xn (t)}. By the Arzelà–Ascoli theorem, there exists a further subsequence which
converges uniformly. The uniform convergence implies that the limit y(t) will
solve the integral equation
\[
y(t) = \xi + \int_0^t f(s, y(s)) \, ds,
\]

and hence will solve the IVP (A.1) with initial data ξ by Lemma A.1. Solutions
are unique by premise, and so the limit function must be x(t; ξ).
As the initial subsequence was arbitrary, we conclude that the entire se-
quence xn (t) converges to x(t; ξ).

A.3. Smooth dependence upon initial data

If f is C 1 in x, then the following proposition shows that the data-to-solution


map is also C 1 . This will be useful because we will want to be able to compute
volumes in phase space.
Proposition A.14. Let f : R × X → X be continuous, and be Lipschitz and
C 1 in x, and let x(t; x0 ) denote the unique solution to the IVP (A.1). Then
x(t; x0 ) is a C 1 function of x0 , and the derivative V (t; x0 ) = ∂x0 x(t) is the
unique solution to

\[
\dot{V}(t) = f'(t, x(t)) V(t), \qquad V(0) = \mathrm{Id}_X.
\]



Proof. The Picard–Lindelöf theorem (Theorem A.2) applies to the equation for V,
because the RHS is continuous in time and linear in V.
Let E_h(t) denote the quantity
\[
E_h(t) = \sup_{|\eta| \leq h} \left| \frac{f(t, x(t; x_0 + \eta)) - f(t, x(t; x_0))}{h} - f'(t, x(t; x_0)) \, \frac{x(t; x_0 + \eta) - x(t; x_0)}{h} \right|.
\]

Observe that |E_h(t)| ≲ 1 uniformly on [0, T], because f and x₀ ↦ x(t; x₀) are
both Lipschitz. We also know that E_h(t) → 0 as h ↓ 0 for each t, because
|x(t; x₀ + η) − x(t; x₀)| ≲ e^{C|t|}|η| and f is differentiable at x(t; x₀).
Now, for η ≠ 0 we have
\[
\left| x(t; x_0 + \eta) - x(t; x_0) - \Bigl( \eta + \int_0^t f'(s, x(s)) [x(s; x_0 + \eta) - x(s; x_0)] \, ds \Bigr) \right|
\leq |\eta| \int_0^T E_{|\eta|}(s) \, ds.
\]
This is, up to the error on the RHS, the integral equation for V(t)η:
\[
V(t)\eta = \eta + \int_0^t f'(s, x(s)) V(s)\eta \, ds.
\]

Thus
\[
|x(t; x_0 + \eta) - x(t; x_0) - V(t)\eta|
\leq |\eta| \int_0^T E_{|\eta|}(s) \, ds + \int_0^t \|f'(s, x(s))\|_{\mathrm{op}} \, |x(s; x_0 + \eta) - x(s; x_0) - V(s)\eta| \, ds.
\]
Note that ‖f′(s, x(s))‖_op is bounded uniformly for s ∈ [0, T]. So by Grönwall's
inequality (Lemma A.3),
\[
|x(t; x_0 + \eta) - x(t; x_0) - V(t)\eta| \leq |\eta| \, e^{T \sup_s \|f'\|_{\mathrm{op}}} \int_0^T E_{|\eta|}(s) \, ds.
\]

Taking a supremum in η, we obtain
\[
\sup_{|\eta| \leq h} |x(t; x_0 + \eta) - x(t; x_0) - V(t)\eta| \leq h \, e^{T \sup_s \|f'\|_{\mathrm{op}}} \int_0^T E_h(s) \, ds.
\]

The integral on the RHS converges to zero as h ↓ 0 by the dominated con-


vergence theorem. (For infinite dimensional X, these are Riemann integrals,
and so we replace the dominated convergence theorem by the Arzelà conver-
gence theorem—a fact surprisingly more difficult to prove within the context of
Riemann integration.) Therefore ∂x(t)/∂x₀ exists and is equal to V(t).

It only remains to show that x₀ ↦ V(t; x₀) is continuous. We estimate
\[
\|V(t; x_0 + \eta) - V(t; x_0)\|
\leq \int_0^t \|f'(s, x(s; x_0 + \eta)) V(s; x_0 + \eta) - f'(s, x(s; x_0)) V(s; x_0)\| \, ds
\]
\[
\leq \int_0^t \|f'(s, x(s))\|_{\mathrm{op}} \, \|V(s; x_0 + \eta) - V(s; x_0)\| \, ds
+ \int_0^T \|f'(s, x(s; x_0 + \eta)) - f'(s, x(s; x_0))\| \, \|V(s; x_0 + \eta)\| \, ds.
\]

For the second integral on the RHS, we have
\[
\|f'(s, x(s; x_0 + \eta)) - f'(s, x(s; x_0))\| \to 0
\]
for each s, since
\[
|x(s; x_0 + \eta) - x(s; x_0)| \leq |\eta| \exp\bigl( \|f\|_{\mathrm{Lip}} T \bigr).
\]
Also, the factor ‖V(s; x₀ + η)‖ is bounded, since by the differential equation we
have
\[
\|V(s; x_0 + \eta)\| \leq 1 \cdot \exp\bigl( \|f\|_{\mathrm{Lip}} T \bigr).
\]
Therefore, an application of Grönwall’s inequality (Lemma A.3) finishes the
proof.

We can extend this to include both higher degrees of regularity and depen-
dence on parameters:
Corollary A.15. If f : ℝ_t × ℝ^d_x × ℝ^k_μ → ℝ^d is C^r in (x, µ) and Lipschitz in x,
then the solution to
\[
\dot{x} = f(t, x(t), \mu), \qquad x(0) = \xi \tag{A.9}
\]
is C^r in (ξ, µ).
Proof. We iterate the previous theorem. For example, consider the augmented
system
\[
\frac{d}{dt} \begin{pmatrix} x(t) \\[2pt] \dfrac{\partial x}{\partial \xi}(t) \\[2pt] \mu(t) \end{pmatrix}
= \begin{pmatrix} f(t, x(t), \mu) \\[2pt] \dfrac{\partial f}{\partial x}(t, x(t), \mu)\, \dfrac{\partial x}{\partial \xi}(t) \\[2pt] 0 \end{pmatrix} \tag{A.10}
\]
with initial data (ξ, Id, µ). Assume that f is C² in x and µ. Then the RHS
obeys the hypotheses of Proposition A.14, and so we conclude that x(t), ∂x/∂ξ(t),
and µ(t) are C¹ functions of ξ and µ.

The augmented system (A.10) is not only useful in proofs, but is also com-
monly used in numerical integration of the system (A.9) with parameter µ. The
time t can also be included in the augmented system, but this yields a weaker
smoothness result.
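As a concrete sketch of this numerical use (assumptions: SciPy's `solve_ivp`, and a hypothetical scalar field f(t, x, µ) = µx − x³ chosen only for illustration, not taken from the text), one can integrate the augmented system and check the computed derivative ∂x/∂ξ against a centered finite difference:

```python
import numpy as np
from scipy.integrate import solve_ivp

def f(t, x, mu):
    # hypothetical right-hand side, for illustration only
    return mu * x - x**3

def df_dx(t, x, mu):
    return mu - 3 * x**2

def augmented(t, z, mu):
    x, v = z  # v integrates the variational equation, approximating dx/dxi
    return [f(t, x, mu), df_dx(t, x, mu) * v]

mu, xi, T = 1.0, 0.5, 2.0
sol = solve_ivp(augmented, (0, T), [xi, 1.0], args=(mu,), rtol=1e-10, atol=1e-12)
x_T, v_T = sol.y[0, -1], sol.y[1, -1]

# Cross-check against a centered finite difference in the initial data.
h = 1e-6
xp = solve_ivp(f, (0, T), [xi + h], args=(mu,), rtol=1e-10, atol=1e-12).y[0, -1]
xm = solve_ivp(f, (0, T), [xi - h], args=(mu,), rtol=1e-10, atol=1e-12).y[0, -1]
assert abs(v_T - (xp - xm) / (2 * h)) < 1e-4
```

The same pattern extends verbatim to systems: one simply stacks the d × d matrix ∂x/∂ξ into the state vector.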

A.4. Vector fields and flows

In differential geometry, vector fields are conflated with first-order differential


operators:
\[
X(x) = (X^1(x), \ldots, X^d(x)) \quad \longleftrightarrow \quad X \cdot \nabla = \sum_{i=1}^d X^i(x)\, \frac{\partial}{\partial x^i},
\]

where x = (x1 , . . . , xd ). (We intentionally use superscript indices here, because


they are useful for bookkeeping. In fact, some authors use Einstein summation
notation, where we omit the summation symbol and automatically sum over
repeated indices provided that one is superscript and the other is subscript.)
This way, first-order differential operators are characterized as the linear
operators that satisfy the product rule.
We define the commutator of the vector fields X and Y to be the differential
operator
[X, Y ]f = (XY − Y X)f.
At first this appears to be a second-order differential operator; however, a com-
putation shows that it is a first order differential operator with coefficients
\[
[X, Y]^j = \sum_i \left( X^i \frac{\partial Y^j}{\partial x^i} - Y^i \frac{\partial X^j}{\partial x^i} \right).
\]

The commutator satisfies the Jacobi identity

[[X, Y ], Z] + [[Z, X], Y ] + [[Y, Z], X] = 0

for all vector fields X, Y , and Z. This can be verified directly, but is not
particularly illuminating. Ultimately this is true because operators form an
associative algebra, which in turn is true because function composition is always
associative. (The Jacobi identity does not exclusively arise from associative
algebras however; the cross product also obeys the Jacobi identity, but is not
associative.)
A vector field X also has an associated first-order differential equation

ẋ = X(x). (A.11)

Given a vector field X, we define the flow ΦX (t) := x(t; ·) which maps the
initial data ξ ∈ Rd to the solution x(t; ξ) ∈ Rd at time t for the differential
equation ẋ = X(x).
To leading order, the commutator [X, Y ] measures the failure of the flows
ΦX and ΦY to commute:
Lemma A.16. Let X and Y be smooth vector fields on Rd . Then
\[
\bigl( \Phi_X(t) \circ \Phi_Y(s) - \Phi_Y(s) \circ \Phi_X(t) \bigr)(\xi) = -st\,[X, Y](\xi) + O(s^3 + t^3)
\]
as s, t → 0.

Proof. We will Taylor expand the LHS. This is valid because flows are always
smooth, and consequently we may also differentiate in any order. We compute
\[
\frac{d}{dt}\, \Phi_X(t) \circ \Phi_Y(s) = X \circ \Phi_X(t) \circ \Phi_Y(s),
\]
and so
\[
\bigl( \Phi_X(t) \circ \Phi_Y(0) \bigr)(\xi) = \xi + tX(\xi) + \tfrac{1}{2} t^2 (X \cdot \nabla)X(\xi) + O(t^3).
\]
As ΦY (0) = Id, we get the same expression for ΦY (0) ◦ ΦX (t). In this way,
there are no terms involving only t or only s in the Taylor expansion.
Therefore the leading order term in the Taylor expansion is quadratic, with
only the st term not necessarily vanishing. We have

∂2
ΦX (t) ◦ ΦY (s) = (Y · ∇)X,
∂s ∂t s,t=0
∂2
ΦY (s) ◦ ΦX (t) = (X · ∇)Y.
∂s ∂t s,t=0

Together, we compute the coefficient of the st term to be
\[
\frac{\partial^2}{\partial s \, \partial t} \Big|_{s,t=0} \bigl( \Phi_X(t) \circ \Phi_Y(s) - \Phi_Y(s) \circ \Phi_X(t) \bigr)(\xi)
= \bigl( (Y \cdot \nabla)X - (X \cdot \nabla)Y \bigr)(\xi) = -[X, Y](\xi)
\]
as desired.
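The lemma is easy to check numerically. The sketch below uses the hypothetical linear fields X(x, y) = (y, 0) and Y(x, y) = (0, x) (chosen for illustration, not from the text), whose flows are exact shears; for these fields the O(s³ + t³) remainder vanishes identically, so the difference of the two compositions equals −st[X, Y](ξ) exactly:

```python
import numpy as np

def phi_X(t, p):
    # exact flow of (xdot, ydot) = (y, 0)
    return np.array([p[0] + t * p[1], p[1]])

def phi_Y(s, p):
    # exact flow of (xdot, ydot) = (0, x)
    return np.array([p[0], p[1] + s * p[0]])

def bracket(p):
    # [X, Y]^j = X^i dY^j/dx^i - Y^i dX^j/dx^i gives [X, Y] = (-x, y)
    return np.array([-p[0], p[1]])

xi = np.array([1.3, -0.7])
s, t = 1e-2, 2e-2
diff = phi_X(t, phi_Y(s, xi)) - phi_Y(s, phi_X(t, xi))
assert np.allclose(diff, -s * t * bracket(xi))
```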
Integrating in time, we obtain the following important fact:
Theorem A.17. Let X and Y be smooth vector fields. Then [X, Y ] = 0 if and
only if the flows commute:

ΦX (t) ◦ ΦY (s) = ΦY (s) ◦ ΦX (t).

Proof. We follow the analytic argument from [Arn89, §39.E].


⇐= : This follows immediately from the previous lemma.
=⇒ : Fix s, t ≠ 0. We may assume s, t > 0 after reversing time if necessary.
Fix a positive integer N , and divide the rectangle [0, t] × [0, s] into an N × N
grid.
Each path from (0, 0) to (t, s) along this grid corresponds to a composition
of N copies of ΦX(t/N) and N copies of ΦY(s/N) in some order. In particular,
\[
\Phi_X(t) \circ \Phi_Y(s) = \Phi_X(\tfrac{t}{N}) \circ \cdots \circ \Phi_X(\tfrac{t}{N}) \circ \Phi_Y(\tfrac{s}{N}) \circ \cdots \circ \Phi_Y(\tfrac{s}{N})
\]
and similarly for ΦY (s) ◦ ΦX (t).


We write the difference ΦX (t) ◦ ΦY (s) − ΦY (s) ◦ ΦX (t) as a telescoping sum
of N 2 terms, where the summand consists of the difference of two paths that

differ around one grid square. Within the summand, there is a difference of two
flows that agree up to one point, differ around an N⁻¹ × N⁻¹ box, and then
continue as the flows of two possibly different points. Applying the lemma with
[X, Y] = 0, the difference around one box is
\[
\Phi_X(\tfrac{t}{N}) \circ \Phi_Y(\tfrac{s}{N}) - \Phi_Y(\tfrac{s}{N}) \circ \Phi_X(\tfrac{t}{N}) = 0 + O\Bigl( \tfrac{|s|^3 + |t|^3}{N^3} \Bigr).
\]
After this, the flows can then deviate at most exponentially. Altogether, we
estimate the whole sum as
\[
\Phi_X(t) \circ \Phi_Y(s) - \Phi_Y(s) \circ \Phi_X(t) = N^2 e^{C(|t|+|s|)}\, O\Bigl( \tfrac{|s|^3 + |t|^3}{N^3} \Bigr).
\]
Sending N → ∞, the RHS vanishes. As s, t were arbitrary, we conclude that
ΦX(t) ◦ ΦY(s) − ΦY(s) ◦ ΦX(t) ≡ 0.

A.5. Behavior away from fixed points

We are ultimately interested in the qualitative characterization of all Hamil-


tonian flows. First, we will need a general fact about the behavior of solutions
away from fixed points.
We say that a diffeomorphism x = Ψ(y) conjugates the flow ẋ = X(x) into
the flow ẏ = Y (y) if
\[
Y(y) = (\Psi'(y))^{-1} (X \circ \Psi)(y). \tag{A.12}
\]
Indeed, under the change of variables x = Ψ(y) we have
\[
\dot{x} = \frac{d}{dt}\, \Psi(y) = \Psi'(y)\dot{y} = \Psi'(y) Y(y).
\]
On the other hand,
ẋ = X(x) = (X ◦ Ψ)(y),
and so rearranging yields the condition (A.12). This is the notion with which
we can describe the qualitative behavior of solutions.
Proposition A.18. If X(x0) ≠ 0, then there exists a local diffeomorphism
conjugating (A.11) to
ẏ = e1 = (1, 0, . . . , 0).
Proof. We may assume x0 = 0 after translating. First we rotate coordinates
so that X(0) = ce1 with c > 0. Let ΦX(t; ξ) := x(t; ξ) denote the flow of the
initial data ξ by time t. Define
Ψ(y) = ΦX (y1 ; (0, y2 , . . . , yd )).
That is, we flow the initial data (0, y2 , . . . , yd ) by a “time” y1 .
We check that this choice does indeed work. We have
\[
\Psi'(y) = \begin{pmatrix} \frac{\partial \Psi^1}{\partial y^1} & \cdots & \frac{\partial \Psi^1}{\partial y^d} \\ \vdots & \ddots & \vdots \\ \frac{\partial \Psi^d}{\partial y^1} & \cdots & \frac{\partial \Psi^d}{\partial y^d} \end{pmatrix}
= \begin{pmatrix} X \circ \Psi(y) & * & \cdots & * \end{pmatrix}, \tag{A.13}
\]

where “∗” denotes an unspecified entry and X ◦ Ψ(y) is the first column. In particular,
\[
\Psi'(0) = \begin{pmatrix} c & 0 & \cdots & 0 \\ 0 & 1 & & \\ \vdots & & \ddots & \\ 0 & & & 1 \end{pmatrix}
\]
has determinant c > 0 and hence is nonsingular. (The lower right submatrix
is the identity because ΦX(0; (0, y2, . . . , yd)) = (0, y2, . . . , yd).) Therefore Ψ is a
local diffeomorphism by the inverse function theorem. Also, from (A.13) we see
that
\[
\Psi'(y) e_1 = (X \circ \Psi)(y),
\]
and so
\[
\dot{y} = (\Psi'(y))^{-1} (X \circ \Psi)(y) \equiv e_1
\]
as desired.
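A short symbolic check of the construction, for the hypothetical field X(x₁, x₂) = (1, x₁) (an illustration, not from the text; here X(0) = e₁, so c = 1 and no rotation is needed): flowing the data (0, y₂) for “time” y₁ gives Ψ(y) = (y₁, y₂ + y₁²/2), and the conjugated field is indeed e₁.

```python
import sympy as sp

y1, y2 = sp.symbols('y1 y2')
# Flowing (0, y2) for "time" y1 under X(x1, x2) = (1, x1):
Psi = sp.Matrix([y1, y2 + y1**2 / 2])
X_of_Psi = sp.Matrix([1, Psi[0]])            # X evaluated at Psi(y)
Y = Psi.jacobian([y1, y2]).inv() * X_of_Psi  # (Psi'(y))^{-1} (X o Psi)(y)
assert sp.simplify(Y) == sp.Matrix([1, 0])   # the conjugated field is e1
```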
To paraphrase, a nonvanishing vector field X is locally a coordinate vector
field ∂/∂x^1 for some choice of coordinates. Unlike general collections of vector
fields, coordinate vector fields commute with each other.
Proposition A.19. If X1, . . . , Xn are smooth vector fields on Rd that commute
and X1 (x0 ), . . . , Xn (x0 ) ∈ Rd are linearly independent, then there exists a local
diffeomorphism that conjugates X1 , . . . , Xn into e1 , . . . , en .
Proof. We may assume x0 = 0 after translating. First, we make a linear change
of variables so that X1 (x0 ) = e1 , . . . , Xn (x0 ) = en , which can be done by linear
independence. Let [ΦXi(t)](ξ) := xi(t; ξ) denote the flow of the initial data ξ
under the equation ẋ = Xi(x) by time t. Define
 
Ψ(y) = ΦX1 (y1 ) ◦ ΦX2 (y2 ) ◦ · · · ◦ ΦXn (yn ) (0, . . . , 0, yn+1 , . . . , yd ).

We have Ψ′(0) = Id as before. As the vector fields commute, so do their flows.
Therefore, in computing the ith partial derivative of Ψ, we may pull the ith flow map
ΦXi out to the front:
\[
\frac{\partial \Psi}{\partial y^i} = \frac{\partial}{\partial y^i} \Bigl( \Phi_{X_i}(y_i) \circ [\ldots] \Bigr) = X_i \circ \Phi_{X_i}(y_i) \circ [\ldots] = (X_i \circ \Psi)(y).
\]
The rest of the calculation is the same as in Proposition A.18.

A.6. Behavior near a fixed point

It remains to describe the behavior of solutions near a point x0 where


X(x0 ) = 0. Such a point is called a fixed point (or stationary point, equi-
librium), because the constant function x(t) ≡ x0 is a solution to ẋ = X(x)
and hence the flow ΦX (t) fixes x0 . The material in this section is not strictly
necessary for these notes (and so we omit the proofs); however, it is essential in

a study of ODEs. Nevertheless, the local structure of flows nearby a fixed point
is still not fully understood to this day.
A first step is to linearize the vector field X(x):
\[
X^i(x) \approx 0 + \sum_{j=1}^d \frac{\partial X^i}{\partial x^j}(x_0)\, x^j.
\]

Ideally, we would like to say that the higher order terms that we have neglected
are small. A fundamental idea to accomplish this is to write the nonlinear flow
as the linear flow plus a perturbation:
Lemma A.20 (Duhamel formula). Suppose A is a d×d matrix and g : Rd → Rd
is smooth. Then x ∈ C 1 ((−T, T ) → Rd ) solves

ẋ = Ax + g(x)

if and only if x ∈ C⁰((−T, T) → Rd) solves
\[
x(t) = e^{At} x(0) + \int_0^t e^{A(t-s)} g(x(s)) \, ds. \tag{A.14}
\]

In the case A = 0, we recover the integral equation (A.2).


Proof. Write x(t) = e^{At} y(t). As A is time-independent, e^{At} commutes
with A, and so we may write
\[
\dot{y} = -Ay + e^{-At}(Ax + g(x)) = e^{-At} g(x(t)).
\]
This is solved if and only if y(t) solves the integral equation
\[
y(t) = y(0) + \int_0^t e^{-As} g(x(s)) \, ds.
\]
Recalling that y(t) = e^{-At} x(t) and multiplying by e^{At} yields the claim.
This proof method is called variation of parameters, and it is widely applicable.
It remains useful even when the linear part is replaced by a nonlinear operator, although the analysis is then harder.
Even in the case when g is linear, the Duhamel formula (A.14) is nontrivial.
Indeed, if we write g(x) = Bx and iterate, then we get
\[
x(t) = e^{At} x(0) + \int_0^t e^{A(t-s)} B e^{As} x(0) \, ds
+ \iint_{0 < s_1 < s_2 < t} e^{A(t-s_2)} B e^{A(s_2 - s_1)} B e^{A s_1} x(0) \, ds_1 \, ds_2 + \cdots.
\]

This is the infinite Duhamel expansion, where we sum over the possible histories
of x(t). Starting with x(0), we flow by A; to this, we add the flow by A, bump
by B, and flow by A; then we add the flow by A, bump by B, flow by A, bump
by B, and flow by A; and so on.
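A numerical sanity check of this expansion (a sketch using SciPy's `expm` and `quad_vec`; the matrices are arbitrary, not from the text): for g(x) = Bx with B small, the first two Duhamel terms should approximate e^{(A+B)t}x(0) up to an error carrying two or more factors of B.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad_vec

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = 1e-4 * rng.standard_normal((3, 3))  # a small "bump"
x0 = rng.standard_normal(3)
t = 1.0

exact = expm((A + B) * t) @ x0
term0 = expm(A * t) @ x0
term1, _ = quad_vec(lambda s: expm(A * (t - s)) @ B @ expm(A * s) @ x0, 0, t)
# The neglected terms carry two or more factors of B, hence are O(|B|^2).
assert np.linalg.norm(exact - (term0 + term1)) < 1e-4
```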

Duhamel’s formula (A.14) is much more effective than the integral equa-
tion (A.2) at solving equations by iteration. Consider
 
\[
\dot{x} = Ax + \varepsilon(x), \qquad A = \begin{pmatrix} -1 & 0 \\ 0 & -\lambda \end{pmatrix}.
\]

In the case ε = 0, iterating the integral equation (A.2) yields
\[
x(t) = x_0 + (At)x_0 + \tfrac{1}{2}(At)^2 x_0 + \cdots + \tfrac{1}{k!}(At)^k x_0 + \cdots.
\]

For λ > 1 large, we have to go to the term k ≈ λ before the terms stop growing.
On the other hand, eAt is very stable, and the Duhamel formula harnesses this.
Back to the behavior near a fixed point. If we make a diffeomorphic change
of variables x = Ψ(y), we know from (A.12) that the equation for y is

\[
\dot{y} = Y(y) := (\Psi'(y))^{-1} (X \circ \Psi)(y).
\]

If Ψ maps 0 to the fixed point x0, then we obtain
\[
\frac{\partial Y^i}{\partial y^j}(0) = [\ldots]\, X(x_0) + \sum_{k,\ell=1}^d \bigl[ (\Psi'(0))^{-1} \bigr]^i_{\,k}\, \frac{\partial X^k}{\partial x^\ell}(x_0)\, \bigl[ \Psi'(0) \bigr]^\ell_{\,j}.
\]

The first term on the RHS vanishes since X(x0) = 0. Therefore the matrix
∂Y^i/∂y^j is similar to ∂X^i/∂x^j, because they are conjugated by Ψ′(0). In particular, the
Jordan normal form of ∂X^i/∂x^j (and hence all of its spectral properties) is preserved
under this change of variables.
The following fundamental result tells us that often the actual flow is quali-
tatively similar to the linearized flow:
Theorem A.21 (Hartman–Grobman). If the matrix ∂X^i/∂x^j(x0) has no purely
imaginary eigenvalues, then there exists a homeomorphism conjugating the non-
linear flow (A.11) to the linear flow
\[
\dot{y} = \sum_{j=1}^d \frac{\partial X}{\partial x^j}(x_0)\, y^j.
\]

For a proof, see [KH95]. The assumption of Theorem A.21 is necessary;


cf. Example 2.3.
Note that Theorem A.21 only guarantees that the change of variables is
homeomorphic, rather than diffeomorphic. It turns out that the change of vari-
ables may indeed only be continuous and not differentiable, but the derivative
at x0 must exist and be equal to identity; see [Har60, GHR03] for details. In
particular, this justifies that the phase portrait for the nonlinear system must
look like the linearized phase portrait near x0 .
However, Theorem A.21 does not tell us what happens to the finer features
of the linearized flow. In particular, what happens to the stable and unstable

eigenspaces? The stable and unstable manifold theorems tell us that they are
preserved under some additional assumptions. For a general linear system ẋ =
Ax, we define the stable, unstable, and center manifolds:
\[
\begin{aligned}
X_s &= \operatorname{span} \bigcup \{ \ker((A - \lambda)^k) : \operatorname{Re} \lambda < 0 \}, \\
X_u &= \operatorname{span} \bigcup \{ \ker((A - \lambda)^k) : \operatorname{Re} \lambda > 0 \}, \\
X_c &= \operatorname{span} \bigcup \{ \ker((A - \lambda)^k) : \operatorname{Re} \lambda = 0 \},
\end{aligned}
\]
where λ varies over the eigenvalues of A and k over the positive integers.


Theorem A.22 (Stable manifold theorem). Suppose that f : Rd → Rd is
smooth and the IVP (A.1) has a hyperbolic fixed point at x = 0. Then there
exist 0 < δ0 < δ1 and a smooth function ψ : {a ∈ Xs : |a| < δ1} → Xu such that
the stable manifold
\[
M = \{ a + \psi(a) : a \in X_s, \ |a| < \delta_1 \}
\]

satisfies the following:

(a) If |x0| < δ0 and x0 ∈ M, then the solution x(t) stays in the δ1 neighborhood:
\[
|x(t)| \leq \delta_1 \quad \text{for all } t \geq 0,
\]
and decays exponentially:
\[
|x(t)| \leq C e^{-\sigma t} |x_0| \quad \text{for some } C, \sigma > 0.
\]

(b) If |x0| < δ0 and x0 ∉ M, then the solution is ejected from the δ1 neighborhood:
\[
|x(t)| \geq \delta_1 \quad \text{for some } t > 0.
\]

See [CL55, §13.4] for a proof. To obtain the statement of the unstable
manifold theorem we simply need to reverse time, which swaps A 7→ −A and
Xs ↔ Xu .
The constant σ in part (a) may be strictly smaller than the smallest real
part inf{| Re λ| : Re λ < 0}, and the constant C must depend on σ. To see
this, consider a Jordan block A with eigenvalue λ, Re λ < 0, and all ones on the
superdiagonal. Then we have

\[
e^{At} = e^{\lambda t} \begin{pmatrix} 1 & t & \tfrac{1}{2}t^2 & \cdots \\ 0 & 1 & t & \cdots \\ \vdots & \vdots & \ddots & \ddots \end{pmatrix}.
\]

Although the factor eλt is exponentially decaying, the matrix is initially growing
in t.
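This transient growth is easy to see numerically (a sketch; the 4 × 4 Jordan block and the eigenvalue −0.1 are illustrative choices, not from the text):

```python
import numpy as np
from scipy.linalg import expm

lam, d = -0.1, 4
# Jordan block: repeated eigenvalue -0.1, ones on the superdiagonal.
A = lam * np.eye(d) + np.diag(np.ones(d - 1), k=1)
norms = [np.linalg.norm(expm(A * t), 2) for t in (0.0, 1.0, 5.0, 300.0)]
assert norms[1] > 1 and norms[2] > norms[1]  # transient growth ...
assert norms[3] < 1e-2                       # ... followed by decay
```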

A.7. Exercises

A.1. (a) Fix f : R×Rd → Rd that is C 1 and consider the initial value problem
ẋ = f (t, x) with x(0) = ξ.
Suppose that for some ξ0 this problem admits a (necessarily unique) so-
lution on the interval [0, T ). Show that for each ε > 0 there is a δ > 0 so
that if |ξ − ξ0 | < δ, then the initial value problem admits a solution on the
interval [0, T − ε).
(b) Compute the maximal (forward) existence time as a function of the initial
data for the problem
\[
\dot{x} = \frac{x^2}{1 + y^2 x^2}, \qquad \dot{y} = 0.
\]
Deduce that the existence time may fail to be continuous (in the natu-
ral topology on [0, ∞]). By part (a), however, it is always lower semi-
continuous.
A.2. Suppose that A(t) and B(t) are C 1 square matrices with Ȧ = BA. Show
that
\[
\frac{d}{dt} \det A = \operatorname{tr}(B) \det(A). \tag{A.15}
\]
(Hint: Write the derivative of det A in terms of the derivatives of its rows and
simplify using row operations. Be careful that you do not implicitly assume the
eigenvalues are differentiable, which is not true.)
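Before attempting the proof, the identity (A.15) can be sanity-checked numerically: for a constant matrix B, A(t) = e^{Bt}A₀ satisfies Ȧ = BA, so a finite difference of det A should match tr(B) det A (a sketch; the matrices are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
B = rng.standard_normal((3, 3))
A0 = rng.standard_normal((3, 3))
A = lambda t: expm(B * t) @ A0   # solves Adot = B A with A(0) = A0

t, h = 0.7, 1e-6
lhs = (np.linalg.det(A(t + h)) - np.linalg.det(A(t - h))) / (2 * h)
rhs = np.trace(B) * np.linalg.det(A(t))
assert abs(lhs - rhs) < 1e-4 * max(1.0, abs(rhs))
```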
A.3 (Trust singular values, not eigenvalues). Suppose A is a 2 × 2 matrix with
eigenvalues −λ < 0 and −1, and eigenspaces which form angles ±θ with the
horizontal axis. Fix t > 0 and λ > 1 sufficiently large. Show that eAt has norm
≥ cθ−1 for some constant c > 0 for all θ > 0 sufficiently small. Even though
A has negative eigenvalues, the change of basis matrix makes the norm of eAt
very large.
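A numerical illustration of the phenomenon (the parameter values are illustrative, not prescribed by the exercise):

```python
import numpy as np
from scipy.linalg import expm

theta, lam, t = 0.01, 10.0, 1.0
# Eigenvectors at angles +theta and -theta to the horizontal axis.
P = np.array([[np.cos(theta),  np.cos(theta)],
              [np.sin(theta), -np.sin(theta)]])
A = P @ np.diag([-lam, -1.0]) @ np.linalg.inv(P)
assert np.linalg.eigvals(A).real.max() < 0   # both eigenvalues are negative
assert np.linalg.norm(expm(A * t), 2) > 10   # yet e^{At} has large norm
```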
A.4. Find an example of 2 × 2 matrices A and B which both satisfy Xs = R2
but A + B has Xu 6= {0}. (Hint: Take A and B to be upper and lower triangular
respectively, and consider det(A + B).) This phenomenon is known as Turing
instability, and was introduced in [Tur52].
A.5. The proof of the stable manifold theorem actually shows that M is a
smooth manifold that is tangent to Xs at the origin, arguing as in the proof of
Proposition A.14. With this information in hand, we can derive many properties
from the differential equation. For example, consider the system
\[
\dot{q} = q - 2pq + 3p^2, \qquad \dot{p} = -p + p^2.
\]
(Incidentally, this system is Hamiltonian, but this is not important to this
method.) On M we can write q = ψ(p). Taking a time derivative of this
equation, match Taylor coefficients using ψ(0) = 0 = ψ′(0) to compute the
second-order expansion of ψ at p = 0.
REFERENCES

[AKN06] V. I. Arnol’d, V. V. Kozlov, and A. I. Neishtadt, Mathematical aspects of classical


and celestial mechanics, third ed., Encyclopaedia of Mathematical Sciences, vol. 3,
Springer-Verlag, Berlin, 2006, [Dynamical systems. III], Translated from the Russian
original by E. Khukhro. MR 2269239
[Arn89] V. I. Arnol’d, Mathematical methods of classical mechanics, Graduate Texts in
Mathematics, vol. 60, Springer-Verlag, New York, 1989, Translated from the 1974
Russian original by K. Vogtmann and A. Weinstein, Corrected reprint of the second
(1989) edition. MR 1345386
[BCT17] A. Bravetti, H. Cruz, and D. Tapias, Contact Hamiltonian mechanics, Ann. Physics
376 (2017), 17–39. MR 3600092
[BT15] A. Bravetti and D. Tapias, Liouville’s theorem and the canonical measure for non-
conservative systems from contact geometry, J. Phys. A 48 (2015), no. 24, 245001,
11. MR 3355244
[Cal41] P. Caldirola, Forze non conservative nella meccanica quantistica, Atti Accad. Italia.
Rend. Cl. Sci. Fis. Mat. Nat. (7) 2 (1941), 896–903. MR 18569
[CL55] E. A. Coddington and N. Levinson, Theory of ordinary differential equations,
McGraw-Hill Book Company, Inc., New York-Toronto-London, 1955. MR 0069338
[Cla70] R. Clausius, On a mechanical theorem applicable to heat, Philosophical Magazine,
Ser. 4 40 (1870), 122–127.
[Cla79] F. H. Clarke, A classical variational principle for periodic Hamiltonian trajectories,
Proc. Amer. Math. Soc. 76 (1979), no. 1, 186–188. MR 534415
[Eva10] L. C. Evans, Partial differential equations, second ed., Graduate Studies in Mathe-
matics, vol. 19, American Mathematical Society, Providence, RI, 2010. MR 2597943
[Gal03] G. Galperin, Playing pool with π (the number π from a billiard point of view),
Regul. Chaotic Dyn. 8 (2003), no. 4, 375–394. MR 2023043
[Gar71] C. S. Gardner, Korteweg-de Vries equation and generalizations. IV. The Korteweg-
de Vries equation as a Hamiltonian system, J. Mathematical Phys. 12 (1971),
1548–1551. MR 286402
[GHR03] M. Guysinsky, B. Hasselblatt, and V. Rayskin, Differentiability of the Hartman-
Grobman linearization, Discrete Contin. Dyn. Syst. 9 (2003), no. 4, 979–984. MR
1975364
[Gol51] H. Goldstein, Classical Mechanics, Addison-Wesley Press, Inc., Cambridge, Mass.,
1951. MR 0043608
[Har60] P. Hartman, On local homeomorphisms of Euclidean spaces, Bol. Soc. Mat. Mexi-
cana (2) 5 (1960), 220–241. MR 141856
[JS07] D. W. Jordan and P. Smith, Nonlinear ordinary differential equations, fourth ed.,
Oxford University Press, Oxford, 2007, An introduction for scientists and engineers.
MR 2348881
[Kan48] E. Kanai, On the quantization of the dissipative systems, Progress of Theoretical
Physics 3 (1948), no. 4, 440–442.


[KdV95] D. J. Korteweg and G. de Vries, On the change of form of long waves advancing in
a rectangular canal, and on a new type of long stationary waves, Philos. Mag. (5)
39 (1895), no. 240, 422–443. MR 3363408
[KH95] A. Katok and B. Hasselblatt, Introduction to the modern theory of dynamical sys-
tems, Encyclopedia of Mathematics and its Applications, vol. 54, Cambridge Univer-
sity Press, Cambridge, 1995, With a supplementary chapter by Katok and Leonardo
Mendoza. MR 1326374
[KS65] P. Kustaanheimo and E. Stiefel, Perturbation theory of Kepler motion based on
spinor regularization, J. Reine Angew. Math. 218 (1965), 204–219. MR 180349
[KV13] R. Killip and M. Vişan, Nonlinear Schrödinger equations at critical regularity, Evo-
lution equations, Clay Math. Proc., vol. 17, Amer. Math. Soc., Providence, RI, 2013,
pp. 325–437. MR 3098643
[Lee13] J. M. Lee, Introduction to smooth manifolds, second ed., Graduate Texts in Math-
ematics, vol. 218, Springer, New York, 2013. MR 2954043
[Lie80] S. Lie, Theorie der Transformationsgruppen I, Math. Ann. 16 (1880), no. 4, 441–
528. MR 1510035
[LL76] L. D. Landau and E. M. Lifshitz, Course of theoretical physics. Vol. 1, third ed.,
Pergamon Press, Oxford-New York-Toronto, Ont., 1976, Mechanics, Translated
from the Russian by J. B. Sykes and J. S. Bell. MR 0475051
[LP89] P. D. Lax and R. S. Phillips, Scattering theory, second ed., Pure and Applied
Mathematics, vol. 26, Academic Press, Inc., Boston, MA, 1989, With appendices
by Cathleen S. Morawetz and G. Schmidt. MR 1037774
[Mor75] C. S. Morawetz, Notes on time decay and scattering for some hyperbolic problems,
Regional Conference Series in Applied Mathematics, No. 19, Society for Industrial
and Applied Mathematics, Philadelphia, Pa., 1975. MR 0492919
[MZ05] J. Moser and E. J. Zehnder, Notes on dynamical systems, Courant Lecture Notes
in Mathematics, vol. 12, New York University, Courant Institute of Mathematical
Sciences, New York; American Mathematical Society, Providence, RI, 2005. MR
2189486
[Nah16] P. Nahin, In praise of simple physics, Princeton University Press, 2016.
[Rab78] P. H. Rabinowitz, Periodic solutions of Hamiltonian systems, Comm. Pure Appl.
Math. 31 (1978), no. 2, 157–184. MR 467823
[San17] F. Santambrogio, {Euclidean, metric, and Wasserstein} gradient flows: an
overview, Bull. Math. Sci. 7 (2017), no. 1, 87–154. MR 3625852
[Sie54] C. L. Siegel, Über die Existenz einer Normalform analytischer Hamiltonscher
Differentialgleichungen in der Nähe einer Gleichgewichtslösung, Math. Ann. 128
(1954), 144–170. MR 67298
[SM95] C. L. Siegel and J. K. Moser, Lectures on celestial mechanics, Classics in Mathe-
matics, Springer-Verlag, Berlin, 1995, Translated from the German by C. I. Kalme,
Reprint of the 1971 translation. MR 1345153
[SS98] J. Shatah and M. Struwe, Geometric wave equations, Courant Lecture Notes in
Mathematics, vol. 2, New York University, Courant Institute of Mathematical
Sciences, New York; American Mathematical Society, Providence, RI, 1998. MR
1674843
[Str08] M. Struwe, Variational methods, fourth ed., Ergebnisse der Mathematik und ihrer
Grenzgebiete. 3. Folge. A Series of Modern Surveys in Mathematics [Results in
Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Math-
ematics], vol. 34, Springer-Verlag, Berlin, 2008, Applications to nonlinear partial
differential equations and Hamiltonian systems. MR 2431434
[Str15] S. H. Strogatz, Nonlinear dynamics and chaos, second ed., Westview Press, Boulder,
CO, 2015, With applications to physics, biology, chemistry, and engineering. MR
3837141

[Tao06] T. Tao, Nonlinear dispersive equations, CBMS Regional Conference Series in Math-
ematics, vol. 106, Published for the Conference Board of the Mathematical Sciences,
Washington, DC; by the American Mathematical Society, Providence, RI, 2006, Lo-
cal and global analysis. MR 2233925
[Tro96] J. L. Troutman, Variational calculus and optimal control, second ed., Undergradu-
ate Texts in Mathematics, Springer-Verlag, New York, 1996, With the assistance of
William Hrusa, Optimization with elementary convexity. MR 1363262
[Tur52] A. M. Turing, The chemical basis of morphogenesis, Philos. Trans. Roy. Soc. London
Ser. B 237 (1952), no. 641, 37–72. MR 3363444
[Wei78] A. Weinstein, Periodic orbits for convex Hamiltonian systems, Ann. of Math. (2)
108 (1978), no. 3, 507–518. MR 512430
[Wei83] , The local structure of Poisson manifolds, J. Differential Geom. 18 (1983),
no. 3, 523–557. MR 723816
[Wil36] J. Williamson, On the Algebraic Problem Concerning the Normal Forms of Linear
Dynamical Systems, Amer. J. Math. 58 (1936), no. 1, 141–163. MR 1507138
