Mathematical Tripos Part III

General Relativity

Harvey Reall
Contents
1 Manifolds and tensors
1.1 Introduction
1.2 Differentiable manifolds
1.3 Smooth functions
1.4 Curves and vectors
1.5 Covectors
1.6 Abstract index notation
1.7 Tensors
1.8 Tensor fields
1.9 Integral curves
1.10 The commutator

2 Metric tensors
2.1 Definition
2.2 Lorentzian signature
2.3 Curves of extremal proper time

3 Covariant derivative
3.1 Introduction
3.2 The Levi-Civita connection
3.3 Geodesics
3.4 Normal coordinates

4 Curvature
4.1 Parallel transport
4.2 The Riemann tensor
4.3 Parallel transport again
4.4 Symmetries of the Riemann tensor
4.5 Geodesic deviation
4.6 Curvature of the Levi-Civita connection

5 Diffeomorphisms and Lie derivative
5.1 Maps between manifolds
5.2 Diffeomorphisms
5.3 Lie derivative

Part 3 GR October 12, 2022 1 H.S. Reall


6 Physical laws in curved spacetime
6.1 Motivating principles
6.2 Physical laws in curved spacetime
6.3 Energy-momentum tensor
6.4 Einstein’s equation
6.5 Gauge freedom

7 Linearized theory
7.1 The linearized Einstein equation
7.2 The Newtonian limit
7.3 Gravitational waves
7.4 The field far from a source
7.5 The energy in gravitational waves
7.6 The quadrupole formula
7.7 Comparison with electromagnetic radiation
7.8 Gravitational waves from binary systems

8 Differential forms
8.1 Introduction
8.2 Connection 1-forms
8.3 Spinors in curved spacetime (non-examinable)
8.4 Curvature 2-forms
8.5 Volume form
8.6 Integration on manifolds
8.7 Submanifolds and Stokes’ theorem

9 Lagrangian formulation
9.1 Scalar field action
9.2 The Einstein-Hilbert action
9.3 Energy momentum tensor

10 The initial value problem
10.1 Introduction
10.2 Extrinsic curvature
10.3 The Gauss-Codazzi equations (proofs non-examinable)
10.4 The constraint equations
10.5 The initial value problem for GR


Preface
These are lecture notes for the course on General Relativity in Part III of the Cambridge
Mathematical Tripos. There are introductory GR courses in Part II (Mathematics or
Natural Sciences) so, although self-contained, this course does not cover topics usually
covered in a first course, e.g., the Schwarzschild solution, the solar system tests, and
cosmological solutions. You should consult an introductory book (e.g. Gravity by J.B.
Hartle) if you have not studied these topics before.

Acknowledgment
I am very grateful to Andrius Štikonas for producing the figures.

Conventions
We will use units in which the speed of light is one: c = 1. Sometimes we will use units
where Newton’s gravitational constant is one: G = 1.
We will use "abstract indices" $a, b, c$, etc. to denote tensors, e.g. $V^a$, $g_{cd}$. Equations involving such indices are basis-independent. Greek indices $\mu, \nu$, etc. refer to tensor components in a particular basis. Equations involving such indices are valid only in that basis.
We will define the metric tensor to have signature $(-+++)$, which is the most common convention. Some authors use signature $(+---)$.
Our convention for the Riemann tensor is such that the Ricci identity takes the form
$$\nabla_a \nabla_b V^c - \nabla_b \nabla_a V^c = R^c{}_{dab} V^d.$$
Some authors define the Riemann tensor with the opposite sign.

Bibliography
There are many excellent books on General Relativity. The following is an incomplete
list:

1. General Relativity, R.M. Wald, Chicago UP, 1984.

2. Advanced General Relativity, J.M. Stewart, CUP, 1993.

3. Spacetime and Geometry: An Introduction to General Relativity, S.M. Carroll, Addison-Wesley, 2004.

4. Gravitation, C.W. Misner, K.S. Thorne and J.A. Wheeler, Freeman, 1973.

5. Gravitation and Cosmology, S. Weinberg, Wiley, 1972.

Our approach will be closest to that of Wald. The first part of Stewart’s book is based on a previous version of this course. Carroll’s book is a very readable introduction. Weinberg’s book contains a good discussion of equivalence principles. Our treatments of the Newtonian approximation and gravitational radiation are based on Misner, Thorne and Wheeler.

1 Manifolds and tensors

1.1 Introduction
In Minkowski spacetime we usually use inertial frame coordinates $(t, x, y, z)$. These are adapted to the symmetries of the spacetime, so using them simplifies the form of physical laws. However, a general spacetime has no symmetries and therefore no preferred set of coordinates. In fact, a single set of coordinates might not be sufficient to describe the spacetime. A simple example of this is provided by spherical polar coordinates $(\theta, \phi)$ on the surface of the unit sphere $S^2$ in $\mathbb{R}^3$:

Figure 1. Spherical polar coordinates

These coordinates are not well-defined at $\theta = 0, \pi$ (what is the value of $\phi$ there?). Furthermore, the coordinate $\phi$ is discontinuous at $\phi = 0$ or $2\pi$.
To describe $S^2$ so that a pair of coordinates is assigned in a smooth way to every point, we need to use several overlapping sets of coordinates. Generalizing this example leads to the idea of a manifold. In GR, we assume that spacetime is a 4-dimensional differentiable manifold.


1.2 Differentiable manifolds

You know how to do calculus on $\mathbb{R}^n$. How do you do calculus on a curved space, e.g., $S^2$? Locally, $S^2$ looks like $\mathbb{R}^2$, so one can carry over standard results. However, one has to confront the fact that it is impossible to use a single coordinate system on $S^2$. In order to do calculus we need our coordinate systems to "mesh together" in a smooth way. Mathematically, this is captured by the notion of a differentiable manifold:
Definition. An $n$-dimensional differentiable manifold is a set $M$ together with a collection of subsets $O_\alpha$ such that

1. $\bigcup_\alpha O_\alpha = M$, i.e., the subsets $O_\alpha$ cover $M$.

2. For each $\alpha$ there is a one-to-one and onto map $\phi_\alpha : O_\alpha \to U_\alpha$, where $U_\alpha$ is an open subset of $\mathbb{R}^n$.

3. If $O_\alpha$ and $O_\beta$ overlap, i.e., $O_\alpha \cap O_\beta \neq \emptyset$, then $\phi_\beta \circ \phi_\alpha^{-1}$ maps from $\phi_\alpha(O_\alpha \cap O_\beta) \subset U_\alpha \subset \mathbb{R}^n$ to $\phi_\beta(O_\alpha \cap O_\beta) \subset U_\beta \subset \mathbb{R}^n$. We require that this map be smooth (infinitely differentiable).

The maps $\phi_\alpha$ are called charts or coordinate systems. The set $\{\phi_\alpha\}$ is called an atlas.

Figure 2. Overlapping charts

Remarks.

1. Sometimes we shall write $\phi_\alpha(p) = (x^1_\alpha(p), x^2_\alpha(p), \ldots, x^n_\alpha(p))$ and refer to $x^i_\alpha(p)$ as the coordinates of $p$.

2. We use the charts to define a topology on $M$: we say that $R \subset M$ is open iff $\phi_\alpha(R \cap O_\alpha)$ is an open subset of $U_\alpha$ for all $\alpha$.

3. Strictly speaking, what we have defined above is a smooth manifold. If we replace "smooth" in the definition by $C^k$ ($k$-times continuously differentiable) then we obtain a $C^k$-manifold. We shall always assume the manifold is smooth.

Examples.

1. $\mathbb{R}^n$: this is a manifold with atlas consisting of the single chart $\phi : (x^1, \ldots, x^n) \mapsto (x^1, \ldots, x^n)$.

2. $S^1$: the unit circle, i.e., the subset of $\mathbb{R}^2$ given by $(\cos\theta, \sin\theta)$ with $\theta \in \mathbb{R}$. We can't define a chart by using $\theta \in [0, 2\pi)$ as a coordinate because $[0, 2\pi)$ is not open. Instead let $P$ be the point $(1, 0)$ and define one chart by $\phi_1 : S^1 - \{P\} \to (0, 2\pi)$, $\phi_1(p) = \theta_1$, with $\theta_1$ defined by Fig. 3.

Figure 3. Definition of $\theta_1$

Now let $Q$ be the point $(-1, 0)$ and define a second chart by $\phi_2 : S^1 - \{Q\} \to (-\pi, \pi)$, $\phi_2(p) = \theta_2$, where $\theta_2$ is defined by Fig. 4.

Figure 4. Definition of $\theta_2$

Neither chart covers all of $S^1$ but together they form an atlas. The charts overlap on the "upper" semi-circle and on the "lower" semi-circle. On the first of these we have $\theta_2 = \phi_2 \circ \phi_1^{-1}(\theta_1) = \theta_1$. On the second we have $\theta_2 = \phi_2 \circ \phi_1^{-1}(\theta_1) = \theta_1 - 2\pi$. These are obviously smooth functions.

3. $S^2$: the two-dimensional sphere defined by the surface $x^2 + y^2 + z^2 = 1$ in $\mathbb{R}^3$. Introduce spherical polar coordinates in the usual way:
$$x = \sin\theta \cos\phi, \qquad y = \sin\theta \sin\phi, \qquad z = \cos\theta \tag{1.1}$$

These define $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi)$ uniquely, but the map $(x, y, z) \mapsto (\theta, \phi)$ does not define a chart because $[0, \pi] \times [0, 2\pi)$ is not an open set. Note that points with $\theta = 0, \pi$ or $\phi = 0$ are the points on $S^2$ with $y = 0$, $x \geq 0$, which form a line of longitude as shown in Fig. 5. Let $O$ be $S^2$ with this line removed. Then we have a chart $\psi : O \to U = (0, \pi) \times (0, 2\pi) \subset \mathbb{R}^2$, given by $\psi : (x, y, z) \mapsto (\theta, \phi)$.

Figure 5. The subset $O \subset S^2$: points with $\theta = 0, \pi$ and $\phi = 0$ are removed.

We can define a second chart using a different set of spherical polar coordinates defined as follows:
$$x = -\sin\theta' \cos\phi', \qquad y = \cos\theta', \qquad z = \sin\theta' \sin\phi', \tag{1.2}$$
where $\theta' \in (0, \pi)$ and $\phi' \in (0, 2\pi)$ are uniquely defined by these equations. This is a chart $\psi' : O' \to U'$, where $O'$ is $S^2$ with the line $z = 0$, $x \leq 0$ removed (see Fig. 6), and $U'$ is $(0, \pi) \times (0, 2\pi)$. Clearly $S^2 = O \cup O'$. The functions $\psi \circ \psi'^{-1}$ and $\psi' \circ \psi^{-1}$ are smooth on $\psi'(O \cap O')$ and $\psi(O \cap O')$ respectively, so these two charts define an atlas for $S^2$.

Figure 6. The subset $O' \subset S^2$: points with $\theta' = 0, \pi$ and $\phi' = 0$ are removed.

4. Recall the Cartesian product of two sets $X$ and $Y$ is defined as $X \times Y = \{(x, y) : x \in X, y \in Y\}$. Given differentiable manifolds $M$ and $M'$ of dimensions $n$ and $n'$ we can make $M \times M'$ into a differentiable manifold of dimension $n + n'$ as follows. Let the charts of our two manifolds be $\phi_\alpha : O_\alpha \to U_\alpha \subset \mathbb{R}^n$ and $\phi'_\alpha : O'_\alpha \to U'_\alpha \subset \mathbb{R}^{n'}$. Now define $\phi_{\alpha\beta} : O_\alpha \times O'_\beta \to U_\alpha \times U'_\beta \subset \mathbb{R}^{n+n'}$ by $\phi_{\alpha\beta}(p, q) = (\phi_\alpha(p), \phi'_\beta(q))$. You can check that the set of $\phi_{\alpha\beta}$ forms an atlas for $M \times M'$. This can be generalized to more than 2 factors in the obvious way, e.g. we can define the $n$-torus as $T^n = S^1 \times S^1 \times \cdots \times S^1$ ($n$ factors).
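The transition functions in Example 2 can be checked numerically. The following is a minimal Python sketch, assuming a point of $S^1$ is represented by its Cartesian coordinates $(x, y)$ and that the charts are realized with `math.atan2`; the names `phi1`, `phi2`, `phi1_inverse` and `transition` are hypothetical, chosen only for illustration.

```python
import math

TWO_PI = 2.0 * math.pi

def phi1(x, y):
    """Chart phi_1 on S^1 - {P}, P = (1, 0): returns theta_1 in (0, 2*pi)."""
    t = math.atan2(y, x)              # atan2 returns a value in (-pi, pi]
    return t if t > 0 else t + TWO_PI

def phi2(x, y):
    """Chart phi_2 on S^1 - {Q}, Q = (-1, 0): returns theta_2 in (-pi, pi)."""
    return math.atan2(y, x)

def phi1_inverse(theta1):
    """Inverse of phi_1: the point (cos theta_1, sin theta_1) on S^1."""
    return math.cos(theta1), math.sin(theta1)

def transition(theta1):
    """phi_2 o phi_1^{-1}, defined on phi_1(overlap) = (0, pi) U (pi, 2*pi)."""
    return phi2(*phi1_inverse(theta1))

for theta1 in (0.5, 2.0, 3.5, 5.5):
    half = "upper" if theta1 < math.pi else "lower"
    # Expect theta_2 = theta_1 on the upper half and theta_1 - 2*pi on the lower half.
    print(f"{half}: theta1={theta1:.2f} -> theta2={transition(theta1):.4f}")
```

The two branches of the transition map differ only by the constant $2\pi$, which is why both are smooth.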

Remark. A given set $M$ may admit many atlases, e.g., one can simply add extra charts to an atlas. We don’t want to regard this as producing a distinct manifold, so we make the following definition:
Definition. Two atlases are compatible if their union is also an atlas. The union of all atlases compatible with a given atlas is called a complete atlas: it is an atlas which is not contained in any other atlas.
Remark. We will always assume that we are dealing with a complete atlas. (None of the above examples gives a complete atlas; such atlases necessarily contain infinitely many charts.)

1.3 Smooth functions

We will need the notion of a smooth function on a smooth manifold. If $\phi : O \to U$ is a chart and $f : M \to \mathbb{R}$ then note that $f \circ \phi^{-1}$ is a map from $U$, i.e., a subset of $\mathbb{R}^n$, to $\mathbb{R}$.
Definition. A function $f : M \to \mathbb{R}$ is smooth if, and only if, for any chart $\phi$, $F \equiv f \circ \phi^{-1} : U \to \mathbb{R}$ is a smooth function.


Remark. In GR, a function $f : M \to \mathbb{R}$ is sometimes called a scalar field.

Examples.

1. Consider the example of $S^1$ discussed above. Let $f : S^1 \to \mathbb{R}$ be defined by $f(x, y) = x$, where $(x, y)$ are the Cartesian coordinates in $\mathbb{R}^2$ labelling a point on $S^1$. In the first chart $\phi_1$ we have $f \circ \phi_1^{-1}(\theta_1) = f(\cos\theta_1, \sin\theta_1) = \cos\theta_1$, which is smooth. Similarly $f \circ \phi_2^{-1}(\theta_2) = \cos\theta_2$ is also smooth. If $\phi$ is any other chart then we can write $f \circ \phi^{-1} = (f \circ \phi_i^{-1}) \circ (\phi_i \circ \phi^{-1})$, which is smooth because we’ve just seen that the $f \circ \phi_i^{-1}$ are smooth, and $\phi_i \circ \phi^{-1}$ is smooth from the definition of a manifold. Hence $f$ is a smooth function.

2. Consider a manifold $M$ with a chart $\phi : O \to U \subset \mathbb{R}^n$. Let $\phi : p \mapsto (x^1(p), x^2(p), \ldots, x^n(p))$. Then we can regard $x^1$ (say) as a function on the subset $O$ of $M$. Is it a smooth function? Yes: for any other chart $\phi'$, $x^1 \circ \phi'^{-1}$ is smooth because it is the first component of the map $\phi \circ \phi'^{-1}$, which is smooth by the definition of a manifold.

3. Often it is convenient to define a function by specifying $F$ instead of $f$. More precisely, given an atlas $\{\phi_\alpha\}$, we define $f$ by specifying functions $F_\alpha : U_\alpha \to \mathbb{R}$ and then setting $f = F_\alpha \circ \phi_\alpha$. One has to make sure that the resulting definition is independent of $\alpha$ on chart overlaps. For example, for $S^1$ using the atlas discussed above, define $F_1 : (0, 2\pi) \to \mathbb{R}$ by $\theta_1 \mapsto \sin(m\theta_1)$ and $F_2 : (-\pi, \pi) \to \mathbb{R}$ by $\theta_2 \mapsto \sin(m\theta_2)$, where $m$ is an integer. On the chart overlaps we have $F_1 \circ \phi_1 = F_2 \circ \phi_2$ because $\theta_1$ and $\theta_2$ differ by a multiple of $2\pi$ on both overlaps. Hence this defines a function on $S^1$.
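Example 3 can be tested numerically. The Python sketch below samples points on the two overlaps and checks that $F_1 \circ \phi_1 = F_2 \circ \phi_2$ with $F_i(\theta) = \sin(m\theta)$; the helper names, the number of samples and the tolerance are illustrative assumptions. It also shows what goes wrong if $m$ is not an integer.

```python
import math

def theta1(x, y):
    """Chart phi_1 of Example 2 (Section 1.2), values in (0, 2*pi)."""
    t = math.atan2(y, x)
    return t if t > 0 else t + 2.0 * math.pi

def theta2(x, y):
    """Chart phi_2 of Example 2 (Section 1.2), values in (-pi, pi)."""
    return math.atan2(y, x)

def agrees_on_overlaps(m, samples=200):
    """True if sin(m*theta_1) == sin(m*theta_2) at every sampled overlap point."""
    for k in range(1, samples):
        angle = 2.0 * math.pi * k / samples
        x, y = math.cos(angle), math.sin(angle)
        if abs(y) < 1e-12:          # skip P and Q: they lie outside the overlap
            continue
        if not math.isclose(math.sin(m * theta1(x, y)),
                            math.sin(m * theta2(x, y)), abs_tol=1e-9):
            return False
    return True

print(agrees_on_overlaps(3))     # True: integer m gives a well-defined function
print(agrees_on_overlaps(0.5))   # False: the two definitions clash on the lower half
```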

Remark. After a while we will stop distinguishing between f and F , i.e., we will say
f (x) when we mean F (x).

1.4 Curves and vectors

$\mathbb{R}^n$, or Minkowski spacetime, has the structure of a vector space, e.g., it makes sense to add the position vectors of points. One can view more general vectors, e.g., the 4-velocity of a particle, as vectors in the space itself. This structure does not extend to more general manifolds, e.g., $S^2$. So we need to discuss how to define vectors on manifolds.
For a surface in $\mathbb{R}^3$, the set of all vectors tangent to the surface at some point $p$ defines the tangent plane to the surface at $p$ (see Fig. 7). This has the structure of a 2d vector space. Note that the tangent planes at two different points $p$ and $q$ are different. It does not make sense to compare a vector at $p$ with a vector at $q$. For example: if one tried to define the sum of a vector at $p$ and a vector at $q$, then to which tangent plane would the sum belong?

Figure 7. Tangent planes.

On a surface, the tangent vector to a curve in the surface is automatically tangent to the surface. We take this as our starting point for defining vectors on a general manifold. We start by defining the notion of a curve in a manifold, and then the notion of a tangent vector to a curve at a point $p$. We then show that the set of all such tangent vectors at $p$ forms a vector space $T_p(M)$. This is the analogue of the tangent plane to a surface but it makes no reference to any embedding into a higher-dimensional space.
Definition. A smooth curve in a differentiable manifold $M$ is a smooth function $\lambda : I \to M$, where $I$ is an open interval in $\mathbb{R}$ (e.g. $(0, 1)$ or $(-1, \infty)$). By this we mean that $\phi_\alpha \circ \lambda$ is a smooth map from $I$ to $\mathbb{R}^n$ for all charts $\phi_\alpha$.
Let $f : M \to \mathbb{R}$ and $\lambda : I \to M$ be a smooth function and a smooth curve respectively. Then $f \circ \lambda$ is a map from $I$ to $\mathbb{R}$. Hence we can take its derivative to obtain the rate of change of $f$ along the curve:
$$\frac{d}{dt}\left[(f \circ \lambda)(t)\right] = \frac{d}{dt}\left[f(\lambda(t))\right] \tag{1.3}$$
In $\mathbb{R}^n$ we are used to the idea that the rate of change of $f$ along the curve at a point $p$ is given by the directional derivative $X_p \cdot (\nabla f)_p$, where $X_p$ is the tangent to the curve at $p$. Note that the vector $X_p$ defines a linear map from the space of smooth functions on $\mathbb{R}^n$ to $\mathbb{R}$: $f \mapsto X_p \cdot (\nabla f)_p$. This is how we define a tangent vector to a curve in a general manifold. (We restrict to curves without self-intersections, i.e., $\lambda(t_1) \neq \lambda(t_2)$ if $t_1 \neq t_2$. A self-intersecting curve can have multiple tangent vectors at $p$.)
Definition. Let $\lambda : I \to M$ be a smooth curve without self-intersections and (wlog) $\lambda(0) = p$. The tangent vector to $\lambda$ at $p$ is the linear map $X_p$ from the space of smooth functions on $M$ to $\mathbb{R}$ defined by
$$X_p(f) = \left.\frac{d}{dt}\left[f(\lambda(t))\right]\right|_{t=0} \tag{1.4}$$

Note that this satisfies two important properties: (i) it is linear, i.e., $X_p(f + g) = X_p(f) + X_p(g)$ and $X_p(\alpha f) = \alpha X_p(f)$ for any constant $\alpha$; (ii) it satisfies the Leibniz rule $X_p(fg) = X_p(f) g(p) + f(p) X_p(g)$, where $f$ and $g$ are smooth functions and $fg$ is their product.
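To make the two properties concrete, here is a hedged numerical sketch in Python for the special case $M = \mathbb{R}^2$ with its single identity chart. The curve, the functions `f` and `g`, and the step size `h` are hypothetical choices, and the derivative in (1.4) is approximated by a central difference, so linearity and the Leibniz rule hold here only up to a small discretization error.

```python
import math

def tangent_vector(curve, h=1e-5):
    """Return X_p: f -> d/dt f(curve(t)) at t = 0, as in (1.4), via a central difference."""
    def X(f):
        return (f(*curve(h)) - f(*curve(-h))) / (2.0 * h)
    return X

# Hypothetical curve in R^2 with lambda(0) = p = (1, 0) and tangent (0, 2) at p.
curve = lambda t: (math.cos(t) + t**2, math.sin(2.0 * t))
f = lambda x, y: x**2 * y + math.exp(y)
g = lambda x, y: math.sin(x) + y**3

Xp = tangent_vector(curve)
p = curve(0.0)

# (i) linearity: X_p(f + 3g) = X_p(f) + 3 X_p(g)
linear_lhs = Xp(lambda x, y: f(x, y) + 3.0 * g(x, y))
linear_rhs = Xp(f) + 3.0 * Xp(g)

# (ii) Leibniz rule: X_p(fg) = X_p(f) g(p) + f(p) X_p(g)
leibniz_lhs = Xp(lambda x, y: f(x, y) * g(x, y))
leibniz_rhs = Xp(f) * g(*p) + f(*p) * Xp(g)

# In R^n, X_p(f) is the directional derivative (0, 2) . grad f(1, 0) = (0, 2) . (0, 2) = 4.
print(Xp(f), linear_lhs - linear_rhs, leibniz_lhs - leibniz_rhs)
```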
If $\phi = (x^1, x^2, \ldots, x^n)$ is a chart defined in a neighbourhood of $p$ and $F \equiv f \circ \phi^{-1}$ then we have $f \circ \lambda = f \circ \phi^{-1} \circ \phi \circ \lambda = F \circ (\phi \circ \lambda)$ and hence
$$X_p(f) = \left.\frac{\partial F(x)}{\partial x^\mu}\right|_{\phi(p)} \left.\frac{dx^\mu(\lambda(t))}{dt}\right|_{t=0} \tag{1.5}$$
Note that (i) the first factor on the RHS depends only on $f$ and $\phi$ and the second factor depends only on $\phi$ and $\lambda$; (ii) we are using the Einstein summation convention, i.e., $\mu$ is summed from 1 to $n$ in the above expression.
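Equation (1.5) is just the chain rule, and it can be checked numerically. The Python sketch below assumes $M = \mathbb{R}^2$, takes $\phi$ to be polar coordinates $(r, \theta)$ on the plane with the non-positive $x$-axis removed, and compares both sides of (1.5) for a hypothetical function `f` and curve `lam`, using central differences for every derivative.

```python
import math

H = 1e-5

def d_dt(func):
    """Central-difference derivative of a one-variable function at t = 0."""
    return (func(H) - func(-H)) / (2.0 * H)

def partial(F, x, mu):
    """Central-difference partial derivative dF/dx^mu at the coordinate point x."""
    plus, minus = list(x), list(x)
    plus[mu] += H
    minus[mu] -= H
    return (F(*plus) - F(*minus)) / (2.0 * H)

def phi(x, y):
    """Chart phi: (x, y) -> (r, theta), with theta in (-pi, pi)."""
    return (math.hypot(x, y), math.atan2(y, x))

def phi_inverse(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

f = lambda x, y: x * y**2 + x                    # hypothetical smooth function on M
F = lambda r, theta: f(*phi_inverse(r, theta))   # coordinate representative F = f o phi^{-1}
lam = lambda t: (1.0 + t, 1.0 + 3.0 * t + t**2)  # hypothetical curve with lam(0) = p = (1, 1)

# LHS of (1.5): X_p(f) = d/dt f(lam(t)) at t = 0 (the exact value is 8 for these choices)
lhs = d_dt(lambda t: f(*lam(t)))

# RHS of (1.5): sum over mu of dF/dx^mu at phi(p) times d x^mu(lam(t))/dt at t = 0
x_p = phi(*lam(0.0))
rhs = sum(partial(F, x_p, mu) * d_dt(lambda t, mu=mu: phi(*lam(t))[mu])
          for mu in range(2))
print(lhs, rhs)
```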
Proposition. The set of all tangent vectors at $p$ forms an $n$-dimensional vector space $T_p(M)$: the tangent space at $p$.
Proof. Consider curves $\lambda$ and $\kappa$ through $p$, wlog $\lambda(0) = \kappa(0) = p$. Let their tangent vectors at $p$ be $X_p$ and $Y_p$ respectively. We need to define addition of tangent vectors and multiplication by a constant. Let $\alpha$ and $\beta$ be constants. We define $\alpha X_p + \beta Y_p$ to be the linear map $f \mapsto \alpha X_p(f) + \beta Y_p(f)$. Next we need to show that this linear map is indeed the tangent vector to a curve through $p$. Let $\phi = (x^1, \ldots, x^n)$ be a chart defined in a neighbourhood of $p$. Consider the following curve:
$$\nu(t) = \phi^{-1}\left[\alpha(\phi(\lambda(t)) - \phi(p)) + \beta(\phi(\kappa(t)) - \phi(p)) + \phi(p)\right] \tag{1.6}$$
Note that $\nu(0) = p$. Let $Z_p$ denote the tangent vector to this curve at $p$. From equation (1.5) we have
$$\begin{aligned}
Z_p(f) &= \left.\frac{\partial F(x)}{\partial x^\mu}\right|_{\phi(p)} \left.\frac{d}{dt}\left[\alpha(x^\mu(\lambda(t)) - x^\mu(p)) + \beta(x^\mu(\kappa(t)) - x^\mu(p)) + x^\mu(p)\right]\right|_{t=0} \\
&= \left.\frac{\partial F(x)}{\partial x^\mu}\right|_{\phi(p)} \left(\alpha \left.\frac{dx^\mu(\lambda(t))}{dt}\right|_{t=0} + \beta \left.\frac{dx^\mu(\kappa(t))}{dt}\right|_{t=0}\right) \\
&= \alpha X_p(f) + \beta Y_p(f) \\
&= (\alpha X_p + \beta Y_p)(f).
\end{aligned}$$
Since this is true for any smooth function $f$, we have $Z_p = \alpha X_p + \beta Y_p$ as required. Hence $\alpha X_p + \beta Y_p$ is tangent to the curve $\nu$ at $p$. It follows that the set of tangent vectors at $p$ forms a vector space (the zero vector is realized by the curve $\lambda(t) = p$ for all $t$).
The next step is to show that this vector space is $n$-dimensional. To do this, we exhibit a basis. Let $1 \leq \mu \leq n$. Consider the curve $\lambda_\mu$ through $p$ defined by
$$\lambda_\mu(t) = \phi^{-1}(x^1(p), \ldots, x^{\mu-1}(p), x^\mu(p) + t, x^{\mu+1}(p), \ldots, x^n(p)). \tag{1.7}$$
The tangent vector to this curve at $p$ is denoted $\left(\frac{\partial}{\partial x^\mu}\right)_p$. To see why, note that, using equation (1.5),
$$\left(\frac{\partial}{\partial x^\mu}\right)_p(f) = \left.\frac{\partial F}{\partial x^\mu}\right|_{\phi(p)}. \tag{1.8}$$
The $n$ tangent vectors $\left(\frac{\partial}{\partial x^\mu}\right)_p$ are linearly independent. To see why, assume that there exist constants $\alpha^\mu$ such that $\alpha^\mu \left(\frac{\partial}{\partial x^\mu}\right)_p = 0$. Then, for any function $f$ we must have
$$\alpha^\mu \left.\frac{\partial F(x)}{\partial x^\mu}\right|_{\phi(p)} = 0. \tag{1.9}$$
Choosing $F = x^\nu$, this reduces to $\alpha^\nu = 0$. Letting this run over all values of $\nu$ we see that all of the constants $\alpha^\nu$ must vanish, which proves linear independence.
Finally we must prove that these tangent vectors span the vector space. This follows from equation (1.5), which can be rewritten
$$X_p(f) = \left.\frac{dx^\mu(\lambda(t))}{dt}\right|_{t=0} \left(\frac{\partial}{\partial x^\mu}\right)_p(f). \tag{1.10}$$
This is true for any $f$, hence
$$X_p = \left.\frac{dx^\mu(\lambda(t))}{dt}\right|_{t=0} \left(\frac{\partial}{\partial x^\mu}\right)_p, \tag{1.11}$$
i.e. $X_p$ can be written as a linear combination of the $n$ tangent vectors $\left(\frac{\partial}{\partial x^\mu}\right)_p$. These $n$ vectors therefore form a basis for $T_p(M)$, which establishes that the tangent space is $n$-dimensional. QED.
Remark. The basis $\{(\partial/\partial x^\mu)_p, \mu = 1, \ldots, n\}$ is chart-dependent: we had to choose a chart $\phi$ defined in a neighbourhood of $p$ to define it. Choosing a different chart would give a different basis for $T_p(M)$. A basis defined this way is sometimes called a coordinate basis.
Definition. Let $\{e_\mu, \mu = 1, \ldots, n\}$ be a basis for $T_p(M)$ (not necessarily a coordinate basis). We can expand any vector $X \in T_p(M)$ as $X = X^\mu e_\mu$. We call the numbers $X^\mu$ the components of $X$ with respect to this basis.
Example. Using the coordinate basis $e_\mu = (\partial/\partial x^\mu)_p$, equation (1.11) shows that the tangent vector $X_p$ to a curve $\lambda(t)$ at $p$ (where $t = 0$) has components
$$X_p^\mu = \left.\frac{dx^\mu(\lambda(t))}{dt}\right|_{t=0}. \tag{1.12}$$


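Remark. Note the placement of indices. We shall sum over repeated indices if one such index appears "upstairs" (as a superscript, e.g., $X^\mu$) and the other "downstairs" (as a subscript, e.g., $e_\mu$). (The index $\mu$ on $(\partial/\partial x^\mu)_p$ is regarded as downstairs.) If an equation involves the same index more than twice, or twice but both times upstairs or both times downstairs (e.g. $X_\mu Y_\mu$), then a mistake has been made.
Let’s consider the relationship between different coordinate bases. Let $\phi = (x^1, \ldots, x^n)$ and $\phi' = (x'^1, \ldots, x'^n)$ be two charts defined in a neighbourhood of $p$. Then, for any smooth function $f$, we have
$$\begin{aligned}
\left(\frac{\partial}{\partial x^\mu}\right)_p(f) &= \left.\frac{\partial}{\partial x^\mu}\left(f \circ \phi^{-1}\right)\right|_{\phi(p)} \\
&= \left.\frac{\partial}{\partial x^\mu}\left[(f \circ \phi'^{-1}) \circ (\phi' \circ \phi^{-1})\right]\right|_{\phi(p)}
\end{aligned}$$
Now let $F' = f \circ \phi'^{-1}$. This is a function of the coordinates $x'$. Note that the components of $\phi' \circ \phi^{-1}$ are simply the functions $x'^\mu(x)$, i.e., the primed coordinates expressed in terms of the unprimed coordinates. Hence what we have is easy to evaluate using the chain rule:
$$\begin{aligned}
\left(\frac{\partial}{\partial x^\mu}\right)_p(f) &= \left.\frac{\partial}{\partial x^\mu}\left(F'(x'(x))\right)\right|_{\phi(p)} \\
&= \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_{\phi(p)} \left.\frac{\partial F'(x')}{\partial x'^\nu}\right|_{\phi'(p)} \\
&= \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_{\phi(p)} \left(\frac{\partial}{\partial x'^\nu}\right)_p(f)
\end{aligned}$$
Hence we have
$$\left(\frac{\partial}{\partial x^\mu}\right)_p = \left.\frac{\partial x'^\nu}{\partial x^\mu}\right|_{\phi(p)} \left(\frac{\partial}{\partial x'^\nu}\right)_p \tag{1.13}$$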

This expresses one set of basis vectors in terms of the other. Let $X^\mu$ and $X'^\mu$ denote the components of a vector with respect to the two bases. Then we have
$$X = X^\nu \left(\frac{\partial}{\partial x^\nu}\right)_p = X^\nu \left.\frac{\partial x'^\mu}{\partial x^\nu}\right|_{\phi(p)} \left(\frac{\partial}{\partial x'^\mu}\right)_p \tag{1.14}$$
and hence
$$X'^\mu = X^\nu \left.\frac{\partial x'^\mu}{\partial x^\nu}\right|_{\phi(p)} \tag{1.15}$$
Elementary treatments of GR usually define a vector to be a set of numbers $\{X^\mu\}$ that transforms according to this rule under a change of coordinates. More precisely, they usually call this a "contravariant vector".
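As a concrete check of (1.15), take Cartesian coordinates $x^\mu = (x, y)$ and polar coordinates $x'^\mu = (r, \theta)$ on part of $\mathbb{R}^2$. The Python sketch below, built around a hypothetical curve `lam`, transforms the Cartesian components of its tangent vector with the Jacobian $\partial x'^\mu / \partial x^\nu$ and compares the result with (1.12) applied directly in the polar chart; the function names are illustrative only.

```python
import math

def jacobian_cart_to_polar(x, y):
    """Matrix of dx'^mu/dx^nu for x' = (r, theta) and x = (x, y); row mu, column nu."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def transform_components(J, X):
    """Equation (1.15): X'^mu = (dx'^mu/dx^nu) X^nu."""
    return [sum(J[mu][nu] * X[nu] for nu in range(len(X))) for mu in range(len(J))]

# Hypothetical curve through p = lam(0) = (2, 0); its Cartesian tangent components
# dx^mu(lam(t))/dt at t = 0 are (0, 3), worked out by hand.
lam = lambda t: (2.0 * math.cos(t), 2.0 * math.sin(t) + t)
X_cartesian = [0.0, 3.0]
X_polar = transform_components(jacobian_cart_to_polar(*lam(0.0)), X_cartesian)

# Cross-check with (1.12) in the polar chart: differentiate (r, theta) along the curve.
h = 1e-6
polar = lambda t: (math.hypot(*lam(t)), math.atan2(lam(t)[1], lam(t)[0]))
X_direct = [(a - b) / (2.0 * h) for a, b in zip(polar(h), polar(-h))]
print(X_polar, X_direct)
```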


1.5 Covectors
Recall the following from linear algebra:
Definition. Let $V$ be a real vector space. The dual space $V^*$ of $V$ is the vector space of linear maps from $V$ to $\mathbb{R}$.
Lemma. If $V$ is $n$-dimensional then so is $V^*$. If $\{e_\mu, \mu = 1, \ldots, n\}$ is a basis for $V$ then $V^*$ has a basis $\{f^\mu, \mu = 1, \ldots, n\}$, the dual basis, defined by $f^\mu(e_\nu) = \delta^\mu_\nu$ (if $X = X^\mu e_\mu$ then $f^\mu(X) = X^\nu f^\mu(e_\nu) = X^\mu$).
Since $V$ and $V^*$ have the same dimension, they are isomorphic. For example the linear map defined by $e_\mu \mapsto f^\mu$ is an isomorphism. But this is basis-dependent: a different choice of basis would give a different isomorphism. In contrast, there is a natural (basis-independent) isomorphism between $V$ and $(V^*)^*$:
Theorem. If $V$ is finite dimensional then $(V^*)^*$ is naturally isomorphic to $V$. The isomorphism is $\Phi : V \to (V^*)^*$ where $\Phi(X)(\omega) = \omega(X)$ for all $\omega \in V^*$.
Now we return to manifolds:
Definition. The dual space of $T_p(M)$ is denoted $T_p^*(M)$ and called the cotangent space at $p$. An element of this space is called a covector at $p$. If $\{e_\mu\}$ is a basis for $T_p(M)$ and $\{f^\mu\}$ is the dual basis then we can expand a covector $\eta$ as $\eta = \eta_\mu f^\mu$. The numbers $\eta_\mu$ are called the components of $\eta$.
Note that (i) $\eta(e_\mu) = \eta_\nu f^\nu(e_\mu) = \eta_\mu$; (ii) if $X \in T_p(M)$ then $\eta(X) = \eta(X^\mu e_\mu) = X^\mu \eta(e_\mu) = X^\mu \eta_\mu$ (note the placement of indices!).
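The lemma on dual bases has a simple matrix form: if the basis vectors $e_\mu$ are stored as the columns of an invertible matrix $E$ (components relative to some fixed reference basis, an assumption made only for this illustration), then the dual basis covectors $f^\mu$ are the rows of $E^{-1}$. The Python sketch below uses exact `fractions.Fraction` arithmetic and a hand-written Gauss-Jordan inverse; the particular matrix and components are hypothetical.

```python
from fractions import Fraction

def inverse(M):
    """Gauss-Jordan inverse of an invertible square matrix with Fraction entries."""
    n = len(M)
    # Augment M with the identity and row-reduce [M | I] to [I | M^{-1}].
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Hypothetical basis e_mu of a 3-dimensional vector space: e_mu is the mu-th COLUMN of E.
E = [[Fraction(2), Fraction(1), Fraction(0)],
     [Fraction(0), Fraction(1), Fraction(1)],
     [Fraction(1), Fraction(0), Fraction(1)]]
n = len(E)
e = [[E[i][mu] for i in range(n)] for mu in range(n)]   # e[mu] = components of e_mu
f = inverse(E)                                          # f[mu] = components of f^mu (a row)

def pair(covector, vector):
    """omega(X) = omega_i X^i, computed in the reference basis."""
    return sum(w * v for w, v in zip(covector, vector))

# Defining property of the dual basis: f^mu(e_nu) = delta^mu_nu.
delta = [[pair(f[mu], e[nu]) for nu in range(n)] for mu in range(n)]

# If X = X^mu e_mu then f^mu(X) = X^mu.
X_components = [Fraction(3), Fraction(-1), Fraction(5)]
X = [sum(X_components[mu] * e[mu][i] for mu in range(n)) for i in range(n)]
recovered = [pair(f[mu], X) for mu in range(n)]
print(delta == [[int(mu == nu) for nu in range(n)] for mu in range(n)],
      recovered == X_components)   # True True
```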
Definition. Let $f : M \to \mathbb{R}$ be a smooth function. Define a covector $(df)_p$ by $(df)_p(X) = X(f)$ for any vector $X \in T_p(M)$. $(df)_p$ is the gradient of $f$ at $p$.
Remarks.

1. Let $f$ be a constant. Then $X(f) = 0$ for any $X$, so $(df)_p = 0$ for any $p$.

2. Let $(x^1, \ldots, x^n)$ be a coordinate chart defined in a neighbourhood of $p$. Recall that $x^\mu$ is a smooth function (in this neighbourhood), so we can take $f = x^\mu$ in the above definition to define $n$ covectors $(dx^\mu)_p$. Note that
$$(dx^\mu)_p\left(\left(\frac{\partial}{\partial x^\nu}\right)_p\right) = \left(\frac{\partial x^\mu}{\partial x^\nu}\right)_p = \delta^\mu_\nu \tag{1.16}$$
Hence $\{(dx^\mu)_p\}$ is the dual basis of $\{(\partial/\partial x^\mu)_p\}$.

3. To explain why we call $(df)_p$ the gradient of $f$ at $p$, observe that its components in a coordinate basis are
$$[(df)_p]_\mu = (df)_p\left(\left(\frac{\partial}{\partial x^\mu}\right)_p\right) = \left(\frac{\partial}{\partial x^\mu}\right)_p(f) = \left.\frac{\partial F}{\partial x^\mu}\right|_{\phi(p)} \tag{1.17}$$
where the first equality uses (i) above, the second equality is the definition of $(df)_p$ and the final equality uses (1.8).

Exercise. Consider two different charts $\phi = (x^1, \ldots, x^n)$ and $\phi' = (x'^1, \ldots, x'^n)$ defined in a neighbourhood of $p$. Show that
$$(dx^\mu)_p = \left.\frac{\partial x^\mu}{\partial x'^\nu}\right|_{\phi'(p)} (dx'^\nu)_p, \tag{1.18}$$
and hence that, if $\omega_\mu$ and $\omega'_\mu$ are the components of $\omega \in T_p^*(M)$ w.r.t. the two coordinate bases, then
$$\omega'_\mu = \left.\frac{\partial x^\nu}{\partial x'^\mu}\right|_{\phi'(p)} \omega_\nu. \tag{1.19}$$
Elementary treatments of GR take this as the definition of a covector, which they usually call a "covariant vector".
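Equation (1.19) can be checked in Cartesian coordinates $(x, y)$ and polar coordinates $(r, \theta)$ on part of $\mathbb{R}^2$. The Python sketch below takes $\omega = df$ for a hypothetical function $f = x^2 y$, transforms its Cartesian components with $\partial x^\nu/\partial x'^\mu$, compares them with the polar-coordinate gradient computed by central differences, and confirms that the pairing $\omega_\mu X^\mu$ does not depend on the chart. The point `(r, theta)` and the components of `X` are arbitrary choices.

```python
import math

def dx_dxprime(r, theta):
    """dx^nu/dx'^mu for x = (x, y), x' = (r, theta), stored as K[nu][mu]."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def dxprime_dx(r, theta):
    """dx'^mu/dx^nu, stored as J[mu][nu]; the inverse of the matrix above."""
    return [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta) / r, math.cos(theta) / r]]

r, theta = 2.0, math.pi / 3                # hypothetical point p in polar coordinates
x, y = r * math.cos(theta), r * math.sin(theta)

# omega = df for the hypothetical f = x^2 y, so omega_nu = (2xy, x^2) in Cartesian coordinates.
omega = [2.0 * x * y, x * x]

# (1.19): omega'_mu = (dx^nu/dx'^mu) omega_nu
K = dx_dxprime(r, theta)
omega_polar = [sum(K[nu][mu] * omega[nu] for nu in range(2)) for mu in range(2)]

# Cross-check: the polar components of df are the partials of F'(r, theta) = f(r cos, r sin).
F_polar = lambda rr, th: (rr * math.cos(th)) ** 2 * (rr * math.sin(th))
h = 1e-6
grad_polar = [(F_polar(r + h, theta) - F_polar(r - h, theta)) / (2.0 * h),
              (F_polar(r, theta + h) - F_polar(r, theta - h)) / (2.0 * h)]

# The pairing omega(X) = omega_mu X^mu is the same in both bases.
X = [1.0, -0.5]                            # hypothetical Cartesian components X^nu
J = dxprime_dx(r, theta)
X_polar = [sum(J[mu][nu] * X[nu] for nu in range(2)) for mu in range(2)]
pairing_cartesian = sum(w * v for w, v in zip(omega, X))
pairing_polar = sum(w * v for w, v in zip(omega_polar, X_polar))
print(omega_polar, grad_polar)
print(pairing_cartesian, pairing_polar)
```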

1.6 Abstract index notation

So far, we have used Greek letters $\mu, \nu, \ldots$ to denote components of vectors or covectors with respect to a basis, and also to label the basis vectors themselves (e.g. $e_\mu$). Equations involving such indices are assumed to hold only in that basis. For example an equation of the form $X^\mu = \delta^\mu_1$ says that, in a particular basis, a vector $X$ has only a single non-vanishing component. This will not be true in other bases. Furthermore, if we were just presented with this equation, we would not even know whether or not the quantities $\{X^\mu\}$ are the components of a vector or just a set of $n$ numbers.
The abstract index notation uses Latin letters $a, b, c, \ldots$. A vector $X$ is denoted $X^a$ or $X^b$ or $X^c$ etc. The letter used in the superscript does not matter. What matters is that there is a superscript Latin letter. This tells us that the object in question is a vector. We emphasize: $X^a$ represents the vector itself, not a component of the vector. Similarly we denote a covector $\eta$ by $\eta_a$ (or $\eta_b$ etc).
The idea is that an equation written using Latin indices is basis-independent. More precisely, if we have some equation written with Greek indices, and we know that it is true for any basis, then we can replace Greek indices with Latin indices (e.g. $\mu \to a$, $\nu \to b$ etc). For example, in any basis we have $\eta(X) = \eta_\mu X^\mu = X^\mu \eta_\mu$, hence we can write $\eta(X) = \eta_a X^a = X^a \eta_a$. Similarly, if $f$ is a smooth function then $X(f) = df(X) = X^a (df)_a$.
Conversely, if we have an equation involving abstract indices then we can obtain an equation valid in any particular basis simply by replacing the abstract indices by basis indices (e.g. $a \to \mu$, $b \to \nu$ etc.). For example if $\eta_a X^a = 2$ then in any basis we have $\eta_\mu X^\mu = 2$.
Latin indices must respect the rules of the summation convention, so equations of the form $\eta_a \eta_a = 1$ or $\eta_b = 2$ do not make sense.

1.7 Tensors
In Newtonian physics, you are familiar with the idea that certain physical quantities
are described by tensors (e.g. the inertia tensor). You may have encountered the idea
that the Maxwell field in special relativity is described by a tensor. Tensors are very
important in GR because the curvature of spacetime is described with tensors. In this
section we shall define tensors at a point p and explain some of their basic properties.
Definition. A tensor of type $(r, s)$ at $p$ is a multilinear map
$$T : T_p^*(M) \times \cdots \times T_p^*(M) \times T_p(M) \times \cdots \times T_p(M) \to \mathbb{R}, \tag{1.20}$$
where there are $r$ factors of $T_p^*(M)$ and $s$ factors of $T_p(M)$. (Multilinear means that the map is linear in each argument.)
In other words, given $r$ covectors and $s$ vectors, a tensor of type $(r, s)$ produces a real number.
Examples.

1. A tensor of type $(0, 1)$ is a linear map $T_p(M) \to \mathbb{R}$, i.e., it is a covector.

2. A tensor of type $(1, 0)$ is a linear map $T_p^*(M) \to \mathbb{R}$, i.e., it is an element of $(T_p^*(M))^*$, but this is naturally isomorphic to $T_p(M)$, hence a tensor of type $(1, 0)$ is a vector. To see how this works, given a vector $X \in T_p(M)$ we define a linear map $T_p^*(M) \to \mathbb{R}$ by $\eta \mapsto \eta(X)$ for any $\eta \in T_p^*(M)$.

3. We can define a $(1, 1)$ tensor $\delta$ by
$$\delta(\omega, X) = \omega(X) \tag{1.21}$$
for any covector $\omega$ and vector $X$.

Definition. Let $T$ be a tensor of type $(r, s)$ at $p$. If $\{e_\mu\}$ is a basis for $T_p(M)$ with dual basis $\{f^\mu\}$ then the components of $T$ in this basis are the numbers
$$T^{\mu_1 \mu_2 \ldots \mu_r}{}_{\nu_1 \nu_2 \ldots \nu_s} = T(f^{\mu_1}, f^{\mu_2}, \ldots, f^{\mu_r}, e_{\nu_1}, e_{\nu_2}, \ldots, e_{\nu_s}) \tag{1.22}$$
In the abstract index notation, we denote $T$ by $T^{a_1 a_2 \ldots a_r}{}_{b_1 b_2 \ldots b_s}$.

Remark. Tensors of type $(r, s)$ at $p$ can be added together and multiplied by a constant, hence they form a vector space. Since such a tensor has $n^{r+s}$ components, it is clear that this vector space has dimension $n^{r+s}$.
Examples.

1. Consider the tensor $\delta$ defined above. Its components are
$$\delta^\mu{}_\nu = \delta(f^\mu, e_\nu) = f^\mu(e_\nu) = \delta^\mu_\nu, \tag{1.23}$$
where the RHS is a Kronecker delta. This is true in any basis, so in the abstract index notation we write $\delta$ as $\delta^a_b$.

2. Consider a $(2, 1)$ tensor. Let $\eta$ and $\omega$ be covectors and $X$ a vector. Then in our basis we have
$$T(\eta, \omega, X) = T(\eta_\mu f^\mu, \omega_\nu f^\nu, X^\rho e_\rho) = \eta_\mu \omega_\nu X^\rho T(f^\mu, f^\nu, e_\rho) = T^{\mu\nu}{}_\rho \eta_\mu \omega_\nu X^\rho \tag{1.24}$$
Now the basis we chose was arbitrary, hence we can immediately convert this to a basis-independent equation using the abstract index notation:
$$T(\eta, \omega, X) = T^{ab}{}_c \eta_a \omega_b X^c. \tag{1.25}$$
This formula generalizes in the obvious way to a $(r, s)$ tensor.
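In components, (1.24) is a finite sum, and feeding basis elements into $T$ recovers its components as in (1.22). The Python sketch below stores the components $T^{\mu\nu}{}_\rho$ of a hypothetical $(2, 1)$ tensor in dimension $n = 2$ as a nested list `T[mu][nu][rho]`; that storage layout and all the numbers are assumptions of the sketch.

```python
from itertools import product

n = 2

def evaluate(T, eta, omega, X):
    """T(eta, omega, X) = T^{mu nu}_rho eta_mu omega_nu X^rho, equation (1.24)."""
    return sum(T[mu][nu][rho] * eta[mu] * omega[nu] * X[rho]
               for mu, nu, rho in product(range(n), repeat=3))

def unit(i):
    """Components of the i-th basis element (e_i or f^i) in that same basis."""
    return [1 if j == i else 0 for j in range(n)]

# Hypothetical components T^{mu nu}_rho, indexed T[mu][nu][rho].
T = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]]]

# (1.22): T(f^mu, f^nu, e_rho) returns the component T^{mu nu}_rho.
recovered = [[[evaluate(T, unit(mu), unit(nu), unit(rho)) for rho in range(n)]
              for nu in range(n)] for mu in range(n)]

eta, omega, X = [1, -1], [2, 0], [3, 1]
value = evaluate(T, eta, omega, X)
print(recovered == T, value)   # True -32
```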

We have discussed the transformation of vector and covector components under a change of coordinate basis. Let’s now examine how tensor components transform under an arbitrary change of basis. Let $\{e_\mu\}$ and $\{e'_\mu\}$ be two bases for $T_p(M)$. Let $\{f^\mu\}$ and $\{f'^\mu\}$ denote the corresponding dual bases. Expanding the primed bases in terms of the unprimed bases gives
$$f'^\mu = A^\mu{}_\nu f^\nu, \qquad e'_\mu = B^\nu{}_\mu e_\nu \tag{1.26}$$
for some matrices $A^\mu{}_\nu$ and $B^\nu{}_\mu$. These matrices are related because:
$$\delta^\mu_\nu = f'^\mu(e'_\nu) = A^\mu{}_\rho f^\rho(B^\sigma{}_\nu e_\sigma) = A^\mu{}_\rho B^\sigma{}_\nu f^\rho(e_\sigma) = A^\mu{}_\rho B^\sigma{}_\nu \delta^\rho_\sigma = A^\mu{}_\rho B^\rho{}_\nu. \tag{1.27}$$
Hence $B^\mu{}_\nu = (A^{-1})^\mu{}_\nu$. For a change between coordinate bases, our previous results give
$$A^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}, \qquad B^\mu{}_\nu = \frac{\partial x^\mu}{\partial x'^\nu} \tag{1.28}$$
and these matrices are indeed inverses of each other (from the chain rule).
Exercise. Show that under an arbitrary change of basis, the components of a vector $X$ and a covector $\eta$ transform as
$$X'^\mu = A^\mu{}_\nu X^\nu, \qquad \eta'_\mu = (A^{-1})^\nu{}_\mu \eta_\nu. \tag{1.29}$$
Show that the components of a $(2, 1)$ tensor $T$ transform as
$$T'^{\mu\nu}{}_\rho = A^\mu{}_\sigma A^\nu{}_\tau (A^{-1})^\lambda{}_\rho T^{\sigma\tau}{}_\lambda. \tag{1.30}$$
The corresponding result for a $(r, s)$ tensor is an obvious generalization of this formula.
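A quick way to see that (1.29) and (1.30) fit together is to check that the number $T(\eta, \omega, X)$ comes out the same in both bases. The Python sketch below does this with exact `fractions.Fraction` arithmetic in dimension $n = 2$; the change-of-basis matrix $A^\mu{}_\nu$ and all components are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

n = 2
A = [[Fr(2), Fr(1)],
     [Fr(1), Fr(1)]]                    # hypothetical A^mu_nu (row mu, column nu), det = 1
Ainv = [[Fr(1), Fr(-1)],
        [Fr(-1), Fr(2)]]                # (A^{-1})^mu_nu, the inverse of A

def transform_vector(X):
    """X'^mu = A^mu_nu X^nu, first half of (1.29)."""
    return [sum(A[mu][nu] * X[nu] for nu in range(n)) for mu in range(n)]

def transform_covector(eta):
    """eta'_mu = (A^{-1})^nu_mu eta_nu, second half of (1.29)."""
    return [sum(Ainv[nu][mu] * eta[nu] for nu in range(n)) for mu in range(n)]

def transform_21(T):
    """(1.30): T'^{mu nu}_rho = A^mu_sigma A^nu_tau (A^{-1})^lambda_rho T^{sigma tau}_lambda."""
    return [[[sum(A[mu][sigma] * A[nu][tau] * Ainv[lam][rho] * T[sigma][tau][lam]
                  for sigma, tau, lam in product(range(n), repeat=3))
              for rho in range(n)] for nu in range(n)] for mu in range(n)]

def evaluate(T, eta, omega, X):
    """T(eta, omega, X) = T^{mu nu}_rho eta_mu omega_nu X^rho, equation (1.24)."""
    return sum(T[mu][nu][rho] * eta[mu] * omega[nu] * X[rho]
               for mu, nu, rho in product(range(n), repeat=3))

T = [[[Fr(1), Fr(2)], [Fr(3), Fr(4)]],
     [[Fr(5), Fr(6)], [Fr(7), Fr(8)]]]
eta, omega, X = [Fr(1), Fr(-1)], [Fr(2), Fr(0)], [Fr(3), Fr(1)]

before = evaluate(T, eta, omega, X)
after = evaluate(transform_21(T), transform_covector(eta),
                 transform_covector(omega), transform_vector(X))
print(before, after)   # -32 -32
```

Each factor of $A$ attached to an upper index is cancelled by the $A^{-1}$ carried by the corresponding covector, and vice versa, which is exactly why the result is basis-independent.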
Given a $(r, s)$ tensor, we can construct a $(r-1, s-1)$ tensor by contraction. This is easiest to demonstrate with an example.
Example. Let $T$ be a tensor of type $(3, 2)$. Define a new tensor $S$ of type $(2, 1)$ as follows
$$S(\omega, \eta, X) = T(f^\mu, \omega, \eta, e_\mu, X) \tag{1.31}$$
where $\{e_\mu\}$ is a basis and $\{f^\mu\}$ is the dual basis, $\omega$ and $\eta$ are arbitrary covectors and $X$ is an arbitrary vector. This definition is basis-independent because
$$\begin{aligned}
T(f'^\mu, \omega, \eta, e'_\mu, X) &= T(A^\mu{}_\nu f^\nu, \omega, \eta, (A^{-1})^\rho{}_\mu e_\rho, X) \\
&= (A^{-1})^\rho{}_\mu A^\mu{}_\nu T(f^\nu, \omega, \eta, e_\rho, X) \\
&= T(f^\mu, \omega, \eta, e_\mu, X),
\end{aligned}$$
where the last step uses $(A^{-1})^\rho{}_\mu A^\mu{}_\nu = \delta^\rho_\nu$.
The components of $S$ and $T$ are related by $S^{\mu\nu}{}_\rho = T^{\sigma\mu\nu}{}_{\sigma\rho}$ in any basis. Since this is true in any basis, we can write it using the abstract index notation as
$$S^{ab}{}_c = T^{dab}{}_{dc} \tag{1.32}$$
Note that there are other $(2, 1)$ tensors that we can build from $T^{abc}{}_{de}$. For example, there is $T^{abd}{}_{cd}$, which corresponds to replacing the RHS of (1.31) with $T(\omega, \eta, f^\mu, X, e_\mu)$. The abstract index notation makes it clear how many different tensors can be defined this way: we can define a new tensor by "contracting" any upstairs index with any downstairs index.
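The basis-independence of contraction can also be checked numerically: contracting and then changing basis should give the same components as changing basis and then contracting. The Python sketch below does this for a hypothetical $(3, 2)$ tensor in dimension $n = 2$, storing components in a dictionary keyed by index tuples (upper indices first, then lower), an ad hoc layout chosen for the sketch; the change-of-basis matrix is also hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

n = 2
A = [[Fr(2), Fr(1)], [Fr(1), Fr(1)]]          # hypothetical A^mu_nu (det = 1)
Ainv = [[Fr(1), Fr(-1)], [Fr(-1), Fr(2)]]     # (A^{-1})^mu_nu

def transform(T, r, s):
    """Primed components of an (r, s) tensor: a factor A^new_old for each upper index
    and (A^{-1})^old_new for each lower index, generalizing (1.29)-(1.30)."""
    out = {}
    for new in product(range(n), repeat=r + s):
        total = Fr(0)
        for old in product(range(n), repeat=r + s):
            term = T[old]
            for i in range(r):
                term *= A[new[i]][old[i]]
            for i in range(r, r + s):
                term *= Ainv[old[i]][new[i]]
            total += term
        out[new] = total
    return out

def contract(T):
    """(1.32): S^{ab}_c = T^{dab}_{dc}, contracting the 1st upper with the 1st lower index."""
    return {(a, b, c): sum(T[(d, a, b, d, c)] for d in range(n))
            for a, b, c in product(range(n), repeat=3)}

# Hypothetical (3, 2) components T^{abc}_{de}, keyed (a, b, c, d, e).
T = {idx: Fr((idx[0] + 2 * idx[1] + 3 * idx[2]) * (1 + idx[3]) - 2 * idx[4])
     for idx in product(range(n), repeat=5)}

contract_then_transform = transform(contract(T), 2, 1)
transform_then_contract = contract(transform(T, 3, 2))
print(contract_then_transform == transform_then_contract)   # True
```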
Another important way of constructing new tensors is by taking the product of two tensors:
Definition. If $S$ is a tensor of type $(p, q)$ and $T$ is a tensor of type $(r, s)$ then the outer product of $S$ and $T$, denoted $S \otimes T$, is a tensor of type $(p + r, q + s)$ defined by
$$\begin{aligned}
&(S \otimes T)(\omega_1, \ldots, \omega_p, \eta_1, \ldots, \eta_r, X_1, \ldots, X_q, Y_1, \ldots, Y_s) \\
&\qquad = S(\omega_1, \ldots, \omega_p, X_1, \ldots, X_q)\, T(\eta_1, \ldots, \eta_r, Y_1, \ldots, Y_s)
\end{aligned} \tag{1.33}$$
where $\omega_1, \ldots, \omega_p$ and $\eta_1, \ldots, \eta_r$ are arbitrary covectors and $X_1, \ldots, X_q$ and $Y_1, \ldots, Y_s$ are arbitrary vectors.
Exercise. Show that this definition is equivalent to
$$(S \otimes T)^{a_1 \ldots a_p b_1 \ldots b_r}{}_{c_1 \ldots c_q d_1 \ldots d_s} = S^{a_1 \ldots a_p}{}_{c_1 \ldots c_q}\, T^{b_1 \ldots b_r}{}_{d_1 \ldots d_s} \tag{1.34}$$
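The equivalence of (1.33) and (1.34) can be checked numerically in the smallest nontrivial case, assuming $S = X$ is a vector (type (1, 0)) and $T = \eta$ a covector (type (0, 1)), so that $(X \otimes \eta)^a{}_b = X^a \eta_b$. The sketch below, with random components in dimension 3 and hypothetical function names, evaluates the outer product on a covector $\omega$ and a vector $Y$ both ways.

```python
import random

def outer(X, eta):
    """Components (X (x) eta)^mu_nu = X^mu eta_nu: the case p = 1, s = 1 of (1.34)."""
    return [[x * e for e in eta] for x in X]

def apply_11(T, omega, Y):
    """Evaluate a (1,1) tensor on (omega, Y): T^mu_nu omega_mu Y^nu."""
    n = len(Y)
    return sum(T[m][v] * omega[m] * Y[v] for m in range(n) for v in range(n))

random.seed(2)
n = 3
X, eta, omega, Y = ([random.uniform(-1, 1) for _ in range(n)] for _ in range(4))

lhs = apply_11(outer(X, eta), omega, Y)
# Definition (1.33): (X (x) eta)(omega, Y) = X(omega) eta(Y), with X(omega) = omega_mu X^mu
rhs = sum(o * x for o, x in zip(omega, X)) * sum(e * y for e, y in zip(eta, Y))
print(lhs, rhs)
```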

Exercise. Show that, in a basis, any (2, 1) tensor $T$ at $p$ can be written as
$$T = T^{\mu\nu}{}_\rho\, e_\mu \otimes e_\nu \otimes f^\rho \tag{1.35}$$
This generalizes in the obvious way to an (r, s) tensor.

Remark. You may be wondering why we write $T^{ab}{}_c$ instead of $T^{ab}_c$. At the moment there is no reason why we should not adopt the latter notation. However, it is convenient to generalize our definition of tensors slightly. We have defined an (r, s) tensor to be a multilinear map with r + s arguments, where the first r arguments are covectors and the final s arguments are vectors. We can generalize this by allowing the covectors and vectors to appear in any order. For example, consider a (1, 1) tensor. This is a map $T_p^*(M) \times T_p(M) \to \mathbb{R}$. But we could just as well have defined it to be a map $T_p(M) \times T_p^*(M) \to \mathbb{R}$, which defines a different type of (1, 1) tensor. The abstract index notation allows us to distinguish these two types easily: the first would be written as $T^a{}_b$ and the second as $T_a{}^b$. (2, 1) tensors come in 3 different types: $T^{ab}{}_c$, $T^a{}_b{}^c$ and $T_a{}^{bc}$. Each type of (r, s) tensor gives a vector space of dimension $n^{r+s}$, but these vector spaces are naturally isomorphic so often one does not bother to distinguish between them.
There is a final type of tensor operation that we shall need: symmetrization and
antisymmetrization. Consider a (0, 2) tensor T . We can define two other (0, 2) tensors
S and A as follows:
$$S(X, Y) = \tfrac{1}{2}\big(T(X, Y) + T(Y, X)\big), \qquad A(X, Y) = \tfrac{1}{2}\big(T(X, Y) - T(Y, X)\big), \tag{1.36}$$
where $X$ and $Y$ are vectors at $p$. In abstract index notation:
$$S_{ab} = \tfrac{1}{2}(T_{ab} + T_{ba}), \qquad A_{ab} = \tfrac{1}{2}(T_{ab} - T_{ba}). \tag{1.37}$$


In a basis, we can regard the components of $T$ as a square matrix. The components of $S$ and $A$ are just the symmetric and antisymmetric parts of this matrix. It is convenient to introduce some notation to describe the operations we have just defined: we write
$$T_{(ab)} = \tfrac{1}{2}(T_{ab} + T_{ba}), \qquad T_{[ab]} = \tfrac{1}{2}(T_{ab} - T_{ba}). \tag{1.38}$$
These operations can be applied to more general tensors. For example,
$$T^{(ab)c}{}_d = \tfrac{1}{2}\left(T^{abc}{}_d + T^{bac}{}_d\right). \tag{1.39}$$
We can also symmetrize or antisymmetrize on more than 2 indices. To symmetrize on $n$ indices, we sum over all permutations of these indices and divide the result by $n!$ (the number of permutations). To antisymmetrize we do the same but we weight each term in the sum by the sign of the permutation. The indices that we symmetrize over must be either all upstairs or all downstairs; they cannot be a mixture. For example,
$$T^{(abc)d} = \frac{1}{3!}\left(T^{abcd} + T^{bcad} + T^{cabd} + T^{bacd} + T^{cbad} + T^{acbd}\right), \tag{1.40}$$
$$T^a{}_{[bcd]} = \frac{1}{3!}\left(T^a{}_{bcd} + T^a{}_{cdb} + T^a{}_{dbc} - T^a{}_{cbd} - T^a{}_{dcb} - T^a{}_{bdc}\right). \tag{1.41}$$
Sometimes we might wish to (anti)symmetrize over indices which are not adjacent. In this case, we use vertical bars to denote indices excluded from the (anti)symmetrization. For example,
$$T_{(a|bc|d)} = \tfrac{1}{2}\left(T_{abcd} + T_{dbca}\right). \tag{1.42}$$
Exercise. Show that $T^{(ab)} X_{[a|cd|b]} = 0$.
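The (anti)symmetrization rule above (sum over permutations, weight by the sign for antisymmetrization, divide by the number of permutations) can be sketched in a few lines of Python. This is an illustrative implementation under our own conventions, not one from the notes: a tensor whose indices are all in the same position is stored as a dictionary from index tuples to numbers, and the hypothetical function `project` acts on all of its indices.

```python
import itertools
import math
import random

def perm_sign(p):
    """Sign (+1 or -1) of a permutation p of (0, ..., k-1), from its inversion count."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def project(T, dim, rank, antisym=False):
    """Symmetrize (default) or antisymmetrize T on all of its `rank` indices.

    T maps index tuples (i_1, ..., i_rank), each entry in range(dim), to numbers.
    """
    out = {}
    for idx in itertools.product(range(dim), repeat=rank):
        total = 0.0
        for p in itertools.permutations(range(rank)):
            term = T[tuple(idx[k] for k in p)]  # component with permuted indices
            total += perm_sign(p) * term if antisym else term
        out[idx] = total / math.factorial(rank)  # divide by the number of permutations
    return out

random.seed(3)
dim = 3
T = {idx: random.uniform(-1, 1) for idx in itertools.product(range(dim), repeat=3)}
T_anti = project(T, dim, 3, antisym=True)  # components T_[abc]
T_sym = project(T, dim, 3)                 # components T_(abc)
print(T_anti[(0, 1, 2)], T_anti[(1, 0, 2)])  # equal magnitude, opposite sign
```

Swapping two indices of $T_{[abc]}$ flips its sign, components with a repeated index vanish, and antisymmetrizing an already symmetric tensor gives zero; the same mechanism underlies the exercise above.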

1.8 Tensor fields

So far, we have defined vectors, covectors and tensors at a single point p. However, in
physics we shall need to consider how these objects vary in spacetime. This leads us
to define vector, covector and tensor fields.
Definition. A vector field is a map $X$ which maps any point $p \in M$ to a vector $X_p$ at $p$. Given a vector field $X$ and a function $f$ we can define a new function $X(f) : M \to \mathbb{R}$ by $X(f) : p \mapsto X_p(f)$. The vector field $X$ is smooth if $X(f)$ is a smooth function for any smooth $f$.
Example. Given any coordinate chart $\phi = (x^1, \ldots, x^n)$, the vector field $\partial/\partial x^\mu$ is defined by $p \mapsto (\partial/\partial x^\mu)_p$. Hence
$$\frac{\partial}{\partial x^\mu}(f) : p \mapsto \left(\frac{\partial F}{\partial x^\mu}\right)_{\phi(p)}, \tag{1.43}$$


where $F \equiv f \circ \phi^{-1}$. Smoothness of $f$ implies that this map defines a smooth function. Therefore $\partial/\partial x^\mu$ is a smooth vector field. (Note that $\partial/\partial x^\mu$ usually won't be defined on the whole manifold $M$ since the chart $\phi$ might not cover the whole manifold. So strictly speaking this is not a vector field on $M$ but only on a subset of $M$. We shan't worry too much about this distinction.)
Remark. Since the vectors $(\partial/\partial x^\mu)_p$ provide a basis for $T_p(M)$ at any point $p$, we can expand an arbitrary vector field as
$$X = X^\mu \frac{\partial}{\partial x^\mu} \tag{1.44}$$
Since $\partial/\partial x^\mu$ is smooth, it follows that $X$ is smooth if, and only if, its coordinate-basis components $X^\mu$ are smooth functions.
Definition. A covector field is a map $\omega$ which maps any point $p \in M$ to a covector $\omega_p$ at $p$. Given a covector field $\omega$ and a vector field $X$ we can define a function $\omega(X) : M \to \mathbb{R}$ by $\omega(X) : p \mapsto \omega_p(X_p)$. The covector field $\omega$ is smooth if this function is smooth for any smooth vector field $X$.
Example. Let $f$ be a smooth function. We have defined $(df)_p$ above. Now we let $p$ vary to define a covector field $df$. Let $X$ be a smooth vector field. Then $df(X) : p \mapsto (df)_p(X_p) = X_p(f)$, hence $df(X) = X(f)$. This is a smooth function of $p$ (because $X$ is smooth). Hence $df$ is a smooth covector field: the gradient of $f$.
Remark. Taking $f = x^\mu$ reveals that $dx^\mu$ is a smooth covector field.
Definition. An (r, s) tensor field is a map $T$ which maps any point $p \in M$ to an (r, s) tensor $T_p$ at $p$. Given $r$ covector fields $\eta_1, \ldots, \eta_r$ and $s$ vector fields $X_1, \ldots, X_s$ we can define a function $T(\eta_1, \ldots, \eta_r, X_1, \ldots, X_s) : M \to \mathbb{R}$ by $p \mapsto T_p((\eta_1)_p, \ldots, (\eta_r)_p, (X_1)_p, \ldots, (X_s)_p)$. The tensor field $T$ is smooth if this function is smooth for any smooth covector fields $\eta_1, \ldots, \eta_r$ and vector fields $X_1, \ldots, X_s$.
Exercise. Show that a tensor field is smooth if, and only if, its components in a
coordinate chart are smooth functions.
Remarks. Note that we can regard a function on M as a (0, 0) tensor field. Henceforth
we shall assume that all tensor fields that we encounter are smooth.

1.9 Integral curves

In fluid mechanics, the velocity of a fluid is described by a vector field $\mathbf{u}(\mathbf{x})$ in $\mathbb{R}^3$ (we are returning to Cartesian vector notation for a moment). Consider a particle suspended in the fluid with initial position $\mathbf{x}_0$. It moves with the fluid so its position $\mathbf{x}(t)$ satisfies
$$\frac{d\mathbf{x}}{dt} = \mathbf{u}(\mathbf{x}(t)), \qquad \mathbf{x}(0) = \mathbf{x}_0. \tag{1.45}$$


The solution of this differential equation is called the integral curve of the vector field
u through x0 . The definition extends straightforwardly to a vector field on a general
manifold:
Definition. Let X be a vector field on M and p ∈ M . An integral curve of X through
p is a curve through p whose tangent at every point is X.
Let $\lambda$ denote an integral curve of $X$ with (wlog) $\lambda(0) = p$. In a coordinate chart, this definition reduces to the initial value problem
$$\frac{dx^\mu(t)}{dt} = X^\mu(x(t)), \qquad x^\mu(0) = x^\mu_p. \tag{1.46}$$
(Here we are using the abbreviation $x^\mu(t) = x^\mu(\lambda(t))$.) Since $X$ is smooth, standard ODE theory guarantees that this problem has a unique solution, at least for $t$ in some open interval containing 0. Hence there is a unique integral curve of $X$ through any point $p$.
Example. In a chart $\phi = (x^1, \ldots, x^n)$, consider $X = \partial/\partial x^1 + x^1\, \partial/\partial x^2$ and take $p$ to be the point with coordinates $(0, \ldots, 0)$. Then $dx^1/dt = 1$, $dx^2/dt = x^1$. Solving the first equation and imposing the initial condition gives $x^1 = t$; then plugging into the second equation and solving gives $x^2 = t^2/2$. The other coordinates are trivial: $x^\mu = 0$ for $\mu > 2$, so the integral curve is $t \mapsto \phi^{-1}(t, t^2/2, 0, \ldots, 0)$.
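The integral curve in the example above can also be found numerically. The sketch below integrates (1.46) with a classical fourth-order Runge-Kutta step (our choice of integrator; the notes do not prescribe one), keeping only $(x^1, x^2)$ since the remaining coordinates stay zero, and compares the result at $t = 2$ with the exact curve $(t, t^2/2)$. Function names are hypothetical.

```python
def X(x):
    """Components (X^1, X^2) of X = d/dx^1 + x^1 d/dx^2 at coordinates x = (x^1, x^2)."""
    return (1.0, x[0])

def rk4_step(F, x, h):
    """One classical Runge-Kutta step for dx/dt = F(x)."""
    k1 = F(x)
    k2 = F(tuple(xi + 0.5 * h * k for xi, k in zip(x, k1)))
    k3 = F(tuple(xi + 0.5 * h * k for xi, k in zip(x, k2)))
    k4 = F(tuple(xi + h * k for xi, k in zip(x, k3)))
    return tuple(xi + h / 6 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4))

def integral_curve(F, x0, t_end, steps=1000):
    """Coordinates x^mu(t_end) along the integral curve of F with x^mu(0) = x0."""
    x, h = tuple(x0), t_end / steps
    for _ in range(steps):
        x = rk4_step(F, x, h)
    return x

end = integral_curve(X, (0.0, 0.0), 2.0)
print(end)  # close to the exact value (t, t^2/2) = (2, 2)
```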

1.10 The commutator

Let X and Y be vector fields and f a smooth function. Since Y (f ) is a smooth function,
we can act on it with X to form a new smooth function X(Y (f )). Does the map
f 7→ X(Y (f )) define a vector field? No, because X(Y (f g)) = X(f Y (g) + gY (f )) =
f X(Y (g)) + gX(Y (f )) + X(f )Y (g) + X(g)Y (f ) so the Leibniz law is not satisfied.
However, we can also define Y (X(f )) and the combination X(Y (f )) − Y (X(f )) does
obey the Leibniz law (check!).
Definition. The commutator of two vector fields X and Y is the vector field [X, Y ]
defined by
[X, Y ](f ) = X(Y (f )) − Y (X(f )) (1.47)
for any smooth function f .
To see that this does indeed define a vector field, we can evaluate it in a coordinate


chart:

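$$\begin{aligned} [X, Y](f) &= X\left(Y^\nu \frac{\partial F}{\partial x^\nu}\right) - Y\left(X^\mu \frac{\partial F}{\partial x^\mu}\right) \\ &= X^\mu \frac{\partial}{\partial x^\mu}\left(Y^\nu \frac{\partial F}{\partial x^\nu}\right) - Y^\nu \frac{\partial}{\partial x^\nu}\left(X^\mu \frac{\partial F}{\partial x^\mu}\right) \\ &= X^\mu \frac{\partial Y^\nu}{\partial x^\mu}\frac{\partial F}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu}\frac{\partial F}{\partial x^\mu} \\ &= \left(X^\nu \frac{\partial Y^\mu}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu}\right)\frac{\partial F}{\partial x^\mu} \\ &= [X, Y]^\mu \frac{\partial}{\partial x^\mu}(f) \end{aligned}$$
(in the third line, the terms involving second derivatives of $F$ have cancelled because partial derivatives commute), where
$$[X, Y]^\mu = X^\nu \frac{\partial Y^\mu}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu}. \tag{1.48}$$
Since $f$ is arbitrary, it follows that
$$[X, Y] = [X, Y]^\mu \frac{\partial}{\partial x^\mu}. \tag{1.49}$$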

The RHS is a vector field hence [X, Y ] is a vector field whose components in a coordinate
basis are given by (1.48). (Note that we cannot write equation (1.48) in abstract index
notation because it is valid only in a coordinate basis.)
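Example. Let $X = \partial/\partial x^1$ and $Y = x^1\, \partial/\partial x^2 + \partial/\partial x^3$. The components of $X$ are constant so $[X, Y]^\mu = \partial Y^\mu/\partial x^1 = \delta^\mu_2$, so $[X, Y] = \partial/\partial x^2$.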
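Formula (1.48) can be evaluated numerically at a point, which gives a check of the example above. In this sketch (an illustration with hypothetical names), vector fields are Python functions returning their coordinate-basis components, partial derivatives are approximated by central differences with step `h`, and Python index 0 corresponds to the coordinate $x^1$.

```python
def commutator(X, Y, p, h=1e-5):
    """Coordinate components [X, Y]^mu at the point p, from (1.48).

    X and Y take a coordinate tuple to the tuple of their components;
    derivatives are approximated by central differences.
    """
    n = len(p)

    def partial(V, nu):
        # approximate dV^mu/dx^nu at p, returned as a list over mu
        plus, minus = list(p), list(p)
        plus[nu] += h
        minus[nu] -= h
        return [(a - b) / (2 * h) for a, b in zip(V(plus), V(minus))]

    Xp, Yp = X(p), Y(p)
    dX = [partial(X, nu) for nu in range(n)]  # dX[nu][mu] = dX^mu/dx^nu
    dY = [partial(Y, nu) for nu in range(n)]
    return [sum(Xp[nu] * dY[nu][mu] - Yp[nu] * dX[nu][mu] for nu in range(n))
            for mu in range(n)]

def X(x):
    return (1.0, 0.0, 0.0)  # d/dx^1

def Y(x):
    return (0.0, x[0], 1.0)  # x^1 d/dx^2 + d/dx^3

print(commutator(X, Y, (0.3, -1.2, 2.0)))  # approximately [0, 1, 0], i.e. d/dx^2
```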
Exercise. Show that (i) [X, Y ] = −[Y, X]; (ii) [X, Y + Z] = [X, Y ] + [X, Z]; (iii)
[X, f Y ] = f [X, Y ] + X(f )Y ; (iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y ]] = 0 (the Jacobi
identity). Here X, Y, Z are vector fields and f is a smooth function.
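Remark. The components of $\partial/\partial x^\mu$ in the coordinate basis are constant (either 1 or 0). It follows from (1.48) that
$$\left[\frac{\partial}{\partial x^\mu}, \frac{\partial}{\partial x^\nu}\right] = 0. \tag{1.50}$$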
Conversely, it can be shown that if X1 , . . . , Xm (m ≤ n) are vector fields that are
linearly independent at every point of some open region R, and whose commutators
all vanish in R, then, in a neighbourhood of any point p ∈ R, one can introduce a
coordinate chart (x1 , . . . , xn ) such that Xi = ∂/∂xi (i = 1, . . . , m) throughout this
neighbourhood.


2 Metric tensors
2.1 Definition
A metric captures the notion of distance on a manifold. We can motivate the required definition by considering the case of Euclidean space. Let $\mathbf{x}(t)$, $a < t < b$, be a curve in $\mathbb{R}^3$ (we're using Cartesian vector notation). Then the length of the curve is
$$\int_a^b dt\, \sqrt{\frac{d\mathbf{x}}{dt} \cdot \frac{d\mathbf{x}}{dt}}. \tag{2.1}$$

Inside the integral we see the norm of the tangent vector dx/dt, in other words the
scalar product of this vector with itself. Therefore to define a notion of distance on a
general manifold, we shall start by introducing a scalar product between vectors.
A scalar product maps a pair of vectors to a number. In other words, at a point
p, it is a map g : Tp (M ) × Tp (M ) → R. A scalar product should be linear in each
argument. Hence g is a (0, 2) tensor at p. We call g a metric tensor. There are a couple
of other properties that g should also satisfy:
Definition. A metric tensor at p ∈ M is a (0, 2) tensor g with the following properties:

1. It is symmetric: g(X, Y ) = g(Y, X) for all X, Y ∈ Tp (M ) (i.e. gab = gba )

2. It is non-degenerate: g(X, Y ) = 0 for all Y ∈ Tp (M ) if, and only if, X = 0.

Remark. Sometimes we shall denote $g(X, Y)$ by $\langle X, Y \rangle$ or $X \cdot Y$.

Since the components of g form a symmetric matrix, one can introduce a basis that
diagonalizes g. Non-degeneracy implies that none of the diagonal elements is zero. By
rescaling the basis vectors, one can arrange that the diagonal elements are all ±1. In
this case, the basis is said to be orthonormal. There are many such bases but a standard
algebraic theorem (Sylvester’s law of inertia) states that the number of positive and
negative elements is independent of the choice of orthonormal basis. The number of
positive and negative elements is called the signature of the metric.
In differential geometry, one is usually interested in Riemannian metrics. These
have signature + + + . . . + (i.e. all diagonal elements +1 in an orthonormal basis), and
hence g is positive definite. In GR, we are interested in Lorentzian metrics, i.e., those
with signature − + + . . . +.
Definition. A Riemannian (Lorentzian) manifold is a pair (M, g) where M is a differ-
entiable manifold and g is a Riemannian (Lorentzian) metric tensor field. A Lorentzian
manifold is sometimes called a spacetime.


Definition. On a Riemannian manifold, the norm of a vector $X$ is $|X| = \sqrt{g(X, X)}$ and the angle between two non-zero vectors $X$ and $Y$ (at the same point) is $\theta$ where $\cos\theta = g(X, Y)/(|X|\,|Y|)$. (These definitions, in terms of the scalar product, agree with the usual definitions of Euclidean geometry.)
Remark. On a Riemannian manifold, we can now define the length of a curve in exactly the same way as above: let $\lambda : (a, b) \to M$ be a smooth curve with tangent vector $X$. Then the length of the curve is
$$\int_a^b dt\, \sqrt{g(X, X)\big|_{\lambda(t)}} \tag{2.2}$$

Exercise. Given a curve λ(t) we can define a new curve simply by changing the
parameterization: let t = t(u) with dt/du > 0 and u ∈ (c, d) with t(c) = a and t(d) = b.
Show that: (i) the new curve κ(u) ≡ λ(t(u)) has tangent vector Y a = (dt/du)X a ; (ii)
the length of these two curves is the same, i.e., our definition of length is independent
of parameterization.
In a coordinate basis, we have (cf equation (1.35))

g = gµν dxµ ⊗ dxν (2.3)

Often we use the notation ds2 instead of g and abbreviate this to

ds2 = gµν dxµ dxν (2.4)

This notation captures the intuitive idea of an infinitesimal distance ds being deter-
mined by infinitesimal coordinate separations dxµ .
Examples.

1. In Rn = {(x1 , . . . , xn )}, the Euclidean metric is

g = dx1 ⊗ dx1 + . . . + dxn ⊗ dxn (2.5)

$(\mathbb{R}^n, g)$ is called Euclidean space. A coordinate chart which covers all of $\mathbb{R}^n$ and in which $g_{\mu\nu} = \mathrm{diag}(1, 1, \ldots, 1)$ is called Cartesian.

2. In $\mathbb{R}^4 = \{(x^0, x^1, x^2, x^3)\}$, the Minkowski metric is

$$\eta = -(dx^0)^2 + (dx^1)^2 + (dx^2)^2 + (dx^3)^2. \tag{2.6}$$

$(\mathbb{R}^4, \eta)$ is called Minkowski spacetime. A coordinate chart which covers all of $\mathbb{R}^4$ and in which $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ everywhere is called an inertial frame.


3. On $S^2$, let $(\theta, \phi)$ denote the spherical polar coordinate chart discussed earlier. The (unit) round metric on $S^2$ is

$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2, \tag{2.7}$$

i.e. in the chart $(\theta, \phi)$, we have $g_{\mu\nu} = \mathrm{diag}(1, \sin^2\theta)$. Note this is positive definite for $\theta \in (0, \pi)$, i.e., on all of this chart. However, this chart does not cover the whole manifold so the above equation does not determine $g$ everywhere. We can give a precise definition by adding that, in the chart $(\theta', \phi')$ discussed earlier, $g = d\theta'^2 + \sin^2\theta'\, d\phi'^2$. One can check that this does indeed define a smooth tensor field. (This metric is the one induced from the embedding of $S^2$ into 3d Euclidean space: we will see later that it is the "pull-back" of the metric on Euclidean space.)

Definition. Since $g_{ab}$ is non-degenerate, it must be invertible. The inverse metric is a symmetric (2, 0) tensor field denoted $g^{ab}$ and obeys
$$g^{ab} g_{bc} = \delta^a_c \tag{2.8}$$
where $\delta^a_c$ is the (1, 1) tensor defined in (1.21) with components $\delta^\mu_\nu$.
Example. For the metric on $S^2$ defined above, in the chart $(\theta, \phi)$ we have $g^{\mu\nu} = \mathrm{diag}(1, 1/\sin^2\theta)$.

Definition. A metric determines a natural isomorphism between vectors and covectors. Given a vector $X^a$ we can define a covector $X_a = g_{ab} X^b$. Given a covector $\eta_a$ we can define a vector $\eta^a = g^{ab} \eta_b$. By (2.8), these maps are inverses of each other.
Remark. This isomorphism is the reason why covectors are not more familiar: we are
used to working in Euclidean space using Cartesian coordinates, for which gµν and g µν
are both the identity matrix, so the isomorphism appears trivial.
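To make the inverse metric and index raising and lowering concrete, here is a small sketch for the round metric (2.7) on $S^2$ at an arbitrarily chosen point $\theta = 0.8$ (the point, the vector and the helper names are illustrative assumptions). It computes $g^{\mu\nu}$, lowers the index of a vector with $g_{\mu\nu}$ and raises it again with $g^{\mu\nu}$, which by (2.8) should return the original components.

```python
import math

def round_metric(theta):
    """g_{mu nu} of the unit round metric (2.7) in the chart (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def inverse_2x2(g):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def contract(M, v):
    """Components M_{mu nu} v^nu (equally M^{mu nu} v_nu)."""
    return [sum(M[m][n] * v[n] for n in range(len(v))) for m in range(len(M))]

theta = 0.8
g = round_metric(theta)
g_inv = inverse_2x2(g)            # should equal diag(1, 1/sin^2 theta)
X_up = [0.3, -1.7]                # arbitrary components X^mu
X_down = contract(g, X_up)        # lowering: X_mu = g_{mu nu} X^nu
X_back = contract(g_inv, X_down)  # raising: g^{mu nu} X_nu
print(g_inv, X_back)              # X_back reproduces X_up up to rounding
```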
Definition. For a general tensor, abstract indices can be “lowered” by contracting with
gab and “raised” by contracting with g ab . Raising and lowering preserve the ordering
of indices. The resulting tensor is denoted by the same letter as the original tensor.
Example. Let $T$ be a (3, 2) tensor. Then $T^a{}_b{}^{cde} = g_{bf}\, g^{dh} g^{ej}\, T^{afc}{}_{hj}$.

2.2 Lorentzian signature

Remark. On a Lorentzian manifold, we take basis indices $\mu, \nu, \ldots$ to run from 0 to $n - 1$.


At any point $p$ of a Lorentzian manifold, we can choose an orthonormal basis $\{e_\mu\}$ so that, at $p$, $g(e_\mu, e_\nu) = \eta_{\mu\nu} \equiv \mathrm{diag}(-1, 1, \ldots, 1)$. Such a basis is far from unique. If $e'_\mu = (A^{-1})^\nu{}_\mu e_\nu$ is any other such basis then we have
$$\eta_{\mu\nu} = g(e'_\mu, e'_\nu) = (A^{-1})^\rho{}_\mu (A^{-1})^\sigma{}_\nu\, g(e_\rho, e_\sigma) = (A^{-1})^\rho{}_\mu (A^{-1})^\sigma{}_\nu\, \eta_{\rho\sigma}. \tag{2.9}$$
Hence (contracting with $A^\mu{}_\alpha A^\nu{}_\beta$ and relabelling indices)
$$\eta_{\mu\nu} A^\mu{}_\rho A^\nu{}_\sigma = \eta_{\rho\sigma}. \tag{2.10}$$
These are the defining equations of a Lorentz transformation in special relativity. Hence different orthonormal bases at $p$ are related by Lorentz transformations. We saw earlier that the components of a vector at $p$ transform as $X'^\mu = A^\mu{}_\nu X^\nu$, which is the same as the transformation law of the components of a vector in special relativity. A similar result holds for tensors of other types. Thus, on a Lorentzian manifold, tensor components w.r.t. different orthonormal bases at $p$ are related in exactly the same way as in special relativity.
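As a concrete instance of (2.10), the sketch below builds the familiar special-relativistic boost along the $e_1$ direction (in units with $c = 1$; this particular matrix is our example, not one given in the notes) and checks numerically that $\eta_{\mu\nu} A^\mu{}_\rho A^\nu{}_\sigma$ reproduces $\eta_{\rho\sigma}$. Function names are hypothetical.

```python
import math

ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]

def boost_x(v):
    """A^mu_nu for a boost with speed v along e_1 (units with c = 1, |v| < 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0.0, 0.0],
            [-gamma * v, gamma, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def lhs_of_2_10(A):
    """eta_{mu nu} A^mu_rho A^nu_sigma, returned as a 4x4 matrix indexed [rho][sigma]."""
    R = range(4)
    return [[sum(ETA[m][n] * A[m][r] * A[n][s] for m in R for n in R) for s in R]
            for r in R]

lhs = lhs_of_2_10(boost_x(0.6))
print(lhs)  # diag(-1, 1, 1, 1) up to rounding
```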
Definition. On a Lorentzian manifold $(M, g)$, a vector $X \in T_p(M)$ is timelike if $g(X, X) < 0$, null (or lightlike) if $X \neq 0$ and $g(X, X) = 0$, and spacelike if $g(X, X) > 0$. A vector is causal if it is timelike or null.
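In an orthonormal basis at $p$, the definition above comes down to the sign of $\eta_{\mu\nu} X^\mu X^\nu$. A minimal classifier, assuming 4 dimensions and a small tolerance `tol` to absorb floating-point rounding (both our choices, and `classify` is a hypothetical name), might look like this; the zero vector is reported separately because the definition of null excludes it.

```python
def classify(X, eta=(-1.0, 1.0, 1.0, 1.0), tol=1e-12):
    """Causal character of a vector with components X^mu in an orthonormal basis at p,
    so that g(X, X) = eta_{mu nu} X^mu X^nu with eta = diag(-1, 1, 1, 1)."""
    if all(abs(x) <= tol for x in X):
        return "zero"
    norm = sum(e * x * x for e, x in zip(eta, X))
    if norm < -tol:
        return "timelike"
    if norm > tol:
        return "spacelike"
    return "null"

for V in [(1, 0, 0, 0), (1, 1, 0, 0), (0, 1, 0, 0), (2, 1, 1, 1)]:
    print(V, classify(V))
```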
Remark. In an orthonormal basis at p, the metric has components ηµν so the tangent
space at p has exactly the same structure as Minkowski spacetime, i.e., null vectors at p
define a double cone, the light cone, that separates timelike vectors at p from spacelike
vectors at p (see Fig. 8). Note that causal vectors at p fall into two disconnected sets.

Figure 8. Light cone structure of $T_p(M)$ (regions labelled timelike, null and spacelike).

Exercise (Examples sheet 1). Let X a , Y b be non-zero vectors at p that are orthog-
onal, i.e., gab X a Y b = 0. Show that (i) if X a is timelike then Y a is spacelike; (ii) if X a
is null then Y a is spacelike or null; (iii) if X a is spacelike then Y a can be spacelike,


timelike, or null. (Hint. Choose an orthonormal basis to make the components of $X^a$ as simple as possible.)
Definition. A curve in a Lorentzian manifold is said to be timelike if its tangent vector
is everywhere timelike. Null, spacelike and causal curves are defined similarly. (Most
curves do not satisfy any of these definitions because e.g. the tangent vector can change
from timelike to null to spacelike along a curve.)
Remark. The length of a spacelike curve is defined in the same way as on a Riemannian
manifold (equation (2.2)). What about a timelike curve?
Definition. Let $\lambda(u)$ be a timelike curve with $\lambda(0) = p$. Let $X^a$ be the tangent to the curve. The proper time $\tau$ from $p$ along the curve is defined by
$$\frac{d\tau}{du} = \sqrt{-\left(g_{ab} X^a X^b\right)_{\lambda(u)}}, \qquad \tau(0) = 0. \tag{2.11}$$
Remark. In a coordinate chart, $X^\mu = dx^\mu/du$ so this definition can be rewritten in the form
$$d\tau^2 = -g_{\mu\nu}\, dx^\mu dx^\nu, \tag{2.12}$$
with the understanding that this is to be evaluated along the curve. Integrating the above equation along the curve gives the proper time from $p$ to some other point $q = \lambda(u_q)$ as
$$\tau = \int_0^{u_q} du\, \sqrt{-g_{\mu\nu} \frac{dx^\mu}{du} \frac{dx^\nu}{du}}\,\Bigg|_{\lambda(u)} \tag{2.13}$$
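The integral (2.13) is easy to approximate numerically. The sketch below uses the midpoint rule (an assumed choice of quadrature) for a curve in Minkowski spacetime that circles at radius $R$ with angular speed $\omega$ (`w` in the code), parametrized by the inertial-frame time $u = x^0$; the example curve and all names are ours. For speed $R\omega < 1$ the exact answer over $0 \le u \le 1$ is $\sqrt{1 - R^2\omega^2}$, the familiar time-dilation factor.

```python
import math

def proper_time(metric, x, dxdu, u_q, steps=2000):
    """Midpoint-rule approximation to (2.13) for 0 <= u <= u_q.

    metric(p) returns the matrix g_{mu nu} at coordinates p; x(u) and dxdu(u)
    give the curve and its tangent components dx^mu/du.
    """
    h = u_q / steps
    tau = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        g, v = metric(x(u)), dxdu(u)
        norm = sum(g[m][n] * v[m] * v[n] for m in range(4) for n in range(4))
        tau += h * math.sqrt(-norm)  # norm < 0 because the curve is timelike
    return tau

def minkowski(p):
    """eta_{mu nu} = diag(-1, 1, 1, 1) in an inertial frame, independent of p."""
    return [[(-1.0 if m == 0 else 1.0) if m == n else 0.0 for n in range(4)] for m in range(4)]

R, w = 1.0, 0.6  # radius and angular speed; the speed R*w must be below 1

def x(u):
    return (u, R * math.cos(w * u), R * math.sin(w * u), 0.0)

def dxdu(u):
    return (1.0, -R * w * math.sin(w * u), R * w * math.cos(w * u), 0.0)

tau = proper_time(minkowski, x, dxdu, 1.0)
print(tau, math.sqrt(1 - (R * w) ** 2))  # both approximately 0.8
```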

Definition. If proper time τ is used to parametrize a timelike curve then the tangent
ua to the curve is called the 4-velocity of the curve. In a coordinate basis, it has
components uµ = dxµ /dτ .
Remark. (2.12) implies that 4-velocity is a unit timelike vector:

gab ua ub = −1. (2.14)

2.3 Curves of extremal proper time

Let p and q be points connected by a timelike curve. A small deformation of a timelike
curve remains timelike hence there exist infinitely many timelike curves connecting p
and q. The proper time between p and q will be different for different curves. Which
curve extremizes the proper time between p and q?
This is a standard Euler-Lagrange problem. Consider timelike curves from p to q
with parameter u such that λ(0) = p, λ(1) = q. Let’s use a dot to denote a derivative


with respect to $u$. The proper time between $p$ and $q$ along such a curve is given by the functional
$$\tau[\lambda] = \int_0^1 du\, G\big(x(u), \dot{x}(u)\big) \tag{2.15}$$
where
$$G\big(x(u), \dot{x}(u)\big) \equiv \sqrt{-g_{\mu\nu}(x(u))\, \dot{x}^\mu(u)\, \dot{x}^\nu(u)} \tag{2.16}$$
and we are writing $x^\mu(u)$ as a shorthand for $x^\mu(\lambda(u))$.
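The curve that extremizes the proper time must satisfy the Euler-Lagrange equation
$$\frac{d}{du}\left(\frac{\partial G}{\partial \dot{x}^\mu}\right) - \frac{\partial G}{\partial x^\mu} = 0 \tag{2.17}$$
Working out the various terms, we have (using the symmetry of the metric)
$$\frac{\partial G}{\partial \dot{x}^\mu} = -\frac{1}{2G}\, 2 g_{\mu\nu} \dot{x}^\nu = -\frac{1}{G}\, g_{\mu\nu} \dot{x}^\nu \tag{2.18}$$
$$\frac{\partial G}{\partial x^\mu} = -\frac{1}{2G}\, g_{\nu\rho,\mu}\, \dot{x}^\nu \dot{x}^\rho \tag{2.19}$$
where we have relabelled some dummy indices, and introduced the important notation of a comma to denote partial differentiation:
$$g_{\nu\rho,\mu} \equiv \frac{\partial}{\partial x^\mu} g_{\nu\rho} \tag{2.20}$$
We will be using this notation a lot henceforth.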
So far, our parameter $u$ has been arbitrary subject to the conditions $\lambda(0) = p$ and $\lambda(1) = q$. At this stage, it is convenient to use a more physical parameter, namely $\tau$, the proper time along the curve. (Note that we could not have used $\tau$ from the outset since the value of $\tau$ at $q$ is different for different curves, which would make the range of integration different for different curves.) The parameters are related by
$$\left(\frac{d\tau}{du}\right)^2 = -g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = G^2 \tag{2.21}$$
and hence $d\tau/du = G$. So in our equations above, we can replace $d/du$ with $G\, d/d\tau$, so the Euler-Lagrange equation becomes (after cancelling a factor of $-G$)

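$$\frac{d}{d\tau}\left(g_{\mu\nu} \frac{dx^\nu}{d\tau}\right) - \frac{1}{2}\, g_{\nu\rho,\mu} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \tag{2.22}$$
Hence
$$g_{\mu\nu} \frac{d^2 x^\nu}{d\tau^2} + g_{\mu\nu,\rho} \frac{dx^\rho}{d\tau} \frac{dx^\nu}{d\tau} - \frac{1}{2}\, g_{\nu\rho,\mu} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \tag{2.23}$$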


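In the second term, we can replace $g_{\mu\nu,\rho}$ with $g_{\mu(\nu,\rho)}$ because it is contracted with an object symmetric in $\nu$ and $\rho$. Finally, contracting the whole expression with the inverse metric and relabelling indices gives
$$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \tag{2.24}$$
where $\Gamma^\mu_{\nu\rho}$ are known as the Christoffel symbols, and are defined by
$$\Gamma^\mu_{\nu\rho} = \frac{1}{2} g^{\mu\sigma}\left(g_{\sigma\nu,\rho} + g_{\sigma\rho,\nu} - g_{\nu\rho,\sigma}\right). \tag{2.25}$$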
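Formula (2.25) translates directly into code. The following is a sketch rather than a general tool: metric derivatives are approximated by central differences, and to keep it short it assumes the metric is diagonal in the chosen chart, so that $g^{\mu\mu} = 1/g_{\mu\mu}$. Applied to the round metric (2.7) on $S^2$, it should reproduce the standard values $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. Function names are hypothetical.

```python
import math

def christoffel(metric, x, h=1e-6):
    """Gamma[mu][nu][rho] = Gamma^mu_{nu rho} from (2.25) at coordinates x.

    Assumes the metric is diagonal in this chart, so g^{mu mu} = 1/g_{mu mu};
    partial derivatives of the metric use central differences with step h.
    """
    n = len(x)
    g = metric(x)
    g_inv = [[1.0 / g[m][m] if m == s else 0.0 for s in range(n)] for m in range(n)]

    def dg(k):
        # matrix of partial derivatives g_{ab,k}
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)] for a in range(n)]

    d = [dg(k) for k in range(n)]  # d[k][a][b] = g_{ab,k}
    return [[[0.5 * sum(g_inv[m][s] * (d[r][s][v] + d[v][s][r] - d[s][v][r])
                        for s in range(n))
              for r in range(n)]
             for v in range(n)]
            for m in range(n)]

def round_sphere(x):
    """Unit round metric (2.7) in the chart x = (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

G = christoffel(round_sphere, (0.7, 0.0))
print(G[0][1][1], -math.sin(0.7) * math.cos(0.7))  # Gamma^theta_{phi phi}
print(G[1][0][1], 1 / math.tan(0.7))               # Gamma^phi_{theta phi}
```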
Remarks. 1. $\Gamma^\mu_{\nu\rho} = \Gamma^\mu_{\rho\nu}$. 2. The Christoffel symbols are not tensor components. Neither the first term nor the second term in (2.24) is a component of a vector, but the sum of these two terms does give vector components. 3. Equation (2.24) is called the geodesic equation and its solutions are called geodesics. Geodesics will be discussed more generally below. 4. We obtain exactly the same equation if we consider curves in a Riemannian manifold, or spacelike curves in a Lorentzian manifold, that extremize proper length.
Example. In Minkowski spacetime, the components of the metric in an inertial frame
are constant so Γµνρ = 0. Hence the above equation reduces to d2 xµ /dτ 2 = 0, which
is the equation of a straight line. Thus, in Minkowski spacetime, timelike curves of
extremal proper time are straight lines. It can be shown that these lines maximize the
proper time between two points. In a general spacetime, this is true only locally, i.e.,
for any point p there exists a neighbourhood of p within which timelike geodesics are
curves that maximize proper time.
Exercise. Show that (2.24) can be obtained more directly as the Euler-Lagrange
equation for the Lagrangian
$$L = -g_{\mu\nu}(x(\tau)) \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} \tag{2.26}$$
This is usually the easiest way to derive (2.24) or to calculate the Christoffel symbols.
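Example. The Schwarzschild metric in Schwarzschild coordinates $(t, r, \theta, \phi)$ is
$$ds^2 = -f\, dt^2 + f^{-1} dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2, \qquad f = 1 - \frac{2M}{r} \tag{2.27}$$
where $M$ is a constant. We have
$$L = f \left(\frac{dt}{d\tau}\right)^2 - f^{-1} \left(\frac{dr}{d\tau}\right)^2 - r^2 \left(\frac{d\theta}{d\tau}\right)^2 - r^2 \sin^2\theta \left(\frac{d\phi}{d\tau}\right)^2 \tag{2.28}$$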


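so the EL equation for $t(\tau)$ is
$$\frac{d}{d\tau}\left(2f \frac{dt}{d\tau}\right) = 0 \quad \Rightarrow \quad \frac{d^2 t}{d\tau^2} + f^{-1} f' \frac{dt}{d\tau} \frac{dr}{d\tau} = 0 \tag{2.29}$$
where $f' = df/dr$. Comparing with (2.24), in which the cross term appears with coefficient $2\Gamma^0_{01}$, we can read off
$$\Gamma^0_{01} = \Gamma^0_{10} = \frac{f'}{2f}, \qquad \Gamma^0_{\mu\nu} = 0 \text{ otherwise} \tag{2.30}$$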
The other Christoffel symbols are obtained in a similar way from the remaining EL
equations (examples sheet 1).
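As a cross-check of (2.30), the sketch below computes $\Gamma^0_{01}$ directly from (2.25), where for the diagonal, $t$-independent metric (2.27) only the term $\tfrac{1}{2} g^{00} \partial_r g_{00}$ survives, and compares it with $f'/(2f)$ as read off from the Euler-Lagrange equation. The value $M = 1$, the radius $r = 5$ and the finite-difference step are arbitrary illustrative choices; both results should equal $M/(r(r - 2M))$.

```python
M = 1.0  # the constant M in (2.27); any positive value would do

def f(r):
    return 1.0 - 2.0 * M / r

def g_tt(r):
    """g_00 = -f(r) for the metric (2.27)."""
    return -f(r)

def gamma_0_01(r, h=1e-6):
    """Gamma^0_{01} from (2.25): for this metric only (1/2) g^{00} dg_00/dr survives,
    and g^{00} = 1/g_00 because the metric is diagonal."""
    dg_tt = (g_tt(r + h) - g_tt(r - h)) / (2 * h)
    return 0.5 * (1.0 / g_tt(r)) * dg_tt

r = 5.0
from_formula = gamma_0_01(r)
from_lagrangian = (2 * M / r ** 2) / (2 * f(r))  # f'/(2f) as in (2.30), with f' = 2M/r^2
print(from_formula, from_lagrangian, M / (r * (r - 2 * M)))
```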

3 Covariant derivative
3.1 Introduction
To formulate physical laws, we need to be able to differentiate tensor fields. For scalar
fields, partial differentiation is fine: f,µ ≡ ∂f /∂xµ are the components of the covector
field (df )a . However, for tensor fields, partial differentiation is no good because the
partial derivative of a tensor field does not give another tensor field:
Exercise. Let $V^a$ be a vector field. In any coordinate chart, let $T^\mu{}_\nu = V^\mu{}_{,\nu} \equiv \partial V^\mu/\partial x^\nu$. Show that $T^\mu{}_\nu$ do not transform as tensor components under a change of chart.
The problem is that differentiation involves comparing a tensor at two infinitesi-
mally nearby points of the manifold. But we have seen that this does not make sense:
tensors at different points belong to different spaces. The mathematical structure that
overcomes this difficulty is called a covariant derivative or connection.
Definition. A covariant derivative ∇ on a manifold M is a map sending every pair of
smooth vector fields X, Y to a smooth vector field ∇X Y , with the following properties
(where X, Y, Z are vector fields and f, g are functions)

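$$\nabla_{fX + gY} Z = f \nabla_X Z + g \nabla_Y Z, \tag{3.1}$$
$$\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z, \tag{3.2}$$
$$\nabla_X (fY) = f \nabla_X Y + (\nabla_X f) Y, \quad \text{(Leibniz rule)} \tag{3.3}$$
where the action of $\nabla$ on functions is defined by
$$\nabla_X f = X(f). \tag{3.4}$$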


Remark. (3.1) implies that, at any point, the map $\nabla Y : X \mapsto \nabla_X Y$ is a linear map from $T_p(M)$ to itself. Hence it defines a (1, 1) tensor (see examples sheet 1). More precisely, if $\eta \in T_p^*(M)$ and $X \in T_p(M)$ then we define $(\nabla Y)(\eta, X) \equiv \eta(\nabla_X Y)$.
Definition. Let $Y$ be a vector field. The covariant derivative of $Y$ is the (1, 1) tensor field $\nabla Y$. In abstract index notation we usually write $(\nabla Y)^a{}_b$ as $\nabla_b Y^a$ or $Y^a{}_{;b}$.
Remarks.

1. Similarly we define $\nabla f : X \mapsto \nabla_X f = X(f)$. Hence $\nabla f = df$. We can write this as either $\nabla_a f$ or $f_{;a}$ or $\partial_a f$ or $f_{,a}$ (i.e. the covariant derivative reduces to the partial derivative when acting on a function).

2. Does the map $\nabla : X, Y \mapsto \nabla_X Y$ define a (1, 2) tensor field? No: equation (3.3) shows that this map is not linear in $Y$ over functions, since $\nabla_X(fY) \neq f \nabla_X Y$ in general.

Example. Pick a coordinate chart on M . Let ∇ be the partial derivative in this chart.
This satisfies all of the above conditions. This is not a very interesting example of
a covariant derivative because it depends on choosing a particular chart: if we use a
different chart then this covariant derivative will not be the partial derivative in the
new chart.
Definition. In a basis $\{e_\mu\}$ the connection components $\Gamma^\mu_{\nu\rho}$ are defined by
$$\nabla_\rho e_\nu \equiv \nabla_{e_\rho} e_\nu = \Gamma^\mu_{\nu\rho}\, e_\mu \tag{3.5}$$

Example. The Christoffel symbols are the coordinate basis components of a certain
connection, the Levi-Civita connection, which is defined on any manifold with a metric.
More about this soon.
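Write $X = X^\mu e_\mu$ and $Y = Y^\mu e_\mu$. Now
$$\begin{aligned} \nabla_X Y &= \nabla_X (Y^\mu e_\mu) = X(Y^\mu) e_\mu + Y^\mu \nabla_X e_\mu && \text{(Leibniz)} \\ &= X^\nu e_\nu(Y^\mu) e_\mu + Y^\mu \nabla_{X^\nu e_\nu} e_\mu \\ &= X^\nu e_\nu(Y^\mu) e_\mu + Y^\mu X^\nu \nabla_\nu e_\mu && \text{by (3.1)} \\ &= X^\nu e_\nu(Y^\mu) e_\mu + Y^\mu X^\nu \Gamma^\rho_{\mu\nu} e_\rho \\ &= X^\nu \left(e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu} Y^\rho\right) e_\mu \end{aligned} \tag{3.6}$$
and hence
$$(\nabla_X Y)^\mu = X^\nu e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu} Y^\rho X^\nu \tag{3.7}$$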


so
$$Y^\mu{}_{;\nu} = e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu} Y^\rho \tag{3.8}$$
In a coordinate basis, this reduces to
$$Y^\mu{}_{;\nu} = Y^\mu{}_{,\nu} + \Gamma^\mu_{\rho\nu} Y^\rho \tag{3.9}$$
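To see (3.9) at work, consider an example not taken from the notes: the flat metric $dr^2 + r^2 d\theta^2$ on the plane in polar coordinates $(r, \theta)$. From (2.25) its nonzero Christoffel symbols are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$. The Cartesian field $\partial/\partial x$ has polar components $(\cos\theta, -\sin\theta/r)$, which are not constant, yet its covariant derivative should vanish because its Cartesian components are constant. The sketch below checks this with central differences; names are hypothetical.

```python
import math

def gamma_polar(r):
    """Gamma^mu_{nu rho} of dr^2 + r^2 dtheta^2 in the chart (r, theta); index 0 = r, 1 = theta."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                    # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r  # Gamma^theta_{r theta} = Gamma^theta_{theta r}
    return G

def Y(p):
    """Polar components of the Cartesian basis field d/dx."""
    r, th = p
    return (math.cos(th), -math.sin(th) / r)

def cov_deriv_vector(V, p, h=1e-6):
    """D[mu][nu] = V^mu_{;nu} = V^mu_{,nu} + Gamma^mu_{rho nu} V^rho, as in (3.9)."""
    G, Vp = gamma_polar(p[0]), V(p)
    D = [[0.0, 0.0], [0.0, 0.0]]
    for nu in range(2):
        pp, pm = list(p), list(p)
        pp[nu] += h
        pm[nu] -= h
        Vplus, Vminus = V(pp), V(pm)
        for mu in range(2):
            partial = (Vplus[mu] - Vminus[mu]) / (2 * h)
            D[mu][nu] = partial + sum(G[mu][rho][nu] * Vp[rho] for rho in range(2))
    return D

print(cov_deriv_vector(Y, (2.0, 0.9)))  # every entry ~0 although Y's components vary
```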

The connection components Γµνρ are not tensor components:

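Exercise (examples sheet 2). Consider a change of basis $e'_\mu = (A^{-1})^\nu{}_\mu e_\nu$. Show that
$$\Gamma'^\mu_{\nu\rho} = A^\mu{}_\tau (A^{-1})^\lambda{}_\nu (A^{-1})^\sigma{}_\rho\, \Gamma^\tau_{\lambda\sigma} + A^\mu{}_\tau (A^{-1})^\sigma{}_\rho\, e_\sigma\big((A^{-1})^\tau{}_\nu\big) \tag{3.10}$$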

The presence of the second term demonstrates that Γµνρ are not tensor components.
Hence neither term on the RHS of equation (3.8) transforms as a tensor. However, the
sum of these two terms does transform as a tensor.
Exercise. Let $\nabla$ and $\tilde{\nabla}$ be two different connections on $M$. Show that $\nabla - \tilde{\nabla}$ is a (1, 2) tensor field. You can do this either from the definition of a connection, or from the transformation law for the connection components.
The action of ∇ is extended to general tensor fields by the Leibniz property. If T
is a tensor field of type (r, s) then ∇T is a tensor field of type (r, s + 1). For example,
if η is a covector field then, for any vector fields X and Y , we define

$$(\nabla_X \eta)(Y) \equiv \nabla_X(\eta(Y)) - \eta(\nabla_X Y). \tag{3.11}$$

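It is not obvious that this defines a (0, 2) tensor but we can see this as follows:
$$\begin{aligned} (\nabla_X \eta)(Y) &= \nabla_X(\eta_\mu Y^\mu) - \eta_\mu (\nabla_X Y)^\mu \\ &= X(\eta_\mu) Y^\mu + \eta_\mu X(Y^\mu) - \eta_\mu \left(X^\nu e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu} Y^\rho X^\nu\right), \end{aligned} \tag{3.12}$$
where we used (3.7). Now, the second and third terms cancel (since $X = X^\nu e_\nu$) and hence (renaming dummy indices in the final term)
$$(\nabla_X \eta)(Y) = \left(X(\eta_\mu) - \Gamma^\rho_{\mu\nu} \eta_\rho X^\nu\right) Y^\mu, \tag{3.13}$$
which is linear in $Y^\mu$ so $\nabla_X \eta$ is a covector field with components
$$(\nabla_X \eta)_\mu = X(\eta_\mu) - \Gamma^\rho_{\mu\nu} \eta_\rho X^\nu = X^\nu \left(e_\nu(\eta_\mu) - \Gamma^\rho_{\mu\nu} \eta_\rho\right) \tag{3.14}$$
This is linear in $X^\nu$ and hence $\nabla \eta$ is a (0, 2) tensor field with components
$$\eta_{\mu;\nu} = e_\nu(\eta_\mu) - \Gamma^\rho_{\mu\nu} \eta_\rho \tag{3.15}$$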


In a coordinate basis, this is
$$\eta_{\mu;\nu} = \eta_{\mu,\nu} - \Gamma^\rho_{\mu\nu} \eta_\rho \tag{3.16}$$
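The same flat-plane example (polar coordinates $(r, \theta)$ on $\mathbb{R}^2$, our illustrative choice, with $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$) can be used to test (3.16). The covector $dx = \cos\theta\, dr - r\sin\theta\, d\theta$ has non-constant polar components, but its covariant derivative should vanish. The sketch below repeats the polar Christoffel symbols so that it runs on its own; names are hypothetical.

```python
import math

def gamma_polar(r):
    """Gamma^mu_{nu rho} of dr^2 + r^2 dtheta^2 in the chart (r, theta); index 0 = r, 1 = theta."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -r                    # Gamma^r_{theta theta}
    G[1][0][1] = G[1][1][0] = 1.0 / r  # Gamma^theta_{r theta} = Gamma^theta_{theta r}
    return G

def dx_covector(p):
    """Polar components of dx = cos(theta) dr - r sin(theta) dtheta."""
    r, th = p
    return (math.cos(th), -r * math.sin(th))

def cov_deriv_covector(w, p, h=1e-6):
    """D[mu][nu] = w_{mu;nu} = w_{mu,nu} - Gamma^rho_{mu nu} w_rho, as in (3.16)."""
    G, wp = gamma_polar(p[0]), w(p)
    D = [[0.0, 0.0], [0.0, 0.0]]
    for nu in range(2):
        pp, pm = list(p), list(p)
        pp[nu] += h
        pm[nu] -= h
        wplus, wminus = w(pp), w(pm)
        for mu in range(2):
            partial = (wplus[mu] - wminus[mu]) / (2 * h)
            D[mu][nu] = partial - sum(G[rho][mu][nu] * wp[rho] for rho in range(2))
    return D

print(cov_deriv_covector(dx_covector, (2.0, 0.9)))  # every entry ~0
```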
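Now the Leibniz rule can be used to obtain the formula for the coordinate basis components of $\nabla T$ where $T$ is an (r, s) tensor:
$$\begin{aligned} T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s;\rho} = {} & T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s,\rho} + \Gamma^{\mu_1}_{\sigma\rho} T^{\sigma\mu_2 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s} + \ldots + \Gamma^{\mu_r}_{\sigma\rho} T^{\mu_1 \ldots \mu_{r-1}\sigma}{}_{\nu_1 \ldots \nu_s} \\ & - \Gamma^\sigma_{\nu_1\rho} T^{\mu_1 \ldots \mu_r}{}_{\sigma\nu_2 \ldots \nu_s} - \ldots - \Gamma^\sigma_{\nu_s\rho} T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_{s-1}\sigma} \end{aligned} \tag{3.17}$$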
Exercise. Prove this result for a (1, 1) tensor.

Remark. We are using a comma and semi-colon to denote partial, and covariant, derivatives respectively. If more than one index appears after a comma or semi-colon then the derivative is to be taken with respect to all indices. The index nearest to the comma/semi-colon is the first derivative to be taken. For example, $f_{,\mu\nu} = f_{,\mu,\nu} \equiv \partial_\nu \partial_\mu f$, and $X^a{}_{;bc} = \nabla_c \nabla_b X^a$ (we cannot use abstract indices for the first example since it is not a tensor). The second partial derivatives of a function commute, $f_{,\mu\nu} = f_{,\nu\mu}$, but for a covariant derivative this is not true in general. Set $\eta = df$ in (3.16) to get, in a coordinate basis,
$$f_{;\mu\nu} = f_{,\mu\nu} - \Gamma^\rho_{\mu\nu} f_{,\rho} \tag{3.18}$$
Antisymmetrizing gives
$$f_{;[\mu\nu]} = -\Gamma^\rho_{[\mu\nu]} f_{,\rho} \quad \text{(coordinate basis)} \tag{3.19}$$
Definition. A connection $\nabla$ is torsion-free if $\nabla_a \nabla_b f = \nabla_b \nabla_a f$ for any function $f$. From (3.19), this is equivalent to
$$\Gamma^\rho_{[\mu\nu]} = 0 \quad \text{(coordinate basis)} \tag{3.20}$$

Lemma. For a torsion-free connection, if $X$ and $Y$ are vector fields then
$$\nabla_X Y - \nabla_Y X = [X, Y] \tag{3.21}$$
Proof. Use a coordinate basis:
$$\begin{aligned} X^\nu Y^\mu{}_{;\nu} - Y^\nu X^\mu{}_{;\nu} &= X^\nu Y^\mu{}_{,\nu} + \Gamma^\mu_{\rho\nu} X^\nu Y^\rho - Y^\nu X^\mu{}_{,\nu} - \Gamma^\mu_{\rho\nu} Y^\nu X^\rho \\ &= [X, Y]^\mu + 2\Gamma^\mu_{[\rho\nu]} X^\nu Y^\rho \\ &= [X, Y]^\mu \end{aligned} \tag{3.22}$$
Hence the equation is true in a coordinate basis and therefore (as it is a tensor equation) it is true in any basis.
Remark. Even for a torsion-free connection, the second covariant derivatives of a
tensor field do not commute. More soon.


3.2 The Levi-Civita connection

On a manifold with a metric, the metric singles out a preferred connection:
Theorem. Let M be a manifold with a metric g. There exists a unique torsion-free
connection ∇ such that the metric is covariantly constant: ∇g = 0 (i.e. gab;c = 0).
This is called the Levi-Civita (or metric) connection.
Proof. Let $X, Y, Z$ be vector fields. Then
$$X(g(Y, Z)) = \nabla_X(g(Y, Z)) = g(\nabla_X Y, Z) + g(Y, \nabla_X Z), \tag{3.23}$$
where we used the Leibniz rule and $\nabla_X g = 0$ in the second equality. Permuting $X, Y, Z$ leads to two similar identities:
$$Y(g(Z, X)) = g(\nabla_Y Z, X) + g(Z, \nabla_Y X), \tag{3.24}$$
$$Z(g(X, Y)) = g(\nabla_Z X, Y) + g(X, \nabla_Z Y), \tag{3.25}$$
Add the first two of these equations and subtract the third to get (using the symmetry of the metric)
$$\begin{aligned} X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) = {} & g(\nabla_X Y + \nabla_Y X, Z) \\ & - g(\nabla_Z X - \nabla_X Z, Y) \\ & + g(\nabla_Y Z - \nabla_Z Y, X) \end{aligned} \tag{3.26}$$
The torsion-free condition implies
$$\nabla_X Y - \nabla_Y X = [X, Y] \tag{3.27}$$
Using this and the same identity with $X, Y, Z$ permuted gives
$$\begin{aligned} X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) = {} & 2g(\nabla_X Y, Z) - g([X, Y], Z) \\ & - g([Z, X], Y) + g([Y, Z], X) \end{aligned} \tag{3.28}$$
Hence
$$\begin{aligned} g(\nabla_X Y, Z) = \frac{1}{2}\big[ & X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) \\ & + g([X, Y], Z) + g([Z, X], Y) - g([Y, Z], X)\big] \end{aligned} \tag{3.29}$$


This determines $\nabla_X Y$ uniquely because the metric is non-degenerate. It remains to check that it satisfies the properties of a connection. For example:

$$\begin{aligned} g(\nabla_{fX} Y, Z) &= \tfrac{1}{2}\big[f X(g(Y, Z)) + Y(f g(Z, X)) - Z(f g(X, Y)) \\ &\qquad + g([fX, Y], Z) + g([Z, fX], Y) - f g([Y, Z], X)\big] \\ &= \tfrac{1}{2}\big[f X(g(Y, Z)) + f Y(g(Z, X)) + Y(f) g(Z, X) \\ &\qquad - f Z(g(X, Y)) - Z(f) g(X, Y) + f g([X, Y], Z) - Y(f) g(X, Z) \\ &\qquad + f g([Z, X], Y) + Z(f) g(X, Y) - f g([Y, Z], X)\big] \\ &= \tfrac{f}{2}\big[X(g(Y, Z)) + Y(g(Z, X)) - Z(g(X, Y)) \\ &\qquad + g([X, Y], Z) + g([Z, X], Y) - g([Y, Z], X)\big] \\ &= f g(\nabla_X Y, Z) = g(f \nabla_X Y, Z) \end{aligned} \tag{3.30}$$

and hence $g(\nabla_{fX} Y - f \nabla_X Y, Z) = 0$ for any vector field $Z$ so, by the non-degeneracy of the metric, $\nabla_{fX} Y = f \nabla_X Y$.
Exercise. Show that ∇X Y as defined by (3.29) satisfies the other properties required
of a connection.
Remark. In differential geometry, this theorem is called the fundamental theorem of
Riemannian geometry (although it applies for a metric of any signature).
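Let’s determine the components of the Levi-Civita connection in a coordinate basis (for which $[e_\mu, e_\nu] = 0$):

$$g(\nabla_\rho e_\nu, e_\sigma) = \tfrac{1}{2}\left[e_\rho(g_{\nu\sigma}) + e_\nu(g_{\sigma\rho}) - e_\sigma(g_{\rho\nu})\right], \tag{3.31}$$

that is

$$g(\Gamma^\tau_{\nu\rho} e_\tau, e_\sigma) = \tfrac{1}{2}\left(g_{\sigma\nu,\rho} + g_{\sigma\rho,\nu} - g_{\nu\rho,\sigma}\right) \tag{3.32}$$

The LHS is just $\Gamma^\tau_{\nu\rho} g_{\tau\sigma}$. Hence if we multiply the whole equation by the inverse metric $g^{\mu\sigma}$ we obtain

$$\Gamma^\mu_{\nu\rho} = \tfrac{1}{2} g^{\mu\sigma}\left(g_{\sigma\nu,\rho} + g_{\sigma\rho,\nu} - g_{\nu\rho,\sigma}\right) \tag{3.33}$$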
This is the same equation as we obtained earlier; we have now shown that the Christoffel
symbols are the components of the Levi-Civita connection w.r.t. a coordinate basis.
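As a numerical sanity check of (3.33), here is a minimal sketch in plain Python (standard library only). It assumes, purely for illustration, the unit round metric $g = \mathrm{diag}(1, \sin^2\theta)$ on $S^2$ in coordinates $(\theta, \phi)$, approximates the metric derivatives by central differences with a hand-picked step `h`, and compares the output with the exact values $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \cot\theta$. It also checks numerically that $g_{\mu\nu;\rho}$ vanishes, as the theorem requires. All function names are hypothetical.

```python
import math


def metric(x):
    """Assumed example metric: unit round metric on S^2, x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(g):
    """Inverse of a 2x2 matrix (all this 2d example needs)."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]


def metric_derivatives(x, h=1e-6):
    """dg[a][b][c] = g_{ab,c}, approximated by central differences."""
    n = len(x)
    dg = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for c in range(n):
        xp, xm = list(x), list(x)
        xp[c] += h
        xm[c] -= h
        gp, gm = metric(xp), metric(xm)
        for a in range(n):
            for b in range(n):
                dg[a][b][c] = (gp[a][b] - gm[a][b]) / (2 * h)
    return dg


def christoffel(x):
    """G[mu][nu][rho] = Gamma^mu_{nu rho}, eq. (3.33)."""
    n = len(x)
    ginv = inverse_2x2(metric(x))
    dg = metric_derivatives(x)
    return [[[0.5 * sum(ginv[mu][s] * (dg[s][nu][rho] + dg[s][rho][nu] - dg[nu][rho][s])
                        for s in range(n))
              for rho in range(n)]
             for nu in range(n)]
            for mu in range(n)]


def metric_covariant_derivative(x):
    """g_{mu nu;rho} = g_{mu nu,rho} - Gamma^s_{mu rho} g_{s nu} - Gamma^s_{nu rho} g_{mu s}."""
    n = len(x)
    g, dg, G = metric(x), metric_derivatives(x), christoffel(x)
    return [[[dg[mu][nu][rho]
              - sum(G[s][mu][rho] * g[s][nu] + G[s][nu][rho] * g[mu][s] for s in range(n))
              for rho in range(n)]
             for nu in range(n)]
            for mu in range(n)]


theta = 0.7
G = christoffel([theta, 0.3])
print(G[0][1][1], -math.sin(theta) * math.cos(theta))  # Gamma^theta_{phi phi}: numeric vs exact
print(G[1][0][1], math.cos(theta) / math.sin(theta))   # Gamma^phi_{theta phi}: numeric vs exact
```

Only the metric function needs to change to try another 2d example; the finite-difference step trades truncation error against round-off.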
Remark. In GR we take the connection to be the Levi-Civita connection. This is not
as restrictive as it sounds: we saw above that the difference between two connections is
a tensor field. Hence we can write any connection (even one with torsion) in terms of
the Levi-Civita connection and a (1, 2) tensor field. In GR we could regard the latter
as a particular kind of “matter” field, rather than as part of the geometry of spacetime.


3.3 Geodesics
Previously we considered curves that extremize the proper time between two points of a spacetime, and showed that this gives the equation

$$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho}(x(\tau)) \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0, \tag{3.34}$$

where $\tau$ is the proper time along the curve. The tangent vector $X^a$ to the curve has components $X^\mu = dx^\mu/d\tau$. This is defined only along the curve. However, we can extend $X^a$ (in an arbitrary way) to a neighbourhood of the curve, so that $X^a$ becomes a vector field, and the curve is an integral curve of this vector field. The chain rule gives

$$\frac{d^2 x^\mu}{d\tau^2} = \frac{dX^\mu(x(\tau))}{d\tau} = \frac{dx^\nu}{d\tau} \frac{\partial X^\mu}{\partial x^\nu} = X^\nu X^\mu{}_{,\nu}. \tag{3.35}$$

Note that the LHS is independent of how we extend $X^a$, hence so is the RHS. We can now write (3.34) as

$$X^\nu \left( X^\mu{}_{,\nu} + \Gamma^\mu_{\rho\nu} X^\rho \right) = 0 \tag{3.36}$$

which is the same as

$$X^\nu X^\mu{}_{;\nu} = 0, \quad \text{or} \quad \nabla_X X = 0, \tag{3.37}$$

where we are using the Levi-Civita connection. We now extend this to an arbitrary connection:
Definition. Let M be a manifold with a connection ∇. An affinely parameterized
geodesic is an integral curve of a vector field X satisfying ∇X X = 0.
Remarks.
1. What do we mean by “affinely parameterized”? Consider a curve with parameter $t$ whose tangent $X$ satisfies the above definition. Let $u$ be some other parameter for the curve, so $t = t(u)$ and $dt/du > 0$. Then the tangent vector becomes $Y = hX$ where $h = dt/du$. Hence

$$\nabla_Y Y = \nabla_{hX}(hX) = h \nabla_X (hX) = h^2 \nabla_X X + X(h) h X = f Y, \tag{3.38}$$

where $f = X(h) = dh/dt$. Hence $\nabla_Y Y = fY$ describes the same geodesic, but with a parameter $u$ that is not affine unless $f = 0$.
It is always possible to find an affine parameter, so there is no loss of generality in restricting to affinely parameterized geodesics. Note that the new parameter is also affine iff $X(h) = 0$, i.e., $h$ is constant. Then $u = at + b$ where $a$ and $b$ are constants with $a > 0$ ($a = h^{-1}$). Hence there is a 2-parameter family of affine parameters for any geodesic.


2. Reversing the above steps shows that, in a coordinate chart, for any connection, the geodesic equation can be written as (3.34) with $\tau$ an arbitrary affine parameter.

3. In a spacetime, curves of extremal proper time are timelike geodesics (with ∇ the
Levi-Civita connection). One can also consider geodesics which are not timelike.
These satisfy (3.34) with τ an affine parameter. The easiest way to obtain this
equation is to use the Lagrangian (2.26).

Theorem. Let $M$ be a manifold with a connection $\nabla$. Let $p \in M$ and $X_p \in T_p(M)$. Then there exists a unique affinely parameterized geodesic through $p$ with tangent vector $X_p$ at $p$.
Proof. Choose a coordinate chart $x^\mu$ in a neighbourhood of $p$. Consider a curve parameterized by $\tau$. It has tangent vector with components $X^\mu = dx^\mu/d\tau$. The geodesic equation is (3.34). We want the curve to satisfy the initial conditions

$$x^\mu(0) = x^\mu_p, \qquad \left.\frac{dx^\mu}{d\tau}\right|_{\tau=0} = X^\mu_p. \tag{3.39}$$

This is a coupled system of $n$ second-order ordinary differential equations for the $n$ functions $x^\mu(\tau)$. Existence and uniqueness of the solution (for $|\tau|$ small enough) are guaranteed by the standard theory of ordinary differential equations.
Exercise. Let X be tangent to an affinely parameterized geodesic of the Levi-Civita
connection. Show that ∇X (g(X, X)) = 0 and hence g(X, X) is constant along the
geodesic. Therefore the tangent vector cannot change e.g. from timelike to null along
the geodesic: a geodesic is either timelike, spacelike or null.
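The theorem reduces the construction of a geodesic to the initial value problem (3.39), which can also be solved numerically. The sketch below is an illustration under stated assumptions, not part of the notes: it assumes the unit 2-sphere, hard-codes its Christoffel symbols, rewrites (3.34) as a first-order system and integrates it with a classical Runge-Kutta step of hand-picked size. It also monitors $g(X, X)$, which by the exercise above should stay constant. Function names are hypothetical.

```python
import math


def geodesic_rhs(state):
    """Geodesic equation (3.34) on the (assumed) unit 2-sphere as a first-order system.

    state = (theta, phi, dtheta/dtau, dphi/dtau). The only non-zero Christoffel
    symbols are Gamma^theta_{phi phi} = -sin(theta)cos(theta) and
    Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
    """
    th, _, dth, dph = state
    ddth = math.sin(th) * math.cos(th) * dph ** 2
    ddph = -2.0 * math.cos(th) / math.sin(th) * dth * dph
    return (dth, dph, ddth, ddph)


def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = geodesic_rhs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = geodesic_rhs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (a + 2 * b + 2 * c + d) / 6.0
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def g_of_tangent(state):
    """g(X, X) = (dtheta/dtau)^2 + sin^2(theta) (dphi/dtau)^2."""
    th, _, dth, dph = state
    return dth ** 2 + math.sin(th) ** 2 * dph ** 2


def geodesic(x_p, X_p, tau_max, steps=2000):
    """Solve the initial value problem (3.39) and return the state at tau_max."""
    state = (x_p[0], x_p[1], X_p[0], X_p[1])
    h = tau_max / steps
    for _ in range(steps):
        state = rk4_step(state, h)
    return state


# Start on the equator with a unit tangent vector tilted away from the equator.
end = geodesic((math.pi / 2, 0.0), (0.6, 0.8), math.pi)
print(end[0], end[1], g_of_tangent(end))
```

On the unit sphere this geodesic is a great circle, so at $\tau = \pi$ it should reach the antipodal point $(\theta, \phi) = (\pi/2, \pi)$, with $g(X, X)$ still equal to 1 up to integration error.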
Postulate. In GR, free particles move on geodesics (of the Levi-Civita connection).
These are timelike for massive particles, and null for massless particles (e.g. photons).
Remark. In the timelike case we can use proper time as an affine parameter. This imposes the additional restriction $g(X, X) = -1$. If $\tau$ and $\tau'$ are both proper times along a geodesic then $\tau' = \tau + b$ (i.e. $a = 1$ above). In other words, clocks measuring proper time differ only by their choice of zero. In particular, they measure equal time intervals. Similarly in the spacelike case (or on a Riemannian manifold), we use arc length $s$ as affine parameter, which gives $g(X, X) = 1$ and $s' = s + b$. In the null case, there is no analogue of proper time or arc length and so there is a 2-parameter freedom in choice of affine parameterization.


3.4 Normal coordinates

Definition. Let $M$ be a manifold with a connection $\nabla$. Let $p \in M$. The exponential map from $T_p(M)$ to $M$ is defined as the map which sends $X_p$ to the point unit affine parameter distance along the geodesic through $p$ with tangent $X_p$ at $p$.
Remark. It can be shown that this map is one-to-one and onto locally, i.e., for Xp in
a neighbourhood of the origin in Tp (M ).
Exercise. Let 0 ≤ t ≤ 1. Show that the exponential map sends tXp to the point affine
parameter distance t along the geodesic through p with tangent Xp at p. (Hint: if λ(τ )
is a geodesic with tangent Xp at p then consider λ(tτ ).)
Definition. Let $\{e_\mu\}$ be a basis for $T_p(M)$. Normal coordinates at $p$ are defined in a neighbourhood of $p$ as follows. Pick $q$ near $p$. Then the coordinates of $q$ are $X^\mu$, the components with respect to $\{e_\mu\}$ of the element $X^a \in T_p(M)$ that maps to $q$ under the exponential map.
Lemma. Γµ(νρ) (p) = 0 in normal coordinates at p. For a torsion-free connection,
Γµνρ (p) = 0 in normal coordinates at p.
Proof. From the above exercise, it follows that affinely parameterized geodesics
through p are given in normal coordinates by X µ (t) = tXpµ . Hence the geodesic equation
reduces to
$$\Gamma^\mu_{\nu\rho}(X(t))\, X^\nu_p X^\rho_p = 0. \tag{3.40}$$
Evaluating at t = 0 gives that Γµνρ (p)Xpν Xpρ = 0. But Xp is arbitrary, so the first result
follows. The second result follows using the fact that torsion-free implies Γµ[νρ] = 0 in a
coordinate chart.
Remark. The connection components away from p will not vanish in general.
Lemma. On a manifold with a metric, if the Levi-Civita connection is used to define
normal coordinates at p then gµν,ρ = 0 at p.
Proof. Apply the previous lemma. We then have, at p,

$$0 = 2 g_{\mu\sigma} \Gamma^\sigma_{\nu\rho} = g_{\mu\nu,\rho} + g_{\mu\rho,\nu} - g_{\nu\rho,\mu} \tag{3.41}$$

Now symmetrize on µν: the final two terms cancel and the result follows.
Remark. Again, we emphasize, this is valid only at the point p. At any point, we can
introduce normal coordinates to make the first partial derivatives of the metric vanish
at that point. They will not vanish away from that point.
Lemma. On a manifold with metric one can choose normal coordinates at p so that
gµν,ρ (p) = 0 and also gµν (p) = ηµν (Lorentzian case) or gµν (p) = δµν (Riemannian case).


Proof. We’ve already shown $g_{\mu\nu,\rho}(p) = 0$. Consider $\partial/\partial X^1$. The integral curve through $p$ of this vector field is $X^\mu(t) = (t, 0, 0, \ldots, 0)$ (since $X^\mu = 0$ at $p$). But, from the above, this is the same as the geodesic through $p$ with tangent vector $e_1$ at $p$. It follows that $\partial/\partial X^1 = e_1$ at $p$ (since both vectors are tangent to the curve at $p$). Similarly $\partial/\partial X^\mu = e_\mu$ at $p$. But the choice of basis $\{e_\mu\}$ was arbitrary. So we are free to choose $\{e_\mu\}$ to be an orthonormal basis. $\partial/\partial X^\mu$ then defines an orthonormal basis at $p$ too.
In summary, on a Lorentzian (Riemannian) manifold, we can choose coordinates
in the neighbourhood of any point p so that the components of the metric at p are
the same as those of the Minkowski metric in inertial coordinates (Euclidean metric in
Cartesian coordinates), and the first partial derivatives of the metric vanish at p.
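To see this in a concrete case, take (as a hypothetical example) the unit 2-sphere with $p$ the north pole and an orthonormal basis at $p$. Geodesics through $p$ are meridians, so the exponential map sends $X_p = (x, y)$ to the point with $\theta = r = \sqrt{x^2 + y^2}$ and longitude equal to the polar angle of $(x, y)$. Pulling back $d\theta^2 + \sin^2\theta\, d\phi^2$ gives $g_{ij} = \delta_{ij} + f(r^2)(r^2\delta_{ij} - X_i X_j)$ with $f(r^2) = ((\sin r/r)^2 - 1)/r^2$. The sketch below checks, with central differences of a hand-picked step, that $g_{ij}(p) = \delta_{ij}$ and $\partial_k g_{ij}(p) = 0$, while the first derivatives do not vanish away from $p$.

```python
import math


def f(r2):
    """((sin r / r)^2 - 1) / r^2 as a function of r2 = r^2; it tends to -1/3 as r -> 0."""
    if r2 < 1e-6:
        return -1.0 / 3.0 + 2.0 * r2 / 45.0   # Taylor series, avoids 0/0 near p
    r = math.sqrt(r2)
    return ((math.sin(r) / r) ** 2 - 1.0) / r2


def normal_metric(X):
    """Unit-sphere metric in normal coordinates X = (x, y) at the north pole (assumed example):
    g_ij = delta_ij + f(r^2) (r^2 delta_ij - X_i X_j)."""
    r2 = X[0] ** 2 + X[1] ** 2
    k = f(r2)
    return [[(1.0 if i == j else 0.0) + k * (r2 * (1.0 if i == j else 0.0) - X[i] * X[j])
             for j in range(2)] for i in range(2)]


def metric_gradient(X, h=1e-5):
    """dg[i][j][k] = partial_k g_ij at X, by central differences."""
    dg = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
    for k in range(2):
        Xp, Xm = list(X), list(X)
        Xp[k] += h
        Xm[k] -= h
        gp, gm = normal_metric(Xp), normal_metric(Xm)
        for i in range(2):
            for j in range(2):
                dg[i][j][k] = (gp[i][j] - gm[i][j]) / (2 * h)
    return dg


def largest(dg):
    """Largest absolute component of a rank-3 array."""
    return max(abs(v) for row in dg for pair in row for v in pair)


print(normal_metric([0.0, 0.0]))              # the identity matrix at p
print(largest(metric_gradient([0.0, 0.0])))   # first derivatives vanish at p
print(largest(metric_gradient([0.5, 0.2])))   # but not at a point away from p
```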

4 Curvature
4.1 Parallel transport
On a general manifold there is no way of comparing tensors at different points. For
example, we can’t say whether a vector at p is the same as a vector at q. However, with
a connection we can define a notion of “a tensor that doesn’t change along a curve”:
Definition. Let X a be the tangent to a curve. A tensor field T is parallelly transported
along the curve if ∇X T = 0.
Remarks.

1. Sometimes we say “parallelly propagated” instead of “parallelly transported”.

2. An affinely parameterized geodesic is a curve whose tangent vector is parallelly transported along the curve.

3. Let $p$ be a point on a curve. If we specify $T$ at $p$ then the above equation determines $T$ uniquely everywhere along the curve. For example, consider a $(1, 1)$ tensor. Introduce a chart in a neighbourhood of $p$. Let $t$ be the parameter along the curve. In the chart, $X^\mu = dx^\mu/dt$ so $\nabla_X T = 0$ gives

$$\begin{aligned} 0 = X^\sigma T^\mu{}_{\nu;\sigma} &= X^\sigma T^\mu{}_{\nu,\sigma} + \Gamma^\mu_{\rho\sigma} T^\rho{}_\nu X^\sigma - \Gamma^\rho_{\nu\sigma} T^\mu{}_\rho X^\sigma \\ &= \frac{dT^\mu{}_\nu}{dt} + \Gamma^\mu_{\rho\sigma} T^\rho{}_\nu X^\sigma - \Gamma^\rho_{\nu\sigma} T^\mu{}_\rho X^\sigma \end{aligned} \tag{4.1}$$

Standard ODE theory guarantees a unique solution given initial values for the components $T^\mu{}_\nu$.


4. Parallel transport along a curve from $p$ to $q$ determines an isomorphism between tensors at $p$ and tensors at $q$.

Consider Euclidean space or Minkowski spacetime with the Levi-Civita connection, and
use Cartesian/inertial frame coordinates so the Christoffel symbols vanish everywhere.
Then a tensor is parallelly transported along a curve iff its components are constant
along the curve. Hence if we have two different curves from p to q then the result of
parallelly transporting T from p to q is independent of which curve we choose. However,
in a general spacetime this is no longer true: parallel transport is path-dependent. The
path-dependence of parallel transport is measured by the Riemann curvature tensor.
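A standard example of this path-dependence, sketched under the assumption that $M$ is the unit 2-sphere with its Levi-Civita connection: transport a vector along the circle $\theta = \theta_0$ by integrating the vector version of (4.1) with a Runge-Kutta step. Solving the same equations by hand shows that the orthonormal components $(Z^\theta, \sin\theta_0\, Z^\phi)$ rotate by the angle $\cos\theta_0\,\Delta\phi$, so going east or going west to the opposite meridian gives different vectors whenever $\sin(\pi\cos\theta_0) \neq 0$. The function name and step count are illustrative choices.

```python
import math


def transport_along_latitude(theta0, Z0, delta_phi, steps=4000):
    """Parallel transport Z = (Z^theta, Z^phi) along theta = theta0 on the unit sphere.

    The curve is parameterized by phi (tangent X = d/dphi), from phi = 0 to
    phi = delta_phi; a negative delta_phi goes west. Eq. (4.1) for a vector reads
        dZ^theta/dphi = sin(theta0) cos(theta0) Z^phi,
        dZ^phi/dphi   = -cot(theta0) Z^theta,
    which is integrated here with classical RK4.
    """
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(Z):
        return (s * c * Z[1], -(c / s) * Z[0])

    h = delta_phi / steps
    Z = tuple(Z0)
    for _ in range(steps):
        k1 = rhs(Z)
        k2 = rhs(tuple(z + 0.5 * h * k for z, k in zip(Z, k1)))
        k3 = rhs(tuple(z + 0.5 * h * k for z, k in zip(Z, k2)))
        k4 = rhs(tuple(z + h * k for z, k in zip(Z, k3)))
        Z = tuple(z + h * (a + 2 * b + 2 * cc + d) / 6.0
                  for z, a, b, cc, d in zip(Z, k1, k2, k3, k4))
    return Z


theta0 = math.pi / 3   # colatitude 60 degrees
east = transport_along_latitude(theta0, (1.0, 0.0), math.pi)
west = transport_along_latitude(theta0, (1.0, 0.0), -math.pi)
loop = transport_along_latitude(theta0, (1.0, 0.0), 2 * math.pi)
print(east, west, loop)
```

With $\theta_0 = \pi/3$ the east and west routes to the opposite meridian give antiparallel vectors, and one full circuit reverses the starting vector.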

4.2 The Riemann tensor

We shall return to the path-dependence of parallel transport below. First we define the Riemann tensor as follows:
Definition. The Riemann curvature tensor $R^a{}_{bcd}$ of a connection $\nabla$ is defined by $R^a{}_{bcd} Z^b X^c Y^d = (R(X, Y)Z)^a$, where $X, Y, Z$ are vector fields and $R(X, Y)Z$ is the vector field

$$R(X, Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z \tag{4.2}$$
To demonstrate that this defines a tensor, we need to show that it is linear in X, Y, Z.
The antisymmetry R(X, Y )Z = −R(Y, X)Z implies that we need only check linearity
in X and Z. The non-trivial part is to check what happens if we multiply X or Z by
a function f :

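$$\begin{aligned} R(fX, Y)Z &= \nabla_{fX} \nabla_Y Z - \nabla_Y \nabla_{fX} Z - \nabla_{[fX,Y]} Z \\ &= f \nabla_X \nabla_Y Z - \nabla_Y (f \nabla_X Z) - \nabla_{f[X,Y] - Y(f)X} Z \\ &= f \nabla_X \nabla_Y Z - f \nabla_Y \nabla_X Z - Y(f) \nabla_X Z - \nabla_{f[X,Y]} Z + \nabla_{Y(f)X} Z \\ &= f \nabla_X \nabla_Y Z - f \nabla_Y \nabla_X Z - Y(f) \nabla_X Z - f \nabla_{[X,Y]} Z + Y(f) \nabla_X Z \\ &= f R(X, Y)Z \end{aligned} \tag{4.3}$$

$$\begin{aligned} R(X, Y)(fZ) &= \nabla_X \nabla_Y (fZ) - \nabla_Y \nabla_X (fZ) - \nabla_{[X,Y]} (fZ) \\ &= \nabla_X (f \nabla_Y Z + Y(f) Z) - \nabla_Y (f \nabla_X Z + X(f) Z) - f \nabla_{[X,Y]} Z - [X, Y](f) Z \\ &= f \nabla_X \nabla_Y Z + X(f) \nabla_Y Z + Y(f) \nabla_X Z + X(Y(f)) Z \\ &\quad - f \nabla_Y \nabla_X Z - Y(f) \nabla_X Z - X(f) \nabla_Y Z - Y(X(f)) Z \\ &\quad - f \nabla_{[X,Y]} Z - [X, Y](f) Z \\ &= f R(X, Y)Z \end{aligned} \tag{4.4}$$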


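It follows that our definition does indeed define a tensor. Let’s calculate its components in a coordinate basis $\{e_\mu = \partial/\partial x^\mu\}$ (so $[e_\mu, e_\nu] = 0$). Use the notation $\nabla_\mu \equiv \nabla_{e_\mu}$:

$$\begin{aligned} R(e_\rho, e_\sigma) e_\nu &= \nabla_\rho \nabla_\sigma e_\nu - \nabla_\sigma \nabla_\rho e_\nu \\ &= \nabla_\rho (\Gamma^\tau_{\nu\sigma} e_\tau) - \nabla_\sigma (\Gamma^\tau_{\nu\rho} e_\tau) \\ &= \partial_\rho \Gamma^\mu_{\nu\sigma} e_\mu + \Gamma^\tau_{\nu\sigma} \Gamma^\mu_{\tau\rho} e_\mu - \partial_\sigma \Gamma^\mu_{\nu\rho} e_\mu - \Gamma^\tau_{\nu\rho} \Gamma^\mu_{\tau\sigma} e_\mu \end{aligned} \tag{4.5}$$

and hence, in a coordinate basis,

$$R^\mu{}_{\nu\rho\sigma} = \partial_\rho \Gamma^\mu_{\nu\sigma} - \partial_\sigma \Gamma^\mu_{\nu\rho} + \Gamma^\tau_{\nu\sigma} \Gamma^\mu_{\tau\rho} - \Gamma^\tau_{\nu\rho} \Gamma^\mu_{\tau\sigma} \tag{4.6}$$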

Example. For the Levi-Civita connection in Minkowski spacetime, we can choose inertial frame coordinates, for which $\Gamma^\mu_{\nu\rho}$ vanishes everywhere. Hence $R^\mu{}_{\nu\rho\sigma} = 0$ everywhere. If a tensor vanishes w.r.t. one basis then it vanishes in all bases, so we have $R^a{}_{bcd} = 0$. Such a connection, whose Riemann tensor vanishes everywhere, is called flat. Conversely, if the Levi-Civita connection associated with a Lorentzian metric is flat then, for any point $p$, one can find a neighbourhood $O$ of $p$ and coordinates $x^\mu$ defined on $O$ such that $g_{\mu\nu} = \mathrm{diag}(-1, 1, \ldots, 1)$. (Such coordinates will not exist globally if the manifold has non-trivial topology. We say that the spacetime is “locally isometric” to Minkowski spacetime.)
The following contraction of the Riemann tensor plays an important role in GR:
Definition. The Ricci curvature tensor is the $(0, 2)$ tensor defined by

$$R_{ab} = R^c{}_{acb} \tag{4.7}$$
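Formula (4.6) and the contraction (4.7) translate directly into code. The following is a minimal sketch, not a general-purpose tool: it assumes a user-supplied function returning $\Gamma^\mu_{\nu\rho}$ (here the exact symbols of the unit 2-sphere, a hypothetical test case), takes the partial derivatives of $\Gamma$ by central differences, and should reproduce $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$ and $R_{ab} = g_{ab}$ for the unit sphere.

```python
import math


def christoffel_sphere(x):
    """G[mu][nu][rho] = Gamma^mu_{nu rho} for the (assumed) unit 2-sphere, x = (theta, phi)."""
    th = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(th) * math.cos(th)               # Gamma^theta_{phi phi}
    G[1][0][1] = G[1][1][0] = math.cos(th) / math.sin(th)   # Gamma^phi_{theta phi}
    return G


def riemann(christoffel, x, h=1e-5):
    """R[mu][nu][rho][sig] = R^mu_{nu rho sig} from eq. (4.6).

    Partial derivatives of Gamma are approximated by central differences."""
    n = len(x)
    G = christoffel(x)
    dG = []  # dG[k][a][b][c] = partial_k Gamma^a_{bc}
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Gp, Gm = christoffel(xp), christoffel(xm)
        dG.append([[[(Gp[a][b][c] - Gm[a][b][c]) / (2 * h) for c in range(n)]
                    for b in range(n)] for a in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for mu in range(n):
        for nu in range(n):
            for rho in range(n):
                for sig in range(n):
                    R[mu][nu][rho][sig] = (
                        dG[rho][mu][nu][sig] - dG[sig][mu][nu][rho]
                        + sum(G[t][nu][sig] * G[mu][t][rho] - G[t][nu][rho] * G[mu][t][sig]
                              for t in range(n)))
    return R


def ricci(R):
    """R_ab = R^c_{acb}, eq. (4.7)."""
    n = len(R)
    return [[sum(R[c][a][c][b] for c in range(n)) for b in range(n)] for a in range(n)]


theta = 0.7
R = riemann(christoffel_sphere, [theta, 0.2])
Ric = ricci(R)
print(R[0][1][0][1], math.sin(theta) ** 2)   # R^theta_{phi theta phi}: numeric vs exact
print(Ric)                                   # close to diag(1, sin^2 theta) = g
```

Passing a different Christoffel function, for example one built numerically from a metric, reuses the same contraction code unchanged.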

We saw earlier that, with vanishing torsion, the second covariant derivatives of a
function commute. The same is not true of covariant derivatives of tensor fields. The
failure to commute arises from the Riemann tensor:
Exercise. Let ∇ be a torsion-free connection. Prove the Ricci identity:

$$\nabla_c \nabla_d Z^a - \nabla_d \nabla_c Z^a = R^a{}_{bcd} Z^b \tag{4.8}$$

Hint. Show that the equation is true when multiplied by arbitrary vector fields X c and
Y d.

4.3 Parallel transport again

Now we return to the relation between the Riemann tensor and the path-dependence of parallel transport. Let $X$ and $Y$ be vector fields that are linearly independent everywhere, with $[X, Y] = 0$. Earlier we saw that we can choose a coordinate chart
(s, t, . . .) such that X = ∂/∂s and Y = ∂/∂t. Let p ∈ M and choose the coordinate
chart such that p has coordinates (0, . . . , 0). Let q, r, u be the points with coordinates
(δs, 0, 0, . . .), (δs, δt, 0, . . .), (0, δt, 0, . . .) respectively, where δs and δt are small. We
can connect p and q with a curve along which only s varies, with tangent X. Similarly,
q and r can be connected by a curve with tangent Y . p and u can be connected by
a curve with tangent Y , and u and r can be connected by a curve with tangent X.
The result is a small quadrilateral (Fig. 9). (Note that if [X, Y] ≠ 0 then the integral curves of X, Y would not form such a quadrilateral.)

[Figure 9. Parallel transport: the quadrilateral with vertices p(0, 0, …, 0), q(δs, 0, …, 0), r(δs, δt, 0, …, 0) and u(0, δt, 0, …, 0); the sides pq and ur have tangent X, the sides qr and pu have tangent Y.]

Now let $Z_p \in T_p(M)$. Parallel transport $Z_p$ along $pqr$ to obtain a vector $Z_r \in T_r(M)$. Parallel transport $Z_p$ along $pur$ to obtain a vector $Z'_r \in T_r(M)$. We shall calculate the difference $Z'_r - Z_r$ for a torsion-free connection.
It is convenient to introduce a new coordinate chart: normal coordinates at $p$. Henceforth, indices $\mu, \nu, \ldots$ will refer to this chart. $s$ and $t$ will now be used as parameters along the curves with tangent $X$ and $Y$ respectively.
$pq$ is a curve with tangent vector $X$ and parameter $s$. Along $pq$, $Z$ is parallelly transported: $\nabla_X Z = 0$ so $dZ^\mu/ds = -\Gamma^\mu_{\nu\rho} Z^\nu X^\rho$ and hence $d^2 Z^\mu/ds^2 = -(\Gamma^\mu_{\nu\rho} Z^\nu X^\rho)_{,\sigma} X^\sigma$. Now Taylor’s theorem gives

$$\begin{aligned} Z^\mu_q &= Z^\mu_p + \left.\frac{dZ^\mu}{ds}\right|_p \delta s + \frac{1}{2} \left.\frac{d^2 Z^\mu}{ds^2}\right|_p \delta s^2 + O(\delta s^3) \\ &= Z^\mu_p - \frac{1}{2} \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu X^\rho X^\sigma\right)_p \delta s^2 + O(\delta s^3) \end{aligned} \tag{4.9}$$
where we have used $\Gamma^\mu_{\nu\rho}(p) = 0$ in normal coordinates at $p$ (assuming a torsion-free connection). Now consider parallel transport along $qr$ to obtain

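$$\begin{aligned} Z^\mu_r &= Z^\mu_q + \left.\frac{dZ^\mu}{dt}\right|_q \delta t + \frac{1}{2} \left.\frac{d^2 Z^\mu}{dt^2}\right|_q \delta t^2 + O(\delta t^3) \\ &= Z^\mu_q - \left(\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho\right)_q \delta t - \frac{1}{2} \left((\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} Y^\sigma\right)_q \delta t^2 + O(\delta t^3) \end{aligned} \tag{4.10}$$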

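Consider the behaviour of $\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho$ along the curve $pq$ with parameter $s$. This quantity vanishes at $p$ so Taylor expanding in $s$ gives

$$\begin{aligned} \left(\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho\right)_q &= \left.\frac{d}{ds}\left(\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho\right)\right|_p \delta s + O(\delta s^2) = \left((\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} X^\sigma\right)_p \delta s + O(\delta s^2) \\ &= \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho X^\sigma\right)_p \delta s + O(\delta s^2) \end{aligned} \tag{4.11}$$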

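We also have

$$\left((\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} Y^\sigma\right)_q = \left((\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} Y^\sigma\right)_p + O(\delta s) \tag{4.12}$$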
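Hence

$$\begin{aligned} Z^\mu_r &= Z^\mu_q - \left[\left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho X^\sigma\right)_p \delta s + O(\delta s^2)\right] \delta t - \frac{1}{2} \left[\left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho Y^\sigma\right)_p + O(\delta s)\right] \delta t^2 + O(\delta t^3) \\ &= Z^\mu_p - \frac{1}{2} \left[\Gamma^\mu_{\nu\rho,\sigma} Z^\nu \left(X^\rho X^\sigma \delta s^2 + Y^\rho Y^\sigma \delta t^2 + 2 Y^\rho X^\sigma \delta s\, \delta t\right)\right]_p + O(\delta^3) \end{aligned} \tag{4.13}$$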

Here we assume that δs and δt both are O(δ) (i.e. δs = aδ for some non-zero constant
a and similarly for δt). Now consider parallel transport along pur. The result can be
obtained from the above expression simply by interchanging X with Y and s with t.
Hence we have
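$$\begin{aligned} \Delta Z^\mu_r \equiv Z'^\mu_r - Z^\mu_r &= \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu (Y^\rho X^\sigma - X^\rho Y^\sigma)\right)_p \delta s\, \delta t + O(\delta^3) \\ &= \left((\Gamma^\mu_{\nu\sigma,\rho} - \Gamma^\mu_{\nu\rho,\sigma}) Z^\nu X^\rho Y^\sigma\right)_p \delta s\, \delta t + O(\delta^3) \\ &= \left(R^\mu{}_{\nu\rho\sigma} Z^\nu X^\rho Y^\sigma\right)_p \delta s\, \delta t + O(\delta^3) \\ &= \left(R^\mu{}_{\nu\rho\sigma} Z^\nu X^\rho Y^\sigma\right)_r \delta s\, \delta t + O(\delta^3) \end{aligned} \tag{4.14}$$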

where we used the expression (4.6) for the Riemann tensor components (remember that
Γµνρ (p) = 0). In the final equality we used that quantities at p and r differ by O(δ).
We have derived this result in a coordinate basis defined using normal coordinates at
p. But now both sides involve tensors at r. Hence our equation is basis-independent
so we can write
$$\left(R^a{}_{bcd} Z^b X^c Y^d\right)_r = \lim_{\delta \to 0} \frac{\Delta Z^a_r}{\delta s\, \delta t} \tag{4.15}$$


The Riemann tensor measures the path-dependence of parallel transport.

Remark. We considered parallel transport along two different curves from $p$ to $r$. However, we can reinterpret the result as describing the effect of parallel transport of a vector $Z^a_r$ around the closed curve $rqpur$ to give the vector $Z'^a_r$. Hence $\Delta Z^a_r$ measures the change in $Z^a_r$ when parallel transported around a closed curve.
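Equation (4.14) can be tested directly on an example. The sketch below assumes the unit 2-sphere with $X = \partial/\partial\theta$ and $Y = \partial/\partial\phi$ (so $[X, Y] = 0$), and uses closed-form solutions of (4.1) along these coordinate curves: along a meridian, $Z^\theta$ and $\sin\theta\, Z^\phi$ are constant, and along a circle of latitude the orthonormal components $(Z^\theta, \sin\theta\, Z^\phi)$ rotate by the angle $\cos\theta\,\Delta\phi$. It transports $Z_p$ along $pqr$ and along $pur$, and compares $\Delta Z_r/(\delta s\,\delta t)$ with $R^\mu{}_{\nu\theta\phi} Z^\nu$, using $R^\theta{}_{\phi\theta\phi} = \sin^2\theta$ and $R^\phi{}_{\theta\theta\phi} = -1$ for the unit sphere. The helper names are hypothetical.

```python
import math


def along_meridian(Z, th0, th1):
    """Transport Z = (Z^theta, Z^phi) along phi = const from theta = th0 to th1.

    With tangent d/dtheta, eq. (4.1) gives dZ^theta/dtheta = 0 and
    dZ^phi/dtheta = -cot(theta) Z^phi, so sin(theta) Z^phi is constant."""
    return (Z[0], Z[1] * math.sin(th0) / math.sin(th1))


def along_latitude(Z, th, dph):
    """Transport Z along theta = th through a change dph in phi.

    The orthonormal components (Z^theta, sin(th) Z^phi) rotate by the angle cos(th) * dph."""
    s, a = math.sin(th), math.cos(th) * dph
    u, w = Z[0], s * Z[1]
    return (u * math.cos(a) + w * math.sin(a), (-u * math.sin(a) + w * math.cos(a)) / s)


def holonomy_ratio(Zp, th0, ds, dt):
    """Return Delta Z_r / (ds dt) and Z_r, with p = (th0, 0), q = (th0 + ds, 0), r = (th0 + ds, dt)."""
    th1 = th0 + ds
    Z_pqr = along_latitude(along_meridian(Zp, th0, th1), th1, dt)   # p -> q -> r
    Z_pur = along_meridian(along_latitude(Zp, th0, dt), th0, th1)   # p -> u -> r
    ratio = tuple((a - b) / (ds * dt) for a, b in zip(Z_pur, Z_pqr))
    return ratio, Z_pqr


def riemann_prediction(Z, th):
    """R^mu_{nu theta phi} Z^nu on the unit sphere at colatitude th."""
    return (math.sin(th) ** 2 * Z[1], -Z[0])


th0, Zp = 1.0, (0.3, 0.7)
for d in (1e-2, 1e-3):
    ratio, Zr = holonomy_ratio(Zp, th0, d, d)
    print(d, ratio, riemann_prediction(Zr, th0 + d))
```

The mismatch between the two printed pairs should shrink roughly in proportion to $\delta$, reflecting the $O(\delta^3)$ remainder in (4.14).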

4.4 Symmetries of the Riemann tensor

From its definition, we have the symmetry $R^a{}_{bcd} = -R^a{}_{bdc}$, equivalently:

$$R^a{}_{b(cd)} = 0. \tag{4.16}$$

Proposition. If $\nabla$ is torsion-free then

$$R^a{}_{[bcd]} = 0. \tag{4.17}$$

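Proof. Let $p \in M$ and choose normal coordinates at $p$. Vanishing torsion implies $\Gamma^\mu_{\nu\rho}(p) = 0$ and $\Gamma^\mu_{[\nu\rho]} = 0$ everywhere. We have $R^\mu{}_{\nu\rho\sigma} = \partial_\rho \Gamma^\mu_{\nu\sigma} - \partial_\sigma \Gamma^\mu_{\nu\rho}$ at $p$. Antisymmetrizing on $\nu\rho\sigma$, and using the symmetry of $\Gamma^\mu_{\nu\sigma}$ in $\nu\sigma$, now gives $R^\mu{}_{[\nu\rho\sigma]} = 0$ at $p$ in the coordinate basis defined using normal coordinates at $p$. But if the components of a tensor vanish in one basis then they vanish in any basis. This proves the result at $p$. However, $p$ is arbitrary so the result holds everywhere.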
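Proposition. (Bianchi identity). If $\nabla$ is torsion-free then

$$R^a{}_{b[cd;e]} = 0 \tag{4.18}$$

Proof. Use normal coordinates at $p$ again. At $p$,

$$R^\mu{}_{\nu\rho\sigma;\tau} = \partial_\tau R^\mu{}_{\nu\rho\sigma} \tag{4.19}$$

Schematically, $\partial R = \partial\partial\Gamma - \Gamma\partial\Gamma$ in normal coordinates at $p$; the latter terms vanish at $p$, so we need only consider the former:

$$R^\mu{}_{\nu\rho\sigma;\tau} = \partial_\tau \partial_\rho \Gamma^\mu_{\nu\sigma} - \partial_\tau \partial_\sigma \Gamma^\mu_{\nu\rho} \quad \text{at } p \tag{4.20}$$

Antisymmetrizing on $\rho\sigma\tau$ gives $R^\mu{}_{\nu[\rho\sigma;\tau]} = 0$ at $p$ in this basis. But again, if this is true in one basis then it is true in any basis. Furthermore, $p$ is arbitrary. The result follows.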


4.5 Geodesic deviation

Remark. In Euclidean space, or in Minkowski spacetime, initially parallel geodesics remain parallel forever. On a general manifold we have no notion of “parallel”. However, we can study whether nearby geodesics move together or apart. In particular, we can quantify their “relative acceleration”.
Definition. Let $M$ be a manifold with a connection $\nabla$. A 1-parameter family of geodesics is a map $\gamma: I \times I' \to M$ where $I$ and $I'$ are both open intervals in $\mathbb{R}$, such that (i) for fixed $s$, $\gamma(s, t)$ is a geodesic with affine parameter $t$ (so $s$ is the parameter that labels the geodesic); (ii) the map $(s, t) \mapsto \gamma(s, t)$ is smooth and one-to-one with a smooth inverse. This implies that the family of geodesics forms a 2d surface $\Sigma \subset M$.
Let $T$ be the tangent vector to the geodesics and $S$ the vector tangent to the curves of constant $t$, which are parameterized by $s$ (see Fig. 10). In a chart $x^\mu$, the geodesics are specified by $x^\mu(s, t)$ with $S^\mu = \partial x^\mu/\partial s$. Hence $x^\mu(s + \delta s, t) = x^\mu(s, t) + \delta s\, S^\mu(s, t) + O(\delta s^2)$. Therefore $(\delta s) S^a$ points from one geodesic to an infinitesimally nearby one in the family. We call $S^a$ a deviation vector.

[Figure 10. 1-parameter family of geodesics: curves of constant s, with tangent T, crossed by curves of constant t, with tangent S.]
On the surface Σ we can use s and t as coordinates. We can extend these to
coordinates (s, t, . . .) defined in a neighbourhood of Σ. This gives a coordinate chart
in which S = ∂/∂s and T = ∂/∂t on Σ. We can now use these equations to extend S
and T to a neighbourhood of the surface. S and T are now vector fields satisfying

$$[S, T] = 0 \tag{4.21}$$

Remark. If we fix attention on a particular geodesic then $\nabla_T(\delta s\, S) = \delta s\, \nabla_T S$ can be regarded as the rate of change of the relative position of an infinitesimally nearby geodesic in the family, i.e., as the “relative velocity” of an infinitesimally nearby geodesic. We can define the “relative acceleration” of an infinitesimally nearby geodesic in the family as $\delta s\, \nabla_T \nabla_T S$. The word “relative” is important: the acceleration of a curve with tangent $T$ is $\nabla_T T$, which vanishes here (as the curves are geodesics).
Proposition. If $\nabla$ has vanishing torsion then

$$\nabla_T \nabla_T S = R(T, S)T \tag{4.22}$$

Proof. Vanishing torsion gives $\nabla_T S - \nabla_S T = [T, S] = 0$. Hence

$$\nabla_T \nabla_T S = \nabla_T \nabla_S T = \nabla_S \nabla_T T + R(T, S)T, \tag{4.23}$$

where we used the definition of the Riemann tensor (together with $[T, S] = 0$). But $\nabla_T T = 0$ on $\Sigma$ because $T$ is tangent to (affinely parameterized) geodesics, and hence (as $S$ is tangent to $\Sigma$) $\nabla_S \nabla_T T = 0$.
Remark. This result is known as the geodesic deviation equation. In abstract index
notation it is:
$$T^c \nabla_c (T^b \nabla_b S^a) = R^a{}_{bcd} T^b T^c S^d \tag{4.24}$$
This equation shows that curvature results in relative acceleration of geodesics. It also
provides a method of measuring Ra bcd : at any point p we can pick our 1-parameter
family of geodesics such that T and S are arbitrary. Hence by measuring the LHS
above we can determine Ra (bc)d . From this we can determine Ra bcd :
Exercise. Show that, for a torsion-free connection,
$$R^a{}_{bcd} = \frac{2}{3}\left(R^a{}_{(bc)d} - R^a{}_{(bd)c}\right) \tag{4.25}$$
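The algebra behind (4.25) uses only the symmetries (4.16) and (4.17). A quick numerical check, assuming nothing beyond those symmetries: build random components with both symmetries imposed (hypothetical test data in $n = 4$ dimensions) and evaluate the residual of (4.25). This is a sanity check of the identity, not a proof.

```python
import itertools
import random


def random_riemann_like(n, seed=0):
    """Random components R[a, b, c, d] obeying R^a_{b(cd)} = 0 and R^a_{[bcd]} = 0 (test data only)."""
    rng = random.Random(seed)
    idx = list(itertools.product(range(n), repeat=4))
    raw = {i: rng.uniform(-1.0, 1.0) for i in idx}
    # Antisymmetrize in the last two slots, as in (4.16).
    anti = {(a, b, c, d): 0.5 * (raw[a, b, c, d] - raw[a, b, d, c]) for (a, b, c, d) in idx}
    # Subtract one third of the cyclic sum over (b, c, d); this enforces (4.17)
    # and preserves the antisymmetry in (c, d).
    return {(a, b, c, d): anti[a, b, c, d]
            - (anti[a, b, c, d] + anti[a, c, d, b] + anti[a, d, b, c]) / 3.0
            for (a, b, c, d) in idx}


def sym_first_pair(R, a, b, c, d):
    """R^a_{(bc)d} = (R^a_{bcd} + R^a_{cbd}) / 2."""
    return 0.5 * (R[a, b, c, d] + R[a, c, b, d])


def residual_425(R, n):
    """max |R^a_{bcd} - (2/3)(R^a_{(bc)d} - R^a_{(bd)c})| over all components."""
    return max(abs(R[a, b, c, d]
                   - (2.0 / 3.0) * (sym_first_pair(R, a, b, c, d) - sym_first_pair(R, a, b, d, c)))
               for (a, b, c, d) in itertools.product(range(n), repeat=4))


R = random_riemann_like(4)
print(residual_425(R, 4))   # expected at round-off level
```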

Remarks.

1. Note that the relative acceleration vanishes for all families of geodesics if, and
only if, the connection is flat, i.e., Ra bcd = 0.

2. In GR, free particles follow geodesics of the Levi-Civita connection. Geodesic deviation is the tendency of freely falling particles to move together or apart. Physically, this corresponds to tidal forces arising from inhomogeneity of the gravitational field. Hence the Riemann tensor is the quantity that measures tidal forces.


4.6 Curvature of the Levi-Civita connection

Remark. From now on, we shall restrict attention to a manifold with metric, and use
the Levi-Civita connection. The Riemann tensor then enjoys additional symmetries.
Note that we can lower an index with the metric to define Rabcd .
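Proposition. The Riemann tensor satisfies

$$R_{abcd} = R_{cdab}, \qquad R_{(ab)cd} = 0. \tag{4.26}$$

Proof. The second identity follows from the first and the antisymmetry of the Riemann tensor in its final pair of indices. To prove the first, introduce normal coordinates at $p$, so $\partial_\mu g_{\nu\rho} = 0$ at $p$. Then, at $p$,

$$0 = \partial_\mu \delta^\nu_\rho = \partial_\mu (g^{\nu\sigma} g_{\sigma\rho}) = g_{\sigma\rho} \partial_\mu g^{\nu\sigma}. \tag{4.27}$$

Multiplying by the inverse metric gives $\partial_\mu g^{\nu\rho} = 0$ at $p$. Using this, we have

$$\partial_\rho \Gamma^\tau_{\nu\sigma} = \frac{1}{2} g^{\tau\mu} \left(g_{\mu\nu,\sigma\rho} + g_{\mu\sigma,\nu\rho} - g_{\nu\sigma,\mu\rho}\right) \quad \text{at } p \tag{4.28}$$

And hence (as $\Gamma^\mu_{\nu\rho} = 0$ at $p$)

$$R_{\mu\nu\rho\sigma} = \frac{1}{2} \left(g_{\mu\sigma,\nu\rho} + g_{\nu\rho,\mu\sigma} - g_{\nu\sigma,\mu\rho} - g_{\mu\rho,\nu\sigma}\right) \quad \text{at } p \tag{4.29}$$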
This satisfies Rµνρσ = Rρσµν at p using the symmetry of the metric and the fact that
partial derivatives commute. This establishes the identity in normal coordinates, but
this is a tensor equation and hence valid in any basis. Furthermore p is arbitrary so
the identity holds everywhere.
Proposition. The Ricci tensor is symmetric:

$$R_{ab} = R_{ba} \tag{4.30}$$

Proof. $R_{ab} = g^{cd} R_{dacb} = g^{cd} R_{cbda} = R^c{}_{bca} = R_{ba}$, where we used the first identity above in the second equality.
Definition. The Ricci scalar is

$$R = g^{ab} R_{ab} \tag{4.31}$$

Definition. The Einstein tensor is the symmetric $(0, 2)$ tensor defined by

$$G_{ab} = R_{ab} - \frac{1}{2} R g_{ab} \tag{4.32}$$


Proposition. The Einstein tensor satisfies the contracted Bianchi identity:

$$\nabla^a G_{ab} = 0 \tag{4.33}$$

which can also be written as

$$\nabla^a R_{ab} - \frac{1}{2} \nabla_b R = 0 \tag{4.34}$$

Proof. Examples sheet 2.
Exercise (examples sheet 2). Consider a Lorentzian manifold and choose normal coordinates at $p$. Show that, in a neighbourhood of $p$, we have

$$g_{\mu\nu} = \eta_{\mu\nu} - \frac{1}{3} R_{\mu\rho\nu\sigma} X^\rho X^\sigma + O(X^3) \tag{4.35}$$
So in normal coordinates at p, the metric agrees with the Minkowski metric up to first
order in X but deviations from Minkowski appear at second order.
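The Riemannian analogue of (4.35), with $\delta_{\mu\nu}$ in place of $\eta_{\mu\nu}$, can be checked on the unit 2-sphere. This sketch assumes normal coordinates at the north pole built from an orthonormal basis, where the exact metric is $g_{ij} = \delta_{ij} + f(r^2)(r^2\delta_{ij} - X_i X_j)$ with $f(r^2) = ((\sin r/r)^2 - 1)/r^2$, and where the unit-sphere curvature at $p$ is $R_{ikjl} = \delta_{ij}\delta_{kl} - \delta_{il}\delta_{kj}$. Since $f \to -1/3$ as $r \to 0$, the leading correction matches (4.35); the code measures how fast the remainder shrinks along a fixed, arbitrarily chosen direction.

```python
import math


def f(r2):
    """((sin r / r)^2 - 1) / r^2 in terms of r2 = r^2 (Taylor series near r = 0)."""
    if r2 < 1e-6:
        return -1.0 / 3.0 + 2.0 * r2 / 45.0
    r = math.sqrt(r2)
    return ((math.sin(r) / r) ** 2 - 1.0) / r2


def delta(i, j):
    return 1.0 if i == j else 0.0


def exact_metric(X):
    """Assumed example: unit sphere in normal coordinates at the north pole."""
    r2 = X[0] ** 2 + X[1] ** 2
    return [[delta(i, j) + f(r2) * (r2 * delta(i, j) - X[i] * X[j]) for j in range(2)]
            for i in range(2)]


def riemann_at_p(i, k, j, l):
    """R_{ikjl} at p for the unit sphere, where g(p) = delta."""
    return delta(i, j) * delta(k, l) - delta(i, l) * delta(k, j)


def approx_metric(X):
    """Riemannian analogue of (4.35): delta_ij - (1/3) R_{ikjl} X^k X^l."""
    return [[delta(i, j) - sum(riemann_at_p(i, k, j, l) * X[k] * X[l]
                               for k in range(2) for l in range(2)) / 3.0
             for j in range(2)] for i in range(2)]


def error(eps, direction=(0.6, 0.8)):
    """Largest component of g - approx_metric at X = eps * direction."""
    X = [eps * direction[0], eps * direction[1]]
    g, g_approx = exact_metric(X), approx_metric(X)
    return max(abs(g[i][j] - g_approx[i][j]) for i in range(2) for j in range(2))


for eps in (0.2, 0.1, 0.05):
    print(eps, error(eps))
```

For this metric halving $|X|$ divides the remainder by roughly 16, i.e. it falls off like $|X|^4$, which is consistent with the $O(X^3)$ remainder in (4.35).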

5 Diffeomorphisms and Lie derivative

5.1 Maps between manifolds

Definition. Let $M$, $N$ be differentiable manifolds of dimension $m$, $n$ respectively. A function $\phi: M \to N$ is smooth if, and only if, $\psi_A \circ \phi \circ \psi_\alpha^{-1}$ is smooth for all charts $\psi_\alpha$ of $M$ and all charts $\psi_A$ of $N$ (note that this is a map from a subset of $\mathbb{R}^m$ to a subset of $\mathbb{R}^n$).
If we have such a map then we can “pull-back” a function on $N$ to define a function on $M$:
Definition. Let $\phi: M \to N$ and $f: N \to \mathbb{R}$ be smooth functions. The pull-back of $f$ by $\phi$ is the function $\phi^*(f): M \to \mathbb{R}$ defined by $\phi^*(f) = f \circ \phi$, i.e., $\phi^*(f)(p) = f(\phi(p))$.
Furthermore, $\phi$ allows us to “push-forward” a curve $\lambda$ in $M$ to a curve $\phi \circ \lambda$ in $N$. Hence we can push-forward vectors from $M$ to $N$ (Figs. 11, 12).
Definition. Let $\phi: M \to N$ be smooth. Let $p \in M$ and $X \in T_p(M)$. The push-forward of $X$ with respect to $\phi$ is the vector $\phi_*(X) \in T_{\phi(p)}(N)$ defined as follows. Let $\lambda$ be a smooth curve in $M$ passing through $p$ with tangent $X$ at $p$. Then $\phi_*(X)$ is the tangent vector to the curve $\phi \circ \lambda$ in $N$ at the point $\phi(p)$.
Lemma. Let $f: N \to \mathbb{R}$. Then $(\phi_*(X))(f) = X(\phi^*(f))$.


[Figure 11. A curve λ in M through p, with tangent X. Figure 12. The curve φ ∘ λ in N through φ(p), with tangent φ_*(X).]

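Proof. Wlog $\lambda(0) = p$.

$$\begin{aligned} (\phi_*(X))(f) &= \left.\frac{d}{dt} \big(f \circ (\phi \circ \lambda)\big)(t)\right|_{t=0} \\ &= \left.\frac{d}{dt} \big((f \circ \phi) \circ \lambda\big)(t)\right|_{t=0} \\ &= X(\phi^*(f)) \end{aligned} \tag{5.1}$$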

Exercise. Let $x^\mu$ be coordinates on $M$ and $y^\alpha$ be coordinates on $N$ (we use different indices $\alpha, \beta$ etc. for $N$ because $N$ is a different manifold which might not have the same dimension as $M$). Then we can regard $\phi$ as defining a map $y^\alpha(x^\mu)$. Show that the coordinate basis components of $\phi_*(X)$ are related to those of $X$ by

$$(\phi_*(X))^\alpha = \left(\frac{\partial y^\alpha}{\partial x^\mu}\right)_p X^\mu \tag{5.2}$$
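Formula (5.2) can be compared numerically with the curve-based definition of the push-forward. This sketch assumes the embedding $y^\alpha = (\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$ of $S^2$ in $\mathbb{R}^3$ from the example later in this section (the map is called `embed` in the code to avoid a clash with the coordinate $\phi$), a hand-written Jacobian, and the curve $\lambda(t) = x_p + tX_p$ in the chart; the point and vector are arbitrary choices.

```python
import math


def embed(x):
    """The map phi: S^2 -> R^3, x = (theta, phi) -> (sin th cos ph, sin th sin ph, cos th)."""
    th, ph = x
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))


def jacobian(x):
    """J[alpha][mu] = dy^alpha/dx^mu at x, written out by hand."""
    th, ph = x
    return [[math.cos(th) * math.cos(ph), -math.sin(th) * math.sin(ph)],
            [math.cos(th) * math.sin(ph), math.sin(th) * math.cos(ph)],
            [-math.sin(th), 0.0]]


def push_forward_by_components(x, X):
    """Eq. (5.2): (phi_* X)^alpha = (dy^alpha/dx^mu)_p X^mu."""
    J = jacobian(x)
    return tuple(sum(J[a][m] * X[m] for m in range(2)) for a in range(3))


def push_forward_by_curve(x, X, h=1e-6):
    """Definition: tangent at t = 0 to phi o lambda, with lambda(t) = x + t X in the chart."""
    yp = embed((x[0] + h * X[0], x[1] + h * X[1]))
    ym = embed((x[0] - h * X[0], x[1] - h * X[1]))
    return tuple((a - b) / (2 * h) for a, b in zip(yp, ym))


p, X_p = (0.9, 0.4), (0.5, -1.2)
print(push_forward_by_components(p, X_p))
print(push_forward_by_curve(p, X_p))   # should agree up to finite-difference error
```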

The map on covectors works in the opposite direction:
Definition. Let $\phi: M \to N$ be smooth. Let $p \in M$ and $\eta \in T^*_{\phi(p)}(N)$. The pull-back of $\eta$ with respect to $\phi$ is $\phi^*(\eta) \in T^*_p(M)$ defined by $(\phi^*(\eta))(X) = \eta(\phi_*(X))$ for any $X \in T_p(M)$.
Lemma. Let $f: N \to \mathbb{R}$. Then $\phi^*(df) = d(\phi^*(f))$ (the gradient commutes with pull-back).
Proof. Let $X \in T_p(M)$. Then

$$(\phi^*(df))(X) = (df)(\phi_*(X)) = (\phi_*(X))(f) = X(\phi^*(f)) = (d(\phi^*(f)))(X) \tag{5.3}$$

The first equality is the definition of $\phi^*$, the second is the definition of $df$, the third is the previous Lemma and the fourth is the definition of $d(\phi^*(f))$. Since $X$ is arbitrary, the result follows.


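Exercise. Use coordinates $x^\mu$ and $y^\alpha$ as before. Show that the components of $\phi^*(\eta)$ are related to the components of $\eta$ by

$$(\phi^*(\eta))_\mu = \left(\frac{\partial y^\alpha}{\partial x^\mu}\right)_p \eta_\alpha \tag{5.4}$$

Remark. The pull-back can be extended to a tensor $S$ of type $(0, s)$ by defining $(\phi^*(S))(X_1, \ldots, X_s) = S(\phi_*(X_1), \ldots, \phi_*(X_s))$ where $X_1, \ldots, X_s \in T_p(M)$. Similarly, one can push-forward a tensor of type $(r, 0)$ by defining $\phi_*(T)(\eta_1, \ldots, \eta_r) = T(\phi^*(\eta_1), \ldots, \phi^*(\eta_r))$ where $\eta_1, \ldots, \eta_r \in T^*_{\phi(p)}(N)$. The components of these tensors in a coordinate basis are given by

$$(\phi^*(S))_{\mu_1 \ldots \mu_s} = \left(\frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}}\right)_p \cdots \left(\frac{\partial y^{\alpha_s}}{\partial x^{\mu_s}}\right)_p S_{\alpha_1 \ldots \alpha_s} \tag{5.5}$$

$$(\phi_*(T))^{\alpha_1 \ldots \alpha_r} = \left(\frac{\partial y^{\alpha_1}}{\partial x^{\mu_1}}\right)_p \cdots \left(\frac{\partial y^{\alpha_r}}{\partial x^{\mu_r}}\right)_p T^{\mu_1 \ldots \mu_r} \tag{5.6}$$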

Example. The embedding of $S^2$ into Euclidean space. Let $M = S^2$ and $N = \mathbb{R}^3$. Define $\phi: M \to N$ as the map which sends the point on $S^2$ with spherical polar coordinates $x^\mu = (\theta, \phi)$ to the point $y^\alpha = (\sin\theta \cos\phi, \sin\theta \sin\phi, \cos\theta) \in \mathbb{R}^3$. Now consider the Euclidean metric $g$ on $\mathbb{R}^3$, whose components w.r.t. Cartesian coordinates $y^\alpha$ are $\delta_{\alpha\beta}$. Pulling this back to $S^2$ using (5.5) gives $(\phi^* g)_{\mu\nu} = \mathrm{diag}(1, \sin^2\theta)$ (check!) (this is the “unit round metric” on $S^2$).
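The “(check!)” can also be done numerically. This minimal sketch evaluates (5.5) for $S = \delta_{\alpha\beta}$, i.e. $(\phi^* g)_{\mu\nu} = \sum_\alpha (\partial y^\alpha/\partial x^\mu)(\partial y^\alpha/\partial x^\nu)$, with the Jacobian approximated by central differences of a hand-picked step; the map is called `embed` in the code and the sample point is arbitrary.

```python
import math


def embed(x):
    """The map of the example: (theta, phi) -> (sin th cos ph, sin th sin ph, cos th)."""
    th, ph = x
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))


def numerical_jacobian(x, h=1e-6):
    """J[alpha][mu] = dy^alpha/dx^mu by central differences."""
    J = [[0.0, 0.0] for _ in range(3)]
    for mu in range(2):
        xp, xm = list(x), list(x)
        xp[mu] += h
        xm[mu] -= h
        yp, ym = embed(xp), embed(xm)
        for alpha in range(3):
            J[alpha][mu] = (yp[alpha] - ym[alpha]) / (2 * h)
    return J


def pull_back_euclidean_metric(x):
    """Eq. (5.5) with S = delta_{alpha beta}: (phi^* g)_{mu nu} = sum_alpha J[alpha][mu] J[alpha][nu]."""
    J = numerical_jacobian(x)
    return [[sum(J[a][m] * J[a][n] for a in range(3)) for n in range(2)] for m in range(2)]


theta = 1.1
print(pull_back_euclidean_metric((theta, 0.5)))   # approximately [[1, 0], [0, sin^2 theta]]
print(math.sin(theta) ** 2)
```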

5.2 Diffeomorphisms

Definition. A map $\phi: M \to N$ is a diffeomorphism iff it is one-to-one and onto, smooth, and has a smooth inverse.
Remark. This implies that M and N have the same dimension. In fact, M and N
have identical manifold structure.
With a diffeomorphism, we can extend our definitions of push-forward and pull-
back so that they apply for any type of tensor:
Definition. Let $\phi: M \to N$ be a diffeomorphism and $T$ a tensor of type $(r, s)$ on $M$. Then the push-forward of $T$ is a tensor $\phi_*(T)$ of type $(r, s)$ on $N$ defined by (for arbitrary $\eta_i \in T^*_{\phi(p)}(N)$, $X_i \in T_{\phi(p)}(N)$)

$$\phi_*(T)(\eta_1, \ldots, \eta_r, X_1, \ldots, X_s) = T(\phi^*(\eta_1), \ldots, \phi^*(\eta_r), (\phi^{-1})_*(X_1), \ldots, (\phi^{-1})_*(X_s)) \tag{5.7}$$
Exercises.


1. Convince yourself that push-forward commutes with the contraction and outer
product operations.

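2. Show that the analogue of equation (5.6) for a $(1, 1)$ tensor field is

$$[(\phi_*(T))^\mu{}_\nu]_{\phi(p)} = \left(\frac{\partial y^\mu}{\partial x^\rho}\right)_p \left(\frac{\partial x^\sigma}{\partial y^\nu}\right)_{\phi(p)} (T^\rho{}_\sigma)_p \tag{5.8}$$

(We don’t need to use indices $\alpha, \beta$ etc. because now $M$ and $N$ have the same dimension.) Generalize this result to an $(r, s)$ tensor.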

Remarks.
1. Pull-back can be defined in a similar way, with the result $\phi^* = (\phi^{-1})_*$.

2. We’ve taken an “active” point of view, regarding a diffeomorphism as a map taking a point $p$ to a new point $\phi(p)$. However, there is an alternative “passive” point of view in which we consider a diffeomorphism simply as a change of chart at $p$. Consider a coordinate chart $x^\mu$ defined near $p$ and another chart $y^\mu$ defined near $\phi(p)$ (Fig. 13). Regarding the coordinates $y^\mu$ as functions on $N$, we can pull them back to define corresponding coordinates, which we also call $y^\mu$, on $M$. So now we have two coordinate systems defined near $p$. The components of tensors at $p$ in the new coordinate basis are given by the tensor transformation law, which is exactly the RHS of (5.8).

[Figure 13. Active versus passive diffeomorphism: φ maps p ∈ M (chart x^µ) to φ(p) ∈ N (chart y^µ).]

Definition. Let $\phi: M \to N$ be a diffeomorphism. Let $\nabla$ be a covariant derivative on $M$. The push-forward of $\nabla$ is a covariant derivative $\tilde{\nabla}$ on $N$ defined by

$$\tilde{\nabla}_X T = \phi_*\left(\nabla_{\phi^*(X)} (\phi^*(T))\right) \tag{5.9}$$

where $X$ is a vector field and $T$ a tensor field on $N$. (In words: pull-back $X$ and $T$ to $M$, evaluate the covariant derivative there and then push-forward the result to $N$.)
Exercises (optional!).

1. Check that this satisfies the properties of a covariant derivative.

2. Show that the Riemann tensor of $\tilde{\nabla}$ is the push-forward of the Riemann tensor of $\nabla$.

3. Let $\nabla$ be the Levi-Civita connection defined by a metric $g$ on $M$. Show that $\tilde{\nabla}$ is the Levi-Civita connection defined by the metric $\phi_*(g)$ on $N$.

Note that diffeomorphisms allow us to compare tensors defined at different points via push-forward or pull-back. This leads to a notion of a tensor field possessing symmetry:
Definition. A diffeomorphism $\phi: M \to M$ is a symmetry transformation of a tensor field $T$ iff $\phi_*(T) = T$ everywhere. A symmetry transformation of a metric tensor is called an isometry.
Example. In Minkowski spacetime, translations, rotations and Lorentz transforma-
tions are isometries.

5.3 Lie derivative

Definition. Let X be a vector field on a manifold M. Let $\phi_t : M \to M$ be the map which sends a point p ∈ M to the point parameter distance t along the integral curve of X through p (this might be defined only for small enough t). It can be shown that $\phi_t$ is a diffeomorphism.
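To make the definition of $\phi_t$ concrete, here is a minimal Python sketch for one assumed example, the rotation field $X = -y\,\partial/\partial x + x\,\partial/\partial y$ on $\mathbb{R}^2$, whose integral curves are circles so that $\phi_t$ is rotation by angle t. It checks the group law $\phi_s \circ \phi_t = \phi_{s+t}$ and that the tangent to $t \mapsto \phi_t(p)$ at t = 0 is $X_p$, using a central finite difference. The field, the point p and the tolerances are illustrative choices.

```python
import math

def X(p):
    """Example vector field X = -y d/dx + x d/dy on R^2 (Cartesian components)."""
    x, y = p
    return (-y, x)

def flow(t, p):
    """phi_t(p): move parameter distance t along the integral curve of X through p.
    For this X the integral curves are circles and phi_t is a rotation by angle t."""
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def close(a, b, tol=1e-9):
    return all(abs(u - v) < tol for u, v in zip(a, b))

def tangent_at_zero(p, h=1e-6):
    """Central-difference estimate of d/dt phi_t(p) at t = 0."""
    a, b = flow(h, p), flow(-h, p)
    return tuple((u - v) / (2 * h) for u, v in zip(a, b))

p = (1.3, -0.4)
print(close(flow(0.3, flow(0.5, p)), flow(0.8, p)))  # True: phi_s o phi_t = phi_{s+t}
print(close(tangent_at_zero(p), X(p), tol=1e-6))     # True: tangent at t = 0 is X_p
```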
Remarks.

1. Note that $\phi_0$ is the identity map and $\phi_s \circ \phi_t = \phi_{s+t}$. Hence $\phi_{-t} = (\phi_t)^{-1}$. If $\phi_t$ is defined for all $t \in \mathbb{R}$ (in which case we say the integral curves of X are complete) then these diffeomorphisms form a 1-parameter abelian group.

2. Given X we’ve defined $\phi_t$. Conversely, if one has a 1-parameter abelian group of diffeomorphisms $\phi_t$ (i.e. one satisfying the rules just mentioned) then through any point p one can consider the curve with parameter t given by $\phi_t(p)$. Define X to be the tangent to this curve at p. Doing this for all p defines a vector field X. The integral curves of X generate $\phi_t$ in the sense defined above.

3. If we use $(\phi_t)_*$ to compare tensors at different points then the parameter t controls how near the points are. In particular, in the limit t → 0, we are comparing tensors at infinitesimally nearby points. This leads to the notion of a new type of derivative:


Definition. The Lie derivative of a tensor field T with respect to a vector field X is a tensor field given by
$$(\mathcal{L}_X T)_p = \lim_{t \to 0}\frac{((\phi_{-t})_* T)_p - T_p}{t} \qquad (5.10)$$
Remarks. Recall that $(\phi_{-t})_* = (\phi_t)^*$ so we can also phrase the definition in terms of pull-back. The Lie derivative w.r.t. X is a map from (r, s) tensor fields to (r, s) tensor fields. It obeys $\mathcal{L}_X(\alpha S + \beta T) = \alpha\mathcal{L}_X S + \beta\mathcal{L}_X T$ where α and β are constants.
The easiest way to demonstrate other properties of the Lie derivative is to introduce coordinates in which the components of X are simple. Let Σ be a hypersurface that has the property that X is nowhere tangent to Σ (in particular X ≠ 0 on Σ). Let $x^i$, $i = 1, 2, \ldots, n-1$ be coordinates on Σ. Now assign coordinates $(t, x^i)$ to the point parameter distance t along the integral curve of X that starts at the point with coordinates $x^i$ on Σ (Fig. 14).

Figure 14. Coordinates adapted to a vector field.

This defines a coordinate chart $(t, x^i)$ at least for small t, i.e., in a neighbourhood of Σ. Furthermore, the integral curves of X are the curves $(t, x^i)$ with fixed $x^i$ and parameter t. The tangent to these curves is ∂/∂t so we have constructed coordinates such that X = ∂/∂t. The diffeomorphism $\phi_t$ is very simple: it just sends the point p with coordinates $x^\mu = (t_p, x^i_p)$ to the point $\phi_t(p)$ with coordinates $y^\mu = (t_p + t, x^i_p)$, hence $\partial y^\mu/\partial x^\nu = \delta^\mu_\nu$. The generalization of (5.8) to an (r, s) tensor then gives
$$[((\phi_{-t})_*(T))^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}]_{\phi_{-t}(q)} = [T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}]_q \qquad (5.11)$$
and setting $q = \phi_t(p)$ gives
$$[((\phi_{-t})_*(T))^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}]_p = [T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}]_{\phi_t(p)} \qquad (5.12)$$


It follows that, if p has coordinates $(t_p, x^i_p)$ in this chart,
$$[(\mathcal{L}_X T)^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}]_p = \lim_{t \to 0}\frac{1}{t}\left[T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}(t_p + t, x^i_p) - T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}(t_p, x^i_p)\right] = \frac{\partial}{\partial t}T^{\mu_1\ldots\mu_r}{}_{\nu_1\ldots\nu_s}(t_p, x^i_p) \qquad (5.13)$$
So in this chart, the Lie derivative is simply the partial derivative with respect to the coordinate t. It follows that the Lie derivative has the following properties:

1. It obeys the Leibniz rule: $\mathcal{L}_X(S \otimes T) = (\mathcal{L}_X S) \otimes T + S \otimes \mathcal{L}_X T$.

2. It commutes with contraction.

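Now let’s derive basis-independent formulae for the Lie derivative. First consider a function f. In the above chart, we have $\mathcal{L}_X f = (\partial/\partial t)(f)$. However, in this chart we also have $X(f) = (\partial/\partial t)(f)$. Hence
$$\mathcal{L}_X f = X(f) \qquad (5.14)$$
Both sides of this expression are scalars and hence this equation must be valid in any basis. Next consider a vector field Y. In our coordinates above we have
$$(\mathcal{L}_X Y)^\mu = \frac{\partial Y^\mu}{\partial t} \qquad (5.15)$$
but we also have
$$[X, Y]^\mu = X^\nu\partial_\nu Y^\mu - Y^\nu\partial_\nu X^\mu = \frac{\partial Y^\mu}{\partial t} \qquad (5.16)$$
since the components of X = ∂/∂t are constant in this chart, so the second term vanishes. If two vectors have the same components in one basis then they are equal in all bases. Hence we have the basis-independent result
$$\mathcal{L}_X Y = [X, Y] \qquad (5.17)$$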
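The result (5.17) can be checked numerically. The following Python sketch assumes the rotation field $X = -y\,\partial/\partial x + x\,\partial/\partial y$ (whose flow $\phi_t$ is rotation by t, so push-forward by $\phi_{-t}$ simply rotates vectors by −t) and a hypothetical second field $Y = x^2\,\partial/\partial x + xy\,\partial/\partial y$. It approximates the limit (5.10) by a central difference in t and compares the result with the coordinate formula for the commutator, evaluated with finite differences. For these fields both should be close to $(-xy, -y^2)$.

```python
import math

def X(p):
    """Rotation field X = -y d/dx + x d/dy; its flow phi_t is rotation by angle t."""
    x, y = p
    return (-y, x)

def Y(p):
    """Hypothetical second vector field Y = x^2 d/dx + x y d/dy."""
    x, y = p
    return (x * x, x * y)

def rotate(t, v):
    c, s = math.cos(t), math.sin(t)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def pushed_Y(t, p):
    """((phi_{-t})_* Y)_p: take Y at phi_t(p) and push it forward by phi_{-t}.
    phi_{-t} is a rotation, so its Jacobian is the rotation matrix by -t."""
    return rotate(-t, Y(rotate(t, p)))

def lie_from_definition(p, h=1e-5):
    """Central-difference approximation of the limit (5.10)."""
    a, b = pushed_Y(h, p), pushed_Y(-h, p)
    return tuple((u - v) / (2 * h) for u, v in zip(a, b))

def partial(F, p, nu, h=1e-6):
    """Central-difference estimate of the components d_nu F^mu at p."""
    pp, pm = list(p), list(p)
    pp[nu] += h
    pm[nu] -= h
    return tuple((u - v) / (2 * h) for u, v in zip(F(tuple(pp)), F(tuple(pm))))

def commutator(F, G, p):
    """[F, G]^mu = F^nu d_nu G^mu - G^nu d_nu F^mu in Cartesian coordinates."""
    Fp, Gp = F(p), G(p)
    dF = [partial(F, p, nu) for nu in range(2)]
    dG = [partial(G, p, nu) for nu in range(2)]
    return tuple(sum(Fp[nu] * dG[nu][mu] - Gp[nu] * dF[nu][mu] for nu in range(2))
                 for mu in range(2))

p = (1.3, -0.4)
print(lie_from_definition(p))  # approximately (0.52, -0.16)
print(commutator(X, Y, p))     # approximately (0.52, -0.16)
```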

Remark. Let’s compare the Lie derivative and the covariant derivative. The former is defined on any manifold whereas the latter requires extra structure (a connection). Equation (5.17) reveals that the Lie derivative w.r.t. X at p depends on $X_p$ and the first derivatives of X at p. The covariant derivative w.r.t. X at p depends only on $X_p$, which enables us to remove X to define the tensor ∇T, a covariant generalization of partial differentiation. It is not possible to define a corresponding tensor $\mathcal{L}T$ using the Lie derivative. Only $\mathcal{L}_X T$ makes sense.
Exercises (examples sheet 2).


1. Derive the formula for the Lie derivative of a covector field $\omega_a$ in a coordinate basis:
$$(\mathcal{L}_X\omega)_\mu = X^\nu\partial_\nu\omega_\mu + \omega_\nu\partial_\mu X^\nu \qquad (5.18)$$
Show that this can be written in the basis-independent form
$$(\mathcal{L}_X\omega)_a = X^b\nabla_b\omega_a + \omega_b\nabla_a X^b \qquad (5.19)$$
where ∇ is any torsion-free connection.

2. Show that the Lie derivative of a (0, 2) tensor field $g_{ab}$ in a coordinate basis is
$$(\mathcal{L}_X g)_{\mu\nu} = X^\rho\partial_\rho g_{\mu\nu} + g_{\mu\rho}\partial_\nu X^\rho + g_{\rho\nu}\partial_\mu X^\rho \qquad (5.20)$$
and that this can be written in the basis-independent form
$$(\mathcal{L}_X g)_{ab} = \nabla_a X_b + \nabla_b X_a \qquad (5.21)$$
where ∇ is the Levi-Civita connection.

Note that we cannot use abstract indices in (5.18) and (5.20) because they are valid only in a coordinate basis.
Remark. If $\phi_t$ is a symmetry transformation of a tensor field T (for all t) then $\mathcal{L}_X T = 0$.
If the $\phi_t$ form a 1-parameter group of isometries then $\mathcal{L}_X g = 0$, i.e.,
$$\nabla_a X_b + \nabla_b X_a = 0 \qquad (5.22)$$
This is Killing’s equation and solutions $X^a$ are called Killing vector fields. Consider the case in which there exists a chart for which the metric tensor does not depend on some coordinate z. Then equation (5.20) reveals that ∂/∂z is a Killing vector field: its components are constant, so only the term $X^\rho\partial_\rho g_{\mu\nu} = \partial_z g_{\mu\nu}$ survives, and this vanishes. Conversely, if the metric admits a Killing vector field then equation (5.13) demonstrates that one can introduce coordinates such that the metric tensor components are independent of one of the coordinates.
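As an illustration of (5.20) and Killing’s equation, the following Python sketch evaluates $(\mathcal{L}_X g)_{\mu\nu}$ by finite differences on the unit round 2-sphere, $g = \mathrm{diag}(1, \sin^2\theta)$ in coordinates $(\theta, \phi)$. The field $X = -\sin\phi\,\partial_\theta - \cot\theta\cos\phi\,\partial_\phi$ (a generator of rotations about the x-axis, an example not taken from the text) should give zero, whereas $\partial_\theta$ should not. The sample point and step size are arbitrary choices.

```python
import math

def metric(q):
    """Unit round 2-sphere in coordinates q = (theta, phi): g = diag(1, sin^2 theta)."""
    th = q[0]
    return [[1.0, 0.0], [0.0, math.sin(th) ** 2]]

def X_rot(q):
    """Components (X^theta, X^phi) of X = -sin(phi) d_theta - cot(theta) cos(phi) d_phi."""
    th, ph = q
    return [-math.sin(ph), -math.cos(ph) / math.tan(th)]

def X_theta(q):
    """d/d theta, which is not a Killing vector field (negative control)."""
    return [1.0, 0.0]

def d(F, q, rho, h=1e-6):
    """Central difference d F / d q^rho of a vector- or matrix-valued function F."""
    qp, qm = list(q), list(q)
    qp[rho] += h
    qm[rho] -= h
    Fp, Fm = F(qp), F(qm)
    if isinstance(Fp[0], list):
        return [[(a - b) / (2 * h) for a, b in zip(rp, rm)] for rp, rm in zip(Fp, Fm)]
    return [(a - b) / (2 * h) for a, b in zip(Fp, Fm)]

def lie_metric(X, q):
    """(L_X g)_{mu nu} = X^rho d_rho g_{mu nu} + g_{mu rho} d_nu X^rho + g_{rho nu} d_mu X^rho."""
    g, Xq = metric(q), X(q)
    dg = [d(metric, q, r) for r in range(2)]  # dg[rho][mu][nu] = d_rho g_{mu nu}
    dX = [d(X, q, r) for r in range(2)]       # dX[nu][rho] = d_nu X^rho
    return [[sum(Xq[r] * dg[r][m][n] + g[m][r] * dX[n][r] + g[r][n] * dX[m][r]
                 for r in range(2))
             for n in range(2)] for m in range(2)]

q = (0.9, 0.4)
print(lie_metric(X_rot, q))    # all entries approximately 0: Killing's equation holds
print(lie_metric(X_theta, q))  # phi-phi entry is 2 sin(theta) cos(theta) = sin(1.8)
```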
Lemma. Let $X^a$ be a Killing vector field and let $V^a$ be tangent to an affinely parameterized geodesic. Then $X_aV^a$ is constant along the geodesic.
Proof. The derivative of $X_aV^a$ along the geodesic is
$$\frac{d}{d\tau}(X_aV^a) = V(X_aV^a) = \nabla_V(X_aV^a) = V^b\nabla_b(X_aV^a) = V^aV^b\nabla_bX_a + X_aV^b\nabla_bV^a \qquad (5.23)$$
The first term vanishes because Killing’s equation implies that $\nabla_bX_a$ is antisymmetric, while $V^aV^b$ is symmetric. The second term vanishes by the geodesic equation.
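The lemma can also be seen numerically. This Python sketch integrates the geodesic equations of the unit 2-sphere (Christoffel symbols $\Gamma^\theta{}_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi{}_{\theta\phi} = \cot\theta$) with a fixed-step fourth-order Runge-Kutta scheme, and monitors $X_aV^a$ for two Killing fields, $\partial_\phi$ and the rotation generator $X = -\sin\phi\,\partial_\theta - \cot\theta\cos\phi\,\partial_\phi$. The initial data, step size and tolerances are assumptions made for the illustration: the conserved quantities should drift only at the level of the integration error, while a quantity that is not of this form, such as $d\phi/d\tau$, changes visibly.

```python
import math

def geodesic_rhs(s):
    """Geodesic equations on the unit 2-sphere, s = (theta, phi, dtheta/dtau, dphi/dtau)."""
    th, ph, dth, dph = s
    return (dth,
            dph,
            math.sin(th) * math.cos(th) * dph ** 2,            # -Gamma^theta_{phi phi} dphi^2
            -2.0 * (math.cos(th) / math.sin(th)) * dth * dph)  # -2 Gamma^phi_{theta phi} dtheta dphi

def rk4_step(s, dt):
    k1 = geodesic_rhs(s)
    k2 = geodesic_rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k1)))
    k3 = geodesic_rhs(tuple(a + 0.5 * dt * b for a, b in zip(s, k2)))
    k4 = geodesic_rhs(tuple(a + dt * b for a, b in zip(s, k3)))
    return tuple(a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
                 for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))

def Q_phi(s):
    """X_a V^a for the Killing field X = d/dphi: g_{phi phi} V^phi = sin^2(theta) dphi."""
    th, _, _, dph = s
    return math.sin(th) ** 2 * dph

def Q_rot(s):
    """X_a V^a for X = -sin(phi) d_theta - cot(theta) cos(phi) d_phi."""
    th, ph, dth, dph = s
    return -math.sin(ph) * dth - math.sin(th) * math.cos(th) * math.cos(ph) * dph

def integrate(s, dt=1e-3, steps=2000):
    path = [s]
    for _ in range(steps):
        s = rk4_step(s, dt)
        path.append(s)
    return path

path = integrate((1.0, 0.2, 0.3, 0.7))
for Q in (Q_phi, Q_rot):
    print(max(abs(Q(s) - Q(path[0])) for s in path))  # tiny: conserved up to RK4 error
print(path[-1][3] - path[0][3])  # dphi/dtau itself is not conserved
```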
Exercise. Let $J^a = T^a{}_bX^b$ where $T_{ab}$ is symmetric and satisfies $\nabla_aT^{ab} = 0$ (e.g. the energy-momentum tensor: see later) and $X^b$ is a Killing vector field. Show that $\nabla_aJ^a = 0$, i.e., $J^a$ is a conserved current.
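For reference, one way to carry out this exercise is sketched below. Since ∇ is the Levi-Civita connection, indices can be raised and lowered inside the derivative, so $J^a = T^{ab}X_b$, and the Leibniz rule gives

$$\nabla_aJ^a = (\nabla_aT^{ab})X_b + T^{ab}\nabla_aX_b = 0 + T^{ab}\nabla_{(a}X_{b)} = 0$$

The first term vanishes by conservation of $T^{ab}$; in the second, the symmetry of $T^{ab}$ picks out the symmetric part $\nabla_{(a}X_{b)}$, which vanishes by Killing’s equation (5.22).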

6 Physical laws in curved spacetime

6.1 Motivating principles
In this chapter we will explain how the differential geometry we have discussed in previous chapters is related to physics. We will start by reviewing some physical principles
that were important in the discovery of GR. Compared to the discussion of previous
chapters, the material of this section is going to appear rather imprecise but we will
see that it leads quite naturally to the idea that we should describe physics using a
Lorentzian manifold.
We will use “Spacetime” (with a capital S) to refer to the (vague) physics notion
of “the stage on which physics takes place”, to distinguish it from “spacetime”, which
we’ve defined to mean a Lorentzian manifold.
Based on various thought experiments involving Newtonian gravity (e.g. “a man
falling off a roof will not feel his own weight”), Einstein came to the conclusion that locally in Spacetime one can “transform away” a gravitational field by working in a “local inertial frame”. Roughly speaking, this can be regarded as a set of coordinates associated to a laboratory that is falling freely under gravity, with the size of the laboratory,
and the duration of any experiments, taken to be small compared to the length and
time scales over which the gravitational field varies. Inside such a laboratory, Einstein
argued that the results of non-gravitational experiments would be indistinguishable
from the results of the same experiments performed in an inertial frame in Minkowski
spacetime. We can state this more formally as:
Einstein equivalence principle. In any sufficiently small region of Spacetime there
should exist coordinates xµ , called a local inertial frame, such that in these coordinates,
the results of all non-gravitational experiments are indistinguishable from the results
of the same experiment performed in an inertial frame in Minkowski spacetime.
Let’s see how this leads to the idea that Spacetime should be described mathematically by a spacetime. First, the principle refers to coordinates, which suggests that we should introduce a 4d differentiable manifold M on which these coordinates are defined. Next, given any point p ∈ M, the principle says we can introduce local inertial frame coordinates in a neighbourhood of p, and physics in these coordinates should be indistinguishable from Minkowski physics. Hence there should exist a metric with components $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$ at p in these coordinates. So we have a Lorentzian metric defined at each point of M, and hence a Lorentzian metric tensor $g_{ab}$ defined globally on M. Therefore we must have a spacetime (M, g).
Conversely, if we have a spacetime (M, g) then, for any p ∈ M we can introduce
normal coordinates such that, in a neighbourhood of p we have gµν = ηµν +. . . where the
ellipsis denotes terms of second order in the coordinates. We can now identify a “local
inertial frame” as a “normal coordinate chart” and the spacetime metric reduces to the
Minkowski metric locally at the origin of such a chart. As one moves away from the
origin, there are quadratic corrections to the Minkowski metric arising from spacetime
curvature (equation (4.35)). These describe gravitational tidal forces. In a Newtonian
description, these are forces arising from inhomogeneities of the gravitational field.
A “test body” is an uncharged object whose internal structure is small compared to
the scale over which the gravitational field varies, and which has negligible gravitational
self-interaction. In Minkowski spacetime, test bodies, or free particles, follow timelike
or null geodesic (straight line) trajectories, i.e., curves of vanishing acceleration. Now
consider the motion of such a body in a curved spacetime (M, g). At a point p on
the trajectory, we can introduce a local inertial frame. In this frame, the Einstein
equivalence principle implies that the acceleration ∇X X of the curve at p must vanish,
since this is true in Minkowski spacetime. But p is arbitrary. So the curve must be a
geodesic, which motivates:
Geodesic postulate. The worldlines of free particles or test bodies are timelike or
null geodesics (of the Levi-Civita connection).
It is believed that this is not really needed as an independent assumption in GR
but follows from energy-momentum conservation, or (in the null case) geometric optics.
We now turn to a second principle that was important in the discovery of GR, namely general covariance. (The following discussion is motivated by the one in Wald’s book.) Let’s start by thinking about special relativity. The “principle of relativity” is usually stated as “the laws of physics should take the same form in any inertial frame”. This is satisfied by demanding that physical laws are tensor equations, with tensors in the sense of special relativity (i.e. sets of components transforming in an appropriate way under Lorentz transformations). This statement of the principle is unsatisfactory because, by introducing additional tensor fields, essentially any equation can be made to respect it. For example, in 2d Minkowski spacetime, consider the wave equation with speed v ≠ 1:
$$-\frac{1}{v^2}\frac{\partial^2\phi}{\partial(x^0)^2} + \frac{\partial^2\phi}{\partial(x^1)^2} = 0 \qquad (6.1)$$

where $x^\mu$ are the coordinates of an inertial frame. This equation is not Lorentz invariant so it cannot hold in all inertial frames. Let $\{e_\mu = \partial/\partial x^\mu\}$ be the coordinate basis vectors of an inertial frame in which it does hold. The components of $e_\mu$ w.r.t. this basis are $e_\mu^\nu = \delta_\mu^\nu$. So we can rewrite the equation as
$$\left(-\frac{1}{v^2}e_0^\mu e_0^\nu + e_1^\mu e_1^\nu\right)\frac{\partial^2\phi}{\partial x^\mu\partial x^\nu} = 0 \qquad (6.2)$$
Since $e_0$ and $e_1$ are vectors, this equation is invariant under Lorentz transformations so the Greek indices can now be taken to refer to an arbitrary inertial frame (with $e_\mu^\nu \neq \delta_\mu^\nu$). Hence we have written the equation in a form that obeys the principle of relativity! We have achieved this by introducing the vector fields $\{e_0, e_1\}$. However, we don’t really want to regard the above equation as respecting the principle of relativity, so we need a different formulation of this principle. We use the following:
Principle of special covariance. The only mathematical structure needed to describe Spacetime is Minkowski spacetime.
This excludes (6.2) because if we regard $\{e_0, e_1\}$ as associated with Spacetime then we have introduced an extra mathematical structure to describe Spacetime. However, if v = 1 then we have $-e_0^\mu e_0^\nu + e_1^\mu e_1^\nu = \eta^{\mu\nu}$ where $\eta_{\mu\nu}$ is the Minkowski metric. Hence for v = 1 the dependence on $\{e_0, e_1\}$ drops out and the principle is respected.
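A short numerical check of this argument: the Python sketch below writes the fixed vectors $e_0$ and $e_1$ in an inertial frame boosted by rapidity ξ and forms the coefficient matrix $M^{\mu\nu} = -v^{-2}e_0^\mu e_0^\nu + e_1^\mu e_1^\nu$ of (6.2). For v = 1 it should reproduce $\eta^{\mu\nu} = \mathrm{diag}(-1, 1)$ in every frame, while for v ≠ 1 its components depend on the frame, i.e. on the extra structure $\{e_0, e_1\}$. The boost convention, the value ξ = 0.6 and the helper names are assumptions for illustration.

```python
import math

def boosted_basis(xi):
    """Components of the fixed vectors e_0, e_1 in an inertial frame boosted by rapidity xi."""
    c, s = math.cosh(xi), math.sinh(xi)
    return (c, s), (s, c)

def coefficients(v, xi):
    """M^{mu nu} = -(1/v^2) e_0^mu e_0^nu + e_1^mu e_1^nu, the coefficients in (6.2)."""
    e0, e1 = boosted_basis(xi)
    return [[-e0[m] * e0[n] / v ** 2 + e1[m] * e1[n] for n in range(2)] for m in range(2)]

for v in (1.0, 0.5):
    # for v = 1 this is diag(-1, 1) up to rounding; for v = 0.5 it depends on xi
    print(v, [[round(a, 6) for a in row] for row in coefficients(v, 0.6)])
```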
We need to go beyond special relativity to incorporate gravity. We have seen that
the equivalence principle strongly suggests that we should describe Spacetime using a
spacetime (M, g). The obvious generalization of the principle of special covariance is:
Principle of general covariance. The only mathematical structure needed to describe Spacetime is spacetime.
This asserts that spacetime is all we need to describe Spacetime. So, roughly speak-
ing, the equivalence principle suggests that spacetime is necessary, and the principle
of general covariance says that it is sufficient, to describe Spacetime. For example,
when we introduce a connection, general covariance requires that we must choose the
Levi-Civita connection defined by the metric, for otherwise we would be introducing a
new mathematical structure.
As another example, consider Newtonian physics, which we can formulate in terms of a manifold $\mathbb{R}^4$. In Newtonian physics there is a preferred notion of time, absolute time, which all observers agree on. Mathematically, this is described by a function $t : \mathbb{R}^4 \to \mathbb{R}$. So t is a mathematical structure that we need to describe a Newtonian Spacetime, hence Newtonian physics violates the principle of general covariance. (Actually we may only want to define t up to a constant, corresponding to the freedom to shift the zero of time. In that case, it is dt rather than t which is well-defined, so a Newtonian Spacetime requires (among other things) a covector field, which still violates the principle. Look up Newton-Cartan theory if you want to learn more.)
Of course the above principle is rather vague because we haven’t defined precisely what we mean by Spacetime. Sometimes it is ambiguous whether a given mathematical structure is needed to describe Spacetime, or whether it is needed to describe matter in Spacetime. For example, string theory predicts the existence of a scalar field Φ and an antisymmetric tensor $H_{abc}$. These emerge from the theory in a very similar way to the metric tensor $g_{ab}$. Hence one could regard these additional fields as part of Spacetime and declare that string theory violates the principle. Alternatively, one could declare these fields to be matter fields, rather than part of Spacetime, in which case the principle is not violated.
To conclude, the principles discussed in this section provide strong motivation for the formulation of GR as a theory which describes Spacetime in terms of a Lorentzian manifold (M, g). However, these principles are not mathematically precise and they contain ambiguities. Ultimately what matters is whether the theory we construct is mathematically consistent and whether it agrees with observations.

6.2 Physical laws in curved spacetime

Consider an equation written in terms of tensors in the sense of special relativity. It is straightforward to rewrite it in terms of tensors in the sense that we have defined them. Starting with a special relativity tensor equation written w.r.t. an arbitrary inertial frame $x^\mu$ we simply do the following:

1. In an inertial frame, the Levi-Civita connection $\nabla_\mu$ is the same as $\partial_\mu$. So we first replace $\partial_\mu$ with $\nabla_\mu$ everywhere (“comma goes to semi-colon” rule).

2. Replace Greek indices (referring to an inertial frame) with abstract Latin indices.

When we do this we are not changing the equation; we are simply enlarging its regime of validity, so that it holds w.r.t. an arbitrary basis.
Example. The simplest field equation obeying the principle of special covariance is the wave equation for a scalar field Φ:
$$\eta^{\mu\nu}\partial_\mu\partial_\nu\Phi = 0 \qquad (6.3)$$
where Greek indices refer to an inertial frame. Following the above steps, this becomes
$$\eta^{ab}\nabla_a\nabla_b\Phi = 0 \qquad (6.4)$$

It may look like we have not done much but note that we can now immediately write down the wave equation for a non-inertial coordinate system $x^\mu$: it is $\eta^{\mu\nu}\nabla_\mu\nabla_\nu\Phi = 0$ even though in such a coordinate system we have $\eta_{\mu\nu} \neq \mathrm{diag}(-1, 1, 1, 1)$ and $\nabla_\mu \neq \partial_\mu$. Our equation respects the principle of special covariance because we have not introduced any mathematical structures beyond those defined in Minkowski spacetime, i.e., the metric $\eta_{ab}$ and its Levi-Civita connection.
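As an example of such a non-inertial coordinate system, take spherical polar coordinates (t, r, θ, φ) on 4d Minkowski spacetime, in which $\eta = -dt^2 + dr^2 + r^2(d\theta^2 + \sin^2\theta\,d\phi^2)$ and $|\det\eta_{\mu\nu}|^{1/2} = r^2\sin\theta$. Using the standard identity $\nabla_a\nabla^a\Phi = |g|^{-1/2}\partial_\mu(|g|^{1/2}g^{\mu\nu}\partial_\nu\Phi)$ for the Levi-Civita connection (quoted here, not derived in this section), equation (6.4) becomes

$$-\partial_t^2\Phi + \frac{1}{r^2}\partial_r\left(r^2\partial_r\Phi\right) + \frac{1}{r^2\sin\theta}\partial_\theta\left(\sin\theta\,\partial_\theta\Phi\right) + \frac{1}{r^2\sin^2\theta}\partial_\phi^2\Phi = 0$$

which is the familiar wave equation with the Laplacian written in spherical polars.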
One more simple step yields a tensor equation valid in an arbitrary spacetime:

3. Replace the Minkowski metric ηab with an arbitrary metric gab and take ∇ to be
the Levi-Civita connection of this metric.

If we start from an equation that respects the principle of special covariance then, after
following the above steps, we will obtain an equation that respects the principle of
general covariance.
Examples.

1. The wave equation in a general spacetime is
$$g^{ab}\nabla_a\nabla_b\Phi = 0, \quad \text{or} \quad \nabla^a\nabla_a\Phi = 0 \quad \text{or} \quad \Phi_{;a}{}^{a} = 0. \qquad (6.5)$$
A simple generalization of this equation is the Klein-Gordon equation:
$$\nabla^a\nabla_a\Phi - m^2\Phi = 0 \qquad (6.6)$$
where m ≥ 0 has dimensions of inverse length. In quantum field theory, this equation describes particles of mass mℏ so Φ is called a massive scalar field.

2. In special relativity, the electric and magnetic fields are combined into an antisymmetric tensor $F_{\mu\nu}$. The electric and magnetic fields in an inertial frame are obtained by the rule (i, j, k take values from 1 to 3) $F_{0i} = -E_i$ and $F_{ij} = \epsilon_{ijk}B_k$. Maxwell’s equations in an inertial frame are (in Gaussian units)
$$\eta^{\mu\rho}\partial_\mu F_{\nu\rho} = 4\pi j_\nu, \qquad \partial_{[\mu}F_{\nu\rho]} = 0 \qquad (6.7)$$
where $j_\mu$ is the charge-current density. Hence in a curved spacetime, the electromagnetic field is described by an antisymmetric tensor $F_{ab}$ satisfying
$$g^{ac}\nabla_a F_{bc} = 4\pi j_b, \qquad \nabla_{[a}F_{bc]} = 0 \qquad (6.8)$$


The Lorentz force law for a particle of charge q and mass m in Minkowski spacetime is
$$\frac{d^2x^\mu}{d\tau^2} = \frac{q}{m}\eta^{\mu\nu}F_{\nu\rho}\frac{dx^\rho}{d\tau} \qquad (6.9)$$
where τ is proper time. We saw previously that the LHS can be rewritten as $u^\nu\partial_\nu u^\mu$ where $u^\mu = dx^\mu/d\tau$ is the 4-velocity. Now following the rules above gives the generally covariant equation
$$u^b\nabla_b u^a = \frac{q}{m}g^{ab}F_{bc}u^c = \frac{q}{m}F^a{}_b u^b \qquad (6.10)$$
Note that this reduces to the geodesic equation when q = 0.
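As a sketch of (6.9) in action, the Python code below integrates $du^\mu/d\tau = (q/m)\eta^{\mu\nu}F_{\nu\rho}u^\rho$ for constant fields with a fixed-step fourth-order Runge-Kutta scheme, building $F_{\mu\nu}$ from $F_{0i} = -E_i$ and $F_{ij} = \epsilon_{ijk}B_k$. Because $F_{\mu\nu}$ is antisymmetric, $\eta_{\mu\nu}u^\mu u^\nu = -1$ should be preserved up to integration error, and in a purely magnetic field $u^0$ (the energy per unit mass) should stay constant. The field values, q/m = 1, the initial velocity and the step size are illustrative assumptions.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{mu nu} (and of eta^{mu nu})

def field_tensor(E, B):
    """F_{mu nu} from F_{0i} = -E_i and F_{ij} = eps_{ijk} B_k (antisymmetric)."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1], F[i + 1][0] = -E[i], E[i]
    Bx, By, Bz = B
    F[1][2], F[2][1] = Bz, -Bz
    F[2][3], F[3][2] = Bx, -Bx
    F[3][1], F[1][3] = By, -By
    return F

def force(u, F, q_over_m):
    """du^mu/dtau = (q/m) eta^{mu nu} F_{nu rho} u^rho, equation (6.9)."""
    return [q_over_m * ETA[m] * sum(F[m][r] * u[r] for r in range(4)) for m in range(4)]

def rk4(u, F, q_over_m, dt):
    k1 = force(u, F, q_over_m)
    k2 = force([a + 0.5 * dt * b for a, b in zip(u, k1)], F, q_over_m)
    k3 = force([a + 0.5 * dt * b for a, b in zip(u, k2)], F, q_over_m)
    k4 = force([a + dt * b for a, b in zip(u, k3)], F, q_over_m)
    return [a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
            for a, b1, b2, b3, b4 in zip(u, k1, k2, k3, k4)]

def norm(u):
    """eta_{mu nu} u^mu u^nu, which should stay equal to -1."""
    return sum(ETA[m] * u[m] ** 2 for m in range(4))

def evolve(u, F, q_over_m=1.0, dt=1e-3, steps=5000):
    history = [u]
    for _ in range(steps):
        u = rk4(u, F, q_over_m, dt)
        history.append(u)
    return history

gamma = 1.0 / math.sqrt(1.0 - 0.6 ** 2 - 0.3 ** 2)
u0 = [gamma, 0.6 * gamma, 0.0, 0.3 * gamma]
run = evolve(u0, field_tensor((0.2, 0.0, 0.0), (0.0, 0.0, 1.0)))
print(max(abs(norm(u) + 1.0) for u in run))  # tiny: u.u = -1 is preserved
```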
At least in simple cases, tensor equations obtained by the above method will satisfy the Einstein equivalence principle. This can be seen by introducing a local inertial frame (i.e. normal coordinates) at p, so that $g_{\mu\nu}(p) = \eta_{\mu\nu}$ and $\Gamma^\mu{}_{\nu\rho}(p) = 0$. Hence (first) covariant derivatives reduce to partial derivatives at p. This has the effect of reversing the above steps at p, to return to the equation of special relativity, in agreement with the Einstein equivalence principle. For example, in such coordinates at p we have $g^{\mu\rho}\nabla_\mu F_{\nu\rho} = \eta^{\mu\rho}\partial_\mu F_{\nu\rho}$ and $\nabla_{[\mu}F_{\nu\rho]} = \partial_{[\mu}F_{\nu\rho]}$ so Maxwell’s equations at p are identical to Maxwell’s equations in Minkowski spacetime. Of course this is only true at p and as we move away from p we will see deviations from the Minkowski spacetime equations arising from the fact that spacetime is not flat.
The above procedure gives tensor equations but they are not necessarily the correct tensor equations! For example, a natural generalization of the Klein-Gordon equation in curved spacetime is
$$g^{ab}\nabla_a\nabla_b\Phi - (m^2 + \xi R)\Phi = 0 \qquad (6.11)$$
where ξ is a dimensionless constant. Since R = 0 in Minkowski spacetime, this equation reduces to the Klein-Gordon equation there, so the steps above cannot detect the ξR term: starting from the flat-spacetime Klein-Gordon equation they only ever produce ξ = 0. In general, terms involving curvature are not determined by the above procedure. Sometimes, such terms are fixed by mathematical consistency. However, this is not always possible: there is no reason why it should be possible to derive laws of physics in curved spacetime from those in flat spacetime. The ultimate test is comparison with observations.
Let’s now discuss geometric optics. Consider, for simplicity, a scalar field satisfying the wave equation in curved spacetime. It is convenient to consider a complex scalar field Ψ. In geometric optics we look for solutions of the form $\Psi = Ae^{i\lambda S}$ where we are interested in the high frequency limit λ → ∞ with the functions A, S and their derivatives assumed to be O(1). Plugging this Ansatz into the wave equation, we obtain
$$-\lambda^2\Psi g^{ab}\nabla_aS\nabla_bS + O(\lambda) = 0 \qquad (6.12)$$


hence to leading order in λ the wave equation reduces to the eikonal equation
$$g^{ab}\nabla_aS\nabla_bS = 0 \qquad (6.13)$$
Now, for any function f, if $X^a$ is tangent to a curve lying within a surface of constant f then we have X(f) = 0 (as f is constant along the curve) and hence df(X) = 0. Therefore, on such a surface, df is orthogonal to any tangent vector so we say df is normal to the surface. So in our case $P_a \equiv \nabla_aS$ is normal to the surfaces of constant phase S. But we also have $P_aP^a = 0$, i.e. dS(P) = 0, so $P^a$ is itself tangent to these surfaces (this is a counter-intuitive property of a “null” hypersurface, i.e., one with a null normal). Now consider
$$P^b\nabla_bP_a = (\nabla^bS)\nabla_b\nabla_aS = g^{bc}(\nabla_cS)\nabla_a\nabla_bS = \frac{1}{2}\nabla_a\left(g^{bc}\nabla_bS\nabla_cS\right) = 0 \qquad (6.14)$$
where the second equality uses vanishing torsion and the final equality follows from the eikonal equation. Hence we have shown that, in the geometric optics (high frequency) limit, $P^a$ is null and satisfies the geodesic equation. Each geodesic lies in a surface of constant S. One can show that exactly the same thing happens for Maxwell’s equations. In that case the integral curves of $P^a$ are called light rays. This is exactly analogous to the usual way in which light rays arise in geometric optics in flat spacetime. In summary, we’ve shown how the geodesic equation for null geodesics arises from geometric optics.
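The expansion behind (6.12) can be written out explicitly. Using $\nabla_a(Ae^{i\lambda S}) = e^{i\lambda S}(\nabla_aA + i\lambda A\nabla_aS)$ and differentiating once more,

$$\nabla^a\nabla_a\Psi = e^{i\lambda S}\left[-\lambda^2A\,g^{ab}\nabla_aS\nabla_bS + i\lambda\left(2g^{ab}\nabla_aA\nabla_bS + A\nabla^a\nabla_aS\right) + \nabla^a\nabla_aA\right]$$

The first term is $-\lambda^2\Psi g^{ab}\nabla_aS\nabla_bS$, and the remaining terms are O(λ) because A, S and their derivatives are O(1), which gives (6.12).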

6.3 Energy-momentum tensor

In GR, the curvature of spacetime is related to the energy and momentum of matter.
So we need to discuss how the latter concepts are defined in GR. We shall start by
discussing the energy and momentum of particles.
In special relativity, associated to any particle is a scalar called its rest mass (or simply its mass) m. If the particle has 4-velocity $u^\mu$ (again $x^\mu$ denote inertial frame coordinates) then its 4-momentum is
$$P^\mu = mu^\mu \qquad (6.15)$$
The time component of $P^\mu$ is the particle’s energy and the spatial components are its 3-momentum with respect to the inertial frame.
If an observer at some point p has 4-velocity $v^\mu(p)$ then he measures the particle’s energy, when the particle is at q, to be
$$E = -\eta_{\mu\nu}v^\mu(p)P^\nu(q). \qquad (6.16)$$


The way to see this is to choose an inertial frame in which, at p, the observer is at rest at the origin, so $v^\mu(p) = (1, 0, 0, 0)$ and E is just the time component of $P^\nu(q)$ in this inertial frame.
By the Einstein equivalence principle, GR should reduce to SR in a local inertial frame. Hence in GR we also associate a rest mass m to any particle and define the 4-momentum of a particle with 4-velocity $u^a$ as
$$P^a = mu^a \qquad (6.17)$$
Note that
$$g_{ab}P^aP^b = -m^2 \qquad (6.18)$$
The equivalence principle implies that when the observer and particle both are at p then (6.16) should be valid so the observer measures the particle’s energy to be
$$E = -g_{ab}(p)v^a(p)P^b(p) \qquad (6.19)$$

However, an important difference between GR and SR is that there is no analogue of equation (6.16) for p ≠ q. This is because $v^a(p)$ and $P^a(q)$ are vectors defined at different points, so they live in different tangent spaces. There is no way they can be combined to give a scalar quantity. An observer at p cannot measure the energy of a particle at q.
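Equation (6.19) can be evaluated directly in a local inertial frame at p, where $g_{ab}(p) = \eta_{ab}$. The minimal Python sketch below (units with c = 1; the mass, speed and helper names are illustrative) shows that an observer at rest measures γm for a particle moving with speed v, while an observer co-moving with the particle measures its rest mass m, consistent with (6.18).

```python
import math

def four_velocity(vel):
    """u^mu = gamma (1, v^i) in an inertial frame, for a 3-velocity vel (c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in vel))
    return [gamma] + [gamma * c for c in vel]

def dot(a, b):
    """eta_{mu nu} a^mu b^nu with eta = diag(-1, 1, 1, 1)."""
    return -a[0] * b[0] + sum(x * y for x, y in zip(a[1:], b[1:]))

def measured_energy(v_obs, m, u_particle):
    """E = -g_ab v^a P^b, equation (6.19), with P^a = m u^a at the same point."""
    P = [m * c for c in u_particle]
    return -dot(v_obs, P)

u = four_velocity((0.6, 0.0, 0.0))                        # gamma = 1.25
print(measured_energy(four_velocity((0, 0, 0)), 2.0, u))  # approximately gamma m = 2.5
print(measured_energy(u, 2.0, u))                         # approximately m = 2.0
```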
Now let’s consider the energy and momentum of continuous distributions of matter.
Example. Consider Maxwell theory in Minkowski spacetime. Pick an inertial frame and work in pre-relativity notation using Cartesian tensors. The electromagnetic field has energy density
$$\mathcal{E} = \frac{1}{8\pi}(E_iE_i + B_iB_i) \qquad (6.20)$$
and the momentum density (or energy flux density) is given by the Poynting vector:
$$S_i = \frac{1}{4\pi}\epsilon_{ijk}E_jB_k. \qquad (6.21)$$
The Maxwell equations imply that these satisfy the conservation law (if no sources)
$$\frac{\partial\mathcal{E}}{\partial t} + \partial_iS_i = 0. \qquad (6.22)$$
The momentum flux density is described by the stress tensor:
$$t_{ij} = \frac{1}{4\pi}\left(\frac{1}{2}(E_kE_k + B_kB_k)\delta_{ij} - E_iE_j - B_iB_j\right), \qquad (6.23)$$
with the conservation law
$$\frac{\partial S_i}{\partial t} + \partial_jt_{ij} = 0. \qquad (6.24)$$
If a surface element has area dA and normal $n^i$ then the force exerted on this surface by the electromagnetic field is $t_{ij}n_j\,dA$.
In special relativity, these three objects are combined into a single tensor, called variously the “energy-momentum tensor”, the “stress tensor”, the “stress-energy-momentum tensor” etc. In an inertial frame it is
$$T_{\mu\nu} = \frac{1}{4\pi}\left(F_{\mu\rho}F_\nu{}^\rho - \frac{1}{4}F^{\rho\sigma}F_{\rho\sigma}\eta_{\mu\nu}\right) \qquad (6.25)$$
where we’ve raised indices with $\eta^{\mu\nu}$. Note that this is a symmetric tensor. It has components $T_{00} = \mathcal{E}$, $T_{0i} = -S_i$, $T_{ij} = t_{ij}$. The conservation laws above are equivalent to the single equation (if no sources)
$$\partial^\mu T_{\mu\nu} = 0. \qquad (6.26)$$
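The component identifications $T_{00} = \mathcal{E}$, $T_{0i} = -S_i$ and $T_{ij} = t_{ij}$ can be checked numerically. The Python sketch below builds $F_{\mu\nu}$ from arbitrary sample fields E and B (values chosen only for illustration) using $F_{0i} = -E_i$ and $F_{ij} = \epsilon_{ijk}B_k$, evaluates (6.25) with η = diag(−1, 1, 1, 1), and compares the result with (6.20), (6.21) and (6.23).

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # diagonal of eta_{mu nu}; eta^{mu nu} has the same entries

def field_tensor(E, B):
    """F_{mu nu} with F_{0i} = -E_i and F_{ij} = eps_{ijk} B_k."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[0][i + 1], F[i + 1][0] = -E[i], E[i]
    Bx, By, Bz = B
    F[1][2], F[2][1] = Bz, -Bz
    F[2][3], F[3][2] = Bx, -Bx
    F[3][1], F[1][3] = By, -By
    return F

def stress_tensor(F):
    """(6.25): (F_{mu rho} F_nu^rho - (1/4) F^{rho sigma} F_{rho sigma} eta_{mu nu}) / (4 pi)."""
    invariant = sum(ETA[r] * ETA[s] * F[r][s] ** 2 for r in range(4) for s in range(4))
    return [[(sum(F[m][r] * F[n][r] * ETA[r] for r in range(4))
              - 0.25 * invariant * (ETA[m] if m == n else 0.0)) / (4 * math.pi)
             for n in range(4)] for m in range(4)]

def maxwell_stress(E, B):
    """(6.23): t_{ij} in pre-relativity notation."""
    sq = sum(e * e for e in E) + sum(b * b for b in B)
    return [[(0.5 * sq * (1.0 if i == j else 0.0) - E[i] * E[j] - B[i] * B[j]) / (4 * math.pi)
             for j in range(3)] for i in range(3)]

E, B = (0.3, -1.1, 0.7), (0.5, 0.2, -0.9)  # arbitrary sample field values
T = stress_tensor(field_tensor(E, B))
energy = (sum(e * e for e in E) + sum(b * b for b in B)) / (8 * math.pi)  # (6.20)
S = [(E[1] * B[2] - E[2] * B[1]) / (4 * math.pi),                          # (6.21)
     (E[2] * B[0] - E[0] * B[2]) / (4 * math.pi),
     (E[0] * B[1] - E[1] * B[0]) / (4 * math.pi)]
t = maxwell_stress(E, B)
print(abs(T[0][0] - energy) < 1e-12)                           # True
print(all(abs(T[0][i + 1] + S[i]) < 1e-12 for i in range(3)))  # True
```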

The definition of the energy-momentum tensor extends naturally to GR:

Definition. The energy-momentum tensor of a Maxwell field in a general spacetime is
$$T_{ab} = \frac{1}{4\pi}\left(F_{ac}F_b{}^c - \frac{1}{4}F^{cd}F_{cd}g_{ab}\right) \qquad (6.27)$$
Exercise (examples sheet 3). Show that Maxwell’s equations (with no sources) imply that
$$\nabla^aT_{ab} = 0. \qquad (6.28)$$

In GR (and SR) we assume that continuous matter is always described by a conserved energy-momentum tensor:
Postulate. The energy, momentum, and stresses of matter are described by an energy-momentum tensor, a (0, 2) symmetric tensor $T_{ab}$ that is conserved: $\nabla^aT_{ab} = 0$.
Remark. Let $u^a$ be the 4-velocity of an observer O at p. Choose the orthonormal basis $\{e_\mu\}$ used to define normal coordinates at p so that $e_0^a = u^a$. This defines a local inertial frame (LIF) at p in which O is at rest. Denote the spatial basis vectors as $e_i^a$, i = 1, 2, 3. From the Einstein equivalence principle, $\mathcal{E} \equiv T_{00} = T_{ab}e_0^ae_0^b = T_{ab}u^au^b$ is the energy density of matter at p measured by O. Similarly, $S_i \equiv -T_{0i}$ is the momentum density and $t_{ij} \equiv T_{ij}$ the stress tensor measured by O. The energy-momentum current measured by O is the 4-vector $j^a = -T^a{}_bu^b$, which has components $(\mathcal{E}, S_i)$ in this basis.


Remark. In an inertial frame $x^\mu$ in Minkowski spacetime, local conservation of $T_{ab}$ is equivalent to equations of the form (6.22) and (6.24). If one integrates these over a fixed volume V in surfaces of constant $t = x^0$ then one obtains global conservation equations. For example, integrating (6.22) over V gives
$$\frac{d}{dt}\int_V\mathcal{E}\,dV = -\int_S\mathbf{S}\cdot\mathbf{n}\,dA \qquad (6.29)$$
where the surface S (with outward unit normal n) bounds V. In words: the rate of increase of the energy of matter in V is equal to minus the energy flux across S. In a general curved spacetime, such an interpretation is not possible. This is because the gravitational field can do work on the matter in the spacetime. One might think that one could obtain global conservation laws in curved spacetime by introducing a definition of energy density etc. for the gravitational field. This is a subtle issue. The gravitational field is described by the metric $g_{ab}$. In Newtonian theory, the energy density of the gravitational field is $-(1/8\pi)(\nabla\Phi)^2$ so one might expect that in GR the energy density of the gravitational field should be some expression quadratic in first derivatives of $g_{ab}$. But we have seen that we can choose normal coordinates to make the first partial derivatives of $g_{ab}$ vanish at any given point. Gravitational energy certainly exists but not in a local sense. For example one can define the total energy (i.e. the energy of matter and the gravitational field) for certain spacetimes (this will be discussed in the black holes course).
Example. A perfect fluid is described by a 4-velocity vector field $u^a$, and two scalar fields ρ and p. The energy-momentum tensor is
$$T_{ab} = (\rho + p)u_au_b + pg_{ab} \qquad (6.30)$$
ρ and p are the energy density and pressure measured by an observer co-moving with the fluid, i.e., one with 4-velocity $u^a$ (check: $T_{ab}u^au^b = \rho + p - p = \rho$). The equations of motion of the fluid can be derived from conservation of $T_{ab}$:
Exercise (examples sheet 3). Show that, for a perfect fluid, $\nabla^aT_{ab} = 0$ is equivalent to
$$u^a\nabla_a\rho + (\rho + p)\nabla_au^a = 0, \qquad (\rho + p)u^b\nabla_bu^a = -(g^{ab} + u^au^b)\nabla_bp \qquad (6.31)$$
These are relativistic generalizations of the mass conservation equation and Euler equation of non-relativistic fluid dynamics. Note that a pressureless fluid moves on timelike geodesics. This makes sense physically: zero pressure implies that the fluid particles are non-interacting and hence behave like free particles.
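A quick check of the interpretation of ρ and p: the Python sketch below evaluates (6.30) in a local inertial frame for a fluid moving with speed v = 0.8 along x (ρ, p and v are illustrative values), contracts it with the fluid 4-velocity $u^a$ and with a unit spatial vector $e_1^a$ orthogonal to $u^a$, and recovers ρ and p respectively. The mixed contraction $T_{ab}u^ae_1^b$, the momentum density seen by the co-moving observer, should vanish.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # Minkowski metric diag(-1, 1, 1, 1) in a local inertial frame

def perfect_fluid(rho, p, u):
    """Equation (6.30): T_{ab} = (rho + p) u_a u_b + p g_ab, with g = eta here."""
    u_low = [ETA[m] * u[m] for m in range(4)]
    return [[(rho + p) * u_low[m] * u_low[n] + (p * ETA[m] if m == n else 0.0)
             for n in range(4)] for m in range(4)]

def contract(T, a, b):
    """T_{mu nu} a^mu b^nu."""
    return sum(T[m][n] * a[m] * b[n] for m in range(4) for n in range(4))

v = 0.8                                # fluid speed along x (illustrative)
gamma = 1.0 / math.sqrt(1.0 - v * v)
u = [gamma, gamma * v, 0.0, 0.0]       # fluid 4-velocity
e1 = [gamma * v, gamma, 0.0, 0.0]      # unit spacelike vector orthogonal to u
T = perfect_fluid(rho=3.0, p=0.5, u=u)
print(contract(T, u, u))    # approximately rho = 3.0
print(contract(T, e1, e1))  # approximately p = 0.5
print(contract(T, u, e1))   # approximately 0
```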


6.4 Einstein’s equation

Postulates of General Relativity.

1. Spacetime is a 4d Lorentzian manifold equipped with the Levi-Civita connection.

2. Free particles follow timelike or null geodesics.

3. The energy, momentum, and stresses of matter are described by a symmetric tensor $T_{ab}$ that is conserved: $\nabla^aT_{ab} = 0$.

4. The curvature of spacetime is related to the energy-momentum tensor of matter by the Einstein equation (1915)
$$G_{ab} \equiv R_{ab} - \frac{1}{2}Rg_{ab} = 8\pi G\,T_{ab} \qquad (6.32)$$
where G is Newton’s constant.
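Taking the trace of (6.32) with $g^{ab}$, and using $g^{ab}g_{ab} = 4$ in four dimensions, gives $R - 2R = 8\pi G\,T$ with $T \equiv T^a{}_a$, i.e. $R = -8\pi G\,T$. Substituting this back yields an equivalent, trace-reversed form of the Einstein equation:

$$R_{ab} = 8\pi G\left(T_{ab} - \frac{1}{2}T g_{ab}\right)$$

Setting $T_{ab} = 0$ gives $R_{ab} = 0$ directly.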
We have discussed points 1-3 above. It remains to discuss the Einstein equation.
We can motivate this as follows. In GR, the gravitational field is described by the
curvature of spacetime. Since the energy of matter should be responsible for gravitation,
we expect some relationship between curvature and the energy-momentum tensor. The
simplest possibility is a linear relationship, i.e., a curvature tensor is proportional to
Tab . Since Tab is symmetric, it is natural to expect the Ricci tensor to be the relevant
curvature tensor.
Einstein’s first guess for the field equation of GR was $R_{ab} = CT_{ab}$ for some constant C. This does not work for the following reason. The RHS is conserved hence this equation implies $\nabla^aR_{ab} = 0$. But then from the contracted Bianchi identity we get $\nabla_aR = 0$. Taking the trace of the equation gives R = CT (where $T = T^a{}_a$) and hence we must have $\nabla_aT = 0$, i.e., T is constant. But T vanishes in empty space and is usually non-zero inside matter. Hence this is unsatisfactory.
The solution to this problem is obvious once one knows of the contracted Bianchi
identity. Take Gab , rather than Rab , to be proportional to Tab . The coefficient of
proportionality on the RHS of Einstein’s equation is fixed by demanding that the
equation reduces to Newton’s law of gravitation when the gravitational field is weak
and the matter is moving non-relativistically. We will show this later.
Remarks.
1. In vacuum, $T_{ab} = 0$, so Einstein's equation gives $G_{ab} = 0$. Contracting indices gives $g^{ab}G_{ab} = R - 2R = -R = 0$. Hence the vacuum Einstein equation can be written as
$$R_{ab} = 0 \tag{6.33}$$
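Contracting the full equation (6.32) in the same way gives a form that is sometimes convenient when matter is present. This short derivation is a worked addition, assuming four spacetime dimensions so that $g^{ab}g_{ab} = 4$:

```latex
g^{ab}G_{ab} = R - \tfrac{1}{2}\cdot 4R = -R = 8\pi G\, T
\quad\Longrightarrow\quad R = -8\pi G\, T, \qquad T \equiv T^a{}_a,

R_{ab} = 8\pi G\, T_{ab} + \tfrac{1}{2} R\, g_{ab}
       = 8\pi G \left( T_{ab} - \tfrac{1}{2} T g_{ab} \right).
```

Setting $T_{ab} = 0$ recovers (6.33).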

Part 3 GR October 12, 2022 67 H.S. Reall


2. The "geodesic postulate" of GR is redundant. Using conservation of the energy-momentum tensor, it can be shown that a distribution of matter that is small (compared to the scale on which the spacetime metric varies), and sufficiently weak (so that its gravitational effect is small), must follow a geodesic. (See examples sheet 4 for the case of a point particle.) In the null case we've seen that the postulate follows from geometric optics.

3. The Einstein equation is a set of non-linear, second order, coupled, partial differential equations for the components of the metric $g_{\mu\nu}$. Very few physically interesting explicit solutions are known, so one has to develop other methods to solve the equation, e.g., numerical integration.

4. How unique is the Einstein equation? Is there any tensor, other than $G_{ab}$, that we could have put on the LHS? The answer is supplied by:

Theorem (Lovelock 1972). Let $H_{ab}$ be a symmetric tensor such that (i) in any coordinate chart, at any point, $H_{\mu\nu}$ is a function of $g_{\mu\nu}$, $g_{\mu\nu,\rho}$ and $g_{\mu\nu,\rho\sigma}$ at that point; (ii) $\nabla^a H_{ab} = 0$; (iii) either spacetime is four-dimensional or $H_{\mu\nu}$ depends linearly on $g_{\mu\nu,\rho\sigma}$. Then there exist constants $\alpha$ and $\beta$ such that
$$H_{ab} = \alpha G_{ab} + \beta g_{ab} \tag{6.34}$$

Hence (as Einstein realized) there is the freedom to add a constant multiple of $g_{ab}$ to the LHS of Einstein's equation, giving
$$G_{ab} + \Lambda g_{ab} = 8\pi G\, T_{ab} \tag{6.35}$$
$\Lambda$ is called the cosmological constant. This no longer reduces to Newtonian theory for slow motion in a weak field, but the deviation from Newtonian theory is unobservable
if $\Lambda$ is sufficiently small. Note that $|\Lambda|^{-1/2}$ has the dimensions of length. The effects of $\Lambda$ are negligible on lengths or times small compared to this quantity. Astronomical observations indicate that there is indeed a very small positive cosmological constant: $\Lambda^{-1/2} \sim 10^{10}$ light years, the same order of magnitude as the size of the observable Universe. Hence the effects of the cosmological constant are negligible except on cosmological length scales. Therefore we can set $\Lambda = 0$ unless we discuss cosmology.
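The order of magnitude of $\Lambda^{-1/2}$ can be estimated from cosmological parameters. The Python sketch below assumes the standard FLRW relation $\Lambda = 3\Omega_\Lambda H_0^2/c^2$ (not derived in this section) and illustrative values $H_0 = 70$ km/s/Mpc and $\Omega_\Lambda = 0.7$; the function and constant names are our own.

```python
C_LIGHT = 2.998e8          # speed of light, m/s
MPC = 3.086e22             # one megaparsec, m
LIGHT_YEAR = 9.461e15      # one light year, m

def lambda_length_ly(h0_km_s_mpc=70.0, omega_lambda=0.7):
    """Return Lambda**(-1/2) in light years.

    Assumption: Lambda = 3 * Omega_Lambda * H0**2 / c**2, with Lambda in m**-2
    (geometrised units), as in standard FLRW cosmology.
    """
    h0 = h0_km_s_mpc * 1.0e3 / MPC                   # Hubble rate in s**-1
    lam = 3.0 * omega_lambda * (h0 / C_LIGHT) ** 2   # Lambda in m**-2
    return lam ** -0.5 / LIGHT_YEAR

length = lambda_length_ly()
print(f"Lambda^(-1/2) ~ {length:.2e} light years")
```

Changing the assumed parameters within their observed ranges shifts the result only by a factor of order one.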
Note that we can move the cosmological constant term to the RHS of the Einstein equation and regard it as the energy-momentum tensor of a perfect fluid with $\rho = -p = \Lambda/(8\pi G)$. Hence the cosmological constant is sometimes referred to as dark energy or vacuum energy. It is a great mystery why it is so small, because arguments from quantum field theory suggest that it should be $10^{120}$ times larger. This is the cosmological constant problem. One (controversial) proposed solution of this problem invokes the anthropic principle, which posits the existence of many possible universes with different values for constants such as $\Lambda$. If $\Lambda$ were very different from its observed value, then galaxies would never have formed and hence we would not be here.
Remark. We have explicitly written Newton's constant $G$ throughout this section. Henceforth we shall choose units so that $G = c = 1$.

6.5 Gauge freedom

A physical field theory is said to possess “gauge freedom” or “gauge symmetry” if
there exist distinct field configurations that are physically equivalent, i.e., they cannot
be distinguished by observations. A transformation that maps a field configuration to
a physically equivalent configuration is called a “gauge transformation”. By definition,
observable quantities are “gauge invariant”, i.e., they should take the same value for
field configurations related by a gauge transformation.
In GR we describe physics with a manifold $M$ on which certain tensor fields, e.g. the metric $g$, the Maxwell field $F$, etc., are defined. If $\phi: M \to N$ is a diffeomorphism then there is no way of distinguishing $(M, g, F, \ldots)$ from $(N, \phi_*(g), \phi_*(F), \ldots)$; they give equivalent descriptions of physics. For example, if the metric $g$ on $M$ has components $g_{\mu\nu}$ in a basis $\{e_\mu\}$ for $T_p(M)$, then the metric $\phi_*(g)$ has the same components $g_{\mu\nu}$ in the basis $\{\phi_*(e_\mu)\}$ for $T_{\phi(p)}(N)$. If we set $N = M$, this reveals that, on $M$, the set of tensor fields $(\phi_*(g), \phi_*(F), \ldots)$ is physically indistinguishable from $(g, F, \ldots)$. It follows that diffeomorphisms are gauge transformations in GR.
Example. Consider three particles following geodesics of the metric $g$. Assume that the worldlines of particles 1 and 2 intersect at $p$ and that the worldlines of particles 2 and 3 intersect at $q$. Applying a diffeomorphism $\phi: M \to M$ maps the worldlines to geodesics of $\phi_*(g)$ which intersect at the points $\phi(p)$ and $\phi(q)$. Note that in general $\phi(p) \neq p$, so saying "particles 1 and 2 coincide at $p$" is not a gauge-invariant statement. An example of a quantity that is gauge invariant is the proper time between the two intersections along the worldline of particle 2.
This gauge freedom raises the following puzzle. The metric tensor is symmetric and hence has 10 independent components. Consider the vacuum Einstein equation: this appears to give 10 independent equations for these 10 components, which looks good. However, the Einstein equation should not determine the components of the metric tensor uniquely, but only up to diffeomorphisms. The resolution is that not all components of the Einstein equation are truly independent, because they are related by the contracted Bianchi identity $\nabla^a G_{ab} = 0$, which supplies 4 relations among them. So the Einstein equation does not determine the metric uniquely.

Consider electromagnetism in Minkowski spacetime. We can write the Maxwell tensor in terms of a covector potential $A_\mu$ as $F_{\mu\nu} = 2\partial_{[\mu}A_{\nu]}$. A gauge transformation is the map $A_\mu \to A_\mu + \partial_\mu\Lambda$ for any function $\Lambda$: this does not change the observable quantity $F_{\mu\nu}$. Now consider the initial value problem for Maxwell's equations written in terms of $A_\mu$. Can we specify initial data for $A_\mu$ and then solve Maxwell's equations to determine the solution $A_\mu$ uniquely at a later time? No, because of the gauge symmetry! More formally, the initial value problem for $A_\mu$ is not well-posed: one does not have uniqueness of solutions. To render the initial value problem well-posed, one needs to eliminate the gauge symmetry. This is achieved via a gauge condition. A common choice in Maxwell's theory is Lorenz gauge, $\partial^\mu A_\mu = 0$; one can always impose this condition via a gauge transformation. If we impose this condition, then Maxwell's equation reduces to the wave equation $\partial^\nu\partial_\nu A_\mu = 0$, which does admit a well-posed initial value problem.
Similar remarks apply to Einstein's equation. In this case, we can also obtain a well-posed initial value problem by imposing a suitable gauge condition. A common choice is harmonic gauge, defined by requiring the coordinates $x^\mu$ to satisfy the wave equation:
$$g^{\nu\rho}\nabla_\nu\nabla_\rho x^\mu = 0 \tag{6.36}$$
Such coordinates are called harmonic coordinates. Since this is just a wave equation, one can always construct such coordinates, at least locally. Expanding out the LHS, this condition is equivalent to
$$g^{\nu\rho}\Gamma^\mu_{\nu\rho} = 0 \tag{6.37}$$
This is a gauge condition analogous to the Lorenz gauge condition in electromagnetism. When this gauge condition is satisfied, the vacuum Einstein equation $R_{\mu\nu} = 0$ takes the form
$$g^{\rho\sigma}\partial_\rho\partial_\sigma g_{\mu\nu} + F_{\mu\nu}(g, \partial g) = 0 \tag{6.38}$$
where the nonlinear quantity $F_{\mu\nu}$ depends on $g_{\rho\sigma}$ and its first partial derivatives but not on its second derivatives. Notice that the second derivative term is the same as the one that appears in the wave equation. So, in harmonic gauge, the Einstein equation is a nonlinear wave equation, which admits a well-posed initial value problem. In particular, this means that we should expect solutions of the Einstein equation to exhibit wavelike features, i.e., the theory predicts the existence of gravitational waves.
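The "expanding out" step from (6.36) to (6.37) can be spelled out by treating each coordinate $x^\mu$ (for fixed $\mu$) as a scalar field, so that $\nabla_\rho x^\mu$ is a covector and its covariant derivative uses the usual Christoffel term. A worked version of that step:

```latex
\nabla_\rho x^\mu = \partial_\rho x^\mu = \delta^\mu_\rho ,
\qquad
\nabla_\nu \nabla_\rho x^\mu = \partial_\nu \delta^\mu_\rho - \Gamma^\sigma_{\nu\rho}\,\delta^\mu_\sigma = -\Gamma^\mu_{\nu\rho},

g^{\nu\rho}\nabla_\nu \nabla_\rho x^\mu = -\,g^{\nu\rho}\Gamma^\mu_{\nu\rho}.
```

So (6.36) holds if and only if (6.37) does.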

7 Linearized theory
7.1 The linearized Einstein equation
The nonlinearity of the Einstein equation makes it very hard to solve. However, in many circumstances, gravity is not strong and spacetime can be regarded as a perturbation of Minkowski spacetime. More precisely, we assume our spacetime manifold is $M = \mathbb{R}^4$ and that there exist globally defined "almost inertial" coordinates $x^\mu$ for which the metric can be written
$$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}, \qquad \eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1) \tag{7.1}$$
with the weakness of the gravitational field corresponding to the components of $h_{\mu\nu}$ being small compared to 1. Note that we are dealing with a situation in which we have two metrics defined on spacetime, namely $g_{ab}$ and the Minkowski metric $\eta_{ab}$. The former is supposed to be the physical metric, i.e., free particles move on geodesics of $g_{ab}$.
In linearized theory we regard $h_{\mu\nu}$ as the components of a tensor field in the sense of special relativity, i.e., it transforms as a tensor under Lorentz transformations of the coordinates $x^\mu$.
To first order in the perturbation, the inverse metric is
$$g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu}, \tag{7.2}$$
where we define
$$h^{\mu\nu} = \eta^{\mu\rho}\eta^{\nu\sigma}h_{\rho\sigma} \tag{7.3}$$
To prove this, just check that $g^{\mu\nu}g_{\nu\rho} = \delta^\mu_\rho$ to linear order in the perturbation. Here, and henceforth, we shall raise and lower indices using the Minkowski metric $\eta_{\mu\nu}$. At zeroth order this agrees with raising and lowering with $g_{\mu\nu}$. We shall determine the Einstein equation to first order in the perturbation $h_{\mu\nu}$.
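The claim (7.2) can also be checked numerically. With $h_{\mu\nu} = a\, s_{\mu\nu}$ for a fixed symmetric matrix $s$ and a small amplitude $a$, the product $(\eta^{\mu\nu} - h^{\mu\nu})g_{\nu\rho}$ should differ from $\delta^\mu_\rho$ only at order $a^2$. The pure-Python sketch below (random test matrix and helper names are our own) measures that residual for two amplitudes.

```python
import random

# eta_{mu nu} = diag(-1, 1, 1, 1); it is its own inverse, so it also serves as eta^{mu nu}.
ETA = [[-1.0 if i == j == 0 else (1.0 if i == j else 0.0) for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def max_error_of_first_order_inverse(amp, seed=0):
    """Largest entry of (eta^{..} - h^{..}) g_{..} - identity, for h_{mu nu} = amp * s_{mu nu}."""
    rng = random.Random(seed)
    s = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(i, 4):
            s[i][j] = s[j][i] = rng.uniform(-1.0, 1.0)   # fixed random symmetric matrix
    h_low = [[amp * s[i][j] for j in range(4)] for i in range(4)]
    g_low = [[ETA[i][j] + h_low[i][j] for j in range(4)] for i in range(4)]   # eq. (7.1)
    h_up = matmul(matmul(ETA, h_low), ETA)                                    # eq. (7.3)
    g_up = [[ETA[i][j] - h_up[i][j] for j in range(4)] for i in range(4)]     # eq. (7.2)
    product = matmul(g_up, g_low)
    return max(abs(product[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4))

for amp in (1e-2, 1e-3):
    print(amp, max_error_of_first_order_inverse(amp))
```

The scaling is exact here: in matrix form $(\eta - \eta h \eta)(\eta + h) = 1 - \eta h \eta h$, so reducing $a$ by a factor of 10 reduces the residual by a factor of 100.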
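To first order, the Christoffel symbols are
$$\Gamma^\mu_{\nu\rho} = \frac{1}{2}\eta^{\mu\sigma}\left(h_{\sigma\nu,\rho} + h_{\sigma\rho,\nu} - h_{\nu\rho,\sigma}\right), \tag{7.4}$$
the Riemann tensor is (neglecting $\Gamma\Gamma$ terms since they are second order in the perturbation):
$$R_{\mu\nu\rho\sigma} = \eta_{\mu\tau}\left(\partial_\rho\Gamma^\tau_{\nu\sigma} - \partial_\sigma\Gamma^\tau_{\nu\rho}\right) = \frac{1}{2}\left(h_{\mu\sigma,\nu\rho} + h_{\nu\rho,\mu\sigma} - h_{\nu\sigma,\mu\rho} - h_{\mu\rho,\nu\sigma}\right) \tag{7.5}$$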
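and the Ricci tensor is
$$R_{\mu\nu} = \partial^\rho\partial_{(\mu}h_{\nu)\rho} - \frac{1}{2}\partial^\rho\partial_\rho h_{\mu\nu} - \frac{1}{2}\partial_\mu\partial_\nu h, \tag{7.6}$$
where $\partial_\mu$ denotes $\partial/\partial x^\mu$ as usual, and
$$h = h^\mu{}_\mu \tag{7.7}$$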
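As a worked check, (7.6) follows from (7.5) by contracting the first and third indices, $R_{\nu\sigma} = R^\mu{}_{\nu\mu\sigma} = \eta^{\mu\rho}R_{\mu\nu\rho\sigma}$ (raising with $\eta$ suffices at first order), and a further contraction gives the Ricci scalar:

```latex
R_{\nu\sigma} = \tfrac{1}{2}\left( \partial^\rho\partial_\nu h_{\rho\sigma} + \partial^\rho\partial_\sigma h_{\nu\rho}
              - \partial^\rho\partial_\rho h_{\nu\sigma} - \partial_\nu\partial_\sigma h \right)
            = \partial^\rho\partial_{(\nu}h_{\sigma)\rho} - \tfrac{1}{2}\partial^\rho\partial_\rho h_{\nu\sigma}
              - \tfrac{1}{2}\partial_\nu\partial_\sigma h ,

R = \eta^{\nu\sigma}R_{\nu\sigma} = \partial^\rho\partial^\sigma h_{\rho\sigma} - \partial^\rho\partial_\rho h .
```

Substituting these into $G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}\eta_{\mu\nu}R$ (again adequate at first order, since $R$ is itself first order) gives the linearized Einstein tensor.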

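To first order, the Einstein tensor is
$$G_{\mu\nu} = \partial^\rho\partial_{(\mu}h_{\nu)\rho} - \frac{1}{2}\partial^\rho\partial_\rho h_{\mu\nu} - \frac{1}{2}\partial_\mu\partial_\nu h - \frac{1}{2}\eta_{\mu\nu}\left(\partial^\rho\partial^\sigma h_{\rho\sigma} - \partial^\rho\partial_\rho h\right). \tag{7.8}$$
The Einstein equation equates this to $8\pi T_{\mu\nu}$ (which must therefore be assumed to be first order). It is convenient to define
$$\bar{h}_{\mu\nu} = h_{\mu\nu} - \frac{1}{2}h\,\eta_{\mu\nu}, \tag{7.9}$$
with inverse
$$h_{\mu\nu} = \bar{h}_{\mu\nu} - \frac{1}{2}\bar{h}\,\eta_{\mu\nu}, \qquad \left(\bar{h} = \bar{h}^\mu{}_\mu = -h\right) \tag{7.10}$$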
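The linearized Einstein equation is then (exercise)
$$-\frac{1}{2}\partial^\rho\partial_\rho\bar{h}_{\mu\nu} + \partial^\rho\partial_{(\mu}\bar{h}_{\nu)\rho} - \frac{1}{2}\eta_{\mu\nu}\partial^\rho\partial^\sigma\bar{h}_{\rho\sigma} = 8\pi T_{\mu\nu} \tag{7.11}$$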
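For a plane wave $h_{\mu\nu} \propto e^{ik_\rho x^\rho}$ every derivative becomes $ik_\mu$; since each term of (7.8) and of the LHS of (7.11) carries exactly two derivatives, the common factor $i^2 = -1$ can be dropped and both become algebraic in $k_\mu$. The Python sketch below, our own numerical check rather than a solution from the notes, uses this to test the exercise for a random symmetric $h_{\mu\nu}$ and random $k_\mu$, and also checks that (7.9) and (7.10) are mutually inverse.

```python
import random

ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_{mu nu}; eta^{mu nu} has the same entries

def delta(m, n):
    return 1.0 if m == n else 0.0

def trace(h):
    """h = eta^{mu nu} h_{mu nu}."""
    return sum(ETA[m] * h[m][m] for m in range(4))

def trace_reverse(h):
    """hbar_{mu nu} = h_{mu nu} - (1/2) h eta_{mu nu}, eq. (7.9)."""
    t = trace(h)
    return [[h[m][n] - 0.5 * t * ETA[m] * delta(m, n) for n in range(4)] for m in range(4)]

def contractions(h, k):
    """Return k^rho k_rho, the list k^rho h_{rho nu}, and k^rho k^sigma h_{rho sigma}."""
    ku = [ETA[m] * k[m] for m in range(4)]
    k2 = sum(ku[m] * k[m] for m in range(4))
    kh = [sum(ku[r] * h[r][n] for r in range(4)) for n in range(4)]
    khk = sum(kh[n] * ku[n] for n in range(4))
    return k2, kh, khk

def einstein_symbol(h, k):
    """Eq. (7.8) with each pair of derivatives d_a d_b replaced by k_a k_b."""
    k2, kh, khk = contractions(h, k)
    t = trace(h)
    return [[0.5 * (k[m] * kh[n] + k[n] * kh[m]) - 0.5 * k2 * h[m][n] - 0.5 * k[m] * k[n] * t
             - 0.5 * ETA[m] * delta(m, n) * (khk - k2 * t)
             for n in range(4)] for m in range(4)]

def lhs_711_symbol(hbar, k):
    """LHS of eq. (7.11) with each pair of derivatives d_a d_b replaced by k_a k_b."""
    k2, kh, khk = contractions(hbar, k)
    return [[-0.5 * k2 * hbar[m][n] + 0.5 * (k[m] * kh[n] + k[n] * kh[m])
             - 0.5 * ETA[m] * delta(m, n) * khk
             for n in range(4)] for m in range(4)]

rng = random.Random(1)
h = [[0.0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(i, 4):
        h[i][j] = h[j][i] = rng.uniform(-1.0, 1.0)
k = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # covector k_mu; it need not be null here

g_sym = einstein_symbol(h, k)
lhs = lhs_711_symbol(trace_reverse(h), k)
mismatch = max(abs(g_sym[m][n] - lhs[m][n]) for m in range(4) for n in range(4))
back = trace_reverse(trace_reverse(h))
roundtrip = max(abs(back[m][n] - h[m][n]) for m in range(4) for n in range(4))
print(f"(7.8) vs (7.11): {mismatch:.1e}, (7.9) then (7.10): {roundtrip:.1e}")
```

A plane-wave check like this only tests the algebra, but since both expressions are linear in $h_{\mu\nu}$ with constant coefficients, agreement for arbitrary $k_\mu$ and $h_{\mu\nu}$ is exactly the content of the exercise.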
We must now discuss the gauge symmetry present in this theory. We argued above that diffeomorphisms are gauge transformations in GR. A manifold $M$ with metric $g$ and energy-momentum tensor $T$ is physically equivalent to $M$ with metric $\phi_*(g)$ and energy-momentum tensor $\phi_*(T)$ if $\phi$ is a diffeomorphism. Now we are restricting attention to metrics of the form (7.1). Hence we must consider which diffeomorphisms preserve this form. A general diffeomorphism would lead to $(\phi_*(\eta))_{\mu\nu}$ very different from $\mathrm{diag}(-1, 1, 1, 1)$, and hence such a diffeomorphism would not preserve (7.1). However, if we consider a 1-parameter family of diffeomorphisms $\phi_t$, then $\phi_0$ is the identity map, so if $t$ is small then $\phi_t$ is close to the identity and hence will have a small effect, i.e., $(\phi_{t*}(\eta))_{\mu\nu}$ will be close to $\mathrm{diag}(-1, 1, 1, 1)$ and the form (7.1) will be preserved. For small $t$, we can use the definition of the Lie derivative to deduce that, for any tensor $T$,
$$(\phi_{-t})_*(T) = T + t\,\mathcal{L}_X T + O(t^2) = T + \mathcal{L}_\xi T + O(t^2) \tag{7.12}$$
where $X^a$ is the vector field that generates $\phi_t$ and $\xi^a = tX^a$. Note that $\xi^a$ is small, so we treat it as a first order quantity. If we apply this result to the energy-momentum
tensor, evaluating in our chart $x^\mu$, then the first term is small (by assumption), so the second term is higher order and can be neglected. Hence the energy-momentum tensor is gauge-invariant to first order. The same is true for any tensor that vanishes in the unperturbed spacetime, e.g. the Riemann tensor. However, for the metric we have
$$(\phi_{-t})_*(g) = g + \mathcal{L}_\xi g + \ldots = \eta + h + \mathcal{L}_\xi\eta + \ldots \tag{7.13}$$
where we have neglected $\mathcal{L}_\xi h$ because this is a higher order quantity (as $\xi$ and $h$ are both small). Comparing this with (7.1), we deduce that $h$ and $h + \mathcal{L}_\xi\eta$ describe physically equivalent metric perturbations. Therefore linearized GR has the gauge symmetry $h \to h + \mathcal{L}_\xi\eta$ for small $\xi^\mu$. In our chart $x^\mu$, we can use (5.21) to write $(\mathcal{L}_\xi\eta)_{\mu\nu} = \partial_\mu\xi_\nu + \partial_\nu\xi_\mu$, and so the gauge symmetry is
$$h_{\mu\nu} \to h_{\mu\nu} + \partial_\mu\xi_\nu + \partial_\nu\xi_\mu \tag{7.14}$$
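The statement that the linearized Riemann tensor is gauge invariant can be tested on a single Fourier mode. Take $h_{\mu\nu} = P_{\mu\nu}e^{ik_\rho x^\rho}$ and a gauge vector $\xi_\mu = -iX_\mu e^{ik_\rho x^\rho}$ (the amplitude names $P$, $X$ are ours), so that (7.14) shifts $P_{\mu\nu}$ by the real amount $k_\mu X_\nu + k_\nu X_\mu$. The Python sketch below evaluates (7.5) before and after the shift for random $P$, $k$, $X$.

```python
import itertools
import random

def riemann_symbol(amp, k):
    """Eq. (7.5) for h_{mu nu} = amp_{mu nu} exp(i k.x): each d_a d_b becomes -k_a k_b.

    The common factor -1 is dropped; only the algebraic structure matters here.
    """
    R = {}
    for m, n, r, s in itertools.product(range(4), repeat=4):
        R[m, n, r, s] = 0.5 * (amp[m][s] * k[n] * k[r] + amp[n][r] * k[m] * k[s]
                               - amp[n][s] * k[m] * k[r] - amp[m][r] * k[n] * k[s])
    return R

rng = random.Random(2)
k = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # k_mu; (7.5) holds for any k, null or not
X = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # gauge amplitude: xi_mu = -i X_mu exp(i k.x)
amp = [[0.0] * 4 for _ in range(4)]
for i in range(4):
    for j in range(i, 4):
        amp[i][j] = amp[j][i] = rng.uniform(-1.0, 1.0)

# Gauge transformation (7.14): d_mu xi_nu + d_nu xi_mu = (k_mu X_nu + k_nu X_mu) exp(i k.x).
amp_gauged = [[amp[m][n] + k[m] * X[n] + k[n] * X[m] for n in range(4)] for m in range(4)]

R_before = riemann_symbol(amp, k)
R_after = riemann_symbol(amp_gauged, k)
change = max(abs(R_after[key] - R_before[key]) for key in R_before)
size = max(abs(value) for value in R_before.values())
print(f"largest component {size:.3f}, largest change under the gauge transformation {change:.1e}")
```

The change vanishes identically (up to rounding) because each of the four terms generated by the shift cancels against another one, since partial derivatives commute.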

There is a close analogy with electromagnetism in flat spacetime, where we can introduce an electromagnetic potential $A_\mu$, a 4-vector obeying $F_{\mu\nu} = 2\partial_{[\mu}A_{\nu]}$. This has the gauge symmetry
$$A_\mu \to A_\mu + \partial_\mu\Lambda \tag{7.15}$$
for some scalar $\Lambda$. In this case, one can choose $\Lambda$ to impose the Lorenz gauge condition $\partial^\mu A_\mu = 0$. Similarly, in linearized GR we can choose $\xi_\mu$ to impose the gauge condition
$$\partial^\nu\bar{h}_{\mu\nu} = 0. \tag{7.16}$$
To see this, note that under the gauge transformation (7.14) we have
$$\partial^\nu\bar{h}_{\mu\nu} \to \partial^\nu\bar{h}_{\mu\nu} + \partial^\nu\partial_\nu\xi_\mu \tag{7.17}$$
so if we choose $\xi_\mu$ to satisfy the inhomogeneous wave equation $\partial^\nu\partial_\nu\xi_\mu = -\partial^\nu\bar{h}_{\mu\nu}$ (which we can solve using a Green function), then we reach the gauge (7.16). This is called variously Lorenz, de Donder, or harmonic gauge (it is the linearized version of (6.37)).
In this gauge, the linearized Einstein equation reduces to
$$\partial^\rho\partial_\rho\bar{h}_{\mu\nu} = -16\pi T_{\mu\nu} \tag{7.18}$$
Hence, in this gauge, each component of $\bar{h}_{\mu\nu}$ satisfies the wave equation with a source given by the energy-momentum tensor. Given appropriate boundary conditions, the solution can be determined using a Green function.

7.2 The Newtonian limit

Recall that in Newtonian gravity, the gravitational field is $\mathbf{g} = -\nabla\Phi$, where the gravitational potential satisfies Poisson's equation
$$\nabla^2\Phi = 4\pi G\rho \tag{7.19}$$
with $\rho$ the mass density of matter. For a localised distribution of matter (i.e. $\rho \to 0$ at large $\mathbf{x}$), we can solve this using a Green function:
$$\Phi(t, \mathbf{x}) = -G\int d^3x'\,\frac{\rho(t, \mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|} \tag{7.20}$$
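To illustrate (7.20), the Python sketch below approximates the Green-function integral for a uniform ball by a midpoint sum over a cubic grid, in units $G = 1$ (an assumption of the sketch, as are the grid size and density). Outside the ball the result should agree with $-GM/r$, where $M$ is the mass carried by the grid cells.

```python
import math

def potential_from_green_function(point, radius=1.0, n=40, G=1.0, rho=1.0):
    """Approximate eq. (7.20) for a uniform ball by a midpoint sum over an n^3 cubic grid.

    Returns (Phi at point, total mass represented by the grid cells inside the ball).
    """
    h = 2.0 * radius / n
    phi = 0.0
    mass = 0.0
    for i in range(n):
        x = -radius + (i + 0.5) * h
        for j in range(n):
            y = -radius + (j + 0.5) * h
            for k in range(n):
                z = -radius + (k + 0.5) * h
                if x * x + y * y + z * z <= radius * radius:
                    dm = rho * h ** 3
                    mass += dm
                    phi -= G * dm / math.dist(point, (x, y, z))
    return phi, mass

phi, mass = potential_from_green_function((3.0, 0.0, 0.0))
print(phi, -mass / 3.0)  # outside the ball, Phi should be close to -G M / r with r = 3
```

The grid has the symmetry of a cube, so its quadrupole moment vanishes and the leading correction to $-GM/r$ is very small at $r = 3$.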
A necessary condition for validity of Newtonian theory is that the motion of bodies is non-relativistic. Consider a particle moving in the Newtonian gravitational potential $\Phi$ of some localised object, e.g. a star or a galaxy. Assume that the particle starts at rest far away from the object, where $\Phi$ is negligible, and then falls towards the object. From energy conservation we have $\frac{1}{2}v^2 + \Phi = 0$, hence $v^2 \sim |\Phi|$, so the motion remains non-relativistic only if $|\Phi| \ll 1$ everywhere. The same conclusion is reached by looking at bound orbits. Hence validity of Newtonian theory requires that the gravitational field is weak.
For a spherical star of mass $M$ and radius $R$, at the surface of the star we have $\Phi = -GM/R$, and so $|\Phi| \ll 1$ provided $R \gg GM$. This is satisfied by most "normal" stars, e.g. the Sun has $R \approx 7 \times 10^5$ km and $GM \approx 1.5$ km. The only known objects with $R \sim GM$ are black holes and neutron stars.
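The solar numbers quoted above are easy to reproduce. The sketch below restores $c$ explicitly (so $GM$ means $GM/c^2$ and $|\Phi| = GM/(Rc^2)$) and uses standard approximate SI values of $G$, $c$, $M_\odot$ and $R_\odot$, which are our inputs rather than values from the notes.

```python
G_SI = 6.674e-11       # Newton's constant, m^3 kg^-1 s^-2
C_SI = 2.998e8         # speed of light, m/s
M_SUN = 1.989e30       # solar mass, kg
R_SUN_KM = 6.96e5      # approximate solar radius, km

gm_over_c2_km = G_SI * M_SUN / C_SI**2 / 1.0e3   # GM/c^2 converted to km
surface_phi = gm_over_c2_km / R_SUN_KM           # |Phi| = GM/(R c^2) at the solar surface
print(f"GM/c^2 = {gm_over_c2_km:.2f} km, |Phi| at surface = {surface_phi:.1e}")
```

So $|\Phi|$ at the solar surface is of order $10^{-6}$, comfortably in the weak-field regime.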
We will now see how GR reduces to Newtonian theory in the limit of non-relativistic motion and a weak gravitational field. To do this, we could reintroduce factors of $c$ and try to expand everything in powers of $1/c$, since we expect Newtonian theory to be valid as $c \to \infty$. We will follow an equivalent approach in which we stick to our convention $c = 1$ but introduce a small dimensionless parameter $0 < \epsilon \ll 1$ such that a factor of $\epsilon$ appears everywhere that a factor of $1/c$ would appear.
We assume that, for some choice of almost-inertial coordinates $x^\mu = (t, x^i)$, the 3-velocity of any particle $v^i = dx^i/d\tau$ is $O(\epsilon)$. In Newtonian theory we had $v^2 \sim |\Phi|$, and so we expect the gravitational field to be $O(\epsilon^2)$. We assume that
$$h_{00} = O(\epsilon^2), \qquad h_{0i} = O(\epsilon^3), \qquad h_{ij} = O(\epsilon^2) \tag{7.21}$$
We'll see below how the additional factor of $\epsilon$ in $h_{0i}$ emerges.

Since the matter which generates the gravitational field is assumed to move non-relativistically, time derivatives of the gravitational field will be small compared to spatial derivatives. Let $L$ denote the length scale over which $h_{\mu\nu}$ varies, i.e., if $X$ denotes a typical component of $h_{\mu\nu}$ then $|\partial_i X| = O(X/L)$. Our assumption of small time derivatives is
$$\partial_0 X = O(\epsilon X/L) \tag{7.22}$$
For example, in Newtonian theory, the gravitational field of a body of mass $M$ at position $\mathbf{x}(t)$ is $\Phi = -M/|\mathbf{x} - \mathbf{x}(t)|$, which obeys these formulae with $L = |\mathbf{x} - \mathbf{x}(t)|$ and $|\dot{\mathbf{x}}| = O(\epsilon)$.
Consider the equations for a timelike geodesic. The Lagrangian is (adding a hat to avoid confusion with the length $L$)
$$\hat{L} = (1 - h_{00})\dot{t}^2 - 2h_{0i}\dot{t}\dot{x}^i - (\delta_{ij} + h_{ij})\dot{x}^i\dot{x}^j \tag{7.23}$$
where a dot denotes a derivative with respect to proper time $\tau$. Our non-relativistic assumption is $\dot{x}^i = O(\epsilon)$. From the definition of proper time we have $\hat{L} = 1$, which can be solved to give
$$\dot{t} = 1 + \frac{1}{2}h_{00} + \frac{1}{2}\delta_{ij}\dot{x}^i\dot{x}^j + O(\epsilon^4) \tag{7.24}$$
The Euler-Lagrange equation for $x^i$ is
$$\frac{d}{d\tau}\left(-2h_{0i}\dot{t} - 2(\delta_{ij} + h_{ij})\dot{x}^j\right) = -h_{00,i}\dot{t}^2 - 2h_{0j,i}\dot{t}\dot{x}^j - h_{jk,i}\dot{x}^j\dot{x}^k = -h_{00,i} + O(\epsilon^4/L) \tag{7.25}$$
The LHS is $-2\ddot{x}^i$ plus subleading terms. Retaining only the leading order terms gives
$$\ddot{x}^i = \frac{1}{2}h_{00,i} \tag{7.26}$$
Finally we can convert $\tau$ derivatives on the LHS to $t$ derivatives using the chain rule and (7.24) to obtain
$$\frac{d^2x^i}{dt^2} = -\partial_i\Phi \tag{7.27}$$
where
$$\Phi \equiv -\frac{1}{2}h_{00} \tag{7.28}$$
We have recovered the equation of motion for a test body moving in a Newtonian gravitational field $\Phi$.
Exercise. Show that the corrections to (7.27) are $O(\epsilon^4/L)$. You can argue as follows. (7.26) implies $\ddot{x}^i = O(\epsilon^2/L)$. Now use (7.24) to show $\ddot{t} = O(\epsilon^3/L)$. Expand out the derivative on the LHS of (7.25) to show that the corrections to (7.26) are $O(\epsilon^4/L)$. Finally convert $\tau$ derivatives to $t$ derivatives using (7.24).
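The expansion (7.24) can be checked numerically in a toy version with a single spatial direction. The sketch below is our own: the scalings $h_{00} = a\epsilon^2$, $h_{0x} = b\epsilon^3$, $h_{xx} = c\epsilon^2$, $\dot{x} = v\epsilon$ are hypothetical choices consistent with (7.21). It solves $\hat{L} = 1$ exactly as a quadratic in $\dot{t}$ and compares with (7.24). If the remainder really is $O(\epsilon^4)$, halving $\epsilon$ should shrink the error by a factor of about 16.

```python
import math

def tdot_exact(eps, a=1.0, b=1.0, c=1.0, v=1.0):
    """Solve L-hat = 1, eq. (7.23), for t-dot in a toy model with one spatial direction.

    Hypothetical scalings: h00 = a eps^2, h0x = b eps^3, hxx = c eps^2, x-dot = v eps.
    """
    h00, h0x, hxx, xdot = a * eps**2, b * eps**3, c * eps**2, v * eps
    # (1 - h00) tdot^2 - 2 h0x xdot tdot - (1 + hxx) xdot^2 - 1 = 0
    qa = 1.0 - h00
    qb = -2.0 * h0x * xdot
    qc = -(1.0 + hxx) * xdot**2 - 1.0
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)  # positive root

def tdot_approx(eps, a=1.0, v=1.0):
    """Eq. (7.24) with the O(eps^4) remainder dropped."""
    return 1.0 + 0.5 * a * eps**2 + 0.5 * (v * eps) ** 2

for eps in (0.02, 0.01, 0.005):
    err = tdot_exact(eps) - tdot_approx(eps)
    print(f"eps={eps}: error={err:.3e}, error/eps^4={err / eps**4:.3f}")
```

The ratio error$/\epsilon^4$ settling to a constant is the numerical signature of an $O(\epsilon^4)$ remainder.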

The next thing we need to show is that $\Phi$ satisfies the Poisson equation (7.19).
First consider the energy-momentum tensor. Assume that one can ascribe a 4-velocity $u^a$ to the matter. Our non-relativistic assumption implies
$$u^i = O(\epsilon), \qquad u^0 = 1 + O(\epsilon^2) \tag{7.29}$$
where the second equality follows from $g_{ab}u^au^b = -1$. The energy density in the rest frame of the matter is
$$\rho \equiv u^a u^b T_{ab} \tag{7.30}$$
Recall that $-T_{0i}$ is the momentum density measured by an observer at rest in these coordinates, so we expect $-T_{0i} \sim \rho u_i = O(\epsilon\rho)$. $T_{ij}$ will have a part $\sim \rho u_i u_j = O(\epsilon^2\rho)$ arising from the motion of the matter. It will also have a contribution from the stresses in the matter. Under all but the most extreme circumstances, these are small compared to $\rho$. For example, in the rest frame of a perfect fluid, stresses are determined by the pressure $p$. The speed of sound in the fluid is $C$, where $C^2 = dp/d\rho \sim p/\rho$. Our non-relativistic assumption is that $C = O(\epsilon)$, hence $p = O(\epsilon^2\rho)$. This is true in the Solar system, where $p/\rho \sim |\Phi| \sim 10^{-5}$ at the centre of the Sun. Hence we make the following assumptions
$$T_{00} = \rho(1 + O(\epsilon^2)), \qquad T_{0i} = O(\epsilon\rho), \qquad T_{ij} = O(\epsilon^2\rho) \tag{7.31}$$
where the first equality follows from (7.29), (7.30) and the other two equalities.
Finally, we consider the linearized Einstein equation. Equation (7.21) implies that
$$\bar{h}_{00} = O(\epsilon^2), \qquad \bar{h}_{0i} = O(\epsilon^3), \qquad \bar{h}_{ij} = O(\epsilon^2) \tag{7.32}$$
Using our assumption about time derivatives being small compared to spatial derivatives, equation (7.18) becomes
$$\nabla^2\bar{h}_{00} = -16\pi\rho(1 + O(\epsilon^2)), \qquad \nabla^2\bar{h}_{0i} = O(\epsilon\rho), \qquad \nabla^2\bar{h}_{ij} = O(\epsilon^2\rho) \tag{7.33}$$
If we impose boundary conditions that the metric perturbation (gravitational field) should decay at large distance, then these equations can be solved using a Green function as in (7.20). The factors of $\epsilon$ on the RHS above imply that the resulting solutions satisfy
$$\bar{h}_{0i} = O(\epsilon\bar{h}_{00}) = O(\epsilon^3), \qquad \bar{h}_{ij} = O(\epsilon^2\bar{h}_{00}) = O(\epsilon^4) \tag{7.34}$$
Since $h_{0i} = \bar{h}_{0i}$, this explains why we had to assume $h_{0i} = O(\epsilon^3)$. From the second result, we have $\bar{h}_{ii} = O(\epsilon^4)$ and hence $\bar{h} = -\bar{h}_{00} + O(\epsilon^4)$. We can use (7.10) to recover $h_{\mu\nu}$. This gives $h_{00} = \frac{1}{2}\bar{h}_{00} + O(\epsilon^4)$ and so, using (7.28), we obtain Newton's law of gravitation:
$$\nabla^2\Phi = 4\pi\rho(1 + O(\epsilon^2)) \tag{7.35}$$
We also obtain $h_{ij} = \frac{1}{2}\bar{h}_{00}\delta_{ij} + O(\epsilon^4)$ and so
$$h_{ij} = -2\Phi\delta_{ij} + O(\epsilon^4) \tag{7.36}$$
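For reference, the trace-reversal steps used above can be written out explicitly. This worked version uses (7.10) with $\eta_{00} = -1$, $\eta_{ij} = \delta_{ij}$ and $\bar{h} = -\bar{h}_{00} + O(\epsilon^4)$:

```latex
h_{00} = \bar{h}_{00} - \tfrac{1}{2}\bar{h}\,\eta_{00}
       = \bar{h}_{00} - \tfrac{1}{2}\bar{h}_{00} + O(\epsilon^4)
       = \tfrac{1}{2}\bar{h}_{00} + O(\epsilon^4),

\Phi = -\tfrac{1}{2}h_{00} = -\tfrac{1}{4}\bar{h}_{00}
\;\Longrightarrow\;
\nabla^2\Phi = -\tfrac{1}{4}\nabla^2\bar{h}_{00} = 4\pi\rho\,(1 + O(\epsilon^2)),

h_{ij} = \bar{h}_{ij} - \tfrac{1}{2}\bar{h}\,\delta_{ij}
       = \tfrac{1}{2}\bar{h}_{00}\,\delta_{ij} + O(\epsilon^4)
       = -2\Phi\,\delta_{ij} + O(\epsilon^4).
```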

The expansion of various quantities in powers of $\epsilon$ can be extended to higher orders. As we have seen, the Newtonian approximation requires only the $O(\epsilon^2)$ term in $h_{00}$. The next order, post-Newtonian, approximation corresponds to including $O(\epsilon^4)$ terms in $h_{00}$, $O(\epsilon^3)$ terms in $h_{0i}$ and $O(\epsilon^2)$ terms in $h_{ij}$. Equation (7.36) gives $h_{ij}$ to $O(\epsilon^2)$. The above analysis also lets us write the $O(\epsilon^3)$ term in $h_{0i}$ in terms of $T_{0i}$ and a Green function. However, to obtain the $O(\epsilon^4)$ term in $h_{00}$ one has to go beyond linearized theory.

7.3 Gravitational waves

In vacuum, the linearized Einstein equation reduces to the source-free wave equation:
$$\partial^\rho\partial_\rho\bar{h}_{\mu\nu} = 0 \tag{7.37}$$
so the theory admits gravitational wave solutions. As usual for the wave equation, we can build a general solution as a superposition of plane wave solutions. So let's look for a plane wave solution:
$$\bar{h}_{\mu\nu}(x) = \mathrm{Re}\left(H_{\mu\nu}e^{ik_\rho x^\rho}\right) \tag{7.38}$$
where $H_{\mu\nu}$ is a constant symmetric complex matrix describing the polarization of the wave and $k^\mu$ is the (real) wavevector. We shall suppress the Re in all subsequent equations. The wave equation reduces to
$$k_\mu k^\mu = 0 \tag{7.39}$$
so the wavevector $k^\mu$ must be null; hence these waves propagate at the speed of light relative to the background Minkowski metric. The gauge condition (7.16) gives
$$k^\nu H_{\mu\nu} = 0, \tag{7.40}$$
i.e. the waves are transverse.

Example. Consider the null vector $k^\mu = \omega(1, 0, 0, 1)$. Then $\exp(ik_\mu x^\mu) = \exp(-i\omega(t - z))$, so this describes a wave of frequency $\omega$ propagating at the speed of light in the $z$-direction. The transverse condition reduces to
$$H_{\mu 0} + H_{\mu 3} = 0. \tag{7.41}$$

Returning to the general case, the condition (7.16) does not eliminate all gauge freedom. Consider a gauge transformation (7.14). From equation (7.17), we see that this preserves the gauge condition (7.16) if $\xi_\mu$ obeys the wave equation:
$$\partial^\nu\partial_\nu\xi_\mu = 0. \tag{7.42}$$
Hence there is a residual gauge freedom which we can exploit to simplify the solution. Take
$$\xi_\mu(x) = X_\mu e^{ik_\rho x^\rho} \tag{7.43}$$
which satisfies (7.42) because $k_\mu$ is null. Using
$$\bar{h}_{\mu\nu} \to \bar{h}_{\mu\nu} + \partial_\mu\xi_\nu + \partial_\nu\xi_\mu - \eta_{\mu\nu}\partial^\rho\xi_\rho \tag{7.44}$$
we see that the residual gauge freedom in our case is
$$H_{\mu\nu} \to H_{\mu\nu} + i\left(k_\mu X_\nu + k_\nu X_\mu - \eta_{\mu\nu}k^\rho X_\rho\right) \tag{7.45}$$
Exercise. Show that the residual gauge freedom can be used to achieve "longitudinal gauge":
$$H_{0\mu} = 0 \tag{7.46}$$
but (since $k^\nu H_{\mu\nu} = 0$) this still does not determine $X_\mu$ uniquely, and the freedom remains to impose the additional "trace-free" condition
$$H^\mu{}_\mu = 0. \tag{7.47}$$
In this gauge, we have
$$h_{\mu\nu} = \bar{h}_{\mu\nu}. \tag{7.48}$$

Example. Return to our wave travelling in the $z$-direction. The longitudinal gauge condition combined with the transversality condition (7.41) gives $H_{3\mu} = 0$. Using the trace-free condition now gives
$$H_{\mu\nu} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & H_+ & H_\times & 0 \\ 0 & H_\times & -H_+ & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{7.49}$$
where the solution is specified by the two constants $H_+$ and $H_\times$ corresponding to two independent polarizations. So gravitational waves are transverse and have two possible polarizations. This is one way of interpreting the statement that the gravitational field has two degrees of freedom per spacetime point.
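The conditions satisfied by (7.49), and the fact that the residual gauge freedom (7.45) preserves transversality, can be checked directly. The Python sketch below uses illustrative values of $\omega$, $H_+$, $H_\times$ and an arbitrary $X_\mu$ (all our own choices) and drops the overall factor $i$ in (7.45), which does not affect whether a contraction vanishes.

```python
ETA = [-1.0, 1.0, 1.0, 1.0]  # diagonal entries of eta_{mu nu} (and of eta^{mu nu})

def apply_eta(v):
    """Raise or lower a single index with the diagonal Minkowski metric."""
    return [ETA[m] * v[m] for m in range(4)]

def polarization(h_plus, h_cross):
    """Eq. (7.49): H_{mu nu} for a wave travelling in the z-direction."""
    return [[0.0, 0.0, 0.0, 0.0],
            [0.0, h_plus, h_cross, 0.0],
            [0.0, h_cross, -h_plus, 0.0],
            [0.0, 0.0, 0.0, 0.0]]

omega = 2.0
k_up = [omega, 0.0, 0.0, omega]   # k^mu = omega (1, 0, 0, 1)
k_low = apply_eta(k_up)           # k_mu
H = polarization(0.3, -0.7)       # illustrative values of H_+ and H_x

null = sum(k_up[m] * k_low[m] for m in range(4))                           # (7.39)
transverse = [sum(k_up[n] * H[m][n] for n in range(4)) for m in range(4)]  # (7.40)
longitudinal = [H[0][m] for m in range(4)]                                 # (7.46)
trace = sum(ETA[m] * H[m][m] for m in range(4))                            # (7.47), H^mu_mu
print(null, transverse, longitudinal, trace)

# Residual gauge shift (7.45) with the overall factor i dropped, for an arbitrary X_mu.
X = [0.4, -1.1, 0.5, 2.0]
kX = sum(k_up[m] * X[m] for m in range(4))   # k^rho X_rho
shift = [[k_low[m] * X[n] + k_low[n] * X[m] - (ETA[m] if m == n else 0.0) * kX
          for n in range(4)] for m in range(4)]
# k^nu times the shift equals (k^rho k_rho) X_mu, which vanishes because k is null,
# so the shift preserves the transversality condition (7.40).
shift_divergence = [sum(k_up[n] * shift[m][n] for n in range(4)) for m in range(4)]
print(shift_divergence)
```

The second check is the plane-wave form of (7.17): the shift changes $k^\nu H_{\mu\nu}$ by a multiple of $k^\rho k_\rho X_\mu$, which is zero for a null wavevector.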
How would one detect a gravitational wave? An observer could set up a family of test particles locally. The displacement vector $S^a$ from the observer to any particle is governed by the geodesic deviation equation. (We are taking $S^a$ to be infinitesimal, i.e., what we called $\delta s\, S^a$ previously.) So we can use this equation to predict what the observer would see. We have to be careful here. It would be natural to write out the geodesic deviation equation using the almost inertial coordinates and thereby determine $S^\mu$. But how would we relate this to observations? $S^\mu$ are the components of $S^a$ with respect to a certain basis, so how would we determine whether the variation in $S^\mu$ arises from variation of $S^a$ or from variation of the basis? With a bit more thought, one can make this approach work, but we shall take a different approach.
Consider an observer following a geodesic in a general spacetime. Our observer will be equipped with a set of measuring rods with which to measure distances. At some point $p$ on his worldline we could introduce a local inertial frame with spatial coordinates $X, Y, Z$ in which the observer is at rest. Imagine that the observer sets up measuring rods of unit length pointing in the $X, Y, Z$ directions at $p$. Mathematically, this defines an orthonormal basis $\{e_\alpha\}$ for $T_p(M)$ (we use $\alpha$ to label the basis vectors because we are using $\mu$ for our almost inertial coordinates) where $e_0^a = u^a$ (his 4-velocity) and $e_i^a$ ($i = 1, 2, 3$) are spacelike vectors satisfying
$$u_a e_i^a = 0, \qquad g_{ab}e_i^a e_j^b = \delta_{ij} \tag{7.50}$$
In Minkowski spacetime, this basis can be extended to the observer's entire worldline by taking the basis vectors to have constant components (in an inertial frame), i.e., they do not depend on proper time $\tau$. In particular, this implies that the orthonormal basis is non-rotating. Since the basis vectors have constant components, they are parallelly transported along the worldline. Hence, in curved spacetime, the analogue of this is to take the basis vectors to be parallelly transported along the worldline. For $u^a$, this is automatic (the worldline is a geodesic). But for $e_i$ it gives
$$u^b\nabla_b e_i^a = 0 \tag{7.51}$$
As we discussed previously, if the $e_i^a$ are specified at any point $p$ then this equation determines them uniquely along the whole worldline. Furthermore, the basis remains orthonormal because parallel transport preserves inner products (examples sheet 2).
The basis just constructed is sometimes called a parallelly transported frame. It is the kind of basis that would be constructed by an observer freely falling with 3 gyroscopes whose spin axes define the spatial basis vectors. Using such a basis we can be sure that an increase in a component of $S^a$ is really an increase in the distance to the particle in a particular direction, rather than a basis-dependent effect.
Now imagine this observer sets up a family of test particles near his worldline. The deviation vector to any infinitesimally nearby particle satisfies the geodesic deviation equation
$$u^b\nabla_b\left(u^c\nabla_c S_a\right) = R_{abcd}u^b u^c S^d \tag{7.52}$$
Contract with $e_\alpha^a$ and use the fact that the basis is parallelly transported to obtain
$$u^b\nabla_b\left[u^c\nabla_c\left(e_\alpha^a S_a\right)\right] = R_{abcd}e_\alpha^a u^b u^c S^d \tag{7.53}$$
Now $e_\alpha^a S_a$ is a scalar, hence the equation reduces to
$$\frac{d^2 S_\alpha}{d\tau^2} = R_{abcd}e_\alpha^a u^b u^c e_\beta^d S^\beta \tag{7.54}$$
where $\tau$ is the observer's proper time and $S_\alpha = e_\alpha^a S_a$ is one of the components of $S_a$ in the parallelly transported frame. On the RHS we've used $S^d = e_\beta^d S^\beta$.
So far, the discussion has been general, but now let's specialize to our gravitational plane wave. On the RHS, $R_{abcd}$ is a first order quantity, so we only need to evaluate the other quantities to leading order, i.e., we can evaluate them as if spacetime were flat. Assume that the observer is at rest in the almost inertial coordinates. To leading order, $u^\mu = (1, 0, 0, 0)$, hence
$$\frac{d^2 S_\alpha}{d\tau^2} \approx R_{\mu 00\nu}\,e_\alpha^\mu e_\beta^\nu S^\beta \tag{7.55}$$
Using the formula for the perturbed Riemann tensor (7.5) and $h_{0\mu} = 0$, we obtain
$$\frac{d^2 S_\alpha}{d\tau^2} \approx \frac{1}{2}\frac{\partial^2 h_{\mu\nu}}{\partial t^2}e_\alpha^\mu e_\beta^\nu S^\beta \tag{7.56}$$
In Minkowski spacetime we could take $e_i^a$ aligned with the $x, y, z$ axes respectively, i.e., $e_1^\mu = (0, 1, 0, 0)$, $e_2^\mu = (0, 0, 1, 0)$ and $e_3^\mu = (0, 0, 0, 1)$. We can use the same results here because we only need to evaluate $e_\alpha^\mu$ to leading order. Using $h_{0\mu} = h_{3\mu} = 0$, we then have
$$\frac{d^2 S_0}{d\tau^2} = \frac{d^2 S_3}{d\tau^2} = 0 \tag{7.57}$$
to this order of approximation. Hence the observer sees no relative acceleration of the test particles in the $z$-direction, i.e., the direction of propagation of the wave. Let the observer set up initial conditions so that $S_0$ and its first derivative vanish at $\tau = 0$. Then $S_0$ will vanish for all time. If the derivative of $S_3$ vanishes initially then $S_3$ will be constant. The same is not true for the other components.

We can choose our almost inertial coordinates so that the observer has coordinates $x^\mu \approx (\tau, 0, 0, 0)$ (i.e. $t = \tau$ to leading order along the observer's worldline). For a + polarized wave we then have
$$\frac{d^2 S_1}{d\tau^2} = -\frac{1}{2}\omega^2|H_+|\cos(\omega\tau - \alpha)S_1, \qquad \frac{d^2 S_2}{d\tau^2} = \frac{1}{2}\omega^2|H_+|\cos(\omega\tau - \alpha)S_2 \tag{7.58}$$
where we have replaced $t$ by $\tau$ in $\partial^2 h_{\mu\nu}/\partial t^2$ and $\alpha = \arg H_+$. Since $H_+$ is small we can solve this perturbatively: the leading order solution is $S_1 = \bar{S}_1$, a constant (assuming that we set up initial conditions so that the particles are at rest to leading order). Similarly $S_2 = \bar{S}_2$. Now we can plug these leading order solutions into the RHS of the above equations and integrate to determine the solution up to first order (again choosing constants of integration so that the particles would be at rest in the absence of the wave)
$$S_1(\tau) \approx \bar{S}_1\left(1 + \frac{1}{2}|H_+|\cos(\omega\tau - \alpha)\right), \qquad S_2(\tau) \approx \bar{S}_2\left(1 - \frac{1}{2}|H_+|\cos(\omega\tau - \alpha)\right) \tag{7.59}$$
This reveals that particles are displaced outwards in the $x$-direction whilst being displaced inwards in the $y$-direction, and vice versa. $\bar{S}_1$ and $\bar{S}_2$ give the average displacement. If the particles are arranged in the $xy$ plane with $\bar{S}_1^2 + \bar{S}_2^2 = R^2$, then they form a circle of radius $R$ when $\omega\tau = \alpha + \pi/2$. This will be deformed into an ellipse, then back to a circle, then an ellipse again (Fig. 15).

Figure 15. Geodesic deviation caused by + polarized wave (panels at $\omega\tau = \alpha + \frac{1}{2}\pi$, $\alpha + \pi$, $\alpha + \frac{3}{2}\pi$, $\alpha + 2\pi$).
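As a numerical cross-check of the perturbative solution (7.59), one can integrate the first equation of (7.58) directly. The Python sketch below is our own illustration: it uses a hand-written classical Runge-Kutta (RK4) step, hypothetical parameter values ($|H_+| = 10^{-3}$, $\omega = 2\pi$, $\alpha = 0.3$, $\bar{S}_1 = 1$), and initial data read off from (7.59) at $\tau = 0$. The exact and first-order solutions should differ only at order $|H_+|^2$, while the displacement itself is of order $|H_+|$.

```python
import math

def rk4_deviation(h_amp, omega, alpha, s_bar, tau_end, steps):
    """Integrate d^2 S1/dtau^2 = -(1/2) omega^2 h_amp cos(omega tau - alpha) S1, eq. (7.58), with RK4.

    Initial data are taken from the first-order solution (7.59) at tau = 0. Returns the largest
    |numerical - perturbative| and the largest |S1 - s_bar| seen on the grid.
    """
    def accel(tau, s):
        return -0.5 * omega**2 * h_amp * math.cos(omega * tau - alpha) * s

    def perturbative(tau):
        return s_bar * (1.0 + 0.5 * h_amp * math.cos(omega * tau - alpha))

    dt = tau_end / steps
    tau, s = 0.0, perturbative(0.0)
    v = -0.5 * s_bar * h_amp * omega * math.sin(-alpha)   # derivative of (7.59) at tau = 0
    max_diff = max_dev = 0.0
    for _ in range(steps):
        k1s, k1v = v, accel(tau, s)
        k2s, k2v = v + 0.5 * dt * k1v, accel(tau + 0.5 * dt, s + 0.5 * dt * k1s)
        k3s, k3v = v + 0.5 * dt * k2v, accel(tau + 0.5 * dt, s + 0.5 * dt * k2s)
        k4s, k4v = v + dt * k3v, accel(tau + dt, s + dt * k3s)
        s += dt * (k1s + 2.0 * k2s + 2.0 * k3s + k4s) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        tau += dt
        max_diff = max(max_diff, abs(s - perturbative(tau)))
        max_dev = max(max_dev, abs(s - s_bar))
    return max_diff, max_dev

diff, dev = rk4_deviation(h_amp=1e-3, omega=2.0 * math.pi, alpha=0.3, s_bar=1.0, tau_end=2.0, steps=4000)
print(f"max |S1 - S1_bar| = {dev:.2e}, max |RK4 - (7.59)| = {diff:.2e}")
```

The second-order discrepancy grows slowly with $\tau$ (the $\cos^2$ forcing has a non-zero average), which is one reason (7.59) is only trusted over a limited number of wave periods in this simple form.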

Exercise. Show that the corresponding result for a × polarized wave is the same, just rotated through 45° (Fig. 16).

Figure 16. Geodesic deviation caused by × polarized wave (panels at $\omega\tau = \alpha + \frac{1}{2}\pi$, $\alpha + \pi$, $\alpha + \frac{3}{2}\pi$, $\alpha + 2\pi$).

Gravitational wave detectors look for the changes in position of test masses caused by the above effect. For example, the two LIGO observatories (in the US; see Fig. 17) each have two perpendicular tunnels, each 4 km long. There are test masses (analogous to the particles above) at the end of each arm (tunnel) and where the arms meet. A beam splitter is attached to the test mass where the arms meet. A laser signal is split and sent down each arm, where it reflects off mirrors attached to the test masses at the ends of the arms. The signals are recombined and interferometry is used to detect whether there has been any change in the length difference of the two arms. The effect that is being looked for is tiny: plausible sources of gravitational waves give $H_+, H_\times \sim 10^{-21}$ (see below), so the relative length change of each arm is $\delta L/L \sim 10^{-21}$. The resulting $\delta L$ is a tiny fraction of the wavelength of the laser light, but the resulting tiny phase difference between the two laser signals is detectable!

Figure 17. The LIGO Hanford observatory in Washington state, USA. There is another LIGO observatory in Louisiana. (Image credit: LIGO.)
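To get a feel for the numbers quoted above, the following arithmetic uses the 4 km arm length and $\delta L/L \sim 10^{-21}$ from the text, together with an assumed laser wavelength of 1064 nm (the Nd:YAG wavelength used by LIGO; this value is not given in the notes).

```python
ARM_LENGTH_M = 4.0e3            # LIGO arm length in metres (from the text)
STRAIN = 1.0e-21                # delta L / L, order of magnitude from the text
LASER_WAVELENGTH_M = 1.064e-6   # assumed laser wavelength in metres (not from the notes)

delta_L = STRAIN * ARM_LENGTH_M
fraction = delta_L / LASER_WAVELENGTH_M
print(f"delta L ~ {delta_L:.1e} m, i.e. about {fraction:.1e} of a laser wavelength")
```

The length change is of order $10^{-18}$ m, a few parts in $10^{12}$ of one wavelength, which is why the phase measurement must be so sensitive.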
Gravitational wave detectors have been operating for several decades, gradually improving in sensitivity. The first direct detection of gravitational waves was made by the LIGO collaboration in September 2015. As we will explain below, there is also very good indirect evidence for the existence of gravitational waves. We will discuss all of this in more detail later.

7.4 The field far from a source

Let's return to the linearized Einstein equation with matter:
$$\partial^\rho\partial_\rho\bar{h}_{\mu\nu} = -16\pi T_{\mu\nu} \tag{7.60}$$
Since each component of $\bar{h}_{\mu\nu}$ satisfies the inhomogeneous wave equation, the equation can be solved using the same retarded Green function that one uses in electromagnetism:
$$\bar{h}_{\mu\nu}(t, \mathbf{x}) = 4\int d^3x'\,\frac{T_{\mu\nu}(t - |\mathbf{x} - \mathbf{x}'|, \mathbf{x}')}{|\mathbf{x} - \mathbf{x}'|} \tag{7.61}$$
where $|\mathbf{x} - \mathbf{x}'|$ is calculated using the Euclidean metric.
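Assume that the matter is confined to a compact region near the origin of size $d$ (e.g. let $d$ be the radius of the smallest sphere that encloses the matter). Then, far from the source we have $r \equiv |\mathbf{x}| \gg |\mathbf{x}'| \sim d$, so we can expand

$$|\mathbf{x}-\mathbf{x}'|^2 = r^2 - 2\mathbf{x}\cdot\mathbf{x}' + \mathbf{x}'^2 = r^2\left(1 - \frac{2}{r}\hat{\mathbf{x}}\cdot\mathbf{x}' + O(d^2/r^2)\right) \tag{7.62}$$

(where $\hat{\mathbf{x}} = \mathbf{x}/r$), hence

$$|\mathbf{x}-\mathbf{x}'| = r - \hat{\mathbf{x}}\cdot\mathbf{x}' + O(d^2/r) \tag{7.63}$$

$$T_{\mu\nu}(t-|\mathbf{x}-\mathbf{x}'|,\mathbf{x}') = T_{\mu\nu}(t',\mathbf{x}') + \hat{\mathbf{x}}\cdot\mathbf{x}'\,(\partial_0 T_{\mu\nu})(t',\mathbf{x}') + \dots \tag{7.64}$$

where

$$t' = t - r \tag{7.65}$$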
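The expansion (7.63) is easy to check numerically. The Python sketch below is only an illustration: the source point $\mathbf{x}'$ and the direction $\hat{\mathbf{x}}$ are arbitrary choices. It moves the field point outward and prints the error in $|\mathbf{x}-\mathbf{x}'| \approx r - \hat{\mathbf{x}}\cdot\mathbf{x}'$, which should fall off like $d^2/r$, so that error $\times\, r$ settles to a constant of order $d^2$.

```python
import math


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


source_point = (0.3, -0.5, 0.4)          # x' (arbitrary), |x'| ~ d ~ 0.7
direction = (1.0, 2.0, 2.0)              # direction of the field point x (arbitrary)
xhat = tuple(c / norm(direction) for c in direction)

errors = []
for r in (10.0, 100.0, 1000.0):
    x = tuple(r * c for c in xhat)
    exact = norm(tuple(p - q for p, q in zip(x, source_point)))
    approx = r - dot(xhat, source_point)          # (7.63) without the O(d^2/r) term
    errors.append(abs(exact - approx))
    # error * r should settle to a constant of order d^2
    print(f"r = {r:7.1f}   error = {errors[-1]:.3e}   error * r = {errors[-1] * r:.4f}")
```

For this choice of $\mathbf{x}'$ the limiting constant is $\left(|\mathbf{x}'|^2 - (\hat{\mathbf{x}}\cdot\mathbf{x}')^2\right)/2 \approx 0.249$, which is the next term in the expansion.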
Now let $\tau$ denote the time scale on which $T_{\mu\nu}$ is varying, so $\partial_0 T_{\mu\nu} \sim T_{\mu\nu}/\tau$. For example, if the source is a binary star system, then $\tau$ is the orbital period. The second term in (7.64) is of order $(d/\tau)T_{\mu\nu}$. Note that $d$ is the time it takes light to cross the region containing the matter. Hence $d/\tau \ll 1$ will be satisfied if the matter is moving non-relativistically. We assume this henceforth. This implies that the second term in (7.64) is negligible compared to the first and so

$$\bar h_{ij}(t,\mathbf{x}) \approx \frac{4}{r}\int d^3x'\, T_{ij}(t',\mathbf{x}'), \qquad t' = t - r \tag{7.66}$$
Here we are considering just the spatial components of $\bar h_{\mu\nu}$, i.e., $\bar h_{ij}$. Other components can be obtained from the gauge condition (7.16), which gives

$$\partial_0 \bar h_{0i} = \partial_j \bar h_{ji}, \qquad \partial_0 \bar h_{00} = \partial_i \bar h_{0i} \tag{7.67}$$


Given h̄ij , the first equation can be integrated to determine h̄0i and the second can
then be integrated to determine h̄00 .
The integral in (7.66) can be evaluated as follows. Since the matter is compactly supported, we can freely integrate by parts and discard surface terms (note also that $t'$ does not depend on $\mathbf{x}'$). We can also use energy-momentum conservation, which to this order is just $\partial_\nu T^{\mu\nu} = 0$. Let’s drop the primes on the coordinates in the integral for now.
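$$\begin{aligned}
\int d^3x\, T^{ij}(t,\mathbf{x}) &= \int d^3x \left[\partial_k\left(T^{ik}x^j\right) - \left(\partial_k T^{ik}\right)x^j\right] \\
&= -\int d^3x\, \left(\partial_k T^{ik}\right)x^j && \text{drop surface term} \\
&= \int d^3x\, \left(\partial_0 T^{i0}\right)x^j && \text{conservation law} \\
&= \partial_0 \int d^3x\, T^{0i}x^j
\end{aligned} \tag{7.68}$$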

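We can now symmetrize this equation on $ij$ to get

$$\begin{aligned}
\int d^3x\, T^{ij}(t,\mathbf{x}) &= \partial_0 \int d^3x\, T^{0(i}x^{j)} \\
&= \partial_0 \int d^3x \left[\frac12 \partial_k\left(T^{0k}x^i x^j\right) - \frac12\left(\partial_k T^{0k}\right)x^i x^j\right] \\
&= -\frac12 \partial_0 \int d^3x\, \left(\partial_k T^{0k}\right)x^i x^j && \text{integration by parts} \\
&= \frac12 \partial_0 \int d^3x\, \left(\partial_0 T^{00}\right)x^i x^j && \text{conservation law} \\
&= \frac12 \partial_0 \partial_0 \int d^3x\, T^{00}x^i x^j \\
&= \frac12 \ddot I_{ij}(t)
\end{aligned} \tag{7.69}$$

where

$$I_{ij}(t) = \int d^3x\, T_{00}(t,\mathbf{x})\, x_i x_j \tag{7.70}$$

(Note that $T_{00} = T^{00}$ and $T_{ij} = T^{ij}$ to leading order.) Hence we have

$$\bar h_{ij}(t,\mathbf{x}) \approx \frac{2}{r}\ddot I_{ij}(t-r) \tag{7.71}$$

This result is valid when $r \gg d$ and $\tau \gg d$.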
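As a concrete illustration of (7.70) and (7.71), the sketch below evaluates $\bar h_{ij}$ for a hypothetical source: two point masses $m_1, m_2$ on a circular Newtonian orbit of separation $a$ in the $x$-$y$ plane, in units $G = c = 1$, with $T_{00}$ modelled as a sum of delta functions so that $I_{ij} = \sum m\, x_i x_j$. The orbit model, the parameter values and the finite-difference step are assumptions made for the example. $\ddot I_{ij}$ is estimated by a central finite difference and compared with the analytic value $\ddot I_{xx} = -2\mu a^2\Omega^2\cos 2\Omega t$, where $\mu = m_1 m_2/(m_1+m_2)$.

```python
import math


def second_moment(t, m1, m2, a, omega):
    """I_ij(t) = sum over particles of m x_i x_j, i.e. (7.70) with T_00 modelled as point masses.

    Hypothetical source: two masses on a circular orbit of separation a and angular
    frequency omega in the x-y plane, centre of mass at the origin (G = c = 1).
    """
    M = m1 + m2
    n = (math.cos(omega * t), math.sin(omega * t), 0.0)
    x1 = [m2 / M * a * c for c in n]
    x2 = [-m1 / M * a * c for c in n]
    return [[m1 * x1[i] * x1[j] + m2 * x2[i] * x2[j] for j in range(3)] for i in range(3)]


def second_derivative(f, t, h):
    """Central finite-difference estimate of d^2 f/dt^2 for a 3x3 matrix-valued f."""
    fp, f0, fm = f(t + h), f(t), f(t - h)
    return [[(fp[i][j] - 2.0 * f0[i][j] + fm[i][j]) / h**2 for j in range(3)] for i in range(3)]


def hbar_spatial(t, r, m1, m2, a, omega, h=1e-2):
    """Far-field hbar_ij(t, x) ~ (2/r) Iddot_ij(t - r), equation (7.71)."""
    Iddot = second_derivative(lambda s: second_moment(s, m1, m2, a, omega), t - r, h)
    return [[2.0 / r * Iddot[i][j] for j in range(3)] for i in range(3)]


m1 = m2 = 1.0
a = 10.0
omega = math.sqrt((m1 + m2) / a**3)   # Kepler's third law (assumption: Newtonian orbit)
r, t = 1.0e4, 0.0                      # r >> tau = 2 pi / omega ~ 140 >> d ~ a / 2
hb = hbar_spatial(t, r, m1, m2, a, omega)

mu = m1 * m2 / (m1 + m2)
analytic_xx = -(2.0 / r) * 2.0 * mu * a**2 * omega**2 * math.cos(2.0 * omega * (t - r))
print(hb[0][0], analytic_xx)
```

The trace $\bar h_{xx} + \bar h_{yy}$ vanishes here because $I_{kk} = \mu a^2$ is constant on a circular orbit.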
$I_{ij}$ is the second moment of the energy density. It is a tensor in the Cartesian sense, i.e., it transforms in the usual way under rotations of the coordinates $x^i$. (The zeroth moment is the total energy in matter, $\int d^3x\, T_{00}$; the first moment is the energy dipole, $\int d^3x\, T_{00}x^i$.)
The result (7.71) describes a disturbance propagating outwards from the source at
the speed of light. If the source exhibits oscillatory motion (e.g. a binary star system)
then h̄ij will describe waves with the same period τ as the energy-momentum tensor of
the source.
The first equation in (7.67) gives

$$\partial_0 \bar h_{0i} \approx \partial_j\left(\frac{2}{r}\ddot I_{ij}(t-r)\right) \tag{7.72}$$

so integrating with respect to time gives (using $\partial_i r = x_i/r \equiv \hat x_i$)

$$\begin{aligned}
\bar h_{0i} &\approx \partial_j\left(\frac{2}{r}\dot I_{ij}(t-r)\right) + k_i(\mathbf{x}) = -\frac{2\hat x_j}{r^2}\dot I_{ij}(t-r) - \frac{2\hat x_j}{r}\ddot I_{ij}(t-r) + k_i(\mathbf{x}) \\
&\approx -\frac{2\hat x_j}{r}\ddot I_{ij}(t-r) + k_i(\mathbf{x})
\end{aligned} \tag{7.73}$$

where $k_i(\mathbf{x})$ is an arbitrary function. In the final line we have assumed that we are in the radiation zone $r \gg \tau$. This allows us to neglect the term proportional to $\dot I$ because it is smaller than the term we have retained by a factor $\tau/r$. In the radiation zone, space and time derivatives are of the same order of magnitude. (Note that, even for a non-relativistic source, the Newtonian approximation breaks down in the radiation zone because (7.22) is violated.)
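Similarly, integrating the second equation in (7.67) gives

$$\bar h_{00} \approx \partial_i\left(-\frac{2\hat x_j}{r}\dot I_{ij}(t-r)\right) + t\,\partial_i k_i + f(\mathbf{x}) \approx \frac{2\hat x_i \hat x_j}{r}\ddot I_{ij}(t-r) + t\,\partial_i k_i + f(\mathbf{x}) \tag{7.74}$$

for some arbitrary function $f(\mathbf{x})$. To determine the functions $f$ and $k_i$ we return to (7.61), which to leading order gives

$$\bar h_{00} \approx \frac{4E}{r} \tag{7.75}$$

where $E$ is the total energy of the matter

$$E = \int d^3x'\, T_{00}(t',\mathbf{x}') \tag{7.76}$$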

Note that energy-momentum conservation gives

$$\partial_0 E = \int d^3x'\, (\partial_0 T_{00})(t',\mathbf{x}') = \int d^3x'\, (\partial_i T_{i0})(t',\mathbf{x}') = 0 \tag{7.77}$$

and hence $E$ is a constant. We can now read off the time-independent term in $\bar h_{00}$: $f(\mathbf{x}) = 4E/r$. Why have we not rediscovered the $\ddot I_{ij}$ term in (7.74) from (7.61)? This term is smaller than $f$ by a factor of $d^2/\tau^2$ (as $I_{ij} \sim Ed^2$: convince yourself why!) so we’d have to go to higher order in the expansion of (7.61) to find this term (and also the term linear in $t$, which appears at higher order).
Similarly, (7.61) gives

$$\bar h_{0i} \approx -\frac{4P_i}{r} \tag{7.78}$$

where $P_i$ is the total 3-momentum of matter

$$P_i = -\int d^3x'\, T_{0i}(t',\mathbf{x}') \tag{7.79}$$

This is also constant by energy-momentum conservation. Comparing with the above we can read off $k_i(\mathbf{x}) = -4P_i/r$, so we’ve now determined all components of $\bar h_{\mu\nu}$.
Remark. We will show below that gravitational waves carry away energy (they also
carry away momentum). So why is the total energy of matter constant? In fact, the
total energy of matter is not constant but one has to go beyond linearized theory to
see this: one would have to correct the equation for energy-momentum conservation
to take account of the perturbation to the connection. But then we would have to
correct the LHS of the linearized Einstein equation, including second order terms for consistency with the new conservation law obeyed by the RHS. So seeing this effect requires going beyond linearized theory.
A final simplification is possible: we are free to choose our almost inertial coordinates to correspond to the "centre of momentum frame", i.e., $P_i = 0$. If we do this then $E$ is just the total mass of the matter, which we shall denote $M$. Putting everything together we have

$$\bar h_{00}(t,\mathbf{x}) \approx \frac{4M}{r} + \frac{2\hat x_i \hat x_j}{r}\ddot I_{ij}(t-r), \qquad \bar h_{0i}(t,\mathbf{x}) \approx -\frac{2\hat x_j}{r}\ddot I_{ij}(t-r) \tag{7.80}$$

To recap, we have assumed $r \gg \tau \gg d$ and work in the centre of momentum frame. In $\bar h_{00}$ we have retained the second term, even though it is subleading relative to the first, because this is the leading order time-dependent term.

7.5 The energy in gravitational waves

We see that gravitational waves arise when $I_{ij}$ varies in time. Gravitational waves carry energy away from the source. Calculating this is subtle: as discussed previously, there is no local energy density for the gravitational field. To explain the calculation we must go to second order in perturbation theory. At second order, our metric $\eta_{\mu\nu} + h_{\mu\nu}$ will not satisfy the Einstein equation, so we have to add a second order correction, writing

$$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} + h^{(2)}_{\mu\nu} \tag{7.81}$$

The idea is that if the components of $h_{\mu\nu}$ are $O(\epsilon)$ then the components of $h^{(2)}_{\mu\nu}$ are $O(\epsilon^2)$.
Now we calculate the Einstein tensor to second order. The first order term is what we calculated before (equation (7.8)). We shall call this $G^{(1)}_{\mu\nu}[h]$. The second order terms are either linear in $h^{(2)}$ or quadratic in $h$. The terms linear in $h^{(2)}$ can be calculated by setting $h$ to zero. This is exactly the same calculation we did before but with $h$ replaced by $h^{(2)}$. Hence the result will be (7.8) with $h \to h^{(2)}$, which we denote $G^{(1)}_{\mu\nu}[h^{(2)}]$. Therefore to second order we have

$$G_{\mu\nu}[g] = G^{(1)}_{\mu\nu}[h] + G^{(1)}_{\mu\nu}[h^{(2)}] + G^{(2)}_{\mu\nu}[h] \tag{7.82}$$

where $G^{(2)}_{\mu\nu}[h]$ is the part of $G_{\mu\nu}$ that is quadratic in $h$. This is:

$$G^{(2)}_{\mu\nu}[h] = R^{(2)}_{\mu\nu}[h] - \frac12 R^{(1)}[h]\,h_{\mu\nu} - \frac12 R^{(2)}[h]\,\eta_{\mu\nu} \tag{7.83}$$

where $R^{(2)}_{\mu\nu}[h]$ is the term in the Ricci tensor that is quadratic in $h$. $R^{(1)}$ and $R^{(2)}$ are the terms in the Ricci scalar which are linear and quadratic in $h$ respectively. The latter can be written

$$R^{(2)}[h] = \eta^{\mu\nu}R^{(2)}_{\mu\nu}[h] - h^{\mu\nu}R^{(1)}_{\mu\nu}[h] \tag{7.84}$$
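Exercise (examples sheet 3). Show that

$$\begin{aligned}
R^{(2)}_{\mu\nu}[h] ={}& \frac12 h^{\rho\sigma}\partial_\mu\partial_\nu h_{\rho\sigma} - h^{\rho\sigma}\partial_\rho\partial_{(\mu}h_{\nu)\sigma} + \frac14 \partial_\mu h_{\rho\sigma}\,\partial_\nu h^{\rho\sigma} + \partial^\sigma h^\rho{}_\nu\,\partial_{[\sigma}h_{\rho]\mu} \\
&+ \frac12 \partial_\sigma\left(h^{\sigma\rho}\partial_\rho h_{\mu\nu}\right) - \frac14 \partial^\rho h\,\partial_\rho h_{\mu\nu} - \left(\partial_\sigma h^{\rho\sigma} - \frac12 \partial^\rho h\right)\partial_{(\mu}h_{\nu)\rho}
\end{aligned} \tag{7.85}$$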
For simplicity, assume that no matter is present. At first order, the linearized Einstein equation is $G^{(1)}_{\mu\nu}[h] = 0$ as before. At second order we have

$$G^{(1)}_{\mu\nu}[h^{(2)}] = 8\pi t_{\mu\nu}[h] \tag{7.86}$$

where

$$t_{\mu\nu}[h] \equiv -\frac{1}{8\pi}G^{(2)}_{\mu\nu}[h] \tag{7.87}$$

Equation (7.86) is the equation of motion for $h^{(2)}$. If $h$ satisfies the linear Einstein equation then we have $R^{(1)}_{\mu\nu}[h] = 0$, so the above results give

$$t_{\mu\nu}[h] = -\frac{1}{8\pi}\left(R^{(2)}_{\mu\nu}[h] - \frac12 \eta^{\rho\sigma}R^{(2)}_{\rho\sigma}[h]\,\eta_{\mu\nu}\right) \tag{7.88}$$
8π 2


Consider now the contracted Bianchi identity $g^{\mu\rho}\nabla_\rho G_{\mu\nu} = 0$. Expanding this, at first order we get

$$\partial^\mu G^{(1)}_{\mu\nu}[h] = 0 \tag{7.89}$$

for arbitrary first order perturbation $h$ (i.e. not assuming that $h$ satisfies any equation of motion). At second order we get

$$\partial^\mu\left(G^{(1)}_{\mu\nu}[h^{(2)}] + G^{(2)}_{\mu\nu}[h]\right) + hG^{(1)}[h] = 0 \tag{7.90}$$

where the final term denotes schematically the terms that arise from the first order change in the inverse metric and the Christoffel symbols in $g^{\mu\rho}\nabla_\rho$. Now, since (7.89) holds for arbitrary $h$, it holds if we replace $h$ with $h^{(2)}$, so $\partial^\mu G^{(1)}_{\mu\nu}[h^{(2)}] = 0$. If we now assume that $h$ satisfies its equation of motion $G^{(1)}[h] = 0$ then the final term in (7.90) vanishes and this equation reduces to

$$\partial^\mu t_{\mu\nu} = 0. \tag{7.91}$$

Hence tµν is a symmetric tensor (in the sense of special relativity) that is (i) quadratic
in the linear perturbation h, (ii) conserved if h satisfies its equation of motion, and (iii)
appears on the RHS of the second order Einstein equation (7.86). This is a natural
candidate for the energy momentum tensor of the linearized gravitational field.
Unfortunately, tµν suffers from a major problem: it is not invariant under a gauge
transformation (7.14). This is how the impossibility of localizing gravitational energy
arises in linearized theory.
Nevertheless, it can be shown that the integral of $t_{00}$ over a surface of constant time $t = x^0$ is gauge invariant provided one considers $h_{\mu\nu}$ that decays at infinity, and restricts to gauge transformations which preserve this property. This integral provides a satisfactory notion of the total energy in the linearized gravitational field. Hence gravitational energy does exist, but it cannot be localized.

One can use the second order Einstein equation (7.86) to convert the integral defining the energy, which is quadratic in $h_{\mu\nu}$, into a surface integral at infinity which is linear in $h^{(2)}_{\mu\nu}$. In fact the latter can be made fully nonlinear: these surface integrals make sense in any spacetime which is "asymptotically flat", irrespective of whether or not the linearized approximation holds in the interior. This notion of energy is referred to as the ADM energy. This is constant in time, but there is a related quantity called the Bondi energy, a non-increasing function of time. The rate of change of this can be interpreted as the rate of energy loss in gravitational waves.
We shall follow a less rigorous, but more intuitive, approach in which we convert $t_{\mu\nu}$ into a gauge-invariant quantity by averaging. Let $\mathcal{R}$ be an open subset of $\mathbb{R}^4$ containing the origin, with typical coordinate size $a$ in all directions (e.g. consider a sphere of radius $a$). Let $W(y)$ be a smooth function on $\mathbb{R}^4$ such that $W = 0$ outside $\mathcal{R}$, $W > 0$ inside $\mathcal{R}$ and $\int_{\mathcal{R}} W(y)\,d^4y = 1$. We define the average of a tensor $X_{\mu\nu}$ at a point with "almost inertial" coordinates $x^\mu$ by

$$\langle X_{\mu\nu}\rangle(x) = \int_{\mathcal{R}} W(x-y)X_{\mu\nu}(y)\,d^4y \tag{7.92}$$

Note that it makes sense to integrate $X_{\mu\nu}$ because we are treating it as a tensor in Minkowski spacetime, so we can add tensors at different points.
We are interested in averaging in the region far from the source, in which we have gravitational radiation with some typical wavelength $\lambda$ (in the notation used above, $\lambda \sim \tau$). Assume that the components of $X_{\mu\nu}$ have typical size $X$. Since the wavelength of the radiation is $\lambda$, $\partial_\mu X_{\nu\rho}$ will have components of typical size $X/\lambda$. But the average is

$$\langle \partial_\mu X_{\nu\rho}\rangle(x) = \int_{\mathcal{R}} (\partial_\mu W)(x-y)X_{\nu\rho}(y)\,d^4y \tag{7.93}$$

where we have integrated by parts and used $W = 0$ on $\partial\mathcal{R}$. Now $\partial_\mu W$ has components of order $W/a$, so the RHS above has components of order $X/a$. Hence if we choose $a \gg \lambda$ then averaging has the effect of reducing $\partial_\mu X_{\nu\rho}$ by a factor of $\lambda/a \ll 1$, so we can neglect total derivatives inside averages. This implies that we are free to integrate by parts inside averages:

$$\langle A\,\partial B\rangle = \langle \partial(AB)\rangle - \langle (\partial A)B\rangle \approx -\langle (\partial A)B\rangle \tag{7.94}$$

because $\langle \partial(AB)\rangle$ is a factor $\lambda/a$ smaller than $\langle (\partial A)B\rangle$. Henceforth we assume $a \gg \lambda$ so we can exploit these properties.
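The suppression of total derivatives can be seen in a one-dimensional toy version of (7.92). The sketch below is an assumption-laden stand-in: it averages over a single coordinate, and instead of a smooth $W$ it uses a raised-cosine weight $W(u) = (1+\cos(\pi u/a))/(2a)$ on $|u| < a$. This weight has unit integral, and $W$ and $W'$ vanish at $|u| = a$, which is enough for one integration by parts even though it is not $C^\infty$. For $X(y) = \sin(2\pi y/\lambda)$ it compares $\langle \partial X\rangle$ with the naive size $2\pi X/\lambda$ of $\partial X$, both by quadrature and from the closed-form average of $\cos(ky)$ against this weight.

```python
import math


def raised_cosine_weight(u, a):
    """W(u) = (1 + cos(pi u / a)) / (2a) on |u| < a, zero outside; unit integral."""
    if abs(u) >= a:
        return 0.0
    return (1.0 + math.cos(math.pi * u / a)) / (2.0 * a)


def average(f, x, a, n=20000):
    """1D toy version of (7.92): <f>(x) = integral of W(x - y) f(y) dy.

    Substituting u = x - y gives the integral of W(u) f(x - u) du, done by the midpoint rule.
    """
    du = 2.0 * a / n
    total = 0.0
    for step in range(n):
        u = -a + (step + 0.5) * du
        total += raised_cosine_weight(u, a) * f(x - u) * du
    return total


def closed_form_ratio(k, a):
    """Exact <d/dy sin(k y)>(0) / k for this weight (valid when k a != pi)."""
    ka = k * a
    return math.sin(ka) / ka * math.pi**2 / (math.pi**2 - ka**2)


wavelength = 1.0
k = 2.0 * math.pi / wavelength


def dX(y):
    """Derivative of X(y) = sin(k y); its typical size is k = 2 pi X / lambda."""
    return k * math.cos(k * y)


ratios = {}
for a in (0.3, 3.3, 33.3):
    ratios[a] = average(dX, 0.0, a) / k   # <dX> relative to the naive size k
    print(f"a/lambda = {a:5.1f}   quadrature = {ratios[a]: .3e}   exact = {closed_form_ratio(k, a): .3e}")
```

For this particular weight the suppression factor falls off like $(\lambda/a)^3$, times an oscillating factor; a $C^\infty$ weight, as assumed in the text, would suppress the average faster than any power of $\lambda/a$.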
Exercises (examples sheet 3).

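1. Use the linearized Einstein equation to show that, in vacuum,

$$\langle \eta^{\mu\nu}R^{(2)}_{\mu\nu}[h]\rangle = 0 \tag{7.95}$$

Hence the second term in $t_{\mu\nu}[h]$ averages to zero.

2. Show that

$$\langle t_{\mu\nu}\rangle = \frac{1}{32\pi}\left\langle \partial_\mu \bar h_{\rho\sigma}\,\partial_\nu \bar h^{\rho\sigma} - \frac12 \partial_\mu \bar h\,\partial_\nu \bar h - 2\partial_\sigma \bar h^{\rho\sigma}\,\partial_{(\mu}\bar h_{\nu)\rho}\right\rangle \tag{7.96}$$

3. Show that $\langle t_{\mu\nu}\rangle$ is gauge invariant.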


Hence we can obtain a gauge invariant energy momentum tensor as long as we average over a region much larger than the wavelength of the gravitational radiation we are studying. This might be a rather large region: the LIGO detector looks for waves with frequency around 100 Hz, corresponding to a wavelength $\lambda \sim 3000$ km.
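The quoted wavelength is just $\lambda = c/f$. Here is a one-line check, with $c$ in SI units:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_km(frequency_hz: float) -> float:
    """Return lambda = c / f in kilometres."""
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz / 1.0e3


lam = wavelength_km(100.0)
print(f"lambda ~ {lam:.0f} km")   # about 3000 km
```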
Exercise. Calculate $\langle t_{\mu\nu}\rangle$ for the plane gravitational wave solution discussed above. Show that

$$\langle t_{\mu\nu}\rangle = \frac{1}{32\pi}\left(|H_+|^2 + |H_\times|^2\right)k_\mu k_\nu = \frac{\omega^2}{32\pi}\left(|H_+|^2 + |H_\times|^2\right)\begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix} \tag{7.97}$$

As one would expect, there is a constant flux of energy and momentum travelling at the speed of light in the $z$-direction.
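The matrix form of (7.97) follows from $k^\mu = \omega(1,0,0,1)$ and signature $(-,+,+,+)$, which give $k_\mu = \omega(-1,0,0,1)$. The sketch below builds $\langle t_{\mu\nu}\rangle$ from that assumption, with arbitrary illustrative values of $\omega$, $H_+$ and $H_\times$, and reads off the energy density $\langle t_{00}\rangle$ and the energy flux 3-vector $-\langle t_{0i}\rangle$.

```python
import math


def plane_wave_stress(omega, H_plus, H_cross):
    """<t_{mu nu}> = (|H+|^2 + |Hx|^2) k_mu k_nu / (32 pi) for a wave moving in +z.

    Signature (-,+,+,+); k^mu = omega (1, 0, 0, 1), so k_mu = omega (-1, 0, 0, 1).
    """
    eta = [-1.0, 1.0, 1.0, 1.0]                      # diagonal Minkowski metric
    k_up = [omega, 0.0, 0.0, omega]
    k_down = [eta[m] * k_up[m] for m in range(4)]
    amp2 = abs(H_plus) ** 2 + abs(H_cross) ** 2      # amplitudes may be complex
    return [[amp2 * k_down[m] * k_down[n] / (32.0 * math.pi) for n in range(4)] for m in range(4)]


stress = plane_wave_stress(omega=2.0, H_plus=1e-21, H_cross=0.5e-21j)
energy_density = stress[0][0]
energy_flux = [-stress[0][i] for i in (1, 2, 3)]   # -<t_{0i}>
print(energy_density, energy_flux)
```

The flux in the $z$-direction equals the energy density, as expected for energy moving at the speed of light.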

7.6 The quadrupole formula

Now we are ready to calculate the energy loss from a compact source due to gravitational radiation. The averaged energy flux 3-vector is $-\langle t_{0i}\rangle$. Consider a large sphere $r = \text{constant}$ surrounding the source. The unit normal to the sphere (in a surface of constant $t$) is $\hat x_i$. Hence the average total energy flux across this sphere, i.e., the average power radiated across the sphere, is

$$\langle P\rangle = -\int r^2\, d\Omega\, \langle t_{0i}\rangle \hat x_i \tag{7.98}$$

where $d\Omega$ is the usual volume element on a unit $S^2$.

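Calculating this is just a matter of substituting the results of section 7.4 into (7.96). Since these results apply in harmonic gauge, we have

$$\begin{aligned}
\langle t_{0i}\rangle &= \frac{1}{32\pi}\left\langle \partial_0 \bar h_{\rho\sigma}\,\partial_i \bar h^{\rho\sigma} - \frac12 \partial_0 \bar h\,\partial_i \bar h\right\rangle \\
&= \frac{1}{32\pi}\left\langle \partial_0 \bar h_{jk}\,\partial_i \bar h_{jk} - 2\partial_0 \bar h_{0j}\,\partial_i \bar h_{0j} + \partial_0 \bar h_{00}\,\partial_i \bar h_{00} - \frac12 \partial_0 \bar h\,\partial_i \bar h\right\rangle
\end{aligned} \tag{7.99}$$

Since $\bar h_{jk}(t,\mathbf{x}) = (2/r)\ddot I_{jk}(t-r)$ we have

$$\partial_0 \bar h_{jk} = \frac{2}{r}\dddot I_{jk}(t-r) \tag{7.100}$$

and

$$\partial_i \bar h_{jk} = \left(-\frac{2}{r}\dddot I_{jk}(t-r) - \frac{2}{r^2}\ddot I_{jk}(t-r)\right)\hat x_i \tag{7.101}$$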
r r


The second term is smaller than the first by a factor $\tau/r \ll 1$ and so negligible for large enough $r$. Hence

$$-\frac{1}{32\pi}\int r^2\, d\Omega\, \langle \partial_0 \bar h_{jk}\,\partial_i \bar h_{jk}\rangle \hat x_i = \frac12 \langle \dddot I_{ij}\dddot I_{ij}\rangle_{t-r} \tag{7.102}$$

On the RHS, the average is a time average, taken over an interval $a \gg \lambda \sim \tau$ centred on the retarded time $t - r$.
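Next we have $\bar h_{0j} = (-2\hat x_k/r)\ddot I_{jk}(t-r)$, hence

$$\partial_0 \bar h_{0j} = -\frac{2\hat x_k}{r}\dddot I_{jk}(t-r), \qquad \partial_i \bar h_{0j} \approx \frac{2\hat x_k}{r}\dddot I_{jk}(t-r)\,\hat x_i \tag{7.103}$$

where in the second expression we have used $\tau/r \ll 1$ to neglect the terms arising from differentiation of $\hat x_k/r$. Hence

$$-\frac{1}{32\pi}\int r^2\, d\Omega\, \langle -2\partial_0 \bar h_{0j}\,\partial_i \bar h_{0j}\rangle \hat x_i = -\frac{1}{4\pi}\langle \dddot I_{jk}\dddot I_{jl}\rangle_{t-r}\int d\Omega\, \hat x_k \hat x_l \tag{7.104}$$

Now recall the following from Cartesian tensors: $\int d\Omega\, \hat x_k \hat x_l$ is isotropic (rotationally invariant) and hence must equal $\lambda\delta_{kl}$ for some constant $\lambda$. Taking the trace fixes $\lambda = 4\pi/3$. Hence the RHS above is

$$-\frac13 \langle \dddot I_{ij}\dddot I_{ij}\rangle_{t-r} \tag{7.105}$$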
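The isotropy argument for $\int d\Omega\, \hat x_k \hat x_l = (4\pi/3)\delta_{kl}$ can be checked by direct quadrature. The sketch below uses a simple midpoint rule in $(\theta,\phi)$ with $d\Omega = \sin\theta\, d\theta\, d\phi$; the grid sizes are arbitrary choices, and the agreement is only to about three decimal places.

```python
import math


def sphere_integral(f, n_theta=100, n_phi=200):
    """Integrate f(xhat) over the unit sphere by the midpoint rule in (theta, phi)."""
    dtheta, dphi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for p in range(n_theta):
        theta = (p + 0.5) * dtheta
        st, ct = math.sin(theta), math.cos(theta)
        for q in range(n_phi):
            phi = (q + 0.5) * dphi
            xhat = (st * math.cos(phi), st * math.sin(phi), ct)
            total += f(xhat) * st * dtheta * dphi      # dOmega = sin(theta) dtheta dphi
    return total


# Default arguments bind k and l at the time each lambda is created.
second_moment = [
    [sphere_integral(lambda x, k=k, l=l: x[k] * x[l]) for l in range(3)] for k in range(3)
]
for row in second_moment:
    print([round(v, 4) for v in row])
print("4 pi / 3 =", round(4.0 * math.pi / 3.0, 4))
```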
Next we use $\bar h_{00} = 4M/r + (2\hat x_j \hat x_k/r)\ddot I_{jk}(t-r)$ to give

$$\partial_0 \bar h_{00} = \frac{2\hat x_j \hat x_k}{r}\dddot I_{jk}(t-r) \tag{7.106}$$

and

$$\partial_i \bar h_{00} \approx \left(-\frac{4M}{r^2} - \frac{2\hat x_j \hat x_k}{r}\dddot I_{jk}(t-r)\right)\hat x_i \approx -\frac{2\hat x_j \hat x_k}{r}\dddot I_{jk}(t-r)\,\hat x_i \tag{7.107}$$

where we’ve neglected terms arising from differentiation of $\hat x_j \hat x_k/r$ in the first equality because in the radiation zone ($\tau/r \ll 1$) they’re negligible compared to the second term we’ve retained. In the second equality we’ve neglected the first term in brackets because this leads to a term in the integral proportional to $\langle \dddot I_{jk}\rangle$, which is the average of a derivative and therefore negligible (we can’t argue that the second term in brackets is large compared to the first: the second term is roughly $d^2 r/\tau^3$ times the first term, and this factor is not necessarily large when $r \gg \tau \gg d$). Hence we have

$$-\frac{1}{32\pi}\int r^2\, d\Omega\, \langle \partial_0 \bar h_{00}\,\partial_i \bar h_{00}\rangle \hat x_i = \frac{1}{8\pi}\langle \dddot I_{ij}\dddot I_{kl}\rangle_{t-r} X_{ijkl} \tag{7.108}$$

where

$$X_{ijkl} = \int d\Omega\, \hat x_i \hat x_j \hat x_k \hat x_l \tag{7.109}$$

is another isotropic integral which we’ll evaluate below.
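Next we use $\bar h = \bar h_{jj} - \bar h_{00}$ and the above results to obtain

$$\partial_0 \bar h = \frac{2}{r}\dddot I_{jj}(t-r) - \frac{2\hat x_j \hat x_k}{r}\dddot I_{jk}(t-r) \tag{7.110}$$

$$\partial_i \bar h = \left(-\frac{2}{r}\dddot I_{jj}(t-r) + \frac{2\hat x_j \hat x_k}{r}\dddot I_{jk}(t-r)\right)\hat x_i \tag{7.111}$$

and hence (using the result above for $\int d\Omega\, \hat x_i \hat x_j$)

$$-\frac{1}{32\pi}\int r^2\, d\Omega\, \left\langle -\frac12 \partial_0 \bar h\,\partial_i \bar h\right\rangle \hat x_i = \left\langle -\frac14 \dddot I_{jj}\dddot I_{kk} + \frac16 \dddot I_{jj}\dddot I_{kk} - \frac{1}{16\pi}\dddot I_{ij}\dddot I_{kl}X_{ijkl}\right\rangle \tag{7.112}$$

Putting everything together we have

$$\langle P\rangle_t = \left\langle \frac16 \dddot I_{ij}\dddot I_{ij} - \frac{1}{12}\dddot I_{ii}\dddot I_{jj} + \frac{1}{16\pi}\dddot I_{ij}\dddot I_{kl}X_{ijkl}\right\rangle_{t-r} \tag{7.113}$$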
To evaluate $X_{ijkl}$, we use the fact that any isotropic Cartesian tensor is a linear combination of products of $\delta$ and $\epsilon$ factors. In the present case, $X_{ijkl}$ has rank 4, so we can only use $\delta$ terms and we must have $X_{ijkl} = \alpha\delta_{ij}\delta_{kl} + \beta\delta_{ik}\delta_{jl} + \gamma\delta_{il}\delta_{jk}$ for some $\alpha, \beta, \gamma$. The symmetry of $X_{ijkl}$ implies that $\alpha = \beta = \gamma$. Taking the trace on $ij$ and on $kl$ indices then fixes $\alpha = 4\pi/15$.
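The value $\alpha = 4\pi/15$ can also be checked against a direct quadrature of (7.109) for a few index choices. This is a sketch using a simple midpoint rule in $(\theta,\phi)$, offered as a check of the trace argument rather than a derivation; the grid sizes and the sampled indices are arbitrary.

```python
import math


def sphere_integral(f, n_theta=100, n_phi=200):
    """Integrate f(xhat) over the unit sphere by the midpoint rule in (theta, phi)."""
    dtheta, dphi = math.pi / n_theta, 2.0 * math.pi / n_phi
    total = 0.0
    for p in range(n_theta):
        theta = (p + 0.5) * dtheta
        st, ct = math.sin(theta), math.cos(theta)
        for q in range(n_phi):
            phi = (q + 0.5) * dphi
            xhat = (st * math.cos(phi), st * math.sin(phi), ct)
            total += f(xhat) * st * dtheta * dphi      # dOmega = sin(theta) dtheta dphi
    return total


def delta(i, j):
    return 1.0 if i == j else 0.0


def X_predicted(i, j, k, l):
    """alpha (d_ij d_kl + d_ik d_jl + d_il d_jk) with alpha = 4 pi / 15."""
    alpha = 4.0 * math.pi / 15.0
    return alpha * (delta(i, j) * delta(k, l) + delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k))


samples = [(0, 0, 0, 0), (0, 0, 1, 1), (0, 1, 0, 1), (2, 2, 2, 2), (0, 0, 0, 1), (0, 1, 2, 2)]
results = {}
for idx in samples:
    numeric = sphere_integral(lambda x, idx=idx: x[idx[0]] * x[idx[1]] * x[idx[2]] * x[idx[3]])
    results[idx] = (numeric, X_predicted(*idx))
    print(idx, f"numeric = {numeric:.5f}", f"predicted = {X_predicted(*idx):.5f}")
```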
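The final term above is therefore

$$\frac{1}{60}\left\langle \dddot I_{ii}\dddot I_{jj} + 2\dddot I_{ij}\dddot I_{ij}\right\rangle \tag{7.114}$$

and hence

$$\langle P\rangle_t = \frac15 \left\langle \dddot I_{ij}\dddot I_{ij} - \frac13 \dddot I_{ii}\dddot I_{jj}\right\rangle_{t-r} \tag{7.115}$$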
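The step from (7.113) to (7.115) is pure bookkeeping, so it can be checked mechanically. The sketch below redoes the coefficient arithmetic with exact fractions, using $X_{ijkl} = \tfrac{4\pi}{15}(\delta_{ij}\delta_{kl} + \delta_{ik}\delta_{jl} + \delta_{il}\delta_{jk})$. It then tests the contraction on a random symmetric matrix standing in for $\dddot I_{ij}$; the matrix and the random seed are arbitrary.

```python
import itertools
import random
from fractions import Fraction

# Coefficients in (7.113), writing A = Iddd_ij Iddd_ij, B = Iddd_ii Iddd_jj and
# C = Iddd_ij Iddd_kl X_ijkl / pi, so that <P> = c1 A + c2 B + c3 C.
c1, c2, c3 = Fraction(1, 6), Fraction(-1, 12), Fraction(1, 16)

# X_ijkl / pi = (4/15)(d_ij d_kl + d_ik d_jl + d_il d_jk), hence C = (4/15)(B + 2A).
alpha_over_pi = Fraction(4, 15)
A_coeff = c1 + c3 * alpha_over_pi * 2
B_coeff = c2 + c3 * alpha_over_pi
print(A_coeff, B_coeff)   # 1/5 -1/15, i.e. <P> = (1/5)(A - B/3) as in (7.115)


def contract_with_X_over_pi(S):
    """S_ij S_kl X_ijkl / pi, using the delta-function form of X_ijkl."""
    total = 0.0
    for i, j, k, l in itertools.product(range(3), repeat=4):
        deltas = (i == j) * (k == l) + (i == k) * (j == l) + (i == l) * (j == k)
        total += S[i][j] * S[k][l] * deltas
    return float(alpha_over_pi) * total


random.seed(0)
S = [[0.0] * 3 for _ in range(3)]   # random symmetric stand-in for the third derivative of I_ij
for i in range(3):
    for j in range(i, 3):
        S[i][j] = S[j][i] = random.uniform(-1.0, 1.0)

A = sum(S[i][j] ** 2 for i in range(3) for j in range(3))
B = sum(S[i][i] for i in range(3)) ** 2
lhs = float(c1) * A + float(c2) * B + float(c3) * contract_with_X_over_pi(S)
rhs = (A - B / 3.0) / 5.0
print(lhs, rhs)
```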
Finally we define the energy quadrupole tensor, which is the traceless part of $I_{ij}$:

$$Q_{ij} = I_{ij} - \frac13 I_{kk}\delta_{ij} \tag{7.116}$$

We then have

$$\langle P\rangle_t = \frac15 \langle \dddot Q_{ij}\dddot Q_{ij}\rangle_{t-r} \tag{7.117}$$

This is the quadrupole formula for energy loss via gravitational wave emission. It is valid in the radiation zone far from a non-relativistic source, i.e., for $r \gg \tau \gg d$.
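To see (7.117) at work, the sketch below applies it to a hypothetical circular binary: point masses $m_1, m_2$ with separation $a$ and angular frequency fixed by Kepler's law $\Omega^2 = (m_1+m_2)/a^3$ ($G = c = 1$), with $T_{00}$ modelled as point masses. These modelling choices are assumptions. The third derivatives are estimated by central finite differences, and the result is compared with the closed form $\langle P\rangle = \tfrac{32}{5}\mu^2 a^4 \Omega^6$, $\mu = m_1 m_2/(m_1+m_2)$, which is what differentiating $Q_{ij}$ by hand gives for this orbit.

```python
import math


def quadrupole(t, m1, m2, a, omega):
    """Traceless Q_ij(t) = I_ij - (1/3) I_kk delta_ij for point masses on a circular orbit."""
    M = m1 + m2
    n = (math.cos(omega * t), math.sin(omega * t), 0.0)
    x1 = [m2 / M * a * c for c in n]
    x2 = [-m1 / M * a * c for c in n]
    I = [[m1 * x1[i] * x1[j] + m2 * x2[i] * x2[j] for j in range(3)] for i in range(3)]
    trace = I[0][0] + I[1][1] + I[2][2]
    return [[I[i][j] - (trace / 3.0 if i == j else 0.0) for j in range(3)] for i in range(3)]


def third_derivative(f, t, h):
    """Central finite-difference estimate of d^3 f/dt^3 for a 3x3 matrix-valued f."""
    f2p, f1p, f1m, f2m = f(t + 2 * h), f(t + h), f(t - h), f(t - 2 * h)
    return [[(f2p[i][j] - 2 * f1p[i][j] + 2 * f1m[i][j] - f2m[i][j]) / (2 * h**3)
             for j in range(3)] for i in range(3)]


def quadrupole_power(m1, m2, a, samples=64, h=0.1):
    """<P> = (1/5) <Qddd_ij Qddd_ij>, equation (7.117), averaged over one orbital period."""
    omega = math.sqrt((m1 + m2) / a**3)          # Kepler's third law, G = 1
    period = 2 * math.pi / omega
    total = 0.0
    for s in range(samples):
        t = s * period / samples
        Q3 = third_derivative(lambda u: quadrupole(u, m1, m2, a, omega), t, h)
        total += sum(Q3[i][j] ** 2 for i in range(3) for j in range(3)) / 5.0
    return total / samples, omega


m1, m2, a = 1.0, 1.0, 10.0
P_numeric, omega = quadrupole_power(m1, m2, a)
mu = m1 * m2 / (m1 + m2)
P_circular = 32.0 / 5.0 * mu**2 * a**4 * omega**6   # hand-differentiated result for this orbit
print(P_numeric, P_circular)
```

The radiated power is constant along a circular orbit, so the time average in (7.117) is trivial here; with $\Omega^2 = M/a^3$ the closed form scales as $a^{-5}$.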
We conclude that an energy-momentum distribution whose quadrupole tensor is varying in time will emit gravitational radiation. A spherically symmetric body has $Q_{ij} = 0$ and so will not radiate, in agreement with Birkhoff’s theorem, which asserts that the unique spherically symmetric solution of the vacuum Einstein equation is the Schwarzschild solution. Hence the spacetime outside a spherically symmetric body is time independent because it is described by the Schwarzschild solution.


7.7 Comparison with electromagnetic radiation

In electromagnetic theory, given a charge distribution $\rho$ we can define the total charge

$$Q = \int d^3x\, \rho \tag{7.118}$$

and the electric dipole moment

$$\mathbf{D} = \int d^3x\, \rho\,\mathbf{x} \tag{7.119}$$

Similarly, for a matter distribution with energy density $T_{00}$ we have defined the total energy

$$E = \int d^3x\, T_{00} \tag{7.120}$$

and we can define the centre of mass

$$\mathbf{X}(t) = E^{-1}\int d^3x\, T_{00}(t,\mathbf{x})\,\mathbf{x} \tag{7.121}$$

Electromagnetic radiation is produced by the motion of charge. Of course charge is conserved, so $Q$ does not vary with time, just as $E$ does not vary with time. Hence there is no monopole radiation in either electromagnetism or gravity. However, $\mathbf{D}$ can change with time, and changing $\mathbf{D}$ leads to emission of electromagnetic radiation with power

$$\langle P\rangle_t = \frac{1}{12\pi\epsilon_0}\langle |\ddot{\mathbf{D}}|^2\rangle_{t-r} \tag{7.122}$$

Since the analogue of $\mathbf{D}$ is $\mathbf{X}$, one might expect gravitational dipole radiation when $\mathbf{X}$ varies. However, we have

$$E\ddot{\mathbf{X}} = \dot{\mathbf{P}} = 0 \tag{7.123}$$

where $\mathbf{P}$ is the total momentum of the mass distribution, which is conserved. Hence there is no gravitational analogue of electric dipole radiation: it is forbidden by linear momentum conservation.
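The first equality in (7.123) follows from the same integration-by-parts manipulations used in (7.68), with surface terms dropped because the matter is compactly supported. A short sketch, using $T_{00} = T^{00}$ to leading order, the constancy of $E$ from (7.77), and the definition (7.79), which gives $P_i = \int d^3x\, T^{0i}$ since $T_{0i} = -T^{0i}$:

```latex
\begin{aligned}
E\,\dot X^i &= \partial_0 \int d^3x\, T^{00}x^i
  = \int d^3x\, \left(\partial_0 T^{00}\right)x^i
  = -\int d^3x\, \left(\partial_j T^{j0}\right)x^i \\
 &= \int d^3x\, T^{i0} = P^i ,
 \qquad\text{so}\qquad
 E\,\ddot X^i = \dot P^i = -\int d^3x\, \partial_j T^{ji} = 0 .
\end{aligned}
```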
In electromagnetic theory, a varying magnetic dipole moment also produces radiation, although this is usually much weaker than electric dipole radiation. The magnetic dipole is

$$\boldsymbol{\mu} = \int d^3x\, \mathbf{x}\times\mathbf{j} \tag{7.124}$$

where $\mathbf{j}$ is the electric current. The analogue of a magnetic dipole moment for a mass distribution is

$$\mathbf{J} = \int d^3x\, \mathbf{x}\times(\rho\mathbf{u}) \tag{7.125}$$