Part3 GR Lectures
General Relativity
Harvey Reall
Contents
1 Manifolds and tensors
1.1 Introduction
1.2 Differentiable manifolds
1.3 Smooth functions
1.4 Curves and vectors
1.5 Covectors
1.6 Abstract index notation
1.7 Tensors
1.8 Tensor fields
1.9 Integral curves
1.10 The commutator
2 Metric tensors
2.1 Definition
2.2 Lorentzian signature
2.3 Curves of extremal proper time
3 Covariant derivative
3.1 Introduction
3.2 The Levi-Civita connection
3.3 Geodesics
3.4 Normal coordinates
4 Curvature
4.1 Parallel transport
4.2 The Riemann tensor
4.3 Parallel transport again
4.4 Symmetries of the Riemann tensor
4.5 Geodesic deviation
4.6 Curvature of the Levi-Civita connection
7 Linearized theory
7.1 The linearized Einstein equation
7.2 The Newtonian limit
7.3 Gravitational waves
7.4 The field far from a source
7.5 The energy in gravitational waves
7.6 The quadrupole formula
7.7 Comparison with electromagnetic radiation
7.8 Gravitational waves from binary systems
8 Differential forms
8.1 Introduction
8.2 Connection 1-forms
8.3 Spinors in curved spacetime (non-examinable)
8.4 Curvature 2-forms
8.5 Volume form
8.6 Integration on manifolds
8.7 Submanifolds and Stokes’ theorem
Preface
These are lecture notes for the course on General Relativity in Part III of the Cambridge
Mathematical Tripos. There are introductory GR courses in Part II (Mathematics or
Natural Sciences) so, although self-contained, this course does not cover topics usually
covered in a first course, e.g., the Schwarzschild solution, the solar system tests, and
cosmological solutions. You should consult an introductory book (e.g. Gravity by J.B.
Hartle) if you have not studied these topics before.
Acknowledgment
I am very grateful to Andrius Štikonas for producing the figures.
Conventions
We will use units in which the speed of light is one: c = 1. Sometimes we will use units
where Newton’s gravitational constant is one: G = 1.
We will use “abstract indices” $a, b, c$ etc. to denote tensors, e.g. $V^a$, $g_{cd}$. Equations
involving such indices are basis-independent. Greek indices $\mu, \nu$ etc. refer to tensor
components in a particular basis. Equations involving such indices are valid only in
that basis.
We will define the metric tensor to have signature (− + ++), which is the most
common convention. Some authors use signature (+ − −−).
Our convention for the Riemann tensor is such that the Ricci identity takes the
form
$$\nabla_a \nabla_b V^c - \nabla_b \nabla_a V^c = R^c{}_{dab} V^d .$$
Some authors define the Riemann tensor with the opposite sign.
Bibliography
There are many excellent books on General Relativity. The following is an incomplete
list:
Our approach will be closest to that of Wald. The first part of Stewart’s book
is based on a previous version of this course. Carroll’s book is a very readable intro-
duction. Weinberg’s book contains a good discussion of equivalence principles. Our
treatments of the Newtonian approximation and gravitational radiation are based on
Misner, Thorne and Wheeler.
The maps φα are called charts or coordinate systems. The set {φα } is called an atlas.
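[Figure: two overlapping charts $\phi_\alpha : O_\alpha \to U_\alpha$ and $\phi_\beta : O_\beta \to U_\beta$ on $M$, with the transition map $\phi_\beta \circ \phi_\alpha^{-1}$ from $\phi_\alpha(O_\alpha \cap O_\beta)$ to $\phi_\beta(O_\alpha \cap O_\beta)$.]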
Remarks.
1. Sometimes we shall write $\phi_\alpha(p) = (x^1_\alpha(p), x^2_\alpha(p), \ldots, x^n_\alpha(p))$ and refer to $x^i_\alpha(p)$ as
the coordinates of $p$.
Examples.
2. $S^1$: the unit circle, i.e., the subset of $\mathbb{R}^2$ given by $(\cos\theta, \sin\theta)$ with $\theta \in \mathbb{R}$. We
can’t define a chart by using $\theta \in [0, 2\pi)$ as a coordinate because $[0, 2\pi)$ is not open.
Instead let $P$ be the point $(1, 0)$ and define one chart by $\phi_1 : S^1 - \{P\} \to (0, 2\pi)$,
$\phi_1(p) = \theta_1$ with $\theta_1$ defined by Fig. 3.
Figure 3. Definition of $\theta_1$
Now let $Q$ be the point $(-1, 0)$ and define a second chart by $\phi_2 : S^1 - \{Q\} \to (-\pi, \pi)$, $\phi_2(p) = \theta_2$ where $\theta_2$ is defined by Fig. 4.
Neither chart covers all of $S^1$ but together they form an atlas. The charts overlap
on the “upper” semi-circle and on the “lower” semi-circle. On the first of these we
have $\theta_2 = \phi_2 \circ \phi_1^{-1}(\theta_1) = \theta_1$. On the second we have $\theta_2 = \phi_2 \circ \phi_1^{-1}(\theta_1) = \theta_1 - 2\pi$.
These are obviously smooth functions.
Figure 4. Definition of $\theta_2$
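The two charts on $S^1$ and their transition function can be checked numerically. The following is a minimal sketch, assuming that $\theta_1$ and $\theta_2$ are both the usual counterclockwise angle from the positive $x$-axis (the figures are not reproduced here, so this is an assumption); the function names `chart1`, `chart2` and `transition` are hypothetical.

```python
import math

def chart1(x, y):
    """phi_1 on S^1 - {P}, P = (1, 0): returns theta_1 in (0, 2*pi)."""
    theta = math.atan2(y, x)              # atan2 returns a value in [-pi, pi]
    return theta if theta > 0 else theta + 2 * math.pi

def chart2(x, y):
    """phi_2 on S^1 - {Q}, Q = (-1, 0): returns theta_2 in (-pi, pi)."""
    return math.atan2(y, x)

def transition(theta1):
    """phi_2 o phi_1^{-1}, for theta1 in (0, pi) or (pi, 2*pi)."""
    # phi_1^{-1}(theta1) is the point (cos theta1, sin theta1) of the circle.
    return chart2(math.cos(theta1), math.sin(theta1))

upper = transition(1.0)   # a point on the upper semi-circle
lower = transition(4.0)   # a point on the lower semi-circle
```

On the upper semi-circle the sketch returns $\theta_1$ itself, and on the lower semi-circle it returns $\theta_1 - 2\pi$, up to floating-point rounding.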
These define $\theta \in [0, \pi]$ and $\phi \in [0, 2\pi)$ uniquely but the map $(x, y, z) \mapsto (\theta, \phi)$
does not define a chart because $[0, \pi] \times [0, 2\pi)$ is not an open set. Note that points
with $\theta = 0, \pi$ or $\phi = 0$ are the points on $S^2$ with $y = 0$, $x \ge 0$, which form a line
of longitude as shown in Fig. 5. Let $O$ be $S^2$ with this line removed. Then we
have a chart $\psi : O \to U = (0, \pi) \times (0, 2\pi) \subset \mathbb{R}^2$, given by $\psi : (x, y, z) \mapsto (\theta, \phi)$.
We can define a second chart using a different set of spherical polar coordinates
defined as follows:
where $\theta' \in (0, \pi)$ and $\phi' \in (0, 2\pi)$ are uniquely defined by these equations. This
is a chart $\psi' : O' \to U'$, where $O'$ is $S^2$ with the line $z = 0$, $x \le 0$ removed, see
Fig. 6, and $U'$ is $(0, \pi) \times (0, 2\pi)$. Clearly $S^2 = O \cup O'$. The functions $\psi \circ \psi'^{-1}$
and $\psi' \circ \psi^{-1}$ are smooth on $O \cap O'$ so these two charts define an atlas for $S^2$.
Remark. A given set M may admit many atlases, e.g., one can simply add extra
charts to an atlas. We don’t want to regard this as producing a distinct manifold so
we make the following definition:
Definition. Two atlases are compatible if their union is also an atlas. The union of all
atlases compatible with a given atlas is called a complete atlas: it is an atlas which is
not contained in any other atlas.
Remark. We will always assume that we are dealing with a complete atlas. (None
of the above examples gives a complete atlas; such atlases necessarily contain infinitely
many charts.)
Remark. After a while we will stop distinguishing between f and F , i.e., we will say
f (x) when we mean F (x).
one tried to define the sum of a vector at p and a vector at q then to which tangent
plane would the sum belong?
On a surface, the tangent vector to a curve in the surface is automatically tangent
to the surface. We take this as our starting point for defining vectors on a general
manifold. We start by defining the notion of a curve in a manifold, and then the notion
of a tangent vector to a curve at a point p. We then show that the set of all such tangent
vectors at p forms a vector space Tp (M ). This is the analogue of the tangent plane to
a surface but it makes no reference to any embedding into a higher-dimensional space.
Definition. A smooth curve in a differentiable manifold $M$ is a smooth function $\lambda : I \to M$, where $I$ is an open interval in $\mathbb{R}$ (e.g. $(0, 1)$ or $(-1, \infty)$). By this we mean
that $\phi_\alpha \circ \lambda$ is a smooth map from $I$ to $\mathbb{R}^n$ for all charts $\phi_\alpha$.
Let $f : M \to \mathbb{R}$ and $\lambda : I \to M$ be a smooth function and a smooth curve
respectively. Then $f \circ \lambda$ is a map from $I$ to $\mathbb{R}$. Hence we can take its derivative to
obtain the rate of change of $f$ along the curve:
$$\frac{d}{dt}[(f \circ \lambda)(t)] = \frac{d}{dt}[f(\lambda(t))] \tag{1.3}$$
In $\mathbb{R}^n$ we are used to the idea that the rate of change of $f$ along the curve at a point $p$
is given by the directional derivative $X_p \cdot (\nabla f)_p$ where $X_p$ is the tangent to the curve
at $p$. Note that the vector $X_p$ defines a linear map from the space of smooth functions
on $\mathbb{R}^n$ to $\mathbb{R}$: $f \mapsto X_p \cdot (\nabla f)_p$. This is how we define a tangent vector to a curve in a
general manifold. (We restrict to curves without self-intersections, i.e., $\lambda(t_1) \ne \lambda(t_2)$ if
$t_1 \ne t_2$. A self-intersecting curve can have multiple tangent vectors at $p$.)
Definition. Let $\lambda : I \to M$ be a smooth curve without self-intersections and (wlog)
$\lambda(0) = p$. The tangent vector to $\lambda$ at $p$ is the linear map $X_p$ from the space of smooth
functions on $M$ to $\mathbb{R}$ defined by
$$X_p(f) = \left\{\frac{d}{dt}[f(\lambda(t))]\right\}_{t=0} \tag{1.4}$$
Note that this satisfies two important properties: (i) it is linear, i.e., Xp (f + g) =
Xp (f ) + Xp (g) and Xp (αf ) = αXp (f ) for any constant α; (ii) it satisfies the Leibniz
rule Xp (f g) = Xp (f )g(p) + f (p)Xp (g), where f and g are smooth functions and f g is
their product.
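If $\phi = (x^1, x^2, \ldots, x^n)$ is a chart defined in a neighbourhood of $p$ and $F \equiv f \circ \phi^{-1}$
then we have $f \circ \lambda = f \circ \phi^{-1} \circ \phi \circ \lambda = F \circ (\phi \circ \lambda)$ and hence
$$X_p(f) = \left(\frac{\partial F(x)}{\partial x^\mu}\right)_{\phi(p)} \left(\frac{dx^\mu(\lambda(t))}{dt}\right)_{t=0} \tag{1.5}$$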
Note that (i) the first term on the RHS depends only on f and φ and the second
term on the RHS depends only on φ and λ; (ii) we are using the Einstein summation
convention, i.e., µ is summed from 1 to n in the above expression.
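Proposition. The set of all tangent vectors at $p$ forms an $n$-dimensional vector space
$T_p(M)$: the tangent space at $p$.
Proof. Consider curves $\lambda$ and $\kappa$ through $p$, wlog $\lambda(0) = \kappa(0) = p$. Let their tangent
vectors at $p$ be $X_p$ and $Y_p$ respectively. We need to define addition of tangent vectors
and multiplication by a constant. Let $\alpha$ and $\beta$ be constants. We define $\alpha X_p + \beta Y_p$ to
be the linear map $f \mapsto \alpha X_p(f) + \beta Y_p(f)$. Next we need to show that this linear map is
indeed the tangent vector to a curve through $p$. Let $\phi = (x^1, \ldots, x^n)$ be a chart defined
in a neighbourhood of $p$. Consider the following curve:
$$\nu(t) = \phi^{-1}\left[\alpha(\phi(\lambda(t)) - \phi(p)) + \beta(\phi(\kappa(t)) - \phi(p)) + \phi(p)\right] \tag{1.6}$$
Note that $\nu(0) = p$. Let $Z_p$ denote the tangent vector to this curve at $p$. From equation
(1.5) we have
$$\begin{aligned}
Z_p(f) &= \left(\frac{\partial F(x)}{\partial x^\mu}\right)_{\phi(p)} \left\{\frac{d}{dt}\left[\alpha(x^\mu(\lambda(t)) - x^\mu(p)) + \beta(x^\mu(\kappa(t)) - x^\mu(p)) + x^\mu(p)\right]\right\}_{t=0} \\
&= \left(\frac{\partial F(x)}{\partial x^\mu}\right)_{\phi(p)} \left[\alpha\left(\frac{dx^\mu(\lambda(t))}{dt}\right)_{t=0} + \beta\left(\frac{dx^\mu(\kappa(t))}{dt}\right)_{t=0}\right] \\
&= \alpha X_p(f) + \beta Y_p(f) \\
&= (\alpha X_p + \beta Y_p)(f).
\end{aligned}$$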
Since this is true for any smooth function f , we have Zp = αXp + βYp as required.
Hence αXp + βYp is tangent to the curve ν at p. It follows that the set of tangent
vectors at p forms a vector space (the zero vector is realized by the curve λ(t) = p for
all t).
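The next step is to show that this vector space is $n$-dimensional. To do this, we
exhibit a basis. Let $1 \le \mu \le n$. Consider the curve $\lambda_\mu$ through $p$ defined by
$$\lambda_\mu(t) = \phi^{-1}(x^1(p), \ldots, x^{\mu-1}(p), x^\mu(p) + t, x^{\mu+1}(p), \ldots, x^n(p)). \tag{1.7}$$
The tangent vector to this curve at $p$ is denoted $\left(\frac{\partial}{\partial x^\mu}\right)_p$. To see why, note that, using
equation (1.5),
$$\left(\frac{\partial}{\partial x^\mu}\right)_p (f) = \left(\frac{\partial F}{\partial x^\mu}\right)_{\phi(p)}. \tag{1.8}$$
The $n$ tangent vectors $\left(\frac{\partial}{\partial x^\mu}\right)_p$ are linearly independent. To see why, assume that there
exist constants $\alpha^\mu$ such that $\alpha^\mu \left(\frac{\partial}{\partial x^\mu}\right)_p = 0$. Then, for any function $f$ we must have
$$\alpha^\mu \left(\frac{\partial F(x)}{\partial x^\mu}\right)_{\phi(p)} = 0. \tag{1.9}$$
Choosing $F = x^\nu$, this reduces to $\alpha^\nu = 0$. Letting this run over all values of $\nu$ we see
that all of the constants $\alpha^\nu$ must vanish, which proves linear independence.
Finally we must prove that these tangent vectors span the vector space. This
follows from equation (1.5), which can be rewritten
$$X_p(f) = \left(\frac{dx^\mu(\lambda(t))}{dt}\right)_{t=0} \left(\frac{\partial}{\partial x^\mu}\right)_p (f) \tag{1.10}$$
Since this holds for any smooth function $f$, we have
$$X_p = \left(\frac{dx^\mu(\lambda(t))}{dt}\right)_{t=0} \left(\frac{\partial}{\partial x^\mu}\right)_p, \tag{1.11}$$
i.e. $X_p$ can be written as a linear combination of the $n$ tangent vectors $\left(\frac{\partial}{\partial x^\mu}\right)_p$. These
$n$ vectors therefore form a basis for $T_p(M)$, which establishes that the tangent space is
$n$-dimensional. QED.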
Remark. The basis $\{(\partial/\partial x^\mu)_p,\ \mu = 1, \ldots, n\}$ is chart-dependent: we had to choose
Remark. Note the placement of indices. We shall sum over repeated indices if one
such index appears “upstairs” (as a superscript, e.g., $X^\mu$) and the other “downstairs”
(as a subscript, e.g., $e_\mu$). (The index $\mu$ on $\left(\frac{\partial}{\partial x^\mu}\right)_p$ is regarded as downstairs.) If an
equation involves the same index more than twice, or twice but both times upstairs or
both times downstairs (e.g. $X_\mu Y_\mu$) then a mistake has been made.
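Let’s consider the relationship between different coordinate bases. Let $\phi = (x^1, \ldots, x^n)$
and $\phi' = (x'^1, \ldots, x'^n)$ be two charts defined in a neighbourhood of $p$. Then, for any
smooth function $f$, we have
$$\begin{aligned}
\left(\frac{\partial}{\partial x^\mu}\right)_p (f) &= \left.\frac{\partial}{\partial x^\mu}\left(f \circ \phi^{-1}\right)\right|_{\phi(p)} \\
&= \left.\frac{\partial}{\partial x^\mu}\left[(f \circ \phi'^{-1}) \circ (\phi' \circ \phi^{-1})\right]\right|_{\phi(p)}
\end{aligned}$$
Now let $F' = f \circ \phi'^{-1}$. This is a function of the coordinates $x'$. Note that the components
of $\phi' \circ \phi^{-1}$ are simply the functions $x'^\mu(x)$, i.e., the primed coordinates expressed in
terms of the unprimed coordinates. Hence what we have is easy to evaluate using the
chain rule:
$$\begin{aligned}
\left(\frac{\partial}{\partial x^\mu}\right)_p (f) &= \left.\frac{\partial}{\partial x^\mu}\left(F'(x'(x))\right)\right|_{\phi(p)} \\
&= \left(\frac{\partial x'^\nu}{\partial x^\mu}\right)_{\phi(p)} \left(\frac{\partial F'(x')}{\partial x'^\nu}\right)_{\phi'(p)} \\
&= \left(\frac{\partial x'^\nu}{\partial x^\mu}\right)_{\phi(p)} \left(\frac{\partial}{\partial x'^\nu}\right)_p (f)
\end{aligned}$$
Hence we have
$$\left(\frac{\partial}{\partial x^\mu}\right)_p = \left(\frac{\partial x'^\nu}{\partial x^\mu}\right)_{\phi(p)} \left(\frac{\partial}{\partial x'^\nu}\right)_p \tag{1.13}$$
This expresses one set of basis vectors in terms of the other. Let $X^\mu$ and $X'^\mu$ denote
the components of a vector with respect to the two bases. Then we have
$$X = X^\nu \left(\frac{\partial}{\partial x^\nu}\right)_p = X^\nu \left(\frac{\partial x'^\mu}{\partial x^\nu}\right)_{\phi(p)} \left(\frac{\partial}{\partial x'^\mu}\right)_p \tag{1.14}$$
and hence
$$X'^\mu = X^\nu \left(\frac{\partial x'^\mu}{\partial x^\nu}\right)_{\phi(p)} \tag{1.15}$$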
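The transformation law (1.15) can be illustrated numerically. The sketch below is an illustration under stated assumptions, not part of the original notes: it takes the unprimed chart to be Cartesian coordinates $(x, y)$ on the plane and the primed chart to be polar coordinates $(r, \phi)$, estimates the Jacobian $\partial x'^\mu/\partial x^\nu$ by central differences, and transforms the components of $X = \partial/\partial x$. The helper names `polar`, `jacobian` and `transform` are hypothetical.

```python
import math

def polar(x, y):
    """Primed chart x' = (r, phi) expressed in terms of the unprimed chart (x, y)."""
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian(chart, point, h=1e-6):
    """Central-difference estimate of J[mu][nu] = d x'^mu / d x^nu at `point`."""
    n = len(point)
    J = [[0.0] * n for _ in range(n)]
    for nu in range(n):
        up = list(point); up[nu] += h
        dn = list(point); dn[nu] -= h
        fu, fd = chart(*up), chart(*dn)
        for mu in range(n):
            J[mu][nu] = (fu[mu] - fd[mu]) / (2 * h)
    return J

def transform(J, X):
    """Equation (1.15): X'^mu = (d x'^mu / d x^nu) X^nu."""
    return [sum(J[mu][nu] * X[nu] for nu in range(len(X)))
            for mu in range(len(J))]

p = (1.0, 1.0)                                        # r = sqrt(2), phi = pi/4
X_polar = transform(jacobian(polar, p), [1.0, 0.0])   # X = d/dx at p
```

For this choice one expects $X'^r = \cos\phi$ and $X'^\phi = -\sin\phi / r$, which the finite-difference estimate reproduces to within the discretization error.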
1.5 Covectors
Recall the following from linear algebra:
Definition. Let $V$ be a real vector space. The dual space $V^*$ of $V$ is the vector space
of linear maps from $V$ to $\mathbb{R}$.
Lemma. If $V$ is $n$-dimensional then so is $V^*$. If $\{e_\mu, \mu = 1, \ldots, n\}$ is a basis for $V$ then
$V^*$ has a basis $\{f^\mu, \mu = 1, \ldots, n\}$, the dual basis defined by $f^\mu(e_\nu) = \delta^\mu_\nu$ (if $X = X^\mu e_\mu$
then $f^\mu(X) = X^\nu f^\mu(e_\nu) = X^\mu$).
Since $V$ and $V^*$ have the same dimension, they are isomorphic. For example the linear
map defined by $e_\mu \mapsto f^\mu$ is an isomorphism. But this is basis-dependent: a different
choice of basis would give a different isomorphism. In contrast, there is a natural
(basis-independent) isomorphism between $V$ and $(V^*)^*$:
Theorem. If $V$ is finite dimensional then $(V^*)^*$ is naturally isomorphic to $V$. The
isomorphism is $\Phi : V \to (V^*)^*$ where $\Phi(X)(\omega) = \omega(X)$ for all $\omega \in V^*$.
Now we return to manifolds:
Definition. The dual space of $T_p(M)$ is denoted $T_p^*(M)$ and called the cotangent space
at $p$. An element of this space is called a covector at $p$. If $\{e_\mu\}$ is a basis for $T_p(M)$
and $\{f^\mu\}$ is the dual basis then we can expand a covector $\eta$ as $\eta_\mu f^\mu$. $\eta_\mu$ are called the
components of $\eta$.
Note that (i) $\eta(e_\mu) = \eta_\nu f^\nu(e_\mu) = \eta_\mu$; (ii) if $X \in T_p(M)$ then $\eta(X) = \eta(X^\mu e_\mu) = X^\mu \eta(e_\mu) = X^\mu \eta_\mu$ (note the placement of indices!)
Definition. Let f : M → R be a smooth function. Define a covector (df )p by
(df )p (X) = X(f ) for any vector X ∈ Tp (M ). (df )p is the gradient of f at p.
Remarks.
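3. To explain why we call $(df)_p$ the gradient of $f$ at $p$, observe that its components
in a coordinate basis are
$$[(df)_p]_\mu = (df)_p\left(\left(\frac{\partial}{\partial x^\mu}\right)_p\right) = \left(\frac{\partial}{\partial x^\mu}\right)_p (f) = \left(\frac{\partial F}{\partial x^\mu}\right)_{\phi(p)} \tag{1.17}$$
where the first equality uses (i) above, the second equality is the definition of
$(df)_p$ and the final equality uses (1.8).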
and hence that, if $\omega_\mu$ and $\omega'_\mu$ are the components of $\omega \in T_p^*(M)$ w.r.t. the two coordinate
bases, then
$$\omega'_\mu = \left(\frac{\partial x^\nu}{\partial x'^\mu}\right)_{\phi'(p)} \omega_\nu. \tag{1.19}$$
Elementary treatments of GR take this as the definition of a covector, which they
usually call a “covariant vector”.
is that there is a superscript Latin letter. This tells us that the object in question is a
vector. We emphasize: $X^a$ represents the vector itself, not a component of the vector.
Similarly we denote a covector $\eta$ by $\eta_a$ (or $\eta_b$ etc).
The idea is that an equation written using Latin indices is basis-independent. More
precisely, if we have some equation written with Greek indices, and we know that
it is true for any basis, then we can replace Greek indices with Latin indices (e.g.
$\mu \to a$, $\nu \to b$ etc). For example, in any basis we have $\eta(X) = \eta_\mu X^\mu = X^\mu \eta_\mu$
1.7 Tensors
In Newtonian physics, you are familiar with the idea that certain physical quantities
are described by tensors (e.g. the inertia tensor). You may have encountered the idea
that the Maxwell field in special relativity is described by a tensor. Tensors are very
important in GR because the curvature of spacetime is described with tensors. In this
section we shall define tensors at a point p and explain some of their basic properties.
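Definition. A tensor of type $(r, s)$ at $p$ is a multilinear map
$$T : T_p^*(M) \times \ldots \times T_p^*(M) \times T_p(M) \times \ldots \times T_p(M) \to \mathbb{R}$$
where there are $r$ factors of $T_p^*(M)$ and $s$ factors of $T_p(M)$. (Multilinear means that
the map is linear in each argument.)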
In other words, given r covectors and s vectors, a tensor of type (r, s) produces a real
number.
Examples.
where the RHS is a Kronecker delta. This is true in any basis, so in the abstract
index notation we write $\delta$ as $\delta^a_b$.
2. Consider a (2, 1) tensor. Let η and ω be covectors and X a vector. Then in our
basis we have
Now the basis we chose was arbitrary, hence we can immediately convert this to
a basis-independent equation using the abstract index notation:
$$T(\eta, \omega, X) = T^{ab}{}_c\, \eta_a \omega_b X^c. \tag{1.25}$$
Hence $B^\mu{}_\nu = (A^{-1})^\mu{}_\nu$. For a change between coordinate bases, our previous results
give
$$A^\mu{}_\nu = \frac{\partial x'^\mu}{\partial x^\nu}, \qquad B^\mu{}_\nu = \frac{\partial x^\mu}{\partial x'^\nu} \tag{1.28}$$
and these matrices are indeed inverses of each other (from the chain rule).
Exercise. Show that under an arbitrary change of basis, the components of a vector
$X$ and a covector $\eta$ transform as
$$X'^\mu = A^\mu{}_\nu X^\nu, \qquad \eta'_\mu = (A^{-1})^\nu{}_\mu\, \eta_\nu. \tag{1.29}$$
The corresponding result for a $(r, s)$ tensor is an obvious generalization of this formula.
Given a (r, s) tensor, we can construct a (r − 1, s − 1) tensor by contraction. This is
easiest to demonstrate with an example.
Example. Let $T$ be a tensor of type $(3, 2)$. Define a new tensor $S$ of type $(2, 1)$ as
follows
$$S(\omega, \eta, X) = T(f^\mu, \omega, \eta, e_\mu, X) \tag{1.31}$$
where $\{e_\mu\}$ is a basis and $\{f^\mu\}$ is the dual basis, $\omega$ and $\eta$ are arbitrary covectors and
$X$ is an arbitrary vector. This definition is basis-independent because
$$\begin{aligned}
T(f'^\mu, \omega, \eta, e'_\mu, X) &= T(A^\mu{}_\nu f^\nu, \omega, \eta, (A^{-1})^\rho{}_\mu e_\rho, X) \\
&= (A^{-1})^\rho{}_\mu A^\mu{}_\nu\, T(f^\nu, \omega, \eta, e_\rho, X) \\
&= T(f^\mu, \omega, \eta, e_\mu, X).
\end{aligned}$$
The components of $S$ and $T$ are related by $S^{\mu\nu}{}_\rho = T^{\sigma\mu\nu}{}_{\sigma\rho}$ in any basis. Since this is
true in any basis, we can write it using the abstract index notation as
$$S^{ab}{}_c = T^{dab}{}_{dc} \tag{1.32}$$
Note that there are other $(2, 1)$ tensors that we can build from $T^{abc}{}_{de}$. For example,
there is $T^{abd}{}_{cd}$, which corresponds to replacing the RHS of (1.31) with $T(\omega, \eta, f^\mu, X, e_\mu)$.
The abstract index notation makes it clear how many different tensors can be defined
this way: we can define a new tensor by “contracting” any upstairs index with any
downstairs index.
Another important way of constructing new tensors is by taking the product of two
tensors:
Definition. If S is a tensor of type (p, q) and T is a tensor of type (r, s) then the outer
product of S and T , denoted S ⊗ T is a tensor of type (p + r, q + s) defined by
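$$\begin{aligned}
(S \otimes T)(&\omega_1, \ldots, \omega_p, \eta_1, \ldots, \eta_r, X_1, \ldots, X_q, Y_1, \ldots, Y_s) \\
&= S(\omega_1, \ldots, \omega_p, X_1, \ldots, X_q)\, T(\eta_1, \ldots, \eta_r, Y_1, \ldots, Y_s)
\end{aligned} \tag{1.33}$$
$$(S \otimes T)^{a_1 \ldots a_p b_1 \ldots b_r}{}_{c_1 \ldots c_q d_1 \ldots d_s} = S^{a_1 \ldots a_p}{}_{c_1 \ldots c_q}\, T^{b_1 \ldots b_r}{}_{d_1 \ldots d_s} \tag{1.34}$$
$$T = T^{\mu\nu}{}_\rho\, e_\mu \otimes e_\nu \otimes f^\rho \tag{1.35}$$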
where $F \equiv f \circ \phi^{-1}$. Smoothness of $f$ implies that this map defines a smooth function.
Therefore $\partial/\partial x^\mu$ is a smooth vector field. (Note that $\partial/\partial x^\mu$ usually won’t be defined
on the whole manifold $M$ since the chart $\phi$ might not cover the whole manifold. So
strictly speaking this is not a vector field on $M$ but only on a subset of $M$. We shan’t
worry too much about this distinction.)
Remark. Since the vectors $(\partial/\partial x^\mu)_p$ provide a basis for $T_p(M)$ at any point $p$, we can
expand an arbitrary vector field as
$$X = X^\mu \frac{\partial}{\partial x^\mu} \tag{1.44}$$
Since $\partial/\partial x^\mu$ is smooth, it follows that $X$ is smooth if, and only if, its coordinate-basis
components $X^\mu$ are smooth functions.
Definition. A covector field is a map $\omega$ which maps any point $p \in M$ to a covector $\omega_p$
at $p$. Given a covector field and a vector field $X$ we can define a function $\omega(X) : M \to \mathbb{R}$
by $\omega(X) : p \mapsto \omega_p(X_p)$. The covector field $\omega$ is smooth if this function is smooth for
any smooth vector field $X$.
Example. Let $f$ be a smooth function. We have defined $(df)_p$ above. Now we let $p$
vary to define a covector field $df$. Let $X$ be a smooth vector field. Then $df(X) : p \mapsto (df)_p(X_p) = X_p(f)$ hence $df(X) = X(f)$. This is a smooth function of $p$ (because $X$ is
smooth). Hence $df$ is a smooth covector field: the gradient of $f$.
Remark. Taking $f = x^\mu$ reveals that $dx^\mu$ is a smooth covector field.
Definition. A $(r, s)$ tensor field is a map $T$ which maps any point $p \in M$ to a $(r, s)$ tensor
$T_p$ at $p$. Given $r$ covector fields $\eta_1, \ldots, \eta_r$ and $s$ vector fields $X_1, \ldots, X_s$ we can define
a function $T(\eta_1, \ldots, \eta_r, X_1, \ldots, X_s) : M \to \mathbb{R}$ by $p \mapsto T_p((\eta_1)_p, \ldots, (\eta_r)_p, (X_1)_p, \ldots, (X_s)_p)$.
The tensor field $T$ is smooth if this function is smooth for any smooth covector fields
$\eta_1, \ldots, \eta_r$ and vector fields $X_1, \ldots, X_s$.
Exercise. Show that a tensor field is smooth if, and only if, its components in a
coordinate chart are smooth functions.
Remarks. Note that we can regard a function on M as a (0, 0) tensor field. Henceforth
we shall assume that all tensor fields that we encounter are smooth.
The solution of this differential equation is called the integral curve of the vector field
$u$ through $x_0$. The definition extends straightforwardly to a vector field on a general
manifold:
Definition. Let $X$ be a vector field on $M$ and $p \in M$. An integral curve of $X$ through
$p$ is a curve through $p$ whose tangent at every point is $X$.
Let $\lambda$ denote an integral curve of $X$ with (wlog) $\lambda(0) = p$. In a coordinate chart, this
definition reduces to the initial value problem
$$\frac{dx^\mu(t)}{dt} = X^\mu(x(t)), \qquad x^\mu(0) = x^\mu_p. \tag{1.46}$$
(Here we are using the abbreviation $x^\mu(t) = x^\mu(\lambda(t))$.) Standard ODE theory guarantees
that there exists a unique solution to this problem. Hence there is a unique
integral curve of $X$ through any point $p$.
Example. In a chart $\phi = (x^1, \ldots, x^n)$, consider $X = \partial/\partial x^1 + x^1 \partial/\partial x^2$ and take $p$ to
be the point with coordinates $(0, \ldots, 0)$. Then $dx^1/dt = 1$, $dx^2/dt = x^1$. Solving the
first equation and imposing the initial condition gives $x^1 = t$, then plugging into the
second equation and solving gives $x^2 = t^2/2$. The other coordinates are trivial: $x^\mu = 0$ for
$\mu > 2$, so the integral curve is $t \mapsto \phi^{-1}(t, t^2/2, 0, \ldots, 0)$.
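The initial value problem (1.46) for this example can also be integrated numerically. The following is a minimal sketch using a classical fourth-order Runge-Kutta step on the first two coordinates only; the function names `X` and `integral_curve` and the step count are illustrative choices, not part of the notes.

```python
def X(x):
    """Components of X = d/dx^1 + x^1 d/dx^2 in the chart (x^1, x^2)."""
    return (1.0, x[0])

def integral_curve(field, x0, t_end, steps=1000):
    """Integrate dx^mu/dt = field(x) from x(0) = x0 to t = t_end with RK4."""
    h = t_end / steps
    x = list(x0)
    for _ in range(steps):
        k1 = field(x)
        k2 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k1)])
        k3 = field([xi + 0.5 * h * ki for xi, ki in zip(x, k2)])
        k4 = field([xi + h * ki for xi, ki in zip(x, k3)])
        x = [xi + h * (a + 2 * b + 2 * c + d) / 6
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
    return x

# Integral curve through the origin, evaluated at t = 2.
x1, x2 = integral_curve(X, (0.0, 0.0), t_end=2.0)
```

The numerical endpoint agrees with the closed-form curve $(t, t^2/2)$ at $t = 2$, i.e. with the point $(2, 2)$, up to rounding.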
chart:
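$$\begin{aligned}
[X, Y](f) &= X\left(Y^\nu \frac{\partial F}{\partial x^\nu}\right) - Y\left(X^\mu \frac{\partial F}{\partial x^\mu}\right) \\
&= X^\mu \frac{\partial}{\partial x^\mu}\left(Y^\nu \frac{\partial F}{\partial x^\nu}\right) - Y^\nu \frac{\partial}{\partial x^\nu}\left(X^\mu \frac{\partial F}{\partial x^\mu}\right) \\
&= X^\mu \frac{\partial Y^\nu}{\partial x^\mu} \frac{\partial F}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu} \frac{\partial F}{\partial x^\mu} \\
&= \left(X^\nu \frac{\partial Y^\mu}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu}\right) \frac{\partial F}{\partial x^\mu} \\
&= [X, Y]^\mu \left(\frac{\partial}{\partial x^\mu}\right)(f)
\end{aligned}$$
where
$$[X, Y]^\mu = X^\nu \frac{\partial Y^\mu}{\partial x^\nu} - Y^\nu \frac{\partial X^\mu}{\partial x^\nu}. \tag{1.48}$$
Since $f$ is arbitrary, it follows that
$$[X, Y] = [X, Y]^\mu \frac{\partial}{\partial x^\mu}. \tag{1.49}$$
The RHS is a vector field hence $[X, Y]$ is a vector field whose components in a coordinate
basis are given by (1.48). (Note that we cannot write equation (1.48) in abstract index
notation because it is valid only in a coordinate basis.)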
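Example. Let $X = \partial/\partial x^1$ and $Y = x^1 \partial/\partial x^2 + \partial/\partial x^3$. The components of $X$ are
constant so $[X, Y]^\mu = \partial Y^\mu/\partial x^1 = \delta^\mu_2$ so $[X, Y] = \partial/\partial x^2$.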
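The component formula (1.48) lends itself to a quick numerical check of this example. The sketch below is illustrative only: it approximates the partial derivatives by central differences, and the function name `commutator`, the step size and the sample point are assumptions made for the illustration.

```python
def commutator(X, Y, x, h=1e-5):
    """[X, Y]^mu = X^nu d_nu Y^mu - Y^nu d_nu X^mu at the point x, as in (1.48)."""
    n = len(x)

    def d(V, nu):
        """Central-difference derivative of the components of V w.r.t. x^nu."""
        up = list(x); up[nu] += h
        dn = list(x); dn[nu] -= h
        return [(a - b) / (2 * h) for a, b in zip(V(up), V(dn))]

    Xx, Yx = X(x), Y(x)
    dX = [d(X, nu) for nu in range(n)]   # dX[nu][mu] approximates d_nu X^mu
    dY = [d(Y, nu) for nu in range(n)]   # dY[nu][mu] approximates d_nu Y^mu
    return [sum(Xx[nu] * dY[nu][mu] - Yx[nu] * dX[nu][mu] for nu in range(n))
            for mu in range(n)]

X = lambda x: [1.0, 0.0, 0.0]    # X = d/dx^1
Y = lambda x: [0.0, x[0], 1.0]   # Y = x^1 d/dx^2 + d/dx^3
bracket = commutator(X, Y, [0.3, -1.2, 2.0])
```

At any point the result should be close to the components $(0, 1, 0)$ of $\partial/\partial x^2$, in agreement with the example.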
Exercise. Show that (i) [X, Y ] = −[Y, X]; (ii) [X, Y + Z] = [X, Y ] + [X, Z]; (iii)
[X, f Y ] = f [X, Y ] + X(f )Y ; (iv) [X, [Y, Z]] + [Y, [Z, X]] + [Z, [X, Y ]] = 0 (the Jacobi
identity). Here X, Y, Z are vector fields and f is a smooth function.
Remark. The components of $\partial/\partial x^\mu$ in the coordinate basis are either 1 or 0. It
follows that
$$\left[\frac{\partial}{\partial x^\mu}, \frac{\partial}{\partial x^\nu}\right] = 0. \tag{1.50}$$
Conversely, it can be shown that if $X_1, \ldots, X_m$ ($m \le n$) are vector fields that are
linearly independent at every point of some open region $R$, and whose commutators
all vanish in $R$, then, in a neighbourhood of any point $p \in R$, one can introduce a
coordinate chart $(x^1, \ldots, x^n)$ such that $X_i = \partial/\partial x^i$ ($i = 1, \ldots, m$) throughout this
neighbourhood.
Inside the integral we see the norm of the tangent vector dx/dt, in other words the
scalar product of this vector with itself. Therefore to define a notion of distance on a
general manifold, we shall start by introducing a scalar product between vectors.
A scalar product maps a pair of vectors to a number. In other words, at a point
p, it is a map g : Tp (M ) × Tp (M ) → R. A scalar product should be linear in each
argument. Hence g is a (0, 2) tensor at p. We call g a metric tensor. There are a couple
of other properties that g should also satisfy:
Definition. A metric tensor at p ∈ M is a (0, 2) tensor g with the following properties:
Definition. On a Riemannian manifold, the norm of a vector $X$ is $|X| = \sqrt{g(X, X)}$
and the angle between two non-zero vectors $X$ and $Y$ (at the same point) is $\theta$ where
$\cos\theta = g(X, Y)/(|X|\,|Y|)$. (These definitions, in terms of the scalar product, agree
with the usual definitions of Euclidean geometry.)
Remark. On a Riemannian manifold, we can now define the length of a curve in
exactly the same way as above: let $\lambda : (a, b) \to M$ be a smooth curve with tangent
vector $X$. Then the length of the curve is
$$\int_a^b dt \sqrt{g(X, X)|_{\lambda(t)}} \tag{2.2}$$
Exercise. Given a curve $\lambda(t)$ we can define a new curve simply by changing the
parameterization: let $t = t(u)$ with $dt/du > 0$ and $u \in (c, d)$ with $t(c) = a$ and $t(d) = b$.
Show that: (i) the new curve $\kappa(u) \equiv \lambda(t(u))$ has tangent vector $Y^a = (dt/du) X^a$; (ii)
the length of these two curves is the same, i.e., our definition of length is independent
of parameterization.
In a coordinate basis, we have (cf equation (1.35))
This notation captures the intuitive idea of an infinitesimal distance ds being deter-
mined by infinitesimal coordinate separations dxµ .
Examples.
$(\mathbb{R}^n, g)$ is called Euclidean space. A coordinate chart which covers all of $\mathbb{R}^n$ and
in which $g_{\mu\nu} = \mathrm{diag}(1, 1, \ldots, 1)$ is called Cartesian.
3. On $S^2$, let $(\theta, \phi)$ denote the spherical polar coordinate chart discussed earlier.
The (unit) round metric on $S^2$ is
$$ds^2 = d\theta^2 + \sin^2\theta\, d\phi^2$$
i.e. in the chart $(\theta, \phi)$, we have $g_{\mu\nu} = \mathrm{diag}(1, \sin^2\theta)$. Note this is positive definite
for $\theta \in (0, \pi)$, i.e., on all of this chart. However, this chart does not cover the
whole manifold so the above equation does not determine $g$ everywhere. We can
give a precise definition by adding that, in the chart $(\theta', \phi')$ discussed earlier,
$g = d\theta'^2 + \sin^2\theta'\, d\phi'^2$. One can check that this does indeed define a smooth
tensor field. (This metric is the one induced from the embedding of $S^2$ into 3d
Euclidean space: we will see later that it is the “pull-back” of the metric on
Euclidean space.)
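As an illustration of the length formula (2.2) with the round metric $g_{\mu\nu} = \mathrm{diag}(1, \sin^2\theta)$, the sketch below estimates the length of a circle of constant $\theta = \theta_0$ by a midpoint rule. The choice of curve, the value of $\theta_0$ and the helper names `round_metric` and `curve_length` are assumptions made for the illustration.

```python
import math

def round_metric(theta, phi):
    """Components g_{mu nu} = diag(1, sin^2 theta) in the chart (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]

def curve_length(curve, velocity, metric, a, b, steps=2000):
    """Midpoint-rule estimate of the integral (2.2) of sqrt(g(X, X)) dt."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        g = metric(*curve(t))
        X = velocity(t)
        gXX = sum(g[m][n] * X[m] * X[n] for m in range(2) for n in range(2))
        total += math.sqrt(gXX) * h
    return total

theta0 = math.pi / 3
latitude = lambda t: (theta0, t)   # curve theta = theta0, phi = t, t in (0, 2*pi)
tangent = lambda t: (0.0, 1.0)     # tangent components X^mu = dx^mu/dt
length = curve_length(latitude, tangent, round_metric, 0.0, 2 * math.pi)
```

The estimate should match $2\pi \sin\theta_0$, the circumference of the corresponding circle on the unit sphere embedded in Euclidean space.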
where $\delta^a_c$ is the (1, 1) tensor defined in (1.21) with components $\delta^\mu_\nu$.
Example. For the metric on $S^2$ defined above, in the chart $(\theta, \phi)$ we have $g^{\mu\nu} = \mathrm{diag}(1, 1/\sin^2\theta)$.
Hence
$$\eta_{\mu\nu} A^\mu{}_\rho A^\nu{}_\sigma = \eta_{\rho\sigma}. \tag{2.10}$$
These are the defining equations of a Lorentz transformation in special relativity. Hence
different orthonormal bases at $p$ are related by Lorentz transformations. We saw earlier
that the components of a vector at $p$ transform as $X'^\mu = A^\mu{}_\nu X^\nu$, which is the same
as the transformation law of the components of a vector in special relativity. A similar
result holds for tensors of other types. Thus, on a Lorentzian manifold, tensor components
w.r.t. different orthonormal bases at $p$ are related in exactly the same way as in
special relativity.
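Condition (2.10) is easy to verify for a concrete matrix. The sketch below assumes the standard form of a boost with speed $v$ along the $x$-axis (in units with $c = 1$) and checks that it preserves $\eta_{\mu\nu} = \mathrm{diag}(-1, 1, 1, 1)$; the function names `boost_x` and `lorentz_lhs` are hypothetical.

```python
import math

ETA = [[-1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def boost_x(v):
    """Matrix A[mu][nu] = A^mu_nu of a boost with speed v (|v| < 1) along x."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return [[gamma, -gamma * v, 0, 0],
            [-gamma * v, gamma, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def lorentz_lhs(A):
    """Left-hand side of (2.10): eta_{mu nu} A^mu_rho A^nu_sigma, indexed [rho][sigma]."""
    return [[sum(ETA[m][n] * A[m][r] * A[n][s]
                 for m in range(4) for n in range(4))
             for s in range(4)] for r in range(4)]

result = lorentz_lhs(boost_x(0.6))
```

Up to rounding, `result` reproduces `ETA`, so the boost satisfies (2.10).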
Definition. On a Lorentzian manifold $(M, g)$, a vector $X \in T_p(M)$ is timelike if
$g(X, X) < 0$, null (or lightlike) if $X \ne 0$ and $g(X, X) = 0$, and spacelike if $g(X, X) > 0$.
A vector is causal if it is timelike or null.
Remark. In an orthonormal basis at p, the metric has components ηµν so the tangent
space at p has exactly the same structure as Minkowski spacetime, i.e., null vectors at p
define a double cone, the light cone, that separates timelike vectors at p from spacelike
vectors at p (see Fig. 8). Note that causal vectors at p fall into two disconnected sets.
[Figure 8: the light cone at $p$, with the regions of timelike, null and spacelike vectors labelled.]
Exercise (Examples sheet 1). Let $X^a$, $Y^b$ be non-zero vectors at $p$ that are orthogonal,
i.e., $g_{ab} X^a Y^b = 0$. Show that (i) if $X^a$ is timelike then $Y^a$ is spacelike; (ii) if $X^a$
is null then $Y^a$ is spacelike or null; (iii) if $X^a$ is spacelike then $Y^a$ can be spacelike,
Definition. If proper time $\tau$ is used to parametrize a timelike curve then the tangent
$u^a$ to the curve is called the 4-velocity of the curve. In a coordinate basis, it has
components $u^\mu = dx^\mu/d\tau$.
Remark. (2.12) implies that 4-velocity is a unit timelike vector:
with respect to $u$. The proper time between $p$ and $q$ along such a curve is given by the
functional
$$\tau[\lambda] = \int_0^1 du\, G(x(u), \dot{x}(u)) \tag{2.15}$$
where
$$G(x(u), \dot{x}(u)) \equiv \sqrt{-g_{\mu\nu}(x(u))\, \dot{x}^\mu(u)\, \dot{x}^\nu(u)} \tag{2.16}$$
and we are writing $x^\mu(u)$ as a shorthand for $x^\mu(\lambda(u))$.
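The curve that extremizes the proper time must satisfy the Euler-Lagrange equation
$$\frac{d}{du}\left(\frac{\partial G}{\partial \dot{x}^\mu}\right) - \frac{\partial G}{\partial x^\mu} = 0 \tag{2.17}$$
Working out the various terms, we have (using the symmetry of the metric)
$$\frac{\partial G}{\partial \dot{x}^\mu} = -\frac{1}{2G}\, 2 g_{\mu\nu} \dot{x}^\nu = -\frac{1}{G}\, g_{\mu\nu} \dot{x}^\nu \tag{2.18}$$
$$\frac{\partial G}{\partial x^\mu} = -\frac{1}{2G}\, g_{\nu\rho,\mu}\, \dot{x}^\nu \dot{x}^\rho \tag{2.19}$$
where we have relabelled some dummy indices, and introduced the important notation
of a comma to denote partial differentiation:
$$g_{\nu\rho,\mu} \equiv \frac{\partial}{\partial x^\mu}\, g_{\nu\rho} \tag{2.20}$$
We will be using this notation a lot henceforth.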
So far, our parameter $u$ has been arbitrary subject to the conditions $\lambda(0) = p$ and
$\lambda(1) = q$. At this stage, it is convenient to use a more physical parameter, namely $\tau$,
the proper time along the curve. (Note that we could not have used $\tau$ from the outset
since the value of $\tau$ at $q$ is different for different curves, which would make the range
of integration different for different curves.) The parameters are related by
$$\left(\frac{d\tau}{du}\right)^2 = -g_{\mu\nu} \dot{x}^\mu \dot{x}^\nu = G^2 \tag{2.21}$$
and hence $d\tau/du = G$. So in our equations above, we can replace $d/du$ with $G\, d/d\tau$, so
the Euler-Lagrange equation becomes (after cancelling a factor of $-G$)
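In the second term, we can replace $g_{\mu\nu,\rho}$ with $g_{\mu(\nu,\rho)}$ because it is contracted with an
object symmetrical on $\nu$ and $\rho$. Finally, contracting the whole expression with the
inverse metric and relabelling indices gives
$$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho} \frac{dx^\nu}{d\tau} \frac{dx^\rho}{d\tau} = 0 \tag{2.24}$$
where $\Gamma^\mu_{\nu\rho}$ are known as the Christoffel symbols, and are defined by
$$\Gamma^\mu_{\nu\rho} = \frac{1}{2} g^{\mu\sigma} \left(g_{\sigma\nu,\rho} + g_{\sigma\rho,\nu} - g_{\nu\rho,\sigma}\right). \tag{2.25}$$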
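As a cross-check on the passage from the Euler-Lagrange equation (2.17) to the geodesic equation (2.24), the intermediate algebra can be written out as below. This is a reconstruction from (2.18)-(2.21) and (2.25); the grouping of the steps is an assumption and need not match the form used in the lectures.

```latex
\begin{align*}
% Insert (2.18) and (2.19) into (2.17), replace d/du by G d/d\tau and
% \dot{x}^\mu by G dx^\mu/d\tau, then divide by -G:
&\frac{d}{d\tau}\left(g_{\mu\nu}\frac{dx^\nu}{d\tau}\right)
  - \frac{1}{2}\, g_{\nu\rho,\mu}\frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau} = 0 \\
% Expand the first term using the chain rule on g_{\mu\nu}(x(\tau)):
&g_{\mu\nu}\frac{d^2 x^\nu}{d\tau^2}
  + g_{\mu\nu,\rho}\frac{dx^\rho}{d\tau}\frac{dx^\nu}{d\tau}
  - \frac{1}{2}\, g_{\nu\rho,\mu}\frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau} = 0 \\
% Symmetrize g_{\mu\nu,\rho} on \nu and \rho:
&g_{\mu\nu}\frac{d^2 x^\nu}{d\tau^2}
  + \frac{1}{2}\left(g_{\mu\nu,\rho} + g_{\mu\rho,\nu} - g_{\nu\rho,\mu}\right)
    \frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau} = 0
\end{align*}
```

Contracting the last line with $g^{\sigma\mu}$ and relabelling indices reproduces (2.24) with the Christoffel symbols (2.25).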
Remarks. 1. $\Gamma^\mu_{\nu\rho} = \Gamma^\mu_{\rho\nu}$. 2. The Christoffel symbols are not tensor components.
Neither the first term nor the second term in (2.24) gives the components of a vector but
the sum of these two terms does give vector components. 3. Equation (2.24) is called
the geodesic equation and its solutions are called geodesics. Geodesics will be discussed
more generally below. 4. We obtain exactly the same equation if we consider curves in
a Riemannian manifold, or spacelike curves in a Lorentzian manifold, that extremize
proper length.
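Formula (2.25) can be evaluated numerically for a given metric. The sketch below is illustrative only: it uses the unit round metric on $S^2$ in the chart $(\theta, \phi)$, approximates $g_{\mu\nu,\rho}$ by central differences, and relies on the metric being diagonal to write down the inverse. The function names and the sample point are assumptions.

```python
import math

def metric(x):
    """Round metric on S^2 in the chart x = (theta, phi): diag(1, sin^2 theta)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def inverse_metric(x):
    """Inverse of the diagonal metric above: diag(1, 1/sin^2 theta)."""
    return [[1.0, 0.0], [0.0, 1.0 / math.sin(x[0]) ** 2]]

def christoffel(x, h=1e-5):
    """Gamma[mu][nu][rho] from (2.25), with g_{ab,c} estimated by central differences."""
    n = len(x)
    dg = []                                   # dg[c][a][b] approximates g_{ab,c}
    for c in range(n):
        up = list(x); up[c] += h
        dn = list(x); dn[c] -= h
        gu, gd = metric(up), metric(dn)
        dg.append([[(gu[a][b] - gd[a][b]) / (2 * h) for b in range(n)]
                   for a in range(n)])
    ginv = inverse_metric(x)
    return [[[0.5 * sum(ginv[mu][s] * (dg[rho][s][nu] + dg[nu][s][rho] - dg[s][nu][rho])
                        for s in range(n))
              for rho in range(n)] for nu in range(n)] for mu in range(n)]

theta = 1.0
Gamma = christoffel([theta, 0.5])
```

For this metric the non-vanishing symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$, which the finite-difference estimate reproduces to within discretization error.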
Example. In Minkowski spacetime, the components of the metric in an inertial frame
are constant so $\Gamma^\mu_{\nu\rho} = 0$. Hence the above equation reduces to $d^2 x^\mu/d\tau^2 = 0$, which
is the equation of a straight line. Thus, in Minkowski spacetime, timelike curves of
extremal proper time are straight lines. It can be shown that these lines maximize the
proper time between two points. In a general spacetime, this is true only locally, i.e.,
for any point $p$ there exists a neighbourhood of $p$ within which timelike geodesics are
curves that maximize proper time.
Exercise. Show that (2.24) can be obtained more directly as the Euler-Lagrange
equation for the Lagrangian
$$L = -g_{\mu\nu}(x(\tau)) \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} \tag{2.26}$$
This is usually the easiest way to derive (2.24) or to calculate the Christoffel symbols.
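Example. The Schwarzschild metric in Schwarzschild coordinates $(t, r, \theta, \phi)$ is
$$ds^2 = -f\, dt^2 + f^{-1} dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2, \qquad f = 1 - \frac{2M}{r} \tag{2.27}$$
where $M$ is a constant. We have
$$L = f \left(\frac{dt}{d\tau}\right)^2 - f^{-1} \left(\frac{dr}{d\tau}\right)^2 - r^2 \left(\frac{d\theta}{d\tau}\right)^2 - r^2 \sin^2\theta \left(\frac{d\phi}{d\tau}\right)^2 \tag{2.28}$$
so the Euler-Lagrange equation for $t(\tau)$ is
$$\frac{d}{d\tau}\left(2f \frac{dt}{d\tau}\right) = 0 \quad\Rightarrow\quad \frac{d^2 t}{d\tau^2} + f^{-1} f' \frac{dt}{d\tau} \frac{dr}{d\tau} = 0 \tag{2.29}$$
where $f' = df/dr$.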
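To illustrate how Christoffel symbols are read off from such an equation, one can compare (2.29) with the $\mu = t$ component of the geodesic equation (2.24). The following worked step is a sketch added for illustration; it assumes $f' = df/dr = 2M/r^2$, which follows from (2.27).

```latex
\begin{align*}
% \mu = t component of (2.24); the (t, r) cross term appears twice because
% \Gamma^t_{tr} = \Gamma^t_{rt}:
\frac{d^2 t}{d\tau^2}
  + 2\,\Gamma^t_{tr}\,\frac{dt}{d\tau}\frac{dr}{d\tau} + \ldots = 0
\quad &\Longrightarrow \quad
\Gamma^t_{tr} = \Gamma^t_{rt} = \frac{f'}{2f} = \frac{M}{r\,(r - 2M)}
\end{align*}
```

Since no other products of first derivatives appear in (2.29), all other components $\Gamma^t_{\nu\rho}$ vanish in these coordinates.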
3 Covariant derivative
3.1 Introduction
To formulate physical laws, we need to be able to differentiate tensor fields. For scalar
fields, partial differentiation is fine: $f_{,\mu} \equiv \partial f/\partial x^\mu$ are the components of the covector
field $(df)_a$. However, for tensor fields, partial differentiation is no good because the
partial derivative of a tensor field does not give another tensor field:
Exercise. Let $V^a$ be a vector field. In any coordinate chart, let $T^\mu{}_\nu = V^\mu{}_{,\nu} \equiv \partial V^\mu/\partial x^\nu$. Show that $T^\mu{}_\nu$ do not transform as tensor components under a change of
chart.
The problem is that differentiation involves comparing a tensor at two infinitesi-
mally nearby points of the manifold. But we have seen that this does not make sense:
tensors at different points belong to different spaces. The mathematical structure that
overcomes this difficulty is called a covariant derivative or connection.
Definition. A covariant derivative $\nabla$ on a manifold $M$ is a map sending every pair of
smooth vector fields $X, Y$ to a smooth vector field $\nabla_X Y$, with the following properties
(where $X, Y, Z$ are vector fields and $f, g$ are functions)
$$\nabla_{fX + gY} Z = f \nabla_X Z + g \nabla_Y Z, \tag{3.1}$$
$$\nabla_X (Y + Z) = \nabla_X Y + \nabla_X Z, \tag{3.2}$$
$$\nabla_X (fY) = f \nabla_X Y + (\nabla_X f) Y, \quad \text{(Leibniz rule)}, \tag{3.3}$$
where the action of $\nabla$ on functions is defined by
$$\nabla_X f = X(f). \tag{3.4}$$
Remark. (3.1) implies that, at any point, the map $\nabla Y : X \mapsto \nabla_X Y$ is a linear map
from $T_p(M)$ to itself. Hence it defines a (1, 1) tensor (see examples sheet 1). More
precisely, if $\eta \in T_p^*(M)$ and $X \in T_p(M)$ then we define $(\nabla Y)(\eta, X) \equiv \eta(\nabla_X Y)$.
Definition. Let $Y$ be a vector field. The covariant derivative of $Y$ is the (1, 1) tensor
field $\nabla Y$. In abstract index notation we usually write $(\nabla Y)^a{}_b$ as $\nabla_b Y^a$ or $Y^a{}_{;b}$.
Remarks.
Example. Pick a coordinate chart on M . Let ∇ be the partial derivative in this chart.
This satisfies all of the above conditions. This is not a very interesting example of
a covariant derivative because it depends on choosing a particular chart: if we use a
different chart then this covariant derivative will not be the partial derivative in the
new chart.
Definition. In a basis {e_μ} the connection components Γ^μ_{νρ} are defined by
$$ \nabla_{e_\rho} e_\nu = \Gamma^\mu_{\nu\rho}\, e_\mu $$
Example. The Christoffel symbols are the coordinate basis components of a certain
connection, the Levi-Civita connection, which is defined on any manifold with a metric.
More about this soon.
Write X = X µ eµ and Y = Y µ eµ . Now
and hence
(∇X Y )µ = X ν eν (Y µ ) + Γµρν Y ρ X ν (3.7)
so
Y µ ;ν = eν (Y µ ) + Γµρν Y ρ (3.8)
In a coordinate basis, this reduces to
Y µ ;ν = Y µ ,ν + Γµρν Y ρ (3.9)
The presence of the second term demonstrates that Γµνρ are not tensor components.
Hence neither term on the RHS of equation (3.8) transforms as a tensor. However, the
sum of these two terms does transform as a tensor.
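The step leading to (3.7) can be sketched as follows. This is a reconstruction under the assumption that the connection components are defined by ∇_{e_ν} e_ρ = Γ^μ_{ρν} e_μ (the index placement implied by (3.7)); it uses only the Leibniz rule (3.3) and the linearity of ∇_X Y in X.

```latex
\begin{align*}
\nabla_X Y &= \nabla_X (Y^\mu e_\mu)
            = X(Y^\mu)\, e_\mu + Y^\mu \nabla_X e_\mu
            && \text{(Leibniz rule (3.3))} \\
           &= X^\nu e_\nu(Y^\mu)\, e_\mu + Y^\rho X^\nu\, \nabla_{e_\nu} e_\rho
            && \text{(linearity in } X = X^\nu e_\nu\text{)} \\
           &= \left( X^\nu e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu}\, Y^\rho X^\nu \right) e_\mu .
\end{align*}
```

Stripping off the arbitrary X^ν then gives the components (3.8) of the (1, 1) tensor ∇Y.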
Exercise. Let ∇ and ∇̃ be two different connections on M. Show that ∇ − ∇̃ is a
(1, 2) tensor field. You can do this either from the definition of a connection, or from
the transformation law for the connection components.
The action of ∇ is extended to general tensor fields by the Leibniz property. If T
is a tensor field of type (r, s) then ∇T is a tensor field of type (r, s + 1). For example,
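if η is a covector field then, for any vector fields X and Y, we define
$$ (\nabla_X \eta)(Y) \equiv X(\eta(Y)) - \eta(\nabla_X Y). $$
It is not obvious that this defines a (0, 2) tensor but we can see this as follows:
$$ (\nabla_X\eta)(Y) = X(\eta_\mu Y^\mu) - \eta_\mu(\nabla_X Y)^\mu = X(\eta_\mu)\,Y^\mu + \eta_\mu X(Y^\mu) - \eta_\mu X^\nu e_\nu(Y^\mu) - \eta_\mu \Gamma^\mu_{\rho\nu} Y^\rho X^\nu, $$
where we used (3.7). Now, the second and third terms cancel (X = X^ν e_ν) and hence
(renaming dummy indices in the final term)
$$ (\nabla_X\eta)(Y) = \left(e_\nu(\eta_\mu) - \Gamma^\rho_{\mu\nu}\,\eta_\rho\right) X^\nu Y^\mu. $$
This is linear in both X and Y, so ∇η is a (0, 2) tensor with components
$$ \eta_{\mu;\nu} = e_\nu(\eta_\mu) - \Gamma^\rho_{\mu\nu}\,\eta_\rho. \tag{3.16} $$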
Remark. We are using a comma and semi-colon to denote partial, and covariant,
derivatives respectively. If more than one index appears after a comma or semi-colon
then the derivative is to be taken with respect to all indices. The index nearest to
comma/semi-colon is the first derivative to be taken. For example, f,µν = f,µ,ν ≡ ∂ν ∂µ f ,
and X a ;bc = ∇c ∇b X a (we cannot use abstract indices for the first example since it is
not a tensor). The second partial derivatives of a function commute: f,µν = f,νµ but
for a covariant derivative this is not true in general. Set η = df in (3.16) to get, in a
coordinate basis,
f;µν = f,µν − Γρµν f,ρ (3.18)
Antisymmetrizing gives
f;[µν] = −Γρ[µν] f,ρ (coordinate basis) (3.19)
Definition. A connection ∇ is torsion-free if ∇a ∇b f = ∇b ∇a f for any function f .
From (3.19), this is equivalent to
Γρ[µν] = 0 (coordinate basis) (3.20)
where we used the Leibniz rule and ∇X g = 0 in the second equality. Permuting X, Y, Z
leads to two similar identities:
∇X Y − ∇Y X = [X, Y ] (3.27)
Hence
$$ g(\nabla_X Y, Z) = \frac{1}{2}\Big[ X(g(Y,Z)) + Y(g(Z,X)) - Z(g(X,Y)) + g([X,Y],Z) + g([Z,X],Y) - g([Y,Z],X) \Big] \tag{3.29} $$
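The steps leading to (3.29) can be sketched as follows, assuming the two defining properties of the Levi-Civita connection: metric compatibility, ∇g = 0, and vanishing torsion in the form (3.27). The labels (i) to (iii) are introduced here for illustration only.

```latex
\begin{align*}
\text{(i)}\quad   X(g(Y,Z)) &= g(\nabla_X Y, Z) + g(Y, \nabla_X Z), \\
\text{(ii)}\quad  Y(g(Z,X)) &= g(\nabla_Y Z, X) + g(Z, \nabla_Y X), \\
\text{(iii)}\quad Z(g(X,Y)) &= g(\nabla_Z X, Y) + g(X, \nabla_Z Y).
\end{align*}
% Form (i) + (ii) - (iii), then use (3.27) to eliminate
% \nabla_Y X, \nabla_X Z - \nabla_Z X and \nabla_Y Z - \nabla_Z Y:
\begin{align*}
X(g(Y,Z)) + Y(g(Z,X)) - Z(g(X,Y))
 &= g(\nabla_X Y + \nabla_Y X, Z) + g(\nabla_X Z - \nabla_Z X, Y) + g(\nabla_Y Z - \nabla_Z Y, X) \\
 &= 2\, g(\nabla_X Y, Z) - g([X,Y],Z) - g([Z,X],Y) + g([Y,Z],X).
\end{align*}
```

Solving the last line for g(∇_X Y, Z) reproduces (3.29). Since Z is arbitrary and g is non-degenerate, this determines ∇_X Y uniquely.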
3.3 Geodesics
Previously we considered curves that extremize the proper time between two points of
a spacetime, and showed that this gives the equation
$$ \frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\rho}(x(\tau))\,\frac{dx^\nu}{d\tau}\frac{dx^\rho}{d\tau} = 0, \tag{3.34} $$
where τ is the proper time along the curve. The tangent vector X a to the curve has
components X µ = dxµ /dτ . This is defined only along the curve. However, we can
extend X a (in an arbitrary way) to a neighbourhood of the curve, so that X a becomes
a vector field, and the curve is an integral curve of this vector field. The chain rule
gives
$$ \frac{d^2x^\mu}{d\tau^2} = \frac{dX^\mu(x(\tau))}{d\tau} = \frac{dx^\nu}{d\tau}\frac{\partial X^\mu}{\partial x^\nu} = X^\nu X^\mu_{,\nu}. \tag{3.35} $$
Note that the LHS is independent of how we extend X^a, hence so must be the RHS.
We can now write (3.34) as
$$ X^\nu \left( X^\mu_{,\nu} + \Gamma^\mu_{\nu\rho} X^\rho \right) = 0 \tag{3.36} $$
which is the same as
$$ X^\nu X^\mu_{;\nu} = 0, \quad \text{or} \quad \nabla_X X = 0, \tag{3.37} $$
where we are using the Levi-Civita connection. We now extend this to an arbitrary
connection:
Definition. Let M be a manifold with a connection ∇. An affinely parameterized
geodesic is an integral curve of a vector field X satisfying ∇X X = 0.
Remarks.
1. What do we mean by “affinely parameterized”? Consider a curve with parameter
t whose tangent X satisfies the above definition. Let u be some other parameter
for the curve, so t = t(u) and dt/du > 0. Then the tangent vector becomes
Y = hX where h = dt/du. Hence
$$ \nabla_Y Y = \nabla_{hX}(hX) = h\nabla_X(hX) = h^2\nabla_X X + X(h)\,hX = fY, \tag{3.38} $$
where f = X(h) = dh/dt. Hence ∇_Y Y = fY describes the same geodesic. In
this case, the geodesic is not affinely parameterized.
It is always possible to find an affine parameter so there is no loss of generality
in restricting to affinely parameterized geodesics. Note that the new parameter
is also affine iff X(h) = 0, i.e., h is constant. Then u = at + b where a and b are
constants with a > 0 (a = h^{-1}). Hence there is a 2-parameter family of affine
parameters for any geodesic.
2. Reversing the above steps shows that, in a coordinate chart, for any connec-
tion, the geodesic equation can be written as (3.34) with τ an arbitrary affine
parameter.
3. In a spacetime, curves of extremal proper time are timelike geodesics (with ∇ the
Levi-Civita connection). One can also consider geodesics which are not timelike.
These satisfy (3.34) with τ an affine parameter. The easiest way to obtain this
equation is to use the Lagrangian (2.26).
This is a coupled system of n ordinary differential equations for the n functions xµ (t).
Existence and uniqueness is guaranteed by the standard theory of ordinary differential
equations.
Exercise. Let X be tangent to an affinely parameterized geodesic of the Levi-Civita
connection. Show that ∇X (g(X, X)) = 0 and hence g(X, X) is constant along the
geodesic. Therefore the tangent vector cannot change e.g. from timelike to null along
the geodesic: a geodesic is either timelike, spacelike or null.
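As a numerical illustration of the statement in this exercise (not a proof), the sketch below integrates the geodesic equation on the unit 2-sphere, ds² = dθ² + sin²θ dφ², and checks that g(X, X) stays constant along the curve. The choice of manifold, the RK4 integrator, the initial data and all function names are assumptions made for the example; the Christoffel symbols used are Γ^θ_{φφ} = −sin θ cos θ and Γ^φ_{θφ} = cot θ.

```python
import math

def rhs(state):
    """Geodesic equation on the unit sphere as a first-order system.

    state = (theta, phi, dtheta, dphi), derivatives w.r.t. an affine parameter.
    """
    th, _, dth, dph = state
    return (dth,
            dph,
            math.sin(th) * math.cos(th) * dph * dph,          # -Gamma^theta_{phi phi} dphi^2
            -2.0 * math.cos(th) / math.sin(th) * dth * dph)   # -2 Gamma^phi_{theta phi} dtheta dphi

def rk4_step(state, h):
    """One classical Runge-Kutta step of size h."""
    def shifted(s, k, c):
        return tuple(a + c * b for a, b in zip(s, k))
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, h / 2.0))
    k3 = rhs(shifted(state, k2, h / 2.0))
    k4 = rhs(shifted(state, k3, h))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def norm_sq(state):
    """g(X, X) for the tangent X = (dtheta, dphi)."""
    th, _, dth, dph = state
    return dth * dth + (math.sin(th) * dph) ** 2

state = (1.0, 0.0, 0.3, 0.7)      # illustrative initial point and tangent
n0 = norm_sq(state)
for _ in range(2000):
    state = rk4_step(state, 0.001)
n1 = norm_sq(state)
print(n0, n1)
```

Up to integration error the two printed values agree, as expected for an affinely parameterized geodesic of the Levi-Civita connection.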
Postulate. In GR, free particles move on geodesics (of the Levi-Civita connection).
These are timelike for massive particles, and null for massless particles (e.g. photons).
Remark. In the timelike case we can use proper time as an affine parameter. This
imposes the additional restriction g(X, X) = −1. If τ and τ′ are both proper times
along a geodesic then τ′ = τ + b (i.e. a = 1 above). In other words, clocks measuring
proper time differ only by their choice of zero. In particular, they measure equal time
intervals. Similarly in the spacelike case (or on a Riemannian manifold), we use arc
length s as affine parameter, which gives g(X, X) = 1 and s′ = s + b. In the null case,
there is no analogue of proper time or arc length and so there is a 2-parameter freedom
in choice of affine parameterization.
Now symmetrize on µν: the final two terms cancel and the result follows.
Remark. Again, we emphasize, this is valid only at the point p. At any point, we can
introduce normal coordinates to make the first partial derivatives of the metric vanish
at that point. They will not vanish away from that point.
Lemma. On a manifold with metric one can choose normal coordinates at p so that
gµν,ρ (p) = 0 and also gµν (p) = ηµν (Lorentzian case) or gµν (p) = δµν (Riemannian case).
4 Curvature
4.1 Parallel transport
On a general manifold there is no way of comparing tensors at different points. For
example, we can’t say whether a vector at p is the same as a vector at q. However, with
a connection we can define a notion of “a tensor that doesn’t change along a curve”:
Definition. Let X a be the tangent to a curve. A tensor field T is parallelly transported
along the curve if ∇X T = 0.
Remarks.
Consider Euclidean space or Minkowski spacetime with the Levi-Civita connection, and
use Cartesian/inertial frame coordinates so the Christoffel symbols vanish everywhere.
Then a tensor is parallelly transported along a curve iff its components are constant
along the curve. Hence if we have two different curves from p to q then the result of
parallelly transporting T from p to q is independent of which curve we choose. However,
in a general spacetime this is no longer true: parallel transport is path-dependent. The
path-dependence of parallel transport is measured by the Riemann curvature tensor.
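Path-dependence is equivalent to the statement that transporting a vector around a closed loop need not return the original vector. A minimal sketch of this, assuming the unit 2-sphere with its Levi-Civita connection and the circle of constant θ = θ0 as the loop: the transport equation dV^μ/dt + Γ^μ_{ρν} V^ρ X^ν = 0 with tangent X = ∂/∂φ is integrated with RK4. The function name and the choice θ0 = π/3 are illustrative assumptions.

```python
import math

def transport_around_latitude(theta0, v_theta, v_phi, steps=2000):
    """Parallel transport (V^theta, V^phi) once around theta = theta0 on the unit sphere."""
    s, c = math.sin(theta0), math.cos(theta0)

    def rhs(v):
        vth, vph = v
        # dV^theta/dt = -Gamma^theta_{phi phi} V^phi = sin(theta0) cos(theta0) V^phi
        # dV^phi/dt   = -Gamma^phi_{theta phi} V^theta = -cot(theta0) V^theta
        return (s * c * vph, -(c / s) * vth)

    h = 2.0 * math.pi / steps
    v = (v_theta, v_phi)
    for _ in range(steps):
        k1 = rhs(v)
        k2 = rhs((v[0] + h / 2.0 * k1[0], v[1] + h / 2.0 * k1[1]))
        k3 = rhs((v[0] + h / 2.0 * k2[0], v[1] + h / 2.0 * k2[1]))
        k4 = rhs((v[0] + h * k3[0], v[1] + h * k3[1]))
        v = (v[0] + h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]),
             v[1] + h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]))
    return v

theta0 = math.pi / 3.0
v_final = transport_around_latitude(theta0, 1.0, 0.0)
# Components in the orthonormal frame (e_theta, e_phi / sin(theta))
a = v_final[0]
b = math.sin(theta0) * v_final[1]
# Along the equator (a geodesic) the components are unchanged
v_equator = transport_around_latitude(math.pi / 2.0, 1.0, 0.0)
print(a, b, v_equator)
```

For this loop the orthonormal components rotate through the angle 2π cos θ0, so with θ0 = π/3 the vector comes back reversed, while transport around the equator leaves it unchanged.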
It follows that our definition does indeed define a tensor. Let’s calculate its components
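in a coordinate basis {e_μ = ∂/∂x^μ} (so [e_μ, e_ν] = 0). Use the notation ∇_μ ≡ ∇_{e_μ}:
$$ \begin{aligned}
R(e_\rho, e_\sigma)\,e_\nu &= \nabla_\rho\nabla_\sigma e_\nu - \nabla_\sigma\nabla_\rho e_\nu \\
&= \nabla_\rho(\Gamma^\tau_{\nu\sigma} e_\tau) - \nabla_\sigma(\Gamma^\tau_{\nu\rho} e_\tau) \\
&= \partial_\rho\Gamma^\mu_{\nu\sigma}\, e_\mu + \Gamma^\tau_{\nu\sigma}\Gamma^\mu_{\tau\rho}\, e_\mu - \partial_\sigma\Gamma^\mu_{\nu\rho}\, e_\mu - \Gamma^\tau_{\nu\rho}\Gamma^\mu_{\tau\sigma}\, e_\mu
\end{aligned} \tag{4.5} $$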
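Reading off the coefficient of e_μ in (4.5) gives the coordinate-basis components of the Riemann tensor. The sketch below assumes the index convention R(e_ρ, e_σ)e_ν = R^μ_{νρσ} e_μ, which is the one consistent with the Ricci identity (4.8).

```latex
R^\mu{}_{\nu\rho\sigma}
  = \partial_\rho \Gamma^\mu_{\nu\sigma} - \partial_\sigma \Gamma^\mu_{\nu\rho}
  + \Gamma^\tau_{\nu\sigma}\Gamma^\mu_{\tau\rho} - \Gamma^\tau_{\nu\rho}\Gamma^\mu_{\tau\sigma}
```

At a point p where Γ^μ_{νρ}(p) = 0 (e.g. the origin of normal coordinates for a torsion-free connection) only the two derivative terms survive.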
We saw earlier that, with vanishing torsion, the second covariant derivatives of a
function commute. The same is not true of covariant derivatives of tensor fields. The
failure to commute arises from the Riemann tensor:
Exercise. Let ∇ be a torsion-free connection. Prove the Ricci identity:
∇c ∇d Z a − ∇d ∇c Z a = Ra bcd Z b (4.8)
Hint. Show that the equation is true when multiplied by arbitrary vector fields X c and
Y d.
everywhere, with [X, Y ] = 0. Earlier we saw that we can choose a coordinate chart
(s, t, . . .) such that X = ∂/∂s and Y = ∂/∂t. Let p ∈ M and choose the coordinate
chart such that p has coordinates (0, . . . , 0). Let q, r, u be the points with coordinates
(δs, 0, 0, . . .), (δs, δt, 0, . . .), (0, δt, 0, . . .) respectively, where δs and δt are small. We
can connect p and q with a curve along which only s varies, with tangent X. Similarly,
q and r can be connected by a curve with tangent Y . p and u can be connected by
a curve with tangent Y , and u and r can be connected by a curve with tangent X.
The result is a small quadrilateral (Fig. 9). (Note that if [X, Y] ≠ 0 then the integral
curves of X, Y would not form such a quadrilateral.)
[Figure 9: the quadrilateral with vertices p(0, 0, . . . , 0) and q(δs, 0, . . . , 0), whose sides are integral curves of X and Y.]
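$$ {} = \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho X^\sigma\right)_p \delta s + O(\delta s^2) \tag{4.11} $$
We also have
$$ \left[(\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} Y^\sigma\right]_q = \left[(\Gamma^\mu_{\nu\rho} Z^\nu Y^\rho)_{,\sigma} Y^\sigma\right]_p + O(\delta s) \tag{4.12} $$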
Hence
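$$ \begin{aligned}
Z^\mu_r &= Z^\mu_q - \left[ \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho X^\sigma\right)_p \delta s + O(\delta s^2) \right] \delta t
 - \frac{1}{2}\left[ \left(\Gamma^\mu_{\nu\rho,\sigma} Z^\nu Y^\rho Y^\sigma\right)_p + O(\delta s) \right] \delta t^2 + O(\delta t^3) \\
&= Z^\mu_p - \frac{1}{2}\left(\Gamma^\mu_{\nu\rho,\sigma}\right)_p \left[ Z^\nu \left( X^\rho X^\sigma \delta s^2 + Y^\rho Y^\sigma \delta t^2 + 2 Y^\rho X^\sigma \delta s\,\delta t \right) \right]_p + O(\delta^3)
\end{aligned} \tag{4.13} $$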
Here we assume that δs and δt both are O(δ) (i.e. δs = aδ for some non-zero constant
a and similarly for δt). Now consider parallel transport along pur. The result can be
obtained from the above expression simply by interchanging X with Y and s with t.
Hence we have
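$$ \Delta Z^\mu_r \equiv Z'^\mu_r - Z^\mu_r = \left[ \Gamma^\mu_{\nu\rho,\sigma} Z^\nu \left( Y^\rho X^\sigma - X^\rho Y^\sigma \right) \right]_p \delta s\,\delta t + O(\delta^3) $$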
where we used the expression (4.6) for the Riemann tensor components (remember that
Γµνρ (p) = 0). In the final equality we used that quantities at p and r differ by O(δ).
We have derived this result in a coordinate basis defined using normal coordinates at
p. But now both sides involve tensors at r. Hence our equation is basis-independent
so we can write
$$ \left( R^a{}_{bcd}\, Z^b X^c Y^d \right)_r = \lim_{\delta \to 0} \frac{\Delta Z^a_r}{\delta s\,\delta t} \tag{4.15} $$
Ra b(cd) = 0. (4.16)
Ra [bcd] = 0. (4.17)
Ra b[cd;e] = 0 (4.18)
In normal coordinates at p we have, schematically, ∂R = ∂∂Γ − Γ∂Γ. The latter terms
vanish at p, so we only need to worry about the former:
Antisymmetrizing gives Rµ ν[ρσ;τ ] = 0 at p in this basis. But again, if this is true in one
basis then it is true in any basis. Furthermore, p is arbitrary. The result follows.
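[Figure: a 1-parameter family of geodesics, showing curves of constant s and constant t, the tangent vector T and the deviation vector S.]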
geodesics are specified by xµ (s, t) with S µ = ∂xµ /∂s. Hence xµ (s + δs, t) = xµ (s, t) +
δsS µ (s, t) + O(δs2 ). Therefore (δs)S a points from one geodesic to an infinitesimally
nearby one in the family. We call S a a deviation vector.
On the surface Σ we can use s and t as coordinates. We can extend these to
coordinates (s, t, . . .) defined in a neighbourhood of Σ. This gives a coordinate chart
in which S = ∂/∂s and T = ∂/∂t on Σ. We can now use these equations to extend S
and T to a neighbourhood of the surface. S and T are now vector fields satisfying
[S, T ] = 0 (4.21)
Remark. If we fix attention on a particular geodesic then ∇T (δsS) = δs∇T S can be re-
garded as the rate of change of the relative position of an infinitesimally nearby geodesic
Remarks.
1. Note that the relative acceleration vanishes for all families of geodesics if, and
only if, the connection is flat, i.e., Ra bcd = 0.
Remark. From now on, we shall restrict attention to a manifold with metric, and use
the Levi-Civita connection. The Riemann tensor then enjoys additional symmetries.
Note that we can lower an index with the metric to define Rabcd .
Proposition. The Riemann tensor satisfies
$$ R_{abcd} = R_{cdab}, \qquad R_{(ab)cd} = 0 $$
Proof. The second identity follows from the first and the antisymmetry of the Riemann
tensor. To prove the first, introduce normal coordinates at p, so ∂µ gνρ = 0 at p. Then,
at p,
0 = ∂µ δρν = ∂µ (g νσ gσρ ) = gσρ ∂µ g νσ . (4.27)
Multiplying by the inverse metric gives ∂µ g νρ = 0 at p. Using this, we have
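$$ \partial_\rho \Gamma^\tau_{\nu\sigma} = \frac{1}{2}\, g^{\tau\mu} \left( g_{\mu\nu,\sigma\rho} + g_{\mu\sigma,\nu\rho} - g_{\nu\sigma,\mu\rho} \right) \quad \text{at } p \tag{4.28} $$
And hence (as Γ^μ_{νρ} = 0 at p)
$$ R_{\mu\nu\rho\sigma} = \frac{1}{2} \left( g_{\mu\sigma,\nu\rho} + g_{\nu\rho,\mu\sigma} - g_{\nu\sigma,\mu\rho} - g_{\mu\rho,\nu\sigma} \right) \quad \text{at } p \tag{4.29} $$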
This satisfies Rµνρσ = Rρσµν at p using the symmetry of the metric and the fact that
partial derivatives commute. This establishes the identity in normal coordinates, but
this is a tensor equation and hence valid in any basis. Furthermore p is arbitrary so
the identity holds everywhere.
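The step from (4.28) to (4.29) can be spelled out as follows. This sketch assumes the coordinate expression R^μ_{νρσ} = ∂_ρ Γ^μ_{νσ} − ∂_σ Γ^μ_{νρ} at p, i.e. the Riemann components with the ΓΓ terms dropped because Γ vanishes at p.

```latex
\begin{align*}
R_{\mu\nu\rho\sigma}
 &= g_{\mu\tau}\left(\partial_\rho \Gamma^\tau_{\nu\sigma} - \partial_\sigma \Gamma^\tau_{\nu\rho}\right) \\
 &= \tfrac12\left(g_{\mu\nu,\sigma\rho} + g_{\mu\sigma,\nu\rho} - g_{\nu\sigma,\mu\rho}\right)
  - \tfrac12\left(g_{\mu\nu,\rho\sigma} + g_{\mu\rho,\nu\sigma} - g_{\nu\rho,\mu\sigma}\right) \\
 &= \tfrac12\left(g_{\mu\sigma,\nu\rho} + g_{\nu\rho,\mu\sigma} - g_{\nu\sigma,\mu\rho} - g_{\mu\rho,\nu\sigma}\right).
\end{align*}
```

Under the exchange (μ, ν) ↔ (ρ, σ) the first two terms swap with each other and the last two are each mapped to themselves, which is the symmetry R_{μνρσ} = R_{ρσμν} used in the proof.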
Proposition. The Ricci tensor is symmetric:
$$ R_{ab} = R_{ba} $$
Proof. R_{ab} = g^{cd} R_{dacb} = g^{cd} R_{cbda} = R^c_{bca} = R_{ba}, where we used the first identity above
in the second equality.
Definition. The Ricci scalar is
R = g ab Rab (4.31)
Definition. The Einstein tensor is the symmetric (0, 2) tensor defined by
$$ G_{ab} = R_{ab} - \frac{1}{2} R\, g_{ab} \tag{4.32} $$
∇a Gab = 0 (4.33)
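Equation (4.33) is the contracted Bianchi identity. One way to obtain it, sketched here under the assumptions that R_{ab} = R^c_{acb}, that the Riemann tensor has the symmetries of the Levi-Civita case, and that ∇g = 0, is to contract the Bianchi identity (4.18) twice with the inverse metric.

```latex
% (4.18) written out, using antisymmetry in the last index pair:
\nabla_e R_{abcd} + \nabla_c R_{abde} + \nabla_d R_{abec} = 0 .
% Contract with g^{ac} g^{bd}:
\begin{align*}
g^{ac} g^{bd}\, \nabla_e R_{abcd} &= \nabla_e R, \\
g^{ac} g^{bd}\, \nabla_c R_{abde} &= -\nabla^a R_{ae}, \\
g^{ac} g^{bd}\, \nabla_d R_{abec} &= -\nabla^b R_{be},
\end{align*}
% so that
\nabla_e R - 2 \nabla^a R_{ae} = 0
\quad\Longleftrightarrow\quad
\nabla^a \left( R_{ae} - \tfrac12 R\, g_{ae} \right) = \nabla^a G_{ae} = 0 .
```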
[Figure: a curve λ through p in M with tangent X is mapped to the curve φ ∘ λ through φ(p) in N with tangent φ_*(X).]
$$ (\phi^*(df))(X) = (df)(\phi_*(X)) = (\phi_*(X))(f) = X(\phi^*(f)) = (d(\phi^*(f)))(X) \tag{5.3} $$
The first equality is the definition of φ^*, the second is the definition of df, the third is
the previous Lemma and the fourth is the definition of d(φ^*(f)). Since X is arbitrary,
the result follows.
Exercise. Use coordinates x^μ and y^α as before. Show that the components of φ^*(η)
are related to the components of η by
$$ (\phi^*(\eta))_\mu = \left( \frac{\partial y^\alpha}{\partial x^\mu} \right)_p \eta_\alpha \tag{5.4} $$
Remark. The pull-back can be extended to a tensor S of type (0, s) by defin-
5.2 Diffeomorphisms
1. Convince yourself that push-forward commutes with the contraction and outer
product operations.
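2. Show that the analogue of equation (5.6) for a (1, 1) tensor field is
$$ \left[ (\phi_*(T))^\mu{}_\nu \right]_{\phi(p)} = \left( \frac{\partial y^\mu}{\partial x^\rho} \right)_p \left( \frac{\partial x^\sigma}{\partial y^\nu} \right)_p \left( T^\rho{}_\sigma \right)_p \tag{5.8} $$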
(We don’t need to use indices α, β etc because now M and N have the same
dimension.) Generalize this result to a (r, s) tensor.
Remarks.
1. Pull-back can be defined in a similar way, with the result φ^* = (φ^{-1})_*.
[Figure: the diffeomorphism φ maps p ∈ M, with coordinates x^μ, to φ(p) ∈ N, with coordinates y^μ.]
where X is a vector field and T a tensor field on N . (In words: pull-back X and T to
M , evaluate the covariant derivative there and then push-forward the result to N .)
Exercises (optional!).
1. Note that φ_0 is the identity map and φ_s ∘ φ_t = φ_{s+t}. Hence φ_{−t} = (φ_t)^{−1}. If φ_t is
defined for all t ∈ R (in which case we say the integral curves of X are complete)
then these diffeomorphisms form a 1-parameter abelian group.
3. If we use (φt )∗ to compare tensors at different points then the parameter t controls
how near the points are. In particular, in the limit t → 0, we are comparing
tensors at infinitesimally nearby points. This leads to the notion of a new type
of derivative:
Definition. The Lie derivative of a tensor field T with respect to a vector field X is a
tensor field given by
$$ (\mathcal{L}_X T)_p = \lim_{t \to 0} \frac{ \left( (\phi_{-t})_* T \right)_p - T_p }{t} \tag{5.10} $$
Remarks. Recall that (φ_{−t})_* = (φ_t)^* so we can also phrase the definition in terms of
pull-back. The Lie derivative wrt X is a map from (r, s) tensor fields to (r, s) tensor
fields. It obeys LX (αS + βT ) = αLX S + βLX T where α and β are constants.
The easiest way to demonstrate other properties of the Lie derivative is to introduce
coordinates in which the components of X are simple. Let Σ be a hypersurface that
has the property that X is nowhere tangent to Σ (in particular X ≠ 0 on Σ). Let
xi , i = 1, 2, . . . , n − 1 be coordinates on Σ. Now assign coordinates (t, xi ) to the
point parameter distance t along the integral curve of X that starts at the point with
coordinates xi on Σ (Fig. 14).
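[Figure 14: the point (t, x^i) lies at parameter distance t along the integral curve of X through the point x^i on Σ.]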
This defines a coordinate chart (t, xi ) at least for small t, i.e., in a neighbourhood
of Σ. Furthermore, the integral curves of X are the curves (t, xi ) with fixed xi and
parameter t. The tangent to these curves is ∂/∂t so we have constructed coordinates
such that X = ∂/∂t. The diffeomorphism φt is very simple: it just sends the point
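p with coordinates x^μ = (t_p, x^i_p) to the point φ_t(p) with coordinates y^μ = (t_p + t, x^i_p),
hence ∂y^μ/∂x^ν = δ^μ_ν. The generalization of (5.8) to an (r, s) tensor then gives
$$ \begin{aligned}
\left[ (\mathcal{L}_X T)^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s} \right]_p
 &= \lim_{t \to 0} \frac{1}{t} \left[ T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s}(t_p + t, x^i_p) - T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s}(t_p, x^i_p) \right] \\
 &= \left( \frac{\partial}{\partial t} T^{\mu_1 \ldots \mu_r}{}_{\nu_1 \ldots \nu_s} \right)(t_p, x^i_p)
\end{aligned} \tag{5.13} $$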
So in this chart, the Lie derivative is simply the partial derivative with respect to the
coordinate t. It follows that the Lie derivative has the following properties:
Now let’s derive basis-independent formulae for the Lie derivative. First consider
a function f . In the above chart, we have LX f = (∂/∂t)(f ). However, in this chart we
also have X(f ) = (∂/∂t)(f ). Hence
LX f = X(f ) (5.14)
Both sides of this expression are scalars and hence this equation must be valid in any
basis. Next consider a vector field Y . In our coordinates above we have
$$ (\mathcal{L}_X Y)^\mu = \frac{\partial Y^\mu}{\partial t} \tag{5.15} $$
but we also have
$$ [X, Y]^\mu = \frac{\partial Y^\mu}{\partial t} \tag{5.16} $$
If two vectors have the same components in one basis then they are equal in all bases.
Hence we have the basis-independent result
LX Y = [X, Y ] (5.17)
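The step behind (5.16) is short enough to write out. Assuming the usual coordinate expression for the commutator and labelling the coordinate t as x^0 (so that X^μ = δ^μ_0 in the adapted chart):

```latex
[X, Y]^\mu = X^\nu \partial_\nu Y^\mu - Y^\nu \partial_\nu X^\mu
           = \delta^\nu_0\, \partial_\nu Y^\mu - Y^\nu \partial_\nu \left( \delta^\mu_0 \right)
           = \frac{\partial Y^\mu}{\partial t} .
```

The second term drops out because the components of X are constant in this chart.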
Remark. Let’s compare the Lie derivative and the covariant derivative. The former
is defined on any manifold whereas the latter requires extra structure (a connection).
Equation (5.17) reveals that the Lie derivative wrt X at p depends on Xp and the first
derivatives of X at p. The covariant derivative wrt X at p depends only on Xp , which
enables us to remove X to define the tensor ∇T , a covariant generalization of partial
differentiation. It is not possible to define a corresponding tensor LT using the Lie
derivative. Only LX T makes sense.
Exercises (examples sheet 2).
1. Derive the formula for the Lie derivative of a covector field ωa in a coordinate
basis:
(LX ω)µ = X ν ∂ν ωµ + ων ∂µ X ν (5.18)
Show that this can be written in the basis-independent form
2. Show that the Lie derivative of a (0, 2) tensor field gab in a coordinate basis is
Note that we cannot use abstract indices in (5.18) and (5.20) because they are valid
only in a coordinate basis.
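For orientation, the standard expressions that these two exercises lead to can be written as follows. This is a sketch in which ∇ is assumed to be torsion-free in the first line and the Levi-Civita connection of g in the last; the equation numbers of the notes are not reproduced.

```latex
\begin{align*}
(\mathcal{L}_X \omega)_a &= X^b \nabla_b \omega_a + \omega_b \nabla_a X^b, \\
(\mathcal{L}_X g)_{\mu\nu} &= X^\rho \partial_\rho g_{\mu\nu} + g_{\mu\rho}\, \partial_\nu X^\rho + g_{\rho\nu}\, \partial_\mu X^\rho
   && \text{(coordinate basis)}, \\
(\mathcal{L}_X g)_{ab} &= \nabla_a X_b + \nabla_b X_a .
\end{align*}
```

In particular, for X = ∂/∂z in a coordinate basis the second line reduces to (L_X g)_{μν} = ∂g_{μν}/∂z.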
Remark. If φt is a symmetry transformation of a tensor field T (for all t) then LX T =
0.
If φt are a 1-parameter group of isometries then LX g = 0, i.e.,
∇a Xb + ∇b Xa = 0 (5.22)
This is Killing’s equation and solutions X a are called Killing vector fields. Consider
the case in which there exists a chart for which the metric tensor does not depend on
some coordinate z. Then equation (5.20) reveals that ∂/∂z is a Killing vector field.
Conversely, if the metric admits a Killing vector field then equation (5.13) demon-
strates that one can introduce coordinates such that the metric tensor components are
independent of one of the coordinates.
Lemma. Let X a be a Killing vector field and let V a be tangent to an affinely param-
eterized geodesic. Then Xa V a is constant along the geodesic.
Proof. The derivative of Xa V a along the geodesic is
$$ \begin{aligned}
\frac{d}{d\tau}\left( X_a V^a \right) &= V(X_a V^a) = \nabla_V (X_a V^a) = V^b \nabla_b (X_a V^a) \\
&= V^a V^b \nabla_b X_a + X_a V^b \nabla_b V^a
\end{aligned} \tag{5.23} $$
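The two terms on the right-hand side of (5.23) vanish separately. A sketch of this remaining step, using only Killing's equation (5.22) and the geodesic equation ∇_V V = 0:

```latex
\begin{align*}
V^a V^b \nabla_b X_a &= V^a V^b \nabla_{(b} X_{a)} = 0
   && \text{(Killing's equation (5.22))}, \\
X_a V^b \nabla_b V^a &= 0
   && \text{(affinely parameterized geodesic, } \nabla_V V = 0\text{)}.
\end{align*}
```

In the first line only the symmetric part of ∇_b X_a contributes because V^a V^b is symmetric.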
$$ -\frac{1}{v^2} \frac{\partial^2 \phi}{\partial (x^0)^2} + \frac{\partial^2 \phi}{\partial (x^1)^2} = 0 \tag{6.1} $$
where xµ are the coordinates of an inertial frame. This equation is not Lorentz invariant
so it cannot hold in all inertial frames. Let {eµ = ∂/∂xµ } be the coordinate basis vectors
of an inertial frame in which it does hold. The components of eµ w.r.t. this basis are
e^ν_μ = δ^ν_μ. So we can rewrite the equation as
$$ \left( -\frac{1}{v^2}\, e_0^\mu e_0^\nu + e_1^\mu e_1^\nu \right) \frac{\partial^2 \phi}{\partial x^\mu \partial x^\nu} = 0 \tag{6.2} $$
Since e0 and e1 are vectors, this equation is invariant under Lorentz transformations
so the Greek indices can now be taken to refer to an arbitrary inertial frame (with
e^ν_μ ≠ δ^ν_μ). Hence we have written the equation in a form that obeys the principle of
relativity! We have achieved this by introducing the vector fields {e0 , e1 }. However, we
don’t really want to regard the above equation as respecting the principle of relativity,
so we need a different formulation of this principle. We use the following:
Principle of special covariance. The only mathematical structure needed to de-
scribe Spacetime is Minkowski spacetime.
This excludes (6.2) because if we regard {e0 , e1 } as associated with Spacetime then
we have introduced an extra mathematical structure to describe Spacetime. However,
if v = 1 then we have −eµ0 eν0 + eµ1 eν1 = η µν where ηµν is the Minkowski metric. Hence
for v = 1 the dependence on {e0 , e1 } drops out and the principle is respected.
We need to go beyond special relativity to incorporate gravity. We have seen that
the equivalence principle strongly suggests that we should describe Spacetime using a
spacetime (M, g). The obvious generalization of the principle of special covariance is:
Principle of general covariance. The only mathematical structure needed to de-
scribe Spacetime is spacetime.
This asserts that spacetime is all we need to describe Spacetime. So, roughly speak-
ing, the equivalence principle suggests that spacetime is necessary, and the principle
of general covariance says that it is sufficient, to describe Spacetime. For example,
when we introduce a connection, general covariance requires that we must choose the
Levi-Civita connection defined by the metric, for otherwise we would be introducing a
new mathematical structure.
As another example, consider Newtonian physics, which we can formulate in terms
of a manifold R4 . In Newtonian physics there is a preferred notion of time, absolute
time, which all observers agree on. Mathematically, this is described by a function
t : R4 → R. So t is a mathematical structure that we need to describe a Newtonian
Spacetime, hence Newtonian physics violates the principle of general covariance.²
Of course the above principle is rather vague because we haven’t defined precisely what
we mean by Spacetime. Sometimes it is ambiguous whether a given mathematical
structure is needed to describe Spacetime, or whether it is needed to describe matter
in Spacetime. For example, string theory predicts the existence of a scalar field Φ
and an antisymmetric tensor Habc . These emerge from the theory in a very similar
way to the metric tensor gab . Hence one could regard these additional fields as part
of Spacetime and declare that string theory violates the principle. Alternatively, one
could declare these fields to be matter fields, rather than part of Spacetime, in which
case the principle is not violated.
To conclude, the principles discussed in this section provide strong motivation for
the formulation of GR as a theory which describes Spacetime in terms of a Lorentzian
manifold (M, g). However, these principles are not mathematically precise and they
contain ambiguities. Ultimately what matters is whether the theory we construct is
mathematically consistent and whether it agrees with observations.
2. Replace Greek indices (referring to an inertial frame) with abstract Latin indices.
When we do this we are not changing the equation, we are simply enlarging its regime
of validity, so that it holds w.r.t. an arbitrary basis.
Example. The simplest field equation obeying the principle of special covariance is
the wave equation for a scalar field Φ:
η µν ∂µ ∂ν Φ = 0 (6.3)
[Footnote 2: Actually we may only want to define t up to a constant, corresponding to the freedom to shift the
zero of time. In that case, it is dt rather than t which is well-defined, so a Newtonian Spacetime requires
(among other things) a covector field, which still violates the principle. Look up Newton-Cartan theory
if you want to learn more.]
where Greek indices refer to an inertial frame. Following the above steps, this becomes
η ab ∇a ∇b Φ = 0 (6.4)
It may look like we have not done much but note that we can now immediately write
down the wave equation for a non-inertial coordinate system x^μ: it is η^{μν} ∇_μ ∇_ν Φ = 0
even though in such a coordinate system we have η_{μν} ≠ diag(−1, 1, 1, 1) and ∇_μ ≠ ∂_μ.
Our equation respects the principle of special covariance because we have not introduced
any mathematical structures beyond those defined in Minkowski spacetime, i.e., the
metric ηab and its Levi-Civita connection.
One more simple step yields a tensor equation valid in an arbitrary spacetime:
3. Replace the Minkowski metric ηab with an arbitrary metric gab and take ∇ to be
the Levi-Civita connection of this metric.
If we start from an equation that respects the principle of special covariance then, after
following the above steps, we will obtain an equation that respects the principle of
general covariance.
Examples.
g ab ∇a ∇b Φ = 0, or ∇a ∇a Φ = 0 or Φ;a a = 0. (6.5)
∇a ∇a Φ − m2 Φ = 0. (6.6)
2. In special relativity, the electric and magnetic fields are combined into an anti-
symmetric tensor Fµν . The electric and magnetic fields in an inertial frame are
obtained by the rule (i, j, k take values from 1 to 3) F_{0i} = −E_i and F_{ij} = ε_{ijk} B_k.
Maxwell’s equations in an inertial frame are (in Gaussian units)
The Lorentz force law for a particle of charge q and mass m in Minkowski space-
time is
$$ \frac{d^2 x^\mu}{d\tau^2} = \frac{q}{m}\, \eta^{\mu\nu} F_{\nu\rho} \frac{dx^\rho}{d\tau} \tag{6.9} $$
where τ is proper time. We saw previously that the LHS can be rewritten as
uν ∂ν uµ where uµ = dxµ /dτ is the 4-velocity. Now following the rules above gives
the generally covariant equation
$$ u^b \nabla_b u^a = \frac{q}{m}\, g^{ab} F_{bc}\, u^c = \frac{q}{m}\, F^a{}_b\, u^b. \tag{6.10} $$
Note that this reduces to the geodesic equation when q = 0.
At least in simple cases, tensor equations obtained by the above method will satisfy
the Einstein equivalence principle. This can be seen by introducing a local inertial
frame (i.e. normal coordinates) at p, so that gµν (p) = ηµν and Γµνρ (p) = 0. Hence (first)
covariant derivatives reduce to partial derivatives at p. This has the effect of reversing
the above steps at p, to return to the equation of special relativity, in agreement with
the Einstein equivalence principle. For example, in such coordinates at p we have
g µρ ∇µ Fνρ = η µρ ∂µ Fνρ and ∇[µ Fνρ] = ∂[µ Fνρ] so Maxwell’s equations at p are identical
to Maxwell’s equations in Minkowski spacetime. Of course this is only true at p and as
we move away from p we will see deviations from the Minkowski spacetime equations
arising from the fact that spacetime is not flat.
The above procedure gives tensor equations but they are not necessarily the correct
tensor equations! For example, a natural generalization of the Klein-Gordon equation
in curved spacetime is
g ab ∇a ∇b Φ − (m2 + ξR)Φ = 0 (6.11)
where ξ is a dimensionless constant. This equation also reduces to the Klein-Gordon
equation in Minkowski spacetime (where R = 0), yet one would not obtain it by following the
steps above. In general, terms involving curvature are not determined by the above
procedure. Sometimes, such terms are fixed by mathematical consistency. However,
this is not always possible: there is no reason why it should be possible to derive
laws of physics in curved spacetime from those in flat spacetime. The ultimate test is
comparison with observations.
Let’s now discuss geometric optics. Consider, for simplicity, a scalar field satisfying
the wave equation in curved spacetime. It is convenient to consider a complex scalar
field Ψ. In geometric optics we look for solutions of the form Ψ = A e^{iλS} where we
are interested in the high frequency limit λ → ∞ with the functions A, S and their
derivatives assumed to be O(1). Plugging this Ansatz into the wave equation, we obtain
hence to leading order in λ the wave equation reduces to the eikonal equation
g ab ∇a S∇b S = 0 (6.13)
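The expansion behind the eikonal equation can be sketched as follows, assuming the wave equation in the form ∇^a ∇_a Ψ = 0 and the Ansatz Ψ = A e^{iλS}; the grouping by powers of λ is introduced here for illustration.

```latex
\begin{align*}
\nabla_a \Psi &= \left( \nabla_a A + i\lambda\, A\, \nabla_a S \right) e^{i\lambda S}, \\
\nabla^a \nabla_a \Psi &= \Big[ \underbrace{-\lambda^2 A\, \nabla^a S\, \nabla_a S}_{O(\lambda^2)}
   + \underbrace{i\lambda \left( 2 \nabla^a A\, \nabla_a S + A\, \nabla^a \nabla_a S \right)}_{O(\lambda)}
   + \underbrace{\nabla^a \nabla_a A}_{O(1)} \Big] e^{i\lambda S} .
\end{align*}
```

Setting the O(λ²) term to zero for non-vanishing A gives (6.13).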
Now, for any function f , if X a is tangent to a curve lying within a surface of constant
f then we have X(f ) = 0 (as f is constant along the curve) and hence df (X) = 0.
Therefore, on such a surface, df is orthogonal to any tangent vector so we say df is
normal to the surface. So in our case Pa ≡ ∇a S is normal to the surfaces of constant
phase S. But we also have 0 = P a Pa so P a is normal to Pa and hence P a is tangent to
these surfaces (this is a counter-intuitive property of a “null” hypersurface, i.e., one
with a null normal). Now consider
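$$ P^b \nabla_b P_a = (\nabla^b S) \nabla_b \nabla_a S = g^{bc} (\nabla_c S) \nabla_a \nabla_b S = \frac{1}{2} \nabla_a \left( g^{bc}\, \nabla_b S\, \nabla_c S \right) = 0 \tag{6.14} $$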
where the second equality uses vanishing torsion and the final equality follows from the
eikonal equation. Hence we have shown that, in the geometric optics (high frequency)
limit, P a is null and satisfies the geodesic equation. Each geodesic lies in a surface of
constant S. One can show that exactly the same thing happens for Maxwell’s equations.
In that case the integral curves of P a are called light rays. This is exactly analogous to
the usual way in which light rays arise in geometric optics in flat spacetime. In summary,
we’ve shown how the geodesic equation for null geodesics arises from geometric optics.
P µ = muµ (6.15)
The time component of P µ is the particle’s energy and the spatial components are its
3-momentum with respect to the inertial frame.
If an observer at some point p has 4-velocity v µ (p) then he measures the particle’s
energy, when the particle is at q, to be
$$ E = -\eta_{\mu\nu}\, v^\mu(p)\, P^\nu(q) \tag{6.16} $$
The way to see this is to choose an inertial frame in which, at p, the observer is at rest
at the origin, so v µ (p) = (1, 0, 0, 0) so E is just the time component of P ν (q) in this
inertial frame.
By the Einstein equivalence principle, GR should reduce to SR in a local inertial
frame. Hence in GR we also associate a rest mass m to any particle and define the
4-momentum of a particle with 4-velocity ua as
P a = mua (6.17)
Note that
gab P a P b = −m2 (6.18)
The equivalence principle implies that when the observer and particle both are at p
then (6.16) should be valid so the observer measures the particle’s energy to be
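$$ \frac{\partial \mathcal{E}}{\partial t} + \partial_i S_i = 0. \tag{6.22} $$
The momentum flux density is described by the stress tensor:
$$ t_{ij} = \frac{1}{4\pi} \left[ \frac{1}{2} \left( E_k E_k + B_k B_k \right) \delta_{ij} - E_i E_j - B_i B_j \right], \tag{6.23} $$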
where we’ve raised indices with η µν . Note that this is a symmetric tensor. It has
components T_{00} = ℰ, T_{0i} = −S_i, T_{ij} = t_{ij}. The conservation laws above are equivalent
to the single equation (if no sources)
∂ µ Tµν = 0. (6.26)
Exercise (examples sheet 3). Show that Maxwell’s equations imply that
∇a Tab = 0. (6.28)
where the surface S (with outward unit normal n) bounds V . In words: the rate of
increase of the energy of matter in V is equal to minus the energy flux across S. In
a general curved spacetime, such an interpretation is not possible. This is because
the gravitational field can do work on the matter in the spacetime. One might think
that one could obtain global conservation laws in curved spacetime by introducing a
definition of energy density etc for the gravitational field. This is a subtle issue. The
gravitational field is described by the metric gab . In Newtonian theory, the energy
density of the gravitational field is −(1/8π)(∇Φ)² so one might expect that in GR
the energy density of the gravitational field should be some expression quadratic in
first derivatives of gab . But we have seen that we can choose normal coordinates to
make the first partial derivatives of gab vanish at any given point. Gravitational energy
certainly exists but not in a local sense. For example one can define the total energy
(i.e. the energy of matter and the gravitational field) for certain spacetimes (this will
be discussed in the black holes course).
Example. A perfect fluid is described by a 4-velocity vector field ua , and two scalar
fields ρ and p. The energy-momentum tensor is
$$ T_{ab} = (\rho + p)\, u_a u_b + p\, g_{ab} $$
ρ and p are the energy density and pressure measured by an observer co-moving with
the fluid, i.e., one with 4-velocity ua (check: Tab ua ub = ρ + p − p = ρ). The equations
of motion of the fluid can be derived by conservation of Tab :
Exercise (examples sheet 3). Show that, for a perfect fluid, ∇a Tab = 0 is equivalent
to
ua ∇a ρ + (ρ + p)∇a ua = 0, (ρ + p)ub ∇b ua = −(gab + ua ub )∇b p (6.31)
These are relativistic generalizations of the mass conservation equation and Euler equa-
tion of non-relativistic fluid dynamics. Note that a pressureless fluid moves on timelike
geodesics. This makes sense physically: zero pressure implies that the fluid particles
are non-interacting and hence behave like free particles.
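As a brief check of the last remark (a sketch, assuming ρ ≠ 0 in the region of interest), setting p = 0 in (6.31) gives
$$u^a\nabla_a\rho + \rho\,\nabla_a u^a = \nabla_a(\rho u^a) = 0, \qquad \rho\,u^b\nabla_b u_a = 0 \;\Longrightarrow\; u^b\nabla_b u^a = 0 .$$
The first relation says that the current ρu^a is conserved, and the second is the affinely parameterized geodesic equation for the integral curves of u^a.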
Rab = 0 (6.33)
3. The Einstein equation is a set of non-linear, second order, coupled, partial dif-
ferential equations for the components of the metric gµν . Very few physically
interesting explicit solutions are known so one has to develop other methods to
solve the equation, e.g., numerical integration.
4. How unique is the Einstein equation? Is there any tensor, other than Gab that
we could have put on the LHS? The answer is supplied by:
Theorem (Lovelock 1972). Let Hab be a symmetric tensor such that (i) in any
coordinate chart, at any point, Hµν is a function of gµν , gµν,ρ and gµν,ρσ at that point;
(ii) ∇a Hab = 0; (iii) either spacetime is four-dimensional or Hµν depends linearly on
gµν,ρσ . Then there exist constants α and β such that
$$H_{ab} = \alpha\,G_{ab} + \beta\,g_{ab}$$
Hence (as Einstein realized) there is the freedom to add a constant multiple of gab to
the LHS of Einstein’s equation, giving
$$G_{ab} + \Lambda\,g_{ab} = 8\pi G\,T_{ab}$$
Λ is called the cosmological constant. This no longer reduces to Newtonian theory for
slow motion in a weak field but the deviation from Newtonian theory is unobservable
if Λ is sufficiently small. Note that |Λ|^{−1/2} has the dimensions of length. The effects
of Λ are negligible on lengths or times small compared to this quantity. Astronomical
observations indicate that there is indeed a very small positive cosmological constant:
Λ^{−1/2} ∼ 10⁹ light years, the same order of magnitude as the size of the observable
Universe. Hence the effects of the cosmological constant are negligible except on cos-
mological length scales. Therefore we can set Λ = 0 unless we discuss cosmology.
Note that we can move the cosmological constant term to the RHS of the Einstein
equation, and regard it as the energy-momentum tensor of a perfect fluid with ρ =
−p = Λ/(8πG). Hence the cosmological constant is sometimes referred to as dark
energy or vacuum energy. It is a great mystery why it is so small because arguments
from quantum field theory suggest that it should be 10¹²⁰ times larger. This is the
with the weakness of the gravitational field corresponding to the components of hµν
being small compared to 1. Note that we are dealing with a situation in which we
have two metrics defined on spacetime, namely gab and the Minkowski metric ηab . The
former is supposed to be the physical metric, i.e., free particles move on geodesics of
gab .
In linearized theory we regard hµν as the components of a tensor field in the sense
of special relativity, i.e., it transforms as a tensor under Lorentz transformations of the
coordinates xµ .
To first order in the perturbation, the inverse metric is
$$g^{\mu\nu} = \eta^{\mu\nu} - h^{\mu\nu}, \tag{7.2}$$
where we define
$$h^{\mu\nu} = \eta^{\mu\rho}\eta^{\nu\sigma}h_{\rho\sigma} \tag{7.3}$$
To prove this, just check that $g^{\mu\nu}g_{\nu\rho} = \delta^\mu_\rho$ to linear order in the perturbation. Here,
and henceforth, we shall raise and lower indices using the Minkowski metric ηµν . At
zeroth order this agrees with raising and lowering with gµν . We shall determine the
Einstein equation to first order in the perturbation hµν .
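The check of the inverse metric can be written out in one line. This is a sketch that assumes (7.1) has the form $g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu}$, with indices on h raised by η:
$$g^{\mu\nu}g_{\nu\rho} = (\eta^{\mu\nu} - h^{\mu\nu})(\eta_{\nu\rho} + h_{\nu\rho}) = \delta^\mu_\rho + h^\mu{}_\rho - h^\mu{}_\rho + O(h^2) = \delta^\mu_\rho + O(h^2).$$
The two first-order terms cancel, which is why the sign in front of $h^{\mu\nu}$ in the inverse metric must be negative.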
To first order, the Christoffel symbols are
$$\Gamma^\mu_{\nu\rho} = \frac{1}{2}\eta^{\mu\sigma}\left(h_{\sigma\nu,\rho} + h_{\sigma\rho,\nu} - h_{\nu\rho,\sigma}\right), \tag{7.4}$$
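the Riemann tensor is (neglecting ΓΓ terms since they are second order in the perturbation):
$$R_{\mu\nu\rho\sigma} = \frac{1}{2}\left(h_{\mu\sigma,\nu\rho} + h_{\nu\rho,\mu\sigma} - h_{\nu\sigma,\mu\rho} - h_{\mu\rho,\nu\sigma}\right) \tag{7.5}$$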
$$h = h^\mu{}_\mu \tag{7.7}$$
where X a is the vector field that generates φt and ξ a = tX a . Note that ξ a is small so
we treat it as a first order quantity. If we apply this result to the energy-momentum
tensor, evaluating in our chart xµ , then the first term is small (by assumption) so the
second term is higher order and can be neglected. Hence the energy-momentum tensor
is gauge-invariant to first order. The same is true for any tensor that vanishes in the
unperturbed spacetime, e.g. the Riemann tensor. However, for the metric we have
where we have neglected Lξ h because this is a higher order quantity (as ξ and h
both are small). Comparing this with (7.1) we deduce that h and h + Lξ η describe
physically equivalent metric perturbations. Therefore linearized GR has the gauge
symmetry h → h + Lξ η for small ξ µ . In our chart xµ , we can use (5.21) to write
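$(\mathcal{L}_\xi\eta)_{\mu\nu} = \partial_\mu\xi_\nu + \partial_\nu\xi_\mu$ and so the gauge symmetry is
$$h_{\mu\nu} \to h_{\mu\nu} + \partial_\mu\xi_\nu + \partial_\nu\xi_\mu \tag{7.14}$$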
There is a close analogy with electromagnetism in flat spacetime, where we can intro-
duce an electromagnetic potential Aµ , a 4-vector obeying Fµν = 2∂[µ Aν] . This has the
gauge symmetry
Aµ → Aµ + ∂µ Λ (7.15)
for some scalar Λ. In this case, one can choose Λ to impose the Lorenz gauge condition
∂ µ Aµ = 0. Similarly, in linearized GR we can choose ξµ to impose the gauge condition
∂ ν h̄µν = 0. (7.16)
To see this, note that under the gauge transformation (7.14) we have
Hence, in this gauge, each component of h̄µν satisfies the wave equation with a source
given by the energy-momentum tensor. Given appropriate boundary conditions, the
solution can be determined using a Green function.
$$\nabla^2\Phi = 4\pi G\rho \tag{7.19}$$
with ρ the mass density of matter. For a localised distribution of matter (i.e. ρ → 0
at large x), we can solve this using a Green function:
$$\Phi(t,\mathbf{x}) = -G\int d^3x'\,\frac{\rho(t,\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|} \tag{7.20}$$
A necessary condition for validity of Newtonian theory is that the motion of bodies is
non-relativistic. Consider a particle moving in the Newtonian gravitational potential
Φ of some localised object e.g. star or a galaxy. Assume that the particle starts at rest
far away from the object, where Φ is negligible, and then falls towards the object. From
energy conservation we have (1/2)v² + Φ = 0 hence v² ∼ |Φ| so the motion remains
non-relativistic only if |Φ| ≪ 1 everywhere. The same conclusion is reached by looking
at bound orbits. Hence validity of Newtonian theory requires that the gravitational
field is weak.
For a spherical star of mass M and radius R, at the surface of the star we have
Φ = −GM/R and so |Φ| ≪ 1 provided R ≫ GM. This is satisfied by most “normal”
stars e.g. the Sun has R ≈ 7 × 10⁵ km and GM ≈ 1.5 km. The only known objects with
R ∼ GM are black holes and neutron stars.
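The numbers quoted for the Sun can be reproduced with a few lines of arithmetic. The sketch below restores the factor of c² that is suppressed in units with c = 1; the SI constants are rounded textbook values supplied here for illustration and are not taken from these notes.

```python
# Rounded SI values, supplied as illustrative assumptions.
G = 6.674e-11       # Newton's constant, m^3 kg^-1 s^-2
C = 2.998e8         # speed of light, m/s
M_SUN = 1.989e30    # solar mass, kg
R_SUN = 6.96e8      # solar radius, m (about 7 x 10^5 km)


def gravitational_length(mass_kg):
    """Return GM/c^2 in metres, i.e. the length 'GM' in units with c = 1."""
    return G * mass_kg / C**2


def surface_potential(mass_kg, radius_m):
    """Return the dimensionless |Phi| = GM/(R c^2) at the surface of a body."""
    return gravitational_length(mass_kg) / radius_m


gm_sun_km = gravitational_length(M_SUN) / 1e3
phi_sun = surface_potential(M_SUN, R_SUN)
print(f"GM for the Sun ~ {gm_sun_km:.2f} km")
print(f"|Phi| at the solar surface ~ {phi_sun:.1e}")
```

The weak-field condition R ≫ GM is the statement that `surface_potential` returns a number much smaller than one.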
We will now see how GR reduces to Newtonian theory in the limit of non-relativistic
motion and a weak gravitational field. To do this, we could reintroduce factors of c
and try to expand everything in powers of 1/c since we expect Newtonian theory to
be valid as c → ∞. We will follow an equivalent approach in which we stick to our
convention c = 1 but introduce a small dimensionless parameter 0 < ε ≪ 1 such that
a factor of ε appears everywhere that a factor of 1/c would appear.
We assume that, for some choice of almost-inertial coordinates x^µ = (t, x^i), the
3-velocity of any particle v^i = dx^i/dτ is O(ε). In Newtonian theory we had v² ∼ |Φ|
and so we expect the gravitational field to be O(ε²). We assume that
denotes a typical component of hµν then |∂i X| = O(X/L). Our assumption of small
time derivatives is
$$\partial_0 X = O(\epsilon X/L) \tag{7.22}$$
For example, in Newtonian theory, the gravitational field of a body of mass M at
position x(t) is Φ = −M/|x − x(t)| which obeys these formulae with L = |x − x(t)|
and |ẋ| = O(ε).
Consider the equations for a timelike geodesic. The Lagrangian is (adding a hat to
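avoid confusion with the length L)
$$\hat{L} = (1 - h_{00})\,\dot t^2 - 2h_{0i}\,\dot t\,\dot x^i - (\delta_{ij} + h_{ij})\,\dot x^i\dot x^j \tag{7.23}$$
where a dot denotes a derivative with respect to proper time τ. Our non-relativistic
assumption is ẋ^i = O(ε). From the definition of proper time we have L̂ = 1 which can
be solved to give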
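$$\dot t = 1 + \frac{1}{2}h_{00} + \frac{1}{2}\delta_{ij}\dot x^i\dot x^j + O(\epsilon^4) \tag{7.24}$$
The Euler-Lagrange equation for x^i is
$$\frac{d}{d\tau}\left[-2h_{0i}\dot t - 2(\delta_{ij} + h_{ij})\dot x^j\right] = -h_{00,i}\dot t^2 - 2h_{0j,i}\dot t\,\dot x^j - h_{jk,i}\dot x^j\dot x^k = -h_{00,i} + O(\epsilon^4/L) \tag{7.25}$$
The LHS is −2ẍ^i plus subleading terms. Retaining only the leading order terms gives
$$\ddot x^i = \frac{1}{2}h_{00,i} \tag{7.26}$$
Finally we can convert τ derivatives on the LHS to t derivatives using the chain rule
and (7.24) to obtain
$$\frac{d^2x^i}{dt^2} = -\partial_i\Phi \tag{7.27}$$
where
$$\Phi \equiv -\frac{1}{2}h_{00} \tag{7.28}$$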
We have recovered the equation of motion for a test body moving in a Newtonian
gravitational field Φ.
Exercise. Show that the corrections to (7.27) are O(ε⁴/L). You can argue as follows.
(7.26) implies ẍ^i = O(ε²/L). Now use (7.24) to show ẗ = O(ε³/L). Expand out the
derivative on the LHS of (7.25) to show that the corrections to (7.26) are O(ε⁴/L).
Finally convert τ derivatives to t derivatives using (7.24).
The next thing we need to show is that Φ satisfies the Poisson equation (7.19).
First consider the energy-momentum tensor. Assume that one can ascribe a 4-velocity
ua to the matter. Our non-relativistic assumption implies
where the second equality follows from gab ua ub = −1. The energy-density in the rest-
frame of the matter is
ρ ≡ ua ub Tab (7.30)
Recall that −T0i is the momentum density measured by an observer at rest in these
coordinates so we expect −T0i ∼ ρu_i = O(ρε). Tij will have a part ∼ ρu_i u_j = O(ρε²)
arising from the motion of the matter. It will also have a contribution from the stresses
in the matter. Under all but the most extreme circumstances, these are small compared
to ρ. For example, in the rest frame of a perfect fluid, stresses are determined by the
pressure p. The speed of sound in the fluid is C where C² = dp/dρ ∼ p/ρ. Our
non-relativistic assumption is that C = O(ε) hence p = O(ρε²). This is true in the Solar
system, where p/ρ ∼ |Φ| ∼ 10⁻⁵ at the centre of the Sun. Hence we make the following
assumptions
where the first equality follows from (7.30) and other two equalities.
Finally, we consider the linearized Einstein equation. Equation (7.21) implies that
Using our assumption about time derivatives being small compared to spatial deriva-
tives, equation (7.18) becomes
Since $h_{0i} = \bar h_{0i}$, this explains why we had to assume $h_{0i} = O(\epsilon^3)$. From the second
result, we have $\bar h_{ii} = O(\epsilon^4)$ and hence $\bar h = -\bar h_{00} + O(\epsilon^4)$. We can use (7.10) to recover
$h_{\mu\nu}$. This gives $h_{00} = (1/2)\bar h_{00} + O(\epsilon^4)$ and so, using (7.28) we obtain Newton’s law of
gravitation:
$$\nabla^2\Phi = 4\pi\rho\left(1 + O(\epsilon^2)\right) \tag{7.35}$$
∂ ρ ∂ρ h̄µν = 0 (7.37)
so the theory admits gravitational wave solutions. As usual for the wave equation, we
can build a general solution as a superposition of plane wave solutions. So let’s look
for a plane wave solution:
$$\bar h_{\mu\nu}(x) = \mathrm{Re}\left(H_{\mu\nu}\,e^{ik_\rho x^\rho}\right) \tag{7.38}$$
where Hµν is a constant symmetric complex matrix describing the polarization of the
wave and k µ is the (real) wavevector. We shall suppress the Re in all subsequent
equations. The wave equation reduces to
kµ k µ = 0 (7.39)
so the wavevector k µ must be null hence these waves propagate at the speed of light
relative to the background Minkowski metric. The gauge condition (7.16) gives
k ν Hµν = 0, (7.40)
Returning to the general case, the condition (7.16) does not eliminate all gauge free-
dom. Consider a gauge transformation (7.14). From equation (7.17), we see that this
preserves the gauge condition (7.16) if ξµ obeys the wave equation:
∂ ν ∂ν ξµ = 0. (7.42)
Hence there is a residual gauge freedom which we can exploit to simplify the solution.
Take
$$\xi_\mu(x) = X_\mu\,e^{ik_\rho x^\rho} \tag{7.43}$$
which satisfies (7.42) because kµ is null. Using
Exercise. Show that the residual gauge freedom can be used to achieve “longitudinal
gauge”:
H0µ = 0 (7.46)
but (since k ν Hµν = 0) this still does not determine Xµ uniquely, and the freedom
remains to impose the additional “trace-free” condition
H µ µ = 0. (7.47)
Example. Return to our wave travelling in the z-direction. The longitudinal gauge
condition combined with the transversality condition (7.41) gives H3µ = 0. Using the
trace-free condition now gives
$$H_{\mu\nu} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & H_+ & H_\times & 0 \\ 0 & H_\times & -H_+ & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \tag{7.49}$$
where the solution is specified by the two constants H+ and H× corresponding to two
independent polarizations. So gravitational waves are transverse and have two possible
polarizations. This is one way of interpreting the statement that the gravitational field
has two degrees of freedom per spacetime point.
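The properties of (7.49) can be checked mechanically. The following sketch assumes a wavevector k^µ = ω(1, 0, 0, 1) for a wave travelling in the z-direction and the signature (−+++); the function names are hypothetical helpers introduced only for this illustration.

```python
ETA_DIAG = [-1, 1, 1, 1]  # diagonal of the Minkowski metric, signature (-+++)


def polarization_matrix(h_plus, h_cross):
    """Return H_{mu nu} of (7.49) as a nested list (indices t, x, y, z)."""
    return [
        [0, 0, 0, 0],
        [0, h_plus, h_cross, 0],
        [0, h_cross, -h_plus, 0],
        [0, 0, 0, 0],
    ]


def trace(H):
    """Return H^mu_mu = eta^{mu nu} H_{mu nu}."""
    return sum(ETA_DIAG[m] * H[m][m] for m in range(4))


def contract_with_k(H, omega=1.0):
    """Return the four numbers k^nu H_{mu nu}, assuming k^mu = omega (1, 0, 0, 1)."""
    k_up = [omega, 0.0, 0.0, omega]
    return [sum(k_up[n] * H[m][n] for n in range(4)) for m in range(4)]


# Complex amplitudes are allowed because H_{mu nu} is a complex matrix.
H = polarization_matrix(0.3 + 0.1j, -0.2j)
print("trace:", trace(H))
print("k^nu H_{mu nu}:", contract_with_k(H))
print("H_{0 mu}:", H[0])
```

All three printed quantities vanish, corresponding to the trace-free condition (7.47), the gauge condition (7.40) and the longitudinal gauge (7.46), while the two free numbers `h_plus` and `h_cross` are the two polarizations.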
How would one detect a gravitational wave? An observer could set up a family of
test particles locally. The displacement vector S a from the observer to any particle is
governed by the geodesic deviation equation. (We are taking S a to be infinitesimal,
i.e., what we called δs S^a previously.) So we can use this equation to predict what the
observer would see. We have to be careful here. It would be natural to write out the
geodesic deviation equation using the almost inertial coordinates and thereby determine
S µ . But how would we relate this to observations? S µ are the components of S a with
respect to a certain basis, so how would we determine whether the variation in S µ arises
from variation of S a or from variation of the basis? With a bit more thought, one can
make this approach work but we shall take a different approach.
Consider an observer following a geodesic in a general spacetime. Our observer
will be equipped with a set of measuring rods with which to measure distances. At
some point p on his worldline we could introduce a local inertial frame with spatial
coordinates X, Y, Z in which the observer is at rest. Imagine that the observer sets up
measuring rods of unit length pointing in the X, Y, Z directions at p. Mathematically,
this defines an orthonormal basis {eα } for Tp (M ) (we use α to label the basis vectors
because we are using µ for our almost inertial coordinates) where ea0 = ua (his 4-velocity)
and eai (i = 1, 2, 3) are spacelike vectors satisfying
In Minkowski spacetime, this basis can be extended to the observer’s entire worldline by
taking the basis vectors to have constant components (in an inertial frame), i.e., they
do not depend on proper time τ . In particular, this implies that the orthonormal basis
is non-rotating. Since the basis vectors have constant components, they are parallelly
transported along the worldline. Hence, in curved spacetime, the analogue of this is to
take the basis vectors to be parallelly transported along the worldline. For ua , this is
automatic (the worldline is a geodesic). But for ei it gives
ub ∇b eai = 0 (7.51)
As we discussed previously, if the eai are specified at any point p then this equation
determines them uniquely along the whole worldline. Furthermore, the basis remains
orthonormal because parallel transport preserves inner products (examples sheet 2).
The basis just constructed is sometimes called a parallelly transported frame. It is the
kind of basis that would be constructed by an observer freely falling with 3 gyroscopes
whose spin axes define the spatial basis vectors. Using such a basis we can be sure that
$$\frac{d^2S_\alpha}{d\tau^2} = R_{abcd}\,e^a_\alpha u^b u^c e^d_\beta S^\beta \tag{7.54}$$
where τ is the observer’s proper time and $S_\alpha = e^a_\alpha S_a$ is one of the components of $S_a$ in
the parallelly transported frame. On the RHS we’ve used $S^d = e^d_\beta S^\beta$.
So far, the discussion has been general but now let’s specialize to our gravitational
plane wave. On the RHS, Rabcd is a first order quantity so we only need to evaluate
the other quantities to leading order, i.e., we can evaluate them as if spacetime were
flat. Assume that the observer is at rest in the almost inertial coordinates. To leading
order, uµ = (1, 0, 0, 0) hence
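$$\frac{d^2S_\alpha}{d\tau^2} \approx R_{\mu 00\nu}\,e^\mu_\alpha e^\nu_\beta S^\beta \tag{7.55}$$
Using the formula for the perturbed Riemann tensor (7.5) and $h_{0\mu} = 0$ we obtain
$$\frac{d^2S_\alpha}{d\tau^2} \approx \frac{1}{2}\frac{\partial^2 h_{\mu\nu}}{\partial t^2}\,e^\mu_\alpha e^\nu_\beta S^\beta \tag{7.56}$$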
In Minkowski spacetime we could take eai aligned with the x, y, z axes respectively, i.e.,
eµ1 = (0, 1, 0, 0), eµ2 = (0, 0, 1, 0) and eµ3 = (0, 0, 0, 1). We can use the same results here
because we only need to evaluate eµα to leading order. Using h0µ = h3µ = 0 we then
have
$$\frac{d^2S_0}{d\tau^2} = \frac{d^2S_3}{d\tau^2} = 0 \tag{7.57}$$
to this order of approximation. Hence the observer sees no relative acceleration of the
test particles in the z-direction, i.e, the direction of propagation of the wave. Let the
observer set up initial conditions so that S0 and its first derivatives vanish at τ = 0.
Then S0 will vanish for all time. If the derivative of S3 vanishes initially then S3 will
be constant. The same is not true for the other components.
We can choose our almost inertial coordinates so that the observer has coordinates
x^µ ≈ (τ, 0, 0, 0) (i.e. t = τ to leading order along the observer’s worldline). For a +
polarized wave we then have
$$\frac{d^2S_1}{d\tau^2} = -\frac{1}{2}\omega^2|H_+|\cos(\omega\tau - \alpha)\,S_1, \qquad \frac{d^2S_2}{d\tau^2} = \frac{1}{2}\omega^2|H_+|\cos(\omega\tau - \alpha)\,S_2 \tag{7.58}$$
where we have replaced t by τ in $\partial^2 h_{\mu\nu}/\partial t^2$ and α = arg H+. Since H+ is small we can
solve this perturbatively: the leading order solution is $S_1 = \bar S_1$, a constant (assuming
that we set up initial conditions so that the particles are at rest to leading order).
Similarly S2 = S̄2 . Now we can plug these leading order solutions into the RHS of
the above equations and integrate to determine the solution up to first order (again
choosing constants of integration so that the particles would be at rest in the absence
of the wave)
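$$S_1(\tau) \approx \bar S_1\left(1 + \frac{1}{2}|H_+|\cos(\omega\tau - \alpha)\right), \qquad S_2(\tau) \approx \bar S_2\left(1 - \frac{1}{2}|H_+|\cos(\omega\tau - \alpha)\right) \tag{7.59}$$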
This reveals that particles are displaced outwards in the x-direction whilst being displaced
inwards in the y-direction, and vice-versa. $\bar S_1$ and $\bar S_2$ give the average displacement.
If the particles are arranged in the xy plane with $\bar S_1^2 + \bar S_2^2 = R^2$ then they form
a circle of radius R when ωτ = α + π/2. This will be deformed into an ellipse, then
back to a circle, then an ellipse again (Fig 15).
[Figure 15. Panels: ωτ = α + π/2, ωτ = α + π, ωτ = α + 3π/2, ωτ = α + 2π.]
Exercise. Show that the corresponding result for a × polarized wave is the same, just
rotated through 45◦ (Fig. 16).
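The deformation of the ring described by (7.59) can be tabulated numerically. This is a minimal sketch: the amplitude 0.2 is hugely exaggerated so that the effect is visible, and the function names are hypothetical helpers introduced for the illustration.

```python
import math


def displaced_ring(h_plus_abs, phase, n_particles=8, radius=1.0):
    """Return positions (S1, S2) from (7.59) for a ring of test particles.

    h_plus_abs stands for |H_+| and phase stands for (omega*tau - alpha).
    """
    stretch = 0.5 * h_plus_abs * math.cos(phase)
    points = []
    for n in range(n_particles):
        angle = 2 * math.pi * n / n_particles
        s1_bar = radius * math.cos(angle)
        s2_bar = radius * math.sin(angle)
        points.append((s1_bar * (1 + stretch), s2_bar * (1 - stretch)))
    return points


def semi_axes(points):
    """Return the largest |S1| and |S2| on the ring (extent along x and y)."""
    return max(abs(p[0]) for p in points), max(abs(p[1]) for p in points)


for label, phase in [("pi/2", 0.5 * math.pi), ("pi", math.pi),
                     ("3pi/2", 1.5 * math.pi), ("2pi", 2 * math.pi)]:
    extent_x, extent_y = semi_axes(displaced_ring(0.2, phase))
    print(f"omega*tau - alpha = {label:>5}: x-extent {extent_x:.3f}, y-extent {extent_y:.3f}")
```

At phases π/2 and 3π/2 the two extents are equal (a circle), while at π and 2π the ring is squeezed along x and stretched along y, or the reverse.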
[Figure 16. Panels: ωτ = α + π/2, ωτ = α + π, ωτ = α + 3π/2, ωτ = α + 2π.]
Figure 17. The LIGO Hanford observatory in Washington state, USA. There is another
LIGO observatory in Louisiana. (Image credit: LIGO.)
Gravitational wave detectors look for the changes in position of test masses caused
by the above effect. For example, the two LIGO observatories (in the US, see Fig. 17)
each have two perpendicular tunnels, each 4 km long. There are test masses (analogous
to the particles above) at the end of each arm (tunnel) and where the arms meet. A
beam splitter is attached to the test mass where the arms meet. A laser signal is split
and sent down each arm, where it reflects off mirrors attached to the test masses at the
ends of the arms. The signals are recombined and interferometry used to detect whether
there has been any change in the length difference of the two arms. The effect that is
being looked for is tiny: plausible sources of gravitational waves give H+, H× ∼ 10⁻²¹
(see below) so the relative length change of each arm is δL/L ∼ 10⁻²¹. The resulting
δL is a tiny fraction of the wavelength of the laser light but the resulting tiny phase
difference between the two laser signals is detectable!
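To get a feel for the size of the effect, the arm-length change can be estimated directly from the numbers in the text. In the sketch below the laser wavelength of about one micron is an assumption made purely for illustration; it is not a figure quoted in these notes.

```python
ARM_LENGTH_M = 4.0e3            # 4 km arms, as quoted in the text
STRAIN = 1.0e-21                # H_+, H_x ~ 1e-21, as quoted in the text
LASER_WAVELENGTH_M = 1.064e-6   # ASSUMPTION: roughly one-micron infrared laser


def arm_length_change(strain, arm_length_m):
    """Return the order-of-magnitude length change delta L ~ strain * L."""
    return strain * arm_length_m


delta_l = arm_length_change(STRAIN, ARM_LENGTH_M)
fraction_of_wavelength = delta_l / LASER_WAVELENGTH_M
print(f"delta L ~ {delta_l:.1e} m")
print(f"delta L / wavelength ~ {fraction_of_wavelength:.1e}")
```

Under these assumptions δL is a few attometres, many orders of magnitude smaller than the wavelength of the light used to measure it.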
Gravitational wave detectors have been operating for several decades, gradually
improving in sensitivity. The first direct detection of gravitational waves was made by
the LIGO collaboration in September 2015. As we will explain below, there is also very
good indirect evidence for the existence of gravitational waves. We will discuss all of
this in more detail later.
Since each component of $\bar h_{\mu\nu}$ satisfies the inhomogeneous wave equation, the equation
can be solved using the same retarded Green function that one uses in electromagnetism:
$$\bar h_{\mu\nu}(t,\mathbf{x}) = 4\int d^3x'\,\frac{T_{\mu\nu}(t - |\mathbf{x}-\mathbf{x}'|,\,\mathbf{x}')}{|\mathbf{x}-\mathbf{x}'|} \tag{7.61}$$
where |x − x′| is calculated using the Euclidean metric.
Assume that the matter is confined to a compact region near the origin of size d
(e.g. let d be the radius of the smallest sphere that encloses the matter). Then, far
from the source we have r ≡ |x| ≫ |x′| ∼ d so we can expand
$$|\mathbf{x}-\mathbf{x}'|^2 = r^2 - 2\mathbf{x}\cdot\mathbf{x}' + \mathbf{x}'^2 = r^2\left(1 - (2/r)\,\hat{\mathbf{x}}\cdot\mathbf{x}' + O(d^2/r^2)\right) \tag{7.62}$$
Given h̄ij , the first equation can be integrated to determine h̄0i and the second can
then be integrated to determine h̄00 .
The integral in (7.66) can be evaluated as follows. Since the matter is compactly
supported, we can freely integrate by parts and discard surface terms (note also that
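t′ does not depend on x′). We can also use energy-momentum conservation, which to
this order is just $\partial_\nu T^{\mu\nu} = 0$. Let’s drop the primes on the coordinates in the integral
for now.
$$\begin{aligned}
\int d^3x\,T^{ij}(t,\mathbf{x}) &= \int d^3x\left[\partial_k(T^{ik}x^j) - (\partial_k T^{ik})x^j\right] \\
&= -\int d^3x\,(\partial_k T^{ik})x^j && \text{drop surface term} \\
&= \int d^3x\,(\partial_0 T^{i0})x^j && \text{conservation law} \\
&= \partial_0\int d^3x\,T^{0i}x^j
\end{aligned} \tag{7.68}$$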
moment is the total energy in matter $\int d^3x\,T_{00}$, the first moment is the energy dipole
$\int d^3x\,T_{00}\,x^i$.)
The result (7.71) describes a disturbance propagating outwards from the source at
the speed of light. If the source exhibits oscillatory motion (e.g. a binary star system)
then h̄ij will describe waves with the same period τ as the energy-momentum tensor of
the source.
The first equation in (7.67) gives
$$\partial_0\bar h_{0i} \approx \partial_j\left(\frac{2}{r}\ddot I_{ij}(t-r)\right) \tag{7.72}$$
for some arbitrary function f (x). To determine the functions f and ki we return to
(7.61), which to leading order gives
$$\bar h_{00} \approx \frac{4E}{r} \tag{7.75}$$
where E is the total energy of the matter
$$E = \int d^3x'\,T_{00}(t',\mathbf{x}') \tag{7.76}$$
and hence E is a constant. We can now read off the time-independent term in $\bar h_{00}$:
f(x) = 4E/r. Why have we not rediscovered the $\ddot I_{ij}$ term in (7.74) from (7.61)? This
term is smaller than f by a factor of d²/τ² (as $I_{ij} \sim Ed^2$: convince yourself why!) so
we’d have to go to higher order in the expansion of (7.61) to find this term (and also
the term linear in t, which appears at higher order).
Similarly, (7.61) gives
$$\bar h_{0i} \approx -\frac{4P_i}{r} \tag{7.78}$$
where $P_i$ is the total 3-momentum of matter
$$P_i = -\int d^3x'\,T_{0i}(t',\mathbf{x}') \tag{7.79}$$
will not satisfy the Einstein equation so we have to add a second order correction,
writing
$$g_{\mu\nu} = \eta_{\mu\nu} + h_{\mu\nu} + h^{(2)}_{\mu\nu} \tag{7.81}$$
The idea is that if the components of $h_{\mu\nu}$ are O(ε) then the components of $h^{(2)}_{\mu\nu}$ are
O(ε²).
Now we calculate the Einstein tensor to second order. The first order term is
what we calculated before (equation (7.8)). We shall call this $G^{(1)}_{\mu\nu}[h]$. The second
order terms are either linear in h(2) or quadratic in h. The terms linear in h(2) can be
calculated by setting h to zero. This is exactly the same calculation we did before but
with h replaced by h(2) . Hence the result will be (7.8) with h → h(2) , which we denote
$G^{(1)}_{\mu\nu}[h^{(2)}]$. Therefore to second order we have
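$$\begin{aligned}
R^{(2)}_{\mu\nu}[h] = {} & \frac{1}{2}h^{\rho\sigma}\partial_\mu\partial_\nu h_{\rho\sigma} - h^{\rho\sigma}\partial_\rho\partial_{(\mu}h_{\nu)\sigma} + \frac{1}{4}\partial_\mu h_{\rho\sigma}\,\partial_\nu h^{\rho\sigma} + \partial^\sigma h^\rho{}_\nu\,\partial_{[\sigma}h_{\rho]\mu} \\
& + \frac{1}{2}\partial_\sigma\left(h^{\sigma\rho}\partial_\rho h_{\mu\nu}\right) - \frac{1}{4}\partial^\rho h\,\partial_\rho h_{\mu\nu} - \left(\partial_\sigma h^{\rho\sigma} - \frac{1}{2}\partial^\rho h\right)\partial_{(\mu}h_{\nu)\rho}
\end{aligned} \tag{7.85}$$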
For simplicity, assume that no matter is present. At first order, the linearized
Einstein equation is $G^{(1)}_{\mu\nu}[h] = 0$ as before. At second order we have
$$G^{(1)}_{\mu\nu}[h^{(2)}] = 8\pi\,t_{\mu\nu}[h] \tag{7.86}$$
where
$$t_{\mu\nu}[h] \equiv -\frac{1}{8\pi}G^{(2)}_{\mu\nu}[h] \tag{7.87}$$
Equation (7.86) is the equation of motion for h(2) . If h satisfies the linear Einstein
equation then we have $R^{(1)}_{\mu\nu}[h] = 0$ so the above results give
$$t_{\mu\nu}[h] = -\frac{1}{8\pi}\left(R^{(2)}_{\mu\nu}[h] - \frac{1}{2}\eta^{\rho\sigma}R^{(2)}_{\rho\sigma}[h]\,\eta_{\mu\nu}\right) \tag{7.88}$$
Consider now the contracted Bianchi identity g µρ ∇ρ Gµν = 0. Expanding this, at first
order we get
$$\partial^\mu G^{(1)}_{\mu\nu}[h] = 0 \tag{7.89}$$
for arbitrary first order perturbation h (i.e. not assuming that h satisfies any equation
of motion). At second order we get
where the final term denotes schematically the terms that arise from the first order
change in the inverse metric and the Christoffel symbols in $g^{\mu\rho}\nabla_\rho$. Now, since (7.89)
holds for arbitrary h, it holds if we replace h with h(2) so $\partial^\mu G^{(1)}_{\mu\nu}[h^{(2)}] = 0$. If we now
assume that h satisfies its equation of motion G(1) [h] = 0 then the final term in (7.90)
vanishes and this equation reduces to
∂ µ tµν = 0. (7.91)
Hence tµν is a symmetric tensor (in the sense of special relativity) that is (i) quadratic
in the linear perturbation h, (ii) conserved if h satisfies its equation of motion, and (iii)
appears on the RHS of the second order Einstein equation (7.86). This is a natural
candidate for the energy momentum tensor of the linearized gravitational field.
Unfortunately, tµν suffers from a major problem: it is not invariant under a gauge
transformation (7.14). This is how the impossibility of localizing gravitational energy
arises in linearized theory.
Nevertheless, it can be shown that the integral of t00 over a surface of constant
time t = x0 is gauge invariant provided one considers hµν that decays at infinity, and
restricts to gauge transformations which preserve this property. This integral provides
a satisfactory notion of the total energy in the linearized gravitational field. Hence
gravitational energy does exist, but it cannot be localized.
One can use the second order Einstein equation (7.86) to convert the integral
defining the energy, which is quadratic in $h_{\mu\nu}$, into a surface integral at infinity which
is linear in $h^{(2)}_{\mu\nu}$. In fact the latter can be made fully nonlinear: these surface integrals
make sense in any spacetime which is “asymptotically flat”, irrespective of whether or
not the linearized approximation holds in the interior. This notion of energy is referred
to as the ADM energy. This is constant in time but there is a related quantity called
the Bondi energy, a non-increasing function of time. The rate of change of this can be
interpreted as the rate of energy loss in gravitational waves.
We shall follow a less rigorous, but more intuitive, approach in which we convert tµν
into a gauge-invariant quantity by averaging. Let R be an open subset of R4 containing
the origin, with typical coordinate size a in all directions (e.g. consider a sphere of
radius a). Let W(y) be a smooth function on R⁴ such that W = 0 outside R, W > 0
inside R and $\int_R W(y)\,d^4y = 1$. We define the average of a tensor $X_{\mu\nu}$ at a point with
“almost inertial” coordinates x^µ by
$$\langle X_{\mu\nu}\rangle(x) = \int_R W(x-y)\,X_{\mu\nu}(y)\,d^4y \tag{7.92}$$
Note that it makes sense to integrate Xµν because we are treating it as a tensor in
Minkowski spacetime, so we can add tensors at different points.
We are interested in averaging in the region far from the source, in which we have
gravitational radiation with some typical wavelength λ (in the notation used above
λ ∼ τ ). Assume that the components of Xµν have typical size X. Since the wavelength
of the radiation is λ, $\partial_\mu X_{\nu\rho}$ will have components of typical size X/λ. But the average
is
$$\langle\partial_\mu X_{\nu\rho}\rangle(x) = \int_R (\partial_\mu W)(x-y)\,X_{\nu\rho}(y)\,d^4y \tag{7.93}$$
where we have integrated by parts and used W = 0 on ∂R. Now $\partial_\mu W$ has components
of order W/a so the RHS above has components of order X/a. Hence if we choose
a ≫ λ then averaging has the effect of reducing $\partial_\mu X_{\nu\rho}$ by a factor of λ/a ≪ 1. So if we
choose a ≫ λ then we can neglect total derivatives inside averages. This implies that
we are free to integrate by parts inside averages:
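$$\langle\eta^{\mu\nu}R^{(2)}_{\mu\nu}[h]\rangle = 0 \tag{7.95}$$
2. Show that
$$\langle t_{\mu\nu}\rangle = \frac{1}{32\pi}\left\langle\partial_\mu\bar h_{\rho\sigma}\,\partial_\nu\bar h^{\rho\sigma} - \frac{1}{2}\partial_\mu\bar h\,\partial_\nu\bar h - 2\,\partial_\sigma\bar h^{\rho\sigma}\,\partial_{(\mu}\bar h_{\nu)\rho}\right\rangle \tag{7.96}$$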
As one would expect, there is a constant flux of energy and momentum travelling at
the speed of light in the z-direction.
$$\partial_0\bar h_{jk} = \frac{2}{r}\dddot I_{jk}(t-r) \tag{7.100}$$
and
$$\partial_i\bar h_{jk} = \left(-\frac{2}{r}\dddot I_{jk}(t-r) - \frac{2}{r^2}\ddot I_{jk}(t-r)\right)\hat x_i \tag{7.101}$$
The second term is smaller than the first by a factor τ/r ≪ 1 and so negligible for large
enough r. Hence
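$$-\frac{1}{32\pi}r^2\int d\Omega\,\langle\partial_0\bar h_{jk}\,\partial_i\bar h_{jk}\rangle\,\hat x_i = \frac{1}{2}\langle\dddot I_{ij}\dddot I_{ij}\rangle_{t-r} \tag{7.102}$$
On the RHS, the average is a time average, taken over an interval a ≫ λ ∼ τ centered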
on the retarded time t − r.
Next we have $\bar h_{0j} = (-2\hat x_k/r)\,\ddot I_{jk}(t-r)$ hence
$$\partial_0\bar h_{0j} = -\frac{2\hat x_k}{r}\dddot I_{jk}(t-r), \qquad \partial_i\bar h_{0j} \approx \frac{2\hat x_k}{r}\dddot I_{jk}(t-r)\,\hat x_i \tag{7.103}$$
where in the second expression we have used τ/r ≪ 1 to neglect the terms arising
from differentiation of $\hat x_k/r$. Hence
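$$-\frac{1}{32\pi}r^2\int d\Omega\,\langle -2\,\partial_0\bar h_{0j}\,\partial_i\bar h_{0j}\rangle\,\hat x_i = -\frac{1}{4\pi}\langle\dddot I_{jk}\dddot I_{jl}\rangle_{t-r}\int d\Omega\,\hat x_k\hat x_l \tag{7.104}$$
Now recall the following from Cartesian tensors: $\int d\Omega\,\hat x_k\hat x_l$ is isotropic (rotationally
invariant) and hence must equal λδkl for some constant λ. Taking the trace fixes
λ = 4π/3. Hence the RHS above is
$$-\frac{1}{3}\langle\dddot I_{ij}\dddot I_{ij}\rangle_{t-r} \tag{7.105}$$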
Next we use $\bar h_{00} = 4M/r + (2\hat x_j\hat x_k/r)\,\ddot I_{jk}(t-r)$ to give
$$\partial_0\bar h_{00} = \frac{2\hat x_j\hat x_k}{r}\dddot I_{jk}(t-r) \tag{7.106}$$
and
$$\partial_i\bar h_{00} \approx \left(-\frac{4M}{r^2} - \frac{2\hat x_j\hat x_k}{r}\dddot I_{jk}(t-r)\right)\hat x_i \approx -\frac{2\hat x_j\hat x_k}{r}\dddot I_{jk}(t-r)\,\hat x_i \tag{7.107}$$
where we’ve neglected terms arising from differentiation of $\hat x_j\hat x_k/r$ in the first equality
because in the radiation zone (τ/r ≪ 1) they’re negligible compared to the second
term we’ve retained. In the second equality we’ve neglected the first term in brackets
because this leads to a term in the integral proportional to $\langle\dddot I_{jk}\rangle$, which is the average
of a derivative and therefore negligible (we can’t argue that the second term in brackets
is large compared to the first: the second term is roughly d²r/τ³ times the first term,
and this factor is not necessarily large when r ≫ τ ≫ d). Hence we have
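$$-\frac{1}{32\pi}r^2\int d\Omega\,\langle\partial_0\bar h_{00}\,\partial_i\bar h_{00}\rangle\,\hat x_i = \frac{1}{8\pi}\langle\dddot I_{ij}\dddot I_{kl}\rangle_{t-r}\,X_{ijkl} \tag{7.108}$$
where
$$X_{ijkl} = \int d\Omega\,\hat x_i\hat x_j\hat x_k\hat x_l \tag{7.109}$$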
is another isotropic integral which we’ll evaluate below.
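Next we use $\bar h = \bar h_{jj} - \bar h_{00}$ and the above results to obtain
$$\partial_0\bar h = \frac{2}{r}\dddot I_{jj}(t-r) - \frac{2\hat x_j\hat x_k}{r}\dddot I_{jk}(t-r) \tag{7.110}$$
$$\partial_i\bar h = \left(-\frac{2}{r}\dddot I_{jj}(t-r) + \frac{2\hat x_j\hat x_k}{r}\dddot I_{jk}(t-r)\right)\hat x_i \tag{7.111}$$
and hence (using the result above for $\int d\Omega\,\hat x_i\hat x_j$)
$$-\frac{1}{32\pi}r^2\int d\Omega\,\left\langle -\frac{1}{2}\partial_0\bar h\,\partial_i\bar h\right\rangle\hat x_i = \left\langle -\frac{1}{4}\dddot I_{jj}\dddot I_{kk} + \frac{1}{6}\dddot I_{jj}\dddot I_{kk} - \frac{1}{16\pi}\dddot I_{ij}\dddot I_{kl}X_{ijkl}\right\rangle \tag{7.112}$$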
Putting everything together we have
$$\langle P\rangle_t = \left\langle\frac{1}{6}\dddot I_{ij}\dddot I_{ij} - \frac{1}{12}\dddot I_{ii}\dddot I_{jj} + \frac{1}{16\pi}\dddot I_{ij}\dddot I_{kl}X_{ijkl}\right\rangle_{t-r} \tag{7.113}$$
To evaluate $X_{ijkl}$, we use the fact that any isotropic Cartesian tensor is a product of δ
factors and ε factors. In the present case, $X_{ijkl}$ has rank 4 so we can only use δ terms so
we must have $X_{ijkl} = \alpha\delta_{ij}\delta_{kl} + \beta\delta_{ik}\delta_{jl} + \gamma\delta_{il}\delta_{jk}$ for some α, β, γ. The symmetry of $X_{ijkl}$
implies that α = β = γ. Taking the trace on ij and on kl indices then fixes α = 4π/15.
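The final term above is therefore
$$\frac{1}{60}\langle\dddot I_{ii}\dddot I_{jj} + 2\,\dddot I_{ij}\dddot I_{ij}\rangle \tag{7.114}$$
and hence
$$\langle P\rangle_t = \frac{1}{5}\left\langle\dddot I_{ij}\dddot I_{ij} - \frac{1}{3}\dddot I_{ii}\dddot I_{jj}\right\rangle_{t-r} \tag{7.115}$$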
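The two isotropic integrals used above can be checked by direct numerical quadrature over the unit sphere. This is a rough sketch using a midpoint rule in (θ, φ); the grid size and the function names are arbitrary choices made for illustration.

```python
import math


def sphere_integral(indices, n_theta=200, n_phi=200):
    """Midpoint-rule estimate of the integral over the unit sphere of the
    product of unit-vector components x_i, for i in indices (0, 1, 2 = x, y, z)."""
    d_theta = math.pi / n_theta
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for a in range(n_theta):
        theta = (a + 0.5) * d_theta
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        for b in range(n_phi):
            phi = (b + 0.5) * d_phi
            x_hat = (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)
            term = sin_t * d_theta * d_phi  # solid-angle element dOmega
            for i in indices:
                term *= x_hat[i]
            total += term
    return total


def delta(i, j):
    """Kronecker delta."""
    return 1.0 if i == j else 0.0


def isotropic_rank4(i, j, k, l):
    """The claimed result X_ijkl = (4 pi / 15)(d_ij d_kl + d_ik d_jl + d_il d_jk)."""
    return (4 * math.pi / 15) * (
        delta(i, j) * delta(k, l) + delta(i, k) * delta(j, l) + delta(i, l) * delta(j, k)
    )


print("int x_z x_z   :", sphere_integral((2, 2)), "vs", 4 * math.pi / 3)
print("X_zzzz        :", sphere_integral((2, 2, 2, 2)), "vs", isotropic_rank4(2, 2, 2, 2))
print("X_xxyy        :", sphere_integral((0, 0, 1, 1)), "vs", isotropic_rank4(0, 0, 1, 1))
```

The quadrature only confirms a few components to the accuracy of the grid; the isotropy argument in the text is what fixes the full tensor structure.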
Finally we define the energy quadrupole tensor, which is the traceless part of $I_{ij}$
$$Q_{ij} = I_{ij} - \frac{1}{3}I_{kk}\delta_{ij} \tag{7.116}$$
We then have
$$\langle P\rangle_t = \frac{1}{5}\langle\dddot Q_{ij}\dddot Q_{ij}\rangle_{t-r} \tag{7.117}$$
This is the quadrupole formula for energy loss via gravitational wave emission. It is
valid in the radiation zone far from a non-relativistic source, i.e., for r ≫ τ ≫ d.
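The step from (7.115) to (7.117) uses only the definition (7.116) and $\delta_{ij}\delta_{ij} = 3$. Written out as a short check:
$$\dddot Q_{ij}\dddot Q_{ij} = \left(\dddot I_{ij} - \tfrac{1}{3}\dddot I_{kk}\delta_{ij}\right)\left(\dddot I_{ij} - \tfrac{1}{3}\dddot I_{ll}\delta_{ij}\right) = \dddot I_{ij}\dddot I_{ij} - \tfrac{2}{3}\dddot I_{kk}\dddot I_{ll} + \tfrac{1}{3}\dddot I_{kk}\dddot I_{ll} = \dddot I_{ij}\dddot I_{ij} - \tfrac{1}{3}\dddot I_{ii}\dddot I_{jj},$$
which is exactly the combination appearing inside the average in (7.115).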
We conclude that an energy-momentum distribution whose quadrupole tensor is
varying in time will emit gravitational radiation. A spherically symmetric body has
Qij = 0 and so will not radiate, in agreement with Birkhoff’s theorem, which asserts
that the unique spherically symmetric solution of the vacuum Einstein equation is the
Schwarzschild solution. Hence the spacetime outside a spherically symmetric body is
time independent because it is described by the Schwarzschild solution.
Similarly for a matter distribution with energy density T00 we have defined the total
energy
$$E = \int d^3x\,T_{00} \tag{7.120}$$
where j is the electric current. The analogue of a magnetic dipole moment for a mass
distribution is
$$\mathbf{J} = \int d^3x\,\mathbf{x}\times(\rho\mathbf{u}) \tag{7.125}$$
where u is the local velocity of matter (i.e. T0i ≈ −ρui as in section 7.2). But this is
simply the total angular momentum of the matter, which is again conserved. So there is
no gravitational analogue of magnetic dipole radiation: it is forbidden by conservation
of angular momentum.
These arguments “explain” why there is no monopole or dipole gravitational radi-
ation. Gravitational quadrupole radiation is analogous to electric quadrupole radiation
in electromagnetic theory, which is the leading order effect when the electric and mag-
netic dipoles do not vary.
It is easy to detect electromagnetic radiation but gravitational radiation is very
hard to detect. This is because gravitational waves interact only very weakly with
matter (or with each other). This is equivalent to the familiar statement that gravity
is a very weak force - the weakest known force in Nature, and much weaker than the
electromagnetic force.
One way to see this is to consider the energy flux F of a plane gravitational wave.
For example, take a wave with h ∼ 10⁻²¹ and ω/(2π) ∼ 100 Hz, the kind of signal
the LIGO detectors search for. From the 03 component of (7.97) we have, reinstating
factors of G and c to give a quantity with the correct dimensions
ω 2 c3 2
F∼ h ∼ 0.01Wm−2 (7.126)
32πG
where we are just working to an order of magnitude. This is the same as the energy
flux around 30m from a 100W light bulb. Of course an electromagnetic flux of this
magnitude is easily detectable - your eyes are doing it now. (However, the light has
much higher frequency so maybe a better comparison is with 100Hz electromagnetic
waves, i.e., low frequency radio waves, and these would also be easy to detect at a
flux of $0.01\,\mathrm{W\,m^{-2}}$.) But to detect the same energy flux in gravitational waves requires
spending a billion dollars to construct a state of the art detector! A large energy flux
produces only a very small effect on the detector because gravity interacts with matter
so weakly.
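The estimate (7.126) and the light bulb comparison can be checked numerically. The sketch below is only an order-of-magnitude illustration: the SI values of G and c are standard constants that are not quoted in these notes, and the bulb is assumed to radiate isotropically. Both fluxes come out within an order of magnitude of $0.01\,\mathrm{W\,m^{-2}}$.

```python
import math

# Standard SI constants (assumed values, not taken from the notes)
G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2
c = 2.998e8        # speed of light, m s^-1

h = 1e-21                      # dimensionless wave amplitude
omega = 2 * math.pi * 100.0    # angular frequency of a 100 Hz wave, rad/s

# Order-of-magnitude energy flux of a plane gravitational wave, eq. (7.126)
flux_gw = omega**2 * c**3 * h**2 / (32 * math.pi * G)

# Flux at 30 m from a 100 W source, assumed to radiate isotropically
flux_bulb = 100.0 / (4 * math.pi * 30.0**2)

print(f"gravitational wave flux ~ {flux_gw:.1e} W/m^2")
print(f"light bulb flux at 30 m ~ {flux_bulb:.1e} W/m^2")
```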
The weakness of gravity has some advantages. Because gravitational waves do
not interact much with matter, they do not suffer much distortion as they propagate
through the Universe. Unlike electromagnetic waves, they are not absorbed or scattered
by matter. So the waves received by a detector are essentially the same as the waves
emitted by the source, adjusted for cosmological expansion.
the case when the stars both have mass $M$, their separation is $d$ and the orbital period is
$\tau$ so the angular velocity is $\omega \sim \tau^{-1}$. Then Newton's second law gives $M\omega^2 d \sim M^2/d^2$
which gives $\omega \sim M^{1/2} d^{-3/2}$. The quadrupole tensor has components of typical size $Md^2$
so the third derivative is of size $Md^2/\tau^3 \sim Md^2\omega^3 \sim (M/d)^{5/2}$. Hence we obtain the
order of magnitude estimate
$$P \sim (M/d)^5. \tag{7.127}$$
The power radiated in gravitational waves is greatest when M/d is as large as possible.
Note the large power (5) on the RHS of this equation: if M/d decreases by an order of
magnitude then P decreases by 5 orders of magnitude. To get a large P we need the
system to have M/d as large as possible, i.e., it has to be as compact as possible.
To understand the size of $P$, recall that we've used units $G = c = 1$ in (7.127),
which means that we are measuring $P$ in units of the Planck luminosity
$$L_{\rm Planck} = \frac{c^5}{G} \approx 4 \times 10^{52}\,\mathrm{W} \tag{7.128}$$
This is a mind-bogglingly enormous luminosity. The electromagnetic luminosity of the
Sun is $L_\odot \approx 4 \times 10^{26}\,\mathrm{W} \approx 10^{-26} L_{\rm Planck}$. There are roughly $10^{10}$ galaxies in the observable
Universe, so if a typical galaxy contains $10^{11}$ stars we can estimate the electromagnetic
luminosity of all the stars in the Universe as $10^{21} L_\odot \approx 10^{-5} L_{\rm Planck}$. Hence a binary
with $M/d \gtrsim 10^{-1}$ would emit more power in gravitational radiation than all the stars
in the Universe emit in electromagnetic radiation!
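These numbers are easy to reproduce. The following sketch uses standard SI values for G and c and the nominal solar luminosity, none of which are quoted in the notes, and it treats every star as Sun-like. It shows that the combined stellar luminosity and the power of a binary with $M/d = 10^{-1}$ are both of order $10^{-5} L_{\rm Planck}$.

```python
# Standard constants (assumed values, not taken from the notes)
G = 6.674e-11       # m^3 kg^-1 s^-2
c = 2.998e8         # m s^-1
L_sun = 3.828e26    # nominal solar luminosity, W

# Planck luminosity, eq. (7.128)
L_planck = c**5 / G

# 10^10 galaxies times 10^11 Sun-like stars each
L_all_stars = 1e10 * 1e11 * L_sun
stars_fraction = L_all_stars / L_planck

# Power of a binary with M/d = 0.1 in units of L_planck, eq. (7.127)
binary_fraction = 0.1**5

print(f"L_planck        ~ {L_planck:.1e} W")
print(f"all stars       ~ {stars_fraction:.1e} L_planck")
print(f"binary, M/d=0.1 ~ {binary_fraction:.1e} L_planck")
```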
How big can $M/d$ be? Obviously $d$ cannot be smaller than the size $R$ of the stars
themselves. However, a normal star has radius $R \gg M$. For example, the Sun has
$R_\odot \approx 7 \times 10^5$ km and $M_\odot \approx 1.5$ km so $M_\odot/R_\odot \sim 2 \times 10^{-6}$ hence a binary made of Sun-like
stars would have $M/d \ll 10^{-6}$ as $d \gg R$. To obtain a larger amount of radiation we
need to consider binary systems made of much more compact bodies, i.e., bodies with
M/R as large as possible. The most compact objects in Nature are black holes, whose
size is given by the Schwarzschild radius R = 2M (anything more compact than this
would collapse to form a black hole: see the Black Holes course). Almost as compact
are neutron stars: stars made of nuclear matter held together by gravity, like a giant
atomic nucleus. So the binaries which are expected to emit the most gravitational
radiation are tightly bound NS/NS, NS/BH or BH/BH systems (NS: neutron star, BH:
black hole).
The emission of gravitational radiation causes the shape of the orbit to change over
time. To a good approximation, valid when the stars are far apart and moving non-
relativistically, we can calculate this by letting the radius of the Newtonian orbit vary
slowly with time. The energy of a Newtonian orbit is $E \sim -M^2/d$ so $d$ decreases as the
system loses energy via gravitational radiation. Hence the orbital period $\tau \sim d^{3/2} M^{-1/2}$
also decreases. To calculate how $d$ varies with time, we equate $dE/dt$ with $-P$. (See
Examples sheet 3.) This approximation can be improved by including higher order,
post-Newtonian, effects.
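At the order-of-magnitude level used above, the scaling of the inspiral can be sketched as follows. Numerical factors are dropped throughout, and the symbols $d_0$ (initial separation) and $t_{\rm merge}$ (time to merger) are introduced here only for illustration; the precise calculation is the one on the examples sheet.

```latex
\begin{aligned}
\frac{dE}{dt} \sim \frac{M^2}{d^2}\,\dot{d} &\sim -\left(\frac{M}{d}\right)^5
\quad\Rightarrow\quad \dot{d} \sim -\frac{M^3}{d^3}, \\
d^4(t) &\sim d_0^4 - M^3 t
\quad\Rightarrow\quad t_{\rm merge} \sim \frac{d_0^4}{M^3}.
\end{aligned}
```

The steep dependence on $d_0$ is why the early inspiral is so slow compared with the final stages.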
This prediction of GR has been confirmed observationally. In 1974, Hulse and
Taylor detected a binary pulsar. This is a NS/NS binary in which one of the stars is a
pulsar, i.e., it emits a beam of radio waves in a certain direction. This star is rotating
very rapidly and the beam (which is not aligned with the rotation axis) periodically
points in our direction. Hence we receive pulses of radiation from the star. The period
between successive pulses (about 0.05s) is very stable and has been measured to very
high accuracy. Therefore it acts like a clock that we can observe from Earth. Using this
clock we can determine the orbital period (about 7.75h) of the binary system, again
with good accuracy. The emission of gravitational waves causes the period of the orbit
to decrease by about 76µs per year. This small effect has been measured and the result
confirms the quadrupole formula to an accuracy of 0.3% (the accuracy increases the
longer the system is observed). This is very strong indirect evidence for the existence
of gravitational waves, for which Hulse and Taylor received the Nobel Prize in 1993.
(The gravitational wave luminosity of the Hulse-Taylor system is about $0.02\,L_\odot$.)
As a compact binary system loses energy to radiation, the radius of the orbit
shrinks and eventually the two bodies in the system will collide and merge to form
a single black hole (it is unlikely to be a neutron star because a NS cannot have a
mass greater than about $2M_\odot$). As the system approaches merger, the velocities of the
two bodies become very large, a significant fraction of the speed of light. For such a
system, the post-Newtonian expansion is useless and the only way of predicting what
will happen is to solve the Einstein equation numerically on a supercomputer. As
the bodies approach merger, the luminosity P still increases in rough agreement with
(7.127), with the luminosity peaking at the merger. (Numerical simulations indicate
that the largest possible peak luminosity in a BH/BH merger is about $0.002\,L_{\rm Planck}$.)
Hence the strongest sources of gravitational waves are expected to be compact binaries
just before merger.
To discuss the direct detection of gravitational waves from a merging compact
binary, we need to estimate the amplitude of the gravitational waves from such a
source. At a distance $r$, the above estimates give (using $I_{ij} \sim Md^2$)
$$\bar{h}_{ij} \sim \frac{Md^2}{\tau^2 r} \sim \frac{M^2}{dr} \tag{7.129}$$
We can use this to estimate the amplitude of waves arriving at a detector on Earth.
We take r to be the distance within which we expect there to exist sufficiently many
suitable binary systems that at least one will merge within a reasonable time, say 1
year (we don’t want to have to wait for 100 years to detect anything!). The process of
gradual inspiral to final merger is very slow, taking billions of years for plausible initial
conditions (see Examples sheet 3). This implies that r must be of cosmological size:
of the order of $3 \times 10^8$ light years. Taking $M$ to be about ten times the mass of the
Sun and $d$ to be ten times the Schwarzschild radius gives $h \sim 10^{-21}$ and waves with a
frequency of 100–1000 Hz. This is the kind of signal that the two LIGO detectors (in
the US) and the Virgo detector (in Italy) are designed to detect.
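As a rough check of these figures, one can evaluate (7.129) and $\omega \sim M^{1/2} d^{-3/2}$ directly. The sketch below assumes standard SI values for G, c, the solar mass and the light year (none of which appear in the notes), and it uses the standard result, not stated in this passage, that a circular binary radiates at twice its orbital frequency. Since all numerical factors have been dropped, the output should only be expected to land within an order of magnitude of the quoted values.

```python
import math

# Standard constants (assumed values, not taken from the notes)
G = 6.674e-11          # m^3 kg^-1 s^-2
c = 2.998e8            # m s^-1
M_sun = 1.989e30       # kg
light_year = 9.461e15  # m

# Work in units G = c = 1, so masses are lengths (metres)
M = 10 * G * M_sun / c**2      # ten solar masses
d = 10 * (2 * M)               # ten times the Schwarzschild radius 2M
r = 3e8 * light_year           # distance to the source

# Wave amplitude at the detector, eq. (7.129)
h = M**2 / (d * r)

# Orbital angular velocity omega ~ M^(1/2) d^(-3/2), converted to rad/s
omega = math.sqrt(M / d**3) * c

# Assumption: wave frequency is twice the orbital frequency omega / (2 pi)
f_gw = 2 * omega / (2 * math.pi)

print(f"h    ~ {h:.1e}")
print(f"f_gw ~ {f_gw:.0f} Hz")
```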
On examples sheet 3 you will investigate how properties of the source can be
deduced from properties of the gravitational waves detected. From the frequency of
the waves one can deduce the orbital frequency ω. If, during the “inspiral” phase, one
knows the frequency of the waves, and the rate of change of this frequency, then one
can deduce the so-called chirp mass
$$M_{\rm chirp} = \frac{(M_1 M_2)^{3/5}}{(M_1 + M_2)^{1/5}} \tag{7.130}$$
where $M_1$ and $M_2$ are the masses of the two objects. This gives a measure of the total
mass $M$ of the system. From the amplitude of the waves one can then estimate the
distance $r$ of the source using (7.129) (with $d \sim (M/\omega^2)^{1/3}$). The waves emitted immediately
after the merger occur as the final black hole is “settling down to equilibrium”.
The frequency and decay time of such waves are determined uniquely by the final black
hole, so this can be used to determine the final black hole mass. Of course, in practice,
parameters of the initial compact objects and final black hole are determined by finding
the best-fit theoretical prediction to the entire signal.
The first direct detection of gravitational waves was made by the LIGO collabo-
ration on 14 September 2015 (Fig. 18). This event is now referred to as GW150914
(the numbers are the date of the event). By comparing with the detailed predictions
of General Relativity (determined using the post-Newtonian expansion and numerical
simulations), it was deduced that these waves were emitted in the merger of a compact
binary at a distance of $1.4 \times 10^9$ light years. The masses of the objects in the binary
were estimated to be $36M_\odot$ and $29M_\odot$. Since these are well above the upper mass limit
for a neutron star, it was deduced that this was a BH/BH binary. The merger produced
a final BH with mass $62M_\odot$. The missing mass $3M_\odot$ was emitted as gravitational radiation.
The detected signal lasted for about 0.2s. The gravitational wave luminosity
at the merger was greater than the electromagnetic luminosity of all the stars in the
visible Universe!
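The numbers quoted for GW150914 can be combined with (7.130) and the earlier luminosity estimates. The sketch below is a crude illustration: the constants are standard SI values not given in the notes, the function name `chirp_mass` is introduced here, and the "mean power" simply spreads the radiated energy $3M_\odot c^2$ over the 0.2s duration of the detected signal, which underestimates the peak luminosity.

```python
# Standard constants (assumed values, not taken from the notes)
G = 6.674e-11      # m^3 kg^-1 s^-2
c = 2.998e8        # m s^-1
M_sun = 1.989e30   # kg

def chirp_mass(m1, m2):
    """Chirp mass, eq. (7.130), in the same units as m1 and m2."""
    return (m1 * m2) ** (3 / 5) / (m1 + m2) ** (1 / 5)

# Component masses of GW150914 in solar masses, as quoted in the text
m_chirp = chirp_mass(36.0, 29.0)

# Energy carried away by the waves: 3 solar masses
energy = 3.0 * M_sun * c**2            # joules
mean_power = energy / 0.2              # crude average over the 0.2 s signal, W

# Earlier estimate for all stars in the Universe: 10^21 L_sun, L_sun ~ 4e26 W
L_all_stars = 1e21 * 4e26

print(f"chirp mass ~ {m_chirp:.1f} solar masses")
print(f"mean power ~ {mean_power:.1e} W")
print(f"all stars  ~ {L_all_stars:.1e} W")
```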
Since this first detection, many further detections of gravitational waves from
BH/BH binaries have been made by LIGO and Virgo.
Figure 18. The signals detected by the two LIGO observatories on 14 September 2015. The
vertical axis is the change in length of the arms of the detectors. (Image credit: LIGO.)
The first direct detection of gravitational waves from a NS/NS binary was made
in August 2017. The detected signal was much longer than the previous detections
because lower mass binaries perform many more orbits whilst within the detectable
frequency range than higher mass binaries. However, for such systems the merger
occurs at too high a frequency for the gravitational waves emitted at the merger to be
detected.
This was the first gravitational wave detection which was accompanied by detection
of electromagnetic radiation. This radiation was emitted by the NS matter when the
stars were disrupted first by tidal forces and then by merger, and also by the matter left
over after the final BH formed. First gamma rays (arriving 2s after the gravitational
waves), then subsequently other types of electromagnetic waves were detected, with
the electromagnetic signal lasting for days after the event. From comparing the arrival
time of the gamma rays and the gravitational waves one can confirm the prediction
that gravitational waves travel at the speed of light. Detailed study of the properties of
NS/NS mergers is expected to lead to improved understanding of physics at the extreme
densities that occur inside a NS. Such events also have implications for cosmology.
From the electromagnetic radiation, one can deduce which galaxy the event occurred
in, which determines its redshift. By comparing this with the luminosity distance (r
above) one can measure the Hubble constant.
The direct detection of gravitational waves has the potential to revolutionize as-
tronomy. Until now we have always been looking at the Universe using electromagnetic
radiation. We now have a completely different way of studying the Universe. The
importance of this discovery was recognized in the award of the 2017 Nobel Prize in
8 Differential forms
8.1 Introduction
(X ∧ Y ) ∧ Z = X ∧ (Y ∧ Z) (8.3)
i.e. the wedge product is associative so we don’t need to include the brackets.
Remark. If we have a dual basis $\{f^\mu\}$ then the set of p-forms of the form $f^{\mu_1} \wedge f^{\mu_2} \ldots \wedge f^{\mu_p}$
give a basis for the space of all p-forms on M because we can write
$$X = \frac{1}{p!} X_{\mu_1 \ldots \mu_p} f^{\mu_1} \wedge f^{\mu_2} \ldots \wedge f^{\mu_p} \tag{8.4}$$
Proof. The components of the LHS and RHS are equal in normal coordinates at r for
any point r.
Exercises (examples sheet 4). Show that the exterior derivative enjoys the following
properties:
$$d(dX) = 0 \tag{8.7}$$
$$d(X \wedge Y) = (dX) \wedge Y + (-1)^p X \wedge dY \tag{8.8}$$
(where Y is a q-form) and
$$d(\phi^* X) = \phi^* dX \tag{8.9}$$
(where φ : N → M ), i.e. the exterior derivative commutes with pull-back.
Remark. The last property implies that the exterior derivative commutes with a Lie
derivative:
$$\mathcal{L}_V(dX) = d(\mathcal{L}_V X) \tag{8.10}$$
where V is a vector field.
Definition. X is closed if dX = 0 everywhere. X is exact if there exists a (p − 1)-form
Y such that X = dY everywhere.
Remark. Exact implies closed. The converse is true only locally:
Poincaré Lemma. If X is a closed p-form (p ≥ 1) then for any r ∈ M there exists a
neighbourhood O of r and a (p − 1)-form Y such that X = dY in O.
where the third equality is (8.11). Since $f^\mu_a = e^\mu_a$ we will usually denote basis vectors
as $e_\mu$ and dual basis covectors as $e^\mu$.
Remark. In (8.12) we have defined what it means to raise the Greek index labeling
the basis vector (this is not a tensor index!). Henceforth any Greek index can be
raised/lowered with the Minkowski metric.
Remark. We saw earlier (section 2.2) that any two orthonormal bases at p are related
by a Lorentz transformation which acts on the indices µ, ν:
$$e'^a_\mu = (A^{-1})^\nu{}_\mu e^a_\nu, \qquad \eta_{\mu\nu} A^\mu{}_\rho A^\nu{}_\sigma = \eta_{\rho\sigma} \tag{8.14}$$
$$\eta_{\mu\nu} e^\mu_a e^\nu_b e^b_\rho = \eta_{\mu\nu} e^\mu_a \delta^\nu_\rho = \eta_{\mu\rho} e^\mu_a = (e_\rho)_a = g_{ab} e^b_\rho \tag{8.16}$$
Hence the first equation is true when contracted with any basis vector so it is true in
general. This equation can be written as $e^\mu_a (e_\mu)_b = g_{ab}$. Raise the b index to get the
second equation.
Definition. The connection 1-forms $\omega^\mu{}_\nu$ are (using the Levi-Civita connection)
where the second equality uses the definition of the connection components (3.5) and
the final equality uses (8.11). Hence we can write the connection 1-forms as
Remark. The indices µ, ν on $\omega^\mu{}_\nu$ are not tensor indices: they do not transform
correctly under Lorentz transformations. This is just the fact that the components of
the connection are not tensor components. However, for each pair (µ, ν), we do have a
well-defined 1-form $(\omega^\mu{}_\nu)_a$.
Lemma. $(\omega_{\mu\nu})_a = -(\omega_{\nu\mu})_a$.
Proof.
$$de^\mu = -\omega^\mu{}_\nu \wedge e^\nu \tag{8.21}$$
where the final step uses the second equation of (8.15). Hence
and so
$$(de^\mu)_{ab} = 2\nabla_{[a} e^\mu_{b]} = -2 (\omega^\mu{}_\nu)_{[a} e^\nu_{b]} = -(\omega^\mu{}_\nu \wedge e^\nu)_{ab} \tag{8.24}$$
$$de^1 = -f^{-2} f' \, dr \wedge dr = 0 \tag{8.29}$$
$$de^2 = dr \wedge d\theta = (f/r)\, e^1 \wedge e^2 \tag{8.30}$$
$$de^3 = \sin\theta \, dr \wedge d\phi + r\cos\theta \, d\theta \wedge d\phi = (f/r)\, e^1 \wedge e^3 + (1/r)\cot\theta \, e^2 \wedge e^3 \tag{8.31}$$
The first of these suggests we try $\omega^0{}_1 = f' e^0$. This would give $\omega_{01} = -f' e^0$ and
hence $\omega_{10} = f' e^0$ so $\omega^1{}_0 = f' e^0$. This would make a vanishing contribution to $de^1$
($\omega^1{}_0 \wedge e^0 = 0$), which looks promising. The third equation suggests we try $\omega^2{}_1 = (f/r) e^2$,
which gives $\omega^1{}_2 = -(f/r) e^2$ which again would make a vanishing contribution to $de^1$.
The final equation suggests $\omega^3{}_1 = (f/r) e^3$ and $\omega^3{}_2 = (1/r)\cot\theta \, e^3$ and again these
will not spoil the second and third equations. So these connection 1-forms (and those
related by $\omega_{\mu\nu} = -\omega_{\nu\mu}$) must be the correct answer.
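Remark. From (3.8), the components of the covariant derivative of a vector field $Y^a$
are
$$\nabla_\nu Y^\mu = e_\nu(Y^\mu) + \Gamma^\mu_{\rho\nu} Y^\rho = \partial_\nu Y^\mu + \omega^\mu{}_{\rho\nu} Y^\rho \tag{8.32}$$
where $\partial_\mu \equiv e^\alpha_\mu \partial_\alpha$, α refers to a coordinate basis, and $\omega^\mu{}_{\rho\nu} \equiv (\omega^\mu{}_\rho)_\nu$.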
Exercise. What is the corresponding result for a (1, 1) tensor field?
Remark. We’ve seen how to define tensors in curved spacetime. But what about spinor
fields, i.e., fields of non-integer spin? If we use orthonormal bases, this is straightforward
because the structure of special relativity is manifest locally.
Remark. For now, we work at a single point p. Consider an infinitesimal Lorentz
transformation at p
$$A^\mu{}_\nu = \delta^\mu_\nu + \alpha^\mu{}_\nu \tag{8.33}$$
for infinitesimal α. The condition that this be a Lorentz transformation (the second
equation in (8.14)) gives the restriction
From these we deduce the Lorentz algebra (the square brackets denote a matrix com-
mutator)
$$[M^{\mu\nu}, M^{\rho\sigma}] = \ldots \tag{8.38}$$
$$\Gamma^\mu \Gamma^\nu + \Gamma^\nu \Gamma^\mu = 2\eta^{\mu\nu} \tag{8.41}$$
Definition. A field in the Lorentz representation D is a smooth map which, for any
point p and any orthonormal basis $\{e_\mu\}$ defined in a neighbourhood of p, gives a vector
ψ in the carrier space of the representation D, with the property that if $(p, \{e_\mu\})$
maps to ψ then $(p, \{e'_\mu\})$ maps to D(A)ψ where $\{e'_\mu\}$ is related to $\{e_\mu\}$ by the Lorentz
transformation A.
Example. Take D to be the vector representation. Then the carrier space is just $\mathbb{R}^n$ so
we can denote the resulting vector $\psi^\mu$. Then D(A) = A and so under a change of basis
we have $\psi'^\mu = A^\mu{}_\nu \psi^\nu$, which is just the usual transformation law for the components
of a vector.
Remark. Given a field transforming in some representation D, we can take a partial
derivative of its components in a coordinate basis, and then convert the result to our
orthonormal basis, i.e., $\partial_\mu \psi \equiv e^\alpha_\mu \partial_\alpha \psi$ where α refers to the coordinate basis. However,
since the matrix A can depend on position, the partial derivative of our field will no
longer transform homogeneously under a Lorentz transformation: the result will involve
derivatives of A. For tensor fields, we know how to resolve this problem: introduce the
covariant derivative. So now we need to extend the definition of covariant derivative
to a general representation D:
Definition. The covariant derivative of a field ψ transforming in a representation of
the Lorentz group with generators $T^{\mu\nu}$ is, in an orthonormal basis,
$$\nabla_\mu \psi = \partial_\mu \psi + \frac{1}{2} (\omega_{\nu\rho})_\mu T^{\nu\rho} \psi \tag{8.43}$$
Remark. One can show that this does indeed transform correctly, i.e., in a represen-
tation of the Lorentz group, under Lorentz transformations. The connection 1-forms
are sometimes referred to as the spin connection because of their role in defining the
covariant derivative for spinor fields.
Definition. The Dirac equation for a spin 1/2 field of mass m is
$$\Gamma^\mu \nabla_\mu \psi - m\psi = 0. \tag{8.44}$$
Lemma. The curvature 2-forms are given in terms of the connection 1-forms by
$$\Theta^\mu{}_\nu = d\omega^\mu{}_\nu + \omega^\mu{}_\rho \wedge \omega^\rho{}_\nu \tag{8.47}$$
Proof. Optional exercise. Direct calculation of the RHS, using the relation (8.19) and
equation (8.21). You’ll need to work out the generalization of (4.6) to a non-coordinate
basis.
Remark. This provides a convenient way of calculating the Riemann tensor in an
orthonormal basis. One calculates the connection 1-forms using (8.21) and then the
curvature 2-forms using (8.47). The components of $\Theta^\mu{}_\nu$ are $R^\mu{}_{\nu\rho\sigma}$ so one can read
off the Riemann tensor components. The only derivatives one needs to calculate are
exterior derivatives, which are usually fairly easy. In simple situations, this is much
faster than calculating the Riemann tensor in a coordinate basis using (4.6).
Example. We determined the connection 1-forms in the Schwarzschild spacetime
previously. From $\omega^0{}_1 = f' e^0$ we have
$$d\omega^0{}_1 = f' \, de^0 + f'' \, dr \wedge e^0 = f'^2 \, e^1 \wedge e^0 + f f'' \, e^1 \wedge e^0 \tag{8.48}$$
we also have
$$\omega^0{}_\rho \wedge \omega^\rho{}_1 = \omega^0{}_1 \wedge \omega^1{}_1 = 0 \tag{8.49}$$
where the first equality arises because $\omega^0{}_\rho$ is non-zero only for ρ = 1 and the second
equality is because $\omega^1{}_1 = \omega_{11} = 0$ (by antisymmetry). Combining these results we have
$$\Theta_{01} = -\Theta^0{}_1 = \left( f f'' + f'^2 \right) e^0 \wedge e^1 \tag{8.50}$$
and hence the only non-vanishing components of the form $R_{01\rho\sigma}$ are
$$R_{0101} = -R_{0110} = f f'' + f'^2 = \frac{1}{2} \left( f^2 \right)'' = -\frac{2M}{r^3} \tag{8.51}$$
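The last equality in (8.51) can be checked numerically. The sketch below assumes the standard Schwarzschild function $f(r) = \sqrt{1 - 2M/r}$, which is consistent with (8.51) but is not restated in this example, and it approximates $f'$ and $f''$ by central finite differences. The value of M, the sample radii and the helper names are arbitrary choices made here.

```python
import math

M = 1.0   # mass in units G = c = 1 (sample value)

def f(r):
    # Assumed Schwarzschild function: f^2 = 1 - 2M/r
    return math.sqrt(1.0 - 2.0 * M / r)

def first_derivative(func, r, eps=1e-3):
    # Central difference approximation to func'(r)
    return (func(r + eps) - func(r - eps)) / (2.0 * eps)

def second_derivative(func, r, eps=1e-3):
    # Central difference approximation to func''(r)
    return (func(r + eps) - 2.0 * func(r) + func(r - eps)) / eps**2

def riemann_0101(r):
    # R_0101 = f f'' + f'^2, from (8.50) and (8.51)
    return f(r) * second_derivative(f, r) + first_derivative(f, r) ** 2

for r in (3.0, 10.0, 50.0):
    print(r, riemann_0101(r), -2.0 * M / r**3)
```

The two printed columns should agree up to the finite difference error.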
Exercise (examples sheet 4). Determine the remaining curvature 2-forms $\Theta_{02}$, $\Theta_{03}$,
$\Theta_{12}$, $\Theta_{13}$, $\Theta_{23}$ (all others are related to these by (8.46)). Hence determine the Riemann
tensor components. Check that the Ricci tensor vanishes.
in any right-handed coordinate chart, where g denotes the determinant of $g_{\mu\nu}$ in this
chart.
Exercise. 1. Show that this definition is chart-independent. 2. Show that (in a RH
coordinate chart)
$$\epsilon^{12 \ldots n} = \pm \frac{1}{\sqrt{|g|}} \tag{8.53}$$
where the upper (lower) sign holds for Riemannian (Lorentzian) signature.
Lemma. $\nabla_a \epsilon_{b_1 \ldots b_n} = 0$ where ∇ is the Levi-Civita connection.
Proof. Use normal coordinates at p: the partial derivatives of $g_{\mu\nu}$ vanish hence so do
those of g. The Christoffel symbols also vanish at p so the result follows.
Lemma.
$$\epsilon^{a_1 \ldots a_p c_{p+1} \ldots c_n} \epsilon_{b_1 \ldots b_p c_{p+1} \ldots c_n} = \pm p!(n-p)!\, \delta^{a_1}_{[b_1} \cdots \delta^{a_p}_{b_p]} \tag{8.54}$$
where the upper (lower) sign holds for Riemannian (Lorentzian) signature.
Proof. Optional exercise.
Definition. On an oriented manifold with metric, the Hodge dual of a p-form X is the
(n − p)-form $\star X$ defined by
$$(\star X)_{a_1 \ldots a_{n-p}} = \frac{1}{p!} \epsilon_{a_1 \ldots a_{n-p} b_1 \ldots b_p} X^{b_1 \ldots b_p} \tag{8.55}$$
$$\star(\star X) = \pm(-1)^{p(n-p)} X \tag{8.56}$$
2. Maxwell's equations are $\nabla^a F_{ab} = -4\pi j_b$ and $\nabla_{[a} F_{bc]} = 0$ where $j^a$ is the current
density vector. These can be written as
$$d \star F = -4\pi \star j, \qquad dF = 0 \tag{8.59}$$
Exercise. Show that this is chart independent, i.e., if one replaces φ with another RH
coordinate chart $\phi' : O \to U'$ then one gets the same result.
Remark. How do we extend our definition to all of M ? The idea is just to chop M
up into regions such that we can use the above definition on each region, then sum the
resulting terms.
Definition. Let the RH charts $\phi_\alpha : O_\alpha \to U_\alpha$ be an atlas for M. Introduce a “partition
of unity”, i.e., a set of functions $h_\alpha : M \to [0, 1]$ such that $h_\alpha(p) = 0$ if $p \notin O_\alpha$, and
$\sum_\alpha h_\alpha(p) = 1$ for all p. We then define
$$\int_M X \equiv \sum_\alpha \int_{O_\alpha} h_\alpha X \tag{8.61}$$
Remarks.
1. It can be shown that this definition is independent of the choice of atlas and
partition of unity.
Definition. Let M be an oriented manifold with a metric g. Let $\epsilon$ be the volume form.
The volume of M is $\int_M \epsilon$. If f is a function on M then
$$\int_M f \equiv \int_M f \epsilon \tag{8.63}$$
This is an abuse of notation because the RHS refers to coordinates $x^\mu$ but M might not
be covered by a single chart. It has the advantage of making it clear that the integral
depends on the metric tensor.
The final equality defines the total charge on Σ. Hence we have a formula relating the
charge on Σ to the flux through the boundary of Σ. This is Gauss’ law.
Definition. $X \in T_p(M)$ is tangent to φ[S] at p if X is the tangent vector at p of a
curve that lies in φ[S]. $n \in T_p^*(M)$ is normal to a submanifold φ[S] if n(X) = 0 for any
vector X tangent to φ[S] at p.
Remark. The vector space of tangent vectors to φ[S] at p has dimension m. The
vector space of normals to φ[S] at p has dimension n − m. Any two normals to a
hypersurface are proportional to each other.
Definition. A hypersurface in a Lorentzian manifold is timelike, spacelike or null if
any normal is everywhere spacelike, timelike or null respectively.
Exercise. Show that any tangent vector to a spacelike hypersurface is spacelike and
any tangent vector to a null hypersurface is either null or spacelike.
$$dx^1(X) = X(x^1) = \frac{dx^1}{dt} = 0. \tag{8.69}$$
Hence $dx^1(X)$ vanishes for any X tangent to ∂M so $dx^1$ is normal to ∂M. Any other
normal to ∂M will be proportional to $dx^1$. If ∂M is timelike or spacelike then we can
construct a unit normal by dividing by the norm of $dx^1$:
$$n_a = \frac{(dx^1)_a}{\sqrt{\pm g^{bc} (dx^1)_b (dx^1)_c}} \quad \Rightarrow \quad g^{ab} n_a n_b = \pm 1 \tag{8.70}$$
One can show that this is chart independent. Here we choose the + sign if $dx^1$ is
spacelike and the − sign if $dx^1$ is timelike (+ if the metric is Riemannian). Note that
the vector $n^a$ “points out of” M if ∂M is timelike (or the metric is Riemannian) but
into M if ∂M is spacelike. This is to get the correct sign in the divergence theorem:
Divergence theorem. If ∂M is timelike or spacelike then
$$\int_M d^n x \sqrt{|g|}\, \nabla_a X^a = \int_{\partial M} d^{n-1}x \sqrt{|h|}\, n_a X^a \tag{8.71}$$
9 Lagrangian formulation
9.1 Scalar field action
You are familiar with the idea that the equation of motion of a point particle can be
obtained by extremizing an action. You may also know that the same is true for fields
in Minkowski spacetime. The same is true in GR. To see how this works, consider first
a scalar field, i.e., a function Φ : M → R and define the action as the functional
$$S[\Phi] = \int_M d^4x \sqrt{-g}\, L \tag{9.1}$$
where $\sqrt{|g|} = \sqrt{-g}$ because we assume Lorentzian signature, and L is the Lagrangian:
$$L = -\frac{1}{2} g^{ab} \nabla_a \Phi \nabla_b \Phi - V(\Phi) \tag{9.2}$$
and V (Φ) is called the scalar potential. Now consider a variation Φ → Φ + δΦ for
some function δΦ that vanishes on ∂M (∂M might be “at infinity”). The change in
the action is (working to linear order in δΦ)
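The steps of this variation can be sketched as follows. This is a reconstruction from (9.1), (9.2) and (9.5), not a copy of the original display, so the layout is an assumption. The total derivative in the second line integrates to a surface term on ∂M, which vanishes because δΦ vanishes there.

```latex
\begin{aligned}
\delta S &= \int_M d^4x \sqrt{-g} \left( -g^{ab} \nabla_a \Phi \nabla_b \delta\Phi - V'(\Phi)\, \delta\Phi \right) \\
&= \int_M d^4x \sqrt{-g} \left[ -\nabla_a \left( \delta\Phi \nabla^a \Phi \right) + \delta\Phi \nabla^a \nabla_a \Phi - V'(\Phi)\, \delta\Phi \right] \\
&= \int_M d^4x \sqrt{-g} \left( \nabla^a \nabla_a \Phi - V'(\Phi) \right) \delta\Phi
\end{aligned}
```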
Note that we have used the divergence theorem to “integrate by parts”. A formal way
of writing the final expression is
$$\delta S = \int_M d^4x \frac{\delta S}{\delta \Phi} \delta\Phi \tag{9.4}$$
where
$$\frac{\delta S}{\delta \Phi} \equiv \sqrt{-g} \left( \nabla^a \nabla_a \Phi - V'(\Phi) \right) \tag{9.5}$$
The factor of $\sqrt{-g}$ here means that this quantity is not a scalar (it is an example of
a scalar density). However $(1/\sqrt{-g})\, \delta S/\delta\Phi$ is a scalar. We've written things in this
strange way in order to be consistent with how we treat the gravitational field.
Demanding that δS vanishes for arbitrary δΦ gives us the equation of motion
δS/δΦ = 0, i.e.,
$$\nabla^a \nabla_a \Phi - V'(\Phi) = 0. \tag{9.6}$$
The particular choice $V(\Phi) = \frac{1}{2} m^2 \Phi^2$ gives the Klein-Gordon equation.
where L is a scalar constructed from the metric. An obvious choice for the Lagrangian
is L ∝ R. This gives the Einstein-Hilbert action
$$S_{EH}[g] = \frac{1}{16\pi} \int_M d^4x \sqrt{-g}\, R = \frac{1}{16\pi} \int_M \epsilon R \tag{9.8}$$
where the prefactor is included for later convenience and $\epsilon$ is the volume form. The
idea is that we regard our manifold M as fixed (e.g. $\mathbb{R}^4$) and $g_{ab}$ is determined by
extremizing S[g]. In other words, we consider two metrics $g_{ab}$ and $g_{ab} + \delta g_{ab}$ and
demand that S[g + δg] − S[g] should vanish to linear order in $\delta g_{ab}$. Note that $\delta g_{ab}$ is
the difference of two metrics and hence is a tensor field.
We need to work out what happens to $\epsilon$ and R when we vary $g_{\mu\nu}$. Recall the
formula for the determinant “expanding along the µth row”:
$$g = \sum_\nu g_{\mu\nu} \Delta^{\mu\nu} \tag{9.9}$$
where we are suspending the summation convention, µ is any fixed value, and $\Delta^{\mu\nu}$ is
the cofactor matrix, whose µν element is $(-1)^{\mu+\nu}$ times the determinant of the matrix
obtained by deleting row µ and column ν from the metric. Note that $\Delta^{\mu\nu}$ is independent
of the µν element of the metric. Hence
$$\frac{\partial g}{\partial g_{\mu\nu}} = \Delta^{\mu\nu} = g g^{\mu\nu} \tag{9.10}$$
where on the RHS we recall the formula for the inverse matrix $g^{\mu\nu}$ in terms of the
cofactor matrix. We can use this formula to determine how g varies under a small
change $\delta g_{\mu\nu}$ in $g_{\mu\nu}$ (reinstating the summation convention):
$$\delta g = \frac{\partial g}{\partial g_{\mu\nu}} \delta g_{\mu\nu} = g g^{\mu\nu} \delta g_{\mu\nu} = g g^{ab} \delta g_{ab} \tag{9.11}$$
(we can use abstract indices in the final equality since $g^{ab} \delta g_{ab}$ is a scalar) and hence
$$\delta \sqrt{-g} = \frac{1}{2\sqrt{-g}} (-\delta g) = \frac{1}{2} \sqrt{-g}\, g^{ab} \delta g_{ab} \tag{9.12}$$
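The formulae (9.11) and (9.12) can be tested numerically on a sample matrix. In the sketch below the metric components and the perturbation are hypothetical values chosen only for illustration, and the helper functions `minor`, `det` and `cofactor` are names introduced here. The determinant is computed by cofactor expansion, mirroring (9.9), and the linearized change is compared with the exact change.

```python
import math

def minor(m, i, j):
    """Matrix m with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by cofactor expansion along the first row, as in (9.9)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    """Element (i, j) of the cofactor matrix Delta."""
    return (-1) ** (i + j) * det(minor(m, i, j))

# Hypothetical symmetric metric components with Lorentzian signature
g = [[-1.0, 0.1, 0.0, 0.0],
     [0.1, 1.2, 0.2, 0.0],
     [0.0, 0.2, 0.9, 0.1],
     [0.0, 0.0, 0.1, 1.1]]

# Small symmetric perturbation delta g_{mu nu}
eps = 1e-6
dg = [[eps * (1 + mu + nu) for nu in range(4)] for mu in range(4)]
g_new = [[g[mu][nu] + dg[mu][nu] for nu in range(4)] for mu in range(4)]

det_g = det(g)

# (9.11): delta g = Delta^{mu nu} delta g_{mu nu} = g g^{mu nu} delta g_{mu nu}
delta_det_linear = sum(cofactor(g, mu, nu) * dg[mu][nu]
                       for mu in range(4) for nu in range(4))
delta_det_exact = det(g_new) - det_g

# (9.12): delta sqrt(-g) = (1/2) sqrt(-g) g^{ab} delta g_{ab},
# using g^{ab} delta g_{ab} = delta g / g from (9.11)
delta_sqrt_linear = 0.5 * math.sqrt(-det_g) * delta_det_linear / det_g
delta_sqrt_exact = math.sqrt(-det(g_new)) - math.sqrt(-det_g)

print(delta_det_exact, delta_det_linear)
print(delta_sqrt_exact, delta_sqrt_linear)
```

The exact and linearized values differ only at second order in the perturbation.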
Levi-Civita connections associated to $g_{ab} + \delta g_{ab}$ and $g_{ab}$). Since the difference of two
connections is a tensor, it follows that $\delta\Gamma^\mu_{\nu\rho}$ are components of a tensor $\delta\Gamma^a_{bc}$. These
components are easy to evaluate if we introduce normal coordinates at p for the unperturbed
connection: at p we have $g_{\mu\nu,\rho} = 0$ and $\Gamma^\mu_{\nu\rho} = 0$. For the perturbed connection
we therefore have, at p, (to linear order)
$$\begin{aligned} \delta\Gamma^\mu_{\nu\rho} &= \frac{1}{2} g^{\mu\sigma} \left( \delta g_{\sigma\nu,\rho} + \delta g_{\sigma\rho,\nu} - \delta g_{\nu\rho,\sigma} \right) \\ &= \frac{1}{2} g^{\mu\sigma} \left( \delta g_{\sigma\nu;\rho} + \delta g_{\sigma\rho;\nu} - \delta g_{\nu\rho;\sigma} \right) \end{aligned} \tag{9.14}$$
In the second equality, the semi-colon denotes a covariant derivative with respect to
the Levi-Civita connection associated to $g_{ab}$. The two expressions are equal because
Γ(p) = 0. The LHS and RHS are tensors so this is a basis independent result hence we
can use abstract indices:
$$\delta\Gamma^a_{bc} = \frac{1}{2} g^{ad} \left( \delta g_{db;c} + \delta g_{dc;b} - \delta g_{bc;d} \right) \tag{9.15}$$
p is arbitrary so this result holds everywhere.
Now consider the variation of the Riemann tensor. Again it is convenient to use
normal coordinates at p, so at p we have (using δ(ΓΓ) ∼ ΓδΓ = 0 at p)
where ∇ is the Levi-Civita connection of $g_{ab}$. Once again we can immediately replace
the basis indices by abstract indices:
and p is arbitrary so the result holds everywhere. Contracting gives the variation of
the Ricci tensor:
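$$\delta R_{ab} = \nabla_c \delta\Gamma^c_{ab} - \nabla_b \delta\Gamma^c_{ac} \tag{9.18}$$
Finally we have
$$\delta R = \delta(g^{ab} R_{ab}) = g^{ab} \delta R_{ab} + \delta g^{ab} R_{ab} \tag{9.19}$$
where $\delta g^{ab}$ is the variation in $g^{ab}$ (not the result of raising indices on $\delta g_{ab}$). Using
$\delta(g_{\mu\rho} g^{\rho\nu}) = \delta(\delta^\nu_\mu) = 0$ it is easy to show (exercise)
$$\delta g^{ab} = -g^{ac} g^{bd} \delta g_{cd} \tag{9.20}$$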
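For reference, one possible route through this exercise is sketched here; it is not necessarily the intended solution. Expand the variation of the product, contract with $g^{\sigma\mu}$, and use $g^{\sigma\mu} g_{\mu\rho} = \delta^\sigma_\rho$:

```latex
\begin{aligned}
0 = \delta(g_{\mu\rho} g^{\rho\nu}) &= \delta g_{\mu\rho}\, g^{\rho\nu} + g_{\mu\rho}\, \delta g^{\rho\nu} \\
\Rightarrow\quad g^{\sigma\mu} g_{\mu\rho}\, \delta g^{\rho\nu} &= -g^{\sigma\mu} g^{\rho\nu}\, \delta g_{\mu\rho} \\
\Rightarrow\quad \delta g^{\sigma\nu} &= -g^{\sigma\mu} g^{\nu\rho}\, \delta g_{\mu\rho}
\end{aligned}
```

Both sides are tensor components, so the result can be written with abstract indices as in (9.20).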
where
$$X^a = g^{bc} \delta\Gamma^a_{bc} - g^{ab} \delta\Gamma^c_{bc} \tag{9.22}$$
Hence the variation of the Einstein-Hilbert action is
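$$\begin{aligned} \delta S_{EH} &= \frac{1}{16\pi} \int_M \delta(\epsilon R) \\ &= \frac{1}{16\pi} \int_M \epsilon \left( \frac{1}{2} R g^{ab} \delta g_{ab} - R^{ab} \delta g_{ab} + \nabla_a X^a \right) \\ &= \frac{1}{16\pi} \int_M d^4x \sqrt{-g} \left( \frac{1}{2} R g^{ab} \delta g_{ab} - R^{ab} \delta g_{ab} + \nabla_a X^a \right) \end{aligned} \tag{9.23}$$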
The final term can be converted to a surface term on ∂M using the divergence theorem.
If we assume that $\delta g_{ab}$ has support in a compact region that doesn't intersect ∂M then
this term will vanish (because vanishing of $\delta g_{ab}$ and its derivative on ∂M implies that
$X^a$ will vanish on ∂M). Hence we have
$$\delta S_{EH} = \frac{1}{16\pi} \int_M d^4x \sqrt{-g} \left( -G^{ab} \right) \delta g_{ab} = \int_M d^4x \frac{\delta S_{EH}}{\delta g_{ab}} \delta g_{ab} \tag{9.24}$$
Remark. The Palatini procedure is a different way of deriving the Einstein equation
from the Einstein-Hilbert action. Instead of using the Levi-Civita connection, we allow
for an arbitrary torsion-free connection. The EH action is then a functional of both the
metric and the connection, which are to be varied independently. Varying the metric
gives the Einstein equation (but written with an arbitrary connection). Varying the
connection implies that the connection should be the Levi-Civita connection. When
matter is included, this works only if the matter action is independent of the connection
(as is the case for a scalar field or Maxwell field) or if the Levi-Civita connection is
used in the matter action.
here $L_{\rm matter}$ is a function of the matter fields (assumed to be tensor fields), their derivatives,
the metric and its derivatives. An example is given by the scalar field Lagrangian
discussed above. We define the energy momentum tensor by
$$T^{ab} = \frac{2}{\sqrt{-g}} \frac{\delta S_{\rm matter}}{\delta g_{ab}} \tag{9.28}$$
in other words, under a variation in $g_{ab}$ we have (after integrating by parts using the
divergence theorem to eliminate derivatives of $\delta g_{ab}$ if present)
$$\delta S_{\rm matter} = \frac{1}{2} \int_M d^4x \sqrt{-g}\, T^{ab} \delta g_{ab} \tag{9.29}$$
This definition clearly makes $T^{ab}$ symmetric.
Example. Consider the scalar field action we discussed previously.
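$$S = \int_M \epsilon L \tag{9.30}$$
with L given by (9.2). Using the results for $\delta\epsilon$ and $\delta g^{ab}$ derived above we have, under
a variation of $g_{ab}$:
$$\delta S = \int_M d^4x \sqrt{-g} \left[ \frac{1}{2} \nabla^a \Phi \nabla^b \Phi + \frac{1}{2} \left( -\frac{1}{2} g^{cd} \nabla_c \Phi \nabla_d \Phi - V(\Phi) \right) g^{ab} \right] \delta g_{ab} \tag{9.31}$$
Hence
$$T^{ab} = \nabla^a \Phi \nabla^b \Phi + \left( -\frac{1}{2} g^{cd} \nabla_c \Phi \nabla_d \Phi - V(\Phi) \right) g^{ab} \tag{9.32}$$
If we define the total action to be $S_{EH} + S_{\rm matter}$ then under a variation of $g_{ab}$ we have
$$\frac{\delta}{\delta g_{ab}} \left( S_{EH} + S_{\rm matter} \right) = \sqrt{-g} \left( -\frac{1}{16\pi} G^{ab} + \frac{1}{2} T^{ab} \right) \tag{9.33}$$
and hence demanding that $S_{EH} + S_{\rm matter}$ be extremized under variation of the metric
gives the Einstein equation
$$G_{ab} = 8\pi T_{ab} \tag{9.34}$$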
How do we know that our definition of Tab gives a conserved tensor? It follows from
the fact that Smatter is diffeomorphism invariant. In more detail, diffeomorphisms are a
gauge symmetry so the total action S = SEH + Smatter should be diffeomorphism invari-
ant in the sense that S[g, Φ] = S[φ∗ (g), φ∗ (Φ)] where Φ denotes the matter fields and
φ is a diffeomorphism. The Einstein-Hilbert action alone is diffeomorphism invariant.
Hence Smatter also must be diffeomorphism invariant. The easiest way of ensuring this
is to take it to be the integral of a scalar Lagrangian as we assumed above.
Now consider the effect of an infinitesimal diffeomorphism. As we saw when dis-
cussing linearized theory (eq (7.13)), an infinitesimal diffeomorphism shifts gab by
Matter fields also transform according to the Lie derivative (eq (7.12)), e.g. for a scalar
field:
$$\delta\Phi = \mathcal{L}_\xi \Phi = \xi^a \nabla_a \Phi \tag{9.36}$$
Let’s consider this scalar field case in detail. Assume that the matter Lagrangian
is an arbitrary scalar constructed from Φ, the metric, and arbitrarily many of their
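derivatives (e.g. there could be a term of the form $\nabla_a \nabla_b \Phi \nabla^a \nabla^b \Phi$ or $R\Phi^2$). Under an
infinitesimal diffeomorphism, (after integration by parts to remove derivatives from δΦ
and $\delta g_{ab}$)
$$\begin{aligned} \delta S_{\rm matter} &= \int_M d^4x \left( \frac{\delta S_{\rm matter}}{\delta\Phi} \delta\Phi + \frac{\delta S_{\rm matter}}{\delta g_{ab}} \delta g_{ab} \right) \\ &= \int_M d^4x \left( \frac{\delta S_{\rm matter}}{\delta\Phi} \xi^b \nabla_b \Phi + \frac{1}{2} \sqrt{-g}\, T^{ab} \delta g_{ab} \right) \end{aligned} \tag{9.37}$$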
The second term can be written
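$$\begin{aligned} \int_M d^4x \sqrt{-g}\, T^{ab} \nabla_a \xi_b &= \int_M d^4x \sqrt{-g} \left[ \nabla_a \left( T^{ab} \xi_b \right) - \left( \nabla_a T^{ab} \right) \xi_b \right] \\ &= -\int_M d^4x \sqrt{-g} \left( \nabla_a T^{ab} \right) \xi_b \end{aligned} \tag{9.38}$$
where we assume that $\xi_b$ vanishes on ∂M so the total derivative can be discarded. Now
diffeomorphism invariance implies that $\delta S_{\rm matter}$ must vanish for arbitrary $\xi_b$. Hence we
must have
$$\frac{\delta S_{\rm matter}}{\delta\Phi} \nabla^b \Phi - \sqrt{-g}\, \nabla_a T^{ab} = 0. \tag{9.39}$$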
$$\partial^\mu \partial_\mu \Phi - m^2 \Phi = 0 \tag{10.1}$$
Given the initial data Φ and $\partial_0 \Phi$ at time $x^0 = 0$, there is a unique solution to this
equation for $x^0 > 0$. What is the analogous statement for the gravitational field?
In GR, not only do we need to determine the metric tensor but we also need to
determine the manifold on which this tensor is defined. So it is not obvious what
constitutes a suitable set of initial data for solving Einstein’s equation. However, it
seems likely that we will want to prescribe data on an “initial” hypersurface Σ which
should correspond to a “moment of time”. This we interpret as the requirement that
Σ should be spacelike.
What data should be prescribed on Σ? Since the Einstein equation, like the Klein-
Gordon equation, is second order in derivatives, one would expect that prescribing the
spacetime metric and the “time derivative of the metric” on Σ should be enough. In
fact, it turns out that we do not need to prescribe a full spacetime metric tensor on
Σ, but only a Riemannian metric describing the intrinsic geometry of Σ, related to the
full spacetime metric by pull-back. A notion of “time derivative of the metric” on Σ is
provided by the extrinsic curvature tensor of Σ.
Remark. Extrinsic curvature is of interest also in the case for which $\Sigma$ is timelike so
we will allow for this possibility. We also allow for the possibility that our manifold is
Riemannian. So let $\Sigma$ denote a spacelike or timelike hypersurface with unit normal
$n_a$: $n_a n^a = \pm 1$, where the upper sign refers to timelike $\Sigma$ (or a Riemannian manifold)
and the lower sign to spacelike $\Sigma$.
Lemma. For $p \in \Sigma$ let $h^a{}_b = \delta^a_b \mp n^a n_b$ so $h^a{}_b n^b = 0$.
1. $h^a{}_c h^c{}_b = h^a{}_b$.
2. Any vector $X^a$ at $p$ can be written uniquely as $X^a = X_\parallel^a + X_\perp^a$ where $X_\parallel^a = h^a{}_b X^b$ is tangent to $\Sigma$ and $X_\perp^a = \pm n_b X^b n^a$ is normal to $\Sigma$.
3. If $X^a$ and $Y^a$ are tangent to $\Sigma$ then $h_{ab} X^a Y^b = g_{ab} X^a Y^b$.
Proof. Exercise.
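For readers who want to check their answer, here is one possible route through the algebra. It is a sketch, not necessarily the intended solution, and it uses only $h^a{}_b = \delta^a_b \mp n^a n_b$ and $n_a n^a = \pm 1$.

```latex
\begin{align*}
h^a{}_c h^c{}_b &= (\delta^a_c \mp n^a n_c)(\delta^c_b \mp n^c n_b)
   = \delta^a_b \mp 2 n^a n_b + (n_c n^c)\, n^a n_b
   = \delta^a_b \mp n^a n_b = h^a{}_b \\
X^a &= \delta^a_b X^b = (h^a{}_b \pm n^a n_b) X^b = X_\parallel^a + X_\perp^a,
   \qquad n_a X_\parallel^a = n_a h^a{}_b X^b = 0 \\
h_{ab} X^a Y^b &= g_{ab} X^a Y^b \mp (n_a X^a)(n_b Y^b) = g_{ab} X^a Y^b
   \quad \text{if } n_a X^a = 0 = n_b Y^b
\end{align*}
```

Uniqueness in property 2 follows by contracting $X^a = A^a + \beta n^a$ (with $n_a A^a = 0$) with $n_a$: this gives $\beta = \pm n_b X^b$, which fixes the normal part and hence also the tangential part.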
Remarks. $h_{ab} = g_{ab} \mp n_a n_b$ is symmetric which means that it doesn't matter whether
we write $h^a{}_b$ or $h_b{}^a$. Properties 1 and 2 show that $h^a{}_b$ is a projection onto $\Sigma$. Property
3 reveals that $h_{ab}$ can be interpreted as the metric induced on $\Sigma$. This is sometimes
called the first fundamental form of $\Sigma$.
Remark. Let $N_a$ be normal to $\Sigma$ at $p$ and consider parallel transport of $N_a$ along a
curve in $\Sigma$ with tangent vector $X^a$, i.e., $X^b \nabla_b N_a = 0$. Does $N_a$ remain normal to $\Sigma$?
To answer this, let $Y^a$ be another vector tangent to $\Sigma$ so $Y^a N_a = 0$ at $p$. Consider
how $Y^a N_a$ varies along the curve: $X(Y^a N_a) = X^b \nabla_b (Y^a N_a) = N_a X^b \nabla_b Y^a$. So $Y^a N_a$
vanishes along the curve iff the RHS vanishes. So if parallel transport within $\Sigma$ preserves
the property of being normal to $\Sigma$ then $(\nabla_X Y)_\perp = 0$ for any $X$, $Y$ tangent to $\Sigma$. The
converse is also true. This motivates the following definition.
Definition. Up to now, $n_a$ has been defined only on $\Sigma$ so first extend it to
a neighbourhood of $\Sigma$ in an arbitrary way (with unit norm). The extrinsic curvature
tensor (also called the second fundamental form) $K_{ab}$ is defined at $p \in \Sigma$ by
$K(X, Y) = -n_a (\nabla_{X_\parallel} Y_\parallel)^a$ where $X$, $Y$ are vector fields on $M$.
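Lemma. $K_{ab}$ is independent of how $n_a$ is extended and
$$K_{ab} = h^c{}_a h^d{}_b \nabla_c n_d$$
Proof. The RHS of the definition is
$$-n_d X_\parallel^c \nabla_c Y_\parallel^d = -X_\parallel^c \nabla_c (n_d Y_\parallel^d) + X_\parallel^c Y_\parallel^d \nabla_c n_d = X_\parallel^c Y_\parallel^d \nabla_c n_d = h^c{}_a X^a h^d{}_b Y^b \nabla_c n_d \qquad (10.3)$$
The final expression is linear in $X^a$ and $Y^b$ so the result follows. To demonstrate that
the result is independent of how $n_a$ is extended, consider a different extension $n'_a$, and
let $m_a = n'_a - n_a$, which vanishes on $\Sigma$. Then on $\Sigma$
$$X_\parallel^c Y_\parallel^d \nabla_c n'_d - X_\parallel^c Y_\parallel^d \nabla_c n_d = X_\parallel^c Y_\parallel^d \nabla_c m_d = X_\parallel^c \nabla_c (Y_\parallel^d m_d) = 0$$
where the second equality uses $m_a = 0$ on $\Sigma$ and the final equality follows because it is
the derivative along a curve tangent to $\Sigma$, along which $m_a = 0$.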
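A simple illustration, not taken from the notes: a sphere in flat Euclidean $\mathbb{R}^3$ (Riemannian case, upper sign), using Cartesian coordinates so that $\nabla$ reduces to $\partial$ and index position is immaterial. The choice of extension $n_i = x_i/r$ is an assumption made for convenience; the extrinsic curvature is computed from the final expression in (10.3), i.e. $K_{ab} = h^c{}_a h^d{}_b \nabla_c n_d$.

```latex
\begin{align*}
n_i &= \frac{x_i}{r}, \qquad r = \sqrt{x_k x_k}, \qquad n_i n_i = 1 \\
\partial_j n_i &= \frac{\delta_{ij}}{r} - \frac{x_i x_j}{r^3}
   = \frac{1}{r}\left(\delta_{ij} - n_i n_j\right) = \frac{h_{ij}}{r} \\
K_{ij} &= h_{ik} h_{jl}\,\partial_k n_l = \frac{1}{r}\, h_{ik} h_{jl} h_{kl} = \frac{h_{ij}}{r} \\
K &= \delta_{ij} K_{ij} = \frac{2}{r}
\end{align*}
```

On a sphere of radius $r_0$ this gives $K_{ij} = h_{ij}/r_0$: the extrinsic curvature is proportional to the induced metric and falls off as the sphere gets larger, as one would expect for a measure of how $\Sigma$ bends inside the ambient space.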
Remark. $n^b \nabla_c n_b = (1/2)\nabla_c (n_b n^b) = 0$ because $n_b n^b = \pm 1$. Hence we can also write
$$K_{ab} = h^c{}_a \nabla_c n_b$$
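A tensor at $p \in \Sigma$ is invariant under projection if
$$T^{a_1 \ldots a_r}{}_{b_1 \ldots b_s} = h^{a_1}{}_{c_1} \cdots h^{a_r}{}_{c_r} h^{d_1}{}_{b_1} \cdots h^{d_s}{}_{b_s} T^{c_1 \ldots c_r}{}_{d_1 \ldots d_s} \qquad (10.7)$$
Tensors at $p$ which are invariant under projection can be identified with tensors defined
on the submanifold $\Sigma$ at $p$, and vice-versa. More precisely, if $\phi : S \to M$ with $\phi[S] = \Sigma$
then tensors on $M$ invariant under projection can be identified with tensors on $S$. For
example, if $X$ is a vector on $S$ then $\phi_*(X)$ is a vector on $M$ that is invariant under
projection; we can define a metric $\phi^*(g)$ on $S$ and this corresponds to the tensor $h_{ab}$ on
$M$. (See Hawking and Ellis for more details on this correspondence.)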
Proposition. A covariant derivative $D$ on $\Sigma$ can be defined by projection of the
covariant derivative on $M$: for any tensor obeying (10.7) we define
$$D_a T^{b_1 \ldots b_r}{}_{c_1 \ldots c_s} = h^d{}_a h^{b_1}{}_{e_1} \cdots h^{b_r}{}_{e_r} h^{f_1}{}_{c_1} \cdots h^{f_s}{}_{c_s} \nabla_d T^{e_1 \ldots e_r}{}_{f_1 \ldots f_s} \qquad (10.8)$$
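One property that can be checked directly from (10.8) is that $D$ is compatible with the induced metric. The short computation below is a sketch using $h_{ab} = g_{ab} \mp n_a n_b$, $\nabla_d g_{ef} = 0$ and $h^a{}_b n_a = 0$.

```latex
\begin{align*}
D_a h_{bc} &= h^d{}_a h^e{}_b h^f{}_c \nabla_d h_{ef}
   = \mp h^d{}_a h^e{}_b h^f{}_c \nabla_d (n_e n_f) \\
 &= \mp h^d{}_a \left[ (h^f{}_c n_f)\, h^e{}_b \nabla_d n_e
      + (h^e{}_b n_e)\, h^f{}_c \nabla_d n_f \right] = 0
\end{align*}
```

Each term contains a projection of the normal, which vanishes, so $D_a h_{bc} = 0$.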
The first term is symmetric (because $\nabla$ is torsion-free). The second term involves
$$h^c{}_a h^d{}_b \nabla_c h^e{}_d = g^{ef} h^c{}_a h^d{}_b \nabla_c h_{df} = \mp g^{ef} h^c{}_a h^d{}_b n_f \nabla_c n_d = \mp n^e K_{ab} \qquad (10.10)$$
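For a vector field $X^a$ tangent to $\Sigma$, the definition (10.8) gives
$$D_c D_d X^a = h^e{}_c h^f{}_d h^a{}_g \nabla_e \left( h^h{}_f h^g{}_i \nabla_h X^i \right)$$
$$= h^e{}_c h^h{}_d h^a{}_i \nabla_e \nabla_h X^i + h^e{}_c h^f{}_d h^a{}_i (\nabla_e h^h{}_f) \nabla_h X^i + h^e{}_c h^h{}_d h^a{}_g (\nabla_e h^g{}_i) \nabla_h X^i$$
$$= h^e{}_c h^f{}_d h^a{}_g \nabla_e \nabla_f X^g \mp K_{cd} h^a{}_i n^h \nabla_h X^i \mp K_c{}^a n_i h^h{}_d \nabla_h X^i \qquad (10.12)$$
where we used (10.10) in the final two terms. The final term can be written (using $n_i X^i = 0$ on $\Sigma$)
$$\mp K_c{}^a n_i h^h{}_d \nabla_h X^i = \pm K_c{}^a X^i h^h{}_d \nabla_h n_i = \pm K_c{}^a K_{db} X^b \qquad (10.13)$$
where we used $X^i = h^i{}_b X^b$ because $X^a$ is tangent to $\Sigma$. We can now plug (10.13) into
(10.12): the second term on the RHS drops out when we antisymmetrize, leaving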
$$R'{}^a{}_{bcd} X^b = 2 h^e{}_{[c} h^f{}_{d]} h^a{}_g \nabla_e \nabla_f X^g \pm 2 K_{[c}{}^a K_{d]b} X^b \qquad (10.15)$$
The first term on the RHS can be written
$$2 h^e{}_c h^f{}_d h^a{}_g \nabla_{[e} \nabla_{f]} X^g = h^e{}_c h^f{}_d h^a{}_g R^g{}_{hef} X^h = h^e{}_c h^f{}_d h^a{}_g h^h{}_b R^g{}_{hef} X^b \qquad (10.16)$$
where we used the Ricci identity for $\nabla$ in the first equality and the fact that $X^a$ is
parallel to $\Sigma$ in the second. Moving everything to the LHS now gives
$$\left( R'{}^a{}_{bcd} - h^e{}_c h^f{}_d h^a{}_g h^h{}_b R^g{}_{hef} \mp 2 K_{[c}{}^a K_{d]b} \right) X^b = 0 \qquad (10.17)$$
The expression in brackets is invariant under projection onto $\Sigma$ and hence can be
identified with a tensor on $\Sigma$. $X^b$ is an arbitrary vector parallel to $\Sigma$. It follows that
the expression in brackets must vanish. The result follows upon relabelling dummy
indices.
Lemma. The Ricci scalar of $\Sigma$ is
$$R' = R \mp 2 R_{ab} n^a n^b \pm K^2 \mp K^{ab} K_{ab}$$
where $K \equiv K^a{}_a$.
Proof. $R' = h^{bd} R'{}^c{}_{bcd}$ (since $h^{bd}$ can be identified with the inverse metric on $\Sigma$). Now
use Gauss' equation.
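The contraction can be spelled out as follows. This is a sketch assuming Gauss' equation in the form obtained by setting the bracket in (10.17) to zero, together with the convention $R_{bd} = R^c{}_{bcd}$ used in the proof.

```latex
\begin{align*}
R'{}^c{}_{bcd} &= h^e{}_g h^f{}_d h^h{}_b R^g{}_{hef}
   \pm \left( K K_{db} - K_d{}^c K_{cb} \right) \\
R' = h^{bd} R'{}^c{}_{bcd} &= h^e{}_g h^{hf} R^g{}_{hef}
   \pm \left( K^2 - K^{ab} K_{ab} \right) \\
h^e{}_g h^{hf} R^g{}_{hef} &= (\delta^e_g \mp n^e n_g)(g^{hf} \mp n^h n^f) R^g{}_{hef}
   = R \mp 2 R_{ab} n^a n^b \\
R' &= R \mp 2 R_{ab} n^a n^b \pm K^2 \mp K^{ab} K_{ab}
\end{align*}
```

The first line uses $h^e{}_c h^c{}_g = h^e{}_g$, and in the third line the term with four factors of $n$ vanishes by the antisymmetry of the Riemann tensor. As a sanity check, a sphere of radius $r$ in flat $\mathbb{R}^3$ (Riemannian, upper sign) has $K_{ab} = h_{ab}/r$, so $K = 2/r$ and $K^{ab}K_{ab} = 2/r^2$, giving $R' = 4/r^2 - 2/r^2 = 2/r^2$, the Ricci scalar of a round 2-sphere.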
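Proposition (Codacci's equation).
$$D_a K_{bc} - D_b K_{ac} = h^d{}_a h^e{}_b h^f{}_c R_f{}^g{}_{de} n_g$$
Proof. Using $K_{gf} = h^e{}_g \nabla_e n_f$,
$$D_a K_{bc} = h^d{}_a h^g{}_b h^f{}_c \nabla_d \left( h^e{}_g \nabla_e n_f \right)$$
$$= h^d{}_a h^g{}_b h^f{}_c h^e{}_g \nabla_d \nabla_e n_f + h^d{}_a h^g{}_b h^f{}_c (\nabla_d h^e{}_g) \nabla_e n_f$$
$$= h^d{}_a h^e{}_b h^f{}_c \nabla_d \nabla_e n_f \mp K_{ab} n^e h^f{}_c \nabla_e n_f \qquad (10.20)$$
where we used (10.10) in the second term. Antisymmetrizing on $a$, $b$, the final term drops out because $K_{ab}$ is symmetric:
$$2 D_{[a} K_{b]c} = 2 h^d{}_{[a} h^e{}_{b]} h^f{}_c \nabla_d \nabla_e n_f = 2 h^d{}_a h^e{}_b h^f{}_c \nabla_{[d} \nabla_{e]} n_f = h^d{}_a h^e{}_b h^f{}_c R_f{}^g{}_{de} n_g \qquad (10.21)$$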
where $\rho \equiv T_{ab} n^a n^b$ is the matter energy density measured by an observer with 4-velocity
$n^a$. $R'$ is determined by the metric on $\Sigma$, i.e., by $h_{ab}$. This equation reveals that we are
not free to specify $h_{ab}$ and $K_{ab}$ arbitrarily on $\Sigma$: they must be related by this equation,
which is called the Hamiltonian constraint.
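A constraint of this type can be recovered from the contracted Gauss equation, $R' = R \mp 2R_{ab}n^a n^b \pm K^2 \mp K^{ab}K_{ab}$. The sketch below assumes units with $G = 1$, the Einstein equation in the form $G_{ab} = 8\pi T_{ab}$ with no cosmological constant, and spacelike $\Sigma$ (lower sign, $n_a n^a = -1$); the normalisation of the right-hand side depends on these assumptions.

```latex
\begin{align*}
G_{ab} n^a n^b &= R_{ab} n^a n^b - \tfrac12 R\, n_a n^a = R_{ab} n^a n^b + \tfrac12 R \\
R' &= R + 2 R_{ab} n^a n^b - K^2 + K^{ab} K_{ab}
    = 2 G_{ab} n^a n^b - K^2 + K^{ab} K_{ab} \\
R' - K^{ab} K_{ab} + K^2 &= 2 G_{ab} n^a n^b = 16\pi\, T_{ab} n^a n^b = 16\pi \rho
\end{align*}
```

Only $h_{ab}$ (through $R'$), $K_{ab}$ and the matter density $\rho$ appear, with no second time derivatives of the metric, which is why this is a constraint on the initial data and not an evolution equation.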
Now consider the “normal-tangential” components of the Einstein equation by
contracting it with $n^a$ and then projecting onto $\Sigma$:
(a) $(M, g)$ satisfies the vacuum Einstein equation; (b) $M$ contains a hypersurface $\Sigma$
with induced metric $h_{ab}$ and extrinsic curvature tensor $K_{ab}$.
Remarks.
2. This is an existence theorem, but what about uniqueness? There are infinitely
many spacetimes (M, g) satisfying the above theorem e.g. given one such space-
time one can apply a diffeomorphism which is non-trivial only in some region
which doesn’t intersect Σ. This produces a different metric g satisfying the above
theorem. This is to be expected because diffeomorphisms are a gauge symmetry
in GR. Hence we should only expect (M, g) to be unique “up to diffeomorphisms”.
One can show that this is indeed true locally near Σ, e.g., for small enough T .
However, uniqueness can fail globally. Nevertheless, given any spacetime satis-
fying the above theorem, one can find a subset D(Σ) ⊂ M called the domain
of dependence of Σ (defined in the Black Holes course) within which the solu-
tion is unique up to diffeomorphisms. Furthermore, there is a “biggest possible”
spacetime for which uniqueness holds (up to diffeomorphisms): this is called the
maximal Cauchy development of the initial data.
3. The above theorem was stated for the vacuum Einstein equation. Analogous
theorems exist for the Einstein equation with suitable matter fields e.g. a Maxwell
field, a scalar field, or a perfect fluid. In this case, one must also specify initial
data for the matter fields.
4. The proof of the theorem makes use of the harmonic coordinates we discussed in
section 6.5. See Wald’s book for this proof.