General Relativity: The Notes: C.P. Burgess


Preprint typeset in JHEP style - HYPER VERSION

General Relativity: the Notes∗

C.P. Burgess
Department of Physics & Astronomy, McMaster University,
1280 Main Street West, Hamilton, Ontario, Canada, L8S 4M1.
Perimeter Institute for Theoretical Physics,
31 Caroline Street North, Waterloo, Ontario, Canada, N2L 2Y5.

Abstract: These notes present a brief introduction to Einstein’s General Theory of Relativity, prepared for the course Physics 3A03.


© Cliff Burgess, March 2009
Contents

1. Elements of Differential Geometry
1.1 The Geometry of Surfaces
1.2 More General Curved Space

2. Special Relativity and Flat Spacetime
2.1 Minkowski Spacetime
2.2 Inertial Particle Motion
2.3 Non-inertial Motion
2.4 Conserved Quantities

3. Weak Gravitational Fields
3.1 Newtonian Gravity
3.2 Gravity as Geometry
3.3 Relativistic Effects in the Solar System

4. Field Equations for Curved Space
4.1 Gravity as curvature
4.2 Einstein’s Field Equations
4.3 Rotationally Invariant Solutions

5. Compact Stars and Black Holes
5.1 Orbits
5.2 Radial geodesics
5.3 Singularities of the solution
5.4 Black Holes and Event Horizons
5.5 Quantum Effects Near Black Holes
5.6 Rotating Black Holes

6. Other Astrophysical Applications
6.1 Stellar interiors
6.2 Gravitational Lensing
6.3 Gravitational Waves
6.4 Binary pulsars
6.5 Astrophysical Black Holes

7. Cosmology
7.1 Kinematics of an Expanding Universe
7.2 Distance vs Redshift
7.3 Dynamics of an Expanding Universe
7.4 The Present-Day Energy Content
7.5 Earlier Epochs
7.6 Hot Big Bang Cosmology

1. Elements of Differential Geometry

The essence of general relativity is that gravity is described by the geometry of spacetime, and so this first section pauses to summarize some of the mathematics used to describe non-Euclidean geometries. Before doing so, we begin with a brief reminder about Euclidean geometry.

Euclidean Geometry
Euclid founded his study of plane (i.e. 2-dimensional) geometry on the following five
axioms:

1. Any two points can be joined by a straight line.

2. Any straight line segment can be extended indefinitely in a straight line.

3. Given any straight line segment, a circle can be drawn having the segment as
radius and one endpoint as center.

4. All right angles are congruent.

5. Parallel postulate: If two lines intersect a third in such a way that the sum of
the inner angles on one side is less than two right angles, then the two lines
inevitably must intersect each other on that side if extended far enough.

All of these seem to be obviously true, given the standard notions of what a point, straight line, circle, right angle and congruence mean. Among the consequences of these axioms are many familiar statements like: the ratio of a circle’s circumference, $C$, to its radius, $r$, is a universal number: $C/r = 2\pi$; the ratio of a circle’s area, $A$, to the square of its radius is also a universal number, $A/r^2 = \pi$; the interior angles of a triangle sum to 180 degrees, and so on. We are used to taking these consequences for granted when understanding the relations amongst objects in physical space.
The rest of this section is devoted to describing simple situations where they
do not all apply. Once this is done, it becomes an experimental issue whether or
not the Euclidean axioms are properties of the space in which we find ourselves
situated. The goal of this section is to develop the tools for this, by setting up a
precise characterization of these new geometries, and the ways they can differ from
Euclidean space.
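To make the failure of $C/r = 2\pi$ concrete before any formalism is developed, the following minimal Python sketch evaluates $C/r$ for a circle drawn on a sphere of radius $a$, with the radius $r$ measured along the sphere from the circle’s centre. It assumes the standard result $C = 2\pi a \sin(r/a)$, which these notes have not yet derived; the function name is hypothetical.

```python
import math


def circumference_ratio_on_sphere(r: float, a: float) -> float:
    """Return C/r for a circle whose radius r is measured along a sphere of radius a.

    Assumes the standard result C = 2*pi*a*sin(r/a): a circle of geodesic radius r
    sits at polar angle r/a, where its radius as seen in R^3 is a*sin(r/a).
    """
    return 2.0 * math.pi * a * math.sin(r / a) / r


a = 1.0
for r in (0.01, 0.5, math.pi / 2):
    ratio = circumference_ratio_on_sphere(r, a)
    print(f"r = {r:.3f}:  C/r = {ratio:.4f}   (Euclidean value 2*pi = {2 * math.pi:.4f})")
```

Small circles reproduce the Euclidean value almost exactly, while for the equator ($r = \pi a/2$) the ratio falls to exactly 4, so measuring $C/r$ is an experiment that can detect curvature from within the surface.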

1.1 The Geometry of Surfaces

The non-Euclidean geometries that are easiest to visualize are those of two-dimensional surfaces, such as planes, spheres or hyperboloids. These are easy to picture since we can envision these surfaces embedded in 3-dimensional space.
To this end consider the 3-dimensional vector space, $\mathbb{R}^3$, whose vectors, $\mathbf{r}$, describe the displacement from an (arbitrary) origin, $O$, to the various points in space. It is convenient to describe such a vector by its components referred to a ‘rectangular’ basis of unit vectors, $\{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$, oriented in a fixed but arbitrary direction, so that

$$\mathbf{r} = x\,\mathbf{e}_x + y\,\mathbf{e}_y + z\,\mathbf{e}_z = x^i\,\mathbf{e}_i , \tag{1.1}$$

where the three coordinates, $(x, y, z)$, can each run from $-\infty$ to $\infty$.
Some important notation is introduced in the second equality of eq. (1.1), which writes $x^1 = x$, $x^2 = y$, $x^3 = z$, and $\mathbf{e}_1 = \mathbf{e}_x$, $\mathbf{e}_2 = \mathbf{e}_y$ and $\mathbf{e}_3 = \mathbf{e}_z$. There is also an implied sum from 1 to 3 over the repeated index ‘$i$’, or any other repeated index taken from the middle of the Latin alphabet for that matter. (Indices taken from the beginning of the Latin alphabet are encountered later, where they run over $a, b = 1, 2$; and indices from the Greek alphabet also come up, and will be summed from $\mu, \nu = 0, 1, 2, 3$.) This rule for summing over repeated indices is called the Einstein summation convention, and in terms of it the dot product of two vectors with components $\mathbf{a} = a^i\,\mathbf{e}_i$ and $\mathbf{b} = b^i\,\mathbf{e}_i$ can be written $\mathbf{a}\cdot\mathbf{b} = \delta_{ij}\,a^i b^j$, where the Kronecker-$\delta$ symbol has the property that $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise.
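The summation convention maps directly onto NumPy’s `numpy.einsum`, whose subscript string names each index and sums over any index that does not appear in the output. A minimal sketch, with arbitrary illustrative components for $\mathbf{a}$ and $\mathbf{b}$:

```python
import numpy as np

# Components a^i, b^i of two vectors in the rectangular basis {e_x, e_y, e_z}.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, -1.0, 0.5])
delta = np.eye(3)  # Kronecker delta: delta[i, j] = 1 if i == j else 0

# a . b = delta_ij a^i b^j: the repeated indices i and j are summed over.
dot_einstein = np.einsum("ij,i,j->", delta, a, b)
dot_direct = a @ b
print(dot_einstein, dot_direct)
```

Both expressions give $1\cdot 4 + 2\cdot(-1) + 3\cdot 0.5 = 3.5$; the explicit `delta` simply makes the role of $\delta_{ij}$ visible.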
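We take the distance, $s(\mathbf{r}_1, \mathbf{r}_2)$, between any two points, $\mathbf{r}_1$ and $\mathbf{r}_2$, in $\mathbb{R}^3$ to be given in terms of their rectangular coordinates by the usual Pythagorean rule

$$\begin{aligned} s(\mathbf{r}_1, \mathbf{r}_2) = |\mathbf{r}_1 - \mathbf{r}_2| &= \sqrt{(\mathbf{r}_1 - \mathbf{r}_2)\cdot(\mathbf{r}_1 - \mathbf{r}_2)} \\ &= \sqrt{\delta_{ij}\,(x_1^i - x_2^i)(x_1^j - x_2^j)} \\ &= \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2} , \end{aligned} \tag{1.2}$$

where the middle line again uses the Einstein summation convention. This definition has the important property that it does not depend at all on the choice of origin, $O$, or on the orientation of the axes, $\mathbf{e}_i = \{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$, that are required to define the coordinates $x^i = \{x, y, z\}$ describing $\mathbf{r}_1$ and $\mathbf{r}_2$.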
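The independence of eq. (1.2) from the choice of origin and axes can be checked numerically. In this sketch the two points, the origin shift and the rotation angle are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import numpy as np


def distance(r1, r2):
    """Pythagorean distance of eq. (1.2): sqrt(delta_ij (x1^i - x2^i)(x1^j - x2^j))."""
    d = np.asarray(r1, dtype=float) - np.asarray(r2, dtype=float)
    return float(np.sqrt(np.einsum("ij,i,j->", np.eye(3), d, d)))


def rotation_about_z(angle):
    """An orthogonal matrix describing axes rotated by `angle` about e_z."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


r1 = np.array([1.0, 2.0, 2.0])
r2 = np.array([-1.0, 0.0, 1.0])

# Coordinates of the same two points relative to a shifted origin and rotated axes.
origin_shift = np.array([5.0, -3.0, 0.7])
R = rotation_about_z(0.8)
r1_new = R @ (r1 - origin_shift)
r2_new = R @ (r2 - origin_shift)

print(distance(r1, r2), distance(r1_new, r2_new))  # equal up to rounding
```

Here $\mathbf{r}_1 - \mathbf{r}_2 = (2, 2, 1)$, so both printed values are 3 up to floating-point rounding.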

Curves in Space
Before describing two-dimensional surfaces in $\mathbb{R}^3$, it is worth briefly digressing to describe the simpler case of one-dimensional curves. A curve in $\mathbb{R}^3$ is defined by the locus of points that are swept out as a single parameter varies:

$$\mathbf{r}(u) = x(u)\,\mathbf{e}_x + y(u)\,\mathbf{e}_y + z(u)\,\mathbf{e}_z = x^i(u)\,\mathbf{e}_i . \tag{1.3}$$

Here the parameter $u$ labels the points on the curve and our interest is usually in component functions $x^i(u) = \{x(u), y(u), z(u)\}$ that are multiply differentiable with respect to $u$.
For example, straight lines in this picture are described by linear functions, $\mathbf{r}(u) = \mathbf{a} + \mathbf{b}\,u$, where $\mathbf{a}$ and $\mathbf{b}$ are arbitrary constant vectors. When the origin, $O$, is not on the straight line (i.e. $\mathbf{a}$ is not a multiple of $\mathbf{b}$) then the origin together with the line define a plane, which is spanned by the vectors $\mathbf{a}$ and $\mathbf{b}$. More generally, a straight line is also given by $\mathbf{r}(u) = \mathbf{a} + \mathbf{b}\,f(u)$, for any function $f(u)$ that satisfies $df/du \neq 0$, since this simply represents a relabelling of the points along the curve.
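By contrast, a curve of the form $\mathbf{r}(u) = \mathbf{c} + \mathbf{a}\cos u + \mathbf{b}\sin u$ traces out a more complicated closed shape, which becomes an ellipse if $\mathbf{a}$ and $\mathbf{b}$ are perpendicular to one another: $\mathbf{a}\cdot\mathbf{b} = a_x b_x + a_y b_y + a_z b_z = 0$. In this case $\mathbf{c}$ specifies the position of the ellipse’s centre, and its two semi-axes are

$$\begin{aligned} a &= |\mathbf{a}| = \sqrt{\mathbf{a}\cdot\mathbf{a}} = \sqrt{a_x^2 + a_y^2 + a_z^2} = \sqrt{\delta_{ij}\,a^i a^j} , \\ b &= |\mathbf{b}| = \sqrt{\mathbf{b}\cdot\mathbf{b}} = \sqrt{b_x^2 + b_y^2 + b_z^2} = \sqrt{\delta_{ij}\,b^i b^j} . \end{aligned} \tag{1.4}$$

This ellipse is inscribed on the plane spanned by the vectors $\mathbf{a}$ and $\mathbf{b}$, and degenerates into a circle in the special case that $\mathbf{a}$ and $\mathbf{b}$ have the same length: $a = b$.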
The family of vectors that lie tangent to a curve $\mathbf{r}(u)$ is found by differentiation,

$$\mathbf{t}(u) = \frac{d\mathbf{r}}{du} = \frac{dx}{du}\,\mathbf{e}_x + \frac{dy}{du}\,\mathbf{e}_y + \frac{dz}{du}\,\mathbf{e}_z = \frac{dx^i}{du}\,\mathbf{e}_i , \tag{1.5}$$

and a one-parameter family of unit vectors tangent to the curve is found by normalizing

$$\mathbf{e}_t(u) = \frac{\mathbf{t}(u)}{|\mathbf{t}(u)|} , \tag{1.6}$$

so $\mathbf{e}_t\cdot\mathbf{e}_t = 1$ for all $u$. For a straight line, $\mathbf{r}(u) = \mathbf{a} + \mathbf{b}\,f(u)$, the tangent

$$\mathbf{t}(u) = \frac{d\mathbf{r}}{du} = \mathbf{b}\,\frac{df}{du} , \tag{1.7}$$

has a constant direction, but a $u$-dependent length that depends on the precise function $f(u)$ used to parametrize the curve. But for any parametrization the unit tangent vector for a straight line is a constant vector: $\mathbf{e}_t = \pm\mathbf{b}/|\mathbf{b}|$, with the sign fixed by that of $df/du$. The basis vectors, $\mathbf{e}_i = \{\mathbf{e}_x, \mathbf{e}_y, \mathbf{e}_z\}$, may themselves be regarded as unit tangent vectors to the curves defining the rectangular coordinate axes: that is, $\mathbf{e}_x$ is the unit tangent to the curves along which $y$ and $z$ are constant, and similarly for $\mathbf{e}_y$ and $\mathbf{e}_z$.
The tangent to the elliptical curve centered at the origin, $\mathbf{r}(u) = \mathbf{a}\cos u + \mathbf{b}\sin u$, is given by $\mathbf{t}(u) = -\mathbf{a}\sin u + \mathbf{b}\cos u$, whose direction changes continuously with $u$, with norm $|\mathbf{t}(u)| = \sqrt{a^2\sin^2 u + b^2\cos^2 u}$ (where we use $\mathbf{a}\cdot\mathbf{b} = 0$). In this case the unit tangent is $\mathbf{e}_t(u) = (-\mathbf{a}\sin u + \mathbf{b}\cos u)/|\mathbf{t}(u)|$. Notice that the inner product between the radius vector and the tangent is $\mathbf{t}(u)\cdot\mathbf{r}(u) = (b^2 - a^2)\sin u\cos u$, which vanishes for all $u$ in the case of a circle, where $b = a$.
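The two formulas just quoted can be spot-checked numerically. This sketch uses hypothetical perpendicular semi-axis vectors of lengths 2 and 1 and an arbitrary parameter value $u = 0.7$.

```python
import numpy as np

a_vec = np.array([2.0, 0.0, 0.0])  # semi-axis vector a (length 2)
b_vec = np.array([0.0, 1.0, 0.0])  # semi-axis vector b (length 1), with a . b = 0


def position(u):
    """r(u) = a cos u + b sin u, an ellipse centred on the origin."""
    return a_vec * np.cos(u) + b_vec * np.sin(u)


def tangent(u):
    """t(u) = dr/du = -a sin u + b cos u."""
    return -a_vec * np.sin(u) + b_vec * np.cos(u)


u = 0.7
a2, b2 = a_vec @ a_vec, b_vec @ b_vec
lhs = tangent(u) @ position(u)                    # t . r computed directly
rhs = (b2 - a2) * np.sin(u) * np.cos(u)           # the closed-form result
norm_formula = np.sqrt(a2 * np.sin(u) ** 2 + b2 * np.cos(u) ** 2)
print(lhs, rhs)
print(np.linalg.norm(tangent(u)), norm_formula)
```

Setting both vectors to the same length makes `lhs` vanish for every `u`, reproducing the circle case.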

Distances along curves
Measures of length and angle play a central role in geometry, and since angle (in radians) is defined in terms of ratios of lengths, the basic problem is how to measure length within curved surfaces. This section describes a first step in this direction: measuring length along curves.
The starting point is eq. (1.2), telling us how distances are measured in $\mathbb{R}^3$. We apply this to find the distance, $ds$, between two points on a curve, $\mathbf{r}(u)$ and $\mathbf{r}(u + du)$, that are infinitesimally far from one another:

$$ds = |\mathbf{r}(u + du) - \mathbf{r}(u)| = \left|\frac{d\mathbf{r}}{du}\right| du = \sqrt{\frac{d\mathbf{r}}{du}\cdot\frac{d\mathbf{r}}{du}}\; du = \sqrt{\delta_{ij}\,\frac{dx^i}{du}\frac{dx^j}{du}}\; du . \tag{1.8}$$
The arc-length along a finite-sized interval of the curve is then obtained by integration

$$s(u_1, u_2) = \int_{u_1}^{u_2} du\,\sqrt{\delta_{ij}\,\frac{dx^i}{du}\frac{dx^j}{du}} . \tag{1.9}$$

For example, for the circle $\mathbf{r}(u) = a(\mathbf{e}_x\cos u + \mathbf{e}_y\sin u)$ we have $d\mathbf{r}/du = a(-\mathbf{e}_x\sin u + \mathbf{e}_y\cos u)$ and so $ds = a\,du$, giving $s(u_1, u_2) = a(u_2 - u_1)$.
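Eq. (1.9) can also be evaluated numerically for curves where the integral is not easy to do by hand. The sketch below is one simple approach, assuming a uniform grid in $u$, finite-difference tangents from `numpy.gradient` and the trapezoid rule; `arc_length` is a hypothetical helper, and the circle radius is an illustrative choice.

```python
import numpy as np


def arc_length(r, u1, u2, n=2001):
    """Approximate eq. (1.9), s = integral of |dr/du| du, on a uniform grid.

    dr/du is estimated with np.gradient and the integral with the trapezoid rule,
    so the result is an approximation that improves as n grows.
    """
    u = np.linspace(u1, u2, n)
    points = np.array([r(ui) for ui in u])    # shape (n, 3): x^i(u) at each grid point
    dr_du = np.gradient(points, u, axis=0)    # finite-difference tangent t(u)
    speed = np.linalg.norm(dr_du, axis=1)     # |t(u)| = ds/du
    return float(np.sum(0.5 * (speed[1:] + speed[:-1]) * np.diff(u)))


a = 2.0
circle = lambda u: a * np.array([np.cos(u), np.sin(u), 0.0])
print(arc_length(circle, 0.0, np.pi), a * np.pi)  # numerical vs exact a (u2 - u1)
```

The same helper accepts any curve returning a 3-vector, for example the helix of Exercise 2 below.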
Arc-length provides a particularly physical way to parameterize a curve. Once this is done the tangent vector to a curve is automatically a unit vector. To see this consider a generic curve, $\mathbf{r}(u)$, defined using a generic parameter, $u$. The tangent vector computed using arc-length as a parameter is

$$\frac{d\mathbf{r}}{ds} = \frac{d\mathbf{r}}{du}\frac{du}{ds} = \frac{\mathbf{t}}{|\mathbf{t}|} = \mathbf{e}_t , \tag{1.10}$$

where $\mathbf{t} = d\mathbf{r}/du$ and eq. (1.8) is used to evaluate $du/ds = 1/|\mathbf{t}|$.
Curvature of curves
In addition to the unit tangent, $\mathbf{e}_t = d\mathbf{r}/ds$, there is also a natural family of orthonormal basis vectors that can be defined everywhere along a curve. A vector, $\mathbf{n}$, that is always perpendicular to $\mathbf{e}_t$ is found as above by differentiation with respect to arc length: $\mathbf{n}(s) = d\mathbf{e}_t/ds$. The fact that this definition gives a vector normal to $\mathbf{e}_t$ can be seen by differentiating the condition $\mathbf{e}_t\cdot\mathbf{e}_t = 1$, as follows:

$$\mathbf{e}_t\cdot\mathbf{n} = \mathbf{e}_t\cdot\frac{d\mathbf{e}_t}{ds} = \frac{1}{2}\frac{d}{ds}\left(\mathbf{e}_t\cdot\mathbf{e}_t\right) = 0 . \tag{1.11}$$

Figure 1: The Frenet-Serret basis vectors and the osculating plane (Wikipedia).

The plane spanned by $\mathbf{t}(s)$ and $\mathbf{n}(s)$ at each $s$ is called the osculating plane for the curve $\mathbf{r}(s)$. The vectors

$$\mathbf{e}_t(s) , \qquad \mathbf{e}_n(s) = \frac{\mathbf{n}(s)}{|\mathbf{n}(s)|} \qquad \text{and the cross product} \qquad \mathbf{e}_b(s) = \mathbf{e}_t(s)\times\mathbf{e}_n(s) , \tag{1.12}$$

give an orthonormal triad of vectors at each point along the curve, one of which is always tangent.
Because these vectors form a basis, their derivatives along the curve can be expanded in terms of them, leading to:

$$\frac{d\mathbf{e}_t}{ds} = \kappa\,\mathbf{e}_n , \qquad \frac{d\mathbf{e}_n}{ds} = -\kappa\,\mathbf{e}_t + \tau\,\mathbf{e}_b , \qquad \frac{d\mathbf{e}_b}{ds} = -\tau\,\mathbf{e}_n . \tag{1.13}$$

These expressions are known as the Frenet-Serret formulae, and the basis $\mathbf{e}_t$, $\mathbf{e}_n$ and $\mathbf{e}_b$ is called the Frenet-Serret basis. The coefficients in this expression give a differential measure of the curvature, $\kappa(s)$, and torsion, $\tau(s)$, at each point of the curve $\mathbf{r}(s)$.
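For a curve given in an arbitrary parametrization it is often easier to use the standard equivalent expressions $\kappa = |\mathbf{r}'\times\mathbf{r}''|/|\mathbf{r}'|^3$ and $\tau = (\mathbf{r}'\times\mathbf{r}'')\cdot\mathbf{r}'''/|\mathbf{r}'\times\mathbf{r}''|^2$ (primes denote $d/du$), which agree with eqs. (1.13) for the convention $\mathbf{e}_b = \mathbf{e}_t\times\mathbf{e}_n$. These formulas are not derived in the notes; the sketch below assumes them, and its function names are hypothetical.

```python
import numpy as np


def curvature_and_torsion(d1, d2, d3):
    """Curvature kappa and torsion tau from dr/du, d^2r/du^2 and d^3r/du^3.

    Uses kappa = |r' x r''| / |r'|^3 and tau = (r' x r'') . r''' / |r' x r''|^2,
    which do not depend on how the curve is parametrized.
    """
    cross = np.cross(d1, d2)
    kappa = np.linalg.norm(cross) / np.linalg.norm(d1) ** 3
    tau = np.dot(cross, d3) / np.dot(cross, cross)
    return kappa, tau


def helix_derivatives(u, a, ell):
    """First three u-derivatives of r(u) = a(e_x cos u + e_y sin u) + ell u e_z."""
    d1 = np.array([-a * np.sin(u), a * np.cos(u), ell])
    d2 = np.array([-a * np.cos(u), -a * np.sin(u), 0.0])
    d3 = np.array([a * np.sin(u), -a * np.cos(u), 0.0])
    return d1, d2, d3


# ell = 0 reduces the helix to a circle of radius a = 2.
for u in (0.0, 1.0, 2.5):
    print(u, curvature_and_torsion(*helix_derivatives(u, a=2.0, ell=0.0)))
```

With `ell=0.0` the output reproduces the circle values quoted in Exercise 2 below ($\kappa = 1/a$, $\tau = 0$); choosing `ell` nonzero gives a numerical check on the helix part of that exercise.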

Exercise 1: Use the definitions of $\mathbf{e}_t$, $\mathbf{e}_n$ and $\mathbf{e}_b$ to prove that only two parameters, $\kappa$ and $\tau$, are required to label their derivatives as in eqs. (1.13).

Notice that the definitions show that κ = τ = 0 for a straight line. Conversely,
if κ and τ vanish for all u, then eqs. (1.13) can be integrated twice to show that the
corresponding curve, r(u), is a straight line. Similarly, if τ should vanish for all u
(with κ(u) arbitrary), then the curve must be confined to the plane that is normal
to the constant vector eb .

Exercise 2: Show that the curvature and torsion of the curve $\mathbf{r}(s) = a[\mathbf{e}_x\cos(s/a) + \mathbf{e}_y\sin(s/a)]$ (a circle of radius $a$) are constant, with $\kappa = 1/a$ and $\tau = 0$. Repeat for the helical curve $\mathbf{r}(u) = a(\mathbf{e}_x\cos u + \mathbf{e}_y\sin u) + \ell\,u\,\mathbf{e}_z$, keeping in mind that the arc-length in this case satisfies $s = u\sqrt{a^2 + \ell^2}$.

Surfaces in $\mathbb{R}^3$
A two-dimensional surface embedded in $\mathbb{R}^3$ is similarly defined by the locus of points swept out by a two-parameter family,

$$\mathbf{r}(u, v) = x^i(u, v)\,\mathbf{e}_i = x(u, v)\,\mathbf{e}_x + y(u, v)\,\mathbf{e}_y + z(u, v)\,\mathbf{e}_z . \tag{1.14}$$

Alternatively, it is sometimes more convenient to define the surface implicitly, rather than explicitly, such as through an algebraic condition of the form $f(\mathbf{r}) = 0$. In this case the expression $\mathbf{r}(u, v)$ can be regarded as being obtained as the solution to this condition. We next provide explicit representations for some simple surfaces, many of which are used as illustrative examples in later sections.
Planar surfaces:
A plane passing through the origin and spanned by two linearly-independent vectors $\mathbf{a}$ and $\mathbf{b}$ is swept out by

$$\mathbf{r}(u, v) = \mathbf{a}\,u + \mathbf{b}\,v , \tag{1.15}$$

with $-\infty < u, v < \infty$. Straight lines can be inscribed inside such a plane, such as $\mathbf{r}(u) = \mathbf{a}\,u$ or $\mathbf{r}(v) = \mathbf{b}\,v$ or $\mathbf{r}(u) = (\mathbf{a} + \mathbf{b})\,u$, as can circles. As is easily verified, the circles and straight lines defined within any such plane satisfy the axioms of Euclidean geometry.
Planes can equally well be specified through a constraint $f(\mathbf{r}) = 0$. For example, the plane $\mathbf{r}(u, v) = \mathbf{e}_x u + \mathbf{e}_y v$ defined by the $x$- and $y$-axes is equally well described as the general solution to the condition $z = 0$, and so $f(\mathbf{r}) := z = \mathbf{e}_z\cdot\mathbf{r}$.
Cylindrical surfaces:
A slightly more interesting example is provided by a cylindrical surface. A representation for a cylinder concentric with the $z$-axis and having an elliptical profile aligned with the $x$- and $y$-axes would be

$$\mathbf{r}(u, v) = \mathbf{e}_x\,a\cos u + \mathbf{e}_y\,b\sin u + \mathbf{e}_z\,v , \tag{1.16}$$

where $0 \le u < 2\pi$ and $-\infty < v < \infty$. The constants $a$ and $b$ define the semi-axes of the elliptical cross sections taken at fixed $z$. This elliptical cylinder could equally well be specified by the condition $f(\mathbf{r}) := (x^2/a^2) + (y^2/b^2) - 1 = 0$. It is possible to inscribe straight lines on such a cylinder, but only if they are parallel with the $z$-axis: for instance $\mathbf{r}(v) = \mathbf{e}_x\,a\cos u_\star + \mathbf{e}_y\,b\sin u_\star + \mathbf{e}_z\,v$, where $u_\star$ is any particular, fixed, value of $u$.
Spherical surfaces:
The surface of a sphere provides an example of a truly curved surface (in a sense explained in detail below). A representative sphere centered at the origin with radius $a$ can be represented as the surface $f(\mathbf{r}) := x^2 + y^2 + z^2 - a^2 = 0$, or explicitly parameterized using spherical polar coordinates ($u = \theta$ and $v = \phi$) by:

$$\mathbf{r}(u, v) = \mathbf{e}_x\,a\sin u\cos v + \mathbf{e}_y\,a\sin u\sin v + \mathbf{e}_z\,a\cos u , \tag{1.17}$$

with $0 < u < \pi$ and $0 \le v < 2\pi$. It is intuitively clear that no straight lines can be inscribed on a sphere.

Inscribed Curves
Given a surface $\mathbf{r}(u, v) = x^i(u, v)\,\mathbf{e}_i$ in $\mathbb{R}^3$, an inscribed curve is a curve, $\mathbf{x}(w) = x^i(w)\,\mathbf{e}_i$, in $\mathbb{R}^3$ whose points also lie within the surface. For instance if the surface is defined by a condition of the form $f(\mathbf{r}) = 0$, then an inscribed curve satisfies $f(\mathbf{x}(w)) = 0$ for all values of its parameter, $w$. An alternative way of describing an inscribed curve is to specify the surface parameters, $\{u(w), v(w)\}$, that trace out the points along the curve: $\mathbf{r}(u(w), v(w)) = \mathbf{x}(w)$. For instance, the circle $\mathbf{x}(w) = a(\mathbf{e}_x\cos w + \mathbf{e}_y\sin w)$ is inscribed in the sphere $\mathbf{r}(u, v) = a(\mathbf{e}_x\sin u\cos v + \mathbf{e}_y\sin u\sin v + \mathbf{e}_z\cos u)$, and can be described by the parameter values $\{u(w), v(w)\} = \{\pi/2, w\}$.
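The tangent to an inscribed curve can therefore be written either in terms of derivatives of $\mathbf{x}(w)$ or $\mathbf{r}(u, v)$,

$$\mathbf{t} = \frac{d\mathbf{x}}{dw} = \frac{d}{dw}\,\mathbf{r}(u(w), v(w)) = \frac{\partial\mathbf{r}}{\partial u}\frac{du}{dw} + \frac{\partial\mathbf{r}}{\partial v}\frac{dv}{dw} . \tag{1.18}$$

It is useful to use the Einstein summation convention to combine the above expressions into the more compact notation

$$\mathbf{t} = \frac{\partial\mathbf{r}}{\partial u^a}\frac{du^a}{dw} = \frac{\partial x^i}{\partial u^a}\frac{du^a}{dw}\,\mathbf{e}_i , \tag{1.19}$$

where $a = 1, 2$ with $u^1 = u$ and $u^2 = v$.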
A particularly simple family of inscribed curves is obtained by holding fixed either one of the two parameters, $u$ or $v$, that define the surface itself. Consider for instance a surface defined by the locus of points swept out by a particular parameterization $\mathbf{r}(u, v)$. A family of curves lying in this surface, parameterized by $u$, is found by setting $v$ to some fixed value $v = v_\star$: $\mathbf{r}(u) = \mathbf{r}(u, v_\star)$. Different values of $v_\star$ produce different members of this family of curves. A second family of curves lying within $\mathbf{r}(u, v)$ is similarly obtained by fixing $u$ at a sequence of values, $u = u_\star$, and letting the variation of $v$ parameterize the curves: $\mathbf{r}(v) = \mathbf{r}(u_\star, v)$. Different choices for $u_\star$ then define different members of this family of curves.
It is possible to use the tangents of inscribed curves to define a pair of linearly independent tangent vectors to any surface that are not necessarily orthogonal. These are obtained simply by computing the tangent vector for the inscribed curves along which only one of either $u$ or $v$ varies. The tangent to the curves along which only $u$ varies is given by

$$\mathbf{t}(u) = \frac{\partial\mathbf{r}}{\partial u}(u, v_\star) , \tag{1.20}$$

and a family of unit vectors tangent to these curves is then given by $\mathbf{e}_u = \mathbf{t}(u)/|\mathbf{t}(u)|$. The tangents to the curves along which only $v$ varies are similarly given by

$$\hat{\mathbf{t}}(v) = \frac{\partial\mathbf{r}}{\partial v}(u_\star, v) , \tag{1.21}$$

and the unit tangent becomes $\mathbf{e}_v = \hat{\mathbf{t}}(v)/|\hat{\mathbf{t}}(v)|$.
Again using the notation $u^a = \{u^1, u^2\} = \{u, v\}$, these may be written

$$\mathbf{t}_a = \frac{\partial\mathbf{r}}{\partial u^a} = \frac{\partial x^i}{\partial u^a}\,\mathbf{e}_i , \tag{1.22}$$

where $\mathbf{t}_1 = \mathbf{t}$ while $\mathbf{t}_2 = \hat{\mathbf{t}}$. The span of the normalized vectors $\mathbf{e}_a(u, v)$ defines the tangent plane to the surface at the point labelled by $(u, v)$.
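A normal vector defined everywhere on the surface $\mathbf{r}(u, v)$ may then be constructed using the two families of tangent vectors defined above, $\mathbf{e}_u$ and $\mathbf{e}_v$, by taking the cross product: $\mathbf{e}_n(u, v) = \mathbf{e}_u(u, v)\times\mathbf{e}_v(u, v)$ (which has unit length only when $\mathbf{e}_u$ and $\mathbf{e}_v$ are orthogonal, and must otherwise be normalized). This defines a basis of vectors that is adapted to the surface at every point.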
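Notice that if the surface is specified by a constraint, $f(\mathbf{r}) = 0$, then an alternative way to identify this normal direction is by taking the gradient of $f$:

$$\mathbf{n} = \nabla f = \frac{\partial f}{\partial x}\,\mathbf{e}_x + \frac{\partial f}{\partial y}\,\mathbf{e}_y + \frac{\partial f}{\partial z}\,\mathbf{e}_z , \tag{1.23}$$

because the following argument shows this vector is orthogonal to the tangent vectors. The argument relies on the observation that if $\mathbf{r}(u, v)$ is a parametrization of the surface defined by $f(\mathbf{r}) = 0$, then what this means is $f(\mathbf{r}(u, v)) = 0$ for all $u$ and $v$. Differentiating this last expression with respect to $u$ or $v$, and using the chain rule, then implies $\nabla f\cdot(\partial\mathbf{r}/\partial u) = \nabla f\cdot(\partial\mathbf{r}/\partial v) = 0$, or

$$\mathbf{t}_a\cdot\nabla f = \frac{\partial\mathbf{r}}{\partial u^a}\cdot\nabla f = \frac{\partial x^i}{\partial u^a}\,\partial_i f = \frac{\partial x}{\partial u^a}\frac{\partial f}{\partial x} + \frac{\partial y}{\partial u^a}\frac{\partial f}{\partial y} + \frac{\partial z}{\partial u^a}\frac{\partial f}{\partial z} = 0 , \tag{1.24}$$

which states that $\nabla f$ is perpendicular to both the tangent vectors, $\mathbf{t}_a = \{\mathbf{t}, \hat{\mathbf{t}}\}$.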
Eq. (1.24) introduces the notation

$$\partial_i := \frac{\partial}{\partial x^i} , \tag{1.25}$$

and (for practice) is rewritten several ways to emphasize the Einstein summation convention.
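Eq. (1.24) can be checked on the sphere of eq. (1.17), for which $f(\mathbf{r}) = x^2 + y^2 + z^2 - a^2$ and hence $\partial_i f = 2x^i$. This sketch evaluates the tangents analytically at an arbitrary illustrative point; the radius and helper names are assumptions.

```python
import numpy as np

a = 1.5  # sphere radius (illustrative)


def r(u, v):
    """Sphere of radius a, eq. (1.17)."""
    return a * np.array([np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])


def tangents(u, v):
    """t_1 = dr/du and t_2 = dr/dv, differentiated analytically from eq. (1.17)."""
    t_u = a * np.array([np.cos(u) * np.cos(v), np.cos(u) * np.sin(v), -np.sin(u)])
    t_v = a * np.array([-np.sin(u) * np.sin(v), np.sin(u) * np.cos(v), 0.0])
    return t_u, t_v


def grad_f(x):
    """Gradient of f(r) = x^2 + y^2 + z^2 - a^2, i.e. partial_i f = 2 x^i."""
    return 2.0 * x


u, v = 0.9, 2.3
t_u, t_v = tangents(u, v)
n = grad_f(r(u, v))
print(np.dot(t_u, n), np.dot(t_v, n))  # both vanish up to rounding, as in eq. (1.24)
```

The cross product $\mathbf{t}_u\times\mathbf{t}_v$ is likewise parallel to $\nabla f$ here, so both constructions of the normal agree.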

Distances along surfaces
Distances along a surface are similarly measured along a curve inscribed in this surface, and in general the distance between two points depends on the details of which curve is used to link these points, just as is also true for points in $\mathbb{R}^3$.
In $\mathbb{R}^3$ when one speaks of the distance between two points without referring to the curve involved, what is meant is the distance along the straight line that connects the two points. Since a straight line cannot in general be inscribed into a generic curved surface, it is clear that the same definition cannot generically be used to define a distance between points in a generic surface.
An exception to this is when the two points of interest are infinitesimally separated on the surface: $\mathbf{r}(u, v)$ and $\mathbf{r}(u + du, v + dv)$, since in this case the straight-line curve that connects them is arbitrarily close to an inscribed arc lying on the surface. In this case the distance between the points becomes
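$$ds = |\mathbf{r}(u, v) - \mathbf{r}(u + du, v + dv)| = \left|\frac{\partial\mathbf{r}}{\partial u^a}\,du^a\right| = \sqrt{\delta_{ij}\,\frac{\partial x^i}{\partial u^a}\frac{\partial x^j}{\partial u^b}\,du^a du^b} . \tag{1.26}$$

The last version of this equation, using the Einstein summation convention, is most commonly written without the ugly square root:

$$ds^2 = \gamma_{ab}(u, v)\,du^a du^b , \tag{1.27}$$

where the right-hand side defines what was historically called the surface’s first fundamental quadratic form (or its induced metric, in more modern parlance), with

$$\gamma_{ab} = \delta_{ij}\,\frac{\partial x^i}{\partial u^a}\frac{\partial x^j}{\partial u^b} . \tag{1.28}$$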
A central point of the geometry of surfaces is that any intrinsic property of the
surface — that is, involving only distances and angles associated to inscribed curves
on the surface — can be expressed in terms of γab (u, v) and its derivatives.
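Eq. (1.28) says that $\gamma_{ab}$ is the matrix product $J^{\mathsf{T}}J$ of the $3\times 2$ array $J_{ia} = \partial x^i/\partial u^a$ with itself. A minimal sketch, assuming central finite differences for the partial derivatives (so the result is approximate); `induced_metric` is a hypothetical helper and the sphere radius and point are illustrative.

```python
import numpy as np


def induced_metric(r, u, v, h=1e-6):
    """gamma_ab = delta_ij (dx^i/du^a)(dx^j/du^b), eq. (1.28).

    The partial derivatives are central finite differences, so the result carries
    an error of order h^2 plus floating-point rounding.
    """
    t_u = (r(u + h, v) - r(u - h, v)) / (2 * h)
    t_v = (r(u, v + h) - r(u, v - h)) / (2 * h)
    jac = np.column_stack([t_u, t_v])  # jac[i, a] = dx^i/du^a
    return jac.T @ jac                 # the delta_ij contraction is an ordinary dot product


a = 2.0
sphere = lambda u, v: a * np.array([np.sin(u) * np.cos(v), np.sin(u) * np.sin(v), np.cos(u)])
print(induced_metric(sphere, 1.0, 0.3))
```

Passing the plane or cylinder parametrizations of Exercise 3 below in place of `sphere` gives a numerical cross-check of the matrices quoted there.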

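Exercise 3: Show that the induced metric for the plane given by $\mathbf{r}(u, v) = \mathbf{e}_x u + \mathbf{e}_y v$ in $\mathbb{R}^3$ is

$$\gamma_{ab} = \delta_{ab} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} , \tag{1.29}$$

where $\delta_{ab}$ is the Kronecker $\delta$-function, defined just above eq. (1.2). Repeat the calculation for the cylinder $\mathbf{r}(u, v) = a(\mathbf{e}_x\cos u + \mathbf{e}_y\sin u) + \mathbf{e}_z v$ to show that

$$\gamma_{ab} = \begin{pmatrix} \gamma_{uu} & \gamma_{uv} \\ \gamma_{vu} & \gamma_{vv} \end{pmatrix} = \begin{pmatrix} a^2 & 0 \\ 0 & 1 \end{pmatrix} . \tag{1.30}$$

Finally repeat for the sphere $\mathbf{r}(u, v) = a(\mathbf{e}_x\sin u\cos v + \mathbf{e}_y\sin u\sin v + \mathbf{e}_z\cos u)$, to show

$$\gamma_{ab} = \begin{pmatrix} \gamma_{uu} & \gamma_{uv} \\ \gamma_{vu} & \gamma_{vv} \end{pmatrix} = \begin{pmatrix} a^2 & 0 \\ 0 & a^2\sin^2 u \end{pmatrix} . \tag{1.31}$$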

The arc-length along any inscribed curve running between points $A$ and $B$ may now be found by integrating eq. (1.27) in the form:

$$s(A, B) = \int_{w_A}^{w_B} dw\,\frac{ds}{dw} = \int_{w_A}^{w_B} dw\,\sqrt{\gamma_{ab}(w)\,\frac{du^a}{dw}\frac{du^b}{dw}} , \tag{1.32}$$

where $\gamma_{ab}(w) = \gamma_{ab}(u(w), v(w))$.
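Eq. (1.32) uses only the induced metric and the parameter functions $u^a(w)$, which is exactly what an observer confined to the surface could measure. The sketch below applies it to a circle of constant polar angle $u = u_0$ on the sphere, using the metric of eq. (1.31). It assumes `numpy.gradient` for $du^a/dw$ and the trapezoid rule; `inscribed_arc_length` is a hypothetical helper and the numbers are illustrative.

```python
import numpy as np


def inscribed_arc_length(gamma, u_of_w, v_of_w, w_a, w_b, n=4001):
    """Eq. (1.32): s = integral of sqrt(gamma_ab (du^a/dw)(du^b/dw)) dw.

    gamma(u, v) returns the 2x2 induced metric; du^a/dw is estimated with
    np.gradient and the integral with the trapezoid rule (an approximation).
    """
    w = np.linspace(w_a, w_b, n)
    u, v = u_of_w(w), v_of_w(w)
    du, dv = np.gradient(u, w), np.gradient(v, w)
    integrand = np.empty(n)
    for k in range(n):
        vel = np.array([du[k], dv[k]])                # du^a/dw at this point
        integrand[k] = np.sqrt(vel @ gamma(u[k], v[k]) @ vel)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(w)))


a = 2.0
sphere_metric = lambda u, v: np.diag([a**2, a**2 * np.sin(u) ** 2])  # eq. (1.31)

u0 = 0.6  # circle of constant polar angle u = u0, traced out by v(w) = w
s = inscribed_arc_length(sphere_metric, lambda w: np.full_like(w, u0), lambda w: w, 0.0, 2 * np.pi)
print(s, 2 * np.pi * a * np.sin(u0))
```

The result matches the circumference $2\pi a\sin u_0$ one would compute from the embedding, even though only $\gamma_{ab}$ was used.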

Angles between inscribed curves
The angle, $\theta$, between two inscribed curves that intersect at a point $P$ can also be computed using $\gamma_{ab}$ evaluated at $P$.
To see this suppose the curves $\mathbf{x}_1(s)$ and $\mathbf{x}_2(s)$ inscribed in the surface $\mathbf{r}(u, v)$ intersect at the point labelled by $(u, v) = (u_\star, v_\star)$. The angle between these curves may be defined as the angle between their tangent vectors, evaluated at $P$:

$$\mathbf{t}_1 = \frac{d\mathbf{x}_1}{ds} = \frac{\partial\mathbf{r}}{\partial u^a}\frac{du_1^a}{ds} , \tag{1.33}$$

where $u_1^a(s) = \{u_1(s), v_1(s)\}$ gives the parameter values that trace out the points on the surface through which the inscribed curve $\mathbf{x}_1(s) = \mathbf{r}(u_1(s), v_1(s))$ passes. An identical expression also holds for $\mathbf{t}_2 = d\mathbf{x}_2/ds$ and $u_2^a(s) = \{u_2(s), v_2(s)\}$. Clearly the norm of the tangent vector evaluated at $P$ is therefore given by

$$|\mathbf{t}_1|^2 = \frac{d\mathbf{x}_1}{ds}\cdot\frac{d\mathbf{x}_1}{ds} = \frac{\partial\mathbf{r}}{\partial u^a}\cdot\frac{\partial\mathbf{r}}{\partial u^b}\,\frac{du_1^a}{ds}\frac{du_1^b}{ds} = \gamma_{ab}(u_\star, v_\star)\,\frac{du_1^a}{ds}\frac{du_1^b}{ds} , \tag{1.34}$$

and similarly for $|\mathbf{t}_2|^2$.
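Using $\mathbf{a}\cdot\mathbf{b} = |\mathbf{a}||\mathbf{b}|\cos\theta$, where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$, we have

$$\cos\theta = \frac{\mathbf{t}_1\cdot\mathbf{t}_2}{|\mathbf{t}_1||\mathbf{t}_2|} = \frac{1}{|\mathbf{t}_1||\mathbf{t}_2|}\left(\frac{\partial\mathbf{r}}{\partial u^a}\cdot\frac{\partial\mathbf{r}}{\partial u^b}\right)\frac{du_1^a}{ds}\frac{du_2^b}{ds} = \frac{\gamma_{ab}(u_\star, v_\star)}{|\mathbf{t}_1||\mathbf{t}_2|}\,\frac{du_1^a}{ds}\frac{du_2^b}{ds} . \tag{1.35}$$

Combining eq. (1.35) with eq. (1.34) applied to both $|\mathbf{t}_1|$ and $|\mathbf{t}_2|$ then shows that $\theta$ can be determined purely in terms of $\gamma_{ab}(u_\star, v_\star)$ and the quantities $du_1^a/ds$ and $du_2^a/ds$, all of which would be accessible to an observer trapped to live on the surface.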
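As a check on eq. (1.35), the sketch below computes the angle between two curves crossing at a point of the sphere in two ways: once from $\gamma_{ab}$ alone, and once from the tangent vectors in $\mathbb{R}^3$. The point, radius and parameter velocities are hypothetical choices. Since the normalization by $|\mathbf{t}_1||\mathbf{t}_2|$ cancels any rescaling of the parameter, the velocities need not be taken with respect to arc length.

```python
import numpy as np

a, u_star, v_star = 2.0, 1.1, 0.4  # sphere radius and intersection point P (illustrative)

# Induced metric of the sphere at P, eq. (1.31).
gamma = np.diag([a**2, a**2 * np.sin(u_star) ** 2])

# Parameter velocities du^a/dw of two curves through P (hypothetical directions).
A = np.array([1.0, 0.0])  # a meridian: only u varies
B = np.array([1.0, 2.0])  # a curve along which u and v both vary


def metric_angle(gamma, A, B):
    """Angle from eq. (1.35), using only gamma_ab and the parameter velocities."""
    cos_theta = (A @ gamma @ B) / np.sqrt((A @ gamma @ A) * (B @ gamma @ B))
    return np.arccos(np.clip(cos_theta, -1.0, 1.0))


# Cross-check with tangents in R^3: t = (dr/du^a)(du^a/dw), eq. (1.33).
t_u = a * np.array([np.cos(u_star) * np.cos(v_star), np.cos(u_star) * np.sin(v_star), -np.sin(u_star)])
t_v = a * np.array([-np.sin(u_star) * np.sin(v_star), np.sin(u_star) * np.cos(v_star), 0.0])
t1 = A[0] * t_u + A[1] * t_v
t2 = B[0] * t_u + B[1] * t_v
embedded = np.arccos(t1 @ t2 / (np.linalg.norm(t1) * np.linalg.norm(t2)))
print(metric_angle(gamma, A, B), embedded)
```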

Geodesics
Although straight lines cannot in general be defined for curves inscribed on a general surface in $\mathbb{R}^3$, there is a natural definition of what is the straightest line possible given the surface’s curvature. This definition starts from the observation that a straight line connecting two points in $\mathbb{R}^3$ gives the curve along which the distance between these points is minimized.
This suggests identifying those curves on a given surface that minimize the dis-
tance between points, and letting these stand in for straight lines from the point of
view of the intrinsic geometry of the surface. Such curves are called geodesics, and
are readily computed once the induced metric, γab (u, v), is everywhere known. The
explicit calculation of these curves is left to a subsequent section.

Curvature of surfaces
We have seen that it is always possible to define the curvature, $\kappa(s)$, for a curve, $\mathbf{x}(s)$, by using the Frenet-Serret basis for $\mathbf{x}(s)$ as above, whose derivatives along the curve satisfy the Frenet-Serret formulae, eqs. (1.13). In particular, the first formula, $d\mathbf{e}_t/ds = \kappa(s)\,\mathbf{e}_n$, gives $\kappa$ in terms of the magnitude, $|d\mathbf{e}_t/ds|$, of the rate of change of the curve’s unit tangent. However, because the $\mathbf{e}_n$ direction need not be specially correlated with the tangent or normal to the surface in which $\mathbf{x}(s)$ is inscribed, this definition of curvature need have little to do with the properties of the surface.
To obtain a measure of the surface’s curvature it is therefore useful to focus on a specific family of inscribed curves, $\mathbf{x}(s)$, defined by the intersection of the surface with any of the planes that contain the surface’s normal vector, $\mathbf{n}$, at a given point (see fig. 2). Because they are defined by construction to lie within a plane, such inscribed curves have vanishing torsion, $\tau(s) = 0$. Furthermore, because the osculating plane spanned by $\mathbf{e}_t$ and $\mathbf{e}_n \propto d\mathbf{e}_t/ds$ contains $\mathbf{n}$, and because the surface’s normal, $\mathbf{n}$, is necessarily orthogonal to the tangent of any inscribed curve, it follows that $d\mathbf{e}_t/ds$ must be parallel (or antiparallel) to the normal direction, $\mathbf{n}$.
The curvature, $\kappa(s)$, defined using the Frenet-Serret formulae, eqs. (1.13), for such a curve is called a normal curvature, $\kappa_n(s)$, of the surface at the point $\mathbf{x}(s)$.
Figure 2: Illustration of several planes whose intersection with a surface define the curves whose curvature is a normal curvature (Wikipedia).

Normal curvatures are not unique, since they depend on the direction of the plane containing $\mathbf{n}$ that is used in the construction. The surface’s principal curvatures, $\kappa_1$ and $\kappa_2$, are defined at each point as the maximum and minimum values taken by the normal curvatures as the direction of this plane is varied.
The surface’s mean curvature, $H$, and Gaussian curvature, $K$, are then defined as the arithmetic mean and the product of $\kappa_1$ and $\kappa_2$:

$$H = \frac{1}{2}(\kappa_1 + \kappa_2) \qquad\text{and}\qquad K = \kappa_1\kappa_2 . \tag{1.36}$$
Although it is not clear from its definition, Gauss’ Theorema Egregium states that the Gaussian curvature can be determined purely in terms of lengths and angles measured within the surface (that is, in terms of the induced metric $\gamma_{ab}$ and its derivatives), and so is a property intrinsic to the surface itself, as opposed to an extrinsic property that depends on how the surface is embedded into the external $\mathbb{R}^3$.
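The Theorema Egregium can be illustrated numerically. For an orthogonal parametrization, $ds^2 = E\,du^2 + G\,dv^2$ (i.e. $\gamma_{uv} = 0$), a standard consequence is $K = -\frac{1}{2\sqrt{EG}}\left[\partial_u\!\left(\frac{G_u}{\sqrt{EG}}\right) + \partial_v\!\left(\frac{E_v}{\sqrt{EG}}\right)\right]$, with subscripts denoting partial derivatives. This formula is not derived in these notes and is assumed here; the sketch below evaluates it by finite differences from the metrics of eqs. (1.30) and (1.31) alone, and the function name is hypothetical.

```python
import math


def gaussian_curvature(E, G, u, v, h=1e-3):
    """Gaussian curvature K from a diagonal induced metric ds^2 = E du^2 + G dv^2.

    Assumes gamma_uv = 0 and the formula
        K = -1/(2 sqrt(EG)) [ d/du (G_u / sqrt(EG)) + d/dv (E_v / sqrt(EG)) ],
    with every derivative approximated by a central finite difference.
    """
    def root(uu, vv):
        return math.sqrt(E(uu, vv) * G(uu, vv))

    def g_u_term(uu, vv):  # G_u / sqrt(EG)
        return (G(uu + h, vv) - G(uu - h, vv)) / (2 * h) / root(uu, vv)

    def e_v_term(uu, vv):  # E_v / sqrt(EG)
        return (E(uu, vv + h) - E(uu, vv - h)) / (2 * h) / root(uu, vv)

    d_du = (g_u_term(u + h, v) - g_u_term(u - h, v)) / (2 * h)
    d_dv = (e_v_term(u, v + h) - e_v_term(u, v - h)) / (2 * h)
    return -(d_du + d_dv) / (2 * root(u, v))


a = 2.0
# Sphere, eq. (1.31): E = a^2, G = a^2 sin^2 u; exact K = 1/a^2 = 0.25.
print(gaussian_curvature(lambda u, v: a**2, lambda u, v: (a * math.sin(u)) ** 2, 1.0, 0.0))
# Cylinder, eq. (1.30): E = a^2, G = 1; exact K = 0.
print(gaussian_curvature(lambda u, v: a**2, lambda u, v: 1.0, 1.0, 0.0))
```

Both values agree with $K = \kappa_1\kappa_2$ for the principal curvatures quoted in Exercise 4 below, even though the code never refers to the embedding.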

Exercise 4: Show that the principal curvatures for the plane $\mathbf{r}(u, v) = \mathbf{e}_x u + \mathbf{e}_y v$ are $\kappa_1 = \kappa_2 = 0$. Repeat for the cylinder $\mathbf{r}(u, v) = a(\mathbf{e}_x\cos u + \mathbf{e}_y\sin u) + \mathbf{e}_z v$ to show they are $\kappa_1 = 0$ and $\kappa_2 = 1/a$. Finally, show that for the sphere $\mathbf{r}(u, v) = a(\mathbf{e}_x\sin u\cos v + \mathbf{e}_y\sin u\sin v + \mathbf{e}_z\cos u)$, the principal curvatures are equal and positive: $\kappa_1 = \kappa_2 = 1/a$.

Changing the parametrization
Notice that the discussion so far did not need to provide any details about the
kinds of parameters, (u, v), used to specify the surface. Before generalizing the
above discussion to more general spaces, it is worth first digressing briefly about how
quantities change as the coordinates used to describe them change.

Contravariant vectors
When describing a surface, $\mathbf{r}(u, v)$, we saw that the inscribed curves along which the parameters $u^a = \{u, v\}$ vary could be used to provide a natural basis, $\mathbf{t}_a = \partial\mathbf{r}/\partial u^a$, for the surface’s tangent plane. Because this forms a basis, it can be used to define the components of any vector at all that is tangent to the surface:

$$\mathbf{c} = c^a\,\mathbf{t}_a = c^u\,\mathbf{t}_u + c^v\,\mathbf{t}_v . \tag{1.37}$$

Suppose we now change our parametrization of the surface, defining new parameters $u^{a'}(u, v) = \{u'(u, v), v'(u, v)\}$ that provide equally good labels for points on the surface: $\mathbf{r}(u, v) = \mathbf{r}(u'(u, v), v'(u, v))$. Provided that the new parameters are really independent of one another (more about this below), the tangents to these new parameter curves define a new basis, $\mathbf{t}_{a'} = \partial\mathbf{r}/\partial u^{a'}$, of the same tangent plane, in terms of which the same vector $\mathbf{c}$ has the expansion

$$\mathbf{c} = c^{a'}\,\mathbf{t}_{a'} = c^{u'}\,\mathbf{t}_{u'} + c^{v'}\,\mathbf{t}_{v'} . \tag{1.38}$$

To obtain the relation between the coefficients $c^a$ and $c^{a'}$ we relate the two sets of tangent bases to one another, using the chain rule:

$$\mathbf{t}_a = \frac{\partial\mathbf{r}}{\partial u^a} = \frac{\partial u^{a'}}{\partial u^a}\frac{\partial\mathbf{r}}{\partial u^{a'}} = \frac{\partial u^{a'}}{\partial u^a}\,\mathbf{t}_{a'} , \tag{1.39}$$

and so

$$\mathbf{c} = c^a\,\mathbf{t}_a = c^a\,\frac{\partial u^{a'}}{\partial u^a}\,\mathbf{t}_{a'} , \tag{1.40}$$

which implies

$$c^{a'} = \frac{\partial u^{a'}}{\partial u^a}\,c^a . \tag{1.41}$$

Components $c^a$ that transform in this way under a change of parametrization are called contravariant components, and $\mathbf{c}$ would be called a contravariant vector.
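The transformation rule (1.41) can be verified numerically. This sketch takes the plane $\mathbf{r}(u, v) = u\,\mathbf{e}_x + v\,\mathbf{e}_y$ and, as an illustrative choice of new parameters, polar coordinates $u' = \rho = \sqrt{u^2 + v^2}$ and $v' = \phi$. It transforms the components with eq. (1.41) and confirms that expanding them in the new tangent basis reproduces the same vector; the point and components are arbitrary.

```python
import numpy as np

# Plane r(u, v) = u e_x + v e_y, so t_u = e_x and t_v = e_y (2D components suffice).
u, v = 1.2, 0.7
c_old = np.array([0.3, -1.1])  # contravariant components c^a in the (u, v) basis

# New parameters: polar coordinates u' = rho, v' = phi (an illustrative choice).
rho, phi = np.hypot(u, v), np.arctan2(v, u)

# Partial derivatives d u^{a'} / d u^a (rows: a' = rho, phi; columns: a = u, v).
du_new_du_old = np.array([[u / rho, v / rho],
                          [-v / rho**2, u / rho**2]])

c_new = du_new_du_old @ c_old  # eq. (1.41)

# New tangent basis t_{a'} = dr/du^{a'}, stored as the columns t_rho and t_phi.
t_new = np.column_stack([[np.cos(phi), np.sin(phi)],
                         [-rho * np.sin(phi), rho * np.cos(phi)]])

print(t_new @ c_new, c_old)  # the same vector c, expanded in two different bases
```

Since $\mathbf{t}_u = \mathbf{e}_x$ and $\mathbf{t}_v = \mathbf{e}_y$ here, `c_old` doubles as the rectangular components of $\mathbf{c}$, which is what makes the comparison in the last line meaningful.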
An earlier-mentioned proviso to this discussion was the requirement that the new coordinates be independent of one another and so provide a faithful parametrization of the surface. Eq. (1.39) provides a local criterion for when this is so, since it is equivalent to asking when the new pair of tangent vectors are linearly independent of one another (as is required if they are to form a basis). Since eq. (1.39) can equally well be written in matrix notation as

$$t = J\,t' , \tag{1.42}$$

with

$$t = \begin{pmatrix} \mathbf{t}_u \\ \mathbf{t}_v \end{pmatrix} , \qquad t' = \begin{pmatrix} \mathbf{t}_{u'} \\ \mathbf{t}_{v'} \end{pmatrix} \qquad\text{and}\qquad J = \begin{pmatrix} \partial u'/\partial u & \partial v'/\partial u \\ \partial u'/\partial v & \partial v'/\partial v \end{pmatrix} , \tag{1.43}$$

the new basis is linearly independent if and only if the matrix $J$ is invertible, or equivalently if its determinant, $\mathcal{J} = \det J$ (the Jacobian of the transformation $(u, v) \to (u', v')$), is nonzero: $\mathcal{J} \neq 0$.

Covariant Vectors
There is an alternative way of using parameters on a surface to describe vectors that are tangent to the surface. Instead of defining a set of basis vectors that are tangent to lines along which one parameter varies, $\mathbf{t}_a = \partial\mathbf{r}/\partial u^a = (\partial x^i/\partial u^a)\,\mathbf{e}_i$, one can instead define a basis of vectors, $\mathbf{s}^a$, using the normals to the surfaces along which one of the parameters is held constant. That is, we ask the basis $\mathbf{s}^a$ to satisfy the defining condition

$$\mathbf{s}^a\cdot\mathbf{t}_b = \delta^a{}_b , \tag{1.44}$$

where, as before, the Kronecker symbol satisfies $\delta^a{}_b = 1$ if $a = b$ and vanishes otherwise. Such a basis is often called a basis dual to the basis of tangent vectors.
Although these two definitions give basis vectors pointing in the same directions in many of the simple coordinates commonly used (like rectangular or polar coordinates), they need not always do so. An example of an ‘oblique’ set of coordinates, where these two definitions would not agree, is given by the parameters on a plane defined by $\mathbf{r}(u, v) = \mathbf{a}\,u + \mathbf{b}\,v$, when the vectors $\mathbf{a}$ and $\mathbf{b}$ are not orthogonal to one another. A cartoon of these coordinates is given in figure 3. In general curvilinear coordinates both bases are possible and, most importantly, they transform differently when the surface is reparameterized, $(u, v) \to (u', v')$.

Figure 3: An example where normals to coordinate surfaces (small arrows) are not equivalent to tangents to coordinate directions (lines).
Since the $\mathbf{s}^a$ form a basis for the surface’s tangent plane, a general vector, $\mathbf{c}$, tangent to the surface can be expanded

$$\mathbf{c} = c_b\,\mathbf{s}^b , \tag{1.45}$$

and the coefficients in this expansion are given by

$$\mathbf{c}\cdot\mathbf{t}_a = c_b\,\mathbf{s}^b\cdot\mathbf{t}_a = c_b\,\delta^b{}_a = c_a . \tag{1.46}$$

These components change if the parameters used to label the surface are changed, $(u, v) \to (u'(u, v), v'(u, v))$, but in a different way than did the components $c^a$ arising when $\mathbf{c}$ is expanded directly in terms of the $\mathbf{t}_a$’s. Using eq. (1.39) with eq. (1.46) gives

$$c_{a'} = \mathbf{c}\cdot\mathbf{t}_{a'} = \mathbf{c}\cdot\mathbf{t}_b\,\frac{\partial u^b}{\partial u^{a'}} = c_b\,\frac{\partial u^b}{\partial u^{a'}} , \tag{1.47}$$

where we use that the partial derivatives $\partial u^b/\partial u^{a'}$ make up the elements of the matrix $J^{-1}$ that is inverse to the matrix $J$ whose elements are $\partial u^{a'}/\partial u^b$. Equivalently, using the Einstein summation convention we use the identity

$$\frac{\partial u^b}{\partial u^{a'}}\frac{\partial u^{a'}}{\partial u^c} = \delta^b{}_c \qquad\text{and its partner}\qquad \frac{\partial u^{b'}}{\partial u^a}\frac{\partial u^a}{\partial u^{c'}} = \delta^{b'}{}_{c'} . \tag{1.48}$$

Coefficients that transform as in eq. (1.47) are said to transform as covariant components.
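The oblique plane $\mathbf{r}(u, v) = \mathbf{a}\,u + \mathbf{b}\,v$ makes the difference between the two bases easy to see. In this sketch (with hypothetical non-orthogonal vectors $\mathbf{a}$ and $\mathbf{b}$, written in 2D components within the plane), the dual vectors $\mathbf{s}^a$ of eq. (1.44) are taken as the rows of the inverse of the matrix whose columns are $\mathbf{t}_a$. Both sets of components then reconstruct the same vector.

```python
import numpy as np

# Oblique parametrization r(u, v) = a u + b v with a . b != 0 (illustrative vectors).
a = np.array([1.0, 0.0])
b = np.array([1.0, 2.0])
T = np.column_stack([a, b])  # columns are t_u = a and t_v = b

# Dual basis, eq. (1.44): the rows s^a of inv(T) satisfy s^a . t_b = delta^a_b.
S = np.linalg.inv(T)
print(S @ T)  # the 2x2 identity matrix, up to rounding
print(S[0], T[:, 0])  # s^u and t_u point in different directions for this oblique choice

c = np.array([0.5, 3.0])  # some vector in the plane
c_upper = S @ c           # contravariant components: c = c^a t_a
c_lower = T.T @ c         # covariant components, eq. (1.46): c_a = c . t_a
print(T @ c_upper, S.T @ c_lower)  # both reproduce c
```

For orthogonal $\mathbf{a}$ and $\mathbf{b}$, each $\mathbf{s}^a$ would instead be parallel to the corresponding $\mathbf{t}_a$.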

Tensors
Since the first fundamental form, $\gamma_{ab}$, is so central to the geometry on a surface, it is worth knowing how it transforms when the parameters labelling the surface are changed. Recall its definition in terms of the distance, $ds$, along the surface, eq. (1.27). Because a physical quantity like $ds$ must be parameter independent, the chain rule implies that if $(u, v) \to (u', v')$ then

$$ds^2 = \gamma_{ab}\,du^a du^b = \gamma_{ab}\,\frac{\partial u^a}{\partial u^{c'}}\frac{\partial u^b}{\partial u^{d'}}\,du^{c'} du^{d'} , \tag{1.49}$$

which shows

$$\gamma_{c'd'}(u', v') = \gamma_{ab}\big(u(u', v'), v(u', v')\big)\,\frac{\partial u^a}{\partial u^{c'}}\frac{\partial u^b}{\partial u^{d'}} . \tag{1.50}$$

Since this looks like two copies of the transformation rule, eq. (1.47), the quantity $\gamma_{ab}$ is said to transform like a covariant tensor of rank 2.
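Eq. (1.50) can be written as the matrix product $\gamma' = \Lambda^{\mathsf{T}}\gamma\,\Lambda$ with $\Lambda^a{}_{c'} = \partial u^a/\partial u^{c'}$. As an illustration (the point is an arbitrary choice), this sketch transforms the flat metric of eq. (1.29) from rectangular parameters $(x, y)$ to polar parameters $(\rho, \phi)$ with $x = \rho\cos\phi$, $y = \rho\sin\phi$.

```python
import numpy as np

rho, phi = 1.7, 0.9      # a point labelled by polar parameters (u', v') = (rho, phi)
gamma_cart = np.eye(2)   # induced metric of the plane in (u, v) = (x, y), cf. eq. (1.29)

# Lam[a, c'] = d u^a / d u^{c'} for x = rho cos(phi), y = rho sin(phi).
Lam = np.array([[np.cos(phi), -rho * np.sin(phi)],
                [np.sin(phi),  rho * np.cos(phi)]])

# Eq. (1.50): gamma_{c'd'} = gamma_ab (du^a/du^{c'})(du^b/du^{d'}), i.e. Lam^T gamma Lam.
gamma_polar = Lam.T @ gamma_cart @ Lam
# The same contraction written index by index with einsum.
gamma_polar_einsum = np.einsum("ab,ac,bd->cd", gamma_cart, Lam, Lam)
print(gamma_polar)
```

The result is $\mathrm{diag}(1, \rho^2)$, i.e. the familiar $ds^2 = d\rho^2 + \rho^2\,d\phi^2$.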
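More generally, if something having many indices transforms under a change of parameters like
$$T^{a'_1\cdots a'_k}{}_{b'_1\cdots b'_\ell}(u', v') = T^{c_1\cdots c_k}{}_{d_1\cdots d_\ell}\big(u(u',v'), v(u',v')\big)\,\frac{\partial u^{a'_1}}{\partial u^{c_1}}\cdots\frac{\partial u^{a'_k}}{\partial u^{c_k}}\,\frac{\partial u^{d_1}}{\partial u^{b'_1}}\cdots\frac{\partial u^{d_\ell}}{\partial u^{b'_\ell}} , \tag{1.51}$$
it is called a tensor of covariant rank $\ell$ and contravariant rank $k$. The special case of something which has no indices, such as an inner product between two vectors tangent to a surface,
$$\mathbf{m}\cdot\mathbf{n} = (m^a\mathbf{t}_a)\cdot(n^b\mathbf{t}_b) = m^a n^b\,\frac{\partial\mathbf{r}}{\partial u^a}\cdot\frac{\partial\mathbf{r}}{\partial u^b} = \gamma_{ab}\,m^a n^b , \tag{1.52}$$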

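transforms according to
$$\gamma_{a'b'}\,m^{a'}n^{b'} = \gamma_{cd}\left(\frac{\partial u^c}{\partial u^{a'}}\frac{\partial u^{a'}}{\partial u^e}\,m^e\right)\left(\frac{\partial u^d}{\partial u^{b'}}\frac{\partial u^{b'}}{\partial u^f}\,n^f\right) = \gamma_{ef}\,m^e n^f \tag{1.53}$$
(which uses eq. (1.48) twice) and is called a scalar.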


The reason tensors like this are important is that physical laws cannot depend
on our arbitrary choice of how we parameterize a surface. A physical statement
— like F = m a, say — directly relates physical objects, like vectors, distances or
inner products. And although the components of each quantity like F, m and a
can individually change when different parameters or bases are used, it is always
true that both sides of the equality transform in precisely the same way. Thus it is
important for Newton’s Law that F and the product of m times a both transform
as vectors.
We similarly demand in curved space that any reasonable physical law must have the form $A = B$, where both sides of the equation are tensors of precisely the same type. This ensures that once we know the components $A^{a_1\cdots a_k}{}_{b_1\cdots b_\ell}$ and $B^{a_1\cdots a_k}{}_{b_1\cdots b_\ell}$ are equal in a particular basis, it is automatic that they will also be equal in any other basis we might choose to examine.

1.2 More General Curved Space

We are now ready to kick away the crutch of embedding surfaces into flat $\mathbb{R}^3$ and
formulate directly what non-Euclidean geometry might look like in three (or more)
dimensions. The key in doing so is to focus on those relations derived above for
surfaces that do not make any reference at all to how the surface is situated within
its embedding space.

Tensors and Curvilinear Coordinates


We start by choosing an arbitrary set of coordinates, $x^i$, to label the points in three dimensions, without requiring that these coordinates be the usual rectangular $\{x, y, z\}$. For instance we could instead use spherical polar coordinates $x^i = \{r, \theta, \phi\}$, or any other choice of coordinates which happens to suit our purposes.¹
Just as for surfaces we can also define curves 'inscribed' within our space by specifying how the coordinates vary along the curve: $x^i(u) = \{x^1(u), x^2(u), x^3(u)\}$. At each point $P$ in our three-dimensional space we can define a tangent space, $T_P$ (i.e. a generalized tangent plane), comprising the vector space spanned by all of the tangents at $P$ to the curves that pass through $P$.

¹ As a technical point, it is not necessary that any one choice of coordinates describe all of the points in space. It is sufficient to have a collection of coordinate choices which cover the entire space once taken together, with sufficient overlap between pairs of coordinate patches to allow the results of measurements to be translated from one set of coordinates to another.
A choice of coordinates provides a natural basis for describing vectors that lie
within the tangent space at each point. This can be taken to be defined by the vectors
$\mathbf{t}_i$ that are tangent to the curves along which only one of the coordinates varies.
Notice that this basis need not be normalized or mutually orthogonal, although it
must be linearly independent and complete.
In terms of the basis $\mathbf{t}_i$, the tangent, $\mathbf{t}$, to any other curve defined by $x^i(w)$ has components
$$\mathbf{t} = \frac{dx^i}{dw}\,\mathbf{t}_i . \tag{1.54}$$
These components define a contravariant vector, in the sense that if we change coordinates from $x^i$ to $x^{i'}$, the components of $\mathbf{t}$ in the new basis vectors, $\mathbf{t}_{i'}$, are given by
$$\frac{dx^{i'}}{dw} = \frac{\partial x^{i'}}{\partial x^j}\frac{dx^j}{dw} . \tag{1.55}$$
Such a coordinate transformation is only well-defined if the matrix whose entries are $\partial x^{i'}/\partial x^j$ is invertible.
A contravariant tensor, $T$, having rank $p$ is similarly defined to have components involving $p$ indices that transform under a coordinate change according to
$$T^{i'_1\cdots i'_p}(x') = T^{j_1\cdots j_p}\big(x(x')\big)\,\frac{\partial x^{i'_1}}{\partial x^{j_1}}\cdots\frac{\partial x^{i'_p}}{\partial x^{j_p}} . \tag{1.56}$$
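Here is a short numerical check of eq. (1.55). The sketch is plain Python, and the example curve and helper names are hypothetical. It takes a curve given in polar coordinates and converts its tangent components to Cartesian ones using the Jacobian $\partial x^{i'}/\partial x^j$. It then compares the result with a direct finite-difference derivative of the same curve written in Cartesian coordinates.

```python
import math

# Hypothetical example curve and helpers illustrating eq. (1.55).

def curve_polar(w):
    """A sample curve in polar coordinates: r = 1 + w, theta = w**2."""
    return 1.0 + w, w * w


def tangent_polar(w):
    """Contravariant components (dr/dw, dtheta/dw) of the tangent."""
    return [1.0, 2.0 * w]


def tangent_cartesian(w):
    """Eq. (1.55) with x^{i'} = (x, y) and x^j = (r, theta)."""
    r, th = curve_polar(w)
    jac = [[math.cos(th), -r * math.sin(th)],   # dx/dr, dx/dtheta
           [math.sin(th), r * math.cos(th)]]    # dy/dr, dy/dtheta
    t = tangent_polar(w)
    return [sum(jac[i][j] * t[j] for j in range(2)) for i in range(2)]


def tangent_finite_difference(w, h=1e-6):
    """Direct central difference of (x(w), y(w)), for comparison."""
    def xy(s):
        r, th = curve_polar(s)
        return r * math.cos(th), r * math.sin(th)
    (xp, yp), (xm, ym) = xy(w + h), xy(w - h)
    return [(xp - xm) / (2 * h), (yp - ym) / (2 * h)]


w = 0.4
print(tangent_cartesian(w))
print(tangent_finite_difference(w))  # should agree to many decimal places
```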
Metrics
We now come to the central concept. The essence of the geometry is determined by specifying a notion of distance between points within the space. This is done by giving the metric, $g_{ij}(x) = g_{ji}(x)$, which is a symmetric three-by-three positive-definite matrix whose entries are functions of position. The metric $g_{ij}(x)$ is defined to give the distance between two infinitesimally displaced points, situated at $x^i$ and $x^i + dx^i$, as
$$ds = \sqrt{g_{ij}(x)\,dx^i dx^j} . \tag{1.57}$$
The square root is always real because $g_{ij}$ is positive definite, and $ds = 0$ only occurs if $dx^i = 0$. This last equation is more commonly written without the square root as
$$ds^2 = g_{ij}(x)\,dx^i dx^j . \tag{1.58}$$

Besides providing a notion of distance, the metric provides a natural way to define the angle between two curves. This is done by using the metric to define an inner product between the tangent vectors of the two curves at their point of intersection. That is, suppose the curves $x_1^i(u)$ and $x_2^i(v)$ both pass through the point $P$. Then their tangent vectors, $\mathbf{m}_1$ and $\mathbf{m}_2$, respectively have components $dx_1^i/du$ and $dx_2^i/dv$. Guided by eq. (1.35), we can then define the intersection angle between the two curves as
$$\cos\theta = \frac{\mathbf{m}_1\cdot\mathbf{m}_2}{\sqrt{(\mathbf{m}_1\cdot\mathbf{m}_1)(\mathbf{m}_2\cdot\mathbf{m}_2)}} , \tag{1.59}$$
evaluated at $P$, where the inner product is defined in terms of the vector components as
$$\mathbf{a}\cdot\mathbf{b} = g_{ij}\,a^i b^j . \tag{1.60}$$
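The following sketch shows eq. (1.59) at work. It is plain Python with hypothetical helper names, and it evaluates the angle between two tangent vectors using the polar-coordinate metric of the flat plane, $g = \mathrm{diag}(1, r^2)$, at the assumed point $r = 2$. The point is that the coordinate components alone would mislead. Treated as bare number pairs, $(1, 0)$ and $(1, 0.5)$ are about $26.6^\circ$ apart, but the metric places them $45^\circ$ apart.

```python
import math

# Hypothetical helpers for eqs. (1.59) and (1.60).

def inner(g, a, b):
    """a . b = g_ij a^i b^j."""
    n = len(g)
    return sum(g[i][j] * a[i] * b[j] for i in range(n) for j in range(n))


def angle(g, m1, m2):
    """Angle between tangent vectors m1 and m2 from eq. (1.59), in radians."""
    c = inner(g, m1, m2) / math.sqrt(inner(g, m1, m1) * inner(g, m2, m2))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp guards against rounding


r = 2.0
g = [[1.0, 0.0], [0.0, r * r]]   # flat plane in polar coordinates (r, theta), at r = 2
m1 = [1.0, 0.0]                  # purely radial displacement
m2 = [1.0, 0.5]                  # dr = 1 and r dtheta = 1
print(math.degrees(angle(g, m1, m2)))  # about 45 degrees
```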

Notice that the inner product of the tangents of the two curves $x_1^i(u)$ and $x_2^i(v)$ becomes
$$\mathbf{m}_1\cdot\mathbf{m}_2 = g_{ij}\,\frac{dx_1^i}{du}\frac{dx_2^j}{dv} , \tag{1.61}$$
and so, in particular, for basis vectors defined as tangents to the coordinate lines themselves we have
$$\mathbf{t}_i\cdot\mathbf{t}_j = g_{ij} . \tag{1.62}$$

Having a notion of angles also means we know what it means for vectors to be orthogonal: $\mathbf{a}\cdot\mathbf{b} = 0$. This then allows the definition of the second natural basis for vectors, $\mathbf{s}^i$, in terms of the normals to the surfaces on which one coordinate is held fixed. Such a dual basis must satisfy $\mathbf{s}^i\cdot\mathbf{t}_j = \delta^i{}_j$, and so if a vector $\mathbf{m}$ is expanded as $\mathbf{m} = m^i\mathbf{t}_i$ and $\mathbf{m} = m_i\mathbf{s}^i$, then the components are given by
$$m_i = \mathbf{m}\cdot\mathbf{t}_i = (m^j\mathbf{t}_j)\cdot\mathbf{t}_i = m^j g_{ij} . \tag{1.63}$$

It is convenient to define the quantities $g^{ij}$ as the components of the matrix that is inverse to the matrix whose components are $g_{ij}$. Such a matrix always exists because $g_{ij}$ is positive definite, which excludes the possibility of a zero eigenvalue (and hence of a non-invertible matrix). With this definition we have
$$g^{ij}g_{jk} = \delta^i{}_k , \tag{1.64}$$
and so multiplying eq. (1.63) by $g^{ik}$ (including the implied sum over $i$, from the Einstein summation convention) gives
$$m^k = g^{ki}m_i . \tag{1.65}$$
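Eqs. (1.63) to (1.65) amount to multiplying by $g_{ij}$ and by its inverse. This minimal plain-Python sketch assumes a hypothetical non-diagonal $2\times 2$ positive-definite metric at a single point. It lowers an index and then raises it again, recovering the original components.

```python
# Hypothetical helpers for eqs. (1.63)-(1.65) in two dimensions.

def inverse_2x2(g):
    """g^{ij}: the matrix inverse of g_ij, eq. (1.64)."""
    (a, b), (c, d) = g
    det = a * d - b * c
    if det == 0:
        raise ValueError("metric is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def lower(g, m_up):
    """Eq. (1.63): m_i = g_ij m^j."""
    return [sum(g[i][j] * m_up[j] for j in range(2)) for i in range(2)]


def raise_index(g_inv, m_down):
    """Eq. (1.65): m^k = g^{ki} m_i."""
    return [sum(g_inv[k][i] * m_down[i] for i in range(2)) for k in range(2)]


g = [[2.0, 0.5], [0.5, 1.0]]   # an assumed positive-definite metric at one point
g_inv = inverse_2x2(g)
m_up = [3.0, -1.0]
m_down = lower(g, m_up)
print(m_down)                       # [5.5, 0.5]
print(raise_index(g_inv, m_down))   # approximately [3.0, -1.0]
```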

Notice that its definition, together with the invariance of the distance element, $ds$, implies that under a coordinate transformation $g_{ij}$ transforms as
$$g_{i'j'} = g_{kl}\,\frac{\partial x^k}{\partial x^{i'}}\frac{\partial x^l}{\partial x^{j'}} , \tag{1.66}$$
which is called the transformation of a covariant tensor of rank 2. Similarly the covariant components, $m_i$, of a vector $\mathbf{m}$ transform as (compare with eq. (1.56))
$$m_{i'} = m_j\,\frac{\partial x^j}{\partial x^{i'}} , \tag{1.67}$$
which is the transformation of a covariant tensor of rank 1, or one-form. The transformation properties of covariant tensors of higher rank can be similarly defined.

Geodesics
Returning to the main line of development, following the example of curves on a surface, we now define a geodesic as the curve that minimizes the distance between two
points. Such curves are the natural generalization of the straight lines of Euclidean
geometry.
To determine the local equations that govern geodesics, we must first find an expression for the distance between two points, $A$ and $B$, that is to be minimized. If this distance is measured along a curve, $x^i(u)$, that connects them, the distance may be found by integrating the infinitesimal definition, eq. (1.57), in the form
$$s_{AB} = \int_{u_A}^{u_B} du\,\frac{ds}{du} = \int_{u_A}^{u_B} du\,\sqrt{g_{ij}(x(u))\,\frac{dx^i}{du}\frac{dx^j}{du}} = \int_{u_A}^{u_B} du\,\sqrt{g_{ij}(x(u))\,\dot x^i\dot x^j} . \tag{1.68}$$
This introduces the simplifying notation $\dot x^i := dx^i/du$.
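Eq. (1.68) can be evaluated numerically for any parameterized curve. The plain-Python sketch below uses composite Simpson's rule as its quadrature, which is an arbitrary choice. The sphere radius and the two curves are illustrative assumptions. For the sphere metric $ds^2 = a^2(d\theta^2 + \sin^2\theta\,d\phi^2)$ the sketch recovers two known lengths: $\pi a$ for half of the equator, and the shorter circumference $2\pi a\sin\theta$ for a circle of latitude.

```python
import math

# Hypothetical helper: eq. (1.68) evaluated with composite Simpson's rule.

def arc_length(metric, curve, velocity, u_a, u_b, n=1000):
    """Integrate sqrt(g_ij(x(u)) xdot^i xdot^j) du from u_a to u_b (n must be even)."""
    def integrand(u):
        x, xd = curve(u), velocity(u)
        g = metric(x)
        dim = len(x)
        return math.sqrt(sum(g[i][j] * xd[i] * xd[j]
                             for i in range(dim) for j in range(dim)))
    h = (u_b - u_a) / n
    total = integrand(u_a) + integrand(u_b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(u_a + k * h)
    return total * h / 3


a = 2.0  # assumed sphere radius


def sphere_metric(x):
    theta = x[0]
    return [[a * a, 0.0], [0.0, a * a * math.sin(theta) ** 2]]


def equator(u):
    """theta = pi/2, phi = u."""
    return (math.pi / 2, u)


def latitude_circle(u):
    """theta = pi/3, phi = u."""
    return (math.pi / 3, u)


def along_phi(u):
    """Velocity (dtheta/du, dphi/du) shared by both curves above."""
    return (0.0, 1.0)


half_equator = arc_length(sphere_metric, equator, along_phi, 0.0, math.pi)
circle = arc_length(sphere_metric, latitude_circle, along_phi, 0.0, 2 * math.pi)
print(half_equator, math.pi * a)
print(circle, 2 * math.pi * a * math.sin(math.pi / 3))
```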
If $x^i(u)$ were the curve of minimum length, then the quantity $s_{AB}$ should be stationary with respect to small changes, $x^i(u) \to x^i(u) + \delta x^i(u)$, to the curve, at least to first order in $\delta x^i(u)$. Such variations must vanish in the same way that small changes to a function, $f(x)$, vanish to linear order in $\delta x$ if they are evaluated at the function's minimum, $x = x_m$: $f(x_m + \delta x) - f(x_m) \simeq f'(x_m)\,\delta x = 0$. The conceptual difference here is that the length $s_{AB}$ is a functional that depends on the shape of the entire curve, $x^i(u)$, and not simply on its value at a single point, like $A$ or $B$.
To see what it means for $s_{AB}$ to be stationary, let us write it as $s_{AB}[x(u)]$, to emphasize that it depends on the shape of the curve $x^i(u)$ in addition to the endpoints. We then evaluate the difference, $\delta s_{AB} = s_{AB}[x(u) + \delta x(u)] - s_{AB}[x(u)]$, and expand the result out to linear order in $\delta x^i(u)$, using $g_{ij}(x(u) + \delta x(u)) \simeq g_{ij}(x(u)) + \delta x^k(u)\,\partial_k g_{ij}(x(u))$ in eq. (1.68) to find
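$$\begin{aligned}\delta s_{AB} &= \frac12\int_{u_A}^{u_B} du\,\left(\frac{\delta x^k\,\partial_k g_{ij}\,\dot x^i\dot x^j + g_{ij}\,\delta\dot x^i\,\dot x^j + g_{ij}\,\dot x^i\,\delta\dot x^j}{\sqrt{g_{ij}\dot x^i\dot x^j}}\right)\\ &= \left[\frac{g_{ij}\,\dot x^i\,\delta x^j}{\dot s}\right]_A^B - \int_{u_A}^{u_B} du\,\left[\ddot x^i + \Gamma^i_{kl}\,\dot x^k\dot x^l - \dot x^i\left(\frac{\ddot s}{\dot s}\right)\right]\frac{g_{ij}\,\delta x^j}{\dot s} .\end{aligned} \tag{1.69}$$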

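This uses the notation $\dot s = ds/du = \sqrt{g_{ij}\dot x^i\dot x^j}$ and $\ddot s = d^2s/du^2$, and defines
$$\Gamma^i_{jk} = \Gamma^i_{kj} := \frac12\,g^{il}\Big(\partial_j g_{kl} + \partial_k g_{jl} - \partial_l g_{jk}\Big) , \tag{1.70}$$
which is a useful quantity known as the Christoffel symbol of the second kind.² Finally, the last equality in eq. (1.69) removes all derivatives of $\delta x^i$ by performing an integration by parts, using the identity
$$\frac{g_{ij}\,\dot x^i\,\delta\dot x^j}{\dot s} = \frac{d}{du}\left(\frac{g_{ij}\,\dot x^i\,\delta x^j}{\dot s}\right) - \delta x^j\,\frac{d}{du}\left(\frac{g_{ij}\,\dot x^i}{\dot s}\right) . \tag{1.71}$$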
Exercise 5: Explicitly verify both equalities in eq. (1.69).
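Definition (1.70) is straightforward to evaluate numerically once the metric is known as a function of the coordinates. The following plain-Python sketch approximates $\partial_l g_{jk}$ by central differences and contracts the result with the inverse metric. The function names and the finite-difference step are assumptions, not part of the notes. Applied to the 2-sphere metric, it can be used to check the values quoted in eq. (1.76) of Exercise 7.

```python
import math

# Hypothetical helpers for evaluating eq. (1.70) numerically.

def invert(m):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def christoffel(metric, x, h=1e-5):
    """Gamma[i][j][k] = Gamma^i_{jk} from eq. (1.70), using central differences.

    `metric(x)` must return the n x n matrix g_ij at the point x (a list of floats).
    """
    n = len(x)
    dg = []  # dg[l][j][k] = partial_l g_jk
    for l in range(n):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[j][k] - gm[j][k]) / (2 * h) for k in range(n)] for j in range(n)])
    g_inv = invert(metric(x))
    return [[[0.5 * sum(g_inv[i][l] * (dg[j][k][l] + dg[k][j][l] - dg[l][j][k])
                        for l in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]


A = 1.5  # assumed sphere radius


def sphere_metric(x):
    """ds^2 = A^2 (dtheta^2 + sin^2(theta) dphi^2), coordinates x = (theta, phi)."""
    return [[A ** 2, 0.0], [0.0, A ** 2 * math.sin(x[0]) ** 2]]


G = christoffel(sphere_metric, [0.8, 0.3])
print(G[0][1][1], -math.sin(0.8) * math.cos(0.8))  # Gamma^theta_{phi phi}
print(G[1][0][1], 1.0 / math.tan(0.8))             # Gamma^phi_{theta phi}
```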
Now if $x^i(u)$ is a geodesic connecting $A$ and $B$ then $s_{AB}$ must be minimized among all paths that connect $A$ to $B$, so we must demand $\delta s_{AB} = 0$ for any choice of $\delta x^i(u)$ that satisfies $\delta x^i(A) = \delta x^i(B) = 0$. Since this last condition ensures $[g_{ij}\dot x^i\delta x^j/\dot s]_A^B = 0$, we ask what $x^i(u)$ must satisfy in order to ensure the vanishing of the integral in the last line of eq. (1.69).
But now comes the main point: because $\delta x^i(u)$ is arbitrary, we can choose it to vanish for all $u$ apart from being positive in an arbitrarily narrow interval immediately surrounding some point $u = u_\star$. This ensures that the integral receives contributions only from the integrand at $u_\star$, leading to the conclusion that the integrand must therefore vanish at this point. But since we can choose $\delta x^i(u)$ to peak about an arbitrary value of $u_\star$, and $s_{AB}$ must be stationary with respect to all such variations, we can conclude that this integrand must vanish for all $u$ when evaluated for any geodesic. Since $g_{ij}$ is positive definite (and hence invertible) and the direction of $\delta x^j$ is arbitrary, this is only possible if the square bracket vanishes, leading to the following geodesic equation:
$$\frac{D\dot x^i}{du} := \ddot x^i + \Gamma^i_{jk}\,\dot x^j\dot x^k = \dot x^i\left(\frac{\ddot s}{\dot s}\right) . \tag{1.72}$$
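Exercise 6: Use the transformation properties,
$$g_{ij} = g_{i'j'}\,\frac{\partial x^{i'}}{\partial x^i}\frac{\partial x^{j'}}{\partial x^j} \quad\text{and}\quad g^{ij} = g^{i'j'}\,\frac{\partial x^i}{\partial x^{i'}}\frac{\partial x^j}{\partial x^{j'}} , \tag{1.73}$$
under the coordinate transformation $x^i \to x^{i'}$ to derive the transformation law
$$\Gamma^i_{jk} = \Gamma^{i'}_{j'k'}\,\frac{\partial x^i}{\partial x^{i'}}\frac{\partial x^{j'}}{\partial x^j}\frac{\partial x^{k'}}{\partial x^k} + \frac{\partial^2 x^{i'}}{\partial x^j\,\partial x^k}\frac{\partial x^i}{\partial x^{i'}} , \tag{1.74}$$
and show thereby that the Christoffel symbols are not tensors. Similarly show that although $\dot x^i$ transforms as a contravariant vector, $\ddot x^i$ does not. Finally, show that the sum $\ddot x^i + \Gamma^i_{jk}\dot x^j\dot x^k$ does transform as a contravariant vector, ensuring that if it vanishes in one set of coordinates, it must also vanish in all others.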
² The Christoffel symbol of the first kind is $[i, jk] := g_{il}\Gamma^l_{jk} = \tfrac12(\partial_j g_{ik} + \partial_k g_{ij} - \partial_i g_{jk})$.

The special case where $\ddot s = 0$ (and so $u = as + b$ for constants $a$ and $b$) is called an affinely-parameterized geodesic, which satisfies
$$\ddot x^i + \Gamma^i_{jk}\,\dot x^j\dot x^k = 0 . \tag{1.75}$$

Exercise 7: Use the explicit form computed earlier for the metric on a 2-sphere of radius $a$, $ds^2 = a^2(d\theta^2 + \sin^2\theta\,d\phi^2)$ in spherical polar coordinates, to show that the only nonzero Christoffel symbols, $\Gamma^a_{bc}$, in these coordinates are:
$$\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta \quad\text{and}\quad \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta . \tag{1.76}$$
Use this to show that the equations for an affinely parameterized geodesic, $\{\theta(s), \phi(s)\}$, on a sphere are
$$\frac{d^2\theta}{ds^2} - \sin\theta\cos\theta\left(\frac{d\phi}{ds}\right)^2 = 0 \quad\text{and}\quad \frac{d^2\phi}{ds^2} + 2\cot\theta\left(\frac{d\theta}{ds}\right)\left(\frac{d\phi}{ds}\right) = 0 . \tag{1.77}$$
Show that the solutions to these equations are great circles. (HINT: It will simplify your life to choose your coordinates so that the two points connected by your geodesic are chosen to lie on the sphere's equator.)
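The claim that solutions of eq. (1.77) are great circles can also be checked numerically. This plain-Python sketch, with hypothetical helper names, integrates eq. (1.77) using a fixed-step classical fourth-order Runge-Kutta scheme (our choice of integrator). It starts from illustrative initial data on the equator. As it goes, it tracks how far the embedded point strays from the plane through the sphere's centre spanned by the initial position and velocity. The radius $a$ drops out of eq. (1.77), so the sphere is taken to have unit radius.

```python
import math

# Hypothetical helpers; the integrator is a textbook RK4, not taken from the notes.

def geodesic_rhs(y):
    """Eq. (1.77) as a first-order system for y = (theta, phi, dtheta/ds, dphi/ds)."""
    th, ph, dth, dph = y
    return [dth,
            dph,
            math.sin(th) * math.cos(th) * dph ** 2,
            -2.0 * (math.cos(th) / math.sin(th)) * dth * dph]


def rk4_step(y, ds):
    """One classical fourth-order Runge-Kutta step of size ds."""
    k1 = geodesic_rhs(y)
    k2 = geodesic_rhs([v + 0.5 * ds * k for v, k in zip(y, k1)])
    k3 = geodesic_rhs([v + 0.5 * ds * k for v, k in zip(y, k2)])
    k4 = geodesic_rhs([v + ds * k for v, k in zip(y, k3)])
    return [v + ds / 6.0 * (p + 2 * q + 2 * r + s)
            for v, p, q, r, s in zip(y, k1, k2, k3, k4)]


def embed(th, ph):
    """Unit vector in R^3 for the point (theta, phi) on a unit sphere."""
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Illustrative initial data: start on the equator, heading partly north.
y = [math.pi / 2, 0.0, 0.3, 1.0]
th0, ph0, dth0, dph0 = y
n0 = embed(th0, ph0)
# Initial velocity in R^3: (dn/dtheta) dtheta/ds + (dn/dphi) dphi/ds.
t0 = (math.cos(th0) * math.cos(ph0) * dth0 - math.sin(th0) * math.sin(ph0) * dph0,
      math.cos(th0) * math.sin(ph0) * dth0 + math.sin(th0) * math.cos(ph0) * dph0,
      -math.sin(th0) * dth0)
normal = cross(n0, t0)  # normal to the plane of the expected great circle

max_offset = 0.0
for _ in range(2000):   # total affine parameter s = 10, more than one full circuit
    y = rk4_step(y, 0.005)
    max_offset = max(max_offset, abs(dot(embed(y[0], y[1]), normal)))
print(max_offset)  # should be tiny: the path stays in one plane through the centre
```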

Curvature
Since the metric, $g_{ij}$, can take different forms in different coordinate systems, transforming as in eq. (1.66), when confronted with a complicated metric it is important to know how much of the complication comes from the underlying geometry and how much simply arises from the use of a complicated coordinate system. For instance, the following two metrics describe the same physical distance relation,
$$\begin{aligned} ds^2 &= dx^2 + dy^2 + dz^2 \\ ds^2 &= dx^2 + x^2\,dy^2 + x^2\sin^2 y\,dz^2 , \end{aligned} \tag{1.78}$$
but simply do so with different coordinate choices (rectangular and spherical coordinates, respectively). Given an arbitrary metric,
$$ds^2 = f(x,y,z)\,dx^2 + g(x,y,z)\,dy^2 + h(x,y,z)\,dz^2 + 2j(x,y,z)\,dx\,dy + 2k(x,y,z)\,dx\,dz + 2l(x,y,z)\,dy\,dz , \tag{1.79}$$
it is useful to have a criterion for deciding when a coordinate transformation exists, $x = x(u,v,w)$, $y = y(u,v,w)$ and $z = z(u,v,w)$, that can put this into a simple form like $ds^2 = du^2 + dv^2 + dw^2$, for which $g_{ij} = \delta_{ij}$.

In fact, at first sight it is tempting to conclude that it is always possible to perform such a transformation. After all, $g_{ij}$ can be regarded as defining the components of a real symmetric matrix, $g$, and the transformation rule, eq. (1.66), can be regarded as a congruence transformation,
$$g' = S\,g\,S^T , \tag{1.80}$$
where the superscript '$T$' denotes transpose, and the matrix $S$ has components $S_{i'}{}^{j} = \partial x^j/\partial x^{i'}$. But any real symmetric positive-definite matrix can always be made into the unit matrix with an appropriate choice of $S$, since it can first be diagonalized using an orthogonal matrix, and then its (positive) diagonal elements can be rescaled to unity.
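The pointwise argument can be made concrete. Instead of the diagonalize-then-rescale route described above, the plain-Python sketch below (with hypothetical helper names) uses a Cholesky factorization, $g = LL^T$. Taking $S = L^{-1}$ then gives $S\,g\,S^T = 1$. As the text emphasizes, this only uses the numbers $g_{ij}$ at one point, here an assumed positive-definite $3\times 3$ matrix.

```python
import math

# Hypothetical helpers; Cholesky is swapped in for diagonalize-then-rescale.

def cholesky(g):
    """Lower-triangular L with g = L L^T (g symmetric positive definite)."""
    n = len(g)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def invert_lower(L):
    """Inverse of a lower-triangular matrix by forward substitution, column by column."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(L[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / L[i][i]
    return inv


def congruence(S, g):
    """Compute S g S^T, as in eq. (1.80)."""
    n = len(g)
    return [[sum(S[i][k] * g[k][l] * S[j][l] for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]


g = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 2.0]]   # a hypothetical metric evaluated at a single point
S = invert_lower(cholesky(g))
C = congruence(S, g)
print(C)  # approximately the 3x3 identity matrix
```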
Although the above argument does show that it is always possible to choose coordinates so that $g_{ij} = \delta_{ij}$ at any one point, it does not follow that this can be done for an entire region of points at the same time (using the same coordinates). To see why, suppose that the required matrix, $S(x)$, is found that, when used in eq. (1.80), ensures $g_{i'j'} = \delta_{i'j'}$. This can only be accomplished by a coordinate transformation if there exist coordinates $x^i(x')$ such that
$$\frac{\partial x^i}{\partial x^{j'}} = S_{j'}{}^i . \tag{1.81}$$
But there can be integrability conditions that obstruct integrating these equations to find the required $x^i(x')$. For instance, if a solution is to exist it must satisfy $\partial^2 x^i/\partial x^{j'}\partial x^{k'} = \partial^2 x^i/\partial x^{k'}\partial x^{j'}$, so no solution is possible if it should happen that $\partial_{k'}S_{j'}{}^i \neq \partial_{j'}S_{k'}{}^i$.
It turns out that the freedom to change coordinates is sufficient to arrange both that $g_{ij} = \delta_{ij}$ at any particular point, $P$, and that $\Gamma^i_{jk} = 0$ at the same point. Such coordinates are called Gaussian normal coordinates at $P$. Although this can be arranged at any particular point, it cannot in general be arranged simultaneously at all points in an open region around a given point.

Flat space

If there exists a set of coordinates for which $g_{ij} = \delta_{ij}$ within an entire region, $R$, (such as is possible for a 2D plane in $\mathbb{R}^3$, say) then this region is said to be flat. A necessary and sufficient condition for this to be possible is that the following tensor:
$$R^i{}_{jkl} = \partial_k\Gamma^i_{jl} - \partial_l\Gamma^i_{jk} + \Gamma^i_{km}\Gamma^m_{jl} - \Gamma^i_{lm}\Gamma^m_{jk} , \tag{1.82}$$
must vanish, $R^i{}_{jkl} = 0$, everywhere in $R$. The tensor $R^i{}_{jkl}$ is called the Riemann curvature tensor. (For a proof of this see, for example, the text by Weinberg listed in the bibliography.)
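Eq. (1.82) can be evaluated numerically to test the flatness criterion. The sketch below is plain Python restricted to two dimensions. It computes $\Gamma^i_{jk}$ from eq. (1.70) and then $\partial_k\Gamma^i_{jl}$ by nested central differences, so it is only accurate to around $10^{-6}$; the step sizes are ad hoc assumptions. For the flat plane in polar coordinates, $ds^2 = dr^2 + r^2d\theta^2$, every component should vanish even though the metric is not constant. For a 2-sphere of radius $a$, the component $R^\theta{}_{\phi\theta\phi}$ should come out as $\sin^2\theta$.

```python
import math

# Hypothetical two-dimensional helpers; finite-difference steps are ad hoc choices.

def inv2(g):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def christoffel(metric, x, h=1e-4):
    """Gamma^i_{jk}, eq. (1.70), with central differences for partial_l g_jk."""
    dg = []  # dg[l][j][k] = partial_l g_jk
    for l in range(2):
        xp, xm = list(x), list(x)
        xp[l] += h
        xm[l] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[j][k] - gm[j][k]) / (2 * h) for k in range(2)] for j in range(2)])
    gi = inv2(metric(x))
    return [[[0.5 * sum(gi[i][l] * (dg[j][k][l] + dg[k][j][l] - dg[l][j][k]) for l in range(2))
              for k in range(2)] for j in range(2)] for i in range(2)]


def riemann(metric, x, h=1e-3):
    """R^i_{jkl}, eq. (1.82), returned as R[i][j][k][l]."""
    G = christoffel(metric, x)
    dG = []  # dG[k][i][j][l] = partial_k Gamma^i_{jl}
    for k in range(2):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Gp, Gm = christoffel(metric, xp), christoffel(metric, xm)
        dG.append([[[(Gp[i][j][l] - Gm[i][j][l]) / (2 * h) for l in range(2)]
                    for j in range(2)] for i in range(2)])
    idx = range(2)
    return [[[[dG[k][i][j][l] - dG[l][i][j][k]
               + sum(G[i][k][m] * G[m][j][l] - G[i][l][m] * G[m][j][k] for m in idx)
               for l in idx] for k in idx] for j in idx] for i in idx]


def polar_plane(x):
    """Flat plane in polar coordinates x = (r, theta): ds^2 = dr^2 + r^2 dtheta^2."""
    return [[1.0, 0.0], [0.0, x[0] ** 2]]


A = 2.0  # assumed sphere radius


def sphere(x):
    """2-sphere of radius A in coordinates x = (theta, phi)."""
    return [[A ** 2, 0.0], [0.0, A ** 2 * math.sin(x[0]) ** 2]]


R_flat = riemann(polar_plane, [1.5, 0.4])
R_sph = riemann(sphere, [0.9, 0.2])
print(max(abs(v) for a in R_flat for b in a for c in b for v in c))  # ~0: flat
print(R_sph[0][1][0][1], math.sin(0.9) ** 2)                          # should agree closely
```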

Exercise 8: Use the transformation properties for $\Gamma^i_{jk}$ derived in Exercise 6 to show that $R^i{}_{jkl}$ transforms as a tensor, ensuring that it suffices to show that the Riemann tensor vanishes in one coordinate system to conclude that it must vanish in them all.

Exercise 9: Use its definition, eq. (1.82), to prove the following symmetry properties of $R_{ijkl} := g_{im}R^m{}_{jkl}$:
$$R_{ijkl} = R_{klij} = -R_{jikl} = -R_{ijlk} , \tag{1.83}$$
and
$$R_{ijkl} + R_{iklj} + R_{iljk} = 0 . \tag{1.84}$$

It is a theorem that $R^i{}_{jkl}$ is the unique tensor that can be constructed only using the metric and its first and second derivatives at a point. Two related tensors can be built from the Riemann tensor by taking traces using the metric. These are the Ricci tensor, $R_{ij}$, and the Ricci scalar, $R$, defined by
$$R_{ij} := R^k{}_{ikj} \quad\text{and}\quad R := g^{ij}R_{ij} = g^{ij}R^k{}_{ikj} . \tag{1.85}$$

Exercise 10: Use the Christoffel symbols computed in Exercise 7 to compute explicitly the Riemann tensor for a 2-sphere in spherical polar coordinates. Show in this way that its only nonzero component (up to symmetries) is
$$R^\theta{}_{\phi\theta\phi} = \sin^2\theta , \tag{1.86}$$
and so $R_{ijkl} = (g_{ik}g_{jl} - g_{il}g_{jk})/a^2$, while $R_{ij} = (1/a^2)\,g_{ij}$ and $R = 2/a^2 = 2K$, where $K$ is the Gaussian curvature.

2. Special Relativity and Flat Spacetime


Once it is recognized that space can be curved, its geometrical properties fall within the domain of experiment, which can ask whether it is curved and how this curvature might manifest itself physically. And if spacetime geometry is a physical quantity, one might also seek the physical laws that govern its properties. General Relativity is the result to which such a search leads.
As a first step towards making the connection between gravity and a physical theory of geometry, it is important to realize that it is not just the geometry of three-dimensional space that is at play; rather it is the geometry of four-dimensional spacetime, defined as the union of all possible events in space for all times. Four coordinates (three spatial coordinates, $x^i$ with $i = 1, 2, 3$, as well as time, $x^0 = t$) are required to specify positions of events in spacetime. These are collectively denoted by $x^\mu$ with Greek indices like $\mu, \nu$ running from 0 to 3: $x^\mu = \{x^0, x^i\} = \{t, x^1, x^2, x^3\}$.
Within such a picture, point particles can be regarded as sweeping out world lines, $x^\mu(u)$, through spacetime as time evolves. For instance, if we use time, $t$, itself to parameterize such a world line, then a particle that sits motionless at the fixed position $\mathbf{r} = \mathbf{a}$ (or $x^i = a^i$, for constants $a^i$) has world line $x^\mu(t) = \{t, a^i\}$. The world line of a particle moving with constant velocity $v^i$ from the origin might similarly be written $x^\mu(t) = \{t, v^i t\}$, while that of a particle executing uniform circular motion in the $(x, y)$ plane would be $x^\mu(t) = \{t, a\cos(\omega t), a\sin(\omega t), 0\}$, and so on.

2.1 Minkowski Spacetime


If gravity is to be regarded as the physics of curved spacetime, we might expect that
the absence of gravity should be describable as the physics of flat spacetime. This
section aims to show that this is true, inasmuch as the non-gravitational physics
of special relativity is most efficiently expressed in terms of the geometry of flat
spacetime.

Inertial observers and the Minkowski metric


From this point of view, special relativity can be regarded as describing the motion of particles in a spacetime that is endowed with a metric, $ds^2 = g_{\mu\nu}\,dx^\mu dx^\nu$, for which coordinates can be found in which $g_{\nu\lambda}$ is constant (and so for which the Christoffel symbols and curvature vanish: $\Gamma^\mu_{\nu\lambda} = R^\mu{}_{\nu\lambda\rho} = 0$). Observers whose measurements are described by such coordinates are called inertial observers, and they are the observers to whom the standard postulates of special relativity apply:

1. Principle of Relativity: All laws of nature take the same form when written in
any inertial frame;

2. Invariance of the Speed of Light: All inertial observers measure precisely the same numerical value, $c = 299\,792\,458$ m/s, for the speed of light in vacuum.

We therefore require these observers to use rectangular coordinates in space, $x^i = \{x, y, z\}$, and to move relative to one another by at most a constant velocity.

Exercise 11: Astronomers detect distant objects in the sky that appear to move faster than light. How is this possible? Consider a very distant object moving towards us at speed $v$ at an angle $\theta$ to the line of sight. Suppose the object sends us two light rays that depart at times $t$ and $t + dt$, and these are received at times $t'$ and $t' + dt'$ (with all times measured in our rest frame), during which time the object moves a distance $dx = v\sin\theta\,dt$ transverse to the line of sight. If the distance to the object when the first signal is emitted is $D = c(t' - t)$, show that the distance to the object when the second ray is sent is $D - dD$ where $dD = c(dt - dt') \simeq v\cos\theta\,dt$, assuming $v\,dt \ll D$. Use this to show that the apparent lateral speed of the object is
$$v_{\rm eff} = \frac{dx}{dt'} = \left(\frac{dx}{dt}\right)\left(\frac{dt}{dt'}\right) \simeq \frac{v\sin\theta}{1 - (v/c)\cos\theta} , \tag{2.1}$$
which can satisfy $v_{\rm eff} > c$ if $\theta$ is close to zero and $v$ is close to (but smaller than) $c$.
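The maximum of eq. (2.1) is easy to explore numerically. This plain-Python sketch works in units with $c = 1$ and uses $v = 0.99$ as an illustrative speed. It scans the angle $\theta$ and compares the largest apparent speed with the analytic maximum. Setting $dv_{\rm eff}/d\theta = 0$ puts that maximum at $\cos\theta = v$, where $v_{\rm eff} = \gamma v$.

```python
import math

# Hypothetical helper for eq. (2.1), in units with c = 1.

def apparent_speed(v, theta):
    """v_eff = v sin(theta) / (1 - v cos(theta))."""
    return v * math.sin(theta) / (1.0 - v * math.cos(theta))


v = 0.99
gamma = 1.0 / math.sqrt(1.0 - v * v)
# Scan angles 0 < theta < pi on a grid of spacing 1e-4 and keep the largest v_eff.
best_speed, best_theta = max((apparent_speed(v, k * 1e-4), k * 1e-4) for k in range(1, 31416))
print(best_speed, best_theta)      # superluminal apparent speed, near theta = acos(v)
print(gamma * v, math.acos(v))     # analytic maximum and its location
```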

Because all inertial observers measure the same value for c, it is worth defining
our unit of distance to be light-seconds — i.e. the distance travelled by light in 1
second — so that c = 1 and the speed of any particle moving more slowly than light
satisfies 0 ≤ v < 1. (Such units would not be useful if all inertial observers did not
agree on the speed of light.) These units are used throughout the rest of these notes,
and conversion of subsequent formulae to ordinary units is accomplished by inserting
whatever factors of $c$ are required to give the expression the correct dimensions. (E.g. for a result like $v = 0.2$ to have the dimensions of m/s, its right-hand side must really be $0.2\,c$. Similarly, for $E$ an energy, $p$ a momentum and $m$ a mass, $E = p$ becomes $E = pc$ and $E = m$ becomes $E = mc^2$.)
These observations suggest choosing a metric on which all inertial observers agree, namely the constant Minkowski metric, $\eta_{\mu\nu}$, defined by
$$ds^2 = \eta_{\mu\nu}\,dx^\mu dx^\nu = -dt^2 + dx^2 + dy^2 + dz^2 ,$$
and so for rectangular coordinates $\{x^0, x^1, x^2, x^3\} = \{t, x, y, z\}$, we have
$$\eta_{\mu\nu} = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix} . \tag{2.2}$$

Notice that this metric is not positive definite, unlike the metrics considered when thinking about the geometry of three-dimensional space. But $ds^2$ is positive and agrees with our notion of distance in flat space if it is restricted to a purely spatial interval, along which $dt = 0$. (The possibility that $ds^2$ can be zero or negative is the main reason why the geometry of spacetime differs from the geometry of four-dimensional space.) If $ds^2 > 0$ the interval is called spacelike, and will turn out to represent the spatial distance along the interval for the particular inertial observers who see $dt = 0$ along it.
By contrast, the situation $ds^2 = 0$ describes the trajectory of a light ray. That is, $ds = 0$ implies $dt^2 = d\ell^2$, where $d\ell^2 = dx^2 + dy^2 + dz^2$ measures the spatial distance traversed. Clearly any such trajectory satisfies $d\ell/dt = 1$, and so moves at the speed of light (since $c = 1$). The requirement that all inertial observers agree on the interval $ds^2$ therefore includes as a special case the condition that all such observers agree on the speed of light in vacuo. An interval for which $ds^2 = 0$ is called a null interval.
In the situation where $ds^2 = -dt^2 + d\ell^2 < 0$, the interval corresponds to the world line of a particle moving at less than the speed of light, since $v^2 = (d\ell/dt)^2 = 1 + ds^2/dt^2 < 1$. In this case it is useful to define $d\tau = \sqrt{-ds^2}$, since this represents the proper time elapsed by the observer moving along this trajectory (for whom $d\ell = 0$). For this reason intervals for which $ds^2 < 0$ are called timelike.
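The three cases can be summarized as a small classification routine. This plain-Python sketch works in units with $c = 1$. The function name is hypothetical, and so is the tolerance used to decide when $ds^2$ counts as zero. The second returned value is the proper time $\sqrt{-ds^2}$ for timelike intervals and the proper distance $\sqrt{ds^2}$ for spacelike ones.

```python
import math

# Hypothetical helper, in units with c = 1 and eta = diag(-1, 1, 1, 1).

def classify_interval(dt, dx, dy, dz, tol=1e-12):
    """Return (kind, size) for the interval ds^2 = -dt^2 + dx^2 + dy^2 + dz^2."""
    ds2 = -dt * dt + dx * dx + dy * dy + dz * dz
    if abs(ds2) <= tol:
        return "null", 0.0
    if ds2 < 0:
        return "timelike", math.sqrt(-ds2)    # proper time d(tau)
    return "spacelike", math.sqrt(ds2)        # proper distance


print(classify_interval(2.0, 1.0, 0.0, 0.0))  # ('timelike', 1.7320508075688772)
print(classify_interval(1.0, 1.0, 0.0, 0.0))  # ('null', 0.0)
print(classify_interval(1.0, 2.0, 0.0, 0.0))  # ('spacelike', 1.7320508075688772)
```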

Lorentz transformations
The transformations of special relativity may now be defined as those which do not change the Minkowski metric, eq. (2.2), since all observers related by them will agree on physical distances and so also agree on physical laws that are expressed in terms of them. The resulting transformations are given by a combination of translations,
$$x^\mu \to x^\mu + a^\mu , \tag{2.3}$$
and linear transformations,
$$x^\mu \to \Lambda^\mu{}_\nu\,x^\nu , \tag{2.4}$$
where the constant matrices $\Lambda^\mu{}_\nu$ must satisfy
$$\eta_{\alpha\beta}\,\Lambda^\alpha{}_\mu\,\Lambda^\beta{}_\nu = \eta_{\mu\nu} . \tag{2.5}$$
The group of transformations defined by eqs. (2.3) through (2.5) is called the Poincaré group, while those defined by eqs. (2.4) and (2.5) alone are called the Lorentz group, or the group $O(3,1)$.
Spatial rotations provide a special case, for which
$$\Lambda^\mu{}_\nu = \begin{pmatrix} 1 & \\ & M^i{}_j \end{pmatrix} , \tag{2.6}$$
where $i, j = 1, 2, 3$ run over purely spatial directions, and $M^i{}_j$ is an arbitrary $3\times 3$ orthogonal matrix: $\delta_{ij}M^i{}_k M^j{}_l = \delta_{kl}$. The group of all such matrices is called $O(3)$. For instance, for rotations about the $z$ axis through an angle $\alpha$ we would have
$$(M_z)^i{}_j = \begin{pmatrix} \cos\alpha & \sin\alpha & \\ -\sin\alpha & \cos\alpha & \\ & & 1 \end{pmatrix} . \tag{2.7}$$

A second special case is given by a boost, which relates two inertial observers who move at constant speed relative to one another. For instance if the motion is along the $x$ axis, then such a boost is described by
$$(\Lambda_x)^\mu{}_\nu = \begin{pmatrix} \cosh\beta & \sinh\beta & & \\ \sinh\beta & \cosh\beta & & \\ & & 1 & \\ & & & 1 \end{pmatrix} , \tag{2.8}$$
where $\beta$ is a parameter related to the relative speed of the two observers connected by the boost (more about which below). Boosts along the $y$ and $z$ axes are similarly given by
$$(\Lambda_y)^\mu{}_\nu = \begin{pmatrix} \cosh\beta & & \sinh\beta & \\ & 1 & & \\ \sinh\beta & & \cosh\beta & \\ & & & 1 \end{pmatrix} \quad\text{and}\quad (\Lambda_z)^\mu{}_\nu = \begin{pmatrix} \cosh\beta & & & \sinh\beta \\ & 1 & & \\ & & 1 & \\ \sinh\beta & & & \cosh\beta \end{pmatrix} . \tag{2.9}$$

Exercise 12: Verify that the transformations (2.6) and (2.8) satisfy
condition (2.5).
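Condition (2.5) can be tested numerically for any candidate matrix. The plain-Python sketch below uses hypothetical helper names and stores matrices as `L[mu][nu]` $= \Lambda^\mu{}_\nu$. It checks the boost (2.8) and the rotation given by (2.6) and (2.7), and shows that a simple stretch of the $x$ axis fails the test.

```python
import math

# Hypothetical helpers; matrices are stored as L[mu][nu] = Lambda^mu_nu.
ETA = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def boost_x(beta):
    """Eq. (2.8)."""
    ch, sh = math.cosh(beta), math.sinh(beta)
    return [[ch, sh, 0.0, 0.0], [sh, ch, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]


def rotation_z(alpha):
    """Eqs. (2.6) and (2.7): rotation about z, embedded in a 4x4 matrix."""
    c, s = math.cos(alpha), math.sin(alpha)
    return [[1.0, 0.0, 0.0, 0.0], [0.0, c, s, 0.0], [0.0, -s, c, 0.0], [0.0, 0.0, 0.0, 1.0]]


def preserves_eta(L, tol=1e-12):
    """True if eta_{ab} L^a_m L^b_n equals eta_{mn} for all m, n (eq. (2.5))."""
    for m in range(4):
        for n in range(4):
            lhs = sum(ETA[a][b] * L[a][m] * L[b][n] for a in range(4) for b in range(4))
            if abs(lhs - ETA[m][n]) > tol:
                return False
    return True


stretch_x = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(preserves_eta(boost_x(0.8)), preserves_eta(rotation_z(1.1)))  # True True
print(preserves_eta(stretch_x))                                     # False
```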

2.2 Inertial Particle Motion


Newton's first law states that a particle does not accelerate in the absence of external forces, and so in special relativity the spacetime trajectory (or world line) of such an inertial particle (on which no forces act) is given by a straight line,
$$x^\mu(\lambda) = a^\mu + v^\mu f(\lambda) , \tag{2.10}$$
where $a^\mu$ and $v^\mu$ are constant 4-vectors and $\lambda$ is a parameter that labels the points along the curve (with the otherwise arbitrary function $f$ satisfying $df/d\lambda > 0$). For later purposes notice that any such curve satisfies
$$\frac{d^2x^\mu}{d\lambda^2} = \frac{d^2f}{d\lambda^2}\,v^\mu = \left(\frac{d^2f/d\lambda^2}{df/d\lambda}\right)\frac{dx^\mu}{d\lambda} , \tag{2.11}$$
and so can be interpreted as a geodesic in flat spacetime (c.f. eq. (1.72)).

The interval measured along the trajectory is
$$ds^2 = \eta_{\mu\nu}\,\frac{dx^\mu}{d\lambda}\frac{dx^\nu}{d\lambda}\,d\lambda^2 = (v\cdot v)\left(\frac{df}{d\lambda}\right)^2 d\lambda^2 , \tag{2.12}$$
so it follows that $v^\mu$ must satisfy $v\cdot v = \eta_{\mu\nu}v^\mu v^\nu < 0$ for a timelike trajectory, in which case the vector $v^\mu$ is also said to be timelike. (By contrast, for motion at the speed of light, such as for a photon, $v^\mu$ would instead be null: $v\cdot v = 0$.)
For motion slower than the speed of light we define the proper time, $\tau$, as the distance measured along the trajectory, and so $ds^2 = -d\tau^2$, and it is convenient to use $\lambda = \tau$ as the parameter along the curve. In this case $u^\mu := dx^\mu/d\tau$ is called the 4-velocity of the trajectory, and eq. (2.12) then implies $u\cdot u = -1$. Writing its components as
$$u^\mu = \frac{dx^\mu}{d\tau} = \left(\frac{dt}{d\tau}, \frac{dx}{d\tau}, \frac{dy}{d\tau}, \frac{dz}{d\tau}\right) \tag{2.13}$$
$$\phantom{u^\mu} = \frac{dt}{d\tau}\left(1, \frac{dx}{dt}, \frac{dy}{dt}, \frac{dz}{dt}\right) , \tag{2.14}$$
the condition $u\cdot u = -1$ implies that $dt/d\tau$ satisfies $(dt/d\tau)^2(1 - \mathbf{v}^2) = 1$, where the velocity 3-vector, $\mathbf{v}$, is defined to have components $v^i = dx^i/dt$. We read off from this the time dilation that relates the proper time $\tau$ to the time $t$ of the observer with respect to which the trajectory has velocity $\mathbf{v}$:
$$\frac{dt}{d\tau} := \gamma = \frac{1}{\sqrt{1 - \mathbf{v}^2}} , \tag{2.15}$$
where the condition $dt/d\tau > 0$ fixes the sign of the square root used in this expression.
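Eqs. (2.14) and (2.15) translate directly into code. In this plain-Python sketch (units with $c = 1$, hypothetical helper names), a 3-velocity of $0.6$ along $x$ gives $\gamma = 1.25$. The resulting 4-velocity satisfies $u\cdot u = -1$ up to rounding.

```python
import math

# Hypothetical helpers for eqs. (2.14) and (2.15), in units with c = 1.

def four_velocity(vel):
    """u^mu = gamma (1, v^x, v^y, v^z) for a 3-velocity `vel` with |vel| < 1."""
    v2 = sum(c * c for c in vel)
    if v2 >= 1.0:
        raise ValueError("speed must be below the speed of light")
    gamma = 1.0 / math.sqrt(1.0 - v2)
    return [gamma] + [gamma * c for c in vel]


def minkowski_dot(a, b):
    """eta_{mu nu} a^mu b^nu with eta = diag(-1, 1, 1, 1)."""
    return -a[0] * b[0] + sum(p * q for p, q in zip(a[1:], b[1:]))


u = four_velocity([0.6, 0.0, 0.0])
print(u)                     # approximately [1.25, 0.75, 0.0, 0.0]
print(minkowski_dot(u, u))   # approximately -1
```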
We may now relate the parameter $\beta$ appearing in a Lorentz boost to the speed, $v$, of the inertial observers involved, and thereby verify that eq. (2.8) describes a standard Lorentz transformation familiar from special relativity. To this end, suppose $\Lambda^\mu{}_\nu$ is the Lorentz boost which transforms from the frame of an observer at rest (and so whose 4-velocity is $u^\mu = (1, 0, 0, 0)$) to the frame of an inertial observer moving with speed $v$ along the $x$ axis (and so whose 4-velocity is $u^\mu = (\gamma, \gamma v, 0, 0)$). Requiring that the Lorentz transformation of eq. (2.8) is the one that relates these two 4-velocities gives the parameter $\beta$ in terms of the speed, $v$. That is, if
$$\begin{pmatrix} \gamma \\ \gamma v \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} \cosh\beta & \sinh\beta & & \\ \sinh\beta & \cosh\beta & & \\ & & 1 & \\ & & & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \tag{2.16}$$
then $\cosh\beta = \gamma$ and $\sinh\beta = \gamma v$, and so $\tanh\beta = v$. Notice that the definition $\gamma = (1 - v^2)^{-1/2}$ is then equivalent to the identity $\cosh^2\beta - \sinh^2\beta = 1$. The parameter $\beta$ is sometimes called the rapidity of the moving particle.
Exercise 13: Prove the identity $\Lambda_x(\beta_1)\Lambda_x(\beta_2) = \Lambda_x(\beta_1 + \beta_2)$ for the composition of two boosts along the $x$ axis, as in eq. (2.8), and use this to show that the inverse of the matrix $\Lambda_x(\beta)$ is $\Lambda_x^{-1}(\beta) = \Lambda_x(-\beta)$. Use your result with the relation $v/c = \tanh\beta$ to derive the relativistic law for adding velocities: if $\beta = \beta_1 + \beta_2$ then
$$v = \frac{v_1 + v_2}{1 + v_1 v_2/c^2} . \tag{2.17}$$
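Because rapidities add under collinear boosts, eq. (2.17) can be checked directly. Convert each speed to a rapidity with $\beta = \tanh^{-1}v$, add the rapidities, and convert back. The sketch is plain Python in units with $c = 1$, and the function names are hypothetical.

```python
import math

# Hypothetical helpers, in units with c = 1.

def add_velocities(v1, v2):
    """Compose collinear boosts by adding rapidities beta = atanh(v)."""
    return math.tanh(math.atanh(v1) + math.atanh(v2))


def einstein_formula(v1, v2):
    """Eq. (2.17) with c = 1."""
    return (v1 + v2) / (1.0 + v1 * v2)


print(add_velocities(0.5, 0.5), einstein_formula(0.5, 0.5))   # both approximately 0.8
print(add_velocities(0.9, 0.9))                               # about 0.9945, still below 1
```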

Using this connection between $\beta$ and $v$ in the relation between the coordinates in these two frames, $x^{\mu'} = \Lambda^{\mu'}{}_\nu\,x^\nu$, or
$$\begin{pmatrix} t' \\ x' \\ y' \\ z' \end{pmatrix} = \begin{pmatrix} \cosh\beta & \sinh\beta & & \\ \sinh\beta & \cosh\beta & & \\ & & 1 & \\ & & & 1 \end{pmatrix}\begin{pmatrix} t \\ x \\ y \\ z \end{pmatrix} , \tag{2.18}$$
leads (temporarily restoring the factors of $c$) to the familiar expressions
$$t' = \frac{t + vx/c^2}{\sqrt{1 - v^2/c^2}} , \qquad x' = \frac{x + vt}{\sqrt{1 - v^2/c^2}} , \tag{2.19}$$
together with $y' = y$ and $z' = z$. These expressions imply that events sharing a common value for $t$ are not the same as those sharing a common value for $t'$ (i.e. the relativity of simultaneity), and it is this fact that makes it much more efficient to think in terms of spacetime, rather than space and time separately.

Exercise 14: Calculate the relation between the coordinates $\{t', x', y', z'\}$ and $\{t, x, y, z\}$ obtained by first performing a boost in the $x$ direction with speed $v$ followed by a boost in the $y$ direction with speed $u$.

Lorentz tensors
Physical quantities in different inertial frames in special relativity transform as tensors with respect to Lorentz transformations,
$$T^{\mu'_1\cdots\mu'_p}{}_{\lambda'_1\cdots\lambda'_q} = T^{\nu_1\cdots\nu_p}{}_{\rho_1\cdots\rho_q}\,\Lambda^{\mu'_1}{}_{\nu_1}\cdots\Lambda^{\mu'_p}{}_{\nu_p}\,\Lambda^{\rho_1}{}_{\lambda'_1}\cdots\Lambda^{\rho_q}{}_{\lambda'_q} . \tag{2.20}$$
As a result the Principle of Relativity is automatically satisfied by physical laws having the schematic form tensor = tensor, since the tensor transformation rule ensures that if the law is true in any one frame, it must be true in them all.

For instance, the instantaneous 4-momentum of a particle having rest-mass $m$ moving along a trajectory $x^\mu(\tau)$ transforms as a 4-vector, and is defined in terms of the 4-velocity, eq. (2.13), by
$$p^\mu = m\,\frac{dx^\mu}{d\tau} = m\,u^\mu . \tag{2.21}$$
The components of $p^\mu$ define the particle's instantaneous energy, $E = p^0$, and 3-momentum, $p^i$, and so (using the components for $u^\mu$ found earlier):
$$p^0 = E = m\gamma = \frac{m}{\sqrt{1 - \mathbf{v}^2}} \quad\text{and}\quad p^i = m\gamma\,v^i = \frac{m\,v^i}{\sqrt{1 - \mathbf{v}^2}} . \tag{2.22}$$
Notice that the condition $\eta_{\mu\nu}(dx^\mu/d\tau)(dx^\nu/d\tau) = -1$ implies $\eta_{\mu\nu}p^\mu p^\nu = p_\mu p^\mu = -m^2$, which is equivalent to the relativistic energy-momentum relation
$$E^2 = \mathbf{p}^2 + m^2 . \tag{2.23}$$

To describe photons we take the limit $m \to 0$ and $d\tau \to 0$, so that $p^\mu$ remains fixed and well-defined. (The velocity $dx^\mu/d\lambda$ is also well-defined, although it is no longer possible to choose proper time, $\tau$, as the parameter along the world line.) The resulting 4-momentum satisfies $\eta_{\mu\nu}p^\mu p^\nu = p_\mu p^\mu = 0$, and so $E = |\mathbf{p}|$.
As an example of the utility of knowing that quantities like $p^\mu$ and $u^\mu$ transform as 4-vectors under Lorentz transformations, consider the following proof that
$$
E = -u_\mu p^\mu = -\eta_{\mu\nu}u^\mu p^\nu \tag{2.24}
$$
gives the energy of a particle with 4-momentum $p^\mu$ as seen by an observer with 4-velocity $u^\mu$. The proof starts by showing (by direct evaluation) that the result is trivially true in the special case where the observer is at rest. In that case $u^\mu = \{1, 0, 0, 0\}$, and so $-\eta_{\mu\nu}u^\mu p^\nu = p^0$. To obtain the result for a general observer it suffices to recognize that the 4-vector transformation properties of $u^\mu$ and $p^\nu$ ensure that the quantity $u_\mu p^\mu$ is Lorentz invariant. That is, if $x^{\mu'} = \Lambda^{\mu'}{}_\nu x^\nu$ is the Lorentz transformation that takes us to the observer's rest frame, then $p^{\nu'} = \Lambda^{\nu'}{}_\lambda p^\lambda$ and $u^{\mu'} = \Lambda^{\mu'}{}_\rho u^\rho$, and so
$$
\eta_{\mu'\nu'}\,u^{\mu'}p^{\nu'} = \eta_{\mu'\nu'}\,\Lambda^{\mu'}{}_\rho u^\rho\,\Lambda^{\nu'}{}_\lambda p^\lambda = \eta_{\rho\lambda}\,u^\rho p^\lambda , \tag{2.25}
$$
which uses eq. (2.5). This ensures that all inertial observers must obtain the same value for $u_\mu p^\mu$. It therefore suffices to show that $E = -u_\mu p^\mu$ in the observer's rest frame to conclude it must be true in any frame.
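The argument above can be checked numerically with a small sketch. It boosts to the rest frame of an observer moving along $x$ and compares $-u_\mu p^\mu$, evaluated in the original frame, with the time component of the boosted 4-momentum. It assumes $c = 1$ and a boost matrix with the sign chosen so that the observer ends up at rest. The function names and numbers are illustrative.

```python
import math

ETA = (-1.0, 1.0, 1.0, 1.0)  # eta_{mu nu}, signature (-,+,+,+), c = 1

def mdot(a, b):
    """Lorentz-invariant product eta_{mu nu} a^mu b^nu."""
    return sum(e * x * y for e, x, y in zip(ETA, a, b))

def boost_x(beta):
    """Matrix Lambda^{mu'}_nu of a boost along x with speed beta."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    return [[g, -g * beta, 0.0, 0.0],
            [-g * beta, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def transform(L, x):
    """x^{mu'} = Lambda^{mu'}_nu x^nu."""
    return [sum(L[i][j] * x[j] for j in range(4)) for i in range(4)]

def four_velocity(v):
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [g] + [g * c for c in v]

m = 1.5
p = [m * c for c in four_velocity((0.6, 0.2, 0.0))]   # particle 4-momentum
beta = 0.45
u = four_velocity((beta, 0.0, 0.0))                   # observer moving along x

L = boost_x(beta)                                     # takes us to the observer's rest frame
u_rest, p_rest = transform(L, u), transform(L, p)

print(u_rest)                   # close to [1, 0, 0, 0]
print(-mdot(u, p), p_rest[0])   # eq. (2.24): both give the energy the observer measures
```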

2.3 Non-inertial Motion


The geometry of flat space captures equally well the relativistic kinematics of particles that are not moving at constant velocity.

Accelerated particles
For instance, consider an arbitrary trajectory, $x^\mu(\tau)$, that does not describe motion at constant velocity. One example is the following trajectory, which describes a particle that accelerates along the $x$ axis from rest at $x = 0$ until its speed reaches $v = v_{\rm max}$. At that point it decelerates back to rest a distance $\ell$ away, and then returns to $x = 0$, again at rest:
$$
x^\mu(t) = \Big\{t,\, x(t),\, y(t),\, z(t)\Big\} = \left\{t,\ \ell\,\sin^2\!\left(\frac{v_{\rm max}\,t}{\ell}\right),\ 0,\ 0\right\} . \tag{2.26}
$$
Here the inertial observer's time, $t$, is used to label points on the curve, with $0 \le t \le T = \pi\ell/v_{\rm max}$ describing the entire round trip. The turning point at $x = \ell$ is achieved at $t = \tfrac12 T$. The instantaneous particle speed seen by the inertial observer is
$$
v(t) = \frac{dx}{dt} = v_{\rm max}\,\sin\left(\frac{2v_{\rm max}\,t}{\ell}\right) , \tag{2.27}
$$
and so the maximum speed on the outbound leg takes place at $t = \tfrac14 T$.
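The proper time measured by a clock riding with the particle along such a trajectory is
$$
d\tau^2 = -ds^2 = -\eta_{\mu\nu}\,dx^\mu(t)\,dx^\nu(t) = \big[1 - v^2(t)\big]\,dt^2 , \tag{2.28}
$$
and so the 4-velocity and 4-acceleration become
$$
\begin{aligned}
u^\mu &= \frac{dx^\mu}{d\tau} = \frac{dt}{d\tau}\,\frac{dx^\mu}{dt} = \frac{1}{\sqrt{1 - v^2(t)}}\,\Big\{1,\ v(t),\ 0,\ 0\Big\} \\
\text{and}\quad a^\mu &:= \frac{d^2x^\mu}{d\tau^2} = \frac{dt}{d\tau}\,\frac{du^\mu}{dt} = \frac{dv/dt}{[1 - v^2(t)]^2}\,\Big\{v(t),\ 1,\ 0,\ 0\Big\} ,
\end{aligned} \tag{2.29}
$$
with
$$
\frac{dv}{dt} = \frac{2v_{\rm max}^2}{\ell}\,\cos\left(\frac{2v_{\rm max}\,t}{\ell}\right) . \tag{2.30}
$$
In relativistic Newtonian mechanics the force responsible for this motion is described by a 4-vector, $F^\mu = m\,a^\mu$. Notice that all inertial observers must agree on the proper acceleration given by the Lorentz-invariant definition
$$
a^2 := \eta_{\mu\nu}\,a^\mu a^\nu = a_\mu a^\mu = \frac{1}{[1 - v^2(t)]^3}\left(\frac{dv}{dt}\right)^2 . \tag{2.31}
$$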
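To check eqs. (2.26)-(2.31) numerically, the sketch below evaluates the 4-velocity and 4-acceleration at one instant. It verifies $u_\mu u^\mu = -1$, $u_\mu a^\mu = 0$ and eq. (2.31), and compares against finite differences of $x(t)$ and $u^\mu(t)$. The values of $\ell$, $v_{\rm max}$ and the sample time are illustrative, and $c = 1$.

```python
import math

ell, vmax = 1.0, 0.8                 # illustrative values (c = 1)
T = math.pi * ell / vmax             # duration of the round trip

def x_of_t(t):
    return ell * math.sin(vmax * t / ell) ** 2                 # eq. (2.26)

def v_of_t(t):
    return vmax * math.sin(2 * vmax * t / ell)                 # eq. (2.27)

def dvdt(t):
    return 2 * vmax**2 / ell * math.cos(2 * vmax * t / ell)    # eq. (2.30)

def four_velocity(t):
    g = 1 / math.sqrt(1 - v_of_t(t) ** 2)
    return [g, g * v_of_t(t), 0.0, 0.0]                        # eq. (2.29)

def four_acceleration(t):
    pref = dvdt(t) / (1 - v_of_t(t) ** 2) ** 2
    return [pref * v_of_t(t), pref, 0.0, 0.0]                  # eq. (2.29)

def mdot(a, b):
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

t, h = 0.3 * T, 1e-6
u, a = four_velocity(t), four_acceleration(t)

# Finite-difference checks: v = dx/dt, and a^mu = (dt/dtau) du^mu/dt with dt/dtau = u^0.
v_fd = (x_of_t(t + h) - x_of_t(t - h)) / (2 * h)
a_fd = [u[0] * (p - q) / (2 * h) for p, q in zip(four_velocity(t + h), four_velocity(t - h))]

print(mdot(u, u))                                               # close to -1
print(mdot(u, a))                                               # close to 0
print(mdot(a, a), dvdt(t) ** 2 / (1 - v_of_t(t) ** 2) ** 3)     # eq. (2.31)
```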
Exercise 15: Compute the proper time, 4-velocity, 4-momentum and 4-acceleration for the following trajectories: (a) constant proper acceleration along the $z$ axis, $x^\mu(u) = \{\ell\sinh(\alpha u), 0, 0, \ell\cosh(\alpha u)\}$, and (b) uniform circular motion in the $x$-$y$ plane, $x^\mu(u) = \{t, d\cos(\omega t), d\sin(\omega t), 0\}$. What is the physical interpretation of the parameters $\ell$, $\alpha$, $d$ and $\omega$ used in these trajectories?

Exercise 16: Suppose a family of light rays having frequency $\omega_\star$ is sent parallel to the $x$-$y$ plane at an angle $\theta$ to the $x$ axis, and so has 4-momentum $k^\mu = \{\hbar\omega_\star, \hbar\omega_\star\cos\theta, \hbar\omega_\star\sin\theta, 0\}$. Show that this satisfies $k_\mu k^\mu = 0$, as it must if it is tangent to the trajectory of a light ray. Use the relations $E = \hbar\omega$ and $E = -\eta_{\mu\nu}u^\mu k^\nu$ to evaluate the frequency of the photons that is measured by observers moving along the accelerated trajectories in the previous exercise (Exercise 15).

Twin ‘paradox’
The Twin 'Paradox' compares the time elapsed for two identical clocks (or twins), one of which travels along an accelerated trajectory as described above, while the other remains at rest at $x = 0$. The time elapsed for the motionless clock is simply the difference in $t$ between the events when the two clocks separate and rejoin, and so is $\Delta t = t_f - t_i = \pi\ell/v_{\rm max} = T$. The time elapsed by the moving clock is found by integrating eq. (2.28):
$$
\Delta\tau = \int_0^T dt\,\sqrt{1 - v^2(t)} = \int_0^T dt\,\sqrt{1 - v_{\rm max}^2\,\sin^2\!\left(\frac{2\pi t}{T}\right)} = \frac{2T\,E(v_{\rm max})}{\pi} , \tag{2.32}
$$
where $E(v)$ denotes the Elliptic-E function (the complete elliptic integral of the second kind), defined by
$$
E(v) := \int_0^1 dx\,\sqrt{\frac{1 - v^2x^2}{1 - x^2}} , \tag{2.33}
$$
which satisfies $E(0) = \pi/2$ and $E(1) = 1$. The result for $\Delta\tau/\Delta t$, the elapsed proper time for the moving twin as a fraction of the time elapsed for the twin at rest, is given as a function of $v_{\rm max}/c$ in Fig. 4.

Figure 4: The ratio $\Delta\tau/\Delta t$ of time elapsed for the moving and stationary twins as a function of the moving twin's maximum speed, $v$.

The 'paradox' is that the moving twin sees less time pass. This is not really a paradox at all, since there is no reason why clocks on inertial and accelerated trajectories must agree with one another. Indeed, the trajectory of the clock at rest is a geodesic for the Minkowski metric $\eta_{\mu\nu}$: it is, after all, a straight line, and this metric is constant (and so flat). However, the negative sign in the time part of the Minkowski metric ensures that time-like geodesics describe the maximum proper time between two points in spacetime. By contrast, geodesics in space give the minimum distance between two points. We are therefore guaranteed that every other, accelerating, clock records an elapsed time that is smaller than that of the clock at rest.
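The closed form in eq. (2.32) can be cross-checked by integrating directly. The sketch below uses the composite Simpson rule, a standard quadrature choice made here for illustration (it is not a method from the notes). It compares $\Delta\tau/\Delta t$ with $2E(v_{\rm max})/\pi$, rewriting eq. (2.33) via $x = \sin\theta$ to remove the endpoint singularity. Speeds are in units of $c$. Each printed row should show matching values, and the ratio should decrease with speed, as in Fig. 4.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def proper_time_ratio(vmax, T=1.0):
    """Delta tau / Delta t from direct integration of eq. (2.32)."""
    integrand = lambda t: math.sqrt(1 - (vmax * math.sin(2 * math.pi * t / T)) ** 2)
    return simpson(integrand, 0.0, T) / T

def elliptic_E(k):
    """E(k) of eq. (2.33), rewritten with x = sin(theta) to remove the endpoint singularity."""
    return simpson(lambda th: math.sqrt(1 - (k * math.sin(th)) ** 2), 0.0, math.pi / 2)

speeds = (0.0, 0.2, 0.4, 0.6, 0.8, 0.95)
for vmax in speeds:
    print(f"{vmax:4.2f}  {proper_time_ratio(vmax):.6f}  {2 * elliptic_E(vmax) / math.pi:.6f}")
```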

Exercise 17: Imagine two clocks that both perform uniform circular
motion of radius a in the x-y plane, but in opposite directions: xµ (u) =
{t, a cos(ωt), ± a sin(ωt), 0}. Suppose these clocks are synchronized to
agree when they are coincident at x = a at t = 0. How much time
elapses until the next time the clocks are at x = a, as seen by each clock
as well as by the inertial observer whose time is labelled by t?

Noninertial observers
In special relativity the laws of nature are simpler as seen by inertial observers, whose rectangular positions and times are related by Lorentz transformations $x^{\mu'} = \Lambda^{\mu'}{}_\nu x^\nu$. They look different for observers who do not move at constant velocity relative to inertial observers. This section computes an example of this. In the process it shows that Newton's first law of motion in a non-inertial frame can still be regarded as stating that particles move along geodesics in the absence of external forces.
To see how this works, consider an observer experiencing uniform circular motion in the $x$-$y$ plane. This observer uses coordinates $x^\mu = \{t, x, y, z\}$, in terms of which an inertial observer's coordinates, $x^{\mu'} = \{t, x', y', z\}$, can be written
$$
x' = x\cos(\Omega t) - y\sin(\Omega t) \qquad\text{and}\qquad y' = x\sin(\Omega t) + y\cos(\Omega t) . \tag{2.34}
$$

$\Omega$ here represents the angular velocity of the uniform circular motion. Using this relation we have
$$
\begin{aligned}
dx' &= dx\cos(\Omega t) - dy\sin(\Omega t) - \big[x\sin(\Omega t) + y\cos(\Omega t)\big]\,\Omega\,dt \\
dy' &= dx\sin(\Omega t) + dy\cos(\Omega t) + \big[x\cos(\Omega t) - y\sin(\Omega t)\big]\,\Omega\,dt ,
\end{aligned} \tag{2.35}
$$

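so the Lorentz-invariant element of distance becomes
$$
\begin{aligned}
ds^2 &= -dt^2 + dx'^2 + dy'^2 + dz^2 \\
&= \big[-1 + (x^2 + y^2)\,\Omega^2\big]\,dt^2 + 2\,(x\,dy - y\,dx)\,\Omega\,dt + dx^2 + dy^2 + dz^2 ,
\end{aligned} \tag{2.36}
$$
corresponding to the metric
$$
\begin{pmatrix} g_{tt} & g_{tx} & g_{ty} & g_{tz} \\ g_{xt} & g_{xx} & g_{xy} & g_{xz} \\ g_{yt} & g_{yx} & g_{yy} & g_{yz} \\ g_{zt} & g_{zx} & g_{zy} & g_{zz} \end{pmatrix}
= \begin{pmatrix} -1 + r^2\Omega^2 & -y\Omega & x\Omega & 0 \\ -y\Omega & 1 & 0 & 0 \\ x\Omega & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \tag{2.37}
$$
where $r^2 = x^2 + y^2$.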
For this non-inertial observer, a particle whose position is fixed in space defines a world line along which only $t$ varies, so $dx = dy = dz = 0$. From the point of view of the inertial observer, this corresponds to a particle executing uniform circular motion. Proper time along such a trajectory, as measured with the non-inertial observer's metric, is $d\tau^2 = -ds^2 = -g_{tt}\,dt^2 = (1 - r^2\Omega^2)\,dt^2$. This agrees with the inertial observer's result, given that the inertial observer attributes a speed $v = r\Omega$ to the uniform circular motion.
In the absence of forces the inertial observer would say that particle trajectories are straight lines: $x^{\mu'} = x^{\mu'}_0 + u^{\mu'}\tau$ for constant $x^{\mu'}_0$ and $u^{\mu'}$, or equivalently $d^2x^{\mu'}/d\tau^2 = 0$. These same trajectories take a different form for the non-inertial observer, since they do not satisfy $x^\mu = x^\mu_0 + u^\mu\tau$ or $d^2x^\mu/d\tau^2 = 0$.
But recall that $d^2x^{\mu'}/d\tau^2 = 0$ is the equation for a geodesic for the metric $g_{\mu'\nu'} = \eta_{\mu'\nu'}$, and that the condition for a geodesic can be written for a general metric as
$$
\frac{d^2x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}\,\frac{dx^\nu}{d\tau}\,\frac{dx^\lambda}{d\tau} = 0 , \tag{2.38}
$$
which is a form that is equally valid in any coordinate system. We should therefore expect that this equation describes motion in the absence of forces as seen by our non-inertial, uniformly rotating observer. But then what is the significance of the Christoffel symbols, $\Gamma^\mu_{\nu\lambda}$, in the non-inertial frame?
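To find out we compute the nonzero components of $\Gamma^\mu_{\nu\lambda}$, recalling the definition of the Christoffel symbols
$$
\Gamma^\mu_{\nu\lambda} = \frac12\,g^{\mu\rho}\Big(\partial_\nu g_{\lambda\rho} + \partial_\lambda g_{\nu\rho} - \partial_\rho g_{\nu\lambda}\Big) . \tag{2.39}
$$
For the metric of interest the only nonzero metric derivatives are $\partial_x g_{tt} = 2x\Omega^2$, $\partial_y g_{tt} = 2y\Omega^2$, $\partial_y g_{tx} = \partial_y g_{xt} = -\Omega$, and $\partial_x g_{ty} = \partial_x g_{yt} = \Omega$. The inverse metric is
$$
\begin{pmatrix} g^{tt} & g^{tx} & g^{ty} & g^{tz} \\ g^{xt} & g^{xx} & g^{xy} & g^{xz} \\ g^{yt} & g^{yx} & g^{yy} & g^{yz} \\ g^{zt} & g^{zx} & g^{zy} & g^{zz} \end{pmatrix}
= \begin{pmatrix} -1 & -y\Omega & x\Omega & 0 \\ -y\Omega & 1 - y^2\Omega^2 & xy\,\Omega^2 & 0 \\ x\Omega & xy\,\Omega^2 & 1 - x^2\Omega^2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} , \tag{2.40}
$$
and so the nonzero Christoffel symbols (of the second kind) turn out to be
$$
\Gamma^x_{tt} = -x\,\Omega^2 , \quad \Gamma^y_{tt} = -y\,\Omega^2 , \quad \Gamma^x_{yt} = \Gamma^x_{ty} = -\Omega \quad\text{and}\quad \Gamma^y_{xt} = \Gamma^y_{tx} = \Omega . \tag{2.41}
$$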
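The components in eq. (2.41) can be reproduced numerically from eqs. (2.37), (2.39) and (2.40) alone. The sketch below does this and also checks that eq. (2.40) really inverts eq. (2.37). It uses central finite differences for the metric derivatives, which is an implementation choice for illustration only. The angular velocity and sample point are arbitrary illustrative values, chosen with $r\Omega < 1$.

```python
OMEGA = 0.3          # illustrative angular velocity (c = 1)
T, X, Y, Z = range(4)

def metric(P):
    """g_{mu nu} of eq. (2.37) at the point P = (t, x, y, z)."""
    _, x, y, _ = P
    W = OMEGA
    return [[-1 + (x * x + y * y) * W * W, -y * W, x * W, 0.0],
            [-y * W, 1.0, 0.0, 0.0],
            [x * W, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def inverse_metric(P):
    """g^{mu nu} of eq. (2.40)."""
    _, x, y, _ = P
    W = OMEGA
    return [[-1.0, -y * W, x * W, 0.0],
            [-y * W, 1 - y * y * W * W, x * y * W * W, 0.0],
            [x * W, x * y * W * W, 1 - x * x * W * W, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def metric_derivatives(P, h=1e-5):
    """dg[a][m][n] = partial_a g_{mn}, by central differences."""
    dg = []
    for a in range(4):
        Pp, Pm = list(P), list(P)
        Pp[a] += h
        Pm[a] -= h
        gp, gm = metric(Pp), metric(Pm)
        dg.append([[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(4)] for m in range(4)])
    return dg

def christoffel(P):
    """Gam[mu][nu][lam] = Gamma^mu_{nu lam}, from eq. (2.39)."""
    gi, dg = inverse_metric(P), metric_derivatives(P)
    return [[[0.5 * sum(gi[mu][r] * (dg[nu][lam][r] + dg[lam][nu][r] - dg[r][nu][lam])
                        for r in range(4))
              for lam in range(4)] for nu in range(4)] for mu in range(4)]

P = (0.0, 0.7, -1.1, 0.4)
g, gi, Gam = metric(P), inverse_metric(P), christoffel(P)

# Expected nonzero components, eq. (2.41); every other component should vanish.
expected = {(X, T, T): -P[X] * OMEGA**2, (Y, T, T): -P[Y] * OMEGA**2,
            (X, Y, T): -OMEGA, (X, T, Y): -OMEGA, (Y, X, T): OMEGA, (Y, T, X): OMEGA}
worst = max(abs(Gam[m][n][l] - expected.get((m, n, l), 0.0))
            for m in range(4) for n in range(4) for l in range(4))
identity_err = max(abs(sum(g[i][k] * gi[k][j] for k in range(4)) - (i == j))
                   for i in range(4) for j in range(4))
print(worst, identity_err)    # both should be tiny
```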

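With these expressions the equations for a geodesic become
$$
\frac{d^2t}{d\tau^2} = \frac{d^2z}{d\tau^2} = 0 \tag{2.42}
$$
and
$$
\begin{aligned}
\frac{d^2x}{d\tau^2} - x\,\Omega^2\left(\frac{dt}{d\tau}\right)^2 - 2\Omega\,\frac{dt}{d\tau}\,\frac{dy}{d\tau} &= 0 \\
\frac{d^2y}{d\tau^2} - y\,\Omega^2\left(\frac{dt}{d\tau}\right)^2 + 2\Omega\,\frac{dt}{d\tau}\,\frac{dx}{d\tau} &= 0 .
\end{aligned} \tag{2.43}
$$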
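Eqs. (2.42) may be integrated to give
$$
z = z_0 + \gamma\,v_z\,\tau = z_0 + v_z\,t \qquad\text{and}\qquad t = \gamma\,\tau , \tag{2.44}
$$
where $z_0$, $v_z$ and $\gamma$ are constants. Using these in eqs. (2.43), and changing variables using $d/d\tau = \gamma\,(d/dt)$, then gives
$$
\begin{aligned}
\frac{d^2x}{dt^2} - x\,\Omega^2 - 2\Omega\,\frac{dy}{dt} &= 0 \\
\frac{d^2y}{dt^2} - y\,\Omega^2 + 2\Omega\,\frac{dx}{dt} &= 0 .
\end{aligned} \tag{2.45}
$$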
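Defining the angular velocity vector by $\mathbf{w} := \Omega\,\mathbf{e}_z$, these equations can be written in vector form as
$$
\frac{d^2\mathbf{r}}{dt^2} + \mathbf{w}\times(\mathbf{w}\times\mathbf{r}) + 2\,\mathbf{w}\times\mathbf{v} = 0 , \tag{2.46}
$$
where $\mathbf{r} := x\,\mathbf{e}_x + y\,\mathbf{e}_y + z\,\mathbf{e}_z$ and $\mathbf{v} := d\mathbf{r}/dt$.
Eqs. (2.45) have as their solutions
$$
x = x'(t)\cos(\Omega t) + y'(t)\sin(\Omega t) \qquad\text{and}\qquad y = -x'(t)\sin(\Omega t) + y'(t)\cos(\Omega t) , \tag{2.47}
$$
where $x'(t) = x'_0 + v_x t$ and $y'(t) = y'_0 + v_y t$. The condition $g_{\mu\nu}(dx^\mu/d\tau)(dx^\nu/d\tau) = -1$ then implies (as usual) $\gamma = (1 - v^2)^{-1/2}$, where $v^2 = v_x^2 + v_y^2 + v_z^2$.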
We see that the Christoffel symbols provide precisely the 'fictitious forces' required to ensure that the geodesics are straight lines when expressed in the non-inertial coordinates. Experience with classical physics allows these fictitious forces to be recognized as old friends. The $\mathbf{w}\times(\mathbf{w}\times\mathbf{r})$ term of eq. (2.46) represents the centrifugal force, and the velocity-dependent $\mathbf{w}\times\mathbf{v}$ term gives the Coriolis force associated with a rotating reference frame. The fact that $\Gamma^\mu_{\nu\lambda}$ do not transform as the components of a tensor is consistent with the fact that these fictitious forces can vanish in some frames (e.g. inertial ones) even if they do not in others.
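A numerical illustration of this point is sketched below. It integrates the rotating-frame equations (2.45) with a classical Runge-Kutta scheme and compares the result with eq. (2.47), which is an inertial straight line written in rotating coordinates. The integrator and all numerical values are illustrative choices. The initial rotating-frame velocity is obtained by differentiating eq. (2.47) at $t = 0$. The printed differences should be tiny.

```python
import math

OMEGA = 0.5                                   # illustrative angular velocity
x0p, y0p, vx, vy = 1.0, -0.5, 0.3, 0.2        # inertial straight line x'(t), y'(t)

def exact_rotating(t):
    """Eq. (2.47): the inertial straight line expressed in rotating coordinates."""
    xp, yp = x0p + vx * t, y0p + vy * t
    c, s = math.cos(OMEGA * t), math.sin(OMEGA * t)
    return xp * c + yp * s, -xp * s + yp * c

def rhs(state):
    """Eqs. (2.45) as a first-order system in (x, y, dx/dt, dy/dt)."""
    x, y, ux, uy = state
    return [ux, uy, x * OMEGA**2 + 2 * OMEGA * uy, y * OMEGA**2 - 2 * OMEGA * ux]

def rk4(state, dt, steps):
    """Classical fourth-order Runge-Kutta integration."""
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + 0.5 * dt * k for s, k in zip(state, k1)])
        k3 = rhs([s + 0.5 * dt * k for s, k in zip(state, k2)])
        k4 = rhs([s + dt * k for s, k in zip(state, k3)])
        state = [s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state

# Initial rotating-frame velocity follows from differentiating eq. (2.47) at t = 0.
start = [x0p, y0p, vx + OMEGA * y0p, vy - OMEGA * x0p]
t_final, steps = 4.0, 4000
x_num, y_num, _, _ = rk4(start, t_final / steps, steps)
x_ex, y_ex = exact_rotating(t_final)
print(x_num - x_ex, y_num - y_ex)   # both tiny
```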

Exercise 18: Show that distances measured by non-inertial observers with coordinates $x^\mu = \{\xi, \chi, y, z\}$ defined by $t = \chi\sinh(a\xi)$ and $x = \chi\cosh(a\xi)$ (with $\chi > 0$) are given by the Rindler metric
$$
ds^2 = -a^2\chi^2\,d\xi^2 + d\chi^2 + dy^2 + dz^2 . \tag{2.48}
$$
Show that observers whose world-lines are the curves along which only $\xi$ varies undergo constant proper acceleration with invariant magnitude $\eta_{\mu\nu}(du^\mu/d\tau)(du^\nu/d\tau) = 1/\chi^2$. Show that the only nonzero Christoffel symbols for this metric are $\Gamma^\chi_{\xi\xi} = a^2\chi$ and $\Gamma^\xi_{\chi\xi} = \Gamma^\xi_{\xi\chi} = 1/\chi$, and so show that geodesics satisfy the equations $d^2y/d\tau^2 = d^2z/d\tau^2 = 0$ and
$$
\frac{d^2\xi}{d\tau^2} + \frac{2}{\chi}\,\frac{d\chi}{d\tau}\,\frac{d\xi}{d\tau} = 0 \qquad\text{and}\qquad \frac{d^2\chi}{d\tau^2} + a^2\chi\left(\frac{d\xi}{d\tau}\right)^2 = 0 . \tag{2.49}
$$
Use these, and the identity $d\chi/d\xi = \dot\chi/\dot\xi$ (where over-dots denote $d/d\tau$), to show that if $\xi$ parameterizes the geodesics, then $\chi(\xi)$ satisfies
$$
\frac{d^2\chi}{d\xi^2} = \frac{\ddot\chi}{\dot\xi^2} - \frac{\dot\chi\,\ddot\xi}{\dot\xi^3} = -a^2\chi + \frac{2}{\chi}\left(\frac{d\chi}{d\xi}\right)^2 , \tag{2.50}
$$
revealing the fictitious forces required to describe inertial motion in this accelerated frame. Show that the curves $\chi(\xi) = \ell\,e^{\pm a\xi}$ solve this equation.

2.4 Conserved Quantities


A special role is played in physics by conserved quantities like electric charge, energy and momentum, since these all act as sources for known forces of nature. As we shall see, energy and momentum are sources for gravity in much the same way as electric charges and currents source electromagnetism. Later sections will need to state how energy and momentum act as sources for gravity. To motivate how energy and momentum densities are formulated within a relativistic theory, it is convenient first to recall how other conserved quantities, like electric charge density, are formulated.

Electric Current
Suppose an observer sees a nonzero density of electric charge, $\sigma(\mathbf{x},t)$. Then anyone else who moves relative to this observer must see a nonzero electric current density, $\mathbf{j}(\mathbf{x},t)$. They also see a different charge density, because of the Lorentz contraction of space in the direction of motion and because of the change in the relative motion of the moving charges. It follows that $\sigma$ and $\mathbf{j}$ must transform into one another under Lorentz transformations, and it turns out that they transform as a 4-vector with components
$$
j^\mu = \begin{pmatrix} j^0 = \sigma \\ j^i \end{pmatrix} , \tag{2.51}
$$
where $j^i$ represent the 3 spatial components of the current density vector, $\mathbf{j}$. Being a 4-vector means that it transforms under a Lorentz transformation as
$$
j^{\mu'} = \Lambda^{\mu'}{}_\nu\,j^\nu , \tag{2.52}
$$

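and so in the specific case of a boost between inertial observers moving at relative speed $v$, c.f. eqs. (2.8) and (2.19), this becomes (restoring factors of $c$)
$$
\sigma' = j^{0'} = \frac{\sigma + \mathbf{v}\cdot\mathbf{j}/c^2}{\sqrt{1 - v^2/c^2}} , \qquad \mathbf{j}'_\parallel = \frac{\mathbf{j}_\parallel + \mathbf{v}\,\sigma}{\sqrt{1 - v^2/c^2}} , \qquad \mathbf{j}'_\perp = \mathbf{j}_\perp , \tag{2.53}
$$
where $\mathbf{j}_\parallel$ and $\mathbf{j}_\perp$ denote the components of $\mathbf{j}$ parallel and perpendicular to $\mathbf{v}$.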
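The sketch below applies the boost to $(\sigma, \mathbf{j})$ and checks the frame-independent combination $\sigma^2 - |\mathbf{j}|^2 = -\eta_{\mu\nu}j^\mu j^\nu$. It uses $c = 1$, a boost along $x$ with the sign convention of eq. (2.53), and the standard property that current components transverse to the boost are unchanged. The function names and numbers are illustrative.

```python
import math

def boost_current(sigma, j, beta):
    """Boost j^mu = (sigma, j) along x with speed beta, sign convention of eq. (2.53), c = 1."""
    g = 1.0 / math.sqrt(1.0 - beta * beta)
    jx, jy, jz = j
    sigma_new = g * (sigma + beta * jx)
    jx_new = g * (jx + beta * sigma)
    return sigma_new, (jx_new, jy, jz)   # components transverse to the boost are unchanged

def invariant(sigma, j):
    """-eta_{mu nu} j^mu j^nu = sigma^2 - |j|^2, the same in every inertial frame."""
    return sigma**2 - sum(c * c for c in j)

# A static charge density seen from a frame moving along x picks up a current.
s1, j1 = boost_current(2.0, (0.0, 0.0, 0.0), 0.6)
print(s1, j1)                                   # about 2.5 and (1.5, 0, 0)

# A general current: the invariant is preserved, and boosting back undoes the boost.
sigma, j = 1.2, (0.3, -0.7, 0.4)
s2, j2 = boost_current(sigma, j, 0.45)
print(invariant(sigma, j), invariant(s2, j2))
print(boost_current(s2, j2, -0.45))             # recovers (sigma, j) up to rounding
```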

Conservation of electric charge may be expressed in terms of this 4-vector in a manifestly Lorentz-invariant way, as
$$
\partial_\mu j^\mu = \frac{\partial j^0}{\partial t} + \nabla\cdot\mathbf{j} = 0 . \tag{2.54}
$$
Since the left-hand side is a scalar, if any observer finds it to vanish, then all inertial observers must also find it to vanish. That this equation expresses local charge conservation may be seen by integrating it over a volume $V$ having boundary $\partial V$, and using Gauss' theorem:
$$
0 = \int_V \left(\frac{\partial j^0}{\partial t} + \nabla\cdot\mathbf{j}\right) d^3x = \frac{d}{dt}\int_V \sigma\,d^3x + \oint_{\partial V} \mathbf{n}\cdot\mathbf{j}\;d^2S , \tag{2.55}
$$
where $d^2S$ denotes an infinitesimal area element of the surface, whose outward-pointing normal vector is $\mathbf{n}$. Written this way it is clear that charge is conserved: the rate of change of the total charge in any volume $V$ equals minus the net flux of charge carried by the current outward through the boundary of $V$.

Electromagnetism
Since charges and currents are sources for electric, $\mathbf{E}$, and magnetic, $\mathbf{B}$, fields, these must similarly transform into one another under Lorentz transformations. It turns out that these six quantities transform as the components of an antisymmetric tensor, $F_{\mu\nu} = -F_{\nu\mu}$, according to
$$
\begin{pmatrix} F_{00} & F_{01} & F_{02} & F_{03} \\ F_{10} & F_{11} & F_{12} & F_{13} \\ F_{20} & F_{21} & F_{22} & F_{23} \\ F_{30} & F_{31} & F_{32} & F_{33} \end{pmatrix}
= \begin{pmatrix} 0 & -E_x & -E_y & -E_z \\ E_x & 0 & B_z & -B_y \\ E_y & -B_z & 0 & B_x \\ E_z & B_y & -B_x & 0 \end{pmatrix} , \tag{2.56}
$$
which labels the inertial coordinates in the usual way, $x^\mu = \{x^0, x^1, x^2, x^3\} = \{t, x, y, z\}$.

Exercise 19: Use the transformation properties under Lorentz transformations of a covariant tensor of rank 2 to compute how the components of electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, are related for observers who move relative to one another with constant speed $v$ along the $x$-axis.

There are two types of fundamental laws in electromagnetism. One of these expresses the forces felt by charges in the presence of electric and magnetic fields. It states that a point charge of magnitude $q$ moving with velocity $\mathbf{v}$ experiences the Lorentz force
$$
\mathbf{F} = q\,\big(\mathbf{E} + \mathbf{v}\times\mathbf{B}\big) . \tag{2.57}
$$
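The second type of law in electromagnetism relates the properties of the electric and magnetic fields to the distribution of charges and currents that source them. These may be summarized as Maxwell's equations:
$$
\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t} = 0 , \qquad \nabla\cdot\mathbf{B} = 0 \tag{2.58}
$$
and
$$
\nabla\times\mathbf{B} - \frac{\partial\mathbf{E}}{\partial t} = \mathbf{j} , \qquad \nabla\cdot\mathbf{E} = \sigma . \tag{2.59}
$$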
Since all inertial observers must agree on the laws of electromagnetism, it should be possible to formulate these in terms of Lorentz tensors like $F_{\mu\nu}$ and $j^\mu$. Indeed, the two source-free Maxwell equations, eqs. (2.58), can be written as the combined tensor equation
$$
\partial_\mu F_{\nu\lambda} + \partial_\nu F_{\lambda\mu} + \partial_\lambda F_{\mu\nu} = 0 , \tag{2.60}
$$
and the two Maxwell equations with sources, eqs. (2.59), can similarly be written
$$
\partial_\nu F^{\mu\nu} = j^\mu . \tag{2.61}
$$
Notice that the antisymmetry $F^{\mu\nu} = -F^{\nu\mu}$ implies that $\partial_\mu\partial_\nu F^{\mu\nu}$ vanishes identically. This shows that eq. (2.61) would be inconsistent if charge were not conserved, that is, if $\partial_\mu j^\mu \ne 0$.
The Lorentz force, eq. (2.57), can also be grouped into a force 4-vector,
$$
F_\mu = q\,F_{\mu\nu}\,u^\nu , \tag{2.62}
$$
where $u^\nu$ denotes the 4-velocity of the point charge.
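The following sketch illustrates how eq. (2.62) repackages eq. (2.57). It builds $F_{\mu\nu}$ from eq. (2.56), contracts it with $u^\nu = \gamma(1, \mathbf{v})$, and compares the spatial part, divided by $\gamma$, with $q(\mathbf{E} + \mathbf{v}\times\mathbf{B})$. The time component comes out as $-\gamma q\,\mathbf{E}\cdot\mathbf{v}$, which is related to the power delivered to the charge. The sketch uses $c = 1$; field values, charge and velocity are arbitrary illustrative numbers.

```python
import math

def field_tensor(E, B):
    """F_{mu nu} of eq. (2.56), index order (t, x, y, z)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, -Ex, -Ey, -Ez],
            [Ex, 0.0, Bz, -By],
            [Ey, -Bz, 0.0, Bx],
            [Ez, By, -Bx, 0.0]]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

q, E, B, v = 1.3, (0.2, -0.5, 0.1), (0.4, 0.3, -0.7), (0.1, 0.5, -0.2)  # illustrative, c = 1
gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
u = [gamma] + [gamma * c for c in v]                                     # 4-velocity u^nu
F = field_tensor(E, B)
F_mu = [q * sum(F[m][n] * u[n] for n in range(4)) for m in range(4)]     # eq. (2.62)

lorentz = [q * (e + w) for e, w in zip(E, cross(v, B))]                  # eq. (2.57)
print([F_mu[i + 1] / gamma for i in range(3)], lorentz)                  # spatial parts agree
print(F_mu[0] / gamma, -q * sum(e * c for e, c in zip(E, v)))            # F_0 = -gamma q E.v
```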


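Finally, the source-free Maxwell equations, eqs. (2.58), are often solved by writing the fields $\mathbf{E}$ and $\mathbf{B}$ in terms of an electric and magnetic potential, $\Phi$ and $\mathbf{A}$, with
$$
\mathbf{B} = \nabla\times\mathbf{A} \qquad\text{and}\qquad \mathbf{E} = -\nabla\Phi - \frac{\partial\mathbf{A}}{\partial t} . \tag{2.63}
$$
These two equations can be grouped into the single tensor equation
$$
F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu , \tag{2.64}
$$
with the gauge potential 4-vector defined by $A^\mu = \{A^0, A^i\} = \{\Phi, A^i\}$ (so that $A_0 = -\Phi$).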

Exercise 20: Verify that eqs. (2.57), (2.58), (2.59) and (2.63) follow
from eqs. (2.62), (2.60), (2.61) and (2.64), together with the definitions
of Fµν , Aµ and j µ .

Stress Energy
As the example of electric charge shows, we expect to be able to associate a current 4-vector with each conserved quantity. Since energy is conserved, we might therefore naively expect the energy density, $\rho$, to be combined under Lorentz transformations with an energy flux, $\mathbf{s}$, into a 4-vector $s^\mu = \{s^0, s^i\} = \{\rho, s^i\}$. This expectation is naive because the total energy, $E = \int\rho\,d^3x$, is not itself Lorentz invariant, unlike the total electric charge, $Q = \int\sigma\,d^3x$. Instead it combines with linear momentum, $\mathbf{p}$, into the energy-momentum 4-vector, $p^\mu = \{E, p^i\}$.
The proper statement is instead that the energy density, $\rho$, energy flux, $s^j$, momentum density, $\pi^i$, and momentum flux (or stress), $t^{ij}$, all combine under Lorentz transformations into a Stress-Energy tensor, $T^{\mu\nu}$, where
$$
T^{\mu\nu} = \begin{pmatrix} \rho & s^j \\ \pi^i & t^{ij} \end{pmatrix} . \tag{2.65}
$$
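In terms of this tensor, energy and momentum conservation are expressed by the condition
$$
\partial_\nu T^{\mu\nu} = 0 . \tag{2.66}
$$
This states that the rate of change of the total energy and momentum within any volume $V$ equals minus the net outward flux of energy and momentum through the boundary of $V$:
$$
\partial_\nu T^{0\nu} = 0 \;\Rightarrow\; \frac{dE}{dt} = \int_V \frac{\partial\rho}{\partial t}\,d^3V = -\oint_{\partial V} s^j n_j\,d^2S \tag{2.67}
$$
$$
\partial_\nu T^{i\nu} = 0 \;\Rightarrow\; \frac{dP^i}{dt} = \int_V \frac{\partial\pi^i}{\partial t}\,d^3V = -\oint_{\partial V} t^{ij} n_j\,d^2S . \tag{2.68}
$$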
Furthermore, because of the equivalence between mass and energy in relativity, there is no difference between an energy flux, $s^i$, and a momentum density, $\pi^i$. Energy on the move (i.e. a flux of energy) carries momentum, and so is equivalent to a density of momentum. Since the internal stress tensor can also always be chosen to be symmetric, $t^{ij} = t^{ji}$, the total stress-energy tensor can be taken to be symmetric: $T^{\mu\nu} = T^{\nu\mu}$.

Examples
At this point it is useful to have explicit forms for the conserved current and stress energy for some simple systems.
Massive point particles:
The simplest system for which the stress energy can be explicitly written down is a point particle. A point particle is completely characterized by its world line, $x^\mu(\tau)$, together with the values of any conserved physical quantities it might have, such as its rest-mass, $m$, or its electric charge, $q$.
The contribution of a massive charged particle to the conserved current is easiest to evaluate in its rest frame. There it is motionless and so contributes no current at all, $\mathbf{j} = 0$, and its contribution to the charge density is
$$
j^0(\mathbf{r}, t) = q\,\delta^3\big(\mathbf{r} - \mathbf{y}(t)\big) \qquad \text{(rest frame)} , \tag{2.69}
$$
where $\mathbf{y}(t)$ is the particle's spatial trajectory. Here $\delta^3(\mathbf{r}) = \delta(x)\,\delta(y)\,\delta(z)$ denotes the 3-dimensional Dirac delta function. It can be regarded as the limiting case, as $\lambda \to 0$, of the function $(C/\lambda^3)\,e^{-r^2/\lambda^2}$, with the constant $C$ chosen to ensure that $\int d^3x\,\delta^3(\mathbf{r}) = 1$ (which requires $C = \pi^{-3/2}$). The result is a quantity that is infinitely peaked about zero argument, but with a normalization that diverges in such a way as to keep the area under the curve constant. It has the property that
$$
\int d^3x\, f(\mathbf{r})\,\delta^3(\mathbf{r} - \mathbf{y}) = f(\mathbf{y}) , \tag{2.70}
$$
for arbitrary smooth functions, $f$.
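Because $e^{-r^2/\lambda^2} = e^{-x^2/\lambda^2}e^{-y^2/\lambda^2}e^{-z^2/\lambda^2}$ factorizes, the limiting Gaussian can be checked one Cartesian factor at a time. The sketch below verifies that each factor, normalized with $C = \pi^{-3/2}$, has unit area, and that it sifts out $f(y_0)$ for an illustrative test function $f = \cos$. For this Gaussian the exact sifted value is $\cos(y_0)\,e^{-\lambda^2/4}$, which tends to $\cos(y_0)$ as $\lambda \to 0$. The trapezoid rule, the integration window and the helper names are illustrative assumptions.

```python
import math

def delta_factor(x, lam):
    """One Cartesian factor of (C/lam^3) exp(-r^2/lam^2) with C = pi**-1.5."""
    return math.exp(-(x / lam) ** 2) / (math.sqrt(math.pi) * lam)

def trapezoid(f, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))

y0 = 0.3   # location of the spike
results = []
for lam in (0.5, 0.1, 0.02):
    norm = trapezoid(lambda x: delta_factor(x, lam), -5.0, 5.0)
    sift = trapezoid(lambda x: math.cos(x) * delta_factor(x - y0, lam), -5.0, 5.0)
    results.append((lam, norm, sift))
    print(lam, norm, sift, math.cos(y0))   # norm -> 1 and sift -> cos(y0) as lam -> 0
```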


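The result for $j^\mu$ in a general frame is then found by identifying a 4-vector that agrees with this result in the rest frame. Any such 4-vector must be unique, since if two 4-vectors agree in one frame they must agree in them all. Some care is needed here, because $\delta^3$ is not by itself Lorentz invariant, whereas $d\tau$ and the 4-dimensional delta function $\delta^4(x) = \delta(t)\,\delta^3(\mathbf{x})$ are. The result is
$$
j^\mu(x) = q\int d\tau\;u^\mu(\tau)\,\delta^4\big(x - y(\tau)\big) = \frac{q\,u^\mu}{\gamma}\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) , \tag{2.71}
$$
where $x^\mu = y^\mu(\tau)$ gives the components of the particle's world-line. Its 4-velocity, $u^\mu = dy^\mu/d\tau$, satisfies (as usual) $u_\mu u^\mu = -1$, and the second equality performs the $\tau$ integral using $dy^0/d\tau = \gamma$. The delta function therefore gives zero contribution except on the particle's world line. Using the components $u^\mu = \gamma(1, \mathbf{v})$, with $\gamma = (1 - v^2)^{-1/2}$, this gives
$$
j^0 = q\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) \qquad\text{and}\qquad \mathbf{j} = q\,\mathbf{v}\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) , \tag{2.72}
$$
so that the total charge, $\int d^3x\,j^0 = q$, is the same in every frame.

The stress energy for such a particle is found using the same arguments. In the rest frame there is no internal stress or energy flow, so the only nonzero component is the energy density,
$$
T^{00} = m\,\delta^3\big(\mathbf{r} - \mathbf{y}(t)\big) \qquad \text{(rest frame)} . \tag{2.73}
$$
The unique result for the tensor $T^{\mu\nu}$ in a general reference frame is then given by
$$
T^{\mu\nu}(x) = m\int d\tau\;u^\mu u^\nu\,\delta^4\big(x - y(\tau)\big) = \frac{m\,u^\mu u^\nu}{\gamma}\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) , \tag{2.74}
$$
which has components
$$
\begin{aligned}
T^{00} &= m\gamma\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) \\
T^{0i} &= m\gamma\,v^i\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) \\
\text{and}\quad T^{ij} &= m\gamma\,v^i v^j\,\delta^3\big(\mathbf{x} - \mathbf{y}(t)\big) ,
\end{aligned} \tag{2.75}
$$
so that integrating over space reproduces the particle's energy, $m\gamma$, and momentum, $m\gamma\mathbf{v}$, of eq. (2.22).

Dust

A more common energy source for gravitational problems is a macroscopic collection of a great many individual particles ($N$, say). If these particles do not interact with one another, their total current and stress energy is just the sum of the contributions of each, summed over all particles present:
$$
j^\mu = \sum_{k=1}^N \frac{q_k\,u_k^\mu}{\gamma_k}\,\delta^3\big(\mathbf{x} - \mathbf{x}_k(t)\big) \qquad\text{and}\qquad T^{\mu\nu} = \sum_{k=1}^N \frac{m_k\,u_k^\mu u_k^\nu}{\gamma_k}\,\delta^3\big(\mathbf{x} - \mathbf{x}_k(t)\big) . \tag{2.76}
$$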

Most commonly, when the gravitational properties of such a system are of interest, it is over distances, $L$, that are much larger than the typical inter-particle spacing, $a$: $L \gg a$. Examples where this will prove to be true include the gravitational field within a star or a cloud of interstellar gas, for which the particles are gas molecules or atoms. Another is the overall shape of the universe as a whole, for which the particles might be entire galaxies.
In this case only the average properties of the distribution of particles are relevant, and it is unnecessary to carry around information about the position of each separate particle. This can be made precise by identifying a region whose size, $d$, is much larger than the typical inter-particle distance scale, yet still much smaller than the scale, $L$, of gravitational interest: $L \gg d \gg a$. When such a region exists, because $d \gg a$ it contains a large number of particles. The statistical fluctuations of its energy and charge about their mean values (due to the exchange of individual particles with the surrounding regions, say) are therefore very small. At the same time, because $d \ll L$, these mean properties can be well approximated as constant over each such region, although they can vary slowly from region to region.
In this case we can define the average rest frame for a given such region, $R$, in terms of the region's average 4-velocity
$$
U^\mu(x) = C\sum_{k=1}^N u_k^\mu , \tag{2.77}
$$
where $C$ is chosen to ensure that $U^\mu$ is normalized, $U_\mu U^\mu = -1$, and $N$ denotes the number of particles in $R$. The $x$-dependence of $U^\mu$ emphasizes that the precise average rest frame can vary slowly from region to region. The average rest frame for $R$ is the frame in which the spatial components vanish: $U^i = \gamma v^i = 0$.
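Eq. (2.77) is easy to evaluate for a sample of particle velocities. A sum of future-pointing time-like 4-vectors is itself future-pointing and time-like, so the normalization constant $C$ always exists. The sketch below assumes $c = 1$. The random velocity distribution (a drift of 0.3 along $x$ plus a uniform spread) is a made-up illustration, not data from the notes. The 3-velocity of the average rest frame is then read off as $U^i/U^0$.

```python
import math
import random

def four_velocity(v):
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [g] + [g * c for c in v]

def average_four_velocity(velocities):
    """U^mu = C * sum_k u_k^mu, eq. (2.77), with C fixed by U_mu U^mu = -1."""
    S = [0.0] * 4
    for v in velocities:
        for i, c in enumerate(four_velocity(v)):
            S[i] += c
    C = 1.0 / math.sqrt(S[0]**2 - S[1]**2 - S[2]**2 - S[3]**2)  # S is future-pointing, time-like
    return [C * s for s in S]

random.seed(1)
drift = (0.3, 0.0, 0.0)   # illustrative bulk motion of the region
vels = [tuple(d + random.uniform(-0.2, 0.2) for d in drift) for _ in range(1000)]
U = average_four_velocity(vels)
print(U, -U[0]**2 + U[1]**2 + U[2]**2 + U[3]**2)   # normalization is -1
print([Ui / U[0] for Ui in U[1:]])                  # 3-velocity of the average rest frame
```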
With this definition, the mean charge density, $\sigma$, and energy density, $\rho$, can be defined for the average rest frame by
$$
j^\mu(x) := \sigma(x)\,U^\mu(x) \qquad\text{and}\qquad T^{\mu\nu}(x) := \rho(x)\,U^\mu(x)\,U^\nu(x) . \tag{2.78}
$$
In the limit where the particles all move non-relativistically these satisfy $\sigma(x) \simeq q\,n(x)$ and $\rho(x) \simeq m\,n(x)$. Here $n(x) = N(x)/V(x)$ is the macroscopic particle density. When integrated over any spatial region of volume $V$ containing $N$ particles, it gives $\int_V d^3x\,n(x) = N$, just as the microscopic density $n_{\rm micro}(x) = \sum_k \delta^3(\mathbf{x} - \mathbf{y}_k(\tau))$ would. This is a less useful description for relativistic particles. For these, the possibility of particle-antiparticle production and annihilation implies that the total number of particles is never strictly conserved.
A fluid made up of noninteracting massive particles of this type is known as
‘dust’, inasmuch as it represents a special case of a more general fluid for which the
pressure and viscosity terms are negligible.
Perfect fluids
A similarly simple but more realistic system for which the stress energy can be explicitly written down is a perfect fluid. This is defined as a system whose average, macroscopic conserved quantities are functions only of the local average fluid 4-velocity, $U^\mu(x)$, and the local metric, $\eta_{\mu\nu}$ (and not also of their derivatives, say).³ Under this assumption the conserved current describing conservation of the fluid's total charge (or particle number) is given by
$$
j^\mu = \sigma\,U^\mu = \begin{pmatrix} \gamma\,\sigma \\ \gamma\,\sigma\,v^i \end{pmatrix} , \tag{2.79}
$$
where $\sigma(x) = -U_\mu j^\mu$ is the local rest-frame charge density. Its properties (like its dependence on temperature or other macroscopic variables) depend on the details of the microscopic properties of the particles involved. Regardless of these details, conservation of the underlying charge requires that $j^\mu(x)$ satisfy the conservation condition $\partial_\mu j^\mu = 0$.
Similarly, the most general symmetric tensor depending only on $\eta_{\mu\nu}$ and $U^\mu(x)$ (but not its derivatives) is
$$
T^{\mu\nu} = (\rho + p)\,U^\mu U^\nu + p\,\eta^{\mu\nu} = \begin{pmatrix} \gamma^2(\rho + p\,v^2) & \gamma^2(\rho + p)\,v^j \\ \gamma^2(\rho + p)\,v^i & \gamma^2(\rho + p)\,v^i v^j + p\,\delta^{ij} \end{pmatrix} . \tag{2.80}
$$
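The explicit components in eq. (2.80) can be checked by building $T^{\mu\nu}$ directly from $(\rho + p)U^\mu U^\nu + p\,\eta^{\mu\nu}$. The sketch below does this for one fluid velocity and then looks at the rest frame, where it should give ${\rm diag}(\rho, p, p, p)$. It also evaluates the frame-independent trace $\eta_{\mu\nu}T^{\mu\nu} = -\rho + 3p$. It assumes $c = 1$, and $\rho$, $p$ and $\mathbf{v}$ are illustrative numbers.

```python
import math

ETA = [[-1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]  # eta^{mu nu}, c = 1

def perfect_fluid_T(rho, p, v):
    """T^{mu nu} = (rho + p) U^mu U^nu + p eta^{mu nu}, eq. (2.80)."""
    g = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    U = [g] + [g * c for c in v]
    return [[(rho + p) * U[m] * U[n] + p * ETA[m][n] for n in range(4)] for m in range(4)]

rho, p, v = 1.0, 0.3, (0.4, -0.2, 0.1)    # illustrative values
T = perfect_fluid_T(rho, p, v)
v2 = sum(c * c for c in v)
g2 = 1.0 / (1.0 - v2)

print(T[0][0], g2 * (rho + p * v2))       # energy density seen in the moving frame
print(T[0][1], g2 * (rho + p) * v[0])     # energy flux equals momentum density
print(perfect_fluid_T(rho, p, (0.0, 0.0, 0.0)))     # rest frame: diag(rho, p, p, p)
trace = sum(ETA[m][m] * T[m][m] for m in range(4))  # eta_{mu nu} T^{mu nu}
print(trace, -rho + 3 * p)                # frame-independent trace
```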

³ Inclusion of a dependence on derivatives into the macroscopic currents is what introduces transport coefficients, like conductivities and viscosities, into the discussion.
The interpretation of the coefficient functions $\rho(x)$ and $p(x)$ is found by going to the rest frame. There $\rho = T^{00}|_{\rm rest}$ is the rest-frame energy density, and similarly $T^{ij}|_{\rm rest} = p\,\delta^{ij}$. Conservation of momentum for a region $V$ within the fluid, eq. (2.68), then reads
$$
\frac{dP^i}{dt} = \int_V \frac{\partial T^{i0}}{\partial t}\,d^3V = -\oint_{\partial V} T^{ij} n_j\,d^2S = -\oint_{\partial V} p\,n^i\,d^2S , \tag{2.81}
$$
which uses $\partial_\nu T^{\mu\nu} = 0$, together with Gauss' (divergence) theorem in the form $\int_V \partial_j T^{ij}\,d^3V = \oint_{\partial V} n_j T^{ij}\,d^2S$. This shows that the surroundings exert an inward-directed force of magnitude $p\,d^2S$ on each surface element, along the line defined by the surface element's normal, $\mathbf{n}$. Consequently $p$ can be interpreted as the fluid's pressure. In general the detailed properties of both $\rho(x)$ and $p(x)$ depend on what kind of particles make up the fluid. They are often related by an equation of state of the form $p = p(\rho, T)$, where $T$ is the fluid's local rest-frame temperature.

3. Weak Gravitational Fields


We are now in a position to begin making the connection between gravitation and the
geometry of spacetime. To this end it is first worth pausing to formulate Newtonian
gravity in an explicitly field-theoretic manner.

3.1 Newtonian Gravity


In a first encounter with Newtonian gravitation, one is normally taught that the gravitational force acting on a point mass $m_1$ situated at a position $\mathbf{r}_1$, due to the presence of another point mass, $m_2$, situated at $\mathbf{r}_2$, is
$$
\mathbf{F}_{12} = \frac{G\,m_1 m_2}{|\mathbf{r}_2 - \mathbf{r}_1|^2}\,\mathbf{e}_{12} , \tag{3.1}
$$
where $\mathbf{e}_{12} = (\mathbf{r}_2 - \mathbf{r}_1)/|\mathbf{r}_2 - \mathbf{r}_1|$ is the unit vector pointing from particle 1 to particle 2, and $G = 6.673(10)\times 10^{-11}$ N m²/kg² is a universal constant known as Newton's constant of gravitation. The force due to a more complicated distribution of masses is then found by summing eq. (3.1) over all of the particles that are present.

The principle of equivalence


Using eq. (3.1) in Newton's 2nd Law of motion gives the acceleration of particle number 1:
$$
\mathbf{a}_1 = \frac{\mathbf{F}_{12}}{m_1} = \frac{G\,m_2}{|\mathbf{r}_2 - \mathbf{r}_1|^2}\,\mathbf{e}_{12} , \tag{3.2}
$$
and similarly for particle 2. This has the remarkable property of being completely independent of the value of $m_1$. This property relies on the assumption that a particle's inertial mass, appearing in Newton's second law, $\mathbf{F} = m\mathbf{a}$, is the same as its gravitational mass, appearing in eq. (3.1). As applied to a constant gravitational field, such as arises to good approximation at the Earth's surface, this implies the well-known fact that all objects near the Earth's surface accelerate towards it with a universal acceleration,
$$
\frac{d^2\mathbf{r}}{dt^2} = \mathbf{g} , \tag{3.3}
$$
with magnitude $g = GM_\oplus/R_\oplus^2 \simeq 9.8$ m/s², regardless of how massive they are.⁴
The best present test of the mass-independence of eq. (3.2) comes from precision measurements of the distance to the Moon. These became possible once laser reflectors were left on its surface by astronauts, beginning in the late 1960s. The measurements show that the difference between the Earth's and the Moon's accelerations towards the Sun, relative to their average, is [1]
$$
\frac{\Delta a}{a} = \frac{a_E - a_M}{\tfrac12\,(a_E + a_M)} = (-1 \pm 1.4)\times 10^{-13} , \tag{3.4}
$$
which is consistent with zero at a precision of about one part in $10^{13}$.


Measurements such as these provide the experimental cornerstone for under-
standing gravity theoretically, since they provide guidance about how to modify
Newton’s theory to be consistent with relativity. In particular, the great accuracy
with which a falling particle’s acceleration is known to be independent of its mass
suggests it be elevated to a principle whose validity is not restricted to its being a
consequence of Newton’s Laws.
The resulting principle is called the Principle of Equivalence. It makes a constant gravitational force, as in eq. (3.3), appear very much like the fictitious centrifugal and Coriolis forces encountered earlier in eq. (2.46), since both produce accelerations that are completely independent of the moving particle's mass. In this sense a constant gravitational force is equivalent to the fictitious constant force felt in a non-inertial frame undergoing constant acceleration. Conversely, in a constant gravitational field the inertial observers are those in a freely falling frame: for them, free particles obey Newton's 2nd Law with no gravitational force. This is graphically experienced by astronauts, who appear to float freely within their spacecraft as they all move in orbit around the Earth.

The gravitational field


A more useful way to think about Newtonian gravity for the purposes of generalizing to relativity is in terms of fields, rather than forces. To this end one defines the gravitational potential, $\Phi(\mathbf{r}, t)$, throughout all space, whose strength is determined by the field equation
$$
\nabla^2\Phi = 4\pi G\,\mu , \tag{3.5}
$$
where $\nabla^2 = \partial_x^2 + \partial_y^2 + \partial_z^2$ and $\mu(\mathbf{r}, t)$ denotes the local density of mass, per unit volume. Once $\Phi(\mathbf{r}, t)$ is determined by solving this equation, the gravitational force acting on any mass, $m$, located at a point, $\mathbf{r}$, is found using the relation
$$
\mathbf{F} = -m\,\nabla\Phi(\mathbf{r}, t) . \tag{3.6}
$$

⁴ ...provided all non-gravitational complications, like air resistance, are negligible.

To see that eqs. (3.5) and (3.6) reproduce eq. (3.1), one first solves eq. (3.5) using $\mu(\mathbf{r}, t) = m_2\,\delta^3(\mathbf{r} - \mathbf{r}_2(t))$ to determine the gravitational potential set up by a point mass, $m_2$, situated at position $\mathbf{r} = \mathbf{r}_2(t)$. The solution that vanishes at spatial infinity is
$$
\Phi(\mathbf{r}, t) = -\frac{G\,m_2}{|\mathbf{r} - \mathbf{r}_2(t)|} , \tag{3.7}
$$
and so applying eq. (3.6) to this for a point mass, $m = m_1$, situated at $\mathbf{r} = \mathbf{r}_1$ then gives eq. (3.1).
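The last step can also be checked numerically. The sketch below takes minus $m_1$ times a central-difference gradient of the potential (3.7) and compares the result with eq. (3.1). The masses and positions are illustrative (a 1 kg test mass near an Earth-mass source at the origin), and $G$ takes the value quoted with eq. (3.1). The two printed force vectors should agree to many significant figures.

```python
import math

G = 6.673e-11   # N m^2/kg^2, as quoted with eq. (3.1)

def potential(r, m2, r2):
    """Eq. (3.7): potential of a point mass m2 at r2, vanishing at infinity."""
    return -G * m2 / math.dist(r, r2)

def force_from_potential(m1, r1, m2, r2, h=1.0):
    """Eq. (3.6), F = -m1 grad Phi, with the gradient taken by central differences (h in metres)."""
    F = []
    for i in range(3):
        rp, rm = list(r1), list(r1)
        rp[i] += h
        rm[i] -= h
        F.append(-m1 * (potential(rp, m2, r2) - potential(rm, m2, r2)) / (2 * h))
    return F

def newton_force(m1, r1, m2, r2):
    """Eq. (3.1): magnitude G m1 m2/|r2 - r1|^2, directed from r1 towards r2."""
    d = math.dist(r1, r2)
    return [G * m1 * m2 / d**2 * (b - a) / d for a, b in zip(r1, r2)]

m1, r1 = 1.0, (7.0e6, 0.0, 1.0e6)   # a 1 kg test mass (illustrative position, metres)
m2, r2 = 5.97e24, (0.0, 0.0, 0.0)   # an Earth-mass source at the origin
print(force_from_potential(m1, r1, m2, r2))
print(newton_force(m1, r1, m2, r2))
```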
Because the Newtonian gravitational force is conservative, it can be derived from a potential energy. For a collection of $N$ otherwise noninteracting particles moving under their mutual gravitation, the total conserved energy in Newtonian physics is (up to an infinite, but position-independent, constant)
$$
E = \frac12\sum_{k=1}^N m_k\Big[v_k^2 + \Phi(\mathbf{r}_k)\Big] , \tag{3.8}
$$
where, as usual, $v_k^2 = \mathbf{v}_k\cdot\mathbf{v}_k$. For the special case of two particles, this may be written
$$
E = \frac{M V^2}{2} + \frac{m_{\rm red}\,v^2}{2} - \frac{G\,m_1 m_2}{|\mathbf{r}|} , \tag{3.9}
$$
where $M = m_1 + m_2$ is the total mass and $V$ is the magnitude of the velocity of the center of mass, $\mathbf{V} = d\mathbf{R}/dt$, with $\mathbf{R} = (m_1\mathbf{r}_1 + m_2\mathbf{r}_2)/M$. The quantity $m_{\rm red} = m_1 m_2/M$ defines the reduced mass, and $\mathbf{r} = \mathbf{r}_1 - \mathbf{r}_2$ is the relative position of the two particles, whose velocity $\mathbf{v} = d\mathbf{r}/dt$ has magnitude $v$. The relative position and center-of-mass position are convenient variables because they evolve separately under the equations of motion
$$
\frac{d^2\mathbf{R}}{dt^2} = 0 \qquad\text{and}\qquad \frac{d^2\mathbf{r}}{dt^2} = -\frac{GM}{|\mathbf{r}|^2}\,\mathbf{e}_r , \tag{3.10}
$$
where $\mathbf{e}_r = \mathbf{r}/|\mathbf{r}|$ is the unit vector parallel to $\mathbf{r}$.
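The following sketch checks that eq. (3.9) is just a rewriting of eq. (3.8) for two bodies. Here $\Phi(\mathbf{r}_1)$ and $\Phi(\mathbf{r}_2)$ are taken to be the potentials due to the other body only, which drops the infinite self-energy constant mentioned above. It assumes illustrative units with $G = 1$ and arbitrary masses, positions and velocities.

```python
import math

G = 1.0   # illustrative units

def sq(w):
    """Squared length of a 3-vector."""
    return sum(c * c for c in w)

def energy_direct(m1, r1, v1, m2, r2, v2):
    """Eq. (3.8) for two bodies, with each Phi coming from the other body only."""
    d = math.dist(r1, r2)
    phi1, phi2 = -G * m2 / d, -G * m1 / d
    return 0.5 * (m1 * (sq(v1) + phi1) + m2 * (sq(v2) + phi2))

def energy_cm(m1, r1, v1, m2, r2, v2):
    """Eq. (3.9): center-of-mass plus relative-motion form."""
    M = m1 + m2
    mred = m1 * m2 / M
    V = [(m1 * a + m2 * b) / M for a, b in zip(v1, v2)]
    v = [a - b for a, b in zip(v1, v2)]
    r = [a - b for a, b in zip(r1, r2)]
    return 0.5 * M * sq(V) + 0.5 * mred * sq(v) - G * m1 * m2 / math.hypot(*r)

args = (3.0, (0.0, 1.0, 0.5), (0.2, -0.1, 0.0), 1.0, (1.5, -0.3, 0.0), (-0.4, 0.3, 0.1))
print(energy_direct(*args), energy_cm(*args))   # the two forms agree
```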
The solutions to these equations describe both bound orbits and unbound scattering solutions. An important property of the bound orbits is that their internal kinetic and potential energies are similar in size. Consequently, the non-relativistic approximation (required for such a Newtonian analysis) is valid if
$$
v^2 \simeq \frac{GM}{r} \ll 1 . \tag{3.11}
$$
Putting back the factors of $c$, the criterion for the validity of a Newtonian description becomes $v^2/c^2 \simeq GM/rc^2 \ll 1$. The size of $GM/Rc^2$ at the surfaces of the Sun and the Earth is listed in Table 1. It shows why a non-relativistic Newtonian approximation works so well for applications in the Solar System.

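| | $M$ (kg) | $R$ (m) | $GM/(Rc^2)$ |
|---|---|---|---|
| Earth (⊕) | $5.97 \times 10^{24}$ | $6.38 \times 10^{6}$ | $6.95 \times 10^{-10}$ |
| Sun (⊙) | $1.99 \times 10^{30}$ | $6.96 \times 10^{8}$ | $2.12 \times 10^{-6}$ |

Table 1: The size of non-Newtonian effects near the surface of the Earth and Sun.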
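The last column of Table 1 can be reproduced from the first two. The sketch below assumes rounded values $G = 6.674 \times 10^{-11}\ \text{m}^3\,\text{kg}^{-1}\,\text{s}^{-2}$ and $c = 2.998 \times 10^8$ m/s; the helper name `compactness` is hypothetical.

```python
G = 6.674e-11   # m^3 kg^-1 s^-2 (rounded; assumption of this sketch)
c = 2.998e8     # m/s (rounded)

bodies = {
    "Earth": (5.97e24, 6.38e6),   # (M in kg, R in m), from Table 1
    "Sun": (1.99e30, 6.96e8),
}


def compactness(mass_kg, radius_m):
    """GM/(R c^2): dimensionless size of relativistic corrections at radius R."""
    return G * mass_kg / (radius_m * c**2)


for name, (M, R) in bodies.items():
    print(f"{name}: GM/(R c^2) = {compactness(M, R):.3g}")
```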

Exercise 21: For a bound (elliptical) orbit of a particle in the gravitational field of a large central mass $M$, use Newton's Law in the form $m\mathbf{a} = -(GMm/r^2)\, \mathbf{e}_r$ (where $\mathbf{e}_r$ is the outward-pointing unit vector in the radial direction) to prove $\langle v^2 \rangle = \langle GM/r \rangle$, where $\langle \cdots \rangle := (1/T) \int_0^T dt\, (\cdots)$ denotes the time-average of a given quantity over one orbit (where $T$ is the orbital period). Use this to compute the ratio of the average kinetic and potential energy of the particle, $K = \frac12 m v^2$ and $U = -GMm/r$, over an orbit, and show that $\langle K \rangle = -\frac12 \langle U \rangle$.
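The result of Exercise 21 can be checked numerically. The sketch below integrates one Newtonian Kepler orbit with the velocity Verlet scheme and forms the time averages directly. The units ($GM = 1$), the initial conditions (start at $r_0 = 1$ with tangential speed $0.8$, so the orbit is bound and elliptical) and the step size are illustrative assumptions.

```python
import math


def orbit_averages(GM=1.0, r0=1.0, v0=0.8, dt=1e-4):
    """Time averages of v^2 and GM/r over one Newtonian Kepler orbit.

    The orbit starts at (r0, 0) with a purely tangential speed v0 and is
    integrated with the velocity Verlet scheme for one orbital period.
    """
    x, y, vx, vy = r0, 0.0, 0.0, v0
    # Semi-major axis from the specific orbital energy, then Kepler's third law.
    energy = 0.5 * v0**2 - GM / r0
    a = -GM / (2.0 * energy)
    period = 2.0 * math.pi * math.sqrt(a**3 / GM)
    n_steps = int(round(period / dt))

    def accel(px, py):
        r3 = math.hypot(px, py) ** 3
        return -GM * px / r3, -GM * py / r3

    ax, ay = accel(x, y)
    sum_v2 = sum_pot = 0.0
    for _ in range(n_steps):
        sum_v2 += vx * vx + vy * vy
        sum_pot += GM / math.hypot(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
    return sum_v2 / n_steps, sum_pot / n_steps


mean_v2, mean_pot = orbit_averages()
print(mean_v2, mean_pot, 0.5 * mean_v2 / -mean_pot)  # last value should be close to -1/2
```

The averages agree to the accuracy of the time step, and the ratio $\langle K \rangle / \langle U \rangle$ comes out close to $-1/2$.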

Consistency with relativity


There are several ways to see that the above Newtonian story is inconsistent with
special relativity. One is to notice that eq. (3.7) depends on time, t, only through
the specification of the instantaneous position, r2 (t), of the source particle. This
means that the force exerted on other particles, eq. (3.6), changes instantaneously
as the source particle changes its position. Information about the source’s position
therefore travels faster than light to simultaneously tell all other particles that they
should fall towards the source particle’s new position.
But special relativity states that what is simultaneous for one inertial observer
is not simultaneous for all others, and so this same force rule cannot possibly hold
for all such observers. This violates the Principle of Relativity. This problem is
related to relativity’s proscription against things moving faster than light, which the
Newtonian force law also violates.
This particular problem arises because eq. (3.5) treats space differently from time, and a naive way to fix it would be to replace the Laplacian operator, $\nabla^2$, appearing in this equation by the Lorentz-invariant d'Alembertian operator, $\Box = \nabla^2 - c^{-2} \partial_t^2$, to get the following guess (called Nordström gravity):

$$\Box \Phi = \eta^{\mu\nu} \partial_\mu \partial_\nu \Phi = \partial^\mu \partial_\mu \Phi = \left( -\frac{1}{c^2} \frac{\partial^2}{\partial t^2} + \nabla^2 \right) \Phi = 4\pi G\, \mu(x) . \tag{3.12}$$

A fully relativistic theory would also have to identify a Lorentz-invariant notion of mass density, like perhaps the rest-frame energy density, $\rho(x)/c^2$, to use on the right-hand side.
This kind of proposal has the nice feature that the forces seen by other masses do not change instantaneously; instead, news of changes in source position is carried by waves in the field $\Phi$ (analogous to electromagnetic waves in the electromagnetic field) that travel at the speed of light. It must be rejected as a successful theory of gravity, however, because its predictions contradict a number of experimental facts, including tests like eq. (3.4), and its predictions for the gravitational bending of light rays (to be discussed below).
A clue as to how to proceed comes from recognizing that the famous equation $E = mc^2$ implies energy and mass are equivalent to one another in relativity, and so we should be seeking an equation like eq. (3.12), but with the entire conserved stress-energy tensor on the right-hand side:

$$\Box h_{\mu\nu} = \frac{4\pi G}{c^2}\, T_{\mu\nu} , \tag{3.13}$$

in much the same way as it is the entire conserved 4-current, $j^\mu$, that appears on the right-hand side of Maxwell's equations, eqs. (2.61). This indicates we should seek some sort of symmetric tensor field, $h_{\mu\nu}$, to describe gravity, rather than a single scalar field like $\Phi$. Einstein's insight was to see that the field we seek is the metric tensor, $g_{\mu\nu}$, although in this case eq. (3.13) is only an approximation to the correct field equations for gravity.

3.2 Gravity as Geometry


To make the case that it is the spacetime metric, gµν (x), that describes gravity we
next investigate in detail spacetime geometry in a spherically symmetric system, such
as should apply outside of a spherically symmetric matter distribution like the Sun
or Earth.

Spherically Symmetric Geometries


The first step is to identify what restrictions the metric must satisfy in order to be spherically symmetric. For the present purposes, take spherical symmetry to mean the existence of a symmetry acting on the three spatial coordinates, $x^i$, of the form given in eq. (2.6): $x^i \to M^i{}_j\, x^j$, where $M$ is an orthogonal matrix ($\delta_{ij} M^i{}_k M^j{}_l = \delta_{kl}$). This is to be a symmetry in the sense that it leaves the metric completely unchanged.
The implications for the metric can be found by constructing the most general invariant quadratic line element, $ds^2$, that can be built from the vectors $x^i$ and $dx^i$, and from the scalars, $t$ and $dt$. Given the three rotationally invariant combinations

$$\delta_{ij} x^i x^j = \mathbf{x} \cdot \mathbf{x} \equiv r^2 , \qquad \delta_{ij} x^i dx^j = \mathbf{x} \cdot d\mathbf{x} , \qquad \delta_{ij} dx^i dx^j = d\mathbf{x} \cdot d\mathbf{x} , \tag{3.14}$$

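the most general invariant form is

$$ds^2 = -A\, dt^2 + B\, dt\, (\mathbf{x} \cdot d\mathbf{x}) + C\, (\mathbf{x} \cdot d\mathbf{x})^2 + D\, d\mathbf{x} \cdot d\mathbf{x} , \tag{3.15}$$

where the coefficients $A = A(r, t)$ through $D = D(r, t)$ are arbitrary functions of the invariants $r$ and $t$.
Given the dependence on $r$, it is convenient to work in polar coordinates, $(r, \theta, \phi)$, defined as usual by $x^1 = r \sin\theta \cos\phi$, $x^2 = r \sin\theta \sin\phi$ and $x^3 = r \cos\theta$, in which case

$$\mathbf{x} \cdot d\mathbf{x} = r\, dr \quad \text{and} \quad d\mathbf{x} \cdot d\mathbf{x} = dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{3.16}$$

In these coordinates the most general invariant line element is then

$$ds^2 = -\tilde A\, dt^2 + \tilde B\, dt\, dr + \tilde C\, dr^2 + \tilde D\, r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) , \tag{3.17}$$

where $\tilde A = A$, $\tilde B = rB$, $\tilde C = r^2 C + D$ and $\tilde D = D$.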


We are still free to redefine the invariant coordinates $r$ and $t$ to further simplify the form of this metric. A convenient choice is to redefine $r \to \hat r = r \tilde D^{1/2}$, which is possible provided $\tilde D > 0$. This ensures the last term of eq. (3.17) becomes $\hat r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$. Physically, this means that $\hat r$ plays the role usually associated with 'radius', because the sphere obtained by varying $\theta$ and $\phi$ at fixed $\hat r$ and $t$ has area $4\pi \hat r^2$, and its great circles have circumference $2\pi \hat r$, when these are computed using the proper length $ds$. Although this choice also mixes up the coefficients of $dt^2$, $dt\, dr$ and $dr^2$ in $ds^2$, this can be absorbed into appropriate redefinitions of the unknown coefficients $\tilde A$ through $\tilde C$, leaving

$$ds^2 = -\hat A\, dt^2 + \hat B\, dt\, d\hat r + \hat C\, d\hat r^2 + \hat r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) . \tag{3.18}$$

Finally, we may remove the cross term $dt\, d\hat r$ by redefining the time coordinate to $t = F(\hat t, \hat r)$, for which $dt = d\hat t\, \partial_{\hat t} F + d\hat r\, \partial_{\hat r} F$. This makes the cross term in $ds^2$ become $[-2\hat A\, \partial_{\hat r} F + \hat B]\, \partial_{\hat t} F\, d\hat t\, d\hat r$, which can be eliminated by choosing $F(\hat t, \hat r)$ as a solution to the first-order differential equation $-2\hat A\, \partial_{\hat r} F + \hat B = 0$.
Once this has been done we have the most general form possible for a spherically symmetric metric. Dropping the hats everywhere, it is:

$$ds^2 = -e^{2a(r,t)}\, dt^2 + e^{2b(r,t)}\, dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) , \tag{3.19}$$

where the remaining unknown coefficient functions are written as exponentials in order to simplify some expressions that come later.
The coordinates used to put the metric into the form eq. (3.19) are called Schwarzschild coordinates, and are defined by the condition that it is $r^2$ that premultiplies the angular terms. Alternatively, coordinates can be chosen so that the metric takes the isotropic form

$$\begin{aligned} ds^2 &= -e^{2\tilde a(\varrho,t)}\, dt^2 + e^{2\tilde b(\varrho,t)} \Big[ d\varrho^2 + \varrho^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \Big] \\ &= -e^{2\tilde a(\varrho,t)}\, dt^2 + e^{2\tilde b(\varrho,t)} \Big[ dx^2 + dy^2 + dz^2 \Big] , \end{aligned} \tag{3.20}$$

whose convenience relies on the metric within the square brackets being the metric of flat 3-dimensional space.

Weak Gravitational Fields


To describe weak gravitational fields outside of a spherical source we further suppose that these functions are close to those for a flat geometry (written in spherical coordinates): $ds^2 \simeq -dt^2 + dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$. That is, if we write

$$e^{2a(r,t)} := 1 + 2\Phi(r, t) \quad \text{and} \quad e^{2b(r,t)} := 1 + 2\Psi(r, t) , \tag{3.21}$$

then the Newtonian limit should correspond to the case where the functions $\Phi$ and $\Psi$ are small: $|\Phi|, |\Psi| \ll 1$. More precisely, because the Newtonian description of two-body bound orbits implies $v^2 \simeq GM/r$ (cf. the discussion around eq. (3.11)), where $v$ is the relative speed and $M = m_1 + m_2$, we assume $\Phi \sim \Psi \sim \mathcal{O}(v^2) \sim \mathcal{O}(GM/r) \ll 1$.
Geodesics for slowly moving particles
To see what the physical implications of such a metric might be, we must know
how it affects the trajectories of particles. To this end — inspired by the example
of flat spacetime — we make the additional assumption that in the absence of all
non-gravitational forces particles simply follow the geodesics of the metric.
This implies that particle motion maximizes the proper time

$$\begin{aligned} \tau_{AB} &= \int_A^B dt\, \sqrt{(1 + 2\Phi) - (1 + 2\Psi) \left( \frac{dr}{dt} \right)^2 - r^2 \left[ \left( \frac{d\theta}{dt} \right)^2 + \sin^2\theta \left( \frac{d\phi}{dt} \right)^2 \right]} \\ &\approx (t_B - t_A) + \frac12 \int_A^B dt \left[ 2\Phi - \left( \frac{dx}{dt} \right)^2 - \left( \frac{dy}{dt} \right)^2 - \left( \frac{dz}{dt} \right)^2 \right] , \end{aligned} \tag{3.22}$$

where we use $t$ as the parameter along the curve. The approximate equality (a) expands the square root, keeping terms only up to $\mathcal{O}(v^2)$, and so in particular neglects the product $\Psi (dr/dt)^2 \sim \mathcal{O}(v^4)$; and (b) changes to rectangular coordinates, $(r, \theta, \phi) \to (x, y, z)$. Assuming $\Phi$ is independent of $t$, and asking eq. (3.22) to be stationary with respect to small variations in the trajectories $\mathbf{r}(t)$, then leads to the following geodesic equation:

$$\frac{d^2 \mathbf{r}}{dt^2} + \nabla \Phi = 0 , \tag{3.23}$$

which may be recognized as Newton's equations for particles interacting with gravity provided we regard $\Phi$ as being the Newtonian gravitational potential. In particular, comparing with eq. (3.7), this implies

$$\Phi \simeq -\frac{GM}{r} + \mathcal{O}\left[ \left( \frac{GM}{r} \right)^2 \right] , \tag{3.24}$$

at a radial position, $r$, above a weakly-gravitating, spherically symmetric source.
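To illustrate the variational statement behind eq. (3.22), the sketch below evaluates its weak-field form, $\tau_{AB} - (t_B - t_A) \approx \int dt\, [\Phi - \frac12 v^2]$, for a uniform field $\Phi = gz$ (the form used in Exercise 23) along a hypothetical one-parameter family of vertical paths $z(t) = A\, t(T - t)$ that all leave and return to $z = 0$. The units ($c = 1$), the values $g = 9.8$ and $T = 1$, and the function name are illustrative assumptions. The proper time should be largest for $A = g/2$, which is the Newtonian free-fall path satisfying eq. (3.23), while a clock held at rest ($A = 0$) accumulates less.

```python
def excess_proper_time(A, g=9.8, T=1.0, n=200):
    """tau_AB - (t_B - t_A) from the second line of eq. (3.22), i.e. the integral of
    Phi - v^2/2, for the trial path z(t) = A t (T - t) with Phi = g z (units with c = 1)."""
    dt = T / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt          # midpoint rule
        z = A * t * (T - t)
        zdot = A * (T - 2.0 * t)
        total += (g * z - 0.5 * zdot**2) * dt
    return total


# Scan the amplitude A; the maximum should sit at the free-fall value A = g/2.
amplitudes = [0.01 * k for k in range(1001)]
best_A = max(amplitudes, key=excess_proper_time)
print(best_A, excess_proper_time(best_A), 9.8**2 / 24)
```

Analytically the functional is $(gA - A^2)T^3/6$ for this family, so the scan should land on $A = g/2$ with excess proper time $g^2 T^3/24$; restricting to one family is only a check, not a proof.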
This shows that everything we know about orbits in Newtonian physics can be
captured by the postulate that gravity is associated with the curvature of spacetime,
with Newton’s first law modified to state that particles travel along geodesics in the
absence of non-gravitational forces. What is particularly noteworthy is that within
this framework the equivalence principle arises automatically: because gravity is
associated with motion through a geometry, the acceleration experienced by a moving
particle is independent of its mass (for precisely the same reason that the same is
true for fictitious forces like the Coriolis force).

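Exercise 22: Starting from the metric $ds^2 = -(1 + 2\Phi)\, dt^2 + dx^2 + dy^2 + dz^2$, and assuming $\Phi$ is small enough that higher powers like $\Phi^2$ can be neglected, show that the only nonzero Christoffel symbols are $\Gamma^t_{tt} = \partial_t \Phi$, $\Gamma^i_{tt} = \partial_i \Phi$ and $\Gamma^t_{ti} = \Gamma^t_{it} = \partial_i \Phi$. Use these results in the geodesic equation to show that geodesics, $x^\mu(w)$, satisfy

$$\ddot t + \dot t^2\, \partial_t \Phi + 2\, \dot t\, \dot x^k\, \partial_k \Phi = 0 \quad \text{and} \quad \ddot x^i + \delta^{ij}\, \partial_j \Phi\, \dot t^2 = 0 , \tag{3.25}$$

where over-dots denote $d/dw$. Use these, with the identity $dx^i/dt = \dot x^i/\dot t$, to derive

$$\frac{d^2 \mathbf{r}}{dt^2} - \left( \partial_t \Phi + 2\, \frac{d\mathbf{r}}{dt} \cdot \nabla \Phi \right) \frac{d\mathbf{r}}{dt} + \nabla \Phi = 0 , \tag{3.26}$$

and so also eq. (3.23) in the non-relativistic limit where $\partial_t \Phi = 0$ and products like $\partial_k \Phi\, (dx^k/dt)$ can be neglected. (As usual, $\mathbf{r}$ here denotes the vector $x^i \mathbf{e}_i$, where $\mathbf{e}_x$, $\mathbf{e}_y$ and $\mathbf{e}_z$ are the unit vectors in the three Cartesian coordinate directions.)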

Gravitational redshift
Since the trajectories of Newtonian gravity are reproduced as the geodesics of a metric that depends on the gravitational potential, and since the geodesics are defined as the curves that maximize the proper time between two events, it must be true that gravitational fields cause time to run differently for observers sitting within them.
To see this quantitatively, consider the world-line of an observer who hovers (perhaps using rockets) at a fixed distance above a gravitating source: $x^\mu(\tau) = \{t(\tau), r_\star, \theta_\star, \phi_\star\}$, where $(r_\star, \theta_\star, \phi_\star)$ labels the fixed spatial position of the observer. Any such observer's 4-velocity is given by $u^\mu = dx^\mu/d\tau = \{dt/d\tau, 0, 0, 0\}$, where $dt/d\tau$ can be computed in terms of the gravitational potential by using the condition $g_{\mu\nu} u^\mu u^\nu = g_{tt} (dt/d\tau)^2 = -1$, and so using $g_{tt} = -(1 + 2\Phi)$ we find

$$\frac{d\tau}{dt} = \sqrt{-g_{tt}} = \sqrt{1 + 2\Phi} \simeq 1 + \Phi + \mathcal{O}(\Phi^2) , \tag{3.27}$$

and so, to linear order in $\Phi$, the difference between the rates of two clocks at different radii, $r_A$ and $r_B$, becomes

$$\left( \frac{d\tau}{dt} \right)_B - \left( \frac{d\tau}{dt} \right)_A \simeq \Phi(r_B) - \Phi(r_A) . \tag{3.28}$$

As expected, this states that clocks run at different speeds when situated in a gravitational field.

Exercise 23: For a constant gravitational field pointed along the $z$ axis the Newtonian potential can be written as $\Phi = gz$, where $g$ is the universal acceleration experienced by falling objects. Eq. (3.28) states that two clocks separated by a height $h = \Delta z$ run with rates that differ by an amount $gh$, with the higher of the two clocks running faster. Verify that this result also follows from special relativity and the principle of equivalence by considering two observers who accelerate in the positive $z$ direction along the trajectories $z_A(t) = \frac12 g t^2$ and $z_B(t) = h + \frac12 g t^2$ in the absence of a gravitational field, and comparing the times of departure and arrival of two light rays sent from observer A to observer B.
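The comparison described in Exercise 23 can also be done numerically. The sketch below works in flat spacetime with units $c = 1$ and uses deliberately large illustrative values, $g = 10^{-3}$ and $h = 1$, so that the effect is visible in double precision. It compares coordinate-time intervals and so neglects the observers' own time dilation, which is second order in $gh/c^2$ over the short flight time. The function names are hypothetical.

```python
import math


def arrival_time(t_emit, g, h, c=1.0):
    """Coordinate time at which a light ray sent upward by observer A at t_emit
    reaches observer B, for z_A = g t^2 / 2 and z_B = h + g t^2 / 2 (flat spacetime)."""
    # Ray: z = g t_emit^2 / 2 + c (t - t_emit). Meeting B gives
    #   (g/2) t^2 - c t + K = 0,  with  K = h + c t_emit - g t_emit^2 / 2.
    K = h + c * t_emit - 0.5 * g * t_emit**2
    # Smaller root, written in a form free of catastrophic cancellation.
    return 2.0 * K / (c + math.sqrt(c * c - 2.0 * g * K))


def rate_ratio(g, h, dt_emit=1e-4, c=1.0):
    """Interval between two arrivals at B divided by the interval between departures at A."""
    return (arrival_time(dt_emit, g, h, c) - arrival_time(0.0, g, h, c)) / dt_emit


g, h = 1e-3, 1.0
print(rate_ratio(g, h) - 1.0, g * h)
```

Observer B receives A's signals spread out by a factor close to $1 + gh/c^2$, so B sees A's clock running slow: the higher clock runs faster, as eq. (3.28) predicts.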

As applied to observers outside of a spherical, weakly gravitating source, for which $\Phi = -GM/r$, these become

$$\frac{d\tau}{dt} \simeq 1 - \frac{GM}{r} + \mathcal{O}\left[ \left( \frac{GM}{r} \right)^2 \right] , \tag{3.29}$$

and so in particular $d\tau = dt$ for clocks that are infinitely far away ($r \to \infty$). This provides the physical interpretation for the coordinate $t$: it is the time measured by an infinitely distant observer. Eq. (3.29) then shows how time runs more and more slowly the closer one hovers over the gravitating source. In particular, reinstating the factors of $c$, motionless clocks at the top of a building on the surface of the Earth run faster than those on the ground floor by an amount

$$\Delta \left( \frac{d\tau}{dt} \right) \simeq \frac{G M_\oplus h}{R_\oplus^2 c^2} \simeq \frac{gh}{c^2} \simeq 1.1 \times 10^{-15} \left( \frac{h}{10\ \text{m}} \right) , \tag{3.30}$$

for a building of height $h \ll R_\oplus$. Here $g = GM_\oplus/R_\oplus^2 \simeq 9.8\ \text{m/s}^2$ denotes the acceleration due to gravity at the Earth's surface. This difference in the clocks' rates accumulates over time, adding up to a difference of about $9.4 \times 10^{-11}$ s (about a tenth of a nanosecond) every day between clocks situated on the two floors of a 10 m building. Time differences this large can be measured using accurate atomic clocks, verifying the prediction of eq. (3.30).
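The numbers in eq. (3.30) follow from a two-line estimate. The sketch below assumes $g = 9.8\ \text{m/s}^2$, $c = 2.998 \times 10^8$ m/s and a 10 m building, and accumulates the fractional rate difference over one day.

```python
g = 9.8                 # m/s^2, surface gravity quoted in the text
c = 2.998e8             # m/s (rounded; assumption of this sketch)
h = 10.0                # building height in metres
SECONDS_PER_DAY = 86400.0

fractional_rate = g * h / c**2                 # eq. (3.30)
offset_per_day = fractional_rate * SECONDS_PER_DAY
print(f"rate difference {fractional_rate:.2e}, offset {offset_per_day:.2e} s per day")
```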
Closely related to the slowing of time in a gravitational field is the red-shifting of
light as it climbs out of a gravitational potential well (or its blue-shifting as it falls in).
Although this is described in more detail below, once the geodesics describing light
propagation are determined, the main result also follows from the above discussion of
gravitational time dilation. This is possible because of the connection between photon
energy and frequency required by quantum mechanics, $E = \hbar\omega$, since frequencies may
be directly determined by time measurements (such as measurements of the period
T = 2π/ω).
Keeping in mind that $t$ measures time as seen by observers at infinity, eq. (3.29) shows that the frequency, $\omega(r)$, of a photon measured by a motionless observer at radius $r$ differs from the frequency, $\omega_\infty$, that the same photon would be measured to have at $r \to \infty$ by

$$\frac{\omega(r)}{\omega_\infty} = \frac{T_\infty}{T(r)} = \frac{dt}{d\tau} = \frac{1}{\sqrt{1 + 2\Phi(r)}} \simeq 1 + \frac{GM}{r} + \mathcal{O}\left[ \left( \frac{GM}{r} \right)^2 \right] . \tag{3.31}$$

Physically, the decrease (or red-shift) in frequency seen by observers at successively larger radii corresponds to the photon's energy loss due to its having to climb out of the gravitational potential well. (The only difference between photons and massive particles climbing out of such a well is that for photons this energy loss does not imply a corresponding reduction of speed.)

3.3 Relativistic Effects in the Solar System


It is very useful to explore the implications of weak gravity in more detail since this is the regime of real interest for most applications in near-Earth orbit, or within the Solar System. But it is also useful to go beyond the strict Newtonian limit since many measurements are sufficiently sensitive to detect the deviations between relativistic gravity and Newton's laws. We do so here in a way that is reasonably model independent, by not restricting to the specific metric that we shall later find is predicted by Einstein's field equations. The utility of being this general is that it allows a quantitative statement as to the accuracy with which observations support the predictions of General Relativity.

Parameterized Post-Newtonian (PPN) Approximation


Our starting point is the metric, eq. (3.19), which we assume also to be static (i.e. $t$ independent), and so write as

$$\begin{aligned} ds^2 &= -e^{2a(r)}\, dt^2 + e^{2b(r)}\, dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) \\ &= -\big[ 1 + 2\Phi(r) \big]\, dt^2 + \big[ 1 + 2\Psi(r) \big]\, dr^2 + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2) . \end{aligned} \tag{3.32}$$

Unlike in previous sections we do not stop at the Newtonian approximation for $\Phi$ and $\Psi$ and instead write

$$\Phi(r) = -\frac{GM}{r} + (\beta - \gamma) \left( \frac{GM}{r} \right)^2 + \cdots , \quad \text{and} \quad \Psi(r) = \gamma\, \frac{GM}{r} + \cdots , \tag{3.33}$$

where $\beta$ and $\gamma$ are dimensionless quantities that will differ for different theories of gravity. As we shall see in detail below, in General Relativity the exact spherical solution to Einstein's equations gives

$$e^{2a(r)} = 1 + 2\Phi(r) = e^{-2b(r)} = \big[ 1 + 2\Psi(r) \big]^{-1} = 1 - \frac{2GM}{r} , \tag{3.34}$$

and so predicts

$$\beta = \gamma = 1 \qquad \text{(General Relativity)} . \tag{3.35}$$
Most of the experimental tests of General Relativity can be summarized as con-
straints on the range for β and γ that are allowed by observations, some of the most
important of which are described in the next sections.

General properties of geodesics


Since the observational tests all involve the motion of particles or light rays within
the geometry, the first step is to identify and solve the geodesic equations. We start
with some general properties of geodesics for any geometry, before specializing to the
spherically symmetric case.
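The equation of motion which defines the trajectory, $x^\mu(\tau)$, of a freely-falling particle is the geodesic equation

$$\frac{d^2 x^\mu}{d\tau^2} + \Gamma^\mu_{\nu\lambda}[x(\tau)] \left( \frac{dx^\nu}{d\tau} \right) \left( \frac{dx^\lambda}{d\tau} \right) = 0 , \tag{3.36}$$

where for time-like geodesics $\tau$ is the proper time measured along the trajectory. There are several first integrals of these equations that may be obtained on general grounds.
To find the first of these, take the inner product of eq. (3.36) with the velocity 4-vector, $dx^\mu/d\tau$, and use eq. (2.39) to simplify the result:

$$\begin{aligned} 0 &= g_{\mu\nu} \left( \frac{dx^\mu}{d\tau} \right) \left[ \frac{d^2 x^\nu}{d\tau^2} + \Gamma^\nu_{\alpha\beta} \left( \frac{dx^\alpha}{d\tau} \right) \left( \frac{dx^\beta}{d\tau} \right) \right] \\ &= g_{\mu\nu} \left( \frac{dx^\mu}{d\tau} \right) \left( \frac{d^2 x^\nu}{d\tau^2} \right) + \frac12\, \partial_\alpha g_{\mu\beta} \left( \frac{dx^\mu}{d\tau} \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} \right) \\ &= \frac12\, \frac{d}{d\tau} \left[ g_{\mu\nu} \left( \frac{dx^\mu}{d\tau} \right) \left( \frac{dx^\nu}{d\tau} \right) \right] . \end{aligned} \tag{3.37}$$

The second line uses the fact that, once contracted with the three velocity factors, the terms $\partial_\beta g_{\mu\alpha}$ and $-\partial_\mu g_{\alpha\beta}$ in $g_{\mu\nu} \Gamma^\nu_{\alpha\beta} = \frac12 (\partial_\alpha g_{\mu\beta} + \partial_\beta g_{\mu\alpha} - \partial_\mu g_{\alpha\beta})$ cancel. The last line uses that $g_{\mu\nu}[x(\tau)]$ is itself evaluated along the trajectory, and so must be implicitly differentiated.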
This shows that the quantity gµν ẋµ ẋν is a constant along a geodesic (where
ẋµ = dxµ /dτ ) and so in particular its sign does not change. As a result it follows
that if a particle initially starts out moving at the local speed of light, gµν ẋµ ẋν = 0,
then this is always true. Similarly, if a particle initially moves more slowly than light,
gµν ẋµ ẋν < 0, then this is also always true.
Another first integral of the geodesic equations is immediate if the metric should happen to have an isometry; that is, if there are directions in the geometry along which the metric does not change. Recall that the metric transforms as a tensor under a coordinate change, so $g'_{\mu\nu}(x') = g_{\alpha\beta}(x)\, (\partial x^\alpha/\partial x'^\mu)(\partial x^\beta/\partial x'^\nu)$. Specializing to an infinitesimal transformation, $x^\mu = x'^\mu + \xi^\mu(x')$, gives $\partial x^\mu/\partial x'^\alpha \simeq \delta^\mu{}_\alpha + \partial_\alpha \xi^\mu$ and so the transformed metric becomes $g'_{\mu\nu} \simeq g_{\mu\nu} + \delta g_{\mu\nu}$, with

$$\delta g_{\mu\nu} = \xi^\lambda \partial_\lambda g_{\mu\nu} + \partial_\mu \xi^\lambda\, g_{\lambda\nu} + \partial_\nu \xi^\lambda\, g_{\mu\lambda} . \tag{3.38}$$

This transformation is called an isometry for those $\xi^\mu$ for which eq. (3.38) vanishes, and if such a $\xi^\mu(x)$ exists it is called a Killing vector field. In the time-independent and spherically symmetric applications of present interest there are four such directions, corresponding to arbitrary shifts in $t$, and to the three independent rotations of 3-dimensional space (including in particular constant shifts of $\phi$). The simplest Killing vectors are those corresponding respectively to the constant shifts in the coordinates $t$ and $\phi$, for which $\xi^\mu_{(t)} = \{1, 0, 0, 0\}$ or $\xi^\mu_{(\phi)} = \{0, 0, 0, 1\}$ (in the coordinates $x^\mu = \{t, r, \theta, \phi\}$). These satisfy $\delta g_{\mu\nu} = 0$ because $\partial_\mu \xi^\nu = 0$ for both, and because the metric does not depend on these coordinates, so $\xi^\mu_{(t)} \partial_\mu g_{\nu\lambda} = \partial_t g_{\nu\lambda} = 0$ and $\xi^\mu_{(\phi)} \partial_\mu g_{\nu\lambda} = \partial_\phi g_{\nu\lambda} = 0$.
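To see why isometries help integrate the geodesic equations, multiply eq. (3.36) through by $\xi_\mu = g_{\mu\nu} \xi^\nu$, to get

$$\begin{aligned} 0 &= g_{\mu\nu}\, \xi^\mu \left[ \frac{d^2 x^\nu}{d\tau^2} + \Gamma^\nu_{\alpha\beta} \left( \frac{dx^\alpha}{d\tau} \right) \left( \frac{dx^\beta}{d\tau} \right) \right] \\ &= g_{\mu\nu}\, \xi^\mu\, \frac{d^2 x^\nu}{d\tau^2} + \left( \partial_\alpha g_{\mu\beta}\, \xi^\mu - \frac12\, \xi^\mu \partial_\mu g_{\alpha\beta} \right) \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} \\ &= \frac{d}{d\tau} \left( g_{\mu\nu}\, \xi^\mu\, \frac{dx^\nu}{d\tau} \right) - \frac12\, \delta g_{\alpha\beta}\, \frac{dx^\alpha}{d\tau} \frac{dx^\beta}{d\tau} . \end{aligned} \tag{3.39}$$

Clearly, for any $\xi^\mu$ for which $\delta g_{\alpha\beta} = 0$ the geodesic equation implies the quantity

$$Q = g_{\mu\nu}\, \xi^\mu\, \frac{dx^\nu}{d\tau} , \tag{3.40}$$

is a constant along the geodesic. That is, there is a conserved quantity for a geodesic corresponding to each symmetry of the metric.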

Geodesics in static spherically symmetric spacetimes


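In terms of the metric functions $a(r)$ and $b(r)$ the nonzero components of the Christoffel symbols turn out to be (together with those related to these by the symmetry $\Gamma^\mu_{\nu\lambda} = \Gamma^\mu_{\lambda\nu}$)

$$\begin{aligned} &\Gamma^r_{tt} = e^{2(a-b)}\, \partial_r a , \quad \Gamma^t_{tr} = \partial_r a , \quad \Gamma^r_{rr} = \partial_r b , \\ &\Gamma^r_{\theta\theta} = -r\, e^{-2b} , \quad \Gamma^r_{\phi\phi} = -r \sin^2\theta\, e^{-2b} , \quad \Gamma^\theta_{r\theta} = \frac1r , \\ &\Gamma^\phi_{r\phi} = \frac1r , \quad \Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta , \quad \Gamma^\phi_{\theta\phi} = \cot\theta , \end{aligned} \tag{3.41}$$

and so the geodesic equations become

$$\begin{aligned} &\frac{d^2 t}{d\tau^2} + 2\, \partial_r a \left( \frac{dt}{d\tau} \right) \left( \frac{dr}{d\tau} \right) = 0 \\ &\frac{d^2 \theta}{d\tau^2} - \sin\theta \cos\theta \left( \frac{d\phi}{d\tau} \right)^2 + \frac2r \left( \frac{dr}{d\tau} \right) \left( \frac{d\theta}{d\tau} \right) = 0 \\ &\frac{d^2 \phi}{d\tau^2} + 2 \cot\theta \left( \frac{d\theta}{d\tau} \right) \left( \frac{d\phi}{d\tau} \right) + \frac2r \left( \frac{dr}{d\tau} \right) \left( \frac{d\phi}{d\tau} \right) = 0 , \end{aligned} \tag{3.42}$$

and

$$\frac{d^2 r}{d\tau^2} + e^{2(a-b)}\, \partial_r a \left( \frac{dt}{d\tau} \right)^2 + \partial_r b \left( \frac{dr}{d\tau} \right)^2 - r\, e^{-2b} \left[ \left( \frac{d\theta}{d\tau} \right)^2 + \sin^2\theta \left( \frac{d\phi}{d\tau} \right)^2 \right] = 0 . \tag{3.43}$$

One of the above equations can be traded for the first integral corresponding to the condition that $g_{\mu\nu} (dx^\mu/d\tau)(dx^\nu/d\tau) = -1$ (or zero, for a null geodesic) along the geodesic, which implies

$$-e^{2a} \left( \frac{dt}{d\tau} \right)^2 + e^{2b} \left( \frac{dr}{d\tau} \right)^2 + r^2 \left[ \left( \frac{d\theta}{d\tau} \right)^2 + \sin^2\theta \left( \frac{d\phi}{d\tau} \right)^2 \right] = -1 \quad (\text{or } 0) . \tag{3.44}$$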

The conserved quantity, $E$, associated with the symmetry corresponding to shifts in $t$ is found by multiplying the $t$ geodesic equation by $g_{tt} = -e^{2a}$ and integrating, leading to

$$E = -g_{\mu\nu}\, \xi^\mu_{(t)} \frac{dx^\nu}{d\tau} = e^{2a} \left( \frac{dt}{d\tau} \right) = (1 + 2\Phi) \left( \frac{dt}{d\tau} \right) , \tag{3.45}$$

being a constant along the geodesic. The corresponding conserved quantity, $L$, associated with shifting $\phi$ is similarly found by multiplying the $\phi$ geodesic equation by $g_{\phi\phi} = r^2 \sin^2\theta$ and integrating, implying that the angular momentum

$$L = g_{\mu\nu}\, \xi^\mu_{(\phi)} \frac{dx^\nu}{d\tau} = r^2 \sin^2\theta \left( \frac{d\phi}{d\tau} \right) , \tag{3.46}$$

is also constant along any geodesic.
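The conservation of $E$ and $L$ (and of the normalization in eq. (3.44)) can be checked by integrating the geodesic equations (3.42) and (3.43) numerically in the equatorial plane $\theta = \pi/2$. The sketch below assumes the GR metric functions of eq. (3.34), units with $G = c = 1$ and $M = 1$, a classical fourth-order Runge-Kutta integrator, and illustrative initial data (a bound orbit starting at $r = 20M$); none of these choices come from the notes.

```python
import math

M = 1.0  # mass in units with G = c = 1 (assumption of this sketch)


def f(r):
    """e^{2a} = e^{-2b} = 1 - 2M/r for the GR metric of eq. (3.34)."""
    return 1.0 - 2.0 * M / r


def derivs(state):
    """Right-hand side of eqs. (3.42)-(3.43) in the equatorial plane theta = pi/2."""
    t, r, phi, tdot, rdot, phidot = state
    da = M / (r * r * f(r))                  # da/dr, with a = ln(f)/2 and b = -a
    tddot = -2.0 * da * tdot * rdot
    rddot = -f(r) ** 2 * da * tdot**2 + da * rdot**2 + r * f(r) * phidot**2
    phiddot = -2.0 * rdot * phidot / r
    return (tdot, rdot, phidot, tddot, rddot, phiddot)


def rk4_step(state, h):
    """One classical fourth-order Runge-Kutta step in proper time."""
    k1 = derivs(state)
    k2 = derivs(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = derivs(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = derivs(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h * (q1 + 2.0 * q2 + 2.0 * q3 + q4) / 6.0
                 for s, q1, q2, q3, q4 in zip(state, k1, k2, k3, k4))


def invariants(state):
    """E of eq. (3.45), L of eq. (3.46), and the normalization of eq. (3.44)."""
    _, r, _, tdot, rdot, phidot = state
    E = f(r) * tdot
    L = r * r * phidot
    norm = -f(r) * tdot**2 + rdot**2 / f(r) + r * r * phidot**2
    return E, L, norm


# Start at r = 20M moving tangentially, a little slower than a circular orbit.
r0 = 20.0 * M
phidot0 = 0.9 * math.sqrt(M / r0**3)
tdot0 = math.sqrt((1.0 + r0**2 * phidot0**2) / f(r0))  # enforces norm = -1
state = (0.0, r0, 0.0, tdot0, 0.0, phidot0)

start = invariants(state)
for _ in range(5000):
    state = rk4_step(state, 0.1)
end = invariants(state)
print(start, end)
```

Over more than one full orbit the three invariants stay fixed to within the integrator's truncation error, even though $r$, $dt/d\tau$ and $d\phi/d\tau$ individually vary.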


The resulting equations can be further simplified by using the observation that motion in a spherically symmetric gravitational field lies completely within a plane.⁵ This allows us the freedom to choose the orientation of the coordinate axes so that the relevant plane is described by $\theta(\tau) = \pi/2$ for all $\tau$. (Notice that this choice solves the geodesic equation for $\theta$, eq. (3.42), as claimed.) With this simplifying choice, we may use eqs. (3.45) and (3.46) to eliminate $dt/d\tau$ and $d\phi/d\tau$ from eq. (3.44), leading to the following first-order equation governing the radial motion of a geodesic

$$-E^2 e^{-2a} + e^{2b} \left( \frac{dr}{d\tau} \right)^2 + \frac{L^2}{r^2} = -\zeta , \tag{3.47}$$

where $\zeta = 1$ for a time-like geodesic and $\zeta = 0$ for a null geodesic. Alternatively, this may be written

$$\left( \frac{dr}{d\tau} \right)^2 + W_{\rm eff}(r) = 0 , \tag{3.48}$$

which has the form of the energy equation, with total energy zero, for one-dimensional motion in the presence of an effective potential

$$\begin{aligned} W_{\rm eff}(r) &= \left( \frac{L^2}{r^2} + \zeta \right) e^{-2b(r)} - E^2 e^{-2[a(r) + b(r)]} \\ &= \frac{1}{1 + 2\Psi} \left[ \frac{L^2}{r^2} + \zeta - \frac{E^2}{1 + 2\Phi} \right] . \end{aligned} \tag{3.49}$$

The advantage of writing the equation in this form is the intuition it provides about the kinds of orbits that are possible (once the functions $a(r)$ and $b(r)$, or $\Phi(r)$ and $\Psi(r)$, are specified).
⁵ The fact that the motion lies in a plane can ultimately be traced to the existence of the two isometries associated with rotations that do not correspond simply to shifts in $\phi$.
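Circular orbits sit at the extrema of $W_{\rm eff}$, so eq. (3.49) can be used to locate them numerically. The sketch below assumes the GR metric functions of eq. (3.34) in units $G = c = M = 1$, an illustrative angular momentum $L = 4.5M$, and a finite-difference derivative with bisection; the function names are hypothetical. As a check it compares with the closed-form root of $M r^2 - L^2 r + 3ML^2 = 0$, which follows from setting $dW_{\rm eff}/dr = 0$ for this particular metric.

```python
import math

M = 1.0  # GM/c^2 as the unit of length (assumption of this sketch)


def w_eff(r, L, E, zeta=1.0):
    """Eq. (3.49) with the GR choice 1 + 2Phi = 1 - 2M/r and 1 + 2Psi = 1/(1 - 2M/r)."""
    one_plus_2phi = 1.0 - 2.0 * M / r
    one_plus_2psi = 1.0 / one_plus_2phi
    return (L * L / (r * r) + zeta - E * E / one_plus_2phi) / one_plus_2psi


def dw_dr(r, L, E, rel_step=1e-5):
    """Central finite-difference estimate of dW_eff/dr."""
    h = rel_step * r
    return (w_eff(r + h, L, E) - w_eff(r - h, L, E)) / (2.0 * h)


def stable_circular_radius(L, r_lo, r_hi, E=1.0, iters=100):
    """Bisection for dW_eff/dr = 0, assuming the slope is negative at r_lo and positive at r_hi.

    For this metric E only shifts W_eff by a constant, so it does not affect the result.
    """
    for _ in range(iters):
        mid = 0.5 * (r_lo + r_hi)
        if dw_dr(mid, L, E) < 0.0:
            r_lo = mid
        else:
            r_hi = mid
    return 0.5 * (r_lo + r_hi)


L = 4.5 * M
r_numeric = stable_circular_radius(L, 8.0 * M, 100.0 * M)
# Closed form: outer (stable) root of M r^2 - L^2 r + 3 M L^2 = 0 for this metric.
r_closed = (L**2 + math.sqrt(L**4 - 12.0 * M**2 * L**2)) / (2.0 * M)
print(r_numeric, r_closed)
```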

Gravitational redshift
We are now in a position to directly verify the earlier expression for the redshift (or energy loss) of a light ray as seen by motionless observers as it climbs away from a gravitational source. To this end suppose a light ray is sent radially outward from an observer at $(r, \theta, \phi) = (r_A, \theta_\star, \phi_\star)$ to another observer at position $(r, \theta, \phi) = (r_B, \theta_\star, \phi_\star)$. To compute the energy of this light ray as seen by these observers we must compute both their 4-velocity, $u^\mu$, and the 4-momentum of the outgoing light ray, $p^\mu$, and evaluate $\mathcal{E} = -g_{\mu\nu} u^\mu p^\nu$.
The 4-velocity of an observer sitting at fixed spatial position, $(r, \theta, \phi)$, is easiest to compute since it must point purely in the time direction: $u^\mu = \{u^t, 0, 0, 0\}$. The condition $g_{\mu\nu} u^\mu u^\nu = g_{tt} (u^t)^2 = -1$ then implies

$$u^t(r) = \frac{1}{\sqrt{-g_{tt}(r)}} = e^{-a(r)} = \frac{1}{\sqrt{1 + 2\Phi(r)}} . \tag{3.50}$$
The trajectory of the light ray, $x^\mu(w)$, is a radially out-going null geodesic for the given metric, for which the equations of the previous section can be applied, specialized to the case of radial motion: $d\theta/dw = d\phi/dw = 0$. In particular, the condition $g_{\mu\nu} (dx^\mu/dw)(dx^\nu/dw) = 0$, eq. (3.44), in this case implies

$$0 = -e^{2a(r)} \left( \frac{dt}{dw} \right)^2 + e^{2b(r)} \left( \frac{dr}{dw} \right)^2 = -(1 + 2\Phi) \left( \frac{dt}{dw} \right)^2 + (1 + 2\Psi) \left( \frac{dr}{dw} \right)^2 , \tag{3.51}$$

and so the trajectory of the light ray satisfies

$$\frac{dr/dw}{dt/dw} = \frac{dr}{dt} = \pm\, e^{a-b} = \pm \sqrt{\frac{1 + 2\Phi}{1 + 2\Psi}} , \tag{3.52}$$

where the sign depends on whether the light ray is in-going or out-going. Similarly, eq. (3.45) implies that

$$E = e^{2a} \left( \frac{dt}{dw} \right) = (1 + 2\Phi) \left( \frac{dt}{dw} \right) , \tag{3.53}$$

is constant along the outgoing null geodesic. The tangent vector to the light ray's world-line then is

$$\frac{dx^\mu}{dw} = \left\{ \frac{dt}{dw}, \frac{dr}{dw}, 0, 0 \right\} = \frac{E}{1 + 2\Phi} \left\{ 1, \pm \sqrt{\frac{1 + 2\Phi}{1 + 2\Psi}}, 0, 0 \right\} , \tag{3.54}$$

in terms of which the photon's 4-momentum may be written $p^\mu(w) = k\, dx^\mu/dw$ for some constant $k$.
We may now compute the energy of the photon seen by the stationary observers at fixed position, which is given by

$$\mathcal{E}(r) = -g_{\mu\nu} u^\mu p^\nu = -g_{tt}\, u^t p^t = \frac{kE}{\sqrt{1 + 2\Phi(r)}} . \tag{3.55}$$

In particular, since $\Phi \to 0$ as $r \to \infty$ it follows that $kE = \mathcal{E}_\infty$ can be interpreted as the photon's energy as seen by observers at rest very far from the gravitating source. In this case, the energy seen by observers at rest at general $r$ is

$$\frac{\mathcal{E}(r)}{\mathcal{E}_\infty} = \frac{1}{\sqrt{1 + 2\Phi(r)}} \simeq 1 + \frac{GM}{r} + \left( \frac32 - \beta + \gamma \right) \left( \frac{GM}{r} \right)^2 + \cdots , \tag{3.56}$$

which agrees with the result, eq. (3.31), of the previous section.
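The second-order expansion in eq. (3.56) can be spot-checked numerically: with $\Phi$ truncated as in eq. (3.33), the difference between $1/\sqrt{1 + 2\Phi}$ and the quoted series should shrink like $(GM/r)^3$. In the sketch below $x = GM/r$, and the pair $(\beta, \gamma) = (1.2, 0.9)$ is a hypothetical non-GR choice included only to exercise the $\beta$ and $\gamma$ dependence.

```python
import math


def redshift_exact(x, beta, gamma):
    """E(r)/E_inf = 1/sqrt(1 + 2 Phi), Phi = -x + (beta - gamma) x^2 with x = GM/r (eq. 3.33 truncated)."""
    phi = -x + (beta - gamma) * x * x
    return 1.0 / math.sqrt(1.0 + 2.0 * phi)


def redshift_series(x, beta, gamma):
    """The second-order expansion of eq. (3.56)."""
    return 1.0 + x + (1.5 - beta + gamma) * x * x


for beta, gamma in [(1.0, 1.0), (1.2, 0.9)]:
    for x in (1e-3, 1e-4):
        err = redshift_exact(x, beta, gamma) - redshift_series(x, beta, gamma)
        print(beta, gamma, x, err / x**3)  # stays of order one: the error is O(x^3)
```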

Deflection of light by the Sun


The equation governing the radial motion for a more general light ray in a spherically symmetric gravitational field is eq. (3.48), together with eq. (3.49) specialized to $\zeta = 0$:

$$W_{\rm eff}(r) = \frac{1}{1 + 2\Psi} \left[ \frac{L^2}{r^2} - \frac{E^2}{1 + 2\Phi} \right] . \tag{3.57}$$

Figure 5: The geometry of light deflection by a gravitating body, showing the impact parameter, $b$, and deflection angle, $\delta\phi$.

These describe trajectories that typically escape to infinity, particularly in the weak-field limit, since light rays move so swiftly they are difficult to bind into orbits. The point of closest approach to the gravitating source of such a ray corresponds to the place where $dr/d\tau = 0$, and so (from eq. (3.48)) occurs at $r = r_\star$, where $W_{\rm eff}(r_\star) = 0$. That is,

$$b^2 := \frac{L^2}{E^2} = \frac{r_\star^2}{1 + 2\Phi(r_\star)} \simeq r_\star^2 + 2GM r_\star + 2(GM)^2 (2 - \beta + \gamma) + \mathcal{O}\left[ \frac{(GM)^3}{r_\star} \right] . \tag{3.58}$$

If $\Phi = 0$ then $r_\star = b$, and since the geodesics are straight lines in this limit, $b$ is revealed as the impact parameter: the distance of closest approach of the straight line obtained by extrapolating the asymptotic trajectory far from the gravitating source.
The radial coordinate of closest approach for the full trajectory is instead smaller than $b$, approximately given by

$$r_\star \simeq b - GM + \mathcal{O}\left[ \frac{(GM)^2}{b} \right] , \tag{3.59}$$

when $b \gg GM$. This is a good approximation within the Solar System, since then $b \ge R_\odot$, and Table 1 shows that $GM_\odot/(R_\odot c^2) \simeq 2 \times 10^{-6}$.
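The approximation (3.59) can be compared with an exact solution of the defining condition in eq. (3.58). The sketch below solves $b^2 = r_\star^2/[1 + 2\Phi(r_\star)]$ by bisection, with $\Phi$ truncated as in eq. (3.33) and $\beta = \gamma = 1$ by default, in units where $GM/c^2 = 1$; the value $b = 1000$ and the function names are illustrative. The last printed value is the root of the truncated series (3.58) for $\beta = \gamma = 1$, namely $(r_\star + GM)^2 = b^2 - 3(GM)^2$.

```python
import math

M = 1.0  # GM/c^2 of the source, used as the unit of length (assumption)


def impact_parameter_sq(r, beta=1.0, gamma=1.0):
    """b^2 = r^2 / (1 + 2 Phi(r)) from eq. (3.58), with Phi truncated as in eq. (3.33)."""
    x = M / r
    phi = -x + (beta - gamma) * x * x
    return r * r / (1.0 + 2.0 * phi)


def closest_approach(b, beta=1.0, gamma=1.0, iters=200):
    """Solve b^2 = r^2/(1 + 2 Phi(r)) for r_star by bisection on [3M, b].

    Assumes b >> M, so that the left-hand side increases with r on this interval.
    """
    lo, hi = 3.0 * M, b
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if impact_parameter_sq(mid, beta, gamma) < b * b:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


b = 1000.0 * M
r_star = closest_approach(b)
print(r_star, b - M, math.sqrt(b * b - 3.0 * M * M) - M)
```

The exact root lies just below $b - GM$, by an amount of order $(GM)^2/b$, as eq. (3.59) indicates.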
The shape of the trajectory, $r(\phi)$, is found by using eqs. (3.46) and (3.48) to compute $dr/d\phi = (dr/d\tau)/(d\phi/d\tau)$, leading to

$$\left( \frac{dr}{d\phi} \right)^2 + \left( \frac{r^2}{L} \right)^2 W_{\rm eff}(r) = \left( \frac{dr}{d\phi} \right)^2 + \frac{r^2}{1 + 2\Psi} \left[ 1 - \frac{r^2}{(1 + 2\Phi)\, b^2} \right] = 0 . \tag{3.60}$$

Very far from the gravitating source $\Phi, \Psi \to 0$ and so this reduces to

$$\frac{dr}{d\phi} \simeq \pm\, \frac{r}{b} \sqrt{r^2 - b^2} , \tag{3.61}$$

where the sign corresponds to which angular direction the light ray travels relative to the gravitating source. This has as solutions $b = r \cos(\phi - \phi_\star)$ (upper sign) or $b = r \sin(\phi - \phi_\star)$ (lower sign). These are the equations of a straight line, as must be so in the absence of gravity. This form confirms that $b$ is the impact parameter of the asymptotic trajectory.
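The measured quantity when a light ray is deflected by a gravitating source is the deflection, $\delta\phi$, between the asymptotic lines defined by the incident and departing rays. This is computed by inverting the expression for $dr/d\phi$ to obtain $d\phi/dr$, using eq. (3.60), and integrating the result from the initial asymptotically distant region to the final one. Since the scattering is symmetric about the point of closest approach, the total change, $\Delta\phi$, over the whole trajectory is twice the result integrated from $r = r_\star$ to $r = \infty$, leading to

$$\Delta\phi = 2 \int_{r_\star}^\infty dr \left( \frac{d\phi}{dr} \right) = 2b \int_{r_\star}^\infty \frac{dr}{r} \sqrt{\frac{(1 + 2\Phi)(1 + 2\Psi)}{r^2 - b^2 (1 + 2\Phi)}} . \tag{3.62}$$

Changing variables to $x = r/r_\star$, using the leading approximations $\Phi \simeq -GM/r$, $\Psi \simeq \gamma\, GM/r$, $r_\star \simeq b - GM$, and expanding to linear order in $GM/b$, this becomes

$$\begin{aligned} \Delta\phi &= 2 \int_1^\infty \frac{dx}{x} \sqrt{\frac{(1 + 2\Phi)(1 + 2\Psi)}{(x r_\star/b)^2 - 1 - 2\Phi}} \\ &= 2 \int_1^\infty \frac{dx}{x \sqrt{x^2 - 1}} \left[ 1 + \frac{GM}{bx} \left( \gamma + \frac{x^2}{x + 1} \right) \right] + \mathcal{O}\left[ \left( \frac{GM}{b} \right)^2 \right] \\ &= \pi + 2(\gamma + 1) \left( \frac{GM}{b} \right) + \mathcal{O}\left[ \left( \frac{GM}{b} \right)^2 \right] . \end{aligned} \tag{3.63}$$

The last step uses $\int_1^\infty dx/(x\sqrt{x^2 - 1}) = \pi/2$, $\int_1^\infty dx/(x^2 \sqrt{x^2 - 1}) = 1$ and $\int_1^\infty dx/[(x + 1)\sqrt{x^2 - 1}] = 1$.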
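As a cross-check of the weak-field result (3.63), eq. (3.62) can be evaluated numerically for the GR metric functions of eq. (3.34), for which $(1 + 2\Phi)(1 + 2\Psi) = 1$. With $u = 1/r$ and units $G = c = 1$ the integral becomes $\Delta\phi = 2b \int_0^{u_\star} du / \sqrt{1 - b^2 u^2 + 2Mb^2 u^3}$, and the further substitution $u = u_\star (1 - s^2)$ removes the inverse-square-root singularity at the turning point. This is a sketch under those assumptions; the function name, the midpoint rule and the parameter values are illustrative choices.

```python
import math


def deflection_gr(M, b, n=20000):
    """delta_phi = Delta_phi - pi from eq. (3.62) for the GR metric (G = c = 1 assumed).

    Uses Delta_phi = 2 b * integral_0^{u_star} du / sqrt(F(u)),
    with F(u) = 1 - b^2 u^2 + 2 M b^2 u^3 and u = 1/r.
    """
    def F(u):
        return 1.0 - b * b * u * u + 2.0 * M * b * b * u**3

    # Turning point u_star = 1/r_star: the root of F between 1/b and 1.5/b (valid for b >> M).
    lo, hi = 1.0 / b, 1.5 / b
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if F(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u_star = 0.5 * (lo + hi)

    # u = u_star (1 - s^2) gives a smooth integrand; the midpoint rule never evaluates s = 0.
    ds = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * ds
        total += 2.0 * u_star * s / math.sqrt(F(u_star * (1.0 - s * s))) * ds
    return 2.0 * b * total - math.pi


M, b = 1.0, 1000.0
print(deflection_gr(M, b), 4.0 * M / b)
```

For $M/b = 10^{-3}$ the numerical deflection exceeds $4M/b$ by a fraction of a percent, consistent with the neglected $\mathcal{O}[(GM/b)^2]$ terms, and it vanishes when $M = 0$.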

The desired scattering angle subtracts the result in the absence of gravity, $\delta\phi = \Delta\phi - \pi$, and so (restoring factors of $c$)

$$\delta\phi = \left( \frac{\gamma + 1}{2} \right) \frac{4GM}{b c^2} + \mathcal{O}\left[ \left( \frac{GM}{b c^2} \right)^2 \right] \qquad \text{(radians)} . \tag{3.64}$$

In particular, for General Relativity we have $\gamma = 1$, so applying eq. (3.64) to trajectories that just graze the Sun (i.e. for which $M = M_\odot$ and $b = R_\odot$) gives $\delta\phi \simeq 1.75$ seconds of arc. (An arc-second is defined to be 1/3600 of a degree.)
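The quoted 1.75 arc-seconds follows from eq. (3.64) with the solar values of Table 1. The sketch below assumes rounded values of $G$ and $c$; evaluating it with $\gamma = 0$ gives exactly half the General Relativity value, the same factor of two that appears in Exercise 24.

```python
import math

G = 6.674e-11          # m^3 kg^-1 s^-2 (rounded; assumption of this sketch)
c = 2.998e8            # m/s (rounded)
M_SUN = 1.99e30        # kg, Table 1
R_SUN = 6.96e8         # m, Table 1
RAD_TO_ARCSEC = 180.0 / math.pi * 3600.0


def deflection_arcsec(gamma, M=M_SUN, b=R_SUN):
    """Leading term of eq. (3.64): (1 + gamma)/2 * 4GM/(b c^2), converted to arcseconds."""
    return 0.5 * (1.0 + gamma) * 4.0 * G * M / (b * c**2) * RAD_TO_ARCSEC


print(deflection_arcsec(1.0))  # General Relativity, gamma = 1
print(deflection_arcsec(0.0))  # gamma = 0: half the GR value
```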

Exercise 24: Compute the deflection angle in Newtonian gravity for a particle whose trajectory is bent by gravity as it passes a second particle, as a function of its impact parameter, $b$. Specialize the result to the case where the particle's speed is $v = c$ and show that Newton would have predicted a result that is half as large as Einstein's prediction of $\delta\phi \simeq 4GM/(bc^2)$. Here $M = m_1 + m_2$ is the total mass of the two-particle system.

This effect was first observed in 1919, by searching for the deflection of starlight
as it passes very close to the Sun during a total solar eclipse. The deflection is then
observable as an apparent change in the position of the stars seen near the Sun during
the eclipse as compared with their relative positions when the Sun is elsewhere in
the sky. Because the light rays are bent towards the Sun, during the eclipse their
apparent position as seen from Earth is displaced away from the Sun, by an amount
that falls off with their angular separation from the Sun.
Modern measurements instead use very-long-baseline radio interferometry to observe astrophysical radio sources when these are near the Sun. Interferometry over long baselines provides much improved angular resolution, as well as the advantage that the Sun is not as brilliant a foreground obstruction at radio wavelengths as it is in visible light. The main complication arises from the plasma of ionized particles in the solar corona, which has an index of refraction for radio waves and so can bend their trajectories. Unlike the relativistic effect, however, the influence of the solar corona is frequency dependent, and so can be disentangled by making observations at more than one frequency. The resulting constraint on the PPN parameter $\gamma$ is

γ = 1.007 ± 0.009 , (3.65)

and so agrees well with the prediction γ = 1 of General Relativity.

Shapiro time delay
A second observable related to the trajectories of light rays in the presence of gravity is associated with the change in transit time for light rays that travel very close to the solar surface [2]. This can be measured by sending signals to other planets (such as to space probes orbiting Mars or on the Martian surface) and back, and measuring the result as a function of the planetary position as it passes through superior conjunction (i.e. when it is on the opposite side of the Sun from the Earth).

Figure 6: The geometry of time delay measurements, showing the impact parameter, $d$, point of closest approach, $r_\star$, and the distances to the Earth, $r_E = r_\oplus$, and Mars, $r_M$.

Recall that it takes light about 8 minutes to reach the Earth from the Sun, and it takes a radio signal about 40 minutes to make the roughly 740 million km round trip between Earth and Mars when they are at their most distant. Since the Earth's orbital speed is roughly 30 km/s, during this time the Earth only moves through about 70,000 km, largely at right angles to the line of sight to Mars. As a result we can treat the Earth and Mars as being at rest for the purposes of the calculation.
Suppose the instantaneous Sun-Earth distance is r⊕, the Sun-Mars distance is r_M, and the radial position of the radio signal's closest approach to the Sun is r⋆. In the absence of gravity, the time for a signal to travel from Earth to Mars and back (see Figure 6) is, in units with c = 1,

$$ \Delta t_0 = 2\left(\sqrt{r_M^2 - d^2} + \sqrt{r_\oplus^2 - d^2}\right) , \tag{3.66} $$

where d is the distance from the Sun to the nearest point on the straight line connecting Mars to the Earth. Each square root of the form √(r² − d²) gives the light travel time along the straight-line trajectory from r = d out to r. The factor of 2 appears because we seek the round-trip time. Notice that d satisfies d < r⋆, unlike the impact parameter b in the calculation of the deflection of light. This is because the relevant straight-line trajectory is the one passing directly from Earth to Mars, and not the one tangent to the asymptotic light ray at infinite distance.
With gravity present, the radius of closest approach is found by asking where dr/dτ = 0 along the geodesic trajectory. This leads to eq. (3.58), which states W_eff(r⋆) = 0, and so r⋆ = b − GM + · · ·, where b = L/E.

The time elapsed during the radio signal's trip, as seen by a distant motionless observer, is found by integrating dt/dr = (dt/dτ)/(dr/dτ), using eqs. (3.48) and (3.45). That is,

$$ \left(\frac{dr}{dt}\right)^2 \left(\frac{E}{1+2\Phi}\right)^2 + W_{\rm eff}(r) = 0 , \tag{3.67} $$

and so the round-trip time elapsed is ∆t = 2[T(r⋆, r_M) + T(r⋆, r⊕)], with

$$ T(r_\star, r_x) = \int_{r_\star}^{r_x} \left(\frac{dt}{dr}\right) dr = \int_{r_\star}^{r_x} dr \, \sqrt{\frac{1+2\Psi}{1+2\Phi}} \left[1 - \frac{b^2}{r^2}(1+2\Phi)\right]^{-1/2} . \tag{3.68} $$

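Writing r = x r⋆ and expanding to leading order in GM/r⋆ then gives

$$ \begin{aligned} T(r_\star, r_x) &= r_\star \int_1^{r_x/r_\star} dx\, \sqrt{\frac{1+2\Psi}{1+2\Phi}} \left[1 - \frac{b^2}{x^2 r_\star^2}(1+2\Phi)\right]^{-1/2} \\ &= r_\star \int_1^{r_x/r_\star} \frac{dx}{\sqrt{x^2-1}} \left[ x + \frac{GM}{r_\star}\left(1 + \gamma + \frac{1}{x+1}\right) + O\left(\frac{GM}{r_\star}\right)^2 \right] \\ &\simeq \sqrt{r_x^2 - r_\star^2} + GM\left[(1+\gamma)\cosh^{-1}\left(\frac{r_x}{r_\star}\right) + \tanh\left(\frac12 \cosh^{-1}\frac{r_x}{r_\star}\right)\right] . \end{aligned} \tag{3.69} $$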

These expressions may be simplified using cosh⁻¹x = ln(x + √(x² − 1)) and tanh(½z) = (e^z − 1)/(e^z + 1), and so

$$ \tanh\left(\frac12 \cosh^{-1} x\right) = \frac{x - 1 + \sqrt{x^2-1}}{x + 1 + \sqrt{x^2-1}} = \sqrt{\frac{x-1}{x+1}} , \tag{3.70} $$

to get (with c's re-instated) ∆t = 2[T(r⋆, r_M) + T(r⋆, r⊕)], with

$$ cT(r_\star, r_x) \simeq \sqrt{r_x^2 - r_\star^2} + \frac{GM}{c^2}\left[(1+\gamma) \ln\left(\frac{r_x + \sqrt{r_x^2 - r_\star^2}}{r_\star}\right) + \sqrt{\frac{r_x - r_\star}{r_x + r_\star}}\right] , \tag{3.71} $$

up to terms of order (GM)²/(r⋆c⁴).
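The step from eq. (3.69) to eqs. (3.70) and (3.71) can be sanity-checked numerically. The Python sketch below evaluates the first-order integral in the second line of eq. (3.69) by quadrature and compares it with the closed form. It works in units with r⋆ = c = 1 and writes g = GM/r⋆. The substitution x = cosh z is our own choice, made only to turn dx/√(x² − 1) into dz and so remove the endpoint singularity. The helper names and parameter values are illustrative.

```python
import math

def integrand(z, g, gamma):
    # With x = cosh z, the O(GM/r_star) integrand of eq. (3.69) becomes smooth in z.
    x = math.cosh(z)
    return x + g * (1.0 + gamma + 1.0 / (x + 1.0))

def simpson(f, a, b, n=4000):
    # Composite Simpson rule with an even number of panels.
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def T_quadrature(X, g, gamma):
    # T(r_star, r_x) / r_star with X = r_x / r_star, from the second line of eq. (3.69).
    return simpson(lambda z: integrand(z, g, gamma), 0.0, math.acosh(X))

def T_closed_form(X, g, gamma):
    # The same quantity written using eqs. (3.70) and (3.71), with r_star = c = 1.
    root = math.sqrt(X * X - 1.0)
    return root + g * ((1.0 + gamma) * math.log(X + root) + math.sqrt((X - 1.0) / (X + 1.0)))

X, g, gamma = 300.0, 2e-6, 1.0   # illustrative: r_x = 300 r_star, GM/r_star = 2e-6
diff = abs(T_quadrature(X, g, gamma) - T_closed_form(X, g, gamma))
print(f"quadrature vs closed form: |difference| = {diff:.1e}")
```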


In solar-system applications this may be simplified by using r_x ≫ r⋆ to drop all terms suppressed by (r⋆/r_x)². The size of these terms is set by (R⊙/r⊕)² ≃ 2 × 10⁻⁵, an amount about 10 times larger than GM⊙/(R⊙c²). In this case the total time delay becomes

$$ \Delta t \simeq \Delta t_0 + \left(\frac{1+\gamma}{2}\right) \frac{4GM}{c^3} \ln\left(\frac{4 r_M r_\oplus}{r_\star^2}\right) . \tag{3.72} $$

This neglects the product (r⋆/r⊕)²(GM/r⋆). As a result, the difference between d and r⋆ can be neglected in the first term, which can therefore be written as the transit time, ∆t₀, found in the absence of gravity, eq. (3.66).
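To see that eq. (3.72) gives a delay of a few hundred microseconds for Mars at superior conjunction, here is a short Python sketch. The numerical inputs are assumed round values, not data from the text: GM⊙ = 1.327 × 10²⁰ m³/s², r⊕ = 1 AU, and r_M = 1.524 AU (Mars's mean orbital radius). The closest approach is taken to graze the solar surface, r⋆ = R⊙ ≈ 6.96 × 10⁸ m. The round-trip time ∆t₀ is approximated by 2(r_M + r⊕)/c, which is adequate because d ≪ r⊕.

```python
import math

C = 2.998e8            # speed of light [m/s]
GM_SUN = 1.327e20      # solar gravitational parameter [m^3/s^2] (assumed value)
AU = 1.496e11          # astronomical unit [m]
R_SUN = 6.96e8         # solar radius [m]

def shapiro_excess(r_earth, r_mars, r_star, gamma=1.0):
    """Gravitational excess round-trip time from eq. (3.72), in seconds."""
    return 0.5 * (1.0 + gamma) * 4.0 * GM_SUN / C**3 * math.log(4.0 * r_mars * r_earth / r_star**2)

r_earth, r_mars = 1.0 * AU, 1.524 * AU
excess = shapiro_excess(r_earth, r_mars, R_SUN)
round_trip = 2.0 * (r_mars + r_earth) / C   # Delta t_0 with d << r, cf. eq. (3.66)
print(f"excess delay ~ {excess * 1e6:.0f} microseconds")
print(f"round trip ~ {round_trip / 60:.0f} minutes; fractional effect ~ {excess / round_trip:.1e}")
```

Setting gamma=0.0 halves the excess delay, which is why the measurement constrains the PPN parameter γ directly.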

The size of this effect for signals sent to Mars during superior conjunction is about 250 µs out of a total round-trip travel time of about 40 minutes. Although this is only a change of one part in 10⁷, it can be measured precisely thanks to the great stability of atomic clocks, which can be accurate to one part in 10¹². The orbits of the planets are also known precisely enough to fix their positions to within about a kilometre, so the timing effect is not swamped by uncertainties in the distance either. The biggest measurement errors come from propagation through the ions of the solar corona, as was the case for measurements of the solar deflection of light. The resulting constraint on the PPN parameter γ from the Viking Mars mission is [3]

γ = 1.000 ± 0.002 . (3.73)

More recent measurements of the same effect for signals sent to the Cassini probe at
Saturn have improved this accuracy to [4]

γ − 1 = (−1.3 ± 5.2) × 10⁻⁵ , (3.74)

again in good agreement with the prediction γ = 1 of General Relativity.

Orbital precession
Another classic test of General Relativity within the solar system concerns the orbits of planets and satellites rather than the motion of light rays. In this case the relevant equations are those for a time-like geodesic rather than a null one. The radial dependence is therefore given by eqs. (3.48) and (3.49) with ζ = 1 rather than zero:

$$ \left(\frac{dr}{d\tau}\right)^2 + W_{\rm eff}(r) = 0 , \tag{3.75} $$

with

$$ W_{\rm eff}(r) = \frac{1}{1+2\Psi}\left[\frac{L^2}{r^2} + 1 - \frac{E^2}{1+2\Phi}\right] . \tag{3.76} $$

Conservation of energy and angular momentum is given by eqs. (3.45) and (3.46),

$$ E = (1+2\Phi)\frac{dt}{d\tau} \quad \text{and} \quad L = r^2 \frac{d\phi}{d\tau} . \tag{3.77} $$

The Newtonian Limit

It is useful to have in mind the properties of the Newtonian orbits before investigating
their relativistic corrections.

Recall that for orbits, the Newtonian limit of these equations corresponds to Φ = −GM/r = O(v²) and dt/dτ = 1 + O(v²), and so E = 1 + ε with ε = O(v²). In this case Ψ = O(v²) only contributes at O(v⁴) and so can be completely neglected. This leaves eq. (3.75) in the familiar form from the Newtonian Kepler problem,

$$ \frac12\left(\frac{dr}{d\tau}\right)^2 + \frac{L^2}{2r^2} + \Phi(r) = \varepsilon . \tag{3.78} $$

Figure 7: A plot of the Newtonian effective potential against r.

In particular we see that L = r²(dφ/dτ) is the usual specific angular momentum, while ε plays the role of the total Newtonian energy. The effective potential appearing here, V_eff(r) = (L²/2r²) − GM/r, is plotted in Fig. 7. The plot shows the divergence V_eff → +∞ as r → 0 when L ≠ 0, which is how angular momentum prevents an orbiting particle from approaching too close to r = 0. For r → ∞ the limit is instead V_eff → 0 from below, so orbits with ε ≥ 0 escape to infinity while those with ε < 0 are bound.
The bound orbits are confined to a finite range of radii, r− ≤ r ≤ r+, whose endpoints are determined by the condition dr/dτ = 0. Eq. (3.78) allows these to be determined in terms of the conserved quantities L and ε, since they must be roots of

$$ \frac{L^2}{2r^2} - \frac{GM}{r} = \varepsilon . \tag{3.79} $$

The smaller of the two roots, r−, corresponds to the point of closest approach to the Sun, and is called its perihelion. Aphelion⁶ is the point on the orbit furthest from the Sun, given by the larger of the two roots, r = r+. Solving eq. (3.79) gives the explicit expressions

$$ \frac{1}{r_\pm} = \frac{GM \mp \sqrt{(GM)^2 - 2L^2|\varepsilon|}}{L^2} , \tag{3.80} $$

or, equivalently,

$$ r_\pm = \frac{GM \pm \sqrt{(GM)^2 - 2L^2|\varepsilon|}}{2|\varepsilon|} . \tag{3.81} $$

⁶ For orbits about the Earth the corresponding points are instead called perigee and apogee, and for orbits about other stars the terms are periastron and apastron.

The explicit shape, r(φ), of the bound orbits in the Newtonian case is found by combining eqs. (3.78) and (3.77) to obtain

$$ \frac12\left(\frac{dr}{d\phi}\right)^2 = \frac12\left(\frac{dr/d\tau}{d\phi/d\tau}\right)^2 = \left(\frac{r^2}{L}\right)^2\left(\varepsilon + \frac{GM}{r} - \frac{L^2}{2r^2}\right) . \tag{3.82} $$

This can be explicitly integrated by changing variables to u = 1/r, giving solutions u = A + B cos φ, where A and B are constants. These describe bound orbits that are ellipses, with the constants A and B related to their semi-major axis a and eccentricity 0 ≤ e < 1. In terms of a and e:

$$ r(\phi) = \frac{a(1-e^2)}{1 + e\cos\phi} . \tag{3.83} $$

This shows that the points closest to and furthest from the Sun are given by r± = a(1 ± e). Comparing this with the expressions for r± in terms of L and ε allows these conserved quantities to be expressed in terms of a and e:

$$ L^2 = GMa(1-e^2) \quad \text{and} \quad \varepsilon = -\frac{GM}{2a} , \tag{3.84} $$

and so L²/(2|ε|) = a²(1 − e²) = r+ r− and r+ + r− = 2a = GM/|ε|.
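The relations between (L, ε) and (a, e) are easy to check in a few lines of Python. The sketch below starts from illustrative values of a and e, builds L² and ε from eq. (3.84), and recovers the turning points from eq. (3.81). It assumes units with GM = 1, and the function name is an illustrative choice.

```python
import math

GM = 1.0  # units with GM = 1 (an assumption made for illustration)

def turning_points(L2, eps):
    """Perihelion and aphelion radii (r_minus, r_plus) from eq. (3.81), for a bound orbit eps < 0."""
    disc = math.sqrt(GM**2 - 2.0 * L2 * abs(eps))
    return (GM - disc) / (2.0 * abs(eps)), (GM + disc) / (2.0 * abs(eps))

a, e = 2.0, 0.3                 # illustrative semi-major axis and eccentricity
L2 = GM * a * (1.0 - e**2)      # eq. (3.84)
eps = -GM / (2.0 * a)           # eq. (3.84)
r_minus, r_plus = turning_points(L2, eps)

print(r_minus, a * (1 - e), r_plus, a * (1 + e))
```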
There are two different ways to define the period of the orbit, both of which give the same result in the Newtonian limit. The first, P_r, is defined in terms of the radial motion as the time taken to move between successive perihelia. It can be found by recognizing that dt/dτ ≃ 1 in the Newtonian limit and integrating eq. (3.78):

$$ \begin{aligned} P_r &= 2\int_{r_-}^{r_+} \left(\frac{dt}{dr}\right) dr = 2 \int_{r_-}^{r_+} dr \left[2\varepsilon + \frac{2GM}{r} - \frac{L^2}{r^2}\right]^{-1/2} \\ &= \frac{2}{\sqrt{2|\varepsilon|}} \int_{r_-}^{r_+} \frac{r\, dr}{\sqrt{(r_+ - r)(r - r_-)}} = \frac{\pi(r_+ + r_-)}{\sqrt{2|\varepsilon|}} , \end{aligned} \tag{3.85} $$

and so

$$ \left(\frac{P_r}{2\pi}\right)^2 = \frac{(GM)^2}{(2|\varepsilon|)^3} = \frac{a^3}{GM} , \tag{3.86} $$

in agreement with Newton's modification of Kepler's Third Law.
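Eqs. (3.85) and (3.86) can also be checked numerically. The Python sketch below evaluates the radial-period integral using the substitution r = ½(r+ + r−) − ½(r+ − r−) cos θ, which removes the endpoint singularities because dr/√((r+ − r)(r − r−)) = dθ. It then compares the result with Kepler's third law. The orbital elements used for Mercury (a ≈ 5.79 × 10¹⁰ m, e ≈ 0.2056) and the value of GM⊙ are assumed standard values, not taken from the text.

```python
import math

GM_SUN = 1.327e20         # [m^3/s^2] (assumed value)
a, e = 5.791e10, 0.2056   # Mercury's semi-major axis [m] and eccentricity (assumed values)

r_minus, r_plus = a * (1 - e), a * (1 + e)
abs_eps = GM_SUN / (2 * a)          # |epsilon| from eq. (3.84)

def radial_period(n=200):
    # P_r = (2 / sqrt(2|eps|)) * integral of r dr / sqrt((r+ - r)(r - r-)), eq. (3.85),
    # written as an integral over theta in [0, pi] of r(theta); midpoint rule.
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        total += 0.5 * (r_plus + r_minus) - 0.5 * (r_plus - r_minus) * math.cos(theta)
    return 2.0 / math.sqrt(2.0 * abs_eps) * total * h

P_r = radial_period()
P_kepler = 2 * math.pi * math.sqrt(a**3 / GM_SUN)   # eq. (3.86)
print(f"P_r = {P_r / 86400:.2f} days, Kepler: {P_kepler / 86400:.2f} days")
```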
A second way to define the orbital period is in terms of the angular motion, as the time, P_φ, required to sweep out 2π radians:

$$ P_\phi = \int_0^{2\pi} \left(\frac{dt}{d\phi}\right) d\phi = \int_0^{2\pi} \frac{r^2}{L}\, d\phi = \frac{2a^2(1-e^2)^2}{L} \int_0^{\pi} \frac{d\phi}{(1 + e\cos\phi)^2} = \frac{2a^2(1-e^2)^2}{L}\, \frac{\pi}{(1-e^2)^{3/2}} = P_r . \tag{3.87} $$

Because these two notions of period agree with one another, the Newtonian orbit
passes through precisely the same points every time φ cycles through 2π radians,
and so is said to be closed.
More generally this is not the case in relativistic systems. Any mismatch P_r ≠ P_φ implies that the orbit precesses, with successive perihelia occurring at different angular positions. These are displaced by the perihelion shift, δφ_prec := ∆φ − 2π, with

$$ \Delta\phi := 2\int_{r_-}^{r_+} \left(\frac{d\phi}{dr}\right) dr . \tag{3.88} $$

For the Newtonian orbits δφ_prec = 0, because

$$ \Delta\phi = 2L\int_{r_-}^{r_+} \frac{dr}{r\sqrt{2GMr - 2|\varepsilon| r^2 - L^2}} = \frac{2L}{\sqrt{2|\varepsilon|}} \int_{r_-}^{r_+} \frac{dr}{r\sqrt{(r_+ - r)(r - r_-)}} = \sqrt{a^2(1-e^2)}\, \frac{2\pi}{\sqrt{r_+ r_-}} = 2\pi , \tag{3.89} $$

as expected, since P_r = P_φ.
Relativistic Precession
We may now see how the leading relativistic corrections change these Newtonian re-
sults. The main observable effect from the point of view of testing General Relativity
is the violation of the relation Pr = Pφ that relativistic effects induce, leading to a
nonzero prediction for the orbital precession angle, δφprec .
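To this end we recompute eq. (3.88) by going back to the full expressions, eqs. (3.75) and (3.77), for the orbital shape, r(φ). These give

$$ \left(\frac{du}{d\phi}\right)^2 = \frac{1}{r^4}\left(\frac{dr}{d\phi}\right)^2 = -\frac{1}{L^2(1+2\Psi)}\left[\frac{L^2}{r^2} + 1 - \frac{E^2}{1+2\Phi}\right] , \tag{3.90} $$

where u := 1/r. Expanding this out to next-to-leading order in powers of GM/r = GMu and E = 1 + ε gives

$$ \left(\frac{du}{d\phi}\right)^2 \simeq \frac{1}{L^2}\Big[\left(-L^2u^2 + 2GMu + 2\varepsilon\right)(1 - 2\gamma GMu) + 2(2+\gamma-\beta)(GMu)^2 + \varepsilon^2 + 4\varepsilon GMu\Big] . \tag{3.91} $$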
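The relativistic correction terms have several effects. First, they change the positions of the zeroes of the right-hand side of eq. (3.91) to u± = u0± + δu±, where

$$ \begin{aligned} \delta u_\pm &\simeq \frac{(2+\gamma-\beta)(GMu_{0\pm})^2 + \frac12\varepsilon^2 + 2\varepsilon(GMu_{0\pm})}{L^2 u_{0\pm} - GM} \\ &= \frac{(2+\gamma-\beta)(GMu_{0\pm})^2 + \frac12\varepsilon^2 + 2\varepsilon(GMu_{0\pm})}{\pm\sqrt{(GM)^2 + 2L^2\varepsilon}} \\ &\simeq \pm\frac{GM}{a^2 e}\left[\frac{2+\gamma-\beta}{(1\pm e)^2} + \frac18 - \frac{1}{1\pm e}\right] , \end{aligned} \tag{3.92} $$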

in which the second line uses eq. (3.80) to simplify the denominator, and the third
line expresses L, ε and u0± in terms of a and e using the equations for the Newtonian
orbits.
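The angle ∆φ then becomes

$$ \begin{aligned} \Delta\phi &= 2L \int_{u_+}^{u_-} \frac{du}{\sqrt{Au^3 + Bu^2 + Cu + D}} \\ &\simeq 2\pi - L\int_{u_+}^{u_-} du\, \frac{\delta A\, u^3 + \delta B\, u^2 + \delta C\, u + \delta D}{(B_0u^2 + C_0u + D_0)^{3/2}} , \end{aligned} \tag{3.93} $$

where A = δA, B = B₀ + δB, C = C₀ + δC and D = D₀ + δD, with

$$ B_0 = -L^2 , \quad C_0 = 2GM \quad \text{and} \quad D_0 = 2\varepsilon , \tag{3.94} $$

while

$$ \delta A = 2\gamma GML^2 , \quad \delta B = 2(2-\gamma-\beta)(GM)^2 , \quad \delta C = 4(1-\gamma)\varepsilon GM \quad \text{and} \quad \delta D = \varepsilon^2 . \tag{3.95} $$

Figure 8: The precession of an elliptical orbit, such as is caused by deviations from the inverse-square force law.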

The integral in the second line of eq. (3.93) is subtle to evaluate because it diverges as u → u0±. Because δu+ > 0 and δu− < 0, the range of integration does not include u0±. Even so, this near-divergence complicates the expansion of the integral in powers of GM/a. Carrying out this evaluation gives (restoring factors of c)

$$ \delta\phi_{\rm prec} = \Delta\phi - 2\pi = \left(\frac{2 + 2\gamma - \beta}{3}\right)\frac{6\pi GM}{(1-e^2)\,a c^2} . \tag{3.96} $$

Exercise 25: Verify that eq. (3.96) follows from eq. (3.93), as claimed.
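Exercise 25 asks for an analytic derivation. The Python sketch below is an independent numerical cross-check in the special case β = γ = 1: it computes ∆φ for an exact Schwarzschild orbit and compares δφ_prec with eq. (3.96). It goes beyond the PPN expansion used in the text by assuming the exact Schwarzschild orbit equation, in units with c = 1,

(du/dφ)² = 2GM(u − u₁)(u − u₂)(u − u₃).

Here u₁ = 1/r+ and u₂ = 1/r− are the turning points and u₁ + u₂ + u₃ = 1/(2GM). The substitution u = ½(u₁ + u₂) + ½(u₂ − u₁) cos θ is our choice, made because it turns the integrand into a smooth function of θ. Agreement should improve as GM/a → 0, since eq. (3.96) is only the leading term.

```python
import math

def precession_exact(GM, a, e, n=2000):
    """delta phi_prec per orbit for an exact Schwarzschild orbit (c = 1) with turning points a(1 -/+ e)."""
    u1, u2 = 1.0 / (a * (1 + e)), 1.0 / (a * (1 - e))   # aphelion and perihelion values of u = 1/r
    u3 = 1.0 / (2.0 * GM) - u1 - u2                     # third root of the cubic
    h = math.pi / n
    total = 0.0
    for i in range(n):                                   # midpoint rule in theta
        u = 0.5 * (u1 + u2) + 0.5 * (u2 - u1) * math.cos((i + 0.5) * h)
        total += 1.0 / math.sqrt(2.0 * GM * (u3 - u))
    delta_phi = 2.0 * total * h                          # two half-orbits, as in eq. (3.88)
    return delta_phi - 2.0 * math.pi

def precession_eq_396(GM, a, e, beta=1.0, gamma=1.0):
    # Leading-order prediction of eq. (3.96) with c = 1.
    return (2 + 2 * gamma - beta) / 3 * 6 * math.pi * GM / ((1 - e**2) * a)

ratios = {}
for GM_over_a in (1e-3, 1e-4, 1e-5):
    ratios[GM_over_a] = precession_exact(GM_over_a, 1.0, 0.3) / precession_eq_396(GM_over_a, 1.0, 0.3)
    print(f"GM/a = {GM_over_a:.0e}: exact / eq. (3.96) = {ratios[GM_over_a]:.6f}")
```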

Astronomy has a long history of precise observations of planetary orbits, and most orbits are observed to precess. However, several complications must be addressed before these observations can be compared with the prediction, eq. (3.96). First of all, Newton's law of gravity predicts strictly elliptical orbits only under two idealizations: a planet orbiting the Sun in the absence of the gravitational pull of all the other planets, and a perfectly spherical Sun. Deviations from these idealizations perturb the orbits, typically causing them to precess. The calculated contribution of these more mundane perturbations must be subtracted from any observed precession before any relativistic effects can be identified.
Deviations of this type from the predictions of Newtonian mechanics were identified very early. They were historically used to predict the existence of some of the outer planets before their actual discovery. By the turn of the 20th century all such planetary effects had been accounted for, and only one observation remained in disagreement with predictions: a small anomalous precession in the orbit of Mercury. Mercury's perihelion is measured to precess by 5599.7 arc-seconds per century relative to the vernal equinox, i.e. the place in the sky where the Sun crosses the celestial equator in the spring as seen from the Earth. For comparison, the amounts expected within Newtonian gravity are given in the first three rows of the following table. These sum to the Newtonian prediction of 5557.0 arc-sec/century.

Source                       Amount (arcsec/century)
Earth's spin precession      5025.6
Other planets                 531.4
Solar oblateness                0.03
Relativity                     42.98 ± 0.04
Total                        5600.0

The difference between the observations and the Newtonian prediction, 43 arc-sec/century, is larger than the theoretical and observational errors, and its interpretation remained a puzzle until the discovery of General Relativity. Remarkably, the contribution of eq. (3.96) for β = γ = 1 is precisely the amount required to bring theory into agreement with observations. This was one of the clinchers for Einstein and others in the early days of General Relativity. Given the bounds on γ from the deflection of light and the Shapiro time delay, the agreement of predictions with the orbit of Mercury gives the following limit on β:

β = 1.000 ± 0.003 . (3.97)
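The 43 arc-seconds per century attributed to relativity can be reproduced from eq. (3.96) with β = γ = 1. The per-orbit shift is simply multiplied by the number of orbits per century. The Python sketch below does this for the three innermost planets. The orbital elements are assumed standard textbook values of a and e, not data from these notes. The periods come from Kepler's third law in the form P = a^{3/2} years for a in AU, neglecting the planets' own masses.

```python
import math

GM_SUN = 1.32712440018e20        # [m^3/s^2]
C = 2.99792458e8                 # [m/s]
AU = 1.495978707e11              # [m]
ARCSEC_PER_RAD = 180 * 3600 / math.pi

ORBITS = {                       # (semi-major axis [AU], eccentricity), assumed values
    "Mercury": (0.387098, 0.205630),
    "Venus": (0.723332, 0.006772),
    "Earth": (1.000001, 0.016709),
}

def gr_precession_arcsec_per_century(a_au, e, beta=1.0, gamma=1.0):
    a = a_au * AU
    per_orbit = (2 + 2 * gamma - beta) / 3 * 6 * math.pi * GM_SUN / ((1 - e**2) * a * C**2)  # eq. (3.96)
    period_years = a_au ** 1.5                                                              # Kepler's third law
    return per_orbit * (100.0 / period_years) * ARCSEC_PER_RAD

predictions = {name: gr_precession_arcsec_per_century(a, e) for name, (a, e) in ORBITS.items()}
for name, value in predictions.items():
    print(f"{name}: {value:.1f} arcsec/century")
```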

There is an analogous relativistic precession of the orbits of the other planets and of some asteroids. The orbits of the remaining inner planets are so close to circular that their precession is hard to measure, but all extant observations agree well with the predictions. The comparison for the innermost planets and the asteroid Icarus is given in the following table [5].

Object     GR prediction (arcsec/century)    Observation (arcsec/century)
Mercury    43.0                              43.1 ± 0.05
Venus       8.6                               8.4 ± 4.8
Earth       3.8                               5.0 ± 1.2
Icarus     10.3                               9.8 ± 0.8

4. Field Equations for Curved Space

The content of general relativity has been summarized (by John Wheeler) in the statements that “Spacetime tells Matter how to move” and “Matter tells Spacetime how to curve”.
The previous section explored the implications of generalizing Newton's First Law to the assumption that particles move on geodesics in the absence of any non-gravitational forces. This is how spacetime tells matter how to move. This section describes the field equations, which express how matter tells spacetime how to curve. These equations are needed to predict which metric describes the gravitational field in any given situation.

4.1 Gravity as curvature

The first step towards formulating the field equations is to identify how they should depend on the metric. To this end we seek a quantity that expresses precisely what is different about a gravitating geometry. This quantity should be a tensor, so that all observers agree on the distinction it draws (much as they all agree on what it means to be a geodesic).

Freely falling observers


The principle of equivalence states that a freely-falling observer in a gravitational field finds the local laws of physics to be the same as those of special relativity. These observers use coordinates for which g_µν = η_µν and Γ^µ_νλ = 0 at the relevant point, so geodesics there satisfy d²x^µ/dτ² = 0. Mathematically, it is always possible to find such an observer at any given point. A standard choice of such locally inertial coordinates is given by Riemann normal coordinates.
In general it is not possible to find such observers simultaneously for all of the points throughout an entire region of spacetime. According to Einstein, the failure to be able to do so is the signature of a gravitational field. We therefore seek a tensor that distinguishes a metric describing a gravitational field from one that is simply Minkowski space written in a bizarre set of coordinates.
The issue is whether or not Γ^µ_νλ can be made to vanish throughout an entire region, even though this is always possible at any given point. The obstruction therefore has to do with the ability to choose coordinates that set the derivatives, ∂_ρΓ^µ_νλ, to zero at a given point, as well as Γ^µ_νλ itself. We therefore expect the tensor that expresses the obstruction to involve derivatives of the Christoffel symbols, and so second derivatives of the metric.

The tensor that provides the obstruction to making the Christoffel symbols vanish throughout a region is the natural generalization to spacetime of the curvature encountered in earlier sections when describing the differential geometry of space. That is, the existence of observers for which Γ^µ_νλ = 0 throughout some region can be shown to be equivalent to the vanishing of the Riemann curvature tensor, R^µ_νλρ, throughout the same region, where

$$ R^\mu{}_{\nu\lambda\rho} = \partial_\lambda \Gamma^\mu_{\nu\rho} + \Gamma^\mu_{\lambda\sigma}\Gamma^\sigma_{\nu\rho} - (\lambda \leftrightarrow \rho) . \tag{4.1} $$

Recalling that the Christoffel symbols are defined by

$$ \Gamma^\mu_{\nu\lambda} = \frac12 g^{\mu\rho}\left(\partial_\nu g_{\lambda\rho} + \partial_\lambda g_{\nu\rho} - \partial_\rho g_{\nu\lambda}\right) , \tag{4.2} $$

it is clear that the Riemann tensor involves second derivatives of the metric tensor.
Because R^µ_νλρ transforms as a tensor, if it vanishes in any one set of coordinates it must also vanish in all others. The local laws of nature can be made into those of special relativity simply by transforming to an appropriate freely-falling frame. This does not mean that all the effects of gravity are removed in such a frame, because the curvature tensor, R^µ_νλρ, cannot be similarly removed by a coordinate transformation. Einstein's point with the principle of equivalence was not that gravity is a purely fictitious, frame-dependent effect. Rather, the tidal forces of gravity are present for all observers, and it is the curvature of spacetime that encodes these tidal effects.

4.2 Einstein’s Field Equations


We may now state the field equation that expresses how sources of mass and energy give rise to gravitational fields. It generalizes the Newtonian field equation for the gravitational potential, Φ:

$$ \nabla^2\Phi = 4\pi G\mu , \tag{4.3} $$

where µ is the local mass density. We have seen that the Newtonian potential, Φ, is naturally expressed as a component of the metric, g_µν. Since eq. (4.3) involves second derivatives of Φ, it is natural to seek a generalization with the curvature tensor appearing on the left-hand side.
Einstein proposed that the spacetime curvature tensor, R^µ_νλρ, is related to the local distribution, T_µν, of stress-energy by the following field equations:

$$ R_{\mu\nu} - \frac12 R\, g_{\mu\nu} = \frac{8\pi G}{c^2}\, T_{\mu\nu} , \tag{4.4} $$

where T_µν is the stress-energy tensor that describes the conserved energy and momentum of matter, R_µν = R^λ_µλν is the spacetime's Ricci tensor and R = g^µν R_µν is its Ricci scalar. The left-hand side of this equation is the most general one that satisfies the following three conditions:

1. It transforms as a symmetric tensor (as does T_µν);

2. It involves exactly two derivatives of g_µν, which is the relativistic generalization of the Newtonian potential Φ because g_tt ≈ −1 − 2Φ in the non-relativistic limit; and

3. It is covariantly conserved, in the sense that ∇^µ(R_µν − ½ R g_µν) = 0.




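In the above ∇_µ denotes the covariant derivative, defined so that ∇_µ T^{µ...} = g^{µν}∇_µ T_{ν...}, and

$$ \nabla_\mu T^{\alpha_1\dots}{}_{\beta_1\dots} = \partial_\mu T^{\alpha_1\dots}{}_{\beta_1\dots} + \Gamma^{\alpha_1}_{\mu\rho} T^{\rho\dots}{}_{\beta_1\dots} + \cdots - \Gamma^{\rho}_{\mu\beta_1} T^{\alpha_1\dots}{}_{\rho\dots} - \cdots . \tag{4.5} $$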

It is defined in this way so that it has two properties. First, ∇_µ T_{ν...λ} transforms as a tensor under coordinate changes if T_{ν...λ} does. Second, for a freely-falling observer (for whom Γ^µ_νλ = 0 at a particular point) it reduces to an ordinary partial derivative, ∂_µ T_{ν...λ}. Given these properties, the third condition listed above is motivated by the generalization to curved space,

$$ \nabla_\mu T^{\mu\nu} = \partial_\mu T^{\mu\nu} + \Gamma^\mu_{\mu\alpha} T^{\alpha\nu} + \Gamma^\nu_{\mu\alpha} T^{\mu\alpha} = 0 , \tag{4.6} $$

of the conservation of stress-energy, eq. (2.66). Because this is a tensor equation, if ∇_µT^{µν} vanishes for any one observer it must vanish for all observers. Energy conservation requires ∇_µT^{µν} = 0, because eq. (4.6) reduces to eq. (2.66) for a freely falling observer, for whom Γ^µ_νλ vanishes at a particular point. Consistency of eq. (4.4) then requires its left-hand side to be covariantly conserved as well.
Two comments are in order about Requirement 2, that the left-hand side involve only two derivatives:

1. Requirement 2 should not be regarded as fundamental. Our observational knowledge of gravity is largely confined to comparatively weak gravitational fields, so it should instead be regarded as the leading contribution in an expansion of the left-hand side in powers of the curvature. As such it expresses our ignorance about strong curvatures, and any inferences drawn from General Relativity should be expected to become suspect when the curvatures become sufficiently large. How large? This is not known, but we should beware whenever any dimensionless measure of curvature becomes large. Examples, in units with ħ = c = 1, are G g^µν R_µν or G² R_µνλρ R^µνλρ.

2. Requirement 2 states that the left-hand side should contain precisely two derivatives of the metric. However, if this equation is to be regarded as a derivative expansion, one should really keep all terms having up to two derivatives. In fact there is one possible term involving no derivatives at all, and this should be expected to dominate if derivatives are small. Including this term revises eq. (4.4) to

$$ R_{\mu\nu} - \frac12 R\, g_{\mu\nu} + \lambda\, g_{\mu\nu} = 8\pi G\, T_{\mu\nu} , \tag{4.7} $$

where the constant λ is known as the ‘cosmological term’. At present there is evidence from cosmology that λ is actually nonzero. It is, however, very small compared with the contribution of the right-hand side of eq. (4.7) in all applications apart from cosmology. For simplicity we ignore this term in the following sections, but return to it in the later discussion of cosmology.

Taking the trace of eq. (4.4) implies R = −8πG T, where T = g^µν T_µν is the trace of the stress tensor. Using this in eq. (4.4) gives the Einstein equations in their trace-reversed form:

$$ R_{\mu\nu} = 8\pi G\left(T_{\mu\nu} - \frac12 T\, g_{\mu\nu}\right) . \tag{4.8} $$

In particular, a vacuum spacetime is one for which no matter is present, and so T_µν = 0. Eq. (4.8) implies any such spacetime is Ricci flat: R_µν = 0.
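The trace step can be spelled out in two lines. Working in units with c = 1 and four spacetime dimensions, g^µν g_µν = 4. Contracting eq. (4.4) with g^µν and then substituting back gives:

```latex
\begin{aligned}
g^{\mu\nu}\left(R_{\mu\nu} - \tfrac12 R\, g_{\mu\nu}\right) &= R - 2R = -R = 8\pi G\, T
  \quad\Longrightarrow\quad R = -8\pi G\, T , \\
R_{\mu\nu} &= \tfrac12 R\, g_{\mu\nu} + 8\pi G\, T_{\mu\nu}
  = 8\pi G\left(T_{\mu\nu} - \tfrac12 T\, g_{\mu\nu}\right) .
\end{aligned}
```

Setting T_µν = 0 then gives R = 0 and hence R_µν = 0 directly.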

Exercise 26: Use the definitions to compute the Ricci scalar for an n-dimensional space whose metric is g_µν = e^{2φ} η_µν, where φ(x) is a scalar function and η_µν is the usual flat Minkowski metric. Show that it is given by

$$ R = -e^{-2\phi}\left[2(n-1)\,\partial^2\phi + (n-1)(n-2)(\partial\phi)^2\right] , \tag{4.9} $$

where ∂²φ := η^µν ∂_µ∂_νφ and (∂φ)² := η^µν ∂_µφ ∂_νφ.

4.3 Rotationally Invariant Solutions


This section now derives some of the solutions to Einstein’s equations which describe
the geometries outside of symmetric gravitating sources, such as stars, planets or
black holes.

Birkhoff's Theorem: Spherical symmetry implies static


Consider first the geometry outside of a spherical distribution of matter. It is assumed
that there is no matter outside of the distribution, and so Tµν = 0 in the region
of interest. The goal of this section is to identify the most general solution to the
vacuum Einstein equations which is spherically symmetric. We do so without making
the additional assumption of time-independence.
We saw earlier that it is always possible to choose coordinates in a spherically
symmetric geometry so that the metric takes the form of eq. (3.19). The metric
cannot be simplified further using only symmetries and coordinate choices, so the

functions a(r, t) and b(r, t) must be determined by solving Einstein's field equations for the vacuum: R_µν = 0. To this end the next step is to evaluate Einstein's equations for the metric of eq. (3.19).
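Plugging into the definitions, the nonzero components of the Christoffel symbols become:

$$ \begin{aligned} &\Gamma^t_{tt} = \partial_t a , && \Gamma^t_{tr} = \partial_r a , && \Gamma^t_{rr} = e^{2(b-a)}\partial_t b , \\ &\Gamma^r_{tt} = e^{2(a-b)}\partial_r a , && \Gamma^r_{tr} = \partial_t b , && \Gamma^r_{rr} = \partial_r b , \\ &\Gamma^r_{\theta\theta} = -r e^{-2b} , && \Gamma^r_{\phi\phi} = -r\sin^2\theta\, e^{-2b} , && \Gamma^\theta_{r\theta} = \frac1r , \\ &\Gamma^\phi_{r\phi} = \frac1r , && \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta , && \Gamma^\phi_{\theta\phi} = \cot\theta . \end{aligned} \tag{4.10} $$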
r
Exercise 27: Verify that eqs. (4.10) follow from a direct application of
the definition of Γµνλ to the metric of eq. (3.19), as claimed.

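Using these components of the Christoffel symbols in the definition of the Riemann tensor then leads to the following nonzero components:

$$ \begin{aligned} R^t{}_{rtr} &= e^{2(b-a)}\left[\partial_t^2 b + (\partial_t b)^2 - \partial_t a\,\partial_t b\right] - \partial_r^2 a - (\partial_r a)^2 + \partial_r a\,\partial_r b , \\ R^t{}_{\theta t\theta} &= -r e^{-2b}\,\partial_r a , \qquad R^t{}_{\phi t\phi} = -r e^{-2b}\sin^2\theta\,\partial_r a , \\ R^t{}_{\theta r\theta} &= -r e^{-2a}\,\partial_t b , \qquad R^t{}_{\phi r\phi} = -r e^{-2a}\sin^2\theta\,\partial_t b , \\ R^r{}_{\theta r\theta} &= r e^{-2b}\,\partial_r b , \qquad R^r{}_{\phi r\phi} = r e^{-2b}\sin^2\theta\,\partial_r b , \\ R^\theta{}_{\phi\theta\phi} &= (1 - e^{-2b})\sin^2\theta . \end{aligned} \tag{4.11} $$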

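Finally, taking the trace of this to obtain the Ricci tensor leads to

$$ \begin{aligned} R_{tt} &= \partial_t^2 b + (\partial_t b)^2 - \partial_t a\,\partial_t b + e^{2(a-b)}\left[\partial_r^2 a + (\partial_r a)^2 - \partial_r a\,\partial_r b + \frac{2\partial_r a}{r}\right] , \\ R_{rr} &= -\partial_r^2 a - (\partial_r a)^2 + \partial_r a\,\partial_r b + \frac{2\partial_r b}{r} + e^{2(b-a)}\left[\partial_t^2 b + (\partial_t b)^2 - \partial_t a\,\partial_t b\right] , \\ R_{tr} &= \frac{2\partial_t b}{r} , \qquad R_{\theta\theta} = 1 + e^{-2b}\left[r(\partial_r b - \partial_r a) - 1\right] , \qquad R_{\phi\phi} = R_{\theta\theta}\sin^2\theta . \end{aligned} \tag{4.12} $$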
r
Exercise 28: Verify that eqs. (4.11) and (4.12) follow from a direct appli-
cation of the definitions, using the components of Γµνλ given in eq. (4.10),
as claimed.

The goal is to use the five equations found by setting R_µν = 0 to solve for the two unknown functions a(r, t) and b(r, t). This seems like an over-determined problem (too many equations for the number of unknowns), but it is not, for two reasons. The first is the spherical symmetry of the problem, which is also what reduced the metric to two independent functions. For example, the conditions R_θθ = 0 and R_φφ = 0 are not independent, and this is a generic consequence of spherical symmetry. The second is that the remaining four equations still do not over-determine a and b. The Bianchi identity, ∇^µ(R_µν − ½R g_µν) = 0, implies that they are not all independent.


Birkhoff's Theorem
The simplest equation to solve is R_tr = 0, which implies that b = b(r) is t-independent. Differentiating R_θθ = 0 with respect to t and using ∂_t b = 0 then implies the further condition ∂_t∂_r a = 0. Its general solution is a(r, t) = f(r) + g(t), for arbitrary functions f and g. This makes the time component of the metric become −e^{2a}dt² = −e^{2f(r)}[e^{g(t)}dt]². The function g(t) can therefore be removed by redefining the time coordinate from t to t′, with dt′ = e^{g(t)}dt. Once this has been done, the remaining metric functions are independent of time, a = a(r) and b = b(r):

$$ ds^2 = -e^{2a(r)}dt^2 + e^{2b(r)}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{4.13} $$

This result is important enough to have a name: Birkhoff's theorem. It states that the assumption of spherical symmetry is by itself sufficient to ensure that the (vacuum) geometry is also time-independent. Consider a metric like eq. (4.13), for which coordinates exist in which all components of g_µν are independent of t and there are no terms linear⁷ in dt. Such a metric is called static. If the metric can only be made t-independent in coordinates for which dt dx^i cross-terms exist, then the metric is instead called stationary.

⁷ More precisely, for which the vector in the time direction is ‘hypersurface orthogonal’.

The Schwarzschild Solution

Given the t-independence of a and b, the components of the Ricci tensor simplify to

$$ \begin{aligned} R_{tt} &= e^{2(a-b)}\left[\partial_r^2 a + (\partial_r a)^2 - \partial_r a\,\partial_r b + \frac{2\partial_r a}{r}\right] , \\ R_{rr} &= -\partial_r^2 a - (\partial_r a)^2 + \partial_r a\,\partial_r b + \frac{2\partial_r b}{r} , \\ R_{\theta\theta} &= 1 + e^{-2b}\left[r(\partial_r b - \partial_r a) - 1\right] , \qquad R_{\phi\phi} = R_{\theta\theta}\sin^2\theta . \end{aligned} \tag{4.14} $$

A simple equation is obtained by taking the combination R_tt e^{2(b−a)} + R_rr = 0, which gives

$$ \frac{2}{r}\left(\partial_r a + \partial_r b\right) = 0 . \tag{4.15} $$

This implies a + b = k, where k is an r-independent constant. The constant k can be set to zero without loss of generality simply by rescaling the time coordinate, t → e^{−k}t, leaving the result a(r) = −b(r). Using this in R_θθ = 0 implies

$$ e^{2a}\left(2r\,\partial_r a + 1\right) = \partial_r\left(r e^{2a}\right) = 1 , \tag{4.16} $$

whose solution is

$$ e^{2a} = 1 - \frac{r_s}{r} , \tag{4.17} $$

where the integration constant, r_s, has dimensions of length and is called the Schwarzschild radius. As is easily checked, no further information is obtained by setting to zero any of the other components of R_µν given in eq. (4.14).
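The claim that the remaining components of eq. (4.14) add nothing new can be spot-checked numerically. The Python sketch below takes a(r) = ½ ln(1 − r_s/r) and b = −a, as found above, and evaluates R_tt, R_rr and R_θθ from eq. (4.14) at a few radii. The derivatives ∂_r a and ∂_r²a are computed by central finite differences, a numerical shortcut of our choosing with an illustrative step size. All three components should vanish up to discretization error. Units with r_s = 1 are assumed.

```python
import math

RS = 1.0  # Schwarzschild radius (units with r_s = 1)

def a(r):
    return 0.5 * math.log(1.0 - RS / r)   # e^{2a} = 1 - r_s / r, eq. (4.17)

def b(r):
    return -a(r)                          # a + b = 0

def d1(f, r, h=1e-4):
    return (f(r + h) - f(r - h)) / (2 * h)

def d2(f, r, h=1e-4):
    return (f(r + h) - 2 * f(r) + f(r - h)) / h**2

def ricci_components(r):
    """R_tt, R_rr and R_theta-theta of eq. (4.14) for static a(r), b(r)."""
    ap, bp, app = d1(a, r), d1(b, r), d2(a, r)
    R_tt = math.exp(2 * (a(r) - b(r))) * (app + ap**2 - ap * bp + 2 * ap / r)
    R_rr = -app - ap**2 + ap * bp + 2 * bp / r
    R_thth = 1 + math.exp(-2 * b(r)) * (r * (bp - ap) - 1)
    return R_tt, R_rr, R_thth

for r in (1.5, 3.0, 10.0):
    print(r, [f"{x:.1e}" for x in ricci_components(r)])
```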
The value of the integration constant, r_s, can be found by examining the large-r limit, for which the metric approaches the metric for flat space (written in polar coordinates): ds² → −dt² + dr² + r²(dθ² + sin²θ dφ²). A metric having this property is said to be asymptotically flat. The flatness of the metric at large r implies that the gravitational field is weak there, so the Newtonian limit applies and g_tt ≈ −1 − 2Φ. Here Φ = −GM/r is the Newtonian potential for a spherical source having mass⁸ M. Comparing this with the large-r limit of g_tt = −e^{2a} = −1 + r_s/r gives (re-introducing the factors of c)

$$ r_s = \frac{2GM}{c^2} . \tag{4.18} $$

The final result is the Schwarzschild geometry

$$ ds^2 = -\left(1 - \frac{r_s}{r}\right)dt^2 + \left(1 - \frac{r_s}{r}\right)^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) . \tag{4.19} $$

Its weak-field limit (r ≫ r_s) is obtained by expanding in powers of r_s/r. This gives the Parameterized Post-Newtonian form, g_tt = −e^{2a} = −[1 − 2GM/r + 2(β − γ)(GM/r)² + · · ·] and e^b = 1 + γ(GM/r) + · · ·, with β = γ = 1, as was discussed in earlier sections.
Notice that r_s is very small for ordinary astrophysical objects. For instance, the solar mass, M⊙ = 2 × 10³³ g, gives r_s ≈ 3 km. For such objects the geometry of eq. (4.19) ceases to apply once one reaches the ‘edge’ of the Sun, r = R⊙ = 700,000 km, inside of which T_µν no longer vanishes. Because of this the entire exterior of the star is effectively in the weak-field limit, r > R⊙ ≫ r_s.
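The numbers quoted for the Sun follow directly from eq. (4.18). Below is a minimal Python check in CGS units. It uses the text's M⊙ ≈ 2 × 10³³ g and R⊙ ≈ 700,000 km, together with assumed values G ≈ 6.674 × 10⁻⁸ cm³ g⁻¹ s⁻² and c ≈ 2.998 × 10¹⁰ cm/s.

```python
G = 6.674e-8        # Newton's constant [cm^3 g^-1 s^-2] (assumed value)
C = 2.998e10        # speed of light [cm/s] (assumed value)
M_SUN = 2e33        # [g], value used in the text
R_SUN = 7e10        # [cm] = 700,000 km, value used in the text

r_s = 2 * G * M_SUN / C**2          # eq. (4.18)
print(f"r_s = {r_s / 1e5:.2f} km, r_s / R_sun = {r_s / R_SUN:.1e}")
```

The ratio r_s/R⊙ of a few parts per million is twice the weak-field parameter GM⊙/(R⊙c²) that controls the solar-system tests above.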

5. Compact Stars and Black Holes


This section explores some of the physical consequences of the spherically symmetric
solutions obtained in the previous section, going beyond the limit of weak gravita-
tional fields considered earlier.

Geodesics
Given the metric, the motion of freely falling observers can be found by integrating the geodesic equations, eq. (3.36). This relies on having explicit expressions for the Christoffel symbols for the Schwarzschild geometry. These are given by specializing eqs. (4.10) to the case e^{−2b} = e^{2a} = 1 − r_s/r:

$$ \begin{aligned} &\Gamma^t_{tr} = -\Gamma^r_{rr} = \frac{r_s}{2r(r-r_s)} , \qquad \Gamma^r_{tt} = \frac{r_s(r-r_s)}{2r^3} , \\ &\Gamma^r_{\theta\theta} = -(r-r_s) , \qquad \Gamma^r_{\phi\phi} = -(r-r_s)\sin^2\theta , \\ &\Gamma^\theta_{r\theta} = \Gamma^\phi_{r\phi} = \frac1r , \qquad \Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta , \qquad \Gamma^\phi_{\theta\phi} = \cot\theta . \end{aligned} \tag{5.1} $$

⁸ There are a number of definitions of mass in GR, and defining M in this way is equivalent to using that of Arnowitt, Deser and Misner (ADM) in this case.
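Eqs. (5.1) can be spot-checked numerically by applying the definition (4.2) directly to the Schwarzschild metric. The Python sketch below differentiates the metric by central finite differences at one sample point. This is an illustrative numerical shortcut, not the method of the text. It then compares a few components with eq. (5.1). Coordinates are ordered (t, r, θ, φ), and units with r_s = 1 are assumed.

```python
import math

RS = 1.0                     # Schwarzschild radius (units with r_s = 1)
T, R, TH, PH = range(4)      # coordinate labels for (t, r, theta, phi)

def metric(x):
    """Components g_{mu nu} of eq. (4.19) at x = (t, r, theta, phi)."""
    r, th = x[R], x[TH]
    f = 1.0 - RS / r
    g = [[0.0] * 4 for _ in range(4)]
    g[T][T], g[R][R], g[TH][TH], g[PH][PH] = -f, 1.0 / f, r * r, (r * math.sin(th)) ** 2
    return g

def christoffel(x, h=1e-5):
    """Gamma^mu_{nu lam} from eq. (4.2), with partial derivatives by central differences."""
    g = metric(x)
    ginv = [[1.0 / g[i][i] if i == j else 0.0 for j in range(4)] for i in range(4)]  # g is diagonal
    dg = []                  # dg[rho][m][n] = partial_rho g_{mn}
    for rho in range(4):
        xp, xm = list(x), list(x)
        xp[rho] += h
        xm[rho] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[m][n] - gm[m][n]) / (2 * h) for n in range(4)] for m in range(4)])
    return [[[0.5 * sum(ginv[mu][rho] * (dg[nu][lam][rho] + dg[lam][nu][rho] - dg[rho][nu][lam])
                        for rho in range(4))
              for lam in range(4)] for nu in range(4)] for mu in range(4)]

x = [0.0, 3.0, 1.0, 0.0]     # sample point: t = 0, r = 3 r_s, theta = 1 rad, phi = 0
Gam = christoffel(x)
r, th = x[R], x[TH]
expected = {                 # selected entries of eq. (5.1), keyed by (mu, nu, lam)
    (T, T, R): RS / (2 * r * (r - RS)),
    (R, T, T): RS * (r - RS) / (2 * r**3),
    (R, R, R): -RS / (2 * r * (r - RS)),
    (R, TH, TH): -(r - RS),
    (TH, PH, PH): -math.sin(th) * math.cos(th),
    (PH, TH, PH): 1.0 / math.tan(th),
}
for (mu, nu, lam), value in expected.items():
    print((mu, nu, lam), Gam[mu][nu][lam], value)
```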
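Using these gives the geodesics as solutions, x^µ(τ) = [t(τ), r(τ), θ(τ), φ(τ)], to the following equations:

$$ \begin{aligned} &\frac{d^2t}{d\tau^2} + \frac{r_s}{r(r-r_s)}\,\frac{dr}{d\tau}\frac{dt}{d\tau} = 0 , \\ &\frac{d^2r}{d\tau^2} + \frac{r_s(r-r_s)}{2r^3}\left(\frac{dt}{d\tau}\right)^2 - \frac{r_s}{2r(r-r_s)}\left(\frac{dr}{d\tau}\right)^2 - (r-r_s)\left[\left(\frac{d\theta}{d\tau}\right)^2 + \sin^2\theta\left(\frac{d\phi}{d\tau}\right)^2\right] = 0 , \\ &\frac{d^2\theta}{d\tau^2} + \frac2r\,\frac{d\theta}{d\tau}\frac{dr}{d\tau} - \sin\theta\cos\theta\left(\frac{d\phi}{d\tau}\right)^2 = 0 , \\ &\frac{d^2\phi}{d\tau^2} + \frac2r\,\frac{d\phi}{d\tau}\frac{dr}{d\tau} + 2\cot\theta\,\frac{d\theta}{d\tau}\frac{d\phi}{d\tau} = 0 . \end{aligned} \tag{5.2} $$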
5.1 Orbits
As discussed earlier, solving these equations directly is in general a mess. However, the symmetries of the geometry imply a number of conservation laws, which help obtain solutions. Spherical symmetry ensures the conservation of angular momentum, and conservation of its direction restricts the trajectory to a plane in space. We are free to choose our coordinates so that this plane corresponds to θ = π/2, and θ(τ) = π/2 is clearly a solution to the third of eqs. (5.2). Using this result, the conservation of the magnitude of angular momentum can then be seen by multiplying the last of eqs. (5.2) by r², to give (d/dτ)[r²(dφ/dτ)] = 0. This leads to the first integral

$$ r^2\frac{d\phi}{d\tau} = L , \tag{5.3} $$

where L is a constant.
Time-translation invariance similarly leads to energy conservation. Its form is found by multiplying the first of eqs. (5.2) by (1 − r_s/r), to get (d/dτ)[(1 − r_s/r)(dt/dτ)] = 0. Integrating then gives the first integral

$$ \left(1 - \frac{r_s}{r}\right)\frac{dt}{d\tau} = E , \tag{5.4} $$

– 77 –
where E is a constant. Furthermore, eq. (3.37) shows that it is also always true that
dxµ dxµ
ζ = −gµν
dτ dτ
 2   2
 rs  dt rs −1 dr
= 1− − 1−
r dτ r dτ
"  2 #
2 
dθ dφ
−r2 + sin2 θ , (5.5)
dτ dτ

is also conserved along any geodesic. For timelike geodesics we usually choose τ to be
proper time along the trajectory, in which case ζ = 1. For null geodesics describing
the propagation of light we must instead choose ζ = 0. This last equation may be
simplified by using the three conservation laws given above, allowing the derivatives
dt/dτ , dθ/dτ and dφ/dτ to be eliminated in favour of the constants L and E, giving
the following first-order equation to be solved for dr/dτ :
 2
 rs −1 2  rs −1 dr L2
ζ = 1− E − 1− − 2 . (5.6)
r r dτ r
In principle one solves this equation for r(τ ), and after plugging the result into
eqs. (5.3) and (5.4), integrates these to obtain φ(τ ) and t(τ ).
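This last equation can be put into a form with which one can become emotionally involved, by multiplying through by ½(1 − rs/r):

$$
\frac{1}{2}\left(\frac{dr}{d\tau}\right)^2 + V(r) = \mathcal{E}, \tag{5.7}
$$

where

$$
\begin{aligned}
V(r) &= \frac{1}{2}\left(1 - \frac{2GM}{r}\right)\left(\frac{L^2}{r^2} + \zeta\right) \\
&= \left[\frac{L^2}{2r^2} - \frac{\zeta GM}{r}\right] - \frac{GML^2}{r^3} + \frac{\zeta}{2}
\end{aligned}
\qquad \text{and} \qquad \mathcal{E} = \frac{E^2}{2}. \tag{5.8}
$$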
What is attractive about eq. (5.7) is that it has the form of the energy equation for one-dimensional motion in a potential, V(r), for a particle having energy ℰ. This is useful because considerable intuition about the properties of the solutions can be read off from the shape of the potential.
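To illustrate how the potential picture is used, the sketch below (units GM = 1; helper names are hypothetical) evaluates both forms of eq. (5.8), confirms they agree, and locates the radial turning points of a bound timelike orbit by bisection on V(r) = ℰ. The orbit chosen, released from rest at r = 20 with L = 4.5, is an arbitrary example.

```python
def V(r, L, zeta, GM=1.0):
    """Effective potential, first form of eq. (5.8)."""
    return 0.5 * (1.0 - 2.0 * GM / r) * (L**2 / r**2 + zeta)


def V_expanded(r, L, zeta, GM=1.0):
    """Second (expanded) form of eq. (5.8); the bracket is the Newtonian part."""
    return (L**2 / (2 * r**2) - zeta * GM / r) - GM * L**2 / r**3 + zeta / 2


def turning_point(energy, L, zeta, r_lo, r_hi, GM=1.0, tol=1e-12):
    """Bisection for a radius where V(r) equals the 'energy' (script E = E^2/2).

    Assumes V - energy changes sign between r_lo and r_hi.
    """
    g = lambda r: V(r, L, zeta, GM) - energy
    if g(r_lo) * g(r_hi) > 0:
        raise ValueError("bracket does not straddle a turning point")
    while r_hi - r_lo > tol:
        mid = 0.5 * (r_lo + r_hi)
        if g(r_lo) * g(mid) <= 0:
            r_hi = mid
        else:
            r_lo = mid
    return 0.5 * (r_lo + r_hi)


# A bound timelike orbit (zeta = 1) released from rest at r = 20 with L = 4.5.
L, zeta = 4.5, 1.0
energy = V(20.0, L, zeta)
r_peri = turning_point(energy, L, zeta, 10.0, 16.0)
r_apo = turning_point(energy, L, zeta, 17.0, 30.0)
print(r_peri, r_apo)
```

The particle is confined to the interval between r_peri (about 14.1) and r_apo = 20, where ℰ − V(r) ≥ 0.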

Orbits of massive particles


Consider first the timelike geodesics which describe the world-lines of massive particles moving slower than the speed of light, corresponding to the choice ζ = 1 in the above expressions. It is useful to contrast the relativistic result with what happens for orbits in the Newtonian limit. To this end notice that the effective potential governing the radial motion of orbits in the Newtonian limit is given by the square bracket in the second equality for V(r): that is, Vc(r) = (L²/2r²) − GM/r.
To infer the qualitative properties of orbits in the Newtonian limit notice that Vc(r) → +∞ as r → 0 and Vc(r) → 0 from below as r → ∞. This implies Vc(r) must have a minimum for some intermediate value, r = rc, which differentiation shows lies at rc ≡ L²/GM. Furthermore, r is time independent at this minimum provided that the ‘energy’ satisfies ℰ = Vc(rc), and so r = rc gives the position of the circular orbits for a given L. Since this is a minimum of Vc, circular orbits are stable for any L, and orbits which start near r = rc oscillate about this point. The angular frequency of this radial oscillation is given by ωr² = Vc″(rc), and so ωr = (GM)²/L³. On the other hand, for circular orbits the angular frequency of the orbit's angular motion is given by ωφ = dφ/dτ = L/rc² = (GM)²/L³. This result, ωr = ωφ, is related to these orbits being ellipses having a fixed orientation in space, since the time between successive closest approaches (perihelia) is the same as the time taken to circumnavigate the orbit once.
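The Newtonian statement ωr = ωφ can be checked numerically. This sketch (units GM = 1, with the arbitrary choice L = 5; the function names are our own) estimates Vc″(rc) by a central finite difference and compares its square root with L/rc².

```python
import math


def Vc(r, L, GM=1.0):
    """Newtonian effective potential Vc(r) = L^2/(2 r^2) - GM/r."""
    return L**2 / (2 * r**2) - GM / r


def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


L, GM = 5.0, 1.0
rc = L**2 / GM                                   # circular-orbit radius
omega_r = math.sqrt(second_derivative(lambda r: Vc(r, L, GM), rc))
omega_phi = L / rc**2                            # dphi/dtau on the circular orbit
print(omega_r, omega_phi, GM**2 / L**3)
```

Both frequencies come out equal to (GM)²/L³ = 0.008 up to finite-difference error, which is why Newtonian ellipses close.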
How does all this change in the relativistic case? In this case V(r) → −∞ as r → 0 and V(r) → ½ from below as r → ∞. Differentiating V shows that V′(r) vanishes when

$$
r = r_{c\pm} \equiv \frac{L}{r_s}\left[L \pm \left(L^2 - 3r_s^2\right)^{1/2}\right]. \tag{5.9}
$$

We see from this that if L < √3 rs then V(r) has no real minima or maxima, and so no circular orbits are possible at all. Orbits then come in two classes: those coming in from infinity, which have ℰ ≥ ½, or E² ≥ 1; and those which cannot escape from the gravitational source, having ℰ < ½, or E² < 1. In both cases, once r begins to decrease it necessarily reaches r = 0 (and so at some point either reaches r = rs or crashes into the source's surface).

If, on the other hand, L > √3 rs, then V(r) has a local minimum at r = rc+ and a maximum at r = rc−. This shows that stable orbits occur at r = rc+, and the radius of these orbits grows as L does. The smallest stable orbit occurs when L = √3 rs, and lies at rmin = 3 rs = 6GM. On the other hand, for L ≫ rs the radius of the stable orbit becomes rc+ → L²/GM, which agrees with the Newtonian result (as we should expect because GM/rc = (GM)²/L² = (rs/2L)² ≪ 1). Orbits which start near such circular orbits oscillate about this radius, with frequency

$$
\omega_r^2 = V''(r_c) = \frac{3L^2}{r_c^4} - \frac{2GM}{r_c^3} - \frac{12L^2GM}{r_c^5}. \tag{5.10}
$$

Since this does not agree with the frequency of the angular motion, defined by ωφ = dφ/dτ = L/rc², the motion describes the precessing ellipses seen in earlier lectures.
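The following sketch (units GM = 1, so rs = 2; helper names are hypothetical) evaluates eq. (5.9), confirms that V′ vanishes at both roots, recovers the innermost stable orbit at 6GM when L = √3 rs, and evaluates eq. (5.10) to show that ωr < ωφ for a sample L = 4.5. It deliberately stops short of the closed-form precession angle asked for in Exercise 29.

```python
import math


def circular_radii(L, GM=1.0):
    """(rc_plus, rc_minus) from eq. (5.9), or None when L < sqrt(3) rs."""
    rs = 2.0 * GM
    disc = L**2 - 3.0 * rs**2
    if disc < -1e-12 * L**2:          # small tolerance absorbs round-off at L = sqrt(3) rs
        return None
    root = math.sqrt(max(disc, 0.0))
    return L / rs * (L + root), L / rs * (L - root)


def dV_dr(r, L, GM=1.0):
    """Derivative of the zeta = 1 potential of eq. (5.8)."""
    return -L**2 / r**3 + GM / r**2 + 3 * GM * L**2 / r**4


def omega_r_squared(r, L, GM=1.0):
    """Radial oscillation frequency squared, eq. (5.10)."""
    return 3 * L**2 / r**4 - 2 * GM / r**3 - 12 * L**2 * GM / r**5


rs = 2.0
print(circular_radii(1.5 * rs))                   # None: L < sqrt(3) rs
r_isco = circular_radii(math.sqrt(3) * rs)        # both roots merge at 6 GM
r_plus, r_minus = circular_radii(4.5)
ratio = math.sqrt(omega_r_squared(r_plus, 4.5)) / (4.5 / r_plus**2)
print(r_isco, r_plus, r_minus, ratio)
```

A positive ωr² at rc+ and a negative one at rc− is the numerical signature of the stable and unstable orbits described in the text.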

Exercise 29: Compute ωφ for a stable circular orbit as a function of L
and GM and compare it to the frequency, ωr , of small radial oscillations
about the same circular orbit, computed using eq. (5.10). From these
calculate the precession angle, δφprec , that accumulates per period for
nearly circular elliptical orbits in Schwarzschild spacetime. Does your
result agree with the small-eccentricity limit of the post-Newtonian result
found earlier in eq. (3.96)?

Circular orbits are also possible at r = rc−, which decreases with increasing L. However, because this is a maximum of V(r) these orbits are unstable, and small perturbations from them cause the trajectory to veer into the source or to escape out to infinity. In particular, the outermost of these circular orbits occurs for the smallest possible L, corresponding to rc− → 6 GM as L → √3 rs (which coincides with rc+ in this limit). The smallest possible unstable circular orbit instead occurs as L → ∞, which corresponds to rc− → (3/2) rs = 3 GM.

Exercise 30: Show that circular orbits in Schwarzschild spacetime exactly satisfy Kepler's 3rd Law: Ω² = GM/r³, where Ω = dφ/dt = (dφ/dτ)/(dt/dτ).

Orbits of light rays


The trajectories for massless particles (like photons, gravitons and possibly some neutrinos) are found in an identical fashion, using instead ζ = 0, as appropriate for null geodesics. In this case the potential V(r) degenerates to

$$
V(r) = \frac{L^2}{2r^2}\left(1 - \frac{2GM}{r}\right), \tag{5.11}
$$

for which V′ vanishes at the L-independent value r = rc ≡ (3/2) rs = 3 GM. Since V(r) → −∞ as r → 0 and V(r) → 0⁺ as r → ∞, V has a maximum at r = rc. This shows that there is only one possible circular orbit for a light ray, and it is unstable, occurring at r = 3 GM. Furthermore, since V(rc) = (1/6)[L/(3 GM)]², photons which approach from infinity do not get closer than r = 3 GM provided their ‘energy’ satisfies ℰ < V(rc), or |E| < L/(3√3 GM). Trajectories having |E| larger than this necessarily reach r < (3/2) rs.
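The photon-sphere numbers quoted above can be reproduced with a short sketch (units GM = 1 and L = 1, both illustrative choices): a golden-section search locates the maximum of eq. (5.11), and the critical value of |E| follows from E²/2 = V(rc).

```python
import math


def V_null(r, L, GM=1.0):
    """Potential for null geodesics, eq. (5.11)."""
    return L**2 / (2 * r**2) * (1.0 - 2.0 * GM / r)


def argmax_golden(f, a, b, tol=1e-9):
    """Golden-section search for the maximum of a unimodal function on [a, b]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while b - a > tol:
        if f(c) > f(d):            # maximum lies in [a, d]
            b, d = d, c
            c = b - invphi * (b - a)
        else:                      # maximum lies in [c, b]
            a, c = c, d
            d = a + invphi * (b - a)
    return 0.5 * (a + b)


L, GM = 1.0, 1.0
r_photon = argmax_golden(lambda r: V_null(r, L, GM), 2.1, 20.0)
V_peak = V_null(3.0 * GM, L, GM)
E_crit = math.sqrt(2.0 * V_peak)     # |E| with E^2/2 = V(rc)
print(r_photon, V_peak, E_crit, L / (3.0 * math.sqrt(3.0) * GM))
```

The search lands on r = 3GM, the peak height is 1/54 (in these units), and the critical |E| matches L/(3√3 GM).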

5.2 Radial geodesics


We have seen that orbits exist for which test particles can move to arbitrarily small r, and this means that we may have to take seriously the potential singularities of the metric at r = 0 and r = rs. (More about these singularities in the next section.) Since for any given E the orbits which penetrate to small r have small L, it is useful to study in more detail radially directed geodesics, corresponding to particles which fall directly into (or climb directly out of) the gravitational potential. For simplicity it is also useful to follow the fastest-moving particles, and so to specialize to null geodesics.
If we focus on the shape of these geodesics in the (r, t) plane, it is convenient not to find r(τ) and t(τ) separately, but instead to use directly the condition

$$
ds^2 = 0 = -\left(1 - \frac{r_s}{r}\right)dt^2 + \left(1 - \frac{r_s}{r}\right)^{-1}dr^2, \tag{5.12}
$$

to get

$$
\frac{dr}{dt} = \pm\left(1 - \frac{r_s}{r}\right). \tag{5.13}
$$

This integrates to give the curves r∗(r) = ±t (up to an additive constant), where the upper (lower) sign corresponds to outward-going (in-falling) geodesics. Since the tortoise coordinate, r∗, defined by

$$
r_* = r + r_s \ln\left|\frac{r}{r_s} - 1\right|, \tag{5.14}
$$

approaches r for large r, these trajectories get closer and closer to the flat-space geodesics, r = ±t, as r → ∞. Notice also that r∗ → −∞ as r → rs.
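A quick numerical check of eq. (5.14), assuming rs = 2 for concreteness (the function names are our own): the derivative of r∗ reproduces the factor (1 − rs/r)⁻¹ from eq. (5.13), r∗ plunges towards −∞ as r approaches rs, and r∗/r → 1 at large r.

```python
import math


def r_star(r, rs=2.0):
    """Tortoise coordinate of eq. (5.14), valid for r != rs."""
    return r + rs * math.log(abs(r / rs - 1.0))


def dr_star_dr(r, rs=2.0, h=1e-6):
    """Central-difference derivative of r_star."""
    return (r_star(r + h, rs) - r_star(r - h, rs)) / (2.0 * h)


for r in (2.0 + 1e-8, 2.001, 2.1, 3.0, 10.0, 1e8):
    print(r, r_star(r), r_star(r) / r)
```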
Suppose we now examine what happens to an in-falling light ray, for which r∗ =
−t. At asymptotically late times, t → ∞, r∗ approaches −∞ and so r asymptotically
approaches rs from above. Even though we found that orbits are not energetically
precluded from reaching r = 0, the above result makes it seem as if an infinite
amount of time is required to reach the Schwarzschild radius. And this is indeed
true, although it is important in relativity to specify more precisely whose time the
coordinate t keeps track of.
Imagine therefore filling spacetime with observers who hover at a fixed radius and angle in the Schwarzschild gravitational field. (Since these are not geodesics, these observers would have to use rockets to accelerate and keep from falling in the ambient gravitational field.) Only the coordinate t varies along the world-line of such an observer, but the proper time as measured by one of these observers is given by

$$
d\tau^2 = -ds^2 = \left(1 - \frac{r_s}{r}\right)dt^2, \tag{5.15}
$$

and so dτ = (1 − rs/r)^{1/2} dt. In general this differs from dt because of the gravitational redshift associated with each observer's position, and so t represents the time measured only by the asymptotic observer at r → ∞.
The result above therefore shows that, as seen by an observer at infinity, an in-falling light ray takes an infinite amount of time to reach r = rs. It does not show that this takes an infinite amount of time as measured by the in-falling observers themselves. This can be determined by returning to our geodesic expression, eq. (5.7), in the case L = 0:

$$
\frac{dr}{d\tau} = \pm\left[E^2 - \zeta\left(1 - \frac{r_s}{r}\right)\right]^{1/2}. \tag{5.16}
$$

For in-falling null geodesics we choose ζ = 0, and so r = r0 − E(τ − τ0), showing that r = rs is reached in a finite parameter interval along the null geodesic. A similar conclusion can be drawn for in-falling timelike geodesics, for which ζ = 1. This is most simply done by choosing the special case E = 1, for which dr/dτ = −(rs/r)^{1/2}, and so r^{3/2} = (3/2) rs^{1/2}(τ∗ − τ), i.e. r ∝ (τ∗ − τ)^{2/3}, where τ∗ is the finite proper time at which r = 0 is reached.
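The contrast between the two clocks can be made concrete for the E = 1 timelike fall. In this sketch (units G = M = c = 1, so rs = 2; start at r0 = 10 chosen arbitrarily) the proper time follows from dr/dτ = −(rs/r)^{1/2}, while the Schwarzschild time uses dt/dτ = 1/(1 − rs/r) from eq. (5.4). The substitution u = ln(r − rs) and the closed-form antiderivative t_closed_form are our own elementary additions, used only as a cross-check.

```python
import math

RS = 2.0  # Schwarzschild radius, units G = M = c = 1 (assumption of the sketch)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def proper_time(r0, r1):
    """Proper time to fall from r0 to r1 <= r0 with E = 1, from dr/dtau = -(rs/r)^(1/2)."""
    return simpson(lambda r: math.sqrt(r / RS), r1, r0)


def coordinate_time(r0, r1):
    """Schwarzschild t elapsed over the same fall (r1 > rs), using eq. (5.4) with E = 1.

    dt/dr = -(r/rs)^(1/2) r / (r - rs); substituting u = ln(r - rs) removes the
    1/(r - rs) blow-up so Simpson's rule stays accurate near the horizon.
    """
    def integrand(u):
        r = RS + math.exp(u)
        return math.sqrt(r / RS) * r
    return simpson(integrand, math.log(r1 - RS), math.log(r0 - RS))


def t_closed_form(r):
    """Antiderivative of dt/dr (our own elementary integration), with x = (r/rs)^(1/2)."""
    x = math.sqrt(r / RS)
    return RS * (-2.0 / 3.0 * x**3 - 2.0 * x + math.log((x + 1.0) / (x - 1.0)))


r0 = 10.0
print("proper time to reach rs:", proper_time(r0, RS))
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, coordinate_time(r0, RS * (1.0 + eps)))
```

The proper time to reach rs is finite (about 13.6 in these units), while the Schwarzschild time keeps growing roughly like rs ln(1/ε) as the stopping radius rs(1 + ε) approaches the horizon.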
We conclude that in-falling observers pass r = rs in a finite amount of their own time. Perhaps paradoxically, this is not in conflict with the infinite amount of time taken as seen by the observer at infinity. To understand this, suppose that the in-falling astronaut were to send regularly spaced signals out to the observer at infinity during the trip. Because of the gravitational redshift, these signals arrive at infinity spaced further apart than they were on their emission, with this redshift becoming infinite as the astronaut reaches r = rs.

5.3 Singularities of the solution


Because coordinates can be chosen arbitrarily in General Relativity, it is always
important to check that they mean what they are assumed to mean. This is usually
done by using the metric to compute physical distances, such as when we chose the
radial coordinate earlier to be the radius or area of the spheres at fixed r and t.
However, because the metric itself is only found after solving the field equations, it
may be that the coordinates do not end up having all of the properties they were
assumed to have when they were chosen. For this reason it is always important to
check the properties of the metric which results, to see what it implies about the
properties of various coordinate surfaces.
The first thing to check is that the metric is well-defined: i.e. that its components
are finite and the metric is invertible. (Invertibility is important because if gµν is
not invertible then the infinitesimal coordinate displacements, dxµ , are not linearly
independent and so do not span all of the possible directions in the space.) Inspection
of the Schwarzschild metric, eq. (4.19), shows that there are two places which might
be problems: r = 0 and r = rs . Clearly neither of these is of real interest for most
astrophysical objects, for which the solution does not apply down to such small radii.

Curvature singularity: r = 0
The geometry near r = 0 is counter-intuitive because for all 0 < r < rs it is grr which
is negative, while gtt is positive. This means that in this region it is r, and not t,
that is the time coordinate!

r = 0 seems problematic, because the components of the metric and curvature tensors all diverge at this point. This need not represent a physical problem in itself, however, because the components of a tensor are different in different coordinate systems, and it could just be that our coordinates are poorly chosen near r = 0. For instance, even starting with the flat metric ds² = −dt² + dx² + dy² + dz² can lead to divergent metric components after performing the coordinate change x = 1/(w − 3), since dx = −dw/(w − 3)² implies there are metric components which diverge as w → 3 or vanish as w → ∞. In this case this is a sign that the coordinate transformation x → w is singular (because x or w diverges).
Is this what is happening for the Schwarzschild solution at r = 0? If so, there would exist an inspired change of coordinates which would remove the singularities in the metric and curvature as r → 0. A sufficient condition for such a change of coordinates not to exist is that some scalar quantity diverges, since a scalar takes the same value in all coordinate systems. Examples of scalars might be R, $R_{\mu\nu}R^{\mu\nu}$ or the eigenvalues, λ_a, of the matrix $R^\mu{}_\nu$. (These last are scalars because the covariance of the eigenvalue equation $R^\mu{}_\nu v_a^\nu = \lambda_a v_a^\mu$ requires λ_a to be a scalar. Notice the same would not be true for the eigenvalues of the matrix $R_{\mu\nu}$!) Unfortunately, none of those listed helps for the Schwarzschild geometry, because this satisfies $R_{\mu\nu} = 0$ by construction. However, there is a scalar which diverges at r = 0:

$$
R_{\mu\nu\lambda\rho}R^{\mu\nu\lambda\rho} = \frac{12\,r_s^2}{r^6}, \tag{5.17}
$$

which shows that r = 0 really is a curvature singularity, and not merely a coordinate singularity.
Given that we believe Einstein’s equations are likely to be weak-curvature ex-
pansions of something more fundamental, we should be wary of taking too seriously
the properties of the Schwarzschild solution very near r = 0.

Coordinate singularity: r = rs = 2 GM
What about the singularity seen in eq. (4.19) as r → rs? Is this also a curvature singularity? It is the purpose of this section to argue that this is a coordinate singularity, which merely expresses the breakdown of the Schwarzschild coordinates for r ≤ rs. An indication that this is possible comes from the fact that nothing special seems to happen to in-falling observers as they reach rs.
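As a one-line check (our own evaluation, not part of the original argument), the invariant of eq. (5.17) stays finite on the surface r = rs:

$$
R_{\mu\nu\lambda\rho}R^{\mu\nu\lambda\rho}\Big|_{r=r_s} = \frac{12\,r_s^2}{r_s^6} = \frac{12}{r_s^4} = \frac{3}{4(GM)^4},
$$

which is finite for any M > 0 and shrinks as M grows. Finiteness of one invariant does not by itself prove that the singularity is removable, which is why the argument proceeds by constructing coordinates that are regular there.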
This suggests dropping the coordinates r and t and instead trying coordinates which are adapted to in-falling and out-going radial light rays. To this end consider the new coordinates u and v, defined in terms of the tortoise coordinate, r∗, of eq. (5.14):

$$
u = t - r_*, \qquad v = t + r_*. \tag{5.18}
$$

Radially in-falling light rays are described in these coordinates by constant v, while radially out-going light rays travel along the lines of constant u.
The idea is to trade t, the Schwarzschild time variable, for either u or v. For instance, Eddington-Finkelstein coordinates are defined by using the coordinates (v, r), in terms of which the Schwarzschild metric becomes

$$
ds^2 = -\left(1 - \frac{r_s}{r}\right)dv^2 + 2\,dv\,dr + r^2\left(d\theta^2 + \sin^2\theta\,d\phi^2\right), \tag{5.19}
$$

and so gvv = −(1 − rs/r), grv = gvr = 1 and grr = 0. It is clear that none of the metric components diverge anymore as r → rs, although some do pass through zero there. However, zero values for metric elements are not in themselves a problem (after all, there were plenty of zeros in the diagonal Schwarzschild metric for r ≠ rs) provided that the metric remains invertible. The determinant of the above metric is g ≡ det gµν = −r⁴ sin²θ, which is nonzero (and so gµν is invertible) for all r > 0.
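The invertibility claim can be spot-checked numerically. This sketch builds the components of eq. (5.19) (with rs = 2 and θ = 1 as arbitrary sample values; helper names are hypothetical) and computes the determinant by Gaussian elimination at radii inside, on and outside r = rs.

```python
import math


def ef_metric(r, theta, rs=2.0):
    """Components of eq. (5.19) in the coordinate order (v, r, theta, phi)."""
    f = 1.0 - rs / r
    return [
        [-f, 1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, r**2, 0.0],
        [0.0, 0.0, 0.0, (r * math.sin(theta))**2],
    ]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return result


theta = 1.0
for r in (1.0, 2.0, 5.0):          # inside, on, and outside r = rs
    print(r, det(ef_metric(r, theta)), -(r**4) * math.sin(theta)**2)
```

The determinant equals −r⁴ sin²θ in all three cases, including r = rs where gvv vanishes.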

5.4 Black Holes and Event Horizons

Clearly, there is nothing singular about the Schwarzschild geometry at r = rs; it is just that the Schwarzschild coordinates, (t, r, θ, φ), break down at this point. Now that we have coordinates which do not break down, what then is the interpretation of the surface r = rs?
To see this consider again the trajectories of in-falling and out-going light rays. In Eddington-Finkelstein coordinates the condition ds² = 0 implies these satisfy

$$
ds^2 = 0 = dv\left[2\,dr - \left(1 - \frac{r_s}{r}\right)dv\right], \tag{5.20}
$$

and so dv/dr = 0 for in-falling light rays, and dv/dr = 2(1 − rs/r)⁻¹ for out-going light rays. Notice that for the outgoing rays this means that dr/dv = ½(1 − rs/r) is positive only if r > rs, and dr/dv < 0 when r < rs. This shows that r always decreases when r < rs, even for the outgoing light ray! At r = rs, the outgoing ray satisfies dr/dv = 0, and so the ray simply ‘hovers’ at r = rs. That is to say, the surface r = rs is a null surface, spanned by null geodesics.
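A small integration makes the behaviour of outgoing rays explicit. The sketch below (units with rs = 2, a Runge-Kutta stepper of our own choosing) evolves dr/dv = ½(1 − rs/r) for rays starting just outside, exactly on, and just inside the horizon.

```python
RS = 2.0  # Schwarzschild radius, units G = M = c = 1 (assumption)


def dr_dv_outgoing(r):
    """Outgoing radial null ray in Eddington-Finkelstein coordinates: dr/dv = (1 - rs/r)/2."""
    return 0.5 * (1.0 - RS / r)


def evolve(r, v_total, steps=10000):
    """Integrate dr/dv with fourth-order Runge-Kutta over an advanced-time interval v_total."""
    h = v_total / steps
    for _ in range(steps):
        k1 = dr_dv_outgoing(r)
        k2 = dr_dv_outgoing(r + 0.5 * h * k1)
        k3 = dr_dv_outgoing(r + 0.5 * h * k2)
        k4 = dr_dv_outgoing(r + h * k3)
        r += h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return r


for r_start in (2.01, 2.0, 1.99):
    print(r_start, "->", evolve(r_start, 10.0))
```

The ray starting at 2.01 moves outward, the one at exactly rs stays put, and the one at 1.99 moves inward even though it is ‘outgoing’.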
These arguments show that the surface r = rs serves as the point of no return, inasmuch as no light signal emitted at r < rs can escape to r > rs. The same is also true for timelike geodesics, as might have been expected since these particles necessarily move more slowly than do light rays. The existence of such a surface also makes sense from the following point of view. If the escape speed, vesc, were computed as a function of r using Newtonian physics, it would be defined as that speed that gets the object to infinity with precisely zero kinetic energy, and so would satisfy

$$
\frac{m v_{\rm esc}^2}{2} - \frac{GMm}{r} = 0, \tag{5.21}
$$

and so vesc² = 2 GM/r. The radius at which vesc = c would then be rs = 2 GM/c², in agreement with the Schwarzschild radius. (There is no reason why the numerical factors of the Newtonian calculation should agree exactly with the relativistic calculation, but it is nonetheless a happy accident that they do.) What special relativity adds is the prohibition of motion with v > c, which completely precludes anything from escaping from r < rs.
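Putting in numbers for the Sun gives a feel for the scale. This sketch uses rounded SI values of G and c (our choice) and the solar mass quoted later in these notes; the function names are illustrative.

```python
import math

G = 6.674e-11     # m^3 kg^-1 s^-2
C = 2.998e8       # m s^-1
M_SUN = 1.99e30   # kg, the value quoted later in these notes


def newtonian_escape_speed(M, r):
    """v_esc = (2 G M / r)^(1/2), from eq. (5.21)."""
    return math.sqrt(2.0 * G * M / r)


def schwarzschild_radius(M):
    """Radius where the Newtonian escape speed reaches c: rs = 2 G M / c^2."""
    return 2.0 * G * M / C**2


rs_sun = schwarzschild_radius(M_SUN)
print(f"rs(Sun) = {rs_sun / 1e3:.2f} km")
print(newtonian_escape_speed(M_SUN, rs_sun) / C)
```

The Sun's Schwarzschild radius is about 3 km, far inside its actual radius, so the exterior Schwarzschild solution never reaches r = rs for it.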
A surface such as this, which divides spacetime into regions between which signals
cannot be sent due to the speed of light being the maximum speed, is called an event
horizon. r = rs is an event horizon for the Schwarzschild geometry. The region with
r < rs is called a black hole, since it is something from which nothing, not even light,
can classically escape.

Validity of the approximations


If gravitational effects are so dramatic as to divide spacetime into two regions like this, one might ask whether the curvatures are too large to trust our use of Einstein's equations to predict them. It is worth keeping in mind when doing so that we have three dimensionful parameters in the problem, namely G, M and r, and so there are two independent dimensionless ratios which we can form from them. These are GM/r and G/r² (in our units with ℏ = c = 1). It turns out that each of these controls a different kind of approximation.

• Relativistic Effects: We have already seen that the first of these, 2GM/(rc²) ∼ rs/r (once the factors of c are restored), controls the importance of relativistic effects, and the fact that this is O(1) when r ∼ rs shows that relativistic effects are crucial to understanding the properties of the event horizon.

• Quantum effects: Our treatment of gravity has been purely classical, and it turns out that the relative size of quantum corrections to our treatment is of order G/r² (or ℏG/c³r² once ℏ and c are restored; notice the tell-tale ℏ, the signature of quantum effects). The classical approximation is typically a good one provided that this ratio is small. Since the unit of length associated with G, the Planck length, is very small, ℓp = (ℏG/c³)^{1/2} ∼ 10⁻³³ cm, the condition G/r² = (ℓp/r)² ≪ 1 is not a very strong restriction for any r of astrophysical interest!

• Weak curvature: Recall that Einstein's equations are motivated as being the weak-curvature approximation to some possibly more fundamental theory, and so corrections to these equations might be expected to arise that are of order $G\mathcal{R}$, where $\mathcal{R}$ is any invariant measure of the local curvature. (For example, one might take $G\mathcal{R}$ to be the square root of the invariant $G^2 R_{\mu\nu\lambda\rho}R^{\mu\nu\lambda\rho}$ for the Schwarzschild geometry.) But inspection of the Riemann tensor for Schwarzschild shows that in order of magnitude the components of $R_{\mu\nu\lambda\rho}$ are of order GM/r³, and so $G\mathcal{R} \sim G^2M/r^3 \sim (G/r^2)(r_s/r)$, which shows how curvature corrections to Einstein's equations are related to the size of quantum corrections.

The above arguments indicate that the effects of quantum gravity near the event horizon, when r = rs, should be of order

$$
\delta_s = \frac{\hbar G}{c^3 r_s^2} = \frac{\ell_p^2}{r_s^2} = \frac{\hbar c}{4GM^2} = \frac{M_p^2}{4M^2}, \tag{5.22}
$$

and this gets smaller the larger M is. The quantity Mp = (ℏc/G)^{1/2} = 2.18 × 10⁻⁸ kg is called the Planck mass, and its size shows that δs ≪ 1 for the black holes of astrophysics, for which M > M⊙ ≃ 1.99 × 10³⁰ kg. Given that the interpretation of astrophysical objects as black holes is based purely on the classical predictions of general relativity, one might have worried that this interpretation might be undermined by unknown quantum gravity effects. The fact that δs ≪ 1 for such black holes shows that this worry is likely to be groundless.
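The size of δs across very different masses can be tabulated directly from eq. (5.22). The constants below are rounded SI values (our assumption), and the proton mass is the value quoted in the next paragraph; helper names are hypothetical.

```python
import math

HBAR = 1.0546e-34   # J s
G = 6.674e-11       # m^3 kg^-1 s^-2
C = 2.998e8         # m s^-1
M_SUN = 1.99e30     # kg

PLANCK_MASS = math.sqrt(HBAR * C / G)          # about 2.18e-8 kg
PLANCK_LENGTH = math.sqrt(HBAR * G / C**3)     # about 1.6e-35 m


def delta_s(M):
    """Relative size of quantum-gravity effects at r = rs, last form of eq. (5.22)."""
    return PLANCK_MASS**2 / (4.0 * M**2)


def delta_s_from_lengths(M):
    """Same quantity from the second form, (l_p / rs)^2."""
    rs = 2.0 * G * M / C**2
    return (PLANCK_LENGTH / rs) ** 2


for label, M in (("solar mass", M_SUN), ("1 gram", 1e-3), ("proton", 1.67e-27)):
    print(f"{label:>10}: delta_s = {delta_s(M):.2e}")
```

The output spans from about 10⁻⁷⁷ for a solar mass, through about 10⁻¹⁰ for one gram, to far above 1 for a proton.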
Quantum effects would be important, however, for very light black holes, such as if their mass were as small as that of an elementary particle like a proton, whose mass is mp ≃ 1.67 × 10⁻²⁷ kg. We should not trust any classical inferences about the gravitational field of a proton at radii as small as its Schwarzschild radius, and so have no reason to believe these should behave gravitationally as classical black holes.

5.5 Quantum Effects Near Black Holes


What about black holes with masses in between these two extremes? For a black hole with M ≃ 10⁻³ kg (i.e. 1 gram) we have δs ≃ 10⁻¹⁰, ensuring that it is massive enough that the classical approximation would be very good, even at the Schwarzschild radius. On the other hand, although quantum effects are small for such a black hole, they need not be completely negligible. Are there any novel quantum phenomena that might arise?

Particle Production
The first quantum property of a black hole that can arise in this way as a small quantum correction in a controlled semiclassical approximation was found in the 1970s by Stephen Hawking. He discovered that black holes need not be strictly black, since quantum effects can make them radiate elementary particles.
The effect he discovered for black holes is a special case of a more general quantum phenomenon: the spontaneous production of particles by an external field. This kind of effect had been predicted theoretically decades earlier by Julian Schwinger, who showed that a sufficiently strong electric field would create electrons and positrons out of the vacuum.
It is instructive to see how particle production like this works energetically. Because of the randomness of quantum mechanics the vacuum of empty space is better imagined as a frothing soup of particles and antiparticles that are forever trying to emerge as real particles. (In quantum mechanics, whatever is not forbidden is compulsory.) They normally cannot emerge, however, because their appearance is forbidden by conservation laws. For instance, electrons cannot emerge from the vacuum alone without violating conservation of electric charge, since each electron carries charge q = −e, where e = 1.60 × 10⁻¹⁹ Coulomb. But since positrons carry the opposite charge, charge conservation cannot forbid the joint emergence of an electron-positron pair. It is instead energy conservation that keeps such pairs from emerging all the time from the vacuum around us, because such an emergence would require the production of sufficient energy to account for their masses, E = 2mc². Although there is a sense in which the Uncertainty Principle allows quantum fluctuations to violate energy conservation, they can only do so very briefly, and in the long term energy conservation is inviolate.
The situation changes in the presence of an electric field, $\mathbf{E}$, because the energy of a pair of oppositely charged particles is a function of their separation. Such particles can lower their energy by separating because their opposite charges make them feel forces in opposite directions due to the electric field. It is the work done by these forces that lowers their energy, and if their total energy (including their mass) can be lowered to zero in this way then energy conservation can no longer forbid their being produced spontaneously from the vacuum. The energy (including the rest mass) of an electron-positron pair (held at rest) a distance x apart in a constant electric field turns out to be

$$
E_{\rm pair} = 2mc^2 - e|\mathbf{E}|\,x, \tag{5.23}
$$

and so this can vanish (just like for the vacuum) once $x > 2mc^2/(e|\mathbf{E}|)$.
Combining this with the quantum probability of having the electrons emerge a distance x apart from the vacuum, p(x) ∼ e^{−2mcx/ℏ}, implies that the probability for producing electron-positron pairs by an electric field is given by

$$
p \sim \exp\left(-\frac{4m^2c^3}{e|\mathbf{E}|\hbar}\right). \tag{5.24}
$$

Notice that the exponential dependence makes this probability extremely small unless $e|\mathbf{E}| \gtrsim 4m^2c^3/\hbar$, which is why electrons don't pop out of the vacuum all the time in the presence of the stray electric fields that arise in day-to-day life. The kinds of fields that are required can exist very near very heavy nuclei (having more protons than the heaviest naturally occurring nuclei), once all of their screening electrons have been stripped off.
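To see just how strong the required field is, this sketch evaluates the exponent of eq. (5.24) for electrons, using rounded SI values for ℏ, c, e and the electron mass (our inputs, not given in the notes). The name critical_field is our label for the field at which the exponent equals −1.

```python
import math

HBAR = 1.0546e-34      # J s
C = 2.998e8            # m s^-1
E_CHARGE = 1.602e-19   # C
M_E = 9.109e-31        # electron mass, kg


def pair_suppression(field):
    """exp(-4 m^2 c^3 / (e |E| hbar)) from eq. (5.24); field in V/m."""
    return math.exp(-4.0 * M_E**2 * C**3 / (E_CHARGE * field * HBAR))


# Field strength at which the exponent in eq. (5.24) equals -1.
critical_field = 4.0 * M_E**2 * C**3 / (E_CHARGE * HBAR)
print(f"critical field ~ {critical_field:.1e} V/m")
for field in (1e6, 1e17, critical_field):
    print(f"{field:.1e} V/m -> {pair_suppression(field):.3e}")
```

The critical field comes out near 5 × 10¹⁸ V/m; for an everyday-scale field of 10⁶ V/m the suppression factor underflows to zero in double precision.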

Hawking Radiation
Hawking’s observation was that a similar phenomenon can happen in the gravita-
tional field produced by a black hole. As particles and antiparticles pop in and out
of the fermenting froth of the vacuum near the Schwarzschild radius, r = rs , one
member of a pair can fall into the black hole and so be unable to recombine with its
erstwhile partner. And the energy that is released by having this member fall into
the hole can be sufficient to carry its surviving partner far enough away from the
black hole that it can escape. The resulting prediction is that a black hole should
emit a constant stream of elementary particles, now called the Hawking radiation.
To see why sufficient energy is liberated, consider a particle having 4-momentum p^μ = m v^μ, where m is the particle rest-mass and v^μ = dx^μ/dτ is its 4-velocity, moving along a radial trajectory, r = r(t), in a Schwarzschild geometry. Since v · v = gµν v^μ v^ν = −(1 − rs/r)(dt/dτ)² + (1 − rs/r)⁻¹(dr/dτ)² = −1, we have

$$
v^\mu = \begin{pmatrix} \gamma \\ \gamma v \\ 0 \\ 0 \end{pmatrix}, \tag{5.25}
$$

where v = dr/dt and

$$
\gamma = \left[\left(1 - \frac{r_s}{r}\right) - \frac{v^2}{1 - r_s/r}\right]^{-1/2}. \tag{5.26}
$$

Notice that the requirement that dt/dτ = γ be real requires v² ≤ (1 − rs/r)², which approaches zero as r → rs. This limit arises because v is defined using the asymptotic time t, and reflects the breakdown of this coordinate near r = rs due to the infinite redshift that exists between this coordinate and the proper time for freely-falling observers in this limit.
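A short sketch of eqs. (5.25)–(5.27), in units with rs = 2 (an assumption; helper names are hypothetical): it checks the normalization v · v = −1, evaluates the conserved E for a particle at rest and far away, and shows the speed limit v² < (1 − rs/r)² being enforced near the horizon.

```python
import math

RS = 2.0  # units G = M = c = 1 (assumption)


def gamma(r, v):
    """dt/dtau for radial motion with v = dr/dt, eq. (5.26)."""
    f = 1.0 - RS / r
    arg = f - v**2 / f
    if arg <= 0.0:
        raise ValueError("timelike motion needs v**2 < (1 - rs/r)**2")
    return arg ** -0.5


def norm(r, v):
    """g_{mu nu} v^mu v^nu for v^mu = (gamma, gamma v, 0, 0); equals -1."""
    f = 1.0 - RS / r
    g = gamma(r, v)
    return -f * g**2 + (g * v)**2 / f


def conserved_E(r, v):
    """The conserved E = gamma (1 - rs/r) of eq. (5.27)."""
    return gamma(r, v) * (1.0 - RS / r)


print(norm(10.0, 0.5), conserved_E(10.0, 0.0), conserved_E(1e12, 0.6))
try:
    gamma(2.02, 0.01)        # 1 - rs/r is about 0.0099 here, so v = 0.01 is too fast
except ValueError as err:
    print("rejected:", err)
```

Far from the hole E reduces to the special-relativistic γ (1.25 for v = 0.6), while a particle at rest at r = 10 has E = (0.8)^{1/2} < 1.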
Recall that the quantity that is conserved along the trajectory of a particle as it falls into (or climbs out of) the black hole is

$$
E = -g_{tt}\,\frac{dt}{d\tau} = \left(1 - \frac{r_s}{r}\right)\frac{p^t}{m} = \gamma\left(1 - \frac{r_s}{r}\right), \tag{5.27}
$$

and so is the quantity of interest for deciding whether one member of a particle-antiparticle pair can escape to infinity. This is an energy inasmuch as it agrees at r → ∞ with the energy per unit rest mass, E = −u · p/m = −gµν u^μ p^ν/m = −gtt u^t p^t/m, of the particle as seen by a static observer hovering at fixed radius, whose 4-velocity, u^μ, has u^t = (1 − rs/r)^{−1/2} and u^i = 0.
Since E → 1 as r → ∞, the obstacle to having a particle escape to infinity is
that E for the escaping particle must get to unity whereas the sum E1 + E2 for the
particle-antiparticle pair starts at zero (same as in the absence of the pair) and is
conserved as they move along their respective geodesics. In order to have E1 = 1 for
the particle, say, its partner must be able to tunnel to a region for which E2 = −1.
The remarkable thing is that eq. (5.27) shows that this is possible, provided r < rs
because E < 0 in this region. Furthermore, E = −1 can be reached if r gets close
enough to r = 0.
Particle production can therefore occur provided the particle-antiparticle pair can tunnel to a separation of order r ≃ rs, since one particle must remain outside the event horizon (in order to escape) while the other must get deep enough inside to ensure that it reaches a region for which E ≤ −1. Using the quantum amplitude, ψ ≃ e^{−mr}, for a pair of mass m to separate by a distance r leads one to expect a particle production rate that is suppressed by a power of e^{−m rs}.
A more precise calculation does in fact give this result, and the distribution of particles that are released in this way closely resembles what would be expected for the radiation from a hot body, ∝ exp(−m/TH), with the temperature given by

$$
k_B T_H = \frac{\hbar c}{4\pi r_s} = \frac{\hbar c^3}{8\pi GM}, \tag{5.28}
$$

where kB = 1.38 × 10⁻²³ Joule/Kelvin is Boltzmann's constant, which tells how much energy is associated with a given temperature. TH is called the black hole's Hawking temperature, and scales as TH ∝ 1/rs. Numerically, for a solar-mass astrophysical black hole with M = M⊙, this predicts the completely negligible temperature TH ≃ 6 × 10⁻⁸ Kelvin.
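The solar-mass figure follows directly from eq. (5.28). This sketch uses rounded SI constants (our inputs) and evaluates both forms of the formula as a consistency check; the function names are hypothetical.

```python
import math

HBAR = 1.0546e-34   # J s
C = 2.998e8         # m s^-1
G = 6.674e-11       # m^3 kg^-1 s^-2
K_B = 1.38e-23      # J K^-1
M_SUN = 1.99e30     # kg


def hawking_temperature(M):
    """T_H in kelvin from the second form of eq. (5.28)."""
    return HBAR * C**3 / (8.0 * math.pi * G * M * K_B)


def hawking_temperature_from_rs(M):
    """Same temperature from the first form, k_B T_H = hbar c / (4 pi rs)."""
    rs = 2.0 * G * M / C**2
    return HBAR * C / (4.0 * math.pi * rs * K_B)


print(f"T_H(M_sun) = {hawking_temperature(M_SUN):.2e} K")
print(f"T_H(1 g)   = {hawking_temperature(1e-3):.2e} K")
```

A solar-mass hole is far colder than the cosmic microwave background, whereas a one-gram hole would be at roughly 10²⁶ K.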
For thermal emission into radiation the surface brightness (energy loss rate per unit area), f, is completely characterized by the temperature, with f = αB Nr T⁴, where Nr counts the number of species of particles in the radiation and αB = π²kB⁴/(60ℏ³c²) = 5.67 × 10⁻⁸ Watts/(metre)²(Kelvin)⁴ is the Stefan-Boltzmann constant. The total rate of energy loss that is produced in this way far from the black hole, whose surface area is 4πrs², is then of order

$$
\frac{dE}{dt} \simeq -4\pi\alpha_B N_r T_H^4 r_s^2 = -N_r\left(\frac{M_\odot}{M}\right)^2 9.00\times 10^{-29}\ \text{Watts}. \tag{5.29}
$$

Although this is negligible for any astrophysical system, for a black hole with M = 1 gram it is 10⁶⁶ times bigger than for a solar mass, implying a whopping power release of 10³⁸ Watts! Since the black hole energy is given by its mass, the above equation can be read as implying dM/dt ∝ −M⁻², which can be integrated to infer how M varies with time. The result is a monotonically decreasing function that ultimately reaches zero, describing the black hole's evaporation.
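The integration mentioned above is elementary. Writing eq. (5.29) as c² dM/dt = −K/M², where K ≡ Nr (9.00 × 10⁻²⁹ Watts) M⊙² is a shorthand introduced here, gives

$$
M^2\,\frac{dM}{dt} = -\frac{K}{c^2}
\;\Longrightarrow\;
M^3(t) = M_0^3 - \frac{3K}{c^2}\,t
\;\Longrightarrow\;
M(t) = M_0\left(1 - \frac{t}{\tau}\right)^{1/3},
\qquad
\tau = \frac{c^2 M_0^3}{3K}.
$$

For Nr = 1 and M0 = M⊙ this gives τ = c²M⊙/(3 × 9.00 × 10⁻²⁹ W) ≈ 6.6 × 10⁷⁴ s, consistent with the evaporation time quoted in eq. (5.30).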
Because the radiation rate grows as M falls, for relatively small black holes the energy loss due to Hawking radiation can be appreciable. And the more energy that is lost, the smaller becomes the mass of the black hole, making the Hawking temperature (and so also the radiation rate) larger. This is the recipe for a runaway evaporation, wherein the radiation becomes faster and faster, ultimately becoming explosive once the black hole mass gets down to the vicinity of the Planck mass, Mp ≃ 2 × 10⁻⁸ kg. The time taken for a black hole of mass M to evaporate completely in this way turns out to be

$$
\tau_{\rm ev} = \frac{5120\pi G^2 M^3}{\hbar c^4} = 6.62\times 10^{74}\left(\frac{M}{M_\odot}\right)^3\ \text{seconds}. \tag{5.30}
$$

This is much larger than the age of the universe (10¹⁰ years, or 3 × 10¹⁷ seconds) in the case of a solar-mass black hole, but is in the ballpark of 10⁻²⁵ seconds for a one-gram black hole.
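The numbers in eqs. (5.29) and (5.30) can be reproduced with a few lines, assuming rounded SI constants and a single radiated species (Nr = 1); the helper names are hypothetical. The last check compares eq. (5.30) with M c²/(3P), the lifetime obtained by integrating c² dM/dt = −P(M).

```python
import math

HBAR = 1.0546e-34   # J s
C = 2.998e8         # m s^-1
G = 6.674e-11       # m^3 kg^-1 s^-2
K_B = 1.38e-23      # J K^-1
SIGMA = 5.67e-8     # Stefan-Boltzmann constant alpha_B, W m^-2 K^-4
M_SUN = 1.99e30     # kg


def hawking_power(M, n_species=1):
    """|dE/dt| = 4 pi alpha_B N_r T_H^4 rs^2 of eq. (5.29), in watts."""
    T = HBAR * C**3 / (8.0 * math.pi * G * M * K_B)
    rs = 2.0 * G * M / C**2
    return 4.0 * math.pi * SIGMA * n_species * T**4 * rs**2


def evaporation_time(M):
    """tau_ev = 5120 pi G^2 M^3 / (hbar c^4) of eq. (5.30), in seconds."""
    return 5120.0 * math.pi * G**2 * M**3 / (HBAR * C**4)


for label, M in (("solar mass", M_SUN), ("1 gram", 1e-3)):
    print(f"{label:>10}: P = {hawking_power(M):.2e} W, tau = {evaporation_time(M):.2e} s")
```

The small mismatch (well under a percent) between the two lifetime estimates comes only from rounding of the input constants.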
Hawking radiation is one of the few cases where a quantum effect can be reliably
computed in a gravitating environment, and it carries many surprises. It tells us
that very small black holes are unlikely to exist, since they are likely to evaporate
very quickly and explosively. It also turns out that the similarity between black holes
and thermal systems appears to be very deep, with 1/4 of the area of the black hole
event horizon (in Planck units) playing the role of its entropy

$$
S = \frac{\pi r_s^2}{\ell_p^2} = \frac{4\pi GM^2}{\hbar c}, \tag{5.31}
$$
called the Bekenstein-Hawking entropy. The classical evolution of the black hole
then combines precisely with the thermodynamic evolution of any surrounding hot
particles to ensure the validity of the three laws of Thermodynamics (including the
inevitability of the increase of total entropy), in a deep way that even now remains
poorly understood.
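One concrete face of this thermodynamic analogy, added here as our own consistency check rather than something the notes derive, is the first law d(Mc²) = T dS. The sketch below (rounded SI constants; hypothetical helper names) evaluates eq. (5.31) and confirms numerically that kB TH dS matches c² dM when S is measured in units of kB.

```python
import math

HBAR = 1.0546e-34   # J s
C = 2.998e8         # m s^-1
G = 6.674e-11       # m^3 kg^-1 s^-2
K_B = 1.38e-23      # J K^-1
M_SUN = 1.99e30     # kg


def bh_entropy(M):
    """Bekenstein-Hawking entropy in units of k_B, eq. (5.31)."""
    return 4.0 * math.pi * G * M**2 / (HBAR * C)


def hawking_temperature(M):
    """T_H in kelvin, eq. (5.28)."""
    return HBAR * C**3 / (8.0 * math.pi * G * M * K_B)


# First-law check, d(M c^2) = T dS, using a central difference in M.
M = 10.0 * M_SUN
dM = 1e-6 * M
energy_change = C**2 * (2.0 * dM)
heat = K_B * hawking_temperature(M) * (bh_entropy(M + dM) - bh_entropy(M - dM))
print(f"S(M_sun) = {bh_entropy(M_SUN):.2e} k_B, ratio = {heat / energy_change:.9f}")
```

A solar-mass black hole carries an entropy of about 10⁷⁷ kB, and the ratio of the two sides of the first law is 1 to the precision of the finite difference.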

5.6 Rotating Black Holes

The Schwarzschild solution discussed so far describes the unique gravitational field outside of any spherically symmetric source (including a black hole). But because such a source carries no angular momentum, it cannot describe the gravitational field exterior to a rotating source, or the field external to a black hole formed by the collapse of initially rotating matter.

– 90 –
Rotating black holes are instead described by what is called the Kerr metric,9
which is axially symmetric rather than spherically symmetric. The Kerr metric can
be explicitly written using Boyer-Lindquist coordinates, {t, r, θ, φ}, where 0 < θ < π
and 0 < φ < 2π are periodic angular variables (as for spherical polar coordinates),
while both t and r can take arbitrarily large values. It is given by
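$$
\begin{aligned}
ds^2 &= -\left(1 - \frac{2GMr}{\rho^2}\right) dt^2 - \frac{2GMar\sin^2\theta}{\rho^2}\,\big(dt\,d\phi + d\phi\,dt\big) \\
&\qquad + \frac{\rho^2}{\Delta}\,dr^2 + \rho^2\,d\theta^2 + \frac{\sin^2\theta}{\rho^2}\Big[\big(r^2 + a^2\big)^2 - a^2\Delta\sin^2\theta\Big]\,d\phi^2 \\
&= -\frac{\Delta}{\rho^2}\Big[dt - a\sin^2\theta\,d\phi\Big]^2 + \frac{\sin^2\theta}{\rho^2}\Big[\big(r^2 + a^2\big)\,d\phi - a\,dt\Big]^2 + \frac{\rho^2}{\Delta}\,dr^2 + \rho^2\,d\theta^2 ,
\end{aligned}
\tag{5.32}
$$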
where a and GM are positive real parameters with dimensions of length, while ρ(r, θ) and Δ(r) are functions, given explicitly by

$$\Delta := r^2 - 2GMr + a^2 , \tag{5.33}$$

and

$$\rho^2 := r^2 + a^2\cos^2\theta . \tag{5.34}$$

As is straightforward (but tedious) to verify, the Ricci tensor constructed from this metric vanishes, $R_{\mu\nu} = 0$, so it satisfies the vacuum Einstein equations.
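The two lines of eq. (5.32) are algebraically identical. One quick way to see this is to expand the squares in the second line and compare coefficients. The sketch below does this numerically at random points outside the outer horizon, in hypothetical units with GM = 1. The function names are illustrative, not from the notes, and `g_tphi` denotes the coefficient of the symmetrized combination (dt dφ + dφ dt).

```python
import math
import random

def kerr_form1(r, theta, gm, a):
    """(g_tt, g_tphi, g_phiphi) read off from the first line of eq. (5.32)."""
    s2 = math.sin(theta) ** 2
    rho2 = r**2 + a**2 * math.cos(theta) ** 2
    delta = r**2 - 2 * gm * r + a**2
    g_tt = -(1 - 2 * gm * r / rho2)
    g_tphi = -2 * gm * a * r * s2 / rho2
    g_phiphi = s2 / rho2 * ((r**2 + a**2) ** 2 - a**2 * delta * s2)
    return g_tt, g_tphi, g_phiphi

def kerr_form2(r, theta, gm, a):
    """Same components, from expanding the squares in the second line of (5.32)."""
    s2 = math.sin(theta) ** 2
    rho2 = r**2 + a**2 * math.cos(theta) ** 2
    delta = r**2 - 2 * gm * r + a**2
    # -(Delta/rho2)(dt - a s2 dphi)^2 + (s2/rho2)((r^2 + a^2) dphi - a dt)^2
    g_tt = (-delta + a**2 * s2) / rho2
    g_tphi = (a * s2 * delta - a * s2 * (r**2 + a**2)) / rho2
    g_phiphi = (-a**2 * s2**2 * delta + s2 * (r**2 + a**2) ** 2) / rho2
    return g_tt, g_tphi, g_phiphi

random.seed(0)
worst = 0.0
for _ in range(1000):
    gm, a = 1.0, random.uniform(0.0, 1.0)
    r = random.uniform(2.5, 50.0)           # safely outside r_+ <= 2 GM
    theta = random.uniform(0.01, math.pi - 0.01)
    for x, y in zip(kerr_form1(r, theta, gm, a), kerr_form2(r, theta, gm, a)):
        worst = max(worst, abs(x - y) / max(1.0, abs(x)))
print(f"largest relative mismatch: {worst:.1e}")
```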
For r ≫ a and r ≫ GM these functions become ρ ≃ r and Δ ≃ r² − 2GMr, and so the metric becomes

$$ds^2 \simeq -\left(1 - \frac{2GM}{r}\right) dt^2 - \frac{2GMa\sin^2\theta}{r}\,\big(dt\,d\phi + d\phi\,dt\big) + \left(1 + \frac{2GM}{r}\right) dr^2 + r^2\big(d\theta^2 + \sin^2\theta\,d\phi^2\big) , \tag{5.35}$$

up to terms that are subdominant by two powers of 1/r. This asymptotes to Minkowski space in spherical polar coordinates as r → ∞, showing that this geometry is asymptotically flat at large r.

Keeping terms of order 1/r shows $g_{tt} \simeq -1 + 2GM/r$, and so the Newtonian potential seen by very distant observers is Φ = −GM/r, as appropriate for an object of mass M (where G, as usual, denotes Newton's gravitational constant). This interpretation of M as the black hole mass is also supported by taking the a → 0 limit for arbitrary r, in which case (5.32) becomes the Schwarzschild metric, with $r_s = 2GM$.
The dependence on θ implies the metric (5.32) has less symmetry than the Schwarzschild metric; in particular it is not spherically symmetric. It is, however, symmetric under independent constant shifts of the coordinates t and φ, showing that it is both time-translation invariant (i.e. 'stationary') and invariant under rotations for which θ remains fixed. For the asymptotically flat geometry at large r, shifts of φ at fixed θ correspond to rotations about the z-axis only.

[9] Both Schwarzschild and Kerr solutions to Einstein's equations are named after their discoverers.
As usual there is a conserved angular momentum associated with this rotational invariance. But because the invariance is only about the z-axis, there is only a single conserved quantity, J, instead of the three components of an angular-momentum vector, $\mathbf{J}$. This conserved angular momentum turns out to be related to a by

$$J = Ma , \tag{5.36}$$

so the a → 0 limit corresponds to turning off the geometry's angular momentum (a limit in which, as seen above, the geometry becomes Schwarzschild).

The presence of the dt dφ + dφ dt term implies that the Kerr geometry is not 'static' (unlike the Schwarzschild geometry), i.e. it is not invariant under the time reversal t → −t, even though it is stationary.[10] The absence of time-reversal invariance is what one should expect for nonzero J, because time reversal also changes the sign of J. Indeed, the Kerr solution remains invariant under t → −t if at the same time we take a → −a.
In the limit M → 0 with a fixed, the metric (5.32) becomes

$$ds^2 = -dt^2 + \frac{r^2 + a^2\cos^2\theta}{r^2 + a^2}\,dr^2 + \big(r^2 + a^2\cos^2\theta\big)\,d\theta^2 + \big(r^2 + a^2\big)\sin^2\theta\,d\phi^2 . \tag{5.37}$$

This is again flat space, but written in ellipsoidal coordinates, which are related to Cartesian coordinates by

$$x = \sqrt{r^2 + a^2}\,\sin\theta\cos\phi , \qquad y = \sqrt{r^2 + a^2}\,\sin\theta\sin\phi , \qquad z = r\cos\theta . \tag{5.38}$$

Surfaces of constant r in these coordinates are ellipsoids that satisfy

$$\frac{x^2 + y^2}{r^2 + a^2} + \frac{z^2}{r^2} = 1 . \tag{5.39}$$

As r → 0 these ellipsoids degenerate to a circular disk, x² + y² ≤ a², at z = 0. The centre of this disk corresponds to cos θ = ±1, and its boundary at x² + y² = a² corresponds to cos θ = 0.
[10] Strictly speaking, a geometry is stationary when it has a time-like Killing vector field, $\xi^\mu$ (see the discussion around eq. (3.38)). It is static if this vector field is also 'hypersurface orthogonal', i.e. perpendicular to surfaces of constant t.
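To confirm that eq. (5.37) really is flat space, one can pull back the Euclidean metric dx² + dy² + dz² through the map (5.38) and compare with the spatial part of (5.37). The sketch below does this numerically using central finite differences. The step size h, the sample points and the function names are arbitrary choices made for this illustration.

```python
import math

def to_cartesian(r, theta, phi, a):
    """The map (5.38) from ellipsoidal (r, theta, phi) to Cartesian (x, y, z)."""
    big_r = math.sqrt(r**2 + a**2)
    return (big_r * math.sin(theta) * math.cos(phi),
            big_r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def pulled_back_metric(r, theta, phi, a, h=1e-5):
    """g_ij = sum_k (dx_k/dq_i)(dx_k/dq_j), derivatives by central differences."""
    q = [r, theta, phi]
    jac = []  # jac[i][k] = d x_k / d q_i
    for i in range(3):
        qp, qm = list(q), list(q)
        qp[i] += h
        qm[i] -= h
        xp = to_cartesian(qp[0], qp[1], qp[2], a)
        xm = to_cartesian(qm[0], qm[1], qm[2], a)
        jac.append([(u - v) / (2 * h) for u, v in zip(xp, xm)])
    return [[sum(jac[i][k] * jac[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

def metric_537(r, theta, a):
    """Spatial part of eq. (5.37) in the (r, theta, phi) basis."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    return [[(r**2 + a**2 * c2) / (r**2 + a**2), 0.0, 0.0],
            [0.0, r**2 + a**2 * c2, 0.0],
            [0.0, 0.0, (r**2 + a**2) * s2]]

worst = 0.0
for (r, theta, phi, a) in [(0.3, 0.4, 1.0, 1.0), (2.0, 1.2, 4.0, 0.7), (5.0, 2.9, 0.1, 3.0)]:
    num, exact = pulled_back_metric(r, theta, phi, a), metric_537(r, theta, a)
    worst = max(worst, max(abs(num[i][j] - exact[i][j]) for i in range(3) for j in range(3)))
print(f"largest deviation: {worst:.1e}")
```

The off-diagonal entries vanishing to numerical precision is the statement that these coordinates are orthogonal.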

Event horizons and Ergosphere
The Kerr geometry describes the spacetime surrounding a spinning black hole. It is a black hole inasmuch as there is a region of the spacetime from which it is impossible to escape to spatial infinity. The boundary of this region defines an 'event horizon', through which the flow of test particles is purely a one-way trip. To locate this and other physically significant surfaces, consider various families of observers moving within this spacetime.
The first class of observers to consider are those who simply 'hover' at fixed r, θ and φ. These observers remain at rest relative to stationary observers at infinity, relative to whose clocks the hovering observers experience time dilation (or redshift). The 4-velocity, $u^\mu$, of any such hovering observer points purely in the t direction and must be time-like or null, so that $g_{\mu\nu}u^\mu u^\nu = g_{tt}(u^t)^2 \le 0$. Increments of proper time, dτ, for such an observer are given by

$$d\tau^2 = \left(1 - \frac{2GMr}{\rho^2}\right) dt^2 . \tag{5.40}$$

Such observers are only possible when 2GMr < ρ² = r² + a² cos²θ. Outside the larger root of r² − 2GMr + a² cos²θ = 0, this means

$$r > GM + \sqrt{(GM)^2 - a^2\cos^2\theta} , \tag{5.41}$$

and for radii smaller than this all timelike observers must also move in the direction of the black hole's rotation. At the equator (for which θ = π/2 and so cos θ = 0) this amounts to r > 2GM, just like the corresponding condition for Schwarzschild. The bound lies at smaller radii at higher latitudes, with hovering observers allowed for $r > r_+ := GM + \sqrt{(GM)^2 - a^2}$ at the poles (for which cos θ = ±1). Eq. (5.41) describes the exterior of the 'ergosphere', which is defined as the region within which it is impossible to simply hover at fixed r, θ and φ.
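The boundary in eq. (5.41) is easy to tabulate. The sketch below uses hypothetical units with GM = 1 and an illustrative spin a = 0.9. It evaluates the ergosphere radius $r_E(\theta)$ and checks that $g_{tt} = -(1 - 2GMr/\rho^2)$ vanishes there. The function names are assumptions of this example.

```python
import math

def ergosphere_radius(theta, gm, a):
    """Outer root of r^2 - 2 GM r + a^2 cos^2(theta) = 0, the boundary in eq. (5.41)."""
    disc = gm**2 - (a * math.cos(theta)) ** 2
    if disc < 0:
        raise ValueError("no real boundary: need |a cos(theta)| <= GM")
    return gm + math.sqrt(disc)

def g_tt(r, theta, gm, a):
    """g_tt = -(1 - 2 GM r / rho^2) from eq. (5.32)."""
    rho2 = r**2 + (a * math.cos(theta)) ** 2
    return -(1 - 2 * gm * r / rho2)

gm, a = 1.0, 0.9
for deg in (0, 30, 60, 90):
    theta = math.radians(deg)
    r_e = ergosphere_radius(theta, gm, a)
    print(f"theta = {deg:2d} deg: r_E = {r_e:.4f}, g_tt there = {g_tt(r_e, theta, gm, a):.1e}")
```

The boundary shrinks from 2GM at the equator to $r_+$ at the poles, as described above.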
Consider next a photon that moves in the equatorial plane (cos θ = 0), initially with no radial velocity. Such a photon instantaneously has a 4-momentum pointing purely in the φ and t directions, and so satisfies $g_{tt}\,dt^2 + g_{t\phi}(dt\,d\phi + d\phi\,dt) + g_{\phi\phi}\,d\phi^2 = 0$. Therefore

$$\frac{d\phi}{dt} = -\frac{g_{t\phi}}{g_{\phi\phi}} \pm \sqrt{\left(\frac{g_{t\phi}}{g_{\phi\phi}}\right)^2 - \frac{g_{tt}}{g_{\phi\phi}}} . \tag{5.42}$$

Evaluating this right at the boundary of the ergosphere (which for θ = π/2 corresponds to r = 2GM) implies $g_{tt} = 0$, and so

$$\frac{d\phi}{dt} = 0 \qquad \text{or} \qquad \frac{d\phi}{dt} = -\frac{2g_{t\phi}}{g_{\phi\phi}} = \frac{a}{2(GM)^2 + a^2} . \tag{5.43}$$
These show that a photon moving in a retrograde sense relative to the black hole's rotation has zero transverse speed when at the edge of the ergosphere. A massive particle that is not moving radially at this radius moves more slowly than a photon, and so must be carried along by the rotation within the ergosphere. By contrast, a photon moving in the same sense as the black hole's rotation has nonzero speed. This suggests that the edge of the ergosphere is unlikely also to define the event horizon in the equatorial plane.[11]
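Eq. (5.43) can be reproduced by evaluating the equatorial metric components of (5.32) at r = 2GM and solving the quadratic (5.42) directly. The sketch below does so in hypothetical units with GM = 1. The spin value a = 0.6 and the function name are arbitrary illustrative choices.

```python
import math

def equatorial_photon_rates(r, gm, a):
    """Both roots dphi/dt of eq. (5.42) at theta = pi/2, where rho^2 = r^2.

    g_tphi is the coefficient of (dt dphi + dphi dt) in eq. (5.32).
    Returns (retrograde root, prograde root) for a > 0.
    """
    delta = r**2 - 2 * gm * r + a**2
    g_tt = -(1 - 2 * gm / r)
    g_tphi = -2 * gm * a / r
    g_phiphi = ((r**2 + a**2) ** 2 - a**2 * delta) / r**2
    b = g_tphi / g_phiphi
    root = math.sqrt(b**2 - g_tt / g_phiphi)
    return -b - root, -b + root

gm, a = 1.0, 0.6
slow, fast = equatorial_photon_rates(2 * gm, gm, a)
print(slow, fast, a / (2 * gm**2 + a**2))
```

Evaluating at a larger radius (say r = 4) gives a negative retrograde root. There, counter-rotating photons can still make progress against the rotation.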
This disagreement between the position of the event horizon and the boundary of the ergosphere arises because Kerr is stationary but not static. To identify the position of the event horizon, consider the trajectory r(t) of a radially outgoing light ray. This satisfies ds² = 0, and so r(t) must satisfy

$$\frac{dr}{dt} = \sqrt{\frac{\Delta}{\rho^2}\left(1 - \frac{2GMr}{\rho^2}\right)} . \tag{5.44}$$

The radial position, r, no longer increases with increasing t once the right-hand side of this equation vanishes. This occurs either when 2GMr = ρ² = r² + a² cos²θ or when Δ = r² − 2GMr + a² = 0.

The vanishing at 2GMr = ρ² does not mark a horizon. It proves instead to reflect a breakdown, inside the ergosphere, of the ability to use the coordinate t to parameterize time along a timelike curve. Instead, it is the radii for which Δ(r) = 0 that turn out to correspond to event horizons for the Kerr metric, where $g^{rr} = 1/g_{rr}$ vanishes. This implies the event horizons occur as surfaces of constant r, at the specific values

$$r = r_\pm := GM \pm \sqrt{(GM)^2 - a^2} . \tag{5.45}$$

External observers only access information from outside the outermost of the two event horizons, i.e. the one at $r = r_+$. Precisely as for the Schwarzschild geometry, the apparent singularity of the metric at $r_\pm$ is only an artifact of the breakdown there of the coordinates {t, r, θ, φ}.
Notice that the external horizon becomes the Schwarzschild horizon, $r_+ \to 2GM$, as a → 0, and in this limit it also coincides with the boundary of the ergosphere (for all θ). For a ≠ 0 the ergosphere touches the outer horizon only at the poles. Elsewhere (for all cos²θ < 1) it lies strictly outside the outer horizon.
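The relationship between the horizons (5.45) and the ergosphere boundary (5.41) can be checked numerically. The sketch below again uses hypothetical units with GM = 1 and an illustrative spin a = 0.9. It computes $r_\pm$ and confirms that the ergosphere boundary never lies inside $r_+$ and meets it only at the poles. The function names are assumptions of the example.

```python
import math

def kerr_horizons(gm, a):
    """(r_plus, r_minus) from eq. (5.45); these are the roots of Delta(r) = 0."""
    disc = gm**2 - a**2
    if disc < 0:
        raise ValueError("a > GM: Delta has no real roots, so there is no horizon")
    return gm + math.sqrt(disc), gm - math.sqrt(disc)

def ergosphere_radius(theta, gm, a):
    """Outer boundary of the ergosphere, eq. (5.41)."""
    return gm + math.sqrt(gm**2 - (a * math.cos(theta)) ** 2)

def delta(r, gm, a):
    """Delta(r) = r^2 - 2 GM r + a^2, eq. (5.33)."""
    return r**2 - 2 * gm * r + a**2

gm, a = 1.0, 0.9
r_plus, r_minus = kerr_horizons(gm, a)
gaps = [ergosphere_radius(math.pi * k / 100, gm, a) - r_plus for k in range(101)]
print(f"r+ = {r_plus:.4f}, r- = {r_minus:.4f}, smallest gap = {min(gaps):.1e}")
```

Raising an error for a > GM mirrors the observation below that the horizons only exist when a ≤ GM.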
Both the boundary of the ergosphere and the event horizons are only real for all θ if a ≤ GM, in which case the black-hole angular momentum satisfies the upper bound

$$J = Ma \le GM^2 . \tag{5.46}$$

This is believed to be a physical condition for black holes. Geometries with a > GM turn out to have regions of infinite curvature that are not masked by event horizons (what are called 'naked singularities'), which are unstable and are believed to be unphysical.

[11] Recall that for a Schwarzschild black hole a photon cannot have a tangential component of momentum right at the horizon, r = 2GM.

6. Other Astrophysical Applications

The universe is a violent place, containing many examples of matter situated in very
extreme environments. Many of the most violent of these involve black holes located
in galactic centres whose masses are many millions of times the mass of our Sun.
These release enormous amounts of energy as material falls into the black hole, in
amounts that can only be understood within a relativistic framework.
Furthermore, increasingly sophisticated surveying techniques are now mapping out larger and larger regions of the universe. This allows a more detailed understanding of how much matter is out there, where it is, and how it interacts with its surroundings. Since most of this material turns out to be dark, there is a high premium on understanding how it gravitates, because gravity provides the only observational handle on where it is.
Many of these studies rely heavily on General Relativity, and some are accurate
enough to provide precision tests of the theory that are similar in spirit to those
performed in the solar system. This section summarizes a few of these.

6.1 Stellar interiors

For an astrophysical object like a star the properties of the event horizon are irrelevant. The Schwarzschild geometry only applies down to the star's radius, $R_\star$, below which we must re-solve Einstein's equations in the presence of matter, $T_{\mu\nu} \ne 0$. To illustrate how this works, this section finds this interior geometry using a simple model of the physics of the star. The absence of stable orbits in the Schwarzschild solution too close to $r = r_s$ suggests that stars should not be able to stave off gravitational collapse if they become too dense, $R_\star \sim r_s$. This expectation is borne out in detail in the analysis below.
If the star is spherically symmetric then the arguments made earlier show that
it is always possible to choose coordinates so that the metric has the form

$$ds^2 = -e^{2a}\,dt^2 + e^{2b}\,dr^2 + r^2\big(d\theta^2 + \sin^2\theta\,d\phi^2\big) , \tag{6.1}$$

where we may take a = a(r) and b = b(r) if the star’s interior is time-independent.
The goal is to solve for these functions using the field equations,

$$G_{\mu\nu} \equiv R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = 8\pi G\, T_{\mu\nu} , \tag{6.2}$$

given a simple choice for $T_{\mu\nu}$. To this end we require the components of the Einstein tensor, $G_{\mu\nu}$, which can be found using eqs. (4.12):

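$$
\begin{aligned}
G_{tt} &= \frac{e^{2(a-b)}}{r^2}\big(2r\,\partial_r b - 1 + e^{2b}\big) , \qquad G_{rr} = \frac{1}{r^2}\big(2r\,\partial_r a + 1 - e^{2b}\big) , \\
G_{\theta\theta} &= r^2 e^{-2b}\left[\partial_r^2 a + (\partial_r a)^2 - \partial_r a\,\partial_r b + \frac{1}{r}\big(\partial_r a - \partial_r b\big)\right]
\end{aligned}
\tag{6.3}
$$

and $G_{\phi\phi} = G_{\theta\theta}\sin^2\theta$.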


For the stress energy, we take the stellar interior to be a perfect fluid characterized by an energy density, ρ, and pressure, p. These are related by some sort of equation of state, p = p(ρ, S), where S is the fluid's entropy. Any such fluid must have a local rest frame, whose 4-velocity is denoted by $u^\mu(x)$, where as usual $g_{\mu\nu}u^\mu u^\nu = -1$.
To determine the stress tensor for such a fluid, we appeal to the principle of equivalence. First consider the limit of flat space, for which we would like $T_{tt} = \rho$ and $T_{ij} = p\,\delta_{ij}$ in the fluid's rest frame (for which $u^\mu = (1, 0, 0, 0)$). This implies that $T_{\mu\nu}$, written in terms of $u^\mu$, ρ and p, must be given by

$$T_{\mu\nu} = (\rho + p)\,u_\mu u_\nu + p\, g_{\mu\nu} , \tag{6.4}$$

where $g_{\mu\nu} = \eta_{\mu\nu}$ in flat space. The principle of equivalence says that this same expression should also hold in the presence of a gravitational field. This is because it is a generally covariant expression that agrees with the flat-space result of special relativity in the special frame for which $g_{\mu\nu} = \eta_{\mu\nu}$.
Evaluating eq. (6.4) using the metric, eq. (6.1), leads to the following components for $T_{\mu\nu}$:

$$T_{tt} = e^{2a}\rho , \qquad T_{rr} = e^{2b} p , \qquad T_{\theta\theta} = r^2 p , \tag{6.5}$$

and $T_{\phi\phi} = T_{\theta\theta}\sin^2\theta$. The expression of energy conservation for this metric, $\nabla^\mu T_{\mu\nu} = 0$, then implies

$$(\rho + p)\,\frac{da}{dr} = -\frac{dp}{dr} . \tag{6.6}$$
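Using eqs. (6.5) in the Einstein equations leads to three independent expressions:

$$
\begin{aligned}
\frac{e^{-2b}}{r^2}\big(2r\,\partial_r b - 1 + e^{2b}\big) &= 8\pi G\rho \qquad (tt \text{ equation}) \\
\frac{e^{-2b}}{r^2}\big(2r\,\partial_r a + 1 - e^{2b}\big) &= 8\pi G p \qquad (rr \text{ equation}) \\
e^{-2b}\left[\partial_r^2 a + (\partial_r a)^2 - \partial_r a\,\partial_r b + \frac{1}{r}\big(\partial_r a - \partial_r b\big)\right] &= 8\pi G p \qquad (\theta\theta \text{ equation}) .
\end{aligned}
\tag{6.7}
$$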
Since the (tt) equation does not involve a(r), it can be put into a more physically intuitive form by performing a change of variables from b(r) to

$$m(r) = \frac{r}{2G}\big(1 - e^{-2b}\big) , \tag{6.8}$$

for which $e^{2b} = [1 - 2Gm(r)/r]^{-1}$. In terms of this variable the (tt) equation becomes

$$\frac{dm}{dr} = 4\pi r^2 \rho , \tag{6.9}$$

which integrates to give

$$m(r) = 4\pi \int_0^r d\hat r\, \hat r^2 \rho(\hat r) . \tag{6.10}$$
If the boundary of the star is taken to be $r = R_\star$, then for $r > R_\star$ the geometry is given by the Schwarzschild metric. Continuity of the metric across $r = R_\star$ then requires m(r) to satisfy the boundary condition $m(R_\star) = M$, where M is the mass of the star. That is,

$$M = 4\pi \int_0^{R_\star} d\hat r\, \hat r^2 \rho(\hat r) . \tag{6.11}$$

This last equation almost (but not quite) says that m(r) is the integral of the energy density out to radius r, and so that M is the integral of this energy density over the entire volume of the star. The qualification 'almost' is required here because the integral of the energy density should really be weighted by the covariant measure of volume. This measure involves the determinant of the entire spatial metric, $\sqrt{\det g_{ij}}\; d^3x = e^b r^2 \sin\theta\, dr\, d\theta\, d\phi$, and so the integrated energy is really given by

$$M_{\rm tot} = 4\pi \int_0^{R_\star} d\hat r\, e^{b(\hat r)}\, \hat r^2 \rho(\hat r) = 4\pi \int_0^{R_\star} d\hat r\, \frac{\hat r^2 \rho(\hat r)}{[1 - 2Gm(\hat r)/\hat r]^{1/2}} > M . \tag{6.12}$$

This shows that $M_{\rm tot}$ is better thought of as the energy the star would have if its constituents were dispersed to infinity, and so had no gravitational field. The difference $M_{\rm tot} - M$ is then the star's gravitational binding energy.
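For a uniform-density star both integrals in eqs. (6.11) and (6.12) can be evaluated easily, which makes the binding energy concrete. The sketch below assumes constant ρ (the incompressible model used later in this section), units with G = R⋆ = 1, and Simpson's rule for the integral. The function name and grid size are illustrative choices. For weak fields the result should approach the Newtonian binding energy $3GM^2/(5R_\star)$, which follows from expanding the square root in (6.12) to first order.

```python
import math

def masses_uniform_star(compactness, n=2000):
    """Return (M, M_tot) of eqs. (6.11)-(6.12) for constant density, units G = R_star = 1.

    compactness = 2GM/R_star must be below 1. For constant density
    m(r) = M r^3, so 2Gm(r)/r = compactness * r^2.
    """
    if n % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    mass = compactness / 2.0                  # M = compactness * R_star / (2G)
    rho = 3.0 * mass / (4.0 * math.pi)        # constant density filling r < 1

    def integrand(r):
        # 4 pi r^2 rho e^{b(r)}, with e^{b} = [1 - 2Gm(r)/r]^{-1/2}
        return 4.0 * math.pi * rho * r**2 / math.sqrt(1.0 - compactness * r**2)

    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    return mass, total * h / 3.0

for c in (1e-3, 0.1, 0.5, 0.8):
    m, m_tot = masses_uniform_star(c)
    print(f"2GM/R = {c:<6}: (M_tot - M)/M = {(m_tot - m) / m:.4e}")
```

The fractional binding energy grows rapidly as $2GM/R_\star$ approaches the values at which, as shown below, no static solution exists.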
Trading b(r) for m(r) in the (rr) equation then gives the following result for a(r):

$$\frac{da}{dr} = \frac{Gm(r) + 4\pi G r^3 p}{r\,[r - 2Gm(r)]} . \tag{6.13}$$

Rather than simplifying this further using the (θθ) equation, it is easier to use conservation of energy, eq. (6.6), to trade da/dr for dp/dr, giving

$$\frac{dp}{dr} = -\frac{(\rho + p)\,[Gm(r) + 4\pi G r^3 p]}{r\,[r - 2Gm(r)]} . \tag{6.14}$$
This equation, called the Tolman-Oppenheimer-Volkoff equation, expresses how the pressure profile in the star's interior must adjust in order to balance gravity and so support the star's outer layers. It provides the condition of hydrostatic equilibrium for the interior of the star. In particular, so long as p and ρ are both positive and r > 2Gm(r), eq. (6.14) implies dp/dr < 0. The pressure therefore decreases monotonically with radius within the star, taking its maximum value at the star's centre, r = 0.
Notice that in the Newtonian limit we may take p ≪ ρ, since the energy density is dominated by the rest mass of the atoms in the star. We may also take r ≫ 2Gm(r), allowing eq. (6.14) to be approximated by the more familiar form

$$\frac{dp}{dr} = -\frac{Gm(r)\,\rho}{r^2} . \tag{6.15}$$
This equation states that the pressure gradient adjusts to ensure that the net force acting on any particular fluid element vanishes. To see this, consider a small fluid element that extends from r to r + dr with cross-sectional area A. Since pressure is force per unit area, the radial component of the net fluid force acting on this element is

$$dF_p = p(r)A - p(r + dr)A \simeq -\frac{dp}{dr}\, A\, dr . \tag{6.16}$$

Eq. (6.15) states that this force must balance the gravitational attraction between the matter in the fluid element (whose mass is ρA dr) and the matter that lies interior to it in the star (whose mass is m(r)). The radial component of this attraction is

$$dF_g = -\frac{Gm(r)\,\rho A}{r^2}\, dr . \tag{6.17}$$
Implications for stellar phenomenology
In general, hydrostatic equilibrium relates dp/dr to ρ and m (which is itself related to ρ). This can be integrated to obtain explicit profiles, p(r) and ρ(r), once an equation of state is given, like p = p(ρ, S), where S is the entropy density of the fluid. For example, for an ideal gas one might use p = κρT. Here κ is a constant related to the mass per particle of the atoms making up the fluid, and T is the local fluid temperature (and so is related to its entropy).
Once such an equation of state is known p can be eliminated (in principle) in
terms of ρ, as can m using eq. (6.10), allowing eq. (6.14) to be regarded as an
equation involving ρ only. This can be integrated, typically numerically, to give the
profile ρ(r) from which the equation of state then gives p(r), while m(r) and a(r) are
obtained from eqs. (6.10) and (6.13). Of course this process can become complicated
in detail if the changes in pressure and density trigger phase changes in the stellar
material, or in the dominant mechanism for energy transfer within the star, but the
logic still remains the same in such cases provided one is careful to use the proper
new expression relating p and ρ in the relevant areas.
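The integration procedure just described can be sketched in a few lines. The example below is a minimal sketch under several assumptions:

- It uses a hypothetical polytropic equation of state, p = Kρ^Γ, chosen only because it is simple. The notes do not commit to any particular equation of state.
- It works in units with G = 1.
- It integrates eqs. (6.9) and (6.14) outward from the centre with a fixed-step fourth-order Runge-Kutta scheme, stopping when the pressure vanishes.

The function returns R⋆ and M for a given central density ρ⋆, which illustrates how a one-parameter family of stars emerges. For Γ = 2 and low central density the result should approach the Newtonian n = 1 polytrope, whose radius is π√(K/2πG).

```python
import math

def integrate_tov(rho_c, K=1.0, gamma=2.0, G=1.0, dr=1e-4):
    """Integrate dm/dr (6.9) and the TOV equation (6.14) outward from r = 0.

    Assumes the hypothetical equation of state p = K rho**gamma.
    Returns (R_star, M): the radius where p first reaches zero, and m(R_star).
    """
    def rho_of(p):
        return (p / K) ** (1.0 / gamma)

    def rhs(r, m, p):
        p = max(p, 0.0)             # trial RK stages may step just past the surface
        rho = rho_of(p)
        dm = 4.0 * math.pi * r**2 * rho
        dp = -(rho + p) * (G * m + 4.0 * math.pi * G * r**3 * p) / (r * (r - 2.0 * G * m))
        return dm, dp

    # Start one step off-centre, using m ~ (4/3) pi rho_c r^3 to avoid r = 0.
    r = dr
    m = 4.0 / 3.0 * math.pi * rho_c * r**3
    p = K * rho_c**gamma
    while p > 0.0:
        k1 = rhs(r, m, p)
        k2 = rhs(r + dr / 2, m + dr / 2 * k1[0], p + dr / 2 * k1[1])
        k3 = rhs(r + dr / 2, m + dr / 2 * k2[0], p + dr / 2 * k2[1])
        k4 = rhs(r + dr, m + dr * k3[0], p + dr * k3[1])
        m += dr / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p += dr / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        r += dr
    return r, m

radius, mass = integrate_tov(rho_c=1e-4)
print(f"R = {radius:.4f} (Newtonian n = 1 value {math.sqrt(math.pi / 2):.4f}), M = {mass:.4e}")
```

Scanning `rho_c` over a range of values traces out the M(R⋆) relation discussed next.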
The upshot is that an assumed equation of state leads to a prediction for all three of these profiles that depends on a single integration constant. This constant is usually taken to be the value of the energy density at the stellar centre: $\rho_\star = \rho(r = 0)$. What is important is that the two external properties of a star, its mass M and radius $R_\star$, must therefore be related to one another, because both can be calculated once $\rho_\star$ is known. The stellar radius is calculable because it may be defined as the radius $r = R_\star$ where $p(R_\star) = 0$. The mass is then found by using eq. (6.10) at $r = R_\star$. Because these two quantities are both predicted from the one integration constant, one expects to find a relation $M = M(R_\star)$ that relates all stars that share the same equation of state.
The importance of this observation is that both M and $R_\star$ can often be determined by observations. For instance, the mass can often be found by observing how other objects orbit around the given star. Such orbits exist for a surprisingly large number of stars, since just under half of all stars are found in binary (or higher multiple) systems, with two or more stars orbiting one another. In practice, however, the two stars are usually required to eclipse one another (from the Earth's point of view) in order to obtain the stellar mass. This is because Kepler's third law gives the mass in terms of the orbital period and semi-major axis, but the semi-major axis can only be determined if the orientation of the stellar orbit relative to the line of sight is known.
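As an illustration of the Kepler's-third-law step, the sketch below inverts $P^2 = 4\pi^2 a^3/(G M_{\rm tot})$ for the total mass of a two-body system. Here a is the true (not projected) semi-major axis of the relative orbit, so the orbital inclination is assumed to be already known, which is exactly what an eclipsing geometry provides. Applying it to the Earth's orbit, with standard values for the astronomical unit and the sidereal year, recovers the solar mass. The function name and constant values are choices made for this example.

```python
import math

G = 6.6743e-11                  # [m^3 kg^-1 s^-2]
AU = 1.495978707e11             # astronomical unit [m]
YEAR = 365.256363 * 86400.0     # sidereal year [s]

def total_mass_from_orbit(semi_major_axis_m, period_s):
    """Kepler's third law, M_tot = 4 pi^2 a^3 / (G P^2), with a the true semi-major axis."""
    return 4.0 * math.pi**2 * semi_major_axis_m**3 / (G * period_s**2)

m_sun_estimate = total_mass_from_orbit(AU, YEAR)
print(f"{m_sun_estimate:.4e} kg")   # close to 1.989e30 kg
```

If only a projection a sin i of the semi-major axis were available, the same formula would return M sin³i rather than M. This is why the orbital orientation matters.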
The radius, on the other hand, is more easily observable because it typically controls the star's overall luminosity, L, defined as its rate of energy emission. L depends on $R_\star$ because stars emit energy thermally. They therefore do so with a flux (i.e. rate per unit surface area) that is characterized purely by their surface temperature: $f = f(T) = \sigma T^4$, where σ is the Stefan-Boltzmann constant. This temperature can be measured from the spectrum of radiation the star emits. The total luminosity, $L = 4\pi f R_\star^2$, can be found from the total observed brightness of the star (once its distance from the Earth is known). From these two measurements the radius $R_\star$ can be inferred.
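Inverting $L = 4\pi\sigma T^4 R_\star^2$ gives $R_\star = \sqrt{L/4\pi\sigma T^4}$. The sketch below applies this to the Sun, using the nominal solar luminosity and effective temperature as assumed inputs. It should return a value close to the solar radius (about 6.96 × 10⁸ m). The function name is hypothetical.

```python
import math

SIGMA = 5.670374419e-8   # Stefan-Boltzmann constant [W m^-2 K^-4]

def radius_from_luminosity(luminosity_w, temperature_k):
    """Solve L = 4 pi R^2 sigma T^4 for R (black-body emission assumed)."""
    return math.sqrt(luminosity_w / (4.0 * math.pi * SIGMA * temperature_k**4))

r_sun = radius_from_luminosity(3.828e26, 5772.0)
print(f"{r_sun:.3e} m")
```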
As it happens, 90% of stars are predominantly made up of hydrogen. They provide the pressure gradients required to stave off gravitational collapse by fusing hydrogen into helium in their cores. Consequently they share the same equation of state, and so their pressure, density and temperature profiles are all calculable in terms of their central density, $\rho_\star$. They should therefore be expected to fall along a single curve $M(R_\star)$ if their masses and radii are plotted in the $M - R_\star$ plane. Astronomers actually test this prediction by instead plotting luminosity against temperature and looking for a correlation between L and T, since these are the two most easily observable quantities. And they indeed find that most stars (known as main sequence stars) do fall along a curve when plotted in the L − T plane (known as a Hertzsprung-Russell diagram).
When mass can be measured, it is also observed to be correlated with luminosity when main sequence stars are plotted in the M − L plane. The energy source in stars ultimately comes from nuclear reactions, whose rates are extremely sensitive to temperature. Small increases in mass therefore lead to fairly small increases in the central temperature, but these lead to a large change in luminosity. Observationally one finds the strong variation $L \propto M^{3.5}$, with more massive stars being much more luminous.

[Figure 9: A Hertzsprung-Russell (HR) diagram showing the correlation between stellar luminosity and temperature.]
For ordinary stars the balance between pressure and gravity is precarious. It relies on the pressures associated with the energy released by nuclear fusion, which becomes possible at the high temperatures and pressures that occur in stellar cores. This balance can only last as long as there is nuclear fuel to burn in this way, and so ends once this fuel is depleted. Furthermore, since the main sequence lifetime is of order τ ∝ M/L, the observed mass-luminosity correlation shows that $\tau \propto M^{-2.5}$. More massive stars therefore have much shorter lifetimes than lighter ones.
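The scaling argument can be made concrete with a couple of lines of arithmetic. The sketch below assumes the empirical relation L ∝ M^3.5 quoted above, takes τ ∝ M/L, and normalizes to the Sun. The solar main-sequence lifetime of roughly 10¹⁰ years used for the absolute numbers is an assumed round figure, not a value from these notes.

```python
TAU_SUN_YEARS = 1e10   # assumed round value for the Sun's main-sequence lifetime

def lifetime_years(mass_in_suns, exponent=3.5):
    """tau = tau_sun * (M/L) with L proportional to M**exponent, i.e. tau ~ M**(1 - exponent)."""
    return TAU_SUN_YEARS * mass_in_suns ** (1.0 - exponent)

for m in (0.5, 1.0, 10.0, 30.0):
    print(f"M = {m:5.1f} M_sun: tau ~ {lifetime_years(m):.1e} yr")
```

A ten-solar-mass star then lasts only a few tens of millions of years, roughly three hundred times shorter than the Sun.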
At some point either a new, more stable, source of pressure must be found to
balance gravity if a permanent object is to be formed, or gravity wins – leading to a
runaway gravitational collapse.

An incompressible star
To see in more detail what the options are for balancing gravity with various forms of pressure, it is instructive to specialize to the very simple case of an incompressible fluid, $\rho = \rho_\star$ for all p. This represents the extreme case where the stellar material resists changing its density regardless of how high the pressure gets. It also has the advantage of allowing explicit solutions that illustrate the behaviour in the more general case.
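Suppose, then, that we assume the incompressible density profile

$$\rho(r) = \begin{cases} \rho_\star & \text{if } r < R_\star \\ 0 & \text{if } r > R_\star \end{cases} , \tag{6.18}$$

which is characterized by the two parameters $\rho_\star$ and $R_\star$. In this case we may directly integrate to obtain m(r), leading to

$$m(r) = \begin{cases} 4\pi\rho_\star r^3/3 & \text{if } r < R_\star \\ 4\pi\rho_\star R_\star^3/3 = M & \text{if } r > R_\star \end{cases} , \tag{6.19}$$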
which last relation allows one to trade $\rho_\star$ for M, so that $R_\star$ and M can be used as the independent parameters. Similarly, the pressure profile found by integrating (6.14) becomes

$$p(r) = \rho_\star \left[ \frac{R_\star\sqrt{R_\star - r_s} - \sqrt{R_\star^3 - r_s r^2}}{\sqrt{R_\star^3 - r_s r^2} - 3R_\star\sqrt{R_\star - r_s}} \right] \quad \text{if } r < R_\star , \tag{6.20}$$

where, as usual, $r_s = 2GM$. Notice that the pressure goes to zero at $r = R_\star$, i.e. $p(R_\star) = 0$, as expected for the surface of a star in hydrostatic equilibrium.
Similarly, integrating eq. (6.13) gives the metric component $g_{tt} = -e^{2a}$:

$$e^{a(r)} = \frac{3}{2}\left(1 - \frac{r_s}{R_\star}\right)^{1/2} - \frac{1}{2}\left(1 - \frac{r_s r^2}{R_\star^3}\right)^{1/2} \quad \text{if } r < R_\star . \tag{6.21}$$

Notice that this implies $e^{2a(R_\star)} = 1 - r_s/R_\star$, as required by continuity with the exterior Schwarzschild solution.
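The closed-form profiles (6.20) and (6.21) can be checked directly in three ways:

- the pressure should vanish at the surface;
- $e^{2a}$ should match the exterior Schwarzschild value there;
- p(r) should satisfy the TOV equation (6.14) with m(r) from (6.19).

The sketch below performs these checks in units with G = R⋆ = 1 and an illustrative compactness $r_s/R_\star = 0.5$, using a central finite difference for dp/dr. The function names and step size are assumptions of this example. Note that ρ⋆ is not independent here: it must equal 3M/(4πR⋆³) for (6.20) to solve (6.14).

```python
import math

R, RS = 1.0, 0.5                           # units G = R_star = 1; r_s = 2GM
M = RS / 2.0                               # M = r_s / (2G)
RHO = 3.0 * M / (4.0 * math.pi * R**3)     # rho_star consistent with eq. (6.19)

def pressure(r):
    """Interior pressure, eq. (6.20)."""
    s = math.sqrt(R**3 - RS * r**2)
    return RHO * (R * math.sqrt(R - RS) - s) / (s - 3.0 * R * math.sqrt(R - RS))

def exp_a(r):
    """e^{a(r)} inside the star, eq. (6.21)."""
    return 1.5 * math.sqrt(1.0 - RS / R) - 0.5 * math.sqrt(1.0 - RS * r**2 / R**3)

def tov_rhs(r):
    """Right-hand side of eq. (6.14) with m(r) = 4 pi rho r^3 / 3 and G = 1."""
    m, p = 4.0 * math.pi * RHO * r**3 / 3.0, pressure(r)
    return -(RHO + p) * (m + 4.0 * math.pi * r**3 * p) / (r * (r - 2.0 * m))

h = 1e-5
worst = max(abs((pressure(r + h) - pressure(r - h)) / (2 * h) - tov_rhs(r)) / abs(tov_rhs(r))
            for r in (0.1, 0.3, 0.5, 0.7, 0.9))
print(f"p(R) = {pressure(R):.1e}, e^(2a(R)) = {exp_a(R)**2:.6f}, TOV residual = {worst:.1e}")
```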
It is the pressure equation, eq. (6.20), which says something really interesting.
Recall that it implies the pressure goes to zero at the stellar surface, p(R? ) = 0, and
then grows monotonically as one moves into the interior (i.e. for decreasing r), as is
required by hydrostatic equilibrium. The maximum pressure reached is at the stellar
center, and is given by
$$p_{\max} = p(0) = \rho_\star\, \frac{\sqrt{R_\star - r_s} - \sqrt{R_\star}}{\sqrt{R_\star} - 3\sqrt{R_\star - r_s}} . \tag{6.22}$$

Notice in particular that if we increase M (and so also $r_s$) for fixed $R_\star$, then p(0) → ∞ once $r_s = \frac{8}{9}R_\star$, or $M_{\max} = \frac{4}{9}(R_\star/G)$. This states that once the star becomes too dense it is completely impossible to support it against gravitational collapse. A similar conclusion is reached using more realistic equations of state, and for these it is also true that $M_{\max} \le \frac{4}{9}(R_\star/G)$, a result known as Buchdahl's theorem. This is as one might have expected: it is an incompressible fluid that supports the largest possible mass.
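The divergence of eq. (6.22) is easy to see numerically. Writing x = r_s/R⋆, one has p(0)/ρ⋆ = (√(1−x) − 1)/(1 − 3√(1−x)), whose denominator vanishes at x = 8/9. The sketch below tabulates this ratio. Restoring factors of c, it also evaluates the Buchdahl bound $M_{\max} = 4R_\star c^2/(9G)$ for an illustrative radius of 10 km, a typical neutron-star size chosen here only as an example. The constant values and function names are assumptions of this illustration.

```python
import math

G = 6.6743e-11        # [m^3 kg^-1 s^-2]
C = 2.99792458e8      # [m/s]
M_SUN = 1.98847e30    # [kg]

def central_pressure_ratio(x):
    """p(0)/rho_star from eq. (6.22), with x = r_s / R_star (valid for x < 8/9)."""
    s = math.sqrt(1.0 - x)
    return (s - 1.0) / (1.0 - 3.0 * s)

def buchdahl_max_mass(radius_m):
    """M_max = 4 R c^2 / (9 G): the mass at which r_s = 2GM/c^2 reaches (8/9) R."""
    return 4.0 * radius_m * C**2 / (9.0 * G)

for x in (0.1, 0.5, 0.8, 0.88, 0.888):
    print(f"r_s/R = {x:<5}: p(0)/rho = {central_pressure_ratio(x):.4g}")
print(f"M_max for R = 10 km: {buchdahl_max_mass(1.0e4) / M_SUN:.2f} solar masses")
```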
If M is larger than $M_{\max}$ for a given equation of state, then no static solution is possible and the star collapses. It continues to collapse until either the equation of state changes so that M becomes smaller than the new $M_{\max}$, or the entire star falls below $r = r_s$, forming a black hole. For real astrophysical objects there are a number of stable objects which can be formed in this way. These include planets (for which gravity is balanced by material stresses); white dwarf stars (for which gravity is balanced by electron degeneracy pressure); and neutron stars (for which gravity is balanced by neutron degeneracy pressure). These can remain stable indefinitely, unless additional matter is added to them in such a way as to push them over the limit of stability. (Some supernovae can arise when white dwarfs are pushed over their limits in this way within binary star systems.)

6.2 Gravitational Lensing


Since we can now see objects that are very distant in the Universe, we should expect to find a reasonably large number of coincidences in which distant galaxies appear to lie very close to the same line of sight as nearer galaxies in the foreground. Because of this we expect the widespread occurrence of gravitational lensing, wherein light from very distant galaxies is deflected by the gravitational field of a foreground mass. This kind of lensing has in fact been seen many times. An example is the strong lensing shown in fig. 10, where the arcs are lensed images of a distant galaxy distorted by a large cluster of galaxies in the foreground. Other examples of lensing have also been seen. These include the micro-lensing of stars in our galaxy (and in nearby galaxies) by other stars that pass along the intervening line of sight, and the weak lensing that slightly distorts the shapes of a great many galaxies across the sky.

[Figure 10: A photograph of gravitational lensing (the arc-like shapes) of distant galaxies by a foreground galaxy cluster.]
This section describes the basics of such lensing events. One should keep in mind that lensing phenomena are typically not used to test GR, because comparatively little is known about the properties of the foreground masses that do the lensing. This makes it difficult to obtain precise predictions with which to compare the observations. What is done instead is to use the observed lensing to infer the distribution of foreground matter, under the assumption that GR provides a good description of the lensing physics. It is arguments such as these that point to the widespread existence throughout the Universe of an unknown form of matter, called Dark Matter, whose presence is only inferred from its gravitational effects.

Lensing Basics
The starting point for the story is the basic observation, derived in section 3.3, that General Relativity predicts that light rays passing close to a spherical gravitational source are deflected through an angle

$$\alpha \simeq \frac{4GM}{b} = \frac{2r_s}{b} , \tag{6.23}$$

where M is the source's mass and b is the impact parameter of the passing light ray.
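For orientation, eq. (6.23) applied to a ray grazing the Sun gives the classic solar-system value. The sketch below restores factors of c (so α = 4GM/(c²b)), takes the impact parameter equal to the solar radius, and converts to seconds of arc. The constant values are standard ones assumed for this example, and the function name is hypothetical.

```python
import math

G = 6.6743e-11        # [m^3 kg^-1 s^-2]
C = 2.99792458e8      # [m/s]
M_SUN = 1.98847e30    # [kg]
R_SUN = 6.957e8       # nominal solar radius [m]
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi

def deflection_angle(mass_kg, impact_parameter_m):
    """alpha = 4GM/(c^2 b) = 2 r_s / b, in radians (weak-field, small-angle)."""
    return 4.0 * G * mass_kg / (C**2 * impact_parameter_m)

alpha = deflection_angle(M_SUN, R_SUN)
print(f"{alpha:.3e} rad = {alpha * ARCSEC_PER_RAD:.2f} arcsec")
```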
The goal is to re-express this angle in terms of quantities that are better suited to what is actually measured when a lensing event is seen. In particular, rather than knowing the deflection angle, α, it is more useful to know the angular position, θ, of the image, I, relative to the angular position, β, of the source, S, as seen by the observer, O. Fig. 11 shows how these are related. A great help when using this figure to solve for θ is the early recognition that for most events the distances involved are enormous, and the deflection angles are consequently very small. Because of this we can idealize the change of direction of the light ray as being completely localized at a single instant, when it passes the plane of the lens in the sky. We can also liberally use the approximation sin x ≃ x for x ≪ 1.

[Figure 11: A diagram of the geometry of a lensing event.]
Inspection of the top of the figure shows that the angles α, β and θ are related by

$$\theta D_S = \beta D_S + \alpha D_{LS} , \tag{6.24}$$

and so dividing through by $D_S$ and eliminating α using eq. (6.23), with $b \simeq \xi \simeq \theta D_L$, gives

$$\theta = \beta + \frac{\theta_E^2}{\theta} , \tag{6.25}$$

where the Einstein angle, $\theta_E$, is defined in terms of the distances in the problem by

$$\theta_E = \sqrt{\frac{2 r_s D_{LS}}{D_S D_L}} . \tag{6.26}$$
Solving eq. (6.25) for θ gives the desired solutions, θ = θ±, for the angular positions of the two perceived images (one on each side of the lens, in the plane defined by the observer, source and lens), with

$$\theta_\pm = \frac{1}{2}\left(\beta \pm \sqrt{\beta^2 + 4\theta_E^2}\right) . \tag{6.27}$$
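A quick consistency check of eq. (6.27) involves three properties. Both roots should satisfy the lens equation (6.25). Their product should be $-\theta_E^2$ and their sum β (as for any quadratic θ² − βθ − θ_E² = 0). And β = 0 should give the Einstein ring θ = ±θ_E. The sketch below measures angles in units of θ_E, and the function name is hypothetical.

```python
import math

def image_positions(beta, theta_e=1.0):
    """The two image angles theta_plus, theta_minus of eq. (6.27)."""
    root = math.sqrt(beta**2 + 4.0 * theta_e**2)
    return 0.5 * (beta + root), 0.5 * (beta - root)

for beta in (0.0, 0.5, 1.0, 3.0):
    tp, tm = image_positions(beta)
    # residual of the lens equation (6.25): theta - beta - theta_E^2 / theta
    residuals = [t - beta - 1.0 / t for t in (tp, tm)]
    print(f"beta = {beta}: theta+ = {tp:.4f}, theta- = {tm:.4f}, "
          f"max residual = {max(map(abs, residuals)):.1e}")
```

The negative root θ₋ corresponds to the image on the opposite side of the lens from the source.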

In the degenerate situation in which the lens lies directly in front of the source (i.e. β = 0, so that the observer, lens and source do not define a plane), the observed image would be an Einstein ring that surrounds the lens, whose angular radius is θ = θ_E.
To get an idea of how big this ring is, suppose the source and lens are as distant from each other as the lens is from us, $D_{LS} \simeq D_L := D$, and so $D_S \simeq 2D$. Then if D ≃ 1 Mpc[12] and the lens has a mass $M \simeq 10^8 M_\odot$, its Schwarzschild radius would be $r_s \simeq 3 \times 10^{11}$ m. Eq. (6.26) then gives $\theta_E \simeq \sqrt{r_s/D} \simeq 3 \times 10^{-6}$ radians, or about 0.6 seconds of arc.
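The estimate can be reproduced by evaluating eq. (6.26) directly. The sketch below also derives the parsec from its definition (the distance at which one AU subtends one arc-second), as explained in footnote [12]. The distances D_L = D_LS = 1 Mpc, D_S = 2 Mpc and the lens mass 10⁸ M_⊙ are the illustrative values of the text, while the constant values and function name are assumptions of this example.

```python
import math

G = 6.6743e-11                          # [m^3 kg^-1 s^-2]
C = 2.99792458e8                        # [m/s]
M_SUN = 1.98847e30                      # [kg]
AU = 1.495978707e11                     # [m]
ARCSEC = math.pi / (180.0 * 3600.0)     # one arc-second in radians
PARSEC = AU / math.tan(ARCSEC)          # about 3.086e16 m
MPC = 1.0e6 * PARSEC

def einstein_angle(mass_kg, d_l, d_s, d_ls):
    """theta_E = sqrt(2 r_s D_LS / (D_S D_L)), eq. (6.26), with r_s = 2GM/c^2."""
    r_s = 2.0 * G * mass_kg / C**2
    return math.sqrt(2.0 * r_s * d_ls / (d_s * d_l))

theta_e = einstein_angle(1e8 * M_SUN, d_l=MPC, d_s=2 * MPC, d_ls=MPC)
print(f"parsec = {PARSEC:.4e} m, theta_E = {theta_e:.2e} rad = {theta_e / ARCSEC:.2f} arcsec")
```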
When the source and lens are instead only slightly offset, this ring breaks up into two arcs, much like those seen in fig. 10, and these are relatively easy to recognize. There are also several ways to check that two candidate objects in the sky are really multiple, lensed images of the same source. One is to compare their spectra, which should be identical for two images of the same source because (unlike for lenses in the lab) the bending of light by gravity is the same for all wavelengths. Another is to watch for correlations in any time-dependence in the intensity of the received light. Any fluctuations in the intensity of one image must be repeated in the other, possibly after a delay due to any difference in the path length along the two light trajectories. Time lags of this sort are observed for pairs of gravitationally lensed images, with changes in one image followed by changes in the other, often several weeks later.
But lensing events are not always so easy to identify. The lenses are often too dark to see, and the images needn't be so strongly distorted if the lens and source are not aligned sufficiently closely. Alternatively, for some objects the angle θ_E can be too small to be resolved. It turns out that there are nonetheless sometimes useful ways of searching for lensing that do not rely on directly detecting the separate images of a particular source.

[Figure 12: A plot of θ₊ (upper red) and θ₋ (lower blue) vs β, in units of θ_E.]
Weak Lensing
When looking at a field of view filled with distant galaxies, evidence for even relatively weak lensing can be found using statistical methods, even if it is hopeless to find multiple images of individual galaxies. This evidence relies on statistically identifying the distortion that lensing produces in a galaxy's shape.
[12] Mpc denotes a megaparsec, or $10^6$ parsecs, which is a commonly used distance unit in extragalactic astronomy. A parsec is an astronomical measure of distance, defined to be the distance at which one AU subtends an angle of one arc-second, where an astronomical unit (AU) is the mean Earth-Sun distance. This makes a parsec about 3.262 light years, or $3.086 \times 10^{16}$ m, which is roughly the distance to the nearest stars. So 1 Mpc ≃ $3 \times 10^{22}$ m.
To quantify this distortion imagine describing the sky using angular coordinates
that are centered on the position of the foreground object that is responsible for the
lensing. In this case, as before, we use θ to describe the ‘radial’ angle of an image
away from the lens, and ϕ to measure the ‘azimuthal’ angle of the image transverse
to the radial direction θ. Lensing only moves the image of the source away from or towards the lens (in the θ direction), with one image inside and one outside θ = θE, but does not change ϕ.
In terms of these coordinates, suppose a narrow beam of light rays has angular widths ∆θ and ∆ϕ when it leaves the source. Since the source is displaced relative to the lens by the angle β, the spread ∆θ can be interpreted as a spread ∆β in the initial angular position of the beam relative to the lens. Once the beam has been lensed its new angular position relative to the lens is θ±, and although the spread in the beam in the ϕ direction remains unchanged, in the θ direction the spread becomes

$$ \Delta\theta_\pm = \left(\frac{d\theta_\pm}{d\beta}\right)\Delta\beta = \frac12\left[1 \pm \frac{\beta}{\sqrt{\beta^2 + 4\theta_E^2}}\right]\Delta\beta . \tag{6.28} $$
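Eq. (6.28) is easy to check numerically. The sketch below assumes the point-lens image positions θ± = ½[β ± √(β² + 4θE²)] (the expressions whose β-derivatives appear in eq. (6.28)) and compares a central finite-difference derivative with the closed form; the function names are illustrative only.

```python
import math

def image_positions(beta, theta_E):
    """Point-lens image angles theta_± = (beta ± sqrt(beta^2 + 4 theta_E^2)) / 2."""
    s = math.sqrt(beta**2 + 4 * theta_E**2)
    return 0.5 * (beta + s), 0.5 * (beta - s)

def radial_stretch(beta, theta_E):
    """Closed form of eq. (6.28): dtheta_±/dbeta = (1 ± beta / sqrt(beta^2 + 4 theta_E^2)) / 2."""
    s = math.sqrt(beta**2 + 4 * theta_E**2)
    return 0.5 * (1 + beta / s), 0.5 * (1 - beta / s)

def numerical_stretch(beta, theta_E, h=1e-6):
    """Central finite-difference estimate of dtheta_±/dbeta."""
    up = image_positions(beta + h, theta_E)
    down = image_positions(beta - h, theta_E)
    return tuple((u - d) / (2 * h) for u, d in zip(up, down))

# Angles measured in units of theta_E.
for beta in (0.2, 1.0, 3.0):
    print(beta, radial_stretch(beta, 1.0), numerical_stretch(beta, 1.0))
```

Since the two factors in eq. (6.28) are each between 0 and 1 and add up to 1, both images are compressed in the θ direction relative to the original width ∆β.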
Because of this distortion the images of a galaxy that would have appeared circular in the absence of the lens become elliptical in a precisely calculable way. Observationally, the problem is that galaxies are not perfectly round, and so the trick is to distinguish the distortions due to lensing from general oddities in galactic shapes. This is where statistics comes in, because galaxies are usually randomly oriented in the sky and come in a fairly random pattern of shapes. By contrast, the images of galaxies in the part of the sky near a lens are all preferentially distorted in a direction fixed by their position relative to the lens (they are stretched perpendicular to the line joining them to the lens). If one examines a large sample of galaxies in a particular part of the sky and finds a bias for galaxies to be distorted (on average) in a particular direction, this can be interpreted as evidence for lensing by a foreground mass lying in the direction singled out by this bias. By repeating this process over and over again for nearby regions it is possible to map the foreground mass distribution that is doing the lensing, regardless of whether or not this distribution is directly visible.
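The statistical argument can be illustrated with a toy simulation. It assumes (a simplification made here, not a description of any real survey pipeline) that each galaxy's shape is summarized by a two-component ellipticity (e1, e2), that intrinsic orientations are random, and that weak lensing simply adds a small shear to e1 at first order; the sample size, intrinsic scatter and shear values are hypothetical.

```python
import numpy as np

def mean_ellipticity(n_galaxies, sigma_intrinsic, shear, seed=1):
    """Average observed ellipticity components for a toy galaxy sample.

    Intrinsic shapes have random orientations, so each component averages to zero;
    lensing is modelled (to first order) as adding `shear` to the e1 component.
    Returns (mean e1, mean e2, standard error of either mean).
    """
    rng = np.random.default_rng(seed)
    angles = rng.uniform(0.0, np.pi, n_galaxies)                       # random position angles
    magnitudes = np.abs(rng.normal(0.0, sigma_intrinsic, n_galaxies))  # intrinsic ellipticities
    e1 = magnitudes * np.cos(2 * angles) + shear
    e2 = magnitudes * np.sin(2 * angles)
    # Each component has variance <magnitude^2>/2; divide by N for the error of the mean.
    std_error = np.sqrt(0.5 * np.mean(magnitudes**2) / n_galaxies)
    return e1.mean(), e2.mean(), std_error

m1, m2, err = mean_ellipticity(100_000, 0.3, 0.02)
print(f"<e1> = {m1:.4f} ± {err:.4f}   <e2> = {m2:.4f} ± {err:.4f}")
```

A 2% distortion is buried under roughly 30% intrinsic scatter for any single galaxy, but averaging 10^5 galaxies pushes the noise below 0.1%, which is why the method only works statistically.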
Maps of the mass distribution in the universe produced by weak lensing surveys
of this type are just now (2008) being performed, and are providing one of the main
lines of evidence for the existence of vast amounts of Dark Matter throughout the
universe (more about which later).

Microlensing
Lensing can also be applied to objects in our own galaxy and in nearby galaxies (like the Magellanic Clouds, which are two small galaxies that orbit our own), since stars, planets and other objects can occasionally pass into the line of sight towards other stars. Seeking these kinds of lensing events could allow us to count the number of relatively small and dark objects that may be floating about the galaxy, otherwise unseen.
A major complication in this case is the very small size of the angular deflection, however, since a solar-mass lens situated 10 kpc away (a typical galactic distance) lensing a source that is 10 kpc beyond it would give (using 10 kpc ≃ 3 × 10^20 m) θE ≃ 2 × 10^-9 radians, or 5 × 10^-4 arc seconds. Angles this small are too small to measure from Earth, even if two stars could be found lying this close to the same line of sight.
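The size of θE quoted above can be reproduced with a few lines of arithmetic. The sketch assumes the standard point-lens expression θE² = (4GM/c²) D_LS/(D_L D_S), with D_L the lens distance, D_S the source distance and D_LS = D_S − D_L (labels introduced here for the sketch), together with standard SI constants. It gives θE of about 3 × 10^-9 radians, the same order of magnitude as the rough estimate in the text.

```python
import math

G = 6.674e-11              # m^3 kg^-1 s^-2
C = 2.998e8                # m/s
M_SUN = 1.989e30           # kg
KPC = 3.086e19             # m
ARCSEC_PER_RAD = 206264.8

def einstein_angle(mass, d_lens, d_source):
    """Assumed point-lens form: theta_E^2 = (4 G M / c^2) * D_LS / (D_L * D_S)."""
    d_ls = d_source - d_lens          # adequate for distances within the galaxy
    return math.sqrt(4 * G * mass / C**2 * d_ls / (d_lens * d_source))

theta_E = einstein_angle(M_SUN, d_lens=10 * KPC, d_source=20 * KPC)
print(f"theta_E ≈ {theta_E:.1e} rad ≈ {theta_E * ARCSEC_PER_RAD:.1e} arcsec")
```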
But even if the two images of the source star cannot be seen separately, taken together they increase the total amount of light received at the Earth from the source, compared with what would have arrived in the absence of lensing. Although we do not know how bright the source star intrinsically is, we know that within our galaxy stars are moving, with an average speed of roughly 200 km/sec. So although the absolute brightness of the unresolved images cannot be compared with a known initial source brightness, the change in brightness of the images as the lens and source move into and out of alignment can be measured.

Figure 13: A sketch of the angular distortion of the two lensed images.

What does this change of brightness look like? Since a star emits radiation
thermally, its surface brightness depends only on its temperature and so its apparent
brightness as seen by any given observer is controlled purely by the total fraction of
the star’s radiation that the observer is able to catch. And this fraction is controlled
by the solid angle that the source subtends as seen by the observer. (This is why
the apparent brightness of a star usually falls off with distance, d, from the star like
1/d².) The increase in brightness due to the lensing may therefore be computed by
calculating the increased solid angle subtended due to the splitting and distorting of
the lensed images.
Consider then a beam of light coming from the source at θ = β having a small angular width ∆θ = ∆β and ∆ϕ. The solid angle, observed from Earth, spanned by this beam at the source is then ∆Ω = sin θ ∆θ ∆ϕ ≃ β ∆β ∆ϕ. Once it has been lensed, we have seen that the beam acquires a new angular position, θ = θ±, and widths ∆θ± = (dθ±/dβ)∆β and ∆ϕ. After the lensing the beam subtends the new solid angle ∆Ω± ≃ |θ± ∆θ± ∆ϕ|, where the absolute value arises because θ− is negative. The change in intensity due to the imaging is therefore given by the ratio of these two solid angles summed over the two images:

$$ \frac{I_{\rm lens}}{I_{\rm source}} = \sum_{i=\pm}\frac{|\theta_i\,\Delta\theta_i\,\Delta\phi|}{\beta\,\Delta\beta\,\Delta\phi} = \sum_{i=\pm}\left|\frac{\theta_i}{\beta}\,\frac{d\theta_i}{d\beta}\right| = \frac12\left[\frac{\beta}{\sqrt{\beta^2+4\theta_E^2}} + \frac{\sqrt{\beta^2+4\theta_E^2}}{\beta}\right] . \tag{6.29} $$

Notice that since f(x) = ½(x + 1/x) ≥ 1 (with equality occurring only when x = 1), we have Ilens ≥ Isource, with equality occurring only if θE/β → 0.
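Eq. (6.29) can be checked term by term. The sketch below assumes the point-lens image positions θ± = ½[β ± √(β² + 4θE²)], evaluates the two image contributions |(θi/β) dθi/dβ| separately, and compares their sum with the closed form on the right of eq. (6.29); angles are in units of θE and the function names are illustrative.

```python
import math

def image_magnifications(beta, theta_E):
    """The two terms of eq. (6.29): |(theta_i / beta) dtheta_i/dbeta| for i = +, -."""
    s = math.sqrt(beta**2 + 4 * theta_E**2)
    theta_plus, theta_minus = 0.5 * (beta + s), 0.5 * (beta - s)
    slope_plus, slope_minus = 0.5 * (1 + beta / s), 0.5 * (1 - beta / s)
    return abs(theta_plus / beta * slope_plus), abs(theta_minus / beta * slope_minus)

def total_magnification(beta, theta_E):
    """Closed form on the right-hand side of eq. (6.29)."""
    s = math.sqrt(beta**2 + 4 * theta_E**2)
    return 0.5 * (beta / s + s / beta)

for u in (0.1, 0.5, 1.0, 2.0, 5.0):          # u = beta / theta_E
    a_plus, a_minus = image_magnifications(u, 1.0)
    print(f"u = {u:3.1f}: A+ = {a_plus:7.3f}  A- = {a_minus:6.3f}  "
          f"sum = {a_plus + a_minus:7.3f}  closed form = {total_magnification(u, 1.0):7.3f}")
```

The output also shows that the brighter image always exceeds the fainter one by exactly one unit of the unlensed brightness, and that the total falls towards 1 once β is a few θE.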
The time-dependence enters this intensity because β varies with time as the relative positions of the source and lens change. The maximum change occurs once β ≃ θE, and so for lens and source a distance of order D away, the time required for a maximal change of intensity can be estimated to be τ ≃ θE D/v, where v ≃ 200 km/sec is the typical speed of galactic objects. Taking D ≃ 10 kpc and θE ≃ 2 × 10^-9 radians, as above, then gives the estimate τ ≃ 0.1 years, or roughly a month.

Figure 14: Observational traces of brightness vs day of observation for candidate microlensing events (from the MACHO Collaboration).
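The timescale estimate, and the shape of the resulting brightening, can be sketched numerically. The code below reproduces τ ≃ θE D/v from the numbers quoted above. For the light curve it additionally assumes (an assumption made here, not derived in the text) that the lens moves across the line of sight in a straight line at constant speed, so that u(t) = β(t)/θE = √(u0² + ((t − t0)/τ)²) with a hypothetical closest approach u0. Inserting this into eq. (6.29), rewritten in the equivalent form (u² + 2)/(u√(u² + 4)), gives the standard symmetric (Paczyński) microlensing light curve.

```python
import math

KPC = 3.086e19     # metres per kiloparsec
YEAR = 3.156e7     # seconds per year
DAY = 86400.0      # seconds per day

theta_E = 2e-9     # radians (the estimate used in the text)
D = 10 * KPC       # order-of-magnitude distance to lens and source
v = 200e3          # typical transverse speed in m/s

tau = theta_E * D / v
print(f"tau ≈ {tau:.1e} s ≈ {tau / YEAR:.2f} yr ≈ {tau / DAY:.0f} days")

def magnification(u):
    """Eq. (6.29) rewritten with u = beta / theta_E."""
    return (u * u + 2) / (u * math.sqrt(u * u + 4))

def light_curve(t, t0, tau, u0):
    """Brightening factor at time t for straight-line relative motion (u0 is hypothetical)."""
    u = math.sqrt(u0**2 + ((t - t0) / tau) ** 2)
    return magnification(u)

for day in range(0, 121, 20):                       # sample a four-month window
    print(day, round(light_curve(day * DAY, 60 * DAY, tau, u0=0.3), 3))
```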
Although it might seem like
a million-to-one shot to happen to see a lens and source line up in precisely this
way, these kinds of microlensing events have been sought by dedicating a telescope
to repeatedly photograph large fields of stars over many nights, and then looking
for the few stars whose brightness changes. Such a search inevitably finds various
types of variable stars, whose brightness changes for other reasons internal to the
star, but these can be identified by seeing how their pattern of variation differs at
different wavelengths. Once these are removed, a handful of bona fide microlensing
events remain, some of which are shown in fig. 14. The frequency of these events
is consistent with what is known for small stellar and planetary objects, but is too
small to account for the dark matter (whose presence is inferred in all galaxies from
measurements of how they rotate).

6.3 Gravitational Waves
Waves are a generic consequence of relativistic field theory, and correspond to the
fact that information can only travel out through the field at a finite speed (at most,
the speed of light), bringing the news to other particles about how their sources have
moved. For the special case of General Relativity, since gravity is represented as the
geometry of spacetime, gravitational waves are ripples in the fabric of spacetime itself.
These are generated when masses are moved relative to one another. These waves
are the precise analogs of the electromagnetic waves that are generated by moving
electrical charges, and which we know as light, radio waves, x-rays etc., depending
on their frequency.
To understand many of the properties of gravitational waves it suffices to consider
very small geometrical ripples about flat spacetime, for which the metric has the form

$$ g_{\mu\nu}(x) = \eta_{\mu\nu} + h_{\mu\nu}(x) , \tag{6.30} $$

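where hµν represents a small deviation that depends on position and time. Calculating the Christoffel symbols and curvature tensor, but dropping all terms that are quadratic and higher in the small quantity hµν, leads to a Ricci tensor of the form

$$ R_{\mu\nu} = -\frac12\,\eta^{\alpha\beta}\Big(\partial_\alpha\partial_\beta h_{\mu\nu} - \partial_\mu\partial_\alpha h_{\beta\nu} - \partial_\nu\partial_\alpha h_{\beta\mu} + \partial_\mu\partial_\nu h_{\alpha\beta}\Big) + O(h^2) . \tag{6.31} $$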
We can simplify this by using the freedom to change coordinates: a small change, x^µ → x^µ + ξ^µ, leads to

$$ h_{\mu\nu} \to h_{\mu\nu} + \eta_{\nu\lambda}\,\partial_\mu\xi^\lambda + \eta_{\mu\lambda}\,\partial_\nu\xi^\lambda , \tag{6.32} $$

up to terms that are quadratic in ξ^µ. A convenient choice is to use the four independent functions in ξ^µ to impose the following four independent constraints on hµν:

$$ \eta^{\sigma\nu}\,\partial_\sigma\Big(h_{\mu\nu} - \frac12\,\eta_{\mu\nu}\,\eta^{\alpha\beta}h_{\alpha\beta}\Big) = 0 , \tag{6.33} $$

since with this choice the vacuum Einstein equations become, to linear order in h,

$$ R_{\mu\nu} = -\frac12\,\Box h_{\mu\nu} = 0 , \tag{6.34} $$

where □ = η^{αβ}∂α∂β denotes the d'Alembertian operator (which was introduced in the earlier sections devoted to special relativity).
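The step from eq. (6.31) to eq. (6.34) is short but worth spelling out. Writing h ≡ η^{αβ}h_{αβ} for the trace, condition (6.33) says that the divergence of hµν equals half the gradient of h. Differentiating this once more turns the second and third terms of eq. (6.31) into ½∂µ∂ν h each, and these cancel the last term:

$$
\begin{aligned}
\eta^{\alpha\beta}\,\partial_\alpha h_{\beta\nu} &= \tfrac12\,\partial_\nu h
  \qquad\Longrightarrow\qquad
  \eta^{\alpha\beta}\,\partial_\mu\partial_\alpha h_{\beta\nu} = \tfrac12\,\partial_\mu\partial_\nu h ,\\
R_{\mu\nu} &= -\tfrac12\Big[\Box h_{\mu\nu} - \tfrac12\,\partial_\mu\partial_\nu h - \tfrac12\,\partial_\nu\partial_\mu h + \partial_\mu\partial_\nu h\Big] + O(h^2)
  = -\tfrac12\,\Box h_{\mu\nu} + O(h^2) .
\end{aligned}
$$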
The significance of this last equation is that it is a wave equation, as may be seen by writing it out without the benefit of the Einstein summation convention (and re-introducing the factors of c):

$$ \Box h_{\mu\nu} = -\frac{1}{c^2}\,\frac{\partial^2 h_{\mu\nu}}{\partial t^2} + \nabla^2 h_{\mu\nu} = 0 . \tag{6.35} $$

This has as solutions arbitrary linear combinations of plane waves,

$$ h_{\mu\nu}(x) = \varepsilon_{\mu\nu}(k)\,\exp\big[i k_\mu x^\mu\big] , \tag{6.36} $$

where the quantity εµν describes the wave's polarization (of which there are two independent forms, more about which below), and the 4-vector k^µ must satisfy

$$ k^2 = \eta_{\mu\nu}\,k^\mu k^\nu = 0 . \tag{6.37} $$

Writing $k^\mu = \{\omega, \mathbf{k}\}$, eq. (6.37) implies $\mathbf{k} = \omega\,\hat{\mathbf{k}}$, where $\hat{\mathbf{k}} := \mathbf{k}/|\mathbf{k}|$ is the unit vector normal to the plane of the wave-front. The plane wave then becomes

$$ \exp\big[i k_\mu x^\mu\big] = \exp\big[-i\omega t + i\,\mathbf{k}\cdot\mathbf{x}\big] = \exp\Big[-i\omega\big(t - \hat{\mathbf{k}}\cdot\mathbf{x}\big)\Big] . \tag{6.38} $$

General spatial profiles, hµν(x), are built as linear combinations of the above solutions (i.e. by Fourier transformation). Eq. (6.38) implies the waves are functions of the combination t − x/c, where $x = \hat{\mathbf{k}}\cdot\mathbf{x}$ is the distance along the direction of propagation and the factors of c are restored. This shows that wave profiles propagate with speed c: both gravitational and electromagnetic waves move at the speed of light.
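A quick numerical way to see the statement about propagation speed is to apply a finite-difference version of the operator in eq. (6.35) to a one-dimensional profile f(t − x/v). The sketch below works in units with c = 1; the Gaussian shape, the sample points and the step sizes are arbitrary choices. It finds a residual consistent with zero only when v = c.

```python
import numpy as np

C = 1.0  # work in units with c = 1

def make_pulse(speed):
    """Gaussian profile f(t - x/speed), moving in the +x direction at the given speed."""
    return lambda t, x: np.exp(-((t - x / speed) ** 2))

def wave_operator(f, t, x, dt=1e-3, dx=2e-3):
    """Central-difference estimate of -(1/c^2) d^2f/dt^2 + d^2f/dx^2, as in eq. (6.35)."""
    f_tt = (f(t + dt, x) - 2 * f(t, x) + f(t - dt, x)) / dt**2
    f_xx = (f(t, x + dx) - 2 * f(t, x) + f(t, x - dx)) / dx**2
    return -f_tt / C**2 + f_xx

def max_residual(speed, x=0.3):
    """Largest |wave operator| over a few sample times at fixed position x."""
    f = make_pulse(speed)
    return max(abs(wave_operator(f, t, x)) for t in np.linspace(-2.0, 2.0, 9))

print("moving at c:   ", max_residual(C))
print("moving at c/2: ", max_residual(0.5 * C))
```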
The two polarizations of gravitational waves correspond to the possible choices for the polarization tensor, εµν(k), which eq. (6.33) implies must satisfy

$$ k^\mu\varepsilon_{\mu\nu} - \frac12\,k_\nu\,\varepsilon^\mu{}_\mu = 0 . \tag{6.39} $$

This condition has many more than two solutions, but it also does not completely remove the freedom to change εµν(k) using coordinate transformations of the form (6.32) with ξ^µ(x) = ζ^µ e^{ik·x}, with constant ζ^µ and k² = kµk^µ = 0. To see why, notice that under such a transformation εµν(k) → ε̃µν(k) with

$$ \tilde\varepsilon_{\mu\nu}(k) := \varepsilon_{\mu\nu}(k) + i k_\mu\zeta_\nu + i k_\nu\zeta_\mu , \tag{6.40} $$

and so if εµν(k) satisfies (6.39) then so does ε̃µν(k), since (using k² = 0)

$$ k^\mu\tilde\varepsilon_{\mu\nu}(k) - \frac12\,k_\nu\,\tilde\varepsilon^\mu{}_\mu = i(k\cdot\zeta)\,k_\nu - \frac{i}{2}\,k_\nu\,(2k\cdot\zeta) = 0 . \tag{6.41} $$
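Because eq. (6.39) is linear in εµν, eq. (6.41) amounts to the statement that the shift i(kµζν + kνζµ) by itself satisfies (6.39) whenever k is null. The sketch below checks this with NumPy for a random ζ^µ (names and test values are illustrative), and also shows that for a non-null k the left side of (6.39) instead equals i k² ζν, so it is the null condition that lets this leftover freedom survive.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # Minkowski metric, signature (-,+,+,+); its inverse is itself

def residual_639(k_up, eps_dn):
    """Left side of eq. (6.39): k^mu eps_{mu nu} - (1/2) k_nu eps^mu_mu."""
    k_dn = eta @ k_up
    trace = np.einsum("ab,ab->", eta, eps_dn)   # eta^{ab} eps_{ab}
    return k_up @ eps_dn - 0.5 * k_dn * trace

def gauge_shift(k_up, zeta_up):
    """Change in eps_{mu nu} from eq. (6.40): i k_mu zeta_nu + i k_nu zeta_mu."""
    k_dn, z_dn = eta @ k_up, eta @ zeta_up
    return 1j * (np.outer(k_dn, z_dn) + np.outer(z_dn, k_dn))

rng = np.random.default_rng(0)
zeta = rng.normal(size=4)                      # arbitrary constant zeta^mu

k_null = 3.0 * np.array([1.0, 0.0, 0.0, 1.0])  # k^2 = 0
k_massive = np.array([1.0, 0.0, 0.0, 0.5])     # k^2 = -0.75, for contrast

print(np.abs(residual_639(k_null, gauge_shift(k_null, zeta))).max())
print(np.abs(residual_639(k_massive, gauge_shift(k_massive, zeta))).max())
```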
This remaining freedom to redefine coordinates is also removed if εµν(k) is required to satisfy a second condition: ℓ^µ εµν(k) = 0, where ℓ^µ is a second future-pointing null vector, ℓµℓ^µ = 0, chosen such that kµℓ^µ = −1. Contracting (6.39) with ℓ^ν then implies εµ^µ = 0, and so k^µ εµν = 0. For instance, for a wave moving along the positive z-axis with frequency ω ≠ 0, we can choose

$$ k^\mu = \omega\begin{pmatrix}1\\0\\0\\1\end{pmatrix} \quad\text{and}\quad \ell^\mu = \frac{1}{2\omega}\begin{pmatrix}1\\0\\0\\-1\end{pmatrix} , \tag{6.42} $$

and so the most general εµν(k) satisfying k^µ εµν = ℓ^µ εµν = εµ^µ = 0 is

$$ \varepsilon_{\mu\nu} = \begin{pmatrix} 0 & 0 & 0 & 0\\ 0 & \varepsilon_+ & \varepsilon_\times & 0\\ 0 & \varepsilon_\times & -\varepsilon_+ & 0\\ 0 & 0 & 0 & 0 \end{pmatrix} , \tag{6.43} $$

where ε+ and ε× denote the two types of polarizations. Notice the wave is transverse
(just like an electromagnetic wave) because these polarizations are only nonzero in
the x and y directions for a wave travelling along the z-axis.
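The conditions defining eq. (6.43) can be verified numerically for the choices in eq. (6.42). The sketch below builds η, k^µ, ℓ^µ and εµν as NumPy arrays (ω and the amplitudes ε+, ε× are arbitrary test values) and evaluates each contraction; every quantity printed should vanish apart from k·ℓ = −1.

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])     # Minkowski metric, signature (-,+,+,+)
omega = 2.0                             # any nonzero frequency (units with c = 1)
eps_plus, eps_cross = 0.7, -0.3         # arbitrary amplitudes of the two polarizations

k = omega * np.array([1.0, 0.0, 0.0, 1.0])           # k^mu from eq. (6.42)
ell = np.array([1.0, 0.0, 0.0, -1.0]) / (2 * omega)  # l^mu from eq. (6.42)
eps = np.array([[0.0, 0.0,        0.0,       0.0],
                [0.0, eps_plus,   eps_cross, 0.0],
                [0.0, eps_cross, -eps_plus,  0.0],
                [0.0, 0.0,        0.0,       0.0]])   # eps_{mu nu} from eq. (6.43)

def dot(a, b):
    """Minkowski inner product eta_{mu nu} a^mu b^nu."""
    return a @ eta @ b

checks = {
    "k.k": dot(k, k),
    "l.l": dot(ell, ell),
    "k.l": dot(k, ell),
    "max |k^mu eps_mu nu|": np.abs(k @ eps).max(),
    "max |l^mu eps_mu nu|": np.abs(ell @ eps).max(),
    "trace eta^{mu nu} eps_{mu nu}": np.einsum("ab,ab->", eta, eps),
}
for name, value in checks.items():
    print(f"{name:32s} {value: .3f}")
```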
If such a wave passes through matter, it causes nearby particles to move relative to one another in an oscillatory fashion. Because gravity is so weak, the induced motion of test particles on Earth is expected to be extremely small.
Remarkably, such relative motion was recently observed for the arms of a pair of
long laser interferometers, built by the LIGO collaboration with precisely the goal of
determining if such waves actually exist in nature. The LIGO interferometers each
have arms several kilometers long, and were situated thousands of kilometers away
from one another (so that their reactions to any stray environmental effects would not
be correlated, unlike for the passage of a gravitational wave). The observed wave had
precisely the properties that would have been expected if the wave were emitted by
two distant black holes, that initially orbited one another but whose orbits decayed
(for reasons described below) until they eventually merged together into a larger,
spinning, black hole.

6.4 Binary pulsars


The most precise extra-solar tests of GR come from the study of the orbits of binary
pulsars. This section briefly describes what these systems are, and what new features
arise in their study beyond those that are familiar from tests of GR within the solar
system.

What are Binary Pulsars?


A pulsar is an astrophysical object that is observed to send regularly repeated bursts
of radiation (which could be radio waves, or x-rays etc.), whose repetition period
ranges from a few seconds to a few milliseconds (see Fig. 15).
Their properties fit what would be expected for a very compact star, called a neutron star, that is rapidly spinning. A neutron star is an exotic beast, with a mass similar to that of the Sun but a radius of only about ten kilometres, which is not much larger than its Schwarzschild radius. This small size makes it capable of rotating as quickly as hundreds of times a second. Such a star, once rapidly rotating, would tend to set up a large magnetic field, which in turn fires very energetic particles into space along a well-directed beam. Such a beam would rotate with the neutron star, causing a lighthouse-like beam of particles that sweeps around as the neutron star turns. The regular pattern of pulses of radio waves or x-rays seen from the Earth then arises as this lighthouse beam repeatedly sweeps past us.
A binary pulsar is a pulsar (i.e. a neutron star) that orbits a companion star. This companion can be an ordinary star like the Sun, or possibly even another neutron star. (Stars orbiting one another like this are actually not an unusual occurrence, since just under half the stars visible in the sky orbit a partner in this way.) The fact that the pulsar is in an orbit around another star can be inferred from the shifts that this motion induces in the frequency of the light the pulsar emits, a phenomenon called the Doppler effect. It takes such a pulsar anywhere from a few hours to a few days to orbit once around its partner, indicating that the pulsar and its companion are closer to one another than Mercury is to our Sun. Together with the compactness of the pulsar itself, this means that the gravitational fields through which these stars pass are much stronger than those to which we are accustomed in the solar system. What's more, the fact that the pulsar sends out such regularly repeating signals means that we see an exquisitely precise clock in orbit around another star, providing a remarkable chance to measure the nature of space and time in these orbits.

Figure 15: Plots of the spectrum of radiation from two representative pulsars. The pattern shown is repeated over and over again.
For all of these reasons a number of relativistic effects are much larger in these systems than those seen in the solar system. This allows a potentially greater suite of tests of GR than are possible in the solar system. Some of the relativistic effects that have been seen in these systems are the ones that are also seen in the solar system. These include

• the relativistic precession, or periastron shift, of the pulsar orbits;

• the relativistic slowing of time as counted by the pulsar as it moves in the gravitational field of its companion;

• the Shapiro time delay of the pulsar signals as they pass through the gravitational field of the massive companion.

Orbital Decay

There are also new effects seen in binary pulsar systems that have not been seen before. Foremost among these is the observed decay of the pulsar orbit, as the two stars very slowly spiral in towards one another. This orbital decay is observed as an extremely small, slow, secular decrease in the orbital period, seen in Fig. 16. Although small, the decrease is observable because the pulsars have been watched consistently over a long period of time: in some cases (for the Hulse-Taylor pulsar, for example) for several decades.

Why is this decay a relativistic effect? It is because orbital decay indicates that the pulsar orbit is losing energy. General Relativity predicts such an energy loss, due to the emission of gravitational waves. After a short aside to summarize the properties of gravitational waves, we return to a discussion of their implications for pulsar orbits in more detail.

Figure 16: A plot comparing measurements of the rate of decay of the pulsar orbital period as a function of time, with the prediction following from gravitational radiation in GR.

Figure 17: Plot of the prediction for periastron shift (blue), orbital decay rate (dotted
black) and relativistic time delay (dashed red) for the Hulse-Taylor binary, PSR 1913+16
in General Relativity, as functions of the mass of the pulsar and its companion in the binary
system. If GR is true all three lines should touch at a point (within errors), revealing the
masses of the actual bodies involved.

Gravitational Waves and Orbital Decay


Because the waves are produced by moving masses, much as electromagnetic waves arise from the motion of electric charges, the energy loss rate into gravitational radiation turns out to be proportional to a power of both the total mass, M, of the orbiting system and its orbital angular frequency, Ω = 2π/P:

$$ L = \frac{128\,G}{5c^5}\,M^2 R^4\,\Omega^6 \simeq 2\times10^{33}\,\frac{\rm erg}{\rm sec}\left(\frac{M}{M_\odot}\cdot\frac{1~{\rm hour}}{P}\right)^{10/3} , \tag{6.44} $$

and the second equality uses Kepler's 3rd Law, Ω² = GM/R³, to trade R for M and Ω (or the orbital period, P). Alternatively, L ≃ 25 (GM²Ω/R)(v/c)^5, where v ≃ RΩ is of order the orbital speed. This way of writing things shows that the first factor represents an emission of an appreciable fraction of the gravitational binding energy per period, while the second factor suppresses the result by 5 powers of v/c. For orbits with a period of P ≃ 1 hour ≃ 1.1 × 10^-4 year, Kepler's 3rd Law implies (taking M ≃ M☉) a mean orbital radius that is of order R ≃ (1.1 × 10^-4)^{2/3} AU ≃ 0.002 AU, where 1 AU ≃ 1.5 × 10^11 m. Consequently, v/c ≃ RΩ/c ≃ 0.002.
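The orbital-scale estimate can be reproduced directly. The sketch below assumes a total mass of one solar mass and P = 1 hour (the reference values used above) and standard SI values for G, c and the AU. It finds R of roughly 0.002 AU and an orbital speed of order 10^-3 c, so the (v/c)^5 suppression is very strong.

```python
import math

G = 6.674e-11        # m^3 kg^-1 s^-2
C = 2.998e8          # m/s
M_SUN = 1.989e30     # kg
AU = 1.496e11        # m
HOUR = 3600.0        # s

def orbit_scales(total_mass, period):
    """Orbital radius from Kepler's 3rd law, Omega^2 = GM/R^3, and v/c with v = R*Omega."""
    omega = 2 * math.pi / period
    radius = (G * total_mass / omega**2) ** (1.0 / 3.0)
    return radius, radius * omega / C

R, beta = orbit_scales(M_SUN, HOUR)
print(f"R ≈ {R / AU:.4f} AU,  v/c ≈ {beta:.1e},  (v/c)^5 ≈ {beta**5:.1e}")
```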

Equating this to the loss rate of orbital energy, and using the properties of Newtonian orbits to relate the energy to the orbital period, gives the resulting GR prediction for the period change

$$ \frac{dP}{dt} = -3\times10^{-12}\left(\frac{M}{M_\odot}\cdot\frac{1~{\rm hour}}{P}\right)^{5/3} . \tag{6.45} $$
Fig. 16 plots the comparison between the prediction of eq. (6.45) and the observed
rate of decrease of orbital period for the Hulse-Taylor pulsar, PSR B1913+16, which
has been closely and continually watched for several decades now.
But in order for this to provide a test of General Relativity it is necessary to
know what the masses are for both the pulsar and its companion. How were these
measured in order to make the comparison of fig. 16? Although they cannot be
measured directly, since the companion is not in this case visible, progress is possible
because the masses also appear in the prediction for the size of the other relativistic
effects that are observed for pulsars. The strategy is to use the agreement of these
predictions with experiment to infer the masses of the orbiting stars, and then to use
these to predict the gravitational radiation rate.
Fig. 17 illustrates this strategy, showing three curves that give the relationship
between the pulsar mass and the mass of its companion that follows by requiring the
prediction of GR for the precession of the orbit, the slowing down of the pulsar clock,
and the orbital decay caused by gravitational radiation, to agree with what is seen
for a particular pulsar. If GR provides a correct description of the pulsar system, all
three of these curves should touch at a single point, corresponding to the masses of
the two bodies in the orbit. The remarkable fact is that they do, and because they
do we learn both that GR is working well, and what the masses of the two stars must
be. And given these masses the rate of decay evolves in time in precisely the way
predicted by GR, as seen in Fig. 16.

The Double Binary Pulsar


Almost a thousand pulsars have been discovered over the years, and some of the
ones found more recently promise to provide new ways to test General Relativity. A
particularly promising system is given by the pulsar J0737-3039, which (unusually) consists of a pulsar being orbited by another pulsar. Even better, their orbit is inclined so that we see it almost exactly edge-on from the point of view of the Earth, and so the pulsars almost eclipse one another (that is, the beam from one passes through the astrophysical detritus that surrounds the other).
This system is something of a holy grail for testing general relativity, since it provides access to more relativistic effects than do other pulsar systems. For example, the near-eclipsing of one pulsar by the other implies that the observed light signal from one pulsar passes very close to the other on its way to the Earth, and so experiences a Shapiro time delay that is observable and may be compared with predictions.
The prediction of General Relativity for the five observable relativistic effects as a function of the two pulsar masses is given in Fig. 18. If GR provides a correct description of the pulsar system, all of the curves should touch at a single point, corresponding to the masses of the two pulsars. Remarkably, again they all do to within the errors, confirming that GR is working well. And just like for the Hulse-Taylor pulsar, the precision of these tests will improve the longer its signals are watched.

Figure 18: Plot of the prediction for periastron shift (dashed blue), orbital decay rate (dashed green), relativistic time delay (red), Shapiro time delay (orange) and shift (black), for the double-binary system, PSR J0737-3039 in General Relativity, as functions of the mass of the two pulsars. If GR is true all five lines should touch at a point (which they do within the errors).

6.5 Astrophysical Black Holes

There is considerable evidence within astrophysics for the existence of black holes in the universe, and this evidence supports the general picture of these objects that is painted by General Relativity, even though it does not yet provide precision tests of the theory.
In each case the thrust of the evidence identifies the total mass of an unseen central object by watching how it is orbited by objects that we can see. This is compared with upper limits to its size, which come either from direct observations of the innermost positions of the orbiting objects, or from considerations related to how fast the object is observed to change its brightness. In many cases there is so much mass crammed into so small a region that there is no known way for it to support itself against collapse into a black hole.
There are two broad classes of black holes that have been reliably identified in
this way: stellar-sized black holes; and super-massive black holes in the centers of
galaxies. (Intermediate-sized black holes, with masses of order thousands of solar
masses, are also believed to exist — perhaps at the centers of globular clusters of
stars — but evidence for them is more controversial.)

Stellar-sized black holes

Among the first black hole candidates were those having masses not so different
from that of the Sun, as would be expected as the endpoints of the gravitational
collapse of a sufficiently massive star. Although the black hole itself is not visible, it
can be observed when matter that falls into it radiates. And such infalling matter
is particularly likely in situations where the black hole is in an orbit with another
ordinary star, since in this case material from the companion star can be siphoned off
to continually feed the black hole (as illustrated in fig. 19). As it falls in, this matter
can become hot enough to emit x-rays, and many examples of such x-ray binaries
are known (some of which are among the brightest objects in the sky when viewed
in x-rays).
Sometimes the stellar companion to the black hole is a star that is sufficiently luminous to be directly visible in optical or radio wavelengths. In such cases the black hole dominates the luminosity of the binary pair in x-rays, while its stellar partner is the one that can be seen in the visible spectrum. Among the most famous x-ray binaries that are believed to contain a black hole is Cygnus X-1, the brightest x-ray source in its constellation as seen from Earth. The orbital partner of the x-ray source has been identified to be the super-giant star AGK2 +35 1910 = HDE 226868, which is itself incapable of emitting the x-rays that are observed. Both stars are 2 kpc away from us, and move together in an aggregation of stars, indicating a probable common origin.

Figure 19: A drawing (courtesy of the European Space Agency and Hubble Space Telescope) of an x-ray binary system.
The light from this star exhibits the characteristic Doppler shifts that are asso-
ciated with being in an orbit about a massive partner, with an orbital period of 5.6
days. Because the plane of the orbit relative to the sky is unknown it is trickier to
determine unambiguously the mass of the partner, but the best estimates lead to a
mass of 8.7 ± 0.8 M☉. On the other hand, since the brightness of the x-ray source varies several times a second, the source cannot be larger than a fraction of a light-second across (the best estimates indicate its size is smaller than 10^5 km).
The compact object is believed to be a black hole since neutron stars cannot be this
massive, and no other object is known that can have this much mass compressed
within the allowed size.

Galactic black holes
Enormous black holes, more massive than a million Suns, are believed to reside at
the center of most galaxies. When fed by infalling material, these can be among the
most luminous objects in the universe.

The Milky Way


Detailed studies of the properties of the galactic center give very good evidence
that our own galaxy, the Milky Way, itself contains such a super-massive black hole.
This evidence partially comes from the indications that there is a very powerful
energy source located near the galactic center, as would be expected if material
accretes there onto a black hole. The galactic center is an active energy emitter
when viewed in radio and x-ray wavelengths. (Studies with visible light are more
difficult because this is obscured by the dust that lies along our line of sight to the
galactic center.) Fig. 20 shows an x-ray photograph of our galactic center, showing
the presence of a variety of sources.
A more detailed picture of the Milky Way's central object is formed by studying how it affects the motion of stars in its immediate vicinity. The motion of a handful of such stars has been observed continually for 16 years, allowing a detailed reconstruction of their orbits that in some cases includes enough time for them to have completed an entire revolution about the galactic center [6].

Figure 20: A Chandra satellite x-ray image of the center of our galaxy.

The observed orbits are consistent with motion in the presence of a very massive point source, since they are very close to Keplerian. For instance, one of the innermost stars (the star S2 of fig. 21) moves in an orbit whose eccentricity is e = 0.88 and whose semimajor axis subtends an angle (seen from the Earth) of 0.1 seconds of arc, or 5 × 10^-7 radians. Since the galactic center is 8.3 kpc away, this corresponds to an orbit whose semimajor axis is about 0.01 light years, or about 5 light days.
These orbits indicate that the central object has a mass of 4.3 × 10^6 M☉. On the other hand, the size of the source must be much smaller than the distance of closest approach of the smallest orbit (which turns out to be 17 light hours) because the orbits are consistent with the central object being at a single point. For comparison, the Schwarzschild radius corresponding to a mass of 4.3 × 10^6 M☉ is about 1.2 × 10^7 km, or 43 light seconds. (For reference, our Sun is about 5 light seconds across, and the Earth is about 8 light minutes, or 480 light seconds, from the Sun.)
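These conversions are simple enough to check directly. The sketch below uses the small-angle approximation, standard values of G, c and the parsec, and the numbers quoted in the text (0.1 arc-seconds, 8.3 kpc, 4.3 × 10^6 M☉). It gives about 4.8 × 10^-7 rad for 0.1 arc-seconds, a semimajor axis of roughly 0.013 light years (about 5 light days), and a Schwarzschild radius of about 42 light seconds.

```python
import math

G = 6.674e-11                          # m^3 kg^-1 s^-2
C = 2.998e8                            # m/s
M_SUN = 1.989e30                       # kg
KPC = 3.086e19                         # m
ARCSEC = math.pi / (180 * 3600)        # radians per arc-second
LIGHT_DAY = C * 86400.0                # m
LIGHT_YEAR = C * 3.156e7               # m

angle = 0.1 * ARCSEC                   # S2's semimajor axis as seen from Earth
a = angle * (8.3 * KPC)                # small-angle approximation
r_s = 2 * G * (4.3e6 * M_SUN) / C**2   # Schwarzschild radius of the central mass

print(f"0.1 arcsec ≈ {angle:.2e} rad")
print(f"semimajor axis ≈ {a / LIGHT_YEAR:.3f} light years ≈ {a / LIGHT_DAY:.1f} light days")
print(f"r_s ≈ {r_s / 1e3:.2e} km ≈ {r_s / C:.0f} light seconds")
```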
The central source being orbited is believed to be a black hole because there is
no other known way to cram this much mass into so small a region, without its being
directly visible. Simulations also show that, if there were a black hole at the galactic center, stars formed as huge gas clouds fall towards it would naturally be found orbiting it.

Active Galaxies

Super-massive black holes at the centers of other galaxies are believed to power some of the brightest objects in the sky, and the difference between these and the one in the center of our own galaxy seems to be mostly to do with how much material they are being fed. An example of the kinds of energy release that are possible is given by fig. 22, which shows a jet of energetic particles emerging from the center of the large elliptical galaxy M87 in the Virgo cluster, about 17 Mpc away from us. This jet is more than 5000 light years long, and the apparent speed of the matter being ejected along it has been measured using the Hubble Space Telescope. This finds the apparent motion to be between 4 and 6 times the speed of light, an illusion that indicates (see exercise 11 in chapter 2) that the jet is moving at relativistic speeds (but slower than light) largely directed towards us, along the line of sight. Other indications of a strong energy source in M87 come from its strong emissions in x-rays and gamma rays.

Figure 21: A plot of reconstructed orbits of several stars orbiting the center of our galaxy [6].
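The inference from apparent superluminal motion can be made quantitative using the standard kinematic result (assumed here, not derived in this section) that a blob moving at speed βc at angle θ to the line of sight appears to cross the sky at β_app = β sin θ/(1 − β cos θ), in units of c. This is largest when cos θ = β, where it equals γβ, so an apparent speed β_app requires a true speed of at least β_app/√(1 + β_app²). The sketch evaluates this for the 4c to 6c range quoted for M87.

```python
import math

def apparent_speed(beta, angle):
    """Apparent transverse speed, in units of c, of a blob moving at speed beta*c
    at `angle` (radians) to our line of sight: beta sin(angle) / (1 - beta cos(angle))."""
    return beta * math.sin(angle) / (1 - beta * math.cos(angle))

def minimum_speed(beta_app):
    """Smallest true speed (units of c) able to produce apparent speed beta_app.

    The apparent speed is largest when cos(angle) = beta, where it equals
    beta * gamma; inverting beta * gamma = beta_app gives this expression."""
    return beta_app / math.sqrt(1 + beta_app**2)

for beta_app in (4.0, 6.0):          # range of apparent speeds quoted for M87
    b = minimum_speed(beta_app)
    best_angle = math.degrees(math.acos(b))
    print(f"apparent {beta_app:.0f}c needs v >= {b:.3f}c "
          f"(best case about {best_angle:.0f} deg from the line of sight)")
```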
The argument that the energy source at the center of M87 is a black hole again comes from measurements indicating that an enormous amount of mass resides within a comparatively small volume. For M87 the mass measurement is made by following the speed with which hot gas orbits the central object in a central disc as a function of the gas' distance from the center. The speed of the orbits is measurable as a net Doppler red-shift on one (receding) side of the central object, and a net blue-shift on the other (approaching) side. These measurements indicate the central object is enormously massive: its mass is 3 × 10^9 solar masses.
An upper limit to the size within which this mass is compressed comes from the HESS gamma ray telescope, which sees variations in the gamma ray flux that occur over timescales of a few days. This indicates that the 3 billion solar masses lie within a region that is a few light days across. For comparison, the Schwarzschild radius of a black hole whose mass is 3 × 10^9 M☉ is about 8 light hours (which is larger than the planetary orbits of our solar system). The only known object that can be this massive and yet so small is a black hole.
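The comparison with the size of the solar system can be checked the same way. The sketch assumes standard constants, takes Neptune's orbital radius to be about 30 AU, and (hypothetically) reads "a few light days" as 3 light days; it gives a Schwarzschild radius of about 8 light hours, roughly twice Neptune's orbital radius, and a variability region only about ten Schwarzschild radii across.

```python
G = 6.674e-11             # m^3 kg^-1 s^-2
C = 2.998e8               # m/s
M_SUN = 1.989e30          # kg
AU = 1.496e11             # m
LIGHT_HOUR = C * 3600.0   # m
LIGHT_DAY = C * 86400.0   # m

r_s = 2 * G * 3e9 * M_SUN / C**2   # Schwarzschild radius for 3e9 solar masses
neptune = 30 * AU                  # rough radius of Neptune's orbit (assumed 30 AU)
region = 3 * LIGHT_DAY             # "a few light days", taken here as 3 (hypothetical)

print(f"r_s ≈ {r_s / LIGHT_HOUR:.1f} light hours")
print(f"Neptune's orbit ≈ {neptune / LIGHT_HOUR:.1f} light hours")
print(f"variability region ≈ {region / r_s:.0f} Schwarzschild radii")
```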
A second piece of circumstantial evidence for the central object being a black hole is the enormous efficiency (6 times better than the nuclear fusion that drives stars like the Sun) with which a black hole is able to convert mass into energy. To see why this is so, consider the conserved quantity, E = (1 − rs/r)(dt/dτ), of a particle moving in a circular orbit at radius r. The 4-velocity for such a particle is

$$ u^\mu = \gamma \begin{pmatrix} 1 \\ 0 \\ 0 \\ \Omega \end{pmatrix} , \tag{6.46} $$

where γ = dt/dτ and Ω = dφ/dt. Since 1 = −u · u = γ²[(1 − rs/r) − r²Ω²], we have

$$ \gamma = \frac{1}{\sqrt{(1 - r_s/r) - r^2 \Omega^2}} = \frac{1}{\sqrt{1 - 3r_s/2r}} , \tag{6.47} $$

where the last equality uses Kepler's 3rd Law, Ω² = GM/r³ = rs/(2r³), which is exactly satisfied for circular orbits in Schwarzschild spacetime (see exercise 28), to write r²Ω² = rs/(2r).

Figure 22: A Hubble Space Telescope photograph of an energetic jet emerging from the center of galaxy M87.
Using eq. (6.47) in the expression for E then gives

$$ E = \frac{1 - r_s/r}{\sqrt{1 - 3r_s/2r}} , \tag{6.48} $$

which for the innermost stable circular orbit at r = 6M = 3rs becomes

$$ E = \frac{2\sqrt{2}}{3} \simeq 0.94 . \tag{6.49} $$

Since E = 1 for a particle at rest at infinity, E can be interpreted as the energy per unit rest mass, and eq. (6.49) shows that as much as 6% of the rest mass of a particle can be converted to gravitational binding energy as a particle falls into an orbit close to the black hole. This is ultimately the energy that is released to drive the acceleration of the few particles that escape the black hole by being accelerated out along the jet (which emerges along the axis of rotation of the accretion disc that infalling matter forms around the black hole).
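The numbers in eqs. (6.48) and (6.49) are easy to check numerically. The Python sketch below is illustrative only (the function name is a hypothetical helper, not from the notes); it measures r in units of rs and evaluates E for a few circular orbits, confirming that E ≈ 0.943 at r = 3rs, i.e. a binding energy of about 5.7% of the rest mass.

```python
import math


def circular_orbit_energy(r_over_rs: float) -> float:
    """Energy per unit rest mass, E = (1 - rs/r) / sqrt(1 - 3 rs / 2r), of a
    circular geodesic orbit in Schwarzschild spacetime (eq. 6.48).
    Circular orbits exist only for r > 1.5 rs, where the square root is real."""
    if r_over_rs <= 1.5:
        raise ValueError("no circular orbit for r <= 1.5 rs")
    x = 1.0 / r_over_rs  # rs / r
    return (1.0 - x) / math.sqrt(1.0 - 1.5 * x)


# Innermost stable circular orbit: r = 6M = 3 rs.
E_isco = circular_orbit_energy(3.0)
print(f"E(3 rs)          = {E_isco:.4f}")
print(f"2*sqrt(2)/3      = {2 * math.sqrt(2) / 3:.4f}")
print(f"binding fraction = {1 - E_isco:.2%}")

# Wider orbits are less tightly bound: E -> 1 as r -> infinity.
for r in (3.0, 10.0, 100.0, 1e4):
    print(f"r = {r:>8.0f} rs   E = {circular_orbit_energy(r):.6f}")
```

Circular orbits also exist between r = 1.5rs and r = 3rs, but they are unstable, which is why the innermost stable orbit rather than r = 1.5rs sets the efficiency quoted above.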
By comparison, nuclear reactions release nuclear binding energy, and the 27 MeV released each time four hydrogen nuclei fuse into a helium nucleus shows that this type of fusion converts only roughly 1% of the available rest mass into energy. Matter falling into a black hole is therefore expected to release roughly 6 times more energy than the same amount of matter would in a nuclear reaction.
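The comparison can be made explicit with a few lines of Python. The 27 MeV figure is from the text; the hydrogen rest energy of about 938.8 MeV is an assumed rounded input. With these inputs fusion converts about 0.7% of the rest mass, a little below the rounded 1% used above, so the ratio to the 5.7% available at the innermost stable orbit comes out nearer 8 than 6; either way black-hole accretion is several times more efficient.

```python
import math

Q_FUSION_MEV = 27.0      # energy released per helium nucleus formed (from the text)
M_HYDROGEN_MEV = 938.8   # assumed: approximate rest energy of one hydrogen atom

# Fraction of rest mass released when four hydrogen nuclei fuse into helium.
fusion_fraction = Q_FUSION_MEV / (4 * M_HYDROGEN_MEV)

# Fraction released by falling to the innermost stable circular orbit: 1 - E, eq. (6.49).
isco_fraction = 1 - 2 * math.sqrt(2) / 3

print(f"fusion efficiency:  {fusion_fraction:.2%}")
print(f"ISCO efficiency:    {isco_fraction:.2%}")
print(f"ratio:              {isco_fraction / fusion_fraction:.1f}")
```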

7. Cosmology
The earlier sections show that once one accepts Einstein’s point of view that the
right way to describe gravity is as the curvature of spacetime, it becomes possible
to relate our local geometry to the distribution of matter in our immediate vicinity.
However, the same logic also connects geometry to the matter distribution over much
larger scales, and in principle should relate the geometry of the Universe as a whole
to the average distribution of matter on the largest observable scales.
It is this realization that underlies the science of cosmology, which uses observations of the distribution of matter on very large scales to make inferences about the overall curvature of space and time, and how these change in time. This section provides a brief overview of the Big Bang theory of cosmology, with an emphasis on the theoretical ideas that pertain to General Relativity.

Figure 23: A map of the nearby distribution of galaxies on the sky seen from the Earth, as obtained from the 2MASS galaxy survey. The 'S'-shaped smear is where our view is obscured by the presence of our own galaxy in the foreground.

7.1 Kinematics of an Expanding Universe

We start with a section describing the geometry of spacetime on which all of the subsequent sections rely. The key underlying assumption in this section is that the universe is homogeneous and isotropic when seen on the largest distance scales. Here isotropic means that all directions are equivalent as seen by an observer situated at a particular point, and so is equivalent to the spherical symmetry of the geometry about this point. Homogeneity states that the above isotropy holds for an observer located at any point. Until relatively recently this assertion about the homogeneity and isotropy of the universe was an assumption, often called the Cosmological Principle. More recently it has become possible to put this assertion on an observational footing, based on large-scale surveys of the distribution of matter and radiation within the observed universe. The isotropy of this distribution relative to our own vantage point can be seen in fig. 23, which shows the results of a representative galaxy survey.

The LFRW Metric

The assumption that the universe is spherically symmetric and homogeneous puts a
strong restriction on the form of the universe’s overall geometry. We have already
seen that spherical symmetry by itself ensures that the metric can always be written
in the ‘isotropic’ form of eq. (3.20):
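$$ ds^2 = -e^{2\alpha(\varrho,\tau)}\, d\tau^2 + e^{2\beta(\varrho,\tau)} \left[ d\varrho^2 + \varrho^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) \right] , \tag{7.1} $$

for some unknown functions, α(ϱ, τ) and β(ϱ, τ).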


These functions are further restricted by the requirement of homogeneity, which
says that α must be a function only of the time coordinate, α = α(τ ). This function
can then be completely eliminated by redefining the time coordinate, τ → t, so that
eα(τ ) dτ = dt.
It is tempting to conclude that homogeneity amounts to translation invariance, and so β must also be independent of ϱ. Although this does provide a homogeneous and isotropic space, it does not produce the most general one. The condition on β is slightly weaker: β must come as a sum, β = f(τ) + g(ϱ). Although β can depend on ϱ, the allowed dependence is very restrictive. Homogeneity turns out to require that g(ϱ) is such that we can change variables ϱ → r in a way that allows the metric to be put into the Lemaître-Friedmann-Robertson-Walker (LFRW) form:

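$$ \begin{aligned} ds^2 &= -dt^2 + a^2(t) \left[ \frac{dr^2}{1 - \kappa r^2/r_0^2} + r^2\, d\theta^2 + r^2 \sin^2\theta\, d\phi^2 \right] \\ &= -dt^2 + a^2(t) \left[ d\ell^2 + r^2(\ell)\, d\theta^2 + r^2(\ell) \sin^2\theta\, d\phi^2 \right] , \end{aligned} \tag{7.2} $$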

where r0 is a constant and κ can take one of the following three values: κ = 1, 0, −1. This is the most general 4D geometry that is consistent with isotropy and homogeneity of its spatial slices, and it is characterized by the one unknown function, a(t) = e^{f(τ(t))}. The content of Einstein's equations will be to relate the shape of the function a(t) to the matter content of the universe.

The coordinate ℓ in this metric is related to r by dℓ = dr/(1 − κr²/r0²)^{1/2}, so if we demand ℓ(r = 0) = 0 then

$$ r(\ell) = \begin{cases} r_0 \sin(\ell/r_0) & \text{if } \kappa = +1 \\ \ell & \text{if } \kappa = 0 \\ r_0 \sinh(\ell/r_0) & \text{if } \kappa = -1 \end{cases} . \tag{7.3} $$

Notice that the metric, eq. (7.2), is invariant under the following re-scaling of parameters: a → a/λ, r0 → λr0, provided we also re-scale the coordinate ℓ → λℓ. This freedom is often used to choose convenient units, such as by choosing λ to ensure r0 = 1 (if κ ≠ 0), or perhaps to set a(t0) = 1 for some t0.
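The rescaling freedom can be checked directly: proper lengths built from a(t) and r(ℓ) should not change under a → a/λ, r0 → λr0, ℓ → λℓ. Below is a minimal Python sketch; the helper names and sample values are hypothetical, chosen only for illustration.

```python
import math


def r_of_ell(ell: float, kappa: int, r0: float) -> float:
    """Areal radius r(ℓ) of eq. (7.3) for spatial curvature kappa in {+1, 0, -1}."""
    if kappa == 1:
        return r0 * math.sin(ell / r0)
    if kappa == 0:
        return ell
    if kappa == -1:
        return r0 * math.sinh(ell / r0)
    raise ValueError("kappa must be +1, 0 or -1")


def proper_lengths(a: float, ell: float, kappa: int, r0: float) -> tuple:
    """Proper radius a*ℓ and proper circumference 2π a r(ℓ) of a circle at fixed t."""
    return a * ell, 2 * math.pi * a * r_of_ell(ell, kappa, r0)


# Hypothetical sample values; lam is the rescaling parameter λ.
a, ell, r0, lam = 2.0, 0.7, 1.3, 5.0
for kappa in (1, 0, -1):
    original = proper_lengths(a, ell, kappa, r0)
    rescaled = proper_lengths(a / lam, lam * ell, kappa, lam * r0)
    print(kappa, original, rescaled)
```

Because ℓ/r0 is unchanged by the rescaling, r(ℓ) simply scales like ℓ, and every proper length a ℓ or a r(ℓ) is left invariant.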
The coordinates used all have the following simple physical interpretations.

• t represents the proper time along the time-like trajectories along which ℓ, θ and φ are fixed. The range over which t may run is defined by the region over which the function a(t) is neither zero nor infinite.

• ℓ is simply related to the proper distance measured along the radial directions along which t, θ and φ are fixed, since this proper distance is given by

$$ D(\ell, t) = \ell\, a(t) . \tag{7.4} $$

If κ = 0, −1 then ℓ takes values in the range 0 < ℓ < ∞, but if κ = +1 then ℓ is restricted to run over 0 < ℓ < πr0 because r(ℓ) vanishes at ℓ = πr0.

• 0 < θ < π and 0 < φ < 2π represent the usual angular coordinates of spheri-
cal polar coordinates. (Spherical coordinates furnish a convenient description
of our view of the universe, with the origin of coordinates representing our
vantage point.) The geometry is invariant under the SO(3) rotations of the
2-dimensional spherical surfaces at fixed ` and t which these coordinates pa-
rameterize.

• r(ℓ) is simply related to the arc-length measured along these spherical surfaces of fixed ℓ and t in the sense that a small angular displacement, dθ, is subtended by a proper arc-length

$$ ds = a(t)\, r(\ell)\, d\theta , \tag{7.5} $$

at a coordinate position ℓ. It follows that the sphere having proper radius ℓa(t) has a proper circumference of C = 2π r(ℓ)a(t) and its proper area is A = 4π r²(ℓ)a²(t).

The quantities κ and r0 characterize the curvature of the spatial slices at fixed
t, in the following way.

Flat Spatial Curvature

If κ = 0 then r(ℓ) = ℓ and the spatial part of the LFRW metric reduces (apart from the overall factor, a²(t)) to the metric of flat 3-dimensional space, written in spherical polar coordinates:

$$ ds_3^2 = dr^2 + r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) , \tag{7.6} $$

as may be seen by performing the standard coordinate transformation

$$ x = r \sin\theta \cos\phi , \quad y = r \sin\theta \sin\phi , \quad z = r \cos\theta \tag{7.7} $$

in the metric of eq. (2.2). In this case the parameter r0 does not appear in the metric.

Positive Spatial Curvature

When κ = 1 we have r(ℓ) = r0 sin(ℓ/r0) and the metric for t fixed describes the geometry of a 3-dimensional sphere whose radius of curvature is r0. For instance, in this case the circumference of a circle of proper radius a(t) ℓ is

$$ C = 2\pi a(t)\, r_0 \sin\left( \frac{\ell}{r_0} \right) , \tag{7.8} $$

which is strictly smaller than the corresponding flat result: C < 2πa(t) ℓ.
Furthermore, for fixed t, C is a monotonically increasing function of ℓ until ℓ = πr0/2, but beyond this point C decreases until it vanishes at ℓ = πr0. The maximum circumference obtained in this way is Cmax = 2πa(t) r0.
Notice also that the flat κ = 0 case is retrieved in the limit of infinite curvature radius: r0 → ∞.

Negative Spatial Curvature

When κ = −1 we have r(ℓ) = r0 sinh(ℓ/r0), which makes the metric for constant t describe the geometry of a 3-dimensional surface of negative constant curvature. (The surface of a saddle is close to being a 2-dimensional surface having constant negative curvature.) The radius of curvature of this space is r0. In this case the circumference of a circle of proper radius a(t) ℓ grows monotonically with ℓ,

$$ C = 2\pi a(t)\, r_0 \sinh\left( \frac{\ell}{r_0} \right) , \tag{7.9} $$

and is always larger than the corresponding flat-space result: C > 2πa(t) ℓ.
Again the flat κ = 0 case is retrieved in the limit of infinite curvature radius: r0 → ∞.
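A short numerical comparison of eqs. (7.8) and (7.9) with the flat result illustrates both inequalities, the turnover of the κ = +1 circumference at ℓ = πr0/2, and the r0 → ∞ limit. This Python sketch uses a hypothetical helper, units with a = 1, and arbitrary sample values of ℓ and r0.

```python
import math


def circumference(ell: float, kappa: int, r0: float, a: float = 1.0) -> float:
    """Proper circumference C = 2π a r(ℓ) of a circle of proper radius a ℓ:
    eq. (7.8) for kappa = +1, eq. (7.9) for kappa = -1, flat result for kappa = 0."""
    if kappa == 1:
        r = r0 * math.sin(ell / r0)
    elif kappa == -1:
        r = r0 * math.sinh(ell / r0)
    else:
        r = ell
    return 2 * math.pi * a * r


# Curvature radius r0 = 1 (arbitrary units).
for ell in (0.5, 1.0, math.pi / 2, 2.5, math.pi):
    print(f"ell = {ell:.3f}:  kappa=+1 {circumference(ell, 1, 1.0):7.4f}   "
          f"flat {circumference(ell, 0, 1.0):7.4f}   "
          f"kappa=-1 {circumference(ell, -1, 1.0):8.4f}")

# As r0 grows, both curved results approach the flat value 2π ℓ.
for r0 in (10.0, 1e3, 1e5):
    print(r0, circumference(1.0, 1, r0), circumference(1.0, -1, r0))
```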

Particle Motion
For the purposes of cosmology galaxies are particles, and so their trajectories in this spacetime are given, as usual, by solutions to the geodesic equation, eq. (3.36)

$$ \frac{d^2 x^\mu}{ds^2} + \Gamma^\mu_{\nu\lambda}[x(s)] \frac{dx^\nu}{ds} \frac{dx^\lambda}{ds} = 0 , \tag{7.10} $$

with the Christoffel symbols, Γ^µ_νλ, given by eq. (2.39).


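For the LFRW metric the only nonzero Christoffel symbols turn out to be given by

$$ \begin{aligned} &\Gamma^t_{\ell\ell} = a\dot a , \quad \Gamma^t_{\theta\theta} = a\dot a\, r^2 , \quad \Gamma^t_{\phi\phi} = a\dot a\, r^2 \sin^2\theta , \\ &\Gamma^\ell_{t\ell} = \Gamma^\ell_{\ell t} = \Gamma^\theta_{t\theta} = \Gamma^\theta_{\theta t} = \Gamma^\phi_{t\phi} = \Gamma^\phi_{\phi t} = \frac{\dot a}{a} , \\ &\Gamma^\ell_{\theta\theta} = -r r' , \quad \Gamma^\ell_{\phi\phi} = -r r' \sin^2\theta , \quad \Gamma^\theta_{\ell\theta} = \Gamma^\theta_{\theta\ell} = \Gamma^\phi_{\ell\phi} = \Gamma^\phi_{\phi\ell} = \frac{r'}{r} , \\ &\Gamma^\theta_{\phi\phi} = -\sin\theta \cos\theta , \quad \Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta , \end{aligned} \tag{7.11} $$

where the dots denote differentiation with respect to t and the primes represent derivatives with respect to ℓ.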
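The list in eq. (7.11) can be spot-checked numerically from the definition of the Christoffel symbols for a diagonal metric. The sketch below assumes, purely for illustration, a quadratic a(t) and κ = +1 with r0 = 1; all function names are hypothetical. It differentiates the metric by central finite differences rather than symbolically, so the agreement is to many decimal places rather than exact.

```python
import math


# Hypothetical sample choices, for illustration only:
# a(t) = 1 + 0.5 t + 0.2 t^2 and closed spatial slices (kappa = +1, r0 = 1).
def a(t):
    return 1 + 0.5 * t + 0.2 * t * t


def a_dot(t):
    return 0.5 + 0.4 * t


def r(ell):
    return math.sin(ell)


def r_prime(ell):
    return math.cos(ell)


def metric_diag(x):
    """Diagonal LFRW metric (g_tt, g_ll, g_thth, g_phph) at x = (t, ℓ, θ, φ)."""
    t, ell, th, _ = x
    a2 = a(t) ** 2
    return [-1.0, a2, a2 * r(ell) ** 2, a2 * (r(ell) * math.sin(th)) ** 2]


def christoffel(x, h=1e-5):
    """Γ^μ_{νλ} = ½ g^{μμ}(∂_ν g_{μλ} + ∂_λ g_{μν} − ∂_μ g_{νλ}) for a diagonal
    metric, using central finite differences for the derivatives."""
    n = 4
    dg = []  # dg[s][m] = ∂_s g_{mm}
    for s in range(n):
        xp, xm = list(x), list(x)
        xp[s] += h
        xm[s] -= h
        gp, gm = metric_diag(xp), metric_diag(xm)
        dg.append([(gp[m] - gm[m]) / (2 * h) for m in range(n)])
    g = metric_diag(x)
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for mu in range(n):
        for nu in range(n):
            for lam in range(n):
                term = 0.0
                if mu == lam:
                    term += dg[nu][mu]
                if mu == nu:
                    term += dg[lam][mu]
                if nu == lam:
                    term -= dg[mu][nu]
                gamma[mu][nu][lam] = 0.5 * term / g[mu]
    return gamma


T, L, TH, PH = 0, 1, 2, 3  # index labels for (t, ℓ, θ, φ)
t, ell, th = 0.8, 0.6, 1.1
G = christoffel((t, ell, th, 0.0))
checks = {
    "G^t_ll = a adot":       (G[T][L][L], a(t) * a_dot(t)),
    "G^t_thth = a adot r^2": (G[T][TH][TH], a(t) * a_dot(t) * r(ell) ** 2),
    "G^l_tl = adot/a":       (G[L][T][L], a_dot(t) / a(t)),
    "G^l_thth = -r r'":      (G[L][TH][TH], -r(ell) * r_prime(ell)),
    "G^th_lth = r'/r":       (G[TH][L][TH], r_prime(ell) / r(ell)),
    "G^th_phph = -sin cos":  (G[TH][PH][PH], -math.sin(th) * math.cos(th)),
    "G^ph_thph = cot":       (G[PH][TH][PH], 1 / math.tan(th)),
}
for name, (numeric, exact) in checks.items():
    print(f"{name:22s} numeric = {numeric:+.8f}   exact = {exact:+.8f}")
```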
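Using these expressions for the Christoffel symbols, the four geodesic equations then become

$$ \begin{aligned} &\frac{d^2 t}{ds^2} + a\dot a \left\{ \left(\frac{d\ell}{ds}\right)^2 + r^2 \left[ \left(\frac{d\theta}{ds}\right)^2 + \sin^2\theta \left(\frac{d\phi}{ds}\right)^2 \right] \right\} = 0 \\ &\frac{d^2 \ell}{ds^2} + 2\left(\frac{\dot a}{a}\right) \frac{d\ell}{ds}\frac{dt}{ds} - r r' \left[ \left(\frac{d\theta}{ds}\right)^2 + \sin^2\theta \left(\frac{d\phi}{ds}\right)^2 \right] = 0 \\ &\frac{d^2 \theta}{ds^2} + 2\left(\frac{\dot a}{a}\right) \frac{d\theta}{ds}\frac{dt}{ds} + 2\left(\frac{r'}{r}\right) \frac{d\theta}{ds}\frac{d\ell}{ds} - \sin\theta\cos\theta \left(\frac{d\phi}{ds}\right)^2 = 0 \\ &\frac{d^2 \phi}{ds^2} + 2\left(\frac{\dot a}{a}\right) \frac{d\phi}{ds}\frac{dt}{ds} + 2\left(\frac{r'}{r}\right) \frac{d\phi}{ds}\frac{d\ell}{ds} + 2\cot\theta\, \frac{d\theta}{ds}\frac{d\phi}{ds} = 0 \end{aligned} $$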
Since the metric is rotationally invariant, angular momentum is conserved along these geodesics in precisely the same way as it was for the Schwarzschild metric. That is, the motion is guaranteed to take place entirely within a plane, and we are free to choose our coordinates so that this plane is described by the equator, θ = π/2, for all s (which is clearly a solution to the d²θ/ds² equation above). Rotational invariance implies that the equation of motion for φ may be integrated once, to give (using θ = π/2)

$$ L = a^2 r^2 \frac{d\phi}{ds} , \tag{7.12} $$

where L is a constant.

The remaining equations can often be explicitly integrated. When ȧ = 0 they describe motion at constant speed along the geodesics of the spatial geometry (along straight lines if this geometry is flat: κ = 0). When ȧ ≠ 0, motion along these geodesics instead tends to damp out under the influence of the ȧ/a terms in the equations (called the Hubble 'friction' terms). This damping arises because the expansion of the universe extracts energy from the motion. Several special cases are of particular interest.

• Radial Motion: If dθ/ds = dφ/ds = 0 at one point, then these quantities remain zero along the entire geodesic. This shows that an initially radial motion continues in the radial direction for all times. Radial free fall is described by the equations of motion

$$ \frac{d^2 t}{ds^2} + a\dot a \left(\frac{d\ell}{ds}\right)^2 = 0 \quad \text{and} \quad \frac{d^2 \ell}{ds^2} + 2\left(\frac{\dot a}{a}\right) \frac{d\ell}{ds}\frac{dt}{ds} = 0 . \tag{7.13} $$

These together imply that the norm of the tangent vector is constant along the geodesic, (d/ds)[(dt/ds)² − a²(dℓ/ds)²] = 0, as expected on general grounds for an affinely parameterized geodesic.

• Inertial Motion: If a galaxy is initially at rest, so that dℓ/ds = dθ/ds = dφ/ds = 0, then it remains at rest, at fixed coordinate position, for all t. This shows that observers who remain at fixed position ℓ (the analogs of the observers at fixed r for the Schwarzschild metric) move along geodesics (unlike the fixed-r observers in Schwarzschild).
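The Hubble friction in the radial equations can be seen by integrating eq. (7.13) numerically. The ℓ equation is equivalent to (d/ds)(a² dℓ/ds) = 0, so a² dℓ/ds is conserved along with the norm of the tangent vector, and the peculiar velocity a dℓ/dt therefore decays as the universe grows. The Python sketch below assumes a(t) = t^(2/3) (the matter-dominated form discussed later in these notes) and uses a hand-written fourth-order Runge-Kutta step; both are choices made here for illustration, and the function names are hypothetical.

```python
import math


# Assumed scale factor, for illustration: a(t) = t^(2/3) (flat, matter dominated).
def a(t):
    return t ** (2.0 / 3.0)


def a_dot(t):
    return (2.0 / 3.0) * t ** (-1.0 / 3.0)


def rhs(state):
    """Radial geodesic equations (7.13) as a first-order system in s.
    state = (t, ell, dt/ds, dell/ds)."""
    t, ell, ut, ul = state
    return (ut, ul,
            -a(t) * a_dot(t) * ul ** 2,
            -2.0 * (a_dot(t) / a(t)) * ul * ut)


def rk4_step(state, ds):
    """One classical fourth-order Runge-Kutta step."""
    k1 = rhs(state)
    k2 = rhs(tuple(x + 0.5 * ds * k for x, k in zip(state, k1)))
    k3 = rhs(tuple(x + 0.5 * ds * k for x, k in zip(state, k2)))
    k4 = rhs(tuple(x + ds * k for x, k in zip(state, k3)))
    return tuple(x + ds / 6.0 * (d1 + 2 * d2 + 2 * d3 + d4)
                 for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4))


def invariants(state):
    """Norm (dt/ds)^2 - a^2 (dl/ds)^2 and the conserved combination a^2 dl/ds."""
    t, _, ut, ul = state
    return ut ** 2 - a(t) ** 2 * ul ** 2, a(t) ** 2 * ul


# Launch at t = 1 with peculiar speed V = a dl/dt = 0.5, so that s is proper time.
t_start, V_start = 1.0, 0.5
gamma = 1.0 / math.sqrt(1.0 - V_start ** 2)
state = (t_start, 0.0, gamma, gamma * V_start / a(t_start))

norm_start, p_start = invariants(state)
for _ in range(4000):
    state = rk4_step(state, 1e-3)
norm_end, p_end = invariants(state)

t_end, _, ut_end, ul_end = state
V_end = a(t_end) * ul_end / ut_end   # peculiar velocity a dl/dt at the end
print(f"t: {t_start} -> {t_end:.4f}   V_pec: {V_start} -> {V_end:.4f}")
print(f"norm: {norm_start:.12f} -> {norm_end:.12f}")
print(f"a^2 dl/ds: {p_start:.12f} -> {p_end:.12f}")
```

Since a² dℓ/ds = p is constant and the norm is 1, the peculiar velocity at any later time is V = (p/a)/√(1 + p²/a²), which falls off like 1/a once it becomes non-relativistic.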

Hubble Flow and Peculiar Motion


Consider now a particle moving more slowly than light, but for which some force keeps it from moving along a geodesic. This might happen for a galaxy, for instance, if some local density enhancement attracts it. In particular, consider for simplicity a galaxy having coordinates (t, ℓ = ℓ(t), θ = θ0, φ = φ0), which moves on a purely radial trajectory. The proper distance to this galaxy from, say, the origin is given by D(ℓ, t) = ℓ(t)a(t), and so its proper velocity relative to an observer at the origin is

$$ V_p = \frac{dD}{dt} = \ell \frac{da}{dt} + a \frac{d\ell}{dt} = H D + a \frac{d\ell}{dt} , \tag{7.14} $$

where

$$ H(t) := \frac{1}{a} \frac{da}{dt} . \tag{7.15} $$

Figure 24: A plot of velocity (redshift) vs (luminosity) distance for a class of bright, distant objects that are used to trace the motions of very distant galaxies (courtesy of Michael Richmond).
The first term of eq. (7.14) describes the galaxy’s apparent motion due to the overall
universal expansion, and expresses the Hubble Law: in the absence of other motions
at any given instant all galaxies recede with a proper speed which is proportional
to their proper distance. (This law describes the observed overall motion of galaxies
very well, as is illustrated in fig. 24.) By contrast, the second term describes the peculiar velocity,

$$ V_{\rm pec} = a \frac{d\ell}{dt} , \tag{7.16} $$

which expresses any motion relative to the comoving observers at fixed ℓ, whose worldlines are geodesics of the LFRW metric.
Measurements of H at the present epoch, H0 = H(t = t0), give H0 = 70 ± 10 km/sec/Mpc, which for a galaxy 1,000 Mpc distant (using present-day proper distance) would represent an apparent Hubble velocity of VH = 70,000 km/sec, or VH/c ∼ 0.2.
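The arithmetic behind the quoted Hubble velocity, keeping only the Hubble-law term HD of eq. (7.14) (peculiar velocity neglected), fits in a few lines of Python. The helper name is hypothetical, and the speed of light is the only input not taken from the text.

```python
C_KM_S = 299_792.458   # speed of light in km/s


def hubble_velocity(H0_km_s_mpc, D_mpc):
    """Recession speed H0 * D (first term of eq. 7.14), in km/s."""
    return H0_km_s_mpc * D_mpc


# Present-day proper distance of 1,000 Mpc, for the quoted range H0 = 70 ± 10 km/s/Mpc.
for H0 in (60.0, 70.0, 80.0):
    v = hubble_velocity(H0, 1000.0)
    print(f"H0 = {H0:.0f} km/s/Mpc:  V_H = {v:,.0f} km/s,  V_H/c = {v / C_KM_S:.3f}")
```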
If the proper time of an observer riding in this galaxy, τ, is used as the parameter along its trajectory, then (as usual)

$$ g_{\mu\nu} \frac{dx^\mu}{d\tau} \frac{dx^\nu}{d\tau} = -1 . \tag{7.17} $$

This expression allows the time dilation of observers in the galaxy to be related to the motion just described. Specialized to the radial motion ℓ = ℓ(t) this last equation reads

$$ \left(\frac{dt}{d\tau}\right)^2 - a^2 \left(\frac{d\ell}{d\tau}\right)^2 = \left(\frac{dt}{d\tau}\right)^2 \left(1 - V_{\rm pec}^2\right) = 1 , \tag{7.18} $$

and so the local time dilation is

$$ \frac{dt}{d\tau} = \gamma_{\rm pec} = \frac{1}{\sqrt{1 - V_{\rm pec}^2}} . \tag{7.19} $$

We see that there is no time dilation in the absence of peculiar motion, so t describes the proper time for all observers who sit at fixed coordinate positions. In the presence of peculiar motion a time dilation arises, given by the usual special relativistic expression in terms of the peculiar velocity, Vpec.
Light Rays and Redshift
The trajectories of particles (like photons) moving at the speed of light similarly satisfy

$$ g_{\mu\nu} \frac{dx^\mu}{ds} \frac{dx^\nu}{ds} = 0 , \tag{7.20} $$

which for radial motion specializes to

$$ \frac{dt}{ds} = \pm a \frac{d\ell}{ds} . \tag{7.21} $$
Consider now a photon which is sent to us (at the origin) along a radial trajectory from a galaxy which is situated at fixed coordinate position ℓ = L. If we suppose the photon to arrive at our position at t = 0 then we may compute its departure time at the emitting galaxy, t = −T. Explicitly, the look-back time, T, is given by eq. (7.21) to be

$$ L = \int_0^T \frac{dt}{a(t)} . \tag{7.22} $$
Imagine now repeating this calculation for a sequence of photons (or for a train of wave crests) which are emitted from the galaxy and are received here. Suppose two consecutive photons are emitted at events which are labelled by the coordinate positions (−T, L, θ0, φ0) and (−T + δT, L + δL, θ0, φ0), with the first of these received at the origin at time t = 0 and the second arriving at (δt, δℓ, θ0, φ0). The redshift of such a wave train may be found by computing how δt depends on δT, the scale factor, a(t), and the peculiar motions of the emitter and observer.
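We know that the trajectories of both photons satisfy eq. (7.21), and so we know

$$ L = \int_0^T \frac{dt}{a(t)} \quad \text{and} \quad (L + \delta L) - \delta\ell = \int_{-\delta t}^{T - \delta T} \frac{dt}{a(t)} . \tag{7.23} $$

Subtracting the first of these from the second, and expanding the result to first order in the small quantities δt, δT, δL and δℓ, leads to the following relation

$$ \delta L - \delta\ell = \int_{-\delta t}^{T - \delta T} \frac{dt}{a(t)} - \int_0^T \frac{dt}{a(t)} \approx \frac{\delta t}{a_0} - \frac{\delta T}{a(T)} , \tag{7.24} $$

where a0 = a(0). Dividing by δT then gives

$$ \frac{\delta L}{\delta T} - \frac{\delta\ell}{\delta t} \frac{\delta t}{\delta T} = \frac{\delta t}{\delta T} \frac{1}{a_0} - \frac{1}{a(T)} . \tag{7.25} $$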
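This may now be solved for δt/δT as a function of a0, a(T) and the emitter and observer's peculiar velocities, Vpec = a(T)[δL/δT] and vpec = a0[δℓ/δt], to give

$$ \frac{\delta t}{\delta T} = \frac{a_0}{a(T)} \left( \frac{1 + V_{\rm pec}}{1 + v_{\rm pec}} \right) . \tag{7.26} $$

The redshift, z, of the light is defined in terms of its wavelength at emission, λem, and at observation, λobs, by z = (λobs − λem)/λem and so

$$ \begin{aligned} 1 + z = \frac{\lambda_{\rm obs}}{\lambda_{\rm em}} = \frac{\delta\tau_{\rm obs}}{\delta\tau_{\rm em}} &= \frac{\delta t}{\delta T} \left( \frac{1 - v_{\rm pec}^2}{1 - V_{\rm pec}^2} \right)^{1/2} \\ &= \frac{a_0}{a(T)} \left( \frac{1 + V_{\rm pec}}{1 + v_{\rm pec}} \right) \left( \frac{1 - v_{\rm pec}^2}{1 - V_{\rm pec}^2} \right)^{1/2} . \end{aligned} \tag{7.27} $$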
This last expression uses eq. (7.19) to relate the proper time of the observer, δτobs ,
and of the emitter, δτem , to the corresponding coordinate time differences, δt and
δT .
Eq. (7.27) is the main result. For negligible peculiar motions it reduces to a
simple expression for the redshift due to the Hubble flow
$$ 1 + z = \frac{a_0}{a(T)} , \tag{7.28} $$

which is a red shift – i.e. z > 0 – if the universe expands – i.e. a0 > a(T ). This
expression gives a good method for measuring the universe’s scale factor, a(t), since
it shows that it is simply related to the redshift of the light received from distant
galaxies.
For non-relativistic peculiar velocities this generalizes to the approximate formula
$$ 1 + z \approx \frac{a_0}{a(T)} \left[ 1 + (V_{\rm pec} - v_{\rm pec}) \right] . \tag{7.29} $$

Notice that (as expected) relative peculiar motion also generates a redshift – z > 0
– if Vpec > vpec – that is, if the emitting galaxy is receding from the observing one.
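To see how good the linearized eq. (7.29) is, the sketch below evaluates the exact eq. (7.27) alongside it. It is illustrative Python with hypothetical function names. Velocities are in units of c and are both measured in the direction of increasing ℓ along the line of sight, so V_em > 0 means the emitter recedes from us while v_obs > 0 means we move towards it. With a0/a(T) = 1 and only the emitter moving, the exact formula reduces to the special-relativistic Doppler factor √((1 + V)/(1 − V)).

```python
import math


def one_plus_z(a_ratio, V_em, v_obs):
    """Exact 1 + z from eq. (7.27). a_ratio = a0/a(T); V_em and v_obs are the
    emitter's and observer's radial peculiar velocities in units of c, both
    measured in the direction of increasing ℓ along the line of sight."""
    doppler = (1.0 + V_em) / (1.0 + v_obs)
    time_dilation = math.sqrt((1.0 - v_obs ** 2) / (1.0 - V_em ** 2))
    return a_ratio * doppler * time_dilation


def one_plus_z_linear(a_ratio, V_em, v_obs):
    """First-order approximation of eq. (7.29)."""
    return a_ratio * (1.0 + (V_em - v_obs))


a_ratio = 1.5   # hypothetical: the universe has grown by 50% since emission
for V_em, v_obs in [(0.0, 0.0), (1e-3, 0.0), (0.0, 1e-3), (0.3, 0.0)]:
    exact = one_plus_z(a_ratio, V_em, v_obs)
    linear = one_plus_z_linear(a_ratio, V_em, v_obs)
    print(f"V_em = {V_em:.3f}, v_obs = {v_obs:.3f}:  "
          f"z = {exact - 1:.6f}  (linear: {linear - 1:.6f})")
```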
In principle, the dependence of z on peculiar velocity complicates the inference of the universal scale factor from measurements of redshift, since it requires knowledge of the peculiar velocity of the distant emitting galaxy. In practice, however, this complication is only important for relatively nearby galaxies, for which the redshift due to peculiar velocities is not dominated by that due to the universal expansion.

7.2 Distance vs Redshift


In LFRW cosmology the expansion of the universe is characterized by the time de-
pendence of the scale factor, a(t), which we shall see is in most circumstances a
monotonic function of t. In principle, predictions for a(t) can be tested by mea-
suring the proper distances, D(L, −T ), to distant celestial objects and comparing
this with the look-back time, T , to these objects. Measurements of D(L, −T ) vs T
allow the inference of a(t) because of the connection between L and T — i.e. the
relation L(T ) given implicitly by eq. (7.23) — which expresses the fact that all of our
observations about the distant universe lie along our past light cone, because they
rely on our detecting photons which have come to us from the far reaches of space.
In practice, however, it is much easier to directly measure a than it is to measure
T because of the direct relationship between a and redshift. So inferences about the
geometry of spacetime instead are founded on measuring the dependence of distance
on redshift, z, for distant objects, rather than on look-back time, T. z and T carry the same information provided a(t) is a monotonic function of time, and so it is more
convenient to use z itself as an operational measure of the universe’s age and size.
The remainder of this section derives expressions for the dependence of various
measures of distance on redshift, given a universal expansion history, a(t).

Proper Distance
Consider, then, a galaxy which at event (−T, L, θ0, φ0) sends light to us which we receive at the origin at t = 0. Writing a0 = a(0), the present-day proper distance to this galaxy is given by

$$ D(T) = D(L(T), 0) = a_0 L = \int_0^T \frac{a_0}{a(t)}\, dt . \tag{7.30} $$

This may be changed into an expression in terms of redshift by changing integration variable from t to z using the relations

$$ 1 + z = \frac{a_0}{a(t)} \quad \text{and so} \quad dz = -\frac{a_0 \dot a}{a^2}\, dt = -(1 + z) H\, dt , \tag{7.31} $$

where as before H = ȧ/a. This leads to the desired result

$$ D(z) = \int_0^z \frac{dz'}{H(z')} . \tag{7.32} $$
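Eq. (7.32) reduces any expansion history H(z) to a one-dimensional integral, which is straightforward to evaluate numerically. The Python sketch below uses a composite Simpson rule (a quadrature choice made here, not in the notes) and checks it against two cases worked out in closed form later in this section, flat matter-dominated and exponential expansion, with H0 = 1 so that distances come out in units of 1/H0. Function names are hypothetical.

```python
def proper_distance(H, z, n=1000):
    """Present-day proper distance D(z) = ∫_0^z dz'/H(z'), eq. (7.32),
    evaluated with the composite Simpson rule on n (even) subintervals."""
    if n % 2:
        n += 1
    h = z / n
    total = 1.0 / H(0.0) + 1.0 / H(z)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) / H(i * h)
    return total * h / 3.0


def H_matter(z):
    return (1.0 + z) ** 1.5      # flat, matter dominated (H0 = 1)


def H_exponential(z):
    return 1.0                   # exponential expansion: H constant (H0 = 1)


for z in (0.1, 1.0, 3.0):
    exact_matter = 2.0 * (1.0 - (1.0 + z) ** -0.5)
    print(f"z = {z}: matter {proper_distance(H_matter, z):.8f} (exact {exact_matter:.8f}), "
          f"exponential {proper_distance(H_exponential, z):.8f} (exact {z:.8f})")
```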

Unfortunately, proper distance is also not particularly convenient since it is not easily obtained from observations. There are two other notions of distance which are more practical, whose dependence on z is now derived.

Luminosity Distance
The distance to a remote object can be inferred if its intrinsic rate of energy release per unit time (i.e. its luminosity, ℒ) is known. If ℒ is known then it may be compared with the observed energy flux, f, which is received at Earth from the object, with the distance to the object obtained by assuming that the flux is related to ℒ only by the geometrical solid angle which the Earth subtends at the source. For instance, in Euclidean space the flux received from a source of luminosity ℒ situated a distance D away is given by

$$ f = \frac{\mathcal{L}}{4\pi D^2} , \tag{7.33} $$

provided the source sends its energy equally in all directions and that there is no absorption or scattering of the light while it is en route from the source. The luminosity distance, DL, to the object may then be defined in terms of ℒ and f by DL = [ℒ/(4πf)]^{1/2}. This is the distance measure which is used, for example, in recent measurements of the universal expansion using distant Type Ia supernovae.
Suppose, then, that the source emits a packet of light having energy, δEem, in a time, δtem, and so has luminosity ℒ = δEem/δtem. In an LFRW universe the relation between ℒ and the flux, f, we observe depends differently on distance than in flat spacetime, in the following ways.

• Because the wavelength of the light is stretched by the universal expansion, and the energy of a light wave is inversely proportional to its wavelength (E = hν = hc/λ), this packet of energy arrives with a red-shifted energy δEobs = δEem/(1 + z).

• The same expansion of space while the light is en route stretches the spatial extent of the packet by a factor 1 + z during its passage between the source and us. This means that on its arrival the time taken for the packet to deliver its energy is δtobs = δtem(1 + z).

• The total energy from the source is sent in all directions, and so (using the LFRW metric) on arrival it is spread over a sphere having surface area A = 4πr²(L)a0², where r(L) is given by eq. (7.3) and the present-day proper distance to the source is D = La0.

The flux observed at Earth is therefore given by

$$ \begin{aligned} f &= \frac{1}{4\pi r^2(L)\, a_0^2} \left( \frac{\delta E_{\rm obs}}{\delta t_{\rm obs}} \right) \\ &= \frac{1}{4\pi r^2(L)\, a_0^2} \left[ \frac{\delta E_{\rm em}/(1 + z)}{\delta t_{\rm em}\,(1 + z)} \right] \\ &= \frac{\mathcal{L}}{4\pi r^2(L)\, a_0^2} \, \frac{1}{(1 + z)^2} , \end{aligned} \tag{7.34} $$

and so the luminosity distance becomes

$$ D_L(z) \equiv \left( \frac{\mathcal{L}}{4\pi f} \right)^{1/2} = a_0\, r(L(z))\, (1 + z) . \tag{7.35} $$

Notice that the present-day proper distance to the same galaxy would be D = La0. Since r(ℓ) = ℓ in the special case of a spatially-flat universe, κ = 0, in this case DL is related to this proper distance by

$$ D_L(z) = D(z)\, (1 + z) \qquad (\text{if } \kappa = 0) . \tag{7.36} $$
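As a consistency check of eqs. (7.34) and (7.35), the sketch below computes the flux from a source and then recovers DL from its definition DL = [ℒ/(4πf)]^{1/2}. The helper names and input values are hypothetical; z and the coordinate distance L are treated as independent inputs here, whereas in a given cosmology they are linked through eq. (7.22), and a0 = r0 = 1.

```python
import math


def r_of_ell(ell, kappa, r0=1.0):
    """Areal radius r(ℓ) from eq. (7.3)."""
    if kappa == 1:
        return r0 * math.sin(ell / r0)
    if kappa == -1:
        return r0 * math.sinh(ell / r0)
    return ell


def observed_flux(lum, L_coord, z, kappa, a0=1.0, r0=1.0):
    """Flux of eq. (7.34): energy spread over the sphere of area 4π r(L)^2 a0^2,
    with one factor of (1+z) from the photon redshift and one from the
    stretched arrival time."""
    area = 4 * math.pi * (a0 * r_of_ell(L_coord, kappa, r0)) ** 2
    return lum / area / (1 + z) ** 2


def luminosity_distance(L_coord, z, kappa, a0=1.0, r0=1.0):
    """D_L = a0 r(L) (1 + z), eq. (7.35)."""
    return a0 * r_of_ell(L_coord, kappa, r0) * (1 + z)


# Hypothetical inputs: luminosity, coordinate distance L and redshift z.
lum, L_coord, z = 1.0, 0.8, 1.2
for kappa in (1, 0, -1):
    f = observed_flux(lum, L_coord, z, kappa)
    from_flux = math.sqrt(lum / (4 * math.pi * f))
    print(kappa, from_flux, luminosity_distance(L_coord, z, kappa))
```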

Angular-Diameter Distance
A second measure of distance becomes possible if an object of known proper length is observed at a distance, since the angle which the object subtends as seen from Earth is geometrically related to its distance from us. In Euclidean geometry an object of length ds placed a distance D ≫ ds from us subtends an angle

$$ d\theta = \frac{ds}{D} \quad \text{(radians)} , \tag{7.37} $$

which motivates defining the angular-diameter distance by DA = ds/dθ in terms of the (assumed) known length ds and measured angle dθ. This notion of distance comes up in the study of the temperature fluctuations of the cosmic microwave background radiation (about which more will be said later).
The connection between ds and dθ differs in the LFRW geometry in the following ways.

• At any given time, within an LFRW geometry the proper length of an object which subtends an angle dθ when placed a proper distance D = aℓ away is given by ds = a r(ℓ) dθ, with r(ℓ) given by eq. (7.3).

• When an object is observed from a great distance it is the proper distance at the time its light was emitted which appears in the previous argument. Due to the overall expansion of space this corresponds to a proper distance at present which is a factor a0/a(−T) = 1 + z larger.

With these two effects in mind, the angle subtended by an object having proper length ds when observed from a present-day proper distance D = a0L away is given by

$$ d\theta = \frac{ds}{a(-T)\, r(L)} = \frac{ds}{a_0\, r(L)/(1 + z)} , \tag{7.38} $$

and so the angular-diameter distance of such an object is

$$ D_A(z) \equiv \frac{ds}{d\theta} = \frac{a_0\, r(L(z))}{1 + z} = \frac{D_L(z)}{(1 + z)^2} , \tag{7.39} $$

where the last equality uses eq. (7.35).


Notice that in the special case of a spatially-flat universe (κ = 0), we have r(ℓ) = ℓ and so the angular-diameter distance to an object situated a proper distance D = a0L away is

$$ D_A(z) = \frac{D(z)}{1 + z} \qquad (\text{if } \kappa = 0) . \tag{7.40} $$

This is equivalent to the object's proper distance as measured at the time of the light's emission rather than its present proper distance.

Exercise 31: Measurements of the total number, N, of distant objects as a function of their redshift, z, provide another way to measure a(t). Show that if the objects in question have a density n(t), then

$$ \begin{aligned} dN &= 4\pi n(t)\, a^3(t)\, r^2(\ell(t))\, d\ell \\ &= 4\pi n(t)\, a^2(t)\, r^2(\ell(t))\, dt \\ &= 4\pi n[t(z)]\, a_0^2\, r^2[\ell(t(z))] \frac{dz}{(1 + z)^3 H(z)} . \end{aligned} \tag{7.41} $$

The Recent Universe


For later purposes it is useful to evaluate the above distance-redshift expressions for various choices for the time-dependence of the universal expansion, a(t). For simplicity (and because this appears to be a good description of the present-day universe) in the case of DL and DA we provide formulae for the special case κ = 0.
A great many cosmological observations are restricted to the comparatively nearby universe, for which the observed redshifts are small. For such small redshifts it is useful to evaluate the distance-redshift expressions by expanding about the present epoch, for which z = 0. Consider, therefore, a scale factor of the form

$$ a(t) = a_0 + \dot a_0 (t - t_0) + \frac{1}{2} \ddot a_0 (t - t_0)^2 + \cdots , \tag{7.42} $$

where t = t0 denotes the present epoch. In what follows it is convenient to measure the time difference in units of H0⁻¹, where H0 = ȧ0/a0, by defining ζ = −H0(t − t0), in which case the above expansion is expected to furnish a good approximation for |ζ| ≲ 1. (Notice that as defined ζ ≥ 0 when applied to a(t) in the past universe, for which t ≤ t0.)
In terms of this expansion the redshift of light becomes

$$ 1 + z = \frac{a_0}{a(t)} = 1 + \zeta + \left( 1 + \frac{q_0}{2} \right) \zeta^2 + \cdots , \tag{7.43} $$

where q0 ≡ −a0ä0/ȧ0² = −ä0/(a0H0²), with the sign chosen so that q0 > 0 for a decelerating universe (for which ä0 < 0).
The distance-redshift relations are governed by H(z), which is given by

$$ \begin{aligned} H &= H_0 \left[ 1 + \left( \frac{\ddot a_0}{\dot a_0} - \frac{\dot a_0}{a_0} \right) (t - t_0) + \cdots \right] \\ &= H_0 \left[ 1 + (1 + q_0)\, z + \cdots \right] . \end{aligned} \tag{7.44} $$

Using this in eq. (7.32) leads to the following expression for D(z) near z = 0

$$ D(z) = H_0^{-1} \left[ z - \frac{1}{2} (1 + q_0)\, z^2 + \cdots \right] , \tag{7.45} $$

which for κ = 0 also implies the following small-z expansions for the luminosity and angular-diameter distances

$$ \begin{aligned} D_L(z) &= H_0^{-1} \left[ z + \frac{1}{2} (1 - q_0)\, z^2 + \cdots \right] \\ D_A(z) &= H_0^{-1} \left[ z - \frac{1}{2} (3 + q_0)\, z^2 + \cdots \right] . \end{aligned} \tag{7.46} $$
For small z the leading distance-redshift dependence is therefore predicted to be linear, D(z) ≃ H0⁻¹z, for all of the distance definitions given above, a result which expresses Hubble's Law in the form observers really test it (such as in fig. 24). It is the measurement of this slope, such as by the Hubble Key Project [7], that led to the current best value H0 = 72 ± 8 km/sec/Mpc.
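How far the small-z expansions can be trusted depends on the expansion history. The Python sketch below, with hypothetical helper names, compares eqs. (7.45) and (7.46) with the exact flat matter-dominated results (q0 = 1/2) given later in this section, eqs. (7.53) to (7.55), in units of 1/H0. Since the neglected terms are of order z³, the relative errors should grow roughly like z².

```python
def series_distances(z, q0):
    """Second-order small-z expansions (7.45)-(7.46) for kappa = 0, in units of 1/H0."""
    D = z - 0.5 * (1.0 + q0) * z ** 2
    DL = z + 0.5 * (1.0 - q0) * z ** 2
    DA = z - 0.5 * (3.0 + q0) * z ** 2
    return D, DL, DA


def matter_exact(z):
    """Exact flat matter-dominated distances (q0 = 1/2), eqs. (7.53)-(7.55), units of 1/H0."""
    D = 2.0 * (1.0 - (1.0 + z) ** -0.5)
    return D, D * (1.0 + z), D / (1.0 + z)


for z in (0.01, 0.05, 0.1, 0.3):
    rel_err = [abs(s - e) / e for s, e in zip(series_distances(z, 0.5), matter_exact(z))]
    print(f"z = {z:<5} relative errors (D, DL, DA): "
          + ", ".join(f"{err:.1e}" for err in rel_err))
```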
Clearly a precise determination of distance vs redshift for objects out to larger redshifts permits the extraction of the deceleration parameter (q0) in addition to the present-day Hubble constant (H0). This has proven quite difficult to do reliably, but has recently been accomplished (see fig. 26) using the luminosity-distance vs redshift relation measured for Type Ia supernovae, which are bright enough to be seen at enormous distances but for which the intrinsic luminosity is known. It is these measurements that discovered that the universal expansion is accelerating: that is, q0 < 0 so ä0 > 0.

Figure 25: Measurements of the present-day Hubble scale, H0, as obtained from distance-redshift measurements by the Hubble Key Project.
Power-Law Expansion
Another situation of considerable practical interest is the case where the expansion varies as a power of t, as in

$$ 1 + z = \frac{a_0}{a(t)} = \left( \frac{t_0}{t} \right)^\alpha , \tag{7.47} $$

for some choices for the parameters a0, t0 and α. In later sections we shall find this law is produced (if κ = 0) with α = 1/2 for a universe full of radiation, and with α = 2/3 for a universe consisting dominantly of non-relativistic matter (like atoms or stars).
For such a universe the Hubble and deceleration parameters become

$$ H(t) = \frac{\dot a}{a} = \frac{\alpha}{t} = H_0 \left( \frac{t_0}{t} \right) = H_0 (1 + z)^{1/\alpha} \quad \text{and} \quad q(t) = -\frac{a \ddot a}{\dot a^2} = \frac{1 - \alpha}{\alpha} . \tag{7.48} $$

Notice that this kind of power law implies that a vanishes for t = 0 provided only that α > 0 (and so in particular does so for the cases α = 1/2 and 2/3 mentioned above). This is the Big Bang which underlies much of modern cosmology. In terms of q = q0 and the present value of the Hubble parameter, H0, this occurs a time

$$ t_0 = \alpha H_0^{-1} = \frac{H_0^{-1}}{q_0 + 1} \tag{7.49} $$

in the past.
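Converting eq. (7.49) to years requires only unit conversions. In the Python sketch below the conversion constants are rounded values supplied here (not from the notes), H0 = 70 km/s/Mpc is the central value quoted earlier, and the function names are hypothetical.

```python
MPC_KM = 3.0857e19     # kilometres per megaparsec (rounded)
GYR_S = 3.1557e16      # seconds per gigayear (rounded)


def hubble_time_gyr(H0_km_s_mpc):
    """1/H0 in Gyr, for H0 in km/s/Mpc."""
    return MPC_KM / H0_km_s_mpc / GYR_S


def power_law_age_gyr(alpha, H0_km_s_mpc):
    """Time since a = 0 for a ∝ t^alpha: t0 = alpha/H0 = 1/((q0 + 1) H0), eq. (7.49)."""
    return alpha * hubble_time_gyr(H0_km_s_mpc)


for label, alpha in (("radiation", 0.5), ("matter", 2.0 / 3.0)):
    q0 = (1.0 - alpha) / alpha
    print(f"{label:9s}: alpha = {alpha:.3f}, q0 = {q0:.2f}, "
          f"t0 = {power_law_age_gyr(alpha, 70.0):.2f} Gyr for H0 = 70 km/s/Mpc")
```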
Using the above expressions for q and H(z) in eq. (7.32) gives the following expression for the proper distance

$$ D(z) = H_0^{-1} \int_0^z \frac{dz'}{(1 + z')^{1/\alpha}} = \frac{H_0^{-1}}{q} \left[ 1 - \frac{1}{(1 + z)^q} \right] , \tag{7.50} $$

which with DL(z) = D(z)(1 + z) and DA(z) = D(z)/(1 + z) give the luminosity and angular-diameter distances when κ = 0.
Radiation-Dominated Universe (if κ = 0):
As mentioned above, the special case where the universe is dominated by radiation with κ = 0 turns out to correspond to a power-law expansion with α = 1/2, and so we have H(z) = H0(1 + z)² and q(z) = q0 = 1. This leads to the following proper distance

$$ D(z) = H_0^{-1} \left( \frac{z}{1 + z} \right) = \begin{cases} H_0^{-1} \left[ z - z^2 + \cdots \right] & \text{if } z \ll 1 \\ H_0^{-1} \left[ 1 - \dfrac{1}{z} + \cdots \right] & \text{if } z \gg 1 \end{cases} . \tag{7.51} $$

Since κ = 0 the luminosity and angular-diameter distances become

$$ D_L(z) = H_0^{-1} z , \qquad D_A(z) = H_0^{-1} \frac{z}{(1 + z)^2} = \begin{cases} H_0^{-1} \left[ z - 2 z^2 + \cdots \right] & \text{if } z \ll 1 \\ \dfrac{H_0^{-1}}{z} \left[ 1 - \dfrac{2}{z} + \cdots \right] & \text{if } z \gg 1 \end{cases} . \tag{7.52} $$
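Matter-Dominated Universe (if κ = 0):
The special case where κ = 0 and the universe is dominated by non-relativistic matter corresponds to power-law expansion with α = 2/3, and so H(z) = H0(1 + z)^{3/2} and q(z) = q0 = 1/2. This leads to the proper distance

$$ D(z) = 2 H_0^{-1} \left[ \frac{(1 + z)^{1/2} - 1}{(1 + z)^{1/2}} \right] = \begin{cases} H_0^{-1} \left[ z - \frac{3}{4} z^2 + \cdots \right] & \text{if } z \ll 1 \\ 2 H_0^{-1} \left[ 1 - \left( \dfrac{1}{z} \right)^{1/2} + \cdots \right] & \text{if } z \gg 1 \end{cases} . \tag{7.53} $$

Because κ = 0 the luminosity and angular-diameter distances are

$$ D_L(z) = 2 H_0^{-1} \left[ (1 + z) - \sqrt{1 + z} \right] = \begin{cases} H_0^{-1} \left[ z + \frac{1}{4} z^2 + \cdots \right] & \text{if } z \ll 1 \\ 2 H_0^{-1} z \left[ 1 - \left( \dfrac{1}{z} \right)^{1/2} + \cdots \right] & \text{if } z \gg 1 \end{cases} , \tag{7.54} $$

and

$$ D_A(z) = 2 H_0^{-1} \left[ \frac{(1 + z)^{1/2} - 1}{(1 + z)^{3/2}} \right] = \begin{cases} H_0^{-1} \left[ z - \frac{7}{4} z^2 + \cdots \right] & \text{if } z \ll 1 \\ \dfrac{2 H_0^{-1}}{z} \left[ 1 - \left( \dfrac{1}{z} \right)^{1/2} + \cdots \right] & \text{if } z \gg 1 \end{cases} . \tag{7.55} $$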

Notice that for both matter- and radiation-dominated universes the present-day
proper distance approaches a limiting value of order H0⁻¹ when z → ∞. This implies
that we do not learn about arbitrarily large distances when we look into the past at
objects having larger and larger redshifts. A related observation is the fact that the
angular-diameter distance is not a monotonic function of z, since it grows like z for
small z but vanishes asymptotically for large z, proportional to 1/z. Since (when
κ = 0) angular-diameter distance is the proper distance to the source measured at
the time the light is emitted rather than observed, this vanishing of DA for large
z shows that our observations are limited to a vanishingly small local region in the
very distant past. This limitation to our view is called our local particle horizon. It
arises because for these geometries the universe becomes vanishingly small at a finite
time in our past and the universal expansion can be fast enough to permit objects
to be sufficiently distant that light cannot reach us from them given the limited age
of the universe.
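The non-monotonic behaviour of $D_A$ described above is easy to check numerically. What follows is a minimal Python sketch, using only the standard library, in units where H0 = 1 (so distances come out in units of $H_0^{-1}$). It evaluates $D(z) = \int_0^z dz'/H(z')$ by Simpson's rule for the κ = 0 radiation- and matter-dominated cases, compares the result with the closed forms (7.51) and (7.53), and locates the maximum of $D_A = D/(1+z)$ by a simple grid search. The function names, the grid and the choice of Simpson's rule are illustrative assumptions, not anything prescribed in these notes.

```python
import math

def proper_distance(z, hubble, n=2000):
    """D(z) = integral_0^z dz'/H(z') by composite Simpson's rule (n even)."""
    h = z / n
    total = 1.0 / hubble(0.0) + 1.0 / hubble(z)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) / hubble(k * h)
    return total * h / 3.0

# Units with H0 = 1, so all distances are in units of 1/H0 (kappa = 0).
def hubble_radiation(z):
    return (1.0 + z) ** 2      # H(z) for alpha = 1/2

def hubble_matter(z):
    return (1.0 + z) ** 1.5    # H(z) for alpha = 2/3

# Closed forms of eqs. (7.51) and (7.53).
def d_radiation(z):
    return z / (1.0 + z)

def d_matter(z):
    return 2.0 * (1.0 - 1.0 / math.sqrt(1.0 + z))

def angular_diameter(z, proper):
    """kappa = 0: D_A = D / (1 + z)."""
    return proper(z) / (1.0 + z)

def peak_redshift(proper, z_max=10.0, steps=10000):
    """Grid search for the redshift at which D_A is largest."""
    zs = [z_max * k / steps for k in range(1, steps + 1)]
    return max(zs, key=lambda z: angular_diameter(z, proper))

if __name__ == "__main__":
    for name, H, D in [("radiation", hubble_radiation, d_radiation),
                       ("matter", hubble_matter, d_matter)]:
        z_peak = peak_redshift(D)
        print(name, proper_distance(3.0, H), D(3.0),
              z_peak, angular_diameter(z_peak, D))
```

Setting $dD_A/dz = 0$ in the closed forms gives the same peaks analytically: z = 1 with $D_A = H_0^{-1}/4$ for radiation domination, and z = 5/4 with $D_A = 8H_0^{-1}/27$ for matter domination.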

Exponential Expansion
The next special case of interest corresponds to exponential expansion

$$ 1 + z = \frac{a_0}{a(t)} = \exp\left[ -H_0 (t - t_0) \right] , \tag{7.56} $$
which may be regarded as the limiting case of a power law for which α → ∞. We
shall find this kind of expansion can be produced when the universal energy density
is dominated by the energy of the vacuum.
In this case the Hubble and deceleration parameters are time-independent, with

$$ H(t) = \frac{\dot a}{a} = H_0 \qquad \text{and} \qquad q(t) = q_0 = -1 , \tag{7.57} $$

and the redshift-dependence of the proper distance is $D(z) = H_0^{-1}\int_0^z dz' = H_0^{-1} z$. The luminosity and angular-diameter distances (when κ = 0) then become

$$ D_L(z) = H_0^{-1}\, z(1+z) \qquad \text{and} \qquad D_A(z) = H_0^{-1}\,\frac{z}{1+z} . \tag{7.58} $$
Notice that, unlike for the previous examples, the expansion in this case is accel-
erated, with ä > 0 and so q0 < 0. This kind of expansion is particularly interesting
because of recent tests of Hubble’s Law out to comparatively large redshifts, which
indicate q0 really is negative (see fig. 26). We shall see later that this kind of ex-
pansion can also be generated by plausible kinds of matter, and in particular would
arise if the vacuum itself were to have a nonzero energy density.

Unlike the case of matter- and radiation-domination considered earlier, in this case the present-day proper distance grows without bound but the proper distance at emission approaches a fixed limit, $D_A \to H_0^{-1}$, as z → ∞. This distance represents an apparent horizon beyond which we are unable to penetrate with observations, and differs from the particle horizon considered above because it is not tied to there only having been a finite proper time since the universe had zero size. For the exponentially-expanding universe only a finite proper distance in the past is accessible to us even though t can run back to −∞. The existence of this horizon can be traced to the enormous speed of the exponential expansion, with which light waves travelling at finite speed cannot keep up.

Figure 26: Measurements of the luminosity-distance/redshift relation to higher redshifts, with evidence for q0 < 0, by the Supernova Cosmology Project and the High-z Supernova Search.

7.3 Dynamics of an Expanding Universe


The previous sections described the kinematics of how various distance-redshift re-
lationships depend on the universal expansion history, a(t). The present section
instead addresses the question of how this expansion history depends on the energy
content of the matter which lives inside the universe. This connection has its roots
in the Einstein field equations

$$ R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = 8\pi G\, T_{\mu\nu} , \tag{7.59} $$

which relate the curvature of spacetime to its energy-momentum content, i.e. “matter tells space how to curve”.

Homogeneous and Isotropic Stress Energy


The conditions of homogeneity and isotropy strongly restrict the distribution of mat-
ter and energy within the universe, in the same way that they restrict the metric to
take the Friedmann-Robertson-Walker form, given by eq. (7.2). For the stress-energy
tensor, Tµν , the analogous conditions have the following form.

• Isotropy permits the energy density, $\rho = T^{tt}$, to be an arbitrary function of time, t, and radial position, ℓ, but homogeneity forbids any dependence on the position ℓ. The most general energy density can therefore only be time dependent: $T^{tt} = \rho(t)$.

• Isotropy permits a net energy flux, $s^i = T^{ti}$ with i = 1, 2, 3, so long as it points purely in the radial direction.¹³ In LFRW coordinates this implies $T^{t\theta} = T^{t\phi} = 0$ while $T^{t\ell}$ can be a nonzero function of t and ℓ. Homogeneity, however, requires $T^{t\ell} = 0$ because having a nonzero energy flux would necessarily allow one to distinguish between the directions from which and to which the energy is flowing. The same conclusions equally apply to the momentum density: $\pi^i = T^{it} = 0$.

• Isotropy permits the 3-dimensional stress tensor, $T^{ij}$, to be nonzero provided it is built from the metric tensor itself, or from the radial direction vector, $x^i$. That is, isotropy allows $T^{ij} = p\, g^{ij} + q\, x^i x^j$, where p and q can be functions of both t and ℓ. However homogeneity precludes p from depending on ℓ, and does not permit a nonzero q at all, since the radial vector picks out a preferred place as its origin. It follows that the stress tensor must have the diagonal form $T_i{}^j = g_{ik} T^{kj} = p(t)\, \delta_i^j$.
We are led to the conclusion that homogeneity and isotropy only permit a stress-energy of the form

$$ T^{tt} = \rho(t) , \qquad T^{ti} = T^{it} = 0 \qquad \text{and} \qquad T^{ij} = p(t)\, g^{ij} , \tag{7.60} $$

which is characterized by two functions of time: ρ(t) and p(t). As is clear from the definition of $T^{\mu\nu}$, ρ represents the (average) energy density as seen by co-moving observers who are situated at fixed values of (ℓ, θ, φ). As we saw in earlier sections, the interpretation of $T^{ij}$ as a momentum flux together with stress-energy conservation implies that the net rate of change in momentum of a volume V (i.e. the net force acting on V) is given by the flux of momentum current through the boundary, ∂V:

$$ F^i \equiv \frac{dP^i}{dt} = \int_V \frac{\partial \pi^i}{\partial t}\, d^3V = -\oint_{\partial V} T^{ij} n_j\, d^2S = -\oint_{\partial V} p\, n^i\, d^2S , \tag{7.61} $$

which shows that p represents the total (average) pressure of the matter whose stress
energy is under consideration.
Our goal now is to see how Einstein’s equations relate these quantities to a(t).

Einstein’s Equations
In order to determine how a(t) is connected to ρ(t) and p(t) we require the Ricci tensor for the LFRW metric, eq. (7.2). It is convenient to write the metric in terms of the time coordinate, t, and the space coordinates, $x^i = \{r, \theta, \phi\}$, as

$$ g_{tt} = -1 , \qquad g_{ti} = 0 \qquad \text{and} \qquad g_{ij} = a^2(t)\, \hat g_{ij} , \tag{7.62} $$


13 This can be removed by changing the radial coordinate, but we do not do so in order not to lose the simple connection between proper distance and coordinate distance, D = a(t)Δℓ.

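where $\hat g_{ij}\, dx^i dx^j = dr^2/(1 - \kappa r^2/r_0^2) + r^2 (d\theta^2 + \sin^2\theta\, d\phi^2)$ denotes the spatial metric with the scale factor, a(t), removed. In terms of these, the only nonzero Christoffel symbols are

$$ \Gamma^t_{ij} = a \dot a\, \hat g_{ij} , \qquad \Gamma^i_{tj} = \Gamma^i_{jt} = \frac{\dot a}{a}\, \delta^i_j \qquad \text{and} \qquad \Gamma^i_{jk} = \hat\Gamma^i_{jk} , \tag{7.63} $$

where $\hat\Gamma^i_{jk}$ denotes the Christoffel symbols built from the spatial metric, $\hat g_{ij}$. The components of the Ricci tensor are similarly given by

$$ R_{tt} = -3\,\frac{\ddot a}{a} , \qquad R_{ti} = 0 \qquad \text{and} \qquad R_{ij} = \hat R_{ij} + \left( a \ddot a + 2 \dot a^2 \right) \hat g_{ij} , \tag{7.64} $$

where the Ricci tensor for the spatial metric is

$$ \hat R_{ij} = \frac{2\kappa}{r_0^2}\, \hat g_{ij} . \tag{7.65} $$

In the same basis the components of the stress energy are

$$ T_{tt} = \rho , \qquad T_{ti} = 0 \qquad \text{and} \qquad T_{ij} = p\, g_{ij} = p\, a^2 \hat g_{ij} , \tag{7.66} $$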
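and so specializing the Einstein field equations, eq. (4.4), to homogeneous and isotropic geometries leads to the following two independent differential equations which relate a(t) to ρ(t) and p(t):

$$ \begin{aligned} 3\,\frac{\ddot a}{a} &= -4\pi G\,(\rho + 3p) \\ \frac{\ddot a}{a} + 2\left(\frac{\dot a}{a}\right)^2 + \frac{2\kappa}{a^2 r_0^2} &= 4\pi G\,(\rho - p) . \end{aligned} \tag{7.67} $$

A particularly useful combination of these, in which ä is eliminated, is obtained by subtracting one third of the first equation from the second and dividing by two. It is called the Friedmann equation:

$$ \left(\frac{\dot a}{a}\right)^2 + \frac{\kappa}{a^2 r_0^2} = \frac{8\pi G}{3}\,\rho . \tag{7.68} $$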
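Rather than directly using an equation involving second derivatives as our second equation, it is more convenient to use the equation describing the conservation of stress-energy in curved space, eq. (4.6):

$$ \nabla_\mu T^{\mu\nu} = \partial_\mu T^{\mu\nu} + \Gamma^\mu_{\mu\alpha} T^{\alpha\nu} + \Gamma^\nu_{\mu\alpha} T^{\mu\alpha} = 0 . \tag{7.69} $$

Once specialized to the stress energy and connection given above, eqs. (7.63) and (7.66), the ν = i components of this equation vanish for any ρ or p (because of the assumed homogeneity and isotropy). But the ν = t component of this equation carries some content:

$$ 0 = \frac{\partial T^{tt}}{\partial t} + \Gamma^i_{it} T^{tt} + \Gamma^t_{ij} T^{ij} = \dot\rho + 3\,\frac{\dot a}{a}\,(\rho + p) , \tag{7.70} $$

where the last equality uses $\Gamma^i_{it} = 3\dot a/a$ and $\Gamma^t_{ij} T^{ij} = a\dot a\, \hat g_{ij}\, p\, a^{-2} \hat g^{ij} = 3p\,\dot a/a$.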

The physical meaning of this last equation as energy conservation is more easily seen if it is rewritten as

$$ \frac{d}{dt}\left( \rho\, a^3 \right) + p\, \frac{d}{dt}\left( a^3 \right) = 0 , \tag{7.71} $$
since in this form it relates the rate of change of the total energy, ρa3 , to the work
done by the pressure as the universe expands. For matter in thermal equilibrium, a
comparison of this last equation with the 1st Law of Thermodynamics shows that the
expansion of the universe is adiabatic, inasmuch as the total entropy of the matter
in the universe does not change in a homogeneous and isotropic expansion.
Cosmic Acceleration and Matter
In what follows we use the simpler first-order Friedmann and energy-conservation equations, eqs. (7.68) and (7.70), rather than the original second-order equations, eq. (7.67), that arise directly from the Einstein equations.
To see that these are equivalent it is instructive to rederive the second-order equations, eqs. (7.67), from eqs. (7.68) and (7.70). To this end differentiate eq. (7.68) and use eq. (7.70) to eliminate ρ̇. This gives (if ȧ ≠ 0) the first of eqs. (7.67):

$$ \frac{\ddot a}{a} = -\frac{4\pi G}{3}\,(\rho + 3p) . \tag{7.72} $$
Notice that this last equation implies that ä < 0 for most forms of matter, since for these ρ and p are typically positive. This corresponds physically to the statement that gravity is attractive for ordinary matter, and so the mutual attraction of the galaxies in the universe acts to slow down the universal expansion. As we shall see there can be exceptions to this general rule, for which ρ + 3p < 0, and whose presence could therefore cause the universal expansion to accelerate rather than decelerate.
Another application of eq. (7.72) is to see what may be learned about the present-day values of ρ and p from measurements of the present-day expansion rate, H0, and deceleration parameter, q0. To this end notice that the Friedmann equation evaluated at the present epoch implies

$$ H_0^2 + \frac{\kappa}{(a_0 r_0)^2} = \frac{8\pi G}{3}\,\rho_0 \qquad \text{or} \qquad 1 + \frac{\kappa}{(a_0 H_0 r_0)^2} = \frac{\rho_0}{\rho_c} \equiv \Omega_0 , \tag{7.73} $$

where the critical density is defined by $\rho_c \equiv 3H_0^2/(8\pi G)$ and the last equality defines Ω0 to be the energy density in units of this critical density, Ω0 = ρ0/ρc. Given the current measurement H0 = 70 ± 10 km/sec/Mpc, the critical density's numerical value becomes ρc = 5200 ± 1000 MeV m⁻³ = (9 ± 2) × 10⁻³⁰ g cm⁻³.
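As a quick arithmetic check of these numbers, the sketch below evaluates $\rho_c = 3H_0^2/(8\pi G)$ for a few values of H0 and converts it to MeV m⁻³ (via E = mc²) and to g cm⁻³. The rounded SI constants are standard values supplied here as assumptions rather than taken from the notes, and the function name `critical_density` is hypothetical.

```python
import math

# Rounded SI constants (standard values, not taken from these notes).
G = 6.674e-11        # Newton's constant, m^3 kg^-1 s^-2
C = 2.998e8          # speed of light, m/s
MEV = 1.602e-13      # 1 MeV in joules
MPC = 3.086e22       # 1 megaparsec in metres

def critical_density(h0_km_s_mpc):
    """Return rho_c = 3 H0^2 / (8 pi G) in MeV m^-3 and in g cm^-3."""
    h0 = h0_km_s_mpc * 1.0e3 / MPC                 # H0 in s^-1
    rho_kg_m3 = 3.0 * h0 ** 2 / (8.0 * math.pi * G)
    rho_mev_m3 = rho_kg_m3 * C ** 2 / MEV          # energy density, E = m c^2
    rho_g_cm3 = rho_kg_m3 * 1.0e3 / 1.0e6          # kg/m^3 -> g/cm^3
    return rho_mev_m3, rho_g_cm3

for h0 in (60.0, 70.0, 80.0):
    mev, grams = critical_density(h0)
    print(f"H0 = {h0:.0f} km/s/Mpc: rho_c = {mev:.0f} MeV m^-3 = {grams:.2e} g cm^-3")
```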
ρc is defined in the way it is because if ρ0 = ρc then κ = 0. Similarly if κ = +1 then we must have ρ0 > ρc and if κ = −1 then ρ0 < ρc. Evaluating the acceleration equation, eq. (7.72), at the present epoch similarly gives

$$ q_0 = -\frac{\ddot a_0}{a_0 H_0^2} = \frac{4\pi G}{3H_0^2}\,(\rho_0 + 3p_0) = \frac{\rho_0 + 3p_0}{2\rho_c} = \frac{\Omega_0}{2}\,(1 + 3w_0) , \tag{7.74} $$

where we define w0 = p0/ρ0. Clearly a measurement of H0 and q0 determines the combination ρ0 + 3p0, and knowledge of ρ0 (i.e. of Ω0) also allows the determination of κ, since κ = +1 if and only if Ω0 > 1, while κ = −1 requires Ω0 < 1. For pressureless matter (w0 = 0) these conditions are equivalent to q0 > 1/2 and q0 < 1/2, respectively. In particular, distance-redshift measurements that indicate q0 < 0 also imply w0 < −1/3 (given that ρ0 ≃ ρc > 0).
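Eqs. (7.73) and (7.74) can be inverted mechanically. The following hypothetical helpers, a minimal sketch rather than any standard routine, return w0 from a pair (Ω0, q0) and the sign of κ from Ω0. The sample inputs are purely illustrative and are not measured values.

```python
def w0_from_q0(omega0, q0):
    """Invert eq. (7.74), q0 = (Omega0 / 2)(1 + 3 w0), for w0 = p0 / rho0."""
    if omega0 <= 0:
        raise ValueError("Omega0 must be positive")
    return (2.0 * q0 / omega0 - 1.0) / 3.0

def curvature_sign(omega0, tol=1e-12):
    """Sign of kappa from eq. (7.73): kappa / (a0 H0 r0)^2 = Omega0 - 1."""
    if omega0 > 1.0 + tol:
        return +1
    if omega0 < 1.0 - tol:
        return -1
    return 0

# Illustrative (hypothetical) inputs, not measured values from these notes.
for omega0, q0 in [(1.0, 0.5), (1.0, -0.5), (2.0, 1.0), (0.3, 0.15)]:
    w0 = w0_from_q0(omega0, q0)
    print(omega0, q0, "w0 =", round(w0, 3), "kappa =", curvature_sign(omega0),
          "accelerating" if q0 < 0 else "decelerating")
```

Because Ω0 > 0, any q0 < 0 returns w0 < −1/3, which is the inference drawn in the text.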

Equations of State
Mathematically speaking, finding the evolution of the universe as a function of time
requires the integration of eqs. (7.68) and (7.70), but in themselves these two equa-
tions are inadequate to determine the evolution of the three unknown functions, a(t),
ρ(t) and p(t). Another condition is required in order to make the problem well-posed.
The missing condition is furnished by the equation of state for the matter in
question, which for the present purposes may be regarded as being an expression for
the pressure as a function of energy density, p = p(ρ). As we shall see this expression
is typically characteristic of the microscopic constituents of the matter whose stress
energy is of interest. Such an equation of state naturally arises for matter which
is in local thermodynamic equilibrium, since this often allows both p and ρ to be
expressed in terms of a single quantity like the local temperature, T . But it may also
arise for matter which was only in equilibrium in the past, even if it is no longer in
equilibrium at present.
Most of the equations of state of interest in cosmology have the general form

$$ p = w\,\rho , \tag{7.75} $$

where w is a t-independent constant. Given an equation of state of this form it is
possible to integrate eqs. (7.68) and (7.70) to determine how a, ρ and p vary with
time, as we now see.
The first step is to determine how p and ρ depend on a, since this is dictated by energy conservation. Using eq. (7.75) to eliminate p allows eq. (7.70) to be written

$$ \frac{\dot\rho}{\rho} + 3(1+w)\,\frac{\dot a}{a} = 0 , \tag{7.76} $$

which may be integrated to obtain

$$ \rho = \rho_0 \left( \frac{a_0}{a} \right)^{\sigma} \qquad \text{with} \qquad \sigma = 3(1+w) . \tag{7.77} $$

The pressure satisfies an identical dependence on a by virtue of the equation of state: p = wρ.
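If eq. (7.77) is now used to eliminate ρ from eq. (7.68), the following differential equation for a(t) is obtained:

$$ \dot a^2 = \frac{8\pi G \rho_0 a_0^2}{3} \left( \frac{a_0}{a} \right)^{\sigma - 2} - \frac{\kappa}{r_0^2} . \tag{7.78} $$

In the special case that κ = 0 this equation is easily integrated to give

$$ a(t) = a_0 \left( \frac{t}{t_0} \right)^{\alpha} \qquad \text{with} \qquad \alpha = \frac{2}{\sigma} = \frac{2}{3(1+w)} . \tag{7.79} $$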
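The power law (7.79) can also be recovered without solving the differential equation analytically. The sketch below, which assumes κ = 0 and units with H0 = a0 = 1, integrates $dt/d\ln a = 1/H$ with $H = a^{-\sigma/2}$ (from eqs. (7.68) and (7.77)) using Simpson's rule, starting from a tiny but nonzero scale factor as a stand-in for a = 0. It then estimates α = d ln a/d ln t from the times at a = 1/2 and a = 1. Because (7.79) implies $H_0 = \alpha/t_0$, the same integral also gives the present age, $t_0 = \alpha H_0^{-1}$ (e.g. $t_0 = \frac{2}{3} H_0^{-1}$ for w = 0). The function names and numerical parameters are illustrative choices.

```python
import math

def cosmic_time(a, w, ln_a_start=-40.0, n=4000):
    """Time since (approximately) a = 0 for a flat universe with p = w rho,
    in units H0 = a0 = 1.  Uses dt/d(ln a) = 1/H = a^(sigma/2), with
    sigma = 3(1 + w), integrated by Simpson's rule in x = ln a."""
    sigma = 3.0 * (1.0 + w)
    if sigma <= 0:
        raise ValueError("needs w > -1 (otherwise there is no Big Bang)")

    def inverse_hubble(x):
        return math.exp(0.5 * sigma * x)

    x_end = math.log(a)
    h = (x_end - ln_a_start) / n
    total = inverse_hubble(ln_a_start) + inverse_hubble(x_end)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * inverse_hubble(ln_a_start + k * h)
    return total * h / 3.0

def fitted_exponent(w):
    """alpha = d ln a / d ln t, estimated between a = 1/2 and a = 1."""
    return math.log(2.0) / math.log(cosmic_time(1.0, w) / cosmic_time(0.5, w))

for w in (0.0, 1.0 / 3.0, 1.0):
    exact = 2.0 / (3.0 * (1.0 + w))
    print(f"w = {w:.3f}: alpha ~ {fitted_exponent(w):.6f} (exact {exact:.6f}), "
          f"H0 t0 ~ {cosmic_time(1.0, w):.6f}")
```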

We now apply the above expressions to a few examples of the equations of state
which are known to be relevant to cosmology.

Empty Space

The simplest cosmology possible is obtained in the absence of matter, in which case ρ = p = 0. In this case eq. (7.78) gives $\dot a^2 = -\kappa/r_0^2$, from which we see that κ ≠ +1. Two distinct solutions are possible, depending on whether κ = 0 or κ = −1.
If κ = 0 we have ȧ = 0 and so we may choose a = 1 for all t. In this case the LFRW metric simply reduces to the flat metric of Minkowski space, written in polar coordinates.
If κ = −1 then we have $\dot a = \pm 1/r_0$ and so $a = a_0 \pm (t - t_0)/r_0$. This negatively-curved geometry is known as the Milne Universe, but so far as we know it does not play any role in Big Bang cosmology.

Radiation

A gas of relativistic particles, like photons or neutrinos (or other particles for sufficiently high temperatures), when in thermal equilibrium has an energy density and pressure given by

$$ \rho = a_B T^4 \qquad \text{and} \qquad p = \frac{1}{3}\, a_B T^4 , \tag{7.80} $$

where $a_B = \pi^2/15 = 0.6580$ is the Stefan-Boltzmann constant (in units where $k_B = c = \hbar = 1$) and T is the temperature. These two expressions ensure that ρ and p satisfy the relation

$$ p = \frac{1}{3}\,\rho \qquad \text{and so} \qquad w = \frac{1}{3} . \tag{7.81} $$

Since w = 1/3 we see that σ = 3(1 + w) = 4 and so $\rho \propto a^{-4}$. This has a simple physical interpretation for a gas of noninteracting photons, since for these the total number of photons is fixed (and so $n_\gamma \propto a^{-3}$), but each photon energy also redshifts like 1/a as the universe expands, leading to $\rho_\gamma \propto a^{-4}$.
Since σ = 4 we have α = 2/σ = 1/2, and so if κ = 0 then $a(t) \propto t^{1/2}$. Explicit expressions for the proper, luminosity and angular-diameter distances as functions of redshift for this type of expansion are given in previous sections.

Non-relativistic Matter
An ideal gas of non-relativistic particles in thermal equilibrium has a pressure and energy density given by¹⁴

$$ p = nT \qquad \text{and} \qquad \rho = nm + \frac{nT}{\gamma - 1} , \tag{7.82} $$

where n is the number of particles per unit volume, m is the particle's rest mass and $\gamma = c_p/c_v$ is its ratio of specific heats, with γ = 5/3 for a gas of monatomic atoms.
For non-relativistic particles the total number of particles is usually also conserved, which implies that

$$ \frac{d}{dt}\left( n a^3 \right) = 0 . \tag{7.83} $$

Since m ≫ T (or else the atoms would be relativistic) the equation of state for this gas may be taken to be

$$ p/\rho \approx 0 \qquad \text{and so} \qquad w \approx 0 . \tag{7.84} $$

Notice that although this equation of state is derived for a thermal gas, it applies much more generally, such as for the cosmic fluid of galaxies, or for other forms of non-relativistic matter that are not in thermal equilibrium. This is because for all such systems the pressure is suppressed relative to the energy density by factors of order $(v/c)^2$.
If w = 0 then energy conservation implies σ = 3(1 + w) = 3 and so $\rho a^3$ is a constant. This is appropriate for non-relativistic matter for which the energy density is dominated by the particle rest-masses, ρ ≈ nm, because in this case energy conservation is equivalent to conservation of particle number, which we've seen is equivalent to $n \propto a^{-3}$ (since this leaves the total number of particles, $N \sim n a^3$, fixed).
Given that σ = 3 we have α = 2/σ = 2/3 and so if κ = 0 then the universal scale factor expands like $a \propto t^{2/3}$. Explicit expressions for the proper, luminosity and angular-diameter distances for this type of expansion are all given in earlier sections.
Nonrelativistic Solutions for General κ:
When σ = 3 it is also possible to solve eq. (7.68) analytically even when κ ≠ 0. We pause here to display these solutions in some detail because most of the history of the universe from z ∼ 10⁴ down to z ∼ 1 appears to have been governed by a universe whose energy density was dominated by non-relativistic matter.
As was described in earlier sections, we may expect the solutions for general κ to be described by two integration constants, which we may take to be Ω0 and H0, or equivalently to be q0 = Ω0/2 and H0. The value of κ is related to these parameters because Ω0 = 2q0 = 1 if and only if κ = 0, and κ = +1 if Ω0 > 1 and κ = −1 if Ω0 < 1.

14 Units are again used for which Boltzmann's constant is unity: $k_B = 1$.
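For κ = +1 (and so ρ0 > ρc) the solution for a(t) is most compactly given in parametric form, as the formula for a cycloid:

$$ \begin{aligned} \frac{a(\zeta)}{a_0} &= \frac{q_0}{2q_0 - 1}\,\left( 1 - \cos\zeta \right) = \frac{1}{2}\left( \frac{\Omega_0}{\Omega_0 - 1} \right)\left( 1 - \cos\zeta \right) \\ H_0\, t(\zeta) &= \frac{q_0}{(2q_0 - 1)^{3/2}}\,\left( \zeta - \sin\zeta \right) = \frac{\Omega_0}{2(\Omega_0 - 1)^{3/2}}\,\left( \zeta - \sin\zeta \right) . \end{aligned} \tag{7.85} $$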

Here the initial conditions which parameterize this solution are given in terms of the
physically measurable parameters, q0 = Ω0 /2 and H0 .
As ζ increases from 0 to 2π, t increases monotonically from an initial value of 0 to $t_{\rm end} = \pi\Omega_0 H_0^{-1}/(\Omega_0 - 1)^{3/2}$, but a/a0 rises from 0 at t = 0 to a maximum value, Ω0/(Ω0 − 1), when $t = t_{\rm max} = t_{\rm end}/2$. After this point a/a0 decreases monotonically until it again vanishes at $t = t_{\rm end}$. This describes a universe which begins in a Big Bang at t = 0, stops expanding at $t = t_{\rm max}$ and then finally recollapses and ends in a Big Crunch at $t = t_{\rm end}$.
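For κ = −1 (and so Ω0 < 1 and q0 < 1/2) the solution for a(t) is given by a very similar expression

$$ \begin{aligned} \frac{a(\zeta)}{a_0} &= \frac{q_0}{1 - 2q_0}\,\left( \cosh\zeta - 1 \right) = \frac{1}{2}\left( \frac{\Omega_0}{1 - \Omega_0} \right)\left( \cosh\zeta - 1 \right) \\ H_0\, t(\zeta) &= \frac{q_0}{(1 - 2q_0)^{3/2}}\,\left( \sinh\zeta - \zeta \right) = \frac{\Omega_0}{2(1 - \Omega_0)^{3/2}}\,\left( \sinh\zeta - \zeta \right) . \end{aligned} \tag{7.86} $$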

This time both t and a increase monotonically with ζ, whose range runs from 0 to
infinity. In this case the universe begins in a Big Bang at t = 0 and then continues
expanding (and cooling) forever, leading to a Big Chill in the remote future.
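A direct way to confirm the parametric solutions (7.85) and (7.86) is to substitute them into the Friedmann equation. Using eq. (7.73), for pressureless matter eq. (7.68) can be written $\dot x^2 = \Omega_0/x + (1 - \Omega_0)$, with x = a/a0 in units H0 = 1. The Python sketch below evaluates dx/dt by central differences in ζ and prints the residual, along with the maximum expansion and the Big Crunch time of the closed case. The sample values Ω0 = 2 and Ω0 = 0.3 and all function names are illustrative assumptions.

```python
import math

# Units H0 = a0 = 1, so x = a/a0 and t is measured in units of 1/H0.
def closed_solution(zeta, omega0):
    """kappa = +1 cycloid, eq. (7.85); requires Omega0 > 1."""
    x = 0.5 * omega0 / (omega0 - 1.0) * (1.0 - math.cos(zeta))
    t = 0.5 * omega0 / (omega0 - 1.0) ** 1.5 * (zeta - math.sin(zeta))
    return x, t

def open_solution(zeta, omega0):
    """kappa = -1 solution, eq. (7.86); requires Omega0 < 1."""
    x = 0.5 * omega0 / (1.0 - omega0) * (math.cosh(zeta) - 1.0)
    t = 0.5 * omega0 / (1.0 - omega0) ** 1.5 * (math.sinh(zeta) - zeta)
    return x, t

def friedmann_residual(solution, zeta, omega0, dz=1e-5):
    """(dx/dt)^2 - [Omega0/x + (1 - Omega0)], with dx/dt obtained from
    central differences in the parameter zeta.  Should be ~0."""
    x_plus, t_plus = solution(zeta + dz, omega0)
    x_minus, t_minus = solution(zeta - dz, omega0)
    x, _ = solution(zeta, omega0)
    xdot = (x_plus - x_minus) / (t_plus - t_minus)
    return xdot ** 2 - (omega0 / x + 1.0 - omega0)

print("closed, Omega0 = 2: max a/a0 =", closed_solution(math.pi, 2.0)[0],
      " H0 t_end =", closed_solution(2.0 * math.pi, 2.0)[1])
for zeta in (0.5, 1.0, 2.0, 3.0):
    print(zeta, friedmann_residual(closed_solution, zeta, 2.0),
          friedmann_residual(open_solution, zeta, 0.3))
```

For Ω0 = 2 the printed maximum is a/a0 = Ω0/(Ω0 − 1) = 2 and the crunch occurs at $H_0 t_{\rm end} = \pi\Omega_0/(\Omega_0 - 1)^{3/2} = 2\pi$, as stated above.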
The Vacuum
If the vacuum is Lorentz invariant, as the success of special relativity seems to indicate, then its stress energy must be proportional to the metric, $T_{\mu\nu} = -\rho\, g_{\mu\nu}$ (the sign ensures $T_{tt} = \rho$, given $g_{tt} = -1$). This implies the vacuum pressure must satisfy the only possible Lorentz-invariant equation of state:

$$ p = -\rho \qquad \text{and so} \qquad w = -1 . \tag{7.87} $$

Clearly either p or ρ must be negative with this equation of state, and unlike for other equations of state there is no reason of principle for choosing either sign for ρ a priori.
Because w = −1 when the vacuum energy is dominant, we see that σ = 3(1 + w) = 0 and so energy conservation implies that ρ is a constant, independent of a or t. This kind of constant energy density is often called, for historical reasons, the cosmological constant.
In this situation α = 2/σ → ∞, which shows that the power-law solutions, $a \propto t^\alpha$, are not appropriate. Returning directly to the Friedmann equation, eq. (7.68), shows that if κ = 0 then ȧ ∝ ±a and so the solutions are given by exponentials: $a \propto \exp[\pm H_0 (t - t_0)]$. Explicit expressions for the proper, luminosity and angular-diameter distances as functions of z are given for this expansion in earlier sections.
Notice also that in this case ρ + 3p = −2ρ, which is negative if ρ is positive. As such this furnishes an explicit example of an equation of state for which the universal acceleration, $\ddot a/a = -\frac{4}{3}\pi G(\rho + 3p) = +\frac{8}{3}\pi G\rho$, can be positive if ρ > 0.

If all lengths are expanding, how can one tell?


We round out this section by taking a breather to address a basic conceptual question concerning the expanding universe. Since it is spacetime itself that is expanding, how is it possible to measure the expansion of the universe if all of one's rulers are also expanding?
In a nutshell, the key to this puzzle is that time, t, is not expanding, and so
energies that are defined relative to this time do not change as the universal length
scale expands. For example, we have seen above that the rest masses of nonrelativistic
particles do not change as the universe expands, and this is related to why the
energy density of such particles falls with the universal expansion proportional to
1/a³. Because energies and masses do not change, neither do the sizes of bound
states like atoms, and so small everyday objects do not grow along with the universal
expansion.
To see this in more detail, imagine solving the Schrödinger equation for the ground state of the Hydrogen atom in a universe that is expanding, but doing so at a rate that is much smaller than any atomic frequencies.¹⁵ In terms of rectangular co-moving coordinates, x, physical (proper) distances, y, are measured by including the (slowly varying) time-dependent scale factor, y = a x, where a = a(t). In terms of these the Schrödinger equation is

$$ -\frac{\hbar^2}{2m_e}\,\nabla_y^2 \psi - \frac{\alpha}{|\mathbf{y}|}\,\psi = E\,\psi , \tag{7.88} $$

where α = e²/4π is the electromagnetic fine-structure constant, $m_e$ is the electron mass, $r^2 = \mathbf{y}^2 = a^2\mathbf{x}^2$ and so $\nabla_y^2 = a^{-2}\nabla_x^2$ relates the Laplace operators for the coordinates y and x respectively.
15 This is an extremely good approximation, since the present-day Hubble scale, H0, is roughly 10³⁴ times smaller than the frequency associated with the 13.6 eV binding energy of the Hydrogen atom.

Following the usual steps leads to ground-state wave functions of the form ψ ∝ exp(−r/a0), with the Bohr radius given by $a_0 = 1/(\alpha m_e)$ (not to be confused with the present-day scale factor) and the energy $E_0 = -\frac{1}{2}\alpha^2 m_e$. This shows that the atom's physical size, a0, measured using the nominally expanding physical coordinates, y, is fixed by the time-independent constants $m_e$ and α. This is in contrast with the time-dependent separation between galaxies in the LFRW metric, which are situated at fixed values of x (because these are geodesics), and so separate as a gets larger.
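To put numbers to this argument (and to footnote 15), the sketch below restores ħ and c, computes the Bohr radius $\hbar/(\alpha m_e c)$ and the binding energy $\frac{1}{2}\alpha^2 m_e c^2$ from standard constants, and compares the atomic frequency $|E_0|/\hbar$ with H0 = 70 km/sec/Mpc. The constant values are standard (CODATA-style) inputs assumed here, not quoted from the notes.

```python
# Standard SI values (assumed inputs, not taken from these notes).
HBAR = 1.054571817e-34      # J s
C = 2.99792458e8            # m/s
M_E = 9.1093837015e-31      # electron mass, kg
ALPHA = 1.0 / 137.035999    # fine-structure constant
EV = 1.602176634e-19        # J
MPC = 3.0857e22             # m

# Restoring hbar and c: a = hbar / (alpha m_e c),  |E0| = alpha^2 m_e c^2 / 2
bohr_radius = HBAR / (ALPHA * M_E * C)
binding_ev = 0.5 * ALPHA ** 2 * M_E * C ** 2 / EV
omega_atom = binding_ev * EV / HBAR          # atomic frequency, s^-1

h0 = 70.0e3 / MPC                            # H0 = 70 km/s/Mpc in s^-1
print(f"Bohr radius = {bohr_radius:.3e} m")
print(f"|E0|        = {binding_ev:.2f} eV")
print(f"H0 / omega  = {h0 / omega_atom:.1e}")
```

The ratio comes out near 10⁻³⁴, which is why the atom can be treated as if the expansion were absent.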
But how do we see that it is the scale H that is the relevant comparison when
deciding which bound systems do not expand with the universal expansion? And
what about bound states where it is gravity itself that is doing the binding? Do the
Schwarzschild radii of stars increase as the universe expands? These questions can be
explicitly answered using an exact solution to Einstein’s equations that describes a
gravitating object (like a black hole) sitting within an expanding LFRW cosmology.
The solution in question is called the McVittie solution [8], and for spatially flat cosmologies (κ = 0) has the form

$$ ds^2 = -\left( \frac{1 - \mu}{1 + \mu} \right)^2 dt^2 + (1 + \mu)^4 a^2 \left[ d\varrho^2 + \varrho^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) \right] , \tag{7.89} $$

where ϱ is the radial coordinate, the dimensionless quantity μ is defined by

$$ \mu(\varrho, t) = \frac{GM}{2a(t)\,\varrho} , \tag{7.90} $$

and the scale factor a(t) is obtained by solving the Friedmann equation, as for the LFRW metric (with κ = 0):

$$ H^2 = \left( \frac{\dot a}{a} \right)^2 = \frac{8\pi G\rho}{3} . \tag{7.91} $$

Here ρ(t) is the homogeneous isotropic energy density that governs the time-dependence of the cosmological environment.
The limiting LFRW and Schwarzschild behaviours are easier to see if we change coordinates, ϱ → r, where r is defined so that the area of the spheres at fixed r and t is A = 4πr². The desired coordinate change therefore is

$$ r = (1 + \mu)^2\, a\, \varrho . \tag{7.92} $$

The metric in these new coordinates then becomes

$$ ds^2 = -\left( 1 - \frac{r_s}{r} - H^2 r^2 \right) dt^2 + \frac{dr^2}{1 - r_s/r} - \frac{2Hr}{\sqrt{1 - r_s/r}}\, dr\, dt + r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) , \tag{7.93} $$

where, as usual, $r_s = 2GM$ (which is independent of time). To see that this geometry approaches an LFRW metric at large distances, use $r_s/r \ll (Hr)^2$ to neglect the $r_s/r$ terms in eq. (7.93). Then adopt the co-moving radius, ℓ, that is used in the standard form of the LFRW metric, defined (for κ = 0) by r = a(t) ℓ. Using dr = a dℓ + rH dt, we see that $-(1 - H^2 r^2)\,dt^2 + dr^2 - 2Hr\, dr\, dt = -dt^2 + a^2 d\ell^2$, and so

$$ ds^2 \simeq -dt^2 + a^2 \left( d\ell^2 + \ell^2 d\theta^2 + \ell^2 \sin^2\theta\, d\phi^2 \right) \qquad \text{if} \qquad r^3 \gg r_s/H^2 . \tag{7.94} $$

This is clearly the LFRW metric (with κ = 0), to which the McVittie solution asymptotes in the limit $(Hr)^2 \gg r_s/r$.
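The algebraic identity quoted above is easy to spot-check numerically. In this minimal sketch (function names hypothetical) the (t, r) part of eq. (7.93) is evaluated with rs = 0 after substituting r = aℓ and dr = a dℓ + rH dt, and compared against $-dt^2 + a^2 d\ell^2$ for many random values of the differentials and parameters; the mismatch should be at the level of floating-point rounding.

```python
import math
import random

def mcvittie_tr(dt, dr, r, H, rs):
    """(t, r) part of the line element, eq. (7.93)."""
    f = 1.0 - rs / r
    return (-(f - H ** 2 * r ** 2) * dt ** 2 + dr ** 2 / f
            - 2.0 * H * r / math.sqrt(f) * dr * dt)

def lfrw_check(dt, dl, a, H, ell):
    """Set rs = 0, r = a*ell, dr = a dl + r H dt and return the difference
    between eq. (7.93) and -dt^2 + a^2 dl^2 (should vanish)."""
    r = a * ell
    dr = a * dl + r * H * dt
    return mcvittie_tr(dt, dr, r, H, 0.0) - (-dt ** 2 + a ** 2 * dl ** 2)

random.seed(1)
worst = 0.0
for _ in range(1000):
    dt, dl, a, H, ell = (random.uniform(0.1, 2.0) for _ in range(5))
    worst = max(worst, abs(lfrw_check(dt, dl, a, H, ell)))
print("largest mismatch:", worst)
```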
To identify that the metric, eq. (7.93), approaches the Schwarzschild metric in the opposite limit, $(Hr)^2 \ll r_s/r$, it is worth defining the new time coordinate τ by

$$ d\tau = dt + \frac{Hr\, dr}{\sqrt{1 - r_s/r}\,\left( 1 - r_s/r - H^2 r^2 \right)} , \tag{7.95} $$

since this allows the metric to be written in the diagonal form

$$ ds^2 = -\left( 1 - \frac{r_s}{r} - H^2 r^2 \right) d\tau^2 + \frac{dr^2}{1 - r_s/r - H^2 r^2} + r^2 \left( d\theta^2 + \sin^2\theta\, d\phi^2 \right) . \tag{7.96} $$

Clearly this reduces to the Schwarzschild solution when $(Hr)^2 \ll r_s/r$, which is true for any distances that are small compared with the megaparsec scales of relevance to cosmology.
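Two points in this paragraph can be checked with a short Python sketch. First, substituting dt = dτ − X dr, with X read off from eq. (7.95), into the (t, r) part of eq. (7.93) should reproduce the diagonal form (7.96) exactly, for any H and rs. Second, the crossover between the Schwarzschild and LFRW regimes occurs near $(Hr)^2 = r_s/r$, i.e. at $r_* = (r_s/H^2)^{1/3}$ once c is restored, which depends on the mass M. The sample point, the rounded constants and the 10¹² solar-mass galaxy are illustrative assumptions.

```python
import math

def mcvittie_tr(dt, dr, r, H, rs):
    """(t, r) part of the line element, eq. (7.93)."""
    f = 1.0 - rs / r
    return (-(f - H ** 2 * r ** 2) * dt ** 2 + dr ** 2 / f
            - 2.0 * H * r / math.sqrt(f) * dr * dt)

def diagonal_check(dtau, dr, r, H, rs):
    """Substitute dt = dtau - X dr, with X from eq. (7.95), and return the
    difference from the diagonal form (7.96); should vanish."""
    f = 1.0 - rs / r
    F = f - H ** 2 * r ** 2
    X = H * r / (math.sqrt(f) * F)
    transformed = mcvittie_tr(dtau - X * dr, dr, r, H, rs)
    return transformed - (-F * dtau ** 2 + dr ** 2 / F)

def crossover_radius(mass_kg, h0_km_s_mpc=70.0):
    """Radius where (H r)^2 = r_s / r, i.e. r^3 = r_s / (H/c)^2, in metres."""
    G, c, mpc = 6.674e-11, 2.998e8, 3.086e22
    rs = 2.0 * G * mass_kg / c ** 2
    h_over_c = h0_km_s_mpc * 1.0e3 / mpc / c      # H/c in m^-1
    return (rs / h_over_c ** 2) ** (1.0 / 3.0)

M_SUN, PC = 1.989e30, 3.086e16
print("diagonal-form residual:", diagonal_check(0.7, 0.3, 5.0, 0.05, 1.0))
print("Sun:          r* ~ %.0f pc" % (crossover_radius(M_SUN) / PC))
print("1e12 M_sun:   r* ~ %.2f Mpc" % (crossover_radius(1e12 * M_SUN) / (1e6 * PC)))
```

With H0 = 70 km/sec/Mpc this gives $r_*$ of roughly 120 pc for one solar mass and roughly 1.2 Mpc for $10^{12}$ solar masses, so how far the Schwarzschild regime extends depends on the mass of the bound object.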
For the present purposes, the important thing is that the physical constants characterizing the size of the bound object ($a_0^{-1} = \alpha m_e$ for the atomic case, or $r_s = 2GM$ for gravitationally bound systems) are time-independent when expressed using the distance measure, r. But the distance between galaxies that move along the geodesics corresponding to fixed values of ℓ grows with time proportional to a(t) in these same coordinates. The overall expansion of the universe can therefore be measured by using the sizes of the bound states as the rulers.

7.4 The Present-Day Energy Content


In general the universe contains more than one kind of matter, with some relativistic particles (like photons) mixed with non-relativistic particles (like atoms) plus possibly other more exotic forms, each of which satisfies its own equation of state and interacts fairly weakly with the others. Indeed, there is evidence that the universe now contains at least 4 independent types of matter. This section summarizes what is known about the abundance of these various types of matter in our present best understanding of the universe, and what may be said about the expansion of the universe in the presence of a mixture of matter of this sort.

Radiation
The universe is awash with radiation, with the following components.
The Cosmic Microwave Background Radiation:
The sky is full of photons, called the Cosmic Microwave Background (CMB), whose
measured spectrum (see fig. 27) indicates that they are distributed in a thermal
distribution whose temperature is Tγ = 2.725 K. These photons were first directly
detected using a microwave horn on the Earth’s surface, and their thermal prop-
erties have subsequently been precisely measured using balloon- and satellite-borne
instruments.
As we saw earlier, the number density and energy density of thermal photons are determined by the temperature, with $n_\gamma \propto T^3$ while $\rho_\gamma \propto T^4$. The number density corresponding to T0 = 2.725 K turns out to be

$$ n_{\gamma 0} = 4.11 \times 10^8\ {\rm m}^{-3} , \tag{7.97} $$

which is very high, much higher than the number density of ordinary atoms. The energy density carried by these photons similarly turns out to be

$$ \rho_{\gamma 0} = 0.261\ {\rm MeV\ m}^{-3} \qquad \text{or} \qquad \Omega_{\gamma 0} = 5.0 \times 10^{-5} , \tag{7.98} $$

where as before Ω = ρ/ρc measures the density relative to the critical density, ρc = 5200 ± 1000 MeV m⁻³ ≃ 9 × 10⁻³⁰ g cm⁻³.

Figure 27: A plot of the measured spectrum of the cosmic microwave background radiation, as measured by the FIRAS instrument aboard the COBE satellite.
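The numbers in eqs. (7.97) and (7.98) follow from the blackbody formulas. The sketch below uses $\rho = a_B T^4$ from eq. (7.80) together with the standard blackbody photon number density $n_\gamma = 2\zeta(3)T^3/\pi^2$ (a textbook result assumed here rather than derived in these notes), converts T = 2.725 K to an inverse length via ħc, and divides by ρc = 5200 MeV m⁻³ to get Ωγ0. The constant values are rounded standard values.

```python
import math

ZETA3 = 1.2020569            # Riemann zeta(3)
K_B = 8.617333e-5            # Boltzmann constant, eV/K
HBAR_C = 1.97327e-7          # hbar * c in eV m

def photon_gas(t_kelvin):
    """Blackbody photon number density (m^-3) and energy density (MeV m^-3):
    n = 2 zeta(3) T^3 / pi^2 and rho = (pi^2/15) T^4 with k_B = hbar = c = 1."""
    kT = K_B * t_kelvin                       # temperature in eV
    inv_len = kT / HBAR_C                     # temperature as an inverse length, m^-1
    n = 2.0 * ZETA3 / math.pi ** 2 * inv_len ** 3
    rho = math.pi ** 2 / 15.0 * kT * inv_len ** 3 / 1.0e6   # eV m^-3 -> MeV m^-3
    return n, rho

n0, rho0 = photon_gas(2.725)
print(f"n_gamma0 = {n0:.3e} m^-3, rho_gamma0 = {rho0:.3f} MeV m^-3, "
      f"Omega_gamma0 = {rho0 / 5200.0:.1e}")
```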
Starlight:
The CMB photons turn out to be more abundant and to carry more energy than all of the photons emitted by stars since stars first formed, and so represent the dominant contribution of photons to the universal energy density. For instance, a very rough estimate of the density in starlight is obtained by multiplying the present-day luminosity density of galaxies,¹⁶ $n_L \simeq 2 \times 10^8\, L_\odot\ {\rm Mpc}^{-3}$, by the approximate age of the universe, $H_0^{-1} \simeq 14$ Gyr, which gives $\rho_\star \simeq 7 \times 10^{-3}$ MeV m⁻³, or $\Omega_\star \simeq 1 \times 10^{-6}$.

16 $L_\odot$ here denotes the luminosity of the Sun.
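The starlight estimate is a one-line product; here it is spelled out with units, as a sketch. The solar luminosity, megaparsec and gigayear conversions are rounded standard values assumed here, while $n_L$, the 14 Gyr age and ρc are the figures quoted in these notes.

```python
# Rounded standard conversions (assumptions, not from these notes).
L_SUN = 3.828e26          # solar luminosity, W
MPC = 3.086e22            # m
GYR = 3.156e16            # s
MEV = 1.602e-13           # J

n_L = 2.0e8 * L_SUN / MPC ** 3          # luminosity density of galaxies, W m^-3
rho_star = n_L * 14.0 * GYR / MEV       # energy emitted over ~14 Gyr, MeV m^-3
print(f"rho_star ~ {rho_star:.1e} MeV m^-3, Omega_star ~ {rho_star / 5200.0:.1e}, "
      f"rho_gamma0 / rho_star ~ {0.261 / rho_star:.0f}")
```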

Relic Neutrinos:
Neutrinos are elementary particles whose mass is small enough to also make them relativistic during most of the universe's history, meaning they also count as radiation when tallying the universe's total energy density. There are three species of neutrino, and because they are electrically neutral they interact very weakly with matter: they can typically pass through the entire Earth without interacting even once. Their existence is known because they take part in radioactive decays, such as the conversion of a neutron into a proton in beta decay,

$$ n \leftrightarrow p + e^- + \bar\nu_e . \tag{7.99} $$
It is believed on theoretical grounds (more about these grounds in subsequent sections) that there is also a population of cosmic relic neutrinos filling the universe, comparable in number to the CMB photons, although these neutrinos have never been detected. They are
expected to have been relativistic throughout most of the universe’s history, although
they may have perhaps become non-relativistic very recently. They are also expected
to be thermally distributed, as are the photons. The neutrinos are expected to have
a slightly lower temperature, Tν0 = 1.9 K, than the photons, and because neutrinos
are fermions they have a slightly different energy-density/temperature relation than
do photons (which are bosons).
These properties make their contribution to the present-day cosmological energy budget not negligible, being predicted to be

$$ \rho_{\nu 0} = 0.18\ {\rm MeV\ m}^{-3} \qquad \text{or} \qquad \Omega_{\nu 0} = 3.4 \times 10^{-5} . \tag{7.100} $$

If the neutrinos are relativistic, the total radiation density becomes $\rho_{R0} = \rho_{\gamma 0} + \rho_{\nu 0}$, which is of order

$$ \rho_{R0} = 0.44\ {\rm MeV\ m}^{-3} \qquad \text{or} \qquad \Omega_{R0} = 8.4 \times 10^{-5} . \tag{7.101} $$
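The neutrino figures can be reproduced from the photon ones using two standard results that these notes only allude to at this point, and which are assumed here: relic neutrinos have $T_\nu = (4/11)^{1/3} T_\gamma$, and each of the three species, being fermions, contributes $\frac{7}{8}(4/11)^{4/3}$ of the photon energy density while relativistic. A minimal sketch:

```python
# Standard relic-neutrino relations (assumed here; not derived in this section):
# T_nu = (4/11)^(1/3) T_gamma, and each of N species of massless neutrino
# (plus antineutrino) carries 7/8 * (4/11)^(4/3) of the photon energy density.
T_GAMMA0 = 2.725          # K
RHO_GAMMA0 = 0.261        # MeV m^-3, eq. (7.98)
RHO_C = 5200.0            # MeV m^-3

def relic_neutrinos(n_species=3):
    t_nu = (4.0 / 11.0) ** (1.0 / 3.0) * T_GAMMA0
    rho_nu = n_species * (7.0 / 8.0) * (4.0 / 11.0) ** (4.0 / 3.0) * RHO_GAMMA0
    return t_nu, rho_nu

t_nu, rho_nu = relic_neutrinos()
rho_r = RHO_GAMMA0 + rho_nu
print(f"T_nu0 = {t_nu:.2f} K, rho_nu0 = {rho_nu:.3f} MeV m^-3, "
      f"rho_R0 = {rho_r:.3f} MeV m^-3, Omega_R0 = {rho_r / RHO_C:.1e}")
```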

Nonrelativistic Matter
There are two qualitatively different kinds of matter present in the universe that we
know are not moving at relativistic speeds.
Baryons
The main constituents of the matter we see around us on Earth are atoms, which are themselves made up of protons, neutrons and electrons, and these are predominantly non-relativistic at the present epoch. Furthermore the abundance of electrons is very likely to precisely equal that of protons, since these carry opposite electrical charge, and a precise equality of abundance is required to ensure that the universe carries no net charge. (The penalty for not having charges locally balance is huge electric forces that ensure that charges move until the local charge density vanishes.)
The proton and neutron masses are both about 940 MeV, roughly 1840 times the electron mass, and so the energy density in ordinary non-relativistic particles is likely to be well approximated by the total energy in protons and neutrons. This is also called the total energy in baryons, since protons and neutrons carry an approximately conserved charge called baryon number.
For reasons to become clear in later
sections, it is possible to determine the
total number of baryons in the universe
(regardless of whether or not they are
presently visible) from the success of the
predictions of the abundances of light el-
ements due to primordial nucleosynthe-
sis during the very early universe (see
fig. 28). This indicates that there is about
one baryon for every 1010 photons, lead-
ing to the following contribution to the
total energy density in baryons (i.e. or-
dinary protons, neutrons and electrons)

ρB0 = 210 MeV m−3 or ΩB0 = 0.04 .


(7.102)
Figure 28: The predictions for light-nuclei
abundance as a function of baryon density,
For comparison, the amount of lumi-
with the vertical strip indicating the baryon nous matter is considerably smaller than
abundance that gives agreement with obser- this. Using the previously-quoted lumi-
8
vations for all of the light elements. (Cour- nosity density for galaxies, nL = 2 × 10
tesy of Ned Wright’s cosmology page.) L Mpc−3 , together with a typical mass-
to-luminosity ratio of M/L = 4M /L ,
gives an energy density in luminous baryons which is roughly 10% of the total amount
in baryons
ρL0 = 20 MeV m−3 or ΩL0 = 0.004 . (7.103)

It should be emphasized that although there is more energy in baryons than in CMB photons, the number density of baryons is much smaller. That is,

$$n_{B0} = \frac{210\ \mathrm{MeV\,m^{-3}}}{940\ \mathrm{MeV}} = 0.22\ \mathrm{m^{-3}} = 5 \times 10^{-10}\, n_{\gamma 0} , \tag{7.104}$$

and this plays an important role in the physics of the early universe.
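The number-density comparison in eq. (7.104) can be checked with a few lines of Python. This is a minimal sketch, assuming a present CMB temperature of 2.725 K (the value quoted later in these notes) and the standard blackbody photon number density $n_\gamma = (2\zeta(3)/\pi^2)(k_B T/\hbar c)^3$, which is not derived in this section; the constant and function names are our own.

```python
import math

# Constants (eV-based units)
K_B_EV = 8.617333e-5       # Boltzmann constant, eV/K
HBAR_C_EV_M = 1.973270e-7  # hbar * c, eV m
ZETA3 = 1.2020569          # Riemann zeta(3)

def photon_number_density(T_kelvin):
    """Blackbody photon number density n = (2 zeta(3)/pi^2) (k_B T / hbar c)^3, in m^-3."""
    x = K_B_EV * T_kelvin / HBAR_C_EV_M  # k_B T / (hbar c), in m^-1
    return 2.0 * ZETA3 / math.pi**2 * x**3

rho_B0 = 210.0    # MeV m^-3, eq. (7.102)
m_baryon = 940.0  # MeV, nucleon rest energy
T_cmb = 2.725     # K, assumed present CMB temperature

n_B0 = rho_B0 / m_baryon                 # baryons per m^3
n_gamma0 = photon_number_density(T_cmb)  # photons per m^3
eta_B = n_B0 / n_gamma0                  # baryon-to-photon ratio

print(f"n_B0 = {n_B0:.3f} m^-3, n_gamma0 = {n_gamma0:.3e} m^-3, eta_B = {eta_B:.1e}")
```

This gives roughly $4 \times 10^8$ photons per cubic metre and a baryon-to-photon ratio of about $5 \times 10^{-10}$, in line with eq. (7.104).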

Dark Matter
There are several lines of evidence that point to the existence of another form of non-relativistic matter besides baryons, called Dark Matter, which appears to carry more energy density than the baryons.
Some of this evidence comes from several independent measures of the total amount of gravitating mass in galaxies. This can be inferred by measuring the rotation rates of galaxies as a function of distance from the galactic center, since this gives speed as a function of radius, $v(r)$, for objects orbiting the galactic center (see fig. 29). For circular orbits about a point mass Newton's Laws would imply $a = v^2/r = F/m \propto 1/r^2$, and so $v \propto 1/\sqrt{r}$, and a similar fall-off is expected for gas and stars within galaxies (at radii outside most of the visible matter, as indicated by the dashed line in fig. 29) if only the matter that is visible were present. The disagreement between predictions and observations, which is the rule for large luminous galaxies, indicates that there is 10 to 100 times as much gravitating mass present as would be inferred by counting the luminous matter.

Figure 29: A measurement of a galactic rotation speed vs distance from the galactic center. The dashed line indicates what would be expected if the visible matter were the only matter present.
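To make the rotation-curve argument concrete, the sketch below compares the Keplerian fall-off $v \propto 1/\sqrt{r}$ with the enclosed mass $M(r) = v^2 r/G$ that a flat rotation curve would require. The galaxy parameters ($10^{11}$ solar masses of visible matter treated as a central point mass, and a flat speed of 220 km/s) are hypothetical illustrative values, not data taken from fig. 29.

```python
import math

G = 6.674e-11      # Newton's constant, m^3 kg^-1 s^-2
M_SUN = 1.989e30   # solar mass, kg
KPC = 3.086e19     # kiloparsec, m

def keplerian_speed(M_enclosed, r):
    """Circular-orbit speed v = sqrt(G M / r) if all the mass M_enclosed lies inside r."""
    return math.sqrt(G * M_enclosed / r)

def enclosed_mass(v, r):
    """Mass that must lie inside radius r to support a circular orbit of speed v."""
    return v**2 * r / G

# Hypothetical galaxy: visible mass treated as a central point mass,
# together with a flat observed rotation curve.
M_visible = 1e11 * M_SUN
v_flat = 220e3  # m/s

for r_kpc in (10, 20, 50):
    r = r_kpc * KPC
    print(f"r = {r_kpc:2d} kpc: Keplerian v = {keplerian_speed(M_visible, r) / 1e3:5.1f} km/s, "
          f"flat curve needs M = {enclosed_mass(v_flat, r) / M_visible:4.2f} x M_visible")
```

A flat curve requires $M(r) \propto r$, so the inferred mass keeps growing well beyond the radius containing the visible matter, while the Keplerian speed keeps falling.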
A similar result holds for the total mass in galaxy clusters, as estimated in three
independent ways:

• The mass in a galaxy cluster can be inferred by measuring the motions of its constituent galaxies, and comparing this to Newton's Laws (much as was done for stars and gas orbiting in galaxies).

• Alternatively, it can be inferred from the temperature of the hot intergalactic gas that is seen when the galaxy cluster is viewed in x-ray wavelengths (see fig. 30).¹⁷ This temperature gives the average speed of the hydrogen ions present, and the cluster mass must be large enough to have kept this gas bound to the cluster and so prevented its dispersal.

• Finally, the mass of a cluster can be inferred by measuring the amount of gravitational lensing that it produces in the images of more distant galaxies, as revealed by weak- and strong-lensing surveys.

¹⁷ Typically, there are more baryons in the intergalactic gas than in the galaxies themselves.

Two further lines of evidence also point towards the existence of Dark Matter, based on the picture that the large-scale structure of galaxies and clusters of galaxies first arose as gravity amplified small primordial density fluctuations that were already present in the early universe. They start from the realization that these primordial fluctuations are revealed to us by detailed measurements of the temperature of the Cosmic Microwave Background (CMB) as a function of direction, seen from Earth (see fig. 34). Since the CMB represents light that last scattered from matter as the universe cooled through the temperature at which electrons and protons first combined into hydrogen atoms, these temperature fluctuations represent density fluctuations in the primordial hydrogen gas. Since it is these same fluctuations that are later amplified by gravity to form the galaxies, the properties of the CMB can be related to those of the distribution of galaxies observed in the later universe.
Since it turns out that gravity can only amplify density fluctuations if the universe is dominated by nonrelativistic matter, the first piece of evidence asks how long it would take to produce the observed galaxies from the initially small ($10^{-5}$) amplitude of temperature fluctuations seen in the CMB. It turns out that there has been insufficient time if baryons were the only nonrelativistic matter in the universe, but galaxies would have had time to form if there were sufficiently much Dark Matter.

Figure 30: A visible-light photograph of a cluster of galaxies overlaid by an x-ray picture indicating the presence of hot intergalactic gas. The speeds of the galaxies and of the particles in the hot gas both indicate the presence of Dark Matter.

Similarly, since galaxies form by amplifying fluctuations seen in the CMB, the correlations of the CMB should be mirrored by correlations amongst the positions of the subsequent galaxies. These correlations have been seen and are known as baryon acoustic oscillations. Their properties agree with predictions only given the right amount of Dark Matter.
All of these estimates appear to be consistent with one another, and indicate a Dark Matter density that is of order

$$\rho_{DM0} = 1350\ \mathrm{MeV\,m^{-3}} \quad \text{or} \quad \Omega_{DM0} = 0.26 . \tag{7.105}$$

Furthermore, it turns out that whatever this gravitating matter is, it must be non-relativistic, since otherwise it would not take part in the gravitational collapse that makes galaxies and their clusters in the first place. This indicates that it should have the same equation of state, $p \approx 0$, as the baryons, meaning that the total energy density in non-relativistic matter is the sum of the baryonic and Dark Matter abundances: $\Omega_{M0} = \Omega_{B0} + \Omega_{DM0}$. Combining the above estimates gives a total that is of order

$$\rho_{M0} = 1600\ \mathrm{MeV\,m^{-3}} \quad \text{or} \quad \Omega_{M0} = 0.30 . \tag{7.106}$$
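The density parameters quoted alongside each energy density follow from dividing by the critical density, $\rho_c = 3H_0^2/(8\pi G)$. A minimal sketch, assuming $H_0 = 70$ km/s/Mpc (a value not stated in this section) and converting $\rho_c c^2$ to MeV m$^{-3}$:

```python
import math

G = 6.674e-11       # Newton's constant, m^3 kg^-1 s^-2
C = 2.99792458e8    # speed of light, m/s
MPC = 3.0857e22     # megaparsec, m
MEV = 1.602177e-13  # J per MeV

def critical_energy_density(H0_km_s_Mpc):
    """Critical energy density rho_c c^2 = 3 H0^2 c^2 / (8 pi G), in MeV m^-3."""
    H0 = H0_km_s_Mpc * 1e3 / MPC               # Hubble constant in s^-1
    rho_c = 3.0 * H0**2 / (8.0 * math.pi * G)  # mass density, kg m^-3
    return rho_c * C**2 / MEV

rho_c = critical_energy_density(70.0)  # assumed H0 = 70 km/s/Mpc

# Energy densities quoted in eqs. (7.102), (7.105), (7.106) and (7.107), in MeV m^-3
densities = {"B": 210.0, "DM": 1350.0, "M": 1600.0, "DE": 3600.0}
omegas = {name: rho / rho_c for name, rho in densities.items()}

print(f"rho_c = {rho_c:.0f} MeV m^-3")
for name, omega in omegas.items():
    print(f"Omega_{name}0 = {omega:.3f}")
```

With this value of $H_0$ the critical density is roughly 5200 MeV m$^{-3}$, and the quoted energy densities reproduce $\Omega_{B0} \approx 0.04$, $\Omega_{DM0} \approx 0.26$, $\Omega_{M0} \approx 0.3$ and $\Omega_{DE0} \approx 0.7$.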

Dark Energy

Finally, there are two lines of evidence which point to a second form of unknown matter in the universe, which does not share the equation of state of either relativistic or nonrelativistic matter. As mentioned above, one line is based on recent measurements of the deceleration parameter, $q_0$, made by detecting the expected deviation from the Hubble law for very distant supernovae (see fig. 26). This shows that the universal expansion is accelerating, rather than decelerating, and so requires that the universe now be dominated by a form of matter for which $\rho + 3p < 0$.
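The link between acceleration and $\rho + 3p < 0$ can be phrased through the deceleration parameter. Using the acceleration equation $\ddot a/a = -(4\pi G/3)(\rho + 3p)$ and $q = -a\ddot a/\dot a^2$, one finds $q_0 = \frac12\sum_i \Omega_{i0}(1 + 3w_i)$, to which curvature does not contribute. The sketch below evaluates this for a matter-only universe and for the concordance values, assuming $w = -1$ for the Dark Energy:

```python
def deceleration_parameter(components):
    """q0 = (1/2) * sum_i Omega_i0 (1 + 3 w_i) for an iterable of (Omega_i0, w_i) pairs.

    Follows from a''/a = -(4 pi G / 3)(rho + 3p) and q = -a a'' / a'^2;
    the curvature term does not enter."""
    return 0.5 * sum(omega * (1.0 + 3.0 * w) for omega, w in components)

def accelerates(w):
    """A single component drives acceleration when rho + 3p < 0, i.e. w < -1/3."""
    return 1.0 + 3.0 * w < 0.0

matter_only = [(1.0, 0.0)]
concordance = [(0.30, 0.0), (0.70, -1.0)]  # assumes w = -1 for the Dark Energy

print("matter only: q0 =", deceleration_parameter(matter_only))
print("concordance: q0 =", deceleration_parameter(concordance))
print("w = -1 accelerates:", accelerates(-1.0), "; w = 0 accelerates:", accelerates(0.0))
```

A negative $q_0$ (about $-0.55$ for the concordance values) signals acceleration, which no mixture of matter and radiation alone can produce.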
The second line of argument is based on the evidence in favor of the universe being spatially flat: $\kappa = 0$ and so $\Omega_0 = 1$. This evidence comes from measurements of the angular separation between hot and cold fluctuations in the temperature of the CMB photon distribution, as has been measured by satellite experiments (see fig. 34). Since these fluctuations are due to sound waves in the primordial hydrogen gas, their physical size can be computed in terms of the known speed of sound in that gas: it is as if someone has held up a ruler of known length for us at the other end of the universe. Furthermore, we also know the distance to this fluctuation from measurements of $H_0$. In Euclidean geometry knowledge of these two distances would not be independent of the angle, since the geometry of an isosceles triangle is over-determined by a measurement of its length, breadth and angular width. Such a triangle is similarly over-determined in a curved ($\kappa = \pm 1$) geometry, but with a different angle predicted for a given length triangle (as is shown in fig. 31). Consequently, the geometry of space can be inferred by comparing the physical distances with the measured angular separation, leading to the conclusion that $\kappa = 0$ to within the errors.

Figure 31: A sketch of the relation between the measured angle on the sky, θ, of a known length, D, seen from across a flat (κ = 0) universe. The dotted lines indicate how the angle would change (for fixed D) if the intervening geometry of space were positively curved (κ = 1).
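The ruler argument of fig. 31 can be sketched for a space of constant curvature with curvature radius $R$, in which a ruler of length $D$ at geodesic distance $\chi$ subtends an angle $\theta \simeq D/S_\kappa(\chi)$, with $S_\kappa = R\sin(\chi/R)$, $\chi$ or $R\sinh(\chi/R)$ for $\kappa = +1, 0, -1$. This is a static snapshot that ignores the expansion between emission and observation, and the lengths used are arbitrary illustrative numbers; it only shows the sign of the effect.

```python
import math

def transverse_distance(chi, R, kappa):
    """Circumferential radius at geodesic distance chi in a space of constant curvature
    with curvature radius R: R sin(chi/R) for kappa = +1, chi for kappa = 0,
    R sinh(chi/R) for kappa = -1."""
    if kappa == 1:
        return R * math.sin(chi / R)
    if kappa == -1:
        return R * math.sinh(chi / R)
    return chi

def subtended_angle(D, chi, R, kappa):
    """Small angle (radians) subtended by a ruler of length D seen from distance chi."""
    return D / transverse_distance(chi, R, kappa)

D, chi, R = 1.0, 100.0, 200.0  # arbitrary illustrative lengths (same units)
for kappa in (1, 0, -1):
    print(f"kappa = {kappa:+d}: theta = {subtended_angle(D, chi, R, kappa):.5f} rad")
```

The same ruler looks larger on the sky in a positively curved space and smaller in a negatively curved one, which is why the measured angle constrains $\kappa$.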

But the Friedmann equation tells us that κ = 0 implies Ω0 = 1 and so ρ0 = ρc .
And this requires the existence of something besides Dark Matter, since the evidence
for Dark Matter indicates that its abundance is too small to give Ω0 = 1. These two
lines of evidence are consistent with one another (within sizeable errors) and point
to a Dark Energy density which is of order

$$\rho_{DE0} = 3600\ \mathrm{MeV\,m^{-3}} \quad \text{or} \quad \Omega_{DE0} = 0.70 . \tag{7.107}$$

The equation of state for the Dark Energy is not known, apart from the remark that the observations indicate both that at present $\rho_{DE0} \sim 0.7\,\rho_c > 0$ and $w \lesssim -0.8$. If $w$ is constant, it is likely on theoretical grounds that $w = -1$, in which case the Dark Energy is simply the Lorentz-invariant vacuum energy density. Although it is not yet known whether the vacuum need be Lorentz invariant to the precision required to draw cosmological conclusions, in what follows it will be assumed that the Dark Energy equation of state is $w = -1$.

Figure 32: A plot of the amount of Dark Energy and Dark Matter as indicated by supernova measurements, properties of the CMB and direct measurements for Dark Matter. The fact that the regions overlap indicates that all evidence is consistent.

What emerges is a universe consisting of 70% Dark Energy, 26% Dark Matter and 4% baryons, with many different lines of evidence converging to paint the same picture. It is the very consistency of these many lines of evidence (what has become known as concordance cosmology) that helps give confidence that the overall framework is healthy, even though it involves the existence of two completely new kinds of unknown matter.

7.5 Earlier Epochs

Given the present-day cosmic ingredients described in the previous section, this section uses the equation of state of each type of ingredient to extrapolate the relative abundances into the past, in order to estimate what can be said about the cosmic environment during earlier epochs. The main assumption for this extrapolation is that the various components of the cosmic fluid are weakly coupled to one another, and so do not directly transfer energy between one another.
Under these circumstances the equation of energy conservation, eq. (7.70), applies separately to each component of the fluid. The relative energy densities then change as these components respond differently to the expansion of the universe, as follows.

• Radiation: For photons, starlight and relic neutrinos of sufficiently small mass we have $w = \frac13$ and so $\rho(a)/\rho_0 = (a_0/a)^4$;

• Non-relativistic Matter: For both ordinary matter (baryons and electrons) and for the Dark Matter we have $w = 0$ and so $\rho(a)/\rho_0 = (a_0/a)^3$;

• Vacuum Energy: Assuming the Dark Energy has the equation of state $w = -1$ we have $\rho(a) = \rho_0$ for all $a$.

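This implies the total energy density and pressure have the form

$$\rho(a) = \rho_{DE0} + \rho_{M0}\left(\frac{a_0}{a}\right)^3 + \rho_{R0}\left(\frac{a_0}{a}\right)^4 , \qquad p(a) = -\rho_{DE0} + \frac13\,\rho_{R0}\left(\frac{a_0}{a}\right)^4 . \tag{7.108}$$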
As the universe is run backwards to smaller sizes, these results imply that the Dark Energy becomes less and less important, while relativistic matter becomes more and more important (see fig. 33). Although the Dark Energy now dominates, non-relativistic matter is the next most abundant contribution, and when extrapolated backwards it dominates, $\rho_M(a) > \rho_{DE}(a)$, for all redshifts larger than a relatively recent value:

$$1 + z = \frac{a_0}{a} > \left(\frac{\Omega_{DE0}}{\Omega_{M0}}\right)^{1/3} = \left(\frac{0.7}{0.3}\right)^{1/3} = 1.3 . \tag{7.109}$$

The energy density in baryons alone becomes larger than the Dark Energy density at a somewhat earlier epoch:

$$1 + z > \left(\frac{\Omega_{DE0}}{\Omega_{B0}}\right)^{1/3} = \left(\frac{0.7}{0.04}\right)^{1/3} = 2.6 . \tag{7.110}$$

For times earlier than this the dominant component of the energy density is non-relativistic matter, and this remains true back until the epoch when the energy density in radiation became comparable with that in non-relativistic matter. Since $\rho_R \propto a^{-4}$ and $\rho_M \propto a^{-3}$, radiation dominates over matter when

$$1 + z > \frac{\Omega_{M0}}{\Omega_{R0}} = \frac{0.3}{8.4 \times 10^{-5}} = 3600 . \tag{7.111}$$

This crossover would have occurred much later in the absence of Dark Matter, since the radiation energy density exceeds the energy density in baryons only when

$$1 + z > \frac{\Omega_{B0}}{\Omega_{R0}} = \frac{0.04}{8.4 \times 10^{-5}} = 480 . \tag{7.112}$$
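All four crossover redshifts follow from setting $\Omega_{i0}(1+z)^{3(1+w_i)} = \Omega_{j0}(1+z)^{3(1+w_j)}$ and solving for $1 + z$. A short sketch using the present-day values quoted in the text (the function name is our own):

```python
# Present-day density parameters and equations of state used in the text
OMEGA = {"DE": 0.7, "M": 0.3, "B": 0.04, "R": 8.4e-5}
W = {"DE": -1.0, "M": 0.0, "B": 0.0, "R": 1.0 / 3.0}

def equality_redshift(i, j):
    """Return 1+z at which components i and j have equal energy density,
    using rho_i propto (1+z)^(3(1+w_i)); i is the component that grows faster with z."""
    n_i = 3.0 * (1.0 + W[i])
    n_j = 3.0 * (1.0 + W[j])
    return (OMEGA[j] / OMEGA[i]) ** (1.0 / (n_i - n_j))

for i, j, eq in (("M", "DE", "7.109"), ("B", "DE", "7.110"),
                 ("R", "M", "7.111"), ("R", "B", "7.112")):
    print(f"eq. ({eq}): 1+z = {equality_redshift(i, j):.4g}")
```

The same function works for any pair of components with constant $w$, which is convenient if the Dark Energy equation of state is later allowed to differ from $w = -1$.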

Knowing how $\rho$ depends on $a$ immediately gives, with the Friedmann equation, $H$ as a function of $a$, and so also an explicit form for the proper, luminosity and angular-diameter distances. For example, eq. (7.108) implies

$$H(a) = H_0\left[\Omega_{DE0} + \Omega_{\kappa 0}\left(\frac{a_0}{a}\right)^2 + \Omega_{M0}\left(\frac{a_0}{a}\right)^3 + \Omega_{R0}\left(\frac{a_0}{a}\right)^4\right]^{1/2} , \tag{7.113}$$

where we define

$$\Omega_{\kappa 0} \equiv -\frac{\kappa}{(H_0 r_0 a_0)^2} . \tag{7.114}$$

Using $1 + z = a_0/a$ to eliminate $a$ in favour of $z$ then allows the present-day proper distance in such a universe to be written

$$D(z) = H_0^{-1}\int_0^z dz' \left[\Omega_{DE0} + \Omega_{\kappa 0}(1+z')^2 + \Omega_{M0}(1+z')^3 + \Omega_{R0}(1+z')^4\right]^{-1/2} , \tag{7.115}$$

with $D_L$ and $D_A$ being related to this by powers of $(1+z)$ if $\kappa = 0$: $D_L = (1+z)\,D$ and $D_A = D/(1+z)$. It is clear from this expression how measurements of $D_L(z)$ or $D_A(z)$ for a range of $z$'s can allow an inference of the present-day density abundances, $\Omega_{i0}$, for $i = DE, M, R$ and $\kappa$.
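Eq. (7.115) generally has to be integrated numerically. A minimal sketch using composite Simpson's rule (our choice; any quadrature rule would do), assuming $H_0 = 70$ km/s/Mpc and the density parameters quoted above with $\Omega_{\kappa 0} = 0$, and restoring the factor of $c$ so that distances come out in Mpc:

```python
import math

C_KM_S = 299792.458  # speed of light, km/s

def hubble_ratio(z, om_de, om_k, om_m, om_r):
    """E(z) = H/H0 from eq. (7.113), with a0/a = 1 + z."""
    zp = 1.0 + z
    return math.sqrt(om_de + om_k * zp**2 + om_m * zp**3 + om_r * zp**4)

def proper_distance(z, H0=70.0, om_de=0.7, om_k=0.0, om_m=0.3, om_r=8.4e-5, n=1000):
    """Present-day proper distance D(z) of eq. (7.115), in Mpc.

    Uses composite Simpson's rule with n (even) intervals; H0 in km/s/Mpc."""
    h = z / n
    total = 0.0
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight / hubble_ratio(k * h, om_de, om_k, om_m, om_r)
    return (C_KM_S / H0) * total * h / 3.0

z = 1.0
D = proper_distance(z)
D_L = (1.0 + z) * D  # luminosity distance, valid for kappa = 0
D_A = D / (1.0 + z)  # angular-diameter distance, valid for kappa = 0
print(f"z = {z}: D = {D:.0f} Mpc, D_L = {D_L:.0f} Mpc, D_A = {D_A:.0f} Mpc")
```

For these parameters $D(1) \approx 3300$ Mpc; changing `om_m` and `om_de` shifts $D_L(z)$ noticeably at $z \sim 1$, which is what makes supernova distances sensitive to the Dark Energy.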
Given the dependence, eq. (7.113), of $H$ on $a$, it is possible to integrate to obtain the $t$-dependence of $a$. Although in general this dependence must be obtained numerically, many of its features may be understood on simple analytic grounds based on the recognition that for most epochs only a single component of the cosmic fluid dominates the total energy density. We expect that for redshifts larger than several thousand $a(t)$ should be well approximated by the expansion of a universe filled purely by radiation. Once $a/a_0$ rises above 1/3600 there should be a brief transition to the time dependence which describes the universal expansion in a universe dominated by non-relativistic matter. This should apply right up to the very recent past, when $a/a_0$ is around 0.8, after which there is a transition to vacuum-energy domination, during which the universal expansion accelerates to become exponential in $t$. In all likelihood we are at present still living in the transition period from matter to vacuum-energy domination.

Figure 33: A plot of the energy density, ρ, vs universal scale factor, a, for radiation, matter and dark energy. (Plot titled "Energy Density vs Scale Factor", showing the log of the energy density against the log of the scale factor for radiation, matter, dark energy and their total.)
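The same numerical approach gives the present age of the universe, $t_0 = \int_0^{a_0} da/(aH)$. For a flat universe containing only matter and vacuum energy, eq. (7.117) below can be inverted at the present epoch (using $H_{de} = H_0\sqrt{\Omega_{DE0}}$ and the fact that today $a/a_{eq} = (\Omega_{DE0}/\Omega_{M0})^{1/3}$) to give $t_0 = \frac{2}{3H_{de}}\,\mathrm{arcsinh}\sqrt{\Omega_{DE0}/\Omega_{M0}}$, which provides a check on the numerics. A sketch, again assuming $H_0 = 70$ km/s/Mpc and setting $a_0 = 1$:

```python
import math

HUBBLE_TIME_GYR = 977.79  # 1/H0 in Gyr when H0 = 1 km/s/Mpc

def age_numerical(H0=70.0, om_de=0.7, om_k=0.0, om_m=0.3, om_r=0.0, n=20000):
    """t0 = integral_0^1 da / (a H(a)) in Gyr, with H(a) from eq. (7.113) and a0 = 1.
    Composite Simpson's rule with n (even) intervals."""
    def integrand(a):
        if a == 0.0:
            return 0.0  # 1/(aH) vanishes as a -> 0 (like sqrt(a) when om_r = 0)
        return a / math.sqrt(om_de * a**4 + om_k * a**2 + om_m * a + om_r)
    h = 1.0 / n
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return HUBBLE_TIME_GYR / H0 * total * h / 3.0

def age_analytic(H0=70.0, om_de=0.7, om_m=0.3):
    """Flat matter + vacuum energy: t0 = (2 / (3 H_de)) asinh(sqrt(om_de / om_m))."""
    H_de = H0 * math.sqrt(om_de)
    return HUBBLE_TIME_GYR / H_de * (2.0 / 3.0) * math.asinh(math.sqrt(om_de / om_m))

print(f"numerical: {age_numerical():.3f} Gyr, analytic: {age_analytic():.3f} Gyr")
print(f"with radiation (Omega_R0 = 8.4e-5): {age_numerical(om_r=8.4e-5):.3f} Gyr")
```

Both give an age of about 13.5 Gyr; including radiation shortens it only slightly, since radiation dominates for a tiny fraction of the total time.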

Although the detailed form of $a(t)$ in principle depends on the value taken by $\kappa$, in practice the contribution of $\kappa$ is only important in the very recent past. This is because the best information available at present indicates that $\Omega_0 = \Omega_{DE0} + \Omega_{M0} + \Omega_{R0} = 1$, which is consistent with $\kappa = 0$. But even if $\kappa \neq 0$, since the curvature term in eq. (7.68) varies like $a^{-2}$, it falls more slowly than does either the contribution of matter ($\rho_M \propto a^{-3}$) or radiation ($\rho_R \propto a^{-4}$). So given that the curvature term is at best only comparable to the other energy densities at present, it becomes more and more negligible the further one looks into the universe's past.
As a result it is a very good approximation to use $\kappa = 0$ in the expression for $a(t)$ during the matter-dominated and the earlier radiation-dominated epochs, in which case it has the very simple form $a(t) = a_0 (t/t_0)^\alpha$, with $\alpha = \frac12$ during radiation domination and $\alpha = \frac23$ during matter domination. It may not be valid to neglect $\kappa$ for the more recent periods of matter domination, and so in this case the more detailed expressions given in the previous section should instead be used. For the present-day epoch it is best to include both $\kappa \neq 0$ and $\rho_{DE} \neq 0$, although the best evidence remains consistent (within largish errors) with $\kappa = 0$.
When $\kappa = 0$ it is also possible to give simple analytic expressions for the time dependence of $a$ in the two transition regions: between radiation and matter domination; and between matter and dark-energy domination. Neglecting radiation during the matter/dark-energy transition gives a Friedmann equation of the form

$$\left(\frac{\dot a}{a}\right)^2 = H_{de}^2\left[1 + \left(\frac{a_{eq}}{a}\right)^3\right] , \tag{7.116}$$

where $a_{eq}$ is the value of the scale factor when the energy densities of the matter and dark energy are equal to one another, and $H_{de}^2 = 8\pi G\rho_{de}/3$ is the (constant) Hubble scale during the pure dark-energy epoch. Integrating this equation (assuming $\dot a > 0$), with the boundary condition that $a = 0$ when $t = 0$, then gives the solution

$$a(t) = a_0 \sinh^{2/3}\left(\frac{3H_{de}t}{2}\right) , \tag{7.117}$$

where $a_0$ is a constant (substituting back into eq. (7.116) shows that it equals $a_{eq}$). Notice that when $H_{de}t \gg 1$ this approaches the exponential solution, $a/a_0 \propto \exp(H_{de}t)$, of the dark-energy epoch, while for $H_{de}t \ll 1$ it instead implies $a/a_0 \propto t^{2/3}$, as is appropriate for the matter-dominated epoch.
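More generally, the transition to dark-energy domination from an epoch for which $\rho \propto a^{-p}$, described by

$$\left(\frac{\dot a}{a}\right)^2 = H_{de}^2\left[1 + \left(\frac{a_{eq}}{a}\right)^p\right] , \tag{7.118}$$

is given by the solution

$$a(t) = a_0 \sinh^{2/p}\left(\frac{pH_{de}t}{2}\right) . \tag{7.119}$$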
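Eq. (7.119) can be checked numerically by substituting it back into eq. (7.118). In the sketch below the prefactor is called `a_star` (a hypothetical name for the integration constant written $a_0$ in eq. (7.119)); the residual vanishes when it equals the $a_{eq}$ appearing in the Friedmann equation. The derivative is taken by a central finite difference, and the early-time logarithmic slope is compared with $2/p$:

```python
import math

def a_transition(t, H_de, p, a_star=1.0):
    """Candidate solution, eq. (7.119): a(t) = a_star * sinh(p H_de t / 2)**(2/p)."""
    return a_star * math.sinh(p * H_de * t / 2.0) ** (2.0 / p)

def friedmann_residual(t, H_de, p, a_eq=1.0, dt=1e-6):
    """(adot/a)^2 - H_de^2 [1 + (a_eq/a)^p] for the candidate with a_star = a_eq.
    adot is estimated with a central finite difference of step dt."""
    a = a_transition(t, H_de, p, a_eq)
    adot = (a_transition(t + dt, H_de, p, a_eq)
            - a_transition(t - dt, H_de, p, a_eq)) / (2.0 * dt)
    return (adot / a) ** 2 - H_de**2 * (1.0 + (a_eq / a) ** p)

def early_log_slope(t, H_de, p, eps=1e-5):
    """d ln a / d ln t; should approach 2/p for H_de t << 1."""
    up = a_transition(t * math.exp(eps), H_de, p)
    down = a_transition(t * math.exp(-eps), H_de, p)
    return (math.log(up) - math.log(down)) / (2.0 * eps)

H_de = 1.0  # work in units where H_de = 1
for p in (3, 4):
    worst = max(abs(friedmann_residual(t, H_de, p)) for t in (0.1, 0.5, 1.0, 3.0))
    print(f"p = {p}: max |residual| = {worst:.1e}, "
          f"early-time slope = {early_log_slope(1e-4, H_de, p):.4f} (expect {2 / p:.4f})")
```

The residual is at the level of finite-difference round-off, and $p = 3$ and $p = 4$ reproduce the $t^{2/3}$ and $t^{1/2}$ behaviour of matter and radiation domination at early times.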

The transition from radiation to matter domination may be handled in a similar way. It is convenient to write the Friedmann equation during this transition as

$$\left(\frac{\dot a}{a}\right)^2 = \frac{H_{eq}^2}{2}\left[\left(\frac{a_{eq}}{a}\right)^3 + \left(\frac{a_{eq}}{a}\right)^4\right] , \tag{7.120}$$

where the constants $a_{eq}$ and $H_{eq}$ are the scale factor and Hubble scale at the instant when radiation and matter have equal energy densities. This may be integrated directly (with $\dot a > 0$ and the initial condition $a = 0$ when $t = 0$) to give

$$\left(\frac{a}{a_{eq}} + 1\right)^{1/2}\left(\frac{a}{a_{eq}} - 2\right) = \frac{3H_{eq}t}{2\sqrt2} - 2 . \tag{7.121}$$

Again this has the correct limits: $a \propto t^{2/3}$ when $a \gg a_{eq}$ and $a \propto t^{1/2}$ when $a \ll a_{eq}$.
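Eq. (7.121) gives $t$ as an explicit function of $a$, so $a(t)$ itself has to be found by inverting it. Since the left-hand side increases monotonically for $a \geq 0$ (its derivative with respect to $x = a/a_{eq}$ is $3x/(2\sqrt{x+1})$), simple bisection works. The sketch below (function names are our own) solves for $x$ at a given $H_{eq}t$ and estimates $d\ln a/d\ln t$ to confirm the $t^{1/2}$ and $t^{2/3}$ limits:

```python
import math

def lhs(x):
    """Left-hand side of eq. (7.121), with x = a/a_eq; increasing for x >= 0."""
    return math.sqrt(x + 1.0) * (x - 2.0)

def scale_factor_ratio(tau, rel_tol=1e-13):
    """Solve eq. (7.121) for x = a/a_eq, given tau = H_eq t >= 0, by bisection."""
    target = 3.0 * tau / (2.0 * math.sqrt(2.0)) - 2.0
    lo, hi = 0.0, 1.0
    while lhs(hi) < target:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > rel_tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if lhs(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_slope(tau, eps=1e-4):
    """d ln a / d ln t near tau = H_eq t, by a central difference in ln t."""
    up = scale_factor_ratio(tau * math.exp(eps))
    down = scale_factor_ratio(tau * math.exp(-eps))
    return (math.log(up) - math.log(down)) / (2.0 * eps)

for tau in (1e-4, 1.0, 1e4):
    print(f"H_eq t = {tau:g}: a/a_eq = {scale_factor_ratio(tau):.4g}, "
          f"d ln a / d ln t = {log_slope(tau):.3f}")
```

Deep in the radiation era the slope is close to 1/2, well after equality it approaches 2/3, and in between the solution interpolates smoothly.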

Exercise 32: Derive eqs. (7.119) and (7.121) by respectively integrating eqs. (7.118) and (7.120).

7.6 Hot Big Bang Cosmology


The equations of state for radiation and non-relativistic matter used in the previous discussion are based on those which arise for radiation and atoms in thermal equilibrium, and in the case of the CMB the photons can be seen explicitly to have a thermal distribution. This all points to matter being hot and dense at some point in the universe's past. As we shall see, there is also other evidence that the matter in the universe was once as hot as $10^{10}$ K or more, at which time nuclei were synthesized from a hot soup of protons, neutrons and electrons.
The Big Bang theory of cosmology starts from the hypothesis that the universe was once small and hot enough to contain just a soup of elementary particles, and asks whether this leads to the later universe that we recognize in cosmological observations. This picture turns out to describe well many of the features we see around us, which are otherwise harder to understand. This section starts the discussion of the Big Bang theory by exploring the properties of a thermal bath of particles in an expanding universe, in order to understand the conditions under which equilibrium might be expected to hold, and to see what happens as such a bath cools as the universe expands.

The Known Particle Content


The starting point of any such description is a summary of the various types of
elementary particles which are known, and their properties. These are well-known
from experimental and theoretical study over more than 40 years.

As mentioned earlier, the highest temperature for which there is direct observational evidence that the universe attained in the past is $T \sim 10^{10}$ K, which corresponds to thermal energies of order 1 MeV. The elementary particles which might be expected to be found within a soup having this temperature are the following.

• Photons (γ): are bosons that have two spin (or polarization) states, and have no electric charge or mass. They can be singly emitted and absorbed by any electrically-charged particles.

• Electrons and Positrons ($e^\pm$): are fermions that each have two spin states and have charge $\pm e$, where $e$ denotes the proton charge.¹⁸ Their masses are equal, with $m_e = 0.511$ MeV. Because the positron, $e^+$, is the antiparticle of the electron, $e^-$ (and vice versa), these particles can completely annihilate into photons through the reaction

$$e^+ + e^- \leftrightarrow 2\gamma . \tag{7.122}$$

• Protons (p): are fermions that have two spin states, charge $+e$ and a mass $m_p = 938$ MeV. Unlike all of the other particles described here (except the neutron, which is next), the proton can take part in the strong interactions, which are what hold nuclei together. For example, this permits reactions like

$$p + n \leftrightarrow D + \gamma , \tag{7.123}$$

in which a proton and neutron combine to produce a deuterium nucleus, which is a heavy isotope of hydrogen that consists of a bound state of one proton and one neutron. The photon which appears in this expression simply carries off any excess energy which is released by the reaction.

• Neutrons (n): are fermions having two spin states, no electric charge and a mass $m_n = 940$ MeV. Like protons, neutrons participate in the strong interactions. Isolated neutrons are unstable, and left to themselves decay through the weak interactions into a proton, an electron and an electron-antineutrino (see below):

$$n \to p + e^- + \bar\nu_e . \tag{7.124}$$

• Neutrinos and Anti-neutrinos ($\nu_e, \bar\nu_e, \nu_\mu, \bar\nu_\mu, \nu_\tau, \bar\nu_\tau$): are fermions which are electrically neutral, and have been found to have nonzero masses whose precise values are not known, but which are known to be smaller than 1 eV. Although each has two spin states, it is not yet known whether the neutrino and antineutrino are distinct particles (as for electrons) or not (as for photons).

¹⁸ Superscripts '±' or context should allow the use of e both as the symbol of the electron and to denote its charge.

• Gravitons (G): are bosons which are not electrically charged and are massless.
Gravitons are the quanta which carry the energy packets in a gravitational
wave, in the same way that photons do for electromagnetic waves. Gravitons
only interact with other particles with gravitational strength, which is very
weak compared to the strength of the other interactions. As a result they turn
out never to have been in thermal equilibrium for any of the temperatures to
which we have observational access in cosmology.

The next sections ask how the temperature of a bath of particles would evolve
on thermodynamic grounds as the universe expands.

Cooling Rate
We have found (for several choices for the equation of state) how the energy density
in different forms of matter varies with a as the universe expands, and we have seen
how to find from this how a varies with time, t. We now ask how thermodynamics
relates the temperature to a (and so also t), in order to quantify the rate with which
a hot bath cools due to the universal expansion. Since the early universe was dominated by radiation (whose energy density was more important in the past than it is now), we do so here for relativistic particles.
The energy density and pressure appropriate to a gas of relativistic particles (like photons) when in thermal equilibrium at temperature $T_R$ are given by

$$\rho_R = a_B T_R^4 \qquad \text{and} \qquad p_R = \frac13\, a_B T_R^4 , \tag{7.125}$$

where $a_B$ is $g/2$ times the radiation constant ($\pi^2/15$ in units with $\hbar = c = k_B = 1$, or $4\sigma/c$ in terms of the Stefan-Boltzmann constant $\sigma$), and $g$ counts the number of internal (spin) states of the particles of interest (so $g = 2$ for a gas of photons).
The evolution of $T_R$ as the universe expands is simply determined by these expressions together with energy conservation, which for relativistic particles we have seen implies that $\rho_R a^4$ does not change as $a$ increases. Because $\rho_R \propto T_R^4$ and $\rho_R \propto a^{-4}$, consistency implies that the product $a T_R$ is constant, and so

$$T_R = T_{R0}\,\frac{a_0}{a} = T_{R0}(1 + z) . \tag{7.126}$$
Notice that this assumes only that $\rho_R \propto T_R^4 \propto a^{-4}$, and so (unlike the expression for $a$ vs $t$) it does not assume that the total energy density is radiation-dominated. One way to see why this is so is to recognize that eq. (7.126) is equivalent to the statement that the expansion is adiabatic, since the entropy per unit volume of a relativistic gas is $s_R \propto T_R^3$, and so the total entropy in this gas is

$$S_R \propto s_R a^3 \propto (T_R a)^3 = \text{constant} . \tag{7.127}$$
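Eqs. (7.125) and (7.126) are easy to evaluate numerically. A minimal sketch for photons, assuming the present CMB temperature $T_{R0} = 2.725$ K and the radiation constant $4\sigma/c = 7.5657 \times 10^{-16}$ J m$^{-3}$ K$^{-4}$ (i.e. $g = 2$); the function names are our own:

```python
A_RAD = 7.5657e-16    # radiation constant 4 sigma / c, J m^-3 K^-4 (photons, g = 2)
K_B_EV = 8.617333e-5  # Boltzmann constant, eV/K
MEV = 1.602177e-13    # J per MeV
T0 = 2.725            # K, assumed present CMB temperature T_R0

def temperature(z, T_now=T0):
    """Eq. (7.126): T_R = T_R0 (1 + z), in K."""
    return T_now * (1.0 + z)

def redshift_at_temperature(T, T_now=T0):
    """Invert eq. (7.126): z = T / T_R0 - 1."""
    return T / T_now - 1.0

def photon_energy_density(T):
    """Eq. (7.125) for photons, rho_R = a_B T^4, converted to MeV m^-3."""
    return A_RAD * T**4 / MEV

rho_gamma0 = photon_energy_density(T0)         # today's CMB energy density
T_eq = temperature(3599.0)                     # 1 + z = 3600, near matter-radiation equality
z_MeV = redshift_at_temperature(1e6 / K_B_EV)  # bath with k_B T = 1 MeV

print(f"rho_gamma0 = {rho_gamma0:.3f} MeV m^-3 (compare rho_B0 = 210 MeV m^-3)")
print(f"T at 1 + z = 3600: {T_eq:.0f} K, i.e. k_B T = {K_B_EV * T_eq:.2f} eV")
print(f"k_B T = 1 MeV at z = {z_MeV:.2e}")
```

Today's CMB carries only about 0.26 MeV m$^{-3}$, far less than the baryons, yet the same photons were at MeV temperatures when $1 + z$ was a few times $10^9$.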

A Thermal History of the Universe


An important consequence of the falling of the temperature as the universe expands is that it makes interactions amongst the various particles run more slowly. This happens because the lower temperature means there is less energy available per collision on average, but also because there are fewer particles about (per unit volume) with which to interact. Eventually, for every interaction there comes a point where reactions become so rare that they effectively stop. When this happens the equilibrium of the particles involved breaks down, and they are said to freeze out. That is, they coast along without interacting until the present day.
What is spectacular about the study of cosmology now is the ability to test cosmological ideas with observations, and these tests largely rely on detecting those particles which fell out of equilibrium and have persisted to the present day as relics of the early universe. This section provides a brief history of the early universe with a focus on describing the various types of relics which arise. Our starting point is the epoch when the universe had a temperature of about 10 MeV, at which point it consisted of a hot soup of non-relativistic protons and neutrons, in equilibrium with a population of relativistic electrons, positrons, photons and three species of neutrino.
At MeV temperatures we have approximately equal numbers of protons and neutrons. Since all of the other particles satisfy $m \ll T$ at these temperatures, equipartition of energy in a thermal environment ensures that there are roughly equal numbers of electrons, positrons, photons and each species of neutrino. Furthermore, agreement with observations requires the relativistic particles to be considerably more numerous than the nucleons, with $\eta_B = n_B/n_\gamma = (n_n + n_p)/n_\gamma \sim 10^{-10}$. There must also be a slight excess of electrons over positrons, $n_{e^-} - n_{e^+} = n_p$, in order to ensure the electrical neutrality of the cosmic environment. This enormous excess of relativistic particles over non-relativistic ones ensures that the entropy of the equilibrium bath which they all share is dominated by the relativistic particles, and so the temperature of the bath falls like $T \propto a^{-1}$, as discussed above. The excess of relativistic over non-relativistic matter also ensures that the energy density is radiation-dominated, and so $\rho_{tot} \propto T^4 \propto a^{-4}$.
We now list a number of landmarks in the thermal history of the universe, which
make an important impact on the relics we see today that are left over from this
earlier and hotter time.

1. Neutrino Freeze-out: Once the temperature falls below a few MeV, the weak interactions are not sufficiently strong to keep the three neutrino species in thermal equilibrium. After this point these neutrinos continue to move through the universe without scattering, and are still present during the present epoch as a Cosmic Neutrino Background. Since the neutrinos are relativistic, however, their distribution retains its equilibrium form, with the temperature simply red-shifting, $T_\nu \propto a^{-1}$, as the universe expands. Since this is precisely the same time-dependence as for the thermal bath containing the rest of the particles, $T_\nu$ continues to track the temperature of the thermal bath as the universe expands (until electron-positron annihilation reheats the bath, as described next). Although these neutrinos are in principle all around us, they have so far escaped detection due to their extremely small interaction cross sections.

2. Electron-Positron Annihilation: Once the temperature falls below twice the electron mass, $2m_e = 1.02$ MeV, the abundance of electrons and positrons begins to decline relative to photons, because the annihilation reaction $e^+e^- \to \gamma\gamma$ begins to predominate over the inverse process of pair creation. This ends up removing essentially all of the positrons, leaving the same number of residual electrons as there are protons. This has an important consequence for the later universe, because this annihilation dumps a considerable amount of energy which reheats the equilibrium bath of photons, neutrons and charged particles relative to the neutrinos, whose temperature continues to redshift without experiencing any heating (because they are no longer in equilibrium).

3. Formation of Nuclei: The thermal evolution at temperatures lower than 1 MeV is richer than the preceding discussion suggests, because bound states can form. In particular, nuclear interactions can bind a neutron and proton into deuterium, with a binding energy of 2.22 MeV, and so once temperatures reach this energy range light nuclei begin to form and so change the chemical composition of the cosmic fluid. The residual abundance of these nuclei predicted by this process agrees well with the observed primordial abundances, which provides strong evidence for the validity of the Big Bang picture of cosmology, and gives important information about the total abundance, $n_B$, of baryons (protons and neutrons). A constraint on the total number of baryons is possible because the nuclear reaction rates are proportional to the density of reactants, with more baryons leading to faster reactions. But the total number of nuclei formed depends on how long it takes for temperatures to cool to the point that nuclear reactions also stop happening, and this is controlled in part by the size of the reaction rates (and so also by the baryon density). The result is usually normalized to the density of photons, since the ratio is then time-independent, leading to $\eta_B := n_B/n_\gamma \simeq 10^{-10}$.

4. Formation of Atoms: Electromagnetic interactions furnish another important set of bound states which complicate the picture of the universe at lower temperatures. In particular, electrons can bind with nuclei to form neutral atoms once the temperature falls below the relevant binding energies, $E \sim 10$ eV. In practice atoms don't actually form until the temperature is somewhat cooler than this, $T \simeq 1$ eV, because the large number ($\sim 10^{10}$) of photons for each electron and proton makes the reactions where photons dissociate bound atoms initially more common than those where atoms are formed. At this point the equilibrium conditions for charged particles and photons change dramatically, since once atoms form there are almost no free charges left to scatter photons, and so the cosmic fluid becomes largely transparent to them. The cosmic microwave background (CMB) consists of those photons which last scattered from matter at this point, and have survived unscathed to be observed during the present epoch. The observation of these photons gives a direct measure of the temperature of the heat bath from which the photons eventually decoupled, a map of which is given in fig. 34.
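A crude way to see why bound states form only well below their binding energy $B$: the number of photons per baryon energetic enough to break them apart is roughly $\eta_B^{-1}e^{-B/T}$, so bound states survive only once this drops to order one, i.e. at $T \sim B/\ln(1/\eta_B)$. This sketch ignores power-law prefactors (a proper treatment uses the Saha equation), takes $\eta_B = 10^{-10}$, and uses the hydrogen binding energy of 13.6 eV, an input not quoted in the text:

```python
import math

def formation_temperature(binding_energy, eta=1e-10):
    """Rough temperature below which bound states with binding energy B survive.

    Photons able to break the bound state number about (1/eta) * exp(-B/T) per
    baryon; setting this to ~1 gives T ~ B / ln(1/eta). Power-law prefactors
    are ignored, so only the order of magnitude is meaningful."""
    return binding_energy / math.log(1.0 / eta)

T_deuterium_MeV = formation_temperature(2.22)  # deuterium binding energy, MeV
T_hydrogen_eV = formation_temperature(13.6)    # hydrogen binding energy, eV

print(f"deuterium survives below T ~ {T_deuterium_MeV:.2f} MeV")
print(f"hydrogen survives below T ~ {T_hydrogen_eV:.2f} eV")
```

Both estimates come out a factor of about $\ln(10^{10}) \approx 23$ below the binding energy, consistent with atoms forming only at $T \simeq 1$ eV rather than at 10 eV.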

In all, the Hot Big Bang provides an outstandingly successful description of what we see around ourselves in cosmology, but only if we start with just the right initial conditions sometime before nucleosynthesis. These initial conditions require the early universe to be very homogeneous and isotropic, since this is what is observed to be true for the cosmic microwave background. Indeed, small temperature (and so also density) fluctuations in the primordial hydrogen environment are directly observed in precision measurements of the cosmic microwave background temperature as a function of direction in the sky (as seen in fig. 34). Since these fluctuations are only about 10 µK in size, compared with the CMB's average temperature of 2.725 K, they show that density perturbations were at most as big as 1 part in $10^5$ when atoms were first forming in the early universe.

Figure 34: The temperature of the cosmic microwave background radiation as a function of direction, as measured by the WMAP collaboration. The difference between the hottest and coolest points in this map is of order 10 µK.

But because primordial fluctuations have been seen, the initial universe cannot
be perfectly homogeneous. This is also a good thing, because the amplitude of these
small density fluctuations is ultimately amplified by gravitational collapse to form the
galaxies and stars we find ourselves surrounded by. An important piece of evidence
for Dark Matter is that there has not been sufficient time for this amplification to
take place if the only non-relativistic particles around are baryons.
It turns out that these initial conditions are not natural, in that they do not arise automatically and must instead be put in by hand. Furthermore, because time evolution moves the universe away from homogeneity and isotropy, the universe at still-earlier times must be smooth to a much higher accuracy than at present. It is hoped that these initial conditions may be the relics of a still-earlier epoch of the universe about which physicists have long speculated, called the inflationary epoch. The speculations center around the observation that the special initial conditions of the Big Bang would emerge very naturally if the universe were to have undergone a period of near-exponential expansion (much like the Dark Energy dominated epoch we now appear to be entering, but with much higher energies and densities) at much earlier times.

Here is a selection of textbooks on General Relativity and cosmology.

1. C.M. Will, Theory and Experiment in Gravitational Physics (Revised Edition), Cambridge University Press, 1993.

2. S. Carroll, Spacetime and Geometry: An Introduction to General Relativity, Addison Wesley 2004. [Modern and well written]

3. S. Weinberg, Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity, Wiley 1972. [The timeless classic – very physical]

4. C. Misner, K. Thorne and J. Wheeler, Gravitation, Freeman and Company 1973. [Encyclopedic, with many layers of insight]

5. R. Wald, General Relativity, University of Chicago 1984. [More mathematical, with an emphasis on modern differential geometry]

6. P.J.E. Peebles, Principles of Physical Cosmology, Princeton University Press (1993).

7. B. Ryden, Introduction to Cosmology, Pearson Education 2003. [A good undergraduate introduction to modern cosmology]

8. S. Dodelson, Modern Cosmology, Academic Press 2003. [A good, but more advanced, introduction to modern cosmology]

9. A. Linde, Particle Physics and Inflationary Cosmology, Harwood Academic Publishers (1990).

10. E. W. Kolb and M. S. Turner, The Early Universe, Addison-Wesley (1990).

11. A. R. Liddle and D. H. Lyth, Cosmological Inflation and Large-Scale Structure, Cambridge University Press (2000).

12. S. Weinberg, Cosmology, Oxford University Press (2008).

13. S. Chandrasekhar, The Mathematical Theory of Black Holes, Oxford University Press 1992.

14. S.L. Shapiro and S.A. Teukolsky, Black Holes, White Dwarfs and Neutron
Stars: The physics of compact objects, Wiley 1983.

References
[1] S. Baessler et al., Physical Review Letters 83 (1999) 3585;
E. Adelberger, Classical and Quantum Gravity 18 (2001) 2397.

[2] I. I. Shapiro, Fourth Test of General Relativity, Physical Review Letters 13 (1964)
789-791.

[3] R. D. Reasenberg, et al., Viking Relativity Experiment: Verification of Signal Retardation by Solar Gravity, Astrophysical Journal 234, (1979) L219-L221.

[4] B. Bertotti, L. Iess and P. Tortora, A Test of General Relativity Using Radio Links
with the Cassini Spacecraft, Nature 425, (2003) 374-376;
John D. Anderson, Eunice L. Lau, and Giacomo Giampieri, “Measurement of the
PPN Parameter with Radio Signals from the Cassini Spacecraft at X- and
Ka-Bands,” in the proceedings of the 22nd Texas Symposium on Relativistic
Astrophysics, Stanford, 2004.

[5] Reflections on Relativity, http://www.mathpages.com/rr/rrtoc.htm.

[6] S. Gillessen et al., arXiv:0810.4674 (astro-ph).

[7] W.L. Freedman et al., Ap. J. 553 (2001) 47–72, e-print (arXiv:astro-ph/0012376).

[8] G.C. McVittie, Mon. Not. Roy. Astron. Soc. 93 (1933) 325;
B.C. Nolan, Phys. Rev. D58 (1998) 064006 [gr-qc/9805041].
