Spending Symmetry


Terence Tao
Department of Mathematics, UCLA, Los Angeles, CA 90095
E-mail address: [email protected]

In memory of Garth Gaudry, who set me on the road

Contents

Preface
A remark on notation
Acknowledgments

Chapter 1. Logic and foundations
1.1. The argument from ignorance
1.2. On truth and accuracy
1.3. Mathematical modeling
1.4. Epistemic logic, and the blue-eyed islander puzzle lower bound
1.5. Higher-order epistemic logic

Chapter 2. Group theory
2.1. Symmetry spending
2.2. Isogenies between classical Lie groups

Chapter 3. Combinatorics
3.1. The Szemerédi-Trotter theorem via the polynomial ham sandwich theorem
3.2. A quantitative Kemperman theorem

Chapter 4. Analysis
4.1. The Fredholm alternative
4.2. The inverse function theorem for everywhere differentiable functions
4.3. Stein's interpolation theorem
4.4. The Cotlar-Stein lemma
4.5. Stein's spherical maximal inequality
4.6. Stein's maximal principle

Chapter 5. Nonstandard analysis
5.1. Polynomial bounds via nonstandard analysis
5.2. Loeb measure and the triangle removal lemma

Chapter 6. Partial differential equations
6.1. The limiting absorption principle
6.2. The shallow water wave equation, and the propagation of tsunamis

Chapter 7. Number theory
7.1. Hilbert's seventh problem, and powers of 2 and 3
7.2. The Collatz conjecture, Littlewood-Offord theory, and powers of 2 and 3
7.3. Erdős's divisor bound
7.4. The Kátai-Bourgain-Sarnak-Ziegler asymptotic orthogonality criterion

Chapter 8. Geometry
8.1. A geometric proof of the impossibility of angle trisection by straightedge and compass
8.2. Elliptic curves and Pappus's theorem
8.3. Lines in the Euclidean group SE(2)
8.4. Bézout's inequality
8.5. The Brunn-Minkowski inequality in nilpotent groups

Chapter 9. Dynamics
9.1. The Furstenberg recurrence theorem and finite extensions
9.2. Rohlin's problem

Chapter 10. Miscellaneous
10.1. Worst movie polls
10.2. Descriptive and prescriptive science
10.3. Honesty and Bayesian probability

Bibliography
Index

Preface

In February of 2007, I converted my "What's new" web page of research
updates into a blog at terrytao.wordpress.com. This blog has since grown
and evolved to cover a wide variety of mathematical topics, ranging from my
own research updates, to lectures and guest posts by other mathematicians,
to open problems, to class lecture notes, to expository articles at both basic
and advanced levels. In 2010, I also started writing shorter mathematical
articles, first on a (now defunct) Google Buzz feed, and now at the Google+
feed
plus.google.com/114134834346472219368/posts .
This book collects some selected articles from both my blog and my
Buzz and Google+ feeds from 2011, continuing a series of previous books
[Ta2008], [Ta2009], [Ta2009b], [Ta2010], [Ta2010b], [Ta2011], [Ta2011b],
[Ta2011c], [Ta2011d], [Ta2012] based on the blog and Buzz.
The articles here are only loosely connected to each other, although
many of them share common themes (such as the titular use of compactness
and contradiction to connect finitary and infinitary mathematics to each
other). I have grouped them loosely by the general area of mathematics
they pertain to, although the dividing lines between these areas are somewhat
blurry, and some articles arguably span more than one category. The articles
in Sections 4.3-4.6 were written in honour of the eightieth birthday of my
graduate advisor, Eli Stein, as a selection of my favourite contributions he
made to analysis.

A remark on notation
For reasons of space, we will not be able to define every single mathematical
term that we use in this book. If a term is italicised for reasons other than
emphasis or for definition, then it denotes a standard mathematical object,
result, or concept, which can be easily looked up in any number of references.
(In the blog version of the book, many of these terms were linked to their
Wikipedia pages, or other on-line reference pages.)
I will however mention a few notational conventions that I will use
throughout. The cardinality of a finite set $E$ will be denoted $|E|$. We
will use¹ the asymptotic notation $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote
the estimate $|X| \leq CY$ for some absolute constant $C > 0$. In some cases
we will need this constant $C$ to depend on a parameter (e.g. $d$), in which
case we shall indicate this dependence by subscripts, e.g. $X = O_d(Y)$ or
$X \ll_d Y$. We also sometimes use $X \sim Y$ as a synonym for $X \ll Y \ll X$.
In many situations there will be a large parameter $n$ that goes off to
infinity. When that occurs, we also use the notation $o_{n \to \infty}(X)$ or simply
$o(X)$ to denote any quantity bounded in magnitude by $c(n)X$, where $c(n)$
is a function depending only on $n$ that goes to zero as $n$ goes to infinity. If
we need $c(n)$ to depend on another parameter, e.g. $d$, we indicate this by
further subscripts, e.g. $o_{n \to \infty; d}(X)$.
We will occasionally use the averaging notation
$\mathbf{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x)$
to denote the average value of a function $f: X \to \mathbf{C}$ on a non-empty finite
set $X$.
If $E$ is a subset of a domain $X$, we use $1_E: X \to \mathbf{R}$ to denote the
indicator function of $E$, thus $1_E(x)$ equals 1 when $x \in E$ and 0 otherwise.

Acknowledgments
I am greatly indebted to many readers of my blog, Buzz, and Google+ feeds,
including Andrew Bailey, Roland Bauerschmidt, Tony Carbery, Yemon Choi,
Marco Frasca, Charles Gunn, Joerg Grande, Alex Iosevich, Allen Knutson,
Miguel Lacruz, Srivatsan Narayanan, Andreas Seeger, Orr Shalit, David
Speyer, Ming Wang, Ben Wieland, Qiaochu Yuan, Pavel Zorin, and several
anonymous commenters, for corrections and other comments, which can be
viewed online at
terrytao.wordpress.com
The author is supported by a grant from the MacArthur Foundation, by
NSF grant DMS-0649473, and by the NSF Waterman award.

¹ In harmonic analysis and PDE, it is more customary to use $X \lesssim Y$ instead of $X \ll Y$.

Chapter 1

Logic and foundations

1.1. The argument from ignorance


The argumentum ad ignorantiam (argument from ignorance) is one of the
classic fallacies in informal reasoning. In this argument, one starts with the
observation that one does not know of any reason that a statement X is true
(or false), and uses this as evidence to support the claim that X is therefore
false (or therefore true). This argument can have a fair amount of validity
in situations in which one's ability to gather information about X can be
reasonably expected to be close to complete, and can give weak support
for a conclusion when one's information about X is partial but substantial
and unbiased (except in situations in which an adversary is deliberately
exploiting gaps in this information, in which case one should proceed in a
far more "game-theoretic" or "paranoid" manner). However, when dealing
with statements about poorly understood phenomena, in which only a small
or unrepresentative amount of data is available, the argument from ignorance
can be quite dangerous, as summarised by the adage "absence of evidence
is not evidence of absence".
There are versions of the argument from ignorance that occur in mathematics and physics; these are almost always non-rigorous arguments, but
can serve as useful heuristics, or as the basis for formulating useful conjectures. Examples include the following:
(1) (Non-mathematical induction) If a statement P (x) is known to be
true for all computable examples of x, and one sees no reason why
these examples should not be representative of the general case,
then one expects P (x) to be true for all x.
(2) (Principle of indifference) If a random variable X can take N different values, and there is no reason to expect one of these values
to be any more likely to occur than any other, then one can expect
each value to occur with probability 1/N .
(3) (Equidistribution) If one has a (discrete or continuous) distribution of points x in a space X, and one sees no reason why this
distribution should favour one portion of X over another, then one
can expect this distribution to be asymptotically equidistributed in
X after increasing the sample size of the distribution to infinity
(thus, for any reasonable subset E of X, the portion of the distribution contained inside E should asymptotically converge to the
relative measure of E inside X).
(4) (Independence) If one has two random variables X and Y , and
one sees no reason why knowledge about the value of X should
significantly affect the behaviour of Y (or vice versa), then one can
expect X and Y to be independent (or approximately independent)
as random variables.
(5) (Heuristic Borel-Cantelli) Suppose one is counting solutions to an
equation such as $P(n) = 0$, where $n$ ranges over some set $N$. Suppose that for any given $n \in N$, one expects the equation $P(n) = 0$
to hold with probability¹ $p_n$. Suppose also that one sees no significant relationship between the solvability of $P(n) = 0$ and the
solvability of $P(m) = 0$ for distinct $n, m$. If $\sum_n p_n$ is infinite, one
then expects infinitely many solutions to $P(n) = 0$; but if $\sum_n p_n$ is
finite, then one expects only finitely many solutions to $P(n) = 0$.
(6) (Local-to-global principle) If one is trying to solve some sort of
equation F (x) = 0, and all "obvious" or "local" obstructions to
this solvability (e.g. trying to solve df = w when w is not closed)
are not present, and one believes that the class of all possible x is
so "large" or "flexible" that no "global" obstructions (such as those
imposed by topology) are expected to intervene, then one expects
a solution to exist.
The equidistribution principle is a generalisation of the principle of indifference, and among other things forms the heuristic basis for statistical
mechanics (where it is sometimes referred to as the fundamental postulate
of statistical mechanics). The heuristic Borel-Cantelli lemma can be viewed
as a combination of the equidistribution and independence principles.
¹ Such an expectation might arise from the principle of indifference, for instance
by observing that $P(n)$ can range in a set of size $R_n$ that contains zero, in which case one can
predict a probability $p_n = 1/R_n$ that $P(n)$ will equal zero.

A typical example of the equidistribution principle in action is the conjecture (which is still unproven) that the digits of $\pi$ are equidistributed:
thus, for instance, the proportion of the first $N$ digits of $\pi$ that are equal
to, say, 7, should approach 1/10 in the limit as $N$ goes to infinity. The
point here is that we see no reason why the fractional part $\{10^n \pi\}$ of the
expression $10^n \pi$ should favour one portion of the unit interval [0, 1] over any
other, and in particular it should occupy the subinterval [0.7, 0.8) one tenth
of the time, asymptotically.
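As a rough numerical illustration of this conjecture (it is in no way evidence towards a proof), the following sketch tallies the decimal digits of $\pi$ after the decimal point. The function names, the choice of Machin's formula to generate the digits, and the number of guard digits are all choices made here for illustration; none of them come from the text.

```python
from collections import Counter


def arctan_inverse(x, scale):
    """Approximate scale * arctan(1/x) with the Taylor series, in integer arithmetic."""
    power = scale // x          # holds scale / x^(2k+1), truncated
    total = power
    x_squared = x * x
    k = 1
    while power:
        power //= x_squared
        term = power // (2 * k + 1)
        total += -term if k % 2 else term   # signs alternate: -1/3, +1/5, ...
        k += 1
    return total


def pi_decimals(n, guard=10):
    """Return the first n decimal digits of pi (after the leading 3) as a string.

    Uses Machin's formula pi = 16 arctan(1/5) - 4 arctan(1/239); the guard digits
    absorb the truncation error of the integer divisions (an illustrative choice).
    """
    scale = 10 ** (n + guard)
    pi_scaled = 16 * arctan_inverse(5, scale) - 4 * arctan_inverse(239, scale)
    return str(pi_scaled // 10 ** guard)[1:]   # drop the leading "3"


def digit_frequencies(n):
    """Proportion of each digit 0..9 among the first n decimal digits of pi."""
    counts = Counter(pi_decimals(n))
    return {d: counts[str(d)] / n for d in range(10)}


if __name__ == "__main__":
    for digit, freq in digit_frequencies(1000).items():
        print(digit, round(freq, 3))
```

Each proportion can be compared against the conjectured limiting value 1/10; any finite sample of course says nothing about the limit itself.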
A typical application of the heuristic Borel-Cantelli lemma is an informal
"proof" of the twin prime conjecture that there are infinitely many primes
$p$ such that $p + 2$ is also prime. From the prime number theorem, we expect
a typical large number $n$ to have an (asymptotic) probability $\frac{1}{\log n}$ of being
prime, and $n + 2$ to have a probability $\frac{1}{\log(n+2)}$ of being prime. If one sees no
reason why the primality (or lack thereof) of $n$ should influence the primality
(or lack thereof) of $n + 2$, then by the independence principle one expects
a typical number $n$ to have a probability $\frac{1}{(\log n)(\log(n+2))}$ of being the first
part of a twin prime pair. Since $\sum_n \frac{1}{(\log n)(\log(n+2))}$ diverges, we then expect
infinitely many twin primes.
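The heuristic count above can be compared with an actual count of twin primes for small ranges. The sketch below is only an illustration under assumptions made here: the function names are hypothetical, the sum is started at $n = 2$, and the sieve is a standard sieve of Eratosthenes. The two numbers need not agree closely, since the naive heuristic ignores any correlation between the primality of $n$ and of $n + 2$.

```python
import math


def prime_sieve(limit):
    """Return a bytearray s with s[n] == 1 exactly when n is prime, for 0 <= n <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve


def count_twin_primes(limit):
    """Number of primes p <= limit such that p + 2 is also prime."""
    sieve = prime_sieve(limit + 2)
    return sum(1 for p in range(2, limit + 1) if sieve[p] and sieve[p + 2])


def naive_heuristic_count(limit):
    """Sum of 1 / (log n * log(n + 2)) over 2 <= n <= limit, as in the heuristic above."""
    return sum(1.0 / (math.log(n) * math.log(n + 2)) for n in range(2, limit + 1))


if __name__ == "__main__":
    for limit in (10**3, 10**4, 10**5):
        print(limit, count_twin_primes(limit), round(naive_heuristic_count(limit), 1))
```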
While these arguments can lead to useful heuristics and conjectures, it
is important to realise that they are not remotely close to being rigorous,
and can indeed lead to incorrect results. For instance, the above argument
claiming to prove the infinitude of twin primes p, p + 2 would also prove
the infinitude of consecutive primes p, p + 1, which is absurd. The reason
here is that the primality of a number p does significantly influence the
primality of its successor p + 1, because all but one of the primes are odd,
and so if p is a prime other than 2, then p + 1 is even and cannot itself be
prime. Now, this objection does not prevent p + 2 from being prime (and
neither does consideration of divisibility by 3, or 5, etc.), and so there is no
obvious reason why the twin prime argument does not work; but one cannot
conclude from this that there are infinitely many twin primes without an appeal
to the non-rigorous argument from ignorance.
Another well-known mathematical example where the argument from
ignorance fails concerns the fractional parts of $\exp(\pi \sqrt{n})$, where $n$ is a natural number. At first glance, much as with $10^n \pi$, there is no reason why
these fractional parts of transcendental numbers should favour any region
of the unit interval [0, 1] over any other, and so one expects equidistribution
in $n$. As a consequence of this and a heuristic Borel-Cantelli argument, one
expects the distance of $\exp(\pi \sqrt{n})$ to the nearest integer to not be much
less than $1/n$ at best. However, as famously observed by Hermite, $\exp(\pi \sqrt{163})$
is extremely close to an integer, with the error being less than $10^{-12}$. Here,
there is a deeper structure present which one might previously be ignorant
of, namely the unique factorisation of the number field $\mathbf{Q}(\sqrt{-163})$. For all
we know, a similar "hidden structure" or "conspiracy" might ultimately be
present in the digits² of $\pi$, or the twin primes; we cannot yet rule these out,
and so these conjectures remain open.
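Hermite's observation is easy to check numerically. The following sketch uses Python's standard decimal module; the function names, the working precision of 60 digits, and the use of Machin's formula to obtain $\pi$ are assumptions made here for illustration only.

```python
from decimal import Decimal, getcontext


def arctan_inverse(x, scale):
    """Approximate scale * arctan(1/x) with the Taylor series, in integer arithmetic."""
    power = scale // x
    total = power
    x_squared = x * x
    k = 1
    while power:
        power //= x_squared
        term = power // (2 * k + 1)
        total += -term if k % 2 else term
        k += 1
    return total


def pi_decimal(digits):
    """Return pi as a Decimal, accurate to roughly the requested number of digits."""
    scale = 10 ** (digits + 10)   # ten guard digits, an illustrative choice
    pi_scaled = 16 * arctan_inverse(5, scale) - 4 * arctan_inverse(239, scale)
    return Decimal(pi_scaled) / Decimal(scale)


def distance_to_nearest_integer(n, digits=60):
    """Distance from exp(pi * sqrt(n)) to the nearest integer."""
    getcontext().prec = digits
    value = (pi_decimal(digits) * Decimal(n).sqrt()).exp()
    return abs(value - value.to_integral_value())


if __name__ == "__main__":
    for n in (161, 162, 163, 164):
        print(n, distance_to_nearest_integer(n))
```

Running it for a few neighbouring values of $n$ shows how atypical $n = 163$ is compared with what the equidistribution heuristic would suggest.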
There are similar cautionary counterexamples that are related to the
twin prime problem. The same sort of heuristics that support the twin prime
conjecture also support Schinzel's hypothesis H, which roughly speaking
asserts that polynomials $P(n)$ over the integers should take prime values
for infinitely many $n$ unless there is an "obvious" reason why this is not the
case, i.e. if $P(n)$ is never coprime to a fixed modulus $q$, or if it is reducible, or
if it cannot take arbitrarily large positive values. Thus, for instance, $n^2 + 1$
should take infinitely many prime values (an old conjecture of Landau).
This conjecture is widely believed to be true, and one can use the heuristic
Borel-Cantelli lemma to support it. However, it is interesting to note that
if the integers $\mathbf{Z}$ are replaced by the function field analogue $\mathbf{F}_2[t]$, then the
conjecture fails, as first observed by Swan [Sw1962]. Indeed, the octic
polynomial $n^8 + t^3$, while irreducible over $\mathbf{F}_2[t]$, turns out to never give an
irreducible polynomial for any given value $n \in \mathbf{F}_2[t]$; this has to do with the
structure of this polynomial in certain lifts of $\mathbf{F}_2[t]$, a phenomenon studied
systematically in [CoCoGr2008].
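Swan's example can be spot-checked by brute force for small $n$. In the sketch below, polynomials in $\mathbf{F}_2[t]$ are encoded as Python integers whose binary digits are the coefficients (so 0b1000 stands for $t^3$); this encoding, the function names, and the trial-division irreducibility test are illustrative choices made here, not the method of [Sw1962] or [CoCoGr2008]. A finite check of this kind proves nothing about all $n$.

```python
def gf2_mul(a, b):
    """Multiply two polynomials over F_2 (carry-less multiplication of bitmasks)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, m):
    """Remainder of a on division by the nonzero polynomial m over F_2."""
    m_bits = m.bit_length()
    while a.bit_length() >= m_bits:
        a ^= m << (a.bit_length() - m_bits)
    return a


def gf2_is_irreducible(f):
    """Trial division by every polynomial of degree 1 up to deg(f) // 2."""
    degree = f.bit_length() - 1
    if degree < 1:
        return False
    for d in range(2, 1 << (degree // 2 + 1)):
        if gf2_mod(f, d) == 0:
            return False
    return True


def swan_value(n):
    """Evaluate n^8 + t^3 for n in F_2[t]."""
    n2 = gf2_mul(n, n)
    n4 = gf2_mul(n2, n2)
    n8 = gf2_mul(n4, n4)
    return n8 ^ 0b1000   # addition over F_2 is XOR; 0b1000 encodes t^3


if __name__ == "__main__":
    # Every polynomial n of degree at most 4 (bitmasks 0 .. 31).
    irreducible_hits = [n for n in range(32) if gf2_is_irreducible(swan_value(n))]
    print("values of n giving an irreducible polynomial:", irreducible_hits)
```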
Even when the naive argument from ignorance fails, though, the nature
of that failure can often be quite interesting and lead to new mathematics. In
my own area of research, an example of this came from the inverse theory of
the Gowers uniformity norms. Naively, these norms measured the extent to
which the phase of a function behaved like a polynomial, and so an argument
from ignorance would suggest that the polynomial phases were the only
obstructions to the Gowers uniformity norm being small; however, there
was an important additional class of "pseudopolynomial" phases, known
as nilsequences, that one additionally had to consider. Proving this latter
conjecture (known as the inverse conjecture for the Gowers norms) goes
through a lot of rich mathematics, in particular the equidistribution theory
of orbits in nilmanifolds, and has a number of applications, for instance in
counting patterns in primes such as arithmetic progressions; see [Ta2011b].

² Incidentally, a possible conspiracy among the digits of $\pi$ is a key plot point in the novel
Contact by Carl Sagan, though not in the more well known movie adaptation of that novel.

1.2. On truth and accuracy

Suppose that x is an object, and X is a class of objects. What does it mean
to honestly say that "x is an element of X"?
To a mathematician, the standard here is that of truth: the statement "x
is an element of X" is honest as long as x satisfies, to the letter, absolutely
all of the requirements for membership in X (and similarly, "x is not an
element of X" is honest if even the most minor requirement for membership
is violated). Thus, for instance, a square is an example of a rectangle, a
straight line segment is an example of a curve, 1 is not an example of a
prime number, and so forth.
In most areas outside of mathematics, though, using strict truth as the
standard for honesty is not ideal (even if people profess it to be so). To give
a somewhat frivolous example, using a strict truth standard, tomatoes are
not vegetables, but are technically fruits. Less frivolously, many loopholes in
legal codes (such as tax codes) are based on interpretations of laws that are
strictly true, but not necessarily in the spirit in which the law was intended.
Even mathematicians deviate sometimes from a strict truth standard, for
instance by abusing notation (e.g. using a set $X$ when one should instead be
referring to a space (such as a metric space $(X, d)$, a measure space $(X, \mathcal{B}, \mu)$,
etc.)), or by using adverbs such as "morally" or "essentially".
In most practical situations, a better standard for honesty would be
that of accuracy rather than truth. Under this standard, the statement
"x is an element of X" would be honest if x is close to (or resembles) a
typical element of X, with the level of honesty proportional to the degree of
resemblance or closeness (and the degree of typicality). Under this standard,
for instance, the assertion that a tomato is a vegetable is quite honest, as
a tomato is close in practical function to a typical vegetable. On the other
hand, a mathematically correct assertion such as "squares are rectangles"
becomes slightly dishonest, since a generic rectangle would not have all sides
equal, and so the mental image generated by labeling a square object a
rectangle instead of a square is more misleading. Meanwhile, the statement
"$\pi$ equals 22/7", while untrue, is reasonably accurate, and thus honest in
many situations outside of higher mathematics.
Many deceptive rhetorical techniques rely on asserting statements which
are true but not accurate. A good example of this is reductio ad Hitlerum:
attacking the character of a person x by noting that x belongs to a class X
which also contains Hitler. Usually, either x or Hitler (or both) will not be
a typical element of X, making this attack dishonest even if all statements
used in the attack are true in a strict sense. Other examples include
guilt by association, lying by omission, and the use of emotionally charged
words to alter the listener's perception of what a "typical" element of a
class X is.
Of course, accuracy is much less of an objective standard than truth, as
it is difficult to attain consensus on exactly what one means by "close" or
"typical", or to decide on exactly what threshold of accuracy is acceptable
for a given situation. Also, the laws of logic, which apply without exception
to truth, do not always apply without exception to accuracy. For instance,
the law of the excluded middle fails: if x is a person, it is possible for the
two statements "x is someone who has stopped beating his wife" and "x
is someone who has not stopped beating his wife" to both³ be dishonest.
Similarly, "1 is not a prime number" and "1 is not a composite number" are
true, but somewhat dishonest statements (as the former suggests that 1 is
composite, while the latter suggests that 1 is prime); the joint statement "1
is neither a prime number nor a composite number" is more honest.
Ideally, of course, all statements in a given discussion should be both
factually correct and accurate. But it would be a mistake to only focus on
the former standard and not on the latter.

1.3. Mathematical modeling


To use mathematical modelling to solve a real-world problem, one ideally would like to have three ingredients besides the actual mathematical analysis:
(i) A good mathematical model. This is a mathematical construct
which connects the observable data, the predicted outcome, and
various unspecified parameters of the model to each other. In some
cases, the model may be probabilistic instead of deterministic (thus
the predicted outcome will be given as a random variable rather
than as a fixed quantity).
(ii) A good set of observable data.
(iii) Good values for the parameters of the model.
For instance, if one wanted to work out the distance $D$ to a distant
galaxy, the model might be Hubble's law $v = HD$ relating the distance
to the recessional velocity $v$, the data might be the recessional velocity $v$
(or, more realistically, a proxy for that velocity, such as the red shift), and
the only parameter in this case would be the Hubble constant $H$. This is
a particularly simple situation; of course, in general one would expect a
much more complex model, a much larger set of data, and a large number
of parameters⁴.
³ At the other extreme, consider Niels Bohr's quote: "The opposite of a correct statement is
a false statement. But the opposite of a profound truth may well be another profound truth."
⁴ Such parameters need not be numerical; a model, for instance, could posit an unknown
functional relationship between two observable quantities, in which case the function itself is the
unknown parameter.
As mentioned above, in ideal situations one has all three ingredients: a
good model, good data, and good parameters. In this case the only remaining difficulty is a direct one, namely to solve the equations of the model
with the given data and parameters to obtain the result. This type of situation pervades undergraduate homework exercises in applied mathematics
and physics, and also accurately describes many mature areas of engineering
(e.g. civil engineering or mechanical engineering) in which the model, data,
and parameters are all well understood. One could also classify pure mathematics as being the quintessential example of this type of situation, since the
models for mathematical foundations (e.g. the ZFC model for set theory)
are incredibly well understood (to the point where we rarely even think of
them as models any more), and one primarily works with well-formulated
problems with precise hypotheses and data.
However, there are many situations in which one or more ingredients are
missing. For instance, one may have a good model and good data, but the
parameters of the model are initially unknown. In that case, one needs to
first solve some sort of inverse problem to recover the parameters from existing sets of data (and their outcomes), before one can then solve the direct
problem. In some cases, there are clever ways to gather and use the data
so that various unknown parameters largely cancel themselves out, simplifying the task. For instance, to test the efficacy of a drug, one can use a
double-blind study in order to cancel out the numerous unknown parameters that affect both the control group and the experimental group equally.
Typically, one cannot solve for the parameters exactly, and so one must accept an increased range of error in one's predictions. This type of problem
pervades undergraduate homework exercises in statistics, and accurately describes many mature sciences, such as physics, chemistry, materials science,
and some of the life sciences.
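To make the distinction between the inverse problem and the direct problem concrete, here is a minimal sketch using the Hubble's law example from earlier in this section. Everything numerical in it is an assumption made for illustration: the calibration data are synthetic (generated from an assumed value of 70 km/s/Mpc with small hand-picked perturbations, not measurements), and the least-squares fit through the origin is just one simple way to recover the parameter.

```python
def fit_hubble_constant(distances, velocities):
    """Inverse problem: least-squares estimate of H in v = H * D (line through the origin)."""
    numerator = sum(v * d for d, v in zip(distances, velocities))
    denominator = sum(d * d for d in distances)
    return numerator / denominator


def predict_distance(velocity, hubble_constant):
    """Direct problem: solve v = H * D for the distance D."""
    return velocity / hubble_constant


# Synthetic calibration set (assumed values, for illustration only).
ASSUMED_H = 70.0                                   # km/s per Mpc, hypothetical
DISTANCES = [10.0, 20.0, 40.0, 80.0]               # Mpc
PERTURBATIONS = [0.03, -0.02, 0.01, -0.01]         # stand-in for measurement error
VELOCITIES = [ASSUMED_H * d * (1.0 + e) for d, e in zip(DISTANCES, PERTURBATIONS)]

if __name__ == "__main__":
    h_estimate = fit_hubble_constant(DISTANCES, VELOCITIES)
    print("estimated H:", h_estimate)
    print("predicted distance for v = 7000 km/s:", predict_distance(7000.0, h_estimate))
```

Because the fitted parameter differs slightly from the value used to generate the data, the predicted distance inherits an error, which is the "increased range of error" referred to above.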
Another common situation is when one has a good model and good
parameters, but an incomplete or corrupted set of data. Here, one often has
to clean up the data first using error-correcting techniques before proceeding
(this often requires adding a mechanism for noise or corruption into the
model itself, e.g. adding gaussian white noise to the measurement model).
This type of problem pervades undergraduate exercises in signal processing,
and often arises in computer science and communications science.
In all of the above cases, mathematics can be utilised to great effect,
though different types of mathematics are used for different situations (e.g.
computational mathematics when one has a good model, data set, and parameters; statistics when one has a good model and data set but unknown
parameters; computer science, filtering, and compressed sensing when one
has a good model and parameters, but unknown data; and so forth). However,
there is one important situation where the current state of mathematical sophistication is only of limited utility, and that is when it is the model which
is unreliable. In this case, even having excellent data, perfect knowledge of
parameters, and flawless mathematical analysis may lead to error or a false
sense of security; this for instance arose during the recent financial crisis, in
which models based on independent gaussian fluctuations in various asset
prices turned out to be totally incapable of describing tail events.
Nevertheless, there are still some ways in which mathematics can assist
in this type of situation. For instance, one can mathematically test the
robustness of a model by replacing it with other models and seeing the
extent to which the results change. If it turns out that the results are largely
unaffected, then this builds confidence that even a somewhat incorrect model
may still yield usable and reasonably accurate results. At the other extreme,
if the results turn out to be highly sensitive to the model assumptions,
then even a model with a lot of theoretical justification would need to be
heavily scrutinised by other means (e.g. cross-validation) before one would
be confident enough to use it. Another use of mathematics in this context
is to test the consistency of a model. For instance, if a model for a physical
process leads to a non-physical consequence (e.g. if a partial differential
equation used in the model leads to solutions that become infinite in finite
time), this is evidence that the model needs to be modified or discarded
before it can be used in applications.
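The robustness test described above can be sketched in a few lines: compute the same quantity under two different models and see how much the answer moves. Here the quantity is the probability of a fluctuation exceeding five standard deviations, and the two models are a standard gaussian and a Student t distribution with 3 degrees of freedom rescaled to unit variance. The choice of the alternative model, the threshold, and the function names are assumptions made here purely for illustration.

```python
import math


def gaussian_tail(x):
    """P(Z > x) for a standard gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def unit_variance_t3_tail(x):
    """P(T / sqrt(3) > x), where T has a Student t distribution with 3 degrees of freedom.

    T has variance 3, so T / sqrt(3) has unit variance. The closed form follows from
    the t(3) distribution function F(t) = 1/2 + (u / (1 + u^2) + arctan(u)) / pi,
    with u = t / sqrt(3).
    """
    cdf = 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi
    return 1.0 - cdf


def sensitivity_ratio(x):
    """How many times more likely the heavy-tailed model makes an excursion beyond x."""
    return unit_variance_t3_tail(x) / gaussian_tail(x)


if __name__ == "__main__":
    for threshold in (2.0, 3.0, 5.0):
        print(threshold, gaussian_tail(threshold),
              unit_variance_t3_tail(threshold), sensitivity_ratio(threshold))
```

When the ratio is close to 1 the conclusion is robust to the choice between these two models; when it is very large, as it is for tail events, the result is highly sensitive to the model assumptions.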
It seems to me that one of the reasons why mathematicians working
in different disciplines (e.g. mathematical physicists, mathematical biologists, mathematical signal processors, financial mathematicians, cryptologists, etc.) have difficulty communicating with each other mathematically is
that their basic environments of model, data, and parameters are so different: a set of mathematical tools, principles, and intuition that works well
in, say, a "good model, good parameters, bad data" environment may be totally inadequate or even misleading when working in, say, a "bad model,
bad parameters, good data" environment. (And there are also other factors
beyond these three that significantly influence the mathematical environment and thus inhibit communication; for instance, problems with an
active adversary, such as in cryptography or security, tend to be of a completely different nature than problems in which the only adverse effects come from
natural randomness, which is for instance the case in safety engineering.)

1.4. Epistemic logic, and the blue-eyed islander puzzle lower bound

In [Ta2009, §1.1] I discussed my favourite logic puzzle, namely the blue-eyed
islander puzzle, reproduced here:
Problem 1.4.1. There is an island upon which a tribe resides. The tribe
consists of 1000 people, with various eye colours. Yet, their religion forbids
them to know their own eye colour, or even to discuss the topic; thus, each
resident can (and does) see the eye colours of all other residents, but has
no way of discovering his or her own (there are no reflective surfaces). If
a tribesperson does discover his or her own eye colour, then their religion
compels them to commit ritual suicide at noon the following day in the
village square for all to witness. All the tribespeople are highly logical⁵ and
devout, and they all know that each other is also highly logical and devout
(and they all know that they all know that each other is highly logical and
devout, and so forth).
Of the 1000 islanders, it turns out that 100 of them have blue eyes and
900 of them have brown eyes, although the islanders are not initially aware
of these statistics (each of them can of course only see 999 of the 1000
tribespeople).
One day, a blue-eyed foreigner visits the island and wins the complete
trust of the tribe.
One evening, he addresses the entire tribe to thank them for their hospitality.
However, not knowing the customs, the foreigner makes the mistake
of mentioning eye colour in his address, remarking "how unusual it is to see
another blue-eyed person like myself in this region of the world".
What effect, if anything, does this faux pas have on the tribe?
I am fond of this puzzle because in order to properly understand the
correct solution (and to properly understand why the alternative solution is
incorrect), one has to think very clearly (but unintuitively) about the nature
of knowledge.
⁵ For the purposes of this logic puzzle, "highly logical" means that any conclusion that can
be logically deduced from the information and observations available to an islander, will automatically
be known to that islander.
There is however an additional subtlety to the puzzle that was pointed
out to me, in that the correct solution to the puzzle has two components, a
(necessary) upper bound and a (possible) lower bound, both of which I will
discuss shortly. Only the upper bound is correctly explained in the puzzle
(and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish,
in part because the bound is merely possible and not necessary. Ultimately,
this is because to demonstrate the upper bound, one merely has to show
that a certain statement is logically deducible from an islander's state of
knowledge, which can be done by presenting an appropriate chain of logical
deductions. But to demonstrate the lower bound, one needs to show that
certain statements are not logically deducible from an islander's state of
knowledge, which is much harder, as one has to rule out all possible chains
of deductive reasoning from arriving at this particular conclusion. In fact,
to rigorously establish such impossibility statements, one ends up having to
leave the "syntactic" side of logic (deductive reasoning), and move instead
to the dual "semantic" side of logic (creation of models). As we shall see,
semantics requires substantially more mathematical setup than syntax, and
the demonstration of the lower bound will therefore be much lengthier than
that of the upper bound.
To complicate things further, the particular logic that is used in the blue-eyed
islander puzzle is not the same as the logics that are commonly used in
mathematics, namely propositional logic and first-order logic. Because the
logical reasoning here depends so crucially on the concept of knowledge, one
must work instead with an epistemic logic (or more precisely, an epistemic
modal logic) which can properly work with, and model, the knowledge of
various agents. To add even more complication, the role of time is also
important (an islander may not know a certain fact on one day, but learn it
on the next day), so one also needs to incorporate the language of temporal
logic in order to fully model the situation. This makes both the syntax
and semantics of the logic quite intricate; to see this, one only needs to
contemplate the task of programming a computer with enough epistemic
and temporal deductive reasoning powers that it would be able to solve
the islander puzzle (or even a smaller version thereof, say with just three
or four islanders) without being deliberately "fed" the solution. (The fact
that humans can actually grasp the correct solution without any
formal logical training is therefore quite remarkable.)
As difficult as the syntax of temporal epistemic modal logic is, though,
the semantics is more intricate still. For instance, it turns out that in order
to completely model the epistemic state of a finite number of agents (such
as 1000 islanders), one requires an infinite model, due to the existence of
arbitrarily long nested chains of knowledge (e.g. "A knows that B knows
that C knows that D has blue eyes"), which cannot be automatically reduced
to shorter chains of knowledge. Furthermore, because each agent has only
an incomplete knowledge of the world, one must take into account multiple
hypothetical worlds, which differ from the real world but which are considered
to be possible worlds by one or more agents, thus introducing modality into
the logic. More subtly, one must also consider worlds which each agent
knows to be impossible, but are not commonly known to be impossible, so
that (for instance) one agent is willing to admit the possibility that another
agent considers that world to be possible; it is the consideration of such
worlds which is crucial to the resolution of the blue-eyed islander puzzle.
And this is even before one adds the temporal aspect (e.g. "On Tuesday,
A knows that on Monday, B knew that by Wednesday, C will know that D
has blue eyes").


Despite all this fearsome complexity, it is still possible to set up both the
syntax and semantics of temporal epistemic modal logic (see footnote 6) in such a way that
one can formulate the blue-eyed islander problem rigorously, and in such a
way that one has both an upper and a lower bound in the solution. The
purpose of this section is to construct such a setup and to explain the lower
bound in particular. The same logic is also useful for analysing another
well-known paradox, the unexpected hanging paradox, and I will do so at the
end of this section. Note though that there is more than one way (see footnote 7) to set up
epistemic logics, and they are not all equivalent to each other.
Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight
to "arbitrary-order epistemic logic" in which arbitrarily long nested chains
of knowledge ("A knows that B knows that C knows that ...") are allowed. Instead, we will adopt a hierarchical approach, recursively defining
for $k = 0, 1, 2, \ldots$ a "$k^{\text{th}}$-order epistemic logic" in which knowledge chains of
depth up to $k$, but no greater, are permitted. The arbitrary-order epistemic
logic is then obtained as a limit (a direct limit on the syntactic side, and an
inverse limit on the semantic side, which is dual to the syntactic side) of
the finite-order epistemic logics. The relationship between the traditional
approach (allowing arbitrary depth from the start) and the hierarchical one
presented here is somewhat analogous to the distinction between Zermelo-Fraenkel-Choice (ZFC) set theory without the axiom of foundation, and
ZFC with that axiom.
I should warn that this is going to be a rather formal and mathematical
article. Readers who simply want to know the answer to the islander puzzle
would probably be better off reading the discussion at
terrytao.wordpress.com/2011/04/07/the-blue-eyed-islanders-puzzle-repost
.
I am indebted to Joe Halpern for comments and corrections.
1.4.1. Zeroth-order logic. Before we plunge into the full complexity of
epistemic logic (or temporal epistemic logic), let us first discuss formal logic
in general, and then focus on a particularly simple example of a logic, namely
zeroth order logic (better known as propositional logic). This logic will end
up forming the foundation for a hierarchy of epistemic logics, which will be
needed to model such logic puzzles as the blue-eyed islander puzzle.
Footnote 6. On the other hand, for puzzles such as the islander puzzle in which there are only a finite
number of atomic propositions and no free variables, one at least can avoid the need to admit
predicate logic, in which one has to discuss quantifiers such as $\forall$ and $\exists$. A fully formed predicate
temporal epistemic modal logic would indeed be of terrifying complexity.

Footnote 7. In particular, one can also proceed using Kripke models for the semantics, which in my view
are more elegant, but harder to motivate than the more recursively founded models presented here.


Informally, a logic consists of three inter-related components:


(1) A language. This describes the type of sentences the logic is able
to discuss.
(2) A syntax (or more precisely, a formal system for the given language). This describes the rules by which the logic can deduce
conclusions (from given hypotheses).
(3) A semantics. This describes the sentences which the logic interprets
to be true (in given models).
A little more formally:
(1) A language is a set L of sentences, which are certain strings of
symbols from a fixed alphabet, that are generated by some rules of
grammar.
(2) A syntax is a collection of inference rules for generating deductions
of the form $T \vdash S$ (which we read as "From $T$, we can deduce $S$"
or "$S$ is a consequence of $T$"), where $T$ and $S$ are sentences in $L$
(or sets of sentences in $L$).
(3) A semantics describes what a model (or interpretation, or structure,
or world) $M$ of the logic is, and defines what it means for a sentence
$S$ in $L$ (or a collection of sentences) to be true in such a model $M$
(which we write as $M \models S$, and we read as "$M$ models $S$", "$M$
obeys $S$", or "$S$ is true in $M$").
We will abuse notation a little bit and use the language $L$ as a metonym
for the entire logic; strictly speaking, the logic should be a tuple $(L, \vdash_L, \models_L)$
consisting of the language, syntax, and semantics, but this leads to very
unwieldy notation.
The syntax and semantics are dual to each other in many ways; for instance, the syntax of deduction can be used to show that certain statements
can be proved, while the semantics can be used to show that certain statements cannot be proved. This distinction will be particularly important in
the blue-eyed islander puzzle; in order to show that all blue-eyed islanders
commit suicide by the 100th day, one can argue purely on formal syntactical
grounds; but to show that it is possible for the blue-eyed islanders to not
commit suicide on the 99th day or any preceding day, one must instead use
semantic methods.
To illustrate the interplay between language, deductive syntax, and semantics, we begin with the simple example of propositional logic. To describe
this logic, one must first begin with some collection of atomic propositions.
For instance, on an island with three islanders $I_1, I_2, I_3$, one could consider
the propositional logic generated by three atomic propositions $A_1, A_2, A_3$,
where each $A_i$ is intended to model the statement that $I_i$ has blue eyes.
One can have either a finite or an infinite set of atomic propositions. In
this discussion, it will suffice to consider the situation in which there are only
finitely many atomic propositions, but one can certainly also study logics
with infinitely many such propositions.
The language $L$ would then consist of all the sentences that can be
formed from the atomic propositions using the usual logical connectives ($\wedge$,
$\vee$, $\neg$, $\implies$, $\top$, $\bot$, etc.) and parentheses, according to the usual rules of
logical grammar (which consists of rules such as "If $S$ and $T$ are sentences
in $L$, then $(S \vee T)$ is also a sentence in $L$"). For instance, if $A_1, A_2, A_3$ are
atomic propositions, then
$$((A_1 \wedge A_2) \vee (A_3 \wedge \neg A_1))$$
would be an example of a sentence in $L$. On the other hand,
$$A_1 A_3 A_1 ) \implies A_2 ($$
is not a sentence in $L$, despite being a juxtaposition of atomic propositions,
connectives, and parentheses, because it is not built up from rules of grammar.
One could certainly write down a finite list of all the rules of grammar
for propositional calculus (as is done in any basic textbook on mathematical
logic), but we will not do so here in order not to disrupt the flow of discussion.
It is customary to abuse notation slightly and omit parentheses when
they are redundant (or when there is enough associativity present that the
precise placement of parentheses is not relevant). For instance,
$((A_1 \vee A_2) \vee A_3)$ could be abbreviated as $A_1 \vee A_2 \vee A_3$. We will adopt this type of
convention in order to keep the exposition as uncluttered as possible.
Now we turn to the syntax of propositional logic. This syntax is generated by basic rules of deductive logic, such as modus ponens
$$A, (A \implies B) \vdash B$$
or the law of the excluded middle
$$\vdash (A \vee \neg A)$$
and completed by transitivity (if $S \vdash T$ and $T \vdash U$, then $S \vdash U$), monotonicity ($S, T \vdash S$), and concatenation (if $S \vdash T$ and $S \vdash U$ then $S \vdash T, U$). (Here
we adopt the usual convention of representing a set of sentences without using the usual curly braces, instead relying purely on the comma separator.)
Another convenient inference rule to place in this logic is the deduction theorem: if $S \vdash T$, then one can infer $\vdash (S \implies T)$. In propositional logic (or
predicate logic), this rule is redundant (hence the designation of this rule as
a theorem), but for the epistemic logics below, it will be convenient to make
deduction an explicit inference rule, as it simplifies the other inference rules
one will have to add to the system.
A typical deduction that comes from this syntax is
$$(A_1 \vee A_2 \vee A_3), \neg A_2, \neg A_3 \vdash A_1$$
which using the blue-eyed islander interpretation, is the formalisation of the
assertion that given that at least one of the islanders I1 , I2 , I3 has blue eyes,
and that I2 , I3 do not have blue eyes, one can deduce that I1 has blue eyes.
As with the laws of grammar, one can certainly write down a finite list
of inference rules in propositional calculus; again, such lists may be found in
any text on mathematical logic. Note though that, much as a given vector
space has more than one set of generators, there is more than one possible
list of inference rules for propositional calculus, due to some rules being
equivalent to, or at least deducible from, other rules; the precise choice of
basic inference rules is to some extent a matter of personal taste and will
not be terribly relevant for the current discussion.
Finally, we discuss the semantics of propositional logic. For this particular logic, the models $M$ are described by truth assignments, that assign a
truth value $(M \models A_i) \in \{\text{true}, \text{false}\}$ to each atomic statement $A_i$. Once a
truth value $(M \models A_i)$ to each atomic statement $A_i$ is assigned, the truth
value $(M \models S)$ of any other sentence $S$ in the propositional logic generated
by these atomic statements can then be interpreted using the usual truth
tables. For instance, returning to the islander example, consider a model $M$
in which $M \models A_1$ is true, but $M \models A_2$ and $M \models A_3$ are false; informally,
$M$ describes a hypothetical world in which $I_1$ has blue eyes but $I_2$ and $I_3$
do not have blue eyes. Then the sentence $A_1 \vee A_2 \vee A_3$ is true in $M$,
$$M \models (A_1 \vee A_2 \vee A_3),$$
but the statement $A_1 \implies A_2$ is false in $M$,
$$M \not\models (A_1 \implies A_2).$$
If $S$ is a set of sentences, we say that $M$ models $S$ if $M$ models each sentence
in $S$. Thus for instance, if we continue the preceding example, then
$$M \models (A_1 \vee A_2 \vee A_3), (A_2 \implies A_3)$$
but
$$M \not\models (A_1 \vee A_2 \vee A_3), (A_1 \implies A_2).$$
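The truth-table evaluation just described can be sketched in a few lines of Python. This is only an illustrative sketch: the encoding of sentences as nested tuples, and the names `holds` and `models_all`, are assumptions made here and do not come from the text.

```python
# Sentences are nested tuples; atoms are strings such as "A1".
# This encoding is an illustrative assumption, not notation from the text.

def holds(model, sentence):
    """Return True if `sentence` is true under the truth assignment `model`."""
    if isinstance(sentence, str):              # atomic proposition
        return model[sentence]
    op, *args = sentence
    if op == "not":
        return not holds(model, args[0])
    if op == "and":
        return all(holds(model, a) for a in args)
    if op == "or":
        return any(holds(model, a) for a in args)
    if op == "implies":
        return (not holds(model, args[0])) or holds(model, args[1])
    raise ValueError(f"unknown connective: {op}")

def models_all(model, sentences):
    """A model obeys a set of sentences iff it obeys each one of them."""
    return all(holds(model, s) for s in sentences)

# The hypothetical world in which I1 has blue eyes but I2 and I3 do not.
M = {"A1": True, "A2": False, "A3": False}
at_least_one = ("or", "A1", "A2", "A3")

print(holds(M, at_least_one))                                    # True
print(holds(M, ("implies", "A1", "A2")))                         # False
print(models_all(M, [at_least_one, ("implies", "A2", "A3")]))    # True
print(models_all(M, [at_least_one, ("implies", "A1", "A2")]))    # False
```

The four printed values correspond, in order, to the four displayed statements of the preceding example.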
Note that if there are only finitely many atomic statements $A_1, \ldots, A_n$,
then there are only finitely many distinct models $M$ of the resulting propositional logic; in fact, there are exactly $2^n$ such models, one for each truth
assignment. We will denote the space of all possible models of a language $L$
as $\mathrm{Mod}(L)$.
If one likes, one can designate one of these models to be the real world
Real so that all the other models become purely hypothetical worlds. In the
setting of propositional logic, the hypothetical worlds then have no direct
bearing on the real world; the fact that a sentence S is true or false in a
hypothetical world M does not say anything about what sentences are true
or false in Real. However, when we turn to epistemic logics later in this
section, we will see that hypothetical worlds will play an important role in
the real world, because such worlds may be considered to be possible worlds
by one or more agents (or, an agent may consider it possible that another
agent considers the world to be possible, and so forth).
The syntactic and semantic sides of propositional logic are tied together
by two fundamental facts:
Theorem 1.4.2 (Soundness and completeness). Let L be a propositional
logic, and let S be a set of sentences in L, and let T be another sentence in
L.
(1) (Soundness) If $S \vdash T$, then every model $M$ which obeys $S$, also
obeys $T$ (i.e. $M \models S$ implies $M \models T$).
(2) (Completeness) If every model $M$ that obeys $S$, also obeys $T$, then
$S \vdash T$.
Soundness is easy to prove; one merely needs to verify that each of the
inference rules $S \vdash T$ in one's syntax is valid, in that models that obey
$S$, automatically obey $T$. This boils down to some tedious inspection of
truth tables. (The soundness of the deduction theorem is a little trickier to
prove, but one can achieve this by an induction on the number of times this
theorem is invoked in a given deduction.) Completeness is a bit more difficult
to establish; this claim is in fact a special case of the Gödel completeness
theorem, and is discussed in [Ta2010b, §1.4]; we also sketch a proof of
completeness below.
By taking the contrapositive of soundness, we have the following important corollary: if we can find a model $M$ which obeys $S$ but does not obey
$T$, then it must not be possible to deduce $T$ as a logical consequence of $S$:
$S \not\vdash T$. Thus, we can use semantics to demonstrate limitations in syntax.
For instance, consider a truth assignment $M$ in which $A_2$ is true but
$A_1$ is false. Then $M \models (A_1 \implies A_2)$, but $M \not\models (A_2 \implies A_1)$. This
demonstrates that
$$(A_1 \implies A_2) \not\vdash (A_2 \implies A_1),$$
thus an implication such as $A_1 \implies A_2$ does not entail its converse $A_2 \implies A_1$.
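Since a propositional logic on finitely many atoms has only finitely many models, the search for such a counter-model can be carried out by brute force. The following Python sketch does this for the example above; the function names and the representation of sentences as Python functions on truth assignments are illustrative assumptions, not part of the text.

```python
from itertools import product

ATOMS = ["A1", "A2"]

def all_models(atoms):
    """Yield every truth assignment; there are 2**len(atoms) of them."""
    for values in product([True, False], repeat=len(atoms)):
        yield dict(zip(atoms, values))

def counter_models(hypotheses, conclusion, atoms):
    """Return the models that obey every hypothesis but not the conclusion."""
    return [M for M in all_models(atoms)
            if all(h(M) for h in hypotheses) and not conclusion(M)]

def a1_implies_a2(M):
    return (not M["A1"]) or M["A2"]

def a2_implies_a1(M):
    return (not M["A2"]) or M["A1"]

# Any model returned here witnesses, via the contrapositive of soundness,
# that the conclusion cannot be deduced from the hypotheses.
witnesses = counter_models([a1_implies_a2], a2_implies_a1, ATOMS)
print(witnesses)   # [{'A1': False, 'A2': True}]
```

The single witness found is exactly the truth assignment in which $A_2$ is true but $A_1$ is false.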
A theory (or more precisely, a deductive theory) in a logic $L$, is a set of
sentences $\mathcal{T}$ in $L$ which is closed under deductive consequence, thus if $\mathcal{T} \vdash S$
for some sentence $S$ in $L$, then $S \in \mathcal{T}$. Given a theory $\mathcal{T}$, one can associate
the set
$$\mathrm{Mod}_L(\mathcal{T}) = \mathrm{Mod}(\mathcal{T}) := \{M \in \mathrm{Mod}(L) : M \models \mathcal{T}\}$$
of all possible worlds (or models) in which that theory is true; conversely,
given a set $\mathcal{M} \subset \mathrm{Mod}(L)$ of such models, one can form the theory
$$\mathrm{Th}_L(\mathcal{M}) = \mathrm{Th}(\mathcal{M}) := \{S \in L : M \models S \text{ for all } M \in \mathcal{M}\}$$
of sentences which are true in all models in $\mathcal{M}$. If the logic $L$ is both sound
and complete, these operations invert each other: given any theory $\mathcal{T}$, we
have $\mathrm{Th}(\mathrm{Mod}(\mathcal{T})) = \mathcal{T}$, and given any set $\mathcal{M} \subset \mathrm{Mod}(L)$ of models, $\mathrm{Th}(\mathcal{M})$
is a theory and $\mathrm{Mod}(\mathrm{Th}(\mathcal{M})) = \mathcal{M}$. Thus there is a one-to-one correspondence between theories and sets of possible worlds in a sound and complete
language $L$.
For instance, in our running example, if $\mathcal{T}$ is the theory generated by
the three statements $A_1 \vee A_2 \vee A_3$, $A_2$, and $\neg A_3$, then $\mathrm{Mod}(\mathcal{T})$ consists of
precisely two worlds; one in which $A_1, A_2$ are true and $A_3$ is false, and one
in which $A_2$ is true and $A_1, A_3$ are false. Since neither $A_1$ nor $\neg A_1$ is true
in both worlds in $\mathrm{Mod}(\mathcal{T})$, neither $A_1$ nor $\neg A_1$ lies in $\mathcal{T} = \mathrm{Th}(\mathrm{Mod}(\mathcal{T}))$.
Thus, it is not possible to deduce either of $A_1$ or $\neg A_1$ from the hypotheses
$A_1 \vee A_2 \vee A_3$, $A_2$, and $\neg A_3$. More informally, if one knows that there is at
least one blue-eyed islander, that $I_2$ has blue eyes, and $I_3$ does not have blue
eyes, this is not enough information to determine whether $I_1$ has blue eyes
or not.
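As a quick check of this example, one can enumerate all eight truth assignments and keep those that obey the three hypotheses. The sketch below is illustrative only; the variable names are assumptions, and sentences are again represented as Python functions on truth assignments.

```python
from itertools import product

ATOMS = ("A1", "A2", "A3")

# The three generating statements of the theory in the running example.
hypotheses = [
    lambda M: M["A1"] or M["A2"] or M["A3"],   # at least one blue-eyed islander
    lambda M: M["A2"],                         # I2 has blue eyes
    lambda M: not M["A3"],                     # I3 does not have blue eyes
]

worlds = [dict(zip(ATOMS, v)) for v in product([True, False], repeat=3)]
mod_T = [M for M in worlds if all(h(M) for h in hypotheses)]

# A sentence lies in Th(Mod(T)) iff it is true in every world of Mod(T).
a1_in_theory = all(M["A1"] for M in mod_T)
not_a1_in_theory = all(not M["A1"] for M in mod_T)

print(len(mod_T))          # 2
print(a1_in_theory)        # False
print(not_a1_in_theory)    # False
```

Both membership tests fail, which matches the conclusion that the hypotheses do not determine the eye colour of $I_1$.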
One can use theories to prove the completeness theorem. Roughly speaking, one can argue by taking the contrapositive. Suppose that $S \not\vdash T$; then
we can find a theory which contains all sentences in $S$, but does not contain
$T$. In this finite setting, we can easily pass to a maximal such theory (with
respect to set inclusion); one then easily verifies that this theory is complete
in the sense that for any given sentence $U$, exactly one of $U$ and $\neg U$ is true.
From this complete theory one can then directly build a model M which
obeys S but does not obey T , giving the desired claim.
1.4.2. First-order epistemic logic. Having reviewed propositional logic
(which we will view as the zeroth-order iteration of epistemic logic), we
now turn to the first non-trivial example of epistemic logic, which we shall
call first-order epistemic logic (which should not be confused with the more
familiar first-order predicate logic). Roughly speaking, first-order epistemic
logic is like zeroth-order logic, except that there are now also some knowledge
agents that are able to know certain facts in zeroth-order logic (e.g. an
islander I1 may know that the islander I2 has blue eyes). However, in this
logic one cannot yet express higher-order facts (e.g. we will not yet be able
to formulate a sentence to the effect that I1 knows that I2 knows that I3
has blue eyes). This will require a second-order or higher epistemic logic,
which we will discuss later in this section.
Let us now formally construct this logic. As with zeroth-order logic, we
will need a certain set of atomic propositions, which for simplicity we will
assume to be a finite set $A_1, \ldots, A_n$. This already gives the zeroth order
language $L_0$ of sentences that one can form from the $A_1, \ldots, A_n$ by the rules
of propositional grammar. For instance,
$$(A_1 \implies A_2) \wedge (A_2 \implies A_3)$$
is a sentence in $L_0$. The zeroth-order logic $L_0$ also comes with a notion of
inference $\vdash_{L_0}$ and a notion of modeling $\models_{L_0}$, which we now subscript by $L_0$
in order to distinguish it from the first-order notions of inference $\vdash_{L_1}$ and
modeling $\models_{L_1}$ which we will define shortly. Thus, for instance
$$(A_1 \implies A_2) \wedge (A_2 \implies A_3) \vdash_{L_0} (A_1 \implies A_3),$$
and if $M_0$ is a truth assignment for $L_0$ for which $A_1, A_2, A_3$ are all true, then
$$M_0 \models_{L_0} (A_1 \implies A_2) \wedge (A_2 \implies A_3).$$
We will also assume the existence of a finite number of knowledge agents
K1 , . . . , Km , each of which are capable of knowing sentences in the zeroth
order language L0 . (In the case of the islander puzzle, and ignoring for
now the time aspect of the puzzle, each islander Ii generates one knowledge
agent Ki , representing the state of knowledge of Ii at a fixed point in time.
Later on, when we add in the temporal aspect to the puzzle, we will need
different knowledge agents for a single islander at different points in time,
but let us ignore this issue for now.) To formalise this, we define the first-order language $L_1$ to be the language generated from $L_0$ and the rules of
propositional grammar by imposing one additional rule:
If $S$ is a sentence in $L_0$, and $K$ is a knowledge agent, then $K(S)$
is a sentence in $L_1$ (which can informally be read as "$K$ knows (or
believes) $S$ to be true").
Thus, for instance,
$$K_2(A_1) \wedge K_1(A_1 \vee A_2 \vee A_3) \wedge \neg A_3$$
is a sentence in $L_1$; in the islander interpretation, this sentence denotes the
assertion that $I_2$ knows $I_1$ to have blue eyes, and $I_1$ knows that at least one
islander has blue eyes, but $I_3$ does not have blue eyes. On the other hand,
$$K_1(K_2(A_3))$$
is not a sentence in $L_1$, because $K_2(A_3)$ is not a sentence in $L_0$. (However, we
will be able to interpret $K_1(K_2(A_3))$ in the second-order epistemic language
$L_2$ that we will define later.)
We give $L_1$ all the rules of syntax that $L_0$ presently enjoys. For instance,
thanks to modus ponens, we have
$$K_1(A_1) \wedge (K_1(A_1) \implies K_1(A_2)) \vdash_{L_1} K_1(A_2). \qquad (1.1)$$
Similarly, if $S, T$ are sentences in $L_0$ such that $S \vdash_{L_0} T$, then one automatically has $S \vdash_{L_1} T$.
However, we would like to add some additional inference rules to reflect our understanding of what knowledge means. One has some choice
in deciding what rules to lay down here, but we will only add one rule,
which informally reflects the assertion that all knowledge agents are "highly
logical":
First-order epistemic inference rule: If $S_1, \ldots, S_i, T \in L_0$ are
sentences such that
$$S_1, \ldots, S_i \vdash_{L_0} T$$
and $K$ is a knowledge agent, then
$$K(S_1), \ldots, K(S_i) \vdash_{L_1} K(T).$$
We will introduce higher order epistemic inference rules when we turn
to higher order epistemic logics.
Informally speaking, the epistemic inference rule asserts that if $T$ can be
deduced from $S_1, \ldots, S_i$, and $K$ knows $S_1, \ldots, S_i$ to be true, then $K$ must also
know $T$ to be true. For instance, since modus ponens gives us the inference
$$A_1, (A_1 \implies A_2) \vdash_{L_0} A_2$$
we therefore have, by the first-order epistemic inference rule,
$$K_1(A_1), K_1(A_1 \implies A_2) \vdash_{L_1} K_1(A_2)$$
(note how this is different from (1.1) - why?).
As another example, of more relevance to the islander puzzle, we have
$$(A_1 \vee A_2 \vee A_3), \neg A_2, \neg A_3 \vdash_{L_0} A_1$$
and thus, by the first-order epistemic inference rule,
$$K_1(A_1 \vee A_2 \vee A_3), K_1(\neg A_2), K_1(\neg A_3) \vdash_{L_1} K_1(A_1).$$
In the islander interpretation, this asserts that if $I_1$ knows that one of the
three islanders $I_1, I_2, I_3$ has blue eyes, but also knows that $I_2$ and $I_3$ do not
have blue eyes, then $I_1$ must also know that he himself (or she herself) has
blue eyes.

One particular consequence of the first-order epistemic inference rule is
that if a sentence $T \in L_0$ is a tautology in $L_0$ (true in every model of $L_0$,
or equivalently (by completeness) deducible from the inference rules of $L_0$),
and $K$ is a knowledge agent, then $K(T)$ is a tautology in $L_1$: $\vdash_{L_0} T$ implies
$\vdash_{L_1} K(T)$. Thus, for instance, we have $\vdash_{L_1} K_1(A_1 \implies A_1)$, because $A_1 \implies A_1$ is
a tautology in $L_0$ (thus $\vdash_{L_0} A_1 \implies A_1$).
It is important to note, however, that if a statement T is not a tautology,
but merely true in the real world Real, this does not imply that K(T ) is
also true in the real world: as we shall see later, $\mathrm{Real} \models_{L_0} T$ does not imply
$\mathrm{Real} \models_{L_1} K(T)$. (We will define what $\models_{L_1}$ means presently.) Intuitively,
this reflects the obvious fact that knowledge agents need not be omniscient;
it is possible for a sentence $T$ to be true without a given agent $K$ being
aware of this truth.
In the converse direction, we also allow for the possibility that $K(T)$
is true in the real world, without $T$ being true in the real world, thus it
is conceivable that $\mathrm{Real} \models_{L_1} K(T)$ is true but $\mathrm{Real} \models_{L_0} T$ is false. This
reflects the fact that a knowledge agent may in fact have incorrect knowledge
of the real world. (This turns out not to be an important issue in the islander
puzzle, but is of relevance for the unexpected hanging puzzle.)
In a related spirit, we also allow for the possibility that $K(T)$ and $K(\neg T)$
may both be true in the real world; an agent may conceivably be able to
know inconsistent facts. However, from the inference $T, \neg T \vdash_{L_0} S$ of ex
falso quodlibet and the first-order epistemic inference rule, this would mean
that $K(S)$ is true in this world for every $S$ in $L_0$, thus this knowledge agent
believes absolutely every statement to be true. Again, such inconsistencies
are not of major relevance to the islander puzzle, but as we shall see, their
analysis is important for resolving the unexpected hanging puzzle correctly.
Remark 1.4.3. It is perhaps worth re-emphasising the previous points. In
some interpretations of knowledge, K(S) means that S has somehow been
justified to be true, and in particular K(S) should entail S in such interpretations. However, we are taking a more general (and abstract) point of
view, in which we are agnostic as regards to whether K represents necessary
or justified knowledge. In particular, our analysis also applies to generalised knowledge operators, such as belief. One can of course specialise
this general framework to a more specific knowledge concept by adding more
axioms, in which case one can obtain sharper conclusions regarding the resolution of various paradoxes, but we will work here primarily in the general
setting.
Having discussed the language and syntax of the first-order epistemic
logic $L_1$, we now turn to the semantics, in which we describe the possible
models $M_1$ of $L_1$. As $L_1$ is an extension of $L_0$, any model $M_1$ of $L_1$ must contain as a component a model $M_0$ of $L_0$, which describes the truth assignment
of each of the atomic propositions $A_i$ of $L_0$; but it must also describe the state
of knowledge of each of the agents $K_i$ in this logic. One can describe this
state in two equivalent ways; either as a theory $\{S \in L_0 : M_1 \models_{L_1} K_i(S)\}$
(in $L_0$) of all the sentences $S$ in $L_0$ that $K_i$ knows to be true (which, by the
first-order epistemic inference rule, is closed under $\vdash_{L_0}$ and is thus indeed a
theory in $L_0$); or equivalently (by the soundness and completeness of $L_0$),
as a set
$$\{M_{0,i} \in \mathrm{Mod}(L_0) : M_{0,i} \models_{L_0} S \text{ whenever } M_1 \models_{L_1} K_i(S)\}$$
of all the possible models of $L_0$ in which all the statements that $K_i$ knows to
be true, are in fact true. We will adopt the latter perspective; thus a model
$M_1$ of $L_1$ consists of a tuple
$$M_1 = (M_0, \mathcal{M}_0^{(1)}, \ldots, \mathcal{M}_0^{(m)})$$
where $M_0 \in \mathrm{Mod}(L_0)$ is a model of $L_0$, and for each $i = 1, \ldots, m$,
$\mathcal{M}_0^{(i)} \subset \mathrm{Mod}(L_0)$ is a set of models of $L_0$. To interpret sentences $S \in L_1$ in $M_1$, we
then declare $M_1 \models A_i$ iff $M_0 \models A_i$ for each atomic sentence $A_i$, and declare
$M_1 \models K_i(S)$ iff $S$ is true in every model in $\mathcal{M}_0^{(i)}$, for each $i = 1, \ldots, m$ and
$S \in L_0$. All other sentences in $L_1$ are then interpreted by applying the usual
truth tables.
As an example of such a model, consider a world with three islanders
$I_1, I_2, I_3$, each of which has blue eyes, and can each see that each other has
blue eyes, but are each unaware of their own eye colour. In this model, $M_0$
assigns a true value to each of $A_1, A_2, A_3$. As for $\mathcal{M}_0^{(1)}$, which describes the
knowledge state of $I_1$, this set consists of two possible $L_0$-worlds. One is
the true $L_0$-world $M_0$, in which $A_1, A_2, A_3$ are all true; but there is also
an additional hypothetical $L_0$-world $M_{0,1}$, in which $A_2, A_3$ are true but $A_1$ is
false. With $I_1$'s current state of knowledge, neither of these two possibilities
can be ruled out. Similarly, $\mathcal{M}_0^{(2)}$ and $\mathcal{M}_0^{(3)}$ will also consist of two $L_0$-worlds, one of which is the true $L_0$-world $M_0$, and the other is not.
In this particular case, the true $L_0$-world $M_0$ is included as a possible
world in each knowledge agent's set of possible worlds $\mathcal{M}_0^{(i)}$, but in
situations in which the agents' knowledge is incorrect or inconsistent, it can
be possible for $M_0$ to not be an element of one or more of the $\mathcal{M}_0^{(i)}$.
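The three-islander model just described can be written down concretely. The following Python sketch is only an illustration under stated assumptions: the class name `L1Model`, the helper `world`, and the representation of zeroth-order sentences as Python functions on truth assignments are all hypothetical choices made here, not constructions from the text.

```python
from dataclasses import dataclass

def world(a1, a2, a3):
    """A zeroth-order world: a truth assignment to A1, A2, A3."""
    return {"A1": a1, "A2": a2, "A3": a3}

@dataclass
class L1Model:
    real: dict      # the zeroth-order world M0
    clouds: dict    # agent index i -> list of worlds that K_i considers possible

    def knows(self, agent, sentence):
        """K_i(S) holds iff S is true in every world of agent i's cloud."""
        return all(sentence(w) for w in self.clouds[agent])

# All three islanders have blue eyes; each is unsure only of their own colour.
real = world(True, True, True)
clouds = {}
for i in (1, 2, 3):
    alternative = dict(real)
    alternative[f"A{i}"] = False        # the world in which I_i is not blue-eyed
    clouds[i] = [real, alternative]

M1 = L1Model(real, clouds)

print(M1.knows(1, lambda w: w["A2"]))                          # True
print(M1.knows(1, lambda w: w["A1"]))                          # False
print(M1.knows(1, lambda w: not w["A1"]))                      # False
print(M1.knows(1, lambda w: w["A1"] or w["A2"] or w["A3"]))    # True
```

In this sketch the first islander knows that the second has blue eyes and that at least one islander has blue eyes, but knows neither $A_1$ nor its negation.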
Remark 1.4.4. One can view an $L_1$ model $M_1$ as consisting of the real
world (the $L_0$-model $M_0$) together with $m$ clouds $\mathcal{M}_0^{(i)}$, $i = 1, \ldots, m$, of
hypothetical worlds, one for each knowledge agent $K_i$. If one chooses, one
can "enter the head" of any one of these knowledge agents $K_i$ to see what
he or she is thinking. One can then select any one of the $L_0$-worlds $M_{0,i}$
in $\mathcal{M}_0^{(i)}$ as a possible world in $K_i$'s worldview, and explore that world
further. Later on we will iterate this process, giving a tree-like structure to
the higher order epistemic models.
Let $\mathrm{Mod}(L_1)$ be the set of all models of $L_1$. This is quite a large
set; if there are $n$ atomic statements $A_1, \ldots, A_n$ and $m$ knowledge agents
$K_1, \ldots, K_m$, then there are $2^n$ possibilities for the $L_0$-world $M_0$, and each
knowledge agent $K_i$ has its own independent set $\mathcal{M}_0^{(i)}$ of possible worlds, of
which there are $2^{2^n}$ different possibilities, leading to $2^{n + m 2^n}$ distinct models
$M_1$ for $L_1$ in all. For instance, with three islanders wondering about eye
colours, this leads to $2^{27}$ possibilities (although, once everyone learns each
other's eye colour, the number of possible models goes down quite significantly).
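The count can be sanity-checked by brute force for very small $n$ and $m$. The Python sketch below (with the hypothetical helper names `count_l1_models` and `formula`) enumerates every choice of a real world together with one set of possible worlds per agent, and compares the total with the closed-form expression.

```python
from itertools import combinations, product

def count_l1_models(n, m):
    """Count pairs (real world, one set of worlds per agent) by enumeration."""
    worlds = list(product([True, False], repeat=n))            # 2**n worlds
    subsets = [c for r in range(len(worlds) + 1)
               for c in combinations(worlds, r)]               # 2**(2**n) subsets
    return sum(1 for _ in product(worlds, *([subsets] * m)))

def formula(n, m):
    """The closed-form count 2**(n + m * 2**n)."""
    return 2 ** (n + m * 2 ** n)

for n, m in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(n, m, count_l1_models(n, m), formula(n, m))

print(formula(3, 3))   # 134217728, i.e. 2**27
```

Enumeration is only feasible for tiny parameters; for three islanders one relies on the formula, which gives $2^{3 + 3 \cdot 8} = 2^{27}$.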
It can be shown (but is somewhat tedious to do so) that the syntax
and semantics of the first-order epistemic logic $L_1$ are still sound and complete, basically by mimicking (and using) the proof of the soundness and
completeness of L0 ; we sketch a proof of this below when we discuss higher
order logics.

1.5. Higher-order epistemic logic


We can iterate the above procedure and construct a language, syntax, and
semantics for $k^{\text{th}}$ order epistemic logic $L_k$ generated by some atomic propositions $A_1, \ldots, A_n$ and knowledge agents $K_1, \ldots, K_m$, recursively in terms
of the preceding epistemic logic $L_{k-1}$. More precisely, let $k \geq 1$ be a natural
number, and suppose that the logic $L_{k-1}$ has already been defined. We then
define the language of $L_k$ as the extension of $L_{k-1}$ generated by the laws of
propositional grammar and the following rule:
If $S$ is a sentence in $L_{k-1}$, and $K$ is a knowledge agent, then $K(S)$
is a sentence in $L_k$.
Thus, for instance, in the running example of three propositions $A_1, A_2, A_3$
and three knowledge agents $K_1, K_2, K_3$,
$$K_1(A_3) \wedge K_1(K_2(A_3))$$
is a sentence in $L_2$ (and hence in $L_3$, $L_4$, etc.) but not in $L_1$.
As for the syntax, we adopt all the inference rules of ordinary propositional logic, together with one new rule:
$k^{\text{th}}$-order epistemic inference rule: If $S_1, \ldots, S_i, T \in L_{k-1}$ are
sentences such that
$$S_1, \ldots, S_i \vdash_{L_{k-1}} T$$
and $K$ is a knowledge agent, then
$$K(S_1), \ldots, K(S_i) \vdash_{L_k} K(T).$$
Thus, for instance, starting with
$$A_1, (A_1 \implies A_2) \vdash_{L_0} A_2$$
one has
$$K_1(A_1), K_1(A_1 \implies A_2) \vdash_{L_1} K_1(A_2),$$
and then
$$K_2(K_1(A_1)), K_2(K_1(A_1 \implies A_2)) \vdash_{L_2} K_2(K_1(A_2)),$$
and so forth. Informally, this rule asserts that all agents are highly logical,
that they know that all agents are highly logical, and so forth. A typical
deduction from these inference rules, which is again of relevance to the
islander puzzle, is
$$K_1(K_2(A_1 \vee A_2 \vee A_3)), K_1(K_2(\neg A_3)) \vdash_{L_2} K_1((\neg K_2(A_2)) \implies (\neg K_2(\neg A_1))).$$
Remark 1.5.1. This is a very minimal epistemic syntax, and is weaker
than some epistemic logics considered in the literature. For instance, we do
not have any version of the positive introspection rule
$$K(S) \vdash K(K(S));$$
thus we allow the possibility that an agent knows $S$ "subconsciously", in that
the agent knows $S$ but does not know that he or she knows $S$. Similarly, we
do not have any version of the negative introspection rule
$$\neg K(S) \vdash K(\neg K(S)),$$
so we allow the possibility that an agent is "unaware of his or her own
ignorance". One can of course add these additional rules ex post facto and
see how this strengthens the syntax and limits the semantics, but we will
not need to do so here.
There is also no reason to expect the knowledge operators to commute:
$$K(K'(S)) \not\vdash K'(K(S)).$$
Now we turn to the semantics. A model $M_k$ of the language $L_k$ consists of an $L_0$-model $M_0 \in \mathrm{Mod}(L_0)$, together with sets of possible $L_{k-1}$-models
$\mathcal{M}_{k-1}^{(1)}, \ldots, \mathcal{M}_{k-1}^{(m)} \subset \mathrm{Mod}(L_{k-1})$ associated to their respective knowledge agents $K_1, \ldots, K_m$. To describe how $M_k$ models sentences, we declare
$M_k \models_{L_k} A_i$ iff $M_0 \models_{L_0} A_i$, and for any sentence $S$ in $L_{k-1}$ and $i = 1, \ldots, m$,
we declare $M_k \models_{L_k} K_i(S)$ iff one has $M_{k-1,i} \models S$ for every $M_{k-1,i} \in \mathcal{M}_{k-1}^{(i)}$.


Example 1.5.2. We consider an islander model with n atomic propositions


A1 , . . . , An (with each Ai representing the claim that Ii has blue eyes) and n
knowledge agents K1 , . . . , Kn (with Ki representing the knowledge state of
Ii at a fixed point in time). There are 2n L0 -models M0 , determined by the
truth values they assign to the n atomic propositions A1 , . . . , An . For each
k 0, we can then recursively associate a Lk -model Mk (M0 ) to each L0 model M0 , by setting M0 (M0 ) := M0 , and then for k 1, setting Mk (M0 )
(i)
to be the Lk -model with L0 -model M0 , and with Mk1 consisting of the pair

+
{Mk1 (M0,i
), Mk1 (M0,i
)}, where M0,i
(resp. M0,i
) is the L0 -model which is
identical to M0 except that the truth value of Ai is set to false (resp. true).
Informally, $M_k(M_0)$ models the $k$th-order epistemology of the $L_0$-world $M_0$,
in which each islander sees each other's eye colour (and knows that each
other islander can see all other islanders' eye colour, and so forth for $k$
iterations), but is unsure as to his or her own eye colour (which is why the
set $\mathcal{M}_{k-1}^{(i)}$ of $K_i$'s possible $L_{k-1}$-worlds branches into two possibilities). As
one recursively explores the clouds of hypothetical worlds in these models,
one can move further and further away from the "real" world. Consider for
instance the situation when $n = 3$ and $M_0 \models A_1, A_2, A_3$ (thus in the "real"
world, all three islanders have blue eyes), and $k = 3$. From the perspective
of $K_1$, it is possible that one is in the world $M_2(M_{0,1}^-)$, in which $I_1$ does not
have blue eyes: $M_{0,1}^- \models \neg A_1, A_2, A_3$. In that world, we can then pass to the
perspective of $K_2$, and then one could be in the world $M_1(M_{0,1,2}^{--})$, in which
neither $I_1$ nor $I_2$ has blue eyes: $M_{0,1,2}^{--} \models \neg A_1, \neg A_2, A_3$. Finally, inside
this doubly nested hypothetical world, one can consider the perspective of
$K_3$, in which one could be in the world $M_{0,1,2,3}^{---}$, in which none of $I_1, I_2, I_3$
have blue eyes: $M_{0,1,2,3}^{---} \models \neg A_1, \neg A_2, \neg A_3$. This is the total opposite of the
"real" model $M_0$, but cannot be ruled out at this triply nested level. In
particular, we have
$M_3(M_0) \models \neg K_1(K_2(K_3(A_1 \vee A_2 \vee A_3)))$
despite the fact that
$M_3(M_0) \models A_1 \vee A_2 \vee A_3$
and
$M_3(M_0) \models K_i(A_1 \vee A_2 \vee A_3)$
and
$M_3(M_0) \models K_i(K_j(A_1 \vee A_2 \vee A_3))$
for all $i, j \in \{1, 2, 3\}$. (In particular, the statement $A_1 \vee A_2 \vee A_3$, which
asserts "at least one islander has blue eyes", is not common knowledge in
$M_3(M_0)$.)
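
The claims in Example 1.5.2 can be checked mechanically for $n = 3$. The following is a minimal sketch, not part of the original text: it assumes an encoding (chosen here purely for illustration) in which an $L_k$-model is a pair `(world, clouds)`, a world is a tuple of booleans, and sentences are nested tuples; the names `build`, `holds` and `K` are hypothetical helpers.

```python
def build(world, k):
    """Return M_k(M_0) as (world, clouds); clouds[i] is the cloud of agent K_{i+1}."""
    if k == 0:
        return (world, None)  # an L_0-model carries no clouds
    clouds = []
    for i in range(len(world)):
        minus = world[:i] + (False,) + world[i + 1:]  # M_{0,i}^-
        plus = world[:i] + (True,) + world[i + 1:]    # M_{0,i}^+
        clouds.append([build(minus, k - 1), build(plus, k - 1)])
    return (world, clouds)


def holds(model, sentence):
    """Evaluate a sentence whose nesting depth does not exceed the model's depth."""
    world, clouds = model
    tag = sentence[0]
    if tag == "atom":
        return world[sentence[1]]
    if tag == "not":
        return not holds(model, sentence[1])
    if tag == "or":
        return any(holds(model, s) for s in sentence[1:])
    if tag == "K":
        # K_i(S) holds iff S holds in every world of the i-th cloud
        return all(holds(m, sentence[2]) for m in clouds[sentence[1]])
    raise ValueError(f"unknown connective: {tag}")


def K(i, s):
    """Sentence K_{i+1}(s), with agents indexed from 0."""
    return ("K", i, s)


B1 = ("or", ("atom", 0), ("atom", 1), ("atom", 2))  # A_1 v A_2 v A_3
M3 = build((True, True, True), 3)                   # all three islanders blue-eyed

print(holds(M3, B1))                                                        # True
print(all(holds(M3, K(i, K(j, B1))) for i in range(3) for j in range(3)))   # True
print(holds(M3, K(0, K(1, K(2, B1)))))                                      # False
```

The last line returns `False` because the triply nested cloud contains the world in which nobody has blue eyes, exactly as described above.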
We have the basic soundness and completeness properties:

Proposition 1.5.3. For each $k \geq 0$, $L_k$ is both sound and complete.


Proof. (Sketch) This is done by induction on $k$. For $k = 0$, this is just the
soundness and completeness of propositional logic. Now suppose inductively
that $k \geq 1$ and the claim has already been proven for $k - 1$. Soundness can
be verified as in the propositional logic case (with the validity of the $k$th
epistemic inference rule being justified by induction). For completeness,
one again uses the trick of passing to a maximal $L_k$-theory $\mathcal{T}$ that contains
one set $\mathcal{S}$ of sentences in $L_k$, but not another sentence $T$. This maximal
$L_k$-theory $\mathcal{T}$ uniquely determines an $L_0$-model $M_0$ by inspecting whether
each $A_i$ or its negation lies in the theory, and also determines $L_{k-1}$-theories
$\{ S \in L_{k-1} : K_i(S) \in \mathcal{T} \}$ for each $i = 1, \ldots, m$. By induction hypothesis,
each of these theories can be identified with a collection $\mathcal{M}_{k-1}^{(i)}$ of $L_{k-1}$-models,
thus creating an $L_k$-model $M_k$ that obeys $\mathcal{T}$ (and hence $\mathcal{S}$) but not $T$,
giving (the contrapositive of) completeness.

1.5.1. Arbitrary order epistemic logic. An easy induction shows that
the $k$th order logic $L_k$ extends the previous logic $L_{k-1}$, in the sense that
every sentence in $L_{k-1}$ is a sentence in $L_k$, every deduction in $L_{k-1}$ is also a
deduction in $L_k$, and every model of $L_k$ projects down (by forgetting some
aspects of the model) to a model of $L_{k-1}$. We can then form a limiting logic
$L_\omega$, whose language is the union of all the $L_k$ (thus, $S$ is a sentence in $L_\omega$ iff
$S$ lies in $L_k$ for some $k$), whose deductive implications are the union of all
the $L_k$ deductive implications (thus, $\mathcal{S} \vdash_{L_\omega} T$ if we have
$(\mathcal{S} \cap L_k) \vdash_{L_k} T$ for some $k$), and whose models are the inverse limits
of the $L_k$ models (thus, a model $M_\omega$ of $L_\omega$ is an infinite sequence of models
$M_k$ of $L_k$ for each $k$, such that each $M_k$ projects down to $M_{k-1}$ for $k \geq 1$).
It is not difficult to see that the soundness and completeness of each of the $L_k$
implies the soundness and completeness of the limit $L_\omega$ (assuming the axiom of
choice, of course, in our metamathematics).
Remark 1.5.4. These models $M_\omega$ are not quite the usual models of $L_\omega$ one
sees in the literature, namely Kripke models; roughly speaking, the models
here are those Kripke models which are "well-founded" in some sense, in that
they emerge from a hierarchical construction. Conversely, a Kripke model in
our notation would be a collection $\mathcal{W}$ of worlds, with each world $W$ in $\mathcal{W}$
associated with an $L_0$-model $W_0$, as well as sets $\mathcal{M}_{K,\omega}(W) \subset \mathcal{W}$ for each
knowledge agent $K$, describing all the worlds in $\mathcal{W}$ that $K$ considers possible
in $W$. Such models can be shown to be identifiable (in the sense that they
give equivalent semantics) with the models described hierarchically as the
inverse limits of finite depth models, but we will not detail this here.

The logic $L_\omega$ now allows one to talk about arbitrarily deeply nested
strings of knowledge: if $S$ is a sentence in $L_\omega$, and $K$ is a knowledge agent,
then $K(S)$ is also a sentence in $L_\omega$. This allows for the following definition:
Definition 1.5.5 (Common knowledge). If $S$ is a sentence in $L_\omega$, then
$C(S)$ is the set of all sentences of the form
$K_{i_1}(K_{i_2}(\ldots(K_{i_k}(S))\ldots))$
where $k \geq 0$ and $K_{i_1}, \ldots, K_{i_k}$ are knowledge agents (possibly with repetition).
Thus, for instance, using the epistemic inference rules, every tautology
in $L_\omega$ is commonly known as such: if $\vdash_{L_\omega} S$, then $\vdash_{L_\omega} C(S)$.
Let us now work in the islander model in which there are $n$ atomic
propositions $A_1, \ldots, A_n$ and $n$ knowledge agents $K_1, \ldots, K_n$. To model the
statement that it is commonly known that each islander knows each other
islander's eye colour, one can use the sets of sentences
(1.2)    $C(A_i \implies K_j(A_i))$
and
(1.3)    $C(\neg A_i \implies K_j(\neg A_i))$
for all distinct $i, j \in \{1, \ldots, n\}$.
For any $0 \leq l \leq n$, let $B_{\geq l}$ denote the sentence that there are at least $l$
blue-eyed islanders; this can be encoded as a suitable finite combination of
the $A_1, \ldots, A_n$. For instance, $B_{\geq 0}$ can be expressed by any tautology, $B_{\geq 1}$
can be expressed by $A_1 \vee \ldots \vee A_n$, $B_{\geq n}$ can be expressed by $A_1 \wedge \ldots \wedge A_n$,
and intermediate $B_{\geq l}$ can be expressed by more complicated formulae. Let
$B_l$ denote the statement that there are exactly $l$ blue-eyed islanders; for
instance, if $n = 3$, then $B_1$ can be expressed as
$(A_1 \wedge \neg A_2 \wedge \neg A_3) \vee (\neg A_1 \wedge A_2 \wedge \neg A_3) \vee (\neg A_1 \wedge \neg A_2 \wedge A_3)$.
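
The "more complicated formulae" for the intermediate cases can be written down explicitly. One possible encoding is sketched below; the index set $J$ is notation introduced here for illustration and does not appear in the original text.

```latex
B_{\geq l} \;=\; \bigvee_{\substack{J \subset \{1,\ldots,n\} \\ |J| = l}} \;\bigwedge_{i \in J} A_i,
\qquad
B_l \;=\; B_{\geq l} \wedge \neg B_{\geq l+1}.
```

Here the empty conjunction (the case $l = 0$) is read as a tautology, and $B_{\geq n+1}$ is an empty disjunction, read as a falsity. For example, with $n = 3$ this gives $B_{\geq 2} = (A_1 \wedge A_2) \vee (A_1 \wedge A_3) \vee (A_2 \wedge A_3)$.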
The following theorem asserts, roughly speaking, that if there are $m$
blue-eyed islanders, and it is commonly known that there are at least $l$
blue-eyed islanders, then all blue-eyed islanders can deduce their own eye colour
if $m \leq l$, but not otherwise.
Theorem 1.5.6. Let $\mathcal{T}$ be the set of sentences consisting of the union of
(1.2) and (1.3) for all distinct $i, j \in \{1, \ldots, n\}$. Let $0 \leq m, l \leq n$. Let $S$
denote the sentence
$S = \bigwedge_{i=1}^n (A_i \implies K_i(A_i))$
(informally, $S$ asserts that all blue-eyed islanders know their own eye colour).
(1) If $m \leq l$, then
$\mathcal{T}, B_m, C(B_{\geq l}) \vdash_{L_\omega} S$.
(2) If $m > l$, then
$\mathcal{T}, B_m, C(B_{\geq l}) \not\vdash_{L_\omega} S$.
Proof. The first part of the theorem can be established informally as follows:
if $B_m$ holds, then each blue-eyed islander sees $m - 1$ other blue-eyed
islanders, but also knows that there are at least $l$ blue-eyed islanders. If
$m \leq l$, this forces each blue-eyed islander to conclude that his or her own
eyes are blue (and in fact if $m < l$, the blue-eyed islander's knowledge is now
inconsistent, but the conclusion is still valid thanks to ex falso quodlibet). It
is a routine matter to formalise this argument using the axioms (1.2), (1.3)
and the epistemic inference rule; we leave the details as an exercise.
To prove the second part, it suffices (by soundness) to construct an
$L_\omega$-model $M_\omega$ which satisfies $\mathcal{T}$, $B_m$, and $C(B_{\geq l})$ but not $S$. By definition of
an $L_\omega$-model, it thus suffices to construct, for all sufficiently large natural
numbers $k$, an $L_k$-model $M_k$ which satisfies $\mathcal{T} \cap L_k$, $B_m$, and $C(B_{\geq l}) \cap L_k$,
but not $S$, and which are consistent with each other in the sense that each
$M_k$ is the restriction of $M_{k+1}$ to $L_k$.
We can do this by a modification of the construction in Example 1.5.2.
For any $L_0$-model $M_0$, we can recursively define an $L_k$-model $M_{k,\geq l}(M_0)$ for
any $k \geq 0$ by setting $M_{0,\geq l}(M_0) := M_0$, and then for each $k \geq 1$, setting
$M_{k,\geq l}(M_0)$ to be the $L_k$-model with $L_0$-model $M_0$, and with possible worlds
$\mathcal{M}_{k-1}^{(i)}$ given by
$\mathcal{M}_{k-1}^{(i)} := \{ M_{k-1,\geq l}(M_{0,i}) : M_{0,i} \in \{ M_{0,i}^-, M_{0,i}^+ \};\ M_{0,i} \models_{L_0} B_{\geq l} \}$;
this is the same construction as in Example 1.5.2, except that at all levels of
the recursive construction, we restrict attention to worlds that obey $B_{\geq l}$. A
routine induction shows that the $M_{k,\geq l}(M_0)$ determine a limit $M_{\omega,\geq l}(M_0)$,
which is an $L_\omega$-model that obeys $\mathcal{T}$ and $C(B_{\geq l})$. If $M_0 \models_{L_0} B_m$, then
clearly $M_{\omega,\geq l}(M_0) \models_{L_\omega} B_m$ as well. But if $m > l$, then we see that
$M_{\omega,\geq l}(M_0) \not\models_{L_\omega} S$, because for any index $i$ with $M_0 \models_{L_0} A_i$, we see
that if $k \geq 1$, then $\mathcal{M}_{k-1}^{(i)}(M_0)$ contains worlds in which $A_i$ is false, and
so $M_{k,\geq l}(M_0) \not\models_{L_k} K_i(A_i)$ for any $k \geq 1$.
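
Since $S$ only involves one level of knowledge, its truth in the restricted models $M_{k,\geq l}(M_0)$ depends only on the first-level clouds, and can be tested by brute force for small $n$. The following is a rough sketch under that observation; the function names `cloud`, `S_holds` and `check` are hypothetical, and the code checks only the semantic behaviour of this particular family of models, not derivability itself.

```python
from itertools import product


def cloud(world, i, l):
    """Worlds agent K_{i+1} considers possible: vary A_{i+1}, keep those obeying B_{>=l}."""
    minus = world[:i] + (False,) + world[i + 1:]
    plus = world[:i] + (True,) + world[i + 1:]
    return [w for w in (minus, plus) if sum(w) >= l]


def S_holds(world, l):
    """S: every blue-eyed islander knows that his or her own eyes are blue."""
    return all(
        all(w[i] for w in cloud(world, i, l))
        for i in range(len(world))
        if world[i]
    )


def check(n_max=4):
    """Compare the truth of S with the condition m <= l over all small cases."""
    for n in range(1, n_max + 1):
        for world in product([False, True], repeat=n):
            m = sum(world)  # number of blue-eyed islanders in this world
            for l in range(n + 1):
                if S_holds(world, l) != (m <= l):
                    return False
    return True


print(check())  # True
```

When $m < l$ the real world itself violates $B_{\geq l}$, so the clouds of the blue-eyed islanders are empty and $S$ holds vacuously; this is the inconsistent case mentioned in the first part of the proof.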

1.5.2. Temporal epistemic logic. The epistemic logic discussed above is
sufficiently powerful to model the knowledge environment of the islanders
in the blue-eyed islander puzzle at a single instant in time, but in order to
fully model the islander puzzle, we must now incorporate the role of
time. To avoid confusion, I feel that this is best accomplished by adopting
a spacetime perspective, in which time is treated as another coordinate
rather than having any particularly privileged role in the theory, and the
model incorporates all time-slices of the system at once. In particular, if we
allow the time parameter $t$ to vary along some set $T$ of times, then each actor
$I_i$ in the model should now generate not just a single knowledge agent $K_i$,
but instead a family $(K_{i,t})_{t \in T}$ of knowledge agents, one for each time $t \in T$.
Informally, $K_{i,t}(S)$ should then model the assertion that $I_i$ knows $S$ at time
$t$. This of course leads to many more knowledge agents than before; if for
instance one considers an islander puzzle with $n$ islanders over $M$ distinct
points in time, this would lead to $nM$ distinct knowledge agents $K_{i,t}$. And
if the set of times $T$ is countably or uncountably infinite, then the number
of knowledge agents would similarly be countably or uncountably infinite.
Nevertheless, there is no difficulty extending the previous epistemic logics
$L_k$ and $L_\omega$ to cover this situation. In particular we still have a complete
and sound logical framework to work in.
Note that if we do so, we allow for the ability to nest knowledge operators
at different times in the past or future. For instance, if we have three times
$t_1 < t_2 < t_3$, one could form a sentence such as
$K_{1,t_2}(K_{2,t_1}(S))$,
which informally asserts that at time $t_2$, $I_1$ knows that $I_2$ already knew $S$
to be true by time $t_1$, or
$K_{1,t_2}(K_{2,t_3}(S))$,
which informally asserts that at time $t_2$, $I_1$ knows that $I_2$ will know $S$ to be
true by time $t_3$. The ability to know certain statements about the future is
not too relevant for the blue-eyed islander puzzle, but is a crucial point in
the unexpected hanging paradox.
Of course, with so many knowledge agents present, the models become
more complicated; a model $M_k$ of $L_k$ now must contain inside it clouds
$\mathcal{M}_{k-1}^{(i,t)}$ of possible worlds for each actor $I_i$ and each time $t \in T$.
One reasonable axiom to add to a temporal epistemological system is
the ability of agents to remember what they know. More precisely, we can
impose the memory axiom
(1.4)    $C(K_{i,t}(S) \implies K_{i,t'}(S))$
for any $S \in L_\omega$, any $i = 1, \ldots, m$, and any $t < t'$. (This axiom is important
for the blue-eyed islander puzzle, though it turns out not to be relevant for
the unexpected hanging paradox.)
We can also define a notion of common knowledge at a single time $t \in T$:
given a sentence $S \in L_\omega$, we let $C_t(S)$ denote the set of sentences of the
form
$K_{i_1,t}(K_{i_2,t}(\ldots(K_{i_k,t}(S))\ldots))$
where $k \geq 0$ and $i_1, \ldots, i_k \in \{1, \ldots, n\}$. This is a subset of $C(S)$, which is
the set of all sentences of the form
$K_{i_1,t_1}(K_{i_2,t_2}(\ldots(K_{i_k,t_k}(S))\ldots))$
where $t_1, \ldots, t_k \in T$ can vary arbitrarily in $T$.
1.5.3. The blue-eyed islander puzzle. Now we can model the blue-eyed
islander puzzle. To simplify things a bit, we will work with a discrete set
of times $T = \mathbb{Z}$ indexed by the integers, with 0 being the day in which the
foreigner speaks, and any other time t being the time t days after (or before,
if t is negative) the foreigner speaks. (One can also work with a continuous
time with only minor changes.) Note the presence of negative time; this
is to help resolve the question (which often comes up in discussion of this
puzzle) as to whether the islanders would already have committed suicide
even before the foreigner speaks.
Also, the way the problem is set up, we have the somewhat notationally
annoying difficulty that once an islander commits suicide, it becomes meaningless to ask whether that islander continues to know anything or not. To
resolve this problem, we will take the liberty of modifying the problem by
replacing suicide with a non-lethal public ritual. (This means (thanks to
(1.4)) that once an islander learns his or her own eye colour, he or she will
be condemned to repeating this ritual suicide every day from that point.)
It is possible to create a logic which tracks when different agents are alive
or dead and to thus model the concept of suicide, but this is something of a
distraction from the key point of the puzzle, so we will simply redefine away
this issue.
For similar reasons, we will not concern ourselves with eye colours other
than blue, and only consider suicides stemming from blue eyes, rather than
from any non-blue colour. (It is intuitively obvious, and can eventually
be proven, that the foreigner's statement about the existence of blue-eyed
islanders is insufficient information to allow any islander to distinguish
between, say, green eyes and brown eyes, and so this statement cannot trigger
the suicide of any non-blue-eyed person.)
As in previous sections, our logic will have the atomic propositions
$A_1, \ldots, A_n$, with each $A_i$ expressing the statement that $I_i$ has blue eyes,
as well as knowledge agents $K_{i,t}$ for each $i = 1, \ldots, n$ and $t \in \mathbb{Z}$. However,
we will also need further atomic propositions $S_{i,t}$ for $i = 1, \ldots, n$ and $t \in \mathbb{Z}$,
which denote the proposition that $I_i$ commits suicide (or a ritual equivalent)
at time $t$. Thus we now have a countably infinite number of atomic
propositions and a countably infinite number of knowledge agents, but there is
little difficulty extending the logics $L_k$ and $L_\omega$ to cover this setting.

We can now set up the various axioms for the puzzle. The "highly
logical" axiom has already been subsumed in the epistemological inference
rule. We also impose the memory axiom (1.4). Now we formalise the other
assumptions of the puzzle:
• (All islanders see each other's eye colour) If $i, j \in \{1, \ldots, n\}$ are
distinct and $t \in \mathbb{Z}$, then
(1.5)    $C(A_i \implies K_{j,t}(A_i))$
and
(1.6)    $C(\neg A_i \implies K_{j,t}(\neg A_i))$.
• (Anyone who learns their own eye colour is blue, must commit
suicide the next day) If $i \in \{1, \ldots, n\}$ and $t \in \mathbb{Z}$, then
(1.7)    $C(K_{i,t}(A_i) \implies S_{i,t+1})$.
• (Suicides are public) For any $i \in \{1, \ldots, n\}$, $t \in \mathbb{Z}$, and
$S'_{i,t} \in C_t(S_{i,t})$, we have
(1.8)    $C(S_{i,t} \implies S'_{i,t})$.
Similarly, if $S''_{i,t} \in C_t(\neg S_{i,t})$, then
(1.9)    $C(\neg S_{i,t} \implies S''_{i,t})$.
• (Foreigner announces in public on day 0 that there is at least one
blue-eyed islander) We have
(1.10)    $C_0(B_{\geq 1})$.

Let $\mathcal{T}$ denote the union of all the axioms (1.4), (1.5), (1.6), (1.7), (1.8),
(1.9), (1.10). The solution to the islander puzzle can then be summarised as
follows:
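Theorem 1.5.7. Let $1 \leq m \leq n$.
(1) (At least one blue-eyed islander commits suicide by day $m$)
$\mathcal{T}, B_m \vdash_{L_\omega} \bigvee_{i=1}^n \bigvee_{t=1}^m (A_i \wedge S_{i,t})$.
(2) (Nobody needs to commit suicide before day $m$) For any $t < m$ and
$1 \leq i \leq m$,
$\mathcal{T}, B_m \not\vdash_{L_\omega} S_{i,t}$.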
Note that the first conclusion is weaker than the conventional solution to
the puzzle, which asserts in fact that all $m$ blue-eyed islanders will commit
suicide on day $m$. While this is indeed the "default" outcome of the hypotheses
$\mathcal{T}, B_m$, it turns out that this is not the only possible outcome; for instance, if
one blue-eyed person happens to commit suicide on day 0 or day 1 (perhaps
for a reason unrelated to learning his or her own eye colour), then it
turns out that this "cancels" the effect of the foreigner's announcement, and
prevents further suicides. (So, if one were truly nitpicky, the conventional
solution is not always correct, though one could also find similar loopholes
to void the solution to most other logical puzzles, if one tried hard enough.)
In fact there is a strengthening of the first conclusion: given the
hypotheses $\mathcal{T}, B_m$, there must exist a time $1 \leq t \leq m$ and $t$ distinct islanders
$I_{i_1}, \ldots, I_{i_t}$ such that $A_{i_j} \wedge S_{i_j,t}$ holds for all $j = 1, \ldots, t$.
Note that the second conclusion does not prohibit the existence of some
models of $\mathcal{T}, B_m$ in which suicides occur before day $m$ (consider for instance
a situation in which a second foreigner made a similar announcement a few
days before the first one, causing the chain of events to start at an earlier
point and leading to earlier suicides).
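
The default outcome, in which no unrelated suicides interfere, can be simulated with a standard possible-worlds elimination argument. The sketch below is not the hierarchical construction used in this section; it is a simplified Kripke-style simulation under the assumptions that the announcement on day 0 makes "at least one blue-eyed islander" common knowledge, that an islander who knows $A_i$ at time $t$ performs the ritual at time $t + 1$, and that the absence of a ritual is publicly observed. The function names are hypothetical.

```python
from itertools import product


def knowers(world, worlds):
    """Blue-eyed islanders who can rule out the world in which their own eyes are not blue."""
    result = set()
    for i in range(len(world)):
        flipped = world[:i] + (not world[i],) + world[i + 1:]
        if world[i] and flipped not in worlds:
            result.add(i)
    return frozenset(result)


def first_ritual_day(blue):
    """Return (day, islanders) for the first ritual in the default scenario."""
    actual = tuple(blue)
    if not any(actual):
        raise ValueError("the announcement would be false: nobody has blue eyes")
    n = len(actual)
    # Worlds compatible with the public announcement B_{>=1} on day 0.
    worlds = {w for w in product([False, True], repeat=n) if any(w)}
    t = 0
    while True:
        acting = knowers(actual, worlds)
        if acting:
            return t + 1, sorted(acting)  # knowledge at time t triggers S_{i,t+1}
        # Nobody acts on day t + 1; discard the worlds in which somebody would have.
        worlds = {w for w in worlds if not knowers(w, worlds)}
        t += 1


print(first_ritual_day((True, True, True)))   # (3, [0, 1, 2])
print(first_ritual_day((True, False, True)))  # (2, [0, 2])
```

In this idealised scenario all $m$ blue-eyed islanders act together on day $m$, which matches the conventional solution; the loopholes discussed above arise precisely when the real sequence of events departs from these assumptions.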
Proof. (Sketch) To illustrate the first part of the theorem, we focus on the
simple case $m = n = 2$; the general case is similar but requires more notation
(and an inductive argument). It suffices to establish that
$\mathcal{T}, B_2, \neg S_{1,1}, \neg S_{2,1} \vdash_{L_\omega} S_{1,2} \wedge S_{2,2}$
(i.e. if nobody suicides by day 1, then both islanders will suicide on day 2).
Assume $\mathcal{T}, B_2, \neg S_{1,1}, \neg S_{2,1}$. From (1.10) we have
$K_{1,0}(K_{2,0}(A_1 \vee A_2))$
and hence by (1.4)
$K_{1,1}(K_{2,0}(A_1 \vee A_2))$.
By (1.6) we also have
$K_{1,1}(\neg A_1 \implies K_{2,0}(\neg A_1))$
whereas from the epistemic inference axioms we have
$K_{1,1}((K_{2,0}(A_1 \vee A_2) \wedge K_{2,0}(\neg A_1)) \implies K_{2,0}(A_2))$.
From the epistemic inference axioms again, we conclude that
$K_{1,1}(\neg A_1 \implies K_{2,0}(A_2))$
and hence by (1.7) (and epistemic inference)
$K_{1,1}(\neg A_1 \implies S_{2,1})$.
On the other hand, from $\neg S_{2,1}$ and (1.9) we have
$K_{1,1}(\neg S_{2,1})$
and hence by epistemic inference
$K_{1,1}(A_1)$
and thus by (1.7)
$S_{1,2}$.
A similar argument gives $S_{2,2}$, and the claim follows.
To prove the second part, one has to construct, for each $k$, an $L_k$-model
in which $\mathcal{T}, B_m$ is true and $S_{i,t}$ is false for any $1 \leq i \leq n$ and $t < m$. This
is remarkably difficult, in large part due to the ability of nested knowledge
operators to jump backwards and forwards in time. In particular, one can
jump backwards to before Day 0, and so one must first model worlds in
which there is no foreigner announcement. We do this as follows. Given an
$L_0$-model $M_0$, we recursively define an $L_k$-model $M_k(M_0)$ for $k = 0, 1, 2, \ldots$
as follows. Firstly, $M_0(M_0) := M_0$. Next, if $k \geq 1$ and $M_{k-1}()$ has already
been defined, we define $M_k(M_0)$ to be the $L_k$-model with $L_0$-model $M_0$,
and for any $i = 1, \ldots, n$ and $t \in \mathbb{Z}$, setting $\mathcal{M}_{k-1}^{(i,t)}(M_0)$ to be the set of all
$L_{k-1}$-models of the form $M_{k-1}(M'_0)$, where $M'_0$ is an $L_0$-model obeying the
following properties:
• ($I_i$ sees other islanders' eyes) If $j \in \{1, \ldots, n\}$ and $j \neq i$, then
$M'_0 \models_{L_0} A_j$ iff $M_0 \models_{L_0} A_j$.
• ($I_i$ remembers suicides) If $j \in \{1, \ldots, n\}$ and $t' \leq t$, then
$M'_0 \models_{L_0} S_{j,t'}$ iff $M_0 \models_{L_0} S_{j,t'}$.
Now we model worlds in which there is a foreigner announcement. Define
an admissible $L_0$-model to be an $L_0$-model $M_0$ such that there exist
$1 \leq t \leq m$ for which the following hold:
• $M_0 \models_{L_0} B_m$ (i.e. there are exactly $m$ blue-eyed islanders in the
world $M_0$).
• There exist distinct $i_1, \ldots, i_t \in \{1, \ldots, n\}$ such that $M_0 \models_{L_0} A_{i_j}$
and $M_0 \models_{L_0} S_{i_j,t}$ for all $j = 1, \ldots, t$.
• For any $i \in \{1, \ldots, n\}$ and $t' \in \mathbb{Z}$, $M_0 \models_{L_0} S_{i,t'}$ implies
$M_0 \models_{L_0} S_{i,t'+1}$.
We call $m$ the blue-eyed count of $M_0$.
(Such models, incidentally, can already be used to show that no suicides
necessarily occur in the absence of the foreigner's announcement, because
the limit $M_\omega(M_0)$ of such models always obeys all the axioms of $\mathcal{T}$ except
for (1.10).)
Given an admissible $L_0$-model $M_0$ of some blue-eyed count $m$, we
recursively define an $L_k$-model $\tilde M_k(M_0)$ for $k = 0, 1, 2, \ldots$ by setting
$\tilde M_0(M_0) := M_0$, then if $k \geq 1$ and $\tilde M_{k-1}()$ has already been defined, we
define $\tilde M_k(M_0)$ to be the $L_k$-model with $L_0$-model $M_0$, and with
$\mathcal{M}_{k-1}^{(i,t)}(M_0)$ for $i = 1, \ldots, n$ and $t \in \mathbb{Z}$ defined by the following rules:
Case 1. If $t < 0$, then we set $\mathcal{M}_{k-1}^{(i,t)}(M_0)$ to be the set of all $L_{k-1}$-models
of the form $M_{k-1}(M'_0)$, where $M'_0$ obeys the two properties "$I_i$ sees other
islanders' eyes" and "$I_i$ remembers suicides" from the preceding construction.
($M'_0$ does not need to be admissible in this case.)
Case 2. If $t = m - 1$, $M_0 \models A_i$, and there do not exist $1 \leq t' \leq t$ and
distinct $i_1, \ldots, i_{t'} \in \{1, \ldots, n\}$ such that $M_0 \models_{L_0} A_{i_j} \wedge S_{i_j,t'}$ for all
$j = 1, \ldots, t'$, then we set $\mathcal{M}_{k-1}^{(i,t)}(M_0)$ to be the set of all $L_{k-1}$-models of the
form $\tilde M_{k-1}(M'_0)$, where $M'_0$ is admissible, obeys the two properties "$I_i$ sees
other islanders' eyes" and "$I_i$ remembers suicides" from the preceding
construction, and also obeys the additional property $M'_0 \models A_i$. (Informally,
this is the case in which $I_i$ "must learn" $A_i$.)
Case 3. In all other cases, we set $\mathcal{M}_{k-1}^{(i,t)}(M_0)$ to be the set of all
$L_{k-1}$-models of the form $\tilde M_{k-1}(M'_0)$, where $M'_0$ is admissible and obeys the two
properties "$I_i$ sees other islanders' eyes" and "$I_i$ remembers suicides" from
the preceding construction.
We let $\tilde M_\omega(M_0)$ be the limit of the $\tilde M_k(M_0)$ (which can easily be verified
to exist by induction). A quite tedious verification reveals that for any
admissible $L_0$-model $M_0$ of blue-eyed count $m$, $\tilde M_\omega(M_0)$ obeys both $\mathcal{T}$
and $B_m$, but one can choose $M_0$ to not admit any suicides before time $m$,
which will give the second claim of the theorem.

Remark 1.5.8. Under the assumptions used in our analysis, we have shown
that it is inevitable that the foreigner's comment will cause at least one
death. However, it is possible to avert all deaths by breaking one or more
of the assumptions used. For instance, if it is possible to sow enough doubt
in the islanders' minds about the logical and devout nature of the other
islanders, then one can cause a breakdown of the epistemic inference rule
or of (1.7), and this can prevent the chain of deductions from reaching its
otherwise deadly conclusion.
Remark 1.5.9. The same argument actually shows that $L_\omega$ can be replaced
by $L_m$ for the first part of Theorem 1.5.7 (after restricting the definition of
common knowledge to those sentences that are actually in $L_m$, of course).
On the other hand, using $L_k$ for $k < m$, one can show that this logic is
insufficient to deduce any suicides if there are $m$ blue-eyed islanders, by
using the model $\tilde M_k(M_0)$ defined above; we omit the details.
1.5.4. The unexpected hanging paradox. We now turn to the unexpected hanging paradox, and try to model it using (temporal) epistemic logic.
Here is a common formulation of the paradox (taken from the Wikipedia
entry on this problem):

Problem 1.5.10. A judge tells a condemned prisoner that he will be hanged
at noon on one weekday in the following week but that the execution will
be a surprise to the prisoner. He will not know the day of the hanging until
the executioner knocks on his cell door at noon that day.
Having reflected on his sentence, the prisoner draws the conclusion that
he will escape from the hanging. His reasoning is in several parts. He begins
by concluding that the "surprise hanging" can't be on Friday, as if he hasn't
been hanged by Thursday, there is only one day left - and so it won't be a
surprise if he's hanged on Friday. Since the judge's sentence stipulated that
the hanging would be a surprise to him, he concludes it cannot occur on
Friday.
He then reasons that the surprise hanging cannot be on Thursday either,
because Friday has already been eliminated and if he hasn't been hanged by
Wednesday night, the hanging must occur on Thursday, making a Thursday
hanging not a surprise either. By similar reasoning he concludes that the
hanging can also not occur on Wednesday, Tuesday or Monday. Joyfully he
retires to his cell confident that the hanging will not occur at all. The next
week, the executioner knocks on the prisoner's door at noon on Wednesday
which, despite all the above, was an utter surprise to him. Everything the
judge said came true.
It turns out that there are several, not quite equivalent, ways to model
this "paradox" epistemologically, with the differences hinging on how one
interprets what "unexpected" or "surprise" means. In particular, if $S$ is a
sentence and $K$ is a knowledge agent, how would one model the sentence
"$K$ does not expect $S$" or "$K$ is surprised by $S$"?
One possibility is to model this sentence as
(1.11)    $\neg K(S)$,
i.e. as the assertion that $K$ does not know $S$ to be true. However, this leads
to the following situation: if $K$ has inconsistent knowledge (in particular, one
has $K(\bot)$, where $\bot$ represents falsity (the negation of a tautology)), then by
ex falso quodlibet, $K(S)$ would be true for every $S$, and hence $K$ would expect
everything and be surprised by nothing. An alternative interpretation, then,
is to adopt the convention that an agent with inconsistent knowledge is so
confused as to not be capable of expecting anything (and thus be surprised
by everything). In this case, "$K$ does not expect $S$" should instead be
modeled as
(1.12)    $(\neg K(S)) \vee K(\bot)$,
i.e. that $K$ either does not know $S$ to be true, or is inconsistent.

Both interpretations (1.11), (1.12) should be compared with the sentence
(1.13)    $K(\neg S)$,
i.e. that $K$ knows that $S$ is false. If $K$ is consistent, (1.13) implies (1.11),
but if $K$ is inconsistent then (1.13) is true and (1.11) is false. In either case,
though, we see that (1.13) implies (1.12).
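
These three relationships can be confirmed by enumerating every possible cloud of worlds for a single agent and a single atomic sentence $S$. The following is a small sketch under that simplification (one atom, one level of knowledge); the names `knows` and `table` are hypothetical.

```python
from itertools import combinations

WORLDS = (False, True)  # truth value of the atomic sentence S in each possible world


def knows(cloud, pred):
    """K(P) holds iff P holds in every world the agent considers possible."""
    return all(pred(w) for w in cloud)


def table():
    """One row per cloud: (cloud, inconsistent, (1.11), (1.12), (1.13))."""
    rows = []
    for size in range(len(WORLDS) + 1):
        for cloud in combinations(WORLDS, size):
            k_s = knows(cloud, lambda w: w)
            inconsistent = knows(cloud, lambda w: False)  # K(falsity): empty cloud
            s11 = not k_s                                 # (1.11)
            s12 = (not k_s) or inconsistent               # (1.12)
            s13 = knows(cloud, lambda w: not w)           # (1.13)
            rows.append((cloud, inconsistent, s11, s12, s13))
    return rows


for row in table():
    print(row)
```

Only the empty cloud is inconsistent, and it is the only row in which (1.13) holds while (1.11) fails.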
Let us now analyse the unexpected hanging paradox using the former
interpretation (1.11) of surprise. We begin with the simplest (and somewhat
degenerate) situation, in which there is only one time (say Monday at noon)
in which the hanging is to take place. In this case, there is just one
knowledge agent $K$ (the knowledge of the prisoner after the judge speaks, but
before the execution date of Monday at noon). We introduce an atomic
sentence $E$, representing the assertion that the prisoner will be hanged on
Monday at noon. In this case (and using the former interpretation (1.11) of
surprise), the judge's remarks can be modeled by the sentence
$S := E \wedge (E \implies \neg K(E))$.
The "paradox" in this case stems from the following curious fact:
Theorem 1.5.11.
(1) There exist $L_\omega$-models in which $S$ is true.
(2) There exist $L_\omega$-models in which $K(S)$ is true.
(3) However, there does not exist any $L_\omega$-model in which both $S$ and
$K(S)$ are true.
Thus, the judge's statement can be true, but if so, it is not possible for
the prisoner to know this! (In this regard, the sentence $S$ is analogous to a
Gödel sentence, which can be true in models of a formal system, but not
provable in that system.) More informally: knowing a surprise ruins that
surprise.
Proof. The third statement is easy enough to establish: if $S$ is true in some
model, then clearly $\neg K(E)$ is true in that model; but if $K(S)$ is true in the
same model, then (by epistemic inference) $K(E)$ will be true as well, which
is a contradiction.
The first statement is also fairly easy to establish. We have two
$L_0$-models; a model $M_0^+$ in which $E$ is true, and a model $M_0^-$ in which $E$
is false. We can recursively define the $L_k$-model $M_k(M_0)$ for any $k \geq 0$
and any $M_0 \in \{M_0^+, M_0^-\}$ by setting $M_0(M_0) := M_0$, and for $k \geq 1$,
setting $M_k(M_0)$ to be the $L_k$-model with $L_0$-model $M_0$, and with
$\mathcal{M}_{k-1} := \{M_{k-1}(M_0^+), M_{k-1}(M_0^-)\}$. One then easily verifies that the $M_k(M_0)$ have a
limit $M_\omega(M_0)$, and that $M_\omega(M_0^+)$ models $S$ (but not $K(S)$, of course).
A trivial way to establish the second statement is to make a model in
which $K$ is inconsistent (thus $\mathcal{M}_{k-1}$ is empty). One can also take $\mathcal{M}_{k-1}$ to
consist of the single model $M_{k-1}(M_0^+)$, and this will also work. (Of course, in
such models, $S$ must be false.)
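
The three models used in this proof are small enough to evaluate directly. The sketch below assumes a toy encoding (a model is a pair of the truth value of $E$ and a list of lower-depth models for the single agent $K$); the names `holds`, `full`, `surprised`, `insane` and `certain` are introduced here for illustration only, and only depth 2 is examined rather than the full limit.

```python
def holds(model, s):
    """model = (E, cloud): truth value of E, and the cloud of lower-depth models for K."""
    e, cloud = model
    tag = s[0]
    if tag == "E":
        return e
    if tag == "not":
        return not holds(model, s[1])
    if tag == "and":
        return holds(model, s[1]) and holds(model, s[2])
    if tag == "implies":
        return (not holds(model, s[1])) or holds(model, s[2])
    if tag == "K":
        return all(holds(m, s[1]) for m in cloud)
    raise ValueError(f"unknown connective: {tag}")


def full(e, k):
    """M_k(M_0) from the proof: the cloud contains both M_{k-1}(M_0^+) and M_{k-1}(M_0^-)."""
    if k == 0:
        return (e, None)
    return (e, [full(True, k - 1), full(False, k - 1)])


E = ("E",)
S = ("and", E, ("implies", E, ("not", ("K", E))))  # the judge's statement
KS = ("K", S)

surprised = full(True, 2)          # first statement: S holds, K(S) fails
insane = (True, [])                # second statement: K inconsistent (empty cloud)
certain = (True, [full(True, 1)])  # second statement: cloud is the single model M_1(M_0^+)

for name, model in [("surprised", surprised), ("insane", insane), ("certain", certain)]:
    print(name, holds(model, S), holds(model, KS))
# surprised True False
# insane False True
# certain False True
```

In none of the three models do $S$ and $K(S)$ hold together, in line with the third statement of the theorem.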

Another peculiarity of the sentence $S$ is that
$K(S), K(K(S)) \models_{L_\omega} K(\bot)$
as can be easily verified (by modifying the proof of the second statement
of the above theorem). Thus, the sentence $S$ has the property that if the
prisoner believes $S$, and also knows that he or she believes $S$, then the
prisoner's beliefs automatically become inconsistent - despite the fact that
$S$ is not actually a self-contradictory statement (unless also combined with
$K(S)$).
Now we move to the case when the execution could take place at two
possible times, say Monday at noon and Tuesday at noon. We then have
two atomic statements: $E_1$, the assertion that the execution takes place
on Monday at noon, and $E_2$, the assertion that the execution takes place
on Tuesday at noon. There are two knowledge agents; $K_1$, the state of
knowledge just before Monday at noon, and $K_2$, the state of knowledge just
before Tuesday at noon. (There is again the annoying notational issue that
if $E_1$ occurs, then presumably the prisoner will have no sensible state of
knowledge by Tuesday, and so $K_2$ might not be well defined in that case;
to avoid this irrelevant technicality, we replace the execution by some
non-lethal punishment (or use an alternate formulation of the puzzle, for instance
by replacing "an unexpected hanging" with "a surprise exam").)
We will need one axiom beyond the basic axioms of epistemic logic,
namely
(1.14)    $C(\neg E_1 \implies K_2(\neg E_1))$.
Thus, it is common knowledge that if the execution does not happen on
Monday, then by Tuesday, the prisoner will be aware of this fact. This
axiom should of course be completely non-controversial.
The judge's sentence, in this case, is given by
$S := (E_1 \vee E_2) \wedge (E_1 \implies \neg K_1(E_1)) \wedge (E_2 \implies \neg K_2(E_2))$.
Analogously to Theorem 1.5.11, we can find $L_\omega$-models obeying (1.14)
in which $S$ is true, but one cannot find models obeying (1.14) in which $S$,
$K_1(S)$, $K_2(S)$, and $K_1(K_2(S))$ are all true, as one can soon see that this
leads to a contradiction. Indeed, from $S$ one has
$\neg E_1 \implies \neg K_2(E_2)$
while from (1.14) one has
$\neg E_1 \implies K_2(\neg E_1)$
and from $K_2(S)$ one has
$K_2(E_1 \vee E_2)$
which shows that $\neg E_1$ leads to a contradiction, which implies $E_1$ and hence
$\neg K_1(E_1)$ by $S$. On the other hand, from $K_1(S)$ one has
$K_1(\neg E_1 \implies \neg K_2(E_2))$
while from (1.14) one has
$K_1(\neg E_1 \implies K_2(\neg E_1))$
and from $K_1(K_2(S))$ one has
$K_1(K_2(E_1 \vee E_2))$
which shows that $K_1(\neg E_1 \implies \bot)$, and thus $K_1(E_1)$, a contradiction. So,
as before, $S$ is a secret which can only be true as long as it is not too widely
known.
A slight variant of the above argument shows that if $K_1(S)$, $K_2(S)$,
and $K_1(K_2(S))$ hold, then $K_1(\neg E_1)$ and $\neg E_1 \implies K_2(\neg E_2)$ hold - or
informally, the prisoner can deduce using knowledge of $S$ (and knowledge of
knowledge of $S$) that there will be no execution on either date. This may
appear at first glance to be consistent with $S$ (which asserts that the prisoner
will be surprised when the execution does happen), but this is a confusion
between (1.11) and (1.13). Indeed, one can show under the assumptions
$K_1(S)$, $K_2(S)$, $K_1(K_2(S))$ that $K_1$ is inconsistent, and (if $\neg E_1$ holds) then
$K_2$ is also inconsistent, and so $K_1(\neg E_1)$ and $\neg E_1 \implies K_2(\neg E_2)$ do not, in
fact, imply $S$.
Now suppose that we interpret "surprise" using (1.12) instead of (1.11).
Let us begin first with the one-day setting. Now the judge's sentence becomes
$S = E \wedge (E \implies (\neg K(E) \vee K(\bot)))$.
In this case it is possible for $S$ and $K(S)$ to be true, and in fact for $S$
to be common knowledge, basically by making $K$ inconsistent. (A little
more precisely: we use the $L_k$-model $M_k$ where $M_0 = M_0^+$ and
$\mathcal{M}_{k-1} = \emptyset$.) Informally: the judge has kept the execution a surprise by
driving the prisoner insane with contradictions.
The situation is more interesting in the two-day setting (as first pointed
out by Kritchman and Raz [KrRa2010]), where $S$ is now
$S := (E_1 \vee E_2) \wedge (E_1 \implies (\neg K_1(E_1) \vee K_1(\bot))) \wedge (E_2 \implies (\neg K_2(E_2) \vee K_2(\bot)))$.
Here it is possible for $S$ to in fact be common knowledge in some
$L_\omega$-model, but in order for this to happen, at least one of the following three
statements must be true in this model:
• $K_1(\bot)$.
• $K_2(\bot)$.
• $\neg K_1(\neg K_2(\bot))$.
(We leave this as an exercise for the interested reader.) In other words, in
order for the judge's sentence to be common knowledge, either the prisoner's
knowledge on Monday or Tuesday needs to be inconsistent, or else the
prisoner's knowledge is consistent, but the prisoner is unable (on Monday) to
determine that his or her own knowledge (on Tuesday) is consistent. Notice
that the third conclusion here $\neg K_1(\neg K_2(\bot))$ is very reminiscent of Gödel's
second incompleteness theorem, and indeed in [KrRa2010], the surprise
examination argument is modified to give a rigorous proof of that theorem.
Remark 1.5.12. Here is an explicit example of an $L_\omega$-world in which $S$
is common knowledge, and $K_1$ and $K_2$ are both consistent (but $K_1$ does
not know that $K_2$ is consistent). We first define $L_k$-models $M_k$ for each
$k = 0, 1, \ldots$ recursively by setting $M_0$ to be the world in which $M_0 \models_{L_0} E_2$
and $M_0 \models_{L_0} \neg E_1$, and then define $M_k$ for $k \geq 1$ to be the $L_k$-model with
$L_0$-model $M_0$, with $\mathcal{M}_{k-1}^{(1)} := \{M_{k-1}\}$, and $\mathcal{M}_{k-1}^{(2)} := \emptyset$. (Informally: the
execution is on Tuesday, and the prisoner knows this on Monday, but has
become insane by Tuesday.) We then define the models $\tilde M_k$ for $k = 0, 1, \ldots$
recursively by setting $\tilde M_0$ to be the world in which $\tilde M_0 \models_{L_0} E_1$ and
$\tilde M_0 \models_{L_0} \neg E_2$, then define $\tilde M_k$ for $k \geq 1$ to be the $L_k$-model with
$L_0$-model $\tilde M_0$, $\mathcal{M}_{k-1}^{(1)} := \{M_{k-1}, \tilde M_{k-1}\}$, and $\mathcal{M}_{k-1}^{(2)} := \{\tilde M_{k-1}\}$.
(Informally: the execution is on Monday, but the prisoner only finds this out
after the fact.) The limit $\tilde M_\omega$ of the $\tilde M_k$ then has $S$ as common knowledge,
with $K_1(\bot)$ and $K_2(\bot)$ both false, but $K_1(\neg K_2(\bot))$ is also false.

Chapter 2

Group theory

2.1. Symmetry spending


Many problems in mathematics have the general form "For any object x in
the class X, show that the property P (x) is true". For instance, one might
need to prove an identity or inequality for all choices of parameters x (which
may be numbers, functions, sets, or other objects) in some parameter space
X.
In many cases, such problems enjoy invariance or closure properties with
respect to some natural symmetries, actions, or operations. For instance,
there might be an operation T that preserves X (so that if x is in X, then
T x is in X) and preserves P (so that if P (x) is true, then P (T x) is true).
Then, in order to verify the problem for T x, it suffices to verify the problem
for x.
Similarly, if X is closed under (say) addition, and P is also closed under
addition (thus if P (x) and P (y) is true, then P (x + y) is true), then to verify
the problem for x + y, it suffices to verify the problem for x and y separately.
Another common example of a closure property: if X is closed under
some sort of limit operation, and P is also closed under the same limit
operation (thus if x_n converges to x and P (x_n) is true for all n, then P (x) is
true), then to verify the problem for x, it suffices to verify the problem
for the x_n.
One can view these sorts of invariances and closure properties as problem-solving assets; in particular, one can spend these assets to reduce the class X
of objects x that one needs to solve the problem for. By doing so, one has to
give up the invariance or closure property that one spent; but if one spends
these assets wisely, this is often a favorable tradeoff. (And one can often
buy back these assets if needed by expanding the class of objects again (and
defining the property P in a sufficiently abstract and invariant fashion).)
For instance, if one needs to verify P (x) for all x in a normed vector
space X, and the property P (x) is homogeneous (so that, for any scalar
c, P (x) implies P (cx)), then we can spend this homogeneity invariance to
normalise x to have norm 1, thus effectively replacing X with the unit sphere
of X. Of course, this new space is no longer closed under homogeneity; we
have spent that invariance property. Conversely, to prove a property P (x)
for all x on the unit sphere, it is equivalent to prove P (x) for all x in X,
provided that one extends the definition of P(x) to X in a homogeneous
fashion.
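The homogeneity normalisation above can be illustrated with a small numerical sketch. The property chosen here (the inequality between the l^1 and l^2 norms) and all function names are assumptions made purely for illustration; the point is only that checking the property at cx for any non-zero scalar c gives the same answer as checking the normalised statement on the unit sphere.

```python
import math

def norm2(x):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(c * c for c in x))

def normalise(x):
    """Spend homogeneity: rescale a non-zero vector onto the unit sphere."""
    r = norm2(x)
    if r == 0:
        raise ValueError("the zero vector cannot be normalised")
    return [c / r for c in x]

def l1_bound(x, tol=1e-12):
    """Illustrative homogeneous property P(x): ||x||_1 <= sqrt(n) ||x||_2."""
    return sum(abs(c) for c in x) <= math.sqrt(len(x)) * norm2(x) + tol

def l1_bound_on_sphere(u, tol=1e-12):
    """The same property for a unit vector u, where ||u||_2 = 1 is assumed."""
    return sum(abs(c) for c in u) <= math.sqrt(len(u)) + tol

samples = [[3.0, 4.0], [1.0, -2.0, 2.0], [0.5, 0.0, -7.0, 2.0]]
for x in samples:
    for c in (0.1, 1.0, -25.0):
        scaled = [c * xi for xi in x]
        # P(cx) agrees with the normalised statement at x / ||x||
        assert l1_bound(scaled) == l1_bound_on_sphere(normalise(x))
```

Note that the normalised check no longer needs to compute the l^2 norm at all: this is the (small) saving purchased by spending the scaling symmetry.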
As a rule of thumb, each independent symmetry of the problem that one
has can be used to achieve one normalisation. Thus, for instance, if one has
a three-dimensional group of symmetries, one can expect to normalise three
quantities of interest to equal a nice value (typically one normalises to 0 for
additive symmetries, or 1 for multiplicative symmetries).
In a similar spirit, if the problem one is trying to solve is closed with
respect to an operation such as addition, then one can restrict attention to
all x in a suitable generating set of X, such as a basis. Many divide and
conquer strategies are based on this type of observation.
Or: if the problem one is trying to solve is closed with respect to limits,
then one can restrict attention to all x in a dense subclass of X. This is a
particularly useful trick in real analysis (using limiting arguments to replace
reals with rationals, sigma-compact sets with compact sets, rough functions
with nice functions, etc.). If one uses ultralimits instead of limits, this
type of observation leads to various useful correspondence principles between
finitary instances of the problem and infinitary ones (with the former serving
as a kind of dense subclass of the latter); see e.g. [Ta2012, §1.7].
Sometimes, one can exploit rather abstract or unusual symmetries. For
instance, certain types of statements in algebraic geometry tend to be insensitive to the underlying field (particularly if the fields remain algebraically
closed). This allows one to sometimes move from one field to another, for
instance from an infinite field to a finite one or vice versa; see [Ta2010b,
§1.2]. Another surprisingly useful symmetry is closure with respect to tensor
powers; see [Ta2008, §1.9].
Gauge symmetry is a good example of a symmetry which is both spent
(via gauge fixing) and bought (by reformulating the problem in a gauge-invariant fashion); see [Ta2009b, §1.4].


Symmetries also have many other uses beyond their ability to be spent
in order to obtain normalisation. For instance, they can be used to analyse a claim or argument for compatibility with that symmetry; generally
speaking, one should not be able to use a non-symmetric argument to prove
a symmetric claim (unless there is an explicit step where one spends the
symmetry in a strategic fashion). The useful tool of dimensional analysis is
perhaps the most familiar example of this sort of meta-analysis.
Thanks to Noether's theorem and its variants, we also know that there
often is a duality relationship between (continuous) symmetries and conservation laws; for instance, the time-translation invariance of a (Hamiltonian
or Lagrangian) system is tied to energy conservation, the spatial translation
invariance is tied to momentum conservation, and so forth. The general
principle of relativity (that the laws of physics are invariant with respect to
arbitrary nonlinear coordinate changes) leads to a much stronger pointwise
conservation law, namely the divergence-free nature of the stress-energy tensor, which is fundamentally important in the theory of wave equations (and
particularly in general relativity).
As the above examples demonstrate, when solving a mathematical problem, it is good to be aware of what symmetries and closure properties the
problem has, before one plunges in to a direct attack on the problem. In
some cases, such symmetries and closure properties only become apparent
if one abstracts and generalises the problem to a suitably natural framework; this is one of the major reasons why mathematicians use abstraction
even to solve concrete problems. (To put it another way, abstraction can be
used to purchase symmetries or closure properties by spending the implicit
normalisations that are present in a concrete approach to the problem; see
[Ta2011d, §1.6].)

2.2. Isogenies between classical Lie groups


For sake of concreteness we will work here over the complex numbers $\mathbb{C}$,
although most of this discussion is valid for arbitrary algebraically closed
fields (but some care needs to be taken in characteristic 2, as always, particularly when defining the orthogonal and symplectic groups). Then one has
the following four infinite families of classical Lie groups for $n \geq 1$:
(1) (Type $A_n$) The special linear group $SL_{n+1}(\mathbb{C})$ of volume-preserving
linear maps $T : \mathbb{C}^{n+1} \to \mathbb{C}^{n+1}$.
(2) (Type $B_n$) The special orthogonal group $SO_{2n+1}(\mathbb{C})$ of (orientation
preserving) linear maps $T : \mathbb{C}^{2n+1} \to \mathbb{C}^{2n+1}$ preserving a non-degenerate symmetric form $\langle , \rangle : \mathbb{C}^{2n+1} \times \mathbb{C}^{2n+1} \to \mathbb{C}$, such as the
standard symmetric form
$$\langle (z_1, \dots, z_{2n+1}), (w_1, \dots, w_{2n+1}) \rangle := z_1 w_1 + \dots + z_{2n+1} w_{2n+1}$$
(this is the complexification of the more familiar real special orthogonal group $SO_{2n+1}(\mathbb{R})$).
(3) (Type $C_n$) The symplectic group $Sp_{2n}(\mathbb{C})$ of linear maps $T : \mathbb{C}^{2n} \to \mathbb{C}^{2n}$
preserving a non-degenerate antisymmetric form $\omega : \mathbb{C}^{2n} \times \mathbb{C}^{2n} \to \mathbb{C}$,
such as the standard symplectic form
$$\omega((z_1, \dots, z_{2n}), (w_1, \dots, w_{2n})) := \sum_{j=1}^n z_j w_{n+j} - z_{n+j} w_j.$$
(4) (Type $D_n$) The special orthogonal group $SO_{2n}(\mathbb{C})$ of (orientation
preserving) linear maps $\mathbb{C}^{2n} \to \mathbb{C}^{2n}$ preserving a non-degenerate
symmetric form $\langle , \rangle : \mathbb{C}^{2n} \times \mathbb{C}^{2n} \to \mathbb{C}$ (such as the standard symmetric form).
In this section, we will abuse notation somewhat and identify $A_n$ with
$SL_{n+1}(\mathbb{C})$, $B_n$ with $SO_{2n+1}(\mathbb{C})$, etc., although it is more accurate to say
that $SL_{n+1}(\mathbb{C})$ is a Lie group of type $A_n$, etc., as there are other forms of the
Lie algebras associated to $A_n, B_n, C_n, D_n$ over various fields. Over a non-algebraically closed field, such as $\mathbb{R}$, the list of Lie groups associated with
a given type can in fact get quite complicated, and will not be discussed
here. One can also view the double covers $Spin_{2n+1}(\mathbb{C})$ and $Spin_{2n}(\mathbb{C})$
of $SO_{2n+1}(\mathbb{C})$, $SO_{2n}(\mathbb{C})$ (i.e. the spin groups) as being of type $B_n$, $D_n$
respectively; however, I find the spin groups less intuitive to work with
than the orthogonal groups and will therefore focus more on the orthogonal
model.
The reason for this subscripting is that each of the classical groups
$A_n, B_n, C_n, D_n$ has rank $n$, i.e. the dimension of any maximal connected
abelian subgroup of simultaneously diagonalisable elements (also known as
a Cartan subgroup) is $n$. For instance:
(1) (Type $A_n$) In $SL_{n+1}(\mathbb{C})$, one Cartan subgroup is the diagonal matrices in $SL_{n+1}(\mathbb{C})$, which has dimension $n$.
(2) (Type $B_n$) In $SO_{2n+1}(\mathbb{C})$, all Cartan subgroups are isomorphic to
$SO_2(\mathbb{C})^n \times SO_1(\mathbb{C})$, which has dimension $n$.
(3) (Type $C_n$) In $Sp_{2n}(\mathbb{C})$, all Cartan subgroups are isomorphic to
$SO_2(\mathbb{C})^n \leq Sp_2(\mathbb{C})^n \leq Sp_{2n}(\mathbb{C})$, which has dimension $n$.
(4) (Type $D_n$) In $SO_{2n}(\mathbb{C})$, all Cartan subgroups are isomorphic to
$SO_2(\mathbb{C})^n$, which has dimension $n$.


Remark 2.2.1. This same convention also underlies the notation for the
exceptional simple Lie groups $G_2, F_4, E_6, E_7, E_8$, which we will not discuss
further here.
With two exceptions, the classical Lie groups $A_n, B_n, C_n, D_n$ are all simple, i.e. their Lie algebras are non-abelian and not expressible as the direct sum of smaller Lie algebras. The two exceptions are $D_1 = SO_2(\mathbb{C})$,
which is abelian (isomorphic to $\mathbb{C}^\times$, in fact) and thus not considered simple,
and $D_2 = SO_4(\mathbb{C})$, which turns out to "essentially" split as $A_1 \times A_1 =
SL_2(\mathbb{C}) \times SL_2(\mathbb{C})$, in the sense that the former group is double covered
by the latter (and in particular, there is an isogeny from the latter to the
former, and the Lie algebras are isomorphic).
The adjoint action of a Cartan subgroup of a Lie group $G$ on the Lie
algebra $\mathfrak{g}$ splits that algebra into weight spaces; in the case of a simple
Lie group, the associated weights are organised by a Dynkin diagram. The
Dynkin diagrams for $A_n, B_n, C_n, D_n$ are of course well known, and can be
found in any text on Lie groups or algebraic groups.
For small $n$, some of these Dynkin diagrams are isomorphic; this is a classic instance of the tongue-in-cheek "strong law of small numbers" [Gu1988],
though in this case "strong law of small diagrams" would be more appropriate. These accidental isomorphisms then give rise to the exceptional isomorphisms between Lie algebras (and thence to exceptional isogenies between
Lie groups). Excluding those isomorphisms involving the exceptional Lie
algebras $E_n$ for $n = 3, 4, 5$, these isomorphisms are
(1) $A_1 = B_1 = C_1$;
(2) $B_2 = C_2$;
(3) $D_2 = A_1 \times A_1$;
(4) $D_3 = A_3$.
There is also a pair of exceptional isomorphisms from (the $Spin_8$ form of)
$D_4$ to itself, a phenomenon known as triality.
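As a quick sanity check on the list above, one can compare dimensions using the standard formulas dim SL_m = m^2 - 1, dim SO_m = m(m-1)/2 and dim Sp_{2n} = n(2n+1). The sketch below (function names are illustrative) confirms that the dimensions on both sides of each listed isomorphism agree. Matching dimension and rank is of course only a necessary condition: B_n and C_n have the same dimension and rank for every n, but are not isomorphic for n at least 3.

```python
def dim_A(n):
    """Dimension of SL_{n+1}(C)."""
    return (n + 1) ** 2 - 1

def dim_B(n):
    """Dimension of SO_{2n+1}(C)."""
    m = 2 * n + 1
    return m * (m - 1) // 2

def dim_C(n):
    """Dimension of Sp_{2n}(C)."""
    return n * (2 * n + 1)

def dim_D(n):
    """Dimension of SO_{2n}(C)."""
    m = 2 * n
    return m * (m - 1) // 2

# Dimension counts for the four exceptional isomorphisms listed above.
assert dim_A(1) == dim_B(1) == dim_C(1) == 3
assert dim_B(2) == dim_C(2) == 10
assert dim_D(2) == dim_A(1) + dim_A(1) == 6
assert dim_D(3) == dim_A(3) == 15

# B_n and C_n always have the same dimension, so dimension alone proves nothing.
assert all(dim_B(n) == dim_C(n) for n in range(1, 20))
```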
These isomorphisms are most easily seen via algebraic and combinatorial
tools, such as an inspection of the Dynkin diagrams. However, the isomorphisms listed above¹ can also be seen by more "geometric" means, using
the basic representations of the classical Lie groups on their natural vector
spaces ($\mathbb{C}^{n+1}, \mathbb{C}^{2n+1}, \mathbb{C}^{2n}, \mathbb{C}^{2n}$ for $A_n, B_n, C_n, D_n$ respectively) and combinations thereof (such as exterior powers). These isomorphisms are quite
standard (they can be found, for instance, in [Pr2007]), but I decided to
present them here for sake of reference.
¹ However, I don't know of a simple way to interpret triality geometrically; the descriptions
I have seen tend to involve some algebraic manipulation of the octonions or of a Clifford algebra,
in a manner that tended to obscure the geometry somewhat.


2.2.1. $A_1 = C_1$. This is the simplest correspondence. $A_1 = SL_2(\mathbb{C})$ is
the group of transformations $T : \mathbb{C}^2 \to \mathbb{C}^2$ that preserve the volume form;
$C_1 = Sp_2(\mathbb{C})$ is the group of transformations $T : \mathbb{C}^2 \to \mathbb{C}^2$ that preserve the
symplectic form. But in two dimensions, the volume form and the symplectic
form are the same.
2.2.2. $A_1 = B_1$. The group $A_1 = SL_2(\mathbb{C})$ naturally acts on $\mathbb{C}^2$. But it
also has an obvious three-dimensional action, namely the adjoint action
$g : X \mapsto g X g^{-1}$ on the Lie algebra $\mathfrak{sl}_2(\mathbb{C})$ of $2 \times 2$ complex matrices of trace
zero. This action preserves the Killing form
$$\langle X, Y \rangle_{\mathfrak{sl}_2(\mathbb{C})} := \mathrm{tr}(XY)$$
due to the cyclic nature of the trace. The Killing form is symmetric and
non-degenerate (this reflects the simple nature of $A_1$), and so we see that
each element of $SL_2(\mathbb{C})$ has been mapped to an element of
$$SO(\mathfrak{sl}_2(\mathbb{C})) \cong SO_3(\mathbb{C}) = B_1,$$
thus giving a homomorphism from $A_1$ to $B_1$. The group $A_1$ has dimension
$2^2 - 1 = 3$, and $B_1$ has dimension $3(3-1)/2 = 3$, so $A_1$ and $B_1$ have
the same dimension. The kernel of the map is easily seen to be the centre
$\{+1, -1\}$ of $A_1$, and so this is a double cover² of $B_1$ by $A_1$ (thus interpreting
$A_1 = SL_2(\mathbb{C})$ as the spin group $Spin_3(\mathbb{C})$).
A slightly different interpretation of this correspondence, using quaternions, will be discussed in Section 8.3.
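The two key facts used above (invariance of the trace form under the adjoint action, and the fact that $-g$ and $g$ act identically) can be checked on a concrete element. The following is a minimal sketch with hand-rolled 2 x 2 matrix helpers; the particular matrix $g$ and the helper names are illustrative choices, and the entries are Gaussian integers so that the floating-point complex arithmetic is exact.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse_sl2(g):
    """Inverse of a 2x2 matrix of determinant 1 (its adjugate)."""
    (a, b), (c, d) = g
    return [[d, -b], [-c, a]]

def adjoint(g, X):
    """The adjoint action X -> g X g^{-1}."""
    return matmul(matmul(g, X), inverse_sl2(g))

def trace_form(X, Y):
    """The symmetric bilinear form <X, Y> = tr(XY)."""
    P = matmul(X, Y)
    return P[0][0] + P[1][1]

# An illustrative element of SL_2(C): det = (1+2i)(1) - (i)(2) = 1.
g = [[1 + 2j, 1j], [2, 1]]
minus_g = [[-x for x in row] for row in g]

# A basis H, E, F of the trace-zero matrices sl_2(C).
basis = [[[1, 0], [0, -1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]]

for X in basis:
    gX = adjoint(g, X)
    assert gX[0][0] + gX[1][1] == 0        # trace zero is preserved
    assert adjoint(minus_g, X) == gX       # -g acts in the same way as g
    for Y in basis:
        # the form tr(XY) is invariant under the adjoint action
        assert trace_form(gX, adjoint(g, Y)) == trace_form(X, Y)
```

This only tests a single group element on a basis, so it is a consistency check rather than a proof; the general statement follows from the cyclic property of the trace as stated above.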
² Note that the image of the map is open and $B_1$ is connected, so that one indeed has a
covering map.
2.2.3. $A_3 = D_3$. The group $A_3 = SL_4(\mathbb{C})$ naturally acts on $\mathbb{C}^4$. Like $A_1$,
it has an adjoint action (on the 15-dimensional Lie algebra $\mathfrak{sl}_4(\mathbb{C})$), but this
is not the action we will use for the $A_3 = D_3$ correspondence. Instead, we
will look at the action on the $\binom{4}{2} = 6$-dimensional exterior power $\bigwedge^2 \mathbb{C}^4$ of
$\mathbb{C}^4$, given by the usual formula
$$g(v \wedge w) := (gv) \wedge (gw).$$
Since $2 + 2 = 4$, the volume form on $\mathbb{C}^4$ induces a bilinear form $\langle , \rangle$ on $\bigwedge^2 \mathbb{C}^4$;
since 2 is even, this form is symmetric rather than anti-symmetric, and it is
also non-degenerate. An element of $SL_4(\mathbb{C})$ preserves the volume form and
thus preserves the bilinear form, giving a map from $SL_4(\mathbb{C})$ to
$$SO(\textstyle\bigwedge^2 \mathbb{C}^4) \cong SO_6(\mathbb{C}) = D_3.$$
This is a homomorphism from $A_3$ to $D_3$. The group $A_3$ has dimension
$4^2 - 1 = 15$, and $D_3$ has dimension $6(6-1)/2 = 15$, so $A_3$ and $D_3$ have the
same dimension. As before, the kernel is seen to be $\{+1, -1\}$, so this is a
double cover of $D_3$ by $A_3$ (thus interpreting $A_3 = SL_4(\mathbb{C})$ as the spin group
$Spin_6(\mathbb{C})$).
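The $A_3 = D_3$ construction can be made concrete in coordinates. In the basis $e_i \wedge e_j$ ($i < j$) of the exterior square, the induced action of $g$ is given by the 6 x 6 matrix of 2 x 2 minors of $g$, and the bilinear form has Gram matrix $B$ whose entries are the signs of $e_i \wedge e_j \wedge e_k \wedge e_l$. The sketch below (helper names and the test matrix are illustrative assumptions) checks that $B$ is symmetric and non-degenerate, and that the induced matrix $C$ satisfies $C^T B C = \det(g) B$, so that $B$ is preserved exactly when $\det g = 1$.

```python
from itertools import combinations

PAIRS = list(combinations(range(4), 2))   # index pairs i < j labelling e_i ^ e_j

def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def perm_sign(p):
    """Sign of a permutation (given as a tuple), via the parity of inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

def wedge_form():
    """Gram matrix: coefficient of e_0^e_1^e_2^e_3 in (e_i^e_j)^(e_k^e_l)."""
    return [[perm_sign((i, j, k, l)) if len({i, j, k, l}) == 4 else 0
             for (k, l) in PAIRS] for (i, j) in PAIRS]

def exterior_square(g):
    """Matrix of v^w -> (gv)^(gw); its entries are the 2x2 minors of g."""
    return [[g[i][k] * g[j][l] - g[i][l] * g[j][k] for (k, l) in PAIRS]
            for (i, j) in PAIRS]

def scales_wedge_form(g):
    """Check the identity C^T B C = det(g) B for C = exterior_square(g)."""
    B, C = wedge_form(), exterior_square(g)
    return matmul(matmul(transpose(C), B), C) == [[det(g) * x for x in row] for row in B]

g = [[1, 2, 0, 1], [0, 1, 3, 0], [1, 2, 1, 5], [0, 0, 0, 1]]   # integer matrix, det 1
assert det(g) == 1 and scales_wedge_form(g)
```

Since the minors are quadratic in the entries of $g$, replacing $g$ by $-g$ gives the same 6 x 6 matrix, which is the coordinate form of the statement that $-1$ lies in the kernel.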
2.2.4. $B_2 = C_2$. This is basically a restriction of the $A_3 = D_3$ correspondence. Namely, the group $C_2 = Sp_4(\mathbb{C})$ acts on $\mathbb{C}^4$ in a manner that preserves the symplectic form $\omega$, and hence (on taking a wedge product) the
volume form also. Thus $C_2$ is a subgroup of $SL_4(\mathbb{C}) = A_3$, and as discussed
above, thus acts orthogonally on the six-dimensional space $\bigwedge^2 \mathbb{C}^4$. On the
other hand, the symplectic form $\omega$ can itself be thought of as an element
of $\bigwedge^2 \mathbb{C}^4$, and is clearly fixed by all of $C_2$; thus $C_2$ also stabilises the five-dimensional orthogonal complement $\omega^\perp$ of $\omega$ inside $\bigwedge^2 \mathbb{C}^4$. Note that $\omega$ is
non-degenerate (here we crucially use the fact that the characteristic is not
two!) and so $\omega^\perp$ is also non-degenerate. We have thus mapped $C_2$ to
$$SO(\omega^\perp) \cong SO_5(\mathbb{C}) = B_2.$$
This is a homomorphism from $C_2$ to $B_2$. The group $C_2$ has dimension
$2(4 + 1) = 10$, while $B_2$ has dimension $5(5-1)/2 = 10$, so $B_2$ and $C_2$ have
the same dimension. Once again, one can verify that the kernel is $\{+1, -1\}$,
so this is a double cover of $B_2$ by $C_2$ (thus interpreting $C_2 = Sp_4(\mathbb{C})$ as the
spin group $Spin_5(\mathbb{C})$).
Remark 2.2.2. In characteristic two, the above map from $C_2$ to $B_2$ disappears, but there is a somewhat different identification between $B_n =
SO_{2n+1}(k)$ and $C_n = Sp_{2n}(k)$ for any $n$ in this case. Namely, in characteristic two, inside $k^{2n+1}$ with a non-degenerate symmetric form $\langle , \rangle$, the set of
null vectors (vectors $x$ with $\langle x, x \rangle = 0$) forms a $2n$-dimensional hyperplane,
and the restriction of the symmetric form to that hyperplane becomes a symplectic form (which, in characteristic two, is defined to be an anti-symmetric
form $\omega$ with $\omega(x, x) = 0$ for all $x$). This provides the claimed identification
between $B_n$ and $C_n$.
2.2.5. $D_2 = A_1 \times A_1$. The group $A_1 \times A_1 = SL_2(\mathbb{C}) \times SL_2(\mathbb{C})$ acts on
$\mathbb{C}^2 \otimes \mathbb{C}^2$ by tensor product:
$$(g, h)(v \otimes w) := (gv) \otimes (hw).$$
Each individual factor $g, h$ preserves the symplectic form $\omega$ on $\mathbb{C}^2$, and so
the pair $(g, h)$ preserves the tensor product $\omega \otimes \omega$, which is the bilinear form
on $\mathbb{C}^2 \otimes \mathbb{C}^2$ defined as
$$(\omega \otimes \omega)(v \otimes w, v' \otimes w') := \omega(v, v')\,\omega(w, w').$$
As each factor $\omega$ is anti-symmetric and non-degenerate, the tensor product
$\omega \otimes \omega$ is symmetric and non-degenerate. Thus we have mapped $A_1 \times A_1$ into
$$SO(\mathbb{C}^2 \otimes \mathbb{C}^2) = SO_4(\mathbb{C}) = D_2.$$
The group $A_1 \times A_1$ has dimension $(2^2 - 1) + (2^2 - 1) = 6$, and $D_2$ has
dimension $4(4-1)/2 = 6$, so $A_1 \times A_1$ and $D_2$ have the same dimension. As
before, the kernel can be verified to be $\{(+1, +1), (-1, -1)\}$, and so this is a
double cover of $D_2$ by $A_1 \times A_1$ (thus interpreting $A_1 \times A_1 = SL_2(\mathbb{C}) \times SL_2(\mathbb{C})$
as the spin group $Spin_4(\mathbb{C})$).
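In coordinates, the tensor product of two matrices is the Kronecker product, so the claims in this subsection can be tested directly. The following sketch (helper names and the sample matrices are illustrative) checks that the Gram matrix of the tensor product of two copies of the standard symplectic form on $\mathbb{C}^2$ is symmetric, that it is preserved by $g \otimes h$ for two determinant-one matrices, and that $(-g, -h)$ acts in the same way as $(g, h)$.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def kron(a, b):
    """Kronecker product: the matrix of a (x) b in the basis e_i (x) e_k."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def negate(m):
    return [[-x for x in row] for row in m]

OMEGA = [[0, 1], [-1, 0]]       # standard symplectic form on C^2 (anti-symmetric)
FORM = kron(OMEGA, OMEGA)       # Gram matrix of the tensor product form on C^2 (x) C^2

def preserves(form, T):
    """Check T^T * form * T == form."""
    return matmul(matmul(transpose(T), form), T) == form

g = [[2, 1], [1, 1]]            # determinant 1
h = [[1, 3], [0, 1]]            # determinant 1

assert FORM == transpose(FORM)                       # symmetric, as claimed
assert preserves(FORM, kron(g, h))                   # (g, h) acts orthogonally
assert kron(negate(g), negate(h)) == kron(g, h)      # (-1, -1) lies in the kernel
```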
Remark 2.2.3. All of these exceptional isomorphisms can be treated algebraically in a unified manner using the machinery of Clifford algebras and
spinors; however, I find the more ad hoc geometric approach given here to
be easier to visualise.
Remark 2.2.4. In the above discussion, we relied heavily on matching
dimensions to ensure that various homomorphisms were in fact isogenies.
There are some other exceptional homomorphisms in low dimension which
are not isogenies due to mismatching dimensions, but are still of interest. For
instance, there is a way to embed the six-dimensional space $D_2 = A_1 \times A_1 =
C_1 \times B_1 = Sp_2(\mathbb{C}) \times SO_3(\mathbb{C})$ into the 21-dimensional space $C_3 = Sp_6(\mathbb{C})$,
by letting $Sp_2(\mathbb{C})$ act on $\mathbb{C}^2$ and $SO_3(\mathbb{C})$ act on $\mathbb{C}^3$, so that $Sp_2(\mathbb{C}) \times
SO_3(\mathbb{C})$ acts on the six-dimensional tensor product $\mathbb{C}^2 \otimes \mathbb{C}^3$ in the obvious
manner; this preserves the tensor product of the symplectic form on $\mathbb{C}^2$ and
the symmetric form on $\mathbb{C}^3$, which is a non-degenerate symplectic form on
$\mathbb{C}^2 \otimes \mathbb{C}^3 \cong \mathbb{C}^6$, giving the homomorphism (with the kernel once again being
$\{(+1, +1), (-1, -1)\}$). These sorts of embeddings were useful in a recent
paper of Breuillard, Green, Guralnick, and myself [BrGrGuTa2010], as
they gave examples of semisimple groups that could be easily separated from
other semisimple groups (such as $C_1 \times C_2$ inside $C_3$) due to their irreducible
action on various natural vector spaces (i.e. they did not stabilise any non-trivial space).

Chapter 3

Combinatorics

3.1. The Szemerédi-Trotter theorem via the polynomial ham sandwich theorem

The ham sandwich theorem asserts that, given $d$ bounded open sets $U_1, \dots, U_d$
in $\mathbb{R}^d$, there exists a hyperplane $\{x \in \mathbb{R}^d : x \cdot v = c\}$ that bisects each of
these sets $U_i$, in the sense that each of the two half-spaces $\{x \in \mathbb{R}^d : x \cdot v <
c\}$, $\{x \in \mathbb{R}^d : x \cdot v > c\}$ on either side of the hyperplane captures exactly half
of the volume of $U_i$. The shortest proof of this result proceeds by invoking
the Borsuk-Ulam theorem.
A useful generalisation of the ham sandwich theorem is the polynomial ham sandwich theorem, which asserts that given $m$ bounded open sets
$U_1, \dots, U_m$ in $\mathbb{R}^d$, there exists a hypersurface $\{x \in \mathbb{R}^d : Q(x) = 0\}$ of degree
$O_d(m^{1/d})$ (thus $Q : \mathbb{R}^d \to \mathbb{R}$ is a polynomial of degree¹ $O_d(m^{1/d})$) such that
the two semi-algebraic sets $\{Q > 0\}$ and $\{Q < 0\}$ each capture half the volume
of each of the $U_i$. This theorem can be deduced from the Borsuk-Ulam theorem in the same manner that the ordinary ham sandwich theorem is (and
can also be deduced directly from the ordinary ham sandwich theorem via
the Veronese embedding).
¹ More precisely, the degree will be at most $D$, where $D$ is the first positive integer for which
$\binom{D+d}{d}$ exceeds $m$.
The polynomial ham sandwich theorem is a theorem about continuous
bodies (bounded open sets), but a simple limiting argument leads one to
the following discrete analogue: given $m$ finite sets $S_1, \dots, S_m$ in $\mathbb{R}^d$, there
exists a hypersurface $\{x \in \mathbb{R}^d : Q(x) = 0\}$ of degree $O_d(m^{1/d})$, such that
each of the two semi-algebraic sets $\{Q > 0\}$ and $\{Q < 0\}$ contains at most
half of the points of $S_i$ (note that some of the points of $S_i$ can certainly
lie on the boundary $\{Q = 0\}$). This can be iterated to give a useful cell
decomposition:
Proposition 3.1.1 (Cell decomposition). Let $P$ be a finite set of points in
$\mathbb{R}^d$, and let $D$ be a positive integer. Then there exists a polynomial $Q$ of
degree at most $D$, and a decomposition
$$\mathbb{R}^d = \{Q = 0\} \cup C_1 \cup \dots \cup C_m$$
into the hypersurface $\{Q = 0\}$ and a collection $C_1, \dots, C_m$ of cells bounded
by $\{Q = 0\}$, such that $m = O_d(D^d)$, and such that each cell $C_i$ contains at
most $O_d(|P|/D^d)$ points.
A proof of this decomposition is sketched in [Ta2011d, §3.9]. The cells
in the argument are not necessarily connected (being instead formed by
intersecting together a number of semi-algebraic sets such as $\{Q > 0\}$ and
$\{Q < 0\}$), but it is a classical result² [OlPe1949], [Mi1964], [Th1965]
that any degree $D$ hypersurface $\{Q = 0\}$ divides $\mathbb{R}^d$ into $O_d(D^d)$ connected
components, so one can easily assume that the cells are connected if desired.
Remark 3.1.2. By setting $D$ as large as $O_d(|P|^{1/d})$, we obtain as a limiting
case of the cell decomposition the fact that any finite set $P$ of points in $\mathbb{R}^d$
can be captured by a hypersurface of degree $O_d(|P|^{1/d})$. This is in
fact true over arbitrary fields (not just over $\mathbb{R}$), and can be proven by a
simple linear algebra argument; see e.g. [Ta2009b, §1.1]. However, the cell
decomposition is more flexible than this algebraic fact due to the ability to
arbitrarily select the degree parameter $D$.
The cell decomposition can be viewed as a structural theorem for arbitrary large configurations of points in space, much as the Szemerédi regularity lemma [Sz1978] can be viewed as a structural theorem for arbitrary
large dense graphs. Indeed, just as many problems in the theory of large
dense graphs can be profitably attacked by first applying the regularity
lemma and then inspecting the outcome, it now seems that many problems
in combinatorial incidence geometry can be attacked by applying the cell
decomposition (or a similar such decomposition), with a parameter $D$ to be
optimised later, to a relevant set of points, and seeing how the cells interact with each other and with the other objects in the configuration (lines,
planes, circles, etc.). This strategy was spectacularly illustrated recently
with Guth and Katz's use [GuKa2010] of the cell decomposition to resolve
the Erdős distinct distance problem (up to logarithmic factors), as discussed
in [Ta2011d, §3.9].
² Actually, one does not need the full machinery of the results in the above cited papers (which control not just the number of components, but all the Betti numbers of the complement of
$\{Q = 0\}$) to get the bound on connected components; one can instead observe that every bounded
connected component has a critical point where $\nabla Q = 0$, and one can control the number of these
points by Bézout's theorem, after perturbing $Q$ slightly to enforce genericity, and then count the
unbounded components by an induction on dimension. See [SoTa2011, Appendix A].
In this section, I will record a simpler (but still illustrative) version of
this method (that I learned from Nets Katz), which provides yet another
proof of the Szemerédi-Trotter theorem in incidence geometry:
Theorem 3.1.3 (Szemerédi-Trotter theorem). Given a finite set of points
$P$ and a finite set of lines $L$ in $\mathbb{R}^2$, the set of incidences $I(P, L) := \{(p, \ell) \in
P \times L : p \in \ell\}$ has cardinality
$$|I(P, L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L|.$$
This theorem has many short existing proofs, including one via crossing
number inequalities (as discussed in [Ta2008, §1.10]) or via a slightly different type of cell decomposition (as discussed in [Ta2010b, §1.6]). The proof
given below is not that different, in particular, from the latter proof, but I
believe it still serves as a good introduction to the polynomial method in
combinatorial incidence geometry.
Let us begin with a trivial bound:
Lemma 3.1.4 (Trivial bound). For any finite set of points $P$ and finite set
of lines $L$, we have $|I(P, L)| \ll |P||L|^{1/2} + |L|$.
The slickest way to prove this lemma is by the Cauchy-Schwarz inequality. If we let $\mu(\ell)$ be the number of points in $P$ incident to a given line $\ell$, then
we have
$$|I(P, L)| = \sum_{\ell \in L} \mu(\ell)$$
and hence by Cauchy-Schwarz
$$\sum_{\ell \in L} \mu(\ell)^2 \geq |I(P, L)|^2 / |L|.$$
On the other hand, the left-hand side counts the number of triples $(p, p', \ell) \in
P \times P \times L$ with $p, p' \in \ell$. Since two distinct points $p, p'$ determine at most
one line, one thus sees that the left-hand side is at most $|P|^2 + |I(P, L)|$, and
the claim follows.
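The two inequalities in this argument are easy to test by brute force on a small configuration. The sketch below is only an illustration: the encoding of a non-vertical line as a pair (m, b) meaning y = mx + b, the grid of points, and the function names are all assumptions. Solving the quadratic inequality $|I|^2 \leq |L|(|P|^2 + |I|)$ shows that the trivial bound in fact holds with implied constant 1, which is what the code checks, in integer arithmetic.

```python
from itertools import product

def count_incidences(points, lines):
    """Count pairs (p, l) with p on l; a line (m, b) means y = m*x + b."""
    return sum(1 for (x, y) in points for (m, b) in lines if y == m * x + b)

def cauchy_schwarz_step(points, lines):
    """Check |I|^2 <= |L| (|P|^2 + |I|), the inequality derived in the proof."""
    I, P, L = count_incidences(points, lines), len(points), len(lines)
    return I * I <= L * (P * P + I)

def satisfies_trivial_bound(points, lines):
    """Check |I| <= |P| |L|^(1/2) + |L| without taking square roots."""
    I, P, L = count_incidences(points, lines), len(points), len(lines)
    excess = I - L
    return excess <= 0 or excess * excess <= P * P * L

# A 5 x 5 integer grid, and distinct lines of slope 0, 1, 2.
points = list(product(range(5), repeat=2))
lines = [(m, b) for m in range(3) for b in range(-8, 5)]

assert cauchy_schwarz_step(points, lines)
assert satisfies_trivial_bound(points, lines)
```

The points and the lines must each be distinct for the counting argument to apply, which is why the lines are generated from distinct (m, b) pairs.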
Now we return to the Szemerédi-Trotter theorem, and apply the cell
decomposition with some parameter $D$. This gives a decomposition
$$\mathbb{R}^2 = \{Q = 0\} \cup C_1 \cup \dots \cup C_m$$
into a curve $\{Q = 0\}$ of degree $O(D)$, and at most $O(D^2)$ cells $C_1, \dots, C_m$,
each of which contains $O(|P|/D^2)$ points. We can then decompose
$$|I(P, L)| = |I(P \cap \{Q = 0\}, L)| + \sum_{i=1}^m |I(P \cap C_i, L)|.$$
By removing repeated factors, we may take $Q$ to be square-free.
Let us first deal with the incidences coming from the cells $C_i$. Let $L_i$ be
the lines in $L$ that pass through the $i$th cell $C_i$. Clearly
$$|I(P \cap C_i, L)| = |I(P \cap C_i, L_i)|$$
and thus by the trivial bound
$$|I(P \cap C_i, L)| \ll |P \cap C_i||L_i|^{1/2} + |L_i| \ll \frac{|P|}{D^2} |L_i|^{1/2} + |L_i|.$$
Now we make a key observation (coming from Bézout's theorem): each line
$\ell$ in $L$ can meet at most $O(D)$ cells $C_i$, because the cells $C_i$ are bounded by a
degree $D$ curve $\{Q = 0\}$. Thus
$$\sum_{i=1}^m |L_i| \ll D|L|$$
and hence by Cauchy-Schwarz (and the bound $m = O(D^2)$), we have
$$\sum_{i=1}^m |L_i|^{1/2} \ll D^{3/2} |L|^{1/2}.$$
Putting all this together, we see that
$$\sum_{i=1}^m |I(P \cap C_i, L)| \ll D^{-1/2} |P||L|^{1/2} + D|L|.$$
Now we turn to the incidences coming from the curve $\{Q = 0\}$. Applying
Bézout's theorem again, we see that each line in $L$ either lies in $\{Q = 0\}$,
or meets $\{Q = 0\}$ in $O(D)$ points. The latter case contributes at most
$O(D|L|)$ incidences, so now we restrict attention to lines that are completely
contained in $\{Q = 0\}$. The points in the curve $\{Q = 0\}$ are of two types:
smooth points (for which there is a unique tangent line to the curve $\{Q = 0\}$)
and singular points (where $Q$ and $\nabla Q$ both vanish). A smooth point can be
incident to at most one line in $\{Q = 0\}$, and so this case contributes at most
$|P|$ incidences. So we may restrict attention to the singular points. But by
one last application of Bézout's theorem, each line in $L$ can intersect the
zero-dimensional set $\{Q = \nabla Q = 0\}$ in at most $O(D)$ points (note that each
partial derivative of $Q$ also has degree $O(D)$), giving another contribution
of $O(D|L|)$ incidences. Putting everything together, we obtain
$$|I(P, L)| \ll D^{-1/2} |P||L|^{1/2} + D|L| + |P|$$
for any $D \geq 1$. An optimisation in $D$ then gives the claim.
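For the reader's convenience, here is one way to carry out the optimisation in $D$; this is a sketch of the routine computation rather than a quotation of the original argument, and it ignores the harmless rounding of $D$ to an integer. One balances the first two terms of the bound:

```latex
\begin{align*}
D^{-1/2}|P||L|^{1/2} = D|L|
  &\iff D^{3/2} = |P||L|^{-1/2}
   \iff D = |P|^{2/3}|L|^{-1/3}, \\
\text{and for this choice of } D:\quad
D|L| &= D^{-1/2}|P||L|^{1/2} = |P|^{2/3}|L|^{2/3}.
\end{align*}
```

This choice is admissible when $D \geq 1$, i.e. when $|L| \leq |P|^2$, and then gives $|I(P,L)| \ll |P|^{2/3}|L|^{2/3} + |P|$. In the remaining case $|L| > |P|^2$, the trivial bound of Lemma 3.1.4 already gives $|I(P,L)| \ll |P||L|^{1/2} + |L| \leq 2|L|$, so the theorem holds in both cases.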
Remark 3.1.5. If one used the extreme case of the cell decomposition noted
in Remark 3.1.2, one only obtains the trivial bound
$$|I(P, L)| \ll |P|^{1/2} |L| + |P|.$$
On the other hand, this bound holds over arbitrary fields $k$ (not just over
$\mathbb{R}$), and can be sharp in such cases (consider for instance the case when $k$
is a finite field, $P$ consists of all the points in $k^2$, and $L$ consists of all the
lines in $k^2$).
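The finite field example can be verified by direct enumeration when $k$ is the field of integers modulo a prime $q$. In the sketch below (the function name and the encoding of lines are illustrative), there are $q^2$ points, $q^2 + q$ lines, and $q^3 + q^2$ incidences, which is within a factor of two of $|P|^{1/2}|L| + |P| = q^3 + 2q^2$.

```python
def plane_incidences(q):
    """Return (|P|, |L|, |I|) for all points and all lines of the plane over Z/q.

    q is assumed to be prime. Non-vertical lines are y = m*x + b (q^2 of them);
    vertical lines are x = c (q of them).
    """
    points = [(x, y) for x in range(q) for y in range(q)]
    count = 0
    for (x, y) in points:
        for m in range(q):
            for b in range(q):
                if (m * x + b - y) % q == 0:   # point lies on y = m*x + b
                    count += 1
        count += 1                              # exactly one vertical line, x = c with c = x
    return len(points), q * q + q, count

for q in (2, 3, 5, 7):
    P, L, I = plane_incidences(q)
    bound = q * L + P              # |P|^(1/2) |L| + |P|, using |P|^(1/2) = q
    assert I == q ** 3 + q ** 2
    assert I <= bound <= 2 * I     # the bound is attained up to a factor of 2
```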

3.2. A quantitative Kemperman theorem


In [Ke1964], Kemperman established the following result:
Theorem 3.2.1. Let $G$ be a compact connected group, with a Haar probability measure $\mu$. Let $A, B$ be compact subsets of $G$. Then
$$\mu(AB) \geq \min(\mu(A) + \mu(B), 1).$$
Remark 3.2.2. The estimate is sharp, as can be seen by considering the
case when $G$ is a unit circle, and $A, B$ are arcs; similarly if $G$ is any compact
connected group that projects onto the circle. The connectedness hypothesis
is essential, as can be seen by considering what happens if $A$ and $B$ are a
non-trivial open subgroup of $G$. For locally compact connected groups which
are unimodular but not compact, there is an analogous statement, but with
$\mu$ now a Haar measure instead of a Haar probability measure, and the right-hand side $\min(\mu(A) + \mu(B), 1)$ replaced simply by $\mu(A) + \mu(B)$. The case
when $G$ is a torus is due to Macbeath [Ma1953], and the case when $G$
is a circle is due to Raikov [Ra1939]. The theorem is closely related to
the Cauchy-Davenport inequality [Ca1813], [Da1935]; indeed, it is not
difficult to use that inequality to establish the circle case, and the circle case
can be used to deduce the torus case by considering increasingly dense circle
subgroups of the torus (alternatively, one can also use Kneser's theorem
[Kn1953]).
By inner regularity, the hypothesis that $A, B$ are compact can be replaced with Borel measurability, so long as one then adds the additional
hypothesis that $AB$ is also Borel measurable.
A short proof of Kemperman's theorem was given by Ruzsa [Ru1992].
In this section, I wanted to record how this argument can be used to establish
the following more "robust" version of Kemperman's theorem, which not
only lower bounds $AB$, but gives many elements of $AB$ some multiplicity:
Theorem 3.2.3. Let $G$ be a compact connected group, with a Haar probability measure $\mu$. Let $A, B$ be compact subsets of $G$. Then for any $0 \leq t \leq
\min(\mu(A), \mu(B))$, one has
$$\int_G \min(1_A * 1_B, t)\, d\mu \geq t \min(\mu(A) + \mu(B) - t, 1). \tag{3.1}$$
Indeed, Theorem 3.2.1 can be deduced from Theorem 3.2.3 by dividing
(3.1) by $t$ and then taking limits as $t \to 0$. The bound in (3.1) is sharp, as
can again be seen by considering the case when $A, B$ are arcs in a circle. The
analogous claim for cyclic groups of prime order was established by Pollard
[Po1974], and for general abelian groups by Green and Ruzsa [GrRu2005].
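The cyclic group analogue mentioned above is easy to test by brute force. With counting measure on $\mathbb{Z}/p\mathbb{Z}$ in place of Haar probability measure, and after clearing denominators, the inequality reads $\sum_x \min(r(x), t) \geq t \min(|A| + |B| - t, p)$ for integers $1 \leq t \leq \min(|A|, |B|)$, where $r(x)$ is the number of representations $x = a + b$ with $a \in A$, $b \in B$. The sketch below (function names are illustrative) checks this exhaustively for $p = 5$; it is a numerical sanity check of the discrete statement, not of Theorem 3.2.3 itself.

```python
from itertools import combinations

def representation_counts(A, B, p):
    """r[x] = number of pairs (a, b) in A x B with a + b = x mod p."""
    r = [0] * p
    for a in A:
        for b in B:
            r[(a + b) % p] += 1
    return r

def pollard_holds(A, B, p):
    """Check sum_x min(r(x), t) >= t * min(|A| + |B| - t, p) for all admissible t."""
    r = representation_counts(A, B, p)
    for t in range(1, min(len(A), len(B)) + 1):
        if sum(min(c, t) for c in r) < t * min(len(A) + len(B) - t, p):
            return False
    return True

def nonempty_subsets(p):
    """All non-empty subsets of {0, ..., p-1}, as tuples."""
    for size in range(1, p + 1):
        yield from combinations(range(p), size)

p = 5
assert all(pollard_holds(A, B, p)
           for A in nonempty_subsets(p) for B in nonempty_subsets(p))
```

For intervals (the discrete counterpart of arcs) the inequality is an equality: for instance, with $p = 7$, $A = \{0, 1, 2\}$ and $B = \{0, 1\}$ one has $r = (1, 2, 2, 1, 0, 0, 0)$, giving $4 = 1 \cdot 4$ for $t = 1$ and $6 = 2 \cdot 3$ for $t = 2$.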
Let us now prove Theorem 3.2.3. It uses a submodularity argument
related to some arguments of Hamidoune [Ha2010], [Ta2012b]. We fix $B$
and $t$ with $0 \leq t \leq \mu(B)$, and define the quantity
$$c(A) := \int_G \min(1_A * 1_B, t)\, d\mu - t(\mu(A) + \mu(B) - t)$$
for any compact set $A$. Our task is to establish that $c(A) \geq 0$ whenever
$t \leq \mu(A) \leq 1 - \mu(B) + t$.
We first verify the extreme cases. If $\mu(A) = t$, then $1_A * 1_B \leq t$, and so
$c(A) = 0$ in this case (since $\int_G 1_A * 1_B\, d\mu = \mu(A)\mu(B) = t\mu(B)$). At the other extreme, if $\mu(A) = 1 - \mu(B) + t$, then
from the inclusion-exclusion principle we see that $1_A * 1_B \geq t$, and so again
$c(A) = 0$ in this case (since the integrand is then identically $t$, and $\mu(A) + \mu(B) - t = 1$).
To handle the intermediate regime when $\mu(A)$ lies between $t$ and $1 -
\mu(B) + t$, we rely on the submodularity inequality
$$c(A_1) + c(A_2) \geq c(A_1 \cap A_2) + c(A_1 \cup A_2) \tag{3.2}$$
for arbitrary compact $A_1, A_2$. This inequality comes from the obvious
pointwise identity
$$1_{A_1} + 1_{A_2} = 1_{A_1 \cap A_2} + 1_{A_1 \cup A_2}$$
whence
$$1_{A_1} * 1_B + 1_{A_2} * 1_B = 1_{A_1 \cap A_2} * 1_B + 1_{A_1 \cup A_2} * 1_B$$
and thus (noting that the quantities on the left are closer to each other than
the quantities on the right)
$$\min(1_{A_1} * 1_B, t) + \min(1_{A_2} * 1_B, t) \geq \min(1_{A_1 \cap A_2} * 1_B, t) + \min(1_{A_1 \cup A_2} * 1_B, t)$$
at which point (3.2) follows by integrating over $G$ and then using the inclusion-exclusion principle.
Now introduce the function
f (a) := inf{c(A) : (A) = a}
for t a 1 (B) + t. From the preceding discussion f (a) vanishes at the
endpoints a = t, 1 (B) + t; our task is to show that f (a) is non-negative
in the interior region t < a < 1 (B) + t. Suppose for contradiction that
this was not the case. It is easy to see that f is continuous (indeed, it is
even Lipschitz continuous), so there must be t < a < 1 (B) + t at which

3.2. A quantitative Kemperman theorem

53

f is a local minimum and not locally constant. In particular, 0 < a < 1.


But for any $A$ with $\mu(A) = a$, we have the translation-invariance
$$c(gA) = c(A) \tag{3.3}$$
for any $g \in G$, and hence by (3.2)
$$c(A) \geq \frac{1}{2} c(A \cup gA) + \frac{1}{2} c(A \cap gA).$$
Note that $\mu(A \cap gA)$ depends continuously on $g$, equals $a$ when $g$ is the identity, and has an average value of $a^2$. As $G$ is connected, we thus see from the intermediate value theorem that for any $0 < \varepsilon < a - a^2$, we can find $g$ such that $\mu(A \cap gA) = a - \varepsilon$, and thus by inclusion-exclusion $\mu(A \cup gA) = a + \varepsilon$. By definition of $f$, we thus have
$$c(A) \geq \frac{1}{2} f(a - \varepsilon) + \frac{1}{2} f(a + \varepsilon).$$
Taking infima in $A$ (and noting that the hypotheses on $\varepsilon$ are independent of $A$) we conclude that
$$f(a) \geq \frac{1}{2} f(a - \varepsilon) + \frac{1}{2} f(a + \varepsilon)$$
for all $0 < \varepsilon < a - a^2$. As $f$ attains a local minimum at $a$ and $\varepsilon$ is arbitrarily small, this implies that $f$ is locally constant, a contradiction. This establishes Theorem 3.2.3.
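The final implication can be spelled out in one line. The following is only a sketch of that step, using nothing beyond the midpoint inequality just derived and the fact that $a$ is a local minimum of $f$ (so that $f(a \pm \varepsilon) \geq f(a)$ for all sufficiently small $\varepsilon > 0$):

```latex
\[
f(a) \;\geq\; \tfrac{1}{2} f(a - \varepsilon) + \tfrac{1}{2} f(a + \varepsilon)
     \;\geq\; \tfrac{1}{2} f(a) + \tfrac{1}{2} f(a) \;=\; f(a).
\]
% Equality must therefore hold throughout, which forces
\[
f(a - \varepsilon) \;=\; f(a + \varepsilon) \;=\; f(a)
\qquad \text{for all sufficiently small } \varepsilon > 0 .
\]
```

Thus $f$ would be constant on a neighbourhood of $a$, which is exactly what the choice of $a$ excluded.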
We observe the following corollary:
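Corollary 3.2.4. Let $G$ be a compact connected group, with a Haar probability measure $\mu$. Let $A, B, C$ be compact subsets of $G$, and let $\delta := \min(\mu(A), \mu(B), \mu(C))$. Then one has the pointwise estimate
$$1_A * 1_B * 1_C \geq \frac{1}{4} (\mu(A) + \mu(B) + \mu(C) - 1)_+^2$$
if $\mu(A) + \mu(B) + \mu(C) - 1 \leq 2\delta$, and
$$1_A * 1_B * 1_C \geq \delta (\mu(A) + \mu(B) + \mu(C) - 1 - \delta)$$
if $\mu(A) + \mu(B) + \mu(C) - 1 \geq 2\delta$.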
Once again, the bounds are completely sharp, as can be seen by computing $1_A * 1_B * 1_C$ when $A, B, C$ are arcs of a circle. For groups $G$ which are
quasirandom (which means that they have no small-dimensional non-trivial
representations, and are thus in some sense highly non-abelian), one can do
much better than these bounds [Go2008]; thus, the abelian case is morally
the worst case here, although it seems difficult to convert this intuition into
a rigorous reduction.
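The sharpness claim for arcs can be explored numerically. The sketch below is only an illustration under a stated assumption: the circle is replaced by the cyclic group $\mathbf{Z}/N$ with uniform probability measure (which is not connected, so the corollary does not literally apply to it), and the helper names `arc`, `convolve`, `corollary_bound` and `min_triple_convolution` are hypothetical, chosen for this example.

```python
def arc(n, k):
    """Indicator function of the arc {0, ..., k-1} inside the cyclic group Z/n."""
    return [1.0 if j < k else 0.0 for j in range(n)]


def convolve(f, g):
    """Convolution on Z/n with respect to the uniform probability measure."""
    n = len(f)
    return [sum(f[y] * g[(x - y) % n] for y in range(n)) / n for x in range(n)]


def corollary_bound(a, b, c):
    """Right-hand side of the two-case lower bound for measures a, b, c."""
    s = a + b + c - 1.0
    d = min(a, b, c)
    if s <= 2 * d:
        return 0.25 * max(s, 0.0) ** 2
    return d * (s - d)


def min_triple_convolution(n, ka, kb, kc):
    """Smallest value of 1_A * 1_B * 1_C for arcs with ka, kb, kc points in Z/n."""
    triple = convolve(convolve(arc(n, ka), arc(n, kb)), arc(n, kc))
    return min(triple)


N = 240  # discretisation parameter (an assumption of this sketch)
for ka, kb, kc in [(120, 120, 120), (216, 216, 48)]:
    lowest = min_triple_convolution(N, ka, kb, kc)
    bound = corollary_bound(ka / N, kb / N, kc / N)
    print(f"measures {ka / N:.2f}, {kb / N:.2f}, {kc / N:.2f}: "
          f"min = {lowest:.6f}, bound = {bound:.6f}")
```

The two parameter choices exercise the two regimes of the corollary (the quadratic bound and the linear one); in both, the discrete minimum agrees with the bound up to rounding.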

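Proof. By cyclic permutation we may take $\delta = \mu(C)$. For any
$$(\mu(A) + \mu(B) - 1)_+ \leq t \leq \min(\mu(A), \mu(B)),$$
we can bound
$$\begin{aligned}
1_A * 1_B * 1_C &\geq \min(1_A * 1_B, t) * 1_C \\
&\geq \int_G \min(1_A * 1_B, t)\, d\mu - t(1 - \mu(C)) \\
&\geq t(\mu(A) + \mu(B) - t) - t(1 - \mu(C)) \\
&= t(\mu(A) + \mu(B) + \mu(C) - 1 - t)
\end{aligned}$$
where we used Theorem 3.2.3 to obtain the third line. Optimising in $t$, we obtain the claim.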
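The optimisation in $t$ is a one-variable calculation. The following is a sketch of it, with the abbreviations $S$ and $\varphi$ introduced here purely for convenience (they do not appear in the original argument), and with $\delta = \mu(C)$ as in the proof:

```latex
% Abbreviations (introduced for this sketch only):
%   S := \mu(A) + \mu(B) + \mu(C) - 1,   \varphi(t) := t (S - t),   \delta = \mu(C).
% Since \mu(A) + \mu(B) - 1 = S - \delta, the admissible range of t is
\[
(S - \delta)_+ \;\leq\; t \;\leq\; \min(\mu(A), \mu(B)).
\]
% Case S \leq 2\delta (with S > 0; for S \leq 0 the claimed bound is trivial):
% the vertex t = S/2 of the parabola is admissible, because
% S/2 \geq S - \delta and S/2 \leq \delta \leq \min(\mu(A), \mu(B)). Hence
\[
\varphi(S/2) \;=\; \tfrac{1}{4} S^2 .
\]
% Case S \geq 2\delta: now S - \delta \geq S/2, so \varphi is decreasing on the
% admissible range and the best choice is the left endpoint t = S - \delta:
\[
\varphi(S - \delta) \;=\; \delta\,(S - \delta).
\]
```

These two values are exactly the two bounds stated in Corollary 3.2.4.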


Chapter 4

Analysis

4.1. The Fredholm alternative


In one of my recent papers [RoTa2011], we needed to use the Fredholm
alternative in functional analysis:
Theorem 4.1.1 (Fredholm alternative). Let $X$ be a Banach space, let $T : X \to X$ be a compact operator (that is, a bounded linear operator that maps bounded sets to precompact sets), and let $\lambda \in \mathbf{C}$ be non-zero. Then exactly one of the following statements holds:
(1) (Eigenvalue) There is a non-trivial solution $x \in X$ to the equation $Tx = \lambda x$.
(2) (Bounded resolvent) The operator $T - \lambda$ has a bounded inverse $(T - \lambda)^{-1}$ on $X$.
Among other things, the Fredholm alternative can be used to establish
the spectral theorem for compact operators. A hypothesis such as compactness is necessary; the shift operator $U$ on $\ell^2(\mathbf{Z})$, for instance, has no eigenfunctions, but $U - z$ is not invertible for any unit complex number $z$. The claim is also false when $\lambda = 0$; consider for instance the multiplication operator $Tf(n) := \frac{1}{n} f(n)$ on $\ell^2(\mathbf{N})$, which is compact and has no eigenvalue at zero, but is not invertible.
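The properties claimed for the multiplication operator can each be checked by hand. The following sketch does so, writing $e_n$ for the standard basis vectors of $\ell^2(\mathbf{N})$ and $T_N$ for the truncations of $T$ (both notations are introduced here for illustration only):

```latex
% No lower bound at \lambda = 0:
\[
T e_n = \tfrac{1}{n}\, e_n, \qquad \|T e_n\| = \tfrac{1}{n} \to 0 \quad (n \to \infty),
\]
% so no estimate \|Tx\| \geq c \|x\| with c > 0 can hold.
% Compactness: the finite rank truncations
\[
T_N f(n) := \tfrac{1}{n} f(n)\, 1_{n \leq N}
\qquad \text{satisfy} \qquad
\|T - T_N\|_{op} = \tfrac{1}{N+1} \to 0 .
\]
% No eigenvalue at zero: Tf = 0 forces f(n) = 0 for every n.
% Failure of surjectivity: g(n) := 1/n lies in \ell^2(\mathbf{N}), but Tf = g
% forces f(n) = 1 for all n, and this f is not square-summable.
```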
In this section we present a proof of the Fredholm alternative (first discovered by MacCleur-Hulland [MaHu2008] and by Uuye [Uu2010]) in the
case of approximable operators, which are a special subclass of compact operators that are the limit of finite rank operators in the uniform topology.
Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property¹ that implies (by a result of Grothendieck [Gr1955]) that
all compact operators on that space are approximable. For instance, if X
is a Hilbert space, then any compact operator is approximable, because any
compact set can be approximated by a finite-dimensional subspace, and in
a Hilbert space, the orthogonal projection operator to a subspace is always
a contraction. In more general Banach spaces, finite-dimensional subspaces
are still complemented, but the operator norm of the projection can be large.
Indeed, there are examples of Banach spaces for which the approximation
property fails; the first such examples were discovered by Enflo [En1973],
and a subsequent paper by Alexander [Al1974] demonstrated the existence
of compact operators in certain Banach spaces that are not approximable.
We also give two more traditional proofs of the Fredholm alternative,
not requiring the operator to be approximable, which are based on the Riesz
lemma and a continuity argument respectively.
4.1.1. First proof (approximable case only). In the finite-dimensional case, the Fredholm alternative is an immediate consequence of the rank-nullity theorem, and the finite rank case can be easily deduced from the finite dimensional case by some routine algebraic manipulation. The main difficulty in proving the alternative is to be able to take limits and deduce the approximable case from the finite rank case. The key idea of the proof is to use the approximable property to establish a lower bound on $T - \lambda I$ that is stable enough to allow one to take such limits.
Fix a non-zero $\lambda$. It is clear that $T$ cannot have both an eigenvalue and bounded resolvent at $\lambda$, so now suppose that $T$ has no eigenvalue at $\lambda$, thus $T - \lambda$ is injective. We claim that this implies a lower bound:
Lemma 4.1.2 (Lower bound). Let $\lambda \in \mathbf{C}$ be non-zero, and suppose that $T : X \to X$ is a compact operator that has no eigenvalue at $\lambda$. Then there exists $c > 0$ such that $\|(T - \lambda)x\| \geq c\|x\|$ for all $x \in X$.
Proof. By homogeneity, it suffices to establish the claim for unit vectors $x$. Suppose this is not the case; then we can find a sequence of unit vectors $x_n$ such that $(T - \lambda)x_n$ converges strongly to zero. Since $\lambda x_n$ has norm bounded away from zero (here we use the non-zero nature of $\lambda$), we conclude in particular that $y_n := Tx_n$ has norm bounded away from zero for sufficiently large $n$. By compactness of $T$, we may (after passing to a subsequence) assume that the $y_n$ converge strongly to a limit $y$, which is thus also non-zero.
On the other hand, applying the bounded operator $T$ to the strong convergence $(T - \lambda)x_n \to 0$ (and using the fact that $T$ commutes with $T - \lambda$) we see that $(T - \lambda)y_n$ converges strongly to $0$. Since $y_n$ converges strongly to $y$, we conclude that $(T - \lambda)y = 0$, and thus we have an eigenvalue of $T$ at $\lambda$, contradiction.

¹ The approximation property has many formulations; one of them is that the identity operator is the limit of a sequence of finite rank operators in the strong operator topology.

Remark 4.1.3. Note that this argument is ineffective in that it provides no explicit value of $c$ (and thus no explicit upper bound for the operator norm of the resolvent $(T - \lambda)^{-1}$). This is not surprising, given that the fact that $T$ has no eigenvalue at $\lambda$ is an open condition rather than a closed one, and so one does not expect bounds that utilise this condition to be uniform. (Indeed, the resolvent needs to blow up as one approaches the spectrum of $T$.)
From the lower bound, we see that to prove the bounded invertibility of $T - \lambda$, it will suffice to establish surjectivity. (Of course, we could have also obtained this reduction by using the open mapping theorem.) In other words, we need to establish that the range $\mathrm{Ran}(T - \lambda)$ of $T - \lambda$ is all of $X$.
Let us first deal with the easy case when $T$ has finite rank, so that $\mathrm{Ran}(T)$ has some finite dimension $n$. This implies that the kernel $\mathrm{Ker}(T)$ has codimension $n$, and we may thus split $X = \mathrm{Ker}(T) + Y$ for some $n$-dimensional space $Y$. The operator $T - \lambda$ is a non-zero multiple of the identity on $\mathrm{Ker}(T)$, and so $\mathrm{Ran}(T - \lambda)$ already contains $\mathrm{Ker}(T)$. On the other hand, the operator $T(T - \lambda)$ maps the $n$-dimensional space $Y$ to the $n$-dimensional space $\mathrm{Ran}(T)$ injectively (since $Y$ avoids $\mathrm{Ker}(T)$ and $T - \lambda$ is injective), and thus also surjectively (by the rank-nullity theorem). Thus $T(\mathrm{Ran}(T - \lambda))$ contains $\mathrm{Ran}(T)$, and thus (by the short exact sequence $0 \to \mathrm{Ker}(T) \to X \to \mathrm{Ran}(T) \to 0$) $\mathrm{Ran}(T - \lambda)$ is in fact all of $X$, as desired.
Finally, we deal with the case when $T$ is approximable, so that $T$ is the limit in operator norm of finite rank operators $S_n$. The lower bound in Lemma 4.1.2 is stable, and will extend to the finite rank operators $S_n$ for $n$ large enough (after reducing $c$ slightly). By the preceding discussion for the finite rank case, we see that $\mathrm{Ran}(S_n - \lambda)$ is all of $X$. Using Lemma 4.1.2 for $S_n$, and the convergence of $S_n$ to $T$ in the operator norm topology, we conclude that $\mathrm{Ran}(T - \lambda)$ is dense in $X$. On the other hand, we observe that the space $\mathrm{Ran}(T - \lambda)$ is necessarily closed, for if $(T - \lambda)x_n$ converges to a limit $y$, then (by Lemma 4.1.2 and the assumption that $X$ is Banach) $x_n$ will also converge to some limit $x$, and so $y = (T - \lambda)x$. As $\mathrm{Ran}(T - \lambda)$ is now both dense and closed, it must be all of $X$, and the claim follows.
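The stability of the lower bound and the density of the range can be made explicit. The following is a sketch under the notation of this subsection, where $S_n$ denotes finite rank operators with $\|T - S_n\|_{op} \to 0$ and $c$ is the constant from Lemma 4.1.2; the specific threshold $c/2$ is a convenient choice made here, not one fixed by the text:

```latex
% Stability: once \|T - S_n\|_{op} \leq c/2, the triangle inequality gives
\[
\|(S_n - \lambda)x\| \;\geq\; \|(T - \lambda)x\| - \|(T - S_n)x\|
\;\geq\; \tfrac{c}{2}\, \|x\| \qquad \text{for all } x \in X .
\]
% In particular S_n - \lambda is injective, hence (finite rank case) surjective,
% with \|(S_n - \lambda)^{-1}\|_{op} \leq 2/c.
% Density: given y \in X, set x_n := (S_n - \lambda)^{-1} y. Then
\[
\|(T - \lambda)x_n - y\| \;=\; \|(T - S_n)x_n\|
\;\leq\; \|T - S_n\|_{op}\, \tfrac{2}{c}\, \|y\| \;\to\; 0 ,
\]
% so y lies in the closure of \mathrm{Ran}(T - \lambda).
```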
4.1.2. Second proof. We now give the standard proof of the Fredholm
alternative based on the Riesz lemma:

Lemma 4.1.4 (Riesz lemma). If $Y$ is a proper closed subspace of a Banach space $X$, and $\varepsilon > 0$, then there exists a unit vector $x$ whose distance $\mathrm{dist}(x, Y)$ to $Y$ is at least $1 - \varepsilon$.
Proof. By the Hahn-Banach theorem, one can find a non-trivial bounded linear functional $\phi : X \to \mathbf{C}$ on $X$ which vanishes on $Y$. By definition of the operator norm $\|\phi\|_{op}$ of $\phi$, one can find a unit vector $x$ such that $|\phi(x)| \geq (1 - \varepsilon)\|\phi\|_{op}$. The claim follows.
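The last step of the proof is brief; one way to fill it in (a sketch, using only that $\phi$ vanishes on $Y$ and that $\|\phi\|_{op} > 0$) is as follows:

```latex
% For any y \in Y we have \phi(y) = 0, hence
\[
(1 - \varepsilon)\, \|\phi\|_{op} \;\leq\; |\phi(x)| \;=\; |\phi(x - y)|
\;\leq\; \|\phi\|_{op}\, \|x - y\| .
\]
% Dividing by \|\phi\|_{op} > 0 and taking the infimum over y \in Y:
\[
\mathrm{dist}(x, Y) \;=\; \inf_{y \in Y} \|x - y\| \;\geq\; 1 - \varepsilon .
\]
```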

The strategy here is not to use finite rank approximations (as they are
no longer available), but instead to try to contradict the compactness of T
by exhibiting a bounded set whose image under T is not totally bounded.
Let $T : X \to X$ be a compact operator on a Banach space, and let $\lambda$ be a non-zero complex number such that $T$ has no eigenvalue at $\lambda$. As in the first proof, we have the lower bound from Lemma 4.1.2, and we know that $\mathrm{Ran}(T - \lambda)$ is a closed subspace of $X$; in particular, the map $T - \lambda$ is a Banach space isomorphism from $X$ to $\mathrm{Ran}(T - \lambda)$. Our objective is again to show that $\mathrm{Ran}(T - \lambda)$ is all of $X$.
Suppose for contradiction that $\mathrm{Ran}(T - \lambda)$ is a proper closed subspace of $X$. Applying the Banach space isomorphism $T - \lambda$ repeatedly, we conclude that for every natural number $m$, the space $V_{m+1} := \mathrm{Ran}((T - \lambda)^{m+1})$ is a proper closed subspace of $V_m := \mathrm{Ran}((T - \lambda)^m)$. From the Riesz lemma, we may thus find unit vectors $x_m$ in $V_m$ for $m = 0, 1, 2, \ldots$ whose distance to $V_{m+1}$ is at least $1/2$ (say).
Now suppose that $n > m \geq 0$. By construction, $x_n$, $(T - \lambda)x_n$, $(T - \lambda)x_m$ all lie in $V_{m+1}$, and thus $Tx_n - Tx_m \in -\lambda x_m + V_{m+1}$. Since $x_m$ lies at a distance at least $1/2$ from $V_{m+1}$, we conclude the separation property
$$\|Tx_n - Tx_m\| \geq \frac{|\lambda|}{2}.$$
But this implies that the sequence $\{Tx_n : n \in \mathbf{N}\}$ is not totally bounded, contradicting the compactness of $T$.
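The separation estimate can be unpacked as follows. This is a sketch of the algebra only; the auxiliary vector $v$ is introduced here for illustration:

```latex
% Write T = (T - \lambda) + \lambda and regroup:
\[
T x_n - T x_m \;=\; (T - \lambda)x_n - (T - \lambda)x_m + \lambda x_n - \lambda x_m
\;=\; -\lambda\, (x_m - v),
\]
\[
v \;:=\; x_n + \lambda^{-1}\bigl( (T - \lambda)x_n - (T - \lambda)x_m \bigr) \;\in\; V_{m+1},
\]
% since x_n \in V_n \subset V_{m+1} (as n \geq m+1), while (T - \lambda)x_n and
% (T - \lambda)x_m both lie in V_{m+1}. Consequently
\[
\|T x_n - T x_m\| \;=\; |\lambda|\, \|x_m - v\|
\;\geq\; |\lambda|\, \mathrm{dist}(x_m, V_{m+1}) \;\geq\; \frac{|\lambda|}{2} .
\]
```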

4.1.3. Third proof. Now we give another textbook proof of the Fredholm alternative, based on Fredholm index theory. The basic idea is to observe that the Fredholm alternative is easy when $\lambda$ is large enough (and specifically, when $|\lambda| > \|T\|_{op}$), as one can then invert $T - \lambda$ using Neumann series. One can then attempt to continuously perturb $\lambda$ from large values to small values, using stability results (such as Lemma 4.1.2) to ensure that invertibility does not suddenly get destroyed during this process. Unfortunately, there is an obstruction to this strategy, which is that during the perturbation process, $\lambda$ may pass through an eigenvalue of $T$. To get around this, we will need to abandon the hypothesis that $T$ has no eigenvalue at $\lambda$, and work in the more general setting in which $\mathrm{Ker}(T - \lambda)$ is allowed to be non-trivial. This leads to a lengthier proof, but one which lays the foundation for much of Fredholm theory (which is more powerful than the Fredholm alternative alone).
Fortunately, we still have analogues of much of the above theory in this
setting:
Proposition 4.1.5. Let $\lambda \in \mathbf{C}$ be non-zero, and let $T : X \to X$ be a compact operator on a Banach space $X$. Then the following statements hold:
(1) (Finite multiplicity) $\mathrm{Ker}(T - \lambda)$ is finite-dimensional.
(2) (Lower bound) There exists $c > 0$ such that $\|(T - \lambda)x\| \geq c\, \mathrm{dist}(x, \mathrm{Ker}(T - \lambda))$ for all $x \in X$.
(3) (Closure) $\mathrm{Ran}(T - \lambda)$ is a closed subspace of $X$.
(4) (Finite comultiplicity) $\mathrm{Ran}(T - \lambda)$ has finite codimension in $X$.
Proof. We begin with finite multiplicity. Suppose for contradiction that $\mathrm{Ker}(T - \lambda)$ was infinite dimensional; then it must contain an infinite nested sequence $\{0\} = V_0 \subsetneq V_1 \subsetneq V_2 \subsetneq \ldots$ of finite-dimensional (and thus closed) subspaces. Applying the Riesz lemma, we may find for each $n = 1, 2, \ldots$, a unit vector $x_n \in V_n$ of distance at least $1/2$ from $V_{n-1}$. Since $Tx_n = \lambda x_n$, we see that the sequence $\{Tx_n : n = 1, 2, \ldots\}$ is then $|\lambda|/2$-separated and thus not totally bounded, contradicting the compactness of $T$.
The lower bound follows from the argument used to prove Lemma 4.1.2 after quotienting out the finite-dimensional space $\mathrm{Ker}(T - \lambda)$, and the closure assertion follows from the lower bound (again after quotienting out the kernel) as before.
Finally, we establish finite comultiplicity. Suppose for contradiction that the closed subspace $\mathrm{Ran}(T - \lambda)$ had infinite codimension; then by properties of $T - \lambda$ already established, we see that $\mathrm{Ran}((T - \lambda)^{m+1})$ is closed and has infinite codimension in $\mathrm{Ran}((T - \lambda)^m)$ for each $m$. One can then argue as in the second proof to contradict total boundedness as before.

Remark 4.1.6. The above arguments also work if $\lambda$ is replaced by an invertible linear operator on $X$, or more generally by a Fredholm operator.
We can now define the index $\mathrm{ind}(T - \lambda)$ to be the dimension of the kernel of $T - \lambda$, minus the codimension of the range. To establish the Fredholm alternative, it suffices to show that $\mathrm{ind}(T - \lambda) = 0$ for all non-zero $\lambda$, as this implies surjectivity of $T - \lambda$ whenever there is no eigenvalue. Note that when $\lambda$ is sufficiently large, and in particular when $|\lambda| > \|T\|_{op}$, then $T - \lambda$ is invertible by Neumann series and so one already has index zero in this case.
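For reference, the Neumann series inversion used here takes the following explicit form when $|\lambda| > \|T\|_{op}$ (a standard computation, written out as a sketch):

```latex
\[
(T - \lambda)^{-1} \;=\; -\lambda^{-1} \bigl( 1 - \lambda^{-1} T \bigr)^{-1}
\;=\; -\sum_{k=0}^{\infty} \lambda^{-k-1}\, T^k ,
\]
% where the series converges absolutely in operator norm because
% \|\lambda^{-1} T\|_{op} < 1, and
\[
\|(T - \lambda)^{-1}\|_{op} \;\leq\; \sum_{k=0}^{\infty} \frac{\|T\|_{op}^k}{|\lambda|^{k+1}}
\;=\; \frac{1}{|\lambda| - \|T\|_{op}} .
\]
```

In particular $T - \lambda$ then has trivial kernel and full range, so both terms in the index vanish.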

To finish the proof, it suffices by the discrete nature of the index function
(which takes values in the integers) to establish continuity of the index:
Lemma 4.1.7 (Continuity of index). Let $T : X \to X$ be a compact operator on a Banach space. Then the function $\lambda \mapsto \mathrm{ind}(T - \lambda)$ is continuous from $\mathbf{C} \setminus \{0\}$ to $\mathbf{Z}$.
Proof. Let $\lambda$ be non-zero. Our task is to show that
$$\mathrm{ind}(T - \lambda') = \mathrm{ind}(T - \lambda)$$
for all $\lambda'$ sufficiently close to $\lambda$.
In the model case when $T - \lambda$ is invertible (and thus has index zero), the claim is easy, because $(T - \lambda')(T - \lambda)^{-1} = 1 + (\lambda - \lambda')(T - \lambda)^{-1}$ can be inverted by Neumann series for $\lambda'$ close enough to $\lambda$, giving rise to the invertibility of $T - \lambda'$.
Now we handle the general case. As every finite dimensional space is complemented, we can split $X = \mathrm{Ker}(T - \lambda) + V$ for some closed subspace $V$ of $X$, and similarly split $X = \mathrm{Ran}(T - \lambda) + W$ for some finite-dimensional subspace $W$ of $X$ with dimension $\mathrm{codim}\, \mathrm{Ran}(T - \lambda)$.
From the lower bound we see that $T - \lambda$ is a Banach space isomorphism from $V$ to $\mathrm{Ran}(T - \lambda)$. For $\lambda'$ close to $\lambda$, we thus see that $(T - \lambda')(V)$ is close to $\mathrm{Ran}(T - \lambda)$, in the sense that one can map the latter space to the former by a small perturbation of the identity (in the operator norm). Since $W$ complements $\mathrm{Ran}(T - \lambda)$, it also complements $(T - \lambda')(V)$ for $\lambda'$ sufficiently close to $\lambda$. (To see this, observe that the composition of the obvious maps
$$X \to W \oplus \mathrm{Ran}(T - \lambda) \to W \oplus V \to W \oplus (T - \lambda')(V) \to X$$
is a small perturbation of the identity map and is thus invertible for $\lambda'$ close to $\lambda$.)
Let $\pi : X \to W$ be the projection onto $W$ with kernel $(T - \lambda')(V)$. Then $\pi(T - \lambda')$ maps the finite-dimensional space $\mathrm{Ker}(T - \lambda)$ to the finite-dimensional space $W$. By the rank-nullity theorem, this map has index equal to $\dim \mathrm{Ker}(T - \lambda) - \dim(W) = \mathrm{ind}(T - \lambda)$. Gluing this with the Banach space isomorphism $T - \lambda' : V \to (T - \lambda')(V)$, we see that $T - \lambda'$ also has index $\mathrm{ind}(T - \lambda)$, as desired.

Remark 4.1.8. Again, this result extends to more general Fredholm operators, with the result being that the index of a Fredholm operator is stable
with respect to continuous deformations in the operator norm topology.

4.2. The inverse function theorem for everywhere differentiable functions

The classical inverse function theorem reads as follows:

Theorem 4.2.1 ($C^1$ inverse function theorem). Let $\Omega \subset \mathbf{R}^n$ be an open set, and let $f : \Omega \to \mathbf{R}^n$ be a continuously differentiable function, such that for every $x_0 \in \Omega$, the derivative map $Df(x_0) : \mathbf{R}^n \to \mathbf{R}^n$ is invertible. Then $f$ is a local homeomorphism; thus, for every $x_0 \in \Omega$, there exists an open neighbourhood $U$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f$ is a homeomorphism from $U$ to $V$.
It is also not difficult to show by inverting the Taylor expansion
$$f(x) = f(x_0) + Df(x_0)(x - x_0) + o(\|x - x_0\|)$$
that at each $x_0$, the local inverses $f^{-1} : V \to U$ are also differentiable at $f(x_0)$ with derivative
$$Df^{-1}(f(x_0)) = Df(x_0)^{-1}. \tag{4.1}$$
The textbook proof of the inverse function theorem proceeds by an application of the contraction mapping theorem. Indeed, one may normalise $x_0 = f(x_0) = 0$ and $Df(0)$ to be the identity map; continuity of $Df$ then shows that $Df(x)$ is close to the identity for small $x$, which may be used (in conjunction with the fundamental theorem of calculus) to make $x \mapsto x - f(x) + y$ a contraction on a small ball around the origin for small $y$, at which point the contraction mapping theorem readily finishes off the problem.
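The contraction mapping scheme just described can be tried out in one dimension. The sketch below is an illustration only: the sample function $f(x) = x + x^2/4$ (normalised so that $f(0) = 0$ and $f'(0) = 1$), the target value $y = 0.1$, and the helper name `solve_by_contraction` are all assumptions made for this example.

```python
import math


def solve_by_contraction(f, y, x0=0.0, tol=1e-14, max_iter=200):
    """Solve f(x) = y by iterating the map x -> x - f(x) + y from x0.

    The iteration converges when this map is a contraction near the solution,
    as is the case when f'(x) is close to 1 there.
    """
    x = x0
    for _ in range(max_iter):
        x_next = x - f(x) + y
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed point iteration did not converge")


def f(x):
    """Sample function with f(0) = 0 and f'(0) = 1 (chosen for illustration)."""
    return x + 0.25 * x * x


y = 0.1
root = solve_by_contraction(f, y)
exact = 2.0 * (math.sqrt(1.0 + y) - 1.0)  # closed-form solution of x + x^2/4 = y
print(root, exact, f(root))
```

Here the iterated map is $x \mapsto y - x^2/4$, whose derivative has magnitude at most $1/2$ on $[-1, 1]$, so the iteration contracts. For an everywhere differentiable $f$ with discontinuous derivative no such contraction estimate is available, which is why the argument has to change.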
Less well known is the fact that the hypothesis of continuous differentiability may be relaxed to just everywhere differentiability:
Theorem 4.2.2 (Everywhere differentiable inverse function theorem). Let $\Omega \subset \mathbf{R}^n$ be an open set, and let $f : \Omega \to \mathbf{R}^n$ be an everywhere differentiable function, such that for every $x_0 \in \Omega$, the derivative map $Df(x_0) : \mathbf{R}^n \to \mathbf{R}^n$ is invertible. Then $f$ is a local homeomorphism; thus, for every $x_0 \in \Omega$, there exists an open neighbourhood $U$ of $x_0$ and an open neighbourhood $V$ of $f(x_0)$ such that $f$ is a homeomorphism from $U$ to $V$.
As before, one can recover the differentiability of the local inverses, with
the derivative of the inverse given by the usual formula (4.1).
This result implicitly follows from the more general results of Cernavskii
[Ce1964] about the structure of finite-to-one open and closed maps, however the arguments there are somewhat complicated (and subsequent proofs
of those results, such as the one in [Va1966], use some powerful tools from
algebraic topology, such as dimension theory). There is however a more
elementary proof of Saint Raymond [Ra2002] that was pointed out to me
by Julien Melleray. It only uses basic point-set topology (for instance, the concept of a connected component) and the basic topological and geometric structure of Euclidean space (in particular relying primarily on local compactness, local connectedness, and local convexity). I decided to present (an arrangement of) Saint Raymond's proof here.
To obtain a local homeomorphism near $x_0$, there are basically two things to show: local surjectivity near $x_0$ (thus, for $y$ near $f(x_0)$, one can solve $f(x) = y$ for some $x$ near $x_0$) and local injectivity near $x_0$ (thus, for distinct $x_1, x_2$ near $x_0$, $f(x_1)$ is not equal to $f(x_2)$). Local surjectivity is relatively
easy; basically, the standard proof of the inverse function theorem works
here, after replacing the contraction mapping theorem (which is no longer
available due to the possibly discontinuous nature of Df ) with the Brouwer
fixed point theorem instead (or one could also use degree theory, which is more
or less an equivalent approach). The difficulty is local injectivity - one needs
to preclude the existence of nearby points x1 , x2 with f (x1 ) = f (x2 ) = y;
note that in contrast to the contraction mapping theorem that provides both
existence and uniqueness of fixed points, the Brouwer fixed point theorem
only gives existence and not uniqueness.
In one dimension $n = 1$ one can proceed by using Rolle's theorem. Indeed, as one traverses the interval from $x_1$ to $x_2$, one must encounter some intermediate point $x_*$ which maximises the quantity $|f(x_*) - y|$, and which is thus instantaneously non-increasing both to the left and to the right of $x_*$. But, by hypothesis, $f'(x_*)$ is non-zero, and this easily leads to a contradiction.
Saint Raymond's argument for the higher dimensional case proceeds in a broadly similar way. Starting with two nearby points $x_1, x_2$ with $f(x_1) = f(x_2) = y$, one finds a point $x_*$ which "locally extremises" $\|f(x_*) - y\|$ in the following sense: $\|f(x_*) - y\|$ is equal to some $r_* > 0$, but $x_*$ is adherent to at least two distinct connected components $U_1, U_2$ of the set $U = \{x : \|f(x) - y\| < r_*\}$. (This is an oversimplification, as one has to restrict the available points $x$ in $U$ to a suitably small compact set, but let us ignore this technicality for now.) Note from the non-degenerate nature of $Df(x_*)$ that $x_*$ was already adherent to $U$; the point is that $x_*$ "disconnects" $U$ in some sense. Very roughly speaking, the way such a critical point $x_*$ is found is to look at the sets $\{x : \|f(x) - y\| \leq r\}$ as $r$ shrinks from a large initial value down to zero, and one finds the first value of $r_*$ below which this set disconnects $x_1$ from $x_2$. (Morally, one is performing some sort of Morse theory here on the function $x \mapsto \|f(x) - y\|$, though this function does not have anywhere near enough regularity for classical Morse theory to apply.)
The point $x_*$ is mapped to a point $f(x_*)$ on the boundary $\partial B(y, r_*)$ of the ball $B(y, r_*)$, while the components $U_1, U_2$ are mapped to the interior of this ball. By using a continuity argument, one can show (again very roughly speaking) that $f(U_1)$ must contain a "hemispherical" neighbourhood $\{z \in B(y, r_*) : \|z - f(x_*)\| < \kappa\}$ of $f(x_*)$ inside $B(y, r_*)$, and similarly for $f(U_2)$. But then from differentiability of $f$ at $x_*$, one can then show that $U_1$ and $U_2$ overlap near $x_*$, giving a contradiction.
We now give the rigorous argument. Fix $x_0 \in \Omega$. By a translation, we may assume $x_0 = f(x_0) = 0$; by a further linear change of variables, we may also assume $Df(0)$ (which by hypothesis is non-singular) to be the identity map. By differentiability, we have
$$f(x) = x + o(\|x\|)$$
as $x \to 0$. In particular, there exists a ball $B(0, r_0)$ in $\Omega$ such that
$$\|f(x) - x\| < \frac{1}{2}\|x\|$$
for all $x \in B(0, r_0)$; by rescaling we may take $r_0 = 1$, thus
$$\|f(x) - x\| < \frac{1}{2}\|x\| \hbox{ whenever } \|x\| \leq 1. \tag{4.2}$$
Among other things, this gives a uniform lower bound
$$\|f(x)\| > \frac{1}{2} \tag{4.3}$$
for all $x \in \partial B(0, 1)$, and a uniform upper bound
$$\|f(x)\| < \frac{1}{10} \tag{4.4}$$
for all $x \in B(0, \frac{1}{20})$; thus $f$ maps $B(0, \frac{1}{20})$ to $B(0, \frac{1}{10})$.
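Both bounds are direct consequences of (4.2) and the triangle inequality; the following sketch records the two computations:

```latex
% On the unit sphere \|x\| = 1, the reverse triangle inequality and (4.2) give
\[
\|f(x)\| \;\geq\; \|x\| - \|f(x) - x\| \;>\; 1 - \tfrac{1}{2} \;=\; \tfrac{1}{2},
\]
% which is (4.3). For \|x\| \leq 1/20, the triangle inequality and (4.2) give
\[
\|f(x)\| \;\leq\; \|x\| + \|f(x) - x\| \;\leq\; \tfrac{3}{2}\, \|x\|
\;\leq\; \tfrac{3}{40} \;<\; \tfrac{1}{10},
\]
% which is (4.4).
```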
Proposition 4.2.3 (Local surjectivity). For any $0 < r < 1$, $f(B(0, r))$ contains $B(0, r/2)$.
Proof. Let $y \in B(0, r/2)$. From (4.2), we see that the map $f : \partial B(0, r) \to f(\partial B(0, r))$ avoids $y$, and has degree $1$ around $y$; contracting $\partial B(0, r)$ to a point, we conclude that $f(x) = y$ for some $x \in B(0, r)$, yielding the claim.
Alternatively, one may proceed by invoking the Brouwer fixed point theorem, noting that the map $x \mapsto x - f(x) + y$ is continuous and maps the closed ball $\overline{B(0, r)}$ to the open ball $B(0, r)$ by (4.2), and has a fixed point precisely when $f(x) = y$.
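The self-mapping property needed for the Brouwer fixed point theorem follows from (4.2) in one line. As a sketch, write $\Phi(x) := x - f(x) + y$ (a name introduced here for convenience); then for $\|x\| \leq r$ and $\|y\| < r/2$:

```latex
\[
\|\Phi(x)\| \;=\; \|x - f(x) + y\| \;\leq\; \|f(x) - x\| + \|y\|
\;\leq\; \tfrac{1}{2}\, \|x\| + \|y\| \;<\; \tfrac{r}{2} + \tfrac{r}{2} \;=\; r ,
\]
% so \Phi maps the closed ball of radius r into the open ball of radius r, and
\[
\Phi(x) = x \quad \Longleftrightarrow \quad f(x) = y .
\]
```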
A third argument (avoiding the use of degree theory or the Brouwer fixed point theorem, but requiring one to replace $B(0, r/2)$ with the slightly smaller ball $B(0, r/3)$) is as follows: let $x \in \overline{B(0, r)}$ minimise $\|f(x) - y\|$. From (4.2) and the hypothesis $y \in B(0, r/3)$ we see that $x$ lies in the interior $B(0, r)$. If the minimum is zero, then we have found a solution to $f(x) = y$ as required; if not, then we have a stationary point of $x \mapsto \|f(x) - y\|$, which implies that $Df(x)$ is degenerate, a contradiction. (One can recover the full ball $B(0, r/2)$ by tweaking the expression $\|f(x) - y\|$ to be minimised in a suitable fashion; we leave this as an exercise for the interested reader.)
Corollary 4.2.4. $f$ is an open map: the image of any open set is open.
Proof. It suffices to show that for every $x \in \Omega$, the image of any open neighbourhood of $x$ is an open neighbourhood of $f(x)$. Proposition 4.2.3 handles the case $x = 0$; the general case follows by renormalising.

Suppose we could show that $f$ is injective on $B(0, \frac{1}{20})$. By Corollary 4.2.4, the inverse map $f^{-1} : f(B(0, \frac{1}{20})) \to B(0, \frac{1}{20})$ is also continuous. Thus $f$ is a homeomorphism from $B(0, \frac{1}{20})$ to $f(B(0, \frac{1}{20}))$, which are both neighbourhoods of $0$ by Proposition 4.2.3; giving the claim.

It remains to establish injectivity. Suppose for sake of contradiction that this was not the case. Then there exist distinct $x_1, x_2 \in B(0, \frac{1}{20})$ and $y \in B(0, \frac{1}{10})$ such that
$$y = f(x_1) = f(x_2).$$
For every radius $r \geq 0$, the set
$$K_r := \{x \in \Omega : \|f(x) - y\| \leq r\}$$
is closed and contains both $x_1$ and $x_2$. Let $K_r^1$ denote the connected component of $K_r$ that contains $x_1$. Since $K_r$ is non-decreasing in $r$, $K_r^1$ is non-decreasing also.
Now let us study the behaviour of $K_r^1$ as $r$ ranges from $0$ to $\frac{4}{10}$. The two extreme cases are easy to analyse:

Lemma 4.2.5. $K_0^1 = \{x_1\}$.
Proof. Since $Df(x_1)$ is non-singular, we see from differentiability that $f(x) \neq f(x_1)$ for all $x \neq x_1$ sufficiently close to $x_1$. Thus $x_1$ is an isolated point of $K_0$, and the claim follows.

Lemma 4.2.6. We have $B(0, \frac{1}{20}) \subset K_r^1 \subset B(0, 1)$ for all $\frac{2}{10} \leq r \leq \frac{4}{10}$. In particular, $K_r^1$ is compact for all $0 \leq r \leq \frac{4}{10}$, and contains $x_2$ for $\frac{2}{10} \leq r \leq \frac{4}{10}$.
Proof. Since $f(B(0, \frac{1}{20})) \subset B(f(0), \frac{1}{10}) \subset B(y, r)$, we see that $B(0, \frac{1}{20}) \subset K_r$; since $B(0, \frac{1}{20})$ is connected and contains $x_1$, we conclude that $B(0, \frac{1}{20}) \subset K_r^1$.
Next, if $x \in \partial B(0, 1)$, then by (4.3) we have $f(x) \notin B(0, \frac{1}{2})$, and hence $f(x) \notin B(y, r)$. Thus $K_r$ is disjoint from the sphere $\partial B(0, 1)$. Since $x_1$ lies in the interior of this sphere we thus have $K_r^1 \subset B(0, 1)$ as required.

Next, we show that the $K_r^1$ increase continuously in $r$:
Lemma 4.2.7. If $0 \leq r < \frac{1}{20}$ and $\varepsilon > 0$, then for $r < r' < \frac{1}{20}$ sufficiently close to $r$, $K_{r'}^1$ is contained in an $\varepsilon$-neighbourhood of $K_r^1$.
Proof. By the finite intersection property, it suffices to show that $\bigcap_{r' > r} K_{r'}^1 = K_r^1$. Suppose for contradiction that there is a point $x$ outside of $K_r^1$ that lies in $K_{r'}^1$ for all $r' > r$. Then $x$ lies in $K_{r'}$ for all $r' > r$, and hence lies in $K_r \cap \overline{B(0, 1)}$. As $x$ and $x_1$ lie in different connected components of the compact set $K_r \cap \overline{B(0, 1)}$ (recall that $K_r$ is disjoint from $\partial B(0, 1)$), there must be a partition of $K_r \cap \overline{B(0, 1)}$ into two disjoint closed sets $F, G$ that separate $x$ from $x_1$ (for otherwise the only clopen sets in $K_r \cap \overline{B(0, 1)}$ that contain $x_1$ would also contain $x$, and their intersection would then be a connected subset of $K_r \cap \overline{B(0, 1)}$ that contains both $x_1$ and $x$, contradicting the fact that $x$ lies outside $K_r^1$). By normality, we may find open neighbourhoods $U, V$ of $F, G$ that are disjoint. On the boundary $\partial U$, one has $\|f(x) - y\| > r$ for all $x \in \partial U$. As $\partial U$ is compact and $f$ is continuous, we thus have $\|f(x) - y\| > r'$ for all $x \in \partial U$ if $r'$ is sufficiently close to $r$. This makes $U \cap K_{r'}$ clopen in $K_{r'}$, and so $x$ cannot lie in $K_{r'}^1$, giving the desired contradiction.

Observe that $K_r^1$ contains $x_2$ for $r \geq \frac{2}{10}$, but does not contain $x_2$ for $r = 0$. By the monotonicity of the $K_r^1$ and least upper bound principle, there must therefore exist a critical $0 \leq r_* \leq \frac{2}{10}$ such that $K_r^1$ contains $x_2$ for all $r > r_*$, but does not contain $x_2$ for $r < r_*$. From Lemma 4.2.7 we see that $K_{r_*}^1$ must also contain $x_2$. In particular, by Lemma 4.2.5, $r_* > 0$.

We now analyse the critical set $K_{r_*}^1$. By construction, this set is connected, compact, contains both $x_1$ and $x_2$, is contained in $B(0, 1)$, and one has $\|f(x) - y\| \leq r_*$ for all $x \in K_{r_*}^1$.
Lemma 4.2.8. The set $U := \{x \in K_{r_*}^1 : \|f(x) - y\| < r_*\}$ is open and disconnected.
Proof. The openness is clear from the continuity of $f$ (and the local connectedness of $\mathbf{R}^n$). Now we show disconnectedness. Being an open subset of $\mathbf{R}^n$, connectedness is equivalent to path connectedness, and $x_1$ and $x_2$ both lie in $U$, so it suffices to show that $x_1$ and $x_2$ cannot be joined by a path $\gamma$ in $U$. But if such a path $\gamma$ existed, then by compactness of $\gamma$ and continuity of $f$, one would have $\gamma \subset K_r$ for some $r < r_*$. This would imply that $x_2 \in K_r^1$, contradicting the minimal nature of $r_*$, and the claim follows.

Lemma 4.2.9. $U$ has at most finitely many connected components.
Proof. Let $U_1$ be a connected component of $U$; then $f(U_1)$ is non-empty and contained in $B(y, r_*)$. As $U$ is open, $U_1$ is also open, and thus by Corollary 4.2.4, $f(U_1)$ is open also.
We claim that $f(U_1)$ is in fact all of $B(y, r_*)$. Suppose this were not the case. As $B(y, r_*)$ is connected, this would imply that $f(U_1)$ is not closed in $B(y, r_*)$; thus there is an element $z$ of $B(y, r_*)$ which is adherent to $f(U_1)$, but does not lie in $f(U_1)$. Thus one may find a sequence $x_n$ in $U_1$ with $f(x_n)$ converging to $z$. By compactness of $K_{r_*}^1$ (which contains $U_1$), we may pass to a subsequence and assume that $x_n$ converges to a limit $x$ in $K_{r_*}^1$; then $f(x) = z$. By continuity, there is thus a ball $B$ centred at $x$ that is mapped to $B(y, r)$ for some $r < r_*$; this implies that $B$ lies in $K_r$ and hence in $K_{r_*}^1$ (since $x \in K_{r_*}^1$) and thence in $U$ (since $r$ is strictly less than $r_*$). As $x$ is adherent to $U_1$ and $B$ is connected, we conclude that $B$ lies in $U_1$. In particular $x$ lies in $U_1$ and so $z = f(x)$ lies in $f(U_1)$, a contradiction.
As $f(U_1)$ is equal to $B(y, r_*)$, we thus see that $U_1$ contains an element of $f^{-1}(\{y\})$. However, each element $x$ of $f^{-1}(\{y\})$ must be isolated since $Df(x)$ is non-singular. By compactness of $K_{r_*}^1$, the set $K_{r_*}^1$ (and hence $U$) thus contains at most finitely many elements of $f^{-1}(\{y\})$, and so there are finitely many components as claimed.

Lemma 4.2.10. Every point in $K_{r_*}^1$ is adherent to $U$ (i.e. $\overline{U} = K_{r_*}^1$).
Proof. If $x \in K_{r_*}^1$, then $\|f(x) - y\| \leq r_*$. If $\|f(x) - y\| < r_*$ then $x \in U$ and we are done, so we may assume $\|f(x) - y\| = r_*$. By differentiability, one has
$$f(x') = f(x) + Df(x)(x' - x) + o(\|x' - x\|)$$
for all $x'$ sufficiently close to $x$. If we choose $x'$ to lie on a ray emanating from $x$ such that $Df(x)(x' - x)$ lies on a ray pointing towards $y$ from $f(x)$ (this is possible as $Df(x)$ is non-singular), we conclude that for all $x'$ sufficiently close to $x$ on this ray, $\|f(x') - y\| < r_*$. Thus all such points $x'$ lie in $K_{r_*}$; since $x$ lies in $K_{r_*}^1$ and the ray is locally connected, we see that all such points $x'$ in fact lie in $K_{r_*}^1$ and thence in $U$. The claim follows.

Corollary 4.2.11. There exists a point $x_* \in K_{r_*}^1$ with $\|f(x_*) - y\| = r_*$ (i.e. $x_*$ lies outside $U$) which is adherent to at least two connected components of $U$.
Proof. Suppose this were not the case; then the closures of all the connected components of $U$ would be disjoint. (Note that an element of one connected component of $U$ cannot lie in the closure of another component.) By Lemma 4.2.10, these closures would form a partition of $K_{r_*}^1$ by closed sets. By Lemma 4.2.8, there are at least two such closed sets, each of which is non-empty; by Lemma 4.2.9, the number of such closed sets is finite. But this contradicts the connectedness of $K_{r_*}^1$.

Next, we prove
Proposition 4.2.12. Let $x_* \in K_{r_*}^1$ be such that $\|f(x_*) - y\| = r_*$, and suppose that $x_*$ is adherent to a connected component $U_1$ of $U$. Let $\xi$ be the vector such that
$$Df(x_*)\xi = y - f(x_*) \tag{4.5}$$
(this vector exists and is non-zero since $Df(x_*)$ is non-singular). Then $U_1$ contains an open ray of the form $\{x_* + t\xi : 0 < t < \varepsilon\}$ for some $\varepsilon > 0$.
This together with Corollary 4.2.11 gives the desired contradiction, since one cannot have two distinct components $U_1, U_2$ both contain a ray from $x_*$ in the direction $\xi$.
Proof. As $f$ is differentiable at $x_*$, we have
$$f(x_* + t\xi) = f(x_*) + t\, Df(x_*)\xi + o(|t|)$$
for all sufficiently small $t$; we rearrange this using (4.5) as
$$f(x_* + t\xi) - y = (1 - t)(f(x_*) - y) + o(|t|).$$
In particular, $f(x_* + t\xi) \in B(y, r_*)$ for all sufficiently small positive $t$. This shows that all sufficiently small open rays $\{x_* + t\xi : 0 < t < \varepsilon\}$ lie in $K_{r_*}$, hence in $K_{r_*}^1$ (since $x_* \in K_{r_*}^1$), and hence in $U$. In fact, the same argument shows that there is a cone
$$\{x_* + t\xi' : 0 < t < \varepsilon; \|\xi' - \xi\| \leq \varepsilon\} \tag{4.6}$$
that will lie in $U$ if $\varepsilon$ is small enough. As this cone is connected, it thus suffices to show that $U_1$ intersects this cone.
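The membership $f(x_* + t\xi) \in B(y, r_*)$ for small positive $t$ follows from the rearranged Taylor expansion together with $\|f(x_*) - y\| = r_* > 0$; as a sketch:

```latex
\[
\|f(x_* + t\xi) - y\| \;\leq\; (1 - t)\, \|f(x_*) - y\| + o(t)
\;=\; r_* - t\, \bigl( r_* - o(1) \bigr) \;<\; r_*
\]
% for all sufficiently small t > 0, since r_* - o(1) > 0 once t is small.
% The same computation with \xi replaced by a nearby vector \xi' gives the
% cone (4.6), the error term being uniform for \xi' in a bounded set.
```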
Let $\delta > 0$ be a small radius to be chosen later. As $Df(x_*)$ is non-singular, we see if $\delta$ is small enough that $f(x) \neq f(x_*)$ whenever $\|x - x_*\| = \delta$. By continuity, we may thus find $\kappa > 0$ such that $\|f(x) - f(x_*)\| > \kappa$ whenever $\|x - x_*\| = \delta$.
Consider the set
$$U' := \{x \in U_1 : \|x - x_*\| \leq \delta; \|f(x) - f(x_*)\| < \kappa\}.$$
As $x_*$ is adherent to $U_1$, $U'$ is non-empty. By construction of $\kappa$, we see that we also have
$$U' = \{x \in U_1 : \|x - x_*\| < \delta; \|f(x) - f(x_*)\| < \kappa\}$$
and so $U'$ is open. By Corollary 4.2.4, $f(U')$ is then also non-empty and open. By construction, $f(U')$ also lies in the set
$$D := \{z \in B(y, r_*) : \|z - f(x_*)\| < \kappa\}.$$
We claim that $f(U')$ is in fact all of $D$. The proof will be a variant of the proof of Lemma 4.2.9. Suppose this were not the case. As $D$ is connected, this implies that there is an element $z$ of $D$ which is adherent to $f(U')$, but does not lie in $f(U')$. Thus one may find a sequence $x_n$ in $U'$ with $f(x_n)$ converging to $z$. By compactness of $K_{r_*}^1$ (which contains $U'$), we may pass to a subsequence and assume that $x_n$ converges to a limit $x$ in $K_{r_*}^1$; then $f(x) = z$. By continuity, there is thus a ball $B$ centred at $x$ contained in $B(x_*, \delta)$ that is mapped to $B(y, r) \cap D$ for some $r < r_*$; this implies that $B$ lies in $K_r$ and hence in $K_{r_*}^1$ (since $x \in K_{r_*}^1$) and thence in $U$ (since $r$ is strictly less than $r_*$). As $x$ is adherent to $U_1$ and $B$ is connected, we conclude that $B$ lies in $U_1$ and thence in $U'$. In particular $x$ lies in $U'$ and so $z = f(x)$ lies in $f(U')$, a contradiction.
As $f(U') = D$, we may thus find a sequence $t_n > 0$ converging to zero, and a sequence $x_n \in U'$, such that
$$f(x_n) = f(x_*) + t_n (y - f(x_*)).$$
However, if $\delta$ is small enough, we have $\|f(x_n) - f(x_*)\|$ comparable to $\|x_n - x_*\|$ (cf. (4.2)), and so $x_n$ converges to $x_*$. By Taylor expansion, we then have
$$f(x_n) = f(x_*) + Df(x_*)(x_n - x_*) + o(\|x_n - x_*\|)$$
and thus
$$(Df(x_*) + o(1))(x_n - x_*) = t_n Df(x_*)\xi$$
for some matrix-valued error $o(1)$. Since $Df(x_*)$ is invertible, this implies that
$$x_n - x_* = t_n (1 + o(1))\xi = t_n \xi + o(t_n).$$
In particular, $x_n$ lies in the cone (4.6) for $n$ large enough, and the claim follows.


4.3. Stein's interpolation theorem


One of Eli Stein's very first results that is still used extremely widely today is
his interpolation theorem [St1956] (and its refinement, the Fefferman-Stein
interpolation theorem [FeSt1972]). This is a deceptively innocuous, yet
remarkably powerful, generalisation of the classic Riesz-Thorin interpolation
theorem (see e.g. [Ta2010, Theorem 1.11.7]) which uses methods from
complex analysis (and in particular, the Lindelöf theorem or the
Phragmén-Lindelöf principle) to show that if a linear operator
$T : L^{p_0}(X) + L^{p_1}(X) \to L^{q_0}(Y) + L^{q_1}(Y)$ from one ($\sigma$-finite) measure space
$X = (X, \mathcal{X}, \mu)$ to another $Y = (Y, \mathcal{Y}, \nu)$ obeyed the estimates
\[ \|Tf\|_{L^{q_0}(Y)} \le B_0 \|f\|_{L^{p_0}(X)} \tag{4.7} \]
for all $f \in L^{p_0}(X)$ and
\[ \|Tf\|_{L^{q_1}(Y)} \le B_1 \|f\|_{L^{p_1}(X)} \tag{4.8} \]


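for all $f \in L^{p_1}(X)$, where $1 \le p_0, p_1, q_0, q_1 \le \infty$ and $B_0, B_1 > 0$, then one
automatically also has the interpolated estimates
\[ \|Tf\|_{L^{q_\theta}(Y)} \le B_\theta \|f\|_{L^{p_\theta}(X)} \tag{4.9} \]
for all $f \in L^{p_\theta}(X)$ and $0 \le \theta \le 1$, where the quantities $p_\theta, q_\theta, B_\theta$ are defined
by the formulae
\[ \frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1} \]
\[ \frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1} \]
\[ B_\theta = B_0^{1-\theta} B_1^{\theta}. \]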
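As a quick sanity check of these formulae, here is a minimal sketch (not part of the original text) that computes $p_\theta$ and $q_\theta$ in exact rational arithmetic. The helper name `interpolate` and the sample endpoints $(p_0, q_0) = (2, 2)$, $(p_1, q_1) = (1, \infty)$, $\theta = 1/3$ are illustrative assumptions; each exponent is represented by its reciprocal, so that $p = \infty$ corresponds to $1/p = 0$.

```python
from fractions import Fraction

def interpolate(inv0, inv1, theta):
    """Return 1/p_theta = (1 - theta)/p0 + theta/p1, given inv0 = 1/p0 and inv1 = 1/p1."""
    return (1 - theta) * inv0 + theta * inv1

# Illustrative endpoints: L^2 -> L^2 at theta = 0 and L^1 -> L^infinity at theta = 1.
theta = Fraction(1, 3)
inv_p = interpolate(Fraction(1, 2), Fraction(1, 1), theta)  # reciprocals of p0 = 2, p1 = 1
inv_q = interpolate(Fraction(1, 2), Fraction(0, 1), theta)  # reciprocals of q0 = 2, q1 = infinity
p, q = 1 / inv_p, 1 / inv_q
print(p, q)
```

With these sample endpoints the interpolated pair is $L^{3/2} \to L^3$.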
The Riesz-Thorin theorem is already quite useful (it gives, for instance,
by far the quickest proof of the Hausdorff-Young inequality for the Fourier
transform, to name just one application; see e.g. [Ta2010, (1.103)]), but it
requires the same linear operator T to appear in (4.7), (4.8), and (4.9). Stein
realised, though, that due to the complex-analytic nature of the proof of the
Riesz-Thorin theorem, it was possible to allow different linear operators to
appear in (4.7), (4.8), (4.9), so long as the dependence was analytic. A bit
more precisely: if one had a family $T_z$ of operators which depended in an
analytic manner on a complex variable $z$ in the strip $\{z \in \mathbb{C} : 0 \le \mathrm{Re}(z) \le 1\}$
(thus, for any test functions $f, g$, the inner product $\langle T_z f, g\rangle$ would be analytic
in $z$) which obeyed some mild regularity assumptions (which are slightly
technical and are omitted here), and one had the estimates
\[ \|T_{0+it} f\|_{L^{q_0}(Y)} \le C_t \|f\|_{L^{p_0}(X)} \]
and
\[ \|T_{1+it} f\|_{L^{q_1}(Y)} \le C_t \|f\|_{L^{p_1}(X)} \]
for all $t \in \mathbb{R}$ and some quantities $C_t$ that grew at most exponentially in $t$
(actually, any growth rate significantly slower than the double-exponential
$e^{\exp(\pi|t|)}$ would suffice here), then one also has the interpolated estimates
\[ \|T_\theta f\|_{L^{q_\theta}(Y)} \le C' \|f\|_{L^{p_\theta}(X)} \]
for all $0 \le \theta \le 1$ and a constant $C'$ depending only on $C, p_0, p_1, q_0, q_1$.
In [Fe1995], Fefferman notes that the proof of the Stein interpolation
theorem can be obtained from that of the Riesz-Thorin theorem simply by
adding a single letter of the alphabet. Indeed, the way the Riesz-Thorin
theorem is proven is to study an expression of the form
\[ F(z) := \int_Y T f_z(y)\, g_z(y)\, dy, \]
where $f_z, g_z$ are functions depending on $z$ in a suitably analytic manner,
for instance taking $f_z = |f|^{\frac{1-z}{p_0} + \frac{z}{p_1}}\, \mathrm{sgn}(f)$ for some test function $f$, and


similarly for $g$. If $f_z, g_z$ are chosen properly, $F$ will depend analytically on
$z$ as well, and the two hypotheses (4.7), (4.8) give bounds on $F(0 + it)$ and
$F(1 + it)$ for $t \in \mathbb{R}$ respectively. The Lindelöf theorem then gives bounds on
intermediate values of $F$, such as $F(\theta)$; and the Riesz-Thorin theorem can
then be deduced by a duality argument. (This is covered in many graduate
real analysis texts; see e.g. [Ta2010, §1.11].)
The Stein interpolation theorem proceeds by instead studying the expression
\[ F(z) := \int_Y T_z f_z(y)\, g_z(y)\, dy. \]

One can then repeat the proof of the Riesz-Thorin theorem more or less
verbatim to obtain the Stein interpolation theorem.
The ability to vary the operator T makes the Stein interpolation theorem
significantly more flexible than the Riesz-Thorin theorem. We illustrate this
with the following sample result:
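Proposition 4.3.1. For any (test) function $f : \mathbb{R}^2 \to \mathbb{R}$, let $Tf : \mathbb{R}^2 \to \mathbb{R}$
be the average of $f$ along an arc of a parabola:
\[ Tf(x_1, x_2) := \int_{\mathbb{R}} f(x_1 - t, x_2 - t^2)\, \eta(t)\, dt \]
where $\eta$ is a bump function supported on (say) $[-1, 1]$. Then $T$ is bounded
from $L^{3/2}(\mathbb{R}^2)$ to $L^3(\mathbb{R}^2)$, thus
\[ \|Tf\|_{L^3(\mathbb{R}^2)} \le C \|f\|_{L^{3/2}(\mathbb{R}^2)}. \tag{4.10} \]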

There is nothing too special here about the parabola; the same result
in fact holds for convolution operators on any arc of a smooth curve with
nonzero curvature (and there are many extensions to higher dimensions,
to variable-coefficient operators, etc.). We will however restrict attention
to the parabola for sake of exposition. One can view $Tf$ as a convolution
$Tf = f * \sigma$, where $\sigma$ is a measure on the parabola arc $\{(t, t^2) : |t| \le 1\}$.
We will also be somewhat vague about what "test function" means in this
exposition in order to gloss over some minor technical details.
By testing $T$ (and its adjoint) on the indicator function of a small ball
of some radius $\delta > 0$ (or of small rectangles such as $[-\delta, \delta] \times [0, \delta^2]$) one sees
that the exponents $L^{3/2}$, $L^3$ here are best possible.
This proposition was first proven in [Li1973] using the Stein interpolation theorem. To illustrate the power of this theorem, it should be noted
that for almost two decades this was the only known proof of this result;
a proof based on multilinear interpolation (exploiting the fact that the exponent 3 in (4.10) is an integer) was obtained in [Ob1992], and a fully


combinatorial proof was only obtained in [Ch2008] (see also [St2010],


[DeFoMaWr2010] for further extensions of the combinatorial argument).
To motivate the Stein interpolation argument, let us first try using the
Riesz-Thorin interpolation theorem. The exponent pair $L^{3/2} \to L^3$ is
an interpolant between $L^2 \to L^2$ and $L^1 \to L^\infty$, so a first attempt to proceed
here would be to establish the bounds
\[ \|Tf\|_{L^2(\mathbb{R}^2)} \le C \|f\|_{L^2(\mathbb{R}^2)} \tag{4.11} \]
and
\[ \|Tf\|_{L^\infty(\mathbb{R}^2)} \le C \|f\|_{L^1(\mathbb{R}^2)} \tag{4.12} \]
for all (test) functions $f$.
The bound (4.11) is an easy consequence of Minkowski's integral inequality
(or Young's inequality, noting that $\sigma$ is a finite measure). On the other
hand, because the measure $\sigma$ is not absolutely continuous, let alone arising
from an $L^\infty(\mathbb{R}^2)$ function, the estimate (4.12) is very false. For instance, if
one applies $T$ to the indicator function $f = 1_{[-\delta,\delta] \times [-\delta,\delta]}$ for some small $\delta > 0$,
then the $L^1$ norm of $f$ is comparable to $\delta^2$, but the $L^\infty$ norm of $Tf$ is comparable to $\delta$,
contradicting (4.12) as one sends $\delta$ to zero.
To get around this, one first notes that there is a lot of "room" in (4.11)
due to the smoothing properties of the measure $\sigma$. Indeed, from Plancherel's
theorem one has
\[ \|f\|_{L^2(\mathbb{R}^2)} = \|\hat f\|_{L^2(\mathbb{R}^2)} \]
and
\[ \|Tf\|_{L^2(\mathbb{R}^2)} = \|\hat f \hat\sigma\|_{L^2(\mathbb{R}^2)} \]
for all test functions $f$, where
\[ \hat f(\xi) := \int_{\mathbb{R}^2} e^{-2\pi i x \cdot \xi} f(x)\, dx \]
is the Fourier transform of $f$, and
\[ \hat\sigma(\xi_1, \xi_2) := \int_{\mathbb{R}} e^{-2\pi i (t\xi_1 + t^2 \xi_2)}\, \eta(t)\, dt. \]
It is clear that $\hat\sigma(\xi)$ is uniformly bounded in $\xi$, which already gives (4.11).
But a standard application of the method of stationary phase reveals that
one in fact has a decay estimate
\[ |\hat\sigma(\xi)| \le \frac{C}{|\xi|^{1/2}} \tag{4.13} \]
for some $C > 0$. This shows that $Tf$ is not just in $L^2$, but is somewhat
smoother as well; in particular, one has
\[ \|D^{1/2} T f\|_{L^2(\mathbb{R}^2)} \le C \|f\|_{L^2(\mathbb{R}^2)} \]


for any (fractional) differential operator $D^{1/2}$ of order $1/2$. (Here we adopt
the usual convention that the constant $C$ is allowed to vary from line to
line.)
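The decay estimate (4.13) can be probed numerically. The following is a rough sketch rather than a proof: it fixes one particular bump function $\eta(t) = \exp(-1/(1-t^2))$ on $[-1,1]$ (an assumption made purely for illustration; the text only requires some bump function) and approximates the oscillatory integral defining $\hat\sigma$ by a Riemann sum. The function names `eta` and `sigma_hat` and the sample frequencies are ours.

```python
import cmath
import math

def eta(t):
    """A standard smooth bump supported on [-1, 1] (an illustrative choice)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def sigma_hat(xi1, xi2, n=20000):
    """Riemann-sum approximation of the integral of exp(-2 pi i (t xi1 + t^2 xi2)) eta(t) dt."""
    dt = 2.0 / n
    total = 0j
    for k in range(1, n):  # eta vanishes at the endpoints t = -1 and t = 1
        t = -1.0 + k * dt
        total += cmath.exp(-2j * math.pi * (t * xi1 + t * t * xi2)) * eta(t)
    return total * dt

# Print |sigma_hat(xi)| next to |sigma_hat(xi)| * |xi|^{1/2} for a few sample frequencies.
for xi in [(0.0, 4.0), (0.0, 16.0), (0.0, 64.0), (8.0, 8.0), (32.0, 0.0)]:
    size = math.hypot(*xi)
    value = abs(sigma_hat(*xi))
    print(xi, value, value * math.sqrt(size))
```

If (4.13) holds for this choice of $\eta$, the last printed column should stay bounded as $|\xi|$ grows, even though the first column decays.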
Using the numerology of the Stein interpolation theorem, this suggests
that if we can somehow obtain the counterbalancing estimate
\[ \|D^{-1} T f\|_{L^\infty(\mathbb{R}^2)} \le C \|f\|_{L^1(\mathbb{R}^2)} \]
for some differential operator $D^{-1}$ of order $-1$, then we should be able to
interpolate and obtain the desired estimate (4.10). And indeed, we can take
an antiderivative in the $x_2$ direction, giving the operator
\[ \partial_{x_2}^{-1} T f(x_1, x_2) := \int_{\mathbb{R}} \int_{-\infty}^0 f(x_1 - t, x_2 - t^2 - s)\, \eta(t)\, ds\, dt; \]
and a simple change of variables does indeed verify that this operator is
bounded from $L^1(\mathbb{R}^2)$ to $L^\infty(\mathbb{R}^2)$.
Unfortunately, the above argument is not rigorous, because we need
an analytic family of operators $T_z$ in order to invoke the Stein interpolation
theorem, rather than just two operators $T_0$ and $T_1$. This turns out to require
some slightly tricky complex analysis: after some trial and error, one finds
that one can use the family $T_z$ defined for $\mathrm{Re}(z) > 1/3$ by the formula
\[ T_z f(x_1, x_2) = \frac{1}{\Gamma((3z-1)/2)} \int_{\mathbb{R}} \int_{-\infty}^0 \frac{1}{s^{(3-3z)/2}} f(x_1 - t, x_2 - t^2 - s)\, \eta(t)\, ds\, dt \]
where $\Gamma$ is the Gamma function, and extended to the rest of the complex
plane by analytic continuation. The Gamma factor is a technical one, needed
to compensate for the divergence of the weight $\frac{1}{s^{(3-3z)/2}}$ as $z$ approaches $1/3$;
it also makes the Fourier representation of $T_z$ cleaner (indeed, $T_z f$ is morally
$\partial_{x_2}^{(1-3z)/2} f * \sigma$). It is then easy to verify the estimates
\[ \|T_{1+it} f\|_{L^\infty(\mathbb{R}^2)} \le C_t \|f\|_{L^1(\mathbb{R}^2)} \tag{4.14} \]
for all $t \in \mathbb{R}$ (with $C_t$ growing at a controlled rate), while from Fourier
analysis one also can show that
\[ \|T_{0+it} f\|_{L^2(\mathbb{R}^2)} \le C_t \|f\|_{L^2(\mathbb{R}^2)} \tag{4.15} \]
for all $t \in \mathbb{R}$. Finally, one can verify that $T_{1/3} = T$, and (4.10) then follows
from the Stein interpolation theorem.
It is instructive to compare this result with what can be obtained by
real-variable methods. One can perform a smooth dyadic partition of unity
\[ \delta(s) = \phi(s) + \sum_{j=1}^\infty 2^j \psi(2^j s) \]


for some bump function $\phi$ (of total mass 1) and bump function $\psi$ (of total
mass zero), which (formally, at least) leads to the decomposition
\[ Tf = T_0 f + \sum_{j=1}^\infty T_j f \]
where $T_0 f$ is a harmless smoothing operator (which certainly maps $L^{3/2}(\mathbb{R}^2)$
to $L^3(\mathbb{R}^2)$) and
\[ T_j f(x_1, x_2) := \int_{\mathbb{R}} \int_{\mathbb{R}} 2^j \psi(2^j s)\, f(x_1 - t, x_2 - t^2 - s)\, \eta(t)\, dt\, ds. \]
It is not difficult to show that
\[ \|T_j f\|_{L^\infty(\mathbb{R}^2)} \le C 2^j \|f\|_{L^1(\mathbb{R}^2)} \tag{4.16} \]
while a Fourier-analytic computation (using (4.13)) reveals that
\[ \|T_j f\|_{L^2(\mathbb{R}^2)} \le C 2^{-j/2} \|f\|_{L^2(\mathbb{R}^2)} \tag{4.17} \]
which interpolates (by, say, the Riesz-Thorin theorem, or the real-variable
Marcinkiewicz interpolation theorem, see [Ta2010, Theorem 1.11.10]) to
\[ \|T_j f\|_{L^3(\mathbb{R}^2)} \le C \|f\|_{L^{3/2}(\mathbb{R}^2)} \]
which is close to (4.10). Unfortunately, we still have to sum in $j$, and this
creates a "logarithmic divergence" that just barely fails (see footnote 2) to recover (4.10).
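To make the numerology explicit, here is a short worked computation (a sketch, assuming the constants interpolate as $B_0^{1-\theta} B_1^{\theta}$ exactly as in the Riesz-Thorin theorem, and writing the constants in (4.16) and (4.17) as $C 2^{j}$ and $C 2^{-j/2}$ respectively). The pair $L^{3/2} \to L^3$ sits at $\theta = 1/3$ between $L^2 \to L^2$ and $L^1 \to L^\infty$, so

```latex
\[
  \|T_j\|_{L^{3/2} \to L^3}
  \le \bigl(C 2^{-j/2}\bigr)^{2/3} \bigl(C 2^{j}\bigr)^{1/3}
  = C\, 2^{-j/3 + j/3}
  = C ,
  \qquad\text{hence}\qquad
  \Bigl\| \sum_{j=1}^{J} T_j \Bigr\|_{L^{3/2} \to L^3} \le C J .
\]
```

The gain $2^{-j/2}$ and the loss $2^{j}$ cancel exactly, leaving no decay in $j$; the triangle inequality then loses a factor $J$, which is logarithmic in the finest frequency scale $2^J$ retained.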
The key difference is that the inputs (4.14), (4.15) used in the Stein
interpolation theorem are more powerful than the inputs (4.16), (4.17) in the
real-variable method. Indeed, (4.14) is roughly equivalent to the assertion
that
\[ \Bigl\| \sum_{j=1}^\infty e^{2\pi i j t}\, 2^{-j}\, T_j f \Bigr\|_{L^\infty(\mathbb{R}^2)} \le C_t \|f\|_{L^1(\mathbb{R}^2)} \]
and (4.15) is similarly equivalent to the assertion that
\[ \Bigl\| \sum_{j=1}^\infty e^{2\pi i j t}\, 2^{j/2}\, T_j f \Bigr\|_{L^2(\mathbb{R}^2)} \le C_t \|f\|_{L^2(\mathbb{R}^2)}. \]
A Fourier averaging argument shows that these estimates imply (4.16) and
(4.17), but not conversely. If one unpacks the proof of Lindelöf's theorem
(which is ultimately powered by an integral representation, such as that
provided by the Cauchy integral formula) and hence of the Stein interpolation
theorem, one can interpret Stein interpolation in this case as using
a clever integral representation of $\sum_{j=1}^\infty T_j f$ in terms of expressions such
as $\sum_{j=1}^\infty e^{2\pi i j t}\, 2^{-j}\, T_j f_{1+it}$ and $\sum_{j=1}^\infty e^{2\pi i j t}\, 2^{j/2}\, T_j f_{0+it}$, where $f_{1+it}, f_{0+it}$ are
various nonlinear transforms of $f$. Technically, it would then be possible
to rewrite the Stein interpolation argument as a real-variable one, without
explicit mention of Lindelöf's theorem; but the proof would then look
extremely contrived; the complex-analytic framework is much more natural
(much as it is in analytic number theory, where the distribution of the
primes is best handled by a complex-analytic study of the Riemann zeta
function).
Footnote 2: With a slightly more refined real interpolation argument, one can at least obtain a restricted
weak-type estimate from $L^{3/2,1}(\mathbb{R}^2)$ to $L^{3,\infty}(\mathbb{R}^2)$ this way, but one can concoct abstract
counterexamples to show that the estimates (4.16), (4.17) are insufficient to obtain an $L^{3/2} \to L^3$
bound on $\sum_{j=1}^\infty T_j$.
Remark 4.3.2. A useful strengthening of the Stein interpolation theorem
is the Fefferman-Stein interpolation theorem [FeSt1972], in which the endpoint
spaces $L^1$ and $L^\infty$ are replaced by the Hardy space $H^1$ and the space
BMO of functions of bounded mean oscillation respectively. These spaces
are more stable with respect to various harmonic analysis operators, such
as singular integrals (and in particular, with respect to the Marcinkiewicz
operators $|\nabla|^{it}$, which come up frequently when attempting to use the complex
method), which makes the Fefferman-Stein theorem particularly useful
for controlling expressions derived from these sorts of operators.

4.4. The Cotlar-Stein lemma


A basic problem in harmonic analysis (as well as in linear algebra, random
matrix theory, and high-dimensional geometry) is to estimate the operator
norm $\|T\|_{op}$ of a linear map $T : H \to H'$ between two Hilbert spaces, which
we will take to be complex for sake of discussion. Even the finite-dimensional
case $T : \mathbb{C}^m \to \mathbb{C}^n$ is of interest, as this operator norm is the same as the
largest singular value $\sigma_1(A)$ of the $n \times m$ matrix $A$ associated to $T$.
In general, this operator norm is hard to compute precisely, except in
special cases. One such special case is that of a diagonal operator, such
as that associated to an $n \times n$ diagonal matrix $D = \mathrm{diag}(\lambda_1, \dots, \lambda_n)$. In
this case, the operator norm is simply the supremum norm of the diagonal
coefficients:
\[ \|D\|_{op} = \sup_{1 \le i \le n} |\lambda_i|. \tag{4.18} \]

A variant of (4.18) is Schur's test, which for simplicity we will phrase in
the setting of finite-dimensional operators $T : \mathbb{C}^m \to \mathbb{C}^n$ given by a matrix
$A = (a_{ij})_{1 \le i \le n;\, 1 \le j \le m}$ via the usual formula
\[ T (x_j)_{j=1}^m := \Bigl( \sum_{j=1}^m a_{ij} x_j \Bigr)_{i=1}^n. \]


A simple version of this test is as follows: if all the absolute row sums and
column sums of $A$ are bounded by some constant $M$, thus
\[ \sum_{j=1}^m |a_{ij}| \le M \tag{4.19} \]
for all $1 \le i \le n$ and
\[ \sum_{i=1}^n |a_{ij}| \le M \tag{4.20} \]
for all $1 \le j \le m$, then
\[ \|T\|_{op} = \|A\|_{op} \le M \tag{4.21} \]
(note that this generalises (the upper bound in) (4.18)). Indeed, to see
(4.21), it suffices by duality and homogeneity to show that
\[ \Bigl| \sum_{i=1}^n \Bigl( \sum_{j=1}^m a_{ij} x_j \Bigr) y_i \Bigr| \le M \]
whenever $(x_j)_{j=1}^m$ and $(y_i)_{i=1}^n$ are sequences with $\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 = 1$;
but this easily follows from the arithmetic mean-geometric mean inequality
\[ |a_{ij} x_j y_i| \le \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2 \]
and (4.19), (4.20).
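Schur's test is easy to try out on a small example. The sketch below is an illustration under stated assumptions, not part of the original argument: it restricts to real matrices stored as lists of rows, computes $M$ as the largest absolute row or column sum, and compares it with a power-iteration estimate of $\|A\|_{op}$. The helper names `schur_bound` and `norm_lower_estimate` are ours; since power iteration only ever evaluates $\|Av\|$ at unit vectors $v$, it produces a lower bound for the operator norm.

```python
import math
import random

def schur_bound(a):
    """Largest absolute row sum or column sum of the matrix a (a list of rows)."""
    row = max(sum(abs(x) for x in r) for r in a)
    col = max(sum(abs(r[j]) for r in a) for j in range(len(a[0])))
    return max(row, col)

def norm_lower_estimate(a, iterations=200):
    """Power iteration on A^T A; returns ||A v|| for a unit vector v, a lower bound for ||A||_op."""
    m = len(a[0])
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        av = [sum(r[j] * v[j] for j in range(m)) for r in a]                 # A v
        w = [sum(a[i][j] * av[i] for i in range(len(a))) for j in range(m)]  # A^T (A v)
        length = math.sqrt(sum(x * x for x in w))
        if length == 0.0:
            break
        v = [x / length for x in w]
    av = [sum(r[j] * v[j] for j in range(m)) for r in a]
    return math.sqrt(sum(x * x for x in av))

random.seed(0)
a = [[random.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]
print(norm_lower_estimate(a), schur_bound(a))
```

For a diagonal matrix the two quantities coincide, in line with (4.18); for a generic matrix Schur's bound is typically not attained.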
Schur's test (4.21) (and its many generalisations to weighted situations,
or to Lebesgue or Lorentz spaces) is particularly useful for controlling
operators in which the role of oscillation (as reflected in the phase of the
coefficients $a_{ij}$, as opposed to just their magnitudes $|a_{ij}|$) is not decisive.
However, it is of limited use in situations that involve a lot of cancellation.
For this, a different test, known as the Cotlar-Stein lemma [Co1955], is
much more flexible and powerful. It can be viewed in a sense as a
non-commutative variant of Schur's test (4.21) (or of (4.18)), in which the scalar
coefficients $\lambda_i$ or $a_{ij}$ are replaced by operators instead.
To illustrate the basic flavour of the result, let us return to the bound
(4.18), and now consider instead a block-diagonal matrix
\[ A = \begin{pmatrix} \Lambda_1 & 0 & \dots & 0 \\ 0 & \Lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \Lambda_n \end{pmatrix} \tag{4.22} \]
76

4. Analysis

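where each $\Lambda_i$ is now a $m_i \times m_i$ matrix, and so $A$ is an $m \times m$ matrix with
$m := m_1 + \dots + m_n$. Then we have
\[ \|A\|_{op} = \sup_{1 \le i \le n} \|\Lambda_i\|_{op}. \tag{4.23} \]
Indeed, the lower bound is trivial (as can be seen by testing $A$ on vectors
which are supported on the $i$th block of coordinates), while to establish the
upper bound, one can make use of the orthogonal decomposition
\[ \mathbb{C}^m \equiv \bigoplus_{i=1}^n \mathbb{C}^{m_i} \tag{4.24} \]
to decompose an arbitrary vector $x \in \mathbb{C}^m$ as
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \]
with $x_i \in \mathbb{C}^{m_i}$, in which case we have
\[ Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix} \]
and the upper bound in (4.23) then follows from a simple computation.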
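For completeness, the computation can be spelled out as follows (a worked step added for the reader, writing $\Lambda_i$ for the diagonal blocks of $A$ and $x_i$ for the corresponding components of $x$). By orthogonality of the blocks and Pythagoras' theorem,

```latex
\[
  \|Ax\|^2
  = \sum_{i=1}^n \|\Lambda_i x_i\|^2
  \le \sum_{i=1}^n \|\Lambda_i\|_{op}^2 \, \|x_i\|^2
  \le \Bigl( \sup_{1 \le i \le n} \|\Lambda_i\|_{op} \Bigr)^2 \sum_{i=1}^n \|x_i\|^2
  = \Bigl( \sup_{1 \le i \le n} \|\Lambda_i\|_{op} \Bigr)^2 \|x\|^2 .
\]
```

Taking square roots and then the supremum over unit vectors $x$ gives the upper bound.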
The operator $T$ associated to the matrix $A$ in (4.22) can be viewed as a
sum $T = \sum_{i=1}^n T_i$, where each $T_i$ corresponds to the $\Lambda_i$ block of $A$, in which
case (4.23) can also be written as
\[ \|T\|_{op} = \sup_{1 \le i \le n} \|T_i\|_{op}. \tag{4.25} \]
When $n$ is large, this is a significant improvement over the triangle inequality,
which merely gives
\[ \|T\|_{op} \le \sum_{1 \le i \le n} \|T_i\|_{op}. \]
The reason for this gain can ultimately be traced back to the "orthogonality"
of the $T_i$; that they "occupy" different columns and different rows of
the range and domain of $T$. This is obvious when viewed in the matrix
formalism, but can also be described in the more abstract Hilbert space
operator formalism via the identities (see footnote 3)
\[ T_i^* T_j = 0 \tag{4.26} \]
and
\[ T_i T_j^* = 0 \tag{4.27} \]
whenever $i \neq j$. By replacing (4.24) with a more abstract orthogonal
decomposition into these ranges and coranges, one can in fact deduce (4.25)
directly from (4.26) and (4.27).
Footnote 3: The first identity asserts that the ranges of the $T_i$ are orthogonal to each other, and the
second asserts that the coranges of the $T_i$ (the ranges of the adjoints $T_i^*$) are orthogonal to each
other.
The Cotlar-Stein lemma is an extension of this observation to the case
where the $T_i$ are merely almost orthogonal rather than orthogonal, in a
manner somewhat analogous to how Schur's test (partially) extends (4.18)
to the non-diagonal case. Specifically, we have
Lemma 4.4.1 (Cotlar-Stein lemma). Let $T_1, \dots, T_n : H \to H'$ be a finite
sequence of bounded linear operators from one Hilbert space $H$ to another
$H'$, obeying the bounds
\[ \sum_{j=1}^n \|T_i T_j^*\|_{op}^{1/2} \le M \tag{4.28} \]
and
\[ \sum_{j=1}^n \|T_i^* T_j\|_{op}^{1/2} \le M \tag{4.29} \]
for all $i = 1, \dots, n$ and some $M > 0$ (compare with (4.19), (4.20)). Then
one has
\[ \Bigl\| \sum_{i=1}^n T_i \Bigr\|_{op} \le M. \tag{4.30} \]

Note from the basic $TT^*$ identity
\[ \|T\|_{op} = \|TT^*\|_{op}^{1/2} = \|T^*T\|_{op}^{1/2} \tag{4.31} \]
that the hypothesis (4.28) (or (4.29)) already gives the bound
\[ \|T_i\|_{op} \le M \tag{4.32} \]
on each component $T_i$ of $T$, which by the triangle inequality gives the inferior
bound
\[ \Bigl\| \sum_{i=1}^n T_i \Bigr\|_{op} \le nM; \]
the point of the Cotlar-Stein lemma is that the dependence on $n$ in this
bound is eliminated in (4.30), which in particular makes the bound suitable
for extension to the limit $n \to \infty$ (see Remark 4.4.2 below).
The Cotlar-Stein lemma was first established by Cotlar [Co1955] in the
special case of commuting self-adjoint operators, and then independently by
Cotlar and Stein in full generality, with the proof appearing in [KnSt1971].


The Cotlar-Stein lemma is often useful in controlling operators such as
singular integral operators or pseudo-differential operators $T$ which do not
mix scales together too much, in that operators $T$ map functions that
oscillate at a given scale $2^{-i}$ to functions that still mostly oscillate at the
same scale $2^{-i}$. In that case, one can often split $T$ into components $T_i$ which
essentially capture the scale $2^{-i}$ behaviour, and understanding $L^2$ boundedness
properties of $T$ then reduces to establishing the boundedness of the
simpler operators $T_i$ (and of establishing a sufficient decay in products such
as $T_i^* T_j$ or $T_i T_j^*$ when $i$ and $j$ are separated from each other). In some cases,
one can use Fourier-analytic tools such as Littlewood-Paley projections to
generate the $T_i$, but the true power of the Cotlar-Stein lemma comes from
situations in which the Fourier transform is not suitable, such as when one
has a complicated domain (e.g. a manifold or a non-abelian Lie group),
or very rough coefficients (which would then have badly behaved Fourier
behaviour). One can then select the decomposition $T = \sum_i T_i$ in a fashion
that is tailored to the particular operator $T$, and is not necessarily dictated
by Fourier-analytic considerations.
Once one is in the almost orthogonal setting, as opposed to the genuinely
orthogonal setting, the previous arguments based on orthogonal projection
seem to fail completely. Instead, the proof of the Cotlar-Stein lemma proceeds
via an elegant application of the tensor power trick (or perhaps more
accurately, the power method), in which the operator norm of $T$ is understood
through the operator norm of a large power of $T$ (or more precisely,
of its self-adjoint square $TT^*$ or $T^*T$). Indeed, from an iteration of (4.31)
we see that for any natural number $N$, one has
\[ \|T\|_{op}^{2N} = \|(TT^*)^N\|_{op}. \tag{4.33} \]
To estimate the right-hand side, we expand out the right-hand side and
apply the triangle inequality to bound it by
\[ \sum_{i_1, j_1, \dots, i_N, j_N \in \{1, \dots, n\}} \|T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \dots T_{i_N} T_{j_N}^*\|_{op}. \tag{4.34} \]
Recall that when we applied the triangle inequality directly to $T$, we lost a
factor of $n$ in the final estimate; it will turn out that we will lose a similar
factor here, but this factor will eventually be attenuated into nothingness
by the tensor power trick.
To bound (4.34), we use the basic inequality $\|ST\|_{op} \le \|S\|_{op} \|T\|_{op}$ in
two different ways. If we group the product $T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \dots T_{i_N} T_{j_N}^*$ in pairs,
we can bound the summand of (4.34) by
\[ \|T_{i_1} T_{j_1}^*\|_{op} \dots \|T_{i_N} T_{j_N}^*\|_{op}. \]


On the other hand, we can group the product by pairs in another way, to
obtain the bound of
\[ \|T_{i_1}\|_{op} \|T_{j_1}^* T_{i_2}\|_{op} \dots \|T_{j_{N-1}}^* T_{i_N}\|_{op} \|T_{j_N}^*\|_{op}. \]
We bound $\|T_{i_1}\|_{op}$ and $\|T_{j_N}^*\|_{op}$ crudely by $M$ using (4.32). Taking the
geometric mean of the above bounds, we can thus bound (4.34) by
\[ \sum_{i_1, j_1, \dots, i_N, j_N \in \{1, \dots, n\}} M\, \|T_{i_1} T_{j_1}^*\|_{op}^{1/2} \|T_{j_1}^* T_{i_2}\|_{op}^{1/2} \dots \|T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \|T_{i_N} T_{j_N}^*\|_{op}^{1/2}. \]
If we then sum this series first in $j_N$, then in $i_N$, then moving back all the
way to $i_1$, using (4.28) and (4.29) alternately, we obtain a final bound of
\[ n M^{2N} \]
for (4.33). Taking $2N$th roots, we obtain
\[ \|T\|_{op} \le n^{1/2N} M. \]
Sending $N \to \infty$, we obtain the claim.
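The lemma can be sanity-checked on a toy example. The following sketch is purely illustrative and rests on simplifying assumptions of ours: the operators are real $2 \times 2$ matrices (so that adjoints are transposes and the operator norm has a closed form), and the three sample matrices are chosen by hand. It computes the smallest $M$ satisfying (4.28) and (4.29) and compares it with $\|\sum_i T_i\|_{op}$ and with the triangle-inequality bound $\sum_i \|T_i\|_{op}$.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    """Transpose, which is the adjoint for real matrices."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def opnorm(a):
    """Exact largest singular value of a real 2x2 matrix."""
    s = sum(x * x for row in a for x in row)       # sigma_1^2 + sigma_2^2
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]     # equals +- sigma_1 * sigma_2
    disc = max(s * s - 4.0 * det * det, 0.0)
    return math.sqrt((s + math.sqrt(disc)) / 2.0)

def cotlar_stein_constant(ts):
    """Smallest M for which (4.28) and (4.29) both hold for the family ts."""
    m = 0.0
    for ti in ts:
        s1 = sum(math.sqrt(opnorm(matmul(ti, transpose(tj)))) for tj in ts)  # (4.28)
        s2 = sum(math.sqrt(opnorm(matmul(transpose(ti), tj))) for tj in ts)  # (4.29)
        m = max(m, s1, s2)
    return m

# Two exactly orthogonal pieces plus a small piece that interacts with both.
ts = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]], [[0.0, 0.1], [0.1, 0.0]]]
total = [[sum(t[i][j] for t in ts) for j in range(2)] for i in range(2)]
print(opnorm(total), cotlar_stein_constant(ts), sum(opnorm(t) for t in ts))
```

In this example the Cotlar-Stein constant lies strictly between the true norm of the sum and the triangle-inequality bound, which is the qualitative behaviour the lemma describes.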
Remark 4.4.2. As observed in a number of places (see e.g. [St1993, p.
318] or [Co2007]), the Cotlar-Stein lemma can be extended to infinite sums
$\sum_{i=1}^\infty T_i$ (with the obvious changes to the hypotheses (4.28), (4.29)). Indeed,
one can show that for any $f \in H$, the sum $\sum_{i=1}^\infty T_i f$ is unconditionally
convergent in $H'$ (and furthermore has bounded 2-variation), and the resulting
operator $\sum_{i=1}^\infty T_i$ is a bounded linear operator with an operator norm bound
of $M$.
Remark 4.4.3. If we specialise to the case where all the Ti are equal, we
see that the bound in the Cotlar-Stein lemma is sharp, at least in this case.
Thus we see how the tensor power trick can convert an inefficient argument,
such as that obtained using the triangle inequality or crude bounds such as
(4.32), into an efficient one.
Remark 4.4.4. One can justify Schur's test by a similar method. Indeed,
starting from the inequality
\[ \|A\|_{op}^{2N} \le \mathrm{tr}((AA^*)^N) \]
(which follows easily from the singular value decomposition), we can bound
$\|A\|_{op}^{2N}$ by
\[ \sum_{i_1, \dots, j_N \in \{1, \dots, n\}} a_{i_1, j_1} \overline{a_{i_2, j_1}} \dots a_{i_N, j_N} \overline{a_{i_1, j_N}}. \]
Taking absolute values, estimating one of the factors in the summand crudely by $M$,
and then repeatedly summing the indices one at a time as before, we obtain
\[ \|A\|_{op}^{2N} \le n M^{2N} \]


and the claim follows from the tensor power trick as before. On the other
hand, in the converse direction, I do not know of any way to prove the
Cotlar-Stein lemma that does not basically go through the tensor power
argument.

4.5. Stein's spherical maximal inequality


If $f : \mathbb{R}^d \to \mathbb{C}$ is a locally integrable function, we define the Hardy-Littlewood
maximal function $Mf : \mathbb{R}^d \to \mathbb{C}$ by the formula
\[ Mf(x) := \sup_{r > 0} \frac{1}{|B(x, r)|} \int_{B(x, r)} |f(y)|\, dy, \]
where $B(x, r)$ is the ball of radius $r$ centred at $x$, and $|E|$ denotes the measure
of a set $E$. The Hardy-Littlewood maximal inequality asserts that
\[ |\{x \in \mathbb{R}^d : Mf(x) > \lambda\}| \le \frac{C_d}{\lambda} \|f\|_{L^1(\mathbb{R}^d)} \tag{4.35} \]
for all $f \in L^1(\mathbb{R}^d)$, all $\lambda > 0$, and some constant $C_d > 0$ depending only on
$d$. By a standard density argument, this implies in particular that we have
the Lebesgue differentiation theorem
\[ \lim_{r \to 0} \frac{1}{|B(x, r)|} \int_{B(x, r)} f(y)\, dy = f(x) \]
for all $f \in L^1(\mathbb{R}^d)$ and almost every $x \in \mathbb{R}^d$. See for instance [Ta2011,
Theorem 1.6.11].
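A discrete toy model may help make (4.35) concrete. The sketch below is not from the text and works under assumptions of ours: it replaces $\mathbb{R}^d$ by the integers with counting measure, takes $f$ to be a finitely supported sequence (zero outside the given list), and uses the constant $3$ that the Vitali covering argument yields in this one-dimensional discrete setting. The names `maximal_function` and `weak_type_holds` are hypothetical helpers.

```python
def maximal_function(f):
    """Centred discrete maximal function of f (zero outside the list), evaluated on the indices of f."""
    n = len(f)
    prefix = [0.0]
    for value in f:
        prefix.append(prefix[-1] + abs(value))
    result = []
    for x in range(n):
        best = 0.0
        for r in range(n):  # larger windows capture no new mass, so r < n suffices
            lo, hi = max(x - r, 0), min(x + r, n - 1)
            best = max(best, (prefix[hi + 1] - prefix[lo]) / (2 * r + 1))
        result.append(best)
    return result

def weak_type_holds(f, lam, constant=3.0):
    """Check #{x : Mf(x) > lam} <= constant * ||f||_1 / lam on the index range of f."""
    level_set = sum(1 for value in maximal_function(f) if value > lam)
    return level_set <= constant * sum(abs(v) for v in f) / lam

f = [0.0, 0.0, 5.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(maximal_function(f))
print([weak_type_holds(f, lam) for lam in (0.5, 1.0, 2.0, 4.0)])
```

Restricting the level set to the index range of $f$ can only make it smaller, so the check is a necessary consequence of the weak-type inequality rather than a test of its sharpness.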
By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz
interpolation theorem [Ta2010, 1.11.10] (and the trivial inequality
$\|Mf\|_{L^\infty(\mathbb{R}^d)} \le \|f\|_{L^\infty(\mathbb{R}^d)}$) we see that
\[ \|Mf\|_{L^p(\mathbb{R}^d)} \le C_{d,p} \|f\|_{L^p(\mathbb{R}^d)} \tag{4.36} \]
for all $p > 1$ and $f \in L^p(\mathbb{R}^d)$, and some constant $C_{d,p}$ depending on $d$ and
$p$.
The exact dependence of $C_{d,p}$ on $d$ and $p$ is still not completely understood.
The standard Vitali-type covering argument used to establish
(4.35) has an exponential dependence on dimension, giving a constant of
the form $C_d = C^d$ for some absolute constant $C > 1$. Inserting this into the
Marcinkiewicz theorem, one obtains a constant $C_{d,p}$ of the form $C_{d,p} = \frac{C^d}{p-1}$
for some $C > 1$ (and taking $p$ bounded away from infinity, for simplicity).
The dependence on $p$ is about right, but the dependence on $d$ should not be
exponential.
In [St1982, StSt1983], Stein gave an elegant argument, based on the
Calderón-Zygmund method of rotations, to eliminate the dependence on $d$:


Theorem 4.5.1. One can take $C_{d,p} = C_p$ for each $p > 1$, where $C_p$ depends
only on $p$.
The argument is based on an earlier bound [St1976] of Stein on the
spherical maximal function
\[ M_S f(x) := \sup_{r > 0} A_r |f|(x) \]
where $A_r$ are the spherical averaging operators
\[ A_r f(x) := \int_{S^{d-1}} f(x + r\omega)\, d\sigma^{d-1}(\omega) \]
and $d\sigma^{d-1}$ is normalised surface measure on the sphere $S^{d-1}$. Because this
is an uncountable supremum, and the averaging operators $A_r$ do not have
good continuity properties in $r$, it is not a priori obvious that $M_S f$ is even
a measurable function for, say, locally integrable $f$; but we can avoid this
technical issue, at least initially, by restricting attention to continuous
functions $f$. The Stein maximal theorem for the spherical maximal function
then asserts that if $d \ge 3$ and $p > \frac{d}{d-1}$, then we have
\[ \|M_S f\|_{L^p(\mathbb{R}^d)} \le C_{d,p} \|f\|_{L^p(\mathbb{R}^d)} \tag{4.37} \]
for all (continuous) $f \in L^p(\mathbb{R}^d)$. We will sketch a proof of this theorem (see footnote 4)
below.
The condition $p > \frac{d}{d-1}$ can be seen to be necessary as follows. Take $f$
to be any fixed bump function. A brief calculation then shows that $M_S f(x)$
decays like $|x|^{1-d}$ as $|x| \to \infty$, and hence $M_S f$ does not lie in $L^p(\mathbb{R}^d)$ unless
$p > \frac{d}{d-1}$. By taking $f$ to be a rescaled bump function supported on a small
ball, one can show that the condition $p > \frac{d}{d-1}$ is necessary even if we replace
$\mathbb{R}^d$ with a compact region (and similarly restrict the radius parameter $r$
to be bounded). The condition $d \ge 3$ however is not quite necessary; the
result is also true when $d = 2$, but this turned out to be a more difficult
result, obtained first in [Bo1985], with a simplified proof (based on the local
smoothing properties of the wave equation) later given in [MoSeSo1992].
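The integrability threshold can be read off from a one-line computation in polar coordinates (a worked step added for the reader; $c_d$ denotes the surface area of the unit sphere, a symbol introduced here). For a function comparable to $|x|^{1-d}$ at infinity,

```latex
\[
  \int_{|x| \ge 1} |x|^{(1-d)p}\, dx
  = c_d \int_1^\infty r^{(1-d)p}\, r^{d-1}\, dr ,
\]
```

and the integral on the right converges precisely when $(1-d)p + d - 1 < -1$, that is, when $p > \frac{d}{d-1}$.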

The Hardy-Littlewood maximal operator $Mf$, which involves averaging
over balls, is clearly related to the spherical maximal operator, which
averages over spheres. Indeed, by using polar co-ordinates, one easily verifies
the pointwise inequality
\[ Mf(x) \le M_S f(x) \]
for any (continuous) $f$, which intuitively reflects the fact that one can think
of a ball as an average of spheres. Thus, we see that the spherical maximal
inequality (4.37) implies (see footnote 5) the Hardy-Littlewood maximal inequality (4.36)
with the same constant $C_{p,d}$.
Footnote 4: Among other things, one can use this bound to show the pointwise convergence
$\lim_{r \to 0} A_r f(x) = f(x)$ of the spherical averages for any $f \in L^p(\mathbb{R}^d)$ when $d \ge 3$ and $p > \frac{d}{d-1}$,
although we will not focus on this application here.
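The polar co-ordinate computation behind the pointwise inequality $Mf \le M_S f$ is short enough to record (a worked step added for the reader, using that $|B(x,r)| = |S^{d-1}|\, r^d / d$ and that $A_s$ averages against normalised surface measure). For every $r > 0$,

```latex
\[
  \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\, dy
  = \frac{d}{r^d} \int_0^r A_s |f|(x)\, s^{d-1}\, ds
  \le M_S f(x) \cdot \frac{d}{r^d} \int_0^r s^{d-1}\, ds
  = M_S f(x) .
\]
```

Taking the supremum over $r > 0$ on the left-hand side gives the claim; the weight $\frac{d}{r^d} s^{d-1}\, ds$ is a probability measure on $[0, r]$, which is the precise sense in which a ball is an average of spheres.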
At first glance, this observation does not immediately establish Theorem
4.5.1 for two reasons. Firstly, Stein's spherical maximal theorem is restricted
to the case when $d \ge 3$ and $p > \frac{d}{d-1}$; and secondly, the constant $C_{d,p}$ in that
theorem still depends on dimension $d$. The first objection can be easily
disposed of, for if $p > 1$, then the hypotheses $d \ge 3$ and $p > \frac{d}{d-1}$ will
automatically be satisfied for $d$ sufficiently large (depending on $p$); note
that the case when $d$ is bounded (with a bound depending on $p$) is already
handled by the classical maximal inequality (4.36).
We still have to deal with the second objection, namely that the constant
$C_{d,p}$ in (4.37) depends on $d$. However, here we can use the method of
rotations to show that the constants $C_{p,d}$ can be taken to be non-increasing
(and hence bounded) in $d$. The idea is to view high-dimensional spheres
as an average of rotated low-dimensional spheres. We illustrate this with a
demonstration that $C_{d+1,p} \le C_{d,p}$, in the sense that any bound of the form
\[ \|M_S f\|_{L^p(\mathbb{R}^d)} \le A \|f\|_{L^p(\mathbb{R}^d)} \tag{4.38} \]
for the $d$-dimensional spherical maximal function, implies the same bound
\[ \|M_S f\|_{L^p(\mathbb{R}^{d+1})} \le A \|f\|_{L^p(\mathbb{R}^{d+1})} \tag{4.39} \]
for the $(d+1)$-dimensional spherical maximal function, with exactly the same
constant $A$. For any direction $\omega_0 \in S^d \subset \mathbb{R}^{d+1}$, consider the averaging
operators
\[ M_S^{\omega_0} f(x) := \sup_{r > 0} A_r^{\omega_0} |f|(x) \]
for any continuous $f : \mathbb{R}^{d+1} \to \mathbb{C}$, where
\[ A_r^{\omega_0} f(x) := \int_{S^{d-1}} f(x + r U_{\omega_0} \omega)\, d\sigma^{d-1}(\omega) \]
where $U_{\omega_0}$ is some orthogonal transformation mapping the sphere $S^{d-1}$ to
the sphere $S^{d-1,\omega_0} := \{\omega \in S^d : \omega \perp \omega_0\}$; the exact choice of orthogonal
transformation $U_{\omega_0}$ is irrelevant due to the rotation-invariance of surface
measure $d\sigma^{d-1}$ on the sphere $S^{d-1}$. A simple application of Fubini's theorem
(after first rotating $\omega_0$ to be, say, the standard unit vector $e_d$) using (4.38)
then shows that
\[ \|M_S^{\omega_0} f\|_{L^p(\mathbb{R}^{d+1})} \le A \|f\|_{L^p(\mathbb{R}^{d+1})} \tag{4.40} \]
uniformly in $\omega_0$. On the other hand, by viewing the $d$-dimensional sphere
$S^d$ as an average of the spheres $S^{d-1,\omega_0}$, we have the identity
\[ A_r f(x) = \int_{S^d} A_r^{\omega_0} f(x)\, d\sigma^d(\omega_0); \]
indeed, one can deduce this from the uniqueness of Haar measure by noting
that both the left-hand side and right-hand side are invariant means of $f$ on
the sphere $\{y \in \mathbb{R}^{d+1} : |y - x| = r\}$. This implies that
\[ M_S f(x) \le \int_{S^d} M_S^{\omega_0} f(x)\, d\sigma^d(\omega_0) \]
and thus by Minkowski's inequality for integrals, we may deduce (4.39) from
(4.40).
Footnote 5: This implication is initially only valid for continuous functions, but one can then extend the
inequality (4.36) to the rest of $L^p(\mathbb{R}^d)$ by a standard limiting argument.
Remark 4.5.2. Unfortunately, the method of rotations does not work to
show that the constant $C_d$ for the weak (1, 1) inequality (4.35) is independent
of dimension, as the weak $L^1$ quasinorm $\|\cdot\|_{L^{1,\infty}}$ is not a genuine norm
and does not obey the Minkowski inequality for integrals. Indeed, the question
of whether $C_d$ in (4.35) can be taken to be independent of dimension
remains open. The best known positive result is due to Stein and Strömberg
[StSt1983], who showed that one can take $C_d = C d$ for some absolute constant
$C$, by comparing the Hardy-Littlewood maximal function with the
heat kernel maximal function
\[ \sup_{t > 0} e^{t\Delta} |f|(x). \]
The abstract semigroup maximal inequality of Dunford and Schwartz (see
e.g. [Ta2009, Theorem 2.9.1]) shows that the heat kernel maximal function
is of weak-type (1, 1) with a constant of 1, and this can be used, together
with a comparison argument, to give the Stein-Strömberg bound. In the
converse direction, it was shown in [Al2011] that if one replaces the balls
$B(x, r)$ with cubes, then the weak (1, 1) constant $C_d$ must go to infinity as
$d \to \infty$.
4.5.1. Proof of spherical maximal inequality. We now sketch the proof
of Stein's spherical maximal inequality (4.37) for $d \ge 3$, $p > \frac{d}{d-1}$, and
$f \in L^p(\mathbb{R}^d)$ continuous. To motivate the argument, let us first establish the
simpler estimate
\[ \|M_S^1 f\|_{L^p(\mathbb{R}^d)} \le C_{d,p} \|f\|_{L^p(\mathbb{R}^d)} \]
where $M_S^1$ is the spherical maximal function restricted to unit scales:
\[ M_S^1 f(x) := \sup_{1 \le r \le 2} A_r |f|(x). \]
For the rest of this section, we suppress the dependence of constants on $d$
and $p$, using $X \lesssim Y$ as short-hand for $X \le C_{p,d} Y$.


It will of course suffice to establish the estimate
\[ \Bigl\| \sup_{1 \le r \le 2} |A_r f(x)| \Bigr\|_{L^p(\mathbb{R}^d)} \lesssim \|f\|_{L^p(\mathbb{R}^d)} \tag{4.41} \]
for all continuous $f \in L^p(\mathbb{R}^d)$, as the original claim follows by replacing $f$
with $|f|$. Also, since the bound is trivially true for $p = \infty$, and we crucially
have $\frac{d}{d-1} < 2$ in three and higher dimensions, we can restrict attention to
the regime $p < 2$.
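We establish this bound using a Littlewood-Paley decomposition
\[ f = \sum_N P_N f \]
where $N$ ranges over dyadic numbers $2^k$, $k \in \mathbb{Z}$, and $P_N$ is a smooth Fourier
projection to frequencies $|\xi| \sim N$; a bit more formally, we have
\[ \widehat{P_N f}(\xi) = \psi\Bigl(\frac{\xi}{N}\Bigr) \hat f(\xi) \]
where $\psi$ is a bump function supported on the annulus $\{\xi \in \mathbb{R}^d : 1/2 \le |\xi| \le 2\}$
such that $\sum_N \psi(\frac{\xi}{N}) = 1$ for all non-zero $\xi$. Actually, for the purposes of
proving (4.41), it is more convenient to use the decomposition
\[ f = P_{\le 1} f + \sum_{N > 1} P_N f \]
where $P_{\le 1} = \sum_{N \le 1} P_N$ is the projection to frequencies $|\xi| \lesssim 1$. By the
triangle inequality, it then suffices to show the bounds
\[ \Bigl\| \sup_{1 \le r \le 2} |A_r P_{\le 1} f(x)| \Bigr\|_{L^p(\mathbb{R}^d)} \lesssim \|f\|_{L^p(\mathbb{R}^d)} \tag{4.42} \]
and
\[ \Bigl\| \sup_{1 \le r \le 2} |A_r P_N f(x)| \Bigr\|_{L^p(\mathbb{R}^d)} \lesssim N^{-\epsilon} \|f\|_{L^p(\mathbb{R}^d)} \tag{4.43} \]
for all $N \ge 1$ and some $\epsilon > 0$ depending only on $p, d$.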


To prove the low-frequency bound (4.42), observe that $P_{\leq 1}$ is a convolution operator
with a bump function, and from this and the radius restriction
$1 \leq r \leq 2$ we see that $A_r P_{\leq 1}$ is a convolution operator with a function of
uniformly bounded size and support. From this we obtain the pointwise
bound
$$ A_r P_{\leq 1} f(x) \lesssim M f(x) \tag{4.44} $$
and the claim (4.42) follows from (4.36).


Now we turn to the more interesting high-frequency bound (4.43). Here,
$P_N$ is a convolution operator with an approximation to the identity at scale
$1/N$, and so $A_r P_N$ is a convolution operator with a function of magnitude
$O(N)$ concentrated on an annulus of thickness $O(1/N)$ around the sphere
of radius $r$. This can be used to give the pointwise bound
$$ A_r P_N f(x) \lesssim N M f(x), \tag{4.45} $$
which by (4.36) gives the bound
$$ \| \sup_{1 \leq r \leq 2} |A_r P_N f(x)| \|_{L^q(\mathbf{R}^d)} \lesssim_q N \|f\|_{L^q(\mathbf{R}^d)} \tag{4.46} $$
for any $q > 1$. This is not directly strong enough to prove (4.43), due to the
"loss of one derivative" as manifested by the factor $N$. On the other hand,
this bound (4.46) holds for all $q > 1$, and not just in the range $p > \frac{d}{d-1}$.
To counterbalance this loss of one derivative, we turn to $L^2$ estimates.
A standard stationary phase computation (or Bessel function computation)
shows that $A_r$ is a Fourier multiplier whose symbol decays like $|\xi|^{-(d-1)/2}$.
As such, Plancherel's theorem yields the $L^2$ bound
$$ \|A_r P_N f\|_{L^2(\mathbf{R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)} $$
uniformly in $1 \leq r \leq 2$. But we still have to take the supremum over
$r$. This is an uncountable supremum, so one cannot just apply a union
bound argument. However, from the uncertainty principle, we expect $P_N f$
to be "blurred out" at spatial scale $1/N$, which suggests that the averages
$A_r P_N f$ do not vary much when $r$ is restricted to an interval of size $1/N$.
Heuristically, this then suggests that
$$ \sup_{1 \leq r \leq 2} |A_r P_N f| \approx \sup_{1 \leq r \leq 2:\ r \in \frac{1}{N}\mathbf{Z}} |A_r P_N f|. $$
Estimating the discrete supremum on the right-hand side somewhat crudely
by the square-function,
$$ \sup_{1 \leq r \leq 2:\ r \in \frac{1}{N}\mathbf{Z}} |A_r P_N f| \leq \Big( \sum_{1 \leq r \leq 2:\ r \in \frac{1}{N}\mathbf{Z}} |A_r P_N f|^2 \Big)^{1/2}, $$
and taking $L^2$ norms, one is then led to the heuristic prediction that
$$ \| \sup_{1 \leq r \leq 2} |A_r P_N f| \|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}. \tag{4.47} $$
One can make this heuristic precise using the one-dimensional Sobolev embedding
inequality adapted to scale $1/N$, namely that
$$ \sup_{1 \leq r \leq 2} |g(r)| \lesssim N^{1/2} \Big( \int_1^2 |g(r)|^2\, dr \Big)^{1/2} + N^{-1/2} \Big( \int_1^2 |g'(r)|^2\, dr \Big)^{1/2}. $$
A routine computation shows that
$$ \Big\| \frac{d}{dr} A_r P_N f \Big\|_{L^2(\mathbf{R}^d)} \lesssim N \times N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)} $$
(which formalises the heuristic that $A_r P_N f$ is roughly constant at $r$-scales
$\ll 1/N$), and this soon leads to a rigorous proof of (4.47).
An interpolation between (4.46) and (4.47) (for $q$ sufficiently close to $1$)
then gives (4.43) for some $\varepsilon > 0$ (here we crucially use that $p > \frac{d}{d-1}$
and $p < 2$).
Now we control the full maximal function $M_S f$. It suffices to show that
$$ \| \sup_R \sup_{R \leq r \leq 2R} |A_r f(x)| \|_{L^p(\mathbf{R}^d)} \lesssim \|f\|_{L^p(\mathbf{R}^d)}, $$
where $R$ ranges over dyadic numbers.

For any fixed $R$, the natural spatial scale is $R$, and the natural frequency
scale is thus $1/R$. We therefore split
$$ f = P_{\leq 1/R} f + \sum_{N > 1} P_{N/R} f, $$
and aim to establish the bounds
$$ \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{\leq 1/R} f(x)| \|_{L^p(\mathbf{R}^d)} \lesssim \|f\|_{L^p(\mathbf{R}^d)} \tag{4.48} $$
and
$$ \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^p(\mathbf{R}^d)} \lesssim N^{-\varepsilon} \|f\|_{L^p(\mathbf{R}^d)} \tag{4.49} $$
for each $N > 1$ and some $\varepsilon > 0$ depending only on $d$ and $p$, similarly to
before.
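A rescaled version of the derivation of (4.44) gives
$$ A_r P_{\leq 1/R} f(x) \lesssim M f(x) $$
for all $R \leq r \leq 2R$, which already lets us deduce (4.48). As for (4.49), a
rescaling of (4.45) gives
$$ A_r P_{N/R} f(x) \lesssim N M f(x), $$
for all $R \leq r \leq 2R$, and thus
$$ \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f(x)| \|_{L^q(\mathbf{R}^d)} \lesssim N \|f\|_{L^q(\mathbf{R}^d)} \tag{4.50} $$
for all $q > 1$. Meanwhile, at the $L^2$ level, we have
$$ \|A_r P_{N/R} f\|_{L^2(\mathbf{R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)} $$
and
$$ \Big\| \frac{d}{dr} A_r P_{N/R} f \Big\|_{L^2(\mathbf{R}^d)} \lesssim \frac{N}{R} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)} $$
and so
$$ \Big\| \Big( \frac{1}{R} \int_R^{2R} |A_r P_{N/R} f|^2\, dr \Big)^{1/2} + \Big( \frac{R}{N^2} \int_R^{2R} \Big| \frac{d}{dr} A_r P_{N/R} f \Big|^2\, dr \Big)^{1/2} \Big\|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)} $$
which implies by rescaled Sobolev embedding that
$$ \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}. $$
In fact, by writing $P_{N/R} f = P_{N/R} \tilde P_{N/R} f$, where $\tilde P_{N/R}$ is a slight widening
of $P_{N/R}$, we have
$$ \| \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|\tilde P_{N/R} f\|_{L^2(\mathbf{R}^d)}; $$
square summing this (and bounding a supremum by a square function) and
using Plancherel we obtain
$$ \| \sup_R \sup_{R \leq r \leq 2R} |A_r P_{N/R} f| \|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}. $$
Interpolating this against (4.50) as before we obtain (4.49) as required.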

4.6. Stein's maximal principle

Suppose one has a measure space $X = (X, \mathcal{B}, \mu)$ and a sequence of operators
$T_n : L^p(X) \to L^p(X)$ that are bounded on some $L^p(X)$ space, with
$1 \leq p < \infty$. Suppose that on some dense subclass of functions $f$ in $L^p(X)$ (e.g.
continuous compactly supported functions, if the space $X$ is reasonable),
one already knows that $T_n f$ converges pointwise almost everywhere to some
limit $T f$, for another bounded operator $T : L^p(X) \to L^p(X)$ (e.g. $T$ could
be the identity operator). What additional ingredient does one need to pass
to the limit and conclude that $T_n f$ converges almost everywhere to $T f$ for
all $f$ in $L^p(X)$ (and not just for $f$ in a dense subclass)?
One standard way to proceed here is to study the maximal operator
$$ T_* f(x) := \sup_n |T_n f(x)| $$
and aim to establish a weak-type maximal inequality
$$ \|T_* f\|_{L^{p,\infty}(X)} \leq C \|f\|_{L^p(X)} \tag{4.51} $$
for all $f \in L^p(X)$ (or all $f$ in the dense subclass), and some constant $C$,
where $L^{p,\infty}$ is the weak $L^p$ norm
$$ \|f\|_{L^{p,\infty}(X)} := \sup_{t>0} t \mu(\{x \in X : |f(x)| \geq t\})^{1/p}. $$
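To make the definition of the weak $L^p$ norm concrete, here is a minimal Python sketch (an illustration, not part of the original argument) that evaluates it on a finite measure space. The helper names `weak_lp_quasinorm` and `lp_norm`, and the two-point space with counting measure, are assumptions chosen purely for illustration. For a function taking finitely many values, the supremum over $t$ is attained at one of the values of $|f|$, which is what the loop exploits.

```python
def weak_lp_quasinorm(values, weights, p):
    """sup_{t>0} t * mu({|f| >= t})^(1/p) on a finite measure space.

    values[i] is f at the i-th point and weights[i] is the mass of that point.
    The superlevel set {|f| >= t} only changes at the values of |f|, so it
    suffices to test t equal to each non-zero value of |f|.
    """
    best = 0.0
    for t in sorted({abs(v) for v in values if v != 0}):
        mass = sum(w for v, w in zip(values, weights) if abs(v) >= t)
        best = max(best, t * mass ** (1.0 / p))
    return best


def lp_norm(values, weights, p):
    """The usual (strong) L^p norm on the same finite measure space."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Two points, each of mass 1 (a hypothetical toy example).
weights = [1, 1]
f = [2, 1]
g = [1, 2]
h = [x + y for x, y in zip(f, g)]  # f + g = (3, 3)

print(weak_lp_quasinorm(f, weights, 1))  # weak L^1 size of f
print(weak_lp_quasinorm(g, weights, 1))  # weak L^1 size of g
print(weak_lp_quasinorm(h, weights, 1))  # weak L^1 size of f + g
```

In this toy example the weak $L^1$ size of $f+g$ exceeds the sum of the weak $L^1$ sizes of $f$ and $g$, which illustrates why the weak $L^1$ "norm" is only a quasinorm, while Chebyshev's inequality keeps the weak quantity below the strong $L^1$ norm.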

A standard approximation argument using (4.51) then shows that $T_n f$ will
now indeed converge to $T f$ pointwise almost everywhere for all $f$ in $L^p(X)$,
and not just in the dense subclass. See for instance [Ta2011, §1.6], in which
this method is used to deduce the Lebesgue differentiation theorem from the
Hardy-Littlewood maximal inequality.
This is by now a very standard approach to establishing pointwise almost everywhere
convergence theorems, but it is natural to ask whether it
is strictly necessary. In particular, is it possible to have a pointwise convergence
result $T_n f \to T f$ without being able to obtain a weak-type maximal
inequality of the form (4.51)?
In the case of norm convergence (in which one asks for $T_n f$ to converge
to $T f$ in the $L^p$ norm, rather than in the pointwise almost everywhere sense),
the answer is no, thanks to the uniform boundedness principle, which among
other things shows that norm convergence is only possible if one has the
uniform bound
$$ \sup_n \|T_n f\|_{L^p(X)} \leq C \|f\|_{L^p(X)} \tag{4.52} $$
for some $C > 0$ and all $f \in L^p(X)$; and conversely, if one has the uniform
bound, and one has already established norm convergence of $T_n f$ to $T f$ on
a dense subclass of $L^p(X)$, (4.52) will extend that norm convergence to all
of $L^p(X)$.
Returning to pointwise almost everywhere convergence, the answer in
general is yes. Consider for instance the rank one operators
$$ T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\, dy $$
from $L^1(\mathbf{R})$ to $L^1(\mathbf{R})$. It is clear that $T_n f$ converges pointwise almost
everywhere to zero as $n \to \infty$ for any $f \in L^1(\mathbf{R})$, and the operators $T_n$ are
uniformly bounded on $L^1(\mathbf{R})$, but the maximal function $T_*$ does not obey
(4.51). One can modify this example in a number of ways to defeat almost
any reasonable conjecture that something like (4.51) should be necessary for
pointwise almost everywhere convergence.
In spite of this, a remarkable observation of Stein [St1961], now known
as Stein's maximal principle, asserts that the maximal inequality is necessary
to prove pointwise almost everywhere convergence, if one is working on
a compact group and the operators $T_n$ are translation invariant, and if the
exponent $p$ is at most $2$:
Theorem 4.6.1 (Stein maximal principle). Let $G$ be a compact group, let $X$
be a homogeneous space$^6$ of $G$ with a finite Haar measure $\mu$, let $1 \leq p \leq 2$,
and let $T_n : L^p(X) \to L^p(X)$ be a sequence of bounded linear operators
commuting with translations, such that $T_n f$ converges pointwise almost everywhere
for each $f \in L^p(X)$. Then (4.51) holds.
$^6$ By this, we mean that $G$ has a transitive action on $X$ which preserves $\mu$.


This is not quite the most general version of the principle; some additional
variants and generalisations are given in [St1961]. For instance, one
can replace the discrete sequence $T_n$ of operators with a continuous family
$T_t$ without much difficulty. As a typical application of this principle, we see
that Carleson's celebrated theorem [Ca1966] that the partial Fourier series
$\sum_{n=-N}^N \hat f(n) e^{2\pi i n x}$ of an $L^2(\mathbf{R}/\mathbf{Z})$ function
$f : \mathbf{R}/\mathbf{Z} \to \mathbf{C}$ converge almost
everywhere is in fact equivalent to the estimate
$$ \Big\| \sup_{N>0} \Big| \sum_{n=-N}^N \hat f(n) e^{2\pi i n \cdot} \Big| \Big\|_{L^{2,\infty}(\mathbf{R}/\mathbf{Z})} \leq C \|f\|_{L^2(\mathbf{R}/\mathbf{Z})}. \tag{4.53} $$

And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded
by first establishing (4.53), and Stein's maximal principle strongly
suggests that this is the optimal way to try to prove this theorem.
On the other hand, Theorem 4.6.1 does fail for $p > 2$, and almost everywhere
convergence results in $L^p$ for $p > 2$ can be proven by other methods
than weak $(p,p)$ estimates. For instance, the convergence of Bochner-Riesz
multipliers in $L^p(\mathbf{R}^n)$ for any $n$ (and for $p$ in the range predicted by the
Bochner-Riesz conjecture) was verified for $p > 2$ in [CaRuVe1988], despite
the fact that the weak $(p,p)$ boundedness of even a single Bochner-Riesz multiplier,
let alone the maximal function, has still not been completely verified in
this range. (The argument in [CaRuVe1988] uses weighted $L^2$ estimates
for the maximal Bochner-Riesz operator, rather than $L^p$ type estimates.)
For $p \leq 2$, though, Stein's principle (after localising to a torus) does apply,
and pointwise almost everywhere convergence of Bochner-Riesz
means is equivalent to the weak $(p,p)$ estimate (4.51).
Stein's principle is restricted to compact groups (such as the torus $(\mathbf{R}/\mathbf{Z})^n$
or the rotation group $SO(n)$) and their homogeneous spaces (such as the
torus $(\mathbf{R}/\mathbf{Z})^n$ again, or the sphere $S^{n-1}$). As stated, the principle fails
in the noncompact setting; for instance, in $\mathbf{R}$, the convolution operators
$T_n f := f * 1_{[n,n+1]}$ are such that $T_n f$ converges pointwise almost everywhere
to zero for every $f \in L^1(\mathbf{R})$, but the maximal function is not of weak-type
$(1,1)$. However, in many applications on non-compact domains, the $T_n$ are
"localised" enough that one can transfer from a non-compact setting to a
compact setting and then apply Stein's principle. For instance, Carleson's
theorem on the real line $\mathbf{R}$ is equivalent to Carleson's theorem on the circle
$\mathbf{R}/\mathbf{Z}$ (due to the localisation of the Dirichlet kernels), which as discussed
before is equivalent to the estimate (4.53) on the circle, which by a scaling
argument is equivalent to the analogous estimate on the real line $\mathbf{R}$.
Stein's argument from [St1961] can be viewed nowadays as an application
of the probabilistic method; starting with a sequence of increasingly bad
counterexamples to the maximal inequality (4.51), one randomly combines
them together to create a single "infinitely bad" counterexample. To make
this idea work, Stein employs two basic ideas:
(1) The random rotations (or random translations) trick. Given a subset
$E$ of $X$ of small but positive measure, one can randomly select
about $|G|/|E|$ translates $g_i E$ of $E$ that cover most of $X$.
(2) The random sums trick. Given a collection $f_1, \ldots, f_n : X \to \mathbf{C}$ of
signed functions that may possibly cancel each other in a deterministic
sum $\sum_{i=1}^n f_i$, one can perform a random sum $\sum_{i=1}^n \pm f_i$ instead
to obtain a random function whose magnitude will usually be comparable
to the square function $(\sum_{i=1}^n |f_i|^2)^{1/2}$; this can be made
rigorous by concentration of measure results, such as Khintchine's
inequality.
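Both tricks can be seen numerically in a toy setting. The following Python sketch is only an illustration under stated assumptions: the group is taken to be the cyclic group $\mathbf{Z}/n\mathbf{Z}$ acting on itself, $E$ is an interval, and the function names `covered_fraction` and `sign_sum_moments` are hypothetical. The first function estimates how much of the group is covered by about $1/\mu(E)$ random translates of $E$; the second averages $|\sum_i \pm a_i|$ and $|\sum_i \pm a_i|^2$ over all sign patterns, to be compared with the square function $(\sum_i |a_i|^2)^{1/2}$.

```python
import itertools
import math
import random


def covered_fraction(n, e_size, m, rng):
    """Fraction of Z/nZ covered by m random translates of E = {0, ..., e_size-1}."""
    covered = set()
    for _ in range(m):
        g = rng.randrange(n)  # a uniformly random group element
        covered.update((g + x) % n for x in range(e_size))
    return len(covered) / n


def sign_sum_moments(a):
    """Exact averages of |S| and S^2 over all sign patterns, S = sum_i eps_i * a_i."""
    first = second = 0.0
    count = 0
    for eps in itertools.product((-1, 1), repeat=len(a)):
        s = sum(e * x for e, x in zip(eps, a))
        first += abs(s)
        second += s * s
        count += 1
    return first / count, second / count


# Random translations: E has measure 1/40 in Z/2000Z and we take m = 40 translates.
rng = random.Random(0)
n, e_size, m = 2000, 50, 40
trials = 200
avg_cover = sum(covered_fraction(n, e_size, m, rng) for _ in range(trials)) / trials
predicted = 1 - (1 - e_size / n) ** m  # probability that a fixed point is covered

# Random sums: ten copies of the constant 1, whose deterministic sum is 10.
a = [1] * 10
mean_abs, mean_sq = sign_sum_moments(a)
square_function = math.sqrt(sum(x * x for x in a))

print(avg_cover, predicted)
print(mean_abs, math.sqrt(mean_sq), square_function)
```

The average of $S^2$ over sign patterns equals $\sum_i |a_i|^2$ exactly (the cross terms cancel), and Khintchine's inequality says that the other moments, such as the average of $|S|$, are comparable to the square function as well; in this example both are much smaller than the deterministic sum.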
These ideas have since been used repeatedly in harmonic analysis. For
instance, the random rotations trick was used in [ElObTa2010] to obtain
Kakeya-type estimates in finite fields. The random sums trick is by now a
standard tool to build various counterexamples to estimates (or to convergence
results) in harmonic analysis, for instance being used in [Fe1971] to
disprove the boundedness of the ball multiplier on $L^p(\mathbf{R}^n)$ for $p \neq 2$, $n \geq 2$.
Another use of the random sum trick is to show that Theorem 4.6.1 fails
once $p > 2$; see Stein's original paper for details.
Another use of the random rotations trick, closely related to Theorem
4.6.1, is the Nikishin-Stein factorisation theorem. Here is Stein's formulation
of this theorem:
Theorem 4.6.2 (Stein factorisation theorem). Let $G$ be a compact group,
let $X$ be a homogeneous space of $G$ with a finite Haar measure $\mu$, let
$1 \leq p \leq 2$ and $q > 0$, and let $T : L^p(X) \to L^q(X)$ be a bounded linear operator
commuting with translations and obeying the estimate
$$ \|T f\|_{L^q(X)} \leq A \|f\|_{L^p(X)} $$
for all $f \in L^p(X)$ and some $A > 0$. Then $T$ also maps $L^p(X)$ to $L^{p,\infty}(X)$,
with
$$ \|T f\|_{L^{p,\infty}(X)} \leq C_{p,q} A \|f\|_{L^p(X)} $$
for all $f \in L^p(X)$, with $C_{p,q}$ depending only on $p, q$.
This result is trivial with $q \geq p$, but becomes useful when $q < p$. In this
regime, the translation invariance allows one to freely "upgrade" a strong-type
$(p,q)$ result to a weak-type $(p,p)$ result. In other words, bounded linear
operators from $L^p(X)$ to $L^q(X)$ automatically factor through the inclusion
$L^{p,\infty}(X) \subset L^q(X)$, which helps explain the name "factorisation theorem".
Factorisation theory has been developed further in [Ma1974], [Pi1986].


Stein's factorisation theorem (or more precisely, a variant of it) is useful
in the theory of Kakeya and restriction theorems in Euclidean space, as first
observed in [Bo1991].
In [Ni1970], Nikishin obtained the following generalisation of Stein's
factorisation theorem in which the translation-invariance hypothesis can be
dropped, at the cost of excluding a set of small measure:
Theorem 4.6.3 (Nikishin-Stein factorisation theorem). Let $X$ be a finite
measure space, let $1 \leq p \leq 2$ and $q > 0$, and let $T : L^p(X) \to L^q(X)$
be a bounded linear operator obeying the estimate
$$ \|T f\|_{L^q(X)} \leq A \|f\|_{L^p(X)} $$
for all $f \in L^p(X)$ and some $A > 0$. Then for any $\varepsilon > 0$, there exists a
subset $E$ of $X$ of measure at most $\varepsilon$ such that
$$ \|T f\|_{L^{p,\infty}(X \backslash E)} \leq C_{p,q,\varepsilon} A \|f\|_{L^p(X)} \tag{4.54} $$
for all $f \in L^p(X)$, with $C_{p,q,\varepsilon}$ depending only on $p, q, \varepsilon$.
One can recover Theorem 4.6.2 from Theorem 4.6.3 by an averaging
argument to eliminate the exceptional set; we omit the details.
4.6.1. Sketch of proofs. We now sketch how Stein's maximal principle
is proven. We may normalise $\mu(X) = 1$. Suppose the maximal inequality
(4.51) fails for any $C$. Then, for any $A \geq 1$, we can find a non-zero function
$f \in L^p(X)$ such that
$$ \|T_* f\|_{L^{p,\infty}(X)} \geq A \|f\|_{L^p(X)}. $$
By homogeneity, we can arrange matters so that
$$ \mu(E) \geq A^p \|f\|_{L^p(X)}^p, $$
where $E := \{x \in X : |T_* f(x)| \geq 1\}$.
At present, $E$ could be a much smaller set than $X$: $\mu(E) \ll 1$. But
we can amplify $E$ by using the random rotations trick. Let $m$ be a natural
number comparable to $1/\mu(E)$, and let $g_1, \ldots, g_m$ be elements of $G$, chosen
uniformly at random. Each element $x$ of $X$ has a probability
$1 - (1 - \mu(E))^m \sim 1$ of lying in at least one of the translates $g_1 E, \ldots, g_m E$ of $E$.
From this and the first moment method, we see that with probability $\sim 1$,
the set $g_1 E \cup \ldots \cup g_m E$ has measure $\sim 1$.
Now form the function $F := \sum_{j=1}^m \epsilon_j \tau_{g_j} f$, where
$\tau_{g_j} f(x) := f(g_j^{-1} x)$ is
the left-translation of $f$ by $g_j$, and the $\epsilon_j = \pm 1$ are randomly chosen signs.
On the one hand, by an application of moment methods (such as the Paley-Zygmund
inequality), one can show that each element $x$ of $g_1 E \cup \ldots \cup g_m E$
will be such that $|T_* F(x)| \gtrsim 1$ with probability $\sim 1$. On the other hand,
an application of Khintchine's inequality shows that with high probability
$F$ will have an $L^p(X)$ norm bounded by
$$ \lesssim \Big\| \Big( \sum_{j=1}^m |\tau_{g_j} f|^2 \Big)^{1/2} \Big\|_{L^p(X)}. $$
Now we crucially use the hypothesis $p \leq 2$ to replace the $\ell^2$-summation here
by an $\ell^p$ summation. Interchanging the $\ell^p$ and $L^p$ norms, we then conclude
that with high probability we have
$$ \|F\|_{L^p(X)} \lesssim m^{1/p} \|f\|_{L^p(X)} \lesssim 1/A. $$
To summarise, using the probabilistic method, we have constructed (for
arbitrarily large $A$) a function $F = F_A$ whose $L^p$ norm is only $O(1/A)$ in size,
but such that $|T_* F(x)| \gtrsim 1$ on a subset of $X$ of measure $\sim 1$. By sending $A$
rapidly to infinity and taking a suitable combination of these functions $F$,
one can then create a function $G$ in $L^p$ such that $T_* G$ is infinite on a set
of positive measure, which contradicts the hypothesis of pointwise almost
everywhere convergence.
Stein's factorisation theorem is proven in a similar fashion. For Nikishin's
factorisation theorem, the group translation operations $g_j$ are no
longer available. However, one can substitute for this by using the failure
of the hypothesis (4.54), which among other things tells us that if one has
a number of small sets $E_1, \ldots, E_i$ in $X$ whose total measure is at most $\varepsilon$,
then we can find another function $f_{i+1}$ of small $L^p$ norm for which $T f_{i+1}$ is
large on a set $E_{i+1}$ outside of $E_1 \cup \ldots \cup E_i$. Iterating this observation and
choosing all parameters carefully, one can eventually establish the result.
Remark 4.6.4. A systematic discussion of these and other maximal principles is given in [de1981].

Chapter 5

Nonstandard analysis

5.1. Polynomial bounds via nonstandard analysis


Nonstandard analysis is useful in allowing one to import tools from infinitary
(or qualitative) mathematics in order to establish results in finitary
(or quantitative) mathematics. One drawback, though, to using nonstandard
analysis methods is that the bounds one obtains by such methods are
usually ineffective: in particular, the conclusions of a nonstandard analysis
argument may involve an unspecified constant $C$ that is known to be finite
but for which no explicit bound is obviously$^1$ available.
Because of this fact, it would seem that quantitative bounds, such as
polynomial type bounds $X \leq C Y^C$ that show that one quantity $X$ is controlled
in a polynomial fashion by another quantity $Y$, are not easily obtainable
through the ineffective methods of nonstandard analysis. Actually,
this is not the case; as I will demonstrate by an example below, nonstandard
analysis can certainly yield polynomial type bounds. The catch is that the
exponent $C$ in such bounds will be ineffective; but nevertheless such bounds
are still good enough for many applications.
$^1$ In many cases, a bound can eventually be worked out by performing proof mining on
the argument, and in particular by carefully unpacking the proofs of all the various results from
infinitary mathematics that were used in the argument, as opposed to simply using them as "black
boxes", but this is a time-consuming task and the bounds that one eventually obtains tend to be
quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).

Let us now illustrate this by reproving a lemma of Chang [Ch2003]
(Lemma 2.14, to be precise), which was recently pointed out to me by Van
Vu. Chang's paper is focused primarily on the sum-product problem, but she
uses a quantitative lemma from algebraic geometry which is of independent
interest. To motivate the lemma, let us first establish a qualitative version
(a variant of the Lefschetz principle):
Lemma 5.1.1 (Qualitative solvability). Let $P_1, \ldots, P_r : \mathbf{C}^d \to \mathbf{C}$ be a finite
number of polynomials in several variables with rational coefficients. If there
is a complex solution $z = (z_1, \ldots, z_d) \in \mathbf{C}^d$ to the simultaneous system of
equations
$$ P_1(z) = \ldots = P_r(z) = 0, $$
then there also exists a solution $z \in \overline{\mathbf{Q}}^d$ whose coefficients are algebraic
numbers (i.e. they lie in the algebraic closure $\overline{\mathbf{Q}}$ of the rationals).
Proof. Suppose there was no solution to $P_1(z) = \ldots = P_r(z) = 0$ over $\overline{\mathbf{Q}}$.
Applying Hilbert's nullstellensatz (which is available as $\overline{\mathbf{Q}}$ is algebraically
closed), we conclude the existence of some polynomials $Q_1, \ldots, Q_r$ (with
coefficients in $\overline{\mathbf{Q}}$) such that
$$ P_1 Q_1 + \ldots + P_r Q_r = 1 $$
as polynomials. In particular, we have
$$ P_1(z) Q_1(z) + \ldots + P_r(z) Q_r(z) = 1 $$
for all $z \in \mathbf{C}^d$. This shows that there is no solution to $P_1(z) = \ldots = P_r(z) = 0$
over $\mathbf{C}$, as required.

Remark 5.1.2. Observe that in the above argument, one could replace Q
and C by any other pair of fields, with the latter containing the algebraic
closure of the former, and still obtain the same result.
The above lemma asserts that if a system of rational equations is solvable
at all, then it is solvable with some algebraic solution. But it gives no bound
on the complexity of that solution in terms of the complexity of the original
equation. Chang's lemma provides such a bound. If $H \geq 1$ is an integer,
let us say that an algebraic number has height at most $H$ if its minimal
polynomial (after clearing denominators) consists of integers of magnitude
at most $H$.
Lemma 5.1.3 (Quantitative solvability). Let $P_1, \ldots, P_r : \mathbf{C}^d \to \mathbf{C}$ be a
finite number of polynomials of degree at most $D$ with rational coefficients,
each of height at most $H$. If there is a complex solution $z = (z_1, \ldots, z_d) \in \mathbf{C}^d$
to the simultaneous system of equations
$$ P_1(z) = \ldots = P_r(z) = 0, $$
then there also exists a solution $z \in \overline{\mathbf{Q}}^d$ whose coefficients are algebraic
numbers of degree at most $C$ and height at most $C H^C$, where $C = C_{D,d,r}$
depends only on $D$, $d$ and $r$.


Chang proves this lemma by essentially establishing a quantitative version
of the nullstellensatz, via elementary elimination theory (somewhat
similar, actually, to the approach I took to the nullstellensatz in
[Ta2008, §1.15]). She also notes that one could establish the result
through the machinery of Gröbner bases. In each of these arguments, it was
not possible to use Lemma 5.1.1 (or the closely related nullstellensatz) as a
black box; one actually had to unpack one of the proofs of that lemma or
nullstellensatz to get the polynomial bound. However, using nonstandard
analysis, it is possible to get such polynomial bounds (albeit with an ineffective
value of the constant $C$) directly from Lemma 5.1.1 (or more precisely,
the generalisation in Remark 5.1.2) without having to inspect the proof,
and instead simply using it as a black box, thus providing a "soft" proof of
Lemma 5.1.3 that is an alternative to the "hard" proofs mentioned above.
The nonstandard proof is essentially due to Schmidt-Göttsch [Sc1989],
and proceeds as follows. Informally, the idea is that Lemma 5.1.3 should
follow from Lemma 5.1.1 after replacing the field of rationals $\mathbf{Q}$ with "the
field of rationals of polynomially bounded height". Unfortunately, the latter
object does not really make sense as a field in standard analysis; nevertheless,
it is a perfectly sensible object in nonstandard analysis, and this allows the
above informal argument to be made rigorous.
We turn to the details. As is common whenever one uses nonstandard
analysis to prove finitary results, we use a "compactness and contradiction"
argument (or more precisely, an "ultralimit and contradiction" argument).
Suppose for contradiction that Lemma 5.1.3 failed. Carefully negating the
quantifiers (and using the axiom of choice), we conclude that there exist
$D, d, r$ such that for each natural number $n$, there is a positive integer $H^{(n)}$
and a family $P_1^{(n)}, \ldots, P_r^{(n)} : \mathbf{C}^d \to \mathbf{C}$ of polynomials of degree at most $D$
and rational coefficients of height at most $H^{(n)}$, such that there exists at least
one complex solution $z^{(n)} \in \mathbf{C}^d$ to
$$ P_1^{(n)}(z^{(n)}) = \ldots = P_r^{(n)}(z^{(n)}) = 0, \tag{5.1} $$
but such that there does not exist any such solution whose coefficients are
algebraic numbers of degree at most $n$ and height at most $n (H^{(n)})^n$.
Now we take ultralimits (see e.g. [Ta2011b, §2.1] for a quick review of
ultralimit analysis, which we will assume knowledge of in the argument that
follows). Let $p \in \beta\mathbf{N} \backslash \mathbf{N}$ be a non-principal ultrafilter. For each $i = 1, \ldots, r$,
the ultralimit
$$ P_i := \lim_{n \to p} P_i^{(n)} $$
of the (standard) polynomials $P_i^{(n)}$ is a nonstandard polynomial
$P_i : {}^*\mathbf{C}^d \to {}^*\mathbf{C}$ of degree at most $D$, whose coefficients now lie in the
nonstandard rationals ${}^*\mathbf{Q}$. Actually, due to the height restriction, we can say more. Let
$H := \lim_{n \to p} H^{(n)} \in {}^*\mathbf{N}$ be the ultralimit of the $H^{(n)}$; this is a nonstandard
natural number (which will almost certainly be unbounded, but we will not
need to use this). Let us say that a nonstandard integer $a$ is of polynomial
size if we have $|a| \leq C H^C$ for some standard natural number $C$, and say
that a nonstandard rational number $a/b$ is of polynomial height if $a, b$ are of
polynomial size. Let ${}^*\mathbf{Q}_{poly(H)}$ be the collection of all nonstandard rationals
of polynomial height. (In the language of nonstandard analysis, ${}^*\mathbf{Q}_{poly(H)}$ is
an external set rather than an internal one, because it is not itself an ultraproduct
of standard sets; but this will not be relevant for the argument that
follows.) It is easy to see that ${}^*\mathbf{Q}_{poly(H)}$ is a field, basically because the sum
or product of two integers of polynomial size remains of polynomial size. By
construction, it is clear that the coefficients of $P_i$ are nonstandard rationals
of polynomial height, and thus $P_1, \ldots, P_r$ are defined over ${}^*\mathbf{Q}_{poly(H)}$.
Meanwhile, if we let $z := \lim_{n \to p} z^{(n)} \in {}^*\mathbf{C}^d$ be the ultralimit of the
solutions $z^{(n)}$ in (5.1), we have
$$ P_1(z) = \ldots = P_r(z) = 0, $$
thus $P_1, \ldots, P_r$ are solvable in ${}^*\mathbf{C}$. Applying Lemma 5.1.1 (or more precisely,
the generalisation in Remark 5.1.2), we see that $P_1, \ldots, P_r$ are also solvable
in $\overline{{}^*\mathbf{Q}_{poly(H)}}$. (Note that as $\mathbf{C}$ is algebraically closed, ${}^*\mathbf{C}$ is also (by Łoś's
theorem), and so ${}^*\mathbf{C}$ contains $\overline{{}^*\mathbf{Q}_{poly(H)}}$.) Thus, there exists
$w \in \overline{{}^*\mathbf{Q}_{poly(H)}}^d$ with
$$ P_1(w) = \ldots = P_r(w) = 0. $$
As $\overline{{}^*\mathbf{Q}_{poly(H)}}^d$ lies in ${}^*\mathbf{C}^d$, we can write $w$ as an ultralimit
$w = \lim_{n \to p} w^{(n)}$ of
standard complex vectors $w^{(n)} \in \mathbf{C}^d$. By construction, the coefficients of $w$
each obey a non-trivial polynomial equation of degree at most $C$ and whose
coefficients are nonstandard integers of magnitude at most $C H^C$, for some
standard natural number $C$. Undoing the ultralimit, we conclude that for $n$
sufficiently close to $p$, the coefficients of $w^{(n)}$ obey a non-trivial polynomial
equation of degree at most $C$ whose coefficients are standard integers of
magnitude at most $C (H^{(n)})^C$. In particular, these coefficients have height
at most $C (H^{(n)})^C$. Also, we have
$$ P_1^{(n)}(w^{(n)}) = \ldots = P_r^{(n)}(w^{(n)}) = 0. $$
But for $n$ larger than $C$, this contradicts the construction of the $P_i^{(n)}$, and
the claim follows. (Note that as $p$ is non-principal, any neighbourhood of $p$
in $\mathbf{N}$ will contain arbitrarily large natural numbers.)
Remark 5.1.4. The same argument actually gives a slightly stronger version
of Lemma 5.1.3, namely that the integer coefficients used to define the
algebraic solution $z$ can be taken to be polynomials in the coefficients of
$P_1, \ldots, P_r$, with degree and coefficients bounded by $C_{D,d,r}$.


5.2. Loeb measure and the triangle removal lemma


Formally, a measure space is a triple $(X, \mathcal{B}, \mu)$, where $X$ is a set, $\mathcal{B}$ is a
$\sigma$-algebra of subsets of $X$, and $\mu : \mathcal{B} \to [0, +\infty]$ is a countably additive
unsigned measure on $\mathcal{B}$. If the measure $\mu(X)$ of the total space is one, then
the measure space becomes a probability space. If a non-negative function
$f : X \to [0, +\infty]$ is $\mathcal{B}$-measurable (or measurable for short), one can then
form the integral $\int_X f\, d\mu \in [0, +\infty]$ by the usual abstract measure-theoretic
construction (as discussed for instance in [Ta2011, §1.4]).
A measure space is complete if every subset of a null set (i.e. a measurable
set of measure zero) is also a null set. Not all measure spaces are
complete, but one can always form the completion $(X, \overline{\mathcal{B}}, \overline{\mu})$ of a measure
space $(X, \mathcal{B}, \mu)$ by enlarging the $\sigma$-algebra $\mathcal{B}$ to the space of all sets which are
equal to a measurable set outside of a null set, and extending the measure $\mu$
appropriately.
Given two ($\sigma$-finite) measure spaces $(X, \mathcal{B}_X, \mu_X)$ and $(Y, \mathcal{B}_Y, \mu_Y)$, one
can form the product space $(X \times Y, \mathcal{B}_X \times \mathcal{B}_Y, \mu_X \times \mu_Y)$. This is a measure
space whose domain is the Cartesian product $X \times Y$, the $\sigma$-algebra $\mathcal{B}_X \times \mathcal{B}_Y$
is generated by the "rectangles" $A \times B$ with $A \in \mathcal{B}_X$, $B \in \mathcal{B}_Y$, and the
measure $\mu_X \times \mu_Y$ is the unique measure on $\mathcal{B}_X \times \mathcal{B}_Y$ obeying the identity
$$ \mu_X \times \mu_Y(A \times B) = \mu_X(A) \mu_Y(B). $$
See for instance [Ta2011, §1.7] for a formal construction of product measure$^2$.
One of the fundamental theorems concerning product measure is
Tonelli's theorem (which is basically the unsigned version of the more well-known
Fubini theorem), which asserts that if $f : X \times Y \to [0, +\infty]$ is $\mathcal{B}_X \times \mathcal{B}_Y$
measurable, then the integral expressions
$$ \int_X \Big( \int_Y f(x,y)\, d\mu_Y(y) \Big)\, d\mu_X(x), $$
$$ \int_Y \Big( \int_X f(x,y)\, d\mu_X(x) \Big)\, d\mu_Y(y) $$
and
$$ \int_{X \times Y} f(x,y)\, d\mu_X \times \mu_Y(x,y) $$
all exist (thus all integrands are almost-everywhere well-defined and measurable
with respect to the appropriate $\sigma$-algebras), and are all equal to each
other; see e.g. [Ta2011, Theorem 1.7.15].
$^2$ There are technical difficulties with the theory when $X$ or $Y$ is not $\sigma$-finite, but in these
notes we will only be dealing with probability spaces, which are clearly $\sigma$-finite, so this difficulty
will not concern us.


Any finite non-empty set $V$ can be turned into a probability space
$(V, 2^V, \mu_V)$ by endowing it with the discrete $\sigma$-algebra $2^V := \{A : A \subset V\}$
of all subsets of $V$, and the normalised counting measure
$$ \mu_V(A) := \frac{|A|}{|V|}, $$
where $|A|$ denotes the cardinality of $A$. In this discrete setting, the probability
space is automatically complete, and every function $f : V \to [0, +\infty]$
is measurable, with the integral simply being the average:
$$ \int_V f\, d\mu_V = \frac{1}{|V|} \sum_{v \in V} f(v). $$
Of course, Tonelli's theorem is obvious for these discrete spaces; the deeper
content of that theorem is only apparent at the level of continuous measure
spaces.
Among other things, this probability space structure on finite sets can
be used to describe various statistics of dense graphs. Recall that a graph
$G = (V, E)$ is a finite vertex set $V$, together with a set of edges $E$, which we
will think of as a symmetric subset$^3$ of the Cartesian product $V \times V$. Then,
if $V$ is non-empty, and ignoring some minor errors coming from the diagonal
$\Delta_V$, the edge density of the graph is essentially
$$ e(G) := \mu_{V \times V}(E) = \int_{V \times V} 1_E(v,w)\, d\mu_{V \times V}(v,w), $$
the triangle density of the graph is basically
$$ t(G) := \int_{V \times V \times V} 1_E(u,v) 1_E(v,w) 1_E(w,u)\, d\mu_{V \times V \times V}(u,v,w), $$
and so forth.
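As a small worked illustration of these two densities (a sketch only; the function names `symmetric_edges`, `edge_density` and `triangle_density` are hypothetical, and the two three-vertex graphs are chosen for convenience), the integrals against normalised counting measure are just averages over ordered pairs and ordered triples of vertices:

```python
from fractions import Fraction
from itertools import product


def symmetric_edges(pairs):
    """Turn a list of undirected edges into a symmetric subset of V x V."""
    return {(v, w) for v, w in pairs} | {(w, v) for v, w in pairs}


def edge_density(vertices, edges):
    """e(G): the normalised counting measure of E inside V x V."""
    hits = sum(1 for v, w in product(vertices, repeat=2) if (v, w) in edges)
    return Fraction(hits, len(vertices) ** 2)


def triangle_density(vertices, edges):
    """t(G): average of 1_E(u,v) 1_E(v,w) 1_E(w,u) over ordered triples."""
    hits = sum(
        1
        for u, v, w in product(vertices, repeat=3)
        if (u, v) in edges and (v, w) in edges and (w, u) in edges
    )
    return Fraction(hits, len(vertices) ** 3)


V = [0, 1, 2]
triangle = symmetric_edges([(0, 1), (1, 2), (2, 0)])  # the complete graph on V
path = symmetric_edges([(0, 1), (1, 2)])              # a triangle-free graph

print(edge_density(V, triangle), triangle_density(V, triangle))
print(edge_density(V, path), triangle_density(V, path))
```

For the complete graph on three vertices the edge density comes out as $2/3$ rather than $1$, because the diagonal pairs $(v,v)$ are counted in the denominator; this is the kind of minor diagonal error that the text ignores, and it becomes negligible as $|V|$ grows.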
In [RuSz1978], Ruzsa and Szemerédi established the triangle removal
lemma concerning triangle densities, which informally asserts that a graph
with few triangles can be made completely triangle-free by removing a small
number of edges:
Lemma 5.2.1 (Triangle removal lemma). Let $G = (V, E)$ be a graph on a
non-empty finite set $V$, such that $t(G) \leq \delta$ for some $\delta > 0$. Then there exists
a subgraph $G' = (V, E')$ of $G$ with $t(G') = 0$, such that $e(G \backslash G') = o_{\delta \to 0}(1)$,
where $o_{\delta \to 0}(1)$ denotes a quantity bounded by $c(\delta)$ for some function $c(\delta)$ of
$\delta$ that goes to zero as $\delta \to 0$.
$^3$ If one wishes, one can prohibit loops in $E$, so that $E$ is disjoint from the diagonal
$\Delta_V := \{(v,v) : v \in V\}$ of $V \times V$, but this will not make much difference for the discussion below.


The original proof of the triangle removal lemma was a finitary one,
and proceeded via the Szemerédi regularity lemma [Sz1978]. It has a number
of consequences; for instance, as already noted in that paper, the triangle
removal lemma implies as a corollary Roth's theorem [Ro1953] that subsets
of $\mathbf{Z}$ of positive upper density contain infinitely many arithmetic progressions
of length three.
It is however also possible to establish this lemma by infinitary means.
There are at least three basic approaches for this. One is via a correspondence
principle between questions about dense finite graphs, and questions
about exchangeable random infinite graphs, as was pursued in [Ta2007],
[Ta2010b, §2.3]. A second (closely related to the first) is to use the machinery
of graph limits, as developed in [LoSz2006], [BoChLoSoVe2008].
The third is via nonstandard analysis (or equivalently, by using ultraproducts),
as was pursued in [ElSz2012]. These three approaches differ in the
technical details of their execution, but the net effect of all of these approaches
is broadly the same, in that they all convert statements about
large dense graphs (such as the triangle removal lemma) to measure-theoretic
statements on infinitary measure spaces. (This is analogous to how
the Furstenberg correspondence principle converts combinatorial statements
about dense sets of integers into ergodic-theoretic statements on measure-preserving
systems.)
In this section, we will illustrate the nonstandard analysis approach of [ElSz2012] by providing a nonstandard proof of the triangle removal lemma. The main technical tool used here (besides the basic machinery of nonstandard analysis) is that of Loeb measure [Lo1975], which gives a probability space structure $(V, \mathcal{B}_V, \mu_V)$ to nonstandard finite non-empty sets $V = \prod_{n \to p} V_n$ that is an infinitary analogue of the discrete probability space structures $V = (V, 2^V, \mu_V)$ one has on standard finite non-empty sets. The nonstandard analogues of quantities such as triangle densities then become the integrals of various nonstandard functions with respect to Loeb measure. With this approach, the epsilons and deltas that are so prevalent in the finitary approach to these subjects disappear almost completely; but to compensate for this, one now must pay much more attention to questions of measurability, which were automatic in the finitary setting but now require some care in the infinitary one.
The nonstandard analysis approaches are also related to the regularity lemma approach; see [Ta2011d, §4.4] for a proof of the regularity lemma using Loeb measure.
As usual, the nonstandard approach offers a complexity tradeoff: there
is more effort expended in building the foundational mathematical structures of the argument (in this case, ultraproducts and Loeb measure), but


once these foundations are completed, the actual arguments are shorter than
their finitary counterparts. In the case of the triangle removal lemma, this
tradeoff does not lead to a particularly significant reduction in complexity
(and arguably leads in fact to an increase in the length of the arguments,
when written out in full), but the gain becomes more apparent when proving more complicated results, such as the hypergraph removal lemma, in
which the initial investment in foundations leads to a greater savings in net
complexity, as can be seen in [ElSz2012].
5.2.1. Loeb measure. We use the usual setup of nonstandard analysis (as reviewed for instance in [Ta2011d, §4.4]). Thus, we will need a non-principal ultrafilter $p \in \beta\mathbf{N} \backslash \mathbf{N}$ on the natural numbers $\mathbf{N}$. A statement $P(n)$ pertaining to a natural number $n$ is said to hold for $n$ sufficiently close to $p$ if the set of $n$ for which $P(n)$ holds lies in the ultrafilter $p$. Given a sequence $X_n$ of (standard) spaces, the ultraproduct $\prod_{n \to p} X_n$ is the space of all ultralimits $\lim_{n \to p} x_n$ with $x_n \in X_n$, with two ultralimits $\lim_{n \to p} x_n$, $\lim_{n \to p} y_n$ considered equal if and only if $x_n = y_n$ for all $n$ sufficiently close to $p$.
Now consider a nonstandard finite non-empty set $V$, i.e. an ultraproduct $V = \prod_{n \to p} V_n$ of standard finite non-empty sets $V_n$. Define an internal subset of $V$ to be a subset of $V$ of the form $A = \prod_{n \to p} A_n$, where each $A_n$ is a subset of $V_n$. It is easy to see that the collection $\mathcal{A}_V$ of all internal subsets of $V$ is a boolean algebra. In general, though, $\mathcal{A}_V$ will not be a $\sigma$-algebra. For instance, if the $V_n$ are the standard discrete intervals $V_n := [1, n] := \{i \in \mathbf{N} : i \leq n\}$, then $V$ is the non-standard discrete interval $V = [1, N] := \{i \in {}^*\mathbf{N} : i \leq N\}$, where $N$ is the unbounded nonstandard natural number $N := \lim_{n \to p} n$. For any standard integer $m$, the subinterval $[1, N/m]$ is an internal subset of $V$; but the intersection
\[ [1, o(N)] := \bigcap_{m \in \mathbf{N}} [1, N/m] = \{i \in {}^*\mathbf{N} : i = o(N)\} \]
is not an internal subset of $V$. (This can be seen, for instance, by noting that all non-empty internal subsets of $[1, N]$ have a maximal element, whereas $[1, o(N)]$ does not.)
Given any internal subset $A = \prod_{n \to p} A_n$ of $V$, we can define the cardinality $|A|$ of $A$, which is the nonstandard natural number $|A| := \lim_{n \to p} |A_n|$. We then have the nonstandard density $\frac{|A|}{|V|}$, which is a nonstandard real number between 0 and 1. By the Bolzano-Weierstrass theorem, this bounded nonstandard real number $\frac{|A|}{|V|}$ has a unique standard part $\operatorname{st}(\frac{|A|}{|V|})$, which is a standard real number in $[0, 1]$ such that
\[ \frac{|A|}{|V|} = \operatorname{st}\left(\frac{|A|}{|V|}\right) + o(1), \]


where $o(1)$ denotes a nonstandard infinitesimal (i.e. a nonstandard number which is smaller in magnitude than any standard $\varepsilon > 0$).
In [Lo1975], Loeb observed that this standard density can be extended to a complete probability measure:
Theorem 5.2.2 (Construction of Loeb measure). Let $V$ be a nonstandard finite non-empty set. Then there exists a complete probability space $(V, \mathcal{L}_V, \mu_V)$, with the following properties:
• (Internal sets are Loeb measurable) If $A$ is an internal subset of $V$, then $A \in \mathcal{L}_V$ and
\[ \mu_V(A) = \operatorname{st}\left(\frac{|A|}{|V|}\right). \]
• (Loeb measurable sets are almost internal) If $E$ is a subset of $V$, then $E$ is Loeb measurable if and only if, for every standard $\varepsilon > 0$, there exist internal subsets $A, B_1, B_2, \ldots$ of $V$ such that
\[ E \Delta A \subset \bigcup_{n=1}^\infty B_n \]
and
\[ \sum_{n=1}^\infty \mu_V(B_n) \leq \varepsilon. \]
Proof. The map $\mu_V : A \mapsto \operatorname{st}(\frac{|A|}{|V|})$ is a finitely additive probability measure on $\mathcal{A}_V$. We claim that this map $\mu_V$ is in fact a pre-measure on $\mathcal{A}_V$, thus one has
\[ \mu_V(A) = \sum_{n=1}^\infty \mu_V(A_n) \tag{5.2} \]
whenever $A$ is an internal set that is partitioned into a disjoint sequence of internal sets $A_n$. But the countable sequence of sets $A \backslash (A_1 \cup \ldots \cup A_n)$ are internal, and have empty intersection, so by the countable saturation property of ultraproducts (see e.g. [Ta2011d, §4.4]), one of the $A \backslash (A_1 \cup \ldots \cup A_n)$ must be empty. The pre-measure property (5.2) then follows from the finite additivity of $\mu_V$.
Invoking the Hahn-Kolmogorov extension theorem (see e.g. [Ta2011, Theorem 1.7.8]), we conclude that $\mu_V$ extends to a countably additive probability measure on the $\sigma$-algebra $\langle \mathcal{A}_V \rangle$ generated by the internal sets. This measure need not be complete, but we can then pass to the completion $\mathcal{L}_V := \overline{\langle \mathcal{A}_V \rangle}$ of that $\sigma$-algebra. This probability space certainly obeys the first property. The "only if" portion of the second property asserts that all Loeb measurable sets differ from an internal set by sets of arbitrarily small


outer measure, but this is easily seen since the space of all sets that have this property is easily verified to be a complete $\sigma$-algebra that contains the algebra of internal sets. The "if" portion follows easily from the fact that $\mathcal{L}_V$ is a complete $\sigma$-algebra containing the internal sets. (These facts are very similar to the more familiar facts that a bounded subset of a Euclidean space is Lebesgue measurable if and only if it differs from an elementary set by a set of arbitrarily small outer measure.)

Now we turn to the analogue of Tonelli's theorem for Loeb measure, which will be a fundamental tool when it comes to proving the triangle removal lemma. Let $V, W$ be two nonstandard finite non-empty sets; then $V \times W$ is also a nonstandard finite non-empty set. We then have three Loeb probability spaces
\[ (V, \mathcal{L}_V, \mu_V), \]
\[ (W, \mathcal{L}_W, \mu_W), \]
\[ (V \times W, \mathcal{L}_{V \times W}, \mu_{V \times W}), \tag{5.3} \]
and we also have the product space
\[ (V \times W, \mathcal{L}_V \times \mathcal{L}_W, \mu_V \times \mu_W). \tag{5.4} \]
It is then natural to ask how the two probability spaces (5.3) and (5.4) are related. There is one easy relationship, which shows that (5.3) extends (5.4):
Exercise 5.2.1. Show that (5.3) is a refinement of (5.4), thus $\mathcal{L}_V \times \mathcal{L}_W \subset \mathcal{L}_{V \times W}$, and $\mu_{V \times W}$ extends $\mu_V \times \mu_W$. (Hint: first recall why the product of Lebesgue measurable sets is Lebesgue measurable, and mimic that proof to show that the product of a $\mathcal{L}_V$-measurable set and a $\mathcal{L}_W$-measurable set is $\mathcal{L}_{V \times W}$-measurable, and that the two measures $\mu_{V \times W}$ and $\mu_V \times \mu_W$ agree in this case.)
In the converse direction, (5.3) enjoys the type of Tonelli theorem that (5.4) does:
Theorem 5.2.3 (Tonelli theorem for Loeb measure). Let $V, W$ be two nonstandard finite non-empty sets, and let $f : V \times W \to [0, +\infty]$ be an unsigned $\mathcal{L}_{V \times W}$-measurable function. Then the expressions
\[ \int_V \left( \int_W f(v, w)\, d\mu_W(w) \right) d\mu_V(v) \tag{5.5} \]
\[ \int_W \left( \int_V f(v, w)\, d\mu_V(v) \right) d\mu_W(w) \tag{5.6} \]


and
\[ \int_{V \times W} f(v, w)\, d\mu_{V \times W}(v, w) \tag{5.7} \]
are well-defined (thus all integrands are almost everywhere well-defined and appropriately measurable) and equal to each other.
Proof. By the monotone convergence theorem it suffices to verify this when $f$ is a simple function; by linearity we may then take $f$ to be an indicator function $f = 1_E$. Using Theorem 5.2.2 and an approximation argument (and many further applications of monotone convergence) we may assume without loss of generality that $E$ is an internal set. We then have
\[ \int_{V \times W} f(v, w)\, d\mu_{V \times W}(v, w) = \operatorname{st}\left(\frac{|E|}{|V||W|}\right) \]
and for every $v \in V$, we have
\[ \int_W f(v, w)\, d\mu_W(w) = \operatorname{st}\left(\frac{|E_v|}{|W|}\right), \]
where $E_v$ is the internal set
\[ E_v := \{w \in W : (v, w) \in E\}. \]
Let $n$ be a standard natural number, then we can partition $V$ into the internal sets $V = V_1 \cup \ldots \cup V_n$, where
\[ V_i := \left\{v \in V : \frac{i-1}{n} < \frac{|E_v|}{|W|} \leq \frac{i}{n}\right\}. \]
On each $V_i$, we have
\[ \int_W f(v, w)\, d\mu_W(w) = \frac{i}{n} + O\left(\frac{1}{n}\right) \tag{5.8} \]
and
\[ \frac{|E_v|}{|W|} = \frac{i}{n} + O\left(\frac{1}{n}\right). \tag{5.9} \]
From (5.8), we see that the upper and lower integrals of $\int_W f(v, w)\, d\mu_W(w)$ are both of the form
\[ \sum_{i=1}^n \frac{i}{n} \frac{|V_i|}{|V|} + O\left(\frac{1}{n}\right). \]
Meanwhile, using the nonstandard double counting identity
\[ \frac{1}{|V|} \sum_{v \in V} \frac{|E_v|}{|W|} = \frac{|E|}{|V||W|} \]


(where all arithmetic operations are interpreted in the nonstandard sense, of course) and (5.9), we see that
\[ \frac{|E|}{|V||W|} = \sum_{i=1}^n \frac{i}{n} \frac{|V_i|}{|V|} + O\left(\frac{1}{n}\right). \]
Thus we see that the upper and lower integrals of $\int_W f(v, w)\, d\mu_W(w)$ are equal to $\frac{|E|}{|V||W|} + O(\frac{1}{n})$ for every standard $n$. Sending $n$ to infinity, we conclude that $\int_W f(v, w)\, d\mu_W(w)$ is measurable, and that
\[ \int_V \left( \int_W f(v, w)\, d\mu_W(w) \right) d\mu_V(v) = \operatorname{st}\left(\frac{|E|}{|V||W|}\right) \]
showing that (5.5) and (5.7) are well-defined and equal. A similar argument holds for (5.6) and (5.7), and the claim follows.
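The finitary shadow of this proof can be checked directly. The following Python sketch, using exact rational arithmetic, verifies the double counting identity on a standard finite example and confirms that the bucketed sum $\sum_i \frac{i}{n} \frac{|V_i|}{|V|}$ differs from $\frac{|E|}{|V||W|}$ by less than $\frac{1}{n}$. The random test set and the helper names are assumptions made purely for illustration.

```python
from fractions import Fraction
import random

def fiber_densities(V, W, E):
    """Map each v to |E_v| / |W|, where E_v = {w : (v, w) in E}."""
    return {v: Fraction(sum((v, w) in E for w in W), len(W)) for v in V}

def bucketed_sum(V, dens, n):
    """sum_i (i/n) |V_i|/|V| with V_i = {v : (i-1)/n < |E_v|/|W| <= i/n}.

    Vertices with an empty fiber fall in no bucket and contribute zero.
    """
    total = Fraction(0)
    for i in range(1, n + 1):
        Vi = [v for v in V if Fraction(i - 1, n) < dens[v] <= Fraction(i, n)]
        total += Fraction(i, n) * Fraction(len(Vi), len(V))
    return total

# Hypothetical test data: a random subset E of V x W.
random.seed(0)
V, W = range(30), range(40)
E = {(v, w) for v in V for w in W if random.random() < 0.3}

dens = fiber_densities(V, W, E)
average_fiber_density = sum(dens.values()) / len(V)
density = Fraction(len(E), len(V) * len(W))

print(average_fiber_density == density)  # the double counting identity
for n in (2, 5, 10, 100):
    print(n, float(bucketed_sum(V, dens, n) - density))  # lies in [0, 1/n)
```

Rounding each fiber density up to the next multiple of $\frac{1}{n}$ costs less than $\frac{1}{n}$ per vertex, which is the finitary content of (5.8) and (5.9).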

Remark 5.2.4. It is well known that the product of two Lebesgue measure spaces $\mathbf{R}^n, \mathbf{R}^m$, upon completion, becomes the Lebesgue measure space on $\mathbf{R}^{n+m}$. Drawing the analogy between Loeb measure and Lebesgue measure, it is then natural to ask whether (5.3) is simply the completion of (5.4). But while (5.3) certainly contains the completion of (5.4), it is a significantly larger space in general. Indeed, suppose $V = \prod_{n \to p} V_n$, $W = \prod_{n \to p} W_n$, where the cardinality of $V_n, W_n$ goes to infinity at some reasonable rate, e.g. $|V_n|, |W_n| \geq n$ for all $n$. For each $n$, let $E_n$ be a random subset of $V_n \times W_n$, with each element of $V_n \times W_n$ having an independent probability of $1/2$ of lying in $E_n$. Then, as is well known, the sequence of sets $E_n$ is almost surely asymptotically regular in the sense that almost surely, we have the bound
\[ \sup_{A_n \subset V_n, B_n \subset W_n} \frac{\left| |E_n \cap (A_n \times B_n)| - \frac{1}{2} |A_n||B_n| \right|}{|V_n||W_n|} \to 0 \]
as $n \to \infty$. Let us condition on the event that this asymptotic regularity holds. Taking ultralimits, we conclude that the internal set $E := \prod_{n \to p} E_n$ obeys the property
\[ \mu_{V \times W}(E \cap (A \times B)) = \frac{1}{2} \mu_{V \times W}(A \times B) \]
for all internal $A \subset V$, $B \subset W$; in particular, $E$ has Loeb measure $1/2$. Using Theorem 5.2.2 we conclude that
\[ \mu_{V \times W}(E \cap F) = \frac{1}{2} \mu_{V \times W}(F) \]
for all $\mathcal{L}_V \times \mathcal{L}_W$-measurable $F$, which implies in particular that $E$ cannot be $\mathcal{L}_V \times \mathcal{L}_W$-measurable. (Indeed, $1_E - \frac{1}{2}$ is "anti-measurable" in the sense that it is orthogonal to all functions in $L^2(\mathcal{L}_V \times \mathcal{L}_W)$; or equivalently, we have the conditional expectation formula $\mathbf{E}(1_E | \mathcal{L}_V \times \mathcal{L}_W) = \frac{1}{2}$ almost everywhere.)


Intuitively, a $\mathcal{L}_V \times \mathcal{L}_W$-measurable set corresponds to a subset of $V \times W$ that is of "almost bounded complexity", in that it can be approximated by a bounded boolean combination of Cartesian products. In contrast, $\mathcal{L}_{V \times W}$-measurable sets (such as the set $E$ given above) have no bound on their complexity.
5.2.2. The triangle removal lemma. Now we can prove the triangle removal lemma, Lemma 5.2.1. We will deduce it from the following nonstandard (and tripartite) counterpart (a special case of a result first established in [Ta2007]):
Lemma 5.2.5 (Nonstandard triangle removal lemma). Let $V$ be a nonstandard finite non-empty set, and let $E_{12}, E_{23}, E_{31} \subset V \times V$ be Loeb-measurable subsets of $V \times V$ which are almost triangle-free in the sense that
\[ \int_{V \times V \times V} 1_{E_{12}}(u, v) 1_{E_{23}}(v, w) 1_{E_{31}}(w, u)\, d\mu_{V \times V \times V}(u, v, w) = 0. \tag{5.10} \]
Then for any standard $\varepsilon > 0$, there exist internal subsets $F_{ij} \subset V \times V$ for $ij = 12, 23, 31$ with $\mu_{V \times V}(E_{ij} \backslash F_{ij}) < \varepsilon$, which are completely triangle-free in the sense that
\[ 1_{F_{12}}(u, v) 1_{F_{23}}(v, w) 1_{F_{31}}(w, u) = 0 \tag{5.11} \]
for all $u, v, w \in V$.
Let us first see why Lemma 5.2.5 implies Lemma 5.2.1. We use the usual "compactness and contradiction" argument. Suppose for contradiction that Lemma 5.2.1 failed. Carefully negating the quantifiers, we can find a (standard) $\varepsilon > 0$, and a sequence $G_n = (V_n, E_n)$ of graphs with $t(G_n) \leq 1/n$, such that for each $n$, there does not exist a subgraph $G'_n = (V_n, E'_n)$ of $G_n$ with $|E_n \backslash E'_n| \leq \varepsilon |V_n|^2$ and $t(G'_n) = 0$. Clearly we may assume the $V_n$ are non-empty.
We form the ultraproduct $G = (V, E)$ of the $G_n$, thus $V = \prod_{n \to p} V_n$ and $E = \prod_{n \to p} E_n$. By construction, $E$ is a symmetric internal subset of $V \times V$ and we have
\[ \int_{V \times V \times V} 1_E(u, v) 1_E(v, w) 1_E(w, u)\, d\mu_{V \times V \times V}(u, v, w) = \operatorname{st} \lim_{n \to p} t(G_n) = 0. \]
Thus, by Lemma 5.2.5, we may find internal subsets $F_{12}, F_{23}, F_{31}$ of $V \times V$ with $\mu_{V \times V}(E \backslash F_{ij}) < \varepsilon/6$ (say) for $ij = 12, 23, 31$ such that (5.11) holds for all $u, v, w \in V$. By letting $E'$ be the intersection of $E$ with all the $F_{ij}$ and their reflections, we see that $E'$ is a symmetric internal subset of $E$ with $\mu_{V \times V}(E \backslash E') < \varepsilon$, and we still have
\[ 1_{E'}(u, v) 1_{E'}(v, w) 1_{E'}(w, u) = 0 \]


for all $u, v, w \in V$. If we write $E' = \lim_{n \to p} E'_n$ for some sets $E'_n$, then for $n$ sufficiently close to $p$, one has $E'_n$ a symmetric subset of $E_n$ with
\[ \mu_{V_n \times V_n}(E_n \backslash E'_n) < \varepsilon \]
and
\[ 1_{E'_n}(u, v) 1_{E'_n}(v, w) 1_{E'_n}(w, u) = 0. \]
If we then set $G'_n := (V_n, E'_n)$, we thus have $|E_n \backslash E'_n| \leq \varepsilon |V_n|^2$ and $t(G'_n) = 0$, which contradicts the construction of $G_n$ by taking $n$ sufficiently large.
Now we prove Lemma 5.2.5. The idea (similar to that used to prove the Furstenberg recurrence theorem, as discussed for instance in [Ta2009, §2.15]) is to first prove the lemma for very simple examples of sets $E_{ij}$, and then work one's way towards the general case. Readers who are familiar with the traditional proof of the triangle removal lemma using the regularity lemma will see strong similarities between that argument and the one given here (and, on some level, they are essentially the same argument).
To begin with, we suppose first that the $E_{ij}$ are all elementary sets, in the sense that they are finite boolean combinations of products of internal sets. (At the finitary level, this corresponds to graphs that are bounded combinations of bipartite graphs.) This implies that there is an internal partition $V = V_1 \cup \ldots \cup V_n$ of the vertex set $V$, such that each $E_{ij}$ is the union of some of the $V_a \times V_b$.
Let $F_{ij}$ be the union of all the $V_a \times V_b$ in $E_{ij}$ for which $V_a$ and $V_b$ have positive Loeb measure; then $\mu_{V \times V}(E_{ij} \backslash F_{ij}) = 0$. We claim that (5.11) holds for all $u, v, w \in V$, which gives Lemma 5.2.5 in this case. Indeed, if $u \in V_a$, $v \in V_b$, $w \in V_c$ were such that (5.11) failed, then $E_{12}$ would contain $V_a \times V_b$, $E_{23}$ would contain $V_b \times V_c$, and $E_{31}$ would contain $V_c \times V_a$. The integrand in (5.10) is then equal to 1 on $V_a \times V_b \times V_c$, which has Loeb measure $\mu_V(V_a) \mu_V(V_b) \mu_V(V_c)$ which is non-zero, contradicting (5.10). This gives Lemma 5.2.5 in the elementary set case.
Next, we increase the level of generality by assuming that the $E_{ij}$ are all $\mathcal{L}_V \times \mathcal{L}_V$-measurable. (The finitary equivalent of this is a little difficult to pin down; roughly speaking, it is dealing with graphs that are not quite bounded combinations of bounded graphs, but can be well approximated by such bounded combinations; a good example is the half-graph, which is a bipartite graph between two copies of $\{1, \ldots, N\}$, which joins an edge between the first copy of $i$ and the second copy of $j$ iff $i < j$.) Then each $E_{ij}$ can be approximated to within an error of $\varepsilon/3$ in $\mu_{V \times V}$ by elementary sets. In particular, we can find a finite partition $V = V_1 \cup \ldots \cup V_n$ of $V$, and sets $E'_{ij}$ that are unions of some of the $V_a \times V_b$, such that
\[ \mu_{V \times V}(E_{ij} \Delta E'_{ij}) < \varepsilon/3. \]


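Let $F_{ij}$ be the union of all the $V_a \times V_b$ contained in $E'_{ij}$ such that $V_a, V_b$ have positive Loeb measure, and such that
\[ \mu_{V \times V}(E_{ij} \cap (V_a \times V_b)) > \frac{2}{3} \mu_{V \times V}(V_a \times V_b). \]
Then the $F_{ij}$ are internal subsets of $V \times V$, and $\mu_{V \times V}(E_{ij} \backslash F_{ij}) < \varepsilon$.
We now claim that the $F_{ij}$ obey (5.11) for all $u, v, w$, which gives Lemma 5.2.5 in this case. Indeed, if $u \in V_a$, $v \in V_b$, $w \in V_c$ were such that (5.11) failed, then $E_{12}$ occupies more than $\frac{2}{3}$ of $V_a \times V_b$, and thus
\[ \int_{V_a \times V_b \times V_c} 1_{E_{12}}(u, v)\, d\mu_{V \times V \times V}(u, v, w) > \frac{2}{3} \mu_{V \times V \times V}(V_a \times V_b \times V_c). \]
Similarly for $1_{E_{23}}(v, w)$ and $1_{E_{31}}(w, u)$. From the inclusion-exclusion formula, we conclude that
\[ \int_{V_a \times V_b \times V_c} 1_{E_{12}}(u, v) 1_{E_{23}}(v, w) 1_{E_{31}}(w, u)\, d\mu_{V \times V \times V}(u, v, w) > 0, \]
contradicting (5.10), and the claim follows.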
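The inclusion-exclusion step has an exact finitary counterpart: if each of three edge sets fills more than $\frac{2}{3}$ of its cell, then fewer than $\frac{1}{3}$ of the triples in $V_a \times V_b \times V_c$ fail each of the three conditions, so a positive fraction of triples are triangles. The Python sketch below checks this on a small deterministic example; the partition, the modular edge sets, and the function names are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

def cell_density(E, A, B):
    """Fraction of the cell A x B occupied by the edge set E."""
    return Fraction(sum((a, b) in E for a in A for b in B), len(A) * len(B))

def dense_cells(E, parts, threshold=Fraction(2, 3)):
    """Index pairs (a, b) whose cell V_a x V_b is more than `threshold` full of E."""
    idx = range(len(parts))
    return {(a, b) for a, b in product(idx, repeat=2)
            if cell_density(E, parts[a], parts[b]) > threshold}

def triangle_fraction(E12, E23, E31, A, B, C):
    """Fraction of (u, v, w) in A x B x C with (u,v) in E12, (v,w) in E23, (w,u) in E31."""
    hits = sum((u, v) in E12 and (v, w) in E23 and (w, u) in E31
               for u in A for v in B for w in C)
    return Fraction(hits, len(A) * len(B) * len(C))

# Hypothetical data: two cells of size 8; each edge set omits one residue class mod 4,
# so every cell has density exactly 3/4 > 2/3.
V = range(16)
parts = [range(0, 8), range(8, 16)]
E12 = {(u, v) for u in V for v in V if (u + v) % 4 != 0}
E23 = {(v, w) for v in V for w in V if (v + w) % 4 != 1}
E31 = {(w, u) for w in V for u in V if (w + u) % 4 != 2}

D12, D23, D31 = (dense_cells(E, parts) for E in (E12, E23, E31))
for a, b, c in product(range(len(parts)), repeat=3):
    if (a, b) in D12 and (b, c) in D23 and (c, a) in D31:
        frac = triangle_fraction(E12, E23, E31, parts[a], parts[b], parts[c])
        print((a, b, c), frac)  # strictly positive by the union bound
```

In this example each condition fails on exactly a quarter of the triples, so the union bound gives a triangle fraction of at least $\frac{1}{4}$ in every cell triple.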


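Finally, we turn to the general case, when the $E_{ij}$ are merely $\mathcal{L}_{V \times V}$-measurable. Here, we split
\[ 1_{E_{ij}} = f_{ij} + g_{ij} \]
where $f_{ij} := \mathbf{E}(1_{E_{ij}} | \mathcal{L}_V \times \mathcal{L}_V)$ is the conditional expectation of $1_{E_{ij}}$ onto $\mathcal{L}_V \times \mathcal{L}_V$, and $g_{ij} := 1_{E_{ij}} - f_{ij}$ is the remainder. We observe that each $g_{ij}(u, v)$ is orthogonal to any tensor product $f(u) g(v)$ with $f, g$ bounded and $\mathcal{L}_V$-measurable. From this and Tonelli's theorem for Loeb measure (Theorem 5.2.3) we conclude that each of the $g_{ij}$ make a zero contribution to (5.10), and thus
\[ \int_{V \times V \times V} f_{12}(u, v) f_{23}(v, w) f_{31}(w, u)\, d\mu_{V \times V \times V}(u, v, w) = 0. \]
Now let $E'_{ij} := \{(u, v) \in V \times V : f_{ij}(u, v) \geq \varepsilon/2\}$; then the $E'_{ij}$ are $\mathcal{L}_V \times \mathcal{L}_V$-measurable, and we have
\[ \int_{V \times V \times V} 1_{E'_{12}}(u, v) 1_{E'_{23}}(v, w) 1_{E'_{31}}(w, u)\, d\mu_{V \times V \times V}(u, v, w) = 0. \]
Also, we have
\[ \mu_{V \times V}(E_{ij} \backslash E'_{ij}) = \int_{V \times V} 1_{E_{ij}} (1 - 1_{E'_{ij}}) = \int_{V \times V} f_{ij} (1 - 1_{E'_{ij}}) \leq \varepsilon/2. \]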

Applying the already established cases of Lemma 5.2.5, we can find internal sets $F_{ij}$ obeying (5.11) with $\mu_{V \times V}(E'_{ij} \backslash F_{ij}) < \varepsilon/2$, and hence $\mu_{V \times V}(E_{ij} \backslash F_{ij}) < \varepsilon$, and Lemma 5.2.5 follows.
Remark 5.2.6. The full hypergraph removal lemma can be proven using similar techniques, but with a longer tower of generalisations than the three cases given here; see [Ta2007] or [ElSz2012].

Chapter 6

Partial differential equations

6.1. The limiting absorption principle


Perhaps the most fundamental differential operator on Euclidean space $\mathbf{R}^d$ is the Laplacian
\[ \Delta := \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}. \]
The Laplacian is a linear translation-invariant operator, and as such is necessarily diagonalised by the Fourier transform
\[ \hat{f}(\xi) := \int_{\mathbf{R}^d} f(x) e^{-2\pi i x \cdot \xi}\, dx. \]
Indeed, we have
\[ \widehat{\Delta f}(\xi) = -4\pi^2 |\xi|^2 \hat{f}(\xi) \]
for any suitably nice function $f$ (e.g. in the Schwartz class; alternatively, one can work in very rough classes, such as the space of tempered distributions, provided of course that one is willing to interpret all operators in a distributional or weak sense).
Because of this explicit diagonalisation, it is a straightforward matter to define spectral multipliers $m(-\Delta)$ of the Laplacian for any (measurable, polynomial growth) function $m : [0, +\infty) \to \mathbf{C}$, by the formula
\[ \widehat{m(-\Delta) f}(\xi) := m(4\pi^2 |\xi|^2) \hat{f}(\xi). \]
(The presence of the minus sign in front of the Laplacian has some minor technical advantages, as it makes $-\Delta$ positive semi-definite. One can also

define spectral multipliers more abstractly from general functional calculus, after establishing that the Laplacian is essentially self-adjoint.) Many of these multipliers are of importance in PDE and analysis, such as the fractional derivative operators $(-\Delta)^{s/2}$, the heat propagators $e^{t\Delta}$, the (free) Schrödinger propagators $e^{it\Delta}$, the wave propagators $e^{\pm it\sqrt{-\Delta}}$ (or $\cos(t\sqrt{-\Delta})$ and $\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}$, depending on one's conventions), the spectral projections $1_I(\sqrt{-\Delta})$, the Bochner-Riesz summation operators $(1 + \frac{\Delta}{4\pi^2 R^2})_+^\delta$, or the resolvents $R(z) := (-\Delta - z)^{-1}$.
Each of these families of multipliers is related to the others, by means of various integral transforms (and also, in some cases, by analytic continuation). For instance:
(1) Using the Laplace transform, one can express (sufficiently smooth) multipliers in terms of heat operators. For instance, using the identity
\[ \lambda^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{-t\lambda}\, dt \]
(using analytic continuation if necessary to make the right-hand side well-defined), with $\Gamma$ being the Gamma function, we can write the fractional derivative operators in terms of heat kernels:
\[ (-\Delta)^{s/2} = \frac{1}{\Gamma(-s/2)} \int_0^\infty t^{-1-s/2} e^{t\Delta}\, dt. \tag{6.1} \]
(2) Using analytic continuation, one can connect heat operators $e^{t\Delta}$ to Schrödinger operators $e^{it\Delta}$, a process also known as Wick rotation. Analytic continuation is a notoriously unstable process, and so it is difficult to use analytic continuation to obtain any quantitative estimates on (say) Schrödinger operators from their heat counterparts; however, this procedure can be useful for propagating identities from one family to another. For instance, one can derive the fundamental solution for the Schrödinger equation from the fundamental solution for the heat equation by this method.
(3) Using the Fourier inversion formula, one can write general multipliers as integral combinations of Schrödinger or wave propagators; for instance, if $z$ lies in the upper half plane $\mathbf{H} := \{z \in \mathbf{C} : \operatorname{Im} z > 0\}$, one has
\[ \frac{1}{x - z} = i \int_0^\infty e^{-itx} e^{itz}\, dt \]
for any real number $x$, and thus we can write resolvents in terms of Schrödinger propagators:
\[ R(z) = i \int_0^\infty e^{it\Delta} e^{itz}\, dt. \tag{6.2} \]


In a similar vein, if $k \in \mathbf{H}$, then
\[ \frac{1}{x^2 - k^2} = \frac{i}{k} \int_0^\infty \cos(tx) e^{ikt}\, dt \tag{6.3} \]
for any $x > 0$, so one can also write resolvents in terms of wave propagators:
\[ R(k^2) = \frac{i}{k} \int_0^\infty \cos(t\sqrt{-\Delta}) e^{ikt}\, dt. \]
(4) Using the Cauchy integral formula, one can express (sufficiently holomorphic) multipliers in terms of resolvents (or limits of resolvents). For instance, if $t > 0$, then from the Cauchy integral formula (and Jordan's lemma) one has
\[ e^{-itx} = -\frac{1}{2\pi i} \lim_{\varepsilon \to 0^+} \int_{\mathbf{R}} \frac{e^{-ity}}{y - x + i\varepsilon}\, dy \tag{6.4} \]
for any $x \in \mathbf{R}$, and so one can (formally, at least) write Schrödinger propagators in terms of resolvents:
\[ e^{it\Delta} = \frac{1}{2\pi i} \lim_{\varepsilon \to 0^+} \int_{\mathbf{R}} e^{-ity} R(y + i\varepsilon)\, dy. \]
(5) The imaginary part of $\frac{1}{\pi} \frac{1}{x - (y + i\varepsilon)}$ is the Poisson kernel $\frac{\varepsilon}{\pi} \frac{1}{(y - x)^2 + \varepsilon^2}$, which is an approximation to the identity. As a consequence, for any reasonable function $m(x)$, one has (formally, at least)
\[ m(x) = \lim_{\varepsilon \to 0^+} \frac{1}{\pi} \int_{\mathbf{R}} \left( \operatorname{Im} \frac{1}{x - (y + i\varepsilon)} \right) m(y)\, dy \tag{6.5} \]
which leads (again formally) to the ability to express arbitrary multipliers in terms of imaginary (or skew-adjoint) parts of resolvents:
\[ m(-\Delta) = \lim_{\varepsilon \to 0^+} \frac{1}{\pi} \int_{\mathbf{R}} (\operatorname{Im} R(y + i\varepsilon)) m(y)\, dy. \]
Among other things, this type of formula (with $-\Delta$ replaced by a more general self-adjoint operator) is used in the resolvent-based approach to the spectral theorem (by using the limiting imaginary part of resolvents to build spectral measure). Note that one can also express $\operatorname{Im} R(y + i\varepsilon)$ as $\frac{1}{2i}(R(y + i\varepsilon) - R(y - i\varepsilon))$.
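Since all of these operator identities are obtained by applying scalar identities to the spectral variable $x$ of $-\Delta$, the scalar versions can be sanity-checked numerically. The following Python sketch uses a plain trapezoid rule on truncated integrals to test the scalar identity behind (6.2), the identity (6.3), and the Poisson-kernel approximation (6.5). The function names, truncation lengths, step counts, and sample values are assumptions chosen for illustration only.

```python
import cmath
import math

def trapezoid(f, a, b, n):
    """Composite trapezoid rule for a (possibly complex-valued) integrand on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for j in range(1, n):
        total += f(a + j * h)
    return total * h

def resolvent_symbol_via_schrodinger(x, z, T=40.0, n=40000):
    """i * int_0^T e^{-itx} e^{itz} dt, approximating 1/(x - z) when Im z > 0."""
    return 1j * trapezoid(lambda t: cmath.exp(-1j * t * x) * cmath.exp(1j * t * z), 0.0, T, n)

def resolvent_symbol_via_wave(x, k, T=40.0, n=40000):
    """(i/k) * int_0^T cos(tx) e^{ikt} dt, approximating 1/(x^2 - k^2) when Im k > 0."""
    return (1j / k) * trapezoid(lambda t: math.cos(t * x) * cmath.exp(1j * k * t), 0.0, T, n)

def poisson_reconstruction(m, x, eps, L=10.0, n=20000):
    """(1/pi) * int_{-L}^{L} Im(1/(x - (y + i eps))) m(y) dy, approximating m(x) for small eps."""
    kernel = lambda y: (1.0 / (x - (y + 1j * eps))).imag
    return trapezoid(lambda y: kernel(y) * m(y), -L, L, n) / math.pi

x, z, k = 2.0, 0.5 + 1.0j, 0.5 + 1.0j
print(resolvent_symbol_via_schrodinger(x, z), 1 / (x - z))
print(resolvent_symbol_via_wave(x, k), 1 / (x**2 - k**2))

gaussian = lambda y: math.exp(-y * y)
print(poisson_reconstruction(gaussian, 0.3, 0.01), gaussian(0.3))
```

The first two comparisons converge quickly because the integrands decay like $e^{-t \operatorname{Im} z}$; the third is only accurate to within an error of size comparable to $\varepsilon$, reflecting the slowly decaying tails of the Poisson kernel.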

Remark 6.1.1. The ability of heat operators, Schrödinger propagators, wave propagators, or resolvents to generate other spectral multipliers can be viewed as a sort of manifestation of the Stone-Weierstrass theorem (though with the caveat that the spectrum of the Laplacian is non-compact and so


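the Stone-Weierstrass theorem does not directly apply). Indeed, observe the *-algebra type identities
\[ e^{s\Delta} e^{t\Delta} = e^{(s+t)\Delta}; \quad (e^{s\Delta})^* = e^{s\Delta}; \]
\[ e^{is\Delta} e^{it\Delta} = e^{i(s+t)\Delta}; \quad (e^{is\Delta})^* = e^{-is\Delta}; \]
\[ e^{is\sqrt{-\Delta}} e^{it\sqrt{-\Delta}} = e^{i(s+t)\sqrt{-\Delta}}; \quad (e^{is\sqrt{-\Delta}})^* = e^{-is\sqrt{-\Delta}}; \]
\[ R(z) R(w) = \frac{R(w) - R(z)}{w - z}; \quad R(z)^* = R(\overline{z}). \]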
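The sign convention in the resolvent identity is easy to get wrong, so it may be worth checking it at the level of symbols. With $R(z) = (-\Delta - z)^{-1}$ acting as multiplication by $\frac{1}{x - z}$ at a spectral value $x \geq 0$ of $-\Delta$, the identity reads $R(z) - R(w) = (z - w) R(z) R(w)$. The short Python sketch below (with an illustrative function name and arbitrary sample points, neither taken from the text) confirms this together with the adjoint relation.

```python
def resolvent_symbol(x, z):
    """Symbol 1/(x - z) of R(z) = (-Delta - z)^{-1} at a spectral value x of -Delta."""
    return 1.0 / (x - z)

# Arbitrary sample points off the spectrum [0, +infinity).
x, z, w = 2.0, 0.5 + 1.0j, -1.0 + 0.25j

product_of_resolvents = resolvent_symbol(x, z) * resolvent_symbol(x, w)
difference_quotient = (resolvent_symbol(x, z) - resolvent_symbol(x, w)) / (z - w)
print(abs(product_of_resolvents - difference_quotient))  # zero up to rounding

# Adjoint relation R(z)^* = R(conj(z)) at the level of symbols.
print(abs(resolvent_symbol(x, z).conjugate() - resolvent_symbol(x, z.conjugate())))
```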

Because of these relationships, it is possible (in principle, at least), to leverage one's understanding of one family of spectral multipliers to gain control on another family of multipliers. For instance, the fact that the heat operators $e^{t\Delta}$ have non-negative kernel (a fact which can be seen from the maximum principle, or from the Brownian motion interpretation of the heat kernels) implies (by (6.1)) that the fractional integral operators $(-\Delta)^{-s/2}$ for $s > 0$ also have non-negative kernel. Or, the fact that the wave equation enjoys finite speed of propagation (and hence that the wave propagators $\cos(t\sqrt{-\Delta})$ have distributional convolution kernel localised to the ball of radius $|t|$ centred at the origin), can be used (by (6.3)) to show that the resolvents $R(k^2)$ have a convolution kernel that is essentially localised to the ball of radius $O(1/|\operatorname{Im}(k)|)$ around the origin.
In this section, we will continue this theme by using the resolvents $R(z) = (-\Delta - z)^{-1}$ to control other spectral multipliers. These resolvents are well-defined whenever $z$ lies outside of the spectrum $[0, +\infty)$ of the operator $-\Delta$. In the model three-dimensional$^1$ case $d = 3$, they can be defined explicitly by the formula
\[ R(k^2) f(x) = \int_{\mathbf{R}^3} \frac{e^{ik|x-y|}}{4\pi|x-y|} f(y)\, dy \]
whenever $k$ lives in the upper half-plane $\{k \in \mathbf{C} : \operatorname{Im}(k) > 0\}$, ensuring the absolute convergence of the integral for test functions $f$. It is an instructive exercise to verify that this resolvent indeed inverts the operator $-\Delta - k^2$, either by using Fourier analysis or by Green's theorem.
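As a partial check of this exercise, one can confirm numerically that the kernel $G(r) = \frac{e^{ikr}}{4\pi r}$ solves the homogeneous equation $(-\Delta - k^2) G = 0$ away from the origin, using the radial form $\Delta G = G'' + \frac{2}{r} G'$ of the Laplacian in three dimensions. The Python sketch below does this with central finite differences; the function names, step size, and sample points are assumptions, and the delta function at the origin is not tested.

```python
import cmath
import math

def free_resolvent_kernel(r, k):
    """G(r) = e^{ikr} / (4 pi r), the kernel of R(k^2) in three dimensions."""
    return cmath.exp(1j * k * r) / (4 * math.pi * r)

def radial_helmholtz_residual(r, k, h=1e-3):
    """Finite-difference value of (-Delta - k^2) G at radius r > 0 for radial G."""
    G = lambda s: free_resolvent_kernel(s, k)
    first = (G(r + h) - G(r - h)) / (2 * h)
    second = (G(r + h) - 2 * G(r) + G(r - h)) / h**2
    laplacian = second + (2.0 / r) * first  # radial Laplacian in R^3
    return -laplacian - k**2 * G(r)

k = 1.0 + 0.5j  # a sample point in the upper half-plane
for r in (0.7, 1.3, 4.0):
    print(r, abs(radial_helmholtz_residual(r, k)))  # small, limited by discretisation error
```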
$^1$In general dimension, explicit formulas are still available, but involve Bessel functions. But asymptotically at least, and ignoring higher order terms, one simply replaces $\frac{e^{ik|x-y|}}{4\pi|x-y|}$ by $\frac{e^{ik|x-y|}}{c_d |x-y|^{d-2}}$ for some explicit constant $c_d$.


Henceforth we restrict attention to three dimensions $d = 3$ for simplicity. One consequence of the above explicit formula is that for positive real $\lambda > 0$, the resolvents $R(\lambda + i\varepsilon)$ and $R(\lambda - i\varepsilon)$ tend to different limits as $\varepsilon \to 0$, reflecting the jump discontinuity in the resolvent function at the spectrum; as one can guess from formulae such as (6.4) or (6.5), such limits are of interest for understanding many other spectral multipliers. Indeed, for any test function $f$, we see that
\[ \lim_{\varepsilon \to 0^+} R(\lambda + i\varepsilon) f(x) = \int_{\mathbf{R}^3} \frac{e^{i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\, dy \]
and
\[ \lim_{\varepsilon \to 0^+} R(\lambda - i\varepsilon) f(x) = \int_{\mathbf{R}^3} \frac{e^{-i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\, dy. \]
Both of these functions
\[ u_\pm(x) := \int_{\mathbf{R}^3} \frac{e^{\pm i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\, dy \]
solve the Helmholtz equation
\[ (-\Delta - \lambda) u_\pm = f, \tag{6.6} \]
but have different asymptotics at infinity. Indeed, if $\int_{\mathbf{R}^3} f(y)\, dy = A$, then we have the asymptotic
\[ u_\pm(x) = \frac{A e^{\pm i\sqrt{\lambda}|x|}}{4\pi|x|} + O\left(\frac{1}{|x|^2}\right) \tag{6.7} \]
as $|x| \to \infty$, leading also to the Sommerfeld radiation condition
\[ u_\pm(x) = O\left(\frac{1}{|x|}\right); \quad (\partial_r \mp i\sqrt{\lambda}) u_\pm(x) = O\left(\frac{1}{|x|^2}\right) \tag{6.8} \]
where $\partial_r := \frac{x}{|x|} \cdot \nabla_x$ is the outgoing radial derivative. Indeed, one can show using an integration by parts argument that $u_\pm$ is the unique solution of the Helmholtz equation (6.6) obeying (6.8) (see below). $u_+$ is known as the outward radiating solution of the Helmholtz equation (6.6), and $u_-$ is known as the inward radiating solution. Indeed, if one views the function $u_\pm(t, x) := e^{-i\lambda t} u_\pm(x)$ as a solution to the inhomogeneous Schrödinger equation
\[ (i\partial_t + \Delta) u_\pm = -e^{-i\lambda t} f \]
and uses the de Broglie law that a solution to such an equation with wave number $k \in \mathbf{R}^3$ (i.e. resembling $A e^{ik \cdot x}$ for some amplitude $A$) should propagate at (group) velocity $2k$, we see (heuristically, at least) that the outward radiating solution will indeed propagate radially away from the origin at speed $2\sqrt{\lambda}$, while the inward radiating solution propagates inward at the same speed.
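For the model outgoing wave $G_+(r) = \frac{e^{i\sqrt{\lambda} r}}{4\pi r}$ the radiation condition can be seen explicitly: $(\partial_r - i\sqrt{\lambda}) G_+ = -G_+/r$, which has size exactly $\frac{1}{4\pi r^2}$, whereas the opposite sign $(\partial_r + i\sqrt{\lambda}) G_+$ only decays like $\frac{1}{r}$. The Python sketch below checks both rates by finite differences; the function names, the value $\lambda = 2$, and the radii are illustrative assumptions.

```python
import cmath
import math

def outgoing_wave(r, lam):
    """Model outgoing solution e^{i sqrt(lam) r} / (4 pi r) for r > 0."""
    return cmath.exp(1j * math.sqrt(lam) * r) / (4 * math.pi * r)

def radiation_defect(r, lam, sign=+1, h=1e-4):
    """Finite-difference value of (d/dr - sign * i * sqrt(lam)) applied to the outgoing wave."""
    derivative = (outgoing_wave(r + h, lam) - outgoing_wave(r - h, lam)) / (2 * h)
    return derivative - sign * 1j * math.sqrt(lam) * outgoing_wave(r, lam)

lam = 2.0
for r in (10.0, 100.0):
    correct = abs(radiation_defect(r, lam, sign=+1)) * r**2  # stays near 1/(4 pi)
    wrong = abs(radiation_defect(r, lam, sign=-1)) * r       # stays near sqrt(lam)/(2 pi)
    print(r, correct, wrong)
```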


There is a useful quantitative version of the convergence
\[ R(\lambda \pm i\varepsilon) f \to u_\pm, \tag{6.9} \]
known as the limiting absorption principle:
Theorem 6.1.2 (Limiting absorption principle). Let $f$ be a test function on $\mathbf{R}^3$, let $\lambda > 0$, and let $\sigma > 0$. Then one has
\[ \|R(\lambda \pm i\varepsilon) f\|_{H^{0,-1/2-\sigma}(\mathbf{R}^3)} \leq C_\sigma \lambda^{-1/2} \|f\|_{H^{0,1/2+\sigma}(\mathbf{R}^3)} \]
for all $\varepsilon > 0$, where $C_\sigma > 0$ depends only on $\sigma$, and $H^{0,s}(\mathbf{R}^3)$ is the weighted norm
\[ \|f\|_{H^{0,s}(\mathbf{R}^3)} := \|\langle x \rangle^s f\|_{L^2_x(\mathbf{R}^3)} \]
and $\langle x \rangle := (1 + |x|^2)^{1/2}$.
This principle allows one to extend the convergence (6.9) from test functions $f$ to all functions in the weighted space $H^{0,1/2+\sigma}$ by a density argument (though the radiation condition (6.8) has to be adapted suitably for this scale of spaces when doing so). The weighted space $H^{0,-1/2-\sigma}$ on the left-hand side is optimal, as can be seen from the asymptotic (6.7); a duality argument similarly shows that the weighted space $H^{0,1/2+\sigma}$ on the right-hand side is also optimal.
We will prove this theorem shortly. As observed long ago by Kato [Ka1965] (and also reproduced below), this estimate is equivalent (via a Fourier transform in the spectral variable $\lambda$) to a useful estimate for the free Schrödinger equation known as the local smoothing estimate, which in particular implies the well-known RAGE theorem for that equation; it also has similar consequences for the free wave equation. As we shall see, it also encodes some spectral information about the Laplacian; for instance, it can be used to show that the Laplacian has no eigenvalues, resonances, or singular continuous spectrum. These spectral facts are already obvious from the Fourier transform representation of the Laplacian, but the point is that the limiting absorption principle also applies to more general operators for which the explicit diagonalisation afforded by the Fourier transform is not available; see [RoTa2011].
Important caveat: In order to illustrate the main ideas and suppress
technical details, I will be a little loose with some of the rigorous details of
the arguments, and in particular will be manipulating limits and integrals
at a somewhat formal level.
6.1.1. Uniqueness. We first use an integration by parts argument to show uniqueness of the solution to the Helmholtz equation (6.6) assuming the radiation condition (6.8). For sake of concreteness we shall work with the sign $\pm = +$, and we will ignore issues of regularity, assuming all functions


are as smooth as needed. (In practice, the elliptic nature of the Laplacian ensures that issues of regularity are easily dealt with.) If uniqueness fails, then by subtracting the two solutions, we obtain a non-trivial solution $u$ to the homogeneous Helmholtz equation
\[ (-\Delta - \lambda) u = 0 \tag{6.10} \]
such that
\[ u(x) = O\left(\frac{1}{|x|}\right); \quad (\partial_r - i\sqrt{\lambda}) u(x) = O\left(\frac{1}{|x|^2}\right). \]
Next, we introduce the charge current
\[ j^i := \operatorname{Im}(\overline{u} \partial^i u) \]
(using the usual Einstein index notations), and observe from (6.6) that this current is divergence-free:
\[ \partial_i j^i = 0. \]
(This reflects the phase rotation invariance $u \mapsto e^{i\theta} u$ of the equation (6.6), and can also be viewed as a version of the conservation of the Wronskian.) From Stokes' theorem, and using polar coordinates, we conclude in particular that
\[ \int_{S^2} j_r(r\omega)\, d\omega = 0 \]
or in other words that
\[ \int_{S^2} \operatorname{Im}(\overline{u} \partial_r u)(r\omega)\, d\omega = 0. \]
Using the radiation condition, this implies in particular that
\[ \int_{S^2} |u(r\omega)|^2\, d\omega = O(r^{-3}) \tag{6.11} \]
and
\[ \int_{S^2} |\partial_r u(r\omega)|^2\, d\omega = O(r^{-3}) \tag{6.12} \]
as $r \to \infty$.
Now we use the "positive commutator method". Consider the expression
\[ \int_{\mathbf{R}^3} [\partial_r, -\Delta - \lambda] u(x) \overline{u(x)}\, dx. \tag{6.13} \]
(To be completely rigorous, one should insert a cutoff to a large ball, and then send the radius of that ball to infinity, in order to make the integral well-defined, but we will ignore this technicality here.) On the one hand, we may integrate by parts (using (6.11), (6.12) to show that all boundary terms


go to zero) and (6.10) to see that this expression vanishes. On the other hand, by expanding the Laplacian in polar coordinates we see that
\[ [-\Delta - \lambda, \partial_r] = -\frac{2}{r^2} \partial_r - \frac{2}{r^3} \Delta_\omega. \]
An integration by parts in polar coordinates (using (6.11), (6.12) to justify ignoring the boundary terms at infinity) shows that
\[ -\int_{\mathbf{R}^3} \frac{2}{r^2} \partial_r u(x) \overline{u(x)}\, dx = 8\pi |u(0)|^2 \]
and
\[ -\int_{\mathbf{R}^3} \frac{2}{r^3} \Delta_\omega u(x) \overline{u(x)}\, dx = 2 \int_{\mathbf{R}^3} \frac{|\nabla_{ang} u(x)|^2}{|x|}\, dx \]
where $|\nabla_{ang} u(x)|^2 := |\nabla u(x)|^2 - |\partial_r u(x)|^2$ is the angular part of the kinetic energy density $|\nabla u(x)|^2$. We obtain (a degenerate case of) the Pohozaev-Morawetz identity
\[ 8\pi |u(0)|^2 + 2 \int_{\mathbf{R}^3} \frac{|\nabla_{ang} u(x)|^2}{|x|}\, dx = 0 \]
which implies in particular that $u$ vanishes at the origin. Translating $u$ around (noting that this does not affect either the Helmholtz equation or the Sommerfeld radiation condition) we see that $u$ vanishes completely. (Alternatively, one can replace $\partial_r$ by the smoothed out multiplier $\frac{x}{\langle x \rangle} \cdot \nabla$, in which case the Pohozaev-Morawetz identity acquires a term of the form $\int_{\mathbf{R}^3} \frac{|u(x)|^2}{\langle x \rangle^5}\, dx$ which is enough to directly ensure that $u$ vanishes.)
6.1.2. Proof of the limiting absorption principle. We now sketch a
proof of the limiting absorption principle, also based on the positive commutator
method. For notational simplicity we shall only consider the case
when $\lambda$ is comparable to 1, though the method we give here also yields the
general case after some more bookkeeping.
Let $\sigma > 0$ be a small exponent to be chosen later, and let f be normalised
to have $H^{0,1/2+\sigma}(\mathbf{R}^3)$ norm equal to 1. For sake of concreteness let us take
the + sign, so that we wish to bound $u := R(\lambda + i\varepsilon) f$. This u obeys the
Helmholtz equation
$$\Delta u + \lambda u = -f - i\varepsilon u. \tag{6.14}$$
For positive $\varepsilon$, we also see from the spectral theorem that u lies in $L^2(\mathbf{R}^3)$;
the bound here though depends on $\varepsilon$, so we can only use this $L^2(\mathbf{R}^3)$ regularity
for qualitative purposes (and specifically, for ensuring that boundary
terms at infinity from integration by parts vanish) rather than quantitatively.


Once again, we may apply the positive commutator method. If we again
consider the expression (6.13), then on the one hand this expression evaluates
as before to
$$8\pi |u(0)|^2 + 2 \int_{\mathbf{R}^3} \frac{|\nabla_{ang} u(x)|^2}{|x|}\, dx.$$
On the other hand, integrating by parts using (6.14), this expression also
evaluates to
$$2\, \mathrm{Re} \int_{\mathbf{R}^3} (\partial_r (f + i\varepsilon u))\, \overline{u}\, dx.$$

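Integrating by parts and using Cauchy-Schwarz and the normalisation on f
(and also Hardy's inequality), we thus see that
$$|u(0)|^2 + \int_{\mathbf{R}^3} \frac{|\nabla_{ang} u(x)|^2}{|x|}\, dx \lesssim \|u\|_{H^{0,-3/2-\sigma}} + \|\partial_r u\|_{H^{0,-1/2-\sigma}} + \varepsilon \|u\|_{L^2} \|\nabla u\|_{L^2}.$$
A slight modification of this argument, replacing the operator $\partial_r$ with the
smoothed out variant
$$\Big( \frac{r}{\langle r \rangle} - \frac{r}{\langle r \rangle^{1+2\sigma}} \Big) \partial_r$$
yields (after a tedious computation)
$$\int_{\mathbf{R}^3} \frac{|u(x)|^2}{\langle x \rangle^{3+2\sigma}} + \frac{|\nabla u(x)|^2}{\langle x \rangle^{1+2\sigma}}\, dx \lesssim \|u\|_{H^{0,-3/2-\sigma}} + \|\partial_r u\|_{H^{0,-1/2-\sigma}} + \varepsilon \|u\|_{L^2} \|\nabla u\|_{L^2}.$$
The left-hand side is $\|u\|_{H^{0,-3/2-\sigma}}^2 + \|\nabla u\|_{H^{0,-1/2-\sigma}}^2$; we can thus absorb the
first two terms of the right-hand side onto the left-hand side, leaving one
with
$$\|u\|_{H^{0,-3/2-\sigma}}^2 + \|\nabla u\|_{H^{0,-1/2-\sigma}}^2 \lesssim \varepsilon \|u\|_{L^2} \|\nabla u\|_{L^2}.$$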
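On the other hand, by taking the inner product of (6.14) against iu, taking
real parts, and using the self-adjointness of $\Delta + \lambda$, one has
$$0 = -\mathrm{Re} \int_{\mathbf{R}^3} f\, \overline{iu}\, dx - \varepsilon \int_{\mathbf{R}^3} |u|^2\, dx$$
and hence by Cauchy-Schwarz and the normalisation of f
$$\varepsilon \|u\|_{L^2}^2 \leq \|u\|_{H^{0,-1/2-\sigma}}.$$
Elliptic regularity estimates using (6.14) (together with the hypothesis that
$\lambda$ is comparable to 1) also show that
$$\|u\|_{H^{0,-1/2-\sigma}} \lesssim \|\nabla u\|_{H^{0,-1/2-\sigma}} + 1$$
and
$$\|\nabla u\|_{L^2} \lesssim \|u\|_{L^2} + 1;$$
putting all these estimates together, we obtain
$$\|u\|_{H^{0,-1/2-\sigma}} \lesssim 1$$
as required.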


Remark 6.1.3. In applications it is worth noting some additional estimates
that can be obtained by variants of the above method (i.e. lots of integration
by parts and Cauchy-Schwarz). From the Pohozaev-Morawetz identity, for
instance, we can show some additional decay for the angular derivative:
$$\| |\nabla_{ang} u| \|_{H^{0,-1/2}} \lesssim \|f\|_{H^{0,1/2+\sigma}}.$$
With the positive sign $\pm = +$, we also have the Sommerfeld type outward
radiation condition
$$\|\partial_r u - i\sqrt{\lambda}\, u\|_{H^{0,-1/2+\sigma}} \lesssim \|f\|_{H^{0,1/2+\sigma}}$$
if $\sigma > 0$ is small enough. For the negative sign $\pm = -$, we have the inward
radiating condition
$$\|\partial_r u + i\sqrt{\lambda}\, u\|_{H^{0,-1/2+\sigma}} \lesssim \|f\|_{H^{0,1/2+\sigma}}.$$

6.1.3. Spectral applications. The limiting absorption principle can be
used to deduce various basic facts about the spectrum of the Laplacian. For
instance:
Proposition 6.1.4 (Purely absolutely continuous spectrum). As an operator
on $L^2(\mathbf{R}^3)$, $-\Delta$ has only purely absolutely continuous spectrum on any
compact subinterval [a, b] of $(0, +\infty)$.
Proof. (Sketch) By density, it suffices to show that for any test function
$f \in C_0^\infty(\mathbf{R}^3)$, the spectral measure $\mu_f$ of $-\Delta$ relative to f is purely absolutely
continuous on [a, b]. In view of (6.5), we have
$$\mu_f = \lim_{\varepsilon \to 0^+} \frac{1}{\pi}\, \mathrm{Im} \langle R(\cdot + i\varepsilon) f, f \rangle$$
in the sense of distributions, so from Fatou's lemma it suffices to show that
$\mathrm{Im} \langle R(\lambda + i\varepsilon) f, f \rangle$ is uniformly bounded for $\lambda \in [a, b]$, uniformly in $\varepsilon$. But this
follows from the limiting absorption principle and Cauchy-Schwarz.

Remark 6.1.5. The Laplacian $-\Delta$ also has no (point) spectrum at zero
or negative energies, but this cannot be shown purely from the limiting
absorption principle; if one allows a non-zero potential, then the limiting
absorption principle holds (assuming suitable short-range hypotheses on
the potential) but (as is well known in quantum mechanics) one can have
eigenvalues (bound states) at zero or negative energies.
6.1.4. Local smoothing. Another key application of the limiting absorption
principle is to obtain local smoothing estimates for both the Schrödinger
and wave equations. Here is an instance of local smoothing for the Schrödinger
equation:
Theorem 6.1.6 (Homogeneous local smoothing for Schrödinger). If
$f \in L^2(\mathbf{R}^3)$, and $u : \mathbf{R} \times \mathbf{R}^3 \to \mathbf{C}$ is the (tempered distributional) solution to the
homogeneous Schrödinger equation $i u_t + \Delta u = 0$, $u(0) = f$ (or equivalently,
$u(t) = e^{it\Delta} f$), then one has
$$\| |\nabla|^{1/2} u \|_{L^2_t H^{0,-1/2-\sigma}(\mathbf{R} \times \mathbf{R}^3)} \lesssim \|f\|_{L^2(\mathbf{R}^3)}$$
for any fixed $\sigma > 0$.
The $|\nabla|^{1/2}$ factor in this estimate is the "smoothing" part of the local
smoothing estimate, while the negative weight $-1/2 - \sigma$ is the "local" part.
There is also a version of this local smoothing estimate for the inhomogeneous
Schrödinger equation $i u_t + \Delta u = F$ which is in fact essentially equivalent to the limiting absorption principle (as observed by Kato, http://www.ams.org/mathscinetgetitem?mr=190801), which we will not give here.
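Proof. We begin by using the $TT^*$ method. By duality, the claim is equivalent to
$$\Big\| |\nabla|^{1/2} \int_{\mathbf{R}} e^{-it\Delta} F(t)\, dt \Big\|_{L^2(\mathbf{R}^3)} \lesssim \|F\|_{L^2_t H^{0,1/2+\sigma}(\mathbf{R} \times \mathbf{R}^3)}$$
which by squaring is equivalent to
$$\Big\| |\nabla| \int_{\mathbf{R}} e^{i(t-t')\Delta} F(t')\, dt' \Big\|_{L^2_t H^{0,-1/2-\sigma}(\mathbf{R} \times \mathbf{R}^3)} \lesssim \|F\|_{L^2_t H^{0,1/2+\sigma}(\mathbf{R} \times \mathbf{R}^3)}. \tag{6.15}$$
From (6.5) one has (formally, at least)
$$e^{it\Delta} = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_{\mathbf{R}} (\mathrm{Im}\, R(y + i\varepsilon))\, e^{-ity}\, dy.$$
Because $-\Delta$ only has spectrum on the positive real axis, $\mathrm{Im}\, R(y + i0)$ vanishes on the negative real axis, and so (after carefully dealing with the contribution near the zero energy) one has
$$e^{it\Delta} = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_0^\infty (\mathrm{Im}\, R(y + i\varepsilon))\, e^{-ity}\, dy.$$
Taking the time-Fourier transform
$$\hat{F}(y) := \int_{\mathbf{R}} e^{ity} F(t)\, dt$$
we thus have
$$\int_{\mathbf{R}} e^{i(t-t')\Delta} F(t')\, dt' = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_{\mathbf{R}} e^{-ity} (\mathrm{Im}\, R(y + i\varepsilon))\, \hat{F}(y)\, dy.$$
Applying Plancherel's theorem and Fatou's lemma (and commuting the $L^2_t$
and $H^{0,-1/2-\sigma}_x$ norms), we can bound the LHS of (6.15) by
$$\lesssim \| |\nabla| (\mathrm{Im}\, R(y + i\varepsilon))\, \hat{F}(y) \|_{L^2_y H^{0,-1/2-\sigma}(\mathbf{R} \times \mathbf{R}^3)}$$
while the right-hand side is comparable to
$$\| \hat{F}(y) \|_{L^2_y H^{0,1/2+\sigma}(\mathbf{R} \times \mathbf{R}^3)}.$$
The claim now follows from the limiting absorption principle (and elliptic
regularity).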

Remark 6.1.7. The above estimate was proven by taking a Fourier transform
in time, and then applying the limiting absorption principle, which
was in turn proven by using the positive commutator method. An equivalent
way to proceed is to establish the local smoothing estimate directly
by the analogue of the positive commutator method for Schrödinger flows,
namely the Morawetz multiplier method in which one contracts the stress-energy
tensor (or variants thereof) against well-chosen vector fields, and integrates
by parts.
An analogous claim holds for solutions to the wave equation
$$-\partial_t^2 u + \Delta u = 0$$
with initial data $u(0) = u_0$, $\partial_t u(0) = u_1$, with the relevant estimate being
that
$$\| |\nabla_{t,x} u| \|_{L^2_t H^{0,-1/2-\sigma}(\mathbf{R} \times \mathbf{R}^3)} \lesssim \|u_0\|_{H^1(\mathbf{R}^3)} + \|u_1\|_{L^2(\mathbf{R}^3)}.$$
As before, this estimate can also be proven directly using the Morawetz
multiplier method.
6.1.5. The RAGE theorem. Another consequence of limiting absorption,
closely related both to absolutely continuous spectrum and to local
smoothing, is the RAGE theorem (named after Ruelle [Ru1969], Amrein-Georgescu
[AmGe1973], and Enss [En1978]), specialised to the free Schrödinger
equation:
Theorem 6.1.8 (RAGE for Schrödinger). If $f \in L^2(\mathbf{R}^3)$, and K is a compact
subset of $\mathbf{R}^3$, then $\|e^{it\Delta} f\|_{L^2(K)} \to 0$ as $t \to \pm\infty$.
Proof. By a density argument we may assume that f lies in, say, $H^2(\mathbf{R}^3)$.
Then $e^{it\Delta} f$ is uniformly bounded in $H^2(\mathbf{R}^3)$, and is Lipschitz in time in the
$L^2(\mathbf{R}^3)$ (and hence $L^2(K)$) norm. On the other hand, from local smoothing
we know that $\int_T^{T+1} \|e^{it\Delta} f\|_{L^2(K)}\, dt$ goes to zero as $T \to \pm\infty$. Putting the
two facts together we obtain the claim.

Remark 6.1.9. One can also deduce this theorem from the fact that $-\Delta$
has purely absolutely continuous spectrum, using the abstract form of the
RAGE theorem due to the authors listed above (which can be thought of as
a Hilbert space-valued version of the Riemann-Lebesgue lemma).
There is also a similar RAGE theorem for the wave equation (with $L^2$
replaced by the energy space $H^1 \times L^2$) whose precise statement we omit
here.
6.1.6. The limiting amplitude principle. A close cousin to the limiting
absorption principle, which governs the limiting behaviour of the resolvent
as it approaches the spectrum, is the limiting amplitude principle, which
governs the asymptotic behaviour of a Schrödinger or wave equation with
oscillating forcing term. We give this principle for the Schrödinger equation
(the case for the wave equation is analogous):
Theorem 6.1.10 (Limiting amplitude principle). Let $f \in L^2(\mathbf{R}^3)$ be compactly
supported, let $\mu > 0$, and let u be a solution to the forced Schrödinger
equation $i u_t + \Delta u = e^{-i\mu t} f$ which lies in $L^2(\mathbf{R}^3)$ at time zero. Then for any
compact set K, $e^{i\mu T} u(T)$ converges in $L^2(K)$ as $T \to +\infty$ to v, the solution to
the Helmholtz equation $\Delta v + \mu v = f$ obeying the outgoing radiation condition
(6.7).
Proof. (Sketch) By subtracting off the free solution $e^{it\Delta} u(0)$ (which decays
in $L^2(K)$ by the RAGE theorem), we may assume that u(0) = 0. From the
Duhamel formula we then have
$$u(T) = -i \int_0^T e^{i(T-t)\Delta} e^{-i\mu t} f\, dt$$
and thus (after changing variables from t to T - t)
$$e^{i\mu T} u(T) = -i \int_0^T e^{it(\Delta + \mu)} f\, dt.$$
We write the right-hand side as
$$-i \lim_{\varepsilon \to 0^+} \int_0^T e^{it(\Delta + \mu + i\varepsilon)} f\, dt.$$
From the limiting absorption principle, the integral $-i \int_0^\infty e^{it(\Delta + \mu + i\varepsilon)} f\, dt$
converges to v, and so it suffices to show that the expression
$$\lim_{\varepsilon \to 0^+} \int_T^\infty e^{it(\Delta + \mu + i\varepsilon)} f\, dt$$
converges to zero as $T \to +\infty$ in $L^2(K)$ norm. Evaluating the integral, we
are left with showing that
$$\lim_{\varepsilon \to 0^+} e^{iT\Delta} R(\mu + i\varepsilon) f$$
converges to zero as $T \to +\infty$ in $L^2(K)$ norm.

By using contour integration, one can write
$$\lim_{\varepsilon \to 0^+} e^{iT\Delta} R(\mu + i\varepsilon) f = \frac{1}{2\pi i} \lim_{\varepsilon \to 0^+} \lim_{\varepsilon' \to 0^+} \int_{\mathbf{R}} \frac{e^{-iTx}}{x - \mu - i\varepsilon}\, R(x + i\varepsilon') f\, dx.$$
On the other hand, from the explicit solution for the resolvent (and the
compact support of f), $R(x + i\varepsilon') f$ can be shown to vary in a Hölder continuous
fashion on x in the $L^2(K)$ norm (uniformly in $x, \varepsilon'$), and to decay
at a polynomial rate as $x \to \pm\infty$. Since
$$\int_{\mathbf{R}} \frac{e^{-iTx}}{x - \mu - i\varepsilon}\, dx = 0$$
for T > 0, the required decay in $L^2(K)$ then follows from a routine calculation.

Remark 6.1.11. More abstractly, it was observed by Eidus [Ei1969] that
the limiting amplitude principle for a general Schrödinger or wave equation
can be deduced from the limiting absorption principle and a Hölder
continuity bound on the resolvent.

6.2. The shallow water wave equation, and the propagation of tsunamis
Tsunamis are water waves that start in the deep ocean, usually because of
an underwater earthquake (though tsunamis can also be caused by underwater landslides or volcanoes), and then propagate towards shore. Initially,
tsunamis have relatively small amplitude (a metre or so is typical), which
would seem to render them as harmless as wind waves. And indeed, tsunamis
often pass by ships in deep ocean without anyone on board even noticing.
However, being generated by an event as large as an earthquake, the
wavelength of the tsunami is huge - 200 kilometres is typical (in contrast
with wind waves, whose wavelengths are typically closer to 100 metres). In
particular, the wavelength of the tsunami is far greater than the depth of
the ocean (which is typically 2-3 kilometres). As such, even in the deep
ocean, the dynamics of tsunamis are essentially governed by the shallow
water equations. One consequence of these equations is that the speed of
propagation v of a tsunami can be approximated by the formula
$$v \approx \sqrt{g b} \tag{6.16}$$
where b is the depth of the ocean, and $g \approx 9.8\, \mathrm{m\, s^{-2}}$ is the acceleration
due to gravity. As such, tsunamis in deep water move very fast - speeds such as 500
kilometres per hour (300 miles per hour) are quite typical; enough to travel
from Japan to the US, for instance, in less than a day. Ultimately, this is
due to the incompressibility of water (and conservation of mass); the massive
net pressure (or more precisely, spatial variations in this pressure) of a
very broad and deep wave of water forces the profile of the wave to move
horizontally at vast speeds. (Note though that the speed v here is the phase
velocity of the tsunami wave, and not the velocity of the water molecules
themselves, which are far slower.)
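As a quick numerical illustration of the velocity formula (6.16), the following sketch evaluates $v \approx \sqrt{gb}$ at a few depths. The function names and the sample depths are illustrative assumptions, not taken from the text; only the formula and the value of g are.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2, the value quoted in the text


def shallow_water_speed(depth_m: float) -> float:
    """Phase speed v = sqrt(g * b) from (6.16), in metres per second."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    return math.sqrt(G * depth_m)


def to_km_per_hour(speed_m_per_s: float) -> float:
    """Convert metres per second to kilometres per hour."""
    return speed_m_per_s * 3.6


if __name__ == "__main__":
    # Illustrative depths in metres (assumed values, deep ocean to near shore).
    for depth in (4000.0, 2000.0, 200.0, 20.0):
        v = shallow_water_speed(depth)
        print(f"depth {depth:6.0f} m: v = {v:6.1f} m/s = {to_km_per_hour(v):5.0f} km/h")
```

At a depth of 2 kilometres this gives 140 metres per second, or roughly 500 kilometres per hour, in line with the typical figure quoted above.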
As the tsunami approaches shore, the depth b of course decreases, causing
the tsunami to slow down, at a rate proportional to the square root
of the depth, as per (6.16). Unfortunately, wave shoaling then forces the
amplitude A to increase at an inverse rate governed by Green's law,
$$A \propto \frac{1}{b^{1/4}} \tag{6.17}$$
at least until the amplitude becomes comparable to the water depth (at
which point the assumptions that underlie the above approximate results
break down; also, in two (horizontal) spatial dimensions there will be some
decay of amplitude as the tsunami spreads outwards). If one starts with
a tsunami whose initial amplitude was $A_0$ at depth $b_0$ and computes the
point at which the amplitude A and depth b become comparable using the
proportionality relationship (6.17), some high school algebra then reveals
that at this point, the amplitude of a tsunami (and the depth of the water) is
about $A_0^{4/5} b_0^{1/5}$. Thus, for instance, a tsunami with initial amplitude of one
metre at a depth of 2 kilometres can end up with a final amplitude of about
5 metres near shore, while still traveling at about ten metres per second (35
kilometres per hour, or 22 miles per hour), which can lead to a devastating
impact when it hits shore.
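The "high school algebra" can be checked numerically. The sketch below assumes Green's law (6.17) in the normalised form $A = A_0 (b_0/b)^{1/4}$ and solves $A = b$, which gives $b = A_0^{4/5} b_0^{1/5}$; the function names are hypothetical, and the speed is computed from (6.16) even though that formula is only heuristic once amplitude and depth are comparable.

```python
import math

G = 9.8  # gravitational acceleration in m/s^2


def amplitude_at_depth(a0: float, b0: float, b: float) -> float:
    """Green's law (6.17), normalised so that the amplitude is a0 at depth b0."""
    return a0 * (b0 / b) ** 0.25


def breaking_depth(a0: float, b0: float) -> float:
    """Depth at which amplitude equals depth: solve a0 * (b0 / b)**(1/4) = b."""
    return a0 ** 0.8 * b0 ** 0.2


if __name__ == "__main__":
    a0, b0 = 1.0, 2000.0  # one metre of amplitude at two kilometres of depth
    b_star = breaking_depth(a0, b0)
    a_star = amplitude_at_depth(a0, b0, b_star)
    speed = math.sqrt(G * b_star)
    print(f"depth where A = b : {b_star:.2f} m")
    print(f"amplitude there   : {a_star:.2f} m")
    print(f"speed from (6.16) : {speed:.1f} m/s")
```

For the example in the text this gives an amplitude of roughly 4.6 metres and a speed of several metres per second, the same order of magnitude as the rounded figures quoted above.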
While tsunamis are far too massive of an event to be able to control
(at least in the deep ocean), we can at least model them mathematically,
allowing one to predict their impact at various places along the coast with
high accuracy. The full equations and numerical methods used to perform
such models are somewhat sophisticated, but by making a large number of
simplifying assumptions, it is relatively easy to come up with a rough model
that already predicts the basic features of tsunami propagation, such as the
velocity formula (6.16) and the amplitude proportionality law (6.17). I give
this (standard) derivation below. The argument will largely be heuristic in
nature; there are very interesting analytic issues in actually justifying many
of the steps below rigorously, but I will not discuss these matters here.
6.2.1. The shallow water wave equation. The ocean is, of course, a
three-dimensional fluid, but to simplify the analysis we will consider a
two-dimensional model in which the only spatial variables are the horizontal
variable x and the vertical variable z, with z = 0 being equilibrium sea
level. We model the ocean floor by a curve
$$z = -b(x),$$
thus b measures the depth of the ocean at position x. At any time t and
position x, the height of the water (compared to sea level z = 0) will be
given by an unknown height function h(t, x); thus, at any time t, the ocean
occupies the region
$$\Omega_t := \{(x, z) : -b(x) < z < h(t, x)\}.$$
Now we model the motion of water inside the ocean by assigning at each
time t and each point $(x, z) \in \Omega_t$ in the ocean, a velocity vector
$$\vec{u}(t, x, z) = (u_x(t, x, z), u_z(t, x, z)).$$
We make the basic assumption of incompressibility, so that the density $\rho$
of water is constant throughout $\Omega_t$.
The velocity changes over time according to Newton's second law F =
ma. To apply this law to fluids, we consider an infinitesimal amount of
water as it flows along the velocity field $\vec{u}$. Thus, at time t, we assume that
this amount of water occupies some infinitesimal area dA and some position
$\vec{x}(t) = (x(t), z(t))$, where we have
$$\frac{d}{dt} \vec{x}(t) = \vec{u}(t, \vec{x}(t)).$$
Because of incompressibility, the area dA stays constant, and the mass of
this infinitesimal portion of water is $m = \rho\, dA$. There will be two forces on
this body of water: the force of gravity, which is $(0, -mg) = (0, -\rho g)\, dA$, and
the force of the pressure field p(t, x, z), which is given by $-\nabla p\, dA$. At the
length and time scales of a tsunami, we can neglect the effect of other forces
such as viscosity or surface tension. Newton's law $m \frac{d\vec{u}}{dt} = F$ then gives
$$m \frac{d}{dt} \vec{u}(t, \vec{x}(t)) = -\nabla p\, dA + (0, -mg)$$
which simplifies to the incompressible Euler equation
$$\frac{\partial}{\partial t} \vec{u} + (\vec{u} \cdot \nabla) \vec{u} = -\frac{1}{\rho} \nabla p + (0, -g).$$
At present, the pressure is not given. However, we can simplify things by
making the assumption of (vertical) hydrostatic equilibrium, i.e. the vertical
effect $-\frac{1}{\rho} \frac{\partial}{\partial z} p$ of pressure cancels out the effect $-g$ of gravity. We also assume
that the pressure is zero on the surface z = h(t, x) of the water. Together,
these two assumptions force the pressure to be the hydrostatic pressure
$$p = \rho g (h(t, x) - z). \tag{6.18}$$
This reflects the intuitively plausible fact that the pressure at a point under
the ocean should be determined by the weight of the water above that point.
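To make the step to the hydrostatic pressure explicit, here is a short worked derivation under the two stated assumptions (vertical balance between pressure and gravity, and zero pressure at the surface). The integration variable $z'$ is introduced only for this sketch.

```latex
\begin{align*}
-\frac{1}{\rho}\frac{\partial p}{\partial z} - g = 0
  \quad &\Longrightarrow \quad \frac{\partial p}{\partial z} = -\rho g, \\
p(t, x, h(t,x)) = 0
  \quad &\Longrightarrow \quad
  p(t, x, z) = \int_z^{h(t,x)} \rho g \, dz' = \rho g\,(h(t,x) - z), \\
-\frac{1}{\rho}\frac{\partial p}{\partial x}
  &= -g\,\frac{\partial h}{\partial x}.
\end{align*}
```

The last line is the horizontal pressure force per unit mass; it no longer depends on z, which is consistent with the depth-independent velocity ansatz used in the shallow water approximation.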

The incompressible Euler equation now simplifies to
$$\frac{\partial}{\partial t} \vec{u} + (\vec{u} \cdot \nabla) \vec{u} = -g \Big( \frac{\partial}{\partial x} h, 0 \Big). \tag{6.19}$$
We next make the shallow water approximation that the wavelength of
the wave is far greater than the depth of the water. In particular, we do not
expect significant changes in the velocity field in the z variable, and thus
make the ansatz
$$\vec{u}(t, x, z) \approx \vec{u}(t, x). \tag{6.20}$$
(This ansatz should be taken with a grain of salt, particularly when applied
to the z component $u_z$ of the velocity, which does actually have to fluctuate
a little bit to accommodate changes in ocean depth and in the height function.
However, the primary component of the velocity is the horizontal component
$u_x$, and this does behave in a fairly vertically insensitive fashion in actual
tsunamis.)
Taking the x component of (6.19), and abbreviating $u_x$ as u, we obtain
the first shallow water wave equation
$$\frac{\partial}{\partial t} u + u \frac{\partial}{\partial x} u = -g \frac{\partial}{\partial x} h. \tag{6.21}$$
The next step is to play off the incompressibility of water against the
finite depth of the ocean. Consider an infinitesimal slice
$$\{(x, z) \in \Omega_t : x_0 \leq x \leq x_0 + dx\}$$
of the ocean at some time t and position $x_0$. The total mass of this slice is
roughly
$$\rho (h(t, x_0) + b(x_0))\, dx$$
and so the rate of change of mass of this slice over time is
$$\rho \frac{\partial h}{\partial t}(t, x_0)\, dx.$$
On the other hand, the rate of mass entering this slice on the left $x = x_0$ is
$$\rho\, u(t, x_0) (h(t, x_0) + b(x_0))$$
and the rate of mass exiting on the right $x = x_0 + dx$ is
$$\rho\, u(t, x_0 + dx) (h(t, x_0 + dx) + b(x_0 + dx)).$$
Putting these three facts together, we obtain the equation
$$\rho \frac{\partial h}{\partial t}(t, x_0)\, dx = \rho\, u(t, x_0) (h(t, x_0) + b(x_0)) - \rho\, u(t, x_0 + dx) (h(t, x_0 + dx) + b(x_0 + dx))$$
which simplifies after Taylor expansion to the second shallow water wave
equation
$$\frac{\partial}{\partial t} h + \frac{\partial}{\partial x} (u (h + b)) = 0. \tag{6.22}$$
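The Taylor expansion step can be written out as follows. The abbreviation $q := u(h+b)$ for the horizontal volume flux is introduced here purely for convenience and does not appear in the text.

```latex
\begin{align*}
q(t,x) &:= u(t,x)\,\bigl(h(t,x) + b(x)\bigr), \\
\rho\,\frac{\partial h}{\partial t}(t,x_0)\,dx
  &= \rho\, q(t,x_0) - \rho\, q(t,x_0 + dx) \\
  &= -\rho\,\frac{\partial q}{\partial x}(t,x_0)\,dx + O(dx^2), \\
\frac{\partial h}{\partial t} + \frac{\partial}{\partial x}\bigl(u(h+b)\bigr) &= 0 .
\end{align*}
```

The last line follows on dividing by $\rho\,dx$ and sending $dx \to 0$; the constant density drops out, as one would expect for an incompressible fluid.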
Remark 6.2.1. Another way to derive (6.22) is to use a more familiar form
of the incompressibility, namely the divergence-free equation
$$\frac{\partial}{\partial x} u_x + \frac{\partial}{\partial z} u_z = 0. \tag{6.23}$$
(Here we will refrain from applying (6.20) to the vertical component of the
velocity $u_z$, as the approximation (6.20) is not particularly accurate for this
component.) Also, by considering the trajectory of a particle (x(t), h(t, x(t)))
at the surface of the ocean, we have the formulae
$$\frac{d}{dt} x(t) = u_x(t, x(t), h(t, x(t)))$$
and
$$\frac{d}{dt} h(t, x(t)) = u_z(t, x(t), h(t, x(t)))$$
which after application of the chain rule gives the equation
$$\frac{\partial}{\partial t} h(t, x) + \Big( \frac{\partial}{\partial x} h(t, x) \Big) u_x(t, x, h(t, x)) = u_z(t, x, h(t, x)). \tag{6.24}$$
A similar analysis at the ocean floor (which does not vary in time) gives
$$-\Big( \frac{\partial}{\partial x} b(x) \Big) u_x(t, x, -b(x)) = u_z(t, x, -b(x)). \tag{6.25}$$
We apply these equations to the evaluation of the expression
$$\frac{\partial}{\partial x} \int_{-b(x)}^{h(t,x)} u_x(t, x, z)\, dz,$$
which is the spatial rate of change of the velocity flux through a vertical
slice of the ocean. On the one hand, using the ansatz (6.20), we expect this
expression to be approximately
$$\frac{\partial}{\partial x} (u (h + b)).$$
On the other hand, by differentiation under the integral sign, we can evaluate
this expression instead as
$$\int_{-b(x)}^{h(t,x)} \frac{\partial}{\partial x} u_x(t, x, z)\, dz + \Big( \frac{\partial}{\partial x} h(t, x) \Big) u_x(t, x, h(t, x)) + \Big( \frac{\partial}{\partial x} b(x) \Big) u_x(t, x, -b(x)).$$
If we then substitute in (6.23), (6.24), (6.25) and apply the fundamental
theorem of calculus, one ends up with $-\frac{\partial}{\partial t} h(t, x)$, and the claim (6.22) follows.
The equations (6.21), (6.22) are nonlinear in the unknowns u, h. However,
one can approximately linearise them by making the hypothesis that
the amplitude of the wave is small compared to the depth of the water:
$$|h| \ll b. \tag{6.26}$$
This hypothesis is fairly accurate for tsunamis in the deep ocean, and even
for medium depths, but of course is not reasonable once the tsunami has
reached shore (where the dynamics are far more difficult to model).
The hypothesis (6.26) already simplifies (6.22) to (approximately)
$$\frac{\partial}{\partial t} h + \frac{\partial}{\partial x} (u b) = 0. \tag{6.27}$$
As for (6.21), we argue that the second term on the left-hand side is negligible, leading to
$$\frac{\partial}{\partial t} u = -g \frac{\partial}{\partial x} h. \tag{6.28}$$
To explain heuristically why we expect this to be the case, let us make the
ansatz that h and u have amplitude A, V respectively, and propagate at
some phase velocity v and wavelength $\lambda$; let us also make the (reasonable)
assumption that b varies much slower in space than u does (i.e. that b is
roughly constant at the scale of the wavelength $\lambda$), so we may (for a first
approximation) replace $\frac{\partial}{\partial x}(ub)$ by $b \frac{\partial}{\partial x} u$. Heuristically, we then have
$$\frac{\partial}{\partial x} u = O(V/\lambda), \quad \frac{\partial}{\partial x} h = O(A/\lambda), \quad \frac{\partial}{\partial t} u = O(vV/\lambda), \quad \frac{\partial}{\partial t} h = O(vA/\lambda)$$
and equation (6.27) then suggests
$$vA/\lambda \sim Vb/\lambda. \tag{6.29}$$
From (6.26) we expect $A \ll b$, and thus $v \gg V$; the wave propagates
much faster than the velocity of the fluid. In particular, we expect $u \frac{\partial}{\partial x} u = O(V^2/\lambda)$
to be much smaller than $\frac{\partial}{\partial t} u = O(vV/\lambda)$, which explains why we
expect to drop the second term in (6.21) to obtain (6.28).

If we now insert the above ansatz into (6.28), we obtain
$$vV/\lambda \sim gA/\lambda;$$
combining this with (6.29), we already get the velocity relationship (6.16).
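For completeness, the elimination of the amplitudes can be spelled out. This is a heuristic sketch at the level of orders of magnitude (hence the $\sim$ signs), using only (6.28) and (6.29) with A the amplitude of h and V the amplitude of u.

```latex
\begin{align*}
\frac{vA}{\lambda} \sim \frac{Vb}{\lambda}
  \quad &\Longrightarrow \quad V \sim \frac{vA}{b}, \\
\frac{vV}{\lambda} \sim \frac{gA}{\lambda}
  \quad &\Longrightarrow \quad v \cdot \frac{vA}{b} \sim gA
  \quad \Longrightarrow \quad v^2 \sim g b, \qquad v \sim \sqrt{g b}.
\end{align*}
```

Both the wave amplitude A and the wavelength $\lambda$ cancel, so at this level of approximation the speed depends only on g and the depth b.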
Remark 6.2.2. One can also obtain (6.16) more quickly (up to a loss of
a constant factor) by dimensional analysis, together with some additional
physical arguments. Indeed, it is clear from a superficial scan of the above
discussion that the velocity v is only going to depend on the quantities
$\rho, g, b, A, V, \lambda$. As the density $\rho$ is the only input that involves mass in its
units, dimensional analysis already rules out any role for $\rho$. As we are in the
small amplitude regime (6.26), we expect the dynamics to be linearised, and
thus not dependent on amplitude; this rules out A (and similarly V, which
is the amplitude of the velocity field, and which is negligible when compared
against the phase velocity v). Finally, in the long wavelength regime $\lambda \gg b$,
we expect the wavelength to be physically undetectable at local scales (it
requires not only knowledge of the slope of the height function at one's
location, but also the second derivative of that function (i.e. the curvature
of the ocean surface), which is lower order). So we rule out dependence on
$\lambda$ also, leaving only g and b, and at this point dimensional analysis forces
the relationship (6.16) up to constants. (Unfortunately, I do not know of an
analogous dimensional analysis argument that gives (6.17).)
To get the relation (6.17), we have to analyse the ansatz a bit more
carefully. First, we combine (6.28) and (6.27) into a single equation for the
height function h. Indeed, differentiating (6.27) in time and then substituting in (6.28) and (6.16) gives
$$\frac{\partial^2}{\partial t^2} h - \frac{\partial}{\partial x} \Big( v^2 \frac{\partial}{\partial x} h \Big) = 0.$$
To solve this wave equation, we use a standard sinusoidal ansatz
$$h(t, x) = A(t, x) \sin(\phi(t, x)/\varepsilon)$$
where $A, \phi$ are slowly varying functions, and $\varepsilon > 0$ is a small parameter.
Inserting this ansatz and extracting the top order terms in $\varepsilon$, we conclude
the eikonal equation
$$\phi_t^2 - v^2 \phi_x^2 = 0$$
and the Hamilton-Jacobi equation
$$2 A_t \phi_t + A \phi_{tt} - v^2 (2 A_x \phi_x + A \phi_{xx}) - 2 v v_x A \phi_x = 0.$$
From the eikonal equation we see that $\phi$ propagates at speed v. Assuming
rightward propagation, we thus have
$$\phi_t = -v \phi_x. \tag{6.30}$$
As for the Hamilton-Jacobi equation, we solve it using the method of characteristics. Multiplying the equation by A, we obtain
$$(A^2 \phi_t)_t - v^2 (A^2 \phi_x)_x - 2 v v_x A^2 \phi_x = 0.$$
Inserting (6.30) and writing $F := A^2 \phi_x$, one obtains
$$-v F_t - v^2 F_x - 2 v v_x F = 0$$
which simplifies to
$$(\partial_t + v \partial_x)(v^2 F) = 0.$$
Thus we see that $v^2 F$ is constant along characteristics. On the other hand,
differentiating (6.30) in x we see (after some rearranging) that
$$(\partial_t + v \partial_x)(v \phi_x) = 0$$
so $v \phi_x$ is also constant along characteristics. Dividing, we see that $A^2 v$ is
constant along characteristics, leading to the proportionality relationship
$$A \propto \frac{1}{\sqrt{v}}$$
which gives (6.17).
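The two "simplifies to" steps, and the final passage to Green's law, can be verified directly. The sketch below uses only that v = v(x) does not depend on time, together with (6.16) and (6.30).

```latex
\begin{align*}
(\partial_t + v\,\partial_x)(v^2 F)
  &= v^2 F_t + v\,(2 v v_x F + v^2 F_x)
   = v\,\bigl(v F_t + v^2 F_x + 2 v v_x F\bigr) = 0, \\
(\partial_t + v\,\partial_x)(v \phi_x)
  &= v\,\phi_{xt} + v\,(v_x \phi_x + v \phi_{xx})
   = v\,\partial_x(\phi_t + v \phi_x) = 0, \\
\frac{v^2 F}{v \phi_x} &= \frac{v^2 A^2 \phi_x}{v \phi_x} = A^2 v
  \quad \text{(constant along characteristics)}, \\
A &\propto v^{-1/2} \approx (g b)^{-1/4} \propto b^{-1/4}.
\end{align*}
```

In particular the exponent 1/4 in Green's law is simply half of the exponent 1/2 in the velocity formula (6.16).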
Remark 6.2.3. It becomes difficult to retain the sinusoidal ansatz once the
amplitude exceeds the depth, as it leads to the absurd conclusion that the
troughs of the wave lie below the ocean floor. However, a remnant of this
effect can actually be seen in real-life tsunamis, namely that if the tsunami
starts with a trough rather than a crest, then the water at the shore draws
back at first (sometimes for hundreds of metres), before the crest of the
tsunami hits. As such, the sudden withdrawal of water from a shore is an
important warning sign of an imminent tsunami.

Chapter 7

Number theory

7.1. Hilbert's seventh problem, and powers of 2 and 3
Hilbert's seventh problem asks to determine the transcendence of powers $a^b$
of two algebraic numbers a, b. This problem was famously solved by Gelfond
and Schneider [Ge1934], [Sc1934]:
Theorem 7.1.1 (Gelfond-Schneider theorem). Let a, b be algebraic numbers,
with $a \neq 0, 1$ and b irrational. Then (any of the values of the possibly
multi-valued expression) $a^b$ is transcendental.
For sake of simplifying the discussion, let us focus on just one specific
consequence of this theorem:
Corollary 7.1.2. $\frac{\log 2}{\log 3}$ is transcendental.
Proof. If not, one could obtain a contradiction to the Gelfond-Schneider
theorem by setting a := 3 and $b := \frac{\log 2}{\log 3}$. (Note that $\frac{\log 2}{\log 3}$ is clearly irrational,
since $3^p \neq 2^q$ for any integers p, q with q positive.)

In a series of papers [Ba1966], [Ba1967], [Ba1967b], Alan Baker established
a major generalisation of the Gelfond-Schneider theorem known
as Baker's theorem, as part of his work in transcendence theory that later
earned him a Fields Medal. Among other things, this theorem provided
explicit quantitative bounds on exactly how transcendental quantities such
as $\frac{\log 2}{\log 3}$ were. In particular, it gave a strong bound on how irrational such
quantities were (i.e. how easily they were approximable by rationals). Here,
in particular, is one special case of Baker's theorem:

Proposition 7.1.3 (Special case of Baker's theorem). For any integers p, q
with q positive, one has
$$\Big| \frac{\log 2}{\log 3} - \frac{p}{q} \Big| \geq c \frac{1}{q^C}$$
for some absolute (and effectively computable) constants c, C > 0.
This theorem may be compared with (the easily proved) Liouville's theorem
on diophantine approximation, which asserts that if $\alpha$ is an irrational
algebraic number of degree d, then
$$\Big| \alpha - \frac{p}{q} \Big| \geq c \frac{1}{q^d}$$
for all p, q with q positive, and some effectively computable c > 0, and (the
more significantly difficult) Thue-Siegel-Roth theorem [Th1909, Si1921,
Ro1955], which under the same hypotheses gives the bound
$$\Big| \alpha - \frac{p}{q} \Big| \geq c \frac{1}{q^{2+\varepsilon}}$$
for all $\varepsilon > 0$, all p, q with q positive and an ineffective$^1$ constant c > 0.
Finally, one should compare these results against Dirichlet's theorem on
Diophantine approximation, which asserts that for any real number $\alpha$ one
has
$$\Big| \alpha - \frac{p}{q} \Big| < \frac{1}{q^2}$$
for infinitely many p, q with q positive.
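As a numerical illustration of Dirichlet's theorem for $\alpha = \log 2/\log 3$, the following sketch computes the first few continued fraction convergents p/q and compares the error with $1/q^2$. It works in double precision floating point, so only the first dozen or so convergents are trustworthy; the helper name `convergents` is an assumption of this sketch, and the output is of course no substitute for the lower bounds discussed above.

```python
import math
from fractions import Fraction

ALPHA = math.log(2) / math.log(3)


def convergents(x: float, count: int) -> list:
    """First `count` continued fraction convergents of x (floating-point based)."""
    result = []
    p_prev, q_prev = 1, 0            # standard seed values for the recurrence
    a = math.floor(x)
    p, q = a, 1
    result.append(Fraction(p, q))
    frac = x - a
    for _ in range(count - 1):
        if frac == 0:
            break                    # x was (numerically) rational
        x = 1.0 / frac
        a = math.floor(x)            # next partial quotient
        frac = x - a
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result


if __name__ == "__main__":
    for c in convergents(ALPHA, 10):
        err = abs(ALPHA - c.numerator / c.denominator)
        q = c.denominator
        print(f"p/q = {c.numerator}/{q}: error = {err:.3e}, q^2 * error = {q * q * err:.3f}")
```

Each convergent satisfies the Dirichlet bound $|\alpha - p/q| < 1/q^2$; the convergent 12/19, for instance, is the familiar fact that $2^{19}$ is close to $3^{12}$.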
Proposition 7.1.3 easily implies the following separation property between
powers of 2 and powers of 3:
Corollary 7.1.4 (Separation between powers of 2 and powers of 3). For
any positive integers p, q one has
$$|3^p - 2^q| \geq \frac{c}{q^C} 3^p$$
for some effectively computable constants c, C > 0 (which may be slightly
different from those in Proposition 7.1.3).
Indeed, this follows quickly from Proposition 7.1.3, the identity
$$3^p - 2^q = 3^p \Big( 1 - 3^{q (\frac{\log 2}{\log 3} - \frac{p}{q})} \Big) \tag{7.1}$$
and some elementary estimates.
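The identity (7.1) and the size of the gaps can be explored numerically. In the sketch below, the gap $|3^p - 2^q|$ is computed with exact integer arithmetic for the power of two nearest to $3^p$, and the right-hand side of (7.1) is evaluated in floating point as a sanity check; the function names are hypothetical, and the range of p is an arbitrary illustrative choice.

```python
import math

ALPHA = math.log(2) / math.log(3)


def nearest_power_of_two_gap(p: int) -> tuple:
    """Return (q, |3^p - 2^q|) for the q >= 1 minimising the gap (p >= 1)."""
    target = 3 ** p
    q_low = target.bit_length() - 1          # 2^q_low <= 3^p < 2^(q_low + 1)
    return min(
        ((q, abs(target - 2 ** q)) for q in (q_low, q_low + 1)),
        key=lambda pair: pair[1],
    )


def identity_rhs(p: int, q: int) -> float:
    """Right-hand side of (7.1), 3^p (1 - 3^(q (alpha - p/q))), in floating point."""
    return 3.0 ** p * (1.0 - 3.0 ** (q * (ALPHA - p / q)))


if __name__ == "__main__":
    for p in range(1, 21):
        q, gap = nearest_power_of_two_gap(p)
        ratio = gap / 3 ** p
        print(f"p = {p:2d}, q = {q:2d}: |3^p - 2^q| = {gap}, relative gap = {ratio:.4f}")
```

For p = 12 the nearest power of two is $2^{19}$, with gap 7153, which is a little over one percent of $3^{12}$; the relative gaps in this range never become tiny, in keeping with the polynomial lower bound of Corollary 7.1.4.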


In particular, the gap between powers of three $3^p$ and powers of two $2^q$
grows exponentially in the exponents p, q. I do not know of any other way
to establish this fact other than essentially going through some version of
Baker's argument (which will be given below).
$^1$ The reason the Thue-Siegel-Roth theorem is ineffective is because it relies heavily on the
"dueling conspiracies" argument [Ta2010b, §1.12], i.e. playing off multiple "conspiracies" $\alpha \approx \frac{p}{q}$
against each other; the other results however only focus on one approximation at a time and thus
avoid ineffectivity.
For comparison, by exploiting the trivial (yet fundamental) integrality
gap - the obvious fact that if an integer n is non-zero, then its magnitude is
at least 1 - we have the trivial bound
$$|3^p - 2^q| \geq 1$$
for all positive integers p, q (since, from the fundamental theorem of arithmetic,
$3^p - 2^q$ cannot vanish). Putting this into (7.1) we obtain a very weak
version of Proposition 7.1.3, that only gives exponential bounds instead of
polynomial ones:
Proposition 7.1.5 (Trivial bound). For any integers p, q with q positive,
one has
$$\Big| \frac{\log 2}{\log 3} - \frac{p}{q} \Big| \geq c \frac{1}{2^q}$$
for some absolute (and effectively computable) constant c > 0.
The proof of Baker's theorem (or even of the simpler special case in
Proposition 7.1.3) is largely elementary (except for some very basic complex
analysis), but is quite intricate and lengthy, as a lot of careful book-keeping
is necessary in order to get a bound as strong as that in Proposition 7.1.3. To
illustrate the main ideas, I will prove a bound that is weaker than Proposition
7.1.3, but still significantly stronger than Proposition 7.1.5, and whose proof
already captures many of the key ideas of Baker:
Proposition 7.1.6 (Weak special case of Baker's theorem). For any integers p, q with q > 1, one has
$$\Big| \frac{\log 2}{\log 3} - \frac{p}{q} \Big| \geq \exp(-C \log^{C'} q)$$
for some absolute constants C, C' > 0.
Note that Proposition 7.1.3 is equivalent to the assertion that one can
take C' = 1 (and C effective) in the above proposition.
The proof of Proposition 7.1.6 can be made effective (for instance, it is
not too difficult to make the C' close to 2); however, in order to simplify the
exposition (and in particular, to be able to use some nonstandard analysis
terminology to reduce the epsilon management, cf. [Ta2008, §1.5]), I will
establish Proposition 7.1.6 with ineffective constants C, C'.
Like many other results in transcendence theory, the proof of Baker's
theorem (and Proposition 7.1.6) relies on what we would nowadays call the
polynomial method - to play off upper and lower bounds on the complexity
of polynomials that vanish (or nearly vanish) to high order on a specified
set of points. In the specific case of Proposition 7.1.6, the points in question
are of the form
$$\Gamma_N := \{(2^n, 3^n) : n = 1, \ldots, N\} \subset \mathbf{R}^2$$
for some large integer N. On the one hand, the irrationality of $\frac{\log 2}{\log 3}$ ensures
that the curve
$$\Gamma := \{(2^t, 3^t) : t \in \mathbf{R}\}$$
is not algebraic, and so it is difficult for a polynomial P of controlled
complexity$^2$ to vanish (or nearly vanish) to high order at all the points of $\Gamma_N$; the
trivial bound in Proposition 7.1.5 allows one to make this statement more
precise. On the other hand, if Proposition 7.1.6 failed, then $\frac{\log 2}{\log 3}$ is close to
a rational, which by Taylor expansion makes $\Gamma$ close to an algebraic curve
over the rationals (up to some rescaling by factors such as log 2 and log 3)
at each point of $\Gamma_N$. This, together with a pigeonholing argument, allows
one to find a polynomial P of reasonably controlled complexity to (nearly)
vanish to high order at every point of $\Gamma_N$.
These observations, by themselves, are not sufficient to get beyond the
trivial bound in Proposition 7.1.5. However, Baker's key insight was to
exploit the integrality gap to bootstrap the (near) vanishing of $P$ on a set
$\Gamma_N$ to imply near-vanishing of $P$ on a larger set $\Gamma_{N'}$ with
$N' > N$. The point is that if a polynomial $P$ of controlled degree and size
(nearly) vanishes to high order on a lot of points on an analytic curve such
as $\Gamma$, then it will also be fairly small on many other points in $\Gamma$
as well. (To quantify this statement efficiently, it is convenient to use the
tools of complex analysis, which are particularly well suited to understanding
zeroes (or small values) of polynomials.) But then, thanks to the integrality
gap (and the controlled complexity of $P$), we can amplify "fairly small" to
"very small".
Using this observation and an iteration argument, Baker was able to take
a polynomial $P$ of controlled complexity that nearly vanished to high order
on a relatively small set $\Gamma_{N_0}$, and bootstrap that to show
near-vanishing on a much larger set $\Gamma_{N_k}$. This bootstrap allows one
to dramatically bridge the gap between the upper and lower bounds on the
complexity of polynomials that nearly vanish to a specified order on a given
$\Gamma_N$, and eventually leads to Proposition 7.1.6 (and, with much more care
and effort, to Proposition 7.1.3).
Below, I give the details of this argument. My treatment here
is inspired by the exposé [Se1969], as well as the unpublished lecture notes
[So2010].
² Here, "complexity" of a polynomial is an informal term referring both to the degree of the
polynomial, and the height of the coefficients, which in our application will essentially be integers
up to some normalisation factors.


7.1.1. Nonstandard formulation. The proof of Baker's theorem requires
a lot of "epsilon management", in that one has to carefully choose a lot of
parameters such as $C$ and $\varepsilon$ in order to make the argument work properly.
This is particularly the case if one wants a good value of exponents in the
final result, such as the quantity $C'$ in Proposition 7.1.6. To simplify matters,
we will abandon all attempts to get good values of constants anywhere, which
allows one to retreat to the nonstandard analysis setting where the notation
is much cleaner, and much (though not all) of the epsilon management is
eliminated (cf. [Ta2008, §1.5]). This is a relatively mild use of nonstandard
analysis, though, and it is not difficult to turn all the arguments below into
standard effective arguments (but at the cost of explicitly tracking all the
constants $C$). See for instance [So2010] for such an effective treatment.
We turn to the details. We will assume some basic familiarity with
nonstandard analysis, as covered for instance in [Ta2008, §1.5] (but one
should be able to follow this argument using only non-rigorous intuition of
what terms such as "unbounded" or "infinitesimal" mean).
Let $H$ be an unbounded (nonstandard) positive real number. Relative
to this $H$, we can define various notions of size:
(1) A nonstandard number $z$ is said to be of polynomial size if one has
$|z| \le C H^C$ for some standard $C > 0$.
(2) A nonstandard number $z$ is said to be of polylogarithmic size if one
has $|z| \le C \log^C H$ for some standard $C > 0$.
(3) A nonstandard number $z$ is said to be of quasipolynomial size if
one has $|z| \le \exp(C \log^C H)$ for some standard $C > 0$.
(4) A nonstandard number $z$ is said to be quasiexponentially small if
one has $|z| \le \exp(-C \log^C H)$ for every standard $C > 0$.
(5) Given two nonstandard numbers $X, Y$ with $Y$ non-negative, we
write $X \ll Y$ or $X = O(Y)$ if $|X| \le CY$ for some standard $C > 0$.
We write $X = o(Y)$ or $X \lll Y$ if we have $|X| \le cY$ for all standard
$c > 0$.
As a general rule of thumb, in our analysis all exponents will be of
polylogarithmic size, all coefficients will be of quasipolynomial size, and all
error terms will be quasiexponentially small.
In this nonstandard analysis setting, there is a clean calculus (analogous
to the calculus of the asymptotic notations O() and o()) to manipulate these
sorts of quantities without having to explicitly track the constants C. For
instance:


(1) The sum, product, or difference of two quantities of a given size
(polynomial, polylogarithmic, quasipolynomial, or quasiexponentially small)
remains of that given size (i.e. each size range forms a ring).
(2) If $X \ll Y$, and $Y$ is of a given size, then $X$ is also of that size.
(3) If $X$ is of quasipolynomial size and $Y$ is of polylogarithmic size,
then $X^Y$ is of quasipolynomial size, and (if $Y$ is a natural number)
$Y!$ is also of quasipolynomial size.
(4) If $\varepsilon$ is quasiexponentially small, and $X$ is of quasipolynomial size,
then $X\varepsilon$ is also quasiexponentially small. (Thus, the quasiexponentially
small numbers form an ideal in the ring of quasipolynomial numbers.)
(5) Any quantity of polylogarithmic size is of polynomial size; and any
quantity of polynomial size is of quasipolynomial size.
We will refer to these sorts of facts as asymptotic calculus, and rely upon
them heavily to simplify a lot of computations (particularly regarding error
terms).
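To illustrate how one of these rules is checked, here is a short worked verification of rule (3), written as a sketch. The standard constants $C_1, C_2 \ge 1$ are hypothetical names introduced only for this illustration, $Y$ is taken to be a natural number, and $H$ is assumed large enough that $\log(C_2 \log^{C_2} H) \le C_2 \log H$ (which holds for unbounded $H$).

```latex
\begin{align*}
|X| \le \exp(C_1 \log^{C_1} H),\quad 0 \le Y \le C_2 \log^{C_2} H
  &\implies |X^Y| \le \exp\bigl(C_1 C_2 \log^{C_1 + C_2} H\bigr), \\
Y! \le Y^Y = \exp(Y \log Y)
  &\le \exp\bigl(C_2 \log^{C_2} H \cdot C_2 \log H\bigr)
   = \exp\bigl(C_2^2 \log^{C_2 + 1} H\bigr).
\end{align*}
```

Both right-hand sides have the form $\exp(C \log^C H)$ for a suitable standard $C$, which is exactly the quasipolynomial size condition.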
Proposition 7.1.6 is then equivalent to the following assertion:
Proposition 7.1.7 (Nonstandard weak special case of Baker). Let $H$ be an
unbounded nonstandard natural number, and let $\frac{p}{q}$ be a rational of height at
most $H$ (i.e. $|p|, |q| \le H$). Then $\frac{\log 2}{\log 3} - \frac{p}{q}$ is not
quasiexponentially small (relative to $H$, of course).
Let us quickly see why Proposition 7.1.7 implies Proposition 7.1.6 (the
converse is easy and is left to the reader). This is the usual "compactness
and contradiction" argument. Suppose for contradiction that Proposition
7.1.6 failed. Carefully negating the quantifiers, we may then find a sequence
$\frac{p_n}{q_n}$ of (standard) rationals with $q_n > 1$, such that
$$\left| \frac{\log 2}{\log 3} - \frac{p_n}{q_n} \right| \le \exp(-n \log^n q_n)$$
for all natural numbers $n$. As $\frac{\log 2}{\log 3}$ is irrational, $q_n$ must go to infinity.
Taking the ultralimit $\frac{p}{q}$ of the $\frac{p_n}{q_n}$, and setting $H$ to be (say) $q$, we contradict
Proposition 7.1.7.

It remains to prove Proposition 7.1.7. We fix the unbounded nonstandard
natural number $H$, and assume for contradiction that $\frac{\log 2}{\log 3}$ is
quasiexponentially close to a nonstandard rational $\frac{p}{q}$ of height at most $H$. We will
write $X \approx Y$ for the assertion that $X - Y$ is quasiexponentially small, thus
$$\frac{\log 2}{\log 3} \approx \frac{p}{q}. \tag{7.2}$$
The objective is to show that (7.2) leads to a contradiction.


7.1.2. The polynomial method. Now it is time to introduce the polynomial method. We will be working with the following class of polynomials:
Definition 7.1.8. A good polynomial is a nonstandard polynomial
$P : \mathbb{C}^2 \to \mathbb{C}$ of the form
$$P(x, y) = \sum_{0 \le a,b \le D} c_{a,b} x^a y^b \tag{7.3}$$
of two nonstandard variables of some (nonstandard) degree at most $D$ (in
each variable), where $D$ is a nonstandard natural number of polylogarithmic
size, and whose coefficients $c_{a,b}$ are (nonstandard) integers of quasipolynomial size. (A technical point: we require the $c_{a,b}$ to depend in an internal
fashion on the indices $a, b$, in order for the nonstandard summation here to
be well-defined.) Define the height $M$ of the polynomial to be the maximum
magnitude of the coefficients in $P$; thus, by hypothesis, $M$ is of quasipolynomial size.
We have a key definition:
Definition 7.1.9. Let $N, J$ be two (nonstandard) positive numbers of polylogarithmic size. A good polynomial $P$ is said to nearly vanish to order $J$
on $\Gamma_N$ if one has
$$\frac{d^j}{dz^j} P(2^z, 3^z)\Big|_{z=n} \approx 0 \tag{7.4}$$
for all nonstandard natural numbers $0 \le j \le J$ and $1 \le n \le N$.
The derivatives in (7.4) can be easily computed. Indeed, if we expand
out the good polynomial $P$ as (7.3), then the left-hand side of (7.4) is
$$\sum_{0 \le a,b \le D} c_{a,b} (a \log 2 + b \log 3)^j 2^{an} 3^{bn}.$$
Now, from (7.2) we have
$$a \log 2 + b \log 3 \approx \frac{\log 3}{q} (ap + bq).$$
Using the asymptotic calculus (and the hypotheses that $D, j$ are of polylogarithmic size, and the $c_{a,b}$ are of quasipolynomial size) we conclude that the
left-hand side of (7.4) is quasiexponentially close to
$$\left(\frac{\log 3}{q}\right)^j \sum_{0 \le a,b \le D} c_{a,b} (ap + bq)^j 2^{an} 3^{bn}. \tag{7.5}$$
The quantity $(\frac{\log 3}{q})^j$ (and its reciprocal) is of quasipolynomial size. Thus,
the condition (7.4) is equivalent to the assertion that
$$\sum_{0 \le a,b \le D} c_{a,b} (ap + bq)^j 2^{an} 3^{bn} \approx 0$$
for all $0 \le j \le J$ and $1 \le n \le N$; as the left-hand side is a nonstandard
integer, we see from the integrality gap that the condition is in fact equivalent
to the exact constraint
$$\sum_{0 \le a,b \le D} c_{a,b} (ap + bq)^j 2^{an} 3^{bn} = 0 \tag{7.6}$$
for all $0 \le j \le J$ and $1 \le n \le N$.


Using this reformulation of (7.4), we can now give some upper and lower
bounds on the complexity of good polynomials that nearly vanish to a high
order on a set $\Gamma_N$. We first give a lower bound, which prevents the degree
$D$ from being smaller than $N^{1/2}$:
Proposition 7.1.10 (Lower bound). Let $P$ be a non-trivial good polynomial
of degree $D$ that nearly vanishes to order at least 0 on $\Gamma_N$. Then
$(D + 1)^2 > N$.
Proof. Suppose for contradiction that $(D + 1)^2 \le N$. Then from (7.6) we
have
$$\sum_{0 \le a,b \le D} c_{a,b} (2^a 3^b)^n = 0$$
for $1 \le n \le (D + 1)^2$; thus there is a non-trivial linear dependence between
the $(D + 1)^2$ (nonstandard) vectors
$((2^a 3^b)^n)_{1 \le n \le (D+1)^2} \in \mathbb{R}^{(D+1)^2}$ for $0 \le a, b \le D$.
But, from the formula for the Vandermonde determinant, this
would imply that two of the $2^a 3^b$ are equal, which is absurd.
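The Vandermonde step can be checked exactly for a small standard degree. The following is a minimal sketch, not part of the original argument: the helper names `vandermonde_style_matrix`, `determinant` and `closed_form` are hypothetical, and the check is only run for a tiny $D$. It builds the matrix with entries $(2^a 3^b)^n$, computes its determinant over the rationals, and compares it with the product formula $\prod_i x_i \prod_{i<j} (x_j - x_i)$, which is non-zero because the nodes $2^a 3^b$ are distinct.

```python
from fractions import Fraction
from itertools import product


def vandermonde_style_matrix(D):
    """Return the nodes 2^a 3^b (0 <= a, b <= D) and the matrix [x^n], n = 1..(D+1)^2."""
    nodes = [2**a * 3**b for a, b in product(range(D + 1), repeat=2)]
    m = len(nodes)  # equals (D + 1)^2
    rows = [[Fraction(x) ** n for x in nodes] for n in range(1, m + 1)]
    return nodes, rows


def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in rows]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


def closed_form(nodes):
    """Product formula: prod_i x_i * prod_{i<j} (x_j - x_i)."""
    value = 1
    for x in nodes:
        value *= x
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            value *= nodes[j] - nodes[i]
    return value


nodes, rows = vandermonde_style_matrix(1)
print(determinant(rows) == closed_form(nodes), determinant(rows) != 0)
```

Since the determinant is non-zero, the only coefficients $c_{a,b}$ satisfying all $(D+1)^2$ equations are zero, which is the contradiction used in the proof.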

In the converse direction, we can obtain polynomials that vanish to a
high order $J$ on $\Gamma_N$, but with degree $D$ larger than $N^{1/2} J^{1/2}$:
Proposition 7.1.11 (Upper bound). Let $D, J, N$ be positive quantities of
polylogarithmic size such that
$$D^2 \gg NJ.$$
Then there exists a non-trivial good polynomial $P$ of degree at most $D$ that
vanishes to order $J$ on $\Gamma_N$. Furthermore, $P$ has height at most
$$\exp\left(O\left(\frac{N J^2 \log H}{D^2} + \frac{N^2 J}{D}\right)\right).$$


Proof. We use the pigeonholing argument of Thue and Siegel. Let $M$ be
a positive quantity of quasipolynomial size to be chosen later, and choose
coefficients $c_{a,b}$ for $0 \le a, b \le D$ that are nonstandard natural numbers
between 1 and $M$. There are $M^{(D+1)^2} \ge \exp(D^2 \log M)$ possible ways to
make such a selection. For each such selection, we consider the $N(J + 1)$
expressions arising as the left-hand side of (7.6) with $0 \le j \le J$ and
$1 \le n \le N$. These expressions are nonstandard integers whose magnitude is
bounded by
$$O((D + 1)^2 M \, O(DH)^J \exp(O(ND)))$$
which by asymptotic calculus simplifies to be bounded by
$$\exp(\log M + O(J \log H) + O(ND)).$$
The number of possible values of these $N(J + 1)$ expressions is thus at most
$$\exp(N(J + 1) \log M + O(N J^2 \log H) + O(N^2 J D)).$$
By the hypothesis $D^2 \gg NJ$ and asymptotic calculus, we can make this
quantity less than $\exp(D^2 \log M)$ for some $M$ of size
$$M \ll \exp\left(O\left(\frac{N J^2 \log H}{D^2} + \frac{N^2 J}{D}\right)\right).$$
In particular, $M$ can be taken to be of quasipolynomial size. Thus, by
the pigeonhole principle, one can find two distinct choices for the coefficients
$c_{a,b}$ which give equal values for the expressions in the left-hand side of (7.6).
Subtracting those two choices we obtain the result.
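The pigeonhole step of the proof above can be illustrated on a toy integer system. This is a minimal sketch under stated assumptions: the matrix `A`, the bound `M = 7` and the helper name `small_kernel_vector` are hypothetical choices for illustration, not taken from the text. With 4 unknowns ranging over $\{1, \dots, 7\}$ there are $7^4 = 2401$ coefficient vectors but at most $43 \cdot 37 = 1591$ possible images, so two vectors must collide, and their difference is a non-trivial small integer solution.

```python
from itertools import product


def small_kernel_vector(A, M):
    """Search vectors c with entries in 1..M and return the difference of
    two distinct vectors having the same image under A (None if no collision)."""
    n = len(A[0])
    seen = {}
    for c in product(range(1, M + 1), repeat=n):
        image = tuple(sum(row[i] * c[i] for i in range(n)) for row in A)
        if image in seen:
            # Two distinct selections with equal images: subtract them.
            return tuple(x - y for x, y in zip(c, seen[image]))
        seen[image] = c
    return None


# Toy system: 2 homogeneous equations in 4 unknowns (hypothetical data).
A = [[1, 2, -3, 1],
     [2, -1, 1, -2]]
v = small_kernel_vector(A, 7)
print(v)
```

The returned vector has entries of magnitude at most $M - 1$, mirroring the way the height bound on $P$ comes directly from the size of the box in which the pigeonholing is performed.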

7.1.3. The bootstrap. At present, there is no contradiction between the
lower bound in Proposition 7.1.10 and the upper bound in Proposition
7.1.11, because there is plenty of room between the two bounds. To bridge
the gap between the bounds, we need a bootstrap argument that uses vanishing
on one $\Gamma_N$ to imply vanishing (to slightly lower order) on a larger
$\Gamma_{N'}$. The key bootstrap in this regard is:
Proposition 7.1.12 (Bootstrap). Let $D, J, N$ be unbounded polylogarithmic
quantities, such that
$$N \gg \log H.$$
Let $P$ be a good polynomial of degree at most $D$ and height $\exp(O(NJ))$,
that nearly vanishes to order $2J$ on $\Gamma_N$. Then $P$ also nearly vanishes to order $J$
on $\Gamma_{N'}$ for any $N' = o(\frac{J}{D} N)$.
Proof. It is convenient to use complex analysis methods. We consider the
entire function
$$f(z) := P(2^z, 3^z),$$
thus by (7.3)
$$f(z) = \sum_{0 \le a,b \le D} c_{a,b} 2^{az} 3^{bz}.$$
By hypothesis, we have
$$f^{(j)}(n) \approx 0$$
for all $0 \le j \le 2J$ and $1 \le n \le N$. We wish to show that
$$f^{(j)}(n') \approx 0$$
for $0 \le j \le J$ and $1 \le n' \le N'$. Clearly we may assume that $N' \ge n' > N$.
Fix $0 \le j \le J$ and $1 \le n' \le N'$. To estimate $f^{(j)}(n')$, we consider the
contour integral
$$\frac{1}{2\pi i} \int_{|z|=R} \frac{f^{(j)}(z)}{\prod_{n=1}^N (z - n)^J} \frac{dz}{z - n'} \tag{7.7}$$
(oriented anticlockwise), where $R \ge 2N'$ is to be chosen later, and estimate
it in two different ways. Firstly, we have
$$f^{(j)}(z) = \sum_{0 \le a,b \le D} c_{a,b} (a \log 2 + b \log 3)^j 2^{az} 3^{bz},$$
so for $|z| = R$, we have the bound
$$|f^{(j)}(z)| \ll D^2 \exp(O(NJ)) \, O(D)^J \exp(O(DR)),$$
which by the hypotheses and asymptotic calculus simplifies to
$$|f^{(j)}(z)| \ll \exp(O(NJ + DR)).$$
Also, when $|z| = R$ we have
$$\left| \prod_{n=1}^N (z - n)^J \right| \ge (R/2)^{NJ}.$$
We conclude the upper bound
$$\exp(O(NJ + DR) - NJ \log R)$$
for the magnitude of (7.7). On the other hand, we can evaluate (7.7) using
the residue theorem. The integrand has poles at $1, \dots, N$ and at $n'$. The
simple pole at $n'$ has residue
$$\frac{f^{(j)}(n')}{\prod_{n=1}^N (n' - n)^J}.$$

Now we consider the poles at $n = 1, \dots, N$. For each such $n$, we see that the
first $J$ derivatives of $f^{(j)}$ are quasiexponentially small at $n$. Thus, by Taylor
expansion (and asymptotic calculus), one can express $f^{(j)}(z)$ as the sum of
a polynomial of degree $J$ with quasiexponentially small coefficients, plus an
entire function that vanishes to order $J$ at $n$. The latter term contributes
nothing to the residue at $n$, while from the Cauchy integral formula (applied,
for instance, to a circle of radius 1/2 around $n$) and asymptotic calculus, we
see that the former term contributes a residue that is quasiexponentially small.
In particular, it is less than $\exp(O(NJ) - NJ \log R)$. We conclude that
$$\left| \frac{f^{(j)}(n')}{\prod_{n=1}^N (n' - n)^J} \right| \ll \exp(O(NJ + DR) - NJ \log R).$$

We have
$$\left| \prod_{n=1}^N (n' - n)^J \right| \le (N')^{NJ}$$
and thus
$$|f^{(j)}(n')| \ll \exp\left(O(NJ + DR) - NJ \log \frac{R}{N'}\right);$$
choosing $R$ to be a large standard multiple of $N'$ and using the hypothesis
$N' = o(\frac{J}{D} N)$, we can simplify this to
$$|f^{(j)}(n')| \ll \exp(-NJ).$$


To improve this bound, we use the integrality gap. Recall from
(7.5) that
$$f^{(j)}(n') \approx \left(\frac{\log 3}{q}\right)^j \sum_{0 \le a,b \le D} c_{a,b} (ap + bq)^j 2^{an'} 3^{bn'};$$
in particular, $(\frac{q}{\log 3})^j f^{(j)}(n')$ is quasiexponentially close to a (nonstandard)
integer. Since
$$\left(\frac{q}{\log 3}\right)^j = \exp(O(J \log H)),$$
we have
$$\left| \left(\frac{q}{\log 3}\right)^j f^{(j)}(n') \right| \le \frac{1}{2}$$
(say). Using the integrality gap, we conclude that
$$\left(\frac{q}{\log 3}\right)^j f^{(j)}(n') \approx 0,$$
which implies that $P$ nearly vanishes to order $J$ on $\Gamma_{N'}$, as required.

Now we can finish the proof of Proposition 7.1.7 (and hence Proposition
7.1.6). We select quantities $D, J, N_0$ of polylogarithmic size obeying the
bounds
$$\log H \ll N_0 \ll D \ll J$$
and
$$N_0 J \ll D^2,$$
with a gap of a positive power of $\log H$ between each such inequality. For
instance, one could take
$$N_0 := \log^2 H, \qquad D := \log^4 H, \qquad J := \log^5 H;$$
many other choices are possible (and one can optimise these choices eventually to get a good value of exponent $C'$ in Proposition 7.1.6).
Using Proposition 7.1.11, we can find a good polynomial $P$ which vanishes
to order $J$ on $\Gamma_{N_0}$, of height
$\exp(O(\frac{N_0 J^2 \log H}{D^2} + \frac{N_0^2 J}{D}))$, and hence (by
the assumptions on $N_0, D, J$) of height $\exp(O(N_0 J))$.
Applying Proposition 7.1.12, $P$ nearly vanishes to order $J/2$ on $\Gamma_{N_1}$ for
any $N_1 = o(\frac{J}{D} N_0)$. Iterating this, an easy induction shows that for any
standard $k \ge 1$, $P$ nearly vanishes to order $J/2^k$ on $\Gamma_{N_k}$ for any
$N_k = o((\frac{J}{D})^k N_0)$. As $J/D$ was chosen to be larger than a positive power of $\log H$,
we conclude that $P$ nearly vanishes to order at least 0 on $\Gamma_N$ for any $N$ of
polylogarithmic size. But for $N$ large enough, this contradicts Proposition
7.1.10.
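As a sanity check on the sample parameters, one can verify the required inequalities directly. In the sketch below, $L := \log H$ is shorthand introduced only for this illustration, and the count of seven iterations is specific to this sample choice.

```latex
\begin{align*}
N_0 = L^2,\quad D = L^4,\quad J = L^5
  &\implies N_0 J = L^7 \ll L^8 = D^2, \\
\frac{N_0 J^2 \log H}{D^2} = \frac{L^2 \cdot L^{10} \cdot L}{L^8} = L^5,
  &\qquad \frac{N_0^2 J}{D} = \frac{L^4 \cdot L^5}{L^4} = L^5, \\
L^5 \ll L^7 = N_0 J
  &\implies \text{height } \exp(O(N_0 J)), \\
\Bigl(\frac{J}{D}\Bigr)^k N_0 = L^{k+2},
  &\qquad (D+1)^2 \sim L^8 .
\end{align*}
```

With these values, $k = 7$ iterations of the bootstrap already allow $N_k$ to exceed $(D+1)^2$, while the order of vanishing $J/2^7$ is still at least 0, which is the situation excluded by Proposition 7.1.10.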
Remark 7.1.13. The above argument places a lower bound on quantities
such as
$$q \log 2 - p \log 3$$
for integer $p, q$. Baker's theorem, in its full generality, gives a lower bound
on quantities such as
$$\beta_0 + \beta_1 \log \alpha_1 + \dots + \beta_n \log \alpha_n$$
for algebraic numbers $\beta_0, \dots, \beta_n$, $\alpha_1, \dots, \alpha_n$, which is polynomial in the
height of the quantities involved, assuming of course that $1, \alpha_1, \dots, \alpha_n$ are
multiplicatively independent, and that all quantities are of bounded degree.
The proof is more intricate than the one given above, but follows a broadly
similar strategy, and the constants are completely effective.

7.2. The Collatz conjecture, Littlewood-Offord theory, and powers of 2 and 3
One of the most notorious problems in elementary mathematics that remains
unsolved is the Collatz conjecture, concerning the function
$f_0 : \mathbb{N} \to \mathbb{N}$ defined by setting $f_0(n) := 3n + 1$ when $n$ is odd,
and $f_0(n) := n/2$ when $n$ is even. (Here, $\mathbb{N}$ is understood to be the
positive natural numbers $\{1, 2, 3, \dots\}$.)
Conjecture 7.2.1 (Collatz conjecture). For any given natural number $n$,
the orbit $n, f_0(n), f_0^2(n), f_0^3(n), \dots$ passes through 1 (i.e. $f_0^k(n) = 1$ for some
$k$).
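For readers who want to experiment, the map $f_0$ and its orbits are easy to compute. The following is a minimal sketch; the function names `f0` and `orbit_to_one` and the safety cap `max_steps` are hypothetical choices made for this illustration.

```python
def f0(n):
    """The Collatz map: 3n + 1 for odd n, n / 2 for even n."""
    return 3 * n + 1 if n % 2 else n // 2


def orbit_to_one(n, max_steps=10_000):
    """Return the orbit n, f0(n), f0^2(n), ... stopping at the first 1.

    The cap max_steps is only a safeguard; if it is hit, the returned
    list simply does not end in 1.
    """
    orbit = [n]
    while orbit[-1] != 1 and len(orbit) <= max_steps:
        orbit.append(f0(orbit[-1]))
    return orbit


print(orbit_to_one(6))
```

Running this on small starting values gives a feel for how erratic the orbits are before they reach 1; of course, no amount of such experimentation says anything about the conjecture for all $n$.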


Open questions with this level of notoriety can lead to what Richard
Lipton calls³ "mathematical diseases". Nevertheless, it can still be diverting
to spend a day or two each year on these sorts of questions, before returning
to other matters; so I recently had a go at the problem. Needless to say,
I didn't solve the problem, but I have a better appreciation of why the
conjecture is (a) plausible, and (b) unlikely to be proven by current technology,
and I thought I would share what I had found out here.
Let me begin with some very well known facts. If $n$ is odd, then
$f_0(n) = 3n + 1$ is even, and so $f_0^2(n) = \frac{3n+1}{2}$. Because of this, one could replace
$f_0$ by the function $f_1 : \mathbb{N} \to \mathbb{N}$, defined by $f_1(n) = \frac{3n+1}{2}$ when $n$ is odd,
and $f_1(n) = n/2$ when $n$ is even, and obtain an equivalent conjecture. Now
we see that if one chooses $n$ "at random", in the sense that it is odd with
probability 1/2 and even with probability 1/2, then $f_1$ increases $n$ by a factor
of roughly 3/2 half the time, and decreases it by a factor of 1/2 half the time.
Furthermore, if $n$ is uniformly distributed modulo 4, one easily verifies that
$f_1(n)$ is uniformly distributed modulo 2, and so $f_1^2(n)$ should be roughly 3/2
times as large as $f_1(n)$ half the time, and roughly 1/2 times as large as $f_1(n)$
the other half of the time. Continuing this at a heuristic level, we expect
generically that $f_1^{k+1}(n) \approx \frac{3}{2} f_1^k(n)$ half the time, and
$f_1^{k+1}(n) \approx \frac{1}{2} f_1^k(n)$ the
other half of the time. The logarithm $\log f_1^k(n)$ of this orbit can then be
modeled heuristically by a random walk with steps $\log \frac{3}{2}$ and $\log \frac{1}{2}$ occurring
with equal probability. The expectation
$$\frac{1}{2} \log \frac{3}{2} + \frac{1}{2} \log \frac{1}{2} = \frac{1}{2} \log \frac{3}{4}$$
is negative, and so (by the classic gambler's ruin) we expect the orbit to
decrease over the long term. This can be viewed as heuristic justification
of the Collatz conjecture, at least in the "average case" scenario in which
$n$ is chosen uniformly at random (e.g. in some large interval $\{1, \dots, N\}$). (It
also suggests that if one modifies the problem, e.g. by replacing $3n + 1$
with $5n + 1$, then one can obtain orbits that tend to increase over time, and
indeed numerically for this variant one sees orbits that appear to escape to
infinity.) Unfortunately, one can only rigorously keep the orbit uniformly
distributed modulo 2 for time about $O(\log N)$ or so; after that, the system
is too complicated for naive methods to control at anything other than a
heuristic level.
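The drift of the heuristic random walk is a one-line computation, and the parity statistics of an actual orbit can be sampled just as easily. The sketch below is only an illustration of the heuristic: the names `f1`, `drift` and `odd_fraction` are hypothetical, and the parameter `multiplier` is introduced here to cover both the $3n+1$ map and the $5n+1$ variant mentioned above.

```python
import math


def f1(n):
    """The accelerated Collatz map: (3n + 1) / 2 for odd n, n / 2 for even n."""
    return (3 * n + 1) // 2 if n % 2 else n // 2


def drift(multiplier):
    """Mean log-step of the heuristic walk with steps log(multiplier / 2)
    and log(1 / 2), each taken with probability 1/2."""
    return 0.5 * math.log(multiplier / 2) + 0.5 * math.log(1 / 2)


def odd_fraction(n, steps):
    """Fraction of odd terms among the first `steps` terms of the f1-orbit of n."""
    odd = 0
    for _ in range(steps):
        odd += n % 2
        n = f1(n)
    return odd / steps


print(drift(3), drift(5))
```

The sign of `drift(3)` is negative and that of `drift(5)` is positive, matching the contrast between the $3n+1$ and $5n+1$ problems described above. The function `odd_fraction` only samples a single orbit and is not a substitute for any equidistribution statement.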
³ See rjlipton.wordpress.com/2009/11/04/on-mathematical-diseases.
Remark 7.2.2. One can obtain a rigorous analogue of the above arguments
by extending $f_1$ from the integers $\mathbb{Z}$ to the 2-adics $\mathbb{Z}_2$ (the inverse limit of
the cyclic groups $\mathbb{Z}/2^n\mathbb{Z}$). This compact abelian group comes with a Haar
probability measure, and one can verify that this measure is invariant with
respect to $f_1$; with a bit more effort one can verify that it is ergodic. This
suggests the introduction of ergodic theory methods. For instance, using
the pointwise ergodic theorem, we see that if $n$ is a random 2-adic integer,
then almost surely the orbit $n, f_1(n), f_1^2(n), \dots$ will be even half the time
and odd half the time asymptotically, thus supporting the above heuristics.
Unfortunately, this does not directly tell us much about the dynamics on $\mathbb{Z}$,
as this is a measure zero subset of $\mathbb{Z}_2$. More generally, unless a dynamical
system is somehow "polynomial", "nilpotent", or "unipotent" in nature,
the current state of ergodic theory is usually only able to say something
meaningful about generic orbits, but not about all orbits. For instance, the
very simple system $x \mapsto 10x$ on the unit circle $\mathbb{R}/\mathbb{Z}$ is well understood from
ergodic theory (in particular, almost all orbits will be uniformly distributed),
but the orbit of a specific point, e.g. $\pi$ mod 1, is still nearly impossible
to understand (this particular problem being equivalent to the notorious
unsolved question of whether the digits of $\pi$ are uniformly distributed).
The above heuristic argument only suggests decreasing orbits for almost
all $n$ (though even this remains unproven; the state of the art is that
the number of $n$ in $\{1, \dots, N\}$ that eventually go to 1 is $\gg N^{0.84}$, see
[KrLa2003]). It leaves open the possibility of some very rare exceptional $n$
for which the orbit goes to infinity, or gets trapped in a periodic loop. Since
the only loop that 1 lies in is 1, 4, 2 (for $f_0$) or 1, 2 (for $f_1$), we thus may
isolate a weaker consequence of the Collatz conjecture:
Conjecture 7.2.3 (Weak Collatz conjecture). Suppose that $n$ is a natural
number such that $f_0^k(n) = n$ for some $k \ge 1$. Then $n$ is equal to 1, 2, or 4.
Of course, we may replace $f_0$ with $f_1$ (and delete "4") and obtain an
equivalent conjecture.
This weaker version of the Collatz conjecture is also unproven. However, it was observed in [BoSo1978] by Böhm and Sontacchi that this weak
conjecture is equivalent to a divisibility problem involving powers of 2 and
3:
Conjecture 7.2.4 (Reformulated weak Collatz conjecture). There does not
exist $k \ge 1$ and integers
$$0 = a_1 < a_2 < \dots < a_{k+1}$$
such that $2^{a_{k+1}} - 3^k$ is a positive integer that is a proper divisor of
$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \dots + 2^{a_k},$$
i.e.
$$(2^{a_{k+1}} - 3^k) n = 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \dots + 2^{a_k} \tag{7.8}$$
for some natural number $n > 1$.
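The divisibility condition (7.8) is simple enough to search by brute force for small parameters. The sketch below is only an experiment, not evidence for the conjecture: the function name `solutions` and the search bound `a_max` are hypothetical. It lists all tuples with $a_{k+1} \le$ `a_max` for which $2^{a_{k+1}} - 3^k$ is positive and divides the right-hand side of (7.8), together with the quotient $n$.

```python
from itertools import combinations


def solutions(k, a_max):
    """Yield ((a_1, ..., a_{k+1}), n) with 0 = a_1 < ... < a_{k+1} <= a_max,
    2^{a_{k+1}} - 3^k > 0, and (7.8) holding for an integer n >= 1."""
    for top in range(k, a_max + 1):
        q = 2**top - 3**k
        if q <= 0:
            continue
        # Choose a_2 < ... < a_k strictly between a_1 = 0 and a_{k+1} = top.
        for middle in combinations(range(1, top), k - 1):
            a = (0,) + middle
            rhs = sum(3 ** (k - 1 - i) * 2 ** a[i] for i in range(k))
            if rhs % q == 0:
                yield a + (top,), rhs // q


for k in range(1, 4):
    print(k, list(solutions(k, 10)))
```

In this small range the only solutions have $n = 1$, for instance $k = 1$ with $(a_1, a_2) = (0, 2)$, which corresponds to the trivial cycle; a counterexample to Conjecture 7.2.4 would require $n > 1$.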


Proposition 7.2.5. Conjecture 7.2.3 and Conjecture 7.2.4 are equivalent.
Proof. To see this, it is convenient to reformulate Conjecture 7.2.3 slightly.
Define an equivalence relation $\sim$ on $\mathbb{N}$ by declaring $a \sim b$ if $a/b = 2^m$ for
some integer $m$, thus giving rise to the quotient space $\mathbb{N}/\sim$ of equivalence
classes $[n]$ (which can be placed, if one wishes, in one-to-one correspondence
with the odd natural numbers). We can then define a function
$f_2 : \mathbb{N}/\sim \to \mathbb{N}/\sim$ by declaring
$$f_2([n]) := [3n + 2^a] \tag{7.9}$$
for any $n \in \mathbb{N}$, where $2^a$ is the largest power of 2 that divides $n$. It is easy
to see that $f_2$ is well-defined (it is essentially the Syracuse function, after
identifying $\mathbb{N}/\sim$ with the odd natural numbers), and that periodic orbits
of $f_2$ correspond to periodic orbits of $f_1$ or $f_0$. Thus, Conjecture 7.2.3 is
equivalent to the conjecture that $[1]$ is the only periodic orbit of $f_2$.
Now suppose that Conjecture 7.2.3 failed, thus there exists $[n] \ne [1]$ such
that $f_2^k([n]) = [n]$ for some $k \ge 1$. Without loss of generality we may take
$n$ to be odd, then $n > 1$. It is easy to see that $[1]$ is the only fixed point of
$f_2$, and so $k > 1$. An easy induction using (7.9) shows that
$$f_2^k([n]) = [3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \dots + 2^{a_k}]$$
where, for each $1 \le i \le k$, $2^{a_i}$ is the largest power of 2 that divides
$$n_i := 3^{i-1} n + 3^{i-2} 2^{a_1} + \dots + 2^{a_{i-1}}. \tag{7.10}$$
In particular, as $n_1 = n$ is odd, $a_1 = 0$. Using the recursion
$$n_{i+1} = 3 n_i + 2^{a_i}, \tag{7.11}$$
we see from induction that $2^{a_i + 1}$ divides $n_{i+1}$, and thus $a_{i+1} > a_i$:
$$0 = a_1 < a_2 < \dots < a_k.$$
Since $f_2^k([n]) = [n]$, we have
$$2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \dots + 2^{a_k} = 3 n_k + 2^{a_k}$$
for some integer $a_{k+1}$. Since $3 n_k + 2^{a_k}$ is divisible by $2^{a_k + 1}$, and $n$ is odd,
we conclude $a_{k+1} > a_k$; if we rearrange the above equation as (7.8), then we
obtain a counterexample to Conjecture 7.2.4.
Conversely, suppose that Conjecture 7.2.4 failed. Then we have $k \ge 1$,
integers
$$0 = a_1 < a_2 < \dots < a_{k+1}$$
and a natural number $n > 1$ such that (7.8) holds. As $a_1 = 0$, we see that
the right-hand side of (7.8) is odd, so $n$ is odd also. If we then introduce
the natural numbers $n_i$ by the formula (7.10), then an easy induction using
(7.11) shows that
$$(2^{a_{k+1}} - 3^k) n_i = 3^{k-1} 2^{a_i} + 3^{k-2} 2^{a_{i+1}} + \dots + 2^{a_{i+k-1}} \tag{7.12}$$
with the periodic convention $a_{k+j} := a_j + a_{k+1}$ for $j > 1$. As the $a_i$ are
increasing in $i$ (even for $i \ge k + 1$), we see that $2^{a_i}$ is the largest power of 2
that divides the right-hand side of (7.12); as $2^{a_{k+1}} - 3^k$ is odd, we conclude
that $2^{a_i}$ is also the largest power of 2 that divides $n_i$. We conclude that
$$f_2([n_i]) = [3 n_i + 2^{a_i}] = [n_{i+1}]$$
and thus $[n]$ is a periodic orbit of $f_2$. Since $n$ is an odd number larger than
1, this contradicts Conjecture 7.2.3.
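The recursion (7.11) and the closed form (7.10) can be checked numerically on a sample orbit. This is a minimal sketch; the helper names `two_adic_valuation`, `exponents_and_iterates` and `closed_form`, and the starting value 7, are hypothetical choices for illustration.

```python
def two_adic_valuation(n):
    """Return the exponent a such that 2^a is the largest power of 2 dividing n."""
    a = 0
    while n % 2 == 0:
        n //= 2
        a += 1
    return a


def exponents_and_iterates(n, k):
    """For odd n, return ([a_1, ..., a_k], [n_1, ..., n_{k+1}]) where
    n_1 = n and n_{i+1} = 3 n_i + 2^{a_i} as in (7.11)."""
    a_list, n_list = [], [n]
    for _ in range(k):
        a = two_adic_valuation(n_list[-1])
        a_list.append(a)
        n_list.append(3 * n_list[-1] + 2**a)
    return a_list, n_list


def closed_form(n, a_list, i):
    """Right-hand side of (7.10): 3^{i-1} n + sum_{m=1}^{i-1} 3^{i-1-m} 2^{a_m}."""
    return 3 ** (i - 1) * n + sum(
        3 ** (i - 1 - m) * 2 ** a_list[m - 1] for m in range(1, i)
    )


a_list, n_list = exponents_and_iterates(7, 5)
print(a_list, n_list)
```

For the starting value 7 the exponents come out strictly increasing, as the induction in the proof predicts, and the final iterate is a power of 2, reflecting the fact that the Syracuse orbit of 7 has reached the class $[1]$.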

Call a counterexample a tuple $(k, a_1, \dots, a_{k+1})$ that contradicts Conjecture 7.2.4, i.e. an integer $k \ge 1$ and an increasing sequence of integers
$$0 = a_1 < a_2 < \dots < a_{k+1}$$
such that (7.8) holds for some $n > 1$. We record a simple bound on such
counterexamples, due to Terras [Te1976] and Garner [Ga1981]:
Lemma 7.2.6 (Exponent bounds). Let $N \ge 1$, and suppose that the Collatz
conjecture is true for all $n < N$. Let $(k, a_1, \dots, a_{k+1})$ be a counterexample.
Then
$$\frac{\log 3}{\log 2} k < a_{k+1} < \frac{\log(3 + \frac{1}{N})}{\log 2} k.$$
Proof. The first bound is immediate from the positivity of $2^{a_{k+1}} - 3^k$. To
prove the second bound, observe from the proof of Proposition 7.2.5 that
the counterexample $(k, a_1, \dots, a_{k+1})$ will generate a counterexample to Conjecture 7.2.3, i.e. a non-trivial periodic orbit $n, f_0(n), \dots, f_0^K(n) = n$. As the
conjecture is true for all $n < N$, all terms in this orbit must be at least $N$.
An inspection of the proof of Proposition 7.2.5 reveals that this orbit consists of $k$ steps of the form $x \mapsto 3x + 1$, and $a_{k+1}$ steps of the form $x \mapsto x/2$.
As all terms are at least $N$, the former steps can increase magnitude by a
multiplicative factor of at most $3 + \frac{1}{N}$. As the orbit returns to where it
started, we conclude that
$$1 \le \left(3 + \frac{1}{N}\right)^k \left(\frac{1}{2}\right)^{a_{k+1}}$$
whence the claim.

The Collatz conjecture has already been verified for many values⁴ of $n$.
Inserting this into the above lemma, one can get lower bounds on $k$. For
instance, by methods such as this, it is known that any non-trivial periodic
orbit has length at least 105,000, as shown in [Ga1981] (and this bound,
which uses the much smaller value $N = 2 \times 10^9$ that was available in 1981,
can surely be improved using the most recent computational bounds).
⁴ According to http://www.ieeta.pt/~tos/3x+1.html, the conjecture has been verified up to
at least $N = 5 \times 10^{18}$.
Now we can perform a heuristic count on the number of counterexamples.
If we fix $k$ and $a := a_{k+1}$, then $2^a > 3^k$, and from basic combinatorics we
see that there are $\binom{a-1}{k-1}$ different ways to choose the remaining integers
$$0 = a_1 < a_2 < \dots < a_{k+1}$$
to form a potential counterexample $(k, a_1, \dots, a_{k+1})$. Write $q := 2^a - 3^k$. As a crude heuristic,
one expects that for a "random" such choice of integers, the expression
(7.8) has a probability $1/q$ of holding for some integer $n$. (Note that $q$ is
not divisible by 2 or 3, and so one does not expect the special structure
of the right-hand side of (7.8) with respect to those moduli to be relevant.
There will be some choices of $a_1, \dots, a_k$ where the right-hand side in (7.8) is
too small to be divisible by $q$, but using the estimates in Lemma 7.2.6, one
expects this to occur very infrequently.) Thus, the total expected number
of solutions for this choice of $a, k$ is
$$\frac{1}{q} \binom{a-1}{k-1}.$$
The heuristic number of solutions overall is then expected to be
$$\sum_{a,k} \frac{1}{q} \binom{a-1}{k-1}, \tag{7.13}$$
where, in view of Lemma 7.2.6, one should restrict the double summation
to the heuristic regime $a \approx \frac{\log 3}{\log 2} k$, with the approximation here accurate to
many decimal places.
We need a lower bound on $q$. Here, we will use Baker's theorem (as
discussed in Section 7.1), which among other things gives the lower bound
$$q = 2^a - 3^k \gg 2^a / a^C \tag{7.14}$$
for some absolute constant $C$. Meanwhile, Stirling's formula (as discussed
for instance in [Ta2011c, §1.2]) combined with the approximation
$k \approx \frac{\log 2}{\log 3} a$ gives
$$\binom{a-1}{k-1} \approx \exp\left(h\left(\frac{\log 2}{\log 3}\right)\right)^a$$
where $h$ is the entropy function
$$h(x) := -x \log x - (1 - x) \log(1 - x).$$
A brief computation shows that
$$\exp\left(h\left(\frac{\log 2}{\log 3}\right)\right) \approx 1.9318\dots$$
and so (ignoring all subexponential terms)
$$\frac{1}{q} \binom{a-1}{k-1} \approx (0.9659\dots)^a$$
which makes the series (7.13) convergent. (Actually, one does not need the
full strength of Lemma 7.2.6 here; anything that kept $k$ well away from
$a/2$ would suffice. In particular, one does not need an enormous value of
$N$; even $N = 5$ (say) would be more than sufficient to obtain the heuristic
that there are finitely many counterexamples.) Heuristically applying the
Borel-Cantelli lemma, we thus expect that there are only a finite number
of counterexamples to the weak Collatz conjecture (and inserting a bound
such as $k \ge 105,000$, one in fact expects it to be extremely likely that there
are no counterexamples at all).
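The two numerical constants in this heuristic are easy to reproduce. The following is a minimal sketch, with natural logarithms throughout; the names `entropy`, `ratio`, `growth` and `decay` are hypothetical.

```python
import math


def entropy(x):
    """The entropy function h(x) = -x log x - (1 - x) log(1 - x), for 0 < x < 1."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)


ratio = math.log(2) / math.log(3)   # heuristic value of k / a
growth = math.exp(entropy(ratio))   # exponential growth rate of the binomial coefficient
decay = growth / 2                  # after dividing by q, which is roughly 2^a

print(round(growth, 4), round(decay, 4))
```

The key point is that `decay` is strictly less than 1, so the terms of (7.13) decay geometrically in $a$ once the polynomial loss $a^C$ from (7.14) is ignored.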
This, of course, is far short of any rigorous proof of Conjecture 7.2.3. In
order to make rigorous progress on this conjecture, it seems that one would
need to somehow exploit the structural properties of numbers of the form
$$3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \dots + 2^{a_k}. \tag{7.15}$$
In some very special cases, this can be done. For instance, suppose that
one had $a_{i+1} = a_i + 1$ with at most one exception (this is essentially what
is called a 1-cycle in [St1978]). Then (7.15) simplifies via the geometric
series formula to a combination of just a bounded number of powers of
2 and 3, rather than an unbounded number. In that case, one can start
using tools from transcendence theory such as Baker's theorem to obtain
good results; for instance, in [St1978], it was shown that 1-cycles cannot
actually occur, and similar methods have been used to show that $m$-cycles
(in which there are at most $m$ exceptions to $a_{i+1} = a_i + 1$) do not occur for
any $m \le 63$, as was shown in [Side2005]. However, for general increasing
tuples of integers $a_1, \dots, a_k$, there is no such representation by bounded
numbers of powers, and it does not seem that methods from transcendence
theory will be sufficient to control the expressions (7.15) to the extent that
one can understand their divisibility properties by quantities such as $2^a - 3^k$.
Amusingly, there is a slight connection to Littlewood-Offord theory in
additive combinatorics - the study of the $2^n$ random sums
$$\pm v_1 \pm v_2 \dots \pm v_n$$
generated by some elements $v_1, \dots, v_n$ of an additive group $G$, or equivalently, the vertices of an $n$-dimensional parallelepiped inside $G$. Here, the
relevant group is $\mathbb{Z}/q\mathbb{Z}$. The point is that if one fixes $k$ and $a_{k+1}$ (and hence
$q$), and lets $a_1, \dots, a_k$ vary inside the simplex
$$\Delta := \{(a_1, \dots, a_k) \in \mathbb{N}^k : 0 = a_1 < \dots < a_k < a_{k+1}\}$$
then the set $S$ of all sums⁵ of the form (7.15) (viewed as an element of $\mathbb{Z}/q\mathbb{Z}$)
contains many large parallelepipeds. This is because the simplex $\Delta$ contains
many large cubes. Indeed, if one picks a typical element $(a_1, \dots, a_k)$ of $\Delta$,
then one expects (thanks to Lemma 7.2.6) that there will be $\gg k$
indices $1 \le i_1 < \dots < i_m \le k$ such that $a_{i_j + 1} > a_{i_j} + 1$ for $j = 1, \dots, m$,
which allows one to adjust each of the $a_{i_j}$ independently by 1 if desired and
still remain inside $\Delta$. This gives a cube in $\Delta$ of dimension $\gg k$, which then
induces a parallelepiped of the same dimension in $S$. A short computation
shows that the generators of this parallelepiped consist of products of a
power of 2 and a power of 3, and in particular will be coprime to $q$.
If the weak Collatz conjecture is true, then the set $S$ must avoid the
residue class 0 in $\mathbb{Z}/q\mathbb{Z}$. Let us suppose temporarily that we did not know
about Baker's theorem (and the associated bound (7.14)), so that $q$ could
potentially be quite small. Then we would have a large parallelepiped inside
a small cyclic group $\mathbb{Z}/q\mathbb{Z}$ that did not cover all of $\mathbb{Z}/q\mathbb{Z}$, which would not
be possible for $q$ small enough. Indeed, an easy induction shows that a
$d$-dimensional parallelepiped in $\mathbb{Z}/q\mathbb{Z}$, with all generators coprime to $q$, has
cardinality at least $\min(q, d + 1)$. This argument already shows the lower
bound $q \gg k$. In other words, we have
Proposition 7.2.7. Suppose the weak Collatz conjecture is true. Then for
any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg k$.
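The cardinality bound $\min(q, d + 1)$ for parallelepipeds can be confirmed exhaustively for small moduli. This is a minimal sketch: the names `parallelepiped` and `check_bound`, and the ranges tested, are hypothetical; by translation invariance only the base point 0 is examined.

```python
from itertools import product
from math import gcd


def parallelepiped(q, base, generators):
    """The set {base + sum e_i v_i mod q : e_i in {0, 1}} inside Z/qZ."""
    return {
        (base + sum(e * v for e, v in zip(eps, generators))) % q
        for eps in product((0, 1), repeat=len(generators))
    }


def check_bound(q, d):
    """Check |parallelepiped| >= min(q, d + 1) for all d-tuples of
    generators coprime to q."""
    units = [v for v in range(1, q) if gcd(v, q) == 1]
    return all(
        len(parallelepiped(q, 0, gens)) >= min(q, d + 1)
        for gens in product(units, repeat=d)
    )


print(all(check_bound(q, d) for q in range(2, 12) for d in range(4)))
```

The bound is attained when all generators are equal, for example generators $1, 1, 1$ in $\mathbb{Z}/7\mathbb{Z}$ give only the four elements $0, 1, 2, 3$, which is one way to see why this elementary argument yields nothing better than $q \gg k$.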
This bound is very weak when compared against the unconditional
bound (7.14). However, I know of no way to get a nontrivial separation
property between powers of 2 and powers of 3 other than via transcendence
theory methods. Thus, this result strongly suggests that any proof of the
Collatz conjecture must either use existing results in transcendence theory,
or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve
the Collatz conjecture.)
By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence theory bound (7.14)):
Proposition 7.2.8. Suppose the weak Collatz conjecture is true. Then for any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg (1+\varepsilon)^k$ for some absolute constant $\varepsilon > 0$.
Proof. (Informal sketch only) Suppose not; then we can find $a, k$ with $q := 2^a - 3^k$ of size $(1+o(1))^k = \exp(o(k))$. We form the set $S$ as before, which contains parallelepipeds in $\mathbf{Z}/q\mathbf{Z}$ of large dimension $d \gg k$ that avoid $0$. We can count the number of times $0$ occurs in one of these parallelepipeds by a standard Fourier-analytic computation involving Riesz products (see [TaVu2006, Chapter 7] or [Ma2010]). Using this Fourier representation, the fact that this parallelepiped avoids $0$ (and the fact that $q = \exp(o(k)) = \exp(o(d))$) forces the generators $v_1, \dots, v_d$ to be concentrated in a Bohr set, in that one can find a non-zero frequency $\xi \in \mathbf{Z}/q\mathbf{Z}$ such that $(1-o(1))d$ of the $d$ generators lie in the set $\{v : \xi v = o(q) \bmod q\}$. However, one can choose the generators to essentially have the structure of a (generalised) geometric progression (up to scaling, it resembles something like $2^i 3^{\lfloor \alpha i \rfloor}$ for $i$ ranging over a generalised arithmetic progression, and $\alpha$ a fixed irrational), and one can show that such progressions cannot be concentrated in Bohr sets (this is similar in spirit to the exponential sum estimates of Bourgain [Bo2005] on approximate multiplicative subgroups of $\mathbf{Z}/q\mathbf{Z}$, though one can use more elementary methods here due to the very strong nature of the Bohr set concentration, being of the 99% concentration variety rather than the 1% concentration). This furnishes the required contradiction. □

$^5$Note, incidentally, that once one fixes $k$, all the sums of the form (7.15) are distinct; because given (7.15) and $k$, one can read off $2^{a_1}$ as the largest power of $2$ that divides (7.15), and then subtracting off $3^{k-1} 2^{a_1}$ one can then read off $2^{a_2}$, and so forth.


Thus we see that any proposed proof of the Collatz conjecture must either use transcendence theory, or introduce new techniques that are powerful
enough to create exponential separation between powers of 2 and powers of
3.
Unfortunately, once one uses the transcendence theory bound (7.14), the
size q of the cyclic group Z/qZ becomes larger than the volume of any cube
in S, and Littlewood-Offord techniques are no longer of much use (they can
be used to show that S is highly equidistributed in Z/qZ, but this does not
directly give any way to prevent S from containing 0).
One possible toy model problem for the (weak) Collatz conjecture is a conjecture of Erdős [Er1979] asserting that for $n > 8$, the base $3$ representation of $2^n$ contains at least one $2$. (See [La2009] for some work on this conjecture and on related problems.) To put it another way, the conjecture asserts that there are no integer solutions to
$$2^n = 3^{a_1} + 3^{a_2} + \dots + 3^{a_k}$$
with $n > 8$ and $0 \leq a_1 < \dots < a_k$. (When $n = 8$, of course, one has $2^8 = 3^0 + 3^1 + 3^2 + 3^5$.) In this form we see a resemblance to Conjecture 7.2.4, but it looks like a simpler problem to attack (though one which is still a fair distance beyond what one can do with current technology). Note that one has a similar heuristic support for this conjecture as one does for Conjecture 7.2.4; a number of magnitude $2^n$ has about $n \frac{\log 2}{\log 3}$ base $3$ digits, so the heuristic probability that none of these digits are equal to $2$ is
$$\Big(\frac{2}{3}\Big)^{n \frac{\log 2}{\log 3}} = 2^{-(1 - \frac{\log 2}{\log 3}) n},$$
which is absolutely summable.
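The conjecture is easy to test numerically for small exponents. The sketch below lists the exponents $n$ below a given limit for which $2^n$ has no digit $2$ in base $3$; the function names are illustrative, and a finite search of this kind is of course only evidence, not a proof.

```python
def ternary_digits(n):
    """Base-3 digits of a positive integer, least significant digit first."""
    digits = []
    while n > 0:
        n, r = divmod(n, 3)
        digits.append(r)
    return digits

def powers_of_two_without_digit_2(limit):
    """All 0 <= n < limit such that 2**n has no digit 2 in base 3."""
    return [n for n in range(limit) if 2 not in ternary_digits(2 ** n)]

# Only the exponents 0, 2 and 8 survive in this range.
print(powers_of_two_without_digit_2(500))
```

The exponent $n = 8$ corresponds to the identity $2^8 = 3^0 + 3^1 + 3^2 + 3^5$ quoted above.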

7.3. Erdős's divisor bound


One of the basic problems in analytic number theory is to obtain bounds and asymptotics for sums of the form$^6$
$$\sum_{n \leq x} f(n)$$
in the limit $x \to \infty$, where $n$ ranges over natural numbers less than $x$, and $f : \mathbf{N} \to \mathbf{C}$ is some arithmetic function of number-theoretic interest. For instance, the celebrated prime number theorem is equivalent to the assertion
$$\sum_{n \leq x} \Lambda(n) = x + o(x)$$
where $\Lambda(n)$ is the von Mangoldt function (equal to $\log p$ when $n$ is a power of a prime $p$, and zero otherwise), while the infamous Riemann hypothesis is equivalent to the stronger assertion
$$\sum_{n \leq x} \Lambda(n) = x + O(x^{1/2+o(1)}).$$
It is thus of interest to develop techniques to estimate such sums $\sum_{n \leq x} f(n)$.
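To get a feel for the prime number theorem in this form, one can compute the sum $\sum_{n \leq x} \Lambda(n)$ directly for moderate $x$. The following sketch uses a simple sieve; the name `chebyshev_psi` is an illustrative choice, and the computation is only a numerical illustration of the asymptotic.

```python
from math import log

def chebyshev_psi(x):
    """Sum of the von Mangoldt function Lambda(n) over 1 <= n <= x."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    total = 0.0
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        for multiple in range(p * p, x + 1, p):
            is_prime[multiple] = False
        # Each prime power p^j <= x contributes log p to the sum.
        power = p
        while power <= x:
            total += log(p)
            power *= p
    return total

for x in (10**3, 10**4, 10**5):
    print(x, chebyshev_psi(x) / x)
```

The printed ratios should be close to $1$, in line with the assertion $\sum_{n \leq x} \Lambda(n) = x + o(x)$.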
Of course, the difficulty of this task depends on how nice the function f
is. The functions f that come up in number theory lie on a broad spectrum
of niceness, with some particularly nice functions being quite easy to sum,
and some being insanely difficult.
At the easiest end of the spectrum are those functions $f$ that exhibit some sort of regularity or smoothness. Examples of smoothness include Archimedean smoothness, in which $f(n)$ is the restriction of some smooth function $f : \mathbf{R} \to \mathbf{C}$ from the reals to the natural numbers, and the derivatives of $f$ are well controlled. A typical example is
$$\sum_{n \leq x} \log n.$$
One can already get quite good bounds on this quantity by comparison with the integral $\int_1^x \log t\, dt$, namely
$$\sum_{n \leq x} \log n = x \log x - x + O(\log x),$$
with sharper bounds available by using tools such as the Euler-Maclaurin formula (see [Ta2011d, §3.7]). Exponentiating such asymptotics, incidentally, leads to one of the standard proofs of Stirling's formula (as discussed in [Ta2011c, §1.2]).

$^6$It is also often convenient to replace this sharply truncated sum with a smoother sum such as $\sum_n f(n) \psi(n/x)$ for some smooth cutoff $\psi$, but we will not discuss this technicality here.
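The integral comparison for $\sum_{n \leq x} \log n$ can be checked numerically. In the sketch below the sum is evaluated as $\log(x!)$ through the standard library function `math.lgamma`; the helper names are illustrative, and $x$ is assumed to be a positive integer.

```python
from math import lgamma, log

def log_sum(x):
    """Sum of log n for 1 <= n <= x, computed as log(x!) = lgamma(x + 1)."""
    return lgamma(x + 1)

def integral_comparison_error(x):
    """Difference between the sum and the main term x log x - x."""
    return log_sum(x) - (x * log(x) - x)

for x in (10, 10**3, 10**6):
    print(x, integral_comparison_error(x), log(x))
```

The error stays within a constant multiple of $\log x$, as the $O(\log x)$ term predicts; Stirling's formula identifies it more precisely as $\frac{1}{2} \log x + \frac{1}{2} \log 2\pi + o(1)$.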

One can also consider non-Archimedean notions of smoothness, such as periodicity relative to a small period $q$. Indeed, if $f$ is periodic with period $q$ (and is thus essentially a function on the cyclic group $\mathbf{Z}/q\mathbf{Z}$), then one has the easy bound
$$\sum_{n \leq x} f(n) = \frac{x}{q} \sum_{n \in \mathbf{Z}/q\mathbf{Z}} f(n) + O\Big( \sum_{n \in \mathbf{Z}/q\mathbf{Z}} |f(n)| \Big).$$
In particular, we have the fundamental estimate
$$\sum_{n \leq x : q | n} 1 = \frac{x}{q} + O(1). \tag{7.16}$$
This is a good estimate when $q$ is much smaller than $x$, but as $q$ approaches $x$ in magnitude, the error term $O(1)$ begins to overwhelm the main term $\frac{x}{q}$, and one needs much more delicate information on the fractional part of $\frac{x}{q}$ in order to obtain good estimates at this point.
One can also consider functions $f$ which combine Archimedean and non-Archimedean smoothness into an adelic smoothness. We will not define this term precisely here (though the concept of a Schwartz-Bruhat function is one way to capture this sort of concept), but a typical example might be
$$\sum_{n \leq x} \chi(n) \log n$$
where $\chi$ is periodic with some small period $q$. By using techniques such as summation by parts, one can estimate such sums using the techniques used to estimate sums of periodic functions or functions with (Archimedean) smoothness.
Another class of functions that is reasonably well controlled are the multiplicative functions, in which $f(nm) = f(n) f(m)$ whenever $n, m$ are coprime. Here, one can use the powerful techniques of multiplicative number theory, for instance by working with the Dirichlet series
$$\sum_{n=1}^\infty \frac{f(n)}{n^s}$$
which is clearly related to the partial sums $\sum_{n \leq x} f(n)$ (essentially via the Mellin transform, a cousin of the Fourier and Laplace transforms); for this section we ignore the (important) issue of how to make sense of this series when it is not absolutely convergent (but see [Ta2011d, §3.7] for more discussion). A primary reason that this technique is effective is that the Dirichlet series of a multiplicative function factorises as an Euler product
$$\sum_{n=1}^\infty \frac{f(n)}{n^s} = \prod_p \Big( \sum_{j=0}^\infty \frac{f(p^j)}{p^{js}} \Big).$$
One also obtains similar types of representations for functions that are not quite multiplicative, but are closely related to multiplicative functions, such as the von Mangoldt function $\Lambda$ (whose Dirichlet series $\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)}$ is not given by an Euler product, but instead by the logarithmic derivative of an Euler product).
Moving another notch along the spectrum between well-controlled and ill-controlled functions, one can consider functions $f$ that are divisor sums such as
$$f(n) = \sum_{d \leq R; d | n} g(d) = \sum_{d \leq R} 1_{d|n} g(d)$$
for some other arithmetic function $g$, and some level $R$. This is a linear combination of periodic functions $1_{d|n} g(d)$ and is thus technically periodic in $n$ (with period equal to the least common multiple of all the numbers from $1$ to $R$), but in practice this period is far too large to be useful (except for extremely small levels $R$, e.g. $R = O(\log x)$). Nevertheless, we can still control the sum $\sum_{n \leq x} f(n)$ simply by rearranging the summation:
$$\sum_{n \leq x} f(n) = \sum_{d \leq R} g(d) \sum_{n \leq x : d | n} 1$$
and thus by (7.16) one can bound this by the sum of a main term $x \sum_{d \leq R} \frac{g(d)}{d}$ and an error term $O(\sum_{d \leq R} |g(d)|)$. As long as the level $R$ is significantly less than $x$, one may expect the main term to dominate, and one can often estimate this term by a variety of techniques (for instance, if $g$ is multiplicative, then multiplicative number theory techniques are quite effective, as mentioned previously). Similarly for other slight variants of divisor sums, such as expressions of the form
$$\sum_{d \leq R; d | n} g(d) \log \frac{n}{d}$$
or expressions of the form
$$\sum_{d \leq R} F_d(n)$$
where each $F_d$ is periodic with period $d$.


One of the simplest examples of this comes when estimating the divisor function
$$\tau(n) := \sum_{d | n} 1,$$
which counts the number of divisors up to $n$. This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let us forget the multiplicative structure and estimate the above sum by more elementary methods. By applying the preceding method, we see that
$$\sum_{n \leq x} \tau(n) = \sum_{d \leq x} \sum_{n \leq x : d | n} 1 = \sum_{d \leq x} \Big( \frac{x}{d} + O(1) \Big) = x \log x + O(x). \tag{7.17}$$

Here, we are (barely) able to keep the error term smaller than the main term; this is right at the edge of the divisor sum method, because the level $R$ in this case is equal to $x$. Unfortunately, at this high choice of level, it is not always possible to keep the error term under control like this. For instance, if one wishes to use the standard divisor sum representation
$$\Lambda(n) = \sum_{d | n} \mu(d) \log \frac{n}{d},$$
where $\mu(n)$ is the Möbius function (defined to equal $(-1)^k$ when $n$ is the product of $k$ distinct primes, and zero otherwise), to compute $\sum_{n \leq x} \Lambda(n)$, then one ends up looking at
$$\sum_{n \leq x} \Lambda(n) = \sum_{d \leq x} \mu(d) \sum_{n \leq x : d | n} \log \frac{n}{d} = \sum_{d \leq x} \mu(d) \Big( \frac{x}{d} \log \frac{x}{d} - \frac{x}{d} + O\Big( \log \frac{x}{d} \Big) \Big).$$
From Dirichlet series methods, it is not difficult to establish the identities
$$\lim_{s \to 1^+} \sum_{n=1}^\infty \frac{\mu(n)}{n^s} = 0$$
and
$$\lim_{s \to 1^+} \sum_{n=1}^\infty \frac{\mu(n) \log n}{n^s} = -1.$$
This suggests (but does not quite prove) that one has
$$\sum_{n=1}^\infty \frac{\mu(n)}{n} = 0 \tag{7.18}$$
and
$$\sum_{n=1}^\infty \frac{\mu(n) \log n}{n} = -1 \tag{7.19}$$
in the sense of conditionally convergent series. Assuming one can justify this (which, ultimately, requires one to exclude zeroes of the Riemann zeta function on the line $\mathrm{Re}(s) = 1$, as discussed in [Ta2010b, §1.12]), one is eventually left with the estimate $x + O(x)$, which is useless as a lower bound (and recovers only the classical Chebyshev estimate $\sum_{n \leq x} \Lambda(n) \ll x$ as the upper bound). The inefficiency here when compared to the situation with the divisor function $\tau$ can be attributed to the signed nature of the Möbius function $\mu(n)$, which causes some cancellation in the divisor sum expansion that needs to be compensated for with improved estimates.
However, there are a number of tricks available to reduce the level of divisor sums. The simplest comes from exploiting the change of variables $d \mapsto \frac{n}{d}$, which can in principle reduce the level by a square root. For instance, when computing the divisor function $\tau(n) = \sum_{d | n} 1$, one can observe using this change of variables that every divisor of $n$ above $\sqrt{n}$ is paired with one below $\sqrt{n}$, and so we have
$$\tau(n) = 2 \sum_{d \leq \sqrt{n} : d | n} 1 \tag{7.20}$$
except when $n$ is a perfect square, in which case one must subtract one from the right-hand side. Using this reduced-level divisor sum representation, one can obtain an improvement to (7.17), namely
$$\sum_{n \leq x} \tau(n) = x \log x + (2\gamma - 1) x + O(\sqrt{x}).$$
This type of argument is also known as the Dirichlet hyperbola method. A variant of this argument can also deduce the prime number theorem from (7.18), (7.19) (and with some additional effort, one can even drop the use of (7.19)).
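The two ways of summing the divisor function discussed above can be compared numerically. The sketch below computes $\sum_{n \leq x} \tau(n)$ by the full rearrangement $\sum_{d \leq x} \lfloor x/d \rfloor$ and by the hyperbola method, which only uses $d \leq \sqrt{x}$. Note that the hyperbola method is applied here to the whole sum (counting lattice points under $dm \leq x$) rather than to each $\tau(n)$ separately via (7.20); the function names and the hard-coded value of Euler's constant $\gamma$ are choices made for this illustration.

```python
from math import isqrt, log, sqrt

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def divisor_sum_rearranged(x):
    """Sum of tau(n) for n <= x, as the sum over d <= x of floor(x/d)."""
    return sum(x // d for d in range(1, x + 1))

def divisor_sum_hyperbola(x):
    """The same sum, using only divisors d <= sqrt(x)."""
    s = isqrt(x)
    # Pairs (d, m) with d*m <= x: twice those with d <= sqrt(x), minus
    # the pairs with both d, m <= sqrt(x), which were counted twice.
    return 2 * sum(x // d for d in range(1, s + 1)) - s * s

def hyperbola_main_term(x):
    """Main term x log x + (2*gamma - 1) x of the refined asymptotic."""
    return x * log(x) + (2 * EULER_GAMMA - 1) * x

x = 10**6
print(divisor_sum_hyperbola(x) - hyperbola_main_term(x), sqrt(x))
```

The hyperbola version needs about $\sqrt{x}$ terms instead of $x$, which is the computational counterpart of lowering the level by a square root.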
Using this square root trick, one can now also control divisor sums such as
$$\sum_{n \leq x} \tau(n^2 + 1).$$
(Note that $\tau(n^2 + 1)$ has no multiplicativity properties in $n$, and so multiplicative number theory techniques cannot be directly applied here.) The level of the divisor sum here is initially of order $x^2$, which is too large to be useful; but using the square root trick, we can expand this expression as
$$2 \sum_{n \leq x} \sum_{d \leq n : d | n^2 + 1} 1$$
which one can rewrite as
$$2 \sum_{d \leq x} \sum_{d \leq n \leq x : n^2 + 1 = 0 \bmod d} 1.$$
The constraint $n^2 + 1 = 0 \bmod d$ is periodic in $n$ with period $d$, so we can write this as
$$2 \sum_{d \leq x} \Big( \frac{x}{d} \rho(d) + O(\rho(d)) \Big)$$
where $\rho(d)$ is the number of solutions in $\mathbf{Z}/d\mathbf{Z}$ to the equation $n^2 + 1 = 0 \bmod d$, and so
$$\sum_{n \leq x} \tau(n^2 + 1) = 2x \sum_{d \leq x} \frac{\rho(d)}{d} + O\Big( \sum_{d \leq x} \rho(d) \Big).$$
The function $\rho$ is multiplicative, and can be easily computed at primes $p$ and prime powers $p^j$ using tools such as quadratic reciprocity and Hensel's lemma. For instance, by Fermat's two-square theorem, $\rho(p)$ is equal to $2$ for $p = 1 \bmod 4$ and $0$ for $p = 3 \bmod 4$. From this and standard multiplicative number theory methods (e.g. by obtaining asymptotics on the Dirichlet series $\sum_d \frac{\rho(d)}{d^s}$), one eventually obtains the asymptotic
$$\sum_{d \leq x} \frac{\rho(d)}{d} = \frac{3}{2\pi} \log x + O(1)$$
and also
$$\sum_{d \leq x} \rho(d) = O(x)$$
and thus
$$\sum_{n \leq x} \tau(n^2 + 1) = \frac{3}{\pi} x \log x + O(x).$$
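The quantities in this computation are simple to tabulate for small arguments. The following sketch computes the root-counting function $\rho(d)$ by brute force and the sum $\sum_{n \leq x} \tau(n^2+1)$ by trial division; the function names are illustrative, and the code is meant for experimentation rather than efficiency.

```python
from math import log, pi

def rho(d):
    """Number of residues n mod d with n*n + 1 divisible by d."""
    return sum(1 for n in range(d) if (n * n + 1) % d == 0)

def tau(n):
    """Number of divisors of n, pairing each d <= sqrt(n) with n // d."""
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count

def tau_sum_quadratic(x):
    """Sum of tau(n^2 + 1) for 1 <= n <= x."""
    return sum(tau(n * n + 1) for n in range(1, x + 1))

print([(p, rho(p)) for p in (2, 3, 5, 7, 11, 13, 17)])
x = 500
print(tau_sum_quadratic(x) / (x * log(x)), 3 / pi)
```

The first line of output reflects the dichotomy between primes $p = 1 \bmod 4$ and $p = 3 \bmod 4$. The ratio printed on the second line approaches $3/\pi$ only slowly, since the $O(x)$ term is smaller than the main term by just a factor of $\log x$.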

Similar arguments give asymptotics for $\tau$ on other quadratic polynomials; see for instance [Ho1963], [Mc1995], [Mc1997], [Mc1999]. Note that the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as $\sum_{n \leq x} \tau(n^2 - 1)$, then the analogous quantity $\rho(n)$ becomes significantly larger, leading to a larger growth rate (of order $x \log^2 x$ rather than $x \log x$) for the sum.
However, the square root trick is insufficient by itself to deal with higher order sums involving the divisor function, such as
$$\sum_{n \leq x} \tau(n^3 + 1);$$
the level here is initially of order $x^3$, and the square root trick only lowers this to about $x^{3/2}$, creating an error term that overwhelms the main term. And indeed, the asymptotic for this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a reasonable conjecture for this asymptotic), although some results are known if one averages over additional parameters (see e.g. [Gr1970], [Ma2012]).

Nevertheless, there is an ingenious argument of Erdős [Er1952] that allows one to obtain good upper and lower bounds for these sorts of sums, in particular establishing the asymptotic
$$x \log x \ll \sum_{n \leq x} \tau(P(n)) \ll x \log x \tag{7.21}$$
for any fixed irreducible non-constant polynomial $P$ that maps $\mathbf{N}$ to $\mathbf{N}$ (with the implied constants depending of course on the choice of $P$). There is also the related moment bound
$$\sum_{n \leq x} \tau^m(P(n)) \ll x \log^{O(1)} x \tag{7.22}$$
for any fixed $P$ (not necessarily irreducible) and any fixed $m \geq 1$, due to van der Corput [va1939]; this bound is in fact used to dispose of some error terms in the proof of (7.21). These should be compared with what one can obtain from the divisor bound $\tau(n) \ll n^{O(1/\log\log n)}$ (see [Ta2009, §1.6]) and the trivial bound $\tau(n) \geq 1$, giving the bounds
$$x \ll \sum_{n \leq x} \tau^m(P(n)) \ll x^{1 + O(\frac{1}{\log\log x})}$$
for any fixed $m \geq 1$.


The lower bound in (7.21) is easy, since one can simply lower the level in (7.20) to obtain the lower bound
$$\tau(n) \geq \sum_{d \leq n^\theta : d | n} 1$$
for any $\theta > 0$, and the preceding methods then easily allow one to obtain the lower bound by taking $\theta$ small enough (more precisely, if $P$ has degree $d$, one should take $\theta$ equal to $1/d$ or less). The upper bounds in (7.21) and (7.22) are more difficult. Ideally, if we could obtain upper bounds of the form
$$\tau(n) \ll \sum_{d \leq n^\theta : d | n} 1 \tag{7.23}$$
for any fixed $\theta > 0$, then the preceding methods would easily establish both results. Unfortunately, this bound can fail, as illustrated by the following example. Suppose that $n$ is the product of $k$ distinct primes $p_1 \dots p_k$, each of which is close to $n^{1/k}$. Then $n$ has $2^k$ divisors, with $\binom{k}{j}$ of them close to $n^{j/k}$ for each $0 \leq j \leq k$. One can think of (the logarithms of) these divisors as being distributed according to what is essentially a Bernoulli distribution, thus a randomly selected divisor of $n$ has magnitude about $n^{j/k}$, where $j$ is a random variable which has the same distribution as the number of heads in $k$ independently tossed fair coins. By the law of large numbers, $j$ should concentrate near $k/2$ when $k$ is large, which implies that the majority of the divisors of $n$ will be close to $n^{1/2}$. Sending $k \to \infty$, one can show that the bound (7.23) fails whenever $\theta < 1/2$.
This however can be fixed in a number of ways. First of all, even when $\theta < 1/2$, one can show weaker substitutes for (7.23). For instance, for any fixed $\theta > 0$ and $m \geq 1$ one can show a bound of the form
$$\tau(n)^m \ll \sum_{d \leq n^\theta : d | n} \tau(d)^C \tag{7.24}$$
for some $C$ depending only on $m, \theta$. This nice elementary inequality (first observed in [La1989]) already gives a quite short proof of van der Corput's bound (7.22).
For Erdős's upper bound (7.21), though, one cannot afford to lose these additional factors of $\tau(d)$, and one must argue more carefully. Here, the key observation is that the counterexample discussed earlier - when the natural number $n$ is the product of a large number of fairly small primes - is quite atypical; most numbers have at least one large prime factor. For instance, the number of natural numbers less than $x$ that contain a prime factor between $x^{1/2}$ and $x$ is equal to
$$\sum_{x^{1/2} \leq p \leq x} \Big( \frac{x}{p} + O(1) \Big),$$
which, thanks to Mertens' theorem
$$\sum_{p \leq x} \frac{1}{p} = \log\log x + M + o(1)$$
for some absolute constant $M$, is comparable to $x$. In a similar spirit, one can show by similarly elementary means that the number of natural numbers less than $x$ that are $x^{1/m}$-smooth, in the sense that all prime factors are at most $x^{1/m}$, is only about $m^{-cm} x$ or so. Because of this, one can hope that the bound (7.23), while not true in full generality, will still be true for most natural numbers $n$, with some slightly weaker substitute available (such as (7.22)) for the exceptional numbers $n$. This turns out to be the case by an elementary but careful argument.
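The claim that a positive proportion of numbers have a large prime factor can be observed directly. The sketch below tabulates the largest prime factor of each $n \leq x$ with a sieve and reports the fraction of $n$ whose largest prime factor exceeds $x^{1/2}$; Mertens' theorem suggests a limiting value of $\log 2$ for this fraction, although convergence is slow. The function names are illustrative.

```python
from math import isqrt, log

def largest_prime_factors(x):
    """Table lpf[n] = largest prime factor of n for 2 <= n <= x (0 for n < 2)."""
    lpf = [0] * (x + 1)
    for p in range(2, x + 1):
        if lpf[p] == 0:  # p is prime: no smaller prime has marked it
            for multiple in range(p, x + 1, p):
                lpf[multiple] = p  # larger primes overwrite smaller ones
    return lpf

def fraction_with_large_prime_factor(x):
    """Fraction of 1 <= n <= x having a prime factor exceeding sqrt(x)."""
    lpf = largest_prime_factors(x)
    threshold = isqrt(x)
    return sum(1 for n in range(2, x + 1) if lpf[n] > threshold) / x

print(fraction_with_large_prime_factor(10**5), log(2))
```

A number $n \leq x$ can have at most one prime factor above $x^{1/2}$, which is why the count equals the sum over primes displayed above with no overcounting.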
The Erdős argument is quite robust; for instance, the more general inequality
$$x \log^{2^m - 1} x \ll \sum_{n \leq x} \tau(P(n))^m \ll x \log^{2^m - 1} x$$
for fixed irreducible $P$ and $m \geq 1$, which improves van der Corput's inequality (7.22), was shown in [De1971] using the same methods. (A slight error in the original paper of Erdős was also corrected in this paper.) In [ElTa2011], we also applied this method to obtain bounds such as
$$\sum_{a \leq A} \sum_{b \leq B} \tau(a^2 b + 1) \ll AB \log(A + B),$$
which turn out to be enough to obtain the right asymptotics for the number of solutions to the equation $\frac{4}{p} = \frac{1}{x} + \frac{1}{y} + \frac{1}{z}$.
7.3.1. Landreau's argument. We now prove (7.24), and use this to show (7.22). Throughout, $c$ denotes the exponent of $n$ appearing in the range of summation in (7.24).
Suppose first that all prime factors of $n$ have magnitude at most $n^{c/2}$. Then by a greedy algorithm, we can factorise $n$ as the product $n = n_1 \dots n_r$ of numbers between $n^{c/2}$ and $n^c$. In particular, the number $r$ of terms in this factorisation is at most $2/c$. By the trivial inequality $\tau(ab) \leq \tau(a) \tau(b)$ we have
$$\tau(n) \leq \tau(n_1) \dots \tau(n_r)$$
and thus by the pigeonhole principle one has
$$\tau(n)^m \leq \tau(n_j)^{2m/c}$$
for some $j$. Since $n_j$ is a factor of $n$ that is at most $n^c$, the claim follows in this case (taking $C := 2m/c$).
Now we consider the general case, in which $n$ may contain prime factors that exceed $n^c$. There are at most $1/c$ such factors (counting multiplicity). Extracting these factors out first and then running the greedy algorithm again, we may factorise $n = n_1 \dots n_r q$ where the $n_i$ are as before, and $q$ is the product of at most $1/c$ primes. In particular, $\tau(q) \leq 2^{1/c}$ and thus
$$\tau(n) \leq 2^{1/c} \tau(n_1) \dots \tau(n_r).$$
One now argues as before (conceding a factor of $2^{1/c}$, which is acceptable) to obtain (7.24) in full generality. (Note that this illustrates a useful principle, which is that large prime factors of $n$ are essentially harmless for the purposes of upper bounding $\tau(n)$.)
Now we prove (7.22). From (7.24) we have
$$\tau(P(n))^m \ll \sum_{d \leq x : d | P(n)} \tau(d)^{O(1)}$$
for any $n \leq x$, and hence we can bound $\sum_{n \leq x} \tau(P(n))^m$ by
$$\sum_{d \leq x} \tau(d)^{O(1)} \sum_{n \leq x : P(n) = 0 \bmod d} 1.$$
The inner sum is $\frac{x}{d} \rho(d) + O(\rho(d)) = O(\frac{x}{d} \rho(d))$, where $\rho(d)$ is the number of roots of $P \bmod d$. Now, for fixed $P$, it is easy to see that $\rho(p) = O(1)$ for all primes $p$, and from Hensel's lemma one soon extends this to $\rho(p^j) = O(1)$ for all prime powers $p^j$. (This is easy when $p$ does not divide the discriminant $\Delta(P)$ of $P$, as the zeroes of $P \bmod p$ are then simple. There are only finitely many primes that do divide the discriminant, and they can each be handled separately by Hensel's lemma and an induction on the degree of $P$.) Meanwhile, from the Chinese remainder theorem, $\rho$ is multiplicative. From this we obtain the crude bound $\rho(d) \ll \tau(d)^{O(1)}$, and so we obtain a bound
$$\sum_{n \leq x} \tau(P(n))^m \ll x \sum_{d \leq x} \frac{\tau(d)^{O(1)}}{d}.$$
This sum can easily be bounded by $x \log^{O(1)} x$ by multiplicative number theory techniques, e.g. by first computing the Dirichlet series
$$\sum_{d=1}^\infty \frac{\tau(d)^{O(1)}}{d^{1 + \frac{1}{\log x}}}$$
via the Euler product. This proves (7.22).


7.3.2. Erdős's argument. Now we prove (7.21). We focus on the upper bound, as the proof of the lower bound has already been sketched.
We first make a convenient observation: from (7.22) (with $m = 2$) and the Cauchy-Schwarz inequality, we see that we have
$$\sum_{n \in E} \tau(P(n)) \ll x \log x$$
whenever $E$ is a subset of the natural numbers less than $x$ of cardinality $O(x \log^{-C} x)$ for some sufficiently large $C$. Thus we have the freedom to restrict attention to generic $n$, where by generic we mean lying outside of an exceptional set of cardinality $O(x \log^{-C} x)$ for the $C$ specified above.
Let us now look at the behaviour of $P(n)$ for generic $n$. We first control the total number of prime factors:
Lemma 7.3.1. For generic $n \leq x$, $P(n)$ has $O(\log\log x)$ distinct prime factors.
This result is consistent with the Hardy-Ramanujan and Erdős-Kac theorems [HaRa1917], [Ka1940], though it does not quite follow from these results (because $P(n)$ lives in quite a sparse set of natural numbers).
Proof. If $P(n)$ has more than $A \log_2 \log x$ prime factors for some $A$, then $P(n)$ has at least $\log^A x$ divisors, thus $\tau(P(n)) \geq \log^A x$. The claim then follows from (7.22) (with $m = 1$) and Markov's inequality, taking $A$ large enough. □

Next, we try to prevent repeated prime factors:

Lemma 7.3.2. For generic $n \leq x$, the prime factors of $P(n)$ between $\log^C x$ and $x^{1/2}$ are all distinct.
Proof. If $p$ is a prime between $\log^C x$ and $x^{1/2}$, then the total number of $n \leq x$ for which $p^2$ divides $P(n)$ is
$$\rho(p^2) \frac{x}{p^2} + O(\rho(p^2)) = O\Big( \frac{x}{p^2} \Big),$$
so the total number of $n \leq x$ that fail the above property is
$$\ll \sum_{\log^C x \leq p \leq x^{1/2}} \frac{x}{p^2} \ll \frac{x}{\log^C x}$$
which is acceptable. □

It is difficult to increase the upper bound here beyond $x^{1/2}$, but fortunately we will not need to go above this bound. The lower bound cannot be significantly reduced; for instance, it is quite likely that $P(n)$ will be divisible by $2^2$ for a positive fraction of $n$. But we have the following substitute:
Lemma 7.3.3. For generic $n \leq x$, there are no prime powers $p^j$ dividing $P(n)$ with $p < x^{1/(\log\log x)^2}$ and $p^j \geq x^{1/(\log\log x)^2}$.
Proof. By the preceding lemma, we can restrict attention to primes $p$ with $p < \log^C x$. For each such $p$, let $p^j$ be the first power of $p$ exceeding $x^{1/(\log\log x)^2}$. Arguing as before, the total number of $n \leq x$ for which $p^j$ divides $P(n)$ is
$$\ll \frac{x}{p^j} \ll \frac{x}{x^{1/(\log\log x)^2}};$$
on the other hand, there are at most $\log^C x$ primes $p$ to consider. The claim then follows from the union bound. □

We now have enough information on the prime factorisation of $P(n)$ to proceed. We arrange the prime factors of $P(n)$ in increasing order (allowing repetitions):
$$P(n) = p_1 \dots p_J.$$
Let $0 \leq j \leq J$ be the largest integer for which $p_1 \dots p_j \leq x$. Suppose first that $J = j + O(1)$, then as in the previous section we would have
$$\tau(P(n)) \ll \tau(p_1 \dots p_j) \leq \sum_{d \leq x : d | P(n)} 1$$
which is an estimate of the form (7.23), and thus presumably advantageous.
Now suppose that $J$ is much larger than $j$. Since $P(n) = O(x^{O(1)})$, this implies in particular that $p_{j+1} \leq x^{1/2}$ (say), which forces
$$x^{1/2} \leq p_1 \dots p_j \leq x \tag{7.25}$$
and $p_j \leq x^{1/2}$.
For generic $n$, we have at most $O(\log\log x)$ distinct prime factors, and each such distinct prime less than $x^{1/(\log\log x)^2}$ contributes at most $x^{1/(\log\log x)^2}$ to the product $p_1 \dots p_j$. We conclude that generically, at least one of these primes $p_1, \dots, p_j$ must exceed $x^{1/(\log\log x)^2}$, thus we generically have
$$x^{1/(\log\log x)^2} \leq p_j \leq x^{1/2}.$$
In particular, we have
$$x^{1/(r+1)} \leq p_j \leq x^{1/r}$$
for some $2 \leq r \leq (\log\log x)^2$. This makes the quantity $p_1 \dots p_j$ $x^{1/r}$-smooth, i.e. all the prime factors are at most $x^{1/r}$. On the other hand, the remaining prime factors $p_{j+1}, \dots, p_J$ are at least $x^{1/(r+1)}$, and $P(n) = O(x^{O(1)})$, so we have $J = j + O(r)$. Thus we can write $P(n)$ as the product of $p_1 \dots p_j$ and at most $O(r)$ additional primes, which implies that
$$\tau(P(n)) \ll \exp(O(r)) \tau(p_1 \dots p_j) = \exp(O(r)) \sum_{d : d | p_1 \dots p_j} 1.$$
The exponential factor looks bad, but we can offset it by the $x^{1/r}$-smooth nature of $p_1 \dots p_j$, which is inherited by its factors $d$. From (7.25), $d$ is at most $x$; by using the square root trick, we can restrict $d$ to be at least the square root of $p_1 \dots p_j$, and thus to be at least $x^{1/4}$. Also, $d$ divides $P(n)$, and as such inherits many of the prime factorisation properties of $P(n)$; in particular, $d$ has $O(\log\log x)$ distinct prime factors, and there are no prime powers $p^j$ dividing $d$ with $p < x^{1/(\log\log x)^2}$ and $p^j \geq x^{1/(\log\log x)^2}$.
To summarise, we have shown the following variant of (7.23):
Lemma 7.3.4 (Lowering the level). For generic $n \leq x$, we have
$$\tau(P(n)) \ll \exp(O(r)) \sum_{d \in S_r : d | P(n)} 1$$
for some $1 \leq r \leq (\log\log x)^2$, where $S_r$ is the set of all $x^{1/r}$-smooth numbers $d$ between $x^{1/4}$ and $x$ with $O(\log\log x)$ distinct prime factors, and such that there are no prime powers $p^j$ dividing $d$ with $p < x^{1/(\log\log x)^2}$ and $p^j \geq x^{1/(\log\log x)^2}$.
Applying this lemma (and discarding the non-generic $n$), we can thus upper bound $\sum_{n \leq x} \tau(P(n))$ (up to acceptable errors) by
$$\ll \sum_{1 \leq r \leq (\log\log x)^2} \exp(O(r)) \sum_{n \leq x} \sum_{d \in S_r : d | P(n)} 1.$$
The level is now less than $x$ and we can use the usual methods to estimate the inner sums:
$$\sum_{n \leq x} \sum_{d \in S_r : d | P(n)} 1 \ll x \sum_{d \in S_r} \frac{\rho(d)}{d}.$$
Thus it suffices to show that
$$\sum_{1 \leq r \leq (\log\log x)^2} \exp(O(r)) \sum_{d \in S_r} \frac{\rho(d)}{d} \ll \log x. \tag{7.26}$$
It is at this point that we need some algebraic number theory, and specifically the Landau prime ideal theorem, via the following lemma:
Proposition 7.3.5. We have
$$\sum_{d \leq x} \frac{\rho(d)}{d} \ll \log x. \tag{7.27}$$
Proof. Let $k$ be the number field formed by extending the rationals by adjoining a root $\alpha$ of the irreducible polynomial $P$. The Landau prime ideal theorem (the generalisation of the prime number theorem to such fields) then tells us (among other things) that the number of prime ideals in $k$ of norm less than $x$ is $x/\log x + O(x/\log^2 x)$. Note that if $p$ is a prime with a simple root $P(n) = 0 \bmod p$ in $\mathbf{Z}/p\mathbf{Z}$, then one can associate a prime ideal in $k$ of norm $p$ defined as $(p, \alpha - n)$. As long as $p$ does not divide the discriminant, one has $\rho(p)$ simple roots; but there are only $O(1)$ primes that divide the discriminant. From this we see that
$$\sum_{p \leq x} \rho(p) \leq \frac{x}{\log x} + O\Big( \frac{x}{\log^2 x} \Big).$$
(One can complement this upper bound with a lower bound, since the ideals whose norms are a power of a (rational) prime rather than a prime have only a negligible contribution to the ideal count, but we will not need the lower bound here.) By summation by parts we conclude
$$\sum_{p \leq x} \frac{\rho(p)}{p} \leq \log\log x + O(1)$$
and (7.27) follows by standard multiplicative number theory methods (e.g. by bounding $\sum_{d \leq x} \frac{\rho(d)}{d^{1+1/\log x}}$ by computing the Euler product, noting that $\rho(p^j) = \rho(p)$ whenever $p$ does not divide the discriminant of $P$, thanks to Hensel's lemma). □

This proposition already deals with the bounded r case. For large r we
need the following variant:

Proposition 7.3.6. For any $2 \leq r \leq (\log\log x)^2$, one has
$$\sum_{d \in S_r} \frac{\rho(d)}{d} \ll r^{-cr} \log x$$
for some absolute constant $c > 0$.
The bound (7.26) then follows as a corollary of this proposition. In fact, one expects the $x^{1/r}$-smoothness in the definition of $S_r$ to induce a gain of about $\frac{1}{r!}$; see [Gr2008] for extensive discussion of this and related topics.
Proof. If $d \in S_r$, then we can write $d = p_1 \dots p_j$ for some primes $p_1, \dots, p_j \leq x^{1/r}$. As noted previously, the primes in this product that are less than $x^{1/(\log\log x)^2}$ each contribute at most $x^{1/(\log\log x)^2}$ to this product, and there are at most $O(\log\log x)$ of these primes, so their total contribution is at most $x^{O(1/\log\log x)}$. Since $d \geq x^{1/2}$, we conclude that the primes that are greater than $x^{1/(\log\log x)^2}$ in the factorisation of $d$ must multiply to at least $x^{1/4}$ (say). By definition of $S_r$, these primes are distinct. By the pigeonhole principle, we can then find $t \geq 1$ such that there are distinct primes $q_1, \dots, q_m$ between $x^{1/(2^{t+1} r)}$ and $x^{1/(2^t r)}$ which appear in the prime factorisation of $d$, where $m := \lfloor \frac{rt}{100} \rfloor$ (say); by definition of $S_r$, all these primes are distinct and can thus be ordered as $q_1 < \dots < q_m$, and we can write $d = q_1 \dots q_m u$ for some $u \leq x$. As the $\rho(q_j)$ are bounded, we have
$$\rho(d) \ll O(1)^m \rho(u) \ll O(1)^{rt} \rho(u)$$
and so we can upper bound $\sum_{d \in S_r} \frac{\rho(d)}{d}$ by
$$\sum_{t \ll (\log\log x)^2} O(1)^{rt} \sum_{x^{1/(2^{t+1} r)} \leq q_1 < \dots < q_m \leq x^{1/(2^t r)}} \frac{1}{q_1 \dots q_m} \sum_{u < x} \frac{\rho(u)}{u}.$$
Using (7.27) and symmetry we can bound this by
$$\ll \sum_{t \ll (\log\log x)^2} O(1)^{rt} \frac{1}{m!} \Big( \sum_{x^{1/(2^{t+1} r)} \leq q \leq x^{1/(2^t r)}} \frac{1}{q} \Big)^m \log x.$$
By the prime number theorem (or Mertens' theorem) we have
$$\sum_{x^{1/(2^{t+1} r)} \leq q \leq x^{1/(2^t r)}} \frac{1}{q} \ll 1.$$
Inserting this bound and summing the series using Stirling's formula, one obtains the claim. □


7.4. The Katai-Bourgain-Sarnak-Ziegler asymptotic orthogonality criterion
One of the basic problems in analytic number theory is to estimate sums of the form
$$\sum_{p < x} f(p)$$
as $x \to \infty$, where $p$ ranges over primes and $f$ is some explicit function of interest (e.g. a linear phase function $f(p) = e^{2\pi i \alpha p}$ for some real number $\alpha$). This is essentially the same task as obtaining estimates on the sum
$$\sum_{n < x} \Lambda(n) f(n)$$
where $\Lambda$ is the von Mangoldt function. If $f$ is bounded, $f(n) = O(1)$, then from the prime number theorem one has the trivial bound
$$\sum_{n < x} \Lambda(n) f(n) = O(x)$$
but often (when $f$ is somehow oscillatory in nature) one is seeking the refinement
$$\sum_{n < x} \Lambda(n) f(n) = o(x) \tag{7.28}$$
or equivalently
$$\sum_{p < x} f(p) = o\Big( \frac{x}{\log x} \Big). \tag{7.29}$$
Thanks to identities such as
$$\Lambda(n) = \sum_{d | n} \mu(d) \log\Big( \frac{n}{d} \Big), \tag{7.30}$$
where $\mu$ is the Möbius function, refinements such as (7.28) are similar in spirit to estimates of the form
$$\sum_{n < x} \mu(n) f(n) = o(x). \tag{7.31}$$
Unfortunately, the connection between (7.28) and (7.31) is not particularly tight; roughly speaking, one needs to improve the bounds in (7.31) (and variants thereof) by about two factors of $\log x$ before one can use identities such as (7.30) to recover (7.28). Still, one generally thinks of (7.28) and (7.31) as being morally equivalent, even if they are not formally equivalent.
When $f$ is oscillating in a sufficiently irrational way, then one standard way to proceed is the method of Type I and Type II sums, which uses truncated versions of divisor identities such as (7.30) to expand out either (7.28) or (7.31) into linear (Type I) or bilinear (Type II) sums with which one can exploit the oscillation of $f$. For instance, Vaughan's identity lets one rewrite the sum in (7.28) as the sum of the Type I sum
$$\sum_{d \leq U} \mu(d) \Big( \sum_{V/d \leq r \leq x/d} (\log r) f(rd) \Big),$$
the Type I sum
$$-\sum_{d \leq UV} a(d) \sum_{V/d \leq r \leq x/d} f(rd),$$
the Type II sum
$$-\sum_{V \leq d \leq x/U} \sum_{U < m \leq x/V} \Lambda(d) b(m) f(dm),$$
and the error term $\sum_{n \leq V} \Lambda(n) f(n)$, whenever $1 \leq U, V \leq x$ are parameters, and $a, b$ are the sequences
$$a(d) := \sum_{e \leq U, g \leq V: eg = d} \mu(e) \Lambda(g)$$
and
$$b(m) := \sum_{d|m: d \leq U} \mu(d).$$
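Similarly one can express (7.31) as the Type I sum
$$-\sum_{d \leq UV} c(d) \sum_{UV/d \leq r \leq x/d} f(rd),$$
the Type II sum
$$-\sum_{V < d \leq x/U} \sum_{U < m \leq x/d} \mu(m) b(d) f(dm)$$
and the error term $\sum_{n \leq UV} \mu(n) f(n)$, whenever $1 \leq U, V \leq x$ with $UV \leq x$, and $c$ is the sequence
$$c(d) := \sum_{e \leq U, g \leq V: eg = d} \mu(e) \mu(g).$$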

After eliminating troublesome sequences such as $a(\cdot), b(\cdot), c(\cdot)$ via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as
$$\sum_{r \leq y} f(rd)$$
or Type II sums such as
$$\sum_{r \leq y} f(rd) \overline{f(rd')}$$
for various $y, d, d' \geq 1$. Here, the trivial bound is $O(y)$, but due to a number of logarithmic inefficiencies in the above method, one has to obtain bounds that are more like $O(\frac{y}{\log^C y})$ for some constant $C$ (e.g. $C = 5$) in order to end up with an asymptotic such as (7.28) or (7.31).
However, in a recent paper [BoSaZi2011] of Bourgain, Sarnak, and Ziegler, it was observed that as long as one is only seeking the Möbius orthogonality (7.31) rather than the von Mangoldt orthogonality (7.28), one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of $f$. A special case of their orthogonality criterion (which had been discovered previously by Kátai [Ka1986]) is as follows:
Proposition 7.4.1 (Orthogonality criterion). Let $f: \mathbb{N} \to \mathbb{C}$ be a bounded function such that
$$\sum_{n \leq x} f(pn) \overline{f(qn)} = o(x) \tag{7.32}$$
for any distinct primes $p, q$ (where the decay rate of the error term $o(x)$ may depend on $p$ and $q$). Then
$$\sum_{n \leq x} \mu(n) f(n) = o(x). \tag{7.33}$$

Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which $\mu$ can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with the above weaker special case. This criterion can be viewed as a multiplicative variant of the classical van der Corput lemma, which in our notation asserts that $\sum_{n \leq x} f(n) = o(x)$ if one has $\sum_{n \leq x} f(n+h) \overline{f(n)} = o(x)$ for each fixed non-zero $h$.
As a sample application, Proposition 7.4.1 easily gives a proof of the asymptotic
$$\sum_{n \leq x} \mu(n) e^{2\pi i \alpha n} = o(x)$$
for any irrational $\alpha$. (For rational $\alpha$, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.) In [BoSaZi2011] this criterion is also applied to nilsequences (obtaining a quick proof of a qualitative version of a result in [GrTa2012]) and to horocycle flows (for which no Möbius orthogonality result was previously known).
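To make the sample application concrete, here is a small numerical sketch (an illustration only, under assumed parameter choices) for $f(n) = e^{2\pi i \alpha n}$ with $\alpha = \sqrt{2}$. It evaluates the normalised hypothesis sum (7.32) for the primes $p = 2$, $q = 3$, which is a geometric series and hence tiny, and the normalised conclusion sum (7.33). The cutoff `X`, the choice of `ALPHA`, and all function names are hypothetical; a finite computation of course proves nothing about the $o(x)$ asymptotics.

```python
import cmath
import math

def mobius(n):
    """Möbius function mu(n) by trial division (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

ALPHA = math.sqrt(2)   # an irrational frequency (illustrative choice)
X = 20000              # cutoff (illustrative choice)

def f(n):
    """Linear phase f(n) = exp(2 pi i alpha n)."""
    return cmath.exp(2j * math.pi * ALPHA * n)

def hypothesis_ratio(p, q, x):
    """|sum_{n<=x} f(pn) conj(f(qn))| / x, the quantity in hypothesis (7.32)."""
    total = sum(f(p * n) * f(q * n).conjugate() for n in range(1, x + 1))
    return abs(total) / x

def mobius_ratio(x):
    """|sum_{n<=x} mu(n) f(n)| / x, the quantity in conclusion (7.33)."""
    total = sum(mobius(n) * f(n) for n in range(1, x + 1))
    return abs(total) / x

hyp = hypothesis_ratio(2, 3, X)
conc = mobius_ratio(X)
print("normalised hypothesis sum:", hyp)
print("normalised Mobius sum:    ", conc)
```

Both ratios should come out small compared with the trivial bound of 1, in line with the proposition.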
Informally, the connection between (7.32) and (7.33) comes from the multiplicative nature of the Möbius function. If (7.33) failed, then $\mu(n)$ exhibits strong correlation with $f(n)$; by change of variables, we then expect $\mu(pn)$ to correlate with $f(pn)$ and $\mu(qn)$ to correlate with $f(qn)$, for typical $p, q$ at least. On the other hand, since $\mu$ is multiplicative, $\mu(pn)$ exhibits strong correlation with $\mu(qn)$. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of course, correlation is not quite transitive, but it turns out that one can use the Cauchy-Schwarz inequality as a substitute for transitivity of correlation in this case.
We will give a proof of Proposition 7.4.1 shortly. The main idea is to exploit the following observation: if $P$ is a large but finite set of primes (in the sense that the sum $A := \sum_{p \in P} \frac{1}{p}$ is large), then for a typical large number $n$ (much larger than the elements of $P$), the number of primes in $P$ that divide $n$ is pretty close to $A = \sum_{p \in P} \frac{1}{p}$:
$$\sum_{p \in P: p|n} 1 \approx A. \tag{7.34}$$
A more precise formalisation of this heuristic is provided by the Turán-Kubilius inequality, which is proven by a simple application of the second moment method.
In particular, one can sum (7.34) against $\mu(n) f(n)$ and obtain an approximation
$$\sum_{n \leq x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n)$$
that approximates a sum of $\mu(n) f(n)$ by a bunch of sparser sums of $\mu(n) f(n)$. Since
$$x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},$$
we see (heuristically, at least) that in order to establish (7.31), it would suffice to establish the sparser estimates
$$\sum_{n \leq x: p|n} \mu(n) f(n) = o\Big(\frac{x}{p}\Big)$$
for all $p \in P$ (or at least for most $p \in P$).


Now we make the change of variables $n = pm$. As the Möbius function is multiplicative, we usually have $\mu(n) = \mu(p)\mu(m) = -\mu(m)$. (There is an exception when $n$ is divisible by $p^2$, but this will be a rare event and we will be able to ignore it.) So it should suffice to show that
$$\sum_{m \leq x/p} \mu(m) f(pm) = o(x/p)$$
for most $p \in P$. However, by the hypothesis (7.32), the sequences $m \mapsto f(pm)$ are asymptotically orthogonal as $p$ varies, and this claim will then follow from a Cauchy-Schwarz argument.

7.4.1. Rigorous proof. We will need a slowly growing function $H = H(x)$ of $x$, with $H(x) \to \infty$ as $x \to \infty$, to be chosen later. As the sum of reciprocals of primes diverges, we see that
$$\sum_{p < H} \frac{1}{p} \to \infty$$
as $x \to \infty$. It will also be convenient to eliminate small primes. Note that we may find an even slower growing function $W = W(x)$ of $x$, with $W(x) \to \infty$ as $x \to \infty$, such that
$$\sum_{W \leq p < H} \frac{1}{p} \to \infty.$$
Although it is not terribly important, we will take $W$ and $H$ to be powers of two. Thus, if we set $P$ to be all the primes between $W$ and $H$, the quantity
$$A := \sum_{p \in P} \frac{1}{p}$$
goes to infinity as $x \to \infty$.
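Lemma 7.4.2 (Turán-Kubilius inequality). One has
$$\sum_{n \leq x} \Big| \sum_{p \in P: p|n} 1 - A \Big|^2 \ll Ax. \tag{7.35}$$
Proof. We have
$$\sum_{n \leq x} \sum_{p \in P: p|n} 1 = \sum_{p \in P} \sum_{n \leq x: p|n} 1.$$
On the other hand, we have
$$\sum_{n \leq x: p|n} 1 = \frac{x}{p} + O(1)$$
and thus (if $H$ is sufficiently slowly growing)
$$\sum_{n \leq x} \sum_{p \in P: p|n} 1 = xA + O(x).$$
Similarly, we have
$$\sum_{n \leq x} \Big( \sum_{p \in P: p|n} 1 \Big)^2 = \sum_{p,q \in P} \sum_{n \leq x: p|n, q|n} 1.$$
The expression $\sum_{n \leq x: p|n, q|n} 1$ is equal to $\frac{x}{p} + O(1)$ when $q = p$, and $\frac{x}{pq} + O(1)$ when $p \neq q$. A brief calculation then shows that
$$\sum_{n \leq x} \Big( \sum_{p \in P: p|n} 1 \Big)^2 = xA^2 + O(Ax)$$
if $H$ is sufficiently slowly growing. Inserting these bounds into (7.35), the claim follows.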
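As an informal illustration of Lemma 7.4.2, one can compute the second moment on the left-hand side of (7.35) directly for a small cutoff and compare it with $Ax$. The values of `X`, `W` and `H` below are arbitrary illustrative choices (so $A$ is not actually large here), and the helper name `primes_in` is hypothetical.

```python
import math

def primes_in(lo, hi):
    """Primes p with lo <= p < hi, by trial division."""
    return [p for p in range(max(lo, 2), hi)
            if all(p % d for d in range(2, math.isqrt(p) + 1))]

X = 100000          # range of n (illustrative)
W, H = 4, 64        # powers of two, as in the text (illustrative sizes)
P = primes_in(W, H)
A = sum(1.0 / p for p in P)

# omega_P[n] = number of primes p in P dividing n, filled in by a sieve
omega_P = [0] * (X + 1)
for p in P:
    for n in range(p, X + 1, p):
        omega_P[n] += 1

# Left-hand side of (7.35), and its ratio to A * x
second_moment = sum((omega_P[n] - A) ** 2 for n in range(1, X + 1))
ratio = second_moment / (A * X)
print("A =", A, " second moment / (A x) =", ratio)
```

The ratio staying bounded (here it should be of size roughly 1) is what (7.35) asserts, with an implied constant independent of $x$.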


From (7.35) and the Cauchy-Schwarz inequality, one has
$$\sum_{n \leq x} \Big( \sum_{p \in P: p|n} 1 - A \Big) \mu(n) f(n) = O(A^{1/2} x)$$
which we rearrange as
$$\sum_{n \leq x} \mu(n) f(n) = \frac{1}{A} \sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n) + O(A^{-1/2} x).$$
Since $A$ goes to infinity, the $O(A^{-1/2} x)$ term is $o(x)$, so it now suffices to show that
$$\sum_{p \in P} \sum_{n \leq x: p|n} \mu(n) f(n) = o(Ax).$$

Write $n = pm$. Then we have $\mu(n) f(n) = -\mu(m) f(pm)$ for all but $O(x/p^2)$ values of $n$ (if $H$ is sufficiently slowly growing). The exceptional values contribute at most
$$\sum_{p \in P} \frac{x}{p^2} \leq \sum_{p \in P} \frac{x}{Wp} = O(Ax/W) = o(Ax)$$
which is acceptable. Thus it suffices to show that
$$\sum_{p \in P} \sum_{m \leq x/p} \mu(m) f(pm) = o(Ax).$$

Partitioning into dyadic blocks, it suffices to show that
$$\sum_{p \in P_k} \sum_{m \leq x/p} \mu(m) f(pm) = o(|P_k| x / 2^k)$$
uniformly for $W \leq 2^k < H$, where $P_k$ are the primes between $2^k$ and $2^{k+1}$. Fix $k$. The left-hand side can be rewritten as
$$\sum_{m \leq x/2^k} \mu(m) \sum_{p \in P_k} f(pm) 1_{m \leq x/p}$$
so by the Cauchy-Schwarz inequality it suffices to show that
$$\sum_{m \leq x/2^k} \Big| \sum_{p \in P_k} f(pm) 1_{m \leq x/p} \Big|^2 = o(|P_k|^2 x / 2^k).$$
We can rearrange the left-hand side as
$$\sum_{p,q \in P_k} \sum_{m \leq \min(x/p, x/q)} f(pm) \overline{f(qm)}.$$

Now if $H$ is sufficiently slowly growing as a function of $x$, we see from (7.32) that for all distinct $p, q \leq H$, we have
$$\sum_{m \leq \min(x/p, x/q)} f(pm) \overline{f(qm)} = o(x/2^k)$$
uniformly in $p, q$; meanwhile, for $p = q$, we have the crude bound
$$\sum_{m \leq \min(x/p, x/q)} f(pm) \overline{f(qm)} = O(x/2^k).$$
The claim follows (noting from the prime number theorem that $|P_k| = o(|P_k|^2)$).
7.4.2. From Möbius to von Mangoldt? It would be great if one could pass from the Möbius asymptotic orthogonality (7.31) to the von Mangoldt asymptotic orthogonality (7.28) (or equivalently, to (7.29)), as this would give some new information about the distribution of primes. Unfortunately, it seems that some additional input is needed to do so. Here is a simple example of a conditional implication that requires an additional input, namely some quantitative control on Type I sums:
Proposition 7.4.3. Let $f: \mathbb{N} \to \mathbb{C}$ be a bounded function such that
$$\sum_{n \leq x} \mu(n) f(dn) = o(x) \tag{7.36}$$
for each fixed $d \geq 1$ (with the decay rate allowed to depend on $d$). Suppose also that one has the Type I bound
$$\sum_{1 \leq m \leq M} \sup_{y \leq x} \Big| \sum_{n \leq y} f(mn) \Big| \ll \frac{Mx}{\log^{2+\epsilon} x} \tag{7.37}$$
for all $M, x \geq 2$ and some absolute constant $\epsilon > 0$, where the implied constant is independent of both $M$ and $x$. Then one has
$$\sum_{n \leq x} \Lambda(n) f(n) = o(x) \tag{7.38}$$
and thus (by discarding the prime powers and summing by parts)
$$\sum_{p \leq x} f(p) = o\Big(\frac{x}{\log x}\Big).$$
Proof. We use the Dirichlet hyperbola method. Using (7.30), one can write the left-hand side of (7.38) as
$$\sum_{dm \leq x} \mu(m) (\log d) f(dm).$$
We let $D = D(x)$ be a slowly growing function of $x$ to be chosen later, and split this sum as
$$\sum_{d \leq D} \log d \sum_{m \leq x/d} \mu(m) f(dm) + \sum_{m < x/D} \mu(m) \sum_{D < d \leq x/m} (\log d) f(dm). \tag{7.39}$$

If $D$ is sufficiently slowly growing, then by (7.36) one has $\sum_{m \leq x/d} \mu(m) f(dm) = o(x)$ uniformly for all $d \leq D$. If $D$ is sufficiently slowly growing, this implies that the first term in (7.39) is also $o(x)$. As for the second term, we dyadically decompose it and bound it in absolute value by
$$\sum_{2^k < x/D} \sum_{2^k \leq m < 2^{k+1}} \Big| \sum_{D < d \leq x/m} (\log d) f(dm) \Big|. \tag{7.40}$$
By summation by parts, we can bound
$$\Big| \sum_{D < d \leq x/m} (\log d) f(dm) \Big| \ll \log(x/2^k) \sup_{y \leq x/2^k} \Big| \sum_{n \leq y} f(mn) \Big|$$
and so by (7.37), we can bound (7.40) by
$$\ll \sum_{2^k < x/D} \frac{x}{\log^{1+\epsilon}(x/2^k)}.$$
This sum evaluates to $O(x/\log^{\epsilon} D)$, and the claim follows since $D$ goes to infinity.
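The hyperbola splitting (7.39) is an exact rearrangement, so it can be checked numerically for any bounded test function. The sketch below does this for the illustrative choice $f(n) = e^{2\pi i \sqrt{2} n}$ with small hypothetical parameters `X` and `D`; it only confirms the bookkeeping of the split and says nothing about the size of either term.

```python
import cmath
import math

def mobius(n):
    """Möbius function mu(n) by trial division (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def von_mangoldt(n):
    """Lambda(n) = log p if n is a power of a prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    for p in range(2, math.isqrt(n) + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return math.log(n)

def f(n):
    """A bounded test function (illustrative choice, not from the text)."""
    return cmath.exp(2j * math.pi * math.sqrt(2) * n)

X, D = 2000, 10   # illustrative sizes
mu = [0] + [mobius(m) for m in range(1, X + 1)]

# Left-hand side of (7.38)
lhs = sum(von_mangoldt(n) * f(n) for n in range(1, X + 1))

# First term of (7.39): d <= D, m <= x/d
first = sum(math.log(d) * sum(mu[m] * f(d * m) for m in range(1, X // d + 1))
            for d in range(1, D + 1))

# Second term of (7.39): m < x/D, D < d <= x/m
second = sum(mu[m] * sum(math.log(d) * f(d * m) for d in range(D + 1, X // m + 1))
             for m in range(1, X + 1) if m * D < X)

print("discrepancy:", abs(lhs - (first + second)))
```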

Note that the trivial bound on (7.37) is M x, so one needs to gain about
two logarithmic factors over the trivial bound in order to use the above
proposition. The presence of the supremum is annoying, but it can be
removed by a modification of the argument if one improves the bound by an
additional logarithm by a variety of methods (e.g. completion of sums), or
by smoothing out the constraint $n \leq x$. However, I do not know of a way to
remove the need to improve the trivial bound by two logarithmic factors.

Chapter 8

Geometry

8.1. A geometric proof of the impossibility of angle trisection by straightedge and compass
One of the best-known problems from ancient Greek mathematics was that of trisecting an angle by straightedge and compass, which was eventually proven impossible in [Wa1836], using methods from Galois theory.
Formally, one can set up the problem as follows. Define a configuration to be a finite collection $\mathcal{C}$ of points, lines, and circles in the Euclidean plane. Define a construction step to be one of the following operations to enlarge the collection $\mathcal{C}$:
• (Straightedge) Given two distinct points $A, B$ in $\mathcal{C}$, form the line $AB$ that connects $A$ and $B$, and add it to $\mathcal{C}$.
• (Compass) Given two distinct points $A, B$ in $\mathcal{C}$, and given a third point $O$ in $\mathcal{C}$ (which may or may not equal $A$ or $B$), form the circle with centre $O$ and radius equal to the length $|AB|$ of the line segment joining $A$ and $B$, and add it to $\mathcal{C}$.
• (Intersection) Given two distinct curves $\gamma, \gamma'$ in $\mathcal{C}$ (thus $\gamma$ is either a line or a circle in $\mathcal{C}$, and similarly for $\gamma'$), select a point $P$ that is common to both $\gamma$ and $\gamma'$ (there are at most two such points), and add it to $\mathcal{C}$.
We say that a point, line, or circle is constructible by straightedge and compass from a configuration $\mathcal{C}$ if it can be obtained from $\mathcal{C}$ after applying a finite number of construction steps.
Problem 8.1.1 (Angle trisection). Let $A, B, C$ be distinct points in the plane. Is it always possible to construct by straightedge and compass from $A, B, C$ a line $\ell$ through $A$ that trisects the angle $\angle BAC$, in the sense that the angle between $\ell$ and $BA$ is one third of the angle of $\angle BAC$?
Thanks to Wantzel's result [Wa1836], the answer to this problem is known to be "no" in general; a generic angle $\angle BAC$ cannot be trisected by straightedge and compass. (On the other hand, some special angles can certainly be trisected by straightedge and compass, such as a right angle. Also, one can certainly trisect generic angles if other methods than straightedge and compass are permitted.)
The impossibility of angle trisection stands in sharp contrast to the easy construction of angle bisection via straightedge and compass, which we briefly review as follows:
(1) Start with three points $A, B, C$.
(2) Form the circle $c_0$ with centre $A$ and radius $AB$, and intersect it with the line $AC$. Let $D$ be the point in this intersection that lies on the same side of $A$ as $C$. ($D$ may well be equal to $C$.)
(3) Form the circle $c_1$ with centre $B$ and radius $AB$, and the circle $c_2$ with centre $D$ and radius $AB$. Let $E$ be the point of intersection of $c_1$ and $c_2$ that is not $A$.
(4) The line $\ell := AE$ will then bisect the angle $\angle BAC$.
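The four steps above can be traced in coordinates. The following is a minimal sketch, assuming a non-degenerate input (in particular $D \neq B$, so that $c_1$ and $c_2$ are distinct circles); the function names `circle_intersections`, `bisect` and `angle_at` are hypothetical, and the sample points are arbitrary.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Intersection points in R^2 of two circles with distinct centres c1, c2."""
    (x1, y1), (x2, y2) = c1, c2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2          # squared distance of centres
    t = 0.5 - (r2 ** 2 - r1 ** 2) / (2 * d2)      # position along the centre line
    h2 = (r1 ** 2 - t ** 2 * d2) / d2             # squared perpendicular offset / d2
    if h2 < 0:
        return []                                  # no real intersection
    s = math.sqrt(h2)
    bx, by = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    return [(bx + s * (y1 - y2), by + s * (x2 - x1)),
            (bx - s * (y1 - y2), by - s * (x2 - x1))]

def bisect(A, B, C):
    """Return the point E of steps (2)-(3); the bisector is the line AE."""
    r = math.dist(A, B)
    # Step (2): D is the point of c0 on the ray from A through C.
    scale = r / math.dist(A, C)
    D = (A[0] + scale * (C[0] - A[0]), A[1] + scale * (C[1] - A[1]))
    # Step (3): c1 and c2 meet at A and at E; pick the point that is not A.
    candidates = circle_intersections(B, r, D, r)
    return max(candidates, key=lambda Q: math.dist(Q, A))

def angle_at(A, P, Q):
    """Unsigned angle PAQ in radians."""
    u = (P[0] - A[0], P[1] - A[1])
    v = (Q[0] - A[0], Q[1] - A[1])
    cos = (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))

A, B, C = (0.0, 0.0), (1.0, 0.0), (0.3, 0.8)   # arbitrary sample triple
E = bisect(A, B, C)
full = angle_at(A, B, C)
half = angle_at(A, B, E)
print("angle BAC:", full, " angle BAE:", half)
```

For this sample the angle $\angle BAE$ should be half of $\angle BAC$ up to rounding, since $ABED$ is a rhombus.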
See Figure 1. The key difference between angle trisection and angle
bisection ultimately boils down to the following trivial number-theoretic
fact:
Lemma 8.1.2. There is no power of 2 that is evenly divisible by 3.
Proof. Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic.

In contrast, there are of course plenty of powers of 2 that are evenly
divisible by 2, and this is ultimately why angle bisection is easy while angle
trisection is hard.
The standard way in which Lemma 8.1.2 is used to demonstrate the
impossibility of angle trisection is via Galois theory. The implication is
quite short if one knows this theory, but quite opaque otherwise. We briefly
sketch the proof of this implication here, though we will not need it in the
rest of the discussion. Firstly, Lemma 8.1.2 implies the following fact about
field extensions.
Corollary 8.1.3. Let F be a field, and let E be an extension of F that can
be constructed out of F by a finite sequence of quadratic extensions. Then
E does not contain any cubic extensions K of F .

Figure 1. Angle bisection.

Proof. If $E$ contained a cubic extension $K$ of $F$, then the dimension of $E$ over $F$ would be a multiple of three. On the other hand, if $E$ is obtained from $F$ by a tower of quadratic extensions, then the dimension of $E$ over $F$ is a power of two. The claim then follows from Lemma 8.1.2.

To conclude the proof, one then notes that any point, line, or circle that can be constructed from a configuration $\mathcal{C}$ is definable in a field obtained from the coefficients of all the objects in $\mathcal{C}$ after taking a finite number of quadratic extensions, whereas a trisection of an angle $\angle BAC$ will generically only be definable in a cubic extension of the field generated by the coordinates of $A, B, C$.
The Galois theory method also allows one to obtain many other impossibility results of this type, most famously the Abel-Ruffini theorem on the insolvability of the quintic equation by radicals. For this reason (and also because of the many applications of Galois theory to number theory and other branches of mathematics), the Galois theory argument is the "right" way to prove the impossibility of angle trisection within the broader framework of modern mathematics. However, this argument has the drawback that it requires one to first understand Galois theory (or at least field theory), which is usually not presented until an advanced undergraduate algebra or number theory course, whilst the angle trisection problem requires only high-school level mathematics to formulate. Even if one is allowed to "cheat" and sweep several technicalities under the rug, one still needs to possess a fair amount of solid intuition about advanced algebra in order to appreciate the proof. (This was undoubtedly one reason why, even after Wantzel's impossibility result was published, a large amount of effort was still expended by amateur mathematicians to try to trisect a general angle.)
In this section, I would therefore like to present a different proof (or perhaps more accurately, a disguised version of the standard proof) of the impossibility of angle trisection by straightedge and compass, that avoids explicit mention of Galois theory (though it is never far beneath the surface). With "cheats", the proof is actually quite simple and geometric (except for Lemma 8.1.2, which is still used at a crucial juncture), based on the basic geometric concept of monodromy; unfortunately, some technical work is needed to remove these cheats.
To describe the intuitive idea of the proof, let us return to the angle bisection construction, that takes a triple $A, B, C$ of points as input and returns a bisecting line $\ell$ as output. We iterate the construction to create a quadrisecting line $m$, via the following sequence of steps that extend the original bisection construction:
(1) Start with three points $A, B, C$.
(2) Form the circle $c_0$ with centre $A$ and radius $AB$, and intersect it with the line $AC$. Let $D$ be the point in this intersection that lies on the same side of $A$ as $C$. ($D$ may well be equal to $C$.)
(3) Form the circle $c_1$ with centre $B$ and radius $AB$, and the circle $c_2$ with centre $D$ and radius $AB$. Let $E$ be the point of intersection of $c_1$ and $c_2$ that is not $A$.
(4) Let $F$ be the point on the line $\ell := AE$ which lies on $c_0$, and is on the same side of $A$ as $E$.
(5) Form the circle $c_3$ with centre $F$ and radius $AB$. Let $G$ be the point of intersection of $c_1$ and $c_3$ that is not $A$.
(6) The line $m := AG$ will then quadrisect the angle $\angle BAC$.
See Figure 2. Let us fix the points $A$ and $B$, but not $C$, and view $m$ (as well as intermediate objects such as $D$, $c_2$, $E$, $\ell$, $F$, $c_3$, $G$) as a function of $C$.
Let us now do the following: we begin rotating $C$ counterclockwise around $A$, which drags around the other objects $D$, $c_2$, $E$, $\ell$, $F$, $c_3$, $G$ that were constructed by $C$ accordingly. For instance, here is an early stage of this rotation process, when the angle $\angle BAC$ has become obtuse; see Figure 3.

Figure 2. Angle quadrisection.

Now for the slightly tricky bit. We are going to keep rotating $C$ beyond a half-rotation of $180^\circ$, so that $\angle BAC$ now becomes a reflex angle. At this point, a singularity occurs; the point $E$ collides into $A$, and so there is an instant in which the line $\ell = AE$ is not well-defined. However, this turns out to be a removable singularity (and the easiest way to demonstrate this will be to tap the power of complex analysis, as complex numbers can easily route around such a singularity), and we can blast through it to the other side; see Figure 4.
Note that we have now deviated from the original construction in that $F$ and $E$ are no longer on the same side of $A$; we are thus now working in a continuation of that construction rather than with the construction itself. Nevertheless, we can still work with this continuation (much as, say, one works with analytic continuations of infinite series such as $\sum_{n=1}^\infty \frac{1}{n^s}$ beyond their original domain of definition).
We now keep rotating $C$ around $A$. In Figure 5, $\angle BAC$ is approaching a full rotation of $360^\circ$.

Figure 3. $\angle BAC$ becomes obtuse.

When $\angle BAC$ reaches a full rotation, a different singularity occurs: $c_1$ and $c_2$ coincide. Nevertheless, this is also a removable singularity, and we blast through to beyond a full rotation; see Figure 6.
And now $C$ is back where it started, as are $D$, $c_2$, $E$, and $\ell$... but the point $F$ has moved, from one intersection point of $\ell \cap c_0$ to the other. As a consequence, $c_3$, $G$, and $m$ have also changed, with $m$ being at right angles to where it was before. (In the jargon of modern mathematics, the quadrisection construction has a non-trivial monodromy.)
But nothing stops us from rotating $C$ some more. If we continue this procedure, we see that after two full rotations of $C$ around $A$, all points, lines, and circles constructed from $A, B, C$ have returned to their original positions. Because of this, we shall say that the quadrisection construction described above is periodic with period 2.
Similarly, if one performs an octisection of the angle $\angle BAC$ by bisecting the quadrisection, one can verify that this octisection is periodic with period 4; it takes four full rotations of $C$ around $A$ before the configuration returns to where it started. More generally, one can show:
Proposition 8.1.4. Any construction of straightedge and compass from the points $A, B, C$ is periodic with period equal to a power of 2.

Figure 4. Beyond the removable singularity.

The reason for this, ultimately, is because any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most $2! = 2$. Each rotation of $C$ around $A$ can potentially flip one of these points to the other, but then if one rotates again, the point returns to its original position, and then one can analyse the next point in the construction in the same fashion until one obtains the proposition.
But now consider a putative trisection operation, that starts with an arbitrary angle $\angle BAC$ and somehow uses some sequence of straightedge and compass constructions to end up with a trisecting line $\ell$: see Figure 7.
What is the period of this construction? If we continuously rotate $C$ around $A$, we observe that a full rotation of $C$ only causes the trisecting line $\ell$ to rotate by a third of a full rotation (i.e. by $120^\circ$): see Figure 8.
Because of this, we see that the period of any construction that contains $\ell$ must be a multiple of 3. But this contradicts Proposition 8.1.4 and Lemma 8.1.2.
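The rotation counts in this heuristic can be tabulated with a few lines of code. The sketch below rests on a simplifying assumption that is not made in the text in this form: the $k$-secting line is modelled as the unoriented line through $A$ at angle $\theta/k$, where $\theta$ is the angle $\angle BAC$, and only this final line is tracked (not the intermediate points, which is why the bisecting line alone already returns after one rotation). The function name `line_period` is hypothetical.

```python
def line_period(k):
    """Number of full rotations of C before the line at angle theta/k returns.

    One full rotation of C turns the line by 2*pi/k, and an unoriented line
    is unchanged by a turn of pi, so n rotations restore it exactly when
    2*n/k is an integer.
    """
    n = 1
    while (2 * n) % k != 0:
        n += 1
    return n

def is_power_of_two(n):
    """True if the positive integer n is a power of 2."""
    return n > 0 and n & (n - 1) == 0

for k in (2, 4, 8, 16, 3):
    period = line_period(k)
    print(f"{k}-section: period {period}, power of two: {is_power_of_two(period)}")

# Lemma 8.1.2 in numerical form: no power of 2 in this range is divisible by 3.
no_power_divisible = all((2 ** j) % 3 != 0 for j in range(64))
print("no power of 2 up to 2^63 is divisible by 3:", no_power_divisible)
```

The quadrisection and octisection rows reproduce the periods 2 and 4 found above, while the trisection row gives 3, which is not a power of two.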
We will now make the above proof rigorous. Unfortunately, in doing so, one has to leave the world of high-school mathematics, as one needs a little bit of algebraic geometry and complex analysis to resolve the issues with singularities that we saw in the above sketch. Still, I feel that at an intuitive level at least, this argument is more geometric and accessible than the Galois-theoretic argument (though anyone familiar with Galois theory will note that there is really not that much difference between the proofs, ultimately, as one has simply replaced the Galois group with a closely related monodromy group instead).

Figure 5. $\angle BAC$ is almost full.

8.1.1. Details. We will assume for sake of contradiction that for every triple $A, B, C$ of distinct points, we can find a construction by straightedge and compass that trisects the angle $\angle BAC$, and eventually deduce a contradiction out of this.
We remark that we do not initially assume any uniformity in this construction; for instance, it could be possible that the trisection procedure for obtuse angles is completely different from that of acute angles, using a totally different set of constructions, while some exceptional angles (e.g. right angles or degenerate angles) might use yet another construction. We will address these issues later.
The first step is to get rid of some possible degeneracies in one's construction. At present, nothing in our definition of a construction prevents us from adding a point, line, or circle to the construction that was already present in the existing collection $\mathcal{C}$ of points, lines, and circles. However, it is clear that any such step in the construction is redundant, and can be omitted. Thus, we may assume without loss of generality that for each $A, B, C$, the construction used to trisect the angle contains no such redundant steps. (This may make the construction even less uniform than it was previously, but we will address this issue later.)

Figure 6. $\angle BAC$ is beyond full.

Another form of degeneracy that we will need to eliminate for technical reasons is that of tangency. At present, we allow in our construction the ability to take two tangent circles, or a circle and a tangent line, and add the tangent point to the collection (if it was not already present in the construction). This would ordinarily be a harmless thing to do, but it complicates our strategy of perturbing the configuration, so we now act to eliminate it. Suppose first that one had two circles $c_1, c_2$ already constructed in the configuration $\mathcal{C}$ and tangent to each other, and one wanted to add the tangent point $T$ to the configuration. But note that in order to have added $c_1$ and $c_2$ to $\mathcal{C}$, one must previously have added the centres $A_1$ and $A_2$ of these circles to $\mathcal{C}$ also. One can then add $T$ to $\mathcal{C}$ by intersecting the line $A_1 A_2$ with $c_1$ and picking the point that lies on $c_2$; this way, one does not need to intersect two tangent curves together: see Figure 9.

Figure 7. A putative trisection operation.

Similarly, suppose that we already had a circle $c$ and a tangent line $\ell$ already constructed in the configuration, but with the tangent point $T$ absent. The centre $A$ of $c$, and at least two points $B, C$ on $\ell$, must previously have also been constructed in order to have $c$ and $\ell$ present; note that $B, C$ are not equal to $T$ by hypothesis. One can then obtain $T$ by dropping a perpendicular from $A$ to $\ell$ by the usual construction (i.e. drawing a circle centred at $A$ with radius $|AB|$ to hit $\ell$ again at $D$, then drawing circles from $B$ and $D$ with the same radius $|AB|$ to meet at a point $E$ distinct from $A$, then intersecting $AE$ with $\ell$ to obtain $T$), thus$^1$ avoiding tangencies again; see Figure 10.
As a consequence of these reductions, we may now assume that our construction is nondegenerate in the sense that
• Any point, line, or circle added at a step in the construction, does not previously appear in that construction.
• Whenever one intersects two circles in a construction together to add another point to the construction, the circles are non-tangent (and thus meet in exactly two points).
• Whenever one intersects a circle and a line in a construction together to add another point to the construction, the circle and line are non-tangent (and thus meet in exactly two points).
$^1$ This construction may happen to use lines or circles that had already appeared in the construction, but in those cases one can simply skip those steps.

Figure 8. Rotating the trisection.

The reason why we restrict attention to nondegenerate constructions is that they are stable with respect to perturbations. Note for instance that if one has two circles $c_1, c_2$ that intersect in two different points, and one of them is labeled $P$, then we may perturb $c_1$ and $c_2$ by a small amount, and still have an intersection point close to $P$ (with the other intersection point far away from $P$). Thus, $P$ is locally a continuous function of $c_1$ and $c_2$. Similarly if one forms the intersection of a circle and a secant (a line which intersects non-tangentially). In a similar vein, given two points $A$ and $B$ that are distinct, the line between them $AB$ varies continuously with $A$ and $B$ as long as one does not move $A$ and $B$ so far that they collide; and given two lines $\ell_1$ and $\ell_2$ that intersect at a point $C$ (and in particular are non-parallel), then $C$ also depends continuously on $\ell_1$ and $\ell_2$. Thus, in a nondegenerate construction starting from the original three points $A, B, C$, every point, line, or circle created by the construction can be viewed as a continuous function of $A, B, C$, as long as one only works in a sufficiently small neighbourhood of the original configuration $(A, B, C)$. In particular, the final line $\ell$ varies continuously in this fashion. Note however that the trisection property may be lost by this perturbation; just because $\ell$ happens to trisect $\angle BAC$ when $A, B, C$ are in the original positions, this does not necessarily imply that after one perturbs $A, B, C$, the resulting perturbed line $\ell$ still trisects the angle. (For instance, there are a number of ways to trisect a right angle (e.g. by bisecting an angle of an equilateral triangle), but if one perturbs the angle to be slightly acute or slightly obtuse, the line created by this procedure would not be expected to continue to trisect that angle.)

Figure 9. Eliminating tangency.

The next step is to allow analytic geometry (and thence algebraic geometry) to enter the picture, by using Cartesian coordinates. We may identify the Euclidean plane with the analytic plane $\mathbb{R}^2 := \{(x, y) : x, y \in \mathbb{R}\}$; we may also normalise $A, B$ to be the points $A = (0, 0)$, $B = (1, 0)$ by this identification. We will also restrict $C$ to lie on the unit circle $S^1 := \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$, so that there is now just one degree of freedom in the configuration $(A, B, C)$. One can describe a line in $\mathbb{R}^2$ by an equation of the form
$$\{(x, y) \in \mathbb{R}^2 : ax + by + c = 0\}$$
(with $a, b$ not both zero), and describe a circle in $\mathbb{R}^2$ by an equation of the form
$$\{(x, y) \in \mathbb{R}^2 : (x - x_0)^2 + (y - y_0)^2 = r^2\}$$
with $r$ non-zero. There is some non-uniqueness in these representations: for the line, one can multiply $a, b, c$ by the same constant without altering the line, and for the circle, one can replace $r$ by $-r$. However, this will not be a serious concern for us. Note that any two distinct points $P = (x_1, y_1)$, $Q = (x_2, y_2)$ determine a line
$$\{(x, y) \in \mathbb{R}^2 : x y_1 - x y_2 - y x_1 + y x_2 + x_1 y_2 - x_2 y_1 = 0\}$$
and given three points $O = (x_0, y_0)$, $A = (x_1, y_1)$, $B = (x_2, y_2)$, one can form a circle
$$\{(x, y) \in \mathbb{R}^2 : (x - x_0)^2 + (y - y_0)^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2\}$$
with centre $O$ and radius $|AB|$. Given two distinct non-parallel lines
$$\ell = \{(x, y) \in \mathbb{R}^2 : ax + by + c = 0\}$$
and
$$\ell' = \{(x, y) \in \mathbb{R}^2 : a'x + b'y + c' = 0\},$$
their unique intersection point is given as
$$\Big( \frac{bc' - b'c}{ab' - ba'}, \frac{a'c - c'a}{ab' - ba'} \Big);$$

Figure 10. Another tangency elimination.


similarly, given two circles
$$c_1 = \{(x, y) \in \mathbb{R}^2 : (x - x_1)^2 + (y - y_1)^2 = r_1^2\}$$
and
$$c_2 = \{(x, y) \in \mathbb{R}^2 : (x - x_2)^2 + (y - y_2)^2 = r_2^2\},$$
their points of intersection (if they exist in $\mathbb{R}^2$) are given as
$$(x_1, y_1) + t(x_2 - x_1, y_2 - y_1) \pm \Big( \frac{r_1^2 - t^2 d^2}{d^2} \Big)^{1/2} (y_1 - y_2, x_2 - x_1) \tag{8.1}$$
where
$$t := \frac{1}{2} - \frac{r_2^2 - r_1^2}{2d^2}$$
and
$$d^2 := (x_1 - x_2)^2 + (y_1 - y_2)^2,$$
and the points of intersection between $\ell$ and $c_1$ (if they exist in $\mathbb{R}^2$) are given as
$$(x_1, y_1) - \frac{ax_1 + by_1 + c}{a^2 + b^2} (a, b) \pm \sqrt{\frac{r_1^2}{a^2 + b^2} - \Big( \frac{ax_1 + by_1 + c}{a^2 + b^2} \Big)^2}\, (b, -a). \tag{8.2}$$
The precise expressions given above are not particularly important for our argument, save to note that these expressions are always algebraic functions of the input coordinates such as $x_0, x_1, x_2, y_0, y_1, y_2, a, b, c, a', b', c', r_1, r_2$, defined over the reals $\mathbb{R}$, and that the only algebraic operation needed here besides the arithmetic operations of addition, subtraction, multiplication, and division is the square root operation. Thus, we see that any particular construction of, say, a line $\ell$ from a configuration $(A, B, C)$ will locally be an algebraic function of $C$ (recall that we have already fixed $A, B$), and this definition can be extended until one reaches a degeneracy (two points, lines, or circles collide, two curves become tangent, or two lines become parallel); however, this degeneracy only occurs in a proper real algebraic set of configurations, and in particular for $C$ in a dimension zero subset of the circle $S^1$.
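The three intersection formulae can be checked mechanically. The sketch below codes the line-line formula, the circle-circle formula (8.1) with $r_1^2$ under the square root, and the line-circle formula in a form normalised by $a^2 + b^2$ so that it is valid for unnormalised coefficients $(a, b, c)$; it then verifies on arbitrary sample data that the returned points satisfy the defining equations. All function names and sample values are hypothetical, and only the real case is handled (`math.sqrt` raises an error when the curves do not meet in $\mathbb{R}^2$).

```python
import math

def line_line(a, b, c, a2, b2, c2):
    """Intersection of ax+by+c=0 and a2*x+b2*y+c2=0 (assumed non-parallel)."""
    det = a * b2 - b * a2
    return ((b * c2 - b2 * c) / det, (a2 * c - c2 * a) / det)

def circle_circle(x1, y1, r1, x2, y2, r2):
    """Formula (8.1): both intersection points of two circles."""
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    t = 0.5 - (r2 ** 2 - r1 ** 2) / (2 * d2)
    s = math.sqrt((r1 ** 2 - t ** 2 * d2) / d2)
    bx, by = x1 + t * (x2 - x1), y1 + t * (y2 - y1)
    return [(bx + s * (y1 - y2), by + s * (x2 - x1)),
            (bx - s * (y1 - y2), by - s * (x2 - x1))]

def line_circle(a, b, c, x1, y1, r1):
    """Both intersection points of ax+by+c=0 with the circle (x1, y1, r1)."""
    n2 = a * a + b * b
    k = (a * x1 + b * y1 + c) / n2
    s = math.sqrt(r1 ** 2 / n2 - k ** 2)
    fx, fy = x1 - k * a, y1 - k * b           # foot of the perpendicular
    return [(fx + s * b, fy - s * a), (fx - s * b, fy + s * a)]

def line_residual(P, a, b, c):
    return abs(a * P[0] + b * P[1] + c)

def circle_residual(P, x0, y0, r):
    return abs((P[0] - x0) ** 2 + (P[1] - y0) ** 2 - r * r)

residuals = []

P = line_line(1, 2, -3, 4, -1, 2)
residuals += [line_residual(P, 1, 2, -3), line_residual(P, 4, -1, 2)]

for Q in circle_circle(0, 0, 2, 3, 0, 2):
    residuals += [circle_residual(Q, 0, 0, 2), circle_residual(Q, 3, 0, 2)]

for Q in line_circle(2, 1, -1, 1, 1, 2):      # deliberately unnormalised line
    residuals += [line_residual(Q, 2, 1, -1), circle_residual(Q, 1, 1, 2)]

worst = max(residuals)
print("largest residual:", worst)
```

Replacing `math.sqrt` by `cmath.sqrt` would give a version that never fails for lack of a real square root.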
These degeneracies are annoying because they disconnect the circle $S^1$, and can potentially block off large regions of that circle for which the construction is not even defined (because two circles stop intersecting, or a circle and line stop intersecting, in $\mathbb{R}^2$, due to the lack of a real square root for negative numbers). To fix this, we move now from the real plane $\mathbb{R}^2$ to the complex plane $\mathbb{C}^2$. Note that the algebraic definitions of a line and a circle continue to make perfect sense in $\mathbb{C}^2$ (with coefficients such as $a, b, c, x_0, y_0, r$ now allowed to be complex numbers instead of real numbers), and the algebraic intersection formulae given previously continue to make sense in the complex setting. The point $C$ now is allowed to range in the complex circle $S^1_{\mathbb{C}} = \{(x, y) \in \mathbb{C}^2 : x^2 + y^2 = 1\}$, which is a Riemann surface (conformal to the Riemann sphere $\mathbb{C} \cup \{\infty\}$ after stereographic projection). Furthermore, because all non-zero complex numbers have square roots, any given construction that was valid for at least one configuration is now valid (though possibly multi-valued) as an algebraic function on $S^1_{\mathbb{C}}$ outside of a dimension zero set of singularities, i.e. outside of a finite number of exceptional values of $C$. But note now that these singularities do not disconnect the complex circle $S^1_{\mathbb{C}}$, which has topological dimension two instead of one.
As mentioned earlier, a line $\ell$ given by such a construction may or may not trisect the original angle $\angle BAC$. But this trisection property can be expressed algebraically (e.g. using the triple angle formulae from trigonometry, or by building rotation matrices), and in particular makes sense over $\mathbf{C}$. Thus, for any given construction of a line $\ell$, the set of $C$ in $S^1_{\mathbf{C}}$ for which the construction is non-degenerate and trisects $\angle BAC$ is a constructible set (a boolean combination of algebraic sets). But $S^1_{\mathbf{C}}$ is an irreducible one-dimensional complex variety. As such, the aforementioned set of $C$ is either generic (the complement of a dimension zero algebraic set), or has dimension at most zero. (Here we are implicitly using the fundamental theorem of algebra, because the basic dimension theory of algebraic geometry only works properly over algebraically closed fields.)
On the other hand, there are at most countably many constructions, and by hypothesis, for each choice of $C$ in $S^1_{\mathbf{C}}$, at least one of these constructions has to trisect the angle. Applying the Baire category theorem (or countable additivity of Lebesgue measure, or using the algebraic geometry fact that an algebraic variety over an uncountable field cannot be covered by the union of countably many algebraic sets of smaller dimension), we conclude that there is a single construction which trisects the angle $\angle BAC$ for a generic choice of $C$, i.e. for all $C$ in $S^1_{\mathbf{C}}$ outside of a finite set of points, there is a construction, which amongst its multiple possible values, is able to output at least one line $\ell$ that trisects $\angle BAC$.
Now one performs monodromy. Suppose we move $C$ around a closed loop in $S^1_{\mathbf{C}}$ that avoids all points of degeneracy. Then all the other points, lines, and circles constructed from $A, B, C$ can be continuously extended from an initial configuration as discussed earlier, with each such object tracing out its own path in its own configuration space. Because of the presence of square roots in constructions such as the intersection (8.1) between two circles, or the intersection (8.2) between a circle and a line, these constructions may map a closed loop to an open loop; but because the square root function forms a double cover of $\mathbf{C} \setminus 0$, we see that any closed loop in $\mathbf{C} \setminus 0$, if doubled, will continue to be a closed loop upon taking a square root. (Alternatively, one can argue geometrically rather than algebraically, noting that in the intersection of (say) two non-degenerate circles $c_1, c_2$, there are only two possible choices for the intersection point of these two circles, and so if one performs monodromy along a loop of possible pairs $(c_1, c_2)$ of circles, either these two choices return to where they initially started, or are swapped; so if one doubles the loop, one must necessarily leave the intersection points unchanged.) Iterating this, we see that any object constructed by straightedge and compass from $A, B, C$ must have period $2^k$ for some power of two $2^k$, in the sense that if one iterates a loop of $C$ in $S^1_{\mathbf{C}}$ avoiding degenerate points $2^k$ times, the object must return to where it started. (In more algebraic terminology: the monodromy group must be a 2-group.)
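The double-cover behaviour of the square root can be seen numerically. The following is a minimal sketch, not part of the original argument: the function name `continue_sqrt_along_loop` and the step count are illustrative choices. It follows a continuous branch of $\sqrt{z}$ as $z$ winds around the origin, and compares the final value with the starting one after one and after two turns.

```python
import cmath
import math

def continue_sqrt_along_loop(turns, steps_per_turn=360, radius=1.0):
    """Follow a continuous branch of sqrt(z) while z winds `turns` times around 0."""
    w = cmath.sqrt(complex(radius, 0.0))   # starting branch at z = radius
    start = w
    for n in range(1, turns * steps_per_turn + 1):
        z = radius * cmath.exp(2j * math.pi * n / steps_per_turn)
        r = cmath.sqrt(z)
        # of the two square roots +r and -r, keep the one closer to the previous value
        w = r if abs(r - w) <= abs(-r - w) else -r
    return start, w

start, once = continue_sqrt_along_loop(1)    # single loop: the branch is swapped
_, twice = continue_sqrt_along_loop(2)       # doubled loop: the branch returns
```

Under these assumptions, `once` ends at the opposite root to `start`, while `twice` agrees with `start`, which is the period-two behaviour used in the iteration above.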
Now, one traverses $C$ along a slight perturbation of a single rotation of the real unit circle $S^1$, taking a slight detour around the finite number of degeneracy points one encounters along the way. Since $\ell$ has to trisect the angle $\angle BAC$ at each of these points, while varying continuously with $C$, we see that when $C$ traverses a full rotation, $\ell$ has only traversed one third of a rotation (or two thirds, depending on which trisection one obtained), and so the period of $\ell$ must be a multiple of three; but this contradicts Lemma 8.1.2, and the claim follows.

8.2. Elliptic curves and Pappus's theorem


An algebraic (affine) plane curve of degree $d$ over some field $k$ is a curve $\gamma$ of the form
$$\gamma = \{(x, y) \in k^2 : P(x, y) = 0\}$$
where $P$ is some non-constant polynomial of degree $d$. Examples of low-degree plane curves include
• Degree 1 (linear) curves $\{(x, y) \in k^2 : ax + by = c\}$, which are simply the lines;
• Degree 2 (quadric) curves $\{(x, y) \in k^2 : ax^2 + bxy + cy^2 + dx + ey + f = 0\}$, which (when $k = \mathbf{R}$) include the classical conic sections (i.e. ellipses, hyperbolae, and parabolae), but also include the reducible example of the union of two lines; and
• Degree 3 (cubic) curves $\{(x, y) \in k^2 : ax^3 + bx^2y + cxy^2 + dy^3 + ex^2 + fxy + gy^2 + hx + iy + j = 0\}$, which include the elliptic curves $\{(x, y) \in k^2 : y^2 = x^3 + ax + b\}$ (with non-zero discriminant $\Delta := -16(4a^3 + 27b^2)$, so that the curve is smooth) as examples (ignoring some technicalities when $k$ has characteristic two or three), but also include the reducible examples of the union of a line and a conic section, or the union of three lines.
• etc.
Algebraic affine plane curves can also be extended to the projective plane $\mathbf{P}k^2 = \{[x, y, z] : (x, y, z) \in k^3 \setminus 0\}$ by homogenising the polynomial. For instance, the affine quadric curve $\{(x, y) \in k^2 : ax^2 + bxy + cy^2 + dx + ey + f = 0\}$ would become $\{[x, y, z] \in \mathbf{P}k^2 : ax^2 + bxy + cy^2 + dxz + eyz + fz^2 = 0\}$.
One of the fundamental theorems about algebraic plane curves is Bézout's theorem, which asserts that if a degree $d$ curve $\gamma$ and a degree $d'$ curve $\gamma'$ have no common component, then they intersect in at most $dd'$ points (and if the underlying field $k$ is algebraically closed, one works projectively, and one counts intersections with multiplicity, they intersect in exactly $dd'$ points). Thus, for instance, two distinct lines intersect in at most one point; a line and a conic section intersect in at most two points; two distinct conic sections intersect in at most four points; a line and an elliptic curve intersect in at most three points; two distinct elliptic curves intersect in at most nine points; and so forth. Bézout's theorem is discussed further in Section 8.4.
From linear algebra we also have the fundamental fact that one can build algebraic curves through various specified points. For instance, for any two points $A_1, A_2$ one can find a line $\{(x, y) : ax + by = c\}$ passing through the points $A_1, A_2$, because this imposes two linear constraints on three unknowns $a, b, c$ and is thus guaranteed to have at least one non-trivial solution. Similarly, given any five points $A_1, \ldots, A_5$, one can find a quadric curve passing through these five points (though note that if three of these points are collinear, then this curve cannot be a conic thanks to Bézout's theorem, and is thus necessarily reducible to the union of two lines); given any nine points $A_1, \ldots, A_9$, one can find a cubic curve going through these nine points; and so forth. This simple observation is one of the foundational building blocks of the polynomial method in combinatorial incidence geometry, discussed for instance in [Ta2009b, §1.1].
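The linear algebra behind this observation can be made concrete. The sketch below is illustrative only: the helper names `null_vector` and `conic_through` and the five sample points are assumptions, not taken from the text. It finds the coefficients of a quadric curve $ax^2 + bxy + cy^2 + dx + ey + f = 0$ through five given points by computing a non-zero vector in the kernel of a $5 \times 6$ matrix, using exact rational arithmetic.

```python
from fractions import Fraction

def null_vector(rows):
    """Return a non-zero rational vector v with (rows) v = 0, via Gauss-Jordan elimination.
    Assumes there are more unknowns than equations, so a free column exists."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0])
    pivots = []                      # pivot column of each reduced row, in order
    r = 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue                 # no pivot in this column: it is a free column
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    v = [Fraction(0)] * ncols
    v[free] = Fraction(1)            # set one free unknown to 1, the others to 0
    for row, c in zip(m, pivots):
        v[c] = -row[free]            # solve for each pivot unknown
    return v

def conic_through(points):
    """Coefficients (a, b, c, d, e, f) of a quadric curve through the given five points."""
    rows = [[x * x, x * y, y * y, x, y, 1] for x, y in points]
    return null_vector(rows)

points = [(0, 0), (1, 0), (0, 1), (2, 3), (-1, 2)]   # hypothetical sample points
a, b, c, d, e, f = conic_through(points)
```

Five constraints on six unknowns always leave at least one free column, which is the reason the search for `free` cannot fail here; the same routine applied to nine points and the ten cubic monomials would produce a cubic.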
In the degree 1 case, it is always true that two distinct points $A, B$ determine exactly one line $\overleftrightarrow{AB}$. In higher degree, the situation is a bit more complicated. For instance, five collinear points determine more than one quadric curve, as one can simply take the union of the line containing those five points, together with an arbitrary additional line. Similarly, eight points on a conic section plus one additional point determine more than one cubic curve, as one can take that conic section plus an arbitrary line going through the additional point. However, if one places some "general position" hypotheses on these points, then one can recover uniqueness. For instance, given five points, no three of which are collinear, there can be at most one quadric curve that passes through these points (because these five points cannot lie on the union of two lines, and by Bézout's theorem they cannot simultaneously lie on two distinct conic sections).
For cubic curves, the situation is more complicated still. Consider for instance two distinct cubic curves $\gamma_0 = \{P_0(x, y) = 0\}$ and $\gamma_\infty = \{P_\infty(x, y) = 0\}$ that intersect in precisely nine points $A_1, \ldots, A_9$ (note from Bézout's theorem that this is an entirely typical situation). Then there is in fact an entire one-parameter family of cubic curves that pass through these points, namely the curves $\gamma_t = \{P_0(x, y) + tP_\infty(x, y) = 0\}$ for any $t \in k \cup \{\infty\}$ (with the convention that the constraint $P_0 + tP_\infty = 0$ is interpreted as $P_\infty = 0$ when $t = \infty$).
In fact, these are the only cubics that pass through these nine points, or
even through eight of the nine points. More precisely, we have the following
useful fact, known as the Cayley-Bacharach theorem (although the version
given here is actually due to Chasles [Ch1885]):
Proposition 8.2.1 (Cayley-Bacharach theorem). Let $\gamma_0 = \{P_0(x, y) = 0\}$ and $\gamma_\infty = \{P_\infty(x, y) = 0\}$ be two cubic curves that intersect (over some algebraically closed field $k$) in precisely nine distinct points $A_1, \ldots, A_9 \in k^2$. Let $P$ be a cubic polynomial that vanishes on eight of these points (say $A_1, \ldots, A_8$). Then $P$ is a linear combination of $P_0, P_\infty$, and in particular vanishes on the ninth point $A_9$.

Proof. We use an argument from [Hu2004]. We assume for contradiction that there is a cubic polynomial $P$ that vanishes on $A_1, \ldots, A_8$, but is not a linear combination of $P_0$ and $P_\infty$.
We first make some observations on the points $A_1, \ldots, A_9$. No four of these points can be collinear, because then by Bézout's theorem, $P_0$ and $P_\infty$ would both have to vanish on this line, contradicting the fact that $\gamma_0, \gamma_\infty$ meet in at most nine points. For similar reasons, no seven of these points can lie on a quadric curve.
One consequence of this is that any five of the $A_1, \ldots, A_9$ determine a unique quadric curve $\sigma$. The existence of the curve follows from linear algebra as discussed previously. If five of the points lie on two different quadric curves $\sigma, \sigma'$, then by Bézout's theorem, they must share a common line; but this line can contain at most three of the five points, and the other two points determine uniquely the other line that is the component of both $\sigma$ and $\sigma'$, and the claim follows.
Now suppose that three of the first eight points, say $A_1, A_2, A_3$, are collinear, lying on a line $\ell$. The remaining five points $A_4, \ldots, A_8$ do not lie on $\ell$, and determine a unique quadric curve $\sigma$ by the previous discussion. Let $B$ be another point on $\ell$, and let $C$ be a point that does not lie on either $\ell$ or $\sigma$. By linear algebra, one can find a non-trivial linear combination $Q = aP + bP_0 + cP_\infty$ of $P, P_0, P_\infty$ that vanishes at both $B$ and $C$. Then $Q$ is a cubic polynomial that vanishes on the four collinear points $A_1, A_2, A_3, B$ and thus vanishes on $\ell$, thus the cubic curve defined by $Q$ consists of $\ell$ and a quadric curve. This quadric curve passes through $A_4, \ldots, A_8$ and thus equals $\sigma$. But then $C$ does not lie on either $\ell$ or $\sigma$ despite being a vanishing point of $Q$, a contradiction. Thus, no three points from $A_1, \ldots, A_8$ are collinear.
In a similar vein, suppose next that six of the first eight points, say $A_1, \ldots, A_6$, lie on a quadric curve $\sigma$; as no three points are collinear, this quadric curve cannot be the union of two lines, and is thus a conic section. The remaining two points $A_7, A_8$ determine a unique line $\ell = \overleftrightarrow{A_7A_8}$. Let $B$ be another point on $\sigma$, and let $C$ be another point that does not lie on either $\ell$ or $\sigma$. As before, we can find a non-trivial cubic $Q = aP + bP_0 + cP_\infty$ that vanishes at both $B, C$. As $Q$ vanishes at seven points of a conic section $\sigma$, it must vanish on all of $\sigma$, and so the cubic curve defined by $Q$ is the union of $\sigma$ and a line that passes through $A_7$ and $A_8$, which must necessarily be $\ell$. But then this curve does not pass through $C$, a contradiction. Thus no six points in $A_1, \ldots, A_8$ lie on a quadric curve.
Finally, let $\ell$ be the line through the two points $A_1, A_2$, and $\sigma$ the quadric curve through the five points $A_3, \ldots, A_7$; as before, $\sigma$ must be a conic section, and by the preceding paragraphs we see that $A_8$ does not lie on either $\sigma$ or $\ell$. We pick two more points $B, C$ lying on $\ell$ but not on $\sigma$. As before, we can find a non-trivial cubic $Q = aP + bP_0 + cP_\infty$ that vanishes on $B, C$; it vanishes on four points on $\ell$ and thus $Q$ defines a cubic curve that consists of $\ell$ and a quadric curve. The quadric curve passes through $A_3, \ldots, A_7$ and is thus $\sigma$; but then the curve does not pass through $A_8$, a contradiction. This contradiction finishes the proof of the proposition.

I recently learned of this proposition and its role in unifying many incidence geometry facts concerning lines, quadric curves, and cubic curves.
For instance, we can recover the proof of the classical theorem of Pappus:
Theorem 8.2.2 (Pappus's theorem). Let $\ell, \ell'$ be two distinct lines, let $A_1, A_2, A_3$ be distinct points on $\ell$ that do not lie on $\ell'$, and let $B_1, B_2, B_3$ be distinct points on $\ell'$ that do not lie on $\ell$. Suppose that for $ij = 12, 23, 31$, the lines $\overleftrightarrow{A_iB_j}$ and $\overleftrightarrow{A_jB_i}$ meet at a point $C_{ij}$. Then the points $C_{12}, C_{23}, C_{31}$ are collinear.
Proof. We may assume that $C_{12}, C_{23}$ are distinct, since the claim is trivial otherwise.
Let $\gamma_0$ be the union of the three lines $\overleftrightarrow{A_1B_2}$, $\overleftrightarrow{A_2B_3}$, and $\overleftrightarrow{A_3B_1}$ (the purple lines in the first figure), let $\gamma_1$ be the union of the three lines $\overleftrightarrow{A_2B_1}$, $\overleftrightarrow{A_3B_2}$, and $\overleftrightarrow{A_1B_3}$ (the dark blue lines), and let $\gamma$ be the union of the three lines $\ell$, $\ell'$, and $\overleftrightarrow{C_{12}C_{23}}$ (the other three lines). By construction, $\gamma_0$ and $\gamma_1$ are cubic curves with no common component that meet at the nine points $A_1, A_2, A_3, B_1, B_2, B_3, C_{12}, C_{23}, C_{31}$. Also, $\gamma$ is a cubic curve that passes through the first eight of these points, and thus also passes through the ninth point $C_{31}$, by the Cayley-Bacharach theorem. The claim follows (note that $C_{31}$ cannot lie on $\ell$ or $\ell'$).

Figure 11. Pappus's theorem.
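As a sanity check on the statement (not a substitute for the proof), one can verify Pappus's theorem on a concrete configuration with exact rational arithmetic. In this sketch the helper names and the specific points, taken on the lines $y = 0$ and $y = x + 1$, are illustrative assumptions.

```python
from fractions import Fraction

def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through distinct points p, q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    return a, b, a * x1 + b * y1

def intersect(l1, l2):
    """Intersection point of two non-parallel lines, by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1          # zero exactly when the lines are parallel
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

def collinear(p, q, r):
    """True when the three points lie on a common line."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])

# Hypothetical configuration: A_i on the line y = 0, B_i on the line y = x + 1.
A = {1: (0, 0), 2: (1, 0), 3: (3, 0)}
B = {1: (0, 1), 2: (2, 3), 3: (5, 6)}

def C(i, j):
    """The point C_ij where the lines A_i B_j and A_j B_i meet."""
    return intersect(line_through(A[i], B[j]), line_through(A[j], B[i]))

C12, C23, C31 = C(1, 2), C(2, 3), C(3, 1)
pappus_holds = collinear(C12, C23, C31)
```

Because the arithmetic is exact, `pappus_holds` tests collinearity with no rounding tolerance; other choices of points work equally well provided the relevant pairs of lines are not parallel.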

The same argument gives the closely related theorem of Pascal:
Theorem 8.2.3 (Pascal's theorem). Let $A_1, A_2, A_3, B_1, B_2, B_3$ be distinct points on a conic section $\sigma$. Suppose that for $ij = 12, 23, 31$, the lines $\overleftrightarrow{A_iB_j}$ and $\overleftrightarrow{A_jB_i}$ meet at a point $C_{ij}$. Then the points $C_{12}, C_{23}, C_{31}$ are collinear.
Proof. Repeat the proof of Pappus's theorem, with $\sigma$ taking the place of $\ell \cup \ell'$. (Note that as any line meets $\sigma$ in at most two points, the $C_{ij}$ cannot lie on $\sigma$.)

One can view Pappus's theorem as the degenerate case of Pascal's theorem, when the conic section degenerates to the union of two lines.
Finally, Proposition 8.2.1 gives the associativity of the elliptic curve
group law:
Theorem 8.2.4 (Associativity of the elliptic curve law). Let $\gamma := \{(x, y) \in k^2 : y^2 = x^3 + ax + b\} \cup \{O\}$ be a (projective) elliptic curve, where $O := [0, 1, 0]$ is the point at infinity on the $y$-axis, and the discriminant $\Delta := -16(4a^3 + 27b^2)$ is non-zero. Define an addition law $+$ on $\gamma$ by defining $A + B$ to equal $-C$, where $C$ is the unique third point of $\gamma$ on the line through $A$ and $B$ (if $A, B$ are distinct) or on the tangent line to $\gamma$ at $A$ (if $A = B$), and $-C$ is the reflection of $C$ through the $x$-axis (thus $C, -C, O$ are collinear), with the convention $-O = O$. Then $+$ gives $\gamma$ the structure of an abelian group with identity $O$ and inverse $-$.

Figure 12. Pascal's theorem.

Proof. It is clear that $O$ is the identity for $+$, $-$ is an inverse, and $+$ is abelian. The only non-trivial assertion is associativity: $(A + B) + C = A + (B + C)$. By a perturbation (or Zariski closure) argument, we may assume that we are in the generic case when $O, A, B, C, A + B, B + C, -(A + B), -(B + C)$ are all distinct from each other and from $-((A + B) + C), -(A + (B + C))$. (Here we are implicitly using the smoothness of the elliptic curve, which is guaranteed by the hypothesis that the discriminant is non-zero.)
Let $\gamma'$ be the union of the three lines $\overleftrightarrow{AB}$, $\overleftrightarrow{C(A + B)}$, and $\overleftrightarrow{O(B + C)}$ (the purple lines), and let $\gamma''$ be the union of the three lines $\overleftrightarrow{O(A + B)}$, $\overleftrightarrow{BC}$, and $\overleftrightarrow{A(B + C)}$ (the green lines). Observe that $\gamma'$ and $\gamma$ are cubic curves with no common component that meet at the nine distinct points $O, A, B, C, A + B, -(A + B), B + C, -(B + C), -((A + B) + C)$. The cubic curve $\gamma''$ goes through the first eight of these points, and thus (by Proposition 8.2.1) also goes through the ninth point $-((A + B) + C)$. This implies that the line through $A$ and $B + C$ meets $\gamma$ in both $-(A + (B + C))$ and $-((A + B) + C)$, and so these two points must be equal, and so $(A + B) + C = A + (B + C)$ as required.


Figure 13. Associativity of the elliptic curve group law.

One can view Pappus's theorem and Pascal's theorem as a degeneration of the associativity of the elliptic curve law, when the elliptic curve degenerates to three lines (in the case of Pappus) or the union of one line and one conic section (in the case of Pascal's theorem).
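The group law of Theorem 8.2.4 can be checked by brute force over a small finite field, where every triple of points can be enumerated. This is only an illustrative sketch: the curve $y^2 = x^3 + x + 6$ over $\mathbf{F}_{11}$ and the function names are assumptions chosen for the example, and the chord and tangent slopes used are the standard explicit formulae in characteristic other than two or three rather than anything derived in the text. The point at infinity $O$ is represented by `None`.

```python
def ec_points(a, b, p):
    """All points of y^2 = x^3 + a*x + b over F_p, with None standing for O."""
    pts = [None]
    for x in range(p):
        for y in range(p):
            if (y * y - (x ** 3 + a * x + b)) % p == 0:
                pts.append((x, y))
    return pts

def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on y^2 = x^3 + a*x + b over F_p."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # vertical line: the sum is O
    if P == Q:
        s = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p   # slope of the tangent line
    else:
        s = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p          # slope of the chord
    x3 = (s * s - x1 - x2) % p
    y3 = (s * (x1 - x3) - y1) % p                          # reflect the third intersection point
    return (x3, y3)

a, b, p = 1, 6, 11                                         # hypothetical curve and prime
points = ec_points(a, b, p)
associative = all(
    ec_add(ec_add(A, B, a, p), C, a, p) == ec_add(A, ec_add(B, C, a, p), a, p)
    for A in points for B in points for C in points
)
```

The modular inverse `pow(x, -1, p)` requires Python 3.8 or later. An exhaustive check of this kind confirms associativity for one curve only; the Cayley-Bacharach argument is what explains it in general.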

8.3. Lines in the Euclidean group SE(2)


In the recent breakthrough of Guth and Katz [GuKa2010] (discussed at [Ta2011d, §3.9]) on the Erdős distance problem, one of the main tools used in the proof (building upon the earlier work of Elekes and Sharir [ElSh2011]) was the observation (dating back to [St1891]) that the incidence geometry of the Euclidean group $SE(2)$ of rigid motions of the plane was almost identical to that of lines in the Euclidean space $\mathbf{R}^3$:
Proposition 8.3.1. One can identify a (Zariski-)dense portion of $SE(2)$ with $\mathbf{R}^3$, in such a way that for any two points $A, B$ in the plane $\mathbf{R}^2$, the set $l_{AB} := \{R \in SE(2) : RA = B\}$ of rigid motions mapping $A$ to $B$ forms a line in $\mathbf{R}^3$.
Proof. A rigid motion is either a translation or a rotation, with the latter forming a Zariski-dense subset of $SE(2)$. Identify a rotation $R$ in $SE(2)$ by an angle $\theta$ with $|\theta| < \pi$ around a point $P$ with the element $(P, \cot\frac{\theta}{2})$ in $\mathbf{R}^3$. (Note that such rotations also form a Zariski-dense subset of $SE(2)$.) Elementary trigonometry then reveals that if $R$ maps $A$ to $B$, then $P$ lies on the perpendicular bisector of $AB$, and depends in a linear fashion on $\cot\frac{\theta}{2}$ (for fixed $A, B$). The claim follows.
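The "elementary trigonometry" can be spelled out and tested numerically. One way to write the linear dependence (an explicit formula supplied here for illustration, with rotations taken counterclockwise) is $P = \frac{A+B}{2} + \frac{1}{2}\cot\frac{\theta}{2}\, J(B - A)$, where $J(x, y) = (-y, x)$ is rotation by a right angle. The sketch below, whose function names and sample data are assumptions, checks that the rotation by $\theta$ about this $P$ really maps $A$ to $B$.

```python
import math

def centre_of_rotation(A, B, theta):
    """Centre P of the counterclockwise rotation by theta (non-zero) taking A to B,
    together with the third coordinate t = cot(theta/2)."""
    t = 1.0 / math.tan(theta / 2.0)
    mx, my = (A[0] + B[0]) / 2, (A[1] + B[1]) / 2      # midpoint of AB
    vx, vy = B[0] - A[0], B[1] - A[1]
    # move along the perpendicular bisector by t/2 times (B - A) rotated by a right angle
    return (mx - t * vy / 2, my + t * vx / 2), t

def rotate_about(P, theta, X):
    """Image of the point X under the counterclockwise rotation by theta about P."""
    c, s = math.cos(theta), math.sin(theta)
    dx, dy = X[0] - P[0], X[1] - P[1]
    return (P[0] + c * dx - s * dy, P[1] + s * dx + c * dy)

A, B = (1.0, 2.0), (4.0, -1.0)                         # hypothetical sample points
errors = []
for theta in (0.3, 1.0, 2.5, -1.2):
    P, t = centre_of_rotation(A, B, theta)
    image = rotate_about(P, theta, A)
    errors.append(math.hypot(image[0] - B[0], image[1] - B[1]))
```

Since $(P, t)$ equals $(\frac{A+B}{2}, 0) + t\,(\frac{1}{2}J(B - A), 1)$, the triples produced for varying $\theta$ lie on a single line in $\mathbf{R}^3$, which is the content of Proposition 8.3.1 in these coordinates.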

As seen from the proof, this proposition is an easy (though ad hoc) application of elementary trigonometry, but it was still puzzling to me why such
a simple parameterisation of the incidence structure of SE(2) was possible.
Certainly it was clear from general algebraic geometry considerations that
some bounded-degree algebraic description was available, but why would
the lAB be expressible as lines and not as, say, quadratic or cubic curves?
In this section I would like to record some observations arising from
discussions with Jordan Ellenberg, Jozsef Solymosi, and Josh Zahl which
give a more conceptual (but less elementary) derivation of the above proposition that avoids the use of ad hoc coordinate transformations such as
$R \mapsto (P, \cot\frac{\theta}{2})$. The starting point is to view the Euclidean plane $\mathbf{R}^2$ as the
scaling limit of the sphere S 2 (a fact which is familiar to all of us through
the geometry of the Earth), which makes the Euclidean group SE(2) a scaling limit of the rotation group SO(3). The latter can then be lifted to a
double cover, namely the spin group Spin(3). This group has a natural interpretation as the unit quaternions, which is isometric to the unit sphere
$S^3$. The analogues of the lines $l_{AB}$ in this setting become great circles on this sphere; applying a projective transformation, one can map $S^3$ to $\mathbf{R}^3$ (or more precisely to the projective space $\mathbf{P}^3$), at which point the great circles become lines. This gives a proof of Proposition 8.3.1.
Details of the correspondence are provided below. One byproduct of this analysis, incidentally, is the observation that the Guth-Katz bound $g(N) \gg N/\log N$ for the Erdős distance problem in the plane $\mathbf{R}^2$ immediately extends with almost no modification to the sphere $S^2$ as well (i.e. any $N$ points in $S^2$ determine $\gg N/\log N$ distances), as well as to the hyperbolic plane $H^2$. (The latter observation has since been applied in [IoRoRu2011].)
8.3.1. Euclidean geometry as the scaling limit of spherical geometry. Euclidean geometry and spherical geometry are examples of Kleinian geometries: the geometry of a space $X$ with a group of symmetries $G$ acting transitively on it. In the case of Euclidean plane geometry, the space $X$ is the plane $\mathbf{R}^2$ and the symmetry group is the special Euclidean group $SE(2) = SO(2) \ltimes \mathbf{R}^2$; in the case of spherical geometry, the space $X$ is the unit sphere $S^2$ and the symmetry group is the special orthogonal group $SO(3)$. According to the Kleinian way of thinking (as formalised by the Erlangen program), explicit coordinates on $X$ should be avoided if possible, with a preference instead for only using concepts (e.g. congruence, distance, angle) that are invariant with respect to the group of symmetries $G$.
As we all know from the geometry of the Earth (and the Greek root geometria literally means "Earth measurement"), the geometry of the sphere $S^2$ resembles the geometry of the plane $\mathbf{R}^2$ at scales that are small compared to the radius of the sphere. There are at least two ways to make this intuitive fact more precise. One is to make the radius $R$ of the sphere go to infinity, and perform a suitable limit (e.g. a Gromov-Hausdorff limit). A dual approach is to keep the radius of the sphere fixed (e.g. considering only the unit sphere), but to make the scale being considered on the sphere shrink to zero. The two approaches are of course equivalent, but we will consider the latter.
Thus, we view $S^2$ as the unit sphere in $\mathbf{R}^3$. With an eye to using the quaternionic number system later on, we will denote the standard basis of $\mathbf{R}^3$ as $i, j, k$, thus in particular $i$ is a point on the sphere $S^2$ which we will view as an "origin" for this sphere. The tangent plane to $S^2$ at this point is then
$$\{i + yj + zk : y, z \in \mathbf{R}\}.$$
This plane is tangent to the sphere to second order. In particular, if $(y, z) \in \mathbf{R}^2$, and $\varepsilon > 0$ is a small parameter (which we think of as going to zero eventually), then we can find a point on $S^2$ of the form $i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2)$. (If one wishes, one can enforce the $O(\varepsilon^2)$ error to lie in the $i$ direction, in order to make the identification uniquely well-defined, although this is not strictly necessary for the discussion below.) Thus, we can view the $\varepsilon$-neighbourhood of the origin $i$ as being approximately identifiable with a bounded neighbourhood of the origin $0$ in the plane $\mathbf{R}^2$ via the identification
$$(y, z) \mapsto i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2).$$
With this identification, one can see various structures in spherical geometry correspond (up to errors of $O(\varepsilon)$) to analogous structures in planar geometry. For instance, a great circle in $S^2$ is of the form
$$\{p \in S^2 : p \cdot \omega = 0\}$$
for some $\omega \in S^2$, where $\cdot$ is the usual dot product. In order for this great circle to intersect the $O(\varepsilon)$ neighbourhood of the origin $i$, one must have $\omega \cdot i = O(\varepsilon)$, and so we have
$$\omega = \varepsilon a i + (\cos\theta) j + (\sin\theta) k + O(\varepsilon^2)$$
for some bounded quantity $a$ and some angle $\theta$. If one then restricts the great circle to points $p = i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2)$, the constraint $p \cdot \omega = 0$ then becomes
$$a + (\cos\theta) y + (\sin\theta) z = O(\varepsilon),$$
which is within $O(\varepsilon)$ of the equation of a typical line in $\mathbf{R}^2$,
$$a + (\cos\theta) y + (\sin\theta) z = 0.$$
This formalises the geometrically intuitive fact that great circles resemble lines at small scales.
Remark 8.3.2. One can also adopt a more intrinsic Riemannian geometry viewpoint to see $\mathbf{R}^2$ as the limit of rescaled versions of $S^2$. Indeed, for each real number $\kappa$, there is a unique (up to isometry) simply connected Riemannian surface $S_\kappa$ of constant scalar curvature $\kappa$. For $\kappa > 0$, this is the unit sphere $S^2$ (rescaled by $\sqrt{\kappa/2}$); for $\kappa = 0$, this is the Euclidean plane $\mathbf{R}^2$; and for $\kappa < 0$, it is the hyperbolic plane $H^2$ (rescaled by $\sqrt{|\kappa|/2}$). Sending $\kappa$ to zero, we thus see the sphere (or hyperbolic plane) converging to the Euclidean plane.
We now apply a similar analysis to a rotation matrix $R \in SO(3)$ acting on the unit sphere $S^2$. In order for this rotation matrix to map an $O(\varepsilon)$ neighbourhood of the origin $i$ to another such neighbourhood, the rotation matrix $R$ must be of the form $R = (1 + \varepsilon T + O(\varepsilon^2))S$, where $S$ is a rotation that fixes $i$, thus
$$S(xi + yj + zk) = xi + (y\cos\theta - z\sin\theta)j + (y\sin\theta + z\cos\theta)k$$
for some angle $\theta$, and $T$ is an infinitesimal rotation (i.e. an element of the Lie algebra $\mathfrak{so}(3)$), thus $Ti = aj + bk$ for some reals $a, b$. We then have
$$R(i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2)) = i + \varepsilon(y\cos\theta - z\sin\theta + a)j + \varepsilon(y\sin\theta + z\cos\theta + b)k + O(\varepsilon^2),$$
so in $(y, z)$ coordinates, $R$ becomes the map
$$(y, z) \mapsto (y\cos\theta - z\sin\theta + a, y\sin\theta + z\cos\theta + b) + O(\varepsilon),$$
which is within $O(\varepsilon)$ of the Euclidean rigid motion
$$(y, z) \mapsto (y\cos\theta - z\sin\theta + a, y\sin\theta + z\cos\theta + b).$$
Thus we see the Euclidean rigid motion group $SE(2)$ emerging as a scaling limit of the orthogonal rotation group $SO(3)$ (or alternatively, as the normal bundle of the stabiliser of the origin, which is a copy of $SO(2)$).
Remark 8.3.3. One can also analyse the situation from a Lie algebra perspective. As is well known, one can equip the three-dimensional Lie algebra $\mathfrak{so}(3)$ with a basis $X, Y, Z$ obeying the commutation relations
$$[X, Y] = Z; \quad [Y, Z] = X; \quad [Z, X] = Y,$$
corresponding to infinitesimal rotations around the $x, y, z$ axes respectively. If we then rescale $X_\varepsilon := X$, $Y_\varepsilon := \varepsilon Y$, $Z_\varepsilon := \varepsilon Z$ (which morally corresponds to looking at rotation matrices that almost fix $i$, as above), the commutation relations rescale to
$$[X_\varepsilon, Y_\varepsilon] = Z_\varepsilon; \quad [Y_\varepsilon, Z_\varepsilon] = \varepsilon^2 X_\varepsilon; \quad [Z_\varepsilon, X_\varepsilon] = Y_\varepsilon.$$
Sending $\varepsilon \to 0$, the Lie algebra degenerates to the solvable Lie algebra
$$[X_0, Y_0] = Z_0; \quad [Y_0, Z_0] = 0; \quad [Z_0, X_0] = Y_0,$$
which is the Lie algebra $\mathfrak{se}(2)$ of the Euclidean group $SE(2)$.
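The rescaled commutation relations can be verified directly with $3 \times 3$ matrices. The sketch below uses the standard matrix generators of rotations about the coordinate axes as a concrete model of $X, Y, Z$ (this choice of matrices, the helper names, and the sample value of $\varepsilon$ are assumptions made for illustration), with exact rational arithmetic so that the identities can be tested for equality.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def bracket(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(3)] for i in range(3)]

def scale(c, A):
    return [[c * entry for entry in row] for row in A]

# Infinitesimal rotations about the x, y, z axes.
X = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
Y = [[0, 0, 1], [0, 0, 0], [-1, 0, 0]]
Z = [[0, -1, 0], [1, 0, 0], [0, 0, 0]]

eps = Fraction(1, 10)                      # hypothetical small parameter
X_e, Y_e, Z_e = X, scale(eps, Y), scale(eps, Z)

so3_ok = (bracket(X, Y) == Z and bracket(Y, Z) == X and bracket(Z, X) == Y)
rescaled_ok = (
    bracket(X_e, Y_e) == Z_e
    and bracket(Y_e, Z_e) == scale(eps ** 2, X_e)
    and bracket(Z_e, X_e) == Y_e
)
```

Only the middle relation carries a factor of $\varepsilon^2$, which is why it alone vanishes in the limit while the other two survive.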
There is an exact analogue of this phenomenon for the isometry group $SO(2, 1) \equiv SL_2(\mathbf{R})$ of the hyperbolic plane $H^2$ (which one can think of as one sheet of the unit sphere in Minkowski space $\mathbf{R}^{2+1}$, just as $S^2$ is the unit sphere in $\mathbf{R}^3$). The Lie algebra here can be equipped with a basis $X, Y, Z$ (which one can interpret as infinitesimal rotations and Lorentz boosts in Minkowski space) with the relations
$$[X, Y] = Z; \quad [Y, Z] = -X; \quad [Z, X] = Y,$$
and the same scaling argument as before gives $SE(2)$ as a scaling limit of $SO(2, 1)$.
8.3.2. Lifting to the quaternions. The quaternions are a number system, defined formally as the set $\mathbf{H}$ of numbers of the form
$$t + xi + yj + zk$$
with $t, x, y, z \in \mathbf{R}$. This is a four-dimensional vector space; it can be turned into an algebra (and into a division ring) by enforcing Hamilton's relations
$$i^2 = j^2 = k^2 = ijk = -1.$$
The quaternions also come equipped with a conjugation operation
$$\overline{t + xi + yj + zk} := t - xi - yj - zk$$
and a norm
$$|\alpha| := (\alpha\overline{\alpha})^{1/2} = (\overline{\alpha}\alpha)^{1/2},$$
thus
$$|t + xi + yj + zk| = \sqrt{t^2 + x^2 + y^2 + z^2}.$$
The conjugation operation is an anti-automorphism, and the norm is multiplicative: $|\alpha\beta| = |\alpha||\beta|$. The quaternions also have a trace
$$\mathrm{tr}(t + xi + yj + zk) = t$$
(in particular, $\mathrm{tr}(\overline{\alpha}) = \mathrm{tr}(\alpha)$ and $\mathrm{tr}(\alpha\beta) = \mathrm{tr}(\beta\alpha)$), giving rise to a dot product
$$\alpha \cdot \beta := \mathrm{tr}(\alpha\overline{\beta})$$
which (together with the norm) gives a Hilbert space structure on the quaternions.

The unit sphere $S^3 = \{\alpha \in \mathbf{H} : |\alpha| = 1\}$ of the quaternions forms a group, which is a model for the spin group $Spin(3)$ (thus giving rise to an interpretation of the quaternions $\mathbf{H}$, which are of course acted upon by $S^3$ by left-multiplication, as spinors for this group). This group acts on itself isometrically by conjugation, with an element $\alpha \in S^3 \equiv Spin(3)$ mapping $\beta \in S^3$ to $\alpha\beta\overline{\alpha}$. As $\alpha 1 \overline{\alpha} = 1$, this action preserves the quaternionic identity $1$, and thus preserves the orthogonal complement $\{xi + yj + zk : x^2 + y^2 + z^2 = 1\}$ of that identity in $S^3$, which we can of course identify with $S^2$. Thus $S^3 \equiv Spin(3)$ acts on $S^2$ isometrically by conjugation, thus providing a map from $Spin(3)$ to $SO(3)$. One can verify that this map is surjective (indeed, conjugation by the quaternion $e^{i\theta}$ corresponds to a rotation around the $i$-axis by $2\theta$, and similarly for rotations around other axes) and is a double cover (since the center of $S^3$ is $\{-1, +1\}$), with the pre-image of any rotation in $SO(3)$ being a pair $\{\alpha, -\alpha\}$ of diametrically opposed points in $S^3$. Thus we see that $SO(3)$ can be identified (in either the topological or Riemannian geometrical senses) with the projective sphere $S^3/\pm \equiv \mathbf{P}^3$. As $SO(3)$ acts transitively on $S^2$, we see that $S^3 \equiv Spin(3)$ does also.
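The claims about conjugation can be tested with a few lines of quaternion arithmetic. In this sketch a quaternion $t + xi + yj + zk$ is stored as a tuple `(t, x, y, z)`; the function names and the sample angle are illustrative assumptions. It checks Hamilton's relations, that conjugation by $e^{i\theta} = \cos\theta + i\sin\theta$ rotates the $j, k$ plane by $2\theta$, and that $\alpha$ and $-\alpha$ induce the same rotation.

```python
import math

def qmul(p, q):
    """Product of two quaternions stored as (t, x, y, z)."""
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (t1 * t2 - x1 * x2 - y1 * y2 - z1 * z2,
            t1 * x2 + x1 * t2 + y1 * z2 - z1 * y2,
            t1 * y2 - x1 * z2 + y1 * t2 + z1 * x2,
            t1 * z2 + x1 * y2 - y1 * x2 + z1 * t2)

def conj(q):
    t, x, y, z = q
    return (t, -x, -y, -z)

def act(alpha, beta):
    """The conjugation action: beta is sent to alpha * beta * conj(alpha)."""
    return qmul(qmul(alpha, beta), conj(alpha))

I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)
hamilton_ok = (qmul(I, I) == MINUS_ONE and qmul(J, J) == MINUS_ONE
               and qmul(K, K) == MINUS_ONE and qmul(qmul(I, J), K) == MINUS_ONE)

theta = 0.4                                            # hypothetical sample angle
alpha = (math.cos(theta), math.sin(theta), 0.0, 0.0)   # the unit quaternion e^{i theta}
minus_alpha = tuple(-c for c in alpha)
rotated_j = act(alpha, J)                              # expected: cos(2 theta) j + sin(2 theta) k
same_rotation = act(minus_alpha, J)                    # -alpha acts in the same way
fixed_i = act(alpha, I)                                # the axis i is left fixed
```

The fact that `act(alpha, J)` and `act(minus_alpha, J)` agree is the numerical face of the double cover: the two antipodal points $\alpha, -\alpha$ of $S^3$ give one rotation in $SO(3)$.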
The stabiliser $l_{AA} := \{\alpha \in S^3 : \alpha A \overline{\alpha} = A\}$ of a point $A \in S^2$ is easily seen to be a great circle in $S^3$ (being the intersection of $S^3$ with the centraliser of $A$, which is a plane). For instance, the stabiliser $l_{ii}$ of the origin $i$ is the circle $\{e^{i\theta} : \theta \in \mathbf{R}\}$. (This, incidentally, gives an explicit geometric manifestation of the Hopf fibration.) By transitivity (and the isometric nature of the $S^3$ action), we conclude that the sets $l_{AB} := \{\alpha \in S^3 : \alpha A \overline{\alpha} = B\}$ are also great circles in $S^3$ for any pair of points $A, B$. Conversely, as all great circles are isometric to each other, we see that all great circles are of the form $l_{AB}$. One also sees that two great circles $l_{AB}, l_{CD}$ coincide only when $A, C$ have the same stabiliser, and when $B, D$ have the same stabiliser, which forces $C, D$ to either equal $A, B$, or $-A, -B$.
Remark 8.3.4. Using a projective transformation, one can identify (a hemisphere of) $S^3$ with $\mathbf{R}^3$, with (most) great circles becoming lines in $\mathbf{R}^3$. Thus, we see that the incidence geometry of the $l_{AB}$ in $S^3$ is essentially equivalent to the incidence geometry of lines in $\mathbf{R}^3$. Because of this, the Guth-Katz argument to establish the bound $g(N) \gg N/\log N$ for the number of distances determined by $N$ points in the plane $\mathbf{R}^2$ also extends to $N$ points in the sphere $S^2$. Indeed, as in the Guth-Katz paper, it suffices to show that the number of congruent line segments $AB, CD$ in these $N$ points is $O(N^3 \log N)$. For each such pair of line segments, there is a unique element of $SO(3)$ (and thus two antipodal elements of $S^3 \equiv Spin(3)$) that maps $AB$ to $CD$; these two antipodal points are also the intersection of $l_{AC}$ with $l_{BD}$. Applying the projective transformation, one arrives at exactly the same incidence problem between points and lines considered by Guth-Katz (and in particular, one can apply [GuKa2010, Theorems 2.4, 2.5] as a black box, after verifying that at most $O(N)$ lines of the form $l_{AB}$ project into a plane or regulus, which is proven in the $S^2$ case in much the same way as it is in the $\mathbf{R}^2$ case). We omit the details.
A similar argument (changing the signatures in various metrics, and in the Clifford algebra underlying the quaternions) also allows one to establish the same results in the hyperbolic plane $H^2$; again, we omit the details.
If we restrict attention to an $\varepsilon$-neighbourhood of the origin $i$ in the sphere $S^2$, and similarly restrict to an $\varepsilon$-neighbourhood of the stabiliser of $i$ in the spin group $S^3 \equiv Spin(3)$, we can use the correspondences from the previous section to convert $S^2$ into $\mathbf{R}^2$ in the limit, and $Spin(3)$ in the limit into a double cover of the Euclidean group $SE(2)$ (which ends up just being isomorphic to $SE(2)$ again). The great circles $l_{AB}$ in $Spin(3)$ then, in the limit, become the analogous sets $l_{AB} = \{R \in SE(2) : RA = B\}$ in $SE(2)$, and the above correspondences can then be used to map (most of) $SE(2)$ to $\mathbf{R}^3$, and (most) $l_{AB}$ to lines, giving Proposition 8.3.1.
Remark 8.3.5. The results in this section can also be interpreted using the
language of Clifford algebra; see [Gu2011].

8.4. Bézout's inequality


Classically, the fundamental object of study in algebraic geometry is the solution set
$$V = V_{P_1, \ldots, P_m} := \{(x_1, \ldots, x_d) \in k^d : P_1(x_1, \ldots, x_d) = \ldots = P_m(x_1, \ldots, x_d) = 0\} \qquad (8.3)$$
to multiple algebraic equations
$$P_1(x_1, \ldots, x_d) = \ldots = P_m(x_1, \ldots, x_d) = 0$$
in multiple unknowns $x_1, \ldots, x_d$ in a field $k$, where the $P_1, \ldots, P_m : k^d \to k$ are polynomials of various degrees $D_1, \ldots, D_m$. We adopt the classical perspective of viewing $V$ as a set (and specifically, as an algebraic set), rather than as a scheme. Without loss of generality we may order the degrees in non-increasing order:
$$D_1 \geq D_2 \geq \ldots \geq D_m \geq 1.$$
We can distinguish between the underdetermined case $m < d$, when there are more unknowns than equations; the determined case $m = d$ when there are exactly as many unknowns as equations; and the overdetermined case $m > d$, when there are more equations than unknowns.
Experience has shown that the theory of such equations is significantly
simpler if one assumes that the underlying field k is algebraically closed,
and so we shall make this assumption throughout the rest of this section. In
particular, this covers the important case when k = C is the field of complex
numbers (but it does not cover the case k = R of real numbers - see below).
From the general "soft" theory of algebraic geometry, we know that
the algebraic set V is a union of finitely many algebraic varieties, each of
dimension at least $d - m$, with none of these components contained in any
other. In particular, in the underdetermined case $m < d$, there are no
zero-dimensional components of V, and thus V is either empty or infinite.
Now we turn to the determined case $d = m$, where we expect the solution
set V to be zero-dimensional and thus finite. Here, the basic control on the
solution set is given by Bézout's theorem. In our notation, this theorem
states the following:
Theorem 8.4.1 (Bézout's theorem). Let $d = m = 2$. If V is finite, then it
has cardinality at most $D_1 D_2$.
This result can be found in any introductory algebraic geometry textbook; it can
for instance be proven using the classical tool of resultants. The
solution set V will be finite when the two polynomials $P_1, P_2$ are coprime,
but can (and will) be infinite if $P_1, P_2$ share a non-trivial common factor.
By defining the right notion of multiplicity on V (and adopting a suitably
"scheme-theoretic" viewpoint), and working in projective space rather
than affine space, one can make the inequality $|V| \leq D_1 D_2$ an equality.
However, for many applications (and in particular, for the applications to
combinatorial incidence geometry), the upper bound usually suffices.
Bézout's theorem can be generalised in a number of ways. For instance,
the restriction on the finiteness of the solution set V can be dropped by
restricting attention to $V^0$, the union of the zero-dimensional irreducible
components of V:
Corollary 8.4.2 (Bézout's theorem, again). Let $d = m = 2$. Then $V^0$ has
cardinality at most $D_1 D_2$.
Proof. We factor $P_1, P_2$ into irreducible factors (using unique factorisation
of polynomials). By removing repeated factors, we may assume $P_1, P_2$ are
square-free. We then write $P_1 = Q_1 R$, $P_2 = Q_2 R$ where R is the greatest
common divisor of $P_1, P_2$ and $Q_1, Q_2$ are coprime. Observe that the
zero-dimensional component of $\{P_1 = P_2 = 0\}$ is contained in $\{Q_1 = Q_2 = 0\}$,
which is finite from the coprimality of $Q_1, Q_2$, and hence by Theorem 8.4.1 has
cardinality at most $\deg(Q_1)\deg(Q_2) \leq D_1 D_2$. The claim follows. □

It is also not difficult to use Bézout's theorem to handle the overdetermined
case $m > d = 2$ in the plane:

Corollary 8.4.3 (Bézout's theorem, yet again). Let $m \geq d = 2$. Then $V^0$
has cardinality at most $D_1 D_2$.
Proof. We may assume all the $P_i$ are square-free. We write $P_1 = Q_2 R_2$,
where $Q_2$ is coprime to $P_2$ and $R_2$ divides $P_2$ (and also $P_1$). We then write
$R_2 = Q_3 R_3$, where $Q_3$ is coprime to $P_3$ and $R_3$ divides $P_3$ (and also $P_1, P_2$).
Continuing in this fashion we obtain a factorisation
$P_1 = Q_2 Q_3 \ldots Q_m R_m$.
One then observes that $V^0$ is contained in the set
$\bigcup_{i=2}^m \{Q_i = P_i = 0\}$,
which by Theorem 8.4.1 has cardinality at most $\sum_{i=2}^m \deg(Q_i) D_i$. Since
$D_i \leq D_2$ and $\sum_{i=2}^m \deg(Q_i) \leq \deg(P_1) = D_1$, the claim follows. □
Remark 8.4.4. Of course, in the overdetermined case $m > d$ one generically
expects the solution set to be empty, but if there is enough degeneracy or
numerical coincidence then non-empty solution sets can occur. In particular,
by considering the case when $P_2 = \ldots = P_m$ and $D_2 = \ldots = D_m$ we see that
the bound $D_1 D_2$ can be sharp in some cases. However, one can do a little
better in this situation; by decomposing $P_m$ into irreducible components,
for instance, one can improve the upper bound of $D_1 D_2$ slightly to $D_1 D_m$.
However, this improvement seems harder to obtain in higher dimensions (see
below).
Bézout's theorem also extends to higher dimensions. Indeed, we have
Theorem 8.4.5 (Higher-dimensional Bézout's theorem). Let $d = m \geq 0$.
If V is finite, then it has cardinality at most $D_1 \ldots D_d$.
This is a standard fact, and can for instance be proved from the more
general and powerful machinery of intersection theory. A typical application
of this theorem is to show that, given a degree D polynomial $P : \mathbb{R}^d \to \mathbb{R}$
over the reals, the number of connected components of $\{x \in \mathbb{R}^d : P(x) \neq 0\}$
is $O(D^d)$. The main idea of the proof is to locate a critical point $\nabla P(x) = 0$
inside each connected component, and use Bézout's theorem to count the
number of zeroes of the polynomial map $\nabla P : \mathbb{R}^d \to \mathbb{R}^d$. (This doesn't quite
work directly because some components may be unbounded, and because the
fibre of $\nabla P$ at the origin may contain positive-dimensional components, but
one can use truncation and generic perturbation to deal with these issues;
see [SoTa2011] for further discussion.)
Bézout's theorem can be extended to the overdetermined case as before:
Theorem 8.4.6 (Bézout's inequality). Let $m \geq d \geq 0$. Then $V^0$ has
cardinality at most $D_1 \ldots D_d$.
Remark 8.4.7. Theorem 8.4.6 ostensibly only controls the zero-dimensional
components of V , but by throwing in a few generic affine-linear forms to the
set of polynomials $P_1,\ldots,P_m$ (thus intersecting V with a bunch of generic
hyperplanes) we can also control the total degree of all the i-dimensional
components of V for any fixed i. (Again, by using intersection theory one
can get a slightly more precise bound than this, but the proof of that bound
is more complicated than the arguments given here.)

This time, though, it is a slightly non-trivial matter to deduce Theorem
8.4.6 from Theorem 8.4.5, due to the standard difficulty that the intersection
of irreducible varieties need not be irreducible (which can be viewed in some
ways as the source of many other related difficulties, such as the fact that not
every algebraic variety is a complete intersection), and so one cannot evade
all irreducibility issues merely by assuming that the original polynomials Pi
are irreducible. Theorem 8.4.6 first appeared explicitly in the work of Heintz
[He1983].
As before, the most systematic way to establish Theorem 8.4.6 is via
intersection theory. In this section, though, I would like to give a slightly more
elementary argument (essentially due to Schmid [Sc1995]), based on generically
perturbing the polynomials $P_1,\ldots,P_m$ in the problem; this method
is less powerful than the intersection-theoretic methods, which can be used
for a wider range of algebraic geometry problems, but suffices for the narrow
objective of proving Theorem 8.4.6. The argument requires some of
the "soft" or "qualitative" theory of algebraic geometry (in particular, one
needs to understand the semicontinuity properties of preimages of dominant
maps), as well as basic linear algebra. As such, the proof is not completely
elementary, but it uses only a small amount of algebraic machinery, and as
such I found it easier to understand than the intersection theory arguments.
Theorem 8.4.6 is a statement about arbitrary polynomials P1 , . . . , Pm .
However, it turns out (in the determined case m = d, at least) that upper
bounds on $|V^0|$ are Zariski closed properties, and so it will suffice to establish
this claim for generic polynomials $P_1,\ldots,P_m$. On the other hand, it is
possible to use duality to deduce such upper bounds on $|V^0|$ from a Zariski
open condition, namely that a certain collection of polynomials are linearly
independent. As such, to verify the generic case of this open condition, it
suffices to establish this condition for a single family of polynomials, such as
a family of monomials, in which case the condition can be verified by direct
inspection. Thus we see an example of the somewhat strange strategy of
establishing the general case from a specific one, using the generic case as
an intermediate step.

Remark 8.4.8. There is an important caveat to note here, which is that
these theorems only hold for algebraically closed fields, and in particular can
fail over the reals $\mathbb{R}$. For instance, in $\mathbb{R}^3$, the polynomials
$P(x, y, z) = z$
$Q(x, y, z) = z$
$R(x, y, z) = \left(\prod_{j=1}^N (x - j)\right)^2 + \left(\prod_{j=1}^N (y - j)\right)^2$
have degrees 1, 1, 2N respectively, but their common zero locus
$\{(x, y, 0) : x, y \in \{1,\ldots,N\}\}$ has cardinality $N^2$. In some cases one can safely obtain
incidence bounds in $\mathbb{R}$ by embedding $\mathbb{R}$ inside $\mathbb{C}$ (for instance, things are
OK if one knows that the zero locus has complex dimension zero, and not
merely real dimension zero), but as the above example shows, one needs to
be careful when doing so.
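The failure over the reals in Remark 8.4.8 can be checked by brute force for a small value of N. The following is only an illustrative sketch: the function name `real_common_zeros` and the padding parameter `pad` are hypothetical choices made here, and the search is restricted to an integer grid (which is harmless, since a sum of two real squares vanishes only when both products vanish, forcing x and y to lie in {1, ..., N}).

```python
from itertools import product

def real_common_zeros(N, pad=2):
    """Brute-force the common zeros of P, Q, R from Remark 8.4.8 on an integer grid."""
    def prod_shift(t):
        # prod_{j=1}^N (t - j); vanishes exactly when t lies in {1, ..., N}
        out = 1
        for j in range(1, N + 1):
            out *= (t - j)
        return out

    zeros = []
    grid = range(1 - pad, N + pad + 1)
    for x, y, z in product(grid, grid, range(-pad, pad + 1)):
        P = z
        Q = z
        R = prod_shift(x) ** 2 + prod_shift(y) ** 2
        if P == 0 and Q == 0 and R == 0:
            zeros.append((x, y, z))
    return zeros

N = 5
zeros = real_common_zeros(N)
bezout_bound = 1 * 1 * (2 * N)  # product of the degrees of P, Q, R
print(len(zeros), bezout_bound)  # prints: 25 10
```

Here the real zero locus has $N^2 = 25$ points, exceeding the product of degrees $2N = 10$; over $\mathbb{C}$ there is no contradiction, because the complex zero locus of $P, Q, R$ is not zero-dimensional.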
8.4.1. The determined case. We begin by establishing Theorem 8.4.5.
Fix $m = d \geq 0$. If one wishes, one can dispose of the trivial case $m = d = 0$
and assume $m = d \geq 1$, although this is not strictly necessary for the
argument below.
As mentioned in the introduction, the first idea is to generically perturb
the $P_1,\ldots,P_d$. To this end, let W be the vector space $W := W_1 \times \ldots \times W_d$,
where $W_i$ is the vector space of all polynomials $P_i : k^d \to k$ of degree at
most $D_i$; thus W is the configuration space of all possible $P_1,\ldots,P_d$. This is
a finite-dimensional vector space over k with an explicit dimension $\dim(W)$
which we will not need to compute here. The set $V_{P_1,\ldots,P_d} \subset k^d$ in (8.3) can
thus be viewed as the fibre over $(P_1,\ldots,P_d)$ of the algebraic set $V_* \subset W \times k^d$,
defined as
$V_* := \{(P_1,\ldots,P_d,x_1,\ldots,x_d) \in W \times k^d : P_i(x_1,\ldots,x_d) = 0 \hbox{ for all } 1 \leq i \leq d\}$.
This algebraic set is cut out by d polynomial equations on $W \times k^d$ and thus
is the union of finitely many algebraic varieties $V_* = V_{*,1} \cup \ldots \cup V_{*,r}$, each
of which has codimension at most d in $W \times k^d$. In particular,
$\dim(V_{*,i}) \geq \dim(W)$.
Consider one of the components $V_{*,i}$ of $V_*$. Its projection $W_{*,i}$ in W will
be Zariski dense in its closure $\overline{W_{*,i}}$, which is necessarily an irreducible variety
since $V_{*,i}$ is, and the projection map from $V_{*,i}$ to $\overline{W_{*,i}}$ will, by definition, be a
dominant map. By basic algebraic geometry, the preimages in $V_{*,i}$ of a point
$(P_1,\ldots,P_d)$ in $W_{*,i}$ will generically have dimension $\dim(V_{*,i}) - \dim(\overline{W_{*,i}})$
(i.e. this dimension will occur outside of an algebraic subset of $\overline{W_{*,i}}$ of positive
codimension), and even in the non-generic case will have dimension at least
$\dim(V_{*,i}) - \dim(\overline{W_{*,i}})$. Since $\dim(V_{*,i}) \geq \dim(W)$, we thus see that the
only components $V_{*,i}$ that can contribute to the zero-dimensional set $V^0$
are those with $\dim(V_{*,i}) = \dim(\overline{W_{*,i}}) = \dim(W)$. Now, the projection map
is a dominant map between two varieties of the same dimension, and is
thus generically finite, with the preimages generically having some constant
cardinality $D_{*,i}$, and non-generically the preimages have a zero-dimensional
component of at most $D_{*,i}$ points.[2]
As a consequence of this analysis, we see that the generic fibre always
has at least as many zero-dimensional components as a non-generic fibre,
and so to establish Theorem 8.4.5, it suffices to do so for generic $P_1,\ldots,P_m$.
Now take $P_1,\ldots,P_m$ to be generic. We know that generically, the set V
is finite; we seek to bound its cardinality $|V|$ by $D_1 \ldots D_m$. To do this, we
dualise the problem. Let A be the space of all affine-linear forms $\lambda : k^d \to k$;
this is a $(d+1)$-dimensional vector space. We consider the set $V^*$ of all
affine-linear forms $\lambda$ whose kernel $\{\lambda = 0\}$ intersects V. This is a union of $|V|$
hyperplanes in A, and is thus a hypersurface of degree $|V|$. Thus, to upper
bound the size of V, it suffices to upper bound the degree of the hypersurface
$V^*$, and this can be done by finding a non-zero polynomial of controlled
degree that vanishes identically on $V^*$. The point of this observation is that
the property of a polynomial being non-zero is a Zariski-open condition, and
so we have a chance of establishing the generic case from a special case.
Now let us look for a (generically non-trivial) polynomial of degree at
most $\prod_{i=1}^d D_i$ that vanishes on $V^*$. The idea is to try to dualise the assertion
that the monomials $x_1^{a_1} \ldots x_d^{a_d}$ with $a_j < D_j$ for all $1 \leq j \leq d$ generically
span the function ring of V, to become a statement about $V^*$.
Let D be a sufficiently large integer (any integer larger than $\sum_{i=1}^d (D_i - 1)$
will do, actually), and let X be the space of all polynomials $P : k^d \to k$ of
degree at most D. This is a finite-dimensional vector space over k, generated
by the monomials $x_1^{a_1} \ldots x_d^{a_d}$ with $a_1,\ldots,a_d$ non-negative integers adding up
to at most D. We can split X as a direct sum
(8.4)    $X = \left(\sum_{i=1}^d x_i^{D_i} X_i\right) + X_0$
where for $i = 1,\ldots,d$, $X_i$ is generated by those monomials $x_1^{a_1} \ldots x_d^{a_d}$ of degree
at most $D - D_i$ with $a_j < D_j$ for all $j < i$ and with $a_i \geq 0$, and $X_0$ is
generated by those monomials $x_1^{a_1} \ldots x_d^{a_d}$ with $a_j < D_j$ for all $1 \leq j \leq d$. In
particular,
(8.5)    $\dim(X) = \sum_{i=1}^d \dim(X_i) + \dim(X_0)$.
Also observe that
(8.6)    $\dim(X_0) = \prod_{i=1}^d D_i$.

[2] Topologically, one can see this claim (at least in the model case $k = \mathbb{C}$) as follows. Any
isolated point p in a non-generic preimage must perturb to a zero-dimensional set or to an empty
set in nearby preimages, since $V_{*,i}$ is closed. On a generic preimage, p must perturb to a
zero-dimensional set (for if it generically perturbs to the empty set, the dimension of $V_{*,i}$ will be
too small); since a generic preimage has $D_{*,i}$ points, we conclude that the non-generic preimage
can contain at most $D_{*,i}$ isolated points. For a more detailed proof of this claim, see [He1983,
Proposition 1].

Now let $\lambda \in A$ be an affine-linear form, and consider the sum
(8.7)    $\left(\sum_{i=1}^d P_i \cdot X_i\right) + \lambda \cdot X_0$.
This is a subspace of X for D large enough. In view of (8.4), we see that
this sum can equal all of X in the case when $P_i = x_i^{D_i}$ and $\lambda = 1$. From
(8.5), the property of (8.7) spanning all of X is a Zariski-open condition,
and thus we see that (8.7) spans X for generic choices of $P_1,\ldots,P_d$ and $\lambda$.
On the other hand, suppose that $\lambda \in V^*$, thus $\lambda(a) = P_1(a) = \ldots =
P_d(a) = 0$ for some $a \in k^d$. Then observe that each factor of (8.7) lies in the
hyperplane $\{P \in X : P(a) = 0\}$ of X, and so (8.7) does not span X in this
case. Thus, for generic $P_1,\ldots,P_d$, we see that (8.7) spans X for generic $\lambda$
but not for any $\lambda$ in $V^*$.
The property of (8.7) spanning X is equivalent to a certain resultant-like
determinant of a $\dim(X) \times \dim(X)$ matrix being non-zero, where the rows
are given by the generators of $P_i \cdot X_i$ and $\lambda \cdot X_0$. For generic $P_1,\ldots,P_d$,
this determinant is a non-trivial polynomial in $\lambda$ which vanishes on
$V^*$; an inspection of the matrix reveals that this determinant has degree
$\dim(X_0) = \prod_{i=1}^d D_i$ in $\lambda$. Thus we have found a non-trivial polynomial of
degree $\prod_{i=1}^d D_i$ that vanishes on $V^*$, and the claim follows.
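As a sanity check on the monomial splitting (8.4) and the dimension counts (8.5), (8.6) used in the argument above, one can enumerate exponent vectors for small degrees. The sketch below is illustrative only: the helper names `monomials` and `check_splitting` are hypothetical, monomials are represented by their exponent tuples, and D is taken to be the smallest admissible value $\sum_i (D_i - 1) + 1$.

```python
from itertools import product
from math import prod

def monomials(d, max_deg):
    """Exponent tuples (a_1, ..., a_d) of total degree at most max_deg."""
    return [a for a in product(range(max_deg + 1), repeat=d) if sum(a) <= max_deg]

def check_splitting(degrees):
    """Return (dim X, sum of dims of the pieces, pieces cover X?, dim X_0)."""
    d = len(degrees)
    D = sum(Di - 1 for Di in degrees) + 1  # smallest D allowed in the text
    X = set(monomials(d, D))
    # X_0: monomials with a_j < D_j for every j
    X0 = {a for a in X if all(a[j] < degrees[j] for j in range(d))}
    pieces = [X0]
    for i in range(d):
        # X_i: degree at most D - D_i, with a_j < D_j for all j < i
        Xi = [a for a in monomials(d, D - degrees[i])
              if all(a[j] < degrees[j] for j in range(i))]
        # multiply each monomial of X_i by x_i^{D_i}
        shifted = {a[:i] + (a[i] + degrees[i],) + a[i + 1:] for a in Xi}
        pieces.append(shifted)
    total = sum(len(p) for p in pieces)
    covers = set().union(*pieces) == X
    return len(X), total, covers, len(X0)

print(check_splitting((2, 3)))  # prints: (15, 15, True, 6)
```

When the pieces cover X and their sizes add up to dim(X), they are automatically disjoint, which is the direct-sum statement (8.4) at the level of monomial bases; the last entry reproduces $\dim(X_0) = D_1 D_2 = 6$ from (8.6).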
8.4.2. The overdetermined case. Now we establish Theorem 8.4.6. Fix
$P_1,\ldots,P_m$, and let $V'$ be the union of all the positive-dimensional components
of V, thus $V^0 = V \setminus V'$.
The main idea here is to perturb $P_1,\ldots,P_m$ to be a regular sequence of
polynomials outside of $V'$. More precisely, we have
Lemma 8.4.9 (Regular sequence). For any $1 \leq r \leq d$, one can find polynomials
$Q_1,\ldots,Q_r$, such that for each $i = 1,\ldots,r$, $Q_i$ is a linear combination
of $P_i,\ldots,P_m$ (and thus has degree at most $D_i$), with the $P_i$ coefficient being
non-zero, and the set $\{x \in k^d \setminus V' : Q_1 = \ldots = Q_r = 0\}$ in $k^d$ has dimension
at most $d - r$ (in the sense that it is covered by finitely many varieties of
dimension at most $d - r$).


Proof. We establish this claim by induction on r. For $r = 1$ the claim
follows by setting $Q_1 := P_1$. Now suppose inductively that $1 < r \leq d$, and
that $Q_1,\ldots,Q_{r-1}$ have already been found with the stated properties for
$r - 1$.
By construction, the polynomials $P_1,\ldots,P_m$ are linear combinations of
$Q_1,\ldots,Q_{r-1},P_r,\ldots,P_m$. Thus, on the set $W := \{x \in k^d \setminus V' : Q_1 = \ldots =
Q_{r-1} = 0\}$, the polynomials $P_r,\ldots,P_m$ can only simultaneously vanish on
the zero-dimensional set $V \setminus V' = V^0$. On the other hand, each irreducible
component of W has dimension $d - r + 1$, which is positive. Thus it is not
possible for $P_r,\ldots,P_m$ to simultaneously vanish on any of these components.
If one then sets $Q_r$ to be a generic linear combination of $P_r,\ldots,P_m$, then
we thus see that $Q_r$ will also not vanish on any of these components, and
so $\{x \in k^d \setminus V' : Q_1 = \ldots = Q_r = 0\}$ has dimension at most $d - r$. Also,
generically the $P_r$ coefficient of $Q_r$ is non-zero, and the claim follows. □

From the above lemma (with $r := d$) we see that $V^0$ is contained in the
set $\{Q_1 = \ldots = Q_d = 0\}$, and in fact in the zero-dimensional part of this set,
since $V'$ is closed. By Theorem 8.4.5 (whose proof bounds the zero-dimensional
components of the solution set), this part has cardinality
at most $D_1 \ldots D_d$, and the claim follows.

8.5. The Brunn-Minkowski inequality in nilpotent groups


One of the fundamental inequalities in convex geometry is the Brunn-Minkowski
inequality, which asserts that if A, B are two non-empty bounded open subsets
of $\mathbb{R}^d$, then
(8.8)    $\mu(A + B)^{1/d} \geq \mu(A)^{1/d} + \mu(B)^{1/d}$,
where
$A + B := \{a + b : a \in A, b \in B\}$
is the sumset of A and B, and $\mu$ denotes Lebesgue measure. The estimate is
sharp, as can be seen by considering the case when A, B are convex bodies
that are dilates of each other, thus $A = \lambda B := \{\lambda b : b \in B\}$ for some
$\lambda > 0$, since in this case one has $\mu(A) = \lambda^d \mu(B)$, $A + B = (\lambda + 1)B$, and
$\mu(A + B) = (\lambda + 1)^d \mu(B)$.
The Brunn-Minkowski inequality has many applications in convex geometry.
To give just one example, if we assume that A has a smooth boundary
$\partial A$, and set B equal to a small ball $B = B(0, \varepsilon)$, then
$\mu(B)^{1/d} = \varepsilon \mu(B(0, 1))^{1/d}$, and in the limit $\varepsilon \to 0$ one has
$\mu(A + B) = \mu(A) + \varepsilon |\partial A| + o(\varepsilon)$
where $|\partial A|$ is the surface measure of A; applying the Brunn-Minkowski
inequality and performing a Taylor expansion, one soon arrives at the
isoperimetric inequality
$|\partial A| \geq d \mu(A)^{1-1/d} \mu(B(0, 1))^{1/d}$.
Thus one can view the isoperimetric inequality as an infinitesimal limit of
the Brunn-Minkowski inequality.
There are many proofs known of the Brunn-Minkowski inequality. Firstly,
the inequality is trivial in one dimension:
Lemma 8.5.1 (One-dimensional Brunn-Minkowski). If $A, B \subset \mathbb{R}$ are non-empty
measurable sets, then
$\mu(A + B) \geq \mu(A) + \mu(B)$.
Proof. By inner regularity we may assume that A, B are compact. The
claim then follows since $A + B$ contains the sets $\sup(A) + B$ and $A + \inf(B)$,
which meet only at a single point $\sup(A) + \inf(B)$. □
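To see Lemma 8.5.1 in action, one can compute sumsets of finite unions of closed intervals exactly. This is a small sketch under that restrictive assumption (the lemma itself covers all non-empty measurable sets); the helper names `merge`, `measure` and `sumset` are hypothetical.

```python
def merge(intervals):
    """Merge overlapping closed intervals [lo, hi] into a disjoint sorted list."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged

def measure(intervals):
    """Lebesgue measure of a finite union of closed intervals."""
    return sum(hi - lo for lo, hi in merge(intervals))

def sumset(A, B):
    """A + B for finite unions of intervals: [a0, a1] + [b0, b1] = [a0 + b0, a1 + b1]."""
    return merge([(a0 + b0, a1 + b1) for a0, a1 in A for b0, b1 in B])

A = [(0, 1), (5, 6)]
B = [(0, 2), (10, 11)]
print(measure(sumset(A, B)), measure(A) + measure(B))  # prints: 10 5
```

In this example the sumset is considerably larger than the lower bound; equality in the lemma requires A and B to be (essentially) intervals.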

For the higher dimensional case, the inequality can be established from
the Prékopa-Leindler inequality:
Theorem 8.5.2 (Prékopa-Leindler inequality in $\mathbb{R}^d$). Let $0 < \theta < 1$, and let
$f, g, h : \mathbb{R}^d \to \mathbb{R}$ be non-negative measurable functions obeying the inequality
(8.9)    $h(x + y) \geq f(x)^{1-\theta} g(y)^{\theta}$
for all $x, y \in \mathbb{R}^d$. Then we have
(8.10)    $\int_{\mathbb{R}^d} h \geq \frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}} \left(\int_{\mathbb{R}^d} f\right)^{1-\theta} \left(\int_{\mathbb{R}^d} g\right)^{\theta}$.
This inequality is usually stated using $h((1-\theta)x + \theta y)$ instead of $h(x + y)$
in order to eliminate the ungainly factor $\frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}}$. However, we
formulate the inequality in this fashion in order to avoid any reference to
the dilation maps $x \mapsto \lambda x$; the reason for this will become clearer later.
The Prékopa-Leindler inequality quickly implies the Brunn-Minkowski
inequality. Indeed, if we apply it to the indicator functions $f := 1_A$, $g := 1_B$,
$h := 1_{A+B}$ (which certainly obey (8.9)), then (8.10) gives
$\mu(A + B)^{1/d} \geq \frac{\mu(A)^{\frac{1-\theta}{d}} \mu(B)^{\frac{\theta}{d}}}{(1-\theta)^{1-\theta} \theta^{\theta}}$
for any $0 < \theta < 1$. We can now optimise in $\theta$; the optimal value turns out
to be
$\theta := \frac{\mu(B)^{1/d}}{\mu(A)^{1/d} + \mu(B)^{1/d}}$
which yields (8.8).
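The optimisation in $\theta$ can be checked numerically. Writing $a := \mu(A)^{1/d}$ and $b := \mu(B)^{1/d}$ (shorthand introduced here, not in the text), the lower bound above equals $(a/(1-\theta))^{1-\theta} (b/\theta)^{\theta}$, and the sketch below confirms on a grid that it is maximised near $\theta = b/(a+b)$, with maximal value $a + b$. The function names and the sample values of a, b are arbitrary illustrative choices.

```python
def bm_lower_bound(a, b, theta):
    """Right-hand side of the pre-optimised bound, with a = mu(A)^(1/d), b = mu(B)^(1/d)."""
    return (a / (1 - theta)) ** (1 - theta) * (b / theta) ** theta

def best_theta(a, b, steps=10_000):
    """Grid search for the maximiser of theta -> bm_lower_bound(a, b, theta) on (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: bm_lower_bound(a, b, t))

a, b = 2.0, 3.0
theta_star = b / (a + b)  # the optimal value claimed in the text
print(best_theta(a, b), theta_star, bm_lower_bound(a, b, theta_star))
```

At $\theta = b/(a+b)$ both $a/(1-\theta)$ and $b/\theta$ equal $a + b$, which is why the optimised bound collapses to the right-hand side of (8.8).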
To prove the Prékopa-Leindler inequality, we first observe that the inequality
tensorises in the sense that if it is true in dimensions $d_1$ and $d_2$, then
it is automatically true in dimension $d_1 + d_2$. Indeed, if
$f, g, h : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to \mathbb{R}^+$ are measurable functions obeying (8.9) in dimension $d_1 + d_2$, then for
any $x_1, y_1 \in \mathbb{R}^{d_1}$, the functions $f(x_1, \cdot), g(y_1, \cdot), h(x_1 + y_1, \cdot) : \mathbb{R}^{d_2} \to \mathbb{R}^+$
obey (8.9) in dimension $d_2$. Applying the Prékopa-Leindler inequality in
dimension $d_2$, we conclude that
$H(x_1 + y_1) \geq \frac{1}{(1-\theta)^{d_2(1-\theta)} \theta^{d_2\theta}} F(x_1)^{1-\theta} G(y_1)^{\theta}$
for all $x_1, y_1 \in \mathbb{R}^{d_1}$, where $F(x_1) := \int_{\mathbb{R}^{d_2}} f(x_1, x_2)\, dx_2$ and similarly for
G, H. But then if we apply the Prékopa-Leindler inequality again, this time
in dimension $d_1$ and to the functions F, G, and $(1-\theta)^{d_2(1-\theta)} \theta^{d_2\theta} H$, and
then use the Fubini-Tonelli theorem, we obtain (8.10).

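From tensorisation, we see that to prove the Prékopa-Leindler inequality
it suffices to do so in the one-dimensional case. We can derive this from
Lemma 8.5.1 by reversing the "Prékopa-Leindler implies Brunn-Minkowski"
argument given earlier, as follows. If (8.9) holds (in one dimension), then the
super-level sets $\{f > \lambda\}$, $\{g > \lambda\}$, $\{h > \lambda\}$ are related by the set-theoretic
inclusion
$\{h > \lambda\} \supset \{f > \lambda\} + \{g > \lambda\}$
and thus by Lemma 8.5.1 (for $\lambda$ below the suprema of f and g, which one can
normalise to be equal by homogeneity)
$\mu(\{h > \lambda\}) \geq \mu(\{f > \lambda\}) + \mu(\{g > \lambda\})$.
On the other hand, from the Fubini-Tonelli theorem one has the distributional
identity
$\int_{\mathbb{R}} h = \int_0^\infty \mu(\{h > \lambda\})\, d\lambda$
(and similarly for f, g), and thus
$\int_{\mathbb{R}} h \geq \int_{\mathbb{R}} f + \int_{\mathbb{R}} g$.
The claim then follows from the weighted arithmetic mean-geometric mean
inequality $(1-\theta)x + \theta y \geq x^{1-\theta} y^{\theta}$, applied with $x := \frac{1}{1-\theta}\int_{\mathbb{R}} f$ and
$y := \frac{1}{\theta}\int_{\mathbb{R}} g$.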
In this section we will make the simple observation (which appears in
[LeMa2005] in the case of the Heisenberg group, but may have also been
stated elsewhere in the literature) that the above argument carries through
without much difficulty to the nilpotent setting, to give a nilpotent
Brunn-Minkowski inequality:
Theorem 8.5.3 (Nilpotent Brunn-Minkowski). Let G be a connected, simply
connected nilpotent Lie group of (topological) dimension d, and let A, B
be bounded open subsets of G. Let $\mu$ be a Haar measure on G (note that
nilpotent groups are unimodular, so there is no distinction between left and
right Haar measure). Then
(8.11)    $\mu(A \cdot B)^{1/d} \geq \mu(A)^{1/d} + \mu(B)^{1/d}$.
Here of course $A \cdot B := \{ab : a \in A, b \in B\}$ is the product set of A and B.
Indeed, by repeating the previous arguments, the nilpotent Brunn-Minkowski
inequality will follow from
Theorem 8.5.4 (Nilpotent Prékopa-Leindler inequality). Let G be a connected,
simply connected nilpotent Lie group of topological dimension d with
a Haar measure $\mu$. Let $0 < \theta < 1$, and let $f, g, h : G \to \mathbb{R}$ be non-negative
measurable functions obeying the inequality
(8.12)    $h(xy) \geq f(x)^{1-\theta} g(y)^{\theta}$
for all $x, y \in G$. Then we have
(8.13)    $\int_G h\, d\mu \geq \frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}} \left(\int_G f\, d\mu\right)^{1-\theta} \left(\int_G g\, d\mu\right)^{\theta}$.
To prove the nilpotent Prékopa-Leindler inequality, the key observation
is that this inequality not only tensorises; it splits with respect to short
exact sequences. Indeed, suppose one has a short exact sequence
$0 \to K \to G \to H \to 0$
of connected, simply connected nilpotent Lie groups. The adjoint action
of the connected group G on K acts nilpotently on the Lie algebra of K
and is thus unimodular. Because of this, we can split a Haar measure $\mu_G$
on G into Haar measures $\mu_K$, $\mu_H$ on K, H respectively so that we have the
Fubini-Tonelli formula
$\int_G f(g)\, d\mu_G(g) = \int_H F(h)\, d\mu_H(h)$
for any measurable $f : G \to \mathbb{R}^+$, where $F(h)$ is defined by the formula
$F(h) := \int_K f(kg)\, d\mu_K(k) = \int_K f(gk)\, d\mu_K(k)$
for any coset representative $g \in G$ of h (the choice of g is not important,
thanks to unimodularity of the conjugation action). It is then not difficult
to repeat the proof of tensorisation (relying heavily on the unimodularity of
conjugation) to conclude that the nilpotent Prékopa-Leindler inequality for
H and K implies the Prékopa-Leindler inequality for G; we leave this as an
exercise to the interested reader.
Now if G is a connected, simply connected Lie group, then the abelianisation
$G/[G, G]$ is connected and simply connected and thus isomorphic to
a vector space. This implies that $[G, G]$ is a retract of G and is thus also
connected and simply connected. From this and an induction on the step of
the nilpotent group, we see that the nilpotent Prékopa-Leindler inequality
follows from the abelian case, which we have already established in Theorem
8.5.2.
Remark 8.5.5. Some connected, simply connected nilpotent groups G (and
specifically, the Carnot groups) can be equipped with a one-parameter family
of dilations $x \mapsto \lambda \cdot x$, which are a family of automorphisms on G, which
dilate the Haar measure by the formula
$\mu(\lambda \cdot x) = \lambda^D \mu(x)$
for an integer D, called the homogeneous dimension of G, which is typically
larger than the topological dimension. For instance, in the case of the
Heisenberg group
$G := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}$,
which has topological dimension $d = 3$, the natural family of dilations is
given by
$\lambda : \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & \lambda x & \lambda^2 z \\ 0 & 1 & \lambda y \\ 0 & 0 & 1 \end{pmatrix}$
with homogeneous dimension $D = 4$. Because the two notions d, D of dimension
are usually distinct in the nilpotent case, it is no longer helpful to try to
use these dilations to simplify the proof of the Brunn-Minkowski inequality,
in contrast to the Euclidean case. This is why we avoided using dilations in
the preceding discussion. It is natural to wonder whether one could replace
d by D in (8.11), but it can be easily shown that the exponent d is best
possible (an observation that essentially appeared first in [Mo2003]). Indeed,
working in the Heisenberg group for sake of concreteness, consider the set
$A := \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq N, |z| \leq N^{10} \right\}$
for some large parameter N. This set has measure $8N^{12}$ using the standard
Haar measure $dx\,dy\,dz$ on G. The product set $A \cdot A$ is contained in
$\left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq 2N, |z| \leq 2N^{10} + O(N^2) \right\}$
and thus has measure at most $64N^{12} + O(N^4) = (8 + o(1))\mu(A)$. This already shows that
the exponent in (8.11) cannot be improved beyond $d = 3$; note that the
homogeneous dimension $D = 4$ is making its presence known in the $O(N^4)$
term in the measure of $A \cdot A$, but this is a lower order term only.
It is somewhat unfortunate that the nilpotent Brunn-Minkowski inequality
is adapted to the topological dimension rather than the homogeneous one,
because it means that some of the applications of the inequality (such as
the application to isoperimetric inequalities mentioned at the start of the
section) break down.[3]
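The containment of the product set in Remark 8.5.5 can be illustrated by random sampling. In this sketch a Heisenberg matrix with entries x, y, z is encoded as the triple (x, y, z), and Haar measure is taken to be $dx\,dy\,dz$ (an assumption about the normalisation); the helper name `heis_mul`, the seed, and the choice N = 10 are all arbitrary.

```python
import random

def heis_mul(p, q):
    """Product of [[1,x1,z1],[0,1,y1],[0,0,1]] and [[1,x2,z2],[0,1,y2],[0,0,1]]."""
    x1, y1, z1 = p
    x2, y2, z2 = q
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)

N = 10
random.seed(0)

def sample_A():
    # integer points of A = {|x|, |y| <= N, |z| <= N^10}
    return (random.randint(-N, N), random.randint(-N, N),
            random.randint(-N**10, N**10))

products = [heis_mul(sample_A(), sample_A()) for _ in range(10_000)]
inside_box = all(abs(x) <= 2 * N and abs(y) <= 2 * N
                 and abs(z) <= 2 * N**10 + N**2 for x, y, z in products)

vol_A = (2 * N) * (2 * N) * (2 * N**10)               # 8 N^12
vol_box = (4 * N) * (4 * N) * (2 * (2 * N**10 + N**2))  # 64 N^12 + 32 N^4
print(inside_box, vol_box / vol_A)
```

The volume ratio is $8 + 4N^{-8}$, which stays close to $2^3$ and well below $2^4$, so no exponent larger than the topological dimension 3 can hold in (8.11) for this family of sets.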
Remark 8.5.6. The inequality can be extended to non-simply-connected
connected nilpotent groups G, if d is now set to the dimension of the largest
simply connected quotient of G. It seems to me that this is the best one can
do in general; for instance, if G is a torus, then the inequality fails for any
d > 0, as can be seen by setting A = B = G.
Remark 8.5.7. Specialising the nilpotent Brunn-Minkowski inequality to
the case $A = B$, we conclude that
$\mu(A \cdot A) \geq 2^d \mu(A)$.
This inequality actually has a much simpler proof (attributed to Tsachik
Gelander in [Hr2012]): one can show that for a connected, simply connected
nilpotent Lie group G, the exponential map $\exp : \mathfrak{g} \to G$ is a measure-preserving
homeomorphism, for some choice of Haar measure $\mu_{\mathfrak{g}}$ on $\mathfrak{g}$, so it suffices to
show that
$\mu_{\mathfrak{g}}(\log(A \cdot A)) \geq 2^d \mu_{\mathfrak{g}}(\log A)$.
But $A \cdot A$ contains all the squares $\{a^2 : a \in A\}$ of A, so $\log(A \cdot A)$ contains
the isotropic dilation $2 \cdot \log A$, and the claim follows. Note that if we set
A to be a small ball around the origin, we can modify this argument to
give another demonstration of why the topological dimension d cannot be
replaced with any larger exponent in (8.11).
One may tentatively conjecture that the inequality $\mu(A \cdot A) \geq 2^d \mu(A)$
in fact holds in all unimodular connected, simply connected Lie groups G,
and all bounded open subsets A of G; I do not know if this bound is always
true, however.

[3] Indeed, the topic of isoperimetric inequalities for the Heisenberg group is a subtle one, with
many naive formulations of the inequality being false. See [Mo2003] for more discussion.

Chapter 9

Dynamics

9.1. The Furstenberg recurrence theorem and finite extensions

In [Fu1977], Furstenberg established his celebrated multiple recurrence theorem:
Theorem 9.1.1 (Furstenberg multiple recurrence). Let $(X, \mathcal{B}, \mu, T)$ be a
measure-preserving system, thus $(X, \mathcal{B}, \mu)$ is a probability space and $T : X \to X$
is a measure-preserving bijection such that T and $T^{-1}$ are both measurable.
Let E be a measurable subset of X of positive measure $\mu(E) > 0$. Then
for any $k \geq 1$, there exists $n > 0$ such that
$E \cap T^{-n} E \cap \ldots \cap T^{-(k-1)n} E \neq \emptyset$.
Equivalently, there exists $n > 0$ and $x \in X$ such that
$x, T^n x, \ldots, T^{(k-1)n} x \in E$.
As is well known, the Furstenberg multiple recurrence theorem is equivalent
to Szemerédi's theorem [Sz1975], thanks to the Furstenberg correspondence
principle; see for instance [Ta2009, §2.10].
The multiple recurrence theorem is proven, roughly speaking, by an
induction on the complexity of the system $(X, \mathcal{X}, \mu, T)$. Indeed, for very
simple systems, such as periodic systems (in which $T^n$ is the identity for some
$n > 0$, which is for instance the case for the circle shift $X = \mathbb{R}/\mathbb{Z}$, $Tx := x + \alpha$
with a rational shift $\alpha$), the theorem is trivial; at a slightly more advanced
level, the theorem for almost periodic (or compact) systems (in which
$\{T^n f : n \in \mathbb{Z}\}$ is a precompact subset of $L^2(X)$ for every $f \in L^2(X)$, which is for instance
the case for irrational circle shifts) is also quite easy. One then shows
that the multiple recurrence property is preserved under various extension
operations (specifically, compact extensions, weakly mixing extensions, and
limits of chains of extensions), which then gives the multiple recurrence
theorem as a consequence of the Furstenberg-Zimmer structure theorem for
measure-preserving systems. See [Ta2009, §2.15] for further discussion.
From a high-level perspective, this is still one of the most conceptual
proofs known of Szemerédi's theorem. However, the individual components
of the proof are still somewhat intricate. Perhaps the most difficult step
is the demonstration that the multiple recurrence property is preserved under
compact extensions; see for instance [Ta2009, §2.13], which is devoted
entirely to this step. This step requires quite a bit of measure-theoretic
and/or functional analytic machinery, such as the theory of disintegrations,
relatively almost periodic functions, or Hilbert modules.
However, I recently realised that there is a special case of the compact
extension step - namely that of finite extensions - which avoids almost all of
these technical issues while still capturing the essence of the argument (and
in particular, the key idea of using van der Waerden's theorem [vdW1927]).
As such, this may serve as a pedagogical device for motivating this step of
the proof of the multiple recurrence theorem.
Let us first explain what a finite extension is. Given a measure-preserving
system $X = (X, \mathcal{X}, \mu, T)$, a finite set Y, and a measurable map $\rho : X \to \mathrm{Sym}(Y)$
from X to the permutation group of Y, one can form the finite
extension
$X \ltimes Y = (X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, S)$,
which as a probability space is the product of $(X, \mathcal{X}, \mu)$ with the finite
probability space $Y = (Y, \mathcal{Y}, \nu)$ (with the discrete $\sigma$-algebra and uniform
probability measure), and with shift map
(9.1)    $S(x, y) := (Tx, \rho(x) y)$.
One easily verifies that this is indeed a measure-preserving system. We refer
to $\rho$ as the cocycle of the system.
An example of finite extensions comes from group theory. Suppose we
have a short exact sequence
$0 \to K \to G \to H \to 0$
of finite groups. Let g be a group element of G, and let h be its projection
in H. Then the shift map $x \mapsto gx$ on G (with the discrete $\sigma$-algebra and
uniform probability measure) can be viewed as a finite extension of the
shift map $y \mapsto hy$ on H (again with the discrete $\sigma$-algebra and uniform
probability measure), by arbitrarily selecting a section $\phi : H \to G$ that
inverts the projection map, identifying G with $H \times K$ by identifying $\phi(y)k$
with $(y, k)$ for $y \in H$, $k \in K$, and using the cocycle
$\rho(y) := \phi(hy)^{-1} g \phi(y)$,
which takes values in K and acts on K by left multiplication (so that
$g \phi(y) k = \phi(hy) \rho(y) k$).
Thus, for instance, the unit shift $x \mapsto x + 1$ on $\mathbb{Z}/N\mathbb{Z}$ can be thought of as
a finite extension of the unit shift $x \mapsto x + 1$ on $\mathbb{Z}/M\mathbb{Z}$ whenever N is a
multiple of M.
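The cyclic-group example can be made completely explicit. The sketch below uses additive notation, takes the section that sends a residue y mod M to its representative in {0, ..., M-1} (one arbitrary choice among many), and represents K as the multiples of M inside $\mathbb{Z}/N\mathbb{Z}$; the function names `split`, `cocycle` and `extension_shift` are hypothetical.

```python
def split(x, N, M):
    """Identify x in Z/NZ with (y, k): y = x mod M in the base, k = x - y in K = MZ/NZ."""
    y = x % M
    return (y, (x - y) % N)

def cocycle(y, M):
    """rho(y) = -phi(y + 1) + 1 + phi(y), with phi(y) the representative in {0, ..., M-1}."""
    return ((y % M) + 1) - ((y + 1) % M)

def extension_shift(y, k, N, M):
    """The shift S(y, k) = (y + 1, rho(y) + k) of (9.1), in additive notation."""
    return ((y + 1) % M, (k + cocycle(y, M)) % N)

N, M = 12, 4
consistent = all(extension_shift(*split(x, N, M), N, M) == split((x + 1) % N, N, M)
                 for x in range(N))
print(consistent)  # prints: True
```

With this section the cocycle vanishes except at y = M - 1, where it equals M: the fibre coordinate only advances when the base coordinate wraps around.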
Another example comes from Riemannian geometry. If M is a Riemannian manifold that is a finite cover of another Riemannian manifold N (with
the metric on M being the pullback of that on N ), then (unit time) geodesic
flow on the cosphere bundle of M is a finite extension of the corresponding
flow on N .
Here, then, is the finite extension special case of the compact extension
step in the proof of the multiple recurrence theorem:
Proposition 9.1.2 (Finite extensions). Let $X \ltimes_\rho Y$ be a finite extension of a measure-preserving system $X$. If $X$ obeys the conclusion of the Furstenberg multiple recurrence theorem, then so does $X \ltimes_\rho Y$.
Before we prove this proposition, let us first give the combinatorial analogue.
Lemma 9.1.3. Let $A$ be a subset of the integers that contains arbitrarily long arithmetic progressions, and let $A = A_1 \cup \dots \cup A_M$ be a colouring of $A$ by $M$ colours (or equivalently, a partition of $A$ into $M$ colour classes $A_i$). Then at least one of the $A_i$ contains arbitrarily long arithmetic progressions.
Proof. By the infinite pigeonhole principle, it suffices to show that for each $k \geq 1$, one of the colour classes $A_i$ contains an arithmetic progression of length $k$.
Let $N$ be a large integer (depending on $k$ and $M$) to be chosen later. Then $A$ contains an arithmetic progression of length $N$, which may be identified with $\{0, \dots, N-1\}$. The colouring of $A$ then induces a colouring on $\{0, \dots, N-1\}$ into $M$ colour classes. Applying (the finitary form of) van der Waerden's theorem [vdW1927], we conclude that if $N$ is sufficiently large depending on $M$ and $k$, then one of these colouring classes contains an arithmetic progression of length $k$; undoing the identification, we conclude that one of the $A_i$ contains an arithmetic progression of length $k$, as desired.

Of course, by specialising to the case $A = \mathbb{Z}$, we see that the above Lemma is in fact equivalent to van der Waerden's theorem.
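For very small parameters, the finitary form of van der Waerden's theorem used in this proof can be confirmed by exhaustive search. The sketch below is illustrative only (the function names are hypothetical and the search is exponential in $N$): it checks every colouring of $\{0, \dots, N-1\}$ by $M$ colours for a monochromatic progression of length $k$. For two colours and $k = 3$, the known threshold is $N = 9$.

```python
from itertools import product

def has_mono_ap(colouring, k):
    """Return True if some colour class of {0,...,N-1} contains a k-term progression."""
    N = len(colouring)
    for a in range(N):                  # first term of the progression
        for r in range(1, N):           # common difference r > 0
            if a + (k - 1) * r >= N:    # progression would leave {0,...,N-1}
                break
            if all(colouring[a + i * r] == colouring[a] for i in range(k)):
                return True
    return False

def every_colouring_has_ap(N, M, k):
    """Brute-force check over all M**N colourings of {0,...,N-1}."""
    return all(has_mono_ap(c, k) for c in product(range(M), repeat=N))
```

Here `every_colouring_has_ap(9, 2, 3)` examines all 512 two-colourings of nine points, while the colouring `(0, 0, 1, 1, 0, 0, 1, 1)` shows that eight points do not suffice.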
Now we prove Proposition 9.1.2.


Proof. Fix $k$. Let $E$ be a positive measure subset of $X \ltimes_\rho Y = (X \times Y, \mathcal{X} \times \mathcal{Y}, \mu \times \nu, S)$. By Fubini's theorem, we have
$$(\mu \times \nu)(E) = \int_X f(x)\, d\mu(x)$$
where $f(x) := \nu(E_x)$ and $E_x := \{y \in Y : (x, y) \in E\}$ is the fibre of $E$ at $x$. Since $(\mu \times \nu)(E)$ is positive, we conclude that the set
$$F := \{x \in X : f(x) > 0\} = \{x \in X : E_x \neq \emptyset\}$$
is a positive measure subset of $X$. Note that for each $x \in F$, we can find an element $g(x) \in Y$ such that $(x, g(x)) \in E$. While not strictly necessary for this argument, one can ensure if one wishes that the function $g$ is measurable by totally ordering $Y$, and then letting $g(x)$ be the minimal element of $Y$ for which $(x, g(x)) \in E$.
Let $N$ be a large integer (which will depend on $k$ and the cardinality $M$ of $Y$) to be chosen later. Because $X$ obeys the multiple recurrence theorem, we can find a positive integer $n$ and $x \in X$ such that
$$x, T^n x, T^{2n} x, \dots, T^{(N-1)n} x \in F.$$
Now consider the sequence of $N$ points
$$S^{-mn}(T^{mn} x, g(T^{mn} x))$$
for $m = 0, \dots, N-1$. From (9.1), we see that
$$S^{-mn}(T^{mn} x, g(T^{mn} x)) = (x, c(m)) \tag{9.2}$$
for some sequence $c(0), \dots, c(N-1) \in Y$. This can be viewed as a colouring of $\{0, \dots, N-1\}$ by $M$ colours, where $M$ is the cardinality of $Y$. Applying van der Waerden's theorem, we conclude (if $N$ is sufficiently large depending on $k$ and $|Y|$) that there is an arithmetic progression $a, a + r, \dots, a + (k-1)r$ in $\{0, \dots, N-1\}$ with $r > 0$ such that
$$c(a) = c(a + r) = \dots = c(a + (k-1)r) = c$$
for some $c \in Y$. If we then let $y = (x, c)$, we see from (9.2) that
$$S^{an + irn} y = (T^{(a+ir)n} x, g(T^{(a+ir)n} x)) \in E$$
for all $i = 0, \dots, k-1$, and the claim follows.

Remark 9.1.4. The precise connection between Lemma 9.1.3 and Proposition 9.1.2 arises from the following observation: with $E, F, g$ as in the proof of Proposition 9.1.2, and $x \in X$, the set
$$A := \{n \in \mathbb{Z} : T^n x \in F\}$$
can be partitioned into the classes
$$A_i := \{n \in \mathbb{Z} : S^n (x, i) \in E'\}$$
indexed by $i \in Y$, where $E' := \{(x, g(x)) : x \in F\} \subset E$ is the graph of $g$. The multiple recurrence property for $X$ ensures that $A$ contains arbitrarily long arithmetic progressions, and so therefore one of the $A_i$ must also, which gives the multiple recurrence property for the extension $X \ltimes_\rho Y$.
Remark 9.1.5. Compact extensions can be viewed as a generalisation of finite extensions, in which the fibres are no longer finite sets, but are themselves measure spaces obeying an additional property, which roughly speaking asserts that for many functions $f$ on the extension, the shifts $T^n f$ of $f$ behave in an almost periodic fashion on most fibres, so that the orbits $T^n f$ become totally bounded on each fibre. This total boundedness allows one to obtain an analogue of the above colouring map $c(\cdot)$ to which van der Waerden's theorem can be applied.

9.2. Rohlin's problem


Let $G = (G, +)$ be an abelian countable discrete group. A measure-preserving $G$-system $X = (X, \mathcal{X}, \mu, (T_g)_{g \in G})$ (or $G$-system for short) is a probability space $(X, \mathcal{X}, \mu)$, equipped with a measure-preserving action $T_g : X \to X$ of the group $G$, thus
$$\mu(T_g(E)) = \mu(E)$$
for all $E \in \mathcal{X}$ and $g \in G$, and
$$T_g T_h = T_{g+h}$$
for all $g, h \in G$, with $T_0$ equal to the identity map. Classically, ergodic theory has focused on the cyclic case $G = \mathbb{Z}$ (in which the $T_g$ are iterates of a single map $T = T_1$, with elements of $G$ being interpreted as a time parameter), but one can certainly consider actions of other groups $G$ also (including continuous or non-abelian groups).
A $G$-system is said to be strongly 2-mixing, or strongly mixing for short, if one has
$$\lim_{g \to \infty} \mu(A \cap T_g B) = \mu(A)\mu(B)$$
for all $A, B \in \mathcal{X}$, where the convergence is with respect to the one-point compactification of $G$ (thus, for every $\varepsilon > 0$, there exists a compact (hence finite) subset $K$ of $G$ such that $|\mu(A \cap T_g B) - \mu(A)\mu(B)| \leq \varepsilon$ for all $g \notin K$).
Similarly, we say that a $G$-system is strongly 3-mixing if one has
$$\lim_{g, h, h-g \to \infty} \mu(A \cap T_g B \cap T_h C) = \mu(A)\mu(B)\mu(C)$$
for all $A, B, C \in \mathcal{X}$, thus for every $\varepsilon > 0$, there exists a finite subset $K$ of $G$ such that
$$|\mu(A \cap T_g B \cap T_h C) - \mu(A)\mu(B)\mu(C)| \leq \varepsilon$$
whenever $g, h, h - g$ all lie outside $K$.


It is obvious that a strongly 3-mixing system is necessarily strongly 2-mixing. In the case of $\mathbb{Z}$-systems, it has been an open problem for some time, due to Rohlin [Ro1949], whether the converse is true:
Problem 9.2.1 (Rohlin's problem). Is every strongly mixing $\mathbb{Z}$-system necessarily strongly 3-mixing?
This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput's inequality) shows that every strongly mixing system is weakly 3-mixing, which roughly speaking means that $\mu(A \cap T_g B \cap T_h C)$ converges to $\mu(A)\mu(B)\mu(C)$ for most $g, h \in \mathbb{Z}$. Indeed, every weakly mixing system is in fact weakly mixing of all orders; see for instance [Ta2009, §2.10]. So the problem is to exclude the possibility of correlation between $A$, $T_g B$, and $T_h C$ for a small but non-trivial number of pairs $(g, h)$.
It is also known that the answer to Rohlin's problem is affirmative for rank one transformations [Ka1984] and for shifts with purely singular continuous spectrum [Ho1991] (note that strongly mixing systems cannot have any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.
In the other direction, Rohlin's problem is known to have a negative answer for $\mathbb{Z}^2$-systems, by a well-known counterexample of Ledrappier [Le1978] which can be described as follows. One can view a $\mathbb{Z}^2$-system as being essentially equivalent to a stationary process $(x_{n,m})_{(n,m) \in \mathbb{Z}^2}$ of random variables $x_{n,m}$ in some range space $\Omega$ indexed by $\mathbb{Z}^2$, with $X$ being $\Omega^{\mathbb{Z}^2}$ with the obvious shift map
$$T_{(g,h)} (x_{n,m})_{(n,m) \in \mathbb{Z}^2} := (x_{n-g,m-h})_{(n,m) \in \mathbb{Z}^2}.$$
In Ledrappier's example, the $x_{n,m}$ take values in the finite field $\mathbb{F}_2$ of two elements, and are selected uniformly at random subject to the "Pascal's triangle" linear constraints
$$x_{n,m} = x_{n-1,m} + x_{n,m-1}.$$
A routine application of the Kolmogorov extension theorem (see e.g. [Ta2011, §1.7]) allows one to build such a process. The point is that due to the properties of Pascal's triangle modulo 2 (known as Sierpinski's triangle), one has
$$x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k}$$
for all powers of two $2^k$. This is enough to destroy strong 3-mixing, because it shows a strong correlation between $x$, $T_{(2^k,0)} x$, and $T_{(0,2^k)} x$ for arbitrarily large $k$ and randomly chosen $x \in X$. On the other hand, one can still show that $x$ and $T_g x$ are asymptotically uncorrelated for large $g$, giving strong 2-mixing. Unfortunately, there are significant obstructions to converting Ledrappier's example from a $\mathbb{Z}^2$-system to a $\mathbb{Z}$-system, as pointed out in [de2006].
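The passage from the Pascal's triangle constraint to its $2^k$-fold version is easy to test numerically. The following sketch is only a finite caricature of Ledrappier's process, under stated assumptions: instead of a stationary process on all of $\mathbb{Z}^2$, it fills a finite quadrant from random boundary values (a simplification introduced here), and the function name and parameters are hypothetical. It then checks the identity at every point far enough from the boundary for the constraint to have been applied $2^k$ times.

```python
import random

def check_pascal_identity(size=40, k_max=4, seed=0):
    """Check x[n][m] = x[n-2^k][m] + x[n][m-2^k] (mod 2) on a finite quadrant."""
    rng = random.Random(seed)
    x = [[0] * size for _ in range(size)]
    # Random boundary data on the two axes (an assumption of this sketch).
    for i in range(size):
        x[i][0] = rng.randrange(2)
        x[0][i] = rng.randrange(2)
    # Propagate the constraint x[n][m] = x[n-1][m] + x[n][m-1] over F_2.
    for n in range(1, size):
        for m in range(1, size):
            x[n][m] = (x[n - 1][m] + x[n][m - 1]) % 2
    # Iterating the constraint 2^k times kills all cross terms modulo 2.
    for k in range(k_max + 1):
        s = 2 ** k
        for n in range(s, size):
            for m in range(s, size):
                if x[n][m] != (x[n - s][m] + x[n][m - s]) % 2:
                    return False
    return True
```

The check succeeds for every seed because the binomial coefficients $\binom{2^k}{j}$ are even for $0 < j < 2^k$; the random boundary only serves to avoid testing a degenerate configuration.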
In this section, I would like to record a finite field variant of Ledrappier's construction, in which $\mathbb{Z}^2$ is replaced by the function field ring $\mathbb{F}_3[t]$, which is a dyadic (or more precisely, triadic) model for the integers (cf. [Ta2008, §1.6]). In other words:
Theorem 9.2.2. There exists a $\mathbb{F}_3[t]$-system that is strongly 2-mixing but not strongly 3-mixing.
The idea is much the same as that of Ledrappier; one builds a stationary $\mathbb{F}_3[t]$-process $(x_n)_{n \in \mathbb{F}_3[t]}$ in which $x_n \in \mathbb{F}_3$ are chosen uniformly at random subject to the constraints
$$x_n + x_{n+t^k} + x_{n+2t^k} = 0 \tag{9.3}$$
for all $n \in \mathbb{F}_3[t]$ and all $k \geq 0$. Again, this system is manifestly not strongly 3-mixing, but can be shown to be strongly 2-mixing; I give details below.
As I discussed in [Ta2008, §1.6], in many cases the dyadic model serves
as a good guide for the non-dyadic model. However, in this case there is
a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting;
once one restores the Archimedean nature of the underlying group, the constraints (9.3) not only reinforce each other strongly, but also force so much
linearity on the system that one loses the strong mixing property.
9.2.1. The example. Let $B$ be any ball in $\mathbb{F}_3[t]$, i.e. any set of the form $\{n \in \mathbb{F}_3[t] : \deg(n - n_0) \leq K\}$ for some $n_0 \in \mathbb{F}_3[t]$ and $K \geq 0$. One can then create a process $x_B = (x_n)_{n \in B}$ adapted to this ball, by declaring $(x_n)_{n \in B}$ to be uniformly distributed in the vector space $V_B \subseteq \mathbb{F}_3^B$ of all tuples with coefficients in $\mathbb{F}_3$ that obey (9.3) for all $n \in B$ and $k \leq K$. Because any translate of a line $(n, n + t^k, n + 2t^k)$ is still a line, we see that this process is stationary with respect to all shifts $n \mapsto n + g$ of degree $\deg(g)$ at most $K$. Also, if $B \subset B'$ are nested balls, we see that the vector space $V_{B'}$ projects surjectively via the restriction map to $V_B$ (since any tuple obeying (9.3) in $B$ can be extended periodically to one obeying (9.3) in $B'$). As such, we see that the process $x_B$ is equivalent in distribution to the restriction $x_{B'} \downharpoonright_B$ of $x_{B'}$ to $B$. Applying the Kolmogorov extension theorem, we conclude that there exists an infinite process $x = (x_n)_{n \in \mathbb{F}_3[t]}$ whose restriction $x \downharpoonright_B$ to any ball $B$ has the distribution of $x_B$. As each $x_B$ was stationary with respect to translations that preserved $B$, we see that the full process $x$ is stationary with respect to the entire group $\mathbb{F}_3[t]$.


Now let $B$ be a ball
$$B := \{n \in \mathbb{F}_3[t] : \deg(n - n_0) \leq K\},$$
which we divide into three equally sized sub-balls $B_0, B_1, B_2$ by the formula
$$B_i := \{n \in \mathbb{F}_3[t] : \deg(n - (n_0 + i t^K)) \leq K - 1\}.$$
By construction, we see that
$$V_B = \{(x_{B_0}, x_{B_1}, x_{B_2}) : x_{B_0}, x_{B_1}, x_{B_2} \in V_{B_0};\ x_{B_0} + x_{B_1} + x_{B_2} = 0\}$$
where we use translation by $t^K$ to identify $V_{B_0}$, $V_{B_1}$, and $V_{B_2}$ together. As a consequence, we see that the projection map $(x_{B_0}, x_{B_1}, x_{B_2}) \mapsto (x_{B_0}, x_{B_1})$ from $V_B$ to $V_{B_0} \times V_{B_0}$ is surjective, and this implies that the random variables $x \downharpoonright_{B_0}$, $x \downharpoonright_{B_1}$ are independent. More generally, this argument implies that for any disjoint balls $B, B'$, the random variables $x \downharpoonright_B$ and $x \downharpoonright_{B'}$ are independent.
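The smallest non-trivial case of this decomposition can be verified by brute force. The sketch below is illustrative only and rests on assumptions made here: it takes $K = 1$ and $n_0 = 0$, represents $n = a_0 + a_1 t$ by the pair $(a_0, a_1)$, and uses hypothetical names. It enumerates every tuple on the nine-point ball that obeys (9.3) for $k = 0, 1$, and then counts the distinct pairs of restrictions to the sub-balls $B_0$ and $B_1$.

```python
from itertools import product

F3 = range(3)

def constrained_tuples():
    """Enumerate V_B for the ball B = {a0 + a1*t : a0, a1 in F_3} (K = 1, n0 = 0)."""
    points = [(a0, a1) for a1 in F3 for a0 in F3]
    solutions = []
    for values in product(F3, repeat=len(points)):
        x = dict(zip(points, values))
        # k = 0 in (9.3): the points n, n+1, n+2 share a1 and exhaust a0.
        k0_ok = all(sum(x[a0, a1] for a0 in F3) % 3 == 0 for a1 in F3)
        # k = 1 in (9.3): the points n, n+t, n+2t share a0 and exhaust a1.
        k1_ok = all(sum(x[a0, a1] for a1 in F3) % 3 == 0 for a0 in F3)
        if k0_ok and k1_ok:
            solutions.append(x)
    return solutions

def restriction(x, i):
    """Restrict the tuple x to the sub-ball B_i = {a0 + i*t : a0 in F_3}."""
    return tuple(x[a0, i] for a0 in F3)

solutions = constrained_tuples()
pairs = {(restriction(x, 0), restriction(x, 1)) for x in solutions}
```

In this case $V_{B_0}$ has nine elements, so surjectivity onto $V_{B_0} \times V_{B_0}$ amounts to `pairs` having $81$ elements; since `solutions` also has $81$ elements, the uniform measure on $V_B$ pushes forward to the uniform measure on pairs, which is the claimed independence.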
Now we can prove strong 2-mixing. Given any measurable event $A$ and any $\varepsilon > 0$, one can find a ball $B$ and a set $A'$ depending only on $x \downharpoonright_B$ such that $A$ and $A'$ differ by at most $\varepsilon$ in measure. On the other hand, for $g$ outside of $B - B$, $A'$ and $T_g A'$ are determined by the restrictions of $x$ to disjoint balls and are thus independent. In particular,
$$\mu(A' \cap T_g A') = \mu(A')^2$$
and thus
$$\mu(A \cap T_g A) = \mu(A)^2 + O(\varepsilon)$$
which gives strong 2-mixing.
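On the other hand, we have $x_0 + x_{t^k} + x_{2t^k} = 0$ almost surely, while each of $x_0, x_{t^k}, x_{2t^k}$ is uniformly distributed in $\mathbb{F}_3$, and they are pairwise independent. In particular, if $E$ is the event that $x_0 = 0$, we see that
$$\mu(E) = 1/3$$
and
$$\mu(E \cap T_{t^k} E \cap T_{2t^k} E) = 1/9,$$
whereas strong 3-mixing would force this quantity to converge to $\mu(E)^3 = 1/27$ as $k \to \infty$. This shows that strong 3-mixing fails.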
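The arithmetic behind these two values can be reproduced by enumerating the joint law of the three variables. In this sketch (with hypothetical names), the triple $(x_0, x_{t^k}, x_{2t^k})$ is taken to be uniform on the nine elements of $\mathbb{F}_3^3$ that sum to zero, which is what uniformity, pairwise independence and the linear constraint together imply.

```python
from fractions import Fraction
from itertools import product

# All values of (x_0, x_{t^k}, x_{2t^k}) allowed by the linear constraint.
triples = [t for t in product(range(3), repeat=3) if sum(t) % 3 == 0]

def prob(event):
    """Probability of an event under the uniform law on the allowed triples."""
    return Fraction(sum(1 for t in triples if event(t)), len(triples))

p_single = prob(lambda t: t[0] == 0)                     # one coordinate vanishes
p_pair = prob(lambda t: t[0] == 0 and t[1] == 0)         # two coordinates vanish
p_triple = prob(lambda t: t[0] == t[1] == t[2] == 0)     # all three vanish
```

Pairwise independence shows up as `p_pair == p_single ** 2`, while the failure of 3-mixing shows up as `p_triple` being three times larger than `p_single ** 3`.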
Remark 9.2.3. In the Archimedean case $G = \mathbb{Z}$, a constraint such as $x_n + x_{n+1} + x_{n+2} = 0$ propagates itself to force complete linearity of $x_n$, which is highly incompatible with strong mixing; in contrast, in the non-Archimedean case $G = \mathbb{F}_3[t]$, such a constraint does not propagate very far. It is then tempting to relax this constraint, for instance by adopting an Ising-type model which penalises a configuration whenever quantities such as $x_n + x_{n+1} + x_{n+2}$ deviate from zero. However, to destroy strong 3-mixing, one needs infinitely many such penalisation terms, which roughly corresponds to an Ising model in an infinite-dimensional lattice. In such models, it seems difficult to find a way to set the temperature parameters in such a way that one has meaningful 3-correlations, without the system freezing up so much that 2-mixing fails. It is also tempting to try to truncate the constraints such as (9.3) to prevent their propagation, but it seems that any naive attempt to perform a truncation either breaks stationarity, or introduces enough periodicity into the system that 2-mixing breaks down. My tentative opinion on this problem is that a $\mathbb{Z}$-counterexample is constructible, but one would have to use a very delicate and finely tuned construction to achieve it.

Chapter 10

Miscellaneous


10.1. Worst movie polls


Every so often, one sees on the web some poll for the "worst X", where X is some form of popular entertainment; let's take X to be movies for sake of discussion.
Invariably, the results of these polls are somewhat disappointing; a "worst movie" list will often contain examples of bad movies, but with an arbitrary-seeming ranking, with many obviously bad movies missing from the list.
Of course, much of this can be ascribed to the highly subjective and variable nature of the tastes of those being polled, as well as the over-marketing of various mediocre but not exceptionally terrible movies. However, it turns out that even in an idealised situation in which all movie watchers use the same objective standard to rate movies, and where the success of each movie is determined solely by its quality, a "worst movie" poll will still often give totally inaccurate results.
Informally, the reason for this is that the truly bad movies, by their nature, are so unpopular that most people will not have watched them, and so they rarely even show up on the polls at all.
One can mathematically model this as follows. Let us say there are $N$ movies, ranked in order of highest quality to least. Suppose that the $k^{th}$ best movie has been watched by a proportion $p_k$ of the population. As we are assuming that movie success is determined by quality, we suppose that the $p_k$ are decreasing in $k$. A randomly selected member of the population thus has a probability $p_k$ of seeing the $k^{th}$ movie. In order to make the analysis tractable, we make the (unrealistic) assumption that these events of seeing the $k^{th}$ movie are independent in $k$.
As such, the probability that a given voter will rank movie $k$ as the worst movie (because he or she has seen that movie, but has not seen any worse movie) is
$$p_k (1 - p_{k+1}) \cdots (1 - p_N). \tag{10.1}$$
The winner of the poll should then be the movie which maximises the quantity (10.1).
One can solve this optimisation problem by assuming a power law
$$p_k \sim c k^{-\alpha}$$
for some parameters $c$ and $\alpha$, which typically are comparable to 1. It is an instructive exercise to optimise (10.1) using this law. What one finds is that the value of the exponent $\alpha$ becomes key. If $\alpha < 1$ (and $N$ is large), then (10.1) is maximised at $k = N$, and so in this case the poll should indeed rate the very worst movies at the top of their ranking.
If $\alpha > 1$, there is a surprising reversal; (10.1) is instead maximised for a value of $k$ which is bounded, $k = O(1)$. Basically, the poll now ranks the worst blockbuster movie, rather than the worst movie period; a mediocre but widely viewed movie will beat out a terrible but obscure movie.
Amusingly, according to Zipf's law, one expects $\alpha$ to be close to 1. As such, there is a critical phase transition (especially if the constant $c$ is also at the critical value of 1) and now one can anticipate the poll to more or less randomly select movies of any level of quality. So one can blame Zipf's law for the inaccuracy of "worst movie" polls.
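The two regimes of the exponent can be seen in a short numerical experiment. The sketch below is not a proof and makes assumptions beyond the text: it takes the power law to hold exactly, $p_k = c k^{-\alpha}$, requires $0 < c < 1$ so that every $p_k$ is a genuine probability below 1, and uses a hypothetical function name. It works with logarithms of (10.1) to avoid underflow for large $N$.

```python
import math

def worst_movie_winner(N, c, alpha):
    """Return the k in 1..N maximising p_k (1 - p_{k+1}) ... (1 - p_N) for p_k = c k^-alpha."""
    if not 0 < c < 1:
        raise ValueError("this sketch assumes 0 < c < 1, so that every p_k < 1")
    p = [c * k ** (-alpha) for k in range(1, N + 1)]
    best_k, best_log = None, -math.inf
    tail = 0.0  # log of the product of (1 - p_j) over j > k, starting at k = N
    for k in range(N, 0, -1):
        log_q = math.log(p[k - 1]) + tail
        if log_q > best_log:
            best_k, best_log = k, log_q
        tail += math.log1p(-p[k - 1])  # fold (1 - p_k) into the tail for smaller k
    return best_k
```

With the illustrative choices $N = 1000$ and $c = 0.5$, the exponent $\alpha = 0.5$ puts the winner at the very worst movie $k = N$, while $\alpha = 2$ puts it at $k = 1$, the most widely seen movie.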

10.2. Descriptive and prescriptive science


Broadly speaking, work in an academic discipline can be divided into descriptive$^1$ activity, which seeks to objectively describe the world we live in, and prescriptive activity, which is more subjective and seeks to define how the world ought to be interpreted.
However, the division between descriptive and prescriptive activity varies widely between fields (broadly corresponding to the distinction between "hard" and "soft" sciences). Mathematics, for instance, tends to focus almost entirely (in the short term, at least) on descriptive activity (e.g. determining the truth or falsity of various conjectures, solving problems, or proving theorems), although visionary (and prescriptivist) guidance (e.g. introducing a point of view, making an influential set of conjectures, identifying promising avenues of research, initiating a mathematical program, or finding the "right" definition for a mathematical concept, or the "right" set of axioms for a formal system) does play a vital role in the long-term development of the field.
The physical sciences are often presented to the public from a prescriptive standpoint, in that they are supposed to answer the question of why nature is the way we see it to be, and what causes a certain physical phenomenon to happen. However, in truth, many of the successful and tangible achievements of physics have come instead from the descriptive side of the field - finding out what the laws of nature are, and how specific physical systems will behave. The relationship between the prescriptive and descriptive sides of physics is roughly analogous to the relationship between causation and correlation in statistics; the latter can (and should) form a supporting foundation of evidence for the former, but an understanding of the latter does not necessarily entail a corresponding understanding of the former.
$^1$In some fields, "descriptive" and "prescriptive" are referred to as "positive" and "normative" respectively.
The prescriptive side of physics is extremely difficult to formalise properly, as one can see by the immense literature on philosophy of science; it is not easy at all to quantify the extent to which the answer to a "why?" or "what causes?" question is correct and intellectually satisfying.
In contrast, the descriptive side of physics, while perhaps less satisfying,
is at least somewhat easier to formalise (though it is not without its own set
of difficulties, such as the problem of defining precisely what a measurement
or observation is, and how to deal with errors in the measurements or in the
model). One way to do so is to take a computational complexity viewpoint,
and view descriptive physics as an effort to obtain increasingly good upper
bounds on the descriptive complexity (or Kolmogorov complexity) of the
universe, or more precisely on the set of observations that we can make in
the universe.
To give an example of this, consider a very simple set of observations, namely the orbital periods $T_1, \dots, T_6$ of the six classical planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn), and their distances $R_1, \dots, R_6$ to the Sun (ignoring for now the detail that the orbits are not quite circular, but are instead essentially elliptical). To describe this data set, one could perform$^2$ the relevant set of observations, and obtain a list of twelve numbers $T_1, \dots, T_6, R_1, \dots, R_6$, which form a complete description of this data set.
$^2$For this exercise, we will ignore the issue of possible inaccuracies in measurement, or in the implicit physical assumptions used to perform such a measurement.
On the other hand, if one is aware of Kepler's third law, one knows about the proportionality relationship
$$T_i^2 = c R_i^3$$
for some constant $c$ and all $i = 1, \dots, 6$. In that case, one can describe the entire data set by just seven numbers - the distances $R_1, \dots, R_6$ and the constant $c$ - together with Kepler's third law. This is a shorter description of the set, and so we have thus reduced the upper bound on the Kolmogorov complexity of the set. In this example, we have only shortened the length of the description by five numbers (minus the length required to state Kepler's law), but if one then adds in more planets and planet-like objects (e.g. asteroids, and also comets if one generalises Kepler's law to elliptical orbits), one sees the improvement in descriptive complexity become increasingly marked. In particular, the "one-time cost" of stating Kepler's law (and of stating the proportionality constant $c$) eventually becomes a negligible component of the total descriptive complexity, when the range of applicability of the law becomes large. This is in contrast to superficially similar proposed laws such as the Titius-Bode law, which was basically restricted to the six classical planets and thus provided only a negligible saving in descriptive complexity.
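The compression from twelve numbers to seven can be made concrete with a short computation. The figures below are approximate textbook values (distances in astronomical units, periods in years) supplied here as an assumption rather than taken from the text, and the function names are hypothetical. The sketch estimates $c$ from the six planets, reconstructs each period from its distance and $c$ alone, and applies the same constant to a distance outside the data set.

```python
# Approximate textbook values (an assumption of this sketch, not from the text):
# distance R to the Sun in astronomical units, orbital period T in years.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.86),
    "Saturn": (9.537, 29.46),
}

def fit_kepler_constant(data):
    """Estimate c in T^2 = c R^3 as the average of the ratios T^2 / R^3."""
    ratios = [T ** 2 / R ** 3 for R, T in data.values()]
    return sum(ratios) / len(ratios)

def predict_period(c, R):
    """Period predicted by Kepler's third law for a body at distance R."""
    return (c * R ** 3) ** 0.5

c = fit_kepler_constant(PLANETS)
# Seven numbers (R_1..R_6 and c) now reproduce the six periods approximately.
reconstructed = {name: predict_period(c, R) for name, (R, T) in PLANETS.items()}
```

In these units $c$ comes out close to 1, and `predict_period(c, 19.19)` gives roughly 84 years for a body at about the distance of Uranus, which was not among the six planets used for the fit.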
Note that Kepler's law introduces a new quantity, $c$, to the explanatory model of the universe. This quantity increases the descriptive complexity of the model by one number, but this increase is more than offset by the decrease (of six numbers, in the classical case) caused by the application of the law. Thus we see the somewhat unintuitive fact that one can simplify one's model of the universe by adding parameters to it. However, if one adds a gratuitously large number of such parameters to the model, then one can end up with a net increase in descriptive complexity, which is undesirable; this can be viewed as a formal manifestation of Occam's razor. For instance, if one had to add an ad hoc "fudge factor" $F_i$ to Kepler's law to make it work,
$$T_i^2 = c R_i^3 + F_i,$$
with $F_i$ being different for each planet, then the descriptive complexity of this model has in fact increased to thirteen numbers (e.g. one can specify $c$, $R_1, \dots, R_6$, and $F_1, \dots, F_6$), together with the fudged Kepler's law, leading to a model with worse complexity$^3$ than the initial model of simply stating all the twelve observables $T_1, \dots, T_6, R_1, \dots, R_6$.
Note also that the additional parameters (such as $c$) introduced by such a law were not initially present in the previous model of the data set, and can only be measured through the law itself. This can give the appearance of circularity - Kepler's law relates times and radii of planets using a constant $c$, but the constant $c$ can only be determined by applying Kepler's law. If there was only one planet in the data set, this law would indeed be circular (providing no new information on the orbital time and radius of the planet); but the power of the law comes from its uniform applicability among all planets. For instance, one can use data from the six classical planets to compute $c$, which can then be used to make predictions on, say, the orbital period of a newly discovered planet at a known distance to the sun. This may seem confusingly circular$^4$ from the prescriptive viewpoint - does the constant $c$ "cause" the relationship between period and distance, or vice versa? - but is perfectly consistent and useful from the descriptive viewpoint.
$^3$However, if this very same fudge factor $F_i$ also appeared in laws that involved other statistics of the planet, e.g. mass, radius, temperature, etc., then it can become possible again that such a law could act to decrease descriptive complexity when working with an enlarged data set that involves these statistics. Also, if the fudge factor is always small, then there is still some decrease in descriptive complexity coming from a saving in the most significant figures of the primary measurements $T_i$, $R_i$. So an analysis of an oversimplified data set, such as this one, can be misleading.
$^4$One could use mathematical manipulation to try to eliminate such unsightly constants, for instance replacing Kepler's law with the (mathematically equivalent) assertion that $T_i^2 / R_i^3 = T_j^2 / R_j^3$ for all $i, j$, but this tends to lead to mathematically uglier laws and also does not lead to any substantial saving in descriptive complexity.
Note also that with this descriptive approach to Kepler's law, absolutely nothing has been said about the causal origins of the law. Of course, we now know that Kepler's law can be mathematically deduced from Newton's law of gravitation (which has a far greater explanatory power, and thus achieves a far greater reduction in descriptive complexity, than Kepler's laws, due to its much wider range of applicability). From a prescriptive viewpoint, this can be viewed as a partial explanation of Kepler's law, reducing the question to that of understanding the causal origins of Newton's law. When viewed in isolation, this may not be regarded as much of a reduction, as one is simply replacing one unexplained law with another; but when one takes into account that Newton's laws of classical mechanics can be used to derive hundreds of previously known classical laws besides Kepler's law, we see that Newtonian mechanics did in fact achieve a substantial reduction in the number of unexplained laws in physics. Thus we see that descriptive science can be used to reduce the magnitude of problems one faces in prescriptive science, although it cannot by itself be used to solve these problems entirely.
In modern physics, of course, we model the universe to be extremely large, extremely old, and to have structure both at very fine scales and very large scales. At first glance, this seems to massively increase the descriptive complexity of this model, in defiance of Occam's razor. However, these scale parameters in our model were not chosen gratuitously, but were the natural and consistent consequence of extrapolating from the known observational data using the known laws of physics. All known rival models of the universe that are significantly smaller in scale in either time or space require either that a large fraction of observational data be arbitrarily invalidated, or that the known laws of physics acquire an ad hoc set of fudge factors that emerge in some range of physical scenarios but not in others (in particular, these factors need to somehow disappear in all scenarios that can be directly observed). Either of these two "fixes" ends up leading to a much larger descriptive complexity for the universe than the standard model.
In some cases, the additional parameters introduced by a model to reduce the descriptive complexity are in fact unphysical - they cannot be computed, even in principle, from observation and from the laws of the model. A simple example is that of the potential energy of an object in classical physics. Experiments (e.g. measuring the amount of work needed to alter the state of an object) can measure the difference between the potential energy of


an object in two different states, but cannot compute$^5$ the potential energy itself. Indeed, one could add a fixed constant to the potential energy of all the possible states of an object, and this would not alter any of the physical consequences of the model. Nevertheless, the presence of such unphysical quantities can serve to reduce the descriptive complexity of a model (or at least to reduce the mathematical complexity, by making it easier to compute with the model), and can thus be desirable from a descriptive viewpoint, even though they are unappealing from a prescriptive one.
It is also possible to use mathematical abstraction to reduce the number of unphysical quantities in a model; for instance, potential energy could be viewed not as a scalar, but instead as a more abstract torsor. Again, these mathematical manipulations do not fundamentally affect the physical consequences of the model.
$^5$Amusingly, in special relativity, the potential energy does actually become physically measurable, thanks to Einstein's famous equation $E = mc^2$, but this does not detract from the previous point. Other examples of non-physical quantities that are nevertheless descriptively useful include the wave function in quantum mechanics, or gauge fields in gauge theory.

10.3. Honesty and Bayesian probability

Suppose you are shopping for some item $X$. You find a vendor $V$ who is willing to sell $X$ to you at a good price. However, you do not know whether $V$ is honest (and thus selling you a genuine $X$), or dishonest (selling you a counterfeit $X$). How can one estimate the likelihood that $V$ is actually honest?
One can try to model this problem using Bayesian probability. One can assign a prior probability $p$ that $V$ is honest (based, perhaps, on how trustworthy $V$ looks, or on past experience with such vendors). However, one can update this prior probability $p$ based on contextual information, such as the nature of the deal $V$ is offering you, the way in which you got in contact with $V$, the venue in which $V$ is operating, and the past history of $V$ (or the brand that $V$ represents).
For instance, suppose $V$ is offering you $X$ at a remarkably low price $Y$ - one which is almost too good to be true. Specifically, this price might be so low that an honest vendor would find it very difficult to sell $X$ profitably at this price, whereas a dishonest vendor could more easily sell a counterfeit $X$ at the same price. Intuitively, this context should create a downward revision on one's probability estimate that $V$ is honest. Indeed, if we let $a$ be the conditional probability
$$a := \mathbf{P}(V \text{ sells at } Y \mid V \text{ is honest})$$
and $b$ be the probability
$$b := \mathbf{P}(V \text{ sells at } Y \mid V \text{ is dishonest})$$
then after a bit of computation using Bayes' theorem, we find that
$$\mathbf{P}(V \text{ is honest} \mid V \text{ sells at } Y) = \frac{ap}{ap + b(1 - p)}. \tag{10.2}$$
The right-hand side can be rearranged as
$$p - \frac{(b - a)p(1 - p)}{ap + b(1 - p)}.$$
Thus we do indeed see that if $b > a$, then the probability that $V$ is honest is revised downwards from $p$ (and conversely if $b < a$, then we revise the probability that $V$ is honest upwards).
In a similar fashion, if $V$ has invested in a substantial storefront presence, which would make it difficult (or at least expensive) for $V$ to quickly disappear in case of customer complaints about $X$, then the same analysis increases the probability that $V$ is honest, since it is unlikely that a dishonest vendor would make such an investment, instead preferring a more mobile "fly by night" operation. Or in the language of the above Bayesian analysis: the analogue of $a$ is large, and the analogue of $b$ is small.
One can also take $V$'s past sale history into account. Suppose that one knows that $V$ has already sold $N$ copies of $X$ without any known complaint. If we make the somewhat idealistic assumptions that an honest vendor would not cause any complaints, and each sale by a dishonest vendor has a probability $\varepsilon$ of causing a complaint (with the probability of complaint being independent from sale to sale), then in the notation of the previous analysis, we have $a = 1$ and $b = (1 - \varepsilon)^N$. As $N$ gets large, $b$ tends exponentially to zero, and this causes the posterior probability that $V$ is honest to tend exponentially to 1, as can be seen by the formula (10.2). This analysis can help explain the power of large corporate brands, which have a very long history of sales, and thus (assuming, of course, that their prior reputation is strong) have a significant advantage over smaller competitors in that consumers generally entrust them to guarantee a certain minimum level of quality. (Conversely, smaller businesses can take more risks, and can thus sometimes offer levels of quality significantly higher than that of a "safe" corporate brand.)
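The update rule (10.2) and the sales-history computation are simple enough to tabulate. The following sketch uses hypothetical function names and illustrative inputs (a prior of $1/2$ and a per-sale complaint probability of $0.1$) that are assumptions of this example, not figures from the text.

```python
def posterior_honest(p, a, b):
    """Formula (10.2): probability that V is honest given the observation.

    p is the prior probability that V is honest, while a and b are the
    probabilities of the observation for an honest and a dishonest vendor.
    """
    return a * p / (a * p + b * (1 - p))

def posterior_after_sales(p, eps, N):
    """Posterior after N complaint-free sales, using a = 1 and b = (1 - eps)^N."""
    return posterior_honest(p, 1.0, (1 - eps) ** N)
```

For example, `posterior_honest(0.5, 0.1, 0.9)` revises an even prior downwards because $b > a$, while `posterior_after_sales(0.5, 0.1, N)` increases towards 1 as $N$ grows. The formula is undefined when $ap + b(1-p) = 0$, that is, when the observation has probability zero under the model.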
A similar analysis can be applied to non-commercial settings, such as
the leak of some purportedly genuine document. If one has an anonymous
leak of only a single document, then it can be quite difficult to determine
whether the document is genuine or not, as it is entirely possible to forge a
single document that passes for genuine under superficial scrutiny. However,
if there is a leak of N documents for a large value of N, and no glaring
inaccuracies or contradictions have been found in any of these documents,
then the probability that the documents are largely genuine converges quite
rapidly to one, because the difficulty of forging N documents without any
obvious slip-ups increases exponentially with N .
It is important to note, however, that Bayesian analysis is only as strong
as the assumptions that underlie it. In the above analysis that a long history of sales without complaint increases the probability that the vendor is
honest, an important assumption was made that each sale by a dishonest
vendor had an independent probability of triggering a complaint. However,
this assumption can fail in some key situations, most notably when X is a
financial product, and the vendor V could potentially be running a pyramid
scheme. In such schemes, there are essentially no complaints from customers
for most of the lifetime of the scheme, but then there is a catastrophic collapse at the very end of the scheme. As such, a past history of satisfied
customers does not in fact increase the probability that V is honest in this
case. (Another thing to note is that pyramid schemes, by their nature, grow
exponentially in time, and so one is statistically much more likely to come
in contact with a pyramid scheme when it is large and near the end of its
lifespan, than when it is small and still some way from collapsing.)
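The parenthetical remark can be made quantitative under a simple idealised model, which is an assumption made here for illustration and not part of the analysis above: suppose that the number of participants grows like n(t) = n_0 e^{kt} until the scheme collapses at time T. Then the fraction of all participants who joined during the final s units of time is

```latex
\frac{n(T) - n(T-s)}{n(T)} = \frac{n_0 e^{kT} - n_0 e^{k(T-s)}}{n_0 e^{kT}} = 1 - e^{-ks}.
```

In particular, if the scheme doubles in size every d units of time (so that k = (log 2)/d), then taking s = d shows that half of all participants joined during the final doubling period before the collapse, regardless of how long the scheme lasted.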

Bibliography

[Al2011] J. M. Aldaz, The weak type (1, 1) bounds for the maximal function associated
to cubes grow to infinity with the dimension, Ann. of Math. (2) 173 (2011), no. 2,
1013-1023.
[Al1974] F. Alexander, Compact and finite rank operators on subspaces of lp , Bull. London
Math. Soc. 6 (1974), 341-342.
[AmGe1973] W. O. Amrein, V. Georgescu, On the characterization of bound states and
scattering states in quantum mechanics, Helv. Phys. Acta 46 (1973/74), 635-658.
[Ba1966] A. Baker, Linear forms in the logarithms of algebraic numbers. I, Mathematika.
A Journal of Pure and Applied Mathematics 13 (1966), 204-216.
[Ba1967] A. Baker, Linear forms in the logarithms of algebraic numbers. II, Mathematika.
A Journal of Pure and Applied Mathematics 14 (1967), 102-107.
[Ba1967b] A. Baker, Linear forms in the logarithms of algebraic numbers. III, Mathematika. A Journal of Pure and Applied Mathematics 14 (1967), 220-228.
[BoSo1978] C. Böhm, G. Sontacchi, On the existence of cycles of given length in integer
sequences like x_{n+1} = x_n/2 if x_n even, and x_{n+1} = 3x_n + 1 otherwise, Atti Accad.
Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 64 (1978), no. 3, 260-264.
[BoChLoSoVe2008] C. Borgs, J. Chayes, L. Lovász, V. Sós, K. Vesztergombi, Convergent
sequences of dense graphs. I. Subgraph frequencies, metric properties and testing, Adv.
Math. 219 (2008), no. 6, 1801-1851.
[Bo1985] J. Bourgain, Estimations de certaines fonctions maximales, C. R. Acad. Sci.
Paris Ser. I Math. 301 (1985), no. 10, 499-502.
[Bo1991] J. Bourgain, Besicovitch type maximal operators and applications to Fourier
analysis, Geom. Funct. Anal. 1 (1991), no. 2, 147-187.
[Bo2005] J. Bourgain, Estimates on exponential sums related to the Diffie-Hellman distributions, Geom. Funct. Anal. 15 (2005), no. 1, 1-34.
[BoSaZi2011] J. Bourgain, P. Sarnak, T. Ziegler, Disjointness of Möbius from horocycle
flows, preprint.
[BrGrGuTa2010] E. Breuillard, B. Green, R. Guralnick, T. Tao, Strongly dense free subgroups of semisimple algebraic groups, preprint.

[CaRuVe1988] A. Carbery, J. Rubio de Francia, L. Vega, Almost everywhere summability
of Fourier integrals, J. London Math. Soc. (2) 38 (1988), no. 3, 513-524.
[Ca1966] L. Carleson, On convergence and growth of partial sums of Fourier series, Acta
Mathematica 116 (1966), 135-157.

[Ca1813] A. L. Cauchy, Recherches sur les nombres, J. École Polytech. 9 (1813), 99-116.
[Ce1964] A. V. Cernavskii, Finite-to-one open mappings of manifolds, Mat. Sb. (N.S.) 65
(1964), 357-369.
[Ch2003] M. Chang, Factorization in generalized arithmetic progressions and applications
to the Erdős-Szemerédi sum-product problems, Geom. Funct. Anal. 13 (2003), no. 4,
720-736.
[Ch1885] M. Chasles, Traité des sections coniques, Gauthier-Villars, Paris, 1885.
[Ch2008] M. Christ, Quasi-extremals for a Radon-like transform, preprint.
www.math.berkeley.edu/~mchrist/Papers/quasiextremal.pdf

[Co2007] A. Comech, Cotlar-Stein almost orthogonality lemma, preprint.
www.math.tamu.edu/~comech/papers/CotlarStein/CotlarStein.pdf

[CoCoGr2008] B. Conrad, K. Conrad, R. Gross, Prime specialization in genus 0, Trans.
Amer. Math. Soc. 360 (2008), no. 6, 2867-2908.
[Co1955] M. Cotlar, A combinatorial inequality and its application to L^2 spaces, Math.
Cuyana 1 (1955), 41-55.
[Da1935] H. Davenport, On the addition of residue classes, J. London Math. Soc. 10
(1935), 30-32.
[de1981] M. de Guzmán, Real variable methods in Fourier analysis. North-Holland Mathematics Studies, 46. Notas de Matemática [Mathematical Notes], 75. North-Holland
Publishing Co., Amsterdam-New York, 1981.
[de2006] T. de la Rue, 2-fold and 3-fold mixing: why 3-dot-type counterexamples are impossible in one dimension, Bull. Braz. Math. Soc. (N.S.) 37 (2006), no. 4, 503-521.
[De1971] F. Delmer, Sur la somme de diviseurs ∑_{k ≤ x} d[f(k)]^s, C. R. Acad. Sci. Paris Sér.
A-B 272 (1971), A849-A852.
[DeFoMaWr2010] S. Dendrinos, M. Folch-Gabayet, J. Wright, An affine-invariant inequality for rational functions and applications in harmonic analysis, Proc. Edinb. Math.
Soc. (2) 53 (2010), no. 3, 639-655.

[Ei1969] D. Eidus, The principle of limiting amplitude, Uspehi Mat. Nauk 24 (1969), no.
3(147), 91-156.
[ElSz2012] G. Elek, B. Szegedy, A measure-theoretic approach to the theory of dense hypergraphs, Adv. Math. 231 (2012), no. 3-4, 1731-1772.
[ElSh2011] G. Elekes, M. Sharir, Incidences in three dimensions and distinct distances in
the plane, Combin. Probab. Comput. 20 (2011), no. 4, 571-608.
[ElObTa2010] J. Ellenberg, R. Oberlin, T. Tao, The Kakeya set and maximal conjectures
for algebraic varieties over finite fields, Mathematika 56 (2010), no. 1, 1-25.
[ElTa2011] C. Elsholtz, T. Tao, Counting the number of solutions to the Erdős-Straus
equation on unit fractions, preprint.
[En1973] P. Enflo, A counterexample to the approximation problem in Banach spaces, Acta
Math. 130 (1973), 309-317.
[En1978] V. Enss, Asymptotic completeness for quantum mechanical potential scattering.
I. Short range potentials, Comm. Math. Phys. 61 (1978), no. 3, 285-291.
[Er1952] P. Erdős, On the sum ∑_{k=1}^x d(f(k)), J. London Math. Soc. 27 (1952), 7-15.
[Er1979] P. Erdős, Some unconventional problems in number theory, Journées
Arithmétiques de Luminy (Colloq. Internat. CNRS, Centre Univ. Luminy, Luminy,
1978), pp. 73-82, Astérisque, 61, Soc. Math. France, Paris, 1979.
[Ka1940] P. Erdős, M. Kac, The Gaussian Law of Errors in the Theory of Additive Number
Theoretic Functions, American Journal of Mathematics 62 (1940), 738-742.
[Fe1971] C. Fefferman, The multiplier problem for the ball, Ann. of Math. (2) 94 (1971),
330-336.
[Fe1995] C. Fefferman, Selected theorems by Eli Stein, Essays on Fourier analysis in honor
of Elias M. Stein (Princeton, NJ, 1991), 1-35, Princeton Math. Ser., 42, Princeton
Univ. Press, Princeton, NJ, 1995.
[FeSt1972] C. Fefferman, E. Stein, H^p spaces of several variables, Acta Math. 129 (1972),
no. 3-4, 137-193.
[Fu1977] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemerédi on arithmetic progressions, J. Analyse Math. 31 (1977), 204-256.
[Ga1981] L. Garner, On the Collatz 3n + 1 algorithm, Proc. Amer. Math. Soc. 82 (1981),
no. 1, 19-22.
[Ge1934] A. Gelfond, Sur le septième Problème de D. Hilbert, Comptes Rendus Acad. Sci.
URSS Moscou 2 (1934), 1-6.
[Go2008] W. T. Gowers, Quasirandom groups, Combin. Probab. Comput. 17 (2008), no.
3, 363-387.
[Gr2008] A. Granville, Smooth numbers: computational number theory and beyond, Algorithmic number theory: lattices, number fields, curves and cryptography, 267-323,
Math. Sci. Res. Inst. Publ., 44, Cambridge Univ. Press, Cambridge, 2008.
[Gr1970] G. Greaves, On the divisor-sum problem for binary cubic forms, Acta Arith. 17
(1970) 1-28.
[GrRu2005] B. Green, I. Ruzsa, Sum-free sets in abelian groups, Israel J. Math. 147
(2005), 157-188.
[GrTa2012] B. Green, T. Tao, The Möbius function is strongly orthogonal to nilsequences,
Ann. of Math. (2) 175 (2012), no. 2, 541-566.
[Gr1955] A. Grothendieck, Produits tensoriels topologiques et espaces nucléaires, Mem.
Amer. Math. Soc. 1955 (1955), no. 16, 140 pp.
[Gu2011] C. Gunn, On the Homogeneous Model Of Euclidean Geometry, AGACSE (2011)
[GuKa2010] L. Guth, N. Katz, On the Erdős distinct distance problem in the plane,
preprint.
[Gu1988] R. Guy, The Strong Law of Small Numbers, American Mathematical Monthly
95 (1988), 697-712.
[Ha2010] Y. Hamidoune, Two Inverse results, preprint. arXiv:1006.5074
[HaRa1917] G. H. Hardy, S. Ramanujan, The normal number of prime factors of a number, Quarterly Journal of Mathematics 48 (1917), 76-92.
[He1983] J. Heintz, Definability and fast quantifier elimination over algebraically closed
fields, Theoret. Comput. Sci. 24 (1983), 239-277.
[Ho1963] C. Hooley, On the number of divisors of a quadratic polynomial, Acta Math.
110 (1963), 97-114.
[Ho1991] B. Host, Mixing of all orders and pairwise independent joinings of systems with
singular spectrum, Israel J. Math. 76 (1991), no. 3, 289-298.
[Hr2012] E. Hrushovski, Stable group theory and approximate subgroups, J. Amer. Math.
Soc. 25 (2012), no. 1, 189-243.

[Hu2004] D. Husemöller, Elliptic curves. Second edition. With appendices by Otto Forster,
Ruth Lawrence and Stefan Theisen. Graduate Texts in Mathematics, 111. Springer-Verlag, New York, 2004.
[IoRoRu2011] A. Iosevich, O. Roche-Newton, M. Rudnev, On an application of Guth-Katz
theorem, preprint.
[Ka1986] I. Kátai, A remark on a theorem of H. Daboussi. Acta Math. Hungar. 47 (1986),
no. 1-2, 223-225.
[Ka1984] S. Kalikow, Twofold mixing implies threefold mixing for rank one transformations, Ergodic Theory Dynam. Systems 4 (1984), no. 2, 237-259.
[Ka1965] T. Kato, Wave operators and similarity for some non-selfadjoint operators,
Math. Ann. 162 (1965/1966), 258-279.
[Ke1964] J. H. B. Kemperman, On products of sets in locally compact groups, Fund. Math.
56 (1964), 51-68.
[KnSt1971] A. Knapp, E. Stein, Intertwining operators for semisimple groups, Ann. of
Math. (2) 93 (1971), 489-578.
[Kn1953] M. Kneser, Abschätzungen der asymptotischen Dichte von Summenmengen,
Math. Z. 58 (1953), 459-484.
[KrLa2003] I. Krasikov, J. Lagarias, Bounds for the 3x + 1 problem using difference inequalities, Acta Arith. 109 (2003), no. 3, 237-258.
[KrRa2010] S. Kritchman, R. Raz, The surprise examination paradox and the second incompleteness theorem, Notices Amer. Math. Soc. 57 (2010), no. 11, 1454-1458.
[La2009] J. Lagarias, Ternary expansions of powers of 2, J. Lond. Math. Soc. 79 (2009),
no. 3, 562-588.
[La1989] B. Landreau, A new proof of a theorem of van der Corput, Bull. London Math.
Soc. 21 (1989), no. 4, 366-368.
[Le1978] F. Ledrappier, Un champ markovien peut être d'entropie nulle et mélangeant, C.
R. Acad. Sci. Paris Sér. A-B 287 (1978), no. 7, A561-A563.
[LeMa2005] G. Leonardi, S. Masnou, On the isoperimetric problem in the Heisenberg group
H^n, Ann. Mat. Pura Appl. (4) 184 (2005), no. 4, 533-553.
[Li1973] W. Littman, L^p-L^q-estimates for singular integral operators arising from hyperbolic equations, Partial differential equations (Proc. Sympos. Pure Math., Vol. XXIII,
Univ. California, Berkeley, Calif., 1971), pp. 479-481. Amer. Math. Soc., Providence,
R.I., 1973.
[Lo1975] P. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory, Trans. Amer. Math. Soc. 211 (1975), 113-122.
[LoSz2006] L. Lovász, B. Szegedy, Limits of dense graph sequences, J. Combin. Theory
Ser. B 96 (2006), no. 6, 933-957.
[Ma1953] A. M. Macbeath, On measure of sum sets. II. The sum-theorem for the torus,
Proc. Cambridge Philos. Soc. 49, (1953), 40-43.
[MaHu2008] C. R. MacCluer, A. Hull, A short proof of the Fredholm alternative, Int. J.
Pure Appl. Math. 45 (2008), no. 3, 379-381.
[Ma2010] K. Maples, Singularity of Random Matrices over Finite Fields, preprint.
arXiv:1012.2372
[Ma2012] L. Matthiesen, Correlations of the divisor function, Proc. Lond. Math. Soc. 104
(2012), 827-858.

[Ma1974] B. Maurey, Théorèmes de factorisation pour les opérateurs linéaires à valeurs
dans les espaces L^p, With an English summary. Astérisque, No. 11. Société Mathématique
de France, Paris, 1974, ii+163 pp.
[Mc1995] J. McKee, On the average number of divisors of quadratic polynomials, Math.
Proc. Cambridge Philos. Soc. 117 (1995), no. 3, 389-392.
[Mc1997] J. McKee, A note on the number of divisors of quadratic polynomials. Sieve
methods, exponential sums, and their applications in number theory (Cardiff, 1995),
275-281, London Math. Soc. Lecture Note Ser., 237, Cambridge Univ. Press, Cambridge, 1997.
[Mc1999] J. McKee, The average number of divisors of an irreducible quadratic polynomial, Math. Proc. Cambridge Philos. Soc. 126 (1999), no. 1, 17-22.
[Mi1964] J. Milnor, On the Betti numbers of real varieties, Proc. Amer. Math. Soc. 15
(1964), 275-280.
[MoSeSo1992] G. Mockenhaupt, A. Seeger, C. Sogge, Wave front sets, local smoothing and
Bourgain's circular maximal theorem, Ann. of Math. (2) 136 (1992), no. 1, 207-218.
[Mo2003] R. Monti, Brunn-Minkowski and isoperimetric inequality in the Heisenberg
group, Ann. Acad. Sci. Fenn. Math. 28 (2003), no. 1, 99-109.
[Ni1970] E. Nikishin, Resonance theorems and superlinear operators, Uspehi Mat. Nauk
25 (1970), no. 6 (156), 129-191.
[Ob1992] D. Oberlin, Multilinear proofs for two theorems on circular averages, Colloq.
Math. 63 (1992), no. 2, 187-190.
[OlPe1949] I. G. Petrovskii, O. A. Oleinik, On the topology of real algebraic surfaces,
Izvestiya Akad. Nauk SSSR. Ser. Mat. 13 (1949), 389-402.
[Pi1986] G. Pisier, Factorization of operators through Lp, or Lp,1 and noncommutative
generalizations, Math. Ann. 276 (1986), no. 1, 105-136.
[Po1974] J. M. Pollard, A generalisation of the theorem of Cauchy and Davenport, J.
London Math. Soc. 8 (1974), 460-462.
[Pr2007] C. Procesi, Lie groups. An approach through invariants and representations.
Universitext. Springer, New York, 2007.
[Ra1939] D. Raikov, On the addition of point-sets in the sense of Schnirelmann, Rec.
Math. [Mat. Sbornik] N.S. 5, (1939), 425-440.
[RoTa2011] I. Rodnianski, T. Tao, Effective limiting absorption principles, and applications, preprint.
[Ro1949] V. A. Rohlin, On endomorphisms of compact commutative groups, Izvestiya
Akad. Nauk SSSR. Ser. Mat. 13, (1949), 329-340.
[Ro1953] K. F. Roth, On certain sets of integers, J. London Math. Soc. 28 (1953), 245-252.
[Ro1955] K. F. Roth, Rational approximations to algebraic numbers, Mathematika 2
(1955), 1-20.
[Ru1969] D. Ruelle, A remark on bound states in potential-scattering theory, Nuovo Cimento A 61 (1969), 655-662.
[Ru1992] I. Ruzsa, A concavity property for the measure of product sets in groups, Fund.
Math. 140 (1992), no. 3, 247-254.
[RuSz1978] I. Ruzsa, E. Szemerédi, Triple systems with no six points carrying three triangles, Colloq. Math. Soc. J. Bolyai 18 (1978), 939-945.
[Ra2002] J. Saint Raymond, Local inversion for differentiable functions and the Darboux
property, Mathematika 49 (2002), 141-158.

[Sc1995] J. Schmid, On the affine Bézout inequality, Manuscripta Mathematica 88 (1995),
Number 1, 225-232.
[Sc1989] K. Schmidt-Göttsch, Polynomial bounds in polynomial rings over fields, J. Algebra 125 (1989), no. 1, 164-180.
[Sc1934] T. Schneider, Transzendenzuntersuchungen periodischer Funktionen. I, J. reine
angew. Math. 172 (1934), 65-69.
[Se1969] J. P. Serre, Travaux de Baker, Séminaire Bourbaki, exp. 368 (1969-1970), 73-86.
[Si1921] C. L. Siegel, Approximation algebraischer Zahlen, Mathematische Zeitschrift 10
(1921), 173-213.
[Side2005] J. Simons, B. de Weger, Theoretical and computational bounds for m-cycles of
the 3n + 1-problem, Acta Arith. 117 (2005), no. 1, 51-70.
[SoTa2011] J. Solymosi, T. Tao, An incidence theorem in higher dimensions, preprint.
[So2010] K. Soundararajan, Math249A Fall 2010: Transcendental Number Theory, lecture
notes available at math.stanford.edu/~ksound/TransNotes.pdf. Transcribed by Ian
Petrow.
[St1956] E. M. Stein, Interpolation of linear operators, Trans. Amer. Math. Soc. 83 (1956),
482-492.
[St1961] E. M. Stein, On limits of seqences of operators, Ann. of Math. (2) 74 (1961)
140-170.
[St1976] E. M. Stein, Maximal functions. I. Spherical means, Proc. Nat. Acad. Sci. U.S.A.
73 (1976), no. 7, 2174-2175.
[St1982] E. M. Stein, The development of square functions in the work of A. Zygmund,
Bull. Amer. Math. Soc. (N.S.) 7 (1982), no. 2, 359-376.
[St1993] E. M. Stein, Harmonic analysis: real-variable methods, orthogonality, and oscillatory integrals. With the assistance of Timothy S. Murphy. Princeton Mathematical
Series, 43. Monographs in Harmonic Analysis, III. Princeton University Press, Princeton, NJ, 1993.
[StSt1983] E. M. Stein, J.-O. Strömberg, Behavior of maximal functions in R^n for large
n, Ark. Mat. 21 (1983), no. 2, 259-269.
[St1978] P. Steiner, A theorem on the Syracuse problem, Proceedings of the Seventh Manitoba Conference on Numerical Mathematics and Computing (Univ. Manitoba, Winnipeg, Man., 1977), pp. 553-559, Congress. Numer., XX, Utilitas Math., Winnipeg,
Man., 1978.
[St2010] B. Stovall, Endpoint L^p → L^q bounds for integration along certain polynomial
curves, J. Funct. Anal. 259 (2010), no. 12, 3205-3229.
[St1891] E. Study, Von den bewegungen und umlegungen, Mathematische Annalen, 39
(1891), 441-566.
[Sw1962] R. Swan, Factorization of polynomials over finite fields, Pacific J. Math., 12,
1962, 1099-1106.
[Sz1975] E. Szemerédi, On sets of integers containing no k elements in arithmetic progression, Acta Arith. 27 (1975), 299-345.
[Sz1978] E. Szemerédi, Regular partitions of graphs, Problèmes combinatoires et théorie
des graphes (Colloq. Internat. CNRS, Univ. Orsay, Orsay, 1976), Colloq. Internat.
CNRS, 260, Paris: CNRS, pp. 399-401.
[Ta2007] T. Tao, A correspondence principle between (hyper)graph theory and probability
theory, and the (hyper)graph removal lemma, J. Anal. Math. 103 (2007), 1-45.


[Ta2008] T. Tao, Structure and randomness: pages from year one of a mathematical blog,
American Mathematical Society, Providence RI, 2008.
[Ta2009] T. Tao, Poincaré's Legacies: pages from year two of a mathematical blog, Vol.
I, American Mathematical Society, Providence RI, 2009.
[Ta2009b] T. Tao, Poincaré's Legacies: pages from year two of a mathematical blog, Vol.
II, American Mathematical Society, Providence RI, 2009.
[Ta2010] T. Tao, An epsilon of room, Vol. I, American Mathematical Society, Providence
RI, 2010.
[Ta2010b] T. Tao, An epsilon of room, Vol. II, American Mathematical Society, Providence RI, 2010.
[Ta2011] T. Tao, An introduction to measure theory, American Mathematical Society,
Providence RI, 2011.
[Ta2011b] T. Tao, Higher order Fourier analysis, American Mathematical Society, Providence RI, 2011.
[Ta2011c] T. Tao, Topics in random matrix theory, American Mathematical Society, Providence RI, 2011.
[Ta2011d] T. Tao, Compactness and contradiction, American Mathematical Society, Providence RI, 2011.
[Ta2012] T. Tao, Hilbert's fifth problem and related topics, in preparation.
[Ta2012b] T. Tao, Noncommutative sets of small doubling, preprint.
[TaVu2006] T. Tao, V. Vu, Additive combinatorics, Cambridge University Press, 2006.
[Te1976] R. Terras, A stopping time problem on the positive integers, Acta Arith. 30
(1976), no. 3, 241-252.
[Th1965] R. Thom, Sur l'homologie des variétés algébriques réelles, 1965 Differential and
Combinatorial Topology (A Symposium in Honor of Marston Morse) pp. 255-265
Princeton Univ. Press, Princeton, N.J.

[Th1909] A. Thue, Über Annäherungswerte algebraischer Zahlen, Journal für die reine und
angewandte Mathematik 135 (1909), 284-305.
[Uu2010] O. Uuye, A simple proof of the Fredholm alternative, preprint. arXiv:1011.2933
[Va1966] J. Väisälä, Discrete open mappings on manifolds, Ann. Acad. Sci. Fenn. Ser. A
I 392 (1966), 10 pp.
[va1939] J. G. van der Corput, Une inégalité relative au nombre des diviseurs, Nederl.
Akad. Wetensch., Proc. 42 (1939), 547-553.
[vdW1927] B.L. van der Waerden, Beweis einer Baudetschen Vermutung, Nieuw. Arch.
Wisk. 15 (1927), 212-216.
[Wa1836] M. L. Wantzel, Recherches sur les moyens de reconnaître si un Problème de
Géométrie peut se résoudre avec la règle et le compas, J. Math. pures appliq. 1 (1836),
36637.

Index

TT* identity, 77
approximation property, 56
argumentum ad ignorantium, 1
asymptotic notation, x
atomic proposition, 13
Baker's theorem, 132
Bézout's inequality, 202
Bézout's theorem, 189, 201
Bochner-Riesz operator, 110
Borel-Cantelli lemma (heuristic), 2
Brunn-Minkowski inequality, 207
Cartan subgroup, 42
Cayley-Bacharach theorem, 190
cell decomposition, 48
charge current, 115
classical Lie group, 41
cocycle, 214
Collatz conjecture, 143
common knowledge, 25
complete measure space, 97
completeness (logic), 15
completeness theorem, 16
Cotlar-Stein lemma, 77
deduction theorem, 14
deductive theory, 16
descriptive activity, 225
Dirichlet hyperbola method, 155
Dirichlet series, 152
Dirichlet's theorem on
diophantine approximation, 132
divisor function, 153
dominant map, 204
entropy function, 147
epistemic inference rule, 18, 22
Euler product, 152
ex falso quodlibet, 19
finite extension, 214
formal system, 12
fractional derivative, 110
Fredholm alternative, 55
Fredholm index, 60
Fubini's theorem, 97
Furstenberg multiple recurrence
theorem, 213
Gelfond-Schneider theorem, 131
half-graph, 106
ham sandwich theorem, 47
Hardy-Littlewood maximal inequality,
80
heat propagator, 110
Helmholtz equation, 113
Hubble's law, 6
hydrostatic equilibrium, 124
incompressible Euler equation, 124
indicator function, x
induction (non-mathematical), 1
integrality gap, 133
internal subset, 100
inverse function theorem, 61


isogeny, 43
isoperimetric inequality, 207
Kepler's third law, 226
Kleinian geometry, 195
knowledge agent, 17
Kripke model, 24
Landau's conjecture, 4
Laplacian, 109
law of the excluded middle, 13
limiting absorption principle, 114
limiting amplitude principle, 121
local smoothing, 119
local-to-global principle (heuristic), 2
Loeb measure, 101
measure space, 97
memory axiom, 27
Mertens' theorem, 158
modus ponens, 13
multiplicative function, 152
negative introspection rule, 22
Nikishin-Stein factorisation theorem, 91
Notation, x
Pappus's theorem, 191
Pascal's theorem, 192
polynomial ham sandwich theorem, 47
polynomial method, 134
positive introspection rule, 22
Prékopa-Leindler inequality, 208
pre-measure, 101
prescriptive activity, 225
principle of indifference, 2
propositional logic, 13
quaternions, 198
RAGE theorem, 120
random rotations trick, 90
random sums trick, 90
rank of a Lie group, 42
regular sequence, 206
resolvent, 110
Riesz lemma, 58
Riesz-Thorin interpolation theorem, 70
Rohlin's problem, 218
Schinzel's hypothesis H, 4
Schrödinger propagator, 110
Schur's test, 74
semantics, 12

Sierpinski's triangle, 218
smooth number, 162
soundness (logic), 15
special linear group, 41
special orthogonal group, 41
spherical maximal function, 81
spin groups, 42
standard part, 101
Stein factorisation theorem, 90
Stein interpolation theorem, 70
Stein maximal principle, 89
strong mixing, 217
submodularity, 52
symplectic group, 42
syntax, 12
Szemerédi-Trotter theorem, 49
tensor power trick, 78
theory, 16
Thue-Siegel-Roth theorem, 132
Tonelli's theorem, 97
triangle removal lemma, 98
truth assignment, 14
truth table, 14
twin prime conjecture, 3
unexpected hanging paradox, 33
wave propagator, 110
Zipf's law, 225
