Spending Symmetry
Terence Tao
Department of Mathematics, UCLA, Los Angeles, CA 90095
E-mail address: [email protected]
Contents
Preface
A remark on notation
Acknowledgments
Chapter 1.
1.1.
1.2.
1.3. Mathematical modeling
1.4.
1.5.
Chapter 2. Group theory
2.1. Symmetry spending
2.2.
Chapter 3. Combinatorics
3.1.
3.2.
Chapter 4. Analysis
4.1.
4.2.
4.3.
4.4.
4.5.
4.6.
Chapter 5. Nonstandard analysis
5.1.
5.2.
Chapter 6.
6.1.
6.2.
Chapter 7. Number theory
7.1.
7.2.
7.3.
7.4.
Chapter 8. Geometry
8.1.
8.2.
8.3.
8.4. Bezout's inequality
8.5.
Chapter 9. Dynamics
9.1.
9.2. Rohlin's problem
Chapter 10. Miscellaneous
10.1.
10.2.
10.3.
Bibliography
Index
Preface
A remark on notation
For reasons of space, we will not be able to define every single mathematical
term that we use in this book. If a term is italicised for reasons other than
emphasis or for definition, then it denotes a standard mathematical object,
result, or concept, which can be easily looked up in any number of references.
(In the blog version of the book, many of these terms were linked to their
Wikipedia pages, or other on-line reference pages.)
I will however mention a few notational conventions that I will use
throughout. The cardinality of a finite set $E$ will be denoted $|E|$. We
will use the asymptotic notation $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote
the estimate $|X| \leq CY$ for some absolute constant $C > 0$. In some cases
we will need this constant $C$ to depend on a parameter (e.g. $d$), in which
case we shall indicate this dependence by subscripts, e.g. $X = O_d(Y)$ or
$X \ll_d Y$. We also sometimes use $X \sim Y$ as a synonym for $X \ll Y \ll X$.
In many situations there will be a large parameter $n$ that goes off to
infinity. When that occurs, we also use the notation $o_{n \to \infty}(X)$ or simply
$o(X)$ to denote any quantity bounded in magnitude by $c(n) X$, where $c(n)$
is a function depending only on $n$ that goes to zero as $n$ goes to infinity. If
we need $c(n)$ to depend on another parameter, e.g. $d$, we indicate this by
further subscripts, e.g. $o_{n \to \infty; d}(X)$.
We will occasionally use the averaging notation
$$\mathbf{E}_{x \in X} f(x) := \frac{1}{|X|} \sum_{x \in X} f(x)$$
to denote the average value of a function $f: X \to \mathbf{C}$ on a non-empty finite
set $X$.
If $E$ is a subset of a domain $X$, we use $1_E: X \to \mathbf{R}$ to denote the
indicator function of $E$, thus $1_E(x)$ equals 1 when $x \in E$ and 0 otherwise.
Acknowledgments
I am greatly indebted to many readers of my blog, Buzz, and Google+ feeds,
including Andrew Bailey, Roland Bauerschmidt, Tony Carbery, Yemon Choi,
Marco Frasca, Charles Gunn, Joerg Grande, Alex Iosevich, Allen Knutson,
Miguel Lacruz, Srivatsan Narayanan, Andreas Seeger, Orr Shalit, David
Speyer, Ming Wang, Ben Wieland, Qiaochu Yuan, Pavel Zorin, and several
anonymous commenters, for corrections and other comments, which can be
viewed online at
terrytao.wordpress.com
The author is supported by a grant from the MacArthur Foundation, by
NSF grant DMS-0649473, and by the NSF Waterman award.
Chapter 1
(2) (Principle of indifference) If a random variable X can take N different values, and there is no reason to expect one of these values
to be any more likely to occur than any other, then one can expect
each value to occur with probability 1/N .
(3) (Equidistribution) If one has a (discrete or continuous) distribution of points x in a space X, and one sees no reason why this
distribution should favour one portion of X over another, then one
can expect this distribution to be asymptotically equidistributed in
X after increasing the sample size of the distribution to infinity
(thus, for any reasonable subset E of X, the portion of the distribution contained inside E should asymptotically converge to the
relative measure of E inside X).
(4) (Independence) If one has two random variables X and Y , and
one sees no reason why knowledge about the value of X should
significantly affect the behaviour of Y (or vice versa), then one can
expect X and Y to be independent (or approximately independent)
as random variables.
(5) (Heuristic Borel-Cantelli) Suppose one is counting solutions to an
equation such as $P(n) = 0$, where $n$ ranges over some set $N$. Suppose that for any given $n \in N$, one expects the equation $P(n) = 0$
to hold with probability$^1$ $p_n$. Suppose also that one sees no significant relationship between the solvability of $P(n) = 0$ and the
solvability of $P(m) = 0$ for distinct $n, m$. If $\sum_n p_n$ is infinite,
then one expects infinitely many solutions to $P(n) = 0$; but if $\sum_n p_n$ is
finite, then one expects only finitely many solutions to $P(n) = 0$.
(6) (Local-to-global principle) If one is trying to solve some sort of
equation F (x) = 0, and all obvious or local obstructions to
this solvability (e.g. trying to solve df = w when w is not closed)
are not present, and one believes that the class of all possible x is
so large or flexible that no global obstructions (such as those
imposed by topology) are expected to intervene, then one expects
a solution to exist.
The equidistribution principle is a generalisation of the principle of indifference, and among other things forms the heuristic basis for statistical
mechanics (where it is sometimes referred to as the fundamental postulate
of statistical mechanics). The heuristic Borel-Cantelli lemma can be viewed
as a combination of the equidistribution and independence principles.
$^1$ Such an expectation might arise, for instance, from the principle of indifference,
by observing that $P(n)$ can range in a set of size $R_n$ that contains zero, in which case one can
predict a probability $p_n = 1/R_n$ that $P(n)$ will equal zero.
A typical example of the equidistribution principle in action is the conjecture (which is still unproven) that the digits of $\pi$ are equidistributed:
thus, for instance, the proportion of the first $N$ digits of $\pi$ that are equal
to, say, 7, should approach 1/10 in the limit as $N$ goes to infinity. The
point here is that we see no reason why the fractional part $\{10^n \pi\}$ of the
expression $10^n \pi$ should favour one portion of the unit interval $[0, 1]$ over any
other, and in particular it should occupy the subinterval $[0.7, 0.8)$ one tenth
of the time, asymptotically.
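One can at least test this prediction numerically. The following is a minimal sketch, not a proof of anything: it generates decimal digits of $\pi$ with Gibbons' unbounded spigot algorithm (a choice made here for convenience, since it needs only exact integer arithmetic) and tabulates the empirical frequency of each digit. The function names `pi_digits` and `digit_frequencies` are illustrative and not taken from the text.

```python
from collections import Counter
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) one at a time,
    using Gibbons' unbounded spigot algorithm with exact integers."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit n is now certain; emit it and rescale.
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            # Not enough information yet; absorb one more series term.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def digit_frequencies(num_digits):
    """Empirical frequency of each digit 0..9 among the first
    `num_digits` digits of pi after the decimal point."""
    digits = list(islice(pi_digits(), 1, num_digits + 1))  # skip the leading 3
    counts = Counter(digits)
    return {d: counts[d] / num_digits for d in range(10)}


if __name__ == "__main__":
    for digit, freq in digit_frequencies(1000).items():
        print(digit, round(freq, 3))
```

Frequencies near 1/10 for a finite sample are of course consistent with the conjecture but say nothing about the limit as $N$ goes to infinity.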
A typical application of the heuristic Borel-Cantelli lemma is an informal
"proof" of the twin prime conjecture that there are infinitely many primes
$p$ such that $p + 2$ is also prime. From the prime number theorem, we expect
a typical large number $n$ to have an (asymptotic) probability $\frac{1}{\log n}$ of being
prime, and $n + 2$ to have a probability $\frac{1}{\log(n+2)}$ of being prime. If one sees no
reason why the primality (or lack thereof) of $n$ should influence the primality
(or lack thereof) of $n + 2$, then by the independence principle one expects
a typical number $n$ to have a probability $\frac{1}{(\log n)(\log(n+2))}$ of being the first
part of a twin prime pair. Since $\sum_n \frac{1}{(\log n)(\log(n+2))}$ diverges, we then expect
infinitely many twin primes.
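The heuristic count can be compared with an actual count for small ranges. The sketch below does this under stated assumptions: `twin_prime_count` counts primes $p \leq$ `limit` with $p + 2$ also prime using a sieve of Eratosthenes, and `heuristic_twin_count` evaluates the partial sum $\sum_{2 \leq n \leq \text{limit}} \frac{1}{(\log n)(\log(n+2))}$ from the independence heuristic. Both function names are illustrative; the two quantities are not expected to agree closely, only to grow without bound together.

```python
import math


def prime_sieve(limit):
    """Return a bytearray s with s[n] == 1 exactly when n is prime, 0 <= n <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sieve


def twin_prime_count(limit):
    """Number of primes p <= limit such that p + 2 is also prime."""
    is_prime = prime_sieve(limit + 2)
    return sum(1 for p in range(2, limit + 1) if is_prime[p] and is_prime[p + 2])


def heuristic_twin_count(limit):
    """Partial sum of 1 / (log n * log(n + 2)) over 2 <= n <= limit."""
    return sum(1.0 / (math.log(n) * math.log(n + 2)) for n in range(2, limit + 1))


if __name__ == "__main__":
    for limit in (10**3, 10**4, 10**5):
        print(limit, twin_prime_count(limit), round(heuristic_twin_count(limit), 1))
```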
While these arguments can lead to useful heuristics and conjectures, it
is important to realise that they are not remotely close to being rigorous,
and can indeed lead to incorrect results. For instance, the above argument
claiming to prove the infinitude of twin primes p, p + 2 would also prove
the infinitude of consecutive primes p, p + 1, which is absurd. The reason
here is that the primality of a number p does significantly influence the
primality of its successor p + 1, because all but one of the primes are odd,
and so if p is a prime other than 2, then p + 1 is even and cannot itself be
prime. Now, this objection does not prevent p + 2 from being prime (and
neither does consideration of divisibility by 3, or 5, etc.), and so there is no
obvious reason why the twin prime argument does not work; but one cannot
conclude from this that there are infinitely many twin primes without an appeal
to the non-rigorous argument from ignorance.
Another well-known mathematical example where the argument from
ignorance fails concerns the fractional parts of $\exp(\pi \sqrt{n})$, where $n$ is a natural number. At first glance, much as with $10^n \pi$, there is no reason why
these fractional parts of transcendental numbers should favour any region
of the unit interval $[0, 1]$ over any other, and so one expects equidistribution
in $n$. As a consequence of this and a heuristic Borel-Cantelli argument, one
[...]
of, namely the unique factorisation of the number field $\mathbf{Q}(\sqrt{-163})$. For all
we know, a similar hidden structure or "conspiracy" might ultimately be
present in the digits of $\pi$, or the twin primes; we cannot yet rule these out,
and so these conjectures remain open.
There are similar cautionary counterexamples that are related to the
twin prime problem. The same sort of heuristics that support the twin prime
conjecture also support Schinzel's hypothesis H, which roughly speaking
asserts that polynomials $P(n)$ over the integers should take prime values
for infinitely many $n$ unless there is an "obvious" reason why this is not the
case, i.e. if $P(n)$ is never coprime to a fixed modulus $q$, or if it is reducible, or
if it cannot take arbitrarily large positive values. Thus, for instance, $n^2 + 1$
should take infinitely many prime values (an old conjecture of Landau).
This conjecture is widely believed to be true, and one can use the heuristic
Borel-Cantelli lemma to support it. However, it is interesting to note that
if the integers $\mathbf{Z}$ are replaced by the function field analogue $\mathbf{F}_2[t]$, then the
conjecture fails, as first observed by Swan [Sw1962]. Indeed, the octic
polynomial $n^8 + t^3$, while irreducible over $\mathbf{F}_2[t]$, turns out to never give an
irreducible polynomial for any given value $n \in \mathbf{F}_2[t]$; this has to do with the
structure of this polynomial in certain lifts of $\mathbf{F}_2[t]$, a phenomenon studied
systematically in [CoCoGr2008].
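The claim about $n^8 + t^3$ can be checked by brute force for small $n$. The following sketch assumes a particular encoding chosen here for convenience: a polynomial in $\mathbf{F}_2[t]$ is stored as a Python integer whose bit $i$ is the coefficient of $t^i$, so that addition is XOR. Irreducibility is tested by naive trial division, which is only practical for low degrees; the helper names are illustrative.

```python
def gf2_mul(a, b):
    """Product of two polynomials over F_2 (bit i = coefficient of t^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a, b):
    """Remainder of a on division by a nonzero polynomial b over F_2."""
    deg_b = b.bit_length()
    while a.bit_length() >= deg_b:
        a ^= b << (a.bit_length() - deg_b)
    return a


def is_irreducible(f):
    """Trial division by every polynomial of degree 1 .. deg(f) // 2."""
    degree = f.bit_length() - 1
    if degree < 1:
        return False  # constants are not irreducible polynomials
    for divisor in range(2, 1 << (degree // 2 + 1)):
        if gf2_mod(f, divisor) == 0:
            return False
    return True


def swan_value(n):
    """The polynomial n^8 + t^3 in F_2[t], for n in F_2[t]."""
    n8 = n
    for _ in range(3):  # three successive squarings give the eighth power
        n8 = gf2_mul(n8, n8)
    return n8 ^ 0b1000  # adding t^3 is XOR with bit 3


if __name__ == "__main__":
    # Every n of degree at most 3 (bit patterns 0..15) gives a reducible value.
    print(all(not is_irreducible(swan_value(n)) for n in range(16)))
```

A finite check of this kind only illustrates the statement; the fact that it holds for every $n$ is the content of the results cited above.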
Even when the naive argument from ignorance fails, though, the nature
of that failure can often be quite interesting and lead to new mathematics. In
my own area of research, an example of this came from the inverse theory of
the Gowers uniformity norms. Naively, these norms measured the extent to
which the phase of a function behaved like a polynomial, and so an argument
from ignorance would suggest that the polynomial phases were the only
obstructions to the Gowers uniformity norm being small; however, there
was an important additional class of pseudopolynomial phases, known
as nilsequences, that one additionally had to consider. Proving this latter
conjecture (known as the inverse conjecture for the Gowers norms) goes
through a lot of rich mathematics, in particular the equidistribution theory
of orbits in nilmanifolds, and has a number of applications, for instance in
counting patterns in primes such as arithmetic progressions; see [Ta2011b].
with the given data and parameters to obtain the result. This type of situation pervades undergraduate homework exercises in applied mathematics
and physics, and also accurately describes many mature areas of engineering
(e.g. civil engineering or mechanical engineering) in which the model, data,
and parameters are all well understood. One could also classify pure mathematics as being the quintessential example of this type of situation, since the
models for mathematical foundations (e.g. the ZFC model for set theory)
are incredibly well understood (to the point where we rarely even think of
them as models any more), and one primarily works with well-formulated
problems with precise hypotheses and data.
However, there are many situations in which one or more ingredients are
missing. For instance, one may have a good model and good data, but the
parameters of the model are initially unknown. In that case, one needs to
first solve some sort of inverse problem to recover the parameters from existing sets of data (and their outcomes), before one can then solve the direct
problem. In some cases, there are clever ways to gather and use the data
so that various unknown parameters largely cancel themselves out, simplifying the task. For instance, to test the efficiency of a drug, one can use a
double-blind study in order to cancel out the numerous unknown parameters that affect both the control group and the experimental group equally.
Typically, one cannot solve for the parameters exactly, and so one must accept an increased range of error in one's predictions. This type of problem
pervades undergraduate homework exercises in statistics, and accurately describes many mature sciences, such as physics, chemistry, materials science,
and some of the life sciences.
Another common situation is when one has a good model and good
parameters, but an incomplete or corrupted set of data. Here, one often has
to clean up the data first using error-correcting techniques before proceeding
(this often requires adding a mechanism for noise or corruption into the
model itself, e.g. adding gaussian white noise to the measurement model).
This type of problem pervades undergraduate exercises in signal processing,
and often arises in computer science and communications science.
In all of the above cases, mathematics can be utilised to great effect,
though different types of mathematics are used for different situations (e.g.
computational mathematics when one has a good model, data set, and parameters; statistics when one has a good model and data set but unknown
parameters; computer science, filtering, and compressed sensing when one
has a good model and parameters, but unknown data; and so forth). However,
there is one important situation where the current state of mathematical sophistication is only of limited utility, and that is when it is the model which
is unreliable. In this case, even having excellent data, perfect knowledge of
resident can (and does) see the eye colors of all other residents, but has
no way of discovering his or her own (there are no reflective surfaces). If
a tribesperson does discover his or her own eye color, then their religion
compels them to commit ritual suicide at noon the following day in the
village square for all to witness. All the tribespeople are highly logical$^5$ and
devout, and they all know that each other is also highly logical and devout
(and they all know that they all know that each other is highly logical and
devout, and so forth).
Of the 1000 islanders, it turns out that 100 of them have blue eyes and
900 of them have brown eyes, although the islanders are not initially aware
of these statistics (each of them can of course only see 999 of the 1000
tribespeople).
One day, a blue-eyed foreigner visits the island and wins the complete
trust of the tribe.
One evening, he addresses the entire tribe to thank them for their hospitality.
However, not knowing the customs, the foreigner makes the mistake
of mentioning eye color in his address, remarking "how unusual it is to see
another blue-eyed person like myself in this region of the world".
What effect, if anything, does this faux pas have on the tribe?
I am fond of this puzzle because in order to properly understand the
correct solution (and to properly understand why the alternative solution is
incorrect), one has to think very clearly (but unintuitively) about the nature
of knowledge.
There is however an additional subtlety to the puzzle that was pointed
out to me, in that the correct solution to the puzzle has two components, a
(necessary) upper bound and a (possible) lower bound, both of which I will
discuss shortly. Only the upper bound is correctly explained in the puzzle
(and even then, there are some slight inaccuracies, as will be discussed below). The lower bound, however, is substantially more difficult to establish,
in part because the bound is merely possible and not necessary. Ultimately,
this is because to demonstrate the upper bound, one merely has to show
that a certain statement is logically deducible from an islander's state of
knowledge, which can be done by presenting an appropriate chain of logical
deductions. But to demonstrate the lower bound, one needs to show that
certain statements are not logically deducible from an islander's state of
knowledge, which is much harder, as one has to rule out all possible chains
$^5$ For the purposes of this logic puzzle, "highly logical" means that any conclusion that can
be logically deduced from the information and observations available to an islander, will automatically
be known to that islander.
Despite all this fearsome complexity, it is still possible to set up both the
syntax and semantics of temporal epistemic modal logic$^6$ in such a way that
one can formulate the blue-eyed islander problem rigorously, and in such a
way that one has both an upper and a lower bound in the solution. The
purpose of this section is to construct such a setup and to explain the lower
bound in particular. The same logic is also useful for analysing another
well-known paradox, the unexpected hanging paradox, and I will do so at the
end of this section. Note though that there is more than one way$^7$ to set up
epistemic logics, and they are not all equivalent to each other.
Our approach here will be a little different from the approach commonly found in the epistemic logic literature, in which one jumps straight
to "arbitrary-order epistemic logic" in which arbitrarily long nested chains
of knowledge ("A knows that B knows that C knows that ...") are allowed. Instead, we will adopt a hierarchical approach, recursively defining
for $k = 0, 1, 2, \dots$ a "$k^{\text{th}}$-order epistemic logic" in which knowledge chains of
depth up to $k$, but no greater, are permitted. The arbitrary order epistemic
logic is then obtained as a limit (a direct limit on the syntactic side, and an
inverse limit on the semantic side, which is dual to the syntactic side) of
the finite order epistemic logics. The relationship between the traditional
approach (allowing arbitrary depth from the start) and the hierarchical one
presented here is somewhat analogous to the distinction between Zermelo-Fraenkel-Choice (ZFC) set theory without the axiom of foundation, and
ZFC with that axiom.
I should warn that this is going to be a rather formal and mathematical
article. Readers who simply want to know the answer to the islander puzzle
would probably be better off reading the discussion at
terrytao.wordpress.com/2011/04/07/the-blue-eyed-islanders-puzzle-repost.
I am indebted to Joe Halpern for comments and corrections.
1.4.1. Zeroth-order logic. Before we plunge into the full complexity of
epistemic logic (or temporal epistemic logic), let us first discuss formal logic
in general, and then focus on a particularly simple example of a logic, namely
zeroth order logic (better known as propositional logic). This logic will end
up forming the foundation for a hierarchy of epistemic logics, which will be
needed to model such logic puzzles as the blue-eyed islander puzzle.
$^6$ On the other hand, for puzzles such as the islander puzzle in which there are only a finite
number of atomic propositions and no free variables, one at least can avoid the need to admit
predicate logic, in which one has to discuss quantifiers such as $\forall$ and $\exists$. A fully formed predicate
temporal epistemic modal logic would indeed be of terrifying complexity.
$^7$ In particular, one can also proceed using Kripke models for the semantics, which in my view,
are more elegant, but harder to motivate than the more recursively founded models presented here.
a theorem), but for the epistemic logics below, it will be convenient to make
deduction an explicit inference rule, as it simplifies the other inference rules
one will have to add to the system.
A typical deduction that comes from this syntax is
$$(A_1 \vee A_2 \vee A_3), \neg A_2, \neg A_3 \vdash A_1$$
which using the blue-eyed islander interpretation, is the formalisation of the
assertion that given that at least one of the islanders $I_1, I_2, I_3$ has blue eyes,
and that $I_2, I_3$ do not have blue eyes, one can deduce that $I_1$ has blue eyes.
As with the laws of grammar, one can certainly write down a finite list
of inference rules in propositional calculus; again, such lists may be found in
any text on mathematical logic. Note though that, much as a given vector
space has more than one set of generators, there is more than one possible
list of inference rules for propositional calculus, due to some rules being
equivalent to, or at least deducible from, other rules; the precise choice of
basic inference rules is to some extent a matter of personal taste and will
not be terribly relevant for the current discussion.
Finally, we discuss the semantics of propositional logic. For this particular logic, the models $M$ are described by truth assignments, that assign a
truth value $(M \models A_i) \in \{\text{true}, \text{false}\}$ to each atomic statement $A_i$. Once a
truth value $(M \models A_i)$ to each atomic statement $A_i$ is assigned, the truth
value $(M \models S)$ of any other sentence $S$ in the propositional logic generated
by these atomic statements can then be interpreted using the usual truth
tables. For instance, returning to the islander example, consider a model $M$
in which $M \models A_1$ is true, but $M \models A_2$ and $M \models A_3$ are false; informally,
$M$ describes a hypothetical world in which $I_1$ has blue eyes but $I_2$ and $I_3$
do not have blue eyes. Then the sentence $A_1 \vee A_2 \vee A_3$ is true in $M$,
$$M \models (A_1 \vee A_2 \vee A_3),$$
but the statement $A_1 \implies A_2$ is false in $M$,
$$M \not\models (A_1 \implies A_2).$$
If $S$ is a set of sentences, we say that $M$ models $S$ if $M$ models each sentence
in $S$. Thus for instance, if we continue the preceding example, then
$$M \models (A_1 \vee A_2 \vee A_3), (A_2 \implies A_3)$$
but
$$M \not\models (A_1 \vee A_2 \vee A_3), (A_1 \implies A_2).$$
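These semantics are easy to mechanise when there are finitely many atoms. In the sketch below, which is illustrative rather than part of the formal development, a model is a dictionary assigning a truth value to each atom, a sentence is a Python function from models to truth values, and `entails` checks semantic consequence by enumerating every truth assignment. All names (`ATOMS`, `all_models`, `models`, `entails`) are assumptions made for this example.

```python
from itertools import product

ATOMS = ("A1", "A2", "A3")


def all_models(atoms=ATOMS):
    """Enumerate every truth assignment to the given atoms."""
    for values in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, values))


def implies(p, q):
    """Material implication, as given by the usual truth table."""
    return (not p) or q


def models(M, sentences):
    """M models a set of sentences if it models each sentence in the set."""
    return all(sentence(M) for sentence in sentences)


def entails(premises, conclusion, atoms=ATOMS):
    """Semantic consequence: every model of the premises models the conclusion."""
    return all(conclusion(M) for M in all_models(atoms) if models(M, premises))


# Sentences from the islander example.
at_least_one = lambda M: M["A1"] or M["A2"] or M["A3"]
a1_implies_a2 = lambda M: implies(M["A1"], M["A2"])
a2_implies_a3 = lambda M: implies(M["A2"], M["A3"])

# The model in which I1 has blue eyes but I2 and I3 do not.
M = {"A1": True, "A2": False, "A3": False}

if __name__ == "__main__":
    print(models(M, [at_least_one, a2_implies_a3]))   # M models this set
    print(models(M, [at_least_one, a1_implies_a2]))   # but not this one
    print(entails([at_least_one,
                   lambda M: not M["A2"],
                   lambda M: not M["A3"]],
                  lambda M: M["A1"]))
```

The last call checks the semantic counterpart of the deduction of $A_1$ from $A_1 \vee A_2 \vee A_3$, $\neg A_2$, $\neg A_3$; for propositional logic, semantic and syntactic consequence agree by soundness and completeness.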
Note that if there are only finitely many atomic statements $A_1, \dots, A_n$,
then there are only finitely many distinct models $M$ of the resulting propositional logic; in fact, there are exactly $2^n$ such models, one for each truth
agents that are able to know certain facts in zeroth-order logic (e.g. an
islander I1 may know that the islander I2 has blue eyes). However, in this
logic one cannot yet express higher-order facts (e.g. we will not yet be able
to formulate a sentence to the effect that I1 knows that I2 knows that I3
has blue eyes). This will require a second-order or higher epistemic logic,
which we will discuss later in this section.
Let us now formally construct this logic. As with zeroth-order logic, we
will need a certain set of atomic propositions, which for simplicity we will
assume to be a finite set $A_1, \dots, A_n$. This already gives the zeroth order
language $L_0$ of sentences that one can form from the $A_1, \dots, A_n$ by the rules
of propositional grammar. For instance,
$$(A_1 \implies A_2) \wedge (A_2 \implies A_3)$$
is a sentence in $L_0$. The zeroth-order logic $L_0$ also comes with a notion of
inference $\vdash_{L_0}$ and a notion of modeling $\models_{L_0}$, which we now subscript by $L_0$
in order to distinguish it from the first-order notions of inference $\vdash_{L_1}$ and
modeling $\models_{L_1}$ which we will define shortly. Thus, for instance
$$(A_1 \implies A_2) \wedge (A_2 \implies A_3) \vdash_{L_0} (A_1 \implies A_3),$$
and if $M_0$ is a truth assignment for $L_0$ for which $A_1, A_2, A_3$ are all true, then
$$M_0 \models_{L_0} (A_1 \implies A_2) \wedge (A_2 \implies A_3).$$
We will also assume the existence of a finite number of knowledge agents
$K_1, \dots, K_m$, each of which is capable of knowing sentences in the zeroth
order language $L_0$. (In the case of the islander puzzle, and ignoring for
now the time aspect of the puzzle, each islander $I_i$ generates one knowledge
agent $K_i$, representing the state of knowledge of $I_i$ at a fixed point in time.
Later on, when we add in the temporal aspect to the puzzle, we will need
different knowledge agents for a single islander at different points in time,
but let us ignore this issue for now.) To formalise this, we define the first-order language $L_1$ to be the language generated from $L_0$ and the rules of
propositional grammar by imposing one additional rule:
If $S$ is a sentence in $L_0$, and $K$ is a knowledge agent, then $K(S)$
is a sentence in $L_1$ (which can informally be read as "$K$ knows (or
believes) $S$ to be true").
Thus, for instance,
$$K_2(A_1) \wedge K_1(A_1 \vee A_2 \vee A_3) \wedge \neg A_3$$
is a sentence in $L_1$; in the islander interpretation, this sentence denotes the
assertion that $I_2$ knows $I_1$ to have blue eyes, and $I_1$ knows that at least one
islander has blue eyes, but $I_3$ does not have blue eyes. On the other hand,
$$K_1(K_2(A_3))$$
Similarly, if $S, T$ are sentences in $L_0$ such that $S \vdash_{L_0} T$, then one automatically has $S \vdash_{L_1} T$.
However, we would like to add some additional inference rules to reflect our understanding of what "knowledge" means. One has some choice
in deciding what rules to lay down here, but we will only add one rule,
which informally reflects the assertion that "all knowledge agents are highly
logical":
First-order epistemic inference rule: If $S_1, \dots, S_i, T \in L_0$ are
sentences such that
$$S_1, \dots, S_i \vdash_{L_0} T$$
and $K$ is a knowledge agent, then
$$K(S_1), \dots, K(S_i) \vdash_{L_1} K(T).$$
We will introduce higher order epistemic inference rules when we turn
to higher order epistemic logics.
Informally speaking, the epistemic inference rule asserts that if $T$ can be
deduced from $S_1, \dots, S_i$, and $K$ knows $S_1, \dots, S_i$ to be true, then $K$ must also
know $T$ to be true. For instance, since modus ponens gives us the inference
$$A_1, (A_1 \implies A_2) \vdash_{L_0} A_2$$
we therefore have, by the first-order epistemic inference rule,
$$K_1(A_1), K_1(A_1 \implies A_2) \vdash_{L_1} K_1(A_2)$$
(note how this is different from (1.1) - why?).
As another example, of more relevance to the islander puzzle, we have
$$(A_1 \vee A_2 \vee A_3), \neg A_2, \neg A_3 \vdash_{L_0} A_1$$
and thus, by the first-order epistemic inference rule,
$$K_1(A_1 \vee A_2 \vee A_3), K_1(\neg A_2), K_1(\neg A_3) \vdash_{L_1} K_1(A_1).$$
In the islander interpretation, this asserts that if $I_1$ knows that one of the
three islanders $I_1, I_2, I_3$ has blue eyes, but also knows that $I_2$ and $I_3$ do not
have blue eyes, then $I_1$ must also know that he himself (or she herself) has
blue eyes.
models $M_1$ of $L_1$. As $L_1$ is an extension of $L_0$, any model $M_1$ of $L_1$ must contain as a component a model $M_0$ of $L_0$, which describes the truth assignment
of each of the atomic propositions $A_i$ of $L_0$; but it must also describe the state
of knowledge of each of the agents $K_i$ in this logic. One can describe this
state in two equivalent ways; either as a theory $\{S \in L_0 : M_1 \models_{L_1} K_i(S)\}$
(in $L_0$) of all the sentences $S$ in $L_0$ that $K_i$ knows to be true (which, by the
first-order epistemic inference rule, is closed under $\vdash_{L_0}$ and is thus indeed a
theory in $L_0$); or equivalently (by the soundness and completeness of $L_0$),
as a set
$$\{M_{0,i} \in \mathrm{Mod}(L_0) : M_{0,i} \models_{L_0} S \text{ whenever } M_1 \models_{L_1} K_i(S)\}$$
of all the possible models of $L_0$ in which all the statements that $K_i$ knows to
be true, are in fact true. We will adopt the latter perspective; thus a model
$M_1$ of $L_1$ consists of a tuple
$$M_1 = (M_0, M_0^{(1)}, \dots, M_0^{(m)})$$
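To make the tuple description concrete, here is a minimal sketch of such a model in code. It assumes, in line with the description above, that $K_i(S)$ holds exactly when $S$ is true in every world of the $i$-th set of possible worlds. Worlds are tuples of booleans (entry $i - 1$ is the truth value of $A_i$), $L_0$ sentences are functions on worlds, and the class and helper names are hypothetical. The helper `islander_model` builds the particular clouds in which agent $i$ is unsure only about $A_i$; this is one illustrative choice of clouds, not the only possible model.

```python
def atom(i):
    """The atomic sentence A_i, evaluated on a world (a tuple of booleans)."""
    return lambda world: world[i - 1]


class FirstOrderModel:
    """An L1-model: an actual L0-world together with, for each agent i,
    a set ("cloud") of L0-worlds that agent i considers possible."""

    def __init__(self, actual_world, clouds):
        self.actual_world = actual_world
        self.clouds = clouds  # dict mapping agent index i to a set of worlds

    def holds(self, sentence):
        """Truth of an L0 sentence in the actual world."""
        return sentence(self.actual_world)

    def knows(self, i, sentence):
        """K_i(S): S is true in every world that agent i considers possible."""
        return all(sentence(world) for world in self.clouds[i])


def islander_model(actual_world):
    """Clouds in which agent i sees every A_j with j != i, but not A_i."""
    clouds = {}
    for i in range(1, len(actual_world) + 1):
        clouds[i] = {actual_world[: i - 1] + (value,) + actual_world[i:]
                     for value in (False, True)}
    return FirstOrderModel(actual_world, clouds)


at_least_one_blue = lambda world: any(world)

# I1 and I2 have blue eyes, I3 does not.
M1 = islander_model((True, True, False))

# K2(A1) and K1(A1 or A2 or A3) and not A3
example = (M1.knows(2, atom(1))
           and M1.knows(1, at_least_one_blue)
           and not M1.holds(atom(3)))

if __name__ == "__main__":
    print(example)
    print(M1.knows(1, atom(1)))  # I1 does not know his or her own eye colour
```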
$\{M_{k-1}(M_{0,i}^-), M_{k-1}(M_{0,i}^+)\}$, where $M_{0,i}^-$ (resp. $M_{0,i}^+$) is the $L_0$-model which is
identical to $M_0$ except that the truth value of $A_i$ is set to false (resp. true).
Informally, $M_k(M_0)$ models the $k^{\text{th}}$-order epistemology of the $L_0$-world $M_0$,
in which each islander sees each other's eye colour (and knows that each
other islander can see all other islanders' eye colour, and so forth for $k$
iterations), but is unsure as to his or her own eye colour (which is why the
set $M_{k-1}^{(i)}$ of $A_i$'s possible $L_{k-1}$-worlds branches into two possibilities). As
one recursively explores the clouds of hypothetical worlds in these models,
one can move further and further away from the "real" world. Consider for
instance the situation when $n = 3$ and $M_0 \models A_1, A_2, A_3$ (thus in the "real"
world, all three islanders have blue eyes), and $k = 3$. From the perspective
[...] have blue eyes: $M_{0,1}^- \models \neg A_1, A_2, A_3$. In that world, we can then pass to the
perspective of $K_2$, and then one could be in the world $M_1(M_{0,1,2})$, in which
[...] $K_3$, in which one could be in the world $M_{0,1,2,3}$, in which none of $I_1, I_2, I_3$
[...]
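The recursive structure of $M_k(M_0)$ translates directly into a recursive evaluator. The sketch below is an illustration under assumptions made here: sentences are encoded as nested tuples such as `("K", 1, ("atom", 2))` (an encoding invented for this example), worlds are tuples of booleans, and $K_i(S)$ is evaluated in $M_k(M_0)$ by checking $S$ in both $M_{k-1}(M_{0,i}^-)$ and $M_{k-1}(M_{0,i}^+)$.

```python
def set_atom(world, i, value):
    """The L0-world identical to `world` except that A_i has the given value."""
    return world[: i - 1] + (value,) + world[i:]


def holds(sentence, world, k):
    """Truth of `sentence` in the model M_k(world).

    Sentences are nested tuples:
      ("atom", i), ("not", S), ("or", S1, S2, ...), ("and", S1, S2, ...),
      ("K", i, S)   meaning "agent i knows S".
    """
    op = sentence[0]
    if op == "atom":
        return world[sentence[1] - 1]
    if op == "not":
        return not holds(sentence[1], world, k)
    if op == "or":
        return any(holds(s, world, k) for s in sentence[1:])
    if op == "and":
        return all(holds(s, world, k) for s in sentence[1:])
    if op == "K":
        if k == 0:
            raise ValueError("knowledge nesting exceeds the order k of the model")
        _, i, inner = sentence
        # Agent i's cloud consists of the two worlds that differ only in A_i.
        return all(holds(inner, set_atom(world, i, value), k - 1)
                   for value in (False, True))
    raise ValueError(f"unknown connective: {op!r}")


# n = 3, all three islanders have blue eyes in the "real" world.
real_world = (True, True, True)
B1 = ("or", ("atom", 1), ("atom", 2), ("atom", 3))  # at least one blue-eyed islander

if __name__ == "__main__":
    print(holds(("K", 1, B1), real_world, 3))
    print(holds(("K", 1, ("K", 2, B1)), real_world, 3))
    print(holds(("K", 1, ("K", 2, ("K", 3, B1))), real_world, 3))
```

In this example the first two sentences hold but the third does not: after passing through the perspectives of $K_1$, $K_2$ and $K_3$ in turn, one reaches a hypothetical world in which nobody has blue eyes.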
The logic $L_\infty$ now allows one to talk about arbitrarily deeply nested
strings of knowledge: if $S$ is a sentence in $L_\infty$, and $K$ is a knowledge agent,
then $K(S)$ is also a sentence in $L_\infty$. This allows for the following definition:
Definition 1.5.5 (Common knowledge). If $S$ is a sentence in $L_\infty$, then
$C(S)$ is the set of all sentences of the form
$$K_{i_1}(K_{i_2}(\dots(K_{i_k}(S))\dots))$$
where $k \geq 0$ and $K_{i_1}, \dots, K_{i_k}$ are knowledge agents (possibly with repetition).
Thus, for instance, using the epistemic inference rules, every tautology
in $L_\infty$ is commonly known as such: if $\vdash_{L_\infty} S$, then $\vdash_{L_\infty} C(S)$.
Let us now work in the islander model in which there are $n$ atomic
propositions $A_1, \dots, A_n$ and $n$ knowledge agents $K_1, \dots, K_n$. To model the
statement that "it is commonly known that each islander knows each other
islander's eye colour", one can use the sets of sentences
(1.2)
$$C(A_i \implies K_j(A_i))$$
and
(1.3)
$$C(\neg A_i \implies K_j(\neg A_i))$$
(informally, S asserts that all blue-eyed islanders know their own eye colour).
(1) If $m \leq l$, then
$$\mathcal{T}, B_m, C(B_l) \vdash_{L_\infty} S.$$
(2) If $m > l$, then
$$\mathcal{T}, B_m, C(B_l) \not\vdash_{L_\infty} S.$$
Proof. The first part of the theorem can be established informally as follows: if $B_m$ holds, then each blue-eyed islander sees $m - 1$ other blue-eyed
islanders, but also knows that there are at least $l$ blue-eyed islanders. If
$m \leq l$, this forces each blue-eyed islander to conclude that his or her own
eyes are blue (and in fact if $m < l$, the blue-eyed islander's knowledge is now
inconsistent, but the conclusion is still valid thanks to ex falso quodlibet). It
is a routine matter to formalise this argument using the axioms (1.2), (1.3)
and the epistemic inference rule; we leave the details as an exercise.
To prove the second part, it suffices (by soundness) to construct an $L_\infty$-model $M_\infty$ which satisfies $\mathcal{T}$, $B_m$, and $C(B_l)$ but not $S$. By definition of
an $L_\infty$-model, it thus suffices to construct, for all sufficiently large natural
numbers $k$, an $L_k$-model $M_k$ which satisfies $\mathcal{T} \cap L_k$, $B_m$, and $C(B_l) \cap L_k$,
but not $S$, and which are consistent with each other in the sense that each
$M_k$ is the restriction of $M_{k+1}$ to $L_k$.
We can do this by a modification of the construction in Example 1.5.2.
For any $L_0$-model $M_0$, we can recursively define an $L_k$-model $M_{k,l}(M_0)$ for
any $k \geq 0$ by setting $M_{0,l}(M_0) := M_0$, and then for each $k \geq 1$, setting
$M_{k,l}(M_0)$ to be the $L_k$-model with $L_0$-model $M_0$, and with possible worlds
$M_{k-1}^{(i)}$ given by
[...]
this is the same construction as in Example 1.5.2, except that at all levels of
the recursive construction, we restrict attention to worlds that obey $B_l$. A
routine induction shows that the $M_{k,l}(M_0)$ determine a limit $M_{\infty,l}(M_0)$,
which is an $L_\infty$ model that obeys $\mathcal{T}$ and $C(B_l)$. If $M_0 \models_{L_0} B_m$, then
clearly $M_{\infty,l}(M_0) \models_{L_\infty} B_m$ as well. But if $m > l$, then we see that
$M_{\infty,l}(M_0) \not\models_{L_\infty} S$, because for any index $i$ with $M_0 \models_{L_0} A_i$, we see
that if $k \geq 1$, then $M_{k-1}^{(i)}(M_0)$ contains worlds in which $A_i$ is false, and
so $M_{k,l}(M_0) \not\models_{L_k} K_i(A_i)$ for any $k \geq 1$.
1.5.2. Temporal epistemic logic. The epistemic logic discussed above is
sufficiently powerful to model the knowledge environment of the islanders
in the blue-eyed islander puzzle at a single instant in time, but in order to
fully model the islander puzzle, we must now incorporate the role of
time. To avoid confusion, I feel that this is best accomplished by adopting
a spacetime perspective, in which time is treated as another coordinate
rather than having any particularly privileged role in the theory, and the
model incorporates all time-slices of the system at once. In particular, if we
allow the time parameter $t$ to vary along some set $T$ of times, then each actor
$I_i$ in the model should now generate not just a single knowledge agent $K_i$,
but instead a family $(K_{i,t})_{t \in T}$ of knowledge agents, one for each time $t \in T$.
Informally, $K_{i,t}(S)$ should then model the assertion that $I_i$ knows $S$ at time
$t$. This of course leads to many more knowledge agents than before; if for
instance one considers an islander puzzle with $n$ islanders over $M$ distinct
points in time, this would lead to $nM$ distinct knowledge agents $K_{i,t}$. And
if the set of times $T$ is countably or uncountably infinite, then the number
of knowledge agents would similarly be countably or uncountably infinite.
Nevertheless, there is no difficulty extending the previous epistemic logics
$L_k$ and $L_\infty$ to cover this situation. In particular we still have a complete
and sound logical framework to work in.
Note that if we do so, we allow for the ability to nest knowledge operators
at different times in the past or future. For instance, if we have three times
$t_1 < t_2 < t_3$, one could form a sentence such as
$$K_{1,t_2}(K_{2,t_1}(S)),$$
which informally asserts that at time $t_2$, $I_1$ knows that $I_2$ already knew $S$
to be true by time $t_1$, or
$$K_{1,t_2}(K_{2,t_3}(S)),$$
which informally asserts that at time $t_2$, $I_1$ knows that $I_2$ will know $S$ to be
true by time t3 . The ability to know certain statements about the future is
not too relevant for the blue-eyed islander puzzle, but is a crucial point in
the unexpected hanging paradox.
Of course, with so many knowledge agents present, the models become
more complicated; a model $M_k$ of $L_k$ now must contain inside it clouds
$M^{(i,t)}_{k-1}$ of possible worlds for each actor $I_i$ and each time $t \in T$.
One reasonable axiom to add to a temporal epistemological system is
the ability of agents to remember what they know. More precisely, we can
impose the memory axiom
(1.4)
We can now set up the various axioms for the puzzle. The "highly
logical" axiom has already been subsumed in the epistemological inference
rule. We also impose the memory axiom (1.4). Now we formalise the other
assumptions of the puzzle:
(All islanders see each other's eye colour) If $i, j \in \{1, \dots, n\}$ are
distinct and $t \in \mathbf{Z}$, then
(1.5)
(1.6)
(1.7)
(1.8)
Similarly, if $S''_{i,t} \in C_t(\neg S_{i,t})$, then
$$C(\neg S_{i,t} \implies S''_{i,t}).$$
(1.9)
$C_0(B_1)$.
Let T denote the union of all the axioms (1.4), (1.5), (1.6), (1.7), (1.8),
(1.10). The solution to the islander puzzle can then be summarised as
follows:
Theorem 1.5.7. Let $1 \leq m \leq n$.
(1) (At least one blue-eyed islander commits suicide by day $m$)
$$T, B_m \vdash_{L_\infty} \bigvee_{i=1}^n \bigvee_{t=1}^m (A_i \wedge S_{i,t}).$$
(2) (Nobody needs to commit suicide before day $m$) For any $t < m$ and
$1 \leq i \leq m$,
$$T, B_m \not\vdash_{L_\infty} S_{i,t}.$$
Note that the first conclusion is weaker than the conventional solution to
the puzzle, which asserts in fact that all m blue-eyed islanders will commit
suicide on day $m$. While this is indeed the default outcome of the hypotheses
$T, B_m$, it turns out that this is not the only possible outcome; for instance, if
one blue-eyed person happens to commit suicide on day 0 or day 1 (perhaps
for a reason unrelated to learning his or her own eye colour), then it
turns out that this cancels the effect of the foreigner's announcement, and
prevents further suicides. (So, if one were truly nitpicky, the conventional
solution is not always correct, though one could also find similar loopholes
to void the solution to most other logical puzzles, if one tried hard enough.)
In fact there is a strengthening of the first conclusion: given the hypotheses $T, B_m$, there must exist a time $1 \leq t \leq m$ and $t$ distinct islanders
$I_{i_1}, \dots, I_{i_t}$ such that $A_{i_j} \wedge S_{i_j,t}$ holds for all $j = 1, \dots, t$.
Note that the second conclusion does not prohibit the existence of some
models of T , Bm in which suicides occur before day m (consider for instance
a situation in which a second foreigner made a similar announcement a few
days before the first one, causing the chain of events to start at an earlier
point and leading to earlier suicides).
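The default outcome described above can be checked on small cases with a brute-force possible-worlds simulation. The sketch below is a semantic illustration only, not the syntactic derivation in $L_\infty$ used in the theorem: it assumes that the only public events are the foreigner's announcement and the observed absence of suicides, and the function names are hypothetical.

```python
from itertools import product

def knows_own_eyes_are_blue(i, world, worlds):
    """True if islander i is blue-eyed in every remaining world that
    agrees with `world` on everybody else's eye colour."""
    n = len(world)
    candidates = [v for v in worlds
                  if all(v[j] == world[j] for j in range(n) if j != i)]
    return all(v[i] for v in candidates)

def first_deduction_day(n, blue):
    """Return (day, islanders) for the first day on which somebody deduces
    that their own eyes are blue. `blue` is the non-empty set of blue-eyed
    islanders in the actual world."""
    if not blue:
        raise ValueError("the announcement requires at least one blue-eyed islander")
    actual = tuple(i in blue for i in range(n))
    # Worlds compatible with the announcement: at least one blue-eyed islander.
    worlds = {w for w in product([False, True], repeat=n) if any(w)}
    day = 1
    while True:
        deducers = {i for i in range(n)
                    if knows_own_eyes_are_blue(i, actual, worlds)}
        if deducers:
            return day, deducers
        # Everybody sees that nobody acted, so discard every world in which
        # somebody would have been able to make the deduction today.
        worlds = {w for w in worlds
                  if not any(knows_own_eyes_are_blue(i, w, worlds)
                             for i in range(n))}
        day += 1
```

For instance, `first_deduction_day(4, {0, 2, 3})` reports day 3 with all three blue-eyed islanders deducing simultaneously, in line with the conventional solution.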
Proof. (Sketch) To illustrate the first part of the theorem, we focus on the
simple case m = n = 2; the general case is similar but requires more notation
(and an inductive argument). It suffices to establish that
$$T, B_2, \neg S_{1,1}, \neg S_{2,1} \vdash_{L_\infty} S_{1,2} \wedge S_{2,2}$$
(i.e. if nobody suicides by day 1, then both islanders will suicide on day 2).
Assume $T, B_2, \neg S_{1,1}, \neg S_{2,1}$. From (1.10) we have
$$K_{1,0}(K_{2,0}(A_1 \vee A_2))$$
and hence by (1.4)
$$K_{1,1}(K_{2,0}(A_1 \vee A_2)).$$
By (1.6) we also have
$$K_{1,1}(\neg A_1 \implies K_{2,0}(\neg A_1))$$
whereas from the epistemic inference axioms we have
$$K_{1,1}((K_{2,0}(A_1 \vee A_2) \wedge K_{2,0}(\neg A_1)) \implies K_{2,0}(A_2)).$$
From the epistemic inference axioms again, we conclude that
$$K_{1,1}(\neg A_1 \implies K_{2,0}(A_2))$$
and hence by (1.7) (and epistemic inference)
$$K_{1,1}(\neg A_1 \implies S_{2,1}).$$
On the other hand, from $\neg S_{2,1}$ and (1.9) we have
$$K_{1,1}(\neg S_{2,1})$$
and hence by epistemic inference
$$K_{1,1}(A_1)$$
$M_0$, then if $k \geq 1$ and $\tilde M_{k-1}(\cdot)$ has already been defined, we define $\tilde M_k(M_0)$
to be the $L_k$-model with $L_0$-model $M_0$, and with $M^{(i,t)}_{k-1}(M_0)$ for $i = 1, \dots, n$
and $t \in \mathbf{Z}$ defined by the following rules:
Case 1. If $t < 0$, then we set $M^{(i,t)}_{k-1}(M_0)$ to be the set of all $L_{k-1}$-models
of the form $\tilde M_{k-1}(M_0')$, where $M_0'$ obeys the two properties "$I_i$ sees other islanders' eyes" and "$I_i$ remembers suicides" from the preceding construction.
($M_0'$ does not need to be admissible in this case.)
Case 2. If $t = m - 1$, $M_0 \models A_i$, and there does not exist $1 \leq t' \leq t$ and
distinct $i_1, \dots, i_{t'} \in \{1, \dots, n\}$ such that $M_0 \models_{L_0} A_{i_j} \wedge S_{i_j,t'}$ for all $j = 1, \dots, t'$,
then we set $M^{(i,t)}_{k-1}(M_0)$ to be the set of all $L_{k-1}$-models of the form $\tilde M_{k-1}(M_0')$, where
$M_0'$ is admissible, obeys the two properties "$I_i$ sees other islanders' eyes"
and "$I_i$ remembers suicides" from the preceding construction, and also obeys
the additional property $M_0' \models A_i$. (Informally, this is the case in which $I_i$
must learn $A_i$.)
Case 3. In all other cases, we set $M^{(i,t)}_{k-1}(M_0)$ to be the set of all $L_{k-1}$-models
of the form $\tilde M_{k-1}(M_0')$, where $M_0'$ is admissible and obeys the two
properties "$I_i$ sees other islanders' eyes" and "$I_i$ remembers suicides" from
the preceding construction.
We let $\tilde M_\infty(M_0)$ be the limit of the $\tilde M_k(M_0)$ (which can easily be verified
to exist by induction). A quite tedious verification reveals that for any
admissible $L_0$-model $M_0$ of blue-eyed count $m$, $\tilde M_\infty(M_0)$ obeys both $T$
and $B_m$, but one can choose $M_0$ to not admit any suicides before time $m$,
which will give the second claim of the theorem.
Remark 1.5.8. Under the assumptions used in our analysis, we have shown
that it is inevitable that the foreigner's comment will cause at least one
death. However, it is possible to avert all deaths by breaking one or more
of the assumptions used. For instance, if it is possible to sow enough doubt
in the islanders' minds about the logical and devout nature of the other
islanders, then one can cause a breakdown of the epistemic inference rule
or of (1.7), and this can prevent the chain of deductions from reaching its
otherwise deadly conclusion.
Remark 1.5.9. The same argument actually shows that $L_\infty$ can be replaced
by $L_m$ for the first part of Theorem 1.5.7 (after restricting the definition of
common knowledge to those sentences that are actually in $L_m$, of course).
On the other hand, using $L_k$ for $k < m$, one can show that this logic is
insufficient to deduce any suicides if there are $m$ blue-eyed islanders, by
using the model $\tilde M_k(M_0)$ defined above; we omit the details.
1.5.4. The unexpected hanging paradox. We now turn to the unexpected hanging paradox, and try to model it using (temporal) epistemic logic.
Here is a common formulation of the paradox (taken from the Wikipedia
entry on this problem):
$$\neg K(S),$$
i.e. as the assertion that $K$ does not know $S$ to be true. However, this leads
to the following situation: if $K$ has inconsistent knowledge (in particular, one
has $K(\bot)$, where $\bot$ represents falsity (the negation of a tautology)), then by
ex falso quodlibet, $K(S)$ would be true for every $S$, and hence $K$ would expect
everything and be surprised by nothing. An alternative interpretation, then,
is to adopt the convention that an agent with inconsistent knowledge is so
confused as to not be capable of expecting anything (and thus be surprised
by everything). In this case, "$K$ does not expect $S$" should instead be
modeled as
(1.12)
$$(\neg K(S)) \vee K(\bot),$$
K(S),
be $M_{k-1}(M_0^+)$, and this will also work. (Of course, in such models, $S$ must
be false.)
Another peculiarity of the sentence $S$ is that
$$K(S), K(K(S)) \models_{L_\infty} K(\bot)$$
as can be easily verified (by modifying the proof of the second statement
of the above theorem). Thus, the sentence $S$ has the property that if the
prisoner believes $S$, and also knows that he or she believes $S$, then the
prisoner's beliefs automatically become inconsistent, despite the fact that
$S$ is not actually a self-contradictory statement (unless also combined with
$K(S)$).
Now we move to the case when the execution could take place at two
possible times, say Monday at noon and Tuesday at noon. We then have
two atomic statements: E1 , the assertion that the execution takes place
on Monday at noon, and E2 , the assertion that the execution takes place
on Tuesday at noon. There are two knowledge agents: $K_1$, the state of
knowledge just before Monday at noon, and $K_2$, the state of knowledge just
before Tuesday at noon. (There is again the annoying notational issue that
if $E_1$ occurs, then presumably the prisoner will have no sensible state of
knowledge by Tuesday, and so $K_2$ might not be well defined in that case;
to avoid this irrelevant technicality, we replace the execution by some non-lethal punishment (or use an alternate formulation of the puzzle, for instance
by replacing an unexpected hanging with a surprise exam).)
We will need one axiom beyond the basic axioms of epistemic logic,
namely
(1.14)
$K_1(\bot)$.
$K_2(\bot)$.
$\neg K_1(\neg K_2(\bot))$.
(We leave this as an exercise for the interested reader.) In other words, in
order for the judge's sentence to be common knowledge, either the prisoner's
knowledge on Monday or Tuesday needs to be inconsistent, or else the prisoner's knowledge is consistent, but the prisoner is unable (on Monday) to
determine that his or her own knowledge (on Tuesday) is consistent. Notice
that the third conclusion here, $\neg K_1(\neg K_2(\bot))$, is very reminiscent of Gödel's
second incompleteness theorem, and indeed in [KrRa2010], the surprise
examination argument is modified to give a rigorous proof of that theorem.
Remark 1.5.12. Here is an explicit example of an $L_\infty$-world in which $S$
is common knowledge, and $K_1$ and $K_2$ are both consistent (but $K_1$ does
not know that $K_2$ is consistent). We first define $L_k$-models $M_k$ for each
$k = 0, 1, \dots$ recursively by setting $M_0$ to be the world in which $M_0 \models_{L_0} E_2$
and $M_0 \models_{L_0} \neg E_1$, and then define $M_k$ for $k \geq 1$ to be the $L_k$-model with
$L_0$-model $M_0$, with $M^{(1)}_{k-1} := \{M_{k-1}\}$, and $M^{(2)}_{k-1} := \emptyset$. (Informally: the
execution is on Tuesday, and the prisoner knows this on Monday, but has
become insane by Tuesday.) We then define the models $\tilde M_k$ for $k = 0, 1, \dots$
recursively by setting $\tilde M_0$ to be the world in which $\tilde M_0 \models_{L_0} E_1$ and $\tilde M_0 \models_{L_0} \neg E_2$,
then define $\tilde M_k$ for $k \geq 1$ to be the $L_k$-model with $L_0$-model $\tilde M_0$,
$\tilde M^{(1)}_{k-1} := \{M_{k-1}, \tilde M_{k-1}\}$, and $\tilde M^{(2)}_{k-1} := \{\tilde M_{k-1}\}$. (Informally: the execution
is on Monday, but the prisoner only finds this out after the fact.) The limit
$\tilde M_\infty$ of the $\tilde M_k$ then has $S$ as common knowledge, with $K_1(\bot)$ and $K_2(\bot)$
both false, but $K_1(\neg K_2(\bot))$ is also false.
Chapter 2
Group theory
buy back these assets if needed by expanding the class of objects again (and
defining the property P in a sufficiently abstract and invariant fashion).)
For instance, if one needs to verify P (x) for all x in a normed vector
space X, and the property P (x) is homogeneous (so that, for any scalar
c, P (x) implies P (cx)), then we can spend this homogeneity invariance to
normalise x to have norm 1, thus effectively replacing X with the unit sphere
of X. Of course, this new space is no longer closed under homogeneity; we
have spent that invariance property. Conversely, to prove a property P (x)
for all x on the unit sphere, it is equivalent to prove P (x) for all x in X,
provided that one extends the definition of P(x) to X in a homogeneous
fashion.
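As a small worked illustration of spending homogeneity (the inequality is chosen here purely for illustration and is not taken from the text), consider the claim $(x_1 + x_2)^2 \leq 2(x_1^2 + x_2^2)$ for all $x \in \mathbf{R}^2$. Both sides are homogeneous of degree two, so the claim for $x$ is equivalent to the claim for $cx$ for any non-zero scalar $c$, and it suffices to check unit vectors:

```latex
\begin{align*}
x = (\cos\theta, \sin\theta) \implies
(x_1 + x_2)^2 &= \cos^2\theta + 2\sin\theta\cos\theta + \sin^2\theta \\
              &= 1 + \sin 2\theta \\
              &\leq 2 = 2(x_1^2 + x_2^2).
\end{align*}
```

The case $x = 0$ is trivial, and every other $x$ is a scalar multiple of a unit vector, so the inequality on the unit circle gives the inequality on all of $\mathbf{R}^2$.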
As a rule of thumb, each independent symmetry of the problem that one
has can be used to achieve one normalisation. Thus, for instance, if one has
a three-dimensional group of symmetries, one can expect to normalise three
quantities of interest to equal a nice value (typically one normalises to 0 for
additive symmetries, or 1 for multiplicative symmetries).
In a similar spirit, if the problem one is trying to solve is closed with
respect to an operation such as addition, then one can restrict attention to
all x in a suitable generating set of X, such as a basis. Many divide and
conquer strategies are based on this type of observation.
Or: if the problem one is trying to solve is closed with respect to limits,
then one can restrict attention to all x in a dense subclass of X. This is a
particularly useful trick in real analysis (using limiting arguments to replace
reals with rationals, sigma-compact sets with compact sets, rough functions
with nice functions, etc.). If one uses ultralimits instead of limits, this
type of observation leads to various useful correspondence principles between
finitary instances of the problem and infinitary ones (with the former serving
as a kind of dense subclass of the latter); see e.g. [Ta2012, 1.7].
Sometimes, one can exploit rather abstract or unusual symmetries. For
instance, certain types of statements in algebraic geometry tend to be insensitive to the underlying field (particularly if the fields remain algebraically
closed). This allows one to sometimes move from one field to another, for
instance from an infinite field to a finite one or vice versa; see [Ta2010b,
1.2]. Another surprisingly useful symmetry is closure with respect to tensor
powers; see [Ta2008, 1.9].
Gauge symmetry is a good example of a symmetry which is both spent
(via gauge fixing) and bought (by reformulating the problem in a gauge-invariant fashion); see [Ta2009b, 1.4].
Symmetries also have many other uses beyond their ability to be spent
in order to obtain normalisation. For instance, they can be used to analyse a claim or argument for compatibility with that symmetry; generally
speaking, one should not be able to use a non-symmetric argument to prove
a symmetric claim (unless there is an explicit step where one spends the
symmetry in a strategic fashion). The useful tool of dimensional analysis is
perhaps the most familiar example of this sort of meta-analysis.
Thanks to Noether's theorem and its variants, we also know that there
often is a duality relationship between (continuous) symmetries and conservation laws; for instance, the time-translation invariance of a (Hamiltonian
or Lagrangian) system is tied to energy conservation, the spatial translation
invariance is tied to momentum conservation, and so forth. The general
principle of relativity (that the laws of physics are invariant with respect to
arbitrary nonlinear coordinate changes) leads to a much stronger pointwise
conservation law, namely the divergence-free nature of the stress-energy tensor, which is fundamentally important in the theory of wave equations (and
particularly in general relativity).
As the above examples demonstrate, when solving a mathematical problem, it is good to be aware of what symmetries and closure properties the
problem has, before one plunges into a direct attack on the problem. In
some cases, such symmetries and closure properties only become apparent
if one abstracts and generalises the problem to a suitably natural framework; this is one of the major reasons why mathematicians use abstraction
even to solve concrete problems. (To put it another way, abstraction can be
used to purchase symmetries or closure properties by spending the implicit
normalisations that are present in a concrete approach to the problem; see
[Ta2011d, 1.6].)
$$\sum_{j=1}^n z_j w_{n+j} - z_{n+j} w_j.$$
Remark 2.2.1. This same convention also underlies the notation for the
exceptional simple Lie groups $G_2, F_4, E_6, E_7, E_8$, which we will not discuss
further here.
With two exceptions, the classical Lie groups $A_n, B_n, C_n, D_n$ are all simple, i.e. their Lie algebras are non-abelian and not expressible as the direct sum of smaller Lie algebras. The two exceptions are $D_1 = SO_2(\mathbf{C})$,
which is abelian (isomorphic to $\mathbf{C}^\times$, in fact) and thus not considered simple,
and $D_2 = SO_4(\mathbf{C})$, which turns out to essentially split as $A_1 \times A_1 =
SL_2(\mathbf{C}) \times SL_2(\mathbf{C})$, in the sense that the former group is double covered
by the latter (and in particular, there is an isogeny from the latter to the
former, and the Lie algebras are isomorphic).
The adjoint action of a Cartan subgroup of a Lie group $G$ on the Lie
algebra $\mathfrak{g}$ splits that algebra into weight spaces; in the case of a simple
Lie group, the associated weights are organised by a Dynkin diagram. The
Dynkin diagrams for $A_n, B_n, C_n, D_n$ are of course well known, and can be
found in any text on Lie groups or algebraic groups.
For small $n$, some of these Dynkin diagrams are isomorphic; this is a classic instance of the tongue-in-cheek "strong law of small numbers" [Gu1988],
though in this case "strong law of small diagrams" would be more appropriate. These accidental isomorphisms then give rise to the exceptional isomorphisms between Lie algebras (and thence to exceptional isogenies between
Lie groups). Excluding those isomorphisms involving the exceptional Lie
algebras $E_n$ for $n = 3, 4, 5$, these isomorphisms are
(1) $A_1 = B_1 = C_1$;
(2) $B_2 = C_2$;
(3) $D_2 = A_1 \times A_1$;
(4) $D_3 = A_3$.
There is also a pair of exceptional isomorphisms from (the $Spin_8$ form of)
$D_4$ to itself, a phenomenon known as triality.
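As a quick consistency check on the list above, one can compare dimensions using the standard formulae $\dim A_n = n(n+2)$, $\dim B_n = \dim C_n = n(2n+1)$ and $\dim D_n = n(2n-1)$. The sketch below only verifies a necessary condition for each isomorphism (matching dimensions), not the isomorphism itself; the helper names are hypothetical.

```python
def dim_A(n):
    return n * (n + 2)        # sl_{n+1}: (n+1)^2 - 1

def dim_B(n):
    return n * (2 * n + 1)    # so_{2n+1}: (2n+1)(2n)/2

def dim_C(n):
    return n * (2 * n + 1)    # sp_{2n}

def dim_D(n):
    return n * (2 * n - 1)    # so_{2n}: (2n)(2n-1)/2

# Each exceptional isomorphism forces the two sides to have equal dimension.
dimension_checks = {
    "A1 = B1 = C1": dim_A(1) == dim_B(1) == dim_C(1),
    "B2 = C2": dim_B(2) == dim_C(2),
    "D2 = A1 x A1": dim_D(2) == dim_A(1) + dim_A(1),
    "D3 = A3": dim_D(3) == dim_A(3),
}
```

Note that $B_n$ and $C_n$ have the same dimension for every $n$, although they are only isomorphic for $n \leq 2$, so equality of dimensions is far from sufficient.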
These isomorphisms are most easily seen via algebraic and combinatorial
tools, such as an inspection of the Dynkin diagrams. However, the isomorphisms listed above¹ can also be seen by more geometric means, using
the basic representations of the classical Lie groups on their natural vector
spaces ($\mathbf{C}^{n+1}, \mathbf{C}^{2n+1}, \mathbf{C}^{2n}, \mathbf{C}^{2n}$ for $A_n, B_n, C_n, D_n$ respectively) and combinations thereof (such as exterior powers). These isomorphisms are quite
standard (they can be found, for instance, in [Pr2007]), but I decided to
present them here for the sake of reference.
¹ However, I don't know of a simple way to interpret triality geometrically; the descriptions
I have seen tend to involve some algebraic manipulation of the octonions or of a Clifford algebra,
in a manner that tended to obscure the geometry somewhat.
Chapter 3
Combinatorics
$\binom{D+d}{d}$
lie on the boundary $\{Q = 0\}$). This can be iterated to give a useful cell
decomposition:
Proposition 3.1.1 (Cell decomposition). Let $P$ be a finite set of points in
$\mathbf{R}^d$, and let $D$ be a positive integer. Then there exists a polynomial $Q$ of
degree at most $D$, and a decomposition
$$\mathbf{R}^d = \{Q = 0\} \cup C_1 \cup \dots \cup C_m$$
into the hypersurface $\{Q = 0\}$ and a collection $C_1, \dots, C_m$ of cells bounded
by $\{Q = 0\}$, such that $m = O_d(D^d)$, and such that each cell $C_i$ contains at
most $O_d(|P|/D^d)$ points.
A proof of this decomposition is sketched in [Ta2011d, 3.9]. The cells
in the argument are not necessarily connected (being instead formed by
intersecting together a number of semi-algebraic sets such as $\{Q > 0\}$ and
$\{Q < 0\}$), but it is a classical result² [OlPe1949], [Mi1964], [Th1965]
that any degree $D$ hypersurface $\{Q = 0\}$ divides $\mathbf{R}^d$ into $O_d(D^d)$ connected
components, so one can easily assume that the cells are connected if desired.
Remark 3.1.2. By setting $D$ as large as $O_d(|P|^{1/d})$, we obtain as a limiting
case of the cell decomposition the fact that any finite set $P$ of points in $\mathbf{R}^d$
can be captured by a hypersurface of degree $O_d(|P|^{1/d})$. This is in
fact true over arbitrary fields (not just over $\mathbf{R}$), and can be proven by a
simple linear algebra argument; see e.g. [Ta2009b, 1.1]. However, the cell
decomposition is more flexible than this algebraic fact due to the ability to
arbitrarily select the degree parameter $D$.
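The linear algebra argument mentioned in this remark can be sketched as follows: the polynomials of degree at most $D$ in $d$ variables form a vector space of dimension $\binom{D+d}{d}$, and vanishing at each point of $P$ is one linear condition, so a non-trivial vanishing polynomial exists as soon as $\binom{D+d}{d} > |P|$. The code below is a minimal illustration over the rationals; the function names and the Gauss-Jordan routine are choices made for this sketch rather than anything taken from the cited reference.

```python
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb

def monomials(d, D):
    """Exponent tuples of all monomials in d variables of degree <= D."""
    exps = []
    for total in range(D + 1):
        for combo in combinations_with_replacement(range(d), total):
            e = [0] * d
            for v in combo:
                e[v] += 1
            exps.append(tuple(e))
    return exps

def monomial_value(e, p):
    value = Fraction(1)
    for x, k in zip(p, e):
        value *= Fraction(x) ** k
    return value

def evaluate(coeffs, exps, p):
    return sum(c * monomial_value(e, p) for c, e in zip(coeffs, exps))

def null_vector(rows, ncols):
    """A non-zero solution of rows * x = 0, assuming ncols > len(rows)."""
    rows = [r[:] for r in rows]
    pivots = {}  # pivot column -> row index
    r = 0
    for c in range(ncols):
        if r == len(rows):
            break
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = 1 / rows[r][c]
        rows[r] = [a * inv for a in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots[c] = r
        r += 1
    free = next(c for c in range(ncols) if c not in pivots)
    vec = [Fraction(0)] * ncols
    vec[free] = Fraction(1)
    for c, i in pivots.items():
        vec[c] = -rows[i][free]
    return vec

def vanishing_polynomial(points, D):
    """Coefficients and exponents of a non-zero polynomial of degree <= D
    that vanishes at every point; needs comb(D + d, d) > len(points)."""
    d = len(points[0])
    exps = monomials(d, D)
    if comb(D + d, d) <= len(points):
        raise ValueError("degree too small for the dimension count")
    rows = [[monomial_value(e, p) for e in exps] for p in points]
    return null_vector(rows, len(exps)), exps
```

For example, five points in the plane always lie on a conic, since $\binom{2+2}{2} = 6 > 5$; calling `vanishing_polynomial` with five planar points and `D = 2` returns the coefficients of such a conic.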
The cell decomposition can be viewed as a structural theorem for arbitrary large configurations of points in space, much as the Szemerédi regularity lemma [Sz1978] can be viewed as a structural theorem for arbitrary
large dense graphs. Indeed, just as many problems in the theory of large
dense graphs can be profitably attacked by first applying the regularity
lemma and then inspecting the outcome, it now seems that many problems
in combinatorial incidence geometry can be attacked by applying the cell
decomposition (or a similar such decomposition), with a parameter $D$ to be
optimised later, to a relevant set of points, and seeing how the cells interact with each other and with the other objects in the configuration (lines,
planes, circles, etc.). This strategy was spectacularly illustrated recently
with Guth and Katz's use [GuKa2010] of the cell decomposition to resolve
the Erdős distinct distance problem (up to logarithmic factors), as discussed
in [Ta2011d, 3.9].
² Actually, one does not need the full machinery of the results in the above cited papers (which control not just the number of components, but all the Betti numbers of the complement of
$\{Q = 0\}$) to get the bound on connected components; one can instead observe that every bounded
connected component has a critical point where $\nabla Q = 0$, and one can control the number of these
points by Bézout's theorem, after perturbing $Q$ slightly to enforce genericity, and then count the
unbounded components by an induction on dimension. See [SoTa2011, Appendix A].
In this section, I will record a simpler (but still illustrative) version of
this method (that I learned from Nets Katz), which provides yet another
proof of the Szemerédi-Trotter theorem in incidence geometry:
Theorem 3.1.3 (Szemerédi-Trotter theorem). Given a finite set of points
$P$ and a finite set of lines $L$ in $\mathbf{R}^2$, the set of incidences $I(P, L) := \{(p, \ell) \in
P \times L : p \in \ell\}$ has cardinality
$$|I(P, L)| \ll |P|^{2/3} |L|^{2/3} + |P| + |L|.$$
This theorem has many short existing proofs, including one via crossing
number inequalities (as discussed in [Ta2008, 1.10]) or via a slightly different type of cell decomposition (as discussed in [Ta2010b, 1.6]). The proof
given below is not that different, in particular, from the latter proof, but I
believe it still serves as a good introduction to the polynomial method in
combinatorial incidence geometry.
Let us begin with a trivial bound:
Lemma 3.1.4 (Trivial bound). For any finite set of points $P$ and finite set
of lines $L$, we have $|I(P, L)| \ll |P||L|^{1/2} + |L|$.
The slickest way to prove this lemma is by the Cauchy-Schwarz inequality. If we let $\mu(\ell)$ be the number of points in $P$ incident to a given line $\ell$, then
we have
$$\sum_{\ell \in L} \mu(\ell) = |I(P, L)|$$
and hence by Cauchy-Schwarz
$$\sum_{\ell \in L} \mu(\ell)^2 \geq |I(P, L)|^2 / |L|.$$
On the other hand, the left-hand side counts the number of triples $(p, p', \ell) \in
P \times P \times L$ with $p, p' \in \ell$. Since two distinct points $p, p'$ determine at most
one line, one thus sees that the left-hand side is at most $|P|^2 + |I(P, L)|$, and
the claim follows.
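The inequality $|I(P,L)|^2/|L| \leq |P|^2 + |I(P,L)|$ that drives this argument is exact (no implied constants), so it can be tested directly on small configurations. The sketch below does this for a grid of points and a family of lines $y = mx + b$; the encoding of a line as a pair `(m, b)` and the particular test configuration are assumptions of this example.

```python
def incidences(points, lines):
    """Number of pairs (p, l) with p on l, where l = (m, b) is y = m*x + b."""
    return sum(1 for (x, y) in points for (m, b) in lines if y == m * x + b)

def trivial_bound_inequality_holds(points, lines):
    """Check |I|^2 <= |L| * (|P|^2 + |I|) for distinct points and lines."""
    pts, lns = set(points), set(lines)
    num_incidences = incidences(pts, lns)
    return num_incidences ** 2 <= len(lns) * (len(pts) ** 2 + num_incidences)

# A 6 x 6 integer grid and all lines with slope in [-2, 2], intercept in [-3, 8].
grid = [(x, y) for x in range(6) for y in range(6)]
lines = [(m, b) for m in range(-2, 3) for b in range(-3, 9)]
```

Calling `trivial_bound_inequality_holds(grid, lines)` confirms the inequality for this configuration; splitting into the cases $|P|^2 \geq |I(P,L)|$ and $|P|^2 < |I(P,L)|$ then recovers the two terms of the lemma.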
Now we return to the Szemerédi-Trotter theorem, and apply the cell
decomposition with some parameter $D$. This gives a decomposition
$$\mathbf{R}^2 = \{Q = 0\} \cup C_1 \cup \dots \cup C_m$$
into a curve $\{Q = 0\}$ of degree $O(D)$, and at most $O(D^2)$ cells $C_1, \dots, C_m$,
each of which contains $O(|P|/D^2)$ points. We can then decompose
$$|I(P, L)| = |I(P \cap \{Q = 0\}, L)| + \sum_{i=1}^m |I(P \cap C_i, L)|.$$
$$\sum_{i=1}^m |L_i| \ll D|L|$$
Now we turn to the incidences coming from the curve $\{Q = 0\}$. Applying
Bézout's theorem again, we see that each line in $L$ either lies in $\{Q = 0\}$,
or meets $\{Q = 0\}$ in $O(D)$ points. The latter case contributes at most
$O(D|L|)$ incidences, so now we restrict attention to lines that are completely
contained in $\{Q = 0\}$. The points in the curve $\{Q = 0\}$ are of two types:
smooth points (for which there is a unique tangent line to the curve $\{Q = 0\}$)
and singular points (where $Q$ and $\nabla Q$ both vanish). A smooth point can be
incident to at most one line in $\{Q = 0\}$, and so this case contributes at most
$|P|$ incidences. So we may restrict attention to the singular points. But by
one last application of Bézout's theorem, each line in $L$ can intersect the
zero-dimensional set $\{Q = \nabla Q = 0\}$ in at most $O(D)$ points (note that each
partial derivative of $Q$ also has degree $O(D)$), giving another contribution
of $O(D|L|)$ incidences. Putting everything together, we obtain
$$|I(P, L)| \ll D^{-1/2} |P||L|^{1/2} + D|L| + |P|$$
for any $D \geq 1$. An optimisation in $D$ then gives the claim.
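The optimisation in $D$ can be spelled out as follows. This is a sketch: the split into the regimes $|L| \leq |P|^2$ and $|L| > |P|^2$ is one convenient way (assumed here, not stated in the text) to respect the constraint $D \geq 1$, and $D$ should be rounded to an integer, which only affects the implied constants.

```latex
\begin{align*}
&\text{Balancing the first two terms: }
  D^{-1/2}|P||L|^{1/2} = D|L| \iff D = |P|^{2/3}|L|^{-1/3}. \\
&\text{If } |L| \leq |P|^2 \text{, this choice has } D \geq 1 \text{ and gives} \\
&\qquad |I(P,L)| \ll |P|^{2/3}|L|^{2/3} + |P|. \\
&\text{If } |L| > |P|^2 \text{, Lemma 3.1.4 already gives} \\
&\qquad |I(P,L)| \ll |P||L|^{1/2} + |L| \ll |L|.
\end{align*}
```

In either regime the bound $|P|^{2/3}|L|^{2/3} + |P| + |L|$ of Theorem 3.1.3 follows.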
Remark 3.1.5. If one used the extreme case of the cell decomposition noted
in Remark 3.1.2, one only obtains the trivial bound
$$|I(P, L)| \ll |P|^{1/2} |L| + |P|.$$
On the other hand, this bound holds over arbitrary fields $k$ (not just over
$\mathbf{R}$), and can be sharp in such cases (consider for instance the case when $k$
is a finite field, $P$ consists of all the points in $k^2$, and $L$ consists of all the
lines in $k^2$).
for any compact set $A$. Our task is to establish that $c(A) \geq 0$ whenever
$t \leq \mu(A) \leq 1 - \mu(B) + t$.
We first verify the extreme cases. If $\mu(A) = t$, then $1_A * 1_B \leq t$, and so
$c(A) = 0$ in this case. At the other extreme, if $\mu(A) = 1 - \mu(B) + t$, then
from the inclusion-exclusion principle we see that $1_A * 1_B \geq t$, and so again
$c(A) = 0$ in this case (since $\int_G 1_A * 1_B = \mu(A)\mu(B) = t\mu(B)$).
To handle the intermediate regime when $\mu(A)$ lies between $t$ and $1 -
\mu(B) + t$, we rely on the submodularity inequality
(3.2)
c(gA) = c(A)
Chapter 4
Analysis
Many Banach spaces (and in particular, all Hilbert spaces) have the approximation property¹, which implies (by a result of Grothendieck [Gr1955]) that
all compact operators on that space are approximable. For instance, if X
is a Hilbert space, then any compact operator is approximable, because any
compact set can be approximated by a finite-dimensional subspace, and in
a Hilbert space, the orthogonal projection operator to a subspace is always
a contraction. In more general Banach spaces, finite-dimensional subspaces
are still complemented, but the operator norm of the projection can be large.
Indeed, there are examples of Banach spaces for which the approximation
property fails; the first such examples were discovered by Enflo [En1973],
and a subsequent paper by Alexander [Al1974] demonstrated the existence
of compact operators in certain Banach spaces that are not approximable.
We also give two more traditional proofs of the Fredholm alternative,
not requiring the operator to be approximable, which are based on the Riesz
lemma and a continuity argument respectively.
4.1.1. First proof (approximable case only). In the finite-dimensional
case, the Fredholm alternative is an immediate consequence of the rank-nullity theorem, and the finite rank case can be easily deduced from the
finite dimensional case by some routine algebraic manipulation. The main
difficulty in proving the alternative is to be able to take limits and deduce
the approximable case from the finite rank case. The key idea of the proof
is to use the approximable property to establish a lower bound on $T - \lambda I$
that is stable enough to allow one to take such limits.
Fix a non-zero $\lambda$. It is clear that $T$ cannot have both an eigenvalue and
bounded resolvent at $\lambda$, so now suppose that $T$ has no eigenvalue at $\lambda$, thus
$T - \lambda$ is injective. We claim that this implies a lower bound:
Lemma 4.1.2 (Lower bound). Let $\lambda \in \mathbf{C}$ be non-zero, and suppose that
$T : X \to X$ is a compact operator that has no eigenvalue at $\lambda$. Then there
exists $c > 0$ such that $\|(T - \lambda)x\| \geq c\|x\|$ for all $x \in X$.
Proof. By homogeneity, it suffices to establish the claim for unit vectors $x$.
Suppose this is not the case; then we can find a sequence of unit vectors $x_n$
such that $(T - \lambda)x_n$ converges strongly to zero. Since $\lambda x_n$ has norm bounded
away from zero (here we use the non-zero nature of $\lambda$), we conclude in
particular that $y_n := T x_n$ has norm bounded away from zero for sufficiently
large $n$. By compactness of $T$, we may (after passing to a subsequence)
assume that the $y_n$ converge strongly to a limit $y$, which is thus also non-zero.
¹ The approximation property has many formulations; one of them is that the identity operator is the limit of a sequence of finite rank operators in the strong operator topology.
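Lemma 4.1.4 (Riesz lemma). If $Y$ is a proper closed subspace of a Banach space $X$, and $\varepsilon > 0$, then there exists a unit vector $x$ whose distance
$\mathrm{dist}(x, Y)$ to $Y$ is at least $1 - \varepsilon$.
Proof. By the Hahn-Banach theorem, one can find a non-trivial bounded linear functional $\phi : X \to \mathbf{C}$ on $X$ which vanishes on $Y$. By definition of the operator
norm $\|\phi\|_{op}$ of $\phi$, one can find a unit vector $x$ such that $|\phi(x)| \geq (1 - \varepsilon)\|\phi\|_{op}$.
Since $|\phi(x)| = |\phi(x - y)| \leq \|\phi\|_{op} \|x - y\|$ for every $y \in Y$, the claim follows.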
The strategy here is not to use finite rank approximations (as they are
no longer available), but instead to try to contradict the compactness of T
by exhibiting a bounded set whose image under T is not totally bounded.
Let $T : X \to X$ be a compact operator on a Banach space, and let $\lambda$
be a non-zero complex number such that $T$ has no eigenvalue at $\lambda$. As in
the first proof, we have the lower bound from Lemma 4.1.2, and we know
that $\mathrm{Ran}(T - \lambda)$ is a closed subspace of $X$; in particular, the map $T - \lambda$ is
a Banach space isomorphism from $X$ to $\mathrm{Ran}(T - \lambda)$. Our objective is again
to show that $\mathrm{Ran}(T - \lambda)$ is all of $X$.
Suppose for contradiction that $\mathrm{Ran}(T - \lambda)$ is a proper closed subspace of
$X$. Applying the Banach space isomorphism $T - \lambda$ repeatedly, we conclude
that for every natural number $m$, the space $V_{m+1} := \mathrm{Ran}((T - \lambda)^{m+1})$ is a
proper closed subspace of $V_m := \mathrm{Ran}((T - \lambda)^m)$. From the Riesz lemma, we
may thus find unit vectors $x_m$ in $V_m$ for $m = 0, 1, 2, \dots$ whose distance to
$V_{m+1}$ is at least 1/2 (say).
Now suppose that $n > m \geq 0$. By construction, $x_n$, $(T - \lambda)x_n$, $(T - \lambda)x_m$
all lie in $V_{m+1}$, and thus $T x_n - T x_m \in -\lambda x_m + V_{m+1}$. Since $x_m$ lies at a
distance at least 1/2 from $V_{m+1}$, we conclude the separation property
$$\|T x_n - T x_m\| \geq \frac{|\lambda|}{2}.$$
But this implies that the sequence $\{T x_n : n \in \mathbf{N}\}$ is not totally bounded,
contradicting the compactness of $T$.
4.1.3. Third proof. Now we give another textbook proof of the Fredholm
alternative, based on Fredholm index theory. The basic idea is to observe
that the Fredholm alternative is easy when $\lambda$ is large enough (and specifically, when $|\lambda| > \|T\|_{op}$), as one can then invert $T - \lambda$ using Neumann series.
One can then attempt to continuously perturb $\lambda$ from large values to small
values, using stability results (such as Lemma 4.1.2) to ensure that invertibility does not suddenly get destroyed during this process. Unfortunately,
there is an obstruction to this strategy, which is that during the perturbation process, $\lambda$ may pass through an eigenvalue of $T$. To get around this, we
will need to abandon the hypothesis that $T$ has no eigenvalue at $\lambda$, and work
To finish the proof, it suffices by the discrete nature of the index function
(which takes values in the integers) to establish continuity of the index:
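Lemma 4.1.7 (Continuity of index). Let $T : X \to X$ be a compact operator
on a Banach space. Then the function $\lambda \mapsto \mathrm{ind}(T - \lambda)$ is continuous from
$\mathbf{C} \backslash \{0\}$ to $\mathbf{Z}$.
Proof. Let $\lambda$ be non-zero. Our task is to show that
$$\mathrm{ind}(T - \lambda') = \mathrm{ind}(T - \lambda)$$
for all $\lambda'$ sufficiently close to $\lambda$.
In the model case when $T - \lambda$ is invertible (and thus has index zero),
the claim is easy, because $(T - \lambda')(T - \lambda)^{-1} = 1 + (\lambda - \lambda')(T - \lambda)^{-1}$ can
be inverted by Neumann series for $\lambda'$ close enough to $\lambda$, giving rise to the
invertibility of $T - \lambda'$.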
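The Neumann series inversion used here (and in the regime $|\lambda| > \|T\|_{op}$ mentioned at the start of this proof) can be illustrated numerically in finite dimensions, where $(T - \lambda)^{-1} = -\lambda^{-1} \sum_{k \geq 0} (T/\lambda)^k$. The sketch below uses a small matrix chosen for illustration; the function names and the truncation at 200 terms are assumptions of the example.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def neumann_inverse(T, lam, terms=200):
    """Approximate (T - lam*I)^{-1} by the truncated Neumann series
    -(1/lam) * sum_{k=0}^{terms} (T/lam)^k, valid when ||T|| < |lam|."""
    n = len(T)
    scaled = [[T[i][j] / lam for j in range(n)] for i in range(n)]
    power = identity(n)   # current power (T/lam)^k
    total = identity(n)   # running partial sum
    for _ in range(terms):
        power = mat_mul(power, scaled)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return [[-total[i][j] / lam for j in range(n)] for i in range(n)]

# Example: a 2 x 2 matrix of small norm and lam = 2.
T = [[0.5, 0.2], [0.1, 0.3]]
lam = 2.0
approx_inverse = neumann_inverse(T, lam)
shifted = [[T[i][j] - (lam if i == j else 0.0) for j in range(2)]
           for i in range(2)]
product = mat_mul(shifted, approx_inverse)
```

Here `product` agrees with the identity matrix up to rounding error, reflecting the geometric convergence of the series when $\|T/\lambda\| < 1$.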
Now we handle the general case. As every finite dimensional space is
complemented, we can split $X = \mathrm{Ker}(T - \lambda) + V$ for some closed subspace
$V$ of $X$, and similarly split $X = \mathrm{Ran}(T - \lambda) + W$ for some finite-dimensional
subspace $W$ of $X$ with dimension $\mathrm{codim}\, \mathrm{Ran}(T - \lambda)$.
From the lower bound we see that $T - \lambda$ is a Banach space isomorphism
from $V$ to $\mathrm{Ran}(T - \lambda)$. For $\lambda'$ close to $\lambda$, we thus see that $(T - \lambda')(V)$ is close
to $\mathrm{Ran}(T - \lambda)$, in the sense that one can map the latter space to the former
by a small perturbation of the identity (in the operator norm). Since $W$
complements $\mathrm{Ran}(T - \lambda)$, it also complements $(T - \lambda')(V)$ for $\lambda'$ sufficiently
close to $\lambda$. (To see this, observe that the composition of the obvious maps
$$X \to W \times \mathrm{Ran}(T - \lambda) \to W \times V \to W \times (T - \lambda')(V) \to X$$
is a small perturbation of the identity map and is thus invertible for $\lambda'$ close
to $\lambda$.)
Let $\pi : X \to W$ be the projection onto $W$ with kernel $(T - \lambda')(V)$.
Then $\pi(T - \lambda')$ maps the finite-dimensional space $\mathrm{Ker}(T - \lambda)$ to the finite-dimensional space $W$. By the rank-nullity theorem, this map has index equal
to $\dim \mathrm{Ker}(T - \lambda) - \dim(W) = \mathrm{ind}(T - \lambda)$. Gluing this with the Banach
space isomorphism $T - \lambda' : V \to (T - \lambda')(V)$, we see that $T - \lambda'$ also has
index $\mathrm{ind}(T - \lambda)$, as desired.
Remark 4.1.8. Again, this result extends to more general Fredholm operators, with the result being that the index of a Fredholm operator is stable
with respect to continuous deformations in the operator norm topology.
$$Df^{-1}(f(x_0)) = Df(x_0)^{-1}. \tag{4.1}$$
The textbook proof of the inverse function theorem proceeds by an application of the contraction mapping theorem. Indeed, one may normalise
$x_0 = f(x_0) = 0$ and $Df(0)$ to be the identity map; continuity of $Df$
then shows that $Df(x)$ is close to the identity for small $x$, which may be
used (in conjunction with the fundamental theorem of calculus) to make
$x \mapsto x - f(x) + y$ a contraction on a small ball around the origin for small
$y$, at which point the contraction mapping theorem readily finishes off the
problem.
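To make the contraction mapping step concrete, here is a minimal sketch that solves $f(x) = y$ by iterating $x \mapsto x - f(x) + y$. The map `f` below is a hypothetical example chosen only so that $f(0) = 0$ and $Df(0)$ is the identity; on the ball of radius $1/4$ the iteration map has Lipschitz constant at most $1/2$.

```python
# Sketch: solve f(x) = y near the origin by iterating x -> x - f(x) + y.
# The map f is an assumed example with f(0) = 0 and Df(0) = identity.

def f(x):
    x1, x2 = x
    return (x1 + x2 ** 2, x2 + x1 ** 2)

def solve_by_contraction(y, iterations=50):
    """Iterate the map x -> x - f(x) + y starting from the origin."""
    x = (0.0, 0.0)
    for _ in range(iterations):
        fx = f(x)
        x = (x[0] - fx[0] + y[0], x[1] - fx[1] + y[1])
    return x

if __name__ == "__main__":
    y = (0.05, -0.03)
    x = solve_by_contraction(y)
    print("x =", x, "f(x) =", f(x))
```

The fixed point of the iteration is precisely a solution of $f(x) = y$, and the contraction property gives both existence and uniqueness in the small ball.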
Less well known is the fact that the hypothesis of continuous differentiability may be relaxed to just everywhere differentiability:
Theorem 4.2.2 (Everywhere differentiable inverse function theorem). Let
$\Omega \subset \mathbf{R}^n$ be an open set, and let $f : \Omega \to \mathbf{R}^n$ be an everywhere differentiable
function, such that for every $x_0 \in \Omega$, the derivative map $Df(x_0) : \mathbf{R}^n \to \mathbf{R}^n$
is invertible. Then $f$ is a local homeomorphism; thus, for every $x_0 \in \Omega$,
there exists an open neighbourhood $U$ of $x_0$ and an open neighbourhood $V$ of
$f(x_0)$ such that $f$ is a homeomorphism from $U$ to $V$.
As before, one can recover the differentiability of the local inverses, with
the derivative of the inverse given by the usual formula (4.1).
This result implicitly follows from the more general results of Černavskii
[Ce1964] about the structure of finite-to-one open and closed maps; however, the arguments there are somewhat complicated (and subsequent proofs
of those results, such as the one in [Va1966], use some powerful tools from
algebraic topology, such as dimension theory). There is however a more
elementary proof of Saint Raymond [Ra2002] that was pointed out to me
by Julien Melleray. It only uses basic point-set topology (for instance, the
concept of a connected component) and the basic topological and geometric
structure of Euclidean space (in particular relying primarily on local compactness, local connectedness, and local convexity). I decided to present (an
arrangement of) Saint Raymond's proof here.
To obtain a local homeomorphism near $x_0$, there are basically two things
to show: local surjectivity near $x_0$ (thus, for $y$ near $f(x_0)$, one can solve
$f(x) = y$ for some $x$ near $x_0$) and local injectivity near $x_0$ (thus, for distinct
$x_1, x_2$ near $x_0$, $f(x_1)$ is not equal to $f(x_2)$). Local surjectivity is relatively
easy; basically, the standard proof of the inverse function theorem works
here, after replacing the contraction mapping theorem (which is no longer
available due to the possibly discontinuous nature of $Df$) with the Brouwer
fixed point theorem instead (or one could also use degree theory, which is more
or less an equivalent approach). The difficulty is local injectivity: one needs
to preclude the existence of nearby points $x_1, x_2$ with $f(x_1) = f(x_2) = y$;
note that in contrast to the contraction mapping theorem that provides both
existence and uniqueness of fixed points, the Brouwer fixed point theorem
only gives existence and not uniqueness.
In one dimension $n = 1$ one can proceed by using Rolle's theorem. Indeed, as one traverses the interval from $x_1$ to $x_2$, one must encounter some
intermediate point $x_*$ which maximises the quantity $|f(x_*) - y|$, and which
is thus instantaneously non-increasing both to the left and to the right of
$x_*$. But, by hypothesis, $f'(x_*)$ is non-zero, and this easily leads to a contradiction.
Saint Raymond's argument for the higher dimensional case proceeds in
a broadly similar way. Starting with two nearby points $x_1, x_2$ with $f(x_1) =
f(x_2) = y$, one finds a point $x_*$ which locally extremises $\|f(x_*) - y\|$
in the following sense: $\|f(x_*) - y\|$ is equal to some $r_* > 0$, but $x_*$ is
adherent to at least two distinct connected components $U_1, U_2$ of the set
$U = \{x : \|f(x) - y\| < r_*\}$. (This is an oversimplification, as one has to
restrict the available points $x$ in $U$ to a suitably small compact set, but let
us ignore this technicality for now.) Note from the non-degenerate nature of
$Df(x_*)$ that $x_*$ was already adherent to $U$; the point is that $x_*$ disconnects
$U$ in some sense. Very roughly speaking, the way such a critical point $x_*$ is
found is to look at the sets $\{x : \|f(x) - y\| \le r\}$ as $r$ shrinks from a large
initial value down to zero, and one finds the first value of $r_*$ below which
this set disconnects $x_1$ from $x_2$. (Morally, one is performing some sort of
Morse theory here on the function $x \mapsto \|f(x) - y\|$, though this function
does not have anywhere near enough regularity for classical Morse theory
to apply.)
The point $x_*$ is mapped to a point $f(x_*)$ on the boundary $\partial B(y, r_*)$ of
the ball $B(y, r_*)$, while the components $U_1, U_2$ are mapped to the interior of
this ball. By using a continuity argument, one can show (again very roughly
4
10 .
The two
Next, if $x \in \partial B(0, 1)$, then by (4.3) we have $f(x) \notin B(0, \frac{1}{2})$, and hence
$f(x) \notin B(y, r)$. Thus $K_r$ is disjoint from the sphere $\partial B(0, 1)$. Since $x_1$ lies
in the interior of this sphere we thus have $K_r^1 \subset B(0, 1)$ as required.
Next, we show that the $K_r^1$ increase continuously in $r$:
Lemma 4.2.7. If $0 \le r < \frac{1}{20}$ and $\varepsilon > 0$, then for $r < r' < \frac{1}{20}$ sufficiently
close to $r$, $K_{r'}^1$ is contained in an $\varepsilon$-neighbourhood of $K_r^1$.
Proof. By the finite intersection property, it suffices to show that $\bigcap_{r' > r} K_{r'}^1 =
K_r^1$. Suppose for contradiction that there is a point $x$ outside of $K_r^1$ that
lies in $K_{r'}^1$ for all $r' > r$. Then $x$ lies in $K_{r'}$ for all $r' > r$, and hence lies
in $K_r \cap \overline{B(0, 1)}$. As $x$ and $x_1$ lie in different connected components of the
compact set $K_r \cap \overline{B(0, 1)}$ (recall that $K_r$ is disjoint from $\partial B(0, 1)$), there
must be a partition of $K_r \cap \overline{B(0, 1)}$ into two disjoint closed sets $F, G$ that
separate $x$ from $x_1$ (for otherwise every clopen set in $K_r \cap \overline{B(0, 1)}$ that
contains $x_1$ would also contain $x$, and the intersection of these clopen sets would then be a connected subset of $K_r \cap \overline{B(0, 1)}$ that contains both $x_1$ and $x$, contradicting the
fact that $x$ lies outside $K_r^1$). By normality, we may find open neighbourhoods $U, V$ of $F, G$ that are disjoint. One has
$\|f(z) - y\| > r$ for all $z \in \partial U$. As $\partial U$ is compact and $f$ is continuous, we
thus have $\|f(z) - y\| > r'$ for all $z \in \partial U$ if $r'$ is sufficiently close to $r$. This
makes $U \cap K_{r'}$ clopen in $K_{r'}$, and so $x$ cannot lie in $K_{r'}^1$, giving the desired
contradiction.
Observe that $K_r^1$ contains $x_2$ for $r \ge \frac{2}{10}$, but does not contain $x_2$ for
$r = 0$. By the monotonicity of the $K_r^1$ and least upper bound principle,
there must therefore exist a critical $0 \le r_* \le \frac{2}{10}$ such that $K_r^1$ contains $x_2$
for all $r > r_*$, but does not contain $x_2$ for $r < r_*$. From Lemma 4.2.7 we see
that $K_{r_*}^1$ must also contain $x_2$. In particular, by Lemma 4.2.5, $r_* > 0$.
We now analyse the critical set $K_{r_*}^1$. By construction, this set is connected, compact, contains both $x_1$ and $x_2$, is contained in $B(0, 1)$, and one has
$\|f(x) - y\| \le r_*$ for all $x \in K_{r_*}^1$.
Lemma 4.2.8. The set $U := \{x \in K_{r_*}^1 : \|f(x) - y\| < r_*\}$ is open and
disconnected.
Proof. The openness is clear from the continuity of $f$ (and the local connectedness of $\mathbf{R}^n$). Now we show disconnectedness. As $U$ is an open subset of
$\mathbf{R}^n$, connectedness of $U$ is equivalent to path connectedness, and $x_1$ and $x_2$ both
lie in $U$, so it suffices to show that $x_1$ and $x_2$ cannot be joined by a path $\gamma$ in
$U$. But if such a path $\gamma$ existed, then by compactness of $\gamma$ and continuity of
$f$, one would have $\gamma \subset K_r$ for some $r < r_*$. This would imply that $x_2 \in K_r^1$,
contradicting the minimal nature of $r_*$, and the claim follows.
Lemma 4.2.9. $U$ has at most finitely many connected components.
Proof. Let $U_1$ be a connected component of $U$; then $f(U_1)$ is non-empty
and contained in $B(y, r_*)$. As $U$ is open, $U_1$ is also open, and thus by
Corollary 4.2.4, $f(U_1)$ is open also.
We claim that $f(U_1)$ is in fact all of $B(y, r_*)$. Suppose this were not the
case. As $B(y, r_*)$ is connected, this would imply that $f(U_1)$ is not closed in
$B(y, r_*)$; thus there is an element $z$ of $B(y, r_*)$ which is adherent to $f(U_1)$,
but does not lie in $f(U_1)$. Thus one may find a sequence $x_n$ in $U_1$ with
$f(x_n)$ converging to $z$. By compactness of $K_{r_*}^1$ (which contains $U_1$), we may
pass to a subsequence and assume that $x_n$ converges to a limit $x$ in $K_{r_*}^1$;
then $f(x) = z$. By continuity, there is thus a ball $B$ centred at $x$ that is
mapped to $B(y, r)$ for some $r < r_*$; this implies that $B$ lies in $K_r$ and hence
in $K_{r_*}^1$ (since $x \in K_{r_*}^1$) and thence in $U$ (since $r$ is strictly less than $r_*$). As
$x$ is adherent to $U_1$ and $B$ is connected, we conclude that $B$ lies in $U_1$. In
particular $x$ lies in $U_1$ and so $z = f(x)$ lies in $f(U_1)$, a contradiction.
As $f(U_1)$ is equal to $B(y, r_*)$, we thus see that $U_1$ contains an element
of $f^{-1}(\{y\})$. However, each element $x$ of $f^{-1}(\{y\})$ must be isolated since
$Df(x)$ is non-singular. By compactness of $K_{r_*}^1$, the set $K_{r_*}^1$ (and hence $U$)
thus contains at most finitely many elements of $f^{-1}(\{y\})$, and so there are
finitely many components as claimed.
Lemma 4.2.10. Every point in $K_{r_*}^1$ is adherent to $U$ (i.e. $\overline{U} = K_{r_*}^1$).
Proof. If $x \in K_{r_*}^1$, then $\|f(x) - y\| \le r_*$. If $\|f(x) - y\| < r_*$ then $x \in U$
and we are done, so we may assume $\|f(x) - y\| = r_*$. By differentiability,
one has
$$f(x') = f(x) + Df(x)(x' - x) + o(\|x' - x\|)$$
for all $x'$ sufficiently close to $x$. If we choose $x'$ to lie on a ray emanating from
$x$ such that $Df(x)(x' - x)$ lies on a ray pointing towards $y$ from $f(x)$ (this
is possible as $Df(x)$ is non-singular), we conclude that for all $x'$ sufficiently
close to $x$ on this ray, $\|f(x') - y\| < r_*$. Thus all such points $x'$ lie in $K_{r_*}$;
since $x$ lies in $K_{r_*}^1$ and the ray is locally connected, we see that all such
points $x'$ in fact lie in $K_{r_*}^1$ and thence in $U$. The claim follows.
Corollary 4.2.11. There exists a point $x_* \in K_{r_*}^1$ with $\|f(x_*) - y\| = r_*$ (i.e.
$x_*$ lies outside $U$) which is adherent to at least two connected components of
$U$.
Proof. Suppose this were not the case; then the closures of all the connected
components of $U$ would be disjoint. (Note that an element of one connected
component of $U$ cannot lie in the closure of another component.) By Lemma
4.2.10, these closures would form a partition of $K_{r_*}^1$ by closed sets. By
Lemma 4.2.8, there are at least two such closed sets, each of which is non-empty; by Lemma 4.2.9, the number of such closed sets is finite. But this
contradicts the connectedness of $K_{r_*}^1$.
Next, we prove
(4.5)
$$\{x_* + t\omega' : 0 < t < \varepsilon;\ \|\omega' - \omega\| \le \varepsilon\}$$
does not lie in $f(U')$. Thus one may find a sequence $x_n$ in $U'$ with $f(x_n)$
converging to $z$. By compactness of $K_{r_*}^1$ (which contains $U'$), we may pass
to a subsequence and assume that $x_n$ converges to a limit $x$ in $K_{r_*}^1$; then
$f(x) = z$. By continuity, there is thus a ball $B$ centred at $x$ contained in
$B(x_*, \varepsilon)$ that is mapped to $B(y, r) \cap D$ for some $r < r_*$; this implies that
$B$ lies in $K_r$ and hence in $K_{r_*}^1$ (since $x \in K_{r_*}^1$) and thence in $U$ (since $r$
is strictly less than $r_*$). As $x$ is adherent to $U_1$ and $B$ is connected, we
conclude that $B$ lies in $U_1$ and thence in $U'$. In particular $x$ lies in $U'$ and
so $z = f(x)$ lies in $f(U')$, a contradiction.
As $f(U') = D$, we may thus find a sequence $t_n > 0$ converging to zero,
and a sequence $x_n \in U'$, such that
$$f(x_n) = f(x_*) + t_n (y - f(x_*)).$$
However, if $\varepsilon$ is small enough, we have $\|f(x_n) - f(x_*)\|$ comparable to $\|x_n -
x_*\|$ (cf. (4.2)), and so $x_n$ converges to $x_*$. By Taylor expansion, we then
have
$$f(x_n) = f(x_*) + Df(x_*)(x_n - x_*) + o(\|x_n - x_*\|)$$
and thus
$$(Df(x_*) + o(1))(x_n - x_*) = t_n Df(x_*)\omega$$
for some matrix-valued error $o(1)$. Since $Df(x_*)$ is invertible, this implies
that
$$x_n - x_* = t_n (1 + o(1))\omega = t_n \omega + o(t_n).$$
In particular, $x_n$ lies in the cone (4.6) for $n$ large enough, and the claim
follows.
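$$\frac{1}{p_\theta} = \frac{1-\theta}{p_0} + \frac{\theta}{p_1}$$
$$\frac{1}{q_\theta} = \frac{1-\theta}{q_0} + \frac{\theta}{q_1}$$
$$B_\theta = B_0^{1-\theta} B_1^{\theta}.$$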
The Riesz-Thorin theorem is already quite useful (it gives, for instance,
by far the quickest proof of the Hausdorff-Young inequality for the Fourier
transform, to name just one application; see e.g. [Ta2010, (1.103)]), but it
requires the same linear operator $T$ to appear in (4.7), (4.8), and (4.9). Stein
realised, though, that due to the complex-analytic nature of the proof of the
Riesz-Thorin theorem, it was possible to allow different linear operators to
appear in (4.7), (4.8), (4.9), so long as the dependence was analytic. A bit
more precisely: if one had a family $T_z$ of operators which depended in an
analytic manner on a complex variable $z$ in the strip $\{z \in \mathbf{C} : 0 \le \mathrm{Re}(z) \le 1\}$
(thus, for any test functions $f, g$, the inner product $\langle T_z f, g \rangle$ would be analytic
in $z$) which obeyed some mild regularity assumptions (which are slightly
technical and are omitted here), and one had the estimates
$$\|T_{0+it} f\|_{L^{q_0}(Y)} \le C_t \|f\|_{L^{p_0}(X)}$$
and
$$\|T_{1+it} f\|_{L^{q_1}(Y)} \le C_t \|f\|_{L^{p_1}(X)}$$
for all $t \in \mathbf{R}$ and some quantities $C_t$ that grew at most exponentially in $t$
(actually, any growth rate significantly slower than the double-exponential
$e^{\exp(\pi |t|)}$ would suffice here), then one also has the interpolated estimates
$$\|T_\theta f\|_{L^{q_\theta}(Y)} \le C' \|f\|_{L^{p_\theta}(X)}$$
for all $0 \le \theta \le 1$ and a constant $C'$ depending only on $C, p_0, p_1, q_0, q_1$.
In [Fe1995], Fefferman notes that the proof of the Stein interpolation
theorem can be obtained from that of the Riesz-Thorin theorem simply by
adding a single letter of the alphabet. Indeed, the way the Riesz-Thorin
theorem is proven is to study an expression of the form
$$F(z) := \int_Y T f_z(y)\, g_z(y)\, dy,$$
$$\frac{1-z}{p_0} + \frac{z}{p_1}$$
One can then repeat the proof of the Riesz-Thorin theorem more or less
verbatim to obtain the Stein interpolation theorem.
The ability to vary the operator T makes the Stein interpolation theorem
significantly more flexible than the Riesz-Thorin theorem. We illustrate this
with the following sample result:
Proposition 4.3.1. For any (test) function $f : \mathbf{R}^2 \to \mathbf{R}$, let $Tf : \mathbf{R}^2 \to \mathbf{R}$
be the average of $f$ along an arc of a parabola:
$$Tf(x_1, x_2) := \int_{\mathbf{R}} f(x_1 - t, x_2 - t^2)\, \eta(t)\, dt$$
There is nothing too special here about the parabola; the same result
in fact holds for convolution operators on any arc of a smooth curve with
nonzero curvature (and there are many extensions to higher dimensions,
to variable-coefficient operators, etc.). We will however restrict attention
to the parabola for sake of exposition. One can view $Tf$ as a convolution
$Tf = f * \sigma$, where $\sigma$ is a measure on the parabola arc $\{(t, t^2) : |t| \le 1\}$.
We will also be somewhat vague about what "test function" means in this
exposition in order to gloss over some minor technical details.
By testing $T$ (and its adjoint) on the indicator function of a small ball
of some radius $\delta > 0$ (or of small rectangles such as $[-\delta, \delta] \times [0, \delta^2]$) one sees
that the exponents $L^{3/2}$, $L^3$ here are best possible.
This proposition was first proven in [Li1973] using the Stein interpolation theorem. To illustrate the power of this theorem, it should be noted
that for almost two decades this was the only known proof of this result;
a proof based on multilinear interpolation (exploiting the fact that the exponent 3 in (4.10) is an integer) was obtained in [Ob1992], and a fully
(4.11)
and
(4.12)
for all (test) functions f
The bound (4.11) is an easy consequence of Minkowski's integral inequality (or Young's inequality, noting that $\sigma$ is a finite measure). On the other
hand, because the measure $\sigma$ is not absolutely continuous, let alone arising
from an $L^\infty(\mathbf{R}^2)$ function, the estimate (4.12) is very false. For instance, if
one applies $T$ to the indicator function $f = 1_{[-\delta,\delta] \times [-\delta,\delta]}$ for some small $\delta > 0$,
then the $L^1$ norm of $f$ is comparable to $\delta^2$, but the $L^\infty$ norm of $Tf$ is comparable to $\delta$,
contradicting (4.12) as one sends $\delta$ to zero.
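To get around this, one first notes that there is a lot of room in (4.11)
due to the smoothing properties of the measure $\sigma$. Indeed, from Plancherel's
theorem one has
$$\|f\|_{L^2(\mathbf{R}^2)} = \|\hat{f}\|_{L^2(\mathbf{R}^2)}$$
and
$$\|Tf\|_{L^2(\mathbf{R}^2)} = \|\hat{f} \hat{\sigma}\|_{L^2(\mathbf{R}^2)}$$
for all test functions $f$, where
$$\hat{f}(\xi) := \int_{\mathbf{R}^2} e^{-2\pi i x \cdot \xi} f(x)\, dx$$
and
$$\hat{\sigma}(\xi_1, \xi_2) := \int_{\mathbf{R}} e^{-2\pi i (t \xi_1 + t^2 \xi_2)}\, \eta(t)\, dt.$$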
It is clear that $\hat{\sigma}(\xi)$ is uniformly bounded in $\xi$, which already gives (4.11).
But a standard application of the method of stationary phase reveals that
one in fact has a decay estimate
(4.13) $$|\hat{\sigma}(\xi)| \le \frac{C}{|\xi|^{1/2}}$$
for some $C > 0$. This shows that $Tf$ is not just in $L^2$, but is somewhat
smoother as well; in particular, one has
$$\|D^{1/2} Tf\|_{L^2(\mathbf{R}^2)} \le C \|f\|_{L^2(\mathbf{R}^2)}$$
for any (fractional) differential operator $D^{1/2}$ of order $1/2$. (Here we adopt
the usual convention that the constant $C$ is allowed to vary from line to
line.)
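The decay rate in (4.13) can be probed numerically along the direction $\xi = (0, \lambda)$, where the phase $t^2 \lambda$ is stationary at $t = 0$. The sketch below is only a sanity check under assumptions: the cutoff `eta` is taken to be the standard bump $\exp(-1/(1-t^2))$ on $(-1,1)$ (the text does not fix a particular cutoff), and the integral is approximated by the trapezoid rule. For this choice, the leading stationary phase term predicts $|\hat\sigma(0,\lambda)| \approx \eta(0)/\sqrt{2\lambda}$.

```python
import cmath
import math

def eta(t):
    """A smooth bump supported in (-1, 1); this particular cutoff is an assumption."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))

def sigma_hat(xi1, xi2, samples=40000):
    """Trapezoid-rule approximation of the integral over [-1, 1] of
    exp(-2 pi i (t xi1 + t^2 xi2)) eta(t) dt."""
    h = 2.0 / samples
    total = 0.0 + 0.0j
    for k in range(1, samples):  # endpoint terms vanish because eta(-1) = eta(1) = 0
        t = -1.0 + k * h
        phase = -2.0 * math.pi * (t * xi1 + t * t * xi2)
        total += cmath.exp(1j * phase) * eta(t)
    return total * h

if __name__ == "__main__":
    for lam in (25.0, 100.0, 400.0):
        value = abs(sigma_hat(0.0, lam))
        # value * sqrt(lam) should stay roughly constant if the decay is lam^(-1/2)
        print(lam, value, value * math.sqrt(lam))
```

Quadrupling $\lambda$ should roughly halve $|\hat\sigma(0,\lambda)|$, which is the $|\xi|^{-1/2}$ gain that is then converted into half a derivative of smoothing.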
Using the numerology of the Stein interpolation theorem, this suggests
that if we can somehow obtain the counterbalancing estimate
$$\|D^{-1} Tf\|_{L^\infty(\mathbf{R}^2)} \le C \|f\|_{L^1(\mathbf{R}^2)}$$
for some differential operator $D^{-1}$ of order $-1$, then we should be able to
interpolate and obtain the desired estimate (4.10). And indeed, we can take
an antiderivative in the $x_2$ direction, giving the operator
$$\partial_{x_2}^{-1} Tf(x_1, x_2) := \int_{\mathbf{R}} \int_{-\infty}^0 f(x_1 - t, x_2 - t^2 - s)\, \eta(t)\, dt\, ds;$$
and a simple change of variables does indeed verify that this operator is
bounded from $L^1(\mathbf{R}^2)$ to $L^\infty(\mathbf{R}^2)$.
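The numerology here can be checked by direct arithmetic. The following sketch interpolates the reciprocal exponents and the derivative order linearly between the two endpoints: an $L^2 \to L^2$ bound with a gain of $1/2$ derivative, and an $L^1 \to L^\infty$ bound with a loss of one derivative. The interpolation parameter $\theta = 1/3$ is the value at which the derivative order cancels; the variable names are illustrative only.

```python
from fractions import Fraction

def interpolate(theta, a0, a1):
    """Linear interpolation (1 - theta) * a0 + theta * a1."""
    return (1 - theta) * a0 + theta * a1

theta = Fraction(1, 3)

# Endpoint 0: L^2 -> L^2, derivative order +1/2.
# Endpoint 1: L^1 -> L^infinity, derivative order -1 (with 1/infinity = 0).
inv_p = interpolate(theta, Fraction(1, 2), Fraction(1))    # 1/p_theta
inv_q = interpolate(theta, Fraction(1, 2), Fraction(0))    # 1/q_theta
order = interpolate(theta, Fraction(1, 2), Fraction(-1))   # net derivative order

p_theta = 1 / inv_p
q_theta = 1 / inv_q

if __name__ == "__main__":
    print("p =", p_theta, "q =", q_theta, "order =", order)
```

Exact rational arithmetic is used so that the cancellation of the derivative order at $\theta = 1/3$, and the resulting exponents $3/2$ and $3$, are not obscured by rounding.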
Unfortunately, the above argument is not rigorous, because we need
an analytic family of operators $T_z$ in order to invoke the Stein interpolation
theorem, rather than just two operators $T_0$ and $T_1$. This turns out to require
some slightly tricky complex analysis: after some trial and error, one finds
that one can use the family $T_z$ defined for $\mathrm{Re}(z) > 1/3$ by the formula
$$T_z f(x_1, x_2) = \frac{1}{\Gamma((3z-1)/2)} \int_{\mathbf{R}} \int_{-\infty}^0 \frac{1}{s^{(3-3z)/2}} f(x_1 - t, x_2 - t^2 - s)\, \eta(t)\, dt\, ds$$
where $\Gamma$ is the Gamma function, and extended to the rest of the complex
plane by analytic continuation. The Gamma factor is a technical one, needed
to compensate for the divergence of the weight $\frac{1}{s^{(3-3z)/2}}$ as $z$ approaches $1/3$;
it also makes the Fourier representation of $T_z$ cleaner (indeed, $T_z f$ is morally
$\partial_{x_2}^{(1-3z)/2} f * \sigma$). It is then easy to verify the estimates
(4.14)
for all $t \in \mathbf{R}$. Finally, one can verify that $T_{1/3} = T$, and (4.10) then follows
from the Stein interpolation theorem.
It is instructive to compare this result with what can be obtained by
real-variable methods. One can perform a smooth dyadic partition of unity
$$\delta(s) = \phi(s) + \sum_{j=1}^\infty 2^j \psi(2^j s)$$
for some bump function $\phi$ (of total mass 1) and bump function $\psi$ (of total
mass zero), which (formally, at least) leads to the decomposition
$$Tf = T_0 f + \sum_{j=1}^\infty T_j f$$
(4.16)
(4.17)
$$\Big\| \sum_{j=1}^\infty e^{2\pi i j t} 2^{-j} T_j f \Big\|_{L^\infty(\mathbf{R}^2)} \le C_t \|f\|_{L^1(\mathbf{R}^2)}$$
A Fourier averaging argument shows that these estimates imply (4.16) and
(4.17), but not conversely. If one unpacks the proof of Lindelöf's theorem
(which is ultimately powered by an integral representation, such as that
provided by the Cauchy integral formula) and hence of the Stein interpolation theorem, one can interpret Stein interpolation in this case as using
a clever integral representation of $\sum_{j=1}^\infty T_j f$ in terms of expressions such
as $\sum_{j=1}^\infty e^{2\pi i j t} 2^{-j} T_j f_{1+it}$ and $\sum_{j=1}^\infty e^{2\pi i j t} 2^{j/2} T_j f_{0+it}$, where $f_{1+it}, f_{0+it}$ are
various nonlinear transforms of $f$. Technically, it would then be possible
to rewrite the Stein interpolation argument as a real-variable one, without explicit mention of Lindelöf's theorem; but the proof would then look
extremely contrived; the complex-analytic framework is much more natural (much as it is in analytic number theory, where the distribution of the
primes is best handled by a complex-analytic study of the Riemann zeta
function).
2 With a slightly more refined real interpolation argument, one can at least obtain a restricted
weak-type estimate from $L^{3/2,1}(\mathbf{R}^2)$ to $L^{3,\infty}(\mathbf{R}^2)$ this way, but one can concoct abstract counterexamples
to show that the estimates (4.16), (4.17) are insufficient to obtain an $L^{3/2} \to L^3$
bound on $\sum_{j=1}^\infty T_j$.
Remark 4.3.2. A useful strengthening of the Stein interpolation theorem
is the Fefferman-Stein interpolation theorem [FeSt1972], in which the endpoint spaces $L^1$ and $L^\infty$ are replaced by the Hardy space $H^1$ and the space
BMO of functions of bounded mean oscillation respectively. These spaces
are more stable with respect to various harmonic analysis operators, such
as singular integrals (and in particular, with respect to the Marcinkiewicz
operators $|\nabla|^{it}$, which come up frequently when attempting to use the complex method), which makes the Fefferman-Stein theorem particularly useful
for controlling expressions derived from these sorts of operators.
$$\|D\|_{op} = \sup_{1 \le i \le n} |\lambda_i|.$$
A simple version of this test is as follows: if all the absolute row sums and
column sums of $A$ are bounded by some constant $M$, thus
(4.19) $$\sum_{j=1}^m |a_{ij}| \le M$$
for all $1 \le i \le n$ and
(4.20) $$\sum_{i=1}^n |a_{ij}| \le M$$
for all $1 \le j \le m$, then
(4.21) $$\|A\|_{op} \le M$$
(note that this generalises (the upper bound in) (4.18).) Indeed, to see
(4.21), it suffices by duality and homogeneity to show that
$$\Big| \sum_{i=1}^n \Big( \sum_{j=1}^m a_{ij} x_j \Big) y_i \Big| \le M$$
whenever $(x_j)_{j=1}^m$ and $(y_i)_{i=1}^n$ are sequences with $\sum_{j=1}^m |x_j|^2 = \sum_{i=1}^n |y_i|^2 =
1$; but this easily follows from the arithmetic mean-geometric mean inequality
$$|a_{ij} x_j y_i| \le \frac{1}{2} |a_{ij}| |x_j|^2 + \frac{1}{2} |a_{ij}| |y_i|^2$$
and (4.19), (4.20).
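Schur's test is easy to check numerically. The sketch below compares the bound $M$ (the larger of the maximal absolute row sum and column sum) with an estimate of $\|A\|_{op}$ obtained by power iteration on $A^T A$. Power iteration is a stand-in technique chosen here to keep the example dependency-free; it always returns $\|Ax\|$ for some unit vector $x$, so it can only underestimate the true norm. The helper names and the test matrices are illustrative assumptions, and a nonzero real matrix is assumed.

```python
import math
import random

def random_matrix(n, m, seed=0):
    """An n x m matrix with entries uniform in [-1, 1] (illustrative data)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(m)] for _ in range(n)]

def op_norm_estimate(a, iterations=200, seed=0):
    """Power iteration on A^T A; returns ||A x|| for a unit vector x,
    which is a lower estimate of the operator norm of A."""
    rng = random.Random(seed)
    n, m = len(a), len(a[0])
    x = [rng.uniform(-1.0, 1.0) for _ in range(m)]
    norm_ax = 0.0
    for _ in range(iterations):
        scale = math.sqrt(sum(v * v for v in x))
        x = [v / scale for v in x]
        ax = [sum(a[i][j] * x[j] for j in range(m)) for i in range(n)]
        norm_ax = math.sqrt(sum(v * v for v in ax))
        # next iterate: x <- A^T (A x)
        x = [sum(a[i][j] * ax[i] for i in range(n)) for j in range(m)]
    return norm_ax

def schur_bound(a):
    """The constant M: maximum of the absolute row sums and column sums."""
    n, m = len(a), len(a[0])
    row = max(sum(abs(v) for v in r) for r in a)
    col = max(sum(abs(a[i][j]) for i in range(n)) for j in range(m))
    return max(row, col)

if __name__ == "__main__":
    A = random_matrix(6, 4)
    print("norm estimate:", op_norm_estimate(A), "Schur bound:", schur_bound(A))
```

For a diagonal matrix the two quantities coincide, which recovers the upper bound in (4.18); for matrices with a lot of sign cancellation the Schur bound can be far from sharp, since it only sees the magnitudes $|a_{ij}|$.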
Schur's test (4.21) (and its many generalisations to weighted situations,
or to Lebesgue or Lorentz spaces) is particularly useful for controlling operators in which the role of oscillation (as reflected in the phase of the
coefficients $a_{ij}$, as opposed to just their magnitudes $|a_{ij}|$) is not decisive.
However, it is of limited use in situations that involve a lot of cancellation.
For this, a different test, known as the Cotlar-Stein lemma [Co1955], is
much more flexible and powerful. It can be viewed in a sense as a non-commutative variant of Schur's test (4.21) (or of (4.18)), in which the scalar
coefficients $\lambda_i$ or $a_{ij}$ are replaced by operators instead.
To illustrate the basic flavour of the result, let us return to the bound
(4.18), and now consider instead a block-diagonal matrix
(4.22) $$A = \begin{pmatrix} \Lambda_1 & 0 & \dots & 0 \\ 0 & \Lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \Lambda_n \end{pmatrix}$$
Indeed, the lower bound is trivial (as can be seen by testing $A$ on vectors
which are supported on the $i$-th block of coordinates), while to establish the
upper bound, one can make use of the orthogonal decomposition
(4.24) $$\mathbf{C}^m \equiv \bigoplus_{i=1}^n \mathbf{C}^{m_i}$$
$$Ax = \begin{pmatrix} \Lambda_1 x_1 \\ \Lambda_2 x_2 \\ \vdots \\ \Lambda_n x_n \end{pmatrix}$$
and the upper bound in (4.23) then follows from a simple computation.
The operator $T$ associated to the matrix $A$ in (4.22) can be viewed as a
sum $T = \sum_{i=1}^n T_i$, where each $T_i$ corresponds to the $\Lambda_i$ block of $A$, in which
case (4.23) can also be written as
(4.25)
The reason for this gain can ultimately be traced back to the orthogonality
of the $T_i$; namely, that they occupy different columns and different rows of
the range and domain of $T$. This is obvious when viewed in the matrix
formalism, but can also be described in the more abstract Hilbert space
operator formalism via the identities (see footnote 3)
(4.26) $$T_i^* T_j = 0$$
and
(4.27) $$T_i T_j^* = 0$$
whenever $i \ne j$. By replacing (4.24) with a more abstract orthogonal decomposition into these ranges and coranges, one can in fact deduce (4.25)
directly from (4.26) and (4.27).
3 The first identity asserts that the ranges of the $T_i$ are orthogonal to each other, and the
second asserts that the coranges of the $T_i$ (the ranges of the adjoints $T_i^*$) are orthogonal to each
other.
The Cotlar-Stein lemma is an extension of this observation to the case
where the $T_i$ are merely almost orthogonal rather than orthogonal, in a
manner somewhat analogous to how Schur's test (partially) extends (4.18)
to the non-diagonal case. Specifically, we have
Lemma 4.4.1 (Cotlar-Stein lemma). Let $T_1, \dots, T_n : H \to H'$ be a finite
sequence of bounded linear operators from one Hilbert space $H$ to another
$H'$, obeying the bounds
(4.28) $$\sum_{j=1}^n \|T_i T_j^*\|_{op}^{1/2} \le M$$
and
(4.29) $$\sum_{j=1}^n \|T_i^* T_j\|_{op}^{1/2} \le M$$
for all $i = 1, \dots, n$ and some $M > 0$ (compare with (4.19), (4.20)). Then
one has
(4.30) $$\Big\| \sum_{i=1}^n T_i \Big\|_{op} \le M.$$
$$\|T\|_{op} = \|T T^*\|_{op}^{1/2} = \|T^* T\|_{op}^{1/2}$$
that the hypothesis (4.28) (or (4.29)) already gives the bound
(4.32) $$\|T_i\|_{op} \le M$$
$$\|T\|_{op}^{2N} = \|(T T^*)^N\|_{op}.$$
To estimate the right-hand side, we expand it out and
apply the triangle inequality to bound it by
(4.34) $$\sum_{i_1, j_1, \dots, i_N, j_N \in \{1, \dots, n\}} \|T_{i_1} T_{j_1}^* T_{i_2} T_{j_2}^* \cdots T_{i_N} T_{j_N}^*\|_{op}.$$
On the other hand, we can group the product by pairs in another way, to
obtain the bound of
$$\|T_{i_1}\|_{op} \|T_{j_1}^* T_{i_2}\|_{op} \cdots \|T_{j_{N-1}}^* T_{i_N}\|_{op} \|T_{j_N}^*\|_{op}.$$
We bound $\|T_{i_1}\|_{op}$ and $\|T_{j_N}^*\|_{op}$ crudely by $M$ using (4.32). Taking the
geometric mean of the above bounds, we can thus bound (4.34) by
$$M \sum_{i_1, j_1, \dots, i_N, j_N \in \{1, \dots, n\}} \|T_{i_1} T_{j_1}^*\|_{op}^{1/2} \|T_{j_1}^* T_{i_2}\|_{op}^{1/2} \cdots \|T_{j_{N-1}}^* T_{i_N}\|_{op}^{1/2} \|T_{i_N} T_{j_N}^*\|_{op}^{1/2}.$$
If we then sum this series first in $j_N$, then in $i_N$, then moving back all the
way to $i_1$, using (4.28) and (4.29) alternately, we obtain a final bound of
$$n M^{2N}$$
for (4.33). Taking $N$-th roots, we obtain
$$\|T\|_{op} \le n^{1/2N} M.$$
Sending $N \to \infty$, we obtain the claim.
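The conclusion (4.30) can be sanity-checked on small examples. The sketch below works with real $2 \times 2$ matrices, for which the operator norm has a closed form in terms of the Frobenius norm and the determinant, computes the constant $M$ from (4.28) and (4.29), and compares it with $\|\sum_i T_i\|_{op}$. The restriction to $2 \times 2$ real matrices and all function names are assumptions made purely to keep the example exact and self-contained.

```python
import math
import random

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def transpose2(a):
    """Transpose, which is the adjoint for real matrices."""
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def op_norm2(a):
    """Largest singular value of a real 2x2 matrix, from s1^2 + s2^2 = ||A||_F^2
    and s1 * s2 = |det A|."""
    s = sum(v * v for row in a for v in row)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = max(s * s - 4.0 * det * det, 0.0)
    return math.sqrt((s + math.sqrt(disc)) / 2.0)

def cotlar_stein_constant(ops):
    """Smallest M satisfying both (4.28) and (4.29) for the given operators."""
    m = 0.0
    for ti in ops:
        s1 = sum(math.sqrt(op_norm2(matmul2(ti, transpose2(tj)))) for tj in ops)
        s2 = sum(math.sqrt(op_norm2(matmul2(transpose2(ti), tj))) for tj in ops)
        m = max(m, s1, s2)
    return m

def sum_ops(ops):
    return [[sum(t[i][j] for t in ops) for j in range(2)] for i in range(2)]

def random_ops(n, seed=0):
    """n random 2x2 matrices with entries uniform in [-1, 1] (illustrative data)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(2)]
            for _ in range(n)]

if __name__ == "__main__":
    ops = random_ops(5, seed=1)
    print("||sum T_i|| =", op_norm2(sum_ops(ops)), " M =", cotlar_stein_constant(ops))
```

Two special cases are instructive: for the orthogonal pair of coordinate projections one gets $M = 1$, beating the triangle inequality bound of $2$, while for $n$ copies of the same operator the two sides agree, so the lemma cannot be improved in general.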
Remark 4.4.2. As observed in a number of places (see e.g. [St1993, p.
318] or [Co2007]), the Cotlar-Stein lemma can be extended to infinite sums
$\sum_{i=1}^\infty T_i$ (with the obvious changes to the hypotheses (4.28), (4.29)). Indeed,
one can show that for any $f \in H$, the sum $\sum_{i=1}^\infty T_i f$ is unconditionally convergent in $H'$ (and furthermore has bounded 2-variation), and the resulting
operator $\sum_{i=1}^\infty T_i$ is a bounded linear operator with an operator norm bound
of $M$.
Remark 4.4.3. If we specialise to the case where all the Ti are equal, we
see that the bound in the Cotlar-Stein lemma is sharp, at least in this case.
Thus we see how the tensor power trick can convert an inefficient argument,
such as that obtained using the triangle inequality or crude bounds such as
(4.32), into an efficient one.
Remark 4.4.4. One can justify Schur's test by a similar method. Indeed,
starting from the inequality
$$\|A\|_{op}^{2N} \le \mathrm{tr}((A A^*)^N)$$
(which follows easily from the singular value decomposition), we can bound
$\|A\|_{op}^{2N}$ by
$$\sum_{i_1, \dots, j_N \in \{1, \dots, n\}} a_{i_1, j_1} \overline{a_{i_2, j_1}} \cdots a_{i_N, j_N} \overline{a_{i_1, j_N}}.$$
Estimating the other two terms in the summand by $M$, and then repeatedly
summing the indices one at a time as before, we obtain
$$\|A\|_{op}^{2N} \le n M^{2N}$$
and the claim follows from the tensor power trick as before. On the other
hand, in the converse direction, I do not know of any way to prove the
Cotlar-Stein lemma that does not basically go through the tensor power
argument.
$$\frac{C_d}{\lambda} \|f\|_{L^1(\mathbf{R}^d)}$$
for all $f \in L^1(\mathbf{R}^d)$, all $\lambda > 0$, and some constant $C_d > 0$ depending only on
$d$. By a standard density argument, this implies in particular that we have
the Lebesgue differentiation theorem
$$\lim_{r \to 0} \frac{1}{|B(x, r)|} \int_{B(x, r)} f(y)\, dy = f(x)$$
for all $f \in L^1(\mathbf{R}^d)$ and almost every $x \in \mathbf{R}^d$. See for instance [Ta2011,
Theorem 1.6.11].
By combining the Hardy-Littlewood maximal inequality with the Marcinkiewicz
interpolation theorem [Ta2010, 1.11.10] (and the trivial inequality $\|Mf\|_{L^\infty(\mathbf{R}^d)} \le
\|f\|_{L^\infty(\mathbf{R}^d)}$) we see that
(4.36)
for all $p > 1$ and $f \in L^p(\mathbf{R}^d)$, and some constant $C_{d,p}$ depending on $d$ and
$p$.
The exact dependence of $C_{d,p}$ on $d$ and $p$ is still not completely understood. The standard Vitali-type covering argument used to establish
(4.35) has an exponential dependence on dimension, giving a constant of
the form $C_d = C^d$ for some absolute constant $C > 1$. Inserting this into the
Marcinkiewicz theorem, one obtains a constant $C_{d,p}$ of the form $C_{d,p} = \frac{C^d}{p-1}$
for some $C > 1$ (and taking $p$ bounded away from infinity, for simplicity).
The dependence on $p$ is about right, but the dependence on $d$ should not be
exponential.
In [St1982, StSt1983], Stein gave an elegant argument, based on the
Calderón-Zygmund method of rotations, to eliminate the dependence on $d$:
Theorem 4.5.1. One can take $C_{d,p} = C_p$ for each $p > 1$, where $C_p$ depends
only on $p$.
The argument is based on an earlier bound [St1976] of Stein on the
spherical maximal function
$$M_S f(x) := \sup_{r > 0} A_r |f|(x)$$
where $A_r$ are the spherical averaging operators
$$A_r f(x) := \int_{S^{d-1}} f(x + r\omega)\, d\sigma^{d-1}(\omega)$$
and $\sigma^{d-1}$ is normalised surface measure on the sphere $S^{d-1}$. Because this
is an uncountable supremum, and the averaging operators $A_r$ do not have
good continuity properties in $r$, it is not a priori obvious that $M_S f$ is even
a measurable function for, say, locally integrable $f$; but we can avoid this
technical issue, at least initially, by restricting attention to continuous functions $f$. The Stein maximal theorem for the spherical maximal function
then asserts that if $d \ge 3$ and $p > \frac{d}{d-1}$, then we have
(4.37)
(4.38)
for the $d$-dimensional spherical maximal function, implies the same bound
(4.39)
for the $(d+1)$-dimensional spherical maximal function, with exactly the same
constant $A$. For any direction $\omega_0 \in S^d \subset \mathbf{R}^{d+1}$, consider the averaging
operators
$$M_S^{\omega_0} f(x) := \sup_{r > 0} A_r^{\omega_0} |f|(x)$$
for $f : \mathbf{R}^{d+1} \to \mathbf{C}$, where
$$A_r^{\omega_0} f(x) := \int_{S^{d-1}} f(x + r U_{\omega_0} \omega)\, d\sigma^{d-1}(\omega)$$
5 This implication is initially only valid for continuous functions, but one can then extend the
inequality (4.36) to the rest of $L^p(\mathbf{R}^d)$ by a standard limiting argument.
indeed, one can deduce this from the uniqueness of Haar measure by noting
that both the left-hand side and right-hand side are invariant means of $f$ on
the sphere $\{y \in \mathbf{R}^{d+1} : |y - x| = r\}$. This implies that
$$M_S f(x) \le \int_{S^d} M_S^{\omega_0} f(x)\, d\sigma^d(\omega_0)$$
and thus by Minkowski's inequality for integrals, we may deduce (4.39) from
(4.40).
Remark 4.5.2. Unfortunately, the method of rotations does not work to
show that the constant $C_d$ for the weak $(1, 1)$ inequality (4.35) is independent of dimension, as the weak $L^1$ quasinorm $\|\cdot\|_{L^{1,\infty}}$ is not a genuine norm
and does not obey the Minkowski inequality for integrals. Indeed, the question of whether $C_d$ in (4.35) can be taken to be independent of dimension
remains open. The best known positive result is due to Stein and Strömberg
[StSt1983], who showed that one can take $C_d = Cd$ for some absolute constant $C$, by comparing the Hardy-Littlewood maximal function with the
heat kernel maximal function
$$\sup_{t > 0} e^{t\Delta} |f|(x).$$
(4.41)
where $P_{\le 1} = \sum_{N \le 1} P_N$ is the projection to frequencies $|\xi| \lesssim 1$. By the
triangle inequality, it then suffices to show the bounds
(4.42)
and
(4.43)
$$A_r P_{\le 1} f(x) \lesssim M f(x)$$
$$A_r P_N f(x) \lesssim N M f(x),$$
(4.46)
for any $q > 1$. This is not directly strong enough to prove (4.43), due to the
loss of one derivative as manifested by the factor $N$. On the other hand,
this bound (4.46) holds for all $q > 1$, and not just in the range $p > \frac{d}{d-1}$.
To counterbalance this loss of one derivative, we turn to $L^2$ estimates.
A standard stationary phase computation (or Bessel function computation)
shows that $A_r$ is a Fourier multiplier whose symbol decays like $|\xi|^{-(d-1)/2}$.
As such, Plancherel's theorem yields the $L^2$ bound
$$\|A_r P_N f\|_{L^2(\mathbf{R}^d)} \lesssim N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}$$
uniformly in $1 \le r \le 2$. But we still have to take the supremum over
$r$. This is an uncountable supremum, so one cannot just apply a union
bound argument. However, from the uncertainty principle, we expect $P_N f$
to be blurred out at spatial scale $1/N$, which suggests that the averages
$A_r P_N f$ do not vary much when $r$ is restricted to an interval of size $1/N$.
Heuristically, this then suggests that
$$\sup_{1 \le r \le 2} |A_r P_N f| \approx \sup_{1 \le r \le 2:\ r \in \frac{1}{N} \mathbf{Z}} |A_r P_N f|.$$
and taking $L^2$ norms, one is then led to the heuristic prediction that
(4.47)
One can make this heuristic precise using the one-dimensional Sobolev embedding inequality adapted to scale $1/N$, namely that
$$\sup_{1 \le r \le 2} |g(r)| \lesssim N^{1/2} \Big( \int_1^2 |g(r)|^2\, dr \Big)^{1/2} + N^{-1/2} \Big( \int_1^2 |g'(r)|^2\, dr \Big)^{1/2}.$$
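The role of the two weights $N^{1/2}$ and $N^{-1/2}$ can be seen on a test function that oscillates at scale $1/N$. The sketch below evaluates both sides of the inequality for a Gaussian bump of width $1/N$ centred at $r = 3/2$; this test function, the midpoint-rule quadrature, and the helper names are all assumptions for illustration. An elementary argument (the fundamental theorem of calculus followed by the arithmetic mean-geometric mean inequality) shows that for $N \ge 1$ the implied constant can be taken to be $\sqrt{2}$, which is the constant used in the check.

```python
import math

def gaussian_bump(big_n, centre=1.5):
    """Return (g, g') for g(r) = exp(-N^2 (r - centre)^2 / 2), a bump of width 1/N."""
    def g(r):
        return math.exp(-0.5 * (big_n * (r - centre)) ** 2)
    def dg(r):
        return -big_n ** 2 * (r - centre) * g(r)
    return g, dg

def sobolev_sides(g, dg, big_n, samples=20000):
    """Return (sup |g|, N^{1/2} ||g||_2 + N^{-1/2} ||g'||_2) on [1, 2],
    with integrals and the supremum approximated on a midpoint grid."""
    h = 1.0 / samples
    sup_g = 0.0
    int_g2 = 0.0
    int_dg2 = 0.0
    for k in range(samples):
        r = 1.0 + (k + 0.5) * h
        value = g(r)
        slope = dg(r)
        sup_g = max(sup_g, abs(value))
        int_g2 += value * value * h
        int_dg2 += slope * slope * h
    rhs = math.sqrt(big_n) * math.sqrt(int_g2) + math.sqrt(int_dg2) / math.sqrt(big_n)
    return sup_g, rhs

if __name__ == "__main__":
    for big_n in (10, 50, 200):
        g, dg = gaussian_bump(big_n)
        print(big_n, sobolev_sides(g, dg, big_n))
```

For this family both sides stay comparable uniformly in $N$, which indicates that neither weight can be improved for functions living at scale $1/N$.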
$$\Big\| \frac{d}{dr} A_r P_N f \Big\|_{L^2(\mathbf{R}^d)} \lesssim N \cdot N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}$$
and
(4.49)
for each $N > 1$ and some $\varepsilon > 0$ depending only on $d$ and $p$, similarly to
before.
A rescaled version of the derivation of (4.44) gives
$$A_r P_{\le 1/R} f(x) \lesssim M f(x)$$
for all $R \le r \le 2R$, which already lets us deduce (4.48). As for (4.49), a
rescaling of (4.45) gives
$$A_r P_{N/R} f(x) \lesssim N M f(x),$$
for all $R \le r \le 2R$, and thus
(4.50)
$$\Big\| \frac{d}{dr} A_r P_{N/R} f \Big\|_{L^2(\mathbf{R}^d)} \lesssim \frac{N}{R} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}$$
and so
Z
Z 2R
1 2R
R
d
2
1/2
k(
|Ar PN/R f | dr) + ( 2
| Ar PN/R f |2 dr)1/2 kL2 (Rd )
R R
N R dr
. N 1/2 N (d1)/2 kf kL2 (Rd )
which implies by rescaled Sobolev embedding that
$$\Big\| \sup_{R \le r \le 2R} |A_r P_{N/R} f| \Big\|_{L^2(\mathbf{R}^d)} \lesssim N^{1/2} N^{-(d-1)/2} \|f\|_{L^2(\mathbf{R}^d)}.$$
for all $f \in L^p(X)$ (or all $f$ in the dense subclass), and some constant $C$,
where $L^{p,\infty}$ is the weak $L^p$ norm
$$\|f\|_{L^{p,\infty}(X)} := \sup_{t > 0} t\, \mu(\{x \in X : |f(x)| \ge t\})^{1/p}.$$
and not just in the dense subclass. See for instance [Ta2011, 1.6], in which
this method is used to deduce the Lebesgue differentiation theorem from the
Hardy-Littlewood maximal inequality.
This is by now a very standard approach to establishing pointwise almost everywhere convergence theorems, but it is natural to ask whether it
is strictly necessary. In particular, is it possible to have a pointwise convergence result $T_n f \to Tf$ without being able to obtain a weak-type maximal
inequality of the form (4.51)?
In the case of norm convergence (in which one asks for $T_n f$ to converge
to $Tf$ in the $L^p$ norm, rather than in the pointwise almost everywhere sense),
the answer is no, thanks to the uniform boundedness principle, which among
other things shows that norm convergence is only possible if one has the
uniform bound
(4.52) $$\sup_n \|T_n f\|_{L^p(X)} \le C \|f\|_{L^p(X)}$$
for some $C > 0$ and all $f \in L^p(X)$; and conversely, if one has the uniform
bound, and one has already established norm convergence of $T_n f$ to $Tf$ on
a dense subclass of $L^p(X)$, (4.52) will extend that norm convergence to all
of $L^p(X)$.
Returning to pointwise almost everywhere convergence, the answer in general is yes. Consider for instance the rank one operators
$$T_n f(x) := 1_{[n,n+1]}(x) \int_0^1 f(y)\,dy$$
from $L^1(\mathbf{R})$ to $L^1(\mathbf{R})$. It is clear that $T_n f$ converges pointwise almost everywhere to zero as $n \to \infty$ for any $f \in L^1(\mathbf{R})$, and the operators $T_n$ are uniformly bounded on $L^1(\mathbf{R})$, but the maximal function $T_* f = \sup_n |T_n f|$ does not obey (4.51). One can modify this example in a number of ways to defeat almost any reasonable conjecture that something like (4.51) should be necessary for pointwise almost everywhere convergence.
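The failure of the maximal inequality for these rank one operators can be seen numerically. The sketch below uses hypothetical helper names and a simple midpoint grid (both assumptions of this illustration). It takes any $f$ with $\int_0^1 f = c$, truncates the supremum to $1 \le n \le n_{max}$, and measures the level set of the truncated maximal function. The measure grows linearly in $n_{max}$ even though $\|f\|_{L^1}$ is fixed, so no weak-type $(1,1)$ bound can hold, while at any fixed point $x$ the sequence $T_n f(x)$ is eventually zero.

```python
def t_n(n, c, x):
    """T_n f(x) for any f whose integral over [0, 1] is c: c times the indicator of [n, n+1]."""
    return c if n <= x <= n + 1 else 0.0

def truncated_maximal(c, x, n_max):
    """Supremum over 1 <= n <= n_max of |T_n f(x)|."""
    return max(abs(t_n(n, c, x)) for n in range(1, n_max + 1))

def level_set_measure(c, t, n_max, step=0.01):
    """Midpoint-grid estimate of the measure of {x in [0, n_max + 2] : truncated maximal function >= t}."""
    cells = int(round((n_max + 2) / step))
    hits = sum(1 for k in range(cells)
               if truncated_maximal(c, (k + 0.5) * step, n_max) >= t)
    return step * hits

# With c = 1 (for instance f the indicator of [0, 1]) and threshold t = 1/2,
# the level set has measure about n_max, which is unbounded as n_max grows.
measures = {n_max: level_set_measure(1.0, 0.5, n_max) for n_max in (10, 40)}
```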
In spite of this, a remarkable observation of Stein [St1961], now known as Stein's maximal principle, asserts that the maximal inequality is necessary to prove pointwise almost everywhere convergence, if one is working on a compact group and the operators $T_n$ are translation invariant, and if the exponent $p$ is at most 2:
Theorem 4.6.1 (Stein maximal principle). Let $G$ be a compact group, let $X$ be a homogeneous space$^6$ of $G$ with a finite Haar measure $\mu$, let $1 \le p \le 2$, and let $T_n : L^p(X) \to L^p(X)$ be a sequence of bounded linear operators commuting with translations, such that $T_n f$ converges pointwise almost everywhere for each $f \in L^p(X)$. Then (4.51) holds.
$^6$ By this, we mean that $G$ has a transitive action on $X$ which preserves $\mu$.
This is not quite the most general version of the principle; some additional variants and generalisations are given in [St1961]. For instance, one
can replace the discrete sequence $T_n$ of operators with a continuous sequence $T_t$ without much difficulty. As a typical application of this principle, we see that Carleson's celebrated theorem [Ca1966] that the partial Fourier series $\sum_{n=-N}^N \hat f(n) e^{2\pi i n x}$ of an $L^2(\mathbf{R}/\mathbf{Z})$ function $f : \mathbf{R}/\mathbf{Z} \to \mathbf{C}$ converge almost everywhere is in fact equivalent to the estimate
$$\left\| \sup_{N>0} \left| \sum_{n=-N}^N \hat f(n) e^{2\pi i n \cdot} \right| \right\|_{L^{2,\infty}(\mathbf{R}/\mathbf{Z})} \lesssim \|f\|_{L^2(\mathbf{R}/\mathbf{Z})}. \tag{4.53}$$
And unsurprisingly, most of the proofs of this (difficult) theorem have proceeded by first establishing (4.53), and Stein's maximal principle strongly suggests that this is the optimal way to try to prove this theorem.
On the other hand, the theorem does fail for $p > 2$, and almost everywhere convergence results in $L^p$ for $p > 2$ can be proven by other methods than weak $(p,p)$ estimates. For instance, the convergence of Bochner-Riesz multipliers in $L^p(\mathbf{R}^n)$ for any $n$ (and for $p$ in the range predicted by the Bochner-Riesz conjecture) was verified for $p > 2$ in [CaRuVe1988], despite the fact that the weak $(p,p)$ boundedness of even a single Bochner-Riesz multiplier, let alone the maximal function, has still not been completely verified in this range. (The argument in [CaRuVe1988] uses weighted $L^2$ estimates for the maximal Bochner-Riesz operator, rather than $L^p$ type estimates.) For $p \le 2$, though, Stein's principle (after localising to a torus) does apply, and pointwise almost everywhere convergence of Bochner-Riesz means is equivalent to the weak $(p,p)$ estimate (4.51).
Stein's principle is restricted to compact groups (such as the torus $(\mathbf{R}/\mathbf{Z})^n$ or the rotation group $SO(n)$) and their homogeneous spaces (such as the torus $(\mathbf{R}/\mathbf{Z})^n$ again, or the sphere $S^{n-1}$). As stated, the principle fails in the noncompact setting; for instance, in $\mathbf{R}$, the convolution operators $T_n f := f * 1_{[n,n+1]}$ are such that $T_n f$ converges pointwise almost everywhere to zero for every $f \in L^1(\mathbf{R})$, but the maximal function is not of weak-type $(1,1)$. However, in many applications on non-compact domains, the $T_n$ are localised enough that one can transfer from a non-compact setting to a compact setting and then apply Stein's principle. For instance, Carleson's theorem on the real line $\mathbf{R}$ is equivalent to Carleson's theorem on the circle $\mathbf{R}/\mathbf{Z}$ (due to the localisation of the Dirichlet kernels), which as discussed before is equivalent to the estimate (4.53) on the circle, which by a scaling argument is equivalent to the analogous estimate on the real line $\mathbf{R}$.
Stein's argument from [St1961] can be viewed nowadays as an application of the probabilistic method; starting with a sequence of increasingly bad
counterexamples to the maximal inequality (4.51), one randomly combines
Chapter 5
Nonstandard analysis
1In many cases, a bound can eventually be worked out by performing proof mining on
the argument, and in particular by carefully unpacking the proofs of all the various results from
infinitary mathematics that were used in the argument, as opposed to simply using them as black
boxes, but this is a time-consuming task and the bounds that one eventually obtains tend to be
quite poor (e.g. tower exponential or Ackermann type bounds are not uncommon).
Chang proves this lemma by essentially establishing a quantitative version of the nullstellensatz, via elementary elimination theory (somewhat similar, actually, to the approach I took to the nullstellensatz in [Ta2008, 1.15]). She also notes that one could also establish the result through the machinery of Gröbner bases. In each of these arguments, it was
not possible to use Lemma 5.1.1 (or the closely related nullstellensatz) as a
black box; one actually had to unpack one of the proofs of that lemma or
nullstellensatz to get the polynomial bound. However, using nonstandard
analysis, it is possible to get such polynomial bounds (albeit with an ineffective value of the constant C) directly from Lemma 5.1.1 (or more precisely,
the generalisation in Remark 5.1.2) without having to inspect the proof,
and instead simply using it as a black box, thus providing a soft proof of
Lemma 5.1.3 that is an alternative to the hard proofs mentioned above.
The nonstandard proof is essentially due to Schmidt-Göttsch [Sc1989],
and proceeds as follows. Informally, the idea is that Lemma 5.1.3 should
follow from Lemma 5.1.1 after replacing the field of rationals Q with the
field of rationals of polynomially bounded height. Unfortunately, the latter
object does not really make sense as a field in standard analysis; nevertheless,
it is a perfectly sensible object in nonstandard analysis, and this allows the
above informal argument to be made rigorous.
We turn to the details. As is common whenever one uses nonstandard analysis to prove finitary results, we use a compactness and contradiction argument (or more precisely, an ultralimit and contradiction argument). Suppose for contradiction that Lemma 5.1.3 failed. Carefully negating the quantifiers (and using the axiom of choice), we conclude that there exist $D, d, r$ such that for each natural number $n$, there is a positive integer $H^{(n)}$ and a family $P_1^{(n)}, \dots, P_r^{(n)} : \mathbf{C}^d \to \mathbf{C}$ of polynomials of degree at most $D$ and rational coefficients of height at most $H^{(n)}$, such that there exists at least one complex solution $z^{(n)} \in \mathbf{C}^d$ to
$$P_1^{(n)}(z^{(n)}) = \dots = P_r^{(n)}(z^{(n)}) = 0, \tag{5.1}$$
but such that there does not exist any such solution whose coefficients are algebraic numbers of degree at most $n$ and height at most $n (H^{(n)})^n$.
Now we take ultralimits (see e.g. [Ta2011b, 2.1] for a quick review of ultralimit analysis, which we will assume knowledge of in the argument that follows). Let $p \in \beta\mathbf{N} \setminus \mathbf{N}$ be a non-principal ultrafilter. For each $i = 1, \dots, r$, the ultralimit
$$P_i := \lim_{n \to p} P_i^{(n)}$$
(n)
Pi
But for $n$ larger than $C$, this contradicts the construction of the $P_i^{(n)}$, and the claim follows. (Note that as $p$ is non-principal, any neighbourhood of $p$ in $\beta\mathbf{N}$ will contain arbitrarily large natural numbers.)
Remark 5.1.4. The same argument actually gives a slightly stronger version of Lemma 5.1.3, namely that the integer coefficients used to define the algebraic solution $z$ can be taken to be polynomials in the coefficients of $P_1, \dots, P_r$, with degree and coefficients bounded by $C_{D,d,r}$.
$$\int_Y \left( \int_X f(x,y)\,d\mu_X(x) \right) d\mu_Y(y)$$
and
$$\int_{X \times Y} f(x,y)\,d\mu_{X \times Y}(x,y)$$
all exist (thus all integrands are almost-everywhere well-defined and measurable with respect to the appropriate $\sigma$-algebras), and are all equal to each other; see e.g. [Ta2011, Theorem 1.7.15].
$^2$ There are technical difficulties with the theory when $X$ or $Y$ is not $\sigma$-finite, but in these notes we will only be dealing with probability spaces, which are clearly $\sigma$-finite, so this difficulty will not concern us.
$$\frac{|A|}{|V|},$$
where $|A|$ denotes the cardinality of $A$. In this discrete setting, the probability space is automatically complete, and every function $f : V \to [0,+\infty]$ is measurable, with the integral simply being the average:
$$\int_V f\,d\mu_V = \frac{1}{|V|} \sum_{v \in V} f(v).$$
Of course, Tonelli's theorem is obvious for these discrete spaces; the deeper content of that theorem is only apparent at the level of continuous measure spaces.
Among other things, this probability space structure on finite sets can be used to describe various statistics of dense graphs. Recall that a graph $G = (V,E)$ is a finite vertex set $V$, together with a set of edges $E$, which we will think of as a symmetric subset$^3$ of the Cartesian product $V \times V$. Then, if $V$ is non-empty, and ignoring some minor errors coming from the diagonal $\Delta_V$, the edge density of the graph is essentially
$$e(G) := \mu_{V \times V}(E) = \int_{V \times V} 1_E(v,w)\,d\mu_{V \times V}(v,w),$$
and so forth.
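These discrete integrals are just normalised counts, as the following minimal sketch illustrates. The function names are assumptions of this illustration, and the triangle density is computed here as the average of $1_E(u,v) 1_E(v,w) 1_E(w,u)$ over $V \times V \times V$, which is the expression used for $t(G)$ later in this section.

```python
from itertools import product

def symmetric_closure(pairs):
    """Turn a list of unordered edges into a symmetric subset of V x V."""
    return {(a, b) for a, b in pairs} | {(b, a) for a, b in pairs}

def edge_density(vertices, edges):
    """Integral of 1_E over V x V with respect to the uniform probability measure."""
    n = len(vertices)
    return sum(1 for v, w in product(vertices, repeat=2) if (v, w) in edges) / n ** 2

def triangle_density(vertices, edges):
    """Integral of 1_E(u,v) 1_E(v,w) 1_E(w,u) over V x V x V."""
    n = len(vertices)
    hits = sum(1 for u, v, w in product(vertices, repeat=3)
               if (u, v) in edges and (v, w) in edges and (w, u) in edges)
    return hits / n ** 3

# Complete graph on four vertices (no loops) and a path on three vertices.
k4_vertices = list(range(4))
k4_edges = {(v, w) for v, w in product(k4_vertices, repeat=2) if v != w}
path_vertices = [0, 1, 2]
path_edges = symmetric_closure([(0, 1), (1, 2)])
```

For the complete graph the edge density is $12/16$ rather than $1$, which is an instance of the minor error coming from the diagonal mentioned above.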
In [RuSz1978], Ruzsa and Szemerédi established the triangle removal lemma concerning triangle densities, which informally asserts that a graph with few triangles can be made completely triangle-free by removing a small number of edges:
Lemma 5.2.1 (Triangle removal lemma). Let $G = (V,E)$ be a graph on a non-empty finite set $V$, such that $t(G) \le \delta$ for some $\delta > 0$. Then there exists a subgraph $G' = (V,E')$ of $G$ with $t(G') = 0$, such that $e(G \setminus G') = o_{\delta \to 0}(1)$, where $o_{\delta \to 0}(1)$ denotes a quantity bounded by $c(\delta)$ for some function $c(\delta)$ of $\delta$ that goes to zero as $\delta \to 0$.
$^3$ If one wishes, one can prohibit loops in $E$, so that $E$ is disjoint from the diagonal $\Delta_V := \{(v,v) : v \in V\}$ of $V \times V$, but this will not make much difference for the discussion below.
The original proof of the triangle removal lemma was a finitary one, and proceeded via the Szemerédi regularity lemma [Sz1978]. It has a number of consequences; for instance, as already noted in that paper, the triangle removal lemma implies as a corollary Roth's theorem [Ro1953] that subsets of $\mathbf{Z}$ of positive upper density contain infinitely many arithmetic progressions of length three.
It is however also possible to establish this lemma by infinitary means. There are at least three basic approaches for this. One is via a correspondence principle between questions about dense finite graphs, and questions about exchangeable random infinite graphs, as was pursued in [Ta2007], [Ta2010b, 2.3]. A second (closely related to the first) is to use the machinery of graph limits, as developed in [LoSz2006], [BoChLoSoVe2008]. The third is via nonstandard analysis (or equivalently, by using ultraproducts), as was pursued in [ElSz2012]. These three approaches differ in the technical details of their execution, but the net effect of all of these approaches is broadly the same, in that they all convert statements about large dense graphs (such as the triangle removal lemma) to measure-theoretic statements on infinitary measure spaces. (This is analogous to how the Furstenberg correspondence principle converts combinatorial statements about dense sets of integers into ergodic-theoretic statements on measure-preserving systems.)
In this section, we will illustrate the nonstandard analysis approach of [ElSz2012] by providing a nonstandard proof of the triangle removal lemma. The main technical tool used here (besides the basic machinery of nonstandard analysis) is that of Loeb measure [Lo1975], which gives a probability space structure $(V, \mathcal{B}_V, \mu_V)$ to nonstandard finite non-empty sets $V = \prod_{n \to p} V_n$ that is an infinitary analogue of the discrete probability space structures $V = (V, 2^V, \mu_V)$ one has on standard finite non-empty sets. The nonstandard analogues of quantities such as triangle densities then become the integrals of various nonstandard functions with respect to Loeb measure. With this approach, the epsilons and deltas that are so prevalent in the finitary approach to these subjects disappear almost completely; but to compensate for this, one now must pay much more attention to questions of measurability, which were automatic in the finitary setting but now require some care in the infinitary one.
The nonstandard analysis approaches are also related to the regularity
lemma approach; see [Ta2011d, 4.4] for a proof of the regularity lemma
using Loeb measure.
As usual, the nonstandard approach offers a complexity tradeoff: there
is more effort expended in building the foundational mathematical structures of the argument (in this case, ultraproducts and Loeb measure), but
once these foundations are completed, the actual arguments are shorter than
their finitary counterparts. In the case of the triangle removal lemma, this
tradeoff does not lead to a particularly significant reduction in complexity
(and arguably leads in fact to an increase in the length of the arguments,
when written out in full), but the gain becomes more apparent when proving more complicated results, such as the hypergraph removal lemma, in
which the initial investment in foundations leads to a greater savings in net
complexity, as can be seen in [ElSz2012].
5.2.1. Loeb measure. We use the usual setup of nonstandard analysis (as reviewed for instance in [Ta2011d, 4.4]). Thus, we will need a non-principal ultrafilter $p \in \beta\mathbf{N} \setminus \mathbf{N}$ on the natural numbers $\mathbf{N}$. A statement $P(n)$ pertaining to a natural number $n$ is said to hold for $n$ sufficiently close to $p$ if the set of $n$ for which $P(n)$ holds lies in the ultrafilter $p$. Given a sequence $X_n$ of (standard) spaces $X_n$, the ultraproduct $\prod_{n \to p} X_n$ is the space of all ultralimits $\lim_{n \to p} x_n$ with $x_n \in X_n$, with two ultralimits $\lim_{n \to p} x_n$, $\lim_{n \to p} y_n$ considered equal if and only if $x_n = y_n$ for all $n$ sufficiently close to $p$.
Now consider a nonstandard finite non-empty set $V$, i.e. an ultraproduct $V = \prod_{n \to p} V_n$ of standard finite non-empty sets $V_n$. Define an internal subset of $V$ to be a subset of $V$ of the form $A = \prod_{n \to p} A_n$, where each $A_n$ is a subset of $V_n$. It is easy to see that the collection $\mathcal{A}_V$ of all internal subsets of $V$ is a boolean algebra. In general, though, $\mathcal{A}_V$ will not be a $\sigma$-algebra. For instance, suppose that the $V_n$ are the standard discrete intervals $V_n := [1,n] := \{i \in \mathbf{N} : i \le n\}$; then $V$ is the non-standard discrete interval $V = [1,N] := \{i \in {}^*\mathbf{N} : i \le N\}$, where $N$ is the unbounded nonstandard natural number $N := \lim_{n \to p} n$. For any standard integer $m$, the subinterval $[1, N/m]$ is an internal subset of $V$; but the intersection
$$[1, o(N)] := \bigcap_{m \in \mathbf{N}} [1, N/m] = \{ i \in {}^*\mathbf{N} : i = o(N) \}$$
is not an internal subset of $V$. (This can be seen, for instance, by noting that all non-empty internal subsets of $[1,N]$ have a maximal element, whereas $[1, o(N)]$ does not.)
Given any internal subset $A = \prod_{n \to p} A_n$ of $V$, we can define the cardinality $|A|$ of $A$, which is the nonstandard natural number $|A| := \lim_{n \to p} |A_n|$. We then have the nonstandard density $\frac{|A|}{|V|}$, which is a nonstandard real number between 0 and 1. By the Bolzano-Weierstrass theorem, this bounded nonstandard real number $\frac{|A|}{|V|}$ has a unique standard part $\mathrm{st}(\frac{|A|}{|V|})$, which is a standard real number in $[0,1]$ such that
$$\frac{|A|}{|V|} = \mathrm{st}\left( \frac{|A|}{|V|} \right) + o(1),$$
[
EA
Bn
n=1
and
V (Bn ) .
n=1
Proof. The map $\mu_V : A \mapsto \mathrm{st}(\frac{|A|}{|V|})$ is a finitely additive probability measure on $\mathcal{A}_V$. We claim that this map $\mu_V$ is in fact a pre-measure on $\mathcal{A}_V$, thus one has
$$\mu_V(A) = \sum_{n=1}^\infty \mu_V(A_n) \tag{5.2}$$
outer measure, but this is easily seen since the space of all sets that have this property is easily verified to be a complete $\sigma$-algebra that contains the algebra of internal sets. The "if" portion follows easily from the fact that $\mathcal{L}_V$ is a complete $\sigma$-algebra containing the internal sets. (These facts are very similar to the more familiar facts that a bounded subset of a Euclidean space is Lebesgue measurable if and only if it differs from an elementary set by a set of arbitrarily small outer measure.)
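Now we turn to the analogue of Tonelli's theorem for Loeb measure, which will be a fundamental tool when it comes to proving the triangle removal lemma. Let $V, W$ be two nonstandard finite non-empty sets; then $V \times W$ is also a nonstandard finite non-empty set. We then have three Loeb probability spaces
$$(V, \mathcal{L}_V, \mu_V),$$
$$(W, \mathcal{L}_W, \mu_W),$$
$$(V \times W, \mathcal{L}_{V \times W}, \mu_{V \times W}), \tag{5.3}$$
together with the product space
$$(V \times W, \mathcal{L}_V \times \mathcal{L}_W, \mu_V \times \mu_W). \tag{5.4}$$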
It is then natural to ask how the two probability spaces (5.3) and (5.4) are related. There is one easy relationship, which shows that (5.3) extends (5.4):
Exercise 5.2.1. Show that (5.3) is a refinement of (5.4), thus $\mathcal{L}_V \times \mathcal{L}_W \subset \mathcal{L}_{V \times W}$, and $\mu_{V \times W}$ extends $\mu_V \times \mu_W$. (Hint: first recall why the product of Lebesgue measurable sets is Lebesgue measurable, and mimic that proof to show that the product of a $\mathcal{L}_V$-measurable set and a $\mathcal{L}_W$-measurable set is $\mathcal{L}_{V \times W}$-measurable, and that the two measures $\mu_{V \times W}$ and $\mu_V \times \mu_W$ agree in this case.)
In the converse direction, (5.3) enjoys the type of Tonelli theorem that (5.4) does:
Theorem 5.2.3 (Tonelli theorem for Loeb measure). Let $V, W$ be two nonstandard finite non-empty sets, and let $f : V \times W \to [0,+\infty]$ be an unsigned $\mathcal{L}_{V \times W}$-measurable function. Then the expressions
$$\int_V \left( \int_W f(v,w)\,d\mu_W(w) \right) d\mu_V(v), \tag{5.5}$$
$$\int_W \left( \int_V f(v,w)\,d\mu_V(v) \right) d\mu_W(w), \tag{5.6}$$
and
$$\int_{V \times W} f(v,w)\,d\mu_{V \times W}(v,w) \tag{5.7}$$
are well-defined (thus all integrands are almost everywhere well-defined and appropriately measurable) and equal to each other.
Proof. By the monotone convergence theorem it suffices to verify this when $f$ is a simple function; by linearity we may then take $f$ to be an indicator function $f = 1_E$. Using Theorem 5.2.2 and an approximation argument (and many further applications of monotone convergence) we may assume without loss of generality that $E$ is an internal set. We then have
$$\int_{V \times W} f(v,w)\,d\mu_{V \times W}(v,w) = \mathrm{st}\left( \frac{|E|}{|V||W|} \right)$$
and for every $v \in V$, we have
$$\int_W f(v,w)\,d\mu_W(w) = \mathrm{st}\left( \frac{|E_v|}{|W|} \right),$$
where $E_v$ is the internal set
$$E_v := \{ w \in W : (v,w) \in E \}.$$
Let $n$ be a standard natural number, then we can partition $V$ into the internal sets $V = V_1 \cup \dots \cup V_n$, where
$$V_i := \left\{ v \in V : \frac{i-1}{n} < \frac{|E_v|}{|W|} \le \frac{i}{n} \right\}.$$
On each $V_i$, we have
$$\int_W f(v,w)\,d\mu_W(w) = \frac{i}{n} + O\left(\frac{1}{n}\right) \tag{5.8}$$
and
$$\frac{|E_v|}{|W|} = \frac{i}{n} + O\left(\frac{1}{n}\right). \tag{5.9}$$
R
W
f (v, w) dW (w)
i=1
$$\frac{|E|}{|V||W|} = \sum_{i=1}^n \frac{i}{n} \frac{|V_i|}{|V|} + O\left(\frac{1}{n}\right).$$
Thus we see that the upper and lower integrals of $\int_W f(v,w)\,d\mu_W(w)$ are equal to $\frac{|E|}{|V||W|} + O(\frac{1}{n})$ for every standard $n$. Sending $n$ to infinity, we conclude that $\int_W f(v,w)\,d\mu_W(w)$ is measurable, and that
$$\int_V \left( \int_W f(v,w)\,d\mu_W(w) \right) d\mu_V(v) = \mathrm{st}\left( \frac{|E|}{|V||W|} \right),$$
showing that (5.5) and (5.7) are well-defined and equal. A similar argument holds for (5.6) and (5.7), and the claim follows.
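The double counting step in this proof already makes sense for standard finite sets, where it can be checked directly. The sketch below is a finite stand-in for the nonstandard argument, not the argument itself: it bins each $v$ by the density of its slice $E_v$ and compares the resulting Riemann-type sum with $|E|/(|V||W|)$. The helper names, the random test set and the convention of placing density $0$ in the first bin are assumptions made for this illustration.

```python
import random

def slice_density_bins(V, W, E, n):
    """Place v in bin i when (i-1)/n < |E_v|/|W| <= i/n; density 0 goes to bin 1 by convention."""
    bins = {i: [] for i in range(1, n + 1)}
    for v in V:
        slice_size = sum(1 for w in W if (v, w) in E)
        # Integer ceiling of slice_size * n / |W|, avoiding floating point rounding.
        i = max(1, (slice_size * n + len(W) - 1) // len(W))
        bins[i].append(v)
    return bins

def riemann_sum(bins, V, n):
    """Sum over i of (i/n) * |V_i| / |V|."""
    return sum((i / n) * len(members) / len(V) for i, members in bins.items())

random.seed(0)
V = list(range(30))
W = list(range(40))
E = {(v, w) for v in V for w in W if random.random() < 0.5}
n = 10
bins = slice_density_bins(V, W, E, n)
density = len(E) / (len(V) * len(W))
gap = riemann_sum(bins, V, n) - density  # lies between 0 and 1/n
```

Each slice density is overestimated by at most $1/n$ by its bin label, so `gap` lies in $[0, 1/n]$; this is the finite shadow of the $O(1/n)$ error in the proof.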
Remark 5.2.4. It is well known that the product of two Lebesgue measure spaces $\mathbf{R}^n$, $\mathbf{R}^m$, upon completion, becomes the Lebesgue measure space on $\mathbf{R}^{n+m}$. Drawing the analogy between Loeb measure and Lebesgue measure, it is then natural to ask whether (5.3) is simply the completion of (5.4). But while (5.3) certainly contains the completion of (5.4), it is a significantly larger space in general. Indeed, suppose $V = \prod_{n \to p} V_n$, $W = \prod_{n \to p} W_n$, where the cardinality of $V_n, W_n$ goes to infinity at some reasonable rate, e.g. $|V_n|, |W_n| \ge n$ for all $n$. For each $n$, let $E_n$ be a random subset of $V_n \times W_n$, with each element of $V_n \times W_n$ having an independent probability of $1/2$ of lying in $E_n$. Then, as is well known, the sequence of sets $E_n$ is almost surely asymptotically regular in the sense that almost surely, we have the bound
sup
An Vn ,Bn Wn
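Then for any standard $\varepsilon > 0$, there exist internal subsets $F_{ij} \subset V \times V$ for $ij = 12, 23, 31$ with $\mu_{V \times V}(E_{ij} \setminus F_{ij}) < \varepsilon$, which are completely triangle-free in the sense that
$$1_{F_{12}}(u,v) 1_{F_{23}}(v,w) 1_{F_{31}}(w,u) = 0 \tag{5.11}$$
for all $u, v, w \in V$.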
Let us first see why Lemma 5.2.5 implies Lemma 5.2.1. We use the usual compactness and contradiction argument. Suppose for contradiction that Lemma 5.2.1 failed. Carefully negating the quantifiers, we can find a (standard) $\varepsilon > 0$, and a sequence $G_n = (V_n, E_n)$ of graphs with $t(G_n) \le 1/n$, such that for each $n$, there does not exist a subgraph $G'_n = (V_n, E'_n)$ of $G_n$ with $|E_n \setminus E'_n| \le \varepsilon |V_n|^2$ and $t(G'_n) = 0$. Clearly we may assume the $V_n$ are non-empty.
We form the ultraproduct $G = (V,E)$ of the $G_n$, thus $V = \prod_{n \to p} V_n$ and $E = \prod_{n \to p} E_n$. By construction, $E$ is a symmetric internal subset of $V \times V$ and we have
$$\int_{V \times V \times V} 1_E(u,v) 1_E(v,w) 1_E(w,u)\,d\mu_{V \times V \times V}(u,v,w) = \mathrm{st} \lim_{n \to p} t(G_n) = 0.$$
Thus, by Lemma 5.2.5, we may find internal subsets $F_{12}, F_{23}, F_{31}$ of $V \times V$ with $\mu_{V \times V}(E \setminus F_{ij}) < \varepsilon/6$ (say) for $ij = 12, 23, 31$ such that (5.11) holds for all $u, v, w \in V$. By letting $E'$ be the intersection of $E$ with all the $F_{ij}$ and their reflections, we see that $E'$ is a symmetric internal subset of $E$ with $\mu_{V \times V}(E \setminus E') < \varepsilon$, and we still have
$$1_{E'}(u,v) 1_{E'}(v,w) 1_{E'}(w,u) = 0$$
for all $u, v, w \in V$. If we write $E' = \lim_{n \to p} E'_n$ for some sets $E'_n$, then for $n$ sufficiently close to $p$, one has $E'_n$ a symmetric subset of $E_n$ with
$$\mu_{V_n \times V_n}(E_n \setminus E'_n) < \varepsilon$$
and
$$1_{E'_n}(u,v) 1_{E'_n}(v,w) 1_{E'_n}(w,u) = 0.$$
If we then set $G'_n := (V_n, E'_n)$, we thus have $|E_n \setminus E'_n| \le \varepsilon |V_n|^2$ and $t(G'_n) = 0$, which contradicts the construction of $G_n$ by taking $n$ sufficiently large.
Now we prove Lemma 5.2.5. The idea (similar to that used to prove the Furstenberg recurrence theorem, as discussed for instance in [Ta2009, 2.15]) is to first prove the lemma for very simple examples of sets $E_{ij}$, and then work one's way towards the general case. Readers who are familiar with the traditional proof of the triangle removal lemma using the regularity lemma will see strong similarities between that argument and the one given here (and, on some level, they are essentially the same argument).
To begin with, we suppose first that the $E_{ij}$ are all elementary sets, in the sense that they are finite boolean combinations of products of internal sets. (At the finitary level, this corresponds to graphs that are bounded combinations of bipartite graphs.) This implies that there is an internal partition $V = V_1 \cup \dots \cup V_n$ of the vertex set $V$, such that each $E_{ij}$ is the union of some of the $V_a \times V_b$.
Let $F_{ij}$ be the union of all the $V_a \times V_b$ in $E_{ij}$ for which $V_a$ and $V_b$ have positive Loeb measure; then $\mu_{V \times V}(E_{ij} \setminus F_{ij}) = 0$. We claim that (5.11) holds for all $u, v, w \in V$, which gives Theorem 5.2.5 in this case. Indeed, if $u \in V_a$, $v \in V_b$, $w \in V_c$ were such that (5.11) failed, then $E_{12}$ would contain $V_a \times V_b$, $E_{23}$ would contain $V_b \times V_c$, and $E_{31}$ would contain $V_c \times V_a$. The integrand in (5.10) is then equal to 1 on $V_a \times V_b \times V_c$, which has Loeb measure $\mu_V(V_a) \mu_V(V_b) \mu_V(V_c)$ which is non-zero, contradicting (5.10). This gives Theorem 5.2.5 in the elementary set case.
Next, we increase the level of generality by assuming that the $E_{ij}$ are all $\mathcal{L}_V \times \mathcal{L}_V$-measurable. (The finitary equivalent of this is a little difficult to pin down; roughly speaking, it is dealing with graphs that are not quite bounded combinations of bounded graphs, but can be well approximated by such bounded combinations; a good example is the half-graph, which is a bipartite graph between two copies of $\{1, \dots, N\}$, which joins an edge between the first copy of $i$ and the second copy of $j$ iff $i < j$.) Then each $E_{ij}$ can be approximated to within an error of $\varepsilon/3$ in $\mu_{V \times V}$ by elementary sets. In particular, we can find a finite partition $V = V_1 \cup \dots \cup V_n$ of $V$, and sets $E'_{ij}$ that are unions of some of the $V_a \times V_b$, such that
$$\mu_{V \times V}(E_{ij} \Delta E'_{ij}) < \varepsilon/3.$$
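Let $F_{ij}$ be the union of all the $V_a \times V_b$ contained in $E'_{ij}$ such that $V_a$, $V_b$ have positive Loeb measure, and such that
$$\mu_{V \times V}(E_{ij} \cap (V_a \times V_b)) > \frac{2}{3} \mu_{V \times V}(V_a \times V_b).$$
Then the $F_{ij}$ are internal subsets of $V \times V$, and $\mu_{V \times V}(E_{ij} \setminus F_{ij}) < \varepsilon$.
We now claim that the $F_{ij}$ obey (5.11) for all $u, v, w$, which gives Theorem 5.2.5 in this case. Indeed, if $u \in V_a$, $v \in V_b$, $w \in V_c$ were such that (5.11) failed, then $E_{12}$ occupies more than $\frac{2}{3}$ of $V_a \times V_b$, and thus
$$\int_{V_a \times V_b \times V_c} 1_{E_{12}}(u,v)\,d\mu_{V \times V \times V}(u,v,w) > \frac{2}{3} \mu_{V \times V \times V}(V_a \times V_b \times V_c).$$
Similarly for $1_{E_{23}}(v,w)$ and $1_{E_{31}}(w,u)$. From the inclusion-exclusion formula, we conclude that
$$\int_{V_a \times V_b \times V_c} 1_{E_{12}}(u,v) 1_{E_{23}}(v,w) 1_{E_{31}}(w,u)\,d\mu_{V \times V \times V}(u,v,w) > 0,$$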
Also, we have
$$\mu_{V \times V}(E_{ij} \setminus E'_{ij}) = \int_{V \times V} 1_{E_{ij}} (1 - 1_{E'_{ij}}) = \int_{V \times V} f_{ij} (1 - 1_{E'_{ij}}) \le \varepsilon/2.$$
Applying the already established cases of Theorem 5.2.5, we can find internal sets $F_{ij}$ obeying (5.11) with $\mu_{V \times V}(E'_{ij} \setminus F_{ij}) < \varepsilon/2$, and hence $\mu_{V \times V}(E_{ij} \setminus F_{ij}) < \varepsilon$, and Theorem 5.2.5 follows.
Remark 5.2.6. The full hypergraph removal lemma can be proven using
similar techniques, but with a longer tower of generalisations than the three
cases given here; see [Ta2007] or [ElSz2012].
Chapter 6
Partial differential
equations
$$\hat f(\xi) := \int_{\mathbf{R}^d} f(x) e^{-2\pi i x \cdot \xi}\,dx.$$
Indeed, we have
$$\widehat{-\Delta f}(\xi) = 4\pi^2 |\xi|^2 \hat f(\xi)$$
for any suitably nice function $f$ (e.g. in the Schwartz class; alternatively, one can work in very rough classes, such as the space of tempered distributions, provided of course that one is willing to interpret all operators in a distributional or weak sense).
Because of this explicit diagonalisation, it is a straightforward matter to define spectral multipliers $m(-\Delta)$ of the Laplacian for any (measurable, polynomial growth) function $m : [0,+\infty) \to \mathbf{C}$, by the formula
$$\widehat{m(-\Delta) f}(\xi) := m(4\pi^2 |\xi|^2) \hat f(\xi).$$
(The presence of the minus sign in front of the Laplacian has some minor technical advantages, as it makes $-\Delta$ positive semi-definite. One can also define spectral multipliers more abstractly from general functional calculus, after establishing that the Laplacian is essentially self-adjoint.) Many of these multipliers are of importance in PDE and analysis, such as the fractional derivative operators $(-\Delta)^{s/2}$, the heat propagators $e^{t\Delta}$, the (free) Schrödinger propagators $e^{it\Delta}$, the wave propagators $e^{\pm it\sqrt{-\Delta}}$ (or $\cos(t\sqrt{-\Delta})$ and $\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}$, depending on one's conventions), the spectral projections
(6.3)
for any $x > 0$, so one can also write resolvents in terms of wave propagators:
$$R(k^2) = \frac{i}{k} \int_0^\infty \cos(t\sqrt{-\Delta}) e^{ikt}\,dt.$$
(4) Using the Cauchy integral formula, one can express (sufficiently holomorphic) multipliers in terms of resolvents (or limits of resolvents). For instance, if $t > 0$, then from the Cauchy integral formula (and Jordan's lemma) one has
Z
1
eity
itx
e =
lim
dy
2i 0+ R y x + i
(6.4)
(6.5)
which leads (again formally) to the ability to express arbitrary multipliers in terms of imaginary (or skew-adjoint) parts of resolvents:
$$m(-\Delta) = \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \int_{\mathbf{R}} (\mathrm{Im}\, R(y + i\varepsilon)) m(y)\,dy.$$
Among other things, this type of formula (with $-\Delta$ replaced by a more general self-adjoint operator) is used in the resolvent-based approach to the spectral theorem (by using the limiting imaginary part of resolvents to build spectral measure). Note that one can also express $\mathrm{Im}\, R(y + i\varepsilon)$ as $\frac{1}{2i}(R(y + i\varepsilon) - R(y - i\varepsilon))$.
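At the level of a single spectral value $x$, the formula expressing a multiplier through imaginary parts of resolvents is the statement that the Poisson kernel $\frac{1}{\pi} \frac{\varepsilon}{(x-y)^2 + \varepsilon^2}$ is an approximation to the identity. The sketch below checks this numerically in a scalar model. It assumes the convention $R(z) = (-\Delta - z)^{-1}$, so that the resolvent acts as $1/(x - z)$ at spectral value $x$; the Gaussian test multiplier and the quadrature parameters are likewise assumptions of this illustration.

```python
import math

def resolvent(x, z):
    """Scalar model of the resolvent: the multiplier 1/(x - z) at spectral value x."""
    return 1.0 / (x - z)

def recover_multiplier(m, x, eps, lo=-10.0, hi=10.0):
    """Midpoint-rule approximation of (1/pi) * integral of Im R(y + i eps) m(y) dy."""
    h = eps / 10.0                      # grid fine enough to resolve the kernel width eps
    steps = int(round((hi - lo) / h))
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        total += resolvent(x, complex(y, eps)).imag * m(y)
    return total * h / math.pi

gaussian = lambda y: math.exp(-y * y)   # hypothetical test multiplier
x0 = 0.3
coarse_error = abs(recover_multiplier(gaussian, x0, 0.1) - gaussian(x0))
fine_error = abs(recover_multiplier(gaussian, x0, 0.01) - gaussian(x0))
```

The error shrinks roughly in proportion to $\varepsilon$, in line with the limit $\varepsilon \to 0^+$ in the formula.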
the Stone-Weierstrass theorem does not directly apply). Indeed, observe the
*-algebra type identities
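$$e^{s\Delta} e^{t\Delta} = e^{(s+t)\Delta}; \quad (e^{s\Delta})^* = e^{s\Delta};$$
$$e^{is\Delta} e^{it\Delta} = e^{i(s+t)\Delta}; \quad (e^{is\Delta})^* = e^{-is\Delta};$$
$$e^{is\sqrt{-\Delta}} e^{it\sqrt{-\Delta}} = e^{i(s+t)\sqrt{-\Delta}}; \quad (e^{is\sqrt{-\Delta}})^* = e^{-is\sqrt{-\Delta}};$$
$$R(z) R(w) = \frac{R(w) - R(z)}{w - z}; \quad R(z)^* = R(\bar z).$$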
eik|xy|
4|xy|
by
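$$\lim_{\varepsilon \to 0^+} R(\lambda + i\varepsilon) f(x) = \int_{\mathbf{R}^3} \frac{e^{i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\,dy$$
and
$$\lim_{\varepsilon \to 0^+} R(\lambda - i\varepsilon) f(x) = \int_{\mathbf{R}^3} \frac{e^{-i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\,dy.$$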
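$$u_\pm(x) := \int_{\mathbf{R}^3} \frac{e^{\pm i\sqrt{\lambda}|x-y|}}{4\pi|x-y|} f(y)\,dy$$
$$(-\Delta - \lambda) u = f,$$
$\int_{\mathbf{R}^3} f(y)\,dy = A$, then
$$u_\pm(x) = \frac{A e^{\pm i\sqrt{\lambda}|x|}}{4\pi|x|} + O\left(\frac{1}{|x|^2}\right) \tag{6.7}$$
$$u_\pm(x) = O\left(\frac{1}{|x|}\right); \quad (\partial_r \mp i\sqrt{\lambda}) u_\pm(x) = O\left(\frac{1}{|x|^2}\right) \tag{6.8}$$
where $\partial_r := \frac{x}{|x|} \cdot \nabla_x$ is the outgoing radial derivative. Indeed, one can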
show using an integration by parts argument that $u_\pm$ is the unique solution of the Helmholtz equation (6.6) obeying (6.8) (see below). $u_+$ is known as the outward radiating solution of the Helmholtz equation (6.6), and $u_-$ is known as the inward radiating solution. Indeed, if one views the function $u_\pm(t,x) := e^{-i\lambda t} u_\pm(x)$ as a solution to the inhomogeneous Schrödinger equation
$$(i\partial_t + \Delta) u_\pm = -e^{-i\lambda t} f$$
and uses the de Broglie law that a solution to such an equation with wave number $k \in \mathbf{R}^3$ (i.e. resembling $A e^{ik \cdot x}$ for some amplitude $A$) should propagate at (group) velocity $2k$, we see (heuristically, at least) that the outward radiating solution will indeed propagate radially away from the origin at speed $2\sqrt{\lambda}$, while the inward radiating solution propagates inward at the same speed.
(6.9)
are as smooth as needed. (In practice, the elliptic nature of the Laplacian ensures that issues of regularity are easily dealt with.) If uniqueness fails, then by subtracting the two solutions, we obtain a non-trivial solution $u$ to the homogeneous Helmholtz equation
$$(-\Delta - \lambda) u = 0 \tag{6.10}$$
such that
$$u(x) = O\left(\frac{1}{|x|}\right); \quad (\partial_r \mp i\sqrt{\lambda}) u(x) = O\left(\frac{1}{|x|^2}\right).$$
S2
and
$$\int_{S^2} |\partial_r u(r\omega)|^2\,d\omega = O(r^{-3}) \tag{6.12}$$
as $r \to \infty$.
Now we use the positive commutator method. Consider the expression
$$\int_{\mathbf{R}^3} [\partial_r, \Delta] u(x) \overline{u(x)}\,dx. \tag{6.13}$$
(To be completely rigorous, one should insert a cutoff to a large ball, and then send the radius of that ball to infinity, in order to make the integral well-defined, but we will ignore this technicality here.) On the one hand, we may integrate by parts (using (6.11), (6.12) to show that all boundary terms
go to zero) and (6.10) to see that this expression vanishes. On the other
hand, by expanding the Laplacian in polar coordinates we see that
[ , r ] =
2
2
r 3 .
2
r
r
2 r
r
3
R
and
Z
R3
2
u(x)u(x) dx = 2
r3
Z
R3
|ang |u(x)|2
dx
|x|
where $|\nabla_{ang} u(x)|^2 := |\nabla u(x)|^2 - |\partial_r u(x)|^2$ is the angular part of the kinetic energy density $|\nabla u(x)|^2$. We obtain (a degenerate case of) the Pohozaev-Morawetz identity
$$8\pi |u(0)|^2 + 2 \int_{\mathbf{R}^3} \frac{|\nabla_{ang} u(x)|^2}{|x|}\,dx = 0$$
which implies in particular that $u$ vanishes at the origin. Translating $u$ around (noting that this does not affect either the Helmholtz equation or the Sommerfeld radiation condition) we see that $u$ vanishes completely. (Alternatively, one can replace $\partial_r$ by the smoothed out multiplier $\frac{x}{\langle x \rangle} \cdot \nabla_x$, in which case the Pohozaev-Morawetz identity acquires a term of the form $\int_{\mathbf{R}^3} \frac{|u(x)|^2}{\langle x \rangle^5}\,dx$ which is enough to directly ensure that $u$ vanishes.)
6.1.2. Proof of the limiting absorption principle. We now sketch a proof of the limiting absorption principle, also based on the positive commutator method. For notational simplicity we shall only consider the case when $\lambda$ is comparable to 1, though the method we give here also yields the general case after some more bookkeeping.
Let $\sigma > 0$ be a small exponent to be chosen later, and let $f$ be normalised to have $H^{0,1/2+\sigma}(\mathbf{R}^3)$ norm equal to 1. For sake of concreteness let us take the $+$ sign, so that we wish to bound $u := R(\lambda + i\varepsilon) f$. This $u$ obeys the Helmholtz equation
$$\Delta u + \lambda u = -f - i\varepsilon u. \tag{6.14}$$
For positive $\varepsilon$, we also see from the spectral theorem that $u$ lies in $L^2(\mathbf{R}^3)$; the bound here though depends on $\varepsilon$, so we can only use this $L^2(\mathbf{R}^3)$ regularity for qualitative purposes (and specifically, for ensuring that boundary terms at infinity from integration by parts vanish) rather than quantitatively.
R3
0+
1
ImhR( + i)f, f i
eity F (t)
we thus have
Z
Z
1
0
ei(tt ) F (t0 ) dt0 = lim
eity (Im R(y + i))F (y) dy.
+
0
R
R
Applying Plancherel's theorem and Fatou's lemma (and commuting the $L^2_t$ and $H^{0,-1/2-\sigma}_x$ norms), we can bound the LHS of (6.15) by
. k||(Im R(y + i))F (y)kL2 H 0,1/2 (RR3 )
y
The claim now follows from the limiting absorption principle (and elliptic
regularity).
Remark 6.1.7. The above estimate was proven by taking a Fourier transform in time, and then applying the limiting absorption principle, which was in turn proven by using the positive commutator method. An equivalent way to proceed is to establish the local smoothing estimate directly by the analogue of the positive commutator method for Schrödinger flows, namely the Morawetz multiplier method, in which one contracts the stress-energy tensor (or variants thereof) against well-chosen vector fields, and integrates by parts.
An analogous claim holds for solutions to the wave equation
$$-\partial_t^2 u + \Delta u = 0$$
with initial data $u(0) = u_0$, $\partial_t u(0) = u_1$, with the relevant estimate being that
$$\| |\nabla_{t,x} u| \|_{L^2_t H^{0,-1/2-\sigma}(\mathbf{R} \times \mathbf{R}^3)} \lesssim \|u_0\|_{H^1(\mathbf{R}^3)} + \|u_1\|_{L^2(\mathbf{R}^3)}.$$
As before, this estimate can also be proven directly using the Morawetz multiplier method.
6.1.5. The RAGE theorem. Another consequence of limiting absorption, closely related both to absolutely continuous spectrum and to local
smoothing, is the RAGE theorem (named after Ruelle [Ru1969], Amrein-Georgescu [AmGe1973], and Enss [En1978]), specialised to the free Schrödinger
equation:
Theorem 6.1.8 (RAGE for Schrödinger). If $f \in L^2(\mathbb{R}^3)$, and $K$ is a compact subset of $\mathbb{R}^3$, then $\|e^{it\Delta} f\|_{L^2(K)} \to 0$ as $t \to \pm\infty$.
Proof. By a density argument we may assume that $f$ lies in, say, $H^2(\mathbb{R}^3)$.
Then $e^{it\Delta} f$ is uniformly bounded in $H^2(\mathbb{R}^3)$, and is Lipschitz in time in the
$L^2(\mathbb{R}^3)$ (and hence $L^2(K)$) norm. On the other hand, from local smoothing
we know that $\int_T^{T+1} \|e^{it\Delta} f\|_{L^2(K)}\ dt$ goes to zero as $T \to \pm\infty$. Putting the
two facts together we obtain the claim.
Remark 6.1.9. One can also deduce this theorem from the fact that $-\Delta$
has purely absolutely continuous spectrum, using the abstract form of the
RAGE theorem due to the authors listed above (which can be thought of as
a Hilbert space-valued version of the Riemann-Lebesgue lemma).
There is also a similar RAGE theorem for the wave equation (with $L^2$
replaced by the energy space $H^1 \times L^2$) whose precise statement we omit
here.
6.1.6. The limiting amplitude principle. A close cousin to the limiting
absorption principle, which governs the limiting behaviour of the resolvent
as it approaches the spectrum, is the limiting amplitude principle, which
governs the asymptotic behaviour of a Schrödinger or wave equation with
oscillating forcing term. We give this principle for the Schrödinger equation
(the case for the wave equation is analogous):
Theorem 6.1.10 (Limiting amplitude principle). Let $f \in L^2(\mathbb{R}^3)$ be compactly supported, let $\mu > 0$, and let $u$ be a solution to the forced Schrödinger
equation $i u_t + \Delta u = e^{-i\mu t} f$ which lies in $L^2(\mathbb{R}^3)$ at time zero. Then for any
compact set $K$, $e^{i\mu T} u(T)$ converges in $L^2(K)$ as $T \to +\infty$ to $v$, the solution to
the Helmholtz equation $\Delta v + \mu v = f$ obeying the outgoing radiation condition
(6.7).
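Proof. (Sketch) By subtracting off the free solution $e^{it\Delta} u(0)$ (which decays
in $L^2(K)$ by the RAGE theorem), we may assume that $u(0) = 0$. From the
Duhamel formula we then have
$$u(T) = -i \int_0^T e^{i(T-t)\Delta} e^{-i\mu t} f\ dt$$
and thus
$$e^{i\mu T} u(T) = -i \int_0^T e^{it(\Delta+\mu)} f\ dt = -i \lim_{\varepsilon \to 0^+} \int_0^T e^{it(\Delta+\mu+i\varepsilon)} f\ dt.$$
From the limiting absorption principle, the integral $-i \int_0^\infty e^{it(\Delta+\mu+i\varepsilon)} f\ dt$
converges to $v$, and so it suffices to show that the expression
$$\lim_{\varepsilon \to 0^+} \int_T^\infty e^{it(\Delta+\mu+i\varepsilon)} f\ dt$$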
due to the incompressibility of water (and conservation of mass); the massive net pressure (or more precisely, spatial variations in this pressure) of a
very broad and deep wave of water forces the profile of the wave to move
horizontally at vast speeds.
As the tsunami approaches shore, the depth $b$ of course decreases, causing the tsunami to slow down, at a rate proportional to the square root
of the depth, as per (6.16). Unfortunately, wave shoaling then forces the
amplitude $A$ to increase at an inverse rate governed by Green's law,
$$A \propto \frac{1}{b^{1/4}} \tag{6.17}$$
at least until the amplitude becomes comparable to the water depth (at
which point the assumptions that underlie the above approximate results
break down; also, in two (horizontal) spatial dimensions there will be some
decay of amplitude as the tsunami spreads outwards). If one starts with
a tsunami whose initial amplitude was $A_0$ at depth $b_0$ and computes the
point at which the amplitude $A$ and depth $b$ become comparable using the
proportionality relationship (6.17), some high school algebra then reveals
that at this point, the amplitude of the tsunami (and the depth of the water) is
about $A_0^{4/5} b_0^{1/5}$. Thus, for instance, a tsunami with initial amplitude of one
metre at a depth of 2 kilometres can end up with a final amplitude of about
5 metres near shore, while still traveling at about ten metres per second (35
kilometres per hour, or 22 miles per hour), which can lead to a devastating
impact when it hits shore.
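The "high school algebra" above can be spelled out in a few lines. The following is only a sketch: it assumes nothing beyond the proportionality (6.17), i.e. that $A b^{1/4}$ stays constant as the wave shoals, and the function name `crossover_amplitude` is ours rather than anything from the text.

```python
def crossover_amplitude(a0: float, b0: float) -> float:
    """Amplitude (= depth) at which A = b, assuming A * b**(1/4) is conserved.

    Setting A = b in A * b**(1/4) = a0 * b0**(1/4) gives
    b**(5/4) = a0 * b0**(1/4), hence b = a0**(4/5) * b0**(1/5).
    """
    return a0 ** 0.8 * b0 ** 0.2


if __name__ == "__main__":
    # A one-metre wave starting over two kilometres of water.
    print(f"{crossover_amplitude(1.0, 2000.0):.2f} metres")
```

For the one-metre wave at two kilometres depth this gives a value between 4.5 and 4.7 metres, in line with the "about 5 metres" quoted above.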
While tsunamis are far too massive of an event to be able to control
(at least in the deep ocean), we can at least model them mathematically,
allowing one to predict their impact at various places along the coast with
high accuracy. The full equations and numerical methods used to perform
such models are somewhat sophisticated, but by making a large number of
simplifying assumptions, it is relatively easy to come up with a rough model
that already predicts the basic features of tsunami propagation, such as the
velocity formula (6.16) and the amplitude proportionality law (6.17). I give
this (standard) derivation below. The argument will largely be heuristic in
nature; there are very interesting analytic issues in actually justifying many
of the steps below rigorously, but I will not discuss these matters here.
6.2.1. The shallow water wave equation. The ocean is, of course, a
three-dimensional fluid, but to simplify the analysis we will consider a two-dimensional model in which the only spatial variables are the horizontal
variable $x$ and the vertical variable $z$, with $z = 0$ being equilibrium sea
level. We model the ocean floor by a curve
$$z = -b(x),$$
thus $b$ measures the depth of the ocean at position $x$. At any time $t$ and
position $x$, the height of the water (compared to sea level $z = 0$) will be
given by an unknown height function $h(t, x)$; thus, at any time $t$, the ocean
occupies the region
$$\Omega_t := \{(x, z) : -b(x) < z < h(t, x)\}.$$
Now we model the motion of water inside the ocean by assigning at each
time $t$ and each point $(x, z) \in \Omega_t$ in the ocean, a velocity vector
$$\vec u(t, x, z) = (u_x(t, x, z), u_z(t, x, z)).$$
We make the basic assumption of incompressibility, so that the density $\rho$
of water is constant throughout $\Omega_t$.
The velocity changes over time according to Newton's second law $F = ma$. To apply this law to fluids, we consider an infinitesimal amount of
water as it flows along the velocity field $\vec u$. Thus, at time $t$, we assume that
this amount of water occupies some infinitesimal area $dA$ and some position
$\vec x(t) = (x(t), z(t))$, where we have
$$\frac{d}{dt} \vec x(t) = \vec u(t, \vec x(t)).$$
Because of incompressibility, the area $dA$ stays constant, and the mass of
this infinitesimal portion of water is $m = \rho\, dA$. There will be two forces on
this body of water; the force of gravity, which is $(0, -mg) = (0, -\rho g)\, dA$, and
the force of the pressure field $p(t, x, z)$, which is given by $-\nabla p\, dA$. At the
length and time scales of a tsunami, we can neglect the effect of other forces
such as viscosity or surface tension. Newton's law $m \frac{d\vec u}{dt} = F$ then gives
$$m \frac{d}{dt} \vec u(t, \vec x(t)) = -\nabla p\, dA + (0, -mg)$$
which simplifies to the incompressible Euler equation
$$\frac{\partial}{\partial t} \vec u + (\vec u \cdot \nabla) \vec u = -\frac{1}{\rho} \nabla p + (0, -g).$$
effect $-\frac{1}{\rho} \frac{\partial}{\partial z} p$ of pressure cancels out the effect $-g$ of gravity. We also assume
that the pressure is zero on the surface $z = h(t, x)$ of the water. Together,
these two assumptions force the pressure to be the hydrostatic pressure
$$p = \rho g (h(t, x) - z). \tag{6.18}$$
This reflects the intuitively plausible fact that the pressure at a point under
the ocean should be determined by the weight of the water above that point.
(6.19)
(6.20)
(This ansatz should be taken with a grain of salt, particularly when applied
to the $z$ component $u_z$ of the velocity, which does actually have to fluctuate
a little bit to accommodate changes in ocean depth and in the height function.
However, the primary component of the velocity is the horizontal component
$u_x$, and this does behave in a fairly vertically insensitive fashion in actual
tsunamis.)
Taking the $x$ component of (6.19), and abbreviating $u_x$ as $u$, we obtain
the first shallow water wave equation
$$\frac{\partial}{\partial t} u + u \frac{\partial}{\partial x} u = -g \frac{\partial}{\partial x} h. \tag{6.21}$$
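The next step is to play off the incompressibility of water against the
finite depth of the ocean. Consider an infinitesimal slice
$$\{(x, z) \in \Omega_t : x_0 \le x \le x_0 + dx\}$$
of the ocean at some time $t$ and position $x_0$. The total mass of this slice is
roughly
$$\rho (h(t, x_0) + b(x_0))\, dx$$
and so the rate of change of mass of this slice over time is
$$\rho \frac{\partial h}{\partial t}(t, x_0)\, dx.$$
On the other hand, the rate of mass entering this slice on the left $x = x_0$ is
$$\rho\, u(t, x_0)(h(t, x_0) + b(x_0))$$
and the rate of mass exiting the slice on the right $x = x_0 + dx$ is
$$\rho\, u(t, x_0 + dx)(h(t, x_0 + dx) + b(x_0 + dx)).$$
Putting these facts together, we obtain the equation
$$\rho \frac{\partial h}{\partial t}(t, x_0)\, dx = \rho\, u(t, x_0)(h(t, x_0) + b(x_0)) - \rho\, u(t, x_0 + dx)(h(t, x_0 + dx) + b(x_0 + dx))$$
which simplifies after Taylor expansion to the second shallow water wave
equation
$$\frac{\partial}{\partial t} h + \frac{\partial}{\partial x}(u(h + b)) = 0. \tag{6.22}$$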
Remark 6.2.1. Another way to derive (6.22) is to use a more familiar form
of the incompressibility, namely the divergence-free equation
$$\frac{\partial}{\partial x} u_x + \frac{\partial}{\partial z} u_z = 0. \tag{6.23}$$
(Here we will refrain from applying (6.20) to the vertical component of the
velocity $u_z$, as the approximation (6.20) is not particularly accurate for this
component.) Also, by considering the trajectory of a particle $(x(t), h(t, x(t)))$
at the surface of the ocean, we have the formulae
$$\frac{d}{dt} x(t) = u_x(x(t), h(t, x(t)))$$
and
$$\frac{d}{dt} h(t, x(t)) = u_z(x(t), h(t, x(t)))$$
which after application of the chain rule gives the equation
$$\frac{\partial}{\partial t} h(t, x) + (\frac{\partial}{\partial x} h(t, x))\, u_x(x, h(t, x)) = u_z(x, h(t, x)). \tag{6.24}$$
A similar analysis at the ocean floor (which does not vary in time) gives
$$-(\frac{\partial}{\partial x} b(x))\, u_x(x, -b(x)) = u_z(x, -b(x)).$$
Now consider the expression
$$\frac{\partial}{\partial x} \int_{-b(x)}^{h(t,x)} u_x(t, x, z)\ dz \tag{6.25}$$
which is the spatial rate of change of the velocity flux through a vertical
slice of the ocean. On the one hand, using the ansatz (6.20), we expect this
expression to be approximately
$$\frac{\partial}{\partial x}(u(h + b)).$$
On the other hand, by differentiation under the integral sign, we can evaluate
this expression instead as
$$\int_{-b(x)}^{h(t,x)} \frac{\partial}{\partial x} u_x(t, x, z)\ dz + (\frac{\partial}{\partial x} h(t, x))\, u_x(x, h(t, x)) + (\frac{\partial}{\partial x} b(x))\, u_x(x, -b(x)).$$
$$|h| \ll b. \tag{6.26}$$
This hypothesis is fairly accurate for tsunamis in the deep ocean, and even
for medium depths, but of course is not reasonable once the tsunami has
reached shore (where the dynamics are far more difficult to model).
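The hypothesis (6.26) already simplifies (6.22) to (approximately)
$$\frac{\partial}{\partial t} h + \frac{\partial}{\partial x}(ub) = 0. \tag{6.27}$$
As for (6.21), we argue that the second term on the left-hand side is negligible, leading to
$$\frac{\partial}{\partial t} u = -g \frac{\partial}{\partial x} h. \tag{6.28}$$
To explain heuristically why we expect this to be the case, let us make the
ansatz that $h$ and $u$ have amplitude $A$, $V$ respectively, and propagate at
some phase velocity $v$ and wavelength $\lambda$; let us also make the (reasonable)
assumption that $b$ varies much slower in space than $u$ does (i.e. that $b$ is
roughly constant at the scale of the wavelength $\lambda$), so we may (for a first
approximation) replace $\frac{\partial}{\partial x}(ub)$ by $b \frac{\partial}{\partial x} u$. Heuristically, we then have
$$\frac{\partial}{\partial x} u = O(V/\lambda), \quad \frac{\partial}{\partial x} h = O(A/\lambda), \quad \frac{\partial}{\partial t} u = O(vV/\lambda), \quad \frac{\partial}{\partial t} h = O(vA/\lambda)$$
and equation (6.27) then suggests
$$vA/\lambda \sim Vb/\lambda. \tag{6.29}$$
From (6.26) we expect $A$ to be much smaller than $b$, and so (6.29) suggests that $V \ll v$: the wave propagates
much faster than the velocity of the fluid. In particular, we expect $u \frac{\partial}{\partial x} u = O(V^2/\lambda)$ to be much smaller than $\frac{\partial}{\partial t} u = O(vV/\lambda)$, which explains why we
expect to drop the second term in (6.21) to obtain (6.28).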
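As a sanity check on the linearised system (6.27), (6.28), one can test numerically that a sinusoidal travelling wave solves it when the depth $b$ is constant. The sketch below assumes the standard shallow-water phase velocity $v = \sqrt{gb}$ (consistent with the square-root dependence on depth mentioned earlier) and the amplitude relation $V = Av/b$ suggested by (6.29); the value of `G`, the function name and the finite-difference step are our choices, not the text's.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)


def linearised_residuals(b, amp, wavelength, t, x, step=1e-3):
    """Residuals of h_t + b*u_x = 0 and u_t + g*h_x = 0 for a sinusoidal wave.

    The wave is h = amp*sin(k(x - v t)), u = V*sin(k(x - v t)) with
    v = sqrt(g*b) and V = amp*v/b.  Derivatives are approximated by
    central finite differences, so the residuals are small but not zero.
    """
    k = 2 * math.pi / wavelength
    v = math.sqrt(G * b)
    big_v = amp * v / b

    def h(t_, x_):
        return amp * math.sin(k * (x_ - v * t_))

    def u(t_, x_):
        return big_v * math.sin(k * (x_ - v * t_))

    def d_dt(f):
        return (f(t + step, x) - f(t - step, x)) / (2 * step)

    def d_dx(f):
        return (f(t, x + step) - f(t, x - step)) / (2 * step)

    return d_dt(h) + b * d_dx(u), d_dt(u) + G * d_dx(h)


if __name__ == "__main__":
    # Two kilometres deep, one metre high, 100 km wavelength.
    print(linearised_residuals(2000.0, 1.0, 1e5, t=30.0, x=1.2e4))
```

Both residuals come out many orders of magnitude smaller than the individual terms, which is what one expects if the ansatz is consistent with the two linearised equations.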
$$\phi_t = -v \phi_x. \tag{6.30}$$
As for the Hamilton-Jacobi equation, we solve it using the method of characteristics. Multiplying the equation by $A$, we obtain
$$(A^2 \phi_t)_t - v^2 (A^2 \phi_x)_x - 2 v v_x A^2 \phi_x = 0.$$
Inserting (6.30) and writing $F := A^2 \phi_x$, one obtains
$$-v F_t - v^2 F_x - 2 v v_x F = 0$$
which simplifies to
$$(\partial_t + v \partial_x)(v^2 F) = 0.$$
Thus we see that $v^2 F$ is constant along characteristics. On the other hand,
differentiating (6.30) in $x$ we see (after some rearranging) that
$$(\partial_t + v \partial_x)(v \phi_x) = 0$$
so $v \phi_x$ is also constant along characteristics. Dividing, we see that $A^2 v$ is
constant along characteristics, leading to the proportionality relationship
$$A \propto \frac{1}{\sqrt{v}}$$
which gives (6.17).
Remark 6.2.3. It becomes difficult to retain the sinusoidal ansatz once the
amplitude exceeds the depth, as it leads to the absurd conclusion that the
troughs of the wave lie below the ocean floor. However, a remnant of this
effect can actually be seen in real-life tsunamis, namely that if the tsunami
starts with a trough rather than a crest, then the water at the shore draws
back at first (sometimes for hundreds of metres), before the crest of the
tsunami hits. As such, the sudden withdrawal of water from a shore is an
important warning sign of an imminent tsunami.
Chapter 7
Number theory
$\frac{\log 2}{\log 3}$ is transcendental.
$$3^p - 2^q = 3^p (1 - 3^{q(\frac{\log 2}{\log 3} - \frac{p}{q})}) \tag{7.1}$$
to establish this fact other than essentially going through some version of
Baker's argument (which will be given below).
For comparison, by exploiting the trivial (yet fundamental) integrality
gap - the obvious fact that if an integer $n$ is non-zero, then its magnitude is
at least 1 - we have the trivial bound
$$|3^p - 2^q| \ge 1$$
for all positive integers $p, q$ (since, from the fundamental theorem of arithmetic, $3^p - 2^q$ cannot vanish). Putting this into (7.1) we obtain a very weak
version of Proposition 7.1.3, that only gives exponential bounds instead of
polynomial ones:
Proposition 7.1.5 (Trivial bound). For any integers $p, q$ with $q$ positive,
one has
$$|\frac{\log 2}{\log 3} - \frac{p}{q}| \ge c \frac{1}{2^q}$$
for some absolute (and effectively computable) constant $c > 0$.
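To get a feel for how weak the trivial bound is, one can tabulate the rescaled gap $2^q |\frac{\log 2}{\log 3} - \frac{p}{q}|$ for small $q$, taking $p$ to be the nearest integer to $q \frac{\log 2}{\log 3}$ (which minimises the gap over $p$). This is only a floating-point sketch for small $q$; the names `ALPHA` and `scaled_gap` are ours.

```python
import math

ALPHA = math.log(2) / math.log(3)


def scaled_gap(q: int) -> float:
    """Return 2**q * |log 2/log 3 - p/q| for the integer p nearest to q*log 2/log 3."""
    p = round(q * ALPHA)
    return 2 ** q * abs(ALPHA - p / q)


if __name__ == "__main__":
    for q in range(1, 21):
        print(q, round(q * ALPHA), f"{scaled_gap(q):.3g}")
```

In the range $1 \le q \le 30$ the rescaled gap is smallest at $q = 3$ (where $p/q = 2/3$) and grows rapidly afterwards, illustrating how far the true approximation quality is from the exponential bound.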
The proof of Baker's theorem (or even of the simpler special case in
Proposition 7.1.3) is largely elementary (except for some very basic complex
analysis), but is quite intricate and lengthy, as a lot of careful book-keeping
is necessary in order to get a bound as strong as that in Proposition 7.1.3. To
illustrate the main ideas, I will prove a bound that is weaker than Proposition
7.1.3, but still significantly stronger than Proposition 7.1.5, and whose proof
already captures many of the key ideas of Baker:
Proposition 7.1.6 (Weak special case of Baker's theorem). For any integers $p, q$ with $q > 1$, one has
$$|\frac{\log 2}{\log 3} - \frac{p}{q}| \ge \exp(-C \log^{C'} q)$$
for some absolute constants $C, C' > 0$.
Note that Proposition 7.1.3 is equivalent to the assertion that one can
take $C' = 1$ (and $C$ effective) in the above proposition.
The proof of Proposition 7.1.6 can be made effective (for instance, it is
not too difficult to make the $C'$ close to 2); however, in order to simplify the
exposition (and in particular, to be able to use some nonstandard analysis
terminology to reduce the epsilon management, cf. [Ta2008, §1.5]), I will
establish Proposition 7.1.6 with ineffective constants $C, C'$.
Like many other results in transcendence theory, the proof of Baker's
theorem (and Proposition 7.1.6) relies on what we would nowadays call the
polynomial method - to play off upper and lower bounds on the complexity
of polynomials that vanish (or nearly vanish) to high order on a specified
set of points. In the specific case of Proposition 7.1.6, the points in question
are of the form
$$\Gamma_N := \{(2^n, 3^n) : n = 1, \ldots, N\} \subset \mathbb{R}^2$$
for some large integer $N$. On the one hand, the irrationality of $\frac{\log 2}{\log 3}$ ensures
that the curve
$$\gamma := \{(2^t, 3^t) : t \in \mathbb{R}\}$$
is not algebraic, and so it is difficult for a polynomial $P$ of controlled complexity² to vanish (or nearly vanish) to high order at all the points of $\Gamma_N$; the
trivial bound in Proposition 7.1.5 allows one to make this statement more
precise. On the other hand, if Proposition 7.1.6 failed, then $\frac{\log 2}{\log 3}$ is close to
a rational, which by Taylor expansion makes $\gamma$ close to an algebraic curve
over the rationals (up to some rescaling by factors such as $\log 2$ and $\log 3$)
at each point of $\Gamma_N$. This, together with a pigeonholing argument, allows
one to find a polynomial $P$ of reasonably controlled complexity to (nearly)
vanish to high order at every point of $\Gamma_N$.
These observations, by themselves, are not sufficient to get beyond the
trivial bound in Proposition 7.1.5. However, Baker's key insight was to
exploit the integrality gap to bootstrap the (near) vanishing of $P$ on a set
$\Gamma_N$ to imply near-vanishing of $P$ on a larger set $\Gamma_{N'}$ with $N' > N$. The point
is that if a polynomial $P$ of controlled degree and size (nearly) vanishes to
higher order on a lot of points on an analytic curve such as $\gamma$, then it will
also be fairly small on many other points in $\gamma$ as well. (To quantify this
statement efficiently, it is convenient to use the tools of complex analysis,
which are particularly well suited to understand zeroes (or small values) of
polynomials.) But then, thanks to the integrality gap (and the controlled
complexity of $P$), we can amplify "fairly small" to "very small".
Using this observation and an iteration argument, Baker was able to take
a polynomial of controlled complexity $P$ that nearly vanished to high order
on a relatively small set $\Gamma_{N_0}$, and bootstrap that to show near-vanishing on
a much larger set $\Gamma_{N_k}$. This bootstrap allows one to dramatically bridge the
gap between the upper and lower bounds on the complexity of polynomials
that nearly vanish to a specified order on a given $\Gamma_N$, and eventually leads
to Proposition 7.1.6 (and, with much more care and effort, to Proposition
7.1.3).
Below, I give the details of this argument. My treatment here
is inspired by the exposé [Se1969], as well as the unpublished lecture notes
[So2010].
² Here, "complexity" of a polynomial is an informal term referring both to the degree of the
polynomial, and the height of the coefficients, which in our application will essentially be integers
up to some normalisation factors.
Let us quickly see why Proposition 7.1.7 implies Proposition 7.1.6 (the
converse is easy and is left to the reader). This is the usual compactness
and contradiction argument. Suppose for contradiction that Proposition
7.1.6 failed. Carefully negating the quantifiers, we may then find a sequence
$\frac{p_n}{q_n}$ of (standard) rationals with $q_n > 1$, such that
$$|\frac{\log 2}{\log 3} - \frac{p_n}{q_n}| \le \exp(-n \log^n q_n)$$
for all natural numbers $n$. As $\frac{\log 2}{\log 3}$ is irrational, $q_n$ must go to infinity.
Taking the ultralimit $\frac{p}{q}$ of the $\frac{p_n}{q_n}$, and setting $H$ to be (say) $q$, we contradict
Proposition 7.1.7.
Using the asymptotic calculus (and the hypotheses that $D, j$ are of polylogarithmic size, and the $c_{a,b}$ are of quasipolynomial size) we conclude that the
left-hand side of (7.4) is
$$(\frac{\log 3}{q})^j \sum_{0 \le a,b \le D} c_{a,b} (ap + bq)^j 2^{an} 3^{bn}. \tag{7.5}$$
thus by (7.3)
$$f(z) = \sum_{0 \le a,b \le D} c_{a,b} 2^{az} 3^{bz}.$$
By hypothesis, we have
$$f^{(j)}(n) \approx 0$$
for all $0 \le j \le 2J$ and $1 \le n \le N$. We wish to show that
$$f^{(j)}(n') \approx 0$$
for $0 \le j \le J$ and $1 \le n' \le N'$. Clearly we may assume that $N' \ge n' > N$.
Fix $0 \le j \le J$ and $1 \le n' \le N'$. To estimate $f^{(j)}(n')$, we consider the
contour integral
$$\frac{1}{2\pi i} \int_{|z|=R} \frac{f^{(j)}(z)}{\prod_{n=1}^N (z - n)^J} \frac{dz}{z - n'} \tag{7.7}$$
(oriented anticlockwise), where $R \ge 2N'$ is to be chosen later, and estimate
it in two different ways. Firstly, we have
$$f^{(j)}(z) = \sum_{0 \le a,b \le D} c_{a,b} (a \log 2 + b \log 3)^j 2^{az} 3^{bz},$$
so for |z| =
2N 0 ,
N
Y
(z n)J | (R/2)N J .
n=1
0
J
n=1 (n n)
Now we consider the poles at $n = 1, \ldots, N$. For each such $n$, we see that the
first $J$ derivatives of $f^{(j)}$ are quasiexponentially small at $n$. Thus, by Taylor
expansion (and asymptotic calculus), one can express $f^{(j)}(z)$ as the sum of
a polynomial of degree $J$ with quasiexponentially small coefficients, plus an
entire function that vanishes to order $J$ at $n$. The latter term contributes
nothing to the residue at $n$, while from the Cauchy integral formula (applied,
for instance, to a circle of radius 1/2 around $n$) and asymptotic calculus, we
see that the former term contributes a residue that is quasiexponentially small.
In particular, it is less than $\exp(O(NJ) - NJ \log R)$. We conclude that
$$|\frac{f^{(j)}(n')}{\prod_{n=1}^N (n' - n)^J}| \le \exp(O(NJ + DR) - NJ \log R).$$
We have
$$|\prod_{n=1}^N (n' - n)^J| \le (N')^{NJ}$$
and thus
$$|f^{(j)}(n')| \le \exp(O(NJ + DR) - NJ \log \frac{R}{N'});$$
choosing $R$ to be a large standard multiple of $N'$ and using the hypothesis
$N' = o(\frac{J}{D} N)$, we can simplify this to
Now we can finish the proof of Proposition 7.1.7 (and hence Proposition
7.1.6). We select quantities $D, J, N_0$ of polylogarithmic size obeying the
bounds
$$\log H \ll N_0 \ll D \ll J$$
and
$$N_0 J \ll D^2,$$
with a gap of a positive power of $\log H$ between each such inequality. For
instance, one could take
$$N_0 := \log^2 H, \quad D := \log^4 H, \quad J := \log^5 H;$$
many other choices are possible (and one can optimise these choices eventually to get a good value of exponent $C'$ in Proposition 7.1.6).
Using Proposition 7.1.11, we can find a good polynomial $P$ which vanishes to order $J$ on $\Gamma_{N_0}$, of height $\exp(O(\frac{N_0 J^2 \log H}{D^2} + \frac{N_0^2 J}{D}))$, and hence (by
the assumptions on $N_0, D, J$) of height $\exp(O(N_0 J))$.
Applying Proposition 7.1.12, $P$ nearly vanishes to order $J/2$ on $\Gamma_{N_1}$ for
any $N_1 = o(\frac{J}{D} N_0)$. Iterating this, an easy induction shows that for any
standard $k \ge 1$, $P$ nearly vanishes to order $J/2^k$ on $\Gamma_{N_k}$ for any $N_k = o((\frac{J}{D})^k N_0)$. As $J/D$ was chosen to be larger than a positive power of $\log H$,
we conclude that $P$ nearly vanishes to order at least 0 on $\Gamma_N$ for any $N$ of
polylogarithmic size. But for $N$ large enough, this contradicts Proposition
7.1.10.
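The bookkeeping behind the sample choice $N_0 = \log^2 H$, $D = \log^4 H$, $J = \log^5 H$ amounts to comparing exponents of $\log H$. The sketch below records each quantity only by its exponent (an assumption that ignores constants and lower-order factors) and checks the inequalities used in the argument; the function name `check_parameters` is ours.

```python
def check_parameters(n0: int, d: int, j: int, log_h: int = 1) -> dict:
    """Check the inequalities of the proof in terms of exponents of log H.

    n0, d, j are the exponents of log H in N0, D, J; log_h = 1 is the
    exponent of log H itself.  Products add exponents, quotients subtract.
    """
    return {
        "log H << N0 << D << J": log_h < n0 < d < j,
        "N0 J << D^2": n0 + j < 2 * d,
        "N0 J^2 log H / D^2 = O(N0 J)": n0 + 2 * j + log_h - 2 * d <= n0 + j,
        "N0^2 J / D = O(N0 J)": 2 * n0 + j - d <= n0 + j,
        "J / D is a positive power of log H": j - d > 0,
    }


if __name__ == "__main__":
    for name, ok in check_parameters(2, 4, 5).items():
        print(f"{name}: {ok}")
```

With exponents (2, 4, 5) every check passes; lowering the exponent of $D$ to 3, say, breaks the condition $N_0 J \ll D^2$.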
Remark 7.1.13. The above argument places a lower bound on quantities
such as
$$q \log 2 - p \log 3$$
for integer $p, q$. Baker's theorem, in its full generality, gives a lower bound
on quantities such as
$$\beta_0 + \beta_1 \log \alpha_1 + \ldots + \beta_n \log \alpha_n$$
for algebraic numbers $\beta_0, \ldots, \beta_n$, $\alpha_1, \ldots, \alpha_n$, which is polynomial in the
height of the quantities involved, assuming of course that $1, \alpha_1, \ldots, \alpha_n$ are
multiplicatively independent, and that all quantities are of bounded degree.
The proof is more intricate than the one given above, but follows a broadly
similar strategy, and the constants are completely effective.
Open questions with this level of notoriety can lead to what Richard
Lipton calls³ mathematical diseases. Nevertheless, it can still be diverting
to spend a day or two each year on these sorts of questions, before returning
to other matters; so I recently had a go at the problem. Needless to say,
I didn't solve the problem, but I have a better appreciation of why the
conjecture is (a) plausible, and (b) unlikely to be proven by current technology,
and I thought I would share what I had found out here.
Let me begin with some very well known facts. If $n$ is odd, then $f_0(n) = 3n + 1$ is even, and so $f_0^2(n) = \frac{3n+1}{2}$. Because of this, one could replace
$f_0$ by the function $f_1 : \mathbb{N} \to \mathbb{N}$, defined by $f_1(n) = \frac{3n+1}{2}$ when $n$ is odd,
and $f_1(n) = n/2$ when $n$ is even, and obtain an equivalent conjecture. Now
we see that if one chooses $n$ "at random", in the sense that it is odd with
probability 1/2 and even with probability 1/2, then $f_1$ increases $n$ by a factor
of roughly 3/2 half the time, and decreases it by a factor of 1/2 half the time.
Furthermore, if $n$ is uniformly distributed modulo 4, one easily verifies that
$f_1(n)$ is uniformly distributed modulo 2, and so $f_1^2(n)$ should be roughly 3/2
times as large as $f_1(n)$ half the time, and roughly 1/2 times as large as $f_1(n)$
the other half of the time. Continuing this at a heuristic level, we expect
generically that $f_1^{k+1}(n) \approx \frac{3}{2} f_1^k(n)$ half the time, and $f_1^{k+1}(n) \approx \frac{1}{2} f_1^k(n)$ the
other half of the time. The logarithm $\log f_1^k(n)$ of this orbit can then be
modeled heuristically by a random walk with steps $\log \frac{3}{2}$ and $\log \frac{1}{2}$ occurring
with equal probability. The expectation
$$\frac{1}{2} \log \frac{3}{2} + \frac{1}{2} \log \frac{1}{2} = \frac{1}{2} \log \frac{3}{4}$$
is negative, and so (by the classic gambler's ruin) we expect the orbit to
decrease over the long term. This can be viewed as heuristic justification
of the Collatz conjecture, at least in the "average case" scenario in which
$n$ is chosen uniformly at random (e.g. in some large interval $\{1, \ldots, N\}$). (It
also suggests that if one modifies the problem, e.g. by replacing $3n + 1$
with $5n + 1$, then one can obtain orbits that tend to increase over time, and
indeed numerically for this variant one sees orbits that appear to escape to
infinity.) Unfortunately, one can only rigorously keep the orbit uniformly
distributed modulo 2 for time about $O(\log N)$ or so; after that, the system
is too complicated for naive methods to control at anything other than a
heuristic level.
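The random walk heuristic is easy to probe numerically. The sketch below implements the map $f_1$, and compares the average logarithmic growth per step along orbits of large random starting points with the predicted drift $\frac{1}{2} \log \frac{3}{4}$. The sample sizes, the seed and the helper names are our own choices, and the experiment is of course only suggestive.

```python
import math
import random


def f1(n: int) -> int:
    """The accelerated Collatz map: (3n+1)/2 for odd n, n/2 for even n."""
    return (3 * n + 1) // 2 if n % 2 else n // 2


def steps_to_one(n: int, max_steps: int = 10_000):
    """Number of applications of f1 needed to reach 1 (None if max_steps is hit)."""
    steps = 0
    while n != 1:
        if steps >= max_steps:
            return None
        n = f1(n)
        steps += 1
    return steps


def mean_log_step(n: int, k: int) -> float:
    """Average of log(f1^(i+1)(n) / f1^i(n)) over the first k steps of the orbit."""
    start = n
    for _ in range(k):
        n = f1(n)
    return (math.log(n) - math.log(start)) / k


# Drift predicted by the random walk model.
DRIFT = 0.5 * math.log(3 / 2) + 0.5 * math.log(1 / 2)

if __name__ == "__main__":
    random.seed(0)
    # 201-bit starting points cannot reach 1 within 100 steps.
    samples = [mean_log_step(random.randrange(2**200, 2**201), 100)
               for _ in range(1000)]
    print("predicted drift:", DRIFT)
    print("observed mean  :", sum(samples) / len(samples))
```

Starting points with about 200 binary digits are used so that the parities along the first 100 steps behave like independent fair coin flips, which is the regime in which the heuristic is expected to be accurate.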
Remark 7.2.2. One can obtain a rigorous analogue of the above arguments
by extending $f_1$ from the integers $\mathbb{Z}$ to the 2-adics $\mathbb{Z}_2$ (the inverse limit of
the cyclic groups $\mathbb{Z}/2^n\mathbb{Z}$). This compact abelian group comes with a Haar
probability measure, and one can verify that this measure is invariant with
respect to $f_1$; with a bit more effort one can verify that it is ergodic. This
³ See rjlipton.wordpress.com/2009/11/04/on-mathematical-diseases.
we see from induction that $2^{a_i + 1}$ divides $n_{i+1}$, and thus $a_{i+1} > a_i$:
$$0 = a_1 < a_2 < \ldots < a_k.$$
Since $f_2^k([n]) = [n]$, we have
$$2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + 3^{k-2} 2^{a_2} + \ldots + 2^{a_k} = 3 n_k + 2^{a_k}$$
for some integer $a_{k+1}$. Since $3 n_k + 2^{a_k}$ is divisible by $2^{a_k + 1}$, and $n$ is odd,
we conclude $a_{k+1} > a_k$; if we rearrange the above equation as (7.8), then we
obtain a counterexample to Conjecture 7.2.4.
Conversely, suppose that Conjecture 7.2.4 failed. Then we have $k \ge 1$,
integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
and a natural number $n > 1$ such that (7.8) holds. As $a_1 = 0$, we see that
the right-hand side of (7.8) is odd, so $n$ is odd also. If we then introduce
the natural numbers $n_i$ by the formula (7.10), then an easy induction using
(7.11) shows that
$$(2^{a_{k+1}} - 3^k) n_i = 3^{k-1} 2^{a_i} + 3^{k-2} 2^{a_{i+1}} + \ldots + 2^{a_{i+k-1}} \tag{7.12}$$
with the periodic convention $a_{k+j} := a_j + a_{k+1}$ for $j > 1$. As the $a_i$ are
increasing in $i$ (even for $i \ge k + 1$), we see that $2^{a_i}$ is the largest power of 2
that divides the right-hand side of (7.12); as $2^{a_{k+1}} - 3^k$ is odd, we conclude
that $2^{a_i}$ is also the largest power of 2 that divides $n_i$. We conclude that
$$f_2([n_i]) = [3 n_i + 2^{a_i}] = [n_{i+1}]$$
and thus $[n]$ is a periodic orbit of $f_2$. Since $n$ is an odd number larger than
1, this contradicts Conjecture 7.2.4.
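The identity $2^{a_{k+1}} n = 3^k n + 3^{k-1} 2^{a_1} + \ldots + 2^{a_k}$ appearing in this proof lends itself to a brute-force search. The sketch below enumerates small increasing tuples with $a_1 = 0$, rearranges the identity as $(2^{a_{k+1}} - 3^k) n = \sum_{i=1}^k 3^{k-i} 2^{a_i}$, and reports the tuples for which $n$ is a natural number. The search bounds and the function name `cycle_candidates` are ours; such a small search proves nothing, it merely illustrates the structure of the equation.

```python
from itertools import combinations


def cycle_candidates(max_k: int, max_a: int):
    """Yield (k, (a_1, ..., a_{k+1}), n) with 0 = a_1 < ... < a_{k+1} <= max_a
    and n a natural number solving
    (2**a_{k+1} - 3**k) * n = sum of 3**(k-i) * 2**a_i over i = 1..k."""
    for k in range(1, max_k + 1):
        for rest in combinations(range(1, max_a + 1), k):
            a = (0,) + rest                # a[0] is a_1 = 0, a[k] is a_{k+1}
            q = 2 ** a[k] - 3 ** k         # coefficient of n after rearranging
            if q <= 0:
                continue
            s = sum(3 ** (k - i) * 2 ** a[i - 1] for i in range(1, k + 1))
            if s % q == 0:
                yield k, a, s // q


if __name__ == "__main__":
    for k, a, n in cycle_candidates(4, 10):
        print(k, a, n)
```

In this range the only solutions found have $n = 1$ and correspond to the trivial cycle traversed several times, e.g. $k = 1$ with $(a_1, a_2) = (0, 2)$.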
Call a counterexample a tuple $(k, a_1, \ldots, a_{k+1})$ that contradicts Conjecture 7.2.4, i.e. an integer $k \ge 1$ and an increasing set of integers
$$0 = a_1 < a_2 < \ldots < a_{k+1}$$
such that (7.8) holds for some $n \ge 1$. We record a simple bound on such
counterexamples, due to Terras [Te1976] and Garner [Ga1981]:
Lemma 7.2.6 (Exponent bounds). Let $N \ge 1$, and suppose that the Collatz
conjecture is true for all $n < N$. Let $(k, a_1, \ldots, a_{k+1})$ be a counterexample.
Then
$$\frac{\log 3}{\log 2} k < a_{k+1} < \frac{\log(3 + \frac{1}{N})}{\log 2} k.$$
Proof. The first bound is immediate from the positivity of $2^{a_{k+1}} - 3^k$. To
prove the second bound, observe from the proof of Proposition 7.2.5 that
the counterexample $(k, a_1, \ldots, a_{k+1})$ will generate a counterexample to Conjecture 7.2.3, i.e. a non-trivial periodic orbit $n, f(n), \ldots, f^K(n) = n$. As the
conjecture is true for all $n < N$, all terms in this orbit must be at least $N$.
An inspection of the proof of Proposition 7.2.5 reveals that this orbit consists of $k$ steps of the form $x \mapsto 3x + 1$, and $a_{k+1}$ steps of the form $x \mapsto x/2$.
As all terms are at least $N$, the former steps can increase magnitude by a
multiplicative factor of at most $3 + \frac{1}{N}$. As the orbit returns to where it
started, we conclude that
$$1 \le (3 + \frac{1}{N})^k (\frac{1}{2})^{a_{k+1}}$$
whence the claim.
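Lemma 7.2.6 confines $a_{k+1}/k$ to a very narrow window above $\frac{\log 3}{\log 2}$, and this forces $k$ to be large. The sketch below finds, for a modest verification bound $N$, the smallest $k$ for which the window contains an integer at all. It relies on floating-point arithmetic, so it is only meant for small $N$ and $k$ (for the very large $N$ available in practice one would need exact or high-precision arithmetic); the function name is ours.

```python
import math


def smallest_admissible_k(n_verified: int, k_max: int = 10_000):
    """Smallest k (with a matching a) such that
    (log 3 / log 2) * k < a < (log(3 + 1/N) / log 2) * k,  N = n_verified.

    Returns (k, a), or None if no k <= k_max works.
    """
    lower = math.log2(3)
    upper = math.log2(3 + 1 / n_verified)
    for k in range(1, k_max + 1):
        a = math.floor(lower * k) + 1   # smallest integer above the lower bound
        if a < upper * k:
            return k, a
    return None


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, smallest_admissible_k(n))
```

For instance, with $N = 100$ the first admissible pair is $k = 17$, $a_{k+1} = 27$, reflecting the rational approximation $27/17$ to $\frac{\log 3}{\log 2}$.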
The Collatz conjecture has already been verified for many values⁴ of $n$.
Inserting this into the above lemma, one can get lower bounds on $k$. For
⁴ According to http://www.ieeta.pt/~tos/3x+1.html, the conjecture has been verified up to
at least $N = 5 \times 10^{18}$.
where, in view of Lemma 7.2.6, one should restrict the double summation
to the heuristic regime $a \approx \frac{\log 3}{\log 2} k$, with the approximation here accurate to
many decimal places.
We need a lower bound on $q$. Here, we will use Baker's theorem (as
discussed in Section 7.1), which among other things gives the lower bound
$$q = 2^a - 3^k \gg 2^a / a^C \tag{7.14}$$
In some very special cases, this can be done. For instance, suppose that
one had $a_{i+1} = a_i + 1$ with at most one exception (this is essentially what
is called a 1-cycle in [St1978]). Then (7.15) simplifies via the geometric
series formula to a combination of just a bounded number of powers of
2 and 3, rather than an unbounded number. In that case, one can start
using tools from transcendence theory such as Baker's theorem to obtain
good results; for instance, in [St1978], it was shown that 1-cycles cannot
actually occur, and similar methods were used in [Side2005] to show that $m$-cycles
(in which there are at most $m$ exceptions to $a_{i+1} = a_i + 1$) do not occur for
any $m \le 63$. However, for general increasing
tuples of integers $a_1, \ldots, a_k$, there is no such representation by bounded
numbers of powers, and it does not seem that methods from transcendence
theory will be sufficient to control the expressions (7.15) to the extent that
one can understand their divisibility properties by quantities such as $2^a - 3^k$.
Amusingly, there is a slight connection to Littlewood-Offord theory in
additive combinatorics - the study of the $2^n$ random sums
$$\pm v_1 \pm v_2 \pm \ldots \pm v_n$$
generated by some elements $v_1, \ldots, v_n$ of an additive group $G$, or equivalently, the vertices of an $n$-dimensional parallelepiped inside $G$. Here, the
relevant group is $\mathbb{Z}/q\mathbb{Z}$. The point is that if one fixes $k$ and $a_{k+1}$ (and hence
$q$), and lets $a_1, \ldots, a_k$ vary inside the simplex
$$\Delta := \{(a_1, \ldots, a_k) \in \mathbb{N}^k : 0 = a_1 < \ldots < a_k < a_{k+1}\}$$
then the set $S$ of all sums⁵ of the form (7.15) (viewed as an element of $\mathbb{Z}/q\mathbb{Z}$)
contains many large parallelepipeds. This is because the simplex $\Delta$ contains
many large cubes. Indeed, if one picks a typical element $(a_1, \ldots, a_k)$ of $\Delta$,
then one expects (thanks to Lemma 7.2.6) that there will be $\gg k$
indices $1 \le i_1 < \ldots < i_m \le k$ such that $a_{i_j + 1} > a_{i_j} + 1$ for $j = 1, \ldots, m$,
which allows one to adjust each of the $a_{i_j}$ independently by 1 if desired and
still remain inside $\Delta$. This gives a cube in $\Delta$ of dimension $\gg k$, which then
induces a parallelepiped of the same dimension in $S$. A short computation
shows that the generators of this parallelepiped consist of products of a
power of 2 and a power of 3, and in particular will be coprime to $q$.
If the weak Collatz conjecture is true, then the set $S$ must avoid the
residue class 0 in $\mathbb{Z}/q\mathbb{Z}$. Let us suppose temporarily that we did not know
about Baker's theorem (and the associated bound (7.14)), so that $q$ could
potentially be quite small. Then we would have a large parallelepiped inside
a small cyclic group $\mathbb{Z}/q\mathbb{Z}$ that did not cover all of $\mathbb{Z}/q\mathbb{Z}$, which would not
be possible for $q$ small enough. Indeed, an easy induction shows that a $d$-dimensional parallelepiped in $\mathbb{Z}/q\mathbb{Z}$, with all generators coprime to $q$, has
cardinality at least $\min(q, d + 1)$. This argument already shows the lower
bound $q \gg k$. In other words, we have
Proposition 7.2.7. Suppose the weak Collatz conjecture is true. Then for
any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg k$.
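The cardinality claim used above (a $d$-dimensional parallelepiped in $\mathbb{Z}/q\mathbb{Z}$ with generators coprime to $q$ has at least $\min(q, d+1)$ elements) can be checked exhaustively for small parameters. In the sketch below a parallelepiped is taken to be the set of all subset sums $\sum_i \epsilon_i v_i$ with $\epsilon_i \in \{0, 1\}$ (translating by a base point does not change the cardinality); the function names and the tested ranges are ours.

```python
import math
from itertools import product


def parallelepiped(generators, q):
    """All subset sums of the generators, reduced mod q."""
    return {sum(e * v for e, v in zip(eps, generators)) % q
            for eps in product((0, 1), repeat=len(generators))}


def check_bound(q: int, d: int) -> bool:
    """Check |P| >= min(q, d + 1) for every d-tuple of generators coprime to q."""
    units = [v for v in range(1, q) if math.gcd(v, q) == 1]
    return all(len(parallelepiped(gens, q)) >= min(q, d + 1)
               for gens in product(units, repeat=d))


if __name__ == "__main__":
    print(all(check_bound(q, d) for q in range(2, 12) for d in range(1, 4)))
```

The induction behind the claim is visible in the code: adding a generator $v$ coprime to $q$ to a set $P$ replaces it by $P \cup (P + v)$, which is strictly larger unless $P$ is already all of $\mathbb{Z}/q\mathbb{Z}$.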
This bound is very weak when compared against the unconditional
bound (7.14). However, I know of no way to get a nontrivial separation
property between powers of 2 and powers of 3 other than via transcendence
theory methods. Thus, this result strongly suggests that any proof of the
Collatz conjecture must either use existing results in transcendence theory,
or else must contribute a new method to give non-trivial results in transcendence theory. (This already rules out a lot of possible approaches to solve
the Collatz conjecture.)
By using more sophisticated tools in additive combinatorics, one can improve the above proposition (though it is still well short of the transcendence
theory bound (7.14)):
Proposition 7.2.8. Suppose the weak Collatz conjecture is true. Then for
any natural numbers $a, k$ with $2^a > 3^k$, one has $2^a - 3^k \gg (1 + \varepsilon)^k$ for some
absolute constant $\varepsilon > 0$.
Proof. (Informal sketch only) Suppose not, then we can find $a, k$ with $q := 2^a - 3^k$ of size $(1 + o(1))^k = \exp(o(k))$. We form the set $S$ as before, which
⁵ Note, incidentally, that once one fixes $k$, all the sums of the form (7.15) are distinct; because
given (7.15) and $k$, one can read off $2^{a_1}$ as the largest power of 2 that divides (7.15), and then
subtracting off $3^{k-1} 2^{a_1}$ one can then read off $2^{a_2}$, and so forth.
in the limit $x \to \infty$, where $n$ ranges over natural numbers less than $x$, and
$f : \mathbb{N} \to \mathbb{C}$ is some arithmetic function of number-theoretic interest. For
instance, the celebrated prime number theorem is equivalent to the assertion
$$\sum_{n \le x} \Lambda(n) = x + o(x)$$
where $\Lambda(n)$ is the von Mangoldt function (equal to $\log p$ when $n$ is a power
of a prime $p$, and zero otherwise), while the infamous Riemann hypothesis
is equivalent to the stronger assertion
$$\sum_{n \le x} \Lambda(n) = x + O(x^{1/2+o(1)}).$$
It is thus of interest to develop techniques to estimate such sums $\sum_{n \le x} f(n)$.
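As a concrete illustration of such a sum, the following sketch computes $\sum_{n \le x} \Lambda(n)$ with a simple sieve so that it can be compared with the main term $x$ from the prime number theorem. The function name `von_mangoldt_sum` is ours, and the implementation is a naive one intended only for small $x$.

```python
import math


def von_mangoldt_sum(x: int) -> float:
    """Sum of Lambda(n) over n <= x, where Lambda(p^j) = log p and Lambda = 0 otherwise."""
    is_prime = [True] * (x + 1)
    is_prime[0:2] = [False, False]
    total = 0.0
    for p in range(2, x + 1):
        if not is_prime[p]:
            continue
        for m in range(p * p, x + 1, p):
            is_prime[m] = False
        # Each prime power p^j <= x contributes log p.
        power = p
        while power <= x:
            total += math.log(p)
            power *= p
    return total


if __name__ == "__main__":
    for x in (10, 100, 1000, 10_000):
        print(x, von_mangoldt_sum(x))
```

A useful consistency check is that the exponential of this sum is the least common multiple of $1, \ldots, x$; for $x = 10$ this is $2520$.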
Of course, the difficulty of this task depends on how nice the function f
is. The functions f that come up in number theory lie on a broad spectrum
of niceness, with some particularly nice functions being quite easy to sum,
and some being insanely difficult.
At the easiest end of the spectrum are those functions $f$ that exhibit
some sort of regularity or smoothness. Examples of smoothness include
Archimedean smoothness, in which $f(n)$ is the restriction of some smooth
function $f : \mathbb{R} \to \mathbb{C}$ from the reals to the natural numbers, and the derivatives of $f$ are well controlled. A typical example is
$$\sum_{n \le x} \log n.$$
6 It is also often convenient to replace this sharply truncated sum with a smoother sum such
as $\sum_n f(n) \psi(n/x)$ for some smooth cutoff $\psi$, but we will not discuss this technicality here.
$$\sum_{n=1}^\infty \frac{f(n)}{n^s}$$
which are clearly related to the partial sums $\sum_{n \le x} f(n)$ (essentially via the
Mellin transform, a cousin of the Fourier and Laplace transforms); for this
section we ignore the (important) issue of how to make sense of this series
when it is not absolutely convergent (but see [Ta2011d, §3.7] for more
discussion). A primary reason that this technique is effective is that the
Dirichlet series of a multiplicative function factorises as an Euler product
$$\sum_{n=1}^\infty \frac{f(n)}{n^s} = \prod_p \left( \sum_{j=0}^\infty \frac{f(p^j)}{p^{js}} \right).$$
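To see the Euler product factorisation numerically in the simplest case $f = 1$, $s = 2$ (where both sides equal $\zeta(2) = \pi^2/6$), one can compare a truncated Dirichlet series with a truncated product over primes. This is only a sketch under those assumptions; the function names are hypothetical.

```python
import math

def primes_up_to(limit):
    """All primes p <= limit (limit >= 2), by the sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            for multiple in range(p * p, limit + 1, p):
                sieve[multiple] = False
    return [p for p in range(2, limit + 1) if sieve[p]]

def dirichlet_partial_sum(s, terms):
    """Partial sum of the Dirichlet series of f = 1: sum of n^(-s) for n <= terms."""
    return sum(n ** (-s) for n in range(1, terms + 1))

def euler_partial_product(s, prime_limit):
    """Product over primes p <= prime_limit of sum_j p^(-js) = 1 / (1 - p^(-s))."""
    product = 1.0
    for p in primes_up_to(prime_limit):
        product *= 1.0 / (1.0 - p ** (-s))
    return product

if __name__ == "__main__":
    print(dirichlet_partial_sum(2, 10**4))
    print(euler_partial_product(2, 10**4))
    print(math.pi ** 2 / 6)
```

Both truncations approach the same limit, which is the content of the factorisation for this particular multiplicative function.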
One also obtains similar types of representations for functions that are not
quite multiplicative, but are closely related to multiplicative functions, such
as the von Mangoldt function $\Lambda$ (whose Dirichlet series $\sum_{n=1}^\infty \frac{\Lambda(n)}{n^s} = -\frac{\zeta'(s)}{\zeta(s)}$
is not given by an Euler product, but instead by the logarithmic derivative
of an Euler product).
Moving another notch along the spectrum between well-controlled and
ill-controlled functions, one can consider functions $f$ that are divisor sums
such as
$$f(n) = \sum_{d \le R; d|n} g(d) = \sum_{d \le R} 1_{d|n} g(d)$$
for some other arithmetic function $g$, and some level $R$. This is a linear
combination of periodic functions $1_{d|n} g(d)$ and is thus technically periodic
in $n$ (with period equal to the least common multiple of all the numbers from
1 to $R$), but in practice this period is far too large to be useful (except
for extremely small levels $R$, e.g. $R = O(\log x)$). Nevertheless, we can still
control the sum $\sum_{n \le x} f(n)$ simply by rearranging the summation:
$$\sum_{n \le x} f(n) = \sum_{d \le R} g(d) \sum_{n \le x: d|n} 1$$
and thus by (7.16) one can bound this by the sum of a main term $x \sum_{d \le R} \frac{g(d)}{d}$
and an error term $O(\sum_{d \le R} |g(d)|)$. As long as the level $R$ is significantly
less than $x$, one may expect the main term to dominate, and one can often
estimate this term by a variety of techniques (for instance, if $g$ is multiplicative, then multiplicative number theory techniques are quite effective,
as mentioned previously). Similarly for other slight variants of divisor sums,
such as expressions of the form
$$\sum_{d \le R; d|n} g(d) \log \frac{n}{d}$$
Fd (n)
dR
which counts the number of divisors up to n. This is a multiplicative function, and is therefore most efficiently estimated using the techniques of multiplicative number theory; but for reasons that will become clearer later, let
us forget the multiplicative structure and estimate the above sum by more
elementary methods. By applying the preceding method, we see that
$$\sum_{n \le x} \tau(n) = \sum_{d \le x} \sum_{n \le x: d|n} 1 = \sum_{d \le x} \left( \frac{x}{d} + O(1) \right) = x \log x + O(x). \tag{7.17}$$
Here, we are (barely) able to keep the error term smaller than the main
term; this is right at the edge of the divisor sum method, because the level
$R$ in this case is equal to $x$. Unfortunately, at this high choice of level, it
is not always possible to keep the error term under control like this.
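The rearrangement behind (7.17) is easy to test numerically. The sketch below (with hypothetical function names) computes $\sum_{n \le x} \tau(n)$ both directly and after swapping the order of summation, and compares the result with the main term $x \log x$.

```python
import math

def divisor_sum_direct(x):
    """Sum of tau(n) for n <= x, computing each tau(n) by trial division."""
    total = 0
    for n in range(1, x + 1):
        total += sum(1 for d in range(1, n + 1) if n % d == 0)
    return total

def divisor_sum_rearranged(x):
    """The same sum after swapping the summations: each d <= x divides
    exactly floor(x / d) of the integers n <= x."""
    return sum(x // d for d in range(1, x + 1))

if __name__ == "__main__":
    x = 10**5
    exact = divisor_sum_rearranged(x)
    main_term = x * math.log(x)
    # The difference should be O(x), as (7.17) predicts.
    print(exact, main_term, (exact - main_term) / x)
```

The rearranged form costs $O(x)$ operations rather than $O(x^2)$, which is the computational shadow of the fact that the inner sums $\sum_{n \le x: d|n} 1$ are easy to evaluate.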
For instance, if one wishes to use the standard divisor sum representation
$$\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d},$$
X
dx
dx
nx:d|n
n
n n
n
(d)( log + O(log ))
d
d
d
d
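$$\lim_{s \to 1^+} \sum_{n=1}^\infty \frac{\mu(n)}{n^s} = 0$$
and
$$\lim_{s \to 1^+} \sum_{n=1}^\infty \frac{\mu(n) \log n}{n^s} = -1.$$
This suggests (but does not quite prove) that one has
$$\sum_{n=1}^\infty \frac{\mu(n)}{n} = 0 \tag{7.18}$$
and
$$\sum_{n=1}^\infty \frac{\mu(n) \log n}{n} = -1 \tag{7.19}$$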
this change of variables that every divisor of $n$ above $\sqrt{n}$ is paired with one
below $\sqrt{n}$, and thus $\tau(n) = 2 \sum_{d \le \sqrt{n}: d|n} 1$
except when $n$ is a perfect square, in which case one must subtract one from
the right-hand side. Using this reduced-level divisor sum representation, one
can obtain an improvement to (7.17), namely
$$\sum_{n \le x} \tau(n^2+1).$$
(Note that $\tau(n^2+1)$ has no multiplicativity properties in $n$, and so multiplicative number theory techniques cannot be directly applied here.) The
level of the divisor sum here is initially of order $x^2$, which is too large to be
useful; but using the square root trick, we can expand this expression as
$$\sum_{n \le x} 2 \sum_{d \le n: d|n^2+1} 1 = 2 \sum_{d \le x} \sum_{d \le n \le x: n^2+1 = 0 \ \mathrm{mod}\ d} 1.$$
and also
$$\sum_{d \le x} \rho(d) = O(x)$$
and thus
$$\sum_{n \le x} \tau(n^2+1) = \frac{3}{\pi} x \log x + O(x).$$
Similar arguments give asymptotics for $\tau$ on other quadratic polynomials; see for instance [Ho1963], [Mc1995], [Mc1997], [Mc1999]. Note that
the irreducibility of the polynomial will be important. If one considers instead a sum involving a reducible polynomial, such as $\sum_{n \le x} \tau(n^2-1)$, then
the analogous quantity $\rho(n)$ becomes significantly larger, leading to a larger
growth rate (of order $x \log^2 x$ rather than $x \log x$) for the sum.
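The square root trick can be checked directly on small inputs. The following sketch (function names are hypothetical) counts divisors using only the divisors up to $\sqrt{n}$, and then uses the same idea for $n^2+1$, which is never a perfect square for $n \ge 1$ and whose square root lies strictly between $n$ and $n+1$.

```python
import math

def tau_naive(n):
    """Number of divisors of n, by checking every d from 1 to n."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

def tau_sqrt_trick(n):
    """Number of divisors of n, pairing each divisor above sqrt(n)
    with one below it; a perfect square has one unpaired divisor."""
    root = math.isqrt(n)
    small = sum(1 for d in range(1, root + 1) if n % d == 0)
    return 2 * small - (1 if root * root == n else 0)

def sum_tau_quadratic(x):
    """Sum of tau(n^2 + 1) for 1 <= n <= x, using only divisors d <= n."""
    total = 0
    for n in range(1, x + 1):
        value = n * n + 1
        total += 2 * sum(1 for d in range(1, n + 1) if value % d == 0)
    return total

if __name__ == "__main__":
    x = 2000
    # Compare with the main term (3 / pi) x log x; the O(x) error is still visible here.
    print(sum_tau_quadratic(x), 3 / math.pi * x * math.log(x))
```

At this range the lower-order $O(x)$ term is not negligible, so the two printed numbers should be expected to agree only roughly.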
However, the square root trick is insufficient by itself to deal with higher
order sums involving the divisor function, such as
$$\sum_{n \le x} \tau(n^3+1);$$
the level here is initially of order $x^3$, and the square root trick only lowers
this to about $x^{3/2}$, creating an error term that overwhelms the main term.
And indeed, the asymptotic for this sum has not yet been rigorously established (although if one heuristically drops error terms, one can arrive at a
reasonable conjecture for this asymptotic), although some results are known
if one averages over additional parameters (see e.g. [Gr1970], [Ma2012]).
for any fixed $P$ (not necessarily irreducible) and any fixed $m \ge 1$, due to
van der Corput [va1939]; this bound is in fact used to dispose of some error
terms in the proof of (7.21). These should be compared with what one can
obtain from the divisor bound $\tau(n) \ll n^{O(1/\log\log n)}$ (see [Ta2009, §1.6])
and the trivial bound $\tau(n) \ge 1$, giving the bounds
$$x \ll \sum_{n \le x} \tau^m(P(n)) \ll x^{1+O(\frac{1}{\log\log x})}$$
for any $\varepsilon > 0$, and the preceding methods then easily allow one to obtain
the lower bound by taking $\varepsilon$ small enough (more precisely, if $P$ has degree
$d$, one should take $\varepsilon$ equal to $1/d$ or less). The upper bounds in (7.21) and
(7.22) are more difficult. Ideally, if we could obtain upper bounds of the
form
$$\tau(n) \ll \sum_{d \le n^\varepsilon: d|n} 1 \tag{7.23}$$
for any fixed $\varepsilon > 0$, then the preceding methods would easily establish both
results. Unfortunately, this bound can fail, as illustrated by the following
example. Suppose that $n$ is the product of $k$ distinct primes
$p_1 \dots p_k$, each
of which is close to $n^{1/k}$. Then $n$ has $2^k$ divisors, with $\binom{k}{j}$ of them close to
$n^{j/k}$ for each $0 \le j \le k$. One can think of (the logarithms of) these divisors
as being distributed according to what is essentially a Bernoulli distribution,
thus a randomly selected divisor of $n$ has magnitude about $n^{j/k}$, where $j$ is
a random variable which has the same distribution as the number of heads
in $k$ independently tossed fair coins. By the law of large numbers, $j$ should
concentrate near $k/2$ when $k$ is large, which implies that the majority of the
divisors of $n$ will be close to $n^{1/2}$. Sending $k \to \infty$, one can show that the
bound (7.23) fails whenever $\varepsilon < 1/2$.
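The counterexample can be quantified in an idealised model, assumed here purely for illustration, in which exactly $\binom{k}{j}$ divisors have size $n^{j/k}$. The fraction of divisors of size at most $n^\varepsilon$ is then a binomial tail, which the sketch below evaluates exactly; the function name and the rational encoding of $\varepsilon$ are hypothetical choices.

```python
from math import comb

def fraction_of_small_divisors(k, eps_num, eps_den):
    """In the idealised model with C(k, j) divisors of size n^(j/k), return
    the fraction of the 2^k divisors of size at most n^eps, where
    eps = eps_num / eps_den is given as a rational number."""
    j_max = (eps_num * k) // eps_den  # largest j with j / k <= eps
    return sum(comb(k, j) for j in range(j_max + 1)) / 2 ** k

if __name__ == "__main__":
    # With eps = 2/5 < 1/2 the fraction of divisors below n^eps tends to zero,
    # so a sum over d <= n^eps cannot control tau(n) = 2^k.
    for k in (10, 100, 1000):
        print(k, fraction_of_small_divisors(k, 2, 5))
```

For $\varepsilon \ge 1/2$ the same fraction stays bounded below by roughly one half, which is consistent with the threshold $\varepsilon < 1/2$ in the text.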
This however can be fixed in a number of ways. First of all, even when
$\varepsilon < 1/2$, one can show weaker substitutes for (7.23). For instance, for any
fixed $\varepsilon > 0$ and $m \ge 1$ one can show a bound of the form
$$\tau(n)^m \ll \sum_{d \le n^\varepsilon: d|n} \tau(d)^C \tag{7.24}$$
px
for fixed irreducible $P$ and $m \ge 1$, which improves van der Corput's inequality (7.23) was shown in [De1971] using the same methods. (A slight
error in the original paper of Erdős was also corrected in this paper.) In
which turn out to be enough to obtain the right asymptotics for the number
of solutions to the equation $\frac{4}{p} = \frac{1}{x} + \frac{1}{y} + \frac{1}{z}$.
7.3.1. Landreau's argument. We now prove (7.24), and use this to show
(7.22).
Suppose first that all prime factors of $n$ have magnitude at most $n^{c/2}$.
Then by a greedy algorithm, we can factorise $n$ as the product $n = n_1 \dots n_r$
of numbers between $n^{c/2}$ and $n^c$. In particular, the number $r$ of terms in
this factorisation is at most $2/c$. By the trivial inequality $\tau(ab) \le \tau(a) \tau(b)$
we have
$$\tau(n) \le \tau(n_1) \dots \tau(n_r)$$
and thus by the pigeonhole principle one has
$$\tau(n)^m \le \tau(n_j)^{2m/c}$$
for some $j$. Since $n_j$ is a factor of $n$ that is at most $n^c$, the claim follows in
this case (taking $C := 2m/c$).
Now we consider the general case, in which $n$ may contain prime factors
that exceed $n^c$. There are at most $1/c$ such factors (counting multiplicity).
Extracting these factors out first and then running the greedy algorithm
again, we may factorise $n = n_1 \dots n_r q$ where the $n_i$ are as before, and $q$ is
the product of at most $1/c$ primes. In particular, $\tau(q) \le 2^{1/c}$ and thus
$$\tau(n) \le 2^{1/c} \tau(n_1) \dots \tau(n_r).$$
One now argues as before (conceding a factor of $2^{1/c}$, which is acceptable) to
obtain (7.24) in full generality. (Note that this illustrates a useful principle,
which is that large prime factors of $n$ are essentially harmless for the purposes
of upper bounding $\tau(n)$.)
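The greedy step in the first case can be sketched in code. The version below is one possible reading of "greedy algorithm", offered as an assumption rather than as the argument of the text: it takes $c = 1/q$ for a positive integer $q$ so that all comparisons are exact integer comparisons, multiplies primes together until the running product reaches $n^{c/2}$, and allows a final leftover block that may fall below $n^{c/2}$. All function names are hypothetical.

```python
from math import prod

def prime_factors(n):
    """Prime factors of n with multiplicity, in increasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

def tau(n):
    """Number of divisors of n, counting divisors in pairs (d, n / d)."""
    count, d = 0, 1
    while d * d <= n:
        if n % d == 0:
            count += 1 if d * d == n else 2
        d += 1
    return count

def greedy_blocks(n, q):
    """Group the prime factors of n into blocks, with c = 1/q.
    Assumes every prime factor p satisfies p <= n^(c/2), i.e. p^(2q) <= n.
    Every block is at most n^c; all but possibly the last are >= n^(c/2)."""
    blocks, current = [], 1
    for p in prime_factors(n):
        assert p ** (2 * q) <= n, "prime factor exceeds n^(c/2)"
        current *= p
        if current ** (2 * q) >= n:  # current >= n^(c/2): close this block
            blocks.append(current)
            current = 1
    if current > 1:
        blocks.append(current)  # leftover block, possibly below n^(c/2)
    return blocks

if __name__ == "__main__":
    n = 2**12 * 3**8 * 5**6
    blocks = greedy_blocks(n, 2)  # c = 1/2
    print(blocks, tau(n), [tau(b) for b in blocks])
```

Since $\tau(n)$ is at most the product of the $\tau(n_i)$ and there are boundedly many blocks, at least one block must carry a fixed power of $\tau(n)$, which is the pigeonhole step above.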
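Now we prove (7.22). From (7.24) we have
$$\tau(P(n))^m \ll \sum_{d \le x: d|P(n)} \tau(d)^{O(1)}$$
for any $n \le x$, and hence we can bound $\sum_{n \le x} \tau(P(n))^m$
by
$$\sum_{d \le x} \tau(d)^{O(1)} \sum_{n \le x: P(n) = 0 \ \mathrm{mod}\ d} 1.$$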
The inner sum is $\frac{x}{d} \rho(d) + O(\rho(d)) = O(\frac{x}{d} \rho(d))$, where $\rho(d)$ is the number
of roots of $P$ mod $d$. Now, for fixed $P$, it is easy to see that $\rho(p) = O(1)$
for all primes $p$, and from Hensel's lemma one soon extends this to $\rho(p^j) = O(1)$ for all prime powers $p^j$. (This is easy when $p$ does not divide the
discriminant $\Delta(P)$ of $P$, as the zeroes of $P$ mod $p$ are then simple. There
are only finitely many primes that do divide the discriminant, and they
can each be handled separately by Hensel's lemma and an induction on
the degree of $P$.) Meanwhile, from the Chinese remainder theorem, $\rho$ is
multiplicative. From this we obtain the crude bound $\rho(d) \ll \tau(d)^{O(1)}$, and
so we obtain a bound
$$\sum_{n \le x} \tau(P(n))^m \ll x \sum_{d \le x} \frac{\tau(d)^{O(1)}}{d}.$$
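The multiplicativity of $\rho$ that follows from the Chinese remainder theorem can be checked by brute force. The sketch below uses a hypothetical helper `rho` that takes the coefficients of $P$ (highest degree first) and counts roots modulo $d$; the example polynomial $n^2+1$ is chosen only for illustration.

```python
from math import gcd

def rho(coeffs, d):
    """Number of residues n mod d with P(n) = 0 mod d, where coeffs lists
    the integer coefficients of P from highest degree to lowest."""
    def evaluate(n):
        value = 0
        for c in coeffs:  # Horner's rule, reducing mod d at each step
            value = (value * n + c) % d
        return value
    return sum(1 for n in range(d) if evaluate(n) == 0)

if __name__ == "__main__":
    P = [1, 0, 1]  # P(n) = n^2 + 1
    for a, b in ((5, 13), (2, 5), (4, 9)):
        print(a, b, rho(P, a), rho(P, b), rho(P, a * b))
```

For coprime moduli the last printed column is the product of the previous two, while no such relation is claimed for moduli sharing a common factor.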
$$\sum_{d=1}^\infty \frac{\tau(d)^{O(1)}}{d^{1+\frac{1}{\log x}}}$$
Lemma 7.3.2. For generic $n \le x$, the prime factors of $P(n)$ between $\log^C x$
and $x^{1/2}$ are all distinct.
Proof. If $p$ is a prime between $\log^C x$ and $x^{1/2}$, then the total number of
$n \le x$ for which $p^2$ divides $P(n)$ is
$$\rho(p^2) \frac{x}{p^2} + O(\rho(p^2)) = O\left(\frac{x}{p^2}\right),$$
so the total number of $n \le x$ that fail the above property is
$$\ll \sum_{\log^C x \le p \le x^{1/2}} \frac{x}{p^2} \ll \frac{x}{\log^C x}$$
which is acceptable.
It is difficult to increase the upper bound here beyond $x^{1/2}$, but fortunately we will not need to go above this bound. The lower bound cannot be
significantly reduced; for instance, it is quite likely that $P(n)$ will be divisible
by $2^2$ for a positive fraction of $n$. But we have the following substitute:
Lemma 7.3.3. For generic $n \le x$, there are no prime powers $p^j$ dividing
$P(n)$ with $p < x^{1/(\log\log x)^2}$ and $p^j \ge x^{1/(\log\log x)^2}$.
Proof. By the preceding lemma, we can restrict attention to primes $p$ with
$p < \log^C x$. For each such $p$, let $p^j$ be the first power of $p$ exceeding
$x^{1/(\log\log x)^2}$. Arguing as before, the total number of $n \le x$ for which $p^j$
divides $P(n)$ is
$$\ll \frac{x}{p^j} \ll \frac{x}{x^{1/(\log\log x)^2}};$$
on the other hand, there are at most $\log^C x$ primes $p$ to consider. The claim
then follows from the union bound.
We now have enough information on the prime factorisation of $P(n)$ to
proceed. We arrange the prime factors of $P(n)$ in increasing order (allowing
repetitions):
$$P(n) = p_1 \dots p_J.$$
Let $0 \le j \le J$ be the largest integer for which $p_1 \dots p_j \le x$. Suppose first
that $J = j + O(1)$, then as in the previous section we would have
$$\tau(P(n)) \ll \tau(p_1 \dots p_j) \le \sum_{d \le x: d|P(n)} 1$$
x1/2 p1 . . . pj x
and pj x1/2 .
For generic $n$, we have at most $O(\log\log x)$ distinct prime factors, and
each such distinct prime less than $x^{1/(\log\log x)^2}$ contributes at most $x^{1/(\log\log x)^2}$
to the product $p_1 \dots p_j$. We conclude that generically, at least one of these
primes $p_1, \dots, p_j$ must exceed $x^{1/(\log\log x)^2}$, thus we generically have
The exponential factor looks bad, but we can offset it by the $x^{1/r}$-smooth
nature of $p_1 \dots p_j$, which is inherited by its factors $d$. From (7.25), $d$ is at
most $x$; by using the square root trick, we can restrict $d$ to be at least the
square root of $p_1 \dots p_j$, and thus to be at least $x^{1/4}$. Also, $d$ divides $P(n)$,
and as such inherits many of the prime factorisation properties of $P(n)$; in
particular, $d$ has $O(\log\log x)$ distinct prime factors, and there are no prime powers $p^j$
dividing $d$ with $p < x^{1/(\log\log x)^2}$ and $p^j \ge x^{1/(\log\log x)^2}$.
To summarise, we have shown the following variant of (7.23):
Lemma 7.3.4 (Lowering the level). For generic $n \le x$, we have
$$\tau(P(n)) \ll \exp(O(r)) \sum_{d \in S_r: d|P(n)} 1$$
for some $1 \le r \le (\log\log x)^2$, where $S_r$ is the set of all $x^{1/r}$-smooth numbers
$d$ between $x^{1/4}$ and $x$ with $O(\log\log x)$ distinct prime factors, and such that
there are no prime powers $p^j$ dividing $d$ with $p < x^{1/(\log\log x)^2}$ and $p^j \ge x^{1/(\log\log x)^2}$.
Applying this lemma (and discarding the non-generic $n$), we can thus
upper bound $\sum_{n \le x} \tau(P(n))$ (up to acceptable errors) by
$$\sum_{1 \le r \le (\log\log x)^2} \exp(O(r)) \sum_{n \le x} \sum_{d \in S_r: d|P(n)} 1.$$
The level is now less than $x$ and we can use the usual methods to estimate
the inner sums:
$$\sum_{n \le x} \sum_{d \in S_r: d|P(n)} 1 \ll x \sum_{d \in S_r} \frac{\rho(d)}{d}.$$
1r(log log x)
It is at this point that we need some algebraic number theory, and specifically the Landau prime ideal theorem, via the following lemma:
Proposition 7.3.5. We have
$$\sum_{d \le x} \frac{\rho(d)}{d} \ll \log x. \tag{7.27}$$
dSr
(d)
d
dSr
O(1)rt
by
x1/2
t+1 r
q1 <...<qm x1/2
tr
X
1
(u).
q1 . . . qm u<x
O(1)rt
1
(
m!
x1/2
X
t+1 r
qx1/2
tr
1 m
) log x.
q
t+1 r
qx1/2
tr
1
1.
q
Inserting this bound and summing the series using Stirling's formula, one
obtains the claim.
or equivalently
$$\sum_{p<x} f(p) = o\left(\frac{x}{\log x}\right). \tag{7.29}$$
$$\Lambda(n) = \sum_{d|n} \mu(d) \log\left(\frac{n}{d}\right), \tag{7.30}$$
where $\mu$ is the Möbius function, refinements such as (7.28) are similar in
spirit to estimates of the form
$$\sum_{n<x} \mu(n) f(n) = o(x). \tag{7.31}$$
one can exploit the oscillation of $f$. For instance, Vaughan's identity lets
one rewrite the sum in (7.28) as the sum of the Type I sum
$$\sum_{d \le U} \mu(d) \left( \sum_{V/d \le r \le x/d} (\log r) f(rd) \right),$$
a(d)
dU V
f (rd),
V /drx/d
(d)b(m)f (dm),
V dx/U U <mx/V
and the error term $\sum_{n \le V} \Lambda(n) f(n)$, whenever $1 \le U, V \le x$ are parameters,
and $a, b$ are the sequences
$$a(d) := \sum_{e \le U, f \le V: ef = d} \Lambda(f) \mu(e)$$
and
$$b(m) := \sum_{d|m: d \le U} \mu(d).$$
c(d)
f (rd),
dU V
U V /drx/d
(m)b(d)f (dm)
V <dx/U U <mx/d
After eliminating troublesome sequences such as $a(\cdot), b(\cdot), c(\cdot)$ via Cauchy-Schwarz or the triangle inequality, one is then faced with the task of estimating Type I sums such as
$$\sum_{r \le y} f(rd)$$
or Type II sums such as
$$\sum_{r \le y} f(rd) \overline{f(rd')}$$
for various $y, d, d' \ge 1$. Here, the trivial bound is $O(y)$, but due to a number
of logarithmic inefficiencies in the above method, one has to obtain bounds
that are more like $O(\frac{y}{\log^C y})$ for some constant $C$ (e.g. $C = 5$) in order to
end up with an asymptotic such as (7.28) or (7.31).
However, in a recent paper [BoSaZi2011] of Bourgain, Sarnak, and
Ziegler, it was observed that as long as one is only seeking the Möbius
orthogonality (7.31) rather than the von Mangoldt orthogonality (7.28),
one can avoid losing any logarithmic factors, and rely purely on qualitative equidistribution properties of $f$. A special case of their orthogonality
criterion (which had been discovered previously by Kátai [Ka1986]) is as
follows:
Proposition 7.4.1 (Orthogonality criterion). Let $f : \mathbb{N} \to \mathbb{C}$ be a bounded
function such that
$$\sum_{n \le x} f(pn) \overline{f(qn)} = o(x) \tag{7.32}$$
for any distinct primes $p, q$ (where the decay rate of the error term $o(x)$ may
depend on $p$ and $q$). Then
$$\sum_{n \le x} \mu(n) f(n) = o(x). \tag{7.33}$$
Actually, the Bourgain-Sarnak-Ziegler paper establishes a more quantitative version of this proposition, in which $\mu$ can be replaced by an arbitrary bounded multiplicative function, but we will content ourselves with
the above weaker special case. This criterion can be viewed as a multiplicative variant of
the classical van der Corput lemma, which in our notation
asserts that $\sum_{n \le x} f(n) = o(x)$ if one has $\sum_{n \le x} f(n+h) \overline{f(n)} = o(x)$ for
each fixed non-zero $h$.
As a sample application, Proposition 7.4.1 easily gives a proof of the
asymptotic
$$\sum_{n \le x} \mu(n) e^{2\pi i \alpha n} = o(x)$$
for any irrational $\alpha$. (For rational $\alpha$, this is a little trickier, as it is basically equivalent to the prime number theorem in arithmetic progressions.)
In [BoSaZi2011] this criterion is also applied to nilsequences (obtaining
a quick proof of a qualitative version of a result in [GrTa2012]) and to
horocycle flows (for which no Möbius orthogonality result was previously
known).
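The sample application can be explored numerically. The following sketch (with hypothetical function names) tabulates $\mu$ with a linear sieve and evaluates the exponential sum for the irrational frequency $\alpha = \sqrt{2}$, an arbitrary choice made for this example; it gives numerical evidence of cancellation but of course proves nothing.

```python
import cmath
import math

def mobius_table(limit):
    """Return a list mu with mu[n] = mu(n) for 1 <= n <= limit (mu[0] is unused)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_composite = [False] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not is_composite[n]:
            primes.append(n)
            mu[n] = -1
        for p in primes:
            if n * p > limit:
                break
            is_composite[n * p] = True
            if n % p == 0:
                mu[n * p] = 0  # p^2 divides n * p
                break
            mu[n * p] = -mu[n]  # one extra distinct prime factor
    return mu

def twisted_mobius_sum(alpha, x):
    """The sum of mu(n) * exp(2 pi i alpha n) over 1 <= n <= x."""
    mu = mobius_table(x)
    return sum(mu[n] * cmath.exp(2j * math.pi * alpha * n) for n in range(1, x + 1))

if __name__ == "__main__":
    alpha = math.sqrt(2)
    for x in (10**3, 10**4, 10**5):
        print(x, abs(twisted_mobius_sum(alpha, x)) / x)
```

The printed ratios are the normalised sizes of the sum; the asymptotic in the text predicts that they tend to zero as $x \to \infty$.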
Informally, the connection between (7.32) and (7.33) comes from the
multiplicative nature of the Möbius function. If (7.33) failed, then $\mu(n)$ exhibits strong correlation with $f(n)$; by change of variables, we then expect
$\mu(pn)$ to correlate with $f(pn)$ and $\mu(qn)$ to correlate with $f(qn)$, for typical $p, q$ at least. On the other hand, since $\mu$ is multiplicative, $\mu(pn)$ exhibits
strong correlation with $\mu(qn)$. Putting all this together (and pretending correlation is transitive), this would give the claim (in the contrapositive). Of
course, correlation is not quite transitive, but it turns out that one can use
the Cauchy-Schwarz inequality as a substitute for transitivity of correlation
in this case.
We will give a proof of Proposition 7.4.1 shortly. The main idea is to
exploit the following observation: if $P$ is a large but finite set of primes
(in the sense that the sum $A := \sum_{p \in P} \frac{1}{p}$ is large), then for a typical large
number $n$ (much larger than the elements of $P$), the number of primes in $P$
that divide $n$ is pretty close to $A = \sum_{p \in P} \frac{1}{p}$:
$$\sum_{p \in P: p|n} 1 \approx A. \tag{7.34}$$
A more precise formalisation of this heuristic is provided by the Turán-Kubilius inequality, which is proven by a simple application of the second
moment method.
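The heuristic (7.34) and its second-moment formalisation can be tested on a small example. In the sketch below, the choice of $P$ as the primes up to 100 and of $x = 10^5$ is an assumption made only for illustration, and the function names are hypothetical.

```python
def small_primes(limit):
    """All primes p <= limit, by trial division (adequate for small limits)."""
    return [p for p in range(2, limit + 1)
            if all(p % q != 0 for q in range(2, int(p ** 0.5) + 1))]

def count_prime_divisors_in_set(x, primes):
    """Return counts with counts[n] = number of p in primes dividing n, for n <= x."""
    counts = [0] * (x + 1)
    for p in primes:
        for multiple in range(p, x + 1, p):
            counts[multiple] += 1
    return counts

def second_moment(x, primes):
    """Return (sum over n <= x of (counts[n] - A)^2, A * x), with A = sum of 1/p."""
    A = sum(1.0 / p for p in primes)
    counts = count_prime_divisors_in_set(x, primes)
    deviation = sum((counts[n] - A) ** 2 for n in range(1, x + 1))
    return deviation, A * x

if __name__ == "__main__":
    deviation, bound = second_moment(10**5, small_primes(100))
    print(deviation, bound, deviation / bound)
```

The mean square deviation is of the same order as $Ax$ rather than $A^2 x$, which is what makes the approximation (7.34) useful once $A$ is large.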
In particular, one can sum (7.34) against $\mu(n) f(n)$ and obtain an approximation
$$\sum_{n \le x} \mu(n) f(n) \approx \frac{1}{A} \sum_{p \in P} \sum_{n \le x: p|n} \mu(n) f(n)$$
that approximates a sum of $\mu(n) f(n)$ by a bunch of sparser sums of $\mu(n) f(n)$.
Since
$$x = \frac{1}{A} \sum_{p \in P} \frac{x}{p},$$
p
p<H
goes to infinity as $x \to \infty$.
Lemma 7.4.2 (Turán-Kubilius inequality). One has
$$\sum_{n \le x} \left| \sum_{p \in P: p|n} 1 - A \right|^2 \ll Ax. \tag{7.35}$$
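Proof. We have
$$\sum_{n \le x} \sum_{p \in P: p|n} 1 = \sum_{p \in P} \sum_{n \le x: p|n} 1.$$
$$\sum_{n \le x: p|n} 1 = \frac{x}{p} + O(1)$$
Similarly, we have
$$\sum_{n \le x} \left( \sum_{p \in P: p|n} 1 \right)^2 = \sum_{p,q \in P} \sum_{n \le x: p|n, q|n} 1.$$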
x
pq
when
nx pP :p|n
which we rearrange as
$$\sum_{n \le x} \mu(n) f(n) = \frac{1}{A} \sum_{p \in P} \sum_{n \le x: p|n} \mu(n) f(n) + O(A^{-1/2} x).$$
Write $n = pm$. Then we have $\mu(n) f(n) = -\mu(m) f(pm)$ for all but $O(x/p^2)$
values of $n$ (if $H$ is sufficiently slowly growing). The exceptional values
contribute at most
$$\sum_{p \in P} \frac{x}{p^2} \ll \sum_{p \in P} \frac{x}{Wp} = O(Ax/W) = o(Ax)$$
uniformly for $W \le 2^k < H$, where $P_k$ are the primes between $2^k$ and $2^{k+1}$.
Fix $k$. The left-hand side can be rewritten as
$$\sum_{m \le x/2^k} \mu(m) \sum_{p \in P_k} f(pm) 1_{m \le x/p}$$
$$\sum_{p,q \in P_k} \sum_{m \le \min(x/p, x/q)} f(pm) \overline{f(qm)}.$$
The claim follows (noting from the prime number theorem that $|P_k| = o(|P_k|^2)$).
7.4.2. From Möbius to von Mangoldt? It would be great if one could
pass from the Möbius asymptotic orthogonality (7.31) to the von Mangoldt
asymptotic orthogonality (7.28) (or equivalently, to (7.29)), as this would give
some new information about the distribution of primes. Unfortunately, it
seems that some additional input is needed to do so. Here is a simple example of a conditional implication that requires an additional input, namely
some quantitative control on Type I sums:
Proposition 7.4.3. Let $f : \mathbb{N} \to \mathbb{C}$ be a bounded function such that
$$\sum_{n \le x} \mu(n) f(dn) = o(x) \tag{7.36}$$
for each fixed $d \ge 1$ (with the decay rate allowed to depend on $d$). Suppose
also that one has the Type I bound
$$\sum_{1 \le m \le M} \sup_{y \le x} \left| \sum_{n \le y} f(mn) \right| \ll \frac{Mx}{\log^{2+\varepsilon} x} \tag{7.37}$$
for all $M, x \ge 2$ and some absolute constant $\varepsilon > 0$, where the implied constant is independent of both $M$ and $x$. Then one has
$$\sum_{n \le x} \Lambda(n) f(n) = o(x) \tag{7.38}$$
and thus (by discarding the prime powers and summing by parts)
$$\sum_{p \le x} f(p) = o\left(\frac{x}{\log x}\right).$$
Proof. We use the Dirichlet hyperbola method. Using (7.30), one can write
the left-hand side of (7.38) as
$$\sum_{dm \le x} \mu(m) (\log d) f(dm).$$
mx/d
m<x/D
D<dx/m
If $D$ is sufficiently slowly growing, then by (7.36) one has $\sum_{m \le x/d} \mu(m) f(dm) = o(x)$ uniformly for all $d \le D$. If $D$ is sufficiently slowly growing, this implies that the first term in (7.39) is also $o(x)$. As for the second term, we
dyadically decompose it and bound it in absolute value by
$$\sum_{2^k < x/D} \sum_{2^k \le m < 2^{k+1}} \left| \sum_{D < d \le x/m} (\log d) f(dm) \right|. \tag{7.40}$$
D<dx/m
This sum evaluates to O(x/D ), and the claim follows since D goes to infinity.
Note that the trivial bound on (7.37) is $Mx$, so one needs to gain about
two logarithmic factors over the trivial bound in order to use the above
proposition. The presence of the supremum is annoying, but it can be
removed by a modification of the argument if one improves the bound by an
additional logarithm by a variety of methods (e.g. completion of sums), or
by smoothing out the constraint $n \le x$. However, I do not know of a way to
remove the need to improve the trivial bound by two logarithmic factors.
Chapter 8
Geometry
$A, B, C$ a line $\ell$ through $A$ that trisects the angle $\angle BAC$, in the sense that
the angle between $\ell$ and $BA$ is one third of the angle of $\angle BAC$?
Thanks to Wantzel's result [Wa1836], the answer to this problem is
known to be "no" in general; a generic angle $\angle BAC$ cannot be trisected
by straightedge and compass. (On the other hand, some special angles
can certainly be trisected by straightedge and compass, such as a right
angle. Also, one can certainly trisect generic angles if other methods than
straightedge and compass are permitted.)
The impossibility of angle trisection stands in sharp contrast to the
easy construction of angle bisection via straightedge and compass, which we
briefly review as follows:
(1) Start with three points A, B, C.
(2) Form the circle c0 with centre A and radius AB, and intersect it
with the line AC. Let D be the point in this intersection that lies
on the same side of A as C. (D may well be equal to C.)
(3) Form the circle c1 with centre B and radius AB, and the circle c2
with centre D and radius AB. Let E be the point of intersection
of c1 and c2 that is not A.
(4) The line $\ell := AE$ will then bisect the angle $\angle BAC$.
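The four steps above can be carried out in coordinates. The sketch below is an illustration under stated assumptions: $A = (0,0)$, $B = (1,0)$, the point $C$ is placed on the unit circle at an angle $\theta$ strictly between $0$ and $\pi$ (so that $D = C$), and the circles are intersected with the standard formula for two circles that meet in two real points. The function names are hypothetical.

```python
import math

def circle_intersections(c1, r1, c2, r2):
    """Both intersection points of the circle with centre c1 and radius r1 and
    the circle with centre c2 and radius r2 (assumed to meet in two real points)."""
    (x1, y1), (x2, y2) = c1, c2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2  # squared distance between the centres
    t = 0.5 - (r2 ** 2 - r1 ** 2) / (2 * d2)
    h = math.sqrt((r1 ** 2 - t ** 2 * d2) / d2)
    base = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
    perp = (y1 - y2, x2 - x1)  # perpendicular to the line of centres
    return [(base[0] + s * h * perp[0], base[1] + s * h * perp[1]) for s in (1, -1)]

def bisector_point(theta):
    """Run steps (1)-(4) with A = (0, 0), B = (1, 0) and C at angle theta on
    the unit circle, and return the point E; the line AE bisects the angle."""
    A, B = (0.0, 0.0), (1.0, 0.0)
    D = (math.cos(theta), math.sin(theta))  # C already lies on c0, so D = C
    candidates = circle_intersections(B, 1.0, D, 1.0)
    # c1 and c2 both pass through A; E is the other intersection point.
    return max(candidates, key=lambda point: math.hypot(point[0] - A[0], point[1] - A[1]))

if __name__ == "__main__":
    theta = 1.0
    E = bisector_point(theta)
    print(math.atan2(E[1], E[0]), theta / 2)
```

The only non-rational operation used is a single square root, in line with the observation that each construction step carries a two-fold ambiguity.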
See Figure 1. The key difference between angle trisection and angle
bisection ultimately boils down to the following trivial number-theoretic
fact:
Lemma 8.1.2. There is no power of 2 that is evenly divisible by 3.
Proof. Obvious by modular arithmetic, by induction, or by the fundamental theorem of arithmetic.
In contrast, there are of course plenty of powers of 2 that are evenly
divisible by 2, and this is ultimately why angle bisection is easy while angle
trisection is hard.
The standard way in which Lemma 8.1.2 is used to demonstrate the
impossibility of angle trisection is via Galois theory. The implication is
quite short if one knows this theory, but quite opaque otherwise. We briefly
sketch the proof of this implication here, though we will not need it in the
rest of the discussion. Firstly, Lemma 8.1.2 implies the following fact about
field extensions.
Corollary 8.1.3. Let F be a field, and let E be an extension of F that can
be constructed out of F by a finite sequence of quadratic extensions. Then
E does not contain any cubic extensions K of F .
theory course, whilst the angle trisection problem requires only high-school
level mathematics to formulate. Even if one is allowed to cheat and sweep
several technicalities under the rug, one still needs to possess a fair amount
of solid intuition about advanced algebra in order to appreciate the proof.
(This was undoubtedly one reason why, even after Wantzel's impossibility
result was published, a large amount of effort was still expended by amateur
mathematicians to try to trisect a general angle.)
In this section, I would therefore like to present a different proof (or
perhaps more accurately, a disguised version of the standard proof) of the
impossibility of angle trisection by straightedge and compass, that avoids
explicit mention of Galois theory (though it is never far beneath the surface).
With cheats, the proof is actually quite simple and geometric (except
for Lemma 8.1.2, which is still used at a crucial juncture), based on the
basic geometric concept of monodromy; unfortunately, some technical work
is needed to remove these cheats.
To describe the intuitive idea of the proof, let us return to the angle
bisection construction, that takes a triple $A, B, C$ of points as input and
returns a bisecting line $\ell$ as output. We iterate the construction to create
a quartisecting line $m$, via the following sequence of steps that extend the
original bisection construction:
(1) Start with three points A, B, C.
(2) Form the circle c0 with centre A and radius AB, and intersect it
with the line AC. Let D be the point in this intersection that lies
on the same side of A as C. (D may well be equal to C.)
(3) Form the circle c1 with centre B and radius AB, and the circle c2
with centre D and radius AB. Let E be the point of intersection
of c1 and c2 that is not A.
(4) Let $F$ be the point on the line $\ell := AE$ which lies on $c_0$, and is on
the same side of A as E.
(5) Form the circle c3 with centre F and radius AB. Let G be the
point of intersection of c1 and c3 that is not A.
(6) The line m := AG will then quartisect the angle BAC.
See Figure 2. Let us fix the points $A$ and $B$, but not $C$, and view $m$ (as
well as intermediate objects such as $D$, $c_2$, $E$, $\ell$, $F$, $c_3$, $G$) as a function of
$C$.
Let us now do the following: we begin rotating $C$ counterclockwise
around $A$, which drags around the other objects $D$, $c_2$, $E$, $\ell$, $F$, $c_3$, $G$ that
were constructed by $C$ accordingly. For instance, here is an early stage of
this rotation process, when the angle $\angle BAC$ has become obtuse; see Figure
3.
Now for the slightly tricky bit. We are going to keep rotating $C$ beyond
a half-rotation of $180^\circ$, so that $\angle BAC$ now becomes a reflex angle. At this
point, a singularity occurs; the point $E$ collides into $A$, and so there is an
instant in which the line $\ell = AE$ is not well-defined. However, this turns
out to be a removable singularity (and the easiest way to demonstrate this
will be to tap the power of complex analysis, as complex numbers can easily
route around such a singularity), and we can blast through it to the other
side; see Figure 4.
Note that we have now deviated from the original construction in that
$F$ and $E$ are no longer on the same side of $A$; we are thus now working in
a continuation of that construction rather than with the construction itself.
Nevertheless, we can still work with this continuation (much as, say, one
works with analytic continuations of infinite series such as $\sum_{n=1}^\infty \frac{1}{n^s}$ beyond
their original domain of definition).
We now keep rotating $C$ around $A$. In Figure 5, $\angle BAC$ is approaching
a full rotation of $360^\circ$.
The reason for this, ultimately, is because any two circles or lines will intersect each other in at most two points, and so at each step of a straightedge-and-compass construction there is an ambiguity of at most $2! = 2$. Each
rotation of C around A can potentially flip one of these points to the other,
but then if one rotates again, the point returns to its original position, and
then one can analyse the next point in the construction in the same fashion
until one obtains the proposition.
But now consider a putative trisection operation, that starts with an
arbitrary angle $\angle BAC$ and somehow uses some sequence of straightedge
and compass constructions to end up with a trisecting line $\ell$: see Figure 7.
What is the period of this construction? If we continuously rotate $C$
around $A$, we observe that a full rotation of $C$ only causes the trisecting
line $\ell$ to rotate by a third of a full rotation (i.e. by $120^\circ$): see Figure 8.
Because of this, we see that the period of any construction that contains
$\ell$ must be a multiple of 3. But this contradicts Proposition 8.1.4 and Lemma
8.1.2.
We will now make the above proof rigorous. Unfortunately, in doing
so, one has to leave the world of high-school mathematics, as one needs a
little bit of algebraic geometry and complex analysis to resolve the issues
with singularities that we saw in the above sketch. Still, I feel that at an
intuitive level at least, this argument is more geometric and accessible than
the Galois-theoretic argument (though anyone familiar with Galois theory
will note that there is really not that much difference between the proofs,
ultimately, as one has simply replaced the Galois group with a closely related
monodromy group instead).
8.1.1. Details. We will assume for sake of contradiction that for every
triple A, B, C of distinct points, we can find a construction by straightedge
and compass that trisects the angle BAC, and eventually deduce a contradiction out of this.
We remark that we do not initially assume any uniformity in this construction; for instance, it could be possible that the trisection procedure for
obtuse angles is completely different from that of acute angles, using a totally different set of constructions, while some exceptional angles (e.g. right
angles or degenerate angles) might use yet another construction. We will
address these issues later.
The first step is to get rid of some possible degeneracies in one's construction. At present, nothing in our definition of a construction prevents
us from adding a point, line, or circle to the construction that was already
present in the existing collection C of points, lines, and circles. However, it is
clear that any such step in the construction is redundant, and can be omitted. Thus, we may assume without loss of generality that for each A, B, C,
the construction used to trisect the angle contains no such redundant steps.
(This may make the construction even less uniform than it was previously,
but we will address this issue later.)
Another form of degeneracy that we will need to eliminate for technical
reasons is that of tangency. At present, we allow in our construction the
ability to take two tangent circles, or a circle and a tangent line, and add the
tangent point to the collection (if it was not already present in the construction). This would ordinarily be a harmless thing to do, but it complicates
our strategy of perturbing the configuration, so we now act to eliminate it.
Suppose first that one had two circles c1 , c2 already constructed in the configuration C and tangent to each other, and one wanted to add the tangent
point T to the configuration. But note that in order to have added c1 and c2
to C, one must previously have added the centres A1 and A2 of these circles
to C also. One can then add T to C by intersecting the line A1 A2 with c1
and picking the point that lies on c2 ; this way, one does not need to intersect
two tangent curves together: see Figure 9.
after one perturbs $A, B, C$, that the resulting perturbed line $\ell$ still trisects
the angle. (For instance, there are a number of ways to trisect a right angle
(e.g. by bisecting an angle of an equilateral triangle), but if one perturbs
the angle to be slightly acute or slightly obtuse, the line created by this
procedure would not be expected to continue to trisect that angle.)
The next step is to allow analytic geometry (and thence algebraic geometry) to enter the picture, by using Cartesian coordinates. We may identify
the Euclidean plane with the analytic plane $\mathbb{R}^2 := \{(x,y) : x, y \in \mathbb{R}\}$; we
may also normalise $A, B$ to be the points $A = (0,0)$, $B = (1,0)$ by this
identification. We will also restrict $C$ to lie on the unit circle $S^1 := \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$, so that there is now just one degree of freedom in the
configuration $(A, B, C)$. One can describe a line in $\mathbb{R}^2$ by an equation of the
form
$$\{(x,y) \in \mathbb{R}^2 : ax + by + c = 0\}$$
(with $a, b$ not both zero), and describe a circle in $\mathbb{R}^2$ by an equation of the
form
$$\{(x,y) \in \mathbb{R}^2 : (x - x_0)^2 + (y - y_0)^2 = r^2\}$$
with $r$ non-zero. There is some non-uniqueness in these representations: for
the line, one can multiply $a, b, c$ by the same non-zero constant without altering the
line, and for the circle, one can replace $r$ by $-r$. However, this will not be
a serious concern for us. Note that any two distinct points $P = (x_1, y_1)$,
$Q = (x_2, y_2)$ determine a line
$$\{(x,y) \in \mathbb{R}^2 : x y_1 - x y_2 - y x_1 + y x_2 + x_1 y_2 - x_2 y_1 = 0\}$$
and given three points $O = (x_0, y_0)$, $A = (x_1, y_1)$, $B = (x_2, y_2)$, one can
form a circle
$$\{(x,y) \in \mathbb{R}^2 : (x - x_0)^2 + (y - y_0)^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2\}$$
with centre $O$ and radius $|AB|$. Given two distinct non-parallel lines
$$\ell = \{(x,y) \in \mathbb{R}^2 : ax + by + c = 0\}$$
and
$$\ell' = \{(x,y) \in \mathbb{R}^2 : a'x + b'y + c' = 0\},$$
their unique intersection point is given as
$$\left( \frac{bc' - b'c}{ab' - ba'}, \frac{a'c - c'a}{ab' - ba'} \right);$$
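\[ (x_1, y_1) + t(x_2 - x_1, y_2 - y_1) \pm \left( \frac{r_1^2 - t^2 d^2}{d^2} \right)^{1/2} (y_1 - y_2, x_2 - x_1) \]
where
\[ t := \frac{1}{2} - \frac{r_2^2 - r_1^2}{2d^2} \]
and
\[ d^2 := (x_1 - x_2)^2 + (y_1 - y_2)^2, \]
and the points of intersection between $\ell$ and $c_1$ (if they exist in $\mathbb{R}^2$) are
given as
\[ (x_1, y_1) - \frac{ax_1 + by_1 + c}{a^2 + b^2} (a, b) \pm \left( \frac{r_1^2}{a^2 + b^2} - \left( \frac{ax_1 + by_1 + c}{a^2 + b^2} \right)^2 \right)^{1/2} (b, -a). \tag{8.2} \]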
The precise expressions given above are not particularly important for our
argument, save to note that these expressions are always algebraic functions
of the input coordinates such as $x_0, x_1, x_2, y_0, y_1, y_2, a, b, c, a', b', c', r_1, r_2$, defined over the reals $\mathbb{R}$, and that the only algebraic operation needed here
besides the arithmetic operations of addition, subtraction, multiplication,
and division is the square root operation. Thus, we see that any particular
construction of, say, a line $\ell$ from a configuration $(A, B, C)$ will locally be
an algebraic function of $C$ (recall that we have already fixed $A, B$), and this
definition can be extended until one reaches a degeneracy (two points, lines,
or circles collide, two curves become tangent, or two lines become parallel);
however, this degeneracy only occurs in a proper real algebraic set of configurations, and in particular for $C$ in a dimension zero subset of the circle
$S^1$.
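To make the point about "arithmetic plus one square root" concrete, here is a minimal Python sketch of the three intersection rules. The function names and the tuple encodings `(a, b, c)` for a line and `(x, y, r)` for a circle are illustrative choices, not notation from the text, and `cmath.sqrt` is used only so that the sketch does not raise an error when the quantity under the root is negative.

```python
import cmath

def line_line(line1, line2):
    """Intersection of a*x + b*y + c = 0 with a2*x + b2*y + c2 = 0 (non-parallel lines)."""
    a, b, c = line1
    a2, b2, c2 = line2
    det = a * b2 - b * a2  # vanishes exactly when the two lines are parallel
    return ((b * c2 - b2 * c) / det, (a2 * c - c2 * a) / det)

def circle_circle(circle1, circle2):
    """Both intersection points of two circles (x, y, r) with distinct centres."""
    (x1, y1, r1), (x2, y2, r2) = circle1, circle2
    d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
    t = 0.5 - (r2 ** 2 - r1 ** 2) / (2 * d2)
    s = cmath.sqrt((r1 ** 2 - t ** 2 * d2) / d2)  # the only non-arithmetic step
    return [(x1 + t * (x2 - x1) + sign * s * (y1 - y2),
             y1 + t * (y2 - y1) + sign * s * (x2 - x1)) for sign in (1, -1)]

def line_circle(line, circle):
    """Both intersection points of a*x + b*y + c = 0 with the circle (x1, y1, r1)."""
    a, b, c = line
    x1, y1, r1 = circle
    n2 = a * a + b * b
    k = (a * x1 + b * y1 + c) / n2  # signed offset of the centre from the line
    s = cmath.sqrt(r1 ** 2 / n2 - k ** 2)
    return [(x1 - k * a + sign * s * b, y1 - k * b - sign * s * a) for sign in (1, -1)]
```

For instance, `circle_circle((0, 0, 1), (1, 0, 1))` returns the two points with first coordinate 1/2, while two disjoint circles such as `(0, 0, 1)` and `(3, 0, 1)` yield points with non-real coordinates rather than a failure.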
These degeneracies are annoying because they disconnect the circle $S^1$,
and can potentially block off large regions of that circle for which the construction is not even defined (because two circles stop intersecting, or a
circle and line stop intersecting, in $\mathbb{R}^2$, due to the lack of a real square
root for negative numbers). To fix this, we move now from the real plane
$\mathbb{R}^2$ to the complex plane $\mathbb{C}^2$. Note that the algebraic definitions of a line
and a circle continue to make perfect sense in $\mathbb{C}^2$ (with coefficients such as
$a, b, c, x_0, y_0, r$ now allowed to be complex numbers instead of real numbers),
and the algebraic intersection formulae given previously continue to make
sense in the complex setting. The point $C$ now is allowed to range in the
complex circle $S^1_{\mathbb{C}} = \{(x, y) \in \mathbb{C}^2 : x^2 + y^2 = 1\}$, which is a Riemann surface
possible choices for the intersection point of these two circles, and so if one
performs monodromy along a loop of possible pairs $(c_1, c_2)$ of circles, either
these two choices return to where they initially started, or are swapped; so if
one doubles the loop, one must necessarily leave the intersection points unchanged.) Iterating this, we see that any object constructed by straightedge
and compass from $A, B, C$ must have period $2^k$ for some power of two $2^k$, in
the sense that if one iterates a loop of $C$ in $S^1_{\mathbb{C}}$ avoiding degenerate points
$2^k$ times, the object must return to where it started. (In more algebraic
terminology: the monodromy group must be a 2-group.)
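The simplest instance of this period-doubling is a single square root. The following Python sketch (with illustrative helper names, and a sampled unit circle standing in for a loop around the degenerate point $z = 0$) follows a continuous branch of $\sqrt{z}$ numerically: one circuit swaps the two roots, and two circuits restore the starting value.

```python
import cmath

def continue_sqrt(path, start_root):
    """Follow a continuous choice of w with w*w = z along a finely sampled path."""
    w = start_root
    for z in path:
        r = cmath.sqrt(z)
        # Of the two roots r and -r, keep whichever is closer to the previous value.
        w = r if abs(r - w) <= abs(-r - w) else -r
    return w

def unit_circle_loop(turns, steps_per_turn=1000):
    """Sample points e^{i*phi} as phi runs from 0 to 2*pi*turns."""
    total = turns * steps_per_turn
    return [cmath.exp(2j * cmath.pi * s / steps_per_turn) for s in range(total + 1)]

after_one_loop = continue_sqrt(unit_circle_loop(1), 1.0)
after_two_loops = continue_sqrt(unit_circle_loop(2), 1.0)
```

Here `after_one_loop` is numerically close to $-1$ and `after_two_loops` is close to $+1$, which is the behaviour the doubling argument relies on.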
Now, one traverses $C$ along a slight perturbation of a single rotation of
the real unit circle $S^1$, taking a slight detour around the finite number of
degeneracy points one encounters along the way. Since $\ell$ has to trisect the
angle $\angle ABC$ at each of these points, while varying continuously with $C$, we
see that when $C$ traverses a full rotation, $\ell$ has only traversed one third of
a rotation (or two thirds, depending on which trisection one obtained), and
so the period of $\ell$ must be a multiple of three; but this contradicts Lemma
8.1.2, and the claim follows.
determine exactly one line AB. In higher degree, the situation is a bit
more complicated. For instance, five collinear points determine more than
one quadric curve, as one can simply take the union of the line containing
those five points, together with an arbitrary additional line. Similarly, eight
points on a conic section plus one additional point determine more than
one cubic curve, as one can take that conic section plus an arbitrary line
going through the additional point. However, if one places some general
position hypotheses on these points, then one can recover uniqueness. For
instance, given five points, no three of which are collinear, there can be at
most one quadric curve that passes through these points (because these five
points cannot lie on the union of two lines, and by Bézout's theorem they
cannot simultaneously lie on two distinct conic sections).
For cubic curves, the situation is more complicated still. Consider for instance two distinct cubic curves $\gamma_0 = \{P_0(x, y) = 0\}$ and $\gamma_\infty = \{P_\infty(x, y) = 0\}$ that intersect in precisely nine points $A_1, \ldots, A_9$ (note from Bézout's
But then $C$ does not lie on either $\ell$ or $\sigma$ despite being a vanishing point of
$Q$, a contradiction. Thus, no three points from $A_1, \ldots, A_8$ are collinear.
In a similar vein, suppose next that six of the first eight points, say
$A_1, \ldots, A_6$, lie on a quadric curve $\sigma$; as no three points are collinear, this
quadric curve cannot be the union of two lines, and is thus a conic section.
lines $A_i B_j$ and $A_j B_i$ meet at a point $C_{ij}$. Then the points $C_{12}, C_{23}, C_{31}$ are
collinear.
Proof. We may assume that $C_{12}, C_{23}$ are distinct, since the claim is trivial
otherwise.
purple lines in the first figure), let $\gamma_\infty$ be the union of the three lines $A_2 B_1$,
$A_3 B_2$, and $A_1 B_3$ (the dark blue lines), and let $\gamma$ be the union of the three
lines $\ell$, $\ell'$, and $C_{12} C_{23}$ (the other three lines). By construction, $\gamma_0$ and $\gamma_\infty$
are cubic curves with no common component that meet at the nine points
$A_1, A_2, A_3, B_1, B_2, B_3, C_{12}, C_{23}, C_{31}$. Also, $\gamma$ is a cubic curve that passes
through the first eight of these points, and thus also passes through the
ninth point $C_{31}$, by the Cayley-Bacharach theorem. The claim follows (note
that $C_{31}$ cannot lie on $\ell$ or $\ell'$).
The same argument gives the closely related theorem of Pascal:
Theorem 8.2.3 (Pascal's theorem). Let $A_1, A_2, A_3, B_1, B_2, B_3$ be distinct
points on a conic section $\sigma$. Suppose that for $ij = 12, 23, 31$, the lines $A_i B_j$
and $A_j B_i$ meet at a point $C_{ij}$. Then the points $C_{12}, C_{23}, C_{31}$ are collinear.
Proof. Repeat the proof of Pappus's theorem, with $\sigma$ taking the place of
$\ell \cup \ell'$. (Note that as any line meets $\sigma$ in at most two points, the $C_{ij}$ cannot
lie on $\sigma$.)
One can view Pappus's theorem as the degenerate case of Pascal's theorem, when the conic section degenerates to the union of two lines.
Finally, Proposition 8.2.1 gives the associativity of the elliptic curve
group law:
Theorem 8.2.4 (Associativity of the elliptic curve law). Let $\gamma := \{(x, y) \in k^2 : y^2 = x^3 + ax + b\} \cup \{O\}$ be a (projective) elliptic curve, where $O := [0, 1, 0]$
is the point at infinity on the $y$-axis, and the discriminant $\Delta := -16(4a^3 + 27b^2)$ is non-zero. Define an addition law $+$ on $\gamma$ by defining $A + B$ to equal
$-C$, where $C$ is the unique point on $\gamma$ collinear with $A$ and $B$ (if $A, B$ are
distinct) or tangent to $A$ (if $A = B$), and $-C$ is the reflection of $C$ through
Let $\gamma'$ be the union of the three lines $AB$, $C(A + B)$, and $O(B + C)$
(the purple lines), and let $\gamma''$ be the union of the three lines $O(A + B)$,
$BC$, and $A(B + C)$ (the green lines). Observe that $\gamma'$ and $\gamma$ are cubic
curves with no common component that meet at the nine distinct points
$O, A, B, C, A + B, -(A + B), B + C, -(B + C), -((A + B) + C)$. The cubic
curve $\gamma''$ goes through the first eight of these points, and thus (by Proposition
8.2.1) also goes through the ninth point $-((A + B) + C)$. This implies that the
line through $A$ and $B + C$ meets $\gamma$ in both $-(A + (B + C))$ and $-((A + B) + C)$,
and so these two points must be equal, and so $(A + B) + C = A + (B + C)$
as required.
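As a sanity check on the statement (not a substitute for the geometric proof), one can verify associativity by brute force on a small example. The sketch below assumes a hypothetical curve $y^2 = x^3 + x + 6$ over the finite field $\mathbb{F}_{11}$, chosen only for illustration; the names `P_MOD`, `A_COEF`, `B_COEF`, `add` and `is_associative` are likewise illustrative. It implements the chord-and-tangent rule "$A + B := -C$" in coordinates.

```python
# Requires Python 3.8+ for pow(x, -1, m) (modular inverse).
P_MOD, A_COEF, B_COEF = 11, 1, 6  # hypothetical example: y^2 = x^3 + x + 6 over F_11
O = None  # the point at infinity

def add(P, Q):
    """Chord-and-tangent addition; O acts as the identity."""
    if P is O:
        return Q
    if Q is O:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return O  # vertical line: the third intersection point is O itself
    if P == Q:
        slope = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)  # tangent
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)  # chord
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD  # reflect the third point C to obtain -C
    return (x3, y3)

# All points of the curve with coordinates in F_11, together with O.
POINTS = [O] + [(x, y) for x in range(P_MOD) for y in range(P_MOD)
                if (y * y - x ** 3 - A_COEF * x - B_COEF) % P_MOD == 0]

def is_associative():
    """Check (A + B) + C == A + (B + C) over every triple of points."""
    return all(add(add(A, B), C) == add(A, add(B, C))
               for A in POINTS for B in POINTS for C in POINTS)
```

Calling `is_associative()` runs through all triples of the thirteen points on this particular curve.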
with a preference instead for only using concepts (e.g. congruence, distance,
angle) that are invariant with respect to the group of symmetries G.
As we all know from the geometry of the Earth (and the Greek root
geometria literally means Earth measurement), the geometry of the sphere
S 2 resembles the geometry of the plane R2 at scales that are small compared
to the radius of the sphere. There are at least two ways to make this
intuitive fact more precise. One is to make the radius R of the sphere
go to infinity, and perform a suitable limit (e.g. a Gromov-Hausdorff limit).
A dual approach is to keep the radius of the sphere fixed (e.g. considering
only the unit sphere), but to make the scale being considered on the sphere
shrink to zero. The two approaches are of course equivalent, but we will
consider the latter.
Thus, we view $S^2$ as the unit sphere in $\mathbb{R}^3$. With an eye to using the
quaternionic number system later on, we will denote the standard basis of
$\mathbb{R}^3$ as $i, j, k$, thus in particular $i$ is a point on the sphere $S^2$ which we will
view as an origin for this sphere. The tangent plane to $S^2$ at this point is
then
\[ \{i + yj + zk : y, z \in \mathbb{R}\}. \]
This plane is tangent to the sphere to second order. In particular, if $(y, z) \in \mathbb{R}^2$, and $\varepsilon > 0$ is a small parameter (which we think of as going to zero
eventually), then we can find a point on $S^2$ of the form $i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2)$.
(If one wishes, one can enforce the $O(\varepsilon^2)$ error to lie in the $i$ direction,
in order to make the identification uniquely well-defined, although this is
not strictly necessary for the discussion below.) Thus, we can view the
$\varepsilon$-neighbourhood of the origin $i$ as being approximately identifiable with a
bounded neighbourhood of the origin $0$ in the plane $\mathbb{R}^2$ via the identification
\[ (y, z) \mapsto i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2). \]
With this identification, one can see various structures in spherical geometry correspond (up to errors of $O(\varepsilon)$) to analogous structures in planar
geometry. For instance, a great circle in $S^2$ is of the form
\[ \{p \in S^2 : p \cdot \omega = 0\} \]
for some $\omega \in S^2$, where $\cdot$ is the usual dot product. In order for this great
circle to intersect the $O(\varepsilon)$ neighbourhood of the origin $i$, one must have
$i \cdot \omega = O(\varepsilon)$, and so we have
\[ \omega = \varepsilon ai + (\cos\theta) j + (\sin\theta) k + O(\varepsilon^2) \]
for some bounded quantity $a$ and some angle $\theta$. If one then restricts the
great circle to points $p = i + \varepsilon yj + \varepsilon zk + O(\varepsilon^2)$, the constraint $p \cdot \omega = 0$ then
becomes
\[ a + (\cos\theta) y + (\sin\theta) z = O(\varepsilon), \]
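\[ \sqrt{t^2 + x^2 + y^2 + z^2}. \]
The conjugation operation is an anti-automorphism, and the norm is multiplicative: $|\alpha\beta| = |\alpha||\beta|$. The quaternions also have a trace
\[ \mathrm{tr}(t + xi + yj + zk) = t \]
(in particular, $\mathrm{tr}(\alpha\beta) = \mathrm{tr}(\beta\alpha)$ and $\mathrm{tr}(\overline{\alpha}) = \mathrm{tr}(\alpha)$), giving rise to a dot
product
\[ \alpha \cdot \beta := \mathrm{tr}(\alpha\overline{\beta}) \]
which (together with the norm) gives a Hilbert space structure on the quaternions.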
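These identities are easy to spot-check numerically. The sketch below represents a quaternion $t + xi + yj + zk$ as a 4-tuple `(t, x, y, z)`; the helper names `qmul`, `conj`, `norm`, `tr` and `dot` are illustrative, and the product is the usual Hamilton product.

```python
import math

def qmul(p, q):
    """Hamilton product of quaternions stored as (t, x, y, z)."""
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (t1 * t2 - x1 * x2 - y1 * y2 - z1 * z2,
            t1 * x2 + x1 * t2 + y1 * z2 - z1 * y2,
            t1 * y2 - x1 * z2 + y1 * t2 + z1 * x2,
            t1 * z2 + x1 * y2 - y1 * x2 + z1 * t2)

def conj(q):
    """Conjugate: negate the i, j, k components."""
    return (q[0], -q[1], -q[2], -q[3])

def norm(q):
    return math.sqrt(sum(c * c for c in q))

def tr(q):
    """Trace as defined in the text: the real component."""
    return q[0]

def dot(p, q):
    """Dot product tr(p * conj(q)); it agrees with the Euclidean dot product on R^4."""
    return tr(qmul(p, conj(q)))
```

For example, `dot((1, 2, 3, 4), (5, 6, 7, 8))` evaluates to `70`, the same as the ordinary dot product of the two coordinate vectors.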
particular, one can apply [GuKa2010, Theorems 2.4, 2.5] as a black box,
after verifying that at most $O(N)$ lines of the form $\ell_{AB}$ project into a plane
or regulus, which is proven in the $S^2$ case in much the same way as it is in
the $\mathbb{R}^2$ case). We omit the details.
A similar argument (changing the signatures in various metrics, and in
the Clifford algebra underlying the quaternions) also allows one to establish
the same results in the hyperbolic plane $H^2$; again, we omit the details.
If we restrict attention to an $\varepsilon$-neighbourhood of the origin $i$ in the
sphere $S^2$, and similarly restrict to an $\varepsilon$-neighbourhood of the stabiliser of
$i$ in the spin group $S^3 \equiv \mathrm{Spin}(3)$, we can use the correspondences from the
previous section to convert $S^2$ into $\mathbb{R}^2$ in the limit, and $\mathrm{Spin}(3)$ in the limit
into a double cover of the special Euclidean group $SE(2)$ (which ends up just being
isomorphic to $SE(2)$ again). The great circles $\ell_{AB}$ in $\mathrm{Spin}(3)$ then, in the
limit, become the analogous sets $\ell_{AB} = \{R \in SE(2) : RA = B\}$ in $SE(2)$,
and the above correspondences can then be used to map (most of) $SE(2)$ to
$\mathbb{R}^3$, and (most) $\ell_{AB}$ to lines, giving Proposition 8.3.1.
Remark 8.3.5. The results in this section can also be interpreted using the
language of Clifford algebra; see [Gu2011].
$= P_m(x_1, \ldots, x_d) = 0\}$
to multiple algebraic equations
\[ P_1(x_1, \ldots, x_d) = \ldots = P_m(x_1, \ldots, x_d) = 0 \]
in multiple unknowns $x_1, \ldots, x_d$ in a field $k$, where the $P_1, \ldots, P_m : k^d \to k$ are polynomials of various degrees $D_1, \ldots, D_m$. We adopt the classical
perspective of viewing $V$ as a set (and specifically, as an algebraic set), rather
than as a scheme. Without loss of generality we may order the degrees in
non-increasing order:
\[ D_1 \geq D_2 \geq \ldots \geq D_m \geq 1. \]
We can distinguish between the underdetermined case $m < d$, when there
are more unknowns than equations; the determined case $m = d$ when there
are exactly as many unknowns as equations; and the overdetermined case
$m > d$, when there are more equations than unknowns.
Experience has shown that the theory of such equations is significantly
simpler if one assumes that the underlying field $k$ is algebraically closed,
and so we shall make this assumption throughout the rest of this section. In
particular, this covers the important case when $k = \mathbb{C}$ is the field of complex
numbers (but it does not cover the case $k = \mathbb{R}$ of real numbers; see below).
From the general "soft" theory of algebraic geometry, we know that
the algebraic set $V$ is a union of finitely many algebraic varieties, each of
dimension at least $d - m$, with none of these components contained in any
other. In particular, in the underdetermined case $m < d$, there are no
zero-dimensional components of $V$, and thus $V$ is either empty or infinite.
Now we turn to the determined case $d = m$, where we expect the solution
set $V$ to be zero-dimensional and thus finite. Here, the basic control on the
solution set is given by Bézout's theorem. In our notation, this theorem
states the following:
Theorem 8.4.1 (Bézout's theorem). Let $d = m = 2$. If $V$ is finite, then it
has cardinality at most $D_1 D_2$.
This result can be found in any introductory algebraic geometry textbook; it can for instance be proven using the classical tool of resultants. The
solution set $V$ will be finite when the two polynomials $P_1, P_2$ are coprime,
but can (and will) be infinite if $P_1, P_2$ share a non-trivial common factor.
By defining the right notion of multiplicity on $V$ (and adopting a suitably scheme-theoretic viewpoint), and working in projective space rather
than affine space, one can make the inequality $|V| \leq D_1 D_2$ an equality.
However, for many applications (and in particular, for the applications to
combinatorial incidence geometry), the upper bound usually suffices.
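The upper bound can be observed experimentally by brute force. The sketch below is only an illustration under a stated assumption: it counts common zeros with coordinates in a prime field $\mathbb{F}_p$, which is not algebraically closed, so it sees only a subset of the points of $V$ over the algebraic closure; the bound $D_1 D_2$ therefore still applies to the count. The polynomials and the helper name `count_common_zeros` are illustrative choices.

```python
def count_common_zeros(P1, P2, p):
    """Number of (x, y) in F_p x F_p at which both polynomials vanish mod p."""
    return sum(1 for x in range(p) for y in range(p)
               if P1(x, y) % p == 0 and P2(x, y) % p == 0)

# Two coprime polynomials of degree 2 (for odd p): a circle and a parabola.
circle = lambda x, y: x * x + y * y - 1
parabola = lambda x, y: y - x * x

# Two polynomials of degree 2 sharing the common factor x.
shared_1 = lambda x, y: x * (y - 1)
shared_2 = lambda x, y: x * (y - 2)

coprime_counts = {p: count_common_zeros(circle, parabola, p) for p in (3, 5, 7, 11, 13, 101)}
shared_count = count_common_zeros(shared_1, shared_2, 11)
```

Every entry of `coprime_counts` is at most $D_1 D_2 = 4$, whereas `shared_count` equals 11: the common factor $x$ contributes a whole line of solutions, which is the infinite case over an algebraically closed field.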
Bézout's theorem can be generalised in a number of ways. For instance,
the restriction on the finiteness of the solution set $V$ can be dropped by
restricting attention to $V^0$, the union of the zero-dimensional irreducible
components of $V$:
Corollary 8.4.2 (Bézout's theorem, again). Let $d = m = 2$. Then $V^0$ has
cardinality at most $D_1 D_2$.
Proof. We factor $P_1, P_2$ into irreducible factors (using unique factorisation
of polynomials). By removing repeated factors, we may assume $P_1, P_2$ are
square-free. We then write $P_1 = Q_1 R$, $P_2 = Q_2 R$ where $R$ is the greatest
common divisor of $P_1, P_2$ and $Q_1, Q_2$ are coprime. Observe that the zero-dimensional component of $\{P_1 = P_2 = 0\}$ is contained in $\{Q_1 = Q_2 = 0\}$,
which is finite from the coprimality of $Q_1, Q_2$. The claim follows.
It is also not difficult to use Bézout's theorem to handle the overdetermined case $m > d = 2$ in the plane:
hyperplanes) we can also control the total degree of all the i-dimensional
components of V for any fixed i. (Again, by using intersection theory one
can get a slightly more precise bound than this, but the proof of that bound
is more complicated than the arguments given here.)
\[ \left( \prod_{j=1}^N (x - j) \right)^2 + \left( \prod_{j=1}^N (y - j) \right)^2 \]
are those with dim(V,i ) = dim(W,i ) = dim(W ). Now, the projection map
is a dominant map between two varieties of the same dimension, and is
thus generically finite, with the preimages generically having some constant
cardinality D,i , and non-generically the preimages have a zero-dimensional
component of at most2 D,i points.
As a consequence of this analysis, we see that the generic fibre always
has at least as many zero-dimensional components as a non-generic fibre,
and so to establish Theorem 8.4.5, it suffices to do so for generic $P_1, \ldots, P_m$.
Now take $P_1, \ldots, P_m$ to be generic. We know that generically, the set $V$
is finite; we seek to bound its cardinality $|V|$ by $D_1 \ldots D_m$. To do this, we
dualise the problem. Let $A$ be the space of all affine-linear forms $\lambda : k^d \to k$;
this is a $(d + 1)$-dimensional vector space. We consider the set $V^*$ of all affine-linear forms $\lambda$ whose kernel $\{\lambda = 0\}$ intersects $V$. This is a union of $|V|$
hyperplanes in $A$, and is thus a hypersurface of degree $|V|$. Thus, to upper
bound the size of $V$, it suffices to upper bound the degree of the hypersurface
$V^*$, and this can be done by finding a non-zero polynomial of controlled
degree that vanishes identically on $V^*$. The point of this observation is that
the property of a polynomial being non-zero is a Zariski-open condition, and
so we have a chance of establishing the generic case from a special case.
Now let us look for a (generically non-trivial) polynomial of degree at
most $\prod_{i=1}^d D_i$ that vanishes on $V^*$. The idea is to try to dualise the assertion
that the monomials $x_1^{a_1} \ldots x_d^{a_d}$ with $a_j < D_j$ for all $1 \leq j \leq d$ generically
span the function ring of $V$, to become a statement about $V^*$.
Let $D$ be a sufficiently large integer (any integer larger than $\sum_{i=1}^d (D_i - 1)$
will do, actually), and let $X$ be the space of all polynomials $P : k^d \to k$ of
degree at most $D$. This is a finite-dimensional vector space over $k$, generated
by the monomials $x_1^{a_1} \ldots x_d^{a_d}$ with $a_1, \ldots, a_d$ non-negative integers adding up
to at most $D$. We can split $X$ as a direct sum
\[ X = \left( \sum_{i=1}^d x_i^{D_i} X_i \right) + X_0 \tag{8.4} \]
particular,
\[ \dim(X) = \sum_{i=1}^d \dim(X_i) + \dim(X_0). \tag{8.5} \]
\[ \dim(X_0) = \prod_{i=1}^d D_i. \]
\[ \sum_{i=1}^d P_i X_i) + X_0. \]
where
\[ A + B := \{a + b : a \in A, b \in B\} \]
is the sumset of $A$ and $B$, and $\mu$ denotes Lebesgue measure. The estimate is
sharp, as can be seen by considering the case when $A, B$ are convex bodies
that are dilates of each other, thus $A = \lambda B := \{\lambda b : b \in B\}$ for some
$\lambda > 0$, since in this case one has $\mu(A) = \lambda^d \mu(B)$, $A + B = (\lambda + 1)B$, and
$\mu(A + B) = (\lambda + 1)^d \mu(B)$.
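A quick numerical illustration, assuming that the estimate in question is the usual Brunn-Minkowski inequality $\mu(A+B)^{1/d} \geq \mu(A)^{1/d} + \mu(B)^{1/d}$: for axis-parallel boxes $A = \prod_i [0, a_i]$ and $B = \prod_i [0, b_i]$ the sumset is the box $\prod_i [0, a_i + b_i]$, so both sides can be computed exactly. The helper names below are illustrative, and boxes are of course only a very special class of sets.

```python
import math  # math.prod requires Python 3.8+

def volume(sides):
    """Lebesgue measure of an axis-parallel box with the given side lengths."""
    return math.prod(sides)

def brunn_minkowski_gap(a_sides, b_sides):
    """mu(A+B)^(1/d) - mu(A)^(1/d) - mu(B)^(1/d) for two boxes; expected to be >= 0."""
    d = len(a_sides)
    sum_sides = [a + b for a, b in zip(a_sides, b_sides)]  # A + B is again a box
    return (volume(sum_sides) ** (1 / d)
            - volume(a_sides) ** (1 / d)
            - volume(b_sides) ** (1 / d))

generic_gap = brunn_minkowski_gap([1.0, 4.0, 2.0], [3.0, 1.0, 5.0])
dilate_gap = brunn_minkowski_gap([1.0, 4.0, 2.0], [2.5, 10.0, 5.0])  # B = 2.5 * A
```

`generic_gap` is strictly positive, while `dilate_gap` vanishes up to rounding error, matching the sharpness claim for dilates.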
The Brunn-Minkowski inequality has many applications in convex geometry. To give just one example, if we assume that $A$ has a smooth boundary $\partial A$, and set $B$ equal to a small ball $B = B(0, \varepsilon)$, then $\mu(B)^{1/d} = \varepsilon \mu(B(0, 1))^{1/d}$, and in the limit $\varepsilon \to 0$ one has
\[ \mu(A + B) = \mu(A) + \varepsilon |\partial A| + o(\varepsilon) \]
where $|\partial A|$ is the surface measure of $A$; applying the Brunn-Minkowski inequality and performing a Taylor expansion, one soon arrives at the isoperimetric inequality
\[ |\partial A| \geq d \mu(A)^{1 - 1/d} \mu(B(0, 1))^{1/d}. \]
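\[ \int_{\mathbb{R}^d} h \geq \frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}} \left( \int_{\mathbb{R}^d} f \right)^{1-\theta} \left( \int_{\mathbb{R}^d} g \right)^{\theta}. \tag{8.10} \]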
This inequality is usually stated using $h((1 - \theta)x + \theta y)$ instead of $h(x + y)$ in order to eliminate the ungainly factor $\frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}}$. However, we
formulate the inequality in this fashion in order to avoid any reference to
the dilation maps $x \mapsto \lambda x$; the reason for this will become clearer later.
The Prékopa-Leindler inequality quickly implies the Brunn-Minkowski
inequality. Indeed, if we apply it to the indicator functions $f := 1_A$, $g := 1_B$, $h := 1_{A+B}$ (which certainly obey (8.9)), then (8.10) gives
\[ \mu(A + B)^{1/d} \geq \frac{\mu(A)^{\frac{1-\theta}{d}} \mu(B)^{\frac{\theta}{d}}}{(1 - \theta)^{1-\theta} \theta^{\theta}} \]
for any $0 < \theta < 1$. We can now optimise in $\theta$; the optimal value turns out
to be
\[ \theta := \frac{\mu(B)^{1/d}}{\mu(A)^{1/d} + \mu(B)^{1/d}} \]
which yields (8.8).
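The optimisation step can be checked numerically. Writing $a := \mu(A)^{1/d}$ and $b := \mu(B)^{1/d}$, the lower bound above is $(a/(1-\theta))^{1-\theta} (b/\theta)^{\theta}$, and at $\theta = b/(a+b)$ both bracketed ratios equal $a + b$. The following sketch uses arbitrary sample values for $a$ and $b$ and an illustrative function name.

```python
def lower_bound(a, b, theta):
    """(a / (1 - theta))^(1 - theta) * (b / theta)^theta for 0 < theta < 1."""
    return (a / (1 - theta)) ** (1 - theta) * (b / theta) ** theta

a, b = 2.0, 3.0                 # sample values of mu(A)^(1/d) and mu(B)^(1/d)
theta_star = b / (a + b)        # the claimed optimal choice of theta
value_at_optimum = lower_bound(a, b, theta_star)
best_on_grid = max(lower_bound(a, b, k / 1000) for k in range(1, 1000))
```

Here `value_at_optimum` agrees with $a + b = 5$ up to rounding, and no grid point does better, consistent with the claim that this choice of $\theta$ recovers the Brunn-Minkowski inequality.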
To prove the Prékopa-Leindler inequality, we first observe that the inequality tensorises in the sense that if it is true in dimensions $d_1$ and $d_2$, then
it is automatically true in dimension $d_1 + d_2$. Indeed, if $f, g, h : \mathbb{R}^{d_1} \times \mathbb{R}^{d_2} \to \mathbb{R}^+$ are measurable functions obeying (8.9) in dimension $d_1 + d_2$, then for
any $x_1, y_1 \in \mathbb{R}^{d_1}$, the functions $f(x_1, \cdot), g(y_1, \cdot), h(x_1 + y_1, \cdot) : \mathbb{R}^{d_2} \to \mathbb{R}^+$
The claim then follows from the weighted arithmetic mean-geometric mean
inequality $(1 - \theta)x + \theta y \geq x^{1-\theta} y^{\theta}$.
In this section we will make the simple observation (which appears in
[LeMa2005] in the case of the Heisenberg group, but may have also been
stated elsewhere in the literature) that the above argument carries through
without much difficulty to the nilpotent setting, to give a nilpotent Brunn-Minkowski inequality:
Theorem 8.5.3 (Nilpotent Brunn-Minkowski). Let $G$ be a connected, simply connected nilpotent Lie group of (topological) dimension $d$, and let $A, B$
be bounded open subsets of $G$. Let $\mu$ be a Haar measure on $G$ (note that
nilpotent groups are unimodular, so there is no distinction between left and
right Haar measure). Then
\[ \mu(A \cdot B)^{1/d} \geq \mu(A)^{1/d} + \mu(B)^{1/d}. \tag{8.11} \]
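\[ \int_G h \, d\mu \geq \frac{1}{(1-\theta)^{d(1-\theta)} \theta^{d\theta}} \left( \int_G f \, d\mu \right)^{1-\theta} \left( \int_G g \, d\mu \right)^{\theta}. \tag{8.12} \]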
To prove the nilpotent Prékopa-Leindler inequality, the key observation
is that this inequality not only tensorises; it splits with respect to short
exact sequences. Indeed, suppose one has a short exact sequence
\[ 0 \to K \to G \to H \to 0 \]
of connected, simply connected nilpotent Lie groups. The adjoint action
of the connected group $G$ on $K$ acts nilpotently on the Lie algebra of $K$
and is thus unimodular. Because of this, we can split a Haar measure $\mu_G$
on $G$ into Haar measures $\mu_K$, $\mu_H$ on $K, H$ respectively so that we have the
Fubini-Tonelli formula
\[ \int_G f(g) \, d\mu_G(g) = \int_H F(h) \, d\mu_H(h) \]
R+ ,
follows from the abelian case, which we have already established in Theorem
8.5.2.
Remark 8.5.5. Some connected, simply connected nilpotent groups $G$ (and
specifically, the Carnot groups) can be equipped with a one-parameter family
of dilations $x \mapsto \lambda \cdot x$, which are a family of automorphisms on $G$, which
dilate the Haar measure by the formula
\[ \mu(\lambda \cdot x) = \lambda^D \mu(x) \]
for an integer $D$, called the homogeneous dimension of $G$, which is typically larger than the topological dimension. For instance, in the case of the
Heisenberg group
\[ G := \begin{pmatrix} 1 & \mathbb{R} & \mathbb{R} \\ 0 & 1 & \mathbb{R} \\ 0 & 0 & 1 \end{pmatrix}, \]
which has topological dimension $d = 3$, the natural family of dilations is
given by
\[ \lambda : \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \mapsto \begin{pmatrix} 1 & \lambda x & \lambda^2 z \\ 0 & 1 & \lambda y \\ 0 & 0 & 1 \end{pmatrix} \]
with homogeneous dimension $D = 4$. Because the two notions $d, D$ of dimension are usually distinct in the nilpotent case, it is no longer helpful to try to
use these dilations to simplify the proof of the Brunn-Minkowski inequality,
in contrast to the Euclidean case. This is why we avoided using dilations in
the preceding discussion. It is natural to wonder whether one could replace
$d$ by $D$ in (8.11), but it can be easily shown that the exponent $d$ is best possible (an observation that essentially appeared first in [Mo2003]). Indeed,
working in the Heisenberg group for sake of concreteness, consider the set
\[ A := \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq N, |z| \leq N^{10} \right\} \]
for some large parameter $N$. This set has measure $8N^{12}$ using the standard
Haar measure on $G$. The product set $A \cdot A$ is contained in
\[ A' := \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} : |x|, |y| \leq 2N, |z| \leq 2N^{10} + O(N^2) \right\} \]
and thus has measure at most $64N^{12} + O(N^4)$, which is $2^3 + o(1)$ times the measure of $A$. This already shows that
the exponent in (8.11) cannot be improved beyond $d = 3$; note that the
homogeneous dimension $D = 4$ is making its presence known in the $O(N^4)$
term in the measure of $A \cdot A$, but this is a lower order term only.
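The containment used in this example follows from the multiplication law of the Heisenberg group: in the coordinates $(x, y, z)$ of the matrices above, $(x_1, y_1, z_1) \cdot (x_2, y_2, z_2) = (x_1 + x_2, y_1 + y_2, z_1 + z_2 + x_1 y_2)$. The sketch below samples integer points of $A$ for a hypothetical choice $N = 10$ and compares Lebesgue volumes in these coordinates; the function and variable names are illustrative.

```python
import random

def heis_mul(g, h):
    """Product of upper unitriangular 3x3 matrices, stored as (x, y, z)."""
    x1, y1, z1 = g
    x2, y2, z2 = h
    return (x1 + x2, y1 + y2, z1 + z2 + x1 * y2)

N = 10  # hypothetical value of the large parameter

def sample_A(rng):
    """A random integer point of A = {|x|, |y| <= N, |z| <= N^10}."""
    return (rng.randint(-N, N), rng.randint(-N, N), rng.randint(-N ** 10, N ** 10))

def in_bounding_box(g):
    """Membership in the box {|x|, |y| <= 2N, |z| <= 2N^10 + N^2}."""
    x, y, z = g
    return abs(x) <= 2 * N and abs(y) <= 2 * N and abs(z) <= 2 * N ** 10 + N ** 2

vol_A = (2 * N) * (2 * N) * (2 * N ** 10)                    # = 8 N^12
vol_box = (4 * N) * (4 * N) * (2 * (2 * N ** 10 + N ** 2))   # = 64 N^12 + 32 N^4
ratio = vol_box / vol_A                                      # = 8 + 4 / N^8
```

The ratio is barely above $2^3 = 8$ and far below $2^4 = 16$, which is the numerical content of the claim that the exponent cannot be raised from $d = 3$ to $D = 4$.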
It is somewhat unfortunate that the nilpotent Brunn-Minkowski inequality is adapted to the topological dimension rather than the homogeneous one,
because it means that some of the applications of the inequality (such as
the application to isoperimetric inequalities mentioned at the start of the
section) break down³.
Remark 8.5.6. The inequality can be extended to non-simply-connected
connected nilpotent groups G, if d is now set to the dimension of the largest
simply connected quotient of G. It seems to me that this is the best one can
do in general; for instance, if G is a torus, then the inequality fails for any
d > 0, as can be seen by setting A = B = G.
Remark 8.5.7. Specialising the nilpotent Brunn-Minkowski inequality to
the case $A = B$, we conclude that
\[ \mu(A \cdot A) \geq 2^d \mu(A). \]
This inequality actually has a much simpler proof (attributed to Tsachik
Gelander in [Hr2012]): one can show that for a connected, simply connected
nilpotent Lie group $G$, the exponential map $\exp : \mathfrak{g} \to G$ is a measure-preserving
homeomorphism, for some choice of Haar measure $\mu_{\mathfrak{g}}$ on $\mathfrak{g}$, so it suffices to
show that
\[ \mu_{\mathfrak{g}}(\log(A \cdot A)) \geq 2^d \mu_{\mathfrak{g}}(\log A). \]
But $A \cdot A$ contains all the squares $\{a^2 : a \in A\}$ of $A$, so $\log(A \cdot A)$ contains
the isotropic dilation $2 \cdot \log A$, and the claim follows. Note that if we set
$A$ to be a small ball around the origin, we can modify this argument to
give another demonstration of why the topological dimension $d$ cannot be
replaced with any larger exponent in (8.11).
One may tentatively conjecture that the inequality $\mu(A \cdot A) \geq 2^d \mu(A)$
in fact holds in all unimodular connected, simply connected Lie groups $G$,
and all bounded open subsets $A$ of $G$; I do not know if this bound is always
true, however.
³ Indeed, the topic of isoperimetric inequalities for the Heisenberg group is a subtle one, with
many naive formulations of the inequality being false. See [Mo2003] for more discussion.
Chapter 9
Dynamics
9. Dynamics
S(x, y) := (T x, (x)y).
Remark 9.1.4. The precise connection between Lemma 9.1.3 and Proposition 9.1.2 arises from the following observation: with $E, F, g$ as in the proof
of Proposition 9.1.2, and $x \in X$, the set
\[ A := \{n \in \mathbb{Z} : T^n x \in F\} \]
can be partitioned into the classes
\[ A_i := \{n \in \mathbb{Z} : S^n(x, i) \in E'\} \]
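\[ \lim_{g, h, h - g \to \infty} \mu(A \cap T_g B \cap T_h C) = \mu(A)\mu(B)\mu(C) \]
for all $A, B, C \in \mathcal{X}$, thus for every $\varepsilon > 0$, there exists a finite subset $K$ of $G$
such that
\[ |\mu(A \cap T_g B \cap T_h C) - \mu(A)\mu(B)\mu(C)| \leq \varepsilon \]
whenever $g, h, h - g$ all lie outside $K$.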
It is obvious that a strongly 3-mixing system is necessarily strongly 2-mixing. In the case of $\mathbb{Z}$-systems, it has been an open problem for some
time, due to Rohlin [Ro1949], whether the converse is true:
Problem 9.2.1 (Rohlin's problem). Is every strongly mixing $\mathbb{Z}$-system necessarily strongly 3-mixing?
This is a surprisingly difficult problem. In the positive direction, a routine application of the Cauchy-Schwarz inequality (via van der Corput's inequality) shows that every strongly mixing system is weakly 3-mixing, which
roughly speaking means that $\mu(A \cap T_g B \cap T_h C)$ converges to $\mu(A)\mu(B)\mu(C)$
for most $g, h \in \mathbb{Z}$. Indeed, every weakly mixing system is in fact weakly
mixing of all orders; see for instance [Ta2009, §2.10]. So the problem is to
exclude the possibility of correlation between $A$, $T_g B$, and $T_h C$ for a small
but non-trivial number of pairs $(g, h)$.
It is also known that the answer to Rohlin's problem is affirmative for
rank one transformations [Ka1984] and for shifts with purely singular continuous spectrum [Ho1991] (note that strongly mixing systems cannot have
any non-trivial point spectrum). Indeed, any counterexample to the problem, if it exists, is likely to be highly pathological.
In the other direction, Rohlin's problem is known to have a negative answer for $\mathbb{Z}^2$-systems, by a well-known counterexample of Ledrappier [Le1978]
which can be described as follows. One can view a $\mathbb{Z}^2$-system as being essentially equivalent to a stationary process $(x_{n,m})_{(n,m) \in \mathbb{Z}^2}$ of random variables
$x_{n,m}$ in some range space $\Omega$ indexed by $\mathbb{Z}^2$, with $X$ being $\Omega^{\mathbb{Z}^2}$ with the
obvious shift map
\[ T_{(g,h)} (x_{n,m})_{(n,m) \in \mathbb{Z}^2} := (x_{n-g,m-h})_{(n,m) \in \mathbb{Z}^2}. \]
In Ledrappier's example, the $x_{n,m}$ take values in the finite field $\mathbb{F}_2$ of two
elements, and are selected uniformly at random subject to the "Pascal's
triangle" linear constraints
\[ x_{n,m} = x_{n-1,m} + x_{n,m-1}. \]
A routine application of the Kolmogorov extension theorem (see e.g. [Ta2011,
§1.7]) allows one to build such a process. The point is that due to the properties of Pascal's triangle modulo 2 (known as Sierpinski's triangle), one
has
\[ x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k} \]
for all powers of two $2^k$. This is enough to destroy strong 3-mixing, because
it shows a strong correlation between $x$, $T_{(2^k,0)} x$, and $T_{(0,2^k)} x$ for arbitrarily
large $k$ and randomly chosen $x \in X$. On the other hand, one can still show
that $x$ and $T_g x$ are asymptotically uncorrelated for large $g$, giving strong
2-mixing. Unfortunately, there are significant obstructions to converting
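The propagation of the Pascal's triangle constraint to all powers of two can be seen in a small simulation. The sketch below is only a finite stand-in for Ledrappier's process: it fills a square array over $\mathbb{F}_2$ from randomly chosen values on two edges (an assumption made for convenience, not the stationary process built via the Kolmogorov extension theorem), and then checks the identity $x_{n,m} = x_{n-2^k,m} + x_{n,m-2^k}$ wherever all the indices involved lie inside the array. The function names are illustrative.

```python
import random

def sample_array(size, rng):
    """Fill x[n][m] over F_2 so that x[n][m] = x[n-1][m] + x[n][m-1] for n, m >= 1."""
    x = [[0] * size for _ in range(size)]
    for n in range(size):
        x[n][0] = rng.randrange(2)      # free values along the edge m = 0
    for m in range(1, size):
        x[0][m] = rng.randrange(2)      # free values along the edge n = 0
    for n in range(1, size):
        for m in range(1, size):
            x[n][m] = (x[n - 1][m] + x[n][m - 1]) % 2
    return x

def obeys_rule_with_step(x, step):
    """Check x[n][m] == x[n-step][m] + x[n][m-step] (mod 2) for all n, m >= step."""
    size = len(x)
    return all(x[n][m] == (x[n - step][m] + x[n][m - step]) % 2
               for n in range(step, size) for m in range(step, size))

grid = sample_array(40, random.Random(3))
power_of_two_checks = [obeys_rule_with_step(grid, 2 ** k) for k in range(5)]
```

Every entry of `power_of_two_checks` is `True`, reflecting the identity $(S + T)^{2^k} = S^{2^k} + T^{2^k}$ for the two shift operators over $\mathbb{F}_2$; for a step that is not a power of two the analogous check typically fails on a random array.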
\[ x_n + x_{n+t^k} + x_{n+2t^k} = 0 \]
for all $n \in \mathbb{F}_3[t]$ and all $k \geq 0$. Again, this system is manifestly not strongly
3-mixing, but can be shown to be strongly 2-mixing; I give details below.
As I discussed in [Ta2008, §1.6], in many cases the dyadic model serves
as a good guide for the non-dyadic model. However, in this case there is
a curious rigidity phenomenon that seems to prevent Ledrappier-type examples from being transferable to the one-dimensional non-dyadic setting;
once one restores the Archimedean nature of the underlying group, the constraints (9.3) not only reinforce each other strongly, but also force so much
linearity on the system that one loses the strong mixing property.
9.2.1. The example. Let $B$ be any ball in $\mathbb{F}_3[t]$, i.e. any set of the form
$\{n \in \mathbb{F}_3[t] : \deg(n - n_0) \leq K\}$ for some $n_0 \in \mathbb{F}_3[t]$ and $K \geq 0$. One can then
create a process $x_B = (x_n)_{n \in B}$ adapted to this ball, by declaring $(x_n)_{n \in B}$
to be uniformly distributed in the vector space $V_B \subset \mathbb{F}_3^B$ of all tuples with
coefficients in $\mathbb{F}_3$ that obey (9.3) for all $n \in B$ and $k \leq K$. Because any
translate of a line $(n, n + t^k, n + 2t^k)$ is still a line, we see that this process is
stationary with respect to all shifts $n \mapsto n + g$ of degree $\deg(g)$ at most $K$.
Also, if $B \subset B'$ are nested balls, we see that the vector space $V_{B'}$ projects
surjectively via the restriction map to $V_B$ (since any tuple obeying (9.3) in
$B$ can be extended periodically to one obeying (9.3) in $B'$). As such, we see
that the process $x_B$ is equivalent in distribution to the restriction $x_{B'}|_B$
of $x_{B'}$ to $B$. Applying the Kolmogorov extension theorem, we conclude that
there exists an infinite process $x = (x_n)_{n \in \mathbb{F}_3[t]}$ whose restriction $x|_B$ to any
ball $B$ has the distribution of $x_B$. As each $x_B$ was stationary with respect
to translations that preserved $B$, we see that the full process $x$ is stationary
with respect to the entire group $\mathbb{F}_3[t]$.
difficult to find a way to set the temperature parameters in such a way that
one has meaningful 3-correlations, without the system freezing up so much
that 2-mixing fails. It is also tempting to try to truncate the constraints
such as (9.3) to prevent their propagation, but it seems that any naive
attempt to perform a truncation either breaks stationarity, or introduces
enough periodicity into the system that 2-mixing breaks down. My tentative
opinion on this problem is that a Z-counterexample is constructible, but one
would have to use a very delicate and finely tuned construction to achieve
it.
Chapter 10
Miscellaneous
10. Miscellaneous
\[ p_k (1 - p_{k+1}) \ldots (1 - p_N). \tag{10.1} \]
The winner of the poll should then be the movie which maximises the
quantity (10.1).
One can solve this optimisation problem by assuming a power law
\[ p_k \sim ck^{-\alpha} \]
for some parameters $c$ and $\alpha$, which typically are comparable to 1. It is
an instructive exercise to optimise (10.1) using this law. What one finds is
that the value of the exponent $\alpha$ becomes key. If $\alpha < 1$ (and $N$ is large),
then (10.1) is maximised at $k = N$, and so in this case the poll should indeed
rate the very worst movies at the top of their ranking.
If $\alpha > 1$, there is a surprising reversal; (10.1) is instead maximised for
a value of $k$ which is bounded, $k = O(1)$. Basically, the poll now ranks the
worst blockbuster movie, rather than the worst movie period; a mediocre
but widely viewed movie will beat out a terrible but obscure movie.
Amusingly, according to Zipf's law, one expects $\alpha$ to be close to 1. As
such, there is a critical phase transition (especially if the constant $c$ is also
at the critical value of 1) and now one can anticipate the poll to more or
less randomly select movies of any level of quality. So one can blame Zipf's
law for the inaccuracy of worst movie polls.
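The two regimes are easy to reproduce numerically. The sketch below takes the power law at face value, with $p_k = ck^{-\alpha}$ exactly; the exponent name `alpha` and the sample value $c = 0.5$ are assumptions made for illustration. It evaluates the logarithm of (10.1) for every $k$ and reports the maximiser.

```python
import math

def log_scores(N, c, alpha):
    """log of p_k (1 - p_{k+1}) ... (1 - p_N) for k = 1..N, with p_k = c * k^(-alpha)."""
    p = [c * k ** (-alpha) for k in range(1, N + 1)]  # p[k-1] holds p_k
    scores = [0.0] * N
    log_tail = 0.0  # running value of log prod_{j > k} (1 - p_j)
    for k in range(N, 0, -1):
        scores[k - 1] = math.log(p[k - 1]) + log_tail
        log_tail += math.log1p(-p[k - 1])
    return scores

def poll_winner(N, c, alpha):
    """The index k in 1..N that maximises (10.1)."""
    scores = log_scores(N, c, alpha)
    return 1 + max(range(N), key=scores.__getitem__)

winner_small_exponent = poll_winner(1000, 0.5, 0.5)  # alpha < 1
winner_large_exponent = poll_winner(1000, 0.5, 2.0)  # alpha > 1
```

With these sample parameters the first winner is $k = N = 1000$ and the second is $k = 1$, and the second answer does not change when $N$ is varied.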
total descriptive complexity, when the range of applicability of the law becomes large. This is in contrast to superficially similar proposed laws such
as the Titius-Bode law, which was basically restricted to the six classical
planets and thus provided only a negligible saving in descriptive complexity.
Note that Kepler's law introduces a new quantity, $c$, to the explanatory
model of the universe. This quantity increases the descriptive complexity
of the model by one number, but this increase is more than offset by the
decrease (of six numbers, in the classical case) caused by the application of
the law. Thus we see the somewhat unintuitive fact that one can simplify
one's model of the universe by adding parameters to it. However, if one adds
a gratuitously large number of such parameters to the model, then one can
end up with a net increase in descriptive complexity, which is undesirable;
this can be viewed as a formal manifestation of Occam's razor. For instance,
if one had to add an ad hoc fudge factor $F_i$ to Kepler's law to make it
work,
\[ T_i^2 = c R_i^3 + F_i, \]
with $F_i$ being different for each planet, then the descriptive complexity
of this model has in fact increased to thirteen numbers (e.g. one can specify
$c, R_1, \dots, R_6$, and $F_1, \dots, F_6$), together with the fudged Kepler's law, leading
to a model with worse complexity³ than the initial model of simply stating
all the twelve observables $T_1, \dots, T_6, R_1, \dots, R_6$.
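The bookkeeping in this paragraph can be checked with a small sketch. The function names below are hypothetical, and "complexity" is simply taken to be the count of real numbers one has to write down, which is the simplified measure used in the discussion above.

```python
def raw_model(n):
    """Numbers needed to list every period T_i and radius R_i directly."""
    return 2 * n


def kepler_model(n):
    """The constant c plus the radii R_i; each T_i then follows from the law."""
    return 1 + n


def fudged_model(n):
    """c, the radii R_i, and one ad hoc fudge factor F_i per planet."""
    return 1 + n + n


n = 6  # the six classical planets
print(raw_model(n), kepler_model(n), fudged_model(n))
```

With six planets the fudged model needs one more number than the raw list of observables, while the unfudged law needs five fewer.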
Note also that the additional parameters (such as $c$) introduced by such
a law were not initially present in the previous model of the data set, and
can only be measured through the law itself. This can give the appearance of
circularity: Kepler's law relates times and radii of planets using a constant
$c$, but the constant $c$ can only be determined by applying Kepler's law. If
there were only one planet in the data set, this law would indeed be circular
(providing no new information on the orbital time and radius of the planet);
but the power of the law comes from its uniform applicability among all
planets. For instance, one can use data from the six classical planets to
compute $c$, which can then be used to make predictions on, say, the orbital
period of a newly discovered planet at a known distance to the sun. This
may seem confusingly circular⁴ from the prescriptive viewpoint - does the
³ However, if this very same fudge factor $F_i$ also appeared in laws that involved other statistics
of the planet, e.g. mass, radius, temperature, etc., then it can become possible again that such
a law could act to decrease descriptive complexity when working with an enlarged data set that
involves these statistics. Also, if the fudge factor is always small, then there is still some decrease
in descriptive complexity coming from a saving in the most significant figures of the primary
measurements $T_i$, $R_i$. So an analysis of an oversimplified data set, such as this one, can be
misleading.
⁴ One could use mathematical manipulation to try to eliminate such unsightly constants, for
instance replacing Kepler's law with the (mathematically equivalent) assertion that
$T_i^2/R_i^3 = T_j^2/R_j^3$ for all $i, j$, but this tends to lead to mathematically uglier laws and also does not lead to
any substantial saving in descriptive complexity.
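The earlier remark, that data from the six classical planets determine $c$ and that $c$ then predicts the period of a further planet, can be illustrated with a rough sketch. The orbital figures below are rounded textbook values in astronomical units and years, supplied here as an assumption and not taken from the text; averaging the ratios $T_i^2/R_i^3$ is just one simple way to estimate $c$.

```python
import math

# Approximate (semi-major axis R in AU, period T in years) for the six
# classical planets. Rounded values, used purely for illustration.
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn": (9.537, 29.457),
}


def fit_c(data):
    """Estimate c in T^2 = c R^3 by averaging the ratios T_i^2 / R_i^3."""
    ratios = [T**2 / R**3 for R, T in data.values()]
    return sum(ratios) / len(ratios)


def predict_period(c, R):
    """Predict the orbital period of a planet at distance R using the fitted c."""
    return math.sqrt(c * R**3)


c = fit_c(PLANETS)
# Uranus orbits at roughly 19.19 AU (again an approximate, assumed figure).
uranus_period = predict_period(c, 19.19)
print(round(c, 3), round(uranus_period, 1))
```

In these units $c$ comes out very close to 1, and the predicted period lands near the observed value of about 84 years for Uranus, which is the sense in which the law is not circular once it is applied uniformly.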
an object in two different states, but cannot compute⁵ the potential energy
itself. Indeed, one could add a fixed constant to the potential energy of all
the possible states of an object, and this would not alter any of the physical
consequences of the model. Nevertheless, the presence of such unphysical
quantities can serve to reduce the descriptive complexity of a model (or at
least to reduce the mathematical complexity, by making it easier to compute
with the model), and can thus be desirable from a descriptive viewpoint,
even though they are unappealing from a prescriptive one.
It is also possible to use mathematical abstraction to reduce the number
of unphysical quantities in a model; for instance, potential energy could
be viewed not as a scalar, but instead as a more abstract torsor. Again,
these mathematical manipulations do not fundamentally affect the physical
consequences of the model.
\[ b := P(V \text{ sells at } Y \mid V \text{ is dishonest}) \]
then after a bit of computation using Bayes' theorem, we find that
\[ P(V \text{ is honest} \mid V \text{ sells at } Y) = \frac{ap}{ap + b(1-p)}. \tag{10.2} \]
\[ \frac{(b-a)p(1-p)}{ap + b(1-p)}. \]
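A small numerical sketch may help in reading (10.2). The meanings of $p$ and $a$ are not spelled out in the lines above, so the code assumes, by analogy with the definition of $b$, that $p = P(V \text{ is honest})$ is the prior and $a = P(V \text{ sells at } Y \mid V \text{ is honest})$. Under that reading, the second displayed fraction equals the prior minus the posterior. The sample values are hypothetical.

```python
from fractions import Fraction


def posterior_honest(p, a, b):
    """Right-hand side of (10.2): P(V is honest | V sells at Y).

    Assumed meanings: p = prior P(V is honest),
    a = P(V sells at Y | V is honest), b = P(V sells at Y | V is dishonest).
    """
    return a * p / (a * p + b * (1 - p))


def drop_in_trust(p, a, b):
    """The fraction (b - a) p (1 - p) / (a p + b (1 - p))."""
    return (b - a) * p * (1 - p) / (a * p + b * (1 - p))


# Hypothetical numbers chosen only for illustration.
p, a, b = Fraction(9, 10), Fraction(1, 10), Fraction(1, 2)
post = posterior_honest(p, a, b)
print(post)
print(p - post == drop_in_trust(p, a, b))
```

Exact rational arithmetic is used so that the identity between the two expressions can be checked without rounding error.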
Bibliography
[Al2011] J. M. Aldaz, The weak type (1, 1) bounds for the maximal function associated
to cubes grow to infinity with the dimension, Ann. of Math. (2) 173 (2011), no. 2,
1013-1023.
[Al1974] F. Alexander, Compact and finite rank operators on subspaces of lp , Bull. London
Math. Soc. 6 (1974), 341-342.
[AmGe1973] W. O. Amrein, V. Georgescu, On the characterization of bound states and
scattering states in quantum mechanics, Helv. Phys. Acta 46 (1973/74), 635-658.
[Ba1966] A. Baker, Linear forms in the logarithms of algebraic numbers. I, Mathematika.
A Journal of Pure and Applied Mathematics 13 (1966), 204-216.
[Ba1967] A. Baker, Linear forms in the logarithms of algebraic numbers. II, Mathematika.
A Journal of Pure and Applied Mathematics 14 (1966), 102-107.
[Ba1967b] A. Baker, Linear forms in the logarithms of algebraic numbers. III, Mathematika. A Journal of Pure and Applied Mathematics 14 (1966), 220-228.
[BoSo1978] C. B
ohm, G. Sontacchi, On the existence of cycles of given length in integer
sequences like xn+1 = xn/2 if xn even, and xn+1 = 3xn + 1 otherwise, Atti Accad.
Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. 64 (1978), no. 3, 260-264.
[BoChLoSoVe2008] C. Borgs, J. Chayes, L. Lov
asz, V. S
os, K. Vesztergombi, Convergent
sequences of dense graphs. I. Subgraph frequencies, metric properties and testing, Adv.
Math. 219 (2008), no. 6, 1801-1851.
[Bo1985] J. Bourgain, Estimations de certaines fonctions maximales, C. R. Acad. Sci.
Paris Ser. I Math. 301 (1985), no. 10, 499-502.
[Bo1991] J. Bourgain, Besicovitch type maximal operators and applications to Fourier
analysis, Geom. Funct. Anal. 1 (1991), no. 2, 147-187.
[Bo2005] J. Bourgain, Estimates on exponential sums related to the Diffie-Hellman distributions, Geom. Funct. Anal. 15 (2005), no. 1, 1-34.
[BoSaZi2011] J. Bourgain, P. Sarnak, T. Ziegler, Disjointness of Mobius from horocycle
flows, preprint.
[BrGrGuTa2010] E. Breuillard, B. Green, R. Guralnick, T. Tao, Strongly dense free subgroups of semisimple algebraic groups, preprint.
233
234
Bibliography
preprint.
[Co2007] A. Comech,
Cotlar-Stein almost orthogonality lemma,
www.math.tamu.edu/ comech/papers/CotlarStein/CotlarStein.pdf
preprint.
[Ei1969] D. Eidus,
The principle of limiting amplitude, Uspehi Mat. Nauk 24 (1969), no.
3(147), 91-156.
[ElSz2012] G. Elek, B. Szegedy, A measure-theoretic approach to the theory of dense hypergraphs, Adv. Math. 231 (2012), no. 3-4, 1731-1772.
[ElSh2011] G. Elekes, M. Sharir, Incidences in three dimensions and distinct distances in
the plane, Combin. Probab. Comput. 20 (2011), no. 4, 571-608.
[ElObTa2010] J. Ellenberg, R. Oberlin, T. Tao, The Kakeya set and maximal conjectures
for algebraic varieties over finite fields, Mathematika 56 (2010), no. 1, 1-25.
[ElTa2011] C. Elsholtz, T. Tao, Counting the number of solutions to the Erdos-Straus
equation on unit fractions, preprint.
[En1973] P. Enflo, A counterexample to the approximation problem in Banach spaces, Acta
Math. 130 (1973), 309-317.
[En1978] V. Enss, Asymptotic completeness for quantum mechanical potential scattering.
I. Short range potentials, Comm. Math. Phys. 61 (1978), no. 3, 285-291.
P
[Er1952] P. Erd
os, On the sum xk=1 d(f (k)), J. London Math. Soc. 27 (1952), 7-15.
Bibliography
235
[Er1979] P. Erd
os, Some unconventional problems in number theory, Journees
Arithmetiques de Luminy (Colloq. Internat. CNRS, Centre Univ. Luminy, Luminy,
1978), pp. 73-82, Asterisque, 61, Soc. Math. France, Paris, 1979.
[Ka1940] P. Erdos, M. Kac, The Gaussian Law of Errors in the Theory of Additive Number
Theoretic Functions, American Journal of Mathematics 62 (1940), 738-742.
[Fe1971] C. Fefferman, The multiplier problem for the ball, Ann. of Math. (2) 94 (1971),
330-336.
[Fe1995] C. Fefferman, Selected theorems by Eli Stein, Essays on Fourier analysis in honor
of Elias M. Stein (Princeton, NJ, 1991), 135, Princeton Math. Ser., 42, Princeton
Univ. Press, Princeton, NJ, 1995.
[FeSt1972] C. Fefferman, E. Stein, H p spaces of several variables, Acta Math. 129 (1972),
no. 3-4, 137-193.
[Fu1977] H. Furstenberg, Ergodic behavior of diagonal measures and a theorem of Szemeredi on arithmetic progressions, J. Analyse Math. 31 (1977), 204256.
[Ga1981] L. Garner, On the Collatz 3n + 1 algorithm, Proc. Amer. Math. Soc. 82 (1981),
no. 1, 19-22.
[Ge1934] A. Gelfond, Sur le septieme Probleme de D. Hilbert,Comptes Rendus Acad. Sci.
URSS Moscou 2 (1934), 16.
[Go2008] W. T. Gowers, Quasirandom groups, Combin. Probab. Comput. 17 (2008), no.
3, 363-387.
[Gr2008] A. Granville, Smooth numbers: computational number theory and beyond, Algorithmic number theory: lattices, number fields, curves and cryptography, 267323,
Math. Sci. Res. Inst. Publ., 44, Cambridge Univ. Press, Cambridge, 2008.
[Gr1970] G. Greaves, On the divisor-sum problem for binary cubic forms, Acta Arith. 17
(1970) 1-28.
[GrRu2005] B. Green, I. Ruzsa, Sum-free sets in abelian groups, Israel J. Math. 147
(2005), 157-188.
[GrTa2012] B. Green, T. Tao, The M
obius function is strongly orthogonal to nilsequences,
Ann. of Math. (2) 175 (2012), no. 2, 541-566.
[Gr1955] A. Grothendieck, Produits tensoriels topologiques et espaces nucleaires, Mem.
Amer. Math. Soc. 1955 (1955), no. 16, 140 pp.
[Gu2011] C. Gunn, On the Homogeneous Model Of Euclidean Geometry, AGACSE (2011)
[GuKa2010] L. Guth, N. Katz, On the Erdos distinct distance problem in the plane,
preprint.
[Gu1988] R. Guy, The Strong Law of Small Numbers, American Mathematical Monthly
95 (1988), 697-712.
[Ha2010] Y. Hamidoune, Two Inverse results, preprint. arXiv:1006.5074
[HaRa1917] G. H. Hardy, S. Ramanujan, The normal number of prime factors of a number, Quarterly Journal of Mathematics 48 (1917), 76-92.
[He1983] J. Heintz, Definability and fast quantifier elimination over algebraically closed
fields, Theoret. Comput. Sci. 24 (1983), 239277.
[Ho1963] C. Hooley, On the number of divisors of a quadratic polynomial, Acta Math.
110 (1963), 97-114.
[Ho1991] B. Host, Mixing of all orders and pairwise independent joinings of systems with
singular spectrum, Israel J. Math. 76 (1991), no. 3, 289-298.
[Hr2012] E. Hrushovski, Stable group theory and approximate subgroups, J. Amer. Math.
Soc. 25 (2012), no. 1, 189-243.
236
Bibliography
[Hu2004] D. Husem
oller, Elliptic curves. Second edition. With appendices by Otto Forster,
Ruth Lawrence and Stefan Theisen. Graduate Texts in Mathematics, 111. SpringerVerlag, New York, 2004.
[IoRoRu2011] A. Iosevich, O. Roche-Newton, M. Rudnev, On an application of Guth-Katz
theorem, preprint.
[Ka1986] I. K
atai, A remark on a theorem of H. Daboussi. Acta Math. Hungar. 47 (1986),
no. 1-2, 223-225.
[Ka1984] S. Kalikow, Twofold mixing implies threefold mixing for rank one transformations, Ergodic Theory Dynam. Systems 4 (1984), no. 2, 237-259.
[Ka1965] T. Kato, Wave operators and similarity for some non-selfadjoint operators,
Math. Ann. 162 (1965/1966), 258-279.
[Ke1964] J. H. B. Kemperman, On products of sets in locally compact groups, Fund. Math.
56 (1964), 51-68.
[KnSt1971] A. Knapp, E. Stein, Intertwining operators for semisimple groups, Ann. of
Math. (2) 93 (1971), 489-578.
[Kn1953] M. Kneser, Absch
atzungen der asymptotischen Dichte von Summenmengen,
Math. Z 58 (1953), 459484.
[KrLa2003] I. Krasikov, J. Lagarias, Bounds for the 3x + 1 problem using difference inequalities, Acta Arith. 109 (2003), no. 3, 237-258.
[KrRa2010] S. Kritchman, R. Raz, The surprise examination paradox and the second incompleteness theorem, Notices Amer. Math. Soc. 57 (2010), no. 11, 1454-1458.
[La2009] J. Lagarias, Ternary expansions of powers of 2., J. Lond. Math. Soc. 79 (2009),
no. 3, 562-588.
[La1989] B. Landreau, A new proof of a theorem of van der Corput, Bull. London Math.
Soc. 21 (1989), no. 4, 366-368.
[Le1978] F. Ledrappier, Un champ markovien peut etre dentropie nulle et melangeant, C.
R. Acad. Sci. Paris Ser. A-B 287 (1978), no. 7, A561-A563.
[LeMa2005] G. Leonardi, S. Masnou, On the isoperimetric problem in the Heisenberg group
1H n , Ann. Mat. Pura Appl. (4) 184 (2005), no. 4, 533553.
[Li1973] W. Littman, Lp Lq -estimates for singular integral operators arising from hyperbolic equations, Partial differential equations (Proc. Sympos. Pure Math., Vol. XXIII,
Univ. California, Berkeley, Calif., 1971), pp. 479481. Amer. Math. Soc., Providence,
R.I., 1973.
[Lo1975] P. Loeb, Conversion from nonstandard to standard measure spaces and applications in probability theory, Trans. Amer. Math. Soc. 211 (1975), 113-122.
[LoSz2006] L. Lov
asz, B. Szegedy, Limits of dense graph sequences, J. Combin. Theory
Ser. B 96 (2006), no. 6, 933-957.
[Ma1953] A. M. Macbeath, On measure of sum sets. II. The sum-theorem for the torus,
Proc. Cambridge Philos. Soc. 49, (1953), 40-43.
[MaHu2008] C. R. MacCluer, A. Hull, A short proof of the Fredholm alternative, Int. J.
Pure Appl. Math. 45 (2008), no. 3, 379-381.
[Ma2010] K. Maples, Singularity of Random Matrices over Finite Fields, preprint.
arXiv:1012.2372
[Ma2012] L. Matthiesen, Correlations of the divisor function, Proc. Lond. Math. Soc. 104
(2012), 827-858.
Bibliography
237
238
Bibliography
Bibliography
239
[Ta2008] T. Tao, Structure and randomness: pages from year one of a mathematical blog,
American Mathematical Society, Providence RI, 2008.
[Ta2009] T. Tao, Poincares Legacies: pages from year two of a mathematical blog, Vol.
I, American Mathematical Society, Providence RI, 2009.
[Ta2009b] T. Tao, Poincares Legacies: pages from year two of a mathematical blog, Vol.
II, American Mathematical Society, Providence RI, 2009.
[Ta2010] T. Tao, An epsilon of room, Vol. I, American Mathematical Society, Providence
RI, 2010.
[Ta2010b] T. Tao, An epsilon of room, Vol. II, American Mathematical Society, Providence RI, 2010.
[Ta2011] T. Tao, An introduction to measure theory, American Mathematical Society,
Providence RI, 2011.
[Ta2011b] T. Tao, Higher order Fourier analysis, American Mathematical Society, Providence RI, 2011.
[Ta2011c] T. Tao, Topics in random matrix theory, American Mathematical Society, Providence RI, 2011.
[Ta2011d] T. Tao, Compactness and contradiction, American Mathematical Society, Providence RI, 2011.
[Ta2012] T. Tao, Hilberts fifth problem and related topics, in preparation.
[Ta2012b] T. Tao, Noncommutative sets of small doubling, preprint.
[TaVu2006] T. Tao, V. Vu, Additive combinatorics, Cambridge University Press, 2006.
[Te1976] R. Terras, A stopping time problem on the positive integers, Acta Arith. 30
(1976), no. 3, 241-252.
[Th1965] R. Thom, Sur lhomologie des varietes algebriques reelles, 1965 Differential and
Combinatorial Topology (A Symposium in Honor of Marston Morse) pp. 255-265
Princeton Univ. Press, Princeton, N.J.
Index
$TT^*$ identity, 77
approximation property, 56
argumentum ad ignorantiam, 1
asymptotic notation, x
atomic proposition, 13
Baker's theorem, 132
Bézout's inequality, 202
Bézout's theorem, 189, 201
Bochner-Riesz operator, 110
Borel-Cantelli lemma (heuristic), 2
Brunn-Minkowski inequality, 207
Cartan subgroup, 42
Cayley-Bacharach theorem, 190
cell decomposition, 48
charge current, 115
classical Lie group, 41
cocycle, 214
Collatz conjecture, 143
common knowledge, 25
complete measure space, 97
completeness (logic), 15
completeness theorem, 16
Cotlar-Stein lemma, 77
deduction theorem, 14
deductive theory, 16
descriptive activity, 225
Dirichlet hyperbola method, 155
Dirichlet series, 152
Dirichlet's theorem on Diophantine approximation, 132
isogeny, 43
isoperimetric inequality, 207
Kepler's third law, 226
Kleinian geometry, 195
knowledge agent, 17
Kripke model, 24
Landau's conjecture, 4
Laplacian, 109
law of the excluded middle, 13
limiting absorption principle, 114
limiting amplitude principle, 121
local smoothing, 119
local-to-global principle (heuristic), 2
Loeb measure, 101
measure space, 97
memory axiom, 27
Mertens' theorem, 158
modus ponens, 13
multiplicative function, 152
negative introspection rule, 22
Nikishin-Stein factorisation theorem, 91
Notation, x
Pappus' theorem, 191
Pascal's theorem, 192
polynomial ham sandwich theorem, 47
polynomial method, 134
positive introspection rule, 22
Prékopa-Leindler inequality, 208
pre-measure, 101
prescriptive activity, 225
principle of indifference, 2
propositional logic, 13
quaternions, 198
RAGE theorem, 120
random rotations trick, 90
random sums trick, 90
rank of a Lie group, 42
regular sequence, 206
resolvent, 110
Riesz lemma, 58
Riesz-Thorin interpolation theorem, 70
Rohlin's problem, 218
Schinzel's hypothesis H, 4
Schrödinger propagator, 110
Schur's test, 74
semantics, 12