Probability Theory and Stochastic Processes With Applications
Oliver Knill
Overseas Press
Head Office:
Overseas Press India Private Limited
7/28, Ansari Road, Daryaganj
New Delhi-110 002
Email: [email protected]
Website: www.overseaspub.com
Sales Office:
Overseas Press India Private Limited
2/15, Ansari Road, Daryaganj
New Delhi-110 002
Email: [email protected]
Website: www.overseaspub.com
Edition : 2009
Published by Narinder Kumar Lijhara for Overseas Press India Private Limited,
7/28, Ansari Road, Daryaganj, New Delhi-110002 and Printed in India.
Contents
Preface 3
1 Introduction 5
1.1 What is probability theory? 5
1.2 Some paradoxes in probability theory 12
1.3 Some applications of probability theory 16
2 Limit theorems 23
2.1 Probability spaces, random variables, independence 23
2.2 Kolmogorov's 0-1 law, Borel-Cantelli lemma 34
2.3 Integration, Expectation, Variance 39
2.4 Results from real analysis 42
2.5 Some inequalities 44
2.6 The weak law of large numbers 50
2.7 The probability distribution function 56
2.8 Convergence of random variables 59
2.9 The strong law of large numbers 64
2.10 Birkhoff's ergodic theorem 68
2.11 More convergence results 72
2.12 Classes of random variables 78
2.13 Weak convergence 90
2.14 The central limit theorem 92
2.15 Entropy of distributions 98
2.16 Markov operators 107
2.17 Characteristic functions 110
2.18 The law of the iterated logarithm 117
3.9 The arc-sin law for the 1D random walk 167
3.10 The random walk on the free group 171
3.11 The free Laplacian on a discrete group 175
3.12 A discrete Feynman-Kac formula 179
3.13 Discrete Dirichlet problem 181
3.14 Markov processes 186
5 Selected Topics 275
5.1 Percolation 275
5.2 Random Jacobi matrices 286
5.3 Estimation theory 292
5.4 Vlasov dynamics 298
5.5 Multidimensional distributions 306
5.6 Poisson processes 311
5.7 Random maps 316
5.8 Circular random variables 319
5.9 Lattice points near Brownian paths 327
5.10 Arithmetic random variables 333
5.11 Symmetric Diophantine Equations 343
5.12 Continuity of random variables 349
Preface
Having been online for many years on my personal web sites, the text was
reviewed, corrected and indexed in the summer of 2006. It obtained some
enhancements which benefited from other teaching notes and research I did
while teaching probability theory at the University of Arizona in Tucson,
and while incorporating probability into calculus courses at Caltech and
Harvard University.
The last chapter, "Selected topics", was considerably extended in the summer
of 2006. While the original course included only localization and percolation
problems, I added other topics like estimation theory, Vlasov dynamics,
multi-dimensional moment problems, random maps, circle-valued random
variables, the geometry of numbers, Diophantine equations and harmonic
analysis. Some of this material is related to research I got interested in
over time.
I would like to get feedback from readers. I plan to keep this text alive and
update it in the future. You can send feedback to [email protected] and
also indicate in the email if you do not want your feedback to be acknowledged
in an eventual future edition of these notes.
To get a more detailed and analytic exposure to probability, the students
of the original course consulted the book [105], which contains much more
material than covered in class. Since my course was taught, many other
books have appeared. Examples are [21, 34].
For a less analytic approach, see [40, 91, 97] or the still excellent classic
[26]. For an introduction to martingales, we recommend [108] and [47], from
both of which these notes have benefited a lot and to which the students
of the original course had access too.
For Brownian motion, we refer to [73, 66]; for stochastic processes to [17];
for stochastic differential equations to [2, 55, 76, 66, 46]; for random walks
to [100]; for Markov chains to [27, 87]; for entropy and Markov operators
to [61]. For applications in physics and chemistry, see [106].
For the selected topics, we followed [32] in the percolation section. The
books [101, 30] contain introductions to Vlasov dynamics. The book [1]
gives an introduction to the moment problem, and [75, 64] to circle-valued
random variables; for Poisson processes, see [49, 9]; for the geometry of
numbers and for Fourier series on fractals, see [45].
The book [109] contains examples which challenge the theory with
counterexamples. [33, 92, 70] are sources for problems with solutions.
Oliver Knill
Chapter 1
Introduction
Given a probability space (Ω, A, P), one can define random variables X. A
random variable is a function X from Ω to the real line R which is
measurable in the sense that the inverse image of every measurable Borel set
B in R is in A. The interpretation is that if ω is an experiment, then X(ω)
measures an observable quantity of the experiment. The technical condition of
measurability resembles the notion of continuity for a function f from a
topological space (Ω, O) to the topological space (R, U). A function is
continuous if f⁻¹(U) ∈ O for all open sets U ∈ U. In probability theory,
where functions are often denoted with capital letters, like X, Y, ..., a
random variable X is measurable if X⁻¹(B) ∈ A for all Borel sets B ∈ B. Any
continuous function is measurable for the Borel σ-algebra. As in calculus,
where one does not have to worry about continuity most of the time, in
probability theory one often does not have to sweat over measurability
issues. Indeed, one could suspect that notions like σ-algebras or measurability
were introduced by mathematicians to scare normal folks away from their
realms. This is not the case. Serious issues are avoided with those
constructions. Mathematics is eternal: a once established result will still be
true in thousands of years. A theory in which one could prove a theorem as
well as its negation would be worthless: it would formally allow one to prove
any other result, whether true or false. So, these notions are not only
introduced to keep the theory "clean"; they are essential for the "survival"
of the theory.
We give some examples of "paradoxes" to illustrate the need for building
a careful theory. Back to the fundamental notion of random variables:
because they are just functions, one can add and multiply them by defining
(X + Y)(ω) = X(ω) + Y(ω) or (XY)(ω) = X(ω)Y(ω). Random variables
so form an algebra L. The expectation of a random variable X is denoted
by E[X] if it exists. It is a real number which indicates the "mean" or
"average" of the observation X. It is the value one would expect to measure in
the experiment. If X = 1_B is the random variable which has the value 1 if
ω is in the event B and 0 if ω is not in the event B, then the expectation of
X is just the probability of B. The constant random variable X(ω) = a has
the expectation E[X] = a. These two basic examples as well as the linearity
requirement E[aX + bY] = aE[X] + bE[Y] determine the expectation for all
random variables in the algebra L: first one defines expectation for finite
sums ∑_{i=1}^n a_i 1_{B_i}, called elementary random variables, which
approximate general measurable functions. Extending the expectation to a
subset L¹ of the entire algebra is part of integration theory. While in
calculus one can live with the Riemann integral on the real line, which
defines the integral by Riemann sums ∫_a^b f(x) dx ∼ (1/n) ∑_{i/n ∈ [a,b]}
f(i/n), the integral defined in measure theory is the Lebesgue integral. The
latter is more fundamental, and probability theory is a major motivator for
using it. It allows one to make statements like: the set of real numbers with
periodic decimal expansion has probability 0. In general, the probability of
A is the expectation of the random variable X(x) = f(x) = 1_A(x). In
calculus, the integral ∫_0^1 f(x) dx would not be defined, because a Riemann
integral can give 1 or 0 depending on how the Riemann approximation is done.
Probability theory allows one to introduce the Lebesgue integral by defining
∫_a^b f(x) dx as the limit of (1/n) ∑_{i=1}^n f(x_i) for n → ∞, where x_i
are random uniformly distributed points in the interval [a, b]. This Monte
Carlo definition of the Lebesgue integral is based on the law of large
numbers and is as intuitive to state as the Riemann integral, which is the
limit of (1/n) ∑_{j/n ∈ [a,b]} f(j/n) for n → ∞.
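This Monte Carlo definition is easy to try numerically; a minimal sketch in Python (the integrand x², the sample size and the seed are arbitrary illustrative choices):

```python
import random

def monte_carlo_integral(f, a, b, n=100_000, seed=0):
    """Estimate the integral of f over [a, b] by averaging f at n
    uniformly distributed random points, as in the law-of-large-numbers
    definition of the Lebesgue integral."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# The integral of x^2 over [0, 1] is 1/3; the estimate fluctuates
# around that value with error of order 1/sqrt(n).
estimate = monte_carlo_integral(lambda x: x * x, 0.0, 1.0)
```

For the indicator function of the rationals, the same estimator returns 0 almost surely, matching the Lebesgue answer rather than the undefined Riemann one.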
With the fundamental notion of expectation, one can define the variance
Var[X] = E[X²] − E[X]² and the standard deviation σ[X] = √Var[X] of a
random variable X for which X² ∈ L¹. One can also look at the covariance
Cov[X, Y] = E[XY] − E[X]E[Y] of two random variables X, Y for which
X², Y² ∈ L¹. The correlation Corr[X, Y] = Cov[X, Y]/(σ[X]σ[Y]) of two
random variables with positive variance is a number which tells how much
the random variable X is related to the random variable Y. If E[XY] is
interpreted as an inner product, then the standard deviation is the length
of X − E[X], and the correlation has the geometric interpretation as cos(α),
where α is the angle between the centered random variables X − E[X] and
Y − E[Y]. For example, if Corr[X, Y] = 1, then Y = λX for some λ > 0; if
Corr[X, Y] = −1, they are anti-parallel. If the correlation is zero, the
geometric interpretation is that the two random variables are perpendicular.
Decorrelated random variables can still have relations to each other, but if
for any measurable real functions f and g the random variables f(X) and
g(Y) are uncorrelated, then the random variables X, Y are independent.
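The moment formulas above translate directly into code; a small sketch with made-up data, chosen so that Y = 2X and the correlation must come out as 1:

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    # Cov[X,Y] = E[XY] - E[X]E[Y]
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)

def correlation(xs, ys):
    # Corr[X,Y] = Cov[X,Y] / (sigma[X] sigma[Y])
    sigma_x = math.sqrt(covariance(xs, xs))  # Var[X] = Cov[X,X]
    sigma_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sigma_x * sigma_y)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # ys = 2*xs, a positive linear relation
```

Since Y = λX with λ = 2 > 0, the geometric picture above predicts cos(α) = 1.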
The law μ_X of the random variable is a probability measure on the real
line satisfying μ_X((a, b]) = F_X(b) − F_X(a). By the Lebesgue decomposition
theorem, one can decompose any measure μ into a discrete part μ_pp, an
absolutely continuous part μ_ac and a singular continuous part μ_sc. Random
variables X for which μ_X is a discrete measure are called discrete random
variables; random variables with a continuous law are called continuous
random variables. Traditionally, these two types of random variables are
the most important ones. But singular continuous random variables appear
too: in spectral theory, dynamical systems or fractal geometry. Of course,
the law of a random variable X does not need to be pure. It can mix the
three types. A random variable can be mixed discrete and continuous, for
example.
While one can realize every discrete time stochastic process Xₙ by a measure-
preserving transformation T : Ω → Ω and Xₙ(ω) = X(Tⁿ(ω)), probability
theory often focuses on a special subclass of systems called martingales,
where one has a filtration Aₙ ⊂ Aₙ₊₁ of σ-algebras such that Xₙ is Aₙ-
measurable and E[Xₙ | Aₙ₋₁] = Xₙ₋₁, where E[Xₙ | Aₙ₋₁] is the conditional
expectation with respect to the subalgebra Aₙ₋₁. Martingales are a powerful
generalization of the random walk, the process of summing up IID random
variables with zero mean. Like ergodic theory, martingale theory is a natural
extension of probability theory and has many applications.
The language of probability fits well into the classical theory of dynamical
systems. For example, the ergodic theorem of Birkhoff for measure-preserving
transformations has as a special case the law of large numbers, which
describes the average of partial sums of random variables (1/n) ∑_{k=1}^n X_k.
There are different versions of the law of large numbers. "Weak laws"
make statements about convergence in probability, "strong laws" make
statements about almost everywhere convergence. There are versions of
the law of large numbers for which the random variables do not need to
have a common distribution and which go beyond Birkhoff's theorem. Another
important theorem is the central limit theorem, which shows that
Sₙ = X₁ + X₂ + ⋯ + Xₙ normalized to have zero mean and variance 1
converges in law to the normal distribution, or the law of the iterated
logarithm, which says that for centered independent and identically
distributed X_k, the scaled sum Sₙ/Λₙ has accumulation points in the
interval [−σ, σ] if Λₙ = √(2n log log n) and σ is the standard deviation of
X_k. While stating the weak and strong law of large numbers and the central
limit theorem, different convergence notions for random variables appear:
almost sure convergence is the strongest; it implies convergence in
probability, and the latter implies convergence in law. There is also
L^p-convergence, which is stronger than convergence in probability.
In the same way as mathematics reaches out into other scientific areas,
probability theory has connections with many other branches of mathematics.
The last chapter of these notes gives some examples. The section
on percolation shows how probability theory can help to understand critical
phenomena. In solid state physics, one considers operator-valued random
variables. The spectra of random operators are random objects too.
One is interested in what happens with probability one. Localization is the
phenomenon in solid state physics that sufficiently random operators often
have pure point spectrum. The section on estimation theory gives a
glimpse of what mathematical statistics is about. In statistics one often
does not know the probability space itself, so that one has to make a
statistical model and look at a parameterization of probability spaces. The
goal is to give maximum likelihood estimates for the parameters from data and
to understand how small the quadratic estimation error can be made. A
section on Vlasov dynamics shows how probability theory appears in problems
of geometric evolution. Vlasov dynamics is a generalization of the
n-body problem to the evolution of probability measures. One can look
at the evolution of smooth measures or measures located on surfaces. This
deterministic stochastic system produces an evolution of densities which
can form singularities without doing harm to the formalism. It also defines
the evolution of surfaces. The section on moment problems is part of
multivariate statistics. As for random variables, random vectors can be
described by their moments. Since moments define the law of the random
variable, the question arises how one can see from the moments whether we
have a continuous random variable. The section on random maps is another
part of dynamical systems theory. Randomized versions of diffeomorphisms can
be considered idealizations of their undisturbed versions. They often can
be understood better than their deterministic versions. For example, many
random diffeomorphisms have only finitely many ergodic components. In
the section on circular random variables, we see that the von Mises
distribution has extremal entropy among all circle-valued random variables
with given circular mean and variance. There is also a central limit theorem
on the circle: the sum of IID circular random variables converges in law
to the uniform distribution. We then look at a problem in the geometry
of numbers: how many lattice points are there in a neighborhood of the
graph of one-dimensional Brownian motion? The analysis of this problem
needs a law of large numbers for independent random variables X_k with
uniform distribution on [0, 1]: for 0 < δ < 1 and Aₙ = [0, 1/n^δ], one has
lim_{n→∞} (1/n^{1−δ}) ∑_{k=1}^n 1_{Aₙ}(X_k) = 1. Probability theory also
matters in complexity theory, as a section on arithmetic random variables
shows. It turns out that random variables like Xₙ(k) = k, Yₙ(k) = k² + 3
mod n defined on finite probability spaces become independent in the limit
n → ∞. Such considerations matter in complexity theory: arithmetic functions
defined on large but finite sets behave very much like random functions. This
is reflected by the fact that inverting arithmetic functions is in general
difficult and belongs to the complexity class NP. Indeed, if one could invert
arithmetic functions easily, one could solve problems like factoring integers
fast. A short section on Diophantine equations indicates how the distribution
of random variables can shed light on the solution of Diophantine equations.
Finally, we look at a topic in harmonic analysis which was initiated by
Norbert Wiener. It deals with the relation between the characteristic
function φ_X and the continuity properties of the random variable X.
First answer: take an arbitrary point P on the boundary of the disc. The
set of all lines through that point is parameterized by an angle θ. In order
that the chord is longer than √3, the line has to lie within a sector of 60°
within a range of 180°. The probability is 1/3.
Second answer: take all lines perpendicular to a fixed diameter. The chord
is longer than √3 if the point of intersection lies on the middle half of the
diameter. The probability is 1/2.
Third answer: if the midpoint of the chord lies in a disc of radius 1/2, the
chord is longer than √3. Because this disc has a radius which is half the
radius of the unit disc, the probability is 1/4.
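All three answers can be reproduced by simulating the three sampling procedures; a sketch (sample size and seed are arbitrary choices; for the first answer, picking two uniform endpoints is equivalent by rotational symmetry to fixing one endpoint):

```python
import math
import random

rng = random.Random(1)
N = 100_000
L = math.sqrt(3)  # side length of the inscribed equilateral triangle

# First answer: two uniform endpoints on the unit circle; the chord
# between angles t1, t2 has length 2*sin(|t1 - t2| / 2).
a = sum(
    2 * math.sin(abs(rng.uniform(0, 2 * math.pi)
                     - rng.uniform(0, 2 * math.pi)) / 2) > L
    for _ in range(N)) / N

# Second answer: chord perpendicular to a diameter, with a uniform
# intersection point d in [-1, 1]; chord length is 2*sqrt(1 - d^2).
b = sum(2 * math.sqrt(1 - rng.uniform(-1, 1) ** 2) > L
        for _ in range(N)) / N

# Third answer: uniform midpoint in the disc (rejection sampling);
# the chord is longer than sqrt(3) iff the midpoint has |m| < 1/2.
hits = 0
for _ in range(N):
    x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    while x * x + y * y > 1:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
    hits += x * x + y * y < 0.25
c = hits / N
```

The three estimates cluster near 1/3, 1/2 and 1/4, making the dependence on the sampling procedure tangible.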
The paradox is that nobody would agree to pay even an entrance fee c = 10.
The problem with this casino is that it is not quite clear what is "fair".
For example, the situation T = 20 is so improbable that it never occurs
in the lifetime of a person. Therefore, for any practical purpose, one does
not have to worry about large values of T. This, as well as the finiteness of
money resources, is the reason why casinos do not have to worry about the
following bulletproof martingale strategy in roulette: bet c dollars on red.
If you win, stop; if you lose, bet 2c dollars on red. If you win, stop. If you
lose, bet 4c dollars on red. Keep doubling the bet. Eventually, after n steps,
red will occur and you will win 2ⁿc − (c + 2c + ⋯ + 2ⁿ⁻¹c) = c dollars.
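The doubling strategy is easy to simulate. The sketch below assumes even-money payouts and a win probability of 18/37, as on a European wheel (an assumption; the text does not fix the odds). Every completed round nets exactly c, while the stake that was needed can grow large:

```python
import random

def double_until_red(c=1, p_red=18/37, rng=None):
    """Play the doubling ('martingale') strategy once: bet c on red,
    double the stake after each loss, stop at the first win.
    Returns (net profit, largest stake that was placed)."""
    rng = rng or random.Random()
    stake, spent = c, 0
    while True:
        spent += stake
        if rng.random() < p_red:      # red comes up: even-money win
            return stake * 2 - spent, stake
        stake *= 2                    # lost: double and try again

rng = random.Random(7)
results = [double_until_red(rng=rng) for _ in range(1000)]
```

The profit is always c because winning with stake s after spending 2s − c returns 2s; the catch, as the text notes, is that the stakes are unbounded while real bankrolls are not.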
This example motivates the concept of martingales. Theorem (3.2.7) or
proposition (3.2.9) will shed some light on this. Back to the Petersburg
paradox: how does one resolve it? What would be a reasonable entrance
fee in "real life"? Bernoulli proposed to replace the expectation E[G] of the
profit G = 2^T with the expectation (E[√G])², where u(x) = √x is called a
utility function. This would lead to a fair entrance fee
(E[√G])² = (∑_{k=1}^∞ 2^{−k/2})² = (1 + √2)² ≈ 5.83 dollars.
It is not so clear if that is a way out of the paradox, because for any
proposed utility function u(k), one can modify the casino rule so that the
paradox reappears: pay (2^k)² if the utility function is u(k) = √k, or pay
e^{2^k} dollars if the utility function is u(k) = log(k). Such reasoning
plays a role in economics and the social sciences.
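For the utility function u(x) = √x, the expected utility E[√G] = ∑_k 2^{−k} √(2^k) = ∑_k 2^{−k/2} is a geometric series, which can be checked numerically (truncating at 200 terms is an arbitrary choice, more than enough for machine precision):

```python
import math

# P[T = k] = 2^(-k) and G = 2^T, so
# E[sqrt(G)] = sum over k >= 1 of 2^(-k) * 2^(k/2) = sum of 2^(-k/2),
# a geometric series summing to 1/(sqrt(2) - 1) = 1 + sqrt(2).
expected_utility = sum(2 ** (-k / 2) for k in range(1, 200))

# Squaring gives the finite "fair" entrance fee in Bernoulli's sense.
fair_fee = expected_utility ** 2
```

In contrast, the raw expectation E[G] = ∑_k 2^{−k}·2^k = ∑_k 1 diverges, which is the paradox itself.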
3) The three door problem (1991): Suppose you're on a game show and
you are given the choice of three doors. Behind one door is a car and behind
the others are goats. You pick a door, say No. 1, and the host, who knows
what's behind the doors, opens another door, say No. 3, which has a goat.
(In all games, he opens a door to reveal a goat.) He then says to you, "Do
you want to pick door No. 2?" (In all games he always offers an option to
switch.) Is it to your advantage to switch your choice?
The problem is also called the "Monty Hall problem" and was discussed by
Marilyn vos Savant in a "Parade" column in 1991, where it provoked a big
controversy. (See [98] for pointers and similar examples.) The problem is
that intuitive argumentation can easily lead to the conclusion that it does
not matter whether to change the door or not. In fact, switching the door
doubles the chances to win:
No switching: you choose a door and win with probability 1/3. The opening
of a door by the host does not affect your choice any more.
Switching: if you choose the door with the car, you lose when you switch.
If you choose a door with a goat, the host opens the other door with a goat
and you win by switching. There are two such cases where you win. The
probability to win is 2/3.
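A simulation makes the 2/3 advantage concrete; a minimal sketch (sample size and seed are arbitrary):

```python
import random

def monty_hall(switch, rng):
    """Play one game; return True if the final choice wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    choice = rng.choice(doors)
    # The host opens a door that is neither the chosen one nor the car.
    opened = rng.choice([d for d in doors if d != choice and d != car])
    if switch:
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == car

rng = random.Random(0)
n = 100_000
stay_rate = sum(monty_hall(False, rng) for _ in range(n)) / n
switch_rate = sum(monty_hall(True, rng) for _ in range(n)) / n
```

The two empirical rates settle near 1/3 and 2/3, matching the case analysis above.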
But there is no real number p = P[A] which makes this possible. Both the
Banach-Tarski as well as Vitali's result show that one cannot hope to define
a probability space on the algebra A of all subsets of the unit ball or the
unit circle such that the probability measure is translation and rotation
invariant. The natural concepts of "length" or "volume", which are rotation
and translation invariant, only make sense for a smaller algebra. This will
lead to the notion of σ-algebra. In the context of topological spaces like
Euclidean spaces, it leads to Borel σ-algebras, algebras of sets generated
by the compact sets of the topological space. This language will be developed
in the next chapter.
1.3 Some applications of probability theory
Probability theory is a central topic in mathematics. There are close
relations and intersections with other fields like computer science, ergodic
theory and dynamical systems, cryptology, game theory, analysis, partial
differential equations, mathematical physics, economic sciences, statistical
mechanics and even number theory. As a motivation, we give some problems
and topics which can be treated with probabilistic methods.
Figure. A random walk in one dimension, displayed as a graph (t, Bₜ).
Figure. A piece of a random walk in two dimensions.
Figure. A piece of a random walk in three dimensions.
Even more general percolation problems are obtained if also the distribution
of the random variables X_{n,m} can depend on the position (n, m).
Consider the linear map Lu(n) = ∑_{|m−n|=1} u(m) + V(n)u(n) on the space
of sequences u = (..., u₋₂, u₋₁, u₀, u₁, u₂, ...). We assume that V(n) takes
random values in {0, 1}. The function V is called the potential. The problem
is to determine the spectrum or spectral type of the infinite matrix L on
the Hilbert space l² of all sequences u with finite ||u||² = ∑_{n=−∞}^∞ u_n².
The operator L is the Hamiltonian of an electron in a one-dimensional
disordered crystal. The spectral properties of L have a relation with the
conductivity properties of the crystal. Of special interest is the situation
where the values V(n) are all independent random variables. It turns out
that if the V(n) are IID random variables with a continuous distribution,
there are many eigenvalues for the infinite dimensional matrix L, at least
with probability 1. This phenomenon is called localization.
The simplest method is to try to find the factors by trial and error, but
this is impractical already if N has 50 digits: we would have to search
through 10²⁵ numbers to find the factor p. This corresponds to probing 100
million times
every second over a time span of 15 billion years. There are better methods
known, and we want to illustrate one of them now: assume we want to find
the factors of N = 11111111111111111111111111111111111111111111111.
The method goes as follows: start with an integer a and iterate the quadratic
map T(x) = x² + c mod N on {0, 1, ..., N − 1}. If we assume the numbers
x₀ = a, x₁ = T(a), x₂ = T(T(a)), ... to be random, how many such numbers
do we have to generate until two of them are the same modulo one of the
prime factors p? The answer is surprisingly small and based on the birthday
paradox: the probability that in a group of 23 students two of them have the
same birthday is larger than 1/2: the probability of the event that we have
no birthday match is 1 · (364/365)(363/365) ⋯ (343/365) = 0.492703..., so
that the probability of a birthday match is 1 − 0.492703 = 0.507297. This
is larger than 1/2. If we apply this thinking to the sequence of numbers
x_i generated by the pseudo random number generator T, then we expect
to have a chance of 1/2 of finding a match modulo p in √p iterations.
Because p ≤ √N, we have to try N^{1/4} numbers to get a factor: if x_n and
x_m are the same modulo p, then gcd(x_n − x_m, N) produces the factor p of
N. In the above example of the 46 digit number N, there is a prime factor
p = 35121409. The Pollard algorithm finds this factor with probability 1/2
in √p ≈ 5926 steps. This is an estimate only, which gives the order of
magnitude. With the above N, if we start with a = 11 and take c = 3, then we
have a match x_{27720} = x_{13860}. It can be found very fast.
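The method described here is Pollard's ρ algorithm. A compact sketch with Floyd cycle detection, applied to a small semiprime for speed (the starting value a and constant c are arbitrary choices, as in the text):

```python
import math

def pollard_rho(n, a=2, c=1):
    """Find a nontrivial factor of n by iterating T(x) = x^2 + c mod n.
    A collision modulo an unknown prime factor p is detected through
    gcd(x_i - x_2i, n), using a 'tortoise' x and a 'hare' y."""
    x = y = a
    while True:
        x = (x * x + c) % n            # tortoise: one step
        y = (y * y + c) % n
        y = (y * y + c) % n            # hare: two steps
        d = math.gcd(abs(x - y), n)
        if d == n:                     # full cycle, no factor: restart
            return pollard_rho(n, a + 1, c)
        if d > 1:
            return d

factor = pollard_rho(8051)             # 8051 = 83 * 97
```

The expected running time of order n^{1/4} comes exactly from the birthday-paradox estimate of √p iterations described above.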
A stick of length 2 is thrown onto the plane filled with parallel lines, all
of which are distance d = 2 apart. If the center of the stick falls within
distance y of a line, then the interval of angles leading to an intersection
with a grid line has length 2 arccos(y) among a possible range of angles
Chapter 2
Limit theorems
(i) Ω ∈ A,
(ii) A ∈ A ⇒ A^c = Ω \ A ∈ A,
(iii) Aₙ ∈ A ⇒ ⋃ₙ Aₙ ∈ A.
A pair (Ω, A) for which A is a σ-algebra in Ω is called a measurable space.
Definition. For any set C of subsets of Ω, we can define σ(C), the smallest
σ-algebra A which contains C. The σ-algebra σ(C) is the intersection of all
σ-algebras which contain C. It is again a σ-algebra.
Remark. One sometimes defines the Borel σ-algebra as the σ-algebra
generated by the set of compact sets C of a topological space. Compact sets
in a topological space are sets for which every open cover has a finite
subcover. In Euclidean spaces Rⁿ, where compact sets coincide with the sets
which are both bounded and closed, the Borel σ-algebra generated by the
compact sets is the same as the one generated by the open sets. The two
definitions agree for a large class of topological spaces like "locally
compact separable metric spaces".
Example. If Ω = [0, 1] × [0, 1] is the unit square and C is the set of all
sets of the form [0, 1] × [a, b] with 0 < a < b < 1, then σ(C) is the
σ-algebra of all sets of the form [0, 1] × A, where A is in the Borel
σ-algebra of [0, 1].
Remark. There are different ways to build the axioms for a probability
space. One could for example replace (i) and (ii) with properties 4), 5) in
the above list. Statement 6) is equivalent to σ-additivity if P is only
assumed to be additive.
P[A] = (1/π) ∫∫_A e^{−x²−y²} dx dy.
Any continuous function X of two variables is a random variable on Ω. For
example, X(x, y) = xy(x + y) is a random variable. But also X(x, y) =
1/(x + y) is a random variable, even though it is not continuous. The
vector-valued function X(x, y) = (x, y, x³) is an example of a random vector.
X⁻¹(B) = {X⁻¹(B) | B ∈ B}.
We denote this algebra by σ(X) and call it the σ-algebra generated by X.
Example. The map X(x, y) = x from the square Ω = [0, 1] × [0, 1] to the
real line R defines the algebra B = {A × [0, 1]}, where A is in the Borel
σ-algebra of the interval [0, 1].
Example. We throw two fair dice. Let A be the event that the first die shows
6 and let B be the event that the sum of the two dice is 11. Because P[B] =
2/36 = 1/18 and P[A ∩ B] = 1/36 (we need to throw a 6 and then a 5),
we have P[A|B] = (1/36)/(1/18) = 1/2. The interpretation is that since
we know that the event B happens, we have only two possibilities: (5, 6)
or (6, 5). Of these, only the second is compatible with the event A.
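The computation can be checked by brute-force enumeration over the 36 equally likely outcomes:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))    # 36 ordered pairs
A = [(i, j) for i, j in outcomes if i == 6]        # first die shows 6
B = [(i, j) for i, j in outcomes if i + j == 11]   # sum is 11
AB = [o for o in A if o in B]

p_given = Fraction(len(AB), len(B))   # P[A | B] = P[A ∩ B] / P[B]
```

Since all outcomes are equally likely, conditional probability reduces to a ratio of counts.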
Exercise. a) Verify that the Sicherman dice with faces (1, 3, 4, 5, 6, 8) and
(1, 2, 2, 3, 3, 4) have the property that the probability of getting the
value k is the same as with a pair of standard dice. For example, the
probability to get 5 with the Sicherman dice is 4/36 because the four cases
(1, 4), (3, 2), (3, 2), (4, 1) lead to a sum 5. Also for the standard dice, we
have four cases (1, 4), (2, 3), (3, 2), (4, 1).
b) Three dice A, B, C are called non-transitive if the probability that A >
B is larger than 1/2, the probability that B > C is larger than 1/2 and the
probability that C > A is larger than 1/2. Verify the non-transitivity
property for A = (1, 4, 4, 4, 4, 4), B = (3, 3, 3, 3, 3, 6) and
C = (2, 2, 2, 5, 5, 5).
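Both parts of the exercise can be verified by enumeration; a sketch:

```python
from collections import Counter
from itertools import product

def sum_distribution(die1, die2):
    """Count, for each possible sum, how many of the 36 ordered
    outcomes produce it."""
    return Counter(a + b for a, b in product(die1, die2))

standard = [1, 2, 3, 4, 5, 6]
sicherman1 = [1, 3, 4, 5, 6, 8]
sicherman2 = [1, 2, 2, 3, 3, 4]

# a) the sum has the same distribution for both pairs of dice
same = (sum_distribution(standard, standard)
        == sum_distribution(sicherman1, sicherman2))

def beats(die1, die2):
    """Fraction of the 36 outcomes in which die1 shows more than die2."""
    return sum(a > b for a, b in product(die1, die2)) / 36

# b) the non-transitive triple from the exercise
A = [1, 4, 4, 4, 4, 4]
B = [3, 3, 3, 3, 3, 6]
C = [2, 2, 2, 5, 5, 5]
```

Enumeration gives P[A > B] = 25/36 and P[B > C] = P[C > A] = 21/36, all above 1/2.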
1) P[A|B] ≥ 0.
2) P[A|A] = 1.
3) P[A|B] + P[A^c|B] = 1.
4) P[A ∩ B|C] = P[A|C] · P[B|A ∩ C].
Theorem 2.1.1 (Bayes rule). Given a finite partition {A₁, ..., Aₙ} in A and
B ∈ A with P[B] > 0, one has
P[A_i|B] = P[B|A_i]P[A_i] / ∑_{j=1}^n P[B|A_j]P[A_j].
Example. The Girl-Boy problem: "Dave has two children. One child is a boy.
What is the probability that the other child is a girl?"
Most people would intuitively say 1/2 because the second event looks
independent of the first. However, it is not, and the initial intuition is
misleading. Here is the solution: first introduce the probability space of
all possible events Ω = {BG, GB, BB, GG} with P[{BG}] = P[{GB}] = P[{BB}] =
P[{GG}] = 1/4. Let B = {BG, GB, BB} be the event that there is at least
one boy and A = {GB, BG, GG} be the event that there is at least one
girl. We have
P[A|B] = P[A ∩ B]/P[B] = (2/4)/(3/4) = 2/3.
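The argument reduces to counting elements of Ω and can be checked in a few lines:

```python
from fractions import Fraction

omega = ["BB", "BG", "GB", "GG"]           # equally likely birth orders
boy = [w for w in omega if "B" in w]       # B: at least one boy
girl_and_boy = [w for w in boy if "G" in w]  # A ∩ B: one of each

p = Fraction(len(girl_and_boy), len(boy))  # P[A | B]
```

The enumeration makes the counting argument explicit: three of the four equally likely outcomes contain a boy, and two of those also contain a girl.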
Example. 1) The family I = {∅, {1}, {2}, {3}, {1, 2}, {2, 3}, Ω} is a
π-system on Ω = {1, 2, 3}.
2) The set J = {[a, b) | 0 ≤ a < b ≤ 1} ∪ {∅} of all half closed intervals
is a π-system on Ω = [0, 1], because the intersection of two such intervals
[a, b) and [c, d) is either empty or again such an interval [c, b).
(i) Ω ∈ D,
(ii) A, B ∈ D, A ⊂ B ⇒ B \ A ∈ D,
(iii) Aₙ ∈ D, Aₙ ↗ A ⇒ A ∈ D.
Proof. By the previous lemma, we only need to show that d(I) is a π-system.
(i) Define D₁ = {B ∈ d(I) | B ∩ C ∈ d(I), ∀C ∈ I}. Because I is a
π-system, we have I ⊂ D₁.
Claim: D₁ is a Dynkin system.
Proof. Clearly Ω ∈ D₁. Given A, B ∈ D₁ with A ⊂ B. For C ∈ I
we compute (B \ A) ∩ C = (B ∩ C) \ (A ∩ C), which is in d(I). Therefore
B \ A ∈ D₁. Given Aₙ ↗ A with Aₙ ∈ D₁ and C ∈ I. Then Aₙ ∩ C ↗ A ∩ C,
so that A ∩ C ∈ d(I) and A ∈ D₁.
(ii) Define D₂ = {A ∈ d(I) | B ∩ A ∈ d(I), ∀B ∈ d(I)}. From (i) we know
that I ⊂ D₂. As in (i), we show that D₂ is a Dynkin system. Therefore
D₂ = d(I), which means that d(I) is a π-system. □
Lemma 2.1.5. Given a probability space (Ω, A, P). Let G, H be two σ-
subalgebras of A and I and J be two π-systems satisfying σ(I) = G,
σ(J) = H. Then G and H are independent if and only if I and J are
independent.
Proof. (i) Fix I ∈ I and define on (Ω, H) the measures μ(H) = P[I ∩
H], ν(H) = P[I]P[H] of total probability P[I]. By the independence of I
and J, they coincide on J and by the extension lemma (2.1.4), they agree
on H and we have P[I ∩ H] = P[I]P[H] for all I ∈ I and H ∈ H.
(ii) Define for fixed H ∈ H the measures μ(G) = P[G ∩ H] and ν(G) =
P[G]P[H] of total probability P[H] on (Ω, G). They agree on I and so on G.
We have shown that P[G ∩ H] = P[G]P[H] for all G ∈ G and all H ∈ H. □
(i) Ω ∈ A,
(ii) A ∈ A ⇒ A^c ∈ A,
(iii) A, B ∈ A ⇒ A ∪ B ∈ A.
Before we launch into the proof of this theorem, we need two lemmas:
Definition. Let A be an algebra and λ : A → [0, ∞] with λ(∅) = 0. A set
A ∈ A is called a λ-set if λ(A ∩ G) + λ(A^c ∩ G) = λ(G) for all G ∈ A.
Proof. From the definition it is clear that Ω ∈ A_λ and that if B ∈ A_λ, then
B^c ∈ A_λ. Given B, C ∈ A_λ, we show A = B ∩ C ∈ A_λ. Since C ∈ A_λ,
we get
λ(C ∩ A^c ∩ G) + λ(C^c ∩ A^c ∩ G) = λ(A^c ∩ G).
This can be rewritten with C ∩ A^c = C ∩ B^c and C^c ∩ A^c = C^c as
λ(∅) = 0,
A, B ∈ A with A ⊂ B ⇒ λ(A) ≤ λ(B),
Aₙ ∈ A ⇒ λ(⋃ₙ Aₙ) ≤ ∑ₙ λ(Aₙ) (σ-subadditivity).
Subadditivity for λ gives λ(G) ≤ λ(A ∩ G) + λ(A^c ∩ G). All the inequalities
in this proof are therefore equalities. We conclude that A ∈ A_λ and that λ
is additive on A_λ. □
Proof. Given an algebra R with a measure μ. Define A = σ(R) and the
σ-algebra V consisting of all subsets of Ω. Define on V the function
λ(A) = inf{∑ₙ μ(Aₙ) | A ⊂ ⋃ₙ Aₙ, Aₙ ∈ R}.
(ii) λ = μ on R.
Given A ∈ R. Clearly λ(A) ≤ μ(A). Suppose that A ⊂ ⋃ₙ Aₙ with Aₙ ∈
R. Define a sequence {Bₙ}_{n∈N} of disjoint sets in R inductively by B₁ = A₁,
Bₙ = Aₙ ∩ (⋃_{k<n} A_k)^c, such that Bₙ ⊂ Aₙ and ⋃ₙ Bₙ = ⋃ₙ Aₙ ⊃ A. From
the σ-additivity of μ on R follows
μ(A) ≤ ∑ₙ μ(Bₙ) ≤ ∑ₙ μ(Aₙ),
(iv) By (i), λ is an outer measure on (Ω, P). Since by step (iii), R is
contained in the set of λ-sets, we know by Caratheodory's lemma that A is
too, so that we can define μ on A as the restriction of λ to A. By step (ii),
this is an extension of the measure μ on R. □
Remark. The name "ring" has its origin in the fact that with the "addition"
A + B = AΔB = (A ∪ B) \ (A ∩ B) and "multiplication" A · B = A ∩ B,
a ring of sets becomes an algebraic ring like the set of integers, in which
rules like A·(B + C) = A·B + A·C hold. The empty set ∅ is the zero
element because AΔ∅ = A for every set A. If the set Ω is also in the ring,
one has a ring with 1 because the identity A ∩ Ω = A shows that Ω is the
1-element in the ring.
Let us add some definitions, which will occur later:
A_∞ := limsup_{n→∞} A_n = ∩_{m=1}^∞ ∪_{n≥m} A_n
consists of the set of ω ∈ Ω such that ω ∈ A_n for infinitely many n ∈ N. The
set A_∞ is contained in the tail σ-algebra of A_n = {∅, A_n, A_n^c, Ω}. It follows
from Kolmogorov's 0-1 law that P[A_∞] ∈ {0, 1} if A_n ∈ A and {A_n} are
P-independent.
Σ_{n∈N} P[A_n] = ∞ ⇒ P[A_∞] = 1.

Using independence and 1 - x ≤ e^{-x},
P[∩_{k≥n} A_k^c] = Π_{k≥n} (1 - P[A_k]) ≤ Π_{k≥n} exp(-P[A_k]) = exp(-Σ_{k≥n} P[A_k]) = 0.
From
P[A_∞^c] = P[∪_{n∈N} ∩_{k≥n} A_k^c] ≤ Σ_{n∈N} P[∩_{k≥n} A_k^c] = 0
follows P[A_∞^c] = 0. □
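The Borel-Cantelli dichotomy can be illustrated numerically. The following sketch (my own illustration, not part of the text; the number of events and trials are arbitrary choices) simulates independent events A_n with summable probabilities 1/n² versus non-summable probabilities 1/n and compares the average number of events that occur:

```python
import random

random.seed(0)

def average_occurrences(probs, trials):
    """Average number of independent events A_n (with P[A_n] = probs[n]) per trial."""
    total = 0
    for _ in range(trials):
        total += sum(1 for p in probs if random.random() < p)
    return total / trials

N = 5000
summable = [1.0 / (n * n) for n in range(1, N + 1)]   # sum ~ pi^2/6: a.s. only finitely many occur
divergent = [1.0 / n for n in range(1, N + 1)]        # harmonic sum diverges: a.s. infinitely many occur

avg_summable = average_occurrences(summable, 200)     # stays near pi^2/6 ~ 1.64 for any N
avg_divergent = average_occurrences(divergent, 200)   # grows like log N with the number of events
```

In the summable case the expected count stays bounded as N grows, while in the divergent case it grows without bound, matching the two lemmas.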
10^4  One "myriad", the largest number the Greeks were considering.
10^5  The largest number considered by the Romans.
10^10 The age of the universe in years.
10^17 The age of the universe in seconds.
10^22 Distance to our neighbor galaxy Andromeda in meters.
10^23 The number of atoms in two grams of carbon (about a sixth of Avogadro's number).
10^27 Estimated size of the universe in meters.
10^30 Mass of the sun in kilograms.
10^41 Mass of our home galaxy, the Milky Way, in kilograms.
10^51 Archimedes's estimate of the number of sand grains in the universe.
10^80 The number of protons in the universe.
P[A] = Σ_{n∈A} e^{-λ} λ^n/n!
is a probability measure on the measurable space (CI, A) considered in a).
c) Show that for every s > 1,
P[A] = (1/ζ(s)) Σ_{n∈A} n^{-s}
is a probability measure, where
ζ(s) = Σ_{n∈Ω} n^{-s}
is called the Riemann zeta function.
d) Show that the sets A_p = {n ∈ Ω | p divides n} with prime p are independent. What happens if p is not prime?
e) Give a probabilistic proof of Euler's formula
ζ(s) = Π_{p prime} (1 - p^{-s})^{-1}.
f) Let A be the set of natural numbers which are not divisible by a square
different from 1. Prove P[A] = 1/ζ(2s).
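Parts d) and e) can be checked numerically. A small sketch (my own, not from the text; the truncation N = 100000 of the zeta sums and the primes up to 29 in the Euler product are arbitrary cut-offs):

```python
s = 2.0
N = 100000
zeta = sum(n ** -s for n in range(1, N + 1))          # truncated zeta(2) ~ pi^2/6

def prob(divisor):
    """P[A_d] under the zeta(s) distribution, truncated at N."""
    return sum(n ** -s for n in range(1, N + 1) if n % divisor == 0) / zeta

p2, p3, p6 = prob(2), prob(3), prob(6)                # p2 ~ 2^-s, p3 ~ 3^-s; independence: p6 ~ p2*p3

partial_euler = 1.0
for p in [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]:
    partial_euler *= 1 - p ** -s                      # partial Euler product, tends to 1/zeta(s)
```

The computation reproduces P[A_p] = p^{-s}, the product rule P[A_2 ∩ A_3] = P[A_2] P[A_3], and the partial Euler product approaching 1/ζ(s).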
sup_{Y ∈ S, Y ≤ |X|} ∫ Y dP
Definition. It is customary to write L^1 for the space 𝓛^1, where random variables X, Y for which E[|X - Y|] = 0 are identified. Unlike 𝓛^p, the spaces
L^p are Banach spaces. We will come back to this later.
σ[X] = Var[X]^{1/2}
Var[X] = E[X^2] - E[X]^2 = 1/(2m+1) - 1/(m+1)^2 = m^2/((1+m)^2(1+2m))
KX(t)=\og(Mx(t))
f(x) = (1/√(2πσ^2)) e^{-(x-m)^2/(2σ^2)}
Since Y_n ≤ sup_{m≤n} X_m = X_n, also E[Y_n] ≤ E[X_n]. One checks that
sup_n Y_n = X implies sup_n E[Y_n] = sup_{Y∈S, Y≤X} E[Y] and concludes
Therefore
E[inf_{m≥n} X_m] ≤ E[X_p] ≤ E[sup_{m≥n} X_m] for p ≥ n, and so
E[inf_{m≥n} X_m] ≤ inf_{p≥n} E[X_p] ≤ sup_{p≥n} E[X_p] ≤ E[sup_{m≥n} X_m].
E[liminf_n X_n] = sup_n E[inf_{m≥n} X_m] ≤ sup_n inf_{m≥n} E[X_m] = liminf_n E[X_n]
≤ limsup_n E[X_n] = inf_n sup_{m≥n} E[X_m]
≤ inf_n E[sup_{m≥n} X_m] = E[limsup_n X_n].
E[X] = E[liminf_n X_n] ≤ liminf_n E[X_n]
≤ limsup_n E[X_n] ≤ E[limsup_n X_n] = E[X].
E[X]=(a+b)/2
Theorem 2.5.1 (Jensen inequality). Given X ∈ 𝓛^1. For any convex function
h : R → R, we have
E[h(X)] ≥ h(E[X]),
where the left hand side can also be infinite.
Proof. Let l(x) = h(x_0) + c·(x - x_0) ≤ h(x) be a supporting line of h at
x_0 = E[X]. By the linearity and monotonicity of the expectation, we get
E[h(X)] ≥ E[l(X)] = l(E[X]) = h(E[X]).
𝓛^∞ ⊂ 𝓛^p ⊂ 𝓛^q ⊂ 𝓛^1
for p ≥ q ≥ 1. The smallest space is 𝓛^∞, which is the space of all bounded
random variables.
Theorem 2.5.2 (Hölder inequality, Hölder 1889). Given p, q ∈ [1, ∞] with
p^{-1} + q^{-1} = 1 and X ∈ 𝓛^p and Y ∈ 𝓛^q. Then XY ∈ 𝓛^1 and
‖XY‖_1 ≤ ‖X‖_p ‖Y‖_q.
Proof. We can assume X, Y ≥ 0. Define the probability measure
Q[A] = E[1_A X^p]/E[X^p]
and define u = 1_{X>0} Y X^{1-p}. Jensen's inequality gives Q(u)^q ≤ Q(u^q), so
that
E[|XY|] ≤ ‖X‖_p ‖1_{X>0} Y‖_q ≤ ‖X‖_p ‖Y‖_q.
□
A special case of Hölder's inequality is the Cauchy-Schwarz inequality
‖XY‖_1 ≤ ‖X‖_2 · ‖Y‖_2.
Definition. We use the short-hand notation P[X ≥ c] for P[{ω ∈ Ω | X(ω) ≥ c}].
h(c) · P[X ≥ c] ≤ E[h(X)].
Figure. The case h(x) = x: the left hand side h(c) · P[X ≥ c] is the area of
the rectangle {X ≥ c} × [0, h(c)], and E[h(X)] = E[X] is the area under
the graph of X.
P[X ≥ c] ≤ inf_{t>0} e^{-tc} M_X(t)
P[|X - E[X]| ≥ c] ≤ Var[X]/c^2.
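A quick Monte Carlo sanity check of the Chebychev inequality (my own sketch, not from the text; the uniform distribution and the threshold c = 0.4 are arbitrary choices):

```python
import random

random.seed(1)

# X uniform on [0,1]: E[X] = 1/2, Var[X] = 1/12
n = 100000
xs = [random.random() for _ in range(n)]

c = 0.4
empirical = sum(1 for x in xs if abs(x - 0.5) >= c) / n   # exact value: P[X<=0.1] + P[X>=0.9] = 0.2
chebychev_bound = (1 / 12) / c ** 2                        # Var[X]/c^2 ~ 0.52
```

The empirical tail probability stays below the Chebychev bound, which is typically far from tight.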
|Cov[X, Y]| ≤ σ[X] σ[Y]
If Ω = {1, ..., n} is a finite set, then the random variables X, Y define the
vectors
X = (X(1), ..., X(n)), Y = (Y(1), ..., Y(n)),
or n data points (X(i), Y(i)) in the plane. As will follow from the proposition below, the regression line has the property that it minimizes the sum
of the squares of the distances from these points to the line.
Remark. If Ω is a finite set, then the covariance Cov[X, Y] is the dot product between the centered random variables X - E[X] and Y - E[Y],
σ[X] is the length of the vector X - E[X], and the correlation coefficient
Corr[X, Y] is the cosine of the angle α between X - E[X] and Y - E[Y],
because the dot product satisfies v·w = |v||w| cos(α). So, uncorrelated
random variables X, Y have the property that X - E[X] is perpendicular
to Y - E[Y]. This geometric interpretation explains why lemma (2.5.7) is
called the Pythagoras theorem.
For more inequalities in analysis, see the classics [29, 58]. We end this section with a list of properties of variance and covariance:
Var[X] ≥ 0.
Var[X] = E[X^2] - E[X]^2.
Var[λX] = λ^2 Var[X].
Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
Corr[X, Y] ∈ [-1, 1].
Cov[X, Y] = E[XY] - E[X]E[Y].
Cov[X, Y] ≤ σ[X] σ[Y].
Corr[X, Y] = 1 if X - E[X] = Y - E[Y].
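The geometric picture of correlation as a cosine is easy to verify on a small finite Ω. A sketch (my own, with arbitrary data vectors) computing the correlation coefficient both as Cov/(σσ) and as the cosine of the angle between the centered vectors:

```python
from math import sqrt

# Omega = {1,...,4} with uniform P; random variables are vectors of values
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 1.0, 4.0, 3.0]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
Xc = [x - mx for x in X]                      # centered: X - E[X]
Yc = [y - my for y in Y]

cov = sum(a * b for a, b in zip(Xc, Yc)) / n  # dot product of centered vectors, divided by n
sigma_x = sqrt(sum(a * a for a in Xc) / n)
sigma_y = sqrt(sum(b * b for b in Yc) / n)
corr = cov / (sigma_x * sigma_y)

# cosine of the angle between the centered vectors (the 1/n factors cancel)
cos_alpha = sum(a * b for a, b in zip(Xc, Yc)) / (
    sqrt(sum(a * a for a in Xc)) * sqrt(sum(b * b for b in Yc)))
```

Both computations give the same number, here 0.6, and it automatically lies in [-1, 1].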
We first prove the weak law of large numbers. There exist different versions of this theorem, since more assumptions on X_n can allow stronger
statements.
lim_{n→∞} P[|Y_n - Y| ≥ ε] = 0.
Theorem 2.6.1 (Weak law of large numbers for uncorrelated random variables). Assume X_i ∈ 𝓛^2 have common expectation E[X_i] = m and satisfy
sup_n (1/n) Σ_{i=1}^n Var[X_i] < ∞. If the X_n are pairwise uncorrelated, then S_n/n → m
in probability.
Proof. By the pairwise uncorrelatedness,
Var[S_n/n] = (1/n^2) Var[S_n] = (1/n^2) Σ_{k=1}^n Var[X_k] ≤ (1/n) · sup_n (1/n) Σ_{k=1}^n Var[X_k].
The right hand side converges to zero for n → ∞. With Chebychev's inequality (2.5.5), we obtain
P[|S_n/n - m| ≥ ε] ≤ Var[S_n/n]/ε^2 → 0. □
Figure. Approximation of a function f(x) by Bernstein polynomials B_2, B_5, B_10, B_20, B_30.
B_n(x) = Σ_{k=0}^n f(k/n) C(n,k) x^k (1-x)^{n-k}
converge uniformly to f. If f(x) ≥ 0, then also B_n(x) ≥ 0.
|B_n(x) - f(x)| = |E[f(S_n/n)] - f(x)| ≤ E[|f(S_n/n) - f(x)|]
≤ 2‖f‖ · P[|S_n/n - x| ≥ δ] + sup_{|x-y|≤δ} |f(x) - f(y)| · P[|S_n/n - x| < δ]
≤ 2‖f‖ · P[|S_n/n - x| ≥ δ] + sup_{|x-y|≤δ} |f(x) - f(y)|.
The second term in the last line is called the continuity module of f. It
converges to zero for δ → 0. By the Chebychev inequality (2.5.5) and the
proof of the weak law of large numbers, the first term can be estimated
from above by
2‖f‖ · Var[X_1]/(n δ^2),
a bound which goes to zero for n → ∞, because the variance satisfies
Var[X_1] = x(1 - x) ≤ 1/4. □
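The constructive approximation can be tried out directly. A sketch (my own, not from the text; the test function f(x) = sin(πx) and the evaluation grid are arbitrary choices) evaluating B_n and its uniform error:

```python
from math import comb, sin, pi

def bernstein(f, n, x):
    """B_n(x) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: sin(pi * x)
grid = [i / 50 for i in range(51)]
err10 = max(abs(bernstein(f, 10, x) - f(x)) for x in grid)
err100 = max(abs(bernstein(f, 100, x) - f(x)) for x in grid)  # error shrinks roughly like 1/n
```

The error decreases with n, and B_n inherits nonnegativity from f, as the proof above predicts.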
In the first version of the weak law of large numbers, theorem (2.6.1), we
only assumed the random variables to be uncorrelated. Under the stronger
condition of independence and a stronger condition on the moments (X^4 ∈ 𝓛^1), the convergence can be accelerated:
P[|S_n/n| > ε] ≤ E[(S_n/n)^4]/ε^4 ≤ (n + 3n(n-1)) M/(n^4 ε^4) ≤ 3M/(n^2 ε^4).
Proof. Without loss of generality, we can assume that E[X_n] = 0 for all
n ∈ N, because otherwise X_n can be replaced by Y_n = X_n - E[X_n]. Define
f_R(t) = t · 1_{[-R,R]}(t), the random variables X_i^{(R)} = f_R(X_i) and
S_n^{(R)} = Σ_{i=1}^n X_i^{(R)}, and estimate
E[|S_n/n|] ≤ R/√n + 2 sup_i E[|X_i|; |X_i| ≥ R].
In the last step we have used the independence of the random variables and
E[X_i^{(R)}] = 0 to get
Theorem 2.6.5 (Weak law of large numbers for IID 𝓛^1 random variables).
Assume X_i ∈ 𝓛^1 are IID random variables with mean m. Then S_n/n → m
in 𝓛^1 and so in probability.
Because the random variables X_i are identically distributed, E[|X_i|; |X_i| ≥ R] is independent of i. Consequently any set of IID 𝓛^1 random variables is
also uniformly integrable. We can now use theorem (2.6.4). □
(1/n) Σ_{k=1}^n (X_k - m_k) → 0
E[|X_n - X| / (1 + |X_n - X|)] → 0
for n → ∞.
Exercice. Use the weak law of large numbers to verify that the volume of
an n-dimensional ball of radius 1 satisfies V_n → 0 for n → ∞. Estimate
how fast the volume goes to 0. (See example (2.6).)
Example. Let Ω = [0,1] be the unit interval with the Lebesgue measure μ
and let m be an integer. Define the random variable X(x) = x^m. One calls
its distribution a power distribution. It is in 𝓛^1 and has the expectation
E[X] = 1/(m+1). The distribution function of X is F_X(s) = s^{1/m} on
[0,1], and F_X(s) = 0 for s < 0 and F_X(s) = 1 for s > 1. The random
variable is continuous in the sense that it has a probability density function
f_X(s) = F'_X(s) = s^{1/m-1}/m, so that F_X(s) = ∫_0^s f_X(t) dt.
Given two IID random variables X, Y with the m'th power distribution as
above, we can look at the random variables V = X + Y and W = X - Y. One can
realize V and W on the unit square Ω = [0,1] × [0,1] by V(x,y) = x^m + y^m
and W(x,y) = x^m - y^m. The distribution functions F_V(s) = P[V ≤ s] and
F_W(s) = P[W ≤ s] are the areas of the sets A(s) = {(x,y) | x^m + y^m ≤ s}
and B(s) = {(x,y) | x^m - y^m ≤ s}.
Figure. F_V(s) is the area of the set A(s), shown here in the case m = 4.
Figure. F_W(s) is the area of the set B(s), shown here in the case m = 4.
F_W(s) = s^{1/m} + ∫_{s^{1/m}}^1 (1 - (x^m - s)^{1/m}) dx, s ∈ [0,1].
Figure. The function F_V(s) with density (dashed) f_V(s) of the sum of two power distributed random variables with m = 2.
Figure. The function F_W(s) with density (dashed) f_W(s) of the difference of two power distributed random variables with m = 2.
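The distribution function F_V can also be checked by simulation against the geometric area description. A sketch (my own; m = 2, s = 0.5 and the sample sizes are arbitrary choices):

```python
import random

random.seed(2)

m, s = 2, 0.5
n = 200000
# X = U^m with U uniform on [0,1] has the m'th power distribution
samples = [random.random() ** m + random.random() ** m for _ in range(n)]
empirical = sum(1 for v in samples if v <= s) / n

# F_V(s) = area of A(s) = {(x,y) in [0,1]^2 : x^m + y^m <= s}, midpoint Riemann sum
grid = 2000
area = 0.0
for i in range(grid):
    x = (i + 0.5) / grid
    if x ** m <= s:
        area += min(1.0, (s - x ** m) ** (1.0 / m)) / grid
# for m = 2, s = 0.5 this is a quarter disk of radius sqrt(1/2): pi/8 ~ 0.3927
```

The Monte Carlo estimate and the area computation agree, illustrating F_V(s) = area(A(s)).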
f(x) = (4θ^{3/2}/√π) x^2 e^{-θx^2}
f(x) = 2θ x e^{-θx^2}
P[|X_n - X| ≥ ε] → 0
for all ε > 0.
Example. Let Ω_n = {1, 2, ..., n} with the uniform distribution P[{k}] = 1/n
and X_n the random variable X_n(x) = x/n. Let X(x) = x on the probability
space [0,1] with probability P[[a,b]] = b - a. The random variables X_n and
X are defined on different probability spaces, but X_n converges to X in
distribution for n → ∞.
Proposition 2.8.1. The next figure shows the relations between the different
convergence types.
0) In distribution = in law: F_{X_n}(s) → F_X(s) for all s at which F_X is continuous.
1) In probability: P[|X_n - X| ≥ ε] → 0, ∀ε > 0.
2) Almost everywhere: P[X_n → X] = 1.
3) In 𝓛^p: ‖X_n - X‖_p → 0.
4) Complete: Σ_n P[|X_n - X| ≥ ε] < ∞, ∀ε > 0.
{X_n → X} = ∩_k ∪_m ∩_{n≥m} {|X_n - X| < 1/k}
1 = P[∪_m ∩_{n≥m} {|X_n - X| < 1/k}] = lim_{m→∞} P[∩_{n≥m} {|X_n - X| < 1/k}]
P[∪_k {|X_n - X| > ε_k infinitely often}] ≤ Σ_k P[|X_n - X| > ε_k infinitely often] = 0
Proposition 2.8.2. Given a sequence X_n ∈ 𝓛^∞ with ‖X_n‖_∞ ≤ K for all n.
Then X_n → X in probability if and only if X_n → X in 𝓛^1.
E[|X_n - X|] = E[|X_n - X|; |X_n - X| ≥ ε/2] + E[|X_n - X|; |X_n - X| < ε/2]
≤ 2K · P[|X_n - X| ≥ ε/2] + ε/2 ≤ ε.
Lemma 2.8.3. Given X ∈ 𝓛^1 and ε > 0. Then there exists K > 0 with
E[|X|; |X| ≥ K] ≤ ε.
Proof. Given ε > 0. If X ∈ 𝓛^1, we can find δ > 0 such that P[A] < δ
implies E[|X|; A] < ε. Since K · P[|X| ≥ K] ≤ E[|X|], we can choose K such
that P[|X| ≥ K] < δ. Therefore E[|X|; |X| ≥ K] ≤ ε. □
The next proposition gives a necessary and sufficient condition for 𝓛^1 convergence.
Proof. a) ⇒ b). Define for K > 0 and a random variable X the bounded
variable X^{(K)} = max(-K, min(K, X)). Choose K so large that
E[|X_n^{(K)} - X_n|] ≤ ε/3, E[|X^{(K)} - X|] ≤ ε/3.
Since |X_n^{(K)} - X^{(K)}| ≤ |X_n - X|, we have X_n^{(K)} → X^{(K)} in probability.
By the last proposition (2.8.2), X_n^{(K)} → X^{(K)} in 𝓛^1, so that for
n ≥ m, E[|X_n^{(K)} - X^{(K)}|] ≤ ε/3. Therefore, for n ≥ m also
E[|X_n - X|] ≤ E[|X_n - X_n^{(K)}|] + E[|X_n^{(K)} - X^{(K)}|] + E[|X^{(K)} - X|] ≤ ε.
b) ⇒ a). We have seen already that X_n → X in probability if ‖X_n - X‖_1 →
0. We have to show that X_n → X in 𝓛^1 implies that X_n is uniformly
integrable.
Given ε > 0. There exists m such that E[|X_n - X|] ≤ ε/2 for n ≥ m. By
the absolute continuity property, we can choose δ > 0 such that P[A] ≤ δ
implies
E[|X_n|; A] ≤ ε for 1 ≤ n ≤ m, and E[|X|; A] ≤ ε/2.
Because X_n is bounded in 𝓛^1, we can choose K such that K^{-1} sup_n E[|X_n|] ≤
δ, which implies P[|X_n| ≥ K] ≤ δ. For n ≥ m, we have therefore
Exercice. a) Show that P[sup_{k≥n} |X_k - X| ≥ ε] → 0 for n → ∞ and all ε > 0 if and
only if X_n → X almost everywhere.
b) A sequence X_n converges almost surely if and only if
lim_{n→∞} P[sup_{k≥1} |X_{n+k} - X_n| ≥ ε] = 0
for all ε > 0.
The first version of the strong law does not assume the random variables to
have the same distribution. They are assumed to have the same expectation
and have to be bounded in 𝓛^4.
P[|S_n/n - m| ≥ ε] ≤ 2M/(ε^4 n^2)
Definition. A real number x ∈ [0,1] is called normal to the base 10, if its
decimal expansion x = 0.x_1 x_2 ... has the property that each digit appears
with the same frequency 1/10.
Proof. Define the random variables X_n(x) = x_n, where x_n is the n'th
decimal digit. We only have to verify that the X_n are IID random variables. The
strong law of large numbers will then assure that almost all x are normal. Let Ω =
{0, 1, ..., 9}^N be the space of all infinite sequences ω = (ω_1, ω_2, ω_3, ...).
Define on Ω the product σ-algebra A and the product probability measure
P. Define the measurable map S(ω) = Σ_{k=1}^∞ ω_k/10^k = x from Ω to [0,1].
It produces for every sequence in Ω a real number x ∈ [0,1]. The integers
ω_k are just the decimal digits of x. The map S is measure preserving and
can be inverted on a set of measure 1, because almost all real numbers have
a unique decimal expansion.
Because X_n(x) = X_n(S(ω)) = Y_n(ω) = ω_n if S(ω) = x, we see that the X_n
are the same random variables as the Y_n. The latter are by construction IID
with uniform distribution on {0, 1, ..., 9}. □
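Normality of specific constants such as √2 is a famous open problem, but digit statistics can be inspected empirically. A sketch (my own, not from the text; the cut-off of 20000 digits is an arbitrary choice) using exact integer arithmetic:

```python
from math import isqrt

d = 20000
# floor(sqrt(2) * 10^d) via the integer square root; drop the leading "1"
digits = str(isqrt(2 * 10 ** (2 * d)))[1:]
freq = {k: digits.count(str(k)) / len(digits) for k in range(10)}
max_dev = max(abs(f - 0.1) for f in freq.values())  # ~ 1/sqrt(d) if the digits behave randomly
```

Each digit frequency comes out close to 1/10, consistent with (but of course not proving) normality of √2.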
S_n = (1/n) Σ_{i=1}^n X_i, T_n = (1/n) Σ_{i=1}^n Y_i.
Σ_{n≥1} P[X_1 ≥ n] = Σ_{n≥1} Σ_{k≥n} P[X_1 ∈ [k, k+1)]
= Σ_{k≥1} k · P[X_1 ∈ [k, k+1)] ≤ E[X_1] < ∞,
we get by the first Borel-Cantelli lemma that P[Y_n ≠ X_n infinitely often] =
0. This means T_n - S_n → 0 almost everywhere, proving E[S_n] → m.
(ii) Fix a real number a > 1 and define an exponentially growing subsequence k_n = [a^n], the integer part of a^n. Denote by μ the law of
the random variables X_n. For every ε > 0, we get, using the Chebychev inequality (2.5.5), pairwise independence, and constants C which can
vary from line to line:
Σ_n P[|T_{k_n} - E[T_{k_n}]| > ε] ≤ (1/ε^2) Σ_n (1/k_n^2) Σ_{m=1}^{k_n} Var[Y_m]
= (1/ε^2) Σ_{m=1}^∞ Var[Y_m] Σ_{n: k_n ≥ m} (1/k_n^2) ≤ (C/ε^2) Σ_{m=1}^∞ Var[Y_m]/m^2
≤ (C/ε^2) Σ_{l=0}^∞ (l+1)^2 P[X_1 ∈ [l, l+1)] Σ_{m=l+1}^∞ (1/m^2)
≤ (2C/ε^2) Σ_{l=0}^∞ (l+1) P[X_1 ∈ [l, l+1)] ≤ (2C/ε^2)(1 + E[X_1]) < ∞.
(iii) So far, the convergence has only been verified along the subsequence k_n.
Because we assumed X_n ≥ 0, the sequence U_n = Σ_{k=1}^n Y_k = nT_n is monotonically increasing. For k ∈ [k_m, k_{m+1}], we get therefore
(k_m/k_{m+1}) · (U_{k_m}/k_m) ≤ U_k/k ≤ (k_{m+1}/k_m) · (U_{k_{m+1}}/k_{m+1}).
|Σ_{k=1}^n X_k| ≤ ε n.
This means that the trajectory Σ_{k=1}^n X_k is finally contained in any arbitrarily small cone. In other words, it grows slower than linearly. The exact
description for the growth of Σ_{k=1}^n X_k is given by the law of the iterated
logarithm of Khinchin, which says that a sequence of IID random variables
X_n with E[X_n] = m and σ(X_n) = σ ≠ 0 satisfies
Remark. The IID assumption on the random variables cannot be weakened
without further restrictions. Take for example a sequence X_n of independent
random variables satisfying P[X_n = ±2^n] = 1/2. Then E[X_n] = 0, but even S_n/n
does not converge.
Example. Let Ω = {|z| = 1} ⊂ C be the unit circle in the complex plane
with the measure P[Arg(z) ∈ [a, b]] = (b - a)/(2π) for 0 ≤ a < b ≤ 2π
and the Borel σ-algebra A. If w = e^{2πiα} is a complex number of length 1,
then the rotation T(z) = wz defines a measure preserving transformation
on (Ω, A, P). It is invertible with inverse T^{-1}(z) = z/w.
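Realized on [0, 1) as x ↦ x + α (mod 1), the rotation lets one watch Birkhoff time averages converge to the space average. A sketch (my own, not from the text; α = √2 - 1, the observable 1_{[0,1/4)} and the iteration count are arbitrary choices):

```python
from math import sqrt

alpha = sqrt(2) - 1      # irrational rotation number: the rotation is ergodic
x = 0.123
n = 100000
hits = 0
for _ in range(n):
    if x < 0.25:         # observable X = 1_{[0, 1/4)}
        hits += 1
    x = (x + alpha) % 1.0
time_average = hits / n  # tends to the space average P[[0, 1/4)] = 1/4
```

For a single orbit, the time average already matches the space average 1/4 closely, which is the content of the ergodic theorem for this transformation.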
E[X;A] = E[1AX]>0.
On A_n = {Z_n > 0}, we can extend this to Z_n(T) + X ≥ max_{1≤k≤n+1} S_k =
max_{0≤k≤n+1} S_k = Z_{n+1} ≥ Z_n, so that on A_n
X ≥ Z_n - Z_n(T).
Using (1) this inequality, the fact (2) that Z_n = 0 on Ω \ A_n, the inequality (3) Z_n(T) ≥ S_n(T) ≥ 0 on A_n, and finally that T is measure preserving
(4), leads to the claim. Note that
S_n/n - S_n(T)/n = (X - X(T^n))/n.
(i) X̄ = X̲.
Define for α < β ∈ R the sets A_{α,β} = {X̲ < α, β < X̄}. Because {X̲ <
X̄} = ∪_{α<β, α,β∈Q} A_{α,β}, it is enough to show that P[A_{α,β}] = 0 for rational
α < β. Define
Because A_{α,β} ⊂ A and A_{α,β} is T-invariant, we get from the maximal ergodic
theorem E[X - α; A_{α,β}] ≥ 0 and so
E[X; A_{α,β}] ≥ α · P[A_{α,β}].
(ii) X̄ ∈ 𝓛^1.
|S̄_n| ≤ |X|, and S̄_n converges pointwise to X̄ = X̲. Lebesgue's
dominated convergence theorem gives X̄ ∈ 𝓛^1.
(iii) E[X̄] = E[X].
Define the sets B_{k,n} = {X̄ ∈ [k/n, (k+1)/n)} for k ∈ Z, n ≥ 1. Define for ε > 0,
Y = X - k/n + ε. Using the maximal ergodic theorem, we get E[Y; B_{k,n}] ≥ 0.
Because ε > 0 was arbitrary,
E[X; B_{k,n}] ≥ (k/n) · P[B_{k,n}],
and because n was arbitrary, E[X̄] ≤ E[X]. Doing the same with -X and
using (i), we end with
Corollary 2.10.3. The strong law of large numbers holds for IID random
variables X_n ∈ 𝓛^1.
B_n = {max_{1≤k≤n} |S_k| ≥ ε} = ∪_{k=1}^n A_k.
We get
E[S_n^2] ≥ E[S_n^2; B_n] = Σ_{k=1}^n E[S_n^2; A_k] ≥ Σ_{k=1}^n E[S_k^2; A_k] ≥ ε^2 Σ_{k=1}^n P[A_k] = ε^2 P[B_n].
b)
On A_k, |S_{k-1}| ≤ ε and |S_k| ≤ |S_{k-1}| + |X_k| ≤ ε + R holds. We use that in
the estimate
E[S_n^2; B_n] = Σ_{k=1}^n E[S_k^2; A_k] + Σ_{k=1}^n E[(S_n - S_k)^2; A_k],
so that
E[S_n^2] ≤ P[B_n]((ε + R)^2 + E[S_n^2]) + ε^2 - ε^2 P[B_n],
and so
P[B_n] ≥ (E[S_n^2] - ε^2)/((ε + R)^2 + E[S_n^2] - ε^2) ≥ 1 - (ε + R)^2/E[S_n^2].
Remark. The inequalities remain true in the limit n → ∞. The first inequality is then
P[sup_k |S_k - E[S_k]| ≥ ε] ≤ (1/ε^2) Σ_{k=1}^∞ Var[X_k].
Σ_{n=1}^∞ (X_n - E[X_n])
for m → ∞. The above lemma implies that S_n(ω) converges. □
S_n = Σ_{k=1}^n (X_k - E[X_k])
converges if
Σ_{k=1}^∞ Var[X_k] < ∞. (2.4)
Σ_{k=1}^∞ P[|X_k| ≥ R] < ∞
Proof. "⇒" Assume first that the three series all converge. By (3) and
Kolmogorov's theorem, we know that Σ_{k=1}^∞ X_k^{(R)} - E[X_k^{(R)}] converges almost surely. Therefore, by (2), Σ_{k=1}^∞ X_k^{(R)} converges almost surely. By
(1) and Borel-Cantelli, P[X_k ≠ X_k^{(R)} infinitely often] = 0. Since for almost all ω, X_k^{(R)}(ω) = X_k(ω) for sufficiently large k, and for almost all
ω, Σ_{k=1}^∞ X_k^{(R)}(ω) converges, we get a set of measure one where Σ_{k=1}^∞ X_k
converges.
"⇐" Assume now that Σ_{n=1}^∞ X_n converges almost everywhere. Then X_k →
0 almost everywhere and P[|X_k| ≥ R infinitely often] = 0 for every R > 0.
By the second Borel-Cantelli lemma, the sum (1) converges.
The almost sure convergence of Σ_{n=1}^∞ X_n implies the almost sure convergence of Σ_{n=1}^∞ X_n^{(R)}, since P[|X_k| ≥ R infinitely often] = 0.
Let R > 0 be fixed. Let Y_k be a sequence of independent random variables such that Y_k and X_k^{(R)} have the same distribution and such that all the
random variables X_k^{(R)}, Y_k are independent. The almost sure convergence
of Σ_{n=1}^∞ X_n^{(R)} implies that of Σ_{n=1}^∞ X_n^{(R)} - Y_n. Since E[X_k^{(R)} - Y_k] = 0
and P[|X_k^{(R)} - Y_k| ≤ 2R] = 1, by Kolmogorov inequality b), the series
T_n = Σ_{k=1}^n X_k^{(R)} - Y_k satisfies for all ε > 0
Definition. A real number a is called a median of Y, if P[Y ≤ a] ≥ 1/2 and
P[Y ≥ a] ≥ 1/2. We write med(Y) for the set of medians of Y.
Remark. The median is not unique and is in general different from the mean.
It is also defined for random variables for which the mean does not exist.
Proposition 2.11.5 (Comparing median and mean). For Y ∈ 𝓛^2, every
a ∈ med(Y) satisfies
|a - E[Y]| ≤ √2 σ[Y].
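The bound can be probed by simulation. A sketch (my own, not from the text; the exponential distribution and the sample size are arbitrary choices), using that the exponential law with rate 1 has mean 1, median log 2 and standard deviation 1:

```python
import random
from math import sqrt, log

random.seed(5)

n = 100001
ys = sorted(random.expovariate(1.0) for _ in range(n))
sample_median = ys[n // 2]
sample_mean = sum(ys) / n
sample_sigma = sqrt(sum((y - sample_mean) ** 2 for y in ys) / n)

gap = abs(sample_median - sample_mean)  # ~ |log 2 - 1| ~ 0.307, well below sqrt(2)*sigma ~ 1.414
```

Even for this skewed distribution, the gap between median and mean stays comfortably inside the √2 σ bound.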
P[max_{1≤k≤n} (S_k + a_{n,k}) ≥ ε] ≤ 2 P[S_n ≥ ε]
A_1 = {S_1 + a_{n,1} ≥ ε}, A_{k+1} = {max_{1≤l≤k} (S_l + a_{n,l}) < ε, S_{k+1} + a_{n,k+1} ≥ ε}
for 1 ≤ k < n are disjoint, and ∪_{k=1}^n A_k = {max_{1≤k≤n} (S_k + a_{n,k}) ≥ ε}.
Because {S_n ≥ ε} contains all the sets A_k ∩ {S_n - S_k ≥ a_{n,k}} for
1 ≤ k ≤ n, we have, using the independence of σ(A_k) and σ(S_n - S_k),
P[S_n ≥ ε] ≥ Σ_{k=1}^n P[A_k ∩ {S_n - S_k ≥ a_{n,k}}] = Σ_{k=1}^n P[A_k] P[S_n - S_k ≥ a_{n,k}]
≥ (1/2) Σ_{k=1}^n P[A_k] = (1/2) P[∪_{k=1}^n A_k].
Applying this inequality to -X_n, we also get P[max_{1≤k≤n} (-S_k - a_{n,k}) ≥ ε] ≤
2 P[-S_n ≥ ε] and so
P[max_{1≤k≤n} |S_k + a_{n,k}| ≥ ε] ≤ 2 P[|S_n| ≥ ε].
P[max_{1≤l≤n} |S_{l+m} - S_m| ≥ ε] ≤ P[max_{1≤l≤n} |S_{l+m} - S_m + a_{n+m,l+m}| ≥ ε/2].
The right hand side can be estimated by theorem (2.11.6) applied to X_{n+m}
with
≤ 2 P[|S_{n+m} - S_m| ≥ ε/4] ≤ ε.
Now apply the convergence lemma (2.11.2). □
Exercice. Prove the strong law of large numbers for independent but not
necessarily identically distributed random variables: given a sequence of
independent random variables X_n ∈ 𝓛^2 satisfying E[X_n] = m, if
Σ_{k=1}^∞ Var[X_k]/k^2 < ∞,
then S_n/n → m almost surely.
Hint: Use Var[Σ_{i=1}^n X_i] = Σ_{i=1}^n E[X_i^2] - Σ_{i=1}^n E[X_i]^2 and use the three series theorem.
F_X(x) = P[X ≤ x],
where P[X ≤ x] is a short hand notation for P[{ω ∈ Ω | X(ω) ≤ x}]. With
the law μ_X = X*P of X on R, one has F_X(x) = ∫_{-∞}^x dμ_X(t), so that F is the
anti-derivative of μ_X. One reason to introduce distribution functions is that
one can replace integrals on the probability space Ω by integrals on the real
line R, which is more convenient.
a) non-decreasing,
b) F_X(-∞) = 0, F_X(∞) = 1,
c) continuous from the right: F_X(x + h) → F_X(x) for h → 0+.
Proof. a) follows from {X ≤ x} ⊂ {X ≤ y} for x ≤ y. b) P[{X ≤ -n}] →
0 and P[{X ≤ n}] → 1. c) F_X(x + h) - F_X(x) = P[x < X ≤ x + h] → 0 for
h → 0+.
Given F, define Ω = R and A as the Borel σ-algebra on R. The measure
P[(-∞, a]] = F(a) on the π-system I defines a unique measure on (Ω, A).
□
∫_{-∞}^x dμ(t) = F(x).
The proposition also tells that one can define a class of distribution functions: the set of real functions F which satisfy the properties a), b), c).
The following decomposition theorem shows that these three classes are
natural:
Proof. Denote by λ the Lebesgue measure on (R, B) for which λ([a, b]) =
b - a. We first show that any measure μ can be decomposed as μ = μ_ac + μ_s,
where μ_ac is absolutely continuous with respect to λ and μ_s is singular. The
decomposition is unique: μ = μ_ac^(1) + μ_s^(1) = μ_ac^(2) + μ_s^(2) implies that μ_ac^(1) -
μ_ac^(2) = μ_s^(2) - μ_s^(1) is both absolutely continuous and singular with
respect to λ, which is only possible if both are zero. To get the existence
of the decomposition, define c = sup_{A∈B, λ(A)=0} μ(A). If c = 0, then μ is
absolutely continuous and we are done. If c > 0, take an increasing sequence
A_n ∈ B with λ(A_n) = 0 and μ(A_n) → c. Define A = ∪_{n≥1} A_n, μ_s(B) =
μ(A ∩ B) and μ_ac(B) = μ(A^c ∩ B). To split the singular part μ_s into a singular continuous and a pure
point part, we again have uniqueness, because μ_s = μ_sc^(1) + μ_pp^(1) = μ_sc^(2) + μ_pp^(2)
implies that ν = μ_sc^(1) - μ_sc^(2) = μ_pp^(2) - μ_pp^(1) is both singular continuous and
pure point, which implies that ν = 0. To get existence, define the finite or
countable set A = {ω | μ({ω}) > 0} and define μ_pp(B) = μ(A ∩ B). □
Γ(x) = ∫_0^∞ t^{x-1} e^{-t} dt.
B(p, q) = ∫_0^1 x^{p-1} (1-x)^{q-1} dx,
the Beta function.
Here are some examples of absolutely continuous distributions:
acl) The normal distribution N(m, σ^2) on Ω = R has the probability density function
f(x) = (1/√(2πσ^2)) e^{-(x-m)^2/(2σ^2)}.
ac2) The Cauchy distribution on Ω = R has the density
f(x) = (1/π) · b/(b^2 + (x - m)^2).
ac3) The uniform distribution on Ω = [a, b] has the density
f(x) = 1/(b - a).
ac4) The exponential distribution with parameter λ > 0 on Ω = [0, ∞) has the probability
density function
f(x) = λ e^{-λx}.
ac5) The log normal distribution on Ω = [0, ∞) has the density function
f(x) = (1/√(2πx^2σ^2)) e^{-(log(x)-m)^2/(2σ^2)}.
ac6) The beta distribution on Ω = [0,1] with p > 1, q > 1 has the density
f(x) = x^{p-1}(1-x)^{q-1}/B(p, q).
ac7) The Gamma distribution on Ω = [0, ∞) with parameters α > 0, β > 0
has the density
f(x) = x^{α-1} e^{-x/β}/(Γ(α) β^α).
for the binomial coefficient, where k! = k(k-1)(k-2) ··· 2·1 is the factorial
of k, with the convention 0! = 1. For example,
C(10, 3) = 10!/(7! 3!) = (10 · 9 · 8)/6 = 120.
P[X = k] = e^{-λ} λ^k/k!
P[X = k] = 1/n
P[X = k] = p(1 - p)^k
Figure. The probabilities and the Figure. The probabilities and the
CDF of the binomial distribution. CDF of the Poisson distribution.
Figure. The probabilities and the Figure. The probabilities and the
CDF of the uniform distribution. CDF of the geometric distribution.
Lemma 2.12.3. Given X ∈ 𝓛^1 with law μ. For any measurable map h : R →
[0, ∞) for which h(X) ∈ 𝓛^1, one has E[h(X)] = ∫_R h(x) dμ(x). Especially,
if μ = μ_ac = f dx, then
E[h(X)] = ∫_R h(x) f(x) dx.
If μ = μ_pp, then
E[h(X)] = Σ_{x: μ({x}) ≠ 0} h(x) μ({x}).
Proposition 2.12.4.
Distribution      Parameters         Mean           Variance
acl) Normal       m ∈ R, σ^2 > 0     m              σ^2
ac2) Cauchy       m ∈ R, b > 0       "m"            ∞
ac3) Uniform      a < b              (a + b)/2      (b - a)^2/12
ac4) Exponential  λ > 0              1/λ            1/λ^2
ac5) Log-Normal   m ∈ R, σ^2 > 0     e^{m+σ^2/2}    (e^{σ^2} - 1) e^{2m+σ^2}
ac6) Beta         p, q > 0           p/(p + q)      pq/((p + q)^2 (p + q + 1))
ac7) Gamma        α, β > 0           αβ             αβ^2
Proposition 2.12.5.
ppl) Bernoulli      n ∈ N, p ∈ [0,1]   np           np(1 - p)
pp2) Poisson        λ > 0              λ            λ
pp3) Uniform        n ∈ N              (1 + n)/2    (n^2 - 1)/12
pp4) Geometric      p ∈ (0,1)          (1 - p)/p    (1 - p)/p^2
pp5) First Success  p ∈ (0,1)          1/p          (1 - p)/p^2
scl) Cantor         -                  1/2          1/8
For calculating higher moments, one can also use the probability generating
function
E[z^X] = Σ_{k=0}^∞ e^{-λ} (λz)^k/k! = e^{-λ(1-z)}.
Differentiating the geometric series
Σ_{k=0}^∞ x^k = 1/(1 - x)
gives
Σ_{k=1}^∞ k x^{k-1} = 1/(1 - x)^2.
Therefore
E[X] = Σ_{k=0}^∞ k (1-p)^k p = p(1-p) Σ_{k=1}^∞ k (1-p)^{k-1} = p(1-p)/p^2 = (1 - p)/p.
For calculating the higher moments one can proceed as in the Poisson case
or use the moment generating function.
Cantor distribution: because one can realize a random variable with the
Cantor distribution as X = Σ_{n=1}^∞ X_n/3^n, where the IID random variables
X_n take the values 0 and 2 with probability 1/2 each, we have
E[X] = Σ_{n=1}^∞ E[X_n]/3^n = Σ_{n=1}^∞ 1/3^n = 1/2,
Var[X] = Σ_{n=1}^∞ Var[X_n]/9^n = Σ_{n=1}^∞ 1/9^n = (1/9)/(1 - 1/9) = 1/8.
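The mean 1/2 and variance 1/8 of the Cantor distribution can be reproduced by sampling truncated ternary series. A sketch (my own, not from the text; the truncation depth 40 and the sample size are arbitrary choices):

```python
import random

random.seed(3)

def cantor_sample(depth=40):
    """X = sum_n X_n / 3^n with IID X_n uniform on {0, 2}."""
    return sum(random.choice((0, 2)) / 3 ** k for k in range(1, depth + 1))

n = 100000
xs = [cantor_sample() for _ in range(n)]
mean = sum(xs) / n                             # ~ 1/2
var = sum((x - mean) ** 2 for x in xs) / n     # ~ 1/8
```

The empirical moments match the exact values computed above.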
M_X(t) = Σ_{k=0}^∞ e^{tk} p(1-p)^k = p Σ_{k=0}^∞ ((1-p)e^t)^k = p/(1 - (1-p)e^t).
A random variable X on Ω = {1, 2, 3, ...} with the distribution of first
success P[X = k] = p(1-p)^{k-1} has the moment generating function
M_X(t) = Σ_{k=1}^∞ e^{tk} p(1-p)^{k-1} = p e^t Σ_{k=0}^∞ ((1-p)e^t)^k = p e^t/(1 - (1-p)e^t).
on the positive real line Ω = [0, ∞) with the help of the moment generating
function. If k is allowed to be an arbitrary positive real number, then the
Erlang distribution is called the Gamma distribution.
Proof. If X and Y are independent, then e^{tX} and e^{tY} are also independent.
Therefore,
E[X] = np
E[X^2] = np(1 - p + np)
Var[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2 = np(1 - p)
E[X^3] = np(1 + 3(n-1)p + (2 - 3n + n^2)p^2)
E[X^4] = np(1 + 7(n-1)p + 6(2 - 3n + n^2)p^2 + (-6 + 11n - 6n^2 + n^3)p^3)
E[(X - E[X])^4] = E[X^4] - 4E[X]E[X^3] + 6E[X]^2 E[X^2] - 3E[X]^4
= np(1 - p)(1 + (3n - 6)p(1 - p))
H(X) = -∫_0^1 (x^{1/m-1}/m) log(x^{1/m-1}/m) dx.
To compute this integral, note first that f(x) = x^a log(x^a) = a x^a log(x) has
the antiderivative a x^{1+a}((1+a) log(x) - 1)/(1+a)^2, so that ∫_0^1 x^a log(x^a) dx =
-a/(1+a)^2, and H(X) = 1 - m + log(m). Because (d/dm) H(X_m) = (1/m) - 1
and (d^2/dm^2) H(X_m) = -1/m^2, the entropy has its maximum at m = 1, where
the density is uniform. The entropy decreases for m → ∞. Among all random variables X(x) = x^m, the random variable X(x) = x has maximal
entropy.
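The antiderivative claim behind this computation is easy to verify numerically. A sketch (my own, not from the text; a = 2 and the step count are arbitrary choices) checking ∫_0^1 x^a log(x^a) dx = -a/(1+a)^2:

```python
from math import log

a = 2.0
steps = 100000
h = 1.0 / steps
# midpoint rule; the integrand x^a log(x^a) vanishes at both endpoints for a > 0
integral = sum(((i + 0.5) * h) ** a * log(((i + 0.5) * h) ** a)
               for i in range(steps)) * h
exact = -a / (1 + a) ** 2      # = -2/9 for a = 2
```

The numerical integral matches the closed form, supporting the value H(X) = 1 - m + log(m) derived from it.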
∫_R f dμ_n → ∫_R f dμ
in the limit n → ∞.
Weak convergence defines a topology on the set M_1(R) of all Borel probability measures on R. Similarly, one has a topology for M_1([a, b]).
F(s) = lim_{δ→0} F(s - δ) ≤ liminf_{n→∞} F_n(s) ≤ limsup_{n→∞} F_n(s) ≤ F(s).
That is, we have established convergence in distribution.
(ii) Assume now we have no convergence in law. There exists then a continuous function f so that ∫ f dμ_n → ∫ f dμ fails. That is, there is a
subsequence and ε > 0 such that |∫ f dμ_{n_k} - ∫ f dμ| ≥ ε > 0. There exists
a compact interval J such that |∫_J f dμ_{n_k} - ∫_J f dμ| ≥ ε/2 > 0, and we
can assume that μ_{n_k} and μ have support on J. The set of all probability
measures on J is compact in the weak topology. Therefore, a subsequence
of μ_{n_k} converges weakly to a measure ν with |ν(f) - μ(f)| ≥ ε/2. Define the π-system I of all intervals {(-∞, s] | s continuity point of F}.
We have μ_n((-∞, s]) = F_{X_n}(s) → F_X(s) = μ((-∞, s]). Using (i) we see
μ_{n_k}((-∞, s]) → ν((-∞, s]) also, so that μ and ν agree on the π-system I. If
μ and ν agree on I, they agree on the π-system of all intervals {(-∞, s]}.
By lemma (2.1.4), we know that μ = ν on the Borel σ-algebra.
This contradicts |ν(f) - μ(f)| ≥ ε/2. So, the initial assumption of having
no convergence in law was wrong. □
X* = (X - E[X])/σ[X],
the normalized random variable, which has mean E[X*] = 0 and variance
Var[X*] = 1. Given a sequence of random variables X_k, we
again use the notation S_n = Σ_{k=1}^n X_k.
E[|X|^p] = 2^{p/2} Γ((p+1)/2)/√π.
Y_i = (X_i - E[X_i])/σ[S_n], 1 ≤ i ≤ n,
so that S_n* = Σ_{i=1}^n Y_i. Define N(0, σ_i^2)-distributed random variables Y_i* having the property that the set of random variables
{Y_1, ..., Y_n, Y_1*, ..., Y_n*}
is independent. The distribution of S̄_n = Σ_{i=1}^n Y_i* is just the normal distribution N(0,1). In order to show the theorem, we have to prove E[f(S_n*)] -
E[f(S̄_n)] → 0 for any f ∈ C_b(R). It is enough to verify it for smooth f.
Define
Z_k = Y_1 + ··· + Y_{k-1} + Y_{k+1}* + ··· + Y_n*.
Note that Z_1 + Y_1* = S̄_n and Z_n + Y_n = S_n*. Using first a telescopic sum
and then Taylor's theorem, we can write
f(S_n*) - f(S̄_n) = Σ_{k=1}^n [f(Z_k + Y_k) - f(Z_k + Y_k*)]
= Σ_{k=1}^n [f'(Z_k)(Y_k - Y_k*) + R(Z_k, Y_k) + R(Z_k, Y_k*)]
with a Taylor rest term R(z, y), which can depend on f. We get therefore
|E[f(S_n*)] - E[f(S̄_n)]| ≤ Σ_{k=1}^n E[|R(Z_k, Y_k)|] + E[|R(Z_k, Y_k*)|]. (2.7)
Taylor's theorem gives |R(Z_k, Y_k)| ≤ const · |Y_k|^3, so that
Σ_{k=1}^n E[|R(Z_k, Y_k)|] + E[|R(Z_k, Y_k*)|] ≤ const · Σ_{k=1}^n E[|Y_k|^3] + E[|Y_k*|^3].
Because the Y_k* are N(0, σ_k^2)-distributed, we get by lemma (2.14.3) and the
Jensen inequality (2.5.1) E[|Y_k*|^3] ≤ const · E[|Y_k|^3], so that
Σ_{k=1}^n E[|Y_k|^3] + E[|Y_k*|^3] ≤ const · sup_i ‖X_i‖_3^3 · n/Var[S_n]^{3/2}
= const · sup_i ‖X_i‖_3^3/((Var[S_n]/n)^{3/2} √n).
We have seen that for every smooth f there exists a constant C(f) such
that |E[f(S_n*)] - E[f(S̄_n)]| ≤ C(f)/√n. □
If we assume the X_i to be identically distributed, we can relax the condition
X_i ∈ 𝓛^3 to X_i ∈ 𝓛^2:
Theorem 2.14.2 (Central limit theorem for IID 𝓛^2 random variables). If
X_i ∈ 𝓛^2 are IID and satisfy 0 < Var[X_i], then S_n* converges weakly to a
random variable with standard normal distribution N(0, 1).
Proof. The same proof gives equation (2.7). We change the estimation of
the Taylor rest term to |R(z, y)| ≤ δ(y) · y^2 with δ(y) → 0 for |y| → 0. Using the IID
property and dominated convergence, we can estimate the rest term
R = Σ_{k=1}^n E[|R(Z_k, Y_k)|] + E[|R(Z_k, Y_k*)|]
as follows:
R ≤ Σ_{k=1}^n E[δ(Y_k) Y_k^2] + E[δ(Y_k*) Y_k*^2] = n E[δ(Y_1) Y_1^2] + n E[δ(Y_1*) Y_1*^2].
With Y_1 = X_1*/√n, we have n E[δ(Y_1) Y_1^2] = E[δ(X_1*/√n)(X_1*)^2], which
converges to zero by dominated convergence, and similarly for the term
with Y_1*. □
Definition. Let P_{0,1} be the space of probability measures μ on (R, B) which
have the properties that ∫_R x^2 dμ(x) = 1 and ∫_R x dμ(x) = 0. Define the map T
on P_{0,1}.
Corollary 2.14.4. The only attracting fixed point of T on P_{0,1} is the law of
the standard normal distribution.
lim_{n→∞} P[(S_n - np)/√(np(1-p)) ≤ x] = (1/√(2π)) ∫_{-∞}^x e^{-y^2/2} dy,
as had been shown by de Moivre in 1730 in the case p = 1/2 and for general
p ∈ (0,1) by Laplace in 1812. It is a direct consequence of the central limit
theorem:
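A direct simulation of the de Moivre-Laplace statement (my own sketch, not from the text; n, p, x and the trial count are arbitrary choices):

```python
import random
from math import erf, sqrt

random.seed(4)

n, p, x = 200, 0.3, 1.0
trials = 10000
count = 0
for _ in range(trials):
    s = sum(1 for _ in range(n) if random.random() < p)    # one binomial B(n, p) sample
    if (s - n * p) / sqrt(n * p * (1 - p)) <= x:
        count += 1
empirical = count / trials
phi = 0.5 * (1 + erf(x / sqrt(2)))                          # Phi(1) ~ 0.8413
```

The empirical frequency of the standardized sum falling below x is already close to the normal distribution function Φ(x) for moderate n.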
For more general versions of the central limit theorem, see [105]. The next
limit theorem for discrete random variables illustrates why the Poisson distribution on N is natural. Denote by B(n, p) the binomial distribution on
{1, ..., n} and by P_λ the Poisson distribution on N \ {0}.
Proof. We have to show that P[X_n = k] → P[X = k] for each fixed k ∈ N.
□
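The binomial-to-Poisson limit can be checked exactly, without sampling. A sketch (my own, not from the text; λ = 2 and n = 1000 are arbitrary choices) comparing the two probability mass functions pointwise:

```python
from math import comb, exp, factorial

lam, n = 2.0, 1000
p = lam / n
max_diff = max(
    abs(comb(n, k) * p ** k * (1 - p) ** (n - k) - exp(-lam) * lam ** k / factorial(k))
    for k in range(20))
# the total variation distance is known to be at most lam^2 / n = 0.004 here
```

The pointwise differences are already tiny for n = 1000, consistent with the known O(λ²/n) rate.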
for the distribution function of a random variable X which has the standard
normal distribution N(0,1). Given a sequence of IID random variables Xn
with this distribution.
a) Justify that one can estimate for large n probabilities
large numbers.
Remark. The fact that μ ≪ ν as defined earlier is equivalent to this is called
the Radon-Nikodym theorem ([?]). The function f is therefore called the
Radon-Nikodym derivative of μ with respect to ν.
Example. If ν is the counting measure on N = {0, 1, 2, ...} and μ is the
law of the geometric distribution with parameter p, then the density is
f(k) = p(1-p)^k.
Example. If ν is the Lebesgue measure on (-∞, ∞) and μ is the law of
the standard normal distribution, then the density is f(x) = e^{-x^2/2}/√(2π).
There is a multi-variable calculus trick using polar coordinates, which immediately shows that f is a density:
∫_R ∫_R e^{-(x^2+y^2)/2} dx dy = ∫_0^∞ ∫_0^{2π} e^{-r^2/2} r dθ dr = 2π.
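Numerically, the normalization can be confirmed with a simple Riemann sum (my own sketch, not from the text; the integration range [-10, 10] and the step count are arbitrary choices):

```python
from math import exp, pi, sqrt

steps = 200000
h = 20.0 / steps
# midpoint rule for the integral of e^{-x^2/2} over [-10, 10]
total = sum(exp(-(-10.0 + (i + 0.5) * h) ** 2 / 2) for i in range(steps)) * h
integral_of_density = total / sqrt(2 * pi)   # should be ~ 1
```

The truncation error outside [-10, 10] is negligible, so the result is 1 to high precision.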
H(μ) = ∫_Ω -f(ω) log(f(ω)) dν(ω).
It generalizes the earlier defined Shannon entropy, where the assumption
had been dν = dx.
H(μ) = Σ_ω -f(ω) log(f(ω)).
The entropy of the geometric distribution P[X = k] = (1-p)^k p is
H = -Σ_{k=0}^∞ (1-p)^k p log((1-p)^k p) = -log(p) - (1-p) log(1-p)/p.
H(μ) = ∫_R -f(x) log(f(x)) dx.
For example, for the standard normal distribution μ with probability density function f(x) = (1/√(2π)) e^{-x^2/2}, the entropy is H(f) = (1 + log(2π))/2.
A = {A_1, ..., A_n}
f̃ = Σ_i (∫_{A_i} f dν) 1_{A_i},
H({A_i}) = Σ_i -μ(A_i) log(μ(A_i)),
which is called the entropy of the partition {A_i}. The approximation of the
density f by a step function f̃ is called coarse graining and the entropy
of f̃ is called the coarse grained entropy. It has first been considered by
Gibbs in 1902.
H(μ) = Σ_ω -f(ω) log(f(ω)).
100 Chapter 2. Limit theorems
If f takes only the values 0 or 1, which means that μ is deterministic, then H(μ) = 0: there is no surprise, and what the measurements show is the reality. On the other hand, if f is the uniform distribution on Ω, then H(μ) = log(|Ω|). We will see in a moment that this is the maximal entropy.
Theorem 2.15.1 (Gibbs inequality). 0 ≤ H(μ̄|μ) ≤ +∞ and H(μ̄|μ) = 0 if and only if μ̄ = μ.
Proof. We can assume H(μ̄|μ) < ∞. The function u(x) = x log(x) is convex on R⁺ = [0,∞) and satisfies u(x) ≥ x - 1. By Jensen's inequality,
H(μ̄|μ) = E_μ[u(f̄/f)] ≥ u(E_μ[f̄/f]) = u(1) = 0.
Remark. The relative entropy can be used to measure the distance between two distributions, although it is not a metric. The relative entropy is also known under the name Kullback-Leibler divergence or Kullback-Leibler metric if ν = dx [85].
With
H(μ|μ̄) = H(μ̄) - H(μ)
we also have uniqueness: if two measures μ, μ̄ have maximal entropy, then H(μ|μ̄) = 0, so that by the Gibbs inequality (2.15.1), μ = μ̄.
e) For the normal distribution, log(f(x)) = a + b(x - m)² with two real numbers a, b depending only on m and σ. The claim follows since we fixed Var[X] = E[(X - m)²] for all distributions.
Remark. Let us try to get the maximal distribution using calculus of variations. In order to find the maximum of the functional
H(f) = -∫_Ω f log(f) dν
on L¹(ν) under the constraints
∫_Ω f(x) dν(x) = 1, ∫_Ω x f(x) dν(x) = c,
one introduces Lagrange multipliers λ, μ and solves the first equation for f:
f = e^{-1-λ-μx}.
The constraints become
∫_Ω e^{-1-λ-μx} dν(x) = 1, ∫_Ω x e^{-1-λ-μx} dν(x) = c.
Dividing the third equation by the second, we can get μ from the equation ∫ x e^{-μx} dν(x) = c ∫ e^{-μx} dν(x), and λ from the normalization e^{1+λ} = ∫ e^{-μx} dν(x). This variational approach produces critical points of the entropy. Because the Hessian D²(H) = -1/f is negative definite, it is also negative definite when restricted to the surface in L¹ determined by the constraints F = 1, G = c. This indicates that we have found a global maximum.
Example. For Ω = R, X(x) = x², we get the normal distribution N(0,1).
Example. For Ω = N, X(n) = e_n, we get f(n) = e^{-e_n λ₁}/Z(f) with Z(f) = Σ_n e^{-e_n λ₁}, where λ₁ is determined by Σ_n e_n e^{-e_n λ₁} = c. This is called the discrete Maxwell-Boltzmann distribution. In physics, one writes λ₁⁻¹ = kT with the Boltzmann constant k, determining T, the temperature.
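The Lagrange-multiplier computation can be carried out numerically. The sketch below (assuming energy levels X(n) = n on a truncated state space and a target mean below the uniform mean) finds the multiplier by bisection; the resulting density is geometric, as the variational argument predicts:

```python
import math

def maxent(levels, c, lo=0.0, hi=20.0):
    # Exponential-family ansatz f(n) = exp(-lam*n)/Z on the given levels;
    # bisection on lam so that the mean sum(n*f(n)) equals c.
    # Assumes 0 < c < mean of the uniform distribution (so lam > 0).
    def mean(lam):
        w = [math.exp(-lam * n) for n in levels]
        Z = sum(w)
        return sum(n * wn for n, wn in zip(levels, w)) / Z
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean(mid) > c:
            lo = mid          # mean is decreasing in lam: raise lam
        else:
            hi = mid
    lam = (lo + hi) / 2
    w = [math.exp(-lam * n) for n in levels]
    Z = sum(w)
    return [wn / Z for wn in w], lam

f, lam = maxent(list(range(50)), 2.0)
print(lam, sum(n * p for n, p in enumerate(f)))  # mean constraint holds
```

With mean 2, the bisection recovers λ = log(3/2), i.e. the geometric distribution with ratio 2/3.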
Here is a dictionary matching some notions in probability theory with corresponding terms in statistical physics. The statistical physics jargon is often more intuitive.
Probability theory | Statistical mechanics
Set Ω | Phase space
Measure space | Thermodynamic system
Random variable | Observable (for example energy)
Probability density | Thermodynamic state
Entropy | Boltzmann-Gibbs entropy
Densities of maximal entropy | Thermodynamic equilibria
Central limit theorem | Maximal entropy principle
f_λ = e^{λX}/Z(λ)
on Ω. The set
{μ_λ | λ ∈ Λ}
of measures on (Ω, A) is called the exponential family defined by ν and X. One has
H(μ) + λ E_μ[X] ≤ log Z(λ),
because writing f = (f/f_λ) f_λ gives
H(μ) = -∫_Ω f log((f/f_λ) f_λ) dν = -H(μ|μ_λ) + log(Z(λ)) - λ E_μ[X].
Therefore
Example. Take on the real line the Hamiltonian X(x) = x² and a measure μ = f dx; the energy is ∫ x² dμ. Among all symmetric distributions fixing the energy, the Gaussian distribution maximizes the entropy.
Z(λ) = Σ_k e^{λk} e^{-1}/k! = exp(e^λ - 1)
Example. Ω = {1,...,N}, ν the counting measure, and let μ_p be the binomial distribution with parameter p. Take μ = μ_{1/2} and X(k) = k. Since the density of μ_p with respect to μ_{1/2} is proportional to e^{λk} with λ = log(p/(1-p)), the family μ_p is an exponential family.
where
Z(λ) = E_μ[e^{Σ_i λ_i X_i}]
Theorem 2.15.6. For all probability measures μ which are absolutely continuous with respect to ν, we have for all λ ∈ Λ
F(μ) = H(μ) + λ E_μ[X].
∫_A Pf dν = ∫_{T^{-1}(A)} f dν
is called the Perron-Frobenius operator associated to T. It is a Markov operator. Closely related is the operator Pf(x) = f(Tx) for measure preserving invertible transformations. This Koopman operator is often studied on L², but it becomes a Markov operator when considered as a transformation on L¹.
Exercise. Assume Ω = [0,1] with Lebesgue measure μ. Verify that the Perron-Frobenius operator for the tent map
T(x) = 2x for x ∈ [0,1/2], T(x) = 2(1-x) for x ∈ [1/2,1]
is given by Pf(x) = (f(x/2) + f(1 - x/2))/2.
Proof. We have to show u(Pf)(ω) ≤ Pu(f)(ω) for almost all ω ∈ Ω. Given x = (Pf)(ω), there exists by the definition of convexity a linear function y ↦ ay + b such that u(x) = ax + b and u(y) ≥ ay + b for all y ∈ R. Therefore, since af + b ≤ u(f) and P is positive,
The following theorem states that relative entropy does not increase along
orbits of Markov operators. The assumption that {/ > 0 } is mapped into
itself is actually not necessary, but simplifies the proof.
(ii) Define f_k = inf(f, kg) so that f_k/g ≤ k. We have f_k ≤ f_{k+1} and f_k → f in L¹. From (i) we know that H(Pf_k|Pg) ≤ H(f_k|g). We can assume H(f|g) < ∞ because the result is trivially true in the other case. Define B = {f ≤ g}. On B we have f_k log(f_k/g) = f log(f/g), and on Ω \ B we have
f_k log(f_k/g) ≤ f_{k+1} log(f_{k+1}/g) → f log(f/g),
so that by the Lebesgue dominated convergence theorem
H(f|g) = lim_{k→∞} H(f_k|g).
□
Corollary 2.16.4. The operator T(μ)(A) = ∫_{R²} 1_A(x+y) dμ(x) dμ(y) does not decrease entropy.
Proof. Denote by X_μ a random variable having the law μ and by μ(X) the law of a random variable X. For a fixed random variable Y, we define the Markov operator P_Y(μ) = μ(X_μ + Y). Because the entropy is nondecreasing for each P_Y, we have this property also for the nonlinear map T(μ) = P_{X_μ}(μ). □
φ_X(u) = E[e^{iuX}].
If F_X is the distribution function of X and μ_X its law, the characteristic function of X is the Fourier-Stieltjes transform
φ_X(u) = ∫_R e^{iux} dF_X(x).
Remark. By definition, characteristic functions are Fourier transforms of probability measures: if μ is the law of X, then φ_X = μ̂.
Example. For a random variable with density f_X(x) = (m+1)x^m on Ω = [0,1], the characteristic function is
φ_X(t) = ∫_0^1 e^{itx} (m+1) x^m dx = (m+1)! (1 - e^{it} e_m(-it)) / (-it)^{1+m},
where e_n(x) = Σ_{k=0}^n x^k/k! is the n'th partial exponential function.
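One can verify this closed form numerically; the sketch below compares it with a direct midpoint-rule evaluation of E[e^{itX}], assuming the density (m+1)x^m as above:

```python
import cmath
import math

def phi_numeric(m, t, n=200000):
    # Midpoint rule for E[e^{itX}] with density (m+1) x^m on [0,1].
    h = 1.0 / n
    return sum(cmath.exp(1j * t * (k + 0.5) * h) * (m + 1) * ((k + 0.5) * h) ** m
               for k in range(n)) * h

def phi_closed(m, t):
    # (m+1)! * (1 - e^{it} e_m(-it)) / (-it)^{m+1},
    # with e_m the partial exponential sum.
    em = sum((-1j * t) ** k / math.factorial(k) for k in range(m + 1))
    return math.factorial(m + 1) * (1 - cmath.exp(1j * t) * em) / (-1j * t) ** (m + 1)

print(phi_numeric(2, 1.5))
print(phi_closed(2, 1.5))  # the two values agree
```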
F*G(x) = ∫_R F(x-y) dG(y)
Proof. While one can deduce this fact directly from Fourier theory, we prove it by hand, using an approximation of the integral by step functions:
∫_R e^{iux} d(F*G)(x)
= lim_{N,n→∞} Σ_{k=-N2ⁿ+1}^{N2ⁿ} e^{iuk2⁻ⁿ} ∫_R [F(k/2ⁿ - y) - F((k-1)/2ⁿ - y)] dG(y)
= lim_{N,n→∞} Σ_{k=-N2ⁿ+1}^{N2ⁿ} ∫_R e^{iu(k2⁻ⁿ - y)} [F(k/2ⁿ - y) - F((k-1)/2ⁿ - y)] e^{iuy} dG(y)
= ∫_R [lim_{N→∞} ∫_{-N-y}^{N-y} e^{iux} dF(x)] e^{iuy} dG(y) = ∫_R φ(u) e^{iuy} dG(y)
= φ(u)ψ(u).
□
Proof. Since the X_j are independent, we get for any set of complex valued measurable functions g_j for which E[g_j(X_j)] exists:
E[∏_{j=1}^n g_j(X_j)] = ∏_{j=1}^n E[g_j(X_j)].
This is verified first for indicator functions, then for step functions by linearity, and then for arbitrary measurable functions.
Example. If X_n are IID random variables which take the values 0 and 2 with probability 1/2 each, the random variable X = Σ_{n=1}^∞ X_n/3ⁿ has the Cantor distribution. Because the characteristic function of X_n/3ⁿ is φ_{X_n/3ⁿ}(t) = E[e^{itX_n/3ⁿ}] = (e^{i2t/3ⁿ} + 1)/2, we see that the characteristic function of X is
φ_X(t) = ∏_{n=1}^∞ (e^{i2t/3ⁿ} + 1)/2.
The centered random variable Y = X - 1/2 can be written as Y = Σ_{n=1}^∞ Y_n/3ⁿ, where Y_n takes the values -1, 1 with probability 1/2 each, so that φ_Y(t) = ∏_{n=1}^∞ cos(t/3ⁿ).
Proof. This follows immediately from proposition (2.17.5) and the alge
braic isomorphisms between the algebra of characteristic functions with
convolution product and the algebra of distribution functions with point
wise multiplication. □
Example. Let Y_k be IID random variables and let X_k = λ^k Y_k with 0 < λ < 1. The process S_n = Σ_{k=1}^n X_k is called the random walk with variable step size or the branching random walk with exponentially decreasing steps. Let μ be the law of the random sum X = Σ_{k=1}^∞ X_k. If φ_Y(t) is the characteristic function of Y, then the characteristic function of X is
φ_X(t) = ∏_{n=1}^∞ φ_Y(tλⁿ).
For example, if the random variables Y_n take the values -1, 1 with probability 1/2 each, so that φ_Y(t) = cos(t), then
φ_X(t) = ∏_{n=1}^∞ cos(tλⁿ).
The measure μ is then called a Bernoulli convolution. For example, for λ = 1/3, the measure is supported on the Cantor set, as we have seen above. For more information on this stochastic process and the properties of the measure μ, which depends in a subtle way on λ, see [41].
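A quick Monte Carlo experiment illustrates the product formula; the parameters, sample size and truncation depth below are arbitrary choices:

```python
import math
import random

# Compare the truncated product phi_X(t) = prod_n cos(t * lam^n)
# with a Monte Carlo estimate of E[cos(t X)], X = sum lam^n Y_n, Y_n = +-1.
random.seed(7)
lam, t = 1 / 3, 2.0

phi_prod = 1.0
for n in range(1, 60):
    phi_prod *= math.cos(t * lam ** n)

weights = [lam ** n for n in range(1, 31)]   # truncated series for X
trials = 50000
acc = 0.0
for _ in range(trials):
    x = sum(w * random.choice((-1, 1)) for w in weights)
    acc += math.cos(t * x)   # by symmetry, phi_X(t) = E[cos(t X)] is real
mc = acc / trials
print(phi_prod, mc)
```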
Exercise. Show that X_n → X in distribution if and only if the characteristic functions satisfy φ_{X_n}(t) → φ_X(t) for all t ∈ R.
φ_X(t) = E[e^{it·X}]
on R^k, where we wrote t = (t_1,...,t_k). Two such random variables X, Y are independent if the σ-algebras X⁻¹(B) and Y⁻¹(B) are independent, where B is the Borel σ-algebra on R^k.
a) Show that if X and Y are independent, then φ_{X+Y} = φ_X · φ_Y.
b) Given a real nonsingular k × k matrix A called the covariance matrix and a vector m = (m_1,...,m_k) called the mean of X. We say a vector valued random variable X has a Gaussian distribution with covariance A and mean m if
φ_X(t) = e^{im·t - (t·At)/2}.
Exercise. Let (Ω, A, μ) be a probability space and let U, V ∈ L¹ be random variables (describing the energy density and the mass density of a thermodynamical system). We have seen that the Helmholtz free energy is
E_μ[U] - kT H[μ].
Exercise. a) Given the discrete measure space (Ω = {e_0 + nδ : n ∈ N}, ν), with e_0 ∈ R and δ > 0, where ν is the counting measure, and let X(k) = k. Find the distribution f maximizing the entropy H(f) among all measures μ = fν fixing E_μ[X] = e.
b) The physical interpretation is as follows: Ω is the discrete set of energies of a harmonic oscillator, e_0 is the ground state energy, δ = ℏω is the incremental energy, where ω is the frequency of the oscillation and ℏ is Planck's constant. X(k) = k is the Hamiltonian and E[X] is the energy. Put λ = 1/kT, where T is the temperature (in the answer of a) there appears a parameter λ, the Lagrange multiplier of the variational problem). Since one can also fix the temperature T instead of the energy e, the distribution in a) maximizing the entropy is determined by ω and T. Compute the spectrum e(ω,T) of the blackbody radiation defined by
e(ω,T) = (ω²/(π²c³)) (E[X] - e_0),
where c is the velocity of light. You have then deduced Planck's blackbody radiation formula.
P[max_{k ≤ n} S_k > ε] ≤ 2 P[S_n > ε].
limsup_{n→∞} S_n/Λ_n = 1, liminf_{n→∞} S_n/Λ_n = -1.
Proof. We follow [47]. Because the second statement follows from the first by replacing X_n with -X_n, we only have to prove
limsup_{n→∞} S_n/Λ_n = 1.
Define n_k = [(1+ε)^k] ∈ N, where [x] denotes the integer part of x, and the events
A_k = {S_n > (1+ε)Λ_n for some n with n_k ≤ n < n_{k+1}}.
Clearly limsup_k A_k = {S_n > (1+ε)Λ_n infinitely often}. By the first Borel-Cantelli lemma (2.2.2), it is enough to show that Σ_k P[A_k] < ∞. For each k, we get with the above lemma
P[A_k] ≤ C exp(-(1/2)(1+ε)² · 2n_k log log(n_k)/n_{k+1})
≤ C₁ exp(-(1+ε) log log(n_k))
= C₁ log(n_k)^{-(1+ε)} ≤ C₂ k^{-(1+ε)}.
2.18. The law of the iterated logarithm
Having shown that P[A_k] ≤ const · k^{-(1+ε)} proves the claim Σ_k P[A_k] < ∞.
Given ε > 0, choose N > 1 large enough and c < 1 near enough to 1 such that
c√(1 - 1/N) - 2/√N > 1 - ε. (2.10)
Define n_k = N^k and Δn_k = n_k - n_{k-1}. The sets A_k are independent. In the following estimate, we use the fact that ∫_t^∞ e^{-x²/2} dx ≥ C e^{-t²/2}/t for some constant C.
for sufficiently large k. Both inequalities hold therefore for infinitely many values of k. For such k,
We know that N(0,1) is the unique fixed point of the map T by the central limit theorem. If the law of the iterated logarithm is true for T(X), then it is true for X. This shows that it would be enough to prove the theorem in the case where X has distribution in an arbitrarily small neighborhood of N(0,1).
We present a second proof of the central limit theorem in the IID case to illustrate the use of characteristic functions.
Theorem 2.18.3 (Central limit theorem for IID random variables). Given X_n ∈ L² which are IID with mean 0 and finite variance σ². Then S_n/(σ√n) → N(0,1) in distribution.
E[e^{itS_n/(σ√n)}] → e^{-t²/2}.
Denote by φ_{X_n} the characteristic function of X_n. Since by assumption E[X_n] = 0 and E[X_n²] = σ², we have φ_{X_n}(t) = 1 - σ²t²/2 + o(t²). Therefore
E[e^{itS_n/(σ√n)}] = φ_{X_1}(t/(σ√n))ⁿ = (1 - t²/(2n) + o(1/n))ⁿ = e^{-t²/2} + o(1).
□
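For steps taking the values ±1 (so σ = 1), the computation in this proof is fully explicit: E[e^{itS_n/√n}] = cos(t/√n)ⁿ, which converges to e^{-t²/2}. A short numerical check of this convergence:

```python
import math

# E[exp(i t S_n / sqrt(n))] for +-1 steps equals cos(t/sqrt(n))^n,
# and should approach exp(-t^2/2) as n grows.
t = 1.7
target = math.exp(-t * t / 2)
for n in (10, 100, 10000, 1000000):
    print(n, math.cos(t / math.sqrt(n)) ** n)
print("limit:", target)
approx = math.cos(t / math.sqrt(10**6)) ** 10**6
```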
This method can be adapted to other situations as the following example
shows.
Proof.
E[S_n] = Σ_{k=1}^n 1/k = log(n) + γ + o(1),
where γ = lim_{n→∞} (Σ_{k=1}^n 1/k - log(n)) is the Euler constant, and
Var[S_n] = Σ_{k=1}^n (1/k)(1 - 1/k) = log(n) + γ - π²/6 + o(1).
124 Chapter 3. Discrete Stochastic Processes
Proof. We can assume without loss of generality that P' is a positive measure (otherwise, use the Hahn decomposition P' = P⁺ - P⁻, where P⁺ and P⁻ are positive measures).
E[Y₁ ∨ Y₂; A] = E[Y₁; A ∩ {Y₁ > Y₂}] + E[Y₂; A ∩ {Y₂ ≥ Y₁}]
≤ P'[A ∩ {Y₁ > Y₂}] + P'[A ∩ {Y₂ ≥ Y₁}] = P'[A]
and contains a function Y different from 0, since otherwise P' would be singular with respect to P according to the definition (2.15) of absolute continuity. We claim that the supremum Y of all functions in T satisfies YP = P': an application of Beppo Levi's theorem (2.4.1) shows that the supremum of T is in T. The measure P'' = P' - YP is the zero measure, since we could repeat the same argument with a new set T' for the absolutely continuous part of P''.
(ii) Uniqueness: assume there exist two derivatives Y, Y'. One has then E[Y - Y'; {Y > Y'}] = 0 and so Y ≤ Y' almost everywhere. A similar argument gives Y' ≤ Y almost everywhere, so that Y = Y' almost everywhere. In other words, Y = Y' in L¹. □
Proof. Define the measures P[A] and P'[A] = ∫_A X dP = E[X; A] on the probability space (Ω, B). Given a set B ∈ B with P[B] = 0, then P'[B] = 0, so that P' is absolutely continuous with respect to P. Radon-Nikodym's theorem (3.1.1) provides us with a random variable Y ∈ L¹(B) with P'[A] = ∫_A X dP = ∫_A Y dP. □
Example. Let Ω = {1,2,3,4} and let A be the σ-algebra of all subsets of Ω. Let B = {∅, {1,2}, {3,4}, Ω}. What is the conditional expectation Y = E[X|B] of the random variable X(k) = k²? The Hilbert space L²(A) is the four-dimensional space R⁴, because a random variable X is now just a vector X = (X(1), X(2), X(3), X(4)) = (1,4,9,16). The Hilbert space L²(B) is the set of all vectors v = (v₁, v₂, v₃, v₄) for which v₁ = v₂ and v₃ = v₄, because functions which were not constant on {1,2} and {3,4} would generate a finer algebra. It is the two-dimensional subspace of all vectors {v = (a,a,b,b) | a,b ∈ R}. The conditional expectation projects onto that plane: the first two components (X(1), X(2)) project to ((X(1)+X(2))/2, (X(1)+X(2))/2), the last two components project to ((X(3)+X(4))/2, (X(3)+X(4))/2). Therefore,
E[X|B] = (5/2, 5/2, 25/2, 25/2).
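On a finite probability space, conditional expectation is plain block averaging. A minimal sketch with uniform measure, using the blocks {1,2} and {3,4} suggested by the projection onto vectors of the form (a,a,b,b):

```python
# Conditional expectation E[X|B] on a finite set with uniform measure:
# average X over each block of the partition generating B.
X = {1: 1, 2: 4, 3: 9, 4: 16}      # X(k) = k^2
blocks = [{1, 2}, {3, 4}]          # partition generating B (assumed here)

Y = {}
for B in blocks:
    avg = sum(X[k] for k in B) / len(B)
    for k in B:
        Y[k] = avg

print(Y)                                    # block averages
print(sum(Y.values()) == sum(X.values()))   # expectation is preserved
```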
Even if the random variables are only in L¹, the following list of properties of conditional expectation is best remembered with proposition 3.1.3 in mind, which identifies conditional expectation as a projection if they are in L².
3.1. Conditional Expectation 127
(2) Use that P'' ≪ P' ≪ P implies P'' = Y'P' = Y'YP, and P'' ≪ P gives P'' = ZP, so that Z = Y'Y almost everywhere.
(3) This is especially useful when applied to the algebra C_Y = {∅, Y, Y^c, Ω}, because X ≤ Y almost everywhere if and only if E[X|C_Y] ≤ E[Y|C_Y] for all Y ∈ B.
(4)-(5) The conditional versions of the Fatou lemma or the dominated convergence theorem are true if they are true conditioned with C_Y for each Y ∈ B. The tower property reduces these statements to versions with B = C_Y, which are then on each of the sets Y, Y^c the usual theorems.
(6) Choose a sequence (a_n, b_n) ∈ R² such that h(x) = sup_n a_n x + b_n for all x ∈ R. We get from h(X) ≥ a_n X + b_n that almost surely E[h(X)|G] ≥ a_n E[X|G] + b_n. These inequalities hold therefore simultaneously for all n, and we obtain almost surely
E[(Y 1_B) 1_C] = E[Y 1_B] P[C],
so that E[1_{B∩C} Y] = E[1_{B∩C} Y']. The measures on σ(B, C)
E[X|B](ω) = ∫_Ω X dP_ω^B.
If such a map from Ω to M₁(Ω) exists and if it is B-measurable, it is called a regular conditional probability given B. In general, such a map ω ↦ P_ω^B need not exist. However, it is known that for a probability space (Ω, A, P) for which Ω is a complete separable metric space with Borel σ-algebra A, there exists a regular conditional probability for any sub-σ-algebra B of A.
P[B|B] = E[1_B|B].
For X ∈ L^p, one has the conditional moment E[X^p|B] if B is a σ-subalgebra of A. These are B-measurable random variables which generalize the usual moments. Of special interest is the conditional variance:
Definition. For X ∈ L², the conditional variance Var[X|B] is the random variable E[X²|B] - E[X|B]². Especially, if B is generated by a random variable Y, one writes Var[X|Y] = E[X²|Y] - E[X|Y]².
Remark. Because conditional expectation is a projection, the properties known for the usual variance hold for the more general notion of conditional variance. For example, if X, Z are independent random variables in L², then Var[X + Z|Y] = Var[X|Y] + Var[Z|Y]. One also has the identity Var[X|Y] = E[(X - E[X|Y])²|Y].
Var[X] = E[X²] - E[X]²
= E[E[X²|Y]] - E[E[X|Y]]²
= E[Var[X|Y]] + E[E[X|Y]²] - E[E[X|Y]]²
= E[Var[X|Y]] + Var[E[X|Y]].
Here is an application which illustrates how one can use the conditional variance: the Cantor distribution is the singular continuous distribution whose law μ has its support on the standard Cantor set.
E[X_n|A_m] = X_m
if m ≤ n, and E[X_n] is constant. Allan Gut mentions in [34] that a martingale is an allegory for "life" itself: the expected state of the future given the past history is equal to the present state, and on average nothing happens.
132 Chapter 3. Discrete Stochastic Processes
Figure. A random variable X on the unit square defines a gray scale picture if we interpret X(x,y) as the gray value at the point (x,y). It shows Joseph Leo Doob (1910-2004), who developed basic martingale theory and many applications. The partitions A_n = {[k/2ⁿ, (k+1)/2ⁿ) × [j/2ⁿ, (j+1)/2ⁿ)} define a filtration of Ω = [0,1] × [0,1]. The sequence of pictures shows the conditional expectations E[X|A_n]. It is a martingale.
E[X_n|Y_0,...,Y_{n-1}] = E[E[X_n|Y_0,...,Y_n]|Y_0,...,Y_{n-1}] = X_{n-1}
and
E[X_{n+1}|Y_1,...,Y_n] = (1/(n+3)) E[Y_{n+1}|Y_1,...,Y_n] = (1/(n+3)) (Y_n + Y_n/(n+2)) = Y_n/(n+2) = X_n,
because on {Y_n = k}, Y_{n+1} = k+1 with probability k/(n+2) and Y_{n+1} = k otherwise. Note that X_n is not independent of X_{n-1}. The process "learns" in the sense that if there are more black balls, then the winning chances are better.
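The one-step martingale identity E[X_{n+1}|Y_n = k] = k/(n+2) = X_n can be verified exactly with rational arithmetic:

```python
from fractions import Fraction

def cond_exp_next(n, k):
    # E[X_{n+1} | Y_n = k] for Polya's urn with n+2 balls, k of them black:
    # draw black with probability k/(n+2), then Y -> k+1, else Y stays k.
    up = Fraction(k, n + 2)
    e_next = (k + 1) * up + k * (1 - up)   # E[Y_{n+1} | Y_n = k]
    return e_next / (n + 3)                # X_{n+1} = Y_{n+1}/(n+3)

for n in range(1, 6):
    for k in range(1, n + 2):
        assert cond_exp_next(n, k) == Fraction(k, n + 2)   # equals X_n
print("martingale property verified")
```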
Figure. A typical run of 30 experiments with Polya's urn.
3.2. Martingales
with the convention that for Y_n = 0 the sum is zero. We claim that X_n = Y_n/mⁿ is a martingale with respect to Y. By the independence of Y_n and Z_{nk}, k ≥ 1, we have for every n
E[Y_{n+1}|Y_0,...,Y_n] = E[Σ_{k=1}^{Y_n} Z_{nk}|Y_0,...,Y_n] = Y_n E[Z_{n1}] = m Y_n,
so that
□
Definition. A stochastic process C = {C_n}_{n≥1} is called previsible if C_n is A_{n-1}-measurable. A process X is called bounded if X_n ∈ L^∞ and there exists K ∈ R such that ||X_n||_∞ ≤ K for all n ∈ N.
(∫C dX)_n = Σ_{k=1}^n C_k(X_k - X_{k-1}).
It is called a discrete stochastic integral or a martingale transform.
Remark. If one wants to relax the boundedness of C, then one has to strengthen the condition on X. The proposition stays true if both C and X are L²-processes.
Remark. Here is an interpretation: if X_n represents your capital in a game, then X_n - X_{n-1} are the net winnings per unit stake. If C_n is the stake on game n, then
(∫C dX)_n = Σ_{k=1}^n C_k(X_k - X_{k-1})
are the total winnings up to time n. A martingale represents a fair game since E[X_n - X_{n-1}|A_{n-1}] = 0, whereas a supermartingale is a game which is unfavorable to you. The above proposition tells us that you cannot find a strategy for placing your stakes which makes the game fair.
which is the time of first entry of X_n into B. The set {T = ∞} is the event that the process never enters B. Obviously
{T ≤ n} = ∪_{k=0}^n {X_k ∈ B} ∈ A_n.
Proposition 3.2.4. Let T₁, T₂ be two stopping times. The infimum T₁ ∧ T₂, the maximum T₁ ∨ T₂, as well as the sum T₁ + T₂ are stopping times.
X_T(ω) = X_{T(ω)}(ω) on {T < ∞},
or equivalently X_T = Σ_{n=0}^∞ X_n 1_{T=n}. The process X_nᵀ = X_{T∧n} is called the stopped process. It is equal to X_T for times n ≥ T and equal to X_n if n < T.
Remark. It is important that we take the stopped process Xᵀ and not the random variable X_T.
For the random walk X on Z starting at 0, let T be the stopping time T = inf{n | X_n = 1}. This is the martingale strategy in a casino which gave these processes their name. As we will see later on, the random walk is recurrent in one dimension, P[T < ∞] = 1. However,
1 = E[X_T] ≠ E[X_0] = 0.
(i) T is bounded.
(ii) X is bounded and T is almost everywhere finite.
(iii) T ∈ L¹ and |X_n - X_{n-1}| is bounded.
(iv) X_T ∈ L¹ and lim_{k→∞} E[X_k; {T > k}] = 0.
(v) X is uniformly integrable and T is almost everywhere finite.
Then E[X_T] ≤ E[X_0]. If X is a martingale and any of the five conditions is true, then E[X_T] = E[X_0].
T ∧ n
Because X_T ∈ L¹, the result follows from the dominated convergence theorem.
(iv) By (i), we get E[X_0] = E[X_{T∧k}] = E[X_T; {T ≤ k}] + E[X_k; {T > k}], and taking the limit gives E[X_0] = lim_{k→∞} E[X_T; {T ≤ k}] = E[X_T] by the dominated convergence theorem and the assumption.
(v) The uniform integrability E[|X_n|; |X_n| ≥ R] → 0 for R → ∞ assures that X_T ∈ L¹, since E[|X_T|] ≤ max_{k≤n} E[|X_k|] + sup_n E[|X_n|; {T > n}] < ∞. Since |E[X_k; {T > k}]| ≤ sup_n E[|X_n|; {T > k}] → 0, we can apply (iv).
E[X_T] = E[X_0],
so that E[X_{n+1}; A] = E[X_n; A]. Since this is true for any A ∈ A_n, we know that E[X_{n+1}|A_n] = E[X_n|A_n] = X_n and X is a martingale. □
T = min{n ≥ 0 | X_n = b or X_n = -a}.
Proposition 3.2.9.
Proof. T is finite almost everywhere. One can see this by the law of the iterated logarithm:
limsup_{n→∞} S_n/Λ_n = 1, liminf_{n→∞} S_n/Λ_n = -1.
(We will give later a direct proof of the finiteness of T when we treat the random walk in more detail.) It follows that P[X_T = -a] = 1 - P[X_T = b]. We check that X_k satisfies condition (iv) in Doob's stopping time theorem: since X_T takes values in {-a, b}, it is in L¹, and because on the set {T > k} the value of X_k is in (-a, b), we have |E[X_k; {T > k}]| ≤ max{a,b} P[T > k] → 0. □
E[ST] = mE[T] .
3.3. Doob's convergence theorem 143
U_∞[a,b] = lim_{n→∞} U_n[a,b].
Remark. The proof uses the following strategy for placing your stakes C: wait until X gets below a. Then play unit stakes until X gets above b and stop playing. Wait again until X gets below a, etc.
Definition. We say a stochastic process X_n is bounded in L^p if there exists M ∈ R such that ||X_n||_p ≤ M for all n ∈ N.
P[U_∞[a,b] = ∞] = 0.
(b - a) E[U_∞[a,b]] < ∞,
which gives the claim. □
X_∞ = lim_{n→∞} X_n
Proof.
Λ = {ω ∈ Ω | X_n has no limit in [-∞,∞]}
= {ω ∈ Ω | liminf X_n < limsup X_n}
= ∪_{a<b, a,b∈Q} {ω ∈ Ω | liminf X_n < a < b < limsup X_n}
= ∪_{a<b, a,b∈Q} Λ_{a,b}.
E[|X_∞|] = E[liminf_n |X_n|] ≤ liminf_n E[|X_n|] ≤ sup_n E[|X_n|] < ∞.
A_k = [k/2ⁿ, (k+1)/2ⁿ)
Example. Let X_k be IID random variables in L¹. For 0 < λ < 1, the branching random walk S_n = Σ_{k=0}^n λ^k X_k is a martingale which is bounded in L¹ because
||S_n||₁ ≤ (1/(1-λ)) ||X_0||₁.
The martingale converges by Doob's convergence theorem almost surely. One can also deduce this from Kolmogorov's theorem (2.11.3) if X_k ∈ L². Doob's convergence theorem (3.3.3) assures convergence for X_k ∈ L¹.
Proof. Since the supermartingale property gives E[|X_n|] = E[X_n] ≤ E[X_0], the process X_n is bounded in L¹. Apply Doob's convergence theorem. □
E[θ^{Y_{n+1}}] = E[f(θ)^{Y_n}].
Write α = f(θ) and use induction to simplify the right hand side to
L(X_n)(λ) = fⁿ(e^{-λ/mⁿ}),
so that L satisfies the functional equation
L(λm) = f(L(λ)).
C_n(ω) = 1_{S(ω) < n ≤ T(ω)}
is previsible.
b) Show that for every supermartingale X and stopping times S ≤ T the inequality
E[X_T] ≤ E[X_S]
holds.
Exercise. In Polya's urn process, let Y_n be the number of black balls after n steps and let X_n = Y_n/(n+2) be the fraction of black balls. We have seen that X is a martingale.
a) Prove that P[Y_n = k] = 1/(n+1) for every 1 ≤ k ≤ n+1.
b) Compute the distribution of the limit X_∞.
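Part a) can be checked by computing the exact distribution of Y_n with dynamic programming, assuming the usual start configuration of one black and one white ball:

```python
from fractions import Fraction

def polya_dist(n):
    # Exact distribution of the number of black balls Y_n in Polya's urn
    # (start: 1 black, 1 white ball), by dynamic programming over draws.
    dist = {1: Fraction(1)}               # Y_0 = 1
    for step in range(n):
        total = step + 2                  # balls in the urn before this draw
        new = {}
        for k, p in dist.items():
            new[k + 1] = new.get(k + 1, 0) + p * Fraction(k, total)
            new[k] = new.get(k, 0) + p * Fraction(total - k, total)
        dist = new
    return dist

d = polya_dist(10)
print(d)   # uniform: P[Y_10 = k] = 1/11 for k = 1,...,11
```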
f(θ) = E[θ^Z] = Σ_{k≥0} p q^k θ^k = p/(1 - qθ),
E[Z] = Σ_{k≥0} k p q^k = q/p = m.
Iterating the Möbius transformation f gives
fⁿ(θ) = (p mⁿ(1-θ) + qθ - p)/(q mⁿ(1-θ) + qθ - p),
which can be checked by induction, representing f by a 2 × 2 matrix and taking its n'th power. We get therefore
L(λ) = E[e^{-λX_∞}] = lim_{n→∞} E[e^{-λY_n/mⁿ}] = lim_{n→∞} fⁿ(e^{-λ/mⁿ}) = (pλ + q - p)/(qλ + q - p).
If m ≤ 1, then the law of X_∞ is a Dirac mass at 0. This means that the process dies out; we see this in this case directly from lim_{n→∞} fⁿ(0) = 1. In the case m > 1, the law of X_∞ has a point mass at 0 of weight p/q = 1/m and an absolutely continuous part (1/m - 1)² e^{(1/m - 1)x} dx. This can be seen by performing a "look up" in a table of Laplace transforms.
Proposition 3.3.9. For a branching process with E[Z] > 1, the extinction probability is the unique solution of f(x) = x in (0,1). For E[Z] ≤ 1, the extinction probability is 1.
{Y = E[X|B] | B ⊂ A, B is a σ-algebra}
is uniformly integrable.
Proof. Given ε > 0, choose δ > 0 such that for all A ∈ A, P[A] ≤ δ implies E[|X|; A] ≤ ε. Choose further K ∈ R such that K⁻¹ E[|X|] ≤ δ. By Jensen's inequality, Y = E[X|B] satisfies |Y| ≤ E[|X||B]. Therefore
K · P[|Y| ≥ K] ≤ E[|Y|] ≤ E[|X|] ≤ δ K,
so that P[|Y| ≥ K] ≤ δ. Now, by the definition of conditional expectation,
|Y| ≤ E[|X||B] and {|Y| ≥ K} ∈ B,
P[A] = E[X] = E[X|C_n] → X,
and because X takes only the values 0 or 1 while the limit is the constant P[A], we get P[A] = 1 or P[A] = 0. □
Definition. A sequence A_{-n} of σ-algebras satisfying
··· ⊂ A_{-n} ⊂ A_{-(n-1)} ⊂ ··· ⊂ A_{-1}
U_k[a,b] ≤ (|a| + ||X||₁)/(b - a).
This implies, in the same way as in the proof of Doob's convergence theorem, that Y_{-n} converges almost everywhere as n → ∞.
We show now that X_{-∞} = E[X|A_{-∞}]: given A ∈ A_{-∞}, we have E[X; A] = E[X_{-n}; A] → E[X_{-∞}; A]. The same argument as before shows that X_{-∞} = E[X|A_{-∞}]. □
Let us also look at a martingale proof of the strong law of large numbers.
Corollary 3.4.5. Given X_n ∈ L¹ which are IID with mean m. Then S_n/n → m in L¹.
If we define A like this, we get the required decomposition, and the submartingale characterization is also obvious. □
X_n = X_0 + Σ_{k=1}^n (X_k - X_{k-1})
Proof.
If on the other hand X_n is bounded in L², then ||X_n||₂ ≤ K < ∞ and Σ_{k=1}^∞ E[(X_k - X_{k-1})²] ≤ K², because the orthogonality of martingale increments gives E[X_n²] = E[X_0²] + Σ_{k=1}^n E[(X_k - X_{k-1})²]. □
A₁ = ∪_i {S(k) = i, A_i ∈ B} ∈ A_{n-1},
A₂ = {A_n ∈ B} ∩ {S(k) ≤ n-1}^c ∈ A_{n-1}.
Since
(X^{S(k)})² - A^{S(k)} = (X² - A)^{S(k)}
is a martingale, and the process A^{S(k)} is bounded by k, the above lemma shows that X^{S(k)} is bounded in L². Therefore lim_{n→∞} X_{n∧S(k)} exists almost surely. Combining this with
Then
P[T(c) = ∞; A_∞ = ∞] > 0,
where T(c) is the stopping time T(c) = inf{n | |X_n| > c}. One gets
E[A_{T(c)∧n}] ≤ (c + K)²
for all n. This is a contradiction to P[A_∞ = ∞, sup_n |X_n| < ∞] > 0. □
Proof. (i) Cesàro's lemma: given 0 = b₀ < b₁ < ··· < b_n < b_{n+1} → ∞ and a sequence v_n ∈ R which converges v_n → v_∞, then (1/b_n) Σ_{k=1}^n (b_k - b_{k-1}) v_k → v_∞.
Proof: let ε > 0. Choose m such that v_k ≥ v_∞ - ε if k ≥ m. Then
liminf_n (1/b_n) Σ_{k=1}^n (b_k - b_{k-1}) v_k ≥ liminf_n (1/b_n) Σ_{k=m}^n (b_k - b_{k-1})(v_∞ - ε) = v_∞ - ε.
Since this is true for every ε > 0, we have liminf ≥ v_∞. By a similar argument, limsup ≤ v_∞. □
(ii) Kronecker's lemma: given 0 = b₀ < b₁ < ··· < b_n < b_{n+1} → ∞ and a sequence x_n of real numbers, define s_n = x₁ + ··· + x_n. Then the convergence of u_n = Σ_{k=1}^n x_k/b_k implies that s_n/b_n → 0.
s_n = Σ_{k=1}^n b_k(u_k - u_{k-1}) = b_n u_n - Σ_{k=1}^n (b_k - b_{k-1}) u_{k-1}.
(iii) Proof of the claim: since A is increasing and null at 0, we have A_n ≥ 0 and 1/(1 + A_n) is bounded. Since A is previsible, 1/(1 + A_n) is previsible too, and we can define the martingale
W = ∫ 1/(1 + A) dX
almost surely. This implies that ⟨W⟩_∞ is finite, so that lim_{n→∞} W_n exists almost surely. Kronecker's lemma (ii), applied pointwise, implies that on {A_∞ = ∞}
lim_{n→∞} X_n/(1 + A_n) = lim_{n→∞} X_n/A_n = 0.
□
3.6 Doob's submartingale inequality
A₀ = {X₀ ≥ ε} ∈ A₀,
A_k = {X_k ≥ ε} ∩ (∪_{j=0}^{k-1} A_j)^c ∈ A_k.
E[X_n; A_k] ≥ E[X_k; A_k] ≥ ε P[A_k].
For given ε > 0, we get the best estimate for δ = ε/n and obtain
(ii) Given K > 1 (close to 1), choose ε_n = KΛ(K^{n-1}). The last inequality in (i) gives
The Borel-Cantelli lemma assures that for large enough n and K^{n-1} ≤ k ≤ Kⁿ,
S_k ≤ sup_{l ≤ Kⁿ} S_l ≤ ε_n = KΛ(K^{n-1}) ≤ KΛ(k),
so that
limsup_{k→∞} S_k/Λ(k) ≤ K.
Since K > 1 was arbitrary,
limsup_{k→∞} S_k/Λ(k) ≤ 1.
(iii) Given N > 1 (large) and δ > 0 (small), define the independent sets A_n. Then
P[A_n] = 1 - Φ(y) ≥ (2π)^{-1/2} (y + y⁻¹)⁻¹ e^{-y²/2}
with y = (1 - δ)(2 log log(Nⁿ - N^{n-1}))^{1/2}. Since P[A_n] is up to logarithmic terms equal to (n log N)^{-(1-δ)²}, we have Σ_n P[A_n] = ∞. Borel-Cantelli shows that P[limsup_n A_n] = 1, so that
By (ii), S(Nⁿ) ≥ -2Λ(Nⁿ) for large n, so that for infinitely many n we have
S(N^{n+1}) ≥ (1 - δ)Λ(N^{n+1} - Nⁿ) - 2Λ(Nⁿ).
It follows that
ε P[|X| ≥ ε] ≤ E[|Y|; |X| ≥ ε]
||X*||_p ≤ q · sup_n ||X_n||_p.
X_∞ = lim_{n→∞} X_n
exists in L^p and ||X_∞||_p = sup_n ||X_n||_p.
Example. This example is a primitive model for the stock and bond market. Given real numbers a < r < b < ∞, define p = (r - a)/(b - a). Let ε_n be IID random variables taking the values 1, -1 with probability p respectively 1-p. Define a process B_n (bonds with fixed interest rate r) and S_n (stocks with fluctuating interest rates) by
B_n = (1 + r) B_{n-1}, B₀ = 1,
S_n = (1 + R_n) S_{n-1}, S₀ = 1,
with R_n = (a + b)/2 + ε_n(b - a)/2. Given a sequence A_n (the portfolio), your fortune is X_n and satisfies
X_n = (1 + r) X_{n-1} + A_n S_{n-1}(R_n - r).
We can write R_n - r = (1/2)(b - a)(Z_n - Z_{n-1}) with the martingale
Z_n = Σ_{k=1}^n (ε_k - 2p + 1).
Y_n - Y_{n-1} = (1 + r)^{-n} A_n S_{n-1}(R_n - r)
= (1/2)(b - a)(1 + r)^{-n} A_n S_{n-1}(Z_n - Z_{n-1})
= C_n(Z_n - Z_{n-1}).
M_n = E[X|A_n].
This implies that X = M_∞ if and only if Σ_n σ_n⁻² = ∞. If the noise grows too fast, for example for σ_n = n, then we cannot recover X from the observations Y_k.
E[B] = Σ_{n=0}^∞ P[S_n = 0]
Theorem 3.8.1 (Polya). E[B] = ∞ for d = 1,2 and E[B] < ∞ for d ≥ 3.
Proof. Fix n ∈ N and define a⁽ⁿ⁾(k) = P[S_n = k] for k ∈ Z^d. Because the particle can reach in time n only a bounded region, the function a⁽ⁿ⁾ : Z^d → R is zero outside a bounded set. We can therefore define its Fourier transform
φ_{S_n}(x) = Σ_{k∈Z^d} a⁽ⁿ⁾(k) e^{2πik·x},
which is a smooth function on T^d = R^d/Z^d. It is the characteristic function of S_n because
E[e^{2πix·S_n}] = Σ_{k∈Z^d} P[S_n = k] e^{2πik·x}.
The characteristic function φ_X of the steps X_k is
φ_X(x) = (1/d) Σ_{i=1}^d cos(2πx_i).
Because S_n is a sum of n independent random variables X_j,
φ_{S_n} = φ_{X_1}(x) φ_{X_2}(x) ··· φ_{X_n}(x) = ((1/d) Σ_{i=1}^d cos(2πx_i))ⁿ.
We now show that E[B] = Σ_{n≥0} ∫_{T^d} φ_{S_n}(x) dx is finite if and only if d ≥ 3. The Fourier inversion formula reduces this to the question whether the integral
∫_{|x|≤ε} dx/|x|²
over the ball of radius ε in R^d is finite, which is the case if and only if d ≥ 3. □
Corollary 3.8.2. The particle returns to the origin infinitely often almost surely if d ≤ 2. For d ≥ 3, almost surely, the particle returns only finitely many times to zero and P[lim_{n→∞} |S_n| = ∞] = 1.
Proof. If d ≥ 3, then A_∞ = limsup_n A_n is the subset of Ω for which the particle returns to 0 infinitely many times. Since E[B] = Σ_{n=0}^∞ P[A_n], the Borel-Cantelli lemma gives P[A_∞] = 0 for d ≥ 3. The particle returns therefore back to 0 only finitely many times, and in the same way it visits each lattice point only finitely many times. This means that the particle eventually leaves every bounded set and converges to infinity.
If d ≤ 2, let p be the probability that the random walk returns to 0:
p = P[∪_{n≥1} {S_n = 0}].
Then p^{m-1} is the probability that there are at least m visits to 0, and the probability is p^{m-1} - p^m = p^{m-1}(1 - p) that there are exactly m visits. We can write
E[B] = Σ_{m≥1} m p^{m-1}(1 - p) = 1/(1 - p).
Because E[B] = ∞, we know that p = 1. □
The use of characteristic functions also allows one to solve combinatorial problems, like counting the number of closed paths starting at zero in the lattice:
∫_{T^d} φ_{S_n}(x) dx = ∫_{T^d} ((1/d) Σ_{k=1}^d cos(2πx_k))ⁿ dx = P[S_n = 0].
For d = 1, this gives
∫_0^1 2^{2n} cos^{2n}(2πx) dx = (2n choose n),
the number of closed paths of length 2n starting at 0. We know that also because
P[S_{2n} = 0] = (2n choose n) 2^{-2n}.
∫_{T^d} 1/(1 - φ_X(x)) dx = ∞,
but unlike for the usual random walk, where Sn(x) grows like y/n, one sees
a much slower growth 5„(0) < log(n)2 for almost all a and for special
numbers like the golden ratio (\/5 + l)/2 or the silver ratio y/2 + 1 one has
for infinitely many n the relation
with a = 1/(2 log(l + \/2)). It is not known whether Sn{0) grows like log(n)
for almost all a.
Proof. The number of paths from a to b passing zero is equal to the number
of paths from —a to b which in turn is the number of paths from zero to
a + b. □
168 Chapter 3. Discrete Stochastic Processes
Theorem 3.9.2 (Ruin time). We have the following distribution of the stop
ping time:
a) P[T_a < n] = P[Sn < -a] + P[Sn > a].
b)P[T-a = n] = £P[Sn = a].
P[T_a<n] = ^2P[T-a<n,a-rSn = b]
bez
= ^P[a + Sn = &] + ^P[T_a<n,a + Sn = 6]
P[Sn<-a]+P[Sn>a]
b) From
P[S» = a]=( Jjk )
we get
^P[5n
n = a] = \(P[Sn-i = a - 1] - P[5n_! = a + 1]) .
Also
Therefore, using a)
P[T_a = n] = P[T_Q < n] - P[T_a < n - 1]
= P[5„ < -a] - P[5„_i < -a]
P[5n > a] - P[5„_i > a]
= ^(P^n-^al-P^n-^O+l])
^(PlSn-^a-lJ-P^n-^o])
= 5(P[5„_i = a - 1] - P[5„_i = a + 1]) = -P[5n = a]
* n
D
P[T.a = n] =n -P[5n=a].
Proof
Remark. We see that limn_+ooP[To > 2n] = 0. This restates that the
random walk is recurrent. However, the expected return time is very long:
oo oo oo
22n/v/7rn and so
P[^ = 0]=(^)^~(-nr1/a
which describes the last visit of the random walk in 0 before time 2N. If
the random walk describes a game between two players, who play over a
time 2iV, then L is the time when one of the two players does no more give
up his leadership.
Theorem 3.9.5 (Arc Sin law). L has the discrete arc-sin distribution:
2N -2n
p[^»]=^(2;)( N - n
Proof.
which gives the first formula. The Stirling formula gives P[S2k = 0] ~ ^
so that ,
with 1
fix) = , n , ■
7Ty/x(l - X)
3.10. The random walk on the free group 171
It follows that
L fz 2
y[wz?
2 N <z]-+ f(x)
J 0 dx = - n arcsin(x/i) .
Remark. From the shape of the arc-sin distribution, one has to expect that
the winner takes the final leading position either early or late.
A = {a1,a2,...,ad,a11,a21,...,ad1 }
172 Chapter 3. Discrete Stochastic Processes
modulo the identifications aia~ = a~ ai = 1. The group operation is
concatenating words vow = vw. The inverse of w = W\W2 • • • wn is w~1 =
wn1 "'w2lwi1' Elements w in the group Fd can be uniquely represented
by reduced words obtained by deleting all words vv~l in w. The identity
e in the group Fd is the empty word. We denote by l(w) the length of the
reduced word of w.
+■■+
n=0 n=0
oo
= z2 z2 'p[Sn1=e,Sn2=e,...,Snk=e,
n=0 0<ni<n2<---<nk
D
Remark. This lemma is true for the random walk on a Cayley graph of any
finitely presented group.
The numbers r2n+i are zero for odd 2n+1 because an even number of steps
are needed to come back. The values of r2n can be computed by using basic
combinatorics:
^=(2^^ »-"l2)M(2d-ir
Proof We have
To count the number of such words, map every word with 2n letters into
a path in Z2 going from (0,0) to (n, n) which is away from the diagonal
except at the beginning or the end. The map is constructed in the following
way: for every letter, we record a horizontal or vertical step of length 1.
If l(wk) = l(wk~x) + 1, we record a horizontal step. In the other case, if
l(wk) = l(wk~l) - 1, we record a vertical step. The first step is horizontal
independent of the word. There are
If 2n-2
n\ ra-1
174 Chapter 3. Discrete Stochastic Processes
such paths since by the distribution of the stopping time in the one dimen
sional random walk
2d-l
m(x)
(d - 1) + yjd? - (2d - l)x2
r ,W
( x, ) d-^/d2- (2d - l)x2
2d -= 1 —
and get the claim with Feller's lemma m(x) = 1/(1 - r(x)). □
Remark. The Cayley graph of the free group is also called the Bethe lattice.
One can read of from this formula that the spectrum of the free Laplacian
L : l2(Fd) -> l2(Fd) on the Bethe lattice given by
Lu(d) = J2u(q+a)
aeA
Corollary 3.10.4. The random walk on the free group Fd with d generators
is recurrent if and only if d = 1.
Proof. Denote as in the case of the random walk on Zd with B the random
variable counting the total number of visits of the origin. We have then
again E[B] = £n P[5n = e] = £n mn = m(l). We see that for d = 1 we
3 . 11 . T h e f r e e L a p l a c i a n o n a d i s c r e t e g r o u p 1 7 5
have m(l) = oo and that m(d) < oo for d > 1. This establishes the analog
of Polya's result on Zd and leads in the same way to the recurrence:
(i) d = 1: We know that Zi = Fu and that the walk in Z1 is recurrent.
(ii) d > 2: define the event An = {Sn = e). Then A^ = limsupn An is the
subset of fi, for which the walk returns to e infinitely many times. Since
for d > 2,
oo
E[£] = IZP^n]m(d)<oc,
The Borel-Cantelli lemma gives P[Ax>] = 0 for d > 2. The particle returns
therefore to 0 only finitely many times and similarly it visits each vertex in
Fd only finitely many times. This means that the particle eventually leaves
e v e r y b o u n d e d s e t a n d e s c a p e s t o i n f i n i t y. □
Remark. We could say that the problem of the random walk on a discrete
group G is solvable if one can give an algebraic formula for the function
m(x). We have seen that the classes of Abelian finitely generated and free
groups are solvable. Trying to extend the class of solvable random walks
seems to be an interesting problem. It would also be interesting to know,
whether there exists a group such that the function m(x) is transcendental.
Definition. The free Laplacian for the random walk given by (G,A,p) is
the linear operator on l2(G) defined by
Lgh = Pg-h •
Since we assumed pa =pa-i, the matrix L is symmetric: Lgh = Lhg and
the spectrum
0 p
p 0 p
p 0 p
L =
p 0 p
p 0 p
p 0
is also called a Jacobi matrix. It acts on the Hilbert space l2(Z) by (Lu)n =
p(un+i +un-i).
Example. Let G = £>3 be the dihedral group which has the presentation
G = (a,b\a3 = b2 = (ab)2 = 1). The group is the symmetry group of the
equilateral triangle. It has 6 elements and it is the smallest non-Abelian
group. Let us number the group elements with integers {1,2 = a, 3 =
a2,4 = 6,5 = a&,6 = o?b }. We have for example 3 • 4 = o?b = 6 or
3*5 = a2ab = o?b = b = 4. In this case A = {a, &}, A'1 = {a-1, b} so that
A U A'1 = {a, a"1, b}. The Cayley graph of the group is a graph with 6
vertices. We could take the uniform distribution pa = Pb = Pa-1 = I/3 on
A\JA~1, but lets instead chose the distribution pa = pa~l = ^-I^Pb = V2>
which is natural if we consider multiplication by b and multiplication by
6-1 as different.
Example. The free Laplacian on D3 with the random walk transition prob
abilities pa = Pa-1 = ^/^Pb = 1/2 is the matrix
A basic question is: what is the relation between the spectrum of L, the
structure of the group G and the properties of the random walk on GL
and since the > direction is trivial we have only to show that < direction.
Denote by E(A) the spectral projection matrix of L, so that dE(X) is a
projection-valued measure on the spectrum and the spectral theorem says
that L can be written as L = J A dE(X). The measure pe = dEee is called
a spectral measure of L. The real number E(X) — E(p) is nonzero if and
only if there exists some spectrum of L in the interval [A, p). Since
(-i)
£l^ = f'{E - X)-Uk(E)
178 Chapter 3. Discrete Stochastic Processes
can't be analytic in A in a point Ao of the support of dk which is the
spectrum of Z,, the claim follows. □
(ULU*)u(x) = ((UL)(un)(x)=pU(un+i+un-1)(x)
= p]P(un+i +un-i)eVl
= p]Tun(e^n-1)x + e'(n+1)x)
= p Y, u n ( e i x + e - i x ) e i n x
nez
= p^un2cos(x)einx
nez
= 2pcos(x) • u(x) .
Example. G = Fd the free group with the natural d generators. The spec
trum of L is
[ d~~' d J
which is strictly contained in [—1,1] if d > 1.
Remark. Kesten has shown that the spectral radius of L is equal to 1 if and
only if the group G has an invariant mean. For example, for a finite graph,
where L is a stochastic matrix, for which each column is a probability
vector, the spectral radius is 1 because LT has the eigenvector (1,..., 1)
with eigenvalue 1.
3.12. A discrete Feynman-Kac formula 179
Random walks and Laplacian can be defined on any graph. The spectrum
of the Laplacian on a finite graph is an invariant of the graph but there are
non-isomorphic graphs with the same spectrum. There are known infinite
self-similar graphs, for which the Laplacian has pure point spectrum [63].
There are also known infinite graphs, such that the Laplacian has purely
singular continuous spectrum [95]. For more on spectral theory on graphs,
start with [6].
ut = e~iLu0 .
The solution exists for all times because the von Neumann series
JL t2L2 t3L3
ut+ne = (1 - ieL)nut
f n
exp( / L) = f[L7Wj7(i+i) .
Ji t=i
Let Q is the set of all paths on G and E denotes the expectation with
respect to a measure P of the random walk on G starting at 0.
(Lnu)(0) = Eo[exp(rL)ti(7(n))].
Jo
Proof.
(L"u)(0) = $>")(,;"(;)
= E E eM[nL)u(j)
nn
= Yl exP( / LM7(™)) .
iern
□
which is a Cayley transform of L. See also [50], where the idea is disussed
to use L = arccos(aL'), where L has been rescaled such that ah has norm
smaller or equal to 1. The time evolution can then be computed by iterating
the map A : (ip, (j)) h-> (2aL^ - 0, ip) on H 0 H.
1 d
(Au)(n) = — ^2(u(n + a) + u(n - a) - 2u(n)) ,
i=l
182 Chapter 3. Discrete Stochastic Processes
where ei,..., e<* is the standard basis in Zd.
Definition. Let ilXin denote the set of all paths of length n in D which start
at a point x G D and end up at a point in the boundary SD. It is a subset
of T^n, the set of all paths of length n in Zd starting at x. Lets call it the
discrete Wiener space of order n defined by x and D. It is a subset of the
set TXin which has 2dn elements. We take the uniform distribution on this
finite set so that P*,n[{7}] = l/2dn.
Definition. Let L be the matrix for which LXiV = If (2d) if x,y G Zd are
connected by a path and x is in the interior of D. The matrix L is a bounded
linear operator on l2(D) and satisfies Lx,z = Lz,x for x,zG int(D) = D\5D.
Given / : SD -> R, we extend / to a function F(x) = 0 on / D = D\SD
and F(x) = f(x) for x G SD. The discrete Dirichlet problem can be restated
as the problem to find the solution u to the system of linear equations
(1-L)u = f.
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
000 0 000000
10 10 10 0 10 0
0 10 10 10 0 10
4L =
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 1 0 0 11
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
E*,n[/] = E f(y)LZv
yeSD
and
oo
Ex[/] = £>*.»[/]•
n=0
This functional defines for every point x G D a probability measure /ix on
the boundary SD. It is the discrete analog of the harmonic measure in the
continuum. The measure Px on the set of paths satisfies Ex[l] = 1 as we
will just see.
u(x)=Ex[f(ST)].
E*,n[/] = (Lnf)x ,
184 Chapter 3. Discrete Stochastic Processes
we have from the geometric series formula
( l - ^ r 1k=0^ ^
the result
oo oo
The path integral result can be generalized and the increased generality
makes it even simpler to describe:
u = Ex[f(ST)]
is the expected value of St on the discrete Wiener space of all paths starting
at x and ending at the boundary of D.
ELnf
n=0
But this is the sum Ex[f(Sr)] over all paths 7 starting at x and ending at
the boundary of /. □
Example. Lets look at a directed graph (D, E) with 5 vertices and 2 bound
ary points. The Laplacian on D is defined by the stochastic matrix
0 1/3 0 0 0"
1/2 0 10 0
K = 1/4 1/2 0 0 0
1/8 1/6 0 10
1/8 0 0 0 1
or the Laplacian
Theorem 3.14.1 (Markov processes exist). For any state space (S,B) and
any transition probability function P, there exists a corresponding Markov
process X.
Pij = p(i,{j})-
The matrix P transports the law of Xn into the law of Xn+\.
The transition matrix Pij is a stochastic matrix: each column is a proba
bility vector: ^. P^ = 1 with P^- > 0. Every measure on S can be given
by a vector n G l2(S) and Pn is again a measure. If Xo is constant and
equal to i and Xn is a Markov process with transition probability P, then
pn=P[Xn = j}.
n=0
Vn= j P(x,B)f(x)dv(x
Vh = h
Continuous Stochastic
Processes
191
192 Chapter 4. Continuous Stochastic Processes
for some nonsingular symmetric nxn matrix V and vector m = E[X]. The
matrix V is called covariance matrix and the vector m is called the mean
vector.
X(X) = 1t p-{x-m,V-\x-m))l2
(27r)"/2^t(l0
on Q = Rn is a Gaussian random variable with covariance matrix V. To
see that it has the required multidimensional characteristic function (j>x (u).
Note that because V is symmetric, one can diagonalize it. Therefore, the
computation can be done in a bases, where V is diagonal. This reduces the
situation to characteristic functions for normal random variables.
Proof. We can assume without loss of generality that the random variables
X, Y are centered. Two Rn-valued Gaussian random vectors X and Y are
independent if and only if
U Cov[X,y] U 0
U=
C o v [ y, x ] v 0 V
is the covariance matrix of the random vector (X, Y). With r = (£, s), we
have therefore
*(*,y)(r) = E[e^^]=e-^^
= e-Hs-V8)-±(t-Wt)
= e - i ( s - Va ) e - i ( t - W t )
= <l>x(s)<l>Y(t) •
Example. In the context of this lemma, one should mention that there
exist uncorrelated normal distributed random variables X, Y which are not
independent [109]: Proof. Let X be Gaussian on R and define for a > 0 the
variable Y(uj) — —X(uj), if uj > a and Y = X else. Also Y is Gaussian and
there exists a such that E[X7] =0. But X and Y are not independent and
X+Y = 0 on [—a, a] shows that X+Y is not Gaussian. This example shows
why Gaussian vectors (X,Y) are defined directly as R2 valued random
variables with some properties and not as a vector (X, Y) where each of
the two component is a one-dimensional random Gaussian variable.
Proof By the above lemma (4.1.1), we only have to check that for all i < j
Proposition 4.1.3. Given a separable real Hilbert space (H, || • ||). There
exists a probability space (0, A P) and a family X(h), h G H of real-valued
random variables on Q such that h*-> X(h) is linear, and X(h) is Gaussian,
centered and E[X(/i)2] =
X(h) = }hnXn
n
Especially X(A) and X(B) are independent if and only if A and £ are
disjoint.
Definition. Define the process Bt = X([0, £]). For any sequence ti,t2, • • • G
T, this process has independent increments Bti - Bu_x and is a Gaussian
process. For each t, we have E[B2] = t and for s <t, the increment Bt - Bs
has variance t — s so that
E[\Xt+h-Xt\p]<K-h^r
so that
P[|X(fc+1)/2- - Xfc/2»| > 2-™} < K2-"(1+£) .
196 Chapter 4. Continuous Stochastic Processes
Therefore
oo 2n-l
By the first Borel-Cantelli's lemma (2.2.2), there exists n(uj) < oo almost
everywhere such that for all n > n(uj) and k = 0,..., 2n — 1
Let n > n(uj) and t G [fc/2n, (fc+l)/2n] of the form * = k/2n+Y7=i 7z/2n_N
with 7* G {0,1}. Then
m
i=l
For almost all a;, this holds for sufficiently small /i.
We know now that for almost all uj, the path Xt(uj) is uniformly continuous
on the dense set of dyadic numbers D = {fc/2n}. Such a function can be
extended to a continuous function on [0,1] by defining
Yt(uj)= seD^t
lim Xs(uj).
\Yt(uj)-Ys(uj)\<C(uj)\t-s\a
In McKean's book "Stochastic integrals" [66] one can find Levy's direct
proof of the existence of Brownian motion. Because that proof gives an ex
plicit formula for the Brownian motion process Bt and is so constructive,
we outline it shortly:
2) Take a family Xfc?n for (fc,n) G I = {(fc,n) | n > l,fc < 2n,fc odd } U
{(0,0) } of independent Gaussian random variables.
3) Define
Bt = 22 ^k>n / ^fc'n '
(k,n)ei Jo
4) Prove convergence of the above series.
5) Check
Proposition 4.2.1. Brownian motion is unique in the sense that two stan
dard Brownian motions are indistinguishable.
Proof. The construction of the map H -▶ C2 was unique in the sense that
if we construct two different processes X(ft) and Y(h), then there exists an
isomorphism U of the probability space such that X(ft) = Y(U(h)). The
continuity of Xt and Yt implies then that for almost all uj, Xt(uj) = Yt(Uuj).
In other words, they are indistinguishable. D
almost surely.
Proof. From the time inversion property (iv), we see that t 1Bt, = Pi/t
which converges for t -> oo to 0 almost everywhere, because of the almost
everywhere continuity of Bt. ^J
\\Xt+h-Xt\\<C.ha
for all ft > 0 and all t. A curve which is Holder continuous of order a = 1
is called Lipshitz continuous.
The curve is called locally Holder continuous of order a if there exists for
each t a constant C = C(t) such that
\\Xt+h-Xt\\<C.ha
for all small enough ft. For a Revalued stochastic process, (local) Holder
continuity holds if for almost all uj G ft the sample path Xt(uj) is (local)
Holder continuous for almost all uj G ft.
Proposition 4.2.4. For every a < 1/2, Brownian motion has a modification
which is locally Holder continuous of order a.
200 Chapter 4. Continuous Stochastic Processes
for some constant Cp. Kolmogorov's lemma assures the existence of a mod
ification satisfying locally
\Bt-Bs\ <C\t-s\a,0<a<^^ .
2p
Because of this proposition, we can assume from now on that all the paths
of Brownian motion are locally Holder continuous of order a < 1/2.
\Bj/n-B{j_1)/n\ <7-
pm u n {i*j/»-*o-D/»i<7i}]
n>m 0<2<n+l i<j<i+3
□
Remark. This proposition shows especially that we have no Lipshitz con
tinuity of Brownian paths. A slight generalization shows that Brownian
motion is not Holder continuous for any a > 1/2. One has just to do the
same trick with fc instead of 3 steps, where k(a — 1/2) > 1. The actual
modulus of continuity is very near to a = 1/2: \Bt — Bt+e\ is of the order
h(e) = ^2elog(-e).
Proof. The first statement follows from the fact that for all u = (u\,..., un)
n
Q^aA^X^A) = ^fli6/(tt,tj) •
i=l j=l ij
202 Chapter 4. Continuous Stochastic Processes
This is a positive semidefinite inner product. Multiplying out the null vec
tors {||v|| = 0 } and doing a completion gives a separable Hilbert space
H. Define now as in the construction of Brownian motion the process
Xt = X(5t). Because the map X : H -» C2 preserves the inner product, we
have
E[XuXs] = (5s,5t) = V(s,t).
□
J_ f(k2 + l)-leik(t-sUk=le-\t-s\,
27T JR *
and so
j,k=l
Ot = -7f~'^ •
Xt = 2~1/2e-tBe2t .
4.2. Some properties of Brownian motion 203
Since E[X2] = 0, we have X\ = 0 which means that all paths start from 0
at time 0 and end at 1 at time 1.
The realization Xt = B8 — sB\ shows also that Xt has a continuous real
ization.
204 Chapter 4. Continuous Stochastic Processes
Let Xt be the Brownian bridge and let y be a point in Rd. We can consider
the Gaussian process Yt = ty + Xt which describes paths going from 0 at
time 0 to y at time 1. The process Y has however no more zero mean.
Brownian motion B and Brownian bridge X are related to each other by
the formulas:
Bt = Bt := (t + l)Xt/(£+1), Xt = Xt := (1 - t)Bt/{1.t) .
These identities follow from the fact that both are continuous centered
Gaussian processes with the right covariance:
ti,...,t„
One usually does not work with the coordinate process but prefers to work
with processes which have some continuity properties. Many processes have
versions which are right continuous and have left hand limits at every point.
Definition. Let Dbea measurable subset of ET and assume the process has
a version X such that almost all paths X(uj) are in D. Define the probability
space (D,£T n D,Q), where Q is the measure Q = (j)*P. Obviously, the
process Y defined on (D, ST n £>, Q) is another version of X. If D is right
continuous with left hand limits, the process is called the canonical version
of X.
Proof. Let D = C(T,E) C ET. Define the measure W = (j)*Px and let
Y be the coordinate process of B. Uniqueness: assume we have two such
measures W, W and let Y, Y' be the coordinate processes of B on D with
respect to W and W. Since both Y and Yf are versions of X and "being
a version" is an equivalence relation, they are also versions of each other.
This means that W and W coincide on a n- system and are therefore the
same. □
Lemma 4.4.1.
ie-2/2> / e-xV2dx> « e, - a 2 / 2
Proof.
/ e-2/2 dx < / e"* /2(x/a) dx = Va /2 .
Ie-aV2
a J
_ f°
a
e-xV2
a
dx<\r
J
e-**l2 dX .
a
D
208 Chapter 4. Continuous Stochastic Processes
P[An] = P[ l<fc<2n
max |J3fc2-n - B(fc_i)2-»| < an] .
P[lim
n - +max
o o \Bk2-n
l < / c <-2 n
J5(fc_1)2n|
v > h(2~n)] = 1 .
P [ A n ] = P [ k=j—i£K
max \Bj2-n-Bi2-n\/h(k2-n)>(l + e)}
where
K = {0 < fc < 2nS }
. keK
keK
< c . 2-n(i-«5)(i+e)2 ^P(log(Ar12n)r1/2 ( since fc-1 > 2~nS)
k€K
< C ■ n-1/22Tl(5_(1_,s)(1+e)2) •
In the last step was used that there are at most 2nS points in K and for
each of them logCAr^) > log(2"(l - 5)).
We see that £n P[An} converges. By Borel-Cantelli we get for almost every
uj an integer ti(oj) such that for n > n(ui)
\Bj2-«-Bi2-n\<(l + e)-h(k2-n),
where k = j-ieK. Increase possibly n{uS) so that for n > n(w)
Y,h(2-m)<e-h(2-(n+1){l-S)).
m>n
Pick 0 < h < h < 1 such that t = t2 - h < 2-"(^1-*>. Take next
n > n(w) such that 2^n+1^-s) < t < 2~n^ and write the dyadic
development of ti, t2\
h = i2-n _ 2-Pl _ 2-P2 _^M= j2-» + 2~^ + 2"* ...
with h < i2~n < j2~n < t2 and 0 < fc = j - i < t2n < 2n5. We get
B e c a u s e e > 0 w a s a r b i t r a r y, t h e p r o o f i s c o m p l e t e . □
TA(uj) = mf{t>0\Xt(uj)eA}
is a stopping time relative to the filtration At = a({Xs }5<t).
{{TA<t}
TA < t } == {{ s inf
m ^ t dd(X.
(Xs(uj)1A)=0}
Proof. TA is a At+ stopping time if and only if {TA < t} € At for all t.
If A is open and X,(w) G A, we know by the right-continuity of the paths
that Xt(w) G A for every t € [s, s + e) for some e > 0. Therefore
{TA
1 <t}
J = s {6 Q
inf, s Xs
<t G A } G A •
Xt(uj) = XT{u>){u>) ■
Remark. We have met this definition already in the case of discrete time
but in the present situation, it is not clear whether XT is measurable. It
turns out that this is true for many processes.
Proof. Assume right continuity (the argument is similar in the case of left
continuity). Write X as the coordinate process D([0, t],E). Denote the map
(s to) ^ Xs(u>) with Y = Y(s,u). Given a closed ball U G £. We have to
show that Y-X(U) = {(s,w) | Y(s,oj) G U} G B([0,t}) x At- Given k = N,
we define E0,i/ = 0 and inductively for fc > 1 the fc'th hitting time (a
stopping time)
Proof. The set {T < oo} is itself in AT. To say that XT is AT- measurable
on this set is equivalent with XT • l{T<t} G A for every t. But the map
u= / G(x,y)-v(y) dy .
Jd
The Green function can be computed using Brownian motion as follows:
/»oo
G(x,y) = / g(t,x,y)dt,
Jo
where for x G D,
Because of this problem, one has to modify the question and one says, u is
a solution of a modified Dirichlet problem, if u satisfies ADu = 0 inside D
and limx^yjXeD u(x) = f(y) for all nonsingular points y in the boundary
SD. Irregularity of a point y can be defined analytically but it is equivalent
with Py[TDc > 0] = 1, which means that almost every Brownian particle
starting at y G SD will return to SD after positive time.
In words, the solution u(x) of the Dirichlet problem is the expected value
of the boundary function / at the exit point BT of Brownian motion Bt
starting at x. We have seen in the previous chapter that the discretized
version of this result on a graph is quite easy to prove.
E[B?-t\As}-(B2s-s) = E[B?-B2a\As}-(t-s)
= E[(Bt-Ba)2\As}-(t-s)=0
216 Chapter 4. Continuous Stochastic Processes
Since Brownian motion begins at any time s new, we have
E[\Xt\p\As]>\E[Xt\As]\p = \Xs\p.
□
*i = $>»<«.
k=i
It is an example of a martingale which is not continuous, This process
takes values in N and measures, how many jumps are necessary to reach
t. Since E[Nt] = ct, it follows that Nt — ct is a martingale with respect to
the filtration At = <r(N8, s < t). It is a right continuous process. We know
therefore that it is progressively measurable and that for each stopping
time T, also NT is progressively measurable. See [49] or the last chapter
for more information about Poisson processes.
Nt-Ns = ^2 1s<sn<t •
n=l
m - fc] - fc!{^-tc
Proof The proof is done by starting with a Poisson distributed process Nt.
Define then
Sn(uj) = {t\Nt = n,Nt-o = n - 1 }
and show that Xn = Sn - Sn-i are independent random variables with
exponential distribution. ^
Remark. Poisson processes on the lattice Zd are also called Brownian mo
tion on the lattice and can be used to describe Feynman-Kac formulas for
discrete Schrodinger operators. The process is defined as follows: take Xt
as above and define oo
Yt = 22Zklsk<t ,
k=l
where Zn are IID random variables taking values in {ra G Zd\\m\ = 1}.
This means that a particle stays at a lattice site for an exponential time
and jumps then to one of the neighbors of n with equal probability. Let
Pn be the analog of the Wiener measure on right continuous paths on the
lattice and denote with En the expectation. The Feynman-Kac formula for
discrete Schrodinger operators H = H0 + V is
|supX*|| <q-sup\\Xt\\p .
t€D teD
The following inequality measures, how big is the probability that one-
dimensional Brownian motion will leave the cone {(t, x), \x\ < a • t}.
Proof.
= P[ sup (eQBs"V)>e^]
*€[0,1]
= P[ sup Ms > e0a]
s6[0,l]
< e^a sup E[MS] = e_/3a
«€[0,1]
A(t) = V^logllog*!
P[limsup-^-
L t - > o =A (1]t ) = J 1,' P[liminf-^-
L t - o A =( t -1]
) J= 1
220 Chapter 4. Continuous Stochastic Processes
Proof. The second statement follows from the first by changing Bt to -Bt.
an = (i + 8)6-nA(en), pn = ^p-.
which means that for almost every u, there is n0(w) such that for n > n0(ui)
andse [O,^""1),
/ 2 _ ^ _ . fl r~a2/2
'[An] = / e-' ^ a' + l*
with a = (1 - v/0)A(0n) < Kn"a with some constants K and a < 1.
Therefore En1^"] = °° and by the second Borel-Cantelli lemma,
for infinitely many n. Since -B is also Brownian motion, we know from (i)
that
-B*.+i < 2A(0n+1) (4-2)
for sufficiently large n. Using these two inequalities (4.1) and (4.2) and
A(0"+1) < 2v/0A(0n) for large enough n, we get
Remark. This statement shows also that Bt changes sign infinitely often
for t —* 0 and that Brownian motion is recurrent in one dimension. One
could show more, namely that the set {Bt = 0 } is a nonempty perfect set
with Hausdorff dimension 1/2 which is in particularly uncountable.
By time inversion, one gets the law of iterated logarithm near infinity:
Corollary 4.8.2.
Bs B\is
1 = lim sup . , . = lim sups . , ■
s_+o A(s) s^o A(s)
= lim sup = lim sup ■
T£*t\(i/t) ;r^A(i)"
The other statement follows again by reflection. □
Ppimsup-^-f
t — 0 A ( t ) = 1] = 1.
Since Bt • e < \Bt\, we know that the limsup is > 1. This is true for all
unit vectors and we can even get it simultaneously for a dense set {en}n€N
222 Chapter 4. Continuous Stochastic Processes
of unit vectors in the unit sphere. Assume the limsup is 1 + e > 1. Then,
there exists en such that
r(n)M = Ei/(M)(rM)^
which is again a stopping time. Define also:
AT + = { A e A o o \ A n { T < t } e A u V t } .
The next theorem tells that Brownian motion starts afresh at stopping
times.
Proof. Let A be the set {T < oo}. The theorem says that for every function
with g e C(Rn)
E[f(Bt)lA]=E[f(Bt)]'P[A]
and that for every set C G At+
This two statements are equivalent to the statement that for every C G At+
E[/(Bt)Unc] = lim
n—>ooE[f(BTin))lAnnc]
00
= lim y>[/(£fe/2")Un,fcnd
n—»oo ^—'
fc=0
00
= lim VE[/(B0)]-PK,fenC]
n.—▶n'o'^
n—»oo
fc=0
= E[/(B0)Unc]
= E\f(B0)]-P[AnC]
= E[/(J5t)]-P[AHC]
Theorem 4.9.2 (Blumental's zero-one law). For every set A G Ao+ we have
P[A] = 0 or P[A] = 1.
Remark. This zero-one law can be used to define regular points on the
boundary of a domain D eRd. Given a point y G SD. We say it is regular,
if Py[T5D > 0] = 0 and irregular Py[TSD > 0] = 1. This definition turns
out to be equivalent to the classical definition in potential theory: a point
y G SD is irregular if and only if there exists a barrier function / : N —▶ R
in a neighborhood Nofy.A barrier function is defined as a negative sub-
harmonic function on int(JV n D) satisfying f(x) -> 0 for x -> y within
D.
Remark. Kakutani, Dvoretsky and Erdos have shown that for d > 3, there
are no self-intersections with probability 1. It is known that for d < 2, there
are infinitely many n-fold points and for d > 3, there are no triple points.
= h(y + x) dp(x) ,
where p is the normalized Lebesgue measure on 5,5. This equality for small
e n o u g h S i s t h e d e f i n i t i o n o f h a r m o n i c i t y. □
Proof, (i) We show the claim first for one ball K = Br(z) and let R = \z-y\.
By Brownian scaling Bt~c- Bt/C2. The hitting probability of K can only
be a function f(r/R) of r/R:
by (i). □
Definition. Let p be a probability measure on R3. Define the potential
theoretical energy of p as
/x€M(K)
Proposition 4.10.4. For every compact set K C Rd, there exists an equilib
rium measure p on K and the equilibrium potential / \x - y|~(d~2) dp(y)
rsp. /logflx - y\~l) dp(y) takes the value C(K)'1 on the support K* of
p.
226 Chapter 4. Continuous Stochastic Processes
h(y,K) = <t>^C(K)
and therefore h(y, K) > C(K) • inf^^ \x - y\~l.
Proof. We have to find a probability measure p(u) on B^uj) such that its
energy I(p(w)) is finite almost everywhere. Define such a measure by
{gG[a,b]15,GA}
<W) = l (6^) 1-
Then
To see the claim we have to show that this is finite almost everywhere, we
integrate over fl which is by Fubini
They are independent and by the strong law of large numbers YLn X™ = °°
almost everywhere. '-'
22** Chapter 4. Continuous Stochastic Processes
Proof. Again, we have only to treat the three dimensional case. Let T > 0
be such that
aT = P[ |J Bt = B8]>0
t€[0,l],2<T
in the proof of the theorem. By scaling,
P[^ = Bs|tG[0,^],5G[2/3,T/?]]
is independent of /3. We have thus self-intersections of the random walk in
any interval [0, b] and by translation in any interval [a, b]. □
Rt(u)Bt(uj) = (\Bt(uj)\,0,...Q).
Then Bt = RT(Bt-\-T — Bt) is again Brownian motion.
Proof. Both h(y, Kr) and (r/\y\)d~2 are harmonic functions which are 1 at
S K r a n d z e r o a t i n f i n i t y. T h e y a r e t h e s a m e . □
which implies in the case r2_n < 1 by the Borel-Cantelli lemma that for
almost all uj, Bs(uj) > r for s > Tn. Since Tn is finite almost everywhere,
we get lim infs \Ba\ > r. Since r is arbitrary, the claim follows. □
S n = i n f { * > r n | | fl t | = l / 2 }
Tn = inf{*>Sn_i||Bt| = 2}.
These are finite stopping times. The Dynkin-Hunt theorem shows that Sn -
Tn and Tn - Sn-i are two mutually independent families of IID random
variables. The Lebesgue measures Yn = \In\ of the time intervals
In = {t\\Bt\<l, Tn<t<Tn+1},
are independent random variables. Therefore, also Xn = min(l,yn) are
independent bounded IID random variables. By the law of large numbers,
J2n Xn = oo which implies J2n Yn = oo and the claim follows from
|{*e[o,oo)||ft|<i}|>2rn.
n
D
Remark. Brownian motion in Rd can be defined as a diffusion on Rd with
generator A/2, where A is the Laplacian on Rd. A generalization of Brow
nian motion to manifolds can be done using the diffusion processes with
respect to the Laplace-Beltrami operator. Like this, one can define Brown
ian motion on the torus or on the sphere for example. See [57].
Proof. Define
\\{St-U?/nM = \\T,Uhn(St/n-Ut/n)S^^
3=0
< n SUp \\(St/n - Ut/n)Vs\\ •
0<s<t
n
—l▶
im
o n-\\(St/n-Ut/n)u\\ = 0. (4.3)
The linear space D with norm |||u||| = \\{A + B)u\\ + \\u\\ is a Banach
space since A + B is self-adjoint on D and therefore closed. We have a
bounded family {n(St/n - Ut/n)}neN of bounded operators from D to H.
The principle of uniform boundedness states that
\\n{St/n-Ut/n)u\\<C-\\\u\\\.
232 Chapter 4. Continuous Stochastic Processes
An e/3 argument shows that the limit (4.3) exists uniformly on compact
subsets of£> and especially on {vs}ae[0>t] c D and so nsup0<s<t \\(St/n-
Ut/n)va11 - U. 1 he second statement is proved in exactly the same way. □
Remark. Trotter's product formula generalizes the Lie product formula
where
t ^ 1 \xi -Xj-x
Sn(xo,Xlt... tXn,t) = ± £ -2{^-^-? ~ V(Xi)
i=l '
everywhere.
Lemma 4.12.3. Given /i,..., fn G L°°(Rd)nL2(Rd) and 0 < si < • • • < sn.
Then
(e-^o/i---e-t"i/o/n)(0) = |/i(BSl).-./n(^Jd5,
where t\ = S\,U = Si - Si-\,i > 2 and the fa on the left hand side are
understood as multiplication operators on L2(Rd).
Proof. Since B_{s_1}, B_{s_2} − B_{s_1}, …, B_{s_n} − B_{s_{n−1}} are mutually independent Gaussian random variables of variance t_1, t_2, …, t_n, their joint distribution is
P_{t_1}(0, y_1) P_{t_2}(0, y_2) ⋯ P_{t_n}(0, y_n) dy ,
which is, after the change of variables y_1 = x_1, y_i = x_i − x_{i−1},
P_{t_1}(0, x_1) P_{t_2}(x_1, x_2) ⋯ P_{t_n}(x_{n−1}, x_n) dx .
Therefore,
∫ f_1(B_{s_1}) ⋯ f_n(B_{s_n}) dB
= ∫_{(R^d)^n} P_{t_1}(0, y_1) P_{t_2}(0, y_2) ⋯ P_{t_n}(0, y_n) f_1(y_1) ⋯ f_n(y_n) dy
= ∫_{(R^d)^n} P_{t_1}(0, x_1) P_{t_2}(x_1, x_2) ⋯ P_{t_n}(x_{n−1}, x_n) f_1(x_1) ⋯ f_n(x_n) dx
= (e^{−t_1 H_0} f_1 ⋯ e^{−t_n H_0} f_n)(0) . □
Denote by dB the Wiener measure on C([0,∞), R^d) and by dx the Lebesgue measure on R^d. We define also an extended Wiener measure dW = dx × dB on C([0,∞), R^d) on all paths s ↦ W_s = x + B_s starting at x ∈ R^d.
∫ f_0(W_{s_0}) ⋯ f_n(W_{s_n}) dW = ∫ f_0(x) e^{−t_1 H_0} f_1(x) ⋯ e^{−t_n H_0} f_n(x) dx
= (f_0, e^{−t_1 H_0} f_1 ⋯ e^{−t_n H_0} f_n) .
(ii) In the case s_0 > 0 we have from (i) and the dominated convergence theorem
∫ f_0(W_{s_0}) ⋯ f_n(W_{s_n}) dW
= lim_{R→∞} ∫ 1_{{|x|≤R}}(W_0) f_0(W_{s_0}) ⋯ f_n(W_{s_n}) dW
= lim_{R→∞} (f_0 e^{−s_0 H_0} 1_{{|x|≤R}}, e^{−t_1 H_0} f_1 ⋯ e^{−t_n H_0} f_n)
= (f_0, e^{−t_1 H_0} f_1 ⋯ e^{−t_n H_0} f_n) .
□
We prove now the Feynman-Kac formula for Schrödinger operators of the form H = H_0 + V with V ∈ C_0^∞(R^d). Because V is continuous, the integral ∫_0^t V(W_s(ω)) ds can be taken for each ω as a limit of Riemann sums, and ∫_0^t V(W_s) ds certainly is a random variable.
(f, e^{−tH} g) = lim_{n→∞} (f, (e^{−tH_0/n} e^{−tV/n})^n g)
and the Riemann sums
(t/n) Σ_{j=0}^{n−1} V(W_{jt/n}) → ∫_0^t V(W_s) ds .
4.13. The quantum mechanical oscillator 235
The integrand on the right hand side of (4.4) is dominated by
|f(W_0)| · |g(W_t)| · e^{t ||V||_∞} ,
which is in L¹(dW) because again by corollary (4.12.4),
(Au, v) = (u, A*v) ,
the two operators are adjoint to each other. The vector
Ω_0(x) = π^{−1/4} e^{−x²/2}
is a unit vector because Ω_0² is the density of a N(0, 1/√2) distributed random variable. Because AΩ_0 = 0, it is an eigenvector of H with eigenvalue 1/2. It is called the ground state or vacuum state describing the system with no particle. Define inductively the n-particle states
Ω_n = (1/√n) A* Ω_{n−1} ,
which can be written as
Ω_n(x) = H_n(x) Ω_0(x) / √(2^n n!) ,
where H_n(x) are the Hermite polynomials, H_0(x) = 1, H_1(x) = 2x, H_2(x) = 4x² − 2, H_3(x) = 8x³ − 12x, ….
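A quick numerical check (an added illustration, not from the book) that the states Ω_n(x) = H_n(x)Ω_0(x)/√(2^n n!) with Ω_0(x) = π^{−1/4}e^{−x²/2} form an orthonormal family; NumPy's `numpy.polynomial.hermite` module provides the physicists' Hermite polynomials used here.

```python
# Sketch: verify numerically that the oscillator eigenfunctions Omega_n
# are orthonormal, via a Riemann sum on a fine grid.
import numpy as np
from numpy.polynomial.hermite import hermval     # physicists' H_n
from math import factorial, pi, sqrt

x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
Omega0 = pi ** (-0.25) * np.exp(-x ** 2 / 2)

def Omega(n):
    # [0]*n + [1] selects the single Hermite polynomial H_n
    return hermval(x, [0] * n + [1]) * Omega0 / sqrt(2 ** n * factorial(n))

gram = np.array([[np.sum(Omega(n) * Omega(m)) * dx for m in range(5)]
                 for n in range(5)])
print(np.round(gram, 6))    # approximately the 5 x 5 identity matrix
```

Since the integrands decay like a Gaussian, the plain Riemann sum on [-10, 10] is accurate far beyond the tolerance used below.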
HΩ_n = (A*A + 1/2) Ω_n = (n + 1/2) Ω_n ,
[A, (A*)^n] = n · (A*)^{n−1} .
For n = 1, this means [A, A*] = 1. The induction step is
[A, (A*)^n] = A*[A, (A*)^{n−1}] + [A, A*](A*)^{n−1} = (n−1)(A*)^{n−1} + (A*)^{n−1} = n (A*)^{n−1} .
a) Also
((A*)^n Ω_0, (A*)^m Ω_0) = n! δ_{mn}
can be proven by induction. For n = 0 it follows from the fact that Ω_0 is normalized. The induction step uses [A, (A*)^n] = n (A*)^{n−1} and AΩ_0 = 0:
((A*)^n Ω_0, (A*)^m Ω_0) = (A (A*)^n Ω_0, (A*)^{m−1} Ω_0) = n ((A*)^{n−1} Ω_0, (A*)^{m−1} Ω_0) .
If n < m, then after n steps we get 0, while in the case n = m, we obtain ((A*)^n Ω_0, (A*)^n Ω_0) = n · ((A*)^{n−1} Ω_0, (A*)^{n−1} Ω_0), which is by induction n (n−1)! δ_{n−1,n−1} = n!.
Since
0 = √(n!) (f, Ω_n) = (f, (A*)^n Ω_0)
and (A*)^n Ω_0 is a polynomial of degree n times Ω_0, we have (f, x^n Ω_0) = 0 for all n. Therefore
∫_{−∞}^{∞} f(x) Ω_0(x) e^{ikx} dx = (f, Ω_0 e^{ikx}) = Σ_{n≥0} ((ik)^n / n!) (f, x^n Ω_0) = 0 ,
and so fΩ_0 = 0. Since Ω_0(x) is positive for all x, we must have f = 0. This finishes the proof that we have a complete basis. □
Remark. This gives a complete solution to the quantum mechanical harmonic oscillator. With the eigenvalues {λ_n = n + 1/2}_{n=0}^∞ and the complete set of eigenvectors Ω_n one can solve the Schrödinger equation
iħ (d/dt) u = Hu
by writing the function u(x) = Σ_{n=0}^∞ u_n Ω_n(x) as a sum of eigenfunctions, where u_n = (u, Ω_n). The solution of the Schrödinger equation is
u(t, x) = Σ_{n=0}^∞ u_n e^{it(n+1/2)} Ω_n(x) .
The oscillator is the special case q(x) = x. See [12]. The Bäcklund transformation H = A*A ↦ H̃ = AA*, which in the case of the harmonic oscillator is the map H ↦ H + 1, has the effect that it replaces U with Ũ = U − d² log Ω_0, where Ω_0 is the ground state. The new operator H̃ has the same spectrum as H except that the lowest eigenvalue is removed. This procedure can be reversed to create "soliton potentials" out of the vacuum. It is also natural to use the language of super-symmetry as introduced by Witten: take two copies H_f ⊕ H_b of the Hilbert space, where "f" stands for Fermion and "b" for Boson. With
Q = ( 0 A* ; A 0 ) , P = ( 1 0 ; 0 −1 )
(e^{−tL_0} f)(x) = ∫_R p_t(x, y) f(y) dy
is slightly more involved than in the case of the free Laplacian. Let Ω_0 be the ground state of L_0 as in the last section.
Lemma 4.14.1. Given f_0, f_1, …, f_n ∈ L^∞(R) and −∞ < s_0 < s_1 < ⋯ < s_n < ∞. Then
(Ω_0, f_0 e^{−t_1 L_0} f_1 ⋯ e^{−t_n L_0} f_n Ω_0)
= lim_{m=(m_1,…,m_n), m_i→∞} (Ω_0, f_0 (e^{−t_1 H_0/m_1} e^{−t_1 U/m_1})^{m_1} f_1 ⋯ f_n Ω_0) ,
p_t(x, y) = (1/√(πσ²)) exp( − [ (x² + y²)(1 + e^{−2t}) − 4xy e^{−t} ] / (2σ²) )
with σ² = 1 − e^{−2t}.
Proof. We have the covariance matrix
A = ( 1 e^{−t} ; e^{−t} 1 ) .
We get Mehler's formula by inverting this matrix and using that the density is
(2π)^{−1} det(A)^{−1/2} e^{−((x,y), A^{−1}(x,y))/2} .
□
Definition. Let dQ be the Wiener measure on C(R) belonging to the oscillator process Q_t.
(fΩ_0, e^{−tL} gΩ_0) = ∫ f(Q_0) g(Q_t) e^{−∫_0^t V(Q_s) ds} dQ
for all f, g ∈ L²(R, Ω_0² dx).
The Riemann sums
(t/n) Σ_{j=0}^{n−1} V(Q_{jt/n}) → ∫_0^t V(Q_s) ds ,
and the integrand is dominated by
|f(Q_0)| |g(Q_t)| e^{t ||V||_∞} ,
which is in L¹(dQ) since
Example. Let D be an open set in R^d such that the Lebesgue measure |D| is finite and the Lebesgue measure of the boundary |∂D| is zero. Denote by H_D the Dirichlet Laplacian −Δ/2. Denote by k_D(E) the number of eigenvalues of H_D below E. This function is also called the integrated density of states. Denote with K_d the unit ball in R^d and with |K_d| = Vol(K_d) = π^{d/2} Γ(d/2 + 1)^{−1} its volume. Weyl's formula describes the asymptotic behavior of k_D(E) for large E:
lim_{E→∞} k_D(E)/E^{d/2} = |K_d| · |D| / (2^{d/2} π^d) .
It shows that one can read off the volume of D from the spectrum of the Laplacian.
Example. Put n ice balls K_{j,n}, 1 ≤ j ≤ n, of radius r_n into a glass of water so that n · r_n = a. In order to know how well this ice cools the water, it is good to know the lowest eigenvalue E_1 of the Dirichlet Laplacian H_D, since the evolution of the temperature distribution u by the heat equation u̇ = H_D u is dominated by e^{−tE_1}. This motivates computing the lowest eigenvalue of the domain D \ ⋃_{j=1}^n K_{j,n}. This can be done exactly in the limit n → ∞ and when the ice K_{j,n} is randomly distributed in the glass. Mathematically, this is described as follows:
Let D be an open bounded domain in R^d. Given a sequence x = (x_1, x_2, …), which is an element in D^N, and a sequence of radii r_1, r_2, …, define
D_n = D \ ⋃_{i=1}^n { |x − x_i| ≤ r_n } .
This is the domain D with n balls K_{i,n} with centers x_1, …, x_n and radius r_n removed. Let H(x, n) be the Dirichlet Laplacian on D_n and E_k(x, n) the k-th eigenvalue of H(x, n), which are random variables E_k(n) in x if D^N is equipped with the product Lebesgue measure. One can show that in the case n r_n → a,
E_k(n) → E_k(0) + 2πa |D|^{−1}
in probability. Random impurities produce a constant shift in the spectrum. For the physical system with the crushed ice, where the crushing makes n r_n → ∞, there is much better cooling than one might expect.
Let us first prove a lemma which relates the Dirichlet Laplacian H_D = −Δ/2 on D with Brownian motion.
(ii) Since Brownian paths are continuous, we have ∫_0^t V(B_s) ds > 0 if and only if B_s ∈ C for some s ∈ [0, t]. We get therefore
e^{−λ ∫_0^t V(B_s) ds} ,
E[ |{ y : |y − B_s| ≤ δ, 0 ≤ s ≤ t }| ] = E[ |W_δ(t)| ] ,
so that one can assume without loss of generality that δ = 1: knowing E[|W_1(t)|], we get the general case with the formula E[|W_δ(t)|] = δ³ · E[|W_1(δ^{−2} t)|].
Let K be the closed unit ball in R³. Define the hitting probability
f(x, t) = P[ x + B_s ∈ K for some 0 ≤ s ≤ t ] .
We have
E[ |W_1(t)| ] = ∫ f(x, t) dx .
Proof.
E[|W_1(t)|] = ∫∫ P[x ∈ W_1(t)] dx dB
= ∫∫ P[B_s − x ∈ K; 0 ≤ s ≤ t] dx dB
= ∫∫ P[B_s − x ∈ K; 0 ≤ s ≤ t] dB dx
= ∫ f(x, t) dx .
∂_t p(x, 0, t) = (Δ/2) p(x, 0, t)
inside D. From the previous lemma it follows that ḟ = (Δ/2) f, so that the function g(r, t) = r f(x, t), with r = |x|, satisfies ġ = (1/2) ∂_r² g with boundary conditions g(r, 0) = 0, g(1, t) = 1. We compute
∫_{|x|>1} f(x, t) dx = 2πt + 4√(2πt) ,
so that
lim_{δ→0} (1/δ) E[|W_δ(t)|] = 2πt
and
lim_{t→∞} (1/t) E[|W_δ(t)|] = 2πδ .
We will use later for J_{n,m}(f) also the notation f(B_{t_{m−1}}) δ_n B_{t_m}, where δ_n B_t = B_t − B_{t−2^{−n}}.
Remark. We have earlier defined the discrete stochastic integral for a previsible process C and a martingale X:
(∫ C dX)_n = Σ_{m=1}^n C_m (X_m − X_{m−1}) .
Lemma 4.16.1. If f ∈ C¹(R) such that f, f′ are bounded on R, then J_n(f) converges in L² to a random variable
∫_0^1 f(B_s) dB = lim_{n→∞} J_n
satisfying
|| ∫_0^1 f(B_s) dB ||² = E[ ∫_0^1 f(B_s)² ds ] .
||J_n(f)||² = Σ_{m=1}^{2^n} E[ f(B_{(m−1)2^{−n}})² ] 2^{−n} .
||J_{n+1}(f) − J_n(f)||²
= Σ_{m=0}^{2^n − 1} E[ ( f(B_{(2m+1)2^{−(n+1)}}) − f(B_{(2m)2^{−(n+1)}}) )² ] 2^{−(n+1)}
≤ C Σ_{m=0}^{2^n − 1} E[ ( B_{(2m+1)2^{−(n+1)}} − B_{(2m)2^{−(n+1)}} )² ] 2^{−(n+1)}
= C · 2^{−n−2} ,
where the last equality followed from the fact that E[(B_{(2m+1)2^{−(n+1)}} − B_{(2m)2^{−(n+1)}})²] = 2^{−(n+1)} since B is Gaussian. We see that J_n is a Cauchy sequence in L² and has therefore a limit.
We can extend the integral to functions f which are locally in L² and bounded near 0. We write L^p_loc(R) for functions f which are in L^p(I) when restricted to any finite interval I on the real line.
(ii) If f ∈ L²_loc(R) ∩ L^∞(−ε, ε), then for almost every B(ω), the limit
lim_{a→∞} ∫_0^1 1_{[−a,a]}(B_s) f(B_s)² ds
exists.
Example.
∫_0^1 B_s dB = (1/2)(B_1² − 1) .
Proof. Define
J_n^+ = Σ_{m=1}^{2^n} f(B_{(m−1)2^{−n}}) (B_{m2^{−n}} − B_{(m−1)2^{−n}}) ,
J_n^− = Σ_{m=1}^{2^n} f(B_{m2^{−n}}) (B_{m2^{−n}} − B_{(m−1)2^{−n}}) .
The above lemma implies that J_n^− − J_n^+ → 1 almost everywhere for n → ∞, and we check also J_n^+ + J_n^− = B_1². Both of these identities come from cancellations in the sum and together imply the claim. □
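The two cancellation identities in this proof can be checked numerically (an added sketch; grid size and seed are arbitrary): on any sampled path, the left- and right-endpoint sums satisfy J⁺ + J⁻ = B₁² exactly, their difference is the sum of squared increments, and that sum concentrates near 1.

```python
# Sketch: discrete Ito sums for f(x) = x on [0, 1].
import numpy as np

rng = np.random.default_rng(1)
n = 2 ** 16
dB = rng.normal(0.0, np.sqrt(1.0 / n), n)      # Brownian increments
B = np.concatenate([[0.0], np.cumsum(dB)])

J_plus = np.sum(B[:-1] * dB)    # integrand at the left endpoint (Ito)
J_minus = np.sum(B[1:] * dB)    # integrand at the right endpoint
quad_var = np.sum(dB ** 2)      # quadratic variation, close to 1

print(J_plus, (B[-1] ** 2 - 1) / 2, quad_var)
```

Since J⁺ = (B₁² − quad_var)/2 exactly, the convergence quad_var → 1 is precisely what makes ∫₀¹ B dB = (B₁² − 1)/2.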
We mention now some basic properties of the stochastic integral.
Theorem 4.16.4 (Properties of the Ito integral). Here are some basic properties of the Ito integral:
(1) ∫_0^t f(B_s) + g(B_s) dB_s = ∫_0^t f(B_s) dB_s + ∫_0^t g(B_s) dB_s.
(2) ∫_0^t λ f(B_s) dB_s = λ ∫_0^t f(B_s) dB_s.
(3) t ↦ ∫_0^t f(B_s) dB_s is a continuous map from R^+ to L².
(4) E[ ∫_0^t f(B_s) dB_s ] = 0.
(5) ∫_0^t f(B_s) dB_s is A_t-measurable.
Proof. (1) and (2) follow from the definition of the integral.
For (3) define X_t = ∫_0^t f(B_s) dB_s. Since
||X_{t+ε} − X_t||² = E[ ∫_t^{t+ε} f(B_s)² ds ] → 0
for ε → 0, the claim follows.
(4) and (5) can be seen by verifying them first for elementary functions f. □
It will be useful to consider another generalization of the integral:
∫_0^t f(W_s) dW_s = ∫_{R^d} ∫_0^t f(x + B_s) dB_s dx .
4.16. The Ito integral for Brownian motion 249
Definition. Assume f is also time dependent, so that it is a function on R^d × R. As long as E[ ∫_0^t |f(B_s, s)|² ds ] < ∞, we can also define the integral
∫_0^t f(B_s, s) dB_s .
df = ∇f dB + (1/2) Δf dt .
Remark. We cite [11]: "Ito's formula is now the bread and butter of the
"quant" department of several major financial institutions. Models like that
of Black-Scholes constitute the basis on which a modern business makes de
cisions about how everything from stocks and bonds to pork belly futures
should be priced. Ito's formula provides the link between various stochastic
quantities and differential equations of which those quantities are the so
lution." For more information on the Black-Scholes model and the famous
Black-Scholes formula, see [16].
It is not much more work to prove a more general formula for functions f(x, t) which can be time-dependent too. Take the grid
{ 0 < 2^{−n} < …, t_k = k · 2^{−n}, …, 1 }
and define δ_n B_{t_k} = B_{t_k} − B_{t_{k−1}}. We write
E[ |δ_n B_{t_k}|^p ] = (2π)^{−1/2} ∫_{−∞}^∞ |x|^p e^{−x²/2} dx · 2^{−(np)/2} = C 2^{−(np)/2} .
This means
Σ_{k=1}^{2^n} E[ |δ_n B_{t_k}|^p ] = C 2^n 2^{−(np)/2} .
Σ_{k=1}^{2^n} (1/2) Σ_{i,j} ∂_{x_i x_j} f(B_{t_{k−1}}, t_{k−1}) ( (δ_n B_{t_k})_i (δ_n B_{t_k})_j − δ_{ij} 2^{−n} ) → 0
in L². Since
Σ_{k=1}^{2^n} ∂_{x_i x_j} f(B_{t_{k−1}}, t_{k−1}) (δ_n B_{t_k})_i (δ_n B_{t_k})_j , i ≠ j ,
goes to zero in L² (applying (ii) for g = ∂_{x_i x_j} f and noting that (δ_n B_{t_k})_i and (δ_n B_{t_k})_j are independent for i ≠ j), we have therefore
Σ_{k=1}^{2^n} (1/2) Σ_{i,j} ∂_{x_i x_j} f(B_{t_{k−1}}, t_{k−1}) (δ_n B_{t_k})_i (δ_n B_{t_k})_j → (1/2) ∫_0^1 Δf(B_s, s) ds
in L². Similarly, the sum over the time increments gives
Σ_{k=1}^{2^n} ḟ(B_{t_{k−1}}, t_{k−1}) 2^{−n} → ∫_0^1 ḟ(B_s, s) ds .
f(x, t) = e^{αx − α²t/2} .
Because this function satisfies the heat equation ḟ + f″/2 = 0, we get from Ito's formula
∫_0^t e^{αB_s − α²s/2} dB = (1/α) e^{αB_t − α²t/2} − 1/α .
We see that for functions satisfying the heat equation ḟ + f″/2 = 0, Ito's formula reduces to the usual rule of calculus. If we make a power expansion in α of this identity, we get other formulas like
∫_0^t B_s dB = (1/2)(B_t² − t) .
Wick ordering.
There is a notation used in quantum field theory, developed by Gian-Carlo Wick at about the same time as Ito invented the integral. This Wick ordering is a map on polynomials Σ_{i=1}^n a_i x^i which maps each monomial x^n to a monic polynomial :x^n: = x^n + a_{n−1} x^{n−1} + ⋯ of the same degree.
Definition. Let
:x^n: = 2^{−n/2} H_n(x/√2) ,
where H_n are the Hermite polynomials. One has for example
:x: = x
:x²: = x² − 1
:x³: = x³ − 3x
:x⁴: = x⁴ − 6x² + 3
:x⁵: = x⁵ − 10x³ + 15x .
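The Wick powers 2^{−n/2}H_n(x/√2) are exactly the probabilists' Hermite polynomials He_n, which NumPy exposes as `numpy.polynomial.hermite_e`; the following added check (an illustration, not from the book) reproduces the table above.

```python
# Sketch: :x^n: = 2^(-n/2) H_n(x/sqrt(2)) equals He_n(x).
import numpy as np
from numpy.polynomial.hermite import hermval      # physicists' H_n
from numpy.polynomial.hermite_e import hermeval   # probabilists' He_n

def wick_power(x, n):
    """Evaluate the Wick power :x^n:."""
    return 2.0 ** (-n / 2) * hermval(x / np.sqrt(2.0), [0] * n + [1])

xs = np.linspace(-3.0, 3.0, 25)
match = all(np.allclose(wick_power(xs, n), hermeval(xs, [0] * n + [1]))
            for n in range(6))
print(match, wick_power(2.0, 4))   # True, and :x^4: at x = 2 is 16-24+3 = -5
```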
The following formula indicates why Wick ordering has its name and why it is useful in quantum mechanics:
Proof. Since we know that Ω_n forms a basis in L², we only have to verify that :Q^n: Ω_k = 2^{−n/2} L Ω_k for all k. From
2^{1/2} [Q, L] = [ A + A*, Σ_{j=0}^n (n choose j) (A*)^j A^{n−j} ]
= Σ_{j=0}^n (n choose j) ( j (A*)^{j−1} A^{n−j} − (n − j)(A*)^j A^{n−j−1} )
= 0
and
0 = (:Q^n: − 2^{−n/2} L) Ω_0 ,
it follows, since :Q^n: − 2^{−n/2} L commutes with Q and Ω_k = p_k(Q) Ω_0 for a polynomial p_k, that
0 = (:Q^n: − 2^{−n/2} L) p_k(Q) Ω_0 = (:Q^n: − 2^{−n/2} L) Ω_k .
Theorem 4.16.8 (Ito integral of :B^n:). Wick ordering makes the Ito integral behave like an ordinary integral:
∫_0^t :B_s^n: dB = (1/(n+1)) :B_t^{n+1}: .
Remark. Notation can be important to make a concept appear natural. Another example, where an adaptation of notation helps, is quantum calculus, "calculus without taking limits" [44], where the derivative is defined as D_q f(x) = d_q f(x)/d_q(x) with d_q f(x) = f(qx) − f(x). One can see that D_q x^n = [n] x^{n−1}, where [n] = (q^n − 1)/(q − 1). The limit q → 1 corresponds to the classical limit ħ → 0 of quantum mechanics.
∫_0^t :e^{aB_s}: dB = a^{−1} :e^{aB_t}: − a^{−1} .
The generating function for the Hermite polynomials is known to be
Σ_{n=0}^∞ H_n(x) a^n / n! = e^{2ax − a²} .
(We can check this formula by multiplying it with Ω_0 and replacing x with x/√2, so that we have
Σ_{n=0}^∞ H_n(x/√2) (a/√2)^n / n! Ω_0(x) = e^{ax − a²/2} Ω_0(x) .
If we apply A* on both sides, the equation goes onto itself and we get after k such applications of A* that the inner product with Ω_k is the same on both sides. Therefore the functions must be the same.)
This means
:e^{ax}: = e^{ax − a²/2} = Σ_{n=0}^∞ (a^n / n!) :x^n: .
Since the right hand side satisfies ḟ + f″/2 = 0, the claim follows from the Ito formula for such functions. □
∫_0^t 1 dB = B_t ,
∫_0^t B_s dB = (1/2)(B_t² − t)
and so on.
Stochastic integrals for the oscillator and the Brownian bridge process.
Let Q_t = e^{−t} B_{e^{2t}} / √2 be the oscillator process and A_t = (1 − t) B_{t/(1−t)} the Brownian bridge. If we define new discrete differentials
δ_n Q_{t_k} = Q_{t_{k+1}} − e^{−(t_{k+1} − t_k)} Q_{t_k} ,
δ_n A_{t_k} = A_{t_{k+1}} − A_{t_k} + A_{t_k} (t_{k+1} − t_k)/(1 − t_k) ,
the stochastic integrals can be defined as in the case of Brownian motion as a limit of discrete integrals.
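A Monte Carlo sanity check of the bridge representation A_t = (1 − t) B_{t/(1−t)} (an added sketch; sample size and seed are arbitrary): the bridge has covariance Cov(A_s, A_t) = s(1 − t) for s ≤ t, so Var(A_{1/2}) = 1/4 and Cov(A_{1/4}, A_{1/2}) = 1/8.

```python
# Sketch: sample (B_{1/3}, B_1) from independent Gaussian increments;
# these are the Brownian times corresponding to bridge times t = 1/4, 1/2.
import numpy as np

rng = np.random.default_rng(2)
n_paths = 200_000
inc1 = rng.normal(0.0, np.sqrt(1 / 3), n_paths)        # B_{1/3}
inc2 = rng.normal(0.0, np.sqrt(1 - 1 / 3), n_paths)    # B_1 - B_{1/3}
B13, B1 = inc1, inc1 + inc2

A14, A12 = 0.75 * B13, 0.5 * B1     # A_t = (1 - t) B_{t/(1-t)}
var_half = A12.var()
cov_quarter_half = np.cov(A14, A12)[0, 1]
print(var_half, cov_quarter_half)   # near 0.25 and 0.125
```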
e^{−tH} u(0) = ∫ e^{−∫_0^t V(B_s) ds} u(B_t) dB ,
/ Ks dMs ,
Jo
A function with finite total variation ||f||_t = sup_Δ ||f||_Δ < ∞ is called a function of finite variation. If sup_t ||f||_t < ∞, then f is called of bounded variation. One abbreviates bounded variation with BV.
E[M_t²] = E[ Σ_{i=0}^{k−1} (M_{t_{i+1}}² − M_{t_i}²) ]
= E[ Σ_{i=0}^{k−1} (M_{t_{i+1}} − M_{t_i})(M_{t_{i+1}} + M_{t_i}) ]
= E[ Σ_{i=0}^{k−1} (M_{t_{i+1}} − M_{t_i})² ]
4.17. Processes of bounded quadratic variation 257
and so
If the modulus |Δ| goes to zero, then the right hand side goes to zero since M is continuous. Therefore M = 0. □
Remark. Before we enter the not so easy proof given in [83], let us mention the corresponding result in the discrete case (see theorem (3.5.1)), where M_n² was a submartingale so that M_n² could be written uniquely as a sum of a martingale and an increasing previsible process.
E[ T_t^Δ(M) − T_s^Δ(M) | A_s ] = E[ Σ_j (M_{t_{j+1}} − M_{t_j})² | A_s ]
+ E[ (M_t − M_{t_k})² | A_s ] + E[ (M_{t_l} − M_s)² | A_s ]
= E[ (M_t − M_s)² | A_s ] = E[ M_t² − M_s² | A_s ] .
This implies that M_t² − T_t^Δ(M) is a continuous martingale.
(ii) Let C be a constant such that |M| ≤ C on [0, a]. Then E[T^Δ] ≤ 4C², independent of the subdivision Δ = {t_0, …, t_n} of [0, a].
Proof. The previous computation in (i) gives, for s = 0 and using T_0^Δ(M) = 0,
(T_t^Δ(M))² = ( Σ_k (M_{t_k} − M_{t_{k−1}})² )²
(iii) For fixed a > 0 and subdivisions Δ_n of [0, a] satisfying |Δ_n| → 0, the sequence T^{Δ_n} has a limit in L².
Proof. Given two subdivisions Δ′, Δ″ of [0, a], let Δ be the subdivision obtained by taking the union of the points of Δ′ and Δ″. By (i), the process X = T^{Δ′} − T^{Δ″} is a martingale, and by (i) again, applied to the martingale X instead of M, we have, using (x + y)² ≤ 2(x² + y²),
T^{Δ′}_{s_{k+1}} − T^{Δ′}_{s_k} = (M_{s_{k+1}} − M_{t_m})² − (M_{s_k} − M_{t_m})²
= (M_{s_{k+1}} − M_{s_k})(M_{s_{k+1}} + M_{s_k} − 2M_{t_m})
and so
T^Δ(T^{Δ′}) ≤ ( sup_k |M_{s_{k+1}} + M_{s_k} − 2M_{t_m}|² ) T^Δ ,
and the first factor goes to 0 as |Δ| → 0, while the second factor is bounded because of (ii).
MN − (M, N)
is a martingale.
Proof. Uniqueness follows again from the fact that a finite variation martingale must be zero. To get existence, use the parallelogram law:
(M + N)² − (M + N, M + N) − ( (M − N)² − (M − N, M − N) )
= 4MN − ( (M + N, M + N) − (M − N, M − N) ) .
(M^{(i)}, M^{(j)})_t = δ_{ij} t
must be Brownian motion. This is Levy's characterization of Brownian motion.
Proof. (i) Define (M, N)_s^t = (M, N)_t − (M, N)_s. Claim: almost surely the expression is positive almost everywhere, and this stays true simultaneously for a dense set of r ∈ R. Since M, N are continuous, it holds for all r. The claim follows, since a + 2rb + cr² ≥ 0 for all r > 0 with nonnegative a, c implies b ≤ √a √c.
(ii) To prove the claim it is, using Hölder's inequality, enough to show almost everywhere the inequality
| ∫_s^t H_α K_α d(M, N)_α | ≤ Σ_i |H_i| |K_i| |(M, N)_{s_i}^{s_{i+1}}|
≤ Σ_i |H_i| |K_i| ( (M, M)_{s_i}^{s_{i+1}} )^{1/2} ( (N, N)_{s_i}^{s_{i+1}} )^{1/2}
≤ ( Σ_i H_i² (M, M)_{s_i}^{s_{i+1}} )^{1/2} ( Σ_i K_i² (N, N)_{s_i}^{s_{i+1}} )^{1/2} .
We can extract a subsequence for which sup_t |M_t^{(n)} − M_t| converges pointwise to zero almost everywhere. Therefore M ∈ H². The same argument shows also that H_0² is closed. □
for every N ∈ H². The map K ↦ ∫_0^t K dM is an isometry from L²(M) to H².
| E[ ∫_0^t K_s d(M, N)_s ] | ≤ ||N||_{H²} · ||K||_{L²(M)} .
The map
N ↦ E[ ∫_0^t K_s d(M, N)_s ]
= E[ ∫_0^t K_s² d(M, M)_s ]
= ||K||²_{L²(M)}
Definition. The martingale ∫_0^t K_s dM_s is called the Ito integral of the progressively measurable process K with respect to the martingale M. We can especially take K = f(M), since continuous processes are progressively measurable. If we take M = B, Brownian motion, we get the already familiar Ito integral.
Proof. The general case follows from the special case by polarization: use the special case for X ± Y as well as X and Y.
The special case is proved by discretisation: let Δ = {t_0, t_1, …, t_n} be a finite discretisation of [0, t]. Then
∫_0^t αX_s dM_s = X_t − 1 ,
dX_t = αX_t dM_t , X_0 = 1 .
4.19. Stochastic differential equations 265
This is an example of a stochastic differential equation (SDE) and one would use the notation
dX/dM = αX
if it would not lead to confusion with the corresponding ordinary differential equation, where M is not a stochastic process but a variable and where the solution would be X = e^{αM}. Here, the solution is the stochastic process
X_t = e^{αM_t − α²(M,M)_t/2} .
Definition. Let Bt be Brownian motion in Rd. A solution of a stochastic
differential equation
where / : and g :
As for ordinary differential equations, where one can easily solve separable differential equations dx/dt = f(x) + g(t) by integration, this works for stochastic differential equations. However, to integrate, one has to use an adapted substitution. The key is Ito's formula (4.18.5), which holds for martingales and so for solutions of stochastic differential equations, and which in one dimension leads to the multiplication table
        dt    dB_t
 dt     0     0
 dB_t   0     dt
Example. The linear ordinary differential equation dX/dt = rX with solution X_t = e^{rt} X_0 has a stochastic analogue. It is called the stochastic population model. We look for a stochastic process X_t which solves the SDE
dX_t = r X_t dt + α X_t dB_t ,
and integration with respect to t gives
∫_0^t dX_s / X_s = rt + α B_t .
In order to compute the stochastic integral on the left hand side, we have to do a change of variables with f(X) = log(X). Looking up the multiplication table:
For example, with X = x and F = 0, the function v(t) satisfies the stochastic differential equation
dX_t/dt = −b X_t + a ζ_t
with white noise ζ_t, which has the solution
X_t = e^{−bt} X_0 + a ∫_0^t e^{−b(t−s)} dB_s .
With a time dependent force F(x, t), already the differential equation without noise can not be given closed solutions in general. If the friction constant b is noisy, we obtain
dX_t/dt = (−b + a ζ_t) X_t ,
which is the stochastic population model treated in the previous example.
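The Langevin equation above can be simulated with a simple Euler scheme (an added sketch; parameters, step size and seed are arbitrary choices). Its stationary variance is a²/(2b), which the time series should approach.

```python
# Sketch: Euler discretization of dX = -b X dt + a dB (Ornstein-Uhlenbeck).
import numpy as np

rng = np.random.default_rng(3)
b, a = 1.0, 1.0
dt, n = 0.002, 1_000_000              # horizon T = 2000
noise = rng.normal(0.0, np.sqrt(dt), n)
X = np.empty(n)
x = 0.0
for i in range(n):
    x += -b * x * dt + a * noise[i]   # Euler step
    X[i] = x
var_est = X[n // 10:].var()           # discard a burn-in of the trajectory
print(var_est)                        # close to a^2/(2b) = 0.5
```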
||X||_T = E[ sup_{t≤T} X_t² ] .
E[ sup_{s≤t} |X_s|^p ] ≤ ( p/(p−1) )^p E[ |X_t|^p ] .
From
λ P[X* ≥ λ] ≤ E[ |X_t| 1_{X* ≥ λ} ]
we get
E[ |X*|^p ] = E[ ∫_0^{X*} p λ^{p−1} dλ ] = ∫_0^∞ p λ^{p−1} P[X* ≥ λ] dλ .
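A Monte Carlo illustration of Doob's inequality E[sup_{s≤t}|X_s|^p] ≤ (p/(p−1))^p E[|X_t|^p] (an added sketch; the choice of a simple random walk martingale, p = 2 and the sample sizes are arbitrary): for p = 2 the constant on the right hand side is 4.

```python
# Sketch: compare E[ (max_k |S_k|)^2 ] with 4 E[ S_n^2 ] for a random walk.
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps = 20_000, 200
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
walks = np.cumsum(steps, axis=1)

lhs = (np.max(np.abs(walks), axis=1) ** 2).mean()
rhs = 4.0 * (walks[:, -1] ** 2).mean()     # here E[S_n^2] = n = 200
print(lhs, rhs)                            # lhs well below rhs
```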
||| S_1(X) − S_1(Y) |||_T ≤ (1/4) · ||X − Y||_T ,
then S is a contraction:
||| S(X) − S(Y) |||_T ≤ (1/2) · ||X − Y||_T .
||| S_1(X) − S_1(Y) |||_T = E[ sup_{t≤T} ( ∫_0^t f(s, X) − f(s, Y) dM_s )² ]
≤ 4K² ∫_0^T ||| X − Y |||_s ds
≤ (1/4) · ||| X − Y |||_T ,
u̇(t) = f(t, u(t)) .
|| φ(y_1) − φ(y_2) || = || ∫_{t_0}^t f(s, y_1(s)) − f(s, y_2(s)) ds ||
≤ ∫_{t_0}^t || f(s, y_1(s)) − f(s, y_2(s)) || ds
≤ k r · || y_1 − y_2 || .
On the other hand, we have for every s ∈ J_r
|| f(s, y(s)) || ≤ M .
We can apply the above lemma if kr < 1 and Mr ≤ b(1 − kr). This is the case if r < b/(M + kb). By choosing r small enough, we can get the contraction rate as small as we wish. □
Definition. A set X with a distance function d(x, y) for which the following properties hold is called a metric space:
(i) d(y, x) = d(x, y) ≥ 0 for all x, y ∈ X.
(ii) d(x, x) = 0 and d(x, y) > 0 for x ≠ y.
(iii) d(x, z) ≤ d(x, y) + d(y, z) for all x, y, z.
Example. The plane R² with the usual distance d(x, y) = |x − y|. Another metric is the Manhattan or taxi metric d(x, y) = |x_1 − y_1| + |x_2 − y_2|.
Example. The set C([0,1]) of all continuous functions x(t) on the interval [0,1] with the distance d(x, y) = max_t |x(t) − y(t)| is a metric space.
Definition. A map φ : X → X is called a contraction if there exists λ < 1 such that d(φ(x), φ(y)) ≤ λ · d(x, y) for all x, y ∈ X. The map φ shrinks the distance of any two points by the contraction factor λ.
(Q,d(x,y) = \x-y\)
is not complete.
Example. The space C[0,1] is complete: given a Cauchy sequence x_n, then x_n(t) is a Cauchy sequence in R for all t. Therefore x_n(t) converges pointwise to a function x(t). This function is continuous: take ε > 0; then |x(t) − x(s)| ≤ |x(t) − x_n(t)| + |x_n(t) − x_n(s)| + |x_n(s) − x(s)| by the triangle inequality. For large n, |x(t) − x_n(t)| < ε/3 and |x_n(s) − x(s)| < ε/3. If s is close to t, the middle term is smaller than ε/3. So |x(t) − x(s)| ≤ ε if |t − s| is small.
d(φ^n(x), φ^n(y)) ≤ λ^n · d(x, y)
for all n.
(ii) Using the triangle inequality and Σ_k λ^k = (1 − λ)^{−1}, we get for all x ∈ X,
(iv) There is only one fixed point. Assume there were two fixed points x, y of φ. Then
d(x, y) = d(φ(x), φ(y)) ≤ λ · d(x, y) .
This is impossible unless x = y. □
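A concrete illustration of the contraction principle (an added sketch, not from the book): φ(x) = cos(x) maps [0, 1] into itself and |φ′| ≤ sin(1) < 1 there, so the iteration converges to the unique fixed point of cos.

```python
# Sketch: fixed point iteration for the contraction phi(x) = cos(x).
from math import cos

x = 1.0
for _ in range(200):     # geometric convergence at rate about sin(1)
    x = cos(x)
print(x)                 # approximately 0.739085
```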
Chapter 5
Selected Topics
5.1 Percolation
Definition. Let e_i be the standard basis vectors in the lattice Z^d. Denote with L^d the Cayley graph of Z^d with the generators A = {e_1, …, e_d}. This graph L^d = (V, E) has the lattice Z^d as vertices. The edges or bonds in that graph are straight line segments connecting neighboring points x, y, namely points satisfying |x − y| = Σ_{i=1}^d |x_i − y_i| = 1.
Lemma 5.1.1. There exists a critical value p_c = p_c(d) such that θ(p) = 0 for p < p_c and θ(p) > 0 for p > p_c. The map d ↦ p_c(d) is non-increasing with respect to the dimension: p_c(d + 1) ≤ p_c(d).
λ(d) = lim_{n→∞} a(n)^{1/n} .
Remark. The exact value of λ(d) is not known. But one has the elementary estimate d ≤ λ(d) ≤ 2d − 1, because a self-avoiding walk can not reverse direction, so that a(n) ≤ 2d(2d − 1)^{n−1}, and a walk going only forward in each direction is self-avoiding. For example, it is known that λ(2) ∈ [2.62002, 2.69576] and numerical estimates make one believe that the real value is 2.6381585. The number c_n of self-avoiding walks of length n in L² is for small values c_1 = 4, c_2 = 12, c_3 = 36, c_4 = 100, c_5 = 284, c_6 = 780, c_7 = 2172, …. Consult [62] for more information on the self-avoiding random walk.
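The small counts quoted above can be reproduced by exhaustive enumeration (an added sketch; a plain depth-first search, feasible only for short walks since c_n grows like λ(2)^n):

```python
# Sketch: count self-avoiding walks on Z^2 up to a given length by DFS.
def count_saw(max_len):
    counts = [0] * (max_len + 1)
    visited = {(0, 0)}

    def dfs(x, y, length):
        if length > 0:
            counts[length] += 1
        if length == max_len:
            return
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if nxt not in visited:       # self-avoidance constraint
                visited.add(nxt)
                dfs(nxt[0], nxt[1], length + 1)
                visited.remove(nxt)

    dfs(0, 0, 0)
    return counts[1:]

print(count_saw(6))   # [4, 12, 36, 100, 284, 780]
```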
Proof. (i) p_c(d) ≥ λ(d)^{−1}.
Let N(n) ≤ a(n) be the number of open self-avoiding paths of length n in L^d starting at the origin. Since any such path is open with probability p^n, we have
E_p[N(n)] = p^n a(n) .
If the origin is in an infinite open cluster, there must exist open paths of all lengths beginning at the origin, so that
The fact that the origin is in the interior of a closed circuit of the dual
lattice if and only if the open cluster at the origin is finite follows from the
Jordan curve theorem which assures that a closed path in the plane divides
the plane into two disjoint subsets.
Let ρ(n) denote the number of closed circuits in the dual lattice which have length n and which contain the origin of L² in their interior. Each such circuit contains a self-avoiding walk of length n − 1 starting from a vertex of the form (k + 1/2, 1/2), where 0 ≤ k < n. Since the number of such paths is at most n · a(n − 1), we have
Remark. We will see below that even p_c(2) ≤ 1 − λ(2)^{−1}. It is however known that p_c(2) = 1/2.
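The phase transition at p_c(2) = 1/2 is easy to see numerically (an added sketch; box size, sample counts and seed are arbitrary): estimate the probability of an open left-right crossing of an n × n box in bond percolation, well below and well above 1/2.

```python
# Sketch: crossing probability of an n x n box in bond percolation on L^2.
import numpy as np
from collections import deque

def crossing_probability(p, n, trials, rng):
    hits = 0
    for _ in range(trials):
        horiz = rng.random((n, n - 1)) < p   # open bonds to the right
        vert = rng.random((n - 1, n)) < p    # open bonds downwards
        seen = np.zeros((n, n), dtype=bool)
        queue = deque((i, 0) for i in range(n))
        seen[:, 0] = True
        while queue:                          # BFS from the left column
            i, j = queue.popleft()
            if j == n - 1:                    # reached the right column
                hits += 1
                break
            for ni, nj, is_open in (
                (i, j + 1, j < n - 1 and horiz[i, j]),
                (i, j - 1, j > 0 and horiz[i, j - 1]),
                (i + 1, j, i < n - 1 and vert[i, j]),
                (i - 1, j, i > 0 and vert[i - 1, j]),
            ):
                if is_open and not seen[ni, nj]:
                    seen[ni, nj] = True
                    queue.append((ni, nj))
    return hits / trials

rng = np.random.default_rng(5)
low = crossing_probability(0.3, 16, 60, rng)    # subcritical: rare
high = crossing_probability(0.7, 16, 60, rng)   # supercritical: likely
print(low, high)
```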
Definition. The parameter set p < pc is called the sub-critical phase, the
set p > pc is the supercritical phase.
Definition. For p < p_c, one is also interested in the mean size of the open cluster
χ(p) = E_p[|C|] .
For p > p_c, one would like to know the mean size of the finite clusters
χ_f(p) = E_p[ |C| ; |C| < ∞ ] .
It is known that χ(p) < ∞ for p < p_c, but it is only conjectured that χ_f(p) < ∞ for p > p_c.
An interesting question is whether there exists an infinite open cluster at the critical point p = p_c. The answer is known to be no in the case d = 2 and generally believed to be no for d ≥ 3. For p near p_c it is believed that the percolation probability θ(p) and the mean size χ(p) behave as powers of |p − p_c|. It is conjectured that the critical exponents
γ = −lim_{p↑p_c} log χ(p) / log |p − p_c| ,
β = lim_{p↓p_c} log θ(p) / log |p − p_c|
exist.
(d/dp) E_p[X] = Σ_i (X_i(1) − X_i(0)) ≥ 0 .
Proof. As in the proof of the above lemma, we prove the claim first for random variables X which depend only on n edges e_1, e_2, …, e_n and proceed by induction.
(ii) Assume the claim is known for all functions which depend on k edges with k < n. We claim that it holds also for X, Y depending on n edges e_1, e_2, …, e_n.
Let A_k = A(e_1, …, e_k) be the σ-algebra generated by functions depending only on the edges e_1, …, e_k. The random variables
X_k = E_p[X | A_k] , Y_k = E_p[Y | A_k]
P_p[ ⋂_{i=1}^n A_i ] ≥ ∏_{i=1}^n P_p[A_i] .
We show now how this inequality can be used to give an explicit bound for the critical percolation probability p_c in L². The following corollary still belongs to the theorem of Broadbent-Hammersley.
Corollary 5.1.5.
p_c(2) ≤ 1 − λ(2)^{−1} .
We know that F_N ∩ G_N ⊂ {|C| = ∞}. Since F_N and G_N are both increasing, the correlation inequality says P_p[F_N ∩ G_N] ≥ P_p[F_N] · P_p[G_N]. We deduce
P_p[G_N^c] ≤ Σ_{n=N}^∞ (1 − p)^n n a(n − 1) ,
which goes to zero for N → ∞. For N large enough, we have therefore P_p[G_N] ≥ 1/2. Since also P_p[F_N] > 0, it follows that θ_p > 0 if (1 − p) λ(2) < 1, that is if p > 1 − λ(2)^{−1}, which proves the claim. □
(d/dp) P_p[A] = E_p[N(A)] ,
P_p[A] = P[η_p ∈ A] .
P_{p′}[A] − P_p[A] = P[η_{p′} ∈ A] − P[η_p ∈ A]
= P[η_p ∉ A; η_{p′} ∈ A]
= (p′(f) − p(f)) P_p[f pivotal for A] .
Divide both sides by (p′(f) − p(f)) and let p′(f) → p(f). This gives
(∂/∂p(f)) P_p[A] = P_p[f pivotal for A] .
(iii) The claim, if A depends on finitely many edges. If A depends on finitely many edges, then P_p[A] is a function of a finite set {p(f_i)}_{i=1}^m of edge probabilities. The chain rule then gives
(d/dp) P_p[A] = Σ_{i=1}^m (∂/∂p(f_i)) P_p[A] |_{p=(p,p,…,p)}
= Σ_{i=1}^m P_p[f_i pivotal for A]
= E_p[N(A)] .
p_F(e) = p + δ · 1_{e∈F} ,
where 0 ≤ p ≤ p + δ ≤ 1. Since A is increasing, we have
P_{p+δ}[A] ≥ P_{p_F}[A]
and therefore
as δ → 0. The claim is obtained by making F larger and larger, filling out E. □
P_p[e is pivotal] = (n−1 choose k−1) p^{k−1} (1 − p)^{n−k}
Theorem 5.1.7 (Uniqueness). For p < p_c, the mean size of the open cluster is finite: χ(p) < ∞.
The proof of this theorem is quite involved and we will not give the full argument. Let S(n, x) = { y ∈ Z^d : |x − y| = Σ_{i=1}^d |x_i − y_i| ≤ n } be the ball of radius n around x in Z^d and let A_n be the event that there exists an open path joining the origin with some vertex in ∂S(n, 0).
Lemma 5.1.8. (Exponential decay of the radius of the open cluster) If p < p_c, there exists ψ_p > 0 such that P_p[A_n] ≤ e^{−nψ_p}.
Proof. Clearly, |S(n, 0)| ≤ C_d · (n + 1)^d with some constant C_d. Let M = max{ n : A_n occurs }. By definition of p_c, if p < p_c, then P_p[M < ∞] = 1. We get
χ(p) ≤ Σ_n |S(n, 0)| P_p[A_n] ≤ Σ_n C_d (n + 1)^d e^{−nψ_p} < ∞
□
g_p′(n) = E_p[N(A_n)] ,
where N(A_n) is the number of pivotal edges for A_n. We have
g_p′(n) ≥ E_p[N(A_n) | A_n] · g_p(n) ,
so that
g_p′(n) / g_p(n) ≥ E_p[N(A_n) | A_n] .
By integrating up from α to β, we get
g_α(n) = g_β(n) exp( − ∫_α^β g_p′(n)/g_p(n) dp )
≤ g_β(n) exp( − ∫_α^β E_p[N(A_n) | A_n] dp ) .
One needs to show then that E_p[N(A_n) | A_n] grows roughly linearly in n when p < p_c. This is quite technical and we skip it. □
Let B_n be the box with side length 2n and center at the origin and let K_n be the number of open clusters in B_n. The following proposition explains the name of κ:
K_n / |B_n| → κ(p) .
(i) Σ_{x∈B_n} r_n(x) = K_n.
Proof. If Σ is an open cluster of B_n, then each vertex x ∈ Σ contributes |Σ|^{−1} to the left hand side. Thus, each open cluster contributes 1 to the left hand side.
(v) Σ_{x∈B(n)} r_n(x) ≤ Σ_{x∈B(n)} r(x) + Σ_{x∼∂B_n} r_n(x), where x ∼ Y means that x is in the same cluster as one of the elements y ∈ Y ⊂ Z^d.
ℓ²(Z) = { (…, x_{−1}, x_0, x_1, x_2, …) : Σ_{k=−∞}^∞ x_k² < ∞ }
of the form
L_ω = Δ + V_ω ,
where V_ω(n) are IID random variables in ℓ^∞. These operators are called discrete random Schrödinger operators. We are interested in properties of L which hold for almost all ω ∈ Ω. In this section, we mostly write the elements ω of the probability space (Ω, A, P) as a lower index.
Theorem 5.2.1 (Fröhlich-Spencer). Let V(n) be IID random variables with uniform distribution on [0, 1]. There exists λ_0 such that for λ > λ_0, the operator L_ω = Δ + λ · V_ω has pure point spectrum for almost all ω.
G_ω(m, n, E) = [ (L_ω − E)^{−1} ]_{mn} .
Let μ = μ_ω be the spectral measure of the vector e_0. This measure is defined as the functional C(R) → R, f ↦ f(L_ω)_{00}. Define the function
F(z) = ∫_R dμ(v) / (v − z) .
It is a function on the complex plane called the Borel transform of the measure μ. An important role will be played by its derivative
F′(z) = ∫_R dμ(v) / (v − z)² .
5.2. Random Jacobi matrices 287
Definition. Given any Jacobi matrix L, let L_α be the operator L + αP_0, where P_0 is the projection onto the one-dimensional space spanned by δ_0. One calls L_α a rank-one perturbation of L.
∫_R dμ_α dα = dE ,
F_α(z) = F(z) / (1 + α F(z)) ,
∫_R ∫_R f(x) dμ_α(x) dα = ∫_R f(x) dE(x) .
Now, if ±Im(z) > 0, then ±Im F(z) > 0 so that ±Im F(z)^{−1} < 0. This means that h_z(α) has either two poles in the lower half plane if Im(z) < 0, or one in each half plane if Im(z) > 0. Contour integration in the upper half plane (now with α) implies that ∫_R h_z(α) dα = 0 for Im(z) < 0 and 2πi for Im(z) > 0. □
In theorem (2.12.2), we have seen that any Borel measure μ on the real line has a unique Lebesgue decomposition dμ = dμ_ac + dμ_sing = dμ_ac + dμ_sc + dμ_pp. The function F is related to this decomposition in the following way: the theorem of de la Vallée Poussin (see [88]) states that the set where
F′(E) < ∞ .
Σ_{n∈Z} | [ (L − E − iε)^{−1} ]_{n0} |² = || (L − E − iε)^{−1} δ_0 ||²
= [ (L − E − iε)^{−1} (L − E + iε)^{−1} ]_{00}
= ∫_R dμ(x) / ( (x − E)² + ε² ) ,
from which the monotonicity and the limit follow. □
∫_0^1 |x − α|^{−1/2} |x − β|^{−1/2} dx ≥ C ∫_0^1 |x − β|^{−1/2} dx .
The function
h(β) = ∫_{3/4}^1 |x − β|^{−1/2} dx / ∫_0^1 |x − α|^{−1/2} |x − β|^{−1/2} dx
is non-zero, continuous and satisfies h(∞) = 1/4. Therefore
Lemma 5.2.8. Let f, g ∈ ℓ^∞(Z) be nonnegative and let 0 < a < (2d)^{−1}.
Proof. Since ||Δ|| ≤ 2d, we can write (1 − aΔ)^{−1} = Σ_{m=0}^∞ (aΔ)^m, which preserves positivity. Since [(aΔ)^m]_{ij} = 0 for m < |i − j|, we have
which holds uniformly for Im(z) ≠ 0.
(i)
E[ |λV(n) − z|^{1/2} |g_z(n)|^{1/2} ] ≤ δ_{n,0} + Σ_j h(n + j) .
(iii)
(iv)
(1 − C λ^{−1/2} Δ) h ≤ δ_{n0} .
Proof. Rewriting (iii).
Example. We can also take the estimator T(ω) which is the median of X_1(ω), …, X_n(ω). The median is a natural quantity because the function f(x) = Σ_{i=1}^n |x_i − x| is minimized by the median: since |a − x| + |b − x| ≥ |b − a| with equality exactly when x lies between a and b, pairing the smallest data point with the largest, the second smallest with the second largest, and so on, shows that any median minimizes the sum.
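A direct numerical check of this minimizing property (an added sketch with an arbitrary small data set):

```python
# Sketch: f(x) = sum_i |x_i - x| is minimized at the median.
import numpy as np

data = np.array([1.0, 2.0, 3.0, 10.0])
f = lambda x: np.sum(np.abs(data - x))
median = np.median(data)              # 2.5
grid = np.linspace(-5.0, 15.0, 2001)
values = [f(x) for x in grid]
print(median, f(median), min(values))
```

Any point of the interval [2, 3] minimizes f for this data set, in agreement with the pairing argument above.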
5.3. Estimation theory 293
Proposition 5.3.2. For g(θ) = Var_θ[X_1] and fixed mean m, the estimator T = (1/n) Σ_{i=1}^n (X_i − m)² is unbiased. If the mean is unknown, the estimator T = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)², with X̄ = (1/n) Σ_{i=1}^n X_i, is unbiased.
E_θ[T] = E_θ[X_1²] − E_θ[X̄²] = ((n−1)/n) Var_θ[X_1] .
Therefore n/(n − 1) T is the correct unbiased estimate. □
Remark. Part b) is the reason why statisticians often take the average 1/(n-1) Σ_{i=1}^n (x_i - x̄)² as an estimate for the variance of n data points x_i if the actual mean value m is not known.
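A small simulation illustrates the remark: dividing the sum of squares by n systematically underestimates the variance, while dividing by n − 1 removes the bias. The distribution and sample size below are arbitrary choices made for this sketch.

```python
import random

random.seed(0)
true_var = 4.0                      # variance of N(0, 2^2)
n, trials = 10, 20000

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.gauss(0.0, 2.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n            # divides by n: biased downward
    unbiased_sum += ss / (n - 1)    # divides by n - 1: unbiased

biased = biased_sum / trials
unbiased = unbiased_sum / trials
```

Averaged over many trials, the n−1 estimator is close to the true variance 4, while the n estimator is close to (n−1)/n of it.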
Definition. The expectation of the quadratic estimation error
$$\mathrm{Err}_\theta[T] = E_\theta[(T - g(\theta))^2]$$
is called the risk function or the mean square error of the estimator T. It measures the estimator performance. We have
$$\mathrm{Err}_\theta[T] = \mathrm{Var}_\theta[T] + B_\theta[T]^2 ,$$
$$\min_T \max_{\theta} R(\theta, T) .$$
$$\min_T \int_\Theta R(\theta, T) \, d\pi(\theta) .$$
$$L_\theta(x_1, \dots, x_n) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-(x_i - \theta)^2/2}$$
$$I(\theta) = \int \frac{(f_\theta'(x))^2}{f_\theta(x)} \, dx .$$
Lemma 5.3.3. I(θ) = Var_θ[f_θ'/f_θ].
Proof. E_θ[f_θ'/f_θ] = ∫ f_θ' dx = 0, so that
$$\mathrm{Var}_\theta\Big[\frac{f_\theta'}{f_\theta}\Big] = E_\theta\Big[\Big(\frac{f_\theta'}{f_\theta}\Big)^2\Big] = \int \frac{(f_\theta')^2}{f_\theta} \, dx = I(\theta) . \qquad \Box$$
In the unbiased case, one has
$$\mathrm{Err}_\theta[T] \geq \frac{1}{n I(\theta)} .$$
$$1 + B'(\theta) = \int T(x_1,\dots,x_n) \, L_\theta'(x_1,\dots,x_n) \, dx_1 \cdots dx_n
= \int T(x_1,\dots,x_n) \, \frac{L_\theta'(x_1,\dots,x_n)}{L_\theta(x_1,\dots,x_n)} \, L_\theta(x_1,\dots,x_n) \, dx_1 \cdots dx_n .$$
4) Using 3) and 2),
$$\mathrm{Cov}[T, L_\theta'/L_\theta] = E_\theta[T L_\theta'/L_\theta] - 0 = 1 + B'(\theta) .$$
5)
$$(1 + B'(\theta))^2 = \mathrm{Cov}^2[T, L_\theta'/L_\theta] \leq \mathrm{Var}_\theta[T] \, \mathrm{Var}_\theta[L_\theta'/L_\theta]
= \mathrm{Var}_\theta[T] \sum_{i=1}^n E_\theta\Big[\Big(\frac{f_\theta'(x_i)}{f_\theta(x_i)}\Big)^2\Big]
= \mathrm{Var}_\theta[T] \, n I(\theta) ,$$
where we used 4), the lemma and
$$L_\theta'/L_\theta = \sum_{i=1}^n f_\theta'(x_i)/f_\theta(x_i) . \qquad \Box$$
$$S(\theta) = -\int f_\theta \log(f_\theta) \, dx , \qquad N(\theta) = \frac{1}{2\pi e} e^{2 S(\theta)} .$$
Proof. a) I_{X+Y} ≤ c² I_X + (1-c)² I_Y is proven using the Jensen inequality (2.5.1). Take then c = I_Y/(I_X + I_Y).
b) and c) are exercises. □
$$\dot f = g, \qquad \dot g = -\int_M \nabla V(f(\omega) - f(\eta)) \, dm(\eta) .$$
These equations are called the Hamiltonian equations of the Vlasov flow. We can interpret X^t as a vector-valued stochastic process on the probability space (M, A, m). The probability space (M, A, m) labels the particles which move on the target space N.
y(xi,....xn) = X^ >
i
/t = 9%
3=1
Example. Let M = N = R² and assume that the measure m has its support on a smooth closed curve C. The process X^t is again a volume-preserving deformation of the plane. It describes the evolution of a continuum of particles on the curve. Dynamically, it can for example describe the evolution of a curve where each part of the curve interacts with each other part. The picture sequence below shows the evolution of a particle gas with support on a closed curve in phase space. The interaction potential is V(x) = e^{-x}. Because the curve at time t is the image of the diffeomorphism X^t, it will never have self intersections. The curvature of the curve is expected to grow exponentially at many points. The deformation transformation
X^t = (f^t, g^t) satisfies the differential equation
$$\frac{d}{dt} f^t(x) = g^t(x), \qquad \frac{d}{dt} g^t(x) = \int_M e^{-(f^t(x) - f^t(\eta))} \, dm(\eta) .$$
Figure. The supports of the measures P^0, P^{0.4} and P^{1.2} on N = R².
Example. In the case of the quadratic potential V(x) = x²/2, assume m has a density ρ(x, y) = e^{-x²-2y²}; then P^t has the density ρ^t(x, y) = ρ(x cos(t) + y sin(t), -x sin(t) + y cos(t)). To get from this density in phase space the spatial density of particles, we have to integrate out y and do a conditional expectation.
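For the quadratic potential, a finite particle approximation of the flow is the harmonic system ḟ_k = g_k, ġ_k = −(f_k − f̄), which rotates phase space with period 2π. The sketch below checks this numerically; the integration scheme (semi-implicit Euler), particle number and step size are arbitrary choices for the illustration.

```python
import math

# N particles placed on a circle in the (f, g) phase plane
N = 8
f = [math.cos(2 * math.pi * k / N) for k in range(N)]  # positions
g = [math.sin(2 * math.pi * k / N) for k in range(N)]  # momenta
f0 = f[:]

dt = 1.0e-3
steps = int(round(2 * math.pi / dt))   # integrate over one period
for _ in range(steps):
    mean_f = sum(f) / N
    # force on particle k for V(x) = x^2/2: -(1/N) sum_j (f_k - f_j) = -(f_k - mean)
    g = [gk - dt * (fk - mean_f) for fk, gk in zip(f, g)]
    f = [fk + dt * gk for fk, gk in zip(f, g)]

err = max(abs(a - b) for a, b in zip(f, f0))  # deviation after one period
```

After integrating over one full period 2π, each particle returns close to its starting position.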
$$L = \int h(x,y) \, \frac{d}{dt} P^t(x,y) \, dx \, dy$$
as follows:
$$L = \frac{d}{dt} \int_M h(f(\omega,t), g(\omega,t)) \, dm(\omega)
= \int_M \nabla_x h(f(\omega,t), g(\omega,t)) \, g(\omega,t) \, dm(\omega) + \int_M \nabla_y h(f(\omega,t), g(\omega,t)) \, \dot g(\omega,t) \, dm(\omega)$$
$$= \int \nabla_x h(x,y) \, y \, P^t(x,y) \, dx \, dy - \int P^t(x,y) \, \nabla_y h(x,y) \, W(x) \, dx \, dy .$$
Remark. The Vlasov equation is an example of an integro-differential equation: the right hand side is an integral. In short hand notation, the Vlasov equation is
$$\dot P + y \cdot \nabla_x P - W(x) \cdot \nabla_y P = 0 ,$$
where W = ∇_x V * P is the convolution of the force ∇_x V with P:
• N = R: V(x) = |x|/2
• N = T: V(x) = |x(2π - x)|/(4π)
• N = S²: V(x) = log(1 - x · x')
• N = R²: V(x) = log|x|/(2π)
• N = R³: V(x) = -1/(4π ‖x‖)
• N = R⁴: V(x) = -1/(4π² ‖x‖²)
For example, for N = R, the Laplacian Δf = f'' is the second derivative. It is diagonal in Fourier space: (Δf)^(k) = -k² f̂(k), where k ∈ R. From Δf = ρ we get f̂(k) = -(1/k²) ρ̂(k), so that f = V * ρ, where V is the function which has the Fourier transform V̂(k) = -1/k². But V(x) = |x|/2 has this Fourier transform.
Lemma 5.4.2 (Gronwall). If a function u satisfies u'(t) ≤ |g(t)| u(t) for all 0 ≤ t ≤ T, then u(t) ≤ u(0) exp(∫_0^t |g(s)| ds) for 0 ≤ t ≤ T.
Proof. Integrating the assumption gives u(t) ≤ u(0) + ∫_0^t g(s) u(s) ds. The function h(t) satisfying the differential equation h'(t) = |g(t)| u(t) satisfies h'(t) ≤ |g(t)| h(t). This leads to h(t) ≤ u(0) exp(∫_0^t |g(s)| ds) and so u(t) ≤ u(0) exp(∫_0^t |g(s)| ds). This proof for real valued functions [20] generalizes to the case where u^t(x) evolves in a function space: one can just apply the same proof for any fixed x. □
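The inequality can be watched along an Euler discretization. In the sketch below (test function g(t) = cos(t) and step size are arbitrary choices), u solves u' = g(t)u, and the discrete solution stays below the Gronwall bound u(0) exp(∫_0^t |g|) at every step.

```python
import math

u0, T, dt = 1.0, 2.0, 1e-4
u, t, int_abs_g = u0, 0.0, 0.0
within_bound = True
while t < T:
    gt = math.cos(t)               # g(t) for this test
    u += dt * gt * u               # Euler step for u' = g(t) u
    int_abs_g += dt * abs(gt)
    t += dt
    if u > u0 * math.exp(int_abs_g) * (1 + 1e-9):
        within_bound = False       # Gronwall bound violated
```

The exact solution here is u(t) = exp(sin t), which the Euler iteration approximates closely while respecting the bound.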
Gronwall's lemma assures that ‖X^t‖ can not grow faster than exponentially. This gives the global existence. □
Remark. If m is a point measure supported on finitely many points, then one could also invoke the global existence theorem for differential equations. For smooth potentials, the dynamics depends continuously on the measure m. One could approximate a smooth measure m by point measures.
Definition. The evolution of DX^t at a point ω ∈ M is called the linearized Vlasov flow. It is the differential equation
$$\frac{d}{dt} DX^t = \begin{pmatrix} 0 & 1 \\ -\int_M \nabla^2 V(f(\omega) - f(\eta)) \, dm(\eta) & 0 \end{pmatrix} DX^t .$$
Remark. The rank of the matrix DX^t(ω) stays constant. Df^t(ω) is a linear combination of Df^0(ω) and Dg^0(ω). Critical points of f^t can only appear for ω where Df^0(ω), Dg^0(ω) are linearly dependent. More generally, Y_k(t) = {ω ∈ M | DX^t(ω) has rank 2q - k = dim(N) - k} is time independent. The set Y_q contains {ω | Df^0(ω) = λ Dg^0(ω), λ ∈ R ∪ {∞}}.
5.4. Vlasov dynamics
Definition. The random variable
The constant C is chosen such that ∫_{R^d} S(y) dy = 1. These measures are called Bernstein-Green-Kruskal (BGK) modes.
Proof.
$$y \nabla_x P = y S(y) Q'(x) = y S(y) \Big( -\beta Q(x) \int_{\mathbb{R}^d} \nabla_x V(x - x') Q(x') \, dx' \Big)$$
and
and
-OO J— OO «;;
Mn(*)= / xUf(x)dx
Jud
which is a short hand notation for \
• • f ^ ■■■ ^ f \ x i , ^ x d ) d x 1 ^ - d x d .
-oo « / — ooo
o
5.5. Multidimensional distributions 307
Example. The n = (7,3,4)-th moment of the random vector X = (x³, y⁴, z⁵), where x, y, z are independent and uniform on [0,1], is
$$E[X_1^7 X_2^3 X_3^4] = E[x^{21} y^{12} z^{20}] = \frac{1}{22 \cdot 13 \cdot 21} .$$
The random vector X is continuous and has the probability density
$$f(x,y,z) = \Big( \frac{x^{-2/3}}{3} \Big) \Big( \frac{y^{-3/4}}{4} \Big) \Big( \frac{z^{-4/5}}{5} \Big) .$$
Remark. As in one dimension, one can define a multidimensional moment generating function
$$M_X(t) = E[e^{t \cdot X}] = E[e^{t_1 X_1} \cdots e^{t_d X_d}] ,$$
which contains all the information about the moments because of the multidimensional moment formula.
Example. The random variable X = (x, √y, z^{1/3}) has a moment generating function of the form
$$M(s) M(t) M(u) ,$$
because the components X_1, X_2, X_3 in this example are independent random variables; the factors are the one-dimensional moment generating functions of the one-dimensional random variables X_1, X_2 and X_3.
Definition. Let e_i be the standard basis in Z^d. Define the partial difference (Δ_i a)_n = a_{n-e_i} - a_n on configurations and write Δ^k = Π_i Δ_i^{k_i}. Unlike the usual convention, we take a particular sign convention for Δ. This allows us to avoid many negative signs in this section. By induction in Σ_{i=1}^d n_i, one proves the relation
$$(\Delta^k \mu)_n = \int_{I^d} x^{n-k} (1-x)^k \, d\mu . \qquad (5.1)$$
$$B_n\Big(\prod_i f_i\Big)(x) = \prod_{i=1}^d B_{n_i}(f_i)(x_i) ,$$
the claim follows from the result corollary (2.6.2) in one dimension. □
Remark. Hildebrandt and Schoenberg refer for the proof of lemma (5.5.1) to Bernstein's proof in one dimension. While a higher dimensional adaptation of the probabilistic proof could be done involving a stochastic process in Z^d with drift x_i in the i'th direction, the factorization argument is more elegant.
Proof. (i) Because by lemma (5.5.1), polynomials are dense in C(I^d), there exists a unique solution to the moment problem. We show now existence of a measure μ under condition (5.2). For a measure μ, define for n ∈ N^d the atomic measures μ^(n) on I^d which have weights \binom{n}{k} (Δ^k μ)_n on the Π_{i=1}^d (n_i + 1) points (k_1/n_1, ..., k_d/n_d) ∈ I^d with 0 ≤ k_i ≤ n_i.
(ii) The left hand side of (5.2) is the variation ‖μ^(n)‖ of the measure μ^(n). Because by (i) μ^(n) → μ, and μ has finite variation, there exists a constant C such that ‖μ^(n)‖ ≤ C for all n. This establishes (5.2).
(iii) We see that if (Δ^k μ)_n ≥ 0 for all k, then the measures μ^(n) are all positive and therefore also the measure μ.
Remark. Hildebrandt and Schoenberg noted in 1933 that this result gives a constructive proof of the Riesz representation theorem stating that the dual of C(I^d) is the space of Borel measures M(I^d).
Definition. Let δ_x denote the Dirac point measure located at x ∈ I^d. It satisfies ∫_{I^d} y dδ_x(y) = x.
We extract from the proof of theorem (5.5.2) the construction:
Proof. (i) Let μ^(n) be the measures of corollary (5.5.3). We construct first from the atomic measures μ^(n) absolutely continuous measures ρ^(n) = g^(n) dx on I^d, given by a function g^(n) which takes the constant value
$$\binom{n}{k} |(\Delta^k \mu)_n| \prod_{i=1}^d (n_i + 1)$$
on a cube of side lengths 1/(n_i + 1) centered at the point (n-k)/n ∈ I^d. Because the cube has Lebesgue volume (n+1)^{-1} = Π_{i=1}^d (n_i + 1)^{-1}, it has the same measure with respect to both μ^(n) and g^(n) dx. We have therefore also g^(n) dx → μ weakly.
(ii) Assume μ = f dx with f ∈ L^p. Because g^(n) dx → f dx in the weak topology for measures, we have g^(n) → f weakly in L^p. But then there exists a constant C such that ‖g^(n)‖_p ≤ C, and this is equivalent to (5.3).
(iii) On the other hand, assumption (5.3) means that ‖g^(n)‖_p ≤ C, where g^(n) was constructed in (i). Since the unit ball in the reflexive Banach space L^p(I^d) is weakly compact for p ∈ (1, ∞), a subsequence of g^(n) converges to a function g ∈ L^p. This implies that a subsequence of g^(n) dx converges as a measure to g dx, which is in L^p and which is equal to μ by the uniqueness of the moment problem (Weierstrass). □
J=l 3
m
= ^Q^B^dj).
5=1
This calculation in the case m = 1, leaving away the last step, shows that N_B is Poisson distributed with parameter P[B]. The last step in the calculation is then justified. □
$$\Sigma f(\omega) = \sum_{z \in U(\omega)} f(z) .$$
Example. For a Poisson process and f = 1_B, one gets Σf(ω) = N_B(ω).
Definition. The moment generating function of Σf is defined as for any random variable as
$$M_{\Sigma f}(t) = E[e^{t \Sigma f}] .$$
It is called the characteristic functional of the point process.
Example. For a Poisson process and f = Σ_{k=1}^n a_k 1_{B_k}, where the B_k are disjoint sets, we have the characteristic functional
Example. For a Poisson process and f ∈ L¹(S, P), the moment generating function of Σf is
$$M_{\Sigma f}(t) = \exp\Big( \int_S (e^{t f(x)} - 1) \, dP(x) \Big) .$$
5.6. Poisson processes 315
(ii) For m disjoint k-cubes {B_j}_{j=1}^m, the sets O(B_j) ⊂ Ω are independent.
Proof:
$$Q\Big[ \bigcap_{j=1}^m O(B_j) \Big] = Q[\{ N_{\cup_{j=1}^m B_j} = 0 \}]
= \exp\Big( -\sum_{j=1}^m P[B_j] \Big)
= \prod_{j=1}^m \exp(-P[B_j]) .$$
(iii) We count the number of points in an open subset U of S using k-cubes: define for k > 0 the random variable N_U^k(ω) as the number of k-cubes B for which ω ∈ O(B ∩ U). These random variables N_U^k(ω) converge to N_U(ω) for k → ∞, for almost all ω.
(iv) For an open set U, the random variable N_U is Poisson distributed with parameter P[U]. Proof: we compute its moment generating function. Because for different k-cubes B_j, the sets O(B_j) ⊂ O(U) are independent, the moment generating function of N_U^k = Σ_j 1_{O(B_j)} is the product of the moment generating functions of 1_{O(B_j)}:
$$E[e^{t N_U^k}] = \prod_{k\text{-cube } B} \big( \exp(-P[B]) + e^t (1 - \exp(-P[B])) \big) .$$
Each factor of this product is positive and the monotone convergence theorem shows that the moment generating function of N_U is
$$E[e^{t N_U}] = \lim_{k \to \infty} \prod_{k\text{-cube } B} \big( \exp(-P[B]) + e^t (1 - \exp(-P[B])) \big) ,$$
which converges to exp(P[U](e^t - 1)) for k → ∞ if the measure P is non-atomic.
(v) For any disjoint open sets U_1, ..., U_m, the random variables {N_{U_j}}_{j=1}^m are independent. Proof: the random variables {N_{U_j}^k}_{j=1}^m are independent for large enough k, because no k-cube can be in more than one of the sets U_j. The random variables {N_{U_j}^k}_{j=1}^m are then independent for fixed k. Letting k → ∞ shows that the variables N_{U_j} are independent.
(vi) To extend (iv) and (v) from open sets to arbitrary Borel sets, one can use the characterization of a Poisson process by its moment generating function of f ∈ L¹(S, P). If f = Σ_j a_j 1_{U_j} for disjoint open sets U_j and real numbers a_j, we have seen that the characteristic functional is the characteristic functional of a Poisson process. For general f ∈ L¹(S, P) the characteristic functional is the one of a Poisson process by approximation and the Lebesgue dominated convergence theorem. Use f = 1_B to verify that N_B is Poisson distributed and f = Σ_j a_j 1_{B_j} with disjoint Borel sets B_j to see that the {N_{B_j}}_{j=1}^m are independent. □
$$S(x, \omega) = (f(x, \omega), T(\omega))$$
$$S(x, \omega) = (f(x, \omega_0), T(\omega)) .$$
Iterating this random logistic map is done by taking IID random variables c_n with law ν and then iterating x_{n+1} = c_n x_n (1 - x_n).
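As a concrete sketch (the law ν is chosen here, purely for illustration, as the uniform distribution on [3.8, 4]; any choice with c_n ≤ 4 keeps the unit interval invariant):

```python
import random

random.seed(2)

def iterate_random_logistic(x0, n):
    # x_{k+1} = c_k * x_k * (1 - x_k) with IID c_k ~ Uniform([3.8, 4])
    x = x0
    for _ in range(n):
        c = random.uniform(3.8, 4.0)
        x = c * x * (1 - x)
    return x

x_final = iterate_random_logistic(0.3, 1000)
```

Since c ≤ 4 and max_x c·x(1−x) = c/4 ≤ 1, the orbit stays in [0, 1] forever.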
$$P(x, B) = P[f(x, \omega) \in B] .$$
Proof. a) We have to check that for all x, the measure P(x, ·) is a probability measure on M. This is easily done by checking all the axioms. We further have to verify that for all B ∈ S, the map x → P(x, B) is B-measurable. This is the case because f is a diffeomorphism and so continuous and especially measurable.
b) is the definition of a discrete Markov process. □
Example. If Ω = (A^N, A^N, ν^N) and T is the shift, then the random map defines a discrete Markov process.
only with respect to the transition probabilities that A must have at least
dimension d > 1. With respect to M, we have already assumed smoothness
from the beginning.
Each map either rotates the point by the vector α = (α_1, α_2) or by the vector β = (β_1, β_2). The Lebesgue measure on T² is invariant because it is invariant for each of the two transformations. If α and β are both rational vectors, then there are infinitely many ergodic invariant measures. For example, if α = (3/7, 2/7), β = (1/11, 5/11), then the 77 rectangles [i/7, (i+1)/7] × [j/11, (j+1)/11] are permuted by both transformations.
Definition. A stationary measure μ of a random diffeomorphism is called ergodic, if μ × P is an ergodic invariant measure for the map S on (M × Ω, μ × P).
Remark. If μ is a stationary invariant measure, one has
$$\mu(A) = \int_M P(x, A) \, d\mu(x)$$
for every Borel set A ∈ A. We have earlier written this as a fixed point equation for the Markov operator P acting on measures: Pμ = μ. In the context of random maps, the Markov operator is also called a transfer operator.
(ii) Assume there are infinitely many ergodic invariant measures; then there exist at least countably many. We can enumerate them as μ_1, μ_2, .... Denote by E_i their supports. Choose a point y_i in E_i. The sequence of points has an accumulation point y ∈ M by compactness of M. This implies that an arbitrary ε-neighborhood U of y intersects infinitely many E_i. Again, the smoothness assumption of the transition probabilities P(y, ·) contradicts the S-invariance of the measures μ_i having supports E_i. □
Example. For a positive integer k, the first significant digit is determined by X(k) = 2π log_10(k) mod 1. It is a circle-valued random variable on every finite probability space (Ω = {1, ..., n}, A, P[{k}] = 1/n).
Example. The law of the wrapped normal distribution in the first example is a measure on the circle with a smooth density
$$f_X(x) = \sum_{k=-\infty}^{\infty} \frac{e^{-(x + 2\pi k)^2/2}}{\sqrt{2\pi}} .$$
Example. The law of the first significant digit random variable X_n(k) = 2π log_10(k) mod 1 defined on {1, ..., n} is a discrete measure, supported on {2πk/10 | 0 ≤ k < 10}. It is an example of a lattice distribution.
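The wrapped normal density can be evaluated by truncating the sum over k; a quick numerical check (truncation level, σ and grid size are arbitrary choices here) confirms that it integrates to 1 over one period of the circle.

```python
import math

def wrapped_normal_density(x, sigma, kmax=50):
    # sum_k exp(-(x + 2 pi k)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
    s = sum(math.exp(-(x + 2 * math.pi * k) ** 2 / (2 * sigma ** 2))
            for k in range(-kmax, kmax + 1))
    return s / (sigma * math.sqrt(2 * math.pi))

# Riemann sum of the density over one period [0, 2 pi)
n = 2000
total = sum(wrapped_normal_density(2 * math.pi * i / n, 1.0) * (2 * math.pi / n)
            for i in range(n))
```

Because the summand is smooth and periodic, the Riemann sum converges very fast and returns a value extremely close to 1.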
$$H(f|g) = \int_0^{2\pi} f(x) \log(f(x)/g(x)) \, dx .$$
The Gibbs inequality lemma (2.15.1) assures that H(f|g) ≥ 0 and that H(f|g) = 0 if f = g almost everywhere.
$$\rho e^{i m} = E[e^{iX}] .$$
$$P[|\sin((X - m)/2)| \geq \epsilon] \leq \frac{1 - \rho}{2 \epsilon^2} .$$
For m = 0,
$$E[1 - \cos(X)] = 2 E[\sin^2(X/2)] \geq 2 E[1_{|\sin(X/2)| \geq \epsilon} \sin^2(X/2)] \geq 2 \epsilon^2 P[|\sin(X/2)| \geq \epsilon] .$$
Proposition 5.8.3. The Mises distribution maximizes the entropy among all
circular distributions with fixed mean a and circular variance V.
Proof. We have |φ_X(k)| < 1 for all k ≠ 0, because if |φ_X(k)| = 1 for some k ≠ 0, then X has a lattice distribution. Because φ_{S_n}(k) = Π_{i=1}^n φ_X(k), all Fourier coefficients φ_{S_n}(k) converge to 0 for n → ∞ for k ≠ 0. □
TT Bjk = e^0,k€GX3k e
(J,*)€G
The central limit theorem assures that the total magnetic field distribution
in a large region is close to a uniform distribution.
5.8. Circular random variables 325
Example. Consider standard Brownian motion B_t on the real line and its graph {(t, B_t) | t ∈ R} in the plane. The circle-valued random variable X_n = B_n mod 1 gives the distance of the graph at time t = n to the next lattice point below the graph. The distribution of X_n is the wrapped normal distribution with parameters m = 0 and σ² = n.
Figure. The graph of one-dimensional Brownian motion with a grid. The stochastic process produces a circle-valued random variable X_n = B_n mod 1.
If X, Y are real-valued IID random variables, then X + Y is not independent of X. Indeed, X + Y and Y are positively correlated, because Cov[X + Y, Y] = Var[Y] > 0. The situation changes for circle-valued random variables. The sum of two independent random variables can be independent of the first random variable. Adding a random variable with uniform distribution immediately renders the sum uniform:
$$\int_A \int_{B - x} f_X(x) f_Y(y) \, dy \, dx = \int_A \int_B f_X(x) f_Y(u - x) \, du \, dx = P[A] \, P[B] .$$
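The same computation can be carried out exactly on a finite cyclic group Z_m, which makes the phenomenon transparent: whatever the law of X, the sum S = X + U with U uniform is uniform and independent of X. The modulus and the law of X below are arbitrary choices for the sketch.

```python
m = 12                                           # discrete circle Z_m
px = [0.5, 0.25] + [0.25 / (m - 2)] * (m - 2)    # arbitrary law of X
pu = [1.0 / m] * m                               # uniform law of U

# joint law of (X, S) with S = X + U mod m
joint = [[px[x] * pu[(s - x) % m] for s in range(m)] for x in range(m)]
ps = [sum(joint[x][s] for x in range(m)) for s in range(m)]  # law of S
```

One checks that ps is exactly uniform and that the joint law factorizes, i.e. S is independent of X.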
On the d-dimensional torus T^d, the uniform distribution plays the role of the normal distribution, as the following central limit theorem shows:
Theorem 5.8.6 (Central limit theorem for circular random vectors). The sum S_n of IID circle-valued random vectors X_i converges in distribution to the uniform distribution on a closed subgroup H of G.
Proof. (i) Again |φ_X(k)| ≤ 1. Let Λ denote the set of k such that φ_X(k) = 1.
(ii) The random variable takes values in a group H which is the dual group of Z^d/Λ.
(iii) Because φ_{S_n}(k) = Π_{i=1}^n φ_X(k), all Fourier coefficients φ_{S_n}(k) which are not 1 converge to 0. □
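The Fourier-coefficient argument can be watched numerically: for a non-lattice summand, |φ_{S_n}(k)| = |φ_X(k)|^n decays geometrically, so the empirical first Fourier coefficient of S_n mod 1 is close to 0. The summand distribution, n and the sample size below are arbitrary choices for this sketch.

```python
import math, random

random.seed(3)
n, trials = 50, 20000
# samples of S_n mod 1 for X_i uniform on [0, 0.3] (a non-lattice distribution)
samples = [sum(random.uniform(0, 0.3) for _ in range(n)) % 1.0
           for _ in range(trials)]

# empirical Fourier coefficient E[e^{2 pi i S_n}]
re = sum(math.cos(2 * math.pi * x) for x in samples) / trials
im = sum(math.sin(2 * math.pi * x) for x in samples) / trials
coeff = math.hypot(re, im)   # near 0: S_n mod 1 is nearly uniform
```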
The central limit theorem applies to all compact Abelian groups. Here is
the setup:
5.9. Lattice points near Brownian paths 327
Definition. A topological group G is a group with a topology so that addition on this group is a continuous map G × G → G and the inverse x → x^{-1} from G to G is continuous. If the group acts transitively as transformations on a space H, the space H is called a homogeneous space. In this case, H can be identified with G/G_x, where G_x is the isotropy subgroup of G consisting of all elements which fix a point x.
Example. Any finite group G with the discrete topology, given by the metric d(x, y) = 1 if x ≠ y and d(x, y) = 0 if x = y, is a topological group.
Example. The real line R with addition or, more generally, the Euclidean space R^d with addition are topological groups when the usual Euclidean distance defines the topology.
Example. The circle T with addition or, more generally, the torus T^d with addition is a topological group. It is an example of a compact Abelian topological group.
Theorem 5.9.1 (Law of large numbers for random variables with shrinking support). Let X_i be IID random variables with uniform distribution on [0,1]. Then for any 0 < δ < 1 and A_n = [0, 1/n^δ], we have
$$\lim_{n \to \infty} \frac{n^\delta}{n} \sum_{k=1}^n 1_{A_n}(X_k) = 1$$
almost surely.
Proof. For fixed n, the random variables Z_k(x) = 1_{[0, 1/n^δ]}(X_k) are independent, identically distributed random variables with mean E[Z_k] = p = 1/n^δ and variance p(1-p). The sum S_n = Σ_{k=1}^n Z_k has a binomial distribution with mean np = n^{1-δ} and variance Var[S_n] = np(1-p) = n^{1-δ}(1-p). Note that if n changes, then the random variables in the sum S_n change too, so that we can not invoke the law of large numbers directly. But the tools for the proof of the law of large numbers still work.
$$B_n = \Big\{ x \in [0,1] \,\Big|\, \Big| \frac{S_n}{np} - 1 \Big| \geq \epsilon \Big\}$$
has by the Chebychev inequality (2.5.5) the measure
$$P[B_n] \leq \frac{\mathrm{Var}[S_n]}{(np\epsilon)^2} .$$
This proves convergence in probability and the weak law version for all δ < 1 follows.
Take κ = 2, so that κ(1 - δ) > 1, and define n_k = k^κ = k². The event B = limsup_k B_{n_k} has measure zero. This is the event that we are in infinitely many of the sets B_{n_k}. Consequently, for large enough k, we are in none of the sets B_{n_k}: if x ∉ B, then
$$\Big| \frac{S_{n_k}(x)}{n_k p} - 1 \Big| < \epsilon$$
for large enough k. For δ < 1/2, this goes to zero, assuring that we have not only convergence of the sum along a subsequence S_{n_k} but for S_n (compare lemma (2.11.2)). We know now |S_n/(np) - 1| → 0 almost everywhere for n → ∞. □
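A Monte Carlo sketch of the theorem (the exponent δ and sample sizes are arbitrary choices): for uniform X_k, the normalized count of hits in A_n = [0, 1/n^δ] is close to 1 for large n.

```python
import random

random.seed(1)
delta = 0.4
ratios = []
for n in [10_000, 100_000]:
    p = 1.0 / n ** delta           # P[X_k in A_n]
    hits = sum(1 for _ in range(n) if random.random() < p)
    ratios.append(hits / (n * p))  # (n^delta / n) * sum_k 1_{A_n}(X_k)
```

The relative fluctuation of the ratio is of order 1/sqrt(n^{1-δ}), so it shrinks as n grows.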
$$\lim_{n \to \infty} \frac{S_n}{n^{1 - 2\delta}} = \pi$$
almost surely.
Figure. Throwing randomly discs onto the plane and counting the number of lattice points which are hit. The size of the discs depends on the number of discs on the plane. If δ = 1/3 and if n = 1'000'000, then we have discs of radius 1/10000 and we expect S_n, the number of lattice point hits, to be 100π.
Remark. Similarly as with the Buffon needle problem mentioned in the introduction, we can get a limit. But unlike the Buffon needle problem, where we keep the setup the same independent of the number of experiments, we adapt the experiment depending on the number of tries. If we make a large number of experiments, we take a small radius of the disc. The case δ = 0 is the trivial case, where the radius of the disc stays the same.
$$\mathrm{Cov}[X, X(T^n)] \to 0$$
for n → ∞. If
$$\mathrm{Cov}[X, X(T^n)] \leq e^{-Cn}$$
for some constant C > 0, then X has exponential decay of correlations.
Proof. B_n has the normal distribution with mean 0 and variance n. The random variable X_n is a circle-valued random variable with wrapped normal distribution with parameter σ² = n. Its characteristic function is φ_{X_n}(k) = e^{-k²n/2}. We have X_{n+m} = X_n + Y_m mod 1, where X_n and Y_m are independent circle-valued random variables. Let
$$g_n(x) = \sum_{k=-\infty}^{\infty} e^{-k^2 n/2} e^{ikx} = 1 - \epsilon(x), \qquad |\epsilon(x)| \leq e^{-Cn} ,$$
be the density of X_n, which is also the density of Y_n. We want to know the correlation between X_{n+m} and X_n:
$$\int_0^1 \int_0^1 f(x) f(x+y) g(x) g(y) \, dy \, dx = \int_0^1 \int_0^1 f(x) f(u) g(x) g(u - x) \, du \, dx$$
$$= \int_0^1 \int_0^1 f(x) f(u) (1 - \epsilon(x))(1 - \epsilon(u - x)) \, du \, dx \leq \Big( \int_0^1 f \, dx \Big)^2 + C \|f\|_\infty^2 \, e^{-Cn} . \qquad \Box$$
$$\lim_{n \to \infty} \frac{n^\delta}{n} \sum_{k=1}^n 1_{A_n}(T^k(x)) = 1 .$$
Proof. The same proof works. The decorrelation assumption implies that there exists a constant C such that
$$\sum_{i \neq j} \mathrm{Cov}[X_i, X_j] \leq C .$$
Therefore, the sum converges and so Var[S_n] ≤ n Var[X_1] + C. □
Remark. The assumption that the probability space Ω is the interval [0,1] is not crucial. Any probability space (Ω, A, P), where Ω is a compact metric space with Borel σ-algebra A and P[{x}] = 0 for all x ∈ Ω, is measure theoretically isomorphic to ([0,1], B, dx), where B is the Borel σ-algebra on [0,1] (see [13] proposition (2.17)). The same remark also shows that the assumption A_n = [0, 1/n^δ] is not essential. One can take any nested sequence of sets A_n ∈ A with P[A_n] = 1/n^δ and A_{n+1} ⊂ A_n.
Figure. The graph of one-dimensional Brownian motion, where we have a probability space of paths and where we can make a statement about almost every path in that space. This is a result in the geometry of numbers for connected sets with fractal boundary.
Corollary 5.9.5. Assume B_t is standard Brownian motion. For any 0 < δ < 1/2, there exists a constant C, such that any 1/n^{1+δ} neighborhood of the graph of B over [0,1] contains at least C n^{1-δ} lattice points, if the lattice has a minimal spacing distance of 1/n.
Proof. B_{t+1/n} mod 1/n is not independent of B_t, but the Poincaré return map T from time t = k/n to time (k+1)/n is a Markov process from [0, 1/n] to [0, 1/n] with smooth transition probabilities. The random variables X_i have exponential decay of correlations, as we have seen in lemma (5.9.3). □
Remark. A similar result can be shown for other dynamical systems with strong recurrence properties. It holds for example for irrational rotations T(x) = x + α mod 1 with Diophantine α, while it does not hold for Liouville α. For any irrational α, we have f_n = (n^δ/n) Σ_{k=1}^n 1_{A_n}(T^k(x)) near 1 for arbitrarily large n = q_i, where p_i/q_i is the periodic approximation of α. However, if the q_i are sufficiently far apart, there are arbitrarily large n where f_n is bounded away from 1 and where f_n does not converge to 1.
The theorem we have proved above belongs to the research area of geometry of numbers. Mixed with probability theory, it is a result in the random geometry of numbers.
Proof. One can translate all points of the set M back to the square Ω = [-1,1] × [-1,1]. Because the area is > 4, there are two different points (x, y), (u, v) which have the same identification in the square Ω. If (x, y) = (u + 2k, v + 2l), then (x - u, y - v) = (2k, 2l). By point symmetry, also (a, b) = (-u, -v) is in the set M. By convexity, ((x + a)/2, (y + b)/2) = (k, l) is in M. This is the lattice point we were looking for. □
5.10. Arithmetic random variables
To deal with "number theoretical randomness", we use the notion of asymptotic independence. Asymptotically independent random variables approximate independent random variables in the limit n → ∞. With this notion, we can study fixed sequences or deterministic arithmetic functions on finite probability spaces with the language of probability, even though there is no fixed probability space on which the sequences form a stochastic process.
Definition. A sequence of number theoretical random variables is a collection of integer valued random variables X_n defined on finite probability spaces (Ω_n, A_n, P_n) for which Ω_n ⊂ Ω_{n+1} and A_n is the set of all subsets of Ω_n. An example is a sequence X_n of integer valued functions defined on Ω_n = {0, ..., n-1}. If there exists a constant C such that X_n on {0, ..., n} is computable with a total of less than C additions, multiplications, comparisons, greatest common divisor and modular operations, we call X a sequence of arithmetic random variables.
Example. The formula
$$X_n(x) = (((x^5 - 7) \bmod 9) \cdot 3x - x^2) \bmod n$$
defines a sequence of arithmetic random variables on Ω_n = {0, ..., n-1}.
Example. If x_n is a fixed integer sequence, then X_n(k) = x_k on Ω_n = {0, ..., n-1} is a sequence of number theoretical random variables. For example, the digits x_n of the decimal expansion of π define a sequence of number theoretical random variables X_n(k) = x_k for k < n. However, in the case of π, it is not known whether this sequence is an arithmetic sequence. It would be a surprise if one could compute x_n with a finite n-independent number of basic operations. Also other deterministic sequences like the decimal expansions of π, √2 or the Möbius function μ(n) appear "random".
Remark. Unlike for discrete time stochastic processes X_n, where all random variables X_n are defined on a fixed probability space (Ω, A, P), an arithmetic sequence of random variables X_n uses different finite probability spaces (Ω_n, A_n, P_n).
Example. On the probability space Ω_n = {1, ..., n} × {1, ..., n}, consider the arithmetic random variables X_d = 1_{S_d}, where S_d = {(n, m) | gcd(n, m) = d}.
Proposition 5.10.1. The asymptotic expectation P_n[S_1] = E_n[X_1] is 6/π². In other words, the probability that two random integers are relatively prime is 6/π².
Since P[S_d] = P[S_1]/d², summing over d gives
$$P[S_1] \Big( 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots \Big) = P[S_1] \, \zeta(2) = 1 ,$$
so that P[S_1] = 6/π². □
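A direct count over a finite box illustrates the proposition; the box size is an arbitrary choice, and the finite-box frequency approaches 6/π² with an error of order log n / n.

```python
import math

n = 500
coprime = sum(1 for a in range(1, n + 1) for b in range(1, n + 1)
              if math.gcd(a, b) == 1)
frac = coprime / n ** 2   # fraction of coprime pairs in [1, n]^2
```

Already for n = 500 the fraction is within about one percent of 6/π² ≈ 0.6079.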
1) X_n(k) = n² + c mod k
2) X_n(k) = k² + c mod n
$$\mathrm{Cov}[X_n, Y_n] \to 0$$
for n → ∞.
$$P[X_n \in I, Y_n \in J] - P[X_n \in I] \, P[Y_n \in J] \to 0$$
for n → ∞.
Remark. If there exist two uncorrelated sequences of arithmetic random variables U, V such that ‖U_n - X_n‖_{L²(Ω_n)} → 0 and ‖V_n - Y_n‖_{L²(Ω_n)} → 0, then X, Y are asymptotically uncorrelated. If the same is true for independent sequences U, V of arithmetic random variables, then X, Y are asymptotically independent.
where c is a constant and p(n) is the n'th prime number. The random variables X_n and Y_n are asymptotically independent. Proof: by a lemma of Merel [67, 23], the number of solutions (x, y) ∈ I × J of xy = c mod p is
This means that the probability that X_n/n ∈ I_n, Y_n/n ∈ J_n is |I_n| · |J_n|.
$$X_n(k) = \frac{n \bmod k}{k} = \frac{n}{k} - \Big[ \frac{n}{k} \Big] .$$
Elements in the set X_n^{-1}(0) are the integer factors of n. Because factoring is a well studied NP problem, the multi-valued function X_n^{-1} is probably hard to compute in general, because if we could compute it fast, we could factor integers fast.
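For small n the zero set of X_n is easy to exhibit by brute force (fast only for tiny n, in line with the remark that no fast inversion is known):

```python
def zero_set(n):
    # k in {1, ..., n} with X_n(k) = (n mod k)/k = 0, i.e. the divisors of n
    return [k for k in range(1, n + 1) if n % k == 0]

divs = zero_set(60)
```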
Lemma 5.10.3. Let f_n be a sequence of smooth maps from [0,1] to the circle T¹ = R/Z for which (f_n^{-1})''(x) → 0 uniformly on [0,1]. Then the law μ_n of the random variables X_n(x) = (x, f_n(x)) converges weakly to the Lebesgue measure μ = dx dy on [0,1] × T¹.
Proof. Fix an interval [a, b] in [0,1]. Because μ_n([a, b] × T¹) is the Lebesgue measure of {(x, y) | X_n(x, y) ∈ [a, b] × T¹}, which is equal to b - a, we only need to compare
μ_n([a, b] × [c, c + dy])
and
μ_n([a, b] × [d, d + dy])
in the limit n → ∞. But μ_n([a, b] × [c, c + dy]) - μ_n([a, b] × [d, d + dy]) is bounded above by
$$|(f_n^{-1})'(c) - (f_n^{-1})'(d)| \, dy \leq \sup_x |(f_n^{-1})''(x)| \, dy ,$$
which goes to zero by assumption. □
Proof. The map can be extended to a map on the interval [0, n]. The graph
(x, T(x)) in {1,..., n} x {1,..., n} has a large slope on most of the square.
Again use lemma (5.10.3) for the circle maps fn(x) = p(nx) mod n on
[0,1]. D
Remark. Also here, we deal with random variables which are difficult to invert: if one could find Y_n^{-1}(c) in O(P(log(n))) time steps, then factorization would be in the complexity class P of tasks which can be computed in polynomial time. The reason that taking square roots modulo n is at least as hard as factoring is the following: if we could find two different square roots x, y of a number modulo n, then x² = y² mod n. This would lead to a factor gcd(x - y, n) of n. This fact had already been known to Fermat. If factorization were an NP complete problem, then inverting those maps would be hard.
$$\frac{M(n)}{n} = \frac{1}{n} \sum_{k=1}^n \mu(k) \to 0 ,$$
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p \text{ prime}} (1 - p^{-s})^{-1} .$$
The function ζ(s) in the above formula is called the Riemann zeta function.
With M(n) ≤ n^{1/2 + ε}, one can conclude from the formula
$$\frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s}$$
that 1/ζ(s) could be extended analytically from Re(s) > 1 to any of the half planes Re(s) > 1/2 + ε. This would prevent roots of ζ(s) to the right of the axis Re(s) = 1/2. By a result of Riemann, the function Λ(s) = π^{-s/2} Γ(s/2) ζ(s) is a meromorphic function with simple poles at s = 0 and s = 1 and satisfies the functional equation Λ(s) = Λ(1 - s). This would imply that ζ(s) has also no nontrivial zeros to the left of the axis Re(s) = 1/2 and
5.11. Symmetric Diophantine Equations
that the Riemann hypothesis would be proven. The upshot is that the Riemann hypothesis could have aspects which are rooted in probability theory.
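The identity 1/ζ(s) = Σ μ(n)/n^s is easy to probe numerically. The sketch below sieves the Möbius function and checks the partial sum at s = 2 against 1/ζ(2) = 6/π²; the cut-off N is an arbitrary choice, and the tail of the sum is O(1/N).

```python
import math

N = 20000
mu = [1] * (N + 1)                   # mu[0] unused
is_prime = [True] * (N + 1)
for p in range(2, N + 1):
    if is_prime[p]:
        for q in range(2 * p, N + 1, p):
            is_prime[q] = False
        for q in range(p, N + 1, p):
            mu[q] *= -1              # one sign flip per prime divisor
        for q in range(p * p, N + 1, p * p):
            mu[q] = 0                # not squarefree

partial = sum(mu[n] / n ** 2 for n in range(1, N + 1))
```

The partial sum agrees with 6/π² ≈ 0.6079 to several decimal places.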
$$p(x_1, \dots, x_k) = p(y_1, \dots, y_k)$$
$$\sum_{i=1}^k x_i^m = \sum_{j=1}^k y_j^m$$
Theorem 5.11.1 (Jaroslaw Wroblewski 2002). For k > m, the Diophantine equation x_1^m + ... + x_k^m = y_1^m + ... + y_k^m has infinitely many nontrivial solutions.
Remark. Already small deviations from the symmetric case lead to local constraints: for example, 2p(x) = 2p(y) + 1 has no solution for any nonzero polynomial p in k variables, because there are no solutions modulo 2.
Remark. It has been realized by Jean-Charles Meyrignac that the proof also gives nontrivial solutions to simultaneous equations like p(x) = p(y) = p(z), again by the pigeon hole principle: there are some slots where more than 2 values hit. Hardy and Wright [28] (theorem 412) prove in the case k = 2, m = 3: for every r, there are numbers which are representable as sums of two positive cubes in at least r different ways. No solutions of x_1³ + y_1³ = x_2³ + y_2³ = x_3³ + y_3³ were known to those authors [28], nor whether there are infinitely many solutions for general (k, m) = (2, m). Mahler proved that x³ + y³ + z³ = 1 has infinitely many solutions. It is believed that x³ + y³ + z³ + w³ = n has solutions for all n. For (k, m) = (2, 3), multiple solutions lead to so called taxicab or Hardy-Ramanujan numbers.
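The smallest Hardy-Ramanujan number can be found by hashing sums of two cubes, in the spirit of the pigeon hole argument above (the search bound is chosen by hand):

```python
from collections import defaultdict

reps = defaultdict(list)
limit = 20
for a in range(1, limit + 1):
    for b in range(a, limit + 1):
        reps[a ** 3 + b ** 3].append((a, b))

# smallest value that is a sum of two positive cubes in two ways
taxicab = min(v for v, pairs in reps.items() if len(pairs) >= 2)
```

This recovers 1729 = 1³ + 12³ = 9³ + 10³, the famous Hardy-Ramanujan number.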
Remark. For general polynomials, the degree and number of variables alone
do not decide about the existence of nontrivial solutions of p(x_1, ..., x_k) =
p(y_1, ..., y_k). There are symmetric irreducible homogeneous equations with
k < m/2 for which one has a nontrivial solution. An example is p(x, y) =
x^5 − 4y^5, which has the nontrivial solution p(1, 3) = p(4, 5).
k=2, m=4: (59, 158)^4 = (133, 134)^4 (Euler, gave algebraic solutions in 1772 and 1778)
k=2, m=5: open problem ([35]; all sums < 1.02 · 10^26 have been tested)
k=3, m=5: (3, 54, 62)^5 = (24, 28, 67)^5 ([59]; two-parametric solutions by Moessner 1939, Swinnerton-Dyer)
k=3, m=6: (3, 19, 22)^6 = (10, 15, 23)^6 ([28]; Subba Rao, Bremner and Brudno parametric solutions)
k=4, m=7: (10, 14, 123, 149)^7 = (15, 90, 129, 146)^7 (Ekl)
k=5, m=7: (8, 13, 16, 19)^7 = (2, 12, 15, 17, 18)^7 ([59])
k=5, m=8: (1, 10, 11, 20, 43)^8 = (5, 28, 32, 35, 41)^8
k=5, m=9: (192, 101, 91, 30, 26)^9 = (180, 175, 116, 17, 12)^9 (Randy Ekl, 1997)
k=5, m=10: open problem
k=3, m=6: (3, 19, 22)^6 = (10, 15, 23)^6 (Subba Rao [59])
k=6, m=10: (95, 71, 32, 28, 25, 16)^10 = (92, 85, 34, 34, 23, 5)^10 (Randy Ekl, 1997)
k=7, m=10: (1, 8, 31, 32, 55, 61, 68)^10 = (17, 20, 23, 44, 49, 64, 67)^10 ([59])
k=7, m=12: (99, 77, 74, 73, 73, 54, 30)^12 = (95, 89, 88, 48, 42, 37, 3)^12 (Greg Childers, 2000)
k=8, m=11: (67, 52, 51, 51, 39, 38, 35, 27)^11 = (66, 60, 47, 36, 32, 30, 16, 7)^11 (Nuutti Kuosa, 1999)
k=20, m=21: (76, 74, 74, 64, 58, 50, 50, 48, 48, 45, 41, 32, 21, 20, 10, 9, 8, 6, 4, 4)^21
  = (77, 73, 70, 70, 67, 56, 47, 46, 38, 35, 29, 28, 25, 23, 16, 14, 11, 11, 3, 3)^21 (Greg Childers, 2000)
k=22, m=22: (85, 79, 78, 72, 68, 63, 61, 61, 60, 55, 43, 42, 41, 38, 36, 34, 30, 28, 24, 12, 11, 11)^22
  = (83, 82, 77, 77, 76, 71, 66, 65, 65, 58, 58, 54, 54, 51, 49, 48, 47, 26, 17, 14, 8, 6)^22 (Greg Childers, 2000)
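Several of the tabulated identities can be verified directly in exact integer arithmetic; the following snippet (the helper name `lander` is ours) checks three of them in the Lander notation (x_1, ..., x_k)^m = x_1^m + ... + x_k^m:

```python
# Exact-arithmetic check of a few of the tabulated identities.

def lander(xs, m):
    """Lander notation: (x_1, ..., x_k)^m = x_1^m + ... + x_k^m."""
    return sum(x ** m for x in xs)

# k=2, m=4 (Euler): 59^4 + 158^4 = 133^4 + 134^4
print(lander((59, 158), 4) == lander((133, 134), 4))        # True
# k=3, m=5: (3, 54, 62)^5 = (24, 28, 67)^5
print(lander((3, 54, 62), 5) == lander((24, 28, 67), 5))    # True
# k=3, m=6 (Subba Rao): (3, 19, 22)^6 = (10, 15, 23)^6
print(lander((3, 19, 22), 6) == lander((10, 15, 23), 6))    # True
```

Python's arbitrary-precision integers make such checks exact, so any OCR or typesetting error in an entry would show up immediately.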
What happens in the case k = m? There is no general result known. The
problem has a probabilistic flavor, because one can look at the distribution
of random variables in the limit n → ∞: the random variables

X_n(x_1, ..., x_k) = p(x_1, ..., x_k)/n^k

on the finite probability spaces Ω_n = {0, ..., n}^k converge in law to the
random variable X(x_1, ..., x_k) = p(x_1, ..., x_k) on the probability space
([0, 1]^k, B, P), where B is the Borel σ-algebra and P is the Lebesgue
measure.
Denote by S_{a,b}(n) the number of lattice points (x_1, ..., x_k) ∈ {0, ..., n}^k with

p(x_1, ..., x_k) ∈ [n^k a, n^k b] .

This means

S_{a,b}(n)/n^k = F_n(b) − F_n(a) ,

where F_n is the distribution function of X_n. The result follows from the fact
that F_n(b) − F_n(a) = S_{a,b}(n)/n^k is a Riemann sum approximation of the in-
tegral F(b) − F(a) = ∫_{A_{a,b}} 1 dx, where A_{a,b} = {x ∈ [0, 1]^k | X(x_1, ..., x_k) ∈
(a, b)}. □
Definition. Let us call the limiting distribution the distribution of the sym-
metric Diophantine equation. By the lemma, it is clearly a piecewise smooth
function.
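The convergence in law can be seen numerically. As an illustration (the polynomial p(x, y) = x^2 + y^2 with k = m = 2 is our choice, not from the text), the distribution function F_n of X_n = p(x, y)/n^2 on {0, ..., n}^2 is a Riemann sum for the limiting distribution, which at t ≤ 1 equals the area πt/4 of the quarter disc:

```python
import math

# Distribution of X_n = (x^2 + y^2)/n^2 on the grid {0,...,n}^2,
# computed by exact lattice-point counting.

def F_n(t, n):
    hits = sum(1 for x in range(n + 1) for y in range(n + 1)
               if x * x + y * y <= t * n * n)
    return hits / (n + 1) ** 2

# Limit: X(x, y) = x^2 + y^2 on [0,1]^2 has F(t) = pi*t/4 for 0 <= t <= 1.
n = 400
print(abs(F_n(1.0, n) - math.pi / 4) < 0.01)  # True: Riemann-sum convergence
```

The error behaves like the boundary term in the Gauss circle problem, of order 1/n.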
Proof. We can assume without loss of generality that the first variable
is the one with the smaller degree m. If the variable x_1 appears only in
terms of degree k − 1 or smaller, then the polynomial p maps the finite
space [0, n^{k/m}] × [0, n]^{k−1} with n^{k/m} · n^{k−1} = n^{k+ε} elements into the interval
[min(p), max(p)] ⊂ [−C n^k, C n^k]. Apply the pigeonhole principle. □
Exercise. Show that there are infinitely many integers which can be written
in nontrivially different ways as x^4 + y^4 + z^4 − w^2.
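A brute-force pigeonhole experiment in the spirit of the exercise (search bounds chosen small and arbitrarily) already finds integers with two essentially different representations:

```python
from collections import defaultdict

# Hash n = x^4 + y^4 + z^4 - w^2 over a small box and keep the values
# hit by two or more tuples with x <= y <= z.

def collisions(N, W):
    vals = defaultdict(list)
    for x in range(1, N + 1):
        for y in range(x, N + 1):
            for z in range(y, N + 1):
                s = x ** 4 + y ** 4 + z ** 4
                for w in range(W + 1):
                    vals[s - w * w].append((x, y, z, w))
    return {n: reps for n, reps in vals.items() if len(reps) >= 2}

hits = collisions(8, 40)
print(len(hits) > 0)                                            # True
print((1, 1, 1, 7) in hits[-46] and (1, 1, 2, 8) in hits[-46])  # True
```

For instance 1 + 1 + 1 − 7^2 = 1 + 1 + 16 − 8^2 = −46; the exercise asks for a proof that such coincidences occur infinitely often.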
Remark. Here is a heuristic argument for the "rule of thumb" that the Euler
Diophantine equation x_1^m + ... + x_k^m = x_{k+1}^m has infinitely many solutions for
k > m and no solutions if k < m.
For given n, the finite probability space Ω = {(x_1, ..., x_k) | 0 ≤ x_i ≤ n^{1/m}}
contains n^{k/m} different vectors x = (x_1, ..., x_k). Define the random variable
Y(x) = x_1^m + ... + x_k^m, which takes values in [0, k · n].
How close do two values Y(x), Y(y) have to be, so that Y(x) = Y(y)?
Assume Y(x) = Y(y) + ε. Counting arguments of this type suggest that equations with k = m, like

x^3 + y^3 = z^3 ,

are tagged as threshold cases by this reasoning.
This argument still has to be made rigorous by showing that the distri-
bution of the points f(x) mod 1 is uniform enough, which amounts to
understanding a dynamical system with multidimensional time. We see nev-
ertheless that probabilistic thinking can help to bring order into the zoo
of Diophantine equations. Here are some known solutions, some written in
the Lander notation

(x_1, ..., x_k)^m = x_1^m + ... + x_k^m .
m = 5, k = 3: x^5 + y^5 + z^5 = w^5 is open.
m = 6, k = 5: x^6 + y^6 + z^6 + u^6 + v^6 = w^6 is open.
m = 7, k = 7: (525, 439, 430, 413, 266, 258, 127)^7 = 568^7 (Mark Dodrill, 1999)
m = 8, k = 8: (1324, 1190, 1088, 748, 524, 478, 223, 90)^8 = 1409^8 (Scott Chase)
m = 9, k = 12: (91, 91, 89, 71, 68, 65, 43, 42, 19, 16, 13, 5)^9 = 103^9 (Jean-Charles Meyrignac, 1997)
5.12. Continuity of random variables

Proof. Given ε > 0, choose n so large that the n-th Fourier approximation
X_n(x) = Σ_{k=−n}^{n} φ_X(k) e^{ikx} satisfies ||X − X_n||_1 < ε. For m > n, we have
φ_{X_n}(m) = 0, so that

|φ_X(m)| = |φ_{X−X_n}(m)| ≤ ||X − X_n||_1 < ε .

□
Remark. The Riemann-Lebesgue lemma cannot be reversed: there are
random variables X for which φ_X(n) → 0, but which do not have a density
in L^1. Here is an example of a criterion for the characteristic function which
assures that X is absolutely continuous:
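The contrast between the two directions can be seen numerically (the tent density and the coin are our choices): for a random variable with an L^1 density the characteristic function decays, while for a discrete one it does not.

```python
import cmath, math

# phi_X(n) = E[e^{inX}] computed by a midpoint Riemann sum for a
# density f supported on [-pi, pi].

def phi_density(f, n, steps=20000):
    h = 2 * math.pi / steps
    return sum(cmath.exp(1j * n * (-math.pi + (j + 0.5) * h))
               * f(-math.pi + (j + 0.5) * h) * h for j in range(steps))

# The "tent" density f(x) = (pi - |x|)/pi^2 is in L^1, so phi_X(n) -> 0:
tent = lambda x: (math.pi - abs(x)) / math.pi ** 2
print(abs(phi_density(tent, 41)) < 0.01)   # True (exact value is 4/(pi^2 * 41^2))
# A fair coin on {-1, +1} has phi_X(n) = cos(n), which does not decay:
print(abs(math.cos(41)) > 0.5)             # True
```

The tent coefficients decay like 1/n^2, consistent with the criterion stated next.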
The Fejér kernels K_n are nonnegative and satisfy

||K_n||_1 = (1/(2π)) ∫_0^{2π} K_n(x) dx = 1 .

By summation by parts, with a_n decreasing and convex,

Σ_{k=1}^{n} k (a_{k−1} − 2a_k + a_{k+1}) (1 − k/(n+1)) → Σ_{k=1}^{∞} k (a_{k−1} − 2a_k + a_{k+1}) = a_0

as n → ∞. □
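The normalization ||K_n||_1 = 1 used above can be checked numerically (a sketch with our own helper names): the Fejér kernel is nonnegative, and only its k = 0 Fourier coefficient survives averaging over a period.

```python
import math

# Fejer kernel K_n(t) = sum_{|k|<=n} (1 - |k|/(n+1)) e^{ikt}, real-valued.

def fejer(n, t):
    return sum((1 - abs(k) / (n + 1)) * math.cos(k * t)
               for k in range(-n, n + 1))

steps = 4000
h = 2 * math.pi / steps
grid = [-math.pi + (j + 0.5) * h for j in range(steps)]
norm = sum(fejer(8, t) for t in grid) * h / (2 * math.pi)
print(abs(norm - 1) < 1e-9)                     # True: ||K_8||_1 = 1
print(min(fejer(8, t) for t in grid) > -1e-9)   # True: K_8 >= 0
```

The midpoint sum is exact here up to rounding, since the kernel is a trigonometric polynomial of degree far below the number of grid points.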
For bounded random variables, the existence of a discrete component of
the random variable X is decided by the following theorem. It will follow
from corollary (5.12.5) given later on.
Theorem (Wiener).

lim_{n→∞} (1/(2n+1)) Σ_{k=−n}^{n} |φ_X(k)|^2 = Σ_{x∈ℝ} P[X = x]^2 .

The Dirichlet kernel

D_n(t) = Σ_{k=−n}^{n} e^{ikt} = sin((n + 1/2) t)/sin(t/2)

satisfies

D_n ∗ f(x) = S_n(f)(x) = Σ_{k=−n}^{n} f̂(k) e^{ikx} .

The functions

f_n(t) = (1/(2n+1)) D_n(t − x) = (1/(2n+1)) Σ_{k=−n}^{n} e^{ik(t−x)}

are bounded by 1 and converge pointwise to 1_{{x}}(t). By dominated convergence it follows that

lim_{n→∞} (f_n, μ) = μ({x}) ,

that is,

(1/(2n+1)) Σ_{k=−n}^{n} μ̂_k e^{−ikx} → μ({x}) .
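Wiener's theorem is easy to test numerically for a purely atomic law; with two atoms of weight 1/2 each (an arbitrary choice) the Cesàro mean of |φ_X(k)|^2 approaches Σ_x P[X = x]^2 = 1/2:

```python
import cmath

# Two atoms at 0 and 1, each with probability 1/2.
atoms = [(0.0, 0.5), (1.0, 0.5)]
phi = lambda k: sum(p * cmath.exp(1j * k * x) for x, p in atoms)

n = 2000
avg = sum(abs(phi(k)) ** 2 for k in range(-n, n + 1)) / (2 * n + 1)
print(abs(avg - 0.5) < 0.01)   # True: Cesaro mean tends to sum of P[X=x]^2
```

Here |φ_X(k)|^2 = (1 + cos k)/2 oscillates and does not converge, but its averages do, which is exactly the point of the theorem.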
Definition. If μ and ν are two measures on (Ω = 𝕋, 𝒜), then their convolution
is defined as

μ ⋆ ν(A) = ∫_𝕋 μ(A − x) dν(x)

for any A ∈ 𝒜. Define for a measure on [−π, π] also μ*(A) = μ(−A).
Remark. The Fourier coefficient μ̂*(n) is the complex conjugate of μ̂(n), and
(μ ⋆ ν)^(n) = μ̂(n) ν̂(n). If μ = Σ_j a_j δ_{x_j}
is a discrete measure, then μ* = Σ_j ā_j δ_{−x_j}. Because (μ ⋆ μ*)^(n) = |μ̂(n)|^2, we
have in general

(μ ⋆ μ*)({0}) = Σ_x |μ({x})|^2 .
Remark. For bounded random variables, we can rescale the random vari-
able so that its values lie in [−π, π], and we can then use Fourier series
instead of Fourier integrals. Given a function h, we call μ uniformly
h-continuous if there exists a constant C such that

μ([x − ε, x + ε]) ≤ C · h(ε)

for all x ∈ ℝ and all ε > 0.

Theorem 5.12.6 (Y. Last). If there exists C such that (1/n) Σ_{k=1}^{n} |μ̂_k|^2 ≤
C · h(1/n) for all n > 0, then μ is uniformly √h-continuous.
Proof. The Fejér kernel

K_n(t) = (1/(n+1)) (sin((n+1)t/2) / sin(t/2))^2 = Σ_{k=−n}^{n} (1 − |k|/(n+1)) e^{ikt}

is nonnegative and satisfies K_n(t) ≥ c · n for |t| ≤ 1/n. Therefore, with ν = μ ⋆ μ*,

(1/n) + (2/n) Σ_{k=1}^{n} |μ̂_k|^2 ≥ (1/n) Σ_{k=−n}^{n} (1 − |k|/(n+1)) |μ̂_k|^2
  = (1/n) (K_n ∗ ν)(0)
  ≥ c · ν([−1/n, 1/n])
  ≥ c · μ([x − 1/(2n), x + 1/(2n)])^2

for every x ∈ ℝ. The assumption (1/n) Σ_{k=1}^{n} |μ̂_k|^2 ≤ C · h(1/n) therefore gives

μ([x − 1/(2n), x + 1/(2n)]) ≤ C' · h(1/n)^{1/2}

uniformly in x, which is the claimed uniform √h-continuity. □
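A classical illustration of these results (our choice of example) is the middle-thirds Cantor measure: its Fourier coefficients do not tend to zero, so it has no L^1 density, yet the Cesàro averages of |μ̂_k|^2 decay, in accordance with Wiener's theorem (μ has no atoms) and the continuity bounds above.

```python
import math

# Fourier coefficients of the middle-thirds Cantor measure on [0,1]:
# mu_hat(k) = e^{i pi k} * prod_{j>=1} cos(pi k / 3^j); we only need |.|.

def cantor_hat(k, terms=40):
    p = 1.0
    for j in range(1, terms + 1):
        p *= math.cos(math.pi * k / 3 ** j)
    return abs(p)

# Individual coefficients do NOT tend to zero (mu is singular):
print(cantor_hat(3 ** 8) > 0.3)   # True: self-similarity gives |mu_hat(3^m)| = |mu_hat(1)|
# ... but the Cesaro averages of |mu_hat(k)|^2 decay:
avg = lambda n: sum(cantor_hat(k) ** 2 for k in range(1, n + 1)) / n
print(avg(2000) < avg(200) < avg(20))  # True
```

The observed decay rate of the averages is roughly n^(−log 2/log 3), matching the Hölder exponent of the Cantor measure in the h-continuity bounds.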
The converse direction is due to Strichartz:

Theorem (R.S. Strichartz). If μ is uniformly h-continuous, then there exists a
constant C such that (1/n) Σ_{k=1}^{n} |μ̂_k|^2 ≤ C · h(1/n) for all n > 0.
Proof. (The computation in [102, 103] for the Fourier transform was adapted
to Fourier series in [51].) In the following computation, we abbreviate dμ(x)
with dx:

(1/n) Σ_{k=−n}^{n−1} |μ̂_k|^2
  ≤_1 (e/n) Σ_{k=−n}^{n−1} ∫_0^1 e^{−(k+θ)^2/n^2} dθ |μ̂_k|^2
  =_2 (e/n) ∫_0^1 Σ_{k=−n}^{n−1} e^{−(k+θ)^2/n^2} ∫_{𝕋^2} e^{i(x−y)k} dx dy dθ
  =_3 (e/n) ∫_{𝕋^2} ∫_0^1 Σ_{k=−n}^{n−1} e^{−(k+θ)^2/n^2 + i(x−y)(k+θ)} e^{−i(x−y)θ} dθ dx dy
  =_4 (e/n) ∫_{𝕋^2} ∫_0^1 e^{−(x−y)^2 n^2/4} Σ_{k=−n}^{n−1} e^{−((k+θ)/n − i(x−y)n/2)^2} e^{−i(x−y)θ} dθ dx dy
  =_5 (e/n) ∫_{𝕋^2} e^{−(x−y)^2 n^2/4} ∫_{−n}^{n} e^{−(t/n − i(x−y)n/2)^2} e^{−i(x−y)(t−⌊t⌋)} dt dx dy
  =_6 (e/n) ∫_{𝕋^2} e^{−(x−y)^2 n^2/4} ∫_{−∞}^{∞} e^{−(t/n − i(x−y)n/2)^2} dt dx dy
       (the unimodular factor averages out and the range is extended to ℝ; see [102, 103])
  =_7 e √π ∫_{𝕋^2} e^{−(x−y)^2 n^2/4} dx dy
  ≤_8 e √π ( ∫_{𝕋^2} e^{−(x−y)^2 n^2/2} dx dy )^{1/2}
  =_9 e √π ( Σ_{k=0}^{∞} ∫_{k/n ≤ |x−y| < (k+1)/n} e^{−(x−y)^2 n^2/2} dx dy )^{1/2}
  ≤_10 e √π C_1 h(n^{−1}) ( Σ_{k=0}^{∞} e^{−k^2/2} )^{1/2}
  ≤_11 C h(n^{−1}) . □
Here are some remarks about the steps done in this computation:
(1) is the trivial estimate 1 ≤ e · e^{−(k+θ)^2/n^2}, valid because (k+θ)^2 ≤ n^2 for
−n ≤ k ≤ n − 1 and 0 ≤ θ ≤ 1.
(2) The factor (Σ_{k=0}^{∞} e^{−k^2/2})^{1/2} appearing in step (10) is a constant, which is
absorbed into C in the final step.
Bibliography
[1] N.I. Akhiezer. The classical moment problem and some related questions
in analysis. University Mathematical Monographs. Hafner Publishing
Company, New York, 1965.
[9] D.R. Cox and V. Isham. Point processes. Chapman & Hall, London
and New York, 1980. Monographs on Applied Probability and
Statistics.
[14] John Derbyshire. Prime obsession. Plume, New York, 2004. Bernhard
Riemann and the greatest unsolved problem in mathematics, Reprint
of the 2003 original [J. Henry Press, Washington, DC; MR1968857].
[19] P.G. Doyle and J.L. Snell. Random walks and electric networks, volume
22 of Carus Mathematical Monographs. AMS, Washington, D.C.,
1984.
[37] Paul R. Halmos. Measure Theory. Springer Verlag, New York, 1974.
[43] K. Ito and H.P. McKean. Diffusion processes and their sample paths,
volume 125 of Die Grundlehren der mathematischen Wissenschaften.
Springer-Verlag, Berlin, second printing edition, 1974.
[59] L.J. Lander, T.R. Parkin, and J.L. Selfridge. A survey of equal sums of
like powers. Mathematics of Computation, 21(99):446-459, 1967.
[67] L. Merel. Bornes pour la torsion des courbes elliptiques sur les corps
de nombres. Inv. Math., 124:437-449, 1996.
[75] N.I. Fisher, T. Lewis, and B.J. Embleton. Statistical analysis of spherical
data. Cambridge University Press, 1987.
[79] S.C. Port and C.J. Stone. Brownian motion and classical potential
theory. Probability and Mathematical Statistics. Academic Press
(Harcourt Brace Jovanovich Publishers), New York, 1978.
[82] C.J. Reidl and B.N. Miller. Gravity in one dimension: The critical
population. Phys. Rev. E, 48:4250-4256, 1993.
[85] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John
Wiley and Sons, Inc., New York, second edition.
[96] B. Simon and T. Wolff. Singular continuous spectrum under rank one
perturbations and localization for random Hamiltonians. Commun.
Pure Appl. Math., 39:75-90, 1986.
[97] Ya. G. Sinai. Probability Theory, An Introductory Course. Springer
Textbook. Springer Verlag, Berlin, 1992.
[109] G.L. Wise and E.B. Hall. Counterexamples in probability and real
analysis. Oxford University Press, 1993.
Index

variance, 40
variance
  Cantor distribution, 129
  conditional, 128
variation
  stochastic process, 256
vector valued random variable, 26
vertex of a graph, 275
Probability Theory and Stochastic Processes with Applications
About the Book
Chapters 1-2 of this text cover the material of a basic probability course. Chapter 3 deals with discrete
stochastic processes, including martingale theory. Chapter 4 covers continuous-time stochastic
processes like Brownian motion and stochastic differential equations. The last chapter, "Selected
Topics", got considerably extended in the summer of 2006. In the original course, only localization
and percolation problems were included. Now, more topics like estimation theory, Vlasov
dynamics, multi-dimensional moment problems, random maps, circle-valued random variables,
the geometry of numbers, Diophantine equations and harmonic analysis have been added. No
previous knowledge of probability is necessary, but a basic exposure to calculus and linear algebra
is required. Some real analysis as well as some background in topology, functional analysis and
harmonic analysis can be helpful for the later chapters.
Oliver Knill started his research in the field of dynamical systems, tackling ergodic and spectral
theoretical questions. He worked out and published Moser's lecture notes "Selected chapters in the
calculus of variations", which appeared in 2003 in the series Lectures in Mathematics ETH Zurich.
More recently he works on applications of dynamical systems to probability theory and elementary
number theory, as well as inverse problems in analysis, geometry, and computer vision.
ISBN 81-89938-40-1