Doob 1934 Stochastic Processes and Statistics
$\{T_t\omega\}$, $-\infty < t < \infty$, is called a path curve. These path curves have been
studied, and many results, such as the ergodic theorem, obtained. It was
shown by Khintchine¹ that there is a formal analogy between this study
and that of stochastic processes. It will be shown in this paper that the
two theories are abstractly identical.
An example of a stochastic process can be obtained as follows. Let
$\Omega$, $T_t$ be defined as in the preceding paragraph, and let $\varphi(\omega)$ be a measurable
function on $\Omega$. A chance variable is, by definition, a measurable function
on a space on which a probability measure is defined. We define $x(t)$ as
the function $\varphi(T_t\omega)$. The probability that (1) is true is then the measure
of the set of elements $\omega$ such that
$$a_j < \varphi(T_{t_j}\omega) < b_j, \quad j = 1, \ldots, n. \tag{2}$$
The following theorem shows that every stochastic process can be obtained
in this way. If the stochastic process is stationary, the transformations
$\{T_t\}$ are measure preserving, and conversely.
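The construction $x(t) = \varphi(T_t\omega)$ can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes the space is the circle $[0, 1)$ with Lebesgue measure, that $T_t$ is a rotation (which preserves that measure), and that $\varphi$ is a cosine. The names `ALPHA`, `T`, `phi`, and `prob_of_conditions` are hypothetical. It estimates the measure of the set in (2) by sampling, once for times $t_j$ and once for the shifted times $t_j + 5$; under stationarity the two estimates should roughly agree.

```python
import math
import random

ALPHA = math.sqrt(2) - 1  # hypothetical rotation rate, assumed for illustration


def T(t, w):
    """Rotation of the circle [0, 1) by ALPHA * t; preserves Lebesgue measure."""
    return (w + ALPHA * t) % 1.0


def phi(w):
    """An illustrative measurable function on the circle."""
    return math.cos(2 * math.pi * w)


def x(t, w):
    """The chance variable x(t) = phi(T_t w)."""
    return phi(T(t, w))


def prob_of_conditions(times, lows, highs, n_samples=50_000, seed=0):
    """Monte Carlo estimate of the measure of
    {w : a_j < phi(T_{t_j} w) < b_j for every j}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        w = rng.random()  # a point drawn from Lebesgue measure on [0, 1)
        if all(a < x(t, w) < b for t, a, b in zip(times, lows, highs)):
            hits += 1
    return hits / n_samples


# Same conditions at times (0, 1) and at the shifted times (5, 6).
p_original = prob_of_conditions([0.0, 1.0], [0.0, -0.5], [1.0, 0.5])
p_shifted = prob_of_conditions([5.0, 6.0], [0.0, -0.5], [1.0, 0.5])
```

Replacing the rotation by a transformation that does not preserve the measure would, in general, make the two estimates differ, which is the content of the converse statement.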
THEOREM 1. A stochastic process can always be considered as a set of
measurable functionals $\varphi_t(\omega)$, $-\infty < t < \infty$, on the function space $\Omega$ of functions
$x(t) = \omega$ defined for $-\infty < t < \infty$, on which a probability measure is defined.
If $T_\tau$ is the transformation of $\Omega$ taking $x(t)$ into $x(t + \tau)$, $\varphi_t(\omega) = \varphi_0(T_t\omega)$.
Let $\{x(t)\}$ be the chance variables of the given stochastic process.
There corresponds to every value of $t$ a space $\Omega_t$ of elements $\omega_t$, on which a
probability measure is defined, and $x(t)$ is a measurable function, $\varphi_t(\omega_t)$,
on $\Omega_t$. Transform $\Omega_t$ into the $x$-axis by the transformation $S_t$ which
takes the set of points $\omega_t$ at which $\varphi_t(\omega_t) = a$ into the point $x = a$ for each
value of $a$. Then $\varphi_t(\omega_t) = x$ if $x = S_t\omega_t$, and the probability measure on
VOL. 20, 1934 MATHEMATICS: J. L. DOOB 377
$\Omega_t$ induces one on the $x$-axis. The chance variable $x(t)$ becomes the
chance variable $x^*(t) = x$. We can thus suppose that $x(t)$ is a chance
variable corresponding to a distribution on the $x$-axis, and that $\varphi_t(\omega_t) =
\varphi_t(x) = x(t)$, i.e., that $\varphi_t(\omega_t)$ is the function $x$, where measure is defined
in a way depending on $t$. By hypothesis, probability is assigned to events
determined by conditions of the form
$$a_j < x(t_j) < b_j, \quad j = 1, \ldots, n, \tag{1'}$$
i.e., a probability measure is defined on the function space $\Omega$, the measure
being determined by its values on sets determined by conditions of the
form (1').³ The chance variable $x(t_0)$ can be considered as the function
defined on $\Omega$ which takes on the value $x(t_0)$ at the element $x(t)$ of $\Omega$. This
functional is a measurable function on $\Omega$. The first part of the theorem
is thus proved. The second part is obvious.
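The second part of Theorem 1 can be checked mechanically on a single path. The sketch below is an illustration under stated assumptions, not the paper's construction: a path is modeled as an ordinary Python function of $t$, `shift` plays the role of $T_\tau$, `coordinate` plays the role of $\varphi_t$, and `sample_path` is an arbitrary hypothetical element of the function space.

```python
import math


def shift(tau, path):
    """T_tau: takes the path x(t) into the path x(t + tau)."""
    return lambda t: path(t + tau)


def coordinate(t):
    """phi_t: the functional sending a path w to its value w(t)."""
    return lambda path: path(t)


def sample_path(t):
    """A hypothetical element of the function space, chosen only for illustration."""
    return math.sin(t) + 0.1 * t


t = 2.5
lhs = coordinate(t)(sample_path)              # phi_t(w)
rhs = coordinate(0.0)(shift(t, sample_path))  # phi_0(T_t w)
```

Both `lhs` and `rhs` evaluate the same path at the same time, which is why the paper can call this part of the theorem obvious.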
Since $x(t + \tau) = T_\tau[x(t)]$, Theorem 1 shows that if the process is
stationary, and if simple continuity conditions are satisfied,⁴ the ergodic
theorem of Birkhoff and similar theorems can be applied. For instance,
If $f(x)$ is the probability density⁹ of a chance variable $x$, $\prod_{j=1}^{n} f(x_j)$ is the
density for the distribution of the results of $n$ independent trials. What
has been done above can be interpreted as letting $n$ become infinite,
obtaining a probability set-up which is suitable for any finite number of trials.
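As a small illustration of the product formula, the sketch below evaluates $\prod_{j=1}^{n} f(x_j)$ for a given density. The standard normal density and the helper names `normal_density` and `joint_density` are assumptions made for the example; any probability density in the sense of footnote 9 could be substituted.

```python
import math


def normal_density(x, mean=0.0, std=1.0):
    """Density of a normal distribution (an illustrative choice of f)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def joint_density(f, xs):
    """Density of the results of n independent trials: the product of f(x_j)."""
    return math.prod(f(xj) for xj in xs)


# Two independent trials, both observed at 0: the joint density is f(0)^2 = 1 / (2 pi).
value = joint_density(normal_density, [0.0, 0.0])
```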
Using the results described above, a rigorous proof can be obtained of
the validity of the method of maximum likelihood of R. A. Fisher, which
has supplanted the use of Bayes' Theorem.¹⁰ Let $f(x, p)$ for each value of
$p$ in a neighborhood of a value $p_0$ be a probability density. Let $x$ be a
chance variable whose distribution has density $f(x, p_0)$. The problem is
to estimate $p_0$ by means of samples of values of $x$. The method of Fisher
is to choose the value $p_n(x_1, \ldots, x_n)$ of $p$ which maximizes $\prod_{j=1}^{n} f(x_j, p)$ for
fixed $x_1, \ldots, x_n$. Then it can be shown, using the above results, that under
suitable restrictions on the continuity of $f(x, p)$ in $p$, $p_n(x_1, \ldots, x_n)$
approaches $p_0$ with probability 1, and that for large $n$ the distribution of
$p_n(x_1, \ldots, x_n)$ is nearly normal, with mean $p_0$ and variance $1/(n\sigma^2)$, where
$$\sigma^2 = -\int_{-\infty}^{\infty} f(x, p_0)\, \frac{\partial^2}{\partial p_0^2} \left[\log f(x, p_0)\right] dx.$$
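The two claims, convergence to $p_0$ and approximate normality with variance $1/(n\sigma^2)$, can be probed by simulation. The following sketch is only an illustration under assumptions not made in the paper: it takes the exponential density $f(x, p) = p e^{-px}$ for $x \ge 0$, for which $\log f = \log p - px$, the second derivative in $p$ is $-1/p^2$, and hence $\sigma^2 = 1/p_0^2$. The value `P0` and the helper names are hypothetical.

```python
import random
import statistics

P0 = 2.0  # hypothetical true parameter value


def mle(sample):
    """Value of p maximizing prod_j p * exp(-p * x_j).

    For the exponential density assumed here, setting the derivative of
    the log-likelihood n*log(p) - p*sum(x_j) to zero gives p = n / sum(x_j).
    """
    return len(sample) / sum(sample)


def simulate(n, trials, seed=0):
    """Draw `trials` independent samples of size n and return their estimates."""
    rng = random.Random(seed)
    return [mle([rng.expovariate(P0) for _ in range(n)]) for _ in range(trials)]


n = 400
estimates = simulate(n, trials=2000)
empirical_mean = statistics.fmean(estimates)
empirical_var = statistics.variance(estimates)

# With sigma^2 = 1 / P0^2, the predicted variance 1 / (n * sigma^2) is P0^2 / n.
predicted_var = P0 ** 2 / n
```

For this choice the empirical mean should lie close to `P0` and the empirical variance close to `predicted_var`; the agreement is only approximate at finite $n$, since the estimate $n / \sum x_j$ carries a small bias of order $1/n$.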
1 Mathemat. Ann., 109, 604-615 (1934); this paper should be compared with a paper
by the same author in these PROCEEDINGS, 19, 567-573 (1933).
2 We shall mean by this that there is a non-negative completely additive set function,
defined on a collection of sets of elements of the space (called measurable sets). If
$A_1, A_2, \ldots$ are measurable sets, we suppose that their complements and their sum are
measurable. We suppose that the set of all elements in the space has measure 1.
Measurable functions and Lebesgue integration can be defined in the usual way.
3 This way of defining measure in function space was discussed by Kolmogoroff,
Ergebnisse Mathematik, 2, No. 3, Grundbegriffe der Wahrscheinlichkeitsrechnung, § 4.
4 It is sufficient that if $E$ is any set in $\Omega$ determined by conditions of the form (1'),
and if $E$ is transformed into $E_t$ by $T_t$, the measure of $E \cdot E_t$ should be continuous in
$t$ at $t = 0$.
5 For a simple proof of the ergodic theorem, following the lines of the first proof, given
by Birkhoff, cf. A. Khintchine, Mathemat. Ann., 107, 485-488 (1933).
6 This situation was discussed by Khintchine, Zeit. Angewandte Mathemat. Mechanik,
13, 101-103 (1933), who treated the particular case of chance variables taking on only
the values 1 or 0. The general case was discussed by E. Hopf, Journal of Mathematics
and Physics, M. I. T., 13, 51-102 (1934), who obtained (3') but not Theorem 2.
7 Kolmogoroff, loc. cit.,3 p. 59, announced this result in the special case of independent
chance variables, and announced also Theorem 2, under the assumption that the prob-
ability is 1 that the upper limit in (4) is 0.
8 Loc. cit.,6 p. 488.
9 If $f(x)$ is defined for $-\infty < x < \infty$ except possibly for a set of points of Lebesgue
measure 0, is Lebesgue measurable, not negative, and integrable over $-\infty < x < \infty$,
with $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $f(x)$ will be called a probability density.
10 This method was discussed (unrigorously) by Fisher in the Phil. Trans. Roy. Soc.
London, Series A, 222 (1921). The treatment of H. Hotelling, Trans. Amer. Math.
Soc., 32, 847-859 (1930), holds only in certain special cases.
* NATIONAL RESEARCH FELLOW.