A Falsifying Rule For Probability Statements
Brit. J. Phil. Sci. 22 (1971), 231-261. Printed in Great Britain
by DONALD A. GILLIES
1. Introduction
2. Formulation of the Falsifying Rule
3. Criticism of the Neyman-Pearson theory
4. A reply to some objections of Neyman's
1 INTRODUCTION
Of course Popper's point is not only an empirical claim about the behaviour
of scientists but a normative proposal. He not only holds that good scientists
use probability statements as falsifiable statements, but that all scientists
ought to use them in this way.
This proposed solution raises a problem which I propose to call 'the
problem of producing an F.R.P.S. (or Falsifying Rule for Probability
Statements)'. In other words assuming that Popper is right and statis-
ticians and physicists do use probability statements as falsifiable state-
ments, the problem is to give explicitly the methodological rules which
implicitly guide their handling of probability. Of course once again such a
rule will not be merely a description of good scientific practice, but rather
a normative proposal that probability should be dealt with in accordance
with its dictates.
In the next section we will formulate such a rule, and examine how far
it agrees with statistical practice. In fact the suggested rule largely agrees
with the standard statistical tests (the χ²-test, the t-test, and the F-test),
but it contradicts the Neyman-Pearson theory of testing. As this theory
is still generally accepted among statisticians, the situation will conse-
quently look black for our proposed rule. However rather than abandon
the rule, we will proceed in section 3 to criticise the Neyman-Pearson
theory. Finally in section 4 we will attempt to answer some objections of
Neyman's. Evidently these were not directed against the view developed
here, but they had as their object theories of testing of the same general
type as the one advocated.1
1 It will be clear from this introduction that we take the notion of a falsifiable theory as
basic. However the falsifiability criterion has recently been criticised by Lakatos in his
paper [4]. Lakatos proposes analysing the growth of science in terms of 'unfalsifiable
research programmes' rather than 'falsifiable theories'. I believe that much of the dis-
cussion of this paper can be reformulated within Lakatos' framework. The problem
becomes that of formulating a rule telling us when a statistical sample becomes an
anomaly for the underlying programme. However we will not discuss these general
philosophical questions in what follows.
Suppose the r.v. ξ can take any of the integral values 0, 1, 2, ..., 9,900.
Suppose its distribution D is given by
p(ξ = 0) = 0.01
p(ξ = i) = 10⁻⁴ (i = 1, 2, ..., 9,900)
For i = 1, 2, ..., 9,900 we have l(i) = 10⁻².
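To keep the arithmetic of this example in view, here is a minimal sketch (not from the paper) which checks the figures; it assumes that l(i) denotes the ratio p(ξ = i)/max_j p(ξ = j), an assumed reading which does reproduce the value 10⁻² quoted above.

# check of the distribution D and of l(i) = p(i)/p_max (assumed definition)
p0 = 0.01                    # p(xi = 0)
p_i = 1e-4                   # p(xi = i) for i = 1, ..., 9900
total = p0 + 9900 * p_i      # = 1.0, so D is a genuine probability distribution
p_max = max(p0, p_i)         # = 0.01, attained at i = 0
l_i = p_i / p_max            # = 10^-2 for every i = 1, ..., 9900
print(total, l_i)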
Assuming that this is small enough to give a falsification, we have in
accordance with R.2 that H should be rejected if we get a value i with
We shall then regard H as falsified if the observed value of ξ, x say, lies in C.
This rule is very near our final version, but it is not satisfactory because
the requirement (iii) turns out to be too weak. To see this, consider the
following counter-example which is a modification of our previous one.
Suppose a r.v. ξ can take on the integral values 0, 1, 2, ..., 9,940. Suppose
further ξ has the distribution D defined by
p(ξ = 0) = 0.01
p(ξ = i) = 10⁻⁴ (i = 1, 2, ..., 9,540)
p(ξ = i) = 9 × 10⁻⁵ (i = 9,541, ..., 9,940).
Suppose we now set A = (0, 1, 2, ..., 9,540)
C = (9,541, ..., 9,940)
Then
Let us now examine the significance of the condition that fmax should,
in some sense, 'be representative of the values of f(x) in [a, b]'. Another
way of putting this is as follows. We require that f(x) should have a low
value in the 'tails' of the distribution, but once we enter the 'head' or
'acceptance region' [a, b], we would like f(x) to rise to its maximum value
fmax as quickly as possible. So we require that f should increase swiftly
once inside the region [a, b]; but how swiftly? I believe that, at the cost
admittedly of some arbitrariness, we can give a precise answer to this
question. This criterion then enables us to say definitely whether a
distribution of the continuous, unimodal, form is falsifiable or not, and
if it is falsifiable to divide it into a 'head' and 'tails'. This procedure is of
course very useful in comparing the recommendation of our F.R.P.S.
with the methods actually used in statistical practice. However we will
not here give the details of this further attempt at preciseness. Rather we
will assume that the qualitative considerations given to date enable us,
[Fig. 1. A continuous unimodal density f(x) with maximum f_max, showing the 'head' [a, b] marked by the ordinates f(a) and f(b); the 'tails' lie outside [a, b].]
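The following sketch (not from the paper) illustrates the qualitative idea of the figure: for a continuous unimodal density the 'head' is the region where f(x) stays close to f_max, and the 'tails' carry only a small probability. The standard normal density and the threshold l₀ = 0.1 are purely hypothetical choices here, and the rule 'reject when f(x)/f_max < l₀' is an assumed reading of the F.R.P.S., not a quotation.

import numpy as np
from scipy.stats import norm

l0 = 0.1                                   # hypothetical threshold on f(x)/f_max
f_max = norm.pdf(0.0)                      # maximum of the standard normal density
# for the normal density, f(x)/f_max >= l0 exactly when |x| <= sqrt(-2 ln l0)
b = np.sqrt(-2.0 * np.log(l0))
a = -b
head_prob = norm.cdf(b) - norm.cdf(a)      # probability carried by the 'head' [a, b]
print((a, b), head_prob, 1.0 - head_prob)  # the 'tails' carry the remainder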
to take for our falsification class the 'tails' of the distribution. Only if we
chose the tails could we obtain a low l-value as required. Consider however
a falsifying rule of the following form (which we shall call R.4).
[Fig. 2. Showing the distribution of r, concentrated about p.]
Adopting R.4 we could choose for our critical region C any interval (a, b)
where −½ < a < b < ½ and b − a = k, where k is some suitable constant
< k₀. On the Neyman-Pearson theory a number of critical regions C
of the form (a, b) are possible depending on which 'class of alternative
hypotheses' we adopt. On our own view however no critical region of the
form (a, b) is allowable. For any such region we have l = 1, i.e. its maximum
possible value. Hence we certainly shall not have l < l₀ whatever 'crucial'
value l₀ is chosen. I would claim that the non-existence of a falsification
class here is in accordance with intuition. Suppose we have say −½ < a <
b < a′ < b′ < ½, and b′ − a′ = b − a = k. If we now adopt (a, b) as our
critical region, we will reject the hypothesis if the observed value x ∈ (a, b)
and not if x ∈ (a′, b′). Yet as far as the hypothesis under test goes, the
two regions (a, b) and (a′, b′) are exactly symmetrical. It seems wholly
arbitrary to adopt one and not the other as a critical region. The only
solution is to say that this particular distribution is not one of the falsifiable
kind.
error of the day, except perhaps for a few anomalies. Yet Copernicus
devised an alternative hypothesis which agreed with observation as well
as the Ptolemaic theory. This new hypothesis led to the production of
new tests of the Ptolemaic account, and generally played a large part in
stimulating the enormous scientific advances of the next hundred years.
Granted then that alternative hypotheses can be of such value, why should
I want to attack the principle of alternative hypotheses as it appears within
the Neyman-Pearson theory?
There are really two reasons. First of all although it is often desirable
to devise alternatives when testing a given hypothesis, it is by no means
necessary to do so. There are many situations where we want to test a
given hypothesis 'in isolation', i.e. without formulating precise alterna-
tives.1 Indeed it is often the failure of such a test which elicits an alternative
hypothesis. Suppose a hypothesis H is suggested, but as yet we have no
precisely defined alternatives. Then on our account we can test out H.
If it passes the tests, well and good. It can be provisionally accepted, and
used for some purpose. If on the other hand it fails the test, we will then
try to devise a new hypothesis H' which avoids the refutation. In such
cases the falsification of H provides the stimulus for devising an alternative
hypothesis. Now if we stick to the Neyman-Pearson approach, the
alternative hypothesis H' should have been devised before the very first
test of H, and that test should have been designed with the alternative in
mind. The practising statistician can justly complain that this is too much
to demand. He could point out that H might have been corroborated by
the tests in which case the trouble and mental effort of devising an alterna-
tive would have been unnecessary. Further he could argue that even if H
is falsified, the nature of the falsification will give a clue as to what alterna-
tive might work better. It would be silly to start devising alternatives
without this clue.
Now admittedly it is a very good thing if a scientist can devise a viable
alternative H' to a hypothesis H, even when H has not yet been refuted.
As we have just explained, Copernicus devised an alternative astronomical
theory even though the existing one (the Ptolemaic) was reasonably well
1 This view was held by Fisher who writes in ([3], p. 42):
"On the whole the ideas (a) ... and (b) that the purpose of the test is to discriminate
or "decide" between two or more hypotheses, have greatly obscured their under-
standing, when taken not as contingent possibilities but as elements essential to their
logic. The appreciation of such more complex cases will be much aided by a clear
view of the nature of a test of significance applied to a single hypothesis by a unique
body of observations."
In a sense what follows can be considered as an attempt to support this opinion of
Fisher's against that of Neyman and Pearson. Fisher's opinion has been supported
recently by Barnard, cf. his contribution in L. J. Savage and others [9].
under test is that ξ is normal μ₀, σ₀, then the alternatives will be that ξ is
normal with different μ, σ (or, in some cases, just with different μ). Thus
the alternatives generally considered when the Neyman-Pearson theory is
applied are merely trivial variants of the original hypothesis. But this is an
intolerably narrow framework. We could (and should) consider a much
wider variety of different alternatives. For example we might consider
alternatives which assigned a distribution to ξ of a different functional
form. Again we might reject the assumption that the sample x₁, ..., xₙ is
produced by n independent repetitions of a random variable ξ and try
instead a hypothesis involving dependence. We might even in some cases
replace a statistical hypothesis by a complicated deterministic one. By
restricting alternatives to such a narrow range, the Neyman-Pearson
theory places blinkers on the statistician, and discourages the imaginative
invention of a genuinely different hypothesis of one of the types just
mentioned. It must be remembered too that if a genuinely different
hypothesis is proposed and corroborated, the fact that the original falsifying
test was (say) UMP1 in a class of trivial variants ceases to have much
significance.
To this argument a reply along the following lines will no doubt be made
"These 'academic' objections of yours are all very well. We fully admit
that it would be nice to have a theory of testing which embodied all the
alternatives you speak of. But such a theory would be difficult, if not
impossible, to construct. In such a situation the practical man must be
1 I.e. 'uniformly most powerful'. We shall use this standard abbreviation throughout.
content with the best we can do which is to consider only certain simple
alternatives. Moreover the Neyman-Pearson model embodying these
simple alternatives finds frequent and useful application in statistical
practice." Against this I claim that the Neyman-Pearson model does not
fit most statistical situations at all well, and I shall try to show this by
means of examples.
Our first example of a statistical investigation which does not fit the
Neyman-Pearson theory is taken, oddly enough, from Neyman himself.
It occurs in his ([6], pp. 33-7). The problem dealt with arose in the field
of biology. An experimental field was divided into small squares and counts
of the larvae in each of these squares were made. The problem was to find
the probability distribution of the number n of larvae in a square. The first
hypothesis suggested was that this random variable had a Poisson distribu-
tion pₙ = exp(−λ)λⁿ/n! for some value of the parameter λ (i.e. a
composite hypothesis). This was then tested by the χ²-method. The
possible results were divided into 10 classes corresponding to 0, 1, ..., 8, 9
or more, observed larvae. The number mₛ, s = 0, 1, ..., 9, observed in each
class was noted and the expected number m′ₛ was calculated given the
hypothesis. For the purposes of this calculation the unknown parameter
was estimated by the χ²-minimum method. Finally the value of the
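As an illustration of the procedure just described, here is a minimal sketch of a χ²-test of a composite Poisson hypothesis. The counts are hypothetical (they are not Neyman's data), λ is estimated by the sample mean rather than by the χ²-minimum method, and the open class '9 or more' is treated as 9 when forming that mean; all of these are simplifying assumptions.

import math
from scipy.stats import chi2

# hypothetical numbers of squares containing 0, 1, ..., 8 and '9 or more' larvae
observed = [19, 46, 61, 52, 36, 21, 10, 3, 1, 1]
n_squares = sum(observed)
lam = sum(s * m for s, m in enumerate(observed)) / n_squares   # estimate of lambda

# expected class probabilities under the fitted Poisson distribution
p = [math.exp(-lam) * lam**s / math.factorial(s) for s in range(9)]
p.append(1.0 - sum(p))                                         # class '9 or more'
expected = [n_squares * q for q in p]

stat = sum((m - e)**2 / e for m, e in zip(observed, expected))
df = len(observed) - 1 - 1                                     # 10 classes, 1 estimated parameter
print(stat, chi2.sf(stat, df))   # a small tail probability would count as a falsification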
In all cases, the first theoretical distribution tried was that of Poisson. It will be
seen that the general character of the observed distribution is entirely different
from that of Poisson. There seems no doubt that a very serious divergence
exists between the phenomenon of distribution of larvae and the machinery
assumed in the mathematical model. When this circumstance was brought to my
attention by Dr. Beall, we set out to discover the reasons for the divergence (my
italics).
In other words it was only after the first test that Neyman attempted to
devise an alternative hypothesis. Indeed, as so often in science, it was the
falsification of a hypothesis which stimulated theoretical activity. Now of
course the Type A hypothesis could have been devised before the first
test. But as we pointed out above, it is unreasonable to demand scientific
ingenuity of this level every time a hypothesis is tested. Moreover had the
Poisson distribution proved satisfactory, the mental effort needed to pro-
duce the type A hypothesis would have been unnecessary.
A second point to notice concerns Neyman's handling of the second
test, i.e. of the test of the Type A hypothesis. Now here a genuine alterna-
tive exists-namely the Poisson hypothesis. If Neyman had been true to
his principles, he should have tried to devise say an UMP test of the Type
A hypothesis against the Poisson alternative. Of course he did not follow
this course which would have involved him in difficult (perhaps impos-
sibly difficult) mathematics, but instead used the standard χ²-test. From
our point of view he was eminently justified. The χ²-test procedure falsifies
one hypothesis and corroborates the other; it is thus genuinely crucial
between the two hypotheses, and hence severe and to be commended.
But is Neyman justified from the point of view of his own theory? Without
a complicated mathematical investigation of the power properties of the
χ²-test, it is impossible to say.
For these reasons it cannot, I think, be denied that the piece of
statistical reasoning as actually carried out by Neyman was not fitted into
the Neyman-Pearson theory of testing. It might however be claimed that
it could, as it were retrospectively, be fitted into the theory. That is to say
we might, long after the event, propose some alternative hypotheses and
show that the tests used were in some sense optimal against these alterna-
tives. Indeed attempts have been made to fit the X2-test into the framework
of the Neyman-Pearson theory. We must now examine whether these
provide a solution to the present difficulty.
A typical such attempt is made by Lehmann in [5], pp. 303-6. Let us
suppose we are testing the hypothesis H0 that a certain real random
variable ξ has a distribution F(x). We will suppose the distribution
completely specified so that the hypothesis is 'simple'. The test used is the
pᵢ ≠ pᵢ⁰, i = 1, ..., k, and Σᵢ₌₁ᵏ pᵢ = 1. He next shows that the χ²-test has
certain optimal properties relative to these alternatives. In fact these
optimal properties are hardly very convincing. However we will not stress
this point. Another difficulty is that in the present Neyman example we
are considering a composite hypothesis in which the distribution of ξ
contains certain arbitrary parameters. It is not clear that the present method
will apply to this case. However we propose to show that it is inadequate
even in the simple case and thus a fortiori in the composite one.
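To make the reduction explicit (a standard formulation, stated here as an assumption rather than quoted from Lehmann): if the real line is divided into k classes by points a₀ < a₁ < ... < a_k, the test uses H₀ only through the class probabilities, and the statistic is

\[
p_i^0 = F(a_i) - F(a_{i-1}), \qquad i = 1, \ldots, k, \qquad
\chi^2 = \sum_{i=1}^{k} \frac{(m_i - n p_i^0)^2}{n\, p_i^0},
\]

where m_i is the number of the n observations falling in the i-th class. Any two distributions agreeing on the p_i^0 are then indistinguishable by the test.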
Our first objection is that, for the purposes of testing, H0 is replaced by
Now the central limit theorem yields that under certain very general
conditions ξ will tend to a normal distribution as n → ∞. But next suppose
that n is not large enough for the normal approximation to apply. Can we
get a better approximation to the distribution of sums like ξ? This mathe-
matical question had been investigated by Cramér in his 1937 Cambridge
Monograph, Random Variables and Probability Distributions, and it was
natural for him to apply the results to the case in hand. In fact we do get a
better approximation by adding to the normal frequency function
successive terms of the Edgeworth series. Consequently Cramér modified
his original hypothesis by adding the first term of the Edgeworth series.
He applied the χ²-test this time estimating 3 parameters. The result was
again a falsification. Cramér then added the first and second terms of the
Edgeworth series, applied the χ²-test estimating 4 parameters, and obtained
on this occasion a corroboration.
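For reference, the correction terms in question have the standard Edgeworth form; the paper does not reproduce the formula, so the following is an assumed statement of the general expansion for the density of a standardized sum of n independent, identically distributed terms:

\[
f(x) \approx \varphi(x)\left[1 + \frac{\gamma_1}{6\sqrt{n}}\,(x^3 - 3x) + \frac{\gamma_2}{24\,n}\,(x^4 - 6x^2 + 3) + \frac{\gamma_1^2}{72\,n}\,(x^6 - 15x^4 + 45x^2 - 15)\right],
\]

where φ is the standard normal density and γ₁, γ₂ are the skewness and excess kurtosis of a single term. Adding the first correction term introduces one further parameter, and adding the second introduces another, which matches the counts of estimated parameters mentioned above.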
F(x, p₁, ..., pₖ). Suppose further we have data consisting of a set (x₁, ..., xₙ) of
values of ξ. How can we test H? In order to do so we must find a
statistic η, i.e. a function η(x₁, ..., xₙ) of the sample which satisfies the
following three conditions:
(i) It must be possible to calculate mathematically the distribution D
of η given H, or at least to find a distribution D which is a good approxima-
tion to η's true distribution.
different purpose. I quote his remarks from L. J. Savage and others [9],
p. 84:
Suppose that our simple hypothesis says that the density of the observations is
fo(x), and that the test consists in calculating the function t(x) and regarding
large values of t(x) as evidence against a null hypothesis. Suppose we consider
the following family of hypotheses:
This shows that for null hypotheses of the type considered by Cox any
test whatever is uniformly most powerful relative to some set of alterna-
tive hypotheses. Thus the property of being uniformly most powerful can
only be significant if the set of alternatives introduced is in some sense
realistic as opposed to arbitrary and artificial. But how do we decide that a
set of alternatives is realistic? Only qualitative considerations will help us
here.
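The family referred to in the quotation is not reproduced above; a standard construction with the stated property (given here as an assumption, not as the quoted author's own formula) is exponential tilting of the null density:

\[
f_\theta(x) = f_0(x)\, e^{\theta\, t(x) - K(\theta)}, \qquad K(\theta) = \log \int f_0(y)\, e^{\theta\, t(y)}\, dy, \qquad \theta \ge 0,
\]

provided the normalising integral is finite. For every θ > 0 the likelihood ratio f_θ(x)/f_0(x) is an increasing function of t(x), so by the Neyman-Pearson lemma the test rejecting for large t(x) is most powerful against each such alternative, and hence uniformly most powerful against the whole family, whatever statistic t was chosen to begin with.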
It is known that some statisticians are of the opinion that good tests can be
devised by taking into consideration only the hypothesis tested. But my opinion
is that this is impossible ....
1 I am grateful to Colin Howson for first drawing my attention to these objections.
The view here under attack is certainly one to which I would subscribe.
It seems to be perfectly possible in some circumstances to devise a good
test of a statistical hypothesis without taking into account alternative
hypotheses. Indeed I would cite Neyman's χ²-test of the Poisson hypothesis
as an example of this. It therefore becomes necessary to try and refute
Neyman's arguments.
Neyman proceeds by proving two mathematical results, and then claim-
ing that these results raise impossible difficulties for the position he is
attacking. Like him we will begin by stating and proving these results-
giving in fact rather simpler proofs based on a method of Cramér's.
Throughout we will be concerned with the statistical hypothesis H that
x₁, ..., xₙ are independent and normally distributed with zero mean and
standard deviation σ. To test this we would customarily consider the
t-statistic defined by t = √(n − 1) x̄/s where
x̄ = (1/n) Σᵢ₌₁ⁿ xᵢ and ns² = Σᵢ₌₁ⁿ (xᵢ − x̄)².

LEMMA. If yⱼ = Σᵢ₌₁ⁿ cⱼᵢ xᵢ (j = 1, ..., n), where the constants cⱼᵢ satisfy

Σᵢ₌₁ⁿ cⱼᵢ cₖᵢ = 1 for j = k, 0 for j ≠ k,

then E(yⱼ yₖ) = σ² for j = k, 0 for j ≠ k, and the yⱼ are themselves independent
and normally distributed with zero mean and standard deviation σ.
and the result is proved, as the following two theorems show.
THEOREM 1. In the situation under consideration we can find a statistic
ẑ of the r.v.'s x₁, ..., xₙ s.t.
(1) ẑ, like z, has the z-dn with n − 1 d. of fr.;
(2) |ẑ| ≤ 1.
Proof. Set √n x̄′ = (x₁ − x₂)/√2,
ns′² = Σᵢ₌₁ⁿ xᵢ² − n x̄′²,
ẑ = x̄′/s′.
Then by the lemma ẑ has the z-dn with n − 1 d. of fr. as required. The second
property (cf. Neyman [6], p. 50) follows from some simple algebraic
inequalities. We have for any real numbers a, b
(a ∓ b)² ≥ 0,
∴ 2(a² + b²) ≥ a² + b² ± 2ab = (a ± b)².
But now
2n x̄′² = (x₁ − x₂)² ≤ 2(x₁² + x₂²) ≤ 2 Σᵢ₌₁ⁿ xᵢ²
This last circumstance will make it necessary to choose one of the criteria.
Against this I maintain that the ẑ-test and the z-test are both entirely
valid tests. It would be possible with justice to apply either or both of
them. The relationship between the two tests appears strange at first
sight, but there is in fact nothing paradoxical about it. To establish this I
propose to consider a hypothesis drawn from an unproblematic area of
deterministic physics; to describe two tests T1 and T2 which everyone
would accept as valid; and to show that T1 and T2 are related in the same
way as Neyman's z-test and ẑ-test.
The example I have in mind is none other than Galileo's law that
falling bodies have in vacuo a constant acceleration of 981 cm/sec². We
might be able to calculate from this law and certain assumed laws concerning
the fracture of glass that if a steel ball of a certain size is dropped from a
height h it will acquire a velocity sufficient to shatter and pass through a
glass plate of thickness less than a, but that a glass plate of thickness
greater than b will stop the ball without shattering. Now define tests T₁ and
T₂ as follows:
T1: Drop a steel ball of the given size from height h on to a glass plate
of thickness a₁ where a₁ < a. If the plate shatters and the ball
continues its downward course Galileo's law is confirmed. If the plate
stops the ball Galileo's law is falsified.
T₂: Drop a steel ball of the given size from height h on to a glass plate
of thickness b1 where b1 > b. If the plate stops the ball Galileo's
law is confirmed. If the ball shatters the plate and passes through,
Galileo's law is falsified.
A rather than B. On the other hand x̄′ = (x₁ − x₂)/√2 does not give a
The obvious way out of these difficulties is the one suggested by Cochran
and Cox. We should lay down the rule that our tests of a given hypothesis
should be designed before the sample values are inspected. This rule has
great appeal for common sense, is actually used in statistical practice, and
avoids Neyman's second objection without an appeal to the principle of
alternative hypotheses.
I conclude that, although our approach is no doubt liable to many
objections, it is at least not refuted by those which Neyman raises.
REFERENCES
[4] LAKATOS, I. (1970) Falsification and the Methodology of Scientific Research Pro-
grammes, in Criticism and the Growth of Knowledge. Eds. I. Lakatos and A.
Musgrave, pp. 91-195. Cambridge University Press.
[5] LEHMANN, E. L. (1959) Testing Statistical Hypotheses. John Wiley and Sons, Inc.,
New York.
[6] NEYMAN, J. (1952) Lectures and Conferences on Mathematical Statistics and Probability,
2nd Edition. Washington.
[7] NEYMAN, J. and PEARSON, E. S. (1967). The testing of statistical hypotheses in relation
to probabilities a priori, in Joint Statistical Papers of J. Neyman & E. S. Pearson,
pp. 186-202. Cambridge University Press.
[8] POPPER, K. R. (1959) The Logic of Scientific Discovery, 1934. English Edition:
Hutchinson.