Causal Decision Theory
David Lewis
Abstract. Newcomb's problem and similar cases show the need to incorporate causal
distinctions into the theory of rational decision; the usual noncausal decision theory,
though simpler, does not always give the right answers. I give my own version of
causal decision theory, compare it with versions offered by several other authors,
and suggest that the versions have more in common than meets the eye.
1. Introduction
Decision theory in its best-known form 1 manages to steer clear of the thought
that what's best to do is what the agent believes will most tend to cause good
results. Causal relations and the like go unmentioned. The theory is simple,
elegant, powerful, and conceptually economical. Unfortunately it is not quite
right. In a class of somewhat peculiar cases, called Newcomb problems, this
noncausal decision theory gives the wrong answer. It commends an irrational
policy of managing the news so as to get good news about matters which you
have no control over.
I am one of those who have concluded that we need an improved decision
theory, more sensitive to causal distinctions. Noncausal decision theory will do
when the causal relations are right for it, as they very often are, but even then
the full story is causal. Several versions of causal decision theory are on the
market in the works of Gibbard and Harper, Skyrms, and Sobel, 2 and I shall put
forward a version of my own. But also I shall suggest that we causal decision
theorists share one common idea, and differ mainly on matters of emphasis and
formulation. The situation is not the chaos of disparate approaches that it may
seem.
Of course there are many philosophers who understand the issues very well,
and yet disagree with me about which choice in a Newcomb problem is rational.
This paper is about a topic that does not arise for them. Noncausal decision
theory meets their needs and they want no replacement. I will not enter into
debate with them, since that debate is hopelessly deadlocked and I have nothing
new to add to it. Rather, I address myself to those who join me in presupposing
1 As presented, for instance, in Richard C. Jeffrey, The Logic of Decision (New York: McGraw-Hill, 1965).
2 Allan Gibbard and William Harper, 'Counterfactuals and Two Kinds of Expected Utility', in C. A. Hooker, J. J. Leach, and E. F. McClennen, eds., Foundations and Applications of Decision Theory, Volume 1 (Dordrecht, Holland: D. Reidel, 1978); Brian Skyrms, 'The Role of Causal Factors in Rational Decision', in his Causal Necessity (New Haven: Yale University Press, 1980); and Jordan Howard Sobel, Probability, Chance and Choice: A Theory of Rational Agency (unpublished; presented in part at a workshop on Pragmatics and Conditionals at the University of Western Ontario in May 1978).
that Newcomb problems show the need for some sort of causal decision theory,
and in asking what form that theory should take.
2. Preliminaries: Credence, Value, Options

Let us assume that our agent has, at any moment, a credence function and a value function, both defined in the first instance over possible worlds. Each world W has a credence C(W), measuring the agent's degree of belief that W is the actual world; credences are non-negative and sum to one. Each world W also has a value V(W), measuring how satisfactory the agent finds it. Propositions may be taken as sets of worlds, and the credence of a proposition X is the sum of the credences of its worlds: C(X) = Σ_{W∈X} C(W). We define conditional credence:

C(X/Y) =df C(XY)/C(Y),
where XY is the conjunction (intersection) of the propositions X and Y. If C(Y) is positive, then C(-/Y), the function that assigns to any world W or proposition X the value C(W/Y) or C(X/Y), is itself a credence function. We
say that it comes from C by conditionalising on Y. Conditionalising on one's total
evidence is a rational way to learn from experience. I shall proceed on the
assumption that it is the only way for a fully rational agent to learn from
experience; however, nothing very important will depend on that disputed
premise.
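To make the apparatus concrete, here is a minimal Python sketch of a credence function over a small finite set of worlds and of conditionalising on a proposition. The worlds and numbers are invented for illustration; nothing here goes beyond the definitions above.

```python
# A minimal sketch (worlds and numbers invented): a credence function C
# over four possible worlds, and the function C(-/Y) that comes from C
# by conditionalising on a proposition Y.

C = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}   # credences sum to one
Y = {"w1", "w3"}                                   # a proposition = a set of worlds

def credence(cred, X):
    """C(X): total credence of the proposition X."""
    return sum(p for w, p in cred.items() if w in X)

def conditionalise(cred, Y):
    """C(-/Y): for each world, C(W/Y) = C(WY)/C(Y); zero outside Y."""
    c_Y = credence(cred, Y)
    assert c_Y > 0, "conditionalising on Y requires positive C(Y)"
    return {w: (p / c_Y if w in Y else 0.0) for w, p in cred.items()}

C_on_Y = conditionalise(C, Y)
print(credence(C_on_Y, {"w1"}))   # C(w1/Y) = 0.4/0.6, about 0.667
```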
We also define (expected) value for propositions. We take credence-weighted averages of values of worlds: for any proposition X,

(1)    V(X) =df Σ_{W∈X} C(W) V(W) / C(X).

A partition is a set of propositions of which exactly one holds at each world. Let the variable Z range over any partition; then, summing cell by cell,

C(X) = Σ_Z C(XZ),
C(X) V(X) = Σ_Z C(XZ) V(XZ).

Dividing, we obtain the Rule of Averaging:

(2)    V(X) = Σ_Z C(Z/X) V(XZ).
Thence we can get an alternative definition of expected value. For any number
v, let [V = v] be the proposition that holds at just those worlds W for which
V(W) equals v. Call [V=v] a value-level proposition. Since the value-level
propositions are a partition,
(3)    V(X) = Σ_v C([V=v]/X) v.
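Continuing the invented toy model, the following sketch checks numerically that definition (1), the Rule of Averaging (2), and the value-level form (3) deliver the same expected value; the values and the partition are arbitrary choices, not anything from the text.

```python
# Continuing the invented toy model: a value function V over worlds, and
# a numerical check that (1), the Rule of Averaging (2), and the
# value-level form (3) agree on the expected value of a proposition X.

C = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}
V = {"w1": 10.0, "w2": 0.0, "w3": 10.0, "w4": 5.0}
X = {"w1", "w2", "w3"}

def credence(X_):
    return sum(C[w] for w in X_)

def value(X_):
    """(1): V(X) = sum over X-worlds of C(W)V(W), divided by C(X)."""
    return sum(C[w] * V[w] for w in X_) / credence(X_)

# (2): V(X) = sum_Z C(Z/X) V(XZ), here with an arbitrary partition Z.
Z = [{"w1", "w4"}, {"w2"}, {"w3"}]
v2 = sum(credence(z & X) / credence(X) * value(z & X)
         for z in Z if credence(z & X) > 0)

# (3): V(X) = sum_v C([V=v]/X) v, with [V=v] the value-level propositions.
v3 = sum(credence({w for w in X if V[w] == v}) / credence(X) * v
         for v in set(V.values()))

print(value(X), v2, v3)   # all three give the same number
```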
I have idealized and oversimplified in three ways, but I think the dodged
complications make no difference to whether, and how, decision theory ought
to be causal. First, it seems most unlikely that any real person could store and
process anything so rich in information as the C and V functions envisaged. We
must perforce make do with summaries. But it is plausible that someone who
really did have these functions to guide him would not be so very different from
us in his conduct, apart from his supernatural prowess at logic and mathematics
and a priori knowledge generally. Second, my formulation makes
straightforward sense only under the fiction that the number of possible worlds
is finite. There are two remedies. We could reformulate everything in the
language of standard measure theory, or we could transfer our simpler
formulations to the infinite case by invoking nonstandard summations of
infinitesimal credences. Either way the technicalities would distract us, and I see
little risk that the fiction of finitude will mislead us. Third, a credence function
over possible worlds allows for partial beliefs about the way the world is, but not
for partial beliefs about who and where and when in the world one is. Beliefs of
the second sort are distinct from those of the first sort; it is important that we
have them; however they are seldom very partial. To make them partial we
need either an agent strangely lacking in self-knowledge, or else one who gives
credence to strange worlds in which he has close duplicates. I here ignore the
decision problems of such strange agents. 3
Let us next consider the agent's options. Suppose we have a partition of
propositions that distinguish worlds where the agent acts differently (he or his
counterpart, as the case may be). Further, he can act at will so as to make any
one of these propositions hold; but he cannot act at will so as to make any
proposition hold that implies but is not implied by (is properly included in) a
proposition in the partition. The partition gives the most detailed specifications
of his present action over which he has control. Then this is the partition of the
agent's alternative options. 4 (Henceforth I reserve the variable A to range over
3 I consider them in 'Attitudes De Dicto and De Se', The Philosophical Review, 88 (1979):
pp. 513-543, especially p. 534. There, however, I ignore the causal aspects of decision theory. I
trust there are no further problems that would arise from merging the two topics.
4 They are his narrowest options. Any proposition implied by one of them might be called an
option for him in a broader sense, since he could act at will so as to make it hold. But when I
speak of options, I shall always mean the narrowest options.
these options.) Say that the agent realises an option iff he acts in such a way as
to make it hold. Then the business of decision theory is to say which of the
agent's alternative options it would be rational for him to realise.
All this is neutral ground. Credence, value, and options figure both in
noncausal and in causal decision theory, though of course they are put to
somewhat different uses.
4. Newcomb Problems
Suppose you are offered some small good, take it or leave it. Also you may
suffer some great evil, but you are convinced that whether you suffer it or not is
entirely outside your control. In no way does it depend causally on what you do
now. No other significant payoffs are at stake. Is it rational to take the small
good? Of course, say I.
I think enough has been said already to settle that question, but there is some
more to say. Suppose further that you think that some prior state, which may or
may not obtain and which also is entirely outside your control, would be
conducive both to your deciding to take the good and to your suffering the evil.
So if you take the good, that will be evidence that the prior state does obtain and
hence that you stand more chance than you might have hoped of suffering the
evil. Bad news! But is that any reason not to take the good? I say not, since if
the prior state obtains, there's nothing you can do about it now. In particular,
you cannot make it go away by declining the good, thus acting as you would
have been more likely to act if the prior state had been absent. All you
accomplish is to shield yourself from the bad news. That is useless. (Ex hypothesi, dismay caused by the bad news is not a significant extra payoff in its own right. Neither is the exhilaration or merit of boldly facing the worst.) To
decline the good lest taking it bring bad news is to play the ostrich.
The trouble with noncausal decision theory is that it commends the ostrich as rational. Let G and -G respectively be the propositions that you take the small good and that you decline it; suppose for simplicity that just these are your options. Let E and -E respectively be the propositions that you suffer the evil and that you do not. Let the good contribute g to the value of a world and let the evil contribute -e; suppose the two to be additive, and set an arbitrary zero where both are absent. Then by Averaging,

(4)    V(G) = C(E/G) V(EG) + C(-E/G) V(-EG) = -e C(E/G) + g,
       V(-G) = C(E/-G) V(E-G) + C(-E/-G) V(-E-G) = -e C(E/-G).
That means that -G, declining the good, is the V-maximal option iff the difference C(E/G) - C(E/-G), which may serve as a measure of the extent to which taking the good brings bad news, exceeds the fraction g/e. And that
may well be so under the circumstances considered. If it is, noncausal decision
theory endorses the ostrich's useless policy of managing the news. It tells you to
decline the good, though doing so does not at all tend to prevent the evil. If a
theory tells you that, it stands refuted.
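A worked instance of (4), with invented numbers: take g = 1 and e = 10, and suppose C(E/G) = 0.9 and C(E/-G) = 0.1, so that the difference 0.8 exceeds g/e = 0.1.

```python
# A worked instance of (4), with invented numbers. Take g = 1, e = 10,
# C(E/G) = 0.9 and C(E/-G) = 0.1: the difference 0.8 exceeds g/e = 0.1,
# so V-maximising commends declining the good, the ostrich's policy.

g, e = 1.0, 10.0
C_E_given_G, C_E_given_notG = 0.9, 0.1

V_G    = -e * C_E_given_G + g      # V(G)  = -e C(E/G) + g
V_notG = -e * C_E_given_notG       # V(-G) = -e C(E/-G)

print(V_G, V_notG)                                  # -8.0 versus -1.0
print(C_E_given_G - C_E_given_notG > g / e)         # True: -G is V-maximal
```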
In Newcomb's original problem, 5 verisimilitude was sacrificed for extremity. C(E/G) was close to one and C(E/-G) was close to zero, so that declining the
good turned out to be V-maximal by an overwhelming margin. To make it so,
we have to imagine someone with the mind-boggling power to detect the entire
vast combination of causal factors at some earlier time that would cause you to
decline the good, in order to inflict the evil if any such combination is present.
Some philosophers have refused to learn anything from such a tall story.
If our aim is to show the need for causal decision theory, however, a more
moderate version of Newcomb's problem will serve as well. Even if the
difference of C(E/G) and C(E/-G) is quite small, provided that it exceeds
g/e, we have a counterexample. More moderate versions can also be more
down-to-earth, as witness the medical Newcomb problems. 6 Suppose you like
eating eggs, or smoking, or loafing when you might go out and run. You are
convinced, contrary to popular belief, that these pleasures will do you no harm
at all. (Whether you are right about this is irrelevant.) But also you think you
might have some dread medical condition: a lesion of an artery, or nascent
cancer, or a weak heart. If you have it, there's nothing you can do about it now
and it will probably do you a lot of harm eventually. In its earlier stages, this
condition is hard to detect. But you are convinced that it has some tendency,
perhaps slight, to cause you to eat eggs, smoke, or loaf. So if you find yourself
indulging, that is at least some evidence that you have the condition and are in
5 Presented in Robert Nozick, 'Newcomb's Problem and Two Principles of Choice', in N. Rescher et al., eds., Essays in Honor of Carl G. Hempel (Dordrecht, Holland: D. Reidel, 1970).
6 Discussed in Skyrms and Nozick, opera cit.; in Richard C. Jeffrey, 'Choice, Chance, and Credence', in G. H. von Wright and G. Fløistad, eds., Philosophy of Logic (Dordrecht, Holland: M. Nijhoff, 1980); and in Richard C. Jeffrey, 'How is it Reasonable to Base Preferences on Estimates of Chance?' in D. H. Mellor, ed., Science, Belief and Behaviour: Essays in Honour of R. B. Braithwaite (Cambridge: Cambridge University Press, 1980). I discuss another sort of moderate and down-to-earth Newcomb problem in 'Prisoners' Dilemma is a Newcomb Problem', Philosophy and Public Affairs, 8 (1979): pp. 235-240.
for big trouble. But is that any reason not to indulge in harmless pleasures? The
V-maximising rule says yes, if the numbers are right. I say no.
So far, I have considered pure Newcomb problems. There are also mixed problems. You may think that taking the good has some tendency to produce (or prevent) the evil, but also is a manifestation of some prior state which tends to produce the evil. Or you may be uncertain whether your situation is a Newcomb problem or not, dividing your credence between alternative hypotheses about the causal relations that prevail. These mixed cases are still more realistic, yet even they can refute noncausal decision theory.
However, no Newcomb problem, pure or mixed, can refute anything if it is not possible. The Tickle Defence of noncausal decision theory 7 questions whether Newcomb problems really can arise. It runs as follows: 'Supposedly the
prior state that tends to cause the evil also tends to cause you to take the good.
The dangerous lesion causes you to choose to eat eggs, or whatever. How can it do that? If you are fully rational your choices are governed entirely by your beliefs and desires, so nothing can influence your choices except by influencing your beliefs and desires. But if you are fully rational, you know your own mind. If the lesion produces beliefs and desires favourable to eating eggs, you will be aware of those beliefs and desires at the outset of deliberation. So you won't have to wait until you find yourself eating eggs to get the bad news. You will have it already when you feel that tickle in the tastebuds -- or whatever introspectible state it might be -- that manifests your desire for eggs. Your consequent choice tells you nothing more. By the time you decide whether to eat eggs, your credence function already has been modified by the evidence of the tickle. Then C(E/G) does not exceed C(E/-G), their difference is zero and so does not exceed g/e, -G is not V-maximal, and noncausal decision theory does not make the mistake of telling you not to eat the eggs.'
I reply that the Tickle Defence does establish that a Newcomb problem cannot arise for a fully rational agent, but that decision theory should not be limited to apply only to the fully rational agent. 8 Not so, at least, if rationality is taken to include self-knowledge. May we not ask what choice would be rational for the partly rational agent, and whether or not his partly rational methods of decision will steer him correctly? A partly rational agent may very well be in a moderate Newcomb problem, either because his choices are influenced by something besides his beliefs and desires or because he cannot quite tell the strengths of his beliefs and desires before he acts. ('How can I tell what I think
7 Discussed in Skyrms, op. cit.; and most fully presented in Ellery Eells, 'Causality, Utility and Decision', forthcoming in Synthese. Eells argues that Newcomb problems are stopped by assumptions of rationality and self-knowledge somewhat weaker than those of the simple Tickle Defence considered here, but even those weaker assumptions seem to me unduly restrictive.
8 In fact, it may not apply to the fully rational agent. It is hard to see how such an agent can be uncertain what he is going to choose, hence hard to see how he can be in a position to deliberate. See Richard C. Jeffrey, 'A Note on the Kinematics of Preference', Erkenntnis, 11 (1977): pp. 135-141. Further, the 'fully rational agent' required by the Tickle Defence is, in one way, not so very rational after all. Self-knowledge is an aspect of rationality, but so is willingness to learn from experience. If the agent's introspective data make him absolutely certain of his own credences and values, as they must if the Defence is to work, then no amount of evidence that those data are untrustworthy will ever persuade him not to trust them.
till I see what I say?' -- E. M. Forster.) For the dithery and the self-deceptive, no amount of Gedankenexperimente in decision can provide as much self-knowledge as the real thing. So even if the Tickle Defence shows that noncausal decision theory gives the right answer under powerful assumptions of rationality (whether or not for the right reasons), Newcomb problems still show that a general decision theory must be causal.
5. Causal Decision Theory

Causal decision theory works instead with a partition of dependency hypotheses: maximally specific propositions about how the things the agent cares about do and do not depend causally on his present actions. (Henceforth I reserve the variable K to range over the dependency hypotheses.) The expected utility of an option is defined by

U(A) =df Σ_K C(K) V(AK),

and the rule is to realise some U-maximal option. Applying the Rule of Averaging to the same partition gives

(5)    V(A) = Σ_K C(K/A) V(AK).
Let us give noncausal decision theory its due before we take leave of it. It works whenever the dependency hypotheses are probabilistically independent of the options, so that all the C(K/A)'s equal the corresponding C(K)'s. Then by (5) and the definition of U, the corresponding V(A)'s and U(A)'s also are equal. V-maximising gives the same right answers as U-maximising. The Tickle Defence seems to show that the K's must be independent of the A's for any fully rational agent. Even for partly rational agents, it seems plausible that they are at least close to independent in most realistic cases. Then indeed V-maximising works. But it works because the agent's beliefs about causal dependence are such as to make it work. It does not work for reasons which leave causal relations out of the story.
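A small numerical sketch of the point (all numbers invented): when each C(K/A) equals the corresponding C(K), the V(A)'s of (5) coincide with the U(A)'s, so the two rules agree.

```python
# A sketch (numbers invented) of the point just made: when each C(K/A)
# equals the corresponding C(K), the V(A)'s of (5) equal the U(A)'s.

C_K = {"K1": 0.3, "K2": 0.7}                        # unconditional C(K)
C_K_given = {("A1", "K1"): 0.3, ("A1", "K2"): 0.7,  # independence: C(K/A) = C(K)
             ("A2", "K1"): 0.3, ("A2", "K2"): 0.7}
V_AK = {("A1", "K1"): 4.0, ("A1", "K2"): -2.0,
        ("A2", "K1"): 1.0, ("A2", "K2"): 1.0}

def U(A):
    """U(A) = sum_K C(K) V(AK), with unconditional credences."""
    return sum(C_K[K] * V_AK[(A, K)] for K in C_K)

def V_of(A):
    """(5): V(A) = sum_K C(K/A) V(AK), with conditional credences."""
    return sum(C_K_given[(A, K)] * V_AK[(A, K)] for K in C_K)

for A in ("A1", "A2"):
    print(A, U(A), V_of(A))   # equal pair by pair, so the two rules agree
```

In a Newcomb problem the C(K/A)'s come apart from the C(K)'s, and with them V and U come apart as well.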
I am suggesting that we ought to undo a seeming advance in the development
of decision theory. Everyone agrees that it would be ridiculous to maximise the
'expected utility' defined by
Σ_Z C(Z) V(AZ)
where Z ranges over just any old partition. It would lead to different answers
for different partitions. For the partition of value-level propositions, for
instance, it would tell us fatalistically that all options are equally good! What to
do? Savage suggested, in effect, that we make the calculation with
unconditional credences, but make sure to use only the right sort of partition. 9
But what sort is that? Jeffrey responded that we would do better to make the
calculation with conditional credences, as in the right hand side of (2). Then we
need not be selective about partitions, since we get the same answer, namely
V(A), for all of them. In a way, Jeffrey himself was making decision theory
causal. But he did it by using probabilistic dependence as a mark of causal
dependence, and unfortunately the two need not always go together. So I have
thought it better to return to unconditional credences and say what sort of
partition is right.
As I have formulated it, causal decision theory is causal in two different
ways. The dependency hypotheses are causal in their content: they class worlds
together on the basis of likenesses of causal dependence. But also the
dependency hypotheses themselves are causally independent of the agent's
actions. They specify his influence over other things, but over them he has no
influence. (Suppose he did. Consider the dependency hypothesis which we get
by taking account of the ways the agent can manipulate dependency hypotheses
to enhance his control over other things. This hypothesis seems to be right no
matter what he does. Then he has no influence over whether this hypothesis or
another is right, contrary to our supposition that the dependency hypotheses are
within his influence.) Dependency hypotheses are 'act-independent states' in a
causal sense, though not necessarily in the probabilistic sense. If we say that the
right sort of partition for calculating expected utility is a causally act-independent one, then the partition of dependency hypotheses qualifies. But I
think it is better to say just that the right partition is the partition of dependency
hypotheses, in which case the emphasis is on their causal content rather than
their act-independence.
If any of the credences C(AK) is zero, the rule of U-maximising falls silent.
For in that case V(AK) becomes an undefined sum of quotients with
denominator zero, so U(A) in turn is undefined and A cannot be compared in
utility with the other options. Should that silence worry us? I think not, for the
case ought never to arise. It may seem that it arises in the most extreme sort of
Newcomb problem: suppose that taking the good is thought to make it
absolutely certain that the prior state obtains and the evil will follow. Then if A
is the option of taking the good and K says that the agent stands a chance of
escaping the evil, C(AK) is indeed zero and U(A) is indeed undefined. What
should you do in such an extreme Newcomb problem? V-maximise after all?
9 Leonard J. Savage, The Foundations of Statistics (New York: Wiley, 1954): p. 15. The suggestion is discussed by Richard C. Jeffrey in 'Savage's Omelet', in F. Suppe and P. D. Asquith, eds., PSA 1976, Volume 2 (East Lansing, Michigan: Philosophy of Science Association, 1977).
No; what you should do is not be in that problem in the first place. Nothing
should ever be held as certain as all that, with the possible exception of the
testimony of the senses. Absolute certainty is tantamount to a firm resolve
never to change your mind no matter what, and that is objectionable. However
much reason you may get to think that option A will not be realised if K holds,
you will not, if you are rational, lower C(AK) quite to zero. Let it by all means
get very, very small; but very, very small denominators do not make utilities go
undefined.
What of the partly rational agent, whom I have no wish to ignore? Might he
not rashly lower some credence C(AK) all the way to zero? I am inclined to
think not. What makes it so that someone has a certain credence is that its
ascription to him is part of a systematic pattern of ascriptions, both to him and
to others like him, both as they are and as they would have been had events
gone a bit differently, that does the best job overall of rationalising behaviour. 10
I find it hard to see how the ascription of rash zeros could be part of such a best
pattern. It seems that a pattern that ascribes very small positive values instead
always could do just a bit better, rationalising the same behaviour without
gratuitously ascribing the objectionable zeros. If I am right about this, rash zeros
are one sort of irrationality that is downright impossible. 11
6. Reformulations
The causal decision theory proposed above can be reformulated in various
equivalent ways. These will give us some further understanding of the theory,
and will help us in comparing it with other proposed versions of causal decision
theory.
Expansions: We can apply the Rule of Averaging to expand the V(AK)'s that
appear in our definition of expected utility. Let Z range over any partition. Then
we have
(6)
U(A) = Σ_K Σ_Z C(K) C(Z/AK) V(AKZ).
(If any C(AKZ) is zero we may take the term for K and Z as zero, despite the
fact that V(AKZ) is undefined.) This seems only to make a simple thing
complicated; but if the partition is well chosen, (6) may serve to express the
utility of an option in terms of quantities that we find it comparatively easy to
judge.
Let us call a partition rich iff, for every member S of that partition and for every option A and dependency hypothesis K, V(AS) equals V(AKS).
10 See my 'Radical Interpretation', Synthese, 27 (1974): pp. 331-344. I now think that discussion is too individualistic, however, in that it neglects the possibility that one might have a belief or desire entirely because the ascription of it to him is part of a systematic pattern that best rationalises the behaviour of other people. On this point, see my discussion of the madman in 'Mad Pain and Martian Pain', in Ned Block, ed., Readings in Philosophy of Psychology, Volume 1 (Cambridge, Massachusetts: Harvard University Press, 1980).
11 Those who think that credences can easily fall to zero often seem to have in mind credences conditional on some background theory of the world which is accepted, albeit tentatively, in an all-or-nothing fashion. While I don't object to this notion, it is not what I mean by credence. As I understand the term, what is open to reconsideration does not have a credence of zero or one; these extremes are not to be embraced lightly.
Then, for any rich partition, we have

(7)    U(A) = Σ_S [Σ_K C(K) C(S/AK)] V(AS).
Equation (7) for expected utility resembles equation (2) for expected value, except that the inner sum in (7) replaces the conditional credence C(S/A) in the corresponding instance of (2). As we shall see, the analogy can be pushed further. Two examples of rich partitions to which (7) applies are the partition of possible worlds and the partition of value-level propositions [V=v].
Imaging: Suppose we have a function that selects, for any pair of a world W and a suitable proposition X, a probability distribution W_X. Suppose further that W_X assigns probability only to X-worlds, so that W_X(X) equals one. (Hence at least the empty proposition must not be 'suitable'.) Call the function an imaging function, and call W_X the image of W on X. The image might be sharp, if W_X puts all its probability on a single world; or it might be blurred, with the probability spread over more than one world.
Given an imaging function, we can apply it to form images also of probability
distributions. We sum the superimposed images of all the worlds, weighting the
images by the original probabilities of their source worlds. For any pair of a
probability distribution C and a suitable proposition X, we define C_X, the image of C on X, as follows. First, for any world W',

C_X(W') = Σ_W C(W) W_X(W').
For our present purposes, what we want are images of the agent's credence
function on his various options. The needed imaging function can be defined in
terms of the partition of dependency hypotheses: let

W_A(W') =df C(W'/AK_W),

where K_W is the dependency hypothesis that holds at world W. It follows that

(8)    C_A(Y) = Σ_K C(K) C(Y/AK).
The inner sum in (7) therefore turns out to be the credence, imaged on A, of S.
So by (7) and (8) together,
(9)
U(A) = Σ_S C_A(S) V(AS).
Now we have something like the Rule of Averaging for expected value, except that the partition must be rich and we must image rather than conditionalise.
For the rich partition of possible worlds we have
(10)    U(A) = Σ_W C_A(W) V(W),

which resembles the definition of expected value. For the rich partition of
value-level propositions we have something resembling (3):

(11)    U(A) = Σ_v C_A([V=v]) v.
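The following sketch builds a four-world toy model (all numbers invented), defines the imaging function W_A(W') = C(W'/AK_W) as above, and checks that (10) returns the same utilities as the defining formula U(A) = Σ_K C(K) V(AK).

```python
# A four-world toy model (all numbers invented). Each world is tagged
# with the option A and dependency hypothesis K that hold there. The
# imaging function is W_A(W') = C(W'/AK_W), and the check is that (10),
# U(A) = sum_W C_A(W) V(W), matches U(A) = sum_K C(K) V(AK).

C = {"w1": 0.2, "w2": 0.2, "w3": 0.3, "w4": 0.3}
A_of = {"w1": "A1", "w2": "A2", "w3": "A1", "w4": "A2"}
K_of = {"w1": "K1", "w2": "K1", "w3": "K2", "w4": "K2"}
V = {"w1": 1.0, "w2": 0.0, "w3": 4.0, "w4": 2.0}

def cond(X):
    """Conditionalise C on the set of worlds X."""
    t = sum(C[w] for w in X)
    return {w: (C[w] / t if w in X else 0.0) for w in C}

def image(A):
    """C_A(W') = sum_W C(W) W_A(W'), with W_A(W') = C(W'/AK_W)."""
    out = {w: 0.0 for w in C}
    for W, p in C.items():
        AK = {w for w in C if A_of[w] == A and K_of[w] == K_of[W]}
        for Wp, q in cond(AK).items():
            out[Wp] += p * q
    return out

def U_by_10(A):
    CA = image(A)
    return sum(CA[w] * V[w] for w in C)

def U_by_def(A):
    return sum(sum(C[w] for w in C if K_of[w] == K)                  # C(K)
               * V[next(w for w in C if A_of[w] == A and K_of[w] == K)]
               for K in ("K1", "K2"))                                # one AK-world each

print(U_by_10("A1"), U_by_def("A1"))   # 2.8 and 2.8
print(U_by_10("A2"), U_by_def("A2"))   # 1.2 and 1.2
```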
7. Primitive Imaging: Sobel

Sobel's version of causal decision theory takes the imaging function as primitive. He defines expected utility in the manner of (10), and he advocates maximising the utility
so defined rather than expected value.
Sobel unites his decision theory with a treatment of counterfactual conditionals in terms of closest antecedent-worlds. 13 If W_A(W') is positive, then we think of W' as one of the A-worlds that is in some sense closest to the world W. What might be the case if it were the case that A, from the standpoint of W, is what holds at some such closest A-world; what would be the case if A, from the standpoint of W, is what holds at all of them. Sobel's apparatus gives us quantitative counterfactuals intermediate between the mights and the woulds. We can say that if it were that A, it would be with probability p that X; meaning that W_A(X) equals p, or in Sobel's terminology that X holds on a subset of the closest A-worlds whose tendencies, at W and on the supposition A, sum to p.
Though Sobel leaves the dependency hypotheses out of his decision theory,
we can perhaps bring them back in. Let us say that worlds image alike (on the
agent's options) iff, for each option, their images on that option are exactly the
same. Imaging alike is an equivalence relation, so we have the partition of its
equivalence classes. If we start with the dependency hypotheses and define the
imaging function as I did, it is immediate that worlds image alike iff they are
worlds where the same dependency hypothesis holds; so the equivalence
classes turn out to be just the dependency hypotheses.
The question is whether dependency hypotheses could be brought into
Sobel's theory by defining them as equivalence classes under the relation of
imaging alike. Each equivalence class could be described, in Sobel's
terminology, as a maximally specific proposition about the tendencies of the
world on all alternative suppositions about which option the agent realises. That
sounds like a dependency hypothesis to me. Sobel tells me (personal
communication, 1980) that he is inclined to agree, and does regard his decision
theory as causal; though it is hard to tell that from his written presentation, in
which causal language very seldom appears.
If the proposal is to succeed technically, we need the following thesis: if K_W is the equivalence class of W under the relation of imaging alike (of having the same tendencies on each option) then, for any option A and world W', W_A(W') equals C(W'/AK_W). If so, it follows that if we start as Sobel does with
the imaging function, defining the dependency hypotheses as equivalence
classes, and thence define an imaging function as I did, we will get back the
same imaging function that we started with. It further follows, by our results in
Section 6, that expected utility calculated in my way from the defined
dependency hypotheses is the same as expected utility calculated in Sobel's way
from the imaging function. They must be the same, if the defined dependency
hypotheses introduced into Sobel's theory are to play their proper role.
Unfortunately, the required thesis is not a part of Sobel's theory; it would be
an extra constraint on the imaging function. It does seem a very plausible
constraint, at least in ordinary cases. Sobel suspends judgement about imposing it.
13 As in my Counterfactuals (Oxford: Blackwell, 1973), without the complications raised by possible infinite sequences of closer and closer antecedent-worlds.
his influence. That means that on Skyrms' calculation his U(A)'s reduce to the corresponding V(A)'s, so V-maximising is right for him. That's wrong. Since
he thinks he has very little influence over whether he has the dread lesion, his
decision problem about eating eggs is very little different from that of someone
who thinks the lesion is entirely outside his influence. V-maximising should
come out wrong for very much the same reason in both cases.
No such difficulty threatens Skyrms' proposal broadly construed. The agent
may well wonder which of the causal factors narrowly construed are within his
influence, but he cannot rationally doubt that the dependency hypotheses are
entirely outside it. On the broad construal, Skyrms' second description of the
partition of hypotheses is a gloss on the first, not an amendment. The
hypotheses already specify which of the (narrow) factors are outside the agent's
influence, for that is itself a (broad) factor outside his influence. Skyrms notes
this, and that is why I think it must be the broad construal that he intends.
Likewise the degrees and directions of influence over (narrow) factors are
themselves (broad) factors outside the agent's influence, hence already
specified according to the broad construal of Skyrms' first description.
Often, to be sure, the difference between the broad and narrow construals
will not matter. There may well be a correlation, holding throughout the worlds
which enjoy significant credence, between dependency hypotheses and
combinations of (narrow) factors outside the agent's influence. The difference
between good and bad dependency hypotheses may in practice amount to the
difference between absence and presence of a lesion. However, I find it rash to
assume that there must always be some handy correlation to erase the
difference between the broad and narrow construals. Dependency hypotheses
do indeed hold in virtue of lesions and the like, but they hold also in virtue of
the laws of nature. It would seem that uncertainty about dependency
hypotheses might come at least partly from uncertainty about the laws.
Skyrms is sympathetic, as am I, 14 to the neo-Humean thesis that every contingent truth about a world -- law, dependency hypothesis, or what you will -- holds somehow in virtue of that world's total history of manifest matters of particular fact. Same history, same everything. But that falls short of implying
that dependency hypotheses hold just in virtue of causal factors, narrowly
construed; they might hold partly in virtue of dispersed patterns of particular
fact throughout history, including the future and the distant present. Further,
even if we are inclined to accept the neo-Humean thesis, it still seems safer not
to make it a presupposition of our decision theory. Whatever we think of the
neo-Humean thesis, I conclude that Skyrms' decision theory is best taken under
the broad construal of 'factor' under which his K's are the dependency
hypotheses and his calculation of utility is the same as mine. 15
14 Although sympathetic, I have some doubts; see my 'A Subjectivist's Guide to Objective Chance', in R. C. Jeffrey, ed., Studies in Inductive Logic and Probability, Volume 2 (Berkeley and Los Angeles: University of California Press, 1980): pp. 290-292.
15 The decision theory of Nancy Cartwright, 'Causal Laws and Effective Strategies', Noûs, 13 (1979): pp. 419-437, is, as she remarks, 'structurally identical' to Skyrms' theory for the case where value is a matter of reaching some all-or-nothing goal. However, hers is not a theory of
specify occurrences capable of causing and being caused, and the occurrences
must be entirely distinct. Further, we must exclude 'back-tracking
counterfactuals' based on reasoning from different supposed effects back to
different causes and forward again to differences in other effects. Suppose I am
convinced that stroking has no influence over purring, but that I wouldn't
stroke Bruce unless I were in a mood that gets him to purr softly by emotional
telepathy. Then I give credence to
the back-tracking counterfactual that if I were to stroke Bruce, he would purr softly; that conditional reflects no causal influence of stroking on purring, and counterfactuals of its sort must be excluded. With back-trackers excluded, our assumptions yield

(12)    C(A □→ S) = Σ_K C(K) C(S/AK).
(Comparing (12) with (8), we find that our present assumptions equate C(A □→ S) with C_A(S), the credence of S imaged on the option A.) Substituting (12) into (7) we have
(13)    U(A) = Σ_S C(A □→ S) V(AS),

which amounts to Gibbard and Harper's defining formula for the 'genuine expected utility' they deem it rational to maximise. 19
We have come the long way around to (13), which is not only simple but also intuitive in its own right. But (13) by itself does not display the causal character of Gibbard and Harper's theory, and that is what makes it worthwhile to come at it by way of dependency hypotheses. No single C(A □→ S) reveals the agent's causal views, since it sums the credences of hypotheses which set A □→ S in a pattern of dependence and others which set A □→ S in a pattern of independence. Consequently the roundabout approach helps us to appreciate what the theory of Gibbard and Harper has in common with that of someone like Skyrms who is reluctant to use counterfactuals in expressing dependency hypotheses.
18 Such a theory is defended in Terence Horgan, 'Counterfactuals and Newcomb's Problem', Journal of Philosophy (forthcoming).
19 To get exactly their formula, take their 'outcomes' as conjunctions AS with 'desirability' given by V(AS); and bear in mind (i) that A □→ AS is the same as A □→ S, and (ii) that if A and A' are contraries, A □→ A'S is the empty proposition with credence zero.
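Here is a small sketch of (12) and (13) in the same toy style (credences and values invented): C(A □→ S) is assembled from the dependency hypotheses, and U comes out as the counterfactual-weighted average of values.

```python
# A sketch of (12) and (13), same toy style, credences and values
# invented: two dependency hypotheses, one option A1, a two-cell rich
# partition {S1, S2}.

C_K = {"K1": 0.4, "K2": 0.6}
C_S_given_AK = {("A1", "K1", "S1"): 0.9, ("A1", "K1", "S2"): 0.1,
                ("A1", "K2", "S1"): 0.2, ("A1", "K2", "S2"): 0.8}
V_AS = {("A1", "S1"): 10.0, ("A1", "S2"): 0.0}

def c_box(A, S):
    """(12): C(A box-> S) = sum_K C(K) C(S/AK)."""
    return sum(C_K[K] * C_S_given_AK[(A, K, S)] for K in C_K)

def U(A):
    """(13): U(A) = sum_S C(A box-> S) V(AS)."""
    return sum(c_box(A, S) * V_AS[(A, S)] for S in ("S1", "S2"))

print(c_box("A1", "S1"))   # 0.4*0.9 + 0.6*0.2 = 0.48
print(U("A1"))             # 0.48*10 + 0.52*0 = 4.8
```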
Observe that this hypothesis addresses itself not only to the question of whether
loud and soft purring are within my influence, but also to the question of the
extent and the direction of my influence.
If a chance proposition says that one of the S's has a chance of one, it must
say that the others all have chances of zero. Call such a chance proposition
extreme. I shall not distinguish between an extreme chance proposition and the
S that it favours. If they differ, it is only on worlds where something with zero
chance nevertheless happens. I am inclined to think that they do not differ at
all, since there are no worlds where anything with zero chance happens; the
contrary opinion comes of mistaking infinitesimals for zero. But even if there is
a difference between extreme chance propositions and their favoured S's, it will
not matter to calculations of utility so let us neglect it. Then our previous
dependency hypotheses, the conjunctions of full patterns, are subsumed under
the conjunctions of probabilistic full patterns. So are the conjunctions of mixed
full patterns that consist partly of A □→ S's and partly of A □→ [P=p]'s.
Dare we assume that there is a probabilistic full pattern for every world, so
that on this second try we have succeeded in capturing all the dependency
hypotheses by means of counterfactuals? I shall assume it, not without
misgivings. That means accepting a special case of Conditional Excluded
Middle, but (i) the Chance Objection will not arise again, 23 (ii) there should not
be too much need for arbitrary choice on other grounds, since the options are
quite specific suppositions and not far-fetched, and (iii) limited arbitrary choice
results in nothing worse than a limited risk of the answers going indeterminate.
So my own causal decision theory consists of two theses. My main thesis is
that we should maximise expected utility calculated by means of dependency
hypotheses. It is this main thesis that I claim is implicitly accepted also by
Gibbard and Harper, Skyrms, and Sobel. My subsidiary thesis, which I put
forward much more tentatively and which I won't try to foist on my allies, is
that the dependency hypotheses are exactly the conjunctions of probabilistic full
patterns.
(The change I have made in the Gibbard-Harper version has been simply to
replace the rich partition of S's by the partition of chance propositions [P=p]
pertaining to these S's. One might think that perhaps that was no change at all:
perhaps the S's already were the chance propositions for some other rich
partition. However, I think it at least doubtful that the chance propositions can
be said to 'specify combinations of occurrences' as the S's were required to do.
This question would lead us back to the neo-Humean thesis discussed in Section
8.)
Consider some particular A and S. If a dependency hypothesis K is the
conjunction of a probabilistic full pattern, then for some p, K implies
A □→ [P=p]. Then AK implies [P=p]; and C(S/AK) equals p(S), at least in any ordinary case. 24 For any p, the K's that are conjunctions of probabilistic full patterns having A □→ [P=p] as a conjunct form a partition of A □→ [P=p]. So we have

(14)    Σ_K C(K) C(S/AK) = Σ_p C(A □→ [P=p]) p(S).
23 Chances aren't chancy; if [P=p] pertains to a certain time, its own chance at that time of holding must be zero or one, by the argument of 'A Subjectivist's Guide to Objective Chance': pp. 276-277.
24 That follows by what I call the Principal Principle connecting chance and credence, on the
Substituting (14) into (7) gives us a formula defining expected utility in terms
of counterfactuals with chance propositions as consequents:
(15)    U(A) = Σ_S Σ_p C(A □→ [P=p]) p(S) V(AS).
For any S and any number q from zero to one, let [P(S)=q] be the
proposition that holds at just those worlds where the chance of S, at the time
when the agent realises his option, is q. It is the disjunction of those [P=p]'s for
which p(S) equals q. We can lump together counterfactuals in (14) and (15) to
obtain reformulations in which the consequents concern chances of single S's:
(16)    Σ_K C(K) C(S/AK) = Σ_q C(A □→ [P(S)=q]) q,

(17)    U(A) = Σ_S Σ_q C(A □→ [P(S)=q]) q V(AS).
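The lumping that takes (14) to (16) can be pictured as a mere regrouping of terms, as in this sketch (numbers invented): the K's are grouped by the chance q = p(S) their patterns assign to a fixed S under the option.

```python
# The lumping that takes (14) to (16), as a regrouping of terms (numbers
# invented): group the K's by the chance q = p(S) their patterns assign
# to a fixed S under option A1, so that C(A1 box-> [P(S)=q]) is the total
# credence of each group.

from collections import defaultdict

C_K = {"K1": 0.2, "K2": 0.3, "K3": 0.5}
p_S = {"K1": 0.9, "K2": 0.9, "K3": 0.1}   # chance of S under A1, per hypothesis

# Left side, as in (14): sum_K C(K) C(S/A1 K), with C(S/A1 K) = p_K(S).
lhs = sum(C_K[K] * p_S[K] for K in C_K)

# Right side, as in (16): sum_q C(A1 box-> [P(S)=q]) q.
groups = defaultdict(float)
for K, c in C_K.items():
    groups[p_S[K]] += c                   # total credence of each [P(S)=q]
rhs = sum(cred * q for q, cred in groups.items())

print(lhs, rhs)   # both 0.5: the regrouping changes nothing
```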
That concludes an exposition and survey of causal decision theory. In this final
section, I wish to defend it against an objection raised by Daniel Hunter and
Reed Richter. 25 Their target is the Gibbard-Harper version; but it depends on
nothing that is special to that version, so I shall restate it as an objection against
causal decision theory generally.
Suppose you are one player in a two-person game. Each player can play red,
play white, play blue, or not play. If both play the same colour, each gets a
thousand dollars; if they play different colours, each loses a thousand dollars; if
one or both don't play, the game is off and no money changes hands. Value
goes by money; the game is played only once; there is no communication or
prearrangement between the players; and there is nothing to give a hint in
favour of one colour or another -- no 'Whites rule OK!' sign placed where both
can see that both can see it, or the like. So far, this game seems not worthwhile.
But you have been persuaded that you and the other player are very much alike
psychologically and hence very likely to choose alike, so that you are much
more likely to play and win than to play and lose. Is it rational for you to play?
Yes. So say I, so say Hunter and Richter, and so (for what it is worth) says
noncausal decision theory. But causal decision theory seems to say that it is not
rational to play. If it says that, it is wrong and stands refuted. It seems that you
have four dependency hypotheses to consider, corresponding to the four ways
your partner might play:
K1: Whatever you do, your partner would play red;
K2: Whatever you do, your partner would play white;
K3: Whatever you do, your partner would play blue;
K4: Whatever you do, your partner would not play.
By the symmetry of the situation, K1 and K2 and K3 should get equal credence.
Then the expected utility of not playing is zero, whereas the expected utilities of
playing the three colours are equal and negative. So we seem to reach the
unwelcome conclusion that not playing is your U-maximal option.
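Here, as a sketch with the stipulated payoffs and the symmetric credences just described, is the calculation that yields the unwelcome conclusion.

```python
# The calculation that yields the unwelcome conclusion, with the payoffs
# of the story and equal credence 1/4 for K1..K4.

c = 0.25                      # C(K_i) for each of the four hypotheses
win, lose = 1000.0, -1000.0

U_not_play = 0.0              # no money changes hands, whatever K holds

# Playing red: win under K1 (partner plays red), lose under K2 and K3
# (partner plays white or blue), nothing under K4 (game is off).
U_play_red = c * win + c * lose + c * lose + c * 0.0

print(U_not_play, U_play_red)   # 0.0 versus -250.0: not playing looks U-maximal
```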
I reply that Hunter and Richter have gone wrong by misrepresenting your partition of options. Imagine that you have a servant. You can play red, white, or blue; you can not play; or you can tell your servant to play for you. The fifth option, delegating the choice, might be the one that beats not playing and makes it rational to play. Given the servant, each of our previous dependency hypotheses splits in three. For instance K1 splits into:
K1,1: Whatever you do, your partner would play red, and your servant would play red if you delegated the choice;
K1,2: Whatever you do, your partner would play red, and your servant would play white if you delegated the choice;
K1,3: Whatever you do, your partner would play red, and your servant would play blue if you delegated the choice.
(If you and your partner are much alike, he too has a servant, so we can split further by dividing the case in which he plays red, for instance, into the case in which he plays red for himself and the case in which he delegates his choice and his servant plays red for him. However, that difference doesn't matter to you and is outside your influence, so let us disregard it.) The information that you and your partner (and your respective servants) are much alike might persuade you to give little credence to the dependency hypotheses K1,2 and K1,3 but to give more to K1,1; and likewise for the subdivisions of K2 and K3. Then you give your credence mostly to dependency hypotheses according to which you would either win or break even by delegating your choice. Then causal decision theory does not tell you, wrongly, that it is rational not to play. Playing by delegating your choice is your U-maximal option.
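With the fifth option in place, here is a sketch of the corrected calculation; the split credences are invented, chosen to reflect the supposition that you and your partner are very much alike.

```python
# The corrected calculation with the fifth option (split credences
# invented, chosen so that hypotheses like K1,1, on which your servant
# matches your partner's play, get most of the credence).

win, lose = 1000.0, -1000.0

c_partner_plays = 0.75        # as before: 1/4 credence that he sits out
c_match = 0.9                 # given that he plays: the delegated choice
                              # would match his colour ("much alike")

U_delegate = c_partner_plays * (c_match * win + (1 - c_match) * lose)
# If the partner doesn't play, the game is off and the payoff is zero.

print(U_delegate)             # 0.75 * (900 - 100) = 600.0 > 0: play, by delegating
```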
But you don't have a servant. What of it? You must have a tie-breaking procedure. There must be something or other that you do after deliberation that ends in a tie. Delegating your choice to your tie-breaking procedure is a fifth
option for you, just as delegating it to your servant would be if you had one. If
you are persuaded that you will probably win if you play because you and your
partner are alike psychologically, it must be because you are persuaded that
your tie-breaking procedures are alike. You could scarcely think that the two of
you are likely to coordinate without resorting to your tie-breaking procedures,
since ex hypothesi the situation plainly is a tie! So you have a fifth option, and as
the story is told, it has greater expected utility than not playing. This is not the
option of playing red, or white, or blue, straightway at the end of deliberation,
although if you choose it you will indeed end up playing red or white or blue.
What makes it a different option is that it interposes something extra -- something other than deliberation -- after you are done deliberating and before you play. 26
Princeton University
26 This paper is based on a talk given at a conference on Conditional Expected Utility at the University of Pittsburgh in November 1978. It has benefited from discussions and correspondence with Nancy Cartwright, Allan Gibbard, William Harper, Daniel Hunter, Frank Jackson, Richard Jeffrey, Gregory Kavka, Reed Richter, Brian Skyrms, J. Howard Sobel, and Robert Stalnaker.