Conditional Probability and Independence
Example 4.1. Assume that you have a bag with 11 cubes, 7 of which have a fuzzy surface and
4 are smooth. Out of the 7 fuzzy ones, 3 are red and 4 are blue; out of 4 smooth ones, 2 are red
and 2 are blue. So, there are 5 red and 6 blue cubes. Other than color and fuzziness, the cubes
have no other distinguishing characteristics.
You plan to pick a cube out of the bag at random, but forget to wear gloves. Before you
start your experiment, the probability that the selected cube is red is 5/11. Now, you reach into
the bag, grab a cube, and notice it is fuzzy (but you do not take it out or note its color in any
other way). Clearly, the probability should now change to 3/7!
Your experiment clearly has 11 outcomes. Consider the events R, B, F , S that the selected
cube is red, blue, fuzzy, and smooth, respectively. We observed that P (R) = 5/11. For the
probability of a red cube, conditioned on it being fuzzy, we do not have notation, so we introduce
it here: P(R|F) = 3/7. Note that this also equals

P(R ∩ F)/P(F) = P(the selected cube is red and fuzzy)/P(the selected cube is fuzzy).
This conveys the idea that, with additional information, the probability must be adjusted.
This is common in real life. Say bookies estimate your basketball team's chances of winning a
particular game to be 0.6, 24 hours before the game starts. Two hours before the game starts,
however, it becomes known that your team's star player is out with a sprained ankle. You
cannot expect that the bookies' odds will remain the same and they change, say, to 0.3. Then,
the game starts and at half-time your team leads by 25 points. Again, the odds will change, say
to 0.8. Finally, when complete information (that is, the outcome of your experiment, the game
in this case) is known, all probabilities are trivial, 0 or 1.
For the general definition, take events A and B, and assume that P(B) > 0. The conditional
probability of A given B equals

P(A|B) = P(A ∩ B)/P(B).
Example 4.2. Here is a question asked on Wall Street job interviews. (This is the original
formulation; the macabre tone is not unusual for such interviews.)
Let's play a game of Russian roulette. You are tied to your chair. Here's a gun, a revolver.
Here's the barrel of the gun, six chambers, all empty. Now watch me as I put two bullets into
the barrel, into two adjacent chambers. I close the barrel and spin it. I put a gun to your head
and pull the trigger. Click. Lucky you! Now I'm going to pull the trigger one more time. Which
would you prefer: that I spin the barrel first or that I just pull the trigger?
Assume that the barrel rotates clockwise after the hammer hits and is pulled back. You are
given the choice between an unconditional and a conditional probability of death. The former,
if the barrel is spun again, remains 2/6 = 1/3. The latter applies if the trigger is just pulled
without the extra spin: given the click, the hammer is equally likely to have struck any of the
four empty chambers, and death occurs exactly when it struck the one empty chamber that is
next to a bullet in the counterclockwise direction, so the conditional probability equals 1/4.
For a fixed condition B, and acting on events A, the conditional probability Q(A) = P(A|B)
satisfies the three axioms in Chapter 3. (This is routine to check and the reader who is more
theoretically inclined might view it as a good exercise.) Thus, Q is another probability assignment
and all consequences of the axioms are valid for it.
Example 4.3. Toss two fair coins, blindfolded. Somebody tells you that you tossed at least
one Heads. What is the probability that both tosses are Heads?
Here A = {both H}, B = {at least one H}, and

P(A|B) = P(A ∩ B)/P(B) = P(both H)/P(at least one H) = (1/4)/(3/4) = 1/3.
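Conditional probabilities in small, finite experiments like this one are easy to sanity-check by enumerating the equally likely outcomes directly. A minimal Python sketch (the helper name cond_prob is ours, not from the text):

```python
from itertools import product

# The 4 equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

def cond_prob(event_a, event_b):
    """P(A | B), computed by counting equally likely outcomes."""
    b = [o for o in outcomes if event_b(o)]
    ab = [o for o in b if event_a(o)]
    return len(ab) / len(b)

p = cond_prob(lambda o: o.count("H") == 2,   # A = both Heads
              lambda o: o.count("H") >= 1)   # B = at least one Heads
# p = 1/3, as computed above
```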
Example 4.4. Toss a coin 10 times. If you know (a) that exactly 7 Heads are tossed, (b) that
at least 7 Heads are tossed, what is the probability that your first toss is Heads?
For (a),

P(first toss H | exactly 7 H's) = [C(9,6) (1/2^10)] / [C(10,7) (1/2^10)] = C(9,6)/C(10,7) = 7/10.

Why is this not surprising? Conditioned on 7 Heads, they are equally likely to occur on any
given 7 tosses. If you choose 7 tosses out of 10 at random, the first toss is included in your
choice with probability 7/10.
For (b), the answer is, after canceling 1/2^10,

[C(9,6) + C(9,7) + C(9,8) + C(9,9)] / [C(10,7) + C(10,8) + C(10,9) + C(10,10)] = 65/88 ≈ 0.7386.

Clearly, the answer should be a little larger than before, because this condition is more
advantageous for Heads.
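Both answers in Example 4.4 can be verified mechanically with binomial coefficients; a quick sketch using Python's math.comb:

```python
from math import comb

# (a) P(first toss H | exactly 7 Heads): fix the first toss as Heads and
# place the remaining 6 Heads among the other 9 tosses.
p_a = comb(9, 6) / comb(10, 7)                    # = 7/10

# (b) P(first toss H | at least 7 Heads): sum the same counts over 7..10 Heads.
p_b = (sum(comb(9, k - 1) for k in range(7, 11))
       / sum(comb(10, k) for k in range(7, 11)))  # = 65/88
```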
Conditional probabilities are sometimes given, or can be easily determined, especially in
sequential random experiments. Then, we can use
P(A1 ∩ A2) = P(A1)P(A2|A1),
etc.
Example 4.5. An urn contains 10 black and 10 white balls. Draw 3 (a) without replacement,
and (b) with replacement. What is the probability that all three are white?
We already know how to do part (a):
1. Number of outcomes: C(20,3).
2. Number of good outcomes: C(10,3).
The answer, then, is C(10,3)/C(20,3) = 2/19.
To do this problem another way, imagine drawing the balls sequentially. Then, we are
computing the probability of the intersection of the three events: P(1st ball is white, 2nd ball
is white, and 3rd ball is white). The relevant probabilities are:
1. P(1st ball is white) = 1/2.
2. P(2nd ball is white | 1st ball is white) = 9/19.
3. P(3rd ball is white | 1st and 2nd balls are white) = 8/18.
Our probability is, then, the product (1/2) · (9/19) · (8/18) = 2/19, the same as before.
This approach is particularly easy in case (b), where the previous colors of the selected balls
do not affect the probabilities at subsequent stages. The answer, therefore, is (1/2)^3.
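The agreement between the counting approach and the sequential-conditioning approach is easy to confirm numerically; a brief sketch:

```python
from math import comb

# (a) Without replacement: counting vs. sequential conditioning.
p_count = comb(10, 3) / comb(20, 3)   # favorable outcomes / all outcomes
p_seq = (10/20) * (9/19) * (8/18)     # P(1st W) P(2nd W | 1st W) P(3rd W | ...)

# (b) With replacement: the three draws are independent.
p_repl = (1/2) ** 3
```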
Theorem 4.1. First Bayes formula. Assume that F1, ..., Fn are pairwise disjoint and that
F1 ∪ ... ∪ Fn = Ω, that is, exactly one of them always happens. Then, for an event A,

P(A) = P(F1)P(A|F1) + P(F2)P(A|F2) + ... + P(Fn)P(A|Fn).

Proof.

P(F1)P(A|F1) + P(F2)P(A|F2) + ... + P(Fn)P(A|Fn) = P(A ∩ F1) + ... + P(A ∩ Fn)
= P((A ∩ F1) ∪ ... ∪ (A ∩ Fn))
= P(A ∩ (F1 ∪ ... ∪ Fn))
= P(A ∩ Ω) = P(A).
Example 4.6. Flip a fair coin; if you toss Heads, roll one die, and if you toss Tails, roll two
dice. Compute the probability that you roll exactly one 6.
Here you condition on the outcome of the coin toss, which could be Heads (event F) or Tails
(event F^c). If A = {exactly one 6}, then P(A|F) = 1/6, P(A|F^c) = 2 · 5/36 = 10/36,
P(F) = P(F^c) = 1/2 and so

P(A) = P(F)P(A|F) + P(F^c)P(A|F^c) = 2/9.
Example 4.7. Roll a die, then select at random, without replacement, as many cards from the
deck as the number shown on the die. What is the probability that you get at least one Ace?
Here Fi = {number shown on the die is i}, for i = 1, ..., 6. Clearly, P(Fi) = 1/6. If A is the
event that you get at least one Ace, then P(A|F1) = 4/52 = 1/13 and, in general,

P(A|Fi) = 1 − C(48,i)/C(52,i),

so that, by the first Bayes formula, P(A) = (1/6) Σ_{i=1}^{6} [1 − C(48,i)/C(52,i)] ≈ 0.2439.
Conditioning on the outcome of the last trial also yields recursions such as

p_{k,r} = (r/n) p_{k−1,r} + ((n − r + 1)/n) p_{k−1,r−1},

the recursion for the probability p_{k,r} of seeing exactly r distinct values in k draws, with
replacement, from n distinct items.
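The at-least-one-Ace computation of Example 4.7 is a typical first-Bayes-formula average, and a short sketch evaluates it exactly as written:

```python
from math import comb

# P(at least one Ace | die shows i) = 1 - C(48, i)/C(52, i); average over
# the six equally likely die outcomes (first Bayes formula).
p_ace = sum((1 - comb(48, i) / comb(52, i)) / 6 for i in range(1, 7))
# p_ace is approximately 0.2439
```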
Theorem 4.2. Second Bayes formula. Let F1, ..., Fn and A be as in Theorem 4.1. Then

P(Fj|A) = P(Fj ∩ A)/P(A) = P(A|Fj)P(Fj) / [P(A|F1)P(F1) + ... + P(A|Fn)P(Fn)].
An event Fj is often called a hypothesis, P (Fj ) its prior probability, and P (Fj |A) its posterior
probability.
Example 4.9. We have a fair coin and an unfair coin, which always comes out Heads. Choose
one at random, toss it twice. It comes out Heads both times. What is the probability that the
coin is fair?
The relevant events are F = {fair coin}, U = {unfair coin}, and B = {both tosses H}. Then
P(F) = P(U) = 1/2 (as each coin is chosen with equal probability). Moreover, P(B|F) = 1/4 and
P(B|U) = 1. Our probability then is

P(F|B) = [(1/2) · (1/4)] / [(1/2) · (1/4) + (1/2) · 1] = (1/8)/(5/8) = 1/5.
Example 4.10. A factory has three machines, M1 , M2 and M3 , that produce items (say,
lightbulbs). It is impossible to tell which machine produced a particular item, but some are
defective. Here are the known numbers:
machine    proportion of items made    probability an item is defective
M1         0.2                         0.001
M2         0.3                         0.002
M3         0.5                         0.003
You pick an item, test it, and find it is defective. What is the probability that it was made
by machine M2?
The best way to think about this random experiment is as a two-stage procedure. First you
choose a machine with the probabilities given by the proportion. Then, that machine produces
an item, which you then proceed to test. (Indeed, this is the same as choosing the item from a
large number of them and testing it.)
Let D be the event that an item is defective and let Mi also denote the event that the
item was made by machine i. Then, P (D|M1 ) = 0.001, P (D|M2 ) = 0.002, P (D|M3 ) = 0.003,
P (M1 ) = 0.2, P (M2 ) = 0.3, P (M3 ) = 0.5, and so
P(M2|D) = (0.002 · 0.3) / (0.001 · 0.2 + 0.002 · 0.3 + 0.003 · 0.5) ≈ 0.26.
Example 4.11. Assume 10% of people have a certain disease. A test gives the correct diagnosis
with probability of 0.8; that is, if the person is sick, the test will be positive with probability 0.8,
but if the person is not sick, the test will be positive with probability 0.2. A random person from
the population has tested positive for the disease. What is the probability that he is actually
sick? (No, it is not 0.8!)
Let us define the three relevant events: S = {sick}, H = {healthy}, T = {tested positive}.
Now, P (H) = 0.9, P (S) = 0.1, P (T |H) = 0.2 and P (T |S) = 0.8. We are interested in
P(S|T) = P(T|S)P(S) / [P(T|S)P(S) + P(T|H)P(H)] = 0.08/(0.08 + 0.18) = 8/26 ≈ 31%.
Note that the prior probability P(S) is very important! Without a very good idea about what
it is, a positive test result is difficult to evaluate: a positive test for HIV would mean something
very different for a random person as opposed to somebody who gets tested because of risky
behavior.
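The disease-test posterior can be checked in a couple of lines; a sketch (the variable names are ours, not from the text):

```python
# Disease-test example: prior 10% sick, test correct with probability 0.8.
p_sick, p_healthy = 0.1, 0.9
p_pos_sick, p_pos_healthy = 0.8, 0.2   # P(positive | sick), P(positive | healthy)

# Second Bayes formula.
posterior = (p_pos_sick * p_sick
             / (p_pos_sick * p_sick + p_pos_healthy * p_healthy))  # = 8/26
```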
Example 4.12. O. J. Simpson's first trial, 1995. The famous sports star and media personality
O. J. Simpson was on trial in Los Angeles for the murder of his wife and her boyfriend. One of the
many issues was whether Simpson's history of spousal abuse could be presented by the prosecution
at the trial; that is, whether this history was probative, i.e., had some evidentiary value,
or whether it was merely prejudicial and should be excluded. Alan Dershowitz, a famous
professor of law at Harvard and a consultant for the defense, was claiming the latter, citing the
statistic that fewer than 0.1% of men who abuse their wives end up killing them. As J. F. Merz and
J. C. Caulkins pointed out in the journal Chance (Vol. 8, 1995, pg. 14), this was the wrong
probability to look at!
We need to start with the fact that a woman is murdered. These numbered 4, 936 in 1992,
out of which 1, 430 were killed by partners. In other words, if we let
A = {the (murdered) woman was abused by the partner},
M = {the woman was murdered by the partner},
then we estimate the prior probabilities P(M) = 0.29, P(M^c) = 0.71, and what we are interested
in is the posterior probability P(M|A). It was also commonly estimated at the time that
about 5% of the women had been physically abused by their husbands. Thus, we can say that
P(A|M^c) = 0.05, as there is no reason to assume that a woman murdered by somebody else
was more or less likely to be abused by her partner. The final number we need is P(A|M).
Dershowitz states that a considerable number of wife murderers had previously assaulted
them, although some did not. So, we will (conservatively) say that P(A|M) = 0.5. (The
two-stage experiment then is: choose a murdered woman at random; at the rst stage, she is
murdered by her partner, or not, with stated probabilities; at the second stage, she is among
the abused women, or not, with probabilities depending on the outcome of the rst stage.) By
Bayes formula,
P(M|A) = P(M)P(A|M) / [P(M)P(A|M) + P(M^c)P(A|M^c)]
       = (0.29 · 0.5)/(0.29 · 0.5 + 0.71 · 0.05) = 29/36.1 ≈ 0.8.
The law literature studiously avoids quantifying concepts such as probative value and reasonable
doubt. Nevertheless, we can probably say that 80% is considerably too high, compared to the
prior probability of 29%, to use as a sole argument that the evidence is not probative.
Independence
Events A and B are independent if P (A B) = P (A)P (B) and dependent (or correlated )
otherwise.
Assuming that P (B) > 0, one could rewrite the condition for independence,
P (A|B) = P (A),
so the probability of A is unaffected by knowledge that B occurred. Also, if A and B are
independent,

P(A ∩ B^c) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(B^c),

so A and B^c are also independent: knowing that B has not occurred also has no influence on
the probability of A. Another fact to notice immediately is that disjoint events with nonzero
probability cannot be independent: given that one of them happens, the other cannot happen
and thus its probability drops to zero.
Quite often, independence is an assumption, and it is the most important concept in probability theory.
Example 4.13. Pick a random card from a full deck. Let A = {card is an Ace} and R =
{card is red}. Are A and R independent?
We have P(A) = 1/13, P(R) = 1/2 and, as there are two red Aces, P(A ∩ R) = 2/52 = 1/26 = P(A)P(R).
The two events are independent: the proportion of Aces among the red cards is the same as the
proportion among all cards.
Now, pick two cards out of the deck sequentially without replacement. Are F = {first card
is an Ace} and S = {second card is an Ace} independent?
Now P(F) = P(S) = 1/13, while P(S|F) = 3/51 ≠ 1/13, so F and S are not independent.
Example 4.14. Toss 2 fair coins and let F = {Heads on 1st toss}, S = {Heads on 2nd toss}.
These are independent. You will notice that here the independence is in fact an assumption.
How do we dene independence of more than two events? We say that events A1 , A2 , . . . , An
are independent if
P(A_{i1} ∩ ... ∩ A_{ik}) = P(A_{i1}) P(A_{i2}) · · · P(A_{ik}),

where 1 ≤ i1 < i2 < ... < ik ≤ n are arbitrary indices. The occurrence of any combination
of events does not influence the probability of others. Again, it can be shown that, in such a
collection of independent events, we can replace an Ai by Ai^c and the events remain independent.
Example 4.15. Roll a four-sided fair die, that is, choose one of the numbers 1, 2, 3, 4 at
random. Let A = {1, 2}, B = {1, 3}, C = {1, 4}. Check that these are pairwise independent
(each pair is independent), but not independent.
Indeed, P(A) = P(B) = P(C) = 1/2 and P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4, so each
pair is independent, but

P(A ∩ B ∩ C) = 1/4 ≠ 1/2 · 1/2 · 1/2 = 1/8,

so the three events are not independent.
Example 4.16. In each round of a game, you roll a die and toss a fair coin. If you roll a 6, you
win; if you do not roll a 6 and the coin comes up Heads, you lose; otherwise, another round is
played, until the game is decided. Compute the probability that you win.
Summing over the round in which you win,

P(W) = 1/6 + (5/6)(1/2)(1/6) + ((5/6)(1/2))^2 (1/6) + ...,

and then we sum the geometric series. Important note: we have implicitly assumed independence
between the coin and the die, as well as between different tosses and rolls. This is very common
in problems such as this!
You can avoid the nuisance, however, by the following trick. Let
D = {game is decided on 1st round},
W = {you win}.
The events D and W are independent, which one can certainly check by computation, but, in
fact, there is a very good reason to conclude so immediately. The crucial observation is that,
provided that the game is not decided in the 1st round, you are thereafter facing the same game
with the same winning probability; thus
P (W |D c ) = P (W ).
In other words, D c and W are independent and then so are D and W , and therefore
P (W ) = P (W |D).
This means that one can solve this problem by computing the relevant probabilities for the 1st
round:

P(W|D) = P(W ∩ D)/P(D) = (1/6)/(1/6 + 5/12) = 2/7,

which is our answer.
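The "decided round" trick can be checked against the geometric series with exact arithmetic; a sketch using fractions (round probabilities as in the die-and-coin game above):

```python
from fractions import Fraction

# One round: win if the die shows 6; otherwise lose if the coin is Heads;
# otherwise the round is repeated.
win = Fraction(1, 6)
lose = Fraction(5, 6) * Fraction(1, 2)
repeat = 1 - win - lose                  # = 5/12

p_trick = win / (win + lose)             # P(W) = P(W | D), the shortcut
p_series = win / (1 - repeat)            # summing the geometric series
```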
Example 4.17. Craps. Many casinos allow you to bet even money on the following game. Two
dice are rolled and the sum S is observed.
If S ∈ {7, 11}, you win immediately.
If S ∈ {2, 3, 12}, you lose immediately.
If S ∈ {4, 5, 6, 8, 9, 10}, the pair of dice is rolled repeatedly until one of the following
happens:
– S repeats, in which case you win.
– 7 appears, in which case you lose.
What is the winning probability?
Let us look at all possible ways to win:
1. You win on the first roll with probability 8/36.
2. Otherwise, your first roll gives a point S ∈ {4, 5, 6, 8, 9, 10}, and you then win if S appears
before 7. Conditioned on the game being decided, each subsequent roll ends it either with S
(you win) or with 7 (you lose), so you:
– roll 4 (probability 3/36), then win with probability 3/(3+6) = 1/3;
– roll 5 (probability 4/36), then win with probability 4/(4+6) = 2/5;
– roll 6 (probability 5/36), then win with probability 5/(5+6) = 5/11;
– roll 8 (probability 5/36), then win with probability 5/11;
– roll 9 (probability 4/36), then win with probability 2/5;
– roll 10 (probability 3/36), then win with probability 1/3.
Using the first Bayes formula, the winning probability is

8/36 + 2 · (3/36 · 1/3 + 4/36 · 2/5 + 5/36 · 5/11) = 244/495 ≈ 0.4929.
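The craps computation is a pleasant exercise in exact arithmetic; a short sketch that reproduces 244/495:

```python
from fractions import Fraction

# Number of ways each sum s of two dice can occur (out of 36).
ways = {s: 6 - abs(s - 7) for s in range(2, 13)}

p_win = Fraction(0)
for s, w in ways.items():
    p_first = Fraction(w, 36)
    if s in (7, 11):             # immediate win
        p_win += p_first
    elif s in (2, 3, 12):        # immediate loss
        pass
    else:                        # point s: s must repeat before a 7 appears
        p_win += p_first * Fraction(w, w + 6)
```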
Bernoulli trials
Assume n independent experiments, each of which is a success with probability p and, thus,
a failure with probability 1 − p.
In a sequence of n Bernoulli trials,

P(exactly k successes) = C(n,k) p^k (1 − p)^{n−k}.
This is because the successes can occur on any subset S of k trials out of n, i.e., on any
S ⊆ {1, ..., n} with cardinality k. These possibilities are disjoint, as exactly k successes cannot
occur on two different such sets. There are C(n,k) such subsets; if we fix such an S, then successes
must occur on the k trials in S and failures on all n − k trials not in S; the probability that this
happens, by the assumed independence, is p^k (1 − p)^{n−k}.
Example 4.18. A machine produces items which are independently defective with probability
p. Let us compute a few probabilities:
1. P(exactly two items among the first 6 are defective) = C(6,2) p^2 (1 − p)^4.
2. P(at least one item among the first 6 is defective) = 1 − P(no defects) = 1 − (1 − p)^6.
3. P(at least 2 items among the first 6 are defective) = 1 − (1 − p)^6 − 6p(1 − p)^5.
4. P(exactly 100 items are made before 6 defective are found) equals

P(100th item defective, exactly 5 items among the 1st 99 defective) = p · C(99,5) p^5 (1 − p)^94.
Example 4.19. Problem of Points. This involves finding the probability of n successes before
m failures in a sequence of Bernoulli trials. Let us call this probability p_{n,m}. Then

p_{n,m} = P(in the first m + n − 1 trials, the number of successes is at least n)
        = Σ_{k=n}^{n+m−1} C(n+m−1, k) p^k (1 − p)^{n+m−1−k}.
The problem is solved, but it needs to be pointed out that, computationally, this is not the best
formula. It is much more efficient to use the recursive formula obtained by conditioning on the
outcome of the first trial. Assume m, n ≥ 1. Then,

p_{n,m} = P(first trial is a success) · P(n − 1 successes before m failures)
        + P(first trial is a failure) · P(n successes before m − 1 failures)
        = p · p_{n−1,m} + (1 − p) · p_{n,m−1},

together with the boundary conditions p_{0,m} = 1 and p_{n,0} = 0, which allows for very speedy
and precise computations for large m and n.
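The conditioning recursion, with the natural boundary conditions p_{0,m} = 1 and p_{n,0} = 0, translates directly into a short memoized function; a sketch (the name p_points is ours):

```python
from functools import lru_cache
from math import comb

def p_points(n, m, p):
    """P(n successes before m failures), via the recursion
    p(n, m) = p * p(n-1, m) + (1-p) * p(n, m-1),
    with p(0, m) = 1 and p(n, 0) = 0."""
    @lru_cache(maxsize=None)
    def rec(a, b):
        if a == 0:
            return 1.0
        if b == 0:
            return 0.0
        return p * rec(a - 1, b) + (1 - p) * rec(a, b - 1)
    return rec(n, m)

# Agrees with the closed-form sum, e.g. for n = 4, m = 3, p = 1/2:
closed = sum(comb(6, k) for k in range(4, 7)) / 2**6   # = 22/64
```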
Example 4.20. Best of 7 . Assume that two equally matched teams, A and B, play a series
of games and that the first team that wins four games is the overall winner of the series. As
it happens, team A lost the first game. What is the probability it will win the series? Assume
that the games are Bernoulli trials with success probability 1/2.
We have

P(A wins the series) = P(4 successes before 3 failures)
                     = Σ_{k=4}^{6} C(6,k) (1/2)^6 = (15 + 6 + 1)/2^6 ≈ 0.3438.

Here, we assume that the games continue even after the winner of the series is decided, which
we can do without affecting the probability.
Example 4.21. Banach Matchbox Problem. A mathematician carries two matchboxes, each
originally containing n matches. Each time he needs a match, he is equally likely to take it from
either box. What is the probability that, upon reaching for a box and finding it empty, there
are exactly k matches still in the other box? Here, 0 ≤ k ≤ n.
Let A1 be the event that matchbox 1 is the one discovered empty and that, at that instant,
matchbox 2 contains k matches. Before this point, he has used n + n − k matches, n from
matchbox 1 and n − k from matchbox 2. This means that he has reached for box 1 exactly n
times in the first (n + n − k) trials, and then once more, for the last time, at the (n + 1 + n − k)th
trial. Therefore, our probability is

2 P(A1) = 2 · C(2n − k, n) (1/2)^{2n−k} · (1/2) = C(2n − k, n) (1/2)^{2n−k}.
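The matchbox formula can be confirmed by brute-force enumeration of the box choices for small n; a sketch (function names ours, not from the text):

```python
from fractions import Fraction
from math import comb
from itertools import product

def banach_formula(n, k):
    """C(2n - k, n) / 2^(2n - k), as derived above."""
    return Fraction(comb(2 * n - k, n), 2 ** (2 * n - k))

def banach_bruteforce(n, k):
    """Enumerate all length-(2n + 1) sequences of box choices; the process
    must hit an empty box within 2n + 1 reaches, since only 2n matches exist."""
    L = 2 * n + 1
    total = Fraction(0)
    for seq in product((0, 1), repeat=L):
        left = [n, n]
        for box in seq:
            if left[box] == 0:               # reached for an empty box
                if left[1 - box] == k:       # other box has exactly k left
                    total += Fraction(1, 2 ** L)
                break
            left[box] -= 1
    return total
```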
Example 4.22. Each day, you independently decide, with probability p, to ip a fair coin.
Otherwise, you do nothing. (a) What is the probability of getting exactly 10 Heads in the rst
20 days? (b) What is the probability of getting 10 Heads before 5 Tails?
For (a), the probability of getting Heads is p/2 independently each day, so the answer is

C(20,10) (p/2)^10 (1 − p/2)^10.
For (b), you can disregard the days on which you do not flip to get

Σ_{k=10}^{14} C(14,k) (1/2)^14.
Example 4.23. You roll a die and your score is the number shown on the die. Your friend rolls
five dice and his score is the number of 6s shown. Compute (a) the probability of event A that
the two scores are equal and (b) the probability of event B that your friend's score is strictly
larger than yours.
In both cases we will condition on your friend's score; this works a little better in case (b)
than conditioning on your score. Let Fi, i = 0, ..., 5, be the event that your friend's score is i.
Then, P(A|Fi) = 1/6 if i ≥ 1 and P(A|F0) = 0. Then, by the first Bayes formula,

P(A) = Σ_{i=1}^{5} P(Fi) · (1/6) = (1/6)(1 − P(F0)) = (1/6)(1 − (5/6)^5) ≈ 0.0997.

Moreover, P(B|Fi) = (i − 1)/6 if i ≥ 1 and P(B|F0) = 0, and so

P(B) = Σ_{i=1}^{5} P(Fi) · (i − 1)/6
     = (1/6) Σ_{i=0}^{5} i P(Fi) − (1/6)(1 − P(F0))
     = (1/6) · (5/6) − (1/6)(1 − (5/6)^5) ≈ 0.0392.

The last equality can be obtained by computation, but we will soon learn why the sum
Σ i P(Fi) has to equal 5/6.
Problems
1. Consider the following game. Pick one card at random from a full deck of 52 cards. If you
pull an Ace, you win outright. If not, then you look at the value of the card (K, Q, and J count
as 10). If the number is 7 or less, you lose outright. Otherwise, you select (at random, without
replacement) that number of additional cards from the deck. (For example, if you picked a 9
the first time, you select 9 more cards.) If you get at least one Ace, you win. What are your
chances of winning this game?
2. An item is defective (independently of other items) with probability 0.3. You have a method
of testing whether the item is defective, but it does not always give you the correct answer. If
the tested item is defective, the method detects the defect with probability 0.9 (and says it is
good with probability 0.1). If the tested item is good, then the method says it is defective with
probability 0.2 (and gives the right answer with probability 0.8).
A box contains 3 items. You have tested all of them and the tests detect no defects. What
is the probability that none of the 3 items is defective?
3. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with
probability p, independently of other eggs. You have 5 eggs; open the first one and see if it has
a toy inside, then do the same for the second one, etc. Let E1 be the event that you get at least
4 toys and let E2 be the event that you get at least two toys in succession. Compute P (E1 ) and
P (E2 ). Are E1 and E2 independent?
4. You have 16 balls, 3 blue, 4 green, and 9 red. You also have 3 urns. For each of the 16 balls,
you select an urn at random and put the ball into it. (Urns are large enough to accommodate any
number of balls.) (a) What is the probability that no urn is empty? (b) What is the probability
that each urn contains 3 red balls? (c) What is the probability that each urn contains all three
colors?
5. Assume that you have an n-element set U and that you select r independent random subsets
A1, ..., Ar ⊆ U. All Ai are chosen so that all 2^n choices are equally likely. Compute (in a simple
closed form) the probability that the Ai are pairwise disjoint.
Solutions to problems
1. Let W be the event that you win and let

F1 = {Ace first time}, F8 = {8 first time}, F9 = {9 first time}, F10 = {10, J, Q, or K first time}.

Then

P(W|F8) = 1 − C(47,8)/C(51,8),
P(W|F9) = 1 − C(47,9)/C(51,9),
P(W|F10) = 1 − C(47,10)/C(51,10),

and so,

P(W) = 4/52 + (4/52)[1 − C(47,8)/C(51,8)] + (4/52)[1 − C(47,9)/C(51,9)]
     + (16/52)[1 − C(47,10)/C(51,10)].
2. Let F = {none is defective} and A = {test indicates that none is defective}. By the second
Bayes formula,
P(F|A) = P(A ∩ F)/P(A) = (0.7 · 0.8)^3 / (0.7 · 0.8 + 0.3 · 0.1)^3 = (56/59)^3.
4. (a) Let Ai be the event that the i-th urn is empty. Then, P(Ai) = (2/3)^16 and
P(Ai ∩ Aj) = (1/3)^16 for i ≠ j, so, by inclusion-exclusion,

P(A1 ∪ A2 ∪ A3) = 3 · (2/3)^16 − 3 · (1/3)^16 = (2^16 − 1)/3^15,

and

P(no urns are empty) = 1 − P(A1 ∪ A2 ∪ A3) = 1 − (2^16 − 1)/3^15.
(b) We can ignore the other balls, since only the red balls matter here. Hence, the result is

[9!/(3! 3! 3!)] / 3^9 = 9!/(8 · 3^12).
(c) As

P(at least one urn lacks blue) = 3 · (2/3)^3 − 3 · (1/3)^3,
P(at least one urn lacks green) = 3 · (2/3)^4 − 3 · (1/3)^4,
P(at least one urn lacks red) = 3 · (2/3)^9 − 3 · (1/3)^9,

we have, by independence of the three colors' placements,

P(each urn contains all 3 colors) = [1 − 3(2/3)^3 + 3(1/3)^3] · [1 − 3(2/3)^4 + 3(1/3)^4] · [1 − 3(2/3)^9 + 3(1/3)^9].
Let Ai be the event that the number i does not appear among the 10 rolls. Then,

P(1, 2, and 3 each appear at least once)
= P((A1 ∪ A2 ∪ A3)^c)
= 1 − P(A1 ∪ A2 ∪ A3)
= 1 − 3 · (5/6)^10 + 3 · (4/6)^10 − (3/6)^10.
The answer to part (a), therefore, is

P(A) = 2^5 · 4!/9!.

(b) Compute the probability that at most one wife does not sit next to her husband.
Solution:
Let A be the event that all wives sit next to their husbands and let B be the event
that exactly one wife does not sit next to her husband. We know that P(A) = 2^5 · 4!/9!
from part (a). Moreover, B = B1 ∪ B2 ∪ B3 ∪ B4 ∪ B5, where Bi is the event that wi
does not sit next to hi and the remaining couples sit together. Then, the Bi are disjoint
and their probabilities are all the same. So, we need to determine P(B1).
i. Treat the four couples w2 h2, ..., w5 h5 as blocks; together with w1 and h1, this gives
6 units to arrange around the table.
ii. These can be arranged in 5! rotation-inequivalent orders, and each couple block can be
ordered internally in 2 ways, for 5! · 2^4 arrangements with couples 2 through 5 together.
iii. From these, subtract the arrangements in which w1 and h1 are also adjacent: treating
them as a sixth block gives 4! · 2^5 arrangements.
iv. The number of arrangements in B1 is, therefore, 5! · 2^4 − 4! · 2^5 = 3 · 4! · 2^4.
Therefore,

P(B1) = 3 · 4! · 2^4 / 9!.

Our answer, then, is

5 · 3 · 4! · 2^4 / 9! + 2^5 · 4!/9!.
3. Consider the following game. The player rolls a fair die. If he rolls 3 or less, he loses
immediately. Otherwise he selects, at random, as many cards from a full deck as the
number that came up on the die. The player wins if all four Aces are among the selected
cards.
(a) Compute the winning probability for this game.
Solution:
Let W be the event that the player wins. Let Fi be the event that he rolls i, where
i = 1, ..., 6; P(Fi) = 1/6.
Since we lose if we roll a 1, 2, or 3, P(W|F1) = P(W|F2) = P(W|F3) = 0. Moreover,

P(W|F4) = 1/C(52,4),
P(W|F5) = C(5,4)/C(52,4),
P(W|F6) = C(6,4)/C(52,4).

Therefore,

P(W) = (1/6) · (1/C(52,4)) · [1 + C(5,4) + C(6,4)] = (1/6) · 21/C(52,4).
(b) Smith tells you that he recently played this game once and won. What is the probability
that he rolled a 6 on the die?
Solution:

P(F6|W) = P(W|F6)P(F6)/P(W) = C(6,4) / [1 + C(5,4) + C(6,4)] = 15/21 = 5/7.
4. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with
probability p (0, 1), independently of other eggs. Each toy is, with equal probability,
red, white, or blue (again, independently of other toys). You buy 5 eggs. Let E1 be the
event that you get at most 2 toys and let E2 be the event that you get at least
one red, at least one white, and at least one blue toy (so that you have a complete
collection).
(a) Compute P (E1 ). Why is this probability very easy to compute when p = 1/2?
Solution:
P(E1) = P(0 toys) + P(1 toy) + P(2 toys)
      = (1 − p)^5 + 5p(1 − p)^4 + C(5,2) p^2 (1 − p)^3.

When p = 1/2, a toy and an empty egg are equally likely, so

P(at most 2 toys) = P(at most 2 eggs are empty) = P(at least 3 toys),

and, as the first and the last of these probabilities add up to 1, each equals 1/2.
Random Variables
A random variable is a number whose value depends upon the outcome of a random experiment.
Mathematically, a random variable X is a real-valued function on Ω, the space of outcomes:

X : Ω → R.
Sometimes, when convenient, we also allow X to have the value +∞ or, more rarely, −∞, but
this will not occur in this chapter. The crucial theoretical property that X should have is that,
for each interval B, the set of outcomes for which X ∈ B is an event, so we are able to talk
about its probability, P(X ∈ B). Random variables are traditionally denoted by capital letters
to distinguish them from deterministic quantities.
Example 5.1. Here are some examples of random variables.
1. Toss a coin 10 times and let X be the number of Heads.
2. Choose a random point in the unit square {(x, y) : 0 ≤ x, y ≤ 1} and let X be its distance
from the origin.
3. Choose a random person in a class and let X be the height of the person, in inches.
4. Let X be the value of the NASDAQ stock index at the closing of the next business day.
A discrete random variable X has finitely or countably many values xi, i = 1, 2, ..., and
p(xi) = P(X = xi), for i = 1, 2, ..., is called the probability mass function of X. Sometimes X
is added as the subscript of its p. m. f., p = pX.
A probability mass function p has the following properties:
1. For all i, p(xi) > 0 (we do not list values of X which occur with probability 0).
2. Σi p(xi) = 1, as the values xi exhaust all possibilities.
Example 5.2. Let X be the number of Heads in 2 fair coin tosses. Determine its p. m. f.
Possible values of X are 0, 1, and 2. Their probabilities are: P(X = 0) = 1/4, P(X = 1) = 1/2,
and P(X = 2) = 1/4.
You should note that the random variable Y, which counts the number of Tails in the 2
tosses, has the same p. m. f., that is, pX = pY, but X and Y are far from being the same
random variable! In general, random variables may have the same p. m. f., but may not even
be defined on the same set of outcomes.
Example 5.3. An urn contains 20 balls numbered 1, . . . , 20. Select 5 balls at random, without
replacement. Let X be the largest number among selected balls. Determine its p. m. f. and the
probability that at least one of the selected numbers is 15 or more.
The possible values are 5, ..., 20. To determine the p. m. f., note that we have C(20,5)
outcomes and, for X = i, the other four selected numbers must come from {1, ..., i − 1}. Then,

P(X = i) = C(i−1, 4)/C(20,5).

Finally,

P(at least one number is 15 or more) = Σ_{i=15}^{20} P(X = i) = 1 − C(14,5)/C(20,5).
Example 5.4. An urn contains 11 balls, 3 white, 3 red, and 5 blue balls. Take out 3 balls at
random, without replacement. You win $1 for each red ball you select and lose $1 for each
white ball you select. Determine the p. m. f. of X, the amount you win.
The number of outcomes is C(11,3). X can have values −3, −2, −1, 0, 1, 2, and 3. Let us start
with 0. This can occur with one ball of each color or with 3 blue balls:

P(X = 0) = [3 · 3 · 5 + C(5,3)] / C(11,3) = 55/165.

To get X = 1, we need either one red ball and two blue, or two red and one white:

P(X = 1) = [3 · C(5,2) + C(3,2) · 3] / C(11,3) = 39/165,

and P(X = −1) = P(X = 1) because of the symmetry between the roles that the red
and the white balls play. Next, to get X = 2 we must have 2 red balls and 1 blue:

P(X = 2) = P(X = −2) = C(3,2) · C(5,1) / C(11,3) = 15/165.

Finally, a single outcome (3 red balls) produces X = 3:

P(X = 3) = P(X = −3) = 1/C(11,3) = 1/165.
All the seven probabilities should add to 1, which can be used either to check the computations
or to compute the seventh probability (say, P (X = 0)) from the other six.
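Such a p. m. f. (and the check that it sums to 1) can be produced by enumerating all C(11,3) equally likely draws; a sketch for Example 5.4:

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

balls = ["W"] * 3 + ["R"] * 3 + ["B"] * 5   # white: -$1, red: +$1, blue: $0
score = {"W": -1, "R": +1, "B": 0}

# Tally the winnings over all C(11,3) = 165 equally likely draws.
counts = Counter(sum(score[balls[i]] for i in draw)
                 for draw in combinations(range(11), 3))
pmf = {x: Fraction(c, 165) for x, c in counts.items()}
```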
Assume that X is a discrete random variable with possible values xi, i = 1, 2, .... Then, the
expected value, also called expectation, average, or mean, of X is

EX = Σi xi P(X = xi) = Σi xi p(xi).

For any function g, the expectation of g(X) is E g(X) = Σi g(xi) P(X = xi).
We will give another, more convenient, formula for variance that will use the following property
of expectation, called linearity:

E(α1 X1 + α2 X2) = α1 EX1 + α2 EX2,

valid for any random variables X1 and X2 and nonrandom constants α1 and α2. This property
will be explained and discussed in more detail later. Then, with μ = EX,

Var(X) = E[(X − μ)^2]
       = E[X^2 − 2μX + μ^2]
       = E(X^2) − 2μ E(X) + μ^2
       = E(X^2) − μ^2.

In computations, bear in mind that variance cannot be negative! Furthermore, the only way
that a random variable has 0 variance is when it is equal to its expectation μ with probability
1 (so it is not really random at all): P(X = μ) = 1. Here is the summary:

The variance of a random variable X is Var(X) = E(X − EX)^2 = E(X^2) − (EX)^2.
(EX)^2 = (2.3)^2 = 5.29, and so Var(X) = 5.9 − 5.29 = 0.61 and σ(X) = √Var(X) ≈ 0.7810.
We will now look at some famous probability mass functions.

5.1 Discrete uniform random variable
This is a random variable with values x1 , . . . , xn , each with equal probability 1/n. Such a random
variable is simply the random choice of one among n numbers.
Properties:
1. EX = (x1 + ... + xn)/n.
2. Var(X) = (x1^2 + ... + xn^2)/n − [(x1 + ... + xn)/n]^2.
Example 5.7. Let X be the number shown on a rolled fair die. Compute EX, E(X^2), and
Var(X).
This is a standard example of a discrete uniform random variable, and

EX = (1 + 2 + ... + 6)/6 = 7/2,
E(X^2) = (1^2 + 2^2 + ... + 6^2)/6 = 91/6,
Var(X) = 91/6 − (7/2)^2 = 35/12.
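These moment computations are mechanical, and exact arithmetic keeps the answers as the fractions above; a sketch:

```python
from fractions import Fraction

values = range(1, 7)
p = Fraction(1, 6)                      # discrete uniform on {1, ..., 6}

ex = sum(x * p for x in values)         # EX = 7/2
ex2 = sum(x * x * p for x in values)    # E(X^2) = 91/6
var = ex2 - ex ** 2                     # Var(X) = E(X^2) - (EX)^2 = 35/12
```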
5.2 Bernoulli random variable
This is also called an indicator random variable. Assume that A is an event with probability p.
Then, IA , the indicator of A, is given by
IA = 1 if A happens, and IA = 0 otherwise.
Other notations for IA include 1_A and χ_A. Although simple, such random variables are very
important as building blocks for more complicated random variables.
Properties:
1. E IA = p.
2. Var(IA) = p(1 − p).
For the variance, note that IA^2 = IA, so that E(IA^2) = E IA = p and, therefore,
Var(IA) = p − p^2 = p(1 − p).
5.3 Binomial random variable
A Binomial(n,p) random variable counts the number of successes in n independent trials, each
of which is a success with probability p.
Properties:
1. Probability mass function: P(X = i) = C(n,i) p^i (1 − p)^{n−i}, i = 0, ..., n.
2. EX = np.
3. Var(X) = np(1 − p).

Example 5.8. Let X be the number of Heads in 50 tosses of a fair coin. Then X is
Binomial(50, 1/2) and, for example,

P(X ≤ 10) = Σ_{i=0}^{10} C(50,i) (1/2)^50.
Example 5.9. Denote by d the dominant gene and by r the recessive gene at a single locus.
Then dd is called the pure dominant genotype, dr is called the hybrid, and rr the pure recessive
genotype. The two genotypes with at least one dominant gene, dd and dr, result in the phenotype
of the dominant gene, while rr results in a recessive phenotype.
Assuming that both parents are hybrid and have n children, what is the probability that at
least two will have the recessive phenotype? Each child, independently, gets one of the genes at
random from each parent.
For each child, independently, the probability of the rr genotype is 1/4. If X is the number of rr children, then X is Binomial(n, 1/4). Therefore,

P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − (3/4)^n − n · (1/4) · (3/4)^{n−1}.
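The formula above is easy to cross-check against the full Binomial sum; a sketch (function names are ours, not from the text):

```python
from math import comb

def p_at_least_two_rr(n):
    """P(X >= 2) for X ~ Binomial(n, 1/4): 1 - P(X = 0) - P(X = 1)."""
    p = 1 / 4
    return 1 - (1 - p)**n - n * p * (1 - p)**(n - 1)

def p_via_sum(n):
    """The same probability from the full Binomial mass function."""
    p = 1 / 4
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(2, n + 1))

for n in (2, 5, 10):
    assert abs(p_at_least_two_rr(n) - p_via_sum(n)) < 1e-12
```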
5.4 Poisson random variable
A random variable is Poisson(λ), with parameter λ > 0, if it has the probability mass function given below.
Properties:
1. P(X = i) = (λ^i/i!) e^{−λ}, for i = 0, 1, 2, . . ..

2. EX = λ.

3. Var(X) = λ.
Here is how we compute the expectation:

EX = ∑_{i=1}^∞ i · (λ^i/i!) e^{−λ} = λe^{−λ} ∑_{i=1}^∞ λ^{i−1}/(i − 1)! = λe^{−λ} · e^{λ} = λ.

The Poisson distribution arises as a limit of Binomials when n is large, p is small, and λ = np is of moderate size.

Theorem (Poisson approximation to Binomial). If Xn is Binomial(n, λ/n), then, for every fixed i = 0, 1, 2, . . .,

P(Xn = i) → (λ^i/i!) e^{−λ}, as n → ∞.

This holds because

P(Xn = i) = C(n, i) (λ/n)^i (1 − λ/n)^{n−i}
= [n(n − 1) · · · (n − i + 1)/n^i] · (λ^i/i!) · (1 − λ/n)^n · (1 − λ/n)^{−i}
→ 1 · (λ^i/i!) · e^{−λ} · 1,

as n → ∞.
The Poisson approximation is quite good: one can prove that the error made by computing a probability using the Poisson approximation instead of its exact Binomial expression (in the context of the above theorem) is no more than

min(1, λ) · p.
Example 5.10. Suppose that the probability that a person is killed by lightning in a year is, independently, 1/(500 million). Assume that the US population is 300 million.
1. Compute P(3 or more people are killed by lightning next year) exactly.

If X is the number of people killed by lightning next year, then X is Binomial(n, p) with n = 300 million and p = 1/(500 million), and the answer is

1 − (1 − p)^n − np(1 − p)^{n−1} − C(n, 2) p²(1 − p)^{n−2} ≈ 0.02311530.
2. Approximate the above probability.
As np = 3/5, X is approximately Poisson(3/5), and, with λ = 3/5, the answer is

1 − e^{−λ} − λe^{−λ} − (λ²/2) e^{−λ} ≈ 0.02311529.
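The closeness of the exact Binomial answer and its Poisson approximation can be verified numerically; a sketch (variable names are ours):

```python
from math import comb, exp

n, p = 300_000_000, 1 / 500_000_000
lam = n * p  # 3/5

# exact Binomial P(X >= 3)
exact = (1 - (1 - p)**n - n * p * (1 - p)**(n - 1)
           - comb(n, 2) * p**2 * (1 - p)**(n - 2))

# Poisson approximation with the same three subtracted terms
approx = 1 - exp(-lam) * (1 + lam + lam**2 / 2)

print(exact, approx)  # both about 0.023115
```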
3. Approximate P (two or more people are killed by lightning within the rst 6 months of
next year).
This highlights the interpretation of λ as a rate. If lightning deaths occur at the rate of 3/5 a year, they should occur at half that rate in 6 months. Indeed, assuming that lightning deaths occur as a result of independent factors in disjoint time intervals, we can imagine that they operate on different people in disjoint time intervals. Thus, doubling the time interval is the same as doubling the number n of people (while keeping p the same), and then np also doubles. Consequently, halving the time interval has the same p, but half as many trials, so np changes to 3/10 and so λ = 3/10 as well. The answer is

1 − e^{−λ} − λe^{−λ} ≈ 0.0369.
4. Approximate P(in exactly 3 of the next 10 years exactly 3 people are killed by lightning).

In a single period as above, the probability that exactly 3 people are killed is approximately (λ³/3!) e^{−λ}, and by independence the answer is

C(10, 3) ((λ³/3!) e^{−λ})³ (1 − (λ³/3!) e^{−λ})⁷ ≈ 4.34 × 10⁻⁶.
5. Compute the expected number of years, among the next 10, in which 2 or more people are killed by lightning.

By the same logic as above and the formula for Binomial expectation, the answer is

10(1 − e^{−λ} − λe^{−λ}) ≈ 0.3694.
Example 5.11. Poisson distribution and law. Assume a crime has been committed. It is known that the perpetrator has certain characteristics, which occur with a small frequency p (say, 10⁻⁸) in a population of size n (say, 10⁸). A person who matches these characteristics has been found at random (e.g., at a routine traffic stop or by airport security) and, since p is
so small, charged with the crime. There is no other evidence. We will also assume that the
authorities stop looking for another suspect once the arrest has been made. What should the
defense be?
Let us start with a mathematical model of this situation. Assume that N is the number of
people with given characteristics. This is a Binomial random variable but, given the assumptions,
we can easily assume that it is Poisson with λ = np. Choose a person from among these N , label
that person by C, the criminal. Then, choose at random another person, A, who is arrested.
The question is whether C = A, that is, whether the arrested person is guilty. Mathematically,
we can formulate the problem as follows:
P(C = A | N ≥ 1) = P(C = A, N ≥ 1)/P(N ≥ 1).
We need to condition as the experiment cannot even be performed when N = 0. Now, by the first Bayes formula,

P(C = A, N ≥ 1) = ∑_{k=1}^∞ P(C = A, N ≥ 1 | N = k) P(N = k) = ∑_{k=1}^∞ P(C = A | N = k) P(N = k)

and

P(C = A | N = k) = 1/k,

so

P(C = A, N ≥ 1) = ∑_{k=1}^∞ (1/k) · (λ^k/k!) e^{−λ}.

The answer is, therefore,

P(C = A | N ≥ 1) = (e^{−λ}/(1 − e^{−λ})) ∑_{k=1}^∞ λ^k/(k · k!).
There is no closed-form expression for the sum, but it can be easily computed numerically. The
defense may claim that the probability of innocence, 1 − (the above probability), is about 0.2330 when λ = 1, presumably enough for a reasonable doubt.
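The sum converges quickly, so it is easy to evaluate numerically; a sketch (the function name is ours) that accumulates λ^k/k! iteratively to avoid huge factorials:

```python
from math import exp

def p_guilt(lam, terms=60):
    """P(C = A | N >= 1) = e^(-lam)/(1 - e^(-lam)) * sum_{k>=1} lam^k/(k * k!)."""
    s, t = 0.0, 1.0               # t tracks lam^k / k!, starting at k = 0
    for k in range(1, terms + 1):
        t *= lam / k              # now t = lam^k / k!
        s += t / k
    return exp(-lam) / (1 - exp(-lam)) * s

print(1 - p_guilt(1.0))  # probability of innocence, about 0.2330
```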
This model was in fact tested in court, in the famous People v. Collins case, a 1968 jury trial in Los Angeles. In this instance, it was claimed by the prosecution (on flimsy grounds) that p = 1/12,000,000 and n would have been the number of adult couples in the LA area, say n = 5,000,000. The jury convicted the couple charged for robbery on the basis of the prosecutor's claim that, due to low p, the chances of there being another couple [with the specified characteristics, in the LA area] must be one in a billion. The Supreme Court of California reversed the conviction and gave two reasons. The first reason was insufficient foundation for
the estimate of p. The second reason was that the probability that another couple with matching characteristics existed was, in fact,

P(N ≥ 2 | N ≥ 1) = (1 − e^{−λ} − λe^{−λ})/(1 − e^{−λ}),

much larger than the prosecutor claimed: for λ = 5/12, it is about 0.1939. This is about twice the (more relevant) probability of innocence, which, for this λ, is about 0.1015.
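Both court-related quantities can be checked numerically; a sketch (variable names are ours):

```python
from math import exp

lam = 5 / 12  # n * p for the prosecution's numbers: 5,000,000 / 12,000,000

# P(N >= 2 | N >= 1): at least one other couple with the same characteristics
p_another = (1 - exp(-lam) - lam * exp(-lam)) / (1 - exp(-lam))

# probability of innocence: 1 - e^(-lam)/(1 - e^(-lam)) * sum_{k>=1} lam^k/(k * k!)
s, t = 0.0, 1.0
for k in range(1, 60):
    t *= lam / k
    s += t / k
p_innocent = 1 - exp(-lam) / (1 - exp(-lam)) * s

print(round(p_another, 4), round(p_innocent, 4))  # about 0.1939 and 0.1015
```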
5.5 Geometric random variable
A Geometric(p) random variable X counts the number of trials required for the first success in independent trials with success probability p.
Properties:
1. Probability mass function: P (X = n) = p(1 p)n1 , where n = 1, 2, . . ..
2. EX = 1/p.

3. Var(X) = (1 − p)/p².

4. P(X > n) = ∑_{k=n+1}^∞ p(1 − p)^{k−1} = (1 − p)^n.

5. Memoryless property: P(X > n + k | X > k) = (1 − p)^{n+k}/(1 − p)^k = (1 − p)^n = P(X > n).
We omit the proofs of the second and third formulas, which reduce to manipulations with
geometric series.
Example 5.12. Let X be the number of tosses of a fair coin required for the first Heads. What
are EX and Var(X)?
As X is Geometric(1/2), EX = 2 and Var(X) = 2.
Example 5.13. You roll a die, your opponent tosses a coin. If you roll 6 you win; if you do
not roll 6 and your opponent tosses Heads you lose; otherwise, this round ends and the game
repeats. On the average, how many rounds does the game last?
P(game decided on round 1) = 1/6 + (5/6)(1/2) = 7/12,

and so the number of rounds N is Geometric(7/12) and

EN = 12/7.
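A quick simulation check of EN = 12/7 (a sketch; the seed and trial count are arbitrary):

```python
import random

random.seed(7)

def rounds_until_decided():
    """You roll a die; if no 6, your opponent tosses a coin; repeat until decided."""
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 6:   # you roll 6: you win
            return n
        if random.random() < 0.5:       # opponent tosses Heads: you lose
            return n

trials = 200_000
avg = sum(rounds_until_decided() for _ in range(trials)) / trials
print(avg)  # close to EN = 12/7 = 1.714...
```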
Problems
1. Roll a fair die repeatedly. Let X be the number of 6's in the first 10 rolls and let Y be the number of rolls needed to obtain a 3. (a) Write down the probability mass function of X. (b) Write down the probability mass function of Y. (c) Find an expression for P(X ≥ 6). (d) Find an expression for P(Y > 10).
2. A biologist needs at least 3 mature specimens of a certain plant. The plant needs a year
to reach maturity; once a seed is planted, any plant will survive for the year with probability
1/1000 (independently of other plants). The biologist plants 3000 seeds. A year is deemed a
success if three or more plants from these seeds reach maturity.
(a) Write down the exact expression for the probability that the biologist will indeed end up
with at least 3 mature plants.
(b) Write down a relevant approximate expression for the probability from (a). Justify briefly
the approximation.
(c) The biologist plans to do this year after year. What is the approximate probability that he
has at least 2 successes in 10 years?
(d) Devise a method to determine the number of seeds the biologist should plant in order to get at least 3 mature plants in a year with probability at least 0.999. (Your method will probably require a lengthy calculation; do not try to carry it out with pen and paper.)
3. You are dealt one card at random from a full deck and your opponent is dealt 2 cards
(without any replacement). If you get an Ace, he pays you $10, if you get a King, he pays you
$5 (regardless of his cards). If you have neither an Ace nor a King, but your card is red and
your opponent has no red cards, he pays you $1. In all other cases you pay him $1. Determine
your expected earnings. Are they positive?
4. You and your opponent both roll a fair die. If you both roll the same number, the game
is repeated, otherwise whoever rolls the larger number wins. Let N be the number of times
the two dice have to be rolled before the game is decided. (a) Determine the probability mass
function of N . (b) Compute EN . (c) Compute P (you win). (d) Assume that you get paid
$10 for winning in the first round, $1 for winning in any other round, and nothing otherwise.
Compute your expected winnings.
5. Each of the 50 students in class belongs to exactly one of the four groups A, B, C, or D. The
membership numbers for the four groups are as follows: A: 5, B: 10, C: 15, D: 20. First, choose
one of the 50 students at random and let X be the size of that student's group. Next, choose
one of the four groups at random and let Y be its size. (Recall: all random choices are with
equal probability, unless otherwise specified.) (a) Write down the probability mass functions for
X and Y . (b) Compute EX and EY . (c) Compute Var(X) and Var(Y ). (d) Assume you have
s students divided into n groups with membership numbers s1 , . . . , sn , and again X is the size
of the group of a randomly chosen student, while Y is the size of the randomly chosen group.
Let EY = μ and Var(Y) = σ². Express EX in terms of s, n, μ, and σ.
6. Refer to Example 4.7 for the description of the Craps game. In many casinos, one can make side bets on the player's performance in a particular instance of this game. For example, the "Don't pass" side bet wins $1 if the player loses. If the player wins, it loses $1 (i.e., wins −$1), with one exception: if the player rolls 12 on the first roll, this side bet wins or loses nothing. Let X be the winning dollar amount on a "Don't pass" bet. Find the probability mass function of X, and its expectation and variance.
Solutions
1. (a) X is Binomial(10, 1/6):

P(X = i) = C(10, i) (1/6)^i (5/6)^{10−i},

where i = 0, 1, 2, . . . , 10.
(b) Y is Geometric(1/6):

P(Y = i) = (1/6)(5/6)^{i−1},

where i = 1, 2, . . ..
(c)

P(X ≥ 6) = ∑_{i=6}^{10} C(10, i) (1/6)^i (5/6)^{10−i}.

(d)

P(Y > 10) = (5/6)^{10}.
2. (a) The random variable X, the number of mature plants, is Binomial(3000, 1/1000).

P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − (0.999)^{3000} − 3000 · (0.999)^{2999}(0.001) − C(3000, 2) (0.999)^{2998}(0.001)².

(b) By the Poisson approximation with λ = 3000 · (1/1000) = 3,

P(X ≥ 3) ≈ 1 − e^{−3} − 3e^{−3} − (9/2)e^{−3}.
(c) Denote the probability in (b) by s. Then, the number of years the biologist succeeds is approximately Binomial(10, s) and the answer is

1 − (1 − s)^{10} − 10s(1 − s)^9.
(d) Solve

e^{−λ}(1 + λ + λ²/2) = 0.001

for λ and then let n = 1000λ. The equation above can be rewritten as

λ = log 1000 + log(1 + λ + λ²/2)

and then solved by iteration. The result is that the biologist should plant 11,229 seeds.
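The iteration can be sketched as follows (assuming the rewriting λ = log 1000 + log(1 + λ + λ²/2); variable names are ours):

```python
from math import ceil, exp, log

# fixed-point iteration for e^(-lam) * (1 + lam + lam^2/2) = 0.001,
# rewritten as lam = log 1000 + log(1 + lam + lam^2/2)
lam = 1.0
for _ in range(100):
    lam = log(1000) + log(1 + lam + lam**2 / 2)

n_seeds = ceil(1000 * lam)   # plant n = 1000 * lam seeds
print(lam, n_seeds)          # lam about 11.229, n_seeds = 11229
```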
3. Let X be your earnings. Then

P(X = 10) = 4/52,
P(X = 5) = 4/52,
P(X = 1) = (22/52) · C(26, 2)/C(51, 2) = 11/102,
P(X = −1) = 1 − 2/13 − 11/102,

and so

EX = 10/13 + 5/13 + 11/102 − 1 + 2/13 + 11/102 = 4/13 + 11/51 > 0.

4. (a) N is Geometric(5/6):

P(N = n) = (5/6)(1/6)^{n−1},

where n = 1, 2, 3, . . ..
(b) EN = 6/5.

(c) By symmetry, P(you win) = 1/2.
(d) You get paid $10 with probability 5/12, $1 with probability 1/12, and nothing otherwise, so your expected winnings are

10 · (5/12) + 1 · (1/12) = 51/12.

5. (a)

x          5     10    15    20
P(X = x)   0.1   0.2   0.3   0.4
P(Y = x)   0.25  0.25  0.25  0.25
(d)

EX = ∑_{i=1}^n s_i · (s_i/s) = (n/s) · (1/n) ∑_{i=1}^n s_i² = (n/s) · EY² = (n/s)(Var(Y) + (EY)²) = (n/s)(σ² + μ²).
6. From Example 4.17, we have P(X = −1) = 244/495 ≈ 0.4929 and P(X = 0) = 1/36, so that P(X = 1) = 1 − P(X = −1) − P(X = 0) = 949/1980 ≈ 0.4793. Then EX = P(X = 1) − P(X = −1) = −3/220 ≈ −0.0136 and Var(X) = 1 − P(X = 0) − (EX)² ≈ 0.972.
6 Continuous Random Variables
A random variable X is continuous if there exists a nonnegative function f so that, for every interval B,

P(X ∈ B) = ∫_B f(x) dx.

The function f = fX is called the density of X. We will assume that a density function f is continuous, apart from finitely many (possibly infinite) jumps. Clearly, it must hold that

∫_{−∞}^{∞} f(x) dx = 1.

Moreover, for a < b,

P(a ≤ X ≤ b) = ∫_a^b f(x) dx,

the distribution function is

F(x) = P(X ≤ x) = ∫_{−∞}^x f(s) ds,

and, for a function g,

Eg(X) = ∫_{−∞}^{∞} g(x) f(x) dx.
Example 6.1. Assume that X has the density

f(x) = x/8 if 0 < x < 4, and f(x) = 0 otherwise.

Then

EX = ∫_0^4 (x²/8) dx = 8/3,

and

E(X²) = ∫_0^4 (x³/8) dx = 8.

So, Var(X) = 8 − 64/9 = 8/9.

Example 6.2. Assume that X has the density

f(x) = 3x² if x ∈ [0, 1], and f(x) = 0 otherwise,

and let Y = 1 − X⁴. Compute the density fY of Y.
In a problem such as this, compute rst the distribution function FY of Y . Before starting,
note that the density fY (y) will be nonzero only when y [0, 1], as the values of Y are restricted
to that interval. Now, for y (0, 1),
FY(y) = P(Y ≤ y) = P(1 − X⁴ ≤ y) = P(1 − y ≤ X⁴) = P((1 − y)^{1/4} ≤ X) = ∫_{(1−y)^{1/4}}^1 3x² dx.
It follows that

fY(y) = (d/dy) FY(y) = 3((1 − y)^{1/4})² · (1/4)(1 − y)^{−3/4} = 3/(4(1 − y)^{1/4}),
for y ∈ (0, 1), and fY(y) = 0 otherwise. Observe that it is immaterial how fY(y) is defined at y = 0 and y = 1, because those two values contribute nothing to any integral.
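A Monte Carlo sanity check of this computation (a sketch: since FX(x) = x³ on [0, 1], inversion gives X = U^{1/3} for U uniform; seed and trial count are arbitrary):

```python
import random

random.seed(1)

# X has density 3x^2 on [0, 1], so F_X(x) = x^3 and X = U^(1/3) by inversion
def sample_Y():
    x = random.random() ** (1 / 3)
    return 1 - x ** 4

trials = 200_000
y0 = 0.5
empirical = sum(sample_Y() <= y0 for _ in range(trials)) / trials
theoretical = 1 - (1 - y0) ** (3 / 4)   # F_Y(y) = 1 - (1 - y)^(3/4)
print(empirical, theoretical)
```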
As with discrete random variables, we now look at some famous densities.
6.1 Uniform random variable
Such a random variable represents the choice of a random number in [α, β]. For [α, β] = [0, 1], this is ideally the output of a computer random number generator.
Properties:

1. Density: f(x) = 1/(β − α) if x ∈ [α, β], and f(x) = 0 otherwise.

2. EX = (α + β)/2.

3. Var(X) = (β − α)²/12.
Example 6.3. Assume that X is uniform on [0, 1]. What is P(X ∈ Q)? What is the probability that the binary expansion of X starts with 0.010?
As Q is countable, it has an enumeration, say, Q = {q1 , q2 , . . . }. By Axiom 3 of Chapter 3:
P(X ∈ Q) = P(∪_i {X = q_i}) = ∑_i P(X = q_i) = 0.
Note that you cannot do this for sets that are not countable or you would prove that P(X ∈ R) = 0, while we, of course, know that P(X ∈ R) = P(Ω) = 1. As X is, with probability 1, irrational, its binary expansion is uniquely defined, so there is no ambiguity about what the second question means.
Divide [0, 1) into 2^n intervals of equal length. If the binary expansion of a number x ∈ [0, 1) is 0.x1x2 . . ., the first n binary digits determine which of the 2^n subintervals x belongs to: if you know that x belongs to an interval I based on the first n − 1 digits, then nth digit 1 means that x is in the right half of I and nth digit 0 means that x is in the left half of I. For example, if the expansion starts with 0.010, the number is in [0, 1/2], then in [1/4, 1/2], and then finally in [1/4, 3/8].
Our answer is 1/8, but, in fact, we can make a more general conclusion. If X is uniform on [0, 1], then any of the 2^n possibilities for its first n binary digits are equally likely. In other words, the binary digits of X are the result of an infinite sequence of independent fair coin tosses. Choosing a uniform random number on [0, 1] is thus equivalent to tossing a fair coin infinitely many times.
Example 6.4. A uniform random number X divides [0, 1] into two segments. Let R be the
ratio of the smaller versus the larger segment. Compute the density of R.
As R has values in (0, 1), the density fR(r) is nonzero only for r ∈ (0, 1) and we will deal only with such r's.
FR(r) = P(R ≤ r)
= P(X ≤ 1/2, X/(1 − X) ≤ r) + P(X > 1/2, (1 − X)/X ≤ r)
= P(X ≤ 1/2, X ≤ r/(r + 1)) + P(X > 1/2, X ≥ 1/(r + 1))
= P(X ≤ r/(r + 1)) + P(X ≥ 1/(r + 1))   (since r/(r + 1) ≤ 1/2 and 1/(r + 1) ≥ 1/2)
= r/(r + 1) + 1 − 1/(r + 1)
= 2r/(r + 1).
For r ∈ (0, 1), therefore,

fR(r) = (d/dr) FR(r) = 2/(r + 1)².

6.2 Exponential random variable
A random variable is Exponential(λ), with parameter λ > 0, if it has the density function given below. This is a distribution for the waiting time for some random event, for example, for a lightbulb to burn out or for the next earthquake of at least some given magnitude.
Properties:
1. Density: f(x) = λe^{−λx} if x ≥ 0, and f(x) = 0 if x < 0.

2. EX = 1/λ.

3. Var(X) = 1/λ².

4. P(X ≥ x) = e^{−λx}.

5. Memoryless property: P(X ≥ x + y | X ≥ y) = e^{−λx}.
The last property means that, if the event has not occurred by some given time (no matter
how large), the distribution of the remaining waiting time is the same as it was at the beginning.
There is no aging.
Proofs of these properties are integration exercises and are omitted.
Example 6.5. Assume that a lightbulb lasts on average 100 hours. Assuming exponential
distribution, compute the probability that it lasts more than 200 hours and the probability that
it lasts less than 50 hours.
Let X be the waiting time for the bulb to burn out. Then, X is Exponential with λ = 1/100 and

P(X ≥ 200) = e^{−2} ≈ 0.1353,

P(X ≤ 50) = 1 − e^{−1/2} ≈ 0.3935.
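A one-line check of both probabilities (a sketch; variable names are ours):

```python
from math import exp

lam = 1 / 100  # rate for a mean lifetime of 100 hours

p_more_200 = exp(-lam * 200)      # P(X >= 200) = e^(-2)
p_less_50 = 1 - exp(-lam * 50)    # P(X <= 50) = 1 - e^(-1/2)
print(round(p_more_200, 4), round(p_less_50, 4))  # 0.1353 and 0.3935
```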
6.3 Normal random variable

A random variable X is Normal with parameters μ and σ² > 0, denoted N(μ, σ²), if it has the density given below.
Properties:
1. Density:

f(x) = fX(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)},

where x ∈ (−∞, ∞).

2. EX = μ.

3. Var(X) = σ².
To show that

∫_{−∞}^{∞} f(x) dx = 1

is a tricky exercise in integration, as is the computation of the variance. Assuming that the integral of f is 1, we can use symmetry to prove that EX must be μ:

EX = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} (x − μ) f(x) dx + μ ∫_{−∞}^{∞} f(x) dx
= (1/(σ√(2π))) ∫_{−∞}^{∞} (x − μ) e^{−(x−μ)²/(2σ²)} dx + μ
= (1/(σ√(2π))) ∫_{−∞}^{∞} z e^{−z²/(2σ²)} dz + μ
= μ,

where the last integral was obtained by the change of variable z = x − μ and is zero because the function integrated is odd.
Example 6.6. Let X be a N(μ, σ²) random variable and let Y = αX + β, with α > 0. How is Y distributed?

If X is a measurement with error, αX + β amounts to changing the units and so Y should still be normal. Let us see if this is the case. We start by computing the distribution function of Y,

FY(y) = P(Y ≤ y) = P(αX + β ≤ y) = P(X ≤ (y − β)/α) = ∫_{−∞}^{(y−β)/α} fX(x) dx,

and then the density

fY(y) = (1/α) fX((y − β)/α) = (1/(ασ√(2π))) e^{−(y − αμ − β)²/(2(ασ)²)}.

Therefore, Y is Normal with expectation αμ + β and variance (ασ)².
In particular,

Z = (X − μ)/σ

has EZ = 0 and Var(Z) = 1. Such a N(0, 1) random variable is called standard Normal. Its distribution function FZ(z) is denoted by Φ(z). Note that

fZ(z) = (1/√(2π)) e^{−z²/2},

Φ(z) = FZ(z) = (1/√(2π)) ∫_{−∞}^z e^{−x²/2} dx.
The integral for Φ(z) cannot be computed as an elementary function, so approximate values are given in tables. Nowadays, this is largely obsolete, as computers can easily compute Φ(z) very accurately for any given z. You should also note that it is enough to know these values for z > 0, as in this case, by using the fact that fZ(x) is an even function,

Φ(−z) = ∫_{−∞}^{−z} fZ(x) dx = ∫_z^{∞} fZ(x) dx = 1 − ∫_{−∞}^z fZ(x) dx = 1 − Φ(z).
In this and all other examples of this type, the letter Z will stand for an N (0, 1) random
variable.
For a N(μ, σ²) random variable X, we have

P(|X − μ| ≥ σ) = P(|X − μ|/σ ≥ 1) = P(|Z| ≥ 1) = 2P(Z ≥ 1) = 2(1 − Φ(1)) ≈ 0.3173.

Similarly,

P(|X − μ| ≥ 2σ) = 2(1 − Φ(2)) ≈ 0.0455,

P(|X − μ| ≥ 3σ) = 2(1 − Φ(3)) ≈ 0.0027.
Example 6.8. Assume that X is Normal with mean μ = 2 and variance σ² = 25. Compute the probability that X is between 1 and 4.
Here is the computation:

P(1 ≤ X ≤ 4) = P((1 − 2)/5 ≤ (X − 2)/5 ≤ (4 − 2)/5)
= P(−0.2 ≤ Z ≤ 0.4)
= P(Z ≤ 0.4) − P(Z ≤ −0.2)
= Φ(0.4) − (1 − Φ(0.2))
≈ 0.2347.
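Φ can be evaluated with the error function, which avoids tables; a sketch of the computation in Example 6.8 (function name is ours):

```python
from math import erf, sqrt

def Phi(z):
    """Standard Normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 2, 5
p = Phi((4 - mu) / sigma) - Phi((1 - mu) / sigma)
print(round(p, 4))  # 0.2347
```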
Let Sn be a Binomial(n, p) random variable. Recall that its mean is np and its variance np(1 − p). If we pretend that Sn is Normal, then (Sn − np)/√(np(1 − p)) is standard Normal, i.e., N(0, 1). The following theorem says that this is approximately true if p is fixed (e.g., 0.5) and n is large (e.g., n = 100).

Theorem 6.1. De Moivre-Laplace Central Limit Theorem. As n → ∞,

P((Sn − np)/√(np(1 − p)) ≤ x) → Φ(x),

for every real x.
We should also note that the above theorem is an analytical statement; it says that

∑_{k: 0 ≤ k ≤ np + x√(np(1−p))} C(n, k) p^k (1 − p)^{n−k} → (1/√(2π)) ∫_{−∞}^x e^{−s²/2} ds,

as n → ∞, for every x ∈ R. Indeed it can be, and originally was, proved this way, with a lot of computational work.
An important issue is the quality of the Normal approximation to the Binomial. One can prove that the difference between the Binomial probability (in the above theorem) and its limit is at most

0.5 (p² + (1 − p)²)/√(n p (1 − p)).
A commonly cited rule of thumb is that this is a decent approximation when np(1 − p) ≥ 10; however, if we take p = 1/3 and n = 45, so that np(1 − p) = 10, the bound above is about 0.0878, too large for many purposes. Various corrections have been developed to diminish the error, but they are, in my opinion, obsolete by now. In the situation when the above upper bound on the error is too high, we should simply compute directly with the Binomial distribution and not use the Normal approximation. (We will assume that the approximation is adequate in the examples below.) Remember that, when n is large and p is small, say n = 100 and p = 1/100, the Poisson approximation (with λ = np) is much better!
Example 6.9. A roulette wheel has 38 slots: 18 red, 18 black, and 2 green. The ball ends at
one of these at random. You are a player who plays a large number of games and makes an even
bet of $1 on red in every game. After n games, what is the probability that you are ahead?
Answer this for n = 100 and n = 1000.
Let Sn be the number of times you win. This is a Binomial(n, 9/19) random variable, and

P(ahead) = P(Sn > n/2)
= P((Sn − np)/√(np(1 − p)) > (n/2 − np)/√(np(1 − p)))
≈ P(Z > (1/2 − p)√n/√(p(1 − p))).

For n = 100, this is P(Z > 5/√90) ≈ 0.2990; for n = 1000, it is P(Z > 5/3) ≈ 0.0478. For comparison, the true probabilities are 0.2650 and 0.0448, respectively.
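The Normal approximation and the exact Binomial tail can be compared directly; a sketch (function names are ours):

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard Normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p = 9 / 19  # probability of winning a single even bet on red

def p_ahead_normal(n):
    # P(S_n > n/2) via the De Moivre-Laplace approximation
    return 1 - Phi((0.5 - p) * sqrt(n) / sqrt(p * (1 - p)))

def p_ahead_exact(n):
    # being ahead after n games means winning more than n/2 of them
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(p_ahead_normal(100), p_ahead_exact(100))  # about 0.2990 and 0.2650
```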
Example 6.10. What would the answer to the previous example be if the game were fair, i.e.,
you bet even money on the outcome of a fair coin toss each time.
Then, p = 1/2 and

P(ahead) ≈ P(Z > 0) = 0.5,

as n → ∞.
Example 6.11. How many times do you need to toss a fair coin to get at least 100 heads with
probability at least 90%?
Let n be the number of tosses that we are looking for. For Sn, which is Binomial(n, 1/2), we need to find n so that

P(Sn ≥ 100) ≥ 0.9.

We will use below that n > 200, as the probability would be approximately 1/2 for n = 200. Here is the computation:

P(Sn ≥ 100) = P((Sn − n/2)/(√n · (1/2)) ≥ (100 − n/2)/(√n · (1/2)))
≈ P(Z ≥ (200 − n)/√n)
= P(Z ≤ (n − 200)/√n)
= Φ((n − 200)/√n)
= 0.9.

Now, according to the tables, Φ(1.28) ≈ 0.9, thus we need to solve (n − 200)/√n = 1.28, that is,

n − 1.28√n − 200 = 0.

Rounding up the number n we get from above, we conclude that n = 219. (In fact, the probability of getting at most 99 heads changes from about 0.1108 to about 0.0990 as n changes from 217 to 218.)
Problems
1. A random variable X has the density function

f(x) = c(x + √x) if x ∈ [0, 1], and f(x) = 0 otherwise.

(a) Determine c. (b) Compute E(1/X). (c) Determine the probability density function of Y = X².
2. The density function of a random variable X is given by

f(x) = a + bx if 0 ≤ x ≤ 2, and f(x) = 0 otherwise.

We also know that E(X) = 7/6. (a) Compute a and b. (b) Compute Var(X).
3. After your complaint about their service, a representative of an insurance company promised
to call you between 7 and 9 this evening. Assume that this means that the time T of the call
is uniformly distributed in the specied interval.
(a) Compute the probability that the call arrives between 8:00 and 8:20.
(b) At 8:30, the call still hasn't arrived. What is the probability that it arrives in the next 10
minutes?
(c) Assume that you know in advance that the call will last exactly 1 hour. From 9 to 9:30,
there is a game show on TV that you wanted to watch. Let M be the amount of time of the
show that you miss because of the call. Compute the expected value of M .
4. Toss a fair coin twice. You win $1 if at least one of the two tosses comes out heads.
(a) Assume that you play this game 300 times. What is, approximately, the probability that
you win at least $250?
(b) Approximately how many times do you need to play so that you win at least $250 with
probability at least 0.99?
5. Roll a die n times and let M be the number of times you roll 6. Assume that n is large.
(a) Compute the expectation EM .
(b) Write down an approximation, in terms of n and Φ, of the probability that M differs from its expectation by less than 10%.
(c) How large should n be so that the probability in (b) is larger than 0.99?
Solutions
1. (a) As

1 = c ∫_0^1 (x + √x) dx = c(1/2 + 2/3) = (7/6)c,

it follows that c = 6/7.
(b)

E(1/X) = (6/7) ∫_0^1 (1/x)(x + √x) dx = (6/7) ∫_0^1 (1 + 1/√x) dx = 18/7.
(c)

FY(y) = P(Y ≤ y) = P(X ≤ √y) = (6/7) ∫_0^{√y} (x + √x) dx,

and so

fY(y) = (3/7)(1 + y^{−1/4}) if y ∈ (0, 1), and fY(y) = 0 otherwise.
2. (a) From ∫_0^2 f(x) dx = 1 we get 2a + 2b = 1, and from ∫_0^2 x f(x) dx = 7/6 we get 2a + (8/3)b = 7/6. The two equations give a = b = 1/4.

(b) E(X²) = ∫_0^2 x² f(x) dx = 5/3 and so Var(X) = 5/3 − (7/6)² = 11/36.
3. (a) 1/6.

(b) Let T be the time of the call, from 7pm, in minutes; T is uniform on [0, 120]. Thus,

P(T ≤ 100 | T ≥ 90) = 1/3.
(c) We have

M = 0 if 0 ≤ T ≤ 60, M = T − 60 if 60 ≤ T ≤ 90, and M = 30 if 90 ≤ T.

Then,

EM = (1/120) ∫_60^90 (t − 60) dt + (1/120) ∫_90^120 30 dt = 11.25.
4. (a) P(win a single game) = 3/4. If you play n times, the number X of games you win is Binomial(n, 3/4). If Z is N(0, 1), then

P(X ≥ 250) ≈ P(Z ≥ (250 − (3/4)n)/√(n · (3/4)(1/4))).

For (a), n = 300 and the above expression is P(Z ≥ 10/3) ≈ 0.0004.
For (b), you need to find n so that the above expression is 0.99, or so that

(250 − (3/4)n)/√(n · (3/4)(1/4)) = −2.33,

as Φ(−2.33) ≈ 0.01. If x = √(3n), solving the resulting quadratic equation gives x ≈ 34.04, n ≥ (34.04)²/3, that is, n ≈ 387.
5. (a) M is Binomial(n, 1/6), so EM = n/6.

(b)

P(|M − n/6| < 0.1 · (n/6)) ≈ P(|Z| < 0.1 · (n/6)/√(n · (1/6)(5/6))) = P(|Z| < 0.1√(n/5)) = 2Φ(0.1√(n/5)) − 1.

(c) The above expression is at least 0.99 when

Φ(0.1√(n/5)) = 0.995.