Probability Principles

Probability Theory
Prepared by:
Dr. Sampson Twumasi-Ankrah
Department of Statistics and Actuarial Science

KNUST
March 14, 2022
Basic Concepts of Probability
Introduction
Probability forms the basis of inferential statistics. We can
think of the probability of an outcome as the likelihood of
observing that outcome. If something has a high likelihood of
happening, it has a high probability (close to 1). If something
has a small chance of happening, it has a low probability (close
to 0). If something occurs that has a low probability, we
investigate to find out "what's up".
Probability
Probability is a measure of the likelihood of a random
phenomenon or chance behavior. It describes the long-term
proportion with which a certain outcome will occur in
situations with short-term uncertainty.
Basic Concepts of Probability Cont’d
Illustration
Consider an experiment in which only one of two possible
outcomes can occur. For example, the result of treatment with
an antibiotic is that an infection is either cured or not cured
within 5 days.
The probability of a cure is not easily ascertained a priori, i.e.,
prior to performing an experiment. If the antibiotic were widely
used, a physician who prescribes the product might, based on his
or her own experience, be able to give a good estimate of the
probability of a cure for patients treated with the drug.
Example
In the physician's practice, he or she may have observed that
approximately three of four patients treated with the antibiotic
are cured. For this physician, the probability that a patient
will be cured when treated with the antibiotic is 75%.
Basic Concepts of Probability Cont’d
NB: The exact probability can be determined only by treating the
total population and observing the proportion cured, a practical
impossibility in this case. In this context, it would be fair to say
that exact probabilities are nearly always unknown.
Definitions
a. Experiment: An experiment is any process that generates
a set of data or well-defined outcomes. There are two types
of experiments, namely Deterministic and Random (or
Chance) Experiments. In deterministic experiments the
observed results are not subject to chance, while the
outcomes of random experiments cannot be predicted with
certainty. A random experiment could be as simple as
tossing a coin or die and observing an outcome, or as
complex as choosing 50 people from a population and
testing them for AIDS.
Definitions Cont’d
b. Trial
Each repetition of an experiment is called a trial. That is, a
trial is a single performance of an experiment.
c. Outcome
A possible result of a trial of an experiment is called an
outcome. When every outcome of an experiment has the same
chance of occurring as the others, the outcomes are said to be
equally likely. For example, the toss of a coin and the roll of
a die yield the possible outcomes in the sets {H, T} and
{1, 2, 3, 4, 5, 6}, and a football match yields
{win (W), loss (L), draw (D)}.
Sample Space
The sample space is the collection of all possible outcomes of a
probability experiment. We use the notation S for the sample
space. Each element or outcome of the experiment is called a
sample point.
Example of Sample Spaces
1 The results of two and three tosses of a coin give the
following sample spaces, respectively:
S = {HH, HT, TH, TT}
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
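The two sample spaces above can also be listed mechanically. The following is a minimal sketch, assuming Python's standard itertools module; the helper name coin_sample_space is hypothetical and not part of the original notes.

```python
from itertools import product


def coin_sample_space(n_tosses):
    """Return every H/T sequence of length n_tosses as a list of strings."""
    # product("HT", repeat=n) yields all ordered tuples of H and T
    return ["".join(seq) for seq in product("HT", repeat=n_tosses)]


two_tosses = coin_sample_space(2)
three_tosses = coin_sample_space(3)

print(two_tosses)         # ['HH', 'HT', 'TH', 'TT']
print(len(three_tosses))  # 8
```

Each extra toss doubles the number of sample points, so n tosses give 2^n outcomes.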
Event:
An event is a collection of one or more outcomes from an
experiment. That is, it is a subset of a sample space. It is
denoted by a capital letter. For example, we may have:
The event of observing exactly one head (H) in three tosses of
a coin, A = {HTT, THT, TTH}
Consider a newly married couple planning to have three
children. The event of the family having exactly two girls is:
D = {BGG, GBG, GGB}
Tree Diagram:
A tree diagram represents pictorially the outcomes of a random
experiment. Each outcome, which is a sequence of trial results,
is represented by a path through the tree, and its probability
is the product of the probabilities along that path. For
example, consider a couple planning to have three children,
assuming each child born is equally likely to be a boy (B) or a
girl (G). The tree then has eight paths, one for each outcome
from BBB to GGG.
Determination of Probability of an Event
The probability of an event A, denoted P(A), is a numerical
measure of the likelihood that A occurs, with 0 ≤ P(A) ≤ 1.
If P(A) = 0, the event A is said to be impossible, and if
P(A) = 1, A is said to be certain. If A′ is the complement of
the event A, then P(A′) = 1 − P(A) is the probability that
event A will not occur.
There are three main schools of thought in defining and
interpreting the probability of an event. These are the
Classical Definition, the Empirical Concept and the
Subjective Approach. The first two are referred to as
the Objective Approach.
Probability of an Event
a. The Classical Definition
This is based on the assumption that the outcomes of an
experiment are equally likely. If an experiment can lead to
n(S) mutually exclusive and equally likely outcomes, of which
n(A) are favourable to the event A, then the probability of A
is defined by
P(A) = n(A) / n(S) = (Number of successful outcomes) / (Number of possible outcomes)
The classical definition of the probability of event A is
referred to as a priori probability because it is determined
before any experiment is performed to observe the outcomes of
event A.
The Empirical Concept:

This concept uses the relative frequencies of past occurrences to
develop probabilities for the future. The probability of an event
A happening in the future is determined by observing what
fraction of the time similar events happened in the past. That is,
P(A) = (Number of times A occurred in the past) / (Total number of observations)
The relative frequency of the occurrence of the event A used to
estimate P(A) becomes more accurate as the number of trials
increases. The relative frequency approach of defining P(A) is
sometimes called a posteriori probability because P(A) is
determined only after event A is observed.
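The claim that the relative frequency settles down as trials accumulate can be illustrated by simulation. This is only a sketch under stated assumptions: a fair coin is simulated with Python's random module, the seed is fixed so the run is repeatable, and the function name relative_frequency is hypothetical.

```python
import random


def relative_frequency(n_trials, seed=0):
    """Estimate P(head) for a simulated fair coin from n_trials tosses."""
    rng = random.Random(seed)  # fixed seed: an assumption for repeatability
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials


# The estimate is expected to drift towards 0.5 as n grows.
for n in (10, 100, 10_000):
    print(n, relative_frequency(n))
```

The printed values depend on the seed, so no particular output is claimed here; only the tendency towards 0.5 for large n matters.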
The Subjective Definition:

The subjective concept of probability is based on the degree of
belief supported by the evidence available. The probability of an
event A may therefore be assessed through experience, intuition,
judgement or expertise. Examples include assessing the
probability that a disease will be cured or that it will rain
today. This approach to probability has been developed relatively
recently and is related to Bayesian Decision Analysis.
Example 1
Consider the problem of a couple planning to have three
children, assuming each child born is equally likely to be a boy
(B) or a girl (G).
a. List the possible outcomes in this experiment
b. What is the probability of the couple having exactly two
girls?
Solution:
The sample space for this experiment is
S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
Let A be the event of the couple having exactly two girls.
Then, A = {BGG, GBG, GGB}
P(A) = n(A) / n(S) = 3/8
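The counting in this solution can be checked by enumeration. A minimal sketch, assuming equally likely outcomes as stated in the example; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All ordered sequences of three births, each B or G
sample_space = ["".join(s) for s in product("BG", repeat=3)]

# Event A: exactly two girls
event_two_girls = [s for s in sample_space if s.count("G") == 2]

# Classical definition: P(A) = n(A) / n(S)
p_two_girls = Fraction(len(event_two_girls), len(sample_space))

print(sample_space)     # 8 outcomes, BBB through GGG
print(event_two_girls)  # ['BGG', 'GBG', 'GGB']
print(p_two_girls)      # 3/8
```

Using Fraction keeps the answer exact instead of converting it to the decimal 0.375.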
Probability of Compound Events
Two or more events are combined to form a single event using
the set operations ∪ and ∩. The event
(A ∪ B) occurs if either A or B or both occur.
(A ∩ B) occurs if both A and B occur.
Definitions:
Mutually Exclusive Events
Two or more events which have no common outcome(s) (i.e.
never occur at the same time) are said to be mutually exclusive.
If A and B are mutually exclusive events of an experiment, then
A ∩ B = ∅ and P(A ∪ B) = P(A) + P(B), since P(A ∩ B) = 0
Probability of Events
Independent Events:
Two or more events are said to be independent if the
probability of occurrence of one is not influenced by the
occurrence or non-occurrence of the other(s). Mathematically,
two events A and B are said to be independent if and only
if P(A ∩ B) = P(A) · P(B). In general, for any two events with
P(A) > 0, P(A ∩ B) = P(A) · P(B|A); if P(B|A) ≠ P(B), the
events are said to be dependent.
Conditional Probability:
Let A and B be two events in the sample space S with
P(B) > 0. The probability that event A occurs given that
event B has already occurred, denoted P(A|B), is called the
conditional probability of A given B. It is defined as
P(A|B) = P(A ∩ B) / P(B), P(B) > 0.
In particular, if S is a finite equiprobable space, then
P(A ∩ B) = n(A ∩ B) / n(S) and P(B) = n(B) / n(S), so that
P(A|B) = n(A ∩ B) / n(B).
Exhaustive Events:
Two or more events defined on the same sample space are said
to be exhaustive if their union is equal to the sample space S
(if they are also nonempty and mutually exclusive, they
partition the sample space).
E.g., A1, A2, A3 ⊆ S are exhaustive if A1 ∪ A2 ∪ A3 = S
Definition (partition of sample space):
A1, A2, A3, ···, An form a partition of the sample space S if
the following hold:
1 Ai ≠ ∅ for all i = 1, 2, 3, ···, n
2 Ai ∩ Aj = ∅ for all i ≠ j, i, j = 1, 2, 3, ···, n
3 A1 ∪ A2 ∪ ··· ∪ An = S
In other words, the n events A1, A2, A3, ···, An form a partition
of the sample space S if they are (a) nonempty, (b) mutually
exclusive and (c) collectively exhaustive.
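The three partition conditions translate directly into set operations. The following is a sketch only, assuming events are given as Python sets of outcomes; the function name is_partition and the die example are illustrative assumptions.

```python
def is_partition(events, sample_space):
    """Check the three conditions: nonempty, mutually exclusive, exhaustive."""
    events = [set(e) for e in events]
    sample_space = set(sample_space)

    # (a) every event contains at least one outcome
    nonempty = all(len(e) > 0 for e in events)

    # (b) no two distinct events share an outcome
    mutually_exclusive = all(
        events[i].isdisjoint(events[j])
        for i in range(len(events))
        for j in range(i + 1, len(events))
    )

    # (c) the union of all events is the whole sample space
    exhaustive = set().union(*events) == sample_space

    return nonempty and mutually_exclusive and exhaustive


die = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2}, {3, 4}, {5, 6}], die))  # True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], die)) # False, 3 is shared
print(is_partition([{1, 2}, {3, 4}], die))          # False, 5 and 6 missing
```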
Example
a. In a certain population of women, 40% have had breast
cancer, 20% are smokers and 13% are smokers and have
had breast cancer. If a woman is selected at random from
the population, what is the probability that she has had
breast cancer, smokes or both?
b. Let A and B be events such that P(A) = 0.6, P(B) = 0.5
and P(A ∪ B) = 0.8.
(i) Find P(A|B)
(ii) Are A and B independent?
Solution
a. Let B be the event that the woman has had breast cancer and
W the event that she smokes. Then,
P(B) = 0.4, P(W) = 0.2, P(B ∩ W) = 0.13
P(B ∪ W) = P(B) + P(W) − P(B ∩ W)
= 0.4 + 0.2 − 0.13 = 0.47
b. Given that P(A) = 0.6, P(B) = 0.5, P(A ∪ B) = 0.8
(i) P(A ∩ B) = P(A) + P(B) − P(A ∪ B)
= 0.6 + 0.5 − 0.8 = 0.3
P(A|B) = P(A ∩ B) / P(B), P(B) > 0
= 0.3 / 0.5 = 3/5 = 0.6
(ii) A and B are independent if P(A) · P(B) = P(A ∩ B)
P(A) · P(B) = (0.6)(0.5) = 0.3 = P(A ∩ B)
which means that A and B are independent.
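The arithmetic in part b can be reproduced exactly. A minimal sketch, assuming the three probabilities given in the example; exact fractions are used so that the independence check is not affected by floating-point rounding.

```python
from fractions import Fraction

p_a = Fraction(6, 10)       # P(A) = 0.6
p_b = Fraction(5, 10)       # P(B) = 0.5
p_a_or_b = Fraction(8, 10)  # P(A ∪ B) = 0.8

# Addition rule rearranged: P(A ∩ B) = P(A) + P(B) − P(A ∪ B)
p_a_and_b = p_a + p_b - p_a_or_b

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = p_a_and_b / p_b

# Independence holds if and only if P(A ∩ B) = P(A) · P(B)
independent = p_a_and_b == p_a * p_b

print(p_a_and_b, p_a_given_b, independent)  # 3/10 3/5 True
```

Note that P(A|B) = 3/5 equals P(A), which is another way of seeing that B carries no information about A here.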
The Multiplication Rule for P (A ∩ B)
The definition of conditional probability yields the following
result, obtained by multiplying both sides of the conditional
probability equation by P(B).
P(A|B) = P(A ∩ B) / P(B)
P(A|B) ∗ P(B) = [P(A ∩ B) / P(B)] ∗ P(B)
P(A|B) ∗ P(B) = P(A ∩ B)
This rule is important because it is often the case that P (A ∩ B)
is desired, whereas both P(B) and P (A|B) can be specified from
the problem description.
The Law of Total Probability
Let A1 , · · · , Ak be mutually exclusive and exhaustive events.
Then for any other event B,
P(B) = P(B|A1) ∗ P(A1) + ··· + P(B|Ak) ∗ P(Ak)
= Σ_{i=1}^{k} P(B|Ai) ∗ P(Ai)
Bayes’ Rule
The power of Bayes’ rule is that in many situations where we
want to compute P (A|B) it turns out that it is difficult to do so
directly, yet we might have direct information about P (B|A).
Bayes' rule enables us to compute P(A|B) in terms of P(B|A).
P(A|B) = P(A ∩ B) / P(B) = P(B|A) P(A) / P(B)
Bayes Theorem
Let A and Aᶜ constitute a partition of the sample space S
with P(A) > 0 and P(Aᶜ) > 0. Then for any event B in S
such that P(B) > 0,
P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|Aᶜ) P(Aᶜ)]
Example
A paint-store chain produces and sells latex and semigloss
paint. Based on long-range sales, the probability that a
customer will purchase latex paint is 0.75. Of those who
purchase latex paint, 60% also purchase rollers. But only 30%
of semigloss paint buyers purchase rollers. A randomly selected
buyer purchases a roller and a can of paint. What is the
probability that the paint is latex?
Solution
L = {The customer purchases latex paint}, P(L) = 0.75
S = {The customer purchases semigloss paint}, P(S) = 0.25
R = {The customer purchases a roller}
P(R|L) = 0.6; P(R|S) = 0.3
P(R) = P(R|L) P(L) + P(R|S) P(S) = (0.6 × 0.75) + (0.3 × 0.25) = 0.525
P(L|R) = P(L ∩ R) / P(R)
= P(R|L) P(L) / P(R)
= (0.6 × 0.75) / [(0.6 × 0.75) + (0.3 × 0.25)]
≈ 0.857
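The paint-store solution combines the law of total probability with Bayes' rule, and the same two steps work for any finite partition. The sketch below assumes the probabilities from the example; the function name posterior and the dictionary keys are hypothetical.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a partition.

    prior:      dict mapping each hypothesis H to P(H)
    likelihood: dict mapping each hypothesis H to P(evidence | H)
    """
    # Law of total probability: P(evidence) = sum of P(evidence|H) * P(H)
    evidence = sum(likelihood[h] * prior[h] for h in prior)
    # Bayes' rule: P(H|evidence) = P(evidence|H) * P(H) / P(evidence)
    return {h: likelihood[h] * prior[h] / evidence for h in prior}


prior = {"latex": 0.75, "semigloss": 0.25}
roller_given_paint = {"latex": 0.6, "semigloss": 0.3}

post = posterior(prior, roller_given_paint)
print(round(post["latex"], 3))  # 0.857
```

The posterior probabilities sum to 1 because the hypotheses partition the sample space.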
Axioms of Probability
Given an experiment and a sample space S, the objective of
probability is to assign to each event A a number P(A), called
the probability of the event A, which will give a precise measure
of the chance that A will occur. To ensure that the probability
assignments will be consistent with our intuitive notions of
probability, all assignments should satisfy the following axioms
(basic properties) of probability:
A.1: For every event A, 0 ≤ P(A) ≤ 1
A.2: P(S) = 1
A.3: If A and B are mutually exclusive events, i.e. A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B)
A.4: If A1, A2, ···, An is a sequence of n mutually exclusive
events, then
P(A1 ∪ A2 ∪ A3 ∪ ··· ∪ An) = P(A1) + P(A2) + P(A3) + ··· + P(An)
Theorems
The following theorems arise directly from the above axioms:

Theorem 1: If ∅ is the empty set, then P(∅) = 0
Theorem 2: If A′ is the complement of an event A, then
P(A′) = 1 − P(A)
Bayes’ Theorem, Screening Tests, Sensitivity,
Specificity, and Predictive Value Positive and Negative:
There are two states regarding the disease and two states
regarding the result of the screening test.
We define the following events of interest:
D: the individual has the disease (presence of the disease)
D̄: the individual does not have the disease (absence of the
disease)
T: the individual has a positive screening test result
T̄: the individual has a negative screening test result
Definitions
There are two false results:
1. A false positive result:
This result happens when a test indicates a positive status
when the true status is negative. Its probability is:
P(T|D̄) = P(positive result | absence of the disease)
2. A false negative result:
This result happens when a test indicates a negative status
when the true status is positive. Its probability is:
P(T̄|D) = P(negative result | presence of the disease)
The Sensitivity:
The sensitivity of a test is the probability of a positive test
result given the presence of the disease.
P(T|D) = P(positive result of the test | presence of the disease)
The specificity:
The specificity of a test is the probability of a negative test
result given the absence of the disease.
P(T̄|D̄) = P(negative result of the test | absence of the disease)
To clarify these concepts, suppose we have a sample of n
subjects who are cross-classified according to Disease Status and
Screening Test Result as follows:
                      Disease
Test Result      Present (D)     Absent (D̄)      Total
Positive (T)     a               b               a + b = n(T)
Negative (T̄)     c               d               c + d = n(T̄)
Total            a + c = n(D)    b + d = n(D̄)    n
For example, there are a subjects who have the disease and
whose screening test result was positive.
From the Sensitivity and Specificity Table
From this table, we may compute the following conditional
probabilities:
1. The probability of a false positive result:
P(T|D̄) = n(T ∩ D̄) / n(D̄) = b / (b + d)
2. The probability of a false negative result:
P(T̄|D) = n(T̄ ∩ D) / n(D) = c / (a + c)
3. The sensitivity of the screening test:
P(T|D) = n(T ∩ D) / n(D) = a / (a + c)
4. The specificity of the screening test:
P(T̄|D̄) = n(T̄ ∩ D̄) / n(D̄) = d / (b + d)
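The four ratios above depend only on the cell counts a, b, c and d. A minimal sketch follows, assuming the table layout used in these notes; the function name screening_rates is hypothetical, and the counts in the usage lines are the ones from the Alzheimer's screening example in these notes.

```python
def screening_rates(a, b, c, d):
    """Conditional probabilities from a 2x2 screening table.

    a: diseased, test positive      b: not diseased, test positive
    c: diseased, test negative      d: not diseased, test negative
    """
    return {
        "sensitivity": a / (a + c),     # P(T | D)
        "specificity": d / (b + d),     # P(T-bar | D-bar)
        "false_positive": b / (b + d),  # P(T | D-bar)
        "false_negative": c / (a + c),  # P(T-bar | D)
    }


rates = screening_rates(a=436, b=5, c=14, d=495)
print(round(rates["sensitivity"], 4))  # 0.9689
print(round(rates["specificity"], 4))  # 0.99
```

Sensitivity and the false negative probability always sum to 1, as do specificity and the false positive probability, because each pair conditions on the same disease status.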
Definitions of the Predictive Value Positive and
Predictive Value Negative of a Screening Test:
1. The predictive value positive of a screening test:
The predictive value positive is the probability that a subject
has the disease, given that the subject has a positive screening
test result:
P(D|T) = P(the subject has the disease | positive result)
= P(presence of the disease | positive result)
2. The predictive value negative of a screening test:
The predictive value negative is the probability that a subject
does not have the disease, given that the subject has a negative
screening test result:
P(D̄|T̄) = P(the subject does not have the disease | negative
result)
= P(absence of the disease | negative result)
Calculating the Predictive Value Positive and Predictive
Value Negative:
(How to calculate P(D|T) and P(D̄|T̄)):
We calculate these conditional probabilities using the knowledge
of:
1 The sensitivity of the test = P(T|D)
2 The specificity of the test = P(T̄|D̄)
3 The probability of the relevant disease in the general
population, P(D). (It is usually obtained from another
independent study.)
Calculating the Predictive Value Positive, P(D|T):
P(D|T) = P(T ∩ D) / P(T)
But we know that:
P(T) = P(T ∩ D) + P(T ∩ D̄)
P(T ∩ D) = P(T|D) P(D) (multiplication rule)
P(T ∩ D̄) = P(T|D̄) P(D̄) (multiplication rule)
P(T) = P(T|D) P(D) + P(T|D̄) P(D̄)
Therefore, we reach the following version of Bayes' Theorem:
P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|D̄) P(D̄)]    (1)
NOTE:
P(T|D) = sensitivity
P(T|D̄) = 1 − P(T̄|D̄) = 1 − specificity
P(D) = the probability of the relevant disease in the general
population
P(D̄) = 1 − P(D)
Calculating the Predictive Value Negative, P(D̄|T̄):
To obtain the predictive value negative of a screening test, we
use the following statement of Bayes' theorem:
P(D̄|T̄) = P(T̄|D̄) P(D̄) / [P(T̄|D̄) P(D̄) + P(T̄|D) P(D)]    (2)
NOTE:
P(T̄|D̄) = specificity
P(T̄|D) = 1 − P(T|D) = 1 − sensitivity
Example:
A medical research team wished to evaluate a proposed screening
test for Alzheimer's disease. The test was given to a random
sample of 450 patients with Alzheimer's disease and an
independent random sample of 500 patients without symptoms of
the disease. The two samples were drawn from populations of
subjects who were 65 years of age or older. The results are as
follows:
                   Alzheimer's Disease
Test Result      Present (D)     Absent (D̄)      Total
Positive (T)     436             5               441
Negative (T̄)     14              495             509
Total            450             500             950
Based on another independent study, it is known that the
percentage of patients with Alzheimer's disease (the rate of
prevalence of the disease) is 11.3% out of all subjects who were
65 years of age or older.
Solution
Using the data, we estimate the following quantities:
1 The sensitivity of the test:
P(T|D) = n(T ∩ D) / n(D) = 436/450 = 0.9689
2 The specificity of the test:
P(T̄|D̄) = n(T̄ ∩ D̄) / n(D̄) = 495/500 = 0.99
3 The probability of the disease in the general population,
P(D): The rate of disease in the relevant general
population, P(D), cannot be computed from the sample
data given in the table. However, it is given that the
percentage of patients with Alzheimer's disease is 11.3%
out of all subjects who were 65 years of age or older.
Therefore P(D) can be computed to be:
P(D) = 11.3% / 100% = 0.113
The predictive value positive of the test:
We wish to estimate the probability that a subject who is positive
on the test has Alzheimer's disease. We use the Bayes' formula of
Equation (1):
P(D|T) = P(T|D) P(D) / [P(T|D) P(D) + P(T|D̄) P(D̄)]    (3)
From the tabulated data, we compute:
P(T|D) = 436/450 = 0.9689
P(T|D̄) = n(T ∩ D̄) / n(D̄) = 5/500 = 0.01
Substituting these results into Equation (1), we get:
P(D|T) = (0.9689) P(D) / [(0.9689) P(D) + (0.01) P(D̄)]
P(D|T) = (0.9689)(0.113) / [(0.9689)(0.113) + (0.01)(1 − 0.113)]
≈ 0.93
As we see, in this case, the predictive value positive of the test is
very high.
The predictive value negative of the test:
We wish to estimate the probability that a subject who is
negative on the test does not have Alzheimer's disease. We use
the Bayes' formula of Equation (2):
P(D̄|T̄) = P(T̄|D̄) P(D̄) / [P(T̄|D̄) P(D̄) + P(T̄|D) P(D)]    (4)
To compute P(D̄|T̄), we first compute the following
probabilities:
P(T̄|D̄) = 495/500 = 0.99
P(D̄) = 1 − P(D) = 1 − 0.113 = 0.887
P(T̄|D) = n(T̄ ∩ D) / n(D) = 14/450 = 0.0311
Substituting in Equation (2) gives:
P(D̄|T̄) = P(T̄|D̄) P(D̄) / [P(T̄|D̄) P(D̄) + P(T̄|D) P(D)]
= (0.99)(0.887) / [(0.99)(0.887) + (0.0311)(0.113)] = 0.996
As we see, the predictive value negative is also very high.
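Both predictive values follow from sensitivity, specificity and prevalence through Bayes' theorem, so the worked example can be checked in a few lines. This is a sketch under the assumptions of the example (sample counts for sensitivity and specificity, prevalence of 11.3% taken from a separate study); the function name predictive_values is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (predictive value positive, predictive value negative)."""
    p_d = prevalence          # P(D)
    p_not_d = 1 - prevalence  # P(D-bar)

    # P(D|T) = P(T|D)P(D) / [P(T|D)P(D) + P(T|D-bar)P(D-bar)]
    ppv = sensitivity * p_d / (
        sensitivity * p_d + (1 - specificity) * p_not_d
    )
    # P(D-bar|T-bar) = P(T-bar|D-bar)P(D-bar)
    #                  / [P(T-bar|D-bar)P(D-bar) + P(T-bar|D)P(D)]
    npv = specificity * p_not_d / (
        specificity * p_not_d + (1 - sensitivity) * p_d
    )
    return ppv, npv


ppv, npv = predictive_values(436 / 450, 495 / 500, 0.113)
print(round(ppv, 3), round(npv, 3))  # 0.925 0.996
```

The unrounded inputs give about 0.925 for the predictive value positive, which agrees with the 0.93 reported above to two decimal places. Lowering the prevalence argument lowers the predictive value positive even when sensitivity and specificity stay the same.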