Basic Biostatistical Concepts in Research


E Omoregie

INTRODUCTION
STATISTICS
The science of collecting, describing, and interpreting data

STATISTICAL METHODS
Include:
- clarifying the situation,
- gathering data,
- summarizing data, and
- deriving and communicating meaningful conclusions about the population (based on the analysis of samples)

BIOSTATISTICS
• A contraction of biology and statistics
• The application of statistics to a wide range of topics in biology
• Sometimes referred to as biometry or biometrics

Population vs. Sample:

POPULATION
The collection, or set, of ALL individuals, items, or events of interest about which information is sought

SAMPLE
The subset of the population on which we make measurements

AREAS OF BIOSTATISTICS:
DESCRIPTIVE: Methods of summarizing or describing a set of data. Examples: tables, graphs, numerical summaries.

INFERENTIAL: Methods of drawing conclusions about a population based on a sample.

VARIABLES: Characteristics of interest about each individual within a population or sample. Examples: age, size, length, respiratory rate, number of blood cells, colour. Variables should be measurable or describable.

Types of Data:
Qualitative – non-numerical
- Nominal – names that describe categories, with no order. Example: eye color, gender
- Ordinal – ordered categories. Example: age class (juveniles, sub-adults, adults)
Quantitative – numerical
- Discrete – possible values can be counted or fall at fixed intervals. Examples: number of fish, numerical grade (4, 3.5, 3, 2.5, 2, 1.5, 1, 0)
- Continuous – possible values lie on a continuum. Example: measurements of time, weight, and space/distance

SAMPLING TECHNIQUES
Surveys and experiments should use methods to select a sample
that is “representative” of the population of interest

SAMPLING FRAME: A list of elements belonging to the population from which the sample(s) will be drawn.

JUDGMENTAL SAMPLING: Elements are selected on the basis of being “typical”.
- May be used to get “expert” opinions
- Danger of biased results (for example, interviewing only those households with cars when a route to Wernhill Park is to be mapped out)

PROBABILITY SAMPLING: Each element has a known probability of being selected as part of the sample.

→ Avoid judgmental sampling as much as possible in biological experiments

TYPES OF SAMPLING TECHNIQUES
Simple Random Sample (SRS): A sample of size n selected such that all possible samples of size n from the population (of size N) have an equal chance of being selected.
The concept is simple, but it may be costly and difficult to implement for a large population or geographical area.
[Figure: a SAMPLE drawn from the SAMPLING FRAME]

• Scientific samples are selected using the Random Number Table

[Figure: Random Number Table]
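In software, a pseudo-random number generator plays the role of the random number table. The following is a minimal Python sketch of a simple random sample, assuming the sampling frame is a plain list of hypothetical tag numbers; the helper name `simple_random_sample` and the fixed seed are illustrative choices, not part of the original slides.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)      # seeded only to make the example reproducible
    return rng.sample(frame, n)    # sampling without replacement

# Hypothetical sampling frame: tag numbers for N = 100 fish
frame = list(range(1, 101))
sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```

Sampling without replacement keeps each unit from being measured twice, which matches the usual definition of an SRS.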

Systematic Sample: Every kth item is selected, starting from a randomly selected first item.

May not be appropriate when the arrangement of the units has a cyclical or repetitive trend.
→ Not appropriate for most biological populations

[Figure: systematic sample with a random start]
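Systematic selection with a random start can be sketched in a few lines of Python. This is an illustration under simple assumptions: the frame is an ordered list of hypothetical unit numbers, and `systematic_sample` is a made-up helper name.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th unit, starting from a random position among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start index in 0 .. k-1
    return frame[start::k]

# Hypothetical ordered frame of 100 units
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=1)
print(sample)
```

Because only the start is random, any periodic pattern in the ordering of the frame with period k is carried straight into the sample, which is the weakness noted above.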

Stratified Random Sample: Divide the sampling frame into groups called strata, then select units from each stratum using SRS.

The number of units sampled from each stratum may be equal, or relative to the number of units in each stratum (proportional sampling).

[Figure: strata “adults” and “kids”, with SAMPLE 1 and SAMPLE 2 drawn from them]

Cluster Sample: Divide the sampling frame into subgroups. Randomly select some subgroups and include all members of each selected subgroup.

Cluster sampling, whenever appropriate, is generally less costly than simple random sampling.

[Figure: collection of clusters, with the selected clusters marked]
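The two schemes differ in where the randomness is applied, which a short Python sketch can make concrete. The strata, clusters, and helper names below are all hypothetical, and the proportional allocation uses simple rounding, which is only one of several possible allocation rules.

```python
import random

def proportional_stratified_sample(strata, n, seed=None):
    """Draw an SRS from every stratum, sized in proportion to the stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_stratum = round(n * len(units) / total)  # rounding may shift the total slightly
        sample[name] = rng.sample(units, n_stratum)
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical strata: 60 adults and 40 juveniles
strata = {
    "adults": [f"A{i}" for i in range(60)],
    "juveniles": [f"J{i}" for i in range(40)],
}
stratified = proportional_stratified_sample(strata, n=10, seed=3)
print({name: len(units) for name, units in stratified.items()})

# Hypothetical clusters: 5 ponds with 4 fish each
clusters = {f"pond{p}": [f"P{p}F{i}" for i in range(4)] for p in range(5)}
picked = cluster_sample(clusters, n_clusters=2, seed=3)
print(sorted(picked))
```

With these numbers the stratified sample contains 6 adults and 4 juveniles, while the cluster sample contains all 4 fish from each of 2 randomly chosen ponds.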


FINAL NOTE:
In many cases, the sampling schemes used in actual surveys are combinations of two or more of the basic sampling techniques. The purpose of doing so is to minimize cost, while making sure that the sample taken is truly “representative” of the population.


PROBABILITY vs. STATISTICS


Probability: The population being sampled is known ahead of time, so the probabilities of outcomes can be computed. It is a tool used in statistics.
Statistics: Involves sampling from the population and trying to infer information about a variable in that population.
Loosely speaking, probability is “proactive” while statistics is “reactive”.

PROBABILITY
Why do we need to study probability?
- Researcher / businessperson: may be interested in determining the likelihood that some “critical events” will occur.
- Researcher: may want to know how “reliable” his/her conclusions are.

We are faced with probabilities every day; the issue is that we fail to analyze them:
- The probability of an outcome when tossing a coin or rolling a die
- The probability of scoring 0% on a test of 100 questions with 4 choices per question (if the answers are picked at random rather than by judgment)
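The multiple-choice example can be worked out directly. As a sketch, assume every answer is an independent random guess and each question has exactly one correct choice (neither assumption is stated on the slide); the probability of getting all 100 questions wrong is then (3/4)^100. The helper name `prob_all_wrong` is illustrative.

```python
def prob_all_wrong(n_questions, n_choices):
    """Probability of answering every question wrongly by pure random guessing."""
    p_wrong = (n_choices - 1) / n_choices   # chance of missing a single question
    return p_wrong ** n_questions           # independent guesses multiply

p = prob_all_wrong(100, 4)
print(f"{p:.2e}")
```

The result is on the order of 3 in 10 trillion, so a score of exactly 0% is far less likely under pure guessing than intuition might suggest.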

THE RULES OF THE GAME…
You must have:

- An experiment. An experiment is a planned activity with defined outcomes that can be repeated under the same set of conditions.

- A set of outcomes. An outcome is one of the possible “results” of your experiment.

- A probability value (p) for each outcome or event. It can be viewed as a measure of importance, or a measure of likelihood, for outcomes or events.

How do we assign a probability value to an outcome?
Example: tossing a coin.
We can repeat the experiment many times, count the number of times an outcome occurs, and express that count as a relative frequency (maximum of 1). These are experimental or empirical probabilities.
For example, suppose a coin is tossed 24,000 times and comes up heads 12,012 times. The empirical probability of getting a head is then

\[ P(\text{getting a head}) = \frac{12012}{24000} = 0.5005 \]

In general,

\[ P(\text{any event}) = \frac{\text{Number of outcomes belonging to the event}}{\text{Total number of outcomes}} \]

If an experiment has only two mutually exclusive outcomes that are expected to be equally likely (as with a fair coin) and p is far from 0.5, then either the experiment is faulty, the population does not behave as assumed, or the sample size was not sufficient.

Experimental probabilities make use of this law (the law of large numbers):
If the number of times an experiment is repeated is increased, the ratio of the number of successful occurrences to the number of trials will tend to approach the theoretical probability of the outcome of an individual trial.

How is this achieved?
- Number of samples taken from the population
- Randomization
- Replication
- Etc.
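The law can be seen in a small simulation. The sketch below assumes a fair coin (theoretical probability 0.5) and uses Python's pseudo-random generator in place of real tosses; the function name and the seed are illustrative.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times and return heads / tosses."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(2024)   # fixed seed so the run can be repeated
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n, rng))
```

Small runs can land well away from 0.5, while the larger runs settle close to it, which is why replication matters.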

Probability in independent events:
Independent Events: Two or more events are independent if and only if the occurrence (or non-occurrence) of one does not affect the probability of the others.

Here the analysis becomes more complex:
- Most important is expressing these events/outcomes as numerical values for analysis → variables
- These variables can be discrete or continuous
- Within a biological population, these variables are often distributed approximately normally → normal distribution (values deviate symmetrically about the mean, and the spread is measured by the variance)

The normal probability distribution is considered to be the single most important probability distribution.

[Figure: normal curve plotted against X]

σ is the distance (deviation) from the inflection point to the center
μ is the mean value of the distribution
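For reference, the standard form of the normal density with mean μ and standard deviation σ is given below. It does not appear on the slide. Setting its second derivative to zero shows that the inflection points lie exactly one σ on either side of the mean, which is the geometric reading of σ described above.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

f''(x) = \frac{f(x)}{\sigma^2}\left(\frac{(x-\mu)^2}{\sigma^2} - 1\right) = 0
\quad\Longleftrightarrow\quad x = \mu \pm \sigma
```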

Sample size affects the variability of the sample mean. In addition, increasing the sample size changes the shape of the distribution of the sample mean:

[Figure: distribution of the sample mean, “From this… Into this!”, with increasing sample size (n)]
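A simulation gives a feel for this effect. The sketch below assumes a deliberately skewed population (exponential with mean 1, chosen only for illustration) and measures how much the sample mean varies for different sample sizes; `sd_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

def sd_of_sample_means(n, n_samples, rng):
    """Standard deviation of the means of n_samples samples, each of size n."""
    means = [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

rng = random.Random(11)   # fixed seed for repeatability
for n in (2, 10, 50):
    print(n, round(sd_of_sample_means(n, 2000, rng), 3))
```

The spread of the sample means shrinks roughly in proportion to 1/√n, and a histogram of the means (not drawn here) would look increasingly bell-shaped as n grows.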


PARAMETRIC vs. NON-PARAMETRIC


Parametric Assumptions
▪ The observations must be independent
▪ The observations must be drawn from normally distributed populations
▪ These populations must have the same variances
▪ The means of these normal populations must be linear combinations of effects due to columns and/or rows

Nonparametric Assumptions
▪ Observations could be independent or dependent
▪ The variable under study has underlying continuity

| Situation | Parametric (actual measurements used) | Nonparametric (actual measurements not used; ranking) |
|---|---|---|
| Two samples: compare mean value for some variable of interest | t-test for independent samples | Wald-Wolfowitz runs test; Mann-Whitney U test; Kolmogorov-Smirnov two-sample test |
| Multiple groups | Analysis of variance (ANOVA) | Kruskal-Wallis analysis of ranks; Median test |
| Compare two variables measured in the same sample | t-test for dependent samples | Sign test; Wilcoxon’s matched pairs test |
| More than two variables measured in the same sample | Repeated measures ANOVA | Friedman’s two-way analysis of variance; Cochran Q |
| Relationship between two variables | Correlation coefficient | Spearman R; Kendall Tau; Coefficient Gamma |
| Two variables of interest are categorical | | Chi square; Phi coefficient; Fisher exact test; Kendall coefficient of concordance |

Summary Table of Statistical Tests
| Level of Measurement | 1 Sample | 2 Samples, Independent | 2 Samples, Dependent | K Samples (>2), Independent | K Samples (>2), Dependent | Correlation |
|---|---|---|---|---|---|---|
| Categorical or Nominal | χ² or binomial | χ² | McNemar’s χ² | χ² | Cochran’s Q | |
| Rank or Ordinal | | Mann-Whitney U | Wilcoxon Matched Pairs Signed Ranks | Kruskal-Wallis H | Friedman’s ANOVA | Spearman’s rho |
| Parametric (Interval & Ratio) | z test or t test | t test between groups | t test within groups | 1-way ANOVA between groups | 1-way ANOVA (within or repeated measure) | Pearson’s r |

Parametric, K samples: Factorial (2-way) ANOVA


Criticisms of Nonparametric Procedures


▪ Losing precision/wasteful of data
▪ Low power
▪ False sense of security
▪ Lack of software
▪ Testing distributions only
▪ Higher-ordered interactions not dealt with

TESTING OF EXPERIMENTAL HYPOTHESIS
The two opposing hypotheses from an experiment form the null and the alternative hypotheses:
NULL HYPOTHESIS (Ho) is the claim initially believed to be true. It is the “status quo”.

ALTERNATIVE HYPOTHESIS (Ha) is the claim we want to prove.

Example: comparing the mean weight of two different populations of fish:

Ho: the mean of group I is the same as the mean of group II

Ha: the mean of group I differs from the mean of group II

• If we find “sufficient evidence” that the two means differ, then we can reject Ho and claim Ha
• If we don’t find “sufficient evidence” that the two means differ, then we have only failed to demonstrate a difference. It DOESN’T mean the weights are the same. In other words, we only failed to reject Ho.
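The decision rule can be illustrated with a small Python sketch. Note the assumptions: the fish weights are invented for illustration, the significance level of 0.05 is a conventional choice not stated on the slide, and the test used here is a permutation test on the difference in means rather than the t-test listed in the tables above, chosen because it needs only the standard library.

```python
import random
import statistics

def permutation_test(group1, group2, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the proportion of random relabelings whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group1) - statistics.fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)   # relabel the fish at random, as if Ho were true
        diff = abs(statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical fish weights in grams (illustrative values only)
group_1 = [212, 198, 225, 240, 205, 219, 231, 208]
group_2 = [251, 243, 262, 238, 270, 255, 247, 259]

p_value = permutation_test(group_1, group_2)
alpha = 0.05
print("reject Ho" if p_value < alpha else "fail to reject Ho", p_value)
```

A small p-value means a difference this large would rarely arise if group labels were arbitrary, so Ho is rejected; a large p-value only means the data are compatible with Ho, not that Ho has been shown to be true.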



At the end of the experiment, we will either reject Ho or fail to reject Ho. As with any decision, we expose ourselves to “errors”:

| Decision | Reality: Ho is TRUE (“mean weights are the same”) | Reality: Ho is FALSE (“mean weights differ”) |
|---|---|---|
| Reject Ho | TYPE 1 Error (α) | Correct Decision (1-β) |
| Fail to reject Ho | Correct Decision (1-α) | TYPE 2 Error (β) |

TYPE 1 Error (α) is the error of rejecting Ho when Ho is true

TYPE 2 Error (β) is the error of failing to reject Ho when Ho is false
→ A good statistical test must be able to control these two errors
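Both error rates can be estimated by simulation. The sketch below makes several assumptions for illustration: normally distributed data with a known common standard deviation, a two-sided two-sample z-test, a significance level of 0.05, and a true difference of one standard deviation when Ho is false. All function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample1, sample2, sigma, alpha=0.05):
    """Two-sided two-sample z-test, assuming a known common sigma."""
    se = sigma * (1 / len(sample1) + 1 / len(sample2)) ** 0.5
    z = (fmean(sample1) - fmean(sample2)) / se
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(mean_diff, n=20, sigma=1.0, n_experiments=2000, seed=0):
    """Fraction of simulated experiments in which Ho is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        s1 = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = [rng.gauss(mean_diff, sigma) for _ in range(n)]
        rejections += z_test_rejects(s1, s2, sigma)
    return rejections / n_experiments

# Ho true: any rejection is a Type 1 error, so the rate should be near alpha
print("Type 1 error rate:", rejection_rate(mean_diff=0.0))
# Ho false: the rejection rate estimates the power, 1 - beta
print("Power (1 - beta): ", rejection_rate(mean_diff=1.0))
```

Lowering α reduces Type 1 errors but, for a fixed sample size, raises β; increasing the sample size is the usual way to keep both under control.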

STANDARD DEVIATION AND VARIANCE
Variance
The variance and the closely related standard deviation are measures of how spread out a distribution is. In other words, they are measures of variability.

The variance is computed as the average squared deviation of each number from its mean.

Population Variance:
\[ \sigma^2 = \frac{\sum (X - \mu)^2}{N} \]

Sample Variance (in its most common form, with N - 1 in the denominator):
\[ s^2 = \frac{\sum (X - M)^2}{N - 1} \]

where μ is the population mean, N is the number of scores, M is the mean of the sample, and X is the variable.



Standard Deviation
The standard deviation formula is very simple: it is the square root
of the variance. It is the most commonly used measure of spread.
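A short worked example ties the two measures together. The scores below are hypothetical, and the sample variance uses the common N - 1 denominator, which is an assumption about the form intended here.

```python
import math

def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores, mean = 5

print(population_variance(scores))              # 4.0
print(math.sqrt(population_variance(scores)))   # 2.0 (standard deviation)
print(sample_variance(scores))                  # 32/7, slightly larger
```

The standard deviation is in the same units as the data (here, score points), which is one reason it is reported more often than the variance.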
