
METHODS IN BIOCHEMICAL RESEARCH

Meaning of Research
Research is an art of scientific investigation. The Advanced Learner's Dictionary defines
research as a careful investigation or inquiry, especially through the search for new facts in any
branch of knowledge. According to WHO, research is a quest for knowledge through diligent
search, investigation or experimentation aimed at the discovery and interpretation of new
knowledge. Generally, research can be defined as a scientific inquiry aimed at learning new
facts, testing ideas, etc. It is the systematic collection, analysis and interpretation of data to
generate new knowledge and answer a certain question or solve a problem.

Characteristics of research
• It demands a clear statement of the problem
• It requires a plan (it is not aimlessly “looking” for something in the hope that you
will come across a solution)
• It builds on existing data, using both positive and negative findings
• New data should be collected as required and be organized in such a way that they
answer the research question(s).

Types of research
Research is a systematic search for information and new knowledge. It covers topics in every
field of science and perceptions of its scope and activities are unlimited. The classical broad
divisions of research are basic and applied research. Basic research is necessary to
generate new knowledge and technologies to deal with major unresolved health problems. On
the other hand, applied research is necessary to identify priority problems and to design and
evaluate policies and programs that will deliver the greatest health benefit, making optimal
use of available resources.

Quantitative and qualitative research: Early forms of research originated in the natural
sciences such as biology, chemistry, physics, geology etc. and were concerned with
investigating things which we could observe and measure in some way. Such observations
and measurements can be made objectively and repeated by other researchers. This process is
referred to as “quantitative” research. Much later, along came researchers working in the
social sciences: psychology, sociology, anthropology etc. They were interested in studying
human behaviour and the social world inhabited by human beings. They found increasing
difficulty in trying to explain human behaviour in simply measurable terms. Measurements
tell us how often or how many people behave in a certain way but they do not adequately
answer the “why” and “how” questions. Research which attempts to increase our
understanding of why things are the way they are in our social world and why people act the
ways they do is “qualitative” research. Qualitative research is concerned with developing
explanations of social phenomena. That is to say, it aims to help us to understand the world in
which we live and why things are the way they are. It is concerned with the social aspects of
our world and seeks to answer questions about:

• Why people behave the way they do


• How opinions and attitudes are formed
• How people are affected by the events that go on around them
• How and why cultures have developed in the way they have

Qualitative research is concerned with finding the answers to questions which begin with:
why? How? In what way? Quantitative research, on the other hand, is more concerned with
questions about: how much? How many? How often? To what extent? etc.

Public health problems are complex, not only because of their multicausality but also as a
result of new and emerging domestic and international health problems. Social, economic,
political, ethnic, environmental, and genetic factors all are associated with today’s public
health concerns. Consequently, public health practitioners and researchers recognize the need
for multiple approaches to understanding problems and developing effective interventions that
address contemporary public health issues. Qualitative methods fill a gap in the public health
toolbox; they help us understand behaviors, attitudes, perceptions, and culture in a way that
quantitative methods alone cannot. For all these reasons, qualitative methods are getting
renewed attention and gaining new respect in public health.

A thorough description of qualitative research is beyond the scope of this lecture note.
Students interested in knowing more about qualitative methods can consult other books
written primarily for that purpose. The main purpose of this lecture note is to give a
detailed account on the principles of quantitative research.

EMPIRICAL AND THEORETICAL RESEARCH


The philosophical approach to research is basically of two types: empirical and theoretical.
Health research mainly follows the empirical approach, i.e. it is based upon observation and
experience more than upon theory and abstraction. Epidemiological research, for example,
depends upon the systematic collection of observations on the health related phenomena of
interest in defined populations. Moreover, even in abstraction with mathematical models,
advances in understanding of disease occurrence and causation cannot be made without a
comparison of the theoretical constructs with that which we actually observe in populations.
Empirical and theoretical research complement each other in developing an understanding of
the phenomena, in predicting future events, and in the prevention of events harmful to the
general welfare of the population of interest. Empirical research in the health sciences can be
qualitative or quantitative in nature. Generally, health science research deals with information
of a quantitative nature, and this manual deals exclusively with this type of research. For the
most part, this involves the identification of the population of interest, the characteristics
(variables) of the individuals (units) in the population, and the study of the variability of these
characteristics among the individuals in the population.

Thus the quantification in empirical research is achieved by three related numerical procedures:
(a) measurement of variables;
(b) estimation of population parameters (parameters of the probability distribution that
captures the variability of observations in the population); and
(c) statistical testing of hypotheses, or estimating the extent to which ‘chance’ alone may
account for the variation among the individuals or groups under observation.

Taking chance, or probability, into account is absolutely critical to biological research, and is
the substance of research design. Research design, above all else, must account for and
maintain the role of chance in order to ensure validity. It is statistical methods that preserve
the laws of probability in our inquiry, and allow proper analysis and interpretation of results.
Statistics are the tool that permits health research to be empirical rather than abstract; they
allow us to confirm our findings by further observation and experiment.
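As a minimal illustration of step (c), the sketch below runs a two-sample t-test on invented measurements; the group names, values and the choice of test are assumptions made for illustration only and are not part of the original text.

```python
# Minimal sketch: quantifying the role of chance with a two-sample t-test.
# All measurements below are invented purely for illustration.
import numpy as np
from scipy import stats

# Serum enzyme activity (arbitrary units) in two small groups
treated = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4])
control = np.array([4.6, 4.4, 4.9, 4.7, 4.5, 4.8])

# Estimate the population parameters from the samples
print("treated mean:", treated.mean(), "control mean:", control.mean())

# Test the hypothesis that the two population means are equal; the p-value
# estimates how readily chance alone could account for the observed difference
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```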
Basic and applied
Research can be functionally divided into basic (or pure) research and applied research. Basic
research is usually considered to involve a search for knowledge without a defined goal of
utility or specific purpose. Applied research is problem-oriented, and is directed towards the
solution of an existing problem. There is continuing controversy over the relative benefits and
merits to society of basic and applied research. Some claim that science, which depends
greatly on society for its support, should address itself directly to the solution of the relevant
problems of man, while others argue that scientific inquiry is most productive when freely
undertaken, and that the greatest advances in science have resulted from pure research. It is
generally recognized that there needs to be a healthy balance between the two types of
research, with the more affluent and technologically advanced societies able to support a
greater proportion of basic research than those with fewer resources to spare.

Health research triangle


Yet another way of classifying health research, be it empirical or theoretical, basic or applied,
is to describe it under three operational interlinked categories of biomedical, health services
and behavioural research, the so-called health research triangle. Biomedical research deals
primarily with basic research involving processes at the cellular level; health services research deals
with issues in the environment surrounding man, which promote changes at the cellular level;
and behavioural research deals with the interaction of man and the environment in a manner
reflecting the beliefs, attitudes and practices of the individual in society.

Scientific foundations of research


Several fundamental principles are used in scientific inquiry:

1. Order
The scientific method differs from ‘common sense’ in arriving at conclusions by employing
an organized observation of entities or events which are classified or ordered on the basis of
common properties and behaviours. It is this commonality of properties and behaviours that
allows predictions, which, carried to the ultimate, become laws.

2. Inference and chance


Reasoning, or inference, is the driving force of advances in research. In terms of logic, it means that a
statement or conclusion ought to be accepted because one or more other statements or
premises (evidence) are true. Inferential suppositions, presumptions or theories may be so
developed, through careful construction, as to pose testable hypotheses. The testing of
hypotheses is the basic method of advancing knowledge in science.

Two distinct approaches or arguments have evolved in the development of inferences:
deductive and inductive. In deduction, the conclusion necessarily follows from the premises,
as in syllogism (all A is B, all B is C, therefore all A is C) or in algebraic equations.
Deduction can be distinguished by the fact that it moves from the general to the specific, and
does not allow for the element of chance or uncertainty. Deductive inferences, therefore, are
suited to theoretical research. Health research, being primarily empirical, depends almost
entirely upon inductive reasoning. The conclusion does not necessarily follow from the
premises or evidence (facts). We can say only that the conclusion is more likely to be valid if
the premises are true, i.e. there is a possibility that the premises may be true but the
conclusions false. Chance must, therefore, be fully accounted for. Further, inductive reasoning
is distinguished by the fact that it moves from the specific to the general – it builds.
3. Evaluation of probability
The critical requirement in the design of research, the one that ensures validity, is the
evaluation of probability from beginning to end. The most salient elements of design, which
are meant to ensure the integrity of probability and the prevention of bias, are: representative
sampling, randomization in the selection of study groups, maintenance of comparison groups
as controls, blinding of experiments and subjects, and the use of probability (statistical)
methods in the analysis and interpretation of outcome. Probability is a measure of the
uncertainty or variability of the characteristic among individuals in the population. If the
entire population is observed, the calculation of the relative frequencies of the variables
provides all the information about the variability. If only a sample of individuals in the
population is observed, the inference from the sample to the population (specific to the
general) will involve the identification of the probabilities of the events being observed, as
well as the laws of probability that allow us to measure the amount of uncertainty in our
inferences. These objectives can be achieved only by the proper design of research which
incorporates the laws of probability.

4. Hypothesis
Hypotheses are carefully constructed statements about a phenomenon in the population. The
hypotheses may have been generated by deductive reasoning, or based on inductive reasoning
from prior observations. One of the most useful tools of health research is the generation of
hypotheses which, when tested, will lead to the identification of the most likely causes of
disease or changes in the condition being observed. Although we cannot draw definite
conclusions, or claim proof using the inductive method, we can come ever closer to the truth
by knocking down existing hypotheses and replacing them with ones of greater plausibility.

In health research, hypotheses are often constructed and tested to identify causes of disease
and to explain the distribution of disease in populations. Mill’s canons of inductive reasoning
are frequently utilized in the forming of hypotheses which relate association and causation.
Briefly stated, these methods include:
(a) method of difference – when the frequency of a disease is markedly dissimilar under two
circumstances, and a factor can be identified in one circumstance and not the other, this
factor, or its absence, may be the cause of the disease (for example, the difference in
frequency of lung cancer in smokers and nonsmokers);
(b) method of agreement – if a factor, or its absence is common to a number of different
circumstances that are found to be associated with the presence of a disease, that factor, or its
absence may be causally associated with the disease (e.g. the occurrence of hepatitis A is
associated with patient contact, crowding and poor sanitation and hygiene, each conducive to
the transmission of the hepatitis virus);
(c) the method of concomitant variation, or the dose response effect – the increasing
expression of endemic goitre with decreasing levels of iodine in the diet, the increasing
frequency of leukaemia with increasing radiation exposure, the increase in prevalence of
elephantiasis in areas of increasing filarial endemicity, are each examples of this concomitant
variation;
(d) the method of analogy – the distribution and frequency of a disease or effect may be
similar enough to that of some other disease to suggest commonality in cause (e.g. hepatitis B
virus infection and cancer of the liver).

DIFFERENT TYPES OF RESEARCH STUDIES

1. CASE CONTROL STUDY


A case control study is a method extensively used by the medical profession, as an easy and
quick way of comparing treatments, or investigating the causes of disease. Longitudinal
studies are the preferred method, but are often expensive, time consuming and difficult.
Whilst this method does suffer from some weaknesses, it is inexpensive and delivers results
quickly.

The case control study uses groups of patients stricken with a disease, and compares them
with a control group of patients not suffering symptoms. Medical records and interviews are
used to try to build up a historical picture of the patient's life, allowing cross-reference
between patients and statistical analysis. Any trends can then be highlighted and action can be
taken.

Statistical analysis allows the researcher to draw a conclusion about whether a certain
situation or exposure led to the medical condition. For example, a scientist could compare a
group of coal miners suffering from lung cancer with those clear of the disease, and try to
establish the underlying cause. If the majority of the cases arose in collieries owned by one
company, it might indicate that the company's safety equipment and procedures were at fault.

Possibly the most famous case control study using this method was a study into whether
bicycle helmets reduce the chance of cyclists receiving bad head injuries in an accident.
Obviously, the researcher could not use standard experimentation and compare a control
group of non-helmet wearers with helmet wearers, measuring the chances of head injury, as
this would be unethical. A case control study was utilized, and the researchers looked at
medical records, comparing the number of head injury sufferers wearing helmets against those
without. This generated a statistical result, showing that wearing a cycle helmet made it 88%
less likely that head injury would be suffered in an accident. The main weakness of the case
control study is that it is very poor at determining cause and effect relationships.
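The statistical analysis in a case control study is commonly summarized as an odds ratio computed from a 2×2 exposure-by-outcome table. The sketch below is a minimal illustration using invented counts, not the actual figures from the helmet study.

```python
# Minimal sketch: odds ratio from a 2x2 table in a case-control study.
# All counts are invented for illustration only.

exposed_cases, unexposed_cases = 40, 60        # e.g. helmeted vs unhelmeted head-injury cases
exposed_controls, unexposed_controls = 80, 20  # cyclists in accidents without head injury

# Odds of exposure among cases divided by odds of exposure among controls
odds_ratio = (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)
print(f"odds ratio = {odds_ratio:.2f}")  # values well below 1 suggest a protective exposure
```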

In the cycle helmet example, it could be argued that a cyclist who bothered wearing a helmet
may well have been a safer cyclist anyway, and less likely to have accidents. Evidence
showed that children wearing helmets were more likely to be from a more affluent class, more
used to cycling through parks than city streets. The study also showed that helmets were of
little use to adults. Whilst most agree that cycle helmets are probably a good thing for
children, there is not enough evidence to suggest that they should be mandatory for adults
outside extreme cycling. These problems serve as a warning that the results of any case
control study should not be relied upon, instead acting as a guide, possibly allowing deeper
and more rigorous methods to be utilized.

2. OBSERVATIONAL STUDIES
This type of research draws a conclusion by comparing subjects against a control group, in
cases where the researcher has no control over the experiment. A research study comparing
the risk of developing lung cancer, between smokers and non-smokers, would be a good
example of an observational study. The main reason for performing observational
research is ethical. With the smoking example, a scientist cannot give
cigarettes to non-smokers for 20 years and compare them with a control group. This also
brings up the other good reason for such studies, in that few researchers can study the long-
term effects of certain variables, especially when it runs into decades. For this study of long-
term and subtle effects, they have to use pre-existing conditions and medical records. The
researcher may want to study an extremely small sample group, so it is easier to start with
known cases and work backwards. The thalidomide cases are an example of an
observational study where researchers had to work backwards and establish that the drug was
the cause of disabilities.

The main problem with observational studies is that the experimenter has no control over the
composition of the control groups, and cannot randomize the allocation of subjects. This can
create bias, and can also mask cause and effect relationships or, alternatively, suggest
correlations where there are none (error in research). For example, in the smoking example, if
the researcher found that there is a correlation between smoking and increased rates of lung
cancer, without knowing the full and complete background of the subjects, there is no way of
determining whether other factors were involved, such as diet, occupation or genetics.
Randomization is assumed to even out external causal effects, but this is impossible in an
observational study. There is no independent variable, so it is dangerous to assume cause and
effect relationships, a process often misunderstood by the mass media lauding the next
wonder food, or sensationalizing a political debate with unfounded results and pseudo-
science. Despite the limitations, an observational study allows a useful insight into a
phenomenon, and sidesteps the ethical and practical difficulties of setting up a large and
cumbersome medical research project.

3. COHORT STUDY
A cohort study is a research program investigating a particular group with a certain trait,
observing it over a period of time. Some examples of cohorts may be people who have taken a
certain medication, or have a medical condition. Outside medicine, it may be a population of
animals that has lived near a certain pollutant or a sociological study of poverty. A cohort
study can delve even further and divide a cohort into sub-groups, for example, a cohort of
smokers could be sub-divided, with one group suffering from obesity. In this respect, a cohort
study is often interchangeable with the term naturalistic observation. There are two main sub-
types of cohort study, the retrospective and the prospective cohort study. The major difference
between the two is that the retrospective looks at phenomena that have already happened,
whilst the prospective type starts from the present.

3a. RETROSPECTIVE COHORT STUDY


The retrospective cohort study is historical in nature. Whilst still beginning with the division
into cohorts, the researcher looks at historical data to judge the effects of the variable. For
example, it might compare the incidence of bowel cancer over time in vegetarians and meat
eaters, by comparing the medical histories. It is a lot easier than the prospective, but there is
no control, and confounding variables can be a problem, as the researcher cannot easily assess
the lifestyle of the subject. A retrospective study is a very cheap and effective way of studying
health risks or the effects of exposure to pollutants and toxins. It gives results quickly, at the
cost of validity, because it is impossible to eliminate all of the potentially confounding
variables from historical records and interviews alone.

3b. PROSPECTIVE COHORT STUDY


In a prospective cohort study, the effects of a certain variable are plotted over time, and the
study becomes an ongoing process. To maintain validity, all of the subjects must be initially
free of the condition tested for. For example, an investigation, over time, into the effects of
smoking upon lung cancer must ensure that all of the subjects are free of the disease. It is also
possible to subgroup and try to control variables, such as weight, occupation type or social
status. They are preferable to a retrospective study, but they usually require a
long period of time to generate useful results, so are very expensive and difficult. Some
studies have been running for decades, but are generating excellent data about underlying
trends in a population. The prospective cohort study is a great way to study long-term trends,
allowing the researcher to measure any potential confounding variables, but the potential cost
of error is high, so pilot studies are often used to ensure that the study runs smoothly.
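Because a prospective cohort follows initially disease-free groups forward in time, its results are often summarized as the incidence in the exposed group relative to the unexposed group (a risk ratio). The sketch below is a minimal illustration with invented counts.

```python
# Minimal sketch: risk (incidence) ratio in a prospective cohort study.
# All counts are invented for illustration only.

smokers_total, smokers_cases = 1000, 30        # new cases observed over the follow-up period
nonsmokers_total, nonsmokers_cases = 1000, 5

risk_exposed = smokers_cases / smokers_total
risk_unexposed = nonsmokers_cases / nonsmokers_total
risk_ratio = risk_exposed / risk_unexposed

print(f"incidence (exposed) = {risk_exposed:.3f}, "
      f"incidence (unexposed) = {risk_unexposed:.3f}, "
      f"risk ratio = {risk_ratio:.1f}")
```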

Other types of cohort study are:


3c. AMBIDIRECTIONAL COHORT STUDY
The ambidirectional cohort study is the ultimate method, combining retrospective and
prospective aspects. The researcher studies and analyzes the previous history of the cohorts
and then continues the research in a prospective manner. This gives the most accurate results,
but is an extremely arduous undertaking, costing time and a great deal of money. The
ambidirectional study shares one major drawback with the prospective study, in that it is
impossible to guarantee that any data can be followed up, as participants may decline to
participate or die prematurely. These studies need to look at very large samples to ensure that
any losses to attrition can be absorbed by the statistics.

4. LONGITUDINAL STUDY
A longitudinal study is observational research performed over a period of years or even
decades. Longitudinal studies allow social scientists and economists to study long-term
effects in a human population. A cohort study is a subset of the longitudinal study because it
observes the effect on a specific group of people over time. Quite often, a longitudinal study
is an extended case study, observing individuals over long periods, and is a purely qualitative
undertaking. The lack of quantitative data means that any observations are speculative, as
with many case studies, but they allow a unique and valuable perspective on some aspects of
human culture and sociology.

The 'UP' Series


The groundbreaking television documentary 'UP' is probably the most famous example of a
long-term longitudinal study, a case study of a group of British people from birth. The
original producer, Michael Apted, proposed the hypothesis that children born into a certain
social class would remain entrenched in that class throughout their life. In 1967, he selected
children from the rich, poor and middle classes, and proceeded to interview and film them
every seven years. The highly acclaimed series is still running, with the next set of interviews
to be performed in 2011/2012, and it has provided a unique insight into the development of
British culture since the 1960s. Even this series highlights one of the major flaws of a
longitudinal study, the problem that there can be no retesting or restart. Apted, with hindsight,
wished that he had used more female subjects, showing the importance of the initial planning
stage of a longitudinal study. Once a course of action is decided, the clock cannot be turned
back, and the results must stand as tested.

5. A CROSS SECTIONAL STUDY


The cross sectional study looks at a different aspect than the standard longitudinal study. The
longitudinal study uses time as the main variable, and tries to make an in depth study of how a
small sample changes and fluctuates over time. A cross sectional study, on the other hand,
takes a snapshot of a population at a certain time, allowing conclusions about phenomena
across a wide population to be drawn. An example of a cross-sectional study would be a
medical study looking at the prevalence of breast cancer in a population. The researcher can
look at a wide range of ages, ethnicities and social backgrounds. If a significant number of
women from a certain social background are found to have the disease, then the researcher
can investigate further. This is a relatively easy way to perform a preliminary experiment,
allowing the researcher to focus on certain population groups and understand the wider
picture. Of course, researchers often use both methods, using a cross section to take the
snapshot and isolate potential areas of interest, and then conducting a longitudinal study to
find the reason behind the trend. This is called panel data, or time series cross-sectional data,
but is generally a complicated and expensive type of research, notoriously difficult to analyze.
Such programs are rare, but can give excellent data, allowing a long-term picture of
phenomena to be ascertained.
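The snapshot taken by a cross sectional study is usually summarized as prevalence, the proportion of people with the condition at that moment, often broken down by subgroup. The sketch below is a minimal illustration using hypothetical records.

```python
# Minimal sketch: prevalence by subgroup from a cross-sectional snapshot.
# The records below are hypothetical.
from collections import defaultdict

records = [  # (social background, has condition)
    ("group A", True), ("group A", False), ("group A", False),
    ("group B", True), ("group B", True), ("group B", False),
]

counts = defaultdict(lambda: [0, 0])  # subgroup -> [cases, people surveyed]
for group, has_condition in records:
    counts[group][1] += 1
    if has_condition:
        counts[group][0] += 1

for group, (cases, total) in counts.items():
    print(f"{group}: prevalence = {cases / total:.2f}")
```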

6. A CORRELATIONAL STUDY
A correlational study determines whether or not two variables are correlated. This means to
study whether an increase or decrease in one variable corresponds to an increase or decrease
in the other variable. It is very important to note that correlation doesn't imply causation.
We'll come back to this later.
Types
There are three types of correlations that are identified:
1. Positive correlation: Positive correlation between two variables is when an increase
in one variable is accompanied by an increase in the other, and a decrease in one by a decrease in
the other. For example, the amount of money that a person possesses might correlate
positively with the number of cars he owns.
2. Negative correlation: Negative correlation is when an increase in one variable is
accompanied by a decrease in the other, and vice versa. For example, the level of education might correlate
negatively with crime: where education levels are higher, crime rates tend to
be lower. Note that this doesn't mean that a lack of education causes
crime. It could be, for example, that both lack of education and crime have a common reason:
poverty.
3. No correlation: Two variables are uncorrelated when a change in one is not
accompanied by any systematic change in the other. For example, among millionaires, happiness is found to be
uncorrelated with money: within this group, having more money is not associated with being happier.
A correlation coefficient is usually used during a correlational study. It varies between +1 and
-1. A value close to +1 indicates a strong positive correlation while a value close to -1
indicates strong negative correlation. A value near zero shows that the variables are
uncorrelated.
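The correlation coefficient described above can be computed directly. The sketch below uses Pearson's r on invented income and car-ownership figures; the variables and values are assumptions made purely for illustration.

```python
# Minimal sketch: Pearson correlation coefficient on invented data.
import numpy as np
from scipy import stats

income = np.array([20, 35, 50, 65, 80, 95])   # arbitrary units, invented
cars_owned = np.array([1, 1, 2, 2, 3, 4])

r, p_value = stats.pearsonr(income, cars_owned)
print(f"r = {r:.2f} (close to +1: strong positive correlation), p = {p_value:.3f}")
```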
Limitations
It is very important to remember that correlation doesn't imply causation and there is no way
to determine or prove causation from a correlational study. This is a common mistake made
by people in almost all spheres of life.

For example, a US politician speaking out against free lunches to poor kids at school argues -
“You show me the school that has the highest free and reduced lunch, and I'll show you the
worst test scores, folks” (nymag.com). This is a correlation he is speaking about - one cannot
imply causation. The obvious explanation is a common underlying cause, poverty: children from
families too poor to pay for school lunches also tend not to have the best test scores.

TYPES OF EXPERIMENTAL DESIGN

A. SEMI-EXPERIMENTAL DESIGN
FIELD EXPERIMENT
For geologists, social scientists and environmental biologists, amongst others, field
experiments are an integral part of the discipline. As the name suggests, a field study is an
experiment performed outside the laboratory, in the 'real' world. Unlike case studies and
observational studies, a field experiment still follows all of the steps of the scientific process,
addressing research problems and generating hypotheses. The obvious advantage of a field
study is that it is practical and also allows experimentation, without artificially introducing
confounding variables. A population biologist examining an ecosystem could not move the
entire environment into the laboratory, so field experiments are the only realistic research
method in many fields of science.

In addition, they circumvent the accusation leveled at laboratory experiments of lacking
external or ecological validity, or adversely affecting the behavior of the subject. Social
scientists and psychologists often used field experiments to perform blind studies, where the
subject was not even aware that they were under scrutiny. A good example of this is the
Piliavin and Piliavin experiment, where the propensity of strangers to help blood covered
'victims' was measured. This is now frowned upon, under the policy of informed consent, and
is only used in rare and highly regulated circumstances. Field experiments can suffer from a
lack of a discrete control group and often have many variables to try to eliminate. For
example, if the effects of a medicine are studied, and the subject is instructed not to drink
alcohol, there is no guarantee that the subject followed the instructions, so field studies often
sacrifice internal validity for external validity. For fields like biology, geology and
environmental science, this is not a problem, and the field experiment can be treated as a
sound experimental practice, following the steps of the scientific method. A major concern
shared by all disciplines is the cost of field studies, as they tend to be very expensive. For
example, even a modestly sized research ship costs many thousands of dollars every day, so a
long oceanographical research program can run into the millions of dollars. Pilot studies are
often used to test the feasibility of any long term or extensive research program before
committing vast amounts of funds and resources. The changeable nature of the external
environment and the often-prohibitive investment of time and money mean that field
experiments are rarely replicable, so any generalization is always tenuous.

B. QUASI-EXPERIMENTAL DESIGN
Quasi-experimental design is a form of experimental research used extensively in the social
sciences and psychology. Whilst regarded as unscientific and unreliable by physical and
biological scientists, it is, nevertheless, a very useful approach for measuring
social variables. The inherent weaknesses in the methodology do not undermine
the validity of the data, as long as they are recognized and allowed for during the
whole experimental process. Quasi experiments resemble quantitative and qualitative
experiments, but lack random allocation of groups or proper controls, so firm statistical
analysis can be very difficult.

DESIGN
Quasi-experimental design involves selecting groups, upon which a variable is tested, without
any random pre-selection processes. For example, to perform an educational experiment, a
class might be arbitrarily divided by alphabetical selection or by seating arrangement. The
division is often convenient and, especially in an educational situation, causes as little
disruption as possible. After this selection, the experiment proceeds in a very similar way to
any other experiment, with a variable being compared between different groups, or over a
period of time.

ADVANTAGES
Especially in social sciences, where pre-selection and randomization of groups is often
difficult, they can be very useful in generating results for general trends. For example, consider the
effect of maternal alcohol use during pregnancy; we know that alcohol harms
embryos. A strict experimental design would require that mothers be randomly assigned to
drink alcohol, which would be unethical and illegal because of the possible harm the study might do to
the embryos. So what researchers do instead is ask mothers how much alcohol they used during
pregnancy and then assign them to groups. Quasi-experimental design is often integrated with
individual case studies; the figures and results generated often reinforce the findings in a case
study, and allow some sort of statistical analysis to take place. In addition, without extensive
pre-screening and randomization needing to be undertaken, they do reduce the time and
resources needed for experimentation.

DISADVANTAGES
Without proper randomization, statistical tests can be meaningless. For example, these
experimental designs do not take into account any pre-existing factors (as for the mothers:
what made them drink or not drink alcohol), or recognize that influences outside the
experiment may have affected the results. A quasi experiment constructed to analyze the
effects of different educational programs on two groups of children, for example, might
generate results that show that one program is more effective than the other. These results will
not stand up to rigorous statistical scrutiny because the researcher also needs to control other
factors that may have affected the results, which is very hard to do properly. One group of
children may have been slightly more intelligent or motivated. Without some form of pre-
testing or random selection, it is hard to judge the influence of such factors.

CONCLUSION
Disadvantages aside, as long as the shortcomings of the quasi-experimental design are
recognized, these studies can be a very powerful tool, especially in situations where ‘true’
experiments are not possible. They are a very good way to obtain a general overview and then
follow up with a case study or quantitative experiment, to focus on the underlying reasons for
the results generated.

IDENTICAL TWIN STUDY


The identical twins study has been used for a long time, to study the effects of environment
and genetics on human development. Some studies have tried to determine how genetics and
environmental factors contribute to intelligence, aggression or substance addictions. Most
twin studies compare identical twins, having 100% genetic similarity, with non-identical
twins, with about 50% genetic similarity. The researcher compares the occurrence of an
individual trait between identical and fraternal twins. If the identical twins show more
similarity for this trait than the non-identical twins, then the excess is assumed to be down to
genetic factors.

This type of analysis would then allow the researchers to estimate the heritability of specific
traits and quantify the effect of genetic factors on the individual trait. Psychologists have long
known that a twin study is not a true experimental design, but it has led to some interesting
insights into the influence of genes on human behavior. For this method, a number of
assumptions have to be made; that the identical twins share identical DNA profiles, and that
the environmental factors are the same for all participants.
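One classical way to turn this twin comparison into a heritability estimate is Falconer's formula, which takes heritability as roughly twice the difference between the identical-twin and fraternal-twin correlations. The sketch below uses invented correlations and is only an illustration of the arithmetic, not a statement about any real trait.

```python
# Minimal sketch: heritability estimate from twin correlations (Falconer's formula).
# The correlations are invented for illustration only.

r_identical = 0.80   # trait correlation among identical (monozygotic) twin pairs
r_fraternal = 0.50   # trait correlation among fraternal (dizygotic) twin pairs

heritability = 2 * (r_identical - r_fraternal)   # h^2
shared_environment = r_identical - heritability  # c^2
unique_environment = 1 - r_identical             # e^2

print(f"h2 = {heritability:.2f}, c2 = {shared_environment:.2f}, e2 = {unique_environment:.2f}")
```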

Criticisms
There have been a number of criticisms of identical twin studies over the years. By their nature, and
because of small sample sizes, it is very difficult to quantitatively analyze the results and so
all experimentation tends to be observational; the sample groups cannot be random so
statistical analysis is impossible. The experimental methods assume that there is little
difference in the environmental factors between fraternal and identical twins, but there is a
criticism that the tendency of adults to treat identical twins in exactly the same way makes
this assumption invalid. Parents tend to dress identical twins the same way and encourage
them to pursue the same interests. The distinction between environmental factors and genetic
influences may not be as black and white as the identical twins study assumes. There is
probably an interaction between genes and environment and so the whole picture may be a lot
more complex. In addition, the experiment tends to assume that one gene affects one
behavioral trait. Modern genetic research is showing that many different genes can influence
behavior.

Summary
The above criticisms all have some validity, but the main point is that twin studies have never
claimed to be anything other than observational, identifying and trying to explain trends rather
than prove a hypothesis. Whilst there are some concerns about the validity of the identical
twins study, such experiments are certainly better than performing no research at all. Twins
studies are now trying to analyze the environmental factors more. Instead of assuming that the
environmental factors are the same, they are now contrasting shared family environment with
the individual events suffered by the individual twin.

In addition, the identical twins study is constantly evolving into more complex forms, now taking
into account whole families and other siblings in addition to the twins. Research into the
human genome is now resurrecting the study of twins; hereditary trends observed in an
identical twins study can now be studied quantitatively in the laboratory. It is now standard
practice, when conducting twin research, to analyze DNA from all participants, and this is
bypassing many of the concerns about the twin study.

C. EXPERIMENTAL DESIGNS

TRUE EXPERIMENTAL DESIGN


True experimental design is regarded as the most accurate form of experimental research, in
that it tries to prove or disprove a hypothesis mathematically, with statistical analysis. For
some of the physical sciences, such as physics, chemistry and geology, they are standard and
commonly used. For social sciences, psychology and biology, they can be a little more
difficult to set up.

For an experiment to be classed as a true experimental design, it must fit all of the following
criteria.
• The sample groups must be assigned randomly.
• There must be a viable control group.
• Only one variable can be manipulated and tested. It is possible to test more than one,
but such experiments and their statistical analysis tend to be cumbersome and difficult.
• The tested subjects must be randomly assigned to either control or experimental
groups.
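The random-assignment requirement listed above can be implemented very simply; the sketch below uses Python's standard library, with hypothetical subject identifiers and group sizes.

```python
# Minimal sketch: randomly assigning subjects to control and experimental groups.
import random

subjects = [f"subject_{i:02d}" for i in range(1, 21)]  # hypothetical identifiers

random.seed(42)           # fixed seed only so the example is reproducible
random.shuffle(subjects)  # random order removes any systematic allocation

half = len(subjects) // 2
experimental_group = subjects[:half]
control_group = subjects[half:]

print("experimental:", experimental_group)
print("control:     ", control_group)
```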

Advantages
The results of a true experimental design can be statistically analyzed and so there can be little
argument about the results. It is also much easier for other researchers to replicate the
experiment and validate the results. For physical sciences working with mainly numerical
data, it is much easier to manipulate one variable, so true experimental design usually gives a
yes or no answer.

Disadvantages
Whilst perfect in principle, there are a number of problems with this type of design. Firstly,
they can be almost too perfect, with the conditions being under complete control and not
being representative of real world conditions. For psychologists and behavioral biologists, for
example, there can never be any guarantee that a human or living organism will exhibit
‘normal’ behavior under experimental conditions. True experiments can be too accurate and it
is very difficult to obtain a complete rejection or acceptance of a hypothesis because the
standards of proof required are so difficult to reach. True experiments are also difficult and
expensive to set up. They can also be very impractical. In some fields, like physics, there are
not as many variables, so the design is easy; in the social and biological
sciences, where variations are not so clearly defined, it is much more difficult to exclude other
factors that may be affecting the manipulated variable.

Summary
True experimental design is an integral part of science, usually acting as a final test of a
hypothesis. Whilst they can be cumbersome and expensive to set up, literature reviews,
qualitative research and descriptive research can serve as a good precursor to generate a
testable hypothesis, saving time and money. Whilst they can be a little artificial and
restrictive, they are the only type of research that is accepted by all disciplines as statistically
provable.

DOUBLE BLIND EXPERIMENT

A double blind experiment is an experimental method used to ensure impartiality, and avoid
errors arising from bias. It is very easy for a researcher, even subconsciously, to influence
experimental observations, especially in behavioral science, so this method provides an extra
check. For example, imagine that a company is asking consumers for opinions about its
products, using a survey. There is a distinct danger that the interviewer may subconsciously
emphasize the company's products when asking the questions. This is the major reason why
market research companies generally prefer to use computers, and double blind experiments,
for gathering important data.

The Blind Experiment


The blind experiment is the minimum standard for any test involving subjects and opinions,
and failure to adhere to this principle may result in experimental flaws. The idea is that the
groups studied, including the control, should not be aware of the group in which they are
placed. In medicine, when researchers are testing a new medicine, they ensure that the
placebo looks, and tastes, the same as the actual medicine. There is strong evidence of a
placebo effect with medicine, where, if people believe that they are receiving a medicine, they
show some signs of improvement in health. A blind experiment reduces the risk of bias from
this effect, giving an honest baseline for the research, and allowing a realistic statistical
comparison. Ideally, the subjects would not be told that a placebo was being used at all, but
this is regarded as unethical.

The Double Blind Experiment


The double blind experiment takes this precaution against bias one step further, by ensuring
that the researcher does not know in which group a patient falls. Whilst the vast majority of
researchers are professionals, there is always a chance that the researcher might
subconsciously tip off a patient about the pill they were receiving. They may even favor
giving the pill to patients that they thought had the best chance of recovery, skewing the
results. Whilst nobody likes to think of scientists as dishonest, there is often pressure, from
billion dollar drug companies and the fight for research grants, to generate positive results.
This always gives a chance that a scientist might manipulate results, and try to show the
research in a better light. Proving that the researcher carried out a double blind experiment
reduces the chance of criticism.
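One common way to operationalize double blinding is to have a third party hold the key linking coded labels to treatment or placebo, so that neither researcher nor subject can tell which is being given. The sketch below is a minimal illustration under that assumption; the identifiers and labels are hypothetical.

```python
# Minimal sketch: coded allocation for a double blind trial.
# A third party keeps the key; researcher and subject see only opaque codes.
import random

subjects = [f"patient_{i:02d}" for i in range(1, 11)]  # hypothetical identifiers
random.seed(7)            # fixed seed only so the example is reproducible
random.shuffle(subjects)

half = len(subjects) // 2
allocation_key = {s: "treatment" for s in subjects[:half]}
allocation_key.update({s: "placebo" for s in subjects[half:]})

# What researcher and subject see during the trial: an opaque pack code per participant
blinded_labels = {s: f"pack_{i:03d}" for i, s in enumerate(subjects, start=1)}

print(blinded_labels)   # visible during data collection
print(allocation_key)   # held by the third party, revealed only after the trial ends
```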

Other Applications
Whilst better known in medicine, double blind experiments are often used in other fields.
Surveys, questionnaires and market research all use this technique to retain credibility. If you
wish to compare two different brands of washing powder, the samples should be in the same
packaging. A consumer might have an inbuilt brand identity awareness, and preference, which
will lead to favoritism and bias. An example of the weakness of single blind techniques is in
police line-ups, where a witness picks out a suspect from a group. Many legal experts are
advocating that these line-ups should be unsupervised, and unprompted. If the police are fixed
on bringing a particular subject to justice, they may consciously, or subconsciously, tip off the
witness. Humans are very good at understanding body language and unconscious cues, so the
chance of observer bias should be minimized.

REVIEWING OTHER RESEARCH


• Literature Review
• Meta-analysis
• Systematic Reviews

Literature Review
Many students are instructed, as part of their research program, to perform a literature review,
without always understanding what a literature review is. Most are aware that it is a process of
gathering information from other sources and documenting it, but few have any idea of how
to evaluate the information, or how to present it. A literature review can be a precursor in the
introduction of a research paper, or it can be an entire paper in itself, often the first stage of
large research projects, allowing the supervisor to ascertain that the student is on the correct
path. A literature review is a critical and in depth evaluation of previous research. It is a
summary and synopsis of a particular area of research, allowing anybody reading the paper to
establish why you are pursuing this particular research program. A good literature review
expands upon the reasons behind selecting a particular research question.

What Is a Literature Review Not?


It is not a chronological catalog of all of the sources, but an evaluation, integrating the
previous research together, and also explaining how it integrates into the proposed research
program. All sides of an argument must be clearly explained, to avoid bias, and areas of
agreement and disagreement should be highlighted. It is not a collection of quotes and
paraphrasing from other sources. A good literature review should also have some evaluation
of the quality and findings of the research. A good literature review should avoid the
temptation to overstate the importance of a particular research program. The fact that a
researcher is undertaking the research program speaks for its importance, and an educated
reader may well be insulted that they are not allowed to judge the importance for themselves.
They want to be re-assured that it is a serious paper, not a pseudo-scientific sales
advertisement.

Whilst some literature reviews can be presented in a chronological order, it is best avoided.
For example, a review of Victorian Age physics could present J.J. Thomson's famous
experiments in chronological order. Generally, however, this is perceived as being a little
lazy, and it is better to organize the review around ideas and individual points. As a general
rule, certainly for a longer review, each paragraph should address one point, and present and
evaluate all of the evidence, from all of the differing points of view.

Conducting a Literature Review


Evaluating the credibility of sources is one of the most difficult aspects, especially with the
ease of finding information on the internet. The only real way to evaluate is through
experience, but there are a few tricks for evaluating information quickly, yet accurately. There
is such a thing as ‘too much information,’ and Google does not distinguish or judge the
quality of results, only how search engine friendly a paper is. This is why it is still good
practice to begin research in an academic library. Any journals found there can be regarded as
safe and credible. The next stage is to use the internet, and this is where the difficulties start. It
is very difficult to judge the credibility of an online paper. The main thing is to structure the
internet research as if it were on paper. Bookmark papers which may be relevant in one
folder and make another subfolder for a ‘shortlist.’
• The easiest way is to scan the work, using the abstract and introduction as guides. This
helps to eliminate the non-relevant work and also some of the lower quality research. If it sets
off alarm bells, there may be something wrong, and the paper is probably of a low quality. Be
very careful not to fall into the trap of rejecting research just because it conflicts with your
hypothesis. Failure to do this will completely invalidate the literature review and potentially
undermine the research project. Any research that may be relevant should be moved to the
shortlist folder.
• The next stage is to critically evaluate the paper and decide if the research is of sufficient
quality. The temptation is to try to include as many sources as
possible, because it is easy to fall into the trap of thinking that a long bibliography equates to
a good paper. A smaller number of quality sources is far preferable to a long list of
irrelevant ones.
• Check the credentials of any source upon which you rely heavily for the literature
review. The reputation of the University or organization is a factor, as is the experience of the
researcher. If their name keeps cropping up, and they have written many papers, the source is
usually OK.
• Look for agreements. Good research should have been replicated by other independent
researchers, with similar results, showing that the information is usually fairly safe to use.
If the process is proving to be difficult, and in some fields, like medicine and environmental
research, there is a lot of poor science, do not be afraid to ask a supervisor for a few tips. They
should know some good and reputable sources to look at. It may be a little extra work for
them, but there will be even more work if they have to tear apart a review because it is built
upon shaky evidence.

Conducting a good literature review is a matter of experience, and even the best scientists
have fallen into the trap of using poor evidence. This is not a problem, and is part of the
scientific process; if a research program is well constructed, it will not affect the results.

Meta Analysis
Meta analysis is a statistical technique developed by social scientists, who are very limited in
the type of experiments they can perform. Social scientists have great difficulty in designing
and implementing true experiments, so meta-analysis gives them a quantitative tool for the
statistical analysis of data drawn from a number of studies performed over a period of time. Medicine
and psychology increasingly use this method, as a way of avoiding time-consuming and
intricate studies, largely repeating the work of previous research.

What is Meta-Analysis?
Social studies often use very small sample sizes, so any statistics used generally give results
containing large margins of error. This can be a major problem when interpreting and drawing
conclusions, because it can mask any underlying trends or correlations. Such conclusions are
only tenuous, at best, and leave the research open for criticism.
Meta-analysis is the process of drawing from a larger body of research, and using powerful
statistical analyses on the conglomerated data. This gives a much larger sample population
and is more likely to generate meaningful and usable data.
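A common way to pool the conglomerated data is inverse-variance weighting, in which each study's effect estimate is weighted by the inverse of its variance. The sketch below is a minimal fixed-effect illustration with invented study results.

```python
# Minimal sketch: fixed-effect meta-analysis by inverse-variance weighting.
# Effect sizes and standard errors are invented for illustration only.
import math

studies = [  # (effect size, standard error) from each hypothetical study
    (0.30, 0.15),
    (0.45, 0.20),
    (0.25, 0.10),
    (0.50, 0.25),
]

weights = [1 / se ** 2 for _, se in studies]
pooled_effect = sum(w * e for (e, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"pooled effect = {pooled_effect:.3f}, 95% CI = "
      f"({pooled_effect - 1.96 * pooled_se:.3f}, {pooled_effect + 1.96 * pooled_se:.3f})")
```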

The Advantages of Meta-Analysis
Meta-analysis is an excellent way of reducing the complexity and breadth of research,
allowing funds to be diverted elsewhere. For rare medical conditions, it allows researchers to
collect data from further afield than would be possible for one research group. As the method
becomes more common, database programs have made the process much easier, with
professionals working in parallel able to enter their results and access the data. This allows
constant quality assessment and also reduces the chances of unnecessary repeat research, as
papers can often take many months to be published, and the computer records ensure that any
researcher is aware of the latest directions and results. The field of meta study is also a lot
more rigorous than the traditional literature review, which often relies heavily upon the
individual interpretation of the researcher. When used with the databases, a meta study allows
a much wider net to be cast than by the traditional literature review, and is excellent for
highlighting correlations and links between studies that may not be readily apparent as well as
ensuring that the compiler does not subconsciously infer correlations that do not exist.

The Disadvantages of Meta-Analysis


There are a number of disadvantages to meta-analysis, of which a researcher must be aware
before relying upon the data and generated statistics. The main problem is that there is the
potential for publication bias and skewed data. Research generating negative or inconclusive
results tends to remain unpublished, or risks not being entered into the database. If the
meta study is restricted to the research with positive results, then the validity is compromised.
The researcher compiling the data must make sure that all research is quantitative, rather than
qualitative, and that the data is comparable across the various research programs, allowing a
genuine statistical analysis. It is important to pre-select the studies, ensuring that all of the
research used is of a sufficient quality to be used. One erroneous or poorly conducted study
can place the results of the entire meta-analysis at risk. On the other hand, setting almost
unattainable standards and criteria for inclusion can leave the meta study with too small a
sample size to be statistically relevant.

Striking a balance can be a little tricky, but the whole field is in a state of constant
development, incorporating protocols similar to the scientific method used for normal
quantitative research. Finding the data is rapidly becoming the real key, with skilled meta-
analysts developing a set of library-based skills: finding information buried in
government reports and conference data, and developing the knack of assessing the quality of
sources quickly and effectively.

Conclusions and the Future


Meta-analysis is here to stay, as an invaluable tool for research, and is rapidly gaining
momentum as a stand-alone discipline, with practitioners straddling the divide between
statisticians and librarians. The conveniences, as long as the disadvantages are taken into
account, are too apparent to ignore, and a meta study can reduce the need for long, expensive
and potentially intrusive repeated research studies.

Systematic reviews
Heavily used by the healthcare sector, systematic reviews are a powerful way of isolating and
critically evaluating previous research. Modern medical research generates so much literature,
and fills so many journals, that a traditional literature review could take months, and still be
out of date by the time that the research is designed and performed. In addition, researchers
are often guilty of selecting the research best fitting their pre-conceived notions, a weakness
of the traditional 'narrative' literature review process. To help medical professionals, specialist
compilers assess and condense the research, entering it into easily accessible research
databases. They are an integral part of the research process, and every student of medicine
routinely receives a long and extensive training in the best methods for critically evaluating
literature.

Systematic Reviews - Addressing the Deficiencies in Narration


The problems with narrative literature came to light a couple of decades ago, when critics
realized that reviewers looking at the same body of evidence often generated completely
different findings. They drew conclusions based upon their specialty, rather than the
compelling evidence contained within the body of research. It is unclear whether this was a
case of conscious or subconscious manipulation (bias), but this particular finding was
worrying, especially in a research area where life and death could be at stake. To address this
issue, medical authorities developed a new protocol of systematic reviewing, based upon a
structure as strict as the scientific method governing empirical research programs.

The Protocols Underpinning Systematic Reviews


 Define a research question, in a similar way to formulating a research question for a
standard research design
 Locate and select relevant previous research studies, with no attempt at evaluation at
this stage. Ideally, research in languages other than English should be used, and the researcher
should try to find papers and reports unpublished in journals, such as conference speeches or
company reports.
 Critically evaluate the studies. The reviewer should assess each study upon criteria
based upon quality, strength of the findings and validity. For safety, this process should
include at least two independent reviewers, although a greater number is advisable.
 Combine the results. This is the process of pooling all of the findings, sometimes
qualitatively, but usually quantitatively, using meta-analysis (a minimal pooling sketch
follows this list).
 Publish the results. As with any research, the results have to be written and published,
usually with a system of independent review. Discussion of the conclusions, as with any
research, allows the validity of the findings to be verified.
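
As a rough illustration of the quantitative combining step, the sketch below pools hypothetical
effect sizes with a fixed-effect, inverse-variance weighting, one common meta-analytic approach;
the study values, variable names and use of Python's numpy library are illustrative assumptions
rather than anything prescribed in this text.

import numpy as np

# Hypothetical effect sizes and standard errors extracted from five studies
effects = np.array([0.30, 0.45, 0.10, 0.25, 0.38])
std_errs = np.array([0.12, 0.20, 0.15, 0.10, 0.18])

weights = 1 / std_errs**2                        # more precise studies carry more weight
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1 / np.sum(weights))

print(f"pooled effect: {pooled:.3f}, standard error: {pooled_se:.3f}")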

The Reasoning behind Systematic Reviews


The principle behind the systematic reviews process is that the researcher critically evaluates
previous studies, in a much more comprehensive and systematic way than a standard literature
review. In many cases, statistical meta-analysis tools are used to give the review a quantitative
foundation, allowing correlations to be documented and conclusions to be drawn. Whilst the
techniques are mainly used by medicine and psychology, there is a growing trend towards
using systematic reviews in other disciplines. Many branches of science are becoming
increasingly fragmented and anarchic, so this layer of analysis aggregates all of the disparate
elements. Systematic reviews, and meta-analysis, are regarded as a cornerstone of healthcare
research, essential where it is impractical or unethical to keep repeating old research. In
addition to the potential risks of repeated research upon patients and volunteers, there are now
laws in many countries prohibiting excessive research using animals. Systematic reviews are a
great way of reducing the amount of suffering caused by vivisection.

Addressing the Disadvantages of Systematic Reviews


As with most systems, despite the protocols, systematic reviews do have some inherent
weaknesses. The main problem is the rapid advancement of medical research and technology,
often meaning that many reviews are out of date before they are even published, forcing
researchers to update their findings constantly. The development of specialist organizations
for finding and evaluating data minimizes the effects of this particular shortcoming. As with
any subjective review, there is the problem of selection bias, where contradictory research is
jettisoned, although most medical researchers are adept at following the proper procedures.

Funding and research grants cause researchers to try to find results that suit their paymasters,
a growing problem in many areas of science, not just medicine. The specialist reviewers
sidestep this problem, to a certain extent, by producing independent research, uncorrupted by
governmental or private healthcare funding, curbing the worst excesses. Often, a blind system
is used, in which reviewers are unaware of where the papers they are reviewing came from or
who wrote them. This lessens allegations of favoritism and prevents research from being judged
on the reputation of the researcher rather than on merit. Ultimately, the onus is on the reader to draw
their own assessments, using their own experience to judge the quality of the systematic
review. Whilst not a perfect system, systematic reviews are far superior to the traditional
narrative approach, which often allows a lot of good research to fall through the cracks.

PILOT STUDIES
A pilot study is a standard scientific tool for 'soft' research, allowing scientists to conduct a
preliminary analysis before committing to a full-blown study or experiment. A small
chemistry experiment in a college laboratory, for example, costs very little, and mistakes or
validity problems are easily rectified. At the other end of the scale, a medical experiment taking
samples from thousands of people across the world is expensive, often running into millions of
dollars. Discovering only afterwards that there was a problem with the equipment or with the
statistics used is unacceptable, and the consequences can be dire.

A field research project in the Amazon Basin costs a lot of time and money, so discovering only
in the field that the electronics do not function in warm, humid conditions is too late. To test
the feasibility, equipment and methods, researchers will often use a pilot study, a small-scale
rehearsal of the larger research design. Generally, the pilot study technique specifically refers
to a smaller scale version of the experiment, although equipment tests are an increasingly
important part of this sub-group of experiments. For example, the medical researchers may
conduct a smaller survey of a hundred people to check that the protocols are sound. The
Amazon researchers may rehearse the experiment in similar conditions, either by sending a
small team to the Amazon to test the procedures or by using something like the tropical biome
at the Eden Project. Pilot studies are also excellent for training inexperienced researchers,
allowing them to make mistakes without fear of losing their job or failing the assignment.
Logistical and financial estimates can be extrapolated from the pilot study, and the research
question and project can be streamlined to reduce wastage of resources and time. Pilots
can be an important part of attracting grants for research, as the results can be placed before
the funding body. Generally, most funding bodies see research as an investment, so they are not
going to dole out money unless they are confident that there is a reasonable chance of a return.
Unfortunately, papers reporting a preliminary pilot study are seldom published, and pilot work,
especially if problems were found, is often stigmatized and sidelined. This is unfair and punishes
researchers for being methodical, so these attitudes are undergoing re-evaluation.
Discouraging researchers from reporting methodological errors, as found in pilot studies,
means that later researchers may make the same mistakes. The other major problem is
deciding whether the results from the pilot study can be included in the final results and
analysis, a procedure that varies wildly between disciplines. Pilots are rapidly becoming an
essential precursor to many research projects, especially when universities are constantly
striving to reduce costs. Whilst there are weaknesses, they are extremely useful for refining
procedures in an age increasingly dominated by technology, much of it untested under field
conditions.

TYPES OF EXPERIMENTAL DESIGN

SIMPLE EXPERIMENTAL DESIGN

1. Posttest and Pretest Design


For many true experimental designs, pretest-posttest designs are the preferred method to
compare participant groups and measure the degree of change occurring as a result of
treatments or interventions. Pretest-posttest designs grew from the simpler posttest only
designs, and address some of the issues arising with assignment bias and the allocation of
participants to groups. One example is education, where researchers want to monitor the
effect of a new teaching method upon groups of children. Other areas include evaluating the
effects of counseling, testing medical treatments, and measuring psychological constructs.
The only stipulation is that the subjects must be randomly assigned to groups, in a true
experimental design, to properly isolate and nullify any nuisance or confounding variables.

The Posttest Only Design with Non-Equivalent Control Groups


Pretest-posttest designs are an expansion of the posttest only design with nonequivalent
groups, one of the simplest methods of testing the effectiveness of an intervention. In this
design, which uses two groups, one group is given the treatment and the results are gathered
at the end. The control group receives no treatment, over the same period of time, but
undergoes exactly the same tests. Statistical analysis can then determine if the intervention
had a significant effect. One common example of this is in medicine; one group is given a
medicine, whereas the control group is given none, and this allows the researchers to
determine if the drug really works. This type of design, whilst commonly using two groups,
can be slightly more complex. For example, if different dosages of a medicine are tested, the
design can be based around multiple groups. Whilst this posttest only design does find many
uses, it is limited in scope and contains many threats to validity. It is very poor at guarding
against assignment bias, because the researcher knows nothing about the individual
differences within the control group and how they may have affected the outcome. Even with
randomization of the initial groups, this failure to address assignment bias means that the
statistical power is weak. The results of such a study will always be limited in scope and,
resources permitting, most researchers use a more robust design, of which pretest-posttest
designs are one. The posttest only design with non-equivalent groups is usually reserved for
experiments performed after the fact, such as a medical researcher wishing to observe the
effect of a medicine that has already been administered.

The Two Group Control Group Design


This is, by far, the simplest and most common of the pretest-posttest designs, and is a useful
way of ensuring that an experiment has a strong level of internal validity. The principle
behind this design is relatively simple, and involves randomly assigning subjects between two
groups, a test group and a control. Both groups are pre-tested, and both are post-tested, the
ultimate difference being that one group was administered the treatment. This test allows a
number of distinct analyses, giving researchers the tools to filter out experimental noise and
confounding variables. The internal validity of this design is strong, because the pretest
allows the researcher to confirm that the groups are equivalent. The various analyses that can
be performed upon a two-group control group pretest-posttest design are as follows (Fig 1):

1. This design allows researchers to compare the final posttest results between the two
groups, giving them an idea of the overall effectiveness of the intervention or treatment. (C)
2. The researcher can see how both groups changed from pretest to posttest, whether one,
both or neither improved over time. If the control group also showed a significant
improvement, then the researcher must attempt to uncover the reasons behind this. (A and A1)
3. The researchers can compare the scores in the two pretest groups, to ensure that the
randomization process was effective. (B)
These checks evaluate the efficiency of the randomization process and also determine whether
the group given the treatment showed a significant difference.
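
A minimal sketch of these comparisons in Python is given below; the scores are simulated, the
group sizes and variable names are invented, and the t-tests from scipy are just one reasonable
way of making checks A/A1, B and C.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated pretest and posttest scores for a treatment and a control group
treat_pre = rng.normal(50, 10, 30)
treat_post = treat_pre + rng.normal(8, 5, 30)   # assume the treatment adds roughly 8 points
ctrl_pre = rng.normal(50, 10, 30)
ctrl_post = ctrl_pre + rng.normal(1, 5, 30)     # assume little change without treatment

# Check B: were the groups equivalent at pretest (did randomization work)?
print(stats.ttest_ind(treat_pre, ctrl_pre))

# Checks A and A1: did each group change from pretest to posttest?
print(stats.ttest_rel(treat_pre, treat_post))
print(stats.ttest_rel(ctrl_pre, ctrl_post))

# Check C: compare outcomes between groups (here via gain scores; raw posttests are also common)
print(stats.ttest_ind(treat_post - treat_pre, ctrl_post - ctrl_pre))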

Problems with Pretest-Posttest Designs


The main problem with this design is that it improves internal validity but sacrifices external
validity to do so. There is no way of judging whether the process of pre-testing actually
influenced the results because there is no baseline measurement against groups that remained
completely untreated. For example, children given an educational pretest may be inspired to
try a little harder in their lessons, and both groups would outperform children not given a
pretest, so it becomes difficult to generalize the results to encompass all children.

The other major problem, which afflicts many sociological and educational research
programs, is that it is impossible and unethical to isolate all of the participants completely. If
two groups of children attend the same school, it is reasonable to assume that they mix outside
of lessons and share ideas, potentially contaminating the results. On the other hand, if the
children are drawn from different schools to prevent this, the chance of selection bias arises,
because randomization is not possible. The two-group control group design is an
exceptionally useful research method, as long as its limitations are fully understood. For
extensive and particularly important research, many researchers use the Solomon four group
method, a design that is more costly, but avoids many weaknesses of the simple pretest-
posttest designs.

SCIENTIFIC CONTROL GROUP


A scientific control group is an essential part of most research designs, allowing researchers to
eliminate and isolate variables. Normal biological variation, researcher bias and
environmental variation are all factors that can skew data, so scientific control groups provide
a baseline. As well as eliminating other variables, scientific control groups help the researcher
to show that the experimental design is capable of generating results.

What is a Scientific Control Group?


A researcher must only measure one variable at a time, and using a scientific control group
gives reliable baseline data to compare their results with. For example, a medical study will
use two groups, giving one set of patients the real medicine and the other a placebo, in order
to rule out the placebo effect.

In this particular type of research, the experiment is double blind. Neither the doctors nor the
patients are aware of which pill they are receiving, curbing potential research bias. In the
social sciences, control groups are the most important part of the experiment, because it is
practically impossible to eliminate all of the confounding variables and bias. For example, the
placebo effect for medication is well documented, and the Hawthorne Effect is another
influence where, if people know that they are the subjects of an experiment, they
automatically change their behavior. There are two main types of control, positive and
negative, both providing researchers with ways of increasing the statistical validity of their
data.

Positive Scientific Control Groups
Positive scientific control groups are those in which the control group is expected to produce a
positive result, allowing the researcher to show that the set-up was capable of producing
results. Generally, a researcher will use a positive control procedure, which is similar to the
actual design but includes a factor that is known to work. For example, a researcher testing the
effect of new antibiotics upon Petri dishes of bacteria may include an established antibiotic that
is known to work. If all of the samples fail except the positive control, it is likely that the tested
antibiotics are ineffective. However, if the control fails too, there is something wrong with the design.
Positive scientific control groups reduce the chances of false negatives.

Negative Scientific Control Groups


Negative Scientific Control is the process of using the control group to make sure that no
confounding variable has affected the results, or to factor in any likely sources of bias. It uses
a sample that is not expected to work. In the antibiotic example, the negative control group
would be a Petri dish with no antibiotic, allowing the researcher to prove that the results are
valid and that there are no confounding variables. If all of the new medications worked, but
the negative control group also showed inhibition of bacterial growth, then some other
variable may have had an effect, invalidating the results. A negative control can also be a way
of setting a baseline.

A researcher testing the radioactivity levels of various samples with a Geiger counter would
also sample the background level, allowing them to adjust the results accordingly.
Establishing strong scientific control groups is arguably a more important part of any
scientific design than the actual samples. Failure to provide sufficient evidence of strong
control groups can completely invalidate a study, no matter how impressive the significance
levels appear.

Randomization is a sampling method used in scientific experiments. It is commonly used in
randomized controlled trials in experimental research. In medical research, randomized and
controlled trials are used to test the efficacy or effectiveness of healthcare services or health
technologies such as medicines, medical devices or surgery.

What is Randomization?
So what is randomization? Suppose you have five chocolate bars and a total of eight friends to
distribute them to. How do you do this so that the distribution involves a minimum of bias?
You might write the name of each friend on a separate small piece of paper and fold the papers
so that no one knows which name is on which piece. You then ask someone to draw five names
and give the chocolates to those five friends. This removes the bias without hurting anyone's
feelings, and the procedure is what we call randomization. In randomized controlled trials, the
research participants are assigned by chance, rather than by choice, to either the experimental
group or the control group. Randomization is designed to reduce, or if possible eliminate, bias.
The fundamental goal of randomization is to ensure that each treatment is equally likely to be
assigned to any given experimental unit.

How Randomization Actually Works?


How is randomization achieved in randomized controlled trials? Researchers have several
options. It can be achieved by using the random number tables given in most statistical
textbooks, or a computer can be used to generate random numbers. If neither of these is
available, you can devise your own plan, for example by taking the last digits of phone
numbers listed in a telephone directory. Suppose you have different varieties of rice grown in
10 small plots in a greenhouse, and you want to evaluate a fertilizer on 9 of the plots, keeping
one plot as a control. You can number the treatment plots 1 to 9 and then read off a series of
random digits such as 8 6 3 1 6 2 9 3 5 6 7 5 5 3 1 and so on. You then allocate each of three
doses of the fertilizer treatment (call them doses A, B and C) in turn: dose A goes to plot 8, B
to plot 6 and C to plot 3; then dose A goes to plot 1, B to plot 2 (because plot 6 has already
been used), and so on, skipping any digit that repeats a plot already assigned.
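
The allocation described above can be automated; the sketch below follows the plot numbers,
dose labels and opening digits of the example, while the extra digits appended to the sequence
and the rule of skipping repeated or out-of-range digits are assumptions about how the
procedure continues.

import itertools

digits = [8, 6, 3, 1, 6, 2, 9, 3, 5, 6, 7, 5, 5, 3, 1, 4, 7, 9, 2]  # digits read from a random number table
doses = itertools.cycle("ABC")              # rotate through doses A, B and C
assignment = {}

for d in digits:
    if d in assignment or not 1 <= d <= 9:  # skip repeats and digits with no matching plot
        continue
    assignment[d] = next(doses)
    if len(assignment) == 9:                # stop once all nine treatment plots are assigned
        break

for plot in sorted(assignment):
    print(f"plot {plot}: dose {assignment[plot]}")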

Blinding: An Excellent Tool to Eliminate Bias in Randomized Controlled Trials


Blinding is commonly employed in clinical research settings to further reduce bias.
There are two main types of blinding:
 In a single-blinded trial, the participants are unaware of which group they are
in and what intervention they are receiving until the conclusion of the study.
 In a double-blind trial, neither the participants nor the researchers know to which group
a participant belongs or what intervention that participant is receiving until the conclusion
of the study.
Bias is the most unwanted element in randomized controlled trials, and randomization gives
researchers an excellent tool to reduce or eliminate it. The lower the bias, the more reliable the
results of the study are, which lends legitimacy to both the research and the researchers.

RANDOMISED CONTROLLED TRIALS


Randomized controlled trials are one of the most efficient ways of reducing the influence of
external variables. In any research program, especially those using
human subjects, these external factors can skew the results wildly and attempts by researchers
to isolate and neutralize the influence of these variables can be counter-productive and
magnify them. Any experiment that relies upon selecting subjects and placing them into
groups is always at risk if the researcher is biased or simply incorrect. The researcher may fail
to take into account all of the potential confounding variables, causing severe validity issues.

The Advantage of Randomized Controlled Trials


Randomized controlled trials neutralize the influence of these extraneous variables, on average,
without the researcher having to isolate them or even be aware of them. Randomized experiment
designs also remove any accusations of conscious or subconscious bias from the
researcher and greatly strengthen external validity.

As an example, imagine that a school seeks to test whether introducing a healthy meal at
lunchtime improves the overall fitness of the children. It decides to do this by giving half of
the children healthy salads and wholesome meals, whilst the control group carries on as
before. At regular intervals, the researchers note the cardiovascular fitness of the children,
looking to see if it improves. The number of extraneous factors and potential confounding
variables for such a study is enormous. Age, gender, weight, what the children eat at home,
and activity level are just some of the factors that could make a difference. In addition, if the
teachers, generally a health-conscious bunch, are involved in the selection of children, they
might subconsciously pick those who are most likely to adapt to the healthier regime and
show better results. Such a pre-determined bias destroys the chance of obtaining useful
results. By using pure randomized controlled trials and allowing chance to select children into
one of the two groups, it can be assumed that any confounding variables are cancelled out, as
long as you have a large enough sample group.

The Disadvantages of Randomized Controlled Trials


Ideally, randomized controlled trials would be used for most experiments, but there are some
disadvantages. Firstly, researchers often choose subjects because they do not have the
resources, or time, to test larger groups, so they have to try to find a sample that is
representative of the population as a whole. This select sampling means that it becomes very
difficult to generalize the results to the population as a whole.

Secondly, randomized experiment designs, especially when combined with crossover studies,
are extremely powerful at understanding underlying trends and causalities. However, they are
a poor choice for research where temporal factors are an issue, for which a repeated measures
design is better. Whilst randomized controlled trials are regarded as the most accurate
experimental design in the social sciences, education, medicine and psychology, they can be
extremely resource heavy, requiring very large sample groups, so are rarely used. Instead,
researchers sacrifice generalization for convenience, leaving large scale randomized
controlled trials for researchers with bigger budgets and research departments.

BETWEEN SUBJECTS DESIGN

A between subjects design is a way of avoiding the carryover effects that can plague within
subjects designs, and it is one of the most common experiment types in some scientific
disciplines, especially psychology. The basic idea behind this type of study is that participants
can be part of the treatment group or the control group, but cannot be part of both. If more
than one treatment is tested, a completely new group is required for each.

What is a Between Subjects Design?


A group of researchers wants to test changes to an educational program and decides upon
three different modifications. They pick a school and decide to use the four
existing classes within an age group, assuming that the spread of abilities is similar. Each
group of children is given a different educational program, along with a control group sticking
with the original. All of the groups are tested, at the end, to determine which program
delivered the most improvement. If the researchers want to be a little more accurate and
reduce the chances of differences between the groups having an effect, they use modifications
of the design. For example, maybe one class had a great teacher and has always been much
more motivated than the others, a factor that would undermine the validity of the experiment.
To avoid this, randomization and matched pairs are often used to smooth out the differences
between the groups.

Advantages of Between Subjects Design


Between subjects designs are invaluable in certain situations, and give researchers the
opportunity to conduct an experiment with very little contamination by extraneous factors.
This type of design is often called an independent measures design because every participant
is only subjected to a single treatment. This lowers the chances of participants suffering
boredom after a long series of tests or, alternatively, becoming more accomplished through
practice and experience, skewing the results.

Disadvantages of Between Subjects Design


The main disadvantage with between subjects designs is that they can be complex and often
require a large number of participants to generate any useful and analyzable data. Because
each participant is only measured once, researchers need to add a new group for every
treatment and manipulation.
 Practicality:
Researchers testing educational programs, for example, might need two groups of twenty
children for a control and test group. If they wanted to add a third program to the mix, they
would need another group of twenty children.
For many research programs, the sheer scale of the experiment and the resources required can
make between subjects designs impractical. If the condition tested is rare, then finding enough
subjects becomes even more difficult.
 Individual Variability:
The other problem is that it is impossible to maintain homogeneity across the groups; this
method uses individuals, with all of their subtle differences, and this can skew data. Age,
gender and social class are just some of the obvious factors, but intelligence, emotional quotient
and every other personality construct can influence the data. If, for example, you were using a
between subjects design to measure intelligence, how do you guarantee that emotion does not
play a role? Some people may be very intelligent but are nervous when completing tests, so
achieve lower scores than they should. These individual differences can create a lot of
background noise, reducing the power of the statistical analysis and obscuring genuine patterns
and trends.
 Assignment Bias:
Imagine researchers comparing educational programs, and they decide to use two schools as
their participants. They find that there is a difference between the two groups and conclude
that treatment A is better than treatment B. However, they neglected to take into account the
fact that the schools contain children from different socio-economic backgrounds, and this
created assignment bias. A better idea would have been to use children from a single school or
use random assignment, but this is not always possible.
 Generalization:
Whilst it is easy to try to select subjects of the same age, gender and background, this then
opens the door for generalization issues, as you cannot then extrapolate the results to
encompass wider groups. Striking the best balance is one of the keys to conducting a between
subjects design. Failure to do this can lead to assignment bias, the ogre that threatens to
destroy this type of research.
 Environmental Factors:
Environmental variables are another major issue and usually arise from poor research design.
In the example above, imagine that the researchers did, in fact, use participants from a single
school and randomly assigned them. Due to time restrictions, they tested one group in the
morning and one in the afternoon. Many studies show that most people are at their mental
peak in the morning, so this will certainly have created an environmental bias. These factors
could very easily become confounding variables and weaken the results, so researchers have
to be extremely careful to eliminate as many of these as possible during the research design.
These disadvantages are certainly not fatal, but ensure that any researcher planning to use a
between subjects design must be very thorough in their experimental design.

WITHIN SUBJECT DESIGN


In a within subject design, unlike a between subjects design, every single participant is
subjected to every single treatment, including the control. This gives as many data sets as
there are conditions for each participant; the fact that subjects act as their own control
provides a way of reducing the amount of error arising from natural variance between
individuals. These tests are common in many research disciplines. An education researcher
might want to study the effect of a new program on children and test them before, and after,
the new method has been applied.

Psychologists often use them to test the relative effectiveness of a new treatment, often a
difficult proposition. The sheer complexity of the human mind and the large number of
potential confounding variables often renders between subjects designs unreliable, especially
when necessarily small sample groups make a more random approach impossible.

Examples of Within Subject Designs


One of the simplest within subject designs is the measurement of opinion - watch any formalized
debate and you will see the process. The chairperson will take a vote before the debate, to
establish a baseline opinion, and will ask the audience to vote again at the end. The team that
gained the most votes clearly managed to sway opinion in the same subjects better, so can be
announced as the winner. Another common example of a within-subjects design is medical
testing, where researchers try to establish whether a drug is effective or whether a placebo
effect is at work. The researchers, in the crudest form of the test, will give all of the
participants the placebo, for a time, and monitor the results. They would then administer the
drug for a period and test the results. Of course, the researchers could just as easily administer
the drug first and then the placebo. This ensures that every subject acts as their own control,
so there are few problems with matching age, gender and lifestyle, reducing the chance of
confounding factors.

The Advantages of Within Subject Designs


The main advantage that the within subject design has over the between subject design is that
it requires fewer participants, making the process much more streamlined and less resource
heavy. For example, if you want to test four conditions, using four groups of 30 participants is
unwieldy and expensive. Using one group, which is tested for all four, is a much easier way.
Ease is not the only advantage, because a well-planned within subject design allows
researchers to monitor the effect upon individuals much more easily and lower the possibility
of individual differences skewing the results.

The Disadvantages of Within Subject Designs


One disadvantage of this research design is the problem of carryover effects, where the first
test adversely influences the other. Two examples of this, with opposite effects, are fatigue
and practice. In a long experiment with multiple conditions, the participants may become tired
and thoroughly fed up with researchers prying, asking questions and pressuring them into taking
tests. This could decrease their performance on the later conditions. Alternatively, the practice
effect might mean that they are more confident and accomplished after the first condition,
simply because the experience has made them more confident about taking tests. As a result,
for many experiments, a counterbalance design, where the order of treatments is varied, is
preferred, but this is not always possible.

COMPLEX EXPERIMENTAL DESIGN


FACTORIAL DESIGN
A factorial design is often used by scientists wishing to understand the effect of two or more
independent variables upon a single dependent variable. Traditional research methods
generally study the effect of one variable at a time, because it is statistically easier to
manipulate. However, in many cases, two factors may be interdependent, and it is impractical
or false to attempt to analyze them in the traditional way. Social researchers often use
factorial designs to assess the effects of educational methods, whilst taking into account the
influence of socio-economic factors and background.

Agricultural science, with a need for field-testing, often uses factorial designs to test the effect
of variables on crops. In such large-scale studies, it is difficult and impractical to isolate and
test each variable individually. Factorial experiments allow subtle manipulations of a larger
number of interdependent variables. Whilst the method has limitations, it is a useful method
for streamlining research and letting powerful statistical methods highlight any correlations.

The Basics
Imagine an aquaculture research group attempting to test the effects of food additives upon
the growth rate of trout. A traditional experiment would involve randomly selecting different
tanks of fish and feeding them varying levels of the additive contained within the feed, for
example none or 10%. However, as any fish farmer knows, the density of stocking is also
crucial to fish growth; if there are not enough fish in a tank, then the wasted capacity costs
money. If the density is too high, then the fish grow at a slower rate. Rather than the
traditional experiment, the researchers could use a factorial design and co-ordinate the
additive trial with different stocking densities, perhaps choosing four groups. The factorial
experiment then needs 4 x 2, or eight treatments. The traditional rules of the scientific method
are still in force, so statistics require that every experiment be conducted in triplicate. This
means 24 separate treatment tanks. Of course, the researchers could also test, for example, 4
levels of concentration for the additive, and this would give 4 x 4 or 16 treatments, meaning 48
tanks in total.

Each factor is an independent variable, whilst the level is the subdivision of a factor.
Assuming that we are designing an experiment with two factors, a 2 x 2 would mean two
levels for each, whereas a 2 x 4 would mean two subdivisions for one factor and four for the
other. It is possible to test more than two factors, but this becomes unwieldy very quickly. In
the fish farm example, imagine adding another factor, temperature, with four levels into the
mix. It would then be 4 x 4 x 4, or 64 runs. In triplicate, this would be 192 tanks, a huge
undertaking. There are a few other methods, such as fractional factorial designs, to reduce
this, but they are not always statistically valid. This lies firmly in the realm of advanced
statistics and is a long, complicated and arduous undertaking.
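
The way the treatment combinations multiply can be seen in a few lines of Python; the factor
levels below follow the trout example, although the stocking-density labels are invented for
illustration.

import itertools

additive = ["none", "10%"]                             # 2 levels of the feed additive
density = ["low", "medium", "high", "very high"]       # 4 stocking densities (hypothetical labels)

treatments = list(itertools.product(additive, density))
print(len(treatments))        # 2 x 4 = 8 treatment combinations
print(len(treatments) * 3)    # 24 tanks when every treatment is run in triplicate

for additive_level, stocking in treatments:
    print(f"additive {additive_level}, stocking density {stocking}")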

The Pros and Cons of Factorial Design


Factorial designs are extremely useful to psychologists and field scientists as a preliminary
study, allowing them to judge whether there is a link between variables, whilst reducing the
possibility of experimental error and confounding variables. The factorial design, as well as
simplifying the process and making research cheaper, allows many levels of analysis. As well
as highlighting the relationships between variables, it also allows the effects of manipulating a
single variable to be isolated and analyzed singly.

The main disadvantage is the difficulty of experimenting with more than two factors, or many
levels. A factorial design has to be planned meticulously, as an error in one of the levels, or in
the general operationalization, will jeopardize a great amount of work. Other than these slight
detractions, a factorial design is a mainstay of many scientific disciplines, delivering great
results in the field.

SOLOMON FOUR GROUP DESIGN


The Solomon four group design is a way of avoiding some of the difficulties associated with
the pretest-posttest design. This design contains two extra control groups, which serve to
reduce the influence of confounding variables and allow the researcher to test whether the
pretest itself has an effect on the subjects. Whilst much more complex to set up and analyze,
this design type combats many of the internal validity issues that can plague research. It
allows the researcher to exert complete control over the variables and allows the researcher to
check that the pretest did not influence the results.

The Solomon four group test combines a standard pretest-posttest two-group design with a
posttest-only control design. The various combinations of tested and untested groups with
treatment and control groups allow the researcher to ensure that confounding variables and extraneous
factors have not influenced the results.

The Solomon Four Group Design Explained

In the figure, A, A1, B and C are exactly the same as in the standard two group design.

The first two groups of the Solomon four group design are designed and interpreted in exactly
the same way as in the pretest-post-test design, and provide the same checks upon
randomization.
 The comparison between the posttest results of groups C and D, marked by line 'D',
allows the researcher to determine if the actual act of pretesting influenced the results. If the
difference between the posttest results of Groups C and D is different from the Groups A and
B difference, then the researcher can assume that the pretest has had some effect upon the
results
 The comparison between the Group B pretest and the Group D posttest allows the
researcher to establish if any external factors have caused a temporal distortion. For example,
it shows if anything else could have caused the results shown and is a check upon causality.
 The comparison between the Group A posttest and the Group C posttest allows the
researcher to determine the effect that the pretest has had upon the treatment. If the posttest
results for these two groups differ, then the pretest has had some effect upon the treatment and
the experiment is flawed.
 The comparison between the Group B posttest and the Group D posttest shows
whether the pretest itself has affected behavior, independently of the treatment. If the results
are significantly different, then the act of pretesting has influenced the overall results and is in
need of refinement.
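
One common way of analysing the four groups quantitatively, although not the only one and not
prescribed here, is a 2 x 2 analysis of variance with pretesting and treatment as factors; the
simulated scores, column names and use of the statsmodels library in this sketch are all
illustrative assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)

# Simulated posttest scores for the four Solomon groups (25 subjects each)
df = pd.DataFrame({
    "posttest": np.concatenate([
        rng.normal(60, 8, 25),   # pretested, treated
        rng.normal(52, 8, 25),   # pretested, control
        rng.normal(59, 8, 25),   # not pretested, treated
        rng.normal(51, 8, 25),   # not pretested, control
    ]),
    "pretested": [1] * 50 + [0] * 50,
    "treated": ([1] * 25 + [0] * 25) * 2,
})

# The main effect of 'treated' estimates the treatment effect; the
# pretested x treated interaction flags any effect of the pretest itself.
model = ols("posttest ~ C(pretested) * C(treated)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))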

Why Isn't Every Experiment a Solomon Four Group Design?


The Solomon four group design is one of the benchmarks for sociological and educational
research, and combats most of the internal and external validity issues apparent in lesser
designs. Despite the statistical power and results that are easy to generalize, this design does
suffer from one major drawback that prevents it from becoming a common method of
research: the complexity.

A researcher using a Solomon four group design must have the resources and time to use four
research groups, not always possible in tightly funded research departments. Most schools and
organizations are not going to allow researchers to assign four groups randomly because it
will disrupt their normal practice. Thus, non-random assignment of groups is often necessary,
and this undermines the strength of the design.

Secondly, the statistics involved are extremely complex, even in the age of computers and
statistical programs. Unless the research is critical or funded by a large budget and extensive
team of researchers, most experiments are of the simpler pretest-posttest research designs. As
long as the researcher is fully aware of the issues with external validity and generalization,
they are sufficiently robust and a Solomon four group design is not needed.

REPEATED MEASURES DESIGN


The repeated measures design is a stalwart of scientific research, and offers a less unwieldy
way of comparing the effects of treatments upon participants. The term 'repeated measures
design' is often interchanged with the term 'within subjects design,' although many researchers
only class a subtype of the within subjects design, known as a crossover study, as a repeated
measures design.

What is a Repeated Measures Design?


The repeated measures design uses the same subjects with every condition of the research,
including the control.

This requires fewer participants and resources, and also decreases the effects of natural
variation between individuals upon the results. Repeated measures designs are commonly used
in longitudinal studies, over the long term, and in educational tests, where it is important to
ensure that variability is low. Repeated measures designs do have a couple of disadvantages,
mainly that the subjects can become better at a task over time, known as practice effects or,
conversely, they become worse through boredom and fatigue. In addition, if some of the

31
subjects pull out before completing the second part, this can result in a sample group too small
to have any statistical significance.

Repeated Measures Designs - Crossover Studies


The crossover design is, by far, the most common type of repeated measures design, based
around ensuring that all of the subjects receive all of the treatments. In an experiment with
two treatments, the subjects would be randomized into two groups. The first group would be
given treatment A followed by treatment B, the second would be given treatment B followed
by treatment A. It is also possible to test more than two conditions, if required, and this
experiment meets the requirements of randomization, manipulation and control.

Like all repeated measures designs, this reduces the chance of variation between individuals
skewing the results and also requires a smaller group of subjects. It also reduces the chance of
practice or fatigue effects influencing the results because, presumably, these will be the same for
both groups and can be removed by statistical tests. The main weakness of a crossover study is
the possibility of carryover effects, where administration of the first condition affects the results
gained in the second; the major pitfall is when these effects are asymmetrical, if B affects A
more than A affects B, for example. Imagine medical
researchers testing the effects of two drugs upon asthma sufferers. There is a chance that the
first drug may remain in the subject's system and affect the results of the second, one of the
reasons why medical researchers usually leave a 'washout' period between treatments. In
addition, crossover studies suffer badly if there is a high dropout rate amongst participants,
which can adversely affect the validity by unbalancing the groups and reducing the statistical
validity. Despite this, crossover studies remain the most common repeated measures design,
due to the ease and practicality.

COUNTERBALANCED MEASURES DESIGN


Experiments conducted with a counterbalanced measures design are one of the best ways to
avoid the pitfalls of standard repeated measures designs, where the subjects are exposed to all
of the treatments. In a normal experiment, the order in which treatments are given can actually
affect the behavior of the subjects or elicit a false response, due to fatigue or outside factors
changing the behavior of many of the subjects. To counteract this, researchers often use a
counterbalanced design, which reduces the chances of the order of treatment or other factors
adversely influencing the results.
What is a Counterbalanced Measures Design?
The simplest type of counterbalanced measures design is used when there are two possible
conditions, A and B. As with the standard repeated measures design, the researchers want to
test every subject for both conditions. They divide the subjects into two groups and one group
is treated with condition A, followed by condition B, and the other is tested with condition B
followed by condition A.

Three Conditions
If you have three conditions, the process is exactly the same and you would divide the
subjects into 6 groups, treated as orders ABC, ACB, BAC, BCA, CAB and CBA.

Four Conditions
The problem with complete counterbalancing is that for complex experiments, with multiple
conditions, the permutations quickly multiply and the research project becomes extremely
unwieldy. For example, four possible conditions require 24 orders of treatment (4x3x2x1),
and the number of participants must be a multiple of 24, due to the fact that you need an equal
number in each group.

More Than Four Conditions
With 5 conditions you need multiples of 120 (5x4x3x2x1), with 7 you need 5040! Therefore,
for all but the largest research projects with huge budgets, this is impractical and a
compromise is needed.
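
A couple of lines of Python show how quickly complete counterbalancing grows; the condition
labels are arbitrary, and only the standard itertools and math modules are used.

import itertools
import math

conditions = ["A", "B", "C", "D"]

orders = list(itertools.permutations(conditions))
print(len(orders))           # 24 treatment orders for four conditions
print(math.factorial(5))     # 120 orders for five conditions
print(math.factorial(7))     # 5040 orders for seven conditions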

Incomplete Counterbalanced Measures Designs


Incomplete counterbalanced measures designs are a compromise, designed to balance the
strengths of counterbalancing with financial and practical reality. One such incomplete
counterbalanced measures design is the Latin Square, which attempts to circumvent some of
the complexities and keep the experiment to a reasonable size.

With Latin Squares, a five-condition research program would look like this:
             Position 1   Position 2   Position 3   Position 4   Position 5
Order 1          A            B            C            D            E
Order 2          B            C            D            E            A
Order 3          C            D            E            A            B
Order 4          D            E            A            B            C
Order 5          E            A            B            C            D

The Latin Square design has its uses and is a good compromise for many research projects.
However, it still suffers from the same weakness as the standard repeated measures design in
that carryover effects are a problem. In the Latin Square, A always precedes B, and this means
that anything in condition A that potentially affects B will affect all but one of the orders. In
addition, A always follows E, and these interrelations can jeopardize the validity of the
experiment. The way around this is to use a balanced Latin Square, which is slightly more
complicated but ensures that the risk of carryover effects is much lower. For experiments with
an even number of conditions, the first row of the Latin Square will follow the formula 1, 2, n,
3, n-1, 4, n-2…, where n is the number of conditions. For subsequent rows, you add one to the
previous, returning to 1 after n. This sounds complicated, so it is much easier to look at an
example for a six condition experiment. The subject groups are labeled A to F, the columns
represent the conditions tested, and the rows represent the subject groups:

Subjects    1st   2nd   3rd   4th   5th   6th
A            1     2     6     3     5     4
B            2     3     1     4     6     5
C            3     4     2     5     1     6
D            4     5     3     6     2     1
E            5     6     4     1     3     2
F            6     1     5     2     4     3

As you can see, this ensures that every single condition follows every other condition once,
allowing the researchers to pick out any carryover effects during the statistical analysis. When
an experiment with an odd number of conditions is designed, the process is slightly more
complex and two Latin Squares are needed to avoid carryover effects. The first is created in
exactly the same way and the second is a mirror image:

Square 1:
1 2 5 3 4
2 3 1 4 5
3 4 2 5 1
4 5 3 1 2
5 1 4 2 3

Square 2 (mirror image of Square 1):
4 3 5 2 1
5 4 1 3 2
1 5 2 4 3
2 1 3 5 4
3 2 4 1 5

With this design, every single condition follows every other condition twice (once in each
square), and statistical tests allow researchers to analyse the data. The balanced Latin Square is
a commonly used instrument for performing large repeated measures designs and is an excellent
compromise between maintaining validity and practicality. There are other variations of
counterbalanced measures designs, but these are by far the most common.
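
The construction described above for an even number of conditions can be sketched as follows;
the function name and the use of 1-based condition numbers are my own choices, and for an odd
number of conditions the mirror-image square described in the text would also be needed.

def balanced_latin_square(n):
    """Balanced Latin square for an even number of conditions, numbered 1..n."""
    # First row follows the pattern 1, 2, n, 3, n-1, 4, n-2, ...
    first, low, high = [1], 2, n
    for i in range(1, n):
        if i % 2 == 1:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
    # Each later row adds one to the row above, wrapping back to 1 after n
    rows = [first]
    for _ in range(n - 1):
        rows.append([x % n + 1 for x in rows[-1]])
    return rows

for row in balanced_latin_square(6):
    print(row)   # reproduces the six-condition square shown above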

MATCHED SUBJECTS DESIGN


In a matched subjects design, researchers attempt to emulate some of the strengths of within
subjects designs and between subjects designs. A matched subject design uses separate
experimental groups for each particular treatment, but relies upon matching every subject in
one group with an equivalent in another. The idea behind this is that it reduces the chances of
an influential variable skewing the results by negating it.

What is a Matched Subjects Design?


Matched subjects designs are often used in education, giving researchers a useful way to
compare treatments without having to use huge and randomized groups.

For example, a study to compare two methods for teaching reading might use a matched
subjects research program. The researchers want to compare two methods, the current method
and a modern alternative. They select two groups of children and match pairs of children across
the two groups according to ability, using the results of their last reading comprehension test.
If the researchers wanted to test another method, they would have to find three comparable
children to compare between the three groups.

It is also possible to match for more than one variable. For example, a study to test whether a
daily exercise routine improved the cardio-vascular health in the inhabitants of a nursing
home could match subjects for age and gender. It may also be possible to match smokers with
ex-smokers. Obviously, given the complexity of humans and the sheer number of factors that
can influence behavior, matching on every factor is exceptionally difficult without huge groups,
and it makes the project unnecessarily complex, especially if you are testing multiple
treatments.

The Advantages of a Matched Subjects Design


The overall goal of a matched subjects design is to emulate the conditions of a within subjects
design, whilst avoiding the temporal effects that can influence results. A within subjects
design tests the same people, whereas a matched subjects design comes as close to that as
possible and even uses the same statistical methods to analyze the results. This greatly reduces
the possibility of differences between individuals affecting the results. The matched subjects
design also utilizes the strength of the between subjects design, in that every subject is tested
only once, eliminating the possibility of temporal factors, known as order effects, affecting
the results.

The Disadvantages of a Matched Subjects Design


Whilst the design is an excellent compromise between reducing order effects and smoothing
out variation between individuals, it is certainly not perfect. Even with careful matching of the
pairs, there will always be some variation. In the nursing home example, there are so many
factors influencing cardio-vascular fitness that the researchers can only hope to match
the most influential variables, which is at best an approximation. In addition, the researcher might be
incorrect in their assumptions about which variables are the most important and miss a major
confounding variable. Even the single variable may have been measured incorrectly; in the
educational example, one of the children may have had a really bad day, been ill or suffered
from nerves, giving her a much lower score than her reading comprehension would indicate.
Despite these disadvantages, matched subjects designs are useful, allowing researchers to
perform streamlined and focused research programs whilst maintaining a good degree of
validity.

Bayesian Probability
Bayesian probability is the process of using probability to try to predict the likelihood of
certain events occurring in the future. Unlike traditional frequency probability, which estimates
probability from long-run frequencies, Bayesian probability is a measure of the confidence, or
degree of belief, that a person holds in a proposition, generally expressed as a percentage.
Using Bayesian probability allows a researcher to judge the amount of confidence that they
have in a particular result, whereas frequency probability, via the traditional null hypothesis,
restricts the researcher to yes and no answers. Bayesian methods are becoming
another tool for assessing the viability of a research hypothesis. To use Bayesian probability,
a researcher starts with a set of initial beliefs, and tries to adjust them, usually through
experimentation and research. The original set of beliefs is then altered to accommodate the
new information. This process sacrifices a little objectivity for flexibility, helping researchers
to circumvent the need for a tortuous research design.
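
Formally, this updating step is Bayes' theorem: the revised (posterior) belief in a hypothesis H
after seeing evidence E is P(H|E) = P(E|H) x P(H) / P(E), where P(H) is the prior belief and
P(E|H) is the likelihood of the evidence if H were true. This is the standard textbook statement
of the theorem rather than anything specific to this text.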

A drug company does not simply want to know whether a drug works or not, but to assess
whether it works better than existing treatments, which provide a baseline for comparison.
Drug companies often 'tinker' with the molecular structure of drugs, and do not want to design a new program each
time. The researchers will constantly reassess their Bayesian probability, or degree of belief,
allowing them to concentrate upon promising drugs and cutting short failing treatments. This
reduces the risk to patients, the timescale and the expense.

Bayesian Probability in Use


One simple example of Bayesian probability in action is rolling a die: traditional frequency
theory dictates that, if you throw the die six times, you should expect to roll a six once. Of
course, there may be variations, but it will average out over time. This is where Bayesian
probability differs. Imagine a Bayesian specialist observing a game of dice in a casino. It is
more than likely that he will begin with the same 1 in 6 chance, or 16.67%. As the night wears
on, he notices that the die is turning up sixes more often than expected, and adjusts his belief.
He begins to suspect that the die is loaded, so he leaves, keeping his money in his pocket.
Sticking with the gambling theme, consider a professional poker player taking part in a game.
Standard probability would state, assuming that all the players are of equal ability and have a
good 'poker face,' that the game revolves around the frequencies and chances of certain cards
appearing. However, our player has researched and studied the styles of her opposition over
the years. She knows that one opponent is likely to bluff a lot; another is cautious and will not
place large bets unless he has a good chance of a strong hand. A third is prone to making
rash bets and going all in. Armed with this information, she can use Bayesian probability to
reassess the likelihood of her own hand being strong and having a chance of taking the pot. A
similar, although more complex, process is used to predict the weather, based upon previous
events and occurrences, and it is right much more often than not. Weather is a chaotic system,
and chaotic systems are notoriously difficult to predict by frequency probability.
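
The casino observer's reasoning can be sketched numerically; the prior beliefs, the assumed
probability of a six for a loaded die and the sequence of observed rolls below are all invented
for illustration.

# Two competing hypotheses about the die:
#   fair:   P(six) = 1/6
#   loaded: P(six) = 1/2 (an assumed degree of loading)
p_fair, p_loaded = 0.95, 0.05             # prior beliefs before watching the game

rolls = [6, 2, 6, 6, 5, 6, 6, 3, 6, 6]    # invented observations

for roll in rolls:
    is_six = (roll == 6)
    like_fair = 1 / 6 if is_six else 5 / 6     # likelihood of this roll if the die is fair
    like_loaded = 1 / 2                        # likelihood if loaded (same either way, by assumption)
    # Bayes' rule: posterior is proportional to prior times likelihood
    post_fair = p_fair * like_fair
    post_loaded = p_loaded * like_loaded
    total = post_fair + post_loaded
    p_fair, p_loaded = post_fair / total, post_loaded / total

print(f"updated belief that the die is loaded: {p_loaded:.2f}")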

Any regular computer user makes use of Bayesian probability. Spam filters on e-mail accounts
make use of Bayes' theorem, and do a pretty good job. Whilst they do not
intercept every single spam e-mail, and may wrongly assign legitimate messages to the trash
folder, they are certainly better than having hundreds of junk messages waiting in the inbox
every time the account is opened. Every time the program makes an incorrect assumption,
which is flagged by the recipient, the new information feeds back into the model and
facilitates a more accurate answer the next time. This summarizes Bayesian probability very
well - it is an extremely useful tool, more often right than wrong, but it is only ever a guide.
Many areas of science are adapting to this reworking of an old theory, and it promises to fit
alongside the traditional methods very well.
