Methods in Biochemical Research
Meaning of Research
Research is an art of scientific investigation. The Advanced Learner's Dictionary defines
research as a careful investigation or inquiry, especially through search for new facts in any
branch of knowledge. According to WHO, research is a quest for knowledge through diligent
search, investigation or experimentation aimed at the discovery and interpretation of new
knowledge. Generally, research can be defined as a scientific inquiry aimed at learning new
facts, testing ideas, etc. It is the systematic collection, analysis and interpretation of data to
generate new knowledge and answer a certain question or solve a problem.
Characteristics of research
It demands a clear statement of the problem
It requires a plan (it is not aimlessly “looking” for something in the hope that you
will come across a solution)
It builds on existing data, using both positive and negative findings
New data should be collected as required and be organized in such a way that they
answer the research question(s).
Types of research
Research is a systematic search for information and new knowledge. It covers topics in every
field of science and perceptions of its scope and activities are unlimited. The classical broad
divisions of research are basic and applied research. Basic research is necessary to
generate new knowledge and technologies to deal with major unresolved health problems. On
the other hand, applied research is necessary to identify priority problems and to design and
evaluate policies and programs that will deliver the greatest health benefit, making optimal
use of available resources.
Quantitative and Qualitative research: Early forms of research originated in the natural
sciences such as biology, chemistry, physics, geology etc. and were concerned with
investigating things which we could observe and measure in some way. Such observations
and measurements can be made objectively and repeated by other researchers. This process is
referred to as “quantitative” research. Much later, along came researchers working in the
social sciences: psychology, sociology, anthropology etc. They were interested in studying
human behaviour and the social world inhabited by human beings. They found increasing
difficulty in trying to explain human behaviour in simply measurable terms. Measurements
tell us how often or how many people behave in a certain way but they do not adequately
answer the “why” and “how” questions. Research which attempts to increase our
understanding of why things are the way they are in our social world and why people act the
ways they do is “qualitative” research. Qualitative research is concerned with developing
explanations of social phenomena. That is to say, it aims to help us to understand the world in
which we live and why things are the way they are. It is concerned with the social aspects of
our world and seeks to answer questions about why people behave as they do.
Qualitative research is concerned with finding the answers to questions which begin with:
why? How? In what way? Quantitative research, on the other hand, is more concerned with
questions about: how much? How many? How often? To what extent? etc.
Public health problems are complex, not only because of their multicausality but also as a
result of new and emerging domestic and international health problems. Social, economic,
political, ethnic, environmental, and genetic factors all are associated with today’s public
health concerns. Consequently, public health practitioners and researchers recognize the need
for multiple approaches to understanding problems and developing effective interventions that
address contemporary public health issues. Qualitative methods fill a gap in the public health
toolbox; they help us understand behaviors, attitudes, perceptions, and culture in a way that
quantitative methods alone cannot. For all these reasons, qualitative methods are getting
renewed attention and gaining new respect in public health.
A thorough description of qualitative research is beyond the scope of this lecture note.
Students interested to know more about qualitative methods could consult other books which
are primarily written for that purpose. The main purpose of this lecture note is to give a
detailed account on the principles of quantitative research.
Taking chance, or probability, into account is absolutely critical to biological research, and is
the substance of research design. Research design, above all else, must account for and
maintain the role of chance in order to ensure validity. It is statistical methods that preserve
the laws of probability in our inquiry and allow proper analysis and interpretation of results.
Statistics are the tool that permits health research to be empirical rather than abstract; they
allow us to confirm our findings by further observation and experiment.
Basic and applied
Research can be functionally divided into basic (or pure) research and applied research. Basic
research is usually considered to involve a search for knowledge without a defined goal of
utility or specific purpose. Applied research is problem-oriented, and is directed towards the
solution of an existing problem. There is continuing controversy over the relative benefits and
merits to society of basic and applied research. Some claim that science, which depends
greatly on society for its support, should address itself directly to the solution of the relevant
problems of man, while others argue that scientific inquiry is most productive when freely
undertaken, and that the greatest advances in science have resulted from pure research. It is
generally recognized that there needs to be a healthy balance between the two types of
research, with the more affluent and technologically advanced societies able to support a
greater proportion of basic research than those with fewer resources to spare.
1. Order
The scientific method differs from ‘common sense’ in arriving at conclusions by employing
an organized observation of entities or events which are classified or ordered on the basis of
common properties and behaviours. It is this commonality of properties and behaviours that
allows predictions, which, carried to the ultimate, become laws.
4. Hypothesis
Hypotheses are carefully constructed statements about a phenomenon in the population. The
hypotheses may have been generated by deductive reasoning, or based on inductive reasoning
from prior observations. One of the most useful tools of health research is the generation of
hypotheses which, when tested, will lead to the identification of the most likely causes of
disease or changes in the condition being observed. Although we cannot draw definite
conclusions, or claim proof using the inductive method, we can come ever closer to the truth
by knocking down existing hypotheses and replacing them with ones of greater plausibility.
In health research, hypotheses are often constructed and tested to identify causes of disease
and to explain the distribution of disease in populations. Mill’s canons of inductive reasoning
are frequently utilized in the forming of hypotheses which relate association and causation.
Briefly stated, these methods include:
(a) method of difference – when the frequency of a disease is markedly dissimilar under two
circumstances, and a factor can be identified in one circumstance and not the other, this
factor, or its absence, may be the cause of the disease (for example, the difference in
frequency of lung cancer in smokers and nonsmokers);
(b) method of agreement – if a factor, or its absence is common to a number of different
circumstances that are found to be associated with the presence of a disease, that factor, or its
absence may be causally associated with the disease (e.g. the occurrence of hepatitis A is
associated with patient contact, crowding and poor sanitation and hygiene, each conducive to
the transmission of the hepatitis virus);
(c) the method of concomitant variation, or the dose response effect – the increasing
expression of endemic goitre with decreasing levels of iodine in the diet, the increasing
frequency of leukaemia with increasing radiation exposure, the increase in prevalence of
elephantiasis in areas of increasing filarial endemicity, are each examples of this concomitant
variation;
(d) the method of analogy – the distribution and frequency of a disease or effect may be
similar enough to that of some other disease to suggest commonality in cause (e.g. hepatitis B
virus infection and cancer of the liver).
DIFFERENT TYPES OF RESEARCH STUDIES
The case control study uses a group of patients stricken with a disease, and compares them
with a control group of people who do not have the disease. Medical records and interviews are
used to try to build up a historical picture of the patient's life, allowing cross-reference
between patients and statistical analysis. Any trends can then be highlighted and action can be
taken.
Statistical analysis allows the researcher to draw a conclusion about whether a certain
situation or exposure led to the medical condition. For example, a scientist could compare a
group of coal miners suffering from lung cancer with those clear of the disease, and try to
establish the underlying cause. If the majority of the cases arose in collieries owned by one
company, it might indicate that the company's safety equipment and procedures were at fault.
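The statistical comparison in such a study is usually summarized as an odds ratio from a 2 x 2 exposure-by-disease table. The sketch below is a minimal illustration, assuming Python and using invented counts only:

    import math

    # Hypothetical 2x2 table for a case control study (counts are illustrative only):
    #                     exposed   unexposed
    # cases (disease)       a=60        b=40
    # controls              c=25        d=75
    a, b = 60, 40   # cases: exposed, unexposed
    c, d = 25, 75   # controls: exposed, unexposed

    # The odds ratio compares the odds of exposure among cases with the odds among controls.
    odds_ratio = (a * d) / (b * c)

    # Approximate 95% confidence interval on the odds ratio (Woolf's method).
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    log_or = math.log(odds_ratio)
    ci_low = math.exp(log_or - 1.96 * se_log_or)
    ci_high = math.exp(log_or + 1.96 * se_log_or)

    print(f"Odds ratio: {odds_ratio:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")

An odds ratio well above 1, with a confidence interval excluding 1, suggests an association between the exposure and the disease, although, as discussed below, not necessarily a causal one.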
Possibly the most famous case control study using this method was a study into whether
bicycle helmets reduce the chance of cyclists receiving bad head injuries in an accident.
Obviously, the researcher could not use standard experimentation and compare a control
group of non-helmet wearers with helmet wearers, measuring the chances of head injury, as
this would be unethical. A case control study was utilized, and the researchers looked at
medical records, comparing the number of head injury sufferers wearing helmets against those
without. This generated a statistical result, showing that wearing a cycle helmet made it 88%
less likely that head injury would be suffered in an accident. The main weakness of the case
control study is that it is very poor at determining cause and effect relationships.
In the cycle helmet example, it could be argued that a cyclist who bothered wearing a helmet
may well have been a safer cyclist anyway, and less likely to have accidents. Evidence
showed that children wearing helmets were more likely to be from a more affluent class, more
used to cycling through parks than city streets. The study also showed that helmets were of
little use to adults. Whilst most agree that cycle helmets are probably a good thing for
children, there is not enough evidence to suggest that they should be mandatory for adults
outside extreme cycling. These problems serve as a warning that the results of any case
control study should not be relied upon, instead acting as a guide, possibly allowing deeper
and more rigorous methods to be utilized.
2. OBSERVATIONAL STUDIES
This type of research draws a conclusion by comparing subjects against a control group, in
cases where the researcher has no control over the experiment. A research study comparing
the risk of developing lung cancer, between smokers and non-smokers, would be a good
example of an observational study. The main reason for performing observational
research is ethical concern. With the smoking example, a scientist cannot give
cigarettes to non-smokers for 20 years and compare them with a control group. This also
brings up the other good reason for such studies, in that few researchers can study the long-
term effects of certain variables, especially when it runs into decades. For this study of long-
term and subtle effects, they have to use pre-existing conditions and medical records. The
researcher may want to study an extremely small sample group, so it is easier to start with
known cases and work backwards. The thalidomide cases are an example of an observational
study in which researchers had to work backwards and establish that the drug was the cause of
the disabilities.
The main problem with observational studies is that the experimenter has no control over the
composition of the control groups, and cannot randomize the allocation of subjects. This can
create bias, and can also mask cause and effect relationships or, alternatively, suggest
correlations where there are none (error in research). For example, in the smoking example, if
the researcher found that there is a correlation between smoking and increased rates of lung
cancer, without knowing the full and complete background of the subjects, there is no way of
determining whether other factors were involved, such as diet, occupation or genetics.
Randomization is assumed to even out external causal effects, but this is impossible in an
observational study. There is no independent variable, so it is dangerous to assume cause and
effect relationships, a process often misunderstood by the mass media lauding the next
wonder food, or sensationalizing a political debate with unfounded results and pseudo-
science. Despite the limitations, an observational study allows a useful insight into a
phenomenon, and sidesteps the ethical and practical difficulties of setting up a large and
cumbersome medical research project.
3. COHORT STUDY
A cohort study is a research program investigating a particular group with a certain trait,
observing it over a period of time. Some examples of cohorts may be people who have taken a
certain medication, or have a medical condition. Outside medicine, it may be a population of
animals that has lived near a certain pollutant or a sociological study of poverty. A cohort
study can delve even further and divide a cohort into sub-groups, for example, a cohort of
smokers could be sub-divided, with one group suffering from obesity. In this respect, a cohort
study is often interchangeable with the term naturalistic observation. There are two main sub-
types of cohort study, the retrospective and the prospective cohort study. The major difference
between the two is that the retrospective looks at phenomena that have already happened,
whilst the prospective type starts from the present.
4. LONGITUDINAL STUDY
A longitudinal study is observational research performed over a period of years or even
decades. Longitudinal studies allow social scientists and economists to study long-term
effects in a human population. A cohort study is a subset of the longitudinal study because it
observes the effect on a specific group of people over time. Quite often, a longitudinal study
is an extended case study, observing individuals over long periods, and is a purely qualitative
undertaking. The lack of quantitative data means that any observations are speculative, as
with many case studies, but they allow a unique and valuable perspective on some aspects of
human culture and sociology.
6. A CORRELATIONAL STUDY
A correlational study determines whether or not two variables are correlated. This means to
study whether an increase or decrease in one variable corresponds to an increase or decrease
in the other variable. It is very important to note that correlation doesn't imply causation.
We'll come back to this later.
Types
There are three types of correlations that are identified:
1. Positive correlation: Positive correlation between two variables is when an increase
in one variable is accompanied by an increase in the other, and a decrease in one by a decrease
in the other. For example, the amount of money that a person possesses might correlate
positively with the number of cars he owns.
2. Negative correlation: Negative correlation is when an increase in one variable is
accompanied by a decrease in the other, and vice versa. For example, the level of education
might correlate negatively with crime; that is, where education levels are higher, crime tends
to be lower. Note that this doesn't mean that a lack of education causes
crime. It could be, for example, that both lack of education and crime have a common reason:
poverty.
3. No correlation: Two variables are uncorrelated when a change in one is not accompanied by
any systematic change in the other. For example, among millionaires, happiness is found to be
uncorrelated with money: having more money is not associated with greater happiness.
A correlation coefficient is usually used during a correlational study. It varies between +1 and
-1. A value close to +1 indicates a strong positive correlation while a value close to -1
indicates strong negative correlation. A value near zero shows that the variables are
uncorrelated.
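As a minimal illustration of how such a coefficient is obtained (assuming Python with NumPy; the paired measurements below are invented), Pearson's r can be computed as follows:

    import numpy as np

    # Invented paired measurements, e.g. hours of exercise per week vs. resting heart rate.
    x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
    y = np.array([78, 74, 75, 70, 68, 66, 65, 62], dtype=float)

    # Pearson's r: covariance of x and y divided by the product of their standard deviations.
    r = np.corrcoef(x, y)[0, 1]
    print(f"correlation coefficient r = {r:.2f}")   # close to -1 here: a strong negative correlation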
Limitations
It is very important to remember that correlation doesn't imply causation and there is no way
to determine or prove causation from a correlational study. This is a common mistake made
by people in almost all spheres of life.
For example, a US politician speaking out against free lunches to poor kids at school argues -
“You show me the school that has the highest free and reduced lunch, and I'll show you the
worst test scores, folks” (nymag.com). This is a correlation he is speaking about, and one
cannot infer causation from it. The more obvious explanation is a common cause, poverty:
families too poor to feed their children properly are also unlikely to produce the best test scores.
A. SEMI-EXPERIMENTAL DESIGN
FIELD EXPERIMENT
For geologists, social scientists and environmental biologists, amongst others, field
experiments are an integral part of the discipline. As the name suggests, a field study is an
experiment performed outside the laboratory, in the 'real' world. Unlike case studies and
observational studies, a field experiment still follows all of the steps of the scientific process,
addressing research problems and generating hypotheses. The obvious advantage of a field
study is that it is practical and also allows experimentation, without artificially introducing
confounding variables. A population biologist examining an ecosystem could not move the
entire environment into the laboratory, so field experiments are the only realistic research
method in many fields of science.
B. QUASI-EXPERIMENTAL DESIGN
Quasi-experimental design is a form of experimental research used extensively in the social
sciences and psychology. Whilst regarded as unscientific and unreliable by physical and
biological scientists, the method is nevertheless very useful for measuring
social variables. The inherent weaknesses in the methodology do not undermine
the validity of the data, as long as they are recognized and allowed for during the
whole experimental process. Quasi experiments resemble quantitative and qualitative
experiments, but lack random allocation of groups or proper controls, so firm statistical
analysis can be very difficult.
DESIGN
Quasi-experimental design involves selecting groups, upon which a variable is tested, without
any random pre-selection processes. For example, to perform an educational experiment, a
class might be arbitrarily divided by alphabetical selection or by seating arrangement. The
division is often convenient and, especially in an educational situation, causes as little
disruption as possible. After this selection, the experiment proceeds in a very similar way to
any other experiment, with a variable being compared between different groups, or over a
period of time.
ADVANTAGES
Especially in social sciences, where pre-selection and randomization of groups is often
difficult, they can be very useful in generating results for general trends. For example,
consider studying the effect of maternal alcohol use during pregnancy, where alcohol is known
to harm the embryo. A strict experimental design would require that mothers be randomly
assigned to drink alcohol, which would be grossly unethical because of the harm the study
might do to the embryos. Instead, researchers ask mothers how much alcohol they used during
their pregnancy and then assign them to groups accordingly. Quasi-experimental design is often integrated with
individual case studies; the figures and results generated often reinforce the findings in a case
study, and allow some sort of statistical analysis to take place. In addition, because extensive
pre-screening and randomization do not need to be undertaken, they reduce the time and
resources needed for experimentation.
DISADVANTAGES
Without proper randomization, statistical tests can be meaningless. For example, these
experimental designs do not take into account any pre-existing factors (as for the mothers:
what made them drink or not drink alcohol), or recognize that influences outside the
experiment may have affected the results. A quasi experiment constructed to analyze the
effects of different educational programs on two groups of children, for example, might
generate results that show that one program is more effective than the other. These results will
not stand up to rigorous statistical scrutiny because the researcher also needs to control for
other factors that may have affected the results, which is very hard to do properly. One group of
children may have been slightly more intelligent or motivated. Without some form of pre-
testing or random selection, it is hard to judge the influence of such factors.
CONCLUSION
Disadvantages aside, as long as the shortcomings of the quasi-experimental design are
recognized, these studies can be a very powerful tool, especially in situations where ‘true’
experiments are not possible. They are a very good way to obtain a general overview and then
follow up with a case study or quantitative experiment, to focus on the underlying reasons for
the results generated.
This type of analysis would then allow the researchers to estimate the heritability of specific
traits and quantify the effect of genetic factors on the individual trait. Psychologists have long
known that a twin study is not a true experimental design, but it has led to some interesting
insights into the influence of genes on human behavior. For this method, a number of
assumptions have to be made; that the identical twins share identical DNA profiles, and that
the environmental factors are the same for all participants.
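As a hedged sketch of how such heritability estimates are often made, one classical approach (Falconer's formula, not necessarily the method used in any particular twin study) compares the trait correlations of identical (monozygotic) and fraternal (dizygotic) twin pairs; the correlations below are invented for illustration:

    # Falconer's formula: heritability is estimated from the difference between the
    # trait correlation in identical (MZ) twins and in fraternal (DZ) twins.
    r_mz = 0.80   # correlation of the trait between identical twin pairs (invented)
    r_dz = 0.50   # correlation of the trait between fraternal twin pairs (invented)

    heritability = 2 * (r_mz - r_dz)           # genetic contribution (h^2)
    shared_environment = r_mz - heritability   # common family environment (c^2)
    unique_environment = 1 - r_mz              # unique environment plus error (e^2)

    print(heritability, shared_environment, unique_environment)   # roughly 0.6, 0.2, 0.2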
Criticisms
There have been a number of criticisms of identical twin studies over the years. By their nature, and
because of small sample sizes, it is very difficult to quantitatively analyze the results and so
all experimentation tends to be observational; the sample groups cannot be random so
statistical analysis is impossible. The experimental methods assume that there is little
difference in the environmental factors between fraternal and identical twins, but there is a
criticism that the tendency of adults to treat identical twins in exactly the same way makes
this assumption invalid. Parents tend to dress identical twins the same way and encourage
them to pursue the same interests. The distinction between environmental factors and genetic
influences may not be as black and white as the identical twins study assumes. There is
probably an interaction between genes and environment and so the whole picture may be a lot
more complex. In addition, the experiment tends to assume that one gene affects one
behavioral trait. Modern genetic research is showing that many different genes can influence
behavior.
Summary
The above criticisms all have some validity, but the main point is that twin studies have never
claimed to be anything other than observational, identifying and trying to explain trends rather
than prove a hypothesis. Whilst there are some concerns about the validity of the identical
twins study, such experiments are certainly better than performing no research at all. Twin
studies are now trying to analyze the environmental factors more closely. Instead of assuming that the
environmental factors are the same, they are now contrasting shared family environment with
the individual events experienced by each twin.
In addition, the identical twin study is constantly evolving into more complex forms, now taking
into account whole families and other siblings in addition to the twins. Research into the
human genome is now resurrecting the studies of twins; hereditary trends observed in an
identical twins study can now be studied quantitatively in the laboratory. It is now standard
practice, when conducting twin research, to analyze DNA from all participants, and this is
bypassing many of the concerns about the twin study.
C. EXPERIMENTAL DESIGNS
For an experiment to be classed as a true experimental design, it must fit all of the following
criteria.
The sample groups must be assigned randomly.
There must be a viable control group.
Only one variable can be manipulated and tested. It is possible to test more than one,
but such experiments and their statistical analysis tend to be cumbersome and difficult.
The tested subjects must be randomly assigned to either control or experimental
groups.
Advantages
The results of a true experimental design can be statistically analyzed and so there can be little
argument about the results. It is also much easier for other researchers to replicate the
experiment and validate the results. For physical sciences working with mainly numerical
data, it is much easier to manipulate one variable, so true experimental design usually gives a
yes or no answer.
Disadvantages
Whilst perfect in principle, there are a number of problems with this type of design. Firstly,
they can be almost too perfect, with the conditions being under complete control and not
being representative of real world conditions. For psychologists and behavioral biologists, for
example, there can never be any guarantee that a human or living organism will exhibit
‘normal’ behavior under experimental conditions. True experiments can be too accurate and it
is very difficult to obtain a complete rejection or acceptance of a hypothesis because the
standards of proof required are so difficult to reach. True experiments are also difficult and
expensive to set up. They can also be very impractical. While for some fields, like physics,
there are not as many variables and the design is easier, for social and biological
sciences, where variations are not so clearly defined, it is much more difficult to exclude other
factors that may be affecting the manipulated variable.
Summary
True experimental design is an integral part of science, usually acting as a final test of a
hypothesis. Whilst they can be cumbersome and expensive to set up, literature reviews,
qualitative research and descriptive research can serve as a good precursor to generate a
testable hypothesis, saving time and money. Whilst they can be a little artificial and
restrictive, they are the only type of research that is accepted by all disciplines as statistically
provable.
A double blind experiment is an experimental method used to ensure impartiality, and avoid
errors arising from bias. It is very easy for a researcher, even subconsciously, to influence
experimental observations, especially in behavioral science, so this method provides an extra
check. For example, imagine that a company is asking consumers for opinions about its
products, using a survey. There is a distinct danger that the interviewer may subconsciously
emphasize the company's products when asking the questions. This is the major reason why
market research companies generally prefer to use computers, and double blind experiments,
for gathering important data.
Other Applications
Whilst better known in medicine, double blind experiments are often used in other fields.
Surveys, questionnaires and market research all use this technique to retain credibility. If you
wish to compare two different brands of washing powder, the samples should be in the same
packaging. A consumer might have an inbuilt brand identity awareness, and preference, which
will lead to favoritism and bias. An example of the weakness of single blind techniques is in
police line-ups, where a witness picks out a suspect from a group. Many legal experts are
advocating that these line-ups should be unsupervised and unprompted. If the police are fixated
on bringing a particular suspect to justice, they may consciously or subconsciously tip off the
witness. Humans are very good at understanding body language and unconscious cues, so the
chance of observer bias should be minimized.
Literature Review
Many students are instructed, as part of their research program, to perform a literature review,
without always understanding what a literature review is. Most are aware that it is a process of
gathering information from other sources and documenting it, but few have any idea of how
to evaluate the information, or how to present it. A literature review can be a precursor in the
introduction of a research paper, or it can be an entire paper in itself, often the first stage of
large research projects, allowing the supervisor to ascertain that the student is on the correct
path. A literature review is a critical and in depth evaluation of previous research. It is a
summary and synopsis of a particular area of research, allowing anybody reading the paper to
establish why you are pursuing this particular research program. A good literature review
expands upon the reasons behind selecting a particular research question.
Whilst some literature reviews can be presented in chronological order, this is best avoided.
For example, a review of Victorian Age physics could reasonably present J.J. Thomson's famous
experiments in chronological order, but in most cases a chronological structure is perceived as
being a little lazy, and it is better to organize the review around ideas and individual points. As a general
rule, certainly for a longer review, each paragraph should address one point, and present and
evaluate all of the evidence, from all of the differing points of view.
Conducting a good literature review is a matter of experience, and even the best scientists
have fallen into the trap of using poor evidence. This is not a problem, and is part of the
scientific process; if a research program is well constructed, it will not affect the results.
Meta Analysis
Meta analysis is a statistical technique developed by social scientists, who are very limited in
the type of experiments they can perform. Social scientists have great difficulty in designing
and implementing true experiments, so meta-analysis gives them a quantitative tool to analyze
statistically data drawn from a number of studies, performed over a period of time. Medicine
and psychology increasingly use this method, as a way of avoiding time-consuming and
intricate studies, largely repeating the work of previous research.
What is Meta-Analysis?
Social studies often use very small sample sizes, so any statistics used generally give results
containing large margins of error. This can be a major problem when interpreting and drawing
conclusions, because it can mask any underlying trends or correlations. Such conclusions are
only tenuous, at best, and leave the research open for criticism.
Meta-analysis is the process of drawing from a larger body of research, and using powerful
statistical analyses on the conglomerated data. This gives a much larger sample population
and is more likely to generate meaningful and usable data.
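As an illustrative sketch only, one simple way of pooling results is a fixed-effect, inverse-variance weighted average; the per-study effect sizes and standard errors below are invented, and the code assumes Python:

    import math

    # Invented per-study results: (effect size, standard error) from several small studies.
    studies = [(0.30, 0.15), (0.45, 0.20), (0.25, 0.10), (0.50, 0.25)]

    # Fixed-effect meta-analysis: weight each study by the inverse of its variance.
    weights = [1 / se**2 for _, se in studies]
    pooled_effect = sum(w * eff for (eff, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))

    print(f"pooled effect = {pooled_effect:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")

The pooled estimate has a smaller standard error than any single study, which is exactly the gain in statistical power that meta-analysis is meant to provide.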
The Advantages of Meta-Analysis
Meta-analysis is an excellent way of reducing the complexity and breadth of research,
allowing funds to be diverted elsewhere. For rare medical conditions, it allows researchers to
collect data from further afield than would be possible for one research group. As the method
becomes more common, database programs have made the process much easier, with
professionals working in parallel able to enter their results and access the data. This allows
constant quality assessment and also reduces the chances of unnecessary repeat research, as
papers can often take many months to be published, and the computer records ensure that any
researcher is aware of the latest directions and results. The field of meta study is also a lot
more rigorous than the traditional literature review, which often relies heavily upon the
individual interpretation of the researcher. When used with the databases, a meta study allows
a much wider net to be cast than by the traditional literature review, and is excellent for
highlighting correlations and links between studies that may not be readily apparent as well as
ensuring that the compiler does not subconsciously infer correlations that do not exist.
Striking a balance can be a little tricky, but the whole field is in a state of constant
development, incorporating protocols similar to the scientific method used for normal
quantitative research. Finding the data is rapidly becoming the real key, with skilled meta-
analysts developing a set of library-based research skills: finding information buried in
government reports and conference data, and acquiring the knack of assessing the quality of
sources quickly and effectively.
Systematic reviews
Heavily used by the healthcare sector, systematic reviews are a powerful way of isolating and
critically evaluating previous research. Modern medical research generates so much literature,
and fills so many journals, that a traditional literature review could take months, and still be
out of date by the time that the research is designed and performed. In addition, researchers
are often guilty of selecting the research best fitting their pre-conceived notions, a weakness
of the traditional 'narrative' literature review process. To help medical professionals, specialist
compilers assess and condense the research, entering it into easily accessible research
databases. They are an integral part of the research process, and every student of medicine
routinely receives a long and extensive training in the best methods for critically evaluating
literature.
Funding and research grants can pressure researchers to find results that suit their paymasters,
a growing problem in many areas of science, not just medicine. The specialist reviewers
sidestep this problem, to a certain extent, by producing independent research, uncorrupted by
governmental or private healthcare funding, curbing the worst excesses. Often, a blind system
is used, and reviewers are unaware of where the papers they are reviewing came from, or by
whom they were written. This lessens allegations of favoritism and of judging research by the
reputation of the researcher rather than on merit. Ultimately, the onus is on the reader to draw
their own assessments, using their own experience to judge the quality of the systematic
review. Whilst not a perfect system, systematic reviews are far superior to the traditional
narrative approach, which often allows a lot of good research to fall through the cracks.
PILOT STUDIES
A pilot study is a standard scientific tool for 'soft' research, allowing scientists to conduct a
preliminary analysis before committing to a full-blown study or experiment. A small
chemistry experiment in a college laboratory, for example, costs very little, and mistakes or
validity problems are easily rectified. At the other end of the scale, a medical experiment taking
samples from thousands of people across the world is expensive, often running into
millions of dollars. Discovering only afterwards that there was a problem with the equipment
or with the statistics used is unacceptable, and the consequences are dire.
A field research project in the Amazon Basin costs a lot of time and money, so finding out too
late that the electronics used do not function in the humid and warm conditions would be disastrous. To test
the feasibility, equipment and methods, researchers will often use a pilot study, a small-scale
rehearsal of the larger research design. Generally, the pilot study technique specifically refers
to a smaller scale version of the experiment, although equipment tests are an increasingly
important part of this sub-group of experiments. For example, the medical researchers may
conduct a smaller survey upon a hundred people, to check that the protocols are fine. The
Amazon researchers may rehearse the experiment in similar conditions, either sending a small
team to the Amazon to test the procedures or using something like the tropical bio-dome
at the Eden Project. Pilot studies are also excellent for training inexperienced researchers,
allowing them to make mistakes without fear of losing their job or failing the assignment.
Logistical and financial estimates can be extrapolated from the pilot study, and the research
question, and the project can be streamlined to reduce wastage of resources and time. Pilots
can be an important part of attracting grants for research as the results can be placed before
the funding body. Generally, most funding bodies see research as an investment, so are not
going to dole out money unless they are certain that there is a chance of a financial return.
Unfortunately, papers reporting preliminary pilot studies are seldom published, and a pilot
study, especially one in which problems were found, is often stigmatized and sidelined. This is
unfair, and punishes researchers for being methodical, so these attitudes are undergoing a period of re-evaluation.
Discouraging researchers from reporting methodological errors, as found in pilot studies,
means that later researchers may make the same mistakes. The other major problem is
deciding whether the results from the pilot study can be included in the final results and
analysis, a procedure that varies wildly between disciplines. Pilots are rapidly becoming an
essential pre-cursor to many research projects, especially when universities are constantly
striving to reduce costs. Whilst there are weaknesses, they are extremely useful for driving
procedures in an age increasingly dominated by technology, much of it untested under field
conditions.
1. This design allows researchers to compare the final posttest results between the two
groups, giving them an idea of the overall effectiveness of the intervention or treatment. (C)
2. The researcher can see how both groups changed from pretest to posttest, whether one,
both or neither improved over time. If the control group also showed a significant
improvement, then the researcher must attempt to uncover the reasons behind this. (A and A1)
3. The researchers can compare the scores in the two pretest groups, to ensure that the
randomization process was effective. (B)
These checks evaluate the efficiency of the randomization process and also determine whether
the group given the treatment showed a significant difference.
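A minimal sketch of how these checks might be carried out in practice, assuming Python with SciPy available; the scores below are invented purely for illustration and the group labels follow the description above:

    from scipy import stats

    # Invented pretest/posttest scores for a two-group pretest-posttest design.
    treatment_pre  = [52, 48, 55, 60, 47, 51]
    treatment_post = [63, 59, 66, 71, 58, 62]
    control_pre    = [50, 49, 53, 58, 46, 52]
    control_post   = [51, 50, 55, 59, 48, 53]

    # Check 3: were the groups comparable before the intervention (randomization check)?
    print(stats.ttest_ind(treatment_pre, control_pre))

    # Check 1: do the groups differ after the intervention?
    print(stats.ttest_ind(treatment_post, control_post))

    # Check 2: did each group change from pretest to posttest?
    print(stats.ttest_rel(treatment_pre, treatment_post))
    print(stats.ttest_rel(control_pre, control_post))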
The other major problem, which afflicts many sociological and educational research
programs, is that it is impossible and unethical to isolate all of the participants completely. If
two groups of children attend the same school, it is reasonable to assume that they mix outside
of lessons and share ideas, potentially contaminating the results. On the other hand, if the
children are drawn from different schools to prevent this, the chance of selection bias arises,
because randomization is not possible. The two-group control group design is an
exceptionally useful research method, as long as its limitations are fully understood. For
extensive and particularly important research, many researchers use the Solomon four group
method, a design that is more costly, but avoids many weaknesses of the simple pretest-
posttest designs.
In this particular type of research, the experiment is double blind. Neither the doctors nor the
patients are aware of which pill they are receiving, curbing potential research bias. In the
social sciences, control groups are the most important part of the experiment, because it is
practically impossible to eliminate all of the confounding variables and bias. For example, the
placebo effect for medication is well documented, and the Hawthorne Effect is another
influence where, if people know that they are the subjects of an experiment, they
automatically change their behavior. There are two main types of control, positive and
negative, both providing researchers with ways of increasing the statistical validity of their
data.
Positive Scientific Control Groups
Positive scientific control groups are where the control group is expected to have a positive
result, and allows the researcher to show that the set-up was capable of producing results.
Generally, a researcher will use a positive control procedure, which is similar to the actual
design with a factor that is known to work. For example, a researcher testing the effect of new
antibiotics upon Petri dishes of bacteria, may use an established antibiotic that is known to
work. If all of the samples fail, except that one, it is likely that the tested antibiotics are
ineffective. However, if the control fails too, there is something wrong with the design.
Positive scientific control groups reduce the chances of false negatives.
A researcher testing the radioactivity levels of various samples with a Geiger counter would
also sample the background level, allowing them to adjust the results accordingly.
Establishing strong scientific control groups is arguably a more important part of any
scientific design than the actual samples. Failure to provide sufficient evidence of strong
control groups can completely invalidate a study, however high the significance levels and
however low the indicated probability of error.
Randomization is a sampling method used in scientific experiments. It is commonly used in
randomized controlled trials in experimental research. In medical research, randomization and
control of trials are used to test the efficacy or effectiveness of healthcare services or health
technologies such as medicines, medical devices or surgery.
What is Randomization?
So what is randomization? Suppose you have five chocolate bars and eight friends to distribute
them to. How are you going to do this so that the distribution involves a minimum of bias? You
might write each friend's name on a separate small piece of paper, fold all the pieces so that no
one knows which name is on which paper, and then ask someone to pick five names; the
chocolates go to those five. This removes the bias without hurting any of your friends'
feelings, and the way you did it is what we call randomization. In randomized controlled
trials, the research participants are assigned by chance, rather than by choice, to either the
experimental group or the control group. Randomization reduces bias as much as possible; it is
designed to "control" (reduce or, if possible, eliminate) bias. The fundamental goal of
randomization is to make certain that each treatment is equally likely to be assigned to any
given experimental unit.
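A minimal sketch of random assignment, assuming Python; the participant labels and group sizes are invented for illustration:

    import random

    participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]

    # Shuffle the list so the order is determined by chance, not by choice,
    # then split it into an experimental group and a control group.
    random.shuffle(participants)
    experimental_group = participants[:len(participants) // 2]
    control_group = participants[len(participants) // 2:]

    print("experimental:", experimental_group)
    print("control:     ", control_group)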
As an example, imagine that a school seeks to test whether introducing a healthy meal at
lunchtime improves the overall fitness of the children. It decides to do this by giving half of
the children healthy salads and wholesome meals, whilst the control group carries on as
before. At regular intervals, the researchers note the cardiovascular fitness of the children,
looking to see if it improves. The number of extraneous factors and potential confounding
variables for such a study is enormous. Age, gender, weight, what the children eat at home,
and activity level are just some of the factors that could make a difference. In addition, if the
teachers, generally a health-conscious bunch, are involved in the selection of children, they
might subconsciously pick those who are most likely to adapt to the healthier regime and
show better results. Such a pre-determined bias destroys the chance of obtaining useful
results. By using pure randomized controlled trials and allowing chance to select children into
one of the two groups, it can be assumed that any confounding variables are cancelled out, as
long as you have a large enough sample group.
Secondly, randomized experiment designs, especially when combined with crossover studies,
are extremely powerful at understanding underlying trends and causalities. However, they are
a poor choice for research where temporal factors are an issue, for which a repeated measures
design is better. Whilst randomized controlled trials are regarded as the most accurate
experimental design in the social sciences, education, medicine and psychology, they can be
extremely resource heavy, requiring very large sample groups, so are rarely used. Instead,
researchers sacrifice generalization for convenience, leaving large scale randomized
controlled trials for researchers with bigger budgets and research departments.
A between subjects design is a way of avoiding the carryover effects that can plague within
subjects designs, and they are one of the most common experiment types in some scientific
disciplines, especially psychology. The basic idea behind this type of study is that participants
can be part of the treatment group or the control group, but cannot be part of both. If more
than one treatment is tested, a completely new group is required for each.
Psychologists often use them to test the relative effectiveness of a new treatment, often a
difficult proposition. The sheer complexity of the human mind and the large number of
potential confounding variables often renders between subjects designs unreliable, especially
when necessarily small sample groups make a more random approach impossible.
Agricultural science, with a need for field-testing, often uses factorial designs to test the effect
of variables on crops. In such large-scale studies, it is difficult and impractical to isolate and
test each variable individually. Factorial experiments allow subtle manipulations of a larger
number of interdependent variables. Whilst the method has limitations, it is a useful method
for streamlining research and letting powerful statistical methods highlight any correlations.
The Basics
Imagine an aquaculture research group attempting to test the effects of food additives upon
the growth rate of trout. A traditional experiment would involve randomly selecting different
tanks of fish and feeding them varying levels of the additive contained within the feed, for
example none or 10%. However, as any fish farmer knows, the density of stocking is also
crucial to fish growth; if there are not enough fish in a tank, then the wasted capacity costs
money. If the density is too high, then the fish grow at a slower rate. Rather than the
traditional experiment, the researchers could use a factorial design and co-ordinate the
additive trial with different stocking densities, perhaps choosing four groups. The factorial
experiment then needs 4 x 2, or eight treatments. The traditional rules of the scientific method
are still in force, so statistics require that every experiment be conducted in triplicate. This
means 24 separate treatment tanks. Of course, the researchers could also test, for example, 4
levels of concentration for the additive, and this would give 4 x 4, or 16 treatments, meaning 48
tanks in total.
Each factor is an independent variable, whilst the level is the subdivision of a factor.
Assuming that we are designing an experiment with two factors, a 2 x 2 would mean two
levels for each, whereas a 2 x 4 would mean two subdivisions for one factor and four for the
other. It is possible to test more than two factors, but this becomes unwieldy very quickly. In
the fish farm example, imagine adding another factor, temperature, with four levels into the
mix. It would then be 4 x 4 x 4, or 64 runs. In triplicate, this would be 192 tanks, a huge
undertaking. There are a few other methods, such as fractional factorial designs, to reduce
this, but they are not always statistically valid. This lies firmly in the realm of advanced
statistics and is a long, complicated and arduous undertaking.
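To make the counting concrete, the sketch below enumerates the treatment combinations of a full factorial design, assuming Python; the factor names and levels are invented to echo the fish-farm example:

    from itertools import product

    # Factors and their levels for the hypothetical trout experiment.
    additive_level   = ["0%", "10%"]                              # 2 levels
    stocking_density = ["low", "medium", "high", "very high"]     # 4 levels
    replicates       = 3                                          # triplicate, as the design requires

    # Full factorial: every combination of every level of every factor.
    treatments = list(product(additive_level, stocking_density))
    print(len(treatments))                 # 2 x 4 = 8 treatments
    print(len(treatments) * replicates)    # 24 tanks in total

    for additive, density in treatments:
        print(f"additive={additive}, density={density}")

Adding a third factor is simply another argument to product(), which is exactly why the number of runs grows so quickly.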
The main disadvantage is the difficulty of experimenting with more than two factors, or many
levels. A factorial design has to be planned meticulously, as an error in one of the levels, or in
the general operationalization, will jeopardize a great amount of work. Other than these slight
drawbacks, a factorial design is a mainstay of many scientific disciplines, delivering great
results in the field.
The Solomon four group test combines the standard pretest-posttest two-group design with the
posttest-only control design. The various combinations of tested and untested groups with treatment
and control groups allows the researcher to ensure that confounding variables and extraneous
factors have not influenced the results.
In the figure, A, A1, B and C are exactly the same as in the standard two group design.
The first two groups of the Solomon four group design are designed and interpreted in exactly
the same way as in the pretest-post-test design, and provide the same checks upon
randomization.
The comparison between the posttest results of groups C and D, marked by line 'D',
allows the researcher to determine if the actual act of pretesting influenced the results. If the
difference between the posttest results of Groups C and D is different from the Groups A and
B difference, then the researcher can assume that the pretest has had some effect upon the
results.
The comparison between the Group B pretest and the Group D posttest allows the
researcher to establish if any external factors have caused a temporal distortion. For example,
it shows if anything else could have caused the results shown and is a check upon causality.
The comparison between the Group A posttest and the Group C posttest allows the
researcher to determine the effect that the pretest has had upon the treatment. If the posttest
results for these two groups differ, then the pretest has had some effect upon the treatment and
the experiment is flawed.
The comparison between the Group B posttest and the Group D posttest shows
whether the pretest itself has affected behavior, independently of the treatment. If the results
are significantly different, then the act of pretesting has influenced the overall results and is in
need of refinement.
A researcher using a Solomon four group design must have the resources and time to use four
research groups, not always possible in tightly funded research departments. Most schools and
organizations are not going to allow researchers to assign four groups randomly because it
will disrupt their normal practice. Thus, a non-random assignment of groups is essential and
this undermines the strength of the design.
Secondly, the statistics involved are extremely complex, even in the age of computers and
statistical programs. Unless the research is critical or funded by a large budget and extensive
team of researchers, most experiments are of the simpler pretest-posttest research designs. As
long as the researcher is fully aware of the issues with external validity and generalization,
they are sufficiently robust and a Solomon four group design is not needed.
This requires fewer participants and resources, and also decreases the effects of natural
variation between individuals upon the results. Repeated measures designs are commonly used
in longitudinal studies, over the long term, and in educational tests, where it is important to
ensure that variability is low. Repeated measures designs do have a couple of disadvantages,
mainly that the subjects can become better at a task over time, known as practice effects or,
conversely, they become worse through boredom and fatigue. In addition, if some of the
subjects pull out before completing the second part, this can result in a sample group too small
to have any statistical significance.
Like all repeated measures designs, this reduces the chance of variation between individuals
skewing the results and also requires a smaller group of subjects. It also reduces the chance of
practice or fatigue effects influencing the results because, presumably, it will be the same for
both groups and can be removed by statistical tests. The main weakness of a crossover study is
the possibility of carryover effects, where administration of the first condition affects the
results gained in the second; the major pitfall is when the carryover effects are asymmetrical,
if B affects A more than A affects B, for example. To illustrate, imagine medical
researchers testing the effects of two drugs upon asthma sufferers. There is a chance that the
first drug may remain in the subject's system and affect the results of the second, one of the
reasons why medical researchers usually leave a 'washout' period between treatments. In
addition, crossover studies suffer badly if there is a high dropout rate amongst participants,
which can adversely affect the validity by unbalancing the groups and reducing the statistical
validity. Despite this, crossover studies remain the most common repeated measures design,
due to the ease and practicality.
Three Conditions
If you have three conditions, the process is exactly the same and you would divide the
subjects into 6 groups, treated as orders ABC, ACB, BAC, BCA, CAB and CBA.
Four Conditions
The problem with complete counterbalancing is that for complex experiments, with multiple
conditions, the permutations quickly multiply and the research project becomes extremely
unwieldy. For example, four possible conditions requires 24 orders of treatment (4x3x2x1),
and the number of participants must be a multiple of 24, because you need an equal number in
each group.
More Than Four Conditions
With 5 conditions you need multiples of 120 (5x4x3x2x1), with 7 you need 5040! Therefore,
for all but the largest research projects with huge budgets, this is impractical and a
compromise is needed.
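A quick sketch of how these order counts arise, assuming Python; the condition labels are arbitrary:

    from itertools import permutations

    conditions = ["A", "B", "C", "D"]

    # Complete counterbalancing needs every possible order of the conditions.
    orders = list(permutations(conditions))
    print(len(orders))   # 4 x 3 x 2 x 1 = 24 orders, so participants must be a multiple of 24

    # With five conditions the count jumps to 120, and with seven to 5040.
    print(len(list(permutations(range(5)))), len(list(permutations(range(7)))))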
With Latin Squares, a five-condition research program would look like this:
Position 1 Position 2 Position 3 Position 4 Position 5
Order 1 A B C D E
Order 2 B C D E A
Order 3 C D E A B
Order 4 D E A B C
Order 5 E A B C D
The Latin Square design has its uses and is a good compromise for many research projects.
However, it still suffers from the same weakness as the standard repeated measures design in
that carryover effects are a problem. In the Latin Square, A always precedes B, and this means
that anything in condition A that potentially affects B will affect all but one of the orders. In
addition, A always follows E, and these interrelations can jeopardize the validity of the
experiment. The way around this is to use a balanced Latin Square, which is slightly more
complicated but ensures that the risk of carryover effects is much lower. For experiments with
an even number of conditions, the first row of the Latin Square will follow the formula 1, 2, n,
3, n-1, 4, n-2…, where n is the number of conditions. For subsequent rows, you add one to the
previous, returning to 1 after n. This sounds complicated, so it is much easier to look at an
example for a six condition experiment. The subject groups are labeled A to F, the columns
represent the conditions tested, and the rows represent the subject groups:
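A: 1 2 6 3 5 4
B: 2 3 1 4 6 5
C: 3 4 2 5 1 6
D: 4 5 3 6 2 1
E: 5 6 4 1 3 2
F: 6 1 5 2 4 3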
As you can see, this ensures that every single condition follows every other condition once,
allowing the researchers to pick out any carryover effects during the statistical analysis. When
an experiment with an odd number of conditions is designed, the process is slightly more
complex and two Latin Squares are needed to avoid carryover effects. The first is created in
exactly the same way and the second is a mirror image:
1 2 5 3 4
2 3 1 4 5
3 4 2 5 1
4 5 3 1 2
5 1 4 2 3
followed by its mirror image:
4 3 5 2 1
5 4 1 3 2
1 5 2 4 3
2 1 3 5 4
3 2 4 1 5
With this design, every single condition follows every other condition exactly twice, and statistical tests allow
researchers to analyse the data. This balanced Latin Square is a commonly used instrument to
perform large repeated measured designs and is an excellent compromise between
maintaining validity and practicality. There are other variations of counterbalanced measures
designs, but these variations are by far the most common.
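For completeness, a small generator for balanced Latin Squares, assuming Python; it follows the 1, 2, n, 3, n-1, ... rule described above and, for an odd number of conditions, appends the mirror-image square (the function name is mine, not a standard library call):

    def balanced_latin_square(n):
        """Rows of a balanced Latin Square for conditions numbered 1..n."""
        # First row follows the pattern 1, 2, n, 3, n-1, 4, n-2, ...
        row = [1]
        low, high = 2, n
        take_low = True
        while len(row) < n:
            row.append(low if take_low else high)
            if take_low:
                low += 1
            else:
                high -= 1
            take_low = not take_low

        # Each later row adds one to the previous row, wrapping back to 1 after n.
        rows = [row]
        for _ in range(n - 1):
            rows.append([(c % n) + 1 for c in rows[-1]])

        # For an odd number of conditions, the mirror-image square is also needed.
        if n % 2 == 1:
            rows += [list(reversed(r)) for r in rows[:n]]
        return rows

    for r in balanced_latin_square(5):
        print(" ".join(map(str, r)))   # reproduces the two five-condition squares shown above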
For example, a study to compare two methods for teaching reading might use a matched
subjects research program. The researchers want to compare two methods, the current method
and a modern method. They select two groups of children and match pairs of children across
the two groups according to ability, using the results of their last reading comprehension test.
If the researchers wanted to test another method, they would have to find three comparable
children to compare between the three groups.
It is also possible to match for more than one variable. For example, a study to test whether a
daily exercise routine improved the cardio-vascular health in the inhabitants of a nursing
home could match subjects for age and gender. It may also be possible to match smokers and
ex-smokers. Obviously, given the complexity of humans and the sheer number of factors that
can influence behavior, matching for every factor is exceptionally difficult without huge
groups, and it makes the project unnecessarily complex, especially if you are testing multiple
treatments.
Bayesian Probability
Bayesian probability is the process of using probability to try to predict the likelihood of
certain events occurring in the future. Unlike traditional probability, which uses a frequency
to try to estimate probability, Bayesian probability is generally expressed as a percentage. In
its most basic form, it is the measure of confidence, or belief, that a person holds in a
proposition. Using Bayesian probability allows a researcher to judge the amount of
confidence that they have in a particular result. Frequency probability, via the traditional null
hypothesis, restricts the researcher to yes and no answers. Bayesian methods are becoming
another tool for assessing the viability of a research hypothesis. To use Bayesian probability,
a researcher starts with a set of initial beliefs, and tries to adjust them, usually through
experimentation and research. The original set of beliefs is then altered to accommodate the
new information. This process sacrifices a little objectivity for flexibility, helping researchers
to circumvent the need for a tortuous research design.
A drug company does not simply want to know whether a drug works or not, but to assess
whether it works better than existing treatments, giving a baseline for comparison. Drug
companies often ‘tinker’ with the molecular structure of drugs, and do not want to design a new
research program each time. The researchers will constantly reassess their Bayesian probability, or degree of belief,
allowing them to concentrate upon promising drugs and cutting short failing treatments. This
reduces the risk to patients, the timescale and the expense.
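As a minimal sketch of this kind of updating (a Beta-Binomial model chosen purely for illustration, assuming Python; the prior and trial counts are invented), the researchers' degree of belief about a drug's response rate is revised as each batch of trial results arrives:

    # Beta-Binomial updating: the Beta(a, b) distribution expresses the current degree of
    # belief about a drug's response rate; each batch of trial results updates it.
    a, b = 2.0, 2.0            # weak initial belief centred on a 50% response rate (invented)

    trial_batches = [(8, 12), (15, 10), (30, 15)]   # (responders, non-responders) per batch

    for responders, non_responders in trial_batches:
        a += responders        # Bayes' rule for this conjugate model reduces to simple counting
        b += non_responders
        mean_belief = a / (a + b)
        print(f"updated belief in response rate: {mean_belief:.2f}")

If the updated belief stays low after a few batches, the treatment can be cut short; if it climbs, the drug earns further investment.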
Any regular computer user makes use of Bayesian probability. Spam filters on e-mail accounts
use Bayes' theorem, and do a pretty good job. Whilst they do not
intercept every single spam e-mail, and may wrongly assign legitimate messages to the trash
folder, they are certainly better than having hundreds of junk messages waiting in the inbox
every time the account is opened. Every time the program makes an incorrect assumption,
which is flagged by the recipient, the new information feeds back into the model and
facilitates a more accurate answer the next time. This summarizes Bayesian probability very
well - it is an extremely useful tool, more often right than wrong, but it is only ever a guide.
Many areas of science are adapting to this reworking of an old theory, and it promises to fit
alongside the traditional methods very well.