GTI Manual Final 25june
Table of Contents
Introduction
The DNA Barcoding Concept
Standard DNA Barcode Markers
Potential Utility and Limitations of DNA Barcoding
DNA Barcode Data Repositories
Standard DNA Barcoding Workflows: General Overview
Chapter 1. Front-End Processing
Natural History Collections and Lab Work – An Integrative Approach
Collection Data Management: Practical Approaches
Specimen Imaging: Basic Principles
Tissue Sampling: Basic Principles
Chapter 2. Molecular Analysis
Set-up of Molecular Laboratory
DNA Extraction
Polymerase Chain Reaction (PCR)
Cycle Sequencing
Chapter 3. Informatics and Data Analysis
Sequence Editing
Quality Control
Scaling up to Medium/High-throughput Processing
BOLD Analytics
ANNEX I: Scaling up DNA Barcoding Workflows
ANNEX II: Collecting Ontologies
ANNEX III: Electronic Field Journal
ANNEX IV: Specimen Imaging
ANNEX V: Tissue Sampling
ANNEX VI: Medium/High-throughput Lab
ANNEX VII: Reagents for DNA barcoding
ANNEX VIII: Barcoding Protocols (96-well microplates)
DNA Barcoding – Animals
DNA Barcoding – Plants, fungi
GTI Training Manual – Standardized Workflows in DNA barcoding
Introduction
In the last decade, DNA barcoding gained visibility beyond the expert community as a tool for
sharing biodiversity knowledge – by making species identification tools applicable to anyone
interested in biodiversity. Through simple and routine diagnostic procedures, DNA barcoding
enables taxonomists to describe biological diversity more efficiently and acknowledges their
contribution towards building the reference library. At the same time, the wide user community
gains unrestricted access to the library to perform DNA-based identifications. Thus, in addition to
aiding taxonomists in performing their specialized tasks, it makes their knowledge more
available to the wide range of users who may lack taxonomic expertise or years of
training in taxonomy.
DNA barcoding should not be considered as the equivalent of molecular systematics, which is an
area of taxonomy that infers relationships between species and higher taxonomic categories from
molecular phylogenetic analyses, often using multiple genetic markers. It is also different from
DNA taxonomy – an approach which proposes to use DNA as the sole basis for all taxonomic
reconstructions, with other character sets (e.g., morphology) being ancillary. The two approaches
(molecular systematics and DNA taxonomy) advocate for using larger volumes of genetic
information, to add robustness to phylogenetic reconstructions, with different marker selection,
depending on the taxonomic group studied and the goals of analysis. By contrast, “classical” DNA
barcoding argues for:
a) standardization – using the exact same gene region(s) across large taxonomic entities to
ensure adequate comparisons; and
b) minimalism – using the minimal amount of genetic data necessary to provide reliable
identification.
These requirements underscore the heavy focus on diagnostics, while maintaining a relatively
agnostic position regarding taxonomic arrangements or species concepts, which are the subjects
of other fields of study.
1 Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003). Biological identifications through DNA barcodes.
Proceedings of the Royal Society B, 270: 313–321.
Figure 1. Mitochondrial cytochrome c oxidase subunit I (COI) – standard DNA barcode marker for animals.
(Diagram labels: animal cell, mitochondrion, mtDNA.)
2 Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994). DNA primers for amplification of mitochondrial
cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and
Biotechnology 3(5): 294–299.
Figure 2. Plastid rbcL and matK genes – standard markers used in the two-tier DNA barcoding approach
for plants. (Diagram labels: plant cell, plastid.)
3 Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, et al. (2009). A DNA barcode
for land plants. Proceedings of the National Academy of Sciences 106: 12794–12797.
and may also be inadequately variable. Moreover, some fungal clades, such as
Neocallimastigomycota, lack mitochondria.
After a thorough search for a useful barcode marker, the internal transcribed spacer (ITS) (Figure
3) – a non-coding region of the ribosomal cistron – was proposed as the standard barcode for fungi4.
This was mainly because it offers the highest probability of successful identification for
the broadest range of fungi, providing a clear barcode gap between interspecific and intraspecific
divergence. However, it should also be noted that COI is still preferred as a barcode in some
fungal genera (e.g., Penicillium).
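The idea of a barcode gap can be illustrated with a small calculation. The sketch below is only an illustration, not a procedure prescribed by this manual: it assumes the sequences are already aligned to the same length, uses the uncorrected p-distance (barcoding studies often use other distance models), and the species labels and sequences are made-up toy data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: share of differing sites, ignoring gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip sites that cannot be compared
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def barcode_gap(records):
    """records: list of (species_label, aligned_sequence) tuples.

    Returns (max intraspecific distance, min interspecific distance, gap).
    A positive gap means the two sets of distances do not overlap.
    """
    intra, inter = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(seq1, seq2))
    if not intra or not inter:
        raise ValueError("need both within-species and between-species pairs")
    return max(intra), min(inter), min(inter) - max(intra)


# Hypothetical toy alignment (10 sites), not real barcode data.
toy_records = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTTC"),
    ("species_B", "ACGAACCTTC"),
]
max_intra, min_inter, gap = barcode_gap(toy_records)
print(f"max intraspecific: {max_intra:.2f}, "
      f"min interspecific: {min_inter:.2f}, gap: {gap:.2f}")
```

A marker with a clear gap yields a minimum interspecific distance well above the maximum intraspecific distance; when the two overlap, identification by distance alone becomes ambiguous.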
Figure 3. Nuclear internal transcribed spacer (ITS) region – standard DNA barcode marker for most fungi.
(Diagram labels: fungal cell, nucleus.)
Other kingdoms
Currently, only animals, plants and fungi have standard barcode markers accepted by the DNA
barcoding community. Protists, such as seaweeds and diatoms, have been investigated for DNA
barcoding on a small scale. Nevertheless, the commonly used primary markers
for macroalgae and diatoms5 are worth mentioning, as these taxa represent important components
of marine ecosystems:
4 Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Bergeron MJ, Hamelin
RC, Vialle A, and Fungal Barcoding Consortium (2012). Nuclear ribosomal internal transcribed spacer (ITS)
region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences
109: 6241–6246.
5 Saunders GW, McDevit DC (2012). Methods for DNA Barcoding Photosynthetic Protists Emphasizing
the Macroalgae and Diatoms, in Methods in Molecular Biology 858: 207–222.
In addition to these primary markers, secondary markers can be used for better resolution and
are normally represented by LSU D2/D3 (divergent domains D2/D3 of the nuclear ribosomal large
subunit DNA)5.
6 Sensu lato - “in the broad sense” (from Latin).
7 EPPO Standard PM7/129 (1) (2016). DNA barcoding as an identification tool for a number of regulated
pests.
8 Phytoplasmas - bacterial parasites of plant phloem tissue transmitted from plant-to-plant by insect vectors.
9 Sensu stricto - “in the strict sense” (from Latin).
• Agriculture and forestry – identifying and monitoring agriculture and forestry pests and
biological control agents;
• Human health – identifying and monitoring human disease vectors and reservoirs and
reconstructing disease transmission pathways, assessment and monitoring of naturally
borne disease foci;
• Invasive alien species – identifying and monitoring alien (non-native) species that are
negatively impacting on ecosystems, habitats and native species, improving early
detection and regulatory measures to prevent cross-border transfer of the unwanted alien
species;
• Endangered species – enhancing taxonomic and ecological knowledge about endangered
species and creating a diagnostic framework for monitoring and preventing illegal harvest
and trade by improving detection of illegal specimens at all levels;
• Environmental surveillance/monitoring – helping extractive industries (oil, gas, mining), the
conservation sector (protected areas), and the natural resources (forestry, fisheries) and
agriculture sectors to meet their environmental goals and to evaluate the efficiency of
management, restoration and mitigation measures;
• Market surveillance, product ingredient authentication, detection of food contamination
and substitution (e.g., seafood, meat and natural products).
Importantly, these applications go beyond academic science, making DNA barcoding a useful tool
in the portfolio of practitioners working in areas related to biodiversity. This broad utility of DNA
barcoding was recognized at the 13th Conference of the Parties to the Convention on Biological
Diversity, which, in one of its decisions, invited Parties “to support the development, with the
assistance, as appropriate, of the international barcode of life network, of DNA sequence-based
technology (DNA barcoding) and associated DNA barcode reference libraries for priority
taxonomic groups of organisms…” as a vehicle for global capacity building supporting the overall
goals of the Convention and its Global Taxonomy Initiative (Decision XIII/31, 13 December,
2016)11.
10 UNEP/CBD/SBSTTA/18/INF/20, 13 June, 2014; https://www.cbd.int/doc/meetings/sbstta/sbstta-18/information/sbstta-18-inf-20-en.pdf
11 https://www.cbd.int/doc/decisions/cop-13/cop-13-dec-31-en.pdf
to understand the limitations of the DNA barcoding approach in its strict interpretation (sensu
stricto). The limitations of DNA barcoding could be divided into three groups.
• Genetic limitations are inherent to the characteristics of the markers used – COI, matK
and rbcL are organellar markers, therefore predominantly maternally inherited:
o DNA barcodes cannot resolve cases of mitochondrial or plastid introgression,
including on-going or past hybridization;
o Rare cases of heteroplasmy may lead to inaccurate sequence reads, confusing
the analytical outcome;
o The genes used may not provide enough resolution to discern recently diverged
species, including species with traces of past introgression with other, closely
related species;
o In some cases, COI is known to be present as non-functional nuclear copies known
as nuclear-mitochondrial pseudogenes (NUMTs) – these copies may be mistaken
for mitochondrial COI.
Figure 4. BOLD home page. Links to important public tools and the workbench are on the top right corner.
12 http://www.insdc.org/
13 http://www.ddbj.nig.ac.jp/
14 https://www.ebi.ac.uk/ena
15 https://www.ncbi.nlm.nih.gov/genbank/
16 https://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank
17 http://boldsystems.org
18 Ratnasingham S, Hebert PDN (2007). BOLD: The Barcode of Life Data System (www.barcodinglife.org).
Molecular Ecology Notes 7, 355-364.
Figure 5. Three major components of the workflow are outlined in colour: Green – front-end; Blue –
molecular analyses; Brown – informatics (including data submission, curation and analysis). Specimen
collection data and photographs are submitted to BOLD. After sequencing, trace files and edited sequences
are submitted to BOLD as well.
Each component has subcomponents that need to be covered during DNA barcoding:
1. Front-end processing
a. Collection management
b. Collection data management
c. Specimen imaging
d. Tissue sampling
2. Laboratory analysis
a. DNA extraction
b. PCR amplification and gel check
c. Sequencing
3. Sequence management and analysis
a. Sequence editing
b. Sequence submission to BOLD/ GenBank
c. Sequence analysis and publishing
Baseline (routine) barcoding involves low, medium and high-throughput workflows aimed at
populating the DNA barcode library. The subset of the library consisting of barcodes linked to
physical specimens (vouchers) deposited in public institutions and identified to the species level
by experts (taxonomists) should be considered as the reference barcode library, the baseline
used for DNA sequence-based identification of unknown sequences. This library subset can be
built by using specimens from existing natural history collections or freshly sourced specimens
identified by taxonomists. The remaining library subset is usually generated based on freshly
collected specimens with provisional identification (usually to higher taxonomic levels) that is
updated at a later stage (based on matches between the new sequences and the existing
reference library). This approach is particularly useful for large batches of specimens representing
taxa that are difficult to identify (i.e., where the effort required from a taxonomic expert to
identify each organism costs more than the molecular analysis of a sample). In this
case, only a subset of the entire series of specimens with identical DNA barcodes could be
selected for non-molecular (e.g., morphological) identification to ensure the accuracy of the
results derived from DNA barcoding. Thus, the workflow can also be used in a more applied
context, when certain stages could be omitted (e.g., morphological identification of each specimen
prior to molecular processing).
Forensic barcoding
Metabarcoding
• Direct collection of multiple unsorted individuals, often used for smaller organisms (e.g.,
sweep-netting, Malaise-trapping, pitfalls, planktonic or benthic sampling);
• Indirect collection of tissue samples, usually from larger organisms (e.g., from fecal
samples, or hair-snagging traps);
• Indirect inference of the occurrence of certain taxa in an area by collection and analysis
of water, air, or substrate samples containing their environmental DNA (eDNA).
In the first case (multiple unsorted individuals), sequence information is derived from a bulk
sample containing a multitude of taxonomically diverse organisms that are not individually sorted
or vouchered; in the latter two, there are typically no morphologically discernable specimens in
the bulk sample. In either case, the resulting sequences are typically checked against reference
libraries to identify known taxa, while the remaining sequences are grouped into operational
taxonomic units to provide an estimate of taxonomic diversity within the bulk sample.
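The two-step logic described above (match against a reference library first, then group what is left into operational taxonomic units) can be sketched as follows. This is a simplified illustration under stated assumptions, not the method used by any particular metabarcoding pipeline: sequences are assumed to be aligned and of equal length, the distance threshold is an arbitrary illustrative value, the greedy centroid clustering is a stand-in for dedicated clustering tools, and all names and sequences are hypothetical.

```python
def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def assign_reads(reads, reference, threshold):
    """Identify reads against a reference library, then cluster the rest.

    reads:     dict of read_id -> sequence
    reference: dict of taxon name -> reference barcode sequence
    threshold: maximum p-distance for a match (illustrative value only)
    Returns (identified, otus), where identified maps read_id -> taxon and
    otus is a list of {"centroid": sequence, "members": [read_id, ...]}.
    """
    identified = {}
    otus = []
    for read_id, seq in reads.items():
        # Step 1: look for the closest reference barcode.
        hits = [(p_distance(seq, ref), taxon) for taxon, ref in reference.items()]
        if hits:
            best_distance, best_taxon = min(hits)
            if best_distance <= threshold:
                identified[read_id] = best_taxon
                continue
        # Step 2: unmatched reads join the first OTU whose centroid is close
        # enough; otherwise they found a new OTU.
        for otu in otus:
            if p_distance(seq, otu["centroid"]) <= threshold:
                otu["members"].append(read_id)
                break
        else:
            otus.append({"centroid": seq, "members": [read_id]})
    return identified, otus


# Hypothetical toy data (20 sites per sequence).
reference = {"Taxon X": "ACGTACGTACGTACGTACGT"}
reads = {
    "r1": "ACGTACGTACGTACGTACGA",  # 1 difference from Taxon X
    "r2": "TTTTACGTACGTACGTCCCC",  # no close reference
    "r3": "TTTTACGTACGTACGTCCCA",  # 1 difference from r2
    "r4": "GGGGGGGGGGGGGGGGGGGG",  # unlike everything else
}
identified, otus = assign_reads(reads, reference, threshold=0.08)
print(identified)
print([otu["members"] for otu in otus])
```

The number of OTUs gives a rough estimate of how many unidentified taxa the bulk sample contains; the estimate depends on the threshold and on the order in which reads are processed.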
Although individual specimens are not tracked in metabarcoding workflows, it is just as important
to keep a record of the provenance of environmental samples and to track their analytical history.
For information on scaling up the DNA Barcoding Workflows, see Annex I.
• Sourcing – the process of collecting biological materials in the natural environment; note
that each DNA barcode sequence must link to its source specimen representing a single
biological individual (or clone, for modular organisms).
• Vouchering involves the processing stages necessary to turn collected organisms into
properly curated vouchers deposited in a recognized collection and available for
[re]examination by experts:
o Provenance data digitization – entry of information about the origin of biological
materials in a spreadsheet or database;
o Labeling – affixing labels with unique alpha-numeric identifiers to collection objects
and their storage containers;
o Taxonomic identification to the lowest level allowable by using non-molecular (e.g.,
morphological) characteristics.
• Imaging is used to generate an ‘e-voucher’, in case the physical voucher is lost or
consumptively analyzed:
o Generating digital images of each voucher specimen;
o Upload of images to the reference database (e.g., BOLD Systems).
• Tissue sampling – the process of isolating organismal parts destined for molecular analysis
and arranging them in a lab-compatible format.
• Molecular analysis – laboratory stages required to analyze the sample(s):
o DNA extraction, including tissue lysis, DNA isolation and purification;
Collecting Activities
Collecting Effort (expedition)
Fieldwork generally involves careful planning and organization; and its implementation requires
allocating administrative (e.g., permits and authorization), physical (e.g., equipment, travel
logistics) and financial resources. Thus, a collecting effort could be defined as a broad activity
that typically has an overarching goal (e.g., surveying certain organisms within a certain territory
over a certain period) and is part of an institutional or collaborative project or programme. A field
collecting expedition to a particular geographic location would be a typical example of a collecting
effort; another example is a research program that involves recurring collecting activities in a
certain area.
Collecting Event
In order to accomplish the broad goals of a collecting effort, targeted sampling activities are
required, designed to survey or collect functional or taxonomic groups of organisms in a particular
locality over a relatively short time span. Such activities would involve a particular trapping method
and would typically result in collecting the target organisms, often as a bulk sample. They may be
localized in space and time, i.e., be conducted on a specific date in a spot with precise coordinates
or span several days (e.g., stationary trapping using a Malaise trap) or a range of localities (e.g.,
a transect survey). It would involve collectors (people executing the collecting) and a particular
sampling method. For example, a yellow pan trap and a pitfall deployed in the same location on
the same date would represent two collecting events; however, a line of several pitfalls placed in
proximity to each other and sampled on the same date may represent the same collecting event.
It is necessary to separate the materials sourced19 from different collecting events into different
lots (see below).
Lot
Also known as bulk environmental sample, the lot (Figure 6) represents an aggregation of multiple
unaccounted individuals derived from the same collecting event and aggregated in a single
container. These individuals may be from one or more taxa. Lots usually result from bulk collecting
events, such as sweep netting, plankton netting, pitfall traps, Malaise traps, etc. For the purpose
of logistical convenience, lot contents may be further sorted by taxonomy or other characteristics,
thereby breaking up into two or more ‘sub-lots’.
Because the specimens within a lot, as a rule, cannot be individually discerned, it is imperative
that the contents of lots originating from different collecting events are not mixed together. It is
also critical that any organisms, which received any specific treatment (e.g., retrieval of any
individual information, such as measurements, or removal of parts, such as tissue samples), are
isolated from the lots and assigned individually recognizable coding as voucher specimens (see
below).
19 Note that a collecting event circumscribes the activity undertaken in order to collect biological materials,
but does not necessarily result in collecting success. In case of a negative result, recording the collecting
event can still yield ecologically valuable data.
20 Prior to field work, it is preferred to agree on a clear and non-duplicating alpha-numerical coding system
to be used for collected biological materials. Pre-printing labels with these codes and affixing them to
collection objects can help dramatically reduce error in the field.
Figure 6. Examples of collection lots. Left to right: storage archive of jars with marine plankton samples
held at El Colegio de la Frontera Sur, Chetumal, Mexico; collection bottle with insects being removed from
a Malaise trap; and contents removed from the Malaise trap collection bottle – a slurry of small and tiny
insects. Note that none of the specimens are individually labeled or tracked.
Specimen
Also known as ‘collection voucher’, the specimen (Figure 7) represents a single biological
individual 21. In modular organisms (such as plants) parts of the same clone stored in different
repositories may be treated as different specimens, but typically a specimen would encompass
all parts preserved from an individual organism in different forms of preparation (e.g., dry mount,
skeletal elements, whole carcass, slide preparations, tissue samples, etc.). Specimens may be
collected individually in the field or removed from bulk samples (lots). In the latter case, it is
important to retain the virtual link to the lot from which the specimen was isolated; and it is always
critical to track its association with the corresponding collecting event.
Figure 7. Examples of collection voucher specimens. Left to right: vertebrate specimen (bird study skin),
entomological specimen (pinned beetle), plant specimen (herbarium voucher). Note that each specimen
has a label affixed to it; the labels have individual, globally unique voucher numbers that allow tracking each
specimen through the analytical process and linking it with the corresponding data records.
21 While specimens are physical entities (biological individuals), species are operational units used to group
specimens based on a set of criteria.
Tissue Sample
In the context of standard DNA barcoding approaches, the term ‘tissue sample’, or simply
‘sample’ refers to a portion of a specimen (usually a piece of DNA-rich tissue) isolated for
molecular analysis. When dealing with microscopic organisms, whole individuals may be
consumptively analyzed; in this case, no tissue sample is isolated, although exoskeletal remains
may be salvaged after analysis and preserved as collection vouchers. Finally, in larger organisms
(e.g., most vertebrates), one or more tissue samples from a single individual may be stored in a
genetic resources collection; and only a portion of the tissue volume may be used for a particular
molecular analysis. In this case, a piece of the archived sample is taken; this procedure is referred
to as ‘subsampling’.
It should be noted that, depending on the nature of analysis, the sample may contain DNA from
other organisms, e.g., parasites, pathogens or contaminants. It should also be noted that the term
‘environmental sample’ or ‘bulk sample’ is often used in the context of metabarcoding analysis
(screening of multi-species assemblages using next-generation sequencing approaches). Such a
bulk sample would derive from a bulk collecting event and would be equivalent to a lot or its
genetically representative subset.
Accession
The term ‘accessioning’ refers to the procedure of registering a collection in a processing facility
or repository. In this context, the term ‘accession’ is used to denote a field collection registered in
a museum or repository as a single entity (usually, multiple objects derived from a single collecting
effort or sourced from a donor). Thus, it would typically contain many lots and/or specimens. Such
an aggregate (aka Accession) is usually assigned an accession registrar number. Once
accessioned in a collection or processing facility, collection objects undergo sorting, preparation
and are catalogued in a collection registrar and/or database. In order to facilitate processing
logistics and associated audit trail, such objects may be further categorized or aggregated during
different stages of processing and archival.
Processing Array
Collection processing is typically done in batches; therefore, it is often logistically convenient to
aggregate catalogued storage units in a format that facilitates tracking and speeds up the process.
In the context of medium/high-throughput DNA barcoding, the most operationally critical phase is
the transfer of a batch of specimens into a lab-compatible array of 95 tissue samples. The
assembly of compatible processing arrays allows boosting operational efficiency, while reducing
the likelihood of human error. Unlike a lot, the contents of an array are individually accounted for
and their position is tracked. Arrays of collection specimens may be disassembled after
tissue sampling or reordered for a different type of analysis; however, arrays may also be used to
aggregate specimens for archival storage.
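A processing array of this kind can be represented as a simple mapping from well position to sample identifier. The sketch below is one possible layout, offered under several assumptions: the plate has the standard 96 wells (rows A to H, columns 1 to 12), wells are filled row by row, and the one well not used for the 95 samples (H12 here) is left empty, for instance for a control. The fill order, the choice of empty well and the identifier formats are all hypothetical.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A to H
COLUMNS = range(1, 13)             # columns 1 to 12
WELLS = [f"{row}{col:02d}" for row in ROWS for col in COLUMNS]  # A01 ... H12


def build_array(sample_ids, plate_id, empty_well="H12"):
    """Assign up to 95 unique sample IDs to wells, row by row.

    empty_well is kept free (an assumption of this sketch).
    Returns {"plate_id": ..., "wells": {well: sample_id or None}}.
    """
    if len(sample_ids) > len(WELLS) - 1:
        raise ValueError("an array holds at most 95 samples")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample identifiers must be unique")
    wells = dict.fromkeys(WELLS)  # every well starts empty (None)
    usable = [well for well in WELLS if well != empty_well]
    for well, sample_id in zip(usable, sample_ids):
        wells[well] = sample_id
    return {"plate_id": plate_id, "wells": wells}


# Hypothetical identifiers for 95 tissue samples.
samples = [f"SPEC-{i:04d}" for i in range(1, 96)]
plate = build_array(samples, plate_id="PLATE-001")
print(plate["wells"]["A01"], plate["wells"]["H11"], plate["wells"]["H12"])
```

Because every well is recorded explicitly, including the empty one, the position of each sample can be tracked and checked at any later stage.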
More information on Collecting Ontologies can be found in Annex II.
• Materials collected in the field may represent either isolated biological individuals or bulk
samples containing multiple individuals. For the purpose of molecular analysis, it is critical
to discern these categories. The following standard terms are used:
o Lot – batch of unarrayed (usually uncatalogued) specimens derived from the same
bulk sample (e.g., pitfall, Malaise trap, plankton tow, etc.) and stored together in the
same jar, vial or other storage container.
o Specimen – main (elementary) storage unit, corresponding to an individual biological
object (whole or partial voucher and/or tissue sample).
Note: One DNA barcode record should always refer to a single biological individual;
therefore, the corresponding data records pertaining to the sequence and the specimen
itself have to be unambiguously linked.
• All collection objects (lots, specimens and any derived tissue samples) and their
relationships need to be clearly traceable using a system of unique identifiers, database
records, labels, and/or storage locators that would facilitate unambiguous correspondence
between each biological individual and its sequence information.
• Provenance information (details of the geographic origin and collecting circumstances of
biological materials) should be recorded, ideally, in digital (e.g., database record) AND
analog (e.g., specimen label) format.
• Specimens/tissue samples destined for archival or molecular analysis need to be arranged
in a way that prevents mix-ups and facilitates routine processing of large volumes of
materials (at least, hundreds of samples per week).
• Biological materials destined for molecular analysis need to be collected and preserved in
a DNA-friendly fashion:
o They should be preserved as soon as possible, either by cryopreservation, drying
(desiccation), or by fixation using concentrated (~95%) ethanol or specialized DNA
preservation solutions (e.g., RNAlater);
o They should be stored in a manner that precludes possible DNA degradation from
light, hydrolysis, acidity/alkalinity, high temperatures, etc.
All the information mentioned above needs to be stored in a digital format. This can be done in a
database (e.g., MS Access) or, for small amounts of data, in spreadsheets. An example
of a custom spreadsheet (Electronic Field Journal), with instructions for filling it in, is presented in
Annex III.
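As a rough illustration of how such records and their links might be kept consistent, the sketch below models collecting events, specimens and tissue samples as simple records with unique identifiers, checks that every link resolves, and exports a table to CSV for use in a spreadsheet. The field names and identifier formats are hypothetical; they do not reproduce the Electronic Field Journal or any database schema.

```python
import csv
import io
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectingEvent:
    event_id: str
    date: str
    latitude: float
    longitude: float
    method: str
    collectors: str


@dataclass(frozen=True)
class Specimen:
    specimen_id: str
    event_id: str          # link to the collecting event
    lot_id: Optional[str]  # link to the lot, if isolated from one
    identification: str


@dataclass(frozen=True)
class TissueSample:
    sample_id: str
    specimen_id: str       # link to the source specimen


def check_links(events, specimens, samples):
    """Return a list of problems: duplicate identifiers or broken links."""
    problems = []
    event_ids = [e.event_id for e in events]
    specimen_ids = [s.specimen_id for s in specimens]
    sample_ids = [t.sample_id for t in samples]
    for label, ids in (("event", event_ids), ("specimen", specimen_ids),
                       ("sample", sample_ids)):
        for dup in sorted({i for i in ids if ids.count(i) > 1}):
            problems.append(f"duplicate {label} id: {dup}")
    for s in specimens:
        if s.event_id not in event_ids:
            problems.append(
                f"specimen {s.specimen_id} refers to unknown event {s.event_id}")
    for t in samples:
        if t.specimen_id not in specimen_ids:
            problems.append(
                f"sample {t.sample_id} refers to unknown specimen {t.specimen_id}")
    return problems


def to_csv(records):
    """Serialize a non-empty list of records of one type to CSV text."""
    rows = [asdict(r) for r in records]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical example records.
events = [CollectingEvent("EVT-001", "2018-06-25", 18.5, -88.3,
                          "Malaise trap", "A. Collector")]
specimens = [Specimen("SPEC-0001", "EVT-001", "LOT-001", "Diptera"),
             Specimen("SPEC-0002", "EVT-999", None, "Coleoptera")]
samples = [TissueSample("TIS-0001", "SPEC-0001")]

print(check_links(events, specimens, samples))
print(to_csv(specimens))
```

Running a check of this kind before materials go to the lab catches broken links (here, a specimen pointing to an unregistered collecting event) while they are still easy to correct.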
22 http://www.morphbank.net/
several mm in length. When working with large vouchers (e.g., vertebrate or herbarium
specimens) wide angle lenses can be used.
Figure 8. Examples of a micro photograph (A) and macro photograph (B). Left: image of a slide-mounted
flea taken using a digital camera mounted onto a dissecting microscope. Right: photograph of a fluid-
preserved frog specimen taken using an SLR camera with a macro lens.
Micro photography is warranted for specimens of several mm in length or less, particularly if they
are submerged in fluid (e.g., ethanol). Working with such specimens is usually more time-
consuming. Microscopes are typically stationary and often more expensive, compared to SLR-
type cameras; many of them are readily compatible only with same-brand cameras which require
proprietary software to capture and process images. If the specimens are not minute
(approximately 0.5 mm or larger), medium to high magnification stereoscopic (‘dissecting’)
microscopes are preferred to compound microscopes, because they allow direct manipulation
of the specimen, making it possible to combine the imaging and tissue sampling stages in the same
workflow.
Details about imaging setup and tips are compiled in Annex IV.
Tissue samples can be collected for a variety of end uses, including, but not limited to DNA
analysis. Historically, some tissue collecting protocols were intended to facilitate studies of
chemical contaminants or other specialized tasks (e.g., allozyme analyses). Not all of these
protocols can be readily applied in DNA barcoding workflows with the expectation of recovering
high-quality DNA. For example, many genetic resources repositories today house large
collections of field-sourced tissue from vertebrate liver and other internal organs which, if salvaged
1 h or more post-mortem, may contain heavily degraded DNA due to the early onset of autolysis.
The main features of a barcode-friendly tissue source are:
• The tissue should be rich in structures such as mitochondria (for animal COI) or plastids (for
plant rbcL or matK);
• The tissue should have low enzymatic activity, reducing the likelihood of post-mortem
autolysis;
• The tissue should allow relatively easy lysis and DNA extraction;
• There should be a low risk of foreign contaminants.
Examples of DNA-friendly tissue sources that conform to these criteria in animals are: skeletal
muscle, nervous system and gonads (although the latter may contain symbionts or parasites in
certain insects and invertebrates). For plants, green vegetative parts are generally a good tissue
source.
Tissue preservation
In addition to the source, preservation techniques and chemicals used have an important effect
on DNA preservation. The following killing/fixation methods are generally not considered DNA-friendly:
• Formalin (marine)
• Ethyl acetate (insects)
• Diluted propylene glycol (Malaise traps, pitfalls)
• Most histological solutions
Note that all methods are sensitive to a wide range of factors, such as nature and quality of tissue,
quality of fixative/preservative, specifics of the fixation procedure, and subsequent storage
conditions. A non-exhaustive list of examples is provided below.
Ethanol preservation:
Although ethanol is a good preservative commonly used by field biologists to store tissue
samples, it does not guarantee preservation of high molecular weight DNA. It is sometimes
possible to estimate the likelihood of ‘DNA barcode-friendly’ preservation by visual inspection of
the samples (Figure 9). Factors that may contribute to DNA degradation include:
• Tissue/ethanol volume ratio (excessive tissue lowers ethanol concentration and increases
the concentration of autolytic enzymes and PCR inhibitors);
• Relative surface area of sample (relatively high volume hampers fixative penetration into
the tissue);
• Storage temperature (higher temperatures are likely to increase nuclease activity and other
agents destroying DNA);
• Exposure to light (may lead to DNA photolysis);
• Fixative evaporation (harmful if it leads to an increase in water concentration).
The effect of these factors can be mitigated if samples are stored in a cool place away from light,
ideally at freezing temperatures.
Figure 9. Examples of ethanol-preserved vertebrate muscle samples with various probabilities of DNA
barcode recovery. If the volume of sample is large, relative to fixative, if the sample has not been fragmented
into small pieces, or if the fixative is coloured and partially evaporated, the chances of recovering high
quality DNA barcode sequences are lower. The evaporation of fixative is not necessarily indicative of DNA
degradation, as long as the samples are completely desiccated (no water remains).
Figure 10. Examples of dry-preserved specimens: dried butterfly removed from a storage envelope (A) and
plant leaf sample in bag with desiccant (B). Whole insects can be preserved, either in envelopes or pinned.
Plants are typically dry-preserved on herbarium sheets; their tissue samples destined for molecular
analyses are often stored separately in sealed bags containing desiccant, such as silica gel. This provides
better protection against fluctuations in ambient humidity.
[Figure: laboratory set-up diagram. Recoverable labels: Front-end processing; DNA archival (-80°C);
Sequencing; Bioinformatics; Hazardous waste; EXIT.]
PCR verification by gel electrophoresis is a vital component of DNA barcoding and most other
biomolecular diagnostic labs. PCR product visualization in agarose gels generally employs
ethidium bromide (EtBr) as a dye. This agent binds to nucleic acid molecules and fluoresces
when exposed to ultraviolet light, facilitating the detection of PCR products. EtBr is thought to act
as a mutagen and is classified as “hazardous”. An isolated space close to the PCR station should
therefore be allocated to gel electrophoresis, to minimize EtBr contamination and exposure to
ultraviolet light. The gel-electrophoresis station should be provided with dedicated pipettes, a
dedicated lab coat, a dedicated UV station, and a dedicated hazardous-waste bin to minimize
carryover of EtBr to other parts of the lab.
For a list of equipment and consumables required in a medium/high-throughput molecular facility,
see Annex VI.
Storage of reagents
Proper storage will keep the chemicals/reagents fit for use, minimize cross-contamination, and
reduce hazards for lab workers. Flammables should be stored in flammable storage cabinets or
purpose-built ventilated cabinets.
All flammable storage cabinets must be clearly labeled with signs – “Flammable, keep fire away”.
Chemical containers/bottles should be kept tightly closed when not in use.
Non-flammable liquid reagents and solids, like buffers and salts, are usually stored on shelves
at room temperature. However, care must be taken not to store incompatible chemicals, like acids
and bases, together. For long-term stability, some reagents/enzymes, such as Taq
polymerase, must be stored in a freezer (at -20°C or below). Likewise, refrigeration of working buffers,
particularly phosphates, will extend their life. All containers/bottles should be visibly labelled
to indicate their contents. It is desirable that the containers of working solutions/buffers be labelled
with both the contents and the preparation date. Reagents that have a specified shelf life and
have expired should be removed from the shelf and disposed of following the standard procedures
of waste management.
Emergency procedures
Material safety data sheets (MSDS) should be available at a visible site in the lab. If exposed to
a potentially hazardous chemical, consult the MSDS for the chemical you were working with and
follow the remedial measures. In case of skin contact, remove contaminated clothing and rinse
the affected skin with plenty of water for 15-20 min. For eye contact, rinse the eye(s) thoroughly,
rolling the eyeballs around, for 15-20 min at the eye-wash station. In case of inhalation, move
into fresh air immediately. In all cases of exposure to hazardous materials, seek medical attention
if symptoms persist. In case of emergency, contact the campus police/emergency services.
A chemical spill or fire in the lab must be dealt with professionally. Spills, particularly
of flammable solvents, should be cleaned up immediately using non-flammable absorbents. Using
paper towels to soak up flammable liquids is not recommended. A fire extinguisher should be
readily available when working with flammables. If the fire alarm goes off, leave the building
immediately and call emergency services/police.
DNA Extraction
DNA (deoxyribonucleic acid): The cell, the building block of living organisms, contains a unique
material, deoxyribonucleic acid (DNA), that is self-replicating and carries the genetic information.
Most of the DNA is located within the nucleus (nuclear DNA – nDNA), while a fraction is found in
mitochondria (mtDNA) and chloroplasts (cpDNA) and is also referred to as extranuclear or
cytoplasmic DNA. The DNA of an organism is called its genome. The genome is
compartmentalized into chromosomes, whose number may vary among species. A gene is a
distinct sequence of nucleotides in the genome, the order of which determines the
order of monomers in a nucleic acid molecule or polypeptide. Nucleotides that code
for proteins (exons) make up the coding region (coding DNA sequences – CDS) and those that do
not code for proteins (introns) make up non-coding regions of the genome. Each nucleotide
is composed of a nitrogenous base, a sugar and a phosphate. The four types of nitrogenous bases,
Adenine (A), Thymine (T), Guanine (G), and Cytosine (C), define the nucleotides, which are organized
in a double helix, like a twisted ladder, with the phosphates and sugars making up the backbone of the
ladder. With two fused rings and four ring nitrogen atoms, A and G are classed as purines, while with
one ring and two ring nitrogen atoms, T and C are classed as pyrimidines.
DNA extraction is the process of separating DNA, present in the nucleus (nuclear DNA –
nDNA), mitochondria (mitochondrial DNA – mtDNA) or chloroplasts (chloroplast DNA – cpDNA),
from other cellular components. Since the discovery of DNA, a range of methods have been
practiced and a multitude of protocols have been published for its isolation. Isolated DNA is used
in a range of applications, such as genetic analysis, identification of individuals/species, gene
transformation, forensic analysis, etc. Each application may require a certain level of DNA purity/
integrity. In principle, DNA extraction involves two main steps: i) releasing DNA from the cell by
disrupting the nuclear and cell membranes and/or the cell wall (in plants), by lysis, and ii) separating
nucleic acid from the other cellular contents, by isolation. The choice of a DNA extraction method
generally depends on the organism, source, age, and size of the sample. For example, plants
have a cell wall, which animals lack, and this may require additional chemicals and
modified protocols for lysis. The intended use of the DNA may be another consideration when
selecting an extraction method. For instance, DNA microarray analysis may require a higher level of
DNA purity than PCR, thus requiring more stringent purification steps. Cost of the
procedure may also be a deciding factor in method selection, as one protocol may be more
cost-efficient than another even though both yield a similar amount and quality of DNA.
Conventional methods
The conventional methods of DNA extraction are mainly based on chemical extraction. In general,
they are laborious and may require specialized laboratory bench space because they involve
hazardous chemicals. In principle, the method relies on salting out the proteins by precipitation to
separate the nucleic acid (DNA/RNA). Most often additional chemicals, such as
phenol/chloroform, are used to denature the proteins and purify the DNA. The most common
method is phenol/chloroform extraction. This method involves mixing the lysate with
phenol-chloroform and separating nucleic acid from protein/cell debris by centrifugation. Proteins are
denatured and partition with the phenol into the organic phase, while DNA remains in the aqueous
phase, from which it is then precipitated with alcohol. The phenol/chloroform method is commonly used
due to its ability to produce high-molecular-weight DNA, which is desired for construction of
genomic libraries, and also due to its lower cost per sample. Although phenol alone denatures
proteins efficiently, it does not fully inhibit RNase activity. Combining chloroform with phenol
overcomes RNases, and a subsequent treatment with chloroform alone removes traces of phenol
from the preparation. A range of protocols are available for all sorts of living organisms and tissue
types. An example is provided below.
Phenol/chloroform extraction for animal tissue [23]
[23] Adapted from Green MR, Sambrook J (2012). Molecular Cloning – A Laboratory Manual. Cold Spring
Harbor Laboratory Press.
1. Transfer the sample (ground mixture - lysate) to a polypropylene tube and mix by shaking
with equal volume of phenol:chloroform:isoamyl alcohol (25:24:1).
(Isoamyl alcohol reduces the foaming and improves separation between aqueous and
organic phases).
2. Centrifuge the mixture (@12,000 x g) for 3-5 min to separate the aqueous and organic
phases.
3. Transfer (using a pipette) the aqueous phase (upper layer) to a fresh tube. Discard the
organic phase (lower layer).
4. Repeat steps 1 – 3 until no protein is visible at the interface of the organic and aqueous
phases.
5. Add an equal volume of chloroform, and repeat steps 2 – 4.
6. Add 2X volume of 95% ethanol, chill at -20°C for at least 30 min, and pellet the DNA by
centrifugation (30 min @12,000 x g).
7. Wash the DNA pellet with 70% ethanol, air dry, and re-suspend in deionized water. Store
the purified DNA at -80°C until use.
Spin-column (DNeasy) extraction for animal tissue
1. Cut tissue (up to 25 mg) into small pieces, and place in 1.5 ml microcentrifuge tube. Add
180 μl Buffer ATL.
2. Add 20 μl proteinase K. Mix by vortexing and incubate at 56°C until completely lysed.
Vortex occasionally during incubation (or place in a thermomixer, in a shaking water bath,
or on a rocking platform). Lysis is usually complete in 1–3 h (samples can be lysed
overnight).
3. Vortex for 15 s. Add 200 μl Buffer AL to the sample. Mix thoroughly by vortexing. Then
add 200 μl ethanol (96–100%). Mix again thoroughly. Alternatively, premix Buffer AL and
ethanol, and add together.
4. Pipet the mixture into a DNeasy Mini spin column in a 2 ml collection tube. Centrifuge at
6,000 x g (8,000 rpm) for 1 min. Discard flow-through and collection tube.
5. Place the spin column in a new 2 ml collection tube. Add 500 μl Buffer AW1. Centrifuge
for 1 min at 6,000 x g (8,000 rpm). Discard flow-through and collection tube.
6. Place the spin column in a new 2 ml collection tube. Add 500 μl Buffer AW2. Centrifuge
for 3 min at 20,000 x g (14,000 rpm). Discard flow-through and collection tube. Remove
the spin column carefully so that it does not come into contact with the flow-through.
7. Transfer the spin column to a new 1.5 ml or 2 ml microcentrifuge tube and add 200 μl
Buffer AE for elution (directly on the DNeasy membrane). Incubate for 1 min at room
temperature. Centrifuge for 1 min at 6,000 x g. Recommended: Repeat this step for
maximum yield.
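The spin steps above are quoted both as relative centrifugal force (x g) and as rpm; the two are linked by the rotor radius. The sketch below applies the standard conversion RCF = 1.118 x 10^-5 x r (cm) x rpm^2. The function names and the 8.4 cm radius are illustrative assumptions, not values taken from the kit handbook; use the radius of the rotor actually installed.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF with a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 8.4  # cm; assumed rotor radius, check your own centrifuge
    for target_g in (6000, 20000):
        rpm = rpm_from_rcf(target_g, radius)
        print(f"{target_g} x g needs about {rpm:.0f} rpm at r = {radius} cm")
```

Because the required rpm depends on the rotor, the x g value is the more portable way to record a protocol.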
Alternatively, spin-columns can be bought separately and used in a DNA extraction protocol with
home-made reagents (see Annex VII and VIII).
DNA quantification
Although purified DNA has many uses, in DNA barcoding it is mainly used as a template for PCR. With
established protocols, experienced staff, reliable and fresh samples, and an optimized sample size,
good DNA recovery is very likely. Validating DNA extraction is therefore good practice, but the DNA
quantification step may be omitted in routine workflows. Where quantification is needed, the
following methods are commonly used:
1. Ethidium bromide stained gels: Query DNA extracts are electrophoresed on an agarose gel
in parallel with DNA of known amount. The gel is stained with ethidium bromide (EtBr),
illuminated with ultraviolet (UV) light, and bands of query DNA are then compared with
those of known DNA to estimate the DNA amount in the query sample. The analysis is
facilitated by capturing the image with a CCD camera and using software to measure band
intensity.
2. Real-time PCR: Real-time PCR measures the amount of double-stranded DNA at
each cycle, with the help of a fluorescent dye that is incorporated in the reaction. The
change in fluorescence intensity is captured by a camera and the amount of DNA is
interpreted by dedicated software.
3. Spectrophotometry: The method involves measuring the absorbance of the sample at
260 nm on a spectrophotometer. Nucleic acids absorb UV light in a specific pattern. When
DNA is exposed to UV light at 260 nm, a photo-detector on the other side measures the
light passing through the DNA sample. A higher DNA concentration absorbs more light,
producing a higher optical density. The absorbance reading is converted to DNA
concentration through a formula.
4. PicoGreen: This method measures the fluorescence intensity of PicoGreen (dye) in reference
to a set of standards. The fluorescence intensity increases when the dye binds to
dsDNA; it is read by a spectrofluorometer and then translated to the amount of DNA
in the sample.
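The manual does not spell out the formula used in spectrophotometry (method 3). As a hedged illustration, the sketch below uses the widely quoted convention that an A260 of 1.0 corresponds to roughly 50 ng/µl of double-stranded DNA at a 1 cm path length, and reports the A260/A280 ratio, for which values near 1.8 are commonly read as relatively pure DNA. Both conventions and all function names are assumptions brought in for this example.

```python
DSDNA_NG_PER_UL_PER_A260 = 50.0  # common convention for dsDNA, 1 cm path length


def dsdna_concentration(a260: float, dilution_factor: float = 1.0,
                        path_cm: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/µl) from an absorbance reading at 260 nm."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return a260 / path_cm * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio, a rough indicator of protein contamination."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings for a sample diluted 1:10 before measurement.
    print(dsdna_concentration(0.5, dilution_factor=10), "ng/µl")
    print(round(purity_ratio(0.5, 0.27), 2), "A260/A280")
```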
DNA preservation
Several factors may play a role in the degradation of purified DNA, but three are most common:
contaminants and nucleases (DNases), acid hydrolysis (due to absence of buffering capacity in
the storage solution), and high variation in temperature (frequent freeze-thaw cycles).
Use the following guidelines to limit DNA degradation and store DNA reliably.
1. Use gloves and clean bench space to avoid contamination with DNases.
2. Elute/dissolve the DNA in a buffer containing chelating agent. For example, TE buffer
(10 mM Tris pH 8.0, 1 mM EDTA).
3. Store at low temperature and avoid frequent freeze-thaw cycles. One way to avoid
freeze-thaw cycles is to aliquot extracted DNA in multiple vials.
[Diagram for Figure 12: the barcode region within a gene, with strands labelled 5’-3’ and 3’-5’, shown at
each step of a PCR cycle: Denaturation (94-98°C), Annealing (45-68°C), Extension (65-75°C).]
Figure 12. Schematic representation of COI amplification. The two DNA strands are coloured in blue and
green, while the primers are in red and orange.
In a standard PCR procedure, the target gene region is amplified from double-stranded DNA
(dsDNA – template) through a series of temperature cycles in a thermocycler. One cycle is
completed in three steps: denaturation, annealing, and extension. The template is mixed with
DNA polymerase, dNTPs, primers, MgCl2 and PCR buffer. In step one (denaturation at 94°C – 98°C),
dsDNA is separated into two single strands, forward and reverse. In step two (annealing at 45°C
– 68°C), the primers bind to the complementary sequence region of the target DNA to initiate
DNA synthesis in the presence of polymerase. In step three (extension at 65°C – 75°C), the
polymerase enzyme extends the DNA sequence by incorporating dNTPs available in the PCR
reaction. Under ideal conditions the amount of target DNA doubles in each cycle, producing millions
of copies of the target DNA in about 25 cycles.
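The doubling arithmetic can be checked with a short calculation. The sketch below assumes an idealized reaction in which every cycle multiplies the number of target copies by (1 + efficiency); the `efficiency` parameter and the function names are illustrative assumptions, and real reactions plateau as reagents are consumed.

```python
import math


def copies_after(cycles: int, start_copies: float = 1, efficiency: float = 1.0) -> float:
    """Target copies after a number of cycles; efficiency 1.0 means perfect doubling."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target_copies: float, start_copies: float = 1,
                    efficiency: float = 1.0) -> int:
    """Smallest whole number of cycles giving at least target_copies."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(f"{copies_after(25):,.0f} copies from one template after 25 cycles")
    print(cycles_to_reach(1_000_000), "cycles to pass one million copies")
    print(f"{copies_after(25, efficiency=0.9):,.0f} copies at 90% efficiency")
```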
PCR Primers
Primers are short chains of nucleotides (17-30 nucleotides), also called oligonucleotides, that are
artificially synthesized and serve as the initiation point for amplification and synthesis of the target
DNA in the genome. Since DNA in nature exists as a double-stranded molecule, a pair of primers is used
simultaneously for the synthesis of both the forward and the reverse strands. Primers are the key
element in PCR amplification of the target gene marker.
Successful primers have a nucleotide sequence that is highly complementary to the target
region of the DNA. High sequence similarity with the target reduces the possibility of
mishybridization to a similar sequence elsewhere in the genome. The two primers in a pair, used to
amplify target DNA, must have similar melting temperatures, and that temperature should not vary
significantly from that of the template DNA strand. A low tendency towards dimerization and hairpin
formation are other qualities to consider.
The most common problems in PCR amplification arise from:
• Dimerization – primers hybridize with each other (instead of hybridizing with the target
DNA) generating short amplicons, the primer dimers;
• Hairpin formation – two ends of the same primer possess complementary nucleotides and
hybridize producing a hairpin like structure compromising their efficiency of hybridizing
with the target template.
Primers with GC-content of 40-60% are more efficient than those with low %GC.
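GC content and a rough melting temperature are easy to compute for any candidate primer. The sketch below is a minimal illustration; the function names are assumptions, and the melting temperature uses the simple Wallace rule (2°C per A/T, 4°C per G/C), which is only a coarse approximation for primers of this length. The two sequences are the forward primers quoted in the primer cocktail section of this manual.

```python
def gc_content(primer: str) -> float:
    """Percentage of G and C bases in an unambiguous primer sequence."""
    seq = primer.upper()
    gc = sum(seq.count(base) for base in "GC")
    return 100 * gc / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = primer.upper()
    at = sum(seq.count(base) for base in "AT")
    gc = sum(seq.count(base) for base in "GC")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primers = {
        "LepF1": "ATTCAACCAATCATAAAGATATTGG",
        "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    }
    for name, seq in primers.items():
        print(f"{name}: {gc_content(seq):.0f}% GC, Wallace Tm about {wallace_tm(seq)}°C")
```

Comparing the two values for both primers of a pair is a quick way to spot a mismatch in melting temperature before ordering oligonucleotides.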
Species-specific primers
Designing primers for the target gene region with known sequence is straightforward. Sequence
of the published DNA may be obtained from GenBank or other public sources and can be used
as template to design primers, manually or by using software. Primers that are based on sequence
from a single species, and will most likely amplify the target DNA only from that species, are
species-specific primers.
Universal primers may be designed from a gene region that is highly conserved among most
species and will amplify the target from most species. Degenerate primers, on the other hand, are
designed by locating the frequently variable nucleotide sites among species and adding a
degenerate base at the variable site in the primer sequence (Figure 13). Degeneracy is
incorporated by combining oligonucleotide sequences in which variable base sites are altered in
such a way that the primer covers all the possible base combinations for the variable site. Codes
for different base combinations follow the universally accepted IUPAC
(International Union of Pure and Applied Chemistry) codes (Table 2). For example, for A or G the
code is R and for C or T it is Y. Likewise, where any of the four bases (A, G, T, C) is possible, the
code is N.
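To make the idea of degeneracy concrete, the sketch below expands a degenerate primer into every unambiguous oligonucleotide it represents, using the standard IUPAC nucleotide codes. The function names and the three-base example are illustrative assumptions.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """List every unambiguous sequence covered by a degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    print(expand("AYR"))        # Y = C or T, R = A or G
    print(degeneracy("RYN"))    # 2 x 2 x 4 combinations
```

Degeneracy grows multiplicatively with each ambiguous site, so checking it before synthesis helps keep the effective concentration of each individual oligonucleotide reasonable.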
[Diagram for Figure 13: a degenerate forward primer (HACWWTATAYTTYATTTTYGGWATY, rows labelled 5’ to 3’)
shown above ten numbered forward-strand sequences, and a degenerate reverse primer
(CHATRYTHYTWACWGAYCGWAATH, rows labelled 3’ to 5’) shown above ten numbered reverse-strand sequences.]
Primer cocktail
Primer cocktails are commonly used when barcoding a mix of species. A primer cocktail is a mixture
of two to three universal primers that enhances the probability of target-DNA amplification by
accommodating nucleotide variation at the priming site. For example, the following forward and
reverse primers, which differ at a few nucleotide positions, are mixed together to broaden
the range of target species amplified in a single PCR run.
Cocktail of forward primer C_LepFolF is prepared by mixing LepF1 and LCO1490 in equal
volumes:
LepF1: ATTCAACCAATCATAAAGATATTGG
LCO1490: GGTCAACAAATCATAAAGATATTGG
Cocktail of reverse primer C_LepFolR is prepared by mixing LepR1 and HCO2198 in equal
volumes:
LepR1: TAAACTTCTGGATGTCCAAAAAATCA
HCO2198: TAAACTTCAGGGTGACCAAAAAATCA
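The positions at which the primers in each cocktail differ can be listed with a simple position-by-position comparison. The sketch below is illustrative only; the function name is an assumption, and positions are reported 1-based from the 5′ end.

```python
def differing_positions(primer_a: str, primer_b: str) -> list[int]:
    """1-based positions at which two equal-length primers differ."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must have the same length")
    return [i for i, (a, b) in enumerate(zip(primer_a, primer_b), start=1) if a != b]


if __name__ == "__main__":
    forward = ("ATTCAACCAATCATAAAGATATTGG", "GGTCAACAAATCATAAAGATATTGG")    # LepF1, LCO1490
    reverse = ("TAAACTTCTGGATGTCCAAAAAATCA", "TAAACTTCAGGGTGACCAAAAAATCA")  # LepR1, HCO2198
    print("Forward cocktail differs at:", differing_positions(*forward))
    print("Reverse cocktail differs at:", differing_positions(*reverse))
```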
Tailed primers
A tailed primer contains extra nucleotides at the 5′-end that are not complementary to the target
gene sequence. The 3′-end of the tailed primer anneals to amplify the target region, while the
non-complementary 5′-end adds a “tail” of additional nucleotides to the PCR product. The tail is
added for various objectives, such as incorporation of endonuclease sites, plasmid cloning, or to
facilitate sequencing. In DNA barcoding, tailed primers are mainly used to facilitate
sequencing. Generally, the primers used in the PCR reaction are also used in cycle sequencing.
However, when universal/degenerate primers are used to amplify DNA from multiple taxa arrayed in a
single microplate, for instance, cycle sequencing with the same degenerate primers may compromise
sequencing success. Tailing the degenerate primer sequence with an established universal primer,
such as M13, allows the primer tail to be used for cycle sequencing instead.
Here is an example of forward and reverse primers where LepF1 is tailed with M13-forward (the first
18 nucleotides) and LepR1 with M13-reverse (the first 17 nucleotides).
LepF1_t1: 5′ TGTAAAACGACGGCCAGTATTCAACCAATCATAAAGATATTGG 3′
LepR1_t1: 5′ CAGGAAACAGCTATGACTAAACTTCTGGATGTCCAAAAAATCA 3′
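A tailed primer is simply the tail sequence followed by the target-specific primer, both written 5′ to 3′. The sketch below builds the two tailed primers from the M13 and Lep sequences given in this manual and shows how a tail can be recognized and removed again; the function names are illustrative assumptions.

```python
M13F = "TGTAAAACGACGGCCAGT"   # M13-forward tail
M13R = "CAGGAAACAGCTATGAC"    # M13-reverse tail
LEP_F1 = "ATTCAACCAATCATAAAGATATTGG"
LEP_R1 = "TAAACTTCTGGATGTCCAAAAAATCA"


def add_tail(tail: str, primer: str) -> str:
    """Join a 5' tail to a target-specific primer (both written 5' to 3')."""
    return tail + primer


def strip_tail(tailed_primer: str, tail: str) -> str:
    """Return the target-specific part of a tailed primer, or raise if the tail is absent."""
    if not tailed_primer.startswith(tail):
        raise ValueError("primer does not start with the given tail")
    return tailed_primer[len(tail):]


if __name__ == "__main__":
    print("LepF1_t1:", add_tail(M13F, LEP_F1))
    print("LepR1_t1:", add_tail(M13R, LEP_R1))
```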
A list of the most common primers used in DNA barcoding of animals, plants and fungi is provided
below [24].
[24] References are detailed in Annex VIII.
M13F TGTAAAACGACGGCCAGT
M13R CAGGAAACAGCTATGAC
[25] Primers ending in ‘t1’ are M13-tailed (PCR products are sequenced only with M13 primers).
PCR Protocol
Addition of PCR primers and DNA (template) to the PCR mix completes the set of ingredients
required for DNA synthesis through thermocycling. To avoid contamination and undesired
DNA amplification, this task should be performed on a sterilized (DNA-free) bench or under a
dedicated glass hood equipped with UV light for surface sterilization. Sterilization must be
accomplished by ELIMINase treatment, then ethanol wiping, and subsequently UV exposure
(if available). New gloves should be used when preparing PCR reactions, and lab supplies should
be organized within arm’s length before transferring DNA template to the PCR mix.
A large variety of PCR recipes are used in different molecular labs. See below a recipe
adapted from the Canadian Centre for DNA Barcoding protocols (ccdb.ca) for a 12.5 µL reaction
volume. Trehalose is used to facilitate PCR and to allow freezing of aliquoted PCR master-mixes
(made ahead of time and frozen until required for processing). A high-fidelity Taq polymerase,
although usually more expensive, may require less optimization than standard Taq,
thereby saving time and increasing the chances of success.
Reagents                      1 reaction (µL)
10% trehalose for PCR         6.25
ddH2O                         2
10X PCR buffer                1.25
50 mM MgCl2                   0.625
10 mM dNTPs                   0.0625
10 µM forward primer          0.125
10 µM reverse primer          0.125
Taq polymerase (5 U/µL)       0.06
Total                         10.5
DNA template                  2 µL per well
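When many reactions are prepared at once, the per-reaction volumes in the table are scaled into a master mix. The sketch below does this for the recipe as given; the 10% overage for pipetting loss and the function name are assumptions for illustration, not part of the CCDB protocol.

```python
# Volumes (µL) per reaction, copied from the recipe table (DNA template is added per well).
RECIPE_UL = {
    "10% trehalose for PCR": 6.25,
    "ddH2O": 2.0,
    "10X PCR buffer": 1.25,
    "50 mM MgCl2": 0.625,
    "10 mM dNTPs": 0.0625,
    "10 uM forward primer": 0.125,
    "10 uM reverse primer": 0.125,
    "Taq polymerase (5 U/uL)": 0.06,
}


def master_mix(n_reactions: int, overage: float = 0.10) -> dict[str, float]:
    """Scale the recipe to n reactions, adding a fractional overage for pipetting loss."""
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(volume * factor, 2) for reagent, volume in RECIPE_UL.items()}


if __name__ == "__main__":
    for reagent, volume in master_mix(96).items():
        print(f"{reagent:<26}{volume:>8.2f} µL")
    print(f"Mix per well: {sum(RECIPE_UL.values()):.4f} µL (dispensed as 10.5 µL) + 2 µL template")
```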
For medium/high-throughput processing and detailed instructions of PCR handling, see Annex
VIII.
Gel electrophoresis
Before proceeding to cycle sequencing, the PCR amplification of the target DNA is generally
verified by agarose gel electrophoresis. A small amount (~4 µl) of PCR product is loaded onto an
ethidium bromide (EtBr)-stained gel, electrophoresed at low voltage (60-70 volts) for a set
time and then visualized by exposing the gel to UV light. EtBr bound to the DNA
molecules fluoresces under UV light and indicates the location of the amplified PCR product on
the gel.
Agarose gel may be prepared in the lab or obtained commercially. For example, a 2% agarose gel may
be prepared by dissolving (by heating, e.g., in a microwave) 2 g of agarose in 100 ml of electrophoresis
buffer (TAE or TBE). EtBr may be added (0.5 µg/ml) to the gel at the time of preparation, or the
gel may be stained after electrophoresis by soaking in EtBr solution.
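The gel recipe scales linearly with buffer volume. The sketch below computes the agarose mass for a given percentage (w/v) and the volume of EtBr stock needed to reach 0.5 µg/ml; the 10 mg/ml stock concentration is an assumption (a commonly sold concentration), as are the function names.

```python
def agarose_grams(percent_w_v: float, buffer_ml: float) -> float:
    """Mass of agarose (g) for a gel of the given percentage (w/v) and buffer volume."""
    return percent_w_v / 100 * buffer_ml


def etbr_stock_ul(buffer_ml: float, final_ug_per_ml: float = 0.5,
                  stock_mg_per_ml: float = 10.0) -> float:
    """Volume of EtBr stock (µl) to add; note that 1 mg/ml equals 1 µg/µl."""
    total_ug = buffer_ml * final_ug_per_ml
    return total_ug / stock_mg_per_ml


if __name__ == "__main__":
    volume_ml = 100
    print(f"{agarose_grams(2, volume_ml):.1f} g agarose in {volume_ml} ml buffer")
    print(f"{etbr_stock_ul(volume_ml):.1f} µl of 10 mg/ml EtBr stock (assumed stock)")
```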
Electronic gel (E-gel) is a commercially available pre-cast gel that provides a user-friendly
alternative to lab-made gels. PCR verification in an E-gel is accomplished without
electrophoresis buffer, which also minimizes the chances of EtBr contamination. Ready-made
E-gels contain EtBr and come with multiple options for sample capacity, including a 96-well plate
format. See details in Annex VIII.
Cycle Sequencing
Cycle sequencing is very similar to PCR in that it utilizes DNA polymerase and free
nucleotides to generate copies of template DNA in a temperature-cycling format, but there are
two core differences between the two. First, cycle sequencing employs only one primer to
synthesize copies of only one strand of the double-helix DNA, and this copy cannot be used as a
template for later cycles. This means the amplification in cycle sequencing is linear, not
exponential, so sufficient copies of the original template DNA are needed for the products to be
detected by sequencing equipment. Second, in addition to deoxynucleotide triphosphates (dNTPs), cycle
sequencing also utilizes dideoxynucleotide triphosphates (ddNTPs). During extension, ddNTPs are
incorporated at random in place of the corresponding dNTPs and cause base-specific termination of
extension. This process generates truncated copies of the single target strand, with each copy
terminated at a specific nucleotide site. Whereas PCR synthesizes a double-stranded product using
two primers (forward and reverse) in a single run, cycle sequencing synthesizes two single-stranded
products in two separate runs, using one primer in each run (Figure 14).
After cycle sequencing is complete, the product is ready for Sanger sequencing in an
automated sequencer (usually provided by a sequencing facility as a service for fee). See Annex
VIII for a 96-well-based protocol for cycle sequencing. In many cases, however, PCR products are sent
to the sequencing facility before cycle sequencing, which is then carried out by the facility.
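The principle of base-specific termination can be shown with a toy model. In the sketch below, every position of the newly synthesized strand yields one terminated fragment whose length equals that position and whose final base identifies the ddNTP that stopped extension; sorting the fragments by length then recovers the sequence. This is a deliberately simplified illustration with assumed function names, not a model of real sequencing chemistry or of base calling.

```python
def terminated_fragments(new_strand: str) -> dict[str, list[int]]:
    """Fragment lengths produced by termination at each base (A, C, G, T)."""
    fragments: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand.upper(), start=1):
        fragments[base].append(length)  # a ddNTP of this base ends a fragment here
    return fragments


def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Reconstruct the sequence by ordering fragments from shortest to longest."""
    base_by_length = {length: base for base, lengths in fragments.items() for length in lengths}
    return "".join(base_by_length[length] for length in sorted(base_by_length))


if __name__ == "__main__":
    fragments = terminated_fragments("GATTACA")
    print(fragments)
    print(read_sequence(fragments))
```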
Why is cycle sequencing needed?
1. The amount of DNA template necessary for the sequencing reaction is significantly
reduced.
2. Smaller amount of template introduces fewer impurities in the sequencing reaction.
3. No additional denaturation step is required at sequencing stage since PCR product has
gone through multiple heat-denaturation steps in sequencing reactions.
Sequence Editing
The visual representation of a sequencing read is a chromatogram (electropherogram or trace file),
which is generated by a sequencer. The common file formats for chromatograms are ABI and SCF.
The ABI file format is a binary file that is created by ABI sequencer software, while SCF (standard
chromatogram format) may be created by other sequencers (e.g., Beckman, Li-Cor).
PCR products are usually sequenced bi-directionally (forward reaction and reverse reaction) in
order to cover the entire length of the desired genetic fragment. Forward and reverse traces are
assembled into one contig (one sequence) and edited to fix ambiguous bases (if possible) with
specific software (free: BioEdit; commercial: CodonCode, Geneious, Sequencher, etc.).
Regardless of the software choice, the steps required to move from raw data (trace file) to a
reliable sequence are the same (see below).
Figure 15. Sequence editing workflow. Codes: .ab1 - ABI file; F - forward; R - reverse; RC - reverse
complement; N - ambiguous base (see IUPAC codes); SID - BOLD Sample ID; PID - BOLD Process ID.
*The R reaction can be RC while performing primer trimming (see text).
Trim ends
Example of low quality end that should be removed (usually the software allows an easy ‘highlight
and delete’ option).
Trim primers
As primers are synthetic sequences that bind to the DNA strand, they may not reflect the ‘real’
sequence at the site of annealing and therefore should be removed.
In the example below, sequencing was performed with universal invertebrate (‘Folmer’ [26]) primers:
LCO1490 (forward primer): 5’-GGTCAACAAATCATAAAGATATTGG-3’
HCO2198 (reverse primer): 5’-TAAACTTCAGGGTGACCAAAAAATCA-3’
Forward (F) trace: go to the end of the forward trace and look for the reverse primer (as reverse
complement, RC)
At position 656, the primer sequence starts: TGATTTTTTGGTCACCC…. which is the RC of HCO
(…GGGTGACCAAAAAATCA). Delete primer sequence (starting with position 656 in the example
above, until the end of the trace). The remaining sequence should end with TTTATTT.
Reverse (R) trace: reverse complement the entire trace (usually by clicking a button in the
software) and look for the F primer at the beginning of the R trace.
At position 26 the primer sequence ends (….CATAAAGATATTGG). Delete primer sequence (in
the example above, everything from the beginning of the trace to position 26). The remaining
sequence should start with AACATT.
[26] Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994). DNA primers for amplification of mitochondrial
cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and
Biotechnology 3: 294-299.
Note: It is also possible to use the R trace as it is (not RC) and look for the F primer (as RC) at
the end of the trace (see below). However, the trace will need to be switched to RC before being
assembled into a contig with the F trace.
At position 657, the primer sequence begins: CCAATATCTTTATG… which is the RC of LCO
(…CATAAAGATATTGG). In this case, delete everything starting with position 657 until the end
of the trace.
In many cases only a fragment of the primer can be reliably observed (as above) due to
decreasing base quality but it will not have any impact on the overall sequence as primer
sequences are trimmed.
In cases where the quality of the trace is too low to identify the primer regions, remove any low-
quality areas and edit the remaining (shorter than 658bp) sequence. If the entire trace is low
quality, discard it from analysis.
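The primer-trimming logic described above can be sketched in a few lines. The code below finds the reverse complement of the reverse primer at the end of a forward read, and the forward primer at the start of a reverse-complemented reverse read, using an exact match on a short stretch of the primer. The function names, the 12-base match length and the toy reads are assumptions for illustration; real traces contain base-calling errors, so editing software typically tolerates mismatches.

```python
LCO1490 = "GGTCAACAAATCATAAAGATATTGG"   # forward primer
HCO2198 = "TAAACTTCAGGGTGACCAAAAAATCA"  # reverse primer

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_forward_read(read: str, reverse_primer: str, min_match: int = 12) -> str:
    """Remove everything from the start of the RC reverse primer to the end of the read."""
    probe = reverse_complement(reverse_primer)[:min_match]
    index = read.find(probe)
    return read if index == -1 else read[:index]


def trim_reverse_read_rc(read_rc: str, forward_primer: str, min_match: int = 12) -> str:
    """Remove everything up to the end of the forward primer in an RC reverse read."""
    probe = forward_primer[-min_match:]
    index = read_rc.find(probe)
    return read_rc if index == -1 else read_rc[index + len(probe):]


if __name__ == "__main__":
    # Toy reads made up for this example: a short insert flanked by primer sequence.
    forward_read = "AACATTTTATTT" + reverse_complement(HCO2198)
    reverse_read_rc = LCO1490 + "AACATTGGG"
    print(trim_forward_read(forward_read, HCO2198))
    print(trim_reverse_read_rc(reverse_read_rc, LCO1490))
```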
Once the entire contig is verified and edited, it is ready to be exported (as a fasta file) from the
software and uploaded to BOLD with the proper name. Sequencing providers use different naming
systems for electropherograms. During sequence editing, it is recommended to
change the name of the contig to reflect the BOLD Sample ID (or Process ID) to ease the
sequence upload to BOLD (see below an example of a contig named with BOLD Process ID:
RONOC159-18).
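When many contigs are exported at once, the naming convention can be applied programmatically. The sketch below assumes the edited sequences are held in a dictionary keyed by Process ID; the function name and the line width are illustrative choices, not part of any BOLD tool.

```python
def to_fasta(records: dict, width: int = 60) -> str:
    """Format {process_id: sequence} as FASTA text.

    Each header line carries the BOLD Process ID so that the sequence can be
    matched to its record during upload.
    """
    lines = []
    for process_id, seq in records.items():
        lines.append(f">{process_id}")
        seq = seq.upper().replace(" ", "")
        # Wrap the sequence to a fixed line width for readability.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"
```

For example, `to_fasta({"RONOC159-18": "AACATT"})` produces a two-line record whose header is `>RONOC159-18`.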
Quality Control
PCR amplification of false targets, pseudogenes, or contaminants is always a risk, and this
risk is higher when a broad range of taxa is targeted with universal primers. Therefore, it is
important to verify the integrity and quality of the target sequence to prevent erroneous sequences
from becoming part of the barcoding database. The validation steps can be performed directly in
BOLD (see next section, BOLD Analytics) or outside BOLD (see below).
Sequence alignment
Sequence alignment is a bioinformatics process in which two or more DNA (or protein) sequences
are arranged so that their most similar nucleotides (or amino acids, for proteins) line up
with each other. The alignment of multiple DNA sequences may help reveal
conserved and variable nucleotide positions. At the same time, an alignment may
expose nucleotide insertions and deletions (INDELs), which can be caused by errors in
sequence editing.
In cases where only one sequence is targeted in a workflow, additional DNA sequences (for the
hypothesized taxon) can be downloaded from public databases and an alignment performed as
quality control.
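Once an alignment has been exported, gap positions can be listed automatically so that potential editing errors are easy to spot. The sketch below is a simple illustration that assumes the alignment is held as a dictionary of equal-length strings with '-' as the gap symbol; the function name is hypothetical.

```python
def gap_columns(alignment: dict) -> dict:
    """Map each 0-based alignment column that contains a gap to the names
    of the sequences that are gapped at that column."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    report = {}
    for name, seq in alignment.items():
        for col, base in enumerate(seq):
            if base == "-":
                report.setdefault(col, []).append(name)
    return dict(sorted(report.items()))
```

In a protein-coding marker such as COI, an isolated internal gap in a single sequence is a good reason to return to the trace files. Gaps at the very beginning or end of the alignment usually only reflect differences in sequence length.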
Many software packages allow sequence alignment (including the sequence editing software);
one of the most versatile, easy to use and free is MEGA (Molecular Evolutionary Genetics
Analysis)27. Once the fasta file is imported into MEGA, sequence alignment can be performed
27
Kumar S, Stecher G, Tamura K (2016). MEGA7: Molecular Evolutionary Genetics Analysis version 7.0
for bigger datasets. Molecular Biology and Evolution 33:1870-1874 (https://www.megasoftware.net/).
with two algorithms (Figure 16). For large numbers of sequences, MUSCLE28 typically performs faster
than ClustalW29.
Figure 16. Alignment of COI sequences performed with MUSCLE in MEGA 7. The algorithm
(ClustalW or MUSCLE) can be chosen from the top command panel (see the red circle).
28
Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Research, 32(5): 1792–1797.
29
Thompson JD, Higgins DG, Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight
matrix choice. Nucleic Acids Research, 22: 4673-4680.
Various taxonomic groups have different genetic codes (Table 5), sometimes with different stop
codons. Therefore, selecting the correct genetic code before translating nucleotides into amino
acids is crucial.
Table 5. Genetic codes important for DNA barcoding (Source: NCBI30). The internal transcribed spacer
(ITS) does not require a genetic code since it is not a protein-coding gene.
Barcode marker | Genetic code
Animals - Mitochondrial gene COI | Vertebrate; Invertebrate; Echinoderm & flatworm; Alternative flatworm; Ascidian; Trematode
Plants - Chloroplast genes matK, rbcLa | Bacterial, Archaeal and Plant Plastid Code
Fungi - ITS (non-coding) | N/A
Within the DNA barcoding context, software is used to translate a DNA (nucleotide) sequence
into a protein (amino acid) sequence to verify the absence of stop codons. Their presence
would most likely indicate the amplification of a pseudogene, since functional proteins do not
usually have internal stop codons. This translation step applies to protein-coding genes
such as the animal and plant barcodes but not to fungal barcodes (ITS).
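To illustrate why the choice of genetic code matters, the sketch below scans a sequence for in-frame stop codons under the codes listed in Table 5. The numeric keys are the NCBI translation table identifiers; the function name is hypothetical, and the sequence is assumed to be already in the correct reading frame unless a frame offset is given.

```python
# Stop codons for the genetic codes listed in Table 5,
# keyed by NCBI translation table id.
STOP_CODONS = {
    2: {"TAA", "TAG", "AGA", "AGG"},   # vertebrate mitochondrial
    5: {"TAA", "TAG"},                 # invertebrate mitochondrial
    9: {"TAA", "TAG"},                 # echinoderm and flatworm mitochondrial
    11: {"TAA", "TAG", "TGA"},         # bacterial, archaeal and plant plastid
    13: {"TAA", "TAG"},                # ascidian mitochondrial
    14: {"TAG"},                       # alternative flatworm mitochondrial
    21: {"TAA", "TAG"},                # trematode mitochondrial
}


def internal_stops(seq: str, code_id: int, frame: int = 0) -> list:
    """Return the 0-based nucleotide positions of in-frame stop codons."""
    seq = seq.upper()
    stops = STOP_CODONS[code_id]
    return [i for i in range(frame, len(seq) - 2, 3) if seq[i:i + 3] in stops]
```

The same codon can therefore be harmless under one code and a stop under another: AGA is a stop codon in the vertebrate mitochondrial code but not in the invertebrate one, and TGA is a stop codon in the plastid code but not in the mitochondrial codes above.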
The protein translation of a gene starts with a start codon and ends with a stop codon. In
the standard genetic code, the canonical start codon is ATG, which codes for methionine (some
mitochondrial codes also accept alternative start codons). The reading frame
starts with the letter “A” of the start codon. If it is started from the second letter of the start codon
30
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG11
(which is T), the translation will be out of frame, producing incorrect amino acid reads. Since the
barcode region does not start at position 1 of the gene, some adjustments are required during
translation.
For instance, if working with COI sequences in MEGA and going directly to translation, the first
attempt would result in the appearance of many stop codons (Figure 17) indicating a frame shift.
Figure 17. Translation of an alignment of COI sequences in MEGA 7. Before codons are translated into
amino acids, the software will require the selection of the correct genetic code. After translation, this example
shows a large number of stop codons (symbolized by ‘*’ in the software). This result indicates an incorrect
reading frame (i.e., the first nucleotide of the alignment is not the first nucleotide of the codon). In cases of
ambiguities in nucleotides (symbolized by ‘N’), affecting the inference of the correct amino acid, MEGA 7
will display a ‘?’ for that codon.
To find the correct reading frame, one option is to temporarily delete the first nucleotide of the
alignment and then translate (Figure 18). Upon satisfactory inspection of the amino acid output,
the alignment would be reverted to the original form (‘Undo’ button). Another option is to
temporarily insert two columns of gaps before the first nucleotide and then translate (Figure 19).
Again, upon satisfactory inspection of the amino acid output, the alignment would be reverted to
the original form.
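The search for the correct reading frame can also be automated by translating the sequence in all three frames and keeping the frame with the fewest stop codons. The sketch below assumes the invertebrate mitochondrial code (NCBI table 5) and uses hypothetical function names; codons containing gaps or ambiguous bases are shown as '?', in the same spirit as the MEGA display described above.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the invertebrate mitochondrial code (NCBI table 5), listed
# in the conventional TCAG codon order; '*' marks a stop codon.
INVERT_MITO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
               "IIMMTTTTNNKKSSSS" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), INVERT_MITO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a nucleotide sequence starting at the given 0-based offset."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "?")
                   for i in range(frame, len(seq) - 2, 3))


def best_frame(seq: str) -> int:
    """Return the offset (0, 1 or 2) whose translation has the fewest stops."""
    counts = [translate(seq, frame).count("*") for frame in range(3)]
    return counts.index(min(counts))
```

An offset of 1 corresponds to deleting the first nucleotide before translation; an offset of 2 corresponds to deleting the first two nucleotides. If every frame contains stop codons, the sequence itself (or the chosen genetic code) should be questioned rather than the frame.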
Figure 18. Translation of an alignment of COI sequences in MEGA 7. The first nucleotide of the alignment
is temporarily deleted (highlight the entire first column -> Delete). After translation and checking for stop codons,
the action is reverted (hit ‘Undo’).
Figure 19. Translation of an alignment of COI sequences in MEGA 7. Two columns of gaps are inserted at
the beginning of the alignment (select the first column, then press the gap symbol on the keyboard twice;
gaps are symbolized by ‘-’). Again, after translation and checking for stop codons, the action is reverted
(hit ‘Undo’). Note: if only one gap is inserted, the frame is still incorrect and will result in the display of many
stop codons.
Once the verification for stop codons is complete, the sequence is compared against a database
of DNA sequences. This action will find the best matches in terms of taxonomy (i.e., an
unidentified sequence will receive a taxonomic identification, depending on completeness of
database) and will highlight cases of misidentification and/or contamination.
The most common tool for this verification is the Basic Local Alignment Search Tool (BLAST31) in
GenBank. This web-based tool locates and displays regions of similarity between biological
31
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal
of Molecular Biology, 215(3):403-10 (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
sequences. Nucleotide and amino acid sequences are compared with those in the databases
(DDBJ, GenBank, ENA) and the statistical significance of sequence matches is calculated. The four
types of BLAST discussed here are: 1) Nucleotide BLAST (blastn32), which compares a
nucleotide query with nucleotide sequences (Figure 20); 2) blastx, which translates the nucleotide query
into protein and compares it with protein sequences; 3) tblastn, which compares a protein query with the
nucleotide sequences in the database translated into protein; and 4) Protein BLAST (blastp), which
compares a protein query with protein
sequences in the database. A standard BLAST search will reveal Max score, Total score, Query
cover, E value, Identity, and Accession no. of the match (if available) (Figure 21). It also generates
other reports that include Search Summary, Taxonomy reports, Distance tree of results, and MSA
viewer (multiple sequence alignment viewer). The results can be downloaded if needed. Batch
BLAST searches can be easily performed by uploading a fasta file or pasting multiple sequences
in the query box.
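For batch searches, the downloaded results are easier to review once the best match per query has been extracted. The sketch below assumes the results were saved as a tab-separated hit table with the twelve default columns of BLAST tabular output (query id, subject id, % identity, alignment length, mismatches, gap openings, query start and end, subject start and end, E value, bit score). The function names and the identifiers used in any example data are hypothetical.

```python
# Default column order of BLAST tabular output (assumed for the input file).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_hits(text: str) -> list:
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        for key in FLOAT_FIELDS:
            row[key] = float(row[key])
        hits.append(row)
    return hits


def best_hit_per_query(hits: list) -> dict:
    """Keep, for each query, the hit with the highest bit score."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

The best hit is only a starting point: a high identity to an unexpected taxon may indicate misidentification or contamination, and the remaining matches should still be examined.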
BLAST is also available in some sequence editing software where a (single) sequence can be
compared to GenBank through a direct link which will land on the query page of GenBank (Figure
20).
32
Zhang Z, Schwartz S, Wagner L, Miller W (2000). A greedy algorithm for aligning DNA sequences.
Journal of Computational Biology 7(1-2):203-14.
(https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome)
Figure 20. Comparison of a DNA sequence against a database through BLASTN. The main steps are
circled in red (query box to paste the unknown sequence, choice of database and program selection).
Figure 21. Output of a query with BLASTN. There are three main sections with results: Graphic Summary
(with colour-coded alignment scores between query and the closest 100 matches in the database),
Descriptions (a list of 100 matches with taxonomic identification, query coverage, E-value and % identity)
and Alignments (query sequence aligned with each one of the closest 100 matches). Additional reports can
be investigated, the results can be downloaded and instructions on data interpretation are given through a
YouTube video (see the information circled in red).
BOLD
A similar tool is available in BOLD and is called BOLD Identification Engine33 (ID Engine), allowing
single or batch queries of the barcode database.
Comparing single sequences to BOLD does not require a BOLD user account. As opposed to
BLAST, where any DNA sequence can be compared against the database, BOLD has been built
specifically as a platform for DNA barcoding (in particular animal barcoding with COI). Therefore,
the ID Engine allows only sequences belonging to the official barcode markers to be identified.
For animal COI, there is one important choice to make, namely the database used for queries
(Figure 22).
Depending on the goal, the most common choices are either the entire database (including
records identified to coarse taxonomic level), to see the closest match to a sequence, or the
database containing only records with species names (when the goal is to find the closest species
33
http://v4.boldsystems.org/index.php/IDS_OpenIdEngine
to the query). The sequence is pasted into the query box (Figure 22) and the output opens in a
new window (Figure 23).
Figure 22. BOLD ID Engine with a single sequence queried against the Species Database. The tool can be
accessed directly from BOLD home page through the top links (“Identification”) without a user account.
Figure 23. Output page of results from the BOLD ID Engine. The query sequence had a 100% match in
BOLD and due to the high number of sequences all confirming the same species name, the identification
is considered robust. The results (100 closest sequences to the query) can be visualized on a tree (query
sequence in red, mined data from GenBank in blue) together with images corresponding to each record in
order of appearance in the tree. The results can be downloaded.
Data validation is a crucial step in DNA barcoding since the entire method relies on the existence
of reliable databases against which unknown sequences are compared and species
identifications are made. BOLD and external software offer tools to help users validate and curate
their data (Figure 24 and the following section, BOLD Analytics).
Figure 24. Main steps for validation of molecular data. AA – amino acids; NJ – neighbour-joining; ID –
identification; BIN – Barcode Index Number; 1 – NJ trees require more than 3 sequences; 2 – BINs are
assigned once a month (in 2018); 3 – Full BIN discordance report should be performed in BOLD 3.5 (in
BOLD 4, the BIN discordance report takes into account only records within a project not the entire database;
June 2018).
BOLD Analytics
BOLD is a workbench where users upload data (see Annex X), analyze it and publish it. Once
published, the barcode data enter the public database and become freely accessible to anyone
for (re)use in various studies.
BOLD is organized in projects led by project managers and built together with their
teams/collaborators. Projects hold records, each of which has two parts: 1) Specimen
Data - related to the physical specimen (designated by a unique Sample ID); 2) Sequence Page -
associated molecular data (designated by a Process ID). Data submission consists of four steps:
specimen data (validated by the BOLD team), followed by images, traces and sequences (submitted
directly by users, with no extra validation) (Annex X).
Data validation
Once a project is populated with records, data can be validated in BOLD (some steps are similar
to the previous sections on data validation with external software).
Figure 25. Most of the tools available for data validation and analysis in a project are situated on the
left-side console. By expanding ‘Sequence Analysis’ and ‘Aggregate Data’, a suite of tools will be displayed and
can be selected with a mouse-click.
1. Upon specimen data upload, a ‘Distribution Map’ (Figure 25) can be built to check the
accuracy of the GPS coordinates in the project (Figure 26). The map can be opened in
GoogleEarth34 for a more detailed view of localities.
• If one or only a few errors are observed, they can be corrected manually (through the
Specimen Page).
• If there are many errors, an update can be submitted to the BOLD team (see Annex X).
Figure 26. Screenshot of a distribution map for records belonging to a BOLD project (the red circle shows
the link to GoogleEarth).
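Some coordinate errors can be caught before upload, which reduces the number of corrections needed after the map is inspected. The sketch below is a simple pre-check, assuming coordinates are stored in decimal degrees and keyed by Sample ID; the function name and the Sample IDs used with it are hypothetical.

```python
def coordinate_problems(records: dict) -> dict:
    """Return {sample_id: reason} for records whose coordinates look wrong.

    `records` maps a Sample ID to a (latitude, longitude) pair in decimal
    degrees; a missing value is represented by None.
    """
    problems = {}
    for sample_id, (lat, lon) in records.items():
        if lat is None or lon is None:
            problems[sample_id] = "missing coordinate"
        elif not -90 <= lat <= 90:
            problems[sample_id] = "latitude out of range"
        elif not -180 <= lon <= 180:
            problems[sample_id] = "longitude out of range"
        elif lat == 0 and lon == 0:
            problems[sample_id] = "0,0 placeholder"
    return problems
```

Such a check does not replace the distribution map: a wrong sign or swapped latitude and longitude often still produces valid numbers, and only shows up as a point in the wrong country or in the ocean.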
2. Upon image upload, an ‘Image Library’ (Figure 25) can be built to verify that no mix-up
occurred during imaging or during submission (Figure 27).
• If an erroneous image has been uploaded, it cannot be corrected by the user. Instead,
BOLD support ([email protected]) should be contacted with a request to delete
the image. Once the record is cleared, the correct image can be uploaded.
3. Upon trace upload, if an error is observed, such as mix-up of traces between records, an
email should be sent to BOLD support to delete traces. Once the record is cleared, the correct
trace(s) can be uploaded.
34
https://www.google.com/earth/
Figure 27. Example of an image library in BOLD used to verify the taxonomic identification and possible
mix-ups of images during upload.
4. Upon sequence upload, a notification appears immediately if stop codons are present in the
sequence35 or if the sequence matches the contaminant database (bacteria, human, mouse,
pig, etc.) (Figure 28). In such cases, records are flagged immediately and removed from the
database used for the ID Engine.
Figure 28. Built-in tools for data validation. Contaminants are flagged upon sequence upload to BOLD.
Similarly, sequences with stop codons are immediately detected, flagged and removed from the ID Engine.
35
The genetic code is chosen automatically by BOLD based on the taxonomy associated with the record,
hence the importance of correct taxonomic assignment at least to phylum level.
5. After sequence upload, the Batch ID Engine (Figure 25) can be used in BOLD 4 to run all
sequences in a project against one of two databases: all barcodes or only barcodes with
species names (Figure 29). The results can be emailed to the user and will contain a
spreadsheet with 100 closest matches for each sequence queried. If errors are observed,
action should be taken:
• If a misidentification occurred, update taxonomy (either manually through the Specimen
Page or in batch through the update submitted to the BOLD team);
• If contamination (between samples or with other contaminants) occurred, request to flag
records by sending an email to BOLD support.
Note: It is crucial to take action to correct data (or flag it, as deemed suitable) since the entire
DNA barcoding approach is based on the existence of reliable DNA libraries. If errors are left
uncorrected, there will be an impact on the activities of the user community at large.
Figure 29. Batch ID Engine allows an entire project to be compared with the full/species database. Several
filters are available. Results should always be emailed to the user for efficient time management.
Data analysis
The following analytical tools can be used both for data validation and data analysis.
1. Neighbour-joining (NJ) trees can be built from the Sequence Analysis console by selecting
Taxon ID Tree (Figure 25). Various parameters (alignment type, genetic distance model) and
branch labels can be chosen by users (Figure 30). The NJ tree allows a relatively rapid
(depending on the amount of data) assessment of barcode clusters along the tree, therefore
highlighting cases that require additional investigation (e.g., occurrence of multiple species
names in one cluster or one species split into multiple clusters). Trees can be downloaded as
pdf, newick or postscript files.
If errors are observed, action should be taken:
Figure 30. Taxon ID settings: only the alignment type is a mandatory field (Note: for large datasets choose
BOLD Aligner). For the other fields, BOLD will use default settings (process ID and lowest taxonomic level
available) unless otherwise specified by the user. The tree can be accompanied by an image library and a
spreadsheet with specimen details. If the data to be analysed is large, the option of emailing the results
should be chosen.
Note: NJ trees are useful tools for data validation and barcode visualization, but they should not be
assumed to reflect the true phylogenetic relationships between taxa.
2. Distance analysis of DNA sequences can be performed by selecting Distance Summary on
the Sequence Analysis console (Figure 25). Various parameters are available for users
(again, the most important one being the alignment option). The results window shows the
range of distance values (as %) within and between species as well as histograms for these
values (Figure 31). High intraspecific values (>3%) might be indicative of potential cryptic
species or misidentification (in c). In case of potential cryptic species, additional molecular
markers as well as morphological, behavioural, ecological etc. data are needed to clarify the
taxonomic position of the existing genetic clusters.
Figure 31. Results of the distance analysis between DNA sequences held in a BOLD project. Divergences
within species and between congeneric species are displayed in a table format and plotted as histograms.
All values of pairwise comparisons can be downloaded in a spreadsheet.
3. The Barcode Gap Analysis provides a quick view of all interspecific divergences in a project,
highlighting cases which require additional investigation (intraspecific distances higher than
distances to the nearest neighbour species). Users are able to choose parameters, with the
alignment option being the only mandatory field. The results window shows distance values
(maximum intraspecific and minimum interspecific) displayed as various scatterplots and
summarized in a table format (Figure 32). If errors (misidentification, contamination) are
observed, action should be taken (see sections above).
Figure 32. Results of the barcode gap analysis in a BOLD project. Scatterplots are built for various analyses:
maximum/mean intraspecific distance vs. distance to the nearest neighbour species, individuals per
species vs. maximum intraspecific distances. Dots below the red line refer to species which need further
investigation (nearest neighbour is closer than the maximum intraspecific value). Details for each species
are provided in a table format and can be downloaded in a spreadsheet.
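The logic behind the distance and barcode gap analyses can be sketched outside BOLD as follows. This is a simplified illustration only: it uses uncorrected p-distances on sequences that are assumed to be already aligned, whereas BOLD lets the user choose the alignment type and genetic distance model, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(records: list) -> dict:
    """records: list of (species, aligned_sequence) pairs.

    Returns {species: (max_intraspecific, nearest_neighbour_distance)};
    a value is None when it cannot be computed (e.g. a single record).
    """
    max_intra, min_inter = {}, {}
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, 1.0), d)
    species = {sp for sp, _ in records}
    return {sp: (max_intra.get(sp), min_inter.get(sp)) for sp in species}


def needs_review(gaps: dict) -> list:
    """Species whose maximum intraspecific distance reaches or exceeds the
    distance to the nearest neighbour species (no barcode gap)."""
    return sorted(sp for sp, (intra, inter) in gaps.items()
                  if intra is not None and inter is not None and intra >= inter)
```

Species returned by `needs_review` correspond to the dots below the red line in Figure 32 and are candidates for misidentification, contamination or unresolved taxonomy.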
4. The Barcode Index Number (BIN)36 is a useful tool for cataloguing life in the absence of
taxonomy. The system is based on a unique algorithm implemented in BOLD which clusters
all COI sequences into groups with unique identifiers (3-letters-4-numbers codes:
BOLD:ABC1234). Each BIN receives a unique page in BOLD, compiling all the information
available for the member records. Due to high concordance between BINs and morphological
species, observed so far, BINs can be considered as proxies for animal species. Each record
in a project would display a link to the BIN page on the project console (Annex X).
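Because BIN identifiers follow the fixed pattern described above (the prefix BOLD: followed by three letters and four numbers), their format can be checked before they are used in spreadsheets or publications. The sketch below assumes upper-case letters, as in the example BOLD:ABC1234; the function name is hypothetical.

```python
import re

# "BOLD:" followed by three upper-case letters and four digits.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def is_bin_id(text: str) -> bool:
    """Return True if the text has the form of a BIN identifier."""
    return BIN_PATTERN.fullmatch(text.strip()) is not None
```

The check only confirms the format of the identifier; whether the BIN actually exists can only be verified in BOLD.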
Some short sequences (<500bp) are not part of any BIN (unless they are very similar to existing
sequences which already received a BIN assignment). In these cases, the Cluster Sequences
tool (Figure 25) will group all DNA sequences from a project in Operational Taxonomic Units
(OTUs) (Figure 33). Although the algorithm used is very similar to BINs, the main differences are
the following: 1) OTUs are temporary units which do not receive persistent pages in BOLD (data
36
Ratnasingham S, Hebert PDN (2013). A DNA-Based Registry for All Animal Species: The Barcode Index
Number (BIN) System. PLoS ONE 8(8): e66213.
can be downloaded as spreadsheet and used in other software); 2) OTUs are project-based while
BINs are based on the entire BOLD.
Figure 33. Results of clustering into OTUs in a project. Mean and maximum values for OTUs as well as the
distance to the nearest OTU are displayed and can be downloaded (green button). OTUs are numbered
but are not persistent.
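The general idea of grouping sequences into OTUs can be illustrated with a very simple procedure. The sketch below uses single-linkage clustering at a fixed distance threshold on aligned sequences; it is NOT the algorithm implemented in BOLD, and the 2% threshold, the use of p-distances and the function names are assumptions made purely for illustration.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, ignoring gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster_otus(seqs: dict, threshold: float = 0.02) -> list:
    """Single-linkage clustering of {record_id: aligned_sequence}.

    Two records end up in the same OTU if they are connected by a chain of
    pairwise distances that are all at or below the threshold.
    """
    ids = sorted(seqs)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if p_distance(seqs[a], seqs[b]) <= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

As with the OTUs produced in BOLD, the groups returned here are temporary and depend entirely on the set of sequences supplied: adding or removing records can merge or split clusters.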
Besides serving as an analytical tool that provides project statistics and highlights cases of
potential cryptic speciation, BINs play an important role in data validation. The BIN Discordance
Report (Figure 25) provides an overview of the (dis)agreement between the data within a project
and the rest of BOLD (Figure 34). If errors (misidentification, contamination) are spotted
in the project, action should be taken (as mentioned above).
Figure 34. BIN discordance report for a BOLD project (in BOLD 3.5). For the discordant BINs, the rank of
discordance and details on conflicting records are mentioned. Results can be downloaded (green button).
Note: The report includes only sequences assigned to a BIN. This report can currently be built
only in BOLD 3.5 (BOLD 4 provides a discordance report only within a specific project).
The BIN algorithm runs once a month (as of June 2018); therefore, BINs are not assigned automatically
upon sequence upload.
Data publication
BOLD holds data in projects. However, data can be partitioned (a subset of data from one project)
and mixed (subsets of data from different projects) in datasets (Figure 35). Datasets are virtual
copies of projects that allow data recycling (the use of the same data in multiple
studies). Datasets have the same options as projects and can be analyzed with the same tools
as the ones mentioned above. Once data is ready for publication, the user can release the
entire dataset in the BOLD Public Database as well as in GenBank. At the same time, a DOI can be
obtained for that particular dataset and included in the publication (the DOI link launches the
dataset directly from the publication where it is mentioned).
Figure 35. Workflow for publishing datasets. Data should be submitted to GenBank through Publication ->
Submit to GenBank (yellow rectangles). The dataset should be publicly released in BOLD through Dataset
Options -> Modify Dataset Properties -> Make this dataset publicly visible (check the box; red rectangles)
-> Save. A new window appears where a request for DOI can be made.
An overview of the analytical steps required for data validation, analysis and publication in BOLD
is presented in Figure 36.
For more details on data validation, analysis, and management, see the BOLD handbook
(http://www.boldsystems.org/index.php/resources/handbook?chapter=7_validation.html).
Figure 36. Main workflow and analytical steps required for data validation, analysis and publication in BOLD. NJ – neighbour-joining; BIN – Barcode
Index Number; OTU – Operational Taxonomic Unit. Additional analytical tools, not presented here, are available depending on the interest of the
user. Once data is ready to be published, it needs to be submitted to GenBank and publicly released in BOLD (datasets with DOI are a convenient
option).
Figure 37. Scaling up the DNA barcoding workflow involves switching from working with individual sample
tubes (left) to processing 96-well microplates.
37
Robotic liquid handling allows consolidating four 96-well plates into a single 384-well plate which allows
dramatic reduction of the volume of sequencing reactions, providing considerable savings in labware and
reagent cost; however, 384-well plates are not human-manageable.
• Each sample needs to be accurately mapped within the plate and matched to its source
specimen;
• The amount of tissue in each well has to be standardized across the entire plate and small
enough to allow using small reagent volumes and the same PCR primers;
• Organisms have to be grouped based on their taxonomy and other properties, so that the
same tissue lysis and DNA extraction protocols, as well as PCR primer sets, can be used
across the entire 96-well plate;
• The procedure of assembling the samples into the microplate has to be time-efficient, to
ensure that it does not slow down the overall process.
These challenges are further complicated by the fact that the collection materials from which the
tissues originate may come from different sources (e.g., museums and field collection operations),
using different forms of preservation, and with provenance information provided in diverse
formats. In addition to slowing down the process of generating DNA barcodes, this can also
increase the chance of human error. In this context, arraying is a crucial component of the front-
end logistics.
Figure 38. The 96-well microplate (left) and its topographic layout. The 96-well microplate is typically used
to array tissue samples and DNA extracts. It contains conical-shaped sampling wells approximately 100 µl
in volume that are arranged in a 12×8 format (12 columns and 8 rows) within a rectangular-shaped plastic
skirt. Typically, well A01 (marked in green) is the start of the array; red well H12 is left empty as a negative
control. Note the label with an alphanumeric barcode on the front of the plate, which distinguishes it from
other plates and allows it to be tracked through different stages of processing and analysis. This label is
affixed prior to any sampling activities.
Array Types
Different types of arrays may be used, depending on the nature of the organisms processed
(Figure 39). When working with small fluid-preserved specimens, tube or vial racks can be used;
for small dry-mounted specimens (e.g., pinned or pointed insects), gridded pinning boxes can be
custom-made. For specimens that are too large to array (e.g., vertebrate vouchers or plant
herbarium specimens), tissue samples are usually taken and stored separately; these tissues can
also be arrayed in tube racks, prior to subsampling. Typically, arrays are two-dimensional, but in
cases of large or awkwardly-shaped samples (e.g., plant tissue in desiccant), this is not practical.
In these instances, the use of linear arrays allows the technician to keep track of the order of samples,
while taking up a reasonably small operational footprint.
Figure 39. Examples of arrays: (A) scintillation vial rack; (B) tissue tube rack; (C) pinning box; (D) tray with
bags containing plant tissue in desiccant. For images A-C, note the empty locator corresponding to the
control well (H12).
It is critical to keep a record of the position of specimens in the array and the corresponding
samples. Each specimen and resulting sample needs to be assigned a unique identifier, or
Sample ID and these identifiers should be associated with the locators within the array,
determined by the combination of the corresponding row (A-H) and column markers (01-12).
Rather than entering the data directly into a 12×8 matrix, it is preferable to use custom spreadsheet
templates in which sample data are entered in sequential order and a printable 12×8 layout is
generated from the entries using formulas. It is important to have a printout
of the assigned array map before initiating the processing of specimens, so that it can be
cross-referenced against specimen labels to ensure correct placement of each sample.
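The conversion from a sequential sample list to plate locators is straightforward to reproduce. The sketch below assumes that the plate is filled row by row (A01, A02, ... A12, B01, ...) and that H12 is reserved as the negative control; the fill order is an assumption (some laboratories fill column by column), and the function names are hypothetical.

```python
ROWS = "ABCDEFGH"      # 8 rows
N_COLS = 12            # 12 columns
CONTROL_WELL = "H12"   # left empty as the negative control


def well_locator(index: int) -> str:
    """Locator of the sample with the given 0-based position in the list,
    assuming the plate is filled row by row."""
    if not 0 <= index < len(ROWS) * N_COLS - 1:
        raise ValueError("a plate holds 95 samples; H12 is the negative control")
    row, col = divmod(index, N_COLS)
    return f"{ROWS[row]}{col + 1:02d}"


def array_map(sample_ids: list) -> dict:
    """Map each locator to its Sample ID, in the order the samples are listed."""
    return {well_locator(i): sid for i, sid in enumerate(sample_ids)}
```

With this layout, the 95th sample lands in well H11 and a 96th sample is rejected, which keeps the control well empty.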
Arraying can help streamline all bulk processing stages:
• Sorting – transferring specimens from bulk samples and organizing them into arrays;
• Labelling – affixing individual labels with globally unique alpha-numeric coding;
• Imaging – digital photography of specimens;
Below is an example of a custom spreadsheet template (Excel file) used for generating array
maps. Data are entered in the tab (worksheet) titled “DATA INPUT”. Worksheet filling instructions
are typed in green italics, and red-on-yellow warning messages are displayed if information is
missing, duplicated or does not meet the minimal standards. Data are entered into the white cells,
which change colour once filled.
The type of sampling container is selected from the dropdown field and an image of the container
is displayed to allow the user to verify correct entry. The “Multiple Containers” checkbox, if
activated, displays a columnar list of fields to enter up to ten container numbers. Entering the
container names un-hides the corresponding fields for entering sample data in the left part of the
table. Each sample has to have a unique identifier that should correspond to the “Sample Locator”
field indicating its position within the corresponding container.
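The kind of warnings raised by the template (missing or duplicated entries) can be reproduced for data kept outside Excel. The sketch below is a generic illustration and does not describe the formulas used in the actual workbook; it assumes each entry is a (Sample Locator, Sample ID) pair, and the function name is hypothetical.

```python
from collections import Counter


def entry_warnings(rows: list) -> list:
    """rows: list of (sample_locator, sample_id) pairs.

    Returns warning messages for missing or duplicated Sample IDs and for
    locators that are used more than once.
    """
    id_counts = Counter(sid.strip() for _, sid in rows if sid and sid.strip())
    locator_counts = Counter(loc for loc, _ in rows)
    warnings = []
    for loc, sid in rows:
        if not sid or not sid.strip():
            warnings.append(f"{loc}: Sample ID missing")
        elif id_counts[sid.strip()] > 1:
            warnings.append(f"{loc}: Sample ID '{sid.strip()}' duplicated")
        if locator_counts[loc] > 1:
            warnings.append(f"{loc}: locator used more than once")
    return warnings
```

An empty list means that every locator holds exactly one uniquely identified sample, which is the condition the array map relies on.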
To confirm the correct sampling order, the user can refer to the “Array Map” (lower image) that
displays the two-dimensional localization of each of the samples entered. The “Array Map” sheet
displays one map at a time; the container can be selected from the dropdown menu in the top
right part of the sheet. The workbook does not allow entering any comments or additional data –
its sole purpose is to map the position of samples within the arrays. NOTE: All coloured (non-
white) cells in the CCDB Record workbook are write-protected to secure formulas and cross-links;
data can only be typed or pasted into white cells. When pasting data from another spreadsheet,
it should be pasted as ‘values’ or ‘unicode text’ using the ‘paste special’ function of MS Excel.
38
TDWG - http://www.tdwg.org/
39
GBIF - https://www.gbif.org/
40
Darwin Core Archive - https://github.com/gbif/ipt/wiki/DwCAHowToGuide
41
Note that in this context the term “species occurrences” could be better understood as documented
occurrences of organisms which have been attributed (e.g., by a taxonomic expert) to a certain species or
other taxonomic category.
42
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089606
ontological concept of collecting and the types of entities that collections deal with. It is also
important to distinguish between collecting activities that lead to the sourcing of biological
materials and the materials themselves, i.e., collection objects.
• History – a record of the persons and organizations (= ‘subjects’) undertaking the collecting
and subsequent processing of materials;
• Provenance – spatiotemporal and circumstantial properties of the collecting activities (=
‘actions’); and
• Attributes – intrinsic or relational properties of the biological materials (= ‘objects’) collected.
Below is a more detailed outline of the information that is typically recorded (see Annex II for details
on the Electronic Field Journal holding all these fields).
Authority – “why?”
This information provides the administrative context for the purpose (end goal) of collecting the
biological objects; it describes the entities responsible for the collecting effort and those who have
oversight, ownership or custodianship over the materials collected:
Actors – “who?”
Individuals involved in and responsible for the collecting and field processing activities:
Although not part of collection information per se, this group of data records the audit trail of the
collection as it undergoes different stages of processing, analysis, archival and curation. Keeping
a record of these processing stages is an important part of ensuring a robust operational workflow.
These are characteristic of the collecting event related to the way it was deployed and other factors
that may affect its outcome:
Attributes – “what?”
Unlike provenance information, which is logistically convenient to record for the entire collecting
event, attributes are, as previously noted, intrinsic or relational properties of the biological objects
being collected, although they may be recorded at the time of collection. It is therefore important
that lots and/or specimens for which an observation is made are grouped and labeled in a way
that allows tracing such information. Two major groups of properties could be defined.
Biological properties
Ecological
These properties describe the collection object’s relations to other organisms or abiotic
factors/resources observed at the time of collecting that may provide insights into its ecological
role or function (e.g., parasite, symbiont, host) or other peculiarities.
Organismal
These properties describe the innate characteristics of a particular organism; examples include:
Taxonomic properties
Taxonomic identification
This is a result of a curatorial act – when the specimen has been examined by an expert and
assigned to a certain taxon.
Taxonomic position
This information is derived from a nomenclatural paradigm and refers to the currently accepted
placement of a taxon within the broader hierarchy and the preferred nomenclature to be used.
TIP: If you are submitting data to the same BOLD project in several blocks, corresponding
datasets could be denoted as “_submission1”, “_submission2”, etc. Denoting the actual BOLD
project code is not mandatory; however, it helps keep track of where the data are being uploaded.
Ergonomics
Imaging large batches of specimens in an awkward posture can lead to serious strain injuries
(particularly to the neck and back); therefore, ergonomic workplace organization is a key part of
studio design, especially in the case of macro photography. The relative position of the table(s),
camera tripod and photographer’s seat should be such that the camera viewfinder is at eye level
(no need to hunch down or extend the neck) and that the entire specimen array is within
comfortable reach. Special care should be taken to avoid leaning, hunching, neck flexion or
extension and unsupported arms in static posture. Whenever logistically feasible, real-time
camera feed onto a monitor is recommended.
Specimen security
Another important consideration in organizing the imaging protocol is maintaining the integrity of
specimens (e.g., not detaching them from their labels) and avoiding damage. Studio setup should
take into account this requirement.
Avoiding errors
Having the specimens pre-arrayed and labeled prior to imaging helps streamline the imaging
process. It is recommended either to use two separate arrays (one for ‘pre-imaging’ and one for
‘post-imaging’) or to change the orientation of the specimen in the array once it has been
photographed. It is useful to have a printout of the array map and to confirm individual specimen
numbers between the map and the specimen or vial labels. When specimens are too small for
labeling, their number can only be inferred from their position (locator) within the array; therefore,
it is critical to ensure that the specimens are taken from and placed into the correct array locator
(e.g., plate well).
It is often convenient to set up a system where the camera and/or stage can be easily moved,
e.g., to exchange the specimen or to change the focal distance to attain focus and/or proper
framing. For horizontal setups (when the lens axis is positioned horizontally), it is usually preferred
to have a camera mounted statically on a tripod, while the stage could be moved relative to the
lens across the tabletop.
Figure 40. Example of a macro photo setup for pinned insects. The digital SLR camera is mounted on a
tripod; a ring macro flash is attached to a 60 mm macro lens. The cone of white paper visible in front of the
flash acts as a diffusor that minimizes glare. The specimen is pinned onto a double-layered piece of white
fabric. There is an off-camera flash behind the fabric that is synchronized with the on-camera ring flash –
when activated, it over-exposes the fabric, creating the perception of a pure-white background.
Lighting
True to the origin of the term ‘photography’ (‘writing [drawing] with light’ in Greek), the key to taking
a successful image is to ensure appropriate lighting conditions. Although modern digital cameras
have sophisticated algorithms for exposure metering and image post-processing, it is still
important to ensure good quality lighting that provides sufficient and well-balanced exposure,
allowing the camera to capture an accurate depiction of the outline, surface profile, colouration
and texture of the specimen and its distinguishing features. It is best not to rely on ambient lighting
and to use one or more artificial light sources with good white balance. Traditionally, off-camera
flashes were used for this purpose; however, high-quality LED light fixtures are increasingly used.
Figure 41. Examples of photographs with different exposure. Images of a fluid-preserved spider taken
using an SLR with macro lens. A: underexposed image (background clearly visible, but specimen is too
dark, specimen details obscured); B: overexposed image (background and specimen too light, specimen
details obscured); C: image with adequate exposure (background overexposed, but specimen morphology
adequately visible).
Exposure
The amount of light emitted by flashes or other off-camera sources should be sufficient to allow a
high f-stop (narrow aperture) and high shutter speed with regular ISO settings (image sensor
photosensitivity, formerly known as ‘film speed’). It is best to set the camera to manual shutter
and aperture setting and to set ISO to 100 or 200.
Directionality of lighting
Reducing glare is one of the biggest challenges when using artificial lighting; glare is often evident
when the light is emitted through a single source located in proximity to the camera and is bounced
from the object’s surface back into the lens. Glare is particularly evident in glossy objects, such
as certain insects (e.g., beetles), and specimens removed from liquid, but can also be observed
in other cases, e.g., birds with iridescent feathers. Rather than relying on the built-in camera flash,
it is best to use a lens-mounted macro (e.g., ring macro) flash which emits light from a larger area.
It is even better to set up a diffusor in front of the flash and set several additional light sources or
reflectors around the object to project light at it from several directions, akin to a ‘soft box’ used in
professional photo studios. This lighting setup will remove high contrast shadows and will highlight
the texture of the specimen.
Figure 42. Examples of glare on specimen images. Images of beetles with glossy forewings taken using a
ring macro flash. A: no light diffusor on the lens/flash – light from the flash is reflected from glossy elytra;
B: diffusor (cone of white paper) positioned around the object in front of the flash – lighting is more evenly
distributed, and the absence of evident glare allows for better perception of the texture of the beetle’s elytra.
Depth of field
Typically, biological objects are three-dimensional. Note that most cameras have a built-in ‘macro’
setting which is designed to create “artistic blur” around a shallow portion of the object, thereby
leaving most of it out of focus. This setting is useless for technical photography of biological
objects, where one should aim to attain maximum depth of field. In an SLR-type camera, this can
be achieved by setting a narrow aperture (usually marked as ‘A’ in camera settings),
corresponding to a large ‘f-stop’ (values of f11 or higher). Note that narrow aperture settings
require high intensity lighting to provide adequate exposure. Also note that very high f-stops
(usually, above f13) can lead to a visible diffraction effect that softens the image and obscures
detail. It is best to balance the need for a high depth of field with the need to
maintain image sharpness.
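The trade-off described above can be estimated with two standard optics relations: each full stop halves the light reaching the sensor, so closing the aperture from f-number N1 to N2 costs 2·log2(N2/N1) stops, and the diffraction (Airy disk) diameter is approximately 2.44·λ·N. The sketch below is illustrative only; the 550 nm wavelength (green light) is an assumption, and the figures ignore lens-specific behaviour and the effect of magnification at macro distances.

```python
import math


def stops_lost(n_from, n_to):
    """Stops of light lost when closing the aperture from f/n_from to f/n_to."""
    return 2 * math.log2(n_to / n_from)


def airy_disk_um(f_number, wavelength_nm=550.0):
    """Approximate Airy disk diameter in micrometres (2.44 * wavelength * N)."""
    return 2.44 * (wavelength_nm / 1000.0) * f_number


for n in (2.8, 13, 32):
    print(f"f/{n}: {stops_lost(2.8, n):.1f} stops less light than f/2.8, "
          f"Airy disk ~{airy_disk_um(n):.0f} um")
```

Under these assumptions the estimated blur spot at f32 is roughly two and a half times wider than at f13, which is in line with the loss of detail shown for the smallest aperture in Figure 43.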
Sharpness
In order to depict as many characters of the specimen as possible, the image should be “crisp”,
or sharp. High detail necessitates, first of all, precise focus on the object. By default, most
cameras use automatic electronically-guided focus; when using it, ensure that the camera
focusses on the object and not the background.
Motion blur can drastically decrease image quality; it is caused by lens movement relative to the
object during exposure. To avoid motion blur, select high shutter speed settings (usually marked
as ‘S’) of 250 or more (corresponding to 1/250 of a second) and/or use a tripod to stabilize the
camera. When using flashes, ensure that the shutter speed does not exceed the maximum
allowed by flash synchronization.
High ISO (image sensor photosensitivity) settings also introduce image noise, making the image
look “grainy”; this can sometimes happen if the camera ISO is set to ‘auto’ by default. It is best to
set ISO manually to 100 or 200, as directed above.
Distortion
Typically, distortion is not an issue when using macro lenses, but may be noticeable if wide-angle
lenses are used and set to the smallest focal length (e.g., 35 mm or less). Images may also be
distorted as a result of digital post-processing, e.g., merging of multiple photos. For example, this
may happen when the Z-axis stacking feature is used in microscopy. It may be a reasonable trade-
off if the specimen is too large to be imaged by a regular lens from a certain aspect or if additional
detail is recorded during stacking or merging of several images.
The object should occupy most of the frame, leaving relatively small margins. It is also important
to ensure that parts of the specimen are not “cut off” by the image frame. The longest axis of the
specimen should be parallel to the longest side of the frame.
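Framing can also be handled at the cropping stage. The sketch below is only an illustration and not part of the manual's workflow: it computes a 4:3 crop rectangle around a specimen's bounding box with a small margin, choosing landscape or portrait so that the long side of the frame follows the longest axis of the specimen. The function name, the 5% margin and the pixel coordinates are assumptions.

```python
def crop_box(bbox, image_size, margin=0.05, ratio=4 / 3):
    """Return (left, top, right, bottom) of a crop framing the specimen.

    bbox       -- specimen bounding box (left, top, right, bottom) in pixels
    image_size -- (width, height) of the original image
    margin     -- extra space on each side, as a fraction of specimen size
    ratio      -- long side divided by short side of the crop
    """
    left, top, right, bottom = bbox
    width, height = right - left, bottom - top
    landscape = width >= height
    # Specimen size plus margins on both sides.
    need_w, need_h = width * (1 + 2 * margin), height * (1 + 2 * margin)
    aspect = ratio if landscape else 1 / ratio   # crop width / crop height
    # Grow whichever dimension is too small so the aspect ratio is exact.
    crop_w = max(need_w, need_h * aspect)
    crop_h = crop_w / aspect
    if crop_w > image_size[0] or crop_h > image_size[1]:
        raise ValueError("crop would extend beyond the original image")
    # Centre the crop on the specimen, then shift it back inside the image.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - crop_w / 2, 0), image_size[0] - crop_w)
    y0 = min(max(cy - crop_h / 2, 0), image_size[1] - crop_h)
    return (x0, y0, x0 + crop_w, y0 + crop_h)
```

Raising an error when the crop does not fit, rather than silently cutting into the specimen, reflects the requirement that no part of the specimen be cut off by the frame.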
Figure 43. Examples of photographs of the same specimen (muskrat skull) taken with different aperture
settings showing effect on depth of field and diffraction. Photos were taken with a digital SLR camera using
a 60 mm macro lens. Left: image of entire skull, view from below; right: close-up of part of the image. A –
largest aperture, f2.8 (low depth of field, only lower part of the skull in focus; no diffraction); B – medium-
small aperture, f13 (medium-high depth of field, with most of the skull in focus; no visible diffraction); C –
smallest aperture, f32 (high depth of field, entire skull in focus; high diffraction). Image B would be optimal
quality for DNA barcoding.
Figure 44. Examples of unframed (A), incorrectly framed (B) and correctly framed (C-D) photos of the same
specimen; image sizes reduced proportionally to original. A: original photo as recorded by the camera –
excessively wide margins around the specimen; B: original image incorrectly cropped in 4:3 ratio (landscape
layout) – specimen legs cropped out; C: original image correctly cropped in 4:3 ratio (landscape layout) –
all parts of the specimen retained in the image (note that only part of the label is within frame, but it is non-
essential); D: original image correctly cropped in 4:3 ratio (portrait layout).
Figure 45. Example of an image thumbnail set: when all images are oriented in the same direction, it is
easier for the human eye to pick out the differences that are not related to the positioning of the specimens.
It is important that uniform image orientation (landscape vs. portrait) and object position
(orientation and aspect) are maintained throughout the entire batch of morphologically similar
organisms, in order to allow batch comparisons of images.
There are no specific guidelines for background colour to be used in DNA barcode e-voucher
images; however, it is important to ensure that the background is uniform in colour and pattern
and does not obscure the outline of the object. Typically, white or black background is
recommended for most cases. A sheet of white paper or cloth (preferably, with an off-camera flash
mounted behind it) or black velvet could be used to attain the desired outcome. For best results,
the background should be far enough behind the object to be out of the camera’s focus. Note that
most automatic camera settings will try to attain a ‘center-weighted average’ image exposure,
meaning that the camera’s built-in exposure metering system will try to compensate for “over-
exposed” white background or “under-exposed” black background. To avoid this, we recommend
using manual shutter and aperture settings and ‘spot metering’ exposure settings, and exposure
compensation if available.
Figure 46. Choosing image background: comparison of the appearance of the same specimen on dark (A)
and white (B) background. When the specimen is photographed against black background, it often creates
a better perception of its surface texture; it also helps to pick up fine details of the outline, even if parts of
the specimen are dark. White background provides better representation of translucent parts of the
specimen.
Some imaging manuals recommend using a blue or green background, which is thought to facilitate
digital ‘cropping’ of the object from the background. We do not support this practice for several reasons:
• Many non-human biological objects naturally contain blue or green colours, thereby
complicating the digital cropping procedure;
• Intensely coloured background may skew the camera’s colour temperature perception or
exposure metering, or even alter the registered colouration of the specimen;
• The procedure adds an unnecessary step in the image post-processing that is not warranted
as part of the routine DNA barcoding workflow.
Imaging dry mounted specimens (e.g., pinned insects) allows more versatility in studio setup
(horizontal vs vertical); although fluid-preserved specimens may be temporarily removed from the
fixative, it is logistically preferred to use a vertical setup where the camera is mounted vertically
above the vessel where the object is placed. A similar vertical setup could be used for larger dry
specimens. When small batches of specimens are imaged at a time, the vertical camera can be
held over the specimen by the photographer, who should be mindful of the strain risks associated
with poor ergonomics of this setup.
Processing software
In choosing the software to post-process the images and prepare them for submission, it is important
to consider the need for previewing and processing large batches of images (at least 95), including
basic image adjustment, cropping, batch resize and batch rename. Despite the considerable
variety of image editing software, relatively few programs offer this specialized functionality.
FastStone Image Viewer43 has the required features and is offered as freeware for personal and
educational use.
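Batch renaming can also be scripted. The minimal sketch below pairs image files, sorted by their original names, with sample identifiers in the order in which the specimens were photographed, and returns a rename plan without touching any files. It assumes that images were taken in array order with exactly one image per specimen; the function name and the file and sample names are hypothetical.

```python
from pathlib import Path


def plan_renames(image_names, sample_ids):
    """Map original image file names to names based on sample identifiers."""
    images = sorted(image_names)
    if len(images) != len(sample_ids):
        raise ValueError("number of images and sample identifiers differ")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("sample identifiers must be unique")
    return {old: sample_id + Path(old).suffix.lower()
            for old, sample_id in zip(images, sample_ids)}


# Hypothetical camera file names and sample identifiers.
plan = plan_renames(["IMG_0002.JPG", "IMG_0001.JPG"], ["S-0001", "S-0002"])
```

Reviewing the plan before applying it (for example with `Path.rename`) helps catch skipped or duplicated images.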
43 http://www.faststone.org/
Notes
• Do not place any foreign objects (e.g. labels) into sampling wells.
• Do not place excessive tissue into the sampling wells - this may inhibit DNA extraction and
PCR. If the sample exceeds the recommended dimensions, subdivide it into fragments to
obtain the right amount.
• Avoid sampling from body parts containing scales, hairs, or bristles, when possible.
• Avoid sampling from digestive tracts or from areas which may have been in contact with
digestive tract contents or other contaminants.
NOTE: Flame sterilization, although not recommended, can be used for DNA-rich tissues when
chemical sterilization is not available.
Supplies required to sample DNA-rich (A) and DNA-poor (B) tissue:
Required supplies
• 3 (4oz) jars with increasing amounts of distilled water: wash 1 (~50mL), wash 2 (~70mL),
and wash 3 (~90mL)
Required supplies
44
Note that the decontamination protocol is different for Eliminase and for bleach: bleach can be inactivated
by Ethanol, therefore after treating the tools with bleach, they should be placed in concentrated Ethanol,
rather than distilled water. If desired, subsequent wash in distilled water may follow.
• Microplate with sampling wells pre-filled with 30μl of 95-100% ethanol and covered with 12-
cap strips
• Gloves
• Kimwipes or other sterile paper tissue
Sampling procedures
1. Clean work station (steps: ELIMINase, water, ethanol). Change gloves and lay out Kimwipes
to work on.
2. Light the flame source:
o If using a propane burner, turn knob slightly to release small, steady stream of gas.
Use the lighter to light the propane burner, producing a small flame (2-3 cm).
o If using an ethanol burner, ensure it is filled with ethanol and light it with the lighter.
NOTE: Never leave the flame unattended. Ensure that you turn off the propane burner or
smother the ethanol burner if you leave your station.
3. Position the plate on a flat surface with the plate label facing towards you. The column
markers (1–12) should be at the top and the row markers (A–H) should be on the left side.
4. Remove first row cap strip, without touching the part of the cap that goes in the well, and
cover with Kimwipe until row completed (to reduce contamination).
5. Sterilize forceps by dipping in the jar containing ethanol and then flaming the forceps in the
burner. Do not hold forceps above flame for more than 1 second – wave forceps tips once
over the flame and let the ethanol burn off. Be careful not to drip burning ethanol on your
work surface and keep forceps away from the jar with rinsing ethanol.
6. Remove piece of tissue from corresponding specimen and place in first well (A01) of
microplate. For insects, we recommend sampling the middle leg on the right side.
7. Repeat steps 5 and 6 for each well of the first row, proceeding in alphanumerical order to
A12 (left to right).
8. After completing the row, replace the cap strip and seal it firmly.
9. Remove next row strip and repeat steps 5-8 for all rows.
10. When sampling into the last row (Row H), remember to leave the last well (H12) empty (as
negative control during molecular procedures).
11. Once the plate is filled with samples, ensure that all caps are pressed firmly into the wells
to prevent ethanol evaporation.
12. Store microplate in the fridge/freezer until processing. Large delays (e.g., years) between
tissue sampling and molecular processing might affect the PCR success.
Plants and fungi should be tissue-sampled following the procedure for arthropods.
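The well order used in this procedure can be written out explicitly. The sketch below lists the locators row by row (A01 to A12, then B01, and so on) and assigns specimens to them while reserving H12 as the negative control. The function names and specimen identifiers are hypothetical.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)
CONTROL_WELL = "H12"   # left empty as the negative control


def sampling_order():
    """Return locators in sampling order, excluding the control well."""
    wells = [f"{row}{col:02d}" for row in ROWS for col in COLUMNS]
    return [w for w in wells if w != CONTROL_WELL]


def assign_specimens(specimen_ids):
    """Pair specimens with wells in sampling order (at most 95 per plate)."""
    wells = sampling_order()
    if len(specimen_ids) > len(wells):
        raise ValueError("a plate holds at most 95 specimens plus one control")
    return dict(zip(wells, specimen_ids))
```

A printed copy of such a list can serve as a checklist while working through the rows.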
45
This list of equipment and suppliers is based on companies available in Canada.
Pipettes
Tissue lysis and DNA extraction
PCR hood
E-Gel station
Sequencing hood
The workflow of a standard DNA barcoding lab is compartmentalized into units that are independent
in function but linked in purpose. A computer-operated LIMS links the working units by providing
current information on the status of each task at each functional unit and by enabling
sample tracking in real time. Documenting and tracking multiple samples in a high-throughput
barcoding workflow is almost impossible without a LIMS or a comparable management
tool.
Figure 47. Screenshot of LIMS used at the Canadian Centre for DNA Barcoding and linked to
BOLD.
Stock solutions
Description              Reagents               Weight or volume    Final Volume
1M Tris-HCl, pH 8.0      Trizma base            26.5g               500mL
                         Trizma HCl             44.4g
1M Tris-HCl, pH 7.4      Trizma base            9.7g                500mL
                         Trizma HCl             66.1g
0.1M Tris-HCl, pH 6.4    Trizma base            6.06g               500mL
                         (Adjust pH with HCl to 6.4-6.5)
1M NaCl                  NaCl                   29.22g              500mL
0.5M EDTA, pH 8.0        EDTA                   186.1g              1000mL
                         NaOH                   ~20.0g
Note: Vigorously mix on magnetic stirrer with heater. The disodium salt of EDTA will not go into solution
until the pH of the solution is adjusted to ~8.0 by the addition of NaOH.
Tip: Briefly rinse the NaOH granules with ddH2O in a separate glass before dissolving them.
Proteinase K             Proteinase K           1g                  50mL
                         1M Tris-HCl, pH 7.4    0.5mL
                         ddH2O                  ~25mL
                         50% glycerol v/v       25mL
46 Based on CCDB protocols (http://ccdb.ca/resources/).
Note: To a vial of Proteinase K (1g), add 0.5mL 1M Tris-HCl (pH 7.4) and ~15mL of ddH2O. Close vial and
mix gently by rotation until dissolved (do not shake). Transfer to a 50mL tube and add enough ddH2O to
achieve a total volume of 25mL. Add 25mL glycerol (50% glycerol v/v). Mix gently by rotation (do not
shake). Aliquot by 2mL into tubes and store at -20°C (glycerol prevents freezing and protects enzyme).
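The weights in the stock solution table follow from mass = molarity × molar mass × volume. The short check below reproduces three of them; the molar masses (NaCl 58.44 g/mol, Trizma base 121.14 g/mol, and 372.24 g/mol for EDTA, assuming the disodium salt dihydrate) are standard values supplied here as assumptions and are not taken from the manual.

```python
def grams_needed(molarity, molar_mass, volume_l):
    """Mass of solute (g) for a given molarity (mol/L) and volume (L)."""
    return molarity * molar_mass * volume_l


# Molar masses in g/mol (assumed standard values, not from the manual).
NACL = 58.44
TRIZMA_BASE = 121.14
EDTA_DISODIUM_DIHYDRATE = 372.24

print(round(grams_needed(1.0, NACL, 0.5), 2))                      # 1M NaCl, 500mL
print(round(grams_needed(0.1, TRIZMA_BASE, 0.5), 2))               # 0.1M Tris, 500mL
print(round(grams_needed(0.5, EDTA_DISODIUM_DIHYDRATE, 1.0), 1))   # 0.5M EDTA, 1000mL

# Proteinase K: 1g brought to a total volume of 50mL (25mL aqueous + 25mL glycerol).
proteinase_k_mg_per_ml = 1000 / 50
```

The same function can be used to scale a stock solution to a different final volume.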
Working solutions47
Buffer                     Description                Stock solution or reagent    Volume (mL) or weight (g)    Final Volume
2×CTAB                     2% CTAB                    CTAB                         4.0g                         200mL
                           100mM Tris-HCl, pH 8.0     1M Tris-HCl, pH 8.0          20mL
                           20mM EDTA, pH 8.0          0.5M EDTA, pH 8.0            8mL
                           1.4M NaCl                  NaCl                         16.4g
Vertebrate Lysis Buffer    100mM NaCl                 1M NaCl                      20mL                         200mL
                           50mM Tris-HCl, pH 8.0      1M Tris-HCl, pH 8.0          10mL
                           10mM EDTA, pH 8.0          0.5M EDTA, pH 8.0            4mL
                           0.5% SDS                   SDS                          1.0g
Insect Lysis Buffer        700mM GuSCN                GuSCN                        16.5g                        200mL
                           30mM EDTA, pH 8.0          0.5M EDTA, pH 8.0            12mL
                           30mM Tris-HCl, pH 8.0      1M Tris-HCl, pH 8.0          6mL
                           0.5% Triton X-100          Triton X-100                 1mL
                           5% Tween-20                Tween-20                     10mL
Note: Vigorously mix on magnetic stirrer with heater.
Binding Buffer             6M GuSCN                   GuSCN                        354.6g                       500mL
                           20mM EDTA, pH 8.0          0.5M EDTA, pH 8.0            20mL
                           10mM Tris-HCl, pH 6.4      0.1M Tris-HCl, pH 6.4        50mL
                           4% Triton X-100            Triton X-100                 20mL
Note: Vigorously mix on magnetic stirrer with heater. If any re-crystallization occurs, pre-warm at 56°C to
dissolve before use. Stable at room temperature for 1 week.
Plant Binding Buffer                                  Binding buffer               80mL                         96mL
                                                      ddH2O                        16mL
Protein Wash Buffer                                   Binding buffer               26mL                         100mL
                                                      EtOH 96%                     70mL
Note: Stable at room temperature for ~1 week; discard if any crystallization occurs.
Wash Buffer                60% EtOH                   EtOH 96%                     300mL                        475mL
                           50mM NaCl                  1M NaCl                      23.75mL
                           10mM Tris-HCl, pH 7.4      1M Tris-HCl, pH 7.4          4.75mL
                           0.5mM EDTA, pH 8.0         0.5M EDTA, pH 8.0            0.475mL
Note: Mix well, store at -20°C.
Binding Mix                                           Binding buffer               50mL
                                                      EtOH 96%                     50mL
Note: Stable at room temperature for 1 week.
Elution buffer             10mM Tris-HCl, pH 8.0
Note: Store at 4°C.
47
Weigh the dry components (e.g. SDS or GuSCN) first, then add required volumes of the stock solution,
and fill up with the molecular grade ddH2O to the final volume. No filtering is required.
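The stock volumes in the working solution table follow the dilution relation C1·V1 = C2·V2. As an illustration, the sketch below recomputes the stock volumes for the Vertebrate Lysis Buffer (200mL final volume); the function name is hypothetical.

```python
def stock_volume_ml(stock_molar, final_molar, final_volume_ml):
    """Volume of stock (mL) giving final_molar in final_volume_ml (C1*V1 = C2*V2)."""
    if final_molar > stock_molar:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_molar * final_volume_ml / stock_molar


# Vertebrate Lysis Buffer, 200mL: (component, stock M, final M)
recipe = [
    ("1M NaCl", 1.0, 0.100),
    ("1M Tris-HCl, pH 8.0", 1.0, 0.050),
    ("0.5M EDTA, pH 8.0", 0.5, 0.010),
]
for name, stock, final in recipe:
    print(f"{name}: {stock_volume_ml(stock, final, 200):.1f} mL")

# Solids given as % (w/v): 0.5% SDS means 0.5 g per 100 mL.
sds_grams = 0.5 / 100 * 200
```

Running the same check on each row of a recipe is a quick way to catch transcription errors before a buffer is prepared.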
Note:
• Thoroughly wash labware with ELIMINase, rinse with dH2O. Weigh reagents using a clean
spatula, fill up with the molecular grade ddH2O to the final volume. Filter buffers through 0.2
μm filter into a clean bottle; make smaller volume working aliquots (e.g. 100mL). Store stock
solutions and working aliquots at 4ºC.
• CTAB – Echinoderms, mollusks (taxa with large quantities of polysaccharides in their tissues)
• Non-CTAB – The remaining taxonomic groups
o Vertebrate Lysis Buffer – Vertebrates
o Insect Lysis Buffer – Invertebrates (except taxa requiring CTAB)
Reagents, equipment and disposables required for lysis of 1 plate:
Item                                     Quantity    Notes
Full-skirted 96-well microplate          1           Colour: clear (“tissue/lysis plate”)
12-cap strips                            8
Reagent reservoir                        1           Or a fresh and sterile tip-box
Lysis buffer (Vertebrate/Insect/CTAB)    5mL         Or 6mL to ensure sufficient quantity for all the wells
Proteinase K                             0.5mL
LIMS sticker                             1
Pipettes and tips                                    See Appendix C
Incubator                                            Temperature: 56°C
Plate centrifuge                                     Speed: 1000×g
Gloves
Kimwipes
Permanent marker
Prepare the DNA extraction workstation by a thorough cleaning of the surface: clean with
ELIMINase, then distilled water, wipe it with Kimwipe until dry, and finally wipe the surface with
ethanol. Change gloves after cleaning.
1. Turn on incubator and set temperature to 56°C.
2. Retrieve tissue plate from temporary storage (e.g., fridge, freezer, shelf) and label with LIMS
sticker if needed.
3. To evaporate ethanol from tissue plate prior to lysis:
• Centrifuge (spin) sealed plate briefly.
• Visually inspect plate to ensure that samples are at the bottom of wells.
• Remove caps carefully (and dispose directly into garbage), change gloves, and place the
plate in incubator at 56°C for about 45-60 minutes (or until ethanol is fully evaporated),
depending on the volume of ethanol. Check periodically. Do not over-dry.
48 Based on CCDB protocols (http://ccdb.ca/resources/).
4. Mix 5mL lysis buffer (vertebrate/insect/CTAB) and 0.5mL Proteinase K in a sterile reagent
reservoir. Ensure that buffer and Proteinase K are well-mixed throughout while trying to avoid
the production of bubbles in the mixture.
5. Add 50µL of lysis mix49 to each well of tissue plate. Place sterile 12-cap strips over each row.
*Fill 600µL, dispense 50µL 50. Use same tips for entire plate.
6. Ensure that microplate has a LIMS sticker and is labelled with initials, date of lysis, and the word
“lysis”.
7. Spin microplate at 1000×g for 30 seconds.
8. Incubate at 56°C for 12-16 hours (overnight)51.
9. Record lysis step on LIMS.
10. Clean workstation and dispose of waste.
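For planning several plates, the lysis mix quantities above can be scaled with a short calculation. The sketch assumes the per-plate recipe given in step 4 (5mL lysis buffer plus 0.5mL Proteinase K) and 50µL of mix per well; the function name is hypothetical.

```python
WELLS_PER_PLATE = 96
LYSIS_MIX_PER_WELL_UL = 50
BUFFER_PER_PLATE_ML = 5.0        # lysis buffer per plate (step 4)
PROTEINASE_K_PER_PLATE_ML = 0.5  # Proteinase K per plate (step 4)


def lysis_mix_plan(n_plates):
    """Return reagent volumes (mL) and the spare volume for n_plates."""
    buffer_ml = BUFFER_PER_PLATE_ML * n_plates
    prot_k_ml = PROTEINASE_K_PER_PLATE_ML * n_plates
    dispensed_ml = WELLS_PER_PLATE * LYSIS_MIX_PER_WELL_UL * n_plates / 1000
    return {
        "lysis_buffer_ml": buffer_ml,
        "proteinase_k_ml": prot_k_ml,
        "dispensed_ml": dispensed_ml,
        "spare_ml": buffer_ml + prot_k_ml - dispensed_ml,
    }


plan = lysis_mix_plan(1)
```

With an 8-channel pipette, one 600µL fill dispensed in 50µL steps gives 12 dispenses, that is, one per column of the plate.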
DNA extraction
This DNA extraction method is silica-based and involves DNA binding to a glass fiber membrane.
Reagents, equipment and disposables required for processing 1 plate:
Item                                      Quantity    Notes
Full-skirted 96-well microplate           1           Colour: blue (“DNA plate”)
DNA extraction glass fiber plate (GF)52   1           Vertebrates: 1.0µm; Invertebrates: 3.0µm
Square well block53                       1           Re-used (clean after each use)
Reagent reservoir                         4           1 reservoir/buffer
Binding mix (BM)                          9.6mL       Exact quantity; add more to reservoir
Protein wash buffer (PWB)                 17.28mL     Exact quantity; add more to reservoir
Wash buffer (WB)                          72mL        Exact quantity; add more to reservoir
Elution buffer (EB)54                     3-6mL       Exact quantity; add more to reservoir
LIMS sticker                              1
Clear seal                                4
Aluminum seal                             1
Pipettes and tips
Incubator                                             Temperature: 56°C
Plate centrifuge                                      Speed: 1000×g, 5000×g
Plate roller
Gloves
Kimwipes
Permanent marker
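The “exact” quantities listed for the buffers correspond to a whole number of microlitres per well across 96 wells. The check below derives the per-well volumes from the listed totals; only the 100µL of binding mix per well is stated explicitly in the extraction steps shown here, so the other per-well values are derived figures rather than quotations from the protocol.

```python
WELLS = 96

# Total volumes per plate (mL) as listed in the reagent table.
totals_ml = {
    "Binding mix (BM)": 9.6,
    "Protein wash buffer (PWB)": 17.28,
    "Wash buffer (WB)": 72.0,
}

# Convert each total to microlitres and divide by the number of wells.
per_well_ul = {name: round(total * 1000 / WELLS, 2) for name, total in totals_ml.items()}

for name, volume in per_well_ul.items():
    print(f"{name}: {volume} uL per well")
```

Because the listed totals leave no dead volume, the table's advice to add more to the reservoir applies in practice.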
49
If tissue samples are large, 100µL of lysis mix can be added to each well to dilute DNA concentration and
allow for better results with PCR amplification. Ensure that only 50µL of this lysate is mixed with binding
mix during extraction.
50
Setting for Thermo Scientific Matrix 8-channel 50-1250µL electronic pipette.
51
Lysis time can increase to 24 hours if considered necessary.
52
PALL has discontinued the PALL glass fibre plate types that are used at CCDB (types 5051 and 5053),
and replacement plates can only be used in the centrifuge, not in a robotic setup.
53
Square well blocks can be cleaned with ELIMINase, rinsed, dried and reused.
54
Elution can be performed with ddH2O. However, EB stabilizes DNA for long-term storage.
1. Remove lysis plate from incubator and centrifuge at 1000×g for 30 seconds to remove any
condensation from cap strips. Carefully open cap strips one by one (dispose directly into
garbage), making sure that lysate does not splash into adjacent wells.
2. Label reservoirs for four extraction buffers.
3. Pour binding mix (BM) into first reservoir.
4. Add 100µL BM to each well, being careful not to touch wells with tips but hover above.
*Fill 1200µL, dispense 100µL. Use same tips for entire plate.
5. Retrieve the appropriate type of glass fiber plate (GF) (see table above) and label with initials,
original (tissue) plate number and date. Place on top of clean square well block.
6. Transfer 170-180µL lysate (aspirate all) from lysis plate to GF plate: before transferring, slowly
mix lysate up and down about 3 times in lysis plate wells before releasing into GF plate wells.
Cover with clear seal.
*Use manual multi-channel pipette. Change tips after each row.
7. Centrifuge at 5000×g for 5 minutes to bind DNA to GF membrane.
20. Dispense 40µL55 of EB directly onto membrane in each well of the GF plate and incubate at
room temperature for 1 minute. Ensure no plate flip happens. Cover with clear seal.
*Fill 500µL, dispense 40µL. Use same tips for entire plate.
21. Place GF plate + DNA plate combination on top of a square well block for centrifugation
(otherwise the DNA plate will crack).
PCR amplification
The standard animal barcode region is COI-5P, which can be amplified with a range of primer
pairs, from ‘universal’ to very specific (see Appendix E for primer details).
Reagents, equipment and disposables required for processing 1 plate (PCR mix not
included):
⁵⁵ The quantity of EB can vary from 30 to 60µL/well depending on the tissue size (less EB for
smaller sizes).
Gloves
Vortex
Kimwipes
Permanent marker
The PCR workstation is usually a flow hood with UV light (intended for decontamination purposes).
Keep designated PCR pipettes inside the hood at all times. Clean PCR hood with ethanol before
starting to work. Change gloves after cleaning.
1. Retrieve required tips, seals, microplate (yellow), and tube racks intended for PCR, and place
in PCR hood. Press the UV light button on the PCR hood, which will turn UV light on for 15
minutes.
2. Retrieve DNA plate and centrifuge at 1000×g for 1 minute (if plate frozen, let it thaw a few
minutes at room temperature).
3. Prepare PCR mix⁵⁶ for one plate by adding all reagents, except for DNA template, in a sterile
microcentrifuge tube (1.5 or 2mL tube).
4. Label a sterile yellow 96-well microplate for PCR reaction with initials, date and original (tissue)
plate number. Place a LIMS sticker on the front of yellow plate. Turn on heat sealer.
5. Add mix to PCR plate:
• If using mix without primers: add 12.5µL forward primer and 12.5µL reverse primer into
PCR mix. Vortex lightly. Change tips between primers.
• Add 84µL PCR + primer mix to each well on row A (A01-A12).
• Distribute 10.5µL of mix from A01-A12 wells across the entire plate using a 12-channel
manual pipette. Use same tips for entire plate.
6. Cover with clear seal and centrifuge at 1000×g for 1 minute.
7. Discard clear seal and add 2µL DNA/well to PCR plate using a 12-channel manual pipette.
8. Cover with heat seal, with white side of seal facing upwards, and place in automated heat
sealer for 10 seconds. Seal plate top firmly with a plate roller.
Optional: Centrifuge the plate at 1000×g for 1 minute.
9. Place PCR plate in thermocycler and initiate the appropriate program.
⁵⁶ PCR mix can be prepared ahead of time for one or multiple plates, with or without primers. Mixes are
aliquoted in 1.5 or 2mL tubes (1 tube enough for one plate) and stored at -20°C.
⁵⁷ See Appendix A for details on PCR reagents.
10. Cover DNA plate with aluminum film (use plate roller). Store plate at -20°C.
11. Clean space and dispose of waste.
12. Press UV light button on PCR hood (for a 15-minute decontamination).
13. Record PCR preparation and thermocycler program on LIMS.
14. Once thermocycler program ends, remove PCR plate and store at 4°C until E-Gel check.
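The volumes in step 5 follow from the plate geometry. The sketch below checks that arithmetic, assuming the mix loaded into each row-A well is split evenly over the 8 rows of the plate (this is implied by the numbers, not stated in the protocol). The function and key names are illustrative only.

```python
ROWS, COLUMNS = 8, 12  # geometry of a standard 96-well plate


def pcr_mix_plan(mix_per_well_ul=10.5, template_ul=2.0):
    """Volumes implied by loading row A and distributing the mix down each column."""
    row_a_load_ul = mix_per_well_ul * ROWS        # loaded into each row-A well
    total_mix_ul = row_a_load_ul * COLUMNS        # mix + primers per plate, no excess
    reaction_ul = mix_per_well_ul + template_ul   # final volume per well
    return {
        "row_a_load_ul": row_a_load_ul,
        "total_mix_ul": total_mix_ul,
        "reaction_ul": reaction_ul,
    }


plan = pcr_mix_plan()
for name, value in plan.items():
    print(f"{name}: {value} µL")
```

With the protocol's values this gives 84µL per row-A well, 1008µL of mix per plate, and a 12.5µL reaction per well.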
COI thermocycling program
Number of cycles    Block temperature    Hold time (mm:ss)
1                   94°C                 01:00
5x                  94°C                 00:40
                    45°C                 00:40
                    72°C                 01:00
35x                 94°C                 00:40
                    51°C                 00:40
                    72°C                 01:00
1                   72°C                 05:00
Hold                4°C or 10°C          Until plate removed
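To double-check the length of the COI program, it can be written down as data and the hold times summed. This is a minimal sketch: it counts hold times only and ignores ramping between temperatures, so a real run takes longer. The data layout and function names are illustrative and do not correspond to any thermocycler software.

```python
# Each stage: (number of cycles, [(temperature in °C, hold time "mm:ss"), ...])
COI_PROGRAM = [
    (1, [(94, "01:00")]),
    (5, [(94, "00:40"), (45, "00:40"), (72, "01:00")]),
    (35, [(94, "00:40"), (51, "00:40"), (72, "01:00")]),
    (1, [(72, "05:00")]),
]


def to_seconds(mmss):
    """Convert a 'mm:ss' string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def total_hold_seconds(program):
    """Sum of all hold times, with each stage repeated for its cycle count."""
    return sum(
        cycles * sum(to_seconds(hold) for _, hold in steps)
        for cycles, steps in program
    )


total = total_hold_seconds(COI_PROGRAM)
print(f"{total // 60} min {total % 60} s of hold time")
```

For the COI program this comes to 99 min 20 s of hold time, excluding ramping and the final hold.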
PCR success is verified by running and visualizing PCR products on pre-cast 96-well 2% agarose
gels (E-Gels). This system is bufferless so exposure to ethidium bromide is minimized. However,
the E-Gel station is considered a contaminated area that produces hazardous waste, and it needs
to be handled with special care (e.g., separate lab coats, pipettes, gloves, garbage bin).
Reagents, equipment and disposables required for processing 1 plate:
⁵⁸ After the first use, E-Gels should be kept in containers to maintain hydration.
E-Gel directly onto gel dock. Place packaging and casing directly into the container labelled
"Hazardous Waste". Change gloves.
4. Pour ddH2O into reagent reservoir.
5. Add 14µL ddH2O to E-Gel⁵⁹ using a 12-channel manual pipette allocated for E-Gel station. Do not
place pipette tips too deep into wells, as this may rupture them. Use same tips for one plate.
6. Remove heat seal from PCR plate. Use tweezers if needed.
7. Load 4µL PCR product into E-Gel. Change tips for each row.
8. Plug in gel dock (ensure it is set to run program “EG”) and slide E-Gel into two electrode
connections on E-Base. Press “time” button to verify setting (‘04’ will appear). Press “pwr/prg”
button and red light will turn green. Electrophoresis will run for 4 minutes.
9. Cover PCR plate with clear seal and place it back in temporary storage.
10. Once run is complete, remove E-Gel from E-Base and place it on transilluminator (A01 on top
left). Turn camera on.
11. Open software Gel Capture -> Acquire Image -> UV Light Base -> E-Gel Adaptor Base with
E-Gel Go -> Select. The plate should appear on the computer screen with glowing bands for
successful amplification. Take a picture if you are satisfied with the brightness and alignment of
plate. Press “SAVE” to save the snapped image. Assign a file name as appropriate including the
plate number (images are saved as .tif).
12. Turn off camera and transilluminator. Place comb back on E-Gel, return E-Gel to its package,
and place it either in the special container (for additional runs) or in the designated garbage bin
for hazardous waste.
13. Open software E-Editor, open .tif image, use cross-hairs tool (left side toolbar) to define surface
of gel to be cut into 96 bands, then Next -> Save as .jpg (file format required for LIMS).
14. Upload image file to LIMS. Score bands on E-Gel to assess amplification success.
15. Clean space and dispose of waste in designated garbage bin for hazardous waste.
Currently, PCR products are not cleaned up but proceed directly to sequencing.
⁵⁹ If reused within the same day, the E-Gel does not require re-hydration. If more than a day, wipe the gel
with a Kimwipe, change gloves, and re-hydrate with 14µL ddH2O.
Cycle sequencing
The cycle sequencing reaction is customized for sequencing clean-up with magnetic beads
(in-house recipe and protocol run at CCDB).
Each PCR plate will be sequenced in two directions (forward and reverse) resulting in two
sequencing plates.
Reagents, equipment and disposables required for processing 2 plates (sequencing mix
not included):
The sequencing workstation is usually a flow hood with UV light intended for decontamination
purposes. Keep designated sequencing pipettes inside the hood at all times. Clean hood with
ethanol before starting to work. Change gloves after cleaning.
1. Retrieve required tips, seals, plate, and tube racks intended for sequencing, and place in
sequencing hood. Press UV light button on the sequencing hood, which will turn UV light on for
15 minutes.
2. Retrieve PCR plate from temporary storage and spin briefly.
3. Pour ddH2O into reagent reservoir.
4. Dilute PCR product with 40µL ddH2O using a 12-channel manual pipette. Use same tips for
entire plate, making sure not to touch the PCR plate. Cover with clear seal.
5. Centrifuge at 1000×g for 2 minutes to get rid of bubbles.
6. Label two clear sequencing plates. Place a LIMS sticker on the right-hand side of plate (see
illustration below):
[Illustration: labelled sequencing plate; visible labels: "F", "12", "PLATE #"]
7. Prepare sequencing mix⁶⁰ by adding all reagents, except for PCR product, in a sterile 1.5 or
2mL microcentrifuge tube. Note: mix should be made for two plates (forward and reverse) in
two separate tubes.
⁶⁰ Sequencing mix can be prepared ahead of time for one or multiple plates, with or without primers. Mixes
are aliquoted in 1.5 or 2mL tubes (1 tube enough for one plate) and stored at -20°C.
⁶¹ See Appendix A for details on sequencing reagents.
M13 primers
Number of cycles    Block temperature    Hold time (mm:ss)
1                   96°C                 01:00
35x                 96°C                 00:10
                    55°C                 00:05
                    60°C                 02:30
1                   60°C                 05:00
Hold                4°C or 10°C          Until plate removed
Other primers
Number of cycles    Block temperature    Hold time (mm:ss)
1                   96°C                 01:00
15x                 96°C                 00:10
                    55°C                 00:05
                    60°C                 01:15
5x                  96°C                 00:10
                    55°C                 00:05
                    60°C                 01:45
1                   60°C                 00:15
15x                 96°C                 00:10
                    55°C                 00:05
                    60°C                 02:00
1                   60°C                 01:00
Hold                4°C or 10°C          Until plate removed
⁶² Stainless steel beads can be re-used: separate beads from tissue debris, rinse with water, soak in
ELIMINase for 1 hour, wash thoroughly with warm water, soak in 0.5N HCl for 1 min, rinse with warm water
followed by dH2O and final rinse with ddH2O, dry and expose to UV light for 30 min.
⁶³ Place beads in strip-tubes before tissue subsampling for an easier workflow.
5. Take one strip of tubes labeled “1” from first row of PB, and place it in Rack 1.
6. Pull the caps from tubes using two hands: one hand holds strip by ending tag, while other hand
helps to open caps one by one, pulling them carefully by side tags of each cap. Use ending tag
to pull whole strip of released caps aside from opened tubes to prevent small particles from
caps falling into neighboring (opened) tubes. Discard the cap-strip.
7. Transfer opened strip-tube into corresponding row #1 of Rack 2. Avoid touching the upper part
of the strip-tubes, especially the open ends of the tubes.
8. Take strip “3” from the PB, and place into Rack 1. Repeat steps 6-7, place opened strip-tubes
in corresponding row #3 in Rack 2.
9. Repeat same steps with strip-tubes # 5, 7, 9, 11.
10. Repeat same operations with strip-tubes #2, 4, 6, 8, 10, 12, and place them into original PB,
according to row numbers.
11. Move Rack 1 aside and clean bench with ethanol.
12. Keep a box of clean gloves handy. Change gloves to avoid contamination. Make sure gloves
fit your fingers tightly enough to hold small objects. After changing gloves make sure you do
not touch anything but clean box lid and beads.
13. Pour some clean stainless-steel beads in a sterile box lid.
14. Take a single bead and place it in a tube of strip-tube 1, without touching the top of the tube
(change gloves if you touch the tubes). Make sure that each tube contains a single bead,
otherwise the tube will crack during grinding.
15. Repeat same procedure with all tubes in Rack 2 and PB.
16. Return unused beads to the tube with the clean beads.
17. Change gloves, and prepare 12 sterile 8-cap strips, placing them on clean Kimwipe.
18. Close all strip-tubes with fresh caps. Try to grab cap-strips by a tag and work from bottom to
top to avoid hovering over opened tubes.
19. Transfer closed strip-tubes from Rack 2 to PB, according to row numbers.
The box with tissue samples is now ready for grinding.
• Place PB with lid removed in TissueLyser base plate adapter, cover with lid adapter, make sure
to match cylindrical parts on base and lid adapters.
• Place assembly in TissueLyser and clamp using hand wheel with compression disk until locking
bolt stops clicking. Do not over-tighten.
• Close TissueLyser and apply 28Hz for 30 seconds.
• Raise and rotate locking bolt, release adapter using hand wheel, return locking bolt into ‘clicking’
position.
• Disassemble adapters and rotate each tube rack 180° (see note below) and secure them again
as described above.
• Apply 28Hz for another 30 seconds.
• Release adapters and remove plate from TissueLyser.
• Cover each PB with lid and centrifuge at 1000×g for 1 minute.
Note: When using a TissueLyser Adapter Set, samples nearer to the TissueLyser move slower
than the samples further away from the TissueLyser. To ensure uniform disruption and
homogenization, 2 shaking steps should be carried out. After the first shaking step, the
TissueLyser Adapter Set should be disassembled and the rack of tubes should be rotated so that
the tubes that were nearest to the TissueLyser are now outermost. The TissueLyser Adapter Set
should then be reassembled before continuing with the second shaking step.
1. Retrieve plant box with ground tissue. Make sure that labels are easily readable.
2. Place Rack 1 in front and Rack 2 on right side.
3. Take strip “1” from PB and transfer it to Rack 1.
4. Remove (and discard) caps using technique described above (first section of plant protocol).
5. Very carefully transfer the open strip-tubes to corresponding row #1 in Rack 2. Do not touch
top of tubes.
6. Repeat steps 3-5 with strip-tubes 3, 5, 7, 9, and 11 transferring them into corresponding rows
in Rack 2.
7. Repeat same steps with strip-tubes 2, 4, 6, 8, 10, and 12 and place them into original PB,
according to row numbers.
8. Move Rack 1 aside and clean bench with ethanol. Change gloves.
9. Prepare 12 clean 8-cap strips placing them on clean Kimwipe.
10. Pour CTAB in reagent reservoir.
11. Dispense 300μL CTAB into each tube from Rack 2 and PB. Make sure that lysis buffer does
not wet tip filters (discard tips if necessary).
12. Close tubes tightly with sterile 8-cap strips. Work from bottom to top to avoid hovering over
open tubes.
13. Transfer closed strip-tubes from Rack 2 to PB according to row numbers.
14. Close PB with the box lid.
15. Hold the PB with lid using both hands and gently invert once.
16. Immediately centrifuge at 1000×g for 1 minute.
17. Place PB in incubator at 65°C for 60-90 minutes (optional: incubator with shaker, 120 rpm).
1. Retrieve plate from incubator and allow to cool down at room temperature. Do not invert plate
once it is warm.
2. Centrifuge at 1000×g for 1 minute.
3. Retrieve a sterile clear microplate and add LIMS sticker.
4. Pour plant binding buffer (PBB) into reagent reservoir.
5. Dispense 100μL PBB into each well of the microplate, being careful not to touch the wells but hover above.
*Fill 1200µL, dispense 100µL. Use same tips for entire plate.
6. Cover microplate temporarily (e.g. Kimwipe) to prevent contamination.
7. Carefully open all caps from the plant strip-tubes, working from top to the bottom to avoid
hovering over open tubes. No additional rack is required at this stage.
8. Transfer 50μL of supernatant to microplate with PBB (and then proceed to DNA extraction).
*Use manual 8-channel pipette. Change tips after each row.
9. Seal the plant box with aluminum film, cover with lid and store at -20°C as backup.
DNA extraction
Plant DNA is extracted through a glass-fiber protocol, very similar to the animal protocol.
1. Retrieve glass fiber plate (Acroprep 1.0µm) and label with initials, original (tissue) plate number
and date. Place on top of clean square well block.
2. Transfer all 150μL mix (100μL PBB+50μL plant lysate) to GF plate: before transferring, slowly
mix lysate up and down about 5-10 times in lysate plate wells before releasing into the bottom
of GF plate wells (make sure not to puncture the GF membrane). Cover plate with clear seal.
*Use manual 12-channel pipette. Change tips after each row.
3. Centrifuge at 5000×g for 5 minutes to bind DNA to GF membrane.
4. Label reservoirs for three extraction buffers.
5. Pour binding mix (BM) into first reservoir.
6. Add 180µL BM to each well, being careful not to touch the wells but hover above. Cover with
clear seal.
* Fill 1080µL, dispense 180µL (covers ½ plate). Repeat. Use same tips for entire plate.
7. Centrifuge at 5000×g for 2 minutes.
8. Pour wash buffer (WB) into second reservoir.
9. Discard clear seal. Add 700µL WB to each well of GF plate. Cover with clear seal.
*Fill 700µL, dispense 700µL. Use same tips for one plate.
10. Centrifuge at 5000×g for 5 minutes.
11. Retrieve a sterile blue microplate for DNA collection and label with initials, date and original
(tissue) plate number. Place a LIMS sticker on the front of blue plate.
12. Place GF plate on DNA plate and discard clear seal. Make sure both plates have the same
orientation (A01 of GF is placed into A01 of the DNA plate).
13. Incubate at 56°C for 30 minutes to evaporate residual ethanol.
14. Warm elution buffer (in a container such as a 15mL Falcon tube) in the incubator at the same
time as the ethanol evaporation. Make sure to turn off incubator after step is complete.
15. Pour warmed elution buffer (EB) into third reservoir.
16. Dispense 50µL⁶⁴ of EB directly onto the membrane in each well of GF plate and incubate at
room temperature for 1 minute. Make sure the plate is not flipped. Cover with clear seal.
*Fill 600µL, dispense 50µL. Use same tips for entire plate.
17. Place GF plate + DNA plate combination on top of a square-well block for centrifugation
(otherwise the DNA plate will crack).
18. Centrifuge at 5000×g for 5 minutes to collect DNA.
19. Visually inspect DNA plate from the bottom to make sure that each well contains liquid. Cover
DNA plate with aluminum seal (use plate roller to seal it tightly) and place it in the appropriate
storage.
20. Store DNA at 4°C (short term) or -20°C (medium term). Archive DNA plates at -80°C.
21. Wrap GF plates in Kimwipes and store in plastic bags at 4°C (if necessary, they may be
re-eluted, but their shelf life is relatively short).
22. Document DNA extraction on LIMS.
23. Clean workstation and dispose of waste.
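The “*Fill ..., dispense ...” notes in the extraction steps can be checked with simple arithmetic. The sketch below assumes a repeating multi-channel pipette that serves one plate column per dispense, which is consistent with the note that 6 dispenses of 180µL cover ½ plate. The function names are illustrative.

```python
import math


def dispenses_per_fill(fill_ul, dispense_ul):
    """Return (full dispenses per fill, leftover volume in µL)."""
    return divmod(fill_ul, dispense_ul)


def fills_per_plate(fill_ul, dispense_ul, columns=12):
    """Fills needed to dispense once into every column of a 96-well plate."""
    per_fill, _ = dispenses_per_fill(fill_ul, dispense_ul)
    if per_fill == 0:
        raise ValueError("fill volume is smaller than the dispense volume")
    return math.ceil(columns / per_fill)


# (fill, dispense) pairs quoted in the animal and plant extraction steps
for fill, dispense in [(1200, 100), (1080, 180), (700, 700), (600, 50), (500, 40)]:
    n, left = dispenses_per_fill(fill, dispense)
    print(f"fill {fill} µL, dispense {dispense} µL: {n} dispenses, "
          f"{left} µL left, {fills_per_plate(fill, dispense)} fill(s) per plate")
```

Under this assumption, a 1080µL fill gives 6 dispenses of 180µL (two fills per plate), while a 500µL fill gives 12 dispenses of 40µL with 20µL left in the tips.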
PCR amplification
Amplification of different plant/fungal markers requires different PCR reactions and thermocycling
programs but the workflow and necessary laboratory equipment are the same as for the
amplification of animal COI.
Note: Phusion Hot Start High-Fidelity DNA Polymerase, with proofreading ability, has proved very
efficient for amplification of homopolymer regions and greatly improves sequencing results for
matK and psbA-trnH. However, the enzyme has lower thermostability: batches of pre-made PCR
plates should not exceed 4 plates, and these cannot be stored at -20°C but need to be used
immediately.
⁶⁴ The quantity of EB can be increased up to 60µL/well.
PCR reaction (12.5µL) for rbcLa, ITS2, fungal ITS, and LSU markers
Currently, PCR products are not cleaned up but proceed directly to sequencing.
Cycle sequencing
Cycle sequencing different plant/fungal markers requires different primers for the sequencing mix
and different thermocycling regimes but the workflow and necessary laboratory equipment are
the same as for sequencing animal COI.
Sequencing mix will contain only one primer.
Note: These thermocycling programs are suitable only for fast ramping thermocyclers.
Levin RA, Wagner WL, Hoch PC, et al. (2003) Family-level relationships of Onagraceae based
on chloroplast rbcL and ndhF data. American Journal of Botany 90:107-115 (modified from
Soltis P et al. (1992) Proceedings of National Academy of Sciences USA 89: 449-451).
Messing J (1983) New M13 vectors for cloning. Methods in Enzymology 101: 20-78.
Sang T, Crawford DJ, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution and
biogeography of Paeonia (Paeoniaceae). American Journal of Botany 84: 1120–1136.
Tate JA, Simpson BB (2003) Paraphyly of Tarasa (Malvaceae) and diverse origins of the polyploid
species. Systematic Botany 28: 723–737.
Vilgalys R, Hester M (1990) Rapid genetic identification and mapping of enzymatically amplified
ribosomal DNA from several Cryptococcus species. Journal of Bacteriology 172: 4239-4246.
White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal
RNA genes for phylogenetics. In: PCR Protocols: a guide to methods and applications.
(Innis MA, Gelfand DH, Sninsky JJ, White TJ, eds). Academic Press, New York, USA: 315–322.
⁶⁵ Bininda-Emonds ORP (2005). transAlign: using amino acids to facilitate the multiple alignment of
protein-coding DNA sequences. BMC Bioinformatics 6:156
(https://www.uni-oldenburg.de/ibu/systematik-evolutionsbiologie/programme/#Sequences).
⁶⁶ https://www.perl.org/
⁶⁷ http://www.mbio.ncsu.edu/BioEdit/bioedit.html
2. Import traces (file, import, add samples). Select “abi” files only. Import all the forward
(96) and reverse (96) sequences (including controls), i.e. 192 samples from one plate.
3. Clip ends
4. Sort by quality and move the poor-quality (<250bp) sequences and the controls to
trash. Select the low-quality sequences and right-click to reveal the “Move to Trash”
option.
6. Reverse the R direction by using the function “Ctrl R” (to convert into 3′ – 5′
complementary direction).
• Locate the forward primer sequence in the 3′ region of the reverse strand and
delete by selecting all nucleotides on the left side of the contig.
8. Unassemble F and R. Select F and R contigs by using Control key and use
unassemble function to move sequences back to “unassembled samples”.
10. Correct nucleotides in contigs and unassembled samples. Move the bad
sequences to trash. Open each contig by a mouse-click and verify that F is black
and R is red. If R is not red use “Ctrl R” to change the reverse to red. Look through
consensus sequence to correct ambiguous bases (e.g. N) and the gaps. Repeat
this for all the individual contigs.
11. Assemble a contig of contigs with the “Compare Contigs” function. Check the
direction (5′ – 3′). Select all the individual contigs and unassembled samples
using Ctrl or Shift key. Use “Compare Contigs” function from “Advanced Assembly”
and then “Clustal Omega” (for COI) to run alignment. Hit “Compare”.
12. Correct the erroneous bases. Open “CtgComparison” by mouse click and fix the
“Ns” and gaps.
13. Export sequences (both child contigs and unassembled samples): choose ‘include
gaps in fasta’. Export and save separate files for contigs and unassembled
samples.
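For reference, the reversal applied to the R reads in step 6 corresponds to taking the reverse complement of the sequence. The following is a minimal sketch, independent of the editing software, that also handles IUPAC ambiguity codes such as N. The function name is illustrative.

```python
# IUPAC nucleotide codes and their complements (gaps "-" are left unchanged)
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


read = "ATGCN"
print(reverse_complement(read))
```

Applying the function twice returns the original sequence, which is a quick sanity check.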
Next, use MEGA or a text editor to combine all the exported sequences (from contigs and
samples) in a single fasta file. Before sequence submission to BOLD, verify that the
sequences are free of stop codons.
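The combining step and a first stop-codon screen can also be scripted. The sketch below is not a replacement for MEGA or transAlign: it merges FASTA text from the two exports and counts stop codons in the best of the three forward reading frames. The default stop codons (TAA, TAG) assume the invertebrate mitochondrial code and must be changed for other taxa or markers. All function names are illustrative.

```python
def read_fasta(text):
    """Parse FASTA text into {header: sequence}; gaps ('-') are kept."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def combine(*fasta_texts):
    """Merge several FASTA texts; later records overwrite earlier ones with the same header."""
    merged = {}
    for text in fasta_texts:
        merged.update(read_fasta(text))
    return merged


def best_frame_stops(seq, stop_codons=("TAA", "TAG")):
    """Smallest number of stop codons over the three forward frames (gaps removed)."""
    seq = seq.replace("-", "")
    counts = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in stop_codons for codon in codons))
    return min(counts)


contigs = ">sample1\nATGGCAGGA\n"
singles = ">sample2\nTAAATAAATAAA\n"
for header, seq in combine(contigs, singles).items():
    stops = best_frame_stops(seq)
    print(header, "OK" if stops == 0 else f"{stops} stop codon(s) in best frame")
```

Sequences flagged by such a screen should be inspected in the trace editor before submission.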
Transalign
Perl and transalign need to be installed on the computer. The following instructions will use the
terminal window:
Step 1: cd c:\transalign
Open the terminal window and navigate to the folder with transalign. The fasta file to be analyzed
should be in the same folder. In this example, the folder is called transAlign.
Processing each sequence to determine optimal reading frame (in all possible orientations) and
any frame shifts
WARNING: Number of stop codons in best frame (5) for MKPCH573-09 exceeds threshold
(2); inspection of sequence is recommended
Reading in ClustalW aligned DNA data and inferring indel positions ...
C:\transAlign>
Upon login, the main console is displayed; it is the control desk of the entire BOLD experience.
At the top there is a search bar (for projects, datasets and records). In the right corner,
there are links to the main general tools in the following order: 1) main BOLD databases (Public Data,
BINs, Publications, Primers), 2) BOLD ID Engine, 3) Taxonomy Browser, 4) Main Console, and
5) Resources (documentation and BOLD API).
In the middle of the console, there are green buttons to create new projects and/or new datasets
and blue buttons for data upload (once the records are inserted in BOLD).
The left-side console provides other links (Projects, Checklists, Primers, Main Menu) and
becomes the main navigation pane within a project (holding all the analytical tools).
To insert data into BOLD, a new project needs to be created. Then, Specimen Data can be
inserted manually, one by one, or in batch, submitted through the Uploads tool to the BOLD team
for validation and upload to BOLD. The batch upload is performed by filling out and submitting
standard MS Excel templates that can be downloaded from BOLD. These spreadsheets can also
be generated automatically from the Electronic Field Journal using the ‘BOLD Data Output’
function (Annex II).
Note: Each data record can only be submitted once as a new record, but can later be updated an
indefinite number of times.
Once all records are in BOLD, images can be uploaded manually, one by one, or in batch through
the Uploads->Images tool (in light green in the figure above). All images (.jpg only) and the
associated spreadsheet (template provided by BOLD) are compressed in a Zip folder (<190 Mb)
and submitted to BOLD. Any issue will trigger an error message from BOLD.
Once chromatograms are received from the sequencer, these files can be uploaded manually,
one by one, or in batch through the Uploads->Traces tool (in blue in the figure above). All traces
(.abi files) and the associated spreadsheet (template provided by BOLD) are compressed in a Zip
folder (<190 Mb) and submitted to BOLD. Any issue will trigger an error message from BOLD.
Finally, once traces are edited, DNA sequences can be uploaded to BOLD through the batch
submission tool (in purple in the figure above). The name of the sequencing facility needs to be
added to the Run Site (pre-filled with institution names already in BOLD; new institutions can be
added).
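Preparing the Zip folders for image or trace uploads can be scripted. The sketch below bundles the files with the associated spreadsheet and checks the archive size before submission. It reads the “<190 Mb” limit as megabytes and assumes the spreadsheet is an .xls or .xlsx file; both are assumptions, and the function name and file names are hypothetical. For traces, pass allowed_suffixes=(".abi", ".xls", ".xlsx").

```python
import zipfile
from pathlib import Path

# "<190 Mb" from the text, interpreted here as megabytes (assumption)
MAX_ZIP_BYTES = 190 * 1024 * 1024


def build_upload_zip(files, zip_path, allowed_suffixes=(".jpg", ".xls", ".xlsx")):
    """Zip the given files; raise ValueError on unexpected file types or oversize archives."""
    files = [Path(f) for f in files]
    unexpected = [f.name for f in files if f.suffix.lower() not in allowed_suffixes]
    if unexpected:
        raise ValueError(f"unexpected file types: {unexpected}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for f in files:
            archive.write(f, arcname=f.name)  # store files flat, without folders
    size = Path(zip_path).stat().st_size
    if size >= MAX_ZIP_BYTES:
        raise ValueError(f"archive is {size} bytes; split the batch into smaller uploads")
    return size
```

A typical call would be build_upload_zip(list_of_image_paths + [spreadsheet_path], "images_batch1.zip"); the returned value is the archive size in bytes.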
Once all data and metadata are in BOLD, the project console is populated with records and
statistics.
• If only a few records need some fields updated, the update can be done manually through
the Specimen Page of each record (highlighted in the red rectangle);
• If many records need to be updated, a batch update can be submitted through the same
tool as new records (with the same spreadsheet template but specifying the tab that
needs updates: Voucher, Taxonomy, Specimen, Collection);
• If there are errors in images or traces, a request needs to be sent to the BOLD team for
those files to be deleted;
• If sequences need to be updated (based on new sequence editing etc), a new sequence
upload, by the user, will overwrite the existing information. Note: sequences can be
individually updated through the Sequence Page of each record (highlighted in the green
rectangle).