This manual is a contribution of the Biodiversity Institute of Ontario - Research Training
Program (University of Guelph) to the Global Taxonomy Initiative.


Authors: Adriana E. Radulovici, Muhammad Ashfaq, Alex Borisenko
Spell-check: Susana Miranda Romo
Review: Junko Shimura
Cover design: Suzanne Bateson
GTI Training Manual – Standardized Workflows in DNA barcoding

Table of Contents

Introduction
  The DNA Barcoding Concept
  Standard DNA Barcode Markers
  Potential Utility and Limitations of DNA Barcoding
  DNA Barcode Data Repositories
  Standard DNA Barcoding Workflows: General Overview
Chapter 1. Front-End Processing
  Natural History Collections and Lab Work – An Integrative Approach
  Collection Data Management: Practical Approaches
  Specimen Imaging: Basic Principles
  Tissue Sampling: Basic Principles
Chapter 2. Molecular Analysis
  Set-up of Molecular Laboratory
  DNA Extraction
  Polymerase Chain Reaction (PCR)
  Cycle Sequencing
Chapter 3. Informatics and Data Analysis
  Sequence Editing
  Quality Control
  Scaling up to Medium/High-throughput Processing
  BOLD Analytics
ANNEX I: Scaling up DNA Barcoding Workflows
ANNEX II: Collecting Ontologies
Annex III: Electronic Field Journal
ANNEX IV: Specimen Imaging
ANNEX V: Tissue Sampling
ANNEX VI: Medium/High-throughput Lab
ANNEX VII: Reagents for DNA barcoding
ANNEX VIII: Barcoding Protocols (96-well microplates)
  DNA Barcoding – Animals
  DNA Barcoding – Plants, fungi
ANNEX IX: Batch Sequence Editing
Annex X: BOLD Data Submission


Introduction

The DNA Barcoding Concept


DNA barcoding is an approach to characterizing biological diversity using short,
standardized DNA fragments. It is an actively developing area at the interface between
genomics and biodiversity science, offering a suite of molecular tools for fast, reliable identification
and discovery of species. It is based on the observation that biological species typically possess
distinctive genetic signatures. Furthermore, broad taxonomic groupings (e.g., groups of phyla)
can be discerned by analysing patterns of nucleotide variation within a single short (less than
1000 base pair) fragment from the same locus of the genome, called the ‘barcode region’.
The term “DNA barcode” was first used by Paul Hebert and colleagues in 2003 in their
publication “Biological identifications through DNA barcodes”1. The authors determined that a
fragment of 658 base pairs (bp) from the 5′ end of the cytochrome c oxidase subunit I (COI) gene
successfully discriminated 200 closely allied species of Lepidoptera and argued that the gene region
can be exploited to distinguish the overwhelming majority of animal species on Earth. By analogy
with the familiar ‘Universal Product Codes’, widely used to identify and track
consumer products, the term underscores the relative shortness and standardization of the gene
region selected and the automation potential of DNA-based diagnostics as a tool for “scanning”
life.

In the last decade, DNA barcoding gained visibility beyond the expert community as a tool for
sharing biodiversity knowledge by making species identification tools accessible to anyone
interested in biodiversity. Through simple and routine diagnostic procedures, DNA barcoding
enables taxonomists to describe biological diversity more efficiently and acknowledges their
contribution towards building the reference library. At the same time, the wide user community
gains unrestricted access to the library to perform DNA-based identifications. Thus, in addition to
aiding taxonomists in performing their specialized tasks, it makes their knowledge more
available to the wide range of users who may lack taxonomic expertise or years of
training in taxonomy.
DNA barcoding should not be considered as the equivalent of molecular systematics, which is an
area of taxonomy that infers relationships between species and higher taxonomic categories from
molecular phylogenetic analyses, often using multiple genetic markers. It is also different from
DNA taxonomy – an approach which proposes to use DNA as the sole basis for all taxonomic
reconstructions, with other character sets (e.g., morphology) being ancillary. The two approaches
(molecular systematics and DNA taxonomy) advocate for using larger volumes of genetic
information, to add robustness to phylogenetic reconstructions, with different marker selection,
depending on the taxonomic group studied and the goals of analysis. By contrast, “classical” DNA
barcoding argues for:

a) standardization – using the exact same gene region(s) across large taxonomic entities to
ensure adequate comparisons; and
b) minimalism – using the minimal amount of genetic data necessary to provide reliable
identification.
These requirements underscore the heavy focus on diagnostics, while maintaining a relatively
agnostic position regarding taxonomic arrangements or species concepts, which are the subjects
of other fields of study.

1 Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003). Biological identifications through DNA barcodes.
Proceedings of the Royal Society B, 270: 313–321.

Standard DNA Barcode Markers


A single DNA marker cannot provide species-level resolution for all eukaryotic life across all four
main kingdoms: Animalia, Plantae, Fungi, and Protista. Several studies examined the utility of
various markers outside of the animal kingdom and proposed alternatives, which have since been
established as standard markers recommended for use by the global DNA barcoding community.

Barcode marker for animals


As already mentioned, the animal barcode region is a 658 bp fragment of COI (Figure 1). In many
animal species, this gene region can be amplified using the primer pair LCO1490/HCO2198 also
known as ‘Folmer primers’2 – so the barcode region is also referred to as the ‘Folmer region’.

Figure 1. Mitochondrial cytochrome c oxidase subunit I (COI) – standard DNA barcode marker for animals.
(Diagram labels: animal cell, mitochondrion, mtDNA.)

2 Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994). DNA primers for amplification of mitochondrial
cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and
Biotechnology 3(5): 294–299.


Barcode marker for plants


Standardization, minimalism, and scalability are the core principles of DNA barcoding. COI fits
these criteria for animals; however, the low rate of nucleotide substitution in mitochondrial COI in
plants makes the gene a poor plant barcoding marker. This has led to a search for
alternative barcoding regions, resulting in a recommendation for a two-tiered approach to
plant DNA barcoding using plastid gene markers (Figure 2):

• First pass analysis using rbcL, the gene for the large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase (RuBisCO), which can be easily aligned and typically offers genus-level resolution;
• Second pass analysis with Maturase K (matK) which can only be aligned among closely
related groups of plants, but offers better taxonomic resolution.
Although the combination of rbcL+matK became the standard barcode for land plants3, other
markers are also being used, such as the internal transcribed spacer 2 (ITS2) which is part of the
nuclear ribosomal DNA tandem repeats.

Figure 2. Plastid rbcL and matK genes – standard markers used in the two-tier DNA barcoding approach
for plants. (Diagram labels: plant cell, plastid.)

Barcode marker for fungi


Fungi constitute the second largest kingdom of eukaryotic organisms but are relatively
underexplored. Their morphological identification is a challenge, especially in the context of a
steep decrease in fungal taxonomic expertise; molecular identification has therefore become a
workable approach for identifying and cataloguing fungal species. The DNA barcode markers that
have been established for animals and plants do not provide desirable results for discriminating
fungi. COI is difficult to amplify in fungi, often includes large introns (and is therefore difficult to align),
and may also be inadequately variable. Moreover, some fungal clades, such as
Neocallimastigomycota, lack mitochondria.

3 Hollingsworth PM, Forrest LL, Spouge JL, Hajibabaei M, Ratnasingham S, et al. (2009). A DNA barcode
for land plants. Proceedings of the National Academy of Sciences 106: 12794–12797.

After a thorough search for a useful barcode marker, the internal transcribed spacer (ITS) (Figure
3), a non-coding region of the ribosomal cistron, was proposed as the standard barcode for fungi4.
This was mainly due to its ability to provide the highest probability of successful identification for
the broadest range of fungi by providing a clear barcode gap, i.e., a separation between the largest
intraspecific and the smallest interspecific divergence. However, it should also be noted that COI is
still preferred as a barcode in some fungal genera (e.g. Penicillium).
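
The barcode gap mentioned above can be illustrated with a small calculation. The following is a minimal sketch, not part of the original manual: it assumes sequences that are already aligned to equal length, uses the uncorrected p-distance (proportion of differing sites) as the divergence measure, and works on invented toy sequences. The function names `p_distance` and `barcode_gap` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(records):
    """Return (max intraspecific, min interspecific, gap) for labelled sequences.

    `records` is a list of (species_name, aligned_sequence) tuples.
    A positive gap means the two distance distributions do not overlap.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need within-species and between-species pairs")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Toy data (invented, 10 bp only; real barcodes are hundreds of bp long).
toy = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAT"),
    ("Species B", "ACGAACTTGC"),
    ("Species B", "ACGAACTTGC"),
]
max_intra, min_inter, gap = barcode_gap(toy)
print(f"max intraspecific: {max_intra:.2f}")
print(f"min interspecific: {min_inter:.2f}")
print(f"barcode gap:       {gap:.2f}")
```

In practice, corrected distance models and far larger datasets are used; the sketch only shows what is being compared when a marker is said to have a clear barcode gap.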

Figure 3. Nuclear internal transcribed spacer (ITS) region – standard DNA barcode marker for most fungi.
(Diagram labels: fungal cell, nucleus.)

Other kingdoms
Currently, only animals, plants and fungi have standard barcode markers accepted by the DNA
barcoding community. Protists, such as seaweeds and diatoms, have been investigated for DNA
barcoding only on a small scale. However, it is worth mentioning the commonly used primary markers
for macroalgae and diatoms5, as these taxa represent important components of marine
ecosystems:

• Red algae (Rhodophyta): COI-5P
• Brown algae (Phaeophyceae): COI-5P
• Green algae (Chlorophyta): tufA (plastid elongation factor Tu gene)
• Diatoms (Bacillariophyta): rbcL-3P

4 Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W, Bergeron MJ, Hamelin
RC, Vialle A, and Fungal Barcoding Consortium (2012). Nuclear ribosomal internal transcribed spacer (ITS)
region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences
109: 6241–6246.
5 Saunders GW, McDevit DC (2012). Methods for DNA Barcoding Photosynthetic Protists Emphasizing
the Macroalgae and Diatoms, in Methods in Molecular Biology 858: 207–222.


In addition to these primary markers, secondary markers can be used for better resolution and
are normally represented by LSU D2/D3 (divergent domains D2/D3 of the nuclear ribosomal large
subunit DNA)5.
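
For readers who manage marker choices in scripts or spreadsheets, the markers listed in this section can be captured in a simple lookup table. The sketch below is illustrative only: the group labels, the dictionary names and the function `markers_for` are hypothetical, while the marker names themselves are taken from the text above.

```python
# Primary barcode markers per taxonomic group, as listed in this section.
PRIMARY_MARKERS = {
    "animals": ("COI",),
    "land plants": ("rbcL", "matK"),   # two-tiered approach
    "fungi": ("ITS",),
    "red algae": ("COI-5P",),
    "brown algae": ("COI-5P",),
    "green algae": ("tufA",),
    "diatoms": ("rbcL-3P",),
}

# Secondary marker mentioned for macroalgae and diatoms.
SECONDARY_MARKERS = {
    group: ("LSU D2/D3",)
    for group in ("red algae", "brown algae", "green algae", "diatoms")
}


def markers_for(group: str) -> dict:
    """Return the primary and secondary markers recorded for a group."""
    key = group.strip().lower()
    if key not in PRIMARY_MARKERS:
        raise KeyError(f"no standard marker recorded for {group!r}")
    return {
        "primary": PRIMARY_MARKERS[key],
        "secondary": SECONDARY_MARKERS.get(key, ()),
    }


print(markers_for("Land plants"))
print(markers_for("diatoms"))
```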

DNA Barcoding sensu lato6


The section on DNA barcoding markers lists the STANDARD markers accepted by the
barcoding community. As a reminder, DNA barcoding relies on standardization and minimalism.
However, various users (e.g., researchers, regulatory agencies, commercial companies) prefer
other markers depending on the taxon and the goal of the study. While this wider use
of molecular markers could be seen as ‘molecular species identification’ rather than DNA
barcoding, it is worth noting that in some cases additional markers are needed for resolution
at lower taxonomic levels or for prokaryotes. For instance, the European and Mediterranean Plant
Protection Organization (EPPO) has published a standard titled ‘DNA barcoding as an
identification tool for a number of regulated pests’7. It includes various markers for each taxon of
concern in a regulatory context (arthropods, nematodes, fungi, invasive plants, bacteria,
phytoplasmas8). Although this EPPO standard is accepted in the plant health community, it should
be considered as DNA barcoding sensu lato, since only arthropods have the same marker (COI)
as DNA barcoding sensu stricto9 (the subject of this manual). For the other taxa, the EPPO standard
proposes other or additional markers, based on research conducted exclusively on a subset
of regulated pests, for which these markers gave a higher success rate in molecular
processing (amplification and sequencing) and better resolution to species or
infraspecific levels. Moreover, bacteria and phytoplasmas, which are
important pests for agricultural production, are not the subject of DNA barcoding sensu stricto
(which focuses on eukaryotes) covered in the GTI Training Manual.

Potential Utility and Limitations of DNA Barcoding


The digital nature of DNA sequence information facilitates automation when comparing very large
datasets (millions of records); it also minimizes interpretative bias, compared to using analog
characters, such as qualitative morphological traits. DNA barcoding also overcomes the limitations of
traditional morphological approaches in identifying species at all life stages, organismal fragments
that lack diagnostic characters, processed products, and even DNA traces in the environment. Although it
does not propose a taxonomic hierarchy based on genomic information, it does provide a
framework for provisional taxonomic allocation of organisms by defining operational taxonomic units
that can be standardized across different applied projects. This makes the approach highly
applicable in tackling ‘dark taxa’ – poorly known, hyperdiverse and morphologically
indistinguishable groups of organisms. The development of next-generation sequencing opened
the potential for employing highly multiplexed approaches in biodiversity screening at the
ecosystem level.

6 Sensu lato - “in the broad sense” (from Latin).
7 EPPO Standard PM7/129 (1) (2016). DNA barcoding as an identification tool for a number of regulated
pests.
8 Phytoplasmas - bacterial parasites of plant phloem tissue transmitted from plant to plant by insect vectors.
9 Sensu stricto - “in the strict sense” (from Latin).


DNA Barcoding Applications


As a cost-effective and robust approach towards DNA sequence-based identification of organisms
across eukaryotes, DNA barcoding has potential applications in varied areas, such as10:

• Agriculture and forestry – identifying and monitoring agriculture and forestry pests and
biological control agents;
• Human health – identifying and monitoring human disease vectors and reservoirs,
reconstructing disease transmission pathways, and assessing and monitoring naturally
occurring disease foci;
• Invasive alien species – identifying and monitoring alien (non-native) species that are
negatively impacting ecosystems, habitats and native species, and improving early
detection and regulatory measures to prevent cross-border transfer of unwanted alien
species;
• Endangered species – enhancing taxonomic and ecological knowledge about endangered
species and creating a diagnostic framework for monitoring and preventing illegal harvest
and trade by improving detection of illegal specimens at all levels;
• Environmental surveillance/monitoring – helping the extractive industries (oil, gas, mining),
the conservation sector (protected areas), and the natural resources (forestry, fisheries) and
agriculture sectors to meet their environmental goals and to evaluate the efficiency of
management, restoration and mitigation measures;
• Market surveillance, product ingredient authentication, detection of food contamination
and substitution (e.g., seafood, meat and natural products).
Importantly, these applications go beyond academic science, making DNA barcoding a useful tool
in the portfolio of practitioners working in areas related to biodiversity. This broad utility of DNA
barcoding was recognized at the 13th Conference of the Parties to the Convention on Biological
Diversity, which, in one of its decisions, invited Parties “to support the development, with the
assistance, as appropriate, of the international barcode of life network, of DNA sequence-based
technology (DNA barcoding) and associated DNA barcode reference libraries for priority
taxonomic groups of organisms…” as a vehicle for global capacity building supporting the overall
goals of the Convention and its Global Taxonomy Initiative (Decision XIII/31, 13 December,
2016)11.

DNA Barcoding Limitations


DNA barcoding has been applied to solve a broad range of issues requiring accurate taxonomic
identification and most of its applications remain focused on this important end goal. However,
DNA barcoding employs molecular markers that have also been used to address other research
questions, such as phylogenetic relationships between species or phylogeographic diversification
within species. These uses have sometimes been conflated with barcoding, which has caused
semantic ambiguity in the use of the term DNA barcoding: different scientists may understand
barcoding differently. As a result, DNA barcoding has been misapplied to questions that fall
outside its scope, or its application has been misinterpreted by peers. For this reason, it is important
to understand the limitations of the DNA barcoding approach in its strict interpretation (sensu
stricto). The limitations of DNA barcoding can be divided into three groups.

10 UNEP/CBD/SBSTTA/18/INF/20, 13 June, 2014; https://www.cbd.int/doc/meetings/sbstta/sbstta-18/information/sbstta-18-inf-20-en.pdf
11 https://www.cbd.int/doc/decisions/cop-13/cop-13-dec-31-en.pdf

• Conceptual limitations originate from the philosophy of DNA barcoding as a parsimonious
approach using one or two relatively short markers and employing computationally light
informatics algorithms to analyze the results:
o It is not intended to recover phylogenies and does not employ phylogenetic tree
building algorithms (although the markers used may have phylogenetic signal);
o Although it can help clarify the position of organisms within a taxonomic system, it
is not intended to draw conclusions about systematics;
o It may not be sufficient as a “standalone” tool for detection of species that are new
to science.

• Genetic limitations are inherent to the characteristics of the markers used – COI, matK
and rbcL are organellar markers, therefore predominantly maternally inherited:
o DNA barcodes cannot resolve cases of mitochondrial or plastid introgression,
including on-going or past hybridization;
o Rare cases of heteroplasmy may lead to inaccurate sequence reads, confusing
the analytical outcome;
o The genes used may not provide enough resolution to discern recently diverged
species, including species with traces of past introgression with other, closely
related species;
o In some cases, COI is known to be present as non-functional nuclear copies known
as nuclear-mitochondrial pseudogenes (NUMTs) – these copies may be mistaken
for mitochondrial COI.

• Methodological limitations are a result of the standard analytical protocols typically
employed in DNA barcoding; however, they are being addressed as new methods are
introduced, particularly those making use of next-generation sequencing platforms:
o The use of universal primers may have limited capacity in recovering DNA barcode
sequences from taxonomically diverse groups of organisms;
o High-throughput analytical protocols may not work if the samples contain degraded
DNA, such as in cases with old museum material, heavily processed animal or
plant products, or naturally decomposed organismal remains.
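
One of the genetic limitations listed above, NUMTs mistaken for mitochondrial COI, is commonly screened for by checking whether a sequence can be read as an uninterrupted protein-coding fragment. The sketch below shows one such heuristic and is not a procedure prescribed by this manual: it looks for in-frame stop codons in all three forward reading frames. The stop codon sets follow the NCBI mitochondrial genetic code tables (invertebrate: TAA, TAG; vertebrate: TAA, TAG, AGA, AGG); the function names and toy sequences are hypothetical.

```python
# Stop codons under two mitochondrial genetic codes (NCBI translation tables).
INVERTEBRATE_MT_STOPS = frozenset({"TAA", "TAG"})                # table 5
VERTEBRATE_MT_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})    # table 2


def open_frames(seq: str, stops=INVERTEBRATE_MT_STOPS) -> list:
    """Return the forward frame offsets (0, 1, 2) that contain no stop codon."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        if not any(codon in stops for codon in codons):
            frames.append(offset)
    return frames


def has_stop_in_every_frame(seq: str, stops=INVERTEBRATE_MT_STOPS) -> bool:
    """True if no forward frame is free of stop codons (a warning sign)."""
    return not open_frames(seq, stops)


# Toy sequences (invented, far shorter than a real barcode).
clean = "ATGGCATGA"        # TGA is not a stop codon in mitochondrial codes
suspect = "TAAATAAATAA"    # a stop codon occurs in each of the three frames

print(open_frames(clean))                 # [0, 1, 2]
print(has_stop_in_every_frame(suspect))   # True
```

A sequence that passes this check is not proven to be mitochondrial, since some nuclear copies retain an open reading frame; a sequence that fails it deserves a closer look at the trace files before it is accepted as a barcode.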

DNA Barcode Data Repositories


The utility of DNA sequence-based taxonomic identification relies on the availability of openly
accessible and well vetted reference libraries that could be used to identify the DNA barcode
sequences of unknown organisms. In accordance with established international practices, genetic
sequence data generated as a result of academic research should be published through a major
online genetic data repository.


The International Nucleotide Sequence Database Collaboration (INSDC12) is a foundational
initiative linking the DNA Data Bank of Japan (DDBJ13), the European
Bioinformatics Institute (EMBL-EBI), which hosts the European Nucleotide Archive (ENA14), and the
National Center for Biotechnology Information (NCBI), which hosts GenBank15. The three databases
(DDBJ, ENA, GenBank) operate independently but exchange sequence data. GenBank is
the largest repository of DNA sequences, with data being submitted through the standard
submission tool, BankIt16. Although it is broader in objectives (hosting all types of DNA sequences)
and non-specific in sequence organization (with only limited metadata being uploaded), GenBank
is still the most popular database for queries of unknown sequences.
The success and acceptance of DNA barcoding as a tool for species identification generated a
large amount of sequence data from all around the world. The specific format and purpose of
barcode data required a dedicated platform for storing barcode data and using them
for reference and analysis.
In this context, the Barcode of Life Data System (BOLD)17 was developed as an online
repository and bioinformatics analysis platform, operated under a Creative Commons license18,
as well as a workbench for users (Figure 4). It is specifically designed to host and analyze DNA
barcode sequence information, associated raw data, as well as provenance details, images and
taxonomic annotations related to the organisms from which the data originated. Its architecture
incorporates several modules designed to store, organize, visualize, review, curate, analyze and
share DNA barcode datasets to facilitate collaborative research and application. Moreover, it is
linked with GenBank, so that BOLD records can be submitted directly to GenBank and, at the same
time, BOLD can periodically mine GenBank for sequences of specific markers.

Figure 4. BOLD home page. Links to important public tools and the workbench are on the top right corner.

12 http://www.insdc.org/
13 http://www.ddbj.nig.ac.jp/
14 https://www.ebi.ac.uk/ena
15 https://www.ncbi.nlm.nih.gov/genbank/
16 https://www.ncbi.nlm.nih.gov/WebSub/?tool=genbank
17 http://boldsystems.org
18 Ratnasingham S, Hebert PDN (2007). BOLD: The Barcode of Life Data System (www.barcodinglife.org).
Molecular Ecology Notes 7: 355–364.


Standard DNA Barcoding Workflows: General Overview


DNA barcoding may be employed for processing and analysis of individual organisms, through a
single-specimen analysis pipeline, or of numerous specimens organized as an array (e.g. 96-well
format). The choice of protocol depends on the objectives of the analysis. For example,
the single-specimen analysis pipeline may be more suitable for forensic analysis of a tissue
sample, for molecular identification of ancient specimens, or for regulatory applications (such as
identification of plant pests, invasive alien species or endangered species) when the number of
samples to be processed is small. Alternatively, when the specimens to be processed are numerous
and freshly collected (so that they yield a sufficient amount of DNA and do not require specialized
treatment), a batch approach (96-well format) is more efficient. Such a batch approach can be
employed in baseline barcoding studies (biodiversity surveys, biomonitoring).
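
When samples are organized in the 96-well format (8 rows labelled A to H by 12 columns), each sample needs an unambiguous well position. The following is a minimal sketch of such a plate map, not a layout prescribed by this section: the column-wise filling order, the reservation of well H12 for a negative control, and the names `well_name` and `plate_map` are all assumptions made for illustration.

```python
ROWS = "ABCDEFGH"   # 8 rows
N_COLS = 12         # 12 columns, giving 96 wells in total


def well_name(index: int, column_major: bool = True) -> str:
    """Convert a 0-based position (0..95) into a well label such as 'A01'.

    column_major=True fills A01, B01, ... H01, A02, ... (an assumed convention);
    column_major=False fills A01, A02, ... A12, B01, ...
    """
    if not 0 <= index < len(ROWS) * N_COLS:
        raise ValueError("index must be between 0 and 95")
    if column_major:
        col, row = divmod(index, len(ROWS))
    else:
        row, col = divmod(index, N_COLS)
    return f"{ROWS[row]}{col + 1:02d}"


def plate_map(sample_ids, control_label="NEG_CONTROL"):
    """Assign up to 95 samples to wells, reserving H12 for a negative control.

    Reserving the last well for a control is an assumption of this sketch.
    """
    max_samples = len(ROWS) * N_COLS - 1
    if len(sample_ids) > max_samples:
        raise ValueError(f"at most {max_samples} samples fit on one plate")
    layout = {well_name(i): sid for i, sid in enumerate(sample_ids)}
    layout["H12"] = control_label
    return layout


# Hypothetical sample identifiers.
layout = plate_map([f"SAMPLE-{n:03d}" for n in range(1, 11)])
for well in ("A01", "H01", "A02", "H12"):
    print(well, layout[well])
```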
In terms of operating procedures, there are three main components of DNA barcoding workflows
(Figure 5): (i) front-end processing (collections), (ii) molecular processing and (iii) informatics.

Figure 5. Three major components of the workflow are outlined in colour: Green – front-end; Blue –
molecular analyses; Brown – informatics (including data submission, curation and analysis). Specimen
collection data and photographs are submitted to BOLD. After sequencing, trace files and edited sequences
are submitted to BOLD as well.

Each component has subcomponents that need to be covered during DNA barcoding:
1. Front-end processing
   a. Collection management
   b. Collection data management
   c. Specimen imaging
   d. Tissue sampling
2. Laboratory analysis
   a. DNA extraction
   b. PCR amplification and gel check
   c. Sequencing
3. Sequence management and analysis
   a. Sequence editing
   b. Sequence submission to BOLD/GenBank
   c. Sequence analysis and publishing

Main Types of DNA Barcoding Workflows


The growing use of DNA barcoding in a regulatory context demands both accuracy and clarity of
the identification results and the stringency and transparency of the procedure applied to obtain
these results. In other words, DNA barcode data should be able to “stand up in court”. One of the
key steps towards meeting these requirements is building robust, clear, transparent and highly
standardized operational workflows. Conceptually, it is possible to define three main workflows
that differ in: (i) the types of materials analyzed (‘Elementary Unit’ in Table 1), (ii) the ultimate
goals of the analysis (‘End Result’ in Table 1) and, (iii) correspondingly, the analytical approaches
used.
Table 1. Main types of DNA barcoding workflows and their essential characteristics

| | Baseline Barcoding | Forensic Barcoding | Metabarcoding |
|---|---|---|---|
| Elementary Unit | biological specimen | organismal fragment | environmental bulk sample (lot) |
| Processing Unit | tissue sample | tissue sample | entire bulk sample (lot) or bulk lysate |
| Processing Batch | single tubes or 96-well microplates | single tubes with samples | bulk sample or multiplexing array |
| Processing Method | Sanger sequencing | Sanger sequencing | next-generation sequencing |
| Vouchering | collection vouchers | usually unvouchered | unvouchered or vouchered as lot |
| Number of Sequences | single sequence | single sequence | multiple sequences |
| End Result (goal) | DNA barcode library | DNA-based identification | list of taxa or taxonomic units |

Baseline DNA barcoding

Baseline (routine) barcoding involves low, medium and high-throughput workflows aimed at
populating the DNA barcode library. The subset of the library consisting of barcodes linked to
physical specimens (vouchers) deposited in public institutions and identified to the species level
by experts (taxonomists) should be considered as the reference barcode library, the baseline
used for DNA sequence-based identification of unknown sequences. This library subset can be
built by using specimens from existing natural history collections or freshly sourced specimens
identified by taxonomists. The remaining library subset is usually generated based on freshly
collected specimens with provisional identification (usually to higher taxonomic levels) that is
updated at a later stage (based on matches between the new sequences and the existing
reference library). This approach is particularly useful for large batches of specimens representing
taxa that are difficult to identify (i.e., where the cost of identification by a taxonomic expert
outweighs the cost of molecular analysis per sample). In this
case, only a subset of the entire series of specimens with identical DNA barcodes needs to be
selected for non-molecular (e.g., morphological) identification to ensure the accuracy of the
results derived from DNA barcoding. Thus, the workflow can also be used in a more applied
context, when certain stages could be omitted (e.g., morphological identification of each specimen
prior to molecular processing).

Forensic barcoding

Forensic (diagnostic) barcoding is an applied approach dealing with organismal fragments or
derivatives. Samples may derive from a biological individual, but often cannot be clearly traced to
a morphologically identifiable specimen. When morphological identification of the sample is
confounded or impossible, its taxonomic origin needs to be inferred through molecular analysis.
This procedure is often employed in taxonomic authentication of materials involved in forensic
cases, such as wildlife and forest crime. In cases when the sample is suspected to originate from
a single organism, Sanger sequencing analysis is used; however, in more complex cases
involving a suspected mix of derivatives from multiple taxa (e.g., meal mixes or natural health
products), next-generation sequencing approaches are warranted to parse out the taxonomic
composition of a sample (see metabarcoding). Despite the lack of reference voucher specimens,
forensic analysis demands a strictly controlled audit trail, therefore it typically involves similar
analytical stages to routine barcoding, in order to trace the analytical result to its material source
and to keep a record of the provenance of the sample analyzed. Results of forensic analyses
often have considerable significance in their application (e.g., as criminal evidence or grounds for
regulatory action) and are time-sensitive, therefore it is justified to deploy extra effort to analyze
the samples individually, rather than in large arrays.

Metabarcoding

With the advent of next-generation sequencing technologies, a growing body of research is
advocating the use of massively parallelized sequencing approaches to aid in ecological
assessments, biosurveillance and real-time monitoring of ecological communities, as well as
probing their trophic relationships. The goal of such analyses is to infer alpha-diversity
(species-level diversity) in ecosystems sampled using appropriate collecting techniques:

• Direct collection of multiple unsorted individuals, often used for smaller organisms (e.g.,
sweep-netting, Malaise-trapping, pitfalls, planktonic or benthic sampling);
• Indirect collection of tissue samples, usually from larger organisms (e.g., from fecal
samples, or hair-snagging traps);
• Indirect inference of the occurrence of certain taxa in an area by collection and analysis
of water, air, or substrate samples containing their environmental DNA (eDNA).
In the first case (multiple unsorted individuals), sequence information is derived from a bulk
sample containing a multitude of taxonomically diverse organisms that are not individually sorted
or vouchered; in the latter two, there are typically no morphologically discernable specimens in
the bulk sample. In either case, the resulting sequences are typically checked against reference
libraries to identify known taxa, while the remaining sequences are grouped into operational
taxonomic units to provide an estimate of taxonomic diversity within the bulk sample.
Although individual specimens are not tracked in metabarcoding workflows, it is just as important
to keep a record of the provenance of environmental samples and to track their analytical history.
For information on scaling up the DNA Barcoding Workflows, see Annex I.
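The two-step logic described for metabarcoding (match sequences against a reference library, then group the remainder into operational taxonomic units) can be illustrated with a deliberately simplified sketch. It assumes that all sequences are already aligned to the same length, uses a hypothetical 97% identity threshold, and applies a simple greedy clustering in which the first unmatched read of each group acts as its seed. Real pipelines rely on dedicated alignment and clustering software; the names `identity` and `assign_reads` are invented for this example.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_reads(reads, reference, threshold=0.97):
    """Match reads to a reference library; cluster unmatched reads into OTUs.

    reference: dict mapping taxon name -> reference sequence (hypothetical format).
    Returns (identified, otus): a list of (read, taxon) pairs and a list of OTUs,
    where each OTU is a list of reads whose first element is the seed.
    """
    identified = []
    otus = []
    for read in reads:
        # Step 1: look for a known taxon in the reference library.
        hit = next((taxon for taxon, ref in reference.items()
                    if identity(read, ref) >= threshold), None)
        if hit is not None:
            identified.append((read, hit))
            continue
        # Step 2: otherwise join the first OTU whose seed is similar enough.
        for otu in otus:
            if identity(read, otu[0]) >= threshold:
                otu.append(read)
                break
        else:
            otus.append([read])  # no similar OTU found: start a new one
    return identified, otus
```

The number of OTUs returned gives only a rough estimate of unidentified diversity, and it changes with the threshold chosen.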


Chapter 1. Front-End Processing

Natural History Collections and Lab Work – An Integrative Approach
Historically, natural history collection management practices have been driven by the requirement
to preserve the morphological integrity of biological objects and to organize them in a system,
usually based on their taxonomic placement and geographic origin. Although the early laboratory
experimental approaches have been largely dismissive of natural history collections and regarded
them as ‘obsolete’ or even ‘unscientific’ practices, the view changed by the end of the 20th century,
when it became obvious that collections, if preserved in a ‘DNA-friendly’ way, have broad scientific
value. Genetic resources collections have been established in many leading museums, herbaria,
zoos, botanic gardens and other collection facilities. In turn, this catalyzed a paradigm shift in
science, where the retention and proper annotation of physical vouchers is increasingly regarded as
the ‘gold standard’ for experimental biology, including genomics. The philosophy of DNA
barcoding is deeply grounded in this integrative vision, but goes further in developing robust
operational frameworks that allow rapid generation of molecular biodiversity data through
seamless interfacing between collection management pipelines and downstream molecular
analytical protocols. As a result, the three core components of the DNA barcoding workflow
(collections, molecular, informatics) are typically integrated into a single ‘production line’. The
stages listed below relate to a typical Sanger sequencing pipeline:

• Sourcing – the process of collecting biological materials in the natural environment; note
that each DNA barcode sequence must link to its source specimen representing a single
biological individual (or clone, for modular organisms).
• Vouchering involves the processing stages necessary to turn collected organisms into
properly curated vouchers deposited in a recognized collection and available for
[re]examination by experts:
o Provenance data digitization – entry of information about the origin of biological
materials in a spreadsheet or database;
o Labeling – affixing labels with unique alpha-numeric identifiers to collection objects
and their storage containers;
o Taxonomic identification to the lowest level allowable by using non-molecular (e.g.,
morphological) characteristics.
• Imaging is used to generate an ‘e-voucher’, in case the physical voucher is lost or
consumptively analyzed:
o Generating digital images of each voucher specimen;
o Upload of images to the reference database (e.g., BOLD Systems).
• Tissue sampling – the process of isolating organismal parts destined for molecular analysis
and arranging them in a lab-compatible format.
• Molecular analysis – laboratory stages required to analyze the sample(s):
o DNA extraction, including tissue lysis, DNA isolation and purification;

o PCR amplification, including the amplification (“multiplying”) of the DNA fragment
representing the ‘barcode region’ and cycle-sequencing – generation of fragments of
different length tagged with dye-tags that could be analyzed on the sequencer;
o PCR quality control – verification of PCR success using gel-electrophoresis;
o DNA sequencing – electrophoretic separation of tagged DNA fragments and
generation of raw data reads (chromatogram trace files).
• Bioinformatics analysis – reading raw sequence data and using it for DNA-based
identification:
o Sequence assembly – visual or software-based interpretation of raw sequence data
to generate FASTA files representing the sequential order of nucleotide bases in the
analyzed ‘barcode region’;
o Data validation – verifying the congruence of assigned morphological identification
with that inferred from molecular analysis (by comparing against available data,
including existing conspecific records);
o Data curation – addressing emerging cases of data discordance (verification of
sequences, re-identification of vouchers, etc.).
Depending on the type of workflow, some of the above stages may be reduced, modified, or
completely eliminated. When next-generation sequencing approaches are employed, many of the
molecular and bioinformatics components are drastically different from the list above, but
maintaining the overarching integrative approach towards making collection management
seamlessly compatible with the molecular pipeline remains important.
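As an illustration of the data validation stage listed above, the following sketch flags records in which the morphological identification disagrees with the identification inferred from the DNA barcode. The record layout (dictionaries with the keys `specimen_id`, `morphological_id` and `molecular_id`) is a hypothetical format chosen for the example; flagged records would then move on to data curation.

```python
def flag_discordant(records):
    """Return the IDs of records whose two identifications do not agree.

    Each record is a dict with the hypothetical keys 'specimen_id',
    'morphological_id' and 'molecular_id'. The comparison ignores case
    and surrounding whitespace.
    """
    flagged = []
    for rec in records:
        morphological = rec["morphological_id"].strip().lower()
        molecular = rec["molecular_id"].strip().lower()
        if morphological != molecular:
            flagged.append(rec["specimen_id"])
    return flagged
```

A flag does not indicate which identification is wrong; it only marks the record for re-examination of the sequence, the voucher, or both.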

Collecting Activities
Collecting Effort (expedition)

Fieldwork generally involves careful planning and organization; and its implementation requires
allocating administrative (e.g., permits and authorization), physical (e.g., equipment, travel
logistics) and financial resources. Thus, a collecting effort could be defined as a broad activity
that typically has an overarching goal (e.g., surveying certain organisms within a certain territory
over a certain period) and is part of an institutional or collaborative project or programme. A field
collecting expedition to a particular geographic location would be a typical example of a collecting
effort; another example is a research program that involves recurring collecting activities in a
certain area.

Collecting Event

In order to accomplish the broad goals of a collecting effort, targeted sampling activities are
required, designed to survey or collect functional or taxonomic groups of organisms in a particular
locality over a relatively short time span. Such activities would involve a particular trapping method
and would typically result in collecting the target organisms, often as a bulk sample. They may be
localized in space and time, i.e., be conducted on a specific date in a spot with precise coordinates
or span several days (e.g., stationary trapping using a Malaise trap) or a range of localities (e.g.,
a transect survey). It would involve collectors (people executing the collecting) and a particular
sampling method. For example, a yellow pan trap and a pitfall deployed in the same location on
the same date would represent two collecting events; however, a line of several pitfalls placed in
proximity to each other and sampled on the same date may represent the same collecting event.


It is necessary to separate the materials sourced19 from different collecting events into different
lots (see below).
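One way to picture the definition of a collecting event is as a record whose identity depends on locality, date range, sampling method and collectors. The sketch below is an assumption-laden illustration (the class `CollectingEvent`, its field names and the helper `containers_by_event` are invented here): two traps of different types at the same place and date yield two distinct events, whereas several containers from the same event are grouped together.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CollectingEvent:
    """A collecting event; frozen so that it can be used as a dictionary key."""
    locality: str
    start_date: date
    end_date: date
    method: str
    collectors: tuple


def containers_by_event(bulk_samples):
    """Group container IDs by collecting event.

    bulk_samples: iterable of (CollectingEvent, container_id) pairs.
    Material from different events is never placed under the same key.
    """
    grouped = {}
    for event, container_id in bulk_samples:
        grouped.setdefault(event, []).append(container_id)
    return grouped
```

For example, a yellow pan trap and a pitfall trap recorded with the same locality and date differ in `method` and therefore end up under separate keys.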

Collection Objects (operational or storage units)


Physical representations of the organism(s) that were collected are referred to as collection
objects. They can also be regarded as operational units during collection processing, or storage
units when archived after processing. It is critical to ensure that each such object receives a
unique number20 and that its linkage to a particular collecting event is duly recorded on the label(s)
and in field data journals. In a typical DNA barcoding operational workflow, specimens destined
for analysis need to be sorted from bulk samples into individual specimens, databased, labeled,
imaged, arrayed into lab-compatible format and sampled. During this process, the nature of the
operational or storage units may change, depending on their content and destination. It is
important to understand the difference between the key terms used to identify such units.
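The requirement that each collection object receives a unique number can be supported by generating the codes in advance and checking any list of codes for duplicates. The following is a minimal sketch; the code format (a prefix, a hyphen and a zero-padded number) and the function names are assumptions made for illustration, and each project would agree on its own coding system.

```python
def make_codes(prefix, start, count, width=5):
    """Generate sequential codes such as 'GTI-00001' (hypothetical format)."""
    if start < 0 or count < 0:
        raise ValueError("start and count must be non-negative")
    return [f"{prefix}-{n:0{width}d}" for n in range(start, start + count)]


def find_duplicates(codes):
    """Return the set of codes that occur more than once in a list."""
    seen = set()
    duplicates = set()
    for code in codes:
        if code in seen:
            duplicates.add(code)
        seen.add(code)
    return duplicates
```

Running `find_duplicates` on the identifier column of a field data sheet before processing is a simple way to catch numbering errors early.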

Lot

Also known as bulk environmental sample, the lot (Figure 6) represents an aggregation of multiple
unaccounted individuals derived from the same collecting event and aggregated in a single
container. These individuals may be from one or more taxa. Lots usually result from bulk collecting
events, such as sweep netting, plankton netting, pitfall traps, Malaise traps, etc. For the purpose
of logistical convenience, lot contents may be further sorted by taxonomy or other characteristics,
thereby breaking up into two or more ‘sub-lots’.
Because the specimens within a lot, as a rule, cannot be individually discerned, it is imperative
that the contents of lots originating from different collecting events are not mixed together. It is
also critical that any organisms that received any specific treatment (e.g., retrieval of any
individual information, such as measurements, or removal of parts, such as tissue samples), are
isolated from the lots and assigned individually recognizable coding as voucher specimens (see
below).

19
Note that a collecting event circumscribes the activity undertaken in order to collect biological materials,
but does not necessarily result in collecting success. In case of a negative result, recording the collecting
event can still yield ecologically valuable data.
20
Prior to field work, it is preferred to agree on a clear and non-duplicating alpha-numerical coding system
to be used for collected biological materials. Pre-printing labels with these codes and affixing them to
collection objects can help dramatically reduce error in the field.


Figure 6. Examples of collection lots. Left to right: storage archive of jars with marine plankton samples
held at El Colegio de la Frontera Sur, Chetumal, Mexico; collection bottle with insects being removed from
a Malaise trap; and contents removed from the Malaise trap collection bottle – a slurry of small and tiny
insects. Note that none of the specimens are individually labeled or tracked.

Specimen

Also known as ‘collection voucher’, the specimen (Figure 7) represents a single biological
individual 21. In modular organisms (such as plants) parts of the same clone stored in different
repositories may be treated as different specimens, but typically a specimen would encompass
all parts preserved from an individual organism in different forms of preparation (e.g., dry mount,
skeletal elements, whole carcass, slide preparations, tissue samples, etc.). Specimens may be
collected individually in the field or removed from bulk samples (lots). In the latter case, it is
important to retain the virtual link to the lot from which the specimen was isolated; and it is always
critical to track its association with the corresponding collecting event.
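The links between a specimen, the lot it was isolated from, and the collecting event can be represented explicitly and checked for consistency. The sketch below uses hypothetical record types (`Lot`, `Specimen`) and identifiers; it is meant only to illustrate the kind of linkage described above, not a particular database schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lot:
    lot_id: str
    event_id: str  # collecting event the bulk sample came from


@dataclass
class Specimen:
    specimen_id: str
    event_id: str
    lot_id: Optional[str] = None  # None if collected individually in the field


def isolate_specimen(lot, specimen_id):
    """Create a specimen record that inherits the lot's collecting event."""
    return Specimen(specimen_id=specimen_id, event_id=lot.event_id, lot_id=lot.lot_id)


def broken_links(specimens, lots):
    """Return IDs of specimens whose lot is unknown or tied to a different event."""
    lots_by_id = {lot.lot_id: lot for lot in lots}
    problems = []
    for specimen in specimens:
        if specimen.lot_id is None:
            continue  # individually collected; nothing to cross-check
        lot = lots_by_id.get(specimen.lot_id)
        if lot is None or lot.event_id != specimen.event_id:
            problems.append(specimen.specimen_id)
    return problems
```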

Figure 7. Examples of collection voucher specimens. Left to right: vertebrate specimen (bird study skin),
entomological specimen (pinned beetle), plant specimen (herbarium voucher). Note that each specimen
has a label affixed to it; the labels have individual, globally unique voucher numbers that allow tracking each
specimen through the analytical process and linking it with the corresponding data records.

21
While specimens are physical entities (biological individuals), species are operational units used to group
specimens based on a set of criteria.


Tissue Sample

In the context of standard DNA barcoding approaches, the term ‘tissue sample’, or simply
‘sample’ refers to a portion of a specimen (usually a piece of DNA-rich tissue) isolated for
molecular analysis. When dealing with microscopic organisms, whole individuals may be
consumptively analyzed; in this case, no tissue sample is isolated, although exoskeletal remains
may be salvaged after analysis and preserved as collection vouchers. Finally, in larger organisms
(e.g., most vertebrates), one or more tissue samples from a single individual may be stored in a
genetic resources collection; and only a portion of the tissue volume may be used for a particular
molecular analysis. In this case, a piece of the archived sample is taken; this procedure is referred
to as ‘subsampling’.
It should be noted that, depending on the nature of analysis, the sample may contain DNA from
other organisms, e.g., parasites, pathogens or contaminants. It should also be noted that the term
‘environmental sample’ or ‘bulk sample’ is often used in the context of metabarcoding analysis
(screening of multi-species assemblages using next-generation sequencing approaches). Such
bulk sample would derive from a bulk collecting event and would be equivalent to a lot or its
genetically representative subset.

Accession
The term ‘accessioning’ refers to the procedure of registering a collection in a processing facility
or repository. In this context, the term ‘accession’ is used to denote a field collection registered in
a museum or repository as a single entity (usually, multiple objects derived from a single collecting
effort or sourced from a donor). Thus, it would typically contain many lots and/or specimens. Such an
aggregate (aka ‘accession’) is usually assigned an accession registrar number. Once
accessioned in a collection or processing facility, collection objects undergo sorting, preparation
and are catalogued in a collection registrar and/or database. In order to facilitate processing
logistics and associated audit trail, such objects may be further categorized or aggregated during
different stages of processing and archival.

Processing Array
Collection processing is typically done in batches; therefore, it is often logistically convenient to
aggregate catalogued storage units in a format that facilitates tracking and speeds up the process.
In the context of medium/high-throughput DNA barcoding, the most operationally critical phase is
the transfer of a batch of specimens into a lab-compatible array of 95 tissue samples. The
assembly of compatible processing arrays allows boosting operational efficiency, while reducing
the likelihood of human error. Unlike a lot, the contents of an array are individually accounted for
and their position is tracked. Arrays of collection specimens may be disassembled after
tissue sampling or reordered for a different type of analysis; however, arrays may also be used to
aggregate specimens for archival storage.
More information on Collecting Ontologies can be found in Annex II.
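To illustrate how the contents of a processing array can be individually accounted for, the sketch below assigns up to 95 sample identifiers to positions in an 8 x 12 grid. Several details are assumptions made for the example rather than statements from this manual: the 96-position layout, the row-by-row filling order, and the reservation of the last position (H12) for a negative control.

```python
import string

ROWS = string.ascii_uppercase[:8]  # rows A to H
COLUMNS = range(1, 13)             # columns 1 to 12


def plate_map(sample_ids, control_well="H12"):
    """Map sample IDs to well positions, keeping one well for a control.

    Wells are filled row by row (A01, A02, ... H11). Raises ValueError if
    there are too many samples or if any sample ID is repeated.
    """
    wells = [f"{row}{col:02d}" for row in ROWS for col in COLUMNS]
    if control_well not in wells:
        raise ValueError(f"unknown well: {control_well}")
    free_wells = [well for well in wells if well != control_well]
    if len(sample_ids) > len(free_wells):
        raise ValueError("too many samples for one array")
    if len(set(sample_ids)) != len(sample_ids):
        raise ValueError("duplicate sample IDs in array")
    layout = dict(zip(free_wells, sample_ids))
    layout[control_well] = "NEGATIVE_CONTROL"
    return layout
```

Keeping such a map alongside the physical array is what allows each sequence to be traced back to its source specimen after laboratory processing.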


Collection Data Management: Practical Approaches


The diverse natural history collections used in DNA barcoding need to conform to a number
of common standards and formats related to the nature of the materials that are being collected
and the structure of associated information (refer to the above discussion of lots, specimens and
samples). Adopting standardized approaches will help simplify sourcing and organizing biological
materials and recording associated information, while improving the quality of both materials and
data.

Collection Management – General Considerations


Biological materials for molecular genetic analysis can be sourced in-situ or ex-situ. In-situ
(literally, “on site”) sourcing refers to organisms collected or sampled in an area where they
‘naturally’ (or historically) occur or in an extralimital locality (outside their ‘natural’ occurrence
range) where they may have dispersed or have been unintentionally transferred. Ex-situ (“off site”)
sourcing identifies a situation when, prior to sampling, an organism or its part is intentionally
transferred outside its natural or historic occurrence range to an off-site facility for the purpose
of storage (e.g., natural history collections), conservation, or propagation (e.g., zoos, botanic
gardens, or biobanks). For in-situ collecting, it is vital to record provenance data at the point
when an organism or sample enters human custody.
Specimen collecting techniques vary significantly, depending on the taxonomy of organisms and
the specific research goals; however, there are common elements related to the need to make
the samples compliant with medium to high-throughput molecular analysis downstream.

• Materials collected in the field may represent either isolated biological individuals or bulk
samples containing multiple individuals. For the purpose of molecular analysis, it is critical
to discern these categories. The following standard terms are used:
o Lot – batch of unarrayed (usually uncatalogued) specimens derived from the same
bulk sample (e.g., pitfall, Malaise trap, plankton tow, etc.) and stored together in the
same jar, vial or other storage container.
o Specimen – main (elementary) storage unit, corresponding to an individual biological
object (whole or partial voucher and/or tissue sample).
Note: One DNA barcode record should always refer to a single biological individual;
therefore, the corresponding data records pertaining to the sequence and the specimen
itself have to be unambiguously linked.

• All collection objects (lots, specimens and any derived tissue samples) and their
relationships need to be clearly traceable using a system of unique identifiers, database
records, labels, and/or storage locators that would facilitate unambiguous correspondence
between biological individual and its sequence information.
• Provenance information (details of the geographic origin and collecting circumstances of
biological materials) should be recorded, ideally, in digital (e.g., database record) AND
analog (e.g., specimen label) format.
• Specimens/tissue samples destined for archival or molecular analysis need to be arranged
in a way that prevents mix-ups and facilitates routine processing of large volumes of
materials (at least, hundreds of samples per week).

• Biological materials destined for molecular analysis need to be collected and preserved in
a DNA-friendly fashion:
o They should be preserved as soon as possible, either by cryopreservation, drying
(desiccation), or by fixation using concentrated (~95%) ethanol or specialized DNA
preservation solutions (e.g., RNAlater);
o They should be stored in a manner that precludes possible DNA degradation from
light, hydrolysis, acidity/alkalinity, high temperatures, etc.
All the information mentioned above needs to be stored in a digital format. This can be done in a
database (MS Access for instance) or, for small amounts of data, in spreadsheets. An example
of a custom spreadsheet (Electronic Field Journal), with instructions for filling it in, is presented in
Annex III.
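Whatever format is used, provenance records benefit from a basic completeness check before they are submitted. The sketch below validates one spreadsheet row represented as a dictionary. The column names are hypothetical and do not reproduce the Electronic Field Journal template; the checks (required fields present, coordinates within valid ranges, dates in YYYY-MM-DD format) are assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical column names; adapt to the template actually in use.
REQUIRED = ("specimen_id", "locality", "latitude", "longitude",
            "collection_date", "collectors")


def validate_row(row):
    """Return a list of problems found in one provenance record (empty if none)."""
    problems = []
    for field in REQUIRED:
        value = row.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing {field}")
    try:
        latitude = float(row["latitude"])
        longitude = float(row["longitude"])
        if not (-90 <= latitude <= 90 and -180 <= longitude <= 180):
            problems.append("coordinates out of range")
    except (KeyError, TypeError, ValueError):
        problems.append("coordinates not numeric")
    try:
        date.fromisoformat(row["collection_date"])
    except (KeyError, TypeError, ValueError):
        problems.append("collection_date not in YYYY-MM-DD format")
    return problems
```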

Specimen Imaging: Basic Principles


Digital images complement the DNA barcode data record by providing an independent way of
verifying taxonomic identification of a given specimen when a morphological voucher is lost or not
readily accessible. Sometimes also called ‘e-vouchers’, specimen images form an integral part of
the reference DNA barcode database. Although it is not always possible to capture all
morphological diagnostic features in a single image, it often provides sufficient information to
resolve discordances between assigned taxonomy and the position of a given specimen inferred
from its DNA barcode sequence. BOLD provides a convenient platform for storing and comparing
large batches of specimen images, as well as cross-referencing them with neighbour-joining
trees; it also allows storing multiple images per specimen record. Additional online data
repositories can be used to house specimen images (e.g., MorphBank22).
Generally, it is recommended to image the specimens before tissue sampling, in order to capture
relatively intact morphology. In some cases, specimens require special preparation to generate
diagnostically meaningful images (e.g., slide mounting which can only be done after tissue
sampling or whole specimen tissue lysis). In these situations, it may be warranted to prepare the
images after sampling.

Macro vs micro photography


Depending on the size and nature of specimen and its form of preservation (dry vs. fluid), it is
optimal to use either a digital photo camera (macro photography) or a camera attached to a
microscope (micro photography).
Macro photography generally has the advantage of using relatively cheaper equipment, greater
portability and versatility in studio setup, somewhat higher optical quality and somewhat higher
processing throughput. Although most ‘consumer-grade’ cameras currently offer a “macro
function”, it is recommended to use SLR or SLR-type cameras with interchangeable lenses. Most
macro lenses enable capturing full-frame images of specimens 10 mm and larger in size;
however, it is usually possible to crop reasonably high-resolution images of smaller objects,
several mm in length. When working with large vouchers (e.g., vertebrate or herbarium
specimens) wide angle lenses can be used.

22
http://www.morphbank.net/

Figure 8. Examples of a micro photograph (A) and macro photograph (B). Left: image of a slide-mounted
flea taken using a digital camera mounted onto a dissecting microscope. Right: photograph of a fluid-
preserved frog specimen taken using an SLR camera with a macro lens.
Micro photography is warranted for specimens of several mm in length or less, particularly if they
are submerged in fluid (e.g., ethanol). Working with such specimens is usually more time-
consuming. Microscopes are typically stationary and often more expensive, compared to SLR-
type cameras; many of them are readily compatible only with same-brand cameras which require
proprietary software to capture and process images. If the specimens are not minute
(approximately 0.5 mm or larger), medium to high magnification stereoscopic (‘dissecting’)
microscopes are preferred to compound microscopes, because they allow direct manipulation
of the specimen, making it possible to combine the imaging and tissue sampling stages in the same
workflow.
Details about imaging setup and tips are compiled in Annex IV.

Tissue Sampling: Basic Principles


Barcode-friendly tissue samples
Tissue source

Tissue samples can be collected for a variety of end uses, including, but not limited to DNA
analysis. Historically, some tissue collecting protocols were intended to facilitate studies of
chemical contaminants or other specialized tasks (e.g., allozyme analyses). Not all of these
protocols can be readily applied in DNA barcoding workflows with the expectation of recovering
high-quality DNA. For example, many genetic resources repositories today house large
collections of field-sourced tissue from vertebrate liver and other internal organs which, if salvaged
1 h or more post-mortem, may contain heavily degraded DNA, due to the early onset of autolysis.
The main features of a barcode-friendly tissue source are:

• The tissue should be rich in structures such as mitochondria (for animal COI) or plastids (for
plant rbcL or matK);
• The tissue should have low enzymatic activity, reducing the likelihood of post-mortem
autolysis;
• The tissue should allow relatively easy lysis and DNA extraction;
• There should be a low risk of foreign contaminants.

Examples of DNA-friendly tissue sources that conform to these criteria in animals are: skeletal
muscle, nervous system and gonads (although the latter may contain symbionts or parasites in
certain insects and invertebrates). For plants, green vegetative parts are generally a good tissue
source.

Tissue preservation

In addition to the source, preservation techniques and chemicals used have an important effect
on DNA preservation. The following are considered to be DNA-friendly killing/fixation methods:

• Non-chemical methods (freezing)
• Ethanol (aquatic, pitfalls and Malaise traps)
• Chloroform, cyanide, ammonia (insects)
• Isoflurane, carbon dioxide (vertebrates)

Among non-chemical methods, ultra-cold freezing (cryo-preservation) is considered ideal, but
expensive and logistically difficult; drying (desiccation) is also good, but may be sensitive to
storage environment, e.g., ambient humidity.
By contrast, the following are discouraged because they can destroy DNA and/or bind it,
complicating DNA extraction:

• Formalin (marine)
• Ethyl acetate (insects)
• Diluted propylene glycol (Malaise traps, pitfalls)
• Most histological solutions

Note that all methods are sensitive to a wide range of factors, such as nature and quality of tissue,
quality of fixative/preservative, specifics of the fixation procedure, and subsequent storage
conditions. A non-exhaustive list of examples is provided below.

Ethanol preservation:
Although ethanol is a good preservative commonly used by field biologists to store tissue
samples, it does not guarantee preservation of high molecular weight DNA. It is sometimes
possible to estimate the likelihood of ‘DNA barcode-friendly’ preservation by visual inspection of
the samples (Figure 9). Factors that may contribute to DNA degradation include:

• Reagent quality (e.g., acidity and additives may damage DNA);
• Reagent concentration (elevated water content leads to DNA hydrolysis);

• Tissue/ethanol volume ratio (excessive tissue lowers ethanol concentration and increases
the concentration of autolytic enzymes and PCR inhibitors);
• Relative surface area of sample (relatively high volume hampers fixative penetration into
the tissue);
• Storage temperature (higher temperatures are likely to increase nuclease activity and other
agents destroying DNA);
• Exposure to light (may lead to DNA photolysis);
• Fixative evaporation (harmful if it leads to an increase in water concentration).
The effect of these factors can be mitigated if samples are stored in a cool place away from light,
ideally at freezing temperatures.
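The effect of the tissue/ethanol volume ratio on ethanol concentration can be approximated with simple dilution arithmetic. The sketch below rests on strong simplifying assumptions that are not taken from this manual: all water in the tissue mixes freely with the fixative, volumes are additive, and the tissue is about 75% water by volume (a hypothetical default). It is intended only to show the direction and rough size of the effect.

```python
def final_ethanol_fraction(ethanol_ml, tissue_ml, start_fraction=0.95,
                           tissue_water_fraction=0.75):
    """Estimate the ethanol fraction after tissue water dilutes the fixative.

    ethanol_ml: volume of fixative added.
    tissue_ml: volume of tissue placed in the fixative.
    start_fraction: initial ethanol concentration as a fraction (0.95 = 95%).
    tissue_water_fraction: assumed share of tissue volume that is water.
    """
    if ethanol_ml <= 0 or tissue_ml < 0:
        raise ValueError("volumes must be positive (ethanol) and non-negative (tissue)")
    water_from_tissue = tissue_ml * tissue_water_fraction
    return start_fraction * ethanol_ml / (ethanol_ml + water_from_tissue)
```

Under these assumptions, 1 ml of tissue in 9 ml of 95% ethanol gives roughly 88% ethanol, whereas equal volumes of tissue and fixative give roughly 54%, which illustrates why small tissue pieces in ample fixative are preferred.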

Figure 9. Examples of ethanol-preserved vertebrate muscle samples with various probabilities of DNA
barcode recovery. If the volume of sample is large, relative to fixative, if the sample has not been fragmented
into small pieces, or if the fixative is coloured and partially evaporated, the chances of recovering high
quality DNA barcode sequences are lower. The evaporation of fixative is not necessarily indicative of DNA
degradation, as long as the samples are completely desiccated (no water remains).

Dry (desiccated) preservation:


Desiccation is a good form of preservation of DNA as well as of certain morphological
characteristics and is often used, for example, in herbaria and in insect collections.
The following factors need to be considered to ensure that DNA is not just properly preserved
during field collecting but is maintained under conditions that preclude its degradation:

• Drying conditions (how long and how well)
• Pre-treatment procedures (skin tanning, insect relaxing) may have adverse effects
• Ambient humidity should be low
• Storage temperature should be as low as possible
• Exposure to light should be minimized
• Fumigants and preservatives used during preparation or storage (PDB, arsenic)


Figure 10. Examples of dry-preserved specimens: dried butterfly removed from a storage envelope (A) and
plant leaf sample in bag with desiccant (B). Whole insects can be preserved, either in envelopes or pinned.
Plants are typically dry-preserved on herbarium sheets; their tissue samples destined for molecular
analyses are often stored separately in sealed bags containing desiccant, such as silica gel. This provides
better protection against fluctuations in ambient humidity.

For more information on tissue sampling, see Annex V.


Chapter 2. Molecular Analysis

Set-up of Molecular Laboratory


Designing lab workflow
A properly designed lab workflow makes efficient use of space and resources and significantly
reduces biocontamination, which is a major limiting factor in molecular diagnostics. A
molecular research/diagnostic lab should ideally be separated into three isolated, but associated,
working areas/rooms (depending on the available space): reagent preparation and storage,
sample preparation (front-end processing), and molecular analysis (analytical space) (Figure
11). The two major components of the analytical area – DNA extraction and PCR amplification –
should be isolated by a barrier, and bidirectional flow of lab-ware should be discouraged.

[Diagram of lab areas]
Front-end processing: Sample receiving and sorting; Sample preparation; Reagent room
Molecular analysis (Analytical Area): DNA extraction; PCR; EtBr & UV exposure; DNA archival (-80°C); Sequencing; Hazardous waste; EXIT
Bioinformatics

Figure 11. Conceptual lab design for DNA barcoding.


The DNA room should be equipped with fridge and freezer for reagent storage, an incubator,
centrifuge, vortex, pipettes and shelves/cupboards to store consumables.
The PCR room should be equipped with freezer (-20°C) and fridge for amplicon and reagent
storage, a centrifuge, contained workstation (with UV light) for PCR preparation, thermocycler(s)
for amplification, dedicated pipettes, dedicated vortex, and a dedicated place to hang laboratory
coats. Lab workers should be encouraged to use lab-area-specified lab coats and gloves.


PCR verification by gel electrophoresis is a vital component of DNA barcoding and most other
biomolecular diagnostic labs. PCR product visualization in agarose gels generally employs
ethidium bromide (EtBr) as a dye. This agent binds with nucleic acid molecules and will fluoresce
when exposed to ultraviolet light, facilitating the detection of PCR products. EtBr is thought to act
as a mutagen and is classified “hazardous”. Allocation of isolated space, proximal to PCR station,
is generally desired to minimize EtBr contamination and exposure to ultraviolet light. The gel-
electrophoresis station should be provided with dedicated pipettes, dedicated lab coat and
dedicated UV-station, and dedicated hazardous-waste bin to minimize carryover of EtBr to other
parts of the lab.
For a list of equipment and consumables required in a medium/high-throughput molecular facility,
see Annex VI.

Storage of reagents
Proper storage will keep the chemicals/ reagents fit for use, minimize cross-contamination, and
reduce hazards for lab workers. Flammables should be stored in flammable storage cabinets/
purpose-built ventilated cabinets.
All flammable storage cabinets must be clearly labeled with signs – “Flammable, keep fire away”.
Chemical containers/bottles should be kept tightly closed when not in use.
Non-flammable liquid reagents and solids, like buffers and salts, are usually stored on shelves
at room temperature. However, care must be taken not to store incompatible chemicals, like acids
and bases, together. For long-term stability, some reagents/enzymes, such as Taq polymerase,
must be stored in a freezer (at -20°C or below). Likewise, refrigeration of working buffers,
particularly phosphates, will extend their life. All containers/bottles should be visibly labelled
with their contents. Containers of working solutions/buffers should be labelled with both the
contents and the preparation date. Reagents that have passed their specified shelf life should
be removed from the shelf and disposed of following the standard procedures of waste
management.

Emergency procedures
Material safety data sheets (MSDS) should be available at a visible site in the lab. If exposed to
a potentially hazardous chemical, consult the MSDS for the chemical you were working with and
follow the remedial measures. In case of skin contact, remove contaminated clothing and rinse
the affected skin with plenty of water for 15-20 min. For eye contact, rinse the eye(s) thoroughly
at the eye-wash station for 15-20 min, rolling the eyeballs around. In case of inhalation, move
into fresh air immediately. In all cases of exposure to hazardous materials, seek medical attention
if symptoms persist. In case of emergency, contact the campus police/emergency services.
A chemical spill or fire in the lab must be dealt with professionally. Spills, particularly
of flammable solvents, should be cleaned up immediately using non-flammable absorbents. Using
paper towels to soak up flammable liquids is not recommended. A fire extinguisher should be
readily available when working with flammables. If the fire alarm goes off, leave the building
immediately and call emergency services/police.


Good Laboratory Practices for a biomolecular/ DNA barcoding lab:


1. Lab (work area) structure/workflow:
o Separation among work stations assigned to different protocols/tasks
o Isolation of reagent storage area from sample preparation and analytical area
o Separation between sample preparation and analytical space
o Isolation between DNA extraction benches and PCR amplification space
o Use of DNA-free hood for preparation of PCR mixes and sequencing mixes
o Use of PCR hood for PCR preparation (transfer of DNA to PCR plates)
o Isolation of gel electrophoresis station and hazardous waste bins
o Allocation of sequencing-reaction preparation bench (hood)
o Allocation of dedicated freezer/ fridge for PCR (amplification) room
2. General guidelines for lab workers:
o Familiarize yourself with the laboratory equipment, lab-ware and chemicals
o Use lab coats to protect yourself (exposed parts) and the quality of your experiment
o Use close-toed shoes to avoid contact with accidental spills of chemical/biological
materials
o Use gloves for your safety and for the protection of experimental material from cross
contamination
o Wear safety glasses when working with volatile chemicals and reagents
o Use a nose mask for protection against inhalation of hazardous fumes and aerosols
o Use a chemical fume hood when working with toxic volatile materials (e.g. phenol,
chloroform, formaldehyde), flammables, or toxic substances that create aerosols
o Dispose of recyclable and non-recyclable waste in designated bins
o Dispose of hazardous waste in protective bags in specified bins
o Always clean the work space thoroughly before starting any procedure
3. Safety precautions:
o Personal safety: learn about proper handling of lab equipment and chemicals
o Take appropriate measures to preserve the quality and integrity of the experimental material
o Protect against potential chemical and non-chemical hazards
o Do not work bare-handed with hazardous material, such as ethidium bromide (EtBr),
ethanol, DNase etc.
o Understand the use of the fire extinguisher and the fire code
o Locate the eyewash station in the lab and learn how to use it in an emergency
o Secure flammables in a sturdy (metal) cupboard
o Know who to contact (phone) in case of emergency
o Know the location of first aid kit (in the lab)

DNA Extraction
DNA (deoxyribonucleic acid): The cell, the building block of living organisms, contains a unique
material, deoxyribonucleic acid (DNA), that is self-replicating and carries the genetic information.
Most of the DNA is located within the nucleus (nuclear DNA – nDNA), while a fraction is found in
mitochondria (mtDNA) and chloroplasts (cpDNA) and is also referred to as extranuclear or
cytoplasmic DNA. The complete DNA of an organism is called its genome. The genome is
compartmentalized into chromosomes, which vary in number among species. A gene is a distinct
sequence of nucleotides in the genome, the order of which determines the order of monomers in
a nucleic acid molecule or polypeptide. Nucleotides that code for proteins (exons) make up the
coding region (coding DNA sequences – CDS) and those that do not code for proteins (introns)
belong to the non-coding region of the genome. Each nucleotide is composed of a nitrogenous
base, a sugar and a phosphate. Four types of nitrogenous bases, Adenine (A), Thymine (T),
Guanine (G), and Cytosine (C), define the nucleotides, which are organized in a double helix,
like a twisted ladder, with phosphates and sugars making the backbone of the ladder. With two
fused rings containing four nitrogen atoms, A and G are classed as purines, while with one ring
containing two nitrogen atoms, T and C are classed as pyrimidines.
DNA extraction is the process of separating DNA, present in the nucleus (nuclear DNA –
nDNA), mitochondria (mitochondrial DNA – mtDNA) or chloroplasts (chloroplast DNA – cpDNA),
from other cellular components. Since the discovery of DNA, a range of methods has been
practiced and a multitude of protocols have been published for its isolation. Isolated DNA is used
in a range of applications, such as genetic analysis, identification of individuals/species, gene
transformation and forensic analysis. Each application may require a certain level of DNA
purity/integrity. In principle, DNA extraction involves two main steps: i) lysis, which releases DNA
from the cell by disrupting the nuclear and cell membranes and/or the cell wall (in plants), and
ii) isolation, which separates the nucleic acid from the other cellular contents. The choice of a
DNA extraction method generally depends on the organism, source, age, and size of the sample.
For example, plants have a cell wall, which animals lack, and this may require additional
chemicals and modified protocols for lysis. The intended use of the DNA is another consideration
when selecting an extraction method. For instance, DNA microarray analysis may require a
higher level of DNA purity than PCR, and thus more stringent purification steps. Cost may also
be a deciding factor, as one protocol may be more cost-efficient than another even though both
yield a similar amount and quality of DNA.

Conventional methods
The conventional methods of DNA extraction are mainly based on chemical extraction. In general,
they are laborious and may require specialized laboratory bench space because hazardous
chemicals are involved. In principle, these methods rely on salting out the proteins by
precipitation to separate them from the nucleic acid (DNA/RNA). Most often additional chemicals,
such as phenol/chloroform, are used to denature the proteins and purify the DNA. The most
common method is phenol/chloroform extraction. This method involves mixing the lysate with
phenol-chloroform and separating the nucleic acid from protein/cell debris by centrifugation.
Proteins are denatured and move with the phenol into the organic phase, while DNA remains in
the aqueous phase, from which it is then precipitated with alcohol. The phenol/chloroform
method is commonly used because it produces high-molecular-weight DNA, which is desired for
construction of genomic libraries, and because of its lower cost per sample. Although phenol
alone denatures proteins efficiently, it does not fully inhibit RNase activity. Combining
chloroform with phenol overcomes RNases, and a subsequent treatment with chloroform alone
removes traces of phenol from the preparation. A range of protocols is available for all sorts of
living organisms and tissue types. An example is provided below.
Phenol/chloroform extraction for animal tissue (adapted from Green MR, Sambrook J (2012).
Molecular Cloning – A Laboratory Manual. Cold Spring Harbor Laboratory Press)
1. Transfer the sample (ground mixture - lysate) to a polypropylene tube and mix by shaking
with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1).
(Isoamyl alcohol reduces foaming and improves separation between the aqueous and
organic phases.)
2. Centrifuge the mixture (@12,000 x g) for 3-5 min to separate the aqueous and organic
phases.
3. Transfer (using a pipette) the aqueous phase (upper layer) to a fresh tube. Discard the
organic phase (lower layer).
4. Repeat steps 1 – 3 until no protein is visible at the interface of the organic and aqueous
phases.
5. Add an equal volume of chloroform, and repeat steps 2 – 3.
6. Add 2 volumes of 95% ethanol, chill at -20°C for at least 30 min, and pellet the DNA by
centrifugation (30 min @12,000 x g).
7. Wash the DNA pellet with 70% ethanol, air dry, and re-suspend in deionized water. Store
the purified DNA at -80°C until use.

Advanced methods (silica-based)


The advanced methods are based on binding DNA molecules to silica surfaces in the presence
of certain salts and under certain pH conditions. In contrast to conventional methods, silica-based
DNA purification requires fewer hazardous chemicals. In general, silica-based extraction is
conducted in four steps:
1. Passing cell lysate through a silica membrane in the presence of high-salt buffers – the silica
holds the nucleic acid molecules.
2. Washing away proteins and other cellular debris using specialized buffers.
3. Washing away other impurities, such as remnants of buffer salts, with a high volume of
ethanol-based wash buffers.
4. Eluting the purified DNA from the silica membrane into a collection tube.
There is a large variety of commercial kits for silica-based DNA extraction. Each kit comes with
spin-columns (containing the silica membrane), reagents and instructions to be followed.
An example of a silica-based DNA extraction protocol using a commercial kit (modified from
Qiagen DNeasy® Blood and Tissue Kit) is described below:

1. Cut tissue (up to 25 mg) into small pieces, and place in 1.5 ml microcentrifuge tube. Add
180 μl Buffer ATL.
2. Add 20 μl proteinase K. Mix by vortexing and incubate at 56°C until completely lysed.
Vortex occasionally during incubation (or place in a thermomixer, in a shaking water bath,
or on a rocking platform). Lysis is usually complete in 1–3 h (samples can be lysed
overnight).
3. Vortex for 15 s. Add 200 μl Buffer AL to the sample. Mix thoroughly by vortexing. Then
add 200 μl ethanol (96–100%). Mix again thoroughly. Alternatively, premix Buffer AL and
ethanol, and add together.
4. Pipet the mixture into a DNeasy Mini spin column in a 2 ml collection tube. Centrifuge at
6,000 x g (8,000 rpm) for 1 min. Discard flow-through and collection tube.
5. Place the spin column in a new 2 ml collection tube. Add 500 μl Buffer AW1. Centrifuge
for 1 min at 6,000 x g (8,000 rpm). Discard flow-through and collection tube.


6. Place the spin column in a new 2 ml collection tube. Add 500 μl Buffer AW2. Centrifuge
for 3 min at 20,000 x g (14,000 rpm). Discard flow-through and collection tube. Remove
the spin column carefully so that it does not come into contact with the flow-through.
7. Transfer the spin column to a new 1.5 ml or 2 ml microcentrifuge tube and add 200 μl
Buffer AE for elution (directly on the DNeasy membrane). Incubate for 1 min at room
temperature. Centrifuge for 1 min at 6,000 x g. Recommended: Repeat this step for
maximum yield.
Alternatively, spin-columns can be bought separately and used in a DNA extraction protocol with
home-made reagents (see Annex VII and VIII).
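
The protocol above quotes each spin both as a relative centrifugal force (x g) and as a rotor speed (rpm). The two are only equivalent for a particular rotor radius, so on a different centrifuge the rpm setting has to be recalculated. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r (cm) x rpm^2. The function names and the 8.4 cm radius are illustrative assumptions, not values from the kit handbook; check the rotor radius of your own instrument.

```python
import math

# Standard conversion constant for radius in cm and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rpm_to_rcf(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


def rcf_to_rpm(rcf, rotor_radius_cm):
    """Rotor speed (rpm) needed to reach a given force (x g)."""
    return math.sqrt(rcf / (RCF_CONSTANT * rotor_radius_cm))


if __name__ == "__main__":
    radius_cm = 8.4  # assumed rotor radius, for illustration only
    for target_g in (6000, 20000):
        rpm = rcf_to_rpm(target_g, radius_cm)
        print(f"{target_g} x g needs about {rpm:.0f} rpm at r = {radius_cm} cm")
```

With a radius of about 8.4 cm, 8,000 rpm comes out close to the 6,000 x g quoted in steps 4 and 5.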

DNA quantification
Although purified DNA has many uses, in DNA barcoding it is mainly used for PCR analysis. The
use of established protocols, experienced hands, reliable and fresh samples, and an optimized
sample size almost guarantees good DNA recovery. Validating that DNA extraction was
successful is a good idea, but the DNA estimation step may also be skipped. Where
quantification is needed, the following methods are commonly used:
1. Ethidium bromide stained gels: Query DNA extracts are electrophoresed on an agarose gel
in parallel with DNA of known amount. The gel is stained with ethidium bromide (EtBr),
illuminated with ultraviolet (UV) light, and bands of query DNA are then compared with
those of the known DNA to estimate the DNA amount in the query sample. The analysis is
facilitated by capturing the image with a CCD camera and using software to measure band
intensity.
2. Real-time PCR: Real-time PCR measures the amount of double-stranded DNA at
each cycle with the help of a fluorescent dye that is incorporated in the reaction. The
change in fluorescence intensity is recorded by the instrument's camera, and the amount
of DNA is then calculated by dedicated software.
3. Spectrophotometry: The method involves measuring the absorbance of the sample at
260 nm on a spectrophotometer. Nucleic acids absorb UV light in a specific pattern. When
a DNA sample is exposed to UV light at 260 nm, a photo-detector on the other side
measures the light passing through the sample. A higher DNA concentration absorbs more
light, producing a higher optical density. The absorbance reading is converted to DNA
concentration through a formula (for double-stranded DNA, an absorbance of 1.0 at 260 nm
over a 1 cm light path corresponds to about 50 ng/µl).
4. PicoGreen: This method measures fluorescent intensity of PicoGreen (dye) in reference
to a set of standards. The fluorescent intensity is increased when the dye binds to the
dsDNA, which is read by spectrofluorometer, and is then translated to the amount of DNA
in the sample.
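
As a worked example of the spectrophotometric conversion in item 3, the sketch below applies the widely used rule that an absorbance of 1.0 at 260 nm (1 cm path) corresponds to roughly 50 ng/µl of double-stranded DNA. The function names, the dilution factor parameter and the A260/A280 purity check are illustrative additions, not part of the manual's protocols.

```python
# Conventional conversion factor for double-stranded DNA:
# A260 of 1.0 over a 1 cm path is approximately 50 ng/ul.
NG_PER_UL_PER_A260_DSDNA = 50.0


def dsdna_concentration(a260, dilution_factor=1.0, path_length_cm=1.0):
    """Estimate dsDNA concentration (ng/ul) from an A260 reading."""
    return a260 / path_length_cm * NG_PER_UL_PER_A260_DSDNA * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are commonly taken to indicate clean DNA."""
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical reading: sample diluted 1:10, A260 = 0.5, A280 = 0.27
    print(dsdna_concentration(0.5, dilution_factor=10), "ng/ul")
    print(round(purity_ratio(0.5, 0.27), 2), "A260/A280")
```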

DNA preservation
Several factors may contribute to the degradation of purified DNA, but three are most common:
contaminants and nucleases (DNases), acid hydrolysis (due to absence of buffering capacity in
the storage solution), and large variations in temperature (frequent freeze-thaw cycles).
Follow these guidelines to limit DNA degradation and store DNA effectively.
1. Use gloves and clean bench space to avoid contamination with DNases.
2. Elute/dissolve the DNA in a buffer containing a chelating agent, for example TE buffer
(10 mM Tris pH 8.0, 1 mM EDTA).
3. Store at low temperature and avoid frequent freeze-thaw cycles. One way to avoid
freeze-thaw cycles is to aliquot extracted DNA into multiple vials.

As a general guideline for DNA storage, use the following scheme.

i. Short-term (a few weeks): dissolve in TE and store at 4°C.
ii. Medium-term (a few months): dissolve in TE and store at –80°C.
iii. Long-term (a few years): precipitate in ethanol and store at –80°C.
iv. Long-term (many years): dry and store at –160°C.

Polymerase Chain Reaction (PCR)


The polymerase chain reaction (PCR) is used to amplify a selected target sequence within the
genome of an organism. The procedure selectively amplifies the target DNA region in a
quasi-exponential chain reaction, generating millions of copies. PCR uses temperature cycling to
perform polymerase-catalyzed DNA synthesis from the template in the presence of
deoxynucleotide triphosphates (dNTPs), oligonucleotide primers, and a suitable buffer. In
general, each cycle of PCR consists of three stages: denaturation of the template DNA (at >
90°C), annealing of the oligonucleotides (primers) to the template, and extension, which
completes the synthesis of the target DNA (Figure 12). The temperature cycling is performed
in a thermocycler (PCR machine), a programmable device that controls the temperature and
duration of each step. PCR has broad applications in molecular biology, including DNA
sequencing, genotyping, mutation detection, DNA cloning, and in vitro mutagenesis.

[Figure 12 diagram. A gene with the barcode region marked on the double-stranded template
(5’-3’ / 3’-5’), followed by three panels: Denaturation (94-98°C), Annealing (45-68°C) and
Extension (65-75°C).]

Figure 12. Schematic representation of COI amplification. The two DNA strands are coloured in blue and
green, while the primers are in red and orange.


In a standard PCR procedure, the target gene region is amplified from double-stranded DNA
(dsDNA – template) through a series of temperature cycles in a thermocycler. The template is
mixed with DNA polymerase, dNTPs, primers, MgCl2 and PCR buffer. Each cycle is completed in
three steps: denaturation, annealing, and extension. In step one (denaturation at 94°C or
above), dsDNA is separated into two single strands, forward and reverse. In step two (annealing
at 45°C – 68°C), the primers bind to the complementary sequence regions of the target DNA,
providing the starting points for DNA synthesis by the polymerase. In step three (extension at
65°C – 75°C), the polymerase extends the new DNA strand by incorporating dNTPs available in
the reaction. In principle, the amount of target DNA doubles with each cycle, producing millions
of copies of the target DNA in just over 25 cycles.
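
The doubling described above can be checked with a few lines of arithmetic. The sketch below assumes an idealized reaction in which the copy number after n cycles is N0 x (1 + E)^n, where E is the per-cycle efficiency (1.0 means perfect doubling). The function names and the efficiency parameter are illustrative assumptions; real reactions plateau in later cycles, which this simple model ignores.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles."""
    return start_copies * (1.0 + efficiency) ** cycles


def cycles_to_reach(target_copies, start_copies=1, efficiency=1.0):
    """Smallest whole number of cycles giving at least target_copies."""
    return math.ceil(math.log(target_copies / start_copies) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    print(f"{copies_after(25):,.0f} copies after 25 cycles at perfect efficiency")
    print(f"{copies_after(25, efficiency=0.9):,.0f} copies after 25 cycles at 90% efficiency")
    print(cycles_to_reach(1e6), "cycles to pass one million copies from a single template")
```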

PCR Primers
Primers are short chains of nucleotides (17-30 nucleotides long), also called oligonucleotides,
that are artificially synthesized and serve as the initiation point for amplification and synthesis
of the target DNA in the genome. Since DNA in nature exists as a double strand, a pair of
primers is used simultaneously for the synthesis of both the forward and the reverse strands.
Primers are the key element in PCR amplification of the target gene marker.
Successful primers possess a nucleotide sequence that is highly complementary to the target
region of the DNA. High complementarity with the target reduces the possibility of
mishybridization to a similar sequence elsewhere in the genome. The two primers in a pair must
have similar melting temperatures so that both anneal efficiently at the same annealing
temperature. A low tendency to form dimers and hairpins is another quality to consider.
The most common problems in PCR amplification arise from:

• Dimerization – primers hybridize with each other (instead of hybridizing with the target
DNA) generating short amplicons, the primer dimers;
• Hairpin formation – two ends of the same primer possess complementary nucleotides and
hybridize producing a hairpin like structure compromising their efficiency of hybridizing
with the target template.
Primers with a GC content of 40-60% are generally more efficient than those with a low %GC.
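
Two of the primer properties discussed above, GC content and melting temperature, are easy to estimate by hand or in code. The sketch below computes %GC and a rough melting temperature using the Wallace rule (2°C per A/T plus 4°C per G/C), a rule of thumb intended for short oligonucleotides; primer-design software uses more accurate nearest-neighbour models. The function names are illustrative, and the primer sequences are the Folmer primers listed later in this chapter.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100.0 * gc / len(primer)


def wallace_tm(primer):
    """Rough melting temperature (deg C): 2 x (A+T) + 4 x (G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primers = {
        "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
        "HCO2198": "TAAACTTCAGGGTGACCAAAAAATCA",
    }
    for name, seq in primers.items():
        print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.1f}%, Tm ~{wallace_tm(seq)} C")
```

Note that both Folmer primers fall below 40% GC, so the 40-60% range is a guideline rather than a strict requirement.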

Species-specific primers

Designing primers for the target gene region with known sequence is straightforward. Sequence
of the published DNA may be obtained from GenBank or other public sources and can be used
as template to design primers, manually or by using software. Primers that are based on sequence
from a single species, and will most likely amplify the target DNA only from that species, are
species-specific primers.

Universal and degenerate primers

Universal primers may be designed from a gene region that is highly conserved among most
species and will amplify the target from most species. Degenerate primers, on the other hand,
are designed by locating the nucleotide sites that frequently vary among species and placing a
degenerate base at each variable site in the primer sequence (Figure 13). Degeneracy is
incorporated by combining oligonucleotide sequences in which the variable base sites are altered
in such a way that the primer mix covers all the possible base combinations for those sites.
Codes for the different base combinations follow the universally accepted IUPAC (International
Union of Pure and Applied Chemistry) codes (Table 2). For example, for A or G the code is R
and for C or T it is Y. Likewise, where any of the four bases (A, G, T, C) may occur, the code
is N.
[Figure 13 diagram. A degenerate forward primer (printed as HACWWTATAYTTYATTTTYGGWATY)
aligned above ten forward-strand sequences (5' to 3'), and a degenerate reverse primer (printed
as CHATRYTHYTWACWGAYCGWAATH) aligned above ten reverse-strand sequences (3' to 5').]

Figure 13. Degenerate primers design.

Table 2. IUPAC codes for nucleotides.

IUPAC nucleotide code    Base
A                        Adenine
C                        Cytosine
G                        Guanine
T (or U)                 Thymine (or Uracil)
R                        A or G
Y                        C or T
S                        G or C
W                        A or T
K                        G or T
M                        A or C
B                        C or G or T
D                        A or G or T
H                        A or C or T
V                        A or C or G
N                        Any base
. or -                   Gap
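
To make the meaning of a degenerate primer concrete, the sketch below expands an IUPAC-coded sequence into every plain sequence it represents and counts them (the primer's degeneracy). The dictionary follows Table 2; the function names are illustrative. The example sequence is the target-specific part of the VF1d_t1 primer listed later in this chapter, which contains one R and two Y codes.

```python
from itertools import product

# IUPAC nucleotide codes (Table 2), gap symbols omitted.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every plain sequence covered by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    vf1d = "TTCTCAACCAACCACAARGAYATYGG"
    print(degeneracy(vf1d), "sequences in the primer mix")
    for seq in expand(vf1d):
        print(seq)
```

Degeneracy grows multiplicatively with each ambiguous site, so a primer with many N positions represents a very dilute mix of each individual sequence.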


Primer cocktail

Primer cocktails are commonly used when barcoding a mix of species. A primer cocktail is a
mixture of two to three universal primers that enhances the probability of target-DNA
amplification by addressing the nucleotide variation. For example, the following forward and
reverse primers, which vary at a few nucleotide positions (positions 1, 2 and 8 in the forward
pair and positions 9, 12 and 15 in the reverse pair, counted from the 5′-end), are mixed
together to broaden the range of target species amplified in a single PCR run.
Cocktail of forward primer C_LepFolF is prepared by mixing LepF1 and LCO1490 in equal
volumes:
LepF1: ATTCAACCAATCATAAAGATATTGG
LCO1490: GGTCAACAAATCATAAAGATATTGG
Cocktail of reverse primer C_LepFolR is prepared by mixing LepR1 and HCO2198 in equal
volumes:
LepR1: TAAACTTCTGGATGTCCAAAAAATCA
HCO2198: TAAACTTCAGGGTGACCAAAAAATCA
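
The positions at which the two primers of each cocktail differ can be found by comparing them base by base. The short sketch below does this for the sequences given above; the function name is illustrative, and positions are counted from 1 at the 5′-end.

```python
def differing_positions(primer_a, primer_b):
    """1-based positions at which two equal-length primers differ."""
    if len(primer_a) != len(primer_b):
        raise ValueError("primers must have the same length")
    return [
        i
        for i, (a, b) in enumerate(zip(primer_a.upper(), primer_b.upper()), start=1)
        if a != b
    ]


if __name__ == "__main__":
    lepf1 = "ATTCAACCAATCATAAAGATATTGG"
    lco1490 = "GGTCAACAAATCATAAAGATATTGG"
    lepr1 = "TAAACTTCTGGATGTCCAAAAAATCA"
    hco2198 = "TAAACTTCAGGGTGACCAAAAAATCA"
    print("Forward cocktail differs at:", differing_positions(lepf1, lco1490))
    print("Reverse cocktail differs at:", differing_positions(lepr1, hco2198))
```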

Tailed primers

A tailed primer contains extra nucleotides at the 5′-end that are not complementary to the target
gene sequence. The 3′-end of the tailed primer anneals to the template to amplify the target
region, while the non-complementary 5′-end adds a “tail” of additional nucleotides to the PCR
product. The tail is added for various purposes, such as incorporation of endonuclease sites,
plasmid cloning, or to facilitate sequencing. In DNA barcoding, tailed primers are mainly used to
facilitate sequencing. Generally, the primers used in the PCR reaction are also used in cycle
sequencing. However, when DNA from multiple taxa arrayed in a single microplate is amplified
with universal/degenerate primers, sequencing with those same degenerate primers may be less
successful. Tailing the degenerate primer sequence with an established universal primer, such
as M13, allows the primer tail to be used for cycle sequencing.
Here is an example of forward and reverse primers where LepF1 is tailed with M13-forward (the
first 18 nucleotides) and LepR1 with M13-reverse (the first 17 nucleotides).
LepF1_t1: 5′ TGTAAAACGACGGCCAGTATTCAACCAATCATAAAGATATTGG 3′
LepR1_t1: 5′ CAGGAAACAGCTATGACTAAACTTCTGGATGTCCAAAAAATCA 3′
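
Building a tailed primer is a simple concatenation: the tail is placed at the 5′-end of the target-specific primer. The sketch below reproduces the two tailed primers shown above from their parts and shows how the tail can be removed again, for instance when checking a primer list. The function names are illustrative.

```python
M13F = "TGTAAAACGACGGCCAGT"
M13R = "CAGGAAACAGCTATGAC"


def add_tail(tail, primer):
    """Attach a tail to the 5' end of a primer (both written 5' to 3')."""
    return tail + primer


def strip_tail(tailed_primer, tail):
    """Return the target-specific part of a tailed primer."""
    if not tailed_primer.startswith(tail):
        raise ValueError("primer does not start with the given tail")
    return tailed_primer[len(tail):]


if __name__ == "__main__":
    lepf1_t1 = add_tail(M13F, "ATTCAACCAATCATAAAGATATTGG")
    lepr1_t1 = add_tail(M13R, "TAAACTTCTGGATGTCCAAAAAATCA")
    print("LepF1_t1:", lepf1_t1)
    print("LepR1_t1:", lepr1_t1)
    print("Target-specific part:", strip_tail(lepf1_t1, M13F))
```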

A list of the most common primers used in DNA barcoding of animals, plants and fungi is provided
below (references are detailed in Annex VIII).


Primer sets for animals


Name       Ratio  Cocktail name / Primer sequence 5’-3’             Reference                     Taxonomic groups

                  Folmer primers                                    Folmer et al. 1994            Various phyla
LCO1490           GGTCAACAAATCATAAAGATATTGG
HCO2198           TAAACTTCAGGGTGACCAAAAAATCA

                  Lepidoptera primers                                                             Insects, amphibians
LepF1             ATTCAACCAATCATAAAGATATTGG                         Hebert et al. 2004a
LepR1             TAAACTTCTGGATGTCCAAAAAATCA                        Hebert et al. 2004a
MLepF1            GCTTTCCCACGAATAAATAATA (use with LepR1)           Hajibabaei et al. 2006
MLepR1            CCTGTTCCAGCTCCATTTTC (use with LepF1)             Hajibabaei et al. 2006

                  Bird primers                                      Hebert et al. 2004b           Birds
BirdF1            TTCTCCAACCACAAAGACATTGGCAC
BirdR1            ACGTGGGAGATAATTCCAAATCCTGG

                  C_LepFolF – C_LepFolR                             Hernández-Triana et al. 2014  Various invertebrates
LepF1      1      ATTCAACCAATCATAAAGATATTGG
LCO1490    1      GGTCAACAAATCATAAAGATATTGG
LepR1      1      TAAACTTCTGGATGTCCAAAAAATCA
HCO2198    1      TAAACTTCAGGGTGACCAAAAAATCA

                  C_VF1LFt1 – C_VR1LRt1 (Mammal cocktail)           Ivanova et al. 2007           Mammals, reptiles, fish, amphibians
LepF1_t1   1      TGTAAAACGACGGCCAGTATTCAACCAATCATAAAGATATTGG
VF1_t1     1      TGTAAAACGACGGCCAGTTCTCAACCAACCACAAAGACATTGG
VF1d_t1    1      TGTAAAACGACGGCCAGTTCTCAACCAACCACAARGAYATYGG
VF1i_t1    3      TGTAAAACGACGGCCAGTTCTCAACCAACCAIAAIGAIATIGG
LepR1_t1   1      CAGGAAACAGCTATGACTAAACTTCTGGATGTCCAAAAAATCA
VR1d_t1    1      CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCRAARAAYCA
VR1_t1     1      CAGGAAACAGCTATGACTAGACTTCTGGGTGGCCAAAGAATCA
VR1i_t1    3      CAGGAAACAGCTATGACTAGACTTCTGGGTGICCIAAIAAICA

                  C_FishF1t1 – C_FishR1t1 (Fish cocktail)           Ivanova et al. 2007           Fish, mammals
VF2_t1     1      TGTAAAACGACGGCCAGTCAACCAACCACAAAGACATTGGCAC
FishF2_t1  1      TGTAAAACGACGGCCAGTCGACTAATCATAAAGATATCGGCAC
FishR2_t1  1      CAGGAAACAGCTATGACACTTCAGGGTGACCGAAGAATCAGAA
FR1d_t1    1      CAGGAAACAGCTATGACACCTCAGGGTGTCCGAARAAYCARAA

                  Sequencing primers for M13-tailed PCR products    Messing, 1983
M13F              TGTAAAACGACGGCCAGT
M13R              CAGGAAACAGCTATGAC

Note: primers ending in ‘t1’ are M13-tailed (PCR products are sequenced only with M13 primers).


Primer sets for plants and fungi


Name            Primer sequence 5’-3’          Taxonomic group            Reference

rbcL primers                                   Vascular plants
rbcLa-F         ATGTCACCACAAACAGAGACTAAAGC                                Levin et al. 2003
rbcLa-R         GTAAAATCAAGTCCACCRCG                                      Kress & Erickson, 2009
rbcLajf634R     GAAACGGTCTCTCCAACGCAT                                     Fazekas, 2008

matK primers                                   Flowering plants
MatK-1RKIM-f    ACCCAGTCCATCTGGAAATCTTGGTTC                               Ki-Joong Kim, unpub.
MatK-3FKIM-r    CGTACAGTACTTTTGTGTTTACGAG                                 Ki-Joong Kim, unpub.
matK_xF         TAATTTACGATCAATTCATTC                                     Ford et al. 2009
matK-MALPR1     ACAAGAAAGTCGAAGTAT                                        Dunning & Savolainen, 2010
MatK_390f       CGATCTATTCATTCAATATTTC                                    Cuenoud et al. 2002
MatK_1326r      TCTAGCACACGAAAGTCGAAGT                                    Cuenoud et al. 2002

psbA-trnH primers                              Vascular plants
psbA3_f         GTTATGCATGAACGTAATGCTC                                    Sang et al. 1997
trnHf_05        CGCGCATGGTGGATTCACAATCC                                   Tate & Simpson, 2003

ITS2 primers                                   Vascular plants
ITS_S2F         ATGCGATACTTGGTGTGAAT                                      Chen et al. 2010
ITS_S3R         GACGCTTCTCCAGACTACAAT                                     Chen et al. 2010

ITS primers                                    Fungi
ITS5            GGAAGTAAAAGTCGTAACAAGG         Ascomycetes                White et al. 1990
ITS4            TCCTCCGCTTATTGATATGC                                      White et al. 1990
ITS1-F          CTTGGTCATTTAGAGGAAGTAA         Basidiomycetes             Gardes and Bruns, 1993
ITS4-B          CAGGAGACTTGTACACGGTCCAG                                   Gardes and Bruns, 1993
ITS1            TCCGTAGGTGAACCTGCGG            Internal for ascomycetes   White et al. 1990

LSU primers                                    Fungi
LR0R            ACCCGCTGAACTTAAGC                                         Vilgalys & Hester, 1990
LR5             TCCTGAGGGAAACTTCG                                         Vilgalys & Hester, 1990

PCR Protocol
Adding the PCR primers and DNA (template) to the PCR mix completes the set of ingredients
required for DNA synthesis through thermocycling. To avoid contamination and undesired
DNA amplification, this task should be performed on a sterilized (DNA-free) bench or under a
dedicated glass hood equipped with UV light for surface sterilization. Sterilization must be
accomplished by ELIMINase treatment, then ethanol wiping, and finally UV exposure
(if available). New gloves should be used when preparing PCR reactions, and lab supplies should
be organized within arm’s length before transferring DNA template to the PCR mix.
A large variety of PCR recipes is used in different molecular labs. The recipe below is
adapted from the Canadian Centre for DNA Barcoding protocols (ccdb.ca) for a 12.5 µL reaction
volume. Trehalose is used to facilitate PCR and to allow freezing of aliquoted PCR master-mixes
(made ahead of time and frozen until required for processing). A high-fidelity Taq polymerase,
although usually more expensive, would require less optimization than standard Taq,
therefore saving time and giving higher chances of success.
Reagent                      Volume per reaction (µL)
10% trehalose for PCR        6.25
ddH2O                        2
10X PCR buffer               1.25
50 mM MgCl2                  0.625
10 mM dNTPs                  0.0625
10 µM forward primer         0.125
10 µM reverse primer         0.125
Taq polymerase (5 U/µL)      0.06
Total (master mix)           10.5
DNA template                 2 (per well)

For medium/high-throughput processing and detailed instructions of PCR handling, see Annex
VIII.
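
When a full plate is processed, the per-reaction volumes in the recipe above are multiplied up into a single master mix. The sketch below scales the recipe for a given number of reactions. The 10% overage used to compensate for pipetting loss is a common lab habit but an assumption here, as are the function and variable names; the CCDB protocols referenced above may specify their own plate volumes.

```python
# Per-reaction volumes (ul) from the recipe table; template is added per well.
RECIPE_UL = {
    "10% trehalose": 6.25,
    "ddH2O": 2.0,
    "10X PCR buffer": 1.25,
    "50 mM MgCl2": 0.625,
    "10 mM dNTPs": 0.0625,
    "10 uM forward primer": 0.125,
    "10 uM reverse primer": 0.125,
    "Taq polymerase (5 U/ul)": 0.06,
}
TEMPLATE_UL = 2.0


def master_mix(n_reactions, overage=0.10):
    """Reagent volumes (ul) for n reactions plus a fractional overage."""
    factor = n_reactions * (1.0 + overage)
    return {reagent: round(volume * factor, 2) for reagent, volume in RECIPE_UL.items()}


if __name__ == "__main__":
    per_reaction = sum(RECIPE_UL.values())
    print(f"Mix per well: {per_reaction:.4f} ul, plus {TEMPLATE_UL} ul template")
    for reagent, volume in master_mix(96).items():
        print(f"{reagent:<26}{volume:>9.2f} ul")
```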

Gel electrophoresis

Before proceeding to cycle sequencing, the PCR amplification of the target DNA is generally
verified by agarose gel electrophoresis. A small amount (~4 µl) of PCR product is loaded onto
an ethidium bromide (EtBr)-stained gel, electrophoresed at low voltage (60-70 volts) for a set
time and then visualized by exposing the gel to UV light. EtBr bound to the DNA molecules
fluoresces under UV light and indicates the location of the amplified PCR product on the gel.
Agarose gel may be prepared in lab or obtained commercially. For example, 2% agarose gel may
be prepared by dissolving (by heat or microwave) 2 g of agarose in 100 ml of electrophoresis
buffer (TAE or TBE). EtBr may be added (0.5 µg/ml) to the gel at the time of preparation or the
gel may be stained, after electrophoresis, by soaking in EtBr solution.
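
The quantities in the gel recipe above scale with gel volume. The sketch below calculates the agarose mass for a given percentage (weight/volume) and the volume of EtBr stock needed to reach 0.5 µg/ml. The 10 mg/ml stock concentration is an assumption (a commonly sold strength), as are the function names; check the concentration printed on your own stock bottle.

```python
def agarose_grams(percent_w_v, buffer_ml):
    """Agarose mass (g) for a gel of the given percentage and volume."""
    return percent_w_v / 100.0 * buffer_ml


def etbr_stock_ul(gel_ml, final_ug_per_ml=0.5, stock_mg_per_ml=10.0):
    """Volume of EtBr stock (ul) for the desired final concentration.

    A stock of 1 mg/ml equals 1 ug/ul, so total ug divided by the stock
    concentration in mg/ml gives the volume in ul.
    """
    total_ug = gel_ml * final_ug_per_ml
    return total_ug / stock_mg_per_ml


if __name__ == "__main__":
    volume_ml = 100
    print(agarose_grams(2, volume_ml), "g agarose for a 2% gel")
    print(etbr_stock_ul(volume_ml), "ul of 10 mg/ml EtBr stock (assumed stock strength)")
```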
Electronic gel (E-gel) is another option that provides a user-friendly alternative to lab-made
gels; pre-cast gels are also available commercially. PCR verification in an E-gel is accomplished
without electrophoresis buffer, which also minimizes the chances of EtBr contamination.
Ready-made E-gels contain EtBr and come with multiple options for sample capacity, including
the 96-well plate format. See details in Annex VIII.


Cycle Sequencing
Cycle sequencing is very similar to PCR, as it uses DNA polymerase and free nucleotides to
generate copies of template DNA in a temperature-cycling format, but there are two core
differences between the two. First, cycle sequencing employs only one primer to synthesize
copies of only one strand of the double-helix DNA, and these copies cannot be used as a
template in later cycles. This means the amplification in cycle sequencing is linear, not
exponential, so sufficient copies of the original template DNA are required for the products to
be detected by sequencing equipment. Second, in addition to deoxynucleotide triphosphates
(dNTPs), cycle sequencing also uses dideoxynucleotide triphosphates (ddNTPs). During
extension, ddNTPs are incorporated at random into the growing strand and cause base-specific
termination of extension. This process generates partial copies of the single target strand, each
terminated at a specific nucleotide site. Whereas PCR synthesizes a double-stranded product
using two primers (forward and reverse) in a single run, cycle sequencing synthesizes two
single-stranded products in two separate reactions, using one primer in each (Figure 14).
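
The idea of base-specific termination can be illustrated with a toy model. In the sketch below, every possible terminated product of a newly synthesized strand is generated (one fragment ending at each position); sorting the fragments by length, as capillary electrophoresis does, and reading the last base of each recovers the sequence. This is a conceptual illustration only: the function names are invented for the example, and real reactions produce fragment populations of varying abundance with dye-labelled terminators.

```python
def terminated_fragments(new_strand):
    """All chain-terminated products: one fragment ending at each position."""
    return [new_strand[:end] for end in range(1, len(new_strand) + 1)]


def read_sequence(fragments):
    """Order fragments by length and read the terminal base of each."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


if __name__ == "__main__":
    strand = "ATTCAACCAATC"  # first bases of LepF1, used as a toy example
    fragments = terminated_fragments(strand)
    print(len(fragments), "fragments, from", fragments[0], "to", fragments[-1])
    # The fragments arrive unordered in the tube; size separation restores the order.
    print("Sequence read from sorted fragments:", read_sequence(reversed(fragments)))
```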
After cycle sequencing is complete, the product is ready for Sanger sequencing in an
automated sequencer (usually provided by a sequencing facility as a fee-based service). See
Annex VIII for a 96-well-based protocol for cycle sequencing. In many cases, PCR products are
sent to sequencing facilities before cycle sequencing, and the facility performs this step.
Why is cycle sequencing needed?
1. The amount of DNA template necessary for the sequencing reaction is significantly
reduced.
2. Smaller amount of template introduces fewer impurities in the sequencing reaction.
3. No additional denaturation step is required at sequencing stage since PCR product has
gone through multiple heat-denaturation steps in sequencing reactions.

Figure 14. Schematic representation of cycle sequencing.


Chapter 3. Informatics and Data Analysis

Sequence Editing
A chromatogram (electropherogram or trace file) is the visual representation of a sequenced DNA
sample and is generated by a sequencer. The common file formats for chromatograms are ABI
and SCF. The ABI format is a binary file created by ABI sequencer software, while SCF
(standard chromatogram format) may be created by other sequencers (e.g., Beckman, Li-Cor).
PCR products are usually sequenced bi-directionally (forward reaction and reverse reaction) in
order to cover the entire length of the desired genetic fragment. Forward and reverse traces are
assembled into one contig (one sequence) and edited to fix ambiguous bases (if possible) with
specific software (free: BioEdit; commercial: CodonCode, Geneious, Sequencher etc.).
Regardless of the software choice, the steps required to move from raw data (trace file) to a
reliable sequence are the same (see below).

Figure 15. Sequence editing workflow. Codes: .ab1 - ABI file; F - forward; R - reverse; RC - reverse
complement; N - ambiguous base (see IUPAC codes); SID - BOLD Sample ID; PID - BOLD Process ID.
*The R reaction can be RC while performing primer trimming (see text).


Trim ends
Example of low quality end that should be removed (usually the software allows an easy ‘highlight
and delete’ option).

Trim primers
As primers are synthetic sequences that bind to the DNA strand, they may not reflect the ‘real’
sequence at the site of annealing and therefore should be removed.
In the example below, sequencing was performed with universal invertebrate (‘Folmer’26) primers:
LCO1490 (forward primer): 5’-GGTCAACAAATCATAAAGATATTGG-3’
HCO2198 (reverse primer): 5’-TAAACTTCAGGGTGACCAAAAAATCA-3’
Forward (F) trace: go to the end of the forward trace and look for the reverse primer (as reverse
complement, RC)

At position 656, the primer sequence starts: TGATTTTTTGGTCACCC… which is the RC of HCO
(…GGGTGACCAAAAAATCA). Delete the primer sequence (from position 656 in the example
above to the end of the trace). The remaining sequence should end with TTTATTT.
Reverse (R) trace: reverse complement the entire trace (usually by clicking a button in the
software) and look for the F primer at the beginning of the R trace.

At position 26 the primer sequence ends (….CATAAAGATATTGG). Delete primer sequence (in
the example above, everything from the beginning of the trace to position 26). The remaining
sequence should start with AACATT.

26
Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994). DNA primers for amplification of mitochondrial
cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and
Biotechnology 3: 294-299.


Note: It is also possible to use the R trace as it is (not RC) and look for the F primer (as RC) at
the end of the trace (see below). However, the trace will need to be switched to RC before being
assembled into a contig with the F trace.

At position 657, the primer sequence begins: CCAATATCTTTATG… which is the RC of LCO
(…CATAAAGATATTGG). In this case, delete everything starting with position 657 until the end
of the trace.
In many cases only a fragment of the primer can be reliably observed (as above) due to
decreasing base quality but it will not have any impact on the overall sequence as primer
sequences are trimmed.
In cases where the quality of the trace is too low to identify the primer regions, remove any low-
quality areas and edit the remaining (shorter than 658bp) sequence. If the entire trace is low
quality, discard it from analysis.
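The primer-trimming logic described above can be sketched in a few lines of Python. This is only an illustration, not the procedure of any particular editing software: it assumes the read is already in forward orientation, it searches for exact matches to a short stretch of each primer (the `min_match` parameter is a hypothetical choice), and it will miss primers that contain miscalled bases, which is why the visual check in the trace remains necessary.

```python
LCO1490 = "GGTCAACAAATCATAAAGATATTGG"   # forward primer (Folmer)
HCO2198 = "TAAACTTCAGGGTGACCAAAAAATCA"  # reverse primer (Folmer)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (RC) of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def trim_primers(read: str, min_match: int = 10) -> str:
    """Trim Folmer primers from a read in forward orientation.

    Everything up to the last `min_match` bases of LCO1490 is removed from
    the start, and everything from the first `min_match` bases of the RC of
    HCO2198 is removed from the end. Only exact matches are recognized.
    """
    read = read.upper()
    head = LCO1490[-min_match:]
    tail = reverse_complement(HCO2198)[:min_match]
    start = read.find(head)
    if start != -1:
        read = read[start + len(head):]
    end = read.rfind(tail)
    if end != -1:
        read = read[:end]
    return read


if __name__ == "__main__":
    insert = "AACATTATATTTTATTT"  # hypothetical, shortened barcode fragment
    raw = LCO1490 + insert + reverse_complement(HCO2198)
    print(trim_primers(raw))
```

A reverse trace would first be passed through `reverse_complement` so that both primers are searched for in the same orientation, mirroring the button click in the editing software.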

Sequence assembly and editing


Once the primers are trimmed, F and R traces are ready to be assembled into one contig (usually
by clicking a button in the software). The contig is then inspected manually to verify quality
and assess ambiguities. If a nucleotide can be ‘called’ (e.g., a clear peak which erroneously
appears as ‘N’), it needs to be corrected. If peaks overlap and a decision cannot be made,
the ambiguity code ‘N’ should be used instead. In the example below, the image on the left shows
a double peak which cannot be resolved, while the image on the right shows two ambiguities: one
which cannot be resolved (and will stay as ‘N’) and another which can be called ‘G’.

Once the entire contig is verified and edited, it is ready to be exported (as a fasta file) from the
software and uploaded to BOLD under the proper name. Sequencing providers use different
naming systems for electropherograms. During sequence editing, it is recommended to
rename the contig to reflect the BOLD Sample ID (or Process ID), which eases the
sequence upload to BOLD (see below an example of a contig named with a BOLD Process ID:
RONOC159-18).
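When many contigs are exported at once, the renaming step can also be done outside the editing software. The sketch below assumes a hypothetical lookup table from provider names to BOLD Process IDs; the provider name and the short sequence are invented for illustration.

```python
def to_fasta(records: dict, width: int = 70) -> str:
    """Format {identifier: sequence} as fasta text, wrapping sequence lines."""
    lines = []
    for identifier, seq in records.items():
        lines.append(f">{identifier}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def rename_contigs(contigs: dict, id_lookup: dict) -> dict:
    """Replace provider names with Process IDs; unknown names raise KeyError."""
    return {id_lookup[name]: seq for name, seq in contigs.items()}


if __name__ == "__main__":
    contigs = {"plate1_A01_contig": "AACATTATATTTTATTT"}   # hypothetical name
    lookup = {"plate1_A01_contig": "RONOC159-18"}
    print(to_fasta(rename_contigs(contigs, lookup)), end="")
```

Raising an error for names missing from the lookup table is a deliberate choice here, so that no sequence is exported under a provider name by accident.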


Quality Control
PCR amplification of false targets, pseudogenes, or contaminants is always possible, and the
risk is higher when a broad range of taxa is targeted with universal primers. Therefore, it is
important to verify the integrity and quality of the target sequence to prevent erroneous sequences
from becoming part of the barcoding database. The validation steps can be performed directly in
BOLD (see next section, BOLD Analytics) or outside BOLD (see below).

Sequence alignment
Sequence alignment is a bioinformatics process in which two or more DNA (or protein) sequences
are arranged so that their most similar nucleotides (or amino acids, for proteins) line up
with each other. The alignment of multiple DNA sequences may help reveal
conserved or variable nucleotide positions. At the same time, an alignment may
expose nucleotide insertions and deletions (INDELs), which can be caused by errors in
sequence editing.
In cases where only one sequence is targeted in a workflow, additional DNA sequences (for the
hypothesized taxon) can be downloaded from public databases and an alignment performed as
quality control.
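Once sequences are aligned, suspicious indels can also be screened for automatically. The sketch below assumes an alignment of a protein-coding marker (such as COI) held as a dictionary of equal-length strings with ‘-’ for gaps. It flags internal gaps whose length is not a multiple of three, since such gaps shift the reading frame; leading and trailing gaps are ignored because they usually reflect differences in read length. The function names and the example alignment are illustrative only.

```python
import re


def internal_gap_runs(aligned_seq: str):
    """Return (start, length) for each run of '-' that is not at either end."""
    core_start = len(aligned_seq) - len(aligned_seq.lstrip("-"))
    core_end = len(aligned_seq.rstrip("-"))
    return [
        (m.start(), m.end() - m.start())
        for m in re.finditer(r"-+", aligned_seq)
        if m.start() >= core_start and m.end() <= core_end
    ]


def suspicious_indels(alignment: dict) -> dict:
    """Map sequence name -> internal gaps whose length is not a multiple of 3."""
    flagged = {}
    for name, seq in alignment.items():
        bad = [(start, n) for start, n in internal_gap_runs(seq) if n % 3 != 0]
        if bad:
            flagged[name] = bad
    return flagged


if __name__ == "__main__":
    alignment = {            # hypothetical toy alignment
        "a": "ATGAAACCC",
        "b": "ATG-AACCC",    # single-base gap: likely an editing error
        "c": "ATG---CCC",    # one whole codon missing: frame preserved
        "d": "--GAAACCC",    # leading gap: shorter read, ignored
    }
    print(suspicious_indels(alignment))
```

Flagged positions are candidates for re-inspection in the original trace files rather than proof of an error.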
Many software packages can align sequences (including the sequence editing software);
one of the most versatile, easy to use and free is MEGA (Molecular Evolutionary Genetics
Analysis)27. Once the fasta file is imported into MEGA, sequence alignment can be performed
with two algorithms (Figure 16). For a large number of sequences, MUSCLE28 would perform faster
than ClustalW29.

27
Kumar S, Stecher G, Tamura K (2016). MEGA7: Molecular Evolutionary Genetics Analysis version 7.0
for bigger datasets. Molecular Biology and Evolution 33:1870-1874 (https://www.megasoftware.net/).

Figure 16. Alignment of COI sequences performed with MUSCLE in MEGA 7. The option for algorithm
(ClustalW or MUSCLE) can be chosen from the top command panel (see the red circle).

Translation into amino acids


A sequence of three nucleotides, referred to as a codon, corresponds to a specific amino acid.
There are 64 possible triplet combinations of nucleotides (codons) coding for 20 amino acids
(Table 3), and these relationships constitute the genetic code (Table 4). The code is
considered degenerate because multiple codons can correspond to one amino acid. A few codons
do not code for any amino acid but signal the end of the polypeptide chain;
they are referred to as stop codons (or termination codons). In the standard genetic code,
the stop codons are the following: TAA, TAG and TGA.
Table 3. List of amino acids and their codes.
Amino acid       Code   Letter     Amino acid       Code   Letter
Alanine          Ala    A          Leucine          Leu    L
Arginine         Arg    R          Lysine           Lys    K
Asparagine       Asn    N          Methionine       Met    M
Aspartic acid    Asp    D          Phenylalanine    Phe    F
Cysteine         Cys    C          Proline          Pro    P
Glutamic acid    Glu    E          Serine           Ser    S
Glutamine        Gln    Q          Threonine        Thr    T
Glycine          Gly    G          Tryptophan       Trp    W
Histidine        His    H          Tyrosine         Tyr    Y
Isoleucine       Ile    I          Valine           Val    V

28
Edgar RC (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput.
Nucleic Acids Research, 32(5): 1792–1797.
29
Thompson JD, Higgins DG, Gibson TJ (1994). CLUSTAL W: improving the sensitivity of progressive
multiple sequence alignment through sequence weighting, positions-specific gap penalties and weight
matrix choice. Nucleic Acids Research, 22: 4673-4680.


Table 4. Universal (standard) genetic code.

First                      Second codon letter                       Third
codon                                                                codon
letter    T             C             A              G               letter
T         TTT – Phe     TCT – Ser     TAT – Tyr      TGT – Cys       T
          TTC – Phe     TCC – Ser     TAC – Tyr      TGC – Cys       C
          TTA – Leu     TCA – Ser     TAA – Stop     TGA – Stop      A
          TTG – Leu     TCG – Ser     TAG – Stop     TGG – Trp       G
C         CTT – Leu     CCT – Pro     CAT – His      CGT – Arg       T
          CTC – Leu     CCC – Pro     CAC – His      CGC – Arg       C
          CTA – Leu     CCA – Pro     CAA – Gln      CGA – Arg       A
          CTG – Leu     CCG – Pro     CAG – Gln      CGG – Arg       G
A         ATT – Ile     ACT – Thr     AAT – Asn      AGT – Ser       T
          ATC – Ile     ACC – Thr     AAC – Asn      AGC – Ser       C
          ATA – Ile     ACA – Thr     AAA – Lys      AGA – Arg       A
          ATG – Met     ACG – Thr     AAG – Lys      AGG – Arg       G
G         GTT – Val     GCT – Ala     GAT – Asp      GGT – Gly       T
          GTC – Val     GCC – Ala     GAC – Asp      GGC – Gly       C
          GTA – Val     GCA – Ala     GAA – Glu      GGA – Gly       A
          GTG – Val     GCG – Ala     GAG – Glu      GGG – Gly       G

Various taxonomic groups have different genetic codes (Table 5), sometimes with different stop
codons. Therefore, selecting the correct genetic code before translating nucleotides into amino
acids is crucial.

Table 5. Genetic codes important for DNA barcoding (Source: NCBI30). The internal transcribed spacer
(ITS) does not require a genetic code since it is not a protein-coding gene.
Animals -                     Plants -                                       Fungi -
Mitochondrial gene COI        Chloroplast genes matK, rbcLa                  ITS (non-coding)
Vertebrate                    Bacterial, Archaeal and Plant Plastid Code     N/A
Invertebrate
Echinoderm & flatworm
Alternative flatworm
Ascidian
Trematode
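The effect of choosing the wrong genetic code can be shown with a small translation sketch. The standard code below follows Table 4; the invertebrate mitochondrial code is built from it by changing the four codons that differ in NCBI translation table 5 (TGA, ATA, AGA, AGG). The nine-base test sequence is invented for illustration, and ‘?’ is used for codons that cannot be translated.

```python
BASES = "TCAG"
# Amino acids (one-letter codes, '*' = stop) for the 64 codons in TCAG order.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]

standard_code = dict(zip(CODONS, STANDARD))

# Invertebrate mitochondrial code: same as standard except for four codons.
invertebrate_mito_code = {
    **standard_code,
    "TGA": "W",  # Trp instead of Stop
    "ATA": "M",  # Met instead of Ile
    "AGA": "S",  # Ser instead of Arg
    "AGG": "S",  # Ser instead of Arg
}


def translate(seq: str, code: dict) -> str:
    """Translate complete codons from position 0; unknown codons give '?'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(code.get(seq[i:i + 3], "?") for i in range(0, usable, 3))


if __name__ == "__main__":
    seq = "TGAATAAGA"  # hypothetical fragment
    print(translate(seq, standard_code))           # shows a stop codon
    print(translate(seq, invertebrate_mito_code))  # no stop codon
```

The same invertebrate sequence therefore appears to contain a stop codon under the standard code but translates cleanly under the appropriate mitochondrial code.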

Within the DNA barcoding context, software is used to translate a DNA (nucleotide) sequence
into a protein (amino acid) sequence to verify the absence of stop codons. Their presence
would most likely indicate the amplification of a pseudogene, since functional proteins do not
usually have internal stop codons. This translation step applies to protein-coding genes
such as the animal and plant barcodes but not to fungal barcodes.
The protein translation of a gene starts with a start codon and ends with a stop codon. In the
standard genetic code, the usual start codon is ATG, which codes for methionine (some
mitochondrial codes accept additional start codons). The reading frame starts with the letter “A”
of the start codon. If translation is started from the second letter of the start codon
(which is T), it will be out of frame and produce incorrect amino acids. Since the
barcode region does not start at position 1 of the gene, some adjustments are required during
translation.

30
https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi#SG11
For instance, if working with COI sequences in MEGA and going directly to translation, the first
attempt would result in the appearance of many stop codons (Figure 17) indicating a frame shift.

Figure 17. Translation of an alignment of COI sequences in MEGA 7. Before codons are translated into
amino acids, the software requires the selection of the correct genetic code. After translation, this example
shows a large number of stop codons (symbolized by ‘*’ in the software). This result indicates an incorrect
reading frame (i.e., the first nucleotide of the alignment is not the first nucleotide of a codon). Where
ambiguous nucleotides (symbolized by ‘N’) prevent the inference of the correct amino acid, MEGA 7
will display a ‘?’ for that codon.

To find the correct reading frame, one option is to temporarily delete the first nucleotide of the
alignment and then translate (Figure 18). Upon satisfactory inspection of the amino acid output,
the alignment would be reverted to the original form (‘Undo’ button). Another option is to
temporarily insert two columns of gaps before the first nucleotide and then translate (Figure 19).
Again, upon satisfactory inspection of the amino acid output, the alignment would be reverted to
the original form.
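The search for the correct reading frame can also be expressed as a small calculation: translate the sequence from each of the three possible starting positions and count stop codons. The sketch below is a hedged illustration that assumes an ungapped invertebrate COI sequence and NCBI translation table 5; the test sequence is invented. An offset of 1 corresponds to the two options described above (deleting the first nucleotide, or inserting two gap columns before it).

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 5 (invertebrate mitochondrial), codons in TCAG order.
INVERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), INVERT_MITO)}


def stop_counts(seq: str) -> dict:
    """Count stop codons for reading frames starting at offsets 0, 1 and 2."""
    seq = seq.upper()
    counts = {}
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        counts[offset] = sum(CODE.get(codon) == "*" for codon in codons)
    return counts


def best_frame(seq: str) -> int:
    """Return the offset with the fewest stop codons (lowest offset on ties)."""
    counts = stop_counts(seq)
    return min(counts, key=counts.get)


if __name__ == "__main__":
    seq = "GATAATTTATAGGTTAATC"  # hypothetical fragment
    print(stop_counts(seq), best_frame(seq))
```

A frame without stop codons is the expected result for a genuine barcode; stop codons in all three frames point to a pseudogene or an editing error.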


Figure 18. Translation of an alignment of COI sequences in MEGA 7. The first nucleotide of the alignment
is temporarily deleted (highlight the entire first column -> Delete). After translation and checking for stop
codons, the action is reverted (hit ‘Undo’).


Figure 19. Translation of an alignment of COI sequences in MEGA 7. Two columns of gaps are inserted at
the beginning of the alignment (select the first column, then press the gap symbol on the keyboard twice;
gaps are symbolized by ‘-’). Again, after translation and checking for stop codons, the action is reverted
(hit ‘Undo’). Note: if only one gap is inserted, the frame is still incorrect and will result in the display of many
stop codons.

Compare sequences to a database


GenBank

Once the verification for stop codons is complete, the sequence is compared against a database
of DNA sequences. This action will find the best matches in terms of taxonomy (i.e., an
unidentified sequence will receive a taxonomic identification, depending on completeness of
database) and will highlight cases of misidentification and/or contamination.
The most common tool for this verification is the Basic Local Alignment Search Tool (BLAST31) in
GenBank. This web-based tool locates and displays regions of similarity between biological
sequences. Nucleotide and amino acid sequences are compared with those in the databases
(DDBJ, GenBank, ENA) and the statistical significance of sequence matches is calculated. The four
types of BLAST offered by GenBank are: 1) Nucleotide BLAST (blastn32), which compares a
nucleotide query with nucleotide sequences (Figure 20); 2) blastx, which translates the nucleotide
query into protein and compares it with protein sequences; 3) tblastn, which compares a protein
query with nucleotide sequences in the database translated into protein; and 4) Protein BLAST,
which compares a protein query with protein sequences in the database. A standard BLAST search
will report Max score, Total score, Query cover, E value, Identity, and Accession no. of the match
(if available) (Figure 21). It also generates other reports that include Search Summary, Taxonomy
reports, Distance tree of results, and MSA viewer (multiple sequence alignment viewer). The results
can be downloaded if needed. Batch BLAST searches can be easily performed by uploading a fasta
file or pasting multiple sequences in the query box.
BLAST is also available in some sequence editing software where a (single) sequence can be
compared to GenBank through a direct link which will land on the query page of GenBank (Figure
20).

31
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990). Basic local alignment search tool. Journal
of Molecular Biology, 215(3):403-10 (https://blast.ncbi.nlm.nih.gov/Blast.cgi).
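Downloaded BLAST results from a batch search can be summarized with a short script. The sketch below assumes a tab-separated hit table with the twelve standard columns of the BLAST+ tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The accession numbers and values in the example are invented, and the 97% identity cut-off is an arbitrary choice for illustration, not a recommendation of this manual.

```python
import csv
import io

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(table_text: str, min_identity: float = 97.0) -> dict:
    """Return the highest-bitscore hit per query, if it reaches min_identity."""
    best = {}
    for row in csv.reader(io.StringIO(table_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or hit["bitscore"] > best[query]["bitscore"]:
            best[query] = hit
    return {q: h for q, h in best.items() if h["pident"] >= min_identity}


# Hypothetical hit table for two queries.
SAMPLE = (
    "query1\tXX000001.1\t99.54\t658\t3\t0\t1\t658\t1\t658\t0.0\t1199\n"
    "query1\tXX000002.1\t92.10\t658\t52\t0\t1\t658\t1\t658\t0.0\t950\n"
    "query2\tXX000003.1\t88.00\t600\t72\t0\t1\t600\t1\t600\t1e-180\t640\n"
)

if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE).items():
        print(query, hit["sseqid"], hit["pident"])
```

Queries that return no hit above the cut-off (query2 in the example) are the ones that most need manual inspection, for instance for contamination or gaps in database coverage.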

32
Zhang Z, Schwartz S, Wagner L, Miller W (2000). A greedy algorithm for aligning DNA sequences.
Journal of Computational Biology 7(1-2):203-14.
(https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome)


Figure 20. Comparison of a DNA sequence against a database through BLASTN. The main steps are
circled in red (query box to paste the unknown sequence, choice of database and program selection).


Figure 21. Output of a query with BLASTN. There are three main sections with results: Graphic Summary
(with colour-coded alignment scores between query and the closest 100 matches in the database),
Descriptions (a list of 100 matches with taxonomic identification, query coverage, E-value and % identity)
and Alignments (query sequence aligned with each one of the closest 100 matches). Additional reports can
be investigated, the results can be downloaded and instructions on data interpretation are given through a
YouTube video (see the information circled in red).

BOLD

A similar tool is available in BOLD and is called BOLD Identification Engine33 (ID Engine), allowing
single or batch queries of the barcode database.
Comparing single sequences to BOLD does not require a BOLD user account. As opposed to
BLAST, where any DNA sequence can be compared against the database, BOLD has been built
specifically as a platform for DNA barcoding (in particular animal barcoding with COI). Therefore,
the ID Engine allows only sequences belonging to the official barcode markers to be identified.
For animal COI, there is one important choice to make, namely the database used for queries
(Figure 22).

• All Barcode Records Database
• Species Level Barcode Records Database
• Public Barcode Records Database
• Full Length Barcode Records Database

Depending on the goal, the most common choices are either the entire database (including
records identified to coarse taxonomic level), to see the closest match to a sequence, or the
database containing only records with species names (when the goal is to find the closest species
to the query). The sequence is pasted into the query box (Figure 22) and the output opens in a
new window (Figure 23).

33
http://v4.boldsystems.org/index.php/IDS_OpenIdEngine

Figure 22. BOLD ID Engine with a single sequence queried against the Species Database. The tool can be
accessed directly from BOLD home page through the top links (“Identification”) without a user account.


Figure 23. Output page of results from the BOLD ID Engine. The query sequence had a 100% match in
BOLD and due to the high number of sequences all confirming the same species name, the identification
is considered robust. The results (100 closest sequences to the query) can be visualized on a tree (query
sequence in red, mined data from GenBank in blue) together with images corresponding to each record in
order of appearance in the tree. The results can be downloaded.

Scaling up to Medium/High-throughput Processing


Batch sequence editing
Trace files for a 96-well DNA barcoding plate, sequenced bidirectionally with an ABI sequencer,
may be delivered to the user as a zip file via email or web-link, either in a single package (both
directions) or in two packages (one for each direction). The package includes files with two types
of extensions: ab1 (for electropherograms) and phd (for nucleotide sequences). The ab1 files can be
viewed only with specialized software such as Chromas, CodonCode, BioEdit, Geneious, etc., while
the phd files can be opened with any sequence reading software.
Detailed instructions for batch sequence editing for all four official barcode markers are specified
in Annex IX.
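Before batch editing, it is useful to confirm that every sample on the plate has both a forward and a reverse trace. The sketch below assumes a hypothetical naming convention, `<SampleID>_F.ab1` and `<SampleID>_R.ab1`; sequencing providers use different conventions, so the parsing rule would need to be adapted.

```python
from collections import defaultdict
from pathlib import PurePath


def pair_traces(filenames):
    """Group trace files by sample and report samples missing a direction.

    Assumes names of the form <SampleID>_F.ab1 / <SampleID>_R.ab1
    (a hypothetical convention). Other files are ignored.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        stem = PurePath(name).stem
        sample, _, direction = stem.rpartition("_")
        if sample and direction in ("F", "R"):
            pairs[sample][direction] = name
    complete = {s: d for s, d in pairs.items() if set(d) == {"F", "R"}}
    incomplete = sorted(s for s, d in pairs.items() if set(d) != {"F", "R"})
    return complete, incomplete


if __name__ == "__main__":
    files = ["S001_F.ab1", "S001_R.ab1", "S002_F.ab1", "notes.txt"]
    complete, incomplete = pair_traces(files)
    print(complete)
    print("Missing a direction:", incomplete)
```

Samples listed as incomplete can still be edited from a single trace, but the resulting sequence may be shorter than the full barcode.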

Batch quality control


Batch validation of data can be easily performed by following the steps mentioned above:

• Alignment of multiple sequences (in MEGA)
• Translation of nucleotides into amino acids (in MEGA)
• BLAST in GenBank
• BOLD ID Engine: to allow concomitant multiple queries, a BOLD user account is needed.
Up to 100 sequences can be queried at a time through the ID Engine (copy-paste
sequences in the query box and hit ‘Submit’) or even hundreds through the Batch ID
Engine in BOLD 4 (more details in the following section).
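Because the ID Engine accepts up to 100 sequences per query, a larger fasta file has to be split before pasting. A minimal sketch of this splitting step is shown below; the record names are invented, and the pasting itself is still done manually in the query box.

```python
def batches(records, size=100):
    """Yield consecutive slices of `records` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


if __name__ == "__main__":
    # 250 hypothetical (name, sequence) records.
    records = [(f"sample{i:03d}", "ACGT") for i in range(250)]
    for number, batch in enumerate(batches(records), start=1):
        print(f"batch {number}: {len(batch)} sequences")
```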

Data validation is a crucial step in DNA barcoding since the entire method relies on the existence
of reliable databases against which unknown sequences are compared and species
identifications are made. BOLD and external software offer tools to help users validate and curate
their data (Figure 24 and the following section, BOLD Analytics).

Figure 24. Main steps for validation of molecular data. AA – amino acids; NJ – neighbour-joining; ID –
identification; BIN – Barcode Index Number; 1 – NJ trees require more than 3 sequences; 2 – BINs are
assigned once a month (in 2018); 3 – Full BIN discordance report should be performed in BOLD 3.5 (in
BOLD 4, the BIN discordance report takes into account only records within a project not the entire database;
June 2018).


BOLD Analytics
BOLD is a workbench where users upload data (see Annex X), analyze it and publish it. Upon
publication, the barcode data enters the public database and becomes freely accessible to anyone
for (re)use in various studies.
BOLD is organized in projects led by project managers and built together with their
teams/collaborators. Projects hold records, each with two components: 1) Specimen
Data - related to the physical specimen (designated by a unique Sample ID); 2) Sequence Page -
associated molecular data (designated by a Process ID). Data submission consists of four steps:
specimen data (validated by the BOLD team), followed by images, traces and sequences (submitted
directly by users, with no extra validation) (Annex X).

Data validation
Once a project is populated with records, data can be validated in BOLD (some steps are similar
to the previous sections on data validation with external software).

Figure 25. Most of the tools available for data validation and analysis in a project are situated on the left-
side console. By expanding ‘Sequence Analysis’ and ‘Aggregate Data’, a suite of tools will be displayed and
can be selected with a mouse-click.


1. Upon specimen data upload, a ‘Distribution Map’ (Figure 25) can be built to check the
accuracy of the GPS coordinates in the project (Figure 26). The map can be opened in
GoogleEarth34 for a more detailed view of localities.
• If one or only a few errors are observed, they can be corrected manually (through the
Specimen Page).
• If there are many errors, an update can be submitted to the BOLD team (see Annex X).

Figure 26. Screenshot of a distribution map for records belonging to a BOLD project (the red circle shows
the link to GoogleEarth).

2. Upon image upload, an ‘Image Library’ (Figure 25) can be built to verify that no mix-up
occurred during imaging or during submission (Figure 27).
• If an erroneous image has been uploaded, it cannot be corrected by the user. Instead,
BOLD support ([email protected]) should be contacted with a request to delete
the image. Once the record is cleared, the correct image can be uploaded.

3. Upon trace upload, if an error is observed, such as mix-up of traces between records, an
email should be sent to BOLD support to delete traces. Once the record is cleared, the correct
trace(s) can be uploaded.

34
https://www.google.com/earth/


Figure 27. Example of an image library in BOLD used to verify the taxonomic identification and possible
mix-ups of images during upload.

4. Upon sequence upload, a notification appears immediately if stop codons are present in the
sequence35 or if the sequence matches the contaminant database (bacteria, human, mouse,
pig etc) (Figure 28). In such cases, records are flagged immediately and removed from the
database used for the ID Engine.

Figure 28. Built-in tools for data validation. Contaminants are flagged upon sequence upload to BOLD.
Similarly, sequences with stop codons are immediately detected, flagged and removed from the ID Engine.

35
The genetic code is chosen automatically by BOLD based on the taxonomy associated with the record,
hence the importance of correct taxonomic assignment at least to phylum level.


5. After sequence upload, the Batch ID Engine (Figure 25) can be used in BOLD 4 to run all
sequences in a project against one of two databases: all barcodes or only barcodes with
species names (Figure 29). The results can be emailed to the user and will contain a
spreadsheet with 100 closest matches for each sequence queried. If errors are observed,
action should be taken:
• If a misidentification occurred, update taxonomy (either manually through the Specimen
Page or in batch through the update submitted to the BOLD team);
• If contamination (between samples or with other contaminants) occurred, request to flag
records by sending an email to BOLD support.
Note: It is crucial to take action to correct data (or flag it, as deemed suitable) since the entire
DNA barcoding approach is based on the existence of reliable DNA libraries. If errors are left
uncorrected, there will be an impact on the activities of the user community at large.
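Some of the coordinate errors that the Distribution Map reveals (step 1 above) can be caught before upload with a simple range check. The sketch below assumes coordinates in decimal degrees and a hypothetical record layout of (Sample ID, latitude, longitude); it cannot detect plausible but wrong localities, which still need to be checked on the map.

```python
def coordinate_problems(records):
    """Return (sample_id, problem) pairs for obviously invalid coordinates.

    `records` is an iterable of (sample_id, latitude, longitude) with
    coordinates in decimal degrees, or None when missing.
    """
    problems = []
    for sample_id, lat, lon in records:
        if lat is None or lon is None:
            problems.append((sample_id, "missing coordinate"))
        elif not -90 <= lat <= 90:
            problems.append((sample_id, "latitude out of range"))
        elif not -180 <= lon <= 180:
            problems.append((sample_id, "longitude out of range"))
        elif lat == 0 and lon == 0:
            problems.append((sample_id, "0/0 coordinates (possible placeholder)"))
    return problems


if __name__ == "__main__":
    records = [                       # hypothetical specimen records
        ("S001", 43.53, -80.23),
        ("S002", -80.23, 243.53),     # longitude out of range
        ("S003", None, -80.23),
        ("S004", 0, 0),
    ]
    for sample_id, problem in coordinate_problems(records):
        print(sample_id, problem)
```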

Figure 29. Batch ID Engine allows an entire project to be compared with the full/species database. Several
filters are available. Results should always be emailed to the user for efficient time management.

Data analysis
The following analytical tools can be used both for data validation and data analysis.
1. Neighbour-joining (NJ) trees can be built from the Sequence Analysis console by selecting
Taxon ID Tree (Figure 25). Various parameters (alignment type, genetic distance model) and
branch labels can be chosen by users (Figure 30). The NJ tree allows a relatively rapid
(depending on the amount of data) assessment of barcode clusters along the tree, therefore
highlighting cases that require additional investigation (e.g., occurrence of multiple species
names in one cluster or one species split into multiple clusters). Trees can be downloaded as
pdf, newick or postscript files.
If errors are observed, action should be taken:
• If a misidentification occurred, update taxonomy (either manually through the Specimen
Page or in batch through the update submitted to the BOLD team);
• If cross-contamination (between samples) occurred, request to flag records by sending an
email to BOLD support.

Figure 30. Taxon ID settings: only the alignment type is a mandatory field (Note: for large datasets choose
BOLD Aligner). For the other fields, BOLD will use default settings (process ID and lowest taxonomic level
available) unless otherwise specified by the user. The tree can be accompanied by an image library and a
spreadsheet with specimen details. If the data to be analysed is large, the option of emailing the results
should be chosen.


Note: NJ trees are great tools for data validation and barcode visualization but caution should be
taken in considering them as the true reflection of phylogenetic relationships between taxa.
2. Distance analysis of DNA sequences can be performed by selecting Distance Summary on
the Sequence Analysis console (Figure 25). Various parameters are available for users
(again, the most important one being the alignment option). The results window shows the
range of distance values (as %) within and between species as well as histograms for these
values (Figure 31). High intraspecific values (>3%) might be indicative of potential cryptic
species or misidentification (in c). In case of potential cryptic species, additional molecular
markers as well as morphological, behavioural, ecological etc. data are needed to clarify the
taxonomic position of the existing genetic clusters.

Figure 31. Results of the distance analysis between DNA sequences held in a BOLD project. Divergences
within species and between congeneric species are displayed in a table format and plotted as histograms.
All values of pairwise comparisons can be downloaded in a spreadsheet.
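The principle behind the distance summary can be sketched outside BOLD with uncorrected pairwise distances (p-distance, the percentage of differing sites). This is a simplification: BOLD offers several distance models, and the sketch below compares all species pairs rather than only congeneric ones. It assumes aligned sequences of equal length; the species names and sequences are invented.

```python
from itertools import combinations


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance (in %) over sites where both bases are A/C/G/T."""
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in "ACGT" and y in "ACGT":   # skip gaps and ambiguous bases
            compared += 1
            differences += x != y
    return 100 * differences / compared if compared else float("nan")


def distance_summary(records):
    """Split all pairwise distances into within- and between-species lists.

    `records` is a list of (species_name, aligned_sequence).
    """
    within, between = [], []
    for (sp_a, seq_a), (sp_b, seq_b) in combinations(records, 2):
        target = within if sp_a == sp_b else between
        target.append(percent_difference(seq_a, seq_b))
    return within, between


if __name__ == "__main__":
    records = [                      # hypothetical aligned fragments
        ("Species A", "ACGTACGTAC"),
        ("Species A", "ACGTACGTAT"),
        ("Species B", "ACGAACGAAT"),
    ]
    within, between = distance_summary(records)
    print("within species (%):", within)
    print("between species (%):", between)
```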

3. The Barcode Gap Analysis provides a quick view of all interspecific divergences in a project,
highlighting cases which require additional investigation (intraspecific distances higher than
distances to the nearest neighbour species). Users are able to choose parameters, with the
alignment option being the only mandatory field. The results window shows distance values
(maximum intraspecific and minimum interspecific) displayed as various scatterplots and
summarized in a table format (Figure 32). If errors (misidentification, contamination) are
observed, action should be taken (see sections above).


Figure 32. Results of the barcode gap analysis in a BOLD project. Scatterplots are built for various analyses:
maximum/mean intraspecific distance vs. distance to the nearest neighbour species, individuals per
species vs. maximum intraspecific distances. Dots below the red line refer to species which need further
investigation (nearest neighbour is closer than the maximum intraspecific value). Details for each species
are provided in a table format and can be downloaded in a spreadsheet.
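The comparison at the core of the barcode gap analysis can be sketched as follows. The function assumes that pairwise distances (in %) have already been computed and are supplied as a dictionary keyed by pairs of record identifiers; the record names, species and distance values below are invented. A species is reported as having a gap when its nearest-neighbour distance exceeds its maximum intraspecific distance.

```python
def barcode_gap(species_of: dict, distances: dict) -> dict:
    """Return {species: (max_intraspecific, nearest_neighbour, has_gap)}.

    `species_of` maps record id -> species name; `distances` maps
    (record_a, record_b) -> pairwise distance in %.
    """
    species = set(species_of.values())
    max_intra = {sp: 0.0 for sp in species}
    nearest = {sp: float("inf") for sp in species}
    for (rec_a, rec_b), dist in distances.items():
        sp_a, sp_b = species_of[rec_a], species_of[rec_b]
        if sp_a == sp_b:
            max_intra[sp_a] = max(max_intra[sp_a], dist)
        else:
            nearest[sp_a] = min(nearest[sp_a], dist)
            nearest[sp_b] = min(nearest[sp_b], dist)
    return {sp: (max_intra[sp], nearest[sp], nearest[sp] > max_intra[sp])
            for sp in species}


if __name__ == "__main__":
    species_of = {"r1": "A", "r2": "A", "r3": "B", "r4": "B"}   # hypothetical
    distances = {
        ("r1", "r2"): 0.5, ("r1", "r3"): 8.0, ("r1", "r4"): 8.5,
        ("r2", "r3"): 7.5, ("r2", "r4"): 8.0, ("r3", "r4"): 9.0,
    }
    for sp, result in sorted(barcode_gap(species_of, distances).items()):
        print(sp, result)
```

In this invented example, species B has no barcode gap (its two records are more distant from each other than from species A) and would fall below the red line in the scatterplot.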

4. The Barcode Index Number (BIN)36 is a useful tool for cataloguing life in the absence of
taxonomy. The system is based on a unique algorithm implemented in BOLD which clusters
all COI sequences into groups with unique identifiers (3-letters-4-numbers codes:
BOLD:ABC1234). Each BIN receives a unique page in BOLD, compiling all the information
available for the member records. Due to high concordance between BINs and morphological
species, observed so far, BINs can be considered as proxies for animal species. Each record
in a project would display a link to the BIN page on the project console (Annex X).

Some short sequences (<500bp) are not part of any BIN (unless they are very similar to existing
sequences which already received a BIN assignment). In these cases, the Cluster Sequences
tool (Figure 25) will group all DNA sequences from a project into Operational Taxonomic Units
(OTUs) (Figure 33). Although the algorithm used is very similar to that of BINs, the main
differences are the following: 1) OTUs are temporary units which do not receive persistent pages
in BOLD (data can be downloaded as a spreadsheet and used in other software); 2) OTUs are
project-based while BINs are based on the entire BOLD database.

36
Ratnasingham S, Hebert PDN (2013). A DNA-Based Registry for All Animal Species: The Barcode Index
Number (BIN) System. PLoS ONE 8(8): e66213.

Figure 33. Results of clustering into OTUs in a project. Mean and maximum values for OTUs as well as the
distance to the nearest OTU are displayed and can be downloaded (green button). OTUs are numbered
but are not persistent.
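The general idea of distance-based clustering into OTUs can be illustrated with plain single-linkage clustering: two records end up in the same cluster whenever they are connected by a chain of pairwise distances at or below a threshold. This is not the algorithm implemented in BOLD, only a simplified stand-in; the 2% threshold, the record names and the distances are assumptions made for the example.

```python
def cluster_otus(record_ids, distances, threshold=2.0):
    """Single-linkage clustering of records at a distance threshold (in %).

    `distances` maps (record_a, record_b) -> pairwise distance in %.
    Returns a sorted list of clusters, each a sorted list of record ids.
    """
    parent = {rec: rec for rec in record_ids}

    def find(rec):
        # Follow parent links to the cluster representative (path halving).
        while parent[rec] != rec:
            parent[rec] = parent[parent[rec]]
            rec = parent[rec]
        return rec

    for (rec_a, rec_b), dist in distances.items():
        if dist <= threshold:
            parent[find(rec_a)] = find(rec_b)   # merge the two clusters

    clusters = {}
    for rec in record_ids:
        clusters.setdefault(find(rec), []).append(rec)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["r1", "r2", "r3", "r4"]            # hypothetical records
    distances = {
        ("r1", "r2"): 0.5, ("r2", "r3"): 1.8, ("r1", "r3"): 2.5,
        ("r1", "r4"): 9.0, ("r2", "r4"): 9.0, ("r3", "r4"): 9.0,
    }
    print(cluster_otus(ids, distances))
```

Note that r1 and r3 are grouped together although their direct distance exceeds the threshold, because both are close to r2; this chaining is a known property of single linkage.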
Besides being an analytical tool that provides statistics related to a project and highlights cases
of potential cryptic speciation, BINs have an important role in data validation. The BIN Discordance
Report (Figure 25) provides an overview of the (dis)agreement of data within a project compared
with the rest of BOLD projects (Figure 34). If errors (misidentification, contamination) are spotted
in the project, action should be taken (as mentioned above).

Figure 34. BIN discordance report for a BOLD project (in BOLD 3.5). For the discordant BINs, the rank of
discordance and details on conflicting records are mentioned. Results can be downloaded (green button).


Note: The report includes only sequences assigned to a BIN. This report can currently be built
only in BOLD 3.5 (BOLD 4 will provide a discordance report only for a specific project).
The BIN algorithm runs once a month (June 2018); therefore, BINs are not provided automatically
upon sequence upload.
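The logic of a discordance check can be sketched on a downloaded spreadsheet of records. The example below only looks for BINs that contain more than one species name within the supplied records; it does not reproduce the ranks of discordance reported by BOLD, and the BIN identifiers and species names are invented.

```python
from collections import defaultdict


def discordant_bins(records) -> dict:
    """Return {BIN: sorted species names} for BINs holding more than one name.

    `records` is an iterable of (bin_identifier, species_name); records
    lacking a BIN or a species name are skipped.
    """
    names_per_bin = defaultdict(set)
    for bin_id, species in records:
        if bin_id and species:
            names_per_bin[bin_id].add(species)
    return {bin_id: sorted(names)
            for bin_id, names in names_per_bin.items() if len(names) > 1}


if __name__ == "__main__":
    records = [                                   # hypothetical records
        ("BOLD:ABC1234", "Genus alpha"),
        ("BOLD:ABC1234", "Genus alpha"),
        ("BOLD:ABD0001", "Genus beta"),
        ("BOLD:ABD0001", "Genus gamma"),
        (None, "Genus delta"),                    # no BIN assigned yet
    ]
    print(discordant_bins(records))
```

A discordant BIN is a prompt for re-examination (misidentification, contamination or a genuine case of species sharing barcodes), not an automatic verdict.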

Data publication
BOLD holds data in projects. However, data can be partitioned (a subset of data from one project)
and mixed (subsets of data from different projects) in datasets (Figure 35). Datasets are virtual
copies of projects whose role is to allow data recycling (the use of the same data in multiple
studies). Datasets have the same options as projects and can be analyzed with the same tools
as the ones mentioned above. Once data is ready for publication, the user can release the
entire dataset in the BOLD Public Database as well as GenBank. At the same time, a DOI can be
obtained for that particular dataset and included in the publication (the DOI link allows a quick
launch of the dataset directly from the publication where it is mentioned).

Figure 35. Workflow for publishing datasets. Data should be submitted to GenBank through Publication ->
Submit to GenBank (yellow rectangles). The dataset should be publicly released in BOLD through Dataset
Options -> Modify Dataset Properties -> Make this dataset publicly visible (check the box; red rectangles)
-> Save. A new window appears where a request for DOI can be made.
An overview of the analytical steps required for data validation, analysis and publication in BOLD
is presented in Figure 36.
For more details on data validation, analysis, and management, see the BOLD handbook
(http://www.boldsystems.org/index.php/resources/handbook?chapter=7_validation.html).


Figure 36. Main workflow and analytical steps required for data validation, analysis and publication in BOLD. NJ – neighbour-joining; BIN – Barcode
Index Number; OTU – Operational Taxonomic Unit. Additional analytical tools, not presented here, are available depending on the interest of the
user. Once data is ready to be published, it needs to be submitted to GenBank and publicly released in BOLD (datasets with DOI are a convenient
option).


ANNEX I: Scaling up DNA Barcoding Workflows


Processing large batches of samples (hundreds and thousands) requires scaling up of both
molecular and informatics pipelines, together with an updated corresponding collection
management system. The use of 96-well microplates (95 samples and one negative control)
provides the necessary capabilities for processing large sample sizes in a short time and a cost-
effective manner.

Scaling up the molecular pipeline


The applicability of DNA barcoding depends on the completeness of the reference libraries, which,
in turn, relies on the efficiency of baseline DNA barcoding workflows. Early efforts focussed on
establishing highly effective and error-free laboratory workflows that could be implemented in
medium-large sequencing facilities. The key to scaling up analytical production was the transition
from single tube-based protocols for DNA extraction and PCR amplification to 96-well plate-based
approaches (Figure 5). On the hardware side, this transition was manifested by employing multi-channel
liquid handling equipment that facilitated parallel operations in sample batches of 96 or
more [37]. In addition to expediting the operations, this also led to cost savings on labware and
particularly on reagents, because reagent volumes per sample used in a 96-well microplate are
considerably smaller than in individual tubes. As a result, molecular protocols have been adjusted
to operate in a medium-throughput environment.

Figure 37. Scaling up the DNA barcoding workflow involves switching from working with individual sample
tubes (left) to processing 96-well microplates.

Scaling up collection management


The transition of laboratory workflows to 96-well microplate format poses a new set of challenges
for front-end (pre-lab) processing, resulting from the need to aggregate individual samples into
batches of 95 (assuming one well within the plate is left empty as negative control):

• Each sample needs to be accurately mapped within the plate and matched to its source
specimen;
• The amount of tissue in each well has to be standardized across the entire plate and small
enough to allow using small reagent volumes and the same PCR primers;
• Organisms have to be grouped based on their taxonomy and other properties, so that the
same tissue lysis and DNA extraction protocols, as well as PCR primer sets, can be used
across the entire 96-well plate;
• Special care needs to be taken to avoid cross-contamination between wells during
sampling or sub-sampling;
• The procedure of assembling the samples into the microplate has to be time-efficient, to
ensure that it does not slow down the overall process.
These challenges are further complicated by the fact that the collection materials from which the
tissues originate may come from different sources (e.g., museums and field collection operations),
using different forms of preservation, and with provenance information provided in diverse
formats. In addition to slowing down the process of generating DNA barcodes, this can also
increase the chance of human error. In this context, arraying is a crucial component of the
front-end logistics.

[37] Robotic liquid handling allows consolidating four 96-well plates into a single 384-well plate, which allows
a dramatic reduction in the volume of sequencing reactions, providing considerable savings in labware and
reagent cost; however, 384-well plates are not human-manageable.

Arraying in standardized workflows


Specimens may arrive in varied formats and often require case-specific approaches. Arraying is
a critical stage of front-end processing that helps organize the specimens in the same order in
which they will be sampled for DNA extraction.
A specimen array is defined here as a batch of catalogued and labelled specimens assembled
together in sequential order for imaging and/or tissue sampling (it may or may not correspond to the
96-well format).
A sample array is a batch of tissue samples assembled together in a tube rack, in a microplate, or on
a blotting card in sequential order corresponding to the format of the 96-well plate where they will
be subsampled.
Most biodiversity researchers and practitioners are used to working with either bulk samples or
individual samples, but not with two-dimensionally organized batches, which requires a
substantial change of mindset and special effort to avoid errors. The purpose of an array is to
provide a visual reference of the order in which biological samples will be organized in the 96-well
microplate, thereby streamlining the processing workflow and improving quality control.


Figure 38. The 96-well microplate (left) and its topographic layout. The 96-well microplate is typically used
to array tissue samples and DNA extracts. It contains conical sampling wells, approximately 100 µl
in volume, that are arranged in a 12×8 format (12 columns and 8 rows) within a rectangular plastic
skirt. Typically, well A01 (marked in green) is the start of the array; well H12 (marked in red) is left empty as a negative
control. Note the label with an alphanumeric barcode on the front of the plate, which distinguishes it from
other plates and allows it to be tracked through different stages of processing and analysis. This label is affixed prior to
any sampling activities.

Array Types

Different types of arrays may be used, depending on the nature of the organisms processed
(Figure 39). When working with small fluid-preserved specimens, tube or vial racks can be used;
for small dry-mounted specimens (e.g., pinned or pointed insects), gridded pinning boxes can be
custom-made. For specimens that are too large to array (e.g., vertebrate vouchers or plant
herbarium specimens), tissue samples are usually taken and stored separately; these tissues can
also be arrayed in tube racks prior to subsampling. Typically, arrays are two-dimensional, but in
cases of large or awkwardly shaped samples (e.g., plant tissue in desiccant), this is not practical.
In these instances, the use of linear arrays allows the technician to keep track of the order of samples
while taking up a reasonably small operational footprint.

Figure 39. Examples of arrays: (A) scintillation vial rack; (B) tissue tube rack; (C) pinning box; (D) tray with
bags containing plant tissue in desiccant. For images A-C, note the empty locator corresponding to the
control well (H12).


Array Map – keeping a digital record

It is critical to keep a record of the position of specimens in the array and the corresponding
samples. Each specimen and resulting sample needs to be assigned a unique identifier, or
Sample ID, and these identifiers should be associated with the locators within the array,
determined by the combination of the corresponding row (A-H) and column (01-12) markers.
Rather than entering the data directly into a 12×8 matrix, it is preferable to use custom spreadsheet
templates that allow entering data for samples in sequential order and then display a printable 12×8
layout generated by formulas from the data entered. It is important to have a printout
of the assigned array map before initiating the processing of specimens, so that it can be
cross-referenced against specimen labels to ensure correct placement of each sample.
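The mapping from a sequential list of Sample IDs to array locators can be sketched in a few lines of Python. This is a minimal illustration and not the spreadsheet template itself: the function names, the example Sample IDs and the fill order (row by row by default, with an optional column-by-column mode) are assumptions, since the manual only specifies that A01 is the start of the array and that H12 is left empty as the negative control.

```python
import string

ROWS = string.ascii_uppercase[:8]   # row markers A-H
COLS = range(1, 13)                 # column markers 01-12
CONTROL = "H12"                     # negative-control well, left empty


def well_locators(by_column=False):
    """Return the 95 usable locators in fill order (control well excluded)."""
    if by_column:
        order = [f"{r}{c:02d}" for c in COLS for r in ROWS]
    else:
        order = [f"{r}{c:02d}" for r in ROWS for c in COLS]
    return [loc for loc in order if loc != CONTROL]


def build_array_map(sample_ids, by_column=False):
    """Map Sample IDs, entered in sequential order, to well locators."""
    locators = well_locators(by_column)
    if len(sample_ids) > len(locators):
        raise ValueError("a 96-well array holds at most 95 samples")
    return dict(zip(locators, sample_ids))


def render_grid(array_map):
    """Return a printable 12x8 layout as a list of tab-separated text rows."""
    lines = []
    for r in ROWS:
        cells = []
        for c in COLS:
            loc = f"{r}{c:02d}"
            cells.append("CONTROL" if loc == CONTROL else array_map.get(loc, ""))
        lines.append(r + "\t" + "\t".join(cells))
    return lines


if __name__ == "__main__":
    ids = [f"SAMPLE-{i:03d}" for i in range(1, 96)]   # hypothetical Sample IDs
    print("\n".join(render_grid(build_array_map(ids))))
```

The printed grid can serve the same purpose as the array map printout: each row of text corresponds to one plate row, so a technician can check a specimen label against its expected well before sampling.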
Arraying can help streamline all bulk processing stages:

• Sorting – transferring specimens from bulk samples and organizing them into arrays;
• Labelling – affixing individual labels with globally unique alpha-numeric coding;
• Imaging – digital photography of specimens;

Below is an example of a custom spreadsheet template (Excel file) used for generating array
maps. Data are entered in the tab (worksheet) titled “DATA INPUT”. Worksheet filling instructions
are typed in green italics, and red-on-yellow warning messages are displayed if information is
missing, duplicated or does not meet the minimal standards. Data are entered into the white cells,
which change colour once filled.
The type of sampling container is selected from the dropdown field and an image of the container
is displayed to allow the user to verify correct entry. The “Multiple Containers” checkbox, if
activated, displays a columnar list of fields to enter up to ten container numbers. Entering the
container names un-hides the corresponding fields for entering sample data in the left part of the
table. Each sample has to have a unique identifier that should correspond to the “Sample Locator”
field indicating its position within the corresponding container.
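The kinds of checks behind such warning messages (missing or duplicated identifiers, locators that do not exist in the container) can be illustrated with a short Python sketch. It is not the template's own logic: the function name, the warning wording and the restriction to a single 96-well container with H12 reserved as the negative control are assumptions made for the example.

```python
import re
from collections import Counter

LOCATOR_RE = re.compile(r"^[A-H](0[1-9]|1[0-2])$")   # A01 ... H12


def check_records(records):
    """Return warning strings for a list of (sample_id, locator) pairs."""
    warnings = []
    id_counts = Counter(sid for sid, _ in records if sid)
    loc_counts = Counter(loc for _, loc in records if loc)
    for row, (sid, loc) in enumerate(records, start=1):
        # Sample ID checks: must be present and unique
        if not sid:
            warnings.append(f"row {row}: missing Sample ID")
        elif id_counts[sid] > 1:
            warnings.append(f"row {row}: duplicated Sample ID {sid}")
        # Locator checks: must be present, valid, not the control, and unique
        if not loc:
            warnings.append(f"row {row}: missing Sample Locator")
        elif not LOCATOR_RE.match(loc):
            warnings.append(f"row {row}: invalid Sample Locator {loc}")
        elif loc == "H12":
            warnings.append(f"row {row}: H12 is reserved for the negative control")
        elif loc_counts[loc] > 1:
            warnings.append(f"row {row}: duplicated Sample Locator {loc}")
    return warnings
```

An empty list means that every sample has a unique identifier and a unique, valid position; any other result should be resolved before tissue sampling begins.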


To confirm the correct sampling order, the user can refer to the “Array Map” (lower image) that
displays the two-dimensional localization of each of the samples entered. The “Array Map” sheet
displays one map at a time; the container can be selected from the dropdown menu in the top
right part of the sheet. The workbook does not allow entering any comments or additional data –
its sole purpose is to map the position of samples within the arrays. NOTE: All coloured (non-
white) cells in the CCDB Record workbook are write-protected to secure formulas and cross-links;
data can only be typed or pasted into white cells. When pasting data from another spreadsheet,
it should be pasted as ‘values’ or ‘unicode text’ using the ‘paste special’ function of MS Excel.


ANNEX II: Collecting Ontologies


Background and terminology
Expert communities working on particular taxonomic or functional groups of organisms have
historically developed group-specific traditions and best practices for recording information. This
has led to considerable ambiguity about the basic terms used to denote the objects isolated from
the environment and preserved. The disparity of established traditions for data capture became
particularly apparent in the digital era when different natural history collections and research
facilities started establishing in-house databases, which had limited interoperability across
institutional and taxonomic domains.
Several initiatives have attempted to overcome this challenge. The most prominent example is the Biodiversity
Information Standards association, formerly known as the Taxonomic Databases Working Group
(TDWG) [38], which is affiliated with the International Union of Biological Sciences and focusses on
developing and promoting international data standards for biological objects (and biodiversity
research in general) that facilitate data exchange and interoperability. The application of such
standards facilitated the emergence of global big data aggregators for biodiversity. One example is the Global
Biodiversity Information Facility (GBIF) [39], an intergovernmental organization that promotes and
facilitates free and open access to biodiversity data and serves as a platform for disseminating
this information.
In order for such projects to be successful, it is imperative that a growing number of experts
engage both as users and as providers of high-quality information, which requires their
familiarity with data standards and best practices. The main biodiversity data standard currently
used is known as the Darwin Core Archive (DwC-A) [40]. It imposes a unified set of terms in order
to structure the dataset and associated metadata (“data about data”) and to enable capturing
cross-comparable information on species occurrences [41], such as sampling (=collecting) events.
The standard is rather complex and highly technical, reflecting the modern requirements for data
digitization and the need to accommodate all possible cases. Nonetheless, it has been
criticized by Walls et al. (2014) [42] for lacking the semantic structure required for describing
biodiversity data in a computationally useful way. It was suggested that biodiversity-related
ontologies should be developed to provide contexts for the terms used in the data standard.
Within the context of biodiversity information and, specifically, biological collections, the concept
of ‘ontology’ could be understood as a set of structured terms circumscribing the categories of
biological materials, their interrelationships and the different classes of information that
characterize these objects, their origin (provenance) and the process through which they were
acquired. In order to address the challenges of integrating collection management practices with
medium- and high-throughput molecular analytical protocols, it is important to understand the
ontological concept of collecting and the types of entities that collections deal with. It is also
important to distinguish between collecting activities that lead to the sourcing of biological
materials and the materials themselves, i.e., collection objects.

[38] TDWG - http://www.tdwg.org/
[39] GBIF - https://www.gbif.org/
[40] Darwin Core Archive - https://github.com/gbif/ipt/wiki/DwCAHowToGuide
[41] Note that in this context the term “species occurrences” could be better understood as documented
occurrences of organisms which have been attributed (e.g., by a taxonomic expert) to a certain species or
other taxonomic category.
[42] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0089606

Collecting ontologies – conceptual structure


In essence, the fields in a biodiversity database need to be meaningful in describing the objects
and the processes used to gather relevant information. In order to understand biological collection
ontologies, it is important to keep in mind the hierarchical organization of collecting activities
(collection effort vs collecting event) and the biological objects that are derived from these
activities (lots vs specimens, as well as the different categories of samples). It also helps to define
the semantic groupings of data that serve particular purposes. Broadly speaking, this information
can be grouped into three main categories pertaining to ‘subject’, ‘action’ and ‘object’, specifically:

• History – a record of the persons and organizations (= ‘subjects’) undertaking the collecting
and subsequent processing of materials;
• Provenance – spatiotemporal and circumstantial properties of the collecting activities (=
‘actions’); and
• Attributes – intrinsic or relational properties of the biological materials (= ‘objects’) collected.
Below is a more detailed outline of the information that is typically recorded (see Annex III for details
on the Electronic Field Journal holding all these fields).

History – “why?”, “who?” and “where to?”


Once a biological object is removed from nature and becomes part of a natural history collection,
it essentially turns into a cultural object. This group of data refers to a set of “historic” properties
recording the human interaction with biological objects, rather than the natural origin or the
biological properties of objects themselves. This information provides background on the purpose
of the collecting effort (“why?”), the actors involved (“who?”) and the destination of the collection
(“where to?”). Three key sub-groupings could be defined.

Authority – “why?”

This information provides the administrative context for the purpose (end goal) of collecting the
biological objects; it describes the entities responsible for the collecting effort and those who have
oversight, ownership or custodianship over the materials collected:

• Program or project under the auspices of which collecting took place;
• Expedition that carried out the collecting mission;
• Institution(s) responsible for the collecting activities; and
• Owner(s) and/or custodians of materials and data collected.
This information would represent the highest level of hierarchy and thus would relate to the broad
collecting effort.

Actors – “who?”

Individuals involved in and responsible for the collecting and field processing activities:

• Collector – person(s) who collected the biological objects;
• Identifier – person(s) who provided taxonomic identification for the biological/collection
objects;
• Other terms (e.g., “preparator”) may be used to indicate the person(s) who processed the
voucher specimen after collecting, if different from collector.
This information relates to the lower level of hierarchy of activities (collecting events) or to the
biological objects (lots and specimens) that result from them.

Destination – “where to?”

Although not part of collection information per se, this group of data records the audit trail of the
collection as it undergoes different stages of processing, analysis, archival and curation. Keeping
a record of these processing stages is an important part of ensuring a robust operational workflow.

Provenance – “where?”, “when?” and “how?”


This is the core part of the biodiversity collection ontology, providing details on the origin of the
biological objects. Usually, this information is obtained at the time of deploying a fine-grained
collecting activity (collecting event); by extension, it applies to all biological objects (lots and/or
specimens) that are collected as a result. Provenance information can be grouped into three broad
categories of properties that describe the collecting event’s localization in space (“where?”), time
(“when?”) and the method used (“how?”):

Spatial properties (“where?”)

Locality – geographic position


• Political jurisdiction (country)
• Administrative jurisdiction (province/state, region/district, or municipality)
• Geographic reference (sector – distance and direction from nearest prominent settlement)
• Local (position relative to local geographic features)
• Precise geospatial position (latitude, longitude and elevation or oceanic depth)
If precise geopositioning information is available, then other data elements may be considered
redundant, although they are traditionally recorded. Note that a collecting event may span a
certain area (e.g., in the case of a transect survey) or timeframe (e.g., a long-term trapping effort).
Habitat
Unlike broad geographic position, habitat properties may vary locally; therefore, it is important to
record them. Information may include:

• Landscape characteristics, e.g., type of biome where collecting took place;
• Microhabitat (depending on the nature of the collecting activity);
• Aerial elevation or local depth, which may be important, e.g., when biological objects were collected
in the tree canopy above ground or in a deep freshwater body, where distance from the
ground or water surface cannot be easily inferred from elevation above sea level.

Temporal properties (“when?”)

• Date or range of dates spanned by the collecting event;
• Time (if available with sufficient precision).


Similar to the case with spatial properties, a collecting event may span varying timelines,
depending on the nature of collecting activities and the limitations that it may impose on data
accuracy.

Circumstantial properties (“how?”)

These are characteristics of the collecting event related to the way it was deployed and other factors
that may affect its outcome:

• Collecting method – technique used to collect the organisms;
• Effort quantification – a quantitative characteristic that allows coverage or sample
size to be estimated (e.g., number of traps used, volume of soil core sample taken, or length of survey
transect);
• Phenological and/or weather characteristics – although related to temporal properties of
collecting activities, they are not readily derived from them and often provide helpful insights
into the ecological patterns of organisms being collected;
• Other deployment circumstances or remarks.

Attributes – “what?”
Unlike provenance information, which is logistically convenient to record for the entire collecting
event, attributes are, as previously noted, intrinsic or relational properties of the biological objects
being collected, although they may be recorded at the time of collection. It is therefore important
that lots and/or specimens for which an observation is made are grouped and labeled in a way
that allows tracing such information. Two major groups of properties could be defined.

Biological properties

Ecological
These properties describe the collection object’s relations to other organisms or abiotic
factors/resources observed at the time of collecting that may provide insights into its ecological
role or function (e.g., parasite, symbiont, host) or other peculiarities.
Organismal
These properties describe the innate characteristics of a particular organism; examples include:

• Morpho-physiological (e.g., sex, age or life stage, reproductive condition);
• Metric (e.g., measurements, body mass).

Taxonomic properties

Taxonomic identification
This is a result of a curatorial act – when the specimen has been examined by an expert and
assigned to a certain taxon.
Taxonomic position
This information is derived from a nomenclatural paradigm and refers to the currently accepted
placement of a taxon within the broader hierarchy and the preferred nomenclature to be used.
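
The three groupings described in this annex (history, provenance and attributes) and their hierarchy can be sketched as a small set of Python data classes. This is an illustrative structure only: the class and field names are assumptions chosen for the example and do not reproduce Darwin Core terms or the fields of any particular database.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CollectingEffort:
    """History, authority level ("why?"): the broad collecting effort."""
    project: str
    institution: str
    expedition: Optional[str] = None
    custodian: Optional[str] = None


@dataclass
class CollectingEvent:
    """Provenance ("where?", "when?", "how?") of one collecting event."""
    code: str
    effort: CollectingEffort
    country: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    elevation_m: Optional[float] = None
    date_start: Optional[str] = None      # ISO 8601 date string
    date_end: Optional[str] = None
    method: Optional[str] = None
    collectors: List[str] = field(default_factory=list)   # actors ("who?")


@dataclass
class Specimen:
    """Attributes ("what?") of one collection object."""
    sample_id: str
    event: CollectingEvent
    lot_id: Optional[str] = None
    sex: Optional[str] = None
    life_stage: Optional[str] = None
    identification: Optional[str] = None
    identifier: Optional[str] = None      # person who identified the specimen

    def provenance(self):
        """Provenance is inherited from the parent collecting event."""
        e = self.event
        return {"country": e.country, "latitude": e.latitude,
                "longitude": e.longitude, "date_start": e.date_start,
                "method": e.method}
```

Because each specimen refers to its collecting event rather than copying it, provenance is entered once per event and shared by all lots and specimens derived from that event, while attributes stay with the individual object.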


ANNEX III: Electronic Field Journal


Overview of the template
This field data management spreadsheet has been developed to address the essential curatorial
needs involved in field specimen processing. Its primary function is to facilitate spatiotemporal
data tracking for the survey effort deployed, curation of collected lots and individual specimens,
assembly of sampling and imaging arrays, and BOLD data submission.
The spreadsheet is built on the MS Excel platform for Windows and requires no specialized
database training or software resources from its prospective users. Built-in formulas and macros
offer versatility in data management. The establishment of hierarchical relationships between lot
and specimen records and corresponding collecting events minimizes the need for repetitive data
entry. A set of conversion tools allows outputting data in a variety of flat file formats compatible
with BOLD.
Users are expected to enter data into the white cells which change colour as a result. General
collection information fields, column headers and buttons have popup help comments that are
displayed when the mouse cursor is placed above them. These help messages indicate the type
of information that should be entered and relevant data requirements, as well as the functionality
provided by the buttons.
Several ancillary tools are offered, such as automatic generation of specimen and lot label
printouts. Taxonomic nomenclature is validated against an embedded reference checklist and
geospatial data can be directly plotted in Google Earth. The layout of the spreadsheet is designed
to reflect the biological collecting ontology discussed in this manual; however, there is sufficient
flexibility to accommodate exceptional data entry needs, such as digitization of provenance
information from collection labels of museum specimens. Full functionality can be attained with a
standalone computer workstation connected to a laser printer (with archival paper).
While not intended to become a replacement for specialized collection databases, the
spreadsheet has proven to save valuable field time and improve the quality of data capture. Its
broader usage will not only enhance the efficiency of front-end stages of DNA barcoding, but will
help address many logistical challenges of pre-museum collection management.

COLLECTING tab – registering the collecting activity and collecting events
The COLLECTING tab of the spreadsheet is used to enter historic and provenance information about
collecting activities. The left column of fields should be filled with information about the overall collecting
effort, while the main table is used to enter data for each collecting event.
See below the blank and filled form:


LOTS tab – registering and associating lots


Each lot record has to be associated with a collecting event by selecting the collecting event code from the
dropdown menu. This will link the provenance information and display it in the coloured cells to the right.
Note the set of buttons in the top right menu that allows generating several standard formats of lot labels.
These labels can be laser-printed on archival-grade paper and placed in the lot storage containers.
See below the blank and filled form:


SPECIMENS tab – registering and associating specimens


Each specimen record has to be associated with a collecting event by selecting the collecting event code
from the dropdown menu. If the specimens are derived from a lot, they should also be associated with the
corresponding lot number. Once this is done, the collecting event could be retrieved automatically by
selecting one or more of the corresponding ‘Collecting Event Code’ fields and clicking the button ‘Retrieve
from Parent’. This will link the provenance information and display it in the coloured cells to the right. Note
the set of buttons in the top right menu that allow generating several standard formats of specimen labels.
These labels can be laser-printed on archival-grade paper and placed in the lot storage containers. Also
note the ‘BOLD output’ button that allows generating the BOLD batch submission spreadsheet and the
‘Sampling Arrays’ button that facilitates arranging specimens in arrays for tissue sampling and analysis.
See below the blank and filled form:


TAXONOMY tab – adding taxonomic references


This spreadsheet allows the user to enter/upload detailed taxonomic checklists (down to species level) and
provide a taxonomic/nomenclatural framework for the biological materials being collected. Each taxon
needs to be recorded only once; it is associated with the corresponding lots and/or specimens by entering
the taxon name and rank in the corresponding lot and specimen taxonomy fields. If no taxonomic checklists
are entered, the built-in taxonomy will be applied, which contains a hierarchy down to family level downloaded from
ITIS. This list can be viewed by scrolling down to the bottom of the spreadsheet.

Printing lot and specimen labels


Label sheets are unhidden by pressing the corresponding buttons on the LOT and/or SPECIMEN
sheets. Each label sheet is formatted to be printed on a single Letter or A4 paper. The height and
width of cells can be changed by the user to facilitate print formatting. If the number of records is
greater than can fit on a single printed sheet, the user can toggle between pages using the page
toggle buttons or select the starting number of the print array. Some label templates allow
adding/removing borders that facilitate cutting out individual labels from print.


Registering arrays and generating array maps


Specimens are associated with arrays by adding the array numbers. By default, the first 95
specimen records are associated with the first array, the next 95 with the second array, and so on. This can be changed by manually
selecting the array start and finish; in this case, be mindful of possible gaps or overlaps between
arrays (which will be highlighted with error messages). Specimens can also be
individually associated with specific array locators by adding the respective values in the ‘Array #
or Container’ and ‘Array Locator’ fields in the SPECIMENS sheet. Note that array codes need to be
registered before they can be associated with specimens.
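
The default batching into arrays of 95 records, and the check for gaps or overlaps after manual changes, can be sketched as follows. This is a minimal Python illustration, not the macro used in the Electronic Field Journal; the function names and message wording are assumptions.

```python
ARRAY_CAPACITY = 95   # 96 wells minus one negative control


def default_assignment(n_specimens, capacity=ARRAY_CAPACITY):
    """Return (start, finish) record ranges, 1-based and inclusive, per array."""
    return [(start, min(start + capacity - 1, n_specimens))
            for start in range(1, n_specimens + 1, capacity)]


def check_ranges(ranges, n_specimens, capacity=ARRAY_CAPACITY):
    """Report gaps, overlaps and oversized arrays in manually set ranges."""
    problems = []
    expected = 1   # next record number that should start an array
    for i, (start, finish) in enumerate(sorted(ranges), start=1):
        if finish - start + 1 > capacity:
            problems.append(f"array {i}: more than {capacity} records")
        if start > expected:
            problems.append(f"gap: records {expected}-{start - 1} unassigned")
        elif start < expected:
            last = min(finish, expected - 1)
            problems.append(f"overlap: records {start}-{last} assigned twice")
        expected = max(expected, finish + 1)
    if expected <= n_specimens:
        problems.append(f"gap: records {expected}-{n_specimens} unassigned")
    return problems
```

For example, 200 specimen records fill two complete arrays and part of a third by default; an empty list returned by `check_ranges` indicates that every record belongs to exactly one array.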


Prepare BOLD data submission


In order to prepare the BOLD data submission from the Electronic Field Journal, assign all
specimen records to a BOLD project code (real or mock) by populating the ‘BOLD Project’ field in
the ‘SPECIMENS’ tab. Then, unhide the BOLD Output tab in the table by pressing the ‘BOLD
Data Output’ button in the ‘SPECIMENS’ tab.
The ‘BOLD Output’ tab can also be unhidden with the ‘BOLD Data Output’ button in the
‘COLLECTING’ sheet. Once in the ‘BOLD Output’ sheet, review data fields to make sure there are no errors
(highlighted in red). Empty fields and missing data are marked in bright yellow. Troubleshoot errors by
making adjustments to the data in tabs ‘COLLECTING’, ‘LOTS’, ‘SPECIMENS’ and ‘Taxonomy’, as
appropriate. When ready to submit data to BOLD, click the ‘Refresh project codes’ button, then select the
preferred project code and submission type (new or update) from the dropdown menus. This will update all
formulas and links within the spreadsheet and filter the dataset, restricting it to the records that will go into
the selected project. Click the button ‘Output BOLD 3.1 Specimen Data’ and wait for the macro to generate
the 4-tab BOLD data submission spreadsheet automatically. The corresponding MS Excel *.xls file will be
saved on the MS Windows desktop (the function does not work in other operating systems). If your dataset
spans more than one project, repeat the above steps for each consecutive project.
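
The per-project filtering step can be pictured with a short Python sketch that groups specimen records by their project code, so that one submission file can be prepared per project. It does not reproduce the spreadsheet macro or the BOLD submission format; the function name and the representation of records as dictionaries keyed by the ‘BOLD Project’ field are assumptions for illustration.

```python
from collections import defaultdict


def split_by_project(records, project_field="BOLD Project"):
    """Group specimen records (dicts) by project code; collect unassigned ones."""
    batches = defaultdict(list)
    unassigned = []
    for rec in records:
        code = (rec.get(project_field) or "").strip()
        if code:
            batches[code].append(rec)
        else:
            unassigned.append(rec)   # needs a project code before submission
    return dict(batches), unassigned
```

Records returned in the unassigned list correspond to specimens whose ‘BOLD Project’ field is still empty and would need to be completed before submission.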


TIP: If you are submitting data to the same BOLD project in several blocks, corresponding
datasets could be denoted as “_submission1”, “_submission2”, etc. Denoting the actual BOLD
project code is not mandatory; however, it helps keep track of where the data are being uploaded.


ANNEX IV: Specimen Imaging


Practical tips
Imaging setup and best practices will vary greatly, depending on the nature of specimens imaged,
the equipment used, etc., particularly when organisms are photographed in nature. Here we will
provide tips on imaging preserved collection voucher specimens in a laboratory setting. Several
technical terms used in this section require basic understanding of the principles of photography
which are easily found online. Different camera brands and models vary considerably in default
settings and controls; therefore, we also recommend consulting your camera manual for specific
ways of changing the parameters discussed below.

Imaging studio layout


Planning the imaging studio layout is key to ensuring processing efficiency. The following key
points should be considered.

Ergonomics

Imaging large batches of specimens in an awkward posture can lead to serious strain injuries
(particularly to the neck and back); therefore, ergonomic workplace organization is a key part of
studio design, especially in the case of macro photography. The relative position of the table(s),
camera tripod and photographer’s seat should be such that the camera viewfinder is at eye level
(no need to hunch down or extend the neck) and that the entire specimen array is within
comfortable reach. Special care should be taken to avoid leaning, hunching, neck flexion or
extension and unsupported arms in static posture. Whenever logistically feasible, real-time
camera feed onto a monitor is recommended.

Specimen security

Another important consideration in organizing the imaging protocol is maintaining the integrity of
specimens (e.g., not detaching them from their labels) and avoiding damage. Studio setup should
take into account this requirement.

Avoiding errors

Having the specimens pre-arrayed and labeled prior to imaging helps streamline the imaging
process. It is recommended either to use two separate arrays (one for ‘pre-imaging’ and one for
‘post-imaging’) or to change the orientation of the specimen in the array once it has been
photographed. It is useful to have a printout of the array map and to confirm individual specimen
numbers between the map and the specimen or vial labels. When specimens are too small for
labeling, their number can only be inferred from their position (locator) within the array; therefore,
it is critical to ensure that the specimens are taken from and placed into the correct array locator
(e.g., plate well).
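
The cross-check between the printed array map and the labels read at the bench can also be done programmatically. The following is a minimal sketch, assuming both are available as mappings from array locator (e.g., "A01") to specimen number; the function name `check_array` and the sample IDs are hypothetical and serve only as an illustration.

```python
def check_array(array_map, observed_labels):
    """Compare the array map with the labels actually found in the array.

    Both arguments map a locator (e.g. "A01") to a specimen number.
    Returns a list of (locator, expected, observed) tuples for every
    disagreement, including locators present in only one mapping.
    """
    problems = []
    for locator in sorted(set(array_map) | set(observed_labels)):
        expected = array_map.get(locator)
        observed = observed_labels.get(locator)
        if expected != observed:
            problems.append((locator, expected, observed))
    return problems


# Hypothetical example: two specimens were swapped when returned to the array.
array_map = {"A01": "SPEC-001", "A02": "SPEC-002", "A03": "SPEC-003"}
observed = {"A01": "SPEC-001", "A02": "SPEC-003", "A03": "SPEC-002"}

for locator, expected, seen in check_array(array_map, observed):
    print(f"{locator}: map says {expected}, label says {seen}")
```

An empty result means that every locator holds the specimen the map expects. For specimens too small to carry labels, such a check is not possible, which is why careful handling of array positions remains essential.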


Camera and stage

It is often convenient to set up a system where the camera and/or stage can be easily moved,
e.g., to exchange the specimen or to change the focal distance to attain focus and/or proper
framing. For horizontal setups (when the lens axis is positioned horizontally), it is usually preferred
to have a camera mounted statically on a tripod, while the stage could be moved relative to the
lens across the tabletop.

Figure 40. Example of a macro photo setup for pinned insects. The digital SLR camera is mounted on a
tripod; a ring macro flash is attached to a 60 mm macro lens. The cone of white paper visible in front of the
flash acts as a diffusor that minimizes glare. The specimen is pinned onto a double-layered piece of white
fabric. There is an off-camera flash behind the fabric that is synchronized with the on-camera ring flash –
when activated, it over-exposes the fabric, creating the perception of a pure-white background.

Lighting
True to the origin of the term ‘photography’ (‘writing [drawing] with light’ in Greek), the key to taking
a successful image is to ensure appropriate lighting conditions. Although modern digital cameras
have sophisticated algorithms for exposure metering and image post-processing, it is still
important to ensure good quality lighting that provides sufficient and well-balanced exposure,
allowing the camera to capture an accurate depiction of the outline, surface profile, colouration
and texture of the specimen and its distinguishing features. It is best not to rely on ambient lighting
and to use one or more artificial light sources with good white balance. Traditionally, off-camera
flashes were used for this purpose; however, high-quality LED light fixtures are increasingly used.


Figure 41. Examples of photographs with different exposure. Images of a fluid-preserved spider taken
using an SLR with macro lens. A: underexposed image (background clearly visible, but specimen is too
dark, specimen details obscured); B: overexposed image (background and specimen too light, specimen
details obscured); C: image with adequate exposure (background overexposed, but specimen morphology
adequately visible).

Exposure

The amount of light emitted by flashes or other off-camera sources should be sufficient to allow a
high f-stop (narrow aperture) and high shutter speed with regular ISO settings (image sensor
photosensitivity, formerly known as ‘film speed’). It is best to set the camera to manual shutter
and aperture setting and to set ISO to 100 or 200.

Directionality of lighting

Reducing glare is one of the biggest challenges when using artificial lighting; glare is often evident
when the light is emitted through a single source located in proximity to the camera and is bounced
from the object’s surface back into the lens. Glare is particularly evident in glossy objects, such
as certain insects (e.g., beetles), and specimens removed from liquid, but can also be observed
in other cases, e.g., birds with iridescent feathers. Rather than relying on the built-in camera flash,
it is best to use a lens-mounted macro (e.g., ring macro) flash which emits light from a larger area.
It is even better to set up a diffusor in front of the flash and set several additional light sources or
reflectors around the object to project light at it from several directions, akin to a ‘soft box’ used in
professional photo studios. This lighting setup will remove high contrast shadows and will highlight
the texture of the specimen.

Figure 42. Examples of glare on specimen images. Images of beetles with glossy forewings taken using a
ring macro lens. A: no light diffusor on the lens/flash – light from the flash is reflected from glossy elytra;
B: diffusor (cone of white paper) positioned around the object in front of the flash – lighting is more evenly
distributed, and the absence of evident glare allows for better perception of the texture of the beetle’s elytra.

Depth of field

Typically, biological objects are three-dimensional. Note that most cameras have a built-in ‘macro’
setting which is designed to create “artistic blur” around a shallow portion of the object, thereby
leaving most of it out of focus. This setting is useless for technical photography of biological
objects, where one should aim to attain maximum depth of field. In an SLR-type camera, this can
be achieved by setting a narrow aperture (usually marked as ‘A’ in camera settings),
corresponding to a large ‘f-stop’ (values of f11 or higher). Note that narrow aperture settings
require high intensity lighting to provide adequate exposure. Also note that very high f-stops
(usually, above f13) can lead to a visible diffraction effect that makes the image look more
“grainy” and obscures detail. It is best to balance the need for a high depth of field with the need to
maintain image sharpness.
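
The link between narrow apertures and the need for stronger lighting can be made concrete with a little arithmetic. The sketch below relies on the standard photographic relationship that the light reaching the sensor is proportional to 1/N², where N is the f-number. This relationship and the function names are not part of this manual and are given for illustration only.

```python
import math


def light_ratio(f_from, f_to):
    """Fraction of light remaining when changing aperture from f_from to f_to."""
    return (f_from / f_to) ** 2


def stops_lost(f_from, f_to):
    """Number of exposure stops lost (each stop halves the light)."""
    return 2 * math.log2(f_to / f_from)


# Apertures of images A and B in Figure 43.
print(f"light remaining: {light_ratio(2.8, 13):.3f}")
print(f"stops lost:      {stops_lost(2.8, 13):.1f}")
```

Under this assumption, stopping down from f2.8 to f13 leaves roughly one twentieth of the light (a little over four stops). If ISO and shutter speed are kept fixed, the difference has to be made up by stronger flashes or lamps.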

Sharpness

In order to depict as many characters of the specimen as possible, the image should be “crisp”,
or sharp. High detail necessitates, first of all, precise focus on the object. By default, most
cameras use automatic electronically-guided focus; when using it, ensure that the camera
focusses on the object and not the background.
Motion blur can drastically decrease image quality; it is caused by lens movement relative to the
object during exposure. To avoid motion blur, select high shutter speed settings (usually marked
as ‘S’) of 250 or more (corresponding to 1/250 of a second) and/or use a tripod to stabilize the
camera. When using flashes, ensure that the shutter speed does not exceed the maximum
allowed by flash synchronization.
High ISO (image sensor photosensitivity) settings also introduce image noise, making the image
look “grainy”; this can sometimes happen if the camera ISO is set to ‘auto’ by default. It is best to
set ISO manually to 100 or 200, as directed above.

Distortion

Typically, distortion is not an issue when using macro lenses, but may be noticeable if wide-angle
lenses are used and set to the smallest focal length (e.g., 35 mm or less). Images may also be
distorted as a result of digital post-processing, e.g., merging of multiple photos. For example, this
may happen when the Z-axis stacking feature is used in microscopy. It may be a reasonable trade-
off if the specimen is too large to be imaged by a regular lens from a certain aspect or if additional
detail is recorded during stacking or merging of several images.

Object framing and orientation

The object should occupy most of the frame, leaving relatively small margins. It is also important
to ensure that parts of the specimen are not “cut off” by the image frame. The longest axis of the
specimen should be parallel to the longest side of the frame.
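
The framing rules above can be expressed as a small calculation. The following is a rough sketch, assuming the specimen outline is already known as a pixel bounding box; the function name `crop_box`, the 5% default margin and the example numbers are hypothetical. It chooses a 4:3 (landscape) or 3:4 (portrait) crop so that the longest specimen axis runs along the longest side of the frame and nothing is cut off.

```python
def crop_box(image_size, bbox, margin=0.05):
    """Return (left, top, right, bottom) of a 4:3 or 3:4 crop around bbox.

    image_size: (width, height) of the original photo in pixels.
    bbox: (left, top, right, bottom) of the specimen in pixels.
    margin: extra space on each side, as a fraction of the specimen size.
    """
    img_w, img_h = image_size
    left, top, right, bottom = bbox
    w = (right - left) * (1 + 2 * margin)
    h = (bottom - top) * (1 + 2 * margin)

    # Landscape for wide specimens, portrait for tall ones.
    ratio_w, ratio_h = (4, 3) if w >= h else (3, 4)

    # Enlarge the shorter dimension until the box has the target ratio.
    if w * ratio_h < h * ratio_w:
        w = h * ratio_w / ratio_h
    else:
        h = w * ratio_h / ratio_w

    if w > img_w or h > img_h:
        raise ValueError("specimen plus margin does not fit a 4:3 crop of this image")

    # Centre the crop on the specimen, then shift it back inside the image.
    cx, cy = (left + right) / 2, (top + bottom) / 2
    x0 = min(max(cx - w / 2, 0), img_w - w)
    y0 = min(max(cy - h / 2, 0), img_h - h)
    return (round(x0), round(y0), round(x0 + w), round(y0 + h))


print(crop_box((4000, 3000), (1000, 1000, 3000, 2000), margin=0.0))
```

The returned box can be passed to the crop function of whichever image editor or library is used. Centring on the specimen keeps the margins even on opposite sides wherever the image borders allow it.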


Figure 43. Examples of photographs of the same specimen (muskrat skull) taken with different aperture
settings showing effect on depth of field and diffraction. Photos were taken with a digital SLR camera using
a 60 mm macro lens. Left: image of entire skull, view from below; right: close-up of part of the image. A –
largest aperture, f2.8 (low depth of field, only lower part of the skull in focus; no diffraction); B – medium-
small aperture, f13 (medium-high depth of field, with most of the skull in focus; no visible diffraction); C –
smallest aperture, f32 (high depth of field, entire skull in focus; high diffraction). Image B would be optimal
quality for DNA barcoding.


Figure 44. Examples of unframed (A), incorrectly framed (B) and correctly framed (C-D) photos of the same
specimen; image sizes reduced proportionally to original. A: original photo as recorded by the camera –
excessively wide margins around the specimen; B: original image incorrectly cropped in 4:3 ratio (landscape
layout) – specimen legs cropped out; C: original image correctly cropped in 4:3 ratio (landscape layout) –
all parts of the specimen retained in the image (note that only part of the label is within frame, but it is
non-essential); D: original image correctly cropped in 4:3 ratio (portrait layout).

Figure 45. Example of an image thumbnail set: when all images are oriented in the same direction, it is
easier for the human eye to pick out the differences that are not related to the positioning of the specimens.


It is important that uniform image orientation (landscape vs. portrait) and object position
(orientation and aspect) are maintained throughout the entire batch of morphologically similar
organisms, in order to allow batch comparisons of images.

Choosing image background

There are no specific guidelines for background colour to be used in DNA barcode e-voucher
images; however, it is important to ensure that the background is uniform in colour and pattern
and does not obscure the outline of the object. Typically, white or black background is
recommended for most cases. A sheet of white paper or cloth (preferably, with an off-camera flash
mounted behind it) or black velvet could be used to attain the desired outcome. For best results,
the background should be far enough behind the object to be out of the camera’s focus. Note that
most automatic camera settings will try to attain a ‘center-weighted average’ image exposure,
meaning that the camera’s built-in exposure metering system will try to compensate for “over-
exposed” white background or “under-exposed” black background. To avoid this, we recommend
using manual shutter and aperture settings and ‘spot metering’ exposure settings, and exposure
compensation if available.

Figure 46. Choosing image background: comparison of the appearance of the same specimen on dark (A)
and white (B) background. When the specimen is photographed against black background, it often creates
a better perception of its surface texture; it also helps to pick up fine details of the outline, even if parts of
the specimen are dark. White background provides better representation of translucent parts of the
specimen.

Some imaging manuals recommend using a blue or green background, which is thought to facilitate
digital ‘cropping’ of the object from the background. We do not support this practice for several reasons:

• Many non-human biological objects naturally contain blue or green colours, thereby
complicating the digital cropping procedure;
• Intensely coloured background may skew the camera’s colour temperature perception or
exposure metering, or even alter the registered colouration of the specimen;
• The procedure adds an unnecessary step in the image post-processing that is not warranted
as part of the routine DNA barcoding workflow.


Imaging dry vs. fluid-preserved specimens

Imaging dry mounted specimens (e.g., pinned insects) allows more versatility in studio setup
(horizontal vs vertical); although fluid-preserved specimens may be temporarily removed from the
fixative, it is logistically preferred to use a vertical setup where the camera is mounted vertically
above the vessel where the object is placed. A similar vertical setup could be used for larger dry
specimens. When small batches of specimens are imaged at a time, the vertical camera can be
held over the specimen by the photographer, who should be mindful of the strain risks associated
with poor ergonomics of this setup.

Processing software

In choosing the software to post-process the images and prepare them for submission, it is important
to consider the need for previewing and processing large batches (at least 95 images), including
basic image adjustment, cropping, batch resize and batch rename. Despite the considerable
variety of image editing software, relatively few programs offer this specialized functionality.
FastStone Image Viewer (footnote 43) has the required features and is offered as freeware for personal and
educational use.

43
http://www.faststone.org/
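
As an illustration of the batch rename step mentioned above, the sketch below pairs camera files with specimen numbers before any file is touched. It assumes that camera file names sort in shooting order and that specimens were photographed in the same order as the list of specimen numbers; the function name `rename_plan`, the file names and the naming pattern are hypothetical and not prescribed by this manual.

```python
from pathlib import PurePath


def rename_plan(camera_files, sample_ids):
    """Pair each camera file with a new name based on the specimen number.

    camera_files: file names as written by the camera (sorted here by name).
    sample_ids: specimen numbers in the order the specimens were imaged.
    Returns a list of (old_name, new_name) tuples; nothing is renamed on disk.
    """
    if len(camera_files) != len(sample_ids):
        raise ValueError("number of images and number of specimens differ")
    plan = []
    for old_name, sample_id in zip(sorted(camera_files), sample_ids):
        suffix = PurePath(old_name).suffix.lower()
        plan.append((old_name, f"{sample_id}{suffix}"))
    return plan


for old, new in rename_plan(["IMG_0002.JPG", "IMG_0001.JPG"], ["SPEC-001", "SPEC-002"]):
    print(f"{old} -> {new}")
```

Reviewing the plan before renaming makes it easier to catch a skipped or repeated photograph, which would otherwise shift every subsequent image onto the wrong specimen.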


ANNEX V: Tissue Sampling


Practical tips for tissue sub-sampling
Sub-sampling is defined as the process of extracting or removing a small piece or aliquot of
tissue from the main tissue sample for the purpose of consumptive analysis. The instructions
included here are related to batch processing (96-well microplates) which requires sample
arraying prior to sub-sampling. When whole specimens are used as the tissue source and when
they can be easily arrayed (e.g., insects and other small arthropods), the term ‘sub-sampling’
could refer to sampling of these whole specimens.

Using standard containers for sub-sampling


Types of sampling containers are determined by the specifics of the DNA extraction procedure
and the type of equipment used. In a medium-throughput setting, 96-well plates are used, instead
of individual tubes. Typically, 96-well microplates are used for animal tissue. For plants,
specialized tube racks are used instead where tubes are aggregated in 12 strips of 8 tubes. This
arrangement determines the order in which the tissues are sampled: A) A01 to H11
in a microplate for animals; B) H01 to B12 in a tube rack for plants and fungal fruiting bodies
(see below; red circles are negative control wells).
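
The microplate order for animal tissue (row by row, A01 to H11, with H12 left empty as the negative control) can be generated as a checklist. This is a minimal sketch covering only the row-wise microplate order; the function name `sampling_order` is hypothetical, and the order for plant tube racks is not modelled here.

```python
ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def sampling_order(negative_control="H12"):
    """Return microplate wells in row-wise sampling order, skipping the control well."""
    wells = [f"{row}{col:02d}" for row in ROWS for col in COLUMNS]
    return [well for well in wells if well != negative_control]


order = sampling_order()
print(len(order), order[0], order[11], order[12], order[-1])
```

The resulting list contains 95 wells, which matches the number of specimens, and therefore images, per plate.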

Recommendations for tissue sizes and other notes


Tissue sizes

• Small insect: whole leg, antenna (ca. 5 mm length)
• Large insect: tibia or femur only (ca. 2 mm length)
• Vertebrate/invertebrate: muscle (ca. 1 mm³ volume or 1 mm diameter)
• 2-dimensional tissue: skin/body wall (ca. 2–4 mm diameter)
• Minute invertebrate: whole specimen (< 3 mm length)

Notes

• Only leave one row of wells uncovered at a time.
• You can visualize well contents (e.g. to evaluate the correct amount of tissue sampled) by
examining the plate from below.
• Do not place any foreign objects (e.g. labels) into sampling wells.
• Do not place excessive tissue into the sampling wells - this may inhibit DNA extraction and
PCR. If the sample exceeds the recommended dimensions, subdivide it into fragments to
obtain the right amount.
• Avoid sampling from body parts containing scales, hairs, or bristles, when possible.
• Avoid sampling from digestive tracts or from areas which may have been in contact with
digestive tract contents or other contaminants.

DNA-rich vs. DNA-poor tissue


Tissue samples can be broadly categorized into ‘DNA-rich’ and ‘DNA-poor’ by the amount of DNA
that can be extracted from a typical sample.

• DNA-rich tissues (e.g., vertebrate muscle) may be prone to cross-contamination and
therefore require thorough decontamination (chemical sterilization) between sampling
rounds.
• DNA-poor tissues (e.g., insect legs or plant leaves) are somewhat less prone to carry-over
between sampling rounds but are more sensitive to residues of chemicals such as
bleach or ELIMINase; flame sterilization is used instead when logistically permissible.

NOTE: Flame sterilization, although not recommended, can be used for DNA-rich tissues when
chemical sterilization is not available.
Supplies required to sample DNA-rich (A) and DNA-poor (B) tissue:

Specimens with DNA-rich tissue (e.g. vertebrates)

Required supplies

• ~30mL of ELIMINase in one (2oz) jar
• 3 (4oz) jars with increasing amounts of distilled water: wash 1 (~50mL), wash 2 (~70mL),
and wash 3 (~90mL)
• Forceps (smooth-tipped, not ribbed)
• Microplate with sampling wells pre-filled with 30µL of 95-100% ethanol and covered with 12-cap strips
• Tube rack for placing isolated tissue tubes destined for subsampling
• Gloves
• Kimwipes or other sterile paper tissue

Sampling procedures

1. Clean workstation (steps: ELIMINase, water, ethanol; see footnote 44) and set up ELIMINase wash station
(1 jar of ELIMINase, 3 jars of distilled water). Change gloves and lay out Kimwipes to work
on.
2. Position the plate on a flat surface with the plate label facing towards you. The column
markers (1–12) should be at the top and the row markers (A–H) should be on the left side.
3. Remove first row cap strip, without touching the part of the cap that goes in the well, and
cover with Kimwipe until row completed (to reduce contamination).
4. Sterilize forceps by rinsing in ELIMINase, wiping with Kimwipe, and then rinsing in each of
the 3 jars of distilled water (wash 1 through to wash 3) and wiping with a different Kimwipe.
5. Remove piece of tissue from corresponding specimen and place in first well (A01) of
microplate.
6. Repeat steps 4 and 5 for each well of the first row, proceeding in alphanumerical order to
A12 (left to right).
7. After completing the row, replace the cap strip and seal it firmly.
8. Remove next row strip and repeat steps 4-7 for all rows.
9. When sampling into the last row (Row H), remember to leave the last well (H12) empty (as
negative control during molecular procedures).
10. Once the plate is filled with samples, ensure that all caps are pressed firmly into the wells
to prevent ethanol evaporation.
11. Store microplate in the fridge/freezer until processing. Large delays (e.g., years) between
tissue sampling and molecular processing might affect the PCR success.

Specimens with DNA-poor tissue (e.g. arthropods)

Required supplies

• ~30mL of ethanol in jar or beaker
• Propane burner or ethanol burner (filled with ethanol)
• Lighter
• Forceps (smooth-tipped, not ribbed)
• Microplate with sampling wells pre-filled with 30µL of 95-100% ethanol and covered with 12-cap strips
• Gloves
• Kimwipes or other sterile paper tissue

44
Note that the decontamination protocol is different for ELIMINase and for bleach: bleach can be inactivated
by ethanol; therefore, after treating the tools with bleach, they should be placed in concentrated ethanol
rather than distilled water. If desired, a subsequent wash in distilled water may follow.

Sampling procedures

1. Clean workstation (steps: ELIMINase, water, ethanol). Change gloves and lay out Kimwipes
to work on.
2. Light the flame source:
o If using a propane burner, turn knob slightly to release small, steady stream of gas.
Use the lighter to light the propane burner, producing a small flame (2-3 cm).
o If using an ethanol burner, ensure it is filled with ethanol and light it with the lighter.
NOTE: Never leave the flame unattended. Ensure that you turn off the propane burner or
smother the ethanol burner if you leave your station.
3. Position the plate on a flat surface with the plate label facing towards you. The column
markers (1–12) should be at the top and the row markers (A–H) should be on the left side.
4. Remove first row cap strip, without touching the part of the cap that goes in the well, and
cover with Kimwipe until row completed (to reduce contamination).
5. Sterilize forceps by dipping in the jar containing ethanol and then flaming the forceps in the
burner. Do not hold forceps above flame for more than 1 second – wave forceps tips once
over the flame and let the ethanol burn off. Be careful not to drip burning ethanol on your
work surface and keep forceps away from the jar with rinsing ethanol.
6. Remove piece of tissue from corresponding specimen and place in first well (A01) of
microplate. For insects, we recommend sampling the middle leg on the right side.
7. Repeat steps 5 and 6 for each well of the first row, proceeding in alphanumerical order to
A12 (left to right).
8. After completing the row, replace the cap strip and seal it firmly.
9. Remove next row strip and repeat steps 5-8 for all rows.
10. When sampling into the last row (Row H), remember to leave the last well (H12) empty (as
negative control during molecular procedures).
11. Once the plate is filled with samples, ensure that all caps are pressed firmly into the wells
to prevent ethanol evaporation.
12. Store microplate in the fridge/freezer until processing. Large delays (e.g., years) between
tissue sampling and molecular processing might affect the PCR success.

Plants and fungi should be tissue-sampled following the procedure for arthropods.


ANNEX VI: Medium/High-throughput Lab


Consumables and equipment45
| Description | Abbreviation | Supplier |
|---|---|---|
| ABGene 12-Strip Flat PCR caps | Cap strips | Fisher Scientific™ |
| AcroPrep™ 1.0 μm Glass Fiber Plate | PALL1 | PALL |
| AcroPrep™ 3.0 μm Glass Fiber Plate | PALL2 | PALL |
| Air Clean PCR workstation | Flow hood | AirClean® Systems |
| Allegra™ 25R Refrigerated Benchtop Centrifuge | Centrifuge (plate) | Beckman Coulter |
| Aluminum Foil, Seal & Sample | Aluminum seal | Beckman Coulter |
| Axygen® 96-Well PCR Plates | Sequencing plates | Axygen Scientific |
| Axygen™ 8-strip minitube caps | Plant strip caps | Fisher Scientific™ |
| Axygen™ Mini Tube System | Plant box | Fisher Scientific™ |
| Axygen™ Platemax Semi-automated Plate Sealer | Heat Sealer | Fisher Scientific™ |
| BioClean LTS Tips: 20μL, 250μL, 1000μL | Tips | Rainin |
| E-Gel™ High-Throughput DNA Electrophoresis System | E-Gel | Invitrogen™ |
| E-Gel™ Imager | | Invitrogen™ |
| Eppendorf™ 96-well microplates | Microplate | Fisher Scientific™ |
| Eppendorf™ Mastercycler™ PCR System | Thermocycler | Fisher Scientific™ |
| Falcon™ Centrifuge Tubes (15 or 50mL) | Tubes | Fisher Scientific™ |
| Fixed Speed Vortex Mixer | Vortex | Fisher Scientific™ |
| Heat Sealing Film | Heat seal | Axygen Scientific |
| Isotemp Incubator | Incubator | Fisher Scientific™ |
| Kimwipes™ (1-ply) | Kimwipes | Kimberly-Clark™ |
| Laboratory refrigerators/freezers | | Fisher Scientific™ |
| Microcentrifuge tubes (1.5 or 2mL) | Tubes | Thermo Scientific™ |
| Mini microcentrifuge (8-place rotor) | Microcentrifuge | Corning® |
| Powder-free Nitrile Exam Gloves | Gloves | Fisher Scientific™ |
| PP Masterblock, 96 Well, 2mL | Square-well block | Greiner Bio-One |
| Racks for tubes and microtubes | Racks | Thermo Scientific™ |
| Rainin Pipet-Lite XLS LTS: single, 8/12-channel | Manual pipettes | Rainin |
| Reagent Reservoirs (100mL) | Reservoir | Thermo Scientific™ |
| SBS Receiver Plate Collar | PALL collar | PALL |
| Secura® Laboratory Balance | Balance | Sartorius |
| Soft-Rubber Plate Roller | Roller | Fisher Scientific™ |
| Stainless steel beads, 3.17 mm | | Montreal Biotech |
| Thermo Scientific Matrix 1250μL Tall Filter tips (102 mm) | Talltips | Thermo Scientific™ |
| Thermo Scientific Matrix 15-1250µL EXP Pipette 8-channel | Electronic pipettes | Thermo Scientific™ |
| TissueLyser | TissueLyser | Qiagen |
| Whatman™ Microplate Uniseal™ | Clear seal | Fisher Scientific™ |

45
This list of equipment and suppliers is based on companies available in Canada.

Pipettes
Tissue lysis and DNA extraction

• Electronic 8-channel, 50-1200µL (1250µL tips)
• Manual 8-channel, 20-200µL (200µL tips)
• Manual 12-channel, 20-200µL (200µL tips)
• Manual single channel, 100-1000µL (1250µL tips)

DNA-free station for PCR and sequencing mix

• Manual single channel, 100-1000µL (1250µL tips)
• Manual single channel, 20-200µL (200µL tips)

PCR hood

• Manual 8-channel, 5-50µL (200µL tips)
• Manual 12-channel, 0.5-10µL (10µL tips)
• Manual single channel, 20-200µL (200µL tips)
• Manual single channel, 2-20µL (20µL tips)

E-Gel station

• Manual 12-channel, 10-100µL (200µL tips)
• Manual 12-channel, 0.5-10µL (10µL tips)

Sequencing hood

• Manual 12-channel, 0.5-10µL (10µL tips)
• Manual 12-channel, 20-200µL (200µL tips)
• Manual single channel, 20-200µL (200µL tips)

Laboratory Information Management System (LIMS)


Central labs, with diverse tasks assigned to multiple units, face specific challenges of action
coordination, such as: tracking individual samples throughout the analytical process, their
sequencing status, and the interpretation of resultant data. This requires the development and
employment of automated tools that could track parallel lab events, reduce work-flow errors, and
minimize dependency on manual lab-books. To address the need of tracking lab operations, most
modern labs adopt some sort of tracking system, such as Laboratory Information Management
Systems (LIMS), Laboratory Information System (LIS) or Laboratory Management System (LMS).


The workflow of a standard DNA barcoding lab is compartmentalized into units that are independent
in function but linked in purpose. A computer-operated LIMS links the working units by providing
current information on the action status of each task at each functional unit and enabling
sample tracking in real time. Documenting and tracking multiple samples in a high-throughput
barcoding workflow is almost impossible without employing a LIMS or a comparable management
tool.
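
To illustrate the kind of record keeping a LIMS automates, the following toy sketch tracks each sample through a fixed sequence of stages and reports the status of a plate. It is not the CCDB LIMS and does not reflect its data model; the stage names, class and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical stage names; a real LIMS defines its own.
STAGES = ["sampled", "lysis", "extraction", "pcr", "sequencing", "done"]


@dataclass
class Sample:
    sample_id: str
    plate: str
    well: str
    history: list = field(default_factory=lambda: [STAGES[0]])

    @property
    def stage(self):
        """Current stage is the last entry of the history."""
        return self.history[-1]

    def advance(self):
        """Move the sample to the next stage, keeping the full history."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError(f"{self.sample_id} has already completed the workflow")
        self.history.append(STAGES[position + 1])


def plate_status(samples, plate):
    """Count how many samples of one plate are at each stage."""
    return Counter(s.stage for s in samples if s.plate == plate)


samples = [
    Sample(f"SPEC-{i:03d}", "PLATE-1", well)
    for i, well in enumerate(["A01", "A02", "A03"], start=1)
]
samples[0].advance()
samples[0].advance()
samples[1].advance()
print(plate_status(samples, "PLATE-1"))
```

Even this simplified model shows why manual lab-books scale poorly: with 95 samples per plate and several plates in progress at different stations, the status summary is only reliable if every transition is recorded at the moment it happens.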

Figure 47. Screenshot of LIMS used at the Canadian Centre for DNA Barcoding and linked to
BOLD.


ANNEX VII: Reagents for DNA barcoding46


DNA extraction reagents
List of reagents
| Description | Abbreviation | Supplier & Catalogue # |
|---|---|---|
| Cetyl trimethylammonium bromide | CTAB | Sigma® H9151-250G |
| Disodium ethylenediamine tetraacetate•2H2O | EDTA | Fisher Scientific® S311-500 |
| ELIMINase® | ELIMINase | Decon Labs Inc.™ 1102 |
| Ethyl alcohol (anhydrous) | EtOH 96% | Commercial Alcohols Inc. 472-06-02 |
| Glycerol | Glycerol | Sigma® G5516-500ML |
| Guanidine thiocyanate | GuSCN | Sigma® G9277-500g |
| Molecular biology grade water | ddH2O | HyClone® SH30538.02 |
| Polyethylene glycol sorbitan monolaurate | Tween-20 | Fluka® 93773 |
| Proteinase K | Proteinase K | Promega® V3021 |
| Sodium chloride | NaCl | Fisher Scientific® S271-3 |
| Sodium dodecyl sulfate | SDS | Fisher Scientific BP166-500 |
| Sodium hydroxide | NaOH | Fisher Scientific® S318-3 |
| t-Octylphenoxypolyethoxyethanol | Triton X-100 | Sigma® T8787-100ML |
| Tris(hydroxymethyl)aminomethane | Trizma base | Sigma® T6066-100g |
| Tris(hydroxymethyl)aminomethane hydrochloride | Trizma HCl | Sigma® T5941-100g |

Stock solutions
| Description | Reagents | Weight or volume | Final volume |
|---|---|---|---|
| 1M Tris-HCl, pH 8.0 | Trizma base | 26.5g | 500mL |
| | Trizma HCl | 44.4g | |
| 1M Tris-HCl, pH 7.4 | Trizma base | 9.7g | 500mL |
| | Trizma HCl | 66.1g | |
| 0.1M Tris-HCl, pH 6.4 | Trizma base (adjust pH with HCl to 6.4-6.5) | 6.06g | 500mL |
| 1M NaCl | NaCl | 29.22g | 500mL |
| 0.5M EDTA, pH 8.0 | EDTA | 186.1g | 1000mL |
| | NaOH | ~20.0g | |
| Proteinase K | Proteinase K | 1g | 50mL |
| | 1M Tris-HCl, pH 7.4 | 0.5mL | |
| | ddH2O | ~25mL | |
| | Glycerol (50% v/v final) | 25mL | |

Note (0.5M EDTA): Vigorously mix on magnetic stirrer with heater. The disodium salt of EDTA will not go into solution
until the pH of the solution is adjusted to ~8.0 by the addition of NaOH.
Tip: give a brief rinse to NaOH granules with ddH2O in a separate glass before dissolving them.

Note (Proteinase K): To a vial of Proteinase K (1g), add 0.5mL 1M Tris-HCl (pH 7.4) and ~15mL of ddH2O. Close vial and
mix gently by rotation until dissolved (do not shake). Transfer to a 50mL tube and add enough ddH2O to
achieve a total volume of 25mL. Add 25mL glycerol (50% glycerol v/v). Mix gently by rotation (do not
shake). Aliquot by 2mL into tubes and store at -20°C (glycerol prevents freezing and protects enzyme).

46
Based on CCDB protocols (http://ccdb.ca/resources/).
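
The weights listed for the single-component stock solutions follow from mass = molarity × volume × molar mass. The sketch below reproduces three of them as a worked check. The molar masses are standard textbook values that are not given in this manual, and the function name `grams_needed` is hypothetical.

```python
# Standard molar masses in g/mol (textbook values, not taken from this manual).
MOLAR_MASS = {
    "NaCl": 58.44,
    "Trizma base": 121.14,
    "EDTA disodium salt dihydrate": 372.24,
}


def grams_needed(molarity, volume_ml, molar_mass):
    """Mass (g) of solute for a solution of given molarity (mol/L) and volume (mL)."""
    return molarity * (volume_ml / 1000) * molar_mass


recipes = [
    ("NaCl", 1.0, 500),                          # 1M NaCl, 500mL
    ("EDTA disodium salt dihydrate", 0.5, 1000),  # 0.5M EDTA, 1000mL
    ("Trizma base", 0.1, 500),                    # 0.1M Tris-HCl, 500mL
]
for name, molarity, volume_ml in recipes:
    grams = grams_needed(molarity, volume_ml, MOLAR_MASS[name])
    print(f"{name}: {grams:.2f} g for {volume_ml} mL at {molarity} M")
```

The same function can be used to scale a recipe when a different final volume is needed, for example half a litre of 0.5M EDTA.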

Working solutions47
| Buffer | Final composition | Component | Volume from stock solution (mL) or weight (g) | Final volume |
|---|---|---|---|---|
| 2×CTAB | 2% CTAB | CTAB | 4.0g | 200mL |
| | 100mM Tris-HCl, pH 8.0 | 1M Tris-HCl, pH 8.0 | 20mL | |
| | 20mM EDTA, pH 8.0 | 0.5M EDTA, pH 8.0 | 8mL | |
| | 1.4M NaCl | NaCl | 16.4g | |
| Vertebrate Lysis Buffer | 100mM NaCl | 1M NaCl | 20mL | 200mL |
| | 50mM Tris-HCl, pH 8.0 | 1M Tris-HCl, pH 8.0 | 10mL | |
| | 10mM EDTA, pH 8.0 | 0.5M EDTA, pH 8.0 | 4mL | |
| | 0.5% SDS | SDS | 1.0g | |
| Insect Lysis Buffer | 700mM GuSCN | GuSCN | 16.5g | 200mL |
| | 30mM EDTA, pH 8.0 | 0.5M EDTA, pH 8.0 | 12mL | |
| | 30mM Tris-HCl, pH 8.0 | 1M Tris-HCl, pH 8.0 | 6mL | |
| | 0.5% Triton X-100 | Triton X-100 | 1mL | |
| | 5% Tween-20 | Tween-20 | 10mL | |
| Binding Buffer | 6M GuSCN | GuSCN | 354.6g | 500mL |
| | 20mM EDTA, pH 8.0 | 0.5M EDTA, pH 8.0 | 20mL | |
| | 10mM Tris-HCl, pH 6.4 | 0.1M Tris-HCl, pH 6.4 | 50mL | |
| | 4% Triton X-100 | Triton X-100 | 20mL | |
| Plant Binding Buffer | | Binding buffer | 80mL | 96mL |
| | | ddH2O | 16mL | |
| Protein Wash Buffer | | Binding buffer | 26mL | 100mL |
| | | EtOH 96% | 70mL | |
| Wash Buffer | 60% EtOH | EtOH 96% | 300mL | 475mL |
| | 50mM NaCl | 1M NaCl | 23.75mL | |
| | 10mM Tris-HCl, pH 7.4 | 1M Tris-HCl, pH 7.4 | 4.75mL | |
| | 0.5mM EDTA, pH 8.0 | 0.5M EDTA, pH 8.0 | 0.475mL | |
| Binding Mix | | Binding buffer | 50mL | |
| | | EtOH 96% | 50mL | |
| Elution buffer | 10mM Tris-HCl, pH 8.0 | | | |

Notes:
• Insect Lysis Buffer: vigorously mix on magnetic stirrer with heater.
• Binding Buffer: vigorously mix on magnetic stirrer with heater. If any re-crystallization occurs, pre-warm at 56°C to
dissolve before use. Stable at room temperature for 1 week.
• Protein Wash Buffer: stable at room temperature for ~1 week; discard if any crystallization occurs.
• Wash Buffer: mix well, store at -20°C.
• Binding Mix: stable at room temperature for 1 week.
• Elution buffer: store at 4°C.

47
Weigh the dry components (e.g. SDS or GuSCN) first, then add required volumes of the stock solution,
and fill up with the molecular grade ddH2O to the final volume. No filtering is required.
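
The stock volumes in the working solutions table follow the usual dilution relationship C1 × V1 = C2 × V2. The sketch below recomputes a few of them as a consistency check; the function name `stock_volume_ml` is hypothetical, and both concentrations must be given in the same unit (mM here).

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (mL) needed so that C1 * V1 = C2 * V2.

    final_conc and stock_conc must use the same unit (e.g. both in mM).
    """
    return final_conc * final_volume_ml / stock_conc


# Vertebrate Lysis Buffer, 200mL final volume.
print(stock_volume_ml(100, 1000, 200))  # 100mM NaCl from 1M NaCl
print(stock_volume_ml(50, 1000, 200))   # 50mM Tris-HCl from 1M Tris-HCl
print(stock_volume_ml(10, 500, 200))    # 10mM EDTA from 0.5M EDTA

# Wash Buffer, 475mL final volume.
print(stock_volume_ml(50, 1000, 475))   # 50mM NaCl from 1M NaCl
```

Running the same calculation for every row before preparing a buffer is a quick way to catch transcription errors in a recipe.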


PCR and sequencing reagents


| Reagents | Supplier | Components | Quantity | Notes |
|---|---|---|---|---|
| 10% trehalose | Sigma, 90210 | D-(+)-trehalose dihydrate | 5g | Dissolve trehalose in 50mL total volume ddH2O. Store at -20°C. |
| | | ddH2O | ~50mL | |
| Dimethyl sulfoxide (DMSO) | Fisher Scientific® | | | |
| 100µM primer stock | Invitrogen™ | Desiccated primer | | Store at -20°C. |
| | | ddH2O | number of nmol × 10µL | |
| 10µM primer working solution | | 100µM primer stock | 20µL | Store at -20°C. |
| | | ddH2O | 180µL | |
| 50mM MgCl2 | Invitrogen™ | | | Store at -20°C in 1mL aliquots. |
| 10mM dNTP mix | New England Biolabs® | | | Store at -20°C in 100µL aliquots. |
| 10x PCR buffer for Platinum Taq | Invitrogen™ | | | Store at -20°C. |
| Platinum Taq polymerase | Invitrogen™ | | | Store at -20°C in 50µL aliquots. |
| BigDye Terminator mix v3.1 | Life Technologies | | | Store at -20°C. |
| 5X sequencing buffer | Life Technologies | | | Store at -20°C. |
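
The primer volumes in the table can be derived rather than memorized: dissolving n nmol of desiccated primer in n × 10µL of water gives 100µM, and a 1:10 dilution of that stock gives the 10µM working solution. The following sketch shows the arithmetic; the function names and the 25.3 nmol example are hypothetical.

```python
def stock_water_ul(primer_nmol, target_um=100):
    """Water (µL) to add to a desiccated primer to reach target_um (µM)."""
    # nmol divided by µmol/L gives mL; multiply by 1000 to get µL.
    return primer_nmol * 1000 / target_um


def working_dilution(final_ul, stock_um=100, working_um=10):
    """Return (stock µL, water µL) for a working solution of final_ul µL."""
    stock_ul = final_ul * working_um / stock_um
    return stock_ul, final_ul - stock_ul


print(stock_water_ul(25.3))   # a tube labelled 25.3 nmol
print(working_dilution(200))  # 200µL of 10µM working solution
```

The amount of primer in nmol is normally printed on the tube label or the synthesis report supplied with the primer.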

Note:

• Thoroughly wash labware with ELIMINase, rinse with dH2O. Weigh reagents using a clean
spatula, fill up with the molecular grade ddH2O to the final volume. Filter buffers through a 0.2
μm filter into a clean bottle; make smaller volume working aliquots (e.g. 100mL). Store stock
solutions and working aliquots at 4ºC.


ANNEX VIII: Barcoding Protocols (96-well microplates)48

DNA Barcoding – Animals


Tissue lysis
DNA extraction is based on a glass-fiber protocol in which one of the differentiating factors is the
buffer used for tissue lysis:

• CTAB – Echinoderms, mollusks (taxa with large quantities of polysaccharides in their tissues)
• Non-CTAB – The remaining taxonomic groups
o Vertebrate Lysis Buffer – Vertebrates
o Insect Lysis Buffer – Invertebrates (except taxa requiring CTAB)
Reagents, equipment and disposables required for lysis of 1 plate:
Item Quantity Notes
Full-skirted 96-well microplate 1 Colour: clear (“tissue/lysis plate”)
12-cap strips 8
Reagent reservoir 1 Or a fresh and sterile tip-box
Lysis buffer (Vertebrate/Insect/CTAB) 5mL Or 6mL to ensure sufficient quantity
for all the wells.
Proteinase K 0.5mL
LIMS sticker 1
Pipettes and tips See Appendix C
Incubator Temperature: 56°C
Plate centrifuge Speed: 1000×g
Gloves
Kimwipes
Permanent marker
Prepare the DNA extraction workstation by a thorough cleaning of the surface: clean with
ELIMINase, then distilled water, wipe it with Kimwipe until dry, and finally wipe the surface with
ethanol. Change gloves after cleaning.
1. Turn on incubator and set temperature to 56°C.
2. Retrieve tissue plate from temporary storage (e.g., fridge, freezer, shelf) and label with LIMS
sticker if needed.
3. To evaporate ethanol from tissue plate prior to lysis:
• Centrifuge (spin) sealed plate briefly.
• Visually inspect plate to ensure that samples are at the bottom of wells.
• Remove caps carefully (and dispose directly into garbage), change gloves, and place the
plate in incubator at 56°C for about 45-60 minutes (or until ethanol is fully evaporated),
depending on the volume of ethanol. Check periodically. Do not over-dry.
48
Based on CCDB protocols (http://ccdb.ca/resources/).


4. Mix 5mL lysis buffer (vertebrate/insect/CTAB) and 0.5mL Proteinase K in a sterile reagent
reservoir. Ensure that buffer and Proteinase K are well-mixed throughout while trying to avoid
the production of bubbles in the mixture.
5. Add 50µL of lysis mix49 to each well of tissue plate. Place sterile 12-cap strips over each row.
*Fill 600µL, dispense 50µL 50. Use same tips for entire plate.
6. Ensure that microplate has a LIMS sticker and is labelled with initials, date of lysis, and the word
“lysis”.
7. Spin microplate at 1000×g for 30 seconds.
8. Incubate at 56°C for 12-16 hours (overnight)51.
9. Record lysis step on LIMS.
10. Clean workstation and dispose of waste.
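
The volumes in steps 4 and 5 can be checked with a little arithmetic: 96 wells x 50µL uses 4.8mL of the 5.5mL of lysis mix prepared. The sketch below (names are illustrative) also scales the 10:1 lysis buffer to Proteinase K ratio to other per-well volumes, such as the 100µL mentioned for large tissue samples; the 10% surplus used in the example is an assumption, not a value from the protocol:

```python
WELLS = 96


def lysis_mix_ul(per_well_ul: float, wells: int = WELLS, surplus_fraction: float = 0.0):
    """Return (lysis_buffer_ul, proteinase_k_ul) for a 10:1 buffer:Proteinase K mix."""
    total_ul = wells * per_well_ul * (1.0 + surplus_fraction)
    proteinase_k_ul = total_ul / 11.0  # 1 part enzyme in 11 parts of mix
    return total_ul - proteinase_k_ul, proteinase_k_ul


dispensed_ul = WELLS * 50                 # volume actually dispensed in step 5
prepared_ul = 5000 + 500                  # 5mL lysis buffer + 0.5mL Proteinase K (step 4)
surplus_ul = prepared_ul - dispensed_ul   # left in the reservoir as pipetting reserve

buffer_ul, pk_ul = lysis_mix_ul(100, surplus_fraction=0.10)  # assumed 10% reserve

print(f"Dispensed {dispensed_ul} µL of {prepared_ul} µL; {surplus_ul} µL spare")
print(f"100 µL/well: {buffer_ul:.0f} µL buffer + {pk_ul:.0f} µL Proteinase K")
```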

DNA extraction
This DNA extraction method is silica-based and involves DNA binding to a glass fiber membrane.
Reagents, equipment and disposables required for processing 1 plate:
Item Quantity Notes
Full-skirted 96-well microplate 1 Colour: blue (“DNA plate”)
DNA extraction glass fiber plate (GF)52 1 Vertebrates: 1.0µm
Invertebrates: 3.0µm
Square well block53 1 Re-used (clean after each use)
Reagent reservoir 4 1 reservoir/buffer
Binding mix (BM) 9.6mL Exact quantity; add more to reservoir
Protein wash buffer (PWB) 17.28mL Exact quantity; add more to reservoir
Wash buffer (WB) 72mL Exact quantity; add more to reservoir
Elution buffer (EB)54 3-6mL Exact quantity; add more to reservoir
LIMS sticker 1
Clear seal 4
Aluminum seal 1
Pipettes and tips
Incubator Temperature: 56°C
Plate centrifuge Speed: 1000×g, 5000×g
Plate roller
Gloves
Kimwipes
Permanent marker
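
The "exact quantity" values in this table equal the per-well volumes dispensed in the extraction steps of this protocol multiplied by 96 wells. A short Python check (a sketch; the dictionary keys simply reuse the buffer abbreviations from the table):

```python
WELLS = 96

# Per-well volumes (µL) dispensed in the extraction steps of this protocol.
PER_WELL_UL = {"BM": 100, "PWB": 180, "WB": 750, "EB": 40}

exact_ml = {name: WELLS * ul / 1000 for name, ul in PER_WELL_UL.items()}

# EB may vary between 30 and 60 µL per well depending on tissue size.
eb_range_ml = (WELLS * 30 / 1000, WELLS * 60 / 1000)

for name, ml in exact_ml.items():
    print(f"{name}: {ml:.2f} mL")
print(f"EB range: {eb_range_ml[0]:.2f}-{eb_range_ml[1]:.2f} mL")
```

The results (9.6mL BM, 17.28mL PWB, 72mL WB) agree with the table, and the 30-60µL elution range corresponds to roughly 3-6mL of EB. Reservoirs need additional volume on top of these figures so that the last wells can still be filled.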

49
If tissue samples are large, 100µL of lysis mix can be added to each well to dilute DNA concentration and
allow for better results with PCR amplification. Ensure that only 50µL of this lysate is mixed with binding
mix during extraction.
50
Setting for Thermo Scientific Matrix 8-channel 50-1250µL electronic pipette.
51
Lysis time can increase to 24 hours if considered necessary.
52
PALL has discontinued the PALL glass fibre plate types that are used at CCDB (types 5051 and 5053),
and replacement plates can only be used in the centrifuge, not in a robotic setup.
53
Square well blocks can be cleaned with ELIMINase, rinsed, dried and reused.
54
Elution can be performed with ddH2O. However, EB stabilizes DNA for long-term storage.


1. Remove lysis plate from incubator and centrifuge at 1000×g for 30 seconds to remove any
condensation from cap strips. Carefully open cap strips one by one (dispose directly into
garbage), making sure that lysate does not splash into adjacent wells.
2. Label reservoirs for four extraction buffers.
3. Pour binding mix (BM) into first reservoir.
4. Add 100µL BM to each well, being careful not to touch wells with tips but hover above.
*Fill 1200µL, dispense 100µL. Use same tips for entire plate.
5. Retrieve the appropriate type of glass fiber plate (GF) (see table above) and label with initials,
original (tissue) plate number and date. Place on top of clean square well block.
6. Transfer 170-180µL lysate (aspirate all) from lysis plate to GF plate: before transferring, slowly
mix lysate up and down about 3 times in lysis plate wells before releasing into GF plate wells.
Cover with clear seal.
*Use manual multi-channel pipette. Change tips after each row.
7. Centrifuge at 5000×g for 5 minutes to bind DNA to GF membrane.

Left: AcroPrep PALL glass fiber (GF) plate. Be sure to label prior to use.
Right: Place GF plate on square well block for binding and washing steps.

8. Pour protein wash buffer (PWB) into second reservoir.


9. Discard clear seal. Add 180µL PWB to each well of GF plate. Cover with clear seal.
*Fill 1080µL, dispense 180µL (covers ½ plate). Repeat. Use same tips for entire plate.
10. Centrifuge at 5000×g for 3 minutes.
11. Pour wash buffer (WB) into third reservoir.
12. Discard clear seal. Add 750µL WB to each well of GF plate. Cover with clear seal.
*Fill 750µL, dispense 750µL. Use same tips for one plate.
13. Centrifuge at 5000×g for 3 minutes.
14. Pull seal back to uncover all wells, reposition it and spin again at 5000×g for 2 minutes to
avoid incomplete removal of WB.
15. Retrieve a blue 96-well plate for DNA collection and label with initials, date and the original
(tissue) plate number. Place a LIMS sticker on DNA plate.
16. Place GF plate on DNA plate and discard seal. Make sure both plates have same orientation
(A01 of GF plate is placed into A01 of DNA plate). Incubate at 56°C for 30 minutes to
evaporate residual ethanol.
17. Warm elution buffer (in a container such as a 15mL Falcon tube) in the incubator at the same
time as the ethanol evaporation. Make sure to turn off incubator after step is complete.
18. Dispose of extraction buffers (collected in square well block during centrifugation) into a
special container for hazardous waste (due to guanidine) and wash blocks (see footnote 53).
Change gloves.
19. Pour warmed elution buffer (EB) into fourth reservoir.


20. Dispense 40µL55 of EB directly onto membrane in each well of the GF plate and incubate at
room temperature for 1 minute. Ensure no plate flip happens. Cover with clear seal.
*Fill 500µL, dispense 40µL. Use same tips for entire plate.
21. Place GF plate + DNA plate combination on top of a square well block for centrifugation
(otherwise the DNA plate will crack).

Position GF plate on sterile blue microplate intended for pure DNA (elution step). Be sure to place both on
top of a square well block before centrifugation.

22. Centrifuge at 5000×g for 5 minutes to collect DNA.


23. Visually inspect DNA plate from the bottom to make sure that each well contains liquid. Cover
DNA plate with aluminum seal (use plate roller to seal it tightly) and place it in the appropriate
storage.
24. Store DNA at 4°C (short term) or -20°C (medium term). Archive DNA plates at -80°C.
25. Wrap GF plates in Kimwipes and store in plastic bags at 4°C (if necessary, they may be re-
eluted, but their shelf life is relatively short).
26. Document DNA extraction on LIMS.
27. Clean workstation and dispose of waste.

PCR amplification
The standard animal barcode region is COI-5P, which can be amplified with a range of primer
pairs, from ‘universal’ to very specific (see Appendix E for primer details).
Reagents, equipment and disposables required for processing 1 plate (PCR mix not
included):

Item Quantity Notes


Full-skirted 96-well microplate 1 Colour: yellow (PCR plate)
LIMS sticker 1
Clear seal 1
Aluminum seal 1
Heat seal 1
Microcentrifuge tube (1.5 or 2mL) 1
Pipettes and tips
Plate centrifuge Speed: 1000×g
Heat sealer Temperature: 167°C
Plate roller
Plate thermocycler
Gloves
Vortex
Kimwipes
Permanent marker

55
The quantity of EB can vary between 30-60µL/well depending on the tissue size (less EB for smaller
sizes).

The PCR workstation is usually a flow hood with UV light (intended for decontamination purposes).
Keep designated PCR pipettes inside the hood at all times. Clean PCR hood with ethanol before
starting to work. Change gloves after cleaning.
1. Retrieve required tips, seals, microplate (yellow), and tube racks intended for PCR, and place
in PCR hood. Press the UV light button on the PCR hood, which will turn UV light on for 15
minutes.
2. Retrieve DNA plate and centrifuge at 1000×g for 1 minute (if plate frozen, let it thaw a few
minutes at room temperature).
3. Prepare PCR mix56 for one plate by adding all reagents, except for DNA template, in a sterile
microcentrifuge tube (1.5 or 2mL tube).

PCR reaction: 12.5µL


Reagents57 1 reaction (µL) 1 plate (µL) 10 plates (µL)
10% trehalose for PCR 6.25 625 6250
ddH2O 2 200 2000
10X PCR buffer 1.25 125 1250
50mM MgCl2 0.625 62.5 625
10mM dNTPs 0.0625 6.25 62.5
10µM forward primer 0.125 12.5 125
10µM reverse primer 0.125 12.5 125
Taq polymerase (5 U/µL) 0.06 6 60
Total 10.5 1050 10500
DNA template 2µL per well
Note: Extra volume is considered for larger reactions (100 reactions/1 plate; 1000 reactions/10
plates) to allow for pipetting errors.

4. Label a sterile yellow 96-well microplate for PCR reaction with initials, date and original (tissue)
plate number. Place a LIMS sticker on the front of yellow plate. Turn on heat sealer.
5. Add mix to PCR plate:
• If using mix without primers: add 12.5µL forward primer and 12.5µL reverse primer into
PCR mix. Vortex lightly. Change tips between primers.
• Add 84µL PCR + primer mix to each well on row A (A01-A12).
• Distribute 10.5µL of mix from A01-A12 wells across the entire plate using a 12-channel
manual pipette. Use same tips for entire plate.
6. Cover with clear seal and centrifuge at 1000×g for 1 minute.
7. Discard clear seal and add 2µL DNA/well to PCR plate using a 12-channel manual pipette.
8. Cover with heat seal, with white side of seal facing upwards, and place in automated heat
sealer for 10 seconds. Seal plate top firmly with a plate roller.
Optional: Centrifuge the plate at 1000×g for 1 minute.
9. Place PCR plate in thermocycler and initiate the appropriate program.
56
PCR mix can be prepared ahead of time for one or multiple plates, with or without primers. Mixes will be
aliquoted in 1.5 or 2mL tubes (1 tube enough for one plate) and stored at -20°C.
57
See Appendix A for details on PCR reagents.


10. Cover DNA plate with aluminum film (use plate roller). Store plate at -20°C.
11. Clean space and dispose of waste.
12. Press UV light button on PCR hood (for a 15-minute decontamination).
13. Record PCR preparation and thermocycler program on LIMS.
14. Once thermocycler program ends, remove PCR plate and store at 4°C until E-Gel check.
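
The plate-scale volumes in the PCR mix table are the per-reaction volumes multiplied by 100 reactions (96 wells plus a small reserve). A minimal Python sketch of that scaling follows; the dictionary and function names are illustrative only:

```python
# Per-reaction volumes (µL) from the PCR mix table; DNA template (2 µL) is added per well later.
PER_REACTION_UL = {
    "10% trehalose": 6.25,
    "ddH2O": 2.0,
    "10X PCR buffer": 1.25,
    "50mM MgCl2": 0.625,
    "10mM dNTPs": 0.0625,
    "10µM forward primer": 0.125,
    "10µM reverse primer": 0.125,
    "Taq polymerase (5 U/µL)": 0.06,
}


def scale_mix(per_reaction_ul: dict, n_reactions: int) -> dict:
    """Multiply every component volume by the number of reactions."""
    return {name: vol * n_reactions for name, vol in per_reaction_ul.items()}


one_plate = scale_mix(PER_REACTION_UL, 100)       # 96 wells + reserve
per_reaction_total = sum(PER_REACTION_UL.values())
one_plate_total = sum(one_plate.values())

for name, vol in one_plate.items():
    print(f"{name:<26}{vol:>8.2f} µL")
print(f"{'Total':<26}{one_plate_total:>8.2f} µL")
```

The computed per-reaction total is 10.4975µL, which the table rounds to 10.5µL (and 1049.75µL to 1050µL per plate).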

Animal PCR thermocycler program

COI
Number of cycles   Block temperature   Hold time (mm:ss)
1                  94°C                01:00
x5                 94°C                00:40
                   45°C                00:40
                   72°C                01:00
x35                94°C                00:40
                   51°C                00:40
                   72°C                01:00
1                  72°C                05:00
Hold               4°C or 10°C         Until plate removed
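
For planning purposes, the COI program can be written down as data and its total hold time summed. The sketch below is illustrative (the `Stage` class and function name are not part of any thermocycler software) and deliberately ignores ramping between temperatures and the final hold, so the real run takes longer:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: tuple  # (temperature in °C, hold time in seconds) pairs


COI_PROGRAM = (
    Stage(1, ((94, 60),)),
    Stage(5, ((94, 40), (45, 40), (72, 60))),
    Stage(35, ((94, 40), (51, 40), (72, 60))),
    Stage(1, ((72, 300),)),
)


def hold_time_seconds(program) -> int:
    """Sum of all programmed hold times, excluding ramp times and the final hold."""
    return sum(stage.cycles * sum(sec for _, sec in stage.steps) for stage in program)


total_s = hold_time_seconds(COI_PROGRAM)
amplification_cycles = sum(stage.cycles for stage in COI_PROGRAM if len(stage.steps) == 3)

print(f"Programmed hold time: {total_s // 60} min {total_s % 60} s")
print(f"Amplification cycles: {amplification_cycles}")
```

The programmed holds add up to 99 minutes 20 seconds over 40 amplification cycles.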

Confirmation of PCR amplification

PCR success is verified by running and visualizing PCR products on pre-cast 96-well 2% agarose
gels (E-Gels). This system is bufferless so exposure to ethidium bromide is minimized. However,
the E-Gel station is considered a contaminated area, producing hazardous waste and needs to
be handled with special care (e.g., separate lab coats, pipettes, gloves, garbage bin).
Reagents, equipment and disposables required for processing 1 plate:

Item Quantity Notes


E-Gel system 1 Can be used for 2-3 runs
Reagent reservoir 1
ddH2O 1.34mL Exact quantity; add more to reservoir
Clear seal 1
Pipettes and tips
UV transilluminator + computer
Gloves
Kimwipes
Permanent marker
Tweezers
1. Retrieve PCR plate from temporary storage and spin briefly.
2. Turn on computer and UV transilluminator (E-Gel Imager).
3. Retrieve E-Gel58 and label the outside package, as well as bottom of E-Gel with original plate
number (optional: date and primer pair). Remove red casing (comb) from wells of E-Gel and place
E-Gel directly onto gel dock. Place packaging and casing directly into the container labelled
"Hazardous Waste". Change gloves.

58
After the first use, E-Gels should be kept in containers to maintain hydration.

4. Pour ddH2O into reagent reservoir.
5. Add 14µL ddH2O to E-Gel59 using a 12-channel manual pipette allocated for E-Gel station. Do not
place pipette tips too deep into wells, as it may rupture them. Use same tips for one plate.
6. Remove heat seal from PCR plate. Use tweezers if needed.
7. Load 4µL PCR product into E-Gel. Change tips for each row.
8. Plug in gel dock (ensure it is set to run program “EG”) and slide E-Gel into two electrode
connections on E-Base. Press “time” button to verify setting (‘04’ will appear). Press “pwr/prg”
button and red light will turn green. Electrophoresis will run for 4 minutes.
9. Cover PCR plate with clear seal and place it back in temporary storage.
10. Once run is complete, remove E-Gel from E-Base and place it on transilluminator (A01 on top
left). Turn camera on.
11. Open software Gel Capture -> Acquire Image -> UV Light Base -> E-Gel Adaptor Base with E-
Gel Go -> Select. The plate should appear on the computer screen with glowing bands for
successful amplification. Take a picture if you are satisfied with the brightness and alignment of
plate. Press “SAVE” to save the snapped image. Assign a file name as appropriate including the
plate number (images are saved as .tif)
12. Turn off camera and transilluminator, place comb back on E-Gel and back in its package and
either in special container (for additional runs) or in designated garbage bin for hazardous waste.
13. Open software E-Editor, open .tif image, use cross-hairs tool (left side toolbar) to define surface
of gel to be cut into 96 bands, then Next -> Save as .jpg (file format required for LIMS).
14. Upload image file to LIMS. Score bands on E-Gel to assess amplification success.
15. Clean space and dispose of waste in designated garbage bin for hazardous waste.

PCR product clean-up

Currently, PCR products are not cleaned up; they proceed directly to sequencing.

59
If reused within the same day, the E-Gel does not require re-hydration. If more than a day, wipe the gel
with a Kimwipe, change gloves, and re-hydrate with 14µL ddH2O.


Cycle sequencing
The cycle sequencing reaction is customized for sequencing clean-up with magnetic beads (in-house
recipe and protocol run at CCDB).
Each PCR plate will be sequenced in two directions (forward and reverse) resulting in two
sequencing plates.
Reagents, equipment and disposables required for processing 2 plates (sequencing mix
not included):

Item Quantity Notes


Non-skirted 96-well plate 2 Colour: clear
LIMS sticker 2
Clear seal 4
Aluminum seal 2
Reagent reservoir 1
Microcentrifuge tube (1.5 or 2mL) 2
ddH2O for PCR product dilution 3.84mL Exact quantity; add more to reservoir
Pipettes and tips
Plate centrifuge Speed: 1000×g
Plate thermocycler
Plate roller
Gloves
Kimwipes
Permanent marker

The sequencing workstation is usually a flow hood with UV light intended for decontamination
purposes. Keep designated sequencing pipettes inside the hood at all times. Clean hood with
ethanol before starting to work. Change gloves after cleaning.
1. Retrieve required tips, seals, plate, and tube racks intended for sequencing, and place in
sequencing hood. Press UV light button on the sequencing hood, which will turn UV light on for
15 minutes.
2. Retrieve PCR plate from temporary storage and spin briefly.
3. Pour ddH2O into reagent reservoir.
4. Dilute PCR product with 40µL ddH2O using a 12-channel manual pipette. Use same tips for
entire plate, making sure not to touch the PCR plate. Cover with clear seal.
5. Centrifuge at 1000×g for 2 minutes to get rid of bubbles.
6. Label two clear sequencing plates. Place a LIMS sticker on the right-hand side of plate (see
illustration below):


[Illustration: sequencing plate label showing plate number, LIMS sticker, and sequencing direction (e.g., F)]

7. Prepare sequencing mix60 by adding all reagents, except for PCR product, in a sterile 1.5 or
2mL microcentrifuge tube. Note: mix should be made for two plates (forward and reverse) in
two separate tubes.

Sequencing reaction: 11µL


Reagents61 1 reaction (µL) 1 plate (µL) 10 plates (µL)
10% trehalose for sequencing 5 520 5200
Big Dye Terminator Mix 0.25 26 260
5X sequencing buffer 1.875 195 1950
ddH2O 0.875 91 910
10µM primer 1 104 1040
Total 9 936 9360
Diluted PCR product 2µL per well
Note: Extra volume is considered for larger reactions (104 reactions/1 plate; 1040
reactions/10 plates) to allow for pipetting errors.

8. Add mix to sequencing plate:


• If using sequencing mix without primers: add 104µL forward primer to the first tube of
mix and label top of tube with “F” to indicate direction. Add 104µL reverse primer to the
second tube of mix and label top of tube with “R” to indicate direction. Change tips between
primers.
• For each plate, add 76µL sequencing + primer mix to each well on row A (A01-A12).
• Distribute 9µL sequencing + primer mix from A01-A12 wells across the entire plate using
a 12-channel manual pipette. Use same tips for entire plate.
Note: Be sure to dispense the correct direction sequencing + primer mix depending on the
labelling of the sequencing plates.
9. Cover sequencing plate with clear seal. Spin both plates at 1000×g for 1 minute.
10. Discard clear seal from sequencing plates. Add 2µL diluted PCR product to each well by
using a 12-channel manual pipette. Ensure no plate flip happens.
11. Cover PCR plate with clear seal and place it back in temporary storage.
12. Cover each plate with aluminum film and seal it firmly with a plate roller. Mark direction
and plate number on seal with a permanent marker. Spin plate briefly.
13. Place sequencing plates in thermocycler and initiate appropriate program.
14. Clean space and dispose of waste.
15. Press the UV light button on sequencing hood (for a 15-minute decontamination).

60
Sequencing mix can be prepared ahead of time for one or multiple plates, with or without primers. Mixes
will be aliquoted in 1.5 or 2mL tubes (1 tube enough for one plate) and stored at -20°C.
61
See Appendix A for details on sequencing reagents.


16. Record cycle sequencing preparation and thermocycler program on LIMS.


17. Once thermocycler program ends, place sequencing plates in a box, and submit them to
CCDB for sequencing. If plates cannot be submitted right away, place them immediately
into fridge/freezer for no longer than one week.
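
The row A distribution used in step 8 (and in the corresponding PCR step) can be sanity-checked: each well in row A must hold enough mix for all 8 rows of its column, and 12 such wells must fit within the prepared mix. A brief Python sketch (the function name and returned keys are illustrative):

```python
ROWS, COLUMNS = 8, 12  # 96-well plate layout


def row_a_check(mix_total_ul: float, row_a_ul: float, per_well_ul: float) -> dict:
    """Compare the volume loaded into row A with what the plate actually needs."""
    needed_per_column_ul = ROWS * per_well_ul
    return {
        "needed_per_column_ul": needed_per_column_ul,
        "spare_per_column_ul": row_a_ul - needed_per_column_ul,
        "left_in_tube_ul": mix_total_ul - COLUMNS * row_a_ul,
    }


sequencing = row_a_check(mix_total_ul=936, row_a_ul=76, per_well_ul=9)
pcr = row_a_check(mix_total_ul=1050, row_a_ul=84, per_well_ul=10.5)

print("Sequencing:", sequencing)
print("PCR:", pcr)
```

With the volumes given in this manual, the sequencing mix leaves 4µL spare per column and 24µL in the tube, whereas the 84µL of PCR mix per row A well is exactly 8 x 10.5µL, so careful pipetting matters most for the PCR plate.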

Animal sequencing thermocycler program

M13 primers
Number of cycles   Block temperature   Hold time (mm:ss)
1                  96°C                01:00
35x                96°C                00:10
                   55°C                00:05
                   60°C                02:30
1                  60°C                05:00
Hold               4 or 10°C           Until plate removed

Other primers
Number of cycles   Block temperature   Hold time (mm:ss)
1                  96°C                01:00
15x                96°C                00:10
                   55°C                00:05
                   60°C                01:15
5x                 96°C                00:10
                   55°C                00:05
                   60°C                01:45
1                  60°C                00:15
15x                96°C                00:10
                   55°C                00:05
                   60°C                02:00
1                  60°C                01:00
Hold               4 or 10°C           Until plate removed

Longer fragments (800-1000bp)
Number of cycles   Block temperature   Hold time (mm:ss)
1                  96°C                02:00
x30                96°C                00:30
                   55°C                00:15
                   60°C                04:00
Hold               4 or 10°C           Until plate removed
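
To compare the three sequencing programs when scheduling thermocyclers, their hold times can be summed directly from the mm:ss values in the tables. The following Python sketch uses an illustrative data layout of (cycles, list of hold times) and, like any such estimate, leaves out ramp times and the final hold:

```python
def mmss_to_seconds(text: str) -> int:
    """Convert an 'mm:ss' string to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


# Each stage: (number of cycles, hold times of the steps in that stage).
PROGRAMS = {
    "M13 primers": [(1, ["01:00"]), (35, ["00:10", "00:05", "02:30"]), (1, ["05:00"])],
    "Other primers": [
        (1, ["01:00"]),
        (15, ["00:10", "00:05", "01:15"]),
        (5, ["00:10", "00:05", "01:45"]),
        (1, ["00:15"]),
        (15, ["00:10", "00:05", "02:00"]),
        (1, ["01:00"]),
    ],
    "Longer fragments": [(1, ["02:00"]), (30, ["00:30", "00:15", "04:00"])],
}


def total_seconds(stages) -> int:
    """Programmed hold time for one program, excluding ramps and the final hold."""
    return sum(cycles * sum(mmss_to_seconds(h) for h in holds) for cycles, holds in stages)


durations = {name: total_seconds(stages) for name, stages in PROGRAMS.items()}
for name, seconds in durations.items():
    print(f"{name}: {seconds / 60:.1f} min")
```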


DNA Barcoding – Plants, fungi


Tissue lysis
Plants and fungi require processing in a plant box, rather than in a microplate (as used for animals). Plant
boxes (PB) consist of 96 tubes placed on a rack, connected in rows of 8 and covered tightly with
8-cap strips. The box is closed with a clear lid secured by tape to prevent unexpected opening
during shipping and handling. As opposed to animals, the orientation of the box is not from A01
to H12 but from H01 to A12.
The plant tissue is a dry piece of leaf or stem (approx. 0.5 cm²), dried using silica-gel or sampled
from an herbarium voucher. Plant or fungal tissue fragments are very light, fragile, and easily
chargeable by static electricity (since they are sampled dry and not placed in wells with ethanol,
as required for animals), therefore require EXTREME caution while handling. In addition,
sample size is sometimes very small which brings another layer of difficulty in dealing with these
taxa. The plant/fungal tissues require mechanical disruption to break the cell-wall and release
DNA for extraction. All stages of lysis, including grinding, are performed in the same strip-tubes.
The protocol presented here is designed to optimize the high-throughput procedure for the
plant/fungal tissue lysis, in order to avoid cross contamination.

Placing beads62 in the strip-tubes63

Item Quantity Notes


Empty blue racks for plant tubes 2 Rack 1 and Rack 2
Sterile box lid 1
8-cap strips 12
LIMS sticker 1
Tube with clean beads (stainless steel)
Plate centrifuge Speed: 1000×g
Gloves
Kimwipes
Permanent marker
Prepare workstation: clean surface with ELIMINase, then distilled water, wipe it with Kimwipe until
dry, and finally wipe the surface with ethanol. Change gloves after cleaning.
1. Retrieve plant box (PB) with tissues samples (add LIMS sticker if needed) and centrifuge at
1000×g for 1 min. Centrifugation of dry material is not very efficient, however it helps to spin
down tissues and relatively larger particles. Small particles may still be sticking to walls and
caps due to static electricity.
2. Place PB in an orientation H01 to A12. Remove tape, open lid and clearly label strip-tubes in
row H with numbers from 1 to 12 according to rows in the rack, using permanent marker. Do
not remove the strip tubes from the rack before you label them.
3. Position PB within reach but separate from clean area. Make sure that labels are legible.
4. Place one empty rack (Rack 1) in front and other empty rack (Rack 2) on right side.

62
Stainless steel beads can be re-used: separate beads from tissue debris, rinse with water, soak in
ELIMINase for 1 hour, wash thoroughly with warm water, soak in 0.5N HCl for 1 min, rinse with warm water
followed by dH2O and final rinse with ddH2O, dry and expose to UV light for 30 min.
63
Place beads in strip-tubes before tissue subsampling for an easier workflow.


5. Take one strip of tubes labeled “1” from first row of PB, and place it in the Rack 1.
6. Pull the caps from tubes using two hands: one hand holds strip by ending tag, while other hand
helps to open caps one by one, pulling them carefully by side tags of each cap. Use ending tag
to pull whole strip of released caps aside from opened tubes to prevent small particles from
caps falling into neighboring (opened) tubes. Discard the cap-strip.
7. Transfer opened strip-tube into corresponding row #1 of Rack 2. Avoid touching the upper part
of the strip-tubes, especially the open ends of the tubes.
8. Take strip “3” from the PB, and place into Rack 1. Repeat steps 6-7, place opened strip-tubes
in corresponding row #3 in Rack 2.
9. Repeat same steps with strip-tubes # 5, 7, 9, 11.
10. Repeat same operations with strip-tubes #2, 4, 6, 8, 10, 12, and place them into original PB,
according to row numbers.
11. Move Rack 1 aside and clean bench with ethanol.
12. Keep a box of clean gloves handy. Change gloves to avoid contamination. Make sure gloves
fit your fingers tightly enough to hold small objects. After changing gloves make sure you do
not touch anything but clean box lid and beads.
13. Pour some clean stainless-steel beads in a sterile box lid.
14. Take a single bead and put it in the tube of the strip-tube 1, not touching the top of tube
(change gloves in case of touching tubes). Make sure that each tube contains a single bead,
otherwise the tube will crack during grinding.
15. Repeat same procedure with all tubes in Rack 2 and PB.
16. Return unused beads back to tube with the clean beads.
17. Change gloves, and prepare 12 sterile 8-cap strips, placing them on clean Kimwipe.
18. Close all strip-tubes with fresh caps. Try to grab cap-strips by a tag and work from bottom to
top to avoid hovering over opened tubes.
19. Transfer closed strip-tubes from Rack 2 in PB, according to row numbers.
The box with tissue samples is now ready for grinding.

Grinding specimen tissues

This part requires a TissueLyser for grinding, available at CCDB.


Follow these steps to grind plant/fungal tissue:

• Place PB with lid removed in TissueLyser base plate adapter, cover with lid adapter, make sure
to match cylindrical parts on base and lid adapters.
• Place assembly in TissueLyser and clamp using hand wheel with compression disk until locking
bolt stops clicking. Do not over-tighten.
• Close TissueLyser and apply 28Hz for 30 seconds.
• Raise and rotate locking bolt, release adapter using hand wheel, return locking bolt into ‘clicking’
position.
• Disassemble adapters and rotate each tube rack 180° (see note below) and secure them again
as described above.
• Apply 28Hz for another 30 seconds.
• Release adapters and remove plate from TissueLyser.
• Cover each PB with lid and centrifuge at 1000×g for 1 minute.
Note: When using a TissueLyser Adapter Set, samples nearer to the TissueLyser move more slowly
than the samples further away from the TissueLyser. To ensure uniform disruption and
homogenization, 2 shaking steps should be carried out. After the first shaking step, the
TissueLyser Adapter Set should be disassembled and the rack of tubes should be rotated so that
the tubes that were nearest to the TissueLyser are now outermost. The TissueLyser Adapter Set
should then be reassembled before continuing with the second shaking step.

Adding CTAB buffer

Item Quantity Notes


Empty blue racks for plant tubes 2 Rack 1 and Rack 2
8-strip minitube caps 12
Lysis buffer (CTAB) ~40mL Between 100-400µL/well depending on tissue size
Reagent reservoir 1
Plate centrifuge Speed: 1000×g
Pipettes and tips
Incubator Temperature: 65°C
Gloves
Kimwipes
Permanent marker

1. Retrieve plant box with ground tissue. Make sure that labels are easily readable.
2. Place Rack 1 in front and Rack 2 on right side.
3. Take strip “1” from PB and transfer it to Rack 1.
4. Remove (and discard) caps using technique described above (first section of plant protocol).
5. Transfer very carefully open strip-tubes to corresponding row #1 in Rack 2. Do not touch top of
tubes.
6. Repeat steps 3-5 with strip-tubes 3, 5, 7, 9, and 11 transferring them into corresponding rows
in Rack 2.
7. Repeat same steps with strip-tubes 2, 4, 6, 8, 10, and 12 and place them into original PB,
according to row numbers.
8. Move Rack 1 aside and clean bench with ethanol. Change gloves.
9. Prepare 12 clean 8-cap strips placing them on clean Kimwipe.
10. Pour CTAB in reagent reservoir.
11. Dispense 300μL CTAB into each tube from Rack 2 and PB. Make sure that lysis buffer does
not wet tip filters (discard tips if necessary).
12. Close tubes tightly with sterile 8-cap strips. Work from bottom to top to avoid hovering over
open tubes.
13. Transfer closed strip-tubes from Rack 2 to PB according to row numbers.
14. Close PB with the box lid.
15. Hold the PB with lid using both hands and gently invert once.
16. Immediately centrifuge at 1000×g for 1 minute.
17. Place PB in incubator at 65°C for 60-90 minutes (optional: incubator with shaker, 120 rpm).

Transferring lysate to the microplate

Item Quantity Notes


Full-skirted 96-well microplate 1 Colour: clear
LIMS sticker 1
Plant Binding Buffer 9.6mL
Reagent reservoir 1


Aluminum adhesive film 1


Plate centrifuge Speed: 1000×g
Pipettes and tips
Plate roller
Gloves
Kimwipes
Permanent marker

1. Retrieve plate from incubator and allow to cool down at room temperature. Do not invert plate
once it is warm.
2. Centrifuge at 1000×g for 1 minute.
3. Retrieve a sterile clear microplate and add LIMS sticker.
4. Pour plant binding buffer (PBB) into reagent reservoir.
5. Dispense 100μL PBB, being careful not to touch the wells but hover above.
*Fill 1200µL, dispense 100µL. Use same tips for entire plate.
6. Cover microplate temporarily (e.g. Kimwipe) to prevent contamination.
7. Carefully open all caps from the plant strip-tubes, working from top to the bottom to avoid
hovering over open tubes. No additional rack is required at this stage.
8. Transfer 50μL of supernatant to microplate with PBB (and then proceed to DNA extraction).
*Use manual 8-channel pipette. Change tips after each row.
9. Seal the plant box with aluminum film, cover with lid and store at -20°C as backup.

DNA extraction
Plant DNA is extracted through a glass-fiber protocol, very similar to the animal protocol.

Item Quantity Notes


Full-skirted 96-well microplate 1 Colour: blue
DNA extraction glass fiber plate (GF) 1 1.0µm plate
Square well block 1
Reagent reservoir 3 1 reservoir/buffer
Binding mix (BM) 17.28mL Exact quantity; add more to reservoir
Wash buffer (WB) 67.2mL Exact quantity; add more to reservoir
Elution buffer (EB) 4.8mL Exact quantity; add more to reservoir
LIMS sticker 1
Clear seal 3
Aluminum seal 1
Pipettes and tips
Incubator Temperature: 56°C
Plate centrifuge Speed: 5000×g
Plate roller
Gloves
Kimwipes
Permanent marker

1. Retrieve glass fiber plate (Acroprep 1.0µm) and label with initials, original (tissue) plate number
and date. Place on top of clean square well block.


2. Transfer all 150μL mix (100μL PBB+50μL plant lysate) to GF plate: before transferring, slowly
mix lysate up and down about 5-10 times in lysate plate wells before releasing into the bottom
of GF plate wells (make sure not to puncture the GF membrane). Cover plate with clear seal.
*Use manual 12-channel pipette. Change tips after each row.
3. Centrifuge at 5000×g for 5 minutes to bind DNA to GF membrane.
4. Label reservoirs for three extraction buffers.
5. Pour binding mix (BM) into first reservoir.
6. Add 180µL BM to each well, being careful not to touch the wells but hover above. Cover with
clear seal.
* Fill 1080µL, dispense 180µL (covers ½ plate). Repeat. Use same tips for entire plate.
7. Centrifuge at 5000×g for 2 minutes.
8. Pour wash buffer (WB) into second reservoir.
9. Discard clear seal. Add 700µL WB to each well of GF plate. Cover with clear seal.
*Fill 700µL, dispense 700µL. Use same tips for one plate.
10. Centrifuge at 5000×g for 5 minutes.
11. Retrieve a sterile blue microplate for DNA collection and label with initials, date and original
(tissue) plate number. Place a LIMS sticker on the front of blue plate.
12. Place GF plate on DNA plate and discard clear seal. Make sure both plates have the same
orientation (A01 of GF is placed into A01 of the DNA plate).
13. Incubate at 56°C for 30 minutes to evaporate residual ethanol.
14. Warm elution buffer (in a container such as a 15mL Falcon tube) in the incubator at the same
time as the ethanol evaporation. Make sure to turn off the incubator after the step is complete.
15. Pour warmed elution buffer (EB) into third reservoir.
16. Dispense 50µL [64] of EB directly onto the membrane in each well of GF plate and incubate at
room temperature for 1 minute. Make sure the plate is not flipped. Cover with clear seal.
*Fill 600µL, dispense 50µL. Use same tips for entire plate.
17. Place GF plate + DNA plate combination on top of a square-well block for centrifugation
(otherwise the DNA plate will crack).
18. Centrifuge at 5000×g for 5 minutes to collect DNA.
19. Visually inspect DNA plate from the bottom to make sure that each well contains liquid. Cover
DNA plate with aluminum seal (use plate roller to seal it tightly) and place it in the appropriate
storage.
20. Store DNA at 4°C (short term) or -20°C (medium term). Archive DNA plates at -80°C.
21. Wrap GF plates in Kimwipes and store in plastic bags at 4°C (if necessary, they may be re-
eluted, but their shelf life is relatively short).
22. Document DNA extraction on LIMS.
23. Clean workstation and dispose of waste.
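
The buffer quantities in the materials list follow directly from the per-well volumes in steps 6, 9 and 16. As a minimal sketch (the function and variable names are illustrative, not part of the protocol), the exact volume for any number of plates can be computed as follows. No allowance is made for reservoir dead volume, which is why the materials list advises adding more to the reservoir.

```python
# Per-well volumes (µL) taken from steps 6, 9 and 16 of the protocol above.
PER_WELL_UL = {
    "binding mix (BM)": 180,
    "wash buffer (WB)": 700,
    "elution buffer (EB)": 50,
}
WELLS_PER_PLATE = 96


def plate_volume_ml(per_well_ul, wells=WELLS_PER_PLATE, plates=1):
    """Exact reagent volume in mL, without any extra for reservoir dead volume."""
    return per_well_ul * wells * plates / 1000


def reagent_totals(plates=1):
    """Exact volume (mL) of each buffer needed for the given number of plates."""
    return {name: plate_volume_ml(ul, plates=plates) for name, ul in PER_WELL_UL.items()}


if __name__ == "__main__":
    for name, ml in reagent_totals().items():
        print(f"{name}: {ml:.2f} mL")
```

For one plate this reproduces the 17.28mL, 67.2mL and 4.8mL listed in the materials table.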

PCR amplification
Amplification of different plant/fungal markers requires different PCR reactions and thermocycling
programs but the workflow and necessary laboratory equipment are the same as for the
amplification of animal COI.
Note: Phusion Hot Start High-Fidelity DNA Polymerase, with proofreading ability, has proved very
efficient for amplification of homopolymer regions and greatly improves sequencing results for
matK and psbA-trnH. However, the enzyme has lower thermostability: batches of pre-made PCR
plates should not exceed 4 plates, and these cannot be stored at -20°C but need to be used
immediately.

[64] The quantity of EB can be increased up to 60µL/well.

PCR reaction (12.5µL) for rbcLa, ITS2, fungal ITS, and LSU markers

Reagents 1 reaction (µL) 1 plate (µL)


10% trehalose 6.25 625
ddH2O 2 200
10X buffer 1.25 125
50mM MgCl2 0.625 62.5
10mM dNTPs 0.0625 6.25
Platinum Taq (5 U/µL) 0.06 6
10µM forward primer 0.125 12.5
10µM reverse primer 0.125 12.5
Total 10.5 1050
DNA template 2µL

PCR reaction (10μL) for matK marker

Reagents 1 reaction (µL) 1 plate (µL)


DMSO 0.3 30
ddH2O 5.375 537.5
5X buffer HF (with MgCl2) 2 200
10mM dNTPs 0.2 20
Phusion Hot-Start (5 U/µL) 0.125 12.5
10µM forward primer 0.5 50
10µM reverse primer 0.5 50
Total 9 900
DNA template 1µL

PCR reaction (10μL) for psbA-trnH marker

Reagents 1 reaction (µL) 1 plate (µL)


DMSO 0.3 30
ddH2O 6.32 632
5X buffer HF (with MgCl2) 2 200
10mM dNTPs 0.056 5.6
Phusion Hot-Start (5 U/µL) 0.125 12.5
10µM forward primer 0.1 10
10µM reverse primer 0.1 10
Total 9 900
DNA template 1µL
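
The per-plate column in the tables above is the per-reaction volume multiplied by 100, which appears to correspond to 96 wells plus a small excess (this reading of the tables is an assumption). The following sketch, using the matK mix as an example and illustrative names, scales a recipe and checks the total, which is a quick way to catch transcription errors before preparing a master mix.

```python
# Per-reaction volumes (µL) for the matK mix, as listed in the table above.
MATK_MIX_UL = {
    "DMSO": 0.3,
    "ddH2O": 5.375,
    "5X buffer HF (with MgCl2)": 2.0,
    "dNTPs": 0.2,
    "Phusion Hot-Start (5 U/µL)": 0.125,
    "forward primer": 0.5,
    "reverse primer": 0.5,
}


def scale_mix(per_reaction_ul, reactions=100):
    """Scale every reagent to the given number of reactions (assumed 100 per plate)."""
    return {name: round(vol * reactions, 3) for name, vol in per_reaction_ul.items()}


def mix_total(mix):
    """Total volume of a mix in µL, rounded to avoid floating-point noise."""
    return round(sum(mix.values()), 3)


if __name__ == "__main__":
    plate_mix = scale_mix(MATK_MIX_UL)
    for name, vol in plate_mix.items():
        print(f"{name}: {vol} µL")
    print("Total per reaction:", mix_total(MATK_MIX_UL), "µL")
    print("Total per plate:", mix_total(plate_mix), "µL")
```

The totals should match the "Total" row of the table (9µL per reaction and 900µL per plate for matK); the DNA template is added separately to each well.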

Plant PCR thermocycler programs

Each cell gives block temperature and time (mm:ss).

Number of    rbcLa            matK             ITS2             psbA-trnH
cycles
1            94°C  04:00      98°C  00:45      94°C  05:00      98°C  00:45
35x          94°C  00:30      98°C  00:10      94°C  00:30      98°C  00:10
             55°C  00:30      52°C  00:30      56°C  00:30      64°C  00:30
             72°C  01:00      72°C  00:40      72°C  00:45      72°C  00:40
1            72°C  10:00      72°C  10:00      72°C  10:00      72°C  10:00
Hold         4 or 10°C until plate removed (all four markers)

Fungal PCR thermocycler programs

ITS and LSU


Number of cycles Block temperature Hold time (mm:ss)
1 94°C 02:00
94°C 00:30
40x 50°C 00:30
72°C 01:00
1 72°C 05:00
Hold 4 or 10°C Until plate removed
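
When planning a run, it can help to know roughly how long a program occupies the thermocycler. The sketch below encodes two of the programs above as plain data and sums the hold times; the data layout and function names are illustrative assumptions, and ramping time between temperatures and the final hold are not included, so real runs take longer.

```python
def mmss_to_seconds(mmss):
    """Convert a 'mm:ss' string, as used in the tables above, to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Each stage: (number of cycles, [(block temperature in °C, hold time mm:ss), ...])
FUNGAL_ITS_LSU = [
    (1, [(94, "02:00")]),
    (40, [(94, "00:30"), (50, "00:30"), (72, "01:00")]),
    (1, [(72, "05:00")]),
]
RBCLA = [
    (1, [(94, "04:00")]),
    (35, [(94, "00:30"), (55, "00:30"), (72, "01:00")]),
    (1, [(72, "10:00")]),
]


def total_hold_seconds(program):
    """Sum of all hold times in a program, excluding ramping and the final hold."""
    return sum(
        cycles * sum(mmss_to_seconds(time) for _temp, time in steps)
        for cycles, steps in program
    )


if __name__ == "__main__":
    for name, program in (("Fungal ITS/LSU", FUNGAL_ITS_LSU), ("rbcLa", RBCLA)):
        print(f"{name}: {total_hold_seconds(program) / 60:.0f} min of hold time")
```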

Confirmation of PCR amplification

See the section above on COI amplification.

PCR product clean-up

Currently, PCR products are not cleaned up but proceed directly to sequencing.

Cycle sequencing
Cycle sequencing of different plant/fungal markers requires different primers for the sequencing
mix and different thermocycling regimes, but the workflow and necessary laboratory equipment are
the same as for sequencing animal COI.
Each sequencing mix contains only one primer.

Plant sequencing thermocycler regime

Note: These thermocycling programs are suitable only for fast ramping thermocyclers.

rbcLa, psbA-trnH, and ITS2


Number of cycles Block temperature Hold time (mm:ss)
1 96°C 01:00
96°C 00:10
15x 55°C 00:05
60°C 01:15
96°C 00:10
5x 55°C 00:05
60°C 01:45
1 60°C 00:15
96°C 00:10
15x 55°C 00:05
60°C 02:00
1 60°C 01:00
Hold 4 or 10°C Until plate removed

All other markers


Number of cycles Block temperature Hold time (mm:ss)
1 96°C 01:00
96°C 00:10
35x 50°C 00:05
60°C 02:30
1 60°C 05:00
Hold 4 or 10°C Until plate removed

References for Primers used in DNA barcoding


Chen S, Yao H, Han J, et al. (2010) Validation of the ITS2 region as a novel DNA barcode for
identifying medicinal plant species. PLoS ONE 5(1): e8613.
Cuenoud P, Savolainen V, Chatrou LW, et al. (2002) Molecular phylogenetics of Caryophyllales
based on nuclear 18S rDNA and plastid rbcL, atpB, and matK DNA sequences. American
Journal of Botany, 89: 132-144.
Dunning LT, Savolainen V (2010) Broad-scale amplification of matK for DNA barcoding plants, a
technical note. Botanical Journal of the Linnean Society 164: 1–9.
Fazekas AJ, Burgess KS, Kesanakurti PR, et al. (2008) Multiple multilocus DNA barcodes from
the plastid genome discriminate plant species equally well. PLoS ONE 3(7): e2802.
Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of
mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates.
Molecular Marine Biology and Biotechnology 3: 294–299.
Ford CS, Ayres KL, Haider N, Toomey N, van Alphen Stahl J, et al. (2009) Selection of candidate
DNA barcoding regions for use on land plants. Botanical Journal of the Linnean Society
159: 1–11.
Gardes M, Bruns TD (1993) ITS primers with enhanced specificity for basidiomycetes: Application
to the identification of mycorrhizae and rusts. Molecular Ecology 2: 113–118.
Hajibabaei M, Janzen DH, Burns JM, Hallwachs W, Hebert PDN (2006) DNA barcodes distinguish
species of tropical Lepidoptera. Proceedings of the National Academy of Sciences of the
United States of America 103: 968-971.
Hebert PDN, Penton EH, Burns JM, Janzen DH, Hallwachs W (2004a) Ten species in one: DNA
barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator.
Proceedings of the National Academy of Sciences of the United States of America 101:
14812-14817.
Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004b) Identification of birds through DNA
barcodes. PLoS Biology 2: 1657-1668.
Hernández-Triana LM, Prosser SW, Rodríguez-Perez MA, Chaverri LG, Hebert PDN, Gregory
TR (2014) Recovery of DNA barcodes from blackfly museum specimens (Diptera:
Simuliidae) using primer sets that target a variety of sequence lengths. Molecular Ecology
Resources 14:508–518.
Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA
barcoding. Molecular Ecology Notes 7: 544-548.
Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, et al. (2009) Plant DNA barcodes and
a community phylogeny of a tropical forest dynamics plot in Panama. Proceedings of the
National Academy of Sciences 106: 18621–18626.

Levin RA, Wagner WL, Hoch PC, et al. (2003) Family-level relationships of Onagraceae based
on chloroplast rbcL and ndhF data. American Journal of Botany 90:107-115 (modified from
Soltis P et al. (1992) Proceedings of National Academy of Sciences USA 89: 449-451).
Messing J (1983) New M13 vectors for cloning. Methods in Enzymology 101: 20-78.
Sang T, Crawford DJ, Stuessy TF (1997) Chloroplast DNA phylogeny, reticulate evolution and
biogeography of Paeonia (Paeoniaceae). American Journal of Botany 84: 1120–1136.
Tate JA, Simpson BB (2003) Paraphyly of Tarasa (Malvaceae) and diverse origins of the polyploid
species. Systematic Botany 28: 723–737.
Vilgalys R, Hester M (1990) Rapid genetic identification and mapping of enzymatically amplified
ribosomal DNA from several Cryptococcus species. Journal of Bacteriology 172: 4239-
4246.
White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal
RNA genes for phylogenetics. In: PCR Protocols: a guide to methods and applications.
(Innis MA, Gelfand DH, Sninsky JJ, White TJ, eds). Academic Press, New York, USA: 315–
322.

ANNEX IX: Batch Sequence Editing


These instructions use CodonCode Aligner as the sequence editing software and focus on editing
traces for 96-well microplates. For plant and fungal markers, additional software is required
(transAlign [65], which also requires Perl [66], and BioEdit [67]).

Sequence editing: COI


Setting-up parameters for CodonCode: The following instructions apply to editing COI
sequences in arthropods (for other taxa, change the genetic code accordingly).

[65] Bininda-Emonds ORP (2005). transAlign: using amino acids to facilitate the multiple alignment
of protein-coding DNA sequences. BMC Bioinformatics 6:156
(https://www.uni-oldenburg.de/ibu/systematik-evolutionsbiologie/programme/#Sequences).
[66] https://www.perl.org/
[67] http://www.mbio.ncsu.edu/BioEdit/bioedit.html

1. Open CodonCode aligner, create and name the project:

2. Import traces (file, import, add samples). Select “abi” files only. Import all the forward
(96) and reverse (96) sequences, including controls (192 samples from one plate).

3. Clip ends

4. Sort by quality, move the poor quality (<250bp) sequences and the controls to
trash. Select low-quality sequences and use right-click to reveal the window to
“Move to Trash”

5. Assemble by direction (contig, advance assembly, assemble in groups, direction).

6. Reverse the R direction by using the function “Ctrl R” (to convert it into the 3′ – 5′
complementary direction).

7. Delete the primers by locating primer sequences or by using a reference sequence
(reference must be imported and added in the contig before use). Delete bad
sequences from the contig.
• Locate the complementary sequence of reverse primer in 3′ region of the forward
strand and delete by selecting all nucleotides on the right side of the contig.

Reverse primer (HCO2198): 3ʹ TGATTTTTTGGTCACCCTGAAGTTTA 5ʹ


Forward 3ʹ

• Locate the forward primer sequence in the 3′ region of the reverse strand and
delete by selecting all nucleotides on the left side of the contig.

Forward primer (LCO1490): 5ʹ ATTCAACCAATCATAAAGATATTGG 3ʹ


3ʹ Reverse 5ʹ

8. Unassemble F and R. Select F and R contigs by using Control key and use
unassemble function to move sequences back to “unassembled samples”.

9. Assemble by clone (contig, advance assembly, assemble in groups, clone). F should
be in black and R in red: black indicates the coding strand in the 5′ – 3′ direction,
while red indicates the non-coding strand in the 3′ – 5′ direction. This action produces
individual contigs by combining the forward and reverse directions of each sample
(clone). Samples that do not have a complementary sequence to pair with will
remain in “unassembled samples”.

10. Correct nucleotides in contigs and unassembled samples. Move the bad
sequences to trash. Open each contig by a mouse-click and verify that F is black
and R is red. If R is not red use “Ctrl R” to change the reverse to red. Look through
consensus sequence to correct ambiguous bases (e.g. N) and the gaps. Repeat
this for all the individual contigs.

11. Assemble contig of contigs by selecting “compare contigs” function. Check for
the direction (5′ – 3′). Select all the individual contigs and unassembled samples
using Ctrl or Shift key. Use “Compare contigs” function from “Advance Assembly”
and then “Clustal Omega” (for COI) to run alignment. Hit “Compare”.

12. Correct the erroneous bases. Open “CtgComparison” by mouse click and fix the
“Ns” and gaps.

13. Export sequences (both child contigs and unassembled samples): choose ‘include
gaps in fasta’. Export and save separate files for contigs and unassembled
samples

Next, use MEGA or a text editor to combine all the exported sequences (from contigs and
samples) in a single fasta file. Before sequence submission to BOLD, verify that the
sequences are free of stop codons.
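
The primer-trimming logic of step 7 can be illustrated outside CodonCode. In a forward read, the reverse primer appears as its reverse complement near the 3′ end, so everything from that site onward is removed. The sketch below is only an illustration under stated assumptions: the 5′ to 3′ sequence of HCO2198 is taken from Folmer et al. (1994), the function names are hypothetical, and only exact matches are found, whereas real traces often contain mismatches and need manual inspection.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# HCO2198 written 5'->3' (Folmer et al. 1994); not printed in this orientation above.
HCO2198 = "TAAACTTCAGGGTGACCAAAAAATCA"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_after_reverse_primer(forward_read, reverse_primer=HCO2198):
    """Cut a forward read at the first exact occurrence of the reverse primer site.

    The read is returned unchanged when the site is not found.
    """
    site = reverse_complement(reverse_primer)
    position = forward_read.upper().find(site)
    return forward_read if position == -1 else forward_read[:position]


if __name__ == "__main__":
    # This is the sequence to look for in the 3' region of the forward strand.
    print(reverse_complement(HCO2198))
```

The printed sequence is the one shown for the reverse primer in step 7, which is what should be searched for in the forward strand.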

Sequence editing: matK


• Import Forward and Reverse traces into CodonCode Aligner.
• Clip ends.
• Sort by quality then delete sequences ≤ 250bp and controls.
• Assemble by direction:
o Forward: Go to end (3’) and delete roughly 70-100 nucleotides (starting just before
a polyA region).
o Reverse: Ensure that sequences are in reverse direction (red). Look at the start
(left) and delete roughly 70-100 nucleotides.
• Unassemble.
• Assemble by clone. Edit each contig individually.
• Edit individual unassembled traces (those without a complement direction).
• Ensure directions are correct: forward traces are in black, reverse traces are in red.
• Compare contigs and unassembled traces (use CLUSTAL option).
o Look for gaps (‘-‘) and N’s in contigs as well as individual unassembled sequences.
Edit and correct N’s.
o Let sequences start at an open reading frame (ORF)! Find ‘CATCTGGAA’ at
the 5’ end. Count back (left) in 3’s in order to maintain that ORF.
• Export contigs without gaps.
• Export samples without gaps.
• Merge in Notepad or MEGA and save as fasta.
• Run Transalign using the merged fasta file. Note errors/presence of stop codons (*) and
their location.
• Open in MEGA the Transalign alignment and check translation into amino acids.
ENSURE NO STOP CODONS (*) PRESENT (search for symbol *).
• Note: If a stop codon is present, Transalign will have detected it and given a warning.
o If it is an editing issue (e.g., nucleotide edited incorrectly), correct it.
o If it is a valid stop codon, the sequence is probably a pseudogene and needs to be
submitted SEPARATELY to ‘matK-like’ instead of matK. If stop codons are found,
remove the sequence(s) and run Transalign again on the remaining sequences to
confirm that, once pseudogene sequences are removed, the remaining
sequences are OK.
• Submit sequence data to BOLD:
o matK sequences submitted to ‘matK’.
o matK sequences with stop codons submitted to ‘matK-like’ (this marker needs to
be added to the project before sequence upload).
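
As a quick cross-check alongside MEGA and Transalign, stop codons can also be located with a few lines of code. This is a minimal sketch with hypothetical names; it assumes the sequence already starts in the correct reading frame (as arranged in the ORF step above) and uses the stop codons of genetic code 11 (bacterial and plant plastid): TAA, TAG and TGA.

```python
# Stop codons of genetic code 11 (bacterial and plant plastid).
PLASTID_STOPS = {"TAA", "TAG", "TGA"}
# Stop codons of genetic code 5 (invertebrate mitochondrial), e.g. for arthropod COI.
INVERTEBRATE_MT_STOPS = {"TAA", "TAG"}


def stop_codon_positions(seq, stops=PLASTID_STOPS, frame=0):
    """Return 0-based nucleotide offsets of in-frame stop codons.

    Alignment gaps ('-') are removed first; a trailing partial codon is ignored.
    """
    bases = seq.upper().replace("-", "")
    return [
        start
        for start in range(frame, len(bases) - 2, 3)
        if bases[start:start + 3] in stops
    ]


if __name__ == "__main__":
    example = "ATGAAATAAGGG"  # toy sequence, not real matK data
    print(stop_codon_positions(example))
```

An empty list means no stop codon was found in the chosen frame; any reported offset points to the codon to inspect in the trace.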

Transalign

Perl and transalign need to be installed on the computer. The following instructions will use the
terminal window:

Step 1: cd c:\transalign
Step 2: c:\strawberry\perl\bin\perl.exe transalign.pl
Step 3: enter the name of the fasta file
Step 4: clustalw2.exe

Open the terminal window and navigate to the folder with transalign. The fasta file to be analyzed
should be in the same folder. In this example, the folder is called transAlign.

[navigate to the correct folder] -> cd C:\transAlign


C:\transAlign>c:\strawberry\perl\bin\perl.exe transalign.pl
Entering interactive user-input mode. Type "q" at any prompt to exit program.
Enter name of data file []: PLCHU-005-006-matK-tot.fas
Enter format of file PLCHU-005-006-matK-tot.fas (fasta|nexus|phylip|Se-Al)
[autodetect]:fasta
Enter whether to strip all explicit gaps (all), only those flanking sequence (flank), or none
(none) [flank]: all
Genetic codes available:
1: standard
2: vertebrate mitochondrial
3: yeast mitochondrial
4: mold, protozoan and colenterate mitochondrial and mycoplasam/spiroplasma
5: invertebrate mitochondrial
6: ciliate, dasycladacean and hexamita nuclear
9: echinoderm mitochondrial
10: euplotid nuclear
11: bacterial and plant plastid
12: alternative yeast nuclear
13: ascidian mitochondrial
14: alternative flatworm mitochondrial
15: Blepharisma nuclear
16: chlorophycean mitochondrial
21: trematode mitochondrial
22: Scenedesmus obliquus mitochondrial
23: Thraustochytrium mitochondrial
Enter global genetic code to be applied [1]: 11
Enter maximum percentage of Ns any sequences can contain [5]: 1
Enter whether stop codon threshold should be a number (abs) or a percentage (perc) [abs]:
abs
Enter maximum number of stop codons (excluding terminal one) input sequence can have
[1]: 2
Check all possible orientations and reading frames for input sequences (y|n) [n]: y
Enter how to handle frame-shifted sequences (exclude|AA alignment|DNA alignment) [DNA]:
DNA
Enter protein weight matrix to use for ClustalW alignment (BLOSUM|GONNET|PAM)
[GONNET]:
Enter full path and name to Clustal program [clustalw]: clustalw2.exc
Clustal program not found at clustalw2.exc
Enter full path and name to Clustal program [clustalw]: clustalw2.exe
Alpha level with which to test for poorly aligning sequences (from 0 to1; 0 = off) [0]:
Enter output order for sequences (alphabetical|clustal|input file) [alphabetical]:
Output results in nexus format (y|n) [n]:

Output results in traditional phylip format (y|n) [n]:


Output results in extended phylip format (y|n) [n]:
Output results in Se-Al format (y|n) [n]:
Output verbose results to screen (y|n) [n]:
Establishing translation tables for all genetic codes ...

Reading in sequence data from file PLCHU-005-006-matK-tot.fas (type is fasta) ...

Processing each sequence to determine optimal reading frame (in all possible orientations) and
any frame shifts
WARNING: Number of stop codons in best frame (5) for MKPCH573-09 exceeds threshold
(2); inspection of sequence is recommended

WARNING: Number of stop codons in best frame (4) for MKPCH284-09MatK-1RKIM-f_MatK-3FKIM-r
exceeds threshold (2); inspection of sequence is recommended

Time taken to frame sequences: 2 seconds

Passing AA sequence data to ClustalW for alignment ...


Running status from ClustalW saved to file clustal_AA_log.txt

Time taken to align amino acid sequences: 88 seconds

Reading in ClustalW aligned AA data and back-translating to DNA ...

Time taken to back-translate to DNA: 0 seconds

Passing DNA sequence data to ClustalW for alignment as profiles ...


Running status from ClustalW saved to file clustal_profile_log.txt

Time taken to align DNA sequences: 20 seconds

Reading in ClustalW aligned DNA data and inferring indel positions ...

Location of putative indels causing frame shifts:


Sequence: MKPCH573-09
Insertions: none
Deletions: 602
Unspecified indels: none
Sequence: MKPCH284-09MatK-1RKIM-f_MatK-3FKIM-r
Insertions: 639
Deletions: none
Unspecified indels: none

Time taken to infer locations of frame shifts: 0 seconds

Printing results ...


Writing to fasta-formatted file PLCHU-005-006-matK-tot_tAlign.fasta ...

C:\transAlign>
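
For plates with many sequences, the warnings in the session above are easy to miss. If the screen output is copied into a text file (an assumption; the program itself is not known to write such a file), the flagged sequences can be pulled out with a regular expression matching the warning wording shown above. The function name is hypothetical.

```python
import re

# Matches the warning wording printed in the example session above.
WARNING_RE = re.compile(
    r"WARNING: Number of stop codons in best frame \((\d+)\) for (\S+) "
    r"exceeds threshold \((\d+)\)"
)


def flagged_sequences(log_text):
    """Return (sequence name, stop codons found, threshold) for each warning.

    Line breaks and repeated spaces are collapsed first, so a warning that
    wraps between words is still recognized.
    """
    flat = " ".join(log_text.split())
    return [
        (match.group(2), int(match.group(1)), int(match.group(3)))
        for match in WARNING_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = (
        "WARNING: Number of stop codons in best frame (5) for MKPCH573-09 "
        "exceeds threshold\n(2); inspection of sequence is recommended\n"
    )
    for name, found, threshold in flagged_sequences(sample):
        print(f"{name}: {found} stop codons (threshold {threshold})")
```

Each flagged sequence should then be inspected as described in the matK editing steps (editing error versus possible pseudogene).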

Sequence editing: rbcLa


• Import Forward and Reverse traces into CodonCode Aligner.
• Clip ends.
• Sort by quality then delete sequences ≤ 250bp and controls.
• Assemble by direction:
o Forward: Go to end (3’), find the conserved sequence ‘CGCGGT’ in your
consensus sequences and delete it (plus everything beyond). Change colour,
check for poor quality sequences.
o Reverse: Ensure that sequences are in reverse direction (red). Look at the start
(left), find ‘AAAGC’+A. Careful examination is needed as this conserved region is
repeated a bit further downstream. Delete the first occurrence of the region plus
an extra A (to ensure it is in proper frame when checking for presence of stop
codons prior to uploading to BOLD).
• Unassemble.
• Assemble by clone. Edit contigs.
• Edit individual unassembled traces (those without a complement direction).
• Ensure directions are correct: forward traces are in black, reverse traces are in red.
• Check length of assembled clones. If universal primers are used, the fragment is 552 bp.
If a sequence is longer than 552 bp, determine whether a gap is present that should be
removed. If no gap is present and the extra length comes from a valid insertion, verify
that the sequence is not a pseudogene (see below).
• Compare contigs (use MUSCLE).
o Look for gaps (‘-‘) and N’s in the contigs and correct when needed.
o Look for gaps (‘-‘) and N’s in individual unassembled sequences and correct when
needed.
o Change preferences to ‘majority consensus’ and edit discrepancies.
• Add gaps to the front (5’) of sequences that need it (so as to ensure all start at the same
position).
• Export contigs with gaps as fasta file.
• Export samples with gaps as fasta file.
• Merge both files in Notepad or MEGA and save as fasta file.
• Open the merged file in MEGA and check translation into amino acids. ENSURE NO
STOP CODONS (*) ARE PRESENT (search for symbol *).
• If a stop codon is detected, verify its validity:
o If editing issue (e.g., nucleotide edited incorrectly), correct it.
o If valid stop codon, it is probably a pseudogene and needs to be submitted
SEPARATELY to ‘rbcL-like’ instead of to rbcLa.
• Submit sequence data to BOLD:
o rbcL sequences submitted to ‘rbcLa’.
o rbcL sequences with stop codons submitted to ‘rbcL-like’ (this marker needs to be
added to the project before sequence upload)
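
The merge and length-check steps above can also be scripted instead of using Notepad. The following is a minimal sketch with hypothetical function names: it merges the two exported fasta files (contigs and unassembled samples) and lists sequences whose ungapped length exceeds the 552 bp expected with the universal primers. It assumes sequence names are unique across both files.

```python
def parse_fasta(text):
    """Parse fasta-formatted text into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def merge_fasta(*texts):
    """Merge several fasta texts; later files overwrite duplicate names."""
    merged = {}
    for text in texts:
        merged.update(parse_fasta(text))
    return merged


def longer_than_expected(records, expected_bp=552):
    """Names of sequences whose length without gaps exceeds the expected length."""
    return [name for name, seq in records.items() if len(seq.replace("-", "")) > expected_bp]


def to_fasta(records):
    """Write records back to fasta text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())
```

A typical use would be to read both exported files, call `merge_fasta` on their contents, review the names returned by `longer_than_expected`, and save the output of `to_fasta` as the merged file to open in MEGA.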

Sequence editing: ITS


• Import Forward and Reverse traces into CodonCode Aligner.
• Clip ends.
• Sort by quality then delete short sequences and controls.
• Assemble by direction and review.
o Forward: look for reverse primer at the end of the sequence: TTTAA/GCATATCA
o Reverse: look for forward primer GTGAAT/TGCAGAATC
o Keep the BOLDED sequences and delete the italicized
• Unassemble.
• Assemble by clone. Edit each contig (correct N’s and ‘-‘). Do not correct individual reads
(unless necessary). If the contig is bad, unassemble and edit each sequence individually
and keep the best sequence/trace.
• Edit individual unassembled traces (those without a complement direction).
• Ensure directions are correct: forward traces are in black, reverse traces are red.
• BLAST all short sequences (both singles and contigs) to make sure they are the correct
kingdom.
• Compare contigs (use MUSCLE).
o Correct N’s if possible; remaining N’s should be valid, given that all contigs and
individual samples have been checked in the step above.
o Change preferences to ‘majority consensus’ to edit discrepancies, then change back.
• BLAST the first 4 and last 4 sequences in the contig comparison to see if they are
fungal/plant (the clustering algorithm will group based on similarity, so the odd ones tend
to go to the top or the bottom).
• Export contigs without gaps.
• Export unassembled samples without gaps.
• Merge in Notepad and save as fasta file.
• Import/Open the merged fasta file in BioEdit. Select all sequences, click on the Accessory
Application tab, choose the ClustalW Multiple Alignment option, then under the multiple
alignment section enter 10 for gap open and 5 for gap extend. Hit ‘Run ClustalW’. Save
the alignment.
• Submit sequence data to BOLD.

Annex X: BOLD Data Submission


BOLD provides a platform to deposit DNA barcodes associated with specimen taxonomy and
provenance information. Prior to uploading data, however, an account needs to be created (see
Figure 4 for links to the login page).

Upon log in, the main console is displayed, the control desk of the entire BOLD experience.

On the upper side there is a search bar (for project, datasets and records). On the right corner,
there are links to main general tools in the following order: 1) main BOLD databases (Public Data,
BINs, Publications, Primers), 2) BOLD ID Engine, 3) Taxonomy Browser, 4) Main Console, and
5) Resources (documentation and BOLD API).

In the middle of the console, there are green buttons to create new projects and/or new datasets
and blue buttons for data upload (once the records are inserted in BOLD).

The left-side console provides other links (Projects, Checklists, Primers, Main Menu) and
becomes the main navigation pane within a project (holding all the analytical tools).

To insert data into BOLD, a new project needs to be created. Then, Specimen Data can be
inserted manually, one by one, or in batch, submitted through the Uploads tool to the BOLD team
for validation and upload to BOLD. The batch upload is performed by filling out and submitting
standard MS Excel templates that can be downloaded from BOLD. These spreadsheets can also
be generated automatically from the Electronic Field Journal using the ‘BOLD Data Output’
function (Annex II).

Note: Each data record can only be submitted once as a new record, but can later be updated an
indefinite number of times.

Once all records are in BOLD, images can be uploaded manually, one by one, or in batch through
the Uploads->Images tool (in light green in the figure above). All images (.jpg only) and the
associated spreadsheet (template provided by BOLD) are compressed in a Zip folder (<190 Mb)
and submitted to BOLD. Any issue will trigger an error message from BOLD.

Once chromatograms are received from the sequencer, these files can be uploaded manually,
one by one, or in batch through the Uploads->Traces tool (in blue in the figure above). All traces
(.abi files) and the associated spreadsheet (template provided by BOLD) are compressed in a Zip
folder (<190 Mb) and submitted to BOLD. Any issue will trigger an error message from BOLD
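
Preparing the Zip folder for image or trace uploads can be scripted. The sketch below is an illustration only: the function name is hypothetical, the 190 Mb limit is interpreted as 190 million bytes (an assumption), and files are stored without sub-folders (also an assumption about what the upload tool expects). It bundles the files with the associated spreadsheet and reports an error if the archive is too large.

```python
import zipfile
from pathlib import Path

# Assumption: the "<190 Mb" limit is read as 190 million bytes.
MAX_ZIP_BYTES = 190 * 1000 * 1000


def build_upload_zip(files, zip_path, max_bytes=MAX_ZIP_BYTES):
    """Zip the given files without sub-folders and return the archive size in bytes.

    Raises ValueError when the archive reaches the size limit, in which case
    the batch should be split into several smaller archives.
    """
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in map(Path, files):
            archive.write(file, arcname=file.name)
    size = zip_path.stat().st_size
    if size >= max_bytes:
        raise ValueError(f"{zip_path.name} is {size} bytes; split the batch into smaller archives")
    return size
```

The list passed to `build_upload_zip` would contain all image (.jpg) or trace (.abi) files of the batch plus the filled-in spreadsheet template downloaded from BOLD.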

Finally, once traces are edited, DNA sequences can be uploaded to BOLD through the batch
submission tool (in purple in the figure above). The name of the sequencing facility needs to be
added to the Run Site (pre-filled with institution names already in BOLD; new institutions can be
added).

More details on data submission can be found in the BOLD Handbook:

http://www.boldsystems.org/index.php/resources/handbook?chapter=3_submissions.html#data_submissions

Once all data and metadata are in BOLD, the project console is populated with records and
statistics.

For updates to existing records, there are several options:

• If only a few records need some fields updated, the update can be done manually through
the Specimen Page of each record (highlighted in the red rectangle);
• If many records need to be updated, a batch update can be submitted through the same
tool as new records (with the same spreadsheet template but specifying the tab that
needs updates: Voucher, Taxonomy, Specimen, Collection);
• If there are errors in images or traces, a request needs to be sent to the BOLD team for
those files to be deleted;
• If sequences need to be updated (based on new sequence editing etc), a new sequence
upload, by the user, will overwrite the existing information. Note: sequences can be
individually updated through the Sequence Page of each record (highlighted in the green
rectangle).
