Taberlet Et Al., 2007


Published online 14 December 2006
Nucleic Acids Research, 2007, Vol. 35, No. 3 e14
doi:10.1093/nar/gkl938

Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding
Pierre Taberlet1,*, Eric Coissac2,3, François Pompanon1, Ludovic Gielly1, Christian Miquel1, Alice Valentini1,4,5, Thierry Vermat6, Gérard Corthier7, Christian Brochmann8 and Eske Willerslev9

1 Laboratoire d’Ecologie Alpine, CNRS UMR 5553, Université Joseph Fourier, BP 53, 38041 Grenoble Cedex 9, France
2 Laboratoire Adaptation et Pathogénie des Microorganismes, CNRS UMR 5163, Université Joseph Fourier, BP 170, 38042 Grenoble Cedex 9, France
3 INRIA Rhône-Alpes, Hélix Project, 655 Avenue de l’Europe, 38334 Montbonnot Cedex, France
4 Dipartimento di Ecologia e Sviluppo Economico Sostenibile, Università degli Studi della Tuscia, via S. Giovanni Decollato 1, 01100 Viterbo, Italy
5 Department of Ecology and Natural Resource Management, Norwegian University of Life Sciences, PO Box 5003, NO-1432 Ås, Norway
6 Bioinformatics, GENOME Express, 11 Chemin des Prés, 38944 Meylan, France
7 UR 910 Ecologie et Physiologie du Système Digestif, INRA Domaine de Vilvert, 78352 Jouy-en-Josas Cedex, France
8 National Centre for Biosystematics, Natural History Museum, University of Oslo, PO Box 1172 Blindern, NO-0318 Oslo, Norway
9 Center for Ancient Genetics, Niels Bohr Institute & Biological Institutes, University of Copenhagen, Juliane Maries vej 30, DK-2100 Copenhagen, Denmark

Received June 29, 2006; Revised September 21, 2006; Accepted October 16, 2006

*To whom correspondence should be addressed. Tel: +33 476 51 45 24; Fax: +33 476 51 42 79; Email: [email protected]

© 2006 The Author(s).
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/nar/article-abstract/35/3/e14/2401927 by guest on 17 June 2018

ABSTRACT

DNA barcoding should provide rapid, accurate and automatable species identifications by using a standardized DNA region as a tag. Based on sequences available in GenBank and sequences produced for this study, we evaluated the resolution power of the whole chloroplast trnL (UAA) intron (254–767 bp) and of a shorter fragment of this intron (the P6 loop, 10–143 bp) amplified with highly conserved primers. The main limitation of the whole trnL intron for DNA barcoding remains its relatively low resolution (67.3% of the species from GenBank unambiguously identified). The resolution of the P6 loop is lower (19.5% identified) but remains higher than those of existing alternative systems. The resolution is much higher in specific contexts such as species originating from a single ecosystem, or commonly eaten plants. Despite the relatively low resolution, the whole trnL intron and its P6 loop have many advantages: the primers are highly conserved, and the amplification system is very robust. The P6 loop can even be amplified when using highly degraded DNA from processed food or from permafrost samples, and has the potential to be extensively used in food industry, in forensic science, in diet analyses based on feces and in ancient DNA studies.

INTRODUCTION

DNA barcoding is a relatively new concept (1,2), aiming to provide rapid, accurate and automatable species identifications by using a standardized DNA region as a tag (3). As recently pointed out by Chase et al. (4), there are two categories of potential DNA barcode users: taxonomists and scientists in other fields (e.g. forensic science, biotechnology and food industry, animal diet).

According to the current technology, the ideal DNA barcoding system should meet the following criteria. First, it should be sufficiently variable to discriminate among all species, but conserved enough to be less variable within than between species. Second, it should be standardized, with the same DNA region as far as possible used for different taxonomic groups. Third, the target DNA region should contain enough phylogenetic information to easily assign species to its taxonomic group (genus, family, etc.). Fourth, it should be extremely robust, with highly conserved priming sites, and highly reliable DNA amplifications and sequencing. This is particularly important when using environmental DNA where each extract contains a mixture of many species to be identified at the same time. Fifth, the target DNA region should be short enough to allow amplification of degraded DNA. Unfortunately, such an ideal DNA marker does not exist. However, for different categories of users (i.e. taxonomists versus scientists in other fields), the five criteria listed above will not be equally important. For example, a high level of variation with sufficient phylogenetic information will be most important for taxonomists. In contrast, the levels of standardization and robustness will be most important in forensics or when analyzing processed food.
So far, methodological papers published on DNA barcoding have typically dealt with the most suitable region of the genome according to the taxonomists’ point of view [e.g. Ref. (5–7)]. In animals, the 5′ fragment of the mitochondrial gene for the cytochrome oxidase subunit I (COI or COXI) represents a good candidate [e.g. Ref. (5,8,9)]. However, there is no consensus in the scientific community, and 16S rRNA, another mitochondrial gene, or the nuclear ribosomal DNA have also been proposed as useful barcoding markers (7,10). In plants, the situation is much more difficult, because both the mitochondrial and chloroplast genomes are evolving too slowly to provide enough variation. For taxonomists, the current strategy is to sequence several DNA regions (4), including both nuclear and chloroplast fragments such as the internal transcribed spacer (ITS) region of the 18S–5.8S–26S nuclear ribosomal cistron (11) or the chloroplast trnH–psbA region (6).

In this study, we approach the plant DNA barcoding problem in another way, by emphasizing the point of view of scientists other than taxonomists, looking for standardized and robust methodologies. For this purpose, we must find a genome region as variable as possible, but bearing the possibility of designing highly conserved PCR primers that amplify a very short DNA region, of no more than 100–150 bp. Such a short region should allow reliable amplifications of even highly degraded DNA found in processed food or in fossil remains. Up to now, when working with substrates such as ancient DNA, the strategy has been to use primers based on the chloroplast rbcL gene (12), but this system only allows in most cases the identification of families, not genera or species.

The chloroplast trnL (UAA) intron may represent a good target region for our purpose. Its sequences have been widely used for reconstructing phylogenies between closely related species (13–15) or for identifying plant species (16,17). Nevertheless, it is widely recognized that it does not represent the most variable non-coding region of chloroplast DNA (18), but it bears some unique advantages. Universal primers for this region were designed 15 years ago (19), and subsequently extensively used, mainly in phylogenetic studies among closely related genera and species (20). The evolution of the trnL (UAA) intron has been thoroughly analyzed and is well understood (21,22). Furthermore, this region is the only Group I intron in chloroplast DNA (23,24). This means that it has a conserved secondary structure (25,26) with alternation of conserved and variable regions (22). As a consequence, the alignment of diverse trnL intron sequences might allow the design of new versatile primers embedded in conserved regions and amplifying the short variable region in between.

More specifically, our objective in this paper is to evaluate the power and the limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding, and to assess the possibility for designing a new system allowing species identification with highly degraded DNA.

MATERIALS AND METHODS

General strategy

The power and the robustness of the trnL intron for DNA barcoding were first evaluated with the data available in GenBank. Then, they were evaluated on two specific datasets by sequencing the whole intron for more than 100 plant species originating from the same environment, and by compiling sequences of the main plants used in the food industry. Finally, we tested the robustness of a new pair of internal primers applied on different substrates supposed to contain highly degraded DNA.

Primers used

Figure 1 presents the location of the primers in the chloroplast trnL (UAA) gene, and Table 1 gives their sequences. The primers c and d are from Taberlet et al. (19). This fragment encompasses the entire trnL (UAA) intron plus a few base pairs on each side belonging to the trnL (UAA) gene itself. The primers g and h were designed for this study on two highly conserved regions after aligning various sequences, either from GenBank or produced earlier in the Grenoble laboratory.

Figure 1. Position of the primers c, d, g and h on the chloroplast trnL (UAA) gene. The P6 loop amplified with primers g and h is indicated in green.

Table 1. Sequences of the two universal primer pairs amplifying the trnL (UAA) intron

Name  Code    Sequence 5′–3′
c     A49325  CGAAATCGGTAGACGCTACG
d     B49863  GGGGATAGAGGGACTTGAAC
g     A49425  GGGCAATCCTGAGCCAA
h     B49466  CCATTGAGTCTCTGCACCTATC

Length of the amplified fragment with primers c–d in tobacco: 456 bp. Length of the amplified fragment with primers g–h in tobacco: 40 bp. The code denotes the 3′-most base pair in the published tobacco cpDNA sequence (23). Primers c and d are from Taberlet et al. (19). Primers g and h were designed for this study (France patent no 2 876 378; April 14, 2006).

The Arctic plant dataset

We analyzed 123 arctic plant samples collected between 1998 and 2003, partly taken from herbarium specimens and partly from field-collected, silica-dried leaf samples deposited at the Natural History Museum in Oslo. Total DNA was extracted from around 10 mg of dried leaf tissue with the DNeasy 96 Plant Kit (Qiagen), following the manufacturer’s protocol. Double-stranded DNA amplifications were performed in volumes of 25 µl containing 2.5 mM MgCl2, 200 µM of each dNTP, 1 µM of each primer and 1 U of AmpliTaq Gold DNA polymerase (Applied Biosystems). The trnL (UAA) intron was amplified with primers c and d (19). Following an activation step of 10 min at 95°C for the enzyme (Applied Biosystems specification), the PCR mixture underwent 35 cycles of 30 s at 95°C, 30 s at 50°C and 2 min at 72°C on a GeneAmp PCR system 2720 (Applied Biosystems).


To remove excess primers and deoxynucleotide triphosphates after amplification, PCR products were purified on QIAquick PCR Purification Kit columns (Qiagen), according to the manufacturer’s instructions. Sequencing was performed, on both strands, using the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems) in volumes of 20 µl containing 20 ng of purified DNA and 4 pmol of amplification primer, according to the manufacturer’s specifications. Sequencing reactions underwent 25 cycles of 30 s at 96°C, 30 s at 50°C and 4 min at 60°C. Excess dye terminators were removed by a spin-column purification. Sequencing reactions were electrophoresed for 45 min on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems) using 36 cm capillaries and POP-4 polymer.

The Food dataset

Seventy-two sequences of the main plants used in the food industry were retrieved from GenBank or sequenced following the previous protocol. For this analysis, we restricted our investigations to the short fragment amplified with the g–h primer pair.

Bioinformatic approach

PCRs were simulated on the full plant division of GenBank downloaded from the NCBI server on December 14, 2005 (ftp://ftp.ncbi.nlm.nih.gov/genbank). This release corresponds to 731 531 entries. The electronic PCR software (ePCR) was specially developed for this study. It is based on the agrep algorithm (27), which identifies occurrences of a small pattern (corresponding to a PCR primer) in a large text (genomic sequence) with a fixed maximum mismatch count. This strategy is more relevant than simple blast queries, which are not suitable for identifying similarity between nucleic sequences when the query sequence (here an oligonucleotide sequence) is too short. Our ePCR software allows specifying the maximum mismatch count and the minimum and maximum length of the amplified region, and also retrieves taxonomic data from the analyzed entries. It works on GenBank, EMBL or fasta formatted sequence files (in the latter case, taxonomic data must be encoded in a special format on the title line). The ePCR software is available for academic users upon e-mail request to Eric Coissac ([email protected]).

ePCR was run on GenBank data, first with the c and d primers, second with the g and h primers, third on a short rbcL fragment with the h1aF and h2aR primers (12), and finally with eight primer pairs found in Shaw et al. (18). ePCR was also run on the arctic plant dataset with the c and d primers (after adding the c and d sequences on each side of the sequenced PCR product), and with the g and h primers.

Next, amplicon databases constructed by the ePCR software were analyzed to extract taxonomic specificities of the amplified sequences. This analysis used the taxonomic classification provided by NCBI to assess taxonomic relationships between sequences. The main goal of this analysis was to determine the proportion of the species, genera and families unambiguously identified by the sequences amplified via ePCR. A taxon (species, genus or family) was defined as ‘unambiguously identified’ if all the sequences associated with this taxon are not found in any other taxa. To limit the influence of the taxonomic coverage of the GenBank database, we discarded genera represented by only one species and families represented by only one genus. The same measure of specificity was applied to the arctic plant dataset described above. We also assessed the intraspecific variation of the whole trnL intron and of the short P6 loop fragment by extracting, from the GenBank amplicon database constructed by the ePCR software, all the species represented by more than one entry.

Primer ‘universality’

The universality of the four primers c, d, g and h was examined by comparing their sequences with homologous sequences, either from GenBank (for primers c, d, g and h) or produced in this study (for primers g and h).

Robustness of the system for biotechnological applications

To illustrate the possibility of using the g–h primer pair in biotechnology, we retrieved from GenBank some sequences corresponding to common plant species frequently used in food industry. To demonstrate the robustness of the system using the g and h primers, we tried to amplify this fragment in several highly degraded templates, such as processed food (four samples: brown sugar from sugar cane, cooked potatoes, cooked pasta and lyophilized potage), human feces (two samples) and permafrost samples (four samples). Appropriate criteria for the retrieval of highly degraded DNA were followed (28). This included DNA extraction and PCR setup in dedicated and isolated ancient DNA facilities in Grenoble and Copenhagen, and the use of multiple extraction and PCR blank controls. Importantly, the permafrost sample had been drilled while spiking the drilling apparatus with a recognizable bacterial vector (pCR4-TOPO; Stratagene) to test for contamination during drilling and handling. After arrival (frozen) in the laboratory, 2–3 cm of the core surfaces was removed. The outer scrape and the interior core material were subjected to DNA extractions followed by 40 cycles of PCR using vector-specific primers T3/T7. No vector contaminants were detected in the inner core extracts used for the plant DNA studies. For processed food, total DNA was extracted from 50 mg of dried material using the DNeasy Tissue Kit (Qiagen) following the manufacturer’s instructions. The DNA extract was recovered in a volume of 200 µl. Total DNA was extracted according to Godon et al. (29) and to Willerslev et al. (30) for the human feces and the permafrost sample, respectively. DNA amplifications were carried out using the primers g and h in a final volume of 25 µl, using 2.5 µl of DNA extract as template. The amplification mixture contained 1 U of AmpliTaq Gold DNA Polymerase (Applied Biosystems), 10 mM Tris–HCl, 50 mM KCl, 2 mM of MgCl2, 0.2 mM of each dNTP, 1 µM of each primer (for some experiments, the g primer was labeled with the HEX fluorochrome, or the h primer was labeled with the FAM fluorochrome), and 200 µg/ml of BSA (Roche). After 10 min at 95°C (Taq activation), the PCR cycles were as follows: 35 cycles of 30 s at 95°C, 30 s at 55°C and 30 s at 72°C, except for the sugar extract for which we performed 50 cycles, and for the amplifications


with the fluorescent g primer for which we removed the elongation time in order to reduce the +A artefact (31,32). PCR products obtained with the fluorescent g or h primers were electrophoresed for 35 min on an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems) using 36 cm capillaries and POP-4 polymer. PCR products obtained with non-fluorescent primers were either directly sequenced, or cloned (except for the permafrost samples) if the sequences obtained with direct sequencing were not readable (i.e. a mixture of different sequences).

RESULTS

The three datasets

Via the ePCR with primers c and d we retrieved 1308 sequences from GenBank, corresponding to 706 species, 366 genera and 119 families (excluding all sequences with at least one ambiguous nucleotide, and excluding genera with a single species and families with a single genus). With primers g and h, we retrieved 18 200 sequences, corresponding to 11 404 species, 4215 genera and 410 families. These 18 200 sequences give a good evaluation of the number of chloroplast trnL (UAA) intron sequences in GenBank. The much lower number obtained for the c–d ePCR is simply due to the fact that the recorded sequences do not contain the primer sequences, and thus are not ‘amplified’ via our ePCR approach. The arctic plant dataset produced for this study consists of 132 species, 58 genera and 28 families (GenBank accession nos DQ860511–DQ860642). The food dataset analyzed for primers g and h consists of 72 species, 64 genera and 37 families retrieved from GenBank, or produced for this study (GenBank accession numbers of species sequenced for this study: EF010967–EF010973).

For all datasets, the length of the sequences amplified with c and d varies from 254 to 767 bp, and the length of the P6 loop amplified with g and h varies from 10 bp in Cuscuta indecora to 143 bp in Schoenoplectus littoralis.

Universality of primer sites

Table 1 presents the sequences of the two primer pairs c–d, and g–h. Figure 2 shows the exact positions of the four

Figure 2. Positions of the primers c and d on the secondary structure of the trnL (UAA) exon (A) and of the primers g and h on the secondary structure of the trnL (UAA) intron (B) for Nymphaea odorata [modified from Ref. (33)]. Highly conserved elements of the catalytic core (P, Q, R1, R2 and S) are located in grey boxes. The P6 loop, amplified with primers g and h, is identified by green letters. The 3′ ends of each of the four primers c, d, g and h are marked out by an arrow and their positions are identified by red letters.


Table 2. Sequence variation of priming sites for primers c, d, g and h

Primer  Sequence 5′–3′           %      Species                   Acc. no.
c       CGAAATCGGTAGACGCTACG     76.65  Nicotiana tabacum         M16898
        ......T.............     17.46  Carex phacota             AB079396
        ......T..........G..      2.86  Angelica archangelica     AF444007
        .........C..........      2.07  Manulea annua             AJ550529
        ..G.................      0.69  Luzula rufa               AY437945
d       GGGGATAGAGGGACTTGAAC     94.18  N. tabacum                M16898
        .............T......      2.76  Elegia cuspidata          AF148735
        ...............C....      1.08  Nymphaea alba             AJ627251
        ...A................      0.89  Cephalanthus natalensis   AJ414549
g       GGGCAATCCTGAGCCAA        92.55  N. tabacum                M16898
        ...T.............         3.78  Picea abies               AB045065
        .......T.........         1.27  Apteranthes europaea      AJ488313
        ....G.......T....         0.51  Lamium purpureum          AJ608588
h       CCATTGAGTCTCTGCACCTATC   65.60  N. tabacum                M16898
        ..G...................   16.15  Sedum clavatum            AY540575
        ....C.................    9.74  Veronica davisii          AY540871
        ..G.C.................    4.28  Stapeliopsis pillansii    AY780507
        ..T...................    1.55  Cinnamomum zeylanicum     AB040085
        ..T................T..    0.60  Corryocactus brevistylus  AY015393

Only variants at a frequency higher than 0.005 are indicated. A dot marks a base identical to the primer sequence. A total of 1014 and 14 145 GenBank entries were used for the primer pairs c–d and g–h, respectively. %: percentage of sequence variants found in GenBank. Species: example of species corresponding to the sequence variant. Acc. no.: accession number in GenBank.
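
The priming-site variants in Table 2 are the reason the in-silico PCR described in the Methods tolerates a fixed number of mismatches. The following is a minimal Python sketch of that idea, not the authors' ePCR program: it uses a brute-force scan in place of the agrep algorithm, and the function names, the default parameters and the synthetic template are all assumptions made for illustration. The template joins the Picea abies variant of the g priming site (Table 2) to the potato P6 loop (Table 4) and the reverse complement of primer h.

```python
# Illustrative mismatch-tolerant in-silico PCR (brute force, not agrep).
# All names and the synthetic template below are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sites(primer, template, max_mismatch):
    """Start positions where the primer matches within max_mismatch."""
    n = len(primer)
    return [i for i in range(len(template) - n + 1)
            if count_mismatches(primer, template[i:i + n]) <= max_mismatch]


def epcr(forward, reverse, template, max_mismatch=1, min_len=1, max_len=1000):
    """Return amplified regions (primers excluded) on the given strand."""
    reverse_site = reverse_complement(reverse)
    amplicons = []
    for i in find_sites(forward, template, max_mismatch):
        start = i + len(forward)
        for j in find_sites(reverse_site, template, max_mismatch):
            if min_len <= j - start <= max_len:
                amplicons.append(template[start:j])
    return amplicons


PRIMER_G = "GGGCAATCCTGAGCCAA"              # Table 1
PRIMER_H = "CCATTGAGTCTCTGCACCTATC"         # Table 1
G_SITE_VARIANT = "GGGTAATCCTGAGCCAA"        # Table 2, one mismatch to g
P6_POTATO = "ATCCTGTTTTCTGAAAACAAACAAAGGTTCAGAAAAAAAG"  # Table 4

# Synthetic template, built only for this example.
template = ("ACGT" + G_SITE_VARIANT + P6_POTATO
            + reverse_complement(PRIMER_H) + "ACGT")

strict = epcr(PRIMER_G, PRIMER_H, template, max_mismatch=0)
tolerant = epcr(PRIMER_G, PRIMER_H, template, max_mismatch=1)
print(strict)    # no amplicon when exact primer matches are required
print(tolerant)  # the P6 loop is recovered when one mismatch is allowed
```

With exact matching the variant priming site is missed, whereas allowing one mismatch recovers the 40 bp P6 loop. A production tool would also scan the opposite strand and use an indexed or bit-parallel search rather than this quadratic scan.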

primers used in the secondary structure RNAs produced by both the trnL (UAA) exon and the trnL (UAA) intron. Primers g and h are located on highly conserved catalytic parts of the intron, leading to the amplification of the short P6 loop.

Table 2 shows the variation at the priming sites. Only sequence variants with a frequency of more than 0.005 were listed. Primers c and d are highly conserved among land plants, from Angiosperms to Bryophytes. Even in some algae, this primer pair has the potential to produce PCR products. The very large number of trnL (UAA) intron sequences retrieved as well as those produced for this study allowed an extensive evaluation of the universality of primers g and h. These new primers are highly conserved in Angiosperms and Gymnosperms.

Proportions of species, genera and families identified

Table 3 shows the percentage of species, genera and families properly identified using the primer pairs c–d and g–h in both the GenBank and arctic plant datasets, and the primer pair h1aF–h2aR (12). Globally, on the GenBank dataset, the entire trnL (UAA) intron and the P6 loop amplified with primers g and h allow the identification of 67.3 and 19.5% of the species, respectively, without taking into account single species within a genus. However, these values are probably underestimates, because of the possibility of misidentification in GenBank (i.e. a wrong species assignment, either by misidentification of the specimen, by problems of synonymy or by PCR contamination). The ePCR using other primer pairs found in Shaw et al. (18), which amplify psbB-psbH, rpoB-trnC (GCA), rpS16 intron, trnD (GUC)-trnT (GGU), trnH (GUG)-psbA and trnS (UGA)-trnfM (CAU), never retrieved more than 100 sequences, and were not taken into account. Table 4 illustrates the sequence variation of g–h amplicons for commonly eaten plant species.

Among all the amplicons retrieved from GenBank by using the ePCR software, the percentage of species represented by more than a single entry was 11% for the whole trnL intron and 14% for the P6 loop. This subset of sequences allowed us to estimate the lower and upper limits of the intraspecific variability. The lower limit was estimated assuming no variation in species represented by a single entry in GenBank, and the upper limit by taking into account only species represented by more than one entry in GenBank. The intraspecific variability lies between 5.9 and 55.0% for the whole intron, and 3.4 and 24.1% for the P6 loop. However, the upper values certainly represent a large overestimation of the real values, because a single entry in GenBank might correspond to many analyzed individuals from the same species. Furthermore, for the P6 loop, the intraspecific polymorphism does not compromise the species identification in 85 cases out of 481.

Robustness of the system using the g and h primers

We obtained PCR products with 35 cycles for all the samples analyzed, except for the sugar sample, for which 50 cycles were necessary. After electrophoresis of the fluorescent PCR products, some samples gave a single peak (data not shown; sugar, cooked potatoes, cooked pasta) while all the other samples gave a multi-peak profile. The sequences obtained after direct sequencing for the three samples that gave a single peak correspond to sugarcane (Saccharum officinarum), potato (Solanum tuberosum) and wheat (Triticum vulgare). Figure 3 illustrates the multi-peak profiles obtained after electrophoresis of the fluorescent PCR products for a permafrost sample more than 20 000 years old, and for a human fecal sample. The PCR products of the lyophilized potage and of the human feces were cloned and sequenced. Table 5 shows the sequences obtained after cloning the PCR product obtained from the lyophilized potage. Twenty-three clones were sequenced, and three species were unambiguously identified: leek (Allium porrum), potato (S. tuberosum) and onion (Allium cepa). The same approach was used for the human feces, and the plant species identified are banana (Musa acuminata), lettuce (Lactuca sativa) and cacao (Theobroma cacao).
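
The lower and upper limits of intraspecific variability reported in the Results can be related by simple arithmetic. Under the stated definitions, both limits count the same variable species, but the upper limit divides by the species with more than one GenBank entry and the lower limit divides by all species, so the lower limit should be roughly the upper limit multiplied by the multi-entry fraction. The short Python check below is a sketch under that reading; the function name is hypothetical, and small deviations are expected because the published 11% and 14% are rounded.

```python
# Rough consistency check of the reported intraspecific variability limits.
# Assumption: lower limit = upper limit x fraction of multi-entry species.

def lower_limit(upper_limit_pct, multi_entry_pct):
    """Lower limit (%) implied by the upper limit and the multi-entry share."""
    return upper_limit_pct * multi_entry_pct / 100


whole_intron = lower_limit(55.0, 11)  # reported lower limit: 5.9%
p6_loop = lower_limit(24.1, 14)       # reported lower limit: 3.4%

print(f"whole intron: {whole_intron:.2f}%")
print(f"P6 loop: {p6_loop:.2f}%")
```

The P6 loop value rounds to the reported 3.4%, and the whole-intron value comes out slightly above the reported 5.9%, which is what one would expect if the true multi-entry share is a little below the rounded 11%.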


Table 3. Percentages of species, genera and families identified using the chloroplast trnL (UAA) intron, the P6 loop of this intron and comparison with another primer pair

cpDNA region (primers)        Dataset           Length variation (bp)a  No. of species/genera/families analyzedb  Species (%)  Genus (%)  Family (%)
trnL (UAA) intron (c and d)   GenBank           254–767                 706/366/119                               67.28        86.34      100.00
trnL (UAA) intron (c and d)   Arctic plants     355–653                 103/47/24                                 85.44        100.00     100.00
P6 loop (g and h)             GenBank           10–143                  11 404/4225/310                           19.48        41.40      79.35
P6 loop (g and h)             Arctic plants     22–83                   106/48/25                                 47.17        89.58      100.00
P6 loop (g and h)             Food              22–65                   72/64/37                                  77.78        87.50      100.00
P6 loop (g and h)             GenBank subsetc   10–127                  1524/1525/244                             24.02        59.48      90.57
rbcL (h1aF and h2aR) (12)     GenBank subsetc   91–98                   1524/1525/244                             15.09        37.51      68.03

Note that these estimates were made by taking into account genera with more than two species for the species identification, families with more than two genera for genus identification, and orders with more than two families for family identification.
a Length in base pairs excluding primers.
b Excluding families with a single genus, genera with a single species and species alone in a genus, except for the food dataset.
c Based on species in common between the g–h and the h1aF–h2aR datasets.
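
The percentages in Table 3 rest on the definition given in the Methods: a taxon is unambiguously identified if none of its sequences is found in any other taxon. The Python sketch below shows one way such a proportion could be computed from (taxon, sequence) pairs. The function names and the toy records are hypothetical, and the filtering of single-species genera and single-genus families described in the Methods is left out for brevity.

```python
# Sketch of the 'unambiguously identified' criterion on toy data.
# Function names and records are illustrative, not from the ePCR software.
from collections import defaultdict


def unambiguously_identified(records):
    """Return taxa whose sequences all occur in no other taxon.

    records: iterable of (taxon, sequence) pairs.
    """
    taxa_by_sequence = defaultdict(set)
    sequences_by_taxon = defaultdict(set)
    for taxon, sequence in records:
        taxa_by_sequence[sequence].add(taxon)
        sequences_by_taxon[taxon].add(sequence)
    return {
        taxon
        for taxon, sequences in sequences_by_taxon.items()
        if all(taxa_by_sequence[s] == {taxon} for s in sequences)
    }


def percent_identified(records):
    """Percentage of taxa that are unambiguously identified."""
    records = list(records)
    taxa = {taxon for taxon, _ in records}
    return 100.0 * len(unambiguously_identified(records)) / len(taxa)


# Toy amplicon database: sp_A shares a sequence with sp_B, so neither counts.
toy_records = [
    ("sp_A", "AAT"),
    ("sp_B", "AAT"),
    ("sp_B", "AAC"),
    ("sp_C", "GGT"),
    ("sp_D", "GGA"),
]

print(sorted(unambiguously_identified(toy_records)))
print(percent_identified(toy_records))
```

Note that sp_B is rejected even though one of its two sequences is unique, because the criterion requires every sequence of the taxon to be absent from all other taxa. The same function applies at genus or family level if the taxon label is replaced by the genus or family name.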

Table 4. Example of P6 loop [trnL (UAA)] sequences of commonly eaten plant species amplified with primers g and h

Common name    Scientific name        Acc. no.   P6 loop sequence amplified with primers g and h
Cacao          Theobroma cacao        EF010969   ATCCTATTATTTTATTATTTTACGAAACTAAACAAAGGTTCAGCAAGCGAGAATAATAAAAAAAG
Beet           Beta vulgaris          EF010967   CTCCTTTTTTCAAAAGAAAAAAAATAAGGATTCCGAAAACAAGAATAAAAAAAAAG
Sugarcane      Saccharum officinarum  AY116253   ATCCCCTTTTTTGAAAAAACAAGTGGTTCTCAAACTAGAACCCAAAGGAAAAG
Wheat          Triticum aestivum      AB042240   ATCCGTGTTTTGAGAAAACAAGGGGTTCTCGAACTAGAATACAAAGGAAAAG
Rye            Secale cereale         AF519162   ATCCGTGTTTTGAGAAAACAAGGGGTTCTCGAACTAGAATACAAAGGAAAAG
Rice           Oryza sativa           X15901     ATCCATGTTTTGAGAAAACAAGCGGTTCTCGAACTAGAACCCAAAGGAAAAG
Millet         Panicum miliaceum      AY142738   ATCCCTTTTTTGAAAAAACAAGTGGTTCTCAAACTAGAACCCAAAGGAAAAG
Strawberry     Fragaria vesca         EF010971   ATCCCGTTTTATGAAAACAAACAAGGGTTTCAGAAAGCGAGAATAAATAAAG
Apricot        Prunus armeniaca       EF010968   ATCCTGTTTTATTAAAACAAACAAGGGTTTCATAAACCGAGAATAAAAAAG
Sour cherry    Prunus cerasus         EF010970   ATCCTGTTTTATTAAAACAAACAAGGGTTTCATAAACCGAGAATAAAAAAG
Maize          Zea mays               NC_001666  ATCCCTTTTTTGAAAAACAAGTGGTTCTCAAACTAGAACCCAAAGGAAAAG
Garden pea     Pisum sativum          EF010972   ATCCTTCTTTCTGAAAACAAATAAAAGTTCAGAAAGTGAAAATCAAAAAAG
Common bean    Phaseolus vulgaris     AY077945   ATCCCGTTTTCTGAAAAAAAGAAAAATTCAGAAAGTGATAATAAAAAAGG
Johnson grass  Sorghum halepense      AY116244   ATCCACTTTTTTCAAAAAAGTGGTTCTCAAACTAGAACCCAAAGGAAAAG
Lettuce        Lactuca sativa         U82042     ATCACGTTTTCCGAAAACAAACAACGGTTCAGAAAGCGAAAATCAAAAAG
Sunflower      Helianthus annuus      U82038     ATCACGTTTTCCGAAAACAAACAAAGGTTCAGAAAGCGAAAATAAAAAAG
Wild oat       Avena sativa           X75695     ATCCGTGTTTTGAGAGGGGGGTTCTCGAACTAGAATACAAAGGAAAAG
Barley         Hordeum vulgare        X74574     ATCCGTGTTTTGAGAAGGGATTCTCGAACTAGAATACAAAGGAAAAG
Potato         Solanum tuberosum      EF010973   ATCCTGTTTTCTGAAAACAAACAAAGGTTCAGAAAAAAAG
Tomato         Solanum lycopersicum   AY098703   ATCCTGTTTTCTGAAAACAAACCAAGGTTCAGAAAAAAAG
Egg plant      Solanum melongena      AY266240   ATCCTGTTTTCTCAAAACAAACAAAGGTTCAGAAAAAAAG
Radish         Raphanus sativus       AF451576   ATCCTGAGTTACGCGAACAAACCAGAGTTTAGAAAGCGG
Cabbage        Brassica oleracea      AF451574   ATCCTGGGTTACGCGAACAAAACAGAGTTTAGAAAGCGG
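
Table 4 can be read directly for resolution: some species pairs carry identical P6 loops, while others differ by only one or two substitutions. As a small illustration, the Python sketch below counts substitutions between equal-length sequences copied from Table 4. The helper name is hypothetical, and a plain position-by-position comparison is an assumption that only makes sense for sequences of the same length with no indels, as is the case for the pairs shown.

```python
# Substitution counts between P6 loop sequences taken from Table 4.
# 'hamming' is an illustrative helper; it requires equal-length inputs.

def hamming(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


p6 = {
    "potato":   "ATCCTGTTTTCTGAAAACAAACAAAGGTTCAGAAAAAAAG",
    "tomato":   "ATCCTGTTTTCTGAAAACAAACCAAGGTTCAGAAAAAAAG",
    "eggplant": "ATCCTGTTTTCTCAAAACAAACAAAGGTTCAGAAAAAAAG",
    "wheat":    "ATCCGTGTTTTGAGAAAACAAGGGGTTCTCGAACTAGAATACAAAGGAAAAG",
    "rye":      "ATCCGTGTTTTGAGAAAACAAGGGGTTCTCGAACTAGAATACAAAGGAAAAG",
}

for first, second in [("potato", "tomato"), ("potato", "eggplant"),
                      ("tomato", "eggplant"), ("wheat", "rye")]:
    print(first, second, hamming(p6[first], p6[second]))
```

In this table the wheat and rye sequences are identical, so the P6 loop alone cannot separate them, whereas the three Solanum species are separated by single substitutions.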

DISCUSSION

DNA barcoding concerns two categories of scientists: taxonomists and scientists in fields other than taxonomy (4). The goal of this paper was to evaluate the potential use of the chloroplast DNA trnL (UAA) intron for plant DNA barcoding in areas other than taxonomy. We will first discuss the drawbacks of this molecular marker, and then its advantages.

The main, and maybe the only but extremely important drawback is the relatively low resolution of the trnL (UAA) intron compared with several other noncoding chloroplast regions. This has already been pointed out in several studies (6,18). It is clear that the trnL intron does not represent the best choice for characterizing plant species and for phylogenetic studies among closely related species. Obviously, this drawback is even more dramatic when using the very short P6 loop (amplified with primers g and h), but on the same subset of species, the short P6 loop performs significantly better than the alternative system used to date when analyzing highly degraded DNA [rbcL fragment amplified with h1aF and h2aR (12)]. Finally, even if the proportion of species unambiguously identified with the P6 loop seems low (around 20%), usually only closely related species are not resolved.

It is interesting to note that the relatively low resolution of the trnL (UAA) intron is logically linked to a lower intraspecific variation, compared with other noncoding regions of


Figure 3. Example of multi-peak profiles obtained after capillary electrophoresis of the fluorescent PCR products obtained using the g and h primers.
(A) Permafrost sample drilled from Main River Ice Bluff (N.E. Siberia, 64.06N, 171.11E), between 21 050 and 25 440 years old (uncalibrated 14C years, based on
AMS dating of plant macrofossils from the section); g fluorescent primer; each peak represents at least one arctic plant species. (B) Human feces sample; h
fluorescent primer; three of the four main peaks have been identified after cloning and sequencing: peak 1, nonidentified; peak 2, banana (Musa acuminata); peak
3, lettuce (Lactuca sativa); and peak 4, cacao (Theobroma cacao).
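
The note to Table 5 states that the leek and onion P6 loop sequences differ by a single substitution. A minimal sketch of how that can be checked is given here. It assumes that the hyphens printed inside the Table 5 sequences are line-wrap marks, so each sequence is joined into one string. The function name `substitutions` is hypothetical and not part of the original study.

```python
def substitutions(seq_a, seq_b):
    """List (position, base_a, base_b) for every mismatch, 1-based.

    Only valid for sequences of equal length; sequences that differ by
    insertions or deletions would need an alignment step first.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Sequences transcribed from Table 5 (assumption: wrap hyphens removed).
LEEK = "ATCTTTATTTTTTGAAAAACAAGGGTTTAAAAAAGAGAATAAAAAAG"
ONION = "ATCTTTCTTTTTTGAAAAACAAGGGTTTAAAAAAGAGAATAAAAAAG"

if __name__ == "__main__":
    for pos, leek_base, onion_base in substitutions(LEEK, ONION):
        print(f"position {pos}: leek {leek_base}, onion {onion_base}")
```

With the sequences as transcribed, the only mismatch is at position 7 (A in leek, C in onion), which agrees with the single substitution reported in the table note.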

Table 5. Sequences obtained after cloning the PCR product from the lyophilized potage

Sequence obtained 5′–3′                          Species                     Number of clones
ATCTTTATTTTTTGAAAAACAAGGGTTTAAAAAAGAGAATAAAAAAG  Leek (Allium porrum)        19
ATCCTGTTTTCTGAAAACAAACAAAGGTTCAGAAAAAAAG         Potato (Solanum tuberosum)  3
ATCTTTCTTTTTTGAAAAACAAGGGTTTAAAAAAGAGAATAAAAAAG  Onion (Allium cepa)         1

Note that onion and leek belong to the same genus Allium, and that their sequences differ by a single substitution.

chloroplast DNA (18). Nevertheless, even the short P6 loop can present some intraspecific variation, due in 21.2% of the cases to the presence of a T (or A) stretch >10 bp long.

However, the strong drawback posed by the relatively low resolution is compensated by several advantages. First, the primers used to amplify both the entire region (c and d) and the P6 loop (g and h) are extremely well conserved (Table 2), from Bryophytes to Angiosperms for the c–d primer pair, and from Gymnosperms to Angiosperms for the g–h pair. The primers g and h are much more conserved than the primers h1aF and h2aR (12), which target a protein-coding sequence and thus have many more variable positions. This advantage is particularly important when amplifying multiple species within the same PCR. Second, the number of trnL (UAA) intron sequences available in databases is already very high, by far the most numerous among noncoding chloroplast DNA sequences, allowing in many cases the identification of the species or the genus. Finally, the robustness of both systems (the entire intron and the P6 loop) also represents an important advantage. This last advantage might be linked to the two previous ones, because a robust system will encourage scientists to use this region, increasing the number of sequences in databases, and the robustness mainly comes from the primer universality.

In fact, in some situations, the relatively low resolution of the trnL intron can be largely compensated by the possibilities of standardization. In many situations, the number of possible plant species is restricted, reducing the impact of the relatively low resolution. In our arctic plant dataset, the number of species unambiguously identified among 123 is close to 50% for the P6 loop, and close to 85% for the entire intron. In the same way, the plant species eaten are few and taxonomically diverse, and can be identified in most cases. Even the short P6 loop allows the identification of the three commonly eaten species of the genus Solanum (potato, tomato and eggplant), which differ by a single mutation (see Table 4). However, the P6 loop does not allow the identification of the different cultivars of the same species [specifically, of Brassica oleracea (Brussels sprouts, kohlrabi, broccoli, etc.) or of Phaseolus vulgaris (different cultivated varieties)]. In addition, the P6 loop cannot distinguish most of the species of the genus Prunus (apricot, peach, cherry, etc.).

To conclude, the trnL (UAA) intron, despite its relatively low resolution, provides a unique opportunity for plant DNA barcoding in the biotechnology area, because of the universality of the c–d and g–h primers, of the robustness of the amplification process, and of the possibility of developing highly standardized procedures. Furthermore, the


low intraspecific variation represents an important advantage if the amplicons are detected by hybridization. Even the short P6 loop makes it possible to gather valuable information about plant identification and will undoubtedly become the marker of choice for highly degraded template DNA. This P6 loop has the potential to be extensively used in the food industry, in forensic science, in diet studies based on feces, and in permafrost analyses for reconstructing past plant communities.

ACKNOWLEDGEMENTS

This study has been financially supported by an ECLIPSE II grant (CNRS). We thank Dietmar Quandt for help with Figure 2, and Jean-Pierre Furet for extracting the DNA from human fecal samples. E.W. thanks Andrei Sher and James Haile for helping with sample collection, Tina Brand for assisting with the lab work, and The Wellcome Trust, UK and the National Science Foundation, DK for financial support. F.P. is supported by the French ‘Institut National de la Recherche Agronomique’. C.B. thanks Reidar Elven and Hanne H. Grundt for help with the arctic plant sample collection and the Research Council of Norway (grant 146 515/420) for funding. Funding to pay the Open Access publication charges for this article was provided by CNRS.

Conflict of interest statement. None declared.

REFERENCES

1. Floyd,R., Abebe,E., Papert,A. and Blaxter,M. (2002) Molecular barcodes for soil nematode identification. Mol. Ecol., 11, 839–850.
2. Hebert,P.D.N., Cywinska,A., Ball,S.L. and de Waard,J.R. (2003) Biological identification through DNA barcodes. Proc. R. Soc. Lond., B. Biol. Sci., 270, 313–321.
3. Hebert,P.D.N. and Gregory,T.R. (2005) The promise of DNA barcoding for taxonomy. Syst. Biol., 54, 852–859.
4. Chase,M.W., Salamin,N., Wilkinson,M., Dunwell,J.M., Kesanakurthi,R.P., Haidar,N. and Savolainen,V. (2005) Land plants and DNA barcodes: short-term and long-term goals. Philos. Trans. R. Soc. B Biol. Sci., 360, 1889–1895.
5. Hebert,P.D.N., Ratnasingham,S. and de Waard,J.R. (2003) Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc. R. Soc. Lond. B Biol. Sci., 270, S96–S99.
6. Kress,W.J., Wurdack,K.J., Zimmer,E.A., Weigt,L.A. and Janzen,D.H. (2005) Use of DNA barcodes to identify flowering plants. Proc. Natl Acad. Sci. USA, 102, 8369–8374.
7. Vences,M., Thomas,M., van der Meijden,A., Chiari,Y. and Vieites,D. (2005) Comparative performance of the 16S rRNA gene in DNA barcoding of amphibians. Front. Zool., 2, 5.
8. Hebert,P.D.N., Penton,E.H., Burns,J.M., Janzen,D.H. and Hallwachs,W. (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc. Natl Acad. Sci. USA, 101, 14812–14817.
9. Hebert,P.D.N., Stoeckle,M.Y., Zemlak,T.S. and Francis,C.M. (2004) Identification of birds through DNA barcodes. PLoS Biol., 2, e312.
10. Tautz,D., Arctander,P., Minelli,A., Thomas,R.H. and Vogler,A.P. (2003) A plea for DNA taxonomy. Trends Ecol. Evol., 18, 70–74.
11. Álvarez,I. and Wendel,J.F. (2003) Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol., 29, 417–434.
12. Poinar,H.N., Hofreiter,M., Spaulding,W.G., Martin,P.S., Stankiewicz,B.A., Bland,H., Evershed,R.P., Possnert,G. and Pääbo,S. (1998) Molecular coproscopy: Dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science, 281, 402–406.
13. Scharaschklin,T. and Doyle,J.A. (2005) Phylogeny and historical biogeography of Anaxagorea (Annonaceae) using morphology and noncoding chloroplast sequence data. Syst. Bot., 30, 712–735.
14. McDade,L.A., Daniel,T.F., Kiel,C.A. and Vollesen,K. (2005) Phylogenetic relationships among Acantheae (Acanthaceae): major lineages present contrasting patterns of molecular evolution and morphological differentiation. Syst. Bot., 30, 834–862.
15. Chen,S.Y., Xia,T., Wang,Y.J., Liu,J.Q. and Chen,S.L. (2005) Molecular systematics and biogeography of Crawfurdia, Metagentiana and Tripterospermum (Gentianaceae) based on nuclear ribosomal and plastid DNA sequences. Ann. Bot., 96, 413–424.
16. Ronning,S.B., Rudi,K., Berdal,K.G. and Holst-Jensen,A. (2005) Differentiation of important and closely related cereal plant species (Poaceae) in food by hybridization to an oligonucleotide array. J. Agric. Food Chem., 53, 8874–8880.
17. Ward,J., Peakall,R., Gilmore,S.R. and Robertson,J. (2005) A molecular identification system for grasses: a novel technology for forensic botany. Forensic Sci. Int., 152, 121–131.
18. Shaw,J., Lickey,E.B., Beck,J.T., Farmer,S.B., Liu,W., Miller,J., Siripun,K.C., Winder,C.T., Schilling,E.E. and Small,R.L. (2005) The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. Am. J. Bot., 92, 142–166.
19. Taberlet,P., Gielly,L., Pautou,G. and Bouvet,J. (1991) Universal primers for amplification of three noncoding regions of chloroplast DNA. Plant Mol. Biol., 17, 1105–1109.
20. Gielly,L. and Taberlet,P. (1996) A phylogeny of the European gentians inferred from chloroplast trnL (UAA) intron sequences. Bot. J. Linn. Soc., 120, 57–75.
21. Quandt,D. and Stech,M. (2005) Molecular evolution of the trnL (UAA) intron in bryophytes. Mol. Phylogenet. Evol., 36, 429–443.
22. Quandt,D., Müller,K., Stech,M., Frahm,J.P., Frey,W., Hilu,K.W. and Borsch,T. (2004) Molecular evolution of the chloroplast trnL-F region in land plants. Monogr. Syst. Bot. Missouri Botanic Garden, 98, 13–37.
23. Shinozaki,K., Ohme,M., Tanaka,M., Wakasugi,T., Hayashida,N., Matsubayashi,T., Zaita,N., Chunwongse,J., Obokata,J., Yamaguchi-Shinozaki,K. et al. (1986) The complete nucleotide sequence of tobacco chloroplast genome: its gene organization and expression. EMBO J., 5, 2043–2049.
24. Palmer,J.D. (1991) Plastid chromosomes: structure and evolution. Cell Cult. Som. Cell Genet. Plants, 7A, 5–53.
25. Michel,F., Jacquier,A. and Dujon,B. (1982) Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure. Biochimie, 64, 867–881.
26. Davies,R.W., Waring,R.B., Ray,J.A., Brown,T.A. and Scazzocchio,C. (1982) Making ends meet—a model for RNA splicing in fungal mitochondria. Nature, 300, 719–724.
27. Wu,S. and Manber,U. (1992) Agrep-a fast approximate pattern-matching tool. In Proceedings of the USENIX Winter 1992 Technical Conference, USENIX Association, Berkeley, CA, pp. 153–162.
28. Willerslev,E. and Cooper,A. (2005) Ancient DNA. Proc. R. Soc. Lond. B, 272, 3–16.
29. Godon,J.J., Zumstein,E., Dabert,P., Habouzit,F. and Moletta,R. (1997) Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis. Appl. Environ. Microbiol., 63, 2802–2813.
30. Willerslev,E., Hansen,A.J., Binladen,J., Brand,T.B., Gilbert,M.T.P., Shapiro,B., Bunce,M., Wiuf,C., Gilichinsky,D.A. and Cooper,A. (2003) Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science, 300, 791–795.
31. Brownstein,M.J., Carpten,J.D. and Smith,J.R. (1996) Modulation of non-templated nucleotide addition by Taq polymerase: primer modification that facilitate genotyping. BioTechniques, 20, 1004–1010.
32. Magnuson,V.L., Ally,D.S., Nylund,S.J., Karanjawala,Z.E., Rayman,J.B., Knapp,J.I., Lowe,A.L., Ghosh,S. and Collins,F.S. (1996) Substrate nucleotide-determinated non-templated addition of adenine by Taq DNA polymerase: implications for PCR-based genotyping and cloning. BioTechniques, 21, 700–709.
33. Borsch,T., Hilu,K.W., Quandt,D., Wilde,V., Neinhuis,C. and Barthlott,W. (2003) Noncoding plastid trnT-trnF sequences reveal a well resolved phylogeny of basal angiosperms. J. Evol. Biol., 16, 558–576.
