

GOUS et al.

| 191

trimming across the 22 pollen samples. This is on average 30,038 reads for ITS1 and 51,400 reads for ITS2 per sample, respectively. Only 40,646 reads were obtained for rbcL after two sequencing runs, with a mean of <2,000 reads per sample. Mean read lengths for both forward and reverse reads for rbcL were very close to the range of our length quality cut‐off (forward mean length = 142 bp; reverse mean length = 103 bp). One sample failed to produce any reads, and <200 reads were obtained for four samples. The number of reads obtained per sample for rbcL was significantly lower than for ITS1 (t = 4.48, p < 0.001) and ITS2 (t = 4.03, p < 0.001). Table 2 provides summary statistics for ITS1, ITS2 and rbcL processed reads.

TABLE 2 A summary of merged processed reads for ITS1, ITS2 and rbcL after next‐generation sequencing. Numbers indicated are after reads were processed for quality with Q20 filtering, Nextera adapter trimming, fragments discarded that were less than 100 bp in length and forward and reverse reads merged

                                ITS1       ITS2         rbcL
Sum of total combined reads     660,837    1,130,803    40,646
Mean of total combined reads    30,038     51,400       1,936
Median of combined reads        20,135     24,668       1,570
Standard deviation              27,246     56,826       1,427

The percentage of reads of both ITS1 and ITS2 assigned to the kingdom Viridiplantae varied between samples. Samples consisting of fewer than 1,000 reads that were classified to species level were regarded as unsuccessful and were discarded prior to further analyses. This cut‐off was selected so that rare taxa identified will have at least two reads. The presence of unidentified reads did not influence the identification of plant origins of samples, even though a higher number of total reads was necessary to reach sequence saturation for plant identification. Identification of rbcL reads to plant origins produced very variable results. In 45% of the samples, fewer than 1,000 reads were produced. Due to the extremely variable nature of amplification and sequencing results, rbcL data were not analysed further.

After removal of taxa representing <0.1% of reads per sample, only one or two plant genera per sample for ITS1 could be identified against the sequence reference database generated in this study. Between one and eight plant species were identified per sample using ITS2. Rarefaction curves show that the sequencing depth for all samples was sufficient to obtain maximum taxon richness (Figure 1). When all raw read data are included in rarefaction analyses, a maximum of ten species per sample for ITS2 was reached, with curves still reaching a plateau (Figure S1), thus further indicating that sufficient sequencing depth was reached.

A total of 81.8% of ITS1 samples had more than 1,000 reads identified to Viridiplantae. One of the ITS2 samples only had 1,154 high‐quality, merged reads that could be identified to Viridiplantae, but was still sequenced to saturation, as indicated by the rarefaction curve (Figure 1b). A single ITS2 sample had fewer than 1,000 reads identified to Viridiplantae and was removed from further analyses, with 95.5% of samples remaining for further analyses.

3.3 | Plant origins of pollen collected from Megachile venusta specimens

When classifying sequence reads to the ITS1 database, two plant genera (Helianthus and Oryza) were identified. Helianthus was identified in 72.2% of the samples, and both genera were identified in the remaining 27.8% of samples. On average 3.3% (SD = 0.25) of reads per sample could only be assigned to the phylum level (Streptophyta) and 50.3% (SD = 0.09) of reads remained unidentified at the assignment level of kingdom. Classification to species level was limited with the ITS1 database.

Classification with the ITS2 database produced classification only up to kingdom in 0.6% of the reads per sample, on average (SD = 0.02) and only up to phylum for an average of 68.4% (SD = 0.22) of reads per sample. Significantly more lower‐ranking taxon classifications could be made using the ITS2 database. With the confidence set at the recommended level of 85%, an average of four species, four genera and four families were identified per sample when classifying reads with the ITS2 database. In total, 25 species from 21 different genera could be confidently identified with the ITS2 database. These species belonged to

FIGURE 1 Rarefaction curves for (a) ITS1 and (b) ITS2 samples. ITS1 samples reached sequence saturation at approximately 250 reads, whereas ITS2 samples needed approximately 1,000 to 2,000 high‐quality sequence reads to obtain maximum plant taxon richness per sample. Rarefaction curves were created after taxa representing less than 0.1% of reads per sample were removed. Rarefaction curves without <0.1% reads removed are available in the Supporting Information
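
The rarefaction procedure summarised in Figure 1 can be illustrated with a minimal sketch. This is not the software used in the study, which this section does not name. The function names, the number of replicates, the random seed and the example read counts are all hypothetical; only the 0.1% per-sample abundance threshold is taken from the text. The sketch first drops taxa below the threshold, then repeatedly subsamples reads without replacement at each depth and averages the number of distinct taxa observed.

```python
import random


def filter_rare_taxa(counts, min_fraction=0.001):
    """Drop taxa that represent less than `min_fraction` of a sample's reads.

    `counts` maps taxon name -> read count for one sample. The default of
    0.001 mirrors the 0.1% threshold in the text; the function is illustrative.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n for taxon, n in counts.items() if n / total >= min_fraction}


def rarefaction_curve(counts, depths, n_reps=50, seed=0):
    """Return (depth, mean taxon richness) pairs for one sample.

    At each depth, reads are subsampled without replacement `n_reps` times
    and the number of distinct taxa is averaged. `n_reps` and `seed` are
    assumptions chosen for this sketch.
    """
    rng = random.Random(seed)
    # Expand the count table into one entry per read.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        depth = min(depth, len(reads))  # cannot draw more reads than exist
        richness = [len(set(rng.sample(reads, depth))) for _ in range(n_reps)]
        curve.append((depth, sum(richness) / n_reps))
    return curve


# Hypothetical sample: taxon_c makes up 0.05% of reads and is filtered out.
sample = {"taxon_a": 1500, "taxon_b": 499, "taxon_c": 1}
filtered = filter_rare_taxa(sample)
curve = rarefaction_curve(filtered, depths=[1, 50, 250, 1000, 1999])
```

A curve that flattens well before the full read depth, as reported for the ITS1 and ITS2 samples, suggests that additional sequencing would be unlikely to reveal further taxa.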
