Gudbergsd-Ttir Et Al-2016-Environmental Microbiology
Environmental Microbiology (2016) 18(3), 863–874
doi:10.1111/1462-2920.13079
sequencing of the total DNA offers a few advantages over sequencing only the viral fraction. For example, it is easier to obtain enough DNA for sequencing from the total mass than from the viral fraction. Also, the correlation between viruses and their hosts is immediately available, e.g. through analysing the clustered regularly interspaced short palindromic repeat (CRISPR) loci from the cellular fraction of the metagenomes. However, analysing viral sequences from cellular metagenomes requires a database of reference viral genomes, whereas the purification of viral particles allows for the discovery of entirely new viruses.

In order to link viruses to potential hosts in terrestrial hot springs, several studies have used information from the CRISPR arrays, which contain short spacer sequences derived from viral genomes and serve as 'memories' of former virus invasions (Sorek et al., 2008). The combination of metagenomics and CRISPR analysis overcomes the limitation of culture-dependent methods in detecting novel viruses and in analysing viral diversity (Heidelberg et al., 2009). This has been done either by extracting spacers from sequenced cellular genomes (Anderson et al., 2011) or by using the polymerase chain reaction to identify CRISPR spacers from the same sample (Emerson et al., 2013). A microarray approach with CRISPR spacers as probes has also been reported to identify unknown viruses and to monitor changes in virus populations in hot springs (Snyder et al., 2010).

In this study we identified and analysed viral sequences from six metagenomes derived from several distantly located hot springs of varying temperature and pH (Menzel et al., 2015). We describe 10 novel genomes, substantially expanding the small group of known thermophilic viruses. Importantly, this study demonstrates the wide distribution of four thermophilic archaeal viral families and high viral diversity in hot springs.

Results and discussion

Overall viral diversity

The metagenomes of two hot spring samples from Iceland, two from Italy and two from Yellowstone National Park (YNP) in the USA, ranging in temperature from 76°C to 90°C and in pH from 1.8 to 5.5, were analysed for their viral composition and diversity (Table S1). To this end, de novo assembled metagenomic contigs were assigned to taxa by MEGAN, and relative taxon abundances were estimated by counting the reads mapped to the contigs of each taxon (see Experimental procedures). The biodiversity of the metagenomes at the cellular level has recently been described in Menzel et al. (2015). In short, three of the metagenomes are dominated by archaeal species (CH1102, NL10 and It3), whereas the other three metagenomes are largely composed of bacterial species (Is2-5S, Is3-13 and It6). In general, the percentage of all metagenomic reads assigned to viruses is low, ranging from 0.05% (Is2-5S) to 2% (It6), except for NL10, which has 12% of all reads assigned to viruses (Table S2). This paper focuses on the viral diversity and the description of novel viral genomes in the metagenomes.

Between two (Is2-5S) and 38 (NL10) viral contigs larger than 5 kbp were detected in the metagenomes (Table S3), and CH1102 contains the highest number of contigs longer than 10 kbp. Figure 1 shows the relative abundance of viral families or species among all reads assigned to viruses in each metagenome. More than 91% of all viral sequences were assigned to crenarchaeal viral families in each metagenome, except sample Is2-5S, with only 52.5% of its viral sequences assigned to crenarchaeal viral families (Fig. 1). Given that half of the samples are dominated by bacterial species, the predominance of crenarchaeal viral sequences in all the samples may reflect a compositional bias in the reference database. Among the ∼60 known thermophilic viruses, about two thirds were isolated from crenarchaeal hosts with an optimal growth temperature between 75°C and 100°C, whereas the remaining third were bacteriophages isolated from thermal environments below 75°C (Uldahl and Peng, 2013). Taking the temperatures of the six hot springs (76°C to 90°C, Table S1) into account, it is very likely that the abundance of thermophilic bacterial viruses in the samples is underestimated and that more thermophilic bacterial viruses remain to be isolated and studied. In accordance with this, the three samples dominated by bacteria all contain high amounts of sequences unassigned by MEGAN, some of which may be viral sequences of high novelty (Table S2).

Three of the samples (NL10, CH1102 and It6) are dominated by a single viral family or species, whereas the viral community is more diverse in the other three samples (Fig. 1). The most abundant viral taxon varies from Lipothrixviridae in NL10, It3 and It6, to Ampullaviridae in Is3-13, whereas the presently unclassified virus SMV1 is the most abundant in CH1102, constituting more than 80% of the identified viral reads. Unlike in the other metagenomes, the non-crenarchaeal viral order Caudovirales is most abundant in Is2-5S. The Caudovirales order is composed of head-tail viruses infecting members of Bacteria and Euryarchaea. However, the Caudovirales sequences detected in Is2-5S were assigned to bacteriophages. The cellular part of the Is2-5S metagenome also exhibited unusually high diversity for a site of very high temperature (Table S1), which could be explained by the presence of various organic substrates in or around the hot spring (Menzel et al., 2015).
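The relative abundances in Fig. 1 come from summing, per taxon, the reads mapped to the contigs that MEGAN assigned to that taxon, then dividing by all viral reads. The Python sketch below illustrates only that bookkeeping. It assumes a hypothetical per-contig table of (contig ID, assigned taxon, mapped read count). The exact procedure is in the paper's Experimental procedures and may differ, for example in how multi-mapping reads or contig length are handled.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def relative_abundance(contig_hits: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Percentage of mapped viral reads per taxon.

    contig_hits: hypothetical (contig_id, taxon, mapped_read_count) records,
    one per viral contig, with taxon labels taken from the classifier.
    Reads are summed per taxon and divided by the total number of viral reads.
    """
    reads_per_taxon: Counter = Counter()
    for _contig_id, taxon, mapped_reads in contig_hits:
        reads_per_taxon[taxon] += mapped_reads
    total = sum(reads_per_taxon.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in reads_per_taxon.items()}


# Toy input with invented numbers (not data from the study).
toy = [
    ("contig_1", "Lipothrixviridae", 600),
    ("contig_2", "Lipothrixviridae", 200),
    ("contig_3", "Rudiviridae", 100),
    ("contig_4", "unclassified", 100),
]
print(relative_abundance(toy))
# {'Lipothrixviridae': 80.0, 'Rudiviridae': 10.0, 'unclassified': 10.0}
```

Because each contig contributes its full read count to a single taxon, the percentages always sum to 100 across the viral fraction. This matches how Fig. 1 is normalised to all reads assigned to viruses.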
© 2015 Society for Applied Microbiology and John Wiley & Sons Ltd, Environmental Microbiology, 18, 863–874
Novel viral genomes from acidic hot springs 865
[Figure 1 graphic; y-axis: relative abundance in percent; legend (partly recovered): A: Turriviridae, other viruses, unclassified]
Fig. 1. Relative abundance in percentage of viral families or species in each metagenome. The viral hosts are indicated by A (Archaea) and B
(Bacteria). All families are hyperthermophilic, but the Caudovirales order contains both mesophilic and thermophilic viruses. The total number
of viral reads in each metagenome is indicated above the sample name.
Wide distribution of Ampullaviridae, Bicaudaviridae, Lipothrixviridae and Rudiviridae in the six hot springs

The It6 sample was taken from the same hot spring in Pozzuoli, Italy, where ABV and ATV, the single members of Ampullaviridae and Bicaudaviridae, respectively, as well as the lipothrixvirus AFV2 and the rudivirus ARV1, were originally isolated around 10 years ago (Haring et al., 2005a). Interestingly, sequences related to these four crenarchaeal viral families were detected in (nearly) all the metagenomes (Fig. 1). Among the viral families exhibiting morphotypes detected exclusively in the domain Archaea, namely Fuselloviridae, Ampullaviridae, Bicaudaviridae and Guttaviridae, only the Fuselloviridae had previously been detected at geographically distant locations. The detection of Ampullaviridae- and Bicaudaviridae-related sequences in the six metagenomes sampled from different continents demonstrates the wide distribution of these unusual archaeal viruses. Despite being widespread, the four viral families exhibited highly variable abundances from one site to another. Among the identified viral reads at each site, the relative abundance ranges from 0.9% (in CH1102) to 48.4% (in Is3-13) for Ampullaviridae, from 0% (in Is2-5S) to 24.2% (in It3) for Bicaudaviridae, from 2% (in CH1102) to 81.7% (in It6) for Lipothrixviridae, and from 1.4% (in CH1102) to 8.9% (in NL10) for Rudiviridae (Fig. 1).

Among the four viral families detected in all six metagenomes, the Ampullaviridae and Bicaudaviridae each contain only a single studied isolate, which has made it difficult to assess the conservation and the essentiality of their genes. In most of the metagenomes several large contigs were assigned to these two families (Table S4), providing a basis for comparative analyses. The longest ABV contigs from metagenomes Is3-13 and It6 were close to complete genomes and were analysed in more detail. Despite the large number of sequences assigned to Lipothrixviridae and Rudiviridae in three of the metagenomes (Table S4), only one novel rudiviral genome was identified, in the It6 metagenome.

Sequences related to ABV. The largest contig assigned to ABV in It6 is 22 613 bp, which is slightly smaller than the 23.8 kbp genome of the original ABV (Peng et al.,
866 S. R. Gudbergsdóttir et al.
Fig. 2. Genome comparison between ABV and the novel ABV2 and ABV3. The size of each genome (kbp) is given in brackets. Black arrows indicate ORFs conserved across all three genomes, whereas white arrows denote ORFs unique to each genome. Blue arrows denote ORFs conserved between ABV and ABV2 that have no significant sequence similarity to ABV3. The ORF numbers of ABV genes mentioned in the main text are indicated below the ABV map. Predicted functions of genes found exclusively in ABV3 are listed below the respective ORF.

identified, which is slightly shorter than the 590 bp ITR of ABV. The presence of an ITR in this It6 contig, hereafter designated ABV2, strongly indicates that the genome is complete.
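An inverted terminal repeat (ITR) means that the first bases of a linear contig are the reverse complement of its last bases. This is why an ITR at both ends suggests that the assembly reaches the true genome termini. The Python sketch below illustrates the idea under a simplifying assumption: the repeat is exact. It is not the procedure used in the study, and the helper names are hypothetical. Diverged ITRs would need a mismatch-tolerant comparison, such as aligning the contig against its own reverse complement.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT symbols are left as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_itr_length(contig: str) -> int:
    """Length of the longest exact inverted terminal repeat of a linear contig.

    The first k bases form an ITR with the last k bases when
    contig[:k] == reverse_complement(contig[-k:]). That condition is the same
    as the contig and its reverse complement sharing their first k bases, so
    the ITR length is their longest common prefix. The scan stops at half the
    contig length and at any ambiguous base (e.g. N).
    """
    seq = contig.upper()
    rc = reverse_complement(seq)
    limit = len(seq) // 2
    n = 0
    while n < limit and seq[n] in "ACGT" and seq[n] == rc[n]:
        n += 1
    return n


# Toy contig: a 10 bp end, a unique middle, and the reverse complement of the end.
left_end = "ATGCGTACCA"
toy_contig = left_end + "TTTTTGGGGG" + reverse_complement(left_end)
print(exact_itr_length(toy_contig))  # 10
```

Applied to a real assembly, a result in the hundreds of base pairs, comparable to the 590 bp ITR of ABV, would support the interpretation that the contig is a complete linear genome.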
The ABV2 genome was annotated with RAST and all
predicted open reading frames (ORFs) were searched
against the GenBank database. The genome encodes a
total of 54 ORFs. Five ORFs have no significant sequence
similarity to the ABV genome, two of which are located
within the ITRs (Fig. 2), two are located between
homologues of ABV gp08 and gp09 (predicted leucine
zipper protein) and the last one between homologues of
ABV gp50 (predicted leucine zipper protein) and gp51.
The remaining 49 ORFs are homologous to ABV genes
with an average amino acid identity of only 70%, despite
out of the six inserted ORFs shows homology (34% a.a. identity) to a DNA-directed RNA polymerase (Fig. 2).

Apart from ABV2 and ABV3, ten to hundreds of ABV-related contigs were recovered from the six metagenomes (Table S4). To gain further insight into the sequence diversity of ABV-like genomes, one to three of the large contigs from each metagenome, except Is2-5S, were analysed in more detail. Except for a couple of contigs retrieved from It3 and It6 and one contig from Is3-13, the nucleotide sequences of the large contigs are generally very different from those of ABV, ABV2 and ABV3. The a.a. sequence identity between the ORFs of the contigs and those of ABV, where detectable, ranges from 25% to more than 90% (Table S6). Among the 12 large ABV-like contigs analysed, 6 aligned to the right half of the ABV genome (gp24–gp57), 2 to the left part (gp01–gp23) and 4 encompassed the junction (Figs S1–S6 and Table S6). This suggests that the right half of the ABV-like genomes is more conserved, as supported by the full genome comparison between ABV, ABV2 and ABV3 (Fig. 2). Most of the insertions in the ABV3 genome occur between homologues of ABV gp08 and gp11, whereas the right half, between ABV gp24 and gp46, is rarely interrupted (Fig. 2). The ABV virion has a very complex structure (Haring et al., 2005a) and may contain multiple different structural proteins, none of which, however, has been identified owing to the low yield of ABV virions in the laboratory (Peng et al., 2007). While the left terminus encodes a putative protein-primed DNA polymerase (gp05) and a putative terminal protein (gp07), possibly involved in viral genome replication, the genes encoding the structural proteins are expected to be relatively conserved and may therefore be located in the conserved right half of the genome.

In the Is3-13, CH1102 and NL10 samples, one third to more than half of the predicted ORFs in each of the large ABV-like contigs have no significant matches in GenBank, reflecting the novelty and high diversity of the ABV-like genomes in these hot spring environments.

ATV-like sequences. ATV-like sequences are present in all metagenomes except Is2-5S. However, no full genome of an ATV-like virus was recovered, although ATV-like sequences are abundant in the two Italian metagenomes, It6 and It3 (Table S4).

Consistent with its being sampled from the same site where ATV was isolated, the It6 metagenome contains 112 contigs mapping to the ATV genome with over 90% nt identity. There are a few small gaps where no contigs align to the ATV genome with such high identity. Sequence analysis of the remaining contigs matching ATV (Table S4) revealed between less than 60% and 90% nt identity throughout the genome, suggesting that ATV-like genomes with lower than 90% nt similarity to ATV are also present in the same sample.

The ATV-like sequences from the It3 metagenome were distributed throughout the entire ATV genome with highly variable nt sequence identity, ranging from less than 60% to 90%. An 8853 bp contig had 70% nt identity to ATV, covering ATV gp14 to gp33 with deletions of gp20, gp25 and gp28–gp29.

To assess the coverage of ATV-like sequences, individual sequencing reads from the It6 and It3 metagenomes were mapped to the ATV genome (Fig. S7). Whereas 95% of the ATV genome is covered by at least one read in It6, with an average coverage of 39×, only 87% of the ATV genome is covered by reads in It3, with a lower average coverage of 22×.

Almost no reads could be mapped for Is3-13, NL10 and CH1102, likely because of too little sequence conservation at the nucleotide level. Nevertheless, 6–155 contigs were assigned to ATV by MEGAN in these three metagenomes based on a.a. sequence similarity (Table S4). The longest ATV-like contig in the Is3-13 metagenome is 4553 nt long (Table S4) and encodes ORFs showing around 60% a.a. identity to ATV gp58–gp61. Thus, ATV-like sequences from the Icelandic and YNP hot springs seem to have diverged much more from the original ATV genome than the sequences from the Italian samples.

Lipothrixviridae-related sequences. Lipothrixviridae are very abundant in both NL10 and It6, constituting more than 75% of the identified viral reads. Although the unique contigs assigned to Lipothrixviridae sum to 2.7 Mbp in the NL10 sample (Table S4), corresponding to around 70 lipothrixviral genomes (which range in size between 20 and 40 kbp; Vestergaard et al., 2008b), no complete lipothrixviral genome was assembled. This suggests high sequence diversity within the lipothrixviral community in the NL10 sample. In the other metagenomes, the relative abundance of the Lipothrixviridae family varies from 5% to 44.5% (Fig. 1).

The largest viral contig in the NL10 metagenome, of 12 858 bp, is assigned to Lipothrixviridae, but further classification into one of the four genera described within the Lipothrixviridae family (Prangishvili, 2013) was not possible. Of the 24 predicted ORFs, 13 shared highest sequence similarities with the lipothrixvirus AFV2 (Haring et al., 2005c), seven with the lipothrixvirus SIFV (Arnold et al., 2000), one with Sulfolobus turreted icosahedral virus (STIV) (Rice et al., 2004), and the remaining three had no significant match in the GenBank database. The It6 metagenome also contains one large contig of 10 969 bp assigned to the Lipothrixviridae. However, BLASTP searches of its predicted ORFs revealed best hits from different lipothrixviral genomes, with an average a.a. identity of 78%, making it difficult to determine which
is the closest related genome among the known Lipothrixviridae.

Rudiviridae-related sequences. The Rudiviridae are not as abundant in the metagenomes as the Lipothrixviridae; their relative abundance ranges between 1.4% in CH1102 and 8.9% in NL10 (Fig. 1). Whereas the contigs assigned to the Rudiviridae family were all smaller than 5 kbp in three metagenomes (CH1102, Is3-13 and Is2-5S), the other three samples contain two to three contigs assigned to Rudiviridae that are larger than 5 kbp (Table S4).

A 29 763 bp contig in the It6 metagenome is assigned to ARV1, which was also isolated in Pozzuoli, Italy (Vestergaard et al., 2005). The contig encodes 43 ORFs and is hereafter designated ARV2. Multiple rearrangements are observed in comparison with the ARV1 genome (Fig. S8). Eighteen of the predicted ORFs match ARV1 with a.a. identities ranging between 33% and 87%. Nine ORFs match other archaeal viruses such as ATV or AFV, five match both ARV1 and other archaeal viruses, four match Sulfolobus or Acidianus genes, and the remaining seven ORFs have no significant match (E-value < 0.01) in GenBank. Three ORFs inserted in the right part of the ARV2 genome as depicted match Sulfolobus/Acidianus genes (green arrows in Fig. S8), one of them matching a transcription initiation factor IIB. Five ORFs are inverted in comparison with ARV1 (Fig. S8). In total, 15 ARV1 ORFs are missing from ARV2: the first seven and the last four ORFs as well as four ORFs in the internal region of the genome (Fig. S8). Among these deleted genes are two predicted transcriptional regulators, a putative thymidylate synthase and a putative ATPase. A small ITR of around 54 nt is found, which is much shorter than the reported 1365 bp ITR of the ARV1 genome and the 1030–2029 bp ITRs of other rudiviral genomes (Peng et al., 2001; Vestergaard et al., 2008a; Servin-Garciduenas et al., 2013). Therefore, some terminal sequences are possibly missing.

Other novel putative viruses

SMV1-like viral genomes from CH1102. Several large contigs present in the CH1102 metagenome were assigned to Sulfolobus monocaudavirus 1 (SMV1), which was also isolated from YNP. SMV1 is a spindle-shaped crenarchaeal virus with a single tail protruding from the main virion body and was found to induce hyperactive CRISPR spacer uptake from co-infecting genetic elements but not from its own genome (Erdmann and Garrett, 2012). Three of the contigs were complete circular DNA genomes, designated SMV2, SMV3 and SMV4, respectively. A comparison between the four genomes is shown in Fig. 3A.

The genome sizes of SMV2, SMV3 and SMV4 are 50 918 bp, 64 323 bp and 51 711 bp, encoding 65, 87 and 68 ORFs, respectively. A comparison of all four SMV genomes revealed two distinct parts: a very well-conserved region between SMV1 gp03 and gp34, and a less conserved region containing many gene insertions as well as deletions (Fig. 3A). In total, 35 genes are conserved in all four SMV genomes, including genes encoding two structural proteins (SMV1 gp06 and gp11) and an integrase (gp34). The a.a. sequence identities between the homologues range between 25% and 96%. Nine to 22 genes are present exclusively in each SMV genome, showing no sequence similarity to GenBank, whereas the rest of the genes are shared between two or three of the SMV genomes (Fig. 3A).

To estimate the evolutionary relationship of the four genomes, the a.a. sequences of the major virion protein (SMV1 gp11 homologues) from the four SMV genomes, as well as from the ATV, STSV1 (Xiang et al., 2005) and STSV2 (Erdmann et al., 2014) genomes, were used to construct a phylogenetic tree (Fig. S9). The average a.a. identity of the major virion protein is 83% among the SMVs and only 33% between the SMVs and the other three viruses. The inferred phylogenetic tree of the major virion protein shows a distinct cluster of SMVs as well as a cluster of STSV1 and STSV2, both distant from ATV. Within the SMV cluster, SMV1 and SMV4 are the most closely related (bootstrap support of 92%).

Novel viral or plasmid genomes from CH1102. Homologues of the SMV1 integrase were searched for in the cellular part of the CH1102 metagenome in order to find the integration site on the host genome. We could not identify any integration site; however, we discovered two long contigs, assigned by MEGAN to Sulfolobus islandicus, that contained the integrase homologues. Both contigs appear to be complete circular genomes, one of 40.9 kbp, designated SYV1, and the other of 34.8 kbp, designated SYV2, encoding 71 and 60 genes, respectively. More than half of the genomes are homologous between the two, with several genes inverted (Fig. 3B). Of the two genomes, seven and 14 ORFs, respectively, have best matches to viral genes (SMV1, SSVs or HAV2) (Fig. 3B), whereas the majority of the genomes show no sequence similarity to GenBank, suggesting that they are novel viral or plasmid genomes.

Novel viral-like contigs from It3 and Is2-5S. In the It3 metagenome, a contig of 20 640 bp was assigned to HAV2 (Garrett et al., 2010) by MEGAN. HAV2 refers to hyperthermophilic archaeal virus 2, retrieved from the culture supernatant of a bioreactor maintained at 85°C and pH 6 (Garrett et al., 2010). The contig encodes 30 ORFs, but BLASTP searches against the GenBank database with a cut-off of 0.01 revealed only one ORF showing low similarity (38% a.a. identity) to an HAV2 gene (Fig. S10). Because many archaeal viral sequences show little or no similarity to sequences in public databases, it is possible that the contig is part of a novel viral genome. Indeed, CRISPR spacer analyses provided evidence for the contig being a genetic element of Pyrobaculum (see below). Hence, it is designated GEP1.

A contig of 19 351 bp in the Is2-5S metagenome was assigned to the root of the taxonomic tree by MEGAN. This contig encodes 26 ORFs, only four of which have BLASTP matches to viral genes in GenBank, all with less than 43% a.a. identity. Two ORFs had sequence similarities to phage genes, a terminase and a DNA primase/polymerase. One ORF showed sequence similarity to an S. islandicus cas4 gene and another to an AFV3 gene (Fig. S11). Cas4 homologues have been found in other viral genomes (Gardner et al., 2011; Guo et al., 2015). This contig could therefore represent an incomplete novel viral genome that is likely to infect thermophilic microorganisms, and CRISPR spacer analysis suggested Hydrogenobaculum as the potential host (see below).

Fig. 3. Genome maps of the SMVs (A) and of SYV1 and SYV2 (B).
A. Genome comparison between SMV1 and the three novel SMVs. Black arrows denote genes conserved in all genomes and white arrows denote genes unique to each genome, whereas homologues shared by two or three SMVs are colour coded. ORF numbers mentioned in the main text are indicated below the SMV1 map.
B. Genome comparison between SYV1 and SYV2. Arrow colours indicate the organism with the best BLASTP match. White arrows denote ORFs with no significant match in GenBank.

Host-virus correlation

Most archaeal viruses known to date that are found at these elevated temperatures infect members of the Sulfolobaceae, mainly within the genera Sulfolobus and Acidianus. In all metagenomic samples there is a clear co-occurrence of virus and host. Sequences belonging to members of Sulfolobus and Acidianus are present in all the samples in which sequences related to viruses known to infect either of the two are detected. In samples Is2-5S, Is3-13 and NL10, viral sequences were also assigned to Pyrobaculum spherical virus (PSV) (Haring et al., 2004) and Thermoproteus tenax virus 1 (TTV1) (Janekovic et al., 1983), and in these cases the metagenomes also contain sequences assigned to their archaeal hosts, Pyrobaculum and Thermoproteus tenax (Menzel et al., 2015).
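The host links in this section rest on matching CRISPR spacers to protospacers on viral contigs while tolerating mismatches. The study used BLASTN and allowed up to 10 mismatches over the full spacer length. The Python sketch below illustrates that criterion only. It swaps BLASTN for a brute-force, ungapped mismatch count on both strands and removes duplicate spacers before counting, as the text does for the CRT output. It is not the authors' pipeline, and the function names and toy sequences are hypothetical.

```python
from typing import Iterable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_protospacer_hit(
    spacer: str, contig: str, max_mismatches: int = 10
) -> Optional[Tuple[int, str, int]]:
    """Best ungapped, full-length match of a spacer on either strand.

    Returns (start on the forward strand, strand, mismatches), or None when
    no window has at most max_mismatches. Brute force: O(len(contig) * len(spacer)).
    """
    spacer, contig = spacer.upper(), contig.upper()
    k = len(spacer)
    best = None
    # Scanning the reverse-complemented spacer along the forward strand is
    # equivalent to scanning the spacer along the reverse strand.
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(contig) - k + 1):
            mm = mismatches(query, contig[start:start + k])
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (start, strand, mm)
    return best


def fraction_of_spacers_matching(
    spacers: Iterable[str], viral_contigs: Iterable[str], max_mismatches: int = 10
) -> float:
    """Fraction of unique spacers with a hit on at least one viral contig."""
    unique = {s.upper() for s in spacers}  # duplicates removed first
    contigs = list(viral_contigs)
    if not unique:
        return 0.0
    matched = sum(
        1
        for s in unique
        if any(best_protospacer_hit(s, c, max_mismatches) for c in contigs)
    )
    return matched / len(unique)


spacer = "ATGCATGCAAGGTTCCAAGG"
print(best_protospacer_hit(spacer, "TTTTT" + reverse_complement(spacer) + "TTTTT"))
# (5, '-', 0)
```

BLASTN, as used in the study, is seed-based and allows gaps, so its hits need not coincide with this exhaustive ungapped scan. The sketch only makes the mismatch threshold and the both-strand search explicit.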
Table 1. Number of CRISPR loci and their unique spacers from each metagenome, and the number of spacers that match the metagenome's viral sequences as well as known viruses and plasmids from GenBank.
[Table 1 values not recovered; column headings: Metagenome | Loci found | Loci with taxonomy assigned by MEGAN (Bacteria, Archaea) | Unique spacers from all loci | Spacers matching viral contigs | Spacers matching known viruses | Spacers matching known plasmids | Spacers matching CF spacers | CF spacers matching viral contigs]
The last two columns denote the overlap of the metagenomic spacers with the CRISPR Finder database (CF) and how many CF spacers have
matches to the viral contigs present in the metagenomes.
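The last two columns of Table 1 rest on comparing the metagenomic spacers with the CRISPR finder (CF) spacer collection. The text does not state the matching rule for this overlap. The Python sketch below assumes exact identity after reducing each spacer to a strand-independent canonical form, because the same spacer can be reported from either DNA strand. The function names and toy sequences are hypothetical.

```python
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_spacer(seq: str) -> str:
    """Strand-independent form: the lexicographically smaller of a spacer and
    its reverse complement, so the same spacer read from either strand
    compares equal."""
    s = seq.upper()
    rc = s.translate(_COMPLEMENT)[::-1]
    return min(s, rc)


def spacer_overlap(metagenomic: Iterable[str], reference: Iterable[str]) -> Set[str]:
    """Canonical spacers present in both collections (exact identity only)."""
    meta = {canonical_spacer(s) for s in metagenomic}
    ref = {canonical_spacer(s) for s in reference}
    return meta & ref


shared = spacer_overlap(
    ["ATGCATGCAAGGTTCCAAGG", "AAAACCCC"],
    ["CCTTGGAACCTTGCATGCAT", "TTTTTTTT"],  # first entry is the reverse complement
)
print(len(shared))  # 1
```

A tolerant variant would replace the set intersection with the mismatch-aware comparison used for the viral contigs. Under exact matching, the very small overlap reported in Table 1 would mean that almost none of the metagenomic spacers occur verbatim in CF.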
To further link the novel viral or plasmid genomes and at high temperatures constitute only a small part of the
their potential host in the samples, CRISPR loci were genetic elements existing in the thermal environments
identified from the cellular part of the metagenomes using and a large diversity remain to be discovered in nature. It
the CRISPR Recognition Tool (CRT) (Bland et al., 2007), is, however, possible that viruses diverged too much for
which revealed no CRISPR locus in Is2-5S and 18–291 some spacer matches to be detected.
different CRISPR loci in the other five samples. We then All spacers from the CRISPR finder (CF) database
searched for CRISPR loci on contigs that had no match to (http://crispr.u-psud.fr/) were extracted and compared with
GenBank or could not be assigned to a taxon by MEGAN, our metagenomic spacers. Despite the large number of
collectively referred to as unassigned in Table S2. Doing spacers identified from the metagenomes, very few of
so, we identified 858 CRISPR loci in Is2-5S and between these match to the known CF spacers (Table 1). In
107 and 748 loci from the other five samples, increasing general, more matches to viral contigs were identified
the total number of CRISPR loci from 436 to 2305, and the from the spacers of the same sample than from the pub-
total number of spacers from 4480 to 18 542 (Table 1). lished known CF spacers (Table 1). The opposite is found
The number of repeat/spacer units in the loci varies with NL10 and Is3-13, which could be correlated with their
between three and 80, and the total number of spacers relatively lower number of spacers (1096 and 559,
ranges between 559 and 7772 across all metagenomes, respectively, in comparison to up to 7772 in CH1102).
most of which were identified from unassigned contigs. All spacers from the metagenomes were aligned to the
We therefore stress the importance of searching for novel putative viral genomes described in the previous
potential CRISPR loci in the metagenomes as a whole section, to identify or verify their potential host (Table 2).
rather than limiting the analysis to contigs with assigned Four spacers of the It3 metagenome matched GEP1, two
taxonomy. Potentially, the taxonomy of unassigned of which are perfect matches. This indicates that the
contigs having CRISPR loci could be identified based on contig is an extracellular sequence of either viral or
the repeat sequences. plasmid origin. In order to make a link to the host of this
After removing duplicates from spacers identified by genetic element we traced the taxonomic origin of the
CRT, the unique spacers were aligned to the viral contig harbouring the matching CRISPR loci. All four
sequences of the same metagenome as well as to the spacers originated from hyperthermophilic crearchaeon
viral and plasmid sequences in GenBank using Blastn. Pyrobaculum, so it is very likely that this novel virus, or
We allowed for up to 10 mismatches between the full- plasmid, targets Pyrobaculum as its host.
length spacer and the proto-spacer based on the findings Out of the 4543 spacers identified in the Is2-5S
that DNA interference in Sulfolobus species, the main metagenome 20 matched to the 19 351 bp contig
host of the viruses discovered in these metagenomes, assigned to be viral (Fig. S11), 15 of which were perfect
was not abolished even with 15 mismatches between the matches (Table 2). In addition to the aforementioned simi-
spacer and the proto-spacer (Manica et al., 2013). On larity between four ORFs of the contig and some viral
average, only 8% of spacers match back to viral contigs, genes, these matches support the idea that this contig is
with the lowest percentage in Is2-5S of only 0.5%. Nev- of viral origin. In order to shed light on the potential host of
ertheless, this is an indication of the virus–host interac- this novel virus, we searched the CRISPR repeats asso-
tions in nature allowing their co-existence. Similarly, on ciated with the matching spacers against the CRISPR
average only 3% of the spacers matched known viruses finder database. Out of all the repeats, 18 had the same
and plasmids, again with Is2-5S with only 0.6% matching sequence that matched to a Hydrogenobaculum repeat.
spacers. This suggests that viruses and plasmids isolated Two other repeats matched to either Thermodesulfobac-
© 2015
C 2015 Society
V SocietyMicrobiology
for Applied for Applied Microbiology and&John
and John Wiley SonsWiley & Sons Ltd, Environmental
Ltd, Environmental Microbiology, Microbiology
18, 863–874
Novel viral
Novelgenomes from from
viral genomes acidic hot springs
acidic hot springs8719
Table 2. Name, potential host, genome size and accession number of the novel viral genomes.
The last two columns denote the number of matching spacers from CRISPR loci found in the same metagenome and the number of matching
spacers from the CF database.
a. The percentage of matching spacers derived from contigs assigned to the potential host is given in brackets.
–, the origin of the single spacer matching ABV3 could not be identified.
terium geofontis or Sulfurihydrogenibium azorense. Based on this, we infer that this contig represents a bacteriophage that can infect Hydrogenobaculum species, and named it Hydrogenobaculum phage 1 (HP1). Indeed, 0.1% of cellular reads in the Is2-5S metagenome were assigned to Hydrogenobaculum, with the majority being classified as unassigned Hydrogenobaculum.
In accordance with their high sequence similarity to previously identified Sulfolobus and Acidianus viruses, the genomes of ARV2, SMV2–SMV4 and ABV2–ABV3 matched 1–211 spacers, 87–100% of which were derived from Sulfolobus- or Acidianus-related contigs (Table 2). This further supports their viral identity and confirms that their hosts belong to the Sulfolobales. Similarly, more than 93% of the spacers matching SYV1 and SYV2 are associated with Sulfolobus contigs (Table 2). Given their similarity to different Sulfolobus viral genes (Fig. 3B), the two elements are probably Sulfolobus viruses.
Notably, the number of spacers matching the novel SMV genomes in the CH1102 sample is generally very high (Table 2). In fact, given an average Sulfolobales protospacer length of 39 bp (Rousseau et al., 2009), the protospacers cover more than 16% of the SMV2 genome. While hyperactive uptake of viral or plasmid sequences into CRISPR arrays has been reported for virus–host (Erdmann and Garrett, 2012) and plasmid–host (Yosef et al., 2012) systems under laboratory conditions, our data suggest that similar events may take place in nature. In the case of the CH1102 sample, this may be mediated by a virus similar to SMV1, which induces hyperactive uptake of CRISPR spacers into the genome of Sulfolobus (Erdmann and Garrett, 2012).

Conclusion

Viral sequences were retrieved from six metagenomes of acidic, high-temperature hot springs in Italy, Iceland and YNP, USA. In all metagenomes, the identified viral sequences are predominantly assigned to crenarchaeal viruses. Four viral families, Ampullaviridae, Bicaudaviridae, Lipothrixviridae and Rudiviridae, were present in all metagenomes, with the first two families being detected for the first time in several distantly located geographic regions.
We analysed and described 10 complete or near-complete genomes and 16 partial genomes (>5 kb), most of which were derived from viruses likely infecting Sulfolobaceae. These novel genomes allowed the identification of core genomes for ABV- and SMV-like viruses, and the partial genomes demonstrated high sequence variation in the viral populations.
The absence of a complete lipothrixviral genome in the metagenomes, despite the high abundance of contigs assigned to the family (Table S4), could indicate the presence of multiple related but not identical lipothrixviral genomes of similar abundance. This may contribute to genetic recombination between similar genomes and to the mosaic nature of the genomes, as observed here and by Vestergaard and colleagues (2008b).
CRISPR analysis confirmed the viral origin of all the novel viral genomes presented in this study and proved useful in linking them to potential hosts. Based on homology with known viral genes and CRISPR analysis, we identified the genome of the first virus presumably infecting Hydrogenobaculum, a genus for which no virus had been reported (Romano et al., 2013), as well as two novel genomes of viruses infecting the Sulfolobaceae family. Of particular interest is the relatively high number of spacer matches to SMV genomes and the extremely low number of spacer matches to ABV genomes, although both are abundant in their respective samples. What controls CRISPR spacer uptake events remains an open question.
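The spacer-matching steps described under CRISPR analysis, together with the protospacer coverage quoted for SMV2, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: spacers are plain strings, exact-string deduplication stands in for cd-hit-454 clustering, BLASTN hits are read from BLAST+ default tabular output (`-outfmt 6`), and all function and field names are hypothetical. The 16% figure for SMV2 appears to be an estimate of the form (number of matching spacers × 39 bp) / genome length; `estimated_coverage` reproduces that arithmetic, whereas `union_coverage` counts each genome position only once where protospacers overlap.

```python
"""Hypothetical sketch of spacer filtering, hit filtering and protospacer coverage.

Assumptions (not from the paper): exact-string deduplication replaces
cd-hit-454, and hits come from BLAST+ default tabular output (-outfmt 6).
"""
from dataclasses import dataclass


@dataclass
class Hit:
    spacer_id: str
    contig_id: str
    mismatches: int
    evalue: float
    sstart: int  # protospacer start on the viral contig (1-based)
    send: int    # protospacer end; smaller than sstart for minus-strand hits


def unique_spacers(spacers, max_n=2):
    """Remove exact duplicates, then drop spacers with more than max_n 'N's."""
    seen, kept = set(), []
    for seq in spacers:
        seq = seq.upper()
        if seq in seen:
            continue
        seen.add(seq)
        if seq.count("N") <= max_n:
            kept.append(seq)
    return kept


def parse_outfmt6(lines):
    """Parse BLAST+ default tabular output into Hit records."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], int(f[4]), float(f[10]),
                        int(f[8]), int(f[9])))
    return hits


def filter_hits(hits, max_mismatches=10, max_evalue=1e-5):
    """Keep alignments with at most max_mismatches and E-value <= max_evalue."""
    return [h for h in hits
            if h.mismatches <= max_mismatches and h.evalue <= max_evalue]


def union_coverage(hits, genome_length):
    """Fraction of genome positions covered by at least one protospacer."""
    spans = sorted((min(h.sstart, h.send), max(h.sstart, h.send))
                   for h in hits)
    covered, cur = 0, None
    for start, end in spans:
        if cur is None or start > cur[1] + 1:
            # Gap before this span: close the current merged interval.
            if cur is not None:
                covered += cur[1] - cur[0] + 1
            cur = [start, end]
        else:
            # Overlapping or adjacent span: extend the merged interval.
            cur[1] = max(cur[1], end)
    if cur is not None:
        covered += cur[1] - cur[0] + 1
    return covered / genome_length


def estimated_coverage(n_spacers, genome_length, mean_protospacer=39):
    """Count-based estimate: matching spacers x mean protospacer length / genome size."""
    return n_spacers * mean_protospacer / genome_length
```

For example, two passing hits spanning positions 101–139 and 120–158 of a 1,000 bp genome cover 58 positions (5.8%), whereas the count-based estimate gives 2 × 39 / 1,000 = 7.8%.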
Experimental procedures

Sequence assembly and phylogeny

Sampling, DNA extraction and metagenomic sequencing have been described in Menzel et al. (2015). Briefly, assembly was performed with various assemblers and settings, and the individual assemblies were merged into a meta-assembly. Each metagenome assembly was then aligned to GenBank using RAPSEARCH2 (Zhao et al., 2012) (E-value < 0.01), and MEGAN4 (Huson et al., 2011) was used to assign contigs to taxa. For the Illumina-sequenced samples It6, Is2-5S and Is3-13, the reads were mapped back to the contigs, and read counts per taxon were used to quantify taxonomic abundances within each metagenome. For the 454-sequenced samples It3, NL10 and CH1102, reads were aligned directly to GenBank using the same E-value cut-off and assigned to taxa by MEGAN.

Annotation

Large viral contigs were annotated using RAST (Rapid Annotation using Subsystem Technology) against all dsDNA viruses, with 'dsDNA viruses, no RNA stage' as the taxonomy ID (Aziz et al., 2008). A few of the contigs were also analysed with METAVIR2 (Roux et al., 2014), and the two annotations were compared in ARTEMIS (Rutherford et al., 2000).

Genomic comparison

The annotated ORFs of the novel viral genomes were aligned to the known and novel genomes in each viral family using BLASTP (Altschul et al., 1990) with an E-value cut-off of 0.01. The BLASTP output was used to annotate homologous ORFs between the genomes.
Comparisons of more than one contig or genome were visualized with MAUVE (Darling et al., 2010), and pairwise genomic comparisons with EasyFig (Sullivan et al., 2011). DNAPlotter (Carver et al., 2009) was used to generate maps of those contigs that could not be compared with any known genome; these are presented in the supplementary figures.

CRISPR analysis

The contigs from each metagenome assembly were screened for CRISPR loci using CRT with default settings (Bland et al., 2007). Spacer sequences were extracted from the predicted CRISPR loci of each metagenome and clustered with cd-hit-454 (Niu et al., 2010) at perfect identity to remove duplicates. Spacers with more than two undefined nucleotides (‘N’) were removed. The set of unique spacers from each metagenome was aligned to the viral contigs using BLASTN (Altschul et al., 1990) with an E-value cut-off of 1e-5 and a word size of 7. The BLASTN output was further filtered to keep only alignments with at most 10 mismatches between spacers and potential protospacers. Unique spacers were also aligned to all known viral and plasmid genomes from GenBank using the same method. We also counted the number of spacer sequences already present in the CRISPR database (Grissa et al., 2007), considering only identical sequences. Additionally, all spacers contained in the CRISPR database were aligned to our viral contigs, allowing up to 10 mismatches as above.

Acknowledgement

This work was supported by the European Union 7th Framework Programme FP7/2007–2013 under grant agreement No. 265933 – HotZyme. The authors declare no conflict of interest.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic local alignment search tool. J Mol Biol 215: 403–410.
Anderson, R.E., Brazelton, W.J., and Baross, J.A. (2011) Using CRISPRs as a metagenomic tool to identify microbial hosts of a diffuse flow hydrothermal vent viral assemblage. FEMS Microbiol Ecol 77: 120–133.
Arnold, H.P., Zillig, W., Ziese, U., Holz, I., Crosby, M., Utterback, T., et al. (2000) A novel lipothrixvirus, SIFV, of the extremely thermophilic crenarchaeon Sulfolobus. Virology 267: 252–266.
Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A., et al. (2008) The RAST Server: rapid annotations using subsystems technology. BMC Genomics 9: 75.
Bland, C., Ramsey, T.L., Sabree, F., Lowe, M., Brown, K., Kyrpides, N.C., and Hugenholtz, P. (2007) CRISPR recognition tool (CRT): a tool for automatic detection of clustered regularly interspaced palindromic repeats. BMC Bioinformatics 8: 209.
Bolduc, B., Shaughnessy, D.P., Wolf, Y.I., Koonin, E.V., Roberto, F.F., and Young, M. (2012) Identification of novel positive-strand RNA viruses by metagenomic analysis of archaea-dominated Yellowstone hot springs. J Virol 86: 5562–5573.
Brum, J.R., and Sullivan, M.B. (2015) Rising to the challenge: accelerated pace of discovery transforms marine virology. Nat Rev Microbiol 13: 147–159.
Carver, T., Thomson, N., Bleasby, A., Berriman, M., and Parkhill, J. (2009) DNAPlotter: circular and linear interactive genome visualization. Bioinformatics 25: 119–120.
Darling, A.E., Mau, B., and Perna, N.T. (2010) progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5: e11147.
Delwart, E.L. (2007) Viral metagenomics. Rev Med Virol 17: 115–131.
Edwards, R.A., and Rohwer, F. (2005) Viral metagenomics. Nat Rev Microbiol 3: 504–510.
Emerson, J.B., Andrade, K., Thomas, B.C., Norman, A., Allen, E.E., Heidelberg, K.B., and Banfield, J.F. (2013) Virus–host and CRISPR dynamics in Archaea-dominated hypersaline Lake Tyrrell, Victoria, Australia. Archaea 2013: 370871.
Erdmann, S., and Garrett, R.A. (2012) Selective and hyperactive uptake of foreign DNA by adaptive immune systems of an archaeon via two distinct mechanisms. Mol Microbiol 85: 1044–1056.
Erdmann, S., Chen, B., Huang, X., Deng, L., Liu, C., Shah, S.A., et al. (2014) A novel single-tailed fusiform Sulfolobus virus STSV2 infecting model Sulfolobus species. Extremophiles 18: 51–60.
Gardner, A.F., Prangishvili, D., and Jack, W.E. (2011) Characterization of Sulfolobus islandicus rod-shaped virus 2 gp19, a single-strand specific endonuclease. Extremophiles 15: 619–624.
Garrett, R.A., Prangishvili, D., Shah, S.A., Reuter, M., Stetter, K.O., and Peng, X. (2010) Metagenomic analyses of novel viruses and plasmids from a cultured environmental sample of hyperthermophilic neutrophiles. Environ Microbiol 12: 2918–2930.
Grissa, I., Vergnaud, G., and Pourcel, C. (2007) The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinformatics 8: 172.
Guo, Y., Kragelund, B.B., White, M.F., and Peng, X. (2015) Functional characterization of a conserved archaeal viral operon revealing single-stranded DNA binding, annealing and nuclease activities. J Mol Biol 427: 2179–2191.
Haring, M., Peng, X., Brugger, K., Rachel, R., Stetter, K.O., Garrett, R.A., and Prangishvili, D. (2004) Morphology and genome organization of the virus PSV of the hyperthermophilic archaeal genera Pyrobaculum and Thermoproteus: a novel virus family, the Globuloviridae. Virology 323: 233–242.
Haring, M., Rachel, R., Peng, X., Garrett, R.A., and Prangishvili, D. (2005a) Viral diversity in hot springs of Pozzuoli, Italy, and characterization of a unique archaeal virus, Acidianus bottle-shaped virus, from a new family, the Ampullaviridae. J Virol 79: 9904–9911.
Haring, M., Vestergaard, G., Rachel, R., Chen, L., Garrett, R.A., and Prangishvili, D. (2005b) Virology: independent virus development outside a host. Nature 436: 1101–1102.
Haring, M., Vestergaard, G., Brugger, K., Rachel, R., Garrett, R.A., and Prangishvili, D. (2005c) Structure and genome organization of AFV2, a novel archaeal lipothrixvirus with unusual terminal and core structures. J Bacteriol 187: 3855–3858.
Heidelberg, J.F., Nelson, W.C., Schoenfeld, T., and Bhaya, D. (2009) Germ warfare in a microbial mat community: CRISPRs provide insights into the co-evolution of host and viral genomes. PLoS ONE 4: e4169.
Huson, D.H., Mitra, S., Ruscheweyh, H.J., Weber, N., and Schuster, S.C. (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res 21: 1552–1560.
Inskeep, W.P., Rusch, D.B., Jay, Z.J., Herrgard, M.J., Kozubal, M.A., Richardson, T.H., et al. (2010) Metagenomes from high-temperature chemotrophic systems reveal geochemical controls on microbial community structure and function. PLoS ONE 5: e9773.
Inskeep, W.P., Jay, Z.J., Herrgard, M.J., Kozubal, M.A., Rusch, D.B., Tringe, S.G., et al. (2013) Phylogenetic and functional analysis of metagenome sequence from high-temperature archaeal habitats demonstrate linkages between metabolic potential and geochemistry. Front Microbiol 4: 95.
Janekovic, D., Wunderl, S., Holz, I., Zillig, W., Gierl, A., and Neumann, H. (1983) TTV1, TTV2 and TTV3, a family of viruses of the extremely thermophilic, anaerobic, sulfur reducing archaebacterium Thermoproteus tenax. Mol Gen Genet 192: 39–45.
Krupovic, M., Prangishvili, D., Hendrix, R.W., and Bamford, D.H. (2011) Genomics of bacterial and archaeal viruses: dynamics within the prokaryotic virosphere. Microbiol Mol Biol Rev 75: 610–635.
Manica, A., Zebec, Z., Steinkellner, J., and Schleper, C. (2013) Unexpectedly broad target recognition of the CRISPR-mediated virus defence system in the archaeon Sulfolobus solfataricus. Nucleic Acids Res 41: 10509–10517.
Menzel, P., Gudbergsdóttir, S., Rike, A., Lin, L., Zhang, Q., Contursi, P., et al. (2015) Comparative metagenomics of eight geographically remote terrestrial hot springs. Microb Ecol 70: 411–424.
Mokili, J.L., Rohwer, F., and Dutilh, B.E. (2012) Metagenomics and future perspectives in virus discovery. Curr Opin Virol 2: 63–77.
Niu, B., Fu, L., Sun, S., and Li, W. (2010) Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11: 187.
Peng, X., Blum, H., She, Q., Mallok, S., Brugger, K., Garrett, R.A., et al. (2001) Sequences and replication of genomes of the archaeal rudiviruses SIRV1 and SIRV2: relationships to the archaeal lipothrixvirus SIFV and some eukaryal viruses. Virology 291: 226–234.
Peng, X., Basta, T., Haring, M., Garrett, R.A., and Prangishvili, D. (2007) Genome of the Acidianus bottle-shaped virus and insights into the replication and packaging mechanisms. Virology 364: 237–243.
Prangishvili, D. (2013) The wonderful world of archaeal viruses. Annu Rev Microbiol 67: 565–585.
Rice, G., Tang, L., Stedman, K., Roberto, F., Spuhler, J., Gillitzer, E., et al. (2004) The structure of a thermophilic archaeal virus shows a double-stranded DNA viral capsid type that spans all domains of life. Proc Natl Acad Sci USA 101: 7716–7720.
Romano, C., D’Imperio, S., Woyke, T., Mavromatis, K., Lasken, R., Shock, E.L., and McDermott, T.R. (2013) Comparative genomic analysis of phylogenetically closely related Hydrogenobaculum sp. isolates from Yellowstone National Park. Appl Environ Microbiol 79: 2932–2943.
Rosario, K., and Breitbart, M. (2011) Exploring the viral world through metagenomics. Curr Opin Virol 1: 289–297.
Rousseau, C., Gonnet, M., Le Romancer, M., and Nicolas, J. (2009) CRISPI: a CRISPR interactive database. Bioinformatics 25: 3317–3318.
Roux, S., Tournayre, J., Mahul, A., Debroas, D., and Enault, F. (2014) Metavir 2: new tools for viral metagenome comparison and assembled virome analysis. BMC Bioinformatics 15: 76.
Rutherford, K., Parkhill, J., Crook, J., Horsnell, T., Rice, P., Rajandream, M.A., and Barrell, B. (2000) Artemis: sequence visualization and annotation. Bioinformatics 16: 944–945.
Schoenfeld, T., Patterson, M., Richardson, P.M., Wommack, K.E., Young, M., and Mead, D. (2008) Assembly of viral metagenomes from Yellowstone hot springs. Appl Environ Microbiol 74: 4164–4174.
Servin-Garciduenas, L.E., Peng, X., Garrett, R.A., and Martinez-Romero, E. (2013) Genome sequence of a novel archaeal rudivirus recovered from a Mexican hot spring. Genome Announc 1: e00040–12.
Snyder, J.C., Bateson, M.M., Lavin, M., and Young, M.J. (2010) Use of cellular CRISPR (clusters of regularly interspaced short palindromic repeats) spacer-based microarrays for detection of viruses in environmental samples. Appl Environ Microbiol 76: 7251–7258.
Sorek, R., Kunin, V., and Hugenholtz, P. (2008) CRISPR – a widespread system that provides acquired resistance against phages in bacteria and archaea. Nat Rev Microbiol 6: 181–186.
Sullivan, M.J., Petty, N.K., and Beatson, S.A. (2011) Easyfig: a genome comparison visualizer. Bioinformatics 27: 1009–1010.
Uldahl, K., and Peng, X. (2013) Biology, biodiversity and application of thermophilic viruses. In Thermophilic Microbes in Environmental and Industrial Biotechnology. Satyanarayana, T., Littlechild, J., and Kawarabayasi, Y. (eds). Netherlands: Springer, pp. 271–304.
Vestergaard, G., Haring, M., Peng, X., Rachel, R., Garrett, R.A., and Prangishvili, D. (2005) A novel rudivirus, ARV1, of the hyperthermophilic archaeal genus Acidianus. Virology 336: 83–92.
Vestergaard, G., Shah, S.A., Bize, A., Reitberger, W., Reuter, M., Phan, H., et al. (2008a) Stygiolobus rod-shaped virus and the interplay of crenarchaeal rudiviruses with the CRISPR antiviral system. J Bacteriol 190: 6837–6845.
Vestergaard, G., Aramayo, R., Basta, T., Haring, M., Peng, X., Brugger, K., et al. (2008b) Structure of the Acidianus filamentous virus 3 and comparative genomics of related archaeal lipothrixviruses. J Virol 82: 371–381.
Xiang, X., Chen, L., Huang, X., Luo, Y., She, Q., and Huang, L. (2005) Sulfolobus tengchongensis spindle-shaped virus STSV1: virus–host interactions and genomic features. J Virol 79: 8677–8686.
Yosef, I., Goren, M.G., and Qimron, U. (2012) Proteins and DNA elements essential for the CRISPR adaptation process in Escherichia coli. Nucleic Acids Res 40: 5569–5576.
Zhao, Y., Tang, H., and Ye, Y. (2012) RAPSearch2: a fast and memory-efficient protein similarity search tool for next-generation sequencing data. Bioinformatics 28: 125–126.

Supporting information

Additional Supporting Information may be found in the online version of this article at the publisher’s web-site:
Fig. S1. Mauve comparison between ABV2 and the three large contigs over 5 kbp derived from the It6 metagenome.
Fig. S2. Mauve comparison between ABV3 and the largest ABV-like contigs of 14 789 bp, 12 489 bp, 10 238 bp and 11 673 bp, respectively, derived from the Is3-13 metagenome.
Fig. S3. Genome map of the largest ABV-like contig of 10 273 bp derived from CH1102 in comparison to ABV.
Fig. S4. Mauve comparison of ABV and It3 ABV-like contigs (5365 bp, 8612 bp and 11 060 bp respectively).
Fig. S5. Mauve comparison of ABV and ABV-like contigs from NL10 (11 649, 8331 and 8990 bp respectively).
Fig. S6. Mauve comparison of ABV and all large ABV-like contigs from all metagenomes. The left part (L, 0–11.9 kb) and the right part (R, 11.9–23.8 kb) are indicated, separated by a vertical line at the top of the figure.
Fig. S7. Recruitment plot of metagenomic reads of samples It3 (454 sequencing) and It6 (Illumina HiSeq, 2 × 90 bp) to the ATV genome. Black rectangles denote the ORFs on the genome, only a few of which have annotated functions.
Fig. S8. Genome comparison between ARV1 and ARV2 from It6. Several ORFs are conserved (orange arrows) between the two genomes, although with a few rearrangements. The ends of the genomes show poor alignment. ORFs showing highest homology to different viral or host genes are colour-coded.
Fig. S9. Pairwise sequence identities (a) and phylogenetic tree (b) of the major virion protein of ATV, the STSVs and the SMVs. Internal nodes denote the bootstrap confidence values. Amino acid sequences were aligned by MUSCLE ver. 3.8.31 (using default settings) and the tree was inferred using the http://phylogeny.lirmm.fr server (using PhyML 3.0 with default settings and 100 bootstrap iterations).
Fig. S10. Genome map of GEP1 from the It3 metagenome.
Fig. S11. Genome map of HP1 from the Is2-5S metagenome.
Table S1. Temperature and pH of the sampling site and sampling time for each sample. YNP, Yellowstone National Park.
Table S2. Distribution of metagenomic reads assigned to Viruses, Bacteria, Archaea, or Other (e.g. Eukaryotes), as well as Unassigned to any of the four.
Table S3. Overview of the number of large viral contigs in each metagenome and their phylogenetic assignment by MEGAN. The size and the name (in bold) of the largest viral contig within a metagenome are indicated.
Table S4. Overview of ABV- and ATV-like as well as Lipothrix- and Rudiviral-like contigs from all metagenomes.
Table S5. Comparison of the three ABV genomes and a.a. identity between homologues. Genes deleted in both ABV2 and ABV3 are indicated in orange cells and those deleted only in ABV3 in yellow cells, whereas genes in blue cells are inverted in comparison to ABV.
Table S6. Protein sequence identity between ABV and the largest ABV-like contigs from each metagenome. Numbers with * denote genes that are found outside their regular gene order. Genes labelled + are on the forward strand and – on the reverse strand, as depicted in Fig. 2.