Partial Sequencing Reveals The Transposable Element Composition of Coffea Genomes and Provides Evidence For Distinct Evolutionary Stories



Article in Molecular Genetics and Genomics · October 2016


DOI: 10.1007/s00438-016-1235-7



Mol Genet Genomics (2016) 291:1979–1990
DOI 10.1007/s00438-016-1235-7

ORIGINAL ARTICLE

Partial sequencing reveals the transposable element composition of Coffea genomes and provides evidence for distinct evolutionary stories
Romain Guyot2 · Thibaud Darré1 · Mathilde Dupeyron1 · Alexandre de Kochko1 ·
Serge Hamon1 · Emmanuel Couturon1 · Dominique Crouzillat3 ·
Michel Rigoreau3 · Jean‑Jacques Rakotomalala4 · Nathalie E. Raharimalala4 ·
Sélastique Doffou Akaffou5 · Perla Hamon1

Received: 19 May 2016 / Accepted: 25 July 2016 / Published online: 28 July 2016
© Springer-Verlag Berlin Heidelberg 2016

Abstract The Coffea genus, with 124 described species, has a natural distribution spreading from inter-tropical Africa to the Western Indian Ocean Islands, India, Asia and up to Australasia. Two cultivated species, C. arabica and C. canephora, are intensively studied, while the breeding potential and the genome composition of all the wild species remain poorly characterized. Here, we report the characterization and comparison of the highly repeated transposable element content of 11 Coffea species representative of the natural biogeographic distribution. A total of 994 Mb of 454 reads was produced, with a genome coverage ranging between 3.2 and 15.7 %. The analyses showed that highly repeated transposable elements, mainly LTR retrotransposons (LTR-RT), represent between 32 and 53 % of Coffea genomes depending on their biogeographic location and genome size. Species from West and Central Africa (Eucoffea) contained the highest LTR-RT content but with no strong variation relative to their genome size. In contrast, for the insular species (Mascarocoffea), a strong variation of LTR-RT content was observed, suggesting differential dynamics of these elements in this group. Two LTR-RT lineages, SIRE and Del, were clearly differentially accumulated between African and insular species, suggesting that these lineages were associated with the genome divergence of Coffea species in Africa. Altogether, the information obtained in this study improves our knowledge and brings new data on the composition, the evolution and the divergence of wild Coffea genomes.

Keywords LTR retrotransposons · Partial genome sequencing · Coffea · Genome size · Geographic divergence

Communicated by S. Hohmann.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-016-1235-7) contains supplementary material, which is available to authorized users.

* Romain Guyot
[email protected]

1 IRD UMR DIADE, EvoGeC, BP 64501, 34394 Montpellier Cedex 5, France
2 IRD UMR IPME, CoffeeAdapt, BP 64501, 34394 Montpellier Cedex 5, France
3 Nestlé R&D Tours, 101 AV. G. Eiffel, Notre Dame d’Oé, BP 49716, 37097 Tours Cedex 2, France
4 FOFIFA, Ambatobe, Madagascar
5 University Jean Lorougnon Guédé, Daloa, Ivory Coast

Introduction

Repetitive sequences are major components of plant genomes. Transposable elements (TEs), constituting the mobile part of the genomes, are divided into two main classes (Class I and Class II) according to their mode of transposition. They are hierarchically classified into orders, super-families, lineages, families and individuals within each class (Wicker and Keller 2007; Wicker et al. 2007). Class I elements, known as retrotransposons, transpose via an RNA intermediate without movement of the master copy. This ‘copy-and-paste’ mechanism can theoretically lead to a rapid increase of the frequency of the original copy. Class II elements, or transposons, move following a ‘cut-and-paste’ mechanism or through DNA replication, resulting in low or moderate numbers of newly inserted copies. Plant retrotransposons include two major orders: Long Terminal Repeat retrotransposons (LTR-RT) and non-LTR retrotransposons. The first ones include two super-families, Copia and Gypsy, that differ mainly in their coding region organization and


are composed of ancient conserved evolutionary lineages in plants (Wicker and Keller 2007). The second ones include long and short interspersed nuclear elements, LINE and SINE, respectively (Kumar and Bennetzen 1999; Wicker et al. 2007).

During the last 10 years, the accumulation of genomic sequencing data (Michael and Jackson 2013) indicated that TEs are the major component of plant genomes and that their accumulation could be correlated with genome size (Ibarra-Laclette et al. 2013; Kumar and Bennetzen 1999; Lisch 2013). LTR-RTs are the most redundant elements and, in extreme cases, they can represent up to 80 % of plant genome sequences, suggesting that their propagation mechanisms are directly responsible for genome size increase (SanMiguel et al. 1998; Schulman et al. 2004; Bennetzen et al. 2005; Hawkins et al. 2006; Dvořák 2009). Sometimes the propagation mechanisms induce a rapid accumulation, called a “burst”, of a small number of LTR-RT families, as demonstrated in the wild rice Oryza australiensis (Piegu et al. 2006). In contrast, in maize, the accumulation of LTR-RT families was probably gradual, but led to a considerable genome size increase when compared to the sorghum or the rice genome. However, a correlation between genome size variations and LTR-RT copy numbers was not established for the Zea genus (Meyers et al. 2001), suggesting that the proliferation mechanism of a few LTR-RT families per se cannot explain all genome size variations in plants. The host genome controls the level of transposition of LTR-RTs through epigenetic mechanisms (Bucher et al. 2012; Ito 2013; Ito and Kakutani 2014). This control might be reduced under abiotic stresses (Todorovska 2007; Alzohairy et al. 2014; Kinoshita and Seki 2014), leading to an increase of transposition and suggesting that LTR-RTs play a role in genome adaptation to environmental changes (Casacuberta and Gonzalez 2013). However, so far no correlation has been established between plant genome size and habitat or phenotypic and life traits (Eilam et al. 2007; Knight and Beaulieu 2008; Slovak et al. 2009; Dušková et al. 2010).

Next generation sequencing (NGS) technologies provided powerful tools to identify and characterize the repetitive fraction of genomes, even in large genomes such as wheat, barley or pea (Macas et al. 2007; Wicker et al. 2009). For these authors, an important advantage of NGS lies in the limited bias obtained for the production of the sequences. Low-depth sequencing was effective in identifying the most highly repeated sequences and in estimating their copy numbers in the pea genome (Swaminathan et al. 2007), the banana genome (Hribova et al. 2010) and in vesper bats (Pagan et al. 2012), and in studying genome evolution at a genus or a family scale (Nystedt et al. 2013). It also allows the identification of TE insertion polymorphisms accompanying clonal variation in grape (Carrier et al. 2012). The uneven distribution of TEs between wheat and barley (Wicker et al. 2009), the genome size variation in the allotetraploid species Nicotiana tabacum (Renny-Byfield et al. 2011), as well as the composition and abundance of highly repeated TEs in ten Triticeae taxa (Middleton et al. 2013) were also studied via a 454 pyrosequencing genomic survey.

Ranked fourth among angiosperms, the young Rubiaceae family [90.4 My divergence time (Bremer and Eriksson 2009)] comprises ca. 600 genera and ca. 13,600 species. This family includes herbs, shrubs and trees growing naturally in very diverse habitats (from desert to tropical evergreen forests via temperate areas), altitudes (from sea level to over 2500 m) and soils. In this plant family, diploids are the most common and share the same basic chromosome number [x = 11 (Kiehn 1995)]. The Coffea genus, a member of Rubiaceae, is the best-known genus of the family due to its major socio-economic importance worldwide (producers in Southern countries and consumers in Northern countries). The genus accounts for 124 described species, all diploids with 2n = 2x = 22 except C. arabica (allotetraploid), and its natural distribution in inter-tropical forests of Africa and of the Western Indian Ocean Islands was recently extended to India, Asia and Australasia (Davis et al. 2011). The recent sequencing of the C. canephora (also called Robusta) genome showed that no whole genome duplication has occurred after the Asterid clade divergence, some 110 My ago (Denoeud et al. 2014). Moreover, comparative mapping between two divergent African genomes, C. canephora and C. pseudozanguebariae, did not reveal any major chromosomal rearrangements (unpublished data). Despite this structural conservation, a notable variation of genome sizes is observed among Coffea species. This variation ranges from 469 to 900 Mb with a general pattern of increasing genome sizes from East to West in Africa (Noirot et al. 2003) and from North to South-East in Madagascar (Razafinarivo et al. 2012), suggesting a gradual accumulation of nuclear DNA under speciation and adaptive processes of the species. Recently, the C. canephora genome sequencing allowed the computational identification of TEs (Denoeud et al. 2014). They represent more than half of the available genome sequence and, among them, LTR-RTs are the most frequent order of elements (42 % of the genome). However, outside the C. canephora genome, no wide survey of TE composition has been conducted in the Coffea genus.

Here, we used a 454 sequencing survey of one tetraploid and ten diploid species representative of the botanical and geographical diversity of the genus Coffea to study and compare the composition and abundance of highly repeated transposable elements in their genomes. Using a genome coverage ranging from 3.2 to 15.7 %, the analysis of LTR-RT composition and dynamics shows a clear difference between African and insular Coffea species, suggesting an ancient divergence. Contrary to previous hypotheses and


Table 1  454 sequencing data for 11 Coffea species

Species                Accession  Country of origin  Group    No. of 454 reads  Total (bp)   Mean size (bp)  Coverage (%)  Genome size (Mb)
C. arabica             ET39       Ethiopia           EUC      93,194            41,643,904   446             3.2           1300
C. arabica             ET39       Ethiopia           EUC      112,615           51,892,121   460             3.9           1300
C. arabica             ET39       Ethiopia           EUC      140,976           61,729,478   437             4.7           1300
C. canephora           IF410      Ivory Coast        EUC      186,138           85,292,671   458             12.2          700
C. canephora           DH200-94   D. Republic Congo  EUC      98,017            43,037,451   439             6.1           700
C. canephora           BUD15      Uganda             EUC      140,120           64,290,611   458             9.2           700
C. charrieriana        OA22       Cameroon           EUC (a)  136,518           57,405,992   420             7.9           723
C. eugenioides         OUG14      Uganda             EUC      186,449           85,961,094   461             13.3          645
C. eugenioides         DA56       Kenya              EUC      91,834            39,993,235   435             6.2           645
C. heterocalyx         JC65       Cameroon           EUC      123,119           45,633,337   370             5.2           863
C. pseudozanguebariae  8107       Kenya              MOZ      215,117           91,733,301   426             15.5          593
C. racemosa            IA56       Mozambique         MOZ      173,803           79,199,218   455             15.7          506
C. tetragona           A.252      Madagascar         MAS      147,430           68,881,825   467             13.4          513
C. dolichophylla       A.206      Madagascar         MAS      147,758           70,632,674   478             10.4          682
C. humblotiana         A.230      Comoros            MAS      141,834           62,465,685   440             10.4          469
C. horsfieldiana       HOR        Indonesia          PSI      104,605           44,610,588   426             7.5           593

Botanical groups (Group) are those from Chevalier (1942) with EUC Eucoffea (species from West and Central Africa), MOZ Mozambicoffea (East Africa), MAS Mascarocoffea (species from Western Indian Ocean Islands), PSI Paracoffea
(a) The Eucoffea classification for C. charrieriana was not established by Chevalier since the species was recently described by Stoffelen et al. (2008). Therefore, its classification was assumed according to its geographical origin. Genome sizes are from Noirot et al. (2003) and Razafinarivo et al. (2012). The genome coverage is given in %
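
The coverage column of Table 1 appears to be the total number of sequenced bases divided by the genome size. The short Python sketch below recomputes it for three accessions under that assumption; the function name `coverage_percent` is hypothetical and the input values are copied from Table 1.

```python
# Genome coverage (%) = sequenced bases / genome size.
# Assumption: this is how the "Coverage (%)" column of Table 1 was obtained.


def coverage_percent(total_bp: int, genome_size_mb: float) -> float:
    """Return the sequencing coverage as a percentage of the genome size."""
    return 100.0 * total_bp / (genome_size_mb * 1_000_000)


# (species, accession, total bp, genome size in Mb), values from Table 1
table1_rows = [
    ("C. arabica", "ET39, first run", 41_643_904, 1300),
    ("C. canephora", "IF410", 85_292_671, 700),
    ("C. racemosa", "IA56", 79_199_218, 506),
]

for species, accession, total_bp, size_mb in table1_rows:
    cov = coverage_percent(total_bp, size_mb)
    print(f"{species} ({accession}): {cov:.1f} %")
```

For these three rows the recomputed values round to 3.2, 12.2 and 15.7 %, which matches the table.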

generally admitted idea, our results suggest that the Coffea species from Western Indian Ocean Islands and from Asia have diverged independently from their continental counterparts. Furthermore, no strong activation of LTR-RTs was obvious in any species, whatever their genome size, suggesting that other molecular mechanisms or general but limited variation in TE copy numbers are associated with genome size increases in the Coffea genus.

Materials and methods

DNA isolation and 454 sequencing

Leaves from Madagascan and Comorian species were obtained from the Kianjavato Coffee Research Station (KCRS) in Madagascar. The African species were sampled from the Coffea collection maintained at IRD (Montpellier, France) or Nestlé R&D (Tours, France) greenhouses. The studied species belong to Chevalier’s (Chevalier 1942) botanical sections, i.e., Eucoffea (West and Central African species), Mozambicoffea (East African species), Mascarocoffea (species from the Western Indian Ocean Islands) and Paracoffea (species belonging to Psilanthus sub-genus Afrocoffea). In total, we used seven Eucoffea, two Mozambicoffea, three Mascarocoffea and one Paracoffea accessions. Information on the accessions used, their origin and other data used is given in Table 1.

DNA was isolated from fresh or dried leaves using Qiagen DNeasy Plant Mini extraction kits following the manufacturer’s protocol. Quantity and quality of DNA were measured using a Nanodrop (ND-1000). Library construction and next generation sequencing were performed at the Nestlé R&D laboratory (Tours, France) according to the Roche/454 Life Sciences Sequencing Method using one Roche 454 GS Junior plate per accession. Data were submitted to GenBank, BioProject PRJNA242989. General information on 454-pyrosequencing is available in Table 1.

Sequences analyses

Quality of 454 reads was checked using FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and reads were cleaned using Prinseq v0.20.4 (Schmieder and Edwards 2011).

BLASTX searches (minimum e-value 10e−4) were first carried out on 454 reads against the RepBase amino acid sequence dataset (Kohany et al. 2006; Jurka et al. 2005; http://www.girinst.org/repbase/). BLASTN searches were carried out against Coffea coding sequences (CDS, http://coffee-genome.org), the C. arabica chloroplast genome (EF044213) and rRNA sequences (X52320 and AY083685)


with a minimum e-value of 10e−6. BLASTN analyses were also performed against the C. canephora repeat database built with REPET (https://urgi.versailles.inra.fr/Tools/REPET) with an e-value of 10e−20. The goal was to identify the major TE classes, super-families and lineages reported to date at different scales (amino acid and nucleotide) and to obtain their proportion in the investigated genomes. Given the importance of the Class I/LTR-RT in all genomes, BLASTN similarity searches were conducted between 454 reads and a dataset of LTR retrotransposon consensus sequences from C. canephora classified according to their Reverse Transcriptase (RT) amino acid similarities (available at the Gypsy Database 2.0). 454 sequences showing similarities with RT domains were classified by phylogenetic analyses. Identified RT domains from 454 datasets were extracted from the nucleotide sequences and translated into amino acids. Amino acid sequences (with a minimum of 150 residues) were aligned (ClustalW) to construct a bootstrapped neighbor-joining tree, edited with FigTree (http://tree.bio.ed.ac.uk/software/figtree/).

Detailed annotation of the SIRE lineage (Copia) was performed using LTRFinder (http://tlife.fudan.edu.cn/ltr_finder/; Xu and Wang 2007). LTR domain sequences were aligned with MUSCLE to build a consensus neighbor-joining phylogenetic tree (100 bootstraps) with ClustalW. Complete SIRE elements were annotated with Artemis (Rutherford et al. 2000) and used as references. Structural incongruities (InDels and rearrangements) were searched using graphic alignments (dot-plot; Sonnhammer and Durbin 1995).

The copy number of SIRE in the 454 datasets was estimated as described in Chaparro et al. (2015) and Dias et al. (2015). BLASTN searches were carried out with full-length SIRE elements found in the C. canephora genome. Reads with more than 90 % nucleotide identity with the reference sequence over a minimum of 90 % of the read length were considered as potential fragments of the element. Cumulative lengths of aligned reads were used to extrapolate the contribution of the element to each genome investigated. For each element family, the potential number of full-length copies was estimated by dividing the estimated total size of the members of the element in the genome by the reference sequence length.

De novo detection of repeated sequences

De novo detection of repeated sequences was carried out using RepeatScout (http://bix.ucsd.edu/repeatscout/; Price et al. 2005) on 454 sequences for each species. The libraries of repeated sequences were used to mask each 454 dataset using RepeatMasker (http://www.repeatmasker.org). Repetitions were then filtered according to their minimum redundancy in the 454 dataset as follows: 20, 100, 500 and 1000 repetitions.

Searches for microsatellites

Microsatellites were detected on 454 sequences using the MicroSAtellite identification tool (http://pgrc.ipk-gatersleben.de/misa/). The unit size of repetition ranged from 1 to 20 and the number of repeated units ranged from 1 to 10.

PCR amplification on Coffea DNA

Primers were designed on the ENV and LTR domains of three full-length SIRE elements annotated in this analysis (called 36-863, 3-942 and 6-1571) using Primer3 (http://primer3.ut.ee) (Supplemental data 1A). PCR amplifications were performed in a final volume of 20 μL using the GoTaq DNA polymerase from Promega, according to the manufacturer’s recommendations: 0.5 μL of dNTP (10 nM), 1 μL of each primer (10 mM), 0.2 U of Taq polymerase (GoTaq, Promega) and 20 ng of DNA matrix. We used the following PCR amplification cycle: 98 °C for 2 min; three steps (98 °C 30 s, 55 °C 30 s, 72 °C 30 s) repeated 35 times, followed by a final elongation step (72 °C 5 min). The DNA samples, representative of the biogeographic Coffea groups (Supplemental data 1B), are those used in Razafinarivo et al. (2013).

Results

454 sequencing in Coffea: run reproducibility and characterization of genomes composition

The 454 junior runs were produced for 10 Coffea diploid species and one tetraploid species. Three independent runs for the same accession (ET39) of the tetraploid species, C. arabica, were carried out to check the reproducibility of the runs. In addition, for two diploid species, C. canephora and C. eugenioides, three (BUD15, HD200 and IF410) and two (DA56 and OUG14) accessions were, respectively, sequenced. The 454 sequencing produced a genome coverage ranging from 3.2 to 4.7 % for C. arabica and from 5.2 to 15.7 % for all the diploid species (Table 1). In total, more than 2.2 million reads, accounting for 994 Mb, were produced and analyzed in this study. The three C. arabica replicates gave similar results, showing the good reproducibility of the sequencing and giving confidence in the results presented here.

Using BLASTN (CDS, chloroplast genome, rDNA) and BLASTX (transposable elements) we found that protein-coding genes represented between 11 % (C. heterocalyx) and 18 % (C. canephora acc. DH200-94) of the obtained


[Fig. 1: pie charts of 454 read composition, one per accession. Categories: Class I, Class II, CDS, rDNA, Chloroplast. Panels (genome size in Mb): C. arabica ET39 runs 1, 2 and 3 (1300); C. eugenioides OUG14 and DA56 (645); C. canephora IF410, DH and BUD15 (700); C. charrieriana (723); C. heterocalyx (863); C. pseudozanguebariae (593); C. racemosa (506); C. tetragona (513); C. dolichophylla (689); C. humblotiana (469); C. horsfieldiana (593)]

Fig. 1  Composition of 454 reads for 11 Coffea species and 14 accessions. Class I and Class II are known transposable element coding regions, CDS cellular coding regions, rDNAs ribosomal DNA genes. Name and accession of species are indicated with their respective genome size in brackets (in Mb)

data (Fig. 1). A similar percentage to that of C. canephora was found for the three C. arabica replicates (17 %). However, the proportion of identified chloroplast sequences between species varies between 0.14 % (C. arabica) and 7 % (C. pseudozanguebariae). Five species showed a percentage of chloroplast sequences larger than 2 % (Fig. 1).


Chloroplast DNA presence may be attributed to the fact that total DNA was extracted for the sequencing and not just the nuclear fraction as in Carrier et al. (2011), or to different amounts of chloroplast DNA inserted into the nuclear genomes according to the studied taxa; such insertions have been observed in the sequenced C. canephora genome (Denoeud et al. 2014). Recognizable coding sequences from transposable elements represented a significant proportion, ranging from 10 % for C. humblotiana, the smallest genome [469 Mb (Razafinarivo et al. 2012)], to 14 % for C. dolichophylla, an average size genome (689 Mb). Interestingly, the genome of C. heterocalyx [the biggest one with 863 Mb (Noirot et al. 2003)] contained 12 % of transposable element coding genes.

For C. canephora, a similar proportion of TE coding sequences (Class I and Class II) was found for the three accessions analyzed (BUD15, IF410 and DH200-94) originating from three different geographical areas (respectively, 12.3, 12.8 and 13.7 %). For all the species, most of the identified coding sequences of transposable elements fell into Class I, as found for the C. canephora genome sequence (Denoeud et al. 2014).

To further investigate the composition of repeated sequences in Coffea species, we used as reference the C. canephora database of consensus transposable elements that was constructed de novo and annotated using the REPET programs. The C. canephora database is composed of 4051 consensus sequences, of which 1536 and 2023 belonged to the LTR retrotransposons and non-autonomous LTR retrotransposons, respectively. Using this dataset, the proportion of LTR retrotransposons in the 454 reads ranged from 32 % for C. humblotiana to 53 % for C. heterocalyx (Supplemental data 2). Interestingly, the amount of 454 reads similar to C. canephora LTR retrotransposon consensus sequences was very similar for Eucoffea species whatever their genome size (C. arabica: 50–51 %, C. eugenioides: 48–50 %, C. canephora: 49–52 %, C. charrieriana: 48 % and C. heterocalyx: 53 %), while a clearly lower amount was observed for the Mozambicoffea species (C. pseudozanguebariae: 37 %, C. racemosa: 39 %), for Mascarocoffea species (C. tetragona: 36 %, C. dolichophylla: 40 %, and C. humblotiana: 32 %) and for Asian Paracoffea (C. horsfieldiana: 34 %). These variations between Eucoffea and the three other botanical groups (Mozambicoffea, Mascarocoffea and Paracoffea) appeared independent of genome size, with the exception of C. humblotiana, which showed both the smallest genome (469 Mb) and the lowest percentage of 454 reads containing sequences similar to C. canephora LTR retrotransposons (32 %). Such variation could be attributed to the nucleotide divergence of LTR retrotransposons between Eucoffea and the other botanical groups, since the nucleotide database of LTR retrotransposons used as reference was established from C. canephora (Eucoffea). Altogether our data suggest a noticeable variation of the quantitative LTR-RT content in Coffea species genomes.

Abundance of LTR‑retrotransposon lineages and their contribution to genome size

To further investigate the quantitative variation of LTR-retrotransposon content, we first classified the REPET consensus sequences into Copia and Gypsy super-families and then into lineages (Bianca, Oryco, Retrofit, Sire, Tork for Copia and Athila, CRM, Del, Galadriel, Reina and TAT for Gypsy; Llorens et al. 2009), according to their similarities to reverse transcriptase (RT) reference domains. In total, LTR-retrotransposon consensus sequences were assigned to 877 families containing RT domains, of which 352 and 525 belong to Copia and Gypsy, respectively. These 877 families belong to all the different LTR-retrotransposon lineages previously discovered in other plant genomes. Using this dataset, all the Coffea species analyzed were found to contain a Gypsy/Copia ratio ranging from 2.6 to 4.6, suggesting that Gypsy represented the most abundant LTR-retrotransposon super-family in Coffea species, as previously found in C. canephora (Denoeud et al. 2014; Dereeper et al. 2013). The overall proportion of Copia and Gypsy varied greatly according to Chevalier’s botanical classification and increased from Eucoffea to Mascarocoffea (Supplemental data 3). These variations were not noticeable when the 454 reads were translated (using BLASTX analysis against RepBase). Interestingly, the Gypsy/Copia ratio was clearly heterogeneous among Mascarocoffea species. Indeed, the proportion of different lineages also varied according to the botanical classification (Fig. 2). Two lineages, SIRE from Copia and Del from Gypsy, appeared to differ strongly in the 454 reads between Eucoffea, Mozambicoffea, Mascarocoffea and Paracoffea. In Eucoffea, the SIRE lineage is present in 4.5–5.1 % of the 454 reads (identified with BLASTN, e-value 10e−20), with the exception of C. charrieriana, for which 3.2 % of reads contained this lineage. Mozambicoffea species contained a lower percentage of SIRE, with 2.1 and 2.2 % for C. pseudozanguebariae and C. racemosa, while SIRE sequences were very rare in Mascarocoffea species and Paracoffea (between 1.1 and 1.5 %). Another important variation between botanical groups is observed for the Del fraction, going from 16.2 to 14 % in Eucoffea, 10.7 to 11.6 % in Mozambicoffea, 7.3 to 9.9 % in Mascarocoffea and 7.2 % in Paracoffea (Fig. 2; Supplemental data 4). Here also, the lowest percentage in Eucoffea is observed for C. charrieriana (13.1 %), contrasting with the other species of this botanical group. The pattern of LTR retrotransposons identified in C. charrieriana suggests that this species differs from all the other Eucoffea species studied here.


[Fig. 2: bar chart of the percentage of 454 reads per LTR retrotransposon lineage (y-axis from 0 to 18 %). Copia lineages: BIANCA, SIRE, TORK, ORYCO, RETROFIT. Gypsy lineages: REINA, ATHILA, TAT, CRM, GALADRIEL, DEL. Accessions shown: C. eugenioides (OUG14, DA56), C. canephora (BUD15, HD, IF410), C. arabica (ET39 runs 1, 2 and 3), C. charrieriana (OA22), C. heterocalyx, C. pseudozanguebariae, C. racemosa, C. tetragona, C. dolichophylla, C. humblotiana, grouped as Eucoffea, Moz, Mas and P]

Fig. 2  Composition of 454 reads (in percentage) similar to LTR retrotransposon lineages between 11 Coffea species, organized according to their botanical sections: Eucoffea, Mozambicoffea (Moz), Mascarocoffea (Mas) and Paracoffea (P)
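
The SIRE copy-number estimate described in Materials and methods (reads with more than 90 % identity over at least 90 % of their length, cumulative aligned length extrapolated to the genome, then divided by the reference length) can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the `Hit` record, the function `estimate_copies`, the toy reads and the 10 kb reference length are all hypothetical. The extrapolation step assumes that the share of sequenced bases assigned to an element equals its share of the genome.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Best BLASTN hit of one 454 read against a reference SIRE element."""
    read_id: str
    read_length: int     # full length of the read (bp)
    aligned_length: int  # part of the read covered by the alignment (bp)
    identity: float      # nucleotide identity of the alignment (%)


def estimate_copies(hits, dataset_bp, genome_size_bp, reference_length_bp,
                    min_identity=90.0, min_read_fraction=0.90):
    """Estimate the number of full-length copies of one element.

    Assumption: the fraction of the dataset assigned to the element is the
    same as the fraction of the genome it occupies.
    """
    # Keep reads with more than 90 % identity over at least 90 % of their length.
    kept = [h for h in hits
            if h.identity > min_identity
            and h.aligned_length >= min_read_fraction * h.read_length]
    # Cumulative length of the retained alignments.
    cumulative_bp = sum(h.aligned_length for h in kept)
    # Extrapolate from the sequenced sample to the whole genome.
    element_bp_in_genome = cumulative_bp * genome_size_bp / dataset_bp
    # Express the result as a number of full-length reference copies.
    return element_bp_in_genome / reference_length_bp


# Toy data (hypothetical): only r1 and r4 pass both filters.
example_hits = [
    Hit("r1", 400, 380, 95.0),
    Hit("r2", 450, 300, 99.0),  # aligned over less than 90 % of the read
    Hit("r3", 500, 480, 88.0),  # identity too low
    Hit("r4", 420, 420, 92.5),
]

copies = estimate_copies(example_hits, dataset_bp=1_000_000,
                         genome_size_bp=700_000_000,
                         reference_length_bp=10_000)
print(f"Estimated full-length copies: {copies:.1f}")
```

In the toy example, 800 bp of a 1 Mb dataset are retained (0.08 %), which extrapolates to 560 kb in a 700 Mb genome, or 56 copies of a 10 kb reference element.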

Interestingly, no clear relationship was found between the abundance of LTR retrotransposon super-families or lineages and the genome size variation. However, there is a clear relationship between the abundance of detected elements and the botanical classification of the Coffea species.

De novo detection of repeated sequences in Coffea

As no clear relationship could be established between the presence of LTR retrotransposons and the genome size variation in Coffea genomes, another type of repeated sequence may be involved. To test this, we estimated the global number of repeated sequences (excluding microsatellite sequences) presenting more than 20, 100, 500 and 1000 repeats and their proportion in each dataset (Supplemental data 5). Repeated sequences with a minimum of 20 copies represented between 54.4 (C. heterocalyx) and 45.6 % (C. canephora) of reads for Eucoffea, 46.3 and 41.8 % for Mozambicoffea, 43.4–33.3 % (C. humblotiana) for Mascarocoffea and 44.1 % for Paracoffea. A similar pattern was observed for repeated sequences with more than 100, 500 and 1000 copies. C. heterocalyx (863 Mb) and C. canephora (IF410; 700 Mb) are the two samples with the highest proportion of repeated sequences, while C. humblotiana, the smallest genome, has the lowest number of repetitions. Among Mascarocoffea, this percentage differs considerably between C. humblotiana (469 Mb) and C. dolichophylla (682 Mb). Interestingly, some species appear enriched with highly repeated sequences (>500 and >1000 copies), such as C. heterocalyx (10.8 % of sequences were repeated more than 500 times), while C. humblotiana and C. horsfieldiana contained very few highly repeated sequences (Supplemental data 5).

Microsatellites and genome size variation

Different types of microsatellites were identified and their cumulative length was represented on a histogram (Supplemental data 6). No large variation of the microsatellite content was observed among the species analyzed. Indeed,


the amount of microsatellite is higher in C. arabica, which is the allotetraploid species, but for the diploid species it doesn’t show any variation corresponding to the genome size, whatever the size of the microsatellite motif (Supplemental data 7).

The SIRE LTR retrotransposon lineage and Coffea species geographic distribution

As LTR retrotransposons represented a significant but variable part of Coffea genomes, we assessed their relationships from phylogenetic analysis based on their RT domains at the amino acid level. The tree obtained using 2,325 RT domains (with a minimum length of 150 amino acids) (Supplemental data 8) clearly shows an organization into lineages between the two super-families Gypsy and Copia. For each lineage, it was possible to observe a combination of RT domains from different botanical groups (Eucoffea, Mozambicoffea, Mascarocoffea and Paracoffea). However, one lineage, named SIRE, showed a specific pattern with an over-representation of RT sequences from Eucoffea and Mozambicoffea and very few from Mascarocoffea and Paracoffea. Of the 263 RT domains belonging to the SIRE lineage, five belong to the Indonesian Paracoffea species, and 21, 49 and 188 belong to Mascarocoffea, Mozambicoffea and Eucoffea, respectively. This observation suggests different dynamics of SIRE elements depending on the botanical group of the species. An in-depth study of this lineage was performed to confirm our observations.

SIRE LTR retrotransposons were identified, annotated and characterized in the C. canephora genome (Chaparro et al. 2015). After detailed analysis, a total of 85 full-length SIRE LTR retrotransposons were selected for further analyses. SIRE elements from this dataset showed strong similarities with the SIRE internal coding domains from the Gypsy 2 database, and they had no apparent large insertion. All these predicted SIRE elements showed an overall length around 9–10 kb, with an average LTR length of 1 kb. The internal regions of these sequences included a large open reading frame (ORF1) containing the consensus for the GAG, AP, INT, RT and RNaseH domains. Downstream of ORF1, an additional small ORF (ORF2) showing strong identities with the ENV domain of retroviruses was identified.

These 85 sequences were classified through phylogenetic analysis based on their LTR sequences, into three major clusters (A, B and C) composed of 17, 28 and 40 elements, respectively (Supplemental data 9). For each cluster, one full-length sequence (with highest percentage of LTR identity, and highest overall length) was used as a reference sequence for further analyses (the sequences were named 36-863, 3-942 and 6-1571 for A, B and C cluster, respectively).

The copy number of SIRE elements, estimated in the set of species analyzed here using the three reference SIRE sequences previously defined, showed a large variation between botanical groups (Supplemental data 10). The highest number was obtained for the Eucoffea with the exception of C. charrieriana, while Mascarocoffea and Paracoffea showed very few SIRE sequences. The Mozambicoffea showed a moderate number of SIRE copies, whose numbers ranged between that of Eucoffea and Mascarocoffea. To confirm these observations at the molecular level, we conducted a PCR amplification survey of LTR and/or ENV domains based on the three reference SIRE elements over a large panel of species (Supplemental data 1). Amplification products were obtained for nearly all the Eucoffea, while amplifications were obtained for few Mozambicoffea species and almost no amplifications were observed for the Mascarocoffea and Paracoffea (Fig. 3; Supplemental data 11).

Fig. 3  Geographical distribution of Coffea botanical groups and summary of SIRE PCR amplifications. The summary of SIRE PCR amplification is symbolized by the rate of PCR amplification in percentage, and the number of PCR assays performed in parenthesis. 18, 7, 30 and 4 species were used as DNA matrix for, respectively, Eucoffea, Mozambicoffea, Mascarocoffea and Paracoffea (Supplemental data 8, 9 and 10). [Map labels: Mozambicoffea, Eucoffea, Psilanthus, Mascarocoffea; amplification rates shown on the map: +++ 100 % (18), + 23.8 % (21), + 6.2 % (16), - 0 % (132)]

Discussion

The objective of this study was to investigate the transposable element composition of diploid and allotetraploid genomes from the Coffea genus. In some plant genomes, a clear relationship was established between the number of LTR retrotransposons and the variation of genome size (Piegu et al. 2006; Lee and Kim 2014). Considering a relatively short evolutionary divergence time of the Coffea genus [~11 MY; (Tosh et al. 2013)] and a significant variation of genome size observed among species (from 469 to 900 Mb), we focused our study on the identification and the characterization of repeated sequences and more particularly the LTR retrotransposons.

We used the 454 Junior apparatus to produce partial genome sequencing, representing genome coverage from 3.2 to 4.7 % for the allotetraploid C. arabica and 5.2–15.7 % for ten diploid Coffea species. Such a “454 whole genome snapshot” approach has recently been used in plant and animal genomes to study and compare their composition in transposable elements, with similar or even lower coverage (Wicker et al. 2009; Middleton et al. 2013; Sergeeva et al. 2014; Swaminathan et al. 2007; Pagan et al. 2012). No bias of genomic sampling for particular sequence type was noted when using the 454 sequencing procedure (Swaminathan et al. 2007). Indeed, using a relatively low genome coverage, only highly repeated transposable elements can be accurately studied and low-copy number repeated sequences will not be represented in our dataset (Macas et al. 2007). Although the 454 sequencing technology is becoming outdated, it generates long reads allowing an accurate identification of genes and transposable elements. Other approaches are now possible to study the transposable element composition and copy numbers using the Illumina platform, which provides shorter read length but an unrivaled genome coverage (Ramachandran and Hawkins 2016).

In this study, we confirmed that no bias was observed in the randomness of the sequencing when performing three repetitions of the same accession (C. arabica, Et39). So far, few studies have addressed TE abundance and dynamics among species from a single plant genus. Most of them were performed on annual plants, with the exception of the gymnosperm family (Nystedt et al. 2013). However, for perennial angiosperms, the dynamics and the evolutionary history of TE within a genus remain poorly studied.

TE composition reflects the divergence of the botanical groups

For the first time, we conducted a study on TE composition of the genome of eleven Coffea species. Our study was based on the analysis of 454 reads for (1) their similarities with known TE proteins in plants and against a library of TE annotated in the Coffea canephora genome; and, (2) an ab initio identification of repeated sequences.

We found that the most repeated order of transposable elements is the LTR retrotransposons, as found in the C. canephora genome and in most plant genomes (Lee and Kim 2014). At the TE amino acid level, as curated in Repbase, we found a similar percentage of TE between the generated C. canephora 454 reads (ranging from 12.2 to 13.5 %) and a recent and similar analysis of 131,412 BAC End Sequences (BES) from two C. canephora (DH200-94) BAC libraries [11.9 % (Dereeper et al. 2013)]. Surprisingly, the percentage of known TE coding sequences remains relatively stable whatever the botanical group, the species and the genome size. Interestingly, the only notable differences concerned C. dolichophylla and C. humblotiana, showing, respectively, 14.6 and 10.2 % of detected TE coding sequences. Considering the genome size difference (689 and 469 Mb), these species, which belong to the Mascarocoffea, may have undergone a different history of TE accumulation. This observation was confirmed at the nucleotide level using a C. canephora de novo library of TEs built with REPET.
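
As a rough illustration of the comparison made for C. dolichophylla and C. humblotiana, the read percentages can be converted into approximate amounts of TE coding sequence per genome. The sketch below is not part of the original analysis: it assumes that the fraction of 454 reads matching TE coding sequences approximates the fraction of the genome they occupy, and the function name `te_coding_mb` is hypothetical. Only the genome sizes and percentages come from the text.

```python
def te_coding_mb(genome_size_mb: float, te_read_percent: float) -> float:
    """Approximate Mb of TE coding sequence, assuming the read fraction
    equals the genome fraction (an assumption, not a measured value)."""
    return genome_size_mb * te_read_percent / 100.0


# Values reported in the text: (genome size in Mb, % of reads with TE coding hits)
species = {
    "C. dolichophylla": (689, 14.6),
    "C. humblotiana": (469, 10.2),
}

estimates = {
    name: te_coding_mb(size_mb, percent)
    for name, (size_mb, percent) in species.items()
}

for name, mb in estimates.items():
    print(f"{name}: ~{mb:.1f} Mb of detected TE coding sequence")

# Compare the difference in estimated TE coding sequence with the genome size gap
difference_mb = estimates["C. dolichophylla"] - estimates["C. humblotiana"]
genome_gap_mb = species["C. dolichophylla"][0] - species["C. humblotiana"][0]
share_percent = 100.0 * difference_mb / genome_gap_mb

print(f"Difference: ~{difference_mb:.1f} Mb of a {genome_gap_mb} Mb genome size gap "
      f"(~{share_percent:.0f} %)")
```

Under this assumption the two estimates come out near 101 and 48 Mb. Because only reads matching known TE coding domains are counted, non-coding parts of the elements such as LTRs are excluded, so these figures should be read as lower bounds on TE content rather than as measurements.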


Using a detailed classification of LTR-RT REPET consensuses, we also found that some lineages have varying distribution levels among Coffea species and botanical groups. For example, the Gypsy Del lineage, identified in higher abundance in African species, decreases from Eucoffea species (14–16 %), to Mozambicoffea (10–11 %), Mascarocoffea and Paracoffea (9–7 %). This suggests an overall increase of the Del LTR-RT westwards, from Indonesian and Malagasy Coffea species to eastern and western African species. Another LTR RT lineage, named SIRE (Copia super-family), was identified as being significantly numerous in African species (in 5 % of the 454 reads), but almost absent in Indonesian, Madagascan and Comorian species (~1 %); this observation was confirmed by PCR amplifications (Fig. 3). This indicates that the SIREs proliferated successfully in African species (in Mozambicoffea and especially in Eucoffea) while the copy number remained low, by lack of activity or elimination, in insular species.

These two examples of LTR-RT lineage variation, suggesting different histories of TE proliferation, reflect independent genome divergences between Coffea botanical groups. This result also suggests that geographical differentiation could be associated with independent niche colonization and speciation in Africa, Madagascar and Indonesia. Therefore, quantitative and qualitative TE composition might be used for performing phylogeny analysis and to reinforce a model for the evolution of plant species.

TE composition reflects a different evolution of species within the botanical groups

It is well established that plant genome sizes are directly linked with the proportion of transposable elements. A large amplification of a small number of LTR retrotransposon lineages may cause a dramatic and sudden genome size increase (Piegu et al. 2006). In our study, we found contrasted results between the genome size of Coffea species and their TE composition.

Little variation of TE composition was related to the genome size in Eucoffea, although genome size varies from 645 Mb for C. eugenioides to 863 Mb for C. heterocalyx (700 Mb for C. canephora). This suggests that no rapid proliferation of a few TE families was involved in this genome size difference. In particular, the TE proportion is almost identical between C. canephora and C. heterocalyx, with the exception of one Gypsy lineage named TAT, which varies from 0.9 % in C. canephora to 2.2 % in C. heterocalyx. However, this recent proliferation in C. heterocalyx cannot alone explain the genome size difference between the two species. We, therefore, propose that in Eucoffea the genome size variation would result from a differential accumulation of numerous transposable elements (mainly LTR RT) belonging to a large panel of families.

Similarly, no strong variation of microsatellite copy numbers was detected between species, suggesting that a rapid amplification of some of these simple sequence repeats was not the main mechanism involved in the C. heterocalyx genome size increase, as was observed in Lupinus (Martin et al. 2016). Our results are congruent with those of Pinus (Morse et al. 2009), Helianthus (Cavallini et al. 2010), and Lupinus (Martin et al. 2016), all three genera showing a large genome size variation (18–40, 3.2–12.3 and 0.97–2.4 Gb, respectively) but with no element contributing specifically to this variation.

In contrast, the Mascarocoffea species present larger variations of their TE composition. The strong contrast in TE content between C. dolichophylla and C. humblotiana is due to an increase/decrease of the amount of the Del LTR retrotransposon lineage (10 vs 7 %) and a smaller increase/decrease for the remaining LTR RT lineages. C. humblotiana has undergone little proliferation of LTR retrotransposons, explaining its small genome size (469 Mb), while C. dolichophylla has undergone proliferation of mainly Del and several other Copia and Gypsy LTR-RT lineages. The variation of repeated sequences between C. dolichophylla and C. humblotiana is also clear with the de novo analysis showing a clear increase/decrease in repeated sequences. Since the fully resolved phylogenetic analysis of Mascarocoffea is not yet available, the timescale of the LTR RT proliferation in C. dolichophylla cannot be estimated.

Altogether, our analysis demonstrated the power of sequencing at low coverage to study the transposable element composition of genomes at the genus scale for comparative structural genomics of non-model species. The C. humblotiana species represents an interesting genomic model whose genome is worth sequencing completely. This WGS will allow a better understanding of the mechanisms involved in the decrease or in the control of the proliferation of transposable elements in a genome.

Compliance with ethical standards

Conflict of interest All authors declare they have no conflict of interest.

Funding This research was supported by Agropolis Fondation through the “Investissement d’avenir” program (ANR-10-LABX-0001-01) under the reference ID 1002-009.

Ethical approval This article does not contain any studies with human or animals performed by any of the authors.

Data availability The project has been deposited at DDBJ/EMBL/GenBank BioProject ID PRJNA242989.


References

Alzohairy A, Sabir J, Gyulai G, Younis R, Jansen RK, Bahieldin A (2014) Environmental stress activation of plant long-terminal repeat retrotransposons. Funct Plant Biol 41:557–567
Bennetzen JL, Ma J, Devos KM (2005) Mechanisms of recent genome size variation in flowering plants. Ann Bot 95:127–132
Bremer B, Eriksson T (2009) Time tree of Rubiaceae: phylogeny and dating the family, subfamilies, and tribes. Int J Plant Sci 170:766–793
Bucher E, Reinders J, Mirouze M (2012) Epigenetic control of transposon transcription and mobility in Arabidopsis. Curr Opin Plant Biol 15:503–510
Carrier G, Santoni S, Rodier-Goud M, Canaguier A, Kochko A, Dubreuil-Tranchant C, This P, Boursiquot JM, Le Cunff L (2011) An efficient and rapid protocol for plant nuclear DNA preparation suitable for next generation sequencing methods. Am J Bot 98:e13–e15
Carrier G, Le Cunff L, Dereeper A, Legrand D, Sabot F, Bouchez O, Audeguin L, Boursiquot JM, This P (2012) Transposable elements are a major cause of somatic polymorphism in Vitis vinifera L. PLoS One 7:10
Casacuberta E, Gonzalez J (2013) The impact of transposable elements in environmental adaptation. Mol Ecol 22:1503–1517
Cavallini A, Natali L, Zuccolo A, Giordani T, Jurman I, Ferrillo V, Vitacolonna N, Sarri V, Cattonaro F, Ceccarelli M, Cionini PG, Morgante M (2010) Analysis of transposons and repeat composition of the sunflower (Helianthus annuus L.) genome. Theor Appl Genet 120:491–508
Chaparro C, Gayraud T, de Souza RF, Domingues DS, Akaffou S, Laforga Vanzela AL, Kochko A, Rigoreau M, Crouzillat D, Hamon S, Hamon P, Guyot R (2015) Terminal-repeat retrotransposons with GAG domain in plant genomes: a new testimony on the complex world of transposable elements. Genome Biol Evol 7:493–504
Chevalier A (1942) Les caféiers du globe II: Iconographie des caféiers sauvages et cultivés et des Rubiacées prises pour des caféiers. In: Lechevalier P (ed) Encyclopédie Biologique, Paris
Davis AP, Tosh J, Ruch N, Fay MF (2011) Growing coffee: Psilanthus (Rubiaceae) subsumed on the basis of molecular and morphological data; implications for the size, morphology, distribution and evolutionary history of Coffea. Bot J Linn Soc 167:357–377
Denoeud F, Carretero-Paulet L, Dereeper A, Droc G, Guyot R, Pietrella M, Zheng C, Alberti A, Anthony F, Aprea G, Aury JM, Bento P, Bernard M, Bocs S, Campa C, Cenci A, Combes MC, Crouzillat D, Da Silva C, Daddiego L, De Bellis F, Dussert S, Garsmeur O, Gayraud T, Guignon V, Jahn K, Jamilloux V, Joët T, Labadie K, Lan I, Leclercq J, Lepelley M, Leroy T, Li LT, Librado P, Lopez L, Muñoz A, Noel B, Pallavicini A, Perrotta G, Poncet V, Pot D, Priyono, Rigoreau M, Rouard M, Rozas J, Tranchant-Dubreuil C, VanBuren R, Zhang Q, Andrade AC, Argout X, Bertrand B, de Kochko A, Graziosi G, Henry RJ, Jayarama, Ming R, Nagai C, Rounsley S, Sankoff D, Giuliano G, Victor A, Albert V, Wincker P, Lashermes P (2014) The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science 345:1181–1184
Dereeper A, Guyot R, Tranchant-Dubreuil C, Anthony F, Argout X, de Bellis F, Combes MC, Gavory F, de Kochko A, Kudrna D, Leroy T, Poulain J, Rondeau M, Song X, Wing R, Lashermes P (2013) BAC-end sequences analysis provides first insights into coffee (Coffea canephora P.) genome composition and evolution. Plant Mol Biol 83:177–189
Dias ES, Hatt C, Hamon S, Hamon P, Rigoreau M, Crouzillat D, Carareto CM, De Kochko A, Guyot R (2015) Large distribution and high sequence identity of a Copia-type retrotransposon in angiosperm families. Plant Mol Biol 89:83–97
Dušková E, Kolář F, Sklenář P, Rauchová J, Kubešová M, Fér T, Suda J, Marhold K (2010) Genome size correlates with growth form, habitat and phylogeny in the Andean genus Lasiocephalus (Asteraceae). Preslia 82:127–148
Dvořák J (2009) Triticeae genome structure and evolution. In: Muehlbauer JG, Feuillet C (eds) Genetics and genomics of the Triticeae. Springer, New York, pp 685–711
Eilam T, Anikster Y, Millet E, Manisterski J, Sag-Assif O, Feldman M (2007) Genome size and genome evolution in diploid Triticeae species. Genome 50:1029–1037
Hawkins JS, Kim H, Nason JD, Wing RA, Wendel JF (2006) Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res 16:1252–1261
Hribova E, Neumann P, Matsumoto T, Roux N, Macas J, Dolezel J (2010) Repetitive part of the banana (Musa acuminata) genome investigated by low-depth 454 sequencing. BMC Plant Biol 10:204
Ibarra-Laclette E, Lyons E, Hernandez-Guzman G, Perez-Torres CA, Carretero-Paulet L, Chang T-H, Lan T, Welch AJ, Juarez MJA, Simpson J, Fernandez-Cortes A, Arteaga-Vazquez M, Gongora-Castillo E, Acevedo-Hernandez G, Schuster SC, Himmelbauer H, Minoche AE, Xu S, Lynch M, Oropeza-Aburto A, Cervantes-Perez SA, de Jesus Ortega-Estrada M, Cervantes-Luevano JI, Michael TP, Mockler T, Bryant D, Herrera-Estrella A, Albert VA, Herrera-Estrella L (2013) Architecture and evolution of a minute plant genome. Nature 498:94–98
Ito H (2013) Small RNAs and regulation of transposons in plants. Genes Genet Syst 88:3–7
Ito H, Kakutani T (2014) Control of transposable elements in Arabidopsis thaliana. Chromosome Res 22:217–223
Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 110:462–467
Kiehn M (1995) Chromosome survey of the Rubiaceae. Ann Mo Bot Gard 82:398–408
Kinoshita T, Seki M (2014) Epigenetic memory for stress response and adaptation in plants. Plant Cell Physiol 55:1859–1863
Knight CA, Beaulieu JM (2008) Genome size scaling through phenotype space. Ann Bot 101:759–766
Kohany O, Gentles AJ, Hankus L, Jurka J (2006) Annotation, submission and screening of repetitive elements in Repbase: Repbase Submitter and Censor. BMC Bioinf 7:474
Kumar A, Bennetzen JL (1999) Plant retrotransposons. Annu Rev Genet 33:479–532
Lee SI, Kim NS (2014) Transposable elements and genome size variations in plants. Genomics Inform 12:87–97
Lisch D (2013) How important are transposons for plant evolution? Nat Rev Genet 14:49–61
Llorens C, Munoz-Pomer A, Bernad L, Botella H, Moya A (2009) Network dynamics of eukaryotic LTR retroelements beyond phylogenetic trees. Biol Direct 4:41
Macas J, Neumann P, Navratilova A (2007) Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genom 8:427
Martin G, Paris A, Samar M, Keller J, Salmon A, Novak P, Macas J, Aïnouche A (2016) Dramatic lineage-specific accumulation of retrotransposons versus Simple Sequence Repeats across the last 10 million years in Mediterranean and African lupin genomes (Lupinus; Fabaceae). In: International Congress on Transposable elements, Saint Malo, France
Meyers BC, Tingey SV, Morgante M (2001) Abundance, distribution, and transcriptional activity of repetitive elements in the maize genome. Genome Res 11:1660–1676


Michael TP, Jackson S (2013) The first 50 plant genomes. Plant Genome 6:1–7
Middleton CP, Stein N, Keller B, Kilian B, Wicker T (2013) Comparative analysis of genome composition in Triticeae reveals strong variation in transposable element dynamics and nucleotide diversity. Plant J 73:347–356
Morse AM, Peterson DG, Islam-Faridi MN, Smith KE, Magbanua Z, Garcia SA, Kubisiak TL, Amerson HV, Carlson JE, Nelson CD, Davis JM (2009) Evolution of genome size and complexity in Pinus. PLoS One 4:e4332
Noirot M, Poncet V, Barre P, Hamon P, Hamon S, De Kochko A (2003) Genome size variations in diploid African Coffea species. Ann Bot (Lond) 92:709–714
Nystedt B, Street NR, Wetterbom A, Zuccolo A, Lin YC, Scofield DG, Vezzi F, Delhomme N, Giacomello S, Alexeyenko A, Vicedomini R, Sahlin K, Sherwood E, Elfstrand M, Gramzow L, Holmberg K, Hallman J, Keech O, Klasson L, Koriabine M, Kucukoglu M, Kaller M, Luthman J, Lysholm F, Niittyla T, Olson A, Rilakovic N, Ritland C, Rossello JA, Sena J, Svensson T, Talavera-Lopez C, Theissen G, Tuominen H, Vanneste K, Wu ZQ, Zhang B, Zerbe P, Arvestad L, Bhalerao R, Bohlmann J, Bousquet J, Garcia Gil R, Hvidsten TR, de Jong P, MacKay J, Morgante M, Ritland K, Sundberg B, Thompson SL, Van de Peer Y, Andersson B, Nilsson O, Ingvarsson PK, Lundeberg J, Jansson S (2013) The Norway spruce genome sequence and conifer genome evolution. Nature 497:579–584
Pagan HJ, Macas J, Novak P, McCulloch ES, Stevens RD, Ray DA (2012) Survey sequencing reveals elevated DNA transposon activity, novel elements, and variation in repetitive landscapes among vesper bats. Genome Biol Evol 4:575–585
Piegu B, Guyot R, Picault N, Roulin A, Saniyal A, Kim H, Collura K, Brar DS, Jackson S, Wing RA, Panaud O (2006) Doubling genome size without polyploidization: dynamics of retrotransposition-driven genomic expansions in Oryza australiensis, a wild relative of rice. Genome Res 16:1262–1269
Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21:i351–i358
Ramachandran D, Hawkins JS (2016) Methods for accurate quantification of LTR-retrotransposon copy number using short-read sequence data: a case study in Sorghum. Mol Genet Genomics
Razafinarivo N, Rakotomalala JJ, Brown SC, Bourge M, Hamon S, De Kochko A, Poncet V, Dubreuil-Tranchant C, Couturon E, Guyot R, Hamon P (2012) Geographical gradients in the genome size variation of wild coffee trees (Coffea) native to Africa and Indian Ocean islands. Tree Genet Genomes 8:1345–1358
Razafinarivo NJ, Guyot R, Davis AP, Couturon E, Hamon S, Crouzillat D, Rigoreau M, Dubreuil-Tranchant C, Poncet V, De Kochko A, Rakotomalala JJ, Hamon P (2013) Genetic structure and diversity of coffee (Coffea) across Africa and the Indian Ocean islands revealed using microsatellites. Ann Bot 111:229–248
Renny-Byfield S, Chester M, Kovarik A, Le Comber SC, Grandbastien M-A, Deloger M, Nichols RA, Macas J, Novak P, Chase MW, Leitch AR (2011) Next generation sequencing reveals genome downsizing in allotetraploid Nicotiana tabacum, predominantly through the elimination of paternally derived repetitive DNAs. Mol Biol Evol 28:2843–2854
Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B (2000) Artemis: sequence visualization and annotation. Bioinformatics 16:944–945
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20:43–45
Schmieder R, Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27:863–864
Schulman AH, Gupta PK, Varshney RK (2004) Organization of retrotransposons and microsatellites in cereal genomes. In: Gupta PK, Varshney VR (eds) Cereal genomics. Kluwer Academic, Dordrecht, pp 83–118
Sergeeva EM, Afonnikov DA, Koltunova MK, Gusev VD, Miroshnichenko LA, Vrána J, Kubaláková M, Poncet C, Sourdille P, Feuillet C, Doležel J, Salina EA (2014) Common wheat chromosome 5B composition analysis using low-coverage 454 sequencing. Plant Genome 7:1–16
Slovak M, Vit P, Urfus T, Suda J (2009) Complex pattern of genome size variation in a polymorphic member of the Asteraceae. J Biogeogr 36:372–384
Sonnhammer ELL, Durbin R (1995) A dot-matrix program with dynamic threshold control suited for genomic DNA and protein sequence analysis (reprinted from Gene Combis, vol 167, pg GC1-GC10, 1995). Gene 167:GC1–GC10
Stoffelen P, Noirot M, Couturon E, Anthony F (2008) A new caffeine-free coffee from Cameroon. Bot J Linn Soc 158:67–72
Swaminathan K, Varala K, Hudson ME (2007) Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey. BMC Genom 8:132
Todorovska E (2007) Retrotransposons and their role in plant-Genome evolution. Biotechnol Biotechnol Equip 21:294–305
Tosh J, Dessein S, Buerki S, Groeninckx I, Mouly A, Bremer B, Smets EF, De Block P (2013) Evolutionary history of the Afro-Madagascan Ixora species (Rubiaceae): species diversification and distribution of key morphological traits inferred from dated molecular phylogenetic trees. Ann Bot 112:1723–1742
Wicker T, Keller B (2007) Genome-wide comparative analysis of copia retrotransposons in Triticeae, rice, and Arabidopsis reveals conserved ancient evolutionary lineages and distinct dynamics of individual copia families. Genome Res 17:1072–1081
Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, Paux E, SanMiguel P, Schulman AH (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982
Wicker T, Taudien S, Houben A, Keller B, Graner A, Platzer M, Stein N (2009) A whole-genome snapshot of 454 sequences exposes the composition of the barley genome and provides evidence for parallel evolution of genome size in wheat and barley. Plant J 59:712–722
Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35:W265–W268
