Genome-wide evolutionary analysis of the noncoding RNA genes and noncoding DNA of Paramecium tetraurelia

  1. Chun-Long Chen1,2,3,4,
  2. Hui Zhou3,
  3. Jian-You Liao3,
  4. Liang-Hu Qu3 and
  5. Laurence Amar1,2
  1. 1Institut de Biologie Animale Intégrative et Cellulaire, Université Paris Sud, Orsay 91405, France
  2. 2Centre National de la Recherche Scientifique (CNRS), UMR 8080, Orsay 91405, France
  3. 3Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou, 510275, People's Republic of China

    Abstract

    The compact genome of the unicellular eukaryote Paramecium tetraurelia contains noncoding DNA (ncDNA) distributed into >39,000 intergenic sequences and >90,000 introns, averaging 390 base pairs (bp) and 25 bp, respectively. Here we analyzed the molecular features of the ncRNA genes, introns, and intergenic sequences of this genome. We mainly used computational programs and comparative genomics, the latter being possible because the P. tetraurelia genome formed through whole-genome duplications (WGDs). We characterized 417 putative 5S rRNA, snRNA, snoRNA, SRP RNA, and tRNA genes, 415 of which map within intergenic sequences and two within introns. The evolution of these ncRNA genes appears to have mainly involved purifying selection and gene deletion. We then compared the introns that interrupt the protein-coding gene duplicates that arose from the recent WGD and identified a population of a few thousand introns that have evolved under the most stringent constraints (>95% identity). We also showed that low nucleotide substitution levels characterize the 50 and 80–115 base pairs flanking, respectively, the stop and start codons of the protein-coding genes. Lower substitution levels mark the base pairs flanking highly transcribed genes, and those flanking the start codons of genes belonging to sets with a high number of WGD-related sequences. Finally, adjacent to protein-coding genes, we characterized 32 DNA motifs able to encode stable and evolutionarily conserved RNA secondary structures, which define putative expression-controlling elements. Fourteen DNA motifs with similar properties map far from protein-coding genes and may encode regulatory ncRNAs.

    Footnotes

    • 4 Present address: Centre National de la Recherche Scientifique (CNRS), UPR 2167, CGM, Gif sur Yvette, 91198, France.

    • Reprint requests to: Laurence Amar, Institut de Biologie Animale Intégrative et Cellulaire, Université Paris Sud, Orsay 91405, France; e-mail: laurence.amar{at}u-psud.fr; fax: 33 1 69 15 49 49; or Liang-Hu Qu, Key Laboratory of Gene Engineering of the Ministry of Education, State Key Laboratory for Biocontrol, Zhongshan University, Guangzhou, 510275, People's Republic of China; e-mail: lssqlh{at}mail.sysu.edu.cn.

    • Article published online ahead of print. Article and publication date are at http://www.rnajournal.org/cgi/doi/10.1261/rna.1306009.

      • Received August 8, 2008.
      • Accepted December 23, 2008.