Generic eukaryotic core promoter prediction using structural features of DNA

Thomas Abeel (1,2), Yvan Saeys (1,2), Eric Bonnet (1,2), Pierre Rouzé (1,2,3), and Yves Van de Peer (1,2,4)

  1. Department of Plant Systems Biology, Flanders Institute for Biotechnology (VIB), 9052 Gent, Belgium
  2. Department of Molecular Genetics, Ghent University, 9052 Gent, Belgium
  3. Laboratoire Associé de l’INRA (France), Ghent University, 9052 Gent, Belgium

Abstract

Despite many recent efforts, in silico identification of promoter regions is still in its infancy. However, the accurate identification and delineation of promoter regions are important for several reasons, such as improving genome annotation and devising experiments to study and understand transcriptional regulation. Current methods to identify the core region of promoters require large amounts of high-quality training data and often behave like black-box models whose predictions are difficult to interpret. Here, we present a novel approach for predicting promoters in whole-genome sequences by using large-scale structural properties of DNA. Our technique requires no training, is applicable to many eukaryotic genomes, and performs extremely well in comparison with the best available promoter prediction programs. Moreover, it is fast, simple in design, and has no size constraints, and the results are easily interpretable. We compared our approach with 14 current state-of-the-art implementations using human gene and transcription start site data and analyzed the ENCODE region in more detail. We also validated our method on 12 additional eukaryotic genomes, including vertebrates, invertebrates, plants, fungi, and protists.
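
To make the idea of a training-free, structure-based scan more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that a structural property can be expressed as one numeric score per dinucleotide, that the scores are averaged over a sliding window, and that windows above a fixed threshold are reported as candidates. The function names, the window size, the threshold, and the toy scale (which simply counts G/C bases and is not a measured physical property) are all hypothetical.

```python
from typing import Dict, List

# Toy placeholder scale (an assumption, NOT measured data): each dinucleotide
# scores the number of G/C bases it contains. A real analysis would substitute
# a published structural scale here.
TOY_SCALE: Dict[str, float] = {
    a + b: float((a in "GC") + (b in "GC")) for a in "ACGT" for b in "ACGT"
}


def structural_profile(seq: str, scale: Dict[str, float]) -> List[float]:
    """Convert a sequence into one score per overlapping dinucleotide."""
    seq = seq.upper()
    # Dinucleotides missing from the scale (e.g. containing N) score 0.0.
    return [scale.get(seq[i:i + 2], 0.0) for i in range(len(seq) - 1)]


def smooth(profile: List[float], window: int) -> List[float]:
    """Moving average; returns len(profile) - window + 1 values."""
    return [
        sum(profile[i:i + window]) / window
        for i in range(len(profile) - window + 1)
    ]


def candidate_positions(
    seq: str, scale: Dict[str, float], window: int, threshold: float
) -> List[int]:
    """Start indices of windows whose mean score reaches the threshold."""
    smoothed = smooth(structural_profile(seq, scale), window)
    return [i for i, value in enumerate(smoothed) if value >= threshold]


if __name__ == "__main__":
    example = "ATATATATGCGCGCGCATATATAT"
    print(candidate_positions(example, TOY_SCALE, window=4, threshold=1.5))
```

Because nothing in this sketch is fitted to data, every reported position can be traced back to the raw profile values in its window, which is one sense in which such a method is easy to interpret.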
