
Nucleic Acids Research, 2005, Vol. 33, No. 10, 3165–3175
doi:10.1093/nar/gki627

Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes
Aditi Kanhere and Manju Bansal*

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India

Received February 23, 2005; Revised April 21, 2005; Accepted May 13, 2005

ABSTRACT

During the process of transcription, RNA polymerase can exactly locate a promoter sequence in the complex maze of a genome. Several experimental studies and computational analyses have shown that the promoter sequences apparently possess some special properties, such as unusual DNA structures and low stability, which make them distinct from the rest of the genome. But most of these studies have been carried out on a particular set of promoter sequences or on promoter sequences from similar organisms. To examine whether the promoters from a wide variety of organisms share these special properties, we have carried out an analysis of sets of promoters from bacteria, vertebrates and plants. These promoters were analyzed with respect to the prediction of three different properties, namely DNA curvature, bendability and stability, which are relevant to transcription. All the promoter sequences are predicted to share certain features, such as stability and bendability profiles, but there are significant differences in DNA curvature profiles and nucleotide composition between the different organisms. These similarities and differences are correlated with some of the known facts about the transcription process in the promoters from the three groups of organisms.

INTRODUCTION

The process of transcription begins with the RNA polymerase (RNAP) binding to DNA in the promoter region, which is in the immediate vicinity of the transcription start site (TSS). Exactly how RNAP locates this specific binding site in the large excess of non-promoter DNA remains a field of intense investigation. A typical promoter sequence is thought to comprise some sequence motifs positioned at specific sites relative to the TSS. For example, a prokaryotic promoter is observed to have two hexameric motifs centered at or near −10 and −35 positions relative to the TSS (1). The structure of eukaryotic promoters is generally more complex and they have several different sequence motifs, such as TATA box, INR box, BRE, CCAAT-box and GC-box (2). These sequence motifs were identified based on the analysis of a large number of promoters and they represent consensus sequences. In other words, each nucleotide in the consensus sequence motif represents the most frequently occurring nucleotide at that position and does not represent an actual sequence. It has been observed that a wide variety of sequences similar to these representative motifs are present in promoters. In fact, there are very few promoter sequences that exactly match the consensus sequence, and also each of these sequence motifs is found in only a few of the promoter sequences. In addition, because these sequence motifs comprise only 6–10 bp and are degenerate, the probability of finding similar sequences in regions other than promoters is quite high. Hence, it is difficult to believe that these sequence motifs alone are wholly responsible for RNAP–promoter interaction. It is possible that base sequences in the neighborhood of these specific motifs may also be involved in the identification process, and it is highly likely that, in addition to the actual sequence itself, the second-order properties of the promoter sequence can also play a role in transcriptional regulation. Experimental evidence indeed suggests that sequence-dependent secondary properties of promoters are important in their function. Three such properties that are often involved are stability, curvature and bendability of DNA in these promoter regions.

An important step during transcription is the open complex formation between RNAP and promoter sequence, which involves local separation of the two strands around the −10 region (3–8). The transcription process takes place under conditions in which DNA melting is a thermodynamically unfavorable process and yet during open complex formation the two strands separate without the help of any external energy. It is thought that the low stability of the promoter region may assist in initial melting (9–12).

Another property, often associated with upstream sequences, is the occurrence of unusual DNA structures, such as curved DNA, which can be defined as a double-stranded DNA with a curved helical axis. A number of

*To whom correspondence should be addressed. Tel: +91 80 2293 2534; Fax: +91 80 2360 0535; Email: [email protected]

© The Author 2005. Published by Oxford University Press. All rights reserved.

The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access
version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press
are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but
only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact [email protected]

examples, from eukaryotes and prokaryotes, have shown that many genes have curved regions upstream of the TSS (13–28). The experimental data also indicate a correlation between promoter functioning and sequence-dependent DNA curvature (16–20). Many computational studies also predict the presence of curved DNA regions in the promoters (29–33).

Depending on its sequence, curvature can be an inherent property of a DNA molecule or it can be induced by external factors, such as protein binding. Some DNA sequences, while being intrinsically straight, can readily undergo distortion, and hence bendability of a DNA molecule can be defined as the ease with which the molecule can be made to curve in any direction. It is known that DNA bendability is important for binding of transcription factors, such as TBP (34) and CAP (35). Many other transcription factors also facilitate the adoption of curved conformations by DNA molecules (36,37). In addition, there is compelling experimental evidence, which suggests that promoter DNA wraps around the RNA polymerase (22,38,39). Hence, it is essential to have a better understanding of DNA bendability in promoter regions.

We have, therefore, analyzed sets of promoters from different organisms with respect to the above three properties, namely their predicted stability, curvature and bendability. Although promoters from different origins share certain features, such as stability and bendability profiles, we also see significant differences in their curvature and nucleotide composition. With the availability of a large number of genome sequences, the task of gene identification has assumed more significance. The characterization of these structural properties, in addition to sequence motifs, can greatly help in improving the currently available promoter and gene prediction algorithms (40,41).

METHODS

Promoter sequence sets

All the promoter sequences used in this study are 1000 nt long, starting from 500 nt upstream (position −500) and extending up to 500 nt downstream (position +500) of the TSS. In order to avoid having multiple TSSs in a given 1000 nt sequence, we have excluded all the TSSs that are <500 nt apart. Our promoter set has 227 Escherichia coli promoters, 89 Bacillus subtilis promoters, 252 vertebrate promoters and 74 plant promoters.

E.coli promoter sequences. The E.coli promoters were taken from the PromEC dataset (42), which provides a compilation of 471 experimentally identified transcriptional start sites. As mentioned above, after excluding all the TSSs that are <500 nt apart, the dataset contains 227 promoters. With the help of TSS information, promoter sequences were extracted from the E.coli genome sequence (NCBI accession no: NC_000913).

B.subtilis promoter sequences. The TSSs for B.subtilis promoters were obtained from the DBTBS database (43). Sequences of the required length around the TSSs were extracted from the Bacillus genome sequence (NCBI accession no: NC_000964). The DBTBS dataset has 97 Bacillus promoters with experimentally identified start sites. Out of the 97 Bacillus promoters, 89 promoters were selected after excluding all the TSSs that are <500 nt apart.

Eukaryotic promoter sequences. The vertebrate and plant POL II promoter sequences were extracted from the Eukaryotic Promoter Database (EPD) (44,45). The EPD dataset has 2540 vertebrate promoters and 198 plant promoters. Only those promoters that have a single initiation site and <50% sequence similarity in the region between −79 and +20 positions (designated as +S in the FP line of EPD entry) were selected in the first round of screening (669 vertebrate and 124 plant promoters). In the second round of screening, only those sequences that extend 500 bp upstream and 500 bp downstream of the TSS were retained. Finally, 252 vertebrate promoters and 74 plant promoters were used for this study.

Shuffled sequences (control set). Each 1000 bp sequence in the four datasets was divided with respect to TSS into 500 bp upstream and 500 bp downstream regions. Each region was shuffled separately, such that its mononucleotide composition was maintained. Thus, shuffled sequences have the same nucleotide composition as the actual promoters, but not the characteristic sequence patterns, if any. This procedure was repeated five times, to produce different shuffled sequences, corresponding to each upstream and downstream region. The stability, bendability and curvature calculations were carried out (as described below) on these shuffled sequences, and the mean values for the five shuffled sequences are used for comparison with the original genome sequences.

Free energy calculation

The stability of a DNA molecule can be expressed in terms of free energy. The stability of DNA depends on mononucleotide composition as well as dinucleotide composition and it is possible to predict the stability of a DNA duplex from its sequence if one knows the contribution of each nearest-neighbor interaction (46–48). The standard free energy change ($\Delta G^0_{37}$) corresponding to the melting transition of an 'n' nucleotide (or 'n − 1' dinucleotides) long DNA molecule, from double strand to single strand, is calculated as follows (46):

$$\Delta G^0 = -\left(\Delta G^0_{ini} + \Delta G^0_{sym}\right) + \sum_{i=1}^{n-1} \Delta G^0_{i,i+1}$$

where,

$\Delta G^0_{ini}$ is the initiation free energy for dinucleotide of type ij.
$\Delta G^0_{sym}$ equals +0.43 kcal/mol and is applicable if the duplex is self-complementary.
$\Delta G^0_{i,i+1}$ is the standard free energy change for the dinucleotide (of type ij) at positions i and i+1.

Because our analysis involves long continuous stretches of DNA molecules, in our calculation we did not consider the two terms, $\Delta G^0_{ini}$ and $\Delta G^0_{sym}$, which are more relevant for short oligonucleotides. In the present calculation, each promoter sequence is divided into overlapping windows of 15 bp (or 14 dinucleotide steps) and for each window, the free energy is calculated as given in the above equation. The energy values corresponding to the 10 unique dinucleotide sequences are taken from the unified parameters proposed recently (47,48).

Curvature prediction

All the curvature calculations on the promoter sequences studied in this analysis were carried out with the help of in-house

software NUCGEN (49). Our earlier analysis showed that a set of dinucleotide parameters (CS) based on crystal structure data of oligonucleotides (50,51) can correctly predict the curvature of synthetic and genomic DNA sequences. Hence, the CS parameters were used for the DNA structure generation. Additional analysis (A. Kanhere and M. Bansal, unpublished data) also showed that for a reliable curvature prediction, the window size should be at least 50 bp. Hence, we chose a window size of 75 bp for all the curvature calculations. This not only allowed us to make a more reliable estimation of curvature, but also helped to reduce the noise. Thus, for a promoter sequence of length 'n' and with a window size 'w' = 75 bp, we obtained (n − w + 1) DNA fragments. The curvature of the predicted structure for each of these fragments was calculated in terms of (i) radius of curvature (LSC), (ii) ratio of maximum component (Imax) to minimum component (Imin) of moments of inertia (Imax/Imin) and (iii) ratio of end-to-end distance 'd' to the contour length 'lmax' along the path traced by the DNA molecule (d/lmax). Because similar trends were observed for all three parameters, only the parameter d/lmax is discussed in detail.

DNase I and nucleosomal positioning preference of DNA sequences

Two different trinucleotide models, based on nucleosomal positioning preferences (52) and DNase I sensitivity (53), have been suggested for bendability predictions of DNA sequences. We followed the procedure used previously for the analysis of a set of human promoter sequences (54), whereby the bendability profiles are calculated by looking up the values of trinucleotide parameters corresponding to each consecutive overlapping trinucleotide in the sequence.

The purpose of this study is to analyze general characteristics of each set of promoter sequences. Hence, an average profile is obtained for each group of promoters, by taking the mean value at each position, over all the promoter sequences in any given group. For this purpose, all the sequences were aligned such that all the TSSs are in an identical position, one below the other, and no gaps were introduced in order to maximize the sequence similarity. The mean and standard errors were calculated by the bootstrap method using 100 runs. The average properties were compared with the corresponding properties of shuffled sequences.

RESULTS

The promoter sequence dataset used in this study comprises only experimentally proven TSSs from different organisms, ranging from prokaryotes to eukaryotes. The promoter sequences of prokaryotes belong to two classes: those from E.coli, a well-studied bacterium of the gram-negative class, and those from B.subtilis, a representative of the gram-positive bacterial class. Eukaryotic promoters are also grouped into two classes, depending on whether they are from vertebrates or from plants. The choice of our dataset permits us to compare the properties of promoters from different classes of organisms and find out the similarities and differences among them. Another important feature of the present analysis is the comparison of the properties of promoter sequences with calculations on the shuffled sequences, which have the same nucleotide composition as the actual promoters but lack their sequence patterns, if any. The properties of shuffled sequences thus provide a baseline for comparative analysis of the actual promoter sequences.

Promoter sequences are less stable than coding sequences

It is well known that DNA stability depends primarily on the sum of the interactions between the constituent dinucleotides (46). The mean stability profiles of different groups of promoter sequences were calculated based on this principle and are shown in Figure 1A–D. The most striking feature across all groups of promoters is the absence of any strong features in the shuffled sequences. Another prominent feature of the analysis (Figure 1A–D) is the difference in stabilities between the upstream and downstream regions. In all four groups of promoter sequences, the average stability of the upstream region is predicted to be lower than the average stability of the downstream region. The lower stability of the upstream region probably arises owing to the higher AT content in this region (Table 1).

The bacterial promoter sequences (Figure 1C and D) show lowered stability around the −10 region, while the eukaryotic promoter sequences (Figure 1A and B) show a peak lying between the −25 and −35 positions. The slight shift in the peak in eukaryotic promoter sequences as compared with prokaryotic sequences also suggests that the peak corresponds to the −10 promoter element in bacteria and to the TATA box (at −30 position) in the eukaryotic promoter sequences. This peak vanishes in the case of the TATA-less promoters in plants (data not shown) as well as in the case of shuffled sequences, thus confirming that the peak is owing to the characteristic TATA box sequence in this region. A similar stability calculation on E.coli promoter sequences, using a slightly smaller window size and different free energy parameters, had also reported a low stability peak around the −10 region (11). Our analysis on a diverse set of promoters confirms the universal nature of this characteristic peak.

Curvature prediction for promoter sequences

It is found that even in the absence of any external force, some DNA molecules can adopt a stable curved structure. The presence of such intrinsically curved DNA, upstream of promoter sequences, has been shown experimentally for eukaryotic and prokaryotic systems (13–28). To examine whether the presence of such altered DNA structure can be predicted from the promoter sequences, we obtained curvature profiles for each group of promoters. The d/lmax profiles for all the four groups of promoters are shown in Figure 2A–D. Pronounced curvature is predicted for DNA regions in the vicinity of TSS of both the sets of bacterial promoter sequences (Figure 2C and D). An additional curved region around the −300 position is predicted for the B.subtilis promoters. In the same region, i.e. around the −300 position, a curved region is predicted in both plant and vertebrate promoters (Figure 2A and B). The magnitude of the mean curvature predicted (around the −300 position) for the vertebrate promoters is much smaller when compared with bacterial and plant promoters.
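To illustrate the d/lmax measure behind the curvature profiles, here is a minimal Python sketch that computes the ratio of end-to-end distance to contour length for a path given as 3D points. It assumes the helical-axis coordinates of each 75 bp fragment are already available (in the study they come from NUCGEN-generated structures, which are not reproduced here). The function name `d_over_lmax` and the two example paths are illustrative and not taken from the original work.

```python
import math

def d_over_lmax(points):
    """Ratio of end-to-end distance d to contour length lmax for a 3D path.

    Values close to 1 correspond to a nearly straight path; smaller values
    indicate a more strongly curved path.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Contour length: sum of distances between consecutive points.
    contour = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if contour == 0:
        raise ValueError("path has zero length")
    return math.dist(points[0], points[-1]) / contour

# Hypothetical test paths (not real DNA axes):
# a straight 75-point path with a 3.4 unit rise per step ...
straight = [(0.0, 0.0, 3.4 * i) for i in range(75)]
# ... and a half circle of unit radius sampled at 201 points.
half_circle = [
    (math.cos(math.pi * i / 200), math.sin(math.pi * i / 200), 0.0)
    for i in range(201)
]

print(round(d_over_lmax(straight), 3))
print(round(d_over_lmax(half_circle), 3))
```

The straight path gives a ratio of 1 and the half circle gives roughly 2/π, which matches the convention in Figure 2 that smaller d/lmax values indicate higher curvature.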

Figure 1. Distribution of free energy of duplex formation, near the TSSs. The figure shows the average free energy profiles (black) with respect to the relative
base position (x-axis), in the case of (A) vertebrate (B) plant (C) E.coli and (D) B.subtilis promoters. More negative values indicate greater stability (indicated by
black arrow on the top right hand corner of the figure). The profiles in this, and in subsequent Figures 2–4, extend from 500 nt upstream to 500 nt downstream of
TSS (shown as dashed vertical line at 0 position). The profiles calculated for the shuffled sequences in the upstream and downstream regions are shown (gray) in
each case.
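The free energy profiles of Figure 1 are built from a sliding-window sum of nearest-neighbor terms, with the initiation and symmetry terms omitted. The Python sketch below shows one way such a profile might be computed. The parameter values are an assumption: they are quoted from memory of the unified nearest-neighbor set and should be checked against references (47,48) before any real use. The function names are hypothetical, and sequences containing characters other than A, C, G and T are not handled.

```python
# Nearest-neighbor free energies (kcal/mol) for the 10 unique dinucleotide steps.
# ASSUMPTION: illustrative values, to be verified against refs (47,48).
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def step_energy(step):
    """Energy of a dinucleotide step, using its reverse complement if needed."""
    if step in NN_DG:
        return NN_DG[step]
    return NN_DG[step.translate(_COMPLEMENT)[::-1]]

def free_energy_profile(seq, window=15):
    """Sum of nearest-neighbor energies in each overlapping window.

    A window of 15 bp contains 14 dinucleotide steps, so a sequence of
    length n yields n - window + 1 values.
    """
    seq = seq.upper()
    steps = [step_energy(seq[i:i + 2]) for i in range(len(seq) - 1)]
    steps_per_window = window - 1
    return [
        sum(steps[i:i + steps_per_window])
        for i in range(len(steps) - steps_per_window + 1)
    ]

# Hypothetical example: an AT-rich stretch followed by a GC-rich stretch.
profile = free_energy_profile("ATATATATATATATAT" + "GCGCGCGCGCGCGCGC")
print(len(profile), round(profile[0], 2), round(profile[-1], 2))
```

With these values the GC-rich end of the example gives a more negative (more stable) window sum than the AT-rich end, in line with the sign convention of Figure 1.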

Table 1. The average frequency of mononucleotides A+T in different groups of promoter sequences

Group of promoters | Complete genome | Present dataset: upstream (−500 to TSS) | Present dataset: downstream (TSS to +500) | Present dataset: upstream (−150 to −50) | Present dataset: downstream (+100 to +200)
Vertebrate | - | 0.47 (0.11) | 0.43 (0.11) | 0.44 (0.14) | 0.40 (0.13)
Plant | - | 0.63 (0.09) | 0.54 (0.12) | 0.58 (0.13) | 0.50 (0.13)
E.coli | 0.49 | 0.53 (0.06) | 0.49 (0.04) | 0.56 (0.08) | 0.49 (0.06)
B.subtilis | 0.56 | 0.60 (0.04) | 0.57 (0.04) | 0.60 (0.08) | 0.59 (0.06)

The standard deviation values are given in parentheses.
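As a rough sketch of how A+T frequencies like those in Table 1 could be obtained, the Python fragment below computes the A+T fraction of a region defined relative to the TSS and summarizes it over a set of sequences. Several details are assumptions rather than statements of the original procedure: the TSS is taken to sit at index 500 of each 1000 nt sequence, region ends are treated as exclusive, and the sample standard deviation is used. All names and the toy sequences are hypothetical.

```python
from statistics import mean, stdev

def at_fraction(seq):
    """Fraction of A and T nucleotides in a sequence."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)

def region(seq, tss_index, start, end):
    """Slice from `start` to `end` (exclusive) relative to the TSS.

    Negative offsets are upstream, positive offsets are downstream.
    """
    return seq[tss_index + start:tss_index + end]

def at_summary(seqs, start, end, tss_index=500):
    """Mean and sample standard deviation of A+T fraction over a region."""
    values = [at_fraction(region(s, tss_index, start, end)) for s in seqs]
    return mean(values), stdev(values)

# Toy sequences (not real promoters): AT-only upstream, GC-only downstream.
toy = ["A" * 500 + "G" * 500, "AT" * 250 + "GC" * 250]
print(at_summary(toy, -150, -50))
print(at_summary(toy, 100, 200))
```

For real data the same two calls would be applied to each group of promoters, once for the upstream window and once for the downstream window.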

Bendability prediction for promoter sequences

As mentioned above, bendability plays an important role in gene expression. Hence we predicted bendability profiles for different classes of promoters. The bendability profiles obtained using the nucleosomal positioning preference (Figure 3A–D) and DNase I-based bendability measure (Figure 4A–D) match well. Observation of the two profiles, for all four groups of promoters (Figures 3 and 4), reveals a common pattern in the bendability of the upstream and downstream region, i.e. the upstream promoter region is predicted to be less bendable than the downstream coding region. Although a similar characteristic bendability pattern is seen for all four types of promoters, in the case of vertebrate promoters the difference is less prominent as compared with that in the other three groups of promoters. The difference between the predicted bendability of the shuffled and genomic sequences is also less significant in the case of vertebrate promoters.

Compositional analysis of promoter sequences

The characteristic differences observed in the upstream and downstream regions can be a consequence of base composition. Hence, we compared the composition of upstream and downstream regions of promoter sequences in terms of the mononucleotide, dinucleotide and trinucleotide frequencies. For this calculation, we considered 100 nt fragments in the upstream region (−150 to −50 position) and downstream region (+100 to +200 position). These regions were selected

because they are the best representatives of upstream and downstream regions. We avoided the middle region encompassing the low stability peak corresponding to the TATA box, known to have a composition biased toward high T+A content. All the groups of promoters show a high occurrence of AA, AT, TA and TT dinucleotides in the upstream regions as compared with the downstream region, while the dinucleotides GC, GG, CG, CC, AG, GA and TG are over-represented in the downstream region as compared with the upstream region (Figure 5). Interestingly, the trinucleotides AAA, TTT, AAT, ATT, ATA, TTA, TAA, TTC and TCA containing the above identified dinucleotides are generally over-represented in the upstream region, while the trinucleotides CCG, CGG, GCC, GGC, AGC, GAG, CAG, GTG, TGC, TGG, CTG and GCT are over-represented in the downstream region (Figure 6). This is also reflected in the mononucleotide composition of all the promoters, the upstream region being more A+T-rich than the downstream region (Table 1).

Figure 2. Distribution of curvature around TSSs. The figure shows the average predicted curvature (d/lmax) profile (black) against the relative base position (x-axis), in the case of (A) vertebrate (B) plant (C) E.coli and (D) B.subtilis promoters. Smaller values indicate higher curvature (indicated by black arrow on the top right hand corner of the figure).

The calculation of dinucleotide and trinucleotide frequencies along the sequence length also shows sharp transitions near the TSS. Though compositional differences are seen in the upstream and downstream region of all the four groups of promoters, the magnitude of the difference varies between the different groups. The plant promoters and E.coli promoters (Figure 5B and C) are quite distinct in showing very large differences in the dinucleotide composition, between upstream and downstream region as compared with the other two promoter groups. The Bacillus and plant sequences are AT-rich while vertebrate sequences are GC-rich (Table 1). The dinucleotide frequencies of the vertebrate and Bacillus sequences (Figure 5A and D) span a wide range, extending from the lower (3%) to the higher (12%) end of the scale, whereas those for plant and E.coli sequences (Figure 5B and C and Figure 6B and C) are clustered in the middle region of the scale (4–9%). In the case of trinucleotide frequencies, only the Bacillus sequences span a wide range (0.5–5%) while for the other three classes they are clustered in the middle region (0.5–3.5%).

DISCUSSION

We have compared various structural properties as predicted for the promoter sequences from organisms belonging to three different kingdoms, namely bacteria, animals and plants. The study indicates that there are certain properties, which may be shared by promoters, independent of the organism that they belong to or the gene that they control. In general, the promoter regions are less stable and less bendable but contain DNA elements with enhanced curvature, when compared with the downstream coding regions. However, there are also striking differences between the structural profiles of prokaryotic and

eukaryotic promoters. Here, we discuss the possible role of these observations and their implications in the process of transcription.

Figure 3. Bendability distribution around TSSs calculated using trinucleotide parameters based on nucleosomal positioning preferences. The figure shows the bendability profiles (black) with respect to the relative base position (x-axis), in the case of (A) vertebrate (B) plant (C) E.coli and (D) B.subtilis promoters. For the sake of clarity, the profiles are smoothed using a 50 nt window. Smaller values indicate greater bendability (indicated by black arrow on the top right hand corner of the figure).

Low stability of promoter regions as compared with the non-promoter regions

The RNA polymerase movement during transcription leads to the induction of positive supercoils ahead and negative supercoils behind, leading to torsional stresses. Opening of any base pair under this stress changes the denaturing probability of every other base pair. It has been reported earlier that the susceptibility to duplex destabilization induced by superhelical stress is closely associated with the boundaries of genes and transcription regulatory sites (55). The low stability predicted for promoter regions as compared with the non-promoter region can explain the stress-induced promoter-specific opening of the DNA. It is interesting that the feature of lower stability of the upstream region is common to all promoters, independent of their overall mononucleotide composition (Table 1). In plant and E.coli promoters, the difference in the dinucleotide composition (Figure 5B and C) is more prominent and this is reflected in the greater difference in stability between the upstream and downstream region.

Possible roles of DNA curvature and bendability

The presence of sequence-dependent DNA curvature in the promoter region, independent of any external factors such as proteins, has been experimentally observed in many cases [reviewed in (19,21,24–25)]; furthermore, transcriptional regulation by curved DNA stretches has been demonstrated in a number of cases (17,18,56–58). Our analysis clearly shows that a significant number of promoters in all the groups may have curved DNA elements upstream of the TSSs, thus facilitating transcription. The difference in the location of predicted curved DNA from one group of promoters to another correlates with differences in their transcription regulation (as discussed below).

In addition to the sequence-dependent intrinsic curvature, DNA bendability also plays an important role in transcription. Based on various experimental studies, it has been suggested that during transcription initiation, the promoter DNA of length ∼300 Å wraps around the polymerase (22,39). It has also been proposed that the energy cost of DNA bending may play a role in modulating the open complex formation, as well as in facilitating promoter clearance and that, without this energy cost, the energy of RNAP–DNA complex would probably be too high to permit the escape of the polymerase from the promoter. In this context, our observation of the distinct

presence of a low bendability region in the proximity of the TSS is significant.

Figure 4. Bendability distribution in the vicinity of TSSs calculated using DNase I sensitivity parameters. The figure shows the bendability profiles (black) with respect to the relative base position (x-axis), in the case of (A) vertebrate (B) plant (C) E.coli and (D) B.subtilis promoters. For the sake of clarity, the profiles are smoothed using a 50 nt window. Less negative values indicate higher bendability (indicated by black arrow on the top right hand corner of the figure).

Another explanation for the typical bendability profile (Figures 3 and 4), with the upstream region having lower bendability than the downstream region, was given by Pedersen et al. (54) from their analysis of a set of human promoters. They suggested that the characteristic bendability pattern in the promoter sequences is possibly connected with the formation of nucleosomes. Because nucleosomes have a preference for more flexible DNA (59,60), any elements that destabilize nucleosomes can activate transcription by facilitating access to transcription factors (61,62). This is supported by the observation of low bendability in the upstream region and high bendability in the downstream region of eukaryotic promoters (Figure 3A and B and Figure 4A and B). In bacterial genomes, proteins, such as H-NS (63), are analogous in function to the histones and the HMG box proteins of eukaryotes and this may explain the low bendability in the upstream region in bacteria, even though the genome is not organized as nucleosomes. Another interesting point about the bendability profiles is that although the two trinucleotide bending parameters are not highly correlated (64), the bendability profiles of the promoters, derived using the two parameters, show very similar features, suggesting that the nucleotide composition in promoters is such that some characteristic properties, such as bendability, are conserved.

Comparison between the four groups of promoters

Although the overall function seems to be conserved across the different groups of promoters, they do differ in finer details. In eukaryotes, the DNA is packed into nucleosomes, which blocks the recognition of the core promoters by the basic transcription machinery (65–68). In comparison, the prokaryotic DNA is essentially naked, i.e. the RNA polymerase is not greatly hindered in its ability to gain access to the DNA and initiate RNA synthesis (65). DNA flexibility is also known to play a role in nucleosome formation, and perhaps overall higher flexibility of downstream regions in eukaryotic promoters (matching that of shuffled DNA) is important in this regard. In contrast, the prokaryotic promoters, where the DNA is not packaged into nucleosomes, are overall more rigid than the shuffled sequences.

Another noticeable feature of eukaryotic promoters is the presence of regulatory sites hundreds of base pairs upstream from the TSS, while the regulatory elements in bacterial promoters tend to be located in the vicinity of the TSS. Our analysis also indicates that the special upstream features seem to extend at least up to the −500 position in the case of eukaryotic promoter sequences (Figures 1–4 A, B), but seem to be confined up to the −300 position in the case of prokaryotic promoters (Figures 1–4 C, D). The observation that in eukaryotes, transcription factors can bind hundreds of base pairs upstream seems to be reflected in the position of the predicted

curved region. Both groups of eukaryotic promoters show the presence of a curved region considerably upstream of the TSS (>200 bp); however, the bacterial promoters show the presence of a curved region nearer to the TSS.

Figure 5. The percentage occurrence of each dinucleotide in upstream (y-axis) versus downstream (x-axis) region in the near vicinity of TSS, i.e. −150 to −50 and +100 to +200 in the case of (A) vertebrate, (B) plant, (C) E. coli (D) B. subtilis. The dinucleotides, which are present more often in the downstream region than in the upstream region, appear below the diagonal and vice versa.

The observed differences between the promoters from the two bacterial origins may be attributed to the differences in their mode of binding to their respective RNA polymerases (69–71). On the other hand, differences between the two eukaryotic promoter sets, vertebrate and plant, may be a consequence of basic differences in their transcription regulation mechanisms, as well as due to their distinct composition (with plants being overall AT-rich and more so in the upstream region). The recent releases of plant genome sequences have revealed that plants have much higher gene density when compared with animal genomes; however, the increase in gene density comes at the cost of a greater logistic problem in transcriptional regulation of the genes (72). One solution to this problem would be to have a larger number of regulatory proteins and plant genomes are in fact known to have a very high percentage of genes coding for transcription factors (73). We would like to suggest that the sharp delineation in the various properties, such as stability, bendability, etc., of intergenic and coding regions of plant genomes may be one more way of identifying the transcriptional regulatory regions. In line with this hypothesis, it is to be noted that the vertebrate class has an average gene density much smaller than the members of the other three groups (Gene densities: Human 1/100 000, Arabidopsis 1/4000,

Figure 6. The percentage occurrence of trinucleotides in the upstream (y-axis) versus downstream (x-axis) region in the near vicinity of the TSS, i.e. −150 to −50 and +100 to +200, in the case of (A) vertebrate, (B) plant, (C) E. coli and (D) B. subtilis. The trinucleotides that are found more often in the downstream region than in the upstream region appear below the diagonal and vice versa. For the sake of clarity, only those trinucleotides that show large differences between their upstream and downstream frequencies are labeled.
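
One way to choose which points to label in a plot like Figure 6 is to rank k-mers by how far they lie from the diagonal. The sketch below is an assumption-laden illustration: the caption does not state what cut-off was used, so the min_diff threshold, the function name and the toy percentages are all hypothetical and are not values from the paper.

```python
def label_candidates(up_pct, down_pct, min_diff=1.0):
    """Return (k-mer, upstream % minus downstream %) pairs whose absolute
    difference is at least min_diff percentage points, largest first."""
    kmers = set(up_pct) | set(down_pct)
    diffs = {m: up_pct.get(m, 0.0) - down_pct.get(m, 0.0) for m in kmers}
    picked = [(m, d) for m, d in diffs.items() if abs(d) >= min_diff]
    # Sort by decreasing distance from the diagonal, then alphabetically.
    return sorted(picked, key=lambda item: (-abs(item[1]), item[0]))


# Toy percentages (made up for the example, not data from the paper).
upstream = {"AAA": 4.0, "TTT": 3.5, "GCG": 1.0, "CAG": 1.5}
downstream = {"AAA": 2.0, "TTT": 3.0, "GCG": 2.5, "CAG": 1.5}
labels = label_candidates(upstream, downstream)
print(labels)  # [('AAA', 2.0), ('GCG', -1.5)]
```

A positive difference corresponds to a point above the diagonal (more frequent upstream) and a negative one to a point below it.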

E.coli 1/1000 and B.subtilis 1/1000), and accordingly vertebrate promoters do not show large differences between their upstream and downstream regions, as compared with the other three groups of promoters.

CONCLUSIONS

Promoter regions in prokaryotic and eukaryotic genomes are predicted to have several common structural features, such as lower stability, higher curvature and lower bendability as compared with their neighboring regions. All four groups of promoters considered here are also distinctly different from non-promoter regions in their mononucleotide, dinucleotide and trinucleotide composition. However, there are also some important differences among the various groups of promoters. In the case of prokaryotic sequences, the distinct structural features are confined to a relatively short upstream region as compared with eukaryotic sequences, where they seem to extend over a significantly larger upstream region. In addition, the prokaryotic sequences are predicted to be overall less bendable when compared with eukaryotic promoters. The differences in prokaryotic and eukaryotic promoter sequences match well with their distinct patterns of transcription regulation. We have also observed some distinct features in the two prokaryotic promoter sets as well as in the eukaryotic promoter sets. In general, these similarities and differences between the promoters can provide a rationale for some of the known facts about the transcription process in the various organisms.

ACKNOWLEDGEMENTS

This work was supported by the Department of Biotechnology, India. During the study, A.K. was supported by University Grants Commission and Council of Scientific and Industrial

Research, India. The Open Access publication charges for this article were waived by Oxford University Press.

Conflict of interest statement. None declared.

REFERENCES

1. Harley,C.B. and Reynolds,R.P. (1987) Analysis of E.coli promoter sequences. Nucleic Acids Res., 15, 2343–2361.
2. Bucher,P. (1990) Weight matrix descriptions of four eukaryotic RNA polymerase II promoter elements derived from 502 unrelated promoter sequences. J. Mol. Biol., 212, 563–578.
3. Buckle,M. and Buc,H. (1989) Fine mapping of DNA single-stranded regions using base-specific chemical probes: study of an open complex formed between RNA polymerase and the lac UV5 promoter. Biochemistry, 28, 4388–4396.
4. Chen,Y.F. and Helmann,J.D. (1997) DNA-melting at the Bacillus subtilis flagellin promoter nucleates near −10 and expands unidirectionally. J. Mol. Biol., 267, 47–59.
5. Craig,M.L., Suh,W.C. and Record,M.T.,Jr (1995) HO. and DNase I probing of E sigma 70 RNA polymerase–lambda PR promoter open complexes: Mg2+ binding and its structural consequences at the transcription start site. Biochemistry, 34, 15624–15632.
6. Sasse-Dwight,S. and Gralla,J.D. (1989) KMnO4 as a probe for lac promoter DNA melting and mechanism in vivo. J. Biol. Chem., 264, 8074–8081.
7. Siebenlist,U., Simpson,R.B. and Gilbert,W. (1980) E.coli RNA polymerase interacts homologously with two different promoters. Cell, 20, 269–281.
8. Suh,W.C., Ross,W. and Record,M.T.,Jr (1993) Two open complexes and a requirement for Mg2+ to open the lambda PR transcription start site. Science, 259, 358–361.
9. Nakata,K., Kanehisa,M. and Maizel,J.V.,Jr (1988) Discriminant analysis of promoter regions in Escherichia coli sequences. Comput. Appl. Biosci., 4, 367–371.
10. Vollenweider,H.J., Fiandt,M. and Szybalski,W. (1979) A relationship between DNA helix stability and recognition sites for RNA polymerase. Science, 205, 508–511.
11. Margalit,H., Shapiro,B.A., Nussinov,R., Owens,J. and Jernigan,R.L. (1988) Helix stability in prokaryotic promoter regions. Biochemistry, 27, 5179–5188.
12. Pedersen,A.G., Jensen,L.J., Brunak,S., Staerfeldt,H.H. and Ussery,D.W. (2000) A DNA structural atlas for Escherichia coli. J. Mol. Biol., 299, 907–930.
13. Bossi,L. and Smith,D.M. (1984) Conformational change in the DNA associated with an unusual promoter mutation in a tRNA operon of Salmonella. Cell, 39, 643–652.
14. Galas,D.J., Eggert,M. and Waterman,M.S. (1985) Rigorous pattern-recognition methods for DNA sequences. Analysis of promoter sequences from Escherichia coli. J. Mol. Biol., 186, 117–128.
15. Hsu,L.M., Giannini,J.K., Leung,T.W. and Crosthwaite,J.C. (1991) Upstream sequence activation of Escherichia coli argT promoter in vivo and in vitro. Biochemistry, 30, 813–822.
16. Kuhnke,G., Fritz,H.J. and Ehring,R. (1987) Unusual properties of promoter-up mutations in the Escherichia coli galactose operon and evidence suggesting RNA polymerase-induced DNA bending. EMBO J., 6, 507–513.
17. Lamond,A.I. and Travers,A.A. (1983) Requirement for an upstream element for optimal transcription of a bacterial tRNA gene. Nature, 305, 248–250.
18. McAllister,C.F. and Achberger,E.C. (1988) Effect of polyadenine-containing curved DNA on promoter utilization in Bacillus subtilis. J. Biol. Chem., 263, 11743–11749.
19. Perez-Martin,J., Rojo,F. and de Lorenzo,V. (1994) Promoters responsive to DNA bending: a common theme in prokaryotic gene expression. Microbiol. Rev., 58, 268–290.
20. Plaskon,R.R. and Wartell,R.M. (1987) Sequence distributions associated with DNA curvature are found upstream of strong E.coli promoters. Nucleic Acids Res., 15, 785–796.
21. Ohyama,T. (2001) Intrinsic DNA bends: an organizer of local chromatin structure for transcription. Bioessays, 23, 708–715.
22. Rees,W.A., Keller,R.W., Vesenka,J.P., Yang,G. and Bustamante,C. (1993) Evidence of DNA bending in transcription complexes imaged by scanning force microscopy. Science, 260, 1646–1649.
23. Tanaka,K., Muramatsu,S., Yamada,H. and Mizuno,T. (1991) Systematic characterization of curved DNA segments randomly cloned from Escherichia coli and their functional significance. Mol. Gen. Genet., 226, 367–376.
24. Hagerman,P.J. (1990) Sequence-directed curvature of DNA. Annu. Rev. Biochem., 59, 755–781.
25. Kanhere,A. and Bansal,M. (2004) DNA bending and curvature: A ‘turning’ point in DNA function? PINSA, B70, 239–255.
26. Prosseda,G., Falconi,M., Giangrossi,M., Gualerzi,C.O., Micheli,G. and Colonna,B. (2004) The virF promoter in Shigella: more than just a curved DNA stretch. Mol. Microbiol., 51, 523–537.
27. Kaji,M., Matsushita,O., Tamai,E., Miyata,S., Taniguchi,Y., Shimamoto,S., Katayama,S., Morita,S. and Okabe,A. (2003) A novel type of DNA curvature present in a Clostridium perfringens ferredoxin gene: characterization and role in gene expression. Microbiology, 149, 3083–3091.
28. Agrawal,G.K., Asayama,M. and Shirai,M. (2003) Two distinct curved DNAs upstream of the light-responsive psbA gene in a cyanobacterium. Biosci. Biotechnol. Biochem., 67, 1817–1821.
29. Kozobay-Avraham,L., Hosid,S. and Bolshoy,A. (2004) Curvature distribution in prokaryotic genomes. In Silico Biol., 4, 29.
30. Jauregui,R., Abreu-Goodger,C., Moreno-Hagelsieb,G., Collado-Vides,J. and Merino,E. (2003) Conservation of DNA curvature signals in regulatory regions of prokaryotic genes. Nucleic Acids Res., 31, 6770–6777.
31. Kalate,R.N., Kulkarni,B.D. and Nagaraja,V. (2002) Analysis of DNA curvature distribution in mycobacterial promoters using theoretical models. Biophys. Chem., 99, 77–97.
32. Gabrielian,A.E., Landsman,D. and Bolshoy,A. (2000) Curved DNA in promoter sequences. In Silico Biol., 1, 183–196.
33. Tosato,V., Gjuracic,K., Vlahovicek,K., Pongor,S., Danchin,A. and Bruschi,C.V. (2003) The DNA secondary structure of the Bacillus subtilis genome. FEMS Microbiol. Lett., 218, 23–30.
34. Nikolov,D.B., Chen,H., Halay,E.D., Hoffman,A., Roeder,R.G. and Burley,S.K. (1996) Crystal structure of a human TATA box-binding protein/TATA element complex. Proc. Natl Acad. Sci. USA, 93, 4862–4867.
35. Schultz,S.C., Shields,G.C. and Steitz,T.A. (1991) Crystal structure of a CAP-DNA complex: the DNA is bent by 90 degrees. Science, 253, 1001–1007.
36. Nagaich,A.K., Appella,E. and Harrington,R.E. (1997) DNA bending is essential for the site-specific recognition of DNA response elements by the DNA binding domain of the tumor suppressor protein p53. J. Biol. Chem., 272, 14842–14849.
37. Konig,P. and Richmond,T.J. (1993) The X-ray structure of the GCN4-bZIP bound to ATF/CREB site DNA shows the complex depends on DNA flexibility. J. Mol. Biol., 233, 139–154.
38. Rivetti,C., Guthold,M. and Bustamante,C. (1999) Wrapping of DNA around the E.coli RNA polymerase open promoter complex. EMBO J., 18, 4464–4475.
39. Cheetham,G.M., Jeruzalmi,D. and Steitz,T.A. (1999) Structural basis for initiation of transcription from an RNA polymerase–promoter complex. Nature, 399, 80–83.
40. Ohler,U., Niemann,H., Liao,G. and Rubin,G.M. (2001) Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics, 17, S199–S206.
41. Kanhere,A. and Bansal,M. (2005) A novel method for prokaryotic promoter prediction based on DNA stability. BMC Bioinformatics, 6, 1.
42. Hershberg,R., Bejerano,G., Santos-Zavaleta,A. and Margalit,H. (2001) PromEC: an updated database of Escherichia coli mRNA promoters with experimentally identified transcriptional start sites. Nucleic Acids Res., 29, 277.
43. Makita,Y., Nakao,M., Ogasawara,N. and Nakai,K. (2004) DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucleic Acids Res., 32, D75–D77.
44. Praz,V., Perier,R., Bonnard,C. and Bucher,P. (2002) The Eukaryotic Promoter Database, EPD: new entry types and links to gene expression data. Nucleic Acids Res., 30, 322–324.
45. Schmid,C.D., Praz,V., Delorenzi,M., Perier,R. and Bucher,P. (2004) The Eukaryotic Promoter Database EPD: the impact of in silico primer extension. Nucleic Acids Res., 32, D82–D85.

46. Breslauer,K.J., Frank,R., Blocker,H. and Marky,L.A. (1986) Predicting DNA duplex stability from the base sequence. Proc. Natl Acad. Sci. USA, 83, 3746–3750.
47. SantaLucia,J.,Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl Acad. Sci. USA, 95, 1460–1465.
48. Allawi,H.T. and SantaLucia,J.,Jr (1997) Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry, 36, 10581–10594.
49. Bansal,M., Bhattacharyya,D. and Ravi,B. (1995) NUPARM and NUCGEN: software for analysis and generation of sequence dependent nucleic acid structures. Comput. Appl. Biosci., 11, 281–287.
50. Kanhere,A. and Bansal,M. (2003) An assessment of three dinucleotide parameters to predict DNA curvature by quantitative comparison with experimental data. Nucleic Acids Res., 31, 2647–2658.
51. Bansal,M. (1996) Structural variation observed in DNA crystal structures and their implications for protein–DNA interaction. In Sarma,R.H. and Sarma,M.H. (eds), Biological Structure and Dynamics, Proceedings of ninth convention, Adenine press, New York, NY, Vol. 1, pp. 121–134.
52. Satchwell,S.C., Drew,H.R. and Travers,A.A. (1986) Sequence periodicities in chicken nucleosome core DNA. J. Mol. Biol., 191, 659–675.
53. Brukner,I., Sanchez,R., Suck,D. and Pongor,S. (1995) Sequence-dependent bending propensity of DNA as revealed by DNase I: parameters for trinucleotides. EMBO J., 14, 1812–1818.
54. Pedersen,A.G., Baldi,P., Chauvin,Y. and Brunak,S. (1998) DNA structure in human RNA polymerase II promoters. J. Mol. Biol., 281, 663–673.
55. Benham,C.J. (1996) Duplex destabilization in superhelical DNA is predicted to occur at specific transcriptional regulatory regions. J. Mol. Biol., 255, 425–434.
56. Brahms,G., Brahms,S. and Magasanik,B. (1995) A sequence-induced superhelical DNA segment serves as transcriptional enhancer. J. Mol. Biol., 246, 35–42.
57. Kim,J., Klooster,S. and Shapiro,D.J. (1995) Intrinsically bent DNA in a eukaryotic transcription factor recognition sequence potentiates transcription activation. J. Biol. Chem., 270, 1282–1288.
58. Ellinger,T., Behnke,D., Knaus,R., Bujard,H. and Gralla,J.D. (1994) Context-dependent effects of upstream A-tracts. Stimulation or inhibition of Escherichia coli promoter function. J. Mol. Biol., 239, 466–475.
59. Fitzgerald,D.J. and Anderson,J.N. (1998) Unique translational positioning of nucleosomes on synthetic DNAs. Nucleic Acids Res., 26, 2526–2535.
60. Widlund,H.R., Kuduvalli,P.N., Bengtsson,M., Cao,H., Tullius,T.D. and Kubista,M. (1999) Nucleosome structural features and intrinsic properties of the TATAAACGCC repeat sequence. J. Biol. Chem., 274, 31847–31852.
61. Zhu,Z. and Thiele,D.J. (1996) A specialized nucleosome modulates transcription factor access to a C.glabrata metal responsive promoter. Cell, 87, 459–470.
62. Iyer,V. and Struhl,K. (1995) Poly(dA:dT), a ubiquitous promoter element that stimulates transcription via its intrinsic DNA structure. EMBO J., 14, 2570–2579.
63. Hommais,F., Krin,E., Laurent-Winter,C., Soutourina,O., Malpertuy,A., Le Caer,J.P., Danchin,A. and Bertin,P. (2001) Large-scale monitoring of pleiotropic regulation of gene expression by the prokaryotic nucleoid-associated protein, H-NS. Mol. Microbiol., 40, 20–36.
64. Brukner,I., Sanchez,R., Suck,D. and Pongor,S. (1995) Trinucleotide models for DNA bending propensity: comparison of models based on DNase I digestion and nucleosome packaging data. J. Biomol. Struct. Dyn., 13, 309–317.
65. Struhl,K. (1999) Fundamentally different logic of gene regulation in eukaryotes and prokaryotes. Cell, 98, 1–4.
66. Kornberg,R.D. and Lorch,Y. (2002) Chromatin and transcription: where do we go from here. Curr. Opin. Genet. Dev., 12, 249–251.
67. Beato,M. and Eisfeld,K. (1997) Transcription factor access to chromatin. Nucleic Acids Res., 25, 3559–3563.
68. Landick,R., Stewart,J. and Lee,D.N. (1990) Amino acid changes in conserved regions of the beta-subunit of Escherichia coli RNA polymerase alter transcription pausing and termination. Genes Dev., 4, 1623–1636.
69. Weilbaecher,R., Hebron,C., Feng,G. and Landick,R. (1994) Termination-altering amino acid substitutions in the beta’ subunit of Escherichia coli RNA polymerase identify regions involved in RNA chain elongation. Genes Dev., 8, 2913–2927.
70. Zhou,Y.N. and Jin,D.J. (1998) The rpoB mutants destabilizing initiation complexes at stringently controlled promoters behave like ‘stringent’ RNA polymerases in Escherichia coli. Proc. Natl Acad. Sci. USA, 95, 2908–2913.
71. Dobinson,K.F. and Spiegelman,G.B. (1987) Effect of the delta subunit of Bacillus subtilis RNA polymerase on initiation of RNA synthesis at two bacteriophage phi 29 promoters. Biochemistry, 26, 8206–8213.
72. Bird,A.P. (1995) Gene number, noise reduction and biological complexity. Trends Genet., 11, 94–100.
73. Riechmann,J.L., Heard,J., Martin,G., Reuber,L., Jiang,C., Keddie,J., Adam,L., Pineda,O., Ratcliffe,O.J., Samaha,R.R. et al. (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science, 290, 2105–2110.
