RNA Secondary Structure
Basics of RNA
RNA bases: A, C, G, U
Primary structure of RNA: a sequence of the bases A, C, G and U
Base Pairs
A-U, G-C (Watson-Crick); G-U (wobble)
Stability
G-C > A-U > G-U
Basics of RNA
Single stranded; the strand folds upon itself to form base pairs.
Can adopt diverse forms of secondary structure.
For RNA, conservation of structure matters more than conservation of base sequence.
Structure Rules
Base pairing stabilizes the structure.
Unpaired sections (loops) destabilize the structure.
When a base in one position changes, the base it pairs with must also change to maintain the same structure (covariation).
RNA Combinatorics
The number of RNA secondary structures for the sequence [1,n]
Recurrence Relation
Based on the combinatorics, there are approximately 1.3 billion possible secondary structures for a sequence of length n = 27.
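A minimal sketch of how such a count can be computed, assuming the classic Waterman-style recurrence S(n) = S(n-1) + sum over k of S(k-1) * S(n-k-1): base n is either unpaired, or paired with some base k, which splits the sequence into the k-1 bases to its left and the n-k-1 bases it encloses. The recurrence form, the function name `count_structures`, and the default `min_loop = 1` (at least one unpaired base in every hairpin loop) are assumptions for illustration, not taken from the slide.

```python
def count_structures(n, min_loop=1):
    """Count secondary structures on n bases (assumed Waterman-style recurrence).

    Every hairpin loop must contain at least `min_loop` unpaired bases.
    Which bases can chemically pair is ignored: this is a purely combinatorial count.
    """
    # s[m] = number of structures on m bases; the empty structure counts as one.
    s = [1] * (n + 1)
    for m in range(1, n + 1):
        total = s[m - 1]  # case 1: base m is unpaired
        # case 2: base m pairs with base k (1-based), enclosing m-k-1 >= min_loop bases
        for k in range(1, m - min_loop):
            total += s[k - 1] * s[m - k - 1]
        s[m] = total
    return s[n]


if __name__ == "__main__":
    for length in (3, 6, 27):
        print(length, count_structures(length))
```

Under these assumptions the count for n = 27 is on the order of a billion, in line with the figure quoted above; a larger `min_loop` gives a smaller count.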
Hairpin Loop
generally at least 4 bases long for each loop
Bulge Loops
occur when bases on one side of the structure cannot form base pairs
Interior Loops
occur when bases on both sides of the structure cannot form base pairs
Junctions (Multi-loops)
two or more double-stranded regions converge to form a closed structure
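The loop types above can be told apart mechanically once a structure is written down. The sketch below assumes dot-bracket notation (matching parentheses for paired bases, dots for unpaired ones), which is not introduced on the slides, and a well-formed input string. The helper names `pair_table` and `classify_loops` are hypothetical. Each base pair is labelled by what it directly encloses; directly adjacent pairs with no unpaired bases between them are reported as "stack".

```python
def pair_table(dot_bracket):
    """Return partner[i] = index paired with i, or -1 if base i is unpaired."""
    partner = [-1] * len(dot_bracket)
    open_positions = []
    for idx, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(idx)
        elif symbol == ")":
            i = open_positions.pop()
            partner[i], partner[idx] = idx, i
    return partner


def classify_loops(dot_bracket):
    """Label the loop closed by each base pair (i, j) as a list of (i, j, kind)."""
    partner = pair_table(dot_bracket)
    loops = []
    for i, j in enumerate(partner):
        if j <= i:  # skip unpaired bases and the closing half of each pair
            continue
        # Collect the base pairs directly enclosed by (i, j).
        inner = []
        p = i + 1
        while p < j:
            q = partner[p]
            if q > p:
                inner.append((p, q))
                p = q + 1  # jump over the enclosed helix
            else:
                p += 1
        if not inner:
            kind = "hairpin"
        elif len(inner) > 1:
            kind = "multiloop"
        else:
            p, q = inner[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                kind = "stack"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior"
        loops.append((i, j, kind))
    return loops


if __name__ == "__main__":
    for structure in ("((.((...))))", "(.(...).)", "((...)(...))"):
        print(structure, [kind for _, _, kind in classify_loops(structure)])
```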
Nussinov Algorithm
Four ways to get the optimal structure between positions i and j from optimal substructures: i unpaired, j unpaired, i paired with j, or a bifurcation into two adjacent substructures
Nussinov Algorithm
Compares a sequence against itself in an n × n matrix.
Finds the maximum of the scores for the four possible structures at a particular position.
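A minimal sketch of this matrix fill and a traceback, under stated assumptions: each base pair scores +1, only A-U and G-C pairs are allowed, and the optional `min_loop` parameter (minimum number of unpaired bases enclosed by a pair) is a common refinement rather than something the slides specify. The function name `nussinov` and the dot-bracket output format are choices made for illustration.

```python
def nussinov(seq, min_loop=0):
    """Return (max_pairs, dot_bracket) for seq using a Nussinov-style recursion."""
    can_pair = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # score[i][j] = maximum number of base pairs in seq[i..j]; cells with j <= i stay 0.
    score = [[0] * n for _ in range(n)]

    def pairs(i, j):
        return (seq[i], seq[j]) in can_pair and j - i - 1 >= min_loop

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(score[i + 1][j],   # case 1: i unpaired
                       score[i][j - 1])   # case 2: j unpaired
            if pairs(i, j):               # case 3: i pairs with j
                best = max(best, score[i + 1][j - 1] + 1)
            for k in range(i + 1, j):     # case 4: bifurcation at k
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best

    # Traceback: recover one optimal structure (others with the same score may exist).
    structure = ["."] * n
    todo = [(0, n - 1)] if n else []
    while todo:
        i, j = todo.pop()
        if i >= j:
            continue
        if score[i][j] == score[i + 1][j]:
            todo.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            todo.append((i, j - 1))
        elif pairs(i, j) and score[i][j] == score[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            todo.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    todo.extend([(i, k), (k + 1, j)])
                    break
    return (score[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # (3, '(((...)))')
```

Because the score only counts base pairs, ties between very different structures are common, and the traceback simply returns whichever one it reaches first.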
Nussinov Algorithm
This method will not necessarily generate the most stable structure.
May have scattered matches which are not biologically reasonable.
Does not give accurate structure predictions.
Minimize Energy
All possible choices of complementary sequences are considered to find the most stable structure.
Stacks (contiguous nested base pairs) are the dominant stabilizing force and contribute negative free energy.
Unpaired bases form destabilizing loops (hairpin loops, bulge/internal loops, and multiloops), which contribute positive free energy.
Minimize Energy
The energy minimization algorithm predicts the secondary structure by minimizing the free energy (ΔG).
ΔG is calculated as the sum of the individual contributions of loops, base pairs, and other secondary structure elements.
Given the energy tables, the free energy can be calculated for a structure
The score in dynamic programming is based on the free energy values.
Gaps represent some form of loop.
The most widely used programs that implement this minimum free energy algorithm are MFOLD and RNAfold.
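The additive energy model can be sketched as a table lookup followed by a sum. The energy values below are made-up placeholders chosen only to show the sign convention (stacks negative, loops positive); they are not measured thermodynamic parameters and are not the tables used by MFOLD or RNAfold. The names `TOY_ENERGY` and `total_free_energy` are hypothetical.

```python
# HYPOTHETICAL placeholder energies (arbitrary units), for illustration only.
# Real energy tables depend on loop size and on the identity of the bases involved.
TOY_ENERGY = {
    "stack": -2.0,      # stabilizing: negative contribution
    "hairpin": 4.0,     # destabilizing: positive contribution
    "bulge": 3.0,
    "interior": 2.0,
    "multiloop": 5.0,
}


def total_free_energy(elements, table=TOY_ENERGY):
    """Sum the tabulated contribution of every structural element."""
    return sum(table[kind] for kind in elements)


if __name__ == "__main__":
    # A hairpin closed by two stacked pairs, described as a list of element types.
    print(total_free_energy(["stack", "stack", "hairpin"]))
```

With a real table the lookup key would carry more detail (loop length, neighbouring bases), but the final energy is still a sum over the elements of the structure.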
Drawbacks
Yields only a single optimal solution
The stability of the miRNA/target site duplex was evaluated by assigning a free energy (ΔG) to the duplex.
A candidate target site was rejected if its ΔG value was higher than a threshold.
Partition the duplex into two parts: the seed (8 nt of the miRNA) and the out-seed.
For each part, consider the following features of the duplex:
The number of paired bases
The number of bulges
The number of loops
The number of asymmetric loops
Eight features, each representing the number of bulges with lengths 1-7 and those with lengths greater than 7
Eight features, each representing the number of symmetric loops with lengths 1-7 and those with lengths greater than 7
Eight features, each representing the number of asymmetric loops with lengths 1-7 and those with lengths greater than 7
The distance from the start of the seed (the 3' end) to the first paired base of the 5' start of the out-seed
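Each of the "eight features" groups above is a length histogram with one bin per length from 1 to 7 and a final bin for anything longer. A minimal sketch of that binning follows; the function name `length_bin_features` and the assumption that the bulge or loop lengths have already been extracted as a list of integers are illustrative choices, not details from the slides.

```python
def length_bin_features(lengths, max_len=7):
    """Return max_len + 1 counts: one per length 1..max_len, then one for > max_len."""
    features = [0] * (max_len + 1)
    for length in lengths:
        if length < 1:
            raise ValueError("lengths must be positive integers")
        # Lengths above max_len all fall into the last bin.
        features[min(length, max_len + 1) - 1] += 1
    return features


if __name__ == "__main__":
    # Hypothetical bulge lengths observed in one part of a duplex.
    print(length_bin_features([1, 1, 3, 9]))  # [2, 0, 1, 0, 0, 0, 0, 1]
```

The same function would be applied separately to bulges, symmetric loops, and asymmetric loops, and separately to the seed and the out-seed.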