
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling
‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

Lecture slides by Kevin Wayne


Copyright © 2005 Pearson-Addison Wesley
http://www.cs.princeton.edu/~wayne/kleinberg-tardos

Last updated on 1/15/20 6:20 AM


Algorithmic paradigms

Greedy. Process the input in some order, myopically making irrevocable decisions.

Divide-and-conquer. Break up a problem into independent subproblems; solve each subproblem; combine solutions to subproblems to form solution to original problem.

Dynamic programming. Break up a problem into a series of overlapping subproblems; combine solutions to smaller subproblems to form solution to large subproblem.
(“dynamic programming” is a fancy name for caching intermediate results in a table for later reuse)

2
Dynamic programming history

Bellman. Pioneered the systematic study of dynamic programming in 1950s.

Etymology.

・Dynamic programming = planning over time.


・Secretary of Defense had pathological fear of mathematical research.
・Bellman sought a “dynamic” adjective to avoid conflict.

3
Dynamic programming applications

Application areas.

・Computer science: AI, compilers, systems, graphics, theory, ….
・Operations research.
・Information theory.
・Control theory.
・Bioinformatics.
Some famous dynamic programming algorithms.

・Avidan–Shamir for seam carving.


・Unix diff for comparing two files.
・Viterbi for hidden Markov models.
・De Boor for evaluating spline curves.
・Bellman–Ford–Moore for shortest path.
・Knuth–Plass for word wrapping text in TeX.
・Cocke–Kasami–Younger for parsing context-free grammars.
・Needleman–Wunsch/Smith–Waterman for sequence alignment.
4
Dynamic programming books
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling
‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTIONS 6.1–6.2
Weighted interval scheduling

・Job j starts at sj, finishes at fj, and has weight wj > 0.


・Two jobs are compatible if they don’t overlap.
・Goal: find max-weight subset of mutually compatible jobs.

[figure: jobs a–h drawn on a timeline from 0 to 11, each labeled with start sj, weight wj, and finish fj]
7
Earliest-finish-time first algorithm

Earliest finish-time first.

・Consider jobs in ascending order of finish time.


・Add job to subset if it is compatible with previously chosen jobs.
Recall. Greedy algorithm is correct if all weights are 1.

Observation. Greedy algorithm fails spectacularly for weighted version.

[figure: timeline 0–11 with job a (weight 1) finishing first and job b (weight 999) overlapping it; greedy selects a and forfeits b]
8
Weighted interval scheduling

Convention. Jobs are in ascending order of finish time: f1 ≤ f2 ≤ … ≤ fn.

Def. p( j ) = largest index i < j such that job i is compatible with j,
i.e., job i is the rightmost interval that ends before j begins.

Ex. p(8) = 1, p(7) = 3, p(2) = 0.

[figure: 8 jobs on a timeline from 0 to 11 illustrating p( j )]
9
Dynamic programming: binary choice

Def. OPT( j ) = max weight of any subset of mutually compatible jobs for subproblem
consisting only of jobs 1, 2, ..., j.

Goal. OPT(n) = max weight of any subset of mutually compatible jobs.

Case 1. OPT( j ) does not select job j.


・Must be an optimal solution to problem consisting of remaining jobs 1, 2, ...,
j – 1.
optimal substructure property
(proof via exchange argument)
Case 2. OPT( j ) selects job j.
・Collect profit wj.
・Can’t use incompatible jobs { p( j ) + 1, p( j ) + 2, ..., j – 1 }.
・Must include optimal solution to problem consisting of remaining compatible
jobs 1, 2, ..., p( j ).

Bellman equation.

  OPT( j ) = 0                                          if j = 0
  OPT( j ) = max { OPT( j – 1), wj + OPT( p( j )) }     if j > 0

10
Weighted interval scheduling: brute force

BRUTE-FORCE(n, s1, …, sn, f1, …, fn, w1, …, wn)
________________________________________________
Sort jobs by finish time and renumber so that f1 ≤ f2 ≤ … ≤ fn.
Compute p[1], p[2], …, p[n] via binary search.
RETURN COMPUTE-OPT(n).
________________________________________________

COMPUTE-OPT( j )
________________________________________________
IF ( j = 0)
    RETURN 0.
ELSE
    RETURN max { COMPUTE-OPT( j – 1), wj + COMPUTE-OPT( p[ j ]) }.
________________________________________________
11
Dynamic programming: quiz 1

What is running time of COMPUTE-OPT(n) in the worst case?

A. Θ(n log n)
B. Θ(n^2)
C. Θ(1.618^n)
D. Θ(2^n)

  T(n) = Θ(1)                  if n = 1
  T(n) = 2 T(n – 1) + Θ(1)     if n > 1

COMPUTE-OPT( j )
________________________________________________
IF ( j = 0)
    RETURN 0.
ELSE
    RETURN max { COMPUTE-OPT( j – 1), wj + COMPUTE-OPT( p[ j ]) }.
________________________________________________
12
Weighted interval scheduling: brute force

Observation. Recursive algorithm is spectacularly slow because of overlapping subproblems ⇒ exponential-time algorithm.

Ex. Number of recursive calls for family of “layered” instances grows like Fibonacci
sequence.

[recursion tree: COMPUTE-OPT(5) branches into subproblems 4 and 3, those into 3, 2, 2, 1, …, growing like the Fibonacci sequence]

p(1) = 0, p( j ) = j – 2

13
Weighted interval scheduling: memoization

Top-down dynamic programming (memoization).


・Cache result of subproblem j in M [ j ].
・Use M [ j ] to avoid solving subproblem j more than once.
TOP-DOWN(n, s1, …, sn, f1, …, fn, w1, …, wn)
________________________________________________
Sort jobs by finish time and renumber so that f1 ≤ f2 ≤ … ≤ fn.
Compute p[1], p[2], …, p[n] via binary search.
M [0] ← 0.                                        ← M is a global array
RETURN M-COMPUTE-OPT(n).
________________________________________________

M-COMPUTE-OPT( j )
________________________________________________
IF (M [ j ] is uninitialized)
    M [ j ] ← max { M-COMPUTE-OPT( j – 1), wj + M-COMPUTE-OPT( p[ j ]) }.
RETURN M [ j ].
________________________________________________
14
Weighted interval scheduling: running time

Claim. Memoized version of algorithm takes O(n log n) time.


Pf.
・Sort by finish time: O(n log n) via mergesort.
・Compute p[ j ] for each j : O(n log n) via binary search.
・M-COMPUTE-OPT( j ): each invocation takes O(1) time and either
  - (1) returns an initialized value M [ j ], or
  - (2) initializes M [ j ] and makes two recursive calls.
・Progress measure Φ = # initialized entries among M [1 .. n ].
  - initially Φ = 0; throughout Φ ≤ n.
  - (2) increases Φ by 1 ⇒ ≤ 2n recursive calls.
・Overall running time of M-COMPUTE-OPT(n) is O(n). ▪

15
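The memoized pseudocode above translates almost line-for-line into Python. A minimal sketch (the function name and the (start, finish, weight) job representation are illustrative assumptions, not from the slides):

```python
# Top-down weighted interval scheduling: M caches OPT(j) so each
# subproblem is solved at most once, giving O(n log n) overall.
from bisect import bisect_right

def weighted_interval_top_down(jobs):
    jobs = sorted(jobs, key=lambda j: j[1])          # sort by finish time
    n = len(jobs)
    finish = [f for _, f, _ in jobs]
    # p[j] = largest index i < j (1-indexed) whose finish time <= start of job j
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect_right(finish, jobs[j - 1][0])
    M = [None] * (n + 1)
    M[0] = 0
    def opt(j):
        if M[j] is None:                              # solve subproblem j once
            M[j] = max(opt(j - 1), jobs[j - 1][2] + opt(p[j]))
        return M[j]
    return opt(n)
```

Here `bisect_right` plays the role of the binary search that computes p[ j ].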
Weighted interval scheduling: finding a solution

Q. DP algorithm computes optimal value. How to find optimal solution?
A. Make a second pass by calling FIND-SOLUTION(n).

FIND-SOLUTION( j )
________________________________________________
IF ( j = 0)
    RETURN ∅.
ELSE IF (wj + M [ p[ j ]] > M [ j – 1])
    RETURN { j } ∪ FIND-SOLUTION( p[ j ]).
ELSE
    RETURN FIND-SOLUTION( j – 1).
________________________________________________

M [ j ] = max { M [ j – 1], wj + M [ p[ j ]] }.

Analysis. # of recursive calls ≤ n ⇒ O(n).


17
Weighted interval scheduling: bottom-up dynamic programming
Bottom-up dynamic programming. Unwind recursion.

BOTTOM-UP(n, s1, …, sn, f1, …, fn, w1, …, wn)
________________________________________________
Sort jobs by finish time and renumber so that f1 ≤ f2 ≤ … ≤ fn.
Compute p[1], p[2], …, p[n].
M [0] ← 0.
FOR j = 1 TO n
    M [ j ] ← max { M [ j – 1], wj + M [ p[ j ]] }.     ← uses previously computed values
________________________________________________

Running time. The bottom-up version takes O(n log n) time.


18
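The bottom-up table and the FIND-SOLUTION traceback can be sketched together in Python (names and the (start, finish, weight) job representation are illustrative assumptions):

```python
# Bottom-up weighted interval scheduling plus traceback: fill M left to
# right, then recover an optimal subset of jobs from the filled table.
from bisect import bisect_right

def weighted_interval_bottom_up(jobs):
    jobs = sorted(jobs, key=lambda j: j[1])          # sort by finish time
    n = len(jobs)
    finish = [f for _, f, _ in jobs]
    p = [0] * (n + 1)
    for j in range(1, n + 1):
        p[j] = bisect_right(finish, jobs[j - 1][0])
    M = [0] * (n + 1)
    for j in range(1, n + 1):                        # unwound recursion
        M[j] = max(M[j - 1], jobs[j - 1][2] + M[p[j]])
    # traceback: job j is in an optimal solution iff including it beats M[j-1]
    solution, j = [], n
    while j > 0:
        if jobs[j - 1][2] + M[p[j]] > M[j - 1]:
            solution.append(jobs[j - 1])
            j = p[j]
        else:
            j -= 1
    return M[n], solution[::-1]
```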
MAXIMUM SUBARRAY PROBLEM

Goal. Given an array x of n integers (positive or negative), find a contiguous subarray whose sum is maximum.

  12  5  −1  31  −61  59  26  −53  58  97  −93  −23  84  −15  6

  maximum subarray sum = 187  (59 + 26 − 53 + 58 + 97)

Applications. Computer vision, data mining, genomic sequence analysis, technical job interviews, ….

19
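One standard dynamic program for this problem is Kadane's algorithm (not named on the slides): track the best subarray ending at each index, since that value either extends the previous best or restarts. A minimal sketch:

```python
# Kadane's O(n) dynamic program for the maximum subarray problem.
def max_subarray(xs):
    best_ending_here, best = xs[0], xs[0]
    for x in xs[1:]:
        # either extend the subarray ending at the previous index, or start fresh
        best_ending_here = max(x, best_ending_here + x)
        best = max(best, best_ending_here)
    return best
```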
MAXIMUM RECTANGLE PROBLEM

Goal. Given an n-by-n matrix A, find a rectangle whose sum is maximum.

[figure: a 7-by-7 matrix A of positive and negative entries; the maximum-sum rectangle has sum 13]

Applications. Databases, image processing, maximum likelihood estimation, technical job interviews, ….

21
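A common reduction for the maximum rectangle problem (this reduction is not spelled out on the slides) fixes a top and bottom row, collapses each column in that band to a single sum, and runs the 1-D maximum subarray DP on the collapsed array, for O(n^3) total time. A sketch:

```python
# Maximum-sum rectangle via n^2 runs of the 1-D maximum subarray DP.
def max_rectangle(A):
    n_rows, n_cols = len(A), len(A[0])
    best = A[0][0]
    for top in range(n_rows):
        col = [0] * n_cols                 # column sums over rows top..bottom
        for bottom in range(top, n_rows):
            for c in range(n_cols):
                col[c] += A[bottom][c]
            # Kadane's scan over the collapsed 1-D array
            cur = run = col[0]
            for x in col[1:]:
                run = max(x, run + x)
                cur = max(cur, run)
            best = max(best, cur)
    return best
```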
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling
‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTION 6.3
Least squares

Least squares. Foundational problem in statistics.
・Given n points in the plane: (x1, y1), (x2, y2), …, (xn, yn).
・Find a line y = ax + b that minimizes the sum of the squared errors:

  SSE = Σ_{i=1}^{n} (yi − a xi − b)²

Solution. Calculus ⇒ min error is achieved when

  a = ( n Σi xi yi − (Σi xi)(Σi yi) ) / ( n Σi xi² − (Σi xi)² ),   b = ( Σi yi − a Σi xi ) / n

24
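The closed-form slope and intercept above can be checked with a few lines of Python (`fit_line` is an illustrative name; at least two distinct x values are assumed so the denominator is nonzero):

```python
# Closed-form least-squares line y = a x + b, following the slide's formulas.
def fit_line(points):
    n = len(points)
    sx  = sum(x for x, _ in points)
    sy  = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    b = (sy - a * sx) / n                           # intercept
    return a, b
```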
Segmented least squares

Segmented least squares.

・Points lie roughly on a sequence of several line segments.
・Given n points in the plane: (x1, y1), (x2, y2), …, (xn, yn) with x1 < x2 < … < xn, find a sequence of lines that minimizes f (x).

Q. What is a reasonable choice for f (x) to balance accuracy (goodness of fit) and parsimony (number of lines)?

25
Segmented least squares

Segmented least squares.

・Points lie roughly on a sequence of several line segments.
・Given n points in the plane: (x1, y1), (x2, y2), …, (xn, yn) with x1 < x2 < … < xn, find a sequence of lines that minimizes f (x).

Goal. Minimize f (x) = E + c L for some constant c > 0, where
・E = sum of the sums of the squared errors in each segment.
・L = number of lines.

26
Dynamic programming: multiway choice

Notation.
・OPT( j ) = minimum cost for points p1, p2, …, pj.
・eij = SSE for points pi, pi+1, …, pj.

To compute OPT( j ):
・Last segment uses points pi, pi+1, …, pj for some i ≤ j.
・Cost = eij + c + OPT(i – 1).     ← optimal substructure property (proof via exchange argument)

Bellman equation.

  OPT( j ) = 0                                            if j = 0
  OPT( j ) = min_{1 ≤ i ≤ j} { eij + c + OPT(i – 1) }     if j > 0

27
Segmented least squares algorithm

SEGMENTED-LEAST-SQUARES(n, p1, …, pn, c)
________________________________________________
FOR j = 1 TO n
    FOR i = 1 TO j
        Compute the SSE eij for the points pi, pi+1, …, pj.

M [0] ← 0.
FOR j = 1 TO n
    M [ j ] ← min_{1 ≤ i ≤ j} { eij + c + M [ i – 1] }.     ← uses previously computed values
RETURN M [ n ].
________________________________________________

28
Segmented least squares analysis

Theorem. [Bellman 1961] DP algorithm solves the segmented least squares problem in O(n³) time and O(n²) space.

Pf.
・Bottleneck = computing SSE eij for each i and j :

  aij = ( n Σk xk yk − (Σk xk)(Σk yk) ) / ( n Σk xk² − (Σk xk)² ),   bij = ( Σk yk − aij Σk xk ) / n

・O(n) to compute eij. ▪

Remark. Can be improved to O(n²) time.
・For each i : precompute cumulative sums  Σ_{k=1}^{i} xk,  Σ_{k=1}^{i} yk,  Σ_{k=1}^{i} xk²,  Σ_{k=1}^{i} xk yk.
・Using cumulative sums, can compute eij in O(1) time.
29
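A Python sketch of SEGMENTED-LEAST-SQUARES, without the cumulative-sum speedup, so the cost matches the theorem's O(n³)-style bound (function and variable names are illustrative assumptions):

```python
# Segmented least squares: M[j] = minimum cost E + cL for points 1..j,
# where c is the per-line penalty from the slides.
def segmented_least_squares(points, c):
    pts = sorted(points)                   # assumes x1 < x2 < ... < xn
    n = len(pts)

    def sse(i, j):
        # squared error of the best-fit line through points i..j (1-indexed)
        seg = pts[i - 1:j]
        m = len(seg)
        sx  = sum(x for x, _ in seg)
        sy  = sum(y for _, y in seg)
        sxx = sum(x * x for x, _ in seg)
        sxy = sum(x * y for x, y in seg)
        denom = m * sxx - sx * sx
        if denom == 0:                     # single point: fits exactly
            return 0.0
        a = (m * sxy - sx * sy) / denom
        b = (sy - a * sx) / m
        return sum((y - a * x - b) ** 2 for x, y in seg)

    M = [0.0] * (n + 1)
    for j in range(1, n + 1):
        M[j] = min(sse(i, j) + c + M[i - 1] for i in range(1, j + 1))
    return M[n]
```

Precomputing the four cumulative sums from the Remark would make each `sse` call O(1) and bring the total down to O(n²).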
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling
‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTION 6.4
Knapsack problem

Goal. Pack knapsack so as to maximize total value of items taken.
・There are n items: item i provides value vi > 0 and weighs wi > 0.
・Value of a subset of items = sum of values of individual items.
・Knapsack has weight limit of W.

Ex. The subset { 1, 2, 5 } has value $35 (and weight 10).
Ex. The subset { 3, 4 } has value $40 (and weight 11).

Assumption. All values and weights are integral.


  i    vi    wi
  1    $1    1 kg
  2    $6    2 kg
  3    $18   5 kg
  4    $22   6 kg
  5    $28   7 kg

weights and values can be arbitrary positive integers

knapsack instance (weight limit W = 11)
[photo: Creative Commons Attribution-Share Alike 2.5 by Dake]

31
Dynamic programming: quiz 2

Which algorithm solves knapsack problem?

A. Greedy-by-value: repeatedly add item with maximum vi.
B. Greedy-by-weight: repeatedly add item with minimum wi.
C. Greedy-by-ratio: repeatedly add item with maximum ratio vi / wi.
D. None of the above.

  i    vi    wi
  1    $1    1 kg
  2    $6    2 kg
  3    $18   5 kg
  4    $22   6 kg
  5    $28   7 kg

knapsack instance (weight limit W = 11)
[photo: Creative Commons Attribution-Share Alike 2.5 by Dake]
32
Dynamic programming: quiz 3

Which subproblems?

A. OPT(w) = optimal value of knapsack problem with weight limit w.
B. OPT(i) = optimal value of knapsack problem with items 1, …, i.
C. OPT(i, w) = optimal value of knapsack problem with items 1, …, i subject to weight limit w.
D. Any of the above.

33
Dynamic programming: two variables

Def. OPT(i, w) = optimal value of knapsack problem with items 1, …, i, subject to weight limit w.

Goal. OPT(n, W).

Case 1. OPT(i, w) does not select item i (possibly because wi > w).
・OPT(i, w) selects best of { 1, 2, …, i – 1 } subject to weight limit w.

Case 2. OPT(i, w) selects item i.     ← optimal substructure property (proof via exchange argument)
・Collect value vi.
・New weight limit = w – wi.
・OPT(i, w) selects best of { 1, 2, …, i – 1 } subject to new weight limit.

Bellman equation.

  OPT(i, w) = 0                                                   if i = 0
  OPT(i, w) = OPT(i – 1, w)                                       if wi > w
  OPT(i, w) = max { OPT(i – 1, w), vi + OPT(i – 1, w – wi) }      otherwise
34
Knapsack problem: bottom-up dynamic programming

KNAPSACK(n, W, w1, …, wn, v1, …, vn)
________________________________________________
FOR w = 0 TO W
    M [0, w ] ← 0.
FOR i = 1 TO n
    FOR w = 0 TO W
        IF (wi > w)  M [ i, w ] ← M [ i – 1, w ].
        ELSE         M [ i, w ] ← max { M [ i – 1, w ], vi + M [ i – 1, w – wi ] }.     ← uses previously computed values
RETURN M [ n, W ].
________________________________________________

  OPT(i, w) = 0                                                   if i = 0
  OPT(i, w) = OPT(i – 1, w)                                       if wi > w
  OPT(i, w) = max { OPT(i – 1, w), vi + OPT(i – 1, w – wi) }      otherwise
35
Knapsack problem: bottom-up dynamic programming demo

  i    vi    wi
  1    $1    1 kg
  2    $6    2 kg
  3    $18   5 kg
  4    $22   6 kg
  5    $28   7 kg

  OPT(i, w) = 0                                                   if i = 0
  OPT(i, w) = OPT(i – 1, w)                                       if wi > w
  OPT(i, w) = max { OPT(i – 1, w), vi + OPT(i – 1, w – wi) }      otherwise

weight limit w

0 1 2 3 4 5 6 7 8 9 10 11

{ } 0 0 0 0 0 0 0 0 0 0 0 0

{1} 0 1 1 1 1 1 1 1 1 1 1 1

subset { 1, 2 } 0 1 6 7 7 7 7 7 7 7 7 7
of items
1, …, i { 1, 2, 3 } 0 1 6 7 7 18 19 24 25 25 25 25

{ 1, 2, 3, 4 } 0 1 6 7 7 18 22 24 28 29 29 40

{ 1, 2, 3, 4, 5 } 0 1 6 7 7 18 22 28 29 34 35 40

OPT(i, w) = optimal value of knapsack problem with items 1, …, i, subject to weight limit w
36
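The KNAPSACK pseudocode reproduces the demo table directly. A minimal sketch (returning the whole table M so individual entries can be compared with the slide; names are illustrative):

```python
# Bottom-up 0/1 knapsack: M[i][w] = optimal value using items 1..i
# subject to weight limit w, exactly as in the demo table.
def knapsack(values, weights, W):
    n = len(values)
    M = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:                 # item i does not fit
                M[i][w] = M[i - 1][w]
            else:                                  # skip item i, or take it
                M[i][w] = max(M[i - 1][w],
                              values[i - 1] + M[i - 1][w - weights[i - 1]])
    return M
```

On the slide's instance (values $1, $6, $18, $22, $28; weights 1, 2, 5, 6, 7 kg; W = 11), M[5][11] is 40, matching the bottom-right entry of the demo table.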
Knapsack problem: running time

Theorem. The DP algorithm solves the knapsack problem with n items and maximum weight W in Θ(n W) time and Θ(n W) space.

Pf.
・Takes O(1) time per table entry.
・There are Θ(n W) table entries.                    ← weights are integers between 1 and W
・After computing optimal values, can trace back to find solution:
  OPT(i, w) takes item i iff M [ i, w ] > M [ i – 1, w ]. ▪

Remarks.
・Algorithm depends critically on assumption that weights are integral.
・Assumption that values are integral was not used.

37
Dynamic programming: quiz 4

Does there exist a poly-time algorithm for the knapsack problem?

A. Yes, because the DP algorithm takes Θ(n W) time.     ← “pseudo-polynomial”
B. No, because Θ(n W) is not a polynomial function of the input size.
C. No, because the problem is NP-hard.
D. Unknown.

equivalent to P ≠ NP conjecture because knapsack problem is NP-hard

38
COIN CHANGING

Problem. Given n coin denominations { c1, c2, …, cn } and a target value V, find the
fewest coins needed to make change for V (or report impossible).

Recall. Greedy cashier’s algorithm is optimal for U.S. coin denominations, but not for
arbitrary coin denominations.

Ex. { 1, 10, 21, 34, 70, 100, 350, 1295, 1500 }.


Optimal. 140¢ = 70 + 70.

39
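The coin changing problem yields to the same bottom-up style: D[v] = fewest coins needed to make value v. A sketch (names are illustrative; `math.inf` marks unreachable values):

```python
# Coin changing DP: D[v] = min over coins c <= v of D[v - c] + 1.
import math

def min_coins(coins, V):
    D = [0] + [math.inf] * V
    for v in range(1, V + 1):
        for c in coins:
            if c <= v and D[v - c] + 1 < D[v]:
                D[v] = D[v - c] + 1
    return D[V]            # math.inf means "impossible"
```

On the slide's denominations, the DP finds 140¢ = 70 + 70 (two coins), where the greedy cashier's algorithm would not.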
6. DYNAMIC PROGRAMMING I

‣ weighted interval scheduling
‣ segmented least squares
‣ knapsack problem
‣ RNA secondary structure

SECTION 6.5
RNA secondary structure

RNA. String B = b1b2…bn over alphabet { A, C, G, U }.

Secondary structure. RNA is single-stranded so it tends to loop back and form base
pairs with itself. This structure is essential for understanding behavior of molecule.

[figure: RNA secondary structure for GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA, with bases and base pairs labeled]
42
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick complement: A–U, U–A, C–G, or G–C.

[figure: B = ACGUGGCCCAU, S = { (b1, b10), (b2, b9), (b3, b8) }; S is not a secondary structure (C–A is not a valid Watson–Crick pair)]

43
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j – 4.

[figure: B = AUGGGGCAU, S = { (b1, b9), (b2, b8), (b3, b7) }; S is not a secondary structure (≤ 4 intervening bases between G and C)]

44
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j – 4.
・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot have i < k < j < ℓ.

[figure: B = ACUUGGCCAU, S = { (b1, b10), (b2, b8), (b3, b9) }; S is not a secondary structure (G–C and U–A cross)]

45
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j – 4.
・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot have i < k < j < ℓ.

[figure: B = AUGUGGCCAU, S = { (b1, b10), (b2, b9), (b3, b8) }; S is a secondary structure (with 3 base pairs)]

46
RNA secondary structure

Secondary structure. A set of pairs S = { (bi, bj) } that satisfy:


・[Watson–Crick] S is a matching and each pair in S is a Watson–Crick complement: A–U, U–A, C–G, or G–C.
・[No sharp turns] The ends of each pair are separated by at least 4 intervening bases: if (bi, bj) ∈ S, then i < j – 4.
・[Non-crossing] If (bi, bj) and (bk, bℓ) are two pairs in S, then we cannot have i < k < j < ℓ.

Free-energy hypothesis. RNA molecule will form the secondary structure with the minimum total free energy, approximated here by the number of base pairs (more base pairs ⇒ lower free energy).

Goal. Given an RNA molecule B = b1b2…bn, find a secondary structure S that maximizes the number of base pairs.

47
Dynamic programming: quiz 5

Is the following a secondary structure?

A. Yes.
B. No, violates Watson–Crick condition.
C. No, violates no-sharp-turns condition.
D. No, violates no-crossing condition.

[figure: candidate structure on a folded RNA strand]
Dynamic programming: quiz 6

Which subproblems?

A. OPT( j ) = max number of base pairs in a secondary structure of the
substring b1b2 … bj.
B. OPT( j ) = max number of base pairs in a secondary structure of the
substring bj bj+1 … bn.
C. Either A or B.
D. Neither A nor B.
RNA secondary structure: subproblems

First attempt. OPT( j ) = maximum number of base pairs in a secondary structure of
the substring b1b2 … bj.

Goal. OPT(n).

Choice. Match bases bt and bj (bj is the last base).

Difficulty. Results in two subproblems (but one of the wrong form):

・Find a secondary structure in b1b2 … bt–1: this is OPT(t – 1).
・Find a secondary structure in bt+1bt+2 … bj–1: the first base is no longer b1,
so this is not of the form OPT(・) ⇒ need more subproblems.
Dynamic programming over intervals

Def. OPT(i, j ) = maximum number of base pairs in a secondary structure of the
substring bi bi+1 … bj.

Case 1. If i ≥ j – 4.
・OPT(i, j ) = 0 by the no-sharp-turns condition.

Case 2. Base bj is not involved in a pair.
・OPT(i, j ) = OPT(i, j – 1).

Case 3. Base bj pairs with bt for some i ≤ t < j – 4.
・The non-crossing condition decouples the two resulting subproblems.
・OPT(i, j ) = 1 + max t { OPT(i, t – 1) + OPT(t + 1, j – 1) },
where the max is over t such that i ≤ t < j – 4 and
bt and bj are Watson–Crick complements.
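The three cases translate directly into a memoized recursion. A sketch, assuming the slides' 1-indexed bases; the function and variable names are mine, not from the slides.

```python
# Memoized version of the OPT(i, j) recurrence (Cases 1-3 above).
from functools import lru_cache

WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def max_base_pairs(B):
    n = len(B)

    @lru_cache(maxsize=None)
    def opt(i, j):
        if i >= j - 4:                        # Case 1: no-sharp-turns condition
            return 0
        best = opt(i, j - 1)                  # Case 2: b_j is not in a pair
        for t in range(i, j - 4):             # Case 3: b_j pairs with b_t, i <= t < j - 4
            if (B[t - 1], B[j - 1]) in WATSON_CRICK:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, n)
```

On the valid example above, max_base_pairs("AUGUGGCCAU") returns 3 base pairs.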
Dynamic programming: quiz 7

In which order to compute OPT(i, j) ?

A. Increasing i, then j.
B. Increasing j, then i.
C. Either A or B.
D. Neither A nor B.

OPT(i, j) depends upon OPT(i, t – 1) and OPT(t + 1, j – 1) for i ≤ t < j – 4
(match bases bj and bt).
Bottom-up dynamic programming over intervals

Q. In which order to solve the subproblems?
A. Do shortest intervals first: increasing order of interval length ⎜j − i⎟.

RNA-SECONDARY-STRUCTURE(n, b1, …, bn)
_______________________________________________________________
FOR k = 5 TO n – 1
   FOR i = 1 TO n – k
      j ← i + k.
      Compute M[i, j] using formula.  ⟨all needed values are already computed⟩
RETURN M[1, n].
_______________________________________________________________

[figure: table M[i, j] filled in order of increasing interval length k = j − i]

Theorem. The DP algorithm solves the RNA secondary structure problem in O(n³) time
and O(n²) space.
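A runnable rendering of the pseudocode above, as a sketch: 0-indexed internally (so the no-sharp-turns test becomes t < j − 4 on 0-indexed t and j), with variable names of my own choosing.

```python
# Bottom-up DP over intervals, shortest intervals first.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

def rna_secondary_structure(B):
    n = len(B)
    # M[i][j] = max base pairs in b_{i+1} ... b_{j+1}; entries with j - i <= 4 stay 0
    M = [[0] * n for _ in range(n)]
    for k in range(5, n):                     # interval length, shortest first
        for i in range(n - k):
            j = i + k
            best = M[i][j - 1]                # b_j not in a pair
            for t in range(i, j - 4):         # b_j pairs with b_t
                if (B[t], B[j]) in WATSON_CRICK:
                    left = M[i][t - 1] if t > i else 0
                    best = max(best, 1 + left + M[t + 1][j - 1])
            M[i][j] = best
    return M[0][n - 1] if n > 0 else 0
```

The three nested loops (k, i, t) each range over O(n) values, giving the O(n³) running time, and the table M accounts for the O(n²) space.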
Dynamic programming summary

Outline.

・Define a collection of subproblems (typically, only a polynomial number of
subproblems).
・Solution to the original problem can be computed from solutions to subproblems.
・Natural ordering of subproblems from “smallest” to “largest” that enables
determining a solution to a subproblem from solutions to smaller subproblems.

Techniques.

・Binary choice: weighted interval scheduling.
・Multiway choice: segmented least squares.
・Adding a new variable: knapsack problem.
・Intervals: RNA secondary structure.

Top-down vs. bottom-up dynamic programming. Opinions differ.
