
Recursion, Dynamic Programming, Divide & Conquer: Sequence Alignment, Quicksort

The document provides information about algorithms and data structures, including recursion, dynamic programming, divide and conquer, quicksort, Fibonacci numbers, and sequence alignment. It discusses recursive solutions to problems like the Tower of Hanoi puzzle and calculating Fibonacci numbers. It also describes how dynamic programming can be used to optimize recursive solutions by storing previously computed values in a memo to avoid recomputing them.

Uploaded by

jahanzeb

Algorithms and Data Structures

Lecture 4: Recursion, Dynamic programming, Divide & Conquer
Sequence Alignment, Quicksort

Veronica Gaspes
[email protected]
www.hh.se/staff/vero/itads

Tower of Hanoi

The Tower of Hanoi puzzle was marketed in 1883 by Professor N. CLAUS (DE SIAM), an anagram pseudonym for Professor Édouard LUCAS (D'AMIENS).

The game consists of demolishing the tower level by level, and reconstructing it in a neighboring place, conforming to the rules given.

Move all plates from peg A to peg C.
Plates can be moved one by one from one peg to another peg.
At no stage should a smaller plate come below a bigger plate.
An extra peg B can be used.

Recursion

A recursive method is a method that directly or indirectly makes a call to itself.

    void hanoi(int n, char from, char to, char help){
        if(n>0){
            hanoi(n-1,from,help,to);
            System.out.println(from+" --> "+to);
            hanoi(n-1,help,to,from);
        }
    }

The recursive calls are on values closer to 0.

Recursion

The Base Case. Always have at least one case that is solved without recursion. In hanoi, 0 and all negative integers are base cases: do nothing!
Progress towards the base case. All recursive calls must be done with arguments that get closer to the base case. In hanoi, when calling with a positive integer x, the recursive calls are with x-1.
You gotta believe! Always assume that the recursive calls work, and complete the solution for the actual value!

Fibonacci numbers

Consider the following sequence of numbers:

    1, 1, 1+1 = 2, 1+2 = 3, 2+3 = 5, 3+5 = 8, 5+8 = 13, 8+13 = 21, ...

Strange as it seems, it has very nice properties, it occurs in many places and has magazines dedicated to it!
We can define the n-th element of the sequence:

\[ fib(n) = \begin{cases} 1 & \text{if } n = 0 \text{ or } n = 1 \\ fib(n-1) + fib(n-2) & \text{if } n \ge 2 \end{cases} \]

Fibonacci numbers

A program that computes the n-th Fibonacci number:

    int fib(int n){
        if(n==0||n==1)
            return 1;
        else
            return fib(n-1) + fib(n-2);
    }
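The two recursive methods above can be tried out in one small program. The following is only a sketch: the class name RecursionDemo, the helper solve and the idea of collecting the moves in a list (instead of printing them) are assumptions made for illustration, not part of the lecture code. It makes it easy to check that n plates need 2^n - 1 moves and that fib produces the sequence 1, 1, 2, 3, 5, 8, 13, 21.

```java
import java.util.ArrayList;
import java.util.List;

class RecursionDemo {
    // Same recursion as hanoi above, but each move is appended to a list
    // instead of being printed, so the moves can be counted and inspected.
    static void hanoi(int n, char from, char to, char help, List<String> moves) {
        if (n > 0) {                              // base case n <= 0: do nothing
            hanoi(n - 1, from, help, to, moves);  // progress: n - 1 is closer to 0
            moves.add(from + " --> " + to);
            hanoi(n - 1, help, to, from, moves);
        }
    }

    // Hypothetical helper: all moves for n plates from peg A to peg C via B.
    static List<String> solve(int n) {
        List<String> moves = new ArrayList<>();
        hanoi(n, 'A', 'C', 'B', moves);
        return moves;
    }

    // The recursive definition of the Fibonacci numbers, with fib(0) = fib(1) = 1.
    static int fib(int n) {
        if (n == 0 || n == 1)
            return 1;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args) {
        for (String move : solve(3))
            System.out.println(move);
        System.out.println("moves for 10 plates: " + solve(10).size());
        for (int n = 0; n <= 7; n++)
            System.out.print(fib(n) + " ");
        System.out.println();
    }
}
```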

Fibonacci numbers

Nice, all the rules are followed (base cases, progression, belief!)

BUT! The recursive calls are overlapping:

To compute fib(5) we call fib(4) and fib(3)
To compute fib(4) we call fib(3) and fib(2)
To compute fib(3) we call fib(2) and fib(1)
To compute fib(2) we call fib(1) and fib(0)

This leads to very inefficient programs!
More about this later today, first a good use of recursion . . .

Divide and Conquer

A problem solving technique that leads to recursive solutions.

A divide and conquer algorithm is an efficient recursive algorithm that consists of 2 parts:
Divide: Smaller problems are solved recursively (except the base cases!)
Conquer: The solution to the original problem is formed from the solutions to the subproblems.

Hopefully all subproblems are much smaller than the original one and the subproblems do not overlap!
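As a tiny illustration of the two parts, here is a sketch that finds the largest element of an array by divide and conquer. The example is not taken from the lecture; the class name DivideAndConquerMax and the sample values are assumptions chosen only to show non-overlapping subproblems that are half the size of the original.

```java
class DivideAndConquerMax {
    // Largest element of a[low..high] (both inclusive, low <= high).
    static int max(int[] a, int low, int high) {
        if (low == high)                        // base case: one element
            return a[low];
        int middle = low + (high - low) / 2;
        int left = max(a, low, middle);         // divide: two halves that
        int right = max(a, middle + 1, high);   // do not overlap
        return Math.max(left, right);           // conquer: combine the answers
    }

    public static void main(String[] args) {
        int[] a = {13, 81, 92, 43, 31, 65, 57, 26, 75, 0};
        System.out.println(max(a, 0, a.length - 1));
    }
}
```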

Divide & Conquer and Sorting

Sort an array using Divide and Conquer:

To sort an array of size N:
Divide the array into two halves.
Recursively sort the two parts.
Put together the sorted parts to a sorted whole.

What to do for putting together depends on how we choose to divide.

Quicksort

To sort an array of size 10:

    13 81 92 43 31 65 57 26 75 0

Divide the array in two halves. Pivot?

    13 81 92 43 31 65 57 26 75 0

Partition:

    13 0 26 43 57 31    65    92 75 81

Recursively sort the two parts. (Believe! Quicksort)

    0 13 26 31 43 57    65    75 81 92

Put together the sorted parts.

Quicksort

    void quicksort(T [ ] a, int low, int high ){
        if( small array )
            insertionSort( a, low, high );
        else{
            int middle = ( low + high ) / 2;
            sort low, middle, high
            partition
            quicksort( a, low, i - 1 );
            // Pivot at i
            quicksort( a, i + 1, high );
        }
    }

Quicksort

Auxiliary methods

1. Find a good pivot. An element in the array that has more or less as many elements smaller as it has larger in the array.
Find it in constant time!
Median of 3 among a[low], a[mid] and a[high]

2. Partition. All smaller than the pivot to the left, all larger to the right.
Loop through the array from low upwards and from high downwards.
Stop on elements that are on the wrong half.
Exchange elements when needed and continue looping until all elements are in the proper half.

Quicksort

sort low, middle, high

    if( a[ middle ].compareTo( a[ low ] ) < 0 )
        swapReferences( a, low, middle );
    if( a[ high ].compareTo( a[ low ] ) < 0 )
        swapReferences( a, low, high );
    if( a[ high ].compareTo( a[ middle ] ) < 0 )
        swapReferences( a, middle, high );

Quicksort

Small array

    low + CUTOFF > high

where CUTOFF can be around 10.

Quicksort

Partition

    // Place pivot at position high - 1
    swapReferences( a, middle, high - 1 );
    T pivot = a[ high - 1 ];
    // Begin partitioning
    int i, j;
    for( i = low, j = high - 1; ; ){
        while( a[ ++i ].compareTo( pivot ) < 0 );
        while( pivot.compareTo( a[ --j ] ) < 0 );
        if( i >= j ) break;
        swapReferences( a, i, j );
    }
    // Restore pivot
    swapReferences( a, i, high - 1 );
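Putting the quicksort fragments from these slides together gives roughly the following. This is a sketch under assumptions, not the lecture's own file: the class name QuicksortSketch, the generic bound on T, the body of insertionSort and the helper swap (standing in for swapReferences, which the slides use but do not define) are filled in here for illustration.

```java
import java.util.Arrays;

class QuicksortSketch {
    static final int CUTOFF = 10;   // "around 10", as suggested on the slide

    static <T extends Comparable<? super T>> void quicksort(T[] a) {
        quicksort(a, 0, a.length - 1);
    }

    static <T extends Comparable<? super T>> void quicksort(T[] a, int low, int high) {
        if (low + CUTOFF > high) {              // small array
            insertionSort(a, low, high);
            return;
        }
        int middle = (low + high) / 2;
        // sort low, middle, high: afterwards a[low] <= a[middle] <= a[high]
        if (a[middle].compareTo(a[low]) < 0) swap(a, low, middle);
        if (a[high].compareTo(a[low]) < 0) swap(a, low, high);
        if (a[high].compareTo(a[middle]) < 0) swap(a, middle, high);

        // partition: place the pivot at position high - 1
        swap(a, middle, high - 1);
        T pivot = a[high - 1];
        int i = low, j = high - 1;
        while (true) {
            while (a[++i].compareTo(pivot) < 0) { }   // stops at the pivot at the latest
            while (pivot.compareTo(a[--j]) < 0) { }   // stops at a[low] at the latest
            if (i >= j) break;
            swap(a, i, j);
        }
        swap(a, i, high - 1);                   // restore pivot: it now sits at i

        quicksort(a, low, i - 1);
        quicksort(a, i + 1, high);
    }

    // Assumed implementation of the insertionSort used for small arrays.
    static <T extends Comparable<? super T>> void insertionSort(T[] a, int low, int high) {
        for (int p = low + 1; p <= high; p++) {
            T tmp = a[p];
            int q = p;
            while (q > low && tmp.compareTo(a[q - 1]) < 0) {
                a[q] = a[q - 1];
                q--;
            }
            a[q] = tmp;
        }
    }

    static <T> void swap(T[] a, int x, int y) {
        T tmp = a[x];
        a[x] = a[y];
        a[y] = tmp;
    }

    public static void main(String[] args) {
        Integer[] a = new Integer[25];
        for (int k = 0; k < a.length; k++)
            a[k] = (k * 37) % 101;              // deterministic sample data
        quicksort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Note that with CUTOFF set to 10 the ten-element example array from the earlier slide is sorted by insertion sort alone; the partitioning code only runs for arrays with at least 11 elements, which is why main uses a longer array.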

Quicksort - analysis

T(N): time to sort an array of size N.

Dividing it into two halves takes O(1) to pick the pivot and O(N) to partition. So division is O(N).

Recursively sorting the two parts will take

\[ T(N_{small}) + T(N_{large}) \]

Putting together the solutions: do nothing!
So

\[ T(N) = T(N_{small}) + T(N_{large}) + O(N) \]

Quicksort - analysis

\[ T(N) = T(N_{small}) + T(N_{large}) + O(N) \]

If we manage to divide the array in equal sized parts we will get

\[ T(N) = 2T\left(\frac{N}{2}\right) + N = 4T\left(\frac{N}{4}\right) + 2\,\frac{N}{2} + N = \dots = N\,T(1) + N\log(N) \]

T(N) is O(N log(N)) if we manage to find a good pivot in constant time!

Compare with O(N^2) for insertion sort!

Fibonacci's problem

Look once more at the definition of fib(n):

\[ fib(n) = \begin{cases} 1 & \text{if } n = 0 \text{ or } n = 1 \\ fib(n-1) + fib(n-2) & \text{if } n \ge 2 \end{cases} \]

An obvious Java program

    int fib(int n){
        if(n==0||n==1)
            return 1;
        else
            return fib(n-1) + fib(n-2);
    }

leads to an explosion of recursive calls with values being computed over and over again!
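The explosion can be made visible by counting the calls. In the sketch below the counter calls, the helper callsFor and the class name FibCalls are assumptions added for the experiment; fib itself is the obvious program from above. With fib(0) = fib(1) = 1, computing fib(n) this way takes 2·fib(n) - 1 calls, so the number of calls grows as fast as the Fibonacci numbers themselves.

```java
class FibCalls {
    static long calls = 0;   // hypothetical counter, not in the lecture code

    static int fib(int n) {
        calls++;             // one more call, whatever the argument
        if (n == 0 || n == 1)
            return 1;
        else
            return fib(n - 1) + fib(n - 2);
    }

    // Number of calls made while computing fib(n) from scratch.
    static long callsFor(int n) {
        calls = 0;
        fib(n);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 30; n += 5)
            System.out.println("fib(" + n + ") needs " + callsFor(n) + " calls");
    }
}
```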

The Problem

[Figure: the tree of recursive calls for fib(8). fib(8) calls fib(7) and fib(6), fib(7) calls fib(6) and fib(5), and so on down to fib(1) and fib(0), so the subtrees for fib(6), fib(5), fib(4), fib(3) and fib(2) appear again and again.]

and it is not over . . .

Memoization

Whenever we have to compute a value, check in a memo whether we have already computed it!

    // memo needs at least n+1 entries (and at least 2),
    // all initialised to BigInteger.ZERO before the first call
    BigInteger [] memo;

This means that when we compute a value for the first time, we have to record it in a memo!

    BigInteger fib(int n){
        if(n == 0 || n == 1)
            memo[n]=BigInteger.ONE;
        else
            if(memo[n].equals(BigInteger.ZERO))
                memo[n]=fib(n-2).add(fib(n-1));
        return memo[n];
    }
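A self-contained variant of the memoized method might look as follows. It is a sketch with a few assumptions that differ from the slide: the class name MemoFib, the constructor that allocates the memo, the call counter, and the use of null (the default content of a Java object array) rather than BigInteger.ZERO to mark values that have not been computed yet.

```java
import java.math.BigInteger;

class MemoFib {
    private final BigInteger[] memo;   // memo[k] == null: fib(k) not computed yet
    long calls = 0;                    // hypothetical counter, for comparison only

    // maxN is the largest argument that fib will be called with.
    MemoFib(int maxN) {
        memo = new BigInteger[Math.max(maxN + 1, 2)];
    }

    BigInteger fib(int n) {
        calls++;
        if (n == 0 || n == 1)
            return BigInteger.ONE;
        if (memo[n] == null)                        // first time: compute and record
            memo[n] = fib(n - 2).add(fib(n - 1));
        return memo[n];                             // otherwise: look it up
    }

    public static void main(String[] args) {
        MemoFib m = new MemoFib(100);
        System.out.println("fib(100) = " + m.fib(100));
        System.out.println("calls: " + m.calls);
    }
}
```

Each value is computed once, so the number of calls grows linearly with n instead of exponentially.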

Bottom-up: Dynamic programming

It is easy to realize that we can fill the array from the base cases and forward! And that we only need 2 values at any point!

    BigInteger fibIt(int n){
        BigInteger fn_1 = BigInteger.ONE;
        BigInteger fn_2 = BigInteger.ZERO;
        while(n-- > 0){
            fn_1 = fn_1.add(fn_2);
            fn_2 = fn_1.subtract(fn_2);
        }
        return fn_1;
    }

In short: from a recursive formulation of the problem to an iterative program that recalls computed values that are [...]

Sequence Comparison

A more advanced application of dynamic programming.

A widely applied topic: file comparison, spelling correction, information retrieval and searching for similarities among biosequences.

How similar are the strings VERONICA and MARTIN?

How similar are spinach and rice? (according to peptide sequences of Triosephosphate Isomerase):

    CNGTKESITKLVSDLNSATLEAD__VDVVVAPPFVYIDQVKSSLTGRVEISA
    CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQVAA

And monkeys and humans?

    MNGRKQNLGELIGTLNAAKVPAD__TEVVCAPPTAYIDFARQKLDPKIAVAA
    MNGRKQSLGELIGTLNAAKVPAD__TEVVCAPPTAYIDFARQKLDPKIAVAA

Minimal Edit Distance

One such string comparison problem can be stated as:
Align two strings in such a way that the number of commands needed to transform one into the other is minimal.

    VERONICA
    MARTIN__

requires 7 changes (editing commands), while

    VERONICA
    MART_IN_

requires only 6, as well as

    VERONICA
    MAR_TIN_

Minimal Edit Distance

Or, more formally:
Given 2 strings, compute an alignment that minimizes the edit distance between them.
For strings a and b, the distance \(\delta(a, b)\) is

\[ \delta(a, b) = \sum_i \delta(a_i, b_i) \]

for the aligned strings (possibly with gaps), where

\[ \delta(a_i, b_i) = \begin{cases} 0 & \text{if } a_i = b_i \\ 1 & \text{if } a_i \neq b_i \end{cases} \]
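The distance of a given alignment can be checked mechanically. The sketch below counts the positions where two already aligned strings differ; the class name AlignmentCost and the method name cost are assumptions, and gaps are written as '_' as in the examples above, so a gap never equals a letter and always costs 1.

```java
class AlignmentCost {
    // Sum over all positions of 0 (equal characters) or 1 (different characters).
    // Both strings must already be aligned, i.e. padded with gaps to equal length.
    static int cost(String a, String b) {
        if (a.length() != b.length())
            throw new IllegalArgumentException("aligned strings must have equal length");
        int distance = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i))
                distance++;
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(cost("VERONICA", "MARTIN__"));
        System.out.println(cost("VERONICA", "MART_IN_"));
        System.out.println(cost("VERONICA", "MAR_TIN_"));
    }
}
```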

Minimal Edit Distance

First attempt

Enumerate all alignments and their distances and choose an alignment with minimum distance.
Unfortunately . . . there are too many!
For strings of lengths m and n there are

\[ \frac{(m + n)!}{m!\,n!} \]

alignments, and for n = m = 150 this is approximately \(10^{90}\).

Minimal Edit Distance

Second attempt

Based on the observation that any prefix of the optimal alignment is an optimal alignment of prefixes, use the recursion

\[ \delta(i, j) = \begin{cases} j & \text{for } i = 0 \\ i & \text{for } j = 0 \\ \min \left\{ \begin{array}{l} \delta(i-1, j) + 1 \\ \delta(i, j-1) + 1 \\ \delta(i-1, j-1) + \delta(a_{i-1}, b_{j-1}) \end{array} \right. & \text{for } i, j > 0 \end{cases} \]

where \(\delta(i, j)\) is the minimal cost of aligning the prefixes of a and b of lengths i and j respectively. Base cases correspond to empty prefixes; indexes in the strings are 0 . . . m-1, 0 . . . n-1.
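The recursion can be transcribed almost literally into a method. This is a sketch for small inputs only: the class and method names are assumptions, and because the three recursive calls overlap heavily, the running time grows exponentially with the lengths of the strings, just as with the recursive fib.

```java
class RecursiveEditDistance {
    // Minimal cost of aligning the prefix of a of length i
    // with the prefix of b of length j.
    static int delta(String a, String b, int i, int j) {
        if (i == 0) return j;       // empty prefix of a: j gaps
        if (j == 0) return i;       // empty prefix of b: i gaps
        int mismatch = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int gapInB = delta(a, b, i - 1, j) + 1;
        int gapInA = delta(a, b, i, j - 1) + 1;
        int pair = delta(a, b, i - 1, j - 1) + mismatch;
        return Math.min(Math.min(gapInB, gapInA), pair);
    }

    static int distance(String a, String b) {
        return delta(a, b, a.length(), b.length());
    }

    public static void main(String[] args) {
        System.out.println(distance("veronica", "martin"));
    }
}
```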

Minimal Edit Distance

Third attempt - Dynamic Programming

Each step of the recursion requires 3 values. Try to find a way of recording the values in a bottom-up fashion.

           v  e  r  o  n  i  c  a
        0  1  2  3  4  5  6  7  8
     m  1  1  2  3
     a  2  2  2  3
     r  3  3  3  2
     t  4
     i  5
     n  6

[Figure: the prefixes v, ve, ver, vero, veron, veroni, veronic, veronica, each aligned with gaps only: v/_, ve/__, ver/___, vero/____, veron/_____, ...]

Dynamic Programming

The matrix can be filled in different ways so that the values needed in the computation are available:

Row by row
Column by column
Antidiagonal by antidiagonal
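Filling the matrix row by row can be sketched as follows. The class name EditDistanceDP and the method names are assumptions; the first argument labels the rows and the second the columns, so table("martin", "veronica") has the same orientation as the matrix on the slide.

```java
class EditDistanceDP {
    // d[i][j] = minimal cost of aligning the first i characters of a
    // with the first j characters of b.
    static int[][] table(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] d = new int[m + 1][n + 1];
        for (int j = 0; j <= n; j++) d[0][j] = j;    // base cases: first row
        for (int i = 0; i <= m; i++) d[i][0] = i;    // base cases: first column
        for (int i = 1; i <= m; i++) {               // row by row
            for (int j = 1; j <= n; j++) {
                int mismatch = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // the three values needed are already in the matrix
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + mismatch);
            }
        }
        return d;
    }

    static int distance(String a, String b) {
        return table(a, b)[a.length()][b.length()];
    }

    public static void main(String[] args) {
        for (int[] row : table("martin", "veronica")) {
            StringBuilder line = new StringBuilder();
            for (int value : row) line.append(String.format("%3d", value));
            System.out.println(line);
        }
    }
}
```

The matrix has (m+1)(n+1) cells and each is filled in constant time, so the work is proportional to the product of the two lengths.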

Dynamic Programming

The problem is stated as an optimization problem.
Optimal values are defined recursively.
Efficient solutions are derived by memorizing already computed values (using dynamic programming).

In some problems, e.g. sequence alignment, not only the optimal value is of interest, but also how it is achieved.

    >java SequenceAlignment1 veronica martin
    6
    veronica
    mar-ti-n

In this case extra space must be used to trace it back.

Tracing back an alignment

When a value is chosen for \(\delta(i, j)\) by taking

\[ \min \left\{ \begin{array}{l} \delta(i-1, j) + 1 \\ \delta(i, j-1) + 1 \\ \delta(i-1, j-1) + \delta(a_{i-1}, b_{j-1}) \end{array} \right. \]

we record also the coordinates of the chosen alternative:

           v  e  r  o  n  i  c  a
        0  1  2  3  4  5  6  7  8
     m  1  1  2  3
     a  2  2  2  3
     r  3  3  3  2

We record that for (3, 3) we come from cell (2, 2).
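One way to record the chosen alternative and trace it back is sketched here. All names (AlignmentTraceback, Result, fromI, fromJ) are assumptions, and two int matrices stand in for the recorded coordinates. Several alignments can share the minimal cost, so the alignment this sketch prints depends on how ties are broken and need not be the one shown in the program output above.

```java
class AlignmentTraceback {
    static class Result {
        final int cost;
        final String top, bottom;
        Result(int cost, String top, String bottom) {
            this.cost = cost; this.top = top; this.bottom = bottom;
        }
    }

    static Result align(String a, String b) {
        int m = a.length(), n = b.length();
        int[][] d = new int[m + 1][n + 1];
        int[][] fromI = new int[m + 1][n + 1];   // row of the chosen alternative
        int[][] fromJ = new int[m + 1][n + 1];   // column of the chosen alternative
        for (int j = 1; j <= n; j++) { d[0][j] = j; fromI[0][j] = 0; fromJ[0][j] = j - 1; }
        for (int i = 1; i <= m; i++) { d[i][0] = i; fromI[i][0] = i - 1; fromJ[i][0] = 0; }
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int mismatch = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // start from the diagonal alternative, then try the two gap moves
                d[i][j] = d[i - 1][j - 1] + mismatch; fromI[i][j] = i - 1; fromJ[i][j] = j - 1;
                if (d[i - 1][j] + 1 < d[i][j]) {
                    d[i][j] = d[i - 1][j] + 1; fromI[i][j] = i - 1; fromJ[i][j] = j;
                }
                if (d[i][j - 1] + 1 < d[i][j]) {
                    d[i][j] = d[i][j - 1] + 1; fromI[i][j] = i; fromJ[i][j] = j - 1;
                }
            }
        }
        // trace the recorded coordinates back from the cell for the complete strings
        StringBuilder top = new StringBuilder(), bottom = new StringBuilder();
        int i = m, j = n;
        while (i > 0 || j > 0) {
            int pi = fromI[i][j], pj = fromJ[i][j];
            top.append(pi < i ? a.charAt(i - 1) : '-');      // '-' marks a gap
            bottom.append(pj < j ? b.charAt(j - 1) : '-');
            i = pi; j = pj;
        }
        return new Result(d[m][n], top.reverse().toString(), bottom.reverse().toString());
    }

    public static void main(String[] args) {
        Result r = align("veronica", "martin");
        System.out.println(r.cost);
        System.out.println(r.top);
        System.out.println(r.bottom);
    }
}
```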

Tracing back an alignment

We have to do this for each cell in the matrix, so we need a matrix of Coord:

    class Coord{
        int i, j;
        Coord(int x, int y){
            i=x;j=y;
        }
    }

We fill both matrices during the same traversal of all possible alignments.

The optimal alignment is then recovered by tracing the coordinates back from the value corresponding to the alignment of the complete strings.

Sequence Alignment in Bioinformatics

DNA and proteins are built as long chains of chemical components (biosequences) conventionally denoted by letters:

    A G C T for DNA
    A C D E F G H I K L M N P Q R S T V W Y for proteins

Biosequences are compared in the hope that what holds for a sequence also holds for similar sequences.

The way of comparing biosequences is by finding good alignments.

Alignments are good when they maximize similarity.

Sequence Alignment in Bioinformatics

Score matrices

Similarity between biosequences is built up from how similar the letters are.
There is not only match/mismatch but matrices that describe how similar each pair of letters is.

This is related to how likely it is that a letter is the result of a mutation from some ancestor.
There are many computed score matrices! (e.g. gonnet)

          C   S   T   P   A   G   N   D   E
     C   12   0   0  -3   0  -2  -2  -3  -3  . . .
     S    0   2   2   0   1   0   1   0   0
     ...

Sequence Alignment in Bioinformatics

Dynamic programming made sequence alignment feasible.
Many optimizations have been proposed: to minimize the space required for computations; heuristics that reduce the portion of [...] that is explored.

There are now search engines for huge databases:
BLAST, the Basic Local Alignment Search Tool.

Original sources:
A general method applicable to the search for similarities in the amino acid sequence of two proteins, by Needleman and Wunsch, JMB 1970.
Identification of common molecular subsequences, by Smith and Waterman, JMB 1981.
Basic Local Alignment Search Tool, by Altschul et al., JMB 1990.
