Lecture # 15 - New


Design and Analysis of Algorithm

Tanveer Ahmed Siddiqui

Department of Computer Science


COMSATS University, Islamabad
Recap and Topics Covered Today

Algorithm Design and Analysis Process

1. Understand the problem
2. Decide on algorithm design techniques, etc.
3. Design an algorithm
4. Prove correctness
5. Analyze efficiency, etc.
6. Code the algorithm

Reading Material

Read Chapter 8
Dynamic Programming


Objectives

• How to design algorithms using the dynamic programming approach.

What have we learnt?

• Define a set of subproblems that can lead to the solution of the original problem.
• Think: what information would make solving this problem easier?
• Try to find a relation between the subproblems you found above and the current problem you're trying to solve.
• Write the recursive solution of the original problem in terms of the solutions of the subproblems.
• Build the solution lookup table in a bottom-up fashion, until reaching the original problem.
  - Initialize the lookup table.
  - Fill the other terms through iteration by using the recurrence relation.

What have we learnt?

• In a nutshell, dynamic programming is recursion without repetition.
• Dynamic programming algorithms store the solutions of intermediate subproblems, often but not always in some kind of array or table.
• Many algorithms students make the mistake of focusing on the table (because tables are easy and familiar) instead of the much more important (and difficult) task of finding a correct recurrence.
  - As long as we memoize the correct recurrence, an explicit table isn’t really necessary, but if the recurrence is incorrect, nothing works.


Lecture No. 25

Dynamic Programming for Solving Optimization Problems
(Edit Distance)


Spell Checking

• When a spell checker encounters a possible misspelling, it looks in its dictionary for other words that are close by.
• What is the appropriate notion of closeness in this case?
• To answer this question, let us first understand the sources of misspellings.


Spelling errors

Edit Operations

• 80% of spelling errors fall into four categories: insertion, deletion, substitution, and transposition.


Edit Distance

• What is the appropriate notion of closeness in spell checking?
• The appropriate notion of closeness is the minimum number of edits (operations) required to convert a string S1 into a string S2.


Edit Distance

“Michael Jackson” to “Mendelssohn”

[Figure: the two names aligned character by character, with the operation for each column (S = substitute, D = delete, I = insert) and the running distance, which reaches 10.]
What is Edit Distance?

• The edit distance is defined as the minimum number of edits needed to transform one string into the other.
• The edit distance (or Levenshtein distance) is a metric for measuring the amount of difference between two texts.
• What is an edit script?
  - The sequence of operations that transforms one string into the other is called an edit script.

[Figure: an example edit script.]
What is Edit Distance?

• What is the edit distance between FOOD and MONEY?
• The edit distance between FOOD and MONEY is at most four, for example: FOOD → MOOD → MOND → MONED → MONEY.
• Can we minimize this distance?
  - Yes (how?)


What is Edit Distance?

• A natural measure of the distance between two strings is the extent to which they can be aligned, or matched up.
• Technically, an alignment is simply a way of writing the strings one above the other.


How to measure Edit Distance?

• Technically, an alignment is simply a way of writing the strings one above the other. For instance, here are two possible alignments of SNOWY and SUNNY:

- S N O W - Y
S U N - - N Y
Cost = 5

S - N O W Y
S U N N - Y
Cost = 3

• The first word has a gap for every insertion (I) and the second word has a gap for every deletion (D). Columns with two different characters correspond to substitutions (S). Matches (M) do not count.
How to measure Edit Distance?

Alignment: a 2 × k matrix (k ≥ m, n)
S1 = SNOWY, n = 5
S2 = SUNNY, m = 5

- S N O W - Y
S U N - - N Y
Cost = 5 (k = 7)

S - N O W Y
S U N N - Y
Cost = 3 (k = 6)
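
The cost of an alignment can be checked mechanically by counting the columns that are not matches. The following is a minimal sketch in Python; the function name `alignment_cost` and the use of "-" as the gap symbol are assumptions made for this illustration, not part of the lecture.

```python
def alignment_cost(top: str, bottom: str, gap: str = "-") -> int:
    """Count the columns of an alignment that are not matches.

    `top` and `bottom` are the two rows of the 2 x k alignment matrix,
    written as strings of equal length with `gap` marking empty cells.
    """
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    cost = 0
    for a, b in zip(top, bottom):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gaps")
        if a != b:
            # Gap in the top row = insertion, gap in the bottom row = deletion,
            # two different letters = substitution. Each costs 1.
            cost += 1
    return cost


print(alignment_cost("-SNOW-Y", "SUN--NY"))  # 5
print(alignment_cost("S-NOWY", "SUNN-Y"))    # 3
```

The edit distance is the smallest such cost over all possible alignments of the two strings.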
UP-SHOT

• A better way to display this editing process is to place the words one above the other.
• The first word has a gap for every insertion (I) and the second word has a gap for every deletion (D). Columns with two different characters correspond to substitutions (S). Matches (M) do not count.


How to formulate Edit Distance?

• What are the subproblems in this case?
  - When solving a problem by dynamic programming, the most crucial question is: what are the subproblems?
• Consequently, the first step towards devising a dynamic programming solution is to check whether the problem exhibits optimal substructure.
  - Optimal substructure means that the solution to a given optimization problem can be obtained by combining optimal solutions to its subproblems.
• Such optimal substructures are usually described by means of recursion.
Formulation of Edit Distance

• What are the subproblems in this case?
  - How about looking at the edit distance between some prefix of the first string, A[1..i], and some prefix of the second, B[1..j]?

[Figure: an example pair of prefixes.]

• Call this subproblem E(i, j).
• Our final objective, then, is to compute E(m, n), where m and n are the lengths of A and B.


Formulation of Edit Distance

• What are the subproblems in this case?
• What do we know about the best alignment between A[1..i] and B[1..j]?
  - The gap representation for the edit sequences has a crucial “optimal substructure” property.
• The edit distance is 6 for the following two words.

[Figure: alignment of the two words.]

• If we remove the last column, the remaining columns must represent the minimum edit sequence for the remaining substrings.
• If we remove the last column, the edit distance reduces to 5.


Formulation of Edit Distance

• What are the subproblems in this case?
• The idea is to process the characters one by one, starting from either the left or the right end of both strings.
• Let us traverse from the right end. There are two possibilities for every pair of characters being traversed.
• Possibility 1 (match): the last characters of the two strings are the same.
  - Nothing much to do: no edit is needed for this pair.
  - Ignore the last characters and get the count for the remaining strings. So, we recur for lengths m-1 and n-1. For example, E(6, 5) reduces to E(5, 4).
  - In this case: E(i, j) = E(i - 1, j - 1)


Formulation of Edit Distance

• What are the subproblems in this case?
• Possibility 2: the last characters are not the same.
  - We consider all three operations (insertion, deletion, substitution) on the last character of the first string, S1.
  - Recursively compute the minimum cost for each of the three operations and take the minimum of the three values.


Formulation of Edit Distance

• Insertion: the last entry in the top row is empty.
  - Example: E(5, 4) reduces to E(5, 3).
  - In this case: E(i, j) = E(i, j - 1) + 1


Formulation of Edit Distance

• Deletion: the last entry in the bottom row is empty.
  - Example: E(5, 4) reduces to E(4, 4).
  - In this case: E(i, j) = E(i - 1, j) + 1


Formulation of Edit Distance

• Substitution: both rows have characters in the last column.
  - Example: E(5, 4) reduces to E(4, 3).
  - In this case: E(i, j) = E(i - 1, j - 1) + 1


UP-SHOT: Formulation of Edit Distance

• Substitute/Replace: recur for i-1 and j-1
• Delete: recur for i-1 and j
• Insert: recur for i and j-1


Formulation of Edit Distance

• Now, what are the base cases?
• There are a couple of obvious base cases:
  - The only way to convert an empty string into a string of j characters is by doing j insertions. Thus, E(0, j) = j.
  - The only way to convert a string of i characters into the empty string is with i deletions. Thus, E(i, 0) = i.


Formulation of Edit Distance

• Thus, the edit distance E(i, j) is the smallest of the four possibilities:
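
Collecting the cases derived on the preceding slides gives a single recurrence. The following is a reconstruction from those cases, writing A[i] and B[j] for the i-th and j-th characters of the two strings:

```latex
E(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
E(i-1,\, j-1) & \text{if } A[i] = B[j] \quad \text{(match)} \\
1 + \min\bigl\{\, E(i-1,\, j),\; E(i,\, j-1),\; E(i-1,\, j-1) \,\bigr\} & \text{if } A[i] \neq B[j]
\end{cases}
```

In the last case the three terms inside the minimum correspond to deletion, insertion, and substitution, respectively.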

• Consider the example of the edit distance between the words “ARTS” and “MATHS”.
• The edit distance is E(4, 5). If we compute this edit distance, we have the following recursion:

[Figure: the recursive calls made for E(4, 5).]


Formulation of Edit Distance

• The recursion clearly leads to the same calls being repeated.


Recursive Algorithm for Edit Distance?
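
A direct translation of the recurrence into a recursive function might look as follows. This is a minimal sketch in Python, not the lecture's own code; the function name `edit_distance_recursive` and the call counter are additions made here to make the repeated calls visible.

```python
from collections import Counter


def edit_distance_recursive(a: str, b: str):
    """Return (edit distance, counter of how often each E(i, j) was evaluated)."""
    calls = Counter()

    def E(i: int, j: int) -> int:
        calls[(i, j)] += 1
        if i == 0:          # empty prefix of a: j insertions
            return j
        if j == 0:          # empty prefix of b: i deletions
            return i
        if a[i - 1] == b[j - 1]:
            return E(i - 1, j - 1)          # match: no edit needed
        return 1 + min(E(i - 1, j),         # deletion
                       E(i, j - 1),         # insertion
                       E(i - 1, j - 1))     # substitution

    return E(len(a), len(b)), calls


distance, calls = edit_distance_recursive("ARTS", "MATHS")
print(distance)               # 3
print(max(calls.values()))    # greater than 1: some subproblem is solved repeatedly
```

Because the same subproblems are evaluated many times, the running time of this version grows exponentially with the string lengths.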


Formulation of Edit Distance

• To avoid this repetition, we will use the DP approach.
• We will build the solution bottom-up (smallest to largest).
• We will use the base case E(0, j) to fill the first row and E(i, 0) to fill the first column.


Formulation of Edit Distance

• How will you fill out the remaining table?


Edit Distance: Dynamic Programming Algorithm
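
The bottom-up procedure described on the previous slides can be sketched as follows. This is an illustrative Python version under the lecture's unit-cost assumptions; the function name `edit_distance_table` is hypothetical, and the table is indexed so that row i and column j hold E(i, j).

```python
def edit_distance_table(a: str, b: str) -> list[list[int]]:
    """Build the full (m+1) x (n+1) table of E(i, j) values bottom-up."""
    m, n = len(a), len(b)
    E = [[0] * (n + 1) for _ in range(m + 1)]

    for j in range(n + 1):      # base case: first row, E(0, j) = j
        E[0][j] = j
    for i in range(m + 1):      # base case: first column, E(i, 0) = i
        E[i][0] = i

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                E[i][j] = E[i - 1][j - 1]              # match
            else:
                E[i][j] = 1 + min(E[i - 1][j],         # deletion
                                  E[i][j - 1],         # insertion
                                  E[i - 1][j - 1])     # substitution
    return E


table = edit_distance_table("ARTS", "MATHS")
for row in table:
    print(row)
print(table[-1][-1])  # 3, the edit distance E(4, 5)
```

Each of the (m+1)(n+1) entries is computed once in constant time, so this sketch takes O(mn) time and O(mn) space.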


Example

• Compute the edit distance and edit scripts for the strings “ARTS” and “MATHS”.
• Step 1: Fill the table.
  - Use the base case E(0, j) to fill the first row and E(i, 0) to fill the first column.
  - Compute E[1, 1].
  - Compute E[1, 2].
  - Compute E[1, 3] and E[1, 4].
  - Continue until the final table, with all E[i, j] entries, is computed:

         M  A  T  H  S
      0  1  2  3  4  5
  A   1  1  1  2  3  4
  R   2  2  2  2  3  4
  T   3  3  3  2  3  4
  S   4  4  4  3  3  3

• The edit distance is E[4, 5] = 3.


Example

• Compute the edit distance and edit scripts for the strings “ARTS” and “MATHS”.
• Step 2: An edit script can be extracted by following a path through the table from E[0, 0] to E[4, 5]; each such path gives one edit script.
• There are three possible paths in the current example. Let us follow these paths and compute the edit scripts.
  - Path 1: [figure]
  - Path 2: [figure]
  - Path 3: [figure]
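
One way to extract an edit script is to walk back through the filled table from E[m, n] to E[0, 0], at each cell choosing a neighbour that explains the current value. The sketch below is an illustration in Python, not the lecture's own procedure; the function name `edit_script` and the tie-breaking order (keep, then substitute, then delete, then insert) are assumptions, so it returns only one of the possible scripts.

```python
def edit_script(a: str, b: str) -> list[str]:
    """Return one minimum-cost edit script that converts a into b."""
    m, n = len(a), len(b)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(n + 1):
        E[0][j] = j
    for i in range(m + 1):
        E[i][0] = i
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                E[i][j] = E[i - 1][j - 1]
            else:
                E[i][j] = 1 + min(E[i - 1][j], E[i][j - 1], E[i - 1][j - 1])

    # Trace back from E[m][n] to E[0][0], collecting operations in reverse.
    script = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
            script.append(f"keep {a[i - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and E[i][j] == E[i - 1][j - 1] + 1:
            script.append(f"substitute {a[i - 1]} -> {b[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and E[i][j] == E[i - 1][j] + 1:
            script.append(f"delete {a[i - 1]}")
            i -= 1
        else:
            script.append(f"insert {b[j - 1]}")
            j -= 1
    script.reverse()
    return script


for step in edit_script("ARTS", "MATHS"):
    print(step)
```

Changing the order in which the neighbours are tested selects a different path, and therefore a different script of the same cost.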


Example

• Compute the edit distance and edit scripts for the strings “Algorithm” and “ALTRUISTIC”.


[Figure: edit-distance table for BEETH (rows) and BEATLES (columns), with each cell annotated by the operation that produced it. Legend: K = Keep, I = Insert, D = Delete, S = Substitute.]

         B  E  A  T  L  E  S
      0  1  2  3  4  5  6  7
  B   1  0  1  2  3  4  5  6
  E   2  1  0  1  2  3  4  5
  E   3  2  1  1  2  3  3  4
  T   4  3  2  2  1  2  3  4
  H   5  4  3  3  2  2  3  4

Edit script (steps 1 to 7): Keep B; Keep E; Subst E→A; Keep T; Ins L; Ins E; Subst H→S
Operations: Keep, Insert, Delete, Substitute



Application of Edit Distance

• There are numerous applications of the edit distance algorithm. Here are some examples:
  - Spell checkers
  - Plagiarism detection
  - Natural language translation
  - Bioinformatics (difference between two DNA sequences)


Application of Edit Distance

• Spelling Correction
  - If a text contains a word that is not in the dictionary, a ‘close’ word, i.e. one with a small edit distance, may be suggested as a correction.
  - Most word processing applications, such as Microsoft Word, have a spell-checking and correction facility. When Word, for example, finds an incorrectly spelled word, it suggests possible replacements.
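
The suggestion step can be sketched as ranking dictionary words by their edit distance to the misspelled word. This is only an illustration in Python: the function names `edit_distance` and `suggest`, the tiny word list, and the cut-off `max_distance=2` are all assumptions, and real spell checkers use more elaborate methods.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance with unit costs, keeping only two rows of the table."""
    prev = list(range(len(b) + 1))          # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # E(i, 0) = i
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])    # match
            else:
                curr.append(1 + min(prev[j],        # deletion
                                    curr[j - 1],    # insertion
                                    prev[j - 1]))   # substitution
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str], max_distance: int = 2) -> list[str]:
    """Return dictionary words within max_distance of word, closest first."""
    scored = sorted((edit_distance(word, w), w) for w in dictionary)
    return [w for d, w in scored if d <= max_distance]


words = ["sunny", "snowy", "money", "food"]   # hypothetical dictionary
print(suggest("suny", words))
```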



Application of Edit Distance

• There are numerous applications of the edit distance algorithm. Here are some examples:
• Plagiarism Detection
  - If someone copies, say, a Java program and makes a few changes here and there, for example changing variable names or adding a comment or two, the edit distance between the source and the copy may be small.
  - The edit distance provides an indication of similarity that might be too close in some situations.


Application of Edit Distance

• Computational Molecular Biology
  - DNA is a polymer. The monomer units of DNA are nucleotides, and the polymer is known as a “polynucleotide.” Each nucleotide consists of a 5-carbon sugar (deoxyribose), a nitrogen-containing base attached to the sugar, and a phosphate group. There are four different types of nucleotides found in DNA, differing only in the nitrogenous base. The four nucleotides are given one-letter abbreviations as shorthand for the four bases:
    • A: adenine
    • G: guanine
    • C: cytosine
    • T: thymine
• Edit-distance-like algorithms are used to compute a distance between DNA sequences (strings over A, C, G, T) or protein sequences (strings over an alphabet of 20 amino acids) for various purposes, e.g.:
  - to find genes or proteins that may have shared functions or properties
  - to infer family relationships and evolutionary trees over different organisms.
Application of Edit Distance

• In biological applications, we often want to compare the DNA of two (or more) different organisms.
• Finding sequence similarities with genes of known function is a common approach to inferring a newly sequenced gene’s function.

A normal growth gene switched on at the wrong time causes cancer!


Similar Problems

Similarity between Other Optimization Problems

• Some typical optimization problems:
  - The Manhattan Tourist Problem
  - Edit Distance and Alignments
  - Longest Common Subsequences
  - Global Sequence Alignment
  - Scoring Alignments
  - Local Sequence Alignment
  - Alignment with Gap Penalties
• Even though every problem has its own nature and description, after changing certain constraints they become equivalent to each other.

DISCOVER THE RELATION(S) AMONG THE ABOVE PROBLEMS YOURSELF


Question

a) Design a DP-based algorithm that finds the minimum number of insertions required to convert a string into a palindrome.
b) Design a DP-based algorithm that finds the minimum number of deletions required to convert a string into a palindrome.
c) Design an algorithm that employs edit distance (minimum insert/delete operations combined) to determine the minimum number of operations required to convert a string into a palindrome.
d) Design an algorithm that employs LCS to determine the minimum number of operations required to convert a string into a palindrome.
e) How are these problems linked with the Manhattan Tourist Problem (MTP)?


Collecting Coins

• A checkerboard has a certain number of coins on it.
• A robot starts in the upper-left corner and walks to the bottom-right corner.
  - The robot can only move in two directions: right and down.
  - The robot collects coins as it goes.
• You want to collect all the coins using the minimum number of robot walks.
• Example: [figure]
• Can you design a DP algorithm for doing this?


CONCLUSION


What have we learnt?

• Define a set of subproblems that can lead to the solution of the original problem.
• Think: what information would make solving this problem easier?
• Try to find a relation between the subproblems you found above and the current problem you're trying to solve.
• Write the recursive solution of the original problem in terms of the solutions of the subproblems.
• Build the solution lookup table in a bottom-up fashion, until reaching the original problem.
  - Initialize the lookup table.
  - Fill the other terms through iteration by using the recurrence relation.

What have we learnt?

• In a nutshell, dynamic programming is recursion without repetition.
• Dynamic programming algorithms store the solutions of intermediate subproblems, often but not always in some kind of array or table.
• Many algorithms students make the mistake of focusing on the table (because tables are easy and familiar) instead of the much more important (and difficult) task of finding a correct recurrence.
  - As long as we memoize the correct recurrence, an explicit table isn’t really necessary, but if the recurrence is incorrect, nothing works.
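
To illustrate the last point, here is a minimal sketch that memoizes the edit distance recurrence from this lecture without building an explicit table. The function name `edit_distance_memo` and the use of Python's `functools.lru_cache` as the memo store are choices made for this sketch, not part of the lecture.

```python
from functools import lru_cache


def edit_distance_memo(a: str, b: str) -> int:
    """Top-down edit distance: the recurrence plus a cache, no explicit table."""

    @lru_cache(maxsize=None)            # each E(i, j) is computed at most once
    def E(i: int, j: int) -> int:
        if i == 0:
            return j                    # j insertions
        if j == 0:
            return i                    # i deletions
        if a[i - 1] == b[j - 1]:
            return E(i - 1, j - 1)      # match
        return 1 + min(E(i - 1, j),     # deletion
                       E(i, j - 1),     # insertion
                       E(i - 1, j - 1)) # substitution

    return E(len(a), len(b))


print(edit_distance_memo("ARTS", "MATHS"))   # 3
print(edit_distance_memo("FOOD", "MONEY"))   # 4
```

The cache plays the role of the lookup table: correctness comes entirely from the recurrence, and the memoization only removes the repeated work.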
