
Unit 5

Uploaded by

sbsbxbxahsbxx

String Matching

String Matching Problem

• We assume that the text is an array T[1..n] of length n and that the pattern is an array P[1..m] of length m ≤ n.
• We further assume that the elements of P and T are characters drawn from a finite alphabet Σ.
• Pattern P occurs with shift s in text T if 0 ≤ s ≤ n-m and T[s+1..s+m] = P[1..m].
• If P occurs with shift s in T, then we call s a valid shift; otherwise, we call s an invalid shift.
• The string-matching problem is the problem of finding all valid shifts with which a given pattern P occurs in a given text T.
String Matching Problem
Example: Consider the following text T and pattern P:
T = abcabaabcabac
P = abaa
Find all valid shifts.
Solution:
Valid shift s = 3, since T[4..7] = abaa.
This is the only valid shift in this example.
Prefix and Suffix of a string
• Prefix: A string w is a prefix of a string x, denoted w ⊏ x, if x = wy for some string y ∈ Σ*.
• Suffix: A string w is a suffix of a string x, denoted w ⊐ x, if x = yw for some string y ∈ Σ*.
• Example: Clearly, ab ⊏ abcca and cca ⊐ abcca.
• The empty string ε is both a suffix and a prefix of every string.
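The naive string-matching algorithm
• The naive algorithm checks the condition P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s.
• The worst-case running time is Θ((n-m+1)m), which is Θ(n^2) if m = ⌊n/2⌋.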
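The slide's pseudocode is not part of the extracted text, so the following is only a minimal Python sketch of the idea described above. The function name `naive_string_matcher` is an illustrative choice, and Python's 0-based slice `text[s:s + m]` plays the role of T[s+1..s+m].

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every valid shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    for s in range(n - m + 1):          # the n - m + 1 candidate shifts 0..n-m
        if text[s:s + m] == pattern:    # compares up to m characters
            shifts.append(s)
    return shifts


print(naive_string_matcher("abcabaabcabac", "abaa"))  # [3]
print(naive_string_matcher("acaabc", "aab"))          # [2]
```

The two calls reuse the texts and patterns from the slide examples; the nested cost of the loop and the slice comparison is what gives the (n-m+1)m bound.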
The naive string-matching algorithm
Example: Let T = acaabc and P = aab. The algorithm tries the shifts s = 0, 1, 2, 3 in turn and finds the single valid shift s = 2, where T[3..5] = aab.
The Rabin-Karp algorithm
• Rabin and Karp proposed a string-matching algorithm that performs well in practice.
• For exposition, assume Σ = {0, 1, 2, …, 9}, so that each character is a decimal digit.
• Given a pattern P[1..m], let p denote its corresponding decimal value. In a similar manner, given a text T[1..n], let t_s denote the decimal value of the length-m substring T[s+1..s+m], for s = 0, 1, …, n-m.
• t_s = p if and only if T[s+1..s+m] = P[1..m].
• Therefore, s is a valid shift if and only if t_s = p.
The Rabin-Karp algorithm
Computation of p and t_s using Horner's rule:
• p = P[m] + 10(P[m-1] + 10(P[m-2] + … + 10(P[2] + 10 P[1]) … ))
• The value of t_0 can be computed similarly from T[1..m].
• The remaining values t_1, t_2, …, t_{n-m} are computed one after another, since t_{s+1} can be obtained from t_s as follows:
  t_{s+1} = 10 (t_s - 10^{m-1} T[s+1]) + T[s+m+1]    ……….. (1)
• The only difficulty with this procedure is that p and t_s may be too large.
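To make Horner's rule and recurrence (1) concrete, here is a small Python sketch. It assumes the text is a string of decimal digits and relies on Python's arbitrary-precision integers, so it sidesteps the size problem mentioned above rather than solving it. The names `horner_value` and `window_values` are illustrative.

```python
def horner_value(digits: str) -> int:
    """Decimal value of a digit string, accumulated by Horner's rule."""
    value = 0
    for ch in digits:
        value = 10 * value + int(ch)
    return value


def window_values(text: str, m: int) -> list[int]:
    """Return [t_0, t_1, ..., t_{n-m}] using recurrence (1); assumes 1 <= m <= len(text)."""
    n = len(text)
    high = 10 ** (m - 1)                # weight of the leading digit of a window
    t = horner_value(text[:m])          # t_0 from the first m digits
    values = [t]
    for s in range(n - m):
        # drop the leading digit text[s], shift left, append the digit text[s + m]
        t = 10 * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


print(window_values("31415", 2))  # [31, 14, 41, 15]
```

Each update costs a constant number of arithmetic operations, which is the point of the recurrence: no window is recomputed from scratch.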
The Rabin-Karp algorithm
• To solve this problem, with a d-ary alphabet {0, 1, 2, …, d-1}, we choose a modulus q (typically a prime) so that d·q fits within a computer word, and adjust recurrence (1) to work modulo q, so that it becomes
  t_{s+1} = ( d (t_s - T[s+1] h) + T[s+m+1] ) mod q
  where h = d^{m-1} mod q.
• Working modulo q is not a perfect solution, because t_s ≡ p (mod q) does not imply that t_s = p. On the other hand, if t_s ≢ p (mod q), then we definitely have t_s ≠ p, so that shift s is invalid.
• Any shift s for which t_s ≡ p (mod q) must be tested further to see whether s is really valid or we just have a spurious hit. This additional test explicitly checks the condition P[1..m] = T[s+1..s+m].
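The points above can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the slide's RABIN-KARP-MATCHER pseudocode: it assumes digit strings, and the function name and the choice to return spurious hits separately are made up for the example.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 10, q: int = 13):
    """Return (valid_shifts, spurious_hits) for digit strings, working modulo q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    h = pow(d, m - 1, q)                 # h = d^(m-1) mod q
    p = t = 0
    for i in range(m):                   # Horner's rule, reduced modulo q
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], []
    for s in range(n - m + 1):
        if p == t:                       # hit: must be verified character by character
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious.append(s)
        if s < n - m:                    # roll the window from shift s to shift s + 1
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


print(rabin_karp_matcher("2359023141526739921", "31415", q=13))  # ([6], [12])
print(rabin_karp_matcher("3141592653589793", "26", q=11))        # ([6], [3, 4, 5])
```

Python's `%` always returns a non-negative remainder, so the subtraction inside the rolling update needs no extra correction; in a language where `%` can be negative, adding q before reducing would be one way to handle it.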
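The Rabin-Karp algorithm
Example: Consider the following T, P, and q:
T = 2359023141526739921
P = 31415
q = 13
Find all valid shifts and spurious hits.
Solution: p = 31415 mod 13 = 7. Two windows satisfy t_s ≡ 7 (mod 13): s = 6, where T[7..11] = 31415 is a valid shift, and s = 12, where T[13..17] = 67399 is a spurious hit.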
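Time complexity
• RABIN-KARP-MATCHER takes Θ(m) preprocessing time, and its matching time is Θ((n-m+1)m) in the worst case.
Question: For q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 3141592653589793 when looking for the pattern P = 26?
Solution: p = 26 mod 11 = 4. The windows 15, 59, and 92 (shifts s = 3, 4, 5) also leave remainder 4, so the matcher encounters 3 spurious hits in addition to the valid shift s = 6.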
Knuth-Morris-Pratt(KMP) algorithm
Prefix function for a pattern
Given a pattern P[1..m], the prefix function for the pattern P is the function π : {1, 2, 3, …, m} → {0, 1, 2, …, m-1} such that
π(q) = max{ k : k < q and P_k ⊐ P_q }
where P_k denotes the k-character prefix P[1..k] of P. That is, π(q) is the length of the longest prefix of P that is a proper suffix of P_q.
Knuth-Morris-Pratt(KMP) algorithm
Example: Compute the prefix function of the pattern P = ababababca
Solution:
i      1  2  3  4  5  6  7  8  9  10
P[i]   a  b  a  b  a  b  a  b  c  a
π(i)   0  0  1  2  3  4  5  6  0  1
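The table above can be reproduced with a short Python sketch of the usual prefix-function computation. The function name is illustrative, and because Python lists are 0-based, list position q-1 holds π(q).

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return [π(1), ..., π(m)]; list index q-1 holds π(q)."""
    m = len(pattern)
    pi = [0] * m
    k = 0                                # length of the prefix matched so far
    for q in range(1, m):                # pattern[q] is P[q+1] in 1-based terms
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]                # fall back to the next shorter candidate prefix
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababababca"))  # [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
```

The inner `while` loop only ever decreases k, and k increases by at most one per outer iteration, which is why the total work stays linear in m.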
Time complexity
Running time of COMPUTE-PREFIX-FUNCTION is Θ(m).
The matching time of KMP-MATCHER is Θ(n).
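Since the KMP-MATCHER pseudocode is not included in the extracted text, the sketch below shows one common way to write the matcher in Python. It is self-contained, so it rebuilds the prefix function internally; the name `kmp_matcher` and the 0-based shifts it returns are choices made for this example.

```python
def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all valid shifts of pattern in text using the prefix function."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    # Preprocessing: prefix function of the pattern, Θ(m).
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    # Matching: a single left-to-right scan of the text, Θ(n).
    shifts = []
    q = 0                                # number of pattern characters matched
    for i in range(n):
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]                # reuse what is already known to match
        if pattern[q] == text[i]:
            q += 1
        if q == m:                       # full match ending at text position i
            shifts.append(i - m + 1)
            q = pi[q - 1]                # keep going to find overlapping matches
    return shifts


print(kmp_matcher("bacbababaabcbab", "aba"))    # [4, 6]
print(kmp_matcher("abbabaabaabab", "abaab"))    # [3, 6]
```

The text pointer i never moves backwards, which is the property behind the Θ(n) matching time.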

Question: Consider the following text and pattern:
T = bacbababaabcbab
P = aba
Find all valid shifts using the KMP algorithm.

Question: Compute the prefix function for the pattern ababbabbabbababbabb.
AKTU Examination Questions
1. Write an algorithm for Naïve string matcher.
2. Write KMP algorithm for string matching. Perform the KMP algorithm to search the occurrences of the pattern abaab in the text string abbabaabaabab.
3. Write Rabin-Karp string matching algorithm. Working modulo q = 11, how many spurious hits does the Rabin-Karp matcher encounter in the text T = 3141592653589793 when looking for the pattern P = 26?
4. Explain and write the Knuth-Morris-Pratt algorithm for pattern matching. Also write its time complexity.
5. Describe in detail Knuth-Morris-Pratt string matching algorithm. Compute the prefix function π for the pattern ababbabbabbababbabb when the alphabet is Σ = {a,b}.
6. Compute the prefix function π for the pattern P= a b a c a b using
KNUTHMORRIS –PR
Approximation Algorithms
• An algorithm that returns near-optimal solutions is said to be an approximation algorithm.
• This technique does not guarantee the best solution.
• The goal of an approximation algorithm is to come as close as possible to the optimum value in a reasonable amount of time, which is at most polynomial time.
• Approximation algorithms are designed to find near-optimal solutions to NP-complete problems in polynomial time.
• If we work on an optimization problem where every solution carries a cost, then an approximation algorithm returns a legal solution, but the cost of that legal solution may not be optimal.
Approximation Ratio
• Let C be the cost of the solution returned by an approximation algorithm, and let C* be the cost of an optimal solution.
• An algorithm for a problem has an approximation ratio of P(n) if, for any input of size n, the cost C of the solution produced by the algorithm is within a factor of P(n) of the cost C* of an optimal solution, i.e.
  max(C/C*, C*/C) ≤ P(n)
• If an algorithm achieves an approximation ratio of P(n), we call it a P(n)-approximation algorithm.
Approximation Ratio(cont.)
• The approximation ratio measures how much worse the approximate solution is than the optimal solution. A large approximation ratio means the returned solution may be much worse than an optimal solution; a ratio close to 1 means it is more or less the same as an optimal solution.
• Observe that P(n) is always ≥ 1. If the ratio does not depend on n, we may simply write P. A 1-approximation algorithm gives an optimal solution.
• For a maximization problem, 0 < C ≤ C*, and the ratio C*/C gives the factor by which the cost of the optimal solution is larger than the cost of the approximate solution.
• Similarly, for a minimization problem, 0 < C* ≤ C, and the ratio C/C* gives the factor by which the cost of the approximate solution is larger than the cost of the optimal solution.
Vertex Cover Problem
• A vertex cover of an undirected graph G = (V, E) is a subset V' ⊆ V such that if (u,v) is an edge of G, then either u ∈ V' or v ∈ V' (or both).
• The size of a vertex cover is the number of vertices in it.
• The vertex-cover problem is to find a vertex cover of minimum size in a given undirected graph. We call such a vertex cover an optimal vertex cover.
• This problem is the optimization version of an NP-complete decision problem.
Approximation Algorithm for
Vertex Cover Problem
Ex. Find the optimal vertex cover of the example graph.
[Figure: the example graph and the solution panels (a)-(d) are not reproduced here.]
Optimal vertex cover for this problem contains only three vertices: b, d, and e.
Note: The running time of this algorithm is O(V + E), using adjacency lists to represent E'.
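The algorithm's pseudocode and the example graph are missing from the extracted text. The Python sketch below shows the standard edge-picking 2-approximation: take any uncovered edge, add both endpoints, and ignore every edge that is now covered. The edge list is a hypothetical graph chosen so that {b, d, e} is an optimal cover; it is not claimed to be the slide's figure.

```python
def approx_vertex_cover(edges):
    """Return a vertex cover at most twice the size of an optimal one."""
    cover = set()
    for u, v in edges:                           # scan order stands in for "pick an arbitrary edge"
        if u not in cover and v not in cover:    # edge (u, v) is still uncovered
            cover.update((u, v))                 # take both endpoints
    return cover


# Hypothetical example graph; {b, d, e} covers every edge.
edges = [("b", "c"), ("e", "f"), ("d", "g"), ("a", "b"),
         ("c", "d"), ("c", "e"), ("d", "e"), ("d", "f")]
cover = approx_vertex_cover(edges)
print(sorted(cover))  # ['b', 'c', 'd', 'e', 'f', 'g']
```

The edges whose endpoints get added share no vertices, and any cover must contain at least one endpoint of each of them, so the returned cover is at most twice the optimal size. Here it has six vertices against an optimum of three.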
Traveling-salesman problem
• In the traveling-salesman problem, we are given a complete undirected graph G = (V, E) that has a nonnegative integer cost c(u,v) associated with each edge (u,v) ∈ E, and we must find a Hamiltonian cycle (a tour) of G with minimum cost.
• Let c(A) denote the total cost of the edges in the subset A ⊆ E:
  c(A) = Σ_{(u,v) ∈ A} c(u,v)
• The cost function c satisfies the triangle inequality if, for all vertices u, v, w ∈ V,
  c(u,w) ≤ c(u,v) + c(v,w)
Traveling-salesman problem with the triangle inequality
• The algorithm APPROX-TSP-TOUR computes a near-optimal tour of a complete undirected graph G, using the minimum-spanning-tree algorithm MST-PRIM.
• Here, the cost function satisfies the triangle inequality.
• The tour that this algorithm returns costs at most twice as much as an optimal tour.
Traveling-salesman problem with the triangle inequality
(a) A complete undirected graph. Vertices lie on intersections of integer grid lines. For example, f is one unit to the right and two units up from h. The cost function between two points is the ordinary Euclidean distance.

(b) A minimum spanning tree T of the complete graph, as computed by MST-PRIM. Vertex a is the root vertex. Only edges in the minimum spanning tree are shown. The vertices happen to be labeled in such a way that they are added to the main tree by MST-PRIM in alphabetical order.

(c) A walk of T, starting at a. A full walk of the tree visits the vertices in the order a, b, c, b, h, b, a, d, e, f, e, g, e, d, a. A preorder walk of T lists a vertex just when it is first encountered, as indicated by the dot next to each vertex, yielding the ordering a, b, c, h, d, e, f, g.

(d) A tour obtained by visiting the vertices in the order given by the preorder walk, which is the tour H returned by APPROX-TSP-TOUR. Its total cost is approximately 19.074.

(e) An optimal tour H* for the original complete graph. Its total cost is approximately 14.715.

Note: The running time of APPROX-TSP-TOUR is O(V^2).
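The steps illustrated in panels (a) to (d) can be sketched in Python as below. This is an illustrative version under stated assumptions: vertices are points in the plane with Euclidean costs, vertex 0 is the root, Prim's algorithm is written with a simple array scan to match the O(V^2) bound, and the function names and sample points are made up for the example.

```python
import math


def approx_tsp_tour(points):
    """Return a tour (list of vertex indices) from a preorder walk of an MST rooted at vertex 0."""
    n = len(points)

    def dist(i, j):
        return math.dist(points[i], points[j])

    # Prim's algorithm on the complete graph, O(V^2) with an array scan.
    in_tree = [False] * n
    best = [math.inf] * n                # cheapest known connection to the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0.0
    for _ in range(n):
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and dist(u, v) < best[v]:
                best[v], parent[v] = dist(u, v), u

    # Preorder walk: list each vertex when it is first encountered.
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_cost(points, tour):
    """Total length of the closed tour, including the edge back to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


square = [(0, 0), (0, 1), (1, 1), (1, 0)]   # hypothetical input: corners of a unit square
tour = approx_tsp_tour(square)
print(tour, tour_cost(square, tour))        # [0, 1, 2, 3] 4.0
```

On the unit square the preorder walk happens to give an optimal tour; in general the triangle inequality only guarantees a cost of at most twice the optimum.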
NP-Completeness
Classes P and NP
• The class P consists of those problems that are solvable in polynomial time.
• More specifically, they are problems that can be solved in time O(n^k) for some constant k, where n is the size of the input to the problem.
• The class NP consists of those problems that are "verifiable" in polynomial time.
• What do we mean by a problem being verifiable? If we were somehow given a "certificate" of a solution, then we could verify that the certificate is correct in time polynomial in the size of the input to the problem.
Classes P and NP
• For example, in the Hamiltonian cycle problem, given a directed graph G = (V, E), a certificate would be a sequence <v_1, v_2, v_3, …, v_n> of the n = |V| vertices. We could easily check in polynomial time that each vertex appears exactly once, that (v_i, v_{i+1}) ∈ E for i = 1, 2, 3, …, n-1, and that (v_n, v_1) ∈ E as well.
• Any problem in P is also in NP, since if a problem is in P then we can solve it in polynomial time without even being supplied a certificate.
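The certificate check for the Hamiltonian cycle problem described above can be sketched in a few lines of Python. The function name and the representation (a list of vertices plus a list of directed edge pairs) are assumptions made for illustration.

```python
def verify_hamiltonian_cycle(vertices, edges, certificate):
    """Check that certificate lists every vertex once and follows edges of the directed graph."""
    n = len(vertices)
    if len(certificate) != n or set(certificate) != set(vertices):
        return False                     # some vertex is missing or repeated
    edge_set = set(edges)
    # Consecutive vertices must be joined by an edge, including v_n back to v_1.
    return all((certificate[i], certificate[(i + 1) % n]) in edge_set
               for i in range(n))


vertices = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(verify_hamiltonian_cycle(vertices, edges, [1, 2, 3, 4]))  # True
print(verify_hamiltonian_cycle(vertices, edges, [1, 3, 2, 4]))  # False
```

The check touches each certificate entry a constant number of times, so it runs in time polynomial in the input size. Verifying a proposed cycle is easy even though finding one is not known to be.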
NP-Hard and NP-complete(NPC)
NP-Hard: A problem L is said to be NP-hard if every problem in NP is polynomial-time reducible to L.
That is, problem L is NP-hard if, for all problems L' ∈ NP, L' ≤P L.
Consequently, if we can solve L in polynomial time, then we can solve all NP problems in polynomial time.
NP-Complete
NP-Complete: Problem L is NP-complete if
1. L ∈ NP and
2. L is NP-hard

NP-Complete problems:
• Boolean satisfiability problem (SAT)
• Hamiltonian cycle problem
• Traveling-salesman problem
• Vertex cover problem

Note: If any NP-complete problem is solvable in polynomial time, then every NP-complete problem is also solvable in polynomial time.
[Figure: Relationship between P, NP, NP-hard and NPC]
Polynomial-time Reduction Algorithm
• Consider a decision problem A, which we would like to solve in polynomial time.
• We call the input to a particular problem an instance of that problem.
• Suppose that we already know how to solve a different decision problem B in polynomial time.
• Finally, suppose that we have a procedure that transforms any instance α of A into some instance β of B with the following characteristics:
  ◦ The transformation takes polynomial time.
  ◦ The answers are the same. That is, the answer for α is "yes" if and only if the answer for β is also "yes."
• We call such a procedure a polynomial-time reduction algorithm.
Polynomial-time Reduction Algorithm(cont.)
A polynomial-time reduction algorithm provides us a way to solve problem A in polynomial time:
1. Given an instance α of problem A, use a polynomial-time reduction algorithm to transform it to an instance β of problem B.
2. Run the polynomial-time decision algorithm for B on the instance β.
3. Use the answer for β as the answer for α.
Lemma
If L is a language such that L' ≤P L for some L' ∈ NPC, then L is NP-hard. If, in addition, L ∈ NP, then L ∈ NPC.

Note: The Hamiltonian cycle problem is NP-complete.
Note: Satisfiability of boolean formulas in 3-conjunctive normal form is NP-complete.
Note: The vertex-cover problem is NP-complete.
Theorem: The traveling-salesman problem is NP-complete.
Proof:
Here TSP denotes the decision version of the problem: an instance <G, c, k> asks whether the complete graph G with cost function c has a tour of cost at most k.
We first show that TSP belongs to NP. Given an instance of the problem, we use as a certificate the sequence of n vertices in the tour. The verification algorithm checks that this sequence contains each vertex exactly once, sums up the edge costs, and checks whether the sum is at most k. This process can certainly be done in polynomial time.
To prove that TSP is NP-hard, we show that HAM-CYCLE ≤P TSP.
Let G = (V, E) be an instance of HAM-CYCLE. We construct an instance of TSP as follows.
We form the complete graph G' = (V, E'), where E' = {(i,j) : i, j ∈ V and i ≠ j}, and we define the cost function c by
  c(i,j) = 0 if (i,j) ∈ E
  c(i,j) = 1 if (i,j) ∉ E
The instance of TSP is then <G', c, 0>, which we can easily create in polynomial time.
We now show that graph G has a Hamiltonian cycle if and only if graph G' has a tour of cost at most 0.
Suppose that graph G has a Hamiltonian cycle h. Each edge in h belongs to E and thus has cost 0 in G'. Thus, h is a tour in G' with cost 0. Conversely, suppose that graph G' has a tour h' of cost at most 0. Since the costs of the edges in E' are 0 and 1, the cost of tour h' is exactly 0 and each edge on the tour must have cost 0. Therefore, h' contains only edges in E. We conclude that h' is a Hamiltonian cycle in graph G.
Since HAM-CYCLE is NP-complete, TSP is NP-hard; together with TSP ∈ NP, this shows that TSP is NP-complete.
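The construction used in the proof can be sketched in Python as follows. The reduction function mirrors the cost definition above (0 for edges of G, 1 for the rest); the brute-force decision procedure is exponential and is included only to check the reduction on tiny graphs. All names are illustrative, and the sketch assumes an undirected graph with at least three vertices.

```python
from itertools import combinations, permutations


def ham_cycle_to_tsp(vertices, edges):
    """Build the TSP instance <G', c, 0> from an undirected HAM-CYCLE instance G = (V, E)."""
    edge_set = {frozenset(e) for e in edges}
    # G' is complete: one cost entry for every unordered pair of distinct vertices.
    cost = {frozenset((i, j)): 0 if frozenset((i, j)) in edge_set else 1
            for i, j in combinations(vertices, 2)}
    return vertices, cost, 0


def has_tour_within(vertices, cost, k):
    """Brute-force TSP decision: is there a tour of total cost at most k?"""
    first, rest = vertices[0], vertices[1:]
    for order in permutations(rest):     # fix the start vertex, try every ordering of the rest
        tour = (first, *order)
        total = sum(cost[frozenset((tour[i], tour[(i + 1) % len(tour)]))]
                    for i in range(len(tour)))
        if total <= k:
            return True
    return False


cycle = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])   # has a Hamiltonian cycle
path = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])            # has none
print(has_tour_within(*ham_cycle_to_tsp(*cycle)))  # True
print(has_tour_within(*ham_cycle_to_tsp(*path)))   # False
```

The reduction itself only fills in one cost per vertex pair, so it takes time quadratic in |V|; the exponential work is confined to the checker.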
AKTU Examination Questions
1. Write and explain the algorithm to solve vertex cover problem using approximation algorithm.
2. Explain NP-complete and NP-Hard.
3. Explain Randomized algorithm in brief.
4. What is an approximation algorithm? What is meant by P(n) approximation algorithms? Discuss approximation algorithm for Travelling Salesman Problem.
5. Define NP-Hard and NP-complete problems. What are the steps involved in proving a problem NP-complete? Specify the problems already proved to be NP-complete.
6. What are approximation algorithms? What is meant by P(n) approximation algorithms?
7. Define NP, NP-hard and NP-complete problems. Prove that Travelling Salesman Problem is NP-Complete.
8. What is the application of Fast Fourier Transform (FFT)? Also write the recursive algorithm for FFT.
9. Discuss the problem classes P, NP and NP-complete with class relationship.
10. Explain Approximation and Randomized algorithms.
11. What do you mean by polynomial time reduction?
12. Explain applications of FFT.
