
UNIT-5

NP-Completeness: These problems have their own significance in programming. There are
many problems for which no polynomial-time algorithm is known; some of these are the
Travelling Salesman Problem (TSP), Graph Colouring, and the Knapsack Problem. The concept of
NP-Completeness deals with whether an efficient algorithm exists for such problems. For some
constant k, an algorithm is considered efficient if its worst-case running time is O(n^k) on inputs of size n.

Polynomial Time Algorithms:


i. Algorithms with a worst-case running time of O(n^k), where k is a constant, are called tractable;
others are called intractable.
ii. Formally, an algorithm is a polynomial-time algorithm if there exists a polynomial p(n) such that
the algorithm can solve any instance of size n in time O(p(n)).
iii. A problem requiring Ω(n^35) time to solve is essentially intractable for large n. Most known polynomial-time
algorithms run in time O(n^k) for fairly low values of k.

P Class (Polynomial time solving): P is the set of problems that can be solved by a deterministic
Turing machine in polynomial time. These algorithms take time like O(n), O(n^2), O(n^3). For example:
all standard sorting algorithms, BFS, DFS, the Boyer-Moore string matching algorithm, Kruskal's
algorithm, Dijkstra's algorithm, etc.

NP Class (Non-deterministic Polynomial time solving): NP is the set of problems that are
verifiable in polynomial time: given a candidate solution to a problem, we can check whether
the solution is correct or not in polynomial time, even if no polynomial-time algorithm for finding
a solution on a deterministic Turing machine is known. Equivalently, these problems can be solved
in polynomial time by a non-deterministic Turing machine. P is a subset of NP (any problem that can
be solved by a deterministic machine in polynomial time can also be solved by a non-deterministic
machine in polynomial time). Examples: the problems typically attacked with backtracking
and branch-and-bound.

NP-Complete Class: NP-Complete problems are problems that lie in both the NP and NP-Hard
classes. This means that NP-Complete problems can be verified in polynomial time and that any NP
problem can be reduced to them in polynomial time.
NP-complete problems are the hardest problems in the NP set. A decision problem L is NP-complete if:
1) L is in NP (any given solution for an NP-complete problem can be verified quickly, but no
efficient way to find a solution is known).
2) Every problem in NP is reducible to L in polynomial time (reduction is defined below).

NP-Hard Class: NP-Hard (NPH) problems need not have any bound on their running time. If
any NPC problem is polynomial-time reducible to a problem X, that problem X belongs to the NP-Hard
class. Hence, all NP-Complete problems are also NPH. In other words, if an NPH problem is non-deterministic
polynomial-time solvable (i.e., it is in NP), it is an NPC problem. An example of an NPH problem that is
not NPC is the Halting Problem.
Below is a Venn diagram of the different class spaces:

From the diagram, it is clear that NPC problems are the hardest problems in NP while being the
simplest ones in NPH, i.e., NP ∩ NPH = NPC.

Note: Given a general problem, we can say it is in NPC if and only if we can reduce it to
some NP problem (which shows it is in NP) and some NPC problem can be reduced to it
(which shows all NP problems can be reduced to it). Also, if an NPH problem is in NP,
then it is NPC.

Polynomial Time Reduction Algorithm:


Let L1 and L2 be two decision problems, and suppose algorithm A2 solves L2. That is, if y is an input for
L2, then algorithm A2 will answer Yes or No depending upon whether y belongs to L2 or not.
The idea is to find a transformation from L1 to L2 so that algorithm A2 can be used as part of an algorithm
A1 that solves L1.

Learning reduction in general is very important. For example, if we have library functions that solve
certain problems, and we can reduce a new problem to one of the solved problems, we save a lot of
time. Consider the problem of finding the minimum-product path in a given directed graph, where
the product of a path is the multiplication of the weights of the edges along the path. If we have
code for Dijkstra's algorithm to find a shortest path, we can take the log of all weights and use Dijkstra's
algorithm to find the minimum-product path (a sketch follows below) rather than writing fresh code for this new problem.
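
The following is a minimal Python sketch of this reduction, assuming all edge weights are
at least 1 so their logarithms are non-negative and Dijkstra's algorithm still applies; the
graph and names are illustrative, not from the text above.

import heapq, math

def dijkstra(graph, source):
    # graph: {u: [(v, w), ...]} with non-negative weights
    dist = {u: math.inf for u in graph}
    dist[source] = 0.0
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

def min_product_path_cost(graph, source, target):
    # Reduction: replace each weight w by log(w), solve shortest path,
    # then undo the transform with exp()
    log_graph = {u: [(v, math.log(w)) for v, w in adj]
                 for u, adj in graph.items()}
    return math.exp(dijkstra(log_graph, source)[target])

g = {'a': [('b', 2), ('c', 8)], 'b': [('c', 3)], 'c': []}
print(min_product_path_cost(g, 'a', 'c'))   # 2 * 3 = 6, beating the direct edge of 8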
How to prove a problem is NP-Complete: From the definition of NP-complete, it appears
impossible to prove that a problem L is NP-Complete: by definition, it requires us to show that every
problem in NP is polynomial-time reducible to L. Fortunately, there is an alternate way to prove it.
Following are the steps to show a problem L is NP-complete:
1. Prove L is in NP.
i. Describe the certificate,
ii. Describe the verification algorithm,
iii. Show that the certificate is polynomial in size and the verification algorithm takes polynomial time.
2. Select a problem K known to be NP-complete.
3. Give an algorithm to compute a function f() mapping each instance of K to an instance of L.
4. Prove that x is in K if and only if f(x) is in L, for all strings x.
5. Prove that the algorithm computing f() runs in polynomial time.

Decision vs Optimization Problems: In optimization problems, each feasible (i.e., legal) solution
has an associated value, and we wish to find the feasible solution with the best value, e.g., the
Shortest-Path problem. In a decision problem, the answer is simply Yes or No.

POLYNOMIALS AND THE FFT: The straightforward method of adding two polynomials of
degree n takes Θ(n) time, but the straightforward method of multiplying them takes Θ(n^2) time.
By using the Fast Fourier Transform, or FFT, we can reduce the time to multiply polynomials to
Θ(n log n).

Polynomials: A polynomial in the variable x over an algebraic field F is a function A(x) that can
be represented as follows:

A(x) = a0 + a1x + a2x^2 + ... + a(n-1)x^(n-1) = Σ (j=0 to n-1) aj x^j

We call n the degree-bound of the polynomial, and the values a0, a1, ..., a(n-1) the coefficients of the
polynomial. A polynomial A(x) is said to have degree k if its highest nonzero coefficient is ak. The
degree of a polynomial of degree-bound n can be any integer between 0 and n-1, inclusive.
Conversely, a polynomial of degree k is a polynomial of degree-bound n for any n > k.

Polynomial Operations:
There are a variety of operations we might wish to define for polynomials:
 Polynomial Addition
 Polynomial Multiplication

Polynomial Addition: For polynomial addition, if A(x) and B(x) are polynomials of degree-bound
n, we say that their sum is a polynomial C(x), also of degree-bound n, such that
C(x) = A(x) + B(x) for all x. That is, if

A(x) = Σ (j=0 to n-1) aj x^j  and  B(x) = Σ (j=0 to n-1) bj x^j,

then

C(x) = Σ (j=0 to n-1) cj x^j,

where cj = aj + bj for j = 0, 1, ..., n-1.

For example, if A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5, then C(x) = 4x^3 + 7x^2 - 6x + 4.

Polynomial Multiplication: For polynomial multiplication, if A(x) and B(x) are polynomials of
degree-bound n, we say that their product C(x) is a polynomial of degree-bound 2n-1 such
that C(x) = A(x)·B(x) for all x. You have probably multiplied polynomials before, by multiplying
each term in A(x) by each term in B(x) and combining terms with equal powers.
For example, we can multiply A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5 as follows:

                  6x^3 + 7x^2 - 10x + 9          degree = 3, degree-bound = 4
                - 2x^3        +  4x - 5          degree = 3, degree-bound = 4
--------------------------------------------------------
                - 30x^3 - 35x^2 + 50x - 45
        24x^4 + 28x^3 - 40x^2 + 36x
- 12x^6 - 14x^5 + 20x^4 - 18x^3
--------------------------------------------------------
- 12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45       degree = 6, degree-bound = 7
--------------------------------------------------------
Another way to express the product C(x) is

C(x) = Σ (j=0 to 2n-2) cj x^j,  where  cj = Σ (k=0 to j) ak b(j-k).

For example, c4 = a0b4 + a1b3 + a2b2 + a3b1 + a4b0.

Note that degree(C) = degree(A) + degree(B), implying

degree-bound(C) = degree-bound(A) + degree-bound(B) - 1 ≤ degree-bound(A) + degree-bound(B).

We shall nevertheless speak of the degree-bound of C as being the sum of the degree-bounds
of A and B, since if a polynomial has degree-bound k it also has degree-bound k + 1.
Representation of polynomials: We can represent polynomials using the following two
methods:
 Coefficient Representation
 Point-value Representation

Coefficient representation: A coefficient representation of a polynomial of
degree-bound n is a vector of coefficients a = (a0, a1, ..., a(n-1)).
The coefficient representation is convenient for certain operations on polynomials. For example,
the operation of evaluating the polynomial A(x) at a given point x0 consists of computing the
value of A(x0). Evaluation takes time Θ(n) using Horner's rule.
Similarly, adding two polynomials represented by the coefficient vectors a = (a0, a1, ..., a(n-1))
and b = (b0, b1, ..., b(n-1)) takes Θ(n) time: we just output the coefficient vector c = (c0, c1, ..., c(n-1)),
where cj = aj + bj for j = 0, 1, ..., n-1.
Now consider the multiplication of two degree-bound n polynomials A(x) and B(x) represented in
coefficient form. If we use the method described above, polynomial multiplication takes time Θ(n^2),
since each coefficient in the vector a must be multiplied by each coefficient in the vector b. The
operation of multiplying polynomials in coefficient form thus seems considerably more difficult
than that of evaluating a polynomial or adding two polynomials.
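
As a small illustration, here is a Python sketch of both coefficient-form operations, with
coefficient lists running from a0 upward; the polynomials are the ones from the worked
example above.

def poly_add(a, b):
    # c_j = a_j + b_j: Theta(n) for equal degree-bounds
    return [x + y for x, y in zip(a, b)]

def poly_mul(a, b):
    # c_j = sum over k of a_k * b_(j-k): every coefficient of a
    # meets every coefficient of b, hence Theta(n^2)
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

A = [9, -10, 7, 6]      # 6x^3 + 7x^2 - 10x + 9
B = [-5, 4, 0, -2]      # -2x^3 + 4x - 5
print(poly_add(A, B))   # [4, -6, 7, 4]   -> 4x^3 + 7x^2 - 6x + 4
print(poly_mul(A, B))   # [-45, 86, -75, -20, 44, -14, -12]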

Point-value representation: A point-value representation of a polynomial A(x) of
degree-bound n is a set of n point-value pairs {(x0, y0), (x1, y1), ..., (x(n-1), y(n-1))}, such that all of
the xk are distinct and yk = A(xk) for k = 0, 1, ..., n-1. A polynomial has many different point-value
representations, since any set of n distinct points x0, x1, ..., x(n-1) can be used as a basis for the
representation.
Computing a point-value representation for a polynomial given in coefficient form is in principle
straightforward, since all we have to do is select n distinct points x0, x1, ..., x(n-1) and then
evaluate A(xk) for k = 0, 1, ..., n-1. With Horner's method, this n-point evaluation takes time
Θ(n^2). We shall see later that if we choose the xk cleverly, this computation can be accelerated to
run in time Θ(n log n).
Addition in point-value form is pointwise: if C(x) = A(x) + B(x), then C(xk) = A(xk) + B(xk) for any
point xk, so the time to add two polynomials of degree-bound n in point-value form is Θ(n).
Similarly, the point-value representation is convenient for multiplying polynomials. If C(x) =
A(x)·B(x), then C(xk) = A(xk)·B(xk) for any point xk, and we can pointwise multiply a point-value
representation for A by a point-value representation for B to obtain a point-value representation
for C. Since C has degree-bound 2n, we need 2n point-value pairs: given an extended point-value
representation for A, {(x0, y0), (x1, y1), ..., (x(2n-1), y(2n-1))}, and a corresponding extended
point-value representation for B, {(x0, y'0), (x1, y'1), ..., (x(2n-1), y'(2n-1))}, a point-value
representation for C is {(x0, y0·y'0), (x1, y1·y'1), ..., (x(2n-1), y(2n-1)·y'(2n-1))}.
Given two input polynomials in extended point-value form, we see that the time to multiply them
to obtain the point-value form of the result is Θ(n), much less than the time required to multiply
polynomials in coefficient form.

Fast multiplication of polynomials in coefficient form:

The figure below shows this strategy of multiplying two polynomials quickly. One minor
detail concerns degree-bounds: the product of two polynomials of degree-bound n is a polynomial
of degree-bound 2n. Before evaluating the input polynomials A and B, therefore, we first double
their degree-bounds to 2n by adding n high-order coefficients of 0.
Given the FFT, we have the following Θ(n log n)-time procedure for multiplying two
polynomials A(x) and B(x) of degree-bound n, where the input and output representations are in
coefficient form. We assume that n is a power of 2; this requirement can always be met by adding
high-order zero coefficients.
1. Double degree-bound: create coefficient representations of A and B as degree-bound 2n
polynomials by adding n high-order zero coefficients. Time: Θ(n).
2. Evaluate: compute point-value representations of A and B of length 2n by applying the FFT
at the (2n)th roots of unity. Time: Θ(n log n).
3. Pointwise multiply: compute a point-value representation for C(x) = A(x)·B(x) by multiplying
the values pointwise. Time: Θ(n).
4. Interpolate: create the coefficient representation of C by applying the inverse FFT to the 2n
point-value pairs. Time: Θ(n log n).

Figure: A graphical outline of an efficient polynomial-multiplication process. Representations on
the top are in coefficient form, while those on the bottom are in point-value form. The arrows
from left to right correspond to the multiplication operation. The ω(2n) terms are the complex
(2n)th roots of unity.
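
Below is a hedged Python sketch of the evaluate / pointwise-multiply / interpolate pipeline
from the figure, using numpy's FFT as the evaluator at the complex roots of unity (numpy is
an assumption here; the classical presentation uses its own recursive FFT procedure instead).

import numpy as np

def fft_poly_mul(a, b):
    n = len(a) + len(b) - 1        # degree-bound of the product
    size = 1
    while size < n:                # pad to a power of 2, as the text requires
        size *= 2
    fa = np.fft.fft(a, size)       # evaluate A at the (size)th roots of unity
    fb = np.fft.fft(b, size)       # evaluate B at the same points
    fc = fa * fb                   # pointwise multiply: Theta(n)
    c = np.fft.ifft(fc)[:n]        # interpolate back to coefficient form
    return [round(x.real) for x in c]

A = [9, -10, 7, 6]                 # 6x^3 + 7x^2 - 10x + 9
B = [-5, 4, 0, -2]                 # -2x^3 + 4x - 5
print(fft_poly_mul(A, B))          # [-45, 86, -75, -20, 44, -14, -12]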

Horner’s Rule: We can evaluate a polynomial quickly by using Horner’s rule. Instead
of computing the terms individually, we compute: A(x) = a0 + x(a1 + x(a2 + ... + x(a(n-2) + x·a(n-1)) ... ))
For example, A(3) = a0 + 3(a1 + 3(a2 + ... + 3(a(n-2) + 3·a(n-1)) ... ))
This method requires O(n) operations, so evaluation takes Θ(n) time.

Algorithm:
Evaluate_Horner(A, n, x)
{
    t = A[n-1]
    for i = n-2 down to 0, do
        t = t*x + A[i]
    return t
}

Example: Evaluate A(x) = 2 + 3x + x^2 for x = 2.

We can evaluate this as:
A(x) = 2 + x(3 + x(1)); now put x = 2, and we get
A(2) = 2 + 2(3 + 2(1))
     = 2 + 2(5)
     = 2 + 10
     = 12.
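
The Evaluate_Horner pseudocode above, written as runnable Python and checked on this example:

def evaluate_horner(A, x):
    # A[i] is the coefficient of x^i
    t = A[-1]
    for coeff in reversed(A[:-1]):
        t = t * x + coeff
    return t

print(evaluate_horner([2, 3, 1], 2))   # A(x) = 2 + 3x + x^2, so A(2) = 12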
Approximation Algorithms: Approximation algorithms are efficient algorithms that find
approximate solutions to optimization problems (in particular NP-hard problems), with provable
guarantees on how far the returned solution is from the optimal one. The goal of an approximation
algorithm is to come as close as possible to the optimum value in a reasonable amount of time,
which is at most polynomial time. Such algorithms are also called heuristic algorithms.

 For the traveling salesperson problem, the optimization problem is to find the shortest cycle,
and the approximation problem is to find a short cycle.
 For the vertex cover problem, the optimization problem is to find the vertex cover with fewest
vertices, and the approximation problem is to find the vertex cover with few vertices.

There are many examples of approximation algorithms. Some of them are:

i. Vertex-Cover Problem
ii. Set-Cover Problem
iii. Travelling Salesman Problem

Vertex-Cover: A vertex cover of an undirected graph G is a set of vertices such that each edge
in G is incident to at least one of these vertices. The size of a vertex cover is the number of vertices
in it. The problem seeks a set of vertices such that every edge in the graph touches one of these
vertices. The decision version of the vertex-cover problem has been proven NP-Complete. Even
though it may be difficult to find an optimal vertex cover in a graph G, it is not too hard to find a
vertex cover that is near-optimal. Now, we want to solve the optimization version of the vertex
cover problem, i.e., we want to find a minimum-size vertex cover of a given graph. We call such a
vertex cover an optimal vertex cover C*.
The following approximation algorithm takes as input an undirected graph G and returns a vertex
cover whose size is guaranteed to be no more than twice the size of an optimal vertex cover.

Approx_Vertex_Cover(G = (V, E))
{
    C = empty-set;                                            O(1)
    E' = E;                                                   O(E)
    while E' is not empty, do                                 O(E)
    {
        let (u, v) be any edge in E';                         O(E)
        add u and v to C;                                     O(V)
        remove from E' all edges incident to either u or v;   O(E)
    }
    return C;                                                 O(1)
}
Ex.- Optimal and approximate vertex cover for the following graph:
The idea is to take an edge (u, v) one at a time, put both vertices into C, and remove all the edges
incident to either u or v. We carry on until all edges have been removed. C is then a vertex cover.

Approximate Vertex Cover = {b, c, d, e, f, g}
Optimal Vertex Cover = {b, d, e}
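
A minimal Python sketch of Approx_Vertex_Cover; the edge list below is hypothetical, since
the example's graph figure is not reproduced here.

def approx_vertex_cover(edges):
    C = set()
    E = set(edges)
    while E:
        u, v = next(iter(E))      # let (u, v) be any edge in E'
        C.update((u, v))          # add both endpoints to the cover
        # remove all edges incident to either u or v
        E = {e for e in E if u not in e and v not in e}
    return C

print(approx_vertex_cover([('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]))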

Set-Cover: The set cover problem is a very important optimization problem. In this problem, you are
given a pair (X, F), where X = {x1, x2, ..., xm} is a finite set of elements and F = {S1, S2, ..., Sn} is a
family of subsets of X, such that every element of X belongs to at least one set in F.
Consider a subset C of F; we say that C covers the domain if every element of X is in some set of C.
The problem is to find a minimum-sized subset C of F that covers X.
The following greedy approximation algorithm takes as input a finite set of elements X and a family
of subsets F, and returns a set cover whose size is guaranteed to be within a logarithmic factor
(roughly ln|X|) of the size of an optimal set cover.

Approx_Set_Cover(X, F)
{
    U = X
    C = empty-set
    while U is not empty, do
        select an S belonging to F that maximizes |S ∩ U|
        U = U - S
        C = C ∪ {S}
    return C
}
Ex.- Consider the example shown below: an instance (X, F) of the set-cover problem, where X
consists of 12 black points and F = {T1, T2, T3, T4, T5, T6}.

Sol.: Let the points of X be numbered from 1 to 12 in row-major order. Then
U = X = {1,2,3,4,5,6,7,8,9,10,11,12}, C = {}, and
T1 = {1,2,3,4,5,6}, T2 = {5,6,8,9}, T3 = {1,4,7,10},
T4 = {2,5,7,8,11}, T5 = {3,6,9,12}, T6 = {10,11}
Step 1: |T1 ∩ U| = |{1,2,3,4,5,6}| = 6
|T2 ∩ U| = |{5,6,8,9}| = 4
|T3 ∩ U| = |{1,4,7,10}| = 4
|T4 ∩ U| = |{2,5,7,8,11}| = 5
|T5 ∩ U| = |{3,6,9,12}| = 4
|T6 ∩ U| = |{10,11}| = 2
Select T1 as it maximizes the result; hence C = {T1}.
Now U = U - T1 = {7,8,9,10,11,12}
Step 2: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{8,9}| = 2
|T3 ∩ U| = |{7,10}| = 2
|T4 ∩ U| = |{7,8,11}| = 3
|T5 ∩ U| = |{9,12}| = 2
|T6 ∩ U| = |{10,11}| = 2
Select T4 as it maximizes the result; hence C = {T1, T4}.
Now U = U - T4 = {9,10,12}
Step 3: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{9}| = 1
|T3 ∩ U| = |{10}| = 1
|T4 ∩ U| = |{}| = 0
|T5 ∩ U| = |{9,12}| = 2
|T6 ∩ U| = |{10}| = 1
Select T5 as it maximizes the result; hence C = {T1, T4, T5}.
Now U = U - T5 = {10}
Step 4: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{}| = 0
|T3 ∩ U| = |{10}| = 1
|T4 ∩ U| = |{}| = 0
|T5 ∩ U| = |{}| = 0
|T6 ∩ U| = |{10}| = 1
Select T3 as it maximizes the result (tied with T6); hence C = {T1, T4, T5, T3}.
Now U = U - T3 = {}
Hence, the approximate set cover for the given problem is {T1, T3, T4, T5}, while
the optimal set cover for the given problem is {T3, T4, T5}.
The approximate result (4 sets) is close to the optimal one (3 sets), well within the greedy
algorithm's logarithmic guarantee.
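
A runnable Python version of Approx_Set_Cover, checked against the worked example above
(ties are broken by order in F, which reproduces the choice of T3 in step 4):

def approx_set_cover(X, F):
    U = set(X)
    C = []
    while U:
        S = max(F, key=lambda s: len(s & U))   # greedy: maximize |S ∩ U|
        U -= S
        C.append(S)
    return C

X = range(1, 13)
F = [{1,2,3,4,5,6}, {5,6,8,9}, {1,4,7,10},
     {2,5,7,8,11}, {3,6,9,12}, {10,11}]
for S in approx_set_cover(X, F):
    print(sorted(S))    # picks T1, T4, T5, then T3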

Traveling-Salesman Problem:
In the traveling salesman problem, a salesman must visit n cities. We can say that the salesman
wishes to make a tour, or Hamiltonian cycle, visiting each city exactly once and finishing at the city
he starts from. There is a non-negative cost c(i, j) to travel from city i to city j. The goal is to
find a tour of minimum cost. We assume that every two cities are connected. Such problems are
called Traveling-Salesman Problems (TSP).
We can model the cities as a complete graph of n vertices, where each vertex represents a city.
It can be shown that TSP is NP-Complete.
If we assume the cost function c satisfies the triangle inequality, then we can use the following
approximation algorithm.
Triangle inequality: Let u, v, w be any three vertices; then we have
c(u, w) ≤ c(u, v) + c(v, w).
One important observation in developing an approximate solution is that if we remove an edge
from the optimal tour H*, the result is a spanning tree; hence the cost of a minimum spanning tree
is a lower bound on the cost of an optimal tour.
Approx-TSP(G = (V, E))
{
1. Compute a MST T of G using Prim's algorithm;                            O(E log V)
2. Select any vertex r as the root of the tree;                            O(1)
3. Let L be the list of vertices visited in a preorder tree walk of T;     O(V)
4. Return the Hamiltonian cycle H that visits the vertices in the order L; O(1)
}
Example:
Intuitively, Approx-TSP first makes a full walk of the MST T, which visits each edge exactly twice.
To create a Hamiltonian cycle from the full walk, it bypasses some vertices (which corresponds to
taking a shortcut). By the triangle inequality, the resulting tour costs at most twice the cost of the
MST, and hence at most twice the cost of an optimal tour.
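
A hedged Python sketch of Approx-TSP on a small complete graph whose costs satisfy the
triangle inequality; the distance matrix below is hypothetical (Manhattan distances between
four points), not taken from the missing example figure.

import heapq

def approx_tsp(dist):
    n = len(dist)
    # Steps 1-2: Prim's algorithm for a MST rooted at vertex 0
    children = {v: [] for v in range(n)}
    in_tree = [False] * n
    pq = [(0, 0, 0)]                      # (cost, vertex, parent)
    while pq:
        c, u, p = heapq.heappop(pq)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if u != p:
            children[p].append(u)
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(pq, (dist[u][v], v, u))
    # Steps 3-4: preorder walk of the MST, then return to the root
    tour, stack = [], [0]
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour + [0]

D = [[0, 2, 3, 5],
     [2, 0, 5, 3],
     [3, 5, 0, 2],
     [5, 3, 2, 0]]
print(approx_tsp(D))    # [0, 1, 2, 3, 0], cost 14 <= 2 * optimal (10)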

Randomized Algorithms: A randomized algorithm is defined as an algorithm that is allowed
to use some randomness, in the form of unbiased random bits, as part of its input. It uses a random
number at least once during the computation to make a decision.
Example: In Quick Sort, using a random number to choose a pivot.
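
A brief Python sketch of the randomized pivot choice mentioned above (a simple,
non-in-place variant, written for clarity rather than efficiency):

import random

def randomized_quicksort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)    # the single random decision per call
    less    = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)

print(randomized_quicksort([5, 2, 9, 1, 5, 6]))   # [1, 2, 5, 5, 6, 9]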

String Matching Algorithm:

A string matching algorithm is also called a "string searching algorithm." This is a vital class of
string algorithms: the task is to find a place where a pattern occurs one or more times within
a larger string.
We are given a text array T[1...n] of n characters and a pattern array P[1...m] of m characters. The
problem is to find an integer s, called a valid shift, where 0 ≤ s ≤ n-m and T[s+1...s+m] = P[1...m];
in other words, to find P in T, i.e., where P is a substring of T. The items of P and T are
characters drawn from some finite alphabet such as {0, 1} or {A, B, ..., Z, a, b, ..., z}. Given a string
T[1...n], the substrings are represented as T[i...j] for some 0 ≤ i ≤ j ≤ n-1: the string formed by the
characters in T from index i to index j, inclusive. This means that a string is a substring of itself
(take i = 0 and j = n-1).
A proper substring of string T[1...n] is T[i...j] for some 0 < i ≤ j < n-1. That is, we must have
either i > 0 or j < n-1.
Using these descriptions, we can say that given any string T[1...n],
the substrings are
 T[i...j] = T[i] T[i+1] T[i+2] ... T[j] for some 0 ≤ i ≤ j ≤ n-1,
and the proper substrings are
 T[i...j] = T[i] T[i+1] T[i+2] ... T[j] for some 0 < i ≤ j < n-1.
Note: If i > j, then T[i...j] is the empty string, or null, which has length zero.

Algorithms used for String Matching:


There are different methods used for finding a string:
1. The Naive String Matching Algorithm
2. The Rabin-Karp Algorithm
3. The Knuth-Morris-Pratt Algorithm
4. The Boyer-Moore Algorithm
5. Finite Automata

The Naive String Matching Algorithm: The naive approach tests all possible placements
of pattern P[1...m] relative to text T[1...n]. We try shifts s = 0, 1, ..., n-m successively, and for
each shift s we compare T[s+1...s+m] to P[1...m]. The algorithm finds all valid shifts using a loop
that checks the condition P[1...m] = T[s+1...s+m] for each of the n-m+1 possible values of s.

NAIVE-STRING-MATCHER(T, P)
{
    n ← length[T]
    m ← length[P]
    for s ← 0 to n - m
        do if P[1...m] = T[s+1...s+m]
            then print "Pattern occurs with shift" s
}

Analysis: The for loop executes n-m+1 times (we need at least m characters remaining at the end),
and in each iteration we do up to m comparisons. So the total complexity is O((n-m+1)m).
Example: Suppose T = 1011101110 and P = 111. Find all the valid shifts.
Solution: The valid shifts are s = 2 and s = 6 (counting shifts from 0), as the check below confirms.
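
A runnable Python version of NAIVE-STRING-MATCHER, using 0-based indexing (the
pseudocode above is 1-based):

def naive_string_matcher(T, P):
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s + m] == P]

print(naive_string_matcher("1011101110", "111"))   # valid shifts: [2, 6]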
The Rabin-Karp Algorithm:
The Rabin-Karp string matching algorithm calculates a hash value for the pattern, as well as for
each m-character subsequence of the text to be compared. If the hash values are unequal, the
algorithm computes the hash value for the next m-character sequence. If the hash values are
equal, the algorithm compares the pattern and the m-character sequence character by character.
In this way, there is only one hash comparison per text subsequence, and full character matching
is only required when the hash values match.

RABIN-KARP-MATCHER(T, P, d, q)
{
    n ← length[T]
    m ← length[P]
    h ← d^(m-1) mod q
    p ← 0
    t0 ← 0
    for i ← 1 to m                         // preprocessing
        do p ← (d·p + P[i]) mod q
           t0 ← (d·t0 + T[i]) mod q
    for s ← 0 to n - m                     // matching
        do if p = ts
            then if P[1...m] = T[s+1...s+m]
                then print "Pattern occurs with shift" s
           if s < n - m
            then t(s+1) ← (d(ts - T[s+1]·h) + T[s+m+1]) mod q
}

Example: For string matching with working modulus q = 11, how many spurious hits does the Rabin-Karp
matcher encounter in the text T = 31415926535 for the pattern P = 26?
Solution: Given T = 31415926535, P = 26, and d = 10; here T.length = 11 and q = 11.
P mod q = 26 mod 11 = 4.
Now we look for windows of the text whose hash equals P mod q. The 2-digit windows
31, 14, 41, 15, 59, 92, 26, 65, 53, 35 hash (mod 11) to 9, 3, 8, 4, 4, 4, 4, 10, 9, 2 respectively.
The windows 15, 59, and 92 hash to 4 without matching P, so the matcher encounters 3 spurious
hits; the true match occurs at shift s = 6.
Analysis:
The running time of RABIN-KARP-MATCHER in the worst case is O((n-m+1)m), but it has a
good average-case running time. If the expected number of valid shifts is small, O(1), and the prime q
is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m),
plus the time required to process spurious hits.
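
A runnable Python sketch of RABIN-KARP-MATCHER with d = 10 and q = 11, matching the
worked example above (0-based shifts; digit strings only, for simplicity):

def rabin_karp_matcher(T, P, d=10, q=11):
    n, m = len(T), len(P)
    h = pow(d, m - 1, q)                      # d^(m-1) mod q
    p = t = 0
    for i in range(m):                        # preprocessing
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    shifts = []
    for s in range(n - m + 1):                # matching
        if p == t and T[s:s + m] == P:        # verify to rule out spurious hits
            shifts.append(s)
        if s < n - m:                         # roll the window hash forward
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return shifts

print(rabin_karp_matcher("31415926535", "26"))   # [6]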

The Knuth-Morris-Pratt (KMP) Algorithm:

Knuth, Morris, and Pratt introduced a linear-time algorithm for the string matching problem. A
matching time of O(n) is achieved by avoiding comparisons with an element of the string 'S' that has
previously been involved in a comparison with some element of the pattern 'p' to be matched; i.e.,
backtracking on the string 'S' never occurs.

Components of KMP Algorithm:


1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates knowledge about how
the pattern matches against shifts of itself. This information can be used to avoid useless shifts
of the pattern 'p'; in other words, it enables avoiding backtracking on the string 'S'.
2. The KMP Matcher: With string 'S', pattern 'p', and prefix function 'Π' as inputs, it finds the
occurrences of 'p' in 'S' and returns the number of shifts of 'p' after which each occurrence is found.

The following pseudocode computes the prefix function Π:

COMPUTE-PREFIX-FUNCTION(P)
{
    m ← length[P]          // 'P' is the pattern to be matched
    Π[1] ← 0
    k ← 0
    for q ← 2 to m
        do while k > 0 and P[k+1] ≠ P[q]
            do k ← Π[k]
           if P[k+1] = P[q]
            then k ← k+1
           Π[q] ← k
    return Π
}

Running Time Analysis:

In the above pseudocode for calculating the prefix function, the for loop runs m-1 times, and the
steps before it take constant time. Hence the running time of computing the prefix function is O(m).

Example: Compute Π for the pattern 'p' below:

Solution:
Initially: m = length[p] = 7
Π[1] = 0
k = 0
After iterating six times, the prefix function computation is complete:
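
A runnable Python version of COMPUTE-PREFIX-FUNCTION, demonstrated on the seven-character
pattern "ababaca" as an assumed stand-in, since the example's pattern is not reproduced
above (Π is returned as a 0-indexed list):

def compute_prefix_function(P):
    m = len(P)
    pi = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi

print(compute_prefix_function("ababaca"))   # [0, 0, 1, 2, 3, 0, 1]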

The KMP Matcher:

The KMP matcher, with pattern 'p', string 'S', and prefix function 'Π' as input, finds a match
of p in S. The following pseudocode computes the matching component of the KMP algorithm:

KMP-MATCHER(T, P)
{
    n ← length[T]
    m ← length[P]
    Π ← COMPUTE-PREFIX-FUNCTION(P)
    q ← 0                                  // number of characters matched
    for i ← 1 to n                         // scan the text from left to right
        do while q > 0 and P[q+1] ≠ T[i]
            do q ← Π[q]                    // next character does not match
           if P[q+1] = T[i]
            then q ← q+1                   // next character matches
           if q = m                        // is all of P matched?
            then print "Pattern occurs with shift" i - m
                 q ← Π[q]                  // look for the next match
}
Running Time Analysis:
The for loop runs n times, i.e., as long as the length of the string 'S'. Since the preceding steps
take constant time, the running time is dominated by this for loop. Thus the running time of the
matching function is O(n).
Example: Given a string 'T' and pattern 'P' as follows:

Let us execute the KMP algorithm to find whether 'P' occurs in 'T'.
For 'p', the prefix function Π was computed previously and is as follows:

Solution:
Initially: n = size of T = 15
m = size of P = 7
Pattern 'P' has been found to occur in the string 'T'. The total number of shifts that took
place for the match to be found is i - m = 13 - 7 = 6 shifts.
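
A runnable Python KMP-MATCHER built on compute_prefix_function above; the text/pattern
pair below is hypothetical, since the example's strings are not reproduced here:

def kmp_matcher(T, P):
    n, m = len(T), len(P)
    pi = compute_prefix_function(P)
    q = 0                               # number of characters matched
    shifts = []
    for i in range(n):                  # scan the text left to right
        while q > 0 and P[q] != T[i]:
            q = pi[q - 1]               # next character does not match
        if P[q] == T[i]:
            q += 1                      # next character matches
        if q == m:                      # all of P matched
            shifts.append(i - m + 1)
            q = pi[q - 1]               # look for the next match
    return shifts

print(kmp_matcher("ababcabab", "abab"))   # [0, 5]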
The Boyer-Moore Algorithm
Robert Boyer and J Strother Moore established it in 1977. The B-M string search algorithm is a
particularly efficient algorithm and has served as a standard benchmark for string search algorithms
ever since. The B-M algorithm takes a 'backward' approach: the pattern string (P) is aligned with
the start of the text string (T), and the characters of the pattern are compared from right to left,
beginning with the rightmost character. If a mismatching character is not within the pattern at all,
no match can be found by examining this position any further, so the pattern can be shifted
entirely past the mismatching character.
For deciding the possible shifts, the B-M algorithm uses two pre-processing strategies simultaneously.
Whenever a mismatch occurs, the algorithm computes a shift using both approaches and
selects the larger one, thus making use of the most effective strategy for each case.
The two strategies are called the heuristics of B-M, as they are used to reduce the search. They are:
1. Bad Character Heuristics
2. Good Suffix Heuristics
1. Bad Character Heuristics
This heuristic has two implications:
o Suppose there is a character in the text which does not occur in the pattern at all. When a
mismatch happens at this character (called the bad character), the whole pattern can be
shifted, and matching begins at the substring next to this 'bad character'.
o On the other hand, the bad character may be present in the pattern; in this case,
align an occurrence of that character in the pattern with the bad character in the text.
Thus in either case the shift may be larger than one.
Example 1: Let text T = <nyoo nyoo> and pattern P = <noyo>.
Example 2: If a bad character doesn't exist in the pattern, the pattern can be shifted entirely past it.

Problem in Bad-Character Heuristics:

In some cases, the bad-character heuristic produces a negative shift.
For example:

This means that we need some extra information to produce a shift on encountering a bad
character. This information concerns the last position of every character in the pattern and also the
set of characters used in the pattern (often called the alphabet ∑ of the pattern).

COMPUTE-LAST-OCCURRENCE-FUNCTION(P, m, ∑)
{
    for each character a ∈ ∑
        do λ[a] ← 0
    for j ← 1 to m
        do λ[P[j]] ← j
    return λ
}

2. Good Suffix Heuristics:

A good suffix is a suffix that has matched successfully. After a mismatch which would give a negative
shift under the bad-character heuristic, we look at whether the substring of the pattern matched so
far contains another occurrence of the good suffix; if it does, we make a forward jump equal to the
offset of that occurrence.
Example:
COMPUTE-GOOD-SUFFIX-FUNCTION(P, m)
{
    Π ← COMPUTE-PREFIX-FUNCTION(P)
    P' ← reverse(P)
    Π' ← COMPUTE-PREFIX-FUNCTION(P')
    for j ← 0 to m
        do ɣ[j] ← m - Π[m]
    for l ← 1 to m
        do j ← m - Π'[l]
           if ɣ[j] > l - Π'[l]
            then ɣ[j] ← l - Π'[l]
    return ɣ
}

BOYER-MOORE-MATCHER(T, P, ∑)
{
    n ← length[T]
    m ← length[P]
    λ ← COMPUTE-LAST-OCCURRENCE-FUNCTION(P, m, ∑)
    ɣ ← COMPUTE-GOOD-SUFFIX-FUNCTION(P, m)
    s ← 0
    while s ≤ n - m
        do j ← m
           while j > 0 and P[j] = T[s+j]
            do j ← j - 1
           if j = 0
            then print "Pattern occurs at shift" s
                 s ← s + ɣ[0]
            else s ← s + max(ɣ[j], j - λ[T[s+j]])
}
Complexity Comparison of String Matching Algorithms:

Algorithm              Preprocessing Time    Matching Time
Naive                  0 (none)              O((n - m + 1)m)
Rabin-Karp             O(m)                  O((n - m + 1)m)
Finite Automata        O(m|∑|)               O(n)
Knuth-Morris-Pratt     O(m)                  O(n)
Boyer-Moore            O(m + |∑|)            O((n - m + 1) + |∑|)