Unit-5 of DAA
NP-Completeness: These problems have their own significance in programming. There are
many problems for which no polynomial-time algorithm is known, such as TSP, Graph
Colouring, and the Knapsack Problem. The theory of NP-Completeness deals with classifying
such problems and with the search for efficient algorithms for them. For some constant k, an
algorithm is called efficient if its worst-case running time is O(n^k) on inputs of size n.
P Class (Polynomial-time solvable): P is the set of problems that can be solved by a deterministic
Turing machine in polynomial time. Algorithms for these problems run in time such as O(n), O(n^2),
or O(n^3). For ex.- all standard sorting algorithms, BFS, DFS, the Boyer-Moore string-matching
algorithm, Kruskal's algorithm, Dijkstra's algorithm, etc.
NP Class (Non-deterministic Polynomial time): NP is the set of problems for which no
polynomial-time deterministic algorithm is known, but which are verifiable in polynomial time:
given a candidate solution of the problem, we can check in polynomial time whether the solution
is correct or not. Equivalently, these problems can be solved in polynomial time by a
non-deterministic Turing machine. P is a subset of NP (any problem that can be solved by a
deterministic machine in polynomial time can also be solved by a non-deterministic machine in
polynomial time). For ex.- the problems typically attacked with backtracking and branch-and-bound.
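To make "verifiable in polynomial time" concrete, here is a minimal Python sketch of a verifier for Subset-Sum, a classic NP problem. The function name and the index-list certificate format are illustrative choices, not part of the notes above.

```python
def verify_subset_sum(numbers, target, certificate):
    """Check in polynomial time whether `certificate` (a list of distinct
    indices into `numbers`) selects numbers that sum to `target`."""
    if len(set(certificate)) != len(certificate):       # indices must be distinct
        return False
    if any(i < 0 or i >= len(numbers) for i in certificate):
        return False                                    # indices must be in range
    return sum(numbers[i] for i in certificate) == target

# Finding a certificate may take exponential time, but checking one is O(n).
print(verify_subset_sum([3, 34, 4, 12, 5, 2], 9, [2, 4]))  # 4 + 5 = 9 -> True
```

This asymmetry, hard to find a solution but easy to check one, is exactly what distinguishes NP from P.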
NP-Complete Class: NP-Complete problems are the problems that lie in both the NP and NP-Hard
classes. This means that an NP-Complete problem can be verified in polynomial time and that any NP
problem can be reduced to it in polynomial time.
NP-Complete problems are the hardest problems in NP. A decision problem L is NP-Complete if:
1) L is in NP (any given solution for an NP-Complete problem can be verified quickly, but no
efficient algorithm to find a solution is known).
2) Every problem in NP is reducible to L in polynomial time (reduction is discussed below).
NP-Hard Class: NP-Hard problems need not have any bound on their running time. If
some NP-Complete problem is polynomial-time reducible to a problem X, then X belongs to the
NP-Hard class. Hence, all NP-Complete problems are also NP-Hard. Conversely, if an NP-Hard
problem is solvable in non-deterministic polynomial time (i.e., it is in NP), it is NP-Complete.
An example of an NP-Hard problem that is not NP-Complete is the Halting Problem.
A Venn diagram of these class spaces (figure omitted) makes the relationship clear: NP-Complete
problems are the hardest problems in NP while being the simplest ones in NP-Hard, i.e.,
NP ∩ NP-Hard = NP-Complete.
Note: A given problem is NP-Complete if and only if it is in NP (shown, for example, by giving a
polynomial-time verifier) and some known NP-Complete problem can be reduced to it (which shows
that all NP problems can be reduced to it). In other words, if an NP-Hard problem is in NP,
then it is NP-Complete.
Learning reduction in general is very important. For example, if we have library functions to solve
certain problem and if we can reduce a new problem to one of the solved problems, we save a lot of
time. Consider the example of a problem where we have to find minimum product path in a given
directed graph where product of path is multiplication of weights of edges along the path. If we have
code for Dijkstra’s algorithm to find shortest path, we can take log of all weights and use Dijkstra’s
algorithm to find the minimum product path rather than writing a fresh code for this new problem.
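The log-transform reduction just described can be sketched in Python. The graph representation and function names below are illustrative assumptions; note the trick needs all edge weights to be at least 1 so that the logs are non-negative, as Dijkstra's algorithm requires.

```python
import heapq, math

def dijkstra(adj, src):
    """Standard Dijkstra on an adjacency dict {u: [(v, w), ...]} with w >= 0."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, math.inf):
            continue                      # stale queue entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, math.inf):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

def min_product_path_cost(adj, src, dst):
    """Reduce min-product path to shortest path: replace each weight w by
    log(w) (assumes w >= 1), run Dijkstra, then exponentiate back."""
    log_adj = {u: [(v, math.log(w)) for v, w in nbrs] for u, nbrs in adj.items()}
    return math.exp(dijkstra(log_adj, src)[dst])

g = {'a': [('b', 2), ('c', 8)], 'b': [('c', 3)], 'c': []}
print(round(min_product_path_cost(g, 'a', 'c')))  # a->b->c: 2*3 = 6, beats 8
```

Because log turns products into sums, the path minimizing the sum of logs is exactly the path minimizing the product of weights.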
How to prove a problem is NP-Complete: From the definition of NP-Complete, it appears
difficult to prove that a problem L is NP-Complete: by definition, it requires us to show that every
problem in NP is polynomial-time reducible to L. Fortunately, there is an alternate way to prove it.
Following are the steps to show a problem L is NP-complete:
1. Prove L is in NP.
i. Describe the certificate,
ii. Describe the verification algorithm,
iii. Show that the certificate is polynomial in size and the algorithm takes polynomial time.
2. Select a problem K known to be NP-complete.
3. Give an algorithm to compute a function f() mapping each instance of K to an instance of L.
4. Prove that x is in K if and only if f(x) is in L, for all strings x.
5. Prove that the algorithm computing f() runs in polynomial time.
Decision vs Optimization Problems: In an optimization problem each feasible (i.e., legal) solution
has an associated value, and we wish to find a feasible solution with the best value, for ex.- the
Shortest-Path problem. In a decision problem, the answer is simply Yes or No.
POLYNOMIALS AND THE FFT: The straightforward method of adding two polynomials of
degree n takes Θ(n) time, but the straightforward method of multiplying them takes Θ(n^2) time.
By using the Fast Fourier Transform, or FFT, we can reduce the time to multiply polynomials to
Θ(n log n).
Polynomials: A polynomial in the variable x over an algebraic field F is a function A(x) that can
be represented as follows:
A(x) = a0 + a1x + a2x^2 + ... + a_{n-1}x^{n-1} = Σ_{j=0}^{n-1} a_j x^j
We call n the degree-bound of the polynomial, and the values a0, a1, ..., a_{n-1} the coefficients
of the polynomial. A polynomial A(x) is said to have degree k if its highest nonzero coefficient
is a_k. The degree of a polynomial of degree-bound n can be any integer between 0 and n-1,
inclusive. Conversely, a polynomial of degree k is a polynomial of degree-bound n for any n > k.
Polynomial Operations:
There are a variety of operations we might wish to define for polynomials:
Polynomial Addition
Polynomial Multiplication
Polynomial Addition: For polynomial addition, if A(x) and B(x) are polynomials of degree-bound
n, we say that their sum is a polynomial C(x), also of degree-bound n, such that
C(x) = A(x) + B(x) for all x. That is, if
A(x) = Σ_{j=0}^{n-1} a_j x^j and B(x) = Σ_{j=0}^{n-1} b_j x^j,
then
C(x) = Σ_{j=0}^{n-1} (a_j + b_j) x^j.
For example, if A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5, then C(x) = 4x^3 + 7x^2 - 6x + 4.
Polynomial Multiplication: For polynomial multiplication, if A(x) and B(x) are polynomials of
degree-bound n, we say that their product C(x) is a polynomial of degree-bound 2n-1 such
that C(x) = A(x)·B(x) for all x. You have probably multiplied polynomials before, by multiplying
each term in A(x) by each term in B(x) and combining terms with equal powers.
For example, we can multiply A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5 as follows:
                      6x^3 +  7x^2 - 10x +  9      degree = 3, degree-bound = 4
                    - 2x^3         +  4x -  5      degree = 3, degree-bound = 4
  ------------------------------------------------
                    - 30x^3 - 35x^2 + 50x - 45
            24x^4 + 28x^3 - 40x^2 + 36x
  - 12x^6 - 14x^5 + 20x^4 - 18x^3
  ------------------------------------------------
  - 12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45      degree = 6, degree-bound = 7
  ------------------------------------------------
Another way to express the product C(x) is
C(x) = Σ_{j=0}^{2n-2} c_j x^j, where c_j = Σ_{k=0}^{j} a_k b_{j-k}.
Ex.- c_4 = a0·b4 + a1·b3 + a2·b2 + a3·b1 + a4·b0
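The coefficient formula above translates directly into a Θ(n^2) convolution; this is the straightforward method that the FFT later improves on. The function name and the coefficient-list representation (lowest degree first) are illustrative choices.

```python
def poly_mult(a, b):
    """Multiply polynomials given as coefficient lists (a[j] is the
    coefficient of x^j) by direct convolution: c_j = sum_k a_k * b_{j-k}."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk          # each product contributes to x^(j+k)
    return c

# A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5, lowest degree first
A = [9, -10, 7, 6]
B = [-5, 4, 0, -2]
print(poly_mult(A, B))  # [-45, 86, -75, -20, 44, -14, -12]
```

The output matches the long multiplication worked out above: C(x) = -12x^6 - 14x^5 + 44x^4 - 20x^3 - 75x^2 + 86x - 45.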
Horner’s Rule: We can evaluate a polynomial quickly by using Horner’s rule. Instead
of computing the terms individually, we compute: A(x) = a0 + x(a1 + x(a2 + ... + x(a_{n-2} + x·a_{n-1})...))
For example- A(3) = a0 + 3(a1 + 3(a2 + ... + 3(a_{n-2} + 3·a_{n-1})...))
This method requires only O(n) operations, so evaluation takes Θ(n) time.
Algorithm:
Evaluate_Horner(A, n, x)
{
t = A[n-1]
for i = n-2 down to 0, do
    t = t*x + A[i]
return t
}
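A runnable Python version of the pseudocode above might look like this (the name and the coefficient-list convention, lowest degree first, are illustrative):

```python
def horner(coeffs, x):
    """Evaluate A(x) = a0 + a1*x + ... + a_{n-1}*x^(n-1) by Horner's rule.
    `coeffs` lists a0..a_{n-1}; uses exactly n-1 multiplications."""
    t = 0
    for a in reversed(coeffs):   # work from a_{n-1} inward
        t = t * x + a
    return t

# A(x) = 6x^3 + 7x^2 - 10x + 9 at x = 2: 48 + 28 - 20 + 9 = 65
print(horner([9, -10, 7, 6], 2))  # 65
```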
Approximation Algorithms: An approximation algorithm finds a solution guaranteed to be close
to optimal. For the traveling-salesperson problem, the optimization problem is to find the
shortest cycle, and the approximation problem is to find a short cycle. For the vertex-cover
problem, the optimization problem is to find the vertex cover with the fewest vertices, and the
approximation problem is to find a vertex cover with few vertices. We consider three such problems:
i. Vertex-Cover Problem
ii. Set-Cover Problem
iii. Travelling Salesman Problem
Vertex-Cover: A vertex cover of an undirected graph G is a set of vertices such that each edge
in G is incident to at least one of these vertices; the size of a vertex cover is the number of
vertices in it. The decision version of the vertex-cover problem was proven NP-Complete. Even
though it may be difficult to find an optimal vertex cover in a graph G, it is not too hard to
find a vertex cover that is near-optimal. Here we want to solve the optimization version of the
vertex-cover problem, i.e., to find a minimum-size vertex cover of a given graph. We call such a
vertex cover an optimal vertex cover C*.
The following approximation algorithm takes as input an undirected graph G and returns a vertex
cover whose size is guaranteed to be no more than twice the size of an optimal vertex cover.
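The pseudocode itself did not survive in these notes; a minimal Python sketch of the standard 2-approximation (repeatedly pick an uncovered edge and add both of its endpoints) could look like:

```python
def approx_vertex_cover(edges):
    """2-approximation for vertex cover: for each edge not yet covered,
    add BOTH endpoints to the cover and implicitly discard every edge
    incident to either of them."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:   # edge still uncovered
            cover.add(u)
            cover.add(v)
    return cover

edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
c = approx_vertex_cover(edges)
print(all(u in c or v in c for u, v in edges))  # True: every edge is covered
```

Any optimal cover must contain at least one endpoint of each edge we picked, and we add at most two vertices per picked edge, which is exactly where the factor-2 guarantee comes from.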
Set-Cover: The set-cover problem is a very important optimization problem. In this problem you
are given a pair (X, F), where X = {x1, x2, ..., xm} is a finite set of elements and
F = {S1, S2, ..., Sn} is a family of subsets of X, such that every element of X belongs to at
least one set of F.
Given a subset C of F, we say that C covers X if every element of X is in some set of C.
The problem is to find a minimum-size subset C of F that covers X.
The following greedy approximation algorithm takes as input a finite set of elements X and a
family of subsets F, and returns a set cover whose size is guaranteed to be within a factor of
H(|X|) ≈ ln|X| of the size of an optimal set cover.
Approx_Set_Cover(X, F)
{
U = X
C = empty set
while U is not empty, do
    select an S in F that maximizes |S ∩ U|
    U = U - S
    C = C ∪ {S}
return C
}
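A runnable Python version of this greedy procedure, applied to the 12-point instance worked out in the example below, might be:

```python
def greedy_set_cover(X, F):
    """Greedy set cover: repeatedly pick the set covering the most
    still-uncovered elements (max() keeps the first set on ties)."""
    U = set(X)
    cover = []
    while U:
        best = max(F, key=lambda S: len(S & U))  # most new elements covered
        U -= best
        cover.append(best)
    return cover

# The instance from the example below
T1, T2, T3 = {1, 2, 3, 4, 5, 6}, {5, 6, 8, 9}, {1, 4, 7, 10}
T4, T5, T6 = {2, 5, 7, 8, 11}, {3, 6, 9, 12}, {10, 11}
picked = greedy_set_cover(range(1, 13), [T1, T2, T3, T4, T5, T6])
# picks T1, T4, T5, T3 in that order, matching the hand computation
```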
Ex.- Consider the instance (X, F) of the set-cover problem (figure omitted) in which X
consists of 12 black points and F = {T1, T2, T3, T4, T5, T6}.
Sol.: Let the points of X be numbered from 1 to 12 in row-major order. Then
U = X = {1,2,3,4,5,6,7,8,9,10,11,12}, C = {}, and
T1 = {1,2,3,4,5,6}, T2 = {5,6,8,9}, T3 = {1,4,7,10},
T4 = {2,5,7,8,11}, T5 = {3,6,9,12}, T6 = {10,11}
Step 1: |T1 ∩ U| = |{1,2,3,4,5,6}| = 6
|T2 ∩ U| = |{5,6,8,9}| = 4
|T3 ∩ U| = |{1,4,7,10}| = 4
|T4 ∩ U| = |{2,5,7,8,11}| = 5
|T5 ∩ U| = |{3,6,9,12}| = 4
|T6 ∩ U| = |{10,11}| = 2
Select T1 as it maximizes |S ∩ U|. Hence C = {T1}
Now U = U - T1 = {7,8,9,10,11,12}
Step 2: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{8,9}| = 2
|T3 ∩ U| = |{7,10}| = 2
|T4 ∩ U| = |{7,8,11}| = 3
|T5 ∩ U| = |{9,12}| = 2
|T6 ∩ U| = |{10,11}| = 2
Select T4 as it maximizes |S ∩ U|. Hence C = {T1, T4}
Now U = U - T4 = {9,10,12}
Step 3: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{9}| = 1
|T3 ∩ U| = |{10}| = 1
|T4 ∩ U| = |{}| = 0
|T5 ∩ U| = |{9,12}| = 2
|T6 ∩ U| = |{10}| = 1
Select T5 as it maximizes |S ∩ U|. Hence C = {T1, T4, T5}
Now U = U - T5 = {10}
Step 4: |T1 ∩ U| = |{}| = 0
|T2 ∩ U| = |{}| = 0
|T3 ∩ U| = |{10}| = 1
|T4 ∩ U| = |{}| = 0
|T5 ∩ U| = |{}| = 0
|T6 ∩ U| = |{10}| = 1
Select T3 (tied with T6). Hence C = {T1, T4, T5, T3}
Now U = U - T3 = {}
Hence, the approximate set cover for the given problem is {T1, T3, T4, T5}, while an
optimal set cover is {T3, T4, T5}.
The greedy algorithm uses 4 sets against the optimal 3, well within its H(|X|) guarantee.
Traveling-Salesman Problem:
In the traveling-salesman problem, a salesman must visit n cities. We can say that the salesman
wishes to make a tour, i.e., a Hamiltonian cycle, visiting each city exactly once and finishing
at the city he starts from. There is a non-negative cost c(i, j) to travel from city i to city j.
The goal is to find a tour of minimum total cost. We assume that every pair of cities is
connected; such problems are called traveling-salesman problems (TSP).
We can model the cities as a complete graph on n vertices, where each vertex represents a city.
It can be shown that TSP is NP-Complete.
If we assume the cost function c satisfies the triangle inequality, then we can use the following
approximation algorithm.
Triangle inequality: For any three vertices u, v, w, we have c(u, w) ≤ c(u, v) + c(v, w).
One important observation for developing an approximate solution: if we remove an edge from an
optimal tour H*, what remains is a spanning tree, so the cost of a minimum spanning tree is a
lower bound on the cost of H*.
Approx-TSP (G = (V, E))
{
1. Compute an MST T of G using Prim's algorithm;                O(E log V)
2. Select any vertex r as the root of the tree;                 O(1)
3. Let L be the list of vertices visited in a preorder tree walk of T;   O(V)
4. Return the Hamiltonian cycle H that visits the vertices in the order L;   O(V)
}
Example: (figure omitted)
Intuitively, Approx-TSP first makes a full walk of the MST T, which visits each edge exactly
twice. To create a Hamiltonian cycle from the full walk, it bypasses vertices already visited
(which corresponds to taking a shortcut); by the triangle inequality, these shortcuts never
increase the cost.
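A Python sketch of Approx-TSP under the assumptions above (complete graph, metric costs) might use Euclidean distances between 2-D points as an illustrative cost function; the structure mirrors the four pseudocode steps.

```python
import heapq, math

def approx_tsp(points):
    """Metric TSP 2-approximation: build an MST with Prim's algorithm,
    preorder-walk it from vertex 0, and return the walk as a cycle."""
    n = len(points)
    dist = lambda i, j: math.dist(points[i], points[j])
    # Step 1: Prim's algorithm rooted at vertex 0, recording tree children
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    pq = [(0.0, 0, -1)]                 # (edge cost, vertex, parent), -1 = root
    while pq:
        d, u, p = heapq.heappop(pq)
        if in_tree[u]:
            continue
        in_tree[u] = True
        if p >= 0:
            children[p].append(u)
        for v in range(n):
            if not in_tree[v]:
                heapq.heappush(pq, (dist(u, v), v, u))
    # Steps 2-4: preorder walk of the MST gives the visit order
    order, stack = [], [0]
    while stack:
        u = stack.pop()
        order.append(u)
        stack.extend(reversed(children[u]))
    return order + [0]                  # return to the start city

pts = [(0, 0), (0, 2), (2, 2), (2, 0)]
print(approx_tsp(pts))                  # a Hamiltonian cycle over all 4 cities
```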
The Naive String-Matching Algorithm: The naive approach tests all possible placements of
pattern P[1..m] relative to text T[1..n]. We try the shifts s = 0, 1, ..., n-m successively, and
for each shift s we compare T[s+1..s+m] to P[1..m]. The algorithm finds all valid shifts using a
loop that checks the condition P[1..m] = T[s+1..s+m] for each of the n-m+1 possible values of s.
NAIVE-STRING-MATCHER (T, P)
{
n ← length[T]
m ← length[P]
for s ← 0 to n - m
    do if P[1..m] = T[s+1..s+m]
        then print "Pattern occurs with shift" s
}
Analysis: The for loop executes n-m+1 times (we need at least m characters remaining at the end),
and each iteration does up to m comparisons. So the total complexity is O((n-m+1)m).
Example: Suppose T = 1011101110 and P = 111. Find all the valid shifts.
Solution: P matches T at shifts s = 2 (T[3..5] = 111) and s = 6 (T[7..9] = 111), so the valid
shifts are 2 and 6.
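A direct Python rendering of the naive matcher (using 0-based slicing in place of the 1-based pseudocode) is:

```python
def naive_string_matcher(T, P):
    """Try every shift s = 0..n-m and compare P to T[s:s+m] directly."""
    n, m = len(T), len(P)
    return [s for s in range(n - m + 1) if T[s:s + m] == P]

# The example above: T = 1011101110, P = 111
print(naive_string_matcher("1011101110", "111"))  # [2, 6]
```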
The Rabin-Karp Algorithm:
The Rabin-Karp string-matching algorithm calculates a hash value for the pattern, as well as for
each m-character subsequence of the text. If the hash values are unequal, the algorithm simply
moves on to the next m-character subsequence. Only if the hash values are equal does the
algorithm compare the pattern and the subsequence character by character. In this way, there is
only one hash comparison per text subsequence, and character matching is required only when the
hash values match.
RABIN-KARP-MATCHER (T, P, d, q)
{
n ← length[T]
m ← length[P]
h ← d^(m-1) mod q
p ← 0
t0 ← 0
for i ← 1 to m                          // preprocessing
    do p ← (d·p + P[i]) mod q
       t0 ← (d·t0 + T[i]) mod q
for s ← 0 to n - m                      // matching
    do if p = ts
        then if P[1..m] = T[s+1..s+m]
            then print "Pattern occurs with shift" s
       if s < n - m
        then t_{s+1} ← (d·(t_s - T[s+1]·h) + T[s+m+1]) mod q
}
Example: Working modulo q = 11, how many spurious hits does the Rabin-Karp matcher
encounter in the text T = 31415926535 for the pattern P = 26?
Solution: Given T = 31415926535, P = 26, d = 10, and q = 11.
P mod q = 26 mod 11 = 4.
The successive 2-digit windows of T are 31, 14, 41, 15, 59, 92, 26, 65, 53, 35, whose values
mod 11 are 9, 3, 8, 4, 4, 4, 4, 10, 9, 2.
The hash value 4 occurs at shifts 3, 4, 5, and 6. Only shift 6 (window 26) is an actual match,
so the windows 15, 59, and 92 are spurious hits: 3 spurious hits in total.
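A Python sketch of the matcher, instrumented to count spurious hits for exactly this kind of example, might be (restricting the text to decimal digits is an illustrative simplification):

```python
def rabin_karp(T, P, d=10, q=11):
    """Rolling-hash matcher over digit strings; returns (valid shifts,
    spurious hit count). A spurious hit is a shift where the hashes match
    but the strings differ."""
    n, m = len(T), len(P)
    h = pow(d, m - 1, q)                    # d^(m-1) mod q
    p = t = 0
    for i in range(m):                      # preprocessing
        p = (d * p + int(P[i])) % q
        t = (d * t + int(T[i])) % q
    valid, spurious = [], 0
    for s in range(n - m + 1):              # matching
        if p == t:
            if T[s:s + m] == P:
                valid.append(s)
            else:
                spurious += 1
        if s < n - m:                       # roll the hash one position right
            t = (d * (t - int(T[s]) * h) + int(T[s + m])) % q
    return valid, spurious

print(rabin_karp("31415926535", "26"))  # ([6], 3)
```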
Analysis:
The running time of RABIN-KARP-MATCHER in the worst case is O((n-m+1)m), but it has a good
average-case running time. If the expected number of valid shifts is small, say O(1), and the
prime q is chosen to be sufficiently large, then the Rabin-Karp algorithm can be expected to run
in time O(n+m) plus the time required to process spurious hits.
The Knuth-Morris-Pratt (KMP) Algorithm: KMP preprocesses the pattern to build a prefix
function π, where π[q] is the length of the longest proper prefix of P that is also a suffix of
P[1..q]; this lets the matcher avoid re-examining characters of the text.
Example (the pattern, text, and tables of this worked example are omitted here): for a pattern
of length m = 7, the prefix-function computation starts with π[1] = 0 and k = 0, and after
iterating over the remaining 6 positions the prefix function is complete. Executing the KMP
matcher on a text of size n = 15 with this pattern P finds that P occurs in T; the shift at
which the match is found is i - m = 13 - 7 = 6.
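Since the tables of the worked example did not survive, here is a self-contained Python sketch of the prefix function and the KMP matcher; the pattern "ababaca" and the text below are illustrative choices, not the strings from the original example.

```python
def compute_prefix_function(P):
    """pi[q] = length of the longest proper prefix of P that is also a
    suffix of P[0..q] (0-based). Runs in O(m) amortized time."""
    pi = [0] * len(P)
    k = 0
    for q in range(1, len(P)):
        while k > 0 and P[k] != P[q]:
            k = pi[k - 1]               # fall back to a shorter border
        if P[k] == P[q]:
            k += 1
        pi[q] = k
    return pi

def kmp_matcher(T, P):
    """Scan T once, using pi to avoid re-examining matched characters."""
    pi = compute_prefix_function(P)
    shifts, q = [], 0
    for i, c in enumerate(T):
        while q > 0 and P[q] != c:
            q = pi[q - 1]
        if P[q] == c:
            q += 1
        if q == len(P):                 # full match ending at position i
            shifts.append(i - len(P) + 1)
            q = pi[q - 1]
    return shifts

print(compute_prefix_function("ababaca"))        # [0, 0, 1, 2, 3, 0, 1]
print(kmp_matcher("bacbababaabcbab", "ababa"))   # [4]
```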
The Boyer-Moore Algorithm
Robert Boyer and J Strother Moore published it in 1977. The B-M string-search algorithm is a
particularly efficient algorithm and has served as a standard benchmark for string-search
algorithms ever since. The B-M algorithm takes a 'backward' approach: the pattern string P is
aligned with the start of the text string T, and then the characters of the pattern are compared
from right to left, beginning with the rightmost character. If a mismatch occurs at a text
character that does not appear in the pattern at all, no match can be found at this position,
so the pattern can be shifted entirely past the mismatching character.
To decide the possible shifts, the B-M algorithm uses two preprocessing strategies
simultaneously. Whenever a mismatch occurs, the algorithm calculates a shift using both
approaches and selects the larger of the two, thus making use of the most effective strategy in
each case. The two strategies are called the heuristics of B-M, as they are used to reduce the
search. They are:
1. Bad-Character Heuristic
2. Good-Suffix Heuristic
1. Bad-Character Heuristic
This heuristic has two implications:
o Suppose there is a character in the text that does not occur in the pattern at all. When a
mismatch happens at this character (called the bad character), the whole pattern can be
shifted past it, and matching begins at the substring following the bad character.
o On the other hand, the bad character may be present in the pattern. In this case, align the
last occurrence of that character in the pattern with the bad character in the text.
Thus in either case the shift may be greater than one.
Example 1: Let text T = <nyoo nyoo> and pattern P = <noyo>. (Worked figure omitted.)
Example 2: If the bad character does not exist in the pattern, the pattern can be shifted
completely past it.
This means that we need some extra information to produce a shift on encountering a bad
character: the last position of every character in the pattern, over the set of characters used
in the pattern (often called the alphabet Σ of the pattern).
COMPUTE-LAST-OCCURRENCE-FUNCTION (P, m, Σ)
{
for each character a ∈ Σ
    do λ[a] ← 0
for j ← 1 to m
    do λ[P[j]] ← j
return λ
}
BOYER-MOORE-MATCHER (T, P, Σ)
{
n ← length[T]
m ← length[P]
λ ← COMPUTE-LAST-OCCURRENCE-FUNCTION (P, m, Σ)
γ ← COMPUTE-GOOD-SUFFIX-FUNCTION (P, m)
s ← 0
while s ≤ n - m
    do j ← m
       while j > 0 and P[j] = T[s+j]
           do j ← j - 1
       if j = 0
           then print "Pattern occurs at shift" s
                s ← s + γ[0]
           else s ← s + max(γ[j], j - λ[T[s+j]])
}
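A Python sketch using only the bad-character heuristic might look as follows; the good-suffix table γ is omitted for brevity, so every shift falls back to at least 1 (this is an illustrative simplification, not the full matcher above).

```python
def boyer_moore_bad_char(T, P):
    """Boyer-Moore search with the bad-character heuristic only.
    last[c] is the rightmost 0-based index of c in P, or -1 if c is
    absent from P (so a mismatch on c shifts the pattern past it)."""
    last = {c: -1 for c in set(T) | set(P)}
    for j, c in enumerate(P):
        last[c] = j
    n, m = len(T), len(P)
    shifts, s = [], 0
    while s <= n - m:
        j = m - 1
        while j >= 0 and P[j] == T[s + j]:   # compare right to left
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1                           # full match; advance minimally
        else:
            # align last occurrence of the bad character, or move past it;
            # max(1, ...) guarantees the search always advances
            s += max(1, j - last[T[s + j]])
    return shifts

print(boyer_moore_bad_char("here is a simple example", "example"))  # [17]
```

In the full algorithm the mismatch shift is max(γ[j], j - λ[T[s+j]]), so the good-suffix heuristic can only make the shifts larger.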