Unit-5 NP-Completeness Theory
− Computational complexity theory studies bounds on the efficiency of algorithms, usually in a
pessimistic (worst-case) way, to describe the difficulty level of the problems they solve.
− It categorizes numerous computational problems based on their inherent difficulty and
defines the relations between them.
− It identifies different models of efficient computation, their strengths and weaknesses, and
their relationships.
GQ. Explain polynomial and non-deterministic polynomial time problems. Explain its
Computational complexity. (4 Marks)
GQ. What are deterministic and non-deterministic algorithms? Explain with an
example. (4 Marks)
Optimization problem
− It is a computational problem that determines the optimal value of a specified cost function.
Decision problem
− It is a restricted type of computational problem that produces only two possible outputs
(“yes” or “no”) for each input.
− A decision problem only verifies whether the input satisfies a specific property.
− E.g. Primality test, Hamiltonian cycle: Whether a given graph has any Hamiltonian cycle
in it?
− A decision algorithm answers the correct truth value TRUE or FALSE for each input
instance of a given decision problem in a finite amount of time.
− The decision problems are no harder to solve than the corresponding optimization problems.
− If a decision problem is “hard” to solve, then it implies that the corresponding optimization
problem is also “hard”. Thus, the theory of computational complexity is built around the
decision problems that have implications for optimization problems.
− An optimization problem can be formulated as a decision problem.
− E.g. Clique decision problem: Does a given graph have a clique of size at least s, for some
given s?
Decidable problem
− A decision problem is said to be decidable if there exists a decision algorithm to solve it.
− The decidable problems get the correct TRUE or FALSE answer for a given input either
in polynomial time or in non-polynomial time. Thus, the decidable problems can be either
tractable or intractable.
− E.g. Primality test, TSP, knapsack problem, Hamiltonian cycle problem.
Undecidable problem
− A decision problem is said to be undecidable if there is no decision algorithm to solve it.
− No algorithm gives the correct “TRUE” or “FALSE” answer for every input of such a
problem. E.g. the halting problem.
Tractability
− It defines the feasibility of an algorithm to complete its execution in a reasonable time.
− The problems are said to be tractable if they get solved in polynomial time using
deterministic algorithms.
− E.g. Time complexities O(n^2), O(n^3), O(1), O(n log n).
− Some examples of the tractable problems: linear search: O(n), binary search: O(log n),
merge sort: O(n log n), Prim’s algorithm: O(n^2).
Intractability
− It defines the infeasibility of an algorithm to complete its execution in a reasonable time.
− The problems are said to be intractable if they cannot be solved in polynomial time using
deterministic algorithms.
− There are some intractable but decidable problems with exponential runtime. E.g. Time
complexities: O(2^n), O(n^n), O(n!).
− Some examples of problems for which only exponential-time algorithms are known: 0/1
knapsack problem: O(2^(n/2)), TSP: O(n^2 2^n), m-colouring problem: O(n m^n).
Deterministic algorithm
− The algorithm is said to be deterministic if it generates the same result for the same set of
inputs.
− The deterministic algorithms uniquely define the outcomes for specific legitimate input.
− Conventional computer programs that use no random or external input are deterministic.
− E.g. Addition of first n numbers, sorting algorithms, searching algorithms.
Non-deterministic algorithm
− The algorithm is said to be non-deterministic if it generates different results for the same
set of inputs on different executions.
− The non-deterministic algorithms arbitrarily define the outcomes for specific legitimate
input.
− Though there is a degree of randomness to the outcomes of a non-deterministic algorithm,
Complexity classes
− In the theory of computational complexity, the computational problems are classified as
per their complex nature and the requirements of computing resources. These classes of
problems are known as complexity classes.
− The problems with a similar range of time and space requirements to get their solutions are
included in a single complexity class.
− Complexity classes help in the organization of similar types of problems.
− The major four complexity classes are:
(1) P-Class problems,
(2) NP-Class problems,
(3) NP-Hard problems and
(4) NP-Complete problems.
We consider an example of the searching algorithm that searches for an element in a given
list of elements.
Problem description
− Here, a function Choice(1, n) randomly selects any index in the range [1, n].
− If an element y is found at this randomly selected index i in an array X then the algorithm
has a successful execution indicating the same by a function Success( ).
− However, if an element y is not found at this randomly chosen index i in an array X then
the algorithm has an unsuccessful execution indicating the same by a function Failure( ).
− Complexity analysis of the non-deterministic searching algorithm: Each of the three
functions Choice(1, n), Success( ), and Failure( ) takes a constant execution time O(1).
− Hence non-deterministic complexity of this non-deterministic searching algorithm is O(1).
− Every deterministic searching algorithm has time complexity Ω(n) to search an element in
an unordered set of n elements.
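The contrast above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, indices are 0-based rather than the [1, n] range used in the text, and a real machine can only simulate Choice(1, n) by trying every possible guess.

```python
from typing import Optional, Sequence


def nd_search_accepts(X: Sequence[int], y: int, i: int) -> bool:
    """One computation path: the guessed index i plays the role of Choice(1, n).

    Returning True corresponds to Success( ), False to Failure( ); the work is O(1).
    """
    return X[i] == y


def simulate_nd_search(X: Sequence[int], y: int) -> Optional[int]:
    """Deterministic simulation: try every possible guess, n checks in the worst case."""
    for i in range(len(X)):
        if nd_search_accepts(X, y, i):
            return i
    return None


print(simulate_nd_search([7, 3, 9, 4], 9))  # 2
print(simulate_nd_search([7, 3, 9, 4], 5))  # None
```

Each single path costs O(1), which is the non-deterministic complexity; the loop over all guesses is what makes the deterministic simulation linear.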
Ans. :
− Definitions of a deterministic and non-deterministic algorithm :
− Refer to the definitions mentioned in section 5.1.1 Basic Terminologies of Computational
Complexity.
− E.g. A deterministic searching algorithm and a non-deterministic searching algorithm.
− An example of a deterministic algorithm :
− Refer to section 5.2.1 Deterministic searching algorithm.
− An example of a non-deterministic algorithm :
− Refer to section 5.2.1 Non-deterministic searching algorithm.
GQ. Write a brief note on NP-completeness and the classes P, NP, and NPC. (7 Marks)
GQ. Define P and NP problems. Also, give an example of each type of problem. (4 Marks)
GQ. Define NP-Complete and NP-Hard problems. Also, give examples. (4 Marks)
1. P-Class
− It is the class of decision problems that can be solved in polynomial time by deterministic
algorithms.
A polynomial-time algorithm
− It is an algorithm whose running time is polynomially dependent on the input size of a
problem instance.
− Thus, a polynomial-time algorithm for a problem instance of size n has worst-case
complexity O(p(n)), where p(n) is a polynomial in the input size n. E.g. Time complexities
O(n^2), O(n^3), O(1), O(n log n).
− “P” stands for Polynomial time.
− P-Class problems are decidable and tractable problems.
− Some examples of P-Class problems: linear search: O(n), binary search: O(log n), merge
sort: O(n log n), Prim’s algorithm: O(n^2), Floyd-Warshall’s algorithm: O(n^3).
2. NP-Class
− It is the class of decision problems that can be solved in polynomial time by
non-deterministic algorithms. Equivalently, a proposed solution to such a problem can be
verified in polynomial time by a deterministic algorithm.
− “NP” stands for Non-deterministic Polynomial time and not for Non-polynomial time.
3. NP-Hard Class
4. NP-Complete Class
The relationship among P, NP, NP-Hard and NP-Complete problems (assuming P ≠ NP)
is depicted in Fig. 5.3.1.
Fig. 5.3.1: Generally assumed relationship among P, NP, NP-Hard, and NP-Complete
problems
− Suppose P1 and P2 are two decision problems and problem P2 can be solved by a
polynomial-time deterministic algorithm. P1 is polynomially reducible to P2 if there is a
function f that transforms instances of P1 into instances of P2 such that:
o Function f maps all “yes” or “accept” instances of P1 to “yes” or “accept” instances
of P2 and all “no” or “reject” instances of P1 to “no” or “reject” instances of P2.
o Function f can be computed using a polynomial-time algorithm.
− Thus, through polynomial-time reduction one can prove the easiness of computations for
problem P1 by using the easiness of computations for problem P2.
Two problems P1 and P2 are polynomially equivalent iff P1 ∝ P2 and P2 ∝ P1 (or it can be
also written as iff P1 ≤P P2 and P2 ≤P P1).
Proof
− Let us assume a problem P2 can be solved by a known polynomial-time algorithm.
(i) By applying polynomial-time reduction depicted in Fig. 5.4.1 we would have a solution to
P1 using a polynomial-time solution to P2. But it contradicts the known fact of the non-
existence of any polynomial-time algorithm to solve P1.
GQ. Which are the three major concepts used to show that a problem is an NP-
− If a decision problem P1 is known to be NP-Complete, then one can prove that any other
− Thus, to prove the NP-Completeness of any decision problem there must be the “first” NP-
Complete problem.
− The circuit-satisfiability problem is the first proven NP-Complete problem and hence
it is used to prove the NP-Completeness of any other decision problem by referring to the
satisfiability problem determines some set of inputs to this Boolean circuit to produce the
Proofs for NP-Hard and NP-Complete problems are based on the theory of reducibility.
− We know that an NP-Hard problem is at least as hard as the hardest NP-Class problem and
P1 is NP-Hard.
− As per Cook's theorem, every problem in NP, and hence every NP-Complete problem, is
polynomially reducible to the “satisfiability” problem.
− Since the NP-Complete class = NP-Class ∩ NP-Hard class, if P1 ∈ NP and P1 ∈ NP-Hard, then
P1 is NP-Complete.
− In 1971, Stephen Cook proved for the first time that the satisfiability problem is NP-Complete.
Problem description
variables x1, x2, .... and the operations OR, AND, NOT. E.g. ( x1 x2) (x3 x4) where
− The expression is ‘satisfiable’ if there is some set of truth assignments to the variables in
it that makes the expression TRUE. E.g. [¬x1 ∧ x2] is satisfied if x1 = FALSE and x2 =
TRUE.
The satisfiability (SAT) problem is to ascertain whether the given Boolean expression is
satisfiable. E.g. (¬x1 ∧ x1) is unsatisfiable for any truth value of x1.
The CNF-satisfiability problem is the SAT problem for Conjunctive Normal Form
(CNF) expressions, which are represented as $\bigwedge_{i=1}^{m} C_i$, where the Ci’s are the
clauses, each expressed as $\bigvee_{j} b_{ij}$; m is the number of clauses and the bij’s are
literals (variables or their negations)
The 3-SAT problem is generalized to the k-SAT problem in which the CNF formula
consists of clauses with at most k literals in each clause.
The DNF-satisfiability problem is the SAT problem for Disjunctive Normal Form (DNF)
expressions, which are represented as $\bigvee_{i=1}^{m} C_i$, where the Ci’s are the clauses,
each expressed as $\bigwedge_{j} b_{ij}$; m is the number of clauses, the bij’s are literals and
j ranges over the literals in each clause.
− The DNF-SAT problem has a trivial solution: a DNF expression is satisfiable if at least one
of its clauses is satisfiable, and a clause is satisfiable if and only if it does not have both
xi and ¬xi for some variable xi in it.
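The trivial DNF-SAT test can be sketched as follows. The integer encoding of literals (+i for xi, -i for its negation) and the function names are assumptions made for this illustration, not notation from the text.

```python
from typing import List

Clause = List[int]  # assumed encoding: +i stands for xi, -i stands for NOT xi


def dnf_clause_satisfiable(clause: Clause) -> bool:
    """An AND-clause is satisfiable iff it never contains both xi and NOT xi."""
    literals = set(clause)
    return all(-lit not in literals for lit in literals)


def dnf_satisfiable(clauses: List[Clause]) -> bool:
    """A DNF expression is satisfiable iff at least one of its clauses is."""
    return any(dnf_clause_satisfiable(c) for c in clauses)


# (x1 AND NOT x1) OR (x2 AND NOT x3): the second clause can be made TRUE.
print(dnf_satisfiable([[1, -1], [2, -3]]))      # True
# (x1 AND NOT x1) OR (x2 AND NOT x2 AND x3): every clause is contradictory.
print(dnf_satisfiable([[1, -1], [2, -2, 3]]))   # False
```

The test reads each literal a constant number of times, so it runs in time linear in the length of the expression.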
Non-deterministic algorithm
− The non-deterministic algorithm for a 3-SAT problem randomly assigns a TRUE or FALSE
value to each of the variables in the given Boolean propositional expression.
− Thus, for an expression E(x1, x2, x3) over 3 variables, it non-deterministically determines
one of the 2^3 possible truth assignments to x1, x2, x3 and checks whether E(x1, x2, x3) is
TRUE for that random choice of truth assignments to x1, x2, and x3.
Complexity analysis
− To make a random choice of truth assignments to n variables (x1, x2, ..., xn), the
non-deterministic time O(n) is needed. So to make a random choice of truth assignments to 3
variables (x1, x2, x3), the non-deterministic time O(3), i.e. constant time O(1), is needed.
− To verify the satisfiability of an expression E(x1, x2, x3) for the random choice of truth
assignments to x1, x2, and x3, the required deterministic time is directly proportional to the
length of E. For an expression of constant length 3, this is constant time O(1).
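The guess-and-verify scheme above can be sketched in Python. The clause encoding (+i for xi, -i for its negation), the function names and the sample expression E are assumptions made for this illustration; the loop over all 2^n assignments is a deterministic stand-in for the non-deterministic choice.

```python
from itertools import product
from typing import Dict, List, Optional

Clause = List[int]  # assumed encoding: +i stands for xi, -i stands for NOT xi


def cnf_true(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Verification stage: every clause must contain at least one TRUE literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def simulate_nd_sat(clauses: List[Clause], n: int) -> Optional[Dict[int, bool]]:
    """Deterministic simulation of the guess: try all 2^n truth assignments."""
    for values in product([False, True], repeat=n):
        assignment = dict(zip(range(1, n + 1), values))
        if cnf_true(clauses, assignment):
            return assignment
    return None


# Hypothetical 3-CNF: (x1 v x2 v ~x3) ^ (~x1 v x2 v x3) ^ (x1 v ~x2 v ~x3)
E = [[1, 2, -3], [-1, 2, 3], [1, -2, -3]]
print(simulate_nd_sat(E, 3))
print(simulate_nd_sat([[1], [-1]], 1))  # None: x1 AND NOT x1 is unsatisfiable
```

The verification in cnf_true is linear in the length of the expression; only the simulation of the guess is exponential.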
GQ. What is Boolean Satisfiability Problem? Explain the 3-SAT problem. Prove 3-
SAT is NP-Complete. (8 Marks)
GQ. Explain in brief the NP-Complete problem. Prove that the 3-SAT problem is NP-
Complete. (8 Marks)
Proof
− For a given 3-CNF expression, we can randomly choose a set of truth assignments to the
variables in it.
− Then in polynomial time, we can verify that the given 3-CNF formula evaluates to TRUE
for that chosen set of truth assignments. Hence, 3-SAT ∈ NP.
Step 1 :
− We represent a propositional Boolean expression E1 as a binary “parse” tree in which
leaves represent literals and internal nodes represent connectives.
− Using the property of associativity, we fully parenthesize E1 such that each internal node
in the corresponding parse tree has 1 or 2 children.
− Let yi represent the output of the i-th internal node in the parse tree, with y1 at the root.
− We express E1 as the AND of the output y1 at the root and the conjunction of the clauses
describing the operation at each internal node. Let the resultant expression be E2.
− E.g. Let E1 = ¬x1 ∧ x2 ∧ (x3 → x4)
o We parenthesize E1 to get,
E1 = ¬x1 ∧ (x2 ∧ (x3 → x4))
Fig. 5.5.1
Step 2 :
− We convert each clause in the resultant expression E2 into CNF.
− For the same, we evaluate all truth assignments to all variables in E2 using a truth table.
− Based on it we construct a Disjunctive Normal Form (DNF) for truth assignments
evaluating to 0.
− Then we convert this DNF formula into a CNF formula using DeMorgan’s laws :
(1) ¬(a ∧ b) = ¬a ∨ ¬b and (2) ¬(a ∨ b) = ¬a ∧ ¬b
Let this resultant CNF expression be E3 which is equivalent to the original expression E1.
− E.g. In step (1) we have E2 = y1 ∧ (y1 ↔ (¬x1 ∧ y2)) which is equivalent to E1.
Table 5.5.1
y1   y2   x1   y1 ↔ (¬x1 ∧ y2)
1    1    1    0
1    1    0    1
1    0    1    0
1    0    0    0
0    1    1    1
0    1    0    0
0    0    1    1
0    0    0    1
o Considering the truth assignments evaluating to 0 in the truth table we form a DNF
formula E3 equivalent to E2 as below :
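The table-driven conversion of Step 2 can be sketched in Python. This sketch assumes that the tabulated column is y1 ↔ (¬x1 ∧ y2), which matches the values in Table 5.5.1; the function names are hypothetical. Each row evaluating to 0 gives one AND-term of the DNF, and negating that term by DeMorgan's law gives one OR-clause of the CNF.

```python
from itertools import product
from typing import Callable, List, Sequence


def cnf_from_zero_rows(names: Sequence[str], f: Callable[..., bool]) -> List[List[str]]:
    """Emit one OR-clause per truth-table row on which f evaluates to 0."""
    clauses = []
    for row in product([1, 0], repeat=len(names)):  # same row order as the table
        if not f(*row):
            # DeMorgan: NOT(l1 AND l2 AND l3) = (NOT l1) OR (NOT l2) OR (NOT l3)
            clauses.append([("¬" + n) if v else n for n, v in zip(names, row)])
    return clauses


def node_clause(y1: int, y2: int, x1: int) -> bool:
    """Assumed column of Table 5.5.1: y1 <-> (NOT x1 AND y2)."""
    return bool(y1) == ((not x1) and bool(y2))


for clause in cnf_from_zero_rows(["y1", "y2", "x1"], node_clause):
    print("(" + " ∨ ".join(clause) + ")")
```

For the four rows that evaluate to 0, this prints the clauses (¬y1 ∨ ¬y2 ∨ ¬x1), (¬y1 ∨ y2 ∨ ¬x1), (¬y1 ∨ y2 ∨ x1) and (y1 ∨ ¬y2 ∨ x1), whose conjunction is TRUE exactly on the rows where the table shows 1.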
Step 3 :
(ii) If Ci contains 2 different literals i.e. Ci = (p ∨ q) then add a third literal as r and ¬r to
make it (p ∨ q ∨ r) ∧ (p ∨ q ∨ ¬r) as clauses in a 3-CNF expression E5. For any truth
value of r (r = 0 or r = 1), one of these clauses results in 1, and the other results in
(p ∨ q).
(iii) If Ci contains only 1 literal i.e. Ci = (p) then add 2 more literals as q, ¬q, r, and ¬r to
make clauses (p ∨ q ∨ r) ∧ (p ∨ q ∨ ¬r) ∧ (p ∨ ¬q ∨ r) ∧ (p ∨ ¬q ∨ ¬r) in a 3-CNF
expression E5. For any truth values of q and r, one of these clauses results in (p) and
the remaining 3 clauses result in 1.
− Thus, in polynomial time by following the aforementioned steps (1), (2), (3) any
propositional Boolean expression can be converted to a 3-CNF expression that is satisfiable
exactly when the original expression is. So, SAT ∝ 3-SAT ⇒ 3-SAT ∈ NP-Hard.
− Since both the conditions, (i) 3-SAT ∈ NP and (ii) 3-SAT ∈ NP-Hard, hold, the 3-SAT
problem is proved to be an NP-Complete problem.
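Cases (ii) and (iii) of Step 3 can be sketched in Python. The integer encoding of literals (+i for variable i, -i for its negation), the function names and the parameters fresh_q and fresh_r (indices of new variables not used elsewhere) are assumptions made for this illustration.

```python
from itertools import product
from typing import Dict, List

Clause = List[int]  # assumed encoding: +i is variable i, -i is its negation


def pad_to_three(clause: Clause, fresh_q: int, fresh_r: int) -> List[Clause]:
    """Rewrite a clause with 1, 2 or 3 literals as clauses with exactly 3 literals."""
    if len(clause) == 3:
        return [list(clause)]
    if len(clause) == 2:                      # case (ii): add r and NOT r
        p, q = clause
        return [[p, q, fresh_r], [p, q, -fresh_r]]
    if len(clause) == 1:                      # case (iii): add q, NOT q, r, NOT r
        (p,) = clause
        return [[p, fresh_q, fresh_r], [p, fresh_q, -fresh_r],
                [p, -fresh_q, fresh_r], [p, -fresh_q, -fresh_r]]
    raise ValueError("expected a clause with 1 to 3 literals")


def holds(clauses: List[Clause], a: Dict[int, bool]) -> bool:
    """Evaluate a conjunction of OR-clauses under the assignment a."""
    return all(any(a[abs(lit)] == (lit > 0) for lit in c) for c in clauses)


# Case (iii) on every assignment: the four padded clauses agree with (p) alone.
for values in product([False, True], repeat=3):
    a = dict(zip((1, 2, 3), values))
    assert holds(pad_to_three([1], 2, 3), a) == a[1]
print(pad_to_three([1, 2], 3, 4))  # [[1, 2, 4], [1, 2, -4]]
```

Each clause grows into at most four clauses, so the padding adds only a constant factor to the length of the expression.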
− It is a classic example of an NP-Complete problem.
− Hamiltonian cycle decision problem (HCDP) is to find whether a given undirected graph
has any Hamiltonian cycle in it.
Non-deterministic algorithm
Algorithm HCD_ND (G, n, s)
/* Input: G = (V, E) is an undirected graph where V is a set of nodes and E is a set of edges. n
is the number of nodes in G.
Output: Indicates success if a graph has a Hamiltonian cycle, otherwise signals failure.*/
{
V′ := ∅; /* V′ is an initially empty set to store nodes of a subgraph of G.*/
Failure( );
/*If any pair of consecutive nodes in the selected sequence is not adjacent then the
sequence is not a Hamiltonian cycle. If so, it indicates failure. */
Success( ); /*Indicates success if the randomly selected sequence of n vertices forms a
Hamiltonian cycle. */
}
}
Complexity analysis
− To make a random choice of a sequence of n vertices of a given graph, the non-
deterministic time O(n) is needed.
− To check whether a randomly chosen sequence of n vertices is a Hamiltonian cycle in the
given graph, the required deterministic time is O(|E|).
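The guessing and verification stages can be sketched in Python. This is not a transcription of Algorithm HCD_ND; the adjacency-set representation, the function names and the two sample graphs are assumptions, and the loop over all permutations only simulates the non-deterministic choice.

```python
from itertools import permutations
from typing import Dict, Optional, Sequence, Set, Tuple

Graph = Dict[int, Set[int]]  # assumed representation: node -> set of adjacent nodes


def is_hamiltonian_cycle(G: Graph, seq: Sequence[int]) -> bool:
    """Verification stage: seq visits every node once and consecutive nodes are adjacent."""
    n = len(G)
    if n < 3 or len(seq) != n or set(seq) != set(G):
        return False
    # The index (i + 1) % n closes the cycle from the last node back to the first.
    return all(seq[(i + 1) % n] in G[seq[i]] for i in range(n))


def simulate_hcd_nd(G: Graph) -> Optional[Tuple[int, ...]]:
    """Deterministic simulation of the guess: try every ordering of the nodes."""
    for seq in permutations(G):
        if is_hamiltonian_cycle(G, seq):
            return seq
    return None


square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}   # cycle 1-2-3-4-1
star = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}            # no Hamiltonian cycle
print(simulate_hcd_nd(square))  # (1, 2, 3, 4)
print(simulate_hcd_nd(star))    # None
```

The verification of one guessed sequence is polynomial; the simulation of the guess examines up to n! orderings.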
Proof : By definition of the NP-Complete problem, the Hamiltonian cycle decision problem
(HCDP) is NP-Complete if (i) HCDP ∈ NP and (ii) HCDP ∈ NP-Hard.
− For a given undirected graph G = (V, E), we can randomly choose a sequence of n vertices.
Then in polynomial time, we can verify that the selected sequence of n vertices forms a
Hamiltonian cycle by testing whether each node of G is visited exactly once, excluding the
first node which is visited twice to complete a cycle, and whether each pair of consecutive
nodes is joined by an edge of G. So, HCDP ∈ NP.
− We can show that either VCDP (vertex cover decision problem) ∝ HCDP or SAT ∝ HCDP.
GQ. Prove that the Hamiltonian Cycle decision problem is NP-Hard. (7 Marks)
− The travelling salesperson problem (TSP) asks to minimize the cost (or length) of a tour
that visits all cities only once and ends at the starting city.
Problem description
− Let G = (V, E) be a weighted directed graph of a set V of vertices and a set E of edges. A
node in a graph represents a city and an edge < i, j > ∈ E represents a path between two
cities i and j. The cost (weight) of an edge < i, j > ∈ E is given as Cij for all i and j ∈ V.
− Let |V| = n and n > 1. A tour of G starts at any node i ∈ V and ends at i by visiting all other
nodes in (V – {i}) only once.
− Thus, a tour of G is a Hamiltonian cycle including each vertex in V. The sum of the costs
of all paths (or edges) included in a tour determines the cost of a tour.
− The travelling salesman problem is to determine a tour with the smallest cost value.
− The travelling salesman decision problem (TSDP) is to find whether a given weighted
directed graph has a tour of cost at most k, for some given k.
Proof
− For a given directed graph G = (V, E), we can randomly choose a sequence of n vertices
as a tour by the travelling salesman. Then in polynomial time, we can verify the credibility
of a selected tour by testing whether each node of G is visited exactly once excluding the
first node which is visited twice to complete a cycle.
− Then we sum the total cost of the paths (edges) in the tour and finally, we verify that the cost
of the tour is at most k. As this can be accomplished in polynomial time, TSDP ∈ NP.
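The polynomial-time verification used in this proof can be sketched in Python. The cost dictionary, the function name and the sample costs are assumptions made for this illustration; the check compares the tour cost against a given bound k, as in the decision version of the problem.

```python
from typing import Dict, Sequence, Tuple

Costs = Dict[Tuple[int, int], float]  # assumed representation: C[(i, j)] for edge <i, j>


def verify_tsdp(n: int, C: Costs, tour: Sequence[int], k: float) -> bool:
    """Check that tour visits cities 1..n once each and costs at most k."""
    if len(tour) != n or set(tour) != set(range(1, n + 1)):
        return False
    total = 0.0
    # Pair each city with its successor; the last city returns to the first.
    for a, b in zip(tour, list(tour[1:]) + [tour[0]]):
        if (a, b) not in C:          # the directed edge <a, b> must exist
            return False
        total += C[(a, b)]
    return total <= k


# Hypothetical costs on 3 cities (directed, so C[(i, j)] may differ from C[(j, i)]).
C = {(1, 2): 4, (2, 3): 5, (3, 1): 6, (1, 3): 1, (3, 2): 1, (2, 1): 1}
print(verify_tsdp(3, C, [1, 2, 3], 15))  # True: cost 4 + 5 + 6 = 15
print(verify_tsdp(3, C, [1, 2, 3], 14))  # False: cost 15 exceeds 14
print(verify_tsdp(3, C, [1, 3, 2], 14))  # True: cost 1 + 1 + 1 = 3
```

The check makes one pass over the n edges of the tour, so it runs in O(n) time with dictionary lookups.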
A clique is a complete subgraph of a given undirected graph. In a complete subgraph, every
pair of nodes is adjacent.
(a) Given graph G
(b) Not a clique, as nodes 1 and 4 are not adjacent
(c) Non-maximal clique, as node 4 connects to all nodes in a subgraph
(d) A maximal clique of size 3
(e) A maximal clique of size 4. This is also a maximum clique.
Problem description
The clique decision problem (CDP) is to check whether a given undirected graph has a
clique of size at least s for some given s.
− The theory of NP-Completeness refers to the clique decision problem (CDP).
//Verification stage:
for (all pairs (v0, v1) such that v0, v1 ∈ V′ and v0 ≠ v1) do
if edge <v0, v1> ∉ E then Failure( );
/*If any pair of nodes in the set V′ is not adjacent, then that subgraph is not
a clique. If so, indicate failure. */
Success ( ); /*Indicate success if the randomly selected subgraph is a clique of size s. */
}
Complexity analysis
− To make a random choice of a subgraph with s vertices among total n vertices of a given
graph, the non-deterministic time O(n) is needed.
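The verification stage for a guessed set of nodes can be sketched in Python. The edge representation (a set of frozenset pairs), the function name and the sample graph are assumptions made for this illustration.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set

Edge = FrozenSet[int]  # assumed representation of an undirected edge


def is_clique(edges: Set[Edge], nodes: Iterable[int], s: int) -> bool:
    """Check that the chosen nodes number at least s and are pairwise adjacent."""
    chosen = set(nodes)
    if len(chosen) < s:
        return False
    # A missing edge between any pair corresponds to Failure( ) in the pseudocode.
    return all(frozenset(pair) in edges for pair in combinations(sorted(chosen), 2))


# Hypothetical graph on nodes 1..5 with a triangle on nodes 3, 4, 5.
edges = {frozenset(e) for e in [(1, 3), (3, 4), (4, 5), (3, 5)]}
print(is_clique(edges, [3, 4, 5], 3))  # True
print(is_clique(edges, [1, 3, 4], 3))  # False: nodes 1 and 4 are not adjacent
```

A set of s nodes has s(s − 1)/2 pairs, so the verification takes O(s^2) edge lookups.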
Proof of NP-Completeness
Proof :
By definition of NP-Complete problem, the Clique Decision Problem (CDP) is NP-Complete
if (i) CDP ∈ NP and (ii) CDP ∈ NP-Hard.
− For a given undirected graph G = (V, E) we can randomly choose a subgraph G′ = (V′, E′)
of G such that V′ ⊆ V and E′ ⊆ E. Then in polynomial time we can verify that the subgraph
G′ is a clique in G by testing whether, for each pair v0, v1 ∈ V′, there is an edge
< v0, v1 > ∈ E.
− We show that CNF-SAT ∝ CDP. To prove that CNF-SAT ∝ CDP, we can represent a
CNF-formula F of length m as a graph G = (V, E) such that G has a clique of size at least
m iff F is satisfiable.
− Let F = $\bigwedge_{i=1}^{m} c_i$ be a CNF-formula of length m. Let xi, 1 ≤ i ≤ n be the variables in F. Then
A subset V′ ⊆ V is a vertex cover of a given undirected graph G = (V, E) iff each edge
< v0, v1 > ∈ E, v0, v1 ∈ V, is incident to at least one node in V′, that means either v0 ∈ V′
or v1 ∈ V′ or both v0, v1 ∈ V′.
Fig. 5.5.6: Vertex Cover Examples: (a) Given graph (b) Vertex cover of size 3 (smallest
vertex cover) (c) Vertex cover of size 4
Problem description
The vertex cover optimization problem is to determine a vertex cover with the minimum
number of nodes for a given undirected graph.
The vertex cover decision problem (VCDP) is to check whether a given undirected graph
has a vertex cover of size at most s for some given s.
− The theory of NP-Completeness refers to the vertex cover decision problem (VCDP).
// Verification stage:
for (each edge < v0 , v1 > ∈ E and v0 , v1 ∈ V ) do
{
if ((v0 ∉ V′) && (v1 ∉ V′))
then Failure (); /*If an edge is not incident to any node in the randomly selected
set V′ then it signals failure*/
}
Success (); /*If the nodes in the randomly selected set V′ cover all edges of the
graph G then it indicates success*/
}
Complexity analysis
− To make a random choice of a set with s nodes among total n nodes of a given graph, the
non-deterministic time O(n) is needed.
− To check whether a randomly chosen set of nodes is a vertex cover of a given graph, the
required deterministic time is O(|E|).
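The verification stage can be sketched in Python. The edge-list representation, the function name and the sample path graph are assumptions made for this illustration. Success is reported only after every edge has been checked.

```python
from typing import Iterable, Tuple


def is_vertex_cover(edges: Iterable[Tuple[int, int]], cover: Iterable[int], s: int) -> bool:
    """Check that cover has at most s nodes and touches every edge."""
    chosen = set(cover)
    if len(chosen) > s:
        return False
    # One pass over the edge list: O(|E|) set-membership tests.
    return all(u in chosen or v in chosen for u, v in edges)


# Hypothetical path graph 1 - 2 - 3 - 4.
path = [(1, 2), (2, 3), (3, 4)]
print(is_vertex_cover(path, {2, 3}, 2))     # True
print(is_vertex_cover(path, {1, 4}, 2))     # False: edge <2, 3> is not covered
print(is_vertex_cover(path, {1, 2, 3}, 2))  # False: more than s = 2 nodes
```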
Proof :
By definition of NP-Complete problem, the vertex cover decision problem (VCDP) is NP-Complete
if (i) VCDP ∈ NP and (ii) VCDP ∈ NP-Hard.
− For a given undirected graph G = (V, E), we can randomly choose a set of vertices V′ ⊆ V
and |V′| = s for some given size s. Then in polynomial time, we can verify that set V′ is a
vertex cover of G by testing whether each edge < v0, v1 > ∈ E and v0, v1 ∈ V is incident to
at least one node in V′.
− The proof of CDP ∝ VCDP refers to the concept of the “complement” of a graph. For an
undirected graph G = (V, E) its complement is given by Ḡ = (V, Ē) where
Ē = {< v0, v1 > | v0, v1 ∈ V, v0 ≠ v1 and < v0, v1 > ∉ E}.
− Let < G, n, s > be an instance of CDP where G is a given undirected graph with n vertices
and a clique of size at least s.
− This instance of CDP can be reduced to an instance of VCDP in polynomial time by
constructing the complement Ḡ = (V, Ē) of a graph G. Then Ḡ has a vertex cover of size
at most (n – s). So, by polynomial-time reduction a CDP instance < G, n, s > is transformed
to a VCDP instance < Ḡ, n, n – s >.
− E.g. consider a CDP instance < G, 5, 3 > as depicted in Fig. 5.5.7(a). G has 5 nodes and a
clique of size 3 including nodes 3, 4 and 5.
− This CDP instance is polynomially reducible to a VCDP instance < Ḡ, 5, 2 > as depicted
in Fig. 5.5.7(b) below. Ḡ has 5 nodes and a vertex cover of size 2 including nodes 1 and 2.
− We must show that the construction of Ḡ = (V, Ē) of VCDP instance from G = (V, E)
(ii) Similarly, if V′ ⊆ V is a vertex cover of Ḡ = (V, Ē) then V – V′ must form a clique in G.
− Thus, we have shown that CDP ∝ VCDP. Also, we know that CNF-SAT ∝ CDP. So, by
transitivity CNF-SAT ∝ VCDP. Thus, it is proved that VCDP ∈ NP-Hard.
− Since both the conditions: (i) VCDP ∈ NP and (ii) VCDP ∈ NP-Hard are proved, the vertex
cover decision problem is NP-Complete.
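The complement-graph reduction used in this proof can be sketched in Python. The edge set below is hypothetical: it is one 5-node graph that has a clique on nodes 3, 4 and 5 and whose complement is covered by nodes 1 and 2, as in the example, but the edges of Fig. 5.5.7 itself are not reproduced here. The function names are also assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

Edge = FrozenSet[int]  # assumed representation of an undirected edge


def complement_edges(n: int, edges: Set[Edge]) -> Set[Edge]:
    """All pairs of distinct nodes 1..n that are not edges of G; O(n^2) pairs."""
    all_pairs = {frozenset(p) for p in combinations(range(1, n + 1), 2)}
    return all_pairs - edges


def cdp_to_vcdp(n: int, edges: Set[Edge], s: int) -> Tuple[int, Set[Edge], int]:
    """Map a CDP instance <G, n, s> to the VCDP instance <complement of G, n, n - s>."""
    return n, complement_edges(n, edges), n - s


def is_clique(edges: Set[Edge], nodes: Set[int]) -> bool:
    return all(frozenset(p) in edges for p in combinations(sorted(nodes), 2))


def is_vertex_cover(edges: Set[Edge], nodes: Set[int]) -> bool:
    return all(e & nodes for e in edges)  # every edge has an endpoint in nodes


G_edges = {frozenset(e) for e in [(1, 3), (2, 3), (3, 4), (3, 5), (4, 5)]}
n, Ebar, bound = cdp_to_vcdp(5, G_edges, 3)
print(bound, sorted(sorted(e) for e in Ebar))
# 2 [[1, 2], [1, 4], [1, 5], [2, 4], [2, 5]]
print(is_clique(G_edges, {3, 4, 5}), is_vertex_cover(Ebar, {1, 2}))  # True True
```

Building the complement inspects every pair of nodes once, so the transformation is polynomial in n.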